Loan Pricing

How a Bank Can Get in Trouble with Fair Lending Statistical Analysis

At the Independent Bankers Association of Texas (IBAT) Lending Compliance Summit in April 2014 and at a lecture on compliance at the Southwest Graduate School of Banking (SWGSB) alumni program in May, there was a great deal of discussion about the statistical analysis that the FDIC is doing to test for disparate treatment under the Fair Lending (Reg B) laws. One participant said incredulously, "we do 80% of our loans to Hispanic borrowers, and they [FDIC] are telling us that we're discriminating against Hispanic borrowers." She assumed that because her bank does a large number of loans to Hispanic borrowers, it is impossible that the bank could have disparate treatment. I'll use an example to show that the participant's assumption is wrong--a bank can have disparate treatment even though the bank does a majority of its lending to Hispanic borrowers. It can actually be easier to have disparate treatment problems when most of a bank's loans are to a protected group and a minority of the bank's loans are to non-Hispanic whites.

Another person told of an encounter in which a regulator stated that if the bank were engaging in disparate treatment, the regulators would find and punish it. I would not have that much confidence: disparate treatment can exist without rising to a level where statistical methods will identify it, and this can be easily demonstrated.

Another person described a bank officer who was so appalled and stressed that his bank was being cited for disparate treatment that he had a heart attack--he felt that he was being accused of being racist. It is possible--even probable--that a bank can have a very real disparate treatment problem without anyone on the staff being remotely racist. I'll describe a scenario where the cause is not remotely racist, but where the pricing is undeniably disparate and unfair.

A fourth discussion cited an instance where a bank was cited for disparate treatment, but the penalty was approximately $1,000 to cover a large number of Hispanic borrowers. It is possible to have statistically significant disparate treatment that is not financially material. The examples in the article that follows will illustrate financial materiality and different ways to determine whether or not a pricing difference is material.

Finally, it struck me as unusual that all of the disparate treatment questions discussed involved Hispanic borrowers and that none involved blacks (the Census term). The method that is presumed to be used by the FDIC for identifying a borrower’s race/ethnicity is more accurate at identifying Hispanic borrowers than non-Hispanic whites or blacks; the variability that this introduces makes it less likely that disparate treatment would be identified for a black population. This will be illustrated with an example. The same inaccuracy in racial/ethnic identification that makes a finding of disparate treatment for blacks unlikely also makes it possible that a bank could be subject to a finding of disparate treatment of Hispanic borrowers when perfect racial/ethnic identification would not trigger such a finding. This will also be illustrated with an example.

The article is a grossly simplified and contrived series of examples that are intended to illustrate both the power and weakness of statistical analysis for disparate treatment in fair lending. It will hopefully allow bankers and regulators--neither of whom commonly have statistical training--to have more useful discussions about the statistics involved in a finding of disparate treatment. The article touches on many controversial subjects; if you don't like controversy, stop reading now.

To make the discussion easier to follow and more precise in conjunction with other publications, I identify racial groups by the terms used in the Census, rather than terms which may be more common in current journalism style sheets. The examples are contrived but hopefully illustrate what may be a common scenario in banking and society that can be understood and discussed. The examples are constructed to illustrate how statistics can and cannot be applied to the analysis of disparate treatment. There is some discussion of the statistical use of the term significant versus the popular use of the term “significant,” which more commonly carries the meaning of the accounting term material or the common word “frequent.” A bank can have disparate treatment that is statistically significant while at the same time such treatment is neither financially material nor frequent.

This article is the second in a series on Fair Lending disparate treatment analysis. You should also read Preparing for a Fair Lending Examination Statistical Analysis. The article is divided into the following sections:

Example Loan Portfolio

The loan portfolio for these examples has 1000 borrowers, of whom 80 percent are Hispanic borrowers and 20 percent are non-Hispanic white borrowers. To take out all of the complications of differences in credit scores and other credit-worthiness metrics, we assume that all borrowers’ credit-worthiness is absolutely the same in all respects. The only difference is that some are Hispanic and some are non-Hispanic white. The interest rates are assigned to the whole population using a random number generator with a normal distribution whose specified mean and standard deviation differ from example to example.

Example 1: How Many Preferential Loans are Needed When Loan Pricing is Variable and Preferential Rate is 1.5% Better?

For the first example, we start out with a loan portfolio where the average interest rate is 15.00% for both Hispanic borrowers and non-Hispanic whites. In this bank, there is a lot of negotiation on loan rates, and thus a lot of variability. We will assume that one loan officer is non-Hispanic white and goes to a church that is overwhelmingly non-Hispanic white. Since many churches are very closely tied to ethnic groups--Greek and Russian Orthodox congregants are predominately of Greek and Russian descent, Lutherans are predominately of German or Scandinavian descent, Presbyterians are predominately of Scottish or Scots-Irish descent, Episcopalians are predominately of British descent, African Methodist Episcopal church members are predominately African American--it is reasonable to assume that the vast majority of the congregants at the loan officer’s church are similarly non-Hispanic white. The loan officer knows several congregants well and knows far more about how they spend (or don't spend) money than one could ever tell from a loan application. He feels that they are unusually good credit risks. He knows that some of them struggled for several years but managed to pay off substantial medical bills. If this loan officer were to give a preferential interest rate to these congregants--1.50% better--how many preferential loans would he have to make before the bank has a disparate treatment problem that is identifiable through statistics or, in the language of statistics, is “statistically significant”?

The first figure below shows the distribution of interest rates from the loan portfolio before we apply the preferential interest rates, with the Hispanic distribution on the left and non-Hispanic white distribution on the right. The second figure shows the distribution after the loan officer gives enough preferential loans to detect it using a statistical test at the 0.01 level. This means that if the Hispanic and non-Hispanic white groups were actually being priced the same, a difference at least this large would arise by chance with a probability of less than 0.01 (1%). It doesn't look a lot different, but you can find the differences if you study it for a moment.

Statistics by itself only gives this probability; I've arbitrarily chosen to say that 0.01 (1%) is improbable enough that I will accept that the two groups are being treated differently. 0.01 (1%) is a stringent threshold for accepting or rejecting that two groups are different. For some types of analysis where the stakes are low--like a marketing direct mail response model--statisticians might use a 0.10 (10%) probability of random occurrence (p-value) as the threshold, while for medical drug efficacy it might be a much more stringent threshold of 0.02 (2%) or 0.01 (1%).

The third figure shows how the probability drops as the number of preferential loans increases. At 57 preferential loans, the difference is statistically significant at the 0.01 level. This is useful in understanding that although findings of disparate treatment are pass/fail, regulators will almost certainly notice when a bank’s loan portfolio comes back with a p-value of 0.10 or 0.05. From a regulatory point of view, it would be efficient to use this information to allocate more resources to fair lending examinations for banks that scored a p-value near the level of enforcement on previous examinations. The fourth figure shows how the average interest rate for non-Hispanic whites drops as the number of preferential loans increases. In this example, it drops by 25 basis points to 14.75% at about 40 loans--the point where the preferential treatment becomes statistically significant at the 0.01 level.

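
The search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the figures: the function names are hypothetical, the 1.25% standard deviation is the value quoted later for Examples 1 and 2, and the p-value uses a normal approximation to a Welch-style two-sample test, which may differ from the test actually used. Because the rates are random, the count changes from one simulated portfolio to the next.

```python
import math
import random


def mean_and_variance(values):
    """Return the sample mean and the (n - 1) sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance


def two_sided_p_value(group_a, group_b):
    """p-value for a difference in means, using a normal approximation
    to Welch's t-test (an assumption; reasonable for groups this large)."""
    mean_a, var_a = mean_and_variance(group_a)
    mean_b, var_b = mean_and_variance(group_b)
    std_err = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    z = (mean_a - mean_b) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def preferential_loans_needed(seed, preference=1.5, alpha=0.01,
                              n_hispanic=800, n_white=200,
                              mean_rate=15.0, sd_rate=1.25):
    """Count how many non-Hispanic white loans must receive the preferential
    rate before the difference between groups is significant at `alpha`."""
    rng = random.Random(seed)
    hispanic = [rng.gauss(mean_rate, sd_rate) for _ in range(n_hispanic)]
    white = [rng.gauss(mean_rate, sd_rate) for _ in range(n_white)]
    hispanic_mean = sum(hispanic) / n_hispanic
    for k in range(n_white + 1):
        if k > 0:
            white[k - 1] -= preference  # the k-th loan gets the better rate
        white_mean = sum(white) / n_white
        if (white_mean < hispanic_mean
                and two_sided_p_value(hispanic, white) < alpha):
            return k
    return None  # significance was never reached


# One count per simulated portfolio (think of each seed as a different bank).
results = [preferential_loans_needed(seed) for seed in range(20)]
print(results)
```

Running the search over several seeds gives a feel for how much the pass/fail boundary moves purely because of the random variation in negotiated rates.
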
In this example, we know that all of the preferential loans indicate disparate treatment, but the threshold that I've chosen doesn't kick in until I've given preferential rates to 57 of the 200 non-Hispanic whites in the loan portfolio. With 0.01 as the standard of significance, a bank with only 55 preferential loans would get a pass, while one with 57 preferential loans would get a fail. This example illustrates two points:

• A bank can have statistically identifiable disparate pricing even when the majority of the bank's loans are to Hispanic borrowers.
• A large number of preferential loans can occur without rising to a level where the disparate treatment can be identified statistically. With the variability in this example, 28% of the non-Hispanic whites could receive preferential rates without rising to the level of detection at the 0.01 level.

Statistically Significant vs. Financially Material

In the scenario that I've described, the bank is clearly giving preferential treatment to the non-Hispanic white group that is statistically significant, but I don't think that anyone would describe the cause of preferential rates to long-known friends as intentionally racist--just grossly unfair to people who don't attend his church. Statistically significant has special meaning in statistical terminology--but is this case also “significant” as we use the term in everyday speech? More precisely, is this financially material? Let's assume that these are unsecured term loans for $1000.00 with a term of 12 months, and calculate the difference in interest paid between the normal and the preferential rate in a simplified estimate using just the average interest rate of the Hispanic borrowers and the average interest rate of the preferential loans, as shown in Table 1. To simplify the example analysis, the time value of money will be ignored in estimating the value of the preferential pricing.

Table 1
Loan Amount: $1000.00
Loan Term: 12.00 months
Normal Rate: 15.00%
Preferential Rate: 13.50%
Average non-Hispanic white Int Rate: 14.73%
Normal Cumulative Interest: $83.10
Preferential Cumulative Interest: $74.62
Individual Preference Value: $8.48
Total Preference Value Given: $483.10
Average Preference Value for non-Hispanic white Population: $2.42
Average Preference Value as a Percent of Cumulative Interest: 2.91%
Total Preference Spread Across Hispanic Population: $0.60
Total Preference Value if preferential rate extended to all Hispanic borrowers: $6780.34
Number of Preferential Loans: 57
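
The dollar figures in Table 1 are consistent with a standard level-payment amortization schedule. The sketch below assumes that schedule (the article does not state the formula), and the function name is hypothetical; it recomputes the cumulative interest at each rate and the derived preference values.

```python
def cumulative_interest(principal, annual_rate_pct, months):
    """Total interest paid on a fully amortizing loan with level monthly
    payments (assumed schedule; time value of money is ignored)."""
    monthly_rate = annual_rate_pct / 100 / 12
    payment = principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)
    return payment * months - principal


normal = cumulative_interest(1000.00, 15.00, 12)
preferential = cumulative_interest(1000.00, 13.50, 12)
preference_value = normal - preferential

n_preferential = 57   # preferential loans in the example
n_white = 200         # non-Hispanic white borrowers
n_hispanic = 800      # Hispanic borrowers

total_given = preference_value * n_preferential
print(f"Normal cumulative interest:        ${normal:.2f}")
print(f"Preferential cumulative interest:  ${preferential:.2f}")
print(f"Individual preference value:       ${preference_value:.2f}")
print(f"Total preference value given:      ${total_given:.2f}")
print(f"Average per non-Hispanic white:    ${total_given / n_white:.2f}")
print(f"Percent of cumulative interest:    {total_given / n_white / normal:.2%}")
print(f"Spread across Hispanic borrowers:  ${total_given / n_hispanic:.2f}")
print(f"If extended to all Hispanic loans: ${preference_value * n_hispanic:.2f}")
```
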

As shown in Table 1, the preference value is $8.48 per loan, for a total value of the preference given of $483.10. If this $483.10 were apportioned to all 800 Hispanic borrowers, they would each receive $0.60. If the preferential rate were extended to all 800 Hispanic borrowers, the value would be $6780.34. Since all 1000 people--including 800 Hispanic borrowers--have exactly the same creditworthiness, if any of the non-Hispanic white borrowers were eligible for the 13.50% rate, then all borrowers should be eligible for the 13.50% rate.

What is Financially Material?

At what point does the individual preference value become material? This is a subject for much discussion and debate. To provide context, most financial statements are in units of $1,000, so $1,000 will appear as a change on a financial statement. At most institutions, the staff would work for a considerable time before closing a teller workstation with a $1,000 discrepancy. If we take this as an arbitrary standard for the lowest amount that is financially material for a bank, how would this translate to a household? A $500 million bank generates $5,000,000 in income for a 1% ROA--what would this look like scaled to a household income? Figure 5 shows how a $1,000 discrepancy at a $500 million institution would scale to modest household incomes. By this measure, $10 would be material for a household with an income of $50,000 per year, and $5 would be material for a household income of $25,000 per year.
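
The scaling behind those household figures is a simple proportion: the $1,000 threshold is expressed as a share of the bank's annual income and then applied to a household's income. A minimal sketch, with a hypothetical function name and a few illustrative income levels:

```python
def household_equivalent(bank_amount, bank_income, household_income):
    """Scale a dollar amount from a bank's income to a household's income."""
    return bank_amount / bank_income * household_income


bank_assets = 500_000_000        # $500 million institution
return_on_assets = 0.01          # 1% ROA
bank_income = bank_assets * return_on_assets   # $5,000,000 per year

for income in (25_000, 50_000, 100_000):       # illustrative household incomes
    amount = household_equivalent(1_000, bank_income, income)
    print(f"Household income ${income:,}: material at about ${amount:.2f}")
```
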

A colleague who is black reviewed this article and said, "it's not just that [bank loan prices]. It's everything." Out of this conversation came another approach to materiality, one that looks at the value of the preference as a percentage of the transaction, which is 2.91% in this case. If all purchases by a protected group were at a 2% higher price, what would be the aggregate effect--would 2% be material in the aggregate? Figure 6 shows 2% of income--an amount that is clearly material for all income levels.

In any case, anecdotes suggest that the FDIC has chosen an average rate difference of 0.25% as the threshold for enforcement.

Example 5: The Danger Zone for Likely Enforcement

The anecdotal FDIC enforcement triggers of a 0.01 level of significance and a 0.25% difference in the interest rate between protected groups and non-Hispanic whites make it possible to develop a chart estimating how many preferential loans of a particular size are likely to trigger an enforcement action. The FDIC has not made public statements of these triggers, so these assumptions are just that--assumptions. Do not depend upon the accuracy and correctness of this curve, as these assumptions could be completely wrong, and this simulation is a contrived two-group population rather than the complex populations that occur in real life. To the extent that the various assumptions are reasonable, Figures 19 and 20 give an estimate of the zone where banks are in danger of triggering an enforcement action. Above the curve, enforcement action would be likely under these assumptions, while enforcement action is less likely as a portfolio moves farther below the curve. This isn't a hard line--this curve assumes the variability used in Examples 1 and 2 (in statistical terms, the standard deviation is 1.25%); with higher variability the curve will move up, while with lower variability the curve will move down. Figure 19 shows all of the data points from 20 simulations; think of this as doing this test with portfolios from 20 different banks. Figure 20 is from the same data, but is drawn to show the 95% confidence interval for the enforcement boundary.
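
A rough version of this boundary can be traced with a short simulation. The sketch below is an assumption-laden illustration rather than the code behind Figures 19 and 20: it takes the anecdotal triggers (p-value below 0.01 and an average rate gap of at least 0.25%) at face value, uses a normal approximation for the p-value, and uses hypothetical function names. For each preference size it reports the median number of preferential loans, across simulated portfolios, at which both triggers are met.

```python
import math
import random
import statistics


def mean_and_sem2(rates):
    """Return the mean and the squared standard error of the mean."""
    n = len(rates)
    mean = sum(rates) / n
    variance = sum((r - mean) ** 2 for r in rates) / (n - 1)
    return mean, variance / n


def loans_to_trigger(preference, seed, alpha=0.01, min_gap=0.25,
                     n_hispanic=800, n_white=200,
                     mean_rate=15.0, sd_rate=1.25):
    """Preferential loans needed before BOTH assumed triggers are met:
    p-value < alpha and average rate gap >= min_gap (percentage points)."""
    rng = random.Random(seed)
    hispanic = [rng.gauss(mean_rate, sd_rate) for _ in range(n_hispanic)]
    white = [rng.gauss(mean_rate, sd_rate) for _ in range(n_white)]
    h_mean, h_sem2 = mean_and_sem2(hispanic)
    for k in range(n_white + 1):
        if k > 0:
            white[k - 1] -= preference  # the k-th loan gets the better rate
        w_mean, w_sem2 = mean_and_sem2(white)
        gap = h_mean - w_mean
        z = gap / math.sqrt(h_sem2 + w_sem2)
        p_value = math.erfc(abs(z) / math.sqrt(2))  # normal approximation
        if gap >= min_gap and p_value < alpha:
            return k
    return None  # triggers never met, even with every loan preferential


def median_loans(preference, seeds):
    """Median trigger point over the portfolios that reach the triggers."""
    counts = [loans_to_trigger(preference, seed) for seed in seeds]
    reached = [c for c in counts if c is not None]
    return statistics.median(reached) if reached else None


for preference in (0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"preference {preference:.2f}%: median loans to trigger = "
          f"{median_loans(preference, range(20))}")
```

Under these assumptions, larger preferences reach the triggers with fewer loans, which is the pattern the text describes for the danger zone.
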

The key lesson from this analysis is that a small number of big “sweetheart deals” in a small pool of non-Hispanic white borrowers may be enough to cause significant problems for a bank during a Fair Lending examination.

Example 6: How Does Inaccuracy in Race/Ethnicity Estimation Alter Results?

All of the previous examples assumed that the identification of Hispanic and white non-Hispanic borrowers was perfect. In reality, we know that this is not perfect. The approach that the FDIC is thought to use for estimating a borrower’s race and ethnicity is described in the paper Using the Census Bureau’s Surname List to Improve Estimates of Race/ethnicity and Associated Disparities. The paper lists correlations of 0.7 for identification of black borrowers, 0.76 for non-Hispanic white/other and 0.82 for Hispanic borrowers; squaring these correlations gives a rough accuracy of 49%, 58% and 67% respectively. If, through identification errors, non-Hispanic whites (some of whom have preferential loan rates) are mixed in with blacks or Hispanic borrowers who do not have preferential loan rates, how does this alter the number of preferential loans needed to reach a p-value of 0.01 and an average rate difference of 0.25%?

Figures 21 and 22 show the region where enforcement action is likely for four scenarios where the fraction of Hispanic borrowers correctly identified is varied from 100% to 70% and the fraction of white non-Hispanic borrowers correctly identified is varied from 100% to 60%. Generally, as accuracy decreases, the likely enforcement region relaxes upward and to the right.
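
The effect of misidentification can be explored with a small extension of the same kind of simulation. The sketch below is illustrative only and rests on assumptions that the article does not specify: each borrower is labeled correctly with a fixed probability, a mislabeled borrower is simply placed in the other group, the p-value uses a normal approximation, and the function names are hypothetical. It compares the trigger point under perfect identification with the 70%/60% scenario for the same simulated portfolios.

```python
import math
import random


def mean_and_sem2(rates):
    """Return the mean and the squared standard error of the mean."""
    n = len(rates)
    mean = sum(rates) / n
    variance = sum((r - mean) ** 2 for r in rates) / (n - 1)
    return mean, variance / n


def loans_to_trigger(seed, acc_hispanic, acc_white, preference=1.5,
                     alpha=0.01, min_gap=0.25, n_hispanic=800, n_white=200,
                     mean_rate=15.0, sd_rate=1.25):
    """Preferential loans needed to meet both assumed triggers when the
    analyst only sees the estimated (possibly wrong) group labels."""
    rng = random.Random(seed)
    hispanic = [rng.gauss(mean_rate, sd_rate) for _ in range(n_hispanic)]
    white = [rng.gauss(mean_rate, sd_rate) for _ in range(n_white)]
    # True means the borrower's group is identified correctly (assumed model).
    hispanic_ok = [rng.random() < acc_hispanic for _ in range(n_hispanic)]
    white_ok = [rng.random() < acc_white for _ in range(n_white)]
    for k in range(n_white + 1):
        if k > 0:
            white[k - 1] -= preference  # preference goes to true whites only
        seen_hispanic = ([r for r, ok in zip(hispanic, hispanic_ok) if ok]
                         + [r for r, ok in zip(white, white_ok) if not ok])
        seen_white = ([r for r, ok in zip(white, white_ok) if ok]
                      + [r for r, ok in zip(hispanic, hispanic_ok) if not ok])
        h_mean, h_sem2 = mean_and_sem2(seen_hispanic)
        w_mean, w_sem2 = mean_and_sem2(seen_white)
        gap = h_mean - w_mean
        z = gap / math.sqrt(h_sem2 + w_sem2)
        if gap >= min_gap and math.erfc(abs(z) / math.sqrt(2)) < alpha:
            return k
    return None  # triggers never met


for seed in range(5):
    perfect = loans_to_trigger(seed, 1.0, 1.0)
    noisy = loans_to_trigger(seed, 0.7, 0.6)
    print(f"portfolio {seed}: perfect ID -> {perfect}, 70%/60% ID -> {noisy}")
```

In this simplified model, mislabeling dilutes the observed rate gap, so most portfolios need more preferential loans (or never reach the triggers) under 70%/60% identification; a real procedure with more than two groups could behave differently.
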

Although this simulation is a binary example between two populations, it illustrates a possible cause for the anecdotal observation that there are few enforcement actions involving disparate pricing for blacks; the lower accuracy of the racial/ethnic identification of blacks relaxes the enforcement region significantly to the right. From high to low, the relative accuracy of identification is Hispanic, non-Hispanic white and black. This makes it more likely that enforcement will occur for disparate pricing to a Hispanic population than to a black population.

The simulation was run 20 times with a different random number seed each time--think of this as running the simulation using portfolios from many banks. The discontinuities in the various curves (especially the 70%/60% curve) are due to the variability of the data. In a single bank, a small number of loans could move the bank from the non-enforcement region into the enforcement region.

Table 5 shows the number of portfolios from this simulation where the 70%/60% curve crosses below the 100%/100% curve; these are the particular portfolios where an enforcement action might occur when perfect race/ethnicity identification would not cause it to occur. The inaccuracy in race/ethnicity assignment makes it much less likely that most banks would face enforcement, but for a small number of banks, enforcement could occur when it would not occur with perfect race/ethnicity identification.

Table 5
Total Simulations: 20
Simulations where 70%/60% Identification Curve More Strict than Perfect 100%/100% Identification Curve: 0
Percent Simulations with 70%/60% Identification Curve More Strict than Perfect 100%/100% Identification Curve: 0%

The differences in Figures 23/24 and 25/26 illustrate the variability introduced by having a small non-Hispanic white population and a large Hispanic population. Figure 24 (varying correct identification of the Hispanic population) shows curves that are not smooth and which have some discontinuities, while Figure 26 (varying correct identification of the non-Hispanic white population) shows curves that are very smooth. Incorrectly identifying a few Hispanic borrowers as non-Hispanic white results in significant changes in the interest rate distribution of the small non-Hispanic white population, while incorrectly identifying a non-Hispanic white as Hispanic does not significantly alter the interest rate distribution of the Hispanic population. The differences between these two figures emphasize how a small number of preferential loans in a small non-Hispanic white population can quickly move a bank from the non-enforcement region to the likely enforcement region.

Discussion

The examples in this article are grossly simplified from a real case in a few ways:

• Some of the examples exactly identify Hispanic and non-Hispanic white borrowers. In reality, race and ethnicity are inferred using the borrower's surname and address in a process that is accurate for about 50-64% of borrowers from the four major racial groups (accuracy is lowest for blacks and highest for Hispanic borrowers). See Using the Census Bureau’s Surname List to Improve Estimates of Race/ethnicity and Associated Disparities for a discussion of how race and ethnicity can be estimated.
• All of the examples assume all borrowers are exactly equally creditworthy. In a real loan portfolio there would be numerous differences in the creditworthiness between individuals that would have to be corrected before doing any type of analysis for disparate treatment. The simple statistical tests used in this article could not be used for a real loan portfolio, although the concepts would be the same.
• All of the examples assume that 80% of the borrowers are Hispanic and 20% are non-Hispanic whites and that the loan portfolio is exactly 1000 loans. The various curves would change significantly as the number of borrower groups and the proportions of borrower groups change.

Disparate Treatment Can Occur but Not Rise to Level of Enforcement

Figure 7 shows that if the preference is small, disparate treatment could occur frequently (though not necessarily materially) without rising to the level of enforcement using what is currently understood to be the FDIC threshold for enforcement. When this characteristic is combined with the inaccuracy of race/ethnicity identification for some groups, disparate treatment could be pervasive for some groups without rising to the level of enforcement.

Procedures Don't Distinguish Well Between Small but Widespread Preference and Large but Unusual Preference

Figures 7, 8, 9 and 10 show that the procedure does not distinguish between a small but widespread preference and a large but unusual preference. Most people would view these cases very differently; most would agree that a widespread small preference is probably more worthy of enforcement than an infrequent large preference. In the latter case, the vast majority of non-Hispanic whites who didn't get the preferential rate will be just as angry as the Hispanic borrowers who did not get the preferential rate.

Relative Inaccuracy of Race/Ethnicity Identification Reduces Likelihood of Enforcement But Can Cause Enforcement When Perfect Identification Would Not

It is clear in Figures 21 and 22 that the inaccuracy in race/ethnicity identification relaxes the enforcement region for most banks, but that it can lead to a small number of scenarios where enforcement could occur when perfect identification would not cause enforcement.

Figures 22 and 26 show that when the accuracy of identification is higher for Hispanic borrowers than non-Hispanic whites, there is a wide degree of overlap in the 95% confidence intervals for the perfect identification scenario and for the 70%/60% correct identification scenario. In Figures 21/22 and 25/26, it is clear that for some portfolios, enforcement would occur due to errors in identification when enforcement would not occur with perfect identification.

If this is the procedure used for race/ethnicity identification, then enforcement actions could occur in some circumstances where they should not occur. For banks that are aggressive in working to identify problems before an examination, this presents further frustration. The easiest way to estimate race/ethnicity is to purchase data from a third-party provider. Since some third-party data includes self-reported race and ethnicity, it is likely more accurate than the procedure thought to be used by the FDIC. With the more accurate identification, the bank's work might not find disparate treatment when the FDIC ends up finding disparate treatment due to inaccuracy of the race and ethnicity estimation.

It is Unlikely that Disparate Treatment of Black Borrowers Would be Identified

The inaccuracy in the identification of blacks makes it unlikely that even widespread disparate treatment would rise to the level of enforcement. Blacks who are descended from slaves frequently have the same surname as the non-Hispanic white slaveholder; surnames do not provide useful information for race/ethnicity identification in this case, resulting in some number of black borrowers being mis-identified as non-Hispanic white borrowers. This makes it unlikely that pervasive disparate treatment of black borrowers would be identified.

Recommendations to Avoid Disparate Treatment

Given that churches, synagogues, mosques and many other religious and social institutions have very strong immigration histories and resulting race and ethnicity patterns, if a bank's workforce is disproportionately non-Hispanic white and the bank allows relationship-based price negotiation, the bank can unintentionally have a disparate treatment pricing problem that is statistically significant. If a bank’s workforce is predominately non-Hispanic white and the borrower pool for a product is predominately Hispanic, the bank is at higher risk for an unintentional disparate treatment problem. There are a few actions that a bank can take to avoid problems:

• Eliminate rate negotiation without exception.
• Work to change the lending workforce racial/ethnic composition to match the borrower pool racial/ethnic composition, and hope that all lenders maintain relationships within their respective racial/ethnic groups. If the racially mixed lending workforce maintains relationships with their income peers rather than racial/ethnic peers, this may do nothing to resolve the problem.
• Increase marketing and originations to black borrowers and other groups that will disproportionately be mis-identified as non-Hispanic white borrowers. The previous two items work to prevent disparate treatment while this item merely masks a problem; I'm not remotely comfortable making this recommendation, but it would have the effect of diluting the non-Hispanic white pool with more borrowers who did not receive a preference. This item is a highly undesirable side-effect of the error patterns in the race/ethnicity approach thought to be used by the FDIC.

Responding to a Finding of Disparate Treatment

If your bank is subject to a finding of disparate treatment there are a few things from an analytical standpoint that you should do in response:

• Make sure that you included ALL credit-worthiness data that your bank uses in underwriting as part of the dataset provided to the FDIC. If you only include a subset of the data elements used in underwriting, it is likely that a multi-variate analysis will result in an incorrect finding of disparate treatment when the use of ALL credit-worthiness data would not result in a finding of disparate treatment. If your bank gives rate breaks for automatic payments or presence of a checking account, include indicators for these accounts. Anecdotally, providing an incomplete dataset is a common self-inflicted problem that consumes significant time and money for both banks and regulators. Since the 70%/60% curve in Figure 22 is quite far to the right, it is likely that the anecdotal evidence is correct.
• Discuss this item with your attorney before proceeding. Perform a data enrichment with third-party race/ethnicity identification that is more accurate than that used by the FDIC, and perform an analysis to identify disparate treatment. If the more accurate race/ethnicity identification does not support a finding of disparate treatment, bring this to the attention of regulators. You may (but probably do not) have a portfolio where the specific characteristics of race/ethnicity identification cause a finding of disparate treatment when perfect identification would not.
• Examine your portfolio for frequent small-preference loans or unusual large-preference loans. If your portfolio has a small number of large-preference loans, your bank may have an insider fairness problem rather than a race/ethnicity disparate treatment problem. This still isn't a good thing, but is perhaps better than a finding of disparate treatment.

Conclusion

There is the old saying “there are lies, damned lies and statistical lies;” in discussions of regulatory use of statistics in Fair Lending analysis, bankers who avoid learning statistical terminology do themselves, their bank and their customers a disservice. A banker who ignores the issues of workforce racial and ethnic composition in a racially and ethnically self-segregated community does the bank and the bank’s customers a disservice. When a regulator uses inaccurate borrower race/ethnicity identification and arrives at a finding of statistically significant disparate treatment, the regulator should take steps to confirm that enforcement would still occur with perfect race/ethnicity identification.

The simulations discussed in this article show that it is disturbingly easy to have a statistically significant disparate treatment problem that is not material and that the enforcement region is very nebulous and can vary by a factor of 2 due to the misidentification errors in the procedures used to estimate race and ethnicity.

Future Research

Deriving a closed-form equation to estimate the probability of enforcement when perfect race/ethnicity identification would not cause enforcement would be useful, as would graphical models to illustrate the characteristics of more racially complex populations than were considered here.
