The Consumer Financial Protection Bureau (CFPB) proposed a method for identifying the race and ethnicity of consumers when the data are not available. The CFPB requires this information in order to monitor lending patterns and ensure that lending institutions are complying with fair lending laws, such as the Equal Credit Opportunity Act (ECOA), that prohibit discrimination.
The problem is that these same lenders are generally prohibited by law from collecting demographic data. In particular, “auto lenders and non-mortgage lenders are generally not allowed to collect consumers’ demographic information.” This presents a conundrum for the CFPB, which wishes to examine these lenders for possible discrimination by race or ethnicity in loan amounts, terms, and conditions. To solve the problem, the CFPB proposes a statistical algorithm that combines the borrower’s surname with his or her geographic area to determine (or “estimate”) the borrower’s race and ethnicity.
In its 2014 study “Using publicly available information to proxy for unidentified race and ethnicity,” the CFPB proposes a specific statistical algorithm, the Bayesian Improved Surname Geocoding (BISG) proxy method, for analyzing the outcomes of lending decisions by race and/or ethnicity. According to the study, the CFPB believes that this method outperforms assignments of race and ethnicity based on surname alone or geography alone. In support of this assertion, the CFPB shows in its study that the assignment of race/ethnicity is, on average, more accurate when surname and geography are used in combination. However, the details of the per-borrower assignment of race/ethnicity, and the assumptions that must be made to use this type of information in a statistical analysis, call into question the reliability of statistical findings of differences in loan terms, etc. that are based on the algorithm.
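The general idea behind a surname/geography combination can be illustrated with a simplified sketch. This is not the CFPB’s exact formula; it assumes that surname and location are conditionally independent given group membership and simply multiplies the two sets of probabilities and renormalizes (the actual BISG method also incorporates baseline population shares). All probability values below are hypothetical.

```python
# Simplified sketch of a BISG-style combination (hypothetical numbers,
# not the CFPB's exact formula): surname-based and geography-based
# probabilities are multiplied and renormalized, assuming conditional
# independence of surname and location given group membership.

def combine_proxies(p_surname, p_geo):
    """Combine surname- and geography-based race/ethnicity probabilities.

    p_surname, p_geo: dicts mapping group -> probability.
    Returns a dict of combined probabilities summing to 1.
    """
    joint = {g: p_surname[g] * p_geo[g] for g in p_surname}
    total = sum(joint.values())
    return {g: v / total for g, v in joint.items()}

# Hypothetical borrower: the surname strongly suggests "white",
# while the borrower's census geography is predominantly "hispanic".
p_surname = {"white": 0.85, "hispanic": 0.10, "black": 0.05}
p_geo     = {"white": 0.30, "hispanic": 0.60, "black": 0.10}

posterior = combine_proxies(p_surname, p_geo)
# The surname evidence dominates here: the combined probability of
# "white" remains close to 0.80 despite the geographic evidence.
```

The sketch also illustrates the intermarriage problem discussed below: a strongly non-Hispanic surname can outweigh geographic evidence, leaving the combined assignment tipped away from the borrower’s actual ethnicity.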
For example, suppose that a white, non-Hispanic male in the Midwest with a surname of Northern European descent is married to a female borrower of Hispanic descent. If she has taken her husband’s surname, the race/ethnicity assignment will be tipped toward white, non-Hispanic despite the fact that she is Hispanic. Further, the CFPB does not suggest a cut-off for the assigned probability of belonging to a certain race/ethnic group above which it will determine that the borrower actually belongs to that group. The CFPB study cautions against the use of a threshold rule, since such a rule may introduce bias and a loss of precision in statistical tests of differences in loan terms, etc. between racial/ethnic groups:
The threshold rule removes the uncertainty about group membership at the cost of decreased statistical precision, with that precision deteriorating with decreases in the proxy’s ability to create separation across races and ethnicity. In situations in which researchers can obtain clear separation between groups—for instance, situations for which the probabilities of assignment tend to be very close to 0 or 1—the consequences of using a threshold assignment rule, beyond simple measurement error, would be minor. However, when insufficient separation exists—for example, when there are a significant number of individuals with probabilities between 20% and 80% of belonging to a particular group—the use of thresholds can artificially bias, usually downward, estimates of the number of individuals belonging to particular racial and ethnic groups and potentially attenuate estimates of differences in outcomes between groups. (p. 21)
However, hypothetically, the CFPB could use a value of 80% or higher to designate a borrower as African-American or Hispanic. Further, it is possible that the probability cut-off could, or should, vary with geographic area. While the CFPB’s proxy method may be an improvement over the surname-only or geography-only proxy methods, the lending community is still left with the question of how the CFPB will assign race/ethnicity to a particular borrower. The loss of statistical precision and the potential bias in the estimates may or may not affect the analysis of a large, geographically diverse loan portfolio. However, the loss of precision may have a very large biasing effect on more geographically localized lending activities in which the surname-based delineation of race/ethnicity falls in the 20% to 80% range, i.e., the range in which the assignment is uncertain.
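The downward bias the study warns about can be made concrete with a small sketch using hypothetical data: when many of the proxy probabilities fall between 20% and 80%, a hard 80% cutoff counts far fewer borrowers as group members than the probability-weighted count implied by the proxy itself.

```python
# Sketch of the threshold-rule problem: hypothetical combined
# probabilities of being Hispanic for ten borrowers, with many values
# in the uncertain 20%-80% range.
p_hispanic = [0.95, 0.85, 0.70, 0.65, 0.55, 0.45, 0.35, 0.30, 0.15, 0.05]

# Probability-weighted ("soft") count: the expected number of Hispanic
# borrowers under the proxy.
expected_count = sum(p_hispanic)

# Threshold ("hard") count with an 80% cutoff: only near-certain
# assignments survive, so the count is biased downward relative to
# the probability-weighted count.
threshold_count = sum(1 for p in p_hispanic if p >= 0.80)
```

With these hypothetical values the soft count is 5.0 but the 80% threshold yields only 2, the kind of artificial undercount the study describes when separation between groups is insufficient.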