In Students for Fair Admissions v. Harvard, the Supreme Court effectively bans affirmative action as it’s known and practiced today. But if universities discreetly continue giving an admission advantage to students of certain races, how would anyone know? How does one prove discrimination without a smoking gun?

We can get a glimpse of the answer by examining the Harvard case as it was originally argued a few years back. Back then, the case largely focused on whether Harvard discriminated against Asian-American applicants. Specifically, Harvard had been accused of rejecting Asian-American applicants who were at least as qualified as white applicants who got accepted. Harvard admitted that it indeed used affirmative action to boost underrepresented groups, but it denied treating Asians any differently from whites.

By the time the case reached the Supreme Court, the central issue was broader: whether universities could even use race as one of the factors driving the decision of whom to admit or reject. In the end, the Supreme Court, with a 6-3 majority, rewrote the rules for race-based affirmative action from the ground up—invalidating practices that Harvard openly admitted to, that are widespread at selective universities across the country, and that the Court itself had sanctioned in previous cases, such as giving a boost to black applicants in the interest of improving campus diversity.

Students for Fair Admissions leaves behind a paper trail that provides a fascinating look into how, exactly, one might prove whether discrimination exists in the college-admissions setting. To make their cases, both the plaintiffs and Harvard hired expert economists to analyze the school’s admissions data. The evidence summarized in the expert reports offers a valuable lesson for anyone hoping to prove discrimination with data and, indeed, for anyone trying to decide how trustworthy statistical findings of any kind are.

Start with a basic tool of social science. “Regression analysis” is beloved by social scientists of all persuasions. In theory, one can use it to detect discrimination in real-world contexts. A regression model relates some outcome, such as whether a person is hired, to the various factors that enter the employer’s hiring decision.

Suppose a business hires workers based mainly on two factors: their score on an occupational test and their years of experience in similar jobs. If this is indeed how the employer makes hiring decisions, then data on past hiring decisions would make it easy to predict whether a new applicant will be hired. A regression analysis would tell us that each additional point on the test increases the chance of being hired by X percentage points, and that an extra year of experience increases it by Y percentage points. That model can then be used to predict whether a new candidate will end up hired.
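To make the idea concrete, here is a minimal Python sketch of how such a model turns into a prediction once it has been estimated. It assumes a linear probability model, and the coefficients are hypothetical stand-ins for X and Y, not estimates from any real employer.

```python
# Hypothetical coefficients from a linear probability model fitted to past hiring.
# Illustrative values only; they are not estimates from any real data set.
INTERCEPT = 0.10         # baseline probability of being hired
PER_TEST_POINT = 0.005   # "X": each extra test point adds 0.5 percentage points
PER_YEAR = 0.02          # "Y": each extra year of experience adds 2 percentage points


def predicted_hire_probability(test_score: float, years_experience: float) -> float:
    """Predicted probability of being hired, clipped to the [0, 1] range."""
    p = INTERCEPT + PER_TEST_POINT * test_score + PER_YEAR * years_experience
    return min(max(p, 0.0), 1.0)


# A new applicant with a test score of 70 and three years of experience.
print(f"{predicted_hire_probability(70, 3):.2f}")  # 0.51
```

The clipping step reflects a known weakness of linear probability models: the straight-line formula can produce values below 0 or above 1, which logit or probit models avoid.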

If the business is accused of racial discrimination, a variable for the applicant’s race can be added to the model. This will show whether the historical data indicate that, say, a black applicant has lower chances of getting hired even if he has the same test score and experience as a white applicant. In other words, the regression model will enable researchers to estimate the difference in the probability of hiring between black and white applicants who are otherwise identical. Since test scores and experience are included in the regression model, they’re “held constant” in calculating the effect of being black. When someone says they “controlled for” a variable in a regression model, this is what they mean.
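The same logic can be checked on made-up data. The Python sketch below, using only the standard library, builds a hypothetical applicant pool in which white applicants score higher on the test on average and the employer also applies a direct 5-percentage-point penalty to black applicants. To keep the arithmetic exact, each applicant’s outcome is the expected hiring rate from that assumed rule rather than a simulated hire-or-reject outcome. The raw comparison of group averages blends the penalty with the test-score difference; the regression that controls for test score and experience isolates the penalty.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def hire_probability(test, years, black):
    """Hypothetical 'true' hiring rule, including a 5-point penalty for black applicants."""
    return 0.10 + 0.005 * test + 0.02 * years - 0.05 * black


# Hypothetical pool: white applicants score 10 points higher on average.
applicants = [
    (test, years, black)
    for black, scores in ((0, (50, 60, 70, 80)), (1, (40, 50, 60, 70)))
    for test in scores
    for years in (0, 2, 4)
]
hire_rates = [hire_probability(t, e, b) for t, e, b in applicants]


def group_mean(is_black):
    rates = [p for p, (_, _, b) in zip(hire_rates, applicants) if b == is_black]
    return sum(rates) / len(rates)


raw_gap = group_mean(1) - group_mean(0)
_, coef_test, coef_years, coef_black = ols(
    [(1.0, t, e, b) for t, e, b in applicants], hire_rates
)

print(f"raw black-white gap:     {raw_gap:+.3f}")     # -0.100
print(f"gap holding test, years: {coef_black:+.3f}")  # -0.050
```

The raw gap of 10 points overstates the penalty because half of it comes from the test-score difference; holding test score and experience constant recovers the 5-point penalty built into the assumed rule.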

That’s a simple, straightforward example. In reality, hiring decisions involve countless variables, some of which may not even be recorded in the data set available to researchers. Arguments often break out over whether some variables are actually mechanisms of discrimination. These issues have plagued the study of discrimination from the beginning.

If our occupational test were specifically designed to screen out black applicants by including culturally unfamiliar material, for example, comparing black and white applicants who received the same test score would conceal the presence of discrimination. A regression model that “controls for” test scores would be comparing black applicants who perform poorly because the test is stacked against them with white applicants who perform poorly because they are not qualified. Because low-scoring applicants don’t get hired, that comparison makes it seem as if there is no racial gap in hiring. But that inference would be very misleading, because the test is itself the mechanism through which discrimination happens. If the test were racially biased, a regression without the test-score variable would do a better job of measuring the extent of anti-black bias in the hiring process.
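The trap can be reproduced with made-up numbers. In this hypothetical Python sketch (standard library only), black and white applicants are equally qualified, the test docks black applicants 10 points, and the employer hires solely on the test score. Outcomes are the expected hiring rates implied by that assumed rule. Controlling for the biased test makes the race coefficient vanish, while leaving the test out recovers the full 10-percentage-point gap the test creates.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical pool: identical qualifications in both groups, but the biased
# test subtracts 10 points from black applicants' scores.
applicants = [
    (qualification - 10 * black, black)  # (test score, black indicator)
    for black in (0, 1)
    for qualification in (40, 50, 60, 70, 80)
]
hire_rates = [0.01 * test for test, _ in applicants]  # hiring depends only on the test

# Controlling for the biased test: the black coefficient is (numerically) zero.
gap_with_test = ols([(1.0, test, b) for test, b in applicants], hire_rates)[2]
# Omitting the test: the black coefficient equals the full hiring gap.
gap_without_test = ols([(1.0, b) for _, b in applicants], hire_rates)[1]

print(f"black coefficient, controlling for test: {gap_with_test:+.3f}")
print(f"black coefficient, test omitted:         {gap_without_test:+.3f}")
```

The same logic applies to any control variable that is itself a channel for bias: adding it to the model absorbs the bias rather than revealing it.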

The competing expert reports in the Harvard case, filed in 2018, illustrate similar difficulties in proving that discrimination exists and, more generally, in analyzing complicated data collected from real-world situations. The dueling reports show how what seem like similar statistical approaches on the surface can produce alarmingly different results in the hands of different researchers.

A general lesson can also be drawn from these expert reports. Almost always, the evidence depends heavily on the assumptions that researchers make to analyze the available data. Different assumptions, as we will see, can lead the data to “say” radically different things. The assumptions often reflect a researcher’s perception of what is proper to do in a certain context. Inevitably, different researchers have different opinions of exactly how a particular question should be framed or addressed. And these opinions play a role in determining the answer to the specific question that the researcher is examining.

To complicate matters, in the case of researchers preparing litigation reports, it makes sense to suspect that the assumptions made to analyze (or manipulate) the data may be influenced by the testimony that the researcher is expected to give. When the two sides in a case disagree on what counts as the “truth,” that disagreement determines which experts a particular side will rely on. Whom they trust will depend on the assumptions the experts have used in related previous research or previous cases. And each side must then decide how to frame and rationalize those assumptions in a way that makes it easy to sell their particular “truth” to the judge and the jury.

In the Harvard case, the opponents of affirmative action chose Duke economist Peter Arcidiacono, while the university chose Berkeley economist David Card. Their work spans two expert reports, two rebuttal reports, and a total of well over 500 pages. But one table, mostly drawn from Card’s rebuttal, speaks volumes, clarifying what’s really going on. (The underlying methodological debate is highly consequential, but also mundane.)

The table, presented below, gives a stark illustration of how a series of subtle changes to a statistical model can radically change the result. The appearance is of a magic trick in which a finding of anti-Asian bias disappears—or, read in reverse, in which the absence of any sign of bias whatsoever transforms into an alarming racial disparity.

Gap in probability of admission between Asian-Americans and whites, by model (percentage points)

Description of model | Gap
A. Actual gap (includes the ALDC applicants*) | -2.10
B. Arcidiacono’s preferred model: excludes ALDC applicants and controls for academic and socioeconomic background | -1.02
C. Beginning with Arcidiacono’s preferred model, Card adds the ALDC applicants back in (plus a minor technical tweak of estimating the model separately each year) | -0.79
D. Card adds the applicant’s personal rating | -0.38
E. Card adds parental occupation | -0.19

Statistical significance: the gaps in Rows B and D are statistically significant; the gap in Row E is not (see text).
* ALDC applicants refers to a group that gets preferential treatment in admissions, composed of recruited athletes, legacies, applicants on the dean’s interest list, and children of faculty and staff.
Source: Row A, Arcidiacono Expert Report, Table 2.1; all other rows, Card Rebuttal Report, Exhibit 13.
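A few lines of Python make the table’s arithmetic explicit, showing how far each modeling change moves the estimate. The numbers are copied from the table; the script does nothing beyond subtraction and division.

```python
# Asian-American minus white admission-probability gaps (percentage points),
# copied from rows A through E of the table above.
ROWS = [
    ("A. Actual gap", -2.10),
    ("B. Arcidiacono's preferred model", -1.02),
    ("C. Card adds ALDC applicants back", -0.79),
    ("D. Card adds personal rating", -0.38),
    ("E. Card adds parental occupation", -0.19),
]


def step_changes(rows):
    """For each step: (label, gap, change from previous row, share of previous gap kept)."""
    steps = []
    for (_, prev_gap), (label, gap) in zip(rows, rows[1:]):
        steps.append((label, gap, gap - prev_gap, gap / prev_gap))
    return steps


for label, gap, change, kept in step_changes(ROWS):
    print(f"{label:36} {gap:+.2f}  change {change:+.2f}  keeps {kept:.0%} of prior gap")
```

Run as written, it shows that adding the personal rating (Row D) and then parental occupation (Row E) each roughly halves the remaining gap.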

To begin with, Row A of the table reports the “raw” data, unaffected by any researcher input. Harvard provided both sides in the case with data that contained detailed information for each applicant for undergraduate admission for the classes of 2014 through 2019, including academic and personal information, test scores (such as the SAT), and whether the applicant was accepted or rejected. Harvard’s data indicate that white applicants had a probability of admission of 8 percent, and that applicants of Asian-American ethnicity had a 5.9 percent probability of admission, resulting in an “admissions gap” of 2.1 percentage points.

This kind of “raw” gap, however, may say almost nothing about whether Harvard discriminated against Asian-American applicants. Why? Consider the voluminous literature that economists have developed for measuring labor-market discrimination. It is certainly the case that, on average, women get paid less than men and blacks get paid less than whites. But it’s not necessarily clear that these wage gaps reflect discrimination. In the case of the female–male wage gap, for example, it would seem important to control for the possibility that women may be more likely to work part time or in different types of jobs before concluding anything about gender discrimination. Similarly, with the black–white wage gap, it would seem important to control for differences in the amount of schooling that the two groups have. The purpose of these adjustments, widely used in discrimination studies, is to estimate the wage gap among workers who are as similar as possible: those with equal levels of schooling, who work in the same types of jobs, and so on. Only then would measured discrimination reflect “unequal pay for equal work.”

In the Harvard context, then, a raw gap in the probability of admission among different groups, or even a lack of such a gap, tells us little. No one expects Harvard to admit students by a random lottery. Legitimate criteria, such as academic qualifications, doubtless explain much of the reason that some students are admitted and others not. It would not be shocking if these academic qualifications differ, on average, across groups, accounting for some of the group differences in admission rates.

Arcidiacono and Card agree that one must control for these legitimate criteria. Their expert reports are essentially empirical exercises in looking for disparate treatment of Asian and white students with similar observable qualifications. But they disagree about which differences between the groups it is legitimate to control for, and how exactly to account for those differences statistically. The different approaches lead to dramatically different findings.

Row B in the table shows the “adjusted” admissions gap in the model that Arcidiacono prefers. The acronym “ALDC” refers to recruited athletes, legacies, applicants on the dean’s interest list, and children of faculty and staff: in other words, applicants likely to get preferential treatment. In his favored model, Arcidiacono restricts the data to non-ALDC applicants because he sees these “special recruiting categories” as, basically, a separate beast. In his view, Harvard pays such careful attention to these applicants, and values them so highly, that it is unlikely to discriminate racially when evaluating their applications. That leaves the task of achieving its desired racial balance to the rest of the applicant pool, which is where one should look for discrimination.

Using the non-ALDC sample, Arcidiacono then controls for differences in such factors as academic qualifications, SAT scores, gender, and socioeconomic background, including parental education and whether the applicant is a first-generation college student. These adjustments reduce the Asian admissions gap to 1 percentage point, a gap that is statistically significant (meaning that a gap this large would be very unlikely to arise by chance if there were truly no difference between the two groups). Arcidiacono’s preferred model thus indicates that Harvard’s admissions process is stacked against Asian-American applicants: in other words, that Asian-American applicants are less likely to be admitted than equally qualified white applicants.

Though a 1-percentage-point difference in the probability of admission due to being Asian-American seems numerically small, it is not trivial, given how low Harvard acceptance rates are to begin with. The probability of admission for non-ALDC Asian-Americans is 4 percent. A 1-percentage-point gap due to discrimination means that the admission rate for these students would have risen by 25 percent (from 4 to 5 percent) if Harvard had treated these Asian-American applicants the same as white applicants.
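The distinction between percentage points and percent is easy to blur. This short Python sketch uses only the figures quoted in the text: raw admission rates of 5.9 and 8 percent, and a 1-point gap measured against a 4 percent base rate.

```python
def gap_in_points(rate: float, reference_rate: float) -> float:
    """Absolute difference between two rates given in percent, in percentage points."""
    return rate - reference_rate


def relative_change_percent(base_rate: float, change_in_points: float) -> float:
    """A change of `change_in_points` expressed as a percent of `base_rate`."""
    return 100.0 * change_in_points / base_rate


# Raw admission rates (percent): Asian-American 5.9, white 8.0.
print(round(gap_in_points(5.9, 8.0), 2))  # -2.1
# Closing a 1-point gap from a 4 percent base rate.
print(relative_change_percent(4.0, 1.0))  # 25.0
```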

Card, by contrast, believes that the correct way to measure discrimination should include the ALDC applicants, and that the regression model should then simply add the ALDC status of an applicant as a control variable, reflecting the fact that those applicants have a built-in admission advantage. As Row C of the table shows, the addition of the ALDC applicants (and a minor technical tweak of allowing the impact of the set of control variables to differ across admission years) shrinks the Asian–white differential slightly, to about 0.8 points.

It is at this point that Card finds two variables that purportedly demolish the case that Harvard discriminates against Asian-American applicants. The Harvard admissions process requires that the officer reviewing the file rate the applicant (on a scale of one to four) on four dimensions: academic, extracurricular, athletic, and personal. In theory, the personal rating is a single number summarizing non-academic qualities that the admissions officer gleans from the paper trail in the applicant’s file—traits including personality, likability, courage, and kindness, as one court filing from the plaintiffs’ lawyers noted.

In Row D, Card controls for this rating. It turns out that Asian-American applicants tend to get worse personal ratings than white applicants. About 24 percent of white applicants get a 2 or better in that rating (on Harvard’s scale, lower numbers are better), but only about 19 percent of Asian-American applicants do. (Asian-American applicants do quite badly in the athletic rating but excel in both the academic and extracurricular ratings, surpassing the ratings given to white applicants.)

A regression model that controls for the personal rating compares Asian-American applicants with white applicants who received the same personal rating. Because Asian-American applicants tend to receive worse ratings, and worse ratings make admission less likely, it is not surprising that controlling for this single factor cuts the admissions gap roughly in half, to 0.4 percentage points. In other words, a large part of the admissions gap between Asians and whites can be traced directly to the fact that Harvard admissions officers give Asian applicants worse personal ratings.

Arcidiacono did not adjust for differences in the personal rating among applicants. The personal rating, he pointed out, may be a mechanism through which Harvard discriminates against Asians. In fact, Arcidiacono showed that Asians tend to get poor personal ratings from the admissions officers despite excelling on more objective measures, and despite the fact that alumni interviewers who met the applicants in person (as opposed to the Harvard admissions officers, who only read the file) scored “Asian-American applicants higher on the personal rating than African-American and Hispanic applicants and only slightly lower than white applicants.”

Card, though, argues that personal rating scores are an important part of the admissions process, and that Arcidiacono’s analysis can’t account for all the subjective factors that contribute to the personal rating score. For instance, the data that Harvard made available to Arcidiacono and Card did not contain the personal essay submitted with the application. It seems reasonable to suspect that the personal rating could partly reflect personal traits that jump out of an applicant’s essay.

This back-and-forth argument shows why it is so hard to determine if discrimination really exists. On the one hand, the personal rating may indeed be measuring an unobserved and unmeasurable aspect of the application that only the admissions officer is privy to. On the other, the rating gives admissions officers an awful lot of discretion to reward or penalize specific groups, such as scoring written personal essays in ways that further a different goal.

But even granting Card’s argument that the personal rating is a legitimate measure of the candidate’s quality, we are left with a gap of 0.4 percentage points, which is smaller but still statistically significant. The evidence would still suggest, therefore, that Asian-Americans, as a group, face a built-in discriminatory disadvantage in getting admitted to Harvard.

The next step in Card’s analysis (Row E) does away with statistical significance altogether. Card adds a single variable to the regression model: parental occupation. This variable halves the gap again, from 0.4 to 0.2 percentage points. More important, it makes the difference between Asian-Americans and whites no longer statistically significant. In other words, the margin of error around the 0.2 gap is so large that one cannot reject the proposition that the true admissions gap equals zero. Card’s model no longer finds any statistically valid evidence of discrimination against Asian-Americans. In fact, with some further adjustments to the model (which we ignore in order to focus on the important modeling choices), Card manages to estimate admissions gaps that are even closer to zero, ranging from a 0.05-percentage-point penalty to a 0.02-percentage-point bonus, all statistically indistinguishable from zero.
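The logic of “cannot reject zero” can be sketched with a normal-approximation confidence interval. The standard error in this Python example is hypothetical, chosen only for illustration; the actual standard errors in the expert reports are not reproduced here. The point is that the same margin of error can leave a 0.4-point gap significant and a 0.2-point gap not.

```python
Z_95 = 1.96  # two-sided 5 percent critical value of the standard normal distribution


def confidence_interval(estimate: float, std_error: float, z: float = Z_95) -> tuple[float, float]:
    """Approximate 95 percent confidence interval under a normal approximation."""
    return (estimate - z * std_error, estimate + z * std_error)


def is_significant(estimate: float, std_error: float, z: float = Z_95) -> bool:
    """True when the interval excludes zero, equivalent to |estimate / SE| > z."""
    low, high = confidence_interval(estimate, std_error, z)
    return low > 0 or high < 0


HYPOTHETICAL_SE = 0.15  # illustrative only; NOT taken from either expert report

for label, gap in (("Row D", -0.38), ("Row E", -0.19)):
    low, high = confidence_interval(gap, HYPOTHETICAL_SE)
    print(f"{label}: {gap:+.2f} pp, 95% CI [{low:+.2f}, {high:+.2f}], "
          f"significant: {is_significant(gap, HYPOTHETICAL_SE)}")
```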

As with the personal-rating score, the question becomes: Should an analysis that purports to document the extent of discrimination in Harvard admissions adjust for differences in the occupation of the applicants’ parents? Arcidiacono’s preferred model already controls for such socioeconomic background variables as parental education. Moreover, he documents that the parental occupation data seem unreliable and contain odd fluctuations from year to year. For instance, 1,097 applicants in the 2014 admission class had fathers who worked in low-skill occupations, but strangely that number fell to only 37 by the time of the 2015 class. Similarly, the 2014 data report exactly zero fathers who were self-employed, but that number jumps to 2,134 fathers in the 2015 data.
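Anomalies like these are the kind of thing a simple year-over-year screen can surface. The Python sketch below flags any category whose count changes by more than a chosen factor between consecutive admission classes. The threshold of 3 is an arbitrary assumption, and the counts are the two examples cited in the text.

```python
def flag_jumps(counts_by_class: dict[int, int], max_ratio: float = 3.0) -> list[tuple[int, int, int, int]]:
    """Return (class, next_class, count, next_count) for implausibly large swings.

    A swing is flagged when one count is zero and the other is not, or when the
    larger count exceeds the smaller by more than `max_ratio` times.
    """
    classes = sorted(counts_by_class)
    flagged = []
    for c0, c1 in zip(classes, classes[1:]):
        a, b = counts_by_class[c0], counts_by_class[c1]
        lo, hi = min(a, b), max(a, b)
        if (lo == 0 and hi > 0) or (lo > 0 and hi / lo > max_ratio):
            flagged.append((c0, c1, a, b))
    return flagged


# Counts of applicants' fathers by occupation category, as cited in the text.
low_skill_fathers = {2014: 1097, 2015: 37}
self_employed_fathers = {2014: 0, 2015: 2134}

print(flag_jumps(low_skill_fathers))      # [(2014, 2015, 1097, 37)]
print(flag_jumps(self_employed_fathers))  # [(2014, 2015, 0, 2134)]
```

A screen like this cannot say which year is wrong, only that the series deserves scrutiny before it enters a regression.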

Card counters that “of the available variables that reflect socioeconomic status in Harvard’s data, [parental occupation] contains the most detailed information.” Without having access to the actual data, it is hard to infer why the parental occupation variable matters so much or why it seems to be so error-prone.

Nevertheless, both sides can question these research choices. To the extent that family socioeconomic status matters in Harvard’s admission policies (and one would think that it does because Harvard makes an effort to recruit applicants from families with lower socioeconomic status), the parental occupation variable would seem an obvious candidate for inclusion in the regression model, in addition to the other background variables in Arcidiacono’s preferred model. At the same time, the measure of parental occupation provided by Harvard to the experts is so volatile that it seems obvious something is wrong with it. Wouldn’t it be best, then, simply to leave it out of the analysis?

The preparation of an empirical research report in social science is analogous to a trek down a road with many forks. The choice made at each juncture—should the calculations include this subsample of persons? Should the analysis control for this variable?—can play a role in determining the answer to the research question. Some of the choices may not matter much in the end, but other choices are equivalent to ensuring that the research project reaches a specific, predetermined endpoint. Far too often, these choices are described in dense footnotes that only the aficionado can understand or appreciate, leaving the typical reader wondering why different studies that analyze exactly the same question and use exactly the same data reach such different conclusions.
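The forks multiply quickly. The Python sketch below simply enumerates the specifications produced by the three contested choices in the Harvard dispute (whether to include ALDC applicants, control for the personal rating, and control for parental occupation). It deliberately ignores the many other choices, such as Card’s year-by-year estimation, and it estimates nothing; it only counts the roads.

```python
from itertools import product

# The three contested modeling choices discussed in the Harvard reports.
CHOICES = {
    "include_aldc_applicants": (False, True),
    "control_personal_rating": (False, True),
    "control_parental_occupation": (False, True),
}


def specifications(choices: dict[str, tuple[bool, ...]]) -> list[dict[str, bool]]:
    """Every combination of the listed choices, one dict per candidate model."""
    names = list(choices)
    return [dict(zip(names, combo)) for combo in product(*(choices[n] for n in names))]


specs = specifications(CHOICES)
print(len(specs))  # 8

# Arcidiacono's preferred model (Row B) takes the first option at every fork;
# Card's Row E (apart from the year-by-year tweak) takes the second at every fork.
row_b = {name: False for name in CHOICES}
row_e = {name: True for name in CHOICES}
print(row_b in specs, row_e in specs)  # True True
```

With only three yes-or-no forks there are already eight models, and Rows B through E of the table trace a single path through four of them.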

The Harvard case provides a classic example of how much researchers’ choices matter in social science. Our goal here isn’t to say which side is right, but to highlight the many subjective judgments a researcher must exercise when analyzing data in a high-stakes discrimination lawsuit.

How would you measure bias in this case? Would you include the personal ratings? Focus only on non-ALDCs? Control for differences in parental occupation even if the data are flawed? The Harvard case may be over for now. But judges and juries will almost surely be asking themselves such questions more often in the future, as the legal system processes what is likely to be an avalanche of cases alleging that colleges continue to discriminate based on race, despite the Supreme Court’s prohibition.

City Journal is a publication of the Manhattan Institute for Policy Research (MI), a leading free-market think tank.