Christopher F. Rufo

Claudine Gay’s Data Problem

Interview: Danish data scientist Jonatan Pallesen discusses the serious questions that remain about the former Harvard president’s scholarship.

The scandals at Harvard have revealed an ugly truth about America’s elite institutions: they have been unwilling, or unable, to self-correct. Consequently, it has been up to outsiders to critique Harvard from beyond its walls. In recent weeks, Christopher Brunet, the Free Beacon’s Aaron Sibarium, and I published exposés on Claudine Gay’s plagiarism, while investor Bill Ackman and Congresswoman Elise Stefanik raised questions about Gay’s failure to address anti-Semitism and to uphold basic standards.

Another one of these outsiders is Jonatan Pallesen, a Danish data scientist who has raised new questions about Claudine Gay’s use—and potential misuse—of data in her Ph.D. thesis. Pallesen has a Ph.D. in statistical genomics from Aarhus University, served as a visiting researcher at University of California, Berkeley, and is now a lead data scientist for Dansk Industri, the Danish trade association.

I spoke with Pallesen about these anomalies and how they might further compromise Gay’s scholarly record. Though she has resigned from the presidency, Gay has retained her Harvard professorship, at a reported salary of $900,000 per year, which, in light of the compounding violations of academic integrity, should be scrutinized. Would another scholar at Harvard with similar problems be retained at all? Or does Gay’s intersectional status still grant her special protections?

This interview has been lightly edited for length and clarity.

Christopher Rufo: Tell me about the problems you’ve discovered with Gay’s Ph.D. thesis, and with her related 2001 paper, “The Effect of Black Congressional Representation on Political Participation.”

Jonatan Pallesen: The thesis and the paper claim to find that the election of black representatives causes a reduced white voter turnout. But what they show is only a correlation, not a causal relationship.

To better understand this issue, consider a simple example: the relationship between the presence of black representatives and factors such as average income, the proportion of renters, and black population density. There is also a correlation here, but it would be incorrect to conclude that electing black representatives directly causes higher population density. Instead, it is more likely that areas with a higher black population density have a greater tendency to elect black representatives.

The paper is like this, just with one extra step. In step one, Gay employs a method known as “ecological inference” to estimate white voter turnout. This estimation relies on data such as the total votes cast per precinct, as well as information about the precinct, including average income and the other previously mentioned factors. In step two, a regression shows a correlation between this estimate and black representatives. The paper concludes that black representation has a causal effect on white voter turnout, based on this correlation.

But this has the same basic problem as the simple example. Factors like black population density are likely to influence the tendency to elect black representatives. And since they, by construction, also influence the estimate of the white turnout, this leads to a correlation in the data—without any causal effect from the election of black representatives.

This is very basic. For many people who work with data, such considerations about possible alternative hypotheses are the first thing we think about. But for some reason it was not considered in the paper, which means that the conclusion it makes about causality is invalid.
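The confounding argument above can be illustrated with a small simulation. This is a minimal sketch under invented assumptions, not a reconstruction of Gay's data or model: the variable names (`density`, `has_black_rep`, `est_turnout`) and every coefficient are hypothetical. A single background factor drives both the chance of electing a black representative and the estimated white turnout, while the representative has no effect on turnout at all.

```python
import random

def simulate(n=20000, seed=0):
    """Generate hypothetical districts where one confounder drives both variables."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        density = rng.random()                  # hypothetical confounder in [0, 1]
        has_black_rep = rng.random() < density  # more likely where density is high
        # Estimated turnout depends on density only, never on has_black_rep.
        est_turnout = 0.6 - 0.2 * density + rng.gauss(0, 0.05)
        rows.append((density, has_black_rep, est_turnout))
    return rows

def mean(values):
    return sum(values) / len(values)

def turnout_gap(rows):
    """Mean estimated turnout with a black representative minus without."""
    with_rep = [t for _, rep, t in rows if rep]
    without_rep = [t for _, rep, t in rows if not rep]
    return mean(with_rep) - mean(without_rep)

rows = simulate()
raw_gap = turnout_gap(rows)

# Hold the confounder roughly fixed by looking only at a narrow band of density.
band = [row for row in rows if 0.45 <= row[0] < 0.55]
band_gap = turnout_gap(band)

print(f"gap across all districts:      {raw_gap:+.4f}")
print(f"gap within one density band:   {band_gap:+.4f}")
```

In this toy setup the gap across all districts is clearly negative, yet it largely disappears once districts with similar density are compared, even though no causal effect was built in.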

Rufo: Christopher Brunet has reported that Gay refused to provide her raw data to researchers who wanted to verify or attempt to replicate her work. Some have raised the question of whether she manipulated the data. What do you think?

Pallesen: One thing people have noticed is that the results for Missouri in the paper are highly significant, even though this was not the case in her Ph.D. thesis, which looked at the same data. In the methodology section, she describes testing a number of different models, and ultimately selecting model seven, in which the variables are dichotomized. This approach would give ample opportunity for “p-hacking,” choosing the model with the preferred results.

Another thing worth mentioning is that the data point of Illinois was not included in the 2001 paper, even though it was included in the Ph.D. thesis. This is especially notable because Illinois had two districts with results contradicting the hypothesis. The paper provides no explicit reasons for excluding Illinois, apart from a vague statement mentioning the criteria for selecting the eight states in the analysis, based on “geographic diversity and variation in district type.” It is obviously not good scientific practice to exclude data points that go against your hypothesis.

In any case, Gay should definitely make this data public. This should be standard scientific practice to begin with, and she has a responsibility to set an example. Additionally, in light of the fraud scandal involving Francesca Gino at Harvard Business School, there is an even greater urgency for an open data policy among researchers.
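The "p-hacking" concern raised in this answer can also be sketched with simulated data. The example below is a hypothetical illustration, not an analysis of the thesis or the 2001 paper: the outcome is pure noise, the predictor is dichotomized at several arbitrary cutoffs (the cutoff values, sample size, and number of trials are all assumptions), and the analyst either commits to one cutoff in advance or reports whichever cutoff gives the smallest p-value.

```python
import random
from statistics import NormalDist, mean, variance

def p_value_two_groups(a, b):
    """Two-sided p-value from a normal-approximation (z) test of equal means."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def p_values_over_cutoffs(rng, n=200, cutoffs=(0.3, 0.4, 0.5, 0.6, 0.7)):
    """Dichotomize a predictor at each cutoff and test an unrelated outcome."""
    x = [rng.random() for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # y is independent of x by design
    p_values = []
    for cutoff in cutoffs:
        high = [yi for xi, yi in zip(x, y) if xi >= cutoff]
        low = [yi for xi, yi in zip(x, y) if xi < cutoff]
        p_values.append(p_value_two_groups(high, low))
    return p_values

rng = random.Random(1)
trials = 2000
single_hits = 0  # "significant" results using one pre-specified cutoff (0.5)
best_hits = 0    # "significant" results when the best-looking cutoff is chosen

for _ in range(trials):
    ps = p_values_over_cutoffs(rng)
    single_hits += int(ps[2] < 0.05)
    best_hits += int(min(ps) < 0.05)

single_rate = single_hits / trials
best_rate = best_hits / trials
print(f"false-positive rate, one pre-specified model: {single_rate:.3f}")
print(f"false-positive rate, best of five models:     {best_rate:.3f}")
```

Because the outcome is unrelated to the predictor, every "significant" result here is a false positive. Choosing the best of several specifications after the fact can only raise that rate above what a single pre-specified model produces, which is why the selection procedure matters.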

Rufo: If Gay’s errors are so fundamental, how did they pass through the peer-review process? How did they earn her tenure at America’s most prestigious universities?

Pallesen: Peer review is simply not that good. There are often issues that are not caught. Even so, I am still surprised that something this egregious was not noticed. It is important that we see science as an ongoing process, and not as one that concludes with peer review.

Regarding how she earned her tenure, it is worth considering that her scientific output for tenure was thin, even before any problems had been pointed out with this paper or with plagiarism. It is well known that universities give preferential treatment to people based on their race and gender, instead of basing their selection process on merit.

It is also worth considering whether the plagiarism could be a symptom of more than sloppiness. Some scientists have wondered why she didn’t just write her own dry science prose. One possibility is that she may not fully comprehend the scientific nuances in the topics she’s writing about. In such cases, there might be a greater temptation to plagiarize, to ensure the avoidance of inaccuracies. It’s noteworthy that several of the plagiarized segments are found in sections involving statistical inferences.

Rufo: Why have American scholars been so hesitant to criticize Gay? What does this say about the state of academia as a whole?

Pallesen: There is a high level of bias within the scientific community, with a vast majority of researchers having left-leaning political views. Research that aligns with woke claims tends to find easier acceptance. It is quite extreme when you think about the edge cases. Claudine Gay does research that is simultaneously plagiarized, p-hacked, and based on an obviously flawed approach, and gets promoted to president of Harvard. Meanwhile, non-woke white male researchers, such as Bo Winegard, do meticulous research and are fired.

Raising these issues, which involve critiquing the research of others, can be uncomfortable. But when individuals are elevated to influential positions with far-reaching effects on society, such as the presidency of Harvard, the issues must be raised.

Christopher F. Rufo is a senior fellow at the Manhattan Institute, a contributing editor of City Journal, and the author of America’s Cultural Revolution.

Photo by Suzanne Kreiter/The Boston Globe via Getty Images
