Six years ago, Allegheny County, Pennsylvania, launched a program that used predictive analytics to help child-welfare workers sort through the thousands of calls they receive each year about abuse or neglect. These calls can range from a person reporting that a parent left a nine-year-old child in a car in front of the dry cleaners to a neighbor worrying that a parent’s drug use may be interfering with his or her ability to care properly for an infant. The program’s predictive algorithm then uses information that authorities already have about the family—for example, whether a child has chronically missed school, whether someone recently released from prison is living in the home, whether the family has been using its allotment of food assistance—to assign a risk score to the case. That score helps child-welfare workers engage in a kind of triage, determining which families need to be investigated most urgently.
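To make the triage idea concrete, here is a minimal sketch of how a screening tool of this kind might combine administrative records into a single score. The feature names, weights, and scoring formula are purely illustrative assumptions, not the actual AFST model; the only detail drawn from the real tool is that it reports risk on a 1-to-20 scale.

```python
# Illustrative sketch of risk-score triage, NOT the actual AFST model.
# Feature names and weights below are hypothetical.

def risk_score(record, weights):
    """Sum the weights of the risk indicators present in a family's
    record, then scale the total to a 1-20 score (the AFST's range)."""
    raw = sum(w for feature, w in weights.items() if record.get(feature))
    max_raw = sum(weights.values())
    return 1 + round(19 * raw / max_raw)

# Hypothetical weights a model might learn from historical case outcomes.
WEIGHTS = {
    "chronic_school_absence": 2.0,
    "recent_prison_release_in_home": 3.0,
    "food_assistance_exhausted": 1.0,
    "prior_referrals": 4.0,
}

# A hypothetical family record assembled from administrative data.
family = {
    "chronic_school_absence": True,
    "recent_prison_release_in_home": False,
    "food_assistance_exhausted": True,
    "prior_referrals": True,
}

print(risk_score(family, WEIGHTS))  # prints 14
```

In practice, a call screener would see the score alongside the referral itself and decide whether to open an investigation; the score ranks urgency rather than replacing human judgment.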
A recent Associated Press article suggests, however, that this program—the Allegheny Family Screening Tool (AFST)—is reinforcing and even worsening racial disparities in the child-welfare system. Because the data used by such algorithms come from a racist system, the argument goes, the results will also be racist. The article implies that other localities considering adopting such tools, including New York City, should think twice.
The AP’s report misrepresents the program, the data, and the research. The article, for instance, never discusses possible reasons for racial disparities other than bias—such as family structure, which differs dramatically by race and affects the likelihood of child maltreatment.
The AP reporters note that they got exclusive access to data from researchers at Carnegie Mellon, whose work shows that the racial disparities in child-welfare enforcement are getting worse. In a presentation called “How Child Welfare Workers Reduce Racial Disparities in Algorithmic Decisions,” Carnegie Mellon researchers Logan Stapleton and Hao-Fei Cheng compare decisions made solely according to artificial intelligence against those made with the input of both artificial intelligence and call screeners. It’s a strange comparison to make, since child-welfare officials never use the AFST without human input. Nevertheless, the authors conclude that the AFST, when used alone, increases racial disparities.
The evidence suggests, though, that the AFST, as it is actually used, is improving outcomes for children—and doing so, moreover, without increasing racial bias in child-welfare investigations. An April 2019 impact evaluation by the Allegheny County Office of Children, Youth and Families found that “Use of the tool led to an increase in the screening-in of children who were subsequently determined to need further intervention or supports. Specifically, there was a statistically significant increase in the proportion of children screened-in whose child-welfare case was then opened or, if no case was opened, were rereferred within 60 days.” In other words, the kids investigated with the new tool were more likely to be ones who really needed help. Moreover, no increase in referrals took place for the kids screened out with the tool. Predictive analytics helps screeners identify the children most in need of help.
According to the impact study, the AFST not only helped identify these kids but also “led to reductions in disparities of case-opening rates between black and white children.” The county shared with the AP subsequent research confirming these findings, but the AP chose to ignore these conclusions. The AP also chose to ignore the conclusion of the Carnegie Mellon researchers that “there was no noticeable change in the screen-in rate” after the screening tool was implemented.
It’s instructive that the Carnegie Mellon researchers originally measured the AFST on two different scales: one for racial disparities and the other for “accuracy.” As the researchers explain, “What’s interesting here is if you look at the AFST and then versus the workers though, AFST actually made more accurate decisions. And this is consistent with prior work.”
The Carnegie Mellon researchers’ work raises an interesting question about algorithms: Why would we want to judge their success on any metric besides accuracy? Isn’t it the goal of predictive analytics to give us a better sense, in this case, of which kids are most at risk, rather than which decisions will make people feel better about racial outcomes? This isn’t to argue that we should make decisions based on algorithms alone. Even those who use predictive analytics to measure the performance of, say, baseball players might also care whether a particular player gets into fights with his teammates or is beating his girlfriend.
The logic implicit in the AP’s reporting is not dissimilar to that underlying the controversy over the use of the SAT in college admissions. We can criticize the test for producing racially disparate outcomes, but no serious scholar argues that it is bad at predicting the performance of college freshmen. If you want to say that you don’t care what the algorithm says—that freshman performance is not as important as racial diversity—fine, but that’s not what predictive analytics are for. And that goes double for situations concerning the well-being of children.