Masks are back in San Diego, California, where the school board has just decreed that students must cover their faces or be barred from setting foot inside a classroom. Never mind that, per CDC statistics and Census Bureau population figures, more than 99.99 percent of children in California (where Governor Gavin Newsom has regularly imposed mask mandates) and more than 99.99 percent of children in Florida (where Governor Ron DeSantis has let kids live mask-free) have not died of Covid, either because they haven’t gotten it or because they’ve gotten it and survived it. Never mind that more than 99.99 percent of kids nationally have not died of Covid, either. And never mind that, again, based on CDC statistics, those over age 85 have had more than 2,000 times the chance of dying of Covid as those under age 18; that even those in their thirties have had 25 times the chance of dying of Covid as those under 18; and that, out of every 40 school-age kids (ages 5-17) who have died during the Covid era, only one of those deaths has involved Covid. Regardless, school officials have decided that everyone must mask up.

Nor are schools alone in returning to mask mandates. The military has been one of the most mask-happy of all institutions. Right on cue, the Navy announced that everyone, whether uniformed or not, must wear masks indoors on its bases in the San Diego area. Up the coast, Bay Area Rapid Transit has reimposed a mask mandate. Meantime, many colleges across the country have announced that they will be requiring masks this fall.

Such decrees ignore the facts that masks are physically uncomfortable, make it harder to breathe, and profoundly compromise human social interaction. But none of that matters to the mask zealots, who are convinced that benefits far outweigh any potential costs. So, where is the proof?

The nature of the public-health establishment’s embrace of masks is nicely captured in an article published last spring and currently posted on the website of the National Institutes of Health. The article, by Seán M. Muller, speaks of “the failure of randomized controlled trials (RCTs) to provide supportive evidence” that masks work to reduce viral transmission—a matter I discussed at length last summer.

Muller deserves credit for being more honest than most mask advocates. He notes that the World Health Organization said in March 2020 that “there is no evidence” that masks work, and he adds that “it was the absence of significant positive effects from RCTs prior to the pandemic that informed the WHO’s initial [anti-mask] stance.” Yet Muller laments the reliance on RCTs as opposed to “mechanism-based reasoning.” This is a fancy term for applying one’s own reasoning faculties. Muller’s reasoning leads him to be convinced that masks must work. But that, of course, is why we have RCTs: to test people’s notions about what works and what doesn’t.

Muller recognizes that people “may transfer infectious material by touching their faces with unsanitized hands to place and remove a mask,” but this important realization doesn’t seem to affect his conclusions. Instead, he writes, “Mechanism-based reasoning provides a justification for the stance ultimately advocated by the WHO and adopted by many countries.” He admits that the “logic” entailed in such reasoning “relies only on a fairly simple germ theory of disease.” Yet, incredibly, he then asserts that such reasoning “places the burden of proof on those who would argue against recommending masks.” In other words, even if RCTs provide no evidence that masks work, and even if they continually suggest the opposite, health officials should still recommend masks (and probably mandate them) because the claim that they work seems logical to some.

This is fundamentally anti-scientific. Yet it effectively captures the thinking that has animated mask mandates for more than two years now. This kind of thinking continues even though (as John Tierney has detailed) the remarkable similarity in Covid results between mask-mandate and mask-free states, and between mask-mandate and mask-free countries, strongly suggests that masks don’t work—just as RCTs have indicated they don’t.

The lone, slender scientific reed that mask advocates can grasp, at least in terms of RCTs, is a recent study from Bangladesh. Released well over a year after the CDC and others had already embraced masks wholeheartedly, the study claimed to find statistically significant benefits from surgical masks. The first author listed on that study, Yale economics professor Jason Abaluck, weighed in publicly on the mask debate before the study ever went into the field. In the early days of Covid, he opined that both the federal government and state governments should give out free masks and perhaps levy fines on those who refused to wear them. Unfortunately for mask advocates, the very small differences that the study found, and the questionable methodology on which those findings were based, provide little more scientific support for mask-wearing than does mechanism-based reasoning.

The Bangladesh RCT found that 1,086 people in the study’s mask group, and 1,106 people in the study’s non-mask control group, got Covid. Amazingly, these numbers did not come from the study’s authors—even though they provide the answer to the main question the study was addressing. Rather, Ben Recht, a professor of electrical engineering and computer science at the University of California, Berkeley, computed these numbers from those that the authors did release, and Abaluck subsequently confirmed Recht’s calculation of a 20-person difference between the two groups.

This 20-person difference (out of more than 300,000 participants) meant that about 1 out of 132 people got Covid in the control group, versus 1 out of 147 in the mask group. That equates to 0.76 percent of people in the control group and 0.68 percent of people in the mask group catching Covid—a difference of 0.08 percentage points—which the study’s authors prefer to describe as a 9 percent reduction. Abaluck and company also describe their study as having provided “clear evidence” that surgical masks work—even though those masks’ alleged benefit registered as statistically significant only after the researchers “adjusted” the ratio of how many people got Covid in each group by providing “baseline controls,” which they do not transparently describe. (That adjustment, however—and its necessity for achieving statistical significance—is plainly indicated.)
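The arithmetic above can be checked with a few lines of Python. This minimal sketch uses only the rounded “1 in N” figures quoted in the paragraph (not the study’s underlying data) and converts them into percentages, the absolute gap in percentage points, and the relative reduction. Because it works from raw, rounded ratios, its relative figure comes out near 10 percent rather than the authors’ 9 percent, which presumably reflects their adjusted estimate.

```python
# Rough check of the rates quoted in the text, using only the rounded
# "1 in N" figures given there (not the study's underlying data).
CONTROL_ONE_IN = 132  # about 1 in 132 got Covid in the control group
MASK_ONE_IN = 147     # about 1 in 147 got Covid in the mask group


def summarize(control_one_in: int, mask_one_in: int) -> dict:
    """Convert "1 in N" figures into percentages and their differences."""
    control_rate = 1 / control_one_in
    mask_rate = 1 / mask_one_in
    return {
        "control_pct": 100 * control_rate,
        "mask_pct": 100 * mask_rate,
        # Absolute gap, in percentage points.
        "abs_diff_pp": 100 * (control_rate - mask_rate),
        # Relative reduction, as a percentage of the control-group rate.
        "rel_reduction_pct": 100 * (1 - mask_rate / control_rate),
    }


if __name__ == "__main__":
    for name, value in summarize(CONTROL_ONE_IN, MASK_ONE_IN).items():
        print(f"{name}: {value:.2f}")
```

Run as a script, it prints each value to two decimal places, reproducing the 0.76 percent, 0.68 percent, and 0.08-point figures in the text, along with a raw relative reduction of about 10.2 percent.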

This reported difference of 0.08 percentage points tested as statistically significant only because of the massive sample size that the authors claimed. The larger the sample, the smaller a difference can be while still testing as significant rather than as plausibly due to random chance. It is not at all clear, however, that this study could really produce such precision.

Imagine if researchers randomly divided 340,000 individuals, regardless of where they lived, into a mask group (170,000 people) or a non-mask control group (the other 170,000). One would assume that this random division would result in the two groups being very similar. That’s part of the essence of an RCT: if you randomly assign enough people to one group or another, the two groups will end up being essentially alike simply by chance. It would be a very different thing, however, to assign two whole cities of 170,000 people each to the two groups, with every resident of a given city going into the same group. In that case, it wouldn’t be clear whether any potential differences in outcomes would be due to the intervention (in this case, masks) or to the differences between the cities (in rates of virus exposure, cultural norms, and so on).

The Bangladesh study’s approach falls somewhere between these two scenarios. Its researchers randomly assigned 300 villages to its mask group (in which it encouraged mask-wearing) and 300 villages with similar characteristics to its non-mask control group (in which it didn’t encourage mask-wearing). Every member of a given village was assigned to the same group. As a result, Recht writes, “Though the sample size looked enormous (340,000 individuals), the effective number of samples was only 600 because the treatment was applied to individual villages.”

However, the researchers didn’t analyze the findings at the level of villages. Instead, they did so as if they had randomly assigned 340,000 individuals to either the mask group or the control group. Recht writes that because “the individual outcomes are not independent” and “outcomes inside a village are correlated,” analyzing the study in this manner is “certainly wrong.” Put another way, when individuals are randomly assigned to one group or another in an RCT, one person’s outcome isn’t supposed to affect another’s—but this is hardly the case when analyzing the effects of a highly contagious virus among people living in the same village, all of whom were assigned to the same group. In layman’s terms, each roll of the dice should be independent and shouldn’t affect subsequent rolls. But in the Bangladesh study, each roll of the dice did affect subsequent rolls.
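One standard way to put numbers on the problem Recht describes is Kish’s design effect for cluster-randomized trials, DEFF = 1 + (m - 1) × ICC, where m is the average cluster size and ICC (the intracluster correlation) measures how alike outcomes are within a village. The effective sample size is the nominal sample size divided by DEFF. This is a textbook formula swapped in for illustration, not Recht’s own calculation, and the ICC values below are hypothetical rather than estimates from the Bangladesh data. The sketch shows the two extremes: with no within-village correlation, all 340,000 people count; with complete correlation, the effective sample size collapses to the 600 villages.

```python
# Kish-style design effect for a cluster-randomized trial.
# The participant and village counts are the round figures from the text;
# the ICC values are hypothetical, chosen only to show the range.
N_PEOPLE = 340_000
N_VILLAGES = 600


def effective_sample_size(n_people: int, n_clusters: int, icc: float) -> float:
    """Return the nominal sample size deflated by the design effect."""
    avg_cluster_size = n_people / n_clusters
    design_effect = 1 + (avg_cluster_size - 1) * icc
    return n_people / design_effect


if __name__ == "__main__":
    for icc in (0.0, 0.01, 0.05, 1.0):  # hypothetical intracluster correlations
        n_eff = effective_sample_size(N_PEOPLE, N_VILLAGES, icc)
        print(f"ICC={icc:<5} effective n ~ {n_eff:,.0f}")
```

Even a hypothetical ICC of 0.01 cuts the effective sample size by a factor of more than six, which is the sense in which treating villagers as independent overstates the study’s precision.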

Recht cites a previous RCT on masks (which I discussed in my 2021 essay) that adjusted for such correlation—that is, adjusted for the fact that one person’s outcome could influence another’s. Even though that earlier RCT randomly assigned families rather than villages to a particular group, it still assumed correlation and adjusted for it. The Bangladesh study, which had far greater correlation, assumed none. Adjusting for correlation, Recht found that the Bangladesh study showed no statistically significant benefits from masks.

The danger in pretending to have randomly assigned 340,000 individuals is that huge sample sizes—which suggest great accuracy—allow small differences to test as statistically significant, since there is less likelihood that they merely reflect random events. This is fine if a test is really that accurate, but not if it’s inflating its sample size by a factor of more than 500 (600 versus 340,000)—or even by a factor of five. Such a scenario risks producing “statistically significant” results that are really just a product of random chance. This is exactly what seems to have happened in the Bangladesh study.
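To see why the assumed sample size matters so much, consider an unadjusted two-proportion z-test applied to the rounded rates from the text (1 in 132 versus 1 in 147). This is a simplified stand-in, not the model the study’s authors used: it assumes two equal-sized arms of independent individuals, and the arm sizes are hypothetical. Holding the rates fixed, the z-statistic grows with the square root of the sample size, so the same 0.08-point gap clears the conventional 1.96 threshold at 170,000 people per arm but not at one-fifth of that.

```python
import math

# Rounded rates from the text; the arm sizes below are hypothetical and equal.
P_CONTROL = 1 / 132
P_MASK = 1 / 147


def two_proportion_z(p1: float, p2: float, n_per_arm: int) -> float:
    """Unadjusted pooled two-proportion z-statistic for equal-sized arms."""
    pooled = (p1 + p2) / 2  # pooled rate when both arms have the same size
    se = math.sqrt(pooled * (1 - pooled) * (2 / n_per_arm))
    return (p1 - p2) / se


if __name__ == "__main__":
    # Factor by which the nominal sample size overstates the effective one
    # (567 is roughly 340,000 / 600).
    for inflation in (1, 5, 567):
        n = 170_000 // inflation
        z = two_proportion_z(P_CONTROL, P_MASK, n)
        verdict = "significant" if abs(z) > 1.96 else "not significant"
        print(f"n per arm = {n:>7,}: z = {z:.2f} ({verdict} at the 5% level)")
```

A proper cluster-level analysis would model villages directly rather than simply shrinking n, so these numbers show the direction and rough scale of the effect, not a reanalysis of the study.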

The mainstream press heralded this study as confirming that surgical masks work and suggesting that cloth masks (which, overall, didn’t show a statistically significant benefit) should perhaps be shelved. But the study’s actual findings were more interesting. It found no statistically significant evidence that masks work for people under the age of 40. For people in their forties, however, it found statistically significant evidence that cloth masks work but no corresponding evidence to support the use of surgical masks. For people in their fifties (or older), it found statistically significant evidence that surgical masks work, but no corresponding evidence to support the use of cloth masks. Further complicating matters, the researchers distributed both red cloth masks and purple ones. Recht, citing data from the study that the authors didn’t include in their write-up or tables, writes that, based on the study’s method of analysis, “cloth purple masks did nothing, but the red masks ‘work.’” He adds, “Indeed, red masks were more effective than surgical masks!” When a study starts producing findings like these, its results start to look like random noise.

Moreover, since there were just 20 fewer Covid cases in the mask group than in the non-mask control group, most of the difference between the 0.68 percent Covid rate in the former and the 0.76 percent rate in the latter was because of differences in the sizes of what were supposed to be two equally sized groups. The researchers omitted from their analysis thousands of people—disproportionately from the control group—whom they didn’t successfully contact. The University of Pittsburgh’s Maria Chikina, Carnegie Mellon’s Wesley Pegden, and Recht found that the study’s “unblinded staff”—who knew which participants were assigned to which group—“approached” those in the mask group at a “significantly” higher rate than those in the control group. Indeed, Chikina, Pegden, and Recht write that the “main significant difference” that led to an “imbalance” between the two groups was “the behavior of the study staff.”
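The claim that group sizes, rather than case counts, drive most of the gap can be illustrated with figures already given: 1,086 cases in the mask group, 1,106 in the control group, and rates of about 1 in 147 and 1 in 132. Multiplying each case count by its “1 in N” figure gives a rough implied group size (rough because the ratios are rounded, so these are not the study’s exact denominators). The sketch then compares the relative difference in case counts with the relative difference in rates.

```python
# Figures quoted in the text. The "1 in N" ratios are rounded, so the
# implied group sizes are approximations, not the study's exact denominators.
MASK_CASES, MASK_ONE_IN = 1_086, 147
CONTROL_CASES, CONTROL_ONE_IN = 1_106, 132


def decompose() -> dict:
    """Return implied group sizes and the relative gaps in cases and in rates."""
    mask_n = MASK_CASES * MASK_ONE_IN           # implied mask-group size
    control_n = CONTROL_CASES * CONTROL_ONE_IN  # implied control-group size
    # Relative gap if only the case counts differed (equal-sized groups).
    case_gap_pct = 100 * (CONTROL_CASES - MASK_CASES) / CONTROL_CASES
    # Relative gap in the actual rates, which also reflects the group sizes.
    rate_gap_pct = 100 * (1 - (MASK_CASES / mask_n) / (CONTROL_CASES / control_n))
    return {
        "mask_n": mask_n,
        "control_n": control_n,
        "case_gap_pct": case_gap_pct,
        "rate_gap_pct": rate_gap_pct,
    }


if __name__ == "__main__":
    for name, value in decompose().items():
        print(f"{name}: {value:,.1f}")
```

On these rounded inputs, the 20-case difference amounts to under 2 percent of the control group’s cases, while the gap in rates is about 10 percent; the remainder comes from the implied mask group being larger (roughly 159,600 people versus 146,000).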

Under the “intention-to-treat” principle, everyone who was originally randomly assigned to either group should have been included in the analysis, whether or not the staff had contacted them. Eric McCoy, an M.D. at the University of California, Irvine, explains that intention-to-treat analysis “preserves the benefits of randomization, which cannot be assumed when using other methods of analysis.” Recht, agreeing with McCoy, writes, “For the medical statistics experts, the intention to treat principle says that the individuals who are unreachable or who refuse to be surveyed must be counted in the study. Omitting them invalidates the study.” Yet that’s exactly what the authors of the Bangladesh study did. When Chikina, Pegden, and Recht analyzed the study’s findings using intention-to-treat analysis, they found no statistically significant difference between the number of people who got Covid in the mask group and the number who got it in the control group.

Thus, in order to show a statistically significant benefit from masks, the Bangladesh study had both to depart from intention-to-treat analysis and to analyze 340,000 people as if each had been randomly assigned to a group individually, which they had not been. Doing just one or the other would have failed to produce a statistically significant result.

In addition, the study made no real secret that it was pro-mask, launching an all-out campaign to convince people in half of the villages to wear them. The researchers found that physical distancing was 21 percent greater in the mask villages than in the control villages, muddying efforts to distinguish between the effects of masks and distancing. The study also provided monetary incentives to some people. Since participants and staff both knew who was in which group, this opened up the possibility that some participants tailored their responses to please the researchers, which matters because only those who reported Covid-like symptoms got tested for antibodies. Finally, the study didn’t test how many people had Covid antibodies beforehand, even though its principal findings about masks were based on how many people had Covid antibodies afterward. This is like determining whether a family bought butter during their most recent grocery trip by seeing if there’s butter in the refrigerator.

To sum up, the Bangladesh study’s findings show tiny differences in how many people got Covid in the mask and (non-mask) control groups, and these tiny differences register as statistically significant only because of myriad questionable methodological choices. The study’s researchers conducted their analysis as if they had randomly divided 340,000 individuals into either the mask group or the control group, when in fact they had just randomly divided 600 villages. They also deviated from intention-to-treat analysis, without which they would not have shown statistical significance even on the basis of this inflated sample size. They adjusted the ratio of Covid cases between the mask and control groups by adding baseline controls that were not well-explained—without which surgical masks would not have tested as providing statistically significant benefits. And they based their primary findings on whether people had acquired Covid antibodies by the end of the study, without having tested whether they had already acquired them before the start of the study.

Nevertheless, the CDC favorably references this study and calls it “well-designed.” And even before the effort had been peer-reviewed or published as an official study, Abaluck proclaimed, “I think this should basically end any scientific debate about whether masks can be effective.”

Keep in mind that there are no real grounds for cherry-picking results from the Bangladesh study. If the study persuades people that masks work, then it should also persuade them that those in their forties should wear cloth masks (red ones, not purple!) and then switch to surgical masks once they turn 50. All those statistically significant findings resulted from the same abandonment of intention-to-treat analysis and the same determination to analyze 340,000 people as if they had been randomly assigned to a group on an individual basis, when instead they had been lumped in with the rest of their village. To put it in layman’s terms: garbage in, garbage out.

The best scientific evidence continues to suggest that masks don’t work. Meantime, the public-health establishment continues to ignore that evidence. Public-health officials also remain almost completely blind to masks’ profoundly adverse effects on human interaction and quality of life. Seeing others’ faces and showing one’s own are at the heart of human social life. In the words of the political philosopher Pierre Manent, “To present visibly one’s refusal to be seen is an ongoing aggression against human coexistence.”

To use the power of government to bar individuals from showing their faces to others is something even worse: an ongoing assault on human liberty. Indeed, as Manent writes, “The visibility of the face is one of the elementary conditions of sociability, of [the] mutual awareness that is prior to and conditions any declaration of rights.” And about the only thing worse than denying the rights of free men and women is going after their children.



City Journal is a publication of the Manhattan Institute for Policy Research (MI), a leading free-market think tank.
