Few figures in the medical world generate more controversy than psychiatrist Jack Turban. An assistant professor of child and adolescent psychiatry at the University of California, San Francisco, Turban is one of the leading figures promoting “gender-affirming care” in the United States. He is also regularly criticized for producing deeply flawed research and denying the significant rollback of youth gender transition in Europe.
The American Civil Liberties Union recently retained Turban as an expert witness—paying him $400 per hour—in its legal challenge to Idaho’s Vulnerable Child Protection Act, which restricts access to “gender-affirming” drugs and surgeries to adults only. On October 16, Turban submitted to a seven-hour deposition at the hands of John Ramer, an attorney with the law firm Cooper & Kirk, who is assisting Idaho in the litigation. In the course of the deposition, Turban revealed that, aside from churning out subpar research and misleading the public about scientific findings, he also appears not to grasp basic principles of evidence-based medicine.
Evidence-based medicine (EBM) refers to “the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. . . . The practice of evidence-based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research.” Because the expert opinion of doctors, even when guided by clinical experience, is vulnerable to bias, EBM “de-emphasizes intuition, unsystematic clinical experience, and pathophysiologic rationale as sufficient grounds for clinical decision making and stresses the examination of evidence from clinical research.” EBM thus represents an effort to make the practice of medicine more scientific, with the expectation that this will lead to better patient outcomes.
Systematic reviews and meta-analyses sit at the top of the hierarchy of evidence in EBM. A key difference between the U.S. and European approaches to pediatric gender medicine is that European countries have changed their clinical guidelines in response to findings from systematic reviews. In the U.S., medical groups have either claimed that a systematic review “is not possible” (the World Professional Association for Transgender Health), relied on systematic reviews but only for narrowly defined health risks and not for benefits (the Endocrine Society), or used less scientifically rigorous “narrative reviews” (the American Academy of Pediatrics). One of the world’s leading experts on EBM has called U.S. medical groups’ treatment recommendations “untrustworthy.”
In the deposition, Ramer asked Turban to explain what systematic reviews are. “[A]ll a systematic review means,” Turban responded, “is that the authors of the reports pre-defined the search terms they used when conducting literature reviews in various databases.” The “primary advantage” of a systematic review, he emphasized, is to function as a sort of reading list for experts in a clinical field. “Generally, if you are in a specific field where you know most of the research papers, the thing that’s most interesting about systematic review is if it identifies a paper that you didn’t already know about.” Ramer showed Turban the EBM pyramid of evidence, which appears in the Cass Review (page 62) of the U.K.’s Gender Identity Development Service. He asked Turban why systematic reviews sit at the top of the pyramid. Turban responded: “Because you’re looking at all of the studies instead of looking at just one.”
Turban’s characterization represents a fundamental misunderstanding of what EBM is and why systematic reviews are the bedrock of trustworthy medical guidelines.
First, even if the only thing that makes a review systematic were that it “pre-defines the search terms,” Turban failed to explain why that matters. A major reason systematic reviews rank higher than narrative reviews in EBM’s information hierarchy is that systematic reviews follow a transparent, reproducible methodology. Anyone who applies the same methodology and search criteria to the same body of research should arrive at the same set of conclusions. Narrative reviews don’t use transparent, reproducible methodologies. Their conclusions are consequently more likely to be shaped by the personal biases of their authors, who may, for instance, cherry-pick studies.
To achieve transparency and reproducibility, systematic reviews define in advance the populations, interventions, comparisons, and outcomes of interest (PICO). They search for and filter the available literature, and report how they did so, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Authors register their methodology and search criteria in advance in databases such as PROSPERO. These steps are meant to minimize the risk that authors will change their methodology midway through the process in response to inconvenient findings.
Turban acknowledged that pre-defining the search terms “makes it a little bit easier for another researcher to repeat their search.” However, he did not seem to grasp that the additional steps introduced by systematic reviews are designed to reduce bias and improve accuracy. Turban, one should note, endorses the American Academy of Pediatrics’ 2018 narrative review—a document that, with its severe flaws, perfectly illustrates why EBM prefers systematic to narrative reviews.
Second, Turban is incorrect that the “primary advantage” of the systematic review is to generate a comprehensive reading list for (in this case) gender clinicians. Systematic reviews also assess the quality of evidence from existing studies. In other words, they avoid taking the reported findings of individual studies at face value. This is especially important in gender medicine because so much of the research in this field comes from authors who are professionally, financially, and intellectually invested in the continuation of gender medicine—in other words, who have conflicts of interest. Financial conflicts of interest are typically reported, but professional and intellectual conflicts rarely are. Conflicted researchers frequently exaggerate positive findings, underreport negative findings, use causal language where the data don’t support it, and refrain altogether from studying harms. In short, assessing the quality of evidence is especially important in a field known for its lack of equipoise and scientific rigor.
In EBM, quality of evidence is a technical term that refers to the degree of certainty in the estimate of the effects of a given intervention. The higher the quality, the more confident we can be that a particular intervention is what causes an observed effect. It was only in response to Ramer’s prodding that Turban addressed “the risk of bias associated with primary studies”—namely, one of the key considerations for assessing quality of evidence.
During the deposition, Ramer read Turban excerpts from Users’ Guides to the Medical Literature, a highly regarded textbook of EBM published by the American Medical Association. Ramer asked Turban to explain what the Users’ Guides means when it says that narrative reviews, unlike systematic reviews, “do not include systematic assessments of the risk of bias associated with primary studies and do not provide quantitative best estimates or rate the confidence in these estimates.” Turban responded that systematic reviews do sometimes assess the quality of evidence, but that this is not a necessary condition for a review to be called systematic.
I asked Gordon Guyatt, professor of health research methods, evidence, and impact at McMaster University, what he thought of Turban’s answer. Guyatt is widely regarded as a founder of the field of EBM and is the primary author of Users’ Guides. “The primary advantage of a systematic review,” Guyatt assured me, “is not only not missing studies, but also assessing quality of the evidence. Anybody who doesn’t recognize that a crucial part of a systematic review is judging the quality or certainty of the evidence does not understand what it’s all about.”
Ramer asked Turban to explain the GRADE method (Grading of Recommendations Assessment, Development and Evaluations), a standardized EBM framework for evaluating quality. “GRADE generally involves looking at the research literature,” Turban explained. “And then there’s some subjectivity to it, but they provide you with general guidelines about how you would—like, great level of confidence in the research itself. Then there’s a—and then each of those get GRADE scores. I think it’s something like low, very low, high, very high. I could be wrong about the exact names of the categories.” Turban is indeed wrong: the categories are high, moderate, low, and very low. It’s surprising that someone involved in the debate over gender-medicine research for several years, and who understands that questions of GRADE and of quality are central, doesn’t know this by heart.
Ramer asked Turban what method, if any, he uses to assess quality in gender-medicine research. Turban explained that he reads the studies individually and does his own assessment of bias. GRADE is “subjective,” and this subjectivity, Turban said, is one reason that the U.K. systematic reviews rated studies that he commonly cites as “very low” quality. Turban’s thinking seems to be that, because GRADE is “subjective,” it is no better than a gender clinician sitting down with individual studies and deciding whether they are reliable.
I asked Guyatt to comment on Turban’s understanding of systematic reviews and GRADE. “Assessment of quality of evidence,” he told me, “is fundamental to a systematic review. In fact, we have more than once published that it is fundamental to EBM, and is clearly crucial to deciding the treatment recommendation, which is going to differ based on quality of evidence.” Guyatt said that “GRADE’s assessment of quality of the evidence is crucial to anybody’s assessment of quality of evidence. It provides a structured framework. To say that the subjective assessment of a clinician using no formal system is equivalent to the assessment of an expert clinical epidemiologist using a standardized system endorsed by over 110 organizations worldwide shows no respect for, or understanding of, science.”
At one point, Ramer pressed Turban to explain his views on psychotherapy as an alternative to drugs and surgeries. Systematic reviews have rated the studies Turban relies on for his support of puberty blockers and cross-sex hormones “very low” quality in part because these studies are confounded by psychotherapy. Because the kids who were given drugs and improved were also given psychotherapy and the studies lack a proper control group, it is not possible to know which of these interventions caused the improvement.
Turban seemed not to grasp the significance of this fact. If hormonal treatments can be said to cause improvement despite confounding psychotherapy, why can’t psychotherapy be said to cause improvement despite confounding drugs?
The exchange about confounding factors came up in the context of Ramer asking Turban about an article he wrote for Psychology Today. The article, aimed at a popular audience, purports to give an overview of the research that confirms the necessity of “gender-affirming care.” Last year, I published a detailed fact-check of the article, showing how Turban ignores confounding factors, among other problems. Four days later, Psychology Today made a series of corrections to Turban’s article. Some of these corrections were acknowledged in a note; others were done without any acknowledgement. In the deposition, Ramer asked Turban about my critique, to which Turban replied that he “left Psychology Today to do whatever edits they needed to do,” and that, when he later read the edits, he found them “generally reasonable.”
In sum, though Turban says that “there are no evidence-based psychotherapy protocols that effectively treat gender dysphoria itself,” the same studies he cites furnish just as much evidence for psychotherapy as they do for puberty blockers or cross-sex hormones—which is to say “very low” quality evidence.
Other remarkable moments occur in the Turban deposition. For instance, when asked whether he had read the Florida umbrella review (a systematic review of systematic reviews) conducted by EBM experts at McMaster University and published over a year ago, Turban said that he hadn’t because he “didn’t have time.” When I mentioned this confession to Guyatt, he seemed taken aback. How could a clinician who claims expertise in a contested area of medicine not be curious about a systematic review of systematic reviews? “If all systematic reviews come to the same conclusion,” Guyatt told me, “it clearly increases our confidence in that conclusion.” (My conversation with Guyatt dealt exclusively with Turban’s claims and how they stack up against EBM. I did not ask Guyatt about, and he did not opine on, the wisdom of state laws restricting access to “gender-affirming care.”)
I believe that Turban is being honest when he says he didn’t read the Florida umbrella review. He doesn’t seem interested in literature that might call his beliefs into question. He has staked his personal and professional reputation on a risky and invasive protocol before the appearance of any credible evidence of its superiority to less risky alternatives. Turban regularly maligns as bigoted and unscientific anyone who disagrees with him. Some gender clinicians in Europe now admit that the evidence is weak, the risks serious, and the protocol still experimental. Turban, however, would seemingly rather go down with the sinking ship than admit that he was too hasty in promoting “gender-affirming care.”
Put another way, Turban has intellectual, professional, and financial conflicts of interest that prejudice his judgment on how best to treat youth experiencing issues with their bodies or sex. European health authorities are aware of this problem; that’s why they chose to commission their evidence reviews from clinicians and researchers not directly involved in gender medicine. For instance, England’s National Health Service appointed physician Hilary Cass to chair the Policy Working Group that would lead the investigation of its Gender Identity Development Service and its systematic reviews. The NHS explained that there was “evident polarization among clinical professionals,” and Cass was “asked to chair the group as a senior clinician with no prior involvement or fixed views in this area.”
Unfortunately, in the U.S., personal investment in gender medicine is often seen as a benefit rather than a liability. James Cantor, a psychologist who testifies in lawsuits over state age restrictions, emphasizes the difference between the expertise of clinicians and that of scientists. The clinician’s expertise “regards applying general principles to the care of an individual patient and the unique features of that case.” The scientist’s expertise “is the reverse, accumulating information about many individual cases and identifying the generalizable principles that may be applied to all cases.” Cantor writes:
In legal matters, the most familiar situation pertains to whether a given clinician correctly employed relevant clinical standards. Often, it is other clinicians who practice in that field who will be best equipped to speak to that question. When it is the clinical standards that are themselves in question, however, it is the experts in the assessment of scientific studies who are the relevant experts.
The point is not that clinicians are never able to exercise scientific judgment. It’s that conflicts of interest for involved clinicians need to be acknowledged and taken seriously when “the clinical standards . . . are themselves in question.” Unfortunately, the American propensity for setting policy through the courts makes that task difficult. Judges intuitively believe that gender clinicians are the experts in gender medicine research. The result is a No True Scotsman argument wherein the more personally invested a clinician is (and the more conflict of interest he has as a result), the more credible he appears.
Last year, a federal judge in Alabama dismissed Cantor’s expert analysis of the research, citing, among other things, the fact that Cantor “had never treated a child or adolescent for gender dysphoria” and “had no personal experience monitoring patients receiving transitioning medications.” Turban’s deposition illustrates why this thinking is misguided. It is precisely gender clinicians who often seem to be least familiar, or at any rate least concerned, with subjecting their “expert” views to rigorous scientific scrutiny. It is precisely these clinicians who are most likely to be swimming in confirmation bias, least interested in the scientific method, and, conveniently, least concerned with evidence-based medicine.