Diego Martinez made a diving catch for the gently descending softball. At 48, he was getting old for such theatrics, but he enjoyed showing off for the small crowd of friends and family members. As he got up from the turf, Martinez felt a pain in his back. A pulled muscle, he thought. But when he sat on the bench, it got worse. Soon he was sweating and breathing hard. “You don’t look right,” his wife said.

By the time the ambulance arrived, Martinez was having trouble moving his legs or even speaking clearly. As they loaded him into the van, the EMTs assumed he was having a stroke, or perhaps a heart attack. Here’s where Martinez caught his first break. His city’s emergency medical service had recently rolled out an AI-assisted “smart ambulance” system that helped the EMTs improve their pre-hospital triage protocol. Data in the cloud-based system detailed what resources were available at each nearby hospital. Within moments, it recommended a trauma facility suited to Martinez’s condition and planned the fastest route. By the time the ambulance pulled into the ER bay, the medical team already had his ECG and other vital data downloaded. Within minutes, they were wheeling him to radiology.

The hospital’s radiologists had seen thousands of similar cases. But they wouldn’t be relying on their experience alone. For every type of CT, MRI, or other scan available, the hospital had an AI software package designed to search the images for subtle patterns that even the most seasoned radiologist might miss. Whatever was wrong with Diego Martinez, this combination of experienced physicians and the massive power of machine-learning AI gave him a fighting chance.

The scenario above is invented but based on technology already in use or arriving soon in most major hospitals. After decades of painstaking research and development, the U.S. health-care system is embracing artificial intelligence at a dizzying rate. Ten years ago, roughly 40 clinical AI systems had been cleared under the FDA’s pathway for AI-enabled medical devices. Today, more than 1,200 AI applications have received FDA clearance or approval, most designed to help doctors read radiological scans. Other systems predict which ER patients face an elevated risk of sepsis or other complications during their hospital stay. Machine-learning programs also help hospitals manage staffing levels, improve workflows, and track supplies. In addition, an uncounted number of nurses, doctors, and hospital managers rely on ChatGPT and other large language model (LLM) platforms to help keep records, compose e-mails, and handle administrative tasks. Some hospitals are even experimenting with AI chatbots that advise doctors on diagnoses and treatments.

In his book Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again (2019), cardiologist and digital-medicine expert Eric Topol described AI as “a beacon of hope” that could “automate mundane tasks, reduce human error, and provide support in clinical decision-making, thereby streamlining the entire process of patient care.” Many AI critics fear that the technology will depersonalize interactions now handled by humans. But, Topol argued, by freeing doctors from endless digital paperwork, AI could help them give more attention to patients. AI’s greatest promise, he wrote, “is the opportunity to restore the precious and time-honored connection and trust—the human touch—between patients and doctors.”

That was an ambitious agenda for a technology that was—then and now—regarded with trepidation by countless Americans, including many health-care workers. In the intervening years, Topol himself has often drawn attention to the risks and limitations of medical AI applications. But when I asked him whether he remains upbeat about AI’s potential, he replied that he is “more optimistic” today than in 2019.

The AI-driven transformation of health care is in its early stages, so precise estimates of its benefits are hard to come by. But researchers predict significant improvements in cancer screening, cardiac care, and other branches of medicine. In theory, more efficient diagnoses and treatments should also bring down costs. A 2023 study from McKinsey & Co. and the National Bureau of Economic Research estimated that widespread adoption of AI could lead to reductions of 5 percent to 10 percent in annual U.S. health-care spending.

Nonetheless, the benefits that AI could bring to American health care are far from assured. For one thing, the AI revolution writ large may be hitting a speed bump. AI pioneer Gary Marcus, for example, has long argued that extravagant predictions of AI dominance are overblown. The latest LLMs from Google, OpenAI, and other companies require ever-larger investments in computing infrastructure and electric power to yield ever-smaller gains in performance. In its August debut, OpenAI’s hotly anticipated GPT-5 “landed with a dull thud,” the tech press concluded. Even the newest LLMs show an alarming tendency to hallucinate false information, obsequiously endorse users’ paranoid theories, and slide into dark, edgelord rhetoric.

The American public also appears to be growing more pessimistic about AI: in a poll this year, only 30 percent of respondents said that they thought the technology would have a positive effect on society, while 40 percent expected a negative impact. Colorado and other states have passed or proposed laws aimed at preventing AI’s supposed “algorithmic bias” against protected groups. A pending California bill would subject AI developers to annual audits and require health-care providers to police their AI tools for vaguely defined “biased impacts.” These regulatory regimes threaten to hobble AI startups in response to a threat that, so far, seems best addressed through better AI design and training. The European Union’s blanket of AI regulations may help explain why the EU lags so far behind the U.S. and China in AI patent applications.

Some concerns about AI in health care do merit attention. We don’t want doctors and nurses led astray by LLM hallucinations, for example. But advocates for aggressive AI regulation should recall the Hippocratic admonition: first, do no harm. For now, the benefits of AI tools in medicine appear to outweigh the risks dramatically. Even if the grandiose predictions of future AI capabilities never materialize, today’s real-world AI tools already show the potential to reshape medicine for the better. In many cases, those benefits will prove lifesaving.

Martinez was fading as orderlies wheeled him into radiology. The ER team was leaning toward a diagnosis of myocardial infarction and had already alerted the hospital’s cath lab to prepare for an emergency coronary catheterization. But they needed radiology to confirm the hunch and to rule out a stroke, as well. The radiologist was at her workstation as images loaded on the screen. First came the CT scans of the head. They looked fine. Then the technician injected a contrast dye and began a series of chest scans. The radiologist searched for the telltale signs of coronary artery blockage. But the blood flow to the heart appeared normal. Tricky case, she thought.

As the radiologist looked for clues, a system developed by Aidoc, a Tel Aviv–based AI company, was also reviewing the images. Through a process of machine learning, the system had taught itself to recognize obvious, as well as extremely subtle, patterns linked to conditions such as pulmonary embolisms or cancerous lesions. As images flowed into the database, the Aidoc system suddenly pinged an alert. It brought one image to the top of the queue and highlighted a thin, barely visible line inside the patient’s ascending aorta. The radiologist looked closely and confirmed the system’s preliminary finding. She got on the phone to the ER. “Alert surgery,” she said. “It’s an aortic dissection, Type A.”

Martinez had gotten another break. An aortic dissection is a tear in the inner layer of the aortic wall that lets blood force the wall’s layers apart and can rapidly progress to a fatal rupture. Of patients who don’t receive a timely diagnosis and surgery, roughly half will be dead within 24 hours. Martinez was prepped for cardiothoracic surgery.

Today, most leading AI systems depend on some version of machine learning (ML) or, more specifically, deep learning. An ML system learning to identify, say, tree species doesn’t need to be programmed with specific information about bark or leaf types; it simply digests images labeled “white oak” or “sugar maple” and learns their distinguishing features. (Sometimes, humans step in to “reinforce” correct answers and flag the bad ones.) This kind of deep-learning pattern recognition is used in many of today’s medical AI applications, including reading radiological scans, predicting medical issues such as heart attacks, and aiding drug discovery.
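
To make that concrete, here is a minimal sketch in Python of how such a labeled-image classifier is typically trained, using the PyTorch library. The folder layout, species names, and training settings are illustrative assumptions, not details of any medical system described in this article.

```python
# Minimal sketch of supervised image classification with PyTorch.
# Assumes a folder layout like data/train/white_oak/*.jpg and data/train/sugar_maple/*.jpg;
# the labels come entirely from the folder names, with no hand-coded rules about bark or leaves.
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

train_set = datasets.ImageFolder("data/train", transform=transform)
loader = DataLoader(train_set, batch_size=32, shuffle=True)

# Start from a generic pretrained network and replace its final layer
# with one output per class found in the training folders.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)  # penalize wrong species guesses
        loss.backward()                        # compute how to adjust the weights
        optimizer.step()                       # nudge the model toward better answers
```

Medical imaging systems such as those from Koios and Aidoc are far more elaborate, but the underlying recipe, labeled examples plus iterative error correction, is the same in spirit.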

But as anyone who has used an AI chatbot knows, the latest LLMs offer much broader capabilities. While also rooted in machine learning, LLMs ingest huge quantities of written material, learn the likely word patterns used in various subject areas, and generate original content based on those underlying connections. With their ability to engage in human-like dialogue, these generative AI systems can help doctors and nurses with record-keeping and patient communication, even offering clinical advice.

Each of these broad approaches to medical AI offers distinct strengths and risks. Chad McClennan is president and CEO of the medical imaging company Koios, which helped pioneer the use of ML to diagnose breast and thyroid cancer using ultrasound images. Unlike an LLM chatbot, a task-specific ML system “doesn’t improvise,” McClennan told me. He also stressed that the Koios system isn’t meant to replace the judgment of the radiologist but instead to offer a kind of virtual second opinion. It also provides a backstop against simple errors. “It’s like the way spell-check alerts you when you’ve typed ‘your’ instead of ‘you’re,’ ” McClennan said. In complex cases, Koios software can also augment the physician’s necessarily limited experience, he added, since its training sources include “unique outlier cases that AI never forgets.”

“We should stop training radiologists right now,” British AI researcher Geoffrey Hinton famously proclaimed in 2016. “It’s just completely obvious that, within five years, deep learning is going to do better than radiologists.” Since then, a number of ML radiology platforms have outperformed human radiologists in controlled tests. But this year, the Nobel Prize–winning scientist walked back his 2016 statement, telling the New York Times that he had spoken too broadly. Rather than putting physicians out of work, AI was making “radiologists a whole lot more efficient in addition to improving accuracy,” he said. Indeed, in hospitals around the world, systems from companies including Koios and Aidoc aren’t just helping radiologists make better diagnoses; they’re also speeding up the scanning process, creating documentation needed for medical records, and simplifying workflows.

Nonetheless, helping doctors make the most of AI diagnostic tools turns out to be harder than expected. A recent Harvard–MIT study tested how physicians responded to AI analyses of chest X-rays that contradicted their own judgments. In the study, the AI system achieved 92 percent accuracy when working alone. Radiologists working unaided achieved a less impressive 74 percent accuracy rate. But when the radiologists combined the AI results with their own judgment, they reached an accuracy level of only 76 percent.

In a New York Times op-ed, Topol and a coauthor argued that the findings indicate that “right now, simply giving physicians AI tools and expecting automatic improvements doesn’t work.” Research into how physicians actually use AI reveals several pitfalls. Sometimes, as in the Harvard–MIT study, doctors undervalue the AI input and fall back on their own flawed instincts. But on the flip side, Topol told me, there’s the risk of automation bias, which he defines as “the human tendency to over-rely on AI and to ignore contradictory information, even when it is correct.” Another worry is that doctors will experience “deskilling” after learning to rely on AI. One study suggested that doctors who used an AI-assisted endoscopy tool for three months became less adept at finding precancerous polyps when performing colonoscopies unaided.

These dilemmas don’t mean that AI won’t dramatically improve radiology and other medical specialties. But they suggest that integrating humans and machines will require a systematic effort. To start, we need “more and better AI training for physicians,” Topol said.

A Mayo Clinic radiologist working with an AI tool that saves 15 to 30 minutes per examination (Jenn Ackermann/The New York Times/Redux)

Martinez woke up in the ICU. The rapid AI diagnosis of his aortic dissection had allowed a fast transition to cardiothoracic surgery, likely saving his life. But he wasn’t out of the woods. Martinez’s recovery would depend on diligent care from the ICU nurses and physicians.

In some ways, a modern ICU can be seen as an elaborate data hub: the patient’s vital signs are monitored by several devices, while nurses and doctors double as data-entry workers, tracking every change in the patient’s condition, every dose of medication, and every procedure for which the hospital will need to seek reimbursement. Fortunately, the team working with Martinez had several AI tools to help with these tasks. A system developed by the Boston-based firm Etiometry displayed all the crucial data—vital signs, medications, fluid levels, and more—on a single screen at his bedside. A related package from the company tracked key data using an FDA-approved AI program designed to predict patient deterioration. If Martinez’s condition started to slide, the nurses would get an alert before the problem turned perilous. Another AI program helped automate the input of the copious data that nurses were expected to collect, including the codes needed for later billing.

Martinez’s wife had previously been at the bedside of other family members in ICUs. She noticed how the nurses in this unit seemed a bit less harried. They spent less time clicking through forms on their tablets and more time talking to the patient—and reassuring her that her husband’s recovery was on track.

The U.S. health-care system is far better overall than critics claim. But anyone who has spent much time in U.S. hospitals knows that even elite institutions leave much to be desired. Patients entering the system through the ER often spend long periods—sometimes more than 24 hours—waiting for a room to open up. Medical workers widely report feeling stressed and burned out. In a 2023 study published by the Mayo Clinic, 45 percent of physicians reported experiencing at least one burnout symptom. Perhaps it is no surprise that, in a 2024 McKinsey & Co. survey, 35 percent of practicing physicians said that they are likely to leave their current roles in the next five years. Worse yet, medical errors remain a significant problem. Some estimates of death rates due to medical mistakes are wildly inflated, but a judicious accounting suggests that more than 100,000 patients annually suffer some “adverse effect of medical treatment” that contributes, at least partially, to their deaths.

While not a silver bullet, AI can help address these problems. New York’s Mount Sinai hospital system is a pioneer in finding ways to integrate AI into hospital operations. Mount Sinai’s chief digital transformation officer, Robbie Freeman, is helping test a machine-learning algorithm designed to determine which ER patients will require a hospital bed. “We want to minimize what we call ‘boarding’ in the emergency department, patients waiting for a room to open up,” said Freeman, whose background includes both a doctorate in nursing practice and an MBA. By predicting admissions earlier, the hospital can more quickly adjust staffing levels and other resources to ensure that the right beds are available. “When we can plan better,” he added, “we can help move patients through in a way that’s best for them.”

Another ML algorithm, being developed at Mount Sinai’s Icahn School of Medicine, forecasts which patients are most likely to develop delirium, a serious complication that can lead to combative behavior—and ultimately to higher mortality. In a study published this year, the algorithm achieved a fourfold improvement over traditional clinical approaches. Patients at high risk of delirium can receive modified treatment protocols—for example, being given lower doses of sedatives. Other hospitals are using similar tools to predict heart attacks and other patient crises.

Medical ML applications such as these have been rolling out gradually over the past three decades. As noted, most are focused on reading scans or other very specific tasks. But the 2022 release of OpenAI’s ChatGPT platform (soon followed by Google’s Gemini, Meta AI, and other LLMs) opened up a new vista of possibility. LLM platforms enable open-ended generative AI. Users can ask them anything, and they do. I talked with one oncologist who routinely uses ChatGPT to draft letters to insurance companies seeking preauthorization to cover the cost of expensive new drugs. “I’ll tell it, ‘Here are the patient’s medical details. Please compose a letter explaining why she needs X medication, and please cite these five journal articles supporting this claim,’ ” he said. The process saves him hours each week. That’s time he can use talking to patients instead of insurance companies. Many nurses, too, are turning to chatbots to help with routine work, such as transcribing patient-intake interviews.

But health-care workers need more than informal solutions. Over the years, clinical data and billing codes have grown more complex, and efforts to digitize record-keeping have forced doctors and nurses to spend their days clicking boxes on screens. A 2016 study published in the Annals of Internal Medicine found that doctors spent only 27 percent of their working day on direct clinical face time with patients and 49 percent on digital paperwork. “In our quest for the ultimate clinical information system, we created a mess,” one longtime hospital chief information officer told me.

In their book The AI Revolution in Medicine: GPT-4 and Beyond (2023), physician-scientist Isaac Kohane, science writer Carey Goldberg, and Microsoft’s Peter Lee explored how LLM chatbots could help untangle this data dilemma. They envisioned a new era of “symbiotic medicine,” with the physician and an AI assistant working as partners. GPT-4 was particularly useful as a record-keeping assistant and as a “universal translator” between different medical data standards. For example, federal interoperability rules require Medicare plans to make patient data available in the FHIR (Fast Healthcare Interoperability Resources) format. In their tests, GPT-4 was “able to convert health data both into and out of FHIR.”
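
As a rough illustration of what that translation looks like, here is a minimal Python sketch of a prompt asking a model to emit a FHIR R4 Observation resource. The clinical note, patient reference, and call_llm() helper are hypothetical placeholders; only the resource structure and the LOINC and UCUM codes follow the published FHIR standard.

```python
# Illustrative sketch of LLM-assisted conversion of a free-text note into FHIR.
# call_llm() is a hypothetical wrapper around whatever chat model is deployed;
# expected_shape shows the kind of FHIR R4 Observation the prompt asks for.
import json

note = "Heart rate measured at 72 beats per minute on the cardiac unit."

prompt = (
    "Convert the following clinical note into a FHIR R4 Observation resource, "
    "encoded as JSON. Use a LOINC code for the measurement and a UCUM unit code. "
    "Return only the JSON.\n\nNote: " + note
)

expected_shape = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "8867-4",
                         "display": "Heart rate"}]},           # LOINC code for heart rate
    "subject": {"reference": "Patient/example"},               # placeholder patient reference
    "valueQuantity": {"value": 72, "unit": "beats/minute",
                      "system": "http://unitsofmeasure.org", "code": "/min"},  # UCUM unit
}

# response = call_llm(prompt)          # hypothetical; a real system would also
# observation = json.loads(response)   # validate the result against the FHIR spec
print(json.dumps(expected_shape, indent=2))
```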

“GPT-4 appears to be a real game changer” in automating clinical documentation, the authors write. Mount Sinai and other hospitals are experimenting with ambient AI, systems that monitor patient interviews (with the patient’s permission) and then convert those conversations into clinical notes. “This can reduce what we call ‘pajama time,’ ” Freeman said. “That’s the time after work when our clinical teams are catching up on their documentation.”

LLMs can even provide sophisticated analysis of tricky medical conditions. In The AI Revolution, Kohane writes that he was stunned to find GPT-4 giving clinical guidance “better than many doctors I’ve observed.” But not always. The AI Revolution authors also observed the chatbot making mistakes that included “highly convincing fabrications, omissions, and even negligence.” Before such a system can be trusted as a clinical advisor, they argued, researchers will need to “find a path to trusting, but always verifying” the chatbot’s outputs. They ask, “How can we reap its benefits—speed, scale, and scope of analysis—while keeping it subordinate to the judgment, experience, and empathy of human doctors?”

Mount Sinai’s Eyal Klang has been asking the same question. Klang is director of Mount Sinai’s Generative AI Research Program and one of the authors of a 2024 study that looked at how ChatGPT and other LLMs might perform in the role of clinical assistant. To see if they could nudge the LLMs into hallucinating, the study’s authors created medical vignettes that each included a single imaginary medical term, such as “Faulkenstein Syndrome” or “Renal Stormblood Rebound Echo.” Alarmingly, Klang said, “About 50 percent of the time, [the chatbot] happily went and elaborated on this funny science that doesn’t exist.” Another study that Klang worked on found that LLMs often generated vague or fabricated information when asked to translate notes into standard medical billing codes.

These are not trivial problems. Before doctors and nurses can rely on LLMs for clinical advice, or even for handling routine paperwork, these quirks must be solved. Kohane puts it bluntly: “For the foreseeable future, GPT-4 cannot be used in medical settings without direct human supervision.” Fortunately, The AI Revolution authors, Klang, and others are already developing solutions. “Just telling it, ‘Please do not hallucinate’ helps a lot,” Klang said with a trace of amusement. Better yet, he added, “You can instruct the agent to double-check itself.” Of course, doctors won’t always rely on general-purpose chatbots. Finding ways to automate hallucination detection will be a key task for companies developing proprietary LLM platforms for clinical advice.
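
What that self-checking approach might look like in code: the sketch below uses a hypothetical call_llm() wrapper and illustrative prompt wording, not the prompts from the Mount Sinai studies, to show a draft-then-verify loop of the kind Klang describes.

```python
# Minimal sketch of a draft-then-verify loop for an LLM clinical assistant.
# call_llm() is a hypothetical placeholder for an organization's approved chat model;
# the prompt wording is illustrative, not taken from any published study.

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around a deployed chat model."""
    raise NotImplementedError("connect this to your organization's approved LLM endpoint")

def answer_with_self_check(question: str) -> str:
    # First pass: answer, with an explicit instruction not to invent anything.
    draft = call_llm(
        "You are assisting a clinician. Do not invent facts, medical terms, codes, or "
        "citations; if a term is unfamiliar or you are unsure, say so explicitly.\n\n"
        + question
    )
    # Second pass: make the model audit its own draft before a clinician ever sees it.
    review = call_llm(
        "Review the draft answer below. List any claim, term, code, or citation you "
        "cannot verify, then rewrite the answer with those items removed or clearly "
        "flagged for human review.\n\nQuestion: " + question + "\n\nDraft answer: " + draft
    )
    return review
```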

Virtually everyone who has studied LLMs in medical applications agrees that this technology will be a force multiplier for overworked doctors and nurses. AI platforms should also help level the playing field for rural or underfinanced hospitals that lack cutting-edge medical expertise.

First, though, the bugs need to be discovered and worked out. The FDA can play a role in this process. (The FDA has been “far too permissive” in regulating AI algorithms, Topol told me; it should insist on more transparency and more published data.) But lawmakers must resist the temptation to address these AI limitations through premature regulation. The best way to avoid AI pitfalls is to continue the kind of research that Klang and others are pursuing. Then, health-care organizations must develop and document—and continuously monitor—best practices for using AI.

After four days in the ICU and another week in a cardiac-care unit, Diego Martinez is leaving the hospital. But the hospital isn’t leaving him. On his upper arm, he wears an AI-enabled cuff that will remotely monitor his blood pressure, blood oxygen, and other vitals. If his condition starts to slip, his doctors will probably know it before he does. Martinez faces months of rehab, but he is looking forward to sitting with his wife at breakfast and dreaming about—just maybe—being ready for softball season next year.

The AI medical revolution won’t be confined to hospitals. Generative AI is already transforming drug development, with several AI-designed molecules now in the pipeline. If these drugs prove effective in clinical trials, new treatments for various cancers, Alzheimer’s, and drug-resistant infections could reach patients more quickly than drugs developed by traditional means. LLMs also promise to streamline telehealth platforms, taking a burden off physicians struggling to keep up with the growing flood of text-based “asynchronous communications” with patients. Even dentists will benefit as new AI platforms help their office staffs manage today’s tangled billing processes.

And because ML systems can see patterns that human researchers have yet to discern, AI will help doctors detect—and someday, perhaps, prevent—diseases like Alzheimer’s and cancer years before they become clinically observable.

It will not be a simple task to integrate the strengths of AI with the skills and judgment of human medical workers. But if we can fully exploit AI’s benefits, learn to temper its risks—and head off efforts to overregulate the AI revolution before it starts—a more effective, more humane vision of health care is within reach.

Top Photo: An AI-powered machine tests for breast cancer during a clinical trial. (Klaudia Radecka/NurPhoto/Getty Images)
