Artificial intelligence (AI) might not care whether humans live or die, but tools like ChatGPT will still affect life-and-death decisions once they become standard in the hands of doctors. Some doctors are already experimenting with ChatGPT to see if it can diagnose patients and choose treatments. Whether this is good or bad hinges on how they use it.
GPT-4, the latest update to ChatGPT, can get a perfect score on medical licensing exams. When it gets something wrong, there is often a legitimate medical dispute over the answer. It is even good at tasks we thought took human compassion, such as finding the right words to deliver bad news to patients.
These systems are developing image processing capacity as well. At this point you still need a real doctor to palpate a lump or assess a torn ligament, but AI could read an MRI or CT scan and offer a medical judgment. Ideally, AI would not replace hands-on medical work, but enhance it — and yet we are nowhere near understanding when and where it would be practical or ethical to follow its recommendations.
‘DR GOOGLE’
It is inevitable that people will use these tools to guide their own healthcare decisions, just as we have been leaning on "Dr Google" for years. Despite having more information at our fingertips than ever, public health experts last week blamed an abundance of misinformation for our relatively short life expectancy, something that might get better or worse with GPT-4.
Andrew Beam, a professor of biomedical informatics at Harvard, has been amazed at GPT-4's feats, but told me he can get it to give vastly different answers by subtly changing how he phrases his prompts. For example, it will not necessarily ace medical exams unless you instruct it to, say, by telling it to act as if it were the smartest person in the world.
He said that all it is really doing is predicting what words should come next — an autocomplete system. Yet it looks a lot like thinking.
“The amazing thing, and the thing I think few people predicted, was that a lot of tasks that we think require general intelligence are autocomplete tasks in disguise,” he said.
That includes some forms of medical reasoning.
This whole class of technology, known as large language models, is supposed to deal exclusively with language, but users have discovered that teaching them more language helps them solve ever more complex math equations.
“We don’t understand that phenomenon very well,” Beam said. “I think the best way to think about it is that solving systems of linear equations is a special case of being able to reason about a large amount of text data in some sense.”
Isaac Kohane, a physician and chairman of the biomedical informatics program at Harvard Medical School, had a chance to start experimenting with GPT-4 last fall. He was so impressed that he rushed to turn it into a book, The AI Revolution in Medicine: GPT-4 and Beyond, coauthored with Microsoft’s Peter Lee and former Bloomberg journalist Carey Goldberg.
One of the most obvious benefits of AI, he told me, would be in helping reduce or eliminate hours of paperwork that are keeping doctors from spending enough time with patients, something that often leads to burnout.
PEDIATRIC DIAGNOSIS
However, he has also used the system to help him make diagnoses as a pediatric endocrinologist.
In one case, he said, a baby was born with ambiguous genitalia, and GPT-4 recommended a hormone test followed by a genetic test, which pinpointed the cause as 11-hydroxylase deficiency.
“It diagnosed it not just by being given the case in one fell swoop, but asking for the right workup at every given step,” he said.
For him, the value was in offering a second opinion — not replacing him — but its performance raises the question of whether getting just the AI opinion is still better than nothing for patients who do not have access to top human experts.
Like a human doctor, GPT-4 can be wrong and not necessarily honest about the limits of its understanding.
“When I say it ‘understands,’ I always have to put that in quotes because how can you say that something that just knows how to predict the next word actually understands something? Maybe it does, but it’s a very alien way of thinking,” he said.
You can also get GPT-4 to give different answers by asking it to pretend it is a doctor who considers surgery a last resort, versus a less conservative doctor.
However, in some cases, it is quite stubborn: Kohane tried to coax it to tell him which drugs would help him lose a few kilograms and it was adamant that no drugs were recommended for people who were not more seriously overweight.
Despite its amazing abilities, patients and doctors should not lean on it too heavily or trust it too blindly. It might act like it cares about you, but it probably does not. ChatGPT and its ilk are tools that will take great skill to use well, but exactly which skills those are is not yet well understood.
Even those steeped in AI are scrambling to figure out how this thought-like process is emerging from a simple autocomplete system. The next version, GPT-5, will be even faster and smarter. We are in for a big change in how medicine gets practiced, and we had better do all we can to be ready.
Faye Flam is a Bloomberg Opinion columnist covering science. She is host of the Follow the Science podcast.
This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.