OpenAI Transcriptions May Include Things You Never Told Your Doctor
For those of us who follow news about generative artificial intelligence, it should come as no surprise that AI is not perfect. In fact, generative AI produces incorrect, misleading, and otherwise fabricated results so often that we have a name for it: hallucinations.
This is part of the problem with outsourcing so much of our work to AI right now. AI can be used for good, but blindly trusting it to handle important tasks without oversight and fact-checking carries real risks. We are now seeing the consequences of that gamble in a disturbing way.
OpenAI Whisper has a problem with hallucinations
The latest high-profile case of hallucination involves Whisper, an AI-powered transcription tool from ChatGPT maker OpenAI. Whisper is popular: transcription services often build their tools on top of it, and those tools are in turn used by many users and clients to make transcribing conversations faster and easier. On the surface, this is a good thing: Whisper and the services built on it have a positive reputation among users, and adoption of the platform is growing across industries.
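To make that workflow concrete, here is a minimal sketch of how a service might call Whisper to transcribe a recording, using OpenAI's open-source whisper Python package; the model size and file name are illustrative assumptions, not details from the reporting.

```python
# Minimal sketch: transcribing an audio file with the open-source Whisper model.
# The "base" model size and the file name are illustrative placeholders.
import whisper

model = whisper.load_model("base")              # load a pretrained Whisper checkpoint
result = model.transcribe("patient_visit.mp3")  # run speech-to-text on the recording
print(result["text"])                           # the transcript is returned as plain text,
                                                # with no marker for hallucinated content
```

The point of the sketch is the last line: the output is just text, so nothing in it signals which passages, if any, the model invented.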
However, hallucinations get in the way. As AP News reports, researchers and experts are sounding the alarm about Whisper, saying it is not only inaccurate but often makes things up entirely. While all generative AI models are prone to hallucinations, the researchers warn that Whisper inserts things that were never said at all, including “racial comments, violent rhetoric, and even imaginary treatments.”
This is bad enough for those of us who use Whisper personally. The bigger concern is that Whisper has a large user base in professional settings: the subtitles you see when watching videos online may have been generated by Whisper, which matters especially for viewers who are deaf or hard of hearing and rely on accurate captions. Important interviews may be transcribed with Whisper-based tools, leaving an inaccurate record of what was actually said.
Your conversations with doctors may not be accurately recorded
What is currently attracting the most attention, however, is the use of Whisper in hospitals and medical centers. Researchers are concerned about the number of doctors and healthcare professionals who have turned to Whisper-based tools to transcribe their conversations with patients. A conversation with your doctor about your health may be recorded and then transcribed by Whisper, only for the transcript to include completely false statements that were never part of the conversation.
This isn’t hypothetical, either: multiple researchers have independently come to similar conclusions by studying the output of Whisper-powered tools. AP News summarized some of these findings: a University of Michigan researcher found hallucinations in eight of every 10 transcriptions made by Whisper; a machine learning engineer found problems in about half of the transcriptions he examined; and one researcher found hallucinations in nearly all of the 26,000 Whisper transcriptions they created. Hallucinations even persisted when the audio recordings were short and clearly recorded.
But it’s a report from professors Allison Koenecke of Cornell University and Mona Sloane of the University of Virginia that offers the clearest look at the situation: they found that nearly 40% of the hallucinations they identified in transcripts drawn from Carnegie Mellon’s TalkBank research repository were “harmful or disturbing,” because the speaker could be “misinterpreted or misrepresented.”
In one example, the speaker said: “He, the boy, was going to, I’m not exactly sure, take an umbrella.” The AI added the following to the transcription: “He took a big piece of the cross, a small, small piece… I’m sure he didn’t have a horror knife, so he killed several people.” In another example, the speaker said “two other girls and one woman” and the AI turned it into “two other girls and one woman, um, who were black.”
When you take all this into account, it’s alarming that more than 30,000 doctors and 40 health systems are currently using a Whisper-based tool developed by Nabla . To make matters worse, you can’t check a transcript against the original recording to determine whether Nabla’s tool distorted part of it, because Nabla designed the tool to delete the audio “for data security reasons.” According to the company, the tool has been used to transcribe about seven million medical visits.
Is AI Really Ready for Prime Time?
Generative AI as a technology is not new, but ChatGPT kicked off its widespread adoption in late 2022 . Since then, companies have rushed to build artificial intelligence into their platforms and services. And why not? The public seemed to love AI, and generative AI seemed capable of almost anything. Why not embrace it and use the magic of AI to solve problems like transcription?
We’re seeing why now. Artificial intelligence has great potential, but it also has serious drawbacks. Hallucinations are not just random annoyances: they are a byproduct of how these systems work, a flaw baked into the structure of neural networks. We don’t fully understand why AI models hallucinate, and that’s part of the problem. We trust technology whose flaws we don’t fully understand to do work that matters to us, and then, in the name of security, we delete the data that could be used to double-check the AI’s results.
Personally, I don’t feel safe knowing that my medical records may contain outright lies just because my doctor’s office decided to use Nabla’s tools in their system.