ChatGPT Advanced Voice Mode Is Here (If You’re Willing to Pay for It)
Back in May, OpenAI announced an enhanced voice mode for ChatGPT, pitching the new feature as the existing voice mode on steroids. Not only can you hold a conversation with ChatGPT, but that conversation is meant to feel more natural: you can interrupt the bot when you want to change the topic, and ChatGPT will pick up on the speed and tone of your voice and respond to your emotions in kind.
If that sounds a bit like the AI voice assistant from the 2013 film Her, that's no coincidence. In fact, OpenAI demoed the product with a voice that sounded a little too much like actress Scarlett Johansson, who voiced that fictional machine intelligence. Johansson threatened legal action, and the company later pulled the voice entirely. No matter: there are nine other voices to try.
Although OpenAI began testing Advanced Voice Mode with a small group of testers back in July, the feature is now rolling out to all paying users. If you have an eligible account, you can try it out today.
How to Use ChatGPT Advanced Voice Mode
Currently, only paid ChatGPT subscribers can access Advanced Voice Mode, so you'll need either a ChatGPT Plus or ChatGPT Team subscription to see the feature. Free users can still use the standard voice mode, which appears in the app as a headphone icon.
Advanced Voice Mode appears as a waveform icon, visible only to Plus and Team subscribers. To access the feature, open a new chat and tap that icon. The first time you use Advanced Voice Mode, you'll need to choose one of nine voices. I've included OpenAI's description of each:
- Arbor: Easygoing and versatile
- Breeze: Animated and earnest
- Cove: Composed and direct
- Ember: Confident and optimistic
- Juniper: Open and upbeat
- Maple: Cheerful and candid
- Sol: Savvy and relaxed
- Spruce: Calm and affirming
- Vale: Bright and inquisitive
I ended up going with Arbor, which reminds me a lot of the guy from the Headspace app. From here, Advanced Voice Mode works much like the standard voice mode: you say something to ChatGPT, and it responds.
How ChatGPT Advanced Voice Mode Actually Performs
In my short time with the new mode, I didn't notice much improvement over the previous voice mode. The voices are new, sure, and I suppose they're a little more "natural" than the old ones, but the conversation doesn't feel any more realistic. The ability to interrupt your digital partner sells the illusion a bit, but it's oversensitive: I picked up my iPhone while ChatGPT was talking, and it instantly stopped. I noticed the same thing in OpenAI's original demo. OpenAI needs to work on the bot's ability to tell when the user actually wants to interrupt and when it's just hearing random background noise.
(OpenAI recommends using headphones to avoid unwanted interruptions and, if you're using an iPhone, turning on Voice Isolation mode. I used Voice Isolation without headphones, so take that as you will.)
While OpenAI seems to have abandoned ChatGPT's quirky, flirty side, you can still make the bot laugh, if you ask it to. The laugh is impressive for a synthetic voice, but it feels unnatural, as if it were spliced in from another recording. Ask it to make other similar sounds, such as crying or screaming, however, and it will refuse.
I tried getting voice mode to listen to a song and identify it, but it said it couldn't do that. The bot instead asked me to share the lyrics, which I did, and it suggested a song based on the vibe of those lyrics rather than the lyrics themselves. Its guess was completely wrong, but the feature doesn't seem to be built for this type of task yet, so I'll give it a pass.
Can ChatGPT Talk to Itself?
I had to pit the two voice modes against each other. The first time I tried it, they kept interrupting each other in a thoroughly awkward exchange until one of them glitched and started repeating the message it had given me earlier about sharing lyrics to identify a song. One would say something like, "Sure, share the lyrics with me and I'll help you figure it out," and the other would reply, "Sure: share the lyrics and I'll do my best to identify the song." This went on for five minutes before I cut them off.
Once I got the two bots into a proper conversation, they went back and forth endlessly while saying almost nothing interesting. They talked about augmented reality, cooking, and morning routines with the usual enthusiasm and vagueness that chatbots are known for. What was strange, though, was what happened after one bot said that if it could cook, it would want to make lasagna, then asked the other about dishes it likes to cook or would like to try. The other bot replied: "The user likes to drink coffee and watch the news in the morning."
That's what I told ChatGPT during an earlier test, when it asked me about my morning routine. It proves that OpenAI's memory feature works, but the execution was, um, weird. Why did it answer that way when asked about its favorite recipes? Did I short-circuit the bot? Did it realize it was talking to itself and decide to warn the other bot about what was going on? I don't love the implications here.
How Advanced Voice Mode Handles Your Privacy
When you use Advanced Voice Mode, OpenAI saves audio recordings of the conversation, including your side of it. When you delete a chat, OpenAI says it will delete the associated audio within 30 days, unless the company decides to keep it for security or legal reasons. OpenAI will also retain a recording after you delete a chat if you previously shared the audio, though in that case the clip is disassociated from your account.
To make sure OpenAI isn't training its models on your voice recordings and chat transcripts, go to ChatGPT's settings, select Data Controls, then turn off Improve Model for Everyone and Improve Voice for Everyone.