OpenAI collaborated with professional voice actors to train the models to speak.
The generative artificial intelligence (AI) space continues to heat up: OpenAI has unveiled GPT-4V, a vision-capable model, along with multimodal conversational modes for its ChatGPT system.
With the new upgrades, announced on Sep. 25, users will be able to hold voice conversations with ChatGPT. The models powering ChatGPT, GPT-3.5 and GPT-4, can now understand plain-language spoken queries and respond in one of five different voices.
According to a blog post from OpenAI, this new multimodal interface will allow users to interact with ChatGPT in novel ways:
“Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step by step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.”
The upgraded version of ChatGPT will roll out to Plus and Enterprise users on mobile platforms in the next two weeks, with follow-on access for developers and other users “soon after.”
ChatGPT’s multimodal upgrade comes hot on the heels of the launch of DALL-E 3, OpenAI’s most advanced image generation system.
According to OpenAI, DALL-E 3 also integrates natural language processing. This allows users to talk to the model to fine-tune results and to use ChatGPT for help in creating image prompts.
In other AI news, OpenAI competitor Anthropic announced a partnership with Amazon on Sep. 25. As Cryptopurity reported, Amazon will invest up to $4 billion in a deal that includes cloud services and hardware access. In return, Anthropic says it will provide enhanced support for Amazon Bedrock, Amazon’s foundation model service, along with “secure model customization and fine-tuning for businesses.”