OpenAI is launching a new flagship generative AI model named GPT-4o, which will be introduced “iteratively” into the company’s developer and consumer products over the coming weeks. There had been speculation that a search engine would be rolled out, but CEO Sam Altman denied the rumors.
OpenAI’s CTO, Mira Murati, stated that GPT-4o offers “GPT-4-level” intelligence while improving on GPT-4’s capabilities in text, vision, and now audio.
Murati stressed the growing complexity of these models and the goal of making interactions more natural and effortless, stating, “We want the experience of interaction to actually become more natural, easy, and for you not to focus on the UI at all, but just focus on the collaboration with [GPTs].”
Say hello to GPT-4o, our new flagship model which can reason across audio, vision, and text in real time: https://t.co/MYHZB79UqN
Text and image input rolling out today in API and ChatGPT with voice and video in the coming weeks. pic.twitter.com/uuthKZyzYx
— OpenAI (@OpenAI) May 13, 2024
What features does GPT-4o have?
During a keynote at OpenAI’s offices, Murati explained, “GPT-4o reasons across voice, text and vision. This is incredibly important, because we’re looking at the future of interaction between ourselves and machines.”
@openai GPT-4o reasons across text, vision and speech.
Starting today anyone can use, for free:
- GPTs and ChatGPT-4o
- vision
- memory
- browse (research across your chats)
- quality and speed in 50 different languages
Paid users will have 5x more capacity. ChatGPT-4o is 2x faster… pic.twitter.com/7E5UQuV0dB
— Erik Machorse (@erikmachorse) May 13, 2024
The predecessor, GPT-4, was capable of processing both images and text, performing tasks such as extracting text from images or describing their content. GPT-4o extends these functionalities to include speech.
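For developers, this kind of multimodal input is exposed through OpenAI’s chat completions API. The following is a minimal sketch using the official openai Python SDK, assuming an OPENAI_API_KEY is set in the environment; the image URL and prompt are placeholders, not from the source:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Send text and an image in a single request; GPT-4o reasons over both.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What text appears in this image?"},
                {
                    "type": "image_url",
                    # Placeholder URL; any publicly reachable image works.
                    "image_url": {"url": "https://example.com/receipt.png"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```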
GPT-4o significantly changes the ChatGPT experience, making interactions more assistant-like. Previously, ChatGPT offered a voice mode that simply converted its text responses to speech. GPT-4o upgrades that feature: users can now interrupt ChatGPT mid-response, and the model answers in “real time.” It can also detect emotional cues in a user’s voice and respond in various emotive tones.
GPT-4o also boosts ChatGPT’s visual capabilities. Whether analyzing a photograph or a computer screen, ChatGPT can now rapidly respond to queries ranging from software code analysis to identifying clothing brands. The company is also releasing a desktop version of ChatGPT and introducing a revamped user interface.
Starting today, the new model is accessible in the free tier of ChatGPT and is also available to OpenAI’s ChatGPT Plus subscribers with “5x higher” message limits. OpenAI plans to introduce the new voice feature powered by GPT-4o to Plus users in alpha within the next month.
🚨 BREAKING: OpenAI’s new voice assistant acts as a translator. Impressive range of emotion and fluency throughout. pic.twitter.com/JPNJjLAGhn
— Zain Kahn (@heykahn) May 13, 2024
The model also has improved multilingual capabilities, with enhanced performance across 50 languages, according to OpenAI. In OpenAI’s API, GPT-4o runs at twice the speed of its predecessor, GPT-4 Turbo, while costing half as much and offering higher rate limits.
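In practice, moving existing code from GPT-4 Turbo to GPT-4o amounts to swapping the model name in the API call. A minimal sketch with the official openai Python SDK, where the prompt is illustrative:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upgrading is a one-line change: replace "gpt-4-turbo" with "gpt-4o".
response = client.chat.completions.create(
    model="gpt-4o",  # previously "gpt-4-turbo"
    messages=[{"role": "user", "content": "Say hello in five languages."}],
)
print(response.choices[0].message.content)
```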
What new features are available for free ChatGPT users?
With the rollout of GPT-4o, ChatGPT free users are set to experience a suite of new features, including GPT-4 level intelligence. Users will be able to receive answers directly from the model, as well as access information pulled from the web.
GPT-4o can also perform data analysis and create visualizations such as charts. Users can upload photos and discuss them in chat, asking questions or seeking information about the images. The model also supports more complex tasks, such as file uploads for summarizing documents, writing content, or performing detailed analyses.
Finally, there is now a Memory feature that remembers previous interactions and context, making the experience more cohesive and personalized.
Featured image: Canva