Generative AI

OpenAI Rolls Out Next Evolution of ChatGPT, Able to Accept or Output Any Combination of Text, Audio, or Image

OpenAI is introducing a new iteration of its flagship GPT-4 multimodal large language model.

Called "GPT-4o" (the "o" stands for "omni"), the new flagship model was designed, the company said, to "reason" across audio, vision, and text in real time.

OpenAI also announced the release of the desktop version of ChatGPT, and a refreshed UI designed to make it simpler to use and more natural.

The new iteration was designed to accept as input any combination of text, audio, and image, and to generate any combination of text, audio, and image outputs. In a blog post, the company said it can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, "which is similar to human response time in a conversation." This level of performance matches GPT-4 Turbo on text in English and code, the company says, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is also notably better at vision and audio understanding than existing models, according to the company.

The new iteration will be free for all users, said OpenAI CTO Mira Murati during the livestream announcement, and paid users will continue to have up to five times the capacity limits of free users. "The special thing about GPT-4o is that it brings GPT-4-level intelligence to everyone, including our free users," Murati said. "A very important part of our mission is to be able to make our advanced AI tools available to everyone for free. We think it's very, very important that people have an intuitive feel for what the technology can do."

The company plans to roll out the full capabilities of the new model iteratively over the next few weeks, Murati said.

"For the past couple of years, we've been very focused on improving the intelligence of these models," Murati said, "and they've gotten pretty good. But this is the first time that we are really making a huge step forward when it comes to the ease of use. And this is incredibly important, because we're looking at the future of interaction between ourselves and the machines. And we think that GPT-4o is really shifting that paradigm into the future of collaboration, where this interaction becomes much more natural and far, far easier."

Because GPT-4-class intelligence is now available to free users via GPT-4o, Murati said, builders posting to the ChatGPT Store have a larger audience. "University professors can create content for their students, or podcasters can create content for their listeners," she said, "and you can also use vision, so now you can upload screenshots, photos, and documents containing both text and images. And you can start conversations with ChatGPT about all of this content. You can also use memory, which makes ChatGPT far more useful and helpful, because now it has a sense of continuity across all your conversations. And you can use browse, where you can search for real-time information in your conversation."

This iteration also improves ChatGPT's quality and speed in 50 different languages, Murati said, which makes the experience available to many more people.

"This is something that we've been trying to do for many, many months. And we're very, very excited to finally bring GPT-4o to all of our users," she said.

OpenAI CEO Sam Altman said in a post on X that GPT-4o is "our best model ever. it is smart, it is fast, it is natively multimodal." Developers will have access to the API, "which is half the price and twice as fast as GPT-4 Turbo," Altman added.

During the livestream, OpenAI team members demonstrated some of the new model's audio capabilities. Responding to a greeting from OpenAI researcher Mark Chen, it said, "Hey there, what's up? How can I brighten your day today?" Chen said the model has the ability to "perceive your emotion" and demonstrated by asking the model to help calm him down ahead of a public speech, then panting dramatically. A calming female voice responded with, "Whoa, calm down," and began guiding Chen through some slow, calming breathing. OpenAI team member Barret Zoph then asked the model to analyze his facial expressions, showing off its ability to perceive emotions accurately.

"As we bring these technologies into the world, it's quite challenging to figure out how to do so in a way that's both useful and also safe," Murati said. "And GPT-4o presents new challenges for us when it comes to safety, because we're dealing with real-time audio, real-time vision. And our team has been hard at work figuring out how to build in mitigations against misuse. We continue to work with different stakeholders out there, from government, media, entertainment, all industries, red teamers, and civil society, to figure out how to best bring these technologies into the world."

Read the full OpenAI blog post here.

About the Author

John K. Waters is the editor in chief of a number of sites, with a focus on high-end development, AI, and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS. He can be reached at [email protected].
