What’s in ChatGPT’s new generative AI model and how does it work?

OpenAI has raised the bar in the highly competitive world of generative artificial intelligence by introducing a new model that it hopes will attract more users to its platform and fend off all challengers.

GPT-4o is an updated version of the large language model technology that powers ChatGPT. Rumours last week suggested OpenAI would launch a search engine to challenge Google, but Reuters reported that the company has postponed such a move.

OpenAI chief executive Sam Altman denied any such launch, only to post on X that the company had been “working hard on new things that we think people will love”.

The “o” in the name stands for “omni”, and the California-based company touts GPT-4o as something for everyone – which makes sense, given that “omni” means “all” or “everything”, and OpenAI clearly wants to be everywhere in our lives.

What is GPT-4o?

Short answer: GPT-4o is the “new flagship model that can reason over audio, images and text in real time,” according to OpenAI.

Shorter answer: it is OpenAI’s fastest AI model.

The model represents “a step toward much more natural human-computer interaction”, OpenAI said in a blog post on Monday.

It is also natively multimodal, meaning it can accept any combination of text, audio and images as input, and can generate any combination of text, audio and images as output.

How fast is GPT-4o?

OpenAI claims that GPT-4o can respond to audio input in as little as 232 milliseconds, with an average of 320 milliseconds – which, according to several studies, is comparable to human response time in a conversation.

GPT-4o also needs fewer tokens in many languages. A token is the basic unit AI models use to measure the length of text; it can include punctuation and spaces, and the number of tokens needed varies from language to language.

Among the languages highlighted by OpenAI as needing fewer tokens with GPT-4o are Arabic (from 53 tokens to 26), Gujarati (145 to 33), Hindi (90 to 31), Korean (45 to 27) and Chinese (34 to 24).
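Using OpenAI’s published figures, the scale of those savings can be worked out directly – each pair gives a compression factor showing how many times fewer tokens the new tokeniser needs for the same text:

```python
# Token counts for OpenAI's sample phrase before and after the new
# GPT-4o tokeniser (figures from OpenAI's announcement).
token_counts = {
    "Arabic": (53, 26),
    "Gujarati": (145, 33),
    "Hindi": (90, 31),
    "Korean": (45, 27),
    "Chinese": (34, 24),
}

for language, (before, after) in token_counts.items():
    # How many times fewer tokens the same text now needs.
    reduction = before / after
    print(f"{language}: {before} -> {after} tokens ({reduction:.1f}x fewer)")
```

Gujarati shows the biggest gain at roughly 4.4 times fewer tokens; since API usage is billed per token, fewer tokens means the same text is both cheaper and faster to process.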

For perspective, we can compare this to Robert Miller’s 1968 study, Response Time in Man-Computer Conversational Transactions, which described three thresholds of computer responsiveness.

The research found that a response time of 100 milliseconds is perceived as instantaneous, while a second or less is fast enough for users to feel they are interacting with the information freely. A response time of more than 10 seconds loses the user’s attention entirely.

How does GPT-4o work?

The simplest answer is that OpenAI has simplified the process of converting input into output.

In OpenAI’s previous AI models, Voice Mode was used to talk to ChatGPT, with an average latency of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4). Voice Mode chained three separate models: a simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in that text and outputs text, and a third simple model converts the text back to audio.
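The difference between the two architectures can be sketched as follows. This is a toy illustration only – the stage functions are hypothetical stand-ins using simple string transforms, not OpenAI’s actual models or API:

```python
# Toy sketch: the old three-model Voice Mode pipeline vs GPT-4o's
# single end-to-end model. Each "model" is a stand-in string transform.

def transcribe(audio: str) -> str:   # stage 1: speech-to-text
    return f"text({audio})"

def reason(text: str) -> str:        # stage 2: GPT-3.5/GPT-4, text in, text out
    return f"reply({text})"

def synthesize(text: str) -> str:    # stage 3: text-to-speech
    return f"audio({text})"

def old_voice_mode(audio: str) -> str:
    # Three hops: tone, background noise and extra speakers are already
    # discarded at stage 1, so the reasoning model never sees them.
    return synthesize(reason(transcribe(audio)))

def gpt4o(audio: str) -> str:
    # One network handles the audio directly, end to end.
    return f"audio(reply({audio}))"

print(old_voice_mode("hello"))  # audio(reply(text(hello)))
print(gpt4o("hello"))           # audio(reply(hello))
```

The point of the sketch: in the pipelined version, anything not captured by the transcription step is lost before the main model ever runs, while the end-to-end model receives the raw audio itself.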

“This process means that the main source of intelligence, GPT-4, loses a lot of information – it cannot directly perceive tone, multiple speakers or background noise, and it cannot express laughter, singing or emotion,” according to OpenAI.

But with GPT-4o, OpenAI has merged all these functions into a single model with end-to-end capabilities across text, images and audio, significantly cutting response time and the amount of information lost along the way.

“All input and output are processed by the same neural network,” OpenAI said. A neural network is an AI technique that teaches computers to process data in a way inspired by the human brain.

Still, OpenAI said it is “just the beginning” of exploring GPT-4o’s capabilities and limitations, as it is the company’s first model to combine all these modalities.

What can GPT-4o not do?

Speaking of limitations, OpenAI acknowledged “several” in the GPT-4o model, including inconsistent answers shown in a blooper reel – which also demonstrated how adept GPT-4o can be at sarcasm.

Additionally, OpenAI said it continues to refine the model’s behaviour through post-training, which is critical for addressing safety issues – a major bottleneck in modern AI.

The company said it has built new safety systems to act as guardrails on voice outputs, in addition to testing the model with more than 70 external experts in social psychology, bias and fairness, and disinformation to identify risks.

“We will continue to mitigate new risks as they are discovered. We recognize that GPT-4o’s audio modalities pose a variety of new risks,” OpenAI said.

How much does GPT-4o cost?

Good news: It’s free for all users, with paid users enjoying “up to five times the capacity limits” of their free peers, OpenAI chief technology officer Mira Murati said during the launch presentation.

Developers using GPT-4o through the API, meanwhile, will pay $5 per million tokens of input and $15 per million tokens of output.
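At those rates, the cost of a call can be estimated with simple per-token arithmetic – a minimal sketch, with the request sizes chosen purely for illustration:

```python
# GPT-4o API rates as published: $5 per million input tokens,
# $15 per million output tokens.
INPUT_PRICE_PER_MTOK = 5.00    # USD per 1,000,000 input tokens
OUTPUT_PRICE_PER_MTOK = 15.00  # USD per 1,000,000 output tokens

def api_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in US dollars of one request."""
    return (input_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK
            + output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK)

# Example: a 2,000-token prompt answered with 500 tokens of output.
print(f"${api_cost(2_000, 500):.4f}")  # $0.0175
```

A typical chat exchange therefore costs well under a cent, while a full million tokens each way comes to $20.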

Making GPT-4o free to use should serve OpenAI well, and it complements the company’s other paid offerings.

In August, OpenAI launched its ChatGPT Enterprise monthly subscription, the price of which varies depending on user requirements. It is the third tier after the free basic service and the $20 per month Plus plan.

The company launched its online GPT Store in January, which gives users access to more than three million customised versions of ChatGPT, called GPTs, developed by OpenAI’s partners and its community.

OpenAI hopes to attract more users as competition increases in the generative AI world – and there’s a lot riding on it.

How does OpenAI currently stack up against its biggest rivals?

OpenAI’s move to introduce a new, free and faster large language model is indicative of how it has its hands full against the competition in the generative AI space.

Google, perhaps its biggest rival in the space, has Gemini, the first AI model to beat human experts at massive multitask language understanding (MMLU), one of the most widely used benchmarks of an AI’s knowledge and problem-solving skills.

Gemini can be accessed through the Google One AI Premium plan for $19.99 per month, which includes 2TB of storage, 10 percent back on Google Store purchases, and more features in Gmail, Google Docs, Google Slides, and Google Meet.

In February, Google launched Gemma, aimed at helping developers and researchers ‘build AI responsibly’; it is intended for more modest tasks such as simple chatbots or summarisation.

Anthropic, meanwhile, launched Claude 3 in March – a direct challenge to generative AI leader OpenAI.

The company, backed by Google itself and Amazon, has three tiers – Haiku, Sonnet and Opus – each offering increasing capabilities to suit the user’s needs.

Haiku costs $0.25 per million tokens (MTok) for input and $1.25 for output, while Sonnet costs $3 and $15. Opus is the most expensive at $15 and $75.

By comparison, OpenAI’s GPT-4 Turbo costs $10 per million input tokens and $30 per million output tokens, with a context window of 128,000 tokens.

Microsoft, OpenAI’s biggest backer, charges $20 per month for its Copilot Pro service, which promises faster performance and “everything” the service offers. If you are not willing to pay, there is a free Copilot tier, albeit with limited functionality.

And then there’s Grok from xAI, from OpenAI’s friend-turned-foe, Elon Musk.

The current version of Grok, Grok-1.5, is only available to subscribers of X’s Premium+ level, which starts at $16 per month, or $168 per year.

Regional entities are also targeting the leaders: On Monday, Abu Dhabi’s Technology Innovation Institute introduced the second version of its large language model, Falcon 2, to compete with models developed by Meta, Google and OpenAI.

Also on Monday, Core42, a unit of Abu Dhabi’s artificial intelligence and cloud company G42, launched Jais Chat, a bilingual Arabic-English chatbot developed in the UAE. It can be downloaded and used free on Apple’s iPhones.

Updated: May 15, 2024, 10:34 AM