Here’s What We Know About GPT-4o (& What to Expect from GPT-5)

Today, OpenAI announced its newest AI model, called GPT-4o. The “o” stands for “omni,” because GPT-4o can accept text, audio, and image input and deliver outputs in any combination of these mediums.

Buzz about a new generative pre-trained transformer from OpenAI has been circulating for months. “We’re going to make the model smarter; it’s going to be better at everything across the board,” Sam Altman, CEO of OpenAI, said discussing future iterations of GPT at the World Government Summit in January. “This is a bigger deal than it sounds because what makes these models so magical is that they’re general.”

Performance typically scales linearly with data and model size unless there’s a major architectural breakthrough, explains Joe Holmes, Curriculum Developer at Codecademy who specializes in AI and machine learning. “However, I still think even incremental improvements will generate surprising new behavior,” he says. Indeed, watching the OpenAI team use GPT-4o to perform live translation, guide a stressed person through breathing exercises, and tutor algebra problems is pretty amazing.

While we still don’t know when GPT-5 will come out, this new release provides more insight into what a smarter and better GPT could really be capable of. Ahead we’ll break down what we know about GPT-5, how it could compare to previous GPT models, and what we hope comes out of this new release.

A brief timeline of GPT models

June 2018

GPT-1

OpenAI put generative pre-trained language models on the map in 2018 with the release of GPT-1. This groundbreaking model was based on transformers, a specific type of neural network architecture (the “T” in GPT), and was trained on a dataset of over 7,000 unique unpublished books. You can learn about transformers and how to work with them in our free course Intro to AI Transformers.

February 2019

GPT-2

In February 2019, OpenAI announced GPT-2, the successor to GPT-1 (the full model was released in stages later that year). This large transformer-based language model had 1.5 billion parameters — variables that the model learns from data during training — and was trained on a dataset of 8 million web pages. For context: that’s roughly 10 times the parameters and training data of GPT-1.

June 2020

GPT-3

With GPT-3, OpenAI upped the number of parameters to 175 billion, more than 100 times the size of GPT-2.

November 2022

GPT-3.5

In November 2022, ChatGPT entered the chat, adding chat functionality and the ability to conduct human-like dialogue to the foundational model. The first iteration of ChatGPT was fine-tuned from GPT-3.5, an intermediate model between GPT-3 and GPT-4. If you want to learn more about ChatGPT and prompt engineering best practices, our free course Intro to ChatGPT is a great way to understand how to work with this powerful tool.

March 2023

GPT-4

GPT-4 came out in March 2023 and is “more reliable, creative, and able to handle much more nuanced instructions than GPT-3.5,” according to the OpenAI blog about the release. In the video below, Greg Brockman, President and Co-Founder of OpenAI, shows how GPT-4 handles prompts in comparison to GPT-3.5.

May 2024

GPT-4o

OpenAI announced its new AI model, GPT-4o, where the “o” stands for “omni.” It can respond to audio input in a few hundred milliseconds, roughly as fast as a human in conversation, and it has even more advanced vision and audio capabilities.

TBD

GPT-5

An official release date for GPT-5 hasn’t been announced yet.

What to expect from GPT-5

Even more multimodality

When Bill Gates had Sam Altman on his podcast in January, Sam said that “multimodality” will be an important milestone for GPT in the next five years. In an AI context, multimodality describes an AI model that can accept and generate more than just text: other types of input and output such as images, speech, and video.

In September 2023, OpenAI announced ChatGPT’s enhanced multimodal capabilities, enabling you to have a verbal conversation with the chatbot, while GPT-4 with Vision can interpret images and respond to questions about them. And in February, OpenAI introduced a text-to-video model called Sora, which is currently not available to the public.

The newest model, GPT-4o, uses a single neural network to process all of these input types: audio, vision, and text. For example, you could use your device’s camera to show ChatGPT an object and say, “I’m learning Spanish, how do you say the name of this item in Spanish?” The model identifies the object and answers in Spanish almost immediately. Take a look at OpenAI’s demo video to see it in action.
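For developers, here’s a rough sketch of what that same request could look like through the API, using the official openai Python SDK’s chat completions endpoint with an image attached. The image URL is a placeholder, and exact model names and message formats are whatever OpenAI currently documents:

```python
# Hedged sketch: ask GPT-4o to name an object in Spanish from a photo.
# Assumes the openai Python SDK (v1+) and an OPENAI_API_KEY in the environment.
# The image URL below is a placeholder, not a real asset.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "I'm learning Spanish. How do you say the name of this item in Spanish?",
                },
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo-of-an-apple.jpg"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```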

Future GPT upgrades will expand on the modalities that ChatGPT can work with: “Clearly, people really want that,” Sam said on the podcast Unconfuse Me. “We’ve launched images and audio, and it had a much stronger response than we expected.”

Improved “reasoning” and accuracy

AI systems can’t reason, understand, or think — but they can compute, process, and calculate probabilities at a high level that’s convincing enough to seem human-like. And these capabilities will become even more sophisticated with the next GPT models.

“Maybe the most important areas of progress will be around reasoning ability,” Sam said on Unconfuse Me. “Right now, GPT-4 can reason in only extremely limited ways.” GPT-4o’s reasoning performance is on par with GPT-4 Turbo’s, and it can answer general knowledge questions with 87.2% accuracy. GPT-5 will likely be able to solve problems with greater accuracy because it’ll be trained on even more data with the help of more powerful computation.

“Pre-training” is the stage in which the model learns from training data to predict the next token in a sequence, building the probability distributions it later samples from to generate text. The more diverse and robust the training data is, the better the AI is at generating new content.
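To make “generating probability distributions” concrete, here’s a tiny illustration using the open-source GPT-2 model from Hugging Face (a stand-in, not OpenAI’s internal code): after pre-training, the model assigns a probability to every possible next token given a prompt.

```python
# Toy illustration of next-token probability distributions using GPT-2
# from Hugging Face (an open-source stand-in; not OpenAI's training code).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Scores for the token that would come next, turned into probabilities
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Show the five most likely continuations
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```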

One thing to keep an eye on is the context window, Joe says. A token is a chunk of text, usually a little smaller than a word, that’s represented numerically when it’s passed to the model. “It’s basically how the model understands language,” Joe says. Every model has a context window that represents how many tokens it can process at once. GPT-4o currently has a context window of 128,000 tokens, while Google’s Gemini 1.5 has a context window of up to 1 million tokens.

“If GPT-5 makes similarly huge context available to the public I think it’ll have profound implications for research, learning, and analysis across a variety of domains,” Joe says. “You’ll be able to paste huge amounts of knowledge into a single question you’re asking the model, saving countless hours and dramatically increasing the productivity of knowledge work.”
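If you want to see tokens in practice, here’s a small sketch that counts how many tokens a document would use before you paste it into a prompt. It assumes OpenAI’s open-source tiktoken tokenizer; the file name is a placeholder, and the 128,000-token figure comes from OpenAI’s published GPT-4o specs:

```python
# Hedged sketch: estimate how much of GPT-4o's context window a document uses.
# Assumes the tiktoken package; the file name below is a placeholder.
import tiktoken

CONTEXT_WINDOW = 128_000  # GPT-4o's context window, per OpenAI

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Return how many tokens `text` would use for the given model."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Older tiktoken versions may not recognize "gpt-4o"; fall back to an
        # approximate general-purpose encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

document = open("research_notes.txt").read()  # placeholder file
used = count_tokens(document)
print(f"{used:,} tokens used; roughly {CONTEXT_WINDOW - used:,} left for the rest of the prompt and the answer")
```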

Customization capabilities

The ability to customize and personalize GPTs for specific tasks or styles is one of the most important areas of improvement, Sam said on Unconfuse Me. Currently, OpenAI allows anyone with ChatGPT Plus or Enterprise to build and explore custom “GPTs” that incorporate instructions, skills, or additional knowledge. Codecademy actually has a custom GPT (formerly known as a “plugin”) that you can use to find specific courses and search for Docs. Take a look at the GPT Store to see the creative GPTs that people are building.

Sam hinted that future iterations of GPT could allow developers to incorporate users’ own data. “The ability to know about you, your email, your calendar, how you like appointments booked, connected to other outside data sources, all of that,” he said on the podcast.

How to use GPT-5

The release date for GPT-5 hasn’t been announced yet, but it’s safe to say that it’s in the works. (OpenAI had been working on GPT-4 for at least two years before it officially launched.)

GPT-4o will be available to everyone, including people on ChatGPT’s free tier (paid ChatGPT Plus subscribers get higher message limits). Additionally, developers can access GPT-4o through the API as a text and vision model. You can select the model you want to work with from a dropdown menu in ChatGPT:

You can change the model you work with in ChatGPT by clicking the dropdown menu.
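On the API side, a quick way to check whether your key can use GPT-4o is to list the models available to your account. This is a minimal sketch using the openai Python SDK; it assumes an OPENAI_API_KEY in your environment:

```python
# Hedged sketch: confirm "gpt-4o" is available to your API key before calling it.
# Assumes the openai Python SDK (v1+) and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()

available_models = [model.id for model in client.models.list()]
print("gpt-4o available:", "gpt-4o" in available_models)
```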

TL;DR

So, what does all this mean for you, a programmer who’s learning about AI and curious about the future of this amazing technology? The upcoming model GPT-5 may offer significant improvements in speed and efficiency, so there’s reason to be optimistic and excited about its problem-solving capabilities. But it’s not going to instantly change the world.

“People have these unrealistic expectations that GPT-5 is going to be doing back flips in the background in my bedroom while it also writes all my code for me and talks on the phone with my mom or something like that,” Logan Kilpatrick, Head of DevRel at OpenAI, said on an episode of Lenny’s Podcast. “I’m like, ‘That’s not the case.’ It’s just going to be this very effective tool, very similar to GPT-4, and it’s also going to become very normal very quickly.”

It’s crucial to view any flashy AI release through a pragmatic lens and manage your expectations. As AI practitioners, it’s on us to be careful, considerate, and aware of the shortcomings whenever we’re deploying language model outputs, especially in contexts with high stakes.

The best way to prepare for GPT-5 is to keep familiarizing yourself with the GPT models that are available. You can start by taking our AI courses that cover the latest AI topics, from Intro to ChatGPT to Build a Machine Learning Model and Intro to Large Language Models. We also have AI courses and case studies in our catalog that incorporate a chatbot that’s powered by GPT-3.5, so you can get hands-on experience writing, testing, and refining prompts for specific tasks using the AI system. For example, in Pair Programming with Generative AI Case Study, you can learn prompt engineering techniques to pair program in Python with a ChatGPT-like chatbot. Look at all of our new AI features to become a more efficient and experienced developer who’s ready once GPT-5 comes around.

This blog was originally published in March 2024 and has been updated to include new details about GPT-4o, the latest release from OpenAI.
