ChatGPT Now Shows You Its Thought Process
o1, OpenAI’s latest generative artificial intelligence model, is available now. On Thursday, the company announced o1-preview and o1-mini, marking a departure from the GPT naming scheme. There’s a good reason for this: OpenAI says that, unlike its other models, o1 is designed to spend more time “thinking” about problems before returning results, and will also show you how it solved your problem.
In its announcement, OpenAI said this new “thought process” helps its models try new tactics and reflect on their mistakes. According to the company, o1 performs “as well as graduate students” in biology, chemistry, and physics. While GPT-4o solved 13% of problems on a qualifying exam for the International Mathematics Olympiad, o1 reportedly solved 83%. The company also emphasized that the models excel at coding and programming. All this “thinking” means that o1 takes longer to respond than previous models.
As OpenAI research director Jerry Tworek told The Verge, o1 learns through reinforcement learning. Instead of looking for patterns in a training set, o1 learns through “rewards and punishments.” OpenAI doesn’t reveal the specific methodology, but claims this new thinking model hallucinates less than previous models, though it still hallucinates.
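OpenAI hasn’t published o1’s training details, but the basic idea of learning from rewards and punishments, rather than from patterns in labeled data, can be illustrated with a toy example. Everything below (the two-armed bandit setup, the exploration rate) is my own illustration, not OpenAI’s method:

```python
import random

def run_bandit(steps=2000, seed=0):
    """Toy two-armed bandit: the agent learns only from reward signals,
    with no labeled training examples to pattern-match against."""
    rng = random.Random(seed)
    true_p = [0.3, 0.8]   # hidden reward probabilities; arm 1 is better
    values = [0.0, 0.0]   # the agent's running value estimate per arm
    counts = [0, 0]
    for _ in range(steps):
        # Epsilon-greedy: mostly exploit the best-known arm, sometimes explore.
        if rng.random() < 0.1:
            arm = rng.randrange(2)
        else:
            arm = values.index(max(values))
        # A "reward" or a "punishment," depending on the outcome.
        reward = 1.0 if rng.random() < true_p[arm] else -1.0
        counts[arm] += 1
        # Incremental average: nudge the estimate toward the observed reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values

vals = run_bandit()
print(vals.index(max(vals)))  # the agent ends up preferring arm 1
```

The point of the sketch is the feedback loop: behavior that earns rewards gets reinforced, behavior that earns punishments gets discouraged, and no example answers are ever shown to the learner.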
There are two versions of o1: o1-preview, a fully functional version of the model, and o1-mini, a lightweight version built on the same framework. The company is reportedly shipping these models at an early stage of development, and says that this is why they don’t include standard GPT features such as web access and file and image uploading.
o1-preview thinks a hot dog is a sandwich?
I admit, I’m not a programmer, and I don’t have to solve many complex math problems on a daily basis. That makes it difficult to properly test OpenAI’s latest models on their proposed strengths and use cases. What I can appreciate, on the non-technical side, is o1-preview’s thought process: when you submit a prompt, the new model displays a status message (e.g., “Thinking…”) while it processes the question. When finished, it displays the results as you’d expect, but with a drop-down menu above them.
I used OpenAI’s suggested prompt “Is a hot dog a sandwich?” and the response was preceded by the message “Thought for 4 seconds.” (Its answer, by the way, consisted of three paragraphs of “it depends.”)
Anyway, when I clicked the “Thought for 4 seconds” dropdown, I saw the model’s logic: for this prompt, it broke its process into two parts. The first, “Question Analysis,” says, “Okay, let me see. Asking whether a hot dog is a sandwich involves understanding semantics and considering OpenAI’s policies, focusing on accuracy, and avoiding personal opinions or prohibited content.” The second, “Exploring Definitions,” reads: “I consider whether a hot dog is a sandwich by looking at definitions and cultural views. This opens up space for discussion.” I suppose that’s all that’s needed to answer the question.
What about tacos? Are they sandwiches?
I also asked o1 for its opinion on another controversial food-related issue: is a taco a sandwich? The model had a lot to say. After thinking for a full five seconds, the AI returned a 364-word answer. Its thought process included focusing on definitions, refining them (“I define a taco by its basic ingredients: tortilla, filling, and sauce. This helps determine whether it meets the definition of a sandwich.”), and exploring viewpoints (“I look into the classification of tacos and sandwiches, highlighting their culinary differences: tacos use tortillas, sandwiches use bread; tacos have cultural roots in Mexican cuisine, while sandwiches come from European influences.”)
While acknowledging that this is “a topic of debate,” the model noted that the answer depends on definitions drawn from culinary traditions, cultural contexts, and even legal interpretations, and it weighed “key differences” (notably, there is no bread in a taco: a sandwich involves placing ingredients between pieces of bread, while a taco involves placing ingredients on a tortilla).
All things considered, o1 concluded that a taco is not a sandwich, according to “most culinary experts and food enthusiasts”—even citing a court case in which a judge ruled that a burrito is not a sandwich. (Here’s the context if you’re interested.)
But is a taco a hot dog?
As a follow-up, I asked o1 if it would classify a taco as a hot dog. Nine seconds later, it gave the definitive answer: “Although both tacos and hot dogs involve placing fillings inside a bread form or bread-like base, they are not the same and belong to different culinary categories.” There you have it, internet. You can stop arguing about this one.
o1 can also perform more complex non-sandwich tasks
Let’s try another one. I chose the second prompt suggested by OpenAI: “Create me a 6×6 nonogram puzzle where the solved grid looks like the letter Q.”
As you’d expect from a more demanding request, o1-preview took longer to process this task—84 seconds, to be exact. It presented just such a puzzle, with instructions on how to solve it. Clicking the drop-down menu revealed 36 separate thoughts the model worked through while the prompt was running. In the “Puzzle Formulation” section, the bot said, “I’m going through the process of creating a 6×6 nonogram whose solution will be the letter Q. We need to design a grid, get hints, and present the puzzle for a solution.” It then tries to figure out how to include the “tail” of the letter Q in the image. It decides that it needs to adjust the bottom row of the layout to add the tail before continuing to figure out how to complete the puzzle.
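The puzzle-construction steps the model describes—design a grid, then derive the hints—are themselves easy to express in code. A nonogram hint is just the sequence of run lengths of filled cells in each row or column. Here’s a minimal sketch; the Q-shaped grid below is my own illustrative design, not the one o1 produced:

```python
def nonogram_hints(grid):
    """Compute nonogram hints for each row: the lengths of
    consecutive runs of filled (truthy) cells, left to right."""
    hints = []
    for row in grid:
        runs, count = [], 0
        for cell in row:
            if cell:
                count += 1
            elif count:
                runs.append(count)
                count = 0
        if count:
            runs.append(count)
        hints.append(runs or [0])  # an empty row gets the hint [0]
    return hints

# A hypothetical 6x6 letter-Q shape (1 = filled cell).
Q = [
    [0, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],  # bottom row adjusted to carry the Q's "tail"
]

row_hints = nonogram_hints(Q)
col_hints = nonogram_hints(list(zip(*Q)))  # transpose the grid for columns
```

Presenting `row_hints` and `col_hints` (without the grid) is what turns the picture into a solvable puzzle, which matches the “design a grid, get hints” plan o1 narrated.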
It’s definitely fun to scroll through each step of o1-preview. OpenAI apparently taught the model to use words and phrases like “OK,” “hmm,” and “I’m wondering” when “thinking,” perhaps to make the model sound more human. (Is this really what we want from an AI?) However, if the query is too simple and only takes a couple of seconds to solve, it won’t show its work.
It’s still very early days, so it’s hard to say whether o1 represents a significant leap over previous AI models. We’ll have to see whether this new “thinking” actually improves on the usual tells that give away when a piece of text was AI-generated.
How to try OpenAI o1 models
These new models are available now, but you’ll need to be an eligible user to try them out. That means having a ChatGPT Plus or ChatGPT Team subscription. If you’re a ChatGPT Enterprise or ChatGPT Edu user, the models should be available next week. Users of the free version of ChatGPT will receive o1-mini at some point in the future.
If you have one of these subscriptions, you’ll be able to select o1-preview and o1-mini from the model drop-down menu when you start a chat. OpenAI says that at launch the weekly limits are 30 messages for o1-preview and 50 for o1-mini. If you plan to test these models frequently, keep that in mind before using up all your messages on day one.