OpenAI Promises Next ChatGPT Model Will Reason Better

OpenAI has unveiled a new model for its products that will be available to users towards the end of January 2025: it’s called o3 (we seem to have skipped o2), and it promises another significant step forward in how artificial intelligence reasons. It will make tools like ChatGPT better than ever at programming and solving math problems, the developers say.

OpenAI CEO Sam Altman called o3 “incredibly smart” in a video announcing the model, released as part of his company’s 12 Days of OpenAI event during the holiday season. Before its full launch, the model is undergoing various safety tests; at first, it will most likely be available only to paying ChatGPT Plus users.

According to OpenAI, the o3 model is more than 20 percent better at coding than the previous o1 model, as measured by the SWE-bench Verified test. It also performs well on math and science problems, at least according to benchmark tests. Like the o1 model, o3 is trained to think and reason before answering, carefully checking the accuracy of its answers. Along with the major update, OpenAI will also release a smaller and faster o3-mini model.

Completing this pattern by filling in squares in dark blue is simple for humans but difficult for AI, and it is the kind of task that is part of ARC.

We won’t know how good o3 is until users can actually test it, but we already have an idea of what it is capable of, because it was tested on the well-known Abstraction and Reasoning Corpus (ARC) challenge. The challenge is designed to track AI’s progress towards artificial general intelligence (AGI), a somewhat controversial concept in which AI’s cognitive abilities surpass those of humans.

This challenge forces the AI to come up with new approaches to solving problems rather than simply relying on its memory, and involves a series of visual tasks that the models must complete. They must match patterns in colored grids: exercises that humans could do easily without any training, but that are difficult for AI to understand.

Within the computing power limits of the ARC test, o3 scored 75.7%. This is much higher than the 5% achieved by GPT-4o, which is currently the best ChatGPT model available to free users. Although this is not yet AGI (the model still scores below human level and cannot complete all tasks), it is an impressive step forward.

o1 and o1-mini are currently available to ChatGPT Plus users.

“OpenAI’s new o3 model represents a significant step forward in AI’s ability to adapt to new tasks,” writes François Chollet, the software engineer who developed the ARC benchmark. “This is not just an incremental improvement, but a real breakthrough, marking a qualitative shift in AI capabilities compared to the previous limitations of LLMs.”

As you might expect, OpenAI didn’t talk about the power requirements of AI, the ethics of training AI on public data that may be copyrighted, or the tendency of these models to hallucinate incorrect answers. Although o3’s extra thinking time should mean fewer errors, they will not be eradicated. The company did mention expanding its safety testing program to prevent these models from being used for malicious purposes.

The ability of AI models to truly “think” or “reason” (or at least to approximate these human capabilities) will no doubt continue to be debated as AI advances. Google also just unveiled its Gemini 2.0 model, which it says offers enhanced reasoning.
