OpenAI Launches STEM-Optimized 'Reasoning' AI Model

OpenAI has launched a new family of AI models that are optimized for "reasoning-heavy" tasks like math, coding and science.

OpenAI o1-preview and its lighter-weight counterpart, OpenAI o1-mini, use "chain of thought" reasoning to answer prompts. As a result, they may take longer to solve problems, but they are more likely to provide accurate outputs, particularly in response to complex, multistep problems. "Through training, they learn to refine their thinking process, try different strategies, and recognize their mistakes," OpenAI said in a blog post.

Based on reports, "o1" is the public name for "Strawberry," the top-secret AI project that OpenAI has been working on since at least last year, when it was internally labeled "Q-star."

Though the primary o1 model is still in preview, it represents an important step on OpenAI's road to artificial general intelligence (AGI). According to OpenAI's testing, when it exits preview, o1 will significantly outperform GPT-4o and be on par with human experts when asked to solve complex math, chemistry, physics and biology problems:

"OpenAI o1 ranks in the 89th percentile on competitive programming questions (Codeforces), places among the top 500 students in the US in a qualifier for the USA Math Olympiad (AIME), and exceeds human PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA)."

o1 also appears better at warding off jailbreak attacks, which are designed to make AI systems violate their own safeguards around security and responsible use. In what OpenAI called one of its "hardest jailbreaking tests," GPT-4o scored 22 (on a 0-100 scale) compared to o1-preview's 84. OpenAI attributed the improvement to its decision to train o1 to incorporate the company's model behavior policies into its chain of reasoning.

"By teaching the model our safety rules and how to reason about them in context, we found evidence of reasoning capability directly benefiting model robustness," OpenAI said. "We believe that using a chain of thought offers significant advances for safety and alignment because (1) it enables us to observe the model thinking in a legible way, and (2) the model reasoning about safety rules is more robust to out-of-distribution scenarios."

The o1 family does have its shortcomings. o1-preview is not yet feature-complete: it lacks multimodal support and web browsing capabilities. "For many common cases GPT-4o will be more capable in the near term," said OpenAI. Meanwhile, o1-mini is less useful for non-STEM prompts, such as those that require "broad world knowledge."

OpenAI expects to issue regular updates to improve the models. Meanwhile, it said, "We believe o1 — and its successors — will unlock many new use cases for AI in science, coding, math, and related fields."

Both o1-preview and o1-mini are now available to ChatGPT Plus and Team users, while ChatGPT Enterprise and Edu users will get access sometime next week. Non-paying users of ChatGPT will eventually get access to o1-mini, though OpenAI did not provide a timeframe for this.

More information is available on the OpenAI site.

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.
