What to know about OpenAI o3-mini


OpenAI has just released its latest reasoning model, o3-mini. The model is faster, cheaper, and sometimes smarter than o1. o3-mini is remarkable on its own, but perhaps more important is the timing of its release.

The release of DeepSeek-R1 has sent shockwaves across the tech industry and called the dynamics of the AI market into question. DeepSeek-R1 was reportedly trained at a fraction of the cost of state-of-the-art models, casting doubt on the competitive advantage and business models of companies like OpenAI.

o3-mini might be OpenAI’s response to this trend, a signal that it still has the edge, especially as it prepares to raise another round of funding at a record valuation of $300 billion, which would make it one of the most valuable private companies in the world.

What is o3-mini?

o3-mini is a reasoning model, which means it uses inference-time scaling techniques to review and revise its responses. Instead of answering the prompt in one go, it uses more compute cycles to analyze the problem, generate multiple answers, and choose the best one. In the ChatGPT interface, this process is shown as a collapsible “thinking” section. o3-mini and the yet unreleased full version of o3 are especially useful for tasks that have clear and measurable outcomes, such as coding and data analysis.

o3-mini supports three reasoning-effort modes: low, medium, and high. The mode determines how much compute the model can use to refine its answer. Giving the model more resources improves its responses but also makes it slower.

According to figures released by OpenAI, o3-mini outperforms the full-scale version of o1 on advanced benchmarks, including math (AIME and FrontierMath), science (GPQA), and coding (CodeForces and SWE-Bench).

More importantly, o3-mini is much cheaper than o1. It costs $1.10 per million input tokens and $4.40 per million output tokens against $15 and $60 for o1. (For reference, DeepSeek-R1 costs $2.19 per million tokens on the DeepSeek cloud and around $7-8 on U.S. providers.) 

This makes o3-mini very competitive and could help OpenAI regain some of the market it has lost to DeepSeek. Microsoft has already added support for DeepSeek-R1 on its Azure AI Foundry, and Perplexity supports Pro Search with DeepSeek-R1.

o3-mini in action

Mastering benchmarks is impressive, but more important is how the models perform on real-world tasks. My previous experiments with o1 and R1 showed that reasoning models still have a long way to go before they can be very reliable in the messiness of the real world. 

I tested o3-mini with the ChatGPT Plus subscription on a task that requires the model to analyze data from the web. Ideally, the application should be able to retrieve the data itself. Unfortunately, o3-mini in ChatGPT Plus does not support web search or file attachments, so I had to paste the data into my prompt. (As a side note, many failures in LLM applications are caused by poor retrieval.)

In my experiments, I provided the model with raw stock price data from Yahoo Finance. I browsed the price history of each of the Mag 7 stocks, filtered at monthly intervals from January 1, 2024, to the current date. I then copied the <table> element that contained each stock’s data into a single text file, prepending each table with the name of the stock. The result was a messy file that mixed plain text with HTML elements.
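Had I wanted to clean this up programmatically instead of handing it to the model, the structure is regular enough to parse with a couple of regular expressions. A minimal sketch, where the tickers, column layout, and prices are illustrative stand-ins rather than the exact Yahoo Finance markup:

```python
import re

# A tiny stand-in for the pasted file: ticker names followed by raw
# <table> markup (illustrative layout, not the exact Yahoo Finance HTML).
raw = """AAPL
<table><tr><th>Date</th><th>Close</th></tr>
<tr><td>Jan 1, 2024</td><td>185.64</td></tr>
<tr><td>Feb 1, 2024</td><td>186.86</td></tr></table>
MSFT
<table><tr><th>Date</th><th>Close</th></tr>
<tr><td>Jan 1, 2024</td><td>376.04</td></tr>
<tr><td>Feb 1, 2024</td><td>411.22</td></tr></table>"""

# Pair each ticker line with the <table> block that follows it.
block = re.compile(r"^([A-Z]+)\s*\n(<table>.*?</table>)", re.M | re.S)
# Extract (date, close) cell pairs from each table row.
row = re.compile(r"<td>([^<]+)</td>\s*<td>([\d.]+)</td>")

prices = {}
for ticker, table_html in block.findall(raw):
    prices[ticker] = [(date, float(close)) for date, close in row.findall(table_html)]

print(prices["AAPL"])  # [('Jan 1, 2024', 185.64), ('Feb 1, 2024', 186.86)]
```

This kind of deterministic preprocessing is usually cheaper and more reliable than asking the model to wade through raw HTML, but part of the point of the experiment was to see whether o3-mini could handle the noise itself.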

I gave the model the following prompt:

The following data contains the stock prices of mag 7 companies from the beginning of each month from January 2024 until today. Say I put 140 USD in stocks on the first day of the month from January to December 2024. I split the money evenly across all Mag 7 stocks. How much would my investment be worth on the latest registered date in the tables?
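Underneath, the task is a dollar-cost-averaging calculation: each month, $140 is split into $20 per stock, buys shares at that month’s price, and the accumulated shares are valued at the latest recorded price. A minimal sketch of the arithmetic, using two stocks and made-up prices rather than the real Mag 7 data:

```python
# Illustrative monthly prices per stock (made-up numbers, not real data).
prices = {
    "AAPL": [100.0, 110.0, 120.0],  # price on the first day of each month
    "MSFT": [200.0, 190.0, 210.0],
}
monthly_budget = 140.0
per_stock = monthly_budget / len(prices)  # even split across stocks

# Each month, buy per_stock dollars' worth of shares at that month's price.
shares = {t: sum(per_stock / p for p in series) for t, series in prices.items()}

# Value the accumulated shares at the latest price in each series.
portfolio_value = sum(shares[t] * prices[t][-1] for t in prices)
print(round(portfolio_value, 2))  # 451.23
```

The calculation itself is trivial; what makes the task hard for a model is extracting the right numbers from the noisy input and keeping the per-month bookkeeping straight across seven stocks.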

In my experiments with previous models, I got mixed results with the same prompt. More importantly, I was concerned by the lack of transparency in the previous generation, o1. Because OpenAI mostly hides o1’s reasoning trace, it is very difficult to troubleshoot the model’s mistakes and determine how to improve the prompt to get better results.

o3-mini performed impressively. It was able to parse the data (and there is a lot of noise in there), reason over how the investment would be calculated across the months, and compute the final value of the portfolio. It arrived at the right answer and provided a detailed explanation of its calculation. More impressively (at least in my experience), the reasoning chain is now much more detailed and actually contains useful information, whereas before it showed only vague titles for the different stages of reasoning.

I then prompted o3-mini to create an interactive HTML file that visualized the growth of the stock prices. It did so perfectly. I then iterated on the output, adding charts that required more complex calculations, such as stock price growth relative to the base price (January 1, 2024) and stock growth relative to the latest price (i.e., if you had invested in stock X on date Y, how much return would you see on your investment today?). I followed up with additional queries, such as what my returns would be if I had invested the yearly amount for each stock (12 × $20 = $240) at its lowest price. o3-mini nailed all of them.
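The two relative-growth charts reduce to simple ratios over each price series. A sketch with a single illustrative series (made-up numbers, rounded to keep the floats readable):

```python
# Illustrative monthly closing prices for one stock (made-up numbers).
prices = [100.0, 110.0, 120.0, 132.0]

base, latest = prices[0], prices[-1]

# Growth of each month's price relative to the base price (January 1, 2024).
growth_vs_base = [round(p / base - 1, 4) for p in prices]

# Return you would see today if you had bought on month i instead.
return_if_bought_then = [round(latest / p - 1, 4) for p in prices]

print(growth_vs_base)         # [0.0, 0.1, 0.2, 0.32]
print(return_if_bought_then)  # [0.32, 0.2, 0.1, 0.0]
```

Again, the math is elementary; the test is whether the model can apply it correctly to every series in the file and wire the results into working chart code.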

I don’t know whether OpenAI has discovered a method to reduce the costs of its models or is accepting losing money to avoid losing market share. But in any case, o3-mini is an impressive comeback for the leading AI lab at a time of uncertainty and shaky alliances.
