Was GPT-4.5 a failure?

Image: OpenAI logo on an abstract background (source: 123RF, with modifications)

OpenAI finally released GPT-4.5, the successor to GPT-4o, on February 27. And to put it mildly, the model was… underwhelming. 

GPT-4.5 is the largest model OpenAI has developed. While there is very little information about the model’s architecture and training data, we know that the training was so intensive that it required OpenAI to spread it across several data centers to get it done on time. 

The model’s pricing is insane: 15-30X more expensive than GPT-4o, 3-5X more than o1, and 10-25X more than Claude 3.7 Sonnet. It is currently only available to ChatGPT Pro subscribers ($200 per month) and API customers who pay for access on a per-token basis. Meanwhile, the model’s results are not impressive: it shows only modest gains over GPT-4o and lags behind o1 and o3-mini on reasoning tasks.
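For a rough sense of where those multiples come from, here is a minimal sketch that derives them from per-million-token list prices. The prices in the dictionary are assumptions (approximate published API rates at the time of writing), not figures taken from the benchmarks discussed here:

```python
# Rough cost-multiple calculator. The per-1M-token prices below are
# assumptions (approximate API list prices at the time of writing),
# not authoritative figures.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),
    "o1": (15.00, 60.00),
    "claude-3.7-sonnet": (3.00, 15.00),
}

base_in, base_out = PRICES["gpt-4.5"]
for model, (p_in, p_out) in PRICES.items():
    if model == "gpt-4.5":
        continue
    print(f"GPT-4.5 vs {model}: "
          f"{base_in / p_in:.1f}x on input tokens, "
          f"{base_out / p_out:.1f}x on output tokens")
```

Under those assumed prices, the script prints multiples roughly in line with the ranges quoted above; actual costs will depend on your input/output token mix.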

To be fair, OpenAI did not market GPT-4.5 as its best model (in fact, an initial version of its blog post stated that it was not a “frontier model”). It is also not a reasoning model, which is why the comparisons against models such as o3 and DeepSeek-R1 might not be fair. According to OpenAI, GPT-4.5 will be its last non-chain-of-thought model, which means it has only been trained to internalize world knowledge and user preferences.

What is GPT-4.5 good for?

Bigger models have a greater capacity to absorb knowledge. GPT-4.5 hallucinates less often than other models, making it suitable for tasks where adherence to facts and contextual information is crucial. It also shows a better capacity to follow user instructions and preferences, as demonstrated by OpenAI’s demos and experiments shared by users online.

There is also a debate over whether it can generate better prose. OpenAI execs have certainly been praising the model for the quality of its responses. OpenAI CEO Sam Altman said, “trying GPT-4.5 has been much more of a ‘feel the AGI’ moment among high-taste testers than i expected!”

But online reaction has been mixed. Andrej Karpathy, AI scientist and OpenAI co-founder, said he “expect[s] to see an improvement in tasks that are not reasoning heavy, and I would say those are tasks that are more EQ (as opposed to IQ) related and bottlenecked by e.g. world knowledge, creativity, analogy making, general understanding, humor, etc.”

However, in a later poll Karpathy ran on sample outputs, users generally preferred GPT-4o’s answers over GPT-4.5’s. Writing quality is subjective, and it is likely that with the right prompt engineering techniques and tweaks, you can get a much smaller model to produce the quality of output you need (see the sketch after Karpathy’s comment below).

As Karpathy said, “Either the high-taste testers are noticing the new and unique structure but the low-taste ones are overwhelming the poll. Or we’re just hallucinating things. Or these examples are just not that great. Or it’s actually pretty close and this is way too small sample size. Or all of the above.”
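As an illustration of the prompt engineering point, here is a minimal sketch of steering a smaller model’s prose with a detailed style prompt instead of paying for a larger model. It assumes the OpenAI Python SDK and an `OPENAI_API_KEY` in the environment; the model name and style instructions are illustrative, not a recommendation from this article:

```python
# A minimal sketch: steering a smaller model's prose with a detailed
# style prompt. The model name and style instructions are assumptions
# chosen for illustration.
from openai import OpenAI

client = OpenAI()

STYLE_PROMPT = (
    "You are a careful prose editor. Write in short, concrete sentences, "
    "avoid filler adjectives, vary sentence rhythm, and end with a single "
    "vivid image. Do not use bullet points."
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # a much smaller (and cheaper) model than GPT-4.5
    temperature=0.9,      # allow some creative variation
    messages=[
        {"role": "system", "content": STYLE_PROMPT},
        {"role": "user", "content": "Write a 120-word reflection on why people anthropomorphize chatbots."},
    ],
)

print(resp.choices[0].message.content)
```

Whether the result matches GPT-4.5’s “high-taste” output is exactly the subjective question the poll highlighted, but iterating on the system prompt is far cheaper than switching models.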

Was GPT-4.5 worth it?

In some ways, GPT-4.5 shows the limits of the scaling laws. During a talk at NeurIPS 2024, Ilya Sutskever, another OpenAI co-founder and former chief scientist, said, “Pre-training as we know it will unquestionably end… We’ve achieved peak data and there’ll be no more. We have to deal with the data that we have. There’s only one internet.”

The diminishing returns of GPT-4.5 are a testament to the limitations of scaling general-purpose models that are pre-trained on internet data and post-trained for alignment through reinforcement learning from human feedback (RLHF). The next step for LLMs is test-time scaling (or inference-time scaling), where the model is trained to “think” longer by generating chain-of-thought (CoT) tokens. Test-time scaling improves models’ ability to solve reasoning problems and is key to the success of models such as o1 and R1.
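To make the “thinking longer” idea concrete, here is a minimal sketch that asks the same question with and without an explicit chain-of-thought prompt and compares how many completion tokens are spent. It assumes the OpenAI Python SDK, an `OPENAI_API_KEY` in the environment, and `gpt-4o-mini` as a stand-in model. Note that reasoning models such as o1 and R1 are trained to generate these tokens internally rather than relying on prompting, so this only illustrates the token-budget trade-off, not how those models are implemented:

```python
# A minimal sketch of the test-time-scaling idea: spend more output tokens
# on the same question by eliciting a chain of thought before the answer.
# Model name and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
QUESTION = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
            "more than the ball. How much does the ball cost?")

def ask(system_prompt: str) -> tuple[str, int]:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": QUESTION},
        ],
    )
    return resp.choices[0].message.content, resp.usage.completion_tokens

direct, direct_tokens = ask("Answer with the final result only.")
cot, cot_tokens = ask("Reason step by step, then state the final answer.")

print(f"Direct ({direct_tokens} completion tokens): {direct}")
print(f"Chain-of-thought ({cot_tokens} completion tokens): {cot}")
```

The second call burns more inference-time compute on the same question; the bet behind o1-style models is that this extra compute, learned rather than prompted, buys better reasoning.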

However, this doesn’t mean that GPT-4.5 was a failure. A strong knowledge foundation is necessary for the next generation of reasoning models. While GPT-4.5 per se might not be the go-to model for most tasks, it can become the foundation for future reasoning models (and might already be used in models such as o3).
As Mark Chen, OpenAI’s Chief Research Officer, said in an interview following the release of GPT-4.5, “You need knowledge to build reasoning on top of. A model can’t go in blind and just learn reasoning from scratch. So we find these two paradigms to be fairly complementary, and we think they have feedback loops on each other.”
