This article is part of our coverage of the latest in AI research.
In recent years, AI research has been dominated by an arms race toward scale: the “bigger is better” paradigm. Leading AI laboratories and tech giants are locked in competition to build ever-larger compute clusters and train increasingly massive machine learning models on ever-growing datasets.
However, a new paper argues that the “bigger is better” paradigm also has undesirable consequences, including damage to the long-term progress of AI. “The perceived success of these approaches has further entrenched the assumption that bigger-is-better in AI,” the authors write. “We argue not that scale isn’t useful, in some cases, but that there is too much focus on scale and not enough value on other research.”
Narrow research
The bigger-is-better mentality has created a self-reinforcing cycle in AI research: it establishes norms that reward scale-focused work and creates benchmarks that, in turn, demand even larger models and greater computational resources. At the same time, AI conferences and journals are pushed to demand very large-scale experiments so that results can be compared against existing models.
This comes at the cost of research directions that are not focused on scale but could otherwise yield very promising results. Currently, only a small fraction of funding and resources goes to this kind of research.
Even the recent interest in small language models (SLMs) that can fit on edge devices and run on consumer-grade hardware (e.g., phones and laptops) is not immune to the bigger-is-better mentality. Training SLMs still requires large amounts of compute and data.
“This dynamic sees academia increasingly marginalized in the AI space and reliant on corporate resources to participate in large-scale research and development of the kind that is likely to be published and recognized,” the researchers write. “And it arguably disincentives work that could challenge the bigger-is-better paradigm and the actors benefiting from it.”
Inefficient solutions
The bigger-is-better paradigm not only creates accessibility barriers but also frequently leads to inefficient solutions. Large language models (LLMs) show impressive results on many different tasks, but despite their capabilities, they are not necessarily the best model for every machine learning problem.
For example, decision trees and XGBoost are better suited to tabular data and can outperform LLMs at a fraction of the computational cost. Even smaller BERT-based transformer models can be much more suitable than autoregressive LLMs for some language-related tasks, such as document classification.
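As a rough illustration of that point (this example is not from the paper, and the dataset and hyperparameters are placeholders chosen for brevity), a gradient-boosted tree baseline for tabular data can be trained in seconds on a laptop CPU:

```python
# Illustrative sketch: a gradient-boosted tree baseline for tabular data.
# Dataset and hyperparameters are placeholders, not taken from the paper.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Small, well-known tabular dataset used purely for demonstration
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A few hundred shallow trees train in seconds on commodity hardware.
model = XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```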
In many cases, ML applications require simple and cost-effective solutions for narrow and well-defined problems. As costly, general-purpose models, LLMs are unsuitable for such applications.
Centralized power
One of the important points the paper raises is that the growth in model parameters, data, and compute is outpacing improvements in the hardware used to train and serve the models. The authors point out that while the size of large models doubles every five months, the cost of memory has not changed much in a decade. Nvidia accelerators, the industry standard for training models, cost tens of thousands of dollars each, and training a model requires large clusters of such cards. Meanwhile, big tech companies are spending large sums on more accelerators to speed up the training of the next generation of models and maintain their competitive edge.
These incentives and norms push the budget requirements of AI research beyond what is accessible to most university labs. This, in turn, makes many labs increasingly dependent on close ties with wealthy companies to fund their research, giving those companies outsized influence over what kind of research gets carried out.
“The expense and scarcity of the ingredients needed to build and operate increasingly large models benefits the actors in AI that have access to compute and distribution markets,” the authors write. “This works to concentrate power and influence over AI, which, in turn, provides incentives for those with sufficient resources to perpetuate the bigger-is-better AI paradigm in service of maintaining their market advantage.”
The exorbitant costs of large models also put extra pressure on companies to commercialize them. This incentivizes the development of models that can be turned into profitable products. And hyperscalers with the infrastructure to run these models at scale have an advantageous position in the market.
Where to go from here
With the focus on “bigger is better,” many opportunities to explore small-scale AI systems are being missed. However, aligning the incentives of the research community and industry will not be easy.
“There is no magic bullet that unlocks research on AI systems both small-scale and powerful, but shared goals and preferences do shape where research efforts go,” the authors write. “We believe that the research community can and must act to pursue scientific questions beyond ever-larger models and datasets, and needs to foster scientific discussion engaging the trade-offs that come with scale.”
One solution the researchers suggest is to report compute cost, energy usage, and memory footprint for both training and inference in machine learning studies. Evaluations should account for efficiency as well as performance when comparing models.
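As a purely illustrative sketch (not a procedure described in the paper), reporting could start with something as simple as logging wall-clock training time and peak memory next to accuracy; the model and dataset below are placeholders, and energy or GPU memory reporting would require external tools or framework-specific counters:

```python
# Illustrative sketch: logging compute and memory alongside accuracy.
# Model and dataset are placeholders; energy tracking would need an
# external tool, and GPU memory would need framework-specific counters.
import time
import tracemalloc

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tracemalloc.start()
start = time.perf_counter()

model = LogisticRegression(max_iter=2000).fit(X_train, y_train)

train_seconds = time.perf_counter() - start
_, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"test accuracy: {model.score(X_test, y_test):.3f}")
print(f"train time:    {train_seconds:.2f} s")
print(f"peak memory:   {peak_bytes / 1e6:.1f} MB")
```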
Studies should also report the trade-off between performance and computing resources, for example in the form of a Pareto curve. Encouraging resource efficiency can open up new directions for research and product development that benefit the industry as a whole.
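For illustration only (the model names and numbers below are hypothetical, not taken from the paper), such a performance-versus-compute Pareto plot might be produced with matplotlib like this:

```python
# Illustrative sketch: plotting an accuracy-vs-compute Pareto front.
# All model names and numbers are hypothetical.
import matplotlib.pyplot as plt

# (model name, training FLOPs, accuracy) -- made-up values for illustration
results = [
    ("tiny", 1e15, 0.71),
    ("small", 1e16, 0.78),
    ("medium", 1e17, 0.80),
    ("large", 1e18, 0.81),
]

def pareto_front(points):
    """Keep models that no cheaper model matches or beats on accuracy."""
    return [
        (name, flops, acc)
        for name, flops, acc in points
        if not any(f < flops and a >= acc for _, f, a in points)
    ]

front = sorted(pareto_front(results), key=lambda r: r[1])

plt.xscale("log")
plt.scatter([f for _, f, _ in results], [a for _, _, a in results], label="all models")
plt.plot([f for _, f, _ in front], [a for _, _, a in front], "r--o", label="Pareto front")
plt.xlabel("training compute (FLOPs)")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```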
“We believe that scientific understanding and meaningful social benefits of AI will come from de-emphasizing scale as a blanket solution for all problems, instead focusing on models that can be run on widely-available hardware, at moderate costs,” the authors write.