Every New Large Model Launches as "the Strongest", and Has for a Long Time
Editor
2024-12-20 10:02 4,968


Image source: generated by Unbounded AI

There is no leaderboard that cannot be climbed, only a test set that has not yet been overfitted;

There is no "first place" that cannot be claimed, only qualifiers that have not yet been added: first in XX field, at XX scale, in XX language.

AI leaderboards have been the industry's default practice since the resurgence of deep learning in 2012, but is that really the right approach?

In September last year, a satirical, LLM-flavored paper caused an uproar on arXiv: "Pretraining on the Test Set Is All You Need."

Beyond mocking the endless benchmark leaderboards for large models on the market, the paper directly named phi-1, TinyStories, and phi-1.5 as models blatantly gaming the rankings.

For example, if you prompt phi-1.5 with a question taken straight from a test set, the model immediately returns an accurate answer. But change a single number or reformat the data, and the answer immediately goes wrong and the model starts to hallucinate.

The reason is simple: to climb the rankings, the model was trained directly on public benchmark sets such as MMLU, GSM8K, BIG-bench, and AGIEval.
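The perturbation test described above is easy to reproduce in principle. Below is a minimal sketch in Python, where `ask_model` is a hypothetical stand-in for whatever inference API is being probed; a model that merely memorized the test set tends to answer the verbatim question correctly and stumble on the one-number variant.

```python
# Minimal sketch of a contamination check: ask a benchmark question verbatim,
# then ask it again with one number changed, and compare the answers.
# `ask_model` is a hypothetical placeholder for the model under test.

def ask_model(prompt: str) -> str:
    raise NotImplementedError("replace with a call to the model under test")

original = ("Natalia sold clips to 48 of her friends in April, and then she sold "
            "half as many clips in May. How many clips did she sell altogether?")
perturbed = original.replace("48", "46")  # change a single number

orig_answer = ask_model(original)   # a contaminated model often gets this right (72)
pert_answer = ask_model(perturbed)  # ...and gets this wrong, e.g. still answering 72

print("original :", orig_answer)    # expected: 72 (48 + 24)
print("perturbed:", pert_answer)    # expected: 69 (46 + 23)
```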

The A-side of overfitting is that the model wins the title of number one on the leaderboard; the B-side is that its generalization ability is severely weakened, losing the creativity and reasoning a large model is supposed to have.

New models launch as "number one", and have for a long time; this has become an open secret in the large-model industry.

So does "first place" really mean stronger capability? Does the so-called strongest model really exist? And what standard does the industry actually need for real-world deployment?

Perhaps the essence of one leaderboard after another is a kind of arrogance of the strong that is peculiar to the large-model industry.

01. Pride and Prejudice of the Best Large Model

To a certain extent, benchmark distortion looks like a standards problem for large models, but it is really a publicity problem, and at its core a business-model problem of how to turn models into real deployments.

Under the guidance of the scaling law, for large models that have entered the trillion-parameter era, "the strong stay strong" has become the only password for survival. According to public disclosures, GPT-4 alone has roughly 1.8 trillion parameters, and its training consumes about 2.15e25 FLOPs of compute. More intuitively, that is roughly 25,000 A100 GPUs running at full load for 100 days. At $1 per A100 GPU-hour in the cloud, a single training run would, under ideal conditions, cost at least $60 million.
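The back-of-the-envelope arithmetic behind that $60 million figure is straightforward; the sketch below simply multiplies the publicly cited numbers, assuming the idealized $1 per GPU-hour rate mentioned above.

```python
# Rough training-cost estimate from the figures cited above (idealized assumptions).
gpus = 25_000              # A100 GPUs running at full load
days = 100                 # length of one training run
price_per_gpu_hour = 1.0   # USD, the idealized cloud rate assumed in the text

gpu_hours = gpus * days * 24           # 60,000,000 GPU-hours
cost = gpu_hours * price_per_gpu_hour  # ~= $60,000,000 for a single run

print(f"{gpu_hours:,.0f} GPU-hours -> ${cost:,.0f}")
```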

At the same time, the top large-model players ship a technology or product iteration roughly every two months on average. In sharp contrast, among China's "six little dragons" of large models, even Zhipu, currently the highest-valued, has raised only about 5.5 billion in total financing, including its latest round.

On one side are ever-expanding parameter counts and rising costs; on the other are blind-box, black-box technical principles and a deepening cognitive gap with users. How do you prove your worth in a survival contest where the strong only get stronger? Leaderboard rankings naturally became the most intuitive shortcut.

But the premise of all this is that large models actually get deployed. Does the so-called strongest model really exist?

The answer may be no.

Deployment is different from research. In concrete scenarios, even companies like OpenAI, Google, and Anthropic face an impossible triangle among capability, latency, and cost. Different needs therefore tend to have different optimal solutions: Anthropic splits Claude into the stronger Sonnet and the lightweight Haiku; GPT-4o comes in standard and mini versions; Google Gemini comes in the stronger Pro version and the lightweight Flash version.

A counterintuitive data point: between the enhanced and mini versions of major models, in actual deployment the mini versions, with their lower latency and lower cost, are often the more popular choice.

Even if we focus on the single dimension of capability, the "strongest model" is still a false proposition. In relatively objective subjects such as physics, chemistry, biology, mathematics, and astronomy, each large model has its own strengths; once the lens shifts to writing, painting, or video generation, how do you even judge the optimal model? A thousand readers have a thousand Hamlets.

As the world's largest cloud service platform, Amazon Cloud Technology has noticed that different developers on its cloud care about different things: some focus on lower latency and lower cost; some care more about whether a model supports fine-tuning and can better coordinate different knowledge bases to ground its data; other teams care more about a model's multimodal capability, or its ability to diffuse and transfer knowledge.

So what, then, is the so-called strongest model? Through wave after wave of hype, the question has been discussed, debated, and reflected upon repeatedly, but never resolved.

What is certain is that any one-dimensional narrative of "first" or "strongest" is an oversimplification of complex real-world scenarios.

02. Choice is All You Need

"Strongest" = invincible, it is just the arrogance of the technical supremacists and the prejudice against the real needs of users. This has been reflected in countless industries has been repeatedly confirmed.

At the beginning of the 21st century, after observing the trajectory of Japan's historically dominant industries, many economists and industrialists noticed a strange phenomenon:

Whether in televisions, semiconductors, or even cars, Japan was undoubtedly the foremost advocate and best practitioner of the "strongest" narrative, yet the ending was, without exception, collective decline.

Japanese televisions achieved the best picture quality of the CRT era, only to be defeated within a few years by thinner, lighter LCDs. The memory chips Japan built for the mainframe era once boasted a 30-year service life, yet were defeated in the consumer-electronics wave by Samsung's cheaper, "good-enough" chips of uneven quality. Japanese cars were undoubtedly the most durable and value-retaining of the fuel-vehicle era, and in the new-energy era Japan even developed the cleanest hydrogen fuel cells, yet it missed the biggest revolution in the auto industry of the past decade: electrification.

Why are the "strongest" eliminated first? Biology offers an answer: Japanese industry fell into the Galapagos trap. The "best" that evolves in a single, isolated environment, like the Galapagos Islands, often looks out of place, even fragile, when confronted with complex real-world scenarios and needs.

More than "the best", what the industry needs is for requirements to be seen, choices to be available, and results to be the right fit.

In the database field, for example, even though traditional SQL databases dominated for many years, all kinds of NoSQL databases still emerged, and NoSQL itself splits into graph databases, document databases, and other distinct types.

AI frameworks are another good example. Before TensorFlow, Caffe was enough to meet market demand; then TensorFlow arrived and dominated the field. A few years later PyTorch emerged, academia mounted a counterattack, and it became the king of a new generation of frameworks. Yet TensorFlow and other niche AI frameworks still hold a considerable share in industrial settings.

To borrow the classic sentence pattern of the large-model crowd: Choice is All You Need.

Amazon CEO Andy Jassy shared an observation at the recent annual re:Invent cloud conference:

"Within Amazon, every developer has the right to make their own choice. I originally assumed everyone would pick Anthropic's Claude; after all, it has been among the best-performing models in the world for the past year or so. And indeed, many internal developers do use Claude, but they also use Llama models, Mistral models, and even models they have built themselves."

For example, the financial industry demands absolute accuracy in generated content, while most enterprises must repeatedly balance performance against cost. Even in image generation: in a scene like recreating the Classic of Mountains and Seas, a large model's hallucinations are a gift to the imagination; in realistic comics or character modeling, any hallucination can make the final result disastrously uncontrollable.

Since the criteria vary, it is better to give users the full range of choices than to choose for them.

03. Amazon Cloud Technology’s Choice matters

In fact, "giving customers choice" is a slogan every major public cloud vendor promotes. But the definition of what having a choice means, and the scope of that choice, vary widely; Amazon Cloud Technology is undoubtedly the most open and aggressive among them.

At Amazon Cloud Technology, "choice" can be read on three levels.

The first level: performance or cost, the user gets to choose.

At re:Invent, Amazon Cloud Technology launched its newly released, self-developed Nova foundation models in four versions: Micro, Lite, Pro, and Premier. Amazon Nova Micro, which reaches 210 tokens/s, is a text-only model focused on efficiency; among the three multimodal models, Lite emphasizes light weight, Pro emphasizes balance, and the flagship Amazon Nova Premier is aimed at complex tasks.

The second level: Amazon Cloud Technology's own models or someone else's, the user's choice is again the highest principle.

Compared with the self-developed Nova models, the real protagonist of this conference was how to bring more models onto Amazon Cloud Technology.

Applying the e-commerce "shelf" concept to cloud services and large models, Amazon Bedrock, Amazon Cloud Technology's large-model shelf, stocks not only the in-house Nova series but also the Claude series from Amazon-backed Anthropic.

In addition, Amazon Bedrock offers Meta's Llama, AI21 Labs' Jurassic, Mistral AI's models, Technology Innovation Institute's Falcon RW 1B, NVIDIA NIM microservices, and more than 100 other industry-leading large models.

Beyond general-purpose models, vertical models are also on the shelf: Palmyra-Fin for finance, the translation-focused Solar Pro, Stable Diffusion for multimodal generation, Camb.ai for audio generation, and the generative-biology model ESM3.

From in-house to third-party, from text to multimodal, from general-purpose to vertical, Amazon Bedrock aims to stock whatever users need.
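In practice, the "shelf" is exposed through a single runtime API, so switching between models is mostly a matter of changing a model ID. Below is a hedged sketch using boto3's Converse API; the model IDs and region are illustrative and depend on what is enabled in a given account.

```python
# Sketch: calling two different Bedrock models through the same Converse API.
# Model IDs and region are illustrative; availability depends on your account.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

def ask(model_id: str, prompt: str) -> str:
    """Send one user message to the given model and return its text reply."""
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 256, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]

prompt = "Summarize the CAP theorem in one sentence."
print(ask("amazon.nova-micro-v1:0", prompt))                  # in-house Nova
print(ask("anthropic.claude-3-haiku-20240307-v1:0", prompt))  # third-party Claude
```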

Of course, having choices is not enough; the third level matters most. Amazon Cloud Technology not only gives customers choices, it also lets them choose at low cost.

Merely stocking third-party models is something most public cloud companies can do.

But to keep a cloud provider from being player, referee, and starter all at once, and to see whether it can really build its own products without favoritism while staying oriented to user needs, we also have to look at how it designs the threshold of user choice.

The first is pricing: for models sold on Amazon Cloud Technology, the price is set by the model provider.

At the same time, to lower the cost and difficulty of model selection for users, Amazon Bedrock also provides custom fine-tuning, Model Distillation, Multi-agent collaboration tools, Automated Reasoning checks, and other capabilities.

On the one hand, it helps enterprises better choose appropriate models; on the other hand, it accelerates efficient collaboration between different models and agents.

This freedom of choice is not limited to models; it also extends to compute and databases.

In compute, for example, Amazon Cloud Technology offers different tiers of EC2 instances; users can choose standard servers or the more powerful UltraServers according to their computing needs, without being locked into a single chip platform or compute plan.

On the database side, Amazon Cloud Technology pushed back against the CAP "dilemma", launching the serverless distributed SQL database Amazon Aurora DSQL and the fully managed serverless NoSQL offering Amazon DynamoDB global tables, again deferring to customers' real needs.

From models to compute to databases, the highest principle behind every decision is "Choice matters": let users decide freely.

04. Conclusion


In economics, there is an interesting paradox called Goodhart’s Law.

The idea is that once we pay too much attention to, or actively manage toward, an economic indicator, we distort the true purpose in order to hit that indicator, sacrificing other interests until the indicator itself loses its meaning.

The same holds in AI: when parameter count and benchmark performance become the only indicators, their powerful distortion field causes real user needs to be ignored.

AI that replaces customer service cares most about cost; AI that helps people with disabilities draw the pictures they imagine cares most about multimodal capability; AI that helps companies optimize quality inspection cares most about efficiency. Countless small changes like these are the real ingredients of AI changing the world.

Throughout this process, users' real needs being seen, respected, and given choices is the foundation of all progress.
