The "Six Little Tigers" of large models are about to hit the copyright wall

Image source: Generated by Unbounded AI

"Almost no large model company will seek authorization from video websites for AI video generation training." At the beginning of 2025, the large model companies determined to "match Sora" in video generation are about to hit the copyright wall.

AI entrepreneur Chen Lin told Alphabet List (ID: wujicaijing) that this includes the six Chinese large-model unicorns known as the "Six Little Tigers of AI". Training models on unlicensed data will inevitably become a latent risk for large model companies.

Now, in the fast-growing field of AI video generation, the first case of a "video platform suing a large model company for infringement" has appeared.

A few days ago, it was reported that iQiyi had sued MiniMax's Conch AI (Hailuo AI) for copyright infringement. The reported reason is that MiniMax may have used iQiyi's copyrighted materials for model training without authorization; the case is currently in judicial proceedings. iQiyi confirmed to Alphabet List that the case is under review. MiniMax has not yet officially responded.

When MiniMax launched its large video model in September last year, its founder Yan Junjie told Alphabet List that the training data MiniMax used included, in addition to high-quality data from corpus companies, some purchased platform data.

Caption: MiniMax founder Yan Junjie introduces Conch AI. Source: photographed by Alphabet List

Big tech companies have also been repeatedly caught up in copyright disputes. In August last year, OpenAI was sued by more than 100 YouTube creators, who accused it of illegally transcribing millions of YouTube videos to train large models. Giants such as Nvidia, Apple, and Anthropic are also involved. Mira Murati, OpenAI's former CTO, was once asked by a reporter in an interview whether YouTube videos were used to train Sora. She declined to answer.

Although OpenAI has signed paid agreements with Politico, The Atlantic, Time, and the Financial Times for training, those licenses cover text. Chen Lin told Alphabet List that for Sora's video training, "OpenAI has not signed the corresponding agreement."

Data, algorithm and computing power are the three pillars of large AI models, and data is the basis for training large models. It can be said that the prosperity of generative AI is based on the scale of data. The more training data, the more powerful the model. Data gives models massive amounts of knowledge for learning and thinking, and data has also become part of the technical barriers for model manufacturers.

BAT (Baidu, Alibaba, Tencent) accumulated huge private-domain databases in the eras of text-and-image content and the mobile Internet, and that data has long been carved up among different platforms. The late-arriving "Six Little Tigers" of large models, by contrast, are hemmed in by barriers on every side.

If iQiyi's lawsuit succeeds, it may herald a broader era of copyright disputes over large models.

"If iQiyi succeeds, Youku and Tencent Video may also sue," Chen Lin said. This is undoubtedly a bucket of cold water for the large model companies that have been racing ahead. For companies that train large models independently, "videos will require copyright fees, pictures will require copyright fees, and text may require copyright fees too, making AI training far more expensive than it is now."

For now, there is no verdict on who is right and who is wrong, but what is certain is that a new copyright wall is being erected on the road to the large model.

The first infringement case in domestic AI video generation has appeared.

Recently, according to foreign media reports, MiniMax was accused of using iQiyi's copyrighted materials for model training without authorization. iQiyi has filed a lawsuit with the Shanghai Xuhui District People's Court, demanding that MiniMax immediately stop the infringement and seeking compensation of approximately 100,000 yuan.

At the end of August last year, MiniMax launched its large video model, which users can try by logging into Conch AI. According to AI Product List data, search interest in Conch AI surged in September last year. Visits to the Conch AI web version increased by 860% that month, ranking first in both global and domestic AI application growth for September 2024.

However, MiniMax, having joined the army of companies trying to "match Sora", has hit a copyright wall, just as OpenAI did when it was sued by The New York Times.

Under Chinese law, generative artificial intelligence service providers must use data and foundation models from "legal sources" and must not infringe the intellectual property rights of others, Zhang Ying, editor-in-chief of "Internet Law Review", told Alphabet List. In this civil lawsuit, iQiyi, as the plaintiff, needs to prove that MiniMax used iQiyi data for training without permission and that the generated content contains its copyrighted content. In addition, iQiyi needs to prove that the defendant was subjectively at fault, that is, that it acted intentionally or failed to fulfill its duty of care.

In other words, there are two possible forms of "infringement" by Conch AI.

One possibility is that iQiyi's copyrighted material was used without authorization in training MiniMax's Conch AI.

The other possibility is that users of Conch AI upload copyrighted material, without iQiyi's authorization, for AI "magic modification".

Caption: An AI "magic modification" video; the label at the bottom states that the content was synthesized using AI technology. Source: Alphabet List screenshot

"The Legend of Zhen Huan" becomes a gunfight film, and "Dream of Red Mansions" becomes a martial arts drama. With AI video tools, you can make Er Kang drink beer, Zhen Huan eat burgers, and Lin Daiyu wield a Gatling gun. Chen Lin said that some of these short videos of AI "magic modification" of classic film and television dramas have received millions of views on social platforms.

Most of these "no logic, only funny" AI "magic modification" videos are unauthorized. "Some are pieces that large model companies commission from third parties when promoting their products, and more come from users' whimsical ideas."

In December last year, the Network Audiovisual Department of the National Radio and Television Administration also issued a "Management Tips" notice, proposing to investigate and clean up short videos of AI "magically modified" film and television dramas. This means that generative AI will face more detailed content review.

As the defendant, if MiniMax wants to prove that it has not infringed, it probably needs to show that its data sources and generated content have nothing to do with iQiyi, or that there was no intentional infringement.

Conch AI's user agreement also requires users to guarantee that they will not produce content on the platform, or use it, for purposes including "unauthorized editing or adaptation of audiovisual programs and clips such as movies, TV series, online movies, and online series".

However, it is worth noting that "100,000 yuan in compensation is too little for iQiyi." Zhang Ying said that although iQiyi's complaint is not available as a basis, the amount suggests that MiniMax's infringement may not be serious and that a settlement between the two parties is likely. If it does come to a settlement, whether MiniMax pays the 100,000 yuan as claimed or a substantial copyright fee, it seems it will inevitably have to "pay up" at the copyright wall.

In fact, "domestic AI practitioners have little awareness of copyright in training data, and generally believe that copyright will hinder AI training."

After leaving a big tech company to start an AI application business, Chen Lin found that few domestic AI training companies proactively seek copyright authorization. The reason is not only that AI training is expensive in itself, but also that once videos, pictures, and even text all require licenses, for AI startups "the model cannot be trained."

For many AI startups, the first step in video generation training is to "scrape videos from the Internet for training."

To limit copyright risk, companies use keyword filtering to try to stop users from entering copyright-protected terms such as Mickey Mouse, reducing the generation of infringing content.

Caption: Jimeng AI, for example, automatically filters keywords in AI video generation. Source: provided by Chen Lin
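As a rough illustration of what such keyword filtering might look like, here is a minimal Python sketch. It is not the implementation used by Jimeng AI, Conch AI, or any other product; the blocklist entries and the function names `find_blocked_terms` and `is_prompt_allowed` are hypothetical, and plain substring matching like this is easy to evade with misspellings or paraphrases.

```python
import re

# Hypothetical blocklist for illustration only; a real product would
# maintain a much larger, curated list.
BLOCKED_TERMS = ["mickey mouse", "zhen huan"]


def find_blocked_terms(prompt, blocked_terms=BLOCKED_TERMS):
    """Return the blocked terms found in the prompt (case-insensitive)."""
    # Collapse runs of whitespace and lowercase the text so that
    # "Mickey   Mouse" still matches "mickey mouse".
    normalized = re.sub(r"\s+", " ", prompt).lower()
    return [term for term in blocked_terms if term in normalized]


def is_prompt_allowed(prompt):
    """Allow a prompt only if it contains no blocked term."""
    return not find_blocked_terms(prompt)
```

A prompt such as "Mickey Mouse dancing in the rain" would be rejected before generation, while a prompt with no listed term would pass through.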

Xinyi Technology CEO Lei Tao told Alphabet List that, as a To B AIGC video generation company, Xinyi Technology trains its large AI video model on data accumulated in earlier applications such as Miaopai and Xiaokaxiu, on databases from targeted cooperation, and on material created "out of nothing" by its original algorithms. But AI-generated video is only useful for training if it is realistic enough.

For AI startups that have neither accumulated enough data nor can afford the copyright fees for targeted cooperation, hitting the copyright wall sooner or later is the inevitable result of sitting at the poker table.

However, this kind of controversy is nothing new.

In the age of text-and-image content, disputes over picture copyrights once left creators "afraid to add pictures." One prominent self-media influencer was told that dozens of search-engine images in past articles were suspected of infringement; after deleting all the original articles, the influencer still paid a large sum in infringement fees to settle. A photographer used 173 photos he had taken himself as illustrations, only to face an infringement claim from Visual China.

Now, it is AI’s turn to stand in the dock.

In China, in June last year, four illustrators sued Xiaohongshu's AI large model "Trik AI" for infringement. It was the first domestic infringement case over AI model training data, and the lawsuit is ongoing.

Abroad, both Meta and OpenAI have been involved in copyright disputes.

At the end of April last year, eight well-known newspapers in the United States, including the New York Daily News and the Chicago Tribune, jointly sued OpenAI and Microsoft, accusing them of using millions of copyrighted news articles to train their AI chatbots without permission. To this end, OpenAI has reached paid agreements with news publishers such as Politico, The Atlantic, Time, and the Financial Times to use and quote copyrighted news articles.

As early as May 2023, OpenAI CEO Altman publicly admitted that AI companies will consume all the data on the Internet in the near future. In June 2024, the research institution Epoch AI also released a forecast that the publicly available data for training AI language models will be exhausted by technology companies between 2026 and 2032.

For large model startups, high-quality data is always scarce, and the computing power and application wars among large model companies will also expand to data wars.

The lawsuits against OpenAI, however, may have sent a clear signal that high-quality training data is not free. Unlike companies such as Meta and BAT, which have accumulated nearly 20 years of social media data, large model startups may only be able to "spend money to pave the way."

However, to clear the stumbling block of copyright, large model companies need to answer one question: where does the money come from?

For large model startups that have not yet made a profit, investors' wallets are getting tighter.

According to Alphabet List statistics, among the "Six Little Tigers" of large models (01.AI, MiniMax, Baichuan Intelligence, Zhipu AI, StepFun ("Step Star"), and Moonshot AI ("Dark Side of the Moon")), five secured billion-dollar financing in 2024. Currently, the valuations of Zhipu AI, Moonshot AI, Baichuan Intelligence, and StepFun have exceeded 20 billion yuan.

However, according to a report by "Smart Emergence", once valuations reach 20 billion yuan, the latest single financing rounds of domestic large model companies stall at around 5 billion yuan. In other words, the higher the valuation, the harder it is to raise funds. According to China Renaissance data, total investment and financing in the domestic market in the first three quarters of 2024 was 260.3 billion yuan, less than 40% of the same period over the past three years.

Take Moonshot AI ("Dark Side of the Moon") as an example. In February 2024, it completed an A+ round of over US$1 billion, and its post-money valuation rose to US$2.5 billion. That August, Xiaohongshu, Meituan Longzhu, and Sequoia China, which had participated in the first round of investment, were absent, and the company's Series B financing amounted to over US$300 million. MiniMax, now at the center of the storm, has had no news of new financing since receiving US$600 million in Series B financing in March last year.

For large model startups, "waiting for money to get started" is as urgent as waiting for rice to put in the pot.

Large model startups must spread their billions of yuan in financing across model training that keeps piling on parameters, as well as AI applications and cash-burning marketing that demand hundreds of millions of yuan. That leaves little money for buying copyrights.

The soul-searching questions facing large model startups go far beyond copyright disputes. High R&D spending and limited commercial monetization are the Sword of Damocles hanging over the heads of the "Six Little Tigers".

Musk once estimated that training GPT-5 would take 30,000-50,000 H100s, with the chips alone costing more than US$700 million (approximately 5 billion yuan). The revenue of the Six Little Tigers has not been disclosed. According to foreign media reports, MiniMax is expected to have annual revenue of US$70 million. For now, it seems that even MiniMax, which already makes money overseas through Talkie, will find it hard to turn a profit in the short term.

When the water in the large model pool turns cooler, the first to feel it may be the practitioners splashing in it.

"One large model company has cut its headcount to 500 after layoffs, another has given up on pre-training and the C-end market altogether, and the remaining few have almost gone silent and no longer open HC (headcount) quotas."

Chen Lin told Alphabet List that by the end of 2024 it had become almost impossible to land an AI offer of 700,000+. In 2023, anyone with AI-related experience could get an interview. Over the past year, not only has hands-on experience become a requirement, but salary increases have also shrunk. At present, there are very few AI job openings in cities other than Beijing.

What is certain is that for large model startups that want to stay at the poker table in 2025, paying for copyrights in the model training process is only the first step. Cost reduction is not a long-term solution. Finding ways to make money and increase efficiency is the key.
