Suchir Balaji, known as the "OpenAI whistleblower," was found dead in his San Francisco apartment on November 26. He was 26 years old. Authorities ruled his death a suicide.
According to foreign media reports, Suchir Balaji grew up in Cupertino, California, and had long been fascinated by artificial intelligence. DeepMind's progress in 2013 sparked his interest, prompting him to study computer science at the University of California, Berkeley, from which he graduated in 2021. That same year he joined OpenAI, where he soon took part in the development of GPT-4; his job was to help OpenAI collect and organize large amounts of Internet data for training ChatGPT.
Early on, he, like many others, was drawn to the promise of artificial intelligence, believing that neural networks could tackle the problems humanity cares about most, from curing disease to fighting climate change. For Suchir Balaji, artificial intelligence was not just strings of code but a kind of alchemy, a tool for turning imagination into reality.
But as he saw more and more complaints against generative AI, his views slowly changed.
Suchir Balaji laid out his views on his personal blog: OpenAI's use of data is unreasonable because it trains its models on copyrighted material without permission, thereby infringing the intellectual property of countless original authors, from programmers to journalists. This is equivalent to generating a "substitute" based on the original author's work and taking away the income that belongs to that author.
[Image: Musk shared the news on X. Source: screenshot of Musk's X account]
Why does generative AI infringe?

What Balaji said is true. Whether it is ChatGPT or another generative AI application, these systems crawl large amounts of data from the Internet, including copyright-protected content, to build their models.
If we summarize the conventional training process of large models like OpenAI's, we find that it generally includes three steps:
Step 1: OpenAI collects a large amount of text from the Internet, including blogs, articles, books, and more. Some of this data is publicly available, but much of it is protected by copyright.
Step 2: The AI analyzes this data to learn how to generate human-readable text.
Step 3: When you ask ChatGPT a question, it does not show you the original data it was trained on, but its answers usually draw heavily on information from that data.

Why is it said that OpenAI takes away the interests of original creators? To give a rough analogy: if ChatGPT can generate answers comparable to those given by top contributors ("Big Vs") on Zhihu, then Zhihu loses its reason to exist, the Big Vs no longer need to exist, and the entire ecosystem collapses.
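The three steps above can be caricatured in a few lines of toy Python. Everything here is illustrative and hypothetical (no real model is involved): the "model" is just a word index over a hard-coded corpus, but it shows the point of step 3, namely that answers reuse the scraped material while the attribution is dropped.

```python
# Toy illustration of the three-step pipeline described above.
# Step 1: "scrape" a corpus (hard-coded snippets standing in for web text).
corpus = {
    "zhihu_expert_post": "Transformers use self-attention to weigh context words.",
    "blog_article":      "Copyright law protects original creative expression.",
}

def train(corpus):
    # Step 2: "learning" reduced to indexing words -> source snippets.
    index = {}
    for source, text in corpus.items():
        for word in text.lower().split():
            index.setdefault(word.strip(".,"), []).append(source)
    return index

def answer(index, corpus, question):
    # Step 3: the answer draws on the best-matching snippet,
    # but the user never sees which source it came from.
    scores = {}
    for word in question.lower().split():
        for source in index.get(word.strip("?.,"), []):
            scores[source] = scores.get(source, 0) + 1
    if not scores:
        return "I don't know."
    best = max(scores, key=scores.get)
    return corpus[best]          # content returned, attribution dropped

index = train(corpus)
print(answer(index, corpus, "How do transformers use attention?"))
```

A real large language model does not retrieve snippets this way, of course; the sketch only mirrors the economic argument in the text, that the value flows from the original source to the answer without credit.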
In response, OpenAI defended itself against Balaji's accusations, claiming that its use of public data complies with copyright law. OpenAI said: "We are using public data to build our artificial intelligence models. This is reasonable and legal... it matters for innovators and, more importantly, for the scientific and technological competitiveness of the United States."
One fact is that since generative AI took off, a large number of critics besides Balaji have argued that it will create serious moral dilemmas, and they have called on regulators to introduce relevant laws. Foreign media noted that Balaji's outspokenness on AI ethics earned him both admiration and criticism, and reflects the tension between innovation and responsibility: "Balaji's death leaves behind a core issue: the ethical dilemma of generative artificial intelligence."
In addition, some foreign media pointed out that information in Balaji's possession may play a key role in the lawsuits against OpenAI.
A larger ethical dilemma

Balaji is not the only accuser. At the end of 2023, The New York Times sued OpenAI and its partner Microsoft, accusing them of illegally using millions of original articles to train their large models. The New York Times argued that "such unauthorized use directly harms their business."
Looking at the longer-term impact, The New York Times believes that once ChatGPT can generate content of comparable quality to traditional news organizations, the next step will be artificial intelligence replacing traditional journalism.
Looking further back, at the end of 2022 a group of three artists sued multiple generative AI platforms on the grounds that the platforms had used their original works to train large models without permission, and that the works users generated with the AI closely resembled the artists' existing works; the artists argued these should be recognized as "unauthorized derivative works." If a court determines that an AI-generated work is an unauthorized derivative work, severe infringement penalties would apply.
There are similar cases in China. The best known is "magic editing": using AI, ordinary people can "transplant" characters and scenes from classic film and television dramas into new settings. Such videos are now very common on major long-form and short-form video platforms.
In an interview with CCTV, Zhang Bo, deputy director of the Beijing Lianggao (Zhengzhou) Law Firm, pointed out that whether it is face-swapping, changing lines, or adding new plots, these are all recreations based on the content of the original film or drama. In the chase for traffic, many new types of infringement have emerged.
Zhang Bo believes that AI "magic editing" videos implicate the image of film and television actors: using AI tools to change an actor's movements, expressions, and lines is suspected of infringing the actor's portrait rights and reputation rights.
In addition, at the 8th China Internet Copyright Protection and Development Conference, experts discussed the copyright risks posed by AI. Yi Jiming, director of the International Intellectual Property Research Center at Peking University, pointed out that in most cases AI developers have not obtained authorization from copyright owners, creating a latent legal risk of infringing the copyright of others.
Against this backdrop, the industry urgently needs laws and public policies that provide clear legal guidance on the use of training corpora, and creators and the public need to pay close attention to whether their interests can be effectively protected. The legality and legitimacy of the corpora used to train large AI models has become a major issue demanding urgent study in the AI era.
But how can this problem be solved? At present it seems very difficult. The first difficulty is defining infringement in the context of artificial intelligence; the second is drafting normative legal instruments.
Specifically, copyright law protects original works, but whether AI-generated content is original, and how to judge its originality, is a complex and controversial question on which countries diverge. For example, the U.S. Copyright Office has made clear that works automatically generated by AI are not protected by copyright law, while Chinese courts have held in a series of cases that as long as AI-generated content reflects the original intellectual contribution of a natural person, it should be recognized as a work and receive copyright protection.
Secondly, under the traditional copyright system the author of a work is a natural or legal person with legal personality. But AI, as the entity generating the content, has no clear legal status and cannot directly enjoy copyright as an author. Legislators therefore need to reconsider how authorship is attributed.
Finally, legislators must strike a balance between protecting the rights of copyright owners and promoting the development of the AI industry. Overly strict laws may stifle the innovation and application of AI technology, while overly lax laws may harm the interests of copyright owners. "Regulate and it dies; let go and it descends into chaos." This is one reason regulation is hard to advance.
But one problem we must face is that the development of generative AI will almost certainly never turn back, and the line between AI-generated and human-generated content will become increasingly blurred, eventually leading to chaos.
Perhaps the predictions of another Balaji point toward a solution. Balaji Srinivasan, an Indian-American entrepreneur and investor born in the 1980s, argued in his book "A Guide to the Future" (described as a primer on the "rightward turn" in the thinking of Silicon Valley founders) that the tamper-proof records of the emerging technology of blockchain constitute a credible "truth," and that using cryptocurrency to add incentive mechanisms to workflows and coordinating collaboration through smart contracts may be how work is done in the future.
Domestic experts and scholars have also suggested that relevant institutions develop more efficient digital fingerprinting and use blockchain technology to give every AI-created work a unique digital identity, so that AI-generated text can be traced to its source and the copyright of both original human works and AI-created works can be protected.
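The "digital fingerprint" idea can be sketched minimally. Everything below is a hypothetical illustration: the fingerprint is just a SHA-256 hash of the text, and a plain Python list stands in for the append-only blockchain ledger the experts have in mind.

```python
import hashlib
import time

def fingerprint(text: str) -> str:
    """Digital fingerprint: SHA-256 hash of the whitespace-normalized text."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()

class ProvenanceLog:
    """Append-only registry standing in for a blockchain ledger (illustrative)."""
    def __init__(self):
        self.entries = []

    def register(self, text: str, creator: str, origin: str) -> str:
        # Record who created the work and whether it is human- or AI-made.
        entry = {
            "fingerprint": fingerprint(text),
            "creator": creator,
            "origin": origin,          # "human" or "ai"
            "timestamp": time.time(),
        }
        self.entries.append(entry)
        return entry["fingerprint"]

    def lookup(self, text: str):
        # Trace a piece of text back to its registered source, if any.
        fp = fingerprint(text)
        return [e for e in self.entries if e["fingerprint"] == fp]

log = ProvenanceLog()
log.register("An essay written by a person.", "alice", "human")
log.register("A paragraph produced by a model.", "model-x", "ai")
print(log.lookup("A paragraph produced by a model.")[0]["origin"])  # -> ai
```

A production system would face problems this sketch ignores, most obviously that changing even one character changes the hash, which is why proposals in this space usually pair exact fingerprints with fuzzier similarity techniques.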
Viewed from the future, the death of Suchir Balaji may be only the "first chapter" in the unfolding of AI's ethical problems. The tide of technology does not bend to human will. The only thing we can hope for is that AI becomes a tool of humans, rather than humans becoming slaves of AI.