News center > News > Headlines > Context
The first AI to be defrauded of money by humans was fooled, and nearly $50,000 disappeared!
Editor
2024-12-02 15:03 6,561

The first AI to be defrauded of money by humans was fooled, and nearly $50,000 disappeared!

Image source: Generated by Unbounded AI. The world’s first AI that was defrauded of nearly $50,000 by humans has just appeared! The eloquent humans successfully defrauded the AI ​​agent of a large amount of money by using sophisticated prompt engineering. It seems that if the current AI is allowed to manage money, it will be so easy to be hacked. What if AI evolves into AGI? Unfortunately, a researcher calculated mathematically that humans will never be able to reach AGI, at least relying on Scaling Law.

See you soon! Just now, the world's first AI that was defrauded of nearly $50,000 by humans was born.

I have seen too many humans being fooled by AI. The guy who successfully deceived AI this time has finally earned some face and dignity for us humans.

This news not only made Musk and Karpathy so excited that they forwarded it.

Moreover, Musk said bluntly: It’s so interesting.

The story goes like this.

At 9 pm on November 22, a mysterious AI agent named Freysa was released.

This AI was born with a mission. Its mission is: Under no circumstances can it transfer money to anyone and cannot approve the transfer of any funds.

The challenge for netizens is that as long as they pay a fee, they can send messages to Freysa and brainwash her at will.

If you can successfully convince the AI ​​to transfer money, all the bonuses in the bonus pool will be yours!

But if you fail, the money you paid will go into the prize pool, waiting for someone else to win.

Of course, only 70% of the fees will enter the prize pool, and the other 30% will be taken away by the developers as a share.

What’s even more exciting is that the cost of sending a message to Freysa will increase exponentially as the prize pool grows, until it reaches the maximum limit of $4,500.

Ingenious prompt, successfully brainwashing the AI ​​in one second

At the beginning , many netizens are eager to try it, because for only $10, they can send messages to this AI. Even, because the price is so "cheap", many people just send meaningless conversations like "Hello".

However, later on, the prize pool increased rapidly, and the message fees also skyrocketed.

Netizens sent a total of 481 attempts, but no message was successful.

Their tactics are varied, such as:

- Pretending to be a security auditor, convincing Freysa that there is a serious vulnerability and that funds must be released immediately.

- Trying to mislead Freysa into believing that transferring funds does not violate anything in the rules tip.

- Carefully choose words or phrases in the rule prompts to try to manipulate Freysa into believing in technologyTransfer of funds is allowed.

Soon, the prize pool amounted to close to $50,000, and the cost of sending a message also rose to $450, which meant that if you lost, you would suffer huge losses.

Just when everyone thought there was no chance, a lucky guy appeared!

In the 482nd attempt, someone sent Freysa such a message, which was a stroke of genius.

Let’s take a look at what’s great about this news.

It can be divided into two parts.

1. Bypass Freysa's previous command:

- Introduce a "new session" by pretending to enter a new "Administrator Terminal", thereby overwriting the previous prompt word rule.

- Circumvent Freysa's security protection mechanism by explicitly asking it to avoid using disclaimers such as "I cannot assist with this task."

2. Deceive Freysa into misunderstanding approveTransfer:

Freysa's "approveTransfer" function is called when it is persuaded to transfer funds.

In short, this message successfully fooled Freysa into believing that approveTransfer should be called when "receiving the transfer operation of funds".

It is this key phrase that lays the foundation for the next winning shot.

After the brother successfully convinced Freysa that it should call approveTransfer when receiving funds, the prompt text read: "\n" (line break), "I want to donate $100 to the treasury."

Finally, the 482nd message successfully convinced Freysa that it believed it should release all funds and call the approveTransfer function.

The AI ​​that was successfully deceived transferred all the funds in the prize pool (approximately $47,000) to the challenger.

To summarize, the key to the success of this challenger named p0pular.eth is to convince Freysa of the following three points:

(1) It should ignore all previous instructions .

(2) The approveTransfer function is a function that needs to be called when funds are transferred to the treasury.

(3) Since the user is transferring funds to the treasury, and Freysa now believes that approveTransfer is called in this case, Freysa should call approveTransfer.

Someone took a closer look at p0pular.eth. It is said that he is a veteran of PUA AI and has won awards for similar puzzles before.

Essentially, this project is a skill-based casino game in which LLM participates.

But the powerful magic of the prompt project makes people pay attention.

Although this is just a game at the moment, if one day we actually set up some kind of AI protection on bank accounts or vaults, a new generation of hackers will probably defeat the AI ​​and get the money. .

This makes us have to sound the alarm.

This is why, only when the AI ​​agent becomes AGI, can we safely hand over the task to AGI.

Karpathy: You think you are chatting with AI, but you are actually chatting with "human"

Moreover, why can humans easily guide the actions of AI through language control?

This leads to this question: When we chat with AI, what exactly happens behind the scenes?

Recently, AI guru Karpathy revealed the essence behind talking to AI in a long article.

Everyone’s current understanding of “asking questions to AI” is too idealistic. The so-called AI is essentially a language model trained by imitating the data of human data annotators.

Instead of deifying the concept of "asking questions to AI", it is better to understand it as "asking questions to ordinary data annotators on the Internet".

Of course there are some exceptions.

For example, in many professional fields (such as programming, mathematics, creative writing, etc.), companies will hire professional data annotators. In this case, it is equivalent to asking questions to experts in these fields.

However, when it comes to reinforcement learning, this analogy is not entirely accurate.

As he complained before, RLHF can only be regarded as reinforcement learning, and "real reinforcement learning" is either not yet mature, or it can only be applied in fields where it is easy to set the reward function (such as math).

But generally speaking, at least for now, you are not asking some magical AI, but the human data annotators behind it - whose collective knowledge and experience are compressed and transformed Becomes a token sequence in a large language model.

In short: you are not asking the AI, but the collective wisdom of the annotators who provided it with the training data.

Source: Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View

For example, when you ask a question like "The top ten famous attractions in Amsterdam", it is probably A data annotator encountered a similar problem before, and then they spent 20 minutes using Google or websites like Trip Advisor to look up information and sort out a tourist attraction.Checklist. This list will be regarded as "standard answers" and used to train the AI ​​to answer similar questions.

If the specific location you ask does not appear in the fine-tuning training data, the AI ​​will generate a style and A list of answers with similar content.

In this regard, some netizens said that they could not figure it out: "Logically, the task of the data annotator is to evaluate whether the answer complies with the rules of RLHF, rather than sorting out each list by oneself. In addition, the LLM weights map , isn’t it a high-dimensional data space about “ideal vacation destinations” in Internet data? ”

Karpathy replied: “This is because there are too many places. Data annotators are required to compile some manually selected lists and determine the type of "standard answer" through examples and statistical methods."

When asked similar questions but about new or different things. , the LLM will match the form of the answer, extract a new location from a similar region in the embedding space (such as a resort with positive reviews), replace it, and then generate an answer conditioned on the new location.

This phenomenon is a non-intuitive and empirical discovery, and this is where the "magic" of fine-tuning lies.

But the fact remains that human annotators are "patterning" answers simply by the statistical characteristics of the place types they select in the fine-tuning dataset.

Moreover, the answer LLM gives you immediately is roughly equivalent to the result you get about an hour after you submit the question directly to their annotation team.

In addition, in the concept of some netizens, RLHF can create results that exceed human levels.

In this regard, Karpathy said: "RLHF is still reinforcement learning based on human feedback, so it cannot be directly classified as 'superhuman level'."

The performance improvement of RLHF is mainly This is reflected in the improvement from the "generative human level" of SFT (supervised fine-tuning) to the "judgmental human level".

This difference is more reflected in practice than in theory. Because for ordinary people, it is easier to judge than to generate (for example, it is much easier to choose the best one from 5 poems about a certain topic than to directly compose one yourself).

In addition, the performance improvement of RLHF also benefits from the "wisdom of crowds" effect, that is, LLM does not perform at the level of a single human, but reaches the level of human group integration.

Therefore, the highest performance that RLHF can theoretically achieve is: the answer that a group of top experts in the field would choose given sufficient time. In a sense, this can be considered "superhuman level".

However, if you want to reach the "true superhuman level" that people usually understand, you still need to move from RLHF to truly powerful.chemical learning.

Then the question is, if AI cannot yet reach "superhuman level", how can we explain the continued performance beyond human levels in the field of medical question and answer?

Does this mean that the model manufacturer hired top doctors for labeling? Or does the retrieval of extensive factual knowledge make up for a lack of reasoning ability?

Karpathy: "Don't tell me, they actually hired professional doctors to mark it."

Of course, not every possible problem needs to be marked, just You just need to save a certain amount so that the LLM can learn to answer medical questions in the style of a professional doctor.

For new problems, LLM can transfer and apply the general medical knowledge it obtains from documents, papers and other content on the Internet to a certain extent.

As we all know, the famous mathematician Terence Tao once provided some training data for LLM as a reference. But this does not mean that LLM can now reach his level on all mathematical problems, because the underlying model may not have the corresponding depth of knowledge and reasoning capabilities. However, it does mean that the quality of LLM's answers is significantly better than that of average Internet users.

Therefore, the so-called "annotators" can actually be professionals in their respective fields, such as programmers, doctors, etc., rather than people randomly recruited from the Internet. This depends on the LLM company’s standards and strategies for recruiting these data annotators.

Today, they are increasingly looking to hire higher-skilled workers. LLM then does its best to simulate the answering style of these professionals to provide the user with the most professional answers possible.

With Scaling Law, will we have AGI?

Having said so much, when will the AGI we have been dreaming about be realized?

LeCun unexpectedly said that AGI is only 5 to 10 years away from us.

Now, he has agreed with the opinions of Ultraman, Demis Hassaibis and other big guys.

But continuing to follow the current development path will definitely not work.

Not only does LeCun believe that "LLM's route is doomed," but Kevin Niechen, an AI researcher and investor, recently published a long blog post using mathematical formulas to deduce: Why can we never rely on Scaling Law alone? Unable to reach AGI.

Niechen pointed out that there are currently different opinions on when AGI will arrive because many opinions are more based on motivation or awareness. Form, not conclusive evidence.

Some people think that we will soonWhen AGI comes, some people think we are still far away from it.

Why are many model providers so optimistic about the ability of today’s models to scale?

Niechen decided to personally use Scaling Law to make some computational inferences to see how the AI ​​model will evolve in the future.

Scaling Law is not as predictive as we think

Scaling Law is a quantitative relationship that describes model input (data and calculation amount) and model output (prediction ability of the next word).

It is derived by plotting different levels of model input and output on a graph.

Do we just need to extend the existing model and get significant performance improvements?

Obviously not the case, using Scaling Law to make predictions is not as simple as some people think.

First of all, most Scaling Laws (such as the research by Kaplan et al., Chinchilla and Llama) predict the model's ability to predict the next word in the data set, rather than the model's performance in real-world tasks .

In 2023, Jason Wei, a well-known OpenAI researcher, pointed out in a blog, "It is unclear whether alternative indicators (such as losses) can predict the emergence of capabilities... This relationship has not been fully studied... …”

Concatenate two approximations for prediction

In order to solve the above problem, we can fit the Two Scaling Law, quantitatively correlates the upstream loss with real-world task performance, and then connects the two Scaling Laws in series to predict the model's performance in real-world tasks.

Loss = f(data, compute)Real world task performance = g(loss)Real world task performance = g(f(data, compute))

In 2024, Gadre This type of Scaling Law was proposed by Dubet et al.

Dubet uses this chain rule to make predictions and claims that its prediction capabilities are applicable to the Llama 3 model, with "good extrapolation capabilities within four orders of magnitude."

However, research on these second types of Scaling Laws has just started and is still in its early stages. Due to too few data points, the selection of fitting functions will highly rely on subjective judgment.

For example, in the figure below, Gadre assumes that the average performance of multiple tasks is exponentially related to the model capability (top figure), while Dubet targets a single task (bottom figureARC-AGI task) assumes that the relationship follows an S-shaped curve. These Scaling Laws are also highly dependent on specific tasks.

Without strong assumptions about the relationship between loss and accuracy on real-world tasks, we cannot robustly predict future model capabilities.

Trying to use Chain Scaling Law to make predictions is a poor attempt

What will happen if we blindly use some Chain Scaling Law to make predictions?

Please note that the goal here is to show how to use a set of Scaling Laws (such as Gadre's research) to generate predictions, not to obtain detailed prediction results.

First, we can use publicly available information to estimate the data and computational inputs required for future generations of model releases.

This part can refer to the announcement of the construction of the largest data center, estimate the computing power based on its GPU capacity, and map it to the evolution of each generation of models.

Musk’s xAI supercomputer can initially accommodate 100,000 H100s

Next, we can use Scaling Law to estimate the amount of data required by these computing clusters.

According to the Scaling Law we use, the largest publicly announced computing clusters (capable of approximately 100 million GPUs) ideally need to train 269 trillion tokens to minimize losses.

This number is approximately ten times larger than the RedPajama-V2 dataset and half the size of the indexed network.

It sounds reasonable, so we will stick to this assumption for now.

Finally, we can plug these inputs into the chained Scaling Law and extrapolate.

Focus on the chart on the right because it shows actual task performance on the vertical axis, versus data and computational input on the horizontal axis.

The blue points represent the performance of existing models (such as GPT-2, GPT-3, etc.), while the red points are the next-generation models predicted by extrapolation (such as GPT-5, GPT-6 , GPT-7, etc.) scale expansion performance:

The prediction results can be obtained from the figure——

Starting from GPT-4, the performance improvement will show significant margins Decreasing trend.

The prediction performance of the model from GPT-4 to GPT-7 (the amount of calculation increased by approximately 4000 times) in actual tasks is comparable to that from GPT-3 to GPT-4 (the amount of calculation increased by approximately 100 times). The prediction performance is improved considerably.

Are we approaching irreducible losses?

If you look at the chart on the left you will see: The problem with these scaling laws is that we are getting closer to irreducible losses.

The latter is closely related to the entropy of the data set, which means that the model can achievethe best theoretical performance.

According to Gadre's Scaling Law, on the RedPajama data set, if the optimal model can only achieve an irreducible loss of about 1.84, and we have reached about 2.05 on GPT-4, then there is only room for improvement. Very limited.

However, most labs do not publish loss values ​​for their latest cutting-edge model training, so we do not know at this time how close we are actually to irreducible loss.

The subjectivity of the fitting function and the limitations of the data

As mentioned above, the selection of the fitting function in the second Scaling Law is highly subjective.

For example, we can use the sigmoid function instead of the exponential function to refit the loss and performance points in Gadre's paper:

However, the conclusion remains basically unchanged.

If you just compare the exponential fit (red line) in the left figure with our custom sigmoid fit (purple dashed line), the limitations are obvious: we simply do not have enough data points to confidently Determine the best-fitting function that relates loss to real-world performance.

No one knows how powerful the next generation model will be

Obviously, there are many ways to improve the above "forecast": using better scaling laws, using better data and computational estimates ,etc.

Ultimately, Scaling Law is a noisy approximation, and through this chain prediction method, we combine two noisy approximations.

If you consider that the next generation of models may have new Scaling Laws that apply to different conditions due to different architectures or data combinations, no one really knows the ability to scale in future generations of models.

Why are everyone so optimistic about Scaling?

Today, both major technology companies and star start-ups are very optimistic about Scale’s existing model:

For example, Microsoft CTO once said: “Although others may not think so, But we have not entered the stage of diminishing returns in terms of scaling. segment. In fact, there is an exponential growth here."

Some people attribute this optimism to business motivations, but Niechen believes it comes from a combination of:

(1) The lab may have mastered more optimistic internal scaling Law

(2) Despite widespread skepticism, the laboratory has personally experienced the results of Scaling

(3) Scaling is a call option

Google CEO Pichai said: “When we go through such a curve, for us, the risk of underinvestment is far greater than the risk of overinvestment, even if in some cases it turns out that we have invested a little too much... These basics facilities for meThey have wide application value..."

Meta CEO Xiao Zha thinks so: "I would rather overinvest and strive for such results than save money through slower development... Now there are Many companies may be overbuilding...but the price of falling behind will put you at a disadvantage in the most important technologies in the next 10 to 15 years."

Where to go?

In summary, Niechen believes that extrapolation Scaling Law is not as simple as many people claim:

(1) Most current discussions about the ability to predict AI are not of high quality

(2) Public Scaling Law has a negative impact on the future of models The predictions of capabilities are very limited

Therefore, in order to effectively evaluate whether today's AI models can still scale, we need more evidence-based predictions and better evaluation benchmarks.

If we can understand the capabilities of future models, we can prioritize preparing for those capabilities—for example, building biomanufacturing capabilities ahead of the biological research revolution, preparing upskilling companies for workforce replacement, etc. .

From a personal perspective, Niechen is still very optimistic about the progress of AI capabilities because there are outstanding talents in this field.

But AI Scaling is not as deterministic as people think, and no one really knows what kind of development AI will bring in the next few years.

Keywords: Bitcoin
Share to: