Do large language models have non-verbal reasoning capabilities?
2024-12-13

Source: Quantum

A lead article in Ars Technica today explores whether large language models are capable of non-verbal reasoning, citing researchers' findings that processing in "latent space" can help artificial intelligence solve difficult logic problems. What is going on? Read on.

To date, large language models have achieved great success by using their transformer architecture to efficiently predict the next word (i.e., language token) needed to respond to a query. However, when it comes to complex reasoning tasks that require abstract logic, some researchers have found that working everything out through this "language space" can cause problems, even for modern "reasoning" models.
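To make that next-token loop concrete, here is a toy sketch (not a real model): a hand-written lookup table stands in for the transformer, and generation is just "pick the most likely next token, append it, repeat."

```python
# Toy sketch of autoregressive next-token prediction, the decoding loop an
# LLM's transformer runs. The lookup table below is a stand-in for a real model.

# Hypothetical "model": maps the last token to a distribution over next tokens.
NEXT_TOKEN_PROBS = {
    "<start>": {"Paris": 0.9, "Rome": 0.1},
    "Paris":   {"is": 1.0},
    "is":      {"the": 1.0},
    "the":     {"capital": 1.0},
    "capital": {"<end>": 1.0},
    "Rome":    {"<end>": 1.0},
}

def generate(max_tokens: int = 10) -> list[str]:
    """Greedily pick the most likely next token, append it, and repeat."""
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = NEXT_TOKEN_PROBS.get(tokens[-1], {"<end>": 1.0})
        next_token = max(probs, key=probs.get)  # greedy choice
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]

print(" ".join(generate()))  # -> "Paris is the capital"
```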

Now researchers are trying to get around these problems by designing models that can work out potential logical solutions entirely in "latent space," the hidden layer of computation that comes just before the transformer generates language. While this approach does not lead to a sea change in the reasoning capabilities of large language models, it does significantly improve accuracy on certain types of logic problems and points to some interesting directions for new research.

Wait a minute, what space?

Modern reasoning models (such as ChatGPT's o1) tend to work by generating "chains of thought." In these models, each step of the logical process is expressed as a sequence of natural-language word tokens and fed back through the model.
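A minimal sketch of that loop, with a hypothetical call_llm function standing in for a real model API, might look like this: each step comes back as plain text and is appended to the context before the next call.

```python
# Sketch of the chain-of-thought loop described above: every intermediate step
# is produced as natural-language text and appended back into the model's
# context. `call_llm` is a hypothetical stand-in for a real model API; here it
# replays canned steps so the sketch runs end to end.

CANNED_STEPS = iter([
    "Step 1: The train covers 120 km in 2 hours.",
    "Step 2: Speed = 120 / 2 = 60 km/h.",
    "Answer: 60 km/h",
])

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would send `prompt` to a model.
    return next(CANNED_STEPS)

def chain_of_thought(question: str, max_steps: int = 8) -> str:
    context = f"Question: {question}\nLet's think step by step.\n"
    for _ in range(max_steps):
        step = call_llm(context)        # each step is ordinary text...
        context += step + "\n"          # ...and is fed back through the model
        if step.startswith("Answer:"):  # stop once a final answer appears
            return step
    return "no answer found"

print(chain_of_thought("How fast is a train that covers 120 km in 2 hours?"))
```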

In a new paper, researchers from Meta's Fundamental AI Research (FAIR) group and the University of California, San Diego identify this reliance on natural language and "word tokens" as a "fundamental constraint" for these reasoning models. That is because successfully completing reasoning tasks often requires complex planning over specific critical tokens in order to find the correct logical path among many options.

The figure above illustrates the difference between a standard model, which passes through the transformer at every step, and the COCONUT model, which uses hidden "latent" states. (Source: Training Large Language Models to Reason in a Continuous Latent Space)

In current chain-of-thought models, word tokens are usually generated for "textual coherence" and "fluency" and "contribute little to the actual reasoning process," the researchers wrote. Instead, they suggest, "ideally, large language models would be free to reason without any language constraints and then translate their findings into language only when necessary."

To achieve this "ideal," the researchers describe a method for "training large language models to reason in a continuous latent space," as the paper's title puts it. This "latent space" essentially consists of the set of "hidden" intermediate token weights that the model holds just before the transformer generates a human-readable, natural-language version of that internal state.

In the researchers' COCONUT model (Chain of Continuous Thought), these hidden states are encoded as "latent thoughts" that replace the individual written steps of a logical sequence during training and query processing. This avoids having to convert to natural language at every step and "frees reasoning from language space," the researchers write, resulting in an optimized reasoning path they call "continuous thought."
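Schematically, and assuming a generic decoder-only interface rather than the authors' actual code, the difference is that during the latent phase the last hidden state is fed straight back in as the next input embedding instead of being decoded into a word token; model.embed, model.transformer, and model.lm_head below are placeholder names.

```python
import torch

# Schematic sketch (an assumption about the interface, not the authors' code)
# of the COCONUT idea: during the "latent thought" phase, the transformer's
# last hidden state is fed back in as the next input embedding, skipping the
# decode-to-token step entirely.

def latent_then_verbal(model, prompt_ids: torch.Tensor,
                       n_latent_steps: int, n_text_tokens: int) -> list[int]:
    inputs = model.embed(prompt_ids)  # (1, seq_len, hidden_dim)

    # Phase 1: "continuous thoughts" -- reason in latent space, no tokens decoded.
    for _ in range(n_latent_steps):
        hidden = model.transformer(inputs)          # (1, seq_len, hidden_dim)
        last_hidden = hidden[:, -1:, :]             # keep only the final state
        inputs = torch.cat([inputs, last_hidden], dim=1)

    # Phase 2: switch back to language space and decode the answer as usual.
    out_ids = []
    for _ in range(n_text_tokens):
        hidden = model.transformer(inputs)
        logits = model.lm_head(hidden[:, -1, :])    # (1, vocab_size)
        tok = int(logits.argmax(dim=-1))
        out_ids.append(tok)
        inputs = torch.cat([inputs, model.embed(torch.tensor([[tok]]))], dim=1)
    return out_ids
```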

Broader vision

While logical processing in latent space has some benefits for model efficiency, the more important finding is that this kind of model can "encode multiple potential next steps simultaneously." Processing logic in "latent space" allows a kind of instant backtracking that the researchers liken to breadth-first search through a graph, rather than hunting down each logical option fully and one at a time in a "greedy" process.
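The search analogy can be made concrete with a toy graph (illustrative only, not from the paper): a greedy walker commits to one branch at a time and can get stuck in a dead end, while breadth-first search keeps every partial path alive and still reaches the goal.

```python
from collections import deque

# Illustrative contrast between greedy search, which commits to one branch,
# and breadth-first search, which keeps many partial paths alive at once.

GRAPH = {
    "start": ["dead_end", "promising"],
    "dead_end": [],
    "promising": ["goal"],
    "goal": [],
}

def greedy(graph, start, goal):
    node, path = start, [start]
    while node != goal:
        options = graph[node]
        if not options:
            return None          # committed to one branch and hit a dead end
        node = options[0]        # always take the first (locally "best") option
        path.append(node)
    return path

def bfs(graph, start, goal):
    queue = deque([[start]])     # many candidate paths held simultaneously
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            queue.append(path + [nxt])
    return None

print(greedy(GRAPH, "start", "goal"))  # None: stuck in the dead end
print(bfs(GRAPH, "start", "goal"))     # ['start', 'promising', 'goal']
```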

This emergent, simultaneous-processing behavior shows up in testing even though the model was never explicitly trained for it, the researchers wrote. "While the model may not make the correct decision initially, it can maintain many possible options within the continuous thoughts, guided by some implicit value function, and gradually eliminate incorrect paths through reasoning," they wrote.

This diagram highlights some of the ways that different models can fail in certain types of logical reasoning. (Source: Training Large Language Models to Reason in a Continuous Latent Space)

On a relatively simple test of mathematical reasoning (GSM8K) and one of general reasoning (ProntoQA), this multi-path reasoning did not really improve COCONUT's accuracy compared with traditional chain-of-thought models. But the researchers found that the model performed relatively well on a set of randomly generated ProntoQA-style queries involving complex and convoluted sets of logical conditions (e.g., "every apple is a fruit, every fruit is food, etc.").
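To give a sense of what those chained conditions look like, here is a small assumed example (not the benchmark's actual code): answering such a query amounts to following a chain of "every X is a Y" implications.

```python
# Tiny illustration of ProntoQA-style chained conditions: "every apple is a
# fruit, every fruit is food, ..." Resolving a query means following the chain.

RULES = {
    "apple": "fruit",
    "fruit": "food",
    "food": "perishable",
}

def entails(rules: dict[str, str], start: str, target: str) -> bool:
    """Follow the implication chain from `start` and check if it reaches `target`."""
    seen = set()
    current = start
    while current in rules and current not in seen:
        seen.add(current)
        current = rules[current]
        if current == target:
            return True
    return False

print(entails(RULES, "apple", "food"))       # True
print(entails(RULES, "apple", "vegetable"))  # False
```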

On these tasks, standard chain-of-thought reasoning models often ran into dead ends when trying to work through the logical chains, or even produced completely fabricated rules. Previous research has also suggested that the "verbalized" logical steps these chain-of-thought models output "may actually tap into an underlying reasoning process distinct from the one being shared."

The new research joins a growing body of work aimed at understanding and exploiting how large language models operate at the level of their underlying neural networks. Although this type of research has yet to achieve a major breakthrough, the researchers believe that models pretrained with this kind of "continuous thinking" from the start could "enable models to generalize more effectively across a wider range of reasoning scenarios."
