On the 11th day of OpenAI's "12 Days" release marathon, the spotlight was once again stolen by Google.
Google has just released the experimental "Gemini 2.0 Flash Thinking" model, which is touted for its reasoning capabilities and can "explicitly show its thinking" while solving complex problems, reportedly at a level comparable to Ph.D. students in physics, chemistry, and biology.
Google CEO Sundar Pichai wrote in a post on the social network X that it is "our most thoughtful model yet," accompanied by a smiley face.
As the name suggests, it is built on "the speed and performance of Flash 2.0". Google says it's "trained to think out loud," resulting in "enhanced inference performance."
Positioned to compete with OpenAI's o1, the new model, as Jeff Dean, chief scientist of Google DeepMind, noted in a post, is built on the faster Gemini 2.0 Flash model.
Dean shared a demo showing how Gemini 2.0 Flash Thinking can answer a physics problem by "thinking" through a series of steps and then providing a solution. Google shared several demos across physics and probability:
Gemini 2.0 Flash Thinking is now available in Google AI Studio and Vertex AI. It debuted on the Chatbot Arena LLM leaderboard, ranking “#1 in all categories.” Just yesterday, Google launched 2.0 Experimental Advanced in the Gemini app, with Gemini-Exp-1206 also topping the charts.
This is not necessarily "reasoning" in the same way humans do, but it means that the machine breaks down instructions into smaller tasks that produce stronger results.
Another example posted by Google product lead Logan Kilpatrick shows how the model can reason about solving problems involving visual and textual elements. "This is just the first step in our reasoning journey," Kilpatrick said.
Easier to understand and more transparent reasoning
In the developer documentation, Google explains that "Thinking Mode's reasoning capabilities in its responses are stronger than those of the base Gemini 2.0 Flash model." That base Gemini 2.0 Flash model is Google's latest and greatest model, released just eight days ago.
The new model supports only 32,000 tokens of input (roughly 50 to 60 pages of text) and can produce up to 8,000 tokens per output response. In Google AI Studio's side panel, the company says it is best suited for "multimodal understanding, reasoning," and "coding."
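For developers, those limits translate directly into request settings. Below is a minimal Python sketch using the google-generativeai SDK; the model identifier "gemini-2.0-flash-thinking-exp" and the input file name are assumptions for illustration only.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Model name is an assumption; check Google AI Studio for the exact experimental identifier.
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")

prompt = open("long_document.txt").read()  # hypothetical input file

# The model accepts roughly 32,000 input tokens, so verify the prompt size first.
if model.count_tokens(prompt).total_tokens > 32_000:
    raise ValueError("Prompt exceeds the 32,000 input-token limit")

# Cap the response at the documented 8,000-token output maximum.
response = model.generate_content(
    prompt,
    generation_config=genai.GenerationConfig(max_output_tokens=8000),
)
print(response.text)
```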
Full details of the model’s training process, architecture, licensing and cost have not yet been released. Currently, it shows zero cost per token in Google AI Studio.
Unlike OpenAI's competing reasoning models o1 and o1-mini, Gemini 2.0 Flash Thinking lets users view its step-by-step reasoning through a drop-down menu, offering a clearer, more transparent view of how the model reaches its conclusions.
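In the API, that same transparency may show up as separate parts of the response content. The sketch below is an assumption about how the thinking trace is surfaced; the actual response layout may differ from what AI Studio's drop-down shows, and the model identifier is again assumed.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")  # assumed identifier

response = model.generate_content(
    "A ball is dropped from a 20 m tower. How long does it take to reach the ground?"
)

# Assumption: the reasoning trace and the final answer arrive as separate parts,
# mirroring the drop-down separation shown in Google AI Studio.
for i, part in enumerate(response.candidates[0].content.parts):
    print(f"--- part {i} ---")
    print(part.text)
```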
By letting users see the decision-making process, Gemini 2.0 Flash Thinking addresses long-standing concerns about artificial intelligence operating as a "black box," and brings the model (whose licensing terms are still unclear) closer to the transparency of competitors' open-source models.
Early informal tests by developers show that it can correctly and quickly (within 1 to 3 seconds) answer questions that trip up other AI models, such as counting the number of R's in the word "strawberry" (see screenshot above).
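For reference, the expected answer is easy to verify locally:

```python
# The word "strawberry" contains three r's, which is the answer the model should give.
print("strawberry".count("r"))  # -> 3
```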
Native support for image upload and analysis
Gemini 2.0 Flash Thinking goes a step further than the rival OpenAI o1 series in that it is designed to handle images out of the box.
o1 started as a text-only model but has since been expanded to include image and file upload analysis. Currently, both models can only return text.
According to the developer documentation, Gemini 2.0 Flash Thinking does not currently support integration with Google Search, nor with other Google applications or external third-party tools.
Gemini 2.0 Flash Thinking’s multimodal capabilities expand its potential use cases, allowing it to address scenarios that combine different types of data.
For example, in one test, the model solved a difficult problem requiring analysis of text and visual elements, demonstrating its versatility in integrating and reasoning across formats.
Developers can take advantage of these capabilities through Google AI Studio and Vertex AI, where the model is available for experimentation.
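A minimal multimodal call might look like the sketch below, again using the google-generativeai SDK; the model identifier and the image file are placeholders, not confirmed names.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")  # assumed identifier

# Combine an image with a text prompt in a single request;
# "circuit.png" is a placeholder for whatever diagram you want analyzed.
diagram = Image.open("circuit.png")
response = model.generate_content(
    [diagram, "Find the equivalent resistance between points A and B."]
)

print(response.text)  # the model currently returns text only
```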
As competition in the field of artificial intelligence intensifies, Gemini 2.0 Flash Thinking may mark the beginning of a new era of problem-solving models. Its ability to process multiple data types, provide visible reasoning, and execute at scale makes it a strong contender in the reasoning AI market, competing with OpenAI's o1 series and other comparable products.
Reference links:
https://lmarena.ai/?leaderboard
https://analyticsindiamag.com/ai-news-updates/openai-sets-the-stage-for-agentic-ai-with-chatgpt-desktop-apps-for-mac-and-windows/