Google attacks OpenAI!
Editor
2024-12-13 12:03


Image source: Generated by Unbounded AI

On December 12, just as OpenAI announced that ChatGPT would be fully integrated into Apple devices, Google released its new generation of large models, Gemini 2.0. Notably, Gemini 2.0 was built for AI agents.

Google CEO Sundar Pichai said in an open letter: "Over the past year, we have been investing in developing more 'agentic' models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision. Today, we are excited to launch our next generation of models: Gemini 2.0, our most capable model yet. With new advances in multimodality, such as native image and audio output, and native tool use, we can build new AI agents that bring us closer to our vision of a universal assistant."

Google DeepMind CEO Demis Hassabis added that 2025 will be the era of AI agents, and that Gemini 2.0 is the latest generation of models underpinning Google's agent-based work.

The full Gemini 2.0 has not yet been officially launched; Google said it has been made available to some developers for early testing. The first release is an experimental version of Gemini 2.0 Flash, which outperforms Gemini 1.5 Pro. The experimental version is already available on the web: Gemini users can access Gemini 2.0 Flash on desktop, with the mobile version to follow soon.
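For developers, access to the experimental model goes through the Gemini API. The minimal sketch below uses Google's Gen AI Python SDK and the experimental model id gemini-2.0-flash-exp; the package name, model id, and API-key setup are assumptions drawn from Google's developer documentation at the time, not details stated in this article.

# Minimal sketch: calling the experimental Gemini 2.0 Flash model via
# Google's Gen AI Python SDK (pip install google-genai).
# The model id "gemini-2.0-flash-exp" and the SDK usage shown here are
# assumptions based on Google's developer docs, not this article.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental Gemini 2.0 Flash
    contents="Summarize what an AI agent is in two sentences.",
)

print(response.text)  # the model's text reply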

According to benchmark results released by Google, the experimental Gemini 2.0 Flash surpasses Gemini 1.5 Pro in almost every category, including multimodal image and video understanding, coding, and mathematics, while responding about twice as fast.

Google concentrates its firepower on AI agents

With this update, we can already glimpse the tip of the iceberg of Google's AI strategy: everything is built for intelligent agents.

1. More powerful multi-modal capabilities:

In addition to multimodal input such as images, video, and audio, the Gemini 2.0 Flash experimental version also supports multimodal output, such as natively generated images combined with text, and steerable multilingual text-to-speech (TTS) audio.
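As a rough developer-side sketch of what multimodal output looks like, the example below asks the experimental Flash model to return both text and an image in a single response. The response_modalities setting and the way image bytes are read from the response parts are assumptions based on Google's Gen AI SDK documentation, not details given in this article.

# Sketch: requesting mixed text-and-image output from Gemini 2.0 Flash (experimental).
# The response_modalities option and the response structure are assumptions
# based on Google's Gen AI SDK docs; they are not described in this article.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Draw a simple diagram of an AI agent loop and explain it briefly.",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The reply interleaves text parts and image parts.
for part in response.candidates[0].content.parts:
    if part.text:
        print(part.text)
    elif part.inline_data is not None:  # image bytes returned inline
        with open("agent_diagram.png", "wb") as f:
            f.write(part.inline_data.data)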

2. More professional AI search:

Google has launched a new agent feature called Deep Research in Gemini Advanced. It combines Google's search expertise with Gemini's advanced reasoning to generate research reports on complex topics, acting like a personal research assistant.

3. Multiple agents have been updated and launched:

Project Astra, the agent built on Gemini 2.0, has been updated. New capabilities include mixed multilingual conversations; the ability to call Google Lens and Google Maps directly from the Gemini app; improved memory, with up to 10 minutes of in-session memory for more coherent conversations; and new streaming technology plus native audio understanding, which lets the agent respond with latency close to that of human conversation. Notably, Astra is Google's forward-looking bet on glasses: Google said it is bringing Project Astra to more form factors, such as glasses.

Released Project Mariner, an agent for the browser: it can understand and reason about information on the browser screen, including pixels and web page elements (such as text, code, and images), and then use that information to complete tasks for you through a Chrome extension.

Released Jules, an AI coding agent built for developers: Jules integrates directly into the GitHub workflow, so users can describe a problem in natural language and have Jules generate code that can be merged back into their GitHub projects.

Released a gaming agent: it interprets the screen in real time, suggests next moves based on the action on the user's game screen, and can even talk with you by voice while you play.

Google said it will bring Gemini 2.0 to more of its products early next year. The previously launched AI Overviews will be integrated with Gemini 2.0 to handle more complex questions, including advanced mathematical formulas, multimodal queries, and programming. Limited testing began this week, with a broader rollout to more countries and languages planned for next year.
