On December 13, on the sixth day of its 12-day launch event, OpenAI released Advanced Voice with Vision. This means ChatGPT can now interact with users through sight and hearing, offering a real-time visual interaction experience as natural as video chatting with a real person.
On December 5, local time, OpenAI kicked off an intensive release cycle, planning to unveil new products and features across 12 livestreams over 12 days. Before day six, it had already released a series of updates, including the ChatGPT Pro plan, reinforcement fine-tuning, Sora, the interactive Canvas interface, and various ways to integrate ChatGPT with Siri on iPhone and Mac.
Notably, the Advanced Voice with Vision feature released on day six was first demonstrated in May, alongside the launch of the GPT-4o model. It allows ChatGPT to process visual information while holding a voice conversation, such as recognizing images captured through a camera. The technology also delivers a more natural, real-time conversational experience and can pick up on non-verbal cues, such as speaking pace, and respond with emotion.
Even more fun, throughout December, users can chat with a Santa Claus voice that speaks with a British English accent and converses naturally. Starting Thursday, the ChatGPT mobile app will begin rolling out Advanced Voice with Vision to Team, Plus, and Pro subscribers globally (excluding Europe).
Day six of OpenAI's event was led by Chief Product Officer Kevin Weil, with voice and vision specialists Jackie Shannon, Michelle Qin, and Rowan Zellers also joining the livestream.
In the demo session, ChatGPT showed significant progress in remembering video, voice, and text. It could even recall the names of people shown on camera after hearing them introduced by voice alone. Advanced Voice makes conversations more natural and fluid thanks to native multimodal interaction. It also supports video calling and screen sharing, letting users show applications to ChatGPT for help with troubleshooting. With the "Share Screen" feature, users can show ChatGPT any app on their phone; whether they have a message open or anything else, they can ask ChatGPT for reply suggestions. Impressively, ChatGPT can also identify exactly which application the user is currently using.
In another demonstration, vision specialist Rowan Zellers activated ChatGPT's visual recognition while brewing pour-over coffee. ChatGPT not only identified the Santa hat on his head and the coffee dripper in his hand, but also guided him step by step through the entire pour-over process. Throughout the demo, Advanced Voice kept a natural, friendly tone and even laughed at the right moments, making it feel like talking to a real person.
OpenAI's Advanced Voice with Vision is similar to Google's Project Astra, which was further enhanced in this week's Google Gemini 2.0 update.
Summary of Advanced Voice with Vision:
--Advanced Voice Mode now includes screen sharing and visual recognition, so ChatGPT can provide assistance based on what the user's phone camera captures or on the information displayed on the screen.
--These new features build on what Advanced Voice Mode already excels at: simulating everyday human conversation. Conversations can be interrupted at any time, support multiple turns of interaction, and can follow non-linear jumps in the speaker's train of thought.
--In the demo session, the presenter made coffee following guidance from ChatGPT's voice and vision features, with ChatGPT offering spoken suggestions and direction in real time as each step proceeded.
--For the Christmas season, OpenAI has launched a special Santa Claus voice, activated simply by tapping the snowflake icon in the interface. Wherever users are, as long as they can use ChatGPT's voice mode, they can hear Santa's voice. Moreover, the first time users talk to Santa, their usage limit is reset, so they can chat freely without worrying about running into restrictions.
--Starting today, the latest mobile app will gradually roll out Advanced Voice with Vision to all Team users and most Plus and Pro subscribers. Plus and Pro users in Europe will get access as soon as possible, while Enterprise and Edu users will gain access early next year.