Sometimes it seems like Google is falling behind in the generative AI race against rivals like Meta, OpenAI, Anthropic, and Mistral. But that’s no longer the case.
Today, the company leapfrogged most of its rivals with the announcement of Gemini Live, a new speech mode for its Gemini AI model available through the Gemini mobile app. It lets users speak to the model in plain, everyday language, interrupt it mid-response, and hear it answer with a human-like voice and cadence. Or as Google put it in a post on X: “You can now have a free-flowing conversation and even interrupt or change the subject, just like you would during a normal phone call.”
If that sounds familiar, it's because OpenAI demonstrated its own “Advanced Voice Mode” for ChatGPT back in May, openly comparing it to the talking AI operating system from the movie Her, only to postpone the feature and begin rolling it out selectively to alpha participants at the end of last month.
Gemini Live is now available in English on the Google Gemini app for Android devices through a Gemini Advanced subscription ($19.99 per month), with an iOS version and support for more languages arriving in the coming weeks.
In other words, while OpenAI was first to demonstrate a similar feature, Google plans to make its version available to a far larger potential audience (more than 3 billion active Android users and 2.2 billion iOS devices), and much sooner than ChatGPT's Advanced Voice Mode.
Still, part of the reason OpenAI delayed ChatGPT's Advanced Voice Mode may be its own internal “red-teaming,” or controlled adversarial safety testing, which found that the voice mode sometimes engaged in strange, disturbing, and even potentially dangerous behavior, such as imitating the user's own voice without permission, which could be exploited for fraud or other malicious purposes.
How is Google addressing the potential harms this kind of technology could cause? We don't really know yet, but VentureBeat has reached out to the company to ask and will update this story if we hear back.
What is Gemini Live good for?
Google pitches Gemini Live as delivering fluid, natural conversations well suited to brainstorming ideas, preparing for important discussions, or just casually chatting about “a variety of topics,” with the assistant designed to respond and adapt in real time.
Additionally, the feature works hands-free, allowing users to continue their interactions even when their device is locked or other apps are running in the background.
Google further announced that the Gemini AI model is now fully integrated into the Android user experience, delivering more context-aware assistance tailored to the device.
Users can open Gemini by long-pressing the power button or by saying, “Hey Google.” This integration allows Gemini to interact with the content on the screen, such as providing details about a YouTube video or generating a list of restaurants from a travel vlog to add directly to Google Maps.
In a blog post, Sissy Hsiao, Vice President and General Manager of Gemini Experiences and Google Assistant, emphasized that the evolution of AI has prompted a rethinking of what it means for a personal assistant to be truly helpful. With these new updates, Gemini is poised to deliver a more intuitive and conversational experience, making it a reliable sidekick for complex tasks.