Thursday, November 7th

    Gemini Live, Google's response to ChatGPT's Advanced Voice Mode, launched

    Google is launching Gemini Live, its answer to the Advanced Voice Mode in OpenAI's ChatGPT, allowing users to hold "in-depth" voice chats on their smartphones.

    Gemini Live, Google's answer to OpenAI's recently launched (in limited alpha) Advanced Voice Mode for ChatGPT, is rolling out on Tuesday, months after it was first announced at the Google I/O 2024 developer conference. The rollout was announced at the Made by Google 2024 event. Gemini Live lets users hold "in-depth" voice chats on their smartphones with Gemini, Google's AI chatbot.


    Thanks to an enhanced speech engine that delivers what Google claims are more consistent, emotionally expressive and realistic multi-turn conversations, people can interrupt Gemini while the chatbot is speaking to ask follow-up questions, and it will adapt to the pace of their speech. Here's how Google describes it in a blog post: "With Gemini Live [via the Gemini app], you can talk to Gemini and choose from [10 new] natural-sounding voices it can respond with. You can even speak at your own pace or interrupt mid-response with clarifying questions, as you would in any conversation."





    What's the point? Google gives the example of rehearsing for a job interview. The scenario is a little ironic, but okay. Google says Gemini Live can practice with you, give you speaking tips, and suggest skills to highlight when talking to hiring managers (or AI, as the case may be). One advantage Gemini Live may have over ChatGPT's Advanced Voice Mode is better memory.


    The generative AI models that power Live, Gemini 1.5 Pro and Gemini 1.5 Flash, have longer-than-average "context windows," meaning they can take in and reason over large amounts of information, theoretically hours of back-and-forth conversation. "Live uses our Gemini Advanced model, which we tweaked to make it more conversational," a Google spokesperson said in an email. "Users benefit from the model's large context window when engaging in long conversations with Live."


    We'll have to see how this all plays out in practice, of course. If OpenAI's stumbles with Advanced Voice Mode are any indication, demos rarely translate seamlessly into the real world.

