
By Faizel Patel

Senior Digital Journalist


WATCH: Google I/O 2024: All the Gemini and Android announcements

Google CEO Sundar Pichai and others from Google's leadership team announced all kinds of AI-powered goodies


Google kicked off its I/O 2024 developer conference with a rapid-fire stream of announcements, including advancements in search, creative tools and music, with a heavy emphasis on artificial intelligence (AI).

Google CEO Sundar Pichai and others from Google’s leadership team announced all kinds of AI-powered goodies from the stage at Mountain View, California this week.

Google I/O 2024 was packed with announcements, ranging from the Gemini family of models and innovative search features to AI-powered enhancements across various Google products, new AI music creation tools and the introduction of Google’s new family of models, Gemma.

Here’s a full recap of Google’s news and updates from Google I/O 2024.

Gemini 1.5 Pro and Gemini 1.5 Flash

Google announced the general availability of Gemini 1.5 Pro, a powerful AI model with a 1 million token context window, enabling it to process vast amounts of information like an hour of video or 1,500 pages of a PDF and respond to complex queries about this source material.

Gemini 1.5 Pro will also be available in more than 35 languages, providing access to the latest technical advances, including deep analysis of data files like spreadsheets, enhanced image understanding and a greatly expanded context window starting at 1 million tokens.

Additionally, Google introduced Gemini 1.5 Flash, a more cost-efficient model built in response to user feedback, with lower latency, as well as Project Astra, Google’s vision for the next generation of AI assistants: a responsive agent that can understand and react to the context of conversations.

ALSO READ: WATCH: OpenAI unveils GPT-4o, a new ChatGPT that listens and talks

Google is integrating Gemini into Search, enhancing its ability to understand and respond to complex queries.

This includes features like:

AI Overviews: Designed for advanced multi-step reasoning, planning and multimodal capabilities, this enhancement lets people ask intricate, multi-step questions, tailor their search results and interact using videos for a richer query experience. It is set to launch soon, starting in the US before expanding globally.

Multi-step reasoning: Breaks down complex questions into smaller parts, synthesising the most relevant information and stitching it all together into a comprehensive AI Answer.

Search with video: Allows users to ask questions about video content by taking a quick video and receiving AI-powered answers in response. This feature will be available first in the US and will roll out to other regions over time.

Gemini for Android

Google said Gemini is also being integrated into Android to power new features like Circle to Search, which allows users to search for anything they see on their screen.

“This feature is expanding to more surfaces like Chrome desktop and tablets.”

It said Gemini Nano will enhance TalkBack, Android’s screen reader, with new features that make it easier for people with visual impairments to navigate their devices and access information. This will come first to Pixel devices later in the year.

Other features include live scam detection, where Gemini Nano will be used to detect scam phone calls in real time, providing users with warnings and helping them avoid falling victim to fraud.

An AI assistant on Android will also provide contextual suggestions and anticipate user actions based on the current app screen. This feature will be available wherever the Gemini app is already available and requires Android 10 or later and at least 2GB of RAM. The new overlay features announced at I/O will roll out over the coming months.

The upgraded Circle to Search in action. Video: Google

Gemini features in Gmail mobile app

Google also announced that users will see more detailed suggested replies in Gmail. Gemini will automatically provide draft email responses that you can edit or simply send.

“Gemini can analyse email threads and provide a summarised view directly in the Gmail app. Gemini in Gmail will offer helpful features, like ‘summarise this email’, ‘list the next steps’ or ‘suggest a reply’, when you click the Gemini icon in the mobile app.”

Google said “Help me write” in Gmail and Docs now also supports Spanish and Portuguese.

Photos and Video

Google Photos is getting a new feature called “Ask Photos,” which uses Gemini to answer questions about photos and videos, such as finding specific images or recalling past events.

Veo is Google’s most capable video generation model, able to create high-quality 1080p videos that can extend beyond a minute. Veo closely follows user prompts and offers a high degree of creative control, accurately following directions like quick zooms or slow-motion crane shots.

It captures the nuance and emotional tone of prompts in various visual styles, from photorealism to animation, and maintains consistency across complex details. Veo builds upon years of generative video model work and combines architecture, scaling laws, and novel techniques to improve latency and output resolution.

Veo is available to select creators in a private preview in VideoFX; others can join the waitlist. In the future, Veo’s capabilities will be brought to YouTube Shorts and other products.

Music

Other announcements made by Google at I/O include AI music tools.

Google is collaborating with musicians, songwriters, and producers, in partnership with YouTube, to better understand the role of AI in music creation. They are developing a suite of music AI tools that can create instrumental sections, transfer styles between tracks, and more.

These collaborations, Google said, inform the development of generative music technologies like Lyria, Google’s most advanced family of models for AI music generation.

“New experimental music created with these tools by Grammy winner Wyclef Jean, electronic musician Marc Rebillet, songwriter Justin Tranter, and others was released on their respective YouTube channels at I/O.”

AI Test Kitchen

Google is also extending SynthID to text and video, allowing for watermarking of AI-generated content. SynthID can now embed a digital watermark directly into the pixels of an image or video, making it imperceptible to the human eye but detectable for identification.

Google said AI Test Kitchen is expanding its reach and is now available in over 100 countries and territories, including several in Sub-Saharan Africa such as Kenya, Nigeria and South Africa.

“Users can now experience and provide feedback on Google’s latest AI technologies, like ImageFX and MusicFX, in 37 languages, including Arabic, Chinese, English, French, German, Hindi, Japanese, Korean, Portuguese and Spanish,” Google said.

