From the time Alphabet CEO Sundar Pichai walked onto the annual Google I/O stage to the time the two-hour-long event wrapped up, the team would mention AI more than 120 times. That count, of course, is according to Gemini itself. The annual event held in California on May 14 was heavily focused on Gemini 1.5 Pro, Google’s latest update to the AI platform formerly known as Bard.

The updates coming to Google Gemini focus on “making AI helpful for everyone,” as Pichai described. Key to the newest AI skills are the ability to mix and match text with audio, photos and video, as well as the ability to handle one million tokens (or two million, for developers). That will soon empower Gemini to use your phone’s camera to answer questions about your surroundings, return that online order you didn’t like, or recognize scam calls on Android in real time, to name just a few of the on-stage demonstrations.

If you missed the biggest announcements coming from Google’s largest developers conference, or perhaps tuned out after the first Taylor Swift joke, we’ve rounded up the biggest problems that Google’s AI will soon attempt to solve.

1. Searching the web when you don’t know exactly what to search for

You could soon search with video

With the latest updates, Pichai says Gemini will even do the Googling for you. In a feature rolling out today, searchers will be able to ask Google a question and have Gemini answer right in Search.

But perhaps the more powerful tool is the ability to search when you don’t have the right words to explain what you are looking for. In the coming weeks, Google is rolling out video capabilities in Search. In the demonstration, the company showed how you could use video to fix a record player or a film camera when you don’t even know the name of the broken part or why it’s not working.

Google’s AI will soon drive a more powerful web search that lets you ask multiple questions in one. Multistep reasoning capabilities allow Search to answer multi-part questions. For example, the company demoed searching not just for a nearby yoga studio, but for studios with specific characteristics, like those that are beginner-friendly and within walking distance.

If you don’t know what to ask, Google says Search will soon get AI organization, rolling out to dining first. This means you can search for a place to spend your anniversary dinner, and Search will organize the results into different categories to give you more ideas, like rooftop dining or historic places. While the organization is heading first to dining, it will soon also roll out to books, music, shopping, hotels and more.

2. Ask about real-world objects in real time

Give Gemini a live camera view and get real-time data

Alphabet’s AI will soon help users search in the world around them, much like Google Search helps find things on the web. During I/O, the company demonstrated Project Astra, which uses live video to search the surroundings in real time, tackling everything from finding a specific book on your physical bookshelf to telling you where you left your glasses.

During the demonstration, the feature worked both on a smartphone and on AR glasses. The demo also showed users asking the AI questions in real time, from locating a specific object to showing the AI code and asking what it does.

The beginnings of these video features will be rolling out to the Gemini app later this year.

3. Consolidate long-form content, even across multiple apps

Subscribers can feed the AI up to 1,500 PDF pages

The update doesn’t just bring the ability to handle large amounts of data; it also brings the ability to work across multiple apps. For example, you can ask Gemini to summarize all the emails from your child’s school in Gmail, and it can also read the Google Meet board meeting and summarize that as well.

4. Transform large data into a new format

Turn your study notes into an auditory lecture

Gemini’s large data summarization capabilities sound impressive, but Gemini will also be able to change the format of that data. It isn’t limited to summarizing text and then spitting out more text – it can tell you about those documents audibly.

According to the demo, you can even interrupt this summary to ask more questions. On stage, this capability was used to consolidate a student’s multiple resources and then generate a study guide, create practice tests, or deliver an audible lecture on the topic.

5. Search your photos for answers

Gemini can use your photos to answer personalized questions

Gemini’s enhanced search capabilities also extend to Photos. Yes, Google Photos already has a search box. But instead of delivering multiple images of your car when you ask for your license plate number, Gemini will soon jump straight to the answer, listing your license plate number rather than a hundred photos of your car that might contain the correct information.

You may also soon ask it milestone questions, like when your child first learned to swim, and it will simply tell you the answer rather than displaying all photos of a swimming pool.

6. Generate more detailed photos, even with text

Generative photos, video and music also get a major boost

The Gemini updates also extend to its generative capabilities for images, video and music. A key update for images is the ability to handle text. AI typically can’t place text on an image without creating nonsensical, misspelled words. Google’s Senior Research Director Doug Eck says that the new Imagen 3 not only creates more detailed generative images with fewer distortions, but is also better at rendering text. (OpenAI similarly announced enhanced capabilities with text on images during its event yesterday.)

Video generation also gets a boost with Veo, the new generative video model. It adds tools for creating aerial images and timelapses, as well as for extending the length of an existing video.

The photo and video capabilities, along with enhanced music AI, don’t yet have a launch date but are available to select creators through Google Labs, with a waitlist open now.

7. Summarize tasks in Gmail

Gemini can soon automate tasks for you

Gmail’s AI integration is about to get a lot more advanced than simple reply suggestions. Gemini will soon power tasks like asking your Gmail questions. It can also create rules for future emails, like adding a receipt sent to your email to an expense tracker in Sheets, then continuing to update that document with new receipts.

Those features begin rolling out to Google Labs in September.

8. Answer questions or flag scammers inside Android apps

Android users can use Gemini within more key apps

Part of this integrated Android AI experience is scam detection, where the AI listens to your calls and immediately alerts you if it suspects the caller is a scammer. Google says that this feature is currently in testing.

9. Let AI Agents do the work for you

Gemini can handle more tasks like filling out forms with less input from you

Gemini can already write your emails for you, but with Agents, Gemini can take more actions on your behalf. During I/O, the company demonstrated how Gemini could help you return a pair of shoes by locating a receipt in your Gmail, filling out the return form for you, and even scheduling a package pickup. Or, after you move, it could help update your address across all the different services that you use. The company says that the Agents work under your supervision but are able to reason, plan and think multiple steps ahead.

10. Aid in learning with LearnLM

LearnLM is a new Gemini model specifically for education

Many of the demonstrations centered on how a student (or a parent of a student) can use AI for learning. LearnLM is an educational model of Gemini that’s designed specifically to help with homework, like creating a study guide or practice tests, or using the camera to help solve a math problem.
