Google I/O Reveals Gemini Still Needs Time to Bake

During the Google I/O 2024 kickoff keynote, the overall tone seemed to be: “Can we get an extension?” Google’s promised AI improvements are front and center, but with a few exceptions, most are still in development.

This isn’t all that surprising—it’s a developer conference, after all. But it looks like consumers will have to wait a little longer for the promised “Her” moment. Here’s what you can expect when Google’s new features start rolling out.

AI in Google Search

Credit: Google/YouTube

Perhaps the most impactful addition for most people will be Gemini’s expanded integration into Google Search. Google already has a generative search experience that quickly writes up a paragraph or two for you, and it will soon be joined by an “AI Overviews” feature.

AI Overviews will optionally expand that generative search into a full page of answers to your questions, along with suggestions based on the context of your search.

For example, if you live somewhere with good, sunny weather and ask for “restaurants near me,” AI Overviews might give you a few basic suggestions, as well as a separate, unprompted subheading for restaurants with good patio seating.

Meanwhile, on more traditional search results pages, you’ll be able to use “AI-organized search results,” which bypass traditional SEO rankings and intelligently recommend web pages based on very specific requests.

For example, you can ask Google to “create a three-day gluten-free meal plan with lots of vegetables and at least two desserts,” and the search page will generate several subheadings with links to relevant recipes under each.

Google is also emphasizing multimodality in search, meaning you can use it with more than just text. Specifically in development is an “Ask with Video” feature that will let you simply point your phone’s camera at an item, ask for help identifying or repairing it, and get answers via generative search.

Google hasn’t directly addressed criticism that AI search results essentially lift content from online sources without requiring users to click through to the original source. However, presenters repeatedly emphasized that these features surface useful links you can check for yourself, perhaps covering the company’s bases in the face of that criticism.

AI Overviews is already rolling out through Google’s Search Labs program, while AI-organized search results and Ask with Video are planned for the “coming weeks.”

Search your photos with AI

Credit: Google/YouTube

Another of the more concrete features in development is Ask Photos, which takes advantage of multimodality to help you sort through the hundreds of gigabytes of images on your phone.

Let’s say your daughter took swim lessons last year, and you’ve lost track of your first photos of her in the water. Ask Photos will let you simply ask, “When did my daughter learn to swim?” Your phone will automatically understand who “my daughter” refers to and surface images from her first swim lesson.

This is similar to searching your photo library for pictures of your cat by simply typing “cat,” of course, but the idea is that multimodal AI can handle more detailed questions and understand what you’re asking about with broader context, drawing on Gemini and the data already stored on your phone.

Other details are unclear: Ask Photos is scheduled to debut “in the coming months.”

Project Astra: AI agent in your pocket

Credit: Google/YouTube

This is where we get to the more pie-in-the-sky stuff. Project Astra is the closest thing to C-3PO we’ve yet seen from an AI. The idea is that you’ll be able to open the Gemini app on your phone, point the camera at something, and ask questions or get help based on what your phone sees.

For example, point it at a speaker and Astra can tell you what its components are and how they’re used. Point it at a drawing of a cat of questionable vitality, and Astra will answer your “Schrödinger’s cat” riddle. Ask it where your glasses are, and if Astra spotted them earlier in your camera feed, it will be able to tell you.

This is perhaps the classic AI dream, and it’s very similar to OpenAI’s recently announced GPT-4o, so it makes sense that it’s not ready yet. Astra is due to arrive “later this year,” but interestingly, it will work with both phones and AR glasses. We might be hearing about a new Google wearable device soon.

Create your own podcast hosted by robots

Credit: Google/YouTube

It’s unclear when this feature will be ready, as it seemed more like a showcase of Google’s improved AI models than a headline product, but one of the more impressive (and perhaps alarming) demos Google showed off during I/O involved creating a user-directed podcast hosted by AI voices.

Let’s say your son is studying physics in school but learns better from audio than text. Soon, you’ll reportedly be able to upload PDFs to Google’s NotebookLM app and ask Gemini to create an audio program discussing them. The app will generate a sort of podcast, hosted by AI voices speaking naturally about the topics in those PDFs.

Afterward, your son will be able to interrupt the hosts at any time to ask for clarification.

Hallucination is obviously a big concern here, and the naturalistic speech can be a bit “cringey,” for lack of a better word. But there’s no doubt it’s an impressive demo… if only we knew when we’d be able to try it for ourselves.

Paid features

Credit: Google/YouTube

There are a few more tools in the works that seem tailor-made for the average consumer, but for now, they’ll be limited to paid Google Workspace plans.

The most promising of these is the Gmail integration, which takes a three-pronged approach. The first prong is summarization, which can read through a Gmail thread and highlight the key points for you. That’s not especially new, nor is the second prong, which lets the AI suggest contextual replies based on information in your other emails.

But Gemini Q&A seems genuinely transformative. Imagine you need roofing work done and you’ve already emailed three different roofing firms for quotes. Now you want a spreadsheet listing each company, its quoted price, and its availability. Instead of digging through every email thread yourself, you can ask the Gemini box at the bottom of Gmail to build that table for you. It will search your Gmail inbox and generate a spreadsheet in minutes, saving you time and possibly surfacing emails you missed.

This kind of contextual table-building will also be coming to apps beyond Gmail, but Google also proudly showed off its new Gemini-powered “Virtual Teammate.” This upcoming Workspace feature is still in its early stages, and it plays like a mix of typical Gemini chat and Astra. The idea is that organizations will be able to add AI agents to their group chats (think Slack), ready to answer questions and create documents 24/7.

Gmail Gemini features will be available to Workspace Labs users this month.

Gems

Credit: Google/YouTube

Earlier this year, OpenAI replaced ChatGPT plugins with “GPTs,” letting users build custom versions of the ChatGPT chatbot tailored to specific tasks. Gems are Google’s answer to GPTs, and they work much the same way. You’ll be able to create multiple Gems, each with its own page in your Gemini interface and each following its own set of instructions. In Google’s demo, suggested Gems included examples like “Yoga Bestie,” which offers exercise tips.

Gems is another feature that won’t launch for a few months, so you’ll have to stick with GPTs for now.

Agents

Credit: Google/YouTube

After the muted reception of the Humane AI Pin and Rabbit R1, AI fans were hoping Google I/O would reveal Gemini’s answer to the promise behind those devices: the ability to go beyond simply retrieving information and actually interact with websites on your behalf. What we got was a light teaser with no set release date.

In remarks from Google CEO Sundar Pichai, we heard about the company’s intention to build AI agents that can “think several steps ahead.” For example, Pichai floated the possibility of a future Google AI agent that could help you return a pair of shoes, handling everything from “searching for the receipt in your inbox” to “filling out a return form” to “scheduling a pickup,” all under your supervision.

This all came with a huge caveat: it wasn’t a demo, just an example of what Google wants to work on. The phrase “imagine if Gemini could” did a lot of the heavy lifting during this portion of the event.

New Google AI Models

Credit: Google/YouTube

In addition to highlighting specific features, Google also touted new AI models and updates to its existing ones. From generative models like Imagen 3 to larger, more context-aware Gemini builds, these parts of the presentation were aimed more at developers than end users, but there are still a few interesting points worth noting.

The key reveals are Veo and Music AI Sandbox, which generate AI video and audio, respectively. There aren’t many details on how they work yet, but Google tapped big names like Donald Glover and Wyclef Jean for promising quotes like “Everyone will become a director” and “We’re digging through the infinite crates.”

For now, the best demos we have of these generative models are the examples posted to those celebrities’ YouTube channels.

Google also kept bringing up Gemini 1.5 Pro and 1.5 Flash during its presentation. These new versions of the LLM are primarily aimed at developers, supporting larger token counts that allow for greater context. That probably won’t mean much to you unless you’re eyeing Gemini Advanced.

Gemini Advanced, already available as Google’s paid Gemini plan, allows for asking more questions, some limited interaction with Gemini 1.5, integration with various apps like Docs (separate from the Workspace-exclusive features), and uploading files such as PDFs.

Some of Google’s promised features sound like they’ll require a Gemini Advanced subscription, especially those that let you upload documents so the chatbot can answer questions about them or generate its own content based on them. We don’t yet know for sure what will be free and what won’t, but that’s another caveat to keep in mind as Google asks us to stay tuned after this I/O.

That concludes Google’s general Gemini announcements. However, the company also announced new AI features in Android, including a new Circle to Search feature and the use of Gemini for scam detection. (This isn’t Android 15 news, though: that’s coming tomorrow.)
