Gemini ‘Omni’ Will Generate Media Content From Any Input Signal, Starting With Video

Today at Google I/O 2026, numerous AI-related announcements were made, but perhaps the most impressive is a new multimodal model called Gemini Omni. While it is launching as a video generator, it will eventually be able to take images and audio as input and produce them as output.
The idea is that you can combine various audio, image, and video files into a completely new clip using a custom prompt. Currently, Gemini only allows you to create videos from text prompts and images, so Omni adds the ability to use audio clips and existing video files as source material. With multiple input sources to draw on, Google promises results that are better than ever in terms of realism and accuracy.
While image and audio generation are still in development, video creation will be available first with a model called Gemini Omni Flash. Google cites the example of selecting multiple styles from images in your phone’s gallery and applying them to an existing video: this way, you could make a real-life video of yourself look like a Pixar animated film.
As Google reports, you can also edit your videos using “dialogue.” This approach will be familiar to anyone who already uses Gemini to create videos: you simply explain what you want to see, and Omni will take care of it. You can use follow-up prompts to change something specific in the video, such as an object or color, or to reshoot scenes so that the action changes.
You can also change the angle or setting of your video—for example, from a bedroom to a beach. Google claims you can change the angle multiple times to improve your videos, while still being able to revert to the original video.
Gemini’s worldview
Google claims that Gemini Omni uses an “intuitive understanding of physics” combined with Gemini’s “knowledge of history, science, and cultural context” to make videos as realistic and consistent as possible—though I’ll have to test it myself to see if it actually works as well as Google claims.
According to Google, Omni has a better grasp of physical concepts such as gravity, kinetic energy, and fluid dynamics, which should reduce the physically implausible glitches common in AI-generated video. Beyond constructing scenes, Google claims that Gemini Omni can reason about what should happen next.
AI-generated videos are often distorted because they attempt to follow patterns discovered in vast amounts of training data rather than the laws of physics. If a person disappears from the camera’s view, they won’t necessarily be there when the camera returns. Google claims that Gemini Omni will exhibit fewer such issues.
To protect against deepfakes, Google is imposing some restrictions on video creation. Currently, you can only use your own voice and a digital avatar created using your data to create videos. Furthermore, all videos will contain Google’s invisible SynthID watermark, indicating that the content was created using artificial intelligence.
Gemini Omni Flash is now available in the Gemini app and Google Flow for Google AI Plus, Pro, and Ultra subscribers. It will also be available for free in YouTube Shorts and the YouTube Create app later this week.
At the time of writing, there is no information about usage limits. Currently, users of Google’s AI Plus plan ($7.99 per month) can create two videos per day using the Veo 3.1 Lite model. It remains to be seen how generous Google will be with the Gemini Omni generations—they appear to consume quite a bit of AI processing power.