Google Just Launched New AI Models for Video and Images

Posted on May 21, 2025 by resourcefulman

The pace of AI development shows no signs of slowing down. After the big ChatGPT image update a few weeks ago, it’s Google’s turn to show off new models for generating videos and images from text prompts: Veo 3 (for videos) and Imagen 4 (for images), both announced during Google I/O 2025, and both bringing some significant improvements.

Starting with Veo 3, it’s the next step up from the Veo 2 model, which rolled out to Gemini’s paid subscribers last month. Google says Veo 3 brings noticeable improvements in real-world physics (something AI video often struggles with) and in details like lip syncing. In short: your clips should look more realistic than ever.

There’s another major upgrade here, and that’s sound. Previously, Veo-created clips didn’t have any audio, but now the model can add fitting audio, including traffic noise, wildlife sounds, and even dialogue between characters.

As expected, Google has provided several sample videos demonstrating the new capabilities, including Old Sailor. It’s certainly impressive that such a clip can be created from a text prompt, and it meets high standards in terms of realism: we’re no longer getting the six-fingered hands we’re used to with AI.

Still, the usual hallmarks of AI are clear: this is a generic sailor on a generic sea, speaking generic dialogue about the ocean. It’s a mashup of every video of the sea and old sailors that Veo 3 has been trained on, and it may or may not match the original prompt (which Google didn’t provide).

Veo 3 is only available to those brave enough to pay $250 a month for Google’s AI Ultra plan, but Veo 2 is also getting some upgrades for those of us who pay a tenth of that for AI Pro. It’s now more controllable and consistent, Google says, with improved camera movements and outpainting (expanding the view of the frame). It can also now add and remove objects from clips.

Moving on to images, we have Imagen 4, the successor to Imagen 3. Here we are promised “remarkable clarity of fine details like intricate fabrics, water droplets, and animal fur,” as well as support for higher resolutions (up to 2K) and more aspect ratios. You get top-notch results in both photorealistic and abstract styles, according to Google.

In Google’s AI world, there are sheep the size of tractors. Credit: Google

Google has also taken on one of the biggest problems with AI image generation: typography. Imagen 4 appears to be much better than previous models at producing coherent and accurate symbols and words, without weird spellings or letters that dissolve into incomprehensible glyphs.

Imagen 4 is now available to all users inside the Gemini app. Google doesn’t mention any usage limits, though it’s likely that if you don’t have a subscription you’ll hit those limits faster, as with Imagen 3 (there’s no fixed quota for those limits, and they seem to be based on overall demand for Google’s AI infrastructure).

The carefully curated samples Google provided look good, with no obvious errors or inaccuracies, just the usual AI sheen. Imagen 4 is also faster than Imagen 3, Google says, and more improvements are on the way: a variant of the model that is 10 times faster than Imagen 3 will launch soon.

There’s one more image and video tool worth talking about: Flow. It’s Google’s AI-powered filmmaking tool, which combines its text, video, and image models to help you stitch together coherent scenes with the same characters and locations. You can use Flow if you’re an AI Pro or AI Ultra subscriber, with higher usage limits and better models for those on the more expensive plan.
