Runway Says Gen-4 AI Videos Are Now More Consistent

Producing video content is a particular challenge for generative AI models, which have no real understanding of space or physics and essentially generate clips frame by frame. That can lead to obvious errors and inconsistencies, as we wrote about in December, when OpenAI’s Sora produced a clip in which a taxi simply disappeared.

AI video company Runway says it’s making progress on these specific issues with its new Gen-4 models. According to Runway, the new models offer “a new generation of consistent and controllable media,” with characters, objects and scenes now much more likely to look the same throughout a project.

If you’ve experimented with AI video, you know that many clips are short, lean on slow motion, and avoid elements that move in and out of the frame, usually because the AI renders those elements differently each time. People merge with buildings, limbs transform into animals, and entire scenes mutate as the seconds pass.

That’s because these AIs are essentially probabilistic machines. They know more or less what a futuristic cityscape should look like, having been trained on a huge number of futuristic cityscapes, but they don’t understand the building blocks of the real world, and they can’t hold a fixed model of a scene in memory. Instead, they keep reinventing it.

Runway aims to fix this with reference images the model can keep returning to while it reinvents everything else in the frame: people should look the same from shot to shot, and there should be fewer problems with main characters walking through furniture or morphing into walls.

The new Gen-4 models can also “understand the world” and “simulate real-world physics” better than ever before, according to Runway. The advantage of going out into the world with a real video camera is that you can film a bridge from one side, then cross it and film the same bridge from the other side. With AI, you tend to get a different approximation of the bridge each time, and that’s the problem Runway wants to solve.

Check out the demo videos Runway has put together and you’ll see they do a pretty good job on the consistency front (though, of course, they’re hand-picked examples). The characters in this clip look more or less the same from shot to shot, albeit with some variation in facial hair, clothing, and apparent age.

There’s also “Lonely Little Flame” (above), which, like all of Runway’s showcase videos, was reportedly put together with the hard work of real animators and directors. It looks impressively professional, but you can see how the skunk’s shape and markings change from scene to scene, as does the shape of the rock character in the second half of the story. Even with these latest models, there’s still some way to go.

While the Gen-4 models are now available to paid Runway users for image-to-video generation, the cross-scene consistency features haven’t rolled out yet, so I can’t test them personally. I’ve been experimenting with a few short videos in Sora, and the problems with consistency and real-world physics remain there: objects appear (and disappear) out of thin air, and characters move through walls and furniture. You can see one of my creations below:

As you can see from the official Sora showcase page, it’s possible to create some flawless clips, and the technology has now reached a high enough standard that it’s starting to see limited use in professional productions. However, the problems with disappearing and transforming taxis that we wrote about last year have not gone away.

Of course, you only need to look at where AI video technology was a year ago to know that these models will get better and better, but generating video is not the same as generating text or a static image: it requires a lot more processing power and a lot more “thinking”, as well as an understanding of real-world physics that AI will have a hard time mastering.
