ZDNET's key takeaways
Google's new Veo 3.1 video model has landed.
It can blend individual images into a unified video clip.
Like its predecessor, it also creates videos with audio.
Once upon a time, animators had to painstakingly work frame-by-frame, stitching together long strings of still images to create the illusion of motion. Today, they only need to upload a few images, and AI will do the rest.
On Wednesday, Google DeepMind released its latest video-generating AI model, Veo 3.1, available now in Flow, Vertex AI, the Gemini API, the Gemini App, and Vids. The company also released a smaller, less powerful version of the model called Veo 3.1 Fast.
Veo 3.1 specializes in blending disparate images into natural-looking videos, significantly reducing the time and resources that video production has historically required. Amazon also recently debuted an AI tool that allows brands to generate short video ads from still images of products in a matter of seconds.
Google's new model arrives less than four months after the public launch of its predecessor, Veo 3, which quickly became a hit because of its ability to generate video with synchronized audio. Google also later upgraded that model with the ability to generate short videos from a single image.
Veo 3.1 also comes with that feature and more. According to a promotional deck from Google shared with ZDNET, the model "offers richer audio and enhanced realism that captures true to life textures." It also has a more sophisticated "understanding of storytelling, cinematic styles, and character interactions," the company wrote.
Video 'ingredients'
Veo 3.1 blends multiple images into one natural-looking video, like an AI blender that takes separate assets and combines them into a single visual smoothie.
An image of a woman's face, another of a collection of clothing grouped together, and a third of an ornate-looking room could, for example, prompt the model to create a short video clip of the woman wearing the pictured clothes and strolling through the room (no obviously detectable extra fingers included).
More interestingly, you can upload images that, at first glance, you'd never expect to fit together in any comprehensible way. This is where the "creativity" (to use a loaded term) of Veo 3.1 shines brightest.
In a demo provided by Google, one image of a decorated Christmas tree behind a pair of sliding doors and another of a psychedelic swirl of colors, resembling various paints blended together, yield a video of the doors sliding open to release a flood of multicolored, Christmas ornament-sized balls, like a Surrealist reimagining of the blood-filled elevator in The Shining.
First and last frame
Veo 3.1 also allows users to upload just two images -- the first and last in a sequence -- and the model will automatically fill in the gap between them with video.
In one demo video, for example, Google shows an image of an old, rustic barn, with low sunlight pouring through the entryway, and another of a cowboy astride a horse, which appears to be casually trotting through tall grass. Veo 3.1 combines these two images by panning the camera through the barn's doorway until all we see is the (now actually moving) cowboy.
The first and last frame feature is available now in Flow, Vertex AI, and the Gemini API, but not in the Gemini App.
Caveats
In that demo video and in others provided by Google, both the first and last images have similar lighting and artistic aesthetics. Uploading two images that are completely distinct from and unrelated to one another -- a black and white image of a Ferrari paired with a color pencil sketch of an orange tree, say -- will yield less predictable results.
Scene extension
Veo 3.1 also comes with a new scene extension feature, which lets users lengthen their AI-generated video clips, along with another capability that lets them add visual elements to, or remove them from, existing videos.