Google DeepMind has introduced an AI tool that generates soundtracks for videos from video pixels and text prompts. This video-to-audio (V2A) technology creates soundscapes that align with the visuals, adding music, sound effects, and even dialogue to enhance the content.
The V2A tool stands out by letting users create synchronized audio without manually aligning sounds to video scenes. It analyzes the visual elements and generates fitting audio guided by the user's text prompt, or even without one, making it easier for content creators to add custom soundtracks to their projects.
Trained on a vast dataset of video, audio, and annotations, the V2A system can generate an unlimited number of soundtracks, allowing users to experiment with different audio styles for any video. The technology is expected to be particularly useful for filmmakers and content creators looking to elevate the quality of their work.
However, the tool still has limitations. DeepMind is working to improve lip synchronization for dialogue-heavy videos and to address cases where low-quality visuals cause a noticeable drop in audio quality.
Before being released to the public, the technology will undergo rigorous safety and reliability testing. Once launched, all AI-generated audio will carry Google's SynthID watermark so it can be identified as having been created with artificial intelligence.