Meta has unveiled Make-A-Video, an in-house, DALL-E-inspired AI system that generates short video clips from text descriptions. According to the developers' official blog, Meta AI engineers built the new service on their latest advances in artificial intelligence research.

AI systems that generate images from text prompts already exist, but Meta's team has gone a step further with a text-to-video generator. Meta CEO Mark Zuckerberg called it "amazing progress," noting that "it's much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they'll change over time."

At this stage, the new AI has many technical limitations: the videos it generates are blurry and low-quality, have no sound, and last no more than five seconds. The model analyses a text query and outputs 16 frames of video at a resolution of 64 by 64 pixels, which a separate AI model then upscales to 768 by 768 pixels. Despite these limitations, Meta expects the videos to become much better in the future and says the work could significantly advance the field of AI-assisted content creation.
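
The resolution figures above imply a 12x enlargement along each side (768 / 64 = 12), or 144 times as many pixels per frame. The following is only a minimal sketch of those array shapes in Python with NumPy, not Meta's method: it assumes RGB frames and substitutes simple nearest-neighbor pixel repetition for Meta's separate learned upscaling model. The names `upscale_nearest`, `NUM_FRAMES` and the other constants are hypothetical.

```python
import numpy as np

# Figures reported for Make-A-Video's first stage
NUM_FRAMES = 16
LOW_RES = 64
HIGH_RES = 768
CHANNELS = 3  # assumption: RGB frames

def upscale_nearest(video: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbor upscaling of a (frames, height, width, channels) array.

    Hypothetical placeholder only: Meta's upscaler is a separate AI model,
    not simple pixel repetition.
    """
    # Repeat each pixel `factor` times along height (axis 1) and width (axis 2)
    return video.repeat(factor, axis=1).repeat(factor, axis=2)

# Random values stand in for generated low-resolution frames
low_res_video = np.random.rand(NUM_FRAMES, LOW_RES, LOW_RES, CHANNELS)
factor = HIGH_RES // LOW_RES  # 12x per side
high_res_video = upscale_nearest(low_res_video, factor)

print(low_res_video.shape)   # (16, 64, 64, 3)
print(high_res_video.shape)  # (16, 768, 768, 3)
print(high_res_video[0].size // low_res_video[0].size)  # 144
```

Pixel repetition adds no new detail, which is why a learned upscaler is needed to make the enlarged frames look sharper rather than blocky.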

The company has also shared examples of video clips generated by its new AI. However, Make-A-Video is not yet available to the public, and all of the videos were provided by the developers themselves. It is therefore still unclear how well Make-A-Video actually understands text descriptions and turns them into videos.