Chinese Experts Created a Neural Network That Generates Videos Based on Text Input

Chinese experts have developed a neural network called CogVideo that can generate short videos (GIFs) from a text description. So far, it accepts text input only in Chinese.

Image: CogVideo on GitHub

The videos are generated at 8 frames per second (32 frames over 4 seconds) and last no more than 4 seconds.

So far, CogVideo can create videos for prompts such as "a couple are having dinner," "a lion is drinking water," "a boy is surfing in the sea," "a dog is running on the lawn," and more.

Image: CogVideo on GitHub

CogVideo works similarly to OpenAI's DALL-E 2 and Google's Imagen, which generate images from text descriptions.

So far, the developers have published several examples of generated videos along with storyboards on GitHub.