Google developers have revealed a new AI system dubbed MusicLM that can generate high-fidelity music of any genre from text descriptions given by users.
To create songs from complex descriptions, the AI was trained on a dataset containing more than 280,000 hours of music. As a result, the new neural network can generate music from abstract descriptions as well as from pictures accompanied by text descriptions.
Queries can range from “meditative song, calming and soothing, with flutes and guitars” and “Berlin '90s techno with a low bass and strong kick” to “enchanting jazz song with a memorable saxophone solo and a solo singer” and “induces the experience of being lost in space.” The AI model can even capture nuances like instrumental riffs, moods, and melodies. It can also create music for a specific action or state, like awakening or meditation.
MusicLM can also be instructed to generate audio that's played by a specific type of instrument, and the experience level of the AI “musician” can be set.
However, the new neural network isn't flawless. Some of the samples are noisy or distorted, and voices and lyrics can sometimes sound poor. Furthermore, the developers found that nearly 1% of the music generated by the AI model was directly replicated from songs it was trained on, which could cause copyright issues.
For now, Google has no plans to release the source code of MusicLM or make the neural network publicly available. Only a dataset of more than 5,000 music-text pairs has been published, for research purposes.
A scientific paper describing MusicLM was published on arXiv.org. Examples of music generated by MusicLM are available on GitHub.