Riffusion is an AI model that composes music by visualizing it. Created as a hobby project by Seth Forsgren and Hayk Martiros, it is a fine-tuned version of the Stable Diffusion v1.5 latent text-to-image diffusion model that generates spectrogram images from arbitrary text prompts. Those spectrograms are then converted into audio files, producing AI-generated music from plain text.
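To make the pipeline concrete, here is a minimal sketch of the two stages: generating a spectrogram image from a prompt using the Riffusion checkpoint published on Hugging Face (riffusion/riffusion-model-v1), then inverting that image to audio with Griffin-Lim phase reconstruction via librosa. The pixel-to-decibel mapping, dynamic range, and sample-rate constants below are illustrative assumptions, not the project's exact parameters.

```python
import numpy as np
import torch
import librosa
from diffusers import StableDiffusionPipeline

# Load the Riffusion checkpoint, a fine-tuned Stable Diffusion v1.5.
pipe = StableDiffusionPipeline.from_pretrained(
    "riffusion/riffusion-model-v1", torch_dtype=torch.float16
).to("cuda")

# Stage 1: text prompt -> 512x512 spectrogram image.
image = pipe("jazzy clarinet with maracas").images[0]

# Stage 2: spectrogram image -> audio.
# Flip vertically (image row 0 is the top of the picture, but low
# frequencies sit at the bottom of a spectrogram) and treat pixel
# intensity as decibels. The dark-is-quiet convention and the 80 dB
# dynamic range here are assumptions for illustration.
pixels = np.array(image.convert("L")).astype(np.float32)[::-1]
mel_db = pixels * 80.0 / 255.0 - 80.0   # map [0, 255] -> [-80, 0] dB
mel_power = librosa.db_to_power(mel_db)

# Griffin-Lim iteratively recovers phase from the magnitudes alone.
audio = librosa.feature.inverse.mel_to_audio(
    mel_power, sr=44100, n_fft=2048, hop_length=512, n_iter=32
)
```

Under these assumed settings, one 512-pixel-wide image covers roughly six seconds of audio (512 frames × 512 samples per hop ÷ 44,100 Hz), which is why the model can produce short clips quickly enough to feel interactive.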
The result is jazzy clarinet with maracas, jazzy rapping from Paris, and smooth tropical dance jazz, all generated by Riffusion! Because each clip takes only seconds to generate, you can turn any text prompt into music in near real time with Stable Diffusion.