After the chatbot ChatGPT and the AI video model Sora, OpenAI is reportedly preparing its next major release.
According to reports, OpenAI is actively developing AI music models. Its engineers are collaborating with students at the Juilliard School to annotate musical scores for use as training data.
In internal discussions, OpenAI has explored specific applications that use text and audio prompts to generate music. For example, a user could submit a description and ask the AI to add guitar accompaniment to an existing vocal track.
Such a feature would let users add background music to short videos, making the videos entirely AI-generated: imagine a user generating a TikTok-style dance video with Sora, having AI instantly pair it with a dynamic soundtrack, and then sharing it on ChatGPT's AI-powered social platform. This would significantly lower the barrier to content creation.
OpenAI currently has more than 800 million active users, and a music model would help it build a more comprehensive AI ecosystem and further deepen user engagement.
Whether the music model will be integrated into ChatGPT or Sora or launched as a standalone application, however, remains to be seen; an OpenAI spokesperson declined to comment.
Music models lend themselves not only to personal entertainment but also to commercial use, which could help OpenAI expand into advertising. Advertising agencies would reportedly be able to use OpenAI's music models to create lyrics and melodies for commercials.
OpenAI's forays into music are, in fact, not new. In 2019 it launched MuseNet, a music generation model that can combine up to 10 different instruments to produce pieces of up to 4 minutes in styles such as classical, rock, and country, though it cannot sing. In 2020, OpenAI followed with Jukebox, a model that can "sing".
Neither MuseNet nor Jukebox, however, was ever integrated into ChatGPT or Sora, and, limited by the technology and computing costs of the time, the music they generate still falls short of human-created music.
Global AI Music Race
Now, with advances in computing power and model architecture, music generation has finally become a viable practical application, and it may be the focus of a new round of AI competition following text and video.
In May of this year, Google launched Lyria, its second-generation music production model. Google specifically emphasized that Lyria can create soundtracks for advertisements, a direction that overlaps heavily with the potential commercialization of OpenAI's music model.
Meanwhile, the startups Suno and Udio have already commercialized their AI music generation products. Suno, founded just three years ago, has reached annual recurring revenue of $150 million, nearly four times the figure of a year earlier.
The Science and Technology Innovation Board Daily noted that China's AI music models are also rapidly emerging.
Last year, ByteDance's Doubao large-model team launched Seed-Music, a family of music generation models with flexible control capabilities.
Earlier this year, Alibaba's Tongyi Labs open-sourced the music generation model InspireMusic, aiming to create an open-source AIGC toolkit that integrates music generation, song generation, and audio generation capabilities.
On March 26, Kunlun Tech released the world's first music reasoning model, Mureka O1, with multiple performance metrics surpassing Suno V4 and achieving state-of-the-art (SOTA) results.
On June 16, Tencent AI Lab open-sourced the SongGeneration music generation model, focusing on solving three common challenges in music AIGC: sound quality, musicality, and generation speed.
On September 12, MiniMax launched Music 1.5, a music generation model claiming four major breakthroughs: strong controllability, natural and full vocals, richly layered arrangements, and clear song structure.
(Source: Science and Technology Innovation Board Daily)