'Game-powered machine learning' opens door to Google for music
Screen-shot of the Facebook game Herd It. The game acts as an incentive for online music fans to classify music, providing sets of examples that are used to train computers to automatically label more songs.
In Lanckriet’s solution, computers study examples of music that fans have labeled with categories such as “romantic,” “jazz,” “saxophone,” or “happy.” The computer then analyzes the waveforms of recorded songs in each category, looking for acoustic patterns the songs have in common. Once it recognizes these patterns, it can automatically label millions of other songs. Training computers in this way is referred to as machine learning. “Game-powered” refers to the millions of people already online whom Lanckriet’s team entices to provide the sets of examples by labeling music in Herd It, a Facebook-based online game.
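The training step described above can be sketched in a few lines of Python. This is only an illustration, not the method Lanckriet’s team published: it assumes each song has already been reduced to a short list of numeric acoustic features, and it swaps in a simple nearest-centroid rule with cosine similarity in place of the team’s auto-tagging algorithms. The function names, the feature values, and the 0.9 threshold are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def train_tag_centroids(examples):
    """Average the feature vectors of all songs that players gave each tag."""
    sums = {}
    counts = defaultdict(int)
    for features, tags in examples:
        for tag in tags:
            if tag not in sums:
                sums[tag] = [0.0] * len(features)
            sums[tag] = [s + f for s, f in zip(sums[tag], features)]
            counts[tag] += 1
    return {tag: [s / counts[tag] for s in total] for tag, total in sums.items()}


def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def auto_tag(features, centroids, threshold=0.9):
    """Return, in alphabetical order, every tag whose centroid resembles the song."""
    return sorted(
        tag for tag, centroid in centroids.items()
        if cosine(features, centroid) >= threshold
    )


# Hypothetical game output: (acoustic features, tags chosen by players).
examples = [
    ([0.9, 0.1, 0.2], {"jazz", "saxophone"}),
    ([0.8, 0.2, 0.1], {"jazz"}),
    ([0.1, 0.9, 0.8], {"happy"}),
    ([0.2, 0.8, 0.9], {"happy", "romantic"}),
]

centroids = train_tag_centroids(examples)
new_song_tags = auto_tag([0.85, 0.1, 0.15], centroids)
print(new_song_tags)
```

The unlabeled song sits close to the songs that players called “jazz” and “saxophone,” so it receives those two tags and neither “happy” nor “romantic.”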
What makes “Google for music” a real possibility is the active feedback loop, which combines human knowledge about music with the scalability of automated music tagging through machine learning. Although human knowledge about music is essential to the process, Lanckriet’s solution requires relatively little human effort to achieve great gains. Through the active feedback loop, the computer automatically creates new Herd It games to collect the specific human input it needs to most effectively improve the auto-tagging algorithms, said Lanckriet.
The game goes well beyond the two primary methods of categorizing music used today. The first is paying experts in music theory to analyze songs, the method used by Internet radio sites like Pandora. The second is collaborative filtering, which online book and music sellers now use to recommend products by comparing a buyer’s past purchases with those of people who made similar choices.
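One common way to build an active feedback loop like the one described above is uncertainty sampling: the computer asks people about the songs it is least sure of. The sketch below assumes that approach; the article does not say how Herd It actually chooses songs, and the function name, song names, and probabilities are hypothetical.

```python
def pick_songs_for_next_game(tag_probabilities, num_songs=2):
    """Return the songs whose tag predictions are least certain.

    tag_probabilities maps each song to {tag: predicted probability}.
    A probability near 0.5 means the computer is unsure about that tag.
    """
    def uncertainty(song):
        # 1.0 when a prediction is exactly 0.5, falling to 0.0 at 0 or 1.
        probs = tag_probabilities[song].values()
        return max((1.0 - 2 * abs(p - 0.5) for p in probs), default=0.0)

    # Most uncertain first; ties are broken alphabetically by song name.
    ranked = sorted(tag_probabilities, key=lambda song: (-uncertainty(song), song))
    return ranked[:num_songs]


# Hypothetical predictions from the current auto-tagger.
predictions = {
    "song_a": {"jazz": 0.97, "happy": 0.05},
    "song_b": {"jazz": 0.52, "happy": 0.40},
    "song_c": {"jazz": 0.10, "happy": 0.65},
    "song_d": {"jazz": 0.99, "happy": 0.01},
}

next_game = pick_songs_for_next_game(predictions)
print(next_game)
```

Here the next game would feature song_b and song_c, the two songs where human answers would teach the computer the most, while the confidently tagged song_a and song_d are left out.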
Both methods are effective up to a point. But paid music experts are expensive and can’t possibly keep up with the vast expanse of music available online. Pandora has just 900,000 songs in its catalog after 12 years in operation. Meanwhile, collaborative filtering only really works with books and music that are already popular and selling well.
The big picture: Personalized radio
Lanckriet foresees a time when, thanks to this massive database of cataloged music, cell phone sensors will track the activities and moods of individual users and use that data to provide a personalized radio service: the kind that matches music to one’s activity and mood without repeating the same songs over and over again.
“What I would like long-term is just one single radio station that starts in the morning and adapts to you throughout the day. By that I mean the user doesn’t have to tell the system, ‘Hey, it’s afternoon now, I prefer to listen to hip hop in the afternoon.’ The system knows because it has learned the cell phone user’s preferences.”
This kind of personalized cell phone radio is possible only if the cell phone has a large database of accurately labeled songs from which to choose. That’s where efforts to develop a music search engine are ultimately heading. The first step is figuring out how to label all the music online, well beyond the most popular hits. As Lanckriet’s team demonstrated in PNAS, game-powered machine learning is making that a real possibility.
Lanckriet’s research is funded by the National Science Foundation, National Institutes of Health, the Alfred P. Sloan Foundation, Google, Yahoo!, Qualcomm, IBM and eHarmony. You can watch a video about the research and Lanckriet's auto-tagging algorithms to learn more.