Department: Electrical & Computer Engineering
Research Institute Affiliation: Information Theory and Applications Center
Faculty Advisor(s): Gert Lanckriet

Primary Student
Name: Emanuele Coviello
Email: ecoviell@ucsd.edu
Phone: 619-400-9373
Grad Year: 2014

Student Collaborators
Katherine Ellis, kellis@ucsd.edu

New technologies have drastically changed the way people interact with multimedia content, especially music. As a consequence, the music collections that are only one click away from the user have reached unprecedented sizes: for example, Apple's iTunes Store provides instant access to over 20 million songs. For popular music, search technologies can draw on abundant metadata (e.g., lyrics, genre annotations, critical reviews, purchase data, and charts). For the long tail of less popular artists, however, this type of information is often unavailable, which motivates the development of content-based retrieval systems.
This research aims to improve content-based music retrieval by proposing the bag-of-systems (BoS) representation of music. The BoS representation is analogous to the bag-of-words representation of text documents, in which a document is represented by counting the occurrences of each word. In the BoS framework, the codebook is formed by generative time-series models instead of words, each compactly characterizing typical texture and dynamics patterns in audio fragments. A song is then represented by a BoS histogram with respect to the codebook: each fragment is assigned to its most likely codeword, and the histogram records the frequency with which each codeword is selected. An advantage of the BoS approach is that it decouples modeling content from modeling tags. As a consequence, a codebook of sophisticated generative models can be robustly compiled from a large collection of songs, while simpler models, based on standard text-mining algorithms, capture statistical regularities in the BoS histograms of the subset of songs associated with each individual tag.
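As a rough illustration of the histogram step described above, the following sketch assigns each audio fragment to its most likely codeword and counts how often each codeword is chosen. It is not the implementation used in this work: the codewords here are one-dimensional Gaussians (a mean and a variance) standing in for the generative time-series models, and the names `log_likelihood` and `bos_histogram` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(fragment, codeword):
    """Log-likelihood of a fragment (list of floats) under a codeword.

    ASSUMPTION: a codeword is a (mean, variance) pair of a 1-D Gaussian,
    used only as a simple stand-in for a generative time-series model.
    """
    mean, var = codeword
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in fragment
    )


def bos_histogram(fragments, codebook):
    """Return the normalized frequency with which each codeword is selected."""
    counts = Counter()
    for fragment in fragments:
        # Assign the fragment to the codeword with the highest likelihood.
        best = max(
            range(len(codebook)),
            key=lambda k: log_likelihood(fragment, codebook[k]),
        )
        counts[best] += 1
    total = len(fragments)
    if total == 0:
        return [0.0] * len(codebook)
    return [counts[k] / total for k in range(len(codebook))]


# Toy example: two codewords, four fragments of two samples each.
codebook = [(0.0, 1.0), (5.0, 1.0)]
fragments = [[0.1, -0.2], [4.9, 5.3], [5.1, 4.8], [5.2, 5.0]]
print(bos_histogram(fragments, codebook))  # [0.25, 0.75]
```

Note that every fragment is scored against every codeword, so the cost of building one histogram grows linearly with the codebook size.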
In practice, the efficacy of the BoS descriptor (or of any bag-of-words representation) depends on the richness of the codebook, i.e., its ability to effectively quantize the feature space, which in turn depends directly on the number of codewords. Since increasing the number of codewords also increases the computational cost of mapping a song onto the codebook, we propose the BoS-Tree, a fast way of indexing BoS codewords. Starting from a large BoS codebook, we construct a bottom-up hierarchy of codewords using the recently proposed hierarchical EM algorithm, and then leverage the tree structure to index the codewords efficiently. In this way, we achieve fast look-ups on the codebook and consequently enable the practical use of large BoS codebooks in large-scale music applications.
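The look-up idea can be sketched as a greedy descent through a tree whose leaves are the codewords. This is only an illustration under stated assumptions: the tree is taken as already built (the hierarchical EM construction is not shown), each node's model is a one-dimensional Gaussian stand-in for a time-series model, and the names `Node` and `tree_lookup` are hypothetical.

```python
import math


def log_likelihood(fragment, model):
    """Log-likelihood of a fragment under a (mean, variance) Gaussian.

    ASSUMPTION: a 1-D Gaussian stands in for a generative time-series model.
    """
    mean, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        for x in fragment
    )


class Node:
    """Tree node: internal nodes summarize their children, leaves are codewords."""

    def __init__(self, model, children=None, codeword_id=None):
        self.model = model
        self.children = children or []
        self.codeword_id = codeword_id  # set only on leaves


def tree_lookup(fragment, root):
    """Descend greedily from the root, following the most likely child.

    Returns the selected codeword id and the number of likelihood
    evaluations performed along the way.
    """
    node = root
    evaluations = 0
    while node.children:
        scores = [log_likelihood(fragment, child.model) for child in node.children]
        evaluations += len(scores)
        node = node.children[scores.index(max(scores))]
    return node.codeword_id, evaluations


# Toy tree assumed to be given: two internal nodes, each over two codewords.
leaves = [Node((m, 1.0), codeword_id=i) for i, m in enumerate([0.0, 1.0, 10.0, 11.0])]
root = Node(
    model=None,  # the root itself is never scored
    children=[
        Node((0.5, 1.0), children=leaves[:2]),
        Node((10.5, 1.0), children=leaves[2:]),
    ],
)
print(tree_lookup([10.9, 11.2], root))  # (3, 4)
```

With branching factor b and depth d, a descent of this kind evaluates roughly b·d models instead of all b^d leaves. The trade-off is that a greedy descent is not guaranteed to return the same codeword as an exhaustive search over the full codebook.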
