Dan Rosenbaum, Nir Muchtar, Yoav Yosipovich. Faculty member: Prof. Daniel Lehmann. Industry Representative: Music Genome


Page 1

Dan Rosenbaum, Nir Muchtar, Yoav Yosipovich

Faculty member: Prof. Daniel Lehmann

Industry Representative: Music Genome

Page 2

A short overview of the project:

• Song recognition. Recognizing musical tracks by measuring the distance between signal fingerprints.

• Feature extraction. Finding the most compatible fingerprints and thresholds in order to classify music into categories such as leading instrument, genre, and mood.

We place an emphasis on automatically acquiring data from the music signal itself. Our assumption is that the music itself encapsulates enough information to let us recognize songs, and features within songs.

Page 3

The Product – a brief reminder:

We are building a MATLAB-based package that addresses these challenges:

• Building a fingerprinting system for recognition and estimation of distance between musical tracks. The system is flexible enough to enable us to consider different types of parameters.

• Testing the different fingerprinting parameters and thresholds, and finding the most suitable ones for each application (e.g. song recognition, feature extraction).

• Performing song recognition and demonstrating it using a Graphical User Interface.

Page 4

Song Recognition

Upon receiving a (possibly noisy) song snippet, find the closest match in a database of songs.

First we record the song:

[Plot: the recorded signal over time]

Page 5

Then we compute a spectrogram via the STFT, take one frame's spectrum (3 sec), and extract a peaks vector, for example:

0010010010000001000001010010001010010000
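The fingerprint step can be sketched in Python (the project itself is MATLAB-based); the FFT framing, the number of peaks kept, and the top-k peak-picking rule below are illustrative assumptions rather than the project's actual parameters:

```python
import numpy as np

def fingerprint_frame(samples, n_peaks=8):
    """Binarize a frame's spectrum: 1 at the strongest frequency bins,
    0 elsewhere. Real systems pick *local* peaks; top-k is a simplification."""
    spectrum = np.abs(np.fft.rfft(samples))
    strongest = np.argsort(spectrum)[-n_peaks:]
    bits = np.zeros(len(spectrum), dtype=np.uint8)
    bits[strongest] = 1
    return bits

# A 3-second toy "song frame": two steady tones at 440 Hz and 880 Hz.
fs = 8000
t = np.arange(3 * fs) / fs
tone = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)

fp = fingerprint_frame(tone)
print(fp.sum())     # exactly n_peaks bits are set
print(fp[440 * 3])  # the 440 Hz bin (bin = f * duration) is among them
```

The Hamming (or shared-bit) distance between two such vectors then serves as the distance between frames.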

Page 6

We conducted a song-recognition test that takes 3 frames from each song, at a random offset and with a random length (2-4 seconds), and for each frame:

  - creates its fingerprint;
  - compares the print against the database to find the closest 3-second frame print;
  - builds a vector of similarities between this frame and all database songs (based on the closest frame of each song).
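Assuming each frame's fingerprint is a binary peaks vector, the comparison step might look like this sketch; the database shape and the shared-bit score are hypothetical stand-ins for the project's real distance metric:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical db: 5 songs x 10 frames per song, each a 40-bit peaks vector.
db = rng.integers(0, 2, size=(5, 10, 40))

def similarity_to_songs(query, db):
    """Score the query frame against every song, using each song's
    closest frame (here: the number of shared peak bits)."""
    shared = (db & query).sum(axis=2)   # (songs, frames)
    return shared.max(axis=1)           # best frame per song

query = db[3, 7]                        # pretend we recorded song 3
scores = similarity_to_songs(query, db)
print(int(np.argmax(scores)))           # index of the recognized song
```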

Results:

Accurate: 99.4% success rate (frames successfully identified / total frames tested).

Fast: About 2 seconds to recognize a song among 500 songs. Even without optimizations, our method is scalable to large databases.

When testing with a 4,000-song database, no degradation was observed. We also got good results for noisy analog recordings (made with a computer microphone).


Page 7

The resulting similarity matrix is of size 1500 test frames x 500 songs.

The largest entries fall along the diagonal blocks, indicating that each frame is most similar to the fingerprint of its original song in the database.
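This diagonal structure can be checked mechanically. The toy sketch below stands in for the real 1500 x 500 matrix: each test frame's similarity to its own song is boosted, and we verify that the row-wise argmax recovers the owning song:

```python
import numpy as np

# Toy similarity matrix: 3 frames per song x 4 songs (real one: 1500 x 500).
rng = np.random.default_rng(1)
n_songs, frames_per_song = 4, 3
S = rng.uniform(0.0, 0.5, size=(n_songs * frames_per_song, n_songs))
own = np.repeat(np.arange(n_songs), frames_per_song)  # owning song per frame
S[np.arange(len(own)), own] += 0.6  # own-song similarity dominates

# Fraction of frames whose most similar song is their own song.
accuracy = (S.argmax(axis=1) == own).mean()
print(accuracy)  # → 1.0
```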

Page 8

Feature extraction

Classify music into categories such as leading instrument, genre, mood, etc.

A computer vision approach:

• We observed that audio researchers commonly employ 2-D representations, such as the spectrogram, when analyzing sound or speech.

• We apply a current computer vision technique: boosted classifiers on local object-recognition features.

• By learning these “images”, we extract features from the song.

Page 9

Create a raw vector of candidate filters:

• We first transform the audio data into a Mel Frequency Cepstral Coefficients (MFCC) matrix: 20 rows of MFCC coefficients x 1200 columns (40 frames per second x 30-second snippet).

• On this matrix we apply a set of filters (Viola-Jones object detection) in order to capture important time-frequency characteristics of the audio. We apply roughly 3,500 filters.

Candidate filter set (Haar basis functions). The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions.
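A two-rectangle feature can be evaluated in constant time with an integral image, as in Viola-Jones. The toy matrix and rectangle coordinates below are illustrative, not the project's actual filter bank:

```python
import numpy as np

def integral_image(m):
    """Cumulative sums so any rectangle sum costs at most four lookups."""
    return m.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of m[r0:r1, c0:c1] from the integral image (exclusive ends)."""
    total = ii[r1 - 1, c1 - 1]
    if r0 > 0: total -= ii[r0 - 1, c1 - 1]
    if c0 > 0: total -= ii[r1 - 1, c0 - 1]
    if r0 > 0 and c0 > 0: total += ii[r0 - 1, c0 - 1]
    return total

def two_rect_feature(ii, r0, c0, h, w):
    """Left rectangle minus the adjacent right rectangle (one Haar filter)."""
    left = rect_sum(ii, r0, c0, r0 + h, c0 + w)
    right = rect_sum(ii, r0, c0 + w, r0 + h, c0 + 2 * w)
    return left - right

# Toy stand-in for the 20 x 1200 MFCC matrix.
mfcc = np.arange(20.0 * 30).reshape(20, 30)
ii = integral_image(mfcc)
print(two_rect_feature(ii, 0, 0, 4, 5))  # → -100.0
```

Precomputing the integral image once per snippet is what makes evaluating ~3,500 filters cheap.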

Training (offline):

• We apply a multi-label boosting algorithm (AdaBoost), which uses a weak learner to select a compact subset of these filters, the one that best divides the data (with minimal error) according to a specific feature. The algorithm returns a threshold for each selected filter.

Output: a function that takes a song snippet as input and returns, for instance, whether its genre is rock, or whether it features a leading guitar or flute.
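A minimal sketch of boosting decision stumps over filter responses; the synthetic data (in which a single filter determines the label), the threshold grid, and the round count are all illustrative assumptions, not the project's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 200 snippets x 50 filter responses, labels in {-1, +1}
# ("is rock?"). Filter 7 is made informative; the rest are noise.
X = rng.normal(size=(200, 50))
y = np.where(X[:, 7] > 0.1, 1, -1)

def best_stump(X, y, w):
    """Weak learner: the filter/threshold/sign with lowest weighted error."""
    best = (None, None, None, np.inf)
    for j in range(X.shape[1]):
        for thr in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
            for sign in (1, -1):
                pred = np.where(X[:, j] > thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, sign, err)
    return best

def adaboost(X, y, rounds=10):
    w = np.full(len(y), 1 / len(y))           # uniform initial weights
    ensemble = []
    for _ in range(rounds):
        j, thr, sign, err = best_stump(X, y, w)
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        pred = np.where(X[:, j] > thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)        # upweight mistakes
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict(ensemble, X):
    score = sum(a * np.where(X[:, j] > thr, s, -s) for a, j, thr, s in ensemble)
    return np.sign(score)

model = adaboost(X, y)
print((predict(model, X) == y).mean())  # training accuracy
```

The returned (filter, threshold) pairs are the compact subset the slide describes; the weighted vote of the stumps is the output classifier.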

Page 10

Results:

We have not yet achieved the results we expected:

Training error dropped to zero, but testing error stayed quite high.

Possible reasons:
• A problematic database.
• Inappropriate feature selection.

Page 11

Demonstration idea:

Record the music playing in the background of a user's office (captured with a laptop microphone, for example), recognize the song, and send back relevant information over the internet.

[Diagram: Music is playing in a user's living room → Record the music → Recognize the song from the songs DB → Send back online information]

Page 12

GUI Prototype

Porting the on-line algorithms to Python, to enable a GUI demonstration:

Page 13