convenient MIR systems: vision vs. reality check, research & e-commerce
Stephan Baumann
Post on 20-Dec-2015
Agenda
• Personal Profile
• Convenient Music Information Retrieval
  – Multi-modal queries
  – Identification by description
  – Multi-facet music similarity
    • Timbre
    • Lyrics
    • Cultural aspects
• Project MPEER: P2P, semantic web and MIR
Research Diary (1991-2003)
• 1991/92 optical music recognition
• 1992/93 online handwriting recognition
• 1993/94 optical music recognition
• 1995/97 document analysis and understanding
• 1996 first look at multimedia IR (S. Pfeiffer)
• 1998/99 spin-off activities with Insiders GmbH
• 2000 freelancing/research for draft MIR system
• 2001 co-founding Sonicson GmbH
• 2001/03 subjective music similarity (Ph.D., Sep 03)
Desiderata MIR [Huron]
• 1. Access to all of the world’s music
• 2. Access via an indexing method
• 3. Fair use (reimbursement to all contributors)
• 4. Open system
• 5. Self-correcting system
• 6. Safeguarding of privacy and cultural practices
MIR Categorization [Futrelle]
Representation | Description | Research
Symbolic | Notation, Event-based recordings (MIDI), Hybrid representations | Matching, Theme/Melody Extraction, Voice Separation, Musical Analysis
Audio | Recordings, Streaming Audio, Instr. Libraries | Sound/Song Spotting, Transcription, Timbre/Genre Classification, Musical Analysis, Recommendation Systems
Visual | Scores | Score Reading (OMR)
Metadata | Cataloging, Bibliography, Descriptions | Library Testbeds, Traditional IR, Interoperability, Recommendation Systems
Related Work
• Audio:
  – [Blum, Wold], [Pfeiffer], [Foote], [Logan], ...
  – [Scheirer], [Tzanetakis], [Welsh], [Aucouturier], [Peeters], ...
• Cultural:
  – [Whitman], [Pachet], [Ellis, Berenzweig]
• Multi-modal MIR
  – [Bainbridge], ...
• Recommendation
  – [Amazon, Moodlogic, MusicGenome, MuBu, MongoMusic], ...
  – [Uitdenbogerd]
• User Models
  – [Chai, Vercoe], [Rolland]
• Music Psychology
  – [Bruhn, Rösing], [Gabrielsson, Västfjäll], ...
• Usability, Convenience
  – [Shneiderman], [Nielsen], ...
Convenience
• Using natural language as input for queries by non-musicians
• Accessing metadata, symbolic and audio layers in one interface
• Evaluation of usability (e.g. eye-tracking + user interviews)
• Acquisition of audio features, symbolic features, metadata and lyrics
• Machine communication by using shared music ontologies (MPEG-7, RDF/S, DAML-S)
Prototype
• bilingual matching of phonetic ambiguities and misspellings
• recognition of intention
• treatment of refinements and negations
• automatic generation of SQL queries on demand
• intention-based result presentation
• extraction of musical concepts from natural language queries
Software Development Lifecycle
• System Design Philosophy: Google-Style
• 1. Collection of user requirements V1
  – Offline
  – 20 Germans, different user segments
• 2. Setup of prototype V1
  – Online refinement of requirements V1 -> introduction of PhoneticMatch
• 3. Collection of user requirements V2
  – Online with prototype V1
  – 100 American native speakers, internet-aware users
• 4. Setup of prototype V2
  – Bilingual phonetic match
  – NLP frontend
  – Audio-based music similarity
• 5. Scaling of phonetic match component for commercial website
Convenience
www.musicline.de
´s no.1 hit
status que -> Status Quo
golgen earing -> Golden Earring
Fisher Set -> Fischer Z
Novospaski Chor -> Novo Spassky Chor
four none blondes -> 4 Non Blondes
Matchbox twenty -> Matchbox 20
Statistics:
• 540,000 queries/month
• 400,000 queries for artists/month
• 80,000 fuzzy queries for artists/month
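The query corrections listed on this slide can be illustrated with a small sketch. The deployed component is described as a bilingual phonetic matcher; the code below does not reproduce it and swaps in plain character-based similarity from Python's `difflib` instead. The names `normalize`, `fuzzy_artist`, `CATALOG` and the cutoff of 0.75 are assumptions made for illustration only.

```python
import difflib
from typing import List, Optional


def normalize(name: str) -> str:
    """Lowercase and keep only letters and digits, so case and spacing are ignored."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def fuzzy_artist(query: str, catalog: List[str], cutoff: float = 0.75) -> Optional[str]:
    """Return the catalog artist closest to the query, or None if nothing is close enough."""
    by_key = {normalize(artist): artist for artist in catalog}
    hits = difflib.get_close_matches(normalize(query), list(by_key), n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None


# Hypothetical mini catalog built from the artist names on the slide.
CATALOG = ["Status Quo", "Golden Earring", "Fischer Z", "4 Non Blondes", "Matchbox 20"]

if __name__ == "__main__":
    for q in ["status que", "golgen earing"]:
        print(q, "->", fuzzy_artist(q, CATALOG))
```

Character similarity of this kind handles simple misspellings; queries that need real phonetic or number-word knowledge may be missed, which is presumably why a dedicated phonetic matcher was used.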
Usability Evaluation: help text
Multi-facet Music Similarity
• Audio: MFCCs
• Lyrics: TFIDF
• Cultural:
  – Web crawling
  – POS
  – TFIDF
Song Similarity: Audio-based Perception
• Feature Extraction
  – Input segment [30..60] sec
  – 30 ms Hanning windows, log spectrum, mel scale, inverse Fourier transform
  – 1000 vectors using the first 13 MFCCs
• Representation
  – Intra-song clustering -> song signature [Logan]
  – (Gaussian Mixture Models [Aucouturier])
• Similarity Measure
  – Euclidean Distance [Foote]
  – Kullback-Leibler Distance [Logan, Aucouturier]
  – (Approximate solutions: sampling [Ellis, Aucouturier])
  – DistMinMean [Ellis]
  – Earth Mover's Distance (EMD) [Logan]
• Different Features & Similarity Measures
  – [Welsh] Tonal histograms, tonal transition, volume, tempo, noise -> Euclidean Distance
  – [Rauber & Frühwirth] Psychoacoustic features -> hierarchical SOM
  – [Pfeiffer] A review of MP3-native features
  – ...
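To make the Kullback-Leibler entry above concrete, here is a minimal sketch under a strong simplifying assumption: each song is summarized by a single Gaussian with diagonal covariance over its MFCC frames, for which the divergence has a closed form. The cited approaches use clustered signatures or Gaussian mixtures, where no closed form exists (hence the sampling and EMD entries), so this is not their method. All function names are hypothetical.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple

Frames = Sequence[Sequence[float]]        # one feature vector (e.g. 13 MFCCs) per frame
Gaussian = Tuple[List[float], List[float]]  # (means, variances) per dimension


def fit_diag_gaussian(frames: Frames, floor: float = 1e-6) -> Gaussian:
    """Per-dimension mean and variance of the frames; variance is floored for stability."""
    columns = list(zip(*frames))
    means = [fmean(col) for col in columns]
    variances = [max(pvariance(col), floor) for col in columns]
    return means, variances


def kl_diag(p: Gaussian, q: Gaussian) -> float:
    """KL(p || q) for two Gaussians with diagonal covariance."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p: Gaussian, q: Gaussian) -> float:
    """Symmetrized divergence, usable as a (non-metric) song distance."""
    return kl_diag(p, q) + kl_diag(q, p)


if __name__ == "__main__":
    song_a = fit_diag_gaussian([[0.0, 0.0], [2.0, 2.0]])  # toy 2-dimensional frames
    song_b = fit_diag_gaussian([[1.0, 1.0], [3.0, 3.0]])
    print(symmetric_kl(song_a, song_b))
```

The symmetrized form is used because plain KL divergence depends on argument order, which is inconvenient for a similarity ranking.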
Perception of similar Timbre in Songs: Evaluation?!?!
• Audio Database: 700 MP3s of mainstream music at full-length, 40 artists, 70 different genres
• Evaluation: no ground truth (GT) available; only anecdotal evidence or genre/artist/volume GT
Lyrics: Vector Space Model (TFIDF)
• Representation of a Collection of Lyrics
  – # of terms k:
  – Song j:
  – Occurrence of term h in collection d(h):
  – Weight of term j in song i:
• Similarity metric
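The vector space model named on this slide can be sketched as follows. The exact weighting and similarity formulas are not spelled out here, so the code assumes the common choice of term frequency times log(N / document frequency) and cosine similarity; the function names and the toy lyrics are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """One sparse TF-IDF vector per tokenized lyric (assumed weighting: tf * log(N / df))."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()})
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    lyrics = [
        "dance tonight dance money".split(),
        "come out tonight dance".split(),
        "father son marry settle".split(),
    ]
    vecs = tfidf_vectors(lyrics)
    print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Terms that occur in every lyric get weight zero under this weighting, so only distinguishing vocabulary contributes to the ranking.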
Song Similarity: Lyrics (1)
Reference Song 112: Lucy Pearl - Dance tonight.txt
Most-relevant terms: toast spend tonight dance money
1. Similar Song: Lucy Pearl - you (feat. snoop dogg and Q-tipp).txt
2. Similar Song: Phil Collins - Please Come Out Tonight.txt
3. Similar Song: Madonna - Into the groove.
Reference Song 56: Das Kind Vor Dem Euch.txt - Die fantastischen Vier
Most-relevant terms: wollten euch sehn entsetzt selben
1. Similar Song: Die fantastischen Vier - Auf Der Flucht.txt
2. Similar Song: Freundeskreis - Mit Dir.txt
3. Similar Song: Die fantastischen Vier - Populär
Reference Song 145: Madonna - Paradise.txt
Most-relevant terms: remains pas encore fois moi
Zero Hits
Song Similarity: Lyrics (2)
Reference Song 193: Phil Collins - One More Night.txt
Most-relevant terms: forever wait night cos ooh
1. Similar Lyrics: Phil Collins - YOU CAN'T HURRY LOVE.txt
2. Similar Lyrics: Phil Collins - Inside Out.txt
3. Similar Lyrics: Phil Collins - This must be Love.txt
Reference Song 297: Cat Stevens - Father And Son.txt
Most-relevant terms: fault decision marry son settle
1. Similar Lyrics: Phil Collins - We're Sons Of Our Fathers.txt
2. Similar Lyrics: Sheryl Crow - No One Said It Would Be Easy.txt
3. Similar Lyrics: George Michael - Father Figure.txt
Artist Similarity: Cultural Aspects
Web Crawling+PartOfSpeech+TFIDF
adj Terms | TFIDF | Phrases | TFIDF
daft | 0.20463 | techno music | 0.86982
new | 0.14242 | old school | 0.80009
french | 0.12907 | great techno buzz | 0.40004
different | 0.09314 | overall groove | 0.40004
digital | 0.08607 | electronic artists | 0.40004
vocal | 0.07558 | new wave | 0.40004
cool | 0.07339 | usual drum n bass | 0.40004
electronic | 0.06887 | only band | 0.36956
funky | 0.06497 | big thing prodigy | 0.36956
underground | 0.06497 | good beat | 0.34793
Visual Evaluation: Similarity (Cosine)
[Figure: similarity matrix with a color scale from low to high; axis labels: HEAVYMETAL, ROCK, POP, SOUL, DANCE]
Recall/Precision against P2P, AMG data
Evaluation ?! [Downie, Uitdenbogerd]
[Diagram labels: Web Sources; Part Of Speech + Term Weighting; Vector Space Model; Similarity, Clustering, Classification; Unsupervised; Supervised; Learning?; Cosine vs. Learning; WEKA Suite?; Ground Truth: MusicSeer?, AMG = experts, P2P = collaborative; Experiment?; Relevance Feedback (Rocchio): subjective, context-dependent, "personal taste"; Listening mode; Personal Classifier]
Psychological Factors >> Musical Taste
• Personality >> preferred Styles, Genres
  – Stability
  – Introversion / Extraversion
  – Aggressive / Passive
• Socio-economics >> preferred Styles, Genres
• Demographic >> similar users in CF approaches >> recommendations
  – Gender
  – Age
• Situation
  – Mood >> tempo, tonality, beatness, pitch height
  – Listening Mode [Huron]
User Model [Chai,Vercoe]
<user>
  <generalbackground>
    <name>John White</name>
    <education>MS</education>
    <citizenship>US</citizenship>
    <birthdate>9/7/1974</birthdate>
    <sex>male</sex>
    <occupation>student</occupation>
  </generalbackground>
  <musicbackground>
    <education>none</education>
    <instrument>piano</instrument>
    <instrument>vocal</instrument>
  </musicbackground>
  <generalpreferences>
    <color>blue</color>
    <animal>dog</animal>
  </generalpreferences>
  <musicpreferences>
    <genre>classical</genre>
    <genre>blues</genre>
    <genre>rock/pop</genre>
    <composer>Wolfgang Amadeus Mozart</composer>
    <artist>Beatles</artist>
    <sample>
      <title>Yesterday</title>
      <artist>Beatles</artist>
    </sample>
  </musicpreferences>
  <habit>
    <context>I’m happy
      <tempo>very fast</tempo>
      <genre>pop</genre>
    </context>
    <pfeature>romantic
      <tempo>very slow</tempo>
      <softness>very soft</softness>
      <title>*love*</title>
    </pfeature>
    <context>bedtime
      <pfeature>romantic</pfeature>
    </context>
  </habit>
</user>
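A user model of this shape can be read with a standard XML parser. The sketch below embeds a shortened excerpt of the example (wrapped in a single `<user>` root) and pulls out the preferred genres and the per-context habits; the helper names `load_user_model`, `preferred_genres` and `habits` are hypothetical and not part of the [Chai, Vercoe] proposal.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Shortened excerpt of the user model shown on the slide.
USER_MODEL = """
<user>
  <generalbackground><name>John White</name></generalbackground>
  <musicpreferences>
    <genre>classical</genre>
    <genre>blues</genre>
    <genre>rock/pop</genre>
    <artist>Beatles</artist>
  </musicpreferences>
  <habit>
    <context>I'm happy
      <tempo>very fast</tempo>
      <genre>pop</genre>
    </context>
  </habit>
</user>
"""


def load_user_model(xml_text: str) -> ET.Element:
    """Parse the user model and return its root element."""
    return ET.fromstring(xml_text.strip())


def preferred_genres(root: ET.Element) -> List[str]:
    """Genres listed under <musicpreferences>, in document order."""
    return [g.text for g in root.findall("./musicpreferences/genre")]


def habits(root: ET.Element) -> Dict[str, Dict[str, str]]:
    """Map each context label (the element's leading text) to its child settings."""
    result = {}
    for ctx in root.findall("./habit/context"):
        label = (ctx.text or "").strip()
        result[label] = {child.tag: (child.text or "").strip() for child in ctx}
    return result


if __name__ == "__main__":
    root = load_user_model(USER_MODEL)
    print(preferred_genres(root))
    print(habits(root))
```

Because the context label is mixed content (text followed by child elements), it has to be read from the element's leading text rather than from an attribute.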
Multi-facet Music Similarity and Adaptive User Model
• Hard-wired multi-facet similarity [Whitman]
• Weighting of audio vs. cultural description by slider usage [Aucouturier]
• Description Weight Vectors (DWV) [Rolland]
  – Original work for melodic similarity
  – DWV contains a weight for each description in the representation
  – Weights vary with user interaction
  – Explicit user feedback: re-ranking of the system's output
  – Implicit adaptation of weights
• Future Work
  – Apply DWV to multi-facet similarity (audio, lyrics, cultural)
  – Infer initial setting of weights according to psychological factors
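The future-work idea of applying weight vectors to the three facets could look roughly like this. It is only a sketch: the facet names follow the slides, but the linear combination and the feedback update rule are assumptions chosen for illustration and are not taken from [Rolland].

```python
from typing import Dict

Facets = Dict[str, float]  # facet name -> similarity score or weight


def combined_similarity(facet_sims: Facets, weights: Facets) -> float:
    """Weighted average of per-facet similarities (audio, lyrics, cultural)."""
    total = sum(weights.values())
    return sum(weights[f] * facet_sims[f] for f in weights) / total


def adapt_weights(weights: Facets, liked: Facets, disliked: Facets, rate: float = 0.1) -> Facets:
    """Hypothetical update: after the user ranks one result above another,
    shift weight toward facets that scored the liked item higher, then renormalize."""
    raw = {f: max(w + rate * (liked[f] - disliked[f]), 0.0) for f, w in weights.items()}
    total = sum(raw.values())
    if total == 0.0:
        return dict(weights)  # degenerate case: keep the previous weights
    return {f: w / total for f, w in raw.items()}


if __name__ == "__main__":
    weights = {"audio": 0.4, "lyrics": 0.3, "cultural": 0.3}
    print(combined_similarity({"audio": 0.5, "lyrics": 1.0, "cultural": 0.0}, weights))
    liked = {"audio": 0.9, "lyrics": 0.2, "cultural": 0.5}
    disliked = {"audio": 0.1, "lyrics": 0.2, "cultural": 0.5}
    print(adapt_weights(weights, liked, disliked))
```

Initial weights could in principle be seeded from the psychological factors mentioned above instead of being uniform; how to do that is left open here.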
Project MPEER
"In a world of spontaneously federating services, there is no point in having a proprietary service, there is no point in staying out of the directory, there is no point in using an XML protocol that no one understands, there is no point in basing it on a proprietary server, and there is no need to justify the obvious error in following that path."
- Simon Phipps, chief technology evangelist, Sun Microsystems, Inc., 2001
MPEER Objectives
“Bringing the web to its full potential” [Fensel, Bussler]
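[Figure after Fensel, Bussler: centralized/static vs. distributed/dynamic, without and with formal semantics]
• Centralized / Static: WWW (URI, HTML, HTTP) -> Semantic Web (RDF, RDF(S), DAML, OIL)
• Distributed / Dynamic: Web Services (UDDI, WSDL, SOAP) -> Intelligent Web Services (WSFL -> WSMF, DAML-S)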
• Relate MIR to the Semantic Web activities (W3C)
• Create (composite) Semantic Web Services for MIR
• Explore the P2P computing paradigm (shared resources)
MPEER Architecture
[Architecture diagram; recoverable labels:]
• User
• P2P Client GUI
• P2P Client/Server (Jtella/JXTA)
• Audio (MP3) + Meta Data (XML / RDF), shown twice: Title, Artist, Volume, Genre, bpm, Loud, Sound, Like, Dislike, SimilarTo
• Audio (MP3) + Meta Data (XML / MPEG-7 / RDF-S): Title, Artist, Volume, Genre, bpm, Loudness, Timbre, Like, Dislike, SimilarTo
• Basic Features, Descriptors: Tempo, Loudness, Timbre
• Classification, Music Similarity, Clustering
• Semantic Web Wrapper: "Title Artist Volume Genre Bpm Loud Sound Like Dislike SimilarTo ..."
• WebService, e.g.
  – Ontologies, Taxonomies
  – CD-Retailers, EMD
  – MIR services
  – Audio ID
  – Thumbnails
  – ...
MPEER: composite Webservice
• Service Type: "query service"
– Sub Type: Semantic web enabled
– Domain: Music
– Supported ontologies: {ontoson, musicbrainz.com, allmusicguide, ..}
• Port Types:
– Identification by audio, Similarity by audio, Retrieval by partial information
– Personalized recommendations, Playlist generation
– Music-Question Answering
• Operations/Messages of Port Type Identification by audio:
– IF_NOT_MP3(input)->Convert2MP3(input)->CalculateMetadata-> ...
• Composite, Distributed Services (maybe P2P, using users' local content & processing power):
– (1) MPeer.getEverythingFrom(Prince)
– (2) WebServiceRepository.discover&select(SpecialArtistService)
– (3) SpecialArtistService=AllMusicGuide.detailedInfo
– (4) NegotiateContract(contract1,MPeer,AllMusicGuide)
– (5) Contract1.StartTransaction(MPeer,AllMusicGuide)
– (5.1) AllMusicGuide.detailedinfo(Prince)
– (5.2) ...
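The numbered steps above can be mimicked in a few lines to show the control flow of a composite service. This is a toy, in-process sketch: `Service`, `Contract`, `ServiceRepository` and `get_everything_from` are hypothetical stand-ins, the "AllMusicGuide" provider is a stub, and no real web-service, UDDI or P2P API is involved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Service:
    name: str
    service_type: str
    operation: Callable[[str], Dict[str, str]]


@dataclass
class Contract:
    client: str
    provider: Service
    log: List[str] = field(default_factory=list)

    def start_transaction(self, artist: str) -> Dict[str, str]:
        """Steps (5) and (5.1): record the call and invoke the provider's operation."""
        self.log.append(f"{self.client} -> {self.provider.name}: {artist}")
        return self.provider.operation(artist)


class ServiceRepository:
    def __init__(self) -> None:
        self._services: List[Service] = []

    def register(self, service: Service) -> None:
        self._services.append(service)

    def discover(self, service_type: str) -> Service:
        """Steps (2) and (3): pick the first registered service of the requested type."""
        for service in self._services:
            if service.service_type == service_type:
                return service
        raise LookupError(f"no service of type {service_type!r}")


def get_everything_from(artist: str, repo: ServiceRepository, client: str = "MPeer") -> Dict[str, str]:
    """Step (1): the composite entry point that chains discovery, contract and call."""
    provider = repo.discover("SpecialArtistService")
    contract = Contract(client, provider)  # step (4), negotiation reduced to construction
    return contract.start_transaction(artist)


if __name__ == "__main__":
    repo = ServiceRepository()
    repo.register(Service("AllMusicGuide", "SpecialArtistService",
                          lambda artist: {"artist": artist, "source": "stub"}))
    print(get_everything_from("Prince", repo))
```

In a real deployment each of these objects would sit behind a network boundary, and contract negotiation would be a protocol exchange rather than a constructor call.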
Prototypical P2P Client
OpenSource Tools: Ontology Editor
OpenSource Tools: DataMining, ML
Conclusion
• The Web offers potential beyond symbolic or audio-based MIR reflecting cultural issues
• User-centric MIR systems may benefit from user models and situation-driven adaptation
• The field is too large to be handled by individual institutes
• Composite web services offer a way for collaboration on the topic and maybe to provide holistic, high-quality MIR systems