Foraging for Music
Donald Byrd
School of Informatics & Jacobs School of Music,
Indiana University
rev. 10 April 2008
8 April 08 2
What’s the Problem?
• How much music is there?
  – Music holdings of Library of Congress: over 10M items
    • Most is notation, especially CWMN (Conventional Western Music Notation), not audio
    • Includes over 6M pieces of sheet music, tens or hundreds of thousands of scores of operas, symphonies, etc.
• Today
  – iTunes: 6M tracks
  – P2P: 15B tracks
• Tomorrow
  – “All music will be on line”
• People have very diverse tastes, etc.
8 Apr. 08 3
Classification: Logician General’s Warning
• Classification is dangerous to your understanding
  – Almost everything in the real world is messy, ill-defined
  – Absolute correlations between characteristics are rare
    • Example: Ginger Baker says Cream wasn’t a rock group
    • Example: did Bach write piano music?
  – People say “an X has characteristics A, B, C…”
  – Usually mean “an X has A, & usually B, C…”
  – Leads to:
    • People who know better claiming absolute correlations
    • “Is it this or that or that?” questions that don’t have an answer
    • Don changing his mind
• But lack of classification is dangerous to understanding!
• Should we abandon (hierarchic) classifications?
  – Of course not! They're too useful, & impossible to avoid
  – Just be on guard for misleading things, consider alternatives
27 Jan. 06 4
Basic Representations of Music & Audio
Audio (e.g., CD, MP3): like speech
Time-stamped Events (e.g., MIDI file): like unformatted text
Music Notation: like text with complex formatting
[Figure panels: Digital Audio; Time-stamped Events; Music Notation]
rev. 15 Feb. 5
Basic and Specific Representations vs. Encodings
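[Diagram: the three basic representations, with specific representations above a dividing line and encodings below it]
Basic representations: Audio; Time-stamped Events; Music Notation
Basic and Specific Representations (above the line)
  – Audio: Waveform
  – Time-stamped Events: Time-stamped MIDI, Time-stamped expMIDI, Csound score
  – Music Notation: CMN, Mensural notation, Gamelan notation, Tablature
Encodings (below the line)
  – Audio: .WAV, Red Book (CD)
  – Time-stamped Events: SMF, expMIDI File, Csound score
  – Music Notation: Notelist, MusicXML, Finale ETF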
27 Mar. 07 6
A Similarity Scale for Content-Based Music IR
• Categories describe how similar to query the items to be found are expected to be (from closest to most distant)
• Detailed audio characteristics in common
  1. Same music, arrangement, performance venue, session, performance, & recording
  2. …
  4. Same music, arrangement, performance venue; different session, performance, recording
• No detailed audio characteristics in common
  6. Same music, different arrangement; or different but closely-related music, e.g., conservative variations (Mozart, etc.), many covers, minor revisions
  7. Different & less closely-related music: freer variations (Brahms, much jazz, etc.), wilder covers, extensive revisions
  8. Music in same genre, style, etc.
  9. Music influenced by other music
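One way to make the scale usable in software is as an ordered enumeration, so that "at least this close" becomes a numeric comparison. The sketch below is only an illustration: the member names are hypothetical labels, and categories 2, 3, and 5 are omitted because the slide elides them.

```python
from enum import IntEnum


class Similarity(IntEnum):
    """Similarity categories, numbered as on the slide.

    Numbers 2, 3, and 5 are elided in the source, so they are left out
    here rather than guessed. Member names are hypothetical labels.
    """
    SAME_RECORDING = 1
    SAME_VENUE_DIFFERENT_SESSION = 4
    CLOSELY_RELATED_MUSIC = 6
    LESS_CLOSELY_RELATED_MUSIC = 7
    SAME_GENRE_OR_STYLE = 8
    INFLUENCED_BY_OTHER_MUSIC = 9


def shares_audio_detail(category: Similarity) -> bool:
    """True for categories listed under 'detailed audio characteristics in common'."""
    return category <= Similarity.SAME_VENUE_DIFFERENT_SESSION


def within(found: Similarity, loosest_allowed: Similarity) -> bool:
    """True if a found item is at least as close to the query as the loosest category allowed."""
    return found <= loosest_allowed


# A cover-song search might accept anything up to category 6.
print(within(Similarity.SAME_RECORDING, Similarity.CLOSELY_RELATED_MUSIC))
print(within(Similarity.SAME_GENRE_OR_STYLE, Similarity.CLOSELY_RELATED_MUSIC))
```

Lower numbers mean closer matches, so a task such as audio fingerprinting would sit near category 1 and a genre recommender near category 8.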
6 Mar. 06 7
Ways of Finding Music (1)
• How can you find information/music you’re interested in?
  – You know some of it
  – You know something about it
  – “Someone else” knows something about your interests
  – => Content, Metadata, and “Collaboration”
• Metadata
  – “Data about data”: information about a thing, not the thing itself (or part of it)
  – Includes the standard library idea of bibliographic information, plus information about structure of the content
  – Metadata is the traditional library way
  – Also basis for iTunes, etc.: iTunes Music Library.xml
  – iTunes, Winamp, etc., use ID3 tags in MP3s
• Content (as in content-based retrieval)
  – Cf. tasks in Music Similarity Scale
• Collaborative
  – “People who bought this also bought…”
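As a concrete picture of the metadata route, here is a minimal sketch of reading an ID3v1 tag, the older and simpler form of ID3: a fixed 128-byte block at the end of the file. It assumes ID3v1 only; ID3v2 tags, which players more commonly write today, sit at the start of the file and use a different layout. The function names and the demo values are made up for illustration.

```python
import os
import struct
from typing import Optional

ID3V1_SIZE = 128  # an ID3v1 tag occupies the last 128 bytes of the file


def _text(raw: bytes) -> str:
    """Cut a fixed-width field at its first NUL and decode it."""
    return raw.split(b"\x00", 1)[0].decode("latin-1").strip()


def parse_id3v1(tail: bytes) -> Optional[dict]:
    """Decode an ID3v1 block, or return None if the bytes are not one."""
    if len(tail) != ID3V1_SIZE or not tail.startswith(b"TAG"):
        return None
    title, artist, album, year, comment, genre = struct.unpack(
        "<30s30s30s4s30sB", tail[3:]
    )
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
        "comment": _text(comment),
        "genre_code": genre,
    }


def read_id3v1(path: str) -> Optional[dict]:
    """Read the ID3v1 tag from a file on disk, if present."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < ID3V1_SIZE:
            return None
        f.seek(-ID3V1_SIZE, os.SEEK_END)
        return parse_id3v1(f.read(ID3V1_SIZE))


# Build a made-up tag in memory so the sketch runs without an MP3 file.
demo = struct.pack(
    "<3s30s30s30s4s30sB",
    b"TAG", b"Example Song", b"Example Artist", b"Example Album",
    b"2008", b"", 255,
)
print(parse_id3v1(demo))
```

Everything a metadata search can use here is what someone typed into these fields; nothing is derived from the audio itself.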
8 Mar. 06 8
Ways of Finding Music (2)
• Do you just want to find the music now, or do you want to put in a “standing order”?
• => Searching and Filtering
• Searching: data stays the same; information need changes
• Filtering: information need stays the same; data changes
  – Closely related to recommender systems
  – Sometimes called “routing”
• Collaborative approach to identifying music makes sense for filtering, but not for searching(?)
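The searching/filtering distinction can be sketched in a few lines. This is an illustration under simple assumptions, not any real system's design: items are plain dictionaries, an "information need" is a set of required field values, and the names `search` and `StandingOrder` are hypothetical.

```python
def matches(item: dict, need: dict) -> bool:
    """An item satisfies a need if it has every required field value."""
    return all(item.get(field) == value for field, value in need.items())


def search(collection: list, need: dict) -> list:
    """Searching: the collection is fixed; a new need is run against it."""
    return [item for item in collection if matches(item, need)]


class StandingOrder:
    """Filtering: the need is fixed; new items are checked as they arrive."""

    def __init__(self, need: dict):
        self.need = need
        self.delivered = []

    def offer(self, item: dict) -> bool:
        if matches(item, self.need):
            self.delivered.append(item)
            return True
        return False


# Made-up data for illustration.
collection = [
    {"title": "Piece 1", "genre": "jazz"},
    {"title": "Piece 2", "genre": "opera"},
]
print(search(collection, {"genre": "jazz"}))   # one need ...
print(search(collection, {"genre": "opera"}))  # ... then another, same data

order = StandingOrder({"genre": "jazz"})
for new_item in [{"title": "Piece 3", "genre": "jazz"},
                 {"title": "Piece 4", "genre": "rock"}]:
    order.offer(new_item)                      # same need, new data
print(order.delivered)
```

The matching logic is shared; what differs is which side changes over time, the query or the data.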
6 Mar. 08 9
Ways of Finding Music (3)
• Most combinations make sense & seem useful
                | Searching                                                         | Filtering
By content      | Shazam, NightingaleSearch, Themefinder                            | FOAFing the Music, Pandora
By metadata     | iTunes, Amazon.com, Variations2, etc. etc.; also Wikipedia, Google! | iTunes RSS feed generator, FOAFing the Music
Collaboratively | N/A(?)                                                            | Amazon.com, Last.fm; word of mouth!
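The collaborative row can be illustrated with the simplest "people who bought this also bought" scheme: count how often other items appear in the same purchase histories. This is a toy sketch with made-up data; it is not a description of how Amazon.com or Last.fm actually compute recommendations.

```python
from collections import Counter


def also_bought(histories: list, item: str, top: int = 3) -> list:
    """Rank other items by how many histories contain them alongside `item`."""
    counts = Counter()
    for history in histories:
        if item in history:
            counts.update(other for other in history if other != item)
    return [other for other, _ in counts.most_common(top)]


# Each set is one hypothetical customer's purchases.
histories = [
    {"Album A", "Album B"},
    {"Album A", "Album B", "Album C"},
    {"Album A", "Album C", "Album B"},
    {"Album A", "Album D"},
    {"Album C", "Album D"},
]
print(also_bought(histories, "Album A", top=2))
```

No content or metadata is consulted at all: the only input is other people's behavior, which is why this approach fits filtering more naturally than searching.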
22 March 07 10
Searching: Metadata (the old and new way) vs. Content (in the middle)
• To librarians, “searching” means searching metadata
  – Has been around as long as library catalogs (c. 300 B.C.?)
• To IR experts, it means searching content
  – Only since advent of IR: started with experiments in 1950’s
• Ordinary people don’t distinguish
  – Expert estimate: 50% of real-life information needs involve both
• The two approaches are slowly coming together
  – Metadata-creating “games” (Listen Game, etc.) should help a lot
  – Need ways to manage both together
8 Apr. 08 11
To the Rescue: Music Recommenders! (1)
• Music Recommendation Tutorial
  – by Paul Lamere & Òscar Celma, at ISMIR 2007
  – Introduction: Why music recommendation is important
    • 4-5: the Long Tail -- 6-10: different types of uses
  – 20 Formalization of the recommendation problem
    • 26-31: users & items -- 64-80: genre & other text tags
  – 105 Recommendation algorithms
  – 135 Problems with recommenders
    • 136-155: social recommenders -- 156-157: content-based
  – 158 Recommender examples
    • 159ff: social -- 168ff: content (Pandora) -- 180ff: hybrid
  – 184 Evaluation of recommenders
    • 188ff: metrics -- 191-192: mainstream vs. eclectic users
  – 246 Conclusions / Future
8 Apr. 08 12
To the Rescue: Music Recommenders! (2)
• Tim Westergren’s approach: Pandora
  – “Music Genome Project” defined 400 “genes” (attributes)
  – Every piece (song) has value 1 thru 10 assigned for each
  – ...completely manual: done by experts w/ degrees in music theory, etc.
  – Mostly content-based
  – Has major advantages, but hybrid (social & content) is probably best
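If every song is a vector of gene values from 1 to 10, "similar" can be read as "nearby" in that space. The sketch below assumes Euclidean distance and uses 4 made-up genes instead of 400; the slides do not say which measure Pandora actually uses, and the song names and values are hypothetical.

```python
import math


def gene_distance(a: tuple, b: tuple) -> float:
    """Euclidean distance between two gene vectors (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(seed: str, catalog: dict, k: int = 1) -> list:
    """Return the k songs whose gene vectors lie closest to the seed's."""
    others = [name for name in catalog if name != seed]
    others.sort(key=lambda name: gene_distance(catalog[seed], catalog[name]))
    return others[:k]


# Hypothetical catalog: each value is an expert-assigned score from 1 to 10.
catalog = {
    "Song A": (9, 2, 7, 3),
    "Song B": (8, 3, 7, 2),
    "Song C": (2, 9, 1, 8),
    "Song D": (5, 5, 5, 5),
}
print(most_similar("Song A", catalog, k=2))
```

Because the attributes are assigned by trained listeners, distances in this space are meant to track what people hear, which is the main advantage over features computed directly from the signal.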
rev. 10 April 08 13
“I don’t want similar music, I want something completely different!” (1)
• Much research, many commercial ventures designed to help people find music similar to something they have
• …but what about people who want something very different?
  – May not be that unusual: cf. Celma & Lamere “mainstream vs. eclectic users” slides
  – E.g., something as far as possible from Britney Spears
• Don has “Seriously Weird” playlist & “Music as Different as Possible” project
• How about Brian Whitman’s “Eigenmusic” approach?
  – Problem: parameters too low-level, not perceptually significant!
10 April 08 14
“I don’t want similar music, I want something completely different!” (2)
• How practical it is to make a system do this depends on its representation of music
  – Must represent perceptual features well enough
• MusicStrands’ representation (every song is an attribute) doesn’t help much
  – …though might be possible to infer from network
• Pandora “music genome” (400 attributes for all music) is ideal
  – Find points far away instead of nearby in 400-D metric space
  – Could do “Anti-Britney Spears Radio”!
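The "far away instead of nearby" idea amounts to ranking by largest distance rather than smallest. A minimal sketch, assuming Euclidean distance over attribute vectors scored 1 to 10 and using 4 made-up attributes in place of 400; the track names and values are hypothetical and the distance measure is an assumption.

```python
import heapq
import math


def attribute_distance(a: tuple, b: tuple) -> float:
    """Euclidean distance between two attribute vectors (an assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_different(seed: str, tracks: dict, k: int = 1) -> list:
    """Return the k tracks farthest from the seed, farthest first."""
    others = [name for name in tracks if name != seed]
    return heapq.nlargest(
        k, others, key=lambda name: attribute_distance(tracks[seed], tracks[name])
    )


# Hypothetical attribute scores on the 1-10 scale.
tracks = {
    "Seed": (9, 2, 7, 3),
    "Near": (8, 3, 7, 2),
    "Middle": (5, 5, 5, 5),
    "Far": (2, 9, 1, 8),
}
print(most_different("Seed", tracks, k=2))
```

This only gives musically meaningful results if the attributes are perceptually significant; with low-level signal parameters, "far away" may not sound different at all.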
8 April 08 15
Good Research Is Difficult (1)
• 1. Hard to evaluate reliability of info sources
  – Especially difficult on the Web
  – Ex: www.dhmo.org
• 2. People see what they expect to see
  – Ex: use of kitchen sponges increases E. coli
• 3. Almost everything in the world is complex, messy, etc.
  – Backus (in Musical Acoustics): why musicians’ explanations in acoustics are almost always wrong
  – “Classification: Logician General’s Warning”
  – Ex: What was the first piano? What is a trombone?
8 April 08 16
Good Research Is Difficult (2)
• 4. Easy to overgeneralize
  – Ex: Blair & Maron (1985): An Evaluation of Retrieval Effectiveness for a Full-text Document-Retrieval System. CACM 28(3)
    • Famous paper in text-IR research world
    • Well-thought-out, meticulously done large-scale study
    • Conclusion (essentially): full-text IR (vs. using abstracts, hand indexing) isn’t worth the trouble(!)
    • Faulty assumptions:
      – Litigation is typical domain, so recall is critical; no statistical methods; storage is expensive; text must be entered for IR system
  – Ex (fiction, but very plausible): Asimov short story: “Not Final”
10 April 08 17
Further Information
• Music Recommendation Tutorial
  – by Paul Lamere & Òscar Celma, at ISMIR 2007
  – http://mtg.upf.edu/~ocelma/MusicRecommendationTutorial-ISMIR2007/
• Paul Lamere’s “Duke Listens!” blog
  – http://blogs.sun.com/plamere/
• My “Information Sources for Music Informatics Students”
  – http://www.informatics.indiana.edu/donbyrd/Teach/GeneralInformationSources.html