the extraction of structure from a musical...
TRANSCRIPT
![Page 2: The Extraction of Structure from a Musical Piecerecherche.ircam.fr/equipes/analyse-synthese/jjcaas/slides/Souren.pdf · related to human perception rather from the listener's standpoint](https://reader034.vdocument.in/reader034/viewer/2022042321/5f0b7adf7e708231d430b990/html5/thumbnails/2.jpg)
related to human perception
rather from the listener's standpoint
than from the composer's standpoint
Musical Structure
audio, no MIDI or symbolic information
audio descriptors
not (yet) limited to one style
looking for similarity and borders
Finding structure
raw audio (11025 Hz)
log FFT
log FFT
power spectrogram
spectrum band variation information
EOF
principal components
most significant spectrum band variations
musical piece (ogg, wav, mp3)
Most significant spectrum variations
Most significant spectrum variations
PC of “log FFT” of frames from every band
spectrogram
100 feature vectors per second
time
frequency
most significant spectrum band variations
about 1 feature vector per second
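A minimal sketch of the first stage only (raw audio to a log power spectrogram), assuming NumPy. The frame length of 256 samples and the Hann window are assumptions; the hop of 110 samples is chosen so that 11025 Hz audio gives roughly 100 feature vectors per second. The later "log FFT" per band and the EOF step are not covered here.

```python
import numpy as np

def log_power_spectrogram(signal, frame_len=256, hop=110):
    """Rows are frames, columns are frequency bins (log power)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Small offset avoids log(0) on silent frames.
    return np.log(power + 1e-10)

rate = 11025
t = np.arange(rate) / rate            # one second of audio
tone = np.sin(2 * np.pi * 1000 * t)   # hypothetical 1 kHz test tone
spec = log_power_spectrogram(tone)
```

One second of input yields close to 100 rows, one per feature vector.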
EOF based on SVD
Empirical Orthogonal Functions, based on Singular Value Decomposition
popular in climate research
type of Principal Component Analysis
useful for reducing number of dimensions
while explaining large part of variance
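A minimal sketch of EOF analysis through the SVD, assuming NumPy and a matrix with one feature vector per row. The function name `eof_reduce` and the synthetic data are hypothetical, used only to illustrate the dimension reduction.

```python
import numpy as np

def eof_reduce(features, n_components):
    """Project feature vectors (rows = time frames) onto the leading EOFs."""
    # Centre each column so the SVD describes variance around the mean.
    centered = features - features.mean(axis=0)
    # Thin SVD: centered = U @ diag(s) @ Vt; the rows of Vt are the EOFs.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Principal-component time series for the leading EOFs.
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of the total variance explained by each component.
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, vt[:n_components], explained[:n_components]

rng = np.random.default_rng(0)
# Hypothetical data: 200 frames of 16 band values driven by 2 hidden factors.
hidden = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 16))
frames = hidden @ mixing + 0.01 * rng.normal(size=(200, 16))
scores, eofs, explained = eof_reduce(frames, n_components=2)
```

Here 16 dimensions shrink to 2 while the kept components still explain almost all of the variance.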
Similarity matrix (J. Foote, 1999)
1) the audio descriptors are points in an N-dimensional space
2) calculate mutual distances: distance matrix
3) rescale: similarity matrix
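The three steps above can be sketched as follows, assuming NumPy. The Euclidean distance and the linear rescaling to the range 0..1 are assumptions; the slide does not say which distance or rescaling was used.

```python
import numpy as np

def similarity_matrix(descriptors):
    """Rows of `descriptors` are feature vectors, one per time step."""
    # Pairwise differences via broadcasting: shape (T, T, N).
    diff = descriptors[:, None, :] - descriptors[None, :, :]
    # Euclidean distance between every pair of time steps.
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Rescale so that 1 means identical and 0 the most distant pair
    # (this particular rescaling is an assumption).
    peak = dist.max()
    if peak == 0:
        return np.ones_like(dist)
    return 1.0 - dist / peak

points = np.array([[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
sim = similarity_matrix(points)
```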
Similarity matrix
[Figure: most significant spectrum variations over time, and the resulting similarity matrix (time × time), for “Chardonnay Says” by Nood/Banana]
Finding similar parts
step 1: calculate lag matrix
similarity matrix (time × time) => lag matrix (time × delay time)
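A minimal sketch of the lag matrix, assuming NumPy and the convention that rows are time and columns are delay time. With that convention the first column is the diagonal of the similarity matrix.

```python
import numpy as np

def lag_matrix(sim):
    """lag[t, d] = sim[t, t - d]; entries with d > t stay zero."""
    n = sim.shape[0]
    lag = np.zeros_like(sim)
    for d in range(n):
        # The d-th sub-diagonal of the similarity matrix becomes column d.
        lag[d:, d] = np.diagonal(sim, offset=-d)
    return lag

demo = np.arange(9.0).reshape(3, 3)
lag = lag_matrix(demo)
```

A part repeated after a fixed delay, which is a line parallel to the diagonal in the similarity matrix, becomes a straight line along one column.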
Finding similar parts
step 2: apply 2D FIR filter to blur
lag matrix (time × delay time) => blurred lag matrix (time × delay time)
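A minimal sketch of a 2D FIR blur, assuming NumPy, zero padding at the edges and an odd-sized kernel. The 3 × 3 box kernel in the example is an assumption; the slide does not give the filter coefficients.

```python
import numpy as np

def fir_blur(matrix, kernel):
    """'Same'-size 2D FIR filtering with zero padding (odd-sized kernel)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(matrix, ((ph, ph), (pw, pw)))
    out = np.zeros(matrix.shape, dtype=float)
    # Flip the kernel so this is a true convolution, then accumulate
    # one shifted, weighted copy of the input per kernel tap.
    flipped = kernel[::-1, ::-1]
    rows, cols = matrix.shape
    for i in range(kh):
        for j in range(kw):
            out += flipped[i, j] * padded[i:i + rows, j:j + cols]
    return out

impulse = np.zeros((5, 5))
impulse[2, 2] = 1.0
box = np.full((3, 3), 1.0 / 9.0)   # hypothetical blur kernel
blurred = fir_blur(impulse, box)
```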
Finding similar parts
step 3: find vertical local maxima
blurred lag matrix (time × delay time) => its local maxima (values from non-blurred matrix)
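A minimal sketch of this step, assuming NumPy: positions of local maxima are taken from the blurred matrix, values from the non-blurred one. Which matrix axis counts as "vertical" depends on how the figure is laid out, so the axis is left as a parameter; the strict comparison with both neighbours is also an assumption.

```python
import numpy as np

def local_maxima_mask(matrix, axis=0):
    """True where an entry strictly exceeds both neighbours along `axis`."""
    m = np.moveaxis(np.asarray(matrix, dtype=float), axis, 0)
    mask = np.zeros(m.shape, dtype=bool)
    # Border entries have only one neighbour and are never marked.
    mask[1:-1] = (m[1:-1] > m[:-2]) & (m[1:-1] > m[2:])
    return np.moveaxis(mask, 0, axis)

def maxima_values(blurred, original, axis=0):
    # Positions come from the blurred matrix, values from the original one.
    return np.where(local_maxima_mask(blurred, axis), original, 0.0)

blurred_demo = np.array([[0.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]])
original_demo = np.array([[9.0, 9.0, 9.0], [7.0, 8.0, 9.0], [9.0, 9.0, 9.0]])
peaks = maxima_values(blurred_demo, original_demo, axis=0)
```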
Finding similar parts
step 4: postprocessing
0) forget first column (diagonal of similarity matrix)
1) localize sufficiently long contiguous parts
2) remove overlaps
3) remove diagonal parts
local maxima => similar parts
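A minimal sketch of item 1 only, assuming NumPy: find runs of nonzero values in one column of the local-maxima matrix and keep those of at least a minimum length. The name `long_runs` and the threshold are hypothetical; removing overlaps and diagonal parts is not covered.

```python
import numpy as np

def long_runs(column, min_length):
    """Return (start, end) index pairs, end exclusive, of nonzero runs."""
    active = np.concatenate(([False], np.asarray(column) != 0, [False]))
    # A run starts where `active` switches on and ends where it switches off.
    edges = np.flatnonzero(active[1:] != active[:-1])
    starts, ends = edges[0::2], edges[1::2]
    return [(int(s), int(e)) for s, e in zip(starts, ends)
            if e - s >= min_length]

runs = long_runs([0, 1, 1, 1, 0, 2, 0, 3, 3], min_length=2)
```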
Finding borders
step 1: convolution, kernels of different sizes
similarity matrix => filtered matrices
Finding borders
step 2: diagonals => columns
filtered matrices => diagonals of filtered matrices
axes: time, kernel size
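Steps 1 and 2 can be sketched together, assuming NumPy. The slides do not name the kernel; the checkerboard kernel used here is an assumption, chosen because it is the usual choice for border detection on a similarity matrix. Only the diagonal of each filtered matrix is computed, and each kernel size fills one column. Because this kernel is unchanged by a 180 degree rotation, the windowed sum below equals a convolution.

```python
import numpy as np

def checkerboard(half):
    """2*half x 2*half kernel: +1 on same-side quadrants, -1 on cross quadrants."""
    sign = np.concatenate((-np.ones(half), np.ones(half)))
    return np.outer(sign, sign)

def novelty_columns(sim, halves):
    """Column k holds the kernel response along the diagonal for halves[k]."""
    n = sim.shape[0]
    out = np.zeros((n, len(halves)))
    for k, half in enumerate(halves):
        kernel = checkerboard(half)
        padded = np.pad(sim, half)   # zero padding on all sides
        for t in range(n):
            # Window centred on the diagonal point (t, t).
            window = padded[t:t + 2 * half, t:t + 2 * half]
            out[t, k] = np.sum(window * kernel)
    return out

# Hypothetical piece: two homogeneous parts of 4 time steps each.
labels = np.array([0] * 4 + [1] * 4)
block_sim = (labels[:, None] == labels[None, :]).astype(float)
cols = novelty_columns(block_sim, halves=[1, 2])
```

In this example both columns peak at time step 4, where the second part begins.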
Finding borders
step 3: find local maxima in columns
diagonals of filtered matrices => local maxima
axis: time
Finding borders
step 4: postprocessing
1) localize contiguous parts
2) sum their values
3) throw away positions with too low values
4) refine the positions using the spectrogram
time
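A simplified sketch of items 2 and 3, assuming NumPy and a matrix of local maxima with rows as time and columns as kernel size. Treating each time row as one contiguous part is a simplification, and the threshold value is hypothetical; the refinement using the spectrogram is not covered.

```python
import numpy as np

def border_candidates(maxima, threshold):
    """`maxima` is time x kernel size, zero except at local maxima."""
    # Simplification: treat each time row as one part and sum over kernel sizes.
    strength = maxima.sum(axis=1)
    # Keep only positions whose summed value reaches the threshold.
    times = np.flatnonzero(strength >= threshold)
    return [(int(t), float(strength[t])) for t in times]

maxima_demo = np.zeros((6, 3))
maxima_demo[2] = [1.0, 2.0, 3.0]   # seen at every kernel size
maxima_demo[4, 0] = 0.5            # weak, seen at one kernel size only
borders = border_candidates(maxima_demo, threshold=1.0)
```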
Structural Information Theory
formal calculus for Gestalt laws
focus on visual patterns
experimented with Genetic Programming
problem:
need for a much higher-level description: musical objects,
thus source separation, classification, ...
Framework for Audio Analysis
functionality interesting for
audio and music research
integrating research could be fruitful
finding musical structure
audio signal separation
sound classification
...
scripting language, interpreted
object-oriented
flexible, extensible, easy to embed
modular
free software (BSD style license)
Python
Scientific analysis environment
standalone application: QtFfAA
GUI + command line, object viewer, visualisation
Embeddable in free audio software
for audio editors and recorders
for music players, DJ tools
FfAA modes
versatile interface
MDI GUI (PyQt)
command line (IPython)
load and analyse sound files
database
visualisation
easily extensible
FfAA right now