Detection and Segmentation of Bird Song in Noisy Environments
Lawrence Neal, UHC Honors Thesis
Bioacoustics Project
Bird Species
◦ Identifiable by species
◦ Presence/absence and activity data are useful
Bird activity may shift in response to climate change and ecological factors
Bioacoustics Project
Automated Recording
◦ Song Meter automated recorders
◦ Collected May-August, beginning in 2009
Audio Data Analysis
Involves several steps:
◦ Extracting bird sound from audio (segmentation)
◦ Identifying bird species
◦ Mapping species data back to sites
Segmentation
Time-Domain Segmentation
◦ Separates audio into multiple clips
◦ Energy thresholding, onset/offset detection
◦ Has been applied to bird song: Harma 2003, Fagerlund 2004, Lee 2008
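A minimal sketch of time-domain energy thresholding in Python with NumPy. The frame length, hop, and the -30 dB threshold (relative to the loudest frame) are illustrative assumptions, not values from the thesis; an onset is where frame energy rises above the threshold and an offset is where it falls back below.

```python
import numpy as np

def energy_segments(x, fs, frame_len=512, hop=256, threshold_db=-30.0):
    """Return (start_s, end_s) clips whose short-time energy exceeds a threshold.

    threshold_db is measured relative to the loudest frame (assumed convention).
    """
    x = np.asarray(x, dtype=float)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    # Short-time energy of each frame.
    energy = np.array([np.sum(x[i * hop:i * hop + frame_len] ** 2) for i in range(n_frames)])
    # Energy in dB relative to the loudest frame (small constants avoid log(0)).
    energy_db = 10 * np.log10(energy / (energy.max() + 1e-12) + 1e-12)
    active = energy_db > threshold_db

    segments, start = [], None
    for i, on in enumerate(active):
        if on and start is None:          # onset: inactive -> active
            start = i
        elif not on and start is not None:  # offset: active -> inactive
            segments.append((start * hop / fs, ((i - 1) * hop + frame_len) / fs))
            start = None
    if start is not None:                 # clip still open at the end of the signal
        segments.append((start * hop / fs, ((n_frames - 1) * hop + frame_len) / fs))
    return segments
```

Each clip spans the full frequency range, so two birds singing at the same time end up in the same clip; this is why time-domain segmentation cannot separate overlapping sounds.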
Segmentation
Time-Domain Segmentation
◦ Cannot separate overlapping sounds
Segmentation
Time-Frequency Segmentation
◦ Segments regions of the 2D spectrogram
Segmentation
Spectrogram Segmentation
◦ Similar to image segmentation
Spectrograms
Two-dimensional representation of sound
◦ Audio amplitude at each (time, frequency) point
◦ Generated by the short-time Fourier transform (STFT)
(Figure: male voice saying 'nineteenth century')
(Figure: violin playing; note the harmonics)
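A short NumPy sketch of how such a spectrogram can be computed: the signal is cut into overlapping frames, each frame is multiplied by a window, and the magnitude of its FFT becomes one column. The Hann window and the default `n_fft`/`hop` values are assumptions for illustration.

```python
import numpy as np

def spectrogram(x, n_fft=512, hop=128):
    """Magnitude STFT with shape (n_fft // 2 + 1, n_frames).

    Rows are frequency bins and columns are time frames (Hann window assumed).
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    # Windowed, overlapping frames, one per row.
    frames = np.stack([x[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    # One real FFT per frame; transpose so frequency is the vertical axis.
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

For a 1 kHz tone sampled at 8 kHz with `n_fft=256`, the bins are 8000 / 256 = 31.25 Hz apart, so every column peaks at bin 32.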
Spectrograms
Tradeoffs in parameters
◦ Larger STFT size: higher frequency resolution, but coarser time resolution
◦ Shorter step size: higher time resolution
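As a worked example of this tradeoff: with sample rate f_s, STFT size N, and step (hop) size H, the frequency-bin spacing, window duration, and time step are as below. The 16 kHz sample rate used for the numbers is an illustrative assumption, not the recorders' setting.

```latex
\Delta f = \frac{f_s}{N}, \qquad T_{\mathrm{win}} = \frac{N}{f_s}, \qquad \Delta t = \frac{H}{f_s}

% Assumed f_s = 16\,000 Hz
\begin{aligned}
N = 512  &: \quad \Delta f = 31.25\ \mathrm{Hz}, \quad T_{\mathrm{win}} = 32\ \mathrm{ms} \\
N = 1024 &: \quad \Delta f = 15.625\ \mathrm{Hz}, \quad T_{\mathrm{win}} = 64\ \mathrm{ms} \\
H = 256  &: \quad \Delta t = 16\ \mathrm{ms} \\
H = 128  &: \quad \Delta t = 8\ \mathrm{ms}
\end{aligned}
```

Doubling N halves the bin spacing but doubles the span of audio each column summarizes.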
Spectrogram Segments
Each segment is a contiguous region
◦ Defined by a binary mask over the spectrogram
Spectrogram Segments
Can be converted back to audio with the inverse STFT, or left as 2D segments
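One way to turn a 2D segment back into audio, sketched with SciPy's `scipy.signal.stft` and `scipy.signal.istft`: multiply the complex STFT by the segment's binary mask and invert. The window length and overlap are assumed values, and `segment_to_audio` is a hypothetical helper name.

```python
import numpy as np
from scipy.signal import stft, istft

def segment_to_audio(x, fs, mask, nperseg=512, noverlap=384):
    """Keep only the STFT cells inside a binary segment mask, then invert.

    mask must have the same shape as the STFT array (n_freq, n_frames)
    produced with the same nperseg/noverlap.
    """
    _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    # Zero every time-frequency cell outside the segment, then inverse STFT.
    _, y = istft(Z * mask, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return y[:len(x)]  # drop the padding added by the forward transform
```

With an all-ones mask the original signal is recovered (the Hann window at 75% overlap satisfies the overlap-add condition); with a segment mask, only the energy inside that region is kept.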
Segmentation Methods
Per-Pixel Random Forest
◦ Trains on one feature vector per pixel
◦ Outputs a probability per pixel
Superpixel Merger Method
◦ First splits the spectrogram into ‘superpixels’
◦ Trains on one feature vector per superpixel
◦ A second classifier trains on superpixel pairs
◦ Outputs connected sets of superpixels
Random Forest
Supervised classifier
◦ Trains on human-provided labeled data: each example is a “feature vector” of values with a yes/no label
◦ Learns to mimic the human’s labels
Based on decision trees:
◦ A tree is traversed with feature vector X
◦ Each interior node is a decision of the form: if (X_d < θ) go left; else go right
◦ Each leaf node contains a class label; in this case there are two classes, ‘Bird Sound’ and ‘Negative’
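The traversal rule above can be sketched in a few lines of Python. The `Node` class and `classify` function are hypothetical names for illustration; a node is treated as a leaf when its `label` is set.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class Node:
    # Interior node: go left if x[feature] < threshold, else go right.
    feature: int = -1
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Leaf node: class label such as "Bird Sound" or "Negative".
    label: Optional[str] = None

def classify(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf using the rule: if x[d] < θ go left, else right."""
    while node.label is None:
        node = node.left if x[node.feature] < node.threshold else node.right
    return node.label
```

A random forest runs `classify` once per tree and combines the answers by vote.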
Random Forest
Constructed by a recursive procedure:
◦ Check whether all remaining examples have the same label; if so, finish with a leaf node
◦ Select a random subset of features; for each one, find the optimal split (largest decrease in Gini impurity)
◦ Choose the (feature, split) pair with the largest Gini impurity decrease and create a new interior node
◦ Split the examples and recursively create two child nodes
Classification is a vote among all trees
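The split criterion can be made concrete. Gini impurity is 1 minus the sum of squared class proportions; the chosen split is the one that most reduces the size-weighted impurity of the two children. This is a minimal pure-Python sketch of that selection step and of the final vote, with hypothetical function names; it omits the recursion and the bootstrap sampling of training examples used in Breiman's random forests.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2; 0 for a pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, n_features, rng=random):
    """(feature, threshold, impurity decrease) of the best split over a
    random subset of n_features features."""
    parent = gini(y)
    best = (None, None, 0.0)
    for d in rng.sample(range(len(X[0])), n_features):
        for theta in sorted({row[d] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[d] < theta]
            right = [lab for row, lab in zip(X, y) if row[d] >= theta]
            if not left or not right:
                continue  # not a real split
            # Size-weighted impurity of the two children.
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if parent - child > best[2]:
                best = (d, theta, parent - child)
    return best

def forest_vote(tree_predictions):
    """Final class is the majority vote across all trees."""
    return Counter(tree_predictions).most_common(1)[0][0]
```

In practice, a library implementation such as scikit-learn's `RandomForestClassifier` (with `criterion="gini"`) performs the full recursive construction.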
Per-Pixel Training
Hand-drawn mask over the spectrogram
◦ Pixels are randomly sampled
Per-Pixel Training
Feature vector includes:
◦ Pixel frequency
◦ Window variance
◦ All pixel values in the window
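A sketch of how the per-pixel training set might be assembled: pixels are sampled uniformly at random, each takes the label of the hand-drawn mask at that location, and its feature vector is the frequency row, the window variance, and the raw window values. The (2r+1) × (2r+1) window, zero padding at the borders, and the function names are assumptions.

```python
import numpy as np

def pixel_features(S, row, col, r=2):
    """[frequency row, window variance, window pixels...] for one spectrogram pixel.

    Zero padding keeps windows full-size at the spectrogram border.
    """
    P = np.pad(S, r)
    win = P[row:row + 2 * r + 1, col:col + 2 * r + 1]  # centered on S[row, col]
    return np.concatenate(([row, win.var()], win.ravel()))

def sample_training_set(S, mask, n_samples, r=2, seed=0):
    """Randomly sample pixels; each label is the hand-drawn mask value at that pixel."""
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, S.shape[0], n_samples)
    cols = rng.integers(0, S.shape[1], n_samples)
    X = np.stack([pixel_features(S, i, j, r) for i, j in zip(rows, cols)])
    y = mask[rows, cols].astype(int)
    return X, y
```

The feature vector has 2 + (2r+1)^2 entries, so the classifier can only "see" as far as the window reaches.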
Per-Pixel Output
Probability mask over the spectrogram
A threshold is applied to extract segments
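Thresholding and segment extraction can be sketched with `scipy.ndimage.label`, which assigns an integer to each connected region of a binary image. The 0.5 default threshold, 8-connectivity, and the name `extract_segments` are assumptions.

```python
import numpy as np
from scipy import ndimage

def extract_segments(prob, threshold=0.5):
    """Threshold a per-pixel probability mask and split it into connected segments.

    Returns one boolean mask per segment (8-connectivity).
    """
    binary = prob > threshold
    labels, n = ndimage.label(binary, structure=np.ones((3, 3)))
    return [labels == k for k in range(1, n + 1)]
```

Raising the threshold breaks weakly connected regions apart (oversegmentation), while lowering it joins separate sounds into one region (undersegmentation).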
Per-Pixel Limitations
Scope is limited to the window size
A high threshold causes oversegmentation
A low threshold causes undersegmentation
Slow: the classifier must run on every pixel
Superpixel Method
Begins with an initial pre-segmentation
◦ Modification of Simple Linear Iterative Clustering (SLIC) image segmentation
◦ Uses computed features that describe regions of the spectrogram
Segments are sets of superpixels
Superpixel Clustering
Based on the SLIC method:
◦ Each pixel is assigned a 5-valued vector (x, y, L, a, b) for position and color
Locally constrained k-means clustering
◦ Each centroid searches only a 2S × 2S region around itself, where S = sqrt(N/K) for N pixels and K superpixels
Creates a set of regularly sized regions
◦ Some regions’ boundaries follow the edges of larger objects in the image
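A compact sketch of SLIC-style locally constrained k-means on an arbitrary per-pixel feature image. Distances are squared (which does not change which center is nearest), and the compactness weight `m`, the iteration count, and the name `slic_like` are assumptions rather than the thesis's settings.

```python
import numpy as np

def slic_like(F, K, m=10.0, n_iter=10):
    """Locally constrained k-means in the spirit of SLIC.

    F: (H, W, C) per-pixel features (L, a, b for a color image).
    K: desired number of superpixels; grid spacing S = sqrt(N / K).
    Returns an (H, W) integer label image.
    """
    H, W, _ = F.shape
    S = max(1, int(np.sqrt(H * W / K)))
    # Initial centers on a regular grid: [y, x, features...].
    ys, xs = np.mgrid[S // 2:H:S, S // 2:W:S]
    centers = np.array([np.concatenate(([y, x], F[y, x]))
                        for y, x in zip(ys.ravel(), xs.ravel())], dtype=float)
    labels = -np.ones((H, W), dtype=int)
    for _ in range(n_iter):
        dist = np.full((H, W), np.inf)
        for k, c in enumerate(centers):
            # Each center only searches a 2S x 2S window around itself.
            y0, y1 = max(int(c[0]) - S, 0), min(int(c[0]) + S, H)
            x0, x1 = max(int(c[1]) - S, 0), min(int(c[1]) + S, W)
            yy, xx = np.mgrid[y0:y1, x0:x1]
            d_feat = np.sum((F[y0:y1, x0:x1] - c[2:]) ** 2, axis=-1)
            d_xy = (yy - c[0]) ** 2 + (xx - c[1]) ** 2
            D = d_feat + (m / S) ** 2 * d_xy
            closer = D < dist[y0:y1, x0:x1]
            dist[y0:y1, x0:x1][closer] = D[closer]
            labels[y0:y1, x0:x1][closer] = k
        # Update step: move each center to the mean position/features of its pixels.
        for k in range(len(centers)):
            yy, xx = np.nonzero(labels == k)
            if len(yy):
                centers[k] = np.concatenate(([yy.mean(), xx.mean()], F[yy, xx].mean(axis=0)))
    return labels
```

scikit-image provides a full implementation as `skimage.segmentation.slic`; the sketch only shows why the restricted search window makes each iteration cost proportional to the number of pixels rather than pixels × K.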
Superpixel Clustering
Over-segments an image
◦ Cluster boundaries fall along image edges
But it doesn’t work for spectrograms
Superpixel Clustering
Spectrograms lack edges
◦ Also, only one channel of color
Instead of (x, y, L, a, b), we use a new vector:
◦ (x, y, B, V, Gx, Gy, Px, Py)
Superpixel Clustering
x, y
◦ Time and frequency values in the spectrogram
B, V
◦ Pixel values after Gaussian blur; variance of pixel values
Gx, Gy
◦ Horizontal/vertical Sobel gradient values
Px, Py
◦ Time and frequency values of the nearest peak (weighted by a Gaussian kernel)
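A sketch of computing the (x, y, B, V, Gx, Gy, Px, Py) vector for every pixel with `scipy.ndimage`. The blur width, window sizes, and the choice of local maxima of the blurred spectrogram as "peaks" are assumptions, and the Gaussian weighting of the nearest peak is not reproduced; `spectrogram_pixel_features` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage

def spectrogram_pixel_features(S, sigma=2.0, var_size=5, peak_size=9):
    """Per-pixel (x, y, B, V, Gx, Gy, Px, Py) for a spectrogram S (frequency x time).

    sigma, var_size, and peak_size are assumed parameters; the Gaussian
    weighting of the nearest peak is omitted.
    """
    S = np.asarray(S, dtype=float)
    y, x = np.mgrid[0:S.shape[0], 0:S.shape[1]]   # y = frequency row, x = time column
    B = ndimage.gaussian_filter(S, sigma)          # blurred pixel values
    mean = ndimage.uniform_filter(S, var_size)
    V = ndimage.uniform_filter(S ** 2, var_size) - mean ** 2  # local variance
    Gx = ndimage.sobel(B, axis=1)                  # horizontal (time) gradient
    Gy = ndimage.sobel(B, axis=0)                  # vertical (frequency) gradient
    # Assumed peak definition: local maxima of B that are at least the mean level.
    peaks = (B == ndimage.maximum_filter(B, peak_size)) & (B >= B.mean())
    # distance_transform_edt finds the nearest zero, so invert the peak mask.
    _, (Py, Px) = ndimage.distance_transform_edt(~peaks, return_indices=True)
    return np.stack([x, y, B, V, Gx, Gy, Px, Py], axis=-1)
```

The resulting (H, W, 8) array can stand in for the (x, y, L, a, b) image as input to a SLIC-style clustering step.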
Foreground/Background Classifier
Random forest trained using the same manual spectrogram labels as the per-pixel method
◦ Each superpixel is labeled positive (foreground) if more than 10% of its area overlaps a positive-labeled region
Feature vector describes the superpixel:
◦ Mean and variance of pixel values, blurred pixel values, peak frequencies
◦ Histogram of Oriented Gradients (HOG)
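The 10% overlap rule and a few of the listed superpixel features can be sketched as follows. Only the mean, the variance, and the frequency row of the brightest pixel are computed here; blurred values and HOG are omitted, and the function name is hypothetical.

```python
import numpy as np

def superpixel_training_data(S, superpixels, label_mask, min_overlap=0.10):
    """One row per superpixel, [mean, variance, peak frequency row], plus a 0/1 label.

    A superpixel is positive (foreground) if more than min_overlap of its
    area overlaps the hand-labeled positive mask.
    """
    X, y = [], []
    for k in np.unique(superpixels):
        region = superpixels == k
        values = S[region]                 # row-major order
        rows = np.nonzero(region)[0]       # same row-major order as values
        peak_row = rows[np.argmax(values)]  # frequency row of the brightest pixel
        X.append([values.mean(), values.var(), peak_row])
        y.append(int(label_mask[region].mean() > min_overlap))
    return np.array(X, dtype=float), np.array(y)
```

The returned `X`, `y` pair has one row per superpixel, which is what makes this classifier far cheaper than the per-pixel one.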
Superpixel Merger Classifier
Random forest trained to classify pairs of adjacent superpixels
◦ Positive classification: merge together
◦ Negative classification: split apart
After background superpixels are discarded, all remaining edges between superpixels are classified
◦ All edges scoring above a threshold are merged
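A sketch of the merge step: find pairs of touching superpixels, then union every pair whose classifier score exceeds the threshold, so that chains of merges form a single segment. The pair scores are assumed to come from the merger classifier; the function names and the 0.5 threshold are hypothetical.

```python
import numpy as np

def adjacent_pairs(superpixels):
    """Set of (a, b) label pairs, a < b, that touch horizontally or vertically."""
    pairs = set()
    for A, B in ((superpixels[:, :-1], superpixels[:, 1:]),
                 (superpixels[:-1, :], superpixels[1:, :])):
        differ = A != B
        for a, b in zip(A[differ], B[differ]):
            pairs.add((int(min(a, b)), int(max(a, b))))
    return pairs

def merge_superpixels(labels, edge_prob, threshold=0.5):
    """Union-find merge of every edge whose merge probability exceeds threshold.

    labels: foreground superpixel labels (background already discarded).
    edge_prob: maps (a, b) -> merge probability from the pair classifier.
    Returns a dict mapping each superpixel label to a segment id.
    """
    parent = {k: k for k in labels}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for (a, b), p in edge_prob.items():
        if p > threshold:
            parent[find(a)] = find(b)
    return {k: find(k) for k in labels}
```

Union-find makes the result independent of the order in which edges are visited: any two superpixels linked by a chain of above-threshold edges share a segment id.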
Superpixel Method Output
Evaluation Datasets
HJ Andrews dataset, 625 recordings
◦ Each 15 seconds long
◦ Two drawn from each of the 24 hours of the day
“Set A” dataset, 166 recordings
◦ All from early and mid morning
◦ Paired by year, 2009/2010
Differences in Training Data
Results
Future Work
Superpixel method is promising
◦ Faster than per-pixel classification
◦ Could use a more sophisticated merger technique
Bibliography
A. Harma, “Automatic identification of bird species based on sinusoidal modeling of syllables,” in IEEE International Conference on Acoustics, Speech and Signal Processing, April 2003, pp. 545–548.
C.-H. Lee, C.-C. Han, and C.-C. Chuang, “Automatic classification of bird species from their sounds using two-dimensional cepstral coefficients,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 8, pp. 1541–1550, 2008.
L. Breiman, “Random forests,” Machine Learning, pp. 5–32, January 2001.
S. Fagerlund, “Automatic recognition of bird species by their sounds,” Master’s thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, Nov. 8, 2004.