
Sonogram: Acoustical Frequency Analysis Tool

Christoph Lauer


March 16, 2002

Abstract

Sonogram transforms time-domain audio information (samples) into the frequency domain using the "Fast Fourier Transformation" (FFT), "Linear Prediction Coefficients" (LPC), "Cepstral Analysis" and the "Wavelet Transformation". It extracts samples from multimedia files and draws them into the main window, which shows the frequency information in a two-dimensional form (frequency-time plane). The most common audio and video file formats are supported. The two-dimensional plots can also be shown in an interactive three-dimensional representation. Many signal analysis parameters can be configured, and the representation can be displayed in different forms. The program is written in Java2 and therefore runs on any operating system on which Java2 is supported. Calculated pictures can be stored as "Scalable Vector Graphics" (SVG) and as bitmaps. The program has built-in documentation which describes most of the software components. It can be used free of charge for non-commercial and research purposes.

Contents

1 Introduction
1.1 Background
1.2 How to use this Document
1.3 System Requirements and the Java Installation
1.4 Installation and Starting

2 Digital Signal Processing
2.1 What is a Sonogram?
2.2 Digital Audio Processing
2.3 Signal Analysis and the Fourier Transformation
2.4 Windowing the Signal
2.4.1 Aliasing in Spectrum Analysis
2.4.2 Window Function
2.4.3 Choice of Window Function
2.4.4 Leakage in Spectrum Analysis
2.4.5 Overlapping
2.5 Linear Prediction Method (LPC)
2.6 Frequency-domain Pitch Detectors
2.7 Cepstral Analysis
2.8 Analysis by Wavelets

3 Concept
3.1 Opening, Calculating and Drawing
3.2 Supported File Formats

4 User Interface
4.1 Main Window
4.2 Adjustment Dialog
4.3 Toolbar and Menu bar
4.4 Analyse Windows
4.4.1 Waveform Display
4.4.2 Fourier Display
4.4.3 LPC Display
4.4.4 Wavelet Transformation
4.4.5 Cepstrum Display
4.5 Information Dialog
4.6 3D Surface Plot
4.7 File Open Dialog
4.8 Online-Help
4.9 Keyboard Shortcuts

5 Working with Sonogram
5.1 Opening a File
5.2 Configuring the Sonogram with the Adjustment Dialog
5.3 Analysing Windows
5.4 3D Surface Plot
5.5 Working with SVG

6 Future Work

A Musical Frequencies

B About this Document

1 Introduction

1.1 Background

Sonogram arose from a spectral analysis module for Michael Kipp's annotation program "Anvil" (www.dfki.de/∼kipp/anvil) at the "German Research Center for Artificial Intelligence" (DFKI) in Saarbrücken. The first task of this module was to transform audio files into a frequency spectrum. Bit by bit, more features were added to the program, resulting in the current version of Sonogram. It analyses audio information in the frequency domain with the methods most commonly used in phonetics and linguistics, and presents the results graphically. Since Cooley and J. W. Tukey derived the "Fast Fourier Transformation" algorithm in 1965 (see chapter 2.3), computer-based frequency analysis has made great advances. Sonogram's aim is to be as configurable as possible for the different use cases of spectral analysis in speech and sound analysis.

1.2 How to use this Document

This document tries to be as complete as possible and describes the use of Sonogram. It also gives an introduction to digital signal analysis. The "Digital Signal Processing" chapter presents the theoretical background of the program from the point of view of signal analysis. The "User Interface" chapter describes all functions in detail. The "Working with Sonogram" chapter takes a more practical point of view.

1.3 System Requirements and the Java Installation

Sonogram needs Java2 version 1.3 or higher, the Java Media Framework (JMF) and Java3D. For Solaris and Windows systems, JMF can be downloaded from Sun Microsystems (www.java.sun.com). The Java Media Framework for Linux platforms is available from Blackdown (www.blackdown.org). On Linux, installation is done by copying the jar files into the appropriate folder, as described in the readme file that comes with the installation package. Apple OS-X users should use the cross-platform version of JMF and either copy the jar files into the /Library/Java folder or add them to their classpath. The Java3D installation is optional. MS-Windows users have the choice between the DirectX and the OpenGL version at Sun's website. Windows NT 4 users should use the OpenGL version; all others can download the DirectX version. To install Java3D on UNIX derivatives, simply unpack the tar archive into the Java installation folder and set the classpath as described in the readme file. At this time a Java3D version for Apple operating systems is not available, so the 3D surface plot function is not available on the Macintosh. Sonogram was successfully tested on Linux, Solaris 7, Mac OS-X and all Windows platforms. The processor speed should be greater than 100 MHz and the system should have at least 64 MB of physical RAM.

1.4 Installation and Starting

Sonogram can be downloaded from DFKI's NITE project page www.dfki.de/nite, from www.dfki.de/∼clauer/sonogram or from the author's home page www.christoph-lauer.de. The Sonogram.zip file includes the Sonogram.jar class archive, start scripts for UNIX, Mac OS-X and Windows, and various icons. The unpacked files should be placed into a new directory. On Windows systems, Sonogram is started with the start script Sonogram.bat. Unix and Linux users make the script executable with chmod u+x Sonogram.sh and start it with ./Sonogram.sh. Macintosh OS-X users can open the program simply by clicking the jar archive. Sonogram can also be started manually from a console with java -classpath Sonogram.jar:. Sonogram. The program starts without Java3D installed, but the 3D surface plot function is then disabled. Without JMF, the start process ends with a message.

2 Digital Signal Processing

This chapter gives a short introduction to digital audio processing, signal analysis and the methods of windowing.

2.1 What is a Sonogram?

A sonogram, sonograph or spectrogram is a well-known spectrum display technique in speech processing and sound analysis, and has been used for decades to analyse sounds and voices. A sonogram shows an overview of the spectrum of several seconds of sound. This enables the viewer to see general features such as the onset of notes or phonemes, formant peaks, and major transitions. A trained viewer can read the phonemes in the sonogram. The sonogram representation has also been employed as an interface for spectrum editing and music analysis. The original sonogram was Backhaus's (1932) system [Roa96]. In the 1950s the Kay Sonograph [Roa96] was the standard device for making sonograms. It consisted of a number of narrow bandpass analog filters and a recording system that printed dark bars on a roll of paper. The bars grew thicker in proportion to the energy output from each filter. Today sonograms are implemented with the "Windowed Fourier Transformation".

2.2 Digital Audio Processing

Digital audio data is commonly stored as discrete time-domain information. The signal is discretised in both the time and the amplitude dimension. In the time dimension, the discretisation (also called quantization) depends on the sampling rate, which describes how many sampling points are taken per second. The Nyquist theorem defines the maximal frequency that can be reconstructed from a digitally stored signal. This is half the sampling frequency and is called the Nyquist frequency. A CD player, which reads samples stored at 44100 samples per second, can reproduce signal frequencies up to 44100/2 = 22050 Hz. The other quantization takes place on the amplitude. The sample values cannot take any conceivable value, because digital numbers can only be represented within a certain range and with a certain accuracy, which varies with the number of bits used to store the amplitude. The implications of this are an important factor in digital audio quality. Samples are usually represented as integers. If the input signal has a voltage corresponding to a value between 53 and 54, for example, then the analog-to-digital converter might round it off and assign a value of 53. In general, for each sample taken, the stored value differs slightly from the value of the original signal. This problem is known as quantization error or quantization noise. The amplitude quantization error depends on two factors: the signal itself, and the accuracy with which the signal is represented in digital form. If the quantization is very fine, for example 24 bit, which divides the amplitude range into 2^24 parts, the quantization error is very small. In tests with human listeners, psychoacoustic scientists have determined the maximal audible amplitude difference between two signals and calculated that 12 bit amplitude quantization is enough for this span. To sum up, quantization is done in two ways:

• Amplitude quantization, which divides the amplitude into discrete steps and introduces quantization errors in the amplitude compared with the original signal.

• Time quantization, which is determined by the sampling rate, that is, the number of amplitude values stored per second.
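The two kinds of quantization can be illustrated with a few lines of Python. This is only a sketch: the function names and the normalised amplitude range of -1.0 to 1.0 are assumptions made for the example and are not taken from Sonogram.

```python
def nyquist_frequency(sample_rate_hz):
    """Highest frequency that can be reconstructed at the given sampling rate."""
    return sample_rate_hz / 2


def quantize(value, bits):
    """Round an amplitude in [-1.0, 1.0] to the nearest signed integer code."""
    levels = 2 ** bits
    max_code = levels // 2 - 1      # e.g. 32767 for 16 bit
    min_code = -levels // 2         # e.g. -32768 for 16 bit
    code = round(value * max_code)
    return max(min_code, min(max_code, code))


def quantization_error(value, bits):
    """Difference between the original amplitude and its quantized version."""
    max_code = 2 ** bits // 2 - 1
    return value - quantize(value, bits) / max_code


print(nyquist_frequency(44100))            # Nyquist frequency of CD audio
print(abs(quantization_error(0.3, 8)))     # coarse quantization, larger error
print(abs(quantization_error(0.3, 16)))    # fine quantization, smaller error
```

The finer the amplitude quantization, the smaller the rounding error, which never exceeds half a quantization step.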


2.3 Signal Analysis and the Fourier Transformation

Classical signal analysis goes back to Jean-Baptiste Joseph Fourier (1768-1830). In his work "Analytical Theory of Heat" [Lau00] he described that every symbolic function can be represented as an infinite sum of sine functions. He developed the "Fourier Integral", which transforms such a function into a sum of sine and cosine functions. For real time-domain functions, e.g. sound waves, this sum represents the harmonic frequencies contained in the signal. For discrete data sets, the continuous Fourier integral becomes a discrete sum called the Fourier series. In 1965 Cooley and J. W. Tukey presented, in their paper "An Algorithm for the Machine Calculation of Complex Fourier Series" [Lau00], an algorithm that is very fast if the number of samples is a power of two.¹ This algorithm, named "Fast Fourier Transformation" (FFT), or one of its derivatives is used in many technologies, for instance mobile telephones, MP3, geological research and audio analysis devices. The FFT of a complete audio signal, e.g. 180 seconds at 44100 samples per second, generates a frequency vector with 44100/2 ∗ 180 = 3969000 frequency points² on the scale from 0 to 22050 Hz. The frequency resolution for this signal is 22050 Hz / 3969000 = 0.00555 Hz. This is a very good resolution, but it remains completely unclear at which time a frequency appears in the signal. Although the complete information of the signal is stored in this frequency vector, it is not possible to say at which time each frequency has its maxima or minima. In 1946 Dennis Gabor developed the "Windowed Fourier Transformation" (WFT), also named Gabor Transformation, to overcome this disadvantage.

¹ See [Bri95] for detailed information about the FFT.

² This idealisation holds for physical, real-valued signals without any imaginary part. In general the FFT computes complex vectors, but for real input vectors the resulting spectrum is mirrored at the middle, so the negative frequencies are not relevant. In this physical case the spectrum is the magnitude spectrum. Compare [Lau00].
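The relation between samples and frequency points can be sketched in Python. Note that the example uses the plain discrete Fourier sum with quadratic running time, not the fast Cooley-Tukey algorithm, because it is shorter and easier to follow; both produce the same spectrum. The function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive discrete Fourier transform; returns |X[k]| for k = 0 .. N/2 - 1."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def frequency_resolution(sample_rate_hz, num_samples):
    """Spacing in Hz between neighbouring frequency points of a real signal."""
    nyquist = sample_rate_hz / 2
    return nyquist / (num_samples // 2)


# A sine with exactly 5 cycles in 64 samples puts its energy into bin 5.
sine = [math.sin(2 * math.pi * 5 * t / 64) for t in range(64)]
spectrum = magnitude_spectrum(sine)
print(spectrum.index(max(spectrum)))

# Resolution for 180 seconds at 44100 samples per second (about 0.00555 Hz).
print(frequency_resolution(44100, 44100 * 180))
```

Only the first half of the bins is returned, in line with footnote 2: for real input the second half mirrors the first.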

2.4 Windowing the Signal

In signal analysis the term "windowing" means that a time-domain signal is split into pieces of fixed length. The time resolution is then precise, but limited to the window length. If the 180 second time-domain signal is split into 3000 pieces of 44100 ∗ 180/3000 = 2646 samples each, the time resolution is 180/3000 = 0.06 seconds = 60 milliseconds. The FFT algorithm computes 2646/2 = 1323 frequency points (compare Nyquist) and consequently yields a frequency resolution of 22050 Hz / 1323 = 16.67 Hz. When the time resolution gets finer, the frequency resolution gets coarser, and vice versa. This fact is the acoustic uncertainty relation, which means that both quantities cannot be determined exactly at the same time. The acoustic quantum is consequently: frequency resolution ∗ time resolution = constant.
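The arithmetic of this example can be reproduced with a small helper. It is a sketch with a hypothetical function name; it assumes that the signal length is an exact multiple of the number of windows.

```python
def window_resolution(sample_rate_hz, duration_s, num_windows):
    """Return (samples per window, time resolution in s, frequency resolution in Hz)."""
    samples_per_window = int(sample_rate_hz * duration_s) // num_windows
    time_resolution = duration_s / num_windows
    frequency_points = samples_per_window // 2      # only up to the Nyquist frequency
    frequency_resolution = (sample_rate_hz / 2) / frequency_points
    return samples_per_window, time_resolution, frequency_resolution


# Halving the number of windows doubles the window length: the time
# resolution gets coarser while the frequency resolution gets finer.
for windows in (1500, 3000):
    n, dt, df = window_resolution(44100, 180, windows)
    print(windows, n, round(dt * 1000), "ms", round(df, 2), "Hz", round(dt * df, 3))
```

For 3000 windows this yields the 2646 samples, 60 milliseconds and 16.67 Hz derived above, and in both cases the product of the two resolutions stays the same.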

2.4.1 Aliasing in Spectrum Analysis

Frequency components in the signal above the Nyquist frequency cannot be sampled correctly (see Figure 1). They are under-sampled and are mirrored at the upper end of the spectrum, the Nyquist frequency. This is the reason for anti-aliasing filters, which limit the bandwidth to the Nyquist frequency before the signal is sampled. Sonogram's option to open files with 8 kHz instead of the original sampling rate does not include an anti-aliasing filter, so signals with frequencies above 4 kHz generate aliasing effects.
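The mirroring can be expressed as a short calculation. The following sketch (hypothetical function name) returns the frequency at which an ideal sinusoid appears after sampling without an anti-aliasing filter.

```python
def aliased_frequency(signal_hz, sample_rate_hz):
    """Frequency at which a sampled sinusoid shows up in the spectrum."""
    folded = signal_hz % sample_rate_hz
    nyquist = sample_rate_hz / 2
    # Components above the Nyquist frequency are mirrored back below it.
    return sample_rate_hz - folded if folded > nyquist else folded


print(aliased_frequency(3000, 8000))   # below 4 kHz: unchanged
print(aliased_frequency(5000, 8000))   # above 4 kHz: mirrored to 3000 Hz
```

With the 8 kHz option, a 5 kHz component of the original file therefore cannot be distinguished from a 3 kHz component.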

2.4.2 Window Function

Cutting single windows out of the signal has one disadvantage: at the ends of a window the signal is generally not zero. For the frequency transformation this means that spurious frequency peaks appear at different places in the spectrum. To reduce this effect, each window is multiplied by a window function such as the Gaussian, triangle or Hamming function. The result of the FFT is then smoother. All standard (nonrectangular) windows are symmetric and have spectra whose shapes resemble the sinc function, that is sin(t)/t. This shape is characterized by a prominent main lobe and a series of side lobes on either side of the main lobe. The main properties of a window are the width of the main lobe, defined as the number of frequency bins it spans, and the highest side lobe level, which measures how many decibels the highest side lobe lies below the main lobe. The magnitude of the side lobes depends on the shape of the window function and on the window's duration.

Figure 1: Aliased sampling of a sine signal.
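Multiplying a window of samples by a window function, as described in section 2.4.2, might look like the following sketch. The Hamming coefficients 0.54 and 0.46 are the usual textbook values; the function names are hypothetical and the code is not taken from Sonogram.

```python
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def triangle(n):
    """Triangular window of length n (n >= 2), zero at both ends."""
    half = (n - 1) / 2
    return [1 - abs(i - half) / half for i in range(n)]


def apply_window(samples, window):
    """Multiply each sample by the corresponding window value."""
    return [s * w for s, w in zip(samples, window)]


# A constant signal takes on the shape of the window: the ends are
# attenuated, so the cut-out piece no longer starts and stops abruptly.
print(apply_window([1.0] * 5, triangle(5)))
print([round(w, 3) for w in hamming(9)])
```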

2.4.3 Choice of Window Function

From a practical standpoint, there is no "universal best" window choice for all sounds and all analysis purposes. For many musical applications involving sound transformation, a smooth bell-shaped curve such as the Kaiser or Gaussian window works reasonably well. The advanced user of the phase vocoder must decide which aspect of the analysis is most important in a specific application. For example, when it is known that the input signal is composed of a distinct number of sine components, a window with a narrow main lobe is recommended. For noisy signals, where the energy is not centered at individual frequencies, a window with a broad main lobe works best. Since the choice of window affects the plot, spectrum displays for scientific purposes should indicate the type of window used in the analysis. In the Sonogram adjustment dialog, the register card "Window Function" displays the side lobe attenuation of each window function.

2.4.4 Leakage in Spectrum Analysis

When a single sine wave is fed into an FFT, we would like the energy to be concentrated in the main lobe. Unfortunately, the process of windowing distorts the analysis and causes leakage, or "splatter", from the main lobe. Any frequency present in the windowed signal splatters across all frequency bins of the output spectrum. This can clearly be seen in the spectrum of a sine wave windowed by a rectangle. Here the window diffuses the single input frequency component into energy extending across the bandwidth of the analyser.

Figure 2: Leakage in the spectrum of a 500 Hz sine wave analysed without a window function.
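Leakage can be measured numerically. The sketch below analyses a sine wave that does not fit a whole number of cycles into the window, once with a rectangular window (no window function) and once with a Hamming window, and compares how much of the spectral energy lies away from the peak. The naive Fourier sum, the window length of 64 samples, the 10.5 cycles and the leakage measure itself are assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive discrete Fourier transform; returns |X[k]| for k = 0 .. N/2 - 1."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]


def leakage_fraction(samples, peak_halfwidth=3):
    """Share of spectral energy more than peak_halfwidth bins away from the strongest bin."""
    energy = [m * m for m in magnitude_spectrum(samples)]
    peak = energy.index(max(energy))
    outside = sum(e for k, e in enumerate(energy) if abs(k - peak) > peak_halfwidth)
    return outside / sum(energy)


N = 64
sine = [math.sin(2 * math.pi * 10.5 * t / N) for t in range(N)]
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * t / (N - 1)) for t in range(N)]

leak_rect = leakage_fraction(sine)
leak_hamming = leakage_fraction([s * w for s, w in zip(sine, hamming)])
print(leak_rect, leak_hamming)
```

The Hamming window keeps a clearly larger share of the energy close to the peak, at the price of a wider main lobe.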

2.4.5 Overlapping

Overlapping in signal analysis means that successive windows do not simply follow one after another. If the overlap is zero, each window follows the previous one with a step width of exactly one window length. To get more windows per time span, the step width is chosen smaller than the window length. The ratio of step width to window length determines the overlap. Overlapping cannot break the acoustic uncertainty relation; it only yields more transformations per time span.
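The effect of the step width can be sketched as follows. The example assumes that the overlap is given as the fraction of a window shared with its successor, that is 1 minus the ratio of step width to window length; how Sonogram parameterises the overlap internally is not stated in this document, and the function name is hypothetical.

```python
def frame_starts(num_samples, window_length, overlap):
    """Start indices of the analysis windows for an overlap fraction 0 <= overlap < 1."""
    step = max(1, round(window_length * (1 - overlap)))
    return list(range(0, num_samples - window_length + 1, step))


print(frame_starts(16, 4, 0.0))   # windows follow one another
print(frame_starts(16, 4, 0.5))   # step width is half a window length
```

With 50 percent overlap almost twice as many transformations are computed for the same signal, while each one still covers the same window length.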

2.5 Linear Prediction Method (LPC)

Linear prediction derives its obscure name from the fact that, in the spectrum analysis part of the system, output samples are "predicted" by a linear combination of filter parameters (coefficients) and previous samples. A prediction algorithm tries to find samples at positions outside a region where one already has samples. That is, any extrapolation of a set of samples is prediction. Inherent in prediction is the possibility of being wrong; thus prediction algorithms always include an error estimation. A simple predictor simply continues the slope of the difference between the last sample and the sample before it. This type of predictor can be made more sophisticated by taking more samples into account. It can also take into account the error or distance between the sample it predicts and the actual value of the signal, if this is known (and it is known in LPC). Since the predictor is looking at sums and differences of time-delayed samples, it can be viewed as a filter: a filter that describes the waveform it is currently processing. If we take regular snapshots of these filter coefficients over time, invert them, and then drive the resulting filter with a rich, wide-bandwidth sound, we should have a good approximation of the time-varying spectrum of the output signal. Thus a "side effect" of the prediction is to estimate the spectrum of the input signal: this is the important point. But spectrum estimation is only one stage of LPC analysis, the others being pitch, amplitude, and the voiced/unvoiced decision. The spectrum of a linearly estimated signal appears smoother than its FFT counterpart. In Sonogram the LPC algorithm³ first generates the linear polynomial from a configurable number of previous samples; with it the extrapolated time-domain signal is generated. The degree of the polynomial (number of coefficients) changes the precision of this vector in relation to the original. This extrapolated sample vector, which has the length of the FFT buffer, is then transformed. The result is the LPC spectrum.

³ See the online manual for further information and a picture of this LPC algorithm.
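How prediction coefficients can be obtained from a block of samples is sketched below. The example uses the textbook autocorrelation method with the Levinson-Durbin recursion; this is a common way to compute LPC coefficients and is an assumption here, not necessarily the algorithm implemented in Sonogram. All function names are hypothetical.

```python
def autocorrelation(samples, max_lag):
    """Autocorrelation r[0..max_lag] of a block of samples."""
    n = len(samples)
    return [sum(samples[t] * samples[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(samples, order):
    """Levinson-Durbin recursion.

    Returns (coeffs, error), where the next sample is predicted as
    coeffs[0] * x[t-1] + coeffs[1] * x[t-2] + ... and error is the
    remaining prediction error energy.
    """
    r = autocorrelation(samples, order)
    a = [0.0] * (order + 1)            # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:                 # silent input: nothing to predict
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        previous = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = previous[j] - k * previous[i - j]
        error *= 1 - k * k
    return a[1:], error


def predict_next(samples, coeffs):
    """Extrapolate one sample from the most recent len(coeffs) samples."""
    return sum(c * samples[-k] for k, c in enumerate(coeffs, start=1))


# A decaying exponential obeys x[t] = 0.9 * x[t-1], so a predictor of
# order 1 should recover a coefficient close to 0.9.
signal = [0.9 ** t for t in range(200)]
coeffs, err = lpc_coefficients(signal, 1)
print(coeffs, predict_next(signal[:10], coeffs))
```

Repeated calls to a function like predict_next, each time appending the predicted sample, would yield an extrapolated sample vector of the kind described above.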

2.6 Frequency-domain Pitch Detectors

Frequency-domain pitch detection methods dissect the input signal into the frequencies that make up the overall spectrum. The spectrum shows the strength of the various frequency components contained in the signal. The goal is to isolate the dominant frequency, or pitch, out of the spectrum. A typical frequency-domain approach analyzes successive segments of the input signal using the short-time Fourier transform (WFT). Frequency-domain pitch detectors seek peaks in the spectrum that correspond to prominent frequencies. After finding the peaks, the pitch detector must decide which frequencies are fundamentals (generally perceived as pitches) and which frequencies are merely harmonics or extraneous partials. A quick real-time frequency-domain pitch detector might simply select the strongest frequency as the pitch. A more sophisticated detector might look for harmonic relationships that imply a fundamental frequency. This fundamental may not be the strongest component, but it may be the most prominent perceived pitch due to the "reinforcement" of multiple harmonics. A problem with WFT-based pitch detectors is that the WFT divides the audio bandwidth into a set of equally spaced frequency channels or bins, where each channel is n Hz apart from its neighbors. Since human pitch perception is basically logarithmic, this means that low pitches may be tracked less accurately than high pitches. For example, an analyser with a frequency resolution of 20 Hz can resolve microtones in the register between 10 and 20 kHz, but offers less than semitone resolution below middle C. Accurate pitch resolution at the low end of the spectrum requires a large number of analysis channels.
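The resolution problem can be checked with a short calculation. In equal temperament a semitone corresponds to a frequency ratio of 2^(1/12), so the width of a semitone in Hz grows with the frequency. The sketch below also contains the "quick" detector that simply picks the strongest bin. The function names and the value of 261.63 Hz for middle C are assumptions made for this example.

```python
def semitone_width_hz(frequency_hz):
    """Distance in Hz from a frequency to the next higher equal-tempered semitone."""
    return frequency_hz * (2 ** (1 / 12) - 1)


def resolves_semitone(frequency_hz, bin_spacing_hz):
    """True if neighbouring semitones fall into different analysis bins."""
    return bin_spacing_hz <= semitone_width_hz(frequency_hz)


def strongest_bin_pitch(magnitudes, bin_spacing_hz):
    """Quick pitch estimate: the centre frequency of the strongest bin."""
    peak = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak * bin_spacing_hz


print(semitone_width_hz(261.63))       # roughly 15.6 Hz at middle C
print(resolves_semitone(261.63, 20))   # 20 Hz bins are too coarse here
print(resolves_semitone(10000, 20))    # but more than fine enough at 10 kHz
```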

2.7 Cepstral Analysis

A common frequency-domain pitch detection method in speech research is the cepstrum technique, which was first used in the analysis of speech. Cepstrum analysis has often been applied in conjunction with the technique of linear predictive coding (LPC). The term "cepstrum" was formed by reversing the first four letters of "spectrum". A simple way of describing the cepstrum is to say that it tends to separate a strong pitched component from the rest of the spectrum. This is a reasonable model of many vocal and instrumental sounds, whose spectra can be considered as the sum of an excitation (the original vibrational impulses, typically at the pitch of the sound) and the resonances (the filtered part of a sound created by the body of an instrument or the vocal tract). Technically, the cepstrum is the inverse Fourier transform of the log-magnitude Fourier spectrum, that is, of the logarithm (base 10) of the absolute value of the output of the discrete Fourier transform. The result of the cepstrum computation is a time sequence, like the input signal itself. If the input signal has a strong fundamental pitch period, this shows up as a peak in the cepstrum. By measuring the time distance from time 0 to the time of the peak, one finds the fundamental period of this pitch.

How does cepstrum analysis work for speech? The cepstrum serves to separate two superposed spectra: the glottal pulse (vocal cord) excitation and the vocal tract resonance. The excitation can be viewed as a sequence of quasi-periodic pulses. The Fourier transform of these pulses is a line spectrum where the lines are spaced at harmonics of the original frequency. The process of taking the log magnitude does not affect the general form of this spectrum. The inverse Fourier transform yields another quasi-periodic waveform of pulses. By contrast, the spectrum of the response of the vocal tract (acting as a filter) is a slowly varying function of frequency. The process of applying the log magnitude and the inverse Fourier transform yields a waveform that has significant amplitude only for a few samples, generally fewer than the fundamental pitch period. It can be shown that if the impulse response decays as a function of 1/n, then its cepstrum decays as 1/(n*n). Thus the cepstrum clusters the impulse response into a short burst at the beginning of the cepstrum wave, and it clusters the pitch into a series of peaks at the period of the fundamental frequency.

Cepstrum computation has many applications because it tries to sort out the impulse response from the excitation. In other words, the cepstrum tends to deconvolve the two convolved spectra. The log magnitude operations in the cepstrum procedure tend to cluster these two quasi-separate components of the spectrum. By advanced operations that we will not describe here, either of these features can be filtered out so that the cepstrum contains spectrum information associated with either timbre or pitch. Another application of the cepstrum is found in speech analysis/resynthesis. If there is no peak in the cepstrum, this is an indication that the sound being analyzed is unvoiced, that is, a breathy or consonant sound with no pitch like "f" or "s", as opposed to a voiced vowel like "ah".
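The cepstrum computation described in this section can be sketched in a few lines. The example uses a naive Fourier sum instead of an FFT, clamps very small magnitudes to avoid taking the logarithm of zero, and searches the peak only in the first half of the cepstrum; these choices and all function names are assumptions made for illustration.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (or its inverse) of a sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [sum(v * cmath.exp(sign * 2j * math.pi * k * t / n) for t, v in enumerate(values))
           for k in range(n)]
    return [x / n for x in out] if inverse else out


def real_cepstrum(samples, floor=1e-12):
    """Inverse transform of the log10 magnitude spectrum."""
    log_magnitude = [math.log10(max(abs(x), floor)) for x in dft(samples)]
    return [c.real for c in dft(log_magnitude, inverse=True)]


def fundamental_period(samples, min_lag=2):
    """Lag (in samples) of the strongest cepstrum peak in the first half."""
    cepstrum = real_cepstrum(samples)
    return max(range(min_lag, len(cepstrum) // 2), key=lambda n: cepstrum[n])


# A pulse followed by an attenuated repetition 8 samples later:
# the repetition period shows up as a peak at lag 8.
signal = [0.0] * 64
signal[0] = 1.0
signal[8] = 0.5
print(fundamental_period(signal))
```

Dividing the sampling rate by the returned lag gives the corresponding frequency in Hz.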

2.8 Analysis by Wavelets

The wavelet transform (WT) was originally developed by scientists at the University of Marseille for applications in physics and acoustics. A wavelet is a signal that forms a sinusoid with a smooth attack and decay. The term "wavelet" and its French equivalent ondelette are not new, however, having been used in early-twentieth-century physics to describe the packets of energy emitted by atomic processes. From a signal analysis perspective, the WT can be considered a special case of the constant Q filter paradigm. Wavelets inject the notion of a "short time" or "granular" representation into the constant Q filter model. The WT presents and manipulates sounds mapped onto a time-frequency grid or plane. Each rectangle in this grid represents its uncertainty product. The center of each rectangle is the mean time of occurrence and the spectral centroid. Such a grid is also implicit in constant Q methods (used with digital filters), but it is rarely used explicitly. In music analysis with the WT, one sets up the grid according to the goals of the analysis and distorts the grid according to the goals of the resynthesis. In wavelet theory, every signal can be expressed as a sum of wavelets having a precise starting time, duration, frequency, and initial phase. The prototypical wavelet for music has a Gaussian envelope, but other types of wavelet envelopes can be defined. Thus the wavelet is similar to the grain and to the windowed segments of short-time Fourier analysis (compare WFT). The peculiar aspect of the wavelet is that no matter what frequency it contains, it always encapsulates a constant number of cycles. This implies that the size (duration) of the wavelet window expands or shrinks according to the frequency being analysed. This stretching and shrinking is referred to as dilation in the literature and is usually specified as a factor of 1/frequency. The implication of the dilating window size is that the WT trades frequency resolution for time resolution at high frequencies, and trades time resolution for frequency resolution at low frequencies. Thus the WT can simultaneously detect precise onset times signaled by high-frequency transients and also resolve the low-frequency spectrum well.
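The constant number of cycles can be made concrete with a small generator for a sinusoid under a Gaussian envelope. The six cycles per wavelet and the choice of letting the envelope span plus and minus three standard deviations are assumptions made for this sketch; the wavelets actually offered by Sonogram are selected in the adjustment dialog.

```python
import math


def gaussian_wavelet(frequency_hz, sample_rate_hz, cycles=6):
    """Sinusoid with a Gaussian envelope that always spans `cycles` periods."""
    duration = cycles / frequency_hz                 # dilation by 1/frequency
    n = max(1, round(duration * sample_rate_hz))
    sigma = duration / 6                             # assumption: window = +/- 3 sigma
    centre = (n - 1) / 2
    wavelet = []
    for i in range(n):
        t = (i - centre) / sample_rate_hz
        envelope = math.exp(-0.5 * (t / sigma) ** 2)
        wavelet.append(envelope * math.cos(2 * math.pi * frequency_hz * t))
    return wavelet


# Ten times the frequency gives a window one tenth as long.
print(len(gaussian_wavelet(100, 8000)), len(gaussian_wavelet(1000, 8000)))
```

The short high-frequency wavelet localises onsets precisely in time, while the long low-frequency wavelet resolves neighbouring low frequencies.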

3 Concept

3.1 Opening, Calculating and Drawing

Generating a Sonogram happens in three steps: samples are read from the file, the frequency transformation is calculated with the adjusted settings, and a picture is drawn on the screen. These steps can be started manually from the adjustment dialog. The "Fast Fourier Transformation" is the standard transformation algorithm for frequency analysis of speech and music and is implemented internally. Alternatively, the "Linear Prediction Coefficients Method" (see chapter 2.5) can be chosen, which generates smoother spectra than the FFT (see chapter 2.3). The window length for these transformations can be selected manually, or the program can search for the best length for optimal display on the screen. The transformation results are stored in a matrix for repaint events. To speed up displaying, double buffering is used to redraw areas of the window when it is exposed: the picture is stored internally and used to refresh the needed zones without having to call the paint function.

3.2 Supported File Formats

With the help of the "Java Media Framework" library the program extracts the samples from multimedia files. The supported file formats depend on the JMF version and on the operating system, but most common audio and video⁴ files are supported. At the moment, the supported file formats for most operating systems are:

• AUDIO: Sun Audio (au), Wave Audio (wav), Apple Audio (aif, aiff, aifc).

• VIDEO⁵: AVI Video (avi), Macromedia Flash (swf, spl), GSM Video (gsm), MPEG Layer I, II, III (mpeg, mp2, mp3), Apple QuickTime (mov), IBM HotMedia (mvr).

4 User Interface

4.1 Main Window

The largest region of the main window shows the Sonogram, displayed in the selected color. The frequency scale can be switched between a linear and a logarithmic scale. In the lower section of the main window the signal is shown in the time domain, drawn in the same color as the sonogram itself. This can show either the amplitude or the energy of the time signal, depending on the option selected in the adjustment dialog. On the left side the frequency values are plotted beside the grid, if it is enabled. The information box in the lower right corner is divided into two areas. The upper area shows the peak frequency at the time point under the mouse (P), the actual frequency under the mouse (F), the time position under the mouse (T) and the amplitude relative to the maximal amplitude of the displayed sonogram (A). The lower area shows choices made in the program options, such as window function, logarithmic amplitude or window overlapping. The graph on the right side illustrates the current spectrum at the time position under the mouse. The toolbar with its violet buttons is placed below the menu bar.

⁴ Only the audio track of video files is analyzed.

⁵ Video files can be stored with different codecs, so the formats listed here are not necessarily supported in every case. See Sun's JMF supported file formats for details.

Figure 3: Main Window.

4.2 Adjustment Dialog

The adjustment dialog is partitioned into register cards.

• General:

– Spectrum while playing enables the animation of the right side spectrum.
– Monochrome single curve shows the right side spectrum in violet.
– Normalize single curve uses the full area of the right side spectrum.
– Smooth out sonogram over frequency axis smooths the frequency picture horizontally.
– Smooth out sonogram over time axis smooths the frequency picture vertically.
– Save hot list while ending stores the file history in the SonogramConfig.xml file.
– Save screen positions while ending stores the screen positions and window dimensions of all windows in the SonogramConfig.xml file.
– Save configuration while ending stores all adjustment information in the SonogramConfig.xml file.
– Loop playing repeats the playback.
– Open last file at start automatically opens the last file the next time Sonogram is started.
– Open last file with zoom opens the file with the zoom that was selected last time.
– Open with 8KHz sample rate converts all files to an 8 kHz sample rate. This may be useful for long files.
– Show grid in Sonogram.
– Show intensity shows the energy instead of the amplitude in the lower part of the main window.
– Search local peak highlights the highest peak in the displayed Sonogram.

• Colors: Colors for the main window can be selected and inverted.

• Window Functions: Every window function has a specific attenuation characteristic, which is shown in the upper area.

• Window Length: Automatic windowing searches for the best length for optimal display.

• Overlapping: Can be enabled or disabled.

• Logarithm Amplitude: Influences the coloring.

• Logarithm Frequency: Influences the scale of the logarithmic frequency graduation.

• Gain: The loudness.

• 3D Surface Plot: Point, line and solid surface are selectable. Length and width alter the proportions of the wave mountain. Color changes the background color, and the universe switch enables or disables the background picture. The data reduction slider is in beta phase.

• FFT: Window length and scale width for the Fourier window are adjusted here.


• LPC: These settings are explained in the LPC chapter (2.5) of this document and in the online help, where an overview picture is available. This register card changes the settings for the LPC single window and, if the transformation is set to LPC, the major settings for the LPC Sonogram. If the Sonogram is calculated with the LPC transformation, the relevant sliders are highlighted.

• Cepstrum: Calculation buffer and logarithmic quefrency (see chapter 2.7 on cepstral analysis).

• Waveform: The time span shown in the dialog window can be adjusted here.

• Transformation: The transformation used to generate the Sonogram is changed here.

• Pitch: If pitch detection is enabled, the pitch curve can be laid over the Sonogram. The frequency limitation restricts the pitch search to a frequency span from zero to the selected frequency. The smooth out function tries to smooth the pitch curve.

• Wavelet: The number of octaves and the wavelet for the transformation can be selected here.

Figure 4: Adjustment Dialog.

4.3 Toolbar and Menu bar

Some items in the menu bar are linked with functions in the adjustment dialog. Only the items that are not linked are listed here.

• File - open local Media File: Shows the file-open dialog.

• File - open local Media File by URL: Shows the net-file-open dialog. The fully qualified URL must be entered here.

• File - open Sonogram from SVG: If the Sonogram has been exported as an SVG file, it can be reopened with this dialog. After re-importing the Sonogram, the program asks whether to reopen the original file. If the original file is not available or the user cancels the reopening, not all features of Sonogram will be available, because the original signal is not present.

• File - Save Sonogram as SVG: The Sonogram can be exported to SVG. After the file is selected, colors and grid can be chosen. The pure SVG option removes all Sonogram-specific information from the SVG file, but re-importing without this information is not possible! Graphic size adjusts the edge length of the square that represents a single pixel.

• File - Export Sonogram as Bitmap: The Sonogram is stored as a bitmap.

• File - Print Sonogram: Prints the Sonogram without the single spectrum and the information box.

• File - Exit: Quits the program.

• Options - Fullscreen: The main window is resized to full screen size and the toolbar is hidden.

• Options - Default Settings: The first dialog asks whether to reset all settings to their defaults; the last dialog asks whether to resize all windows.

• Options - Select LookAndFeel: The general look of all dialogs can be changed.

• Help - System Information: Displays detailed information about the operating system and the Java configuration.

• File - The file history: Sonogram stores the filenames of all files that were opened successfully. Files that are no longer available are marked with a red line, and net files have a green globe in their icon.

The icons in the toolbar correspond to the respective items in the menu bar.

4.4 Analyse Windows

The analysis windows "Wave-Form", "Fourier", "Linear Prediction", "Cepstrum" and "Wavelet" are separate windows that work independently of the Sonogram picture. All settings for these windows are made on the corresponding register cards of the adjustment dialog. The windows can be resized.

4.4.1 Waveform Display

The waveform window shows the signal as an oscillogram in the time domain. The amplitude of the signal is the absolute amplitude from the file. The displayed signal follows the mouse over the Sonogram and has the length selected in the adjustment dialog. If the time window after the mouse position reaches the end of the signal, a red message is displayed in the upper right corner.

Figure 5: Waveform Window.

4.4.2 Fourier Display

The Fourier window shows the fast Fourier transform of the signal under the mouse, using the window length selected in the general adjustment dialog. All parameters for this transformation are adjusted on the corresponding register card. The frequency fraction slider scales the x axis of the frequency plot. If the time window after the mouse position reaches the end of the signal, a red message is displayed in the upper right corner.

Figure 6: Fourier Window.

4.4.3 LPC Display

The linear prediction coefficients window shows the spectrum of the linearly extrapolated signal under the mouse. See the online help and the LPC chapter for detailed information about this transformation. All parameters can be configured on the corresponding register card of the adjustment dialog. If the time window after the mouse position reaches the end of the signal, a red message is displayed in the upper right corner.

Figure 7: LPC Window.

4.4.4 Wavelet Transformation

The wavelet transformation window shows the two-dimensional time-octave view. The time window for the transformation begins under the mouse and has the span corresponding to the number of octaves selected in the adjustment dialog. More precisely, the number of samples is two raised to the power of the number of octaves:

$$\text{window time span} = \frac{2^{\text{octaves}}}{\text{sample rate}}$$

Figure 8: Wavelet Window.
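The time span relation for the wavelet window can be checked with a short calculation. The following Python sketch is only an illustration; the function name and the sample rate of 8000 Hz are assumptions chosen for the example and are not taken from the Sonogram source code.

```python
def wavelet_window_span(octaves: int, sample_rate_hz: float) -> float:
    """Return the wavelet window time span in seconds.

    The window holds 2**octaves samples, so its duration is that
    sample count divided by the sample rate.
    """
    if octaves < 0 or sample_rate_hz <= 0:
        raise ValueError("octaves must be >= 0 and sample_rate_hz must be > 0")
    return (2 ** octaves) / sample_rate_hz


if __name__ == "__main__":
    # Example values only: three octave counts at an 8 kHz sample rate.
    for octaves in (8, 10, 12):
        span = wavelet_window_span(octaves, 8000.0)
        print(f"{octaves} octaves at 8000 Hz: {span * 1000:.1f} ms")
```

With 10 octaves at 8000 Hz the window holds 1024 samples, which corresponds to 128 ms. Each additional octave doubles the time span.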

4.4.5 Cepstrum Display

The cepstrum window shows the cepstral transform of the signal at the position under the mouse. See the chapter on cepstral analysis for more details on the selected algorithm.

Figure 9: Cepstrum Window.

4.5 Information Dialog

The information dialog (opened with the corresponding button on the toolbar or the menu item) gives detailed information about the signal, the file format and all internally selected parameters. All information is updated immediately when the corresponding values change.

Figure 10: Information Dialog.

4.6 3D Surface Plot

If Java3D is installed on the local machine, this feature is enabled. All options are set in the adjustment dialog. If the Sonogram has more than 20000 points, a dialog asks whether you want to generate the plot. Plots with more than 20000 points are very slow on video cards with less than 64 MB of RAM. The generated plot can be moved and resized with the mouse. The right mouse button rotates the surface plot and the left button moves it in space. With the middle button (both buttons on Windows systems) the plot can be zoomed in and out.

4.7 File Open Dialog

The file open dialog in Sonogram offers more features than the standard Java file chooser dialog. The included find utility permits searching for files and shows the directory history. The file types supported by JMF 2.1 are listed in the online help and in chapter 3.2.

4.8 Online-Help

The info button on the toolbar opens the online help for Sonogram. It shows additional information about the included features. If the local computer is connected to the internet, the blue underlined links can be used. The home button shows the original help page again. The online help displays the index.html file in the sonogram folder.

Figure 11: Surface Plot.

4.9 Keyboard Shortcuts

Key         Function
Ctrl + o    open file from disk
Ctrl + u    open file from URL
Ctrl + y    re-import SVG spectrum file
Ctrl + v    save Sonogram as SVG file
Ctrl + e    export Sonogram as bitmap
Ctrl + n    open file n from history list
Ctrl + q    quit program
Ctrl + a    open adjustment dialog
Ctrl + f    select "Look and Feel"
Ctrl + n    enable/disable inverse colors
Ctrl + g    enable/disable grid in Sonogram
Ctrl + k    enable/disable logarithm amplitude
Ctrl + l    enable/disable logarithm frequency
F11         fullscreen
Ctrl + i    info dialog
Ctrl + p    play selected signal
Ctrl + r    rewind playing
Ctrl + s    stop playing
Ctrl + 3    show 3D surface plot
Ctrl + o    show LPC window
Ctrl + c    show cepstrum window
Ctrl + l    show waveform window
Ctrl + H    show wavelet window
Ctrl + z    zoom in marked area
Ctrl + b    zoom back to full signal
Ctrl + v    zoom back to previous zoom
F1          open online-help

5 Working with Sonogram

5.1 Opening a File

Opening a multimedia file is split into two phases. First the samples are read from the file, and then the Sonogram is calculated with the options set in the configuration dialog. In each phase a progress bar appears at the lower edge of the main window. A mouse click on the progress bar while it advances stops the complete opening procedure immediately. The generated Sonogram then appears in the main window. Moving the mouse over the Sonogram picture updates the information box in the lower right corner. The info dialog (activated with the scroll icon on the toolbar) shows detailed information. If the mouse is pressed on the Sonogram picture, the yellow slider can select a specific range in the time domain. The zoom-in button on the toolbar zooms into this selected range. All zooms are stored in an internal list so that previous zooms can be restored later. Zoom back to full signal shows the Sonogram as it was generated when the file was first opened. The play button plays only the signal selected with the yellow slider. With the F11 key the main window can be maximized to full screen, and the toolbar is hidden.
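The zoom behaviour described in this section (zoom in, back to the previous zoom, back to the full signal) can be modelled as a simple stack of time ranges. The following Python sketch is a hypothetical illustration of that idea; the class and method names are assumptions and do not reflect how Sonogram implements its internal list.

```python
class ZoomHistory:
    """Stack of zoom ranges (start, end) in seconds; a hypothetical helper."""

    def __init__(self, full_range):
        self._full = full_range      # range of the complete signal
        self._stack = [full_range]   # the last entry is the current view

    @property
    def current(self):
        return self._stack[-1]

    def zoom_in(self, start, end):
        """Zoom into a selection that lies inside the current view."""
        lo, hi = self.current
        if not (lo <= start < end <= hi):
            raise ValueError("selection must lie inside the current view")
        self._stack.append((start, end))

    def zoom_previous(self):
        """Return to the previous zoom; the full view is never removed."""
        if len(self._stack) > 1:
            self._stack.pop()
        return self.current

    def zoom_full(self):
        """Discard all zooms and return to the full signal."""
        self._stack = [self._full]
        return self.current


if __name__ == "__main__":
    history = ZoomHistory((0.0, 10.0))
    history.zoom_in(2.0, 8.0)
    history.zoom_in(3.0, 4.0)
    print(history.zoom_previous())  # back to the first zoom
    print(history.zoom_full())      # back to the full signal
```

A stack fits this use because zooms are undone in the reverse order in which they were made.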

5.2 Configuring the Sonogram with the Adjustment Dialog

Turning off "automatically searching best window length" lets you select the window length manually. After the selection, the calculate button must be pressed again to update the spectrum. Most changes that affect the main window must be applied manually with the recalculate button or the redraw button.

5.3 Analysing Windows

On the toolbar four analysis windows can be activated.

• Fast Fourier window.

• Cepstrum window.

• Linear Prediction window.

• Waveform window.

All four windows are updated immediately when the mouse is moved over the Sonogram in the main window. The time point for the computations is the time point under the mouse. If the time window selected for the analysis is longer than the remaining signal from the mouse position to the end of the signal, a red warning message appears in the upper right corner of the analysis window. Each window is closed with its own close button.

5.4 3D Surface Plot

Pressing the 3D button on the toolbar starts the generation of the 3D surface plot. If the Sonogram has more than 20000 analysis points, a warning message is shown and asks whether to continue generating the surface plot. Plots with more than 20000 elements are very slow in interactive movement. With the right mouse button the plot can be rotated, and with the left button it can be moved. Zooming is done with the middle mouse button (or, on Windows systems, by pressing the right and left buttons together). Three plot types are available: point cloud, line grid and solid surface. The background of the plot can be either a single color or a picture. The dimensions of the ground area can be modified with the sliders for time and frequency. The data reduction feature is in beta phase and does not work correctly for all types of plots.
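To estimate in advance whether a plot will trigger the warning, the number of plot elements can be approximated. The following Python sketch assumes that the element count equals the number of time frames multiplied by the number of frequency values per frame; this counting rule and all names are assumptions made for illustration, since the text only states the limit of 20000.

```python
POINT_LIMIT = 20000  # threshold named in the text


def surface_plot_points(time_frames: int, frequency_bins: int) -> int:
    """Approximate element count of the surface plot (assumed: frames x bins)."""
    if time_frames < 0 or frequency_bins < 0:
        raise ValueError("counts must not be negative")
    return time_frames * frequency_bins


def needs_confirmation(time_frames: int, frequency_bins: int) -> bool:
    """True if the plot exceeds the limit and a warning would be expected."""
    return surface_plot_points(time_frames, frequency_bins) > POINT_LIMIT


if __name__ == "__main__":
    # Example values only: 200 frames with 128 frequency values each.
    print(surface_plot_points(200, 128), needs_confirmation(200, 128))
```

Under this assumption, zooming into a shorter time range before generating the plot reduces the number of frames and therefore the number of elements.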

5.5 Working with SVG

Scalable Vector Graphics (SVG) is an XML standard for vector graphics defined by the World Wide Web Consortium (W3C). In contrast to bitmap formats, SVG files store the data in a human-readable format, like HTML. In the export dialog the colors and the size of a single spectrum point can be selected. The export can be done in two different ways: as pure SVG as defined by the W3C, or with additional information about the plot such as the filename, the adjustment settings and the sample information. Pure SVG files cannot be re-imported into the program! To re-import a Sonogram SVG file, simply open the file with the SVG file chooser. The time-domain information for the amplitude representation at the bottom of the main window is reconstructed from the spectrum.
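To illustrate why an SVG export is human-readable, the following Python sketch writes a tiny spectrum grid as SVG, with one square per spectrum point. It is a minimal illustration under stated assumptions: the function name, the grey-scale mapping and the element layout are invented here and do not reproduce the file structure that Sonogram writes.

```python
import xml.etree.ElementTree as ET


def spectrum_to_svg(spectrum, point_size=4):
    """Build an SVG string with one square per spectrum point.

    spectrum:   list of time frames, each a list of magnitudes in 0.0..1.0,
                ordered from low to high frequency (hypothetical layout).
    point_size: edge length of one square in SVG user units.
    """
    frames = len(spectrum)
    bins = len(spectrum[0]) if frames else 0
    svg = ET.Element(
        "svg",
        xmlns="http://www.w3.org/2000/svg",
        width=str(frames * point_size),
        height=str(bins * point_size),
    )
    for t, frame in enumerate(spectrum):
        for f, magnitude in enumerate(frame):
            # Clamp to 0..1 and map strong magnitudes to dark grey values.
            level = min(max(magnitude, 0.0), 1.0)
            grey = round(255 * (1.0 - level))
            ET.SubElement(
                svg,
                "rect",
                x=str(t * point_size),
                # Low frequencies are drawn at the bottom of the picture.
                y=str((bins - 1 - f) * point_size),
                width=str(point_size),
                height=str(point_size),
                fill=f"rgb({grey},{grey},{grey})",
            )
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    # Two time frames with two frequency values each (example data).
    print(spectrum_to_svg([[0.0, 1.0], [0.5, 0.25]], point_size=2))
```

The output is plain XML text that can be opened in a text editor or a browser. A pure SVG file of this kind carries only the drawing, which is consistent with the statement that pure SVG files cannot be re-imported.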

6 Future Work

Wavelet transformation seems to be useful for acoustical octave analysis and will be included in the near future. A video window that shows the current video frame in parallel to the played or scrolled position seems easy to realize with the Java Media Framework. Sonogram should get a speech analysis function that works with "Hidden Markov Models". At this point the speech analysis function is still a long way from the current state of development. Java source code for the HMM algorithm is available.

A Musical Frequencies

In the C major gamut the musical frequencies are divided into twelve tones (C C# D D# E F F# G G# A A# H). The reference point is $f_a = 440\,\mathrm{Hz}$. Other frequencies are calculated with $f_x = 2^{\frac{tone}{12}} f_a$, where $tone$ is the difference in semitones between the sought tone and the reference tone A. Each successive octave has double the frequency.
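The frequency relation can be evaluated directly. The following Python sketch computes tone frequencies relative to the reference tone A at 440 Hz, using the tone names listed above. The function name and the octave_offset parameter are assumptions added for illustration.

```python
TONES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "H"]
REFERENCE_HZ = 440.0  # f_a, the reference tone A


def tone_frequency(name: str, octave_offset: int = 0) -> float:
    """Return the frequency of a tone in Hz.

    octave_offset shifts the result by whole octaves (assumed convention:
    0 is the octave that contains the reference tone A).
    """
    # Signed distance in semitones between the sought tone and A.
    steps = TONES.index(name) - TONES.index("A") + 12 * octave_offset
    return REFERENCE_HZ * 2 ** (steps / 12)


if __name__ == "__main__":
    for name in TONES:
        print(f"{name:2s} {tone_frequency(name):8.2f} Hz")
```

For example, C lies nine semitones below A, so its frequency is 440 Hz multiplied by 2 to the power of -9/12, which is about 261.63 Hz. An offset of one octave doubles the frequency, so the A of the next octave lies at 880 Hz.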

B About this Document

This document was built with LaTeX for Linux and Mac OS X, teTeX version 1.0.

References

[Roa96] Curtis Roads, "The Computer Music Tutorial", MIT Press, 1996.

[Bri95] E. Oran Brigham, "Schnelle Fourier Transformation", R. Oldenbourg Verlag, 1995.

[Lau00] Christoph Lauer, "Akkustische Signalanalyse/Synthese mit Wavelets im vergleich zur klassischen Fourieranalyse", 2000.

[Lau02] Christoph Lauer, online documentation for Sonogram, 2002.
