Health Diagnosis through Voice Analysis

Sahil Loomba & Shamiek Mangipudi, Department of Electronics and Electrical Engineering, IIT Guwahati

Deepest appreciation to Dr. Kannan Karthik for his motivation and supervision

BACKGROUND

• Significant research has already used human speech to extract information about physiological, psychological and psychiatric attributes of the speaker. So far, the focus has been on a narrow range of illnesses, mainly speech pathologies.

• Each human sound reflects a particular attribute of the individual's health. Human speech carries information about the vocal tract system and the excitation source. This information helps diagnose speech disorders such as language disorders, articulation disorders, phonemic disorders, stuttering and larynx disorders.

• After studying human hum sounds and music, we concluded that hum sounds carry a large amount of information about the health of an individual affected by the illnesses we chose. Hum sounds are exhaled entirely through the nose. The nose is the organ most affected by these illnesses, so humming should show a significant difference between healthy and sick individuals. The hum also reflects an individual's lung capacity and breathing capabilities, which are expected to diminish during illness.

PURPOSE

• In this project, we built an automated system that draws conclusions about the health status of an individual. To stay within the scope of the project, we focused on common illnesses such as cold and cough, for which data collection was within the reach of our resources. The resulting system distinguishes a healthy individual from one suffering from a cold or cough.

• This work should find significant applications in health diagnosis. The algorithm could be implemented in mobile and cloud-based applications that let the user speak into a microphone in real time. Such applications would be an economical and user-friendly tool to help the medical community diagnose a patient's health status. They could also serve as a remote, automatic screening method in telemedicine environments. Finally, they could be used as a medico-legal documentation tool that quantifies the success of a surgical intervention.

METHODS

DATA COLLECTION

One hundred samples were collected in each of the healthy and sick groups. The samples are hums of the following tunes: Jana Gana Mana by Rabindranath Tagore, Für Elise by Beethoven and Jingle Bells, a Christmas carol.

The recordings are mono-channel at 16 kHz with 16-bit floating-point precision. They were made in a natural environment using the Realtek High Definition Audio microphone of a Lenovo IdeaPad Z510 laptop. The average sample length is 21.4 seconds, with a standard deviation of 4.1 seconds.

DATA ANALYSIS

A high-frequency noise component was observed in the pauses of sick samples. This is an important cue that carries valuable information about the health of the individual.

PRE-PROCESSING OF HUM DATA

An end-point detection algorithm extracts the part of the signal where hum is present. The algorithm finds the start and end of the hum in a given waveform, so that this region can be separated out and analyzed. Our implementation is based on short-time magnitude analysis of the hum. It returns the entire region of the input signal in which hum exists.

FEATURE EXTRACTION

We extracted the following types of features from the hum data and compared the performance of our model with different combinations of them: (i) Mel Frequency Cepstral Coefficients (MFCC) (ii) Delta (differential) coefficients (iii) Delta-Delta (acceleration) coefficients (iv) Chroma features (semitones of the musical octave).

CLASSIFICATION METHODS

(i) K-means and CrispKmeans (ii) Support Vector Machine (SVM)
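The sketch below shows the end-point detection and the delta features described above in Python with NumPy. It is a minimal sketch under stated assumptions, not the authors' MATLAB code. The following values are all assumed:

- 25 ms frames with a 10 ms hop (400 and 160 samples at 16 kHz).
- A frame counts as hum when its short-time magnitude exceeds 10% of the largest frame magnitude (`ratio=0.1`).
- The deltas use a regression window of `N=2`.
- Real MFCC matrices come from a separate feature extractor that is not shown. Here a placeholder matrix stands in for them.

```python
import numpy as np

def short_time_magnitude(x, frame_len=400, hop=160):
    """Sum of absolute sample values per frame (assumed 25 ms frames, 10 ms hop at 16 kHz)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.array([np.abs(x[i * hop:i * hop + frame_len]).sum() for i in range(n_frames)])

def hum_endpoints(x, frame_len=400, hop=160, ratio=0.1):
    """Return (start, end) sample indices covering every frame whose magnitude
    exceeds `ratio` times the largest frame magnitude, or None if none does.
    Assumption: this fixed-ratio threshold rule is chosen for the sketch."""
    mag = short_time_magnitude(x, frame_len, hop)
    active = np.flatnonzero(mag > ratio * mag.max())
    if active.size == 0:
        return None
    start = int(active[0]) * hop
    end = min(len(x), int(active[-1]) * hop + frame_len)
    return start, end

def deltas(feat, N=2):
    """Regression-based delta coefficients along time for a (n_coeffs, n_frames)
    matrix; the first and last frames are repeated as padding."""
    n_frames = feat.shape[1]
    padded = np.pad(feat, ((0, 0), (N, N)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros(feat.shape, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[:, N + n:N + n + n_frames] - padded[:, N - n:N - n + n_frames])
    return out / denom

# Synthetic hum at 16 kHz: 0.5 s silence, 1 s of a 220 Hz tone, 0.5 s silence.
sr = 16000
t = np.arange(sr) / sr
x = np.concatenate([np.zeros(sr // 2), np.sin(2 * np.pi * 220 * t), np.zeros(sr // 2)])
start, end = hum_endpoints(x)
hum = x[start:end]  # region passed on to feature extraction

# Placeholder "MFCC" matrix (13 coefficients x 20 frames, a simple ramp);
# real MFCCs would come from a feature extractor not shown here.
mfcc = np.tile(np.arange(20, dtype=float), (13, 1))
d = deltas(mfcc)    # Delta (differential) coefficients
dd = deltas(d)      # Delta-Delta (acceleration) coefficients
features = np.vstack([mfcc, d, dd])  # "MFCC + Del + Del-Del" stack, shape (39, 20)
```

With these settings the detected region begins within one frame of the tone onset. Assuming 13 MFCCs per frame, stacking the three matrices gives 39-dimensional vectors for the "MFCC + Del + Del-Del" combination.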

CONCLUSIONS

• Music-based features, i.e. chroma features, are best suited for information extraction. They reach an average accuracy of 80.89% with the k-means classification model, outperforming the MFCC-based model (72.52%).

• Among the speech-based features, i.e. MFCC, Delta and Delta-Delta, the combination of all three gives the best performance in the SVM-based model (77.39%).

• The SVM-based model is the fastest of the models but less accurate.

• The K-means-based model is the most accurate but computationally more intensive.

RESULTS

Comparison of model accuracies

Sick sample observed in WaveSurfer

Chroma features on a chromagram

IMPLEMENTATION

K-means: Each data group was randomly divided into a training set and a testing set in a 4:1 ratio, repeated over several iterations. We trained separate templates for the healthy and sick profiles and used the testing set to calculate the accuracy of the resulting model. This procedure was used for the chroma-feature-based model.
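Below is a minimal Python/NumPy sketch of the K-means template approach. It assumes details the poster does not state:

- Each class template is a codebook of `k` centroids learned from that class's training frames.
- A test recording goes to the class whose codebook gives the lower mean quantization distortion, i.e. the average squared distance from each frame to its nearest centroid.
- The codebook size `k=4`, the iteration count and the helper names (`run_trial`, `distortion`) are hypothetical.
- Random 12-dimensional frames stand in for chroma vectors.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns a (k, dim) centroid array."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster becomes empty
                centroids[j] = members.mean(axis=0)
    return centroids

def distortion(frames, centroids):
    """Mean squared distance from each frame to its nearest centroid."""
    dist = ((frames[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.min(axis=1).mean()

def classify(frames, templates):
    """Assumption: pick the class whose codebook quantizes the recording best."""
    return min(templates, key=lambda label: distortion(frames, templates[label]))

def run_trial(data, k=4, train_frac=0.8, seed=0):
    """One random 4:1 train/test split per class; returns test accuracy.
    `data` maps a class label to a list of (n_frames, dim) arrays, one per recording."""
    rng = np.random.default_rng(seed)
    templates, tests = {}, []
    for label, recordings in data.items():
        order = rng.permutation(len(recordings))
        n_train = int(round(train_frac * len(recordings)))
        train = [recordings[i] for i in order[:n_train]]
        tests += [(recordings[i], label) for i in order[n_train:]]
        templates[label] = kmeans(np.vstack(train), k, seed=seed)
    correct = sum(classify(frames, templates) == label for frames, label in tests)
    return correct / len(tests)

# Synthetic stand-in for chroma frames: 20 recordings per class, 50 frames each.
rng = np.random.default_rng(1)
data = {
    "healthy": [rng.normal(0.0, 1.0, size=(50, 12)) for _ in range(20)],
    "sick": [rng.normal(3.0, 1.0, size=(50, 12)) for _ in range(20)],
}
# Repeating the random split and averaging mirrors "over several iterations".
accuracies = [run_trial(data, seed=s) for s in range(5)]
mean_accuracy = sum(accuracies) / len(accuracies)
```

Scoring whole recordings by average distortion means every frame of a test hum contributes to the decision. A single frame therefore cannot decide the outcome on its own.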

CrispKmeans: Because an extremely large number of MFCC feature vectors was extracted, we ran k-means batch-wise over randomly selected batches when modelling each of the healthy and sick templates. The testing set was used to calculate the accuracy of the resulting model.
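The poster does not spell out how CrispKmeans combines the batches. One standard way to run k-means batch-wise is mini-batch k-means, sketched below in Python/NumPy. This is a swapped-in technique, not the authors' algorithm. Each random batch is first assigned to the current centroids. Each centroid then moves towards its assigned frames with a learning rate of 1/count, so it tracks the running mean of all frames assigned to it. Farthest-first initialization, the batch size of 256 and the 200 batches are assumptions. The synthetic 13-dimensional frames stand in for MFCC vectors.

```python
import numpy as np

def farthest_first_init(X, k, rng):
    """Assumption: random first centroid, then repeatedly the frame farthest from those chosen."""
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[dist.argmax()])
    return np.array(centroids, dtype=float)

def minibatch_kmeans(X, k, batch_size=256, n_batches=200, seed=0):
    """Mini-batch k-means over randomly drawn batches of the rows of X."""
    rng = np.random.default_rng(seed)
    centroids = farthest_first_init(X, k, rng)
    counts = np.zeros(k)
    for _ in range(n_batches):
        batch = X[rng.choice(len(X), size=min(batch_size, len(X)), replace=False)]
        # Assign the whole batch using the centroids as they were before this batch.
        dist = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for x, j in zip(batch, labels):
            counts[j] += 1
            centroids[j] += (x - centroids[j]) / counts[j]  # running-mean update
    return centroids

# Synthetic stand-in for one class's MFCC frames: two well-separated clusters.
rng = np.random.default_rng(2)
frames = np.vstack([rng.normal(0.0, 1.0, size=(2000, 13)),
                    rng.normal(10.0, 1.0, size=(2000, 13))])
template = minibatch_kmeans(frames, k=2)
template = template[np.argsort(template[:, 0])]  # order centroids by first coefficient
```

In the poster's setup this would run once per class, on the healthy and the sick training frames separately, giving one template for each. Only one batch of distances is held in memory at a time. That is the point of batching when the number of feature vectors is very large.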

Support Vector Machine: All feature vectors were labelled and pooled together for both classes. The SVM was trained with several different training-to-testing ratios. A linear kernel was used because the number of feature vectors was much larger than the dimension of each feature vector.
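Below is a minimal Python/NumPy sketch of the pooled linear SVM. It uses Pegasos (stochastic sub-gradient descent on the primal hinge-loss objective) as a compact stand-in. The poster does not say which SVM solver its MATLAB code used. The following are assumptions:

- the regularization `lam` and the iteration count;
- folding the bias into the weights through a constant feature;
- the ratios 0.5, 0.7 and 0.8;
- synthetic 39-dimensional frames standing in for MFCC + Del + Del-Del vectors.

```python
import numpy as np

def add_bias(X):
    """Append a constant 1 so the last weight acts as a (regularized) bias."""
    return np.hstack([X, np.ones((len(X), 1))])

def train_linear_svm(X, y, lam=1e-3, n_iter=20000, seed=0):
    """Pegasos on lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)); labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(Xb))
        eta = 1.0 / (lam * t)
        violates = y[i] * (Xb[i] @ w) < 1  # margin check uses w before the update
        w *= 1.0 - eta * lam
        if violates:
            w += eta * y[i] * Xb[i]
    return w

def predict(w, X):
    return np.where(add_bias(X) @ w >= 0, 1, -1)

def split_pooled(X, y, train_ratio, seed=0):
    """Shuffle the pooled, labelled frames and split them at the given ratio."""
    order = np.random.default_rng(seed).permutation(len(X))
    n_train = int(train_ratio * len(X))
    tr, te = order[:n_train], order[n_train:]
    return X[tr], y[tr], X[te], y[te]

# Synthetic pooled frames; +1 = sick, -1 = healthy (label coding assumed for the sketch).
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(1.0, 1.0, size=(1500, 39)),
               rng.normal(-1.0, 1.0, size=(1500, 39))])
y = np.concatenate([np.ones(1500), -np.ones(1500)])

results = {}
for ratio in (0.5, 0.7, 0.8):  # assumed training ratios
    X_tr, y_tr, X_te, y_te = split_pooled(X, y, ratio)
    w = train_linear_svm(X_tr, y_tr)
    results[ratio] = (predict(w, X_te) == y_te).mean()
```

Each Pegasos step touches a single frame and costs time proportional to the feature dimension. Training therefore stays cheap even when the pool of labelled frames is very large.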

K-means model accuracies (%)
Chroma: 80.89
MFCC: 72.52

SVM model accuracies (%)
Chroma: 50.42
MFCC: 76.46
MFCC + Del: 76.56
MFCC + Del + Del-Del: 77.39

ONLINE ACCESS

All our data sets, MATLAB code and reference materials are available on our website:

http://www.ai4hd.strikingly.com