
DOI: 10.7763/IPEDR.2014.V70.16

Comparing the Strengths and Difficulties Questionnaire (SDQ) and Behavior Consideration Assessment Using SVM Techniques

Sakchai Tangwannawit and Montean Rattanasiriwongwut

Department of Information Technology Management, Information Technology, KMUTNB, Thailand

Abstract. The purpose of this research was to compare the Strengths and Difficulties Questionnaire (SDQ) with behavior consideration assessment using the Support Vector Machine (SVM) algorithm. The research process was divided into three steps. First, screening behavior was compared using a random sample drawn from the population, comprising 17 advisers, 304 parents and 304 students. The data were collected on 304 vocational students of the academic year 2012 at Singburi Vocational College by means of the SDQ. Second, classification accuracy was measured using the SVM algorithm, which reached 90.36%. Finally, the hypothesis was tested with one-way ANOVA (F-test) among the three groups at a 95% confidence level, and the results were compared statistically via Scheffé's method.

Keywords: Strengths and Difficulties Questionnaire (SDQ), Support Vector Machine Algorithm (SVM).

1. Introduction

The Strengths and Difficulties Questionnaire (SDQ) is designed to be used as a screening tool in clinical assessment, to assess treatment outcomes, and as a research tool [1]. At present, no survey instruments exist that have been specifically designed to provide information on the behavioral characteristics of children, and existing standardized instruments that were developed for, and validated on, general populations may not be appropriate for this purpose. The SDQ asks about 25 attributes, some positive and some negative. The items, which were selected on the basis of contemporary diagnostic criteria as well as factor analysis, are divided between five scales of five items each, generating scores for emotional symptoms, hyperactivity, conduct problems, peer problems and pro-social behaviors. The same questionnaire can be completed by parents, students and the students' advisers.

The SVM technique was applied to the assessment of the SDQ [2]. The SVM model for classification focuses on improving accuracy. The SVM accuracy was found to be the best and was used to develop the recommender system for counseling students. The ANOVA F-test is relatively resistant to violations of its assumptions. Similarly, the multiple comparison tests performed after rejection of the null hypothesis in the one-way ANOVA are resistant to those violations. Several powerful multiple comparison procedures are available, such as Tukey’s HSD, Scheffé’s and Bonferroni methods; see e.g. Lapin [3].

The researchers applied SVM to the classification of the SDQ and then compared the results by one-way ANOVA. The remainder of this paper is organized as follows. Section 2 describes the methodology, Section 3 presents the analysis and results, and Section 4 gives the conclusions of this study.

2. Methodology

2.1. Data and study population

The population in this research includes advisers, parents, and vocational students of the academic year 2012 at Singburi Vocational College in Thailand.

Corresponding author. Tel.: +662555-2000 ext 2717.

E-mail address: [email protected].

The sample was divided into three groups: 304 students, 17 advisers and 304 parents. The students were sampled at random from those enrolled in certificate levels 2 and 3. Data were collected from the students, their parents and their advisers for each of the 304 students, giving 912 records (Fig. 1).

Fig. 1: Data collected from students, parents and advisers.

2.2. Instruments

Data were collected from vocational students using the extended version of the Strengths and Difficulties Questionnaire (SDQ) to assess children’s mental health status. The SDQ measures 25 emotional and behavioral symptoms. Each item is scored from 0 to 2 (not true, somewhat true, and certainly true). The SDQ is a validated instrument for screening children’s and adolescents’ emotional and behavioral problems. It was developed by Goodman [1], [4] and has been used extensively in Great Britain [1], [4]-[6] and in many other countries [7]-[10]. The SDQ includes five factors: emotional symptoms (e.g., has many worries, is often unhappy), hyperactivity (e.g., gets restless, cannot sit for long), conduct problems (e.g., fights a lot, often has a hot temper), peer problems (e.g., is bullied by others, tends to play alone), and pro-social behaviors (e.g., often volunteers to help others, shares readily with other children). A total difficulties score is calculated as the sum of the scores of the conduct, hyperactivity, emotional, and peer problems scales [1], [4].
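The scoring rule described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions, not the official SDQ scoring procedure: the dictionary layout, the scale keys and the function names `scale_score` and `total_difficulties` are hypothetical, and the sketch takes five already-coded item scores (0 to 2) per scale as input.

```python
# Hypothetical layout: each scale name maps to its five item scores (0, 1 or 2).
DIFFICULTY_SCALES = ("emotional", "hyperactivity", "conduct", "peer")


def scale_score(items):
    """Sum the five item scores (each 0-2) of one SDQ scale."""
    if len(items) != 5 or any(score not in (0, 1, 2) for score in items):
        raise ValueError("each scale needs five item scores in {0, 1, 2}")
    return sum(items)


def total_difficulties(responses):
    """Sum the four difficulty scales; the pro-social scale is excluded."""
    return sum(scale_score(responses[name]) for name in DIFFICULTY_SCALES)


# Made-up responses for one respondent, for illustration only.
example = {
    "emotional": [1, 0, 2, 1, 0],
    "hyperactivity": [0, 0, 1, 1, 0],
    "conduct": [0, 1, 0, 0, 0],
    "peer": [1, 1, 0, 0, 0],
    "prosocial": [2, 2, 1, 2, 2],
}
print(total_difficulties(example))
```

Each scale score lies between 0 and 10, so the total difficulties score ranges from 0 to 40.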

2.3. Methods of analysis

Total difficulties scores were classified according to the Thai version of the SDQ as “normal,” “borderline,” or “abnormal”. Estimates of the frequency of possible mental disorders, based on the normative cut-offs for Thailand developed by Woerner and others [10], were used to describe the mental health status of the children in the Thai context.

2.3.1. Data classifications

Fig. 2: Two groups of data divided by the linear SVM.

Support Vector Machine (SVM) [11] is a data classification algorithm that has been applied in various disciplines. It takes a set of input data and applies a simple linear method to the data, but in a high-dimensional feature space that is non-linearly related to the input space. The SVM algorithm relies on support vectors, which are the training samples that lie closest to the decision surface. The decision function is specified by this subset of the training data. In two dimensions the two groups of data are separated by a line, and in general by a hyperplane (Fig. 2). The SVM also uses a kernel function, which corresponds to a dot product of two feature vectors in some expanded feature space, and training aims to minimize errors while maximizing the margin around the separating hyperplane. The original input space can be mapped to a higher-dimensional feature space in which the training set is separable. This differs from other techniques such as the Artificial Neural Network (ANN), which aims only to minimize the possibility of predictive errors.

On some occasions two groups of data cannot be separated by a linear SVM because the data are clustered in different positions. Appropriate tools and techniques are then needed to handle the data in a higher-dimensional space, where linear classifiers are considered more efficient than general methods of data classification. The polynomial kernel is generally used to obtain a classifier of degree two or higher. The Radial Basis Function Kernel classifier (RBFKC) uses C as a parameter that balances the widest classification margin against the lowest potential error rate. This research is based on the RBFKC, whose equation is given below.

$$ K(\vec{x}_i, \vec{x}_j) = \exp\left(-\gamma \left|\vec{x}_i - \vec{x}_j\right|^2\right) \qquad (1) $$
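As a hedged illustration, the radial basis function kernel in its common form, exp(-gamma * |x - y|^2), can be computed as in the sketch below. The function name `rbf_kernel`, the default `gamma=0.5` and the sample vectors are assumptions made for this example; the paper does not report its kernel parameter values.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Return exp(-gamma * ||x - y||^2) for two equal-length vectors.

    gamma > 0 controls the kernel width; the default here is arbitrary.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Identical vectors give the maximum similarity of 1.0.
print(rbf_kernel([1, 2, 0], [1, 2, 0]))
# The value decays toward 0 as the vectors move apart.
print(rbf_kernel([0, 0], [1, 1], gamma=0.5))
```

Because the kernel value depends only on the distance between the two vectors, nearby points are treated as similar regardless of where they lie in the input space.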

2.3.2. Analysis of variance

Analysis of variance (ANOVA) is a general method for studying relationships in sampled data. The method enables the difference between two or more sample means to be analyzed by subdividing the total sum of squares. One-way ANOVA is the simplest case. Its purpose is to test for significant differences between class means, which is done by analyzing the variances. Incidentally, if only two means are compared, the method is equivalent to the t-test for independent samples. The basis of ANOVA is the partitioning of the sum of squares into between-class (SSb) and within-class (SSw) components. It enables all classes to be compared with each other simultaneously rather than individually, and it assumes that the samples are normally distributed. The one-way analysis is calculated in three steps: first the sum of squares for all samples, then the within-class and between-class sums of squares. At each stage the degrees of freedom (df) are also determined, where df is the number of independent 'pieces of information' that go into the estimate of a parameter. These calculations are used via the Fisher statistic (F) to test the null hypothesis. The null hypothesis states that there are no differences between the means of the different classes, which implies that the between-class variance should be the same as the within-class variance (resulting in no between-class discrimination capability). It must be noted, however, that with small sample sets random fluctuations can be large, so the assumption of a normal distribution should be treated with care.
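The partitioning described above can be sketched in a few lines of Python. This is a minimal illustration with made-up data, not the SPSS procedure used in the study; the function name `one_way_anova` and the sample groups are hypothetical.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, df_between, df_within, f_statistic)."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0
    ss_within = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Between-class: spread of the class means around the grand mean.
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        # Within-class: spread of the samples around their own class mean.
        ss_within += sum((value - group_mean) ** 2 for value in group)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_statistic


# Three made-up groups of three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(result)
```

The F statistic is then compared with the critical value of the F distribution for (df_between, df_within) degrees of freedom at the chosen significance level.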

3. Results

3.1. Classification technique results

Classification accuracy was measured using the Support Vector Machine algorithm; the results are shown in Table 1. For the total difficulties score (four factors), the SVM classification accuracy was 90.36%.

Table 1: The SVM classification accuracy results

SDQ factor                 Correct (%)   Precision   Recall   F1
1. Emotional symptoms      97.09         0.96        0.97     0.96
2. Hyperactivity           97.39         0.95        0.97     0.96
3. Conduct problems        98.99         0.96        0.99     0.99
4. Peer problems           97.30         0.96        0.97     0.98
Total (4 factors)          90.36         0.90        0.90     0.89
5. Pro-social behaviors    99.00         0.99        0.99     0.99
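For readers unfamiliar with the measures reported in Table 1, the sketch below shows how precision, recall and F1 are commonly computed for a single class from counts of true positives, false positives and false negatives. The counts are made up for illustration, and the function name `precision_recall_f1` is hypothetical; the paper does not state how its per-class values were averaged.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Compute precision, recall and F1 for one class from raw counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 90 correct detections, 10 false alarms, 10 misses.
print(precision_recall_f1(90, 10, 10))
```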

3.2. SDQ results

This study used the Thai version of the SDQ. Table 2 shows the overall comparison of groups. Table 3 shows the category result of the comparison.

Table 2: Overall comparison between groups

Compared groups      Number of SDQ                        Percentage
                     Not different  Different  Total      Not different  Different  Total
Adviser : Student    235            69         304        77.30          22.70      100
Adviser : Parent     244            60         304        80.26          19.74      100
Student : Parent     257            47         304        84.54          15.46      100
Mean                 245.30         58.70      304        80.69          19.31      100

Table 3: Comparison results by factor

Factors                 Mean                                 Percentage
                        Not different  Different  Total      Not different  Different  Total
Emotional symptoms      194.30         109.70     304        63.91          36.09      100
Hyperactivity           264.33         39.67      304        86.95          13.05      100
Conduct problems        250.67         53.33      304        82.46          17.54      100
Peer problems           267.00         37.00      304        87.83          12.17      100
Pro-social behaviors    282.00         22.00      304        92.76          7.24       100
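The percentages in Table 2 follow directly from the counts. A short sketch, using the counts reported for the three pairs of rater groups (the function name `agreement_percentages` is hypothetical):

```python
TOTAL_STUDENTS = 304

# Number of students whose SDQ result did not differ between the two raters.
NOT_DIFFERENT = {
    "Adviser : Student": 235,
    "Adviser : Parent": 244,
    "Student : Parent": 257,
}


def agreement_percentages(n_not_different, total):
    """Return (percent not different, percent different), rounded to 2 places."""
    same = round(100 * n_not_different / total, 2)
    return same, round(100 - same, 2)


for pair, count in NOT_DIFFERENT.items():
    same, different = agreement_percentages(count, TOTAL_STUDENTS)
    print(f"{pair}: {same:.2f}% not different, {different:.2f}% different")
```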

3.3. ANOVA results

The Statistical Package for the Social Sciences (SPSS) was used for data analysis. Mean, standard deviation, and comparison of the mean grade point average were determined by one-way ANOVA. The correlation of achievement at the SDQ and throughout the whole program was determined by Pearson’s correlation. The hypothesis was tested with one-way ANOVA (F-test) among the three groups at a 95% confidence level, and the test results were compared statistically via Scheffé's method. The results support the hypothesis that the evaluation results from advisers, parents, and students were statistically different at the .05 significance level. Table 4 shows the behavior consideration by one-way ANOVA between the three groups, which differed significantly at p < 0.05. Table 5 shows Scheffé's method comparing the results by factor of behavior consideration between advisers, students and parents. Emotional symptoms, hyperactivity, conduct problems and pro-social behaviors differed significantly at p < 0.05, whereas peer problems did not (Sig. = .075).

Table 4: The behavior consideration by one-way ANOVA between three groups

Group            Sum of Squares   df    Mean Square   F       Sig.
Between group    1149.87          2     574.94        22.70   .000
Within group     23022.60         909   25.33
Total            24935.44         911

Table 5: Scheffé's method to compare results by factors

Factors                Groups           Sum of Squares   df    Mean Square   F       Sig.
Emotional symptoms     Between group    487.03           2     243.51        48.68   .000
                       Within group     4546.74          909   5.00
                       Total            5033.78          911
Hyperactivity          Between group    35.21            2     17.60         10.17   .000
                       Within group     1573.15          909   1.73
                       Total            1608.36          911
Conduct problems       Between group    115.30           2     57.65         13.11   .000
                       Within group     3995.95          909   4.39
                       Total            4111.25          911
Peer problems          Between group    10.14            2     5.07          2.60    .075
                       Within group     1771.27          909   1.94
                       Total            1781.42          911
Pro-social behaviors   Between group    66.44            2     33.22         8.68    .000
                       Within group     3478.15          909   3.82
                       Total            3544.59          911
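As a rough illustration of how Scheffé's method judges a single pair of groups after a significant ANOVA, the sketch below computes the Scheffé statistic for one pairwise contrast. The group means are hypothetical (the paper does not report them); the group size of 304 and the within-group mean square of 5.00 are taken from the emotional symptoms row of Table 5. The critical value of about 3.0 for F(2, 909) at the .05 level is an approximation that should be read from an F table or statistical software.

```python
def scheffe_f(mean_i, mean_j, n_i, n_j, ms_within, k):
    """Scheffé statistic for the contrast between two of k group means."""
    contrast_variance = ms_within * (1.0 / n_i + 1.0 / n_j)
    return (mean_i - mean_j) ** 2 / contrast_variance / (k - 1)


def differs(mean_i, mean_j, n_i, n_j, ms_within, k, f_critical):
    """True when the pair differs significantly under Scheffé's criterion."""
    return scheffe_f(mean_i, mean_j, n_i, n_j, ms_within, k) > f_critical


# Hypothetical group means of 3.0 and 2.0; n = 304 per group, MS within = 5.00.
F_CRITICAL = 3.0  # approximate F(2, 909) at alpha = .05 (assumption)
statistic = scheffe_f(3.0, 2.0, 304, 304, 5.00, k=3)
print(statistic, differs(3.0, 2.0, 304, 304, 5.00, 3, F_CRITICAL))
```

Scheffé's criterion is conservative: it holds the overall error rate at the chosen level for every possible contrast, not only for the pairwise ones.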

4. Conclusion

The purpose of this study was to find a suitable model for classification by focusing on improving accuracy. Classification accuracy was measured using the SVM algorithm, which reached 90.36% for the total difficulties score. The classification accuracy for the individual factors (emotional symptoms, hyperactivity, conduct problems, and peer problems) was higher than for the total score. This study also undertook a comparison of screening behaviors using the Strengths and Difficulties Questionnaire (SDQ). The hypothesis was tested with one-way ANOVA (F-test) among the three groups at a 95% confidence level, and the test results were compared statistically via Scheffé's method. In conclusion, the results from this study support the hypothesis that the evaluation results from advisers, parents, and students were statistically different at the .05 significance level.

5. References

[1] R. Goodman (1997). “The Strengths and Difficulties Questionnaire: A research note,” Journal of Child Psychology and Psychiatry, 38(5): 581-586.

[2] P. Dunkhuntod and S. Tangwannawit (2011). “Recommender system for counseling student using support vector machine classification technique,” The 7th National Conference on Computing and Information Technology (NCCIT), pp. 115-120.

[3] L. L. Lapin (1990). Probability and Statistics for Modern Engineering, 2nd ed. Belmont: Duxbury Press.

[4] R. Goodman (2001). “Psychometric properties of the strengths and difficulties questionnaire,” Journal of the American Academy of Child and Adolescent Psychiatry, vol. 40, pp. 1337-1345.

[5] R. Goodman (1999). “The extended version of the strengths and difficulties questionnaire as a guide to child psychiatric caseness and consequent burden,” Journal of Child Psychology and Psychiatry, vol. 40, pp. 791-799.

[6] R. Goodman, T. Ford, H. Simmons, R. Gatward and H. M. (2000). “Using the strengths and difficulties questionnaire (SDQ) to screen for child psychiatric disorders in a community sample,” The British Journal of Psychiatry, vol. 177, pp. 534-539.

[7] T. M. Achenbach, A. Becker, M. Döpfner, E. Heiervang, V. Roessner, H.-C. Steinhausen, et al. (2008). “Multicultural assessment of child and adolescent psychopathology with ASEBA and SDQ instruments: Research findings, applications, and future directions,” Journal of Child Psychology and Psychiatry, 49(3): 251-275.

[8] K. H. Bourdon, R. Goodman, D. S. Rae, G. Simpson, and D. S. Koretz (2005). “The strengths and difficulties questionnaire: U.S. normative data and psychometric properties,” Journal of the American Academy of Child & Adolescent Psychiatry, 44(6): 557-564.

[9] E. Heiervang, A. Goodman, and R. Goodman (2008). “The Nordic advantage in child mental health: separating health differences from reporting style in a cross-cultural comparison of psychopathology,” Journal of Child Psychology and Psychiatry, 49(6): 678-685.

[10] W. Woerner, S. N. Becker, A. Y. Wongpiromsarn, and A. M. (2011). “Normative data and psychometric properties of the Thai version of the strengths and difficulties questionnaire,” Journal of Mental Health of Thailand, 19(1).

[11] V. Vapnik (1995). The Nature of Statistical Learning Theory. New York: Springer-Verlag.