sample assessment results · 2019. 9. 6. · summated scale score because the items used to compute...



Post on 16-Oct-2020



SAMPLE ASSESSMENT RESULTS

[PROJECT TITLE] [DATE OF REPORT]

This report is for the EVI-2 administration in November 2002. There were 52 pre-tests and 57 post-tests administered and completed. A total of 47 students completed both the pre- and post-test. Because more than thirty participants had both pre- and post-test scores, repeated measures t-tests could be conducted on the scale scores to assess whether scores increased significantly after participation in the program. Descriptive statistics for each subscale and for the total score can be found in Table 1 below, with the results of the repeated measures t-tests in the right-hand column of the table.
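As a rough illustration of the repeated measures t-test described above, the sketch below computes the t statistic and degrees of freedom from matched pre- and post-test scores. The function name `paired_t` and the five-student data set are hypothetical and are not EVI-2 data; the p-value lookup is omitted because it requires a t distribution table or a statistics package.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t, df) for a repeated measures t-test on matched scores."""
    if len(pre) != len(post):
        raise ValueError("pre and post must be matched lists of equal length")
    # Work with each student's change score (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 in the denominator
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical scale scores for five students (illustration only).
pre = [6, 5, 7, 6, 4]
post = [7, 7, 7, 8, 5]
t, df = paired_t(pre, post)
print(f"t = {t:.3f}, df = {df}")
```

Only students with both scores enter the calculation, which is why the tests in this report are based on the 47 matched students rather than on all 52 pre-tests or 57 post-tests.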

There was a significant increase in the subscales of Integrity, Values, and Ethics. There was no significant difference from pre- to post-test in the Community subscale. There was also a significant increase in the total scores from pre- to post-test.

Table 2 provides descriptive statistics for each item at both pre- and post-test. Because each item is scored as correct or incorrect, the mean for an item is the proportion of students answering it correctly (for example, a mean of .81 indicates that 81% answered correctly).
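The relationship between an item mean and the percentage correct can be checked with a short sketch. It assumes items are scored 1 for correct and 0 for incorrect; the count of 38 correct out of 47 is an assumption chosen because it is consistent with a mean of .81, and is not taken from the raw data.

```python
import math


def item_mean_and_sd(responses):
    """Mean and sample SD of an item scored 0 (incorrect) or 1 (correct)."""
    n = len(responses)
    p = sum(responses) / n  # the mean equals the proportion correct
    # Sample SD of a 0/1 variable, using the n - 1 denominator.
    sd = math.sqrt(p * (1 - p) * n / (n - 1))
    return p, sd


# Assumed response pattern: 38 correct and 9 incorrect out of 47 students.
responses = [1] * 38 + [0] * 9
p, sd = item_mean_and_sd(responses)
print(f"mean = {p:.2f}, SD = {sd:.3f}")
```

Under this assumption the result matches the form of the entries in Table 2, where a mean of .81 is paired with a standard deviation of .398.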

Reliability estimates (Cronbach's coefficient alpha) were computed for the scores from both the subscales and the total test at both pre- and post-testing occasions. The results of these analyses are in Table 3. Reliability is the degree to which the items representing a construct are answered consistently. Reliability, in this case internal consistency, must be assessed if one intends to compute summated scores. The EVI-2 was designed to produce a total score and four subscale summated scores. Therefore, the consistency of the responses to the items representing each summated scale score must be assessed. If reliability is poor, one should not compute a summated scale score because the items used to compute the score are not answered consistently. The more reliable a test, the more confidence we can have that the scores obtained from the administration of the test are essentially the same scores that would be obtained if the test were readministered. Reliability is expressed numerically, usually as a coefficient that ranges from 0 to 1.00. A high coefficient indicates high reliability or consistency. If the test scores were perfectly reliable, the coefficient would be 1.00. However, test scores are seldom perfectly reliable. Scores can be affected by errors in measurement or by characteristics of the test itself (e.g., ambiguous items, items with no variance). In general, reliabilities above .70 are considered adequate for program evaluation or research. For evaluation of individual students, reliabilities closer to .80 are desirable. Examining the results in Table 3, we can see that scores from the total test can be considered reliable. However, the reliability estimates of the subscale scores are low to moderate.
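For readers who want to see how coefficient alpha is obtained, the sketch below applies the standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the six-student, four-item data set are hypothetical; the values in Table 3 came from the original analysis, not from this code.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of per-student lists of item scores."""
    k = len(scores[0])
    # Transpose so that each entry holds all responses to one item.
    items = list(zip(*scores))
    item_variances = [statistics.variance(item) for item in items]
    total_variance = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical 0/1 responses: six students by four items (illustration only).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
print(f"alpha = {cronbach_alpha(scores):.4f}")
```

In this made-up data set students who answer one item correctly tend to answer the others correctly as well, so the total-score variance is large relative to the sum of the item variances and alpha is high.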

To get a better picture of why the reliability estimates are low to moderate for the subscale scores, one should examine the correlation matrices, the item-total correlations, and the reliability value if an item were removed from the scale. These are presented in Appendix A. When examining the correlation matrices, negative correlations and extremely low correlations should be noted. They indicate that an item is negatively, or very weakly, related to another item that represents the construct. If two items actually represent the same construct, one would expect them to have a moderate to strong positive relationship. Along the same lines, the item-total correlation indicates how strongly the corresponding item is correlated with the total of all the items representing the subscale. If all the items consistently represent the construct, one would expect these item-total correlations to be positive and of at least moderate magnitude (i.e., at least .40). Finally, the column entitled "Alpha if Item Deleted" indicates what the reliability of the subscale score would be if the corresponding item were deleted. This should be compared to the actual Cronbach's coefficient alpha value (called "Alpha" in Appendix A). If the value of alpha is higher when the item is deleted, it indicates that this item isn't answered in the same manner as the other items representing the scale.
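The two item-level diagnostics described above can be sketched as follows. The code assumes the item-total correlation is the corrected form, in which each item is correlated with the total of the remaining items, and it recomputes alpha with each item left out. All function names and the data set are hypothetical; the fifth item is deliberately constructed to run against the other four.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation; returns NaN when either variable has no variance."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student lists of item scores."""
    k = len(scores[0])
    item_variances = [statistics.variance(item) for item in zip(*scores)]
    total_variance = statistics.variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def item_analysis(scores):
    """For each item: (corrected item-total correlation, alpha if item deleted)."""
    results = []
    for j in range(len(scores[0])):
        item = [row[j] for row in scores]
        rest = [row[:j] + row[j + 1:] for row in scores]  # scale without item j
        rest_totals = [sum(row) for row in rest]
        results.append((pearson(item, rest_totals), cronbach_alpha(rest)))
    return results


# Hypothetical data: items 1-4 hang together, item 5 runs the other way.
scores = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
]
print(f"alpha for all items = {cronbach_alpha(scores):.4f}")
for number, (r, alpha) in enumerate(item_analysis(scores), start=1):
    print(f"item {number}: item-total r = {r:.4f}, alpha if deleted = {alpha:.4f}")
```

In this constructed example the fifth item has a negative item-total correlation, and alpha rises sharply when it is removed, which is the pattern to look for in Appendix A.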

The correlation matrix for the Integrity subscale shows that items 9, 27, and 34 are negatively correlated with several other items representing the subscale. In addition, the Item-Total Correlation column shows that all three of these items have low values (all below .14). Further inspection of the "Alpha if Item Deleted" column reveals that if item 27 or item 34 were removed, the reliability estimate of the scores would increase from .4761 to the .50 range. In fact, if all three of these items were removed from the scale, the reliability of the Integrity scores would equal .56 (this figure was obtained by re-running the analysis with these three items removed). This evidence suggests that these three items may need inspection to see whether the wording of the items or the distracters could be contributing to their negative correlations with the other items in the subscale.

In the correlation matrix for the Values subscale, items 6, 7, 8, and 31 have several negative or extremely weak correlations with other items representing the subscale. In fact, item 6 has some serious problems, as it is negatively correlated with ten of the other eleven items representing the subscale. Therefore, it is not a surprise that it has a negative item-total correlation (-.04). The negative and weak correlations of items 7, 8, and 31 with other items on the scale produce low item-total correlations for these three items (.09, .18, .21). Finally, the "Alpha if Item Deleted" column indicates that the reliability estimate of the scores would increase if item 6 or item 7 were removed and would remain essentially the same if item 8 were removed. In fact, if items 6, 7, and 8 are removed, the reliability increases to .70.

Problem items were also easily identified for the Ethics subscale. Specifically, items 24, 28, 32, 36, 38, and 39 had several negative or extremely weak relationships with other items representing the subscale. This is further reflected in the low or negative item-total correlations for these items. If item 24, 32, 38, or 39 were deleted, reliability would increase (if all four are deleted it increases to .60; if item 36 is also deleted it increases to .61).

For the Community subscale, items 25 and 41 have several negative correlations with other items representing this construct. They, in turn, have negative item-total correlations. The "Alpha if Item Deleted" column reveals that if item 25 were removed from the subscale, the reliability estimate of the scores would increase from .5233 to .6500; this is quite an increase. In addition, it indicates that reliability would increase if item 41 were deleted. In fact, reliability increases to .69 when both items are removed from the subscale.

Among the items listed above (6, 7, 8, 9, 24, 25, 27, 28, 31, 32, 34, 36, 38, 39, 41), a few have little or no variance at post-test (e.g., 6, 7, 27; see Table 2). Because reliability is a function of the variance of the responses to items, low variance results in low reliability. Specifically, all or almost all students are responding correctly to these three items at post-test. Obviously, this is a good thing if the program promotes understanding of the concepts these items represent (even though it does decrease the estimate of reliability). However, notice that the majority of the students understood these concepts before completing the program (high percentage passing at pre-test). This calls into question the necessity of these items for evaluating the effectiveness of the program.
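The link between the percentage of students answering correctly and the variance of an item can be illustrated with a small sketch. It assumes 0/1 scoring and 47 students; the function name `dichotomous_variance` and the proportions used are hypothetical values chosen for illustration.

```python
def dichotomous_variance(p, n):
    """Sample variance of a 0/1 item with proportion correct p and n students."""
    return p * (1 - p) * n / (n - 1)


# Variance shrinks toward zero as the proportion correct approaches 1.00.
for p in (0.50, 0.75, 0.90, 1.00):
    print(f"proportion correct = {p:.2f}, variance = {dichotomous_variance(p, 47):.4f}")
```

An item that every student answers correctly has zero variance, so it cannot covary with any other item and contributes nothing to the consistency of the scale.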

At this point, the items that functioned poorly for each subscale should be examined by content/program experts to try to identify the cause of the problem (poorly worded item, item doesn't actually represent the construct, confusing options, etc.). For example, items 9, 27, and 34 of the Integrity scale caused concern. As noted above, item 27 had no variance at post-test, which in turn decreases its relationship with other items representing the scale. That is fairly easy to diagnose. Item 9, however, had extremely weak relationships with other items, and this doesn't seem to be due to a lack of variance. This item was not answered in the same manner as the other items representing the Integrity construct (demonstrated by the weak relationships with other items). The same can be said for item 34. The question that program directors need to answer is why. Obviously, there is no right or wrong answer to this question, but if these items continue to function poorly something should be done. What that something is depends on how the problem is diagnosed.

The reliability of the scores from the Values subscale looks adequate. In fact, the problems with the items causing the most trouble (6, 7) are simply a function of little or no variance (nearly everyone is answering the question correctly at post-test). As noted above, given that students come into the program with this knowledge, a decision should be made about whether these items are necessary to cover the breadth of this construct.

There were several items that caused problems for the Ethics subscale. Again, you should go through each of these items to try to diagnose the problem. For example, item 38 displays some interesting characteristics. There is variability in the responses (see Table 2), so the problem is not a function of low variance. The frequency distributions in Appendix B suggest that students are split between "b," the correct answer, and "c," one of the distracters. Therefore, a possible explanation for the negative correlation with other items on the scale is that students who got the other items correct may have gotten this item wrong because they felt that "c" is the best definition.

As a final example, item 25 was one of the problem items representing the Community scale. However, on the surface, the wording of the item appears to fit in the Values subscale. Each item should clearly represent the construct of interest with limited overlap with other constructs. This may be an issue with this item.

It should be restated that the reliability estimates reviewed above are for the post-test scores. Analyses were also conducted for the pre-test scores, and they indicated that items 6, 25, and 39 present the same concern as they did on the post-test. However, at pre-test, items 11, 12, 28, 29, and 32 also have negative correlations with the total subscale scores. To summarize our reliability analyses, it seems that one can confidently make inferences about program effectiveness using the total score; however, we advise caution in using the subscale scores. If the instrument was designed for the purpose of obtaining a student's profile in terms of integrity, values, ethics, and community, we highly encourage further work on this measure.

Appendix B contains the frequency distributions for the unscored items both before and after the program (make sure to examine the Valid Percent column). These items were left unscored so one could see which distracter options were being chosen most often at each time period (the option with the * next to it was the correct answer). This can help identify misconceptions students bring to the program and those they leave with. Interestingly, there are several items that a majority of students answer correctly at the pre-test. This indicates that they come into the program with this knowledge. If 70% or more of the students answered an item correctly at pre-test, they were considered to have prior knowledge of that item. Items showing this effect are items 1-12, 14, 17, 24, 26, 27, 28, 31, 32, 34-37, 39-44, 46, and 47 (68% of the items). This indicates that students know the majority of the concepts before completing the program. This information can be useful in program development or redesign. Interestingly, for item 19, fewer students get the item correct at post-test than at pre-test.
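The 70% prior-knowledge rule can be expressed as a short sketch. The four proportions below are the pre-test means for items 1, 13, 18, and 29 from Table 2, used only as an illustration; Table 2 covers the 47 matched students, whereas the classification above was based on the Appendix B frequencies, so results for borderline items could differ. The variable names are hypothetical.

```python
PRIOR_KNOWLEDGE_THRESHOLD = 0.70

# Pre-test proportion correct for a few items (Table 2, matched students only).
pre_test_proportions = {1: 0.81, 13: 0.40, 18: 0.34, 29: 0.11}

# An item is flagged when at least 70% of students answer it correctly at pre-test.
flagged = sorted(
    item
    for item, proportion in pre_test_proportions.items()
    if proportion >= PRIOR_KNOWLEDGE_THRESHOLD
)
print("items showing prior knowledge:", flagged)

# Share of the 47 scored items that the report lists as showing prior knowledge.
listed_items = 32
total_items = 47
print(f"share of items listed = {listed_items / total_items:.0%}")
```

The last two lines reproduce the 68% figure: the list in the text contains 32 of the 47 scored items.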

Finally, Table 4 displays the results for the demographic items (48-51). According to these results, most students taking this administration of the EVI-2 are males, sophomores, and have not completed the By the Numbers or the Calling the Shots sanctions.


TABLE 1

Subscale Means, Standard Deviations, and t-values

Scale         Pre-Test Mean (SD)    Post-Test Mean (SD)   Number of Subjects   t-values, p

Integrity     6.1277 (1.32889)      6.7872 (1.33410)      47                   3.028, p=.004
  (items 2, 5, 9, 15, 17, 21, 27, 34)

Values        9.2128 (1.68027)      10.2979 (1.87564)     47                   3.944, p=.000
  (items 1, 4, 6, 7, 8, 14, 20, 22, 23, 26, 31, 45)

Ethics        11.3191 (1.95722)     13.3617 (2.35377)     47                   3.660, p=.001
  (items 11, 13, 16, 18, 19, 24, 28, 29, 30, 32, 33, 35, 36, 38, 39, 40, 46)

Community     7.5745 (1.26396)      7.9149 (1.26542)      47                   1.563, p=.125
  (items 3, 10, 12, 25, 37, 41, 42, 43, 44, 47)

Total         35.1489 (4.42807)     39.2979 (5.83809)     47                   5.833, p=.000

TABLE 2

Item Means and Standard Deviations

Item Number   Pre-Test Mean (SD)   Post-Test Mean (SD)   Number of Subjects
1             .81 (.398)           .85 (.360)            47
2             .83 (.383)           .85 (.360)            47
3             .92 (.282)           .94 (.247)            47
4             .85 (.360)           .91 (.282)            47
5             .77 (.428)           .96 (.204)            47
6             .87 (.337)           1.00 (.000)           47
7             .96 (.204)           .98 (.146)            47
8             .98 (.146)           .87 (.337)            47
9             .83 (.383)           .81 (.398)            47
10            .89 (.312)           .94 (.247)            47
11            .77 (.428)           .89 (.312)            47
12            .96 (.204)           .94 (.247)            47
13            .40 (.496)           .75 (.441)            47
14            .92 (.282)           .94 (.247)            47
15            .64 (.486)           .79 (.414)            47
16            .53 (.504)           .72 (.452)            47
17            .96 (.204)           .83 (.380)            47
18            .34 (.479)           .75 (.441)            47
19            .70 (.462)           .55 (.503)            47
20            .38 (.491)           .62 (.491)            47
21            .53 (.504)           .72 (.452)            47
22            .30 (.462)           .66 (.479)            47
23            .72 (.452)           .91 (.285)            47
24            .89 (.312)           .89 (.312)            47
25            .47 (.504)           .60 (.496)            47
26            .81 (.398)           .85 (.360)            47
27            .85 (.360)           1.00 (.000)           47
28            .85 (.360)           .85 (.360)            47
29            .11 (.312)           .57 (.500)            47
30            .49 (.505)           .68 (.471)            47
31            .92 (.282)           .92 (.282)            47
32            .87 (.337)           .85 (.363)            47
33            .68 (.471)           .85 (.360)            47
34            .89 (.312)           .83 (.380)            47
35            .94 (.247)           .98 (.146)            47
36            .92 (.282)           .94 (.247)            47
37            .89 (.312)           .98 (.146)            47
38            .45 (.503)           .60 (.496)            47
39            .85 (.360)           .94 (.247)            47
40            .79 (.414)           .77 (.428)            47
41            .89 (.312)           .94 (.247)            47
42            .94 (.247)           .89 (.312)            47
43            .87 (.337)           .89 (.312)            47
44            .89 (.315)           .87 (.337)            47
45            .70 (.462)           .81 (.398)            47
46            .75 (.441)           .81 (.398)            47
47            .79 (.414)           .87 (.337)            47

TABLE 3

Reliability of Total Test and Subscales

                      Pre-Test     Post-Test
Integrity Subscale    α=.3826      α=.4761
Values Subscale       α=.4835      α=.6765
Ethics Subscale       α=.3822      α=.5344
Community Subscale    α=.4517      α=.5233
Total Test            α=.7152      α=.8393


TABLE 4

Results of Demographics Items 48-51

The breakdown of the total number of students who responded to these demographic items at pre-test and post-test is reported. The Combined columns are the results for those who responded at both pre-test and post-test. Some students did not answer these items or gave a response that is not among the listed choices.

                               PRE-TEST          POST-TEST         COMBINED
                               N      %          N      %          N      %
Gender:
  Male                         40     78.4       39     70.9       35     76.1
  Female                       11     21.6       16     29.0       11     23.9
  Total                        51     100.00     55     100.00     46     100.00

Class Level:
  Freshman                     14     27.5       12     21.4       11     23.9
  Sophomore                    27     52.9       29     51.8       26     56.5
  Junior                       7      13.7       9      16.1       6      13.0
  Senior                       3      5.9        6      10.7       3      6.6
  Total                        51     100.00     56     100.00     46     100.00

By the Numbers workshop:
  Yes                          11     22.0       12     22.6       10     22.2
  No                           39     78.0       41     77.4       35     77.8
  Total                        50     100.00     53     100.00     45     100.00

Calling the Shots program:
  Yes                          3      6.1        3      5.6        3      6.7
  No                           46     93.9       51     94.4       42     93.3
  Total                        49     100.0      54     100.0      45     100.0


APPENDIX A

Item Total Correlations and Correlation Matrices for Subscales

INTEGRITY SUBSCALE

Correlation Matrix

        I2P       I5P       I9P       I15P      I17P      I21P      I27P      I34P
I2P     1.0000
I5P     .0532     1.0000
I9P     .3791     .0358     1.0000
I15P    .2869     -.0080    .1304     1.0000
I17P    -.0536    .1831     -.0732    .2144     1.0000
I21P    .1263     .2918     .0849     .3417     .4341     1.0000
I27P    -.0771    .2779     -.0826    -.0985    .0880     .0596     1.0000
I34P    .0792     .0200     -.0732    .2144     -.0915    -.0492    -.0880    1.0000

Item-Total Statistics

        Scale Mean        Scale Variance    Corrected Item-Total   Alpha
        if Item Deleted   if Item Deleted   Correlation            if Item Deleted
I2P     5.8070            1.4799            .2704                  .4203
I5P     5.7544            1.5815            .2374                  .4377
I9P     5.8246            1.5758            .1324                  .4750
I15P    5.8772            1.2882            .4027                  .3512
I17P    5.8421            1.4925            .2065                  .4457
I21P    6.0175            1.1604            .4252                  .3244
I27P    5.7018            1.8202            -.0425                 .5032
I34P    5.8421            1.6711            .0152                  .5227

Alpha=.4761


VALUES SUBSCALE

Correlation Matrix

        I1P       I4P       I6P       I7P       I8P
I1P     1.0000
I4P     .2831     1.0000
I6P     -.0550    -.0374    1.0000
I7P     .1964     -.0534    -.0259    1.0000
I8P     .0236     .1281     -.0467    -.0667    1.0000
I14P    .3561     .2419     -.0321    -.0458    .1740
I20P    .0896     .2028     .1676     .0422     .0760
I22P    .1386     .2406     -.0966    .0653     .1176
I23P    -.1278    .3995     -.0422    .2772     .0940
I26P    .3771     .2562     -.0590    -.0842    .1628
I31P    -.1132    .1923     -.0374    -.0534    .1281
I45P    .1835     .3864     -.0667    .1470     .1194

        I14P      I20P      I22P      I23P      I26P      I31P      I45P
I14P    1.0000
I20P    .2958     1.0000
I22P    .1645     .4275     1.0000
I23P    .2036     .2610     .4369     1.0000
I26P    .3278     .1458     .0972     .2040     1.0000
I31P    -.0660    .3448     .3870     .1563     -.1214    1.0000
I45P    .0820     .2465     .2153     .3181     .2732     .0374     1.0000

Item-Total Statistics

        Scale Mean        Scale Variance    Corrected Item-Total   Alpha
        if Item Deleted   if Item Deleted   Correlation            if Item Deleted
I1P     9.4643            3.0169            .2583                  .6665
I4P     9.3929            2.9701            .4698                  .6385
I6P     9.3393            3.4646            -.0483                 .6886
I7P     9.3571            3.3610            .0908                  .6815
I8P     9.4286            3.1584            .1826                  .6764
I14P    9.3750            3.1114            .3686                  .6538
I20P    9.7143            2.4987            .4601                  .6286
I22P    9.6607            2.5192            .4688                  .6261
I23P    9.4107            2.9373            .4444                  .6391
I26P    9.4821            2.9088            .3262                  .6552
I31P    9.3929            3.1883            .2183                  .6701
I45P    9.5179            2.7633            .4010                  .6411

Alpha=.6765


ETHICS SUBSCALE

Correlation Matrix

        I11P      I13P      I16P      I18P      I19P
I11P    1.0000
I13P    -.0044    1.0000
I16P    .0273     .1771     1.0000
I18P    .1526     .1771     .0418     1.0000
I19P    .1336     .1307     .2045     .0341     1.0000
I24P    .2588     -.2025    .1056     .1056     .1291
I28P    -.0214    .0413     .1928     .1928     .1405
I29P    .3240     .0822     .0750     .1591     .2544
I30P    .0987     .0914     .2414     .1511     .0161
I32P    .0179     .2359     -.0979    .2778     .0223
I33P    -.0386    .3208     -.1672    .0492     .0000
I35P    .3563     .2125     -.0795    .2329     .1667
I36P    -.0917    .0224     .2272     .0434     .1307
I38P    .0367     -.0914    .1421     -.2015    .1222
I39P    .1031     -.1794    -.1637    -.1637    -.0857
I40P    .0624     .2432     -.2076    .0955     .1078
I46P    -.0546    .1801     .2296     .0209     .1485

        I24P      I28P      I29P      I30P      I32P
I24P    1.0000
I28P    .0311     1.0000
I29P    .2319     .2054     1.0000
I30P    .0622     -.0831    .2842     1.0000
I32P    -.1208    -.0214    -.0060    -.0193    1.0000
I33P    -.1491    .0463     .3457     .1947     -.0386
I35P    -.0430    -.0602    .1547     .2035     .3563
I36P    -.0760    -.1062    .1115     .0126     -.0917
I38P    .0118     -.3476    -.0123    .0412     -.1878
I39P    -.0886    .0654     .0359     -.1873    -.1069
I40P    .1392     .2423     .2453     .2182     .0624
I46P    -.1581    .0246     .0183     .0590     -.0546

        I33P      I35P      I36P      I38P      I39P
I33P    1.0000
I35P    -.0642    1.0000
I36P    -.1132    -.0327    1.0000
I38P    -.0794    -.1069    .3056     1.0000
I39P    .0495     -.0381    -.0673    .0681     1.0000
I40P    .3217     .2576     -.1269    -.2339    .1911
I46P    .2357     .2722     .0801     -.0187    .2100

Item-Total Statistics

        Scale Mean        Scale Variance    Corrected Item-Total   Alpha
        if Item Deleted   if Item Deleted   Correlation            if Item Deleted
I11P    12.5455           4.8451            .2206                  .5132
I13P    12.7091           4.5434            .2719                  .4993
I16P    12.6727           4.7057            .2023                  .5150
I18P    12.6727           4.6687            .2226                  .5106
I19P    12.8182           4.4108            .3032                  .4905
I24P    12.5091           5.1064            .0719                  .5355
I28P    12.5818           4.9515            .1167                  .5308
I29P    12.8545           4.0896            .4667                  .4468
I30P    12.7273           4.5354            .2678                  .5000
I32P    12.5455           5.1414            .0199                  .5451
I33P    12.6000           4.7630            .2180                  .5123
I35P    12.4364           5.0653            .3317                  .5173
I36P    12.4727           5.1428            .0862                  .5327
I38P    12.8000           5.2741            -.1020                 .5840
I39P    12.4909           5.2916            -.0626                 .5510
I40P    12.6364           4.6061            .2823                  .4986
I46P    12.6182           4.7589            .2061                  .5144

Alpha=.5344

COMMUNITY SUBSCALE

Correlation Matrix

        I3P       I10P      I12P      I25P      I37P
I3P     1.0000
I10P    .2963     1.0000
I12P    .2963     .6481     1.0000
I25P    -.0255    .1359     .1359     1.0000
I37P    .5669     .5669     .5669     -.1059    1.0000
I41P    -.0648    -.0648    -.0648    -.0767    -.0367
I42P    .2046     .2046     .2046     -.2458    .4309
I43P    .1512     .1512     .1512     -.1868    .3571
I44P    .1512     .1512     .1512     -.0771    .3571
I47P    .1310     .1310     .1310     .0946     .3307

        I41P      I42P      I43P      I44P      I47P
I41P    1.0000
I42P    -.0852    1.0000
I43P    .3157     .2619     1.0000
I44P    -.1028    .2619     .3486     1.0000
I47P    .0867     .0532     .1566     .1566     1.0000

Item-Total Statistics

        Scale Mean        Scale Variance    Corrected Item-Total   Alpha
        if Item Deleted   if Item Deleted   Correlation            if Item Deleted
I3P     7.9474            1.4793            .3156                  .4784
I10P    7.9474            1.4079            .4571                  .4456
I12P    7.9474            1.4079            .4571                  .4456
I25P    8.2807            1.5627            -.0822                 .6500
I37P    7.9123            1.4743            .6565                  .4509
I41P    7.9649            1.6416            -.0076                 .5531
I42P    7.9825            1.4818            .2011                  .5028
I43P    8.0175            1.3390            .3320                  .4601
I44P    8.0175            1.3747            .2816                  .4774
I47P    8.0351            1.3559            .2749                  .4793

Alpha=.5233


APPENDIX B

Frequency Tables for Items 1-47

The letter (P) in the item number indicates that this is a post-test item. The asterisk (*) indicates the correct response. Examine the valid percent column as it does not include missing data.
