© 1996, Carolyn B. Cropper
A GENERALIZABILITY THEORY STUDY OF A SURVEY
INSTRUMENT TO IDENTIFY GIFTED AND TALENTED
STUDENTS: THE LOOKING FOR TRAITS, ATTRIBUTES,
AND BEHAVIORS STUDENT REFERRAL FORM
by
CAROLYN BROWN CROPPER, B.S.C, M.A.
A DISSERTATION
IN
EDUCATIONAL PSYCHOLOGY
Submitted to the Graduate Faculty of Texas Tech University in
Partial Fulfillment of the Requirements for
the Degree of
DOCTOR OF EDUCATION
Approved
Accepted
December, 1996
ACKNOWLEDGMENTS
I would like to express my appreciation to my committee
chair, Dr. Mary Tallent-Runnels, for her support, guidance and
understanding and to the other members of my committee, Dr. Joe
Cornett and Dr. Julie Thomas, for their support, guidance and
understanding.
I would also like to express my appreciation to my family,
Michael D. Cropper and Austin Cropper, for their encouragement and
patience.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
CHAPTER
I. INTRODUCTION
    Statement of the Problem
    Purpose of the Study
    Research Questions
    Definition of Terms
    Assumptions and Limitations
II. REVIEW OF THE LITERATURE
    Issues Related to the Identification of Gifted Minority Children: An Overview
    Issues to Consider When Assessing Students From Diverse Backgrounds
        Behavioral Differences
        Language Differences
        Differences in Cognitive Styles
    Survey Instruments
    Teachers as Raters
    Parents as Raters
    Reliability and Validity of Survey Instruments
III. METHODOLOGY
    Participants
        Teachers
        Students
        Parents
    Instrument
    Procedure
    Data Analysis
    Significance of the Study
IV. RESULTS
    Descriptive Statistics
        Ratings by the Classroom Teacher
        Ratings by the Gifted and Talented Program Teacher
        Ratings by the Parents
    Generalizability Theory
    Generalizability Analysis
    Combined Raters
        Combined Ratings For Total Sample
        Combined Ratings For Anglo Sample
        Combined Ratings For African-American Sample
        Combined Ratings For Hispanic Sample
    Teacher Ratings
        Teacher Ratings For Total Sample
        Teacher Ratings For Anglo Sample
        Teacher Ratings For African-American Sample
        Teacher Ratings For Hispanic Sample
    Parent Ratings
        Parent Ratings For Total Sample
        Parent Ratings For Anglo Sample
        Parent Ratings For African-American Sample
        Parent Ratings For Hispanic Sample
    Decision Studies Considerations
V. DISCUSSION
    Generalizability Finding for TABS
    D-Study Findings for TABS
    Teachers As Raters
    Parents As Raters
    Limitations of the Study
    Directions for Future Research Using the TABS Form
    Implications
REFERENCES
APPENDIX
A. THE LOOKING FOR TRAITS, ATTRIBUTES AND BEHAVIORS STUDENT REFERRAL FORM
B. ANOVA SUMMARY TABLES AND GENERALIZABILITY CALCULATIONS FOR ALL RATINGS
C. GENOVA PROGRAM AND SAMPLE DATA
ABSTRACT
Effective ways to identify children from economically
disadvantaged and limited English proficient backgrounds for
participation in programs for the gifted continue to gain much
attention. Numerous instruments have been developed to aid in the
identification process. The Looking for Traits, Attributes and
Behaviors Student Referral Form (TABS) is one instrument designed to
specifically aid in the identification of giftedness in the minority
child by providing information from educators and other individuals
closely associated with the child. However, minimal information
has been published about the validity and reliability of the TABS.
This study investigated the reliability of the TABS utilizing
generalizability theory.
Three groups of raters (regular classroom teachers, gifted and
talented program teachers, and parents) completed the TABS for
127 third grade students. The group of parents independently rated
each student on two occasions three months apart.
Results indicated that minimal variance was noted between
the various sources of error. Several sample measurement protocols
were also investigated. Results suggested that multiple raters
provide a more comprehensive view of the student when attempting
to screen for participation in a gifted and talented program and the
TABS form is a valuable instrument for this process.
LIST OF TABLES
1. Ratings by Classroom Teacher, Gifted and Talented Program Teacher and Parent
2. Generalizability Calculations for Combined Ratings, Total Sample
3. Generalizability Calculations for Combined Ratings, Anglo Sample
4. Generalizability Calculations for Combined Ratings, African-American Sample
5. Generalizability Calculations for Combined Ratings, Hispanic Sample
6. Generalizability Calculations for Teacher Ratings, Total Sample
7. Generalizability Calculations for Teacher Ratings, Anglo Sample
8. Generalizability Calculations for Teacher Ratings, African-American Sample
9. Generalizability Calculations for Teacher Ratings, Hispanic Sample
10. Generalizability Calculations for Parent Ratings, Total Sample
11. Generalizability Calculations for Parent Ratings, Anglo Sample
12. Generalizability Calculations for Parent Ratings, African-American Sample
13. Generalizability Calculations for Parent Ratings, Hispanic Sample
14. D-Study: Phi Coefficients for Items Omitted from TABS
15. D-Study: Phi Coefficients of TABS with Parents As Raters and Occasions, Varied
16. D-Study: Phi Coefficients of TABS for Raters and Occasions, Varied
17. ANOVA Summary Table for Combined Ratings, Total Sample
18. Variance Components for Combined Ratings: Classroom Teachers, Gifted and Talented Program Teachers, and Parents, Total Sample
19. ANOVA Summary Table for Combined Ratings, Anglo Sample
20. Variance Components for Combined Ratings: Classroom Teachers, Gifted and Talented Program Teachers, and Parents, Anglo Sample
21. ANOVA Summary Table for Combined Ratings, Asian Sample
22. Generalizability Calculations for Combined Ratings, Asian Sample
23. ANOVA Summary Table for Combined Ratings, African-American Sample
24. Variance Components for Combined Ratings: Classroom Teachers, Gifted and Talented Program Teachers, and Parents, African-American Sample
25. ANOVA Summary Table for Combined Ratings, Hispanic Sample
26. Variance Components for Combined Ratings: Classroom Teachers, Gifted and Talented Program Teachers, and Parents, Hispanic Sample
27. ANOVA Summary Table for Teacher Ratings, Total Sample
28. Variance Components for Teacher Ratings, Total Sample
29. ANOVA Summary Table for Teacher Ratings, Anglo Sample
30. Variance Components for Teacher Ratings, Anglo Sample
31. ANOVA Summary Table for Teacher Ratings, Asian Sample
32. Generalizability Calculations for Teacher Ratings, Asian Sample
33. ANOVA Summary Table for Teacher Ratings, African-American Sample
34. Variance Components for Teacher Ratings, African-American Sample
35. ANOVA Summary Table for Teacher Ratings, Hispanic Sample
36. Variance Components for Teacher Ratings, Hispanic Sample
37. ANOVA Summary Table for Parent Ratings, Total Sample
38. Variance Components for Parent Ratings, Total Sample
39. ANOVA Summary Table for Parent Ratings, Anglo Sample
40. Variance Components for Parent Ratings, Anglo Sample
41. ANOVA Summary Table for Parent Ratings, Asian Sample
42. Generalizability Calculations for Parent Ratings, Asian Sample
43. ANOVA Summary Table for Parent Ratings, African-American Sample
44. Variance Components for Parent Ratings, African-American Sample
45. ANOVA Summary Table for Parent Ratings, Hispanic Sample
46. Variance Components for Parent Ratings, Hispanic Sample
CHAPTER I
INTRODUCTION
Many students from minority groups and/or economically
disadvantaged families are often denied eligibility for services in
programs for the gifted when only IQ scores are utilized to
determine eligibility. As a result, identified gifted students form a
group that is usually culturally or ethnically homogeneous (Keller,
1990). Zappia (1989) reported that 80% of the enrollment in gifted
programs is Anglo-American with African-Americans, Hispanics and
Asians constituting only 18% of the enrollment. This homogeneity is
not necessarily a problem if the identified students can truly be said
to represent all those who should be included. Many researchers,
however, have strongly asserted that giftedness is proportionally
represented in every ethnic and cultural group and at all
socioeconomic levels (Clark, 1992; Davis & Rimm, 1994; Frasier,
1987; Gallagher, 1994; Kitano & Kirby, 1986; Marland, 1972).
DeHaan and Havighurst (1961) cautioned against relying on test
data alone to identify and select students with gifts. They asserted
that the complex, multidimensional nature of mental abilities
suggests that above-average ability can be described more
adequately as a group of independent factors than as a general
ability expressed by IQ.
Frasier (1990b) cautioned that one reason for
underrepresentation of minority and low socioeconomic students
may be that reliance on conventional identification procedures has
reduced the multifaceted, complex phenomenon called giftedness to
a single-faceted phenomenon—high performance on intelligence
tests. Current views on intelligence and the assessment of
intellectual capacity suggest that intelligence tests provide a
narrow view of an individual's abilities; therefore, a more
comprehensive assessment of ability is needed (Barkan & Bernal,
1991; Bermudez & Rakow, 1990; Hunsaker & Callahan, 1993;
Pendarvis, Howley & Howley, 1990).
Another reason (Frasier, 1990b) for the underrepresentation of
these students in gifted programs is related to the inability of
educators to recognize children's "gifted behaviors." Many children's
gifts and talents go unrecognized, including those of some minority group
children, underachievers, children whose proficiency with the
English language is limited, and highly gifted children. Although not
all the problems of these groups are the same, they do represent the
groups that are most underrepresented in programs for gifted
children (Clark, 1992; Gallagher, 1994).
Giftedness exists in all human groups (Baldwin, 1980) and
minority group students may manifest gifted characteristics
differently from the majority student population (Frasier, 1987;
Renzulli, 1973; Torrance, 1969); therefore, suggestions for finding
ability in culturally diverse students lean heavily toward the
identification of noncognitive skills. Much of the investigation has
centered on creative abilities (Bernal, 1978). Frasier (1990b)
incorporated noncognitive gifted characteristics in the TABS and
presented the TABS as an instrument designed to pay special
attention to the different ways in which children from different
cultures manifest behavioral indicators of giftedness.
Current discussions concerned with the measurement of
intelligent behavior emphasize the use of multiple criteria (Hoge,
1989; Passow & Rudnitski, 1993). Maker and Schiever (1989) noted
two predominant conclusions from the various viewpoints presented:
(1) use multiple assessment procedures, including objective and
subjective data from a variety of sources; and (2) use a case study
approach, in which a variety of assessment data is interpreted in the
context of a student's individual characteristics.
According to Delcourt et al. (1994), solutions to the
underrepresentation of gifted students from minority groups have
taken many forms: (1) nominations from sources other than teachers;
(2) alternative checklists and rating scales; (3) conventional
identification models; (4) culturally sensitive standardized tests;
and (5) matrix, culture-specific, quota system, and
identification-through-instruction models. Despite the worthiness of these and
other solutions, the representation of minority students in programs
for the gifted remains low.
In order to increase the minority representation in gifted
programs, many schools have implemented a multifaceted
identification process to identify students for gifted and talented
programs. Often, survey instruments are included in the
multifaceted identification process.
The classroom teacher is the individual in the school setting
who has the most contact with each student and is often asked to
rate behaviors on the survey instrument which are observed within
the school environment (Frasier, 1990a; Keller, 1990). Several
rating scales utilizing classroom teachers as raters are available to
assist in the identification of the gifted and talented student.
Statement of the Problem
The Looking for Traits, Attributes and Behaviors Student
Referral Form (TABS) is a rating scale currently utilized by
numerous school districts in Georgia and Texas (Mary Frasier,
personal communication, May 12, 1996). This survey instrument was
designed to specifically aid in the identification of minority and/or
economically disadvantaged gifted students. Many school districts,
however, are currently using this form throughout their entire
student population as the principal instrument to aid in the
identification of gifted students. The TABS is an instrument with
minimal published research about its inter-rater reliability or
validity. Additional information is needed to determine if this
instrument will provide stable and consistent scores across raters.
If it is found to yield reliable scores, teachers' responses can be
utilized with increased confidence to aid in the identification
process, thereby allowing for the most appropriate placement of the
student.
Purpose of the Study
Additional information is needed to determine if the TABS
will provide stable and consistent scores across raters and
occasions. This study is an attempt to determine the reliability of
TABS scores using more than one rater to identify gifted and
talented elementary students. It will also examine the consistency
of raters' scores over time.
Research Questions
In general, this investigation is intended to answer the
following questions.
1. What is the reliability of the TABS scores using
generalizability to consider the facets of students, raters, and
occasions? (G-Study)
2. What is the effect on reliability and measurement precision
of using more or fewer than two raters? (D-Study)
3. What is the effect on reliability and measurement precision
of using more or fewer than two rating occasions? (D-Study)
4. What is the effect on reliability and measurement precision
of using more or fewer than 10 items? (D-Study)
Definitions of Terms
Absolute Decision. This measurement is utilized to index the
absolute level of an individual's performance without regard to how
well or how poorly the individual's peers performed. Attention is
focused on the absolute value of an individual's performance, not
relative standing (Shavelson & Webb, 1991).
Classical True Score Theory. The classical true score theory
examines true scores and error. Classical true score theory
evaluates one source of error in a measurement at a time.
Therefore, numerous reliability coefficients have been developed to
measure such factors as internal consistency reliability and test-
retest reliability. Determining reliability when utilizing the
classical true score theory requires multiple reliability coefficients
for each instrument. As a result, the coefficients are often
different and contradictory (Eason, 1991).
Decision Study or D-Study. The D-study extends the results of
the G-study by placing an emphasis on estimation, use and
interpretation of the generated variance components for
decision-making with well-specified measurement procedures (Brennan,
1983). The D-study includes only facets of interest and varies those
values to determine the optimum number of items, forms, occasions,
or raters to include in the research study to achieve dependable
measurement.
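As a rough illustration of how a D-study varies the number of raters, the following Python sketch computes a generalizability coefficient (for relative decisions) and a phi coefficient (for absolute decisions). It assumes a simplified, fully crossed persons-by-raters design, which is not the design analyzed in this study. The function name `d_study` and the variance component values are hypothetical and are not taken from the TABS data.

```python
# Illustrative sketch only: one-facet (persons x raters) D-study.
# The variance components below are made-up values, not study results.

def d_study(var_person, var_rater, var_residual, n_raters):
    """Return (generalizability coefficient, phi) for n_raters raters."""
    # Relative error counts only sources that change the rank order of persons.
    relative_error = var_residual / n_raters
    # Absolute error also counts the rater main effect (rater leniency or severity).
    absolute_error = (var_rater + var_residual) / n_raters
    g_coefficient = var_person / (var_person + relative_error)
    phi = var_person / (var_person + absolute_error)
    return g_coefficient, phi


# Hypothetical variance components for persons, raters, and residual.
for n in (1, 2, 3, 4):
    g, phi = d_study(0.5, 0.1, 0.4, n)
    print(f"raters={n}  g={g:.3f}  phi={phi:.3f}")
```

Under these assumptions both coefficients rise as raters are added, and phi never exceeds the generalizability coefficient because absolute error includes every term in relative error plus the rater effect.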
Facet. Facets are the characteristics of the testing situation
that contain error variance (Thompson, 1989). Characteristics
include the specifics of the measuring situation (test forms,
occasions, raters, etc.).
Generalizability Theory. Generalizability theory, an
alternative to classical true score theory, provides
ways of estimating the respective amounts of variance contributed
by all possible sources of variance that are operative in a given
testing situation. According to generalizability theory, given the
exact same conditions in the universe, the exact same test score
should be obtained (Cronbach, 1972). Generalizability theory
reminds us that a test's reliability is not something that statically
resides within the test. The reliability of a test is a function of the
circumstances under which the test is developed, administered, and
interpreted. The generalizability theory allows the researcher to
consider all sources of error simultaneously. Through the use of
this theory, the researcher is also able to determine inter-rater
reliability as one source of error involving persons, occasions
and/or raters (Marsden, 1993; Thompson, 1989).
G-Study. A G-study includes all the facets of
interest in order to obtain estimates of error variance components
associated with a universe of admissible observations (Brennan,
1983).
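The estimation step of a G-study can be sketched in Python for the simplest case. This sketch assumes a fully crossed persons-by-raters design with one score per cell and estimates variance components from the ANOVA mean squares. The study itself used the GENOVA program with additional facets, so the function `variance_components` and the sample ratings are illustrative assumptions only.

```python
# Illustrative sketch only: variance components for a crossed
# persons x raters design with one score per cell.

def variance_components(scores):
    """scores[p][r] is the rating given to person p by rater r."""
    n_p = len(scores)
    n_r = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]

    # Sums of squares for persons, raters, and the residual.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations; negative estimates
    # are set to zero, a common convention.
    return {
        "person": max((ms_p - ms_res) / n_r, 0.0),
        "rater": max((ms_r - ms_res) / n_p, 0.0),
        "residual": max(ms_res, 0.0),
    }


# Hypothetical ratings: three students rated by two raters.
ratings = [[1, 2], [3, 4], [5, 6]]
print(variance_components(ratings))
```

In this made-up data set the second rater scores every student exactly one point higher, so all of the error is attributed to the rater main effect and none to the residual.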
Inter-Rater Reliability. Inter-rater reliability is the form of
reliability that seeks to establish agreement between individuals
who are scoring data pieces. When a measure does not have scaled
response options such as true-false or multiple choice, it is
essential to establish inter-rater reliability. This is done to ensure
that the rating by different individuals remains the same across
cases. For example, if inter-rater reliability has been established in
the scoring of writing samples, one would expect that the scores of
a given piece would be the same in most if not all cases from
different raters. As a result of developing inter-rater reliability,
subjectivity is limited (Johnson, 1985).
Measurement Error. Measurement error is a component of
classical test theory which states that only one source of error can
be estimated at a time.
Relative Decisions. Relative decisions are based only on those
sources of error affecting the relative standing of individuals and
are used to rank order individuals or groups.
Reliability. Reliability refers to the consistency of
measurements (Pagano, 1990) and is generally considered to be
synonymous with dependability (Shavelson & Webb, 1991). Test results need to
be dependable; therefore, they should be reproducible, stable
(reliable), and meaningful (valid). Reliability is expressed by a
reliability coefficient or by the standard error of measurement,
which is derived from the reliability coefficient. Classical true score
theory and generalizability
theory are two ways to evaluate reliability.
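The relationship between the reliability coefficient and the standard error of measurement can be shown with a short Python sketch. It uses the standard classical test theory formula, SEM = SD × sqrt(1 − reliability). The function name and the example values are hypothetical and are not results from this study.

```python
import math

# Illustrative sketch only: standard error of measurement (SEM)
# from a score standard deviation and a reliability coefficient.

def standard_error_of_measurement(sd, reliability):
    """Return SD * sqrt(1 - reliability) for a reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical scale with SD = 10 and reliability = 0.84.
print(round(standard_error_of_measurement(10.0, 0.84), 2))
```

The more reliable the scores, the smaller the standard error of measurement; a perfectly reliable measure would have a standard error of zero.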
Validity. Validity refers to a judgment concerning how well a
test measures what it purports to measure (Pagano, 1990).
Assumptions and Limitations
Generalizability theory makes the following assumptions when
considering a set of data.
1. The population is assumed to consist of persons.
2. The universe of admissible observations and the universe of
generalizations involve conditions from the same single facet.
3. The described facets, especially persons, are considered to
be essentially infinite ($N \to \infty$).
4. Generalizability theory makes no assumptions about the
distributional form of the data. This study assumes that the data
set is a representative sample of the universe.
5. The model assumes that the number of items is constant in
each content category or subscale of an instrument.
6. The G and D-study designs are basically the same. This does
not mean that the data are available from two different studies with
similar designs. It refers to the concept that the "estimated random
effects variance components are available from a G-study and ... they
can be used to make inferences to an infinite universe of
generalizations, based on applications of basically the same design"
(Brennan, 1983, p. 55).
GENOVA is a statistical program designed to analyze large sets
of data and consider all identified sources of error along with their
specific interactions. GENOVA, however, will not compute
coefficients for unbalanced data sets and/or unequal groupings. Data
sets containing unequal groupings must be balanced by randomly
selecting and discarding data from the larger groups.
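The balancing step described above can be sketched in Python. This is a minimal illustration that assumes each group is a list of records and that every group is reduced to the size of the smallest one. The function `balance_groups`, the seed, and the sample data are hypothetical and do not reproduce the procedure or data used in this study.

```python
import random

# Illustrative sketch only: balance unequal groups by randomly
# discarding records until every group matches the smallest group.

def balance_groups(groups, seed=0):
    """groups maps a group name to a list of records."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    target = min(len(records) for records in groups.values())
    # Sample without replacement; the records not drawn are discarded.
    return {name: rng.sample(records, target)
            for name, records in groups.items()}


# Hypothetical student IDs grouped by subsample.
sample = {"group_a": [1, 2, 3, 4, 5], "group_b": [6, 7, 8], "group_c": [9, 10, 11, 12]}
balanced = balance_groups(sample, seed=1)
print({name: len(records) for name, records in balanced.items()})
```

Recording the seed makes the random discard reproducible, which matters because a different draw can yield slightly different variance component estimates.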
CHAPTER II
REVIEW OF THE LITERATURE
This review focuses on the issues related to the identification
process of economically disadvantaged and limited English proficient
children for gifted and talented educational programs. Literature
pertaining to assessment-related issues, including the reliability
of rating scales in the identification of economically disadvantaged
and limited English proficient gifted children, teachers and parents
as raters, and the use of the TABS, is also discussed. Finally, a
rationale for generalizability theory as the preferred method of data
analysis rather than classical true score theory is presented.
Issues Related to the Identification of Gifted Minority Children: An Overview
The identification of children from economically
disadvantaged and limited English proficient backgrounds for
participation in programs for gifted and talented students
continues to pose problems for educators. While research (Baldwin,
1991; Clark, 1992; Davis & Rimm, 1994; Gallagher, 1994b; Kitano &
Kirby, 1986; Marland, 1972) has illustrated the potential for
giftedness in every segment of society, the underrepresentation of
economically disadvantaged and limited English proficient students
in gifted and talented programs exists to this day. Literature
suggested giftedness is a complex, multifaceted phenomenon, yet
more traditional and current practices across the nation define and
look for giftedness through the dominant use of intelligence and
achievement scores (Ford & Harris, 1990; Frasier, 1990a; Hunsaker
& Callahan, 1993; Treffinger & Renzulli, 1986).
Because many definitions and theories of giftedness are
grounded in psychometrics, educators rely heavily on intelligence
and achievement tests alone to decide who is gifted. Since many
minority students often score poorly on traditional intelligence and
achievement tests, many of these students are unlikely to be
identified as gifted (Ford & Harris, 1996).
When intelligence and achievement scores are the only
instruments utilized in the identification process for gifted and
talented individuals, giftedness is considered to be a static and
closed phenomenon and students must "fit" this definition. For
example, in some states, an individual scoring at the 99th percentile
on a standardized measure of mental ability is considered gifted.
Consequently, significant numbers of individuals from culturally
diverse, economically disadvantaged, bilingual, and rural
backgrounds are not placed in gifted programs, not because of lack of
cognitive, motivational, artistic, or creative potential, but because
traditional criteria do not assess the skills, knowledge, or
aptitudes they do possess. The use of intelligence and achievement
scores as the only identification criteria has long been an issue in
gifted education (Frasier, 1991; Frasier & Passow, 1994; Office of
Educational Research and Improvement, 1993).
Research by Gardner (1983) and Sternberg (1985) indicated
that intelligence (e.g., creativity, interpersonal intelligence) cannot
be adequately measured by traditional intelligence and achievement
tests. Gardner (1983) proposed a theory of multiple intelligences
that includes seven relatively independent intelligences:
linguistic, musical, logical-mathematical, spatial, bodily-kinesthetic,
interpersonal, and intrapersonal. He also suggested that
gifted students should be assessed within a framework that
considers the gifted students' cultural and ethnic background and the
quality and quantity of their learning opportunities.
Gardner (1983) proposed utilizing one of the intelligences that
is well developed as an alternative learning mode for other
intelligences not as developed. This use of the multiple
intelligences supporting one another creates a learning environment
through which gifted students can display their talents.
Sternberg (1985) theorized a triarchic concept of intelligence:
the internal world of the student, the external world of the student,
and the interaction between these two worlds on the student's
experience. The internal world is exemplified by analytical thinking;
the external world is exemplified by contextual thinking (strategies);
and the interaction between these two worlds on the student's
experience is exemplified by dealing with experiences in insightful ways. The
triarchic theory includes three kinds of mental processes: (a)
metaprocesses, used to plan, monitor, and evaluate one's problem
solving; (b) performance processes, used to carry out the
instructions of the metaprocesses; and (c) knowledge-acquisition
processes, used to figure out how to solve problems.
Issues to Consider When Assessing Students From Diverse Backgrounds
Behavioral Differences
Children from diverse backgrounds exhibit various cultural
differences. Behavior, cognitive style, and learning style should be
considered when evaluating children for giftedness because these
individual differences often work against the child from a diverse
background. For example, cultural deprivation may affect the
development of talent.
Frierson (1965) designed a study of differences between upper
and lower status students to determine
the effects of cultural deprivation on talent development. The
students were divided into four groups: (a) upper status gifted
students, (b) lower socioeconomic status gifted students, (c) upper
status average students, and (d) lower socioeconomic status average
students. Frierson concluded that differences between the two
groups of gifted students were clearly associated with differences
in their socioeconomic status.
Delgado-Gaitan and Trueba (1985) observed teachers working
with students in various classroom activities. The teachers were
asked to rank the students according to "worthwhile" activities.
Delgado-Gaitan and Trueba concluded that the teachers participating
in the study did not give students a high rating if the teachers did
not value the particular learning style in which the student was
working.
Learning styles that do not exemplify those represented by the
majority population in classrooms in the United States add to the
perception that children from minority groups are not candidates for
gifted programs. Evidence of characteristics associated with
giftedness may be different in minority children, yet educators are
seldom trained in identifying those behaviors in ways other than the
way the characteristics are observed in the majority culture
(Ramirez, Herold, & Castaneda, 1974).
Language Differences
One of the greatest issues in the assessment of children from
diverse backgrounds for gifted programs is language. Taylor (1990)
suggested that language is a great determiner of the perception of
ability about an individual. Therefore, a lack of knowledge,
sensitivity, or appreciation of diverse communication styles can
result in inappropriate assessment. For children whose first
language is not English, observed scores are often the result of lack
of experience with English rather than lack of comprehension of
ideas and concepts (de Bernard, 1985). An understanding of this may
be useful when assessment processes include writing samples,
standardized intelligence test scores which are verbally loaded,
and/or achievement subtests with strong language-dependent
components (Damico, 1985).
Differences in Cognitive Styles
An additional cultural attribute is cognitive style. As a result
of observations made by Tonemah and Brittan (1985), strong tribal
perspectives were associated with the concept of giftedness in the
description of gifted attributes of Native American students.
Characteristics for gifted Native American students were described
as: (a) acquired skills in language, learning, and technological skills;
(b) tribal/cultural understanding referring to their exceptional
knowledge of ceremonies, tribal traditions, and other tribes; (c)
personal/human qualities such as high intelligence,
visionary/inquisitive/intuitive, respectful of elders, and creative
skills; and (d) aesthetic abilities, referring to unusual talents in the
visual and performing arts, and arts based in the Indian culture.
Shade (1991) presented a different view of the cognitive
competencies of African-American students. She concluded that
African-American students appear to have high motoric capabilities
and use visual perception as a way of protecting and orienting
themselves in the environment rather than for gathering
information. African-American students are largely trained to
concentrate more on people and have a preference for affective
materials and a high level of social interaction in their learning
environments.
Ramirez and Castaneda (1974) examined cognitive style and
found that teaching styles used in the classroom may not agree with
the cognitive styles of the students. Ramirez and Castaneda (1974)
observed teachers with different teaching styles. All teachers were
asked to teach the same concept in their respective classrooms
with their own individual teaching styles. The results of their study
suggested that not all individuals learn through the same cognitive
method.
Beyond having implications for classroom practices, the
research by Ramirez and Castaneda (1974) provided implications for
assessment. Observed scores may be skewed if an assessment
instrument requires the use of a particular cognitive style and the
cognitive style of the child is different. Determining the cognitive
style of children may provide a context from which to interpret
standardized test scores.
Survey Instruments
In an effort to address the identification of minority children,
survey instruments were developed to assess the skills, knowledge
and/or aptitudes not assessed in traditional instruments. Some of
the screening tools that have been used successfully by school
districts are the Bella Kranz Multidimensional Screening Device
(Kranz, 1978); the Baldwin Identification Matrix (Baldwin, 1980);
Scales for Rating the Behavioral Characteristics of Superior
Students (Renzulli & Hartman, 1971); GIFTS Talent Identification
Procedures (Perrone & Male, 1981); and the TABS (Frasier, 1990b).
Renzulli and Hartman (1971) developed the Scales for Rating
the Behavioral Characteristics of Superior Students. Raters are
asked to rate the child according to behavioral characteristics. Each
behavioral characteristic contains 8-10 components. A Likert-type
scale of 1-5 (1 = seldom or never, 5 = almost always) focusing on
specific student behaviors is utilized. Scoring sheets and
interpretation materials are included with the instrument.
GIFTS Talent Identification Procedures (Perrone & Male, 1981)
is another screening tool consisting of one or more behavior rating
sheets plus scoring and interpretation materials. Raters are asked
to rate the child only in the areas of talent that they have had the
opportunity to observe. Some of the areas that could be assessed are
mathematics, English, music, science, reading, interpersonal
relations, and art.
Instruments used for identification must be
carefully selected. The validity and reliability, the target
population and the limitations of the instrument should be of prime
importance to the educator (Hanson & Linden, 1990). When
checklists and nomination forms are utilized, they should be
sensitive to all reading levels and take into consideration the native
language of the parents. Specific examples and descriptors of how
the characteristics under consideration are exhibited by minority
students need to be understood. It is recommended that teachers and
parents complete the same checklists in order to explore
consistencies or discrepancies in their responses (Ford & Harris,
1996).
Shaklee et al. (1994) suggested that the best way to identify
young gifted and talented students who are minority or economically
disadvantaged is to base observation and assessment procedures on
universal identifiers of intellectual potential. The display of
potential giftedness does not just occur in school; potential
giftedness is a 24-hour phenomenon. Persons inside and outside the
educational environment should be involved in any process to
identify children with extraordinary gifts and talents.
Many alternative forms of assessment (surveys, portfolios,
oral examinations, open-ended questions, essays) rely heavily on
multiple raters. Multiple raters can improve reliability just as
multiple test items can improve the reliability of standardized
tests. Choosing and training reliable raters can further improve the
reliability and accuracy of instruments that depend on the use of
raters (Chambliss & Melmed, 1990; Foster-Gaitskell & Pratt, 1989;
Hicks, 1988; Houston, Raymond & Svec, 1991; Wright & Piersel,
1992).
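The parallel between adding raters and adding test items can be illustrated with the Spearman-Brown prophecy formula from classical test theory. The sketch below is an illustration only: it assumes the raters are interchangeable (parallel), the function name is hypothetical, and the single-rater reliability of .50 is an invented value, not a result from any study cited here.

```python
def spearman_brown(single_rater_reliability: float, n_raters: float) -> float:
    """Projected reliability of the mean of n_raters parallel ratings."""
    r = single_rater_reliability
    return (n_raters * r) / (1.0 + (n_raters - 1.0) * r)


# Hypothetical single-rater reliability of .50 (illustrative value only).
for k in (1, 2, 4):
    print(k, round(spearman_brown(0.50, k), 3))
```

Under these assumptions the projected reliability rises with each added rater, but with diminishing returns, which is the same pattern expected when items are added to a test.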
Hicks (1988) evaluated the Parent/Professional Preschool
Performance Profile observational scale through which preschool
teachers and parents evaluate the behavior of disabled or
nondisabled children in natural settings while the children interact
with familiar adults over prolonged periods of time. Developmental
skills and interfering behaviors were the two main categories
observed and rated. After rater training was provided, the involved
teachers and parents rated the scales.
Foster-Gaitskell and Pratt (1989) compared the adaptive
behavior ratings of children with mental retardation by either
parents or teachers using the Adaptive Behavior Scale-School
Edition. All raters were given prior training concerning the
behaviors to be rated. Their findings indicated no significant
differences in parents' and teachers' ratings in the categories
assessed or in the importance ascribed to the behaviors evaluated.
Foster-Gaitskell and Pratt attribute the reliability of the raters to
the prior training.
Teachers as Raters
Teachers play an important role in the identification of
students for educational programs (Epkins, 1993; Milich & Landau,
1988; Pelham, Gnagy, Greenslade, & Milich, 1992). Because they are
able to observe behaviors exhibited in a task-oriented, academic
situation, teachers are often considered appropriate raters of
children's behaviors (Newcorn et al., 1994). Teachers' ability to
make accurate observations is critical in creating a group of
students to be considered for gifted program participation.
However, there has been continuing skepticism about the ability of
teachers to recommend students for an educational program,
especially when they have had no training (Borland, 1978; Clark,
1992; Davis & Rimm, 1994; Gallagher, 1994; Pegnato & Birch, 1959;
Stanley, 1976; Stone & Rosenbaum, 1988). Davis and Rimm (1994)
reminded us that although teacher nominations are widely utilized,
they are among the least reliable and least valid measures used to
identify gifted students.
Teachers' expectations of gifted students are often influenced
by their values and beliefs, thereby significantly influencing their
decisions, including referrals for gifted programs. Utilizing
teachers as primary identifiers of gifted learners carries numerous
implications for the recruitment and retention of minority students,
particularly because many teachers are not substantively prepared in
gifted and multi-cultural education. This lack of preparation and
experience decreases the probability that gifted minority students
will be identified and placed in a gifted program (Ford & Harris,
1996; Hansen & Feldhusen, 1994).
A study conducted by Epkins (1993) compared teacher rating
scales with self-report rating scales of depression, anxiety and
aggression in a sample of elementary school children. While this
study found a significantly high level of agreement (teacher and
self-report) for the identification of externalizing symptoms for both
groups, a significant level of agreement for internalizing symptoms
was found for one sample only. Epkins (1993) concluded that a
possible reason for the rater inconsistency might be that many
teachers considered the completion of the rating scale to be too
time-consuming. A meta-analysis conducted by Achenbach,
McConaughy, and Howell (1987) was consistent with Epkins' findings; low
correspondence between teacher reports and child self-reports
existed in most studies involved in the meta-analysis.
Russikoff's study (1994) illustrated that perceptions of raters
can influence their decisions on rating scales. Examinations written
by limited-English speakers were examined, particularly in the
context in which English writing skills were holistically assessed.
The study revealed a lack of interrater reliability, raters'
perceptions of their role, a reductive approach to scoring, imprecise
criteria for scoring, confusion between inaccurate and non-standard
structures, and clear prejudice based on the fact that the examinee
was a student of English as a Second Language (ESL). Most raters
felt non-native speakers of English should meet the same criteria
for English writing skills as native speakers, and declared that they
graded ESL writers as they would native speakers.
To increase the ability of teachers to accurately identify
giftedness in students, thereby increasing their performance in the
role of a rater, the teachers must be provided with the information
that guides their participation. Frasier (1990c) recommended that
staff development be provided to raters, because many teachers hold
stereotypes about gifted students as only well-behaved and
academically successful students. Often these teachers are unlikely
to refer gifted underachieving students and those students who are
currently misbehaving. Training in gifted education can increase
teachers' understanding, awareness, and competence in recognizing
gifted behaviors (Hansen & Feldhusen, 1994).
Weigle (1994) presented a study on rater training that
involved the analysis of ratings given to English-as-a-Second-Language
compositions by eight inexperienced and eight experienced
raters both before and after rater training. Each essay was read by
two raters, an inexperienced rater and an experienced rater, during
the first (pre-training) section of the study. After the first section
of the study, all raters attended mandatory composition rater
training. Findings indicated that before the training was presented,
the raters as a group differed quite significantly from one another in
terms of severity, but after the training, a clear distinction between
raters was no longer visible. The rater differences evened out
somewhat across the group after training. Rater consistency
improved, and rater extremism was reduced. Results of this study
confirmed that rater training cannot make raters into duplicates of
one another, but it can make them more consistent.
Parents as Raters
Parents are also valued as raters (Cornell, 1994; Gilbert,
1994); however, a number of researchers (Barber & Cernik, 1976;
Christensen, Phillips, Glascow, & Johnson, 1983; Eisenstadt, 1994;
Forehand, Wells, McMahon, Griest, & Rogers, 1982; Rickard, Forehand,
Wells, Griest, & McMahon, 1981; Wall & Paradise, 1981;
Webster-Stratton, 1988) have cautioned against overreliance on parents'
perceptions of their children's behaviors and have suggested that
mothers and/or fathers may inaccurately label their children due to
their own personal adjustment problems, including depression,
anxiety, and marital dissatisfaction.
Adults appeared to have different perceptions of a child when
rating the same child on the Barber Scales of Self-Regard (Barber,
1976). Parents of the child disagreed on portions of the scale. The
results of this study indicated that parents had different
perceptions of their child. Barber concluded that, perhaps, both
parents were correct and the child was, in reality, somewhere in
between the levels described by the scale points.
Wall and Paradise (1981) compared mother and teacher reports
on two scales from the Adaptive Behavior Inventory for Children of
the System of Multicultural and Pluralistic Assessment. Results
indicated little agreement between mother and teacher reports.
Mothers tended to provide higher ratings of adaptive behaviors than
did teachers, irrespective of grade level.
Parental perception about the behaviors and characteristics of
the child within the family may differ. Eisenstadt (1994) conducted
a study to investigate interparental agreement of the Eyberg Child
Behavior Inventory. In this study, mothers rated their children's
disruptive behavior as more frequent and more problematic than did
fathers. Eisenstadt suggested that parents receive training before
rating their children.
When parents are utilized as raters, they, too, should
receive training in order to recognize the different
traits of giftedness, learn definitions of giftedness, and gain a thorough
understanding of the identification process (Au & Punfrey, 1993;
Kaplan, 1993). After in-service training, both parents and teachers will be
better prepared to accept the concepts associated with expanded
views of giftedness, to understand more accurately those behaviors
indicating gifted potential, and to determine a variety of objective
and subjective data sources to be used in identification (Chambliss &
Melmed, 1990; Hansen & Feldhusen, 1994; Williams & Hartlage,
1988).
Reliability and Validity of Survey Instruments
There are numerous threats to the reliability of scores based
on ratings (Sigafoos & Pennell, 1995). Individuals being rated may
not be performing in their usual manner. The situation or task may
not elicit typical behavior or the raters may be unintentionally
distorting the results. Some of the rater effects are:
1. The halo effect. The impressions that an evaluator forms
about an individual on one dimension can influence his or her
impressions of that person on other dimensions. Nisbett and Wilson
(1977), for example, made two videotapes of the same professor. In
the first video, the professor acted in a friendly manner. In the
second video, the professor behaved arrogantly. Students watching
the friendly tape rated the professor more favorably.
2. Stereotyping. The impressions that an evaluator forms
about an entire group can alter his or her impressions about a group
member. In other words, a principal might find a mathematics
teacher to be precise because all mathematics teachers are supposed
to be precise (Hambleton & Powell, 1983).
3. Perception differences. The viewpoints and past
experiences of an evaluator can affect how he or she interprets
behavior. In a classic study, Dearborn and Simon (1958) asked
business executives to identify the major problem described in a
detailed case study. The executives tended to view the problem in
terms of their own departmental functions.
4. Leniency/stringency error. When a rater does not have
enough knowledge to make an objective rating, he or she may
compensate by giving scores that are systematically higher or lower
(Chen, 1993).
5. Scale shrinking. Some raters will not use the end of any
scale (Chen, 1993).
Inter-rater reliability is essential if more than one individual
is to be involved in the scoring of data pieces. Without inter-rater
reliability, data from non-scaled measures are unusable (Marsden,
1993). Frasier (1990b) reminded us that to establish inter-rater
reliability, the following guidelines can be followed:
1. Have raters independently score multiple randomly selected
data pieces.
2. Chart scores on each data piece.
3. Identify the response items on which all raters agree.
4. Form a consensus on the interpretation for scoring.
5. Score another set of randomly selected data pieces.
6. Repeat steps 2 to 5 until each individual is in
agreement on at least 90 percent of the items 90 percent of the
time.
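The 90 percent stopping rule in the guidelines above can be sketched as a simple agreement check. This is one possible reading, stated as an assumption: agreement on an item is taken to mean that every rater gave the same score, and the criterion is taken to be met when at least 90 percent of data pieces reach 90 percent item agreement. The function names and the sample ratings are hypothetical.

```python
from typing import List

Ratings = List[List[int]]  # one data piece: ratings[rater][item]


def item_agreement(piece: Ratings) -> float:
    """Fraction of items on which every rater assigned the same score."""
    n_items = len(piece[0])
    agreed = sum(
        1 for i in range(n_items) if len({rater[i] for rater in piece}) == 1
    )
    return agreed / n_items


def criterion_met(pieces: List[Ratings],
                  item_level: float = 0.90,
                  piece_level: float = 0.90) -> bool:
    """True when enough data pieces reach the required item agreement."""
    passing = sum(1 for p in pieces if item_agreement(p) >= item_level)
    return passing / len(pieces) >= piece_level


# Invented example: two raters, ten items, disagreement on the last item only.
example = [
    [5, 4, 4, 3, 5, 5, 4, 4, 3, 5],
    [5, 4, 4, 3, 5, 5, 4, 4, 3, 4],
]
print(item_agreement(example))
print(criterion_met([example]))
```

Simple percent agreement does not correct for agreement expected by chance, so it is best read as a training checkpoint rather than as a reliability coefficient.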
One should expect that the initial data set will have divergent
ratings. As subsequent sets are rated, agreement will increase. If
long periods of time elapse between scoring sessions, it might be
necessary to re-establish inter-rater reliability.
Training is needed to establish inter-rater reliability. With
training, inter-rater reliability is established and raters provide
reliable scores (Chambliss & Melmed, 1990; Foster-Gaitskell &
Pratt, 1989; Hanson & Linden, 1990; Houston, Raymond & Svec, 1991;
Wright & Piersel, 1992). Generalizability theory provides the
process to observe inter-rater reliability.
Generalizability theory includes and extends classical test
score theory and is able to estimate the magnitude of multiple
sources of error simultaneously. Unlike classical test score
analyses, generalizability theory analyzes sources of error
variance and interactions among these sources simultaneously.
Classical test score theory can consider only a single source of
error at a time. It also cannot consider the completely
independent or separate interaction effects of the sources of
measurement error variance.
Generalizability theory consists of two stages. The first
stage, the G-study, generates results that are generalizable to the
population of interest. The second stage, the D-study, is conducted
to determine the most effective protocol to collect data with a
desired degree of reliability (Thompson, 1989). Generalizability
theory provides a powerful method for examining test reliability,
thus providing accurate generalizations.
CHAPTER III
METHODOLOGY
Participants
All data for this study came from information supplied on the
TABS, which was previously requested by the school district. This
study utilized information supplied by teachers and parents.
Teachers
The raters consisted of regular third grade classroom teachers
and third grade gifted and talented program teachers in one school
district. A regular third grade classroom teacher and a third grade
gifted/talented teacher completed the TABS for each student
nominated for the gifted and talented program in the school district.
Since the possibility existed that more than one student in any
particular third grade classroom could be nominated for the gifted
and talented program, each completed student form was not
necessarily from a different classroom teacher or gifted and
talented program teacher. There is a possibility that the same
teacher (either regular classroom teacher and/or gifted and talented
program teacher) completed forms for numerous students. The
researcher did not have access to the students who were rated, the
regular classroom teachers, gifted program teachers or parents who
were the raters.
The regular classroom teachers (N=89) completing the TABS
had between 1 and 20 years of classroom teaching experience (M =
7.9 years, SD = 5.9). Each teacher reported currently having
students identified as gifted in the classroom. The teachers
indicated that they had received training to recognize gifted
characteristics in students through district in-service training and
university coursework.
The gifted and talented program teachers (N=19) had between 4
and 20 years of teaching experience (M = 7.8 years, SD = 5.0)
involving gifted students. Each gifted and talented program teacher
also received training to recognize gifted characteristics in
students through district in-service training and university
coursework.
Every school in the district participated in the nominating
process and each school was considered by the district to be
ethnically and socioeconomically diverse.
Students
One hundred forty-nine third grade students nominated for the
district's gifted and talented program were randomly selected from
the nominated third grade students. Twenty-two forms were
excluded due to insufficient information; therefore, 127 (73 Anglo,
6 Asian, 16 African-American, 32 Hispanic) third grade students (65
males, 59 females) nominated for the district's gifted and talented
program participated in the study.
The required criterion for a student to be nominated for the
gifted/talented program was that the student be in the third grade.
The student could "self-nominate," or any individual (within the
school or outside the school environment) could nominate the
student.
Parents
Information was supplied on the TABS by 127 parents or
guardians (one associated with each participating student). Eight hours
of training were provided to parents by a gifted and talented program
teacher. The parents were trained to identify gifted
characteristics in students. The parents received the same training
as the teachers. Each parent completed the TABS first in February
and again in April.
Instrument
The TABS (Frasier, 1990b) was designed to document
behavioral observations in students to identify giftedness in
culturally different groups (Appendix A). Frasier (1990b) stated
that the TABS meets the requirements of best practices through its
focus on diversity in the gifted population and involvement of people
inside and outside the school.
The TABS was generated during years one and two of The
National Research Center on the Gifted and Talented (NRC/GT)
research project at The University of Georgia. This instrument was
developed with the view that giftedness is a construct. The
psychological concept underlying the TABS is that giftedness is not
itself directly measurable but must be inferred (Bernal, 1978;
Frasier, 1990b; Gardner, 1983; Renzulli, 1973; Torrance, 1969).
Because giftedness is defined as a construct, its inference is carried
out through the observation and measurement of traits, aptitudes
and behaviors believed to demonstrate giftedness (Frasier, 1990b).
The TABS was eventually added to the Staff Development Model
(SDM), also utilized by The University of Georgia as a comprehensive
training model designed to provide educators with background
information on giftedness as a psychological construct.
The TABS is a 10-item instrument using a Likert-type scale of
1-5 (5=strong, 1=weak) focusing on specific student behaviors. The
rater is asked to rate the student being referred for assessment on
each of the following items believed to indicate giftedness (Bernal,
1978; Frasier, 1990b; Gardner, 1983; Renzulli, 1973; Torrance,
1969):
a. communication: unusual ability to communicate (verbally,
nonverbally, physically, artistically, symbolically); uses
particularly apt examples, illustrations, or elaborations;
b. motivation: persistent in pursuing/completing self-selected
tasks (may be culturally influenced) evident in
school or non-school type activities; enthusiastic learner;
has aspirations to be somebody, do something;
c. interests: unusual or advanced interests in a topic or
activity; self-starter; pursues an activity unceasingly;
beyond the group;
d. problem solving ability: unusual ability to devise or adapt
a systematic strategy for solving problems and to change
the strategy if it is not working;
e. memory: already knows; 1-2 repetitions for mastery; has a
wealth of information about school or non-school topics;
pays attention to details; manipulates information;
f. humor: keen sense of humor that may be gentle or hostile;
large accumulation about emotions; heightened capacity for
seeing unusual relationships; unusual emotional depth;
openness to experiences; heightened sensory awareness;
g. inquiry: asks unusual questions for age; plays around with
ideas; extensive exploratory behaviors directed toward
eliciting information about materials, devices or
situations;
h. insight: has exceptional ability to draw inferences;
appears to be a good guesser; is keenly observant;
integrates ideas and disciplines;
i. reasoning: ability to make generalizations; ability to use
metaphors and analogies; can think things through in a
logical manner; critical thinker; ability to think things
through and come up with a plausible answer; and,
j. imagination/creativity: shows exceptional ingenuity in
using everyday materials; is keenly observant; has wild,
seemingly silly ideas; fluent and flexible producer of ideas;
is highly curious.
Over 300 public and non-profit private elementary and
secondary school districts (Frasier, 1990c) representing various
ethnic, demographic and socioeconomic groups throughout the country
have served as major research sites utilizing the TABS. No
published research is available concerning the TABS.
Procedure
Each year the school district requests that the regular
classroom teacher, the gifted and talented program teacher, and the
parent complete the TABS for each student nominated for the gifted
and talented program. Each of the 127 students in this study had 4
TABS forms: one completed by the classroom teacher, one completed
by the gifted and talented program teacher and two completed by the
parent (the two forms completed by the parent were used in this
study to examine the occasion facet). Each teacher completed the
TABS once and each parent completed two ratings of the TABS.
Gathering the second parent rating is the customary procedure of
the school district. The second rating involved the same parents
who participated in the first rating.
Third grade students were nominated for the gifted/talented
program during January. The parents of the nominated students
were invited to participate in an 8-hour in-service training taught
by a gifted and talented program teacher. The training was designed
to familiarize the parents with the characteristics of gifted
students.
The training consisted of films illustrating students
participating in a classroom situation. Discussions were conducted
focusing on the gifted characteristics illustrated by the students in
the film. Assignments and curriculum were briefly discussed,
highlighting higher-level thinking and the products associated with
higher-level thinking. The problems of identifying the culturally
diverse, handicapped or educationally atypical gifted student were
also discussed. Of particular benefit (according to the evaluation
form completed at the end of the session) were the question and
answer sessions because many parents seemed to have the same
questions. Upon the completion of the in-service training, each
parent was asked by the school district to complete a TABS for
his/her child.
The regular classroom teacher and the gifted and talented
program teacher were asked to attend the same 8-hour in-service
training as the parents. The training was taught by the same gifted and
talented program teacher who taught the parents and the curriculum
was the same as that presented to the parents.
The regular classroom teacher completed a TABS for each
student nominated from her classroom. The gifted and talented
program teacher observed the nominated student in her respective school
while the student worked in the regular classroom. She completed a
TABS after the classroom observation. There was no time limit
imposed on the teacher's observation time of the nominated student.
The teacher was allowed as much time as necessary to observe the
student in order to complete the TABS.
Data Analysis
Data were analyzed according to generalizability theory using
GENOVA (Brennan, 1992). Generalizability theory contributes to
the generalizability of results across various dimensions or facets and
reports on the quality of the data (Brennan, 1992). Generalizability
theory was selected for this study because, in the
behavioral and social sciences, such as psychology and education,
it offers an extensive conceptual framework
and a powerful set of statistical procedures for addressing
numerous measurement issues. To an extent, generalizability theory
can be viewed both as an extension of classical test theory and as
an application of certain analysis of variance procedures to
measurement models involving multiple sources of error.
A G-study is first conducted with the data applicable to the
facets being investigated; the G-study is conducted to estimate
variance components associated with a universe of admissible
observations. These estimated variance components can then be
used to estimate results for various D-study designs and universes
of generalization. For any D-study, one must specify the objects of
measurement, define the universe of generalization, and identify the
sample sizes and structure of a D-study design. The typical
quantities that are estimated for a specified D-study design and
universe of generalization include D-study design variance
components, universe score variance, error variances, and a
generalizability coefficient. Usually the magnitudes of these
quantities will vary for different universes of generalization and
for designs that differ in terms of sample sizes and/or structure
(Brennan, 1992).
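The G-study step described above can be sketched for the simplest case, a fully crossed persons x raters design with one score per cell. This is not GENOVA and not the design analyzed in this study (which also involves items and occasions); it is a minimal illustration, assuming the standard expected-mean-square equations for a random-effects two-way design, with a hypothetical function name and invented scores.

```python
from typing import Dict, List


def g_study_p_by_r(scores: List[List[float]]) -> Dict[str, float]:
    """Estimate variance components for a crossed persons x raters design.

    scores[p][r] is the rating given to person p by rater r.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]

    # Mean squares from the two-way ANOVA without replication.
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum(
        (scores[p][r] - p_means[p] - r_means[r] + grand) ** 2
        for p in range(n_p)
        for r in range(n_r)
    )
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Solve the expected-mean-square equations for the components.
    return {
        "person": (ms_p - ms_res) / n_r,
        "rater": (ms_r - ms_res) / n_p,
        "residual": ms_res,  # person x rater interaction confounded with error
    }


# Invented ratings: three students (rows) rated by two raters (columns).
print(g_study_p_by_r([[1, 2], [3, 4], [5, 6]]))
```

Estimates obtained this way can come out negative in small samples; how such values are handled is a separate decision that the sketch leaves open.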
Three G studies were examined: (1) teacher ratings, (2) parent
ratings, and (3) combined parent and teacher ratings. Finally, G
studies were conducted on each of the total samples (teacher,
parent, combined teacher and parent) to examine the difference
across ethnic groups (Anglo, Asian, African-American, Hispanic).
The data were analyzed using the GENeralized analysis Of
VAriance system (GENOVA). GENOVA enables the researcher to
isolate major sources of measurement error and to conduct decision
study (D-study) analysis (Shavelson & Webb, 1991). The D-study
attempted to provide the following information about the TABS:
1. The minimum number of raters necessary to maintain the
psychometric integrity of the TABS,
2. The effect on reliability and measurement precision of
using more or fewer than two raters,
3. The effect on reliability and measurement precision of
using more or fewer than two rating occasions, and
4. The effect on reliability and measurement precision of
using more or fewer than ten items.
This study examines the Phi coefficient, not the G coefficient.
The Phi coefficient is appropriate for absolute decisions and the G
coefficient for relative decisions. An absolute decision
indexes the absolute level of an
individual's performance without regard to how well or poorly the
individual's peers performed. Relative decisions are used to rank order
individuals or groups. The G coefficient therefore attends to the
standing of an individual relative to others, while the Phi
coefficient focuses on the absolute value of an individual's
performance (Shavelson & Webb, 1991).
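The difference between the two coefficients, and the way a D-study varies the number of raters and items, can be sketched for a crossed students x raters x items design. The sketch assumes the standard random-effects formulas for relative and absolute error variance; the occasion facet is omitted for brevity, the function and key names are hypothetical, and the variance components are invented values, not estimates from this study.

```python
from typing import Dict


def d_study(vc: Dict[str, float], n_r: int, n_i: int) -> Dict[str, float]:
    """G and Phi coefficients for n_r raters and n_i items (p x r x i design)."""
    # Relative error: only components that interact with students.
    rel_error = vc["pr"] / n_r + vc["pi"] / n_i + vc["pri_e"] / (n_r * n_i)
    # Absolute error: adds the rater, item, and rater x item components.
    abs_error = rel_error + vc["r"] / n_r + vc["i"] / n_i + vc["ri"] / (n_r * n_i)
    return {
        "g": vc["p"] / (vc["p"] + rel_error),
        "phi": vc["p"] / (vc["p"] + abs_error),
    }


# Invented variance components, for illustration only.
components = {"p": 0.10, "r": 0.02, "i": 0.01,
              "pr": 0.05, "pi": 0.02, "ri": 0.01, "pri_e": 0.20}

for raters in (1, 2, 3, 4):
    result = d_study(components, n_r=raters, n_i=10)
    print(raters, round(result["g"], 3), round(result["phi"], 3))
```

Because absolute error contains every term in relative error plus the facet main effects, Phi can never exceed the G coefficient for the same design, which is why Phi is the more conservative index for absolute decisions.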
Significance of the Study
The information gathered from the first question being
investigated will contribute to the methods of data collection
utilized in the identification of gifted children. By considering the
facets discussed, the data collected in the data sets will contribute
to information concerning the reliability of the instrument. The
information gained will contribute to the research of instruments
designed to identify giftedness in children.
In particular, information gained from the final two research
questions will contribute to knowledge of the reliability and
measurement precision of using more or fewer than two raters and
more or fewer than two rating occasions.
Information gained from this study will contribute to existing
information for those individuals struggling to find effective ways
to identify children from economically disadvantaged and limited
English proficient backgrounds for participation in programs for the
gifted. Because referrals by educators and those closely associated
with children are traditional first steps in identifying children for
gifted program participation, the knowledge they hold about
giftedness and about the instrument being utilized may have a
profound impact on referral decisions.
CHAPTER IV
RESULTS
This study examined the following questions utilizing the
application of generalizability theory.
1. What is the reliability of the TABS scores using
generalizability theory to consider the facets of students, raters,
occasions and items?
2. What is the effect on reliability and measurement precision
of using more or fewer than two raters?
3. What is the effect on reliability and measurement precision
of using more or fewer than two rating occasions?
4. What is the effect on reliability and measurement precision
of using more or fewer than ten items?
Each student was rated once by a regular classroom teacher,
once by a gifted and talented program teacher, and on two
occasions by their parent/guardian. The classroom teacher, gifted
and talented program teacher and parent utilized the TABS (Frasier,
1990b) to rate the students. Means and standard deviations of
scores are reported.
Descriptive Statistics
The scores obtained from the TABS are reported in raw score
format. The instrument utilizes a Likert-type scale of 1-5
(5=strong, 1=weak) focusing on specific student behaviors. Each
rater was asked to rate the student being nominated for assessment.
The scores ranged from 5=strongest to 1=weakest. Raters consisted
of a classroom teacher, a gifted and talented program
teacher and a parent. In order for GENOVA to generate the results
designed for this study, the data were divided into ethnic groups.
Means and standard deviations of scores for the ethnic groups are
reported in Table 1.
Ratings by the Classroom Teacher
All ethnic groups were rated once by the regular classroom
teacher. As Table 1 shows, the mean of the ratings by the classroom
teacher for the Anglo students is 4.39 with a standard deviation of
.69, the mean for the Asian students is 4.64 with a standard
deviation of .48, the mean for the Hispanic students is 4.24 with a
standard deviation of .73, and the mean for the African-American
students is 4.31 with a standard deviation of .71.
Ratings by the Gifted and Talented Program Teacher
The gifted and talented program teacher also rated all
students. On a range of 1 to 5, the mean for the Anglo students is
4.32 with a standard deviation of 1.65, the mean for the Asian
students is 4.5 with a standard deviation of .61, the mean for the
Hispanic students is 3.96 with a standard deviation of .76, and the
mean for the African-American students is 4.34 with a standard
deviation of .71 (Table 1).
Ratings by the Parents
A parent of each student was asked to rate his/her child on
two different occasions. The mean for the ratings by the parents for
the Anglo students is 4.35 with a standard deviation of .75, the
mean for the Asian students is 4.13 with a standard deviation of .72,
the mean for the Hispanic students is 4.18 with a standard deviation
of .81, and the mean for the African-American students is 4.56 with
a standard deviation of .63 (Table 1).
Generalizability Theory
Data were analyzed utilizing GENOVA (Crick & Brennan, 1983), the
FORTRAN 11 program for analysis of variance and generalizability
analyses. Complete GENOVA output files for this study are available
from the author. Generalizability theory treats both conceptual and
statistical issues associated with generalizing from a sample of
conditions of measurement to a universe of such conditions. In
particular, generalizability theory emphasizes the estimation, use,
and interpretation of variance components associated with
universes.
Table 1. Ratings by Classroom Teacher, Gifted and Talented Program Teacher and Parent.
Rater                                  Anglo         Asian         African-American   Hispanic
                                       M     SD      M     SD      M     SD           M     SD
Classroom Teacher (1 Occasion)         4.39  .69     4.64  .48     4.31  .71          4.24  .73
Gifted and Talented Program Teacher
  (1 Occasion)                         4.32  1.65    4.51  .61     4.34  .72          3.96  .76
Parent (2 Occasions)                   4.35  .75     4.13  .72     4.56  .63          4.18  .81
Generalizability analyses were conducted to partition
systematic and measurement error sources within the data set of
scores obtained on TABS. Students were considered the object of
measurement. Error variance facets were raters, items, and
occasions, as well as their interaction effects. Both generalizability
and Phi coefficients were computed, but only the Phi coefficient is
interpreted because only absolute decisions are being considered.
The obtained variance partitions were also used in decision (D-study)
analyses to explore the estimated effects on score integrity
of selected changes in the measurement protocol.
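
For reference, the two coefficients can be written in terms of the
variance components. The notation below is the conventional one for a
fully crossed students x raters x items design with random facets; it is
an assumption here, since the formulas are not stated in the text. The
symbols n'_r and n'_i denote the D-study sampling frequencies for raters
and items.

```latex
\begin{align}
\sigma^2(\delta) &= \frac{\sigma^2_{sr}}{n'_r}
                  + \frac{\sigma^2_{si}}{n'_i}
                  + \frac{\sigma^2_{sri}}{n'_r n'_i} \\
\sigma^2(\Delta) &= \sigma^2(\delta)
                  + \frac{\sigma^2_{r}}{n'_r}
                  + \frac{\sigma^2_{i}}{n'_i}
                  + \frac{\sigma^2_{ri}}{n'_r n'_i} \\
E\rho^2 &= \frac{\sigma^2_s}{\sigma^2_s + \sigma^2(\delta)},
\qquad
\Phi = \frac{\sigma^2_s}{\sigma^2_s + \sigma^2(\Delta)}
\end{align}
```

In the tables that follow, "Lower Case Delta" corresponds to the relative
error variance and "Upper Case Delta" to the absolute error variance. For
the parent ratings, the occasion facet takes the place of the rater facet.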
Generalizability Analysis
Combined Raters
Analyses were generated by GENOVA to evaluate the data
for all the raters (classroom teachers, gifted and talented
program teachers, and parents). The following discussion does not
include the Asian group of students because this group (N=6) was
considered too small to affect the study. Data for the Asian group
are found in Appendix B.
Combined Ratings for Total Sample
Variance components (Table 2) for the main effects of items (.01)
and raters (.00) are low and are likely positive and stable, given
the low standard errors. The largest variance component is the
interaction of students, raters, and items (.24), followed by the
student-by-rater interaction (.18); the student main effect is .07.
The Phi coefficient for the combined raters (teachers and parents)
was .51, which is considered marginal with respect to reliability.
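
The coefficients reported in Table 2 can be checked from the summary
variances in the same table. The short sketch below is illustrative only
and is not part of the GENOVA output; the variable and function names are
chosen here for convenience.

```python
# Summary variances as reported in Table 2 (combined raters, total sample).
universe_score = 0.07484   # universe-score (student) variance
rel_error = 0.06967        # "Lower Case Delta": relative error variance
abs_error = 0.07184        # "Upper Case Delta": absolute error variance


def coefficient(signal: float, noise: float) -> float:
    """Universe-score variance divided by itself plus one error variance."""
    return signal / (signal + noise)


g_coefficient = coefficient(universe_score, rel_error)  # relative decisions
phi = coefficient(universe_score, abs_error)            # absolute decisions

print(f"G = {g_coefficient:.3f}, Phi = {phi:.3f}")  # prints G = 0.518, Phi = 0.510
```

The Phi coefficient is the smaller of the two because the absolute error
variance also includes the rater and item main effects.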
Combined Ratings for Anglo Sample
Data generated by GENOVA for the Anglo sample present low
variance components for raters (.00) and items (.00). The
variance components in Table 3 suggest that the largest source of
variance in the combined ratings for the Anglo student
population was the student, rater, and item interaction (.24).
The Phi coefficient (.54) suggests marginal reliability of scores.
Combined Ratings for African-American Sample
The G-study generated by GENOVA for this population is shown in
Table 4. Variance components range from .00 to .25. The largest
variance component (.25) is the interaction of students, raters,
and items. The Phi coefficient (.25) obtained for the combined
raters in this population of students reflects unreliable
scores.
Table 2. Generalizability Calculations for Combined Ratings, Total Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .07                    1.00              1                  .07           .02
R        .00                    1.00              3                  .00           .00
I        .01                    1.00              8                  .00           .00
SR       .18                    1.00              3                  .01           .00
SI       .03                    1.00              8                  .00           .00
RI       .00                    1.00              24                 .00           .00
SRI      .24                    1.00              24                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .07484     .27357               .01902
Expected Observed Score   .14451     .38015               .01806
Lower Case Delta          .06967     .26396               .00597
Upper Case Delta          .07184     .26804               .00612
Mean                      .00331     .05752

Generalizability Coefficient = .51788 (1.07419)
Phi = .51022 (1.04174)

S = Students; R = Raters; I = Items; SR = Students combined with Raters;
SI = Students combined with Items; RI = Raters combined with Items;
SRI = Students combined with Raters combined with Items

Table 3. Generalizability Calculations for Combined Ratings, Anglo Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .08                    1.00              1                  .08           .02
R        .00                    1.00              3                  .00           .00
I        .00                    1.00              10                 .00           .00
SR       .16                    1.00              3                  .05           .00
SI       .03                    1.00              10                 .00           .00
RI       .01                    1.00              30                 .00           .00
SRI      .24                    1.00              30                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .07621     .27606               .02409
Expected Observed Score   .13993     .37408               .02300
Lower Case Delta          .06372     .25243               .00715
Upper Case Delta          .06519     .25532               .00716
Mean                      .00338     .05818

Generalizability Coefficient = .54462 (1.19597)
Phi = .53897 (1.16905)

Table 4. Generalizability Calculations for Combined Ratings, African-American Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .03                    1.00              1                  .03           .04
R        .00                    1.00              3                  .00           .00
I        .01                    1.00              10                 .00           .00
SR       .20                    1.00              3                  .01           .00
SI       .02                    1.00              10                 .00           .00
RI       .00                    1.00              30                 .00           .00
SRI      .25                    1.00              30                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .02552     .15974               .03875
Expected Observed Score   .10249     .32014               .03416
Lower Case Delta          .07697     .27744               .01826
Upper Case Delta          .07764     .27863               .01752
Mean                      .00669     .08180

Generalizability Coefficient = .24897 (.33151)
Phi = .24737 (.32868)

Combined Ratings for Hispanic Sample
GENOVA generated the G-study shown in Table 5. This G-study
reports low standard errors for the mean scores, ranging
from .00 to .04. The largest variance components are found in
the student/rater effect (.21) and the student/rater/item effect
(.26). The Phi coefficient (.47) indicates questionable scores.
Teacher Ratings
Analyses were also generated by GENOVA to evaluate the
teacher ratings for the total sample and for the
individual ethnic student groups (Anglo, African-American,
Hispanic) completing the TABS.
Table 5. Generalizability Calculations for Combined Ratings, Hispanic Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .07                    1.00              1                  .07           .04
R        .02                    1.00              3                  .00           .00
I        .00                    1.00              10                 .00           .00
SR       .21                    1.00              3                  .07           .01
SI       .03                    1.00              10                 .00           .00
RI       .00                    1.00              30                 .00           .00
SRI      .26                    1.00              30                 .00           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .07736     .27814               .04148
Expected Observed Score   .15870     .39837               .03907
Lower Case Delta          .08134     .28520               .01394
Upper Case Delta          .08696     .29488               .01456
Mean                      .01058     .10284

Generalizability Coefficient = .48746 (.95107)
Phi = .47080 (.88964)

Teacher Ratings for Total Sample
Table 6 reports the GENOVA analysis of the teacher ratings
for the total sample of the TABS.
GENOVA uses the variance components in Table 6 to estimate the
variance components of the G-study universe. Those estimates, as
well as overall generalizability and Phi coefficients, are reported.
The largest sources of variance found for teacher raters
were the student main effect (.11) and the two-way interaction
of students and raters (.16). The interaction of all facets (students,
raters, and items) resulted in a variance component of .23. The main
effects and interactions that excluded the student component, or
systematic variance, contributed no or negligible variance to the
obtained scores. Apart from the three-way interaction, the student
and rater effect comprised the largest portion of variance among the
facets. The Phi coefficient of .51
suggests that teachers as raters need to be examined further. The
standard errors of the mean scores computed by GENOVA, ranging
from .00 to .03, are low relative to the estimated variance
components, which ranged from .00 to .23. Therefore, the variance
components appear to provide stable estimates of teacher rater
observations.
Teacher Ratings for Anglo Sample
GENOVA generated the G-study results shown in Table 7. This table
contains the variance components derived from the algorithm
method and estimated mean square equations for this population.
Standard errors for the mean scores were low and ranged from
.00 to .04. The interaction of all facets (students, raters, items)
produced the most variance (.22) among the possible sources. The
Phi coefficient of .59 suggests that further investigation may be
needed.
The variance components for single observations revealed low
components for raters (.00) and items (.00). Among the main effects,
students contributed the largest component (.13); the largest
component overall was the interaction of students,
raters, and items (.22).
Table 6. Generalizability Calculations for Teacher Ratings, Total Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .11                    1.00              1                  .11           .03
R        .00                    1.00              2                  .00           .00
I        .00                    1.00              10                 .00           .00
SR       .16                    1.00              2                  .08           .01
SI       .03                    1.00              10                 .00           .00
RI       .00                    1.00              20                 .00           .00
SRI      .23                    1.00              20                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .10727     .32752               .02814
Expected Observed Score   .20369     .45132               .02556
Lower Case Delta          .09642     .31052               .01177
Upper Case Delta          .10164     .31880               .01261
Mean                      .00683     .08266

Generalizability Coefficient = .52663 (1.11253)
Phi = .51349 (1.05544)

Table 7. Generalizability Calculations for Teacher Ratings, Anglo Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .13                    1.00              1                  .13           .04
R        .00                    1.00              2                  .00           .00
I        .00                    1.00              10                 .00           .00
SR       .14                    1.00              2                  .07           .01
SI       .03                    1.00              10                 .00           .00
RI       .00                    1.00              20                 .00           .00
SRI      .22                    1.00              20                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .12964     .36006               .03775
Expected Observed Score   .21439     .46302               .03525
Lower Case Delta          .08475     .29111               .01353
Upper Case Delta          .08740     .29564               .01365
Mean                      .00560     .07481

Generalizability Coefficient = .60471 (1.52980)
Phi = .59730 (1.48326)

Teacher Ratings for African-American Sample
Table 8 displays the GENOVA results for the African-American
student population. The rater and item main effects are reported as
zero because negative estimates were generated. The leading source of
variance (.22) was the interaction of all facets (students, raters,
items). A Phi coefficient of .38 was computed for the teacher
ratings for the African-American student sample. This coefficient
further supports the notion that additional research in this area is
needed to establish reliability.
Teacher Ratings for Hispanic Sample
The G-study results (Table 9) for this population yielded
variance components ranging from .00 to .24. The variance
component of .24 was produced by the interaction of all the facets
(students, raters, items). Remaining variance components are likely
positive and stable due to low standard errors. The Phi coefficient
of .22 suggests that additional investigation is needed.
Table 8. Generalizability Calculations for Teacher Ratings, African-American Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .06                    1.00              1                  .06           .06
R        .00                    1.00              2                  .00           .00
I        .00                    1.00              10                 .00           .00
SR       .17                    1.00              2                  .09           .03
SI       .02                    1.00              10                 .00           .00
RI       .02                    1.00              20                 .00           .00
SRI      .22                    1.00              20                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .06093     .24683               .06446
Expected Observed Score   .16066     .40082               .05510
Lower Case Delta          .09973     .31580               .03345
Upper Case Delta          .10060     .31717               .03137
Mean                      .01091     .10444

Generalizability Coefficient = .37923 (.61091)
Phi = .37720 (.60565)

Table 9. Generalizability Calculations for Teacher Ratings, Hispanic Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .04                    1.00              1                  .04           .05
R        .03                    1.00              2                  .02           .02
I        .00                    1.00              10                 .00           .00
SR       .24                    1.00              2                  .00           .03
SI       .03                    1.00              10                 .00           .00
RI       .00                    1.00              20                 .00           .00
SRI      .24                    1.00              20                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .04191     .20473               .05468
Expected Observed Score   .17794     .42183               .04381
Lower Case Delta          .13603     .36882               .03272
Upper Case Delta          .15210     .39000               .03574
Mean                      .02163     .14707

Generalizability Coefficient = .23554 (.30812)
Phi = .21604 (.27557)

Parent Ratings
Analyses were also generated by GENOVA to evaluate the
parent ratings for the total sample and for the individual ethnic
student groups (Anglo, African-American, Hispanic) completing the TABS.
Parent Ratings for Total Sample
A G-study for the parents as raters for the TABS was
conducted utilizing GENOVA (Table 10). All students were included
in the data for Table 10 as well as two occasions (ratings) and 10
items.
The main effect for occasions produced the lowest variance
(.00). The leading source of variance (.23) was the
interaction of students, occasions, and items. The standard errors
for mean scores, ranging from .00 to .03, are low. The Phi coefficient
of .82 indicates acceptable reliability for parents as raters.
Table 10. Generalizability Calculations for Parent Ratings, Total Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .18                    1.00              1                  .18           .03
O        .00                    1.00              2                  .00           .00
I        .02                    1.00              10                 .00           .00
SO       .03                    1.00              2                  .02           .00
SI       .10                    1.00              10                 .00           .00
OI       .00                    1.00              20                 .00           .00
SOI      .23                    1.00              20                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .18124     .42573               .02784
Expected Observed Score   .21982     .46885               .02759
Lower Case Delta          .03857     .19640               .00375
Upper Case Delta          .04067     .20167               .00385
Mean                      .00384     .06196

Generalizability Coefficient = .82451 (4.69846)
Phi = .81673 (4.45655)

Parent Ratings for Anglo Sample
The standard errors for mean scores, as indicated in Table 11,
were low, ranging from .00 to .03, and the variance
components are likely positive and stable due to the low standard
errors. The interaction of students, occasions, and
items produced the most variance (.22) among the possible sources.
A Phi coefficient of .75 suggests reliable scores.
Parent Ratings for African-American Sample
The standard errors for mean scores (Table 12) for the African-
American population remained low in all areas. Occasions (.00) and
items (.00) provided the lowest variance components for a single
observation. The interaction of students, occasions, and items
produced the most variance (.26). The Phi coefficient, or
dependability index, was computed at .68, suggesting that the parent
ratings obtained on the TABS reflect fairly reliable scores.
Table 11. Generalizability Calculations for Parent Ratings, Anglo Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .14                    1.00              1                  .14           .03
O        .00                    1.00              2                  .00           .00
I        .02                    1.00              10                 .00           .00
SO       .04                    1.00              2                  .02           .00
SI       .12                    1.00              10                 .00           .00
OI       .00                    1.00              20                 .00           .00
SOI      .22                    1.00              20                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .14229     .37722               .03109
Expected Observed Score   .18616     .43146               .03060
Lower Case Delta          .04387     .20944               .00546
Upper Case Delta          .04734     .21758               .00546
Mean                      .00603     .07764

Generalizability Coefficient = .76436 (3.24385)
Phi = .75035 (3.00557)

Table 12. Generalizability Calculations for Parent Ratings, African-American Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .08                    1.00              1                  .08           .04
O        .00                    1.00              2                  .00           .00
I        .00                    1.00              10                 .00           .00
SO       .04                    1.00              2                  .01           .00
SI       .03                    1.00              10                 .00           .00
OI       .00                    1.00              20                 .00           .00
SOI      .26                    1.00              20                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .07602     .27571               .04002
Expected Observed Score   .11191     .33452               .03838
Lower Case Delta          .03589     .18944               .01133
Upper Case Delta          .03653     .19114               .01081
Mean                      .00764     .08741

Generalizability Coefficient = .67931 (2.11823)
Phi = .67541 (2.08079)

Parent Ratings for Hispanic Sample
The most notable results for the Hispanic sample
(Table 13) involved the student main effect (.30) and the
interaction of students, occasions, and items (.23). The student
facet also produced the highest standard error for the mean scores
(.08). The remaining
variance components are likely positive and stable due to the low
standard errors. A Phi coefficient of .91 is considered high and
indicates a reliable score.
Decision Studies Considerations
The primary purpose of this study was to assess the reliability
of the TABS and the impact of raters and occasions on overall
reliability. Several D-studies were conducted to determine how the
generalizability and error coefficients would be affected by
different research designs, in particular by an increase or decrease
in the occasion or rater factors.
Table 13. Generalizability Calculations for Parent Ratings, Hispanic Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes

         Variance Component     Finite Universe   D-study Sampling   Mean Scores   Mean Scores
Effect   (Single Observation)   Correction        Frequency          Estimate      Standard Error
S        .30                    1.00              1                  .30           .08
O        .00                    1.00              2                  .00           .00
I        .02                    1.00              10                 .00           .00
SO       .02                    1.00              2                  .00           .00
SI       .09                    1.00              10                 .00           .00
OI       .00                    1.00              20                 .00           .00
SOI      .23                    1.00              20                 .01           .00

                          Variance   Standard Deviation   Standard Error of Variance
Universe Score            .29856     .54640               .08060
Expected Observed Score   .32671     .57159               .08043
Lower Case Delta          .02816     .16780               .00520
Upper Case Delta          .02981     .17264               .00513
Mean                      .01186     .10890

Generalizability Coefficient = .91382 (10.60357)
Phi = .90923 (10.01682)

Although the TABS has a fixed number of items, raters often do
not respond to all of them. A D-study was therefore
conducted to evaluate the effect on reliability of omitting items
from the survey instrument by varying the number of items considered.
The results of this D-study are found in
Table 14. The TABS contains 10 items, and the Phi coefficient remains
near .51 when 8 or more items are completed. The Phi coefficient
fell below .49 when fewer than 8 items were completed.
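
The item D-study can be approximated from the Table 2 variance
components. The sketch below is illustrative and is not the GENOVA run
itself: it assumes a fully crossed students x raters x items design with
random facets, uses the rounded single-observation components from Table
2 together with the reported universe-score variance, and fixes the
number of raters at 3. The function and variable names are hypothetical.

```python
def phi_coefficient(vc, n_raters, n_items):
    """Phi for a crossed students x raters x items design (random facets)."""
    abs_error = (
        vc["r"] / n_raters
        + vc["i"] / n_items
        + vc["sr"] / n_raters
        + vc["si"] / n_items
        + (vc["ri"] + vc["sri"]) / (n_raters * n_items)
    )
    return vc["s"] / (vc["s"] + abs_error)


# Combined raters, total sample (Table 2). "s" is the reported universe-score
# variance; the other entries are the rounded single-observation components.
table2 = {"s": 0.07484, "r": 0.00, "i": 0.01, "sr": 0.18,
          "si": 0.03, "ri": 0.00, "sri": 0.24}

# Vary the number of completed items while holding the 3 raters fixed.
phi_by_items = {n: phi_coefficient(table2, n_raters=3, n_items=n)
                for n in (10, 8, 5, 1)}

for n, phi in phi_by_items.items():
    print(f"{n:2d} items: Phi = {phi:.2f}")
```

Because the inputs are rounded, the results agree with Table 14 only to
within about .01; the GENOVA output remains the authoritative source.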
The results of another D-study, shown in Table 15 and Table 16, use
the variance components reported by GENOVA to derive the
coefficients for different measurement protocols. Table 15
illustrates the parents as raters with one and two rating occasions.
Table 16 illustrates the combined raters with one rating occasion.
Optimum reliability for the TABS was achieved when the student was
rated on two occasions by parent raters (Table 15). The least desirable
situation examined was one occasion and one rater (Table 16).
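
The effect of the number of occasions on the parent ratings can be
sketched in the same way. The example below is an illustration under
stated assumptions rather than the original analysis: it treats the
design as fully crossed students x occasions x items with random facets,
takes the rounded single-observation components and the reported
universe-score variances from Tables 11 through 13, and holds the number
of items at 10. All names in the code are hypothetical.

```python
def phi_parent(vc, n_occasions, n_items=10):
    """Phi for a crossed students x occasions x items design, one parent rater."""
    abs_error = (
        vc["o"] / n_occasions
        + vc["i"] / n_items
        + vc["so"] / n_occasions
        + vc["si"] / n_items
        + (vc["oi"] + vc["soi"]) / (n_occasions * n_items)
    )
    return vc["s"] / (vc["s"] + abs_error)


# "s" is the reported universe-score variance; other values are rounded
# single-observation components from Tables 11, 12, and 13.
parent_components = {
    "Anglo": {"s": 0.14229, "o": 0.00, "i": 0.02,
              "so": 0.04, "si": 0.12, "oi": 0.00, "soi": 0.22},
    "African-American": {"s": 0.07602, "o": 0.00, "i": 0.00,
                         "so": 0.04, "si": 0.03, "oi": 0.00, "soi": 0.26},
    "Hispanic": {"s": 0.29856, "o": 0.00, "i": 0.02,
                 "so": 0.02, "si": 0.09, "oi": 0.00, "soi": 0.23},
}

# Phi for two occasions and for one occasion, per group.
phi_by_group = {
    group: {n: phi_parent(vc, n_occasions=n) for n in (2, 1)}
    for group, vc in parent_components.items()
}

for group, phis in phi_by_group.items():
    print(f"{group}: 2 occasions = {phis[2]:.2f}, 1 occasion = {phis[1]:.2f}")
```

With these rounded inputs the sketch lands within about .02 of the Table
15 entries, and in every group the coefficient is higher with two
occasions than with one.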
Table 14. D-Study: Phi Coefficients for Items Omitted from TABS.
3 Raters, 1 Observation per Rater, Item Numbers Varied

Items   Phi Coefficient
10      .51
8       .50
5       .47
1       .30

Table 15. D-Study: Phi Coefficients of TABS with Parents As Raters and Occasions Varied.

                                 Ethnic Group
Occasions/Raters                 Anglo   African-American   Hispanic
2 Occasions, 1 Rater (Parent)    .75     .68                .90
1 Occasion, 1 Rater (Parent)     .64     .52                .85

Table 16. D-Study: Phi Coefficients of TABS for Raters and Occasions, Varied.

                        Ethnic Groups
Raters/Occasions        Anglo   African-American   Hispanic   Combined
3 Raters, 1 Occasion    .54     .25                .47        .51
2 Raters, 1 Occasion    .44     .18                .37        .44
1 Rater, 1 Occasion     .29     .10                .23        .26

CHAPTER V
DISCUSSION
If we are to become more effective in recognizing gifted
potential in minority, economically disadvantaged, and limited
English proficient student populations, then a number of issues must
be addressed. This study has examined one of those issues: the
reliability of an instrument utilized by educators to identify
minority, economically disadvantaged, and limited English proficient
students for gifted programs.
This study was designed to test the reliability of the TABS
while simultaneously considering several sources of variance and
computing a reliability coefficient for the students, raters, items,
and occasions. The occasions (test-retest) reliability coefficient
utilizing parents as raters was examined. The utility of teachers as
raters and parents as raters was investigated and reviewed with
respect to previous research. Finally, the limitations of GENOVA and
this study were considered along with implications for future
research using TABS.
Generalizability Finding for TABS
The TABS is designed to document behavioral observations in
order to help guide the search for giftedness in culturally different
groups. Frasier (1990b) stated that TABS meets the requirements
of best practices through its focus on diversity in the gifted
population and involvement of people inside and outside the school
and that TABS should only be utilized after the raters are provided
training to recognize gifted characteristics. The results of these
G-studies yielded coefficients suggesting that raters participating in
this study did not establish inter-rater reliability before completing
the survey instrument. The results of this study also suggest that
the TABS should not be used as the sole identification instrument,
but rather as an adjunct to a comprehensive identification
portfolio.
D-Study Findings for TABS
The results of a D-study revealed increases in the Phi
coefficients when parents rated their children on more than one
occasion. The Phi coefficient for the Hispanic ratings (2 occasions,
1 parent) was .90, indicating a very reliable rating. The lowest Phi
coefficient (.52) was found with the ratings (1 occasion, 1 parent)
for the African-American population. The overall Phi coefficients would
indicate that the training provided to the parents before they rated
their children was effective.
A second D-study result, reported in Table 16, revealed
appreciable increases in the Phi coefficient for the combined raters
when the number of raters was increased from one rater on one
occasion (.26) to three raters on one occasion (.51). While cost and
time must be considered when utilizing
any instrument, this D-study suggests that additional raters and
occasions add to the reliability of the instrument. Both coefficients
of this D-study suggest that teachers as raters need to be examined
further.
The Phi coefficients dramatically increased for all groups of
students. The highest Phi coefficient for the African-American
population, however, did not indicate reliability (.25 with 3 raters, 1
occasion). This low Phi coefficient would suggest low reliability
concerning scores within this population of students.
Teachers As Raters
This study supports previous findings that teachers play an
important role in the identification of students for educational
programs (Epkins, 1993; Milich & Landau, 1988; Pelham, Gnagy,
Greenslade, & Milich, 1992). It is important for teachers to be a
part of the selection process because they have data to offer that are
not available to other members of the selection team. However, the
particular beliefs and attitudes of the teacher must be considered
(Pegnato & Birch, 1959), especially when the identification process
takes place at the elementary level (Jacobs, 1971).
Pegnato and Birch (1959) suggested that teachers most often
choose children like themselves as gifted. Whatever the teacher
values will be the criterion for selection. Often the quiet,
well-behaved, well-dressed child who gets good grades is a prime target
for teacher selection. In their study, Pegnato and Birch found that
teachers identified only 45% of the students in their classes who
were cognitively gifted, missing the other 55%. They suggested that
systematic bias may exist among teachers when attempting to
identify giftedness in students.
One way to improve teachers' accuracy in selecting gifted
learners is to help them identify the characteristics these
children demonstrate. Providing
teachers with information about the common characteristics found
among gifted children would encourage them to look for those
characteristics they might otherwise miss (Borland, 1978; Gear,
1978).
If teachers cannot recognize gifted characteristics in the
students in their classes, there may be a waste of human potential.
Gear (1978) found that the effectiveness of teacher selection was
improved with a training program. The teachers participating in the
training program were twice as effective as untrained
teachers. Teachers can improve their efficiency in selecting gifted
students when they are provided training to recognize gifted
behaviors in all groups of children. This is especially important
when attempting to recognize the gifted abilities of children from
economically disadvantaged and limited English proficient
backgrounds. With training, the likelihood exists that gifted
children from underrepresented groups will also be better
recognized (Borland, 1978).
Parents As Raters
This study also found high Phi coefficients (Table 15) in
ratings by parents. These high Phi coefficients may occur because
parents are aware of the behavior of their child and can provide
information that is clearly indicative of potential giftedness
(Jacobs, 1971).
The Phi coefficient findings in this study clearly indicated that
parents as raters provided reliable data.
Limitations of the Study
The sample consisted of 127 students. Regular classroom
teachers and gifted and talented program teachers served as raters.
The sample size was considered adequate and the data provided an
accurate representation of the TABS results. The sample for this
study was limited to third grade students, which limits how far the
results apply outside this grade. Generalizations to other groups
should be considered with caution.
Data for this study were collected from third grade teachers
across the school district, gifted and talented program teachers, and
parents. All raters had received eight hours of gifted and talented
training focusing on the characteristics of gifted and talented
children.
Directions for Future Research Using the TABS Form
The overall reliability of the TABS was examined in this study.
No prior research on the TABS exists. Additional studies
should be conducted to further examine the reliability of this
instrument and its usefulness in the identification of gifted
individuals.
The results obtained in this study suggested that reliability
may be improved if inter-rater reliability is established.
Inter-rater reliability is essential if more than one individual is to
be involved in the rating. Without inter-rater reliability, data are
invalid. Several hours (above those suggested to familiarize raters
with the characteristics of gifted behavior) should be allowed to
establish inter-rater reliability. One should expect that the initial
data set will have divergent ratings. As subsequent sets are rated,
agreement will increase. If long periods elapse between rating
sessions, it is necessary to reestablish inter-rater reliability.
Implications
The findings of this study suggest several directions for
related research. First, organize a follow-up of this
study to determine what changes were made in the school district
involved in this study to prepare their teachers and parents to meet
the needs of gifted students. Second, implement a survey of gifted
programs presently utilizing TABS to ascertain the extent of
training received by teachers and parents in order to identify gifted
characteristics in children. Third, implement a statewide survey of
gifted education programs to ascertain their methods of identifying
gifted students. Fourth, determine, from the statewide survey, if a
new identification instrument is needed.
New gifted identification information will assist those
individuals struggling to find effective ways to identify children
from economically disadvantaged and limited English proficiency
backgrounds for participation in programs for the gifted. Because
referrals by educators and those closely associated with children
are traditional first steps in identifying children for gifted program
participation, the knowledge they hold about giftedness and about
the instrument being utilized may have a profound impact on referral
decisions.
REFERENCES
Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent behavioral and emotional problems: Implications of cross-informant correlations for situational specificity. Psychological Bulletin, 101, 213-232.
Au, M. L., & Pumfrey, P. D. (1993). Parents' and teachers' expectations of children's attainments: Match or mismatch? British Journal of Special Education, 20, 109-120.
Baldwin, A. Y. (1978). Curriculum and methods: What is the difference? In A. Baldwin, G. Gear, & L. Lucito (Eds.), Educational planning for the gifted: Overcoming cultural, geographic, and socioeconomic barriers (pp. 37-49). Reston, VA: Council for Exceptional Children.
Baldwin, A. Y. (1980). The Baldwin Identification Matrix: Its development and use in programs for the gifted child. Paper presented at the Convention of the Council for Exceptional Children, Philadelphia, PA.
Barber, L. W., & Barton, K. (1971). Useability by raters of the Barber Scales of Self-Regard for Preschool Children. Paper presented at the Annual Meeting of the Eastern Psychological Association, Boston, MA.
Barkan, J. H., & Bernal, E. M. (1991). Gifted education for bilingual and limited English proficient students. Gifted Child Quarterly, 22, 144-147.
Bermudez, A. B., & Rakow, S. J. (1990). Analyzing teachers' perceptions of identification procedures for gifted and talented Hispanic limited English proficient students at-risk. Journal of Educational Issues of Language Minority Students, 7, 21-33.
Bernal, E. (1978). The identification of gifted Chicano children. In A. Baldwin, G. Gear, & L. Lucito (Eds.), Educational planning for the gifted. Reston, VA: Council for Exceptional Children.
Borland, J. (1978). Teacher identification of the gifted: A new look. Journal for the Education of the Gifted, 2, 22-32.
Brennan, R. L. (1983). Elements of generalizability theory. Iowa City, IA: American College of Testing.
Brennan, R. L. (1992). An NCME instructional module on generalizability theory. Instructional Theory in Educational Measurement, 27-34.
Brennan, R. L. (1992). Elements of generalizability theory. Iowa City, IA: American College Testing Program.
Chambliss, C., & Melmed, M. (1990). Attitudinal and behavioral responses toward parent clientele of parent and nonparent child care providers. ERIC No. ED 320689.
Chen, X. (1993). Reliability coefficient and correlation ratio between the observed scores and latent trait. Acta Psychologica Sinica, 25, 395-399.
Christensen, A., Phillips, S., Glasgow, R. E., & Johnson, S. M. (1983). Parental characteristics and interactional dysfunction in families with child behavior problems: A preliminary investigation. Journal of Abnormal Child Psychology, 11, 153-166.
Clark, B. (1992). Growing up gifted: Developing the potential of children at home and at school (4th ed.). New York: Merrill.
Cornell, D. G. (1994). Low Incidence of behavior problems among elementary school students in gifted programs. Journal for the Education of the Gifted 18. 4-19.
Crick, J. E., & Brennan, R. L (1983). Manual for GENOVA: A GENerallzed analysis Of VArlance system. (Number 43). Iowa City, lA: American College of Testing.
Cronbach, L. J., Gleser, G. C, Nanda, H., & Rajaratnam, N. (1972). The dependability of measurements. New York: John Wiley & Sons, Inc.
Damico, J. S. (1985). Clinical discourse analysis: A functional approach to language assessment. In C. Simen, Communication skills and classroom success: Assessment of language-learning disabled students, (pp. 165-204). San Diego, CA: College-Hill.
Davis, G. A., & Rimm, S. B. (1994). Education of the gifted and talented (3rd ed.). Englewood Cliffs, NJ: Prentice Hall.
Dearborn, D. C., & Simon, H. A. (1958). Selective perception: A note on the departmental identification of executives. Sociometry, 140-148.
de Bernard, A. E. (1985). Why Jose can't get into the gifted class: The bilingual child and standardized reading tests. Roeper Review, 8, 80-82.
DeHaan, R. F., & Havighurst, R. J. (1961). Educating gifted children. Chicago, IL: University of Chicago Press.
Delcourt, M., Loyd, B., Cornell, D., Goldberg, M., & Bland, L. (1994). Evaluation of the effects of programming arrangements on student learning outcomes. (Research Monograph). The University of Virginia, Charlottesville, Virginia, Department of Education.
Delgado-Gaitan, C., & Trueba, H. T. (1985). Ethnographic study of the participant structures in task completion: Reinterpretation of "handicaps" in Mexican children. Learning Disability Quarterly, 8, 67-75.
Eason, S. (1991). Why generalizability theory yields better results than classical test theory: A primer with concrete examples. In B. Thompson (Ed.), Advances in educational research: Substantive findings, methodological developments (pp. 83-98). Greenwich, CT: JAI Press, Inc.
Eisenstadt, T. H. (1994). Interparent agreement on the Eyberg Child Behavior Inventory. Child and Family Behavior Therapy, 16, 21-27.
Epkins, C. C. (1993). A preliminary comparison of teacher ratings and child self-report depression, anxiety, and aggression in inpatient and elementary school samples. Journal of Abnormal Child Psychology. 21. 649-661.
Ford, D. Y., & Harris, J. J. (1990). On discovering the hidden treasure of gifted and talented black children. Roeper Review. 13. 27-32.
Ford, D. Y., & Harris, J. J. (1996). Recruiting and retaining diverse students in gifted education: Pitfalls and promises. Tempo, 16, 8-12.
Forehand, R., Lautenschlager, G. J., Faust, J., & Graziano, W. G. (1986). Parent perceptions and parent-child interactions in clinic-referred children: A preliminary investigation of the effects of maternal depressive moods. Behavior Research and Therapy, 24, 73-75.
Foster-Gaitskell, D., & Pratt, C. (1989). Comparison of parent and teacher ratings of adaptive behavior of children with mental retardation. American Journal on Mental Retardation. 94. 177-181.
Frasier, M. M. (1987). The identification of gifted black students: Developing new perspectives. Journal for the Education of the Gifted. 10. 155-180.
Frasier, M. M. (1990a). An investigation of giftedness in economically disadvantaged and limited English proficient populations. Athens, GA: The National Research Center on the Gifted and Talented Proposal, The University of Georgia.
Frasier, M. M. (1990b). Identifying the gifted: Observation and rating scales. Athens, GA: The Torrance Center for Creative Studies, The University of Georgia.
Frasier, M. M. (1990c). Instruction manual: Using the Frasier talent assessment profile (F-TAP). Athens, GA: The University of Georgia.
Frasier, M. M. (1991). Response to Kitano: The sharing of giftedness between culturally diverse and non-diverse gifted students. Journal for the Education of the Gifted. 15. 20-30.
Frasier, M. M., & Passow, A. H. (1994). Toward a new paradigm for identifying talent potential: Executive summary (Research Monograph 94112). Storrs, CT: University of Connecticut, The National Research Center on the Gifted and Talented.
Frierson, E. C. (1965). Upper and lower status gifted children: A study of differences. Exceptional Children. 32. 83-90.
Gallagher, J. J. (1994). Teaching the gifted child (4th ed.). Boston: Allyn & Bacon.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.
Gear, G. (1978). Effects of training on teachers' accuracy in the identification of gifted children. Gifted Child Quarterly, 11, 90-97.
Gilbert, S. L. (1994). Parental and professional agreement in the assessment of children with disabilities: An examination. ERIC No. ED 378705.
Hambleton, R. K., & Powell, S. (1983). A framework for viewing the process of standard setting. Evaluation and the Health Professions. 6. 3-24.
Hanson, J. B., & Feldhusen, J. F. (1994). Comparison of trained and untrained teachers of gifted students. Gifted Child Quarterly, 38, 115-121.
Hanson, J. B., & Linden, L. W. (1990). Selecting instruments for identifying gifted and talented students. Roeper Review, 13, 10-15.
Hicks, J. S. (1988). The five D's replication study. Syosset, NY: Paper presented at Annual Convention of the Council for Exceptional Children.
Hoge, R. D. (1989). An examination of the giftedness construct. Canadian Journal of Education. 14. 6-17.
Houston, W. M., Raymond, M. R., & Svec, J. C. (1991). Adjustments for rater effects. Applied Psychological Measurement. 15. 409-421.
Hunsaker, S. L., & Callahan, C. M. (1993). Evaluation of gifted programs: Current practices. Journal for the Education of the Gifted. 16. 190-200.
Jacobs, J. (1971). Effectiveness of teacher and parent identification of gifted children as a function of school level. Psychology in the Schools. 8. 140-142.
Johnson, S., & Bell, J. (1985). Evaluating and predicting survey efficiency using generalizability theory. Journal of Educational Measurement. 22. 107-119.
Kaplan, J. A. (1993). The co-parenting system: Longitudinal effects for kindergartners of differences between mothers' and fathers' parenting styles. New Orleans, LA: Paper presented at the Biennial Meeting of the Society for Research in Child Development.
Keller, M. (1990). Holistic identification of potentially gifted students: An alternative to the matrix. Instructional Leader, 12, 4-7.
Kitano, M. K., & Kirby, D. F. (1986). Gifted education: A comprehensive view. Boston, MA: Little, Brown.
Kranz, B. (1978). Multi-dimensional screening device for the identification of gifted/talented children. Grand Forks, ND: Bureau of Educational Research and Services, University of North Dakota.
Maker, C. J., & Schiever, S. W. (Eds.). (1989). Critical issues in gifted education: Defensible programs for cultural and ethnic minorities. Austin, TX: Pro-Ed.
Marland, S. P. (1972). Education of the gifted and talented: Report to the Congress of the United States. Washington, D.C.: U.S. Government Printing Office.
Marsden, P. (1993). The reliability of network density and composition measures. Social Networks, 15, 399-421.
Milich, R., & Landau, S. (1988). Teacher ratings of inattention/overactivity and aggression: Cross-validation with classroom observations. Journal of Clinical Child Psychology, 17, 92-97.
Newcorn, J. H., Halperin, J. M., Schwartz, S., Pascualvaca, D., Wolf, L., Schmeidler, J., & Sharma, V. (1994). Parent and teacher ratings of attention-deficit hyperactivity disorder symptoms: Implications for case identification. Developmental and Behavioral Pediatrics, 86-91.
Nisbett, R. E., & Wilson, T. D. (1977). The halo effect: Evidence for the unconscious alteration of judgments. Journal of Personality and Social Psychology. 35. 450-456.
Office of Educational Research and Improvement. (1993). National excellence: A case for developing America's talent. Washington, DC: U.S. Department of Education.
Pagano, R. R. (1990). Understanding statistics in the behavioral sciences. New York: West Publishing Company.
Passow, A. H., & Rudnitski, R. A. (1993). State policies regarding education of the gifted as reflected in legislation and regulation (CRS93302). Storrs, CT: University of Connecticut.
Pegnato, W., & Birch, J. (1959). Locating gifted children in junior high school. Exceptional Children, 25, 300-304.
Pelham, W. E., Gnagy, E. M., Greenslade, K. E., & Milich, R. (1992). Teacher ratings of DSM-III-R symptoms for the disruptive behavior disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 31, 210-218.
Pendarvis, E. D., Howley, A. A., & Howley, C. B. (1990). The abilities of gifted children. Englewood Cliffs, NJ: Prentice Hall.
Perrone, P., & Male, R. (1981). The developmental education and guidance of talented learners. Rockville, MD: Aspen Publications.
Ramirez, M., & Castaneda, A. (1974). Cultural democracy, bicognitive development, and education. New York: Academic Press.
Renzulli, J. (1973). Talent potential in minority group students. Exceptional Children. 39. 437-444.
Renzulli, J., & Hartman, R. (1971). Scale for rating behavioral characteristics of superior students. Exceptional Children, 38, 243-248.
Rickard, K. M., Forehand, R., Wells, K. C., Griest, D. L., & McMahon, R. J. (1981). Factors in referral of children for behavioral treatment: A comparison of mothers of clinic-referred deviant, clinic-referred nondeviant and nonclinic children. Behavior Research and Therapy, 19, 201-205.
Russikoff, K. A. (1994). Hidden expectations: Faculty perceptions of SLA and ESL writing competence. Baltimore, MD: Paper presented at the Annual Meeting of the Teachers of English to Speakers of Other Languages.
Shade, B. J. (1991). African American patterns of cognition. In R. L. Jones (Ed.), Black Psychology (pp. 231-247). Berkeley, CA: Cobb & Henry.
Shaklee, B. D., Barbour, N., Ambrose, R., Rohrer, J., Whitmore, J. R., & Viechnicki, K. J. (1994). Early assessment for exceptional potential in young minority and/or economically disadvantaged students. In C. M. Callahan, C. A. Tomlinson, & P. M. Pizzat (Eds.), Contexts for promise: Noteworthy practices and innovations in the identification of gifted students (pp. 22-42). Charlottesville, VA: University of Virginia, The National Research Center on the Gifted and Talented.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Sigafoos, J., & Pennell, D. (1995). Parent and teacher assessment of receptive and expressive language in preschool children with developmental disabilities. Education and Training in Mental Retardation and Developmental Disabilities, 18, 329-35.
Stanley, J. C. (1976). Tests better finder of great math talent than teachers are. American Psychologist, 31, 313-314.
Sternberg, R. J. (1985). Beyond IQ: A triarchic theory of human intelligence. Cambridge: Cambridge University Press.
Stone, W., & Rosenbaum, J. L. (1988). A comparison of teacher and parent views of autism. Journal of Autism and Developmental Disorders. 18. 403-414.
Taylor, O. L. (1990). Cross-cultural communication: An essential dimension of effective education. Washington, DC: The Mid-Atlantic Equity Center.
Thompson, B. (1989). Why generalizability coefficients are an essential aspect of reliable assessment. Houston, TX: Paper presented at the meeting of the Southwest Educational Research Association.
Tonemah, S. A., & Brittan, M. A. (1985). American Indian gifted and talented assessment model. Norman, OK: American Indian Research and Development.
Torrance, E. P. (1969). Creative positives of disadvantaged children and youth. Gifted Child Quarterly. 13. 71-81.
Treffinger, D., & Renzulli, J. S. (1986). Giftedness as a potential for creative productivity: Transcending IQ scores. Roeper Review. 8, 150-154.
Wall, S. M., & Paradise, L. V. (1981). A comparison of parent and teacher reports of selected adaptive behaviors of children. Journal of School Psychology. 19. 73-77.
Webster-Stratton, C. (1988). Mothers' and fathers' perceptions of child deviance: Roles of parent and child behaviors and parent adjustment. Journal of Consulting and Clinical Psychology, 56, 909-915.
Weigle, S. C. (1994). Using FACETS to model rater training effects. Washington, D.C.: Paper presented at the Language Testing Research Colloquium.
Williams, B. L., & Hartlage, L. C. (1988). Communication and retention of psychoeducational diagnostic information in parent conferences. Atlanta, GA: Paper presented at the Annual Meeting of the American Psychological Association.
Wright, D., & Piersel, W. (1992). Components of variance in behavior ratings from parents and teachers. Journal of Psychoeducational Assessment, 10, 310-318.
Zappia, I. A. (1989). Identification of gifted Hispanic students: A multidimensional view. In C. J. Maker & S. W. Schiever (Eds.), Critical issues in gifted education: Defensible programs for cultural and ethnic minorities. Austin, TX: Pro-Ed.
APPENDIX A
THE LOOKING FOR TRAITS, ATTRIBUTES AND BEHAVIORS
STUDENT REFERRAL FORM
The National Research Center on the Gifted and Talented at The University of Georgia
Looking for Traits, Attributes and Behaviors Student Referral Form
Name of Student: School: Grade:
Gender M Birthdate:
Student Ethnicity (Circle One): American Indian 1, Asian/Pacific Isl. 2, Black 3, Hispanic 4, White 5
Primary Language Spoken at Home:
Name of Person Completing Form:
(Circle One) Classroom Teacher PT Teacher Parent Other (specify)
Directions: Please rate the student being referred for assessment in each of the following areas. Circle the appropriate number and provide specific example(s) or comment(s) for each trait, attribute or behavior. Specific examples must be given for a rating of 1 or 5. The attached TABs Observation Sheet may assist you in completing this form.
Communication •unusual ability to communicate (verbally, nonverbally,
physically, artistically, symbolically) •uses particularly apt examples, illustrations, or elaborations
In this area, the student is:
Strong Average Weak
1
Motivation •persistent in pursuing/completing self-selected tasks
(may be culturally influenced): evident in school or non-school type activities
•enthusiastic learner •has aspirations to be somebody, do something
In this area, the student is:
Strong Average Weak
Interests •unusual or advanced interests in a topic or activity •self-starter •pursues an activity unceasingly •beyond the group
In this area, the student is:
Strong Average Weak
Problem solving ability •unusual ability to devise or adapt a systematic
strategy for solving problems and to change the strategy If it is not working
•creates new designs •inventor/innovator
In this area, the student is:
Strong Average Weak
5 4 2 1
Specific example(s)
Memory •already knows •1-2 repetitions for mastery •has a wealth of information about
school or non-school topics •pays attention to details •manipulates information
In this area, the student is:
Strong Average Weak
5 4 2 1
Specific example(s)
Humor •keen sense of humor that may be gentle or hostile •large accumulation about emotions •heightened capacity for seeing unusual relationships •unusual emotional depth •openness to experiences •heightened sensory awareness
In this area, the student is:
Strong Average Weak
Specific example(s)
Inquiry •asks unusual questions for age •plays around with ideas •extensive exploratory behaviors directed
toward eliciting information about materials, devices or situations
In this area, the student is:
Strong Average Weak
4
Specific example(s)
Insight •has exceptional ability to draw inferences •appears to be a good guesser •is keenly observant •integrates ideas and disciplines
In this area, the student is:
Strong Average Weak
Specific example(s)
APPENDIX B
ANOVA SUMMARY TABLES AND
GENERALIZABILITY CALCULATIONS FOR ALL RATINGS
Table 17. ANOVA Summary Table for Combined Ratings, Total Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         126           70993.16667        546.26667            4.33545
R         2             70463.07795        16.17795             8.08896
I         9             70474.40682        27.50682             3.05631
SR        252           71515.70000        506.35538            2.00935
SI        1134          71389.66667        368.99318            .32539
RI        18            70504.74803        14.16325             .78685
SRI       2268          72481.00000        554.63675            .24455
Mean                    70446.90000
Total     3809                             2034.10000
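The score-effect sums of squares and mean squares in Table 17 follow from the sums of squares for mean scores by the usual differencing rules for a fully crossed students (S) by raters (R) by items (I) design. The Python sketch below is an illustration only and is not part of the GENOVA output. The names `T`, `score_effect_ss`, `degrees_of_freedom`, and `mean_squares` are hypothetical, and the sample sizes (127 students, 3 rater groups, 10 items) are assumptions chosen because they reproduce the tabled mean squares.

```python
# Sums of squares for mean scores, copied from Table 17.
T = {
    "s": 70993.16667, "r": 70463.07795, "i": 70474.40682,
    "sr": 71515.70000, "si": 71389.66667, "ri": 70504.74803,
    "sri": 72481.00000, "mean": 70446.90000,
}

# Assumed sample sizes: students, rater groups, items.
N_S, N_R, N_I = 127, 3, 10


def score_effect_ss(t):
    """Difference the mean-score sums of squares into score-effect sums of squares."""
    m = t["mean"]
    return {
        "s": t["s"] - m,
        "r": t["r"] - m,
        "i": t["i"] - m,
        "sr": t["sr"] - t["s"] - t["r"] + m,
        "si": t["si"] - t["s"] - t["i"] + m,
        "ri": t["ri"] - t["r"] - t["i"] + m,
        "sri": (t["sri"] - t["sr"] - t["si"] - t["ri"]
                + t["s"] + t["r"] + t["i"] - m),
    }


def degrees_of_freedom(n_s, n_r, n_i):
    """Degrees of freedom for each effect in a crossed S x R x I design."""
    a, b, c = n_s - 1, n_r - 1, n_i - 1
    return {"s": a, "r": b, "i": c,
            "sr": a * b, "si": a * c, "ri": b * c, "sri": a * b * c}


def mean_squares(t, n_s, n_r, n_i):
    """Mean square = score-effect sum of squares / degrees of freedom."""
    ss = score_effect_ss(t)
    df = degrees_of_freedom(n_s, n_r, n_i)
    return {effect: ss[effect] / df[effect] for effect in ss}


if __name__ == "__main__":
    for effect, ms in mean_squares(T, N_S, N_R, N_I).items():
        print(f"{effect:>3}  {ms:.5f}")
```

With these assumed sample sizes the degrees of freedom for S is 127 - 1 = 126, which is the divisor that turns 546.26667 into the tabled mean square of 4.33545.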
Table 18. Variance Components for Combined Ratings: Classroom Teachers, Gifted and Talented Program Teachers, and Parents, Total Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         126           .0748420           .0748420               .0190240
R         2             .0043601           .0043601               .0045102
I         9             .0057444           .0057444               .0034825
SR        252           .1764798           .1764798               .0178448
SI        1134          .0269473           .0269473               .0051543
RI        18            .0042701           .0042701               .0019601
SRI       2268          .2445488           .2445488               .0072588
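When no estimate is negative, the two variance-component columns of Table 18 coincide, and both can be recovered from the mean squares of the S x R x I random-effects ANOVA by solving the expected mean square equations from the highest-order effect downward. The Python sketch below illustrates that calculation; it is not GENOVA code. The function name `variance_components`, the dictionary layout, and the sample sizes of 127 students, 3 rater groups, and 10 items are assumptions introduced for illustration. The inputs are the rounded mean squares from Table 17, so the results agree with Table 18 only to about five decimal places.

```python
# Mean squares copied from Table 17 (combined ratings, total sample).
MS = {"s": 4.33545, "r": 8.08896, "i": 3.05631,
      "sr": 2.00935, "si": 0.32539, "ri": 0.78685, "sri": 0.24455}

# Assumed sample sizes: students, rater groups, items.
N_S, N_R, N_I = 127, 3, 10


def variance_components(ms, n_s, n_r, n_i):
    """Solve the expected mean square equations for a random S x R x I design."""
    vc = {}
    # The three-way interaction (confounded with error) is its own mean square.
    vc["sri"] = ms["sri"]
    # Two-way interactions: remove the residual, divide by the remaining facet size.
    vc["sr"] = (ms["sr"] - ms["sri"]) / n_i
    vc["si"] = (ms["si"] - ms["sri"]) / n_r
    vc["ri"] = (ms["ri"] - ms["sri"]) / n_s
    # Main effects: remove both two-way terms, add back the residual.
    vc["s"] = (ms["s"] - ms["sr"] - ms["si"] + ms["sri"]) / (n_r * n_i)
    vc["r"] = (ms["r"] - ms["sr"] - ms["ri"] + ms["sri"]) / (n_s * n_i)
    vc["i"] = (ms["i"] - ms["si"] - ms["ri"] + ms["sri"]) / (n_s * n_r)
    return vc


if __name__ == "__main__":
    for effect, estimate in variance_components(MS, N_S, N_R, N_I).items():
        print(f"{effect:>3}  {estimate:.7f}")
```

This sketch keeps negative estimates as they come out of the equations. It does not reproduce the alternative treatment that yields differing columns in later tables where a negative estimate occurs.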
Table 19. ANOVA Summary Table for Combined Ratings, Anglo Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         72            41799.10000        302.25571            4.19800
R         2             41503.98493        7.14064              3.57032
I         9             41516.78995        19.94566             2.21618
SR        144           42069.10000        262.85936            1.82541
SI        648           42028.33333        209.28767            .32297
RI        18            41535.82192        11.89132             .66063
SRI       1296          42617.00000        306.77534            .23671
Mean                    41496.84429
Total     2189                             1120.1557
Table 20. Variance Components for Combined Ratings: Classroom Teachers, Gifted and Talented Program Teachers, and Parents, Anglo Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         72            .0762106           .0762106               .0240914
R         2             .0018096           .0018096               .0034825
I         9             .0067091           .0067091               .0044201
SR        144           .1588703           .1588703               .0213850
SI        648           .0287551           .0287551               .0067272
RI        18            .0058071           .0058071               .0028646
SRI       1296          .2367094           .2367094               .0092917
Table 21. ANOVA Summary Table for Combined Ratings, Asian Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         4             2926.46667         4.84000              1.21000
R         2             2929.48000         7.85333              3.92667
I         9             2923.60000         1.97333              .21926
SR        8             2941.20000         6.88000              .86000
SI        36            2942.66667         14.22667             .39519
RI        18            2936.80000         5.34667              .29704
SRI       72            2980.00000         17.25333             .23963
Mean                    2921.62667
Total     149                              58.37333
Table 22. Generalizability Calculations for Combined Ratings, Asian Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes
          Variance Components       Finite Universe    D-study Sampling    Variance Components for Mean Scores
Effect    for Single Observation    Corrections        Frequencies         Estimates    Standard Errors
S         .00                       1.00               1                   .00          .02
R         .06                       1.00               3                   .02          .02
I         .00                       1.00               10                  .00          .00
SR        .06                       1.00               3                   .02          .01
SI        .03                       1.00               10                  .00          .00
RI        .01                       1.00               30                  .00          .00
SRI       .24                       1.00               30                  .01          .00
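Each mean-score estimate in Table 22 is the single-observation variance component, multiplied by its finite universe correction (1.00 throughout, so no correction is made), divided by the D-study sampling frequency for that effect. A minimal Python sketch of that step follows. The function name `mean_score_components` and the dictionary layout are hypothetical, and the inputs are the two-decimal values printed in Table 22, not full-precision GENOVA output.

```python
# Values copied from Table 22 (combined ratings, Asian sample).
SINGLE = {"s": 0.00, "r": 0.06, "i": 0.00,
          "sr": 0.06, "si": 0.03, "ri": 0.01, "sri": 0.24}
CORRECTION = {effect: 1.00 for effect in SINGLE}
FREQUENCY = {"s": 1, "r": 3, "i": 10,
             "sr": 3, "si": 10, "ri": 30, "sri": 30}


def mean_score_components(single, correction, frequency):
    """Scale single-observation components to D-study mean-score components."""
    return {
        effect: single[effect] * correction[effect] / frequency[effect]
        for effect in single
    }


if __name__ == "__main__":
    estimates = mean_score_components(SINGLE, CORRECTION, FREQUENCY)
    for effect, value in estimates.items():
        # Two decimals, to match the precision reported in the table.
        print(f"{effect:>3}  {value:.2f}")
```

The sampling frequency for an interaction is the product of the frequencies of its sampled facets, which is why SRI and RI use 30 (3 raters by 10 items) while S, the object of measurement, uses 1.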
Table 23. ANOVA Summary Table for Combined Ratings, African-American Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         16            9826.23333         49.19608             3.07475
R         2             9781.75882         4.72157              2.36078
I         9             9782.84314         5.80588              .64510
SR        32            9903.10000         72.14510             2.25453
SI        144           9876.33333         44.29412             .30760
RI        18            9794.52941         6.96471              .38693
SRI       288           10033.00000        72.83529             .25290
Mean                    9777.03725
Total     509                              255.96275
Table 24. Variance Components for Combined Ratings: Classroom Teachers, Gifted and Talented Program Teachers, and Parents, African-American Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         16            .0255174           .0255174               .0387469
R         2             -.0001634          -.0001634              .0103587
I         9             .0039897           .0039897               .0059594
SR        32            .2001634           .2001634               .0547208
SI        144           .0182326           .0182326               .0138933
RI        18            .0078840           .0078840               .0073028
SRI       288           .2529003           .2529003               .0210022
Table 25. ANOVA Summary Table for Combined Ratings, Hispanic Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         31            16441.36667        147.59062            4.76099
R         2             16308.52813        14.75208             7.37604
I         9             16300.36458        6.58854              .73206
SR        62            16602.30000        146.18125            2.35776
SI        279           16542.33333        94.37813             .33827
RI        18            16320.09375        4.97708              .27650
SRI       558           16851.00000        142.75625            .25584
Mean                    16293.77604
Total     959                              557.22396
Table 26. Variance Components for Combined Ratings: Classroom Teachers, Gifted and Talented Program Teachers, and Parents, Hispanic Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         31            .0773596           .0773596               .0414799
R         2             .0156175           .0156175               .0163532
I         9             .0038866           .0038866               .0033935
SR        62            .2101927           .2101927               .0417078
SI        279           .0274791           .0274791               .0107919
RI        18            .0006459           .0006459               .0027739
SRI       558           .2558356           .2558356               .0152891
Table 27. ANOVA Summary Table for Teacher Ratings, Total Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         125           46657.90000        509.22698            4.07382
R         1             46163.30159        14.62857             14.62857
I         9             46157.18254        8.50952              .94550
SR        125           46906.60000        234.07143            1.87257
SI        1125          46987.00000        320.59048            .28497
RI        9             46176.33333        4.52222              .50247
SRI       1125          47498.00000        257.77778            .22914
Mean                    46148.67302
Total     2519                             1349.32698
Table 28. Variance Components for Teacher Ratings, Total Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         125           .1072705           .1072705               .0281430
R         1             .0099069           .0099069               .0094829
I         9             .0015365           .0015365               .0018128
SR        125           .1643436           .1643436               .0235189
SI        1125          .0279168           .0279168               .0077020
RI        9             .0021693           .0021693               .0017021
SRI       1125          .2291358           .2291358               .0096526
Table 29. ANOVA Summary Table for Teacher Ratings, Anglo Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         72            27718.50000        308.72192            4.28780
R         1             27414.84384        5.06575              5.06575
I         9             27417.94521        8.16712              .90746
SR        72            27841.80000        118.23425            1.64214
SI        648           27906.00000        179.33288            .27675
RI        9             27426.57534        3.56438              .39604
SRI       648           28178.00000        145.13562            .22397
Mean                    27409.77808
Total     1459                             768.22192
Table 30. Variance Components for Teacher Ratings, Anglo Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         72            .1296444           .1296444               .0377548
R         1             .0044542           .0044542               .0056828
I         9             .0031414           .0031414               .0028949
SR        72            .1418168           .1418168               .0270252
SI        648           .0263868           .0263868               .0098744
RI        9             .0023571           .0023571               .0023196
SRI       648           .2239747           .2239747               .0124239
Table 31. ANOVA Summary Table for Teacher Ratings, Asian Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         4             2101.70000         4.06000              1.01500
R         1             2098.00000         .36000               .36000
I         9             2100.20000         2.56000              .28444
SR        4             2102.40000         .34000               .08500
SI        36            2117.00000         12.74000             .35389
RI        9             2102.00000         1.44000              .16000
SRI       36            2128.00000         8.86000              .24611
Mean                    2097.64000
Total     99                               30.36000
Table 32. Generalizability Calculations for Teacher Ratings, Asian Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes
          Variance Components       Finite Universe    D-study Sampling    Variance Components for Mean Scores
Effect    for Single Observation    Corrections        Frequencies         Estimates    Standard Errors
S         .03                       1.00               1                   .03          .03
R         .00                       1.00               2                   .00          .00
I         .00                       1.00               10                  .00          .00
SR        .00                       1.00               2                   .00          .00
SI        .05                       1.00               10                  .00          .00
RI        .00                       1.00               20                  .00          .00
SRI       .25                       1.00               20                  .01          .00
Table 33. ANOVA Summary Table for Teacher Ratings, African-American Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         15            6042.65000         48.19687             3.21312
R         1             5994.70625         .25313               .25313
I         9             5997.21875         2.76563              .30729
SR        15            6072.10000         29.19687             1.94646
SI        135           6081.50000         36.08436             .26729
RI        9             6001.93750         4.46562              .49618
SRI       135           6145.00000         29.58436             .21914
Mean                    5994.45313
Total     319                              150.54688
Table 34. Variance Components for Teacher Ratings, African-American Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         15            .0609259           .0609259               .0644609
R         1             -.0123148          -.0123148              .0045668
I         9             -.0074074          -.0074074              .0078856
SR        15            .1727315           .1727315               .0668155
SI        135           .0240741           .0240741               .0208810
RI        9             .0173148           .0173148               .0133264
SRI       135           .2191435           .2191435               .0264779
Table 35. ANOVA Summary Table for Teacher Ratings, Hispanic Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         31            10795.05000        110.32344            3.55882
R         1             10697.66563        12.93906             12.93906
I         9             10686.67188        1.94531              .21615
SR        31            10890.30000        82.31094             2.65519
SI        279           10882.50000        85.50469             .30647
RI        9             10701.59375        1.98281              .22031
SRI       279           11047.00000        67.26719             .24110
Mean                    10684.72656
Total     639                              362.27344
Table 36. Variance Components for Teacher Ratings, Hispanic Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         31            .0419131           .0419131               .0419131
R         1             .0322021           .0321371               .0330792
I         9             -.0010865          -.0014113              .0021196
SR        31            .2414091           .2414091               .0653979
SI        279           .0326837           .0326837               .0164486
RI        9             -.0006496          -.0006496              .0030037
SRI       279           .2411010           .2411010               .0203405
Table 37. ANOVA Summary Table for Parent Ratings, Total Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         125           47973.60000        549.54286            4.39634
O         1             47424.57143        .51429               .51429
I         9             47475.38095        51.32381             5.70265
SO        125           48046.00000        71.88571             .57509
SI        1125          48504.00000        479.07619            .42585
OI        9             47479.38095        3.48571              .38730
SOI       1125          48838.00000        258.11429            .22943
Mean                    47424.05714
Total     2519                             1413.94286
Table 38. Variance Components for Parent Ratings, Total Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         125           .1812423           .1812423               .0278386
O         1             -.0001735          -.0001735              .0003627
I         9             .0203132           .0203132               .0096718
SOI       1125          .2294349           .2294349               .0096652
Table 39. ANOVA Summary Table for Parent Ratings, Anglo Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         72            27990.70000        268.06712            3.72315
O         1             27724.78082        2.14795              2.14795
I         9             27758.95890        36.32603             4.03623
SO        72            28039.00000        46.15205             .64100
SI        648           28329.00000        301.97397            .46601
OI        9             27762.95890        1.85205              .20578
SOI       648           28528.00000        148.84795            .22970
Mean                    27722.63288
Total     1459                             805.36712
Table 40. Variance Components for Parent Ratings, Anglo Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         72            .1422924           .1422924               .0310878
O         1             .0020971           .0020643               .0024098
I         9             .0246174           .0244535               .0118050
SO        72            .0411297           .0411297               .0106147
SI        648           .1181526           .1181526               .0144096
OI        9             -.0003277          -.0003277              .0012146
SOI       648           .2297036           .2297036               .0127417
Table 41. ANOVA Summary Table for Parent Ratings, Asian Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         4             1730.75000         25.06000             6.26500
O         1             1705.78000         .09000               .09000
I         9             1711.50000         5.81000              .64556
SO        4             1733.10000         2.26000              .56500
SI        36            1748.50000         11.94000             .33167
OI        9             1713.00000         1.41000              .15667
SOI       36            1757.00000         4.74000              .13167
Mean                    1705.69000
Total     99                               51.31000
Table 42. Generalizability Calculations for Parent Ratings, Asian Sample.
Variance Components in Terms of G-study Universe (of Admissible Observations) Sizes
          Variance Components       Finite Universe    D-study Sampling    Variance Components for Mean Scores
Effect    for Single Observation    Corrections        Frequencies         Estimates    Standard Errors
S         .28                       1.00               1                   .28          .18
O         .00                       1.00               2                   .00          .00
I         .03                       1.00               10                  .00          .00
SO        .04                       1.00               2                   .02          .02
SI        .10                       1.00               10                  .01          .00
OI        .00                       1.00               20                  .00          .00
SOI       .13                       1.00               20                  .00          .00
Table 43. ANOVA Summary Table for Parent Ratings, African-American Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         15            6612.95000         33.57188             2.23813
O         1             6580.08125         .70312               .70312
I         9             6584.28125         4.90312              .54479
SO        15            6623.30000         9.64688              .64313
SI        135           6663.50000         45.64687             .33812
OI        9             6588.56250         3.57813              .39757
SOI       135           6713.00000         35.57188             .26350
Mean                    6579.37813
Total     319                              133.62188
Table 44. Variance Components for Parent Ratings, African-American Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         15            .0760185           .0760185               .0400206
O         1             -.0004630          -.0004630              .0039922
I         9             .0022685           .0022685               .0091314
SO        15            .0379630           .0379630               .0222876
SI        135           .0373146           .0373146               .0258969
OI        9             .0083796           .0083796               .0107805
SOI       135           .2634954           .2634954               .0318367
Table 45. ANOVA Summary Table for Parent Ratings, Hispanic Sample.
          Degrees of    Sums of Squares    Sums of Squares
Effect    Freedom       for Mean Scores    for Score Effects    Mean Squares
S         31            11399.95000        202.56094            6.53422
O         1             11197.39063        .00156               .00156
I         9             11210.48436        13.09531             1.45503
SO        31            11412.10000        12.14844             .39189
SI        279           11524.50000        111.45469            .39948
OI        9             11213.15625        2.67031              .29670
SOI       279           11603.00000        63.67969             .22824
Mean                    11197.38906
Total     639                              405.61094
Table 46. Variance Components for Parent Ratings, Hispanic Sample.
          Degrees of    Model Variance Components
Effect    Freedom       Using Algorithm    Using EMS Equations    Standard Error
S         31            .2985551           .2985551               .0805986
O         1             -.0014337          -.0014337              .0005008
I         9             .0154234           .0154234               .0099123
SO        31            .0163642           .0163642               .0098378
SI        279           .0856183           .0856183               .0194075
OI        9             .0021393           .0021393               .0039991
SOI       279           .2282426           .2282426               .0192557
APPENDIX C
GENOVA PROGRAM AND SAMPLE DATA
COLUMN 111111111122222222223333333333444444444455555555556666 666666677777777778 123456789012345678901234567890123456789012345678901234 567890
GSTUDY      TEACHER RATINGS FOR TOTAL SAMPLE
OPTIONS     NEGATIVE
EFFECT      * S 127
EFFECT      + R 2
EFFECT      + I 10
FORMAT      (10F2.0/10F2.0)
PROCESS
555445544433334434444
453454344334434344434
443344343343444433434
422443444344344444533
(DATA CONTINUES FOR 600 LINES)
COMMENT     1ST SET OF D STUDY CONTROL CARDS
DSTUDY      #1 - S X R X O X I - I - FIXED
DEFECT      $S 127
DEFECT      R 1 2
DEFECT      I 10
ENDSTUDY
COMMENT     2ND SET OF D STUDY CONTROL CARDS
DSTUDY      #1 - S X R X O X I - I - FIXED
DEFECT      $S
DEFECT      R
DEFECT      I 109 8 5 1
ENDSTUDY
COMMENT     3RD SET OF D STUDY CONTROL CARDS
DSTUDY      #1 - S X R X O X I - I - FIXED
DEFECT      $S
DEFECT      R 1 2 3
DEFECT      I 10
ENDSTUDY
FINISH