UNCERTAINTY IN SKELETAL AGING: A RETROSPECTIVE STUDY
AND TEST OF SKELETAL AGING METHODS AT THE JOINT
POW/MIA ACCOUNTING COMMAND CENTRAL
IDENTIFICATION LABORATORY
____________
A Thesis
Presented
to the Faculty of
California State University, Chico
____________
In Partial Fulfillment
of the Requirements for the Degree
Master of Arts
in
Anthropology
____________
by
Carrie Ann Brown
Spring 2009
APPROVED BY THE DEAN OF THE SCHOOL OF
GRADUATE, INTERNATIONAL, AND INTERDISCIPLINARY STUDIES:
_________________________________ Susan E. Place, Ph.D.
APPROVED BY THE GRADUATE ADVISORY COMMITTEE:
_________________________________ Eric J. Bartelink, Ph.D., Chair
_________________________________ Beth S. Shook, Ph.D.
_________________________________ John E. Byrd, Ph.D.
DEDICATION
To the men and women who give their lives to protect our freedom, and to those who never returned…you are not forgotten.
Until they are home.
ACKNOWLEDGMENTS
This thesis has been a labor of love. Its successful completion would not have
been possible without contributions from a significant number of people. What follows is
not an exhaustive list of everyone I have met during this process, but a thank you to the
key players in my research and writing and to those people who have made my graduate
school experience an unforgettable one.
Thanks are in order first to my thesis committee: Drs. Eric Bartelink, Beth
Shook, and John Byrd. As my chair, Dr. Bartelink has dutifully (and quickly) read my
thesis in the past few months, always offering insightful and timely comments and
critiques. I truly appreciate his help and regular availability, especially considering his
hectic schedule and his many, many duties as advisor, professor, researcher, and
anthropologist. The physical anthropology graduate students, including myself, benefit
tremendously from his expertise and dedication. I truly believe that my three years at
Chico State would not have been as successful or fulfilling without his guidance and
support.
An enormous thank you goes to Dr. Shook for serving on my committee. Her
comments helped me refine my ideas and statistical analyses. She too is a fabulous
mentor for the graduate students as a whole and serves as a model professor, the kind that
I one day hope to become. Additionally, she has served on my committee while expecting
her first child. Her son, Joel Douglas Shook, was born April 12, 2009, and my thanks to
him for his timely arrival following my thesis defense!
I am indebted to Dr. Byrd for my thesis topic and his statistical expertise, as
well as his guidance during data collection. When he asked me a little over a year ago
what my thesis topic was going to be, I had no way of knowing that his idea would grow
into a research project not only for my thesis, but also for years to come. His support and
confidence in my skills as an anthropologist are unwavering and I look forward to
working with him and the rest of the staff at the Central Identification Laboratory for
years to come, whether in Hawaii or elsewhere.
I would also like to thank the California State University, Chico,
Anthropology Department. Without listing all of the faculty and staff here, it is important
for me to say that this department is unique because while not everyone agrees all of the
time (perhaps even most of the time), we all still manage to get along (and have fun!). I
still remember how I felt immediately at home the first time that I met my cohort and a
majority of the faculty. This experience certainly would not have been as rich or as varied
without the many personalities that make up this department and I am thankful for this.
Thank you to my graduate cohort. The end goal of graduate school is certainly
important, but I have found the journey to be equally rewarding. Through all the first
semester tears and frustrations, our food-filled seminars, and life outside of anthropology,
however rare, I know that I have made friends for life. I look forward to future
collaborations with them, as well as always sharing the memories of our years in graduate
school at Chico State.
Thank you to all of the anonymous volunteers from the CIL and at the 2009
AAFS meetings who participated in the interobserver error study. Thank you also to the
three other members of the inaugural Forensic Science Academy: Angela Soler, Cate
Bird, and Laurel Freas. They were the first to hear about my research as I worked through
the kinks of data collection and initial statistical analyses and the first to volunteer their
time to test several age estimation methods. Additionally, Angela helped run my study at
the AAFS meetings. Unrelated to this thesis, but certainly not to my overall graduate
school experience, they were there for me through thick and thin, from activity-filled
days at the lab to the jungles in Laos.
Apart from my current academic community, I would also like to thank my
family, my first academic community, for their continued support. This includes late
night and weekend phone calls fielded by my mother, many, many conversations with my
sister, and every supportive and humorous email from my father. I am thankful that they
have always had faith in me and my dreams, especially in my earlier years when I was
not quite sure what they were. I am also indebted to them for instilling a life-long love of
learning and sense of curiosity, no matter how much my sister and I harass our father
about his many interests. Thanks to Grandma and Grandpa Brown for their support and
excitement about all those “bone books” I bring for them to read and thanks to
Grandaddy and Grandmommy Newcomer who are no longer with us.
Finally, thank you to everyone I have met during the thesis process and over
the last three years. I have to thank Empire Coffee for the delicious Aztec mochas and a
place to go when I could not possibly work any longer at home or school. Thanks to
Shannon Damon for also offering me an alternative workspace, the Human ID Lab,
where I wrote a good number of pages. Thanks to Karen Smith Gardner for allowing me
to invade her house while I was in between homes, and also for being a great hostess
even though she swore she would never have the time! And thank you to the rest of my
friends here in Chico who are always ready to provide support and laughter at every
moment, whether it be knee-fighting or good old-fashioned jokes.
The end of my thesis is bittersweet because it signifies the end of three years
of hard work, but also the end of my time in Chico. A large number of students in my
cohort will also leave Chico at the end of May and I wish them all great success and
happiness. I look forward to the future with great anticipation, but also look back fondly
on my time here. Again, thank you to everyone who was a part of my life during the past
three years.
This research was supported in part by an appointment to the Student
Research Participation Program at the Joint POW/MIA Accounting Command/Central
Identification Laboratory (JPAC/CIL) administered by the Oak Ridge Institute for
Science and Education (ORISE) through an interagency agreement between the U.S.
Department of Energy and JPAC/CIL.
TABLE OF CONTENTS
Dedication
Acknowledgments
List of Tables
List of Figures
Abstract

CHAPTER
I. Introduction
    Research Design
    Outline of the Thesis
II. Adult Skeletal Aging
    Historical Perspectives
    General Concepts
    Key Terms
    Trends
    Published Methods
    The Statistical Basis of Age Estimation
    Summary
III. Uncertainty Analysis
    Standards
    Error
    Uncertainty in Age Estimation
    Summary
IV. Methods I: Retrospective Study
    The Sample
    Data Collection
    Data Analysis
    Summary
V. Methods II: Interobserver Error Study
    Choice of Methods
    Design of Study
    Data Analysis
    Summary
VI. Results I: Retrospective Study
    The Sample
    Method-to-Method Comparison
    Method by Method
    Summary
VII. Results II: Interobserver Error Study
    Participants
    Method Performance
    Summary
VIII. Discussion
    Method Performance
    Error
    Analyst Experience
    General Observations
    Limitations of the Thesis
    Summary
IX. Summary
    Uncertainty in Skeletal Age Estimation
    Future Research

References Cited

Appendices
A. Final Sample Sizes for All Methods
B. Age Distributions for Long Bone Epiphyses: McKern-Stewart (1957)
C. Moorrees et al. (1963) Correct and Incorrect Classification by Tooth Number and Root
D. Mincer et al. (1993) Correct and Incorrect Classification by Tooth Number
LIST OF TABLES
TABLE
1. Epiphyseal Scoring by Method
2. Descriptive Statistics by Method
3. P-values From One-way ANOVA with Bonferroni Correction: All Methods
4. Correct and Incorrect Classifications by Method (Excluding Epiphyseal Fusion)
5. Error by Method (Excluding Epiphyseal Fusion)
6. Comparison of Pearson’s r and r² by Method (Excluding Dental Methods)
7. Correct and Incorrect Classifications for Epiphyseal Fusion Methods
8. Age Distribution of Stages of Iliac Crest Union (in %): McKern-Stewart (1957)
9. Correct and Incorrect Classifications by Age Interval of the Mann et al. Maxillary Suture Method (1987, 1991)
10. Error Values for Mann et al. (1991) by Reported Interval
11. Correct and Incorrect Classifications of Dental Formation Methods
12. Error of Dental Formation Methods
13. Age Distribution of Stages of Dental Root Formation (in %): Moorrees et al.
14. Age Distribution of Stages of Dental Root Formation (in %): Mincer et al.
15. Todd (1920) Pubic Symphysis Method Sample (n=10)
16. Correct and Incorrect Classification: McKern-Stewart (1957) Pubic Symphysis Method
17. Error of McKern-Stewart (1957) Pubic Symphysis Method by Composite Score Group
18. Correct and Incorrect Classification: Suchey-Brooks Pubic Symphysis Method
19. Error of Suchey-Brooks Pubic Symphysis Method by Phase
20. P-Values from ANOVA with Bonferroni Correction Between the First Four Phases of the Suchey-Brooks Method: Bias
21. P-Values from ANOVA with Bonferroni Correction Between the First Four Phases of the Suchey-Brooks Method: Inaccuracy
22. Comparison of Bias and Inaccuracy: McKern-Stewart and Suchey-Brooks Pubic Symphysis Methods
23. Correct and Incorrect Classification: Lovejoy et al. (1985b) Auricular Surface Method
24. Correct and Incorrect Classification for Single Phases Only: Lovejoy et al. (1985b) Auricular Surface Method
25. Error of Lovejoy et al. (1985b) Auricular Surface Method by Phase
26. Error of Lovejoy et al. (1985b) Auricular Surface Method: Single Phases Only
27. P-Values from ANOVA with Bonferroni Correction Between the First Three Phases of the Lovejoy et al. (1985b) Method: Bias
28. P-Values from ANOVA with Bonferroni Correction Between the First Three Phases of the Lovejoy et al. (1985b) Method: Inaccuracy
29. P-Values from ANOVA with Bonferroni Correction Between the First Three Phases of the Lovejoy et al. (1985b) Method: SEI
30. Correct and Incorrect Classification: Osborne et al. (2004) Auricular Surface Method
31. Error of Osborne et al. (2004) Auricular Surface Method by Phase
32. Buckberry-Chamberlain (2002) Auricular Surface Method Sample (n=10)
33. Correct and Incorrect Classification: Iscan et al. (1984b) Sternal Rib End Method
34. Correct and Incorrect Classification Using Nawrocki (n.d.) Prediction Intervals
35. Error of Iscan et al. (1984b) Sternal Rib End Method by Phase
36. Percent Confidence in Assigned Composite Score by Stage: Samples A and B
37. SEI by Highest Degree Obtained: Samples A and B
38. SEI by Years of Experience in Skeletal Aging: Samples A and B
39. Percent Confidence in Assigned Phase by Stage: Samples C and D
40. SEI by Highest Degree Obtained: Samples C and D
41. SEI by Years of Experience in Skeletal Aging: Samples C and D
LIST OF FIGURES
FIGURE
1. Individuals Identified by Conflict (N=1717)
2. Age Distribution of Total Sample (n=979)
3. Age Distribution: Albert-Maples 1995 (n=24)
4. Age Distribution: Webb-Suchey Clavicle 1985 (n=33)
5. Age Distribution: McKern-Stewart Epiphyses 1957 (n=161)
6. Age Distribution: McKern-Stewart Pubic Symphysis 1957 (n=79)
7. Age Distribution: Suchey-Brooks Pubic Symphysis (n=10)
8. Age Distribution: Todd Pubic Symphysis 1920 (n=93)
9. Age Distribution: Lovejoy et al. Auricular Surface 1985 (n=147)
10. Age Distribution: Osborne et al. Auricular Surface 2004 (n=151)
11. Age Distribution: Buckberry-Chamberlain Auricular Surface 2002 (n=10)
12. Age Distribution: Iscan et al. Sternal Rib End 1984 (n=21)
13. Age Distribution: Moorrees et al. Dental Formation 1963 (n=92)
14. Age Distribution: Mincer et al. Dental Formation 1993 (n=105)
15. Age Distribution: Mann et al. Maxillary Sutures 1991 (n=55)
16. Sum of Bias by Method (in Years)
17. Sum of Inaccuracy by Method (in Years)
18. Mean SEI by Method
19. Comparison of Known Ages of Identified Males Superimposed over the Summary Stage Observations for the Three Stages of Vertebral Centra Fusion as Given in the Albert and Maples (1995) Method
20. Comparison of Known Ages of Identified Males Superimposed over the Summary Stage Observations for the Four Stages of Vertebral Centra Fusion as Given in the McKern and Stewart (1957) Method
21. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Four Stages of Epiphyseal Fusion of the Medial Clavicle as Given in the Webb and Suchey (1985) Method
22. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Five Stages of Epiphyseal Fusion of the Medial Clavicle as Given in the McKern and Stewart (1957) Method
23. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Five Stages of Epiphyseal Fusion of the First Two Sacral Segments as Given in the McKern and Stewart (1957) Method
24. Distribution of Bias for the Mann et al. (1991) Maxillary Suture Method
25. Correlation of Estimated and Known Ages-at-Death for the Mann et al. (1991) Maxillary Suture Method (n=27)
26. Distribution of Bias for the Todd (1920) Pubic Symphysis Method
27. Correlation of Estimated and Known Age-at-Death for the Todd (1920) Pubic Symphysis Method (n=10)
28. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Composite Scores of Pubic Symphysis Components as Given in the McKern and Stewart (1957) Method
29. Distribution of Bias for the McKern-Stewart (1957) Pubic Symphysis Method
30. Correlation of Estimated and Known Age-at-Death for the McKern-Stewart (1957) Pubic Symphysis Method (n=73)
31. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Six Phases of the Pubic Symphysis as Given in the Suchey-Brooks Pubic Symphysis Method
32. Distribution of Bias for the Suchey-Brooks Pubic Symphysis Method
33. Correlation of Estimated and Known Age-at-Death for the Suchey-Brooks Pubic Symphysis Method (n=86)
34. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Eight Phases of the Auricular Surface as Given in the Lovejoy et al. (1985b) Auricular Surface Method
35. Distribution of Bias for the Lovejoy et al. (1985b) Auricular Surface Method
36. Correlation of Estimated and Known Age-at-Death for the Lovejoy et al. (1985b) Auricular Surface Method (n=147)
37. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Six Phases of the Auricular Surface as Given in the Osborne et al. (2004) Auricular Surface Method
38. Distribution of Bias for the Osborne et al. (2004) Auricular Surface Method
39. Correlation of Estimated and Known Age-at-Death for the Osborne et al. (2004) Auricular Surface Method (n=113)
40. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Seven Stages of the Auricular Surface as Given in the Buckberry-Chamberlain (2002) Revised Auricular Surface Method
41. Correlation of Estimated and Known Age-at-Death for the Buckberry-Chamberlain (2002) Revised Auricular Surface Method (n=9)
42. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Eight Phases of the Sternal Rib End as Given in the Iscan et al. (1984b) Sternal Rib End Method (Solid Rectangles) and Prediction Intervals as Calculated by Nawrocki (Dashed Rectangles)
43. Distribution of Bias for the Iscan et al. (1984b) Sternal Rib End Method
44. Correlation of Estimated and Known Age-at-Death for the Iscan et al. (1984b) Sternal Rib End Method (n=14)
45. Participants’ Self-Reported Fields of Study
46. Participants’ Self-Reported Highest Degrees Obtained
47. Participants’ Self-Reported Years of Experience with Skeletal Aging
48. Participants’ Self-Reported Approximate Number of Skeletons Analyzed
49. Distribution of Assigned Stages for Sample A (n=37)
50. Distribution of Assigned Stages for Sample B (n=37)
51. Distribution of Assigned Phases for Sample C (n=37)
52. Distribution of Assigned Phases for Sample D (n=34)
53. All Combinations of Suture Obliteration As Reported by Participants for Sample E (n=38)
54. Frequency of Sutures Scored As Obliterated: Sample E
55. All Combinations of Suture Obliteration As Reported by Participants for Sample F (n=36)
56. Frequency of Sutures Scored As Obliterated: Sample F
ABSTRACT
UNCERTAINTY IN SKELETAL AGING: A RETROSPECTIVE STUDY
AND TEST OF SKELETAL AGING METHODS AT THE JOINT
POW/MIA ACCOUNTING COMMAND CENTRAL
IDENTIFICATION LABORATORY
by
Carrie Ann Brown
Master of Arts in Anthropology
California State University, Chico
Spring 2009
Adult skeletal age estimation is an important facet of forensic anthropology,
paleodemography, and bioarchaeology. Estimating the age-at-death of adults is
problematic because of human variability in the aging process. Analysis of the error
associated with skeletal age estimation methods is necessary so that the performance of
these methods is not overestimated and so that the uncertainty in these skeletal
techniques can be quantified and better understood.
The purpose of this thesis is to analyze and describe the error associated
with skeletal age estimation methods used at the Joint POW/MIA Accounting Command
Central Identification Laboratory (JPAC/CIL) from 1972 to 31 July 2008. There were
six general categories of age estimation methods used: epiphyseal fusion, suture
closure, dental formation and eruption, and morphological changes in the pubic
symphysis, auricular surface, and sternal rib end. The total identified known
age-at-death sample was 979 individuals, although method sub-samples were much
smaller. Additional interobserver error research was conducted with three methods that
were problematic for the JPAC/CIL sample.
Results indicate that adult age estimation methods perform well for the
JPAC/CIL identified known age-at-death sample, most likely because of the young age
composition of this sample. Bias, inaccuracy, and scaled error index (SEI) values are
low for most methods and phases or stages of methods. Correlation between estimated
and known age-at-death is statistically significant for maxillary suture closure, pubic
symphysis, auricular surface, and sternal rib end methods. The auricular surface is the
poorest age indicator of those examined in the JPAC/CIL sample. It is also
recommended that fusion of the sacral segments no longer be used for age estimation
since this method had a correct classification rate of only 32.1%. Future research in
adult skeletal age estimation and refinement of existing techniques should include
estimation of measurement uncertainty.
CHAPTER I
INTRODUCTION
Physical anthropology examines what it means to be biologically human,
including human variability. Estimating age-at-death from skeletal remains is an integral
part of the field of physical anthropology because it helps contribute to the understanding
of aging and aging processes as well as facilitating the construction of individual and
population profiles. Forensic anthropologists, bioarchaeologists, and paleodemographers
all apply age estimation techniques and make significant contributions to research in age
estimation.
Methods for estimating the age-at-death of sub-adults rely on processes of
growth and development, while those for adults generally rely on skeletal degeneration.
Because growth and development follow a similar pattern in all individuals, unlike the
adult aging process, sub-adult aging techniques are subjected to far less scrutiny than
adult aging techniques. Aykroyd et al. (1999) have shown that many aging
techniques have a tendency to overage younger individuals and underage older
individuals. Additionally, age estimation of older individuals is more problematic than
that of younger individuals (e.g., Schmitt et al. 2002; Berg 2008).
Methods of age estimation in adults can be broken down into several broad
categories: epiphyseal fusion; suture closure; third molar formation and eruption;
morphological changes in the pubic symphysis, auricular surface, and sternal rib ends;
and methods that combine a number of skeletal age indicators. This list is not
intended to be exhaustive, but rather to represent the most commonly employed
techniques for age estimation and those techniques that will be examined in this study.
Anthropologists usually apply a variety of methods to arrive at a “best-fit” age
estimate. Research in refining and developing age estimation techniques is on-going
and varied.
Mathematical models of age estimation have undergone significant critiques.
Traditionally, linear regression and correlation have been used to derive age estimates
from skeletal age indicators. Recent research has suggested that Bayesian-based models
may be better suited to age estimation calculations than regression and correlation (e.g.,
Konigsberg and Frankenberg 1992; Schmitt et al. 2002; Steadman et al. 2006). While
mathematically more complex than traditional models, Bayesian models have the ability
to analyze non-linear relationships and may even be more successful in predicting the
age-at-death of older individuals (Schmitt et al. 2002).
Another key concern in age estimation is understanding the error associated
with different methods. This is particularly relevant in the field of forensic anthropology,
where evidence presented as part of expert witness testimony must have known error
rates and standards of application (Daubert v. Merrell Dow Pharmaceuticals, 509 US 579
[1993]). When applying several age estimation methods, it is important to not
overestimate the performance of a single method and to use methods appropriate to the
general age category of the individual (Martrille et al. 2007).
Estimating uncertainty associated with different measurements is
internationally regulated. The International Organization for Standardization (ISO)
publishes baseline documents to ensure global standardization in measurement, including
guidelines for estimating measurement uncertainty. Additionally, the American Society
of Crime Laboratory Directors/Lab Accreditation Board (ASCLD/LAB) regulates and
accredits forensic science testing laboratories in the United States and worldwide.
Finally, each institution is charged with maintaining its own standards that adhere to both
ISO and ASCLD/LAB regulations in order to maintain accreditation. This is usually
accomplished by the establishment and maintenance of standard operating procedures
(SOPs).
The Joint POW/MIA Accounting Command Central Identification Laboratory
(JPAC/CIL), located on Hickam Air Force Base (AFB), is tasked with the recovery and
identification of servicemen and women lost in foreign conflicts and is the largest skeletal
human identification laboratory in the world. The composition of the JPAC/CIL sample
is largely male, young, Caucasian, and of similar stature. This laboratory is accredited
under the ASCLD/LAB-International program and follows ISO standards as well as
ASCLD/LAB supplemental requirements for estimating uncertainty in measurement,
which includes age estimation methods. Currently, the JPAC/CIL laboratory manual does
not detail error associated with skeletal age estimation, but SOP 3.4 (Determining
Biological Profiles) outlines the procedures to follow for age estimation.
Research Design
The purpose of this thesis is to analyze and describe the error associated with
each skeletal age estimation method currently in use at the JPAC/CIL and to determine
how well each method performed for this sample. Because adult age estimation
techniques do not perform as well as sub-adult techniques and this sample is composed
largely of young adults, it is important to continually analyze possible measurement
uncertainty on a method-by-method basis. Since the JPAC/CIL is an accredited
institution, it is also necessary to quantify the error associated with each method.
Therefore, this study also has the purpose of estimating measurement uncertainty in
relation to laboratory SOPs. Records of individuals identified between 1972 and 31 July
2008 were used to calculate error. Additional tests of error were conducted with skeletal
samples from the JPAC/CIL anatomical collection.
Hypothesis 1
The JPAC/CIL has a long history of identifying American war dead and
contributing to research in human identification. Of research in age estimation, McKern
and Stewart (1957) is certainly one of the most pivotal studies. McKern and Stewart’s
research was based on identified Korean War casualties and represents a sample
demographically similar to that of the JPAC/CIL sample used in this study. Many of the
observations made by McKern and Stewart are still used at the JPAC/CIL, including
epiphyseal fusion of the long bones, iliac crest, medial clavicle, and vertebrae and the
component scoring system for the pubic symphysis.
The sample of identified individuals from the JPAC/CIL is entirely young
men. Aging methods usually perform better for younger individuals (as compared to
older individuals). The age estimation methods employed at the JPAC/CIL conform to
this general rule, especially when considering that several of the methods are particularly
useful for late adolescents (e.g., epiphyseal fusion, dental formation and eruption).
Due to the use of methods developed on a similar sample and the young
composition of the JPAC/CIL identified sample, it is expected that age estimation
methods will perform well overall. Method performance will be measured by correct and
incorrect classifications, bias, inaccuracy, a scaled error index (SEI), and Pearson’s r
correlation coefficient, which will be calculated for each method. Methods that perform
well should have a high percentage of correct classifications, low bias, inaccuracy, and
SEI, and a high correlation between known and estimated age-at-death.
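As an illustration only, the two simplest of these measures can be sketched in Python.
The ages and intervals below are hypothetical and the function names are placeholders.
Two assumptions are made that the text does not state: a classification is counted as
correct when the known age falls inside the estimated interval, and the interval
midpoint stands in for the point estimate when computing Pearson's r.

```python
from math import sqrt


def percent_correct(actual, intervals):
    """Percent of cases whose known age lies inside the estimated interval (assumed definition)."""
    hits = sum(low <= age <= high for age, (low, high) in zip(actual, intervals))
    return 100.0 * hits / len(actual)


def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical known ages and estimated age intervals (years), for illustration only.
actual_ages = [19, 22, 25, 31]
estimated_intervals = [(17, 21), (20, 25), (26, 30), (28, 35)]

# Assumption: the midpoint of each interval is used as the point estimate.
point_estimates = [(low + high) / 2 for low, high in estimated_intervals]

print(percent_correct(actual_ages, estimated_intervals))
print(pearson_r(actual_ages, point_estimates))
```

With these invented values, three of the four intervals contain the known age (75
percent) and r is roughly 0.97.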
Hypothesis 2
Error can be measured in a variety of ways. Several different calculations
provide quantifications of error. Bias is the average error in years that takes into
consideration under- and overaging, while inaccuracy is the average error in years with
no implication of directionality (Meindl and Lovejoy 1989). The SEI is an index
developed for this thesis that allows the comparison of error between estimated and
actual age regardless of scale or sample size.
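The two measures attributed to Meindl and Lovejoy (1989) can be written out as a short
Python sketch. The ages are hypothetical and the function names are placeholders. The
SEI is not sketched here because its formula is not given in this chapter.

```python
from statistics import mean


def bias(estimated, actual):
    """Mean signed error in years: positive means overaging, negative means underaging."""
    return mean(e - a for e, a in zip(estimated, actual))


def inaccuracy(estimated, actual):
    """Mean absolute error in years, with no regard to direction."""
    return mean(abs(e - a) for e, a in zip(estimated, actual))


# Hypothetical known and estimated ages-at-death (years), for illustration only.
actual_ages = [19, 22, 25, 31, 40]
estimated_ages = [21, 23, 24, 29, 35]

print(bias(estimated_ages, actual_ages))        # signed errors: +2, +1, -1, -2, -5
print(inaccuracy(estimated_ages, actual_ages))  # absolute errors: 2, 1, 1, 2, 5
```

For these invented values the bias is -1 year (a net tendency to underage) while the
inaccuracy is 2.2 years, which shows how over- and underaging can partly cancel in the
bias but not in the inaccuracy.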
Correct application of a method produces error values that are normally
distributed. Because the measure of bias produces both positive and negative values, it
will be used to examine error distribution. Analysis of variance (ANOVA) and t-tests can
also help to determine where error is most pronounced. For example, a significant
difference in bias, inaccuracy, or SEI between phases of a single method indicates which
phase may be more likely to produce incorrect age estimations. When possible, methods
that use the same skeletal indicators (e.g., pubic symphysis methods) will be compared to
one another.
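A one-way ANOVA of this kind compares the variation in error between phases with the
variation within phases. The sketch below computes only the F statistic, by hand, for
hypothetical signed errors grouped by phase. The function name and the data are
assumptions for illustration, and judging significance would still require the F
distribution from a statistics package or a table.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual errors around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical signed errors (estimated minus actual age, in years) for three phases.
errors_by_phase = [
    [1, 2, 0],     # phase I
    [0, -1, 1],    # phase II
    [-4, -5, -3],  # phase III
]

print(one_way_anova_f(errors_by_phase))
```

For the invented errors above, F = 21 on 2 and 6 degrees of freedom, driven by the
strongly negative errors in the third phase.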
It is expected that error (as indicated by bias values) will be normally
distributed for each method. Additionally, methods that have higher accuracy are
expected to be less precise. Assignment of age estimations based on stages or phases
should also be consistent between individuals.
Hypothesis 3
The success of skeletal age estimation methods can also be related to analyst
experience. Experience can be measured by highest degree held, total number of
skeletons analyzed, and familiarity with the method or methods employed. For this
portion of the study, the SEI will be calculated and compared between groups with
different levels of experience, similar to Adams and Byrd (2002).
It is expected that the SEI is dependent upon experience. Those individuals
with more experience in skeletal age estimation, as indicated by highest degree obtained
and number of years of experience in skeletal aging, will have lower average SEI scores
than individuals with less experience. Possible differences in relation to how confident
analysts felt in their final age estimations will also be examined.
Other Questions
There are several other research questions that are not based on strict
hypothesis-testing. These questions are descriptive in nature and are designed to better
understand age estimation methods in the context of the JPAC/CIL SOPs. The following
questions will be discussed: what are the sources of error in age estimation at the
JPAC/CIL? Is error systematic or random? What recommendations can be made for age
estimation methods used at the JPAC/CIL?
Outline of the Thesis
In order to understand the performance of age estimation methods in the
JPAC/CIL identified sample, it is necessary to first understand the basis of skeletal age
estimation, methods used to estimate age-at-death, and procedures for estimating
measurement uncertainty. Chapter II is an in-depth literature review of adult skeletal age
estimation, including historical perspectives, general concepts and terms, trends,
published aging methods, and the statistical basis of age estimation. Chapter III focuses
on uncertainty analysis, outlining standards for estimating measurement uncertainty and
error.
Chapters IV and V outline the methods used to conduct both portions of this
study. Chapter VI details the results of the retrospective study, conducted using the
records of identified individuals, and Chapter VII gives the results of a preliminary
investigation of the application of three age estimation methods. Chapter VIII discusses
the results from both studies, synthesizes these results with relevant concepts outlined in
Chapters II and III, and details limitations of this thesis. Finally, Chapter IX summarizes
all findings and suggests avenues for future research.
CHAPTER II
ADULT SKELETAL AGING
Age estimation of unidentified adult skeletal remains is a significant facet of
applied physical anthropology. A variety of methods and mathematical analyses are
employed in an attempt to construct accurate individual and population profiles based on
age-at-death. This chapter will discuss the development of adult skeletal age estimation
methods and their use in physical anthropology.
Historical Perspectives
The first attempts at age estimation by anatomists and anthropologists began
in earnest during the 1920s following World War I (e.g., Todd 1920, 1921; Stevenson
1924; Todd and Lyon 1924, 1925; Todd and D’Errico 1928). T. Wingate Todd and his
colleagues at Western Reserve University were instrumental in launching studies of
skeletal aging using a documented skeletal collection (McKern and Stewart 1957; Bass
2005). The first efforts at adult skeletal age estimation focused on skeletal maturation and
a better understanding of the morphological age-related changes observed in the adult
skeleton, including variability in these changes. These studies served, and continue to
serve, as building blocks for research in adult skeletal aging. Individual methods and their
histories will be discussed in further detail below.
During the period directly following World War II, great advances in
identification, including skeletal age estimation, were made. This war and the Korean
War resulted in large numbers of U.S. servicemen killed whose remains were not immediately
recovered and were often badly decomposed (Byers 2008). The need for identification of
these individuals propelled research in skeletal identification. One of the most pivotal
studies to emerge from the Korean War period was McKern and Stewart’s Skeletal Age
Changes in Young American Males (1957). Based on males of military age (generally
between the ages of 17 and 30 years old), McKern and Stewart (1957) was the first study
that did not use anatomical specimens from dissecting rooms, thereby providing
important information on younger individuals (Bass 2005).
Until the 1980s, skeletal biologists relied mainly on macroscopic, or gross,
techniques for age estimation, such as cranial suture closure and the morphology of the
pubic symphyseal face (Iscan 1989a). The rise of forensic anthropology during the
“Modern Period1” of the discipline (Byers 2008) and critiques in the field of
paleodemography (e.g., Bocquet-Appel and Masset 1982) meant that new methods were
being developed, such as the sternal rib end (Iscan et al. 1984b, 1985) and the auricular
surface (Lovejoy et al. 1985b). Interest in skeletal aging also initiated a re-examination of
older methods, such as the pubic symphysis (Suchey 1979; Katz and Suchey 1986;
Brooks and Suchey 1990). Additionally, microscopic and radiographic methods began to
receive much more attention. For example, histological techniques based on the osteon
counting method of Kerley (1965) became more common in the research literature during
this time period (e.g., Kerley and Ubelaker 1978; Stout 1989).
1 The modern period is defined by Byers (2008) as 1972 to the present.
The search for new and better ways to estimate age continued through the
1990s to the present day. Methods to estimate age from new elements like the acetabulum
(e.g., Rougé-Maillart et al. 2004; Rissech et al. 2006) and the sacral auricular surface
(Kutyla 2008) are currently being developed. More emphasis has been placed on
understanding the accuracy and error in age estimation methods. Questions currently
being asked in the field of skeletal aging include: is it better to use a single indicator or a
multifactorial method? Should the same skeletal aging techniques be used across the
board or adapted to a specific area of research, i.e., forensic anthropology versus
paleodemography? How are age estimates being constructed and what statistics should be
employed? What is the error or uncertainty in measurement associated with skeletal age
estimation? Komar and Buikstra (2008) raise these questions and address many of the
theoretical perspectives that will continue to define the direction of further developments
in skeletal age estimation in physical anthropology.
General Concepts
Age estimation from adult human skeletal remains is certainly an integral part
of physical anthropology, influencing forensic anthropology, bioarchaeology, and
paleodemography. Physical anthropologists are inherently concerned with what it means
to be biologically human. Gaining a better understanding of aging and its skeletal
manifestations leads to increased knowledge concerning the nature of human variability.
There is an ever-growing toolkit available to physical anthropologists
interested in age estimation from the skeleton. Varying statistical procedures offer
different ways to analyze data; thus, skeletal aging methods are constantly being
developed and refined. Currently, there are no published international standards for
skeletal age estimation, though attempts to codify and clarify procedures have been made
within the United States (e.g., Buikstra and Ubelaker 1994; Moore-Jansen et al. 1994).
However, European and American anthropologists do not use the same standards, an
issue addressed by Wittwer-Backofen et al. (2008). Laboratories and institutions are
generally able to utilize whichever methods they choose and many of these agencies may
not even have set protocols for age estimation, making standardization difficult.
Estimating age-at-death from skeletal remains is generally problematic.
Maples (1989) likens skeletal aging to an art, rather than a precise science, and
recommends the use of as many techniques as possible in constructing age ranges. While
sub-adult aging techniques rely on skeletal and dental growth, events that take place at a
fairly consistent rate between individuals, adult aging techniques rely on degenerative
skeletal changes that are much more variable and less predictable than growth sequences
(White and Folkens 2005). In addition, many aging techniques have demonstrated a
general trend of overaging younger individuals and underaging older individuals
(Aykroyd et al. 1999).
Growth and aging can be affected by disease, health, environment, presence or
absence of trauma, and cultural practices (Bogin 1999), so these factors must also be
taken into consideration when comparing age estimations between samples and using
methods developed from one reference group to make inferences about another. Age
estimation methods developed using samples from mainly European populations may not
be applicable to individuals of Asian or African ancestry. There is a need for further
research in population-specific standards in the field of applied osteology (Schmitt et al.
2002).
Due to the inherent challenges in aging and the need to better understand and
interpret age estimates from the adult skeleton, physical anthropologists must constantly
develop and test techniques for age estimation. This includes an effort to better
understand age-related morphological changes as potentially population-specific
phenomena. Success in interpretation, whether it be individual identification or the
construction of representative population profiles, is contingent upon reliable, valid,
accurate, and precise methods of aging.
Key Terms
Reliability is the degree to which a method produces the same results when
used at different times (Adams and Byrd 2002), either by multiple observers or the same
observer. A highly reliable aging technique produces similar age estimates for the same
individual even when applied by different analysts or at different times by the same
analyst. Reliability can be tested for a method or technique by conducting interobserver
or intraobserver variation studies to determine error rates. Low interobserver variation (or
error) indicates high reliability. Reliability can also be referred to as repeatability,
indicating that the technique or method applied produces similar measurements of the
same quantity or entity being measured (ISO 2004:4.21).
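One simple way to look at interobserver variation is to compare the phase scores that
two analysts assign to the same individuals. The sketch below is illustrative only: the
scores are hypothetical, the function names are placeholders, and percent agreement and
mean phase difference are chosen here for simplicity rather than taken from the cited
sources.

```python
def percent_agreement(scores_a, scores_b):
    """Percent of individuals given the same phase by both observers."""
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return 100.0 * matches / len(scores_a)


def mean_phase_difference(scores_a, scores_b):
    """Average absolute difference in assigned phase between the two observers."""
    return sum(abs(a - b) for a, b in zip(scores_a, scores_b)) / len(scores_a)


# Hypothetical phase scores for six individuals, for illustration only.
observer_1 = [1, 2, 2, 3, 4, 5]
observer_2 = [1, 2, 3, 3, 4, 4]

print(percent_agreement(observer_1, observer_2))
print(mean_phase_difference(observer_1, observer_2))
```

Here the observers agree on four of six individuals and never differ by more than one
phase. The same functions could be applied to two scoring sessions by a single analyst
to look at intraobserver variation.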
Validity concerns the degree to which a method actually measures what it
claims to measure (Adams and Byrd 2002). Validation studies are designed to test
techniques and their applicability. In skeletal age estimation, determination of validity for
a method or technique is usually conducted by testing a specific method or technique on a
sample of known age individuals (e.g., Wittwer-Backofen et al. 2004; Ginter 2005;
Mulhern and Jones 2005). These tests serve as important reviews of methodology and
also produce data on the accuracy, precision, and reliability of a technique.
Accuracy is the degree of error in a measurement as calculated from the true
value (Youden 1998) or the “closeness of agreement between a quantity value obtained
by measurement and the true value of the measurand” (ISO 2004:A2). For skeletal age
estimation, this is the ability of a method to continually and consistently provide age
intervals that encompass the true age-at-death of individuals. Calculations of inaccuracy,
the absolute difference between actual and estimated ages at death, and bias, the
directionality of these differences, are used to measure error and determine the validity of
a method or technique.
Precision is linked to accuracy and entails the level of refinement of the
measurement or estimate (White and Folkens 2005). The ISO defines precision as:
“closeness of agreement between quantity values obtained by replicate measurements of
a quantity” (2004:2.35). Precision is determined by the amount of deviation of
individual measurements from the average of the total measurements (Youden 1998) and
can be expressed as standard deviation or variance. A very precise technique gives an age
estimate with a very small standard deviation from the average value measured for a
sample and thus a small age interval.
While all of these terms are closely related, they represent different facets of
skeletal aging methods. For example, a technique may be highly accurate but imprecise,
e.g., the actual age-at-death falls into the range of expected values predicted by the
method but the range of values is so large it does not give highly useful individualizing
information. Conversely, a technique could also be highly precise but inaccurate, e.g., the
estimated age-at-death offers a narrow interval of one to two years but rarely correctly
estimates actual age-at-death. Finally, a technique could be both accurate and precise but
unreliable, e.g., a single researcher may have great success with applying a certain age
estimation technique but when tested by multiple researchers the technique suffers from
high interobserver error. Many varying degrees of accuracy, reliability, and precision are
possible in scientific research and it is vital that all be accounted for when analyzing data
and methodology.
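The contrast between an accurate but imprecise technique and a precise but inaccurate
one can be made concrete with a small Python sketch. Everything in it is hypothetical:
the ages, the two sets of intervals, and the function name. Accuracy is represented by
how often the interval contains the known age, and precision by the average width of
the interval.

```python
def coverage_and_width(actual, intervals):
    """Return (percent of intervals containing the known age, mean interval width in years)."""
    n = len(actual)
    hits = sum(low <= age <= high for age, (low, high) in zip(actual, intervals))
    mean_width = sum(high - low for low, high in intervals) / n
    return 100.0 * hits / n, mean_width


# Hypothetical known ages-at-death (years), for illustration only.
actual_ages = [20, 28, 35, 50]

# A method that always reports a very wide interval: accurate but imprecise.
wide_method = [(18, 60), (18, 60), (18, 60), (18, 60)]

# A method that reports two-year intervals: precise but frequently wrong.
narrow_method = [(24, 26), (27, 29), (40, 42), (44, 46)]

print(coverage_and_width(actual_ages, wide_method))
print(coverage_and_width(actual_ages, narrow_method))
```

The wide method captures every known age but with a 42-year interval, while the narrow
method offers a 2-year interval that contains the known age in only one of four cases.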
Trends
Forensic Anthropology
Forensic anthropological analysis is inherently concerned with identification
on an individual level. Recent developments in the field suggest that it may be moving
away from this basic concern and towards the analysis of events that occurred at or
around the time of death, as represented by the fields of forensic taphonomy and forensic
archaeology (Dirkmaat et al. 2008). However, the construction of a biological profile still
remains a central focus of forensic anthropologists because it is necessary for the
comparison of missing persons files to unidentified skeletal remains. The data generated
from skeletal analyses can lead to a positive identification (Byers 2008). Elements most
commonly included in the biological profile are: sex, age-at-death, ancestry, and stature
(Komar and Buikstra 2008).
In the early 1990s, the development of more stringent guidelines concerning
expert witness testimony and admissibility of evidence meant that forensic
anthropologists began to be held increasingly accountable for the reliability of their
techniques under the Federal Rules of Evidence. The 1993 Supreme Court ruling in the
case of Daubert v. Merrell Dow Pharmaceuticals set the precedent for federal trial judges
to be the “gatekeepers” of evidence. Specifically, this ruling concerned the relevancy and
reliability of expert witness testimony. Evidence must be based on the scientific method,
which means techniques have to be empirically tested, subject to peer review, have a
known error rate and standards of application, and be generally accepted by the scientific
community (Daubert v. Merrell Dow Pharmaceuticals, 509 US 579 [1993]).
Increasingly, in the post-Daubert age, forensic anthropologists are required to provide
standard error rates and measures of reliability for the techniques that they use. Both new
methods and commonly used methods must be consistently tested to ensure their
reliability and forensic anthropologists “should…be particularly cautious that their
investigations result in methods and techniques that will be admissible under the Daubert
guidelines” (Christensen 2004:2).
A second Supreme Court case in 1999, Kumho Tire Company, Ltd. v.
Carmichael, established further guidance for expert witness testimony. Specifically,
Kumho gave greater flexibility to the Daubert guidelines with the understanding that not
all expert witness testimony will necessarily meet every requirement of Daubert
(Kumho Tire Company, Ltd. v. Carmichael, 526 US 137 [1999]). Grivas and Komar
(2008) express concern about the lack of discussion about Kumho in the forensic
anthropology literature. In fact, “many anthropological techniques already meet the
criteria for admissibility under Kumho, potentially making many revisions [of analytical
techniques] unnecessary” (Grivas and Komar 2008:773). However, Grivas and Komar
(2008) also insist that Daubert and Kumho are complementary. What is clear from this
discussion is that the subject of expert witness testimony has become crucial to any
analysis in forensic anthropology and its practitioners must understand their role within
the legal system.
Paleodemography
In contrast to forensic anthropology, paleodemography focuses on the
construction of group or population profiles. Central to paleodemography is the
generation and interpretation of skeletal age distributions, as this information offers key
insights into the demographic composition of a particular group and possible differential
mortality based on age (Milner et al. 2008). The correct interpretation of age distributions
has been at the center of an intense debate since the publication of Farewell to
Paleodemography (Bocquet-Appel and Masset 1982) because it relies on age estimation
in the absence of written records (Konigsberg and Frankenberg 2002). Similar to forensic
anthropology, the field of paleodemography has also undergone its own critiques with
regard to standardization of techniques and understanding error rates when constructing
population profiles from skeletal remains (e.g., Bocquet-Appel and Masset 1982; Wood et
al. 1992). Paleodemographers have concentrated their efforts to construct population
parameters that accurately reflect the group being studied and that do not mirror the
reference sample or samples being used to study them.
Age structure mimicry (Mensforth 1990) has been a central problem in
paleodemographic interpretations. In age structure mimicry, the mean age of an aging
indicator is actually based on the age structure of the reference population (Bocquet-
Appel and Masset 1982) so that when these calculations are applied to an unknown
group, the “target” sample (Konigsberg and Frankenberg 1992) takes on a distribution
similar or identical to that of the reference. Bocquet-Appel and Masset (1982) also
highlight the very low correlation that exists between skeletal age-indicators and actual
age-at-death, resulting in what Chamberlain (2006:85) terms “unacceptably large
standard errors of estimation.” In addition to the problem of sample mimicry and error,
the osteological paradox as proposed by Wood et al. (1992) addresses conceptual
problems in paleodemography, including demographic nonstationarity, selective
mortality, and hidden heterogeneity in risks. All of these issues have the potential to
completely undermine a paleodemographic study and render its results meaningless.
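Age structure mimicry can be illustrated with a small numerical sketch in Python. It
assumes, for illustration only, that the probability of observing a given indicator
phase at each age is fixed, and then applies Bayes' theorem with two different
reference age structures. All probabilities, age classes, and names below are
hypothetical.

```python
def mean_age_for_phase(phase_given_age, age_structure):
    """Expected age of individuals showing a phase, given a reference age structure.

    Uses P(age | phase) proportional to P(phase | age) * P(age).
    """
    joint = {age: phase_given_age[age] * age_structure[age] for age in phase_given_age}
    total = sum(joint.values())
    return sum(age * p / total for age, p in joint.items())


# Hypothetical probability of observing one particular phase at three ages (years).
phase_given_age = {25: 0.2, 45: 0.5, 65: 0.6}

# Two hypothetical reference samples: one mostly young, one mostly old.
young_reference = {25: 0.6, 45: 0.3, 65: 0.1}
old_reference = {25: 0.1, 45: 0.3, 65: 0.6}

print(mean_age_for_phase(phase_given_age, young_reference))
print(mean_age_for_phase(phase_given_age, old_reference))
```

Although the relationship between the phase and age is identical in both cases, the
mean age assigned to the phase is about 41 years with the young reference sample and
about 58 years with the old one. A target sample aged with either figure inherits part
of the reference sample's age structure.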
Critiques of the field of paleodemography have sparked intense discussions on
theoretical perspectives and the development of new age estimation methods based on
Bayesian and maximum-likelihood techniques (Chamberlain 2006). The issues
highlighted during the 1980s have certainly not disappeared, but anthropologists are
continually refining and testing methods to reduce bias and error in age estimation.
Problems with differential preservation and age estimation of older individuals remain
central to the field, but continued research holds promise that age estimation of past
populations is indeed possible and useful for understanding and interpreting morbidity,
mortality, and levels of health from a demographic perspective.
Published Methods
Multifactorial and Multiple-Indicator
Many anthropologists advocate a multifactorial approach to age estimation
(e.g., Lovejoy et al. 1985a; Bedford et al. 1993; Baccino et al. 1999; Martrille et al.
2007). Multifactorial age estimation methods, which usually weight a number of
morphological indicators to estimate an overall age range, have been used in
paleodemography and forensic anthropology with varying degrees of success. These
methods are generally mathematically complex. Age estimation based on multiple
indicators uses a variety of single age indicators to come up with an age interval, but
requires no weighting of individual methods.
McKern and Stewart (1957) concluded their extensive study of age changes in
young American males with a chapter on the overall pattern of skeletal maturation that
included three separate regression formulae. These formulae were based on segments
composed of different age indicators. Scores for segments I, II, and III are calculated by
adding up the individual scores for each element and these composite scores can then be
translated into age intervals and predicted age point estimates. McKern and Stewart
(1957) found that the elements of the innominate bone, including the pubic symphysis
and the epiphyses of the iliac crest, ischial tuberosity, and ramus, were the strongest
combined indicators. In the case of missing innominate elements, remaining elements
could also be combined with other skeletal indicators to clarify and support age
determination (McKern and Stewart 1957).
Nemeskéri et al. (1960) devised a “complex” method that combined
endocranial suture closure, pubic symphyseal morphology, and radiographs of the
proximal humerus and femur. Each element was given a score and the final age estimate
was derived by summing the scores and dividing the total by four, meaning that no region
was given more weight than another (Iscan 1989b). The Lovejoy et al. (1985a) method is
similar to that of Nemeskéri et al. (1960), but it uses principal components analysis to weight
the following indicators: pubic symphysis, auricular surface, proximal femur, dental
wear, and suture closure. Results from this study indicate that multifactorial methods are
superior to single indicators with regard to bias and accuracy. A test of the Lovejoy et al.
(1985a) method by Bedford and colleagues (1993), which eliminated suture closure and
dental wear, included the clavicle, and weighted the indicators according to their
reliability, again found that this method was very accurate and particularly suited for
paleodemography.
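The difference between an equally weighted combination and a weighted one can be
sketched as follows. The per-indicator ages and the weights are hypothetical and the
function names are placeholders. In particular, the weights are simply invented here,
and the principal components analysis used by Lovejoy et al. (1985a) to derive weights
is not reproduced.

```python
def simple_average(estimates):
    """Equal weighting: every indicator contributes the same amount."""
    return sum(estimates.values()) / len(estimates)


def weighted_average(estimates, weights):
    """Weighted combination: indicators with larger weights count for more."""
    total_weight = sum(weights[name] for name in estimates)
    return sum(estimates[name] * weights[name] for name in estimates) / total_weight


# Hypothetical ages (years) suggested by four indicators, for illustration only.
estimates = {
    "endocranial sutures": 40,
    "pubic symphysis": 30,
    "proximal humerus": 36,
    "proximal femur": 34,
}

# Hypothetical weights; not derived from any published analysis.
weights = {
    "endocranial sutures": 1,
    "pubic symphysis": 3,
    "proximal humerus": 2,
    "proximal femur": 2,
}

print(simple_average(estimates))             # 35.0
print(weighted_average(estimates, weights))  # 33.75
```

With these invented values the weighted estimate moves toward the pubic symphysis age
because that indicator carries the largest weight.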
The multifactorial method, however, is not unequivocally accepted as the best
possible choice for age estimation. Saunders et al. (1992) found that the multifactorial
method did not outperform a simple averaging of age estimates, thereby eliminating the
need to calculate the complicated statistics the multifactorial method requires. In general,
greater age increased bias and inaccuracy, no matter what method was used (Saunders et
al. 1992). Schmitt et al. (2002), based on their auricular surface scoring system,
concluded that multifactorial methods are not more reliable than single indicators.
Regardless of the performance of multifactorial methods, it is difficult to argue against
the utility of multiple indicators when attempting to derive an age estimate from adult
skeletal remains.
The use of several methods or techniques to derive an overall age interval is
preferable in many cases because it uses all available information rather than a single
indicator. Anthropologists are constantly cautioned to avoid the use of single indicators
whenever possible (e.g., Brooks and Suchey 1990; Saunders et al. 1992; Martrille et al.
2007). Through the use of multiple age indicators, the anthropologist can work towards
estimating a more precise interval (Brooks and Suchey 1990). Cranial suture closure, the
fourth sternal rib end, and changes of the pubic symphyseal and auricular surfaces are
some of the most common techniques used to estimate the age of adult skeletal remains
(White and Folkens 2005) and can be readily combined to produce range charts as
recommended by Byers (2008).
The use of multiple indicators does not come without its own caveats. Simply
applying as many age techniques as possible or available to a set of skeletal remains does
not ensure the best possible age-at-death estimate. Age estimation is distinctly related to
uncertainty in measurement; the use of methods with known error, bias, and inaccuracy
rates is essential. Once an individual has been categorized as a young or old adult,
Martrille et al. (2007) recommend further consideration of methods that have higher
accuracy for that age range. This will help “to maximize the potential of each method”
(Martrille et al. 2007:306) by excluding methods that do not perform well for older or
younger adults. This adaptation of multiple age indicators is especially important as
physical anthropologists continually refine age estimation methods and calculate error
rates for skeletal age estimation.
Single-Indicator
Multiple-indicator age estimates are usually based on the combination of
single-indicator methods. Single-indicator methods can be macroscopic or microscopic,
including gross, histological, chemical, and radiographic observations. In most instances,
the methods chosen are directly related to what the anthropologist is working with, taking
into account problems of differential preservation or recovery. It is important to
understand the strength of each method on its own when applying several single-indicator
methods for an overall age estimate. The methods discussed below are those that are used
most commonly for late-adolescent to adult macroscopic age-at-death estimation at the
JPAC/CIL2. The unique composition of the JPAC/CIL population3 and problems of
recovery and preservation of skeletal remains in general render some methods that would
otherwise be employed less frequently very useful (e.g., McKern and Stewart 1957; Mann et al.
1991) and other more frequently used methods less useful (e.g., Iscan et al. 1984b).
Therefore, only those methods that are listed in SOP 3.4 or that were found to be used on
a routine basis are discussed.
Epiphyseal Fusion
Age estimation based on epiphyseal fusion falls under the category of growth
and development in the human skeleton and as such is only useful for those individuals
not yet fully skeletally mature. It is based on the documented progression of fusion of
proximal and distal ends of long bones and other skeletal sites. During the period of
skeletal growth, the observation of open, partially fused, or fully fused epiphyses is very
useful in constructing an age interval. Long bone, vertebral, and iliac crest epiphyseal
closure are useful for individuals in their late teens and early twenties, while the sternal
end of the clavicle can be informative up until the age of 30 for some individuals.
2 All methods employed are macroscopic with the exception of dental formation age
estimation techniques, i.e., Moorrees et al. (1963), Mincer et al. (1993).
3 The JPAC/CIL population is defined as all members of the U.S. military who are missing as
a result of American conflicts. The population is largely young, male, and Caucasian.
As part of a larger group of studies on skeletal age estimation conducted by
Todd and colleagues at Western Reserve University, Stevenson (1924) examined
epiphyseal closure for the purposes of age identification. This study was the first
comprehensive effort to document skeletal changes related to age and determined: “(1)
the age of the union of the individual epiphyses, (2) the sequence of union of the different
epiphyses, and (3) the actual duration of the period of epiphyseal union as a whole”
(Stevenson 1924:56). Before this, the only knowledge of age estimation based on
epiphyseal union was sporadically gleaned from anatomical texts with no attempt at
standardization (Stevenson 1924) and there was no agreement between sources (McKern
and Stewart 1957).
Stevenson (1924) outlined four stages of epiphyseal closure: no union,
beginning union, recent union, and complete union in ten bones: humerus, radius, ulna,
femur, tibia, fibula, scapula, innominate, ribs, and clavicle. The largest difference in
epiphyseal closure is that between non-union (stages one and two) and union (stages
three and four) (Stevenson 1924). No difference in rate of fusion was found between
races or sexes and Stevenson concluded that the ages of 15 to 20 years could be defined
as the “real period of epiphyseal union” (1924:76). Additionally, he considered the long
bones to be the most reliable and constant indicators of age, while the scapula,
innominate, clavicle, and ribs exhibited a much larger degree of individual variation
(Stevenson 1924).
McKern and Stewart’s (1957) Skeletal Age Changes in Young American
Males was the first study to use a non-anatomical documented skeletal collection (Bass
2005) and its organization follows closely that of Stevenson (1924), with the addition of
analyses of suture closure, third molar development, and fusion of the vertebrae, sternum,
and sacrum. A supplementary category of union was also added, “active,” and all
epiphyses were rated on a zero to five scale (zero = no union, five = complete union).
Due to previous confusion on the concept of when epiphyseal fusion is considered to
occur, McKern and Stewart (1957:18) “emphasize[d] the total range of maturational
activity and define[d] the age of union as that age when all cases are completely united.”
McKern and Stewart (1957) continues to be one of the major references for age
estimation using epiphyseal fusion, especially for young males.
McKern and Stewart (1957:41-42) divided the long bones or “extremities”
into two main groups: Group I – epiphyses showing early union and Group II – epiphyses
showing late union. Because the McKern and Stewart (1957) sample is composed of
males of military age, no individuals under the age of 17 were a part of this study. Thus,
most Group I epiphyses were already fused for the youngest members of this group, so
this classification is much less informative than the Group II epiphyses. Group II
epiphyses include: proximal humerus, tibia, and fibula and distal radius, ulna, and femur.
Complete union of these epiphyses occurs by age 24 in all individuals in the McKern and
Stewart (1957) sample.
Developmental Juvenile Osteology (Scheuer and Black 2000) is a useful text
for age estimation based on epiphyseal closure of all skeletal sites. The goal of the book,
as stated by its authors, is to “describe each individual bone of the skeleton…from its
embryological origin to the final adult form” (Scheuer and Black 2000:1). As such, it is
an invaluable resource for age identification and it supplies data for both male and female
epiphyseal fusion. Unlike McKern and Stewart (1957), Scheuer and Black (2000) do not
provide a five-phase system of fusion. Ages are given for time of union; age estimation
using this source is therefore conducted in a binary fashion, scored either non-union or
union. This text is a compilation of many different sources and not based on a particular
sample.
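The way binary scoring of this kind can bound an age interval is sketched below in Python. This is only an illustration under stated assumptions: the site names, the ages in the example table, and the names UnionRange and age_bounds are hypothetical and are not taken from Scheuer and Black (2000). The sketch assumes that each site has a reported earliest age of union and an age by which union is complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnionRange:
    """Reported timing of union for one epiphysis (hypothetical values)."""
    earliest: float  # youngest age at which union has been reported
    latest: float    # age by which union is reported as complete


def age_bounds(observations, ranges):
    """Intersect the age limits implied by binary (union / non-union) scores.

    observations maps a site name to True (union) or False (non-union).
    A united site implies the individual is at least `earliest` years old;
    a non-united site implies the individual is no older than `latest`.
    """
    lower, upper = 0.0, float("inf")
    for site, fused in observations.items():
        timing = ranges[site]
        if fused:
            lower = max(lower, timing.earliest)
        else:
            upper = min(upper, timing.latest)
    if lower > upper:
        raise ValueError("observations are mutually inconsistent")
    return lower, upper


# Hypothetical timing table, for illustration only.
EXAMPLE_RANGES = {
    "site_a": UnionRange(earliest=16.0, latest=20.0),
    "site_b": UnionRange(earliest=22.0, latest=30.0),
}

# One united site and one non-united site bracket the age estimate.
bounds = age_bounds({"site_a": True, "site_b": False}, EXAMPLE_RANGES)
```

Each additional site can only narrow the interval, which is why binary scores from several epiphyses are more informative together than any one of them alone.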
Stevenson (1924), McKern and Stewart (1957), and Scheuer and Black (2000)
discuss the epiphyseal fusion of most of the major epiphyses in the skeleton. Later works
focused on particular categories of epiphyseal fusion, such as the long bones or the
clavicle. These more specific studies are important to the understanding of skeletal aging
because they further highlight the individual variability that exists in the human skeleton.
McKern and Stewart (1957) offered a rudimentary introduction to age
estimation based on the vertebral column, but concluded this section by stating that
changes seen in the vertebrae are far too variable to be of any use in age estimation.
Albert and Maples (1995) examined the fusion of superior and inferior vertebral rings in
thoracic and lumbar vertebrae. While the sample size is considerably smaller (n=55) than
the McKern and Stewart (1957) sample, this study was an attempt to look at sex and race
differences in the timing of epiphyseal fusion of vertebral centra. Albert and Maples
(1995) utilized a four-stage system, with stages zero through two each having an early
and late division. Vertebral ring fusion is fairly well-correlated with age in this study
(r=0.78), but the authors suggest that this method be used with other methods in order to
narrow the predicted age interval.
Epiphyseal fusion of the medial clavicle and anterior iliac crest are useful for
age estimation because of the delayed appearance of the epiphyses at these two sites.
Webb and Suchey (1985) is the primary source for age estimation from these skeletal
elements. Previous studies were limited by sample size (e.g., Todd and D’Errico 1928)
and underrepresentation of females, non-whites, and individuals under 17 years of age
and over 30 (e.g., McKern and Stewart 1957). Webb and Suchey (1985) utilized a
four-stage system to observe the epiphyses of the medial clavicle and anterior iliac crest in a
sample of 605 males and 254 females between the ages of 11 and 40. The authors provide
age distribution tables based on stage of union and “general rules” tables for quick
reference. The McKern and Stewart (1957) and Webb and Suchey (1985) studies show
similar patterns of epiphyseal formation based on the clavicle and iliac crest. The slightly
larger age intervals in the latter study are most likely due to greater variability from a
larger sample (Webb and Suchey 1985).
Even with comprehensive sources like McKern and Stewart (1957) and
Scheuer and Black (2000), research in epiphyseal closure is far from complete. While
Stevenson (1924) noted no differences in timing of fusion between males and females
and individuals of different ethnicities, recent studies suggest that population-specific
standards would be more appropriate. Identification efforts for Bosnian war dead have
shown that males in this sample exhibit epiphyseal closure up to three years earlier than
American males (Schaefer and Black 2005). Research using the Lisbon documented
skeletal collection points to the need to understand the socioeconomic background of the
individuals in the sample being studied, as growth and development may be affected by
conditions of malnutrition (Cardoso 2008). Standardization is also needed since studies
rarely use the same stage system for rating epiphyseal fusion or even the same
methodology, i.e., radiographic versus macroscopic examination of union.
Suture Closure
Age estimation from suture closure has a long history, beginning with Vesale
in 1542, who first noted a possible relationship between age and cranial suture synostosis
(Masset 1989). Varying degrees of obliteration in ectocranial, endocranial, and maxillary
suture sites can be correlated to skeletal age changes. A complete historical perspective is
given in Masset (1989). Suture closure methods are perhaps best known for the
problems that have been raised with their use. A large number of publications in the 1950s
(e.g., Brooks 1955) highlighted the uncertainty of this indicator because each study
published produced different results for age (Masset 1989). Even though suture closure
was one of the first methods to be used for age estimation from skeletal remains (Todd
and Lyon 1924, 1925), it fell further out of favor as new methods were developed and
refined during the 1980s (Iscan 1989a). Masset (1989) lists possible sources of error as
sex differences in timing of closure and the structure of the reference population and he
found that the correlation between age and cranial suture closure was never greater than
63%.
Given these limitations, the obliteration of cranial sutures can still be useful
for providing general age estimates and to corroborate other skeletal age indicators.
Meindl and Lovejoy (1985) published results on ectocranial suture closure and skeletal
age-at-death. Interestingly, they pointed to methodological inconsistencies as the key
issue behind problems with using cranial suture closure as an age indicator. Meindl and
Lovejoy (1985) employed a four-stage system to score suture closure at ten cranial sites.
Results indicated that the lateral-anterior sites were the best overall predictors of age and
that ectocranial suture closure was superior to endocranial (Meindl and Lovejoy 1985).
Lovejoy et al. (1985a) employed the Meindl and Lovejoy (1985) ectocranial suture
closure method as part of their summary age method, which they find to be the most
accurate means to estimate age-at-death.
Mann and colleagues devised a system of age estimation based on the
obliteration of the maxillary sutures (Mann et al. 1987, Mann et al. 1991). The method
uses four sutures of the maxilla: incisive, anterior median palatine, transverse palatine,
and posterior median palatine. In its original format, the amount of obliteration was
measured with a sliding caliper and converted to a percent for the entire suture (Mann et
al. 1987). The revised method eliminated measurement and relied solely on visual
inspection of the four sutures (Mann et al. 1991). Any obliteration on any portion of the
suture automatically places the individual in the age interval corresponding to obliteration
of that suture (personal communication, Robert Mann 2008). The revised method also adds inspection of
the transverse suture within the greater palatine foramen and additional features for
individuals most likely over the age of 60, such as very thin bone and a narrow bony
ridge along the anterior median palatine suture. The progression of obliteration of
maxillary sutures in both studies is as follows: incisive suture, posterior median palatine,
transverse palatine, and finally, the anterior median palatine. General age estimates are
given based on the overall pattern of obliteration.
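The visual scoring logic of the revised method can be sketched as follows in Python. This is a minimal illustration, not the published procedure: the function name most_advanced_suture and the dictionary-based input are assumptions, and no age intervals are encoded because none are quoted here. The sketch uses only the obliteration sequence given above and the rule that any obliteration on any portion of a suture counts.

```python
# Order in which the maxillary sutures obliterate, as described in the text.
OBLITERATION_SEQUENCE = [
    "incisive",
    "posterior median palatine",
    "transverse palatine",
    "anterior median palatine",
]


def most_advanced_suture(obliterated):
    """Return the latest suture in the sequence showing any obliteration.

    obliterated maps a suture name to True if any portion of that suture
    is obliterated. Sutures missing from the mapping are treated as not
    obliterated. Returns None if no suture shows obliteration.
    """
    unknown = set(obliterated) - set(OBLITERATION_SEQUENCE)
    if unknown:
        raise ValueError(f"unknown suture(s): {sorted(unknown)}")
    latest = None
    for suture in OBLITERATION_SEQUENCE:
        if obliterated.get(suture, False):
            latest = suture
    return latest


# Partial obliteration of the incisive and transverse palatine sutures.
example = most_advanced_suture({"incisive": True, "transverse palatine": True})
```

The age interval assigned to the individual would then be the one corresponding to the returned suture in the published tables.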
The Mann et al. (1987) method was tested by Gruspier and Mullen (1991).
This test was designed to look at interobserver error and assess the accuracy of the
maxillary suture obliteration method. Gruspier and Mullen (1991) found that the
relationship between maxillary suture obliteration and age was not linear and the
inaccuracy of the Mann et al. method exceeded all age estimation methods used in
Lovejoy et al. (1985a). Ginter (2005) further examined maxillary suture closure by
comparing the original and revised methods and testing the revised method. The revised
method performed much better than the original method; age phase was estimated
correctly for 83% of the individuals in the study (Ginter 2005). Ginter (2005) also
suggested that the revised maxillary suture method was more effective at estimating age
than more commonly accepted methods, e.g., the pubic symphysis and sternal rib ends.
Nawrocki (1998:290) found that all cranial suture closure methods are “not
that much worse than other techniques…and in fact better than some (e.g., pubic
symphysis, sternal end of the rib).” This conclusion is contingent upon the proper
construction of error intervals for all aging methods. Therefore, the problem with age
estimation from cranial suture closure is not the methods themselves, but the use of
inappropriate statistics. Nawrocki (1998) recommended using multiple areas of the vault
and provided several race- and sex-specific regression formulae based on stepwise
regression techniques that perform quite well. The investigation of cranial suture closure
methods for age estimation in skeletal remains continues to be an important, if
contentious, area of research in physical anthropology.
Third Molar Dental Formation and Eruption
Tooth development and eruption are believed to be under strong genetic
control and are therefore considered to be more reliable in predicting chronological age
than other osteological indicators (White and Folkens 2005). Stages of dental formation
are most useful for estimating the age-at-death of sub-adults, but the late formation and
emergence of the third molars renders this method useful for late adolescents and early
adults as well. However, third molars are also the most variable of all teeth (White and
Folkens 2005), exhibiting a high percent of agenesis.
Moorrees et al. (1963) is a comprehensive study of dental formation that looks
at age variation in ten permanent teeth. Teeth included in this study are the maxillary
incisors and all mandibular teeth. Stages are given for crown, root, and apex formation.
For the mandibular molars, results are reported for males and females as well as mesial
and distal roots. Reference charts are provided for both sexes that include the mean age of
attainment for a given stage of formation and two standard deviations. No additional
statistics are included and no data for the maxillary molars are given. Moorrees et al.
(1963) stress the importance of accounting for variability in individual dental formation
and the need for further research to better understand patterns of tooth formation.
Saunders et al. (1993) used the Moorrees et al. (1963) dental formation aging
method to estimate age for a sample of subadult remains (n=282) from a historic
cemetery. While the sample size for accuracy and bias tests was small (n=17), the
Moorrees et al. (1963) method produced age estimates within a standard deviation of half
a year. Saunders et al. (1993) concluded that the Moorrees et al. (1963) method is the best
method to use for juvenile skeletons; however, they also highlighted the similarity of the
reference sample and the target sample. Since this study did not specifically focus on
late-adolescents, it may be difficult to draw exact parallels between this sample and
samples of late-adolescents. Additionally, the Moorrees et al. (1963) formation reference
charts are difficult to interpret, which could very easily introduce error into further
studies.
A later study by Mincer et al. (1993) analyzed the development of third
molars as it related to chronological age and applications to the legal system. Diplomates
of the American Board of Forensic Odontology (ABFO) scored 823 cases using the
Demirjian et al. (1973) eight-grade system4. Results from the study are summarized in a
table that includes mean ages at attainment with one standard deviation and division by
race (black and white) and sex (male and female). Probabilities of an individual being at
least 18 years of age based on third molar dental formation are also given to aid in the
determination of the juvenile or adult status of an individual. The authors conducted their
own tests of accuracy and concluded that the “third molar is far from an ideal
developmental marker” (Mincer et al. 1993:386). Bass (2005) reemphasizes that the
accuracy of this method has been called into question. However, very few other methods
exist for this age period and in some cases the dentition may be the only element
available for analysis.
4 This system rates the development of the crown, root, and apex for all four types of teeth.
Stages A through H are assigned after radiographic examination of tooth formation.
Further research in third molar dental formation has shown that the process is
population-specific (Chaillet and Demirjian 2004). Solari and Abramovitch (2002)
investigated third molar formation in a Hispanic sample, again using the Demirjian et al.
(1973) system. Similar to Mincer et al. (1993), Solari and Abramovitch (2002) found that
third molars in males develop earlier than in females and that Hispanics in general
develop earlier than Canadian Caucasians. Other examples of work in this arena include
studies of southern French children (Chaillet and Demirjian 2004), Finnish children
(Chaillet et al. 2004), third molar development in Japanese juveniles (Arany et al. 2004),
and comparisons of third molar development between American blacks and whites
(Blankenship et al. 2007), which all demonstrate that there is inter-population variation in
dental formation. Clearly, caution must be used when applying methods to ethnically
diverse samples and these findings underscore the need to develop population-specific
methods for age estimation from the third molar.
McKern and Stewart (1957) briefly summarized third molar eruption in their
sample. While they recognized that the pattern of eruption for third molars is extremely
variable, they also pointed out its importance as an age indicator that could further
corroborate observations from other skeletal indicators. Unerupted and partially erupted
third molars are the most useful for age estimation and 17 to 22 years of age was
identified as the main eruptional period, with the peak between the ages of 17 and 18
(McKern and Stewart 1957). In general, third molar formation is more useful than
eruption when radiographic analyses are possible.
Pubic Symphysis
Age estimation from the pubic symphysis is the most frequently used skeletal aging technique
(Aykroyd et al. 1999). According to some authors, it is “universally considered more
reliable than other criteria” (Meindl and Lovejoy 1989:138). This reliability is attributed
to the fact that, in general, other sites are less reliable and the age-related changes that
occur in the pubic symphysis are clear and distinct (Meindl et al. 1985; Meindl and
Lovejoy 1989), as well as late-occurring compared to epiphyseal fusion and dental
formation. A good history of anecdotal observations of possible age-related changes of
the pubic symphysis is provided by Todd (1920), but the first formal method of age
estimation using this element was not developed until the 1920s by Todd and his
colleagues (Todd 1920, 1921).
The Todd ten-phase system describes the modal appearance of each phase and
gives an age interval per phase. The ten phases are essentially the same regardless of sex
or race. The first three phases he terms “post-adolescent” and the final phase
encompasses all individuals over the age of 50. Photographs of several examples from
each phase are provided in all of his publications (Todd 1920, 1921). In a test of the Todd
system, Brooks (1955) found that it consistently over-aged male and female pubic
symphyses. While the original goal of Brooks’ study was to correlate cranial and pubic
indicators, Brooks (1955) found cranial suture closure to be wholly unreliable and instead
focused on modifying the age intervals per phase of the Todd pubic symphysis method to
attempt to correct the problem of over-aging with the pubic symphysis. Meindl et al.
(1985), in a test of four pubic symphyseal methods5, found the series of Todd methods to
be the most accurate and to have the best correlations between estimated and actual ages.
5 McKern and Stewart (1957), Gilbert and McKern (1973), Hanihara and Suzuki (1978), Todd
(1920).
McKern and Stewart (1957) devised a component system for males based on
three features of the pubic symphysis: dorsal plateau, ventral rampart, and symphyseal
rim. Each component is scored using six stages (zero to five) and these stages are added
to produce a composite score. The choice of features is based on a compilation of nine
features originally described by Todd (1920). McKern and Stewart (1957) found that the
original ten phases were too rigid to encompass the variability they saw in pubic
symphyseal faces. Their system of separate component scoring allowed for differential
development of features with the possibility of arriving at the same age estimate, e.g., a
score of 3-3-2 and a score of 2-4-2 both have total scores of eight and would be
categorized as an age interval of 22-28 years with a mean age estimate of 24.14 years.
Plastic casts are available for comparison and summary statistics are in the original
publication. McKern and Stewart’s (1957) own tests of their method indicated that
observers could arrive at the correct age estimate with approximately 90% accuracy.
Meindl et al. (1985) identified three problems that could influence performance of the
McKern and Stewart (1957) method: their sample is entirely male, the age range is
extremely limited, and the method was never tested on another population.
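The component scoring described above can be illustrated with a short Python sketch. The function names and the lookup table are assumptions made for illustration. The table contains only the single entry quoted in this section (a total score of eight), and the remaining entries would have to be taken from the original publication.

```python
def composite_score(dorsal_plateau, ventral_rampart, symphyseal_rim):
    """Sum three component stages, each scored from zero to five."""
    components = (dorsal_plateau, ventral_rampart, symphyseal_rim)
    for stage in components:
        if not isinstance(stage, int) or not 0 <= stage <= 5:
            raise ValueError("each component stage must be an integer from 0 to 5")
    return sum(components)


# Only the entry quoted in the text is included; other totals are omitted
# rather than guessed.
KNOWN_TOTALS = {
    8: {"interval": (22, 28), "mean_age": 24.14},
}


def lookup(total):
    """Return the age interval and mean age for a total score, if listed."""
    return KNOWN_TOTALS.get(total)


# Two different component patterns yield the same total and the same estimate.
first = composite_score(3, 3, 2)
second = composite_score(2, 4, 2)
```

Both patterns sum to eight, so lookup(first) and lookup(second) return the same interval of 22-28 years, which is the flexibility the component system was designed to provide.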
Gilbert and McKern (1973), following McKern and Stewart (1957),
recognized the need for separate standards for females and adapted the three-component
system based on a sample of 120 females aged 17-55 years. The same three components
as the male system were used and the effects of birthing on pubic symphyseal appearance
were also researched. There was no significant activity observed beyond the age of 55
and the pubis appearance had no correlation to parity. Females, however, did exhibit a
faster flattening of the dorsal surface and the full separation of the dorsal and ventral
demi-faces by the symphyseal rim, neither of which is seen in males (Gilbert and
McKern 1973). Suchey (1979) found, however, that the female three-component system
resulted in a correct age interval assessment only 51% of the time, mainly due to
difficulties in applying the method.
The 1980s saw the progressive development of the Suchey-Brooks pubic
symphysis aging method, which is now the most widely used pubic symphysis age
estimation method. Katz and Suchey (1986) reanalyzed the Todd and McKern-Stewart
systems with a large sample of male pubic bones (n=739). Regression analyses of
multiple variables (e.g., overall Todd score, ventral rampart) showed that a modified
six-phase Todd system performed better than all other methods and their possible
modifications. The modifications of the Todd system were to combine phases I, II, and
III, phases IV and V, and phases VII and VIII. Male phase descriptions and casts made
by France were first distributed in 1986 at anthropology conferences (Brooks and Suchey
1990). A system was then developed for females, who show much greater variability in
pubic symphyseal morphology than males (Suchey and Katz 1998). Finally, a set of
unisex descriptions was developed that focused on key changes observed in both male
and female pubic bones (Brooks and Suchey 1990). Most recently, Berg (2008) modified
the Suchey-Brooks system by reformulating Phases V and VI and adding a Phase VII.
These changes allow for more accurate aging of older females, but have yet to be tested
independently.
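The collapsing of the ten Todd phases into six can be written out explicitly, as in the Python sketch below. Two points are assumptions rather than statements from the sources: that Todd phases VI, IX, and X each stand alone (inferred from the total of six phases), and the numbering of the resulting groups from 1 to 6, which is used here only as a convenient label.

```python
# Todd phases (1-10) grouped as described in the text: I-III, IV-V, and
# VII-VIII are combined. Phases VI, IX, and X are assumed to stand alone.
MODIFIED_GROUPS = [
    (1, 2, 3),
    (4, 5),
    (6,),
    (7, 8),
    (9,),
    (10,),
]


def modified_phase(todd_phase):
    """Map a Todd phase (1-10) to its group in the modified six-phase system."""
    for group_number, members in enumerate(MODIFIED_GROUPS, start=1):
        if todd_phase in members:
            return group_number
    raise ValueError("Todd phase must be an integer from 1 to 10")


# Every Todd phase mapped to its modified group.
mapping = {phase: modified_phase(phase) for phase in range(1, 11)}
```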
Despite its fairly ubiquitous use in age determination, the Suchey-Brooks
method does have some inherent problems. In blind tests of several different methods of
skeletal age estimation6, Saunders et al. (1992) found the pubic symphysis fared the worst
and they questioned the usefulness of this method due to its broad age ranges. This high
level of imprecision may undermine the utility of this method in forensic cases. In
addition, the pubic symphysis is commonly damaged in forensic and archaeological
contexts (Saunders et al. 1992), rendering any age-determination from this element
impossible.
6 Suchey-Brooks pubic symphysis, Lovejoy et al. auricular surface, ectocranial suture closure,
sternal rib ends
Schmitt (2004) used the Suchey-Brooks pubic symphysis method on a sample
of Asian individuals and found that it tended to underage. The method was also highly
inaccurate in older groups, though this information is by no means unique to this method
or to the pubic symphysis as a predictor of age. More troubling, Schmitt (2004) found
asymmetries in right and left bones from the same individuals for both the pubic
symphysis and the auricular surface. This means that bones from the same individual that
are found separately could be assigned to different phases, thereby increasing the error
associated with the age estimate. Asymmetry is not generally recognized as a problem in
age estimation from the pubic symphysis or the auricular surface (e.g., Brooks and
Suchey 1990; Falys et al. 2006)7.
The study by Schmitt (2004) suggests that current pubic symphysis and
auricular surface age estimation methods cannot be accurately or reliably applied to
individuals or groups with Asian origins. Sinha and Gupta (1995) also found that, in a
sample of males from India, mean ages per phase differed significantly from those given
by the Todd pubic symphysis method. However, Brooks and Suchey
(1990) emphasized that the Suchey-Brooks pubic symphysis method was
developed from a large multiracial sample and should be a fairly good representation of
modern human variation. A final note of caution can be taken from Hoppa (2000), who
reminds anthropologists that skeletal age estimation is far from perfect and that
age-related changes of the pubic symphysis can be significantly different between target and
reference samples.
7 Brooks and Suchey (1990) do not discuss asymmetry between right and left pubic
symphyseal surfaces. Falys et al. (2006) found no significant side differences (Mann-Whitney U-test) between left and right auricular surfaces.
Auricular Surface
When compared to the pubic symphysis, use of the auricular surface is far less
accepted or common in skeletal age estimation. This technique was originally developed
in 1985 by Lovejoy and colleagues for paleodemographic and archaeological
applications. It now appears as a standard method in osteology lab manuals, such as
Standards for Data Collection from Human Skeletal Remains (Buikstra and Ubelaker
1994), but is still the focus of considerable research concerning its accuracy and
reliability.
The Lovejoy et al. (1985b) auricular surface aging method relies on gross
morphological changes of the auricular surface of the ilium, similar to the age-related
changes of the pubic symphyseal face. These changes were originally categorized into
eight phases. The auricular surface is potentially advantageous as an age indicator
because of its higher preservation potential when compared to the pubic symphysis and
the presence of morphological age-related changes beyond 50 years old (Lovejoy et al.
1985b). Additionally, it has the capability to perform as well as aging methods using the
pubic symphysis (Lovejoy et al. 1985b) and Saunders et al. (1992) found that it
performed the best in blind tests of different age estimation methods.
Age estimation using the auricular surface of the ilium is not as easy to master
as other aging techniques (Lovejoy et al. 1985b) and there are no casts for comparison.
The photos in the original publication only represent the “modal surface appearance for
each age category” and much more emphasis is thus placed on qualitative descriptions
(Saunders et al. 1992:114). Another common complaint is that the auricular surface
technique suffers from a general lack of standardization and more work is needed to
improve the method (Saunders et al. 1992).
Murray and Murray (1991), concerned with the applicability of the Lovejoy et
al. (1985b) auricular surface method to forensic anthropology, tested the accuracy of this
method using individuals of known age-at-death from the Terry Collection. Analysis of
variance (ANOVA) indicated that morphological changes in the auricular surface of the
ilium are dependent on age but not sex (Murray and Murray 1991). Further statistical
analyses showed that auricular surface changes are also independent of ancestry, but that
these changes were too variable to be used as a single indicator of age-at-death (Murray
and Murray 1991). These results indicate that the auricular surface method overages
younger individuals and underages older individuals and its range of estimation error is
too large to be used on a case-by-case basis, as in forensic anthropology.
One of the issues raised by Murray and Murray (1991) was the viability of the
original phases as proposed by Lovejoy et al. (1985b). Osborne (2000) used the original
descriptive terms from the Lovejoy et al. (1985b) method combined with descriptive
statistics to redefine the eight-phase system into a six-phase system. Further research by
Osborne et al. (2004) indicated that the original five-year intervals did not accurately
reflect true variation in auricular surface morphology because age only accounted for
34% of the variation in auricular surface morphology and that a combined scoring system
for auricular surface features fared better than any single indicator. The Osborne et al.
(2004) study highlights the importance of using statistical tests and calculating accuracy,
bias, and confidence intervals when examining the viability of any aging technique.
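Two of the statistics mentioned here can be computed as in the Python sketch below. It assumes one common formulation in the age-estimation literature, in which bias is the mean signed difference between estimated and actual age and inaccuracy is the mean absolute difference. The ages in the example are hypothetical and are not data from any study discussed here.

```python
def bias(estimated, actual):
    """Mean signed error: positive values indicate over-aging."""
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(e - a for e, a in zip(estimated, actual)) / len(estimated)


def inaccuracy(estimated, actual):
    """Mean absolute error, regardless of the direction of the error."""
    if len(estimated) != len(actual) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(estimated)


# Hypothetical estimated and known ages-at-death for three individuals.
estimated_ages = [25, 30, 40]
known_ages = [20, 35, 40]

example_bias = bias(estimated_ages, known_ages)
example_inaccuracy = inaccuracy(estimated_ages, known_ages)
```

In this example the over-aged and under-aged individuals cancel, so the bias is zero even though the average error is more than three years. This is why the two statistics are reported together.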
Buckberry and Chamberlain (2002) presented a revised method of age
estimation from the auricular surface of the ilium. Their method uses a quantitative
scoring system that assigns numbered stages to different features of the auricular surface
based on the criteria from Lovejoy et al. (1985b). A composite score is correlated with
one of seven auricular surface stages to estimate age-at-death. Buckberry and
Chamberlain (2002) believe that their method more realistically expresses the age
changes seen in each feature and is easier to apply.
A test of the revised method (Buckberry and Chamberlain 2002) as compared
to the original method (Lovejoy et al. 1985b) indicated that the revised method is more
accurate for individuals between 20-49 years of age, but less accurate between 50-69
years of age (Mulhern and Jones 2005). In this test, the revised method was also found to
be easier to apply than the original method and showed no significant differences
between white and black or male and female individuals (Mulhern and Jones 2005).
Mulhern and Jones (2005) caution, however, that the auricular surface is not accurate
enough to be used as the only indicator of age in older adults.
Falys et al. (2006) also tested the revised method and their results were similar
to those of Mulhern and Jones (2005) concerning ease of application and precision of
aging. Falys et al. (2006) modified the Buckberry and Chamberlain (2002) revised
technique of seven stages by proposing a new three-stage system that aggregates certain
composite scores. This aggregation allows for a discrimination of older versus younger
individuals and the three stages show significant differences in age. While the Falys et al.
(2006) system does not aid in distinctly separating middle-aged individuals, it does show
some promise for individuals over the age of 60.
Igarashi et al. (2005) proposed another new method for age estimation based
on the auricular surface of the ilium. This method is based on a collection of modern
Japanese skeletal remains with known age-at-death. Nine features in males and seven
features in females are scored as either present or absent. The features were a
combination of relief and texture categories and are well defined within the study. A
feature was marked as present if it was found anywhere on the surface, i.e., “on the basis
of [an] all-or-none principle” (Igarashi et al. 2005:327). While this may eliminate scoring
of gradients, this method warrants further consideration because of the new qualitative
categories described and its development based on a non-western sample. As highlighted
by Schmitt (2004), age estimation is problematic for Asian individuals and samples
because most methods have been developed using European groups and these techniques
may not actually be ancestry independent, as has been suggested by other researchers
(e.g., Murray and Murray 1991, Osborne et al. 2004, Mulhern and Jones 2005).
Sternal Rib Ends
The largest contributors to research on age estimation from the sternal rib end
have been Iscan, Loth, and colleagues. Age-related changes were first noted by Kerley
(1970), but no research was conducted until the early 1980s (Loth and Iscan 1989).
Numerous publications during the 1980s described the then newly developed technique,
accompanied by phase descriptions, photographs, and descriptive statistics (e.g., Iscan et
al. 1984a, 1984b, 1985). Sternal rib end estimation was originally based on component
analysis of three features of the right fourth rib: Component I – pit depth, Component II –
pit shape, Component III – rim and wall configurations (Iscan et al. 1984a). The
component system was then modified to a phase system based on the overall changes
seen in form, shape, texture, and quality of the sternal rib (Iscan et al. 1984b). There are
nine phases total (0-8) and standards are given for both white males and females (Iscan et
al. 1984b, 1985). Casts of the different phases of the right fourth sternal rib end are
available for comparative analyses.
Blind tests of the white male and female phase methods found that both were
reliable and that interobserver error was minimal (Iscan and Loth 1986a, 1986b).
Participants of varying levels of experience were asked to match unknown ribs to
photographs of specimens in order to assign a phase. Iscan and Loth (1986a, 1986b)
found that the unknown ribs were almost always placed within one phase of the correct
chronological age. Additionally, Iscan et al. (1989) presented a study at the annual
meeting of the American Academy of Forensic Sciences in which they found that the rib
shows much less variation than the pubic symphysis. An independent test of fourth rib
aging supported its use as an age indicator and found no significant differences in age
estimation between white and black males (Russell et al. 1993). Iscan et al. (1987),
however, identified differences in age-related changes of the sternal rib end between
blacks and whites. Because black individuals were consistently over-aged beginning in
their mid-30s (phases five through seven), Iscan et al. (1987) recommended the
modification of the existing white standards. However, no new standards were developed
for non-white individuals.
Iscan and colleagues clearly support their method as one of the best for
skeletal age estimation. However, Nawrocki (n.d.) has questioned its validity because of
the statistics presented in the original studies. The 95% confidence intervals published by
Iscan et al. (1984b, 1985) are in reality the confidence intervals for the population mean
and not for the range of values possible per phase (Nawrocki n.d.). Accordingly, the
intervals presented as appropriate for each phase of the Iscan sternal rib end method are
far too small. Constructing accurate error ranges is thus a vital part of any age estimation
method, as it will significantly impact the final age interval (Nawrocki n.d.).
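The distinction raised by Nawrocki (n.d.) can be illustrated with a brief sketch. The phase mean, standard deviation, and sample size used below are hypothetical values chosen only for illustration and are not taken from Iscan et al. (1984b, 1985). A normal approximation is also assumed in place of the t distribution, and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(mean, sd, n, level=0.95):
    """Interval for the mean age of a phase; it narrows as n grows."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    half_width = z * sd / sqrt(n)
    return mean - half_width, mean + half_width


def prediction_interval(mean, sd, n, level=0.95):
    """Interval for the age of one new individual assigned to the phase."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    half_width = z * sd * sqrt(1 + 1 / n)
    return mean - half_width, mean + half_width


# Hypothetical phase: mean age 40 years, SD 8 years, 25 individuals.
ci = mean_confidence_interval(40.0, 8.0, 25)
pi = prediction_interval(40.0, 8.0, 25)
print("interval for the phase mean:", ci)
print("interval for a new individual:", pi)
```

Under these assumed values, the interval for the mean is several times narrower than the interval for a new individual, which is the pattern Nawrocki (n.d.) describes.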
The original sternal rib end method only employed the right fourth rib.
However, the right fourth rib may be absent or the remains so fragmentary that
determination of rib number or side is impossible. Yoder et al. (2001) demonstrated that
ribs IV through IX exhibited similar age-related changes on both right and left sides and
therefore could be used to estimate age with some caution. A summary method of age
estimation, based on a composite of rib series scores, is preferable when the fourth sternal
rib end is not available or observable (Yoder et al. 2001).
Kunos et al. (1999) developed a method of age estimation based on changes
seen in the first rib. While not exclusively based on the sternal end, this method is an
interesting alternative. Kunos et al. (1999) concluded that their method was reliable and
simple, providing age estimates comparable to those produced from multifactorial
methods. Schmitt and Murail (2004) found this method to be far too subjective in its
application, with a correct classification rate of only 55%. This study highlights the
importance of understanding variability and application of single-indicator age estimation
methods. Using a large sample of known age-at-death Balkan males, Bayesian statistics,
and a new first rib age estimation method modified from Kunos et al. (1999), DiGangi et
al. (2009) suggest that the first rib may be able to detect age-related morphological
changes into the ninth decade. This method has yet to be tested on other samples, but the
results are certainly promising.
The Statistical Basis of Age Estimation
Regression and Correlation
Age estimates are based on skeletally manifested age indicators. Therefore,
the amount of error in any age estimate is directly related to how well a given age
indicator correlates with actual age (Aykroyd et al. 1997). Aging methods have
traditionally relied on linear regression and correlation models that use an observed
morphological age-related change or changes to predict age-at-death. Regression and
correlation models are based on using a known and independent variable (x), the
“indicator,” to predict the unknown and dependent variable (y), the “age.” Konigsberg et
al. (1994) also refer to this as the regression of age on an indicator.
An age estimate is derived by comparing a skeletal element or elements to
photos, casts, or descriptions related to a method and assigning a phase, stage, or score
based on that method. The method then usually reports an age interval, e.g., 25-30 or
60+, for the stage. More statistically robust methods can include any combination of the
following per stage: sample size, a mean (point estimate), median (midpoint), standard
deviation, 95% confidence (or prediction) interval, range, accuracy/inaccuracy, or bias.
Regression and correlation methods have the advantage of relatively easy application
because, once a regression equation has been derived, it is possible to estimate the age of
unknown individuals through a process known as inverse calibration (Aykroyd et al.
1997). Inverse calibration is a statistical procedure that utilizes least squares regression to
estimate values from given data. For age estimation, the estimated value would be age
and the given data would be an age indicator. In inverse calibration, the relationship
between the age indicator (x) and the estimated age (y) is assumed to be linear.
Error is based on the correlation of a morphological age indicator to
chronological age; thus, the poorer the correlation between variables, the greater the bias
in the age estimate (Aykroyd et al. 1999). Some statisticians and anthropologists now see
regression models as inherently flawed for estimating age (e.g., Aykroyd et al. 1997,
1999; Konigsberg and Frankenberg 2002), mainly due to the assumed linear relationship
between x and y variables. Classical calibration has been offered as a solution to the bias
inherent in regression models (Konigsberg et al. 1994; Aykroyd et al. 1997). This
variation of regression switches age to the x-variable and the indicator to the y-variable,
devising “an equation for y in terms of x” (Aykroyd et al. 1997:262), i.e., the regression
of the indicator on age (Konigsberg et al. 1994). While this reformulation of regression
results in lower bias, it is also less efficient (Konigsberg et al. 1994), meaning that it
results in greater variability and higher inaccuracy in age estimates than inverse
calibration (Aykroyd et al. 1997). Problems with both inverse and classical calibration
have led to ongoing scrutiny of the statistical analysis of age estimation.
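The two calibration approaches can be contrasted in a minimal sketch. The reference ages and indicator scores below are hypothetical, the function names are illustrative, and ordinary least squares is assumed for both fits.

```python
def least_squares(x, y):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def inverse_calibration(ages, scores, new_score):
    """Regress age on the indicator, then predict age directly."""
    slope, intercept = least_squares(scores, ages)
    return intercept + slope * new_score


def classical_calibration(ages, scores, new_score):
    """Regress the indicator on age, then solve the fitted line for age."""
    slope, intercept = least_squares(ages, scores)
    return (new_score - intercept) / slope


# Hypothetical reference sample of known-age individuals.
ref_ages = [20, 30, 40, 50, 60, 70]
ref_scores = [1, 2, 2, 3, 5, 5]

print(inverse_calibration(ref_ages, ref_scores, 5))
print(classical_calibration(ref_ages, ref_scores, 5))
```

With imperfectly correlated data such as these, the inverse estimate lies closer to the mean age of the reference sample than the classical estimate does. This pull toward the reference mean is one way to picture the bias attributed to regression of age on an indicator.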
Bayesian Analysis
Statistical techniques introduced more recently into physical anthropology
include the use of Bayesian-based prediction models. Konigsberg and Frankenberg argue
that using Bayesian analysis is “the only logical way to proceed in estimating age in
paleodemography” (2002:306) because it solves the problem of reference sample
mimicry (Konigsberg and Frankenberg 1992). Forensic anthropologists have also begun
to incorporate Bayesian models, and not just for age estimation (e.g., Lucy et al. 1996;
Ross and Konigsberg 2002; Schmitt et al. 2002; Edgar 2005). Steadman et al. (2006)
presented the use of likelihood ratios in forensic anthropology to give an overall sense of
the strength of an identification based on key elements of the biological profile. Lucy et
al. (1996) identify applications of Bayes’ theorem as especially useful when analyzing
ordinal or categorical data and Schmitt et al. (2002) found that Bayesian prediction was
reliable and useful for individuals over the age of 50.
Bayes’ theorem uses prior probability, maximum likelihood ratios, and
posterior probability, and deals with the age of individuals on a case-by-case basis as part
of a larger sample. Prior probability is the expected probabilistic outcome of a
hypothesis; for aging, this means the probability of an individual being a certain age with
no other information beyond the assumption that he or she is similar to the sample being
used (Aykroyd et al. 1999). The likelihood is based on observed traits, i.e., the probability
of an individual being a certain age based on the distribution of the sample given that
particular score (Aykroyd et al. 1999). Finally, posterior probability is based on both the
prior probability and the likelihood, or the probability of an individual belonging to an
age group based on the prior probability and likelihood (Aykroyd et al. 1999). Aykroyd et
al. (1999:65) summarize these terms in the following manner: “the posterior probability
is proportional to the prior probability multiplied by the likelihood.”
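The proportionality quoted above can be sketched for a single individual as follows. The age classes, prior probabilities, and likelihood values are hypothetical, and the likelihood is taken here as the probability of observing the individual's score within each age class, which is an assumption about how the calculation would be set up.

```python
def posterior(prior, likelihood):
    """Combine prior and likelihood per age class and normalize to sum to 1."""
    unnormalized = {age: prior[age] * likelihood[age] for age in prior}
    total = sum(unnormalized.values())
    return {age: value / total for age, value in unnormalized.items()}


# Hypothetical prior: assumed age structure of the target sample.
prior = {"20-39": 0.5, "40-59": 0.3, "60+": 0.2}
# Hypothetical likelihood: probability of the observed score in each age class.
likelihood = {"20-39": 0.1, "40-59": 0.3, "60+": 0.6}

post = posterior(prior, likelihood)
for age_class, probability in post.items():
    print(age_class, round(probability, 3))
```

In this illustration the oldest class is the most probable after the score is observed, even though it was the least probable beforehand, because the observed score is far more common in that class.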
Using Bayesian statistics, age estimates are given as probabilities of an
individual being a certain age given observed age indicators and the structure of the
observed population (and not the reference sample). Prior probabilities can be determined
by assuming uniform priors, using fixed model age structures, or estimating prior
probabilities in the target sample using the observed distribution of indicators in the
sample (Chamberlain 2006). While mathematically more complex, Bayesian age
estimation has the advantage of having a lower Mean Absolute Deviation (MAD) than
conventional regression models (Aykroyd et al. 1999) and has been shown to be better at
predicting ages beyond the fifth decade of life (Schmitt et al. 2002). Bayesian models
also support a non-linear relationship between age-at-death and skeletal indicators
(Schmitt et al. 2002). Bayesian models can be disadvantageous because they often require
a large, well-distributed reference sample of known-age individuals with a range of
measured age indicators (Aykroyd et al. 1999), though this challenge can sometimes be
mitigated.
Summary
Research in skeletal age estimation is an ongoing endeavor in physical
anthropology. Accurate age estimation is a central facet of forensic anthropology,
paleodemography, and bioarchaeology. The best age estimations are constructed from
multiple methods, with an understanding of the strengths and weaknesses of each method
used to construct the final age interval. Even with decades of developments and
improvements, “age estimation from adult skeletal remains is one of the more difficult
and error-prone procedures in biological anthropology” (Chamberlain 2006:105). It is for
this reason that anthropologists must now work to understand and quantify error.
CHAPTER III
UNCERTAINTY ANALYSIS
Uncertainty analysis is a key component of evaluating and improving methods
in all scientific disciplines. In order to better understand measurement uncertainty, it is
necessary to quantify error. Error is the difference between the true value and the
measured value (Brach and Dunn 2004). Measurement uncertainty is expressed by a
range of possible values that could exist for a given measurement based on an estimate of
error for that measurement (Brach and Dunn 2004). Questions asked by those
investigating uncertainty include: “What is the instrument measuring? What units are
being reported by the instrument? What is the precision of the measurement?” (Brach and
Dunn 2004:2). This chapter will discuss standards of estimation of uncertainty, error, and
uncertainty in skeletal age estimation.
Standards
ISO/IEC 17025
The ISO is the global authority on the development of standards. The ISO/IEC
17025 (2005) is currently the international baseline document for general requirements
for testing and calibration laboratories. Only those standards that apply to testing
laboratories will be discussed because anthropology laboratories, in general, are
concerned with the testing of evidence and not with calibration. ISO/IEC 17025 sets the
minimum requirements for laboratory accreditation and quality management, including
guidelines for the estimation of uncertainty of measurement (Section 5.4.6). Uncertainty
is defined by the ISO as a “parameter that characterizes the dispersion of the quantity
values that are being attributed to a measurand, based on the information used”
(2004b:2.11).
Section 5.4.6 of ISO/IEC 17025 does not outline specific procedures for
estimating uncertainty in measurement in order to allow for flexibility in application of
the standards, especially when applied to testing laboratories. Certain types of
measurements may not lend themselves to rigorous mathematical testing of uncertainty
like the procedures outlined in the ISO Guide to the Expression of Uncertainty in
Measurement (GUM)–Supplement 1 (2004a). It is the responsibility of the testing
laboratory to “attempt to identify all the components of uncertainty and make a
reasonable estimation, and…ensure that the form of reporting of the result does not give a
wrong impression of the uncertainty” (ISO 2005:5.4.6.2). This estimation can be based
on previous validation studies, experience, and knowledge of method or measurement
performance (ISO 2005). Another important facet of ISO/IEC 17025 is the need to
recognize possible sources of uncertainty, including: reference standards and reference
materials used, methods and equipment used, environmental conditions, properties and
condition of the item being tested or calibrated, and the operator (5.4.6.3, Note 1).
ASCLD/LAB-International
The ASCLD/LAB-International program accredits crime laboratories in
accordance with ISO/IEC 17025 standards and ASCLD/LAB-International Supplemental
Requirements. This accreditation is usually part of a larger quality assurance program
undertaken by the laboratory. Like the ISO, “ASCLD/LAB does not prescribe a specific
formula for estimating uncertainty of measurement” (2007:2), but a laboratory must
include certain elements when attempting to analyze uncertainty in measurement. These
elements are: specify what is being measured and the measurement system, construct and
document an appropriate measurement uncertainty budget, identify and list all potential
sources of uncertainty while dismissing any potential sources that do not impact the
uncertainty of measurement, gather measurement data, estimate the uncertainty of
measurement using an appropriate formula, document estimated uncertainty and put
results/supporting documentation in the laboratory, and maintain and calculate as need
arises (ASCLD/LAB 2007:3-4).
An important clarification in the ASCLD/LAB-International Supplemental
Requirements entails what measurements require an estimation of uncertainty. Only
numerical values in quantitative tests are subject to estimation of uncertainty
requirements; the term used for these values is “measurements that matter”
(ASCLD/LAB 2007). Measurements that matter are defined by ASCLD/LAB (2007:4) as
measurements that are “used, or may reasonably be expected to be used, by an immediate
or extended customer (anyone in the judicial process) to determine, prosecute or defend
the type or level of criminal charge(s).” Qualitative tests do not require that estimates of
uncertainty be generated (ASCLD/LAB 2007). However, the updated ASCLD/LAB
uncertainty of measurement requirements recognize that other measurements made
during analysis may impact the accuracy of the final report (ASCLD/LAB 2008). These
“critical measurements made during analysis” will also require estimation of uncertainty
after all “measurements that matter” have been appropriately reported and documented
(ASCLD/LAB 2008) and further updates to ASCLD/LAB requirements will reflect the
need to estimate uncertainty in all methods employed.
JPAC/CIL
The JPAC/CIL is the only accredited laboratory dedicated solely to forensic
anthropology in the world. The CIL Quality Assurance Model has an overall goal of
maintaining the highest scientific integrity at all times and adheres to the surety model.
Surety is a form of quality assurance that sets standards to ensure that each and every end
product is of equal quality. This model is used in situations where a high degree of
reliability is required and was first developed in the 1930s by the aircraft industry
(JPAC/CIL 2008).
The JPAC/CIL Uncertainty of Measurement Policy is covered in Annex B of
SOP 4.0 in the JPAC Laboratory Manual (2008). At the CIL, “measurements that matter”
are “those that affect or have the potential to affect the overall conclusions of an
identification” (JPAC/CIL 2008:16). In general, this includes numerical values and
methods that are based on metric analyses. Since the CIL does not perform metrological
calculations of error, the main concerns are recognizing the potential for uncertainty in
measurement and making reasonable efforts to estimate uncertainty.
The list of components of uncertainty at the CIL is identical to that in ISO/IEC
17025 Section 5.4.6.3, Note 1, with additional descriptions of their effects on methods
employed at the CIL. The most significant factor in uncertainty is human error. As long
as methods and equipment are being used for their intended purposes and in accordance
with original, established instructions, there is very little uncertainty associated with test
results. Estimates of uncertainty must be conducted when new methods or modified
methods are put into use at the CIL. In these circumstances, estimates will comply with
ASCLD/LAB-International Requirements for the analysis of uncertainty (see above).
While skeletal age estimation methods are not currently listed as requiring estimates of
uncertainty at the CIL because of their generally qualitative and quasi-continuous nature,
ever-adapting ASCLD/LAB-International requirements and updates to uncertainty of
“critical measurements” mean that more stringent guidelines are imminent.
NAS
The recent report released in 2009 by the National Academy of Sciences
(NAS), titled Strengthening Forensic Science in the United States: A Path Forward,
highlights the importance of forensic science research encompassing uncertainty in
measurement and demonstrates the need for more stringent guidelines for the practice of
forensic science. This report is the result of a lengthy study of the field of forensic
science as conducted by the NAS. The document as a whole deals with problems of
regulating a decentralized system in need of national leadership and standards. It has
already been recognized by the American Academy of Forensic Sciences (AAFS) as a
significant contribution to and critique of the field of forensic science (personal
communication, Thomas Bohan, March 15, 2009).
Chapter 6 of the NAS report deals specifically with method improvement,
practice, and performance in forensic science. Of paramount importance is the section on
uncertainties and bias, which clearly states that all forensic science methods need to
“indicate the uncertainty in the measurements that are made” (NAS 2009:6-1).
Additionally, the recommendations made are very closely linked with the aims of this
thesis: addressing issues of accuracy, reliability, and validity in methods
(Recommendation 3). The NAS also proposes that the National Institute of Forensic
Science (NIFS) establish model laboratory reports in accordance with ISO/IEC 17025
(Recommendation 2). Finally, the NIFS is encouraged to promote research on the effects
of human error in forensic science investigations, including determining causes of bias
and ways to quantify and describe error (Recommendation 5). This report demonstrates
that uncertainty analysis is at the forefront of research in the forensic sciences.
Error
Analysis of error is what determines the degree of uncertainty in a
measurement. Error is quantified by calculating the difference between the actual (or
true) value and the measured or estimated value. Error is a recurring problem of
measurement and the researcher must decide how much error can be tolerated in
experimentation and how to measure this error (Youden 1998). As Bernard (2002:426)
points out, “no set of data is free of error.” Ultimately, the goal is to minimize error while
accurately characterizing any error that is present so as not to overstate the accuracy of
the method (Komar and Buikstra 2008). There are two types of error that contribute to
uncertainty: random error and systematic error. The sum of random and systematic error
is equal to the total uncertainty of a measurement (Brach and Dunn 2004).
Random error is also referred to as Type A error and affects the precision and
repeatability of the measurement (Brach and Dunn 2004). This type of error is
statistically quantifiable and limited by repeated measurements and condition control
(Brach and Dunn 2004), although it is always present in measurement. Random error can
be related to the observer or observers, i.e., the human factor. In skeletal age estimation,
random error can occur because of incorrect assignment of an individual to a stage or
phase, or may be due to the analyst’s level of experience. Repetition of measurements
and replication of experiments do not guarantee complete accuracy or removal of
random error, but it can help point out errors (Youden 1998), especially if they result
from inconsistent applications of a method or measurement (Adams and Byrd 2002).
Systematic error, also known as Type B error, affects the accuracy of the
sample mean value in relation to the true mean value (Brach and Dunn 2004). This error
can be minimized by careful calibration, but lacks statistical information (Brach and
Dunn 2004). Systematic error in skeletal age estimation is usually related to the natural
variability of human aging that cannot be completely accounted for in osteological
techniques. In this case, the original phase or stage may not accurately reflect the true
variation of a particular age indicator in the human population. Measures of variability
include range, variance, and standard deviation (Komar and Buikstra 2008); these can be
used when comparing a target sample to a reference sample.
Uncertainty in Age Estimation
Since age estimations are not metric measurements, the more mathematically
robust standards of measuring uncertainty, such as those in the ISO GUM Supplement 1
(2004a), are not applicable. However, the concept of uncertainty is still important
because it is necessary to understand the error associated with a method. Error directly
contributes to the accuracy of an age estimate and must be accounted for in
identifications in accordance with Daubert. Large error can also affect age estimation in
paleodemography, leading to skewed or incorrect mortality and health profiles. In
general, the error being analyzed is random error, which is the error associated with the
assignment of individuals to age intervals based on the application of age estimation
methods. Systematic error may be less readily apparent because it relates to the method
as it was originally developed.
Levels of Measurement
The methods used to estimate age-at-death are largely qualitative in nature.
Because they assign individuals to a given phase or stage according to observed age
indicators, they generally produce ordinal data (e.g., phase one, stage three). These values
can be rank ordered, but the differences between the ranks are not meaningful (Bernard
2002). For example, an auricular surface scored as a phase four is not from an individual
twice as old as one scored as a phase two. Ordinal-level data can be analyzed statistically using
nonparametric tests (e.g., median tests, Spearman’s rank-order correlation coefficient),
but these are less statistically robust than tests for interval- and ratio-level data.
Some age estimation methods generate composite scores. These scores can be
treated as continuous variables because the score given can take on any value on a
continuum (Neter et al. 1988). Continuous variables are significant because they are
either interval- or ratio-level and can be described with continuous probability
distributions, the most well-known and most important in statistical analysis being the
normal distribution (Neter et al. 1988). The normal distribution is required for more
robust statistical tests and is used to draw conclusions about the larger population based
on a sample (Levin and Fox 2007).
Additionally, interval-level variables are produced when data for known age-
at-death and means for phases are available. Using the Suchey-Brooks pubic symphysis
method, a female scored as a phase three would be assigned a point age estimate of 30.7
years, which represents the mean age for that phase interval. This point estimate is then
comparable at an interval-level with the actual age of the individual. These data are
significant because they use units of measurement (years) that have known and
meaningful distances (Bernard 2002). Interval-level data can be analyzed statistically
with parametric tests, which include the ability to compare means between samples [e.g.,
Student’s t-test, analysis of variance (ANOVA)] and calculate and describe the degree of
association between two variables (e.g., Pearson’s r correlation, regression analysis)
(Levin and Fox 2007).
Osborne et al. (2004) made a case for the use of parametric tests, such as
analysis of covariance (ANCOVA), in determining the factors that affect auricular
surface morphology. ANCOVA is a statistical test similar to ANOVA, with the addition
of extra variables, or covariates, to examine variation while holding one variable
constant. In order to use ANCOVA, there must be a continuous dependent variable,
which is either interval- or ratio-level. The original age interval phases were designed by
Lovejoy and colleagues as discrete entities to make applying the method easier. However,
Osborne et al. (2004:907) argued that the large number of phase categories and the
general regularity of age-related changes of the auricular surface “closely mimic a true
continuous variable.” Therefore, they were able to apply interval-level statistical tests that
produced more robust information about age changes in the auricular surface. Their
discussion demonstrates that statistical analysis is left to the discretion of the researcher,
who must, in turn, understand the data that he or she is interpreting and what results can
be produced by different tests and assumptions.
Calculating Error
The rates of correct and incorrect classification are a useful starting point for
further investigation of error. Actual (or known) age-at-death is compared with the age
interval generated by the method. If the actual age falls within this interval, the
assignment is considered to be correct. Conversely, if it is not within the interval, it is
incorrect. The total number of correct and incorrect assignments can then be converted to
percentages by dividing by the total sample and multiplying by 100. Rates close to 100%
correct classification indicate that the method is very accurate in assigning age intervals.
Tests and validation studies of methods usually indicate the correct classification rate. An
example of this is Suchey’s (1979) test of the Gilbert and McKern (1973) female pubic
symphysis aging method, which yielded a correct classification rate of only 51%. Correct
and incorrect classification rates are a simple calculation, but offer little information on
precision of a method and are only vague indicators of random and systematic error.
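The calculation described above can be sketched briefly. The ages and intervals are hypothetical, the function name is illustrative, and interval bounds are assumed to be inclusive.

```python
def correct_classification_rate(actual_ages, intervals):
    """Percentage of individuals whose actual age falls within the estimated interval."""
    correct = sum(
        low <= age <= high  # inclusive bounds (assumption)
        for age, (low, high) in zip(actual_ages, intervals)
    )
    return 100 * correct / len(actual_ages)


# Hypothetical known ages and the age intervals assigned by a method.
actual_ages = [23, 35, 41, 67]
intervals = [(20, 29), (20, 29), (35, 49), (50, 60)]

rate = correct_classification_rate(actual_ages, intervals)
print(rate)  # two of the four individuals fall within their intervals: 50.0
```

The rate of incorrect classification is simply 100 minus this value.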
Calculations of inaccuracy and bias are the most commonly employed
formulae for quantifying error in age estimation and examine how well a particular
method performs in relation to the actual ages of a sample of individuals. Inaccuracy is
the average error of a method given in years, but it does not take into account the
direction of the error (Meindl and Lovejoy 1989). The equation for inaccuracy takes the
absolute value of the difference between the estimated and actual age of each individual;
the sum of these absolute differences divided by the sample size gives the overall inaccuracy.
Bias is similar to inaccuracy; it is the average error of a method in years that takes into
account under- or over-aging. The equation for bias is identical to that of inaccuracy,
except for the elimination of the absolute value to allow for directionality. Bias and
inaccuracy can be calculated for methods, phases within methods, and age classes.
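Both measures can be expressed in a few lines. The sketch below uses hypothetical estimated and actual ages and assumes that each difference is taken as estimated age minus actual age, so that a negative bias indicates under-aging.

```python
def inaccuracy(estimated, actual):
    """Mean absolute error in years; ignores the direction of the error."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


def bias(estimated, actual):
    """Mean signed error in years; negative values indicate under-aging."""
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)


# Hypothetical point estimates and known ages for three individuals.
estimated = [30, 45, 50]
actual = [25, 50, 62]

print(inaccuracy(estimated, actual))  # (5 + 5 + 12) / 3
print(bias(estimated, actual))        # (5 - 5 - 12) / 3
```

In this illustration the method errs by about seven years on average and tends to under-age, since the bias is negative.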
Pearson’s correlation coefficient (r) measures the strength and direction of a
relationship between two variables. It is therefore a measure of precision because it
demonstrates how much each variable deviates from its respective mean (Levin and Fox
2007). For age estimation, correlation examines the strength of the relationship between
the estimated age based on an age-indicator and the actual age (e.g., Mulhern and Jones
2005). An age indicator that is only weakly correlated with actual age could be the source
of significant methodological error. Regression can also be used to further elucidate the
nature of the relationship of two variables, given that one variable is dependent upon the
other. The squared correlation (r²), or coefficient of determination, explains the variance
of one variable in terms of another. For age estimation, this could be used to explain how
much variation in an age indicator can be explained by age. The remaining percentage
would be related to other factors, including error.
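As a minimal sketch, r and r² can be computed directly from paired estimated and actual ages. The values below are hypothetical and the function name is illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical estimated and actual ages for four individuals.
estimated_ages = [30, 45, 50, 62]
actual_ages = [25, 50, 62, 70]

r = pearson_r(estimated_ages, actual_ages)
print("r =", r)
print("r squared =", r ** 2)  # share of variance in one variable explained by the other
```

The quantity 1 - r² then corresponds to the remaining share of variation attributed to other factors, including error.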
Interobserver error is also a significant source of error in age estimation.
There will always be variation between researchers conducting the observations and the
extent of this relates to the reliability of a method. Interobserver error can be measured
using correlation and inaccuracy (e.g., Bedford et al. 1993). The kappa statistic can also
be used to measure agreement between tests for nominal data, encompassing both intra-
and interobserver variation (Komar and Buikstra 2008). A good age estimation method
must exhibit low interobserver error.
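Agreement between two observers can be sketched with Cohen's kappa, one common form of the kappa statistic. The phase assignments below are hypothetical, and the phases are treated purely as category labels.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each observer's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical phases assigned to six specimens by two observers.
observer_1 = [1, 2, 2, 3, 3, 3]
observer_2 = [1, 2, 3, 3, 3, 2]

kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 3))
```

A value of 1 indicates perfect agreement, and a value near 0 indicates agreement no better than chance.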
Adams and Byrd (2002) devised a Scaled Error Index (SEI) as part of their
study on interobserver error of postcranial skeletal measurements. The SEI is represented
by: SEI = [(|x - median|) / median] * 100, where x represents a single measurement and
median the midpoint of all measurements. This index is useful because it allows for a
comparison of measurements that is unaffected by scale or sample size (Adams and Byrd
2002). The SEI also facilitates comparisons between individuals taking the measurements
and the measurements themselves, so it can be used to analyze both interobserver error
and method performance. Differences in mean SEI can be examined using a one- or two-
tailed t-test, or, for several methods, ANOVA.
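The SEI calculation described above can be sketched as follows. This is a minimal Python illustration and not the procedure published by Adams and Byrd (2002); the function name and the sample measurements are assumptions made for the example.

```python
from statistics import median


def scaled_error_index(measurements):
    """Return the SEI (percent deviation from the median) for each measurement."""
    mid = median(measurements)
    if mid == 0:
        raise ValueError("median is zero; SEI is undefined")
    # SEI = (|x - median| / median) * 100 for every measurement x.
    return [abs(x - mid) / mid * 100 for x in measurements]


# Hypothetical example: three observers measure the same bone (in mm).
example = scaled_error_index([98, 100, 102])
```

Because each deviation is divided by the median, values from large and small measurements are expressed on the same percentage scale.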
Sources of Error
Skeletal aging methods do not produce exact ages; they produce estimates
with associated error rates, like all forensic anthropological techniques (Adams and Byrd
2002). There are several sources of error in age estimation, which can be associated with
those listed in ISO/IEC 17025, 5.4.6.3, Note 1. If incorrectly constructed, the reference
standards can contribute to error. The human skeleton exhibits natural variation and so
reference standards may not always express the full extent of human variability in aging.
Similarly, the reference materials may not be evenly distributed across all age classes,
which can influence the development of the method. Methods can be inherently flawed,
so that even their correct application produces incorrect results. The equipment used for
age estimation is most often comparison exemplars or casts used as part of an aging
method. Improper use of these materials will certainly affect results. Additionally, casts
may not express all possible variations of a particular phase since they were selected to
represent an average expression of age-related traits. Environmental conditions will most
likely have a negligible impact on age estimation, as will properties and condition of the
item (skeletal element) being tested, unless it exhibits severe pathological or taphonomic
alterations. In these instances, age will either not be estimated or will be estimated with caution.
The final cause of error in age estimation is the operator. According to the
JPAC Laboratory Manual (2008:18), “The most significant, ever present, and widely
varied uncertainties are the result of human error and other variances in performance. The
ways in which human performance affects uncertainty of tests may vary widely.” Analyst
error can be mitigated, but never entirely removed. Implementation of quality control
procedures and standardization of analytical techniques aid in minimizing the human
component of overall uncertainty in measurement.
Summary
Understanding error and uncertainty in measurement means that we can be
more confident in standardization and the comparison of results from different
laboratories and anthropologists. Comprehensive studies of methods and techniques
applied in one laboratory or institution offer an interesting historical perspective on
methodology, but these studies are also useful in the context of uncertainty analysis as
required by international scientific standards. A review of all methods used and inter-
method comparisons can provide an overarching critique that addresses temporal and
methodological concerns in a specific setting. This type of study is especially important
for those laboratories that are considering accreditation and for institutions and
individuals that regularly conduct casework for law enforcement agencies and require a
known error rate for laboratory manuals, standard operating procedures (SOPs), and
expert witness testimony. The following study is an attempt to estimate uncertainty in
methods used for age estimation at the JPAC/CIL and directly addresses many of the
current issues raised by the NAS hearings.
CHAPTER IV
METHODS I: RETROSPECTIVE STUDY
A sample of 979 individuals was compiled from the case files and records
archived at the JPAC/CIL. This chapter discusses the sample and sample selection, data
collection process, and methods used to analyze uncertainty in skeletal age estimation for
this sample. Calculations include: correct versus incorrect classification of individuals,
bias, inaccuracy, scaled error index (SEI), and correlation between known age-at-death
and estimated age.
The Sample
The JPAC/CIL is responsible for the recovery and identification of American
service men and women lost in past conflicts. Once an individual has been identified, the
remains are returned to the family and a case file is kept on record, usually in both hard
copy and electronic format. These case files contain any reports written by analysts at the
CIL, analytical notes, individual military and medical records, and other identification
media. From 1972 to 31 July 2008, the JPAC/CIL identified 1,717 individuals from
worldwide conflicts. Of these, 979 individuals have records with adequate information
concerning known age-at-death and methods used to estimate age.
The overall JPAC/CIL sample is almost completely male, and the sample for
this study is exclusively male. The individuals in this sample are also generally very
young, with a limited number of individuals over the age of 40. Additionally, an
overwhelming majority of the individuals are white, although some black and Hispanic
individuals are present in records from later conflicts such as the Vietnam War. A
distribution of individuals identified by conflict is given in Figure 1. The majority of
identified cases are from Southeast Asia and World War II conflicts, with a more limited
number from Korea, the Cold War, and other conflicts. Only one individual has been
identified from World War I. The JPAC/CIL population is unique because it comprises
individuals of the same sex, with very similar ages, statures, and “races,” who died
from similar causes and in similar manners.
[Figure 1: bar chart of the count of identified individuals (vertical axis, 0 to 1200) by conflict (horizontal axis: Cold War, Korea, Other, Southeast Asia, World War I, World War II).]
Figure 1. Individuals identified by conflict (N=1717).
Data Collection
Before beginning data collection, a list of all identified individuals was
obtained from the JPAC network and entered into a Microsoft© Excel spreadsheet. This
list included the accession number, name of the individual, approval date, date received,
associated conflict, associated country, and incident number or reference number
(REFNO) for each case. These data were organized in chronological order by accession
number to facilitate data collection from archived records. Additional data collection
categories were added for this study: known age-at-death in years or years and months
when given, analyst’s initials, method(s) used to estimate age, the phase or predicted age
interval for each method, and a place for additional comments.
All data were entered by hand onto ledger-size paper and then re-entered into
a computer database after all data collection was complete. Abbreviations were used for
analysts and methods and a key kept on file. No distinctions were made between right
versus left sides unless recorded as different phases or stages by the analyst. Aging
methods do not cite a significant difference in assigning an age estimate between sides
(e.g., McKern and Stewart 1957 for epiphyseal fusion of the clavicle; Falys et al. 2006
for the auricular surface) so it is expected that side will have a negligible impact on
measurement uncertainty.
All hand-entered data were transferred to a Microsoft© Excel spreadsheet,
undergoing reorganization and cleaning during this process. The data set was cleaned by
eliminating cases that had no available age-at-death or age estimation data because
further analyses would not be possible. Individuals who were eliminated were left in the
original database and a second database was created to encompass only the cleaned data
set. This step reduced the sample size from 1717 to 979 individuals. Ages-at-death were
rounded to the closest year since year and month were not available for all individuals.
For zero to five months, the age was rounded down and for six to eleven months rounded
up. Skeletal aging methods do not give estimates as precise as year and month. Therefore,
the reporting of age-at-death by year, whether it is due to original exclusion of months in
the military report or rounding, is unlikely to affect accuracy in data analysis.
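The rounding rule described above can be sketched in Python as follows. The function name and signature are hypothetical and serve only as an illustration; the original data preparation was carried out in a spreadsheet.

```python
def round_age(years, months=None):
    """Round an age-at-death to whole years.

    Zero to five months rounds down; six to eleven months rounds up.
    If no month is recorded, the age in years is returned unchanged.
    """
    if months is None:
        return years
    if not 0 <= months <= 11:
        raise ValueError("months must be between 0 and 11")
    return years + 1 if months >= 6 else years


# Hypothetical examples: 22 years 5 months -> 22; 22 years 6 months -> 23.
ages = [round_age(22, 5), round_age(22, 6), round_age(30)]
```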
Column headings were added to the cleaned database for each method type
(e.g., auricular surface, pubic symphysis) and column subheadings for the specific
methods (e.g., Mincer et al.; Suchey-Brooks). Data were then entered by method and
method type (e.g., pubic symphysis) when a specific method was not listed. Separate
spreadsheets were created for each subheading (specific method) and the data sorted by
method and entered into their respective spreadsheets. Each spreadsheet included the
accession number, known age-at-death, and age estimation as determined by that method,
per individual. Age estimations that were not method-specific were removed at this point
because they did not offer information on the performance of the method. For example, if
the femoral head was cited as fully fused but no reference was given, this data point was
eliminated. Some methods could be inferred (e.g., fully fused epiphyses in cases
identified before 1985 since McKern and Stewart (1957) would have been the only
reference up until this year), but case files that did not cite a specific method were
generally eliminated from the individual method data sets.
For each method-specific spreadsheet, data were resorted in chronological
order by accession number. Considerable reorganization was then needed in order to
begin analysis of the data. Since the sample spans 36 years of data collection at the CIL,
there was a wide variety of analytical styles and data recording methods in use.
Therefore, for each method, two columns were added: one for the reported phase or stage
and one for the age interval associated with this stage. In some cases, the analyst reported
only the stage, in other cases, only the age interval. The missing category was filled in
using the referenced method so that both phase/stage and estimated age interval were
listed for each individual. The wide variety of methods used meant that slight
modifications had to be made per method; these are discussed in the next section.
Additional columns were also added as needed for further analyses, e.g., predicted point
estimate, calculations of bias, inaccuracy, and SEI (see below).
Method-Specific Modifications
Epiphyseal Fusion. Scoring of epiphyseal fusion was not consistent between
methods. Data were recorded according to the referenced method, but are not easily
comparable between methods. A breakdown of scoring is given in Table 1. Several cases
had age estimates based on references not commonly used at the CIL, including:
Krogman (1939), McKern (1970), Pyle and Hoerr (1955), White (1991, 2000), Kunos et
al. (1999), and White and Folkens (2005). These cases were eliminated from further
analyses due to their small sample sizes; they were only used once or twice overall.
Table 1. Epiphyseal scoring by method
Score  McKern-Stewart  Scheuer-Black         Webb-Suchey                        Albert-Maples
0      Not fused       Not fused/Incomplete  N/A                                No union
1      Beginning       Complete              Nonunion w/out separate epiphyses  Early-beginning, Late-progressing
2      Active          N/A                   Nonunion w/separate epiphyses      Early-almost complete, Late-recent
3      Recent          N/A                   Partial Union                      Complete
4      Complete        N/A                   Complete                           N/A
For McKern and Stewart (1957), a variety of terms were used by CIL analysts
to describe fusion. When no actual score was given, the descriptive terms were converted
to a number. Fusion described as partial, incomplete, gap, patent, or lapsed was given a
score of two; nearly complete, line, or late stages of union a score of three; and slight line
or small scar a score of four. The number conversion allowed easier comparison of the
data in table format.
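The conversion of descriptive terms to McKern and Stewart (1957) scores can be illustrated with a simple lookup. This is a sketch only: the dictionary and function names are hypothetical, and the handling of capitalization and extra spaces is an assumption added for the example.

```python
# Descriptive terms used by analysts, mapped to the scores given in the text.
TERM_TO_SCORE = {
    "partial": 2, "incomplete": 2, "gap": 2, "patent": 2, "lapsed": 2,
    "nearly complete": 3, "line": 3, "late stages of union": 3,
    "slight line": 4, "small scar": 4,
}


def fusion_score(description):
    """Return the numeric score for a descriptive term, or None if unrecognized."""
    # Normalize case and whitespace before the lookup (an assumption).
    key = " ".join(description.lower().split())
    return TERM_TO_SCORE.get(key)
```

Unrecognized descriptions return None rather than a guessed score, so that such cases can be reviewed by hand.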
Additionally, epiphyseal fusion does not occur at the same point in time for all
sites of a single bone, so it was necessary to record the stage of fusion for each site. This
required significant reorganization of the data reported for both McKern and Stewart
(1957) and Scheuer and Black (2000) into tables that listed each epiphysis of each bone.
Reconsultation of case reports, notes, and photographs was required so that the epiphyses
present for each case could be recorded in instances when the analyst gave the age
estimate based on “complete fusion of all epiphyses.” Late and early distinctions, if
noted, were removed for these methods since no such category exists in the references.
When both sides were recorded, the right side was chosen. Epiphyses that had a sample
size smaller than 15 were eliminated from further analyses, which included all age
estimates based on Scheuer and Black (2000) and the iliac crest using Webb and Suchey
(1985).
Suture Closure. Reporting for cranial suture closure was highly sporadic and
every effort was made to consolidate the data for analysis. In many cases, only a basic
description of the state of the cranial sutures was given and no method was referenced.
Age estimations using Meindl and Lovejoy (1985) were given as composite scores, mean
ages, and age intervals. When descriptions of the individual scores were given, these
could be converted into one of the four degrees of closure from the reference method. A
table was compiled that included the composite score, system (vault or lateral-anterior),
and the ten observation sites. Even with this reorganization and a sample size of 22, there
were still not enough composite scores or suture descriptions to adequately analyze this
method and it was excluded from further analyses.
For maxillary suture closure, both the Mann et al. 1987 and 1991 methods
have been employed at the CIL; these methods are not the same. All age estimations
produced using the 1987 method (n=7) were distinguished from those made using the
1991 method (n=55). In many instances, only a description of the state of the sutures was
included in reports or notes and these descriptions were converted to age intervals based
on the 1991 publication. Ginter (2005) was used to attempt to consolidate the many
intervals provided by the analysts since very few of them corresponded to the age
intervals provided by Mann et al. (1991) for the general pattern of suture obliteration.
Only those age estimations with closed intervals (e.g., 15-20) were used to calculate bias,
inaccuracy, and SEI and examine the correlation between known and estimated age-at-
death.
Third Molar Formation. The method of Moorrees et al. (1963) requires
analysis of both mesial and distal tooth roots and data collection and recording followed
this procedure. The method is only applicable to posterior mandibular teeth and all
incisors, although it was used occasionally at the CIL to describe maxillary third molars.
These data points (n=8) were removed from further analysis of the sample. Since data
were present for both mesial and distal roots of both mandibular third molars, there were
235 possible data points for 105 individuals.
Mincer et al. (1993) distinguish between maxillary and mandibular third
molars and black and white individuals, but not individual roots. All data referencing this
method were cleaned to reflect the stage per tooth, even when the stage per root was
given by the analyst. Information on the ancestry of the individual was also collected
when known. One case showed a discrepancy in stages between mesial and distal roots
for the same tooth and this tooth was given the same score as the other third molars for
this individual. There were 160 possible data points for 92 individuals.
Pubic Symphysis. Reporting of pubic symphysis age estimates almost always
occurred by method and required little reorganization of data. No modifications were
made to the scores and phases reported. Estimates that were not referenced were
eliminated.
Auricular Surface. Auricular surface age estimates appeared to reference
several sources, but on closer examination all used the Lovejoy et al. (1985b) phase
description with the exception of ten individuals that were scored with Buckberry and
Chamberlain (2002). Analysts who referenced Bedford et al. (1989) were referring to the
publication of color photos of the auricular surface, which were distributed to clarify the
phase descriptions of Lovejoy et al. (1985b). Age estimates with this reference were
combined with those using Lovejoy et al. (1985b). These were then assigned the Lovejoy
et al. phases when only an interval was given and all estimates were then also assigned
the Osborne et al. (2004) modified phases as per the JPAC Laboratory Manual SOP 3.4.
Analyses could then be undertaken with both the original and modified phases.
Sternal Rib Ends. Age estimates from the sternal rib ends appeared to be based
on a number of different references. However, they were all different publications of
Iscan and colleagues and could be consolidated into a generalized category based on their
age estimation method. Three individuals were aged based on the Kunos et al. (1999) first
rib criteria but these were eliminated from further analyses because of small sample size.
Data Analysis
Once the data were organized by method, known age-at-death distributions
were examined for each of the methods and compared to the overall sample by means of
ANOVA and Student’s t-tests to determine whether the sub-samples were representative
of the larger group. Distributions per method were also graphically represented as
histograms to look at their shapes. Descriptive statistics were calculated for each method
sample to compare actual ages of individuals making up the larger sample.
Correct and incorrect classification percentages were calculated for each
element or method. A column was added to the right of the phase number for the binary
system used to code correct versus incorrect classification. A zero meant that the actual
age did not fall within the predicted age interval, a one meant that it did. A simple sum of
the column gave the count of correct classification for that method or element. The
correct and incorrect classifications were then tabulated in the same spreadsheet and
percentages calculated based on the total sample size. All correct and incorrect
classifications were then entered into a separate table for ease of comparison between
elements and methods. Correct and incorrect classification sample sizes are generally
larger than sample sizes for calculations of bias/inaccuracy/SEI due to multiple phase
assessment age estimations that had to be eliminated from more robust statistical analyses
(e.g., the analyst listed the sternal rib end as a Phase IV-V).
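The binary coding of correct versus incorrect classification can be sketched as follows. The function names and sample records are hypothetical, and treating both ends of the predicted interval as inclusive is an assumption that the text does not state explicitly.

```python
def classify(actual_age, interval):
    """Return 1 if the actual age falls within the predicted interval, else 0."""
    low, high = interval
    # Assumption: interval bounds are inclusive.
    return 1 if low <= actual_age <= high else 0


def correct_rate(records):
    """Return (correct count, incorrect count, percent correct).

    records is a non-empty list of (actual_age, (low, high)) pairs.
    """
    codes = [classify(age, interval) for age, interval in records]
    correct = sum(codes)  # a simple sum of the 0/1 column
    return correct, len(codes) - correct, 100 * correct / len(codes)


# Hypothetical records for illustration only.
sample = [(22, (18, 23)), (30, (18, 23)), (25, (24, 30)), (19, (15, 20))]
```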
To compare the results obtained by CIL analysts with the published
references, the distribution of known age per phase within a single method was
calculated. Saunders et al. (1992) graphically depicted results of similar analyses by
superimposing known ages of individuals in their sample over the 95% confidence
intervals for the reference standards. This graphing technique was modified for this study
to allow for larger sample sizes than those in Saunders et al. (1992). Therefore, each point
on the scatterplot does not always represent one individual, but all individuals that were
of that known age-at-death. The graphs represent known age-at-death distributions per
phase compared to the intervals of the referenced method per phase. The mean of each
reference interval when given in the method is represented on the graphs by a diamond.
All intervals are based on the male standards when given.
Initial analyses revealed that epiphyseal fusion methods, with the exception of
the iliac crest, S1-S2, and the medial clavicle, were being used to confirm adulthood in a
majority of the cases. The terminal stage of epiphyseal fusion offers little information on
uncertainty in age estimation since an individual with all long bone epiphyses fused could
be twenty-four or eighty years old. Even late-fusing epiphyses did not have adequate
sample sizes in the earlier stages to allow for meaningful statistical comparison.
Epiphyseal fusion methods were eliminated from further analyses. Dental formation
methods suffer from a similar problem, but further analyses were undertaken because
small sample sizes could be mitigated by combining data for all teeth in an individual.
A scaled error index (SEI) based on Adams and Byrd (2002) was devised to
allow for between- and within-method (phase to phase) comparison of error.
The equation used for each individual was:
SEI = (|Estimated Age - Actual Age| / Actual Age) * 100
Estimated age is either the mean of the phase or stage based on the original
publication or the midpoint of the phase or stage if no mean was included in the method.
In some instances the median may also be used, e.g., Buckberry and Chamberlain’s
(2002) revised auricular surface method. The average SEI per method and per phase
within the method was calculated. Tests of significance between phases of a single
method were conducted by ANOVA. If significant differences were found, Student’s t-
tests or ANOVA with the Bonferroni correction were run to further elucidate error by
phase.
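A minimal sketch of the per-individual SEI and its average per phase is given below, taking the SEI as the absolute difference between estimated and actual age, divided by actual age and multiplied by 100. The function names, record layout, and example values are hypothetical and are not drawn from the study data.

```python
from collections import defaultdict


def sei(estimated_age, actual_age):
    """Scaled error index: absolute error as a percentage of actual age."""
    if actual_age <= 0:
        raise ValueError("actual age must be positive")
    return abs(estimated_age - actual_age) / actual_age * 100


def mean_sei_by_phase(records):
    """Average SEI per phase.

    records is an iterable of (phase, estimated_age, actual_age) tuples,
    where estimated_age is the phase mean, midpoint, or median.
    """
    grouped = defaultdict(list)
    for phase, estimated, actual in records:
        grouped[phase].append(sei(estimated, actual))
    return {phase: sum(v) / len(v) for phase, v in grouped.items()}
```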
Bias and inaccuracy were also calculated per individual based on the
equations provided by Mulhern and Jones (2005):
Bias = Σ(estimated age - actual age)/N
Inaccuracy = Σ|estimated age - actual age|/N
The sum of both bias and inaccuracy was compiled per method and per phase
within the method. Tests of significance were performed using ANOVA for multiple
phases and methods and Student’s t-test between pairs of methods or phases if ANOVA
detected statistically significant differences. Bias was depicted in histograms per method
to examine its distribution. Bias was chosen over inaccuracy and SEI because the
equation allows for negative values, thus giving a full picture of its distribution.
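The two equations can be illustrated with a short Python sketch. The function names and example ages are hypothetical; the arithmetic follows the bias and inaccuracy formulas as given by Mulhern and Jones (2005) and quoted above.

```python
def bias(estimated, actual):
    """Mean signed error: negative values indicate under-aging."""
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)


def inaccuracy(estimated, actual):
    """Mean absolute error, ignoring the direction of the error."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


# Hypothetical example: errors of +3, -5, and 0 years.
est = [25, 30, 20]
act = [22, 35, 20]
```

In this example the signed errors partly cancel, so bias is smaller in magnitude than inaccuracy, which is why the two measures are reported together.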
Pearson’s r was calculated per method to determine the strength of the
relationship between actual age and estimated age. The estimated mean/median/midpoint
(Y) was compared to the actual age (X) and an r-value calculated to determine the
strength of this relationship. Actual age was used as the independent variable because an
estimated age should depend on morphological changes related to the aging process
(Schmitt 2002). The coefficient of determination (r²) was also calculated in order to
explain how much of the variation in Y could be determined by X. The linear relationship
between Y and X was tested statistically with ANOVA. While phases or stages of age
estimation methods are normally considered to be ordinal data and therefore not
appropriate for correlation, Osborne et al. (2004) argued that the continuous nature of
data produced by age estimation allows for the use of parametric tests. Additionally,
using the mean age of the predicted phase for each individual provides interval-level data,
although using the mean point estimate may introduce additional error into the analyses
and does not provide as strong a correlation as other interval- or ratio-level data.
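For illustration, Pearson's r and the coefficient of determination can be computed as in the following sketch, with actual age as X and the estimated point age as Y. The function name and example values are hypothetical, and this is not the software used for the analyses in this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical example: estimates that are consistently two years too high.
actual_ages = [20, 25, 30, 35]
estimated_ages = [22, 27, 32, 37]
r = pearson_r(actual_ages, estimated_ages)
r_squared = r ** 2
```

The example also shows why correlation alone is not sufficient: a method can be perfectly correlated with actual age and still be biased, so r is reported alongside bias and inaccuracy.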
Calculating the error of each method individually is not entirely realistic.
Normally, a final age estimation includes all available indicators to come up with an
overall age interval. However, this study will only examine each method’s individual
performance in the JPAC/CIL sample. Further studies using the same data set and a
combination of methods per individual may be possible, but this was not the scope of the
current study.
Summary
The retrospective study outlined here encompasses all identified individuals at
the JPAC/CIL with adequate data on known age-at-death and age estimation methods.
Comparisons of age distributions give an overall understanding of the sample and how
well each sub-sample represents the larger group. Comparisons of estimated ages to
known ages-at-death will quantify uncertainty associated with skeletal age estimation
methods.
CHAPTER V
METHODS II: INTEROBSERVER ERROR
STUDY
Following data collection for all individuals and age estimation methods in the
JPAC/CIL identified sample and initial analyses of these samples, three methods were
chosen for a preliminary interobserver error study to assess method reliability. These
methods are: Buckberry and Chamberlain’s (2002) revised auricular surface method,
Iscan et al.’s (1984b) sternal rib end for males, and Mann et al.’s (1991) maxillary suture
method. This chapter discusses the choice of methods, design of the study, and methods
of data analysis.
Choice of Methods
Buckberry and Chamberlain (2002)
While the original auricular surface method (Lovejoy et al. 1985b) had an
adequate sample size in the study sample, the revised method (Buckberry and
Chamberlain 2002) was only employed ten times in analyses at the CIL. A histogram of
bias revealed an irregular distribution, most likely due to the small sample size. However,
the method had a 100% correct classification rate. Additionally, it is the author’s opinion
that the Buckberry and Chamberlain (2002) method is used far less than the Lovejoy et
al. (1985b) method, especially since the latter appears in Standards for Data Collection
(Buikstra and Ubelaker 1994) and has been around for longer than the revised method.
Subjecting Buckberry and Chamberlain’s (2002) method to interobserver analyses will
produce a larger sample size and a better understanding of the use of this method as it
relates to experience and level of comfort. Is this method a reliable means of age
estimation?
Iscan et al. (1984b)
The sternal rib end method as developed by Iscan and colleagues is used
frequently for age estimations and is included as one of the main methods for age
estimation in Standards for Data Collection (Buikstra and Ubelaker 1994). In the
JPAC/CIL sample, it is used less regularly; the overall sample size is only 21 individuals.
Due to the poor preservation of skeletal remains in archaeological contexts, especially in
Southeast Asia where a large number of individuals are recovered by the JPAC, ribs are
often not recovered and, if recovered, the sternal end may be missing or too damaged for
analysis. Of those individuals who had sternal rib ends intact enough for age estimation,
the correct classification rate for the Iscan et al. age estimation method in the JPAC/CIL
sample was 71.4%. This could be due to problems in application of the method or with
the original age intervals as addressed by Nawrocki (n.d.). A test of interobserver error
using the original method will hopefully clarify at least one of these issues and increase
the overall sample size.
Mann et al. (1991)
The Mann et al. (1991) maxillary suture method is one of the more obscure
methods used for age estimation, and before arriving at the JPAC/CIL, the author had
never heard of or used this method. The Mann et al. maxillary suture method was used 62
times in the course of age estimations at the CIL and represents one of the larger sample
sizes, with the exception of epiphyseal fusion methods. With an overall correct
classification rate of 88.7%¹, the Mann et al. maxillary suture method holds promise for
accurate age estimation. However, is the success of the Mann et al. method related to the
method itself or the presence of Dr. Mann at the CIL, who can readily answer questions
related to application of the method? Including this method in an interobserver error
study will address the ease of applicability of this method for analysts who may not ever
use this method.
Design of Study
Originally, tests of interobserver variation were to be conducted by comparing
the data produced by each analyst over the course of 36 years of casework at the CIL.
Research questions were: (1) are there discernible trends by individual (e.g., is one person
consistently providing estimates that over- or under-age)? and (2) how does error in age
estimation relate to level of experience as measured by the highest degree held?
However, the timeframe of data collection and the turnover of anthropologists at the CIL
did not allow for the generation of a large enough sample size per method or analyst to
make these comparisons. Therefore, an interobserver error study was developed
following the retrospective portion of data collection that encompassed three methods
commonly used at the CIL.
Two samples were chosen from the CIL study collection for each method: two
innominates, two full ribs, and two crania. Age-at-death of these samples is unknown, but
an effort was made to choose samples that represented different age classes per method.
Validation studies require the use of a documented collection. Since this study examines
the application of methods, the comparison of known age-at-death to estimated age is not
vital. Instead, the distribution of estimated phases can be examined and inferences made
on method performance and level of experience of the analyst. Anthropologists from the
JPAC/CIL and participants at the 2009 annual meeting of the American Academy of
Forensic Sciences in Denver, Colorado were asked to voluntarily participate in this study.
¹ This figure includes correct classification for both Mann et al. (1987) and (1991). Age
estimations derived from the 1987 method were removed from further analyses.
A survey form was designed based on Adams and Byrd (2002), portions of
which were modified for this study. The survey form was approved by JPAC/CIL lab
management before it was administered, and the study was conducted as part of a paid fellowship from
the Oak Ridge Institute for Science and Education (ORISE). All individuals participating
gave verbal consent and remained anonymous; no names were recorded and each survey
was randomly coded with a number for database entry. The following background
information was asked of each participant: field of study, highest degree obtained,
whether the individual is a Diplomate of the American Board of Forensic Anthropology
(D-ABFA) or not, number of years of experience with skeletal aging, and approximate
number of skeletons analyzed. For each method, the participant was asked to answer the
following questions: have you used this method before? If yes, do you use it on a regular
basis? What is your level of comfort with this method? (On a scale from one to five, one
being very low and five being very high).
The participant was then asked to give an age estimate for each of the samples
following the referenced method. The original articles were provided for easy reference
as well as any additional materials required for analysis (e.g., sternal rib end casts,
flashlight, magnifying glass). Each innominate was entirely covered in aluminum foil,
leaving only the auricular surface exposed, so that the analyst was not influenced by other
areas of the bone. The author was present for almost all portions of the study to answer
questions, but did not help any of the participants apply the methods so as not to bias
interpretations of ease of applicability.
The required information was slightly modified for each separate method. For
Buckberry and Chamberlain (2002), the participant was asked to enter the composite
score, corresponding stage, age point estimate, and interval, a percent confidence that the
observations made correspond to the correct composite score, and any additional notes.
For Iscan et al. (1984b), the participant was asked for a phase and corresponding age
interval, a percent confidence that the observations correspond to the correct phase, and
any additional notes. For Mann et al. (1991), the participant was asked to circle the
sutures that showed obliteration, provide an age estimate based on the state of
obliteration, give a percent confidence that observations of obliteration are correct and
that the interpretation of the age interval is correct, and any additional notes.
Data Analysis
Each completed survey was coded with a number and all information entered
into a Microsoft© Excel spreadsheet with four separate pages. The first page included all
background information provided by the participant. The following three pages contained
the information collected per method. All data were entered exactly as they appeared on
the surveys, including additional comments and notes.
The first step of data analysis was to examine the summary of self-reported
background information. Pie charts were produced to summarize the information
provided by participants. The next step of data analysis was to examine each method
individually. This included whether or not the participant had used the method before and
his or her level of comfort with the method. The analysis also examined whether participants
correctly assigned point estimates based on their estimated phases or stages.
The distribution of phase or stage assignment was then described. Histograms
were generated per sample based on the analysts’ given phases to assess whether the phases
were tightly clustered with an approximately normal distribution or whether phase
assignments varied widely.
Finally, level of experience was analyzed in relation to phase assignment. The
median phase for each sample was assumed to represent the best possible estimation of
age for that individual. An SEI was calculated for phase and midpoint and then compared
based on experience levels. Experience was broken down by years of experience in
skeletal aging and highest degree obtained.
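As an illustration only, the comparison described above can be sketched in Python. The observer labels, phase values, and experience groupings are hypothetical, and the sketch assumes that the SEI takes the form of absolute error divided by the reference value, multiplied by 100, with the median phase standing in for the reference value.

```python
from statistics import mean, median

def sei(estimate, reference):
    """Scaled error index: absolute error as a percentage of the reference value."""
    return abs(estimate - reference) / reference * 100

# Hypothetical phase assignments for one specimen, keyed by observer.
phases = {"obs1": 3, "obs2": 4, "obs3": 3, "obs4": 5, "obs5": 3}
# Hypothetical experience grouping for the same observers.
experience = {"obs1": "0-5 yr", "obs2": "0-5 yr", "obs3": "6+ yr",
              "obs4": "0-5 yr", "obs5": "6+ yr"}

# The median phase is treated as the best available estimate for the specimen.
reference_phase = median(phases.values())

# Collect each observer's SEI under his or her experience group.
sei_by_group = {}
for observer, phase in phases.items():
    sei_by_group.setdefault(experience[observer], []).append(sei(phase, reference_phase))

# Mean SEI per experience group, the quantity compared between groups.
mean_sei_by_group = {group: mean(values) for group, values in sei_by_group.items()}
```

In this made-up example the less experienced group departs further from the median phase; with real survey data the same grouping could be repeated for highest degree obtained.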
No comparison was made between SEI and self-reported approximate number
of skeletons analyzed because the sample size for those individuals with over 1000
skeletons was far too small (n=3). Additionally, the total number of skeletons analyzed is
not necessarily related to experience in age estimation. This question was asked to obtain
an overall sense of experience in applied osteology.
Summary
This portion of the study seeks to better understand error associated with three
specific age estimation methods. An overview of data obtained from a series of
interobserver error studies will provide information on reliability of these age estimation
methods as related to experience and ease of application of the method. If successful, this
preliminary study may be expanded to include more age estimation methods currently in
use at the JPAC/CIL as well as testing on documented collections.
CHAPTER VI
RESULTS I: RETROSPECTIVE STUDY
This chapter discusses the results obtained from statistical analyses of age
estimations produced by anthropologists at the CIL. Final sample sizes, known age-at-
death distributions, and descriptive statistics of these distributions are given for the
overall sample and each method’s sub-sample. A comparison of methods is given,
followed by the results for each method broken down by method type.
The Sample
Sample Sizes
Final sample sizes for each age estimation method used at the CIL between
1972 and 31 July 2008 are given in Appendix A (Tables A.1 and A.2). Both tables
represent the cleaned data set, which includes all identified individuals for which there
was adequate aging data (known age-at-death and cited methods). Sample sizes are all
greater than 20, with the exception of Todd (1920), Buckberry and Chamberlain (2002),
Webb and Suchey (1985) iliac crest, and all Scheuer and Black (2000) epiphyseal
methods. Ambiguity in reporting of cranial suture closure eliminated the Meindl and
Lovejoy (1985) method from further analyses.
Known Age-at-Death Distributions
The known age-at-death distribution for all identified individuals in the
cleaned data set is given in Figure 2. The sample is positively skewed, comprised of a
majority of young individuals and far fewer older individuals.
Figure 2. Age distribution of total sample (n=979). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figures 3 through 15 graphically depict the known age-at-death distributions for each of the methods to allow
for visual comparisons of distribution shape with each other and the overall cleaned data
set. These figures show whether or not the sub-samples are representative of the
JPAC/CIL identified sample. If the sub-samples are not similar to the overall sample or
each other, there could be problems with comparisons made between different aging
methods, specifically related to error since age estimation methods do not perform the
same for all age groups. The method sub-samples are all generally similar in distribution
to the overall sample, with the exception of those that have small sample sizes (e.g.,
Buckberry-Chamberlain; Todd; Iscan et al.).
Figure 3. Age distribution: Albert-Maples 1995 (n=24). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 4. Age distribution: Webb-Suchey clavicle 1985 (n=33). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 5. Age distribution: McKern-Stewart epiphyses 1957 (n=161). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 6. Age distribution: McKern-Stewart pubic symphysis 1957 (n=79). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 7. Age distribution: Suchey-Brooks pubic symphysis (n=93). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 8. Age distribution: Todd pubic symphysis 1920 (n=10). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 9. Age distribution: Lovejoy et al. auricular surface 1985 (n=147). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 10. Age distribution: Osborne et al. auricular surface 2004 (n=151). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 11. Age distribution: Buckberry-Chamberlain auricular surface 2002 (n=10). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 12. Age distribution: Iscan et al. sternal rib end 1984 (n=21). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 13. Age distribution: Moorrees et al. dental formation 1963 (n=92). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 14. Age distribution: Mincer et al. dental formation 1993 (n=105). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
Figure 15. Age distribution: Mann et al. maxillary sutures 1991 (n=55). [Histogram; x-axis: Age-at-Death, y-axis: Frequency]
In order to quantitatively examine the age distributions, summary statistics for
the overall known age-at-death sample and each method sub-sample were calculated
(Table 2). Each method sub-sample is generally similar to the larger sample, with mean
and median ages-at-death in the mid-20s. The Todd method is the only sample with a
mean known age-at-death in the 30s, but it is similar in standard deviation and variance to
several other methods. There are no individuals younger than 17 years old or older than
59 years old in the JPAC/CIL identified known age-at-death sample (n=979).
Table 2. Descriptive statistics by method.
Sample                                  N    Mean (x̄)  Median  Min  Max  Range  Standard Deviation  Variance
All (identified, known age-at-death)    979  27.24     26      17   59   42     6.59                43.39
McKern-Stewart EPIP                     161  24.42     23      17   46   29     5.00                24.97
Albert-Maples VERT                      25   23.60     23      19   42   23     4.82                23.25
Webb-Suchey CLAV                        33   25.21     24      19   46   27     5.15                26.55
Mann et al. MSUT                        55   23.88     23      18   36   18     3.92                15.38
Moorrees et al. DEN                     105  23.80     23      17   42   25     4.26                18.16
Mincer et al. DEN                       92   23.92     23.5    17   37   20     4.00                16.01
McKern-Stewart PS                       79   26.34     25      18   53   35     5.81                33.79
Suchey-Brooks PS                        93   29.14     27      19   54   35     7.48                56.21
Todd PS                                 10   31.10     31      24   41   17     6.51                42.32
Lovejoy et al. AS                       147  26.97     25      19   53   34     5.89                34.66
Osborne et al. AS                       151  26.82     25      19   53   34     5.89                34.65
Buckberry-Chamberlain AS                10   25.40     25      19   37   18     5.19                26.93
Iscan et al. RIB                        21   24.95     23      18   35   17     4.54                20.65
A one-way ANOVA revealed a statistically significant difference in mean
age-at-death among all samples, including the overall known age-at-death sample
(p<0.001). To better understand which samples differed from one another, ANOVA with
the Bonferroni correction for multiple comparisons was run; p-values are given in Table
3. The McKern and Stewart (1957) epiphyseal fusion methods, the maxillary suture
closure method, and both dental formation methods had mean known ages-at-death that
were significantly lower than the identified known age-at-death sample; these differences
are significant at α=0.05. No other method differed significantly in mean age-at-death
from the total sample.
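The Bonferroni correction referred to above can be sketched as follows. This is a minimal illustration, assuming the adjustment multiplies each unadjusted pairwise p-value by the number of pairwise comparisons and caps the result at 1; that assumption would account for the many entries of 1.000 in Table 3, but the software and exact procedure are not stated here. The function name and the unadjusted p-values are hypothetical.

```python
from itertools import combinations

def bonferroni_adjust(raw_p, n_comparisons):
    """Multiply an unadjusted p-value by the number of comparisons, capped at 1.0."""
    return min(1.0, raw_p * n_comparisons)

# The 14 samples compared in Table 3 (overall sample plus 13 method sub-samples).
samples = ["ALL", "MC-S EPI", "A-M", "W-S", "MANN", "MOO", "MIN",
           "MC-S PS", "S-B", "TODD", "LOVE", "OSB", "B-C", "ISC"]

# Number of distinct pairwise comparisons among the samples.
n_pairs = len(list(combinations(samples, 2)))

# Hypothetical unadjusted p-values, for illustration only.
adjusted_small = bonferroni_adjust(0.0004, n_pairs)  # remains below 0.05
adjusted_large = bonferroni_adjust(0.03, n_pairs)    # capped at 1.0
```

Under this assumption, a pairwise difference needs an unadjusted p-value below roughly 0.05 divided by the number of comparisons to remain significant after correction.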
Statistically significant differences in mean age-at-death occurred between
two groups: 1. epiphyseal fusion, maxillary suture, and dental formation methods, and 2.
pubic symphysis and auricular surface methods. The sternal rib end method, the Webb-
Suchey clavicle method, and the Buckberry-Chamberlain auricular surface method were
not statistically different in mean age-at-death from all other methods. In general, the first
group represents samples with lower mean ages-at-death, while the second group has
samples with higher mean ages-at-death. This is logical when considering that all
methods in the first group (with the exception of maxillary suture closure) are generally
used to provide age estimates for late adolescents/young adults since they concern late
development, while the methods in the second group are used for adults who have
completed all development.
Table 3. P-values from one-way ANOVA with Bonferroni correction: all methods.
Method     ALL      MC-S EPI A-M VERT W-S CLV  MANN     MOO      MIN      MC-S PS  S-B      TODD     LOVE     OSB      B-C      ISC
ALL        N/A      .000*    .268     1.000    .003*    .000*    .000*    1.000    .348     1.000    1.000    1.000    1.000    1.000
MC-S EPIP  .000*    N/A      1.000    1.000    1.000    1.000    1.000    1.000    .000*    .064     .021*    .042*    1.000    1.000
A-M        .268     1.000    N/A      1.000    1.000    1.000    1.000    1.000    .004*    .084     .916     1.000    1.000    1.000
W-S        1.000    1.000    1.000    N/A      1.000    1.000    1.000    1.000    .123     .635     1.000    1.000    1.000    1.000
MANN       .003*    1.000    1.000    1.000    N/A      1.000    1.000    1.000    .000*    .044*    .085     .141     1.000    1.000
MOO        .000*    1.000    1.000    1.000    1.000    N/A      1.000    .434     .000*    .024*    .004*    .008*    1.000    1.000
MIN        .000*    1.000    1.000    1.000    1.000    1.000    N/A      .831     .000*    .033*    .014*    .027*    1.000    1.000
MC-S PS    1.000    1.000    1.000    1.000    1.000    .434     .831     N/A      .227     1.000    1.000    1.000    1.000    1.000
S-B        .348     .000*    .004*    .123     .000*    .000*    .000*    .227     N/A      1.000    .606     .330     1.000    .378
TODD       1.000    .064     .084     .635     .044*    .024*    .033*    1.000    1.000    N/A      1.000    1.000    1.000    .740
LOVE       1.000    .021*    .916     1.000    .085     .004*    .014*    1.000    .606     1.000    N/A      1.000    1.000    1.000
OSB        1.000    .042*    1.000    1.000    .141     .008*    .027*    1.000    .330     1.000    1.000    N/A      1.000    1.000
B-C        1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    N/A      1.000
ISC        1.000    1.000    1.000    1.000    1.000    1.000    1.000    1.000    .378     .740     1.000    1.000    1.000    N/A
*p≤0.05
Method-to-Method Comparison
All methods were compared to one another with the exception of the
epiphyseal fusion methods. These are discussed separately below. Initially, the dental
formation methods were also compared to all other methods, but high error values
initiated a reanalysis of Moorrees et al. and Mincer et al. separately. The inclusion of
terminal stages in both epiphyseal fusion and dental formation makes these methods
harder to compare with methods having continuous variable categories.
Correct and Incorrect Classifications
The numbers of correct and incorrect classifications by method (excluding
epiphyseal fusion) are given in Table 4.
Table 4. Correct and incorrect classifications by method (excluding epiphyseal fusion).
Method                      N    # Correct  % Correct  # Incorrect  % Incorrect
McKern-Stewart PS           79   65         82.3       14           17.7
Suchey-Brooks PS            93   91         97.9       2            2.2
Todd PS                     10   7          70.0       3            30.0
Lovejoy et al. AS           147  95         64.6       52           35.4
Osborne et al. AS           151  142        94.0       9            6.0
Buckberry-Chamberlain AS    10   10         100.0      0            0.0
Iscan et al. RIB            21   15         71.4       6            28.6
Mann et al. MSUT            62   55         88.7       7            11.3
Mincer et al. DEN           160  153        95.6       7            4.4
Moorrees et al. DEN         235  209        88.9       26           11.1
The methods with correct classifications above 90% were Suchey-Brooks, Osborne et al.,
Buckberry-Chamberlain, and Mincer et al. Methods with correct classifications between
80% and 90% were McKern-Stewart pubic symphysis, Mann et al., and Moorrees et al.
Finally, methods with correct classifications below 80% were Todd, Lovejoy et al., and
Iscan et al., with a particularly low correct classification rate of 64.6% for Lovejoy et al.
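A classification tally of this kind can be sketched in Python. The sketch assumes that an estimate counts as correct when the known age-at-death falls within the reported age interval, with both bounds inclusive; that definition, the function names, and the example cases are illustrative assumptions rather than the procedure documented for the CIL data.

```python
def is_correct(known_age, lower=None, upper=None):
    """True when the known age lies inside the reported interval (bounds inclusive).

    A bound of None means the interval is open on that side, e.g. "20+".
    """
    if lower is not None and known_age < lower:
        return False
    if upper is not None and known_age > upper:
        return False
    return True

def percent_correct(cases):
    """Return the number of correct classifications and the percentage (one decimal)."""
    n_correct = sum(is_correct(*case) for case in cases)
    return n_correct, round(100 * n_correct / len(cases), 1)

# Hypothetical cases: (known age, lower bound, upper bound).
cases = [(23, 20, 25), (31, 20, 25), (27, 20, None), (19, None, 25), (26, None, 25)]
n_correct, pct = percent_correct(cases)

# The same rounding reproduces a Table 4 entry: 65 of 79 correct.
mckern_stewart_ps_pct = round(100 * 65 / 79, 1)
```

Open-ended intervals are satisfied by any age beyond the stated bound, which is one reason methods with terminal or minimum-age categories can show high percentages of correct classification.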
Error
Overall bias, inaccuracy, and the scaled error index (SEI) for each non-epiphyseal
union method are given in Table 5.
Table 5. Error by method (excluding epiphyseal fusion).
Method                      N    Bias (Σ)  Inaccuracy (Σ)  SEI (x̄)
McKern-Stewart PS           73   -1.07     2.68            9.49
Suchey-Brooks PS            86   0.76      3.72            13.27
Todd PS                     10   0.10      1.80            6.13
Lovejoy et al. AS           147  1.89      3.16            12.40
Osborne et al. AS           113  -0.59     3.93            14.25
Buckberry-Chamberlain AS    9    6.51      6.88            25.91
Iscan et al. RIB            14   -0.50     1.76            7.21
Mann et al. MSUT            27   0.09      2.25            9.35
Mincer et al. DEN           27   -2.02     2.17            10.40
Moorrees et al. DEN         46   -3.47     3.47            16.94
Bias is the average error in years, taking into consideration directionality (i.e., over- or
underaging) (Meindl and Lovejoy 1989). It is calculated with the following equation:
bias = Σ(estimated age - actual age)/N. Inaccuracy is the average error in years regardless
of directionality (Meindl and Lovejoy 1989). It is calculated with the following equation:
inaccuracy = Σ|estimated age - actual age|/N. The SEI was developed for this study and
compares error between estimated and actual age regardless of scale. It can be used to
compare methods, phases of methods, and observers. The SEI is calculated with the
following equation: SEI = [|estimated age - actual age| / actual age] * 100. The mean is
then computed for each group being compared.
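The bias, inaccuracy, and SEI equations can be expressed directly in Python. This is a minimal sketch; the function names and the example ages are hypothetical and are not drawn from the JPAC/CIL data.

```python
def bias(estimated, actual):
    """Mean signed error in years: positive values overage, negative values underage."""
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)

def inaccuracy(estimated, actual):
    """Mean absolute error in years, regardless of direction."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)

def mean_sei(estimated, actual):
    """Mean scaled error index: absolute error as a percentage of actual age."""
    return sum(abs(e - a) / a * 100 for e, a in zip(estimated, actual)) / len(actual)

# Hypothetical interval midpoints and known ages-at-death.
estimated = [27.5, 22.5, 32.5, 17.5]
actual = [24, 23, 29, 20]

example_bias = bias(estimated, actual)              # signed errors: 3.5, -0.5, 3.5, -2.5
example_inaccuracy = inaccuracy(estimated, actual)  # absolute errors: 3.5, 0.5, 3.5, 2.5
example_sei = mean_sei(estimated, actual)
```

Because each error is divided by the actual age, the same 3.5-year error contributes more to the SEI for a 24-year-old than for a 29-year-old, which is what allows comparison regardless of scale.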
The dental formation methods (Mincer et al.; Moorrees et al.) originally had
the highest bias and inaccuracy of all methods. Error decreased once the terminal stages
were eliminated from analyses. The calculations given in Table 5 exclude terminal
stages. The Buckberry-Chamberlain method had the highest bias, inaccuracy, and SEI of
all methods, though this may be affected by the small sample size for this method. The
Todd method had very low bias, inaccuracy, and SEI, which may also be affected by
small sample size. Excluding those methods with sample sizes smaller than 20
individuals, the Mann et al. maxillary suture method had the lowest overall error. Table 5
gives the error associated with each method and Figures 16 through 18 are graphical
representations of overall method error by error type (i.e., bias, inaccuracy, SEI).
Based on bias,1 the McKern-Stewart pubic symphysis, Osborne et al., Iscan et
al., Mincer et al., and Moorrees et al. methods generally underaged, while the Suchey-
Brooks, Todd, Lovejoy et al., Buckberry-Chamberlain, and Mann et al. methods
overaged. There was no trend in bias based on type of method (e.g., not all pubic
symphyseal methods underage). Examining those methods with large enough
sample sizes (n≥20), average error in years did not exceed four years for any method. The
Suchey-Brooks and Osborne et al. methods had higher inaccuracy compared to other
methods, which is related to the large confidence intervals provided for phases of these
methods. A high SEI can also be a function of large confidence intervals, as witnessed by
the SEI for the Buckberry-Chamberlain and Suchey-Brooks methods, or a high
percentage of incorrect classifications, such as the Lovejoy et al. method.
1 (-) bias indicates underaging, (+) bias indicates overaging.
Figure 16. Sum of bias by method (in years). [Bar chart; one bar per method]
Figure 17. Sum of inaccuracy by method (in years). [Bar chart; one bar per method]
Figure 18. Mean SEI by method. [Bar chart; one bar per method]
Table 6 shows results from the calculation of Pearson’s r, where X is the
known age-at-death and Y is the midpoint of the estimated age interval.
Table 6. Comparison of Pearson’s r and r² by method (excluding dental methods).
Method                      N    r     r²    Standard Error (in years)  P-Value (ANOVA)
McKern-Stewart PS           73   0.79  0.63  3.04                       0.000*
Suchey-Brooks PS            86   0.80  0.63  4.55                       0.000*
Todd PS                     10   0.95  0.90  2.38                       0.000*
Lovejoy et al. AS           147  0.78  0.61  3.51                       0.000*
Osborne et al. AS           113  0.74  0.55  5.33                       0.000*
Buckberry-Chamberlain AS    9    0.92  0.85  4.67                       0.000*
Iscan et al. RIB            14   0.71  0.50  2.51                       0.004*
Mann et al. MSUT            27   0.79  0.63  2.96                       0.000*
*p≤0.05
The Todd and Buckberry-Chamberlain methods show high correlations, although all methods
listed here have r-values greater than 0.70. The coefficient of determination (r²) shows
how much of the variation in predicted age can be explained by the known age-at-death.
The standard error is the average number of years one can expect to be off when using a
given age estimation method. No method has a standard error greater than six years, with
the mean for all eight methods equal to 3.62 years. All age estimation methods in Table 6
have a significant linear relationship between known age-at-death and estimated age
(ANOVA, p≤0.05).
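The quantities reported in Table 6 can be sketched in Python as follows. The sketch assumes that the standard error is the standard error of the estimate from a least-squares regression of estimated age (Y) on known age-at-death (X); the function name and the example ages are hypothetical.

```python
from math import sqrt

def pearson_summary(x, y):
    """Return Pearson's r, r squared, and the standard error of the estimate of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    # Residual sum of squares around the least-squares line; guarded against
    # tiny negative values caused by floating-point rounding.
    sse = max(syy - sxy ** 2 / sxx, 0.0)
    see = sqrt(sse / (n - 2))
    return r, r ** 2, see

# Hypothetical known ages-at-death and estimated interval midpoints.
known = [20, 22, 25, 28, 30, 35]
estimated = [22.5, 22.5, 27.5, 27.5, 32.5, 32.5]

r, r_squared, standard_error = pearson_summary(known, estimated)
```

Because estimated ages are interval midpoints, several individuals share the same Y value, so even a well-performing method cannot reach a correlation of exactly 1.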
Method by Method
Epiphyseal Fusion
Table 7 gives the correct and incorrect classification rates for all epiphyseal
fusion methods used at the CIL, with the exception of Scheuer-Black. The epiphyses
listed here all fully fuse generally by late adolescence, with the exception of the medial
clavicle, vertebral centra, iliac crest, and the first two sacral segments. Results for
methods using the long bone and later-fusing epiphyses are given in separate sections
below.
Long Bone Epiphyses
All long bone epiphyses had correct classification rates above 95% and were
scored using McKern and Stewart (1957). There was no clear trend between correct
classification using early- versus late-fusing epiphyses. Early-fusing epiphyses (Group I)
are the distal humerus, medial epicondyle of the humerus, proximal radius, proximal
ulna, femoral head, greater and lesser trochanters of the femur, distal tibia, and distal
fibula (McKern and Stewart 1957). Late-fusing epiphyses (Group II) are the proximal
humerus, distal radius, distal ulna, distal femur, proximal tibia, and proximal fibula
(McKern and Stewart 1957). In the JPAC/CIL sample, the long bone epiphyses that
showed less than 100% correct classification were the distal radius, proximal femur,
distal femur, proximal tibia, distal tibia, and distal fibula. However, incorrect
classifications never occurred for more than two individuals out of each element’s
sample.
Table 7. Correct and incorrect classifications for epiphyseal fusion methods.
Method          Epiphysis           N   # Correct  % Correct  # Incorrect  % Incorrect
Albert-Maples   Vertebral Centra    24  23         95.8       1            4.2
Webb-Suchey     Medial Clavicle     33  32         97.0       1            3.0
                Iliac Crest         6   6          100.0      0            0.0
McKern-Stewart  Proximal Humerus    80  80         100.0      0            0.0
                Distal Humerus      63  63         100.0      0            0.0
                Medial Epicondyle   57  57         100.0      0            0.0
                Proximal Radius     50  50         100.0      0            0.0
                Distal Radius       51  50         98.0       1            2.0
                Proximal Ulna       56  56         100.0      0            0.0
                Distal Ulna         37  37         100.0      0            0.0
                Proximal Femur      85  83         97.7       2            2.4
                Greater Trochanter  70  70         100.0      0            0.0
                Lesser Trochanter   65  65         100.0      0            0.0
                Distal Femur        79  77         97.5       2            2.5
                Proximal Tibia      72  70         97.2       2            2.8
                Distal Tibia        65  64         98.5       1            1.5
                Proximal Fibula     36  36         100.0      0            0.0
                Distal Fibula       49  48         98.0       1            2.0
                Clavicle            72  64         88.9       8            11.1
                Iliac Crest         32  32         100.0      0            0.0
                S1-S2               28  9          32.1       19           67.9
                Vertebrae           18  16         88.9       2            11.1
Age distributions for long bone epiphyses (in %) can be found in Appendix B
(Tables B.1-B.13). For the following epiphyses, all from Group I, only stage four was
observed in the JPAC/CIL identified sample: medial epicondyle of the humerus, proximal
radius, proximal ulna, and lesser trochanter of the femur. In general, Group II epiphyses
had slightly larger age distributions for stages zero through three, but still retained a
majority of individuals in stage four.
Other Epiphyses
Epiphyseal fusion of the vertebral centra was scored using both the Albert and
Maples (1995) and McKern and Stewart (1957) methods. The Albert-Maples method had
a higher correct classification rate than the McKern-Stewart method (95.8% versus
88.9%). Both had similar sample sizes. Figures 19 and 20 depict the distributions of
known ages-at-death superimposed over the stage intervals given by the reference
method. The youngest age of complete union of vertebral centra in the Albert-Maples
sample was 23, while in the McKern-Stewart sample it was 19. Fusion was still occurring
in individuals up to the age of 25 in the Albert-Maples method sample, and 24 in the
McKern-Stewart method sample.
Epiphyseal fusion of the sternal (medial) end of the clavicle was scored using
both the Webb and Suchey (1985) and McKern and Stewart (1957) methods. The Webb-
Suchey method had a higher correct classification rate than the McKern-Stewart method
(97.0% versus 88.9%). Figures 21 and 22 depict the distribution of known ages-at-death
superimposed over the stage intervals given by the reference method. The youngest
individual to exhibit complete fusion in the Webb-Suchey method sample was 25 at the
time of death, contrasted with 21 for the McKern-Stewart method sample. Active union
was seen up until the age of 28 in both samples.
Figure 19. Comparison of known ages of identified males superimposed over the summary stage observations for the three stages of vertebral centra fusion as given in the Albert and Maples (1995) method.
Epiphyseal fusion of the iliac crest was scored using the Webb and Suchey
(1985) and McKern and Stewart (1957) methods. The overall sample size for Webb-
Suchey was only six individuals, all of whom were correctly classified. While the sample
for the McKern-Stewart method was larger, it too had a 100% correct classification. The
age distributions for stages of iliac crest union using the McKern-Stewart method in the
JPAC/CIL sample are given in Table 8. The youngest individual to show complete union
was 19 and active union was observed until the age of 22. An unfused iliac crest
epiphysis was observed in an individual 20 years of age.
Figure 20. Comparison of known ages of identified males superimposed over the summary stage observations for the four stages of vertebral centra fusion as given in the McKern and Stewart (1957) method.
Fusion of the first and second sacral segments was scored only with
McKern and Stewart (1957). Other sacral segments were not recorded. This aging
technique had the lowest percentage of correct classification in the entire JPAC/CIL
sample at 32.1%. Figure 23 shows that the problem clearly lies in the intervals given for
the first three stages; very few of the individuals in the JPAC/CIL sample actually fall
into the age intervals given by the reference method. Stages three and four do not exhibit
the same problem; however, these stages also have very large ranges. A distinct fusion
pattern related to age is not present, with an absence of fusion seen in an individual who
was 30 years old at the time of death and complete fusion in two individuals who were
29. These results suggest that fusion of sacral segments may be of limited use in age
estimation.
Figure 21. Comparison of known ages of identified males superimposed over the age intervals for the four stages of epiphyseal fusion of the medial clavicle as given in the Webb and Suchey (1985) method.
Suture Closure
Reporting for the Mann et al. maxillary suture method did not appear to
follow a standardized format, which could be due to confusion in applying the method.
This possibility will be discussed further in Chapter VII. The results were difficult to
compare between individuals since the same age class was rarely reported and in many
instances only a minimum age was given (e.g., 20+). Table 9 shows the correct and
incorrect classification of individuals in this sample using the age intervals as they were
reported by the analysts, with an overall correct classification rate of 88.7%. This table
includes age estimations based on both the 1987 and 1991 methods. The highest number
of incorrect classifications occurred in individuals under the age of 20. It is important to
note that there are far fewer individuals in the upper age categories of this method, i.e.,
individuals older than 25. Therefore, the higher percentage of correct
classification of individuals over the age of 25 may be an artifact of sample size. Of those
individuals who were incorrectly classified, four were assigned to age categories greater
than their actual age-at-death and three were assigned to categories less than their actual
age-at-death.
Figure 22. Comparison of known ages of identified males superimposed over the age intervals for the five stages of epiphyseal fusion of the medial clavicle as given in the McKern and Stewart (1957) method.
Table 8. Age distribution of stages of iliac crest union (in %): McKern-Stewart (1957).
Age    N   0    1    2    3    4
18     2   -    -    100  -    -
19     2   -    -    -    50   50
20     5   20   20   40   20   -
21     4   -    -    25   -    75
22     3   -    -    33   -    67
23     1   -    -    -    -    100
24+    13  -    -    -    -    100
Total  30
Figure 23. Comparison of known ages of identified males superimposed over the age intervals for the five stages of epiphyseal fusion of the first two sacral segments as given in the McKern and Stewart (1957) method.
Table 9. Correct and incorrect classifications by age interval of the Mann et al. maxillary suture method (1987, 1991).
Age Interval  N   # Correct  % Correct  # Incorrect  % Incorrect
<20           4   3          75.0       1            25.0
<21           1   1          100.0      0            0.0
<25           7   7          100.0      0            0.0
<30           2   2          100.0      0            0.0
20+           15  14         93.3       1            6.7
22+           1   1          100.0      0            0.0
25+           4   3          75.0       1            25.0
15-20         4   2          50.0       2            50.0
20-25         11  9          81.8       2            18.2
22-26         1   1          100.0      0            0.0
20-30         3   3          100.0      0            0.0
20-35         1   1          100.0      0            0.0
25-30         3   3          100.0      0            0.0
25-35         1   1          100.0      0            0.0
25-40         1   1          100.0      0            0.0
26-33         1   1          100.0      0            0.0
20-50         2   2          100.0      0            0.0
Total         62  55         88.7       7            11.3
Bias, inaccuracy, and SEI were calculated using the midpoint of the predicted
age interval. Age intervals were taken from Figure 2 in Mann et al. (1991:783). These
intervals represent the general pattern of suture obliteration in an adult; ages for earliest
obliteration are also provided in the original publication. Only 27 individuals could be
used for this portion of the analysis because all estimates without closed intervals had to
be eliminated due to the lack of a midpoint. Table 10 gives bias, inaccuracy, and SEI
values for intervals as they were reported by the analysts. Overall, the method had a
slight tendency to overage (bias=0.09) and had an average inaccuracy of 2.25 years. Age
estimates under the age of 20 had a tendency to underage, while those over the age of 20
overaged, with the exception of the 25-30 interval. A histogram showing the distribution
of bias for this method is given in Figure 24. The normal curve has been superimposed
over this distribution and demonstrates that error is relatively normally distributed even
with the small sample size.
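The exclusion of open-ended estimates follows from the fact that only a closed interval has a midpoint. A minimal Python sketch of that step is given below; the function name and the parsing of reported intervals as text (e.g., "20-25", "20+", "<25") are assumptions made for illustration.

```python
def interval_midpoint(reported):
    """Return the midpoint of a closed interval such as "20-25".

    Open-ended reports such as "20+" or "<25" have no midpoint and return None.
    """
    text = reported.strip()
    if text.startswith("<") or text.endswith("+"):
        return None
    lower, upper = (float(part) for part in text.split("-"))
    return (lower + upper) / 2

# Hypothetical reported age estimates.
reported = ["20-25", "20+", "<25", "25-40", "15-20"]
midpoints = [interval_midpoint(r) for r in reported]

# Only estimates with a midpoint can enter the bias, inaccuracy, and SEI calculations.
usable = [m for m in midpoints if m is not None]
```

In this example two of the five reports are dropped, mirroring the reduction in sample size that occurs when only closed intervals can be analyzed.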
No graph of actual ages superimposed over intervals was produced for this
method due to the variation in reporting of age estimates. Figure 25 shows the correlation
between estimated and known age-at-death for individuals in the Mann et al. sample
(n=27). The estimated age values are the midpoints of the predicted age interval. There is
a significant relationship (p<0.001) between estimated and known age-at-death (r=0.79,
see Table 6).
Table 10. Error values for Mann et al. (1991) by reported interval.
Age Interval  N   Bias (Σ)  Inaccuracy (Σ)  SEI (x̄)
15-20         4   -3.00     3.00            14.38
20-25         11  -0.09     1.55            6.94
20-30         3   0.67      1.33            5.83
20-35         1   3.50      3.50            14.58
20-50         2   3.25      3.75            13.19
22-26         1   0.00      0.00            0.00
25-30         3   -0.83     3.50            12.73
25-40         1   3.50      3.50            12.07
26-33         1   2.50      2.50            10.00
ALL           27  0.09      2.25            9.35
Figure 24. Distribution of bias for the Mann et al. (1991) maxillary suture method.
Figure 25. Correlation of estimated and known ages-at-death for the Mann et al. (1991) maxillary suture method (n=27). [Scatter plot; x-axis: Known Age-at-Death, y-axis: Estimated Age; R² = 0.6308]
Third Molar Formation
Third molar formation was reported using the Moorrees et al. (1963) and
Mincer et al. (1993) methods. When compared to other age estimation methods employed
at the JPAC/CIL, both dental formation methods exhibited higher than average bias,
inaccuracy, and SEI, while maintaining average to above-average correct classifications
(Tables 4 and 5). Both methods include terminal stages (Apices complete (Ac) and Stage
H, respectively), which give a minimum time for formation but not an upper age
boundary. As with epiphyseal fusion methods, an individual with closed root apices
could be 25 or 85, which limits the usefulness of these methods for age estimation of
individuals beyond late adolescence.
Tables 11 and 12 display calculations for both methods, differentiating
between results that include the terminal stage and results where the terminal stage has
been removed from analyses.
Table 11. Correct and incorrect classifications of dental formation methods.
Method                          N    # Correct  % Correct  # Incorrect  % Incorrect
Moorrees et al. with “Ac”       235  209        88.9       26           11.1
Moorrees et al. excluding “Ac”  54   28         51.9       26           48.2
Mincer et al. with “H”          160  153        95.6       7            4.4
Mincer et al. excluding “H”     27   20         74.1       7            25.9
Table 12. Error of dental formation methods.
Method                          N    Bias (Σ)  Inaccuracy (Σ)  SEI (x̄)
Moorrees et al. with “Ac”       227  -19.56    20.01           19.35
Moorrees et al. excluding “Ac”  46   -3.47     3.47            16.94
Mincer et al. with “H”          160  -15.57    16.46           15.74
Mincer et al. excluding “H”     27   -2.02     2.17            10.40
While the number of incorrect classifications did not
change for either method when the terminal stages were removed from analyses, there was
a dramatic shift in the percentage of correct classifications. Removing individuals
classified as “Ac” or Stage H reduced the number and percentage of correct
classifications. For the Moorrees et al. method, stages of partial root development and
apex closure had a low correct classification rate for all teeth and roots (see Tables C.1
through C.4, Appendix C). The Mincer et al. method had higher rates of correct
classification overall (see Tables D.1 through D.4, Appendix D).
Error decreased with the elimination of the terminal stages (Table 12). Both
methods continued to underage, but by a much smaller average number of years.
Inaccuracy was reduced from 20.01 years to 3.47 years for Moorrees et al. and 16.46
years to 2.17 years for Mincer et al. The SEI does not show as large a reduction, but
still decreases compared to the results including terminal stages. It is clear from these
results that the inclusion of terminal stages in the analyses of dental formation methods
does not accurately represent overall method performance.
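The shift in correct classification rates for the Moorrees et al. method can be reproduced from the counts in Table 11 with simple arithmetic, sketched here in Python. The variable names are illustrative; the counts are those reported in the table.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts from Table 11 for the Moorrees et al. method.
n_all, correct_all, incorrect_all = 235, 209, 26
n_nonterminal, correct_nonterminal, incorrect_nonterminal = 54, 28, 26

# Observations scored as the terminal stage ("Ac") and how many of them were correct.
terminal_n = n_all - n_nonterminal
terminal_correct = correct_all - correct_nonterminal

pct_with_terminal = percent(correct_all, n_all)
pct_without_terminal = percent(correct_nonterminal, n_nonterminal)
```

The subtraction shows that every terminal-stage observation counted as a correct classification, so removing them leaves the 26 incorrect classifications unchanged while the percentage of correct classifications drops sharply.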
Non-terminal stage sample sizes were very small; therefore, distributions of
known age-at-death compared to estimated age and distributions of bias will not be
reported, as they are uninformative. Age distributions by stage of root formation
excluding terminal stages are given for each method by combining data for all teeth and
roots for that method (Tables 13 and 14). The minimum age of complete apex closure
observed in both the Moorrees et al. and Mincer et al. samples was 18 years old.
Table 13. Age distribution of stages of dental root formation (in %): Moorrees et al.
Age    N   R1/2  R3/4  Rc   A1/2
17     4   100   -     -    -
18     8   13    50    38   -
19     7   29    57    -    14
20     12  17    -     33   50
21     11  9     73    9    9
22     0   -     -     -    -
23+    4   -     -     50   50
Total  46
Table 14. Age distribution of stages of dental root formation (in %): Mincer et al.
Age    N   D    E    F    G
17     2   -    -    100  -
18     3   -    -    33   67
19     6   17   50   -    33
20     10  -    -    30   70
21+    6   -    -    -    100
Total  27
Since non-terminal stage sample sizes for both methods were too small for
more detailed statistical analyses, comparisons of bias, inaccuracy, and SEI within each
method were conducted including terminal stages. For the Moorrees et al. method, there
was a significant difference in bias (Student’s t-test, p=0.002) and inaccuracy (Student’s
t-test, p=0.002) between the mesial and distal roots, with the mesial roots having higher
bias and inaccuracy than the distal roots. The difference in bias (Student’s t-test,
p=0.236) and inaccuracy (Student’s t-test, p=0.226) between teeth 17 and 32 was not
significant. There was no significant difference in average SEI between teeth or roots
(ANOVA, p=0.263). For the Mincer et al. method, there were no significant differences
in bias (ANOVA, p=0.536), inaccuracy (ANOVA, p=0.568), or SEI (ANOVA,
p=0.837) among the four third molars.
Pubic Symphysis
Three methods were used to report age-related pubic symphyseal changes:
Todd (1920, 1921), McKern and Stewart (1957), and Suchey-Brooks. Of these, the
Suchey-Brooks method had the highest correct classification rate (97.9%, Table 4). The
Todd method had the lowest bias (0.10), inaccuracy (1.80), and SEI (6.13) of the pubic
symphyseal methods (see Table 5), but also the smallest sample size (n=10). All three
methods have significant linear relationships between known ages-at-death and estimated
ages (Table 6). Results for each method are presented below.
When employing the Todd pubic symphysis age estimation method, CIL
analysts used combined phases 50% of the time. This means that rather than giving a
single phase, five individuals were given age estimates based on a dual-phase
classification, such as three and four. Table 15 shows a breakdown of the Todd method
sample results. Those individuals who were incorrectly classified (n=3, indicated with an
asterisk in Table 15) fell just outside of the given age range. These three individuals also
had the highest bias, inaccuracy, and SEI values in the sample, with the exception of the
individual classified as phases eight and nine. Bias, inaccuracy, and SEI were calculated
using the midpoint of the range. Phases three, four, six, and seven underaged, while five,
eight, and nine overaged. A distribution of bias is given in Figure 26. The distribution
does not conform to the superimposed normal curve, most likely because of the small
sample size. Additionally, because of the overall small sample size, ANOVA examining
possible differences in bias, inaccuracy, and SEI by phase was not conducted. Figure 27
displays the correlation between known age-at-death and estimated age for this method
(r=0.95, see Table 6). This correlation is statistically significant (ANOVA, p=0.000).
Table 15. Todd (1920) pubic symphysis method sample (n=10).
Age   Phase   Range   Midpoint   Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
24    3       22-24   23         -0.1       0.1              4.17
24    3       22-24   23         -0.1       0.1              4.17
27*   3-4     22-26   24         -0.3       0.3              11.11
25*   5       27-30   28.5       0.35       0.35             14.00
26*   5       27-30   28.5       0.25       0.25             9.62
35    6-7     30-39   34.5       -0.05      0.05             1.43
35    6-7     30-39   34.5       -0.05      0.05             1.43
36    6-7     30-39   34.5       -0.15      0.15             4.17
38    7       35-39   37         -0.1       0.1              2.63
41    8-9     39-50   44.5       0.35       0.35             8.54
ALL   -       -       -          0.10       1.80             6.13
*incorrect classification
Figure 26. Distribution of bias for the Todd (1920) pubic symphysis method.
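The summary values in Table 15 can be reproduced from the listed ages and ranges. The following is a minimal sketch, not the procedure used in the original analysis. It assumes that bias is the mean signed error (estimate minus known age), that inaccuracy is the mean absolute error, and that SEI is the absolute error expressed as a percentage of known age; the SEI definition in particular is inferred from the tabled values. The names TODD_SAMPLE and summarize are illustrative only.

```python
from math import sqrt

# (known age, lower bound, upper bound) for the ten individuals in Table 15.
TODD_SAMPLE = [
    (24, 22, 24), (24, 22, 24), (27, 22, 26), (25, 27, 30), (26, 27, 30),
    (35, 30, 39), (35, 30, 39), (36, 30, 39), (38, 35, 39), (41, 39, 50),
]


def summarize(sample):
    """Return error statistics using the midpoint of each range as the estimate."""
    n = len(sample)
    known = [age for age, _, _ in sample]
    estimated = [(low + high) / 2 for _, low, high in sample]
    errors = [est - age for est, age in zip(estimated, known)]

    bias = sum(errors) / n                         # mean signed error
    inaccuracy = sum(abs(e) for e in errors) / n   # mean absolute error
    # Assumption: SEI is the absolute error as a percentage of known age.
    sei = sum(abs(e) / age * 100 for e, age in zip(errors, known)) / n
    # An individual is correctly classified when the known age lies in the range.
    correct = sum(low <= age <= high for age, low, high in sample)

    # Pearson correlation between known and estimated age.
    mean_known = sum(known) / n
    mean_est = sum(estimated) / n
    sxy = sum((k - mean_known) * (e - mean_est) for k, e in zip(known, estimated))
    sxx = sum((k - mean_known) ** 2 for k in known)
    syy = sum((e - mean_est) ** 2 for e in estimated)
    r = sxy / sqrt(sxx * syy)

    return {"bias": bias, "inaccuracy": inaccuracy, "sei": sei,
            "correct": correct, "r": r}


todd = summarize(TODD_SAMPLE)
for name, value in todd.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the sketch returns the bias, inaccuracy, SEI, and correlation reported for the Todd method, with seven of ten individuals falling inside their assigned ranges.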
JPAC/CIL SOP 3.4 stipulates that the McKern-Stewart pubic symphysis
method should be used for American males who died before 1960 and the Suchey-Brooks
pubic symphysis method for those who died after 1960 (JPAC/CIL 2008). These two
methods have much larger sample sizes than the Todd method, most likely because the
Todd method is not an SOP-stipulated method for age estimation at the CIL. Each of the
two approved methods will be discussed independently below and then compared to one
another.
Figure 27. Correlation of estimated and known age-at-death for the Todd (1920) pubic symphysis method (n=10). [Scatterplot; x-axis: Known Age-at-Death; y-axis: Estimated Age; R² = 0.8982.]
The McKern-Stewart pubic symphysis method has an overall correct
classification rate of 82.2% (Table 16). This number differs slightly from the figure in
Table 4 because six individuals with composite scores not falling into the original
reference categories were eliminated. It appears that most scores had high correct
classification rates, but that composite scores 1-2, 4-5, and 14 performed poorly. This
trend is also reflected in Figure 28, where the known ages of identified males have been
superimposed over age intervals for each composite score.
Table 16. Correct and incorrect classification: McKern-Stewart (1957) pubic symphysis method.
Total Score   N    # Correct   % Correct   # Incorrect   % Incorrect
0             0    -           -           -             -
1-2           5    3           60.0        2             40.0
3             3    3           100.0       0             0.0
4-5           9    5           55.6        4             44.4
6-7           13   12          92.3        1             7.7
8-9           11   9           81.8        2             18.2
10            6    5           83.3        1             16.7
11-12-13      22   20          90.9        2             9.1
14            1    0           0.0         1             100.0
15            3    3           100.0       0             0.0
Total         73   60          82.2        13            17.8
Figure 28. Comparison of known ages of identified males superimposed over the age intervals for the composite scores of pubic symphysis components as given in the McKern and Stewart (1957) method. Diamonds represent the mean age for each composite score.
Table 17 shows a breakdown of bias, inaccuracy, and SEI by composite score.
The McKern-Stewart pubic symphysis method underaged individuals in all composite
score categories, with the exception of scores 10 and 15. The score of 14 (n=1) also had a
high SEI, related to the incorrect classification of this individual. The method had an
average inaccuracy of 2.68 years and score 15 had the highest inaccuracy (9.28 years).
Bias is normally distributed (Figure 29). Figure 30 displays the correlation between
estimated and known age-at-death (r=0.79, see Table 6). This correlation is statistically
significant (ANOVA, p=0.000). There was a striking decrease in sample size beyond the
age of 30.
Table 17. Error of McKern-Stewart (1957) pubic symphysis method by composite score group.
Total Score   N    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
0             0    -          -                -
1-2           5    -0.93      1.51             9.05
3             3    -0.71      1.24             2.09
4-5           9    -1.23      1.78             9.86
6-7           13   -1.03      1.65             5.95
8-9           11   -2.66      3.05             9.01
10            6    0.03       2.11             6.96
11-12-13      22   -0.88      3.22             11.01
14            1    -1.86      1.86             49.33
15            3    1.28       9.28             14.11
ALL           73   -1.07      2.68             9.49
Figure 29. Distribution of bias for the McKern-Stewart (1957) pubic symphysis method.
Figure 30. Correlation of estimated and known age-at-death for the McKern-Stewart (1957) pubic symphysis method (n=73). [Scatterplot; x-axis: Known Age-at-Death; y-axis: Estimated Age; R² = 0.629.]
Statistical tests of significance could only be run between composite score
groups 6-7, 8-9, and 11-12-13 due to small sample sizes for all other groups. A one-way
ANOVA revealed a significant difference in bias (p=0.031) and inaccuracy (p=0.040)
between these groups, but no significant difference in mean SEI (p=0.164). ANOVA run
with the Bonferroni correction indicated that significant differences in bias occurred
between composite score groups 8-9 and 11-12-13 (p=0.029), but not between 6-7 and
either of those groups. While the difference in inaccuracy between composite score
groups was significant, there was no significant difference between groups at α≤0.05
when ANOVA was run with the Bonferroni correction. The values for composite score
group 8-9 approach significance when compared to both the 6-7 group (p=0.062) and the
11-12-13 group (p=0.076), indicating that the problem of inaccuracy may lie in the 8-9
group.
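The logic of the Bonferroni correction used in these pairwise comparisons can be sketched as follows, assuming the standard form of the correction in which each raw p-value is multiplied by the number of comparisons and capped at 1. The raw p-values below are hypothetical and are not taken from this study; the function name bonferroni is likewise illustrative.

```python
from itertools import combinations


def bonferroni(p_values):
    """Multiply each raw p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# The three composite score groups with adequate sample sizes.
groups = ["6-7", "8-9", "11-12-13"]
pairs = list(combinations(groups, 2))  # three pairwise comparisons

# Hypothetical raw p-values for illustration only; NOT results from the thesis.
raw = [0.020, 0.010, 0.400]
adjusted = bonferroni(raw)

for (a, b), p_raw, p_adj in zip(pairs, raw, adjusted):
    flag = "significant" if p_adj <= 0.05 else "not significant"
    print(f"{a} vs {b}: raw p={p_raw:.3f}, adjusted p={p_adj:.3f} ({flag})")
```

The cap at 1 is why adjusted p-values of 1.000 can appear in pairwise comparison tables, and it also shows how an overall ANOVA can be significant while no single adjusted pairwise comparison reaches α≤0.05.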
The Suchey-Brooks pubic symphysis method had the highest correct
classification of all three pubic symphysis methods (97.7%, Table 18). The percentages
in Table 18 do not include multiple phase designations and thus differ slightly from the
correct classification percentages given in Table 4. Phases two through five have 100%
correct classification and only two individuals scored as a phase one were incorrectly
classified. No individuals were observed in phase six in the JPAC/CIL identified sample.
While close to 100% correct classification is impressive, it is also important to recognize
that the age intervals given by the Suchey-Brooks method are very large, which is
reflected in Figure 31.
Table 18. Correct and incorrect classification: Suchey-Brooks pubic symphysis method.
Phase   N    # Correct   % Correct   # Incorrect   % Incorrect
1       12   10          83.3        2             16.7
2       20   20          100.0       0             0.0
3       22   22          100.0       0             0.0
4       29   29          100.0       0             0.0
5       3    3           100.0       0             0.0
6       0    -           -           -             -
Total   86   84          97.7        2             2.3
Error broken down by each phase of the Suchey-Brooks pubic symphysis method is
given in Table 19. Sample sizes for the first four phases are adequate. The sample size
for phase five is small and no individuals in the JPAC/CIL sample were placed in phase
six. Overall, the method had a small tendency to overage. When broken down by phase,
phase one is the only phase that underaged. Phase five exhibited the highest positive bias
value, indicating that it overaged more than the other phases. Additionally, phase five
had the largest inaccuracy and SEI of the phases observed in this sample. Inaccuracy,
bias, and SEI increased with each phase after phase two. Bias is normally distributed
(Figure 32). The Suchey-Brooks pubic symphysis method has a correlation of r=0.80
(see Table 6) between estimated and known age-at-death (Figure 33). This correlation is
statistically significant (ANOVA, p=0.000).
Figure 31. Comparison of known ages of identified males superimposed over the age intervals for the six phases of the pubis symphysis as given in the Suchey-Brooks pubic symphysis method. Diamonds represent the mean age for each phase.
Table 19. Error of Suchey-Brooks pubic symphysis method by phase.
Phase   N    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       12   -2.27      3.33             11.84
2       20   0.23       2.52             9.46
3       22   1.33       2.83             12.72
4       29   1.49       5.02             16.51
5       3    5.27       7.27             17.00
6       0    -          -                -
Total   86   0.76       3.72             13.51
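The relationship between the per-phase error values and the overall values in Table 19 can be checked with a short calculation. This sketch assumes that the per-phase bias and inaccuracy entries are phase means, so that the overall figure is their sample-size-weighted average; the constant TABLE_19 and the function pooled_mean are illustrative names.

```python
# (phase, N, bias, inaccuracy) transcribed from Table 19; phase six is
# omitted because no individuals were assigned to it.
TABLE_19 = [
    (1, 12, -2.27, 3.33),
    (2, 20, 0.23, 2.52),
    (3, 22, 1.33, 2.83),
    (4, 29, 1.49, 5.02),
    (5, 3, 5.27, 7.27),
]


def pooled_mean(rows, column):
    """Weight each phase value by its sample size and average over all phases."""
    total_n = sum(row[1] for row in rows)
    return sum(row[1] * row[column] for row in rows) / total_n


overall_bias = pooled_mean(TABLE_19, 2)
overall_inaccuracy = pooled_mean(TABLE_19, 3)
print(f"overall bias: {overall_bias:.2f}")
print(f"overall inaccuracy: {overall_inaccuracy:.2f}")
```

Because phase four contributes the most individuals (n=29), its positive bias and relatively high inaccuracy weigh most heavily on the overall values.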
Figure 32. Distribution of bias for the Suchey-Brooks pubic symphysis method.
Statistical tests of significance of error between phases revealed that there is a
significant difference in bias (ANOVA, p=0.004) and inaccuracy (ANOVA, p=0.006)
between the first four phases of the Suchey-Brooks method. Phase five was eliminated
from analyses because of its small sample size. Running ANOVA with the Bonferroni
correction revealed that differences in bias and inaccuracy occurred between phase one
and phases two through four (Tables 20 and 21). Phase one is the only phase that had a
negative bias value, indicating that it underaged individuals assigned to this phase. Even
though the difference in inaccuracy between phase one and phase four is not significant,
the p-value (p=0.109) is not as high as the p-values for comparisons of phases two, three,
and four. Inaccuracy is highest overall for phase four. There was no significant difference
in SEI between the first four phases (ANOVA, p=0.189).
Figure 33. Correlation of estimated and known age-at-death for the Suchey-Brooks pubic symphysis method (n=86). [Scatterplot; x-axis: Known Age-at-Death; y-axis: Estimated Age; R² = 0.633.]
Table 20. P-values from ANOVA with Bonferroni correction between the first four phases of the Suchey-Brooks method: bias.
Phase   1        2       3       4
1       N/A      -       -       -
2       0.045*   N/A     -       -
3       0.005*   1.000   N/A     -
4       0.005*   1.000   1.000   N/A
*p≤0.05
Table 21. P-values from ANOVA with Bonferroni correction between the first four phases of the Suchey-Brooks method: inaccuracy.
Phase   1        2       3       4
1       N/A      -       -       -
2       0.009*   N/A     -       -
3       0.009*   1.000   N/A     -
4       0.109    1.000   1.000   N/A
*p≤0.05
The McKern-Stewart and Suchey-Brooks method samples were broken down
into five-year age intervals in order to compare their performance in the JPAC/CIL
identified sample. Bias and inaccuracy were calculated for each five-year interval based
on the sample size for that interval. Results are shown in Table 22. For the youngest age
class (16-20), the McKern-Stewart method performed better than Suchey-Brooks, with a
very low bias and inaccuracy. For individuals between the ages of 21 and 25, both
methods performed equally well, but the McKern-Stewart method slightly overaged
individuals in this category while the Suchey-Brooks method slightly underaged. For the
next two age classes (26-30, 31-35), both methods performed similarly to one another,
except that the McKern-Stewart method underaged these individuals and the
Suchey-Brooks method overaged them. For ages 36-40, both methods underaged individuals
in the age class, with a similar average number of years of inaccuracy. Finally, the 41+
category had the greatest discrepancy between the two methods in both bias and inaccuracy.
This group also had small sample sizes, which is related to the age distribution of the
JPAC/CIL identified sample. Additionally, the Suchey-Brooks method had a higher
standard error than the McKern-Stewart method, but both methods have almost identical
correlation coefficients when comparing estimated and known age-at-death (r=0.80,
r=0.79, respectively, see Table 6).
Table 22. Comparison of bias and inaccuracy: McKern-Stewart and Suchey-Brooks pubic symphysis methods.
Known Age   Statistic    McKern-Stewart   Suchey-Brooks
16-20       N            7                5
            Bias         0.05             -2.30
            Inaccuracy   0.72             2.30
21-25       N            30               31
            Bias         0.54             -0.39
            Inaccuracy   2.17             2.92
26-30       N            22               21
            Bias         -1.34            1.55
            Inaccuracy   2.39             2.57
31-35       N            9                10
            Bias         -3.79            4.00
            Inaccuracy   3.79             5.32
36-40       N            2                11
            Bias         -1.41            -0.84
            Inaccuracy   5.41             6.00
41+         N            3                8
            Bias         -9.27            3.25
            Inaccuracy   9.27             5.60
All Ages    N            73               86
            Bias         -1.07            0.76
            Inaccuracy   2.68             3.72
Auricular Surface
Three methods were used to estimate age based on the auricular surface:
Lovejoy et al. (1985b), Osborne et al. (2004), and Buckberry and Chamberlain (2002).
SOP 3.4 of the JPAC/CIL laboratory manual stipulates the use of the Buckberry-
Chamberlain method except for very young individuals or where only a partial auricular
surface is present (JPAC/CIL 2008). When employing the Lovejoy et al. method, the
statistics from Osborne et al. (2004) should be used in place of the age intervals provided
in the original Lovejoy et al. method (JPAC/CIL 2008). Even so, analysts often report
the Lovejoy et al. phase along with the Osborne et al. statistics. This practice allows
for the analysis of performance of both methods and results for all auricular surface
methods are presented below. No comparison will be made between the original and
revised methods because the sample size for the Buckberry-Chamberlain method is too
small.
Of the 147 individuals in the Lovejoy et al. sample, 53 were placed in multiple
phases by the analysts. The results in Tables 4 and 5 include all individuals. Table 23
shows a breakdown of all phase assignments as recorded by the analysts and Table 24
includes only those phases originally defined in Lovejoy et al. (1985b). Comparison of
these two tables shows that a higher correct classification rate is obtained when assigning
individuals to multiple phases. In fact, 100% correct classification is achieved only when
two or more phases are used. The original phases had a correct classification rate of only
56.4% and this problem is again reflected in Figure 34.
Table 23. Correct and incorrect classification: Lovejoy et al. (1985b) auricular surface method.
Phase   Range   Midpoint   N     # Correct   % Correct   # Incorrect   % Incorrect
1       20-24   22         27    22          81.5        5             18.5
1-2     20-29   24.5       15    14          93.3        1             6.7
1-3     20-34   27         3     3           100.0       0             0.0
1-4     20-39   29.5       4     4           100.0       0             0.0
1-5     20-44   32         1     1           100.0       0             0.0
2       25-29   27         35    20          57.1        15            42.9
2-3     25-34   29.5       18    10          55.6        8             44.4
2-4     25-39   32         1     1           100.0       0             0.0
3       30-34   32         20    4           20.0        16            80.0
3-4     30-39   34.5       3     3           100.0       0             0.0
3-5     30-44   37         3     2           66.7        1             33.3
4       35-39   37         6     4           66.7        2             33.3
4-5     35-44   39.5       3     3           100.0       0             0.0
5       40-44   42         6     3           50.0        3             50.0
5-6     40-49   44.5       1     0           0.0         1             100.0
5-7     40-59   49.5       1     1           100.0       0             0.0
6       45-49   47         0     -           -           -             -
7       50-59   54.5       0     -           -           -             -
8       60+     N/A        0     -           -           -             -
ALL     -       -          147   95          64.6        52            35.4
Table 24. Correct and incorrect classification for single phases only: Lovejoy et al. (1985b) auricular surface method.
Phase   N    # Correct   % Correct   # Incorrect   % Incorrect
1       27   22          81.5        5             18.5
2       35   20          57.1        15            42.9
3       20   4           20.0        16            80.0
4       6    4           66.7        2             33.3
5       6    3           50.0        3             50.0
6       0    -           -           -             -
7       0    -           -           -             -
8       0    -           -           -             -
Total   94   53          56.4        41            43.6
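The contrast between single-phase and multiple-phase assignments can be made explicit by subtracting the single-phase counts in Table 24 from the totals in Table 23. This is a minimal sketch using only the tabled counts; the variable and function names are illustrative.

```python
# Counts transcribed from Tables 23 and 24.
ALL_N, ALL_CORRECT = 147, 95        # all phase assignments (Table 23)
SINGLE_N, SINGLE_CORRECT = 94, 53   # single-phase assignments (Table 24)

# Multiple-phase assignments are the remainder.
multi_n = ALL_N - SINGLE_N
multi_correct = ALL_CORRECT - SINGLE_CORRECT


def percent_correct(correct, n):
    """Correct classification rate as a percentage."""
    return 100 * correct / n


single_rate = percent_correct(SINGLE_CORRECT, SINGLE_N)
multi_rate = percent_correct(multi_correct, multi_n)
print(f"single phase:   {SINGLE_CORRECT}/{SINGLE_N} = {single_rate:.1f}%")
print(f"multiple phase: {multi_correct}/{multi_n} = {multi_rate:.1f}%")
```

The higher rate for multiple-phase assignments follows directly from the wider age ranges they cover, so it should not be read as greater precision.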
Figure 34. Comparison of known ages of identified males superimposed over the age intervals for the eight phases of the auricular surface as given in the Lovejoy et al. (1985b) auricular surface method.
Error for the Lovejoy et al. method was calculated for all 147 individuals by
comparing the known age-at-death with the midpoint of the assigned phase since the
reference method does not provide descriptive statistics for the phases (i.e., no confidence
intervals). Overall, the method had a tendency to overage, as indicated by the mostly
positive bias values across phases (Table 25). Multiple-phase designations generally have
slightly higher bias, inaccuracy, and SEI values than single-phase designations. Results
for single phases are presented in Table 26 to separate the original phases as given in the
reference method from multiple phases as assigned by the analysts. Phase five had the
highest bias, phases three and five had the highest inaccuracy, and phase three had the
highest SEI. Bias for the Lovejoy et al. auricular surface method is not normally
distributed around zero (Figure 35); the distribution is skewed to the right.
The method has a Pearson’s r of 0.78 (see Table 6), also represented in Figure 36. The
correlation between known and estimated age-at-death is statistically significant
(ANOVA, p=0.000).
Table 25. Error of Lovejoy et al. (1985b) auricular surface method by phase.
Phase   N     Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       27    -0.04      1.74             7.59
1-2     15    0.50       1.83             7.57
1-3     3     1.67       1.67             6.62
1-4     4     1.00       4.25             14.95
1-5     1     -4.00      4.00             11.11
2       35    1.80       2.66             11.33
2-3     18    3.39       4.39             18.38
2-4     1     2.00       2.00             6.67
3       20    3.30       5.00             19.28
3-4     3     2.83       2.83             9.25
3-5     3     3.67       4.33             16.88
4       6     3.00       3.67             13.29
4-5     3     0.50       1.50             3.85
5       6     5.00       5.00             15.74
5-6     1     8.50       8.50             23.61
5-7     1     -3.50      3.50             6.60
6       0     -          -                -
7       0     -          -                -
8       0     -          -                -
ALL     147   1.89       3.16             12.40
Table 26. Error of Lovejoy et al. (1985b) auricular surface method: single phases only.
Phase   N    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       27   -0.04      1.74             7.59
2       35   1.80       2.66             11.33
3       20   3.30       5.00             19.28
4       6    3.00       3.67             13.29
5       6    5.00       5.00             15.74
6       0    -          -                -
7       0    -          -                -
8       0    -          -                -
ALL     94   1.87       3.11             12.35
Figure 35. Distribution of bias for the Lovejoy et al. (1985b) auricular surface method.
Figure 36. Correlation of estimated and known age-at-death for the Lovejoy et al. (1985b) auricular surface method (n=147). [Scatterplot; x-axis: Known Age-at-Death; y-axis: Estimated Age; R² = 0.6148.]
One-way ANOVAs were run to compare bias, inaccuracy, and SEI of the first
three phases of the Lovejoy et al. auricular surface method. Multiple-phase designations
were not compared to single phases or to each other, as they are uninformative concerning
the performance of the original method and its phases, and phases four and five were not
included in analyses because of small sample sizes. Results from three separate ANOVA
tests indicated that there are statistically significant differences in bias (p=0.001),
inaccuracy (p=0.000), and SEI (p=0.001) between the first three phases observed in the
JPAC/CIL identified sample. The Bonferroni correction for multiple comparisons
revealed that phase three differed significantly from phases one and two in bias,
inaccuracy, and SEI, but that phases one and two were not significantly different from
one another (Tables 27, 28, 29). Phase three had larger error values than phases one and
two.
Table 27. P-values from ANOVA with Bonferroni correction between the first three phases of the Lovejoy et al. (1985b) method: bias.
Phase   1        2        3
1       N/A      -        -
2       0.441    N/A      -
3       0.000*   0.015*   N/A
*p≤0.05
Table 28. P-values from ANOVA with Bonferroni correction between the first three phases of the Lovejoy et al. (1985b) method: inaccuracy.
Phase   1        2        3
1       N/A      -        -
2       1.000    N/A      -
3       0.000*   0.000*   N/A
*p≤0.05
Table 29. P-values from ANOVA with Bonferroni correction between the first three phases of the Lovejoy et al. (1985b) method: SEI.
Phase   1        2        3
1       N/A      -        -
2       0.515    N/A      -
3       0.001*   0.027*   N/A
*p≤0.05
The Osborne et al. auricular surface statistics were applied to all individuals
that were aged with the Lovejoy et al. method. Additionally, four individuals were aged
with only the Osborne et al. method, giving a total sample size of 151 individuals.
However, multiple-phase classifications were removed from further analyses because
there are no descriptive statistics for multiple phases in the reference method and
multiple-phase estimations produced extremely imprecise age estimates (e.g., phases one
through three give an estimate of less than or equal to sixty-nine years old). All phases
except phase one had 100% correct classification in the JPAC/CIL sample (Table 30).
These high rates of correct classification are also shown in Figure 37, along with the large
age intervals that are associated with the six phases of the Osborne et al. auricular surface
method.
Table 30. Correct and incorrect classification: Osborne et al. (2004) auricular surface method.
Phase   N     # Correct   % Correct   # Incorrect   % Incorrect
1       79    71          89.9        8             10.1
2       21    21          100.0       0             0.0
3       6     6           100.0       0             0.0
4       7     7           100.0       0             0.0
5       0     -           -           -             -
6       0     -           -           -             -
Total   113   105         92.9        8             7.1
The Osborne et al. auricular surface method had an overall tendency to
underage (bias=-0.59) with an average inaccuracy of 3.93 years (Table 31). This method
had a higher overall SEI than the Lovejoy et al. method. Additionally, inaccuracy and
SEI increased with each subsequent phase. Bias increased after phase two, with phase
one underaging and phases two through four increasingly overaging. This trend is also
apparent in Figure 37, in which all individuals in phases three and four have known
ages-at-death below the given means for the phases. The distribution of bias for the
Osborne et al. method is slightly skewed to the right and does not fully conform to the
normal distribution (Figure 38). This method also had a lower correlation between known
and estimated age-at-death than the Lovejoy et al. method (r=0.74, see Table 6 and Figure
39), but this correlation is still statistically significant (ANOVA, p=0.000).
Figure 37. Comparison of known ages of identified males superimposed over the age intervals for the six phases of the auricular surface as given in the Osborne et al. (2004) auricular surface method. Diamonds represent the mean age for each phase.
Table 31. Error of Osborne et al. (2004) auricular surface method by phase.
Phase   N     Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       79    -2.68      3.02             11.71
2       21    0.98       3.88             14.38
3       6     8.00       8.00             26.66
4       7     10.94      10.94            31.88
5       0     -          -                -
6       0     -          -                -
Total   113   -0.59      3.93             14.25
Figure 38. Distribution of bias for the Osborne et al. (2004) auricular surface method.
Only phases one and two had adequate sample sizes for statistical tests of
significance. Therefore, Student’s t-tests were run to compare bias, inaccuracy, and SEI
between phases one and two. There were significant differences in bias (p=0.004) and
inaccuracy (p=0.000) between phases one and two, but no significant differences in SEI
(p=0.259). Phase one had a negative bias value, while phase two had a positive bias
value; phase two had a higher average error in number of years (i.e., inaccuracy). For
phases three and four (which were not statistically tested), individuals from the
JPAC/CIL sample placed into these phases had known ages-at-death lower than the
respective phase means.
Figure 39. Correlation of estimated and known age-at-death for the Osborne et al. (2004) auricular surface method (n=113). [Scatterplot; x-axis: Known Age-at-Death; y-axis: Estimated Age; R² = 0.5481.]
The Buckberry-Chamberlain revised auricular surface method had 100%
correct classification in the JPAC/CIL sample. However, it also had the highest overall
bias, inaccuracy, and SEI of all methods employed at the CIL² (Table 5). Only three
stages were recorded (stages one, two, and five) and one individual was classified as
between stages three and four. Because of the 100% correct classification and the small
sample size (n=10), this method was tested with multiple observers (see Chapter VII).
Table 32 gives a summary for all individuals aged with the Buckberry-Chamberlain
method. All observed stages overaged, with the exception of stage one. Figure 40 shows
that the younger stages performed well, even with a limited sample size. No distribution
of bias is depicted because of the small sample size. The method has a strong correlation
between estimated and known age-at-death (r=0.92, see Table 6 and Figure 41). This
correlation is statistically significant (ANOVA, p=0.000). ANOVA was not conducted
between stages because of the small sample size of each stage.
Table 32. Buckberry-Chamberlain (2002) auricular surface method sample (n=10).
Age   Score   Stage   Range   Mean    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
19    5       1       16-19   17.33   -0.19      0.19             8.79
21    8       2       21-38   29.33   0.93       0.93             39.67
25    8       2       21-38   29.33   0.48       0.48             17.32
26    8       2       21-38   29.33   0.37       0.37             12.81
22    NR      2       21-38   29.33   0.81       0.81             33.33
27    NR      2       21-38   29.33   0.26       0.26             8.63
22    NR      2       21-38   29.33   0.81       0.81             33.32
25    NR      2       21-38   29.33   0.48       0.48             17.32
30    NR      3-4     16-81   N/A     N/A        N/A              N/A
37    13      5       29-88   59.94   2.55       2.55             62.00
ALL   -       -       -       -       6.51       6.88             25.91
NR=not reported
² This comparison is made using the adjusted dental formation error values (see Table 12).
Figure 40. Comparison of known ages of identified males superimposed over the age intervals for the seven stages of the auricular surface as given in the Buckberry-Chamberlain (2002) revised auricular surface method. Diamonds represent the mean age for each stage.
Figure 41. Correlation of estimated and known age-at-death for the Buckberry-Chamberlain (2002) revised auricular surface method (n=9). [Scatterplot; x-axis: Known Age-at-Death; y-axis: Estimated Age; R² = 0.8538.]
Sternal Rib End
Age estimation using the sternal rib end exclusively referenced methods
published by Iscan and colleagues. Correct and incorrect classifications were calculated
using the 95% confidence intervals (CI) provided in their 1984 Journal of Forensic
Sciences publication (Table 33). This table includes multiple-phase designations as given
by CIL analysts; seven of 21 individuals were assigned to dual phases. Ranges for more
than one phase were derived by using the minimum age for the lowest phase and the
maximum age for the highest phase to compare classification of individuals into single or
multiple phases. The 100% correct classifications only occurred when individuals were
assigned to two phases. Nawrocki (n.d.) called into question the confidence intervals
originally published by Iscan and colleagues, saying that they are far too small. Table 34
presents correct and incorrect classifications using the prediction intervals (PI) given by
Nawrocki (n.d.). This table does not include multiple-phase designations. Correct
classification increased from 71.4% to 92.9% with the modified age intervals. Figure 42
shows the known ages-at-death of individuals aged with the sternal rib end method; the
Iscan et al. (1984b) CI are represented by a solid line and the Nawrocki (n.d.) PI are
represented by a dashed line. Means of intervals for both Iscan et al. (1984b) and
Nawrocki (n.d.) are the same and are represented by a diamond. The Nawrocki (n.d.)
intervals are much larger than the Iscan et al. (1984b) intervals.
Table 33. Correct and incorrect classification: Iscan et al. (1984b) sternal rib end method.
Phase   95% CI      N    # Correct   % Correct   # Incorrect   % Incorrect
1       16.5-18.0   2    1           50.0        1             50.0
1+2     16.5-23.1   1    1           100.0       0             0.0
2       20.8-23.1   7    6           85.7        1             14.3
2+3     20.8-27.7   2    1           50.0        1             50.0
3       24.1-27.7   3    1           33.3        2             66.7
4       25.7-30.6   2    1           50.0        1             50.0
4+5     25.7-42.3   4    4           100.0       0             0.0
5       34.4-42.3   0    0           -           0             -
6       44.3-55.7   0    0           -           0             -
7       54.3-64.1   0    0           -           0             -
8       65.0-78.0   0    0           -           0             -
Total               21   15          71.4        6             28.6
Table 34. Correct and incorrect classification using Nawrocki (n.d.) prediction intervals.
Phase   95% PI      N    # Correct   % Correct   # Incorrect   % Incorrect
1       15.5-19.1   2    1           50.0        1             50.0
2       17.2-26.6   7    7           100.0       0             0.0
3       18.3-33.5   3    3           100.0       0             0.0
4       19.4-37.0   2    2           100.0       0             0.0
5       23.2-54.5   0    -           -           -             -
6       25.6-74.4   0    -           -           -             -
7       38.4-80.0   0    -           -           -             -
8       48.0-95.0   0    -           -           -             -
Total               14   13          92.9        1             7.1
Figure 42. Comparison of known ages of identified males superimposed over the age intervals for the eight phases of the sternal rib end as given in the Iscan et al. (1984b) sternal rib end method (solid rectangles) and prediction intervals as calculated by Nawrocki (dashed rectangles). Diamonds represent the mean age for each phase.
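The rule described above for deriving an age range for a multiple-phase designation (minimum age of the lowest phase, maximum age of the highest phase) can be sketched as follows. The intervals are the 95% CI values listed in Table 33 for phases one through five; the function names and the example individual are hypothetical and are included only to show how a classification would be scored.

```python
# 95% confidence intervals for phases 1-5, transcribed from Table 33.
ISCAN_CI = {
    1: (16.5, 18.0),
    2: (20.8, 23.1),
    3: (24.1, 27.7),
    4: (25.7, 30.6),
    5: (34.4, 42.3),
}


def combined_range(phases, intervals):
    """Span from the minimum of the lowest phase to the maximum of the highest."""
    lowest, highest = min(phases), max(phases)
    return intervals[lowest][0], intervals[highest][1]


def is_correct(known_age, phases, intervals):
    """A classification is correct when the known age falls inside the range."""
    low, high = combined_range(phases, intervals)
    return low <= known_age <= high


for phases in ([1, 2], [2, 3], [4, 5]):
    print(phases, combined_range(phases, ISCAN_CI))

# Hypothetical individual (not from the study sample): known age 24, phases 2+3.
print(is_correct(24, [2, 3], ISCAN_CI))
```

The derived ranges match the 1+2, 2+3, and 4+5 rows of Table 33. Note that the combined range also covers any gap between the two single-phase intervals, which is one reason dual-phase designations are easier to classify correctly.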
Bias, inaccuracy, and SEI by reported phase are given in Table 35. The
method has an overall tendency to underage; only phase three has a positive bias value.
Average error in years (inaccuracy) was low for all four reported phases. However, phase
one had a high SEI compared to other phases in the method and other methods used in
the JPAC/CIL sample. No statistical comparisons were made between phases since
sample sizes for each phase are small. Distribution of bias does not conform to the
normal curve (Figure 43), but this is most likely due to the small sample size. The Iscan
et al. sternal rib end method has the lowest correlation between estimated and known
age-at-death (r=0.71) of all methods listed in Table 6 (see also Figure 44). However, this
correlation is still statistically significant (ANOVA, p=0.004).
Table 35. Error of Iscan et al. (1984b) sternal rib end method by phase.
Phase   N    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       2    -3.70      0.53             15.90
2       7    -0.53      0.39             3.42
3       3    2.23       0.48             9.61
4       2    -1.30      0.36             8.16
5       0    -          -                -
6       0    -          -                -
7       0    -          -                -
8       0    -          -                -
Total   14   -0.5       1.76             7.21
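The correlation coefficients cited in the text (r, from Table 6) and the R² values printed on the corresponding scatterplots describe the same relationship, since for a simple linear fit r is the square root of R² (taken as positive here because estimated age increases with known age). The following sketch checks that the two sets of reported values agree to two decimal places; the dictionary name and structure are illustrative.

```python
from math import sqrt

# method: (r reported in the text, R² printed on the scatterplot)
REPORTED = {
    "Todd": (0.95, 0.8982),
    "McKern-Stewart": (0.79, 0.629),
    "Suchey-Brooks": (0.80, 0.633),
    "Lovejoy et al.": (0.78, 0.6148),
    "Osborne et al.": (0.74, 0.5481),
    "Buckberry-Chamberlain": (0.92, 0.8538),
    "Iscan et al.": (0.71, 0.5038),
}

# Agreement to two decimal places means a difference of at most 0.005.
consistent = {
    method: abs(sqrt(r_squared) - r) <= 0.005
    for method, (r, r_squared) in REPORTED.items()
}

for method, (r, r_squared) in REPORTED.items():
    print(f"{method}: r={r:.2f}, sqrt(R²)={sqrt(r_squared):.4f}, "
          f"consistent={consistent[method]}")
```

R² is also the more direct measure of how much of the variation in estimated age is accounted for by known age; for the sternal rib end method this is only about half (R² = 0.5038).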
Summary
This chapter presented results from all age estimation methods employed at
the JPAC/CIL between 1972 and 31 July 2008. The age distributions of the total sample
and each individual sub-sample were given, followed by a comparison of methods to one
another and classification rates and error values for each individual method, where
applicable. The next chapter will further examine the reliability of three of the above
methods.
Figure 43. Distribution of bias for the Iscan et al. (1984b) sternal rib end method.
Figure 44. Correlation of estimated and known age-at-death for the Iscan et al. (1984b) sternal rib end method (n=14). [Scatterplot; x-axis: Known Age-at-Death; y-axis: Estimated Age; R² = 0.5038.]
CHAPTER VII
RESULTS II: INTEROBSERVER ERROR
STUDY
A total of 39 individuals voluntarily aged two skeletal samples for each of the
following age estimation methods: Buckberry and Chamberlain (2002) revised auricular
surface, Iscan et al. (1984b) sternal rib end for males, and Mann et al. (1991) maxillary
suture closure. A summary of self-reported background information is presented below,
followed by preliminary results for each method. These results include: distribution of
stages, phases, or suture obliteration, and correlation of experience with these estimates.
Participants
All participants reported their field of study as anthropology or a sub-
discipline of anthropology (Figure 45). The category “other anthropology/combination”
includes the self-reported categories of applied anthropology, bioarchaeology, physical
anthropology/archaeology, and physical/forensic anthropology. Four participants did not
report their field of study.
Approximately half of the participants reported having obtained a Master’s
degree as their highest level of education (Figure 46). The next largest group was made
up of those individuals with a Bachelor’s degree and the smallest group was those
individuals with a Doctorate. Two of 39 participants are Diplomates of the American
Board of Forensic Anthropology (ABFA). One individual did not report his or her highest
degree obtained.
Figure 45. Participants’ self-reported fields of study. [Pie chart, n=35; legend in extracted order: Anthropology, Forensic Anthropology, Physical Anthropology, Other Anthropology/Combination; slice values in extracted order: 11, 8, 12, 4.]
Figure 46. Participants’ self-reported highest degrees obtained. [Pie chart, n=38; Bachelor: 14, Master: 18, Doctorate: 6.]
More than half of the participants had less than four years of experience in
skeletal age estimation (Figure 47) and had analyzed under 100 skeletons (Figure 48).
Approximately one-third of the participants had between five and nine years of
experience and the remaining participants had over ten years of experience in skeletal
aging. Only three individuals had analyzed over 1000 skeletons and a little over one-third
of the participants had analyzed between 100 and 1000 skeletons. One individual did not
give an estimate of years of experience and another individual did not give an
approximate number of skeletons analyzed. Overall, participants were largely graduate
students enrolled in both MA and PhD programs. A limited number of individuals who
had already completed their graduate studies participated in this study.
Figure 47. Participants’ self-reported years of experience with skeletal aging (n=38). [Pie chart: 0-4 years, 21; 5-9 years, 10; 10+ years, 7.]
Figure 48. Participants’ self-reported approximate number of skeletons analyzed (n=38).1 [Pie chart: <100, 21; 100-1000, 14; >1000, 3.]
Method Performance
Buckberry-Chamberlain Revised Auricular Surface
Approximately half of the participants (48.7%) had never used the Buckberry-
Chamberlain revised auricular surface method before, while the remaining participants
had (51.3%). Of those people that were familiar with the method, 26.3% use it on a
regular basis and 73.7% do not. Self-reported level of comfort with the method was
generally low, with the modal score being one, or “very low.” The median of all self-
reported “comfort scores” was two. Several individuals included comments indicating
that they were familiar with the original auricular surface method as published by
Lovejoy and colleagues (1985b).
1 “Number of skeletons analyzed” was not used for further comparisons of experience and error due to the very small sample size for individuals who had analyzed over 1000 skeletons. Additionally, the total number of skeletons analyzed may not be the best proxy for experience with specific methods, such as skeletal age estimation.
In general, participants correctly used the statistics given in the reference
method to assign age point estimates and stages based on composite scores, reporting
either the mean or median of the assigned stage. For sample A, participants were on
average 58.1% sure that their observations corresponded to the correct composite score
and 60.7% sure for sample B. There was no significant difference in level of confidence
between the samples (Student’s t-test, p=0.579). Although the sample sizes for each stage
were too small for statistical analyses, those individuals with higher self-reported
confidence levels were not more likely to assign the consensus stage (Table 36).
Table 36. Percent confidence in assigned composite score by stage: samples A and B.
            Sample A                                Sample B
Stage   N    % Confidence in              Stage   N    % Confidence in
             Assigned Score (x̄)                        Assigned Score (x̄)
1       1    60.0                         1       2    60.0
2       7    59.3                         2       3    61.7
*3      14   61.1                         3       0    -
4       8    51.9                         4       4    57.5
5       6    55.8                         5       5    65.5
6       1    70.0                         *6      11   57.7
7       0    -                            7       11   62.5
Total   37   58.1                         Total   36   60.7
* indicates median phase assigned.
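The "Total" row of Table 36 can be checked by treating each overall mean as the N-weighted average of the stage-level means. The short Python sketch below does this using only the values printed in the table. The function name is illustrative, and the weighted-average reading of the "Total" row is an assumption about how the totals were derived rather than a statement of the procedure actually used.

```python
# Stage-level counts (N) and mean percent confidence, copied from Table 36.
# Stages with N = 0 have no mean and are omitted.
sample_a = [(1, 60.0), (7, 59.3), (14, 61.1), (8, 51.9), (6, 55.8), (1, 70.0)]
sample_b = [(2, 60.0), (3, 61.7), (4, 57.5), (5, 65.5), (11, 57.7), (11, 62.5)]


def weighted_mean(rows):
    """Return (total N, N-weighted mean) for a list of (n, mean) pairs."""
    total_n = sum(n for n, _ in rows)
    weighted_sum = sum(n * mean for n, mean in rows)
    return total_n, weighted_sum / total_n


for label, rows in (("A", sample_a), ("B", sample_b)):
    n, mean = weighted_mean(rows)
    print(f"Sample {label}: N = {n}, mean confidence = {mean:.1f}%")
```

Under this assumption the recomputed totals agree with the table to one decimal place (58.1% for sample A and 60.7% for sample B).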
The distribution of stages assigned by the participants for sample A is given in
Figure 49. One individual did not assign a stage and a second person placed the sample in
both stages III and IV. The most frequent stage assignment was clearly stage III. This
stage was reported by 14 of 37 participants, but this was still less than 50% of all
participants. There was a relatively large variation in stage assignment, extending most
notably from stages II through V, although one individual classified this sample as a
stage I and a second classified it as a stage VI. Given that the method already has very
large age intervals per stage, the imprecision shown here is remarkable; assigning an
individual to stages II through V would give an estimated age range of 21-88 years.
However, the distribution of stage assignment closely mimics the normal curve.
Figure 49. Distribution of assigned stages for sample A (n=37). [Bar chart: x-axis, Stage (I-VI); y-axis, Count of Stage.]
Based on stage III being the most frequent stage assignment for sample A,
statistics for this stage were used to calculate the SEI as related to both the highest degree
obtained and number of years of experience with skeletal age estimation. The mean age
for stage III is 37.86 years, the median age is 37 years, and the range is 16-65 years; the
SEI was calculated using the median age. Those individuals with doctorates had a higher
mean SEI than other groups (Table 37). It was not possible to run ANOVA between all
groups due to the small sample size for the “doctorate” group. A Student’s t-test between
“bachelor” and “master” groups revealed no significant difference in SEI (p=0.940).
When compared by years of experience, the 10+ group also had a higher mean SEI than
other groups (Table 38). No statistical tests of significance were run due to the small
sample sizes of both the 5-9 and 10+ groups.
Table 37. SEI by highest degree obtained: samples A and B.
              Sample A          Sample B
Degree        N     SEI         N     SEI
Bachelor      14    27.80       14    23.27
Master        17    27.03       17    13.47
Doctorate     5     35.14       5     6.667
ALL           36    28.41       36    16.26

Table 38. SEI by years of experience in skeletal aging: samples A and B.
Years of Experience     Sample A          Sample B
in Skeletal Aging       N     SEI         N     SEI
0-4                     21    26.90       21    17.10
5-9                     9     27.03       9     22.05
10+                     7     34.75       7     6.28
ALL                     37    28.48       37    16.26
Unlike that of sample A, the distribution of stages for sample B is not approximately
normal (Figure 50), most likely because this sample appears to fall at or near the
maximum stages for the method. One individual did not assign a stage to the sample and
a second person assigned both stages IV and V to the sample. Stages VI and VII were
assigned 11 times each and the overall distribution for sample B is more varied than for
sample A, with all stages of the reference method represented at least once.
Figure 50. Distribution of assigned stages for sample B (n=37). [Bar chart: x-axis, Stage (I-VII); y-axis, Count of Stage.]
Since both stages VI and VII were assigned the same number of times, the
SEI was calculated based on the median of all stages assigned, which was stage VI. The
mean age for stage VI is 66.71 years, the median age is 66 years, and the range is 39-91
years; the SEI was calculated using the median age. For sample B, individuals with a
bachelor’s degree had the highest mean SEI (Table 37). The small sample size of the
“doctorate” group precluded comparison of all three groups by ANOVA, but a Student’s
t-test revealed that there was no significant difference in SEI between “bachelor” and
“master” groups (p=0.258). Those with five to nine years of experience in skeletal aging
had the highest mean SEI (Table 38) and no other statistical tests of significance were run
because of the small sample sizes for the 5-9 and 10+ groups.
Iscan et al. Sternal Rib End
All but one of the participants had used the Iscan et al. sternal rib end method
prior to this study. Of those people that were familiar with the method, 42.1% use it on a
regular basis and 57.9% do not. The median of self-reported level of comfort with the
method was three, or “medium.” Of the three methods tested, participants were most
comfortable with the sternal rib end method.
Very few of the participants used the age ranges published in the reference
article; most instead used the ranges given with the sternal rib end casts. This is not
problematic because all participants indicated which phase they had chosen based on the
cast set, and the cast phases are the same as those in the article on which they are based.
Additionally, as seen in Chapter VI, the confidence intervals assigned by Iscan and
colleagues are far too narrow to be statistically valid for both the reference article and
the cast set. Therefore, the published age ranges may be of limited utility. Two participants mentioned this
phenomenon, also adding that some of the age categories could probably be condensed. It
should also be noted that some participants may have used the female age intervals or
exemplars even though the questionnaire clearly stated “male.” The assigned phases were
used for all analyses, rather than the assigned age estimates, which were generally given
as intervals.
For sample C, participants were on average 77.5% sure that their observations
corresponded to the correct phase and 73.5% sure for sample D. The most frequent
percentage given for sample C was 90% and for sample D it was 80%, indicating that
several especially low percentages may be affecting the overall confidence mean. There
was no significant difference in level of confidence between the samples (Student’s t-test,
p=0.275). Although the sample sizes for each stage were too small for statistical analyses,
those individuals with higher self-reported confidence levels were not more likely to
assign the consensus phase (Table 39).
Table 39. Percent confidence in assigned phase by stage: samples C and D.
            Sample C                                Sample D
Phase   N    % Confidence in              Phase   N    % Confidence in
             Assigned Phase (x̄)                        Assigned Phase (x̄)
1       2    90.0                         1       0    -
2       13   78.8                         2       1    80.0
*3      13   77.5                         3       1    50.0
4       7    74.3                         4       1    80.0
5       1    85.0                         5       6    78.3
6       1    50.0                         6       8    72.5
7       0    -                            *7      15   72.5
8       0    -                            8       2    75.0
Total   37   77.5                         Total   34   73.5
* indicates median phase assigned.
The distribution of assigned phases for sample C is concentrated mainly on
phases two through four (Figure 51). The distribution approaches normal and has a
median phase assignment of three. Two participants classified the sample in dual-phases
and these data points are not included here. Given the distribution seen in Figure 51 and
the age ranges given in Iscan et al. (1984b) for phases two (18-25 years) and three (19-33
years), condensing these two phases should be considered. The age ranges given with the
cast set are even narrower: phase two, 20-23 years, and phase three, 24-28 years.
Figure 51. Distribution of assigned phases for sample C (n=37). [Bar chart: x-axis, Phase (1-6); y-axis, Count of Phase.]
The SEI for sample C was calculated based on statistics for phase three of the
Iscan et al. sternal rib end method. The mean for this phase is 25.9 years. Multiple-phase
designations were eliminated from these analyses, as were individuals who did not report
their highest degree obtained or the number of years of experience with skeletal age
estimation. Individuals who have their doctorates had the lowest mean SEI of all groups
(Table 40) but the sample size for this group was too small to run ANOVA between the
three groups. A Student’s t-test revealed that there was no significant difference between
“bachelor” and “master” groups (p=0.755). Those individuals who had ten or more years
of experience also had the lowest mean SEI (Table 41). No statistical tests of significance
were run due to small sample sizes for two of the three “years of experience” groups.

Table 40. SEI by highest degree obtained: samples C and D.
              Sample C          Sample D
Degree        N     SEI         N     SEI
Bachelor      14    15.80       14    11.29
Master        16    13.72       14    21.97
Doctorate     6     2.96        5     12.60
ALL           36    12.69       33    16.02

Table 41. SEI by years of experience in skeletal aging: samples C and D.
Years of Experience     Sample C          Sample D
in Skeletal Aging       N     SEI         N     SEI
0-4                     20    15.48       20    16.47
5-9                     9     12.66       7     13.80
10+                     7     5.68        6     9.21
ALL                     36    12.87       33    14.58
The distribution for sample D clusters in the upper phases generally between
phases five and seven (Figure 52). Sample D was assigned to multiple phases five times
and these data points are not represented here. Phase seven was the most frequently
assigned phase; it was assigned 15 times. The distribution has a median phase of 6.5, and
is skewed to the left. The age ranges given in the 1984b publication are larger than the
ranges reported by the majority of the participants for sample D, reflecting the
discrepancy between ranges reported in different sources.
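The median stages and phases reported in this chapter can be recovered directly from the frequency counts in Tables 36 and 39. The following Python sketch shows one way to do so. The function name is illustrative, and treating the ordinal stage or phase numbers as numeric scores (so that an even-sized sample can yield a half-phase median such as 6.5) is an assumption made for the purpose of the example.

```python
def median_from_counts(counts):
    """Median of ordinal scores given a {score: frequency} mapping."""
    # Expand the frequency table into a sorted list of individual scores.
    scores = sorted(s for s, n in counts.items() for _ in range(n))
    mid = len(scores) // 2
    if len(scores) % 2:
        # Odd number of observations: the single middle score.
        return float(scores[mid])
    # Even number of observations: mean of the two middle scores.
    return (scores[mid - 1] + scores[mid]) / 2


# Frequencies of assigned stages (Table 36) and phases (Table 39).
sample_a = {1: 1, 2: 7, 3: 14, 4: 8, 5: 6, 6: 1}
sample_b = {1: 2, 2: 3, 4: 4, 5: 5, 6: 11, 7: 11}
sample_c = {1: 2, 2: 13, 3: 13, 4: 7, 5: 1, 6: 1}
sample_d = {2: 1, 3: 1, 4: 1, 5: 6, 6: 8, 7: 15, 8: 2}

for label, counts in (("A", sample_a), ("B", sample_b),
                      ("C", sample_c), ("D", sample_d)):
    print(f"Sample {label}: median = {median_from_counts(counts)}")
```

With these counts the sketch returns stage 3 for sample A, stage 6 for sample B, phase 3 for sample C, and 6.5 for sample D, in line with the medians given in the text.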
Figure 52. Distribution of assigned phases for sample D (n=34). [Bar chart: x-axis, Phase (2-8); y-axis, Count of Phase.]
The mean for phase seven (59.2 years) was used to calculate the SEI for
sample D. These calculations do not include multiple-phase designations or individuals
who did not report degree or years of experience. Individuals who held a bachelor’s as
their highest degree had the lowest mean SEI, though the mean SEI for the doctorate
group was very close (Table 40). ANOVA was not run because of the small sample size
for the “doctorate” group and a Student’s t-test revealed no significant difference between
“bachelor” and “master” groups (p=0.098). Individuals with ten or more years of
experience had a lower mean SEI than both other groups (Table 41). No statistical tests of
significance were run because of small sample sizes in the 5-9 and 10+ groups.
Mann et al. Maxillary Sutures
Approximately half of the participants (48.7%) had never used the Mann et al.
maxillary suture method before, while 46.2% of the participants had. Two people did not
record whether or not they were familiar with the method, although they did provide age
estimations based on this method. Of those people that were already familiar with the
method, 16.7% use it on a regular basis and 83.3% do not. The median score for self-
reported level of comfort with the method was two, which corresponds to “low.”
As was observed in the JPAC/CIL identified sample, scores produced by the
participants are difficult to analyze because there was little similarity in reporting
between individuals. For example, using sample E, the only age estimates assigned more
than once were: 20-25 (n=3), 26+ (n=3), 30-85 (n=2), 35+ (n=4), and 35-50 (n=2). Two
individuals did not give an age estimate and the remaining 22 age estimates all differed
from one another. The age estimates for sample F were slightly easier to interpret than
sample E, but suffered from a comparable lack of similarity in reporting. Participants did
not appear to be using the same tables or figures from the reference method in assigning
age estimates.
Two separate “percent confidence” questions were asked for the Mann et al.
maxillary suture method: percent sure that the observations of obliteration were correct
and percent sure that interpretation of age interval was correct. For sample E,
participants’ mean level of confidence for observations of obliteration was 70.53%, with
the most frequent score reported being 60%. Participants were on average 66.7% sure
that their interpretation of the age interval was correct for this sample, although the most
frequently reported score was 75%. For sample F, the mean levels of confidence for
observations of obliteration and interpretation of age interval were higher at 79.2% and
72.2%, respectively. The most frequently reported score for both confidence levels was
80%. Sample F seemed to present less of a problem for age estimation and interpretation
than sample E.
To further illustrate the problem of interpretation, all combinations of suture
obliteration as reported by participants for sample E are shown in Figure 53. This figure
includes only those data points for which the participants circled the obliterated sutures
on the questionnaire. Several individuals made additional notes concerning stages of
suture obliteration but these are omitted here. The five categories following “none”
represent the general pattern of suture obliteration in chronological order, i.e., IN, PMP
obliteration generally occurs before IN, PMP, TP. There are 13 combinations listed here.
The incisive suture was recorded as fused in all but three groups and the posterior median
palatine in all but six. The transverse palatine suture is present in six of the groups and
the transverse palatine within the greater palatine foramina in seven. Finally, the anterior
median palatine suture is present in only three groups. One individual scored this sample
as having no obliterated sutures.
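The group counts described above can be tallied from the 13 combinations plotted in Figure 53. The Python sketch below is one way to do this. The combinations are listed as read from the category labels of the figure, which is an assumption about how those labels should be parsed. The abbreviations are IN (incisive), PMP (posterior median palatine), TP (transverse palatine), TPinGPF (transverse palatine within the greater palatine foramina), and AMP (anterior median palatine).

```python
from collections import Counter

# The 13 combinations of obliterated sutures plotted in Figure 53 (sample E).
combinations = [
    set(),  # "none"
    {"IN"},
    {"IN", "PMP"},
    {"IN", "PMP", "TP"},
    {"IN", "PMP", "TP", "TPinGPF"},
    {"IN", "PMP", "TP", "TPinGPF", "AMP"},
    {"IN", "PMP", "AMP"},
    {"IN", "PMP", "TP", "AMP"},
    {"IN", "PMP", "TPinGPF"},
    {"IN", "TP", "TPinGPF"},
    {"IN", "TPinGPF"},
    {"TP", "TPinGPF"},
    {"TPinGPF"},
]

# Count how many of the combinations include each suture.
groups_with = Counter(suture for combo in combinations for suture in combo)

for suture in ("IN", "PMP", "TP", "TPinGPF", "AMP"):
    present = groups_with[suture]
    absent = len(combinations) - present
    print(f"{suture}: present in {present} groups, absent from {absent}")
```

Note that this counts groups (combinations), not participants; the number of participants who scored each suture as obliterated is a separate tally.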
Figure 54 represents an overall count of the number of times each individual
suture was recorded as obliterated for all participants. Almost all participants agreed that
the incisive suture was obliterated and close to 50% of participants also recorded the
posterior median palatine and the transverse palatine within the greater palatine foramina
as obliterated. More than 50% of the participants agreed that the transverse palatine
suture was not obliterated and almost all agreed that the anterior median palatine suture
was not obliterated. Given these trends and the general pattern of suture obliteration as
given in Mann et al. (1991), sample E most likely represents an individual over the age of
30. However, since age estimate reporting was so sporadic, no attempt was made to
calculate the SEI by experience level for this sample.
Figure 53. All combinations of suture obliteration as reported by participants for sample E (n=38). [Bar chart: y-axis, Count of E; x-axis categories: none; IN; IN, PMP; IN, PMP, TP; IN, PMP, TP, TPinGPF; IN, PMP, TP, TPinGPF, AMP; IN, PMP, AMP; IN, PMP, TP, AMP; IN, PMP, TPinGPF; IN, TP, TPinGPF; IN, TPinGPF; TP, TPinGPF; TPinGPF.]
Figure 54. Frequency of sutures scored as obliterated: sample E. [Bar chart: x-axis, Suture (None, IN, PMP, TP, TPinGPF, AMP); y-axis, Count.]
While the age estimates were not clearly reported for sample F, the pattern of
suture obliteration recorded exhibits high interobserver agreement. Only three categories
of suture obliteration were recorded (Figure 55), and two of the three included the
incisive suture. Only one individual scored the transverse palatine suture as fused and
only two individuals reported the combination of incisive/transverse palatine within the
greater palatine foramina. Figure 56 also displays the consensus between observers of
obliteration of the incisive suture. Given this agreement, the best age estimate for this
sample is between 20 and 25 years, which is also the most frequently reported age
estimate (n=11). However, even with the agreement concerning obliteration of the
incisive suture, age estimates were not easily comparable and no attempt was made to
calculate the SEI based on level of experience.
Figure 55. All combinations of suture obliteration as reported by participants for sample F (n=36). [Bar chart: y-axis, Count of F; x-axis categories: IN; IN, TPinGPF; TP.]
Summary
This chapter presented results from tests of three skeletal age estimation
methods. All three methods were chosen based on analyses of the JPAC/CIL identified
sample. Recommendations for possible method modifications or further research based
on results from this preliminary interobserver error study will be presented in the
following chapter.
Figure 56. Frequency of sutures scored as obliterated: sample F. [Bar chart: x-axis, Suture (None, IN, PMP, TP, TPinGPF, AMP); y-axis, Count.]
CHAPTER VIII
DISCUSSION
This chapter discusses the results from both the retrospective and
interobserver error studies. Generally, all methods perform well for the JPAC/CIL
identified sample, although there are some exceptions to this rule. An overview of
method performance, error, and experience as correlated to error is given below.
General observations of skeletal age estimation for this sample and limitations and biases
of both studies will also be outlined.
Method Performance
It was hypothesized that age estimation methods would perform well in the
JPAC/CIL identified sample because of the use of McKern and Stewart (1957) for age
estimation based on epiphyseal fusion and the pubic symphysis and the young
composition of the sample. The results from the retrospective and interobserver error
studies suggest that age estimation methods are, for the most part, performing well for
individuals in the JPAC/CIL sample. Each method is discussed in further detail below,
followed by a brief comparison of all methods.
Epiphyseal fusion methods perform very well at the CIL, with close to 100%
correct classification for all of the long bone epiphyses. Long bones are usually scored
using McKern and Stewart (1957), although Scheuer and Black (2000) is also
occasionally employed. The data for long bone epiphyses are not especially useful since
most individuals in the sample are listed as “all epiphyses fused,” indicating adult age.
While this is helpful in determining a minimum age, late-fusing epiphyses are more
informative for age estimation, especially in young adults.
Late-fusing epiphyses, such as the iliac crest, medial clavicle, vertebral centra,
and the first two sacral segments, have high potential for age estimation in the JPAC/CIL
sample because they all generally fuse in one’s 20s. When examining correct and
incorrect classifications, the Webb-Suchey medial clavicle and iliac crest and the Albert-
Maples vertebral centra methods perform very well; all three epiphyses exhibit close to
100% correct classification. For the McKern-Stewart method, the iliac crest also has
100% correct classification, but the medial clavicle and vertebral centra drop below 90%.
The age intervals provided for stages of fusion of the vertebral centra are similar between
methods, while the Webb-Suchey clavicle method has much larger intervals per stage of
fusion than the McKern-Stewart clavicle method. Larger age intervals can contribute to a
higher percentage of correct classification.
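The effect of interval width on correct classification can be illustrated with a short Python sketch. All ages and intervals below are hypothetical values chosen only for illustration; they are not drawn from the JPAC/CIL sample or from any of the published methods, and the function name is likewise illustrative.

```python
def correct_classification_rate(known_ages, interval):
    """Percent of known ages falling inside an inclusive (low, high) age interval."""
    low, high = interval
    hits = sum(low <= age <= high for age in known_ages)
    return 100.0 * hits / len(known_ages)


# Hypothetical known ages-at-death for individuals scored at the same stage.
known_ages = [19, 21, 22, 24, 25, 27, 29, 31]

narrow = (20, 25)  # hypothetical narrow interval for the stage
wide = (18, 30)    # hypothetical wide interval for the same stage

print(f"Narrow interval {narrow}: {correct_classification_rate(known_ages, narrow):.1f}%")
print(f"Wide interval {wide}: {correct_classification_rate(known_ages, wide):.1f}%")
```

In this hypothetical case the same individuals are classified correctly 50.0% of the time under the narrow interval and 87.5% of the time under the wide one, even though nothing about the scoring itself has changed.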
Fusion of the first two sacral segments has the lowest correct classification of
all methods employed at the CIL (32.1%). Very few of the individuals that were scored
as stages zero through three fall into the age intervals provided by McKern and Stewart
(1957) for fusion of these two elements (Figure 23). Additionally, the pattern of fusion is
sporadic, with an absence of fusion seen in an individual 30 years old and complete
fusion seen as young as 26 years in this sample. It is recommended that S1-S2 fusion no
longer be used in age estimation, unless new age intervals are devised for the stages or
further research is conducted.
The Mann et al. maxillary suture method had a surprisingly high correct
classification rate (88.7%) considering the general skepticism surrounding the use of
cranial sutures in age estimation (Brooks 1955; Masset 1989). This is in agreement with
Ginter (2005), who suggested that the revised maxillary suture method (Mann et al. 1991)
was more effective at age estimation than more commonly used methods, like the pubic
symphysis and the sternal rib ends. Most incorrect classifications occurred for individuals
under the age of 25. Age estimates given as five-year intervals have negative bias values,
indicating that they tend to underage. All other intervals have positive bias values. There
does not appear to be a clear correlation between error values and estimated age interval
based on maxillary suture obliteration. There is a significant linear relationship between
estimated and known age-at-death (Table 6, r=0.79) for the Mann et al. maxillary suture
method.
Even with low overall bias, inaccuracy, and SEI, a high correct classification
rate, and a good correlation between estimated and known age-at-death, the method
appears to be difficult to apply. Age estimates were difficult to compare between
analysts, a problem that was found both in the retrospective and interobserver error
studies. Even when the same sutures were reported as obliterated, the age intervals
produced by analysts were not always comparable. Using sample F, for which 35 of 36
participants scored the incisive suture as fused, only 11 participants gave the same age
interval (20-25 years). The discrepancies in reporting are probably due to the use of
different tables and figures from the reference article. About half of the participants had
never used the method before, and those who had generally did not use it on a regular
basis, indicating that many of the anthropologists surveyed were not familiar or comfortable
with the Mann et al. maxillary suture method and that the method is not easy to apply.
Another problem in application of this method may be the definition of suture
obliteration. The revised method, based on visual observation of obliteration, does not
clearly define suture obliteration. In fact, the senior author explained that any small
amount of obliteration along the suture counts as obliteration of that suture (personal
communication, Robert Mann 2008), but this is not detailed in the reference article.
Standardization of the method is therefore recommended to include a clear explanation of
what constitutes suture obliteration (i.e., partial, complete), defined age intervals for each
stage of suture obliteration, and summary statistics associated with each interval.
Both dental formation methods (Moorrees et al. 1963; Mincer et al. 1993)
include terminal stages, analogous to complete union in epiphyseal fusion methods.
When these stages are eliminated from analyses, there is a drastic reduction in the
number of correct classifications for both methods. For the Moorrees et al. method,
correct classification occurred for only 51.9% of individuals not classified as “apices
complete” and all incorrect classifications occurred for incomplete stages of root
formation, with the majority taking place in the R1/2 to R3/4 stages (Appendix C). For the
Mincer et al. method, correct classification was 74.1% when the terminal stage (H) was
eliminated from calculations. No incorrect classifications were recorded for stage H, and
the majority occurred in stage G, which directly precedes H in the Demirjian et al. (1973)
system.
Error values as indicated by bias, inaccuracy, and SEI are also very high when
terminal stages are included in analyses. This is to be expected because the terminal
stages offer only a minimum age. Using the mean for this stage to calculate error values
for all individuals results in large error values because older individuals are further away
from the stage mean. Eliminating terminal stage data points decreases the error associated
with both methods. It should also be noted that the JPAC/CIL sample is not a random
sample because of military enlistment requirements. There are no individuals younger
than 17 years old in the sample and this inherent bias accounts for the tendency of both
dental formation methods to underage. Both the Moorrees et al. (1963) and Mincer et al.
(1993) methods were developed using samples of mainly young children and therefore
have mean ages at attainment for each stage of root development that are generally
younger than most individuals in the JPAC/CIL sample.
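The inflation of error values by terminal stages can be sketched in Python as follows. The sketch assumes the conventional definitions of bias (mean of estimated minus known age) and inaccuracy (mean absolute difference); the stage mean and the known ages are hypothetical values chosen for illustration and do not come from the JPAC/CIL sample or from either dental formation method.

```python
def bias(estimated, known):
    """Mean signed error; negative values indicate under-aging."""
    return sum(e - k for e, k in zip(estimated, known)) / len(known)


def inaccuracy(estimated, known):
    """Mean absolute error, regardless of direction."""
    return sum(abs(e - k) for e, k in zip(estimated, known)) / len(known)


TERMINAL_STAGE_MEAN = 20.0    # hypothetical mean age for a terminal stage
known = [19, 22, 25, 30, 38]  # hypothetical known ages-at-death

# Every individual in the terminal stage receives the same point estimate.
estimated = [TERMINAL_STAGE_MEAN] * len(known)

print(f"Bias: {bias(estimated, known):.1f} years")
print(f"Inaccuracy: {inaccuracy(estimated, known):.1f} years")
```

For these hypothetical values the bias is -6.8 years and the inaccuracy is 7.2 years, with most of the error contributed by the oldest individuals, who lie furthest from the stage mean.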
The results from both dental formation methods suggest that these methods
are not appropriate for age estimation using terminal categories, except as a minimum
age. The number of incorrect classifications for stages of incomplete formation also
suggests that dental formation methods are not performing well for late adolescents and
young adults. There is also the possibility that analysts are making determinations based
on visual inspection of teeth no longer in the alveoli (personal communication, Kevin
Torske 2008). This could severely bias the age estimation because a root that appears
incompletely formed may in fact be damaged. Additionally, confusion between assigning
stages of complete root formation (Rc) versus complete apical closure (Ac) cannot be
ruled out, especially considering the small sample sizes for stages of incomplete
formation.
The Suchey-Brooks pubic symphysis method had the highest correct
classification rate of all three pubic symphysis methods, while the Todd method had the
lowest bias, inaccuracy, and SEI. The Todd method also had the smallest sample size and
is therefore not truly comparable to the Suchey-Brooks and McKern-Stewart pubic
symphysis method samples. The Todd method only had a 70% correct classification rate
and a larger sample would be beneficial to fully understand the error associated with this
method. It is important to point out that the Todd method is not included as an approved
age estimation method in SOP 3.4.
The rationale of using McKern-Stewart for males that died before 1960 is that
the method was developed on a sample of identified males from the Korean War. The
Suchey-Brooks method was developed using a more recent sample and should better
represent those individuals who died after 1960. When comparing the two methods using
five-year intervals (Table 22), both perform equally well. The most notable differences
occur in the 41+ age category, which is not unexpected because of the tendency of
skeletal age estimation methods to perform poorly for older individuals (see Meindl and
Lovejoy 1989; Schmitt 2004). Another interesting factor is the juxtaposition of bias
directionality per age interval; where one method overages, the other underages, and vice
versa. This is true for all age intervals except the last two (36-40, 41+).
The Suchey-Brooks method has a higher overall correct classification rate
than the McKern-Stewart method. When the known ages-at-death of individuals are
superimposed over the age intervals given by each method, it is clear that this difference
is attributed to the large age intervals per phase of the Suchey-Brooks method; the only
incorrect classification occurred in phase one. The Suchey-Brooks method is very
accurate, but certainly not as precise as the McKern-Stewart method. This trend was
noted by Saunders et al. (1992) in a test of multiple age estimation methods. While the
overall accuracy of the McKern-Stewart method is not as high as the Suchey-Brooks
method, it is also not as poor as other methods currently in use. However, Figure 28
demonstrates that a possible combination of certain composite score groups may be
warranted, specifically those in the lower end of the spectrum (e.g., 1-2, 3, 4-5).
Concerning Suchey-Brooks, the highest bias, inaccuracy, and SEI values were observed
in phase five, which is also the largest phase range of all five observed phases1.
Given similar results for both the McKern-Stewart and Suchey-Brooks pubic
symphysis methods and the assumption that JPAC/CIL analysts have been applying age
estimation methods according to SOP 3.4, the continued use of both methods is
supported. It was not possible to determine how well either of the methods performs for
older individuals because of the absence of these individuals in the JPAC/CIL identified
sample. Both methods are acceptable for the age of individuals that make up this sample.
The different auricular surface methods are difficult to compare because the
sample size for the Buckberry-Chamberlain method was so small. It is very apparent that
the age intervals provided in the original Lovejoy et al. auricular surface method are far
too narrow, which is why Osborne et al. (2004) published revised statistics and combined
several of the original phases. Correct classification for the Lovejoy et al. method is low
even when multiple phases are assigned by analysts. It is preferable to use multiple
phases when employing the Lovejoy et al. (1985) auricular surface method because of the
very small age intervals given for this method. When multiple phase assignments are
removed from analyses, correct classification is just above 50%. The Lovejoy et al.
auricular surface method is not an accurate age estimation method for the JPAC/CIL
sample.
1 This does not include phase six since this phase was never recorded by CIL analysts.
Applying the Osborne et al. phases and statistics to the same individuals
drastically increased the correct classification rate for this sample. Similar to the Suchey-
Brooks method, the age intervals provided in Osborne et al. (2004) are very large and
equally imprecise. Also similar to the Suchey-Brooks method, the only incorrect
classification using the Osborne et al. statistics occurred in phase one. Phases three and
four had a tendency to overage. All individuals classified into these two phases had
known ages-at-death less than the mean for the phase, which is most likely related to the
age distribution of the sample (Figure 10). There are no individuals with an age-at-death
greater than 42, and the mean age for both phases three and four is greater than or equal
to 42. This trend is also reflected in the error values for each phase of the Osborne et al.
method, which increase with each subsequent phase.
The Buckberry-Chamberlain revised method is the preferred method for
auricular surface age estimation at the JPAC/CIL, except for very young individuals and
partial auricular surfaces. Unfortunately, the method was only used ten times for the
JPAC/CIL identified sample. Of the ten individuals aged with this method, all were
correctly classified. The Buckberry-Chamberlain auricular surface method was also the
method with the highest bias, inaccuracy, and SEI values when compared to all methods
employed at the JPAC/CIL.² Figure 40 demonstrates that the age intervals associated
with each stage are again very large, explaining the 100% correct classification yet high
error values. Only stages one, two, and five were observed for this sample. Both stages
two and five overaged individuals.
² This comparison is made excluding the terminal stages of dental formation methods.
The interobserver test of the Buckberry-Chamberlain method indicated that
analysts were correctly assigning age point estimates and intervals based on composite
scores, but that the problem may actually lie in application of the method. Almost half of
the participants had never used the revised auricular surface method before, although
some people indicated that they were familiar with the descriptive categories of the
Lovejoy et al. method. There was fairly large variation in stage assignment for both
samples, which is problematic because the age intervals are already so large that spanning
multiple stages gives very imprecise age estimates. While the Buckberry-Chamberlain
method has been described as easier to apply than the original method (Mulhern and
Jones 2005; Falys et al. 2006), this was not supported by this study. This may be because
analysts were not familiar or comfortable with the revised method, or because of the
design of the study (discussed further below). Further research needs to be conducted
since this method is listed as the primary method for auricular surface age estimation at
the JPAC/CIL. Overall, the results from the retrospective and interobserver error studies
do not strongly support the auricular surface as a good age indicator.
The sternal rib end method as developed by Iscan and colleagues did not have
a good correct classification rate when the individuals in the JPAC/CIL sample were
assigned to the 95% confidence intervals from the original study (Iscan et al. 1984b).
These age intervals are not the same age intervals that accompany the sternal rib end
casts. Both the age intervals published with the original method and those that
accompany the sternal rib end cast set are far too narrow. In contrast, assigning the
Nawrocki (n.d.) prediction intervals, which are much larger than the Iscan et al. (1984b)
intervals, dramatically increased the number of correct classifications. The only incorrect
classification with these prediction intervals was in phase one. The Iscan et al. sternal rib
end method also has the lowest correlation between known and estimated age-at-death for
all methods listed in Table 6.
Error values for the sternal rib end method are not extremely large because the
known ages-at-death of individuals aged with this method are generally very young and
do not differ greatly from the mean age given per phase. This indicates that the method is
performing adequately for younger individuals when the mean age point estimate is used,
although a larger sample size would be desirable to fully investigate this phenomenon.
No conclusions can be drawn about the performance of this method for older individuals
since no one was assigned to phase five or greater in the JPAC/CIL identified sample.
Results from the preliminary interobserver error study show that analysts are
generally familiar and moderately comfortable with the sternal rib end method and that
almost 50% of participants use it on a regular basis. The availability of casts makes this
method easier to apply than other methods, such as the auricular surface. The variation in
phase assignment for both samples was relatively low. Given the problems with very
small age intervals and the results for sample C, further research is warranted to
determine if phases of the sternal rib end method could possibly be condensed. This was
noted by one participant in the study and reflected by the multiple phase assignments
given by some JPAC/CIL analysts. The sternal rib end method would also benefit from
testing on larger, more varied samples.
Of all groups of methods, the auricular surface performs the poorest as an age
indicator in the JPAC/CIL identified sample based on all error values. The pubic
symphysis methods are the only methods that do not require modifications or further
research, even though smaller age intervals would always be welcome. All methods listed
in Table 6 had a significant linear relationship between estimated and known age-at-
death; this table does not include epiphyseal fusion or dental formation methods.
Error
If methods were developed from correctly balanced samples (Nawrocki 1998)
and are being applied correctly, error should be normally distributed. This error is
represented here by calculations of bias for each method. Additionally, there should be
no significant differences in bias, inaccuracy, or SEI between phases of a single method.
Differences in phases would indicate that assigning an individual to one phase over
another could increase the error in the age estimation. Finally, interobserver error should
also be low, i.e., age estimates produced by different observers should be similar.
No error values were calculated for epiphyseal fusion methods. Age
estimations based on complete stages of union produce high error values because full
union is observed in all skeletally mature adults. The generally high correct classification
rates for the methods employed at the JPAC/CIL and the small sample sizes for stages of
incomplete fusion did not support the use of statistical tests of significance to examine
error.
The distribution of bias for the Mann et al. maxillary suture method was
normal. No tests of significance between age intervals were conducted due to
discrepancies in reporting these intervals and small sample sizes for all but one of the
intervals. Bias, inaccuracy, and SEI were calculated using the midpoint of the reported
age interval. Larger age intervals may have larger error values because individuals
correctly classified in a large interval could fall further away from the midpoint than
those in a small interval. The highest bias, inaccuracy, and SEI values do occur in the
larger age intervals for this method (Table 10). However, two five-year age intervals (15-
20, 25-30) have error values that are comparable to and, in some cases, greater than larger
age intervals. Therefore, the large age intervals do not fully explain the error between
reported age intervals. Larger sample sizes for each age interval would allow for
statistical testing and perhaps elucidate the problem. This method has the potential to be
very accurate and precise, especially for younger age groups.
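The midpoint-based calculation described above can be illustrated with a minimal Python sketch. It assumes the conventional definitions of bias (mean signed difference between estimated and known age) and inaccuracy (mean absolute difference), with the interval midpoint standing in for the estimated age. SEI is left out because its formula is not given in this section. The individuals and intervals below are hypothetical and are not drawn from the JPAC/CIL sample.

```python
from statistics import mean


def midpoint(interval):
    """Return the midpoint of a (low, high) age interval in years."""
    low, high = interval
    return (low + high) / 2


def bias_and_inaccuracy(records):
    """Compute bias and inaccuracy for (known_age, (low, high)) records.

    Assumed definitions: the error for one individual is the interval
    midpoint minus the known age-at-death; bias is the mean signed error
    and inaccuracy is the mean absolute error.
    """
    errors = [midpoint(interval) - known_age for known_age, interval in records]
    return mean(errors), mean(abs(e) for e in errors)


# Hypothetical individuals, all correctly classified within their interval.
narrow_records = [(16, (15, 20)), (19, (15, 20)), (18, (15, 20))]
wide_records = [(21, (20, 35)), (34, (20, 35)), (26, (20, 35))]

narrow_bias, narrow_inaccuracy = bias_and_inaccuracy(narrow_records)
wide_bias, wide_inaccuracy = bias_and_inaccuracy(wide_records)
print(f"15-20 interval: bias={narrow_bias:.2f}, inaccuracy={narrow_inaccuracy:.2f}")
print(f"20-35 interval: bias={wide_bias:.2f}, inaccuracy={wide_inaccuracy:.2f}")
```

In this invented example every individual is correctly classified, yet the wider interval yields the larger inaccuracy simply because known ages can lie further from its midpoint.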
Distributions of bias for the dental formation methods were not created
because of the small sample sizes for non-terminal stages. Dental formation methods
suffer from the same analytical problems as epiphyseal fusion methods because complete
apex closure indicates dentally mature adults, who can be significantly older than the
given stage mean. This creates difficulty in establishing a stage mean, which is required
for statistical analysis. Additionally, when terminal stages are removed from all error
calculations, error drastically decreases for both methods (Table 12). It is apparent that
the high error values beforehand are due to the inclusion of the terminal stages.
To see if different teeth or roots are more prone to error, Student’s t-tests and
ANOVA were run including terminal stages for both methods. The Moorrees et al. dental
formation method showed no significant difference in bias, inaccuracy, or SEI between
teeth 17 and 32. There was a significant difference in bias and inaccuracy between mesial
and distal roots, with mesial roots exhibiting higher error than distal roots. This indicates
that age estimation based on the mesial roots may not be as accurate as the distal roots,
although the difference is only approximately one year. The Mincer et al. third molar
formation method does not distinguish between different tooth roots. There was no
significant difference in error between all four third molars using this method.
The distribution of bias for the Todd pubic symphysis method is not normal,
but this is most likely related to the small sample size. Bias, inaccuracy, and SEI are low
for all individuals who were correctly classified. Error values were the highest for
individuals who were incorrectly classified, since these individuals fall further away from
the interval midpoint than those individuals who were correctly classified. No tests of
significance were conducted because of the small sample size for phases of the Todd
method.
The distribution of bias for the McKern-Stewart pubic symphysis method is
fairly normal, with a slightly larger number of negative bias values. Statistical tests of
significance were run to compare possible differences in bias, inaccuracy, and SEI
between composite score groups 6-7, 8-9, and 11-12-13. These three groups were the
only groups with large enough sample sizes. ANOVA with the Bonferroni correction
revealed that composite score group 8-9 was problematic; it had a higher negative bias
than all other groups. Inaccuracy was not significantly different, but was higher than in
the lower composite score groups. There was no significant difference in mean SEI between
the three groups compared. The highest SEI values occurred in the last two composite
score groups (14 and 15). This can be explained by the incorrect classification of the one
individual assigned to 14 and the large age interval associated with 15 (36+). Larger
sample sizes for all score groups would greatly enhance between-group analyses.
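The effect of the Bonferroni correction on a three-group comparison such as the one above can be sketched briefly in Python. The sketch assumes a familywise alpha of 0.05 (the level used in the thesis is not stated in this section), and the p-values are invented for illustration; they do not reproduce the results reported here.

```python
from itertools import combinations


def bonferroni_pairs(groups, p_values, alpha=0.05):
    """Flag pairwise comparisons that remain significant after correction.

    The familywise alpha is divided by the number of pairwise comparisons,
    and each (hypothetical) p-value is compared against that threshold.
    """
    pairs = list(combinations(groups, 2))
    threshold = alpha / len(pairs)
    return threshold, {pair: p_values[pair] < threshold for pair in pairs}


# Composite score groups named in the text; the p-values are made up.
groups = ["6-7", "8-9", "11-12-13"]
p_values = {
    ("6-7", "8-9"): 0.004,
    ("6-7", "11-12-13"): 0.30,
    ("8-9", "11-12-13"): 0.03,
}

threshold, significant = bonferroni_pairs(groups, p_values)
print(f"corrected threshold: {threshold:.4f}")
for pair, flag in significant.items():
    print(pair, "significant" if flag else "not significant")
```

With three groups there are three pairwise comparisons, so an uncorrected p-value of 0.03 no longer counts as significant once the threshold drops to roughly 0.0167.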
The distribution of bias for the Suchey-Brooks pubic symphysis method is
also relatively normal, albeit slightly skewed to the right, indicating slightly
higher positive bias values. ANOVA with the Bonferroni correction run between the first
four phases indicated that phase one was significantly different in bias and inaccuracy
from phases two through four. This is logical because phase one is the only phase with
negative bias. It also has an average error in years (inaccuracy) that is greater than those
of phases two and three, but less than phase four. Additionally, phase one is the only
phase in which incorrect classification occurred for this method. There is no significant
difference in SEI between the first four phases and SEI is relatively high overall. This is
most likely related to the large confidence intervals associated with phases of the Suchey-
Brooks method, which are accurate but not precise.
The Suchey-Brooks method, while being largely supported as the most
reliable and widely employed skeletal age estimation method, has not been subjected to
analysis of interobserver error. The analyst using the Suchey-Brooks system thus assumes
that he or she is scoring a given pubis in the same manner that the method developers
would and that the method works as published (personal communication, John Byrd
2009). By using intervals that span two standard deviations from the mean, Suchey
and colleagues have created sufficiently large intervals to account for error associated
with the assigned phase. However, there is no discussion of error in phase assignment,
specifically by those that are less experienced in age estimation from the pubic
symphysis.
Distribution of bias for the Lovejoy et al. auricular surface method shows two
very large peaks on the positive side of the graph. The bias is not normally distributed
around zero, which is consistent with the majority of positive bias values for phases of
this method. ANOVA with the Bonferroni correction indicates that phase three is
problematic for this method. Bias, inaccuracy, and SEI are similar for phases
four and five, but these phases did not have large enough sample sizes for tests of
statistical significance. Even with small age intervals, the Lovejoy et al. method has large
SEI values, indicating that the problem is with the distribution of ages within the original
intervals. This method is precise, but not at all accurate.
The distribution of bias for the Osborne et al. auricular surface method is not
normal. There are numerous high peaks of negative bias values and the distribution is
skewed to the right. The only two phases with large enough sample sizes for statistical
comparisons were phases one and two. Student’s t-tests comparing bias, inaccuracy, and
SEI between these phases revealed that there were significant differences in bias and
inaccuracy, but not SEI. Error values increase for each consecutive phase of this method
as the distribution of known-aged individuals becomes less well centered around the
phase means. The Osborne et al. method has high SEI values because of the large age
intervals that are associated with each phase. The high level of correct classifications of
individuals aged using this method accompanied by high SEI values indicates that this
method is accurate but not precise.
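The contrast drawn above between a method that is precise but not accurate and one that is accurate but not precise can be made concrete with a small Python sketch. It takes accuracy to mean the correct classification rate (the known age falls inside the assigned interval) and uses mean interval width as a rough stand-in for precision; both the function names and the data are hypothetical and are not taken from the Lovejoy et al. or Osborne et al. publications.

```python
def correct_classification_rate(records):
    """Fraction of (known_age, (low, high)) records with low <= known_age <= high."""
    hits = sum(1 for known_age, (low, high) in records if low <= known_age <= high)
    return hits / len(records)


def mean_interval_width(records):
    """Average width in years of the assigned age intervals."""
    return sum(high - low for _, (low, high) in records) / len(records)


# The same four hypothetical individuals, aged with narrow and with wide intervals.
narrow_intervals = [(22, (25, 29)), (25, (25, 29)), (31, (35, 39)), (38, (35, 39))]
wide_intervals = [(22, (18, 45)), (25, (18, 45)), (31, (20, 60)), (38, (20, 60))]

for label, records in (("narrow", narrow_intervals), ("wide", wide_intervals)):
    rate = correct_classification_rate(records)
    width = mean_interval_width(records)
    print(f"{label}: correct classification={rate:.0%}, mean width={width:.1f} years")
```

The narrow intervals misclassify half of the invented individuals, while the wide intervals capture all of them at the cost of a much less informative estimate.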
No distribution of bias was created for the Buckberry-Chamberlain revised
auricular surface method because of the small sample size and the paucity of recorded
stages in the JPAC/CIL identified sample. No tests of significance were conducted either.
This method has low bias and inaccuracy values for all stages, with the exception of stage
five. However, the SEI values are the highest of all methods employed at the JPAC/CIL.
This method is clearly accurate but not at all precise. The interobserver error study
revealed that analysts may not be correctly applying the method due to large variation in
stage assignment, which is problematic considering the already large age intervals it
predicts.
The distribution of bias for the Iscan et al. sternal rib end method was not
normal, which is related to the small sample size available for analysis once
multiple phase designations were eliminated. No statistical tests of significance were
conducted because of small sample sizes for the four phases observed in the JPAC/CIL
identified sample. Bias and inaccuracy were very low for the first four phases, with the
exception of a high negative bias value for phase one. This is due to the incorrect
classification of one individual with a known age-at-death of 24 years. This also
influenced the SEI for this phase. These discrepancies would not be so dramatic if the
sample sizes per phase were larger. Using the Nawrocki prediction intervals for the first
four phases produced both accurate and precise age estimations. The interobserver error
study also indicated that phase assignment between individuals was relatively consistent.
The poor performance of the auricular surface methods is again confirmed by
the error values discussed above. All other methods perform better in general than the
auricular surface methods, though not without their own caveats. It is important to note
that high SEI values can be related to large confidence intervals, poorly constructed
confidence intervals from the reference method, and incorrect classifications. It is
therefore important to analyze this index in conjunction with other measures of method
performance and error.
Analyst Experience
As Maples (1989:323) suggested, successful age estimation comes from
analysts “with a long and wide experience with a variety of techniques.” It was
hypothesized that the SEI would be dependent on experience. Therefore, individuals with
more experience in skeletal aging, measured in this study by highest degree obtained
(doctorate) and number of years of experience with skeletal aging (10+), should have
lower average SEIs for measured samples than those individuals with less experience.
For the Buckberry and Chamberlain (2002) revised auricular surface method,
those individuals with a doctorate actually had the highest mean SEI for sample A, but
the lowest mean SEI for sample B. The same trend was observed for the 10+ years of
experience group. Because the sample sizes for individuals with a doctorate and 10+
years of experience were so small, no statistical tests of significance were possible
between this and other degree-level groups. No significant difference in SEI was present
between individuals with bachelor’s and master’s degrees.
Participants were not highly confident that their observations corresponded to
the correct composite score and the self-reported level of comfort was low for this
method. Half of the participants had never used the Buckberry-Chamberlain revised
auricular surface method before this study, which includes four of the five individuals
with doctorates and four of the seven individuals with 10+ years of experience. Lack of
familiarity with the method and the small sample size of individuals with doctorates or
10+ years of experience are most likely affecting the mean SEI for both groups, rendering
the comparison between experience levels of little utility.
For the Iscan et al. (1984b) sternal rib end method, individuals with a doctoral
degree had a much lower average SEI than other groups for sample C, but not for sample
D. Those individuals with greater than ten years of experience in skeletal aging had
markedly lower average SEIs for both samples. There was no significant difference in
SEI between individuals with bachelor’s and master’s degrees. The results for the Iscan et
al. sternal rib end method, especially for the average SEI based on years of experience in
skeletal aging, support the hypothesis that individuals with more experience will
generally have lower error associated with their age estimations. Additionally,
participants were fairly confident that their observations corresponded to the correct
phase and the self-reported level of comfort for this method was average. Of the three
methods tested, participants were most familiar with the Iscan et al. sternal rib end
method and all individuals with the highest levels of experience had used this method
before.
It was not possible to calculate a SEI for either sample tested using the Mann
et al. maxillary suture method because of a general lack of similarity between participants
in reporting age estimates. Similar to the Buckberry-Chamberlain method, only half of
the participants had used this method prior to this study and, of these individuals, very
few use it on a regular basis. Self-reported level of comfort with the method is low, but
confidence levels both for observations of obliteration and interpretation of age intervals
were similar to those reported for the sternal rib end method. This is interesting,
especially considering the large number of combinations of suture obliteration and age
intervals recorded for sample E. Because individuals had so little experience with this
method, analyzing error based on experience would probably produce similar results to
the SEI for the Buckberry-Chamberlain method.
Two of the three methods (Buckberry-Chamberlain and Mann et al.)
examined in the interobserver error study were chosen because it was believed that they
were not as well-known or understood as other more commonly used skeletal age
estimation methods. Therefore, the results from this portion of the study are not highly
informative with regard to a possible correlation between error and experience. The exception to this is
the Iscan et al. method, which appears to support the hypothesis that those individuals
with more experience have lower average SEI than those individuals with less
experience. These results do suggest that experience with individual methods of age
estimation is more important than overall experience in skeletal age estimation or highest
degree held.
General Observations
Analysis of known age-at-death distributions for each sample indicated that
there were significant differences in mean age-at-death between sub-samples. Notably,
epiphyseal union and dental formation methods have lower mean ages-at-death than the
total identified known age-at-death sample and the pubic symphysis and auricular surface
methods have higher mean ages-at-death. This is related to the age indicators being
measured in each group of methods. The first group is concerned with processes of late
development and the second group with age-related changes in skeletally mature adults.
Therefore, the different distributions observed are to be expected.
Sources of error in age estimation methods at the JPAC/CIL include analyst
error and the methods themselves. Analyst error is random while method error is
systematic. Neither source of error can be completely removed for skeletal age
estimation, but it is important to estimate uncertainty in measurement to avoid overstating
the performance of a method.
SOP 3.4 stipulates the use of specific age estimation methods unless alternatives are approved
by lab management. Approved age estimation methods at the JPAC/CIL include: McKern
and Stewart (1957) epiphyseal fusion, Scheuer and Black (2002) epiphyseal fusion,
Moorrees et al. (1963) dental formation, Mincer et al. (1993) dental formation, McKern
and Stewart (1957) pubic symphysis, Suchey-Brooks pubic symphysis, Buckberry and
Chamberlain (2002) revised auricular surface, Lovejoy et al. (1985b) auricular surface
with Osborne et al. (2004) age intervals, and Iscan et al. (1984b) sternal rib end. All other
age estimation methods require full documentation and citation. Methods analyzed here
that are not listed in SOP 3.4 include: Albert and Maples (1995) vertebral centra union,
Webb and Suchey (1985) medial clavicle and iliac crest union, Todd (1920, 1921) pubic
symphysis, and Mann et al. (1991) maxillary suture obliteration. With the exception of
the Todd pubic symphysis method, all of these methods perform well in the JPAC/CIL
sample and their addition to the laboratory manual should be seriously considered.
Limitations of the Thesis
All scientific research has its limitations and the studies presented here are no
exception. The first problem with measuring uncertainty in age estimation for the
JPAC/CIL identified sample is the reliance on written records. Since identified
individuals are returned to their families upon identification, no skeletal remains are
available for analysis. The records at the JPAC/CIL are complete in most cases, but data
collection follows the assumption that analysts are properly recording all analyses and
following SOPs. There is no possibility to confirm or refute their conclusions based on
skeletal material. Earlier records are problematic because there has not always been a
quality assurance program in place.
Another problem concerns which elements are available for analysis. The sample sizes of
methods analyzed in this thesis are dependent upon recovery and preservation of age-
related skeletal indicators. Elements that are poorly preserved, such as the sternal rib end,
have a smaller sample size. Age distributions per method could therefore be dependent on
those elements present at the time of analysis.
Single-method analysis does not replicate the reality of age estimation.
Maples (1989) recommended the use of as many techniques as possible when
constructing individual age intervals. This practice uses all available elements to provide
an age estimate with as much age-related information as possible. Additionally, this
procedure is designed to ensure that the anthropologist is establishing an accurate and
precise interval. The goal of this study was not to analyze overall age estimations for each
individual in the JPAC/CIL identified sample, but to quantify error for each isolated
method.
The interobserver error study was conducted largely at a professional
conference, which does not replicate laboratory conditions. The study was run in a large
exhibit hall, with, as noted by one participant, poor lighting and many possible
distractions. Participants were likely to feel rushed since their main purpose at the
conference was not to complete this study. The table space was also cramped and not
conducive to extended analyses of the samples.
There also seemed to be considerable confusion with application of at least
two of the methods (Buckberry-Chamberlain revised auricular surface and Mann et al.
maxillary suture obliteration). Very few participants referenced the Iscan et al. (1984b)
article and instead used the age intervals provided with the cast set, although these are not
the same as the original article. It was difficult for participants to read and understand
reference methods with which they were not already familiar, even when the articles were
provided. This probably introduced significant error to the study, although a partial goal
of the study was to see how easily methods could be applied by individuals that were not
familiar with them.
Summary
This chapter discussed results obtained from the retrospective and
interobserver error studies conducted for this thesis. Analysis of method performance as
indicated by classification rates and error values indicated that age estimations generally
perform well for individuals in the JPAC/CIL identified sample. Exceptions to this were
outlined. General observations for age estimation at the JPAC/CIL and limitations of the
current research were also given. The following chapter will summarize this thesis and
provide ideas for future research.
CHAPTER IX
SUMMARY
The aim of this thesis was to examine and quantify measurement uncertainty
associated with skeletal age estimation methods employed at the JPAC/CIL. This study
was the first to span so many years and methods at a single institution. This chapter
summarizes the findings of this research and discusses future research to be conducted in
age estimation and measurement uncertainty.
Uncertainty in Skeletal Age Estimation
Both the retrospective and interobserver error studies presented in this thesis
have produced valuable data concerning age estimation. Error in age estimation is both
random and systematic, caused by operators and problems with the reference methods.
Both of these sources of error will never be completely eliminated, but can be controlled
for via quality assurance programs and continued testing of and research on age
estimation methods. Research in the reliability and accuracy of methods is crucial to
further development in the forensic sciences and will continue to drive inquiry in related
fields.
Age estimation methods perform well for the JPAC/CIL identified sample.
This is most likely related to the age composition of the sample, which is largely between
the ages of 18 and 30. The legacy of McKern and Stewart also continues to provide a
strong analytical framework for age estimations conducted at the JPAC/CIL. Certain
methods still present problems, such as the auricular surface and, to some extent, the
sternal rib end. Other methods should be eliminated from age estimation procedures,
specifically the fusion of the first two sacral segments. Finally, some methods produced
surprising results and merit further consideration, such as maxillary suture closure.
Future Research
There is a vast array of future research topics that have arisen as a result of
this thesis. It is important to note that this thesis is the first attempt at estimating
uncertainty of skeletal age estimation at the JPAC/CIL and as such represents the
beginning of quantifying error associated with these methods. Undoubtedly, with the
current focus on quality assurance and identification, all methods employed at the
JPAC/CIL will need to be subjected to uncertainty analysis. This includes methods for
estimating stature, sex, ancestry, and other portions of the biological profile.
Validation studies are at the forefront of forensic science research. They
represent the efforts of forensic scientists to understand the error associated with the
methods they use and to quantify reliability and accuracy of these methods. Accreditation
and quality assurance are the future of forensic anthropology. Therefore, future research
related to methods of human identification must be prepared to detail uncertainty in
measurement in accordance with national and international standards, such as ISO/IEC
17025.
Using the records at the JPAC/CIL, it would be interesting to conduct future
research to examine the frequency of elements recovered. This could include what
percentage of age-related elements are recovered and analyzed. The preservation of these
elements should also be considered as a possible factor of age estimation, including the
choice of method or methods used. Element preservation and recovery may also have a
significant impact on measurement uncertainty.
One method that showed promise for accurate and precise age estimation was
the Mann et al. (1991) maxillary suture method. Most participants in the interobserver
error study expressed surprise when asked to use a suture closure method. This method
was explicitly chosen because of its good performance in the JPAC/CIL
identified sample juxtaposed against a disdain for suture closure methods as related to
age estimation. The method needs to be standardized and tested on more populations,
including documented age-at-death collections.
Improvements to other methods are also recommended. The McKern-Stewart
pubic symphysis method would benefit from a reanalysis of the age intervals associated
with each composite score group; it may be possible to combine several of the lower age
categories. Iscan and colleagues’ sternal rib end method could also benefit from the
combination of some phases, as well as validation studies on different populations and
varied age groups. The Buckberry-Chamberlain auricular surface method, while not
promising given the results in these studies, may suffer from a lack of familiarity.
Training in using the method as proposed by its authors would be worthwhile, as well as
continually testing it with different populations.
A more expansive interobserver error study is also desirable. The study
presented here was preliminary and designed to initiate research concerning three
methods that presented problems in the JPAC/CIL identified known age-at-death sample.
Further studies should find a way to simplify the methods being analyzed, such as putting
all pertinent information and a step-by-step procedure onto small posters (personal
communication, Eric Bartelink 2009). Additionally, a quieter setting would be preferred
as well as the testing of only one method at a time to reduce confusion. One final
improvement would be to use samples of known age-at-death in order to examine method
reliability.
Understanding the uncertainty associated with age estimation is crucial to
correct applications of these methods by forensic anthropologists, paleodemographers,
and bioarchaeologists. Quantifying the error of any scientific method allows for the
continued improvement and development of practices, procedures, and techniques related
to the method. Age estimation from adult skeletal remains will most likely always present
challenges to physical anthropologists as more reliable, precise, and accurate methods are
pursued. The emphasis now lies on elucidating the error associated with these methods.
REFERENCES CITED
Adams, Bradley J., and John E. Byrd
2002 Interobserver Variation of Selected Postcranial Skeletal Measurements. Journal of Forensic Sciences 47(6):1193-1202.
Albert, Arlene Midori, and William R. Maples
1995 Stages of Epiphyseal Union for Thoracic and Lumbar Vertebral Centra as a Method of Age Determination for Teenage and Young Adult Skeletons. Journal of Forensic Sciences 40(4):623-633.
American Society of Crime Laboratory Directors/Lab Accreditation Board
2007 Estimating Uncertainty of Measurement Policy. AL-PD-3008-Ver 2.0.
2008 Updated Uncertainty of Measurement Requirements. AL-PD-3033-Ver 1.0.
Arany, Szilvia, Mitsuyoshi Iino, and Naofumi Yoshioka
2004 Radiographic Survey of Third Molar Development in Relation to Chronological Age Among Japanese Juveniles. Journal of Forensic Sciences 49(3):1-5.
Aykroyd, Robert G., David Lucy, A. Mark Pollard, and Charlotte A. Roberts
1999 Nasty, Brutish, but Not Necessarily Short: A Reconsideration of the Statistical Methods Used to Calculate Age at Death from Adult Human Skeletal and Dental Age Indicators. American Antiquity 64(1):55-70.
Aykroyd, R. G., D. Lucy, A. M. Pollard, and T. Solheim
1997 Technical Note: Regression Analysis in Adult Age Estimation. American Journal of Physical Anthropology 104:259-265.
Baccino, Eric, Douglas H. Ubelaker, Lee-Ann C. Hayek, and A. Zerilli
1999 Evaluation of Seven Methods of Estimating Age at Death from Mature Human Skeletal Remains. Journal of Forensic Sciences 44(5):931-936.
Bass, William M.
2005 Human Osteology: A Laboratory and Field Manual. 5th Edition. Columbia, MO: Missouri Archaeological Society, Special Publication No. 2.
Bedford, M. E., K. F. Russell, and C. O. Lovejoy
1989 The Auricular Surface Aging Technique. 16 color photographs with descriptions. Kent, Ohio: Kent State University.
Bedford, M. E., K. F. Russell, C. O. Lovejoy, R. S. Meindl, S. W. Simpson, and P. L. Stuart-Macadam
1993 Test of the Multifactorial Aging Method Using Skeletons with Known Ages-at-Death from the Grant Collection. American Journal of Physical Anthropology 91(3):287-297.
Berg, Gregory E.
2008 Pubic Bone Age Estimation in Adult Women. Journal of Forensic Sciences 53(3):569-577.
Bernard, H. Russell
2002 Research Methods in Anthropology: Qualitative and Quantitative Approaches. Third Edition. Walnut Creek: AltaMira Press.
Blankenship, Jane A., Harry H. Mincer, Kenneth M. Anderson, Marjorie A. Woods, and Eddie L. Burton
2007 Third Molar Development in the Estimation of Chronologic Age in American Blacks as Compared with Whites. Journal of Forensic Sciences 52(2):428-433.
Bocquet-Appel, Jean-Pierre, and Claude Masset
1982 Farewell to Paleodemography. Journal of Human Evolution 11:321-333.
Brach, Raymond M., and Patrick F. Dunn
2004 Uncertainty Analysis in Forensic Science. Tucson, AZ: Lawyers and Judges Publishing Company, Inc.
Brooks, Sheilagh Thompson
1955 Skeletal Age at Death: the Reliability of Cranial and Pubic Age Indicators. American Journal of Physical Anthropology 13:567-597.
Brooks, S., and J. M. Suchey
1990 Skeletal Age Determination Based on the Os Pubis: a Comparison of the Acsádi-Nemeskéri and Suchey-Brooks Methods. Human Evolution 5(3):227-238.
Buckberry, J. L., and A. T. Chamberlain
2002 Age Estimation from the Auricular Surface of the Ilium: A Revised Method. American Journal of Physical Anthropology 119:231-239.
Buikstra, Jane E., and Douglas H. Ubelaker
1994 Standards for Data Collection from Human Skeletal Remains. Arkansas Archeological Survey Research Series, 44. Fayetteville: Arkansas Archeological Survey.
Byers, Steven N.
2008 Introduction to Forensic Anthropology: A Textbook. Boston: Allyn and Bacon.
Cardoso, Hugo F. V.
2008 Age Estimation of Adolescent and Young Adult Male and Female Skeletons II, Epiphyseal Union at the Upper Limb and Scapular Girdle in a Modern Portuguese Skeletal Sample. American Journal of Physical Anthropology 137:97-105.
Chaillet, Nils, and Arto Demirjian
2004 Dental Maturity in South France: A Comparison Between Demirjian’s Method and Polynomial Functions. Journal of Forensic Sciences 49(5):1-8.
Chaillet, Nils, Marjatta Nyström, Matti Kataja, and Arto Demirjian
2004 Dental Maturity Curves in Finnish Children: Demirjian’s Method Revisited and Polynomial Functions for Age Estimation. Journal of Forensic Sciences 49(6):1-8.
Chamberlain, Andrew T.
2006 Demography in Archaeology. New York: Cambridge University Press.
Christensen, Angi M.
2004 The Impact of Daubert: Implications for Testimony and Research in Forensic Anthropology (and the Use of Frontal Sinuses in Personal Identification). Journal of Forensic Sciences 49(3):1-4.
Demirjian, A., H. Goldstein, and J. M. Tanner
1973 A New System of Dental Age Assessment. Human Biology 45(2):211-227.
DiGangi, Elizabeth A., Jonathan D. Bethard, Erin H. Kimmerle, and Lyle W. Konigsberg
2009 A New Method for Estimating Age-at-Death from the First Rib. American Journal of Physical Anthropology 138(2):164-176.
Dirkmaat, Dennis C., Luis L. Cabo, Stephen D. Ousley, and Steven A. Symes
2008 New Perspectives in Forensic Anthropology. Yearbook of Physical Anthropology 51:33-52.
Edgar, Heather J. H.
2005 Prediction of Race Using Characteristics of Dental Morphology. Journal of Forensic Sciences 50(2):269-273.
Falys, Ceri G., Holger Schutkowski, and Darlene A. Weston
2006 Auricular Surface Ageing: Worse Than Expected? A Test of the Revised Method on a Documented Historic Skeletal Assemblage. American Journal of Physical Anthropology 130:508-513.
Gilbert, B. Miles, and Thomas W. McKern
1973 A Method for Aging the Female Os Pubis. American Journal of Physical Anthropology 38:31-38.
Ginter, Jaime K.
2005 A Test of the Effectiveness of the Revised Maxillary Suture Obliteration Method in Estimating Adult Age at Death. Journal of Forensic Sciences 50(6):1303-1309.
Grivas, Christopher R., and Debra A. Komar
2008 Kumho, Daubert, and the Nature of Scientific Inquiry: Implications for Forensic Anthropology. Journal of Forensic Sciences 53(4):771-776.
Gruspier, Kathy L., and Grant J. Mullen
1991 Maxillary Suture Obliteration: A Test of the Mann Method. Journal of Forensic Sciences 36(2):512-519.
Hanihara, Kazuro, and Takao Suzuki
1978 Estimation of Age from the Pubic Symphysis by Means of Multiple Regression Analysis. American Journal of Physical Anthropology 48:233-240.
Hoppa, Robert D.
2000 Population Variation in Osteological Aging Criteria: An Example from the Pubic Symphysis. American Journal of Physical Anthropology 111:185-191.
Igarashi, Yuriko, Kagumi Uesu, Tetsuaki Wakebe, and Eisaku Kanazawa
2005 New Method for Estimation of Adult Skeletal Age at Death From the Morphology of the Auricular Surface of the Ilium. American Journal of Physical Anthropology 128:324-339.
International Organization for Standardization
2004a Guide to the Expression of Uncertainty in Measurement (GUM)–Supplement 1: Numerical Methods for the Propagation of Distributions. DGUIDE 99998. Geneva, Switzerland.
2004b International Vocabulary of Basic and General Terms in Metrology (VIM). DGUIDE 99999. Geneva, Switzerland.
2005 General Requirements for the Competence of Testing and Calibration Laboratories. ISO/IEC 17025 International Standard. Geneva, Switzerland.
Işcan, Mehmet Yaşar
1989a Assessment of Age at Death in the Human Skeleton. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 5-18. Springfield: Charles C. Thomas.
1989b Research Strategies in Age Estimation: the Multiregional Approach. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 325-339. Springfield: Charles C. Thomas.
Işcan, M. Yaşar, and Susan R. Loth
1986a Determination of Age from the Sternal Rib End in White Males: A Test of the Phase Method. Journal of Forensic Sciences 31:122-132.
1986b Determination of Age from the Sternal Rib End in White Females: A Test of the Phase Method. Journal of Forensic Sciences 31:990-999.
Işcan, M. Yaşar, Susan R. Loth, and E. H. Scheuerman
1989 Assessment of Age from the Combined Use of the Sternal End of the Rib and Pubic Symphysis. Paper presented at the annual meeting of the American Academy of Forensic Sciences.
Işcan, M. Yaşar, Susan R. Loth, and Ronald K. Wright
1984a Metamorphosis at the Sternal Rib End: A New Method to Estimate Age at Death in White Males. American Journal of Physical Anthropology 65:147-156.
1984b Age Estimation from the Rib by Phase Analysis: White Males. Journal of Forensic Sciences 29(4):1094-1104.
1985 Age Estimation from the Rib by Phase Analysis: White Females. Journal of Forensic Sciences 30(3):853-863.
1987 Racial Variation in the Sternal Extremity of the Rib and Its Effect on Age Determination. Journal of Forensic Sciences 32(2):452-466.
Joint POW/MIA Accounting Command/Central Identification Laboratory
2008 JPAC Laboratory Manual. Part IV, SOP 4.0. Last revised 02 April 2008.
Katz, Darryl, and Judy Myers Suchey
1986 Age Determination of the Male Os Pubis. American Journal of Physical Anthropology 69(4):427-435.
Kerley, E. R.
1965 The Microscopic Determination of Age in Human Bone. American Journal of Physical Anthropology 23:149-163.
1970 Estimation of Skeletal Age: After about Age 30 Years. In Personal Identification in Mass Disasters. T.D. Stewart, ed. Pp. 57-70. Washington, DC: National Museum of Natural History.
Kerley, E.R., and D.H. Ubelaker
1978 Revisions in the Microscopic Method of Estimating Age at Death in Human Cortical Bone. American Journal of Physical Anthropology 49:545-546.
Komar, Debra A., and Jane E. Buikstra
2008 Forensic Anthropology: Contemporary Theory and Practice. New York: Oxford University Press.
Konigsberg, Lyle W., and Susan R. Frankenberg
1992 Estimation of Age Structure in Anthropological Demography. American Journal of Physical Anthropology 89:235-256.
2002 Deconstructing Death in Paleodemography. American Journal of Physical Anthropology 117(4):297-309.
Konigsberg, Lyle W., Susan R. Frankenberg, and Renee B. Walker
1994 Regress What on What? Paleodemographic Age Estimation as a Calibration Problem. In Integrating Archaeological Demography: Multidisciplinary Approaches to Prehistoric Populations. Richard R. Paine, ed. Pp. 64-88. Occasional Paper No. 24. Center for Archaeological Investigations, Southern Illinois University, Carbondale.
Krogman, W. M.
1939 A Guide to the Identification of Human Skeletal Material. FBI Law Enforcement Bulletin 8:1-29.
Kunos, Charles A., Scott W. Simpson, Katherine F. Russell, and Israel Hershkovitz
1999 First Rib Metamorphosis: Its Possible Utility for Human Age-at-Death Estimation. American Journal of Physical Anthropology 110:303-323.
Kutyla, Alicja K.
2008 The Sacral Auricular Surface and its Significance in Age Estimation. Paper presented at the annual meeting of the American Association of Physical Anthropologists. Columbus, OH. Annual Meeting Issue 2008: Supplement 46 (abstract).
Levin, Jack, and James Alan Fox
2007 Elementary Statistics in Social Research: The Essentials. Second Edition. Boston: Pearson Education, Inc.
Lovejoy, C. Owen, Richard S. Meindl, Robert P. Mensforth, and Thomas J. Barton
1985a Multifactorial Determination of Skeletal Age at Death: A Method and Blind Tests of Its Accuracy. American Journal of Physical Anthropology 68(1):1-14.
Lovejoy, C. Owen, Richard S. Meindl, Thomas R. Pryzbeck, and Robert P. Mensforth
1985b Chronological Metamorphosis of the Auricular Surface of the Ilium: A New Method for the Determination of Adult Skeletal Age at Death. American Journal of Physical Anthropology 68:15-28.
Loth, Susan R., and Mehmet Yaşar Işcan
1989 Morphological Assessment of Age in the Adult: the Thoracic Region. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 105-135. Springfield: Charles C. Thomas.
Lucy, D., R. G. Aykroyd, A. M. Pollard, and T. Solheim
1996 A Bayesian Approach to Adult Human Age Estimation from Dental Observations by Johanson’s Age Changes. Journal of Forensic Sciences 41(2):189-194.
Mann, Robert W., Richard L. Jantz, William M. Bass, and Patrick S. Willey
1991 Maxillary Suture Obliteration: A Visual Method for Estimating Skeletal Age. Journal of Forensic Sciences 36(3):781-791.
Mann, Robert W., Steven A. Symes, and William M. Bass
1987 Maxillary Suture Obliteration: Aging the Human Skeleton Based on Intact or Fragmentary Maxilla. Journal of Forensic Sciences 32(1):148-157.
Maples, William R.
1989 The Practical Application of Age-Estimation Techniques. In Age Markers in the Human Skeleton. Işcan, Mehmet Yaşar, ed. Pp. 319-324. Springfield: Charles C. Thomas.
Martrille, Laurent, Douglas H. Ubelaker, Cristina Cattaneo, Fabienne Seguret, Marie Tremblay, and Eric Baccino
2007 Comparison of Four Skeletal Methods for the Estimation of Age at Death on White and Black Adults. Journal of Forensic Sciences 52(2):302-307.
Masset, Claude
1989 Age Estimation on the Basis of Cranial Sutures. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 71-103. Springfield: Charles C. Thomas.
McKern, Thomas W.
1970 Estimation of Skeletal Age: From Puberty to About 30 Years of Age. In Personal Identification in Mass Disasters. Pp. 41-56. T.D. Stewart, ed. Washington, DC: National Museum of Natural History, Smithsonian Institution.
McKern, Thomas W., and T. D. Stewart
1957 Skeletal Age Changes in Young American Males. Technical Report EP-45. Natick, MA: Quartermaster Research and Development Command.
Meindl, Richard S., and C. Owen Lovejoy
1985 Ectocranial Suture Closure: A Revised Method for the Determination of Skeletal Age at Death Based on the Lateral-Anterior Sutures. American Journal of Physical Anthropology 68:57-66.
1989 Age Changes in the Pelvis: Implications for Paleodemography. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 137-168. Springfield: Charles C. Thomas.
Meindl, Richard S., C. Owen Lovejoy, Robert P. Mensforth, and Robert A. Walker
1985 A Revised Method of Age Determination Using the Os Pubis, With a Review and Tests of Accuracy of Other Current Methods of Pubic Symphyseal Aging. American Journal of Physical Anthropology 68:29-45.
Mensforth, Robert P.
1990 Paleodemography of the Carlston Annis (Bt-5) Late Archaic Skeletal Population. American Journal of Physical Anthropology 82:81-99.
Milner, George R., James W. Wood, and Jesper L. Boldsen
2008 Advances in Paleodemography. In Biological Anthropology of the Human Skeleton. Pp. 561-600. M. Anne Katzenberg and Shelley R. Saunders, eds. Second Edition. Hoboken: Wiley-Liss.
Mincer, Harry H., Edward F. Harris, and Hugh E. Berryman
1993 The A.B.F.O. Study of Third Molar Development and Its Use as an Estimator of Chronological Age. Journal of Forensic Sciences 38(2):379-390.
Moore-Jansen, Peer M., Stephen D. Ousley, and Richard L. Jantz
1994 Data Collection Procedures for Forensic Skeletal Material. 3rd Edition. Knoxville, TN: University of Tennessee Department of Anthropology, Report of Investigations No. 48.
Moorrees, Coenraad F.A., Elizabeth A. Fanning, and Edward E. Hunt, Jr.
1963 Age Variation of Formation Stages for Ten Permanent Teeth. Journal of Dental Research 42(6):1490-1502.
Mulhern, Dawn M., and Erica B. Jones
2005 Test of Revised Method of Age Estimation From the Auricular Surface of the Ilium. American Journal of Physical Anthropology 126:61-65.
Murray, Katherine A., and Tracy Murray
1991 A Test of the Auricular Surface Aging Technique. Journal of Forensic Sciences 36(4):1162-1169.
National Academy of Sciences
2009 Strengthening Forensic Science in the United States: A Path Forward. Washington, DC: National Academies Press.
Nawrocki, Stephen P.
N.d. Prediction Intervals for Estimates of Age at Death from the Sternal Extremity of the Rib. Manuscript in preparation (unpublished).
1998 Regression Formulae for Estimating Age at Death from Cranial Suture Closure. In Forensic Osteology: Advances in the Identification of Human Remains. Second Edition. Kathleen J. Reichs, ed. Pp. 276-292. Springfield: Charles C. Thomas Publisher, Ltd.
Nemeskéri, J., L. Harsányi, and G. Acsádi
1960 Methoden zur Diagnose des Lebensalters von Skelettfunden. Anthropol Anzeiger 24:70-95.
Neter, John, William Wasserman, and G.A. Whitmore
1988 Applied Statistics. Third Edition. Boston: Allyn and Bacon, Inc.
Osborne, Daniel
2000 Reconsidering the Auricular Surface as an Indicator of Age at Death. Masters Thesis, Department of Anthropology, Western Michigan University.
Osborne, Daniel L., Tal L. Simmons, and Stephen P. Nawrocki
2004 Reconsidering the Auricular Surface as an Indicator of Age at Death. Journal of Forensic Sciences 49(5):905-911.
Pyle, S. I., and N. L. Hoerr
1955 A Radiographic Standard of Reference for the Growing Knee. Second Edition. Springfield: Charles C. Thomas.
Rissech, Carme, George F. Estabrook, Eugenia Cunha, and Assumció Malgosa
2006 Using the Acetabulum to Estimate Age at Death of Adult Males. Journal of Forensic Sciences 51(2):213-229.
Ross, Ann H., and Lyle W. Konigsberg
2002 New Formulae for Estimating Stature in the Balkans. Journal of Forensic Sciences 47(1):165-167.
Rougé-Maillart, Clotilde, Norbert Telmon, Carme Rissech, Assumption Malgosa, and Daniel Rougé
2004 The Determination of Male Adult Age at Death by Central and Posterior Coxal Analysis – A Preliminary Study. Journal of Forensic Sciences 49(2):1-7.
Russell, Katherine F., Scott W. Simpson, Jeremy Genovese, Mary D. Kinkel, Richard S. Meindl, and C. Owen Lovejoy
1993 Independent Test of the Fourth Rib Aging Technique. American Journal of Physical Anthropology 92:53-62.
Saunders, Shelley, Carol DeVito, Ann Herring, Rebecca Southern, and Robert Hoppa
1993 Accuracy Tests of Tooth Formation Age Estimations for Human Skeletal Remains. American Journal of Physical Anthropology 92:173-188.
Saunders, S. R., C. Fitzgerald, T. Rogers, C. Dudar, and H. McKillop
1992 A Test of Several Methods of Skeletal Age Estimation Using a Documented Archaeological Sample. Canadian Society of Forensic Science Journal 25(2):97-118.
Schaefer, Maureen C., and Sue M. Black
2005 Comparison of Ages of Epiphyseal Union in North American and Bosnian Skeletal Material. Journal of Forensic Sciences 50(4):1-8.
Scheuer, Louise, and Sue Black
2000 Developmental Juvenile Osteology. San Diego, CA: Academic Press.
Schmitt, Aurore
2002 Estimation de l’âge au décès des sujets adultes à partir du squelette: des raisons d’espérer. Bulletins et Mémoires de la Société d’Anthropologie de Paris 14(1-2): 1-20.
2004 Age-at-Death Assessment Using the Os Pubis and the Auricular Surface of the Ilium: a Test on an Identified Asian Sample. International Journal of Osteoarchaeology 14:1-6.
Schmitt, Aurore, and Pascal Murail
2004 Is the first rib a reliable indicator of age at death assessment? Test of the method developed by Kunos et al (1999). Homo 54(3):207-214.
Schmitt, Aurore, Pascal Murail, Eugenia Cunha, and Daniel Rougé
2002 Variability of the Pattern of Aging on the Human Skeleton: Evidence from Bone Indicators and Implications on Age at Death Estimation. Journal of Forensic Sciences 47:1203-1205.
Sinha, A., and V. Gupta
1995 A Study on Estimation of Age from Pubic Symphysis. Forensic Science International 75:73-78.
Solari, Ana C., and Kenneth Abramovitch
2002 The Accuracy and Precision of Third Molar Development as an Indicator of Chronological Age in Hispanics. Journal of Forensic Sciences 47(3):531-535.
Steadman, Dawnie Wolfe, Bradley J. Adams, and Lyle W. Konigsberg
2006 Statistical Basis for Positive Identification in Forensic Anthropology. American Journal of Physical Anthropology 131:15-26.
Stevenson, Paul H.
1924 Age Order of Epiphyseal Union in Man. American Journal of Physical Anthropology 7(1):53-93.
Stout, Sam D.
1989 The Use of Cortical Bone Histology to Estimate Age at Death. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 195-207. Springfield: Charles C. Thomas.
Suchey, Judy Myers
1979 Problems in the Aging of Females Using the Os Pubis. American Journal of Physical Anthropology 51:467-470.
Suchey, Judy Myers, and Darryl Katz
1998 Applications of Pubic Age Determination in a Forensic Setting. In Forensic Osteology: Advances in the Identification of Human Remains. Second Edition. Kathleen J. Reichs, ed. Pp. 204-236. Springfield: Charles C. Thomas Publisher, Ltd.
Todd, T. Wingate
1920 Age Changes in the Pubic Bone. I: The Male White Pubis. American Journal of Physical Anthropology 3(3):285-334.
1921 Age Changes in the Pubic Bone. II: The Pubis of the Male Negro-White Hybrid. III: The Pubis of the White Female. IV: The Pubis of the Female Negro-White Hybrid. American Journal of Physical Anthropology 4(1):1-70.
Todd, T. W., and J. D’Errico, Jr.
1928 The Clavicular Epiphyses. American Journal of Anatomy 4:25-50.
Todd, T. Wingate, and D. W. Lyon, Jr.
1924 Cranial Suture Closure, Its Progress and Age Relationship: Part I – Endocranial Closure in Adult Males of White Stock. American Journal of Physical Anthropology 7:325-384.
1925 Cranial Suture Closure, Its Progress and Age Relationship: Part II – Ectocranial Closure in Adult Males of White Stock. American Journal of Physical Anthropology 8(1):23-45.
Ubelaker, D. H.
1989 Human Skeletal Remains: Excavation, Analysis, and Interpretation. Second Edition. Washington, DC: Taraxacum.
Webb, Patricia A. Owings, and Judy Myers Suchey
1985 Epiphyseal Union of the Anterior Iliac Crest and Medial Clavicle in a Modern Multiracial Sample of American Males and Females. American Journal of Physical Anthropology 68(4):457-466.
White, Tim D.
1991 Human Osteology. San Diego: Academic Press.
2000 Human Osteology. Second Edition. San Diego: Academic Press.
White, Tim D., and Pieter A. Folkens
2005 The Human Bone Manual. Amsterdam: Elsevier Academic Press.
Wittwer-Backofen, Ursula, Jutta Gampe, and James W. Vaupel
2004 Tooth Cementum Annulation for Age Estimation: Results from a Large Known-Age Validation Study. American Journal of Physical Anthropology 123(2):119-129.
Wittwer-Backofen, Ursula, Jo Buckberry, Alfred Czarnetzki, Stefanie Doppler, Gisela Grupe, Gerhard Hotz, Ariane Kemkes, Clark Spencer Larsen, Debbi Prince, Joachim Wahl, Alexander Fabig, and Svenja Weise
2008 Basics in Paleodemography: A Comparison of Age Indicators Applied to the Early Medieval Skeletal Sample of Lauchheim. American Journal of Physical Anthropology 137(4):384-396.
Wood, James W., George R. Milner, Henry C. Harpending, and Kenneth M. Weiss
1992 The Osteological Paradox: Problems of Inferring Prehistoric Health from Skeletal Samples. Current Anthropology 33(4):343-370.
Yoder, C., D. H. Ubelaker, and J. F. Powell
2001 Examination of Variation in Sternal Rib End Morphology Relevant to Age Assessment. Journal of Forensic Sciences 46(2):223-227.
Youden, W. J.
1998 Experimentation and Measurement. Mineola: Dover Publications, Inc.
APPENDIX A
FINAL SAMPLE SIZES FOR
ALL METHODS
Table A.1. Sample sizes per method.
Method                            Element                       N
McKern and Stewart 1957           Pubic Symphysis (PS)          79
Suchey-Brooks                     Pubic Symphysis (PS)          93
Todd 1920, 1921                   Pubic Symphysis (PS)          10
Lovejoy et al. 1985               Auricular Surface (AS)        147
Osborne et al. 2004               Auricular Surface (AS)        151
Buckberry and Chamberlain 2002    Auricular Surface (AS)        10
Işcan et al. 1984                 Sternal Rib End (RIB)         21
Mann et al. 1991                  Maxillary Sutures (MSUT)      62
Meindl and Lovejoy 1985           Ectocranial Sutures (CSUT)    22
Mincer et al. 1993                Dental Formation (DEN)        160*
Moorrees et al. 1963              Dental Formation (DEN)        235*
*total number of teeth
Table A.2. Sample sizes for epiphyseal fusion methods.
Method                 Element                    N
Albert-Maples 1995     Vertebrae (VERT)           24
Webb-Suchey 1985       Clavicle (CLAV)            33
                       Iliac Crest                6
McKern-Stewart 1957    All (EPIP)                 161
                       Proximal Humerus           80
                       Distal Humerus             63
                       Medial Epicondyle          57
                       Proximal Radius            50
                       Distal Radius              51
                       Proximal Ulna              56
                       Distal Ulna                37
                       Proximal Femur             85
                       Greater Trochanter         70
                       Lesser Trochanter          65
                       Distal Femur               79
                       Proximal Tibia             72
                       Distal Tibia               65
                       Proximal Fibula            36
                       Distal Fibula              49
                       Clavicle                   72
                       Iliac Crest                32
                       S1-S2                      28
                       Vertebrae                  18
Scheuer-Black 2000     Glenoid Fossa: Scapula     7
                       Inferior Angle: Scapula    1
                       Coracoid: Scapula          2
                       Acromion: Scapula          1
                       Medial Clavicle            2
                       Acromion: Clavicle         2
                       Proximal Humerus           13
                       Distal Humerus             11
                       Medial Epicondyle          7
                       Proximal Radius            10
                       Distal Radius              8
                       Proximal Ulna              7
                       Distal Ulna                6
                       Proximal Femur             11
                       Greater Trochanter         8
                       Lesser Trochanter          9
                       Distal Femur               12
                       Proximal Tibia             8
                       Distal Tibia               8
                       Proximal Fibula            5
                       Distal Fibula              5
                       MC1                        2
                       MC2                        1
                       MC3                        1
                       MC4                        1
                       MC5                        1
                       MT1                        1
                       MT2                        1
                       MT3                        1
                       MT4                        2
                       MT5                        1
                       Phalanges                  1
                       Talus                      1
                       Calcaneus                  1
                       Basilar Suture             1
                       Iliac crest                2
                       Acetabulum                 1
                       Ischial tuberosity         2
                       S1/S2                      2
                       S2/S3                      2
                       S3/S4                      1
                       S4/S5                      1
                       S1 Superior                1
                       Cervical Vertebrae         1
                       Lumbar Vertebrae           1
                       Rib Heads                  1
                       Sternal S2/S3              1
APPENDIX B
AGE DISTRIBUTIONS FOR LONG BONE EPIPHYSES (McKern-Stewart 1957)
Table B.1. Age distribution of stages of proximal humerus union (in %).
            Stage of union
Age   N     0     1     2     3     4
18    3     -     33    67    -     -
19    6     -     -     33    33    33
20    7     -     -     71    14    14
21    15    -     -     -     -     100
22    12    -     -     -     25    75
23    3     -     -     -     -     100
24+   33    -     -     -     -     100
Total 79
Table B.2. Age distribution of stages of distal humerus union (in %).
            Stage of union
Age   N     0     1     2     3     4
18    2     -     -     -     50    50
19    2     -     -     -     -     100
20    4     -     -     -     -     100
21    12    -     -     -     -     100
22    10    -     -     -     -     100
23    3     -     -     -     -     100
24+   30    -     -     -     -     100
Total 63
Table B.3. Age distributions of humeral medial epicondyle union (in %).
            Stage of union
Age   N     0     1     2     3     4
19    1     -     -     -     -     100
20    4     -     -     -     -     100
21    10    -     -     -     -     100
22    9     -     -     -     -     100
23    3     -     -     -     -     100
24+   30    -     -     -     -     100
Total 57
Table B.4. Age distribution of stages of distal radius union (in %).
            Stage of union
Age   N     0     1     2     3     4
17    1     100   -     -     -     -
18    2     -     -     50    -     50
19    2     -     -     100   -     -
20    5     40    -     -     60    -
21    9     -     -     -     -     100
22    5     -     -     -     -     100
23    4     -     -     -     25    75
24+   22    -     -     -     -     100
Total 50
Table B.5. Age distribution of stages of distal ulna union (in %).
            Stage of union
Age   N     0     1     2     3     4
18    1     -     -     -     100   -
19    2     -     -     50    50    -
20    3     33    -     -     67    -
21    8     -     -     -     -     100
22    2     -     -     -     -     100
23    2     -     -     -     -     100
24+   18    -     -     -     -     100
Total 36
Table B.6. Age distribution of stages of proximal femur union (in %).
            Stage of union
Age   N     0     1     2     3     4
17    1     100   -     -     -     -
18    2     -     -     50    50    -
19    3     -     -     -     -     100
20    6     -     -     -     17    83
21    16    -     -     -     -     100
22    10    -     -     -     10    90
23    9     -     -     -     -     100
24+   38    -     -     -     -     100
Total 85
Table B.7. Age distribution of stages of femoral greater trochanter union (in %).
            Stage of union
Age   N     0     1     2     3     4
17    1     100   -     -     -     -
18    1     -     100   -     -     -
19    2     -     -     -     -     100
20    2     -     -     -     -     100
21    15    -     -     -     -     100
22    9     -     -     -     -     100
23    6     -     -     -     -     100
24+   34    -     -     -     -     100
Total 70
Table B.8. Age distribution of stages of femoral lesser trochanter union (in %).
            Stage of union
Age   N     0     1     2     3     4
19    2     -     -     -     -     100
20    2     -     -     -     -     100
21    14    -     -     -     -     100
22    8     -     -     -     -     100
23    6     -     -     -     -     100
24+   33    -     -     -     -     100
Total 65
Table B.9. Age distribution of stages of distal femur union (in %).
            Stage of union
Age   N     0     1     2     3     4
17    1     100   -     -     -     -
18    2     -     -     -     100   -
19    4     25    -     50    -     25
20    7     14    -     -     14    71
21    16    -     -     -     -     100
22    9     -     -     -     -     100
23    7     -     -     -     14    86
24+   33    -     -     -     -     100
Total 79
Table B.10. Age distribution of stages of proximal tibia union (in %).
            Stage of union
Age   N     0     1     2     3     4
17    1     100   -     -     -     -
18    2     -     50    -     50    -
19    4     -     -     25    50    25
20    6     -     -     17    33    50
21    16    -     -     -     -     100
22    7     -     -     -     -     100
23    7     -     -     -     -     100
24    7     -     -     -     -     100
25    1     -     -     -     -     100
26    5     -     -     -     20    80
27+   16    -     -     -     -     100
Total 72
Table B.11. Age distribution of stages of distal tibia union (in %).
            Stage of union
Age   N     0     1     2     3     4
18    2     50    -     -     -     50
19    5     -     -     20    20    60
20    4     -     -     -     25    75
21    11    -     -     -     -     100
22    8     -     -     -     -     100
23    3     -     -     -     -     100
24+   32    -     -     -     -     100
Total 65
Table B.12. Age distribution of stages of proximal fibula union (in %).
            Stage of union
Age   N     0     1     2     3     4
20    2     -     -     50    50    -
21    7     -     -     -     -     100
22    6     -     -     -     -     100
23    3     -     -     -     -     100
24+   18    -     -     -     -     100
Total 34
Table B.13. Age distribution of stages of distal fibula union (in %).
            Stage of union
Age   N     0     1     2     3     4
18    1     -     -     -     100   -
19    2     -     -     -     50    50
20    2     -     -     -     50    50
21    9     -     -     -     -     100
22    6     -     -     -     -     100
23    3     -     -     -     -     100
24+   26    -     -     -     -     100
Total 49
APPENDIX C
MOORREES ET AL. (1963) CORRECT AND
INCORRECT CLASSIFICATION BY
TOOTH NUMBER AND ROOT
Table C.1. #17 mesial.
Stage     N     # Correct  % Correct  # Incorrect  % Incorrect
R1/2      2     0          0          2            100
R3/4      3     2          66.67      1            33.33
Rc        2     1          50         1            50
Rc-A1/2   2     1          50         1            50
A1/2      3     2          66.67      1            33.33
Ac        42    42         100        0            0
Total     54    48         88.89      6            11.11
Table C.2. #17 distal.
Stage     N     # Correct  % Correct  # Incorrect  % Incorrect
R1/2      3     0          0          3            100
R3/4      4     2          50         2            50
Rc        2     2          100        0            0
Rc-A1/2   2     1          50         1            50
A1/2      3     2          66.67      1            33.33
Ac        48    48         100        0            0
Total     62    55         88.71      7            11.29
Table C.3. #32 mesial.
Stage     N     # Correct  % Correct  # Incorrect  % Incorrect
R1/2      2     0          0          2            100
R3/4      4     2          50         2            50
Rc        2     1          50         1            50
Rc-A1/2   2     1          50         1            50
A1/2      2     2          100        0            0
Ac        43    43         100        0            0
Total     55    49         89.09      6            10.91
Table C.4. #32 distal.
Stage     N     # Correct  % Correct  # Incorrect  % Incorrect
R1/2      3     0          0          3            100
R3/4      5     3          60         2            40
Rc        4     3          75         1            25
Rc-A1/2   2     1          50         1            50
A1/2      2     2          100        0            0
Ac        48    48         100        0            0
Total     64    57         89.06      7            10.94
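The percentage columns in Tables C.1 through C.4 follow directly from the counts of correct and incorrect classifications. As a minimal sketch, and not the procedure used to produce the original tables, the Python function below recomputes both percentages for a single row. The function name classification_rates is hypothetical, and the input counts are taken from the Total row of Table C.1.

```python
from typing import Tuple


def classification_rates(n_correct: int, n_incorrect: int) -> Tuple[float, float]:
    """Return (% correct, % incorrect) for one table row, rounded to two decimals.

    Hypothetical helper for illustration; it is not part of the original analysis.
    """
    total = n_correct + n_incorrect
    if total == 0:
        # Rows with no observations are shown as "-" in the tables.
        raise ValueError("no observations in this row")
    pct_correct = round(100 * n_correct / total, 2)
    pct_incorrect = round(100 * n_incorrect / total, 2)
    return pct_correct, pct_incorrect


# Total row of Table C.1 (#17 mesial): 48 correct and 6 incorrect out of 54.
print(classification_rates(48, 6))  # prints (88.89, 11.11)
```

The same calculation applies to any stage row; stage R3/4 in Table C.1 (2 correct, 1 incorrect), for example, gives 66.67 and 33.33.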
APPENDIX D
MINCER ET AL. (1993) CORRECT AND
INCORRECT CLASSIFICATION BY
TOOTH NUMBER
Table D.1. Tooth #1.
Stage     N     # Correct  % Correct  # Incorrect  % Incorrect
D         0     -          -          -            -
E         1     1          100        0            0
F         2     2          100        0            0
G         4     2          50         2            50
H         36    36         100        0            0
Total     43    41         95.35      2            4.65
Table D.2. Tooth #16.
Stage     N     # Correct  % Correct  # Incorrect  % Incorrect
D         1     0          0          1            100
E         1     1          100        0            0
F         2     2          100        0            0
G         5     3          60         2            40
H         32    32         100        0            0
Total     41    38         92.68      3            7.32
Table D.3. Tooth #17.
Stage     N     # Correct  % Correct  # Incorrect  % Incorrect
D         0     0          -          -            -
E         0     0          -          -            -
F         1     1          100        0            0
G         4     3          75         1            25
H         32    32         100        0            0
Total     37    36         97.3       1            2.7
Table D.4. Tooth #32.
Stage     N     # Correct  % Correct  # Incorrect  % Incorrect
D         0     0          -          -            -
E         1     1          100        0            0
F         1     1          100        0            0
G         4     3          75         1            25
H         33    33         100        0            0
Total     39    38         97.44      1            2.56