UNCERTAINTY IN SKELETAL AGING: A RETROSPECTIVE STUDY

AND TEST OF SKELETAL AGING METHODS AT THE JOINT

POW/MIA ACCOUNTING COMMAND CENTRAL

IDENTIFICATION LABORATORY

____________

A Thesis

Presented

to the Faculty of

California State University, Chico

____________

In Partial Fulfillment

of the Requirements for the Degree

Master of Arts

in

Anthropology

____________

by

Carrie Ann Brown

Spring 2009





APPROVED BY THE DEAN OF THE SCHOOL OF

GRADUATE, INTERNATIONAL, AND INTERDISCIPLINARY STUDIES:

_________________________________ Susan E. Place, Ph.D.

APPROVED BY THE GRADUATE ADVISORY COMMITTEE:

_________________________________ Eric J. Bartelink, Ph.D., Chair

_________________________________ Beth S. Shook, Ph.D.

_________________________________ John E. Byrd, Ph.D.

DEDICATION

To the men and women who give their lives to protect our freedom, and to those who never returned…you are not forgotten.

Until they are home.

ACKNOWLEDGMENTS

This thesis has been a labor of love. Its successful completion would not have

been possible without contributions from a significant number of people. What follows is

not an exhaustive list of everyone I have met during this process, but a thank you to the

key players in my research and writing and to those people who have made my graduate

school experience an unforgettable one.

Thanks are in order first to my thesis committee: Drs. Eric Bartelink, Beth

Shook, and John Byrd. As my chair, Dr. Bartelink has dutifully (and quickly) read my

thesis in the past few months, always offering insightful and timely comments and

critiques. I truly appreciate his help and regular availability, especially considering his

hectic schedule and his many, many duties as advisor, professor, researcher, and

anthropologist. The physical anthropology graduate students, including myself, benefit

tremendously from his expertise and dedication. I truly believe that my three years at

Chico State would not have been as successful or fulfilling without his guidance and

support.

An enormous thank you goes to Dr. Shook for serving on my committee. Her

comments helped me refine my ideas and statistical analyses. She too is a fabulous

mentor for the graduate students as a whole and serves as a model professor, the kind that

I one day hope to become. Additionally, she has served on my committee while expecting

her first child. Her son, Joel Douglas Shook, was born April 12, 2009, and my thanks to

him for his timely arrival following my thesis defense!

I am indebted to Dr. Byrd for my thesis topic and his statistical expertise, as

well as his guidance during data collection. When he asked me a little over a year ago

what my thesis topic was going to be, I had no way of knowing that his idea would grow

into a research project not only for my thesis, but also for years to come. His support and

confidence in my skills as an anthropologist are unwavering and I look forward to

working with him and the rest of the staff at the Central Identification Laboratory for

years to come, whether in Hawaii or elsewhere.

I would also like to thank the California State University, Chico,

Anthropology Department. Without listing all of the faculty and staff here, it is important

for me to say that this department is unique because while not everyone agrees all of the

time (perhaps even most of the time), we all still manage to get along (and have fun!). I

still remember how I felt immediately at home the first time that I met my cohort and a

majority of the faculty. This experience certainly would not have been as rich or as varied

without the many personalities that make up this department and I am thankful for this.

Thank you to my graduate cohort. The end goal of graduate school is certainly

important, but I have found the journey to be equally rewarding. Through all the

first-semester tears and frustrations, our food-filled seminars, and life outside of anthropology,

however rare, I know that I have made friends for life. I look forward to future

collaborations with them, as well as always sharing the memories of our years in graduate

school at Chico State.

Thank you to all of the anonymous volunteers from the CIL and at the 2009

AAFS meetings who participated in the interobserver error study. Thank you also to the

three other members of the inaugural Forensic Science Academy: Angela Soler, Cate

Bird, and Laurel Freas. They were the first to hear about my research as I worked through

the kinks of data collection and initial statistical analyses and the first to volunteer their

time to test several age estimation methods. Additionally, Angela helped run my study at

the AAFS meetings. Unrelated to this thesis, but certainly not to my overall graduate

school experience, they were there for me through thick and thin, from activity-filled

days at the lab to the jungles in Laos.

Apart from my current academic community, I would also like to thank my

family, my first academic community, for their continued support. This includes late

night and weekend phone calls fielded by my mother, many, many conversations with my

sister, and every supportive and humorous email from my father. I am thankful that they

have always had faith in me and my dreams, especially in my earlier years when I was

not quite sure what they were. I am also indebted to them for instilling a life-long love of

learning and sense of curiosity, no matter how much my sister and I harass our father

about his many interests. Thanks to Grandma and Grandpa Brown for their support and

excitement about all those “bone books” I bring for them to read and thanks to

Grandaddy and Grandmommy Newcomer who are no longer with us.

Finally, thank you to everyone I have met during the thesis process and over

the last three years. I have to thank Empire Coffee for the delicious Aztec mochas and a

place to go when I could not possibly work any longer at home or school. Thanks to

Shannon Damon for also offering me an alternative workspace, the Human ID Lab,

where I wrote a good number of pages. Thanks to Karen Smith Gardner for allowing me

to invade her house while I was in between homes, and also for being a great hostess

even though she swore she would never have the time! And thank you to the rest of my

friends here in Chico who are always ready to provide support and laughter at every

moment, whether it be knee-fighting or good old-fashioned jokes.

The end of my thesis is bittersweet because it signifies the end of three years

of hard work, but also the end of my time in Chico. A large number of students in my

cohort will also leave Chico at the end of May and I wish them all great success and

happiness. I look forward to the future with great anticipation, but also look back fondly

on my time here. Again, thank you to everyone who was a part of my life during the past

three years.

This research was supported in part by an appointment to the Student

Research Participation Program at the Joint POW/MIA Accounting Command/Central

Identification Laboratory (JPAC/CIL) administered by the Oak Ridge Institute for

Science and Education (ORISE) through an interagency agreement between the U.S.

Department of Energy and JPAC/CIL.

TABLE OF CONTENTS

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract

CHAPTER

I. Introduction
      Research Design
      Outline of the Thesis

II. Adult Skeletal Aging
      Historical Perspectives
      General Concepts
      Key Terms
      Trends
      Published Methods
      The Statistical Basis of Age Estimation
      Summary

III. Uncertainty Analysis
      Standards
      Error
      Uncertainty in Age Estimation
      Summary

IV. Methods I: Retrospective Study
      The Sample
      Data Collection
      Data Analysis
      Summary

V. Methods II: Interobserver Error Study
      Choice of Methods
      Design of Study
      Data Analysis
      Summary

VI. Results I: Retrospective Study
      The Sample
      Method-to-Method Comparison
      Method by Method
      Summary

VII. Results II: Interobserver Error Study
      Participants
      Method Performance
      Summary

VIII. Discussion
      Method Performance
      Error
      Analyst Experience
      General Observations
      Limitations of the Thesis
      Summary

IX. Summary
      Uncertainty in Skeletal Age Estimation
      Future Research

References Cited

Appendices

A. Final Sample Sizes for All Methods
B. Age Distributions for Long Bone Epiphyses: McKern-Stewart (1957)
C. Moorrees et al. (1963) Correct and Incorrect Classification by Tooth Number and Root
D. Mincer et al. (1993) Correct and Incorrect Classification by Tooth Number

LIST OF TABLES

TABLE

1. Epiphyseal Scoring by Method
2. Descriptive Statistics by Method
3. P-values From One-way ANOVA with Bonferroni Correction: All Methods
4. Correct and Incorrect Classifications by Method (Excluding Epiphyseal Fusion)
5. Error by Method (Excluding Epiphyseal Fusion)
6. Comparison of Pearson’s r and r² by Method (Excluding Dental Methods)
7. Correct and Incorrect Classifications for Epiphyseal Fusion Methods
8. Age Distribution of Stages of Iliac Crest Union (in %): McKern-Stewart (1957)
9. Correct and Incorrect Classifications by Age Interval of the Mann et al. Maxillary Suture Method (1987, 1991)
10. Error Values for Mann et al. (1991) by Reported Interval
11. Correct and Incorrect Classifications of Dental Formation Methods
12. Error of Dental Formation Methods
13. Age Distribution of Stages of Dental Root Formation (in %): Moorrees et al.
14. Age Distribution of Stages of Dental Root Formation (in %): Mincer et al.
15. Todd (1920) Pubic Symphysis Method Sample (n=10)
16. Correct and Incorrect Classification: McKern-Stewart (1957) Pubic Symphysis Method
17. Error of McKern-Stewart (1957) Pubic Symphysis Method by Composite Score Group
18. Correct and Incorrect Classification: Suchey-Brooks Pubic Symphysis Method
19. Error of Suchey-Brooks Pubic Symphysis Method by Phase
20. P-Values from ANOVA with Bonferroni Correction Between the First Four Phases of the Suchey-Brooks Method: Bias
21. P-Values from ANOVA with Bonferroni Correction Between the First Four Phases of the Suchey-Brooks Method: Inaccuracy
22. Comparison of Bias and Inaccuracy: McKern-Stewart and Suchey-Brooks Pubic Symphysis Methods
23. Correct and Incorrect Classification: Lovejoy et al. (1985b) Auricular Surface Method
24. Correct and Incorrect Classification for Single Phases Only: Lovejoy et al. (1985b) Auricular Surface Method
25. Error of Lovejoy et al. (1985b) Auricular Surface Method by Phase
26. Error of Lovejoy et al. (1985b) Auricular Surface Method: Single Phases Only
27. P-Values from ANOVA with Bonferroni Correction Between the First Three Phases of the Lovejoy et al. (1985b) Method: Bias
28. P-Values from ANOVA with Bonferroni Correction Between the First Three Phases of the Lovejoy et al. (1985b) Method: Inaccuracy
29. P-Values from ANOVA with Bonferroni Correction Between the First Three Phases of the Lovejoy et al. (1985b) Method: SEI
30. Correct and Incorrect Classification: Osborne et al. (2004) Auricular Surface Method
31. Error of Osborne et al. (2004) Auricular Surface Method by Phase
32. Buckberry-Chamberlain (2002) Auricular Surface Method Sample (n=10)
33. Correct and Incorrect Classification: Iscan et al. (1984b) Sternal Rib End Method
34. Correct and Incorrect Classification Using Nawrocki (n.d.) Prediction Intervals
35. Error of Iscan et al. (1984b) Sternal Rib End Method by Phase
36. Percent Confidence in Assigned Composite Score by Stage: Samples A and B
37. SEI by Highest Degree Obtained: Samples A and B
38. SEI by Years of Experience in Skeletal Aging: Samples A and B
39. Percent Confidence in Assigned Phase by Stage: Samples C and D
40. SEI by Highest Degree Obtained: Samples C and D
41. SEI by Years of Experience in Skeletal Aging: Samples C and D

LIST OF FIGURES

FIGURE

1. Individuals Identified by Conflict (N=1717)
2. Age Distribution of Total Sample (n=979)
3. Age Distribution: Albert-Maples 1995 (n=24)
4. Age Distribution: Webb-Suchey Clavicle 1985 (n=33)
5. Age Distribution: McKern-Stewart Epiphyses 1957 (n=161)
6. Age Distribution: McKern-Stewart Pubic Symphysis 1957 (n=79)
7. Age Distribution: Suchey-Brooks Pubic Symphysis (n=10)
8. Age Distribution: Todd Pubic Symphysis 1920 (n=93)
9. Age Distribution: Lovejoy et al. Auricular Surface 1985 (n=147)
10. Age Distribution: Osborne et al. Auricular Surface 2004 (n=151)
11. Age Distribution: Buckberry-Chamberlain Auricular Surface 2002 (n=10)
12. Age Distribution: Iscan et al. Sternal Rib End 1984 (n=21)
13. Age Distribution: Moorrees et al. Dental Formation 1963 (n=92)
14. Age Distribution: Mincer et al. Dental Formation 1993 (n=105)
15. Age Distribution: Mann et al. Maxillary Sutures 1991 (n=55)
16. Sum of Bias by Method (in Years)
17. Sum of Inaccuracy by Method (in Years)
18. Mean SEI by Method
19. Comparison of Known Ages of Identified Males Superimposed over the Summary Stage Observations for the Three Stages of Vertebral Centra Fusion as Given in the Albert and Maples (1995) Method
20. Comparison of Known Ages of Identified Males Superimposed over the Summary Stage Observations for the Four Stages of Vertebral Centra Fusion as Given in the McKern and Stewart (1957) Method
21. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Four Stages of Epiphyseal Fusion of the Medial Clavicle as Given in the Webb and Suchey (1985) Method
22. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Five Stages of Epiphyseal Fusion of the Medial Clavicle as Given in the McKern and Stewart (1957) Method
23. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Five Stages of Epiphyseal Fusion of the First Two Sacral Segments as Given in the McKern and Stewart (1957) Method
24. Distribution of Bias for the Mann et al. (1991) Maxillary Suture Method
25. Correlation of Estimated and Known Ages-at-Death for the Mann et al. (1991) Maxillary Suture Method (n=27)
26. Distribution of Bias for the Todd (1920) Pubic Symphysis Method
27. Correlation of Estimated and Known Age-at-Death for the Todd (1920) Pubic Symphysis Method (n=10)
28. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Composite Scores of Pubic Symphysis Components as Given in the McKern and Stewart (1957) Method
29. Distribution of Bias for the McKern-Stewart (1957) Pubic Symphysis Method
30. Correlation of Estimated and Known Age-at-Death for the McKern-Stewart (1957) Pubic Symphysis Method (n=73)
31. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Six Phases of the Pubic Symphysis as Given in the Suchey-Brooks Pubic Symphysis Method
32. Distribution of Bias for the Suchey-Brooks Pubic Symphysis Method
33. Correlation of Estimated and Known Age-at-Death for the Suchey-Brooks Pubic Symphysis Method (n=86)
34. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Eight Phases of the Auricular Surface as Given in the Lovejoy et al. (1985b) Auricular Surface Method
35. Distribution of Bias for the Lovejoy et al. (1985b) Auricular Surface Method
36. Correlation of Estimated and Known Age-at-Death for the Lovejoy et al. (1985b) Auricular Surface Method (n=147)
37. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Six Phases of the Auricular Surface as Given in the Osborne et al. (2004) Auricular Surface Method
38. Distribution of Bias for the Osborne et al. (2004) Auricular Surface Method
39. Correlation of Estimated and Known Age-at-Death for the Osborne et al. (2004) Auricular Surface Method (n=113)
40. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Seven Stages of the Auricular Surface as Given in the Buckberry-Chamberlain (2002) Revised Auricular Surface Method
41. Correlation of Estimated and Known Age-at-Death for the Buckberry-Chamberlain (2002) Revised Auricular Surface Method (n=9)
42. Comparison of Known Ages of Identified Males Superimposed over the Age Intervals for the Eight Phases of the Sternal Rib End as Given in the Iscan et al. (1984b) Sternal Rib End Method (Solid Rectangles) and Prediction Intervals as Calculated by Nawrocki (Dashed Rectangles)
43. Distribution of Bias for the Iscan et al. (1984b) Sternal Rib End Method
44. Correlation of Estimated and Known Age-at-Death for the Iscan et al. (1984b) Sternal Rib End Method (n=14)
45. Participants’ Self-Reported Fields of Study
46. Participants’ Self-Reported Highest Degrees Obtained
47. Participants’ Self-Reported Years of Experience with Skeletal Aging
48. Participants’ Self-Reported Approximate Number of Skeletons Analyzed
49. Distribution of Assigned Stages for Sample A (n=37)
50. Distribution of Assigned Stages for Sample B (n=37)
51. Distribution of Assigned Phases for Sample C (n=37)
52. Distribution of Assigned Phases for Sample D (n=34)
53. All Combinations of Suture Obliteration As Reported by Participants for Sample E (n=38)
54. Frequency of Sutures Scored As Obliterated: Sample E
55. All Combinations of Suture Obliteration As Reported by Participants for Sample F (n=36)
56. Frequency of Sutures Scored As Obliterated: Sample F

ABSTRACT





by

Carrie Ann Brown

Master of Arts in Anthropology

California State University, Chico

Spring 2009

Adult skeletal age estimation is an important facet of forensic anthropology,

paleodemography, and bioarchaeology. Estimating the age-at-death of adults is problematic

because of human variability in the aging process. Analysis of the error associated

with skeletal age estimation methods is necessary so that the performance of these

methods is not overestimated and so that the uncertainty in these skeletal techniques can

be quantified and better understood.

The purpose of this thesis is to analyze and describe the error associated

with skeletal age estimation methods used at the Joint POW/MIA Accounting Command

Central Identification Laboratory (JPAC/CIL) from 1972 to 31 July 2008. There were

six general categories of age estimation methods used: epiphyseal fusion, suture closure,

dental formation and eruption, and morphological changes in the pubic symphysis,

auricular surface, and sternal rib end. The total identified known age-at-death sample

was 979 individuals, although method sub-samples were much smaller. Additional

interobserver error research was conducted with three methods that were problematic for

the JPAC/CIL sample.

Results indicate that adult age estimation methods perform well for the

JPAC/CIL identified known age-at-death sample, most likely because of the young age

composition of this sample. Bias, inaccuracy, and scaled error index (SEI) values are

low for most methods and phases or stages of methods. Correlation between estimated

and known age-at-death is statistically significant for maxillary suture closure, pubic

symphysis, auricular surface, and sternal rib end methods. The auricular surface is the

poorest age indicator of those examined in the JPAC/CIL sample. It is also recommended

that fusion of the sacral segments no longer be used for age estimation since

this method had a correct classification rate of only 32.1%. Future research in adult

skeletal age estimation and refinement of existing techniques should include estimation

of measurement uncertainty.

CHAPTER I

INTRODUCTION

Physical anthropology examines what it means to be biologically human,

including human variability. Estimating age-at-death from skeletal remains is an integral

part of the field of physical anthropology because it helps contribute to the understanding

of aging and aging processes as well as facilitating the construction of individual and

population profiles. Forensic anthropologists, bioarchaeologists, and paleodemographers

all apply age estimation techniques and make significant contributions to research in age

estimation.

Methods for estimating the age-at-death of sub-adults rely on processes of

growth and development, while those for adults generally rely on skeletal degeneration.

Because growth and development follow a similar pattern for all individuals, unlike the

adult aging process, sub-adult aging techniques are subjected to far less scrutiny than

adult aging techniques. Aykroyd et al. (1999) have shown that many aging

techniques have a tendency to overage younger individuals and underage older

individuals. Additionally, age estimation of older individuals is more problematic than

that of younger individuals (e.g., Schmitt et al. 2002; Berg 2008).

Methods of age estimation in adults can be broken down into several broad

categories: epiphyseal fusion, suture closure, third molar formation and eruption,

morphological changes in the pubic symphysis, auricular surface, and sternal rib ends,

and methods that combine a number of skeletal age indicators. This list is not

intended to be exhaustive, but rather to represent the most commonly employed

techniques for age estimation and those techniques that will be examined in this study.

Anthropologists usually apply a variety of methods to arrive at a “best-fit” age

estimate. Research in refining and developing age estimation techniques is ongoing

and varied.

Mathematical models of age estimation have undergone significant critiques.

Traditionally, linear regression and correlation have been used to derive age estimates

from skeletal age indicators. Recent research has suggested that Bayesian-based models

may be better suited to age estimation calculations than regression and correlation (e.g.,

Konigsberg and Frankenberg 1992; Schmitt et al. 2002; Steadman et al. 2006). While

mathematically more complex than traditional models, Bayesian models have the ability

to analyze non-linear relationships and may even be more successful in predicting the

age-at-death of older individuals (Schmitt et al. 2002).

Another key concern in age estimation is understanding the error associated

with different methods. This is particularly relevant in the field of forensic anthropology,

where evidence presented as part of expert witness testimony must have known error

rates and standards of application (Daubert v. Merrell Dow Pharmaceuticals, 509 US 579

[1993]). When applying several age estimation methods, it is important to not

overestimate the performance of a single method and to use methods appropriate to the

general age category of the individual (Martrille et al. 2007).

Estimating uncertainty associated with different measurements is

internationally regulated. The International Organization for Standardization (ISO)

publishes baseline documents to ensure global standardization in measurement, including

guidelines for estimating measurement uncertainty. Additionally, the American Society

of Crime Laboratory Directors/Lab Accreditation Board (ASCLD/LAB) regulates and

accredits forensic science testing laboratories in the United States and worldwide.

Finally, each institution is charged with maintaining its own standards that adhere to both

ISO and ASCLD/LAB regulations in order to maintain accreditation. This is usually

accomplished by the establishment and maintenance of standard operating procedures

(SOPs).

The Joint POW/MIA Accounting Command Central Identification Laboratory

(JPAC/CIL), located on Hickam Air Force Base (AFB), is tasked with the recovery and

identification of servicemen and women lost in foreign conflicts and is the largest skeletal

human identification laboratory in the world. The composition of the JPAC/CIL sample

is largely male, young, Caucasian, and of similar stature. This laboratory is accredited

under the ASCLD/LAB-International program and follows ISO standards as well as

ASCLD/LAB supplemental requirements for estimating uncertainty in measurement,

which includes age estimation methods. Currently, the JPAC/CIL laboratory manual does

not detail error associated with skeletal age estimation, but SOP 3.4 (Determining

Biological Profiles) outlines the procedures to follow for age estimation.

Research Design

The purpose of this thesis is to analyze and describe the error associated with

each skeletal age estimation method currently in use at the JPAC/CIL and to determine

how well each method performed for this sample. Because adult age estimation

techniques do not perform as well as sub-adult techniques and this sample is composed

largely of young adults, it is important to continually analyze possible measurement

uncertainty on a method-by-method basis. Since the JPAC/CIL is an accredited

institution, it is also necessary to quantify the error associated with each method.

Therefore, this study also has the purpose of estimating measurement uncertainty in

relation to laboratory SOPs. Records of individuals identified between 1972 and 31 July

2008 were used to calculate error. Additional tests of error were conducted with skeletal

samples from the JPAC/CIL anatomical collection.

Hypothesis 1

The JPAC/CIL has a long history of identifying American war dead and

contributing to research in human identification. Among studies of age estimation, McKern

and Stewart (1957) is certainly one of the most pivotal. McKern and Stewart’s

research was based on identified Korean War casualties and represents a sample

demographically similar to that of the JPAC/CIL sample used in this study. Many of the

observations made by McKern and Stewart are still used at the JPAC/CIL, including

epiphyseal fusion of the long bones, iliac crest, medial clavicle, and vertebrae and the

component scoring system for the pubic symphysis.

The sample of identified individuals from the JPAC/CIL is entirely young

men. Aging methods usually perform better for younger individuals (as compared to

older individuals). The age estimation methods employed at the JPAC/CIL conform to

this general rule, especially when considering that several of the methods are particularly

useful for late adolescents (e.g., epiphyseal fusion, dental formation and eruption).

Due to the use of methods developed on a similar sample and the young

composition of the JPAC/CIL identified sample, it is expected that age estimation

methods will perform well overall. Method performance will be measured by correct and

incorrect classifications, bias, inaccuracy, a scaled error index (SEI), and Pearson’s r

correlation coefficient, which will be calculated for each method. Methods that perform

well should have a high percentage of correct classifications, low bias, inaccuracy, and

SEI, and a high correlation between known and estimated age-at-death.

Hypothesis 2

Error can be measured in a variety of ways. Several different calculations

provide quantifications of error. Bias is the average error in years that takes into

consideration under- and overaging, while inaccuracy is the average error in years with

no implication of directionality (Meindl and Lovejoy 1989). The SEI is an index

developed for this thesis that allows the comparison of error between estimated and

actual age regardless of scale or sample size.
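
The two published measures can be illustrated with a minimal sketch. It assumes the sign convention of estimated minus actual age, so that positive bias indicates overaging, and it uses hypothetical ages rather than data from this study. The SEI is omitted because its formula is not given in this section. Pearson's r is included because it is among the performance measures named under Hypothesis 1.

```python
from math import sqrt

def bias(estimated, actual):
    """Mean signed error in years (estimated - actual); positive = overaging."""
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)

def inaccuracy(estimated, actual):
    """Mean absolute error in years, with no implication of direction."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)

def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical known and estimated ages-at-death (years), for illustration only.
actual_ages = [19, 22, 25, 31, 40]
estimated_ages = [21, 23, 26, 29, 35]

print(bias(estimated_ages, actual_ages))        # -0.6 (net underaging)
print(inaccuracy(estimated_ages, actual_ages))  # 2.2
print(pearson_r(actual_ages, estimated_ages))
```

In this invented example the younger individuals are overaged and the older individuals underaged, so the signed errors partly cancel and bias is smaller in magnitude than inaccuracy.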

Correct application of a method produces error values that are normally

distributed. Because the measure of bias produces both positive and negative values, it

will be used to examine error distribution. Analysis of variance (ANOVA) and t-tests can

also help to determine where error is most pronounced. For example, a significant

difference in bias, inaccuracy, or SEI between phases of a single method indicates which

phase may be more likely to produce incorrect age estimations. When possible, methods

that use the same skeletal indicators (e.g., pubic symphysis methods) will be compared to

one another.

It is expected that error (as indicated by bias values) will be normally

distributed for each method. Additionally, methods that have higher accuracy are

expected to be less precise. Assignment of age estimations based on stages or phases

should also be consistent between individuals.

Hypothesis 3

The success of skeletal age estimation methods can also be related to analyst

experience. Experience can be measured by highest degree held, total number of

skeletons analyzed, and familiarity with the method or methods employed. For this

portion of the study, the SEI will be calculated and compared between groups with

different levels of experience, similar to Adams and Byrd (2002).

It is expected that the SEI is dependent upon experience. Those individuals

with more experience in skeletal age estimation, as indicated by highest degree obtained

and number of years of experience in skeletal aging, will have lower average SEI scores

than individuals with less experience. Possible differences in relation to how confident

analysts felt in their final age estimations will also be examined.

Other Questions

There are several other research questions that are not based on strict

hypothesis-testing. These questions are descriptive in nature and are designed to better

understand age estimation methods in the context of the JPAC/CIL SOPs. The following

questions will be discussed: what are the sources of error in age estimation at the

JPAC/CIL? Is error systematic or random? What recommendations can be made for age

estimation methods used at the JPAC/CIL?

Outline of the Thesis

In order to understand the performance of age estimation methods in the

JPAC/CIL identified sample, it is necessary to first understand the basis of skeletal age

estimation, methods used to estimate age-at-death, and procedures for estimating

measurement uncertainty. Chapter II is an in-depth literature review of adult skeletal age

estimation, including historical perspectives, general concepts and terms, trends,

published aging methods, and the statistical basis of age estimation. Chapter III focuses

on uncertainty analysis, outlining standards for estimating measurement uncertainty and

error.

Chapters IV and V outline the methods used to conduct both portions of this

study. Chapter VI details the results of the retrospective study, conducted using the

records of identified individuals, and Chapter VII gives the results of a preliminary

investigation of the application of three age estimation methods. Chapter VIII discusses

the results from both studies, synthesizes these results with relevant concepts outlined in

Chapters II and III, and details limitations of this thesis. Finally, Chapter IX summarizes

all findings and suggests avenues for future research.

CHAPTER II

ADULT SKELETAL AGING

Age estimation of unidentified adult skeletal remains is a significant facet of

applied physical anthropology. A variety of methods and mathematical analyses are

employed in an attempt to construct accurate individual and population profiles based on

age-at-death. This chapter will discuss the development of adult skeletal age estimation

methods and their use in physical anthropology.

Historical Perspectives

The first attempts at age estimation by anatomists and anthropologists began

in earnest during the 1920s following World War I (e.g., Todd 1920, 1921; Stevenson

1924; Todd and Lyon 1924, 1925; Todd and D’Errico 1928). T. Wingate Todd and his

colleagues at Western Reserve University were instrumental in launching studies of

skeletal aging using a documented skeletal collection (McKern and Stewart 1957; Bass

2005). The first efforts at adult skeletal age estimation focused on skeletal maturation and

a better understanding of the morphological age-related changes observed in the adult

skeleton, including variability in these changes. These studies served, and continue to

serve, as building blocks for research in adult skeletal aging. Individual methods and their

histories will be discussed in further detail below.

During the period directly following World War II, great advances in

identification, including skeletal age estimation, were made. This war and the Korean

War resulted in large numbers of U.S. servicemen killed whose remains were not immediately

recovered and were often badly decomposed (Byers 2008). The need for identification of

these individuals propelled research in skeletal identification. One of the most pivotal

studies to emerge from the Korean War period was McKern and Stewart’s Skeletal Age

Changes in Young American Males (1957). Based on males of military age (generally

between the ages of 17 and 30 years old), McKern and Stewart (1957) was the first study

that did not use anatomical specimens from dissecting rooms, thereby providing

important information on younger individuals (Bass 2005).

Until the 1980s, skeletal biologists relied mainly on macroscopic, or gross,

techniques for age estimation, such as cranial suture closure and the morphology of the

pubic symphyseal face (Iscan 1989a). The rise of forensic anthropology during the

“Modern Period”¹ of the discipline (Byers 2008) and critiques in the field of

paleodemography (e.g., Bocquet-Appel and Masset 1982) meant that new methods were

being developed, such as the sternal rib end (Iscan et al. 1984b, 1985) and the auricular

surface (Lovejoy et al. 1985b). Interest in skeletal aging also initiated a re-examination of

older methods, such as the pubic symphysis (Suchey 1979; Katz and Suchey 1986;

Brooks and Suchey 1990). Additionally, microscopic and radiographic methods began to

receive much more attention. For example, histological techniques based on the osteon

counting method of Kerley (1965) became more common in the research literature during

this time period (e.g., Kerley and Ubelaker 1978; Stout 1989).

¹ The modern period is defined by Byers (2008) as 1972 to the present.

The search for new and better ways to estimate age continued through the

1990s to present day. Methods to estimate age from new elements like the acetabulum

(e.g., Rougé-Maillart et al. 2004; Rissech et al. 2006) and the sacral auricular surface

(Kutyla 2008) are currently being developed. More emphasis has been placed on

understanding the accuracy and error in age estimation methods. Questions currently

being asked in the field of skeletal aging include: is it better to use a single indicator or a

multifactorial method? Should the same skeletal aging techniques be used across the

board or adapted to a specific area of research, i.e., forensic anthropology versus

paleodemography? How are age estimates being constructed and what statistics should be

employed? What is the error or uncertainty in measurement associated with skeletal age

estimation? Komar and Buikstra (2008) raise these questions and address many of the

theoretical perspectives that will continue to define the direction of further developments

in skeletal age estimation in physical anthropology.

General Concepts

Age estimation from adult human skeletal remains is certainly an integral part

of physical anthropology, influencing forensic anthropology, bioarchaeology, and

paleodemography. Physical anthropologists are inherently concerned with what it means

to be biologically human. Gaining a better understanding of aging and its skeletal

manifestations leads to increased knowledge concerning the nature of human variability.

There is an ever-growing toolkit available to physical anthropologists

interested in age estimation from the skeleton. Varying statistical procedures offer

different ways to analyze data; thus, skeletal aging methods are constantly being

developed and refined. Currently, there are no published international standards for

skeletal age estimation, though attempts to codify and clarify procedures have been made

within the United States (e.g., Buikstra and Ubelaker 1994; Moore-Jansen et al. 1994).

However, European and American anthropologists do not use the same standards, an

issue addressed by Wittwer-Backofen et al. (2008). Laboratories and institutions are

generally able to utilize whichever methods they choose and many of these agencies may

not even have set protocols for age estimation, making standardization difficult.

Estimating age-at-death from skeletal remains is generally problematic.

Maples (1989) likens skeletal aging to an art, rather than a precise science, and

recommends the use of as many techniques as possible in constructing age ranges. While

sub-adult aging techniques rely on skeletal and dental growth, events that take place at a

fairly consistent rate between individuals, adult aging techniques rely on degenerative

skeletal changes that are much more variable and less predictable than growth sequences

(White and Folkens 2005). In addition, many aging techniques have demonstrated a

general trend of overaging younger individuals and underaging older individuals

(Aykroyd et al. 1999).

Growth and aging can be affected by disease, health, environment, presence or

absence of trauma, and cultural practices (Bogin 1999), so these factors must also be

taken into consideration when comparing age estimations between samples and using

methods developed from one reference group to make inferences about another. Age

estimation methods developed using samples from mainly European populations may not

be applicable to individuals of Asian or African ancestry. There is a need for further

research in population-specific standards in the field of applied osteology (Schmitt et al.

2002).

Due to the inherent challenges in aging and the need to better understand and

interpret age estimates from the adult skeleton, physical anthropologists must constantly

develop and test techniques for age estimation. This includes an effort to better

understand age-related morphological changes as potentially population-specific

phenomena. Success in interpretation, whether it be individual identification or the

construction of representative population profiles, is contingent upon reliable, valid,

accurate, and precise methods of aging.

Key Terms

Reliability is the degree to which a method produces the same results when

used at different times (Adams and Byrd 2002), either by multiple observers or the same

observer. A highly reliable aging technique produces similar age estimates for the same

individual even when applied by different analysts or at different times by the same

analyst. Reliability can be tested for a method or technique by conducting interobserver

or intraobserver variation studies to determine error rates. Low interobserver variation (or

error) indicates high reliability. Reliability can also be referred to as repeatability,

indicating that the technique or method applied produces similar measurements of the

same quantity or entity being measured (ISO 2004:4.21).
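
An interobserver comparison of this kind can be sketched as follows. The example assumes that two analysts have each assigned an ordinal phase score to the same eight individuals. The scores are hypothetical, and percent agreement and mean absolute phase difference are simple illustrative summaries rather than statistics prescribed by the sources cited above.

```python
def percent_agreement(scores_a, scores_b):
    """Percentage of individuals assigned the same phase by both observers."""
    matches = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return 100.0 * matches / len(scores_a)

def mean_phase_difference(scores_a, scores_b):
    """Average absolute difference in assigned phase between observers."""
    return sum(abs(a - b) for a, b in zip(scores_a, scores_b)) / len(scores_a)

# Hypothetical phase scores for the same eight individuals.
observer_1 = [1, 2, 2, 3, 4, 4, 5, 6]
observer_2 = [1, 2, 3, 3, 4, 5, 5, 6]

print(percent_agreement(observer_1, observer_2))      # 75.0
print(mean_phase_difference(observer_1, observer_2))  # 0.25
```

The same functions apply to an intraobserver study if the two lists hold one analyst's scores from two separate sessions.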

Validity concerns the degree to which a method actually measures what it

claims to measure (Adams and Byrd 2002). Validation studies are designed to test

techniques and their applicability. In skeletal age estimation, determination of validity for

a method or technique is usually conducted by testing a specific method or technique on a

sample of known age individuals (e.g., Wittwer-Backofen et al. 2004; Ginter 2005;

Mulhern and Jones 2005). These tests serve as important reviews of methodology and

also produce data on the accuracy, precision, and reliability of a technique.

Accuracy is the degree of error in a measurement as calculated from the true

value (Youden 1998) or the “closeness of agreement between a quantity value obtained

by measurement and the true value of the measurand” (ISO 2004:A2). For skeletal age

estimation, this is the ability of a method to continually and consistently provide age

intervals that encompass the true age-at-death of individuals. Calculations of inaccuracy,

the absolute difference between actual and estimated ages at death, and bias, the

directionality of these differences, are used to measure error and determine the validity of

a method or technique.

Precision is linked to accuracy and entails the level of refinement of the

measurement or estimate (White and Folkens 2005). The ISO defines precision as:

“closeness of agreement between quantity values obtained by replicate measurements of

a quantity” (2004:2.35). Precision is determined by the deviations of

individual measurements from the average of the total measurements (Youden 1998) and

can be expressed as standard deviation or variance. A very precise technique gives an age

estimate with a very small standard deviation from the average value measured for a

sample and thus a small age interval.

While all of these terms are closely related, they represent different facets of

skeletal aging methods. For example, a technique may be highly accurate but imprecise,

e.g., the actual age-at-death falls into the range of expected values predicted by the

method but the range of values is so large it does not give highly useful individualizing

information. Conversely, a technique could also be highly precise but inaccurate, e.g., the

estimated age-at-death offers a narrow interval of one to two years but rarely correctly

estimates actual age-at-death. Finally, a technique could be both accurate and precise but

unreliable, e.g., a single researcher may have great success with applying a certain age

estimation technique but when tested by multiple researchers the technique suffers from

high interobserver error. Many varying degrees of accuracy, reliability, and precision are

possible in scientific research and it is vital that all be accounted for when analyzing data

and methodology.
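
The contrast between an accurate but imprecise technique and a precise but inaccurate one can be made concrete with a short sketch. It treats accuracy as the proportion of age intervals that contain the true age-at-death and precision as the mean width of those intervals. Both operational definitions and all values are assumptions made for illustration only.

```python
def coverage(intervals, actual_ages):
    """Proportion of (low, high) intervals that contain the true age."""
    hits = sum(1 for (low, high), age in zip(intervals, actual_ages)
               if low <= age <= high)
    return hits / len(actual_ages)

def mean_width(intervals):
    """Average interval width in years; a smaller width is more precise."""
    return sum(high - low for low, high in intervals) / len(intervals)

# Hypothetical true ages and two sets of estimated intervals.
true_ages = [20, 24, 30, 45]
wide_method = [(15, 40), (15, 40), (20, 50), (30, 70)]
narrow_method = [(21, 23), (23, 25), (26, 28), (38, 40)]

print(coverage(wide_method, true_ages), mean_width(wide_method))      # 1.0 30.0
print(coverage(narrow_method, true_ages), mean_width(narrow_method))  # 0.25 2.0
```

The wide intervals always contain the true age but offer little individualizing information, while the narrow intervals are far more specific but usually miss.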

Trends

Forensic Anthropology

Forensic anthropological analysis is inherently concerned with identification

on an individual level. Recent developments in the field suggest that it may be moving

away from this basic concern and towards the analysis of events that occurred at or

around the time of death, as represented by the fields of forensic taphonomy and forensic

archaeology (Dirkmaat et al. 2008). However, the construction of a biological profile still

remains a central focus of forensic anthropologists because it is necessary for the

comparison of missing persons files to unidentified skeletal remains. The data generated

from skeletal analyses can lead to a positive identification (Byers 2008). Elements most

commonly included in the biological profile are: sex, age-at-death, ancestry, and stature

(Komar and Buikstra 2008).

In the early 1990s, the development of more stringent guidelines concerning

expert witness testimony and admissibility of evidence meant that forensic

anthropologists began to be held increasingly accountable for the reliability of their

techniques under the Federal Rules of Evidence. The 1993 Supreme Court ruling in the

case of Daubert v. Merrell Dow Pharmaceuticals set the precedent for federal trial judges

to be the “gatekeepers” of evidence. Specifically, this ruling concerned the relevancy and

reliability of expert witness testimony. Evidence must be based on the scientific method,

which means techniques have to be empirically tested, subject to peer review, have a

known error rate and standards of application, and be generally accepted by the scientific

community (Daubert v. Merrell Dow Pharmaceuticals, 509 US 579 [1993]).

Increasingly, in the post-Daubert age, forensic anthropologists are required to provide

standard error rates and measures of reliability for the techniques that they use. Both new

methods and commonly used methods must be consistently tested to ensure their

reliability and forensic anthropologists “should…be particularly cautious that their

investigations result in methods and techniques that will be admissible under the Daubert

guidelines” (Christensen 2004:2).

A second Supreme Court case in 1999, Kumho Tire Company, Ltd. v.

Carmichael, established further guidance for expert witness testimony. Specifically,

Kumho gave greater flexibility to Daubert guidelines with the understanding that not

every expert witness testimony will necessarily meet all of the requirements of Daubert

(Kumho Tire Company, Ltd. v. Carmichael, 526 US 137 [1999]). Grivas and Komar

(2008) express concern about the lack of discussion about Kumho in the forensic

anthropology literature. In fact, “many anthropological techniques already meet the

criteria for admissibility under Kumho, potentially making many revisions [of analytical

techniques] unnecessary” (Grivas and Komar 2008:773). However, Grivas and Komar

(2008) also insist that Daubert and Kumho are complementary. What is clear from this

discussion is that the subject of expert witness testimony has become crucial to any

analysis in forensic anthropology and its practitioners must understand their role within

the legal system.

Paleodemography

In contrast to forensic anthropology, paleodemography focuses on the

construction of group or population profiles. Central to paleodemography is the

generation and interpretation of skeletal age distributions, as this information offers key

insights into the demographic composition of a particular group and possible differential

mortality based on age (Milner et al. 2008). The correct interpretation of age distributions

has been at the center of an intense debate since the publication of Farewell to

Paleodemography (Bocquet-Appel and Masset 1982) because it relies on age estimation

in the absence of written records (Konigsberg and Frankenberg 2002). Similar to forensic

anthropology, the field of paleodemography has also undergone its own critiques with

regard to standardization of techniques and understanding error rates when constructing

population profiles from skeletal remains (e.g., Bocquet-Appel and Masset 1982; Wood et

al. 1992). Paleodemographers have concentrated their efforts to construct population

parameters that accurately reflect the group being studied and that do not mirror the

reference sample or samples being used to study them.

Age structure mimicry (Mensforth 1990) has been a central problem in

paleodemographic interpretations. In age structure mimicry, the mean age of an aging

indicator is actually based on the age structure of the reference population (Bocquet-

Appel and Masset 1982) so that when these calculations are applied to an unknown

group, the “target” sample (Konigsberg and Frankenberg 1992) takes on a distribution

similar or identical to that of the reference. Bocquet-Appel and Masset (1982) also

highlight the very low correlation that exists between skeletal age-indicators and actual

age-at-death, resulting in what Chamberlain (2006:85) terms “unacceptably large

standard errors of estimation.” In addition to the problem of sample mimicry and error,

the osteological paradox as proposed by Wood et al. (1992) addresses conceptual

problems in paleodemography, including demographic nonstationarity, selective

mortality, and hidden heterogeneity in risks. All of these issues have the potential to

completely undermine a paleodemographic study and render its results meaningless.
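
Age structure mimicry, as described above, can be illustrated with a toy application of Bayes' theorem. The sketch assumes three broad age classes and a single skeletal phase whose probability of being observed in each class is fixed. All probabilities are invented for illustration and do not come from any cited study. The only quantity that changes between the two calculations is the age structure of the reference sample, which acts as the prior.

```python
AGE_CLASSES = ["young", "middle", "old"]

def posterior_age(likelihood, prior):
    """P(age class | phase) from P(phase | age class) and P(age class)."""
    joint = [l * p for l, p in zip(likelihood, prior)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical probability of observing a given late phase in each age class.
phase_given_age = [0.1, 0.5, 0.9]

# Two hypothetical reference samples with different age structures.
young_reference = [0.6, 0.3, 0.1]
old_reference = [0.1, 0.3, 0.6]

from_young = posterior_age(phase_given_age, young_reference)
from_old = posterior_age(phase_given_age, old_reference)

for name, post in (("young reference", from_young), ("old reference", from_old)):
    print(name, [round(p, 2) for p in post])
# The first posterior is approximately 0.20, 0.50, 0.30.
```

The same skeletal observation yields a much older age distribution when the reference sample is itself older, which is the mimicry effect in miniature.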

Critiques of the field of paleodemography have sparked intense discussions on

theoretical perspectives and the development of new age estimation methods based on

Bayesian and maximum-likelihood techniques (Chamberlain 2006). The issues

highlighted during the 1980s have certainly not disappeared, but anthropologists are

continually refining and testing methods to reduce bias and error in age estimation.

Problems with differential preservation and age estimation of older individuals remain

central to the field, but continued research holds promise that age estimation of past

populations is indeed possible and useful for understanding and interpreting morbidity,

mortality, and levels of health from a demographic perspective.

Published Methods

Multifactorial and Multiple-Indicator

Many anthropologists advocate a multifactorial approach to age estimation

(e.g., Lovejoy et al. 1985a; Bedford et al. 1993; Baccino et al. 1999; Martrille et al.

2007). Multifactorial age estimation methods, which usually weight a number of

morphological indicators to estimate an overall age range, have been used in

paleodemography and forensic anthropology with varying degrees of success. These

methods are generally mathematically complex. Age estimation based on multiple

indicators uses a variety of single age indicators to come up with an age interval, but

requires no weighting of individual methods.

McKern and Stewart (1957) concluded their extensive study of age changes in

young American males with a chapter on the overall pattern of skeletal maturation that

included three separate regression formulae. These formulae were based on segments

composed of different age indicators. Scores for segments I, II, and III are calculated by

adding up the individual scores for each element and these composite scores can then be

translated into age intervals and predicted age point estimates. McKern and Stewart

(1957) found that the elements of the innominate bone, including the pubic symphysis

and the epiphyses of the iliac crest, ischial tuberosity, and ramus, were the strongest

combined indicators. In the case of missing innominate elements, remaining elements

could also be combined with other skeletal indicators to clarify and support age

determination (McKern and Stewart 1957).

Nemeskéri et al. (1960) devised a “complex” method that combined

endocranial suture closure, pubic symphyseal morphology, and radiographs of the

proximal humerus and femur. Each element was given a score and the final age estimate

was derived by summing the four scores and dividing the total by four, meaning that no region

was given more weight than another (Iscan 1989b). The Lovejoy et al. (1985a) method is

similar to that of Nemeskéri et al. (1960), but it uses principal components analysis to weight

the following indicators: pubic symphysis, auricular surface, proximal femur, dental

wear, and suture closure. Results from this study indicate that multifactorial methods are

superior to single indicators with regard to bias and accuracy. A test of the Lovejoy et al.

(1985a) method by Bedford and colleagues (1993), which eliminated suture closure and

dental wear, included the clavicle, and weighted the indicators according to their

reliability, again found that this method was very accurate and particularly suited for

paleodemography.
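
The difference between an unweighted combination and a weighted one can be sketched in a few lines. The example assumes that each indicator has already been converted to an age estimate in years. The estimates and weights are hypothetical, and the weights are simply supplied by hand rather than derived from a principal components analysis as in Lovejoy et al. (1985a).

```python
def unweighted_mean(estimates):
    """Average of indicator age estimates, each counted equally."""
    return sum(estimates.values()) / len(estimates)

def weighted_mean(estimates, weights):
    """Average of indicator age estimates, scaled by per-indicator weights."""
    total_weight = sum(weights[name] for name in estimates)
    weighted_sum = sum(estimates[name] * weights[name] for name in estimates)
    return weighted_sum / total_weight

# Hypothetical age estimates (years) from four indicators for one individual.
indicator_ages = {
    "endocranial_sutures": 48.0,
    "pubic_symphysis": 40.0,
    "proximal_humerus": 44.0,
    "proximal_femur": 36.0,
}

# Hypothetical weights favoring the pubic symphysis; not published values.
indicator_weights = {
    "endocranial_sutures": 1.0,
    "pubic_symphysis": 3.0,
    "proximal_humerus": 2.0,
    "proximal_femur": 2.0,
}

print(unweighted_mean(indicator_ages))                   # 42.0
print(weighted_mean(indicator_ages, indicator_weights))  # 41.0
```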

The multifactorial method, however, is not unequivocally accepted as the best

possible choice for age estimation. Saunders et al. (1992) found that the multifactorial

method did not outperform a simple averaging of age estimates, thereby eliminating the

need to calculate the complicated statistics the multifactorial method requires. In general,

greater age increased bias and inaccuracy, no matter what method was used (Saunders et

al. 1992). Schmitt et al. (2002), based on their auricular surface scoring system,

concluded that multifactorial methods are not more reliable than single indicators.

Regardless of the performance of multifactorial methods, it is difficult to argue against

the utility of multiple indicators when attempting to derive an age estimate from adult

skeletal remains.

The use of several methods or techniques to derive an overall age interval is

preferable in many cases because it uses all available information rather than a single

indicator. Anthropologists are constantly cautioned to avoid the use of single indicators

whenever possible (e.g., Brooks and Suchey 1990; Saunders et al. 1992; Martrille et al.

2007). Through the use of multiple age indicators, the anthropologist can work towards

estimating a more precise interval (Brooks and Suchey 1990). Cranial suture closure, the

fourth sternal rib end, and changes of the pubic symphyseal and auricular surfaces are

some of the most common techniques used to estimate the age of adult skeletal remains

(White and Folkens 2005) and can be readily combined to produce range charts as

recommended by Byers (2008).
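
One simple way to read a range chart is to compare the span covered by all indicators with the region where every indicator overlaps. The sketch below assumes that each method has produced a lower and upper age limit for the same individual. The intervals are hypothetical and are not taken from any published phase description.

```python
def overlap(intervals):
    """Age range shared by every method, or None if the ranges do not overlap."""
    low = max(lo for lo, _ in intervals.values())
    high = min(hi for _, hi in intervals.values())
    return (low, high) if low <= high else None

def envelope(intervals):
    """Widest range spanned by any method."""
    low = min(lo for lo, _ in intervals.values())
    high = max(hi for _, hi in intervals.values())
    return (low, high)

# Hypothetical age intervals (years) for one individual.
method_ranges = {
    "pubic_symphysis": (21, 46),
    "auricular_surface": (25, 34),
    "sternal_rib_end": (24, 32),
    "cranial_sutures": (23, 45),
}

print(envelope(method_ranges))  # (21, 46)
print(overlap(method_ranges))   # (25, 32)
```

The overlap is narrower than any single interval, but it should be treated as a starting point for the analyst's judgment rather than a final estimate, since it ignores how well each method performs.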

The use of multiple indicators does not come without its own caveats. Simply

applying as many age techniques as possible or available to a set of skeletal remains does

not ensure the best possible age-at-death estimate. Age estimation is distinctly related to

uncertainty in measurement; the use of methods with known error, bias, and inaccuracy

rates is essential. Once an individual has been categorized as a young or old adult,

Martrille et al. (2007) recommend further consideration of methods that have higher

accuracy for that age range. This will help “to maximize the potential of each method”

(Martrille et al. 2007:306) by excluding methods that do not perform well for older or

younger adults. This adaptation of multiple age indicators is especially important as

physical anthropologists continually refine age estimation methods and calculate error

rates for skeletal age estimation.

Single-Indicator

Multiple-indicator age estimates are usually based on the combination of

single-indicator methods. Single-indicator methods can be macroscopic or microscopic,

including gross, histological, chemical, and radiographic observations. In most instances,

the methods chosen are directly related to what the anthropologist is working with, taking

into account problems of differential preservation or recovery. It is important to

understand the strength of each method on its own when applying several single-indicator

methods for an overall age estimate. The methods discussed below are those that are used

most commonly for late-adolescent to adult macroscopic age-at-death estimation at the

JPAC/CIL.² The unique composition of the JPAC/CIL population³ and problems of

recovery and preservation of skeletal remains in general render some methods that would otherwise

be employed less frequently very useful (e.g., McKern and Stewart 1957; Mann et al.

1991) and other, more frequently used methods less useful (e.g., Iscan et al. 1984b).

Therefore, only those methods that are listed in SOP 3.4 or that were found to be used on

a routine basis are discussed.

Epiphyseal Fusion

Age estimation based on epiphyseal fusion falls under the category of growth

and development in the human skeleton and as such is only useful for those individuals

not yet fully skeletally mature. It is based on the documented progression of fusion of

proximal and distal ends of long bones and other skeletal sites. During the period of

skeletal growth, the observation of open, partially fused, or fully fused epiphyses is very

useful in constructing an age interval. Long bone, vertebral, and iliac crest epiphyseal

closure are useful for individuals in their late teens and early twenties, while the sternal

end of the clavicle can be informative up until the age of 30 for some individuals.

2 All methods employed are macroscopic with the exception of dental formation age

estimation techniques, i.e., Moorrees et al. (1963), Mincer et al. (1993).

3 The JPAC/CIL population is defined as all members of the U.S. military who are missing as

a result of American conflicts. The population is largely young, male, and Caucasian.

As part of a larger group of studies on skeletal age estimation conducted by

Todd and colleagues at Western Reserve University, Stevenson (1924) examined

epiphyseal closure for the purposes of age identification. This study was the first

comprehensive effort to document skeletal changes related to age and determined: “(1)

the age of the union of the individual epiphyses, (2) the sequence of union of the different

epiphyses, and (3) the actual duration of the period of epiphyseal union as a whole”

(Stevenson 1924:56). Before this, the only knowledge of age estimation based on

epiphyseal union was sporadically gleaned from anatomical texts with no attempt at

standardization (Stevenson 1924) and there was no agreement between sources (McKern

and Stewart 1957).

Stevenson (1924) outlined four stages of epiphyseal closure: no union,

beginning union, recent union, and complete union in ten bones: humerus, radius, ulna,

femur, tibia, fibula, scapula, innominate, ribs, and clavicle. The largest difference in

epiphyseal closure is that between non-union (stages one and two) and union (stages

three and four) (Stevenson 1924). No difference in rate of fusion was found between

races or sexes and Stevenson concluded that the ages of 15 to 20 years could be defined

as the “real period of epiphyseal union” (1924:76). Additionally, he considered the long

bones to be the most reliable and constant indicators of age, while the scapula,

innominate, clavicle, and ribs exhibited a much larger degree of individual variation

(Stevenson 1924).

McKern and Stewart’s (1957) Skeletal Age Changes in Young American

Males was the first study to use a non-anatomical documented skeletal collection (Bass

2005) and its organization follows closely that of Stevenson (1924), with the addition of

analyses of suture closure, third molar development, and fusion of the vertebrae, sternum,

and sacrum. A supplementary category of union was also added, “active,” and all

epiphyses were rated on a zero to five scale (zero = no union, five = complete union).

Due to previous confusion on the concept of when epiphyseal fusion is considered to

occur, McKern and Stewart (1957:18) “emphasize[d] the total range of maturational

activity and define[d] the age of union as that age when all cases are completely united.”

McKern and Stewart (1957) continues to be one of the major references for age

estimation using epiphyseal fusion, especially for young males.

McKern and Stewart (1957:41-42) divided the long bones or “extremities”

into two main groups: Group I – epiphyses showing early union and Group II – epiphyses

showing late union. Because the McKern and Stewart (1957) sample is comprised of

males of military age, no individuals under the age of 17 were a part of this study. Thus,

most Group I epiphyses were already fused for the youngest members of this group, so

this classification is much less informative than the Group II epiphyses. Group II

epiphyses include: proximal humerus, tibia, and fibula and distal radius, ulna, and femur.

Complete union of these epiphyses occurs by age 24 in all individuals in the McKern and

Stewart (1957) sample.

Developmental Juvenile Osteology (Scheuer and Black 2000) is a useful text

for age estimation based on epiphyseal closure of all skeletal sites. The goal of the book,

as stated by its authors, is to “describe each individual bone of the skeleton…from its

embryological origin to the final adult form” (Scheuer and Black 2000:1). As such, it is

an invaluable resource for age identification and it supplies data for both male and female

epiphyseal fusion. Unlike McKern and Stewart (1957), Scheuer and Black (2000) do not

provide a five-phase system of fusion. Ages are given for time of union; age estimation

using this source is therefore conducted in a binary fashion, scored either non-union or

union. This text is a compilation of many different sources and not based on a particular

sample.

Stevenson (1924), McKern and Stewart (1957), and Scheuer and Black (2000)

discuss the epiphyseal fusion of most of the major epiphyses in the skeleton. Later works

focused on particular categories of epiphyseal fusion, such as the long bones or the

clavicle. These more specific studies are important to the understanding of skeletal aging

because they further highlight the individual variability that exists in the human skeleton.

McKern and Stewart (1957) offered a rudimentary introduction to age

estimation based on the vertebral column, but concluded this section by stating that

changes seen in the vertebrae are far too variable to be of any use in age estimation.

Albert and Maples (1995) examined the fusion of superior and inferior vertebral rings in

thoracic and lumbar vertebrae. While the sample size is considerably smaller (n=55) than

the McKern and Stewart (1957) sample, this study was an attempt to look at sex and race

differences in the timing of epiphyseal fusion of vertebral centra. Albert and Maples

(1995) utilized a four-stage system, with stages zero through two each having an early

and late division. Vertebral ring fusion is fairly well-correlated with age in this study

(r=0.78), but the authors suggest that this method be used with other methods in order to

narrow the predicted age interval.

Epiphyseal fusion of the medial clavicle and anterior iliac crest is useful for

age estimation because of the delayed appearance of the epiphyses at these two sites.

Webb and Suchey (1985) is the primary source for age estimation from these skeletal

elements. Previous studies were limited by sample size (e.g., Todd and D’Errico 1928)

and underrepresentation of females, non-whites, and individuals under 17 years of age

and over 30 (e.g., McKern and Stewart 1957). Webb and Suchey (1985) utilized a four-

stage system to observe the epiphyses of the medial clavicle and anterior iliac crest in a

sample of 605 males and 254 females between the ages of 11 and 40. The authors provide

age distribution tables based on stage of union and “general rules” tables for quick

reference. The McKern and Stewart (1957) and Webb and Suchey (1985) studies show

similar patterns of epiphyseal formation based on the clavicle and iliac crest. The slightly

larger age intervals in the latter study are most likely due to greater variability from a

larger sample (Webb and Suchey 1985).

Even with comprehensive sources like McKern and Stewart (1957) and

Scheuer and Black (2000), research in epiphyseal closure is far from complete. While

Stevenson (1924) noted no differences in timing of fusion between males and females

and individuals of different ethnicities, recent studies suggest that population-specific

standards would be more appropriate. Identification efforts for Bosnia war dead have

shown that males in this sample exhibit epiphyseal closure up to three years earlier than

American males (Schaefer and Black 2005). Research using the Lisbon documented

skeletal collection points to the need to understand the socioeconomic background of the

individuals in the sample being studied, as growth and development may be affected by

conditions of malnutrition (Cardoso 2008). Standardization is also needed since studies

rarely use the same stage system for rating epiphyseal fusion or even the same

methodology, i.e., radiographic versus macroscopic examination of union.

Suture Closure

Age estimation from suture closure has a long history, beginning with Vesale

in 1542, who first noted a possible relationship between age and cranial suture synostosis

(Masset 1989). Varying degrees of obliteration in ectocranial, endocranial, and maxillary

suture sites can be correlated to skeletal age changes. A complete historical perspective is

given in Masset (1989). Suture closure methods are perhaps best known because of

issues that have been raised with their use. A large number of publications in the 1950s

(e.g., Brooks 1955) highlighted the uncertainty of this indicator because each study

published produced different results for age (Masset 1989). Even though suture closure

was one of the first methods to be used for age estimation from skeletal remains (Todd

and Lyon 1924, 1925), it fell further out of favor as new methods were developed and

refined during the 1980s (Iscan 1989a). Masset (1989) lists possible sources of error as

sex differences in timing of closure and the structure of the reference population and he

found that the correlation between age and cranial suture closure was never greater than

63%.

Given these limitations, the obliteration of cranial sutures can still be useful

for providing general age estimates and to corroborate other skeletal age indicators.

Meindl and Lovejoy (1985) published results on ectocranial suture closure and skeletal

age-at-death. Interestingly, they pointed to methodological inconsistencies as the key

issue behind problems with using cranial suture closure as an age indicator. Meindl and

Lovejoy (1985) employed a four-stage system to score suture closure at ten cranial sites.

Results indicated that the lateral-anterior sites were the best overall predictors of age and

that ectocranial suture closure was superior to endocranial (Meindl and Lovejoy 1985).

Lovejoy et al. (1985a) employed the Meindl and Lovejoy (1985) ectocranial suture

closure method as part of their summary age method, which they find to be the most

accurate means to estimate age-at-death.

Mann and colleagues devised a system of age estimation based on the

obliteration of the maxillary sutures (Mann et al. 1987, 1991). The method

uses four sutures of the maxilla: incisive, anterior median palatine, transverse palatine,

and posterior median palatine. In its original format, the amount of obliteration was

measured with a sliding caliper and converted to a percent for the entire suture (Mann et

al. 1987). The revised method eliminated measurement and relied solely on visual

inspection of the four sutures (Mann et al. 1991). Any obliteration on any portion of the

suture automatically places the individual in the age interval corresponding to obliteration

of that suture (personal communication, Robert Mann 2008). It also adds inspection of

the transverse suture within the greater palatine foramen and additional features for

individuals most likely over the age of 60, such as very thin bone and a narrow bony

ridge along the anterior median palatine suture. The progression of obliteration of

maxillary sutures in both studies is as follows: incisive suture, posterior median palatine,

transverse palatine, and finally, the anterior median palatine. General age estimates are

given based on the overall pattern of obliteration.

The Mann et al. (1987) method was tested by Gruspier and Mullen (1991).

This test was designed to look at interobserver error and assess the accuracy of the

maxillary suture obliteration method. Gruspier and Mullen (1991) found that the

relationship between maxillary suture obliteration and age was not linear and the

inaccuracy of the Mann et al. method exceeded all age estimation methods used in

Lovejoy et al. (1985a). Ginter (2005) further examined maxillary suture closure by

comparing the original and revised methods and testing the revised method. The revised

method performed much better than the original method; age phase was estimated

correctly for 83% of the individuals in the study (Ginter 2005). Ginter (2005) also

suggested that the revised maxillary suture method was more effective at estimating age

than more commonly accepted methods, e.g., the pubic symphysis and sternal rib ends.

Nawrocki (1998:290) found that all cranial suture closure methods are “not

that much worse than other techniques…and in fact better than some (e.g., pubic

symphysis, sternal end of the rib).” This conclusion is contingent upon the proper

construction of error intervals for all aging methods. Therefore, the problem with age

estimation from cranial suture closure is not the methods themselves, but the use of

inappropriate statistics. Nawrocki (1998) recommended using multiple areas of the vault

and provided several race- and sex-specific regression formulae based on stepwise

regression techniques that perform quite well. The investigation of cranial suture closure

methods for age estimation in skeletal remains continues to be an important, if

contentious, area of research in physical anthropology.

Third Molar Dental Formation and Eruption

Tooth development and eruption are believed to be under strong genetic

control and are therefore considered to be more reliable in predicting chronological age

than other osteological indicators (White and Folkens 2005). Stages of dental formation

are most useful for estimating the age-at-death of sub-adults, but the late formation and

emergence of the third molars renders this method useful for late adolescents and early

adults as well. However, third molars are also the most variable of all teeth (White and

Folkens 2005), exhibiting a high rate of agenesis.

Moorrees et al. (1963) is a comprehensive study of dental formation that looks

at age variation in ten permanent teeth. Teeth included in this study are the maxillary

incisors and all mandibular teeth. Stages are given for crown, root, and apex formation.

For the mandibular molars, results are reported for males and females as well as mesial

and distal roots. Reference charts are provided for both sexes that include the mean age of

attainment for a given stage of formation and two standard deviations. No additional

statistics are included and no data for the maxillary molars are given. Moorrees et al.

(1963) stress the importance of accounting for variability in individual dental formation

and the need for further research to better understand patterns of tooth formation.

Saunders et al. (1993) used the Moorrees et al. (1963) dental formation aging

method to estimate age for a sample of subadult remains (n=282) from a historic

cemetery. While the sample size for accuracy and bias tests was small (n=17), the

Moorrees et al. (1963) method produced age estimates within a standard deviation of half

a year. Saunders et al. (1993) concluded that the Moorrees et al. (1963) method is the best

method to use for juvenile skeletons; however, they also highlighted the similarity of the

reference sample and the target sample. Since this study did not specifically focus on

late adolescents, it may be difficult to draw exact parallels between this sample and

samples of late adolescents. Additionally, the Moorrees et al. (1963) formation reference

charts are difficult to interpret, which could very easily introduce error into further

studies.

A later study by Mincer et al. (1993) analyzed the development of third

molars as it related to chronological age and applications to the legal system. Diplomates

of the American Board of Forensic Odontology (ABFO) scored 823 cases using the

Demirjian et al. (1973) eight-grade system4. Results from the study are summarized in a

table that includes mean ages at attainment with one standard deviation and division by

race (black and white) and sex (male and female). Probabilities of an individual being at

least 18 years of age based on third molar dental formation are also given to aid in the

determination of the juvenile or adult status of an individual. The authors conducted their

own tests of accuracy and concluded that the “third molar is far from an ideal

developmental marker” (Mincer et al. 1993:386). Bass (2005) reemphasizes that the

accuracy of this method has been called into question. However, very few other methods

exist for this age period and in some cases the dentition may be the only element

available for analysis.

4 This system rates the development of the crown, root, and apex for all four types of teeth.

Stages A through H are assigned after radiographic examination of tooth formation.

Further research in third molar dental formation has shown that the process is

population-specific (Chaillet and Demirjian 2004). Solari and Abramovitch (2002)

investigated third molar formation in a Hispanic sample, again using the Demirjian et al.

(1973) system. Similar to Mincer et al. (1993), Solari and Abramovitch (2002) found that

third molars in males develop earlier than in females and that Hispanics in general

develop earlier than Canadian Caucasians. Other examples of work in this arena include

studies of southern French children (Chaillet and Demirjian 2004), Finnish children

(Chaillet et al. 2004), third molar development in Japanese juveniles (Arany et al. 2004),

and comparisons of third molar development between American blacks and whites

(Blankenship et al. 2007), which all demonstrate that there is inter-population variation in

dental formation. Clearly, caution must be used when applying methods to ethnically

diverse samples and these findings underscore the need to develop population-specific

methods for age estimation from the third molar.

McKern and Stewart (1957) briefly summarized third molar eruption in their

sample. While they recognized that the pattern of eruption for third molars is extremely

variable, they did also point out its importance as an age indicator that could further

corroborate observations from other skeletal indicators. Unerupted and partially erupted

third molars are the most useful for age estimation and 17 to 22 years of age was

identified as the main eruptional period, with the peak between the ages of 17 and 18

(McKern and Stewart 1957). In general, third molar formation is more useful than

eruption when radiographic analyses are possible.

Pubic Symphysis

The pubic symphysis is the most frequently used skeletal aging technique

(Aykroyd et al. 1999). According to some authors, it is “universally considered more

reliable than other criteria” (Meindl and Lovejoy 1989:138). This reliability is attributed

to the fact that, in general, other sites are less reliable and the age-related changes that

occur in the pubic symphysis are clear and distinct (Meindl et al. 1985; Meindl and

Lovejoy 1989), as well as late-occurring compared to epiphyseal fusion and dental

formation. A good history of anecdotal observations of possible age-related changes of

the pubic symphysis is provided by Todd (1920), but the first formal method of age

estimation using this element was not developed until the 1920s by Todd and his

colleagues (Todd 1920, 1921).

The Todd ten-phase system describes the modal appearance of each phase and

gives an age interval per phase. The ten phases are essentially the same regardless of sex

or race. The first three phases he terms “post-adolescent” and the final phase

encompasses all individuals over the age of 50. Photographs of several examples from

each phase are provided in all of his publications (Todd 1920, 1921). In a test of the Todd

system, Brooks (1955) found that it consistently over-aged male and female pubic

symphyses. While the original goal of Brooks’ study was to correlate cranial and pubic

indicators, Brooks (1955) found cranial suture closure to be wholly unreliable and instead

focused on modifying the age intervals per phase of the Todd pubic symphysis method to

attempt to correct the problem of over-aging with the pubic symphysis. Meindl et al.

(1985), in a test of four pubic symphyseal methods5 found the series of Todd methods to

be the most accurate and to have the best correlations between estimated and actual ages.

5 McKern and Stewart (1957), Gilbert and McKern (1973), Hanihara and Suzuki (1978), Todd

(1920).

McKern and Stewart (1957) devised a component system for males based on

three features of the pubic symphysis: dorsal plateau, ventral rampart, and symphyseal

rim. Each component is scored using six stages (zero to five) and these stages are added

to produce a composite score. The choice of features is based on a compilation of nine

features originally described by Todd (1920). McKern and Stewart (1957) found that the

original ten phases were too rigid to encompass the variability they saw in pubic

symphyseal faces. Their system of separate component scoring allowed for differential

development of features with the possibility of arriving at the same age estimate, e.g., a

score of 3-3-2 and a score of 2-4-2 both have total scores of eight and would be

categorized as an age interval of 22-28 years with a mean age estimate of 24.14 years.

Plastic casts are available for comparison and summary statistics are in the original

publication. McKern and Stewart’s (1957) own tests of their method indicated that

observers could arrive at the correct age estimate with approximately 90% accuracy.

Meindl et al. (1985) identified three problems that could influence performance of the

McKern and Stewart (1957) method: their sample is entirely male, the age range is

extremely limited, and the method was never tested on another population.
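The component scoring described above can be illustrated with a minimal sketch. It assumes only what is stated in the text: three components, each scored zero to five, summed into a composite score, with a total of eight corresponding to an interval of 22-28 years and a mean of 24.14 years. The function names and the lookup table are hypothetical, and the remaining totals would have to be filled in from the original McKern and Stewart (1957) tables.

```python
from typing import Optional, Tuple

# Only the entry quoted in the text is included here (total score of 8).
# All other totals are deliberately left out; they must come from the
# original publication, not from this sketch.
AGE_BY_TOTAL = {
    8: ((22, 28), 24.14),  # (age interval in years, mean age in years)
}


def composite_score(dorsal_plateau: int, ventral_rampart: int,
                    symphyseal_rim: int) -> int:
    """Sum the three component stages (each scored 0 to 5)."""
    for stage in (dorsal_plateau, ventral_rampart, symphyseal_rim):
        if not 0 <= stage <= 5:
            raise ValueError("each component is scored from 0 to 5")
    return dorsal_plateau + ventral_rampart + symphyseal_rim


def lookup_age(total: int) -> Optional[Tuple[Tuple[int, int], float]]:
    """Return (interval, mean age) for a total score, or None if not tabulated here."""
    return AGE_BY_TOTAL.get(total)


# Two different component patterns reach the same total and the same estimate.
for scores in [(3, 3, 2), (2, 4, 2)]:
    total = composite_score(*scores)
    print(scores, total, lookup_age(total))
```

Both score patterns sum to eight, which is the point made in the text: differential development of individual features can still lead to the same age estimate.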

Gilbert and McKern (1973), following McKern and Stewart (1957),

recognized the need for separate standards for females and adapted the three-component

system based on a sample of 120 females aged 17-55 years. The same three components

as the male system were used and the effects of birthing on pubic symphyseal appearance

were also researched. There was no significant activity observed beyond the age of 55

and the appearance of the pubis showed no correlation with parity. Females, however, did exhibit a

faster flattening of the dorsal surface and the full separation of the dorsal and ventral

demi-faces by the symphyseal rim, neither of which is seen in males (Gilbert and

McKern 1973). Suchey (1979) found, however, that the female three-component system

resulted in a correct age interval assessment only 51% of the time, mainly due to

difficulties in applying the method.

The 1980s saw the progressive development of the Suchey-Brooks pubic

symphysis aging method, which is now the most widely used pubic symphysis age

estimation method. Katz and Suchey (1986) reanalyzed the Todd and McKern-Stewart

systems with a large sample of male pubic bones (n=739). Regression analyses of

multiple variables (e.g., overall Todd score, ventral rampart) showed that a modified six-

phase Todd system performed better than all other methods and their possible

modifications. The modifications of the Todd system were to combine phases I, II, and

III, phases IV and V, and phases VII and VIII. Male phase descriptions and casts made

by France were first distributed in 1986 at anthropology conferences (Brooks and Suchey

1990). A system was then developed for females, who show much greater variability in

pubic symphyseal morphology than males (Suchey and Katz 1998). Finally, a set of

unisex descriptions was developed that focused on key changes observed in both male

and female pubic bones (Brooks and Suchey 1990). Most recently, Berg (2008) modified

the Suchey-Brooks system by reformulating Phases V and VI and adding a Phase VII.

These changes allow for more accurate aging of older females, but have yet to be tested

independently.
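The collapsing of the ten Todd phases into six, as described above, can be sketched as a simple lookup. The groupings (I-III, IV-V, VII-VIII) come from the text; the assumption that the remaining Todd phases (VI, IX, X) each stand alone and that the six resulting phases are numbered in order is an inference made for illustration only, and the names used are hypothetical.

```python
# Todd phase (1-10) -> modified six-phase system (1-6).
# Groupings I-III, IV-V, and VII-VIII are taken from the text; the sequential
# numbering of the resulting phases is an assumption for illustration.
TODD_TO_SIX_PHASE = {
    1: 1, 2: 1, 3: 1,   # Todd I, II, III combined
    4: 2, 5: 2,         # Todd IV and V combined
    6: 3,               # Todd VI (assumed to stand alone)
    7: 4, 8: 4,         # Todd VII and VIII combined
    9: 5,               # Todd IX (assumed to stand alone)
    10: 6,              # Todd X (assumed to stand alone)
}


def collapse_todd_phase(todd_phase: int) -> int:
    """Map a Todd phase (1-10) to the corresponding modified phase (1-6)."""
    if todd_phase not in TODD_TO_SIX_PHASE:
        raise ValueError("Todd phase must be an integer from 1 to 10")
    return TODD_TO_SIX_PHASE[todd_phase]


print([collapse_todd_phase(p) for p in range(1, 11)])
```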

Despite its fairly ubiquitous use in age determination, the Suchey-Brooks

method does have some inherent problems. In blind tests of several different methods of

skeletal age estimation6, Saunders et al. (1992) found the pubic symphysis fared the worst

and they questioned the usefulness of this method due to its broad age ranges. This high

level of imprecision may undermine the utility of this method in forensic cases. In

addition, the pubic symphysis is commonly damaged in forensic and archaeological

contexts (Saunders et al. 1992), rendering any age-determination from this element

impossible.

6 Suchey-Brooks pubic symphysis, Lovejoy et al. auricular surface, ectocranial suture closure,

sternal rib ends

Schmitt (2004) used the Suchey-Brooks pubic symphysis method on a sample

of Asian individuals and found that it tended to underage. The method was also highly

inaccurate in older groups, though this problem is by no means unique to this method

or to the pubic symphysis as a predictor of age. More troubling, Schmitt (2004) found

asymmetries in right and left bones from the same individuals for both the pubic

symphysis and the auricular surface. This means that bones from the same individual that

are found separately could be assigned to different phases, thereby increasing the error

associated with the age estimate. Asymmetry is not generally recognized as a problem in

age estimation from the pubic symphysis or the auricular surface (e.g., Brooks and

Suchey 1990; Falys et al. 2006)7.

The study by Schmitt (2004) suggests that current pubic symphysis and

auricular surface age estimation methods cannot be accurately or reliably applied to

individuals or groups with Asian origins. Sinha and Gupta (1995) also found that the

Todd pubic symphysis method yielded significantly different mean ages per

phase when applied to a sample of males from India. However, Brooks and Suchey

(1990) emphasized that the Suchey-Brooks pubic symphysis method was

developed from a large multiracial sample that should be a fairly good representation of

modern human variation. A final note of caution can be taken from Hoppa (2000), who

reminds anthropologists that skeletal age estimation is far from perfect and that age-

related changes of the pubic symphysis can be significantly different between target and

reference samples.

7 Brooks and Suchey (1990) do not discuss asymmetry between right and left pubic

symphyseal surfaces. Falys et al. (2006) found no significant side differences (Mann-Whitney U-test) between left and right auricular surfaces.

Auricular Surface

When compared to the pubic symphysis, use of the auricular surface is far less

accepted or common in skeletal age estimation. This technique was originally developed

in 1985 by Lovejoy and colleagues for paleodemographic and archaeological

applications. It now appears as a standard method in osteology lab manuals, such as

Standards for Data Collection from Human Skeletal Remains (Buikstra and Ubelaker

1994), but is still the focus of considerable research concerning its accuracy and

reliability.

The Lovejoy et al. (1985b) auricular surface aging method relies on gross

morphological changes of the auricular surface of the ilium, similar to the age-related

changes of the pubic symphyseal face. These changes were originally categorized into

eight phases. The auricular surface is potentially advantageous as an age indicator

because of its higher preservation potential when compared to the pubic symphysis and

the presence of morphological age-related changes beyond 50 years old (Lovejoy et al.

1985b). Additionally, it has the capability to perform as well as aging methods using the

pubic symphysis (Lovejoy et al. 1985b) and Saunders et al. (1992) found that it

performed the best in blind tests of different age estimation methods.

Age estimation using the auricular surface of the ilium is not as easy to master

as other aging techniques (Lovejoy et al. 1985b) and there are no casts for comparison.

The photos in the original publication only represent the “modal surface appearance for

each age category” and much more emphasis is thus placed on qualitative descriptions

(Saunders et al. 1992:114). Another common complaint is that the auricular surface

technique suffers from a general lack of standardization and more work is needed to

improve the method (Saunders et al. 1992).

Murray and Murray (1991), concerned with the applicability of the Lovejoy et

al. (1985b) auricular surface method to forensic anthropology, tested the accuracy of this

method using individuals of known age-at-death from the Terry Collection. Analysis of

variance (ANOVA) indicated that morphological changes in the auricular surface of the

ilium are dependent on age but not sex (Murray and Murray 1991). Further statistical

analyses showed that auricular surface changes are also independent of ancestry, but that

these changes were too variable to be used as a single indicator of age-at-death (Murray

and Murray 1991). These results indicate that the auricular surface method overages

younger individuals and underages older individuals and its range of estimation error is

too large to be used on a case-by-case basis, as in forensic anthropology.

One of the issues raised by Murray and Murray (1991) was the viability of the

original phases as proposed by Lovejoy et al. (1985b). Osborne (2000) used the original

descriptive terms from the Lovejoy et al. (1985b) method combined with descriptive

statistics to redefine the eight-phase system into a six-phase system. Further research by

Osborne et al. (2004) indicated that the original five-year intervals did not accurately

reflect true variation in auricular surface morphology because age only accounted for

34% of the variation in auricular surface morphology and that a combined scoring system

for auricular surface features fared better than any single indicator. The Osborne et al.

(2004) study highlights the importance of using statistical tests and calculating accuracy,

bias, and confidence intervals when examining the viability of any aging technique.
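As a brief illustration of the accuracy and bias calculations mentioned above, the sketch below uses one common formulation: bias as the mean signed difference between estimated and actual age, and inaccuracy as the mean absolute difference. These definitions are an assumption for illustration and are not necessarily those used by Osborne et al. (2004); the ages in the example are hypothetical, not data from any study.

```python
from typing import List, Sequence


def _paired_differences(estimated: Sequence[float],
                        actual: Sequence[float]) -> List[float]:
    """Return estimated minus actual age for each individual."""
    if len(estimated) != len(actual) or len(estimated) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return [e - a for e, a in zip(estimated, actual)]


def bias(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Mean signed error: positive values indicate over-aging."""
    diffs = _paired_differences(estimated, actual)
    return sum(diffs) / len(diffs)


def inaccuracy(estimated: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error, regardless of direction."""
    diffs = _paired_differences(estimated, actual)
    return sum(abs(d) for d in diffs) / len(diffs)


# Hypothetical estimated and known ages (years) for four individuals.
estimated_ages = [30, 40, 50, 60]
actual_ages = [25, 45, 50, 56]
print(bias(estimated_ages, actual_ages))        # 1.0
print(inaccuracy(estimated_ages, actual_ages))  # 3.5
```

A small bias can coexist with a large inaccuracy when over- and under-estimates cancel, which is why both values are usually reported together.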

Buckberry and Chamberlain (2002) presented a revised method of age

estimation from the auricular surface of the ilium. Their method uses a quantitative

scoring system that assigns numbered stages to different features of the auricular surface

based on the criteria from Lovejoy et al. (1985b). A composite score is correlated with

one of seven auricular surface stages to estimate age-at-death. Buckberry and

Chamberlain (2002) believe that their method more realistically expresses the age

changes seen in each feature and is easier to apply.

A test of the revised method (Buckberry and Chamberlain 2002) as compared

to the original method (Lovejoy et al. 1985b) indicated that the revised method is more

accurate for individuals between 20 and 49 years of age, but less accurate between 50 and 69

years of age (Mulhern and Jones 2005). In this test, the revised method was also found to

be easier to apply than the original method and showed no significant differences

between white and black or male and female individuals (Mulhern and Jones 2005).

Mulhern and Jones (2005) caution, however, that the auricular surface is not accurate

enough to be used as the only indicator of age in older adults.

Falys et al. (2006) also tested the revised method and their results were similar

to those of Mulhern and Jones (2005) concerning ease of application and precision of

aging. Falys et al. (2006) modified the Buckberry and Chamberlain (2002) revised

technique of seven stages by proposing a new three-stage system that aggregates certain

composite scores. This aggregation allows for a discrimination of older versus younger

individuals and the three stages show significant differences in age. While the Falys et al.

(2006) system does not aid in distinctly separating middle-aged individuals, it does show

some promise for individuals over the age of 60.

Igarashi et al. (2005) proposed another new method for age estimation based

on the auricular surface of the ilium. This method is based on a collection of modern

Japanese skeletal remains with known age-at-death. Nine features in males and seven

features in females are scored as either present or absent. The features were a

combination of relief and texture categories and are well defined within the study. A

feature was marked as present if it was found anywhere on the surface, i.e., “on the basis

of [an] all-or-none principle” (Igarashi et al. 2005:327). While this may eliminate scoring

of gradients, this method warrants further consideration because of the new qualitative

categories described and its development based on a non-western sample. As highlighted

by Schmitt (2004), age estimation is problematic for Asian individuals and samples

because most methods have been developed using European groups and these techniques

may not actually be ancestry independent, as has been suggested by other researchers

(e.g., Murray and Murray 1991, Osborne et al. 2004, Mulhern and Jones 2005).

Sternal Rib Ends

The largest contributors to research on age estimation from the sternal rib end

have been Iscan, Loth, and colleagues. Age-related changes were first noted by Kerley

(1970), but no research was conducted until the early 1980s (Loth and Iscan 1989).

Numerous publications during the 1980s described the then newly developed technique,

accompanied by phase descriptions, photographs, and descriptive statistics (e.g., Iscan et

al. 1984a, 1984b, 1985). Sternal rib end estimation was originally based on component

analysis of three features of the right fourth rib: Component I – pit depth, Component II –

pit shape, Component III – rim and wall configurations (Iscan et al. 1984a). The

component system was then modified to a phase system based on the overall changes

seen in form, shape, texture, and quality of the sternal rib (Iscan et al. 1984b). There are

nine phases total (0-8) and standards are given for both white males and females (Iscan et

al. 1984b, 1985). Casts of the different phases of the right fourth sternal rib end are

available for comparative analyses.

Blind tests of the white male and female phase methods found that both were

reliable and that interobserver error was minimal (Iscan and Loth 1986a, 1986b).

Participants of varying levels of experience were asked to match unknown ribs to

photographs of specimens in order to assign a phase. Iscan and Loth (1986a, 1986b)

found that the unknown ribs were almost always placed within one phase of the correct

chronological age. Additionally, Iscan et al. (1989) presented a study at the annual

meeting of the American Academy of Forensic Sciences in which they found that the rib

shows much less variation than the pubic symphysis. An independent test of fourth rib

aging supported its use as an age indicator and found no significant differences in age

estimation between white and black males (Russell et al. 1993). Iscan et al. (1987),

however, identified differences in age-related changes of the sternal rib end between

blacks and whites. Because black individuals were consistently over-aged beginning in

their mid-30s (phases five through seven), Iscan et al. (1987) recommended the

modification of the existing white standards. However, no new standards were developed

for non-white individuals.

Iscan and colleagues clearly support their method as one of the best for

skeletal age estimation. However, Nawrocki (n.d.) has questioned its validity because of

the statistics presented in the original studies. The 95% confidence intervals published by

Iscan et al. (1984b, 1985) are in reality the confidence intervals for the population mean

and not for the range of values possible per phase (Nawrocki n.d.). Accordingly, the

intervals presented as appropriate for each phase of the Iscan sternal rib end method are

far too small. Constructing accurate error ranges is thus a vital part of any age estimation

method, as it will significantly impact the final age interval (Nawrocki n.d.).
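
The distinction drawn by Nawrocki (n.d.) can be illustrated with a brief numerical sketch. The ages below are hypothetical values for a single phase and are not taken from any published standard. The critical value uses a normal approximation (an assumption made for simplicity; a t-based value would be somewhat larger for a sample this small). The sketch contrasts an interval for the phase mean with an interval for the age of one new individual assigned to that phase.

```python
import math
import statistics

# Hypothetical known ages-at-death for individuals assigned to one phase
# (illustrative values only, not data from any published rib standard).
phase_ages = [24, 26, 27, 28, 29, 30, 31, 32, 34, 39]

n = len(phase_ages)
mean_age = statistics.mean(phase_ages)
sd_age = statistics.stdev(phase_ages)  # sample standard deviation

# Normal approximation to the 95% critical value (about 1.96); this is an
# assumption, and a t-based value would widen both intervals slightly.
z = statistics.NormalDist().inv_cdf(0.975)

# Half-width of the interval for the population MEAN age of the phase.
# It shrinks toward zero as the sample size grows.
ci_half_width = z * sd_age / math.sqrt(n)

# Half-width of the interval for the age of ONE new individual in the phase.
# It never drops below about z standard deviations, however large the sample.
pi_half_width = z * sd_age * math.sqrt(1 + 1 / n)

mean_interval = (mean_age - ci_half_width, mean_age + ci_half_width)
individual_interval = (mean_age - pi_half_width, mean_age + pi_half_width)

print(f"mean age {mean_age:.1f}, standard deviation {sd_age:.1f}")
print(f"95% interval for the phase mean: {mean_interval[0]:.1f} to {mean_interval[1]:.1f}")
print(f"95% interval for an individual:  {individual_interval[0]:.1f} to {individual_interval[1]:.1f}")
```

Under these assumptions the interval for an individual is wider than the interval for the mean by a factor of the square root of (n + 1), which is why reporting the latter as the age range for a phase understates the error.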

The original sternal rib end method only employed the right fourth rib.

However, the right fourth rib may be absent or the remains so fragmentary that

determination of rib number or side is impossible. Yoder et al. (2001) demonstrated that

ribs IV through IX exhibited similar age-related changes on both right and left sides and

therefore could be used to estimate age with some caution. A summary method of age

estimation, based on a composite of rib series scores, is preferable when the fourth sternal

rib end is not available or observable (Yoder et al. 2001).

Kunos et al. (1999) developed a method of age estimation based on changes

seen in the first rib. While not exclusively based on the sternal end, this method is an

interesting alternative. Kunos et al. (1999) concluded that their method was reliable and

simple, providing age estimates comparable to those produced from multifactorial

methods. Schmitt and Murail (2004) found this method to be far too subjective in its

application, with a correct classification rate of only 55%. This study highlights the

importance of understanding variability and application of single-indicator age estimation

methods. Using a large sample of known age-at-death Balkan males, Bayesian statistics,

and a new first rib age estimation method modified from Kunos et al. (1999), DiGangi et

al. (2009) suggest that the first rib may be able to detect age-related morphological

changes into the ninth decade. This method has yet to be tested on other samples, but the

results are certainly promising.

The Statistical Basis of Age Estimation

Regression and Correlation

Age estimates are based on skeletally manifested age indicators. Therefore,

the amount of error in any age estimate is directly related to how well a given age

indicator correlates with actual age (Aykroyd et al. 1997). Aging methods have

traditionally relied on linear regression and correlation models that use an observed

morphological age-related change or changes to predict age-at-death. Regression and

correlation models are based on using a known and independent variable (x), the

“indicator,” to predict the unknown and dependent variable (y), the “age.” Konigsberg et

al. (1994) also refer to this as the regression of age on an indicator.

An age estimate is derived by comparing a skeletal element or elements to

photos, casts, or descriptions related to a method and assigning a phase, stage, or score

based on that method. The method then usually reports an age interval, e.g., 25-30 or

60+, for the stage. More statistically robust methods can include any combination of the

following per stage: sample size, a mean (point estimate), median (midpoint), standard

deviation, 95% confidence (or prediction) interval, range, accuracy/inaccuracy, or bias.

Regression and correlation methods have the advantage of relatively easy application

because, once a regression equation has been derived, it is possible to estimate the age of

unknown individuals through a process known as inverse calibration (Aykroyd et al.

1997). Inverse calibration is a statistical procedure that utilizes least squares regression to

estimate values from given data. For age estimation, the estimated value would be age

and the given data would be an age indicator. In inverse calibration, the relationship

between the age indicator (x) and the estimated age (y) is assumed to be linear.

Error is based on the correlation of a morphological age indicator to

chronological age; thus, the poorer the correlation between variables, the greater the bias

in the age estimate (Aykroyd et al. 1999). Some statisticians and anthropologists now see

regression models as inherently flawed for estimating age (e.g., Aykroyd et al. 1997,

1999; Konigsberg and Frankenberg 2002), mainly due to the assumed linear relationship

between x and y variables. Classical calibration has been offered as a solution to the bias

inherent in regression models (Konigsberg et al. 1994; Aykroyd et al. 1997). This

variation of regression switches age to the x-variable and the indicator to the y-variable,

devising “an equation for y in terms of x” (Aykroyd et al. 1997:262), i.e., the regression

of the indicator on age (Konigsberg et al. 1994). While this reformulation of regression

results in lower bias, it is also less efficient (Konigsberg et al. 1994), meaning that it

results in greater variability and higher inaccuracy in age estimates than inverse

calibration (Aykroyd et al. 1997). Problems with both inverse and classical calibration

have led to ongoing scrutiny of the statistical analysis of age estimation.
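
The difference between the two calibration approaches can be sketched in a few lines of code. The reference sample below is hypothetical, and the function names are illustrative rather than drawn from Aykroyd et al. (1997) or Konigsberg et al. (1994). Inverse calibration regresses age on the indicator and reads age directly from the fitted line. Classical calibration regresses the indicator on age and then solves the fitted line for age.

```python
def least_squares(x, y):
    """Return (slope, intercept) of the least squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical reference sample: known ages and the phase score each received.
ages = [22, 25, 31, 38, 44, 47, 55, 61, 68, 79]
scores = [1, 2, 2, 3, 3, 5, 4, 6, 5, 7]


def inverse_calibration(score):
    """Regress age on the indicator, then read age directly off the line."""
    slope, intercept = least_squares(scores, ages)
    return intercept + slope * score


def classical_calibration(score):
    """Regress the indicator on age, then solve the fitted line for age."""
    slope, intercept = least_squares(ages, scores)
    return (score - intercept) / slope


mean_age = sum(ages) / len(ages)
print(f"reference sample mean age: {mean_age:.1f}")
for score in (1, 4, 7):
    print(score,
          round(inverse_calibration(score), 1),
          round(classical_calibration(score), 1))
```

For any score, the inverse estimate lies closer to the mean age of the reference sample than the classical estimate does (by a factor equal to the squared correlation). This is one way to see why inverse calibration tends to over-age the young and under-age the old, and why the classical estimates, while less biased, are more variable.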

Bayesian Analysis

Statistical techniques more recently introduced into physical anthropology

include the use of Bayesian-based prediction models. Konigsberg and Frankenberg argue

that using Bayesian analysis is “the only logical way to proceed in estimating age in

paleodemography” (2002:306) because it solves the problem of reference sample

mimicry (Konigsberg and Frankenberg 1992). Forensic anthropologists have also begun

to incorporate Bayesian models, and not just for age estimation (e.g., Lucy et al. 1996;

Ross and Konigsberg 2002; Schmitt et al. 2002; Edgar 2005). Steadman et al. (2006)

presented the use of likelihood ratios in forensic anthropology to give an overall sense of

the strength of an identification based on key elements of the biological profile. Lucy et

al. (1996) identify applications of Bayes’ theorem as especially useful when analyzing

ordinal or categorical data and Schmitt et al. (2002) found that Bayesian prediction was

reliable and useful for individuals over the age of 50.

Bayes’ theorem uses prior probability, maximum likelihood ratios, and

posterior probability, and deals with the age of individuals on a case-by-case basis as part

of a larger sample. Prior probability is the expected probabilistic outcome of a

hypothesis; for aging, this means the probability of an individual being a certain age with

no other information beyond the assumption that he or she is similar to the sample being

used (Aykroyd et al. 1999). The likelihood is based on observed traits, i.e., the probability

of an individual being a certain age based on the distribution of the sample given that

particular score (Aykroyd et al. 1999). Finally, posterior probability is based on both the

prior probability and the likelihood, or the probability of an individual belonging to an

age group based on the prior probability and likelihood (Aykroyd et al. 1999). Aykroyd et

al. (1999:65) summarize these terms in the following manner: “the posterior probability

is proportional to the prior probability multiplied by the likelihood.”
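
The proportionality summarized by Aykroyd et al. (1999) can be illustrated with a minimal sketch. The age classes, prior probabilities, and likelihoods below are hypothetical values chosen only for illustration. The likelihood is taken here to be the probability of observing a given phase score within each age class, as might be tabulated from a known-age reference sample. The posterior is obtained by multiplying prior by likelihood and rescaling so that the probabilities sum to one.

```python
# Hypothetical prior: assumed age structure of the target sample.
prior = {"20-39": 0.5, "40-59": 0.3, "60+": 0.2}

# Hypothetical likelihoods: probability of observing a phase 5 score in each
# age class (illustrative values, not taken from any published method).
likelihood_of_phase_5 = {"20-39": 0.05, "40-59": 0.40, "60+": 0.55}


def posterior(prior, likelihood):
    """Multiply prior by likelihood per age class, then normalize to sum to 1."""
    unnormalized = {age_class: prior[age_class] * likelihood[age_class]
                    for age_class in prior}
    total = sum(unnormalized.values())
    return {age_class: value / total
            for age_class, value in unnormalized.items()}


post = posterior(prior, likelihood_of_phase_5)
for age_class, probability in post.items():
    print(f"{age_class}: {probability:.3f}")
```

In this example the score is most likely to be observed in the oldest class, yet the posterior favors the middle class because the prior assigns that class more individuals. This is the sense in which the estimate reflects the structure of the target population as well as the observed indicator.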

Using Bayesian statistics, age estimates are given as probabilities of an

individual being a certain age given observed age indicators and the structure of the

observed population (and not the reference sample). Prior probabilities can be determined

by assuming uniform priors, using fixed model age structures, or estimating prior

probabilities in the target sample using the observed distribution of indicators in the

sample (Chamberlain 2006). While mathematically more complex, Bayesian age

estimation has the advantage of having a lower Mean Absolute Deviation (MAD) than

conventional regression models (Aykroyd et al. 1999) and has been shown to be better at

predicting ages beyond the fifth decade of life (Schmitt et al. 2002). Bayesian models

also support a non-linear relationship between age-at-death and skeletal indicators

(Schmitt et al. 2002). Bayesian models can be disadvantageous because they often require

a large, well-distributed reference sample of known-age individuals with a range of

measured age indicators (Aykroyd et al. 1999), though this challenge can sometimes be

mitigated.

Summary

Research in skeletal age estimation is an on-going endeavor in physical

anthropology. Accurate age estimation is a central facet of forensic anthropology,

paleodemography, and bioarchaeology. The best age estimations are constructed from

multiple methods, with an understanding of the strengths and weaknesses of each method

used to construct the final age interval. Even with decades of developments and

improvements, “age estimation from adult skeletal remains is one of the more difficult

and error-prone procedures in biological anthropology” (Chamberlain 2006:105). It is for

this reason that anthropologists must now work to understand and quantify error.

CHAPTER III

UNCERTAINTY ANALYSIS

Uncertainty analysis is a key component to evaluating and improving methods

in all scientific disciplines. In order to better understand measurement uncertainty, it is

necessary to quantify error. Error is the difference between the true value and the

measured value (Brach and Dunn 2004). Measurement uncertainty is expressed by a

range of possible values that could exist for a given measurement based on an estimate of

error for that measurement (Brach and Dunn 2004). Questions asked by those

investigating uncertainty include: “What is the instrument measuring? What units are

being reported by the instrument? What is the precision of the measurement?” (Brach and

Dunn 2004:2). This chapter will discuss standards of estimation of uncertainty, error, and

uncertainty in skeletal age estimation.

Standards

ISO/IEC 17025

The ISO is the global authority on the development of standards. The ISO/IEC

17025 (2005) is currently the international baseline document for general requirements

for testing and calibration laboratories. Only those standards that apply to testing

laboratories will be discussed because anthropology laboratories, in general, are

concerned with the testing of evidence and not with calibration. ISO/IEC 17025 sets the

minimum requirements for laboratory accreditation and quality management, including

guidelines for the estimation of uncertainty of measurement (Section 5.4.6). Uncertainty

is defined by the ISO as a “parameter that characterizes the dispersion of the quantity

values that are being attributed to a measurand, based on the information used”

(2004b:2.11).

Section 5.4.6 of ISO/IEC 17025 does not outline specific procedures for

estimating uncertainty in measurement in order to allow for flexibility in application of

the standards, especially when applied to testing laboratories. Certain types of

measurements may not lend themselves to rigorous mathematical testing of uncertainty

like the procedures outlined in the ISO Guide to the Expression of Uncertainty in

Measurement (GUM)–Supplement 1 (2004a). It is the responsibility of the testing

laboratory to “attempt to identify all the components of uncertainty and make a

reasonable estimation, and…ensure that the form of reporting of the result does not give a

wrong impression of the uncertainty” (ISO 2005:5.4.6.2). This estimation can be based

on previous validation studies, experience, and knowledge of method or measurement

performance (ISO 2005). Another important facet of ISO/IEC 17025 is the need to

recognize possible sources of uncertainty, including: reference standards and reference

materials used, methods and equipment used, environmental conditions, properties and

condition of the item being tested or calibrated, and the operator (5.4.6.3, Note 1).

ASCLD/LAB-International

The ASCLD/LAB-International program accredits crime laboratories in

accordance with ISO/IEC 17025 standards and ASCLD/LAB-International Supplemental

Requirements. This accreditation is usually part of a larger quality assurance program

undertaken by the laboratory. Like the ISO, “ASCLD/LAB does not prescribe a specific

formula for estimating uncertainty of measurement” (2007:2), but a laboratory must

include certain elements when attempting to analyze uncertainty in measurement. These

elements are: specify what is being measured and the measurement system, construct and

document an appropriate measurement uncertainty budget, identify and list all potential

sources of uncertainty while dismissing any potential sources that do not impact the

uncertainty of measurement, gather measurement data, estimate the uncertainty of

measurement using an appropriate formula, document estimated uncertainty and put

results/supporting documentation in the laboratory, and maintain and calculate as need

arises (ASCLD/LAB 2007:3-4).

An important clarification in the ASCLD/LAB-International Supplemental

Requirements entails what measurements require an estimation of uncertainty. Only

numerical values in quantitative tests are subject to estimation of uncertainty

requirements; the term used for these values is “measurements that matter”

(ASCLD/LAB 2007). Measurements that matter are defined by ASCLD/LAB (2007:4) as

measurements that are “used, or may reasonably be expected to be used, by an immediate

or extended customer (anyone in the judicial process) to determine, prosecute or defend

the type or level of criminal charge(s).” Qualitative tests do not require that estimates of

uncertainty be generated (ASCLD/LAB 2007). However, the updated ASCLD/LAB

uncertainty of measurement requirements recognize that other measurements made

during analysis may impact the accuracy of the final report (ASCLD/LAB 2008). These

“critical measurements made during analysis” will also require estimation of uncertainty

after all “measurements that matter” have been appropriately reported and documented

(ASCLD/LAB 2008) and further updates to ASCLD/LAB requirements will reflect the

need to estimate uncertainty in all methods employed.

JPAC/CIL

The JPAC/CIL is the only accredited laboratory dedicated solely to forensic

anthropology in the world. The CIL Quality Assurance Model has an overall goal of

maintaining the highest scientific integrity at all times and adheres to the surety model.

Surety is a form of quality assurance that sets standards to ensure that each and every end

product is of equal quality. This model is used in situations where a high degree of

reliability is required and was first developed in the 1930s by the aircraft industry

(JPAC/CIL 2008).

The JPAC/CIL Uncertainty of Measurement Policy is covered in Annex B of

SOP 4.0 in the JPAC Laboratory Manual (2008). At the CIL, “measurements that matter”

are “those that affect or have the potential to affect the overall conclusions of an

identification” (JPAC/CIL 2008:16). In general, this includes numerical values and

methods that are based on metric analyses. Since the CIL does not perform metrological

calculations of error, the main concerns are recognizing the potential for uncertainty in

measurement and making reasonable efforts to estimate uncertainty.

The list of components of uncertainty at the CIL is identical to that in ISO/IEC

17025 Section 5.4.6.3, Note 1, with additional descriptions of their effects on methods

employed at the CIL. The most significant factor in uncertainty is human error. As long

as methods and equipment are being used for their intended purposes and in accordance

with original, established instructions, there is very little uncertainty associated with test

results. Estimates of uncertainty must be conducted when new methods or modified

methods are put into use at the CIL. In these circumstances, estimates will comply with

ASCLD/LAB-International Requirements for the analysis of uncertainty (see above).

While skeletal age estimation methods are not currently listed as requiring estimates of

uncertainty at the CIL because of their generally qualitative and quasi-continuous nature,

ever-adapting ASCLD/LAB-International requirements and updates to uncertainty of

“critical measurements” mean that more stringent guidelines are imminent.

NAS

The recent report released in 2009 by The National Academy of Sciences

(NAS), titled Strengthening Forensic Science in the United States: A Path Forward,

highlights the importance of forensic science research encompassing uncertainty in

measurement and demonstrates the need for more stringent guidelines for the practice of

forensic science. This report is the result of a lengthy study of the field of forensic

science as conducted by the NAS. The document as a whole deals with problems of

regulating a decentralized system in need of national leadership and standards. It has

already been recognized by the American Academy of Forensic Sciences (AAFS) as a

significant contribution to and critique of the field of forensic science (personal

communication, Thomas Bohan March 15, 2009).

Chapter 6 of the NAS report deals specifically with method improvement,

practice, and performance in forensic science. Of paramount importance is the section on

uncertainties and bias, which clearly states that all forensic science methods need to

“indicate the uncertainty in the measurements that are made” (NAS 2009:6-1).

Additionally, the recommendations made are very closely linked with the aims of this

thesis: addressing issues of accuracy, reliability, and validity in methods

(Recommendation 3). The NAS also proposes that the National Institute of Forensic

Science (NIFS) establish model laboratory reports in accordance with ISO/IEC 17025

(Recommendation 2). Finally, the NIFS is encouraged to promote research on the effects

of human error in forensic science investigations, including determining causes of bias

and ways to quantify and describe error (Recommendation 5). This report demonstrates

that uncertainty analysis is at the forefront of research in the forensic sciences.

Error

Analysis of error is what determines the degree of uncertainty in a

measurement. Error is quantified by calculating the difference between the actual (or

true) value and the measured or estimated value. Error is a recurring problem of

measurement and the researcher must decide how much error can be tolerated in

experimentation and how to measure this error (Youden 1998). As Bernard (2002:426)

points out, “no set of data is free of error.” Ultimately, the goal is to minimize error while

accurately characterizing any error that is present so as not to overstate the accuracy of

the method (Komar and Buikstra 2008). There are two types of error that contribute to

uncertainty: random error and systematic error. The sum of random and systematic error

is equal to the total uncertainty of a measurement (Brach and Dunn 2004).

Random error is also referred to as Type A error and affects the precision and

repeatability of the measurement (Brach and Dunn 2004). This type of error is

statistically quantifiable and limited by repeated measurements and condition control

(Brach and Dunn 2004), although it is always present in measurement. Random error can

be related to the observer or observers, i.e., the human factor. In skeletal age estimation,

random error can occur because of incorrect assignment of an individual to a stage or

phase, or may be due to the analyst’s level of experience. Repetition of measurements

and replication of experiments does not guarantee complete accuracy or removal of

random error, but it can help point out errors (Youden 1998), especially if they result

from inconsistent applications of a method or measurement (Adams and Byrd 2002).

Systematic error, also known as Type B error, affects the accuracy of the

sample mean value in relation to the true mean value (Brach and Dunn 2004). This error

can be minimized by careful calibration, but lacks statistical information (Brach and

Dunn 2004). Systematic error in skeletal age estimation is usually related to the natural

variability of human aging that cannot be completely accounted for in osteological

techniques. In this case, the original phase or stage may not accurately reflect the true

variation of a particular age indicator in the human population. Measures of variability

include range, variance, and standard deviation (Komar and Buikstra 2008), and these can be

used when comparing a target sample to a reference sample.

Uncertainty in Age Estimation

Since age estimations are not metric measurements, the more mathematically

robust standards of measuring uncertainty, such as those in the ISO GUM Supplement 1

(2004a), are not applicable. However, the concept of uncertainty is still important

because it is necessary to understand the error associated with a method. Error directly

contributes to the accuracy of an age estimate and must be accounted for in

identifications in accordance with Daubert. Large error can also affect age estimation in

paleodemography, leading to skewed or incorrect mortality and health profiles. In

general, the error being analyzed is random error, which is the error associated with the

assignment of individuals to age intervals based on the application of age estimation

methods. Systematic error may be less readily apparent because it relates to the method

as it was originally developed.

Levels of Measurement

The methods used to estimate age-at-death are largely qualitative in nature.

Because they assign individuals to a given phase or stage based on observed age

indicators, they generally produce ordinal data (e.g., phase one, stage three). These values

can be rank ordered, but the differences between the ranks are not meaningful (Bernard

2002). For example, an auricular surface scored as a phase four is not from an individual

twice as old as one scored as a phase two. Ordinal-level data can be analyzed statistically using

nonparametric tests (e.g., median tests, Spearman’s rank-order correlation coefficient),

but these are less statistically robust than tests for interval- and ratio-level data.

Some age estimation methods generate composite scores. These scores can be

treated as continuous variables because the score given can take on any value on a

continuum (Neter et al. 1988). Continuous variables are significant because they are

either interval- or ratio-level and can be described with continuous probability

distributions, the most well-known and most important in statistical analysis being the

normal distribution (Neter et al. 1988). The normal distribution is required for more

robust statistical tests and is used to draw conclusions about the larger population based on

a sample (Levin and Fox 2007).

Additionally, interval-level variables are produced when data for known age-

at-death and means for phases are available. Using the Suchey-Brooks pubic symphysis

method, a female scored as a phase three would be assigned a point age estimate of 30.7

years, which represents the mean age for that phase interval. This point estimate is then

comparable at an interval-level with the actual age of the individual. These data are

significant because they use units of measurement (years) that have known and

meaningful distances (Bernard 2002). Interval-level data can be analyzed statistically

with parametric tests, which include the ability to compare means between samples [e.g.,

Student’s t-test, analysis of variance (ANOVA)] and calculate and describe the degree of

association between two variables (e.g., Pearson’s r correlation, regression analysis)

(Levin and Fox 2007).

Osborne et al. (2004) made a case for the use of parametric tests, such as

analysis of covariance (ANCOVA), in determining the factors that affect auricular

surface morphology. ANCOVA is a statistical test similar to ANOVA, with the addition

of extra variables, or covariates, to examine variation while holding one variable

constant. In order to use ANCOVA, there must be a continuous dependent variable,

which is either interval- or ratio-level. The original age interval phases were designed by

Lovejoy and colleagues as discrete entities to make applying the method easier. However,

Osborne et al. (2004:907) argued that the large number of phase categories and the

general regularity of age-related changes of the auricular surface “closely mimic a true

continuous variable.” Therefore, they were able to apply interval-level statistical tests that

produced more robust information about age changes in the auricular surface. Their

discussion demonstrates that statistical analysis is left to the discretion of the researcher,

who must, in turn, understand the data that he or she is interpreting and what results can

be produced by different tests and assumptions.

Calculating Error

The rates of correct and incorrect classification are a useful starting point for

further investigation of error. Actual (or known) age-at-death is compared with the age

interval generated by the method. If the actual age falls within this interval, the

assignment is considered to be correct. Conversely, if it is not within the interval, it is

incorrect. The total number of correct and incorrect assignments can then be converted to

percentages by dividing by the total sample and multiplying by 100. Rates close to 100%

correct classification indicate that the method is very accurate in assigning age intervals.

Tests and validation studies of methods usually indicate the correct classification rate. An

example of this is Suchey’s (1979) test of the Gilbert and McKern (1973) female pubic

symphysis aging method, which yielded a correct classification rate of only 51%. Correct

and incorrect classification rates are a simple calculation, but offer little information on

precision of a method and are only vague indicators of random and systematic error.
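
The calculation described above can be sketched as follows. The ages and intervals are hypothetical, and two conventions are assumed for illustration: interval bounds are treated as inclusive, and an open-ended interval such as 60+ is represented with an infinite upper bound.

```python
def correct_classification_rate(actual_ages, intervals):
    """Percentage of individuals whose known age falls inside the assigned interval."""
    correct = sum(1 for age, (low, high) in zip(actual_ages, intervals)
                  if low <= age <= high)
    return 100.0 * correct / len(actual_ages)


# Hypothetical known ages and the age interval each individual was assigned.
actual_ages = [23, 31, 45, 52, 67]
intervals = [(20, 29), (25, 34), (30, 39), (50, 59), (60, float("inf"))]

rate = correct_classification_rate(actual_ages, intervals)
print(f"correct: {rate:.0f}%, incorrect: {100 - rate:.0f}%")
```

Here four of the five individuals fall within their assigned intervals. The rate says nothing about how far outside the interval the fifth individual falls, or in which direction, which is why inaccuracy and bias are needed.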

Calculations of inaccuracy and bias are the most commonly employed

formulae for quantifying error in age estimation and examine how well a particular

method performs in relation to the actual ages of a sample of individuals. Inaccuracy is

the average error of a method given in years, but it does not take into account the

direction of the error (Meindl and Lovejoy 1989). The equation for inaccuracy takes the

absolute value of the difference between the estimated and actual age of each individual;

the sum of these absolute differences divided by the sample size gives the overall inaccuracy.

Bias is similar to inaccuracy; it is the average error of a method in years that takes into

account under- or over-aging. The equation for bias is identical to that of inaccuracy,

except for the elimination of the absolute value to allow for directionality. Bias and

inaccuracy can be calculated for methods, phases within methods, and age classes.
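
Both quantities can be computed in a few lines. The sketch below uses hypothetical estimated and actual ages. It assumes that differences are taken as estimated age minus actual age, so that a negative bias indicates under-aging and a positive bias indicates over-aging.

```python
def inaccuracy(estimated, actual):
    """Mean absolute difference between estimated and actual age, in years."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)


def bias(estimated, actual):
    """Mean signed difference (estimated minus actual), in years."""
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)


# Hypothetical point estimates (e.g., phase means) and known ages-at-death.
estimated = [25, 30, 40, 50, 60]
actual = [22, 33, 38, 58, 71]

print(f"inaccuracy: {inaccuracy(estimated, actual):.1f} years")
print(f"bias: {bias(estimated, actual):+.1f} years")
```

In this example the signed errors partly cancel, so the magnitude of the bias is smaller than the inaccuracy. The negative sign shows that the method under-ages on average, an effect driven by the two oldest individuals.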

Pearson’s correlation coefficient (r) measures the strength and direction of a

relationship between two variables. It is therefore a measure of precision because it

demonstrates how much each variable deviates from its respective mean (Levin and Fox

2007). For age estimation, correlation examines the strength of the relationship between

the estimated age based on an age-indicator and the actual age (e.g., Mulhern and Jones

2005). An age indicator that is only weakly correlated with actual age could be the source

of significant methodological error. Regression can also be used to further elucidate the

nature of the relationship of two variables, given that one variable is dependent upon the

other. The squared correlation (r²), or coefficient of determination, explains the variance

of one variable in terms of another. For age estimation, this could be used to explain how

much variation in an age indicator can be explained by age. The remaining percentage

would be related to other factors, including error.
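As a rough illustration of the correlation and coefficient of determination discussed above, the following Python sketch computes Pearson's r from first principles. The helper name pearson_r and the five paired ages are assumptions made for the example, not values from this study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the respective means.
    cross = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical known ages (X) and estimated ages (Y), in years.
known_ages = [20, 25, 30, 35, 40]
estimated_ages = [22, 24, 33, 34, 42]

r = pearson_r(known_ages, estimated_ages)
r_squared = r ** 2  # proportion of variance in Y accounted for by X
```

The quantity 1 - r_squared then corresponds to the remaining share of variation attributed to other factors, including error.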

Interobserver error is also a significant source of error in age estimation.

There will always be variation between researchers conducting the observations and the

extent of this relates to the reliability of a method. Interobserver error can be measured

using correlation and inaccuracy (e.g., Bedford et al. 1993). The kappa statistic can also

be used to measure agreement between tests for nominal data, encompassing both intra-

and interobserver variation (Komar and Buikstra 2008). A good age estimation method

must exhibit low interobserver error.
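One common form of the kappa statistic is Cohen's kappa for two observers, which compares observed agreement with the agreement expected by chance. The sketch below assumes that form and treats phase assignments as nominal categories; the function name and the two observers' scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers scoring the same specimens."""
    n = len(ratings_a)
    # Proportion of specimens on which the two observers agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Agreement expected by chance, from each observer's marginal frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    # Undefined (division by zero) if both observers use a single category.
    return (observed - expected) / (1 - expected)


# Hypothetical phase assignments for eight specimens.
observer_1 = [1, 2, 2, 3, 3, 3, 4, 4]
observer_2 = [1, 2, 3, 3, 3, 4, 4, 4]

kappa = cohens_kappa(observer_1, observer_2)
```

A kappa of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance.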

Adams and Byrd (2002) devised a Scaled Error Index (SEI) as part of their

study on interobserver error of postcranial skeletal measurements. The SEI is represented


by: $\mathrm{SEI} = \frac{|x - \mathrm{median}|}{\mathrm{median}} \times 100$, where $x$ represents a single measurement and

median the midpoint of all measurements. This index is useful because it allows for a

comparison of measurements that is unaffected by scale or sample size (Adams and Byrd

2002). The SEI also facilitates comparisons between individuals taking the measurements

and the measurements themselves, so it can be used to analyze both interobserver error

and method performance. Differences in mean SEI can be examined using a one- or two-

tailed t-test, or, for several methods, ANOVA.
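The median-based index can be illustrated with a short Python sketch. The function name and the five repeated measurements are hypothetical and serve only to show the arithmetic.

```python
from statistics import median


def scaled_error_index(measurements):
    """SEI for each measurement: percent deviation from the group median."""
    mid = median(measurements)
    return [abs(x - mid) / mid * 100 for x in measurements]


# Hypothetical repeated measurements of one bone by five observers (mm).
femur_lengths_mm = [452.0, 450.0, 455.0, 449.0, 450.0]

sei_values = scaled_error_index(femur_lengths_mm)
mean_sei = sum(sei_values) / len(sei_values)
```

Because every value is expressed as a percentage of the median, SEI values from large and small measurements can be pooled or compared directly.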

Sources of Error

Skeletal aging methods do not produce exact ages; they produce estimates

with associated error rates, like all forensic anthropological techniques (Adams and Byrd

2002). There are several sources of error in age estimation, which can be associated with

those listed in ISO/IEC 17025, 5.4.6.3, Note 1. If incorrectly constructed, the reference

standards can contribute to error. The human skeleton exhibits natural variation and so

reference standards may not always express the full extent of human variability in aging.

Similarly, the reference materials may not be evenly distributed across all age classes,

which can influence the development of the method. Methods can be inherently flawed,

so that even their correct application produces incorrect results. The equipment used for

age estimation is most often comparison exemplars or casts used as part of an aging

method. Improper use of these materials will certainly affect results. Additionally, casts

may not express all possible variations of a particular phase since they were selected to

represent an average expression of age-related traits. Environmental conditions will most

likely have a negligible impact on age estimation, as will properties and condition of the


item (skeletal element) being tested, unless it exhibits severe pathological or taphonomic

alterations. In these instances, age will either not be estimated or estimated with caution.

The final cause of error in age estimation is the operator. According to the

JPAC Laboratory Manual (2008:18), “The most significant, ever present, and widely

varied uncertainties are the result of human error and other variances in performance. The

ways in which human performance affects uncertainty of tests may vary widely.” Analyst

error can be mitigated, but never entirely removed. Implementation of quality control

procedures and standardization of analytical techniques aid in minimizing the human

component of overall uncertainty in measurement.

Summary

Understanding error and uncertainty in measurement means that we can be

more confident in standardization and the comparison of results from different

laboratories and anthropologists. Comprehensive studies of methods and techniques

applied in one laboratory or institution offer an interesting historical perspective on

methodology, but these studies are also useful in the context of uncertainty analysis as

required by international scientific standards. A review of all methods used and inter-

method comparisons can provide an overarching critique that addresses temporal and

methodological concerns in a specific setting. This type of study is especially important

for those laboratories that are considering accreditation and for institutions and

individuals that regularly conduct casework for law enforcement agencies and require a

known error rate for laboratory manuals, standard operating procedures (SOPs), and

expert witness testimony. The following study is an attempt to estimate uncertainty in


methods used for age estimation at the JPAC/CIL and directly addresses many of the

current issues raised by the NAS hearings.


CHAPTER IV

METHODS I: RETROSPECTIVE STUDY

A sample of 979 individuals was compiled from the case files and records

archived at the JPAC/CIL. This chapter discusses the sample and sample selection, data

collection process, and methods used to analyze uncertainty in skeletal age estimation for

this sample. Calculations include: correct versus incorrect classification of individuals,

bias, inaccuracy, scaled error index (SEI), and correlation between known age-at-death

and estimated age.

The Sample

The JPAC/CIL is responsible for the recovery and identification of American

service men and women lost in past conflicts. Once an individual has been identified, the

remains are returned to the family and a case file is kept on record, usually in both hard

copy and electronic format. These case files contain any reports written by analysts at the

CIL, analytical notes, individual military and medical records, and other identification

media. From 1972 to 31 July 2008, the JPAC/CIL identified 1,717 individuals from

worldwide conflicts. Of these, 979 individuals have records with adequate information

concerning known age-at-death and methods used to estimate age.

The overall JPAC/CIL sample is almost completely male, and the sample for

this study is exclusively male. The individuals in this sample are also generally very


young, with a limited number of individuals over the age of 40. Additionally, an

overwhelming majority of the individuals are white, although some black and Hispanic

individuals are present in records from later conflicts such as the Vietnam War. A

distribution of individuals identified by conflict is given in Figure 1. The majority of

identified cases are from Southeast Asia and World War II conflicts, with a more limited

number from Korea, the Cold War, and other conflicts. Only one individual has been

identified from World War I. The JPAC/CIL population is unique because it comprises

individuals of the same sex, with very similar ages, statures, and “races,” who died

from similar causes and in similar manners.

[Figure 1: bar chart of the count of identified individuals (vertical axis, 0 to 1200) by conflict (horizontal axis): Cold War, Korea, Other, Southeast Asia, World War I, World War II.]

Figure 1. Individuals identified by conflict (N=1717).


Data Collection

Before beginning data collection, a list of all identified individuals was

obtained from the JPAC network and entered into a Microsoft© Excel spreadsheet. This

list included the accession number, name of the individual, approval date, date received,

associated conflict, associated country, and incident number or reference number

(REFNO) for each case. These data were organized in chronological order by accession

number to facilitate data collection from archived records. Additional data collection

categories were added for this study: known age-at-death in years or years and months

when given, analyst’s initials, method(s) used to estimate age, the phase or predicted age

interval for each method, and a place for additional comments.

All data were entered by hand onto ledger-size paper and then re-entered into

a computer database after all data collection was complete. Abbreviations were used for

analysts and methods and a key kept on file. No distinctions were made between right

versus left sides unless recorded as different phases or stages by the analyst. Aging

methods do not cite a significant difference in assigning an age estimate between sides

(e.g., McKern and Stewart 1957 for epiphyseal fusion of the clavicle; Falys et al. 2006

for the auricular surface) so it is expected that side will have a negligible impact on

measurement uncertainty.

All hand-entered data were transferred to a Microsoft© Excel spreadsheet,

undergoing reorganization and cleaning during this process. The data set was cleaned by

eliminating cases that had no available age-at-death or age estimation data because

further analyses would not be possible. Individuals who were eliminated were left in the

original database and a second database was created to encompass only the cleaned data


set. This step reduced the sample size from 1717 to 979 individuals. Ages-at-death were

rounded to the closest year since year and month were not available for all individuals.

For zero to five months, the age was rounded down and for six to eleven months rounded

up. Skeletal aging methods do not give estimates as precise as year and month. Therefore,

the reporting of age-at-death by year, whether it is due to original exclusion of months in

the military report or rounding, is unlikely to affect accuracy in data analysis.
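The rounding rule stated above (zero to five months rounded down, six to eleven months rounded up) is simple enough to express directly. The function name below is hypothetical; the original rounding was done in a spreadsheet.

```python
def round_age_at_death(years, months=0):
    """Round an age given in years and months to the closest whole year.

    Zero to five months round down; six to eleven months round up.
    """
    if not 0 <= months <= 11:
        raise ValueError("months must be between 0 and 11")
    return years + 1 if months >= 6 else years


# Hypothetical examples.
age_a = round_age_at_death(23, 5)   # rounded down
age_b = round_age_at_death(23, 6)   # rounded up
age_c = round_age_at_death(19)      # month not recorded; year used as given
```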

Column headings were added to the cleaned database for each method type

(e.g., auricular surface, pubic symphysis) and column subheadings for the specific

methods (e.g., Mincer et al.; Suchey-Brooks). Data were then entered by method and

method type (e.g., pubic symphysis) when a specific method was not listed. Separate

spreadsheets were created for each subheading (specific method) and the data sorted by

method and entered into their respective spreadsheets. Each spreadsheet included the

accession number, known age-at-death, and age estimation as determined by that method,

per individual. Age estimations that were not method-specific were removed at this point

because they did not offer information on the performance of the method. For example, if

the femoral head was cited as fully fused but no reference was given, this data point was

eliminated. Some methods could be inferred (e.g., fully fused epiphyses in cases

identified before 1985 since McKern and Stewart (1957) would have been the only

reference up until this year), but case files that did not cite a specific method were

generally eliminated from the individual method data sets.

For each method-specific spreadsheet, data were re-sorted in chronological

order by accession number. Considerable reorganization was then needed in order to

begin analysis of the data. Since the sample spans 36 years of data collection at the CIL,


there was a wide variety of analytical styles and data recording methods in use.

Therefore, for each method, two columns were added: one for the reported phase or stage

and one for the age interval associated with this stage. In some cases, the analyst reported

only the stage, in other cases, only the age interval. The missing category was filled in

using the referenced method so that both phase/stage and estimated age interval were

listed for each individual. The wide variety of methods used meant that slight

modifications had to be made per method; these are discussed in the next section.

Additional columns were also added as needed for further analyses, e.g., predicted point

estimate, calculations of bias, inaccuracy, and SEI (see below).

Method-Specific Modifications

Epiphyseal Fusion. Scoring of epiphyseal fusion was not consistent between

methods. Data were recorded according to the referenced method, but are not easily

comparable between methods. A breakdown of scoring is given in Table 1. Several cases

had age estimates based on references not commonly used at the CIL, including:

Krogman (1939), McKern (1970), Pyle and Hoerr (1955), White (1991, 2000), Kunos et

al. (1999), and White and Folkens (2005). These cases were eliminated from further

analyses due to their small sample sizes; each was used only once or twice overall.

Table 1. Epiphyseal scoring by method

Score  McKern-Stewart  Scheuer-Black         Webb-Suchey                        Albert-Maples
0      Not fused       Not fused/Incomplete  N/A                                No union
1      Beginning       Complete              Nonunion w/out separate epiphyses  Early-beginning, Late-progressing
2      Active          N/A                   Nonunion w/separate epiphyses      Early-almost complete, Late-recent
3      Recent          N/A                   Partial Union                      Complete
4      Complete        N/A                   Complete                           N/A

For McKern and Stewart (1957), a variety of terms were used by CIL analysts

to describe fusion. When no actual score was given, the descriptive terms were converted

to a number. Fusion described as partial, incomplete, gap, patent, or lapsed was given a

score of two; nearly complete, line, or late stages of union a score of three; and slight line

or small scar a score of four. The number conversion allowed easier comparison of the

data in table format.
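The conversion of descriptive terms to McKern and Stewart (1957) scores amounts to a lookup table. The following Python sketch encodes the terms listed above; the dictionary and function names are hypothetical, and only exact matches are handled, because substring matching would confuse terms such as "line" and "slight line".

```python
# Descriptive terms and the scores assigned to them in this study.
TERM_TO_SCORE = {
    "partial": 2, "incomplete": 2, "gap": 2, "patent": 2, "lapsed": 2,
    "nearly complete": 3, "line": 3, "late stages of union": 3,
    "slight line": 4, "small scar": 4,
}


def score_from_description(description):
    """Return the numeric score for a term, or None if the term is not listed."""
    return TERM_TO_SCORE.get(description.strip().lower())


score_patent = score_from_description("Patent")
score_scar = score_from_description(" small scar ")
score_unknown = score_from_description("fused?")  # unlisted terms need manual review
```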

Additionally, epiphyseal fusion does not occur at the same point in time for all

sites of a single bone, so it was necessary to record the stage of fusion for each site. This

required significant reorganization of the data reported for both McKern and Stewart

(1957) and Scheuer and Black (2000) into tables that listed each epiphysis of each bone.

Reconsultation of case reports, notes, and photographs was required so that the epiphyses

present for each case could be recorded in instances when the analyst gave the age

estimate based on “complete fusion of all epiphyses.” Late and early distinctions, if

noted, were removed for these methods since no such category exists in the references.

When both sides were recorded, the right side was chosen. Epiphyses that had a sample

size smaller than 15 were eliminated from further analyses, which included all age

estimates based on Scheuer and Black (2000) and the iliac crest using Webb and Suchey

(1985).

Suture Closure. Reporting for cranial suture closure was highly sporadic and

every effort was made to consolidate the data for analysis. In many cases, only a basic

description of the state of the cranial sutures was given and no method was referenced.


Age estimations using Meindl and Lovejoy (1985) were given as composite scores, mean

ages, and age intervals. When descriptions of the individual scores were given, these

could be converted into one of the four degrees of closure from the reference method. A

table was compiled that included the composite score, system (vault or lateral-anterior),

and the ten observation sites. Even with this reorganization and a sample size of 22, there

were still not enough composite scores or suture descriptions to adequately analyze this

method and it was excluded from further analyses.

For maxillary suture closure, both the Mann et al. 1987 and 1991 methods

have been employed at the CIL; these methods are not the same. All age estimations

produced using the 1987 method (n=7) were distinguished from those made using the

1991 method (n=55). In many instances, only a description of the state of the sutures was

included in reports or notes and these descriptions were converted to age intervals based

on the 1991 publication. Ginter (2005) was used to attempt to consolidate the many

intervals provided by the analysts since very few of them corresponded to the age

intervals provided by Mann et al. (1991) for the general pattern of suture obliteration.

Only those age estimations with closed intervals (e.g., 15-20) were used to calculate bias,

inaccuracy, and SEI and examine the correlation between known and estimated age-at-

death.

Third Molar Formation. The method of Moorrees et al. (1963) requires

analysis of both mesial and distal tooth roots and data collection and recording followed

this procedure. The method is only applicable to posterior mandibular teeth and all

incisors, although it was used occasionally at the CIL to describe maxillary third molars.

These data points (n=8) were removed from further analysis of the sample. Since data


were present for both mesial and distal roots of both mandibular third molars, there were

235 possible data points for 105 individuals.

Mincer et al. (1993) distinguish between maxillary and mandibular third

molars and black and white individuals, but not individual roots. All data referencing this

method were cleaned to reflect the stage per tooth, even when the stage per root was

given by the analyst. Information on the ancestry of the individual was also collected

when known. One case showed a discrepancy in stages between mesial and distal roots

for the same tooth and this tooth was given the same score as the other third molars for

this individual. There were 160 possible data points for 92 individuals.

Pubic Symphysis. Reporting of pubic symphysis age estimates almost always

occurred by method and required little reorganization of data. No modifications were

made to the scores and phases reported. Estimates that were not referenced were

eliminated.

Auricular Surface. Auricular surface age estimates appeared to reference

several sources, but on closer examination all used the Lovejoy et al. (1985b) phase

description with the exception of ten individuals that were scored with Buckberry and

Chamberlain (2002). Analysts who referenced Bedford et al. (1989) were referring to the

publication of color photos of the auricular surface, which were distributed to clarify the

phase descriptions of Lovejoy et al. (1985b). Age estimates with this reference were

combined with those using Lovejoy et al. (1985b). These were then assigned the Lovejoy

et al. phases when only an interval was given and all estimates were then also assigned

the Osborne et al. (2004) modified phases as per the JPAC Laboratory Manual SOP 3.4.

Analyses could then be undertaken with both the original and modified phases.


Sternal Rib Ends. Age estimates from the sternal rib ends appeared to be based

on a number of different references. However, they were all different publications of

Iscan and colleagues and could be consolidated into a generalized category based on their

age estimation method. Three individuals were aged based on the Kunos et al. (1999) first

rib criteria but these were eliminated from further analyses because of small sample size.

Data Analysis

Once the data were organized by method, known age-at-death distributions

were examined for each of the methods and compared to the overall sample by means of

ANOVA and Student’s t-tests to determine whether the sub-samples were representative

of the larger group. Distributions per method were also graphically represented as

histograms to look at their shapes. Descriptive statistics were calculated for each method

sample to compare actual ages of individuals making up the larger sample.

Correct and incorrect classification percentages were calculated for each

element or method. A column was added to the right of the phase number for the binary

system used to code correct versus incorrect classification. A zero meant that the actual

age did not fall within the predicted age interval, a one meant that it did. A simple sum of

the column gave the count of correct classification for that method or element. The

correct and incorrect classifications were then tabulated in the same spreadsheet and

percentages calculated based on the total sample size. All correct and incorrect

classifications were then entered into a separate table for ease of comparison between

elements and methods. Sample sizes for correct and incorrect classification are generally

larger than those for calculations of bias, inaccuracy, and SEI because age estimations

spanning multiple phases (e.g., the analyst listed the sternal rib end as Phase IV-V)

had to be eliminated from the more robust statistical analyses.
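The binary coding of correct versus incorrect classification can be sketched as follows. The example assumes that interval bounds are inclusive, which is not stated explicitly above; the function name and the four cases are hypothetical.

```python
def classify(actual_age, interval):
    """Return 1 if the known age falls within the predicted interval, else 0.

    Bounds are treated as inclusive; this is an assumption of the sketch.
    """
    low, high = interval
    return 1 if low <= actual_age <= high else 0


# Hypothetical cases: (known age-at-death, predicted age interval).
cases = [(23, (18, 25)), (31, (20, 29)), (27, (25, 35)), (40, (30, 39))]

codes = [classify(age, interval) for age, interval in cases]
correct_count = sum(codes)  # a simple sum of the 0/1 column
percent_correct = 100 * correct_count / len(codes)
percent_incorrect = 100 - percent_correct
```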

To compare the results obtained by CIL analysts with the published

references, the distribution of known age per phase within a single method was

calculated. Saunders et al. (1992) graphically depicted results of similar analyses by

superimposing known ages of individuals in their sample over the 95% confidence

intervals for the reference standards. This graphing technique was modified for this study

to allow for larger sample sizes than those in Saunders et al. (1992). Therefore, each point

on the scatterplot does not always represent one individual, but all individuals that were

of that known age-at-death. The graphs represent known age-at-death distributions per

phase compared to the intervals of the referenced method per phase. The mean of each

reference interval when given in the method is represented on the graphs by a diamond.

All intervals are based on the male standards when given.

Initial analyses revealed that epiphyseal fusion methods, with the exception of

the iliac crest, S1-S2, and the medial clavicle, were being used to confirm adulthood in a

majority of the cases. The terminal stage of epiphyseal fusion offers little information on

uncertainty in age estimation since an individual with all long bone epiphyses fused could

be twenty-four or eighty years old. Even late-fusing epiphyses did not have adequate

sample sizes in the earlier stages to allow for meaningful statistical comparison.

Epiphyseal fusion methods were eliminated from further analyses. Dental formation

methods suffer from a similar problem, but further analyses were undertaken because

small sample sizes could be mitigated by combining data for all teeth in an individual.


A scaled error index (SEI) based on Adams and Byrd (2002) was devised to

allow for between- and within-method (phase to phase) comparison of error.

The equation used for each individual was:

\[ \mathrm{SEI} = \frac{\lvert \text{Estimated Age} - \text{Actual Age} \rvert}{\text{Actual Age}} \times 100 \]

Estimated age is either the mean of the phase or stage based on the original

publication or the midpoint of the phase or stage if no mean was included in the method.

In some instances the median may also be used, e.g., Buckberry and Chamberlain’s

(2002) revised auricular surface method. The average SEI per method and per phase

within the method was calculated. Tests of significance between phases of a single

method were conducted by ANOVA. If significant differences were found, Student’s t-

tests or ANOVA with the Bonferroni correction were run to further elucidate error by

phase.
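A minimal Python sketch of the age-based SEI follows, including the fallback from the published phase mean to the interval midpoint and the averaging of SEI per phase. The phase table, records, and function names are hypothetical and do not reproduce any published standard.

```python
from collections import defaultdict


def point_estimate(phase_info):
    """Use the published phase mean if available, otherwise the interval midpoint."""
    if phase_info.get("mean") is not None:
        return phase_info["mean"]
    low, high = phase_info["interval"]
    return (low + high) / 2


def age_sei(estimated_age, actual_age):
    """Absolute error expressed as a percentage of the actual age."""
    return abs(estimated_age - actual_age) / actual_age * 100


# Hypothetical phase definitions; phase 2 has no published mean.
PHASES = {
    1: {"mean": 18.5, "interval": (15, 23)},
    2: {"mean": None, "interval": (20, 30)},
}

# Hypothetical records: (assigned phase, known age-at-death).
records = [(1, 20), (1, 18), (2, 25), (2, 20)]

sei_by_phase = defaultdict(list)
for phase, actual in records:
    sei_by_phase[phase].append(age_sei(point_estimate(PHASES[phase]), actual))

mean_sei_by_phase = {p: sum(v) / len(v) for p, v in sei_by_phase.items()}
```

The per-phase lists in sei_by_phase are the groups that would be passed to ANOVA when testing for differences between phases.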

Bias and inaccuracy were also calculated per individual based on the

equations provided by Mulhern and Jones (2005):

\[ \text{Bias} = \frac{\sum (\text{estimated age} - \text{actual age})}{N} \]

\[ \text{Inaccuracy} = \frac{\sum \lvert \text{estimated age} - \text{actual age} \rvert}{N} \]

The sum of both bias and inaccuracy was compiled per method and per phase

within the method. Tests of significance were performed using ANOVA for multiple

phases and methods and Student’s t-test between pairs of methods or phases if ANOVA

detected statistically significant differences. Bias was depicted in histograms per method

to examine its distribution. Bias was chosen over inaccuracy and SEI because the

equation allows for negative values, thus giving a full picture of its distribution.


Pearson’s r was calculated per method to determine the strength of the

relationship between actual age and estimated age. The estimated mean/median/midpoint

(Y) was compared to the actual age (X) and an r-value calculated to determine the

strength of this relationship. Actual age was used as the independent variable because an

estimated age should depend on morphological changes related to the aging process

(Schmitt 2002). The coefficient of determination (r²) was also calculated in order to

explain how much of the variation in Y could be determined by X. The linear relationship

between Y and X was tested statistically with ANOVA. While phases or stages of age

estimation methods are normally considered to be ordinal data and therefore not

appropriate for correlation, Osborne et al. (2004) argued that the continuous nature of

data produced by age estimation allows for the use of parametric tests. Additionally,

using the mean age of the predicted phase for each individual provides interval-level data,

although using the mean point estimate may introduce additional error into the analyses

and does not provide as strong a correlation as other interval- or ratio-level data.

Calculating the error of each method individually is not entirely realistic.

Normally, a final age estimation includes all available indicators to come up with an

overall age interval. However, this study will only examine each method’s individual

performance in the JPAC/CIL sample. Further studies using the same data set and a

combination of methods per individual may be possible, but this was not the scope of the

current study.


Summary

The retrospective study outlined here encompasses all identified individuals at

the JPAC/CIL with adequate data on known age-at-death and age estimation methods.

Comparisons of age distributions give an overall understanding of the sample and how

well each sub-sample represents the larger group. Comparisons of estimated ages to

known ages-at-death will quantify uncertainty associated with skeletal age estimation

methods.


CHAPTER V

METHODS II: INTEROBSERVER ERROR

STUDY

Following data collection for all individuals and age estimation methods in the

JPAC/CIL identified sample and initial analyses of these samples, three methods were

chosen for a preliminary interobserver error study to assess method reliability. These

methods are: Buckberry and Chamberlain’s (2002) revised auricular surface method,

Iscan et al.’s (1984b) sternal rib end for males, and Mann et al.’s (1991) maxillary suture

method. This chapter discusses the choice of methods, design of the study, and methods

of data analysis.

Choice of Methods

Buckberry and Chamberlain (2002)

While the original auricular surface method (Lovejoy et al. 1985b) had an

adequate sample size in the study sample, the revised method (Buckberry and

Chamberlain 2002) was only employed ten times in analyses at the CIL. A histogram of

bias revealed an irregular distribution, most likely due to the small sample size. However,

the method had a 100% correct classification rate. Additionally, it is the author’s opinion

that the Buckberry and Chamberlain (2002) method is used far less than the Lovejoy et

al. (1985b) method, especially since the latter appears in Standards for Data Collection


(Buikstra and Ubelaker 1994) and has been around for longer than the revised method.

Subjecting Buckberry and Chamberlain’s (2002) method to interobserver analyses will

produce a larger sample size and a better understanding of the use of this method as it

relates to experience and level of comfort. Is this method a reliable means of age

estimation?

Iscan et al. (1984b)

The sternal rib end method as developed by Iscan and colleagues is used

frequently for age estimations and is included as one of the main methods for age

estimation in Standards for Data Collection (Buikstra and Ubelaker 1994). In the

JPAC/CIL sample, it is used less regularly; the overall sample size is only 21 individuals.

Due to the poor preservation of skeletal remains in archaeological contexts, especially in

Southeast Asia where a large number of individuals are recovered by the JPAC, ribs are

often not recovered and, if recovered, the sternal end may be missing or too damaged for

analysis. Of those individuals who had sternal rib ends intact enough for age estimation,

the correct classification rate for the Iscan et al. age estimation method in the JPAC/CIL

sample was 71.4%. This could be due to problems in application of the method or with

the original age intervals as addressed by Nawrocki (n.d.). A test of interobserver error

using the original method will hopefully clarify at least one of these issues and increase

the overall sample size.

Mann et al. (1991)

The Mann et al. (1991) maxillary suture method is one of the more obscure

methods used for age estimation, and before arriving at the JPAC/CIL, the author had

never heard of or used this method. The Mann et al. maxillary suture method was used 62


times in the course of age estimations at the CIL and represents one of the larger sample

sizes, with the exception of epiphyseal fusion methods. With an overall correct

classification rate of 88.7%¹, the Mann et al. maxillary suture method holds promise for

accurate age estimation. However, is the success of the Mann et al. method related to the

method itself or the presence of Dr. Mann at the CIL, who can readily answer questions

related to application of the method? Including this method in an interobserver error

study will address the ease of applicability of this method for analysts who may not ever

use this method.

Design of Study

Originally, tests of interobserver variation were to be conducted by comparing

the data produced by each analyst over the course of 36 years of casework at the CIL.

Two research questions were posed. First, are there discernible trends by individual (e.g., is

one person consistently over- or under-aging)? Second, how does error in age

estimation relate to level of experience, as measured by the highest degree held?

However, the timeframe of data collection and the turnover of anthropologists at the CIL

did not allow for the generation of a large enough sample size per method or analyst to

make these comparisons. Therefore, an interobserver error study was developed

following the retrospective portion of data collection that encompassed three methods

commonly used at the CIL.

¹ This figure includes correct classification for both Mann et al. (1987) and (1991). Age

estimations derived from the 1987 method were removed from further analyses.

Two samples were chosen from the CIL study collection for each method: two

innominates, two full ribs, and two crania. Age-at-death of these samples is unknown, but

an effort was made to choose samples that represented different age classes per method.

Validation studies require the use of a documented collection. Since this study examines

the application of methods, the comparison of known age-at-death to estimated age is not

vital. Instead, the distribution of estimated phases can be examined and inferences made

on method performance and level of experience of the analyst. Anthropologists from the

JPAC/CIL and participants at the 2009 annual meeting of the American Academy of

Forensic Sciences in Denver, Colorado were asked to voluntarily participate in this study.

A survey form was designed based on Adams and Byrd (2002), portions of

which were modified for this study. The survey form was approved by JPAC/CIL lab

management before it was administered, and the survey was conducted as part of a paid

fellowship from the Oak Ridge Institute for Science and Education (ORISE). All individuals participating

gave verbal consent and remained anonymous; no names were recorded and each survey

was randomly coded with a number for database entry. The following background

information was asked of each participant: field of study, highest degree obtained,

whether the individual is a Diplomate of the American Board of Forensic Anthropology

(D-ABFA) or not, number of years of experience with skeletal aging, and approximate

number of skeletons analyzed. For each method, the participant was asked to answer the

following questions: Have you used this method before? If yes, do you use it on a regular

basis? What is your level of comfort with this method (on a scale from one to five, with one

being very low and five being very high)?

The participant was then asked to give an age estimate for each of the samples

following the referenced method. The original articles were provided for easy reference

as well as any additional materials required for analysis (e.g., sternal rib end casts,


flashlight, magnifying glass). Each innominate was entirely covered in aluminum foil,

leaving only the auricular surface exposed, so that the analyst was not influenced by other

areas of the bone. The author was present for almost all portions of the study to answer

questions, but did not help any of the participants apply the methods so as not to bias

interpretations of ease of applicability.

The required information was slightly modified for each separate method. For

Buckberry and Chamberlain (2002), the participant was asked to enter the composite

score, corresponding stage, age point estimate, and interval, a percent confidence that the

observations made correspond to the correct composite score, and any additional notes.

For Iscan et al. (1984b), the participant was asked for a phase and corresponding age

interval, a percent confidence that the observations correspond to the correct phase, and

any additional notes. For Mann et al. (1991), the participant was asked to circle the

sutures that showed obliteration, provide an age estimate based on the state of

obliteration, give a percent confidence that observations of obliteration are correct and

that the interpretation of the age interval is correct, and add any additional notes.

Data Analysis

Each completed survey was coded with a number and all information entered

into a Microsoft© Excel spreadsheet with four separate pages. The first page included all

background information provided by the participant. The following three pages contained

the information collected per method. All data were entered exactly as they appeared on

the surveys, including additional comments and notes.

The first step of data analysis was to examine the summary of self-reported

background information. Pie charts were produced to summarize the information

provided by participants. The next step of data analysis was to examine each method

individually. This included whether or not the participant had used the method before and

his or her level of comfort with the method. Additionally, are participants correctly

assigning point estimates based on estimated phases or stages?

The distribution of phase or stage assignment was then described. Histograms

were generated per sample based on the analysts’ given phases. In general, are the phases

tightly clustered with a normal distribution or is there a large variety of phase

assignments?

Finally, level of experience was analyzed in relation to phase assignment. The

median phase for each sample was assumed to represent the best possible estimation of

age for that individual. An SEI was calculated for phase and midpoint and then compared

based on experience levels. Experience was broken down by years of experience in

skeletal aging and highest degree obtained.
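A minimal sketch of this comparison, assuming the SEI is taken relative to the median phase for a sample (one reading of the procedure described above). The experience groups, the phase values, and the names `sei`, `assignments`, and `mean_sei` are hypothetical and serve only as illustration.

```python
from statistics import mean, median

def sei(estimate, reference):
    """Scaled error index: absolute error as a percentage of the reference value."""
    return abs(estimate - reference) / reference * 100

# Hypothetical phase assignments for one sample: (experience group, phase given).
assignments = [
    ("0-5 years", 3), ("0-5 years", 5), ("0-5 years", 4),
    ("6+ years", 4), ("6+ years", 4), ("6+ years", 5),
]

# The median phase stands in for the best available estimate for this sample.
reference_phase = median(phase for _, phase in assignments)

# Collect each analyst's SEI under the analyst's experience group.
by_group = {}
for group, phase in assignments:
    by_group.setdefault(group, []).append(sei(phase, reference_phase))

# Mean SEI per experience group, measured against the median phase.
mean_sei = {group: mean(values) for group, values in by_group.items()}
```

A lower mean SEI for a group would indicate phase assignments that sit closer to the median for that sample.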

No comparison was made between SEI and self-reported approximate number

of skeletons analyzed because the sample size for those individuals with over 1000

skeletons was far too small (n=3). Additionally, the total number of skeletons analyzed is

not necessarily related to experience in age estimation. This question was asked to obtain

an overall sense of experience in applied osteology.

Summary

This portion of the study seeks to better understand error associated with three

specific age estimation methods. An overview of data obtained from a series of

interobserver error studies will provide information on reliability of these age estimation

methods as related to experience and ease of application of the method. If successful, this

preliminary study may be expanded to include more age estimation methods currently in

use at the JPAC/CIL as well as testing on documented collections.

CHAPTER VI

RESULTS I: RETROSPECTIVE STUDY

This chapter discusses the results obtained from statistical analyses of age

estimations produced by anthropologists at the CIL. Final sample sizes, known age-at-

death distributions, and descriptive statistics of these distributions are given for the

overall sample and each method’s sub-sample. A comparison of methods is given,

followed by the results for each method broken down by method type.

The Sample

Sample Sizes

Final sample sizes for each age estimation method used at the CIL between

1972 and 31 July 2008 are given in Appendix A (Tables A.1 and A.2). Both tables

represent the cleaned data set, which includes all identified individuals for which there

was adequate aging data (known age-at-death and cited methods). Sample sizes are all

greater than 20, with the exception of Todd (1920), Buckberry and Chamberlain (2002),

Webb and Suchey (1985) iliac crest, and all Scheuer and Black (2000) epiphyseal

methods. Ambiguity in reporting of cranial suture closure eliminated the Meindl and

Lovejoy (1985) method from further analyses.

Known Age-at-Death Distributions

The known age-at-death distribution for all identified individuals in the

cleaned data set is given in Figure 2. The sample is positively skewed, comprised of a

majority of young individuals and far fewer older individuals.

Figure 2. Age distribution of total sample (n=979). [Histogram; x-axis: Age-at-Death (17 to 59 years); y-axis: Frequency.]

Figures 3 through 15 graphically depict the known age-at-death distributions for each of the methods to allow

for visual comparisons of distribution shape with each other and the overall cleaned data

set. These figures show whether or not the sub-samples are representative of the

JPAC/CIL identified sample. If the sub-samples are not similar to the overall sample or

each other, there could be problems with comparisons made between different aging

methods, specifically related to error since age estimation methods do not perform the

same for all age groups. The method sub-samples are all generally similar in distribution

to the overall sample, with the exception of those that have small sample sizes (e.g.,

Buckberry-Chamberlain; Todd; Iscan et al.).

Figure 3. Age distribution: Albert-Maples 1995 (n=24). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 4. Age distribution: Webb-Suchey clavicle 1985 (n=33). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 5. Age distribution: McKern-Stewart epiphyses 1957 (n=161). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 6. Age distribution: McKern-Stewart pubic symphysis 1957 (n=79). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 7. Age distribution: Suchey-Brooks pubic symphysis (n=93). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 8. Age distribution: Todd pubic symphysis 1920 (n=10). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 9. Age distribution: Lovejoy et al. auricular surface 1985 (n=147). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 10. Age distribution: Osborne et al. auricular surface 2004 (n=151). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 11. Age distribution: Buckberry-Chamberlain auricular surface 2002 (n=10). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 12. Age distribution: Iscan et al. sternal rib end 1984 (n=21). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 13. Age distribution: Moorrees et al. dental formation 1963 (n=92). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 14. Age distribution: Mincer et al. dental formation 1993 (n=105). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

Figure 15. Age distribution: Mann et al. maxillary sutures 1991 (n=55). [Histogram; x-axis: Age-at-Death; y-axis: Frequency.]

In order to quantitatively examine the age distributions, summary statistics for

the overall known age-at-death sample and each method sub-sample were calculated

(Table 2). Each method sub-sample is generally similar to the larger sample, with mean

and median ages-at-death in the mid-20s. The Todd method is the only sample with a

mean known age-at-death in the 30s, but it is similar in standard deviation and variance to

several other methods. There are no individuals younger than 17 years old or older than

59 years old in the JPAC/CIL identified known age-at-death sample (n=979).
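The summary measures reported in Table 2 can be sketched with Python's standard library. This is an illustration only: the ages are hypothetical, the function name `describe` is not from the study, and the use of the sample (n - 1) forms of the standard deviation and variance is an assumption, since the text does not say which form was used.

```python
from statistics import mean, median, stdev, variance

def describe(ages):
    """Return the summary measures listed in Table 2 for a list of ages."""
    return {
        "n": len(ages),
        "mean": mean(ages),
        "median": median(ages),
        "min": min(ages),
        "max": max(ages),
        "range": max(ages) - min(ages),
        "sd": stdev(ages),           # sample standard deviation (n - 1), assumed
        "variance": variance(ages),  # sample variance (n - 1), assumed
    }

ages = [19, 21, 23, 23, 25, 31, 37, 41]  # hypothetical known ages-at-death
summary = describe(ages)
```

In a positively skewed sample such as this one, the mean exceeds the median, which is the pattern seen in most rows of Table 2.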

A one-way ANOVA revealed a statistically significant difference in mean

age-at-death between all samples, including the overall known age-at-death sample

(p=0.000). To better understand which samples differed from one another, ANOVA with

the Bonferroni correction for multiple comparisons was run; p-values are given in Table

3. The McKern and Stewart (1957) epiphyseal fusion methods, the maxillary suture

closure method, and both dental formation methods had mean known ages-at-death that

were significantly lower than the identified known age-at-death sample; these differences

are significant at α≤0.05. All other methods were not statistically different in mean

age-at-death from the total sample.

Table 2. Descriptive statistics by method.

Sample                                N    Mean (x̄)  Median  Min  Max  Range  Standard Deviation  Variance
All (identified, known age-at-death)  979  27.24     26      17   59   42     6.59                43.39
McKern-Stewart EPIP                   161  24.42     23      17   46   29     5.00                24.97
Albert-Maples VERT                    25   23.60     23      19   42   23     4.82                23.25
Webb-Suchey CLAV                      33   25.21     24      19   46   27     5.15                26.55
Mann et al. MSUT                      55   23.88     23      18   36   18     3.92                15.38
Moorrees et al. DEN                   105  23.80     23      17   42   25     4.26                18.16
Mincer et al. DEN                     92   23.92     23.5    17   37   20     4.00                16.01
McKern-Stewart PS                     79   26.34     25      18   53   35     5.81                33.79
Suchey-Brooks PS                      93   29.14     27      19   54   35     7.48                56.21
Todd PS                               10   31.10     31      24   41   17     6.51                42.32
Lovejoy et al. AS                     147  26.97     25      19   53   34     5.89                34.66
Osborne et al. AS                     151  26.82     25      19   53   34     5.89                34.65
Buckberry-Chamberlain AS              10   25.40     25      19   37   18     5.19                26.93
Iscan et al. RIB                      21   24.95     23      18   35   17     4.54                20.65

Statistically significant differences in mean age-at-death occurred between

two groups: (1) epiphyseal fusion, maxillary suture, and dental formation methods, and (2)

pubic symphysis and auricular surface methods. The sternal rib end method, the Webb-

Suchey clavicle method, and the Buckberry-Chamberlain auricular surface method were

not statistically different in mean age-at-death from any other method. In general, the first

group represents samples with lower mean ages-at-death, while the second group has

samples with higher mean ages-at-death. This is logical when considering that all

methods in the first group (with the exception of maxillary suture closure) are generally

used to provide age estimates for late adolescents/young adults since they concern late

development, while the methods in the second group are used for adults who have

completed all development.

Table 3. P-values from one-way ANOVA with Bonferroni correction: all methods.

Method    ALL       MC-S EPI  A-M VERT  W-S CLV   MANN      MOO       MIN       MC-S PS   S-B       TODD      LOVE      OSB       B-C       ISC
ALL       N/A       .000*     .268      1.000     .003*     .000*     .000*     1.000     .348      1.000     1.000     1.000     1.000     1.000
MC-S EPI  .000*     N/A       1.000     1.000     1.000     1.000     1.000     1.000     .000*     .064      .021*     .042*     1.000     1.000
A-M VERT  .268      1.000     N/A       1.000     1.000     1.000     1.000     1.000     .004*     .084      .916      1.000     1.000     1.000
W-S CLV   1.000     1.000     1.000     N/A       1.000     1.000     1.000     1.000     .123      .635      1.000     1.000     1.000     1.000
MANN      .003*     1.000     1.000     1.000     N/A       1.000     1.000     1.000     .000*     .044*     .085      .141      1.000     1.000
MOO       .000*     1.000     1.000     1.000     1.000     N/A       1.000     .434      .000*     .024*     .004*     .008*     1.000     1.000
MIN       .000*     1.000     1.000     1.000     1.000     1.000     N/A       .831      .000*     .033*     .014*     .027*     1.000     1.000
MC-S PS   1.000     1.000     1.000     1.000     1.000     .434      .831      N/A       .227      1.000     1.000     1.000     1.000     1.000
S-B       .348      .000*     .004*     .123      .000*     .000*     .000*     .227      N/A       1.000     .606      .330      1.000     .378
TODD      1.000     .064      .084      .635      .044*     .024*     .033*     1.000     1.000     N/A       1.000     1.000     1.000     .740
LOVE      1.000     .021*     .916      1.000     .085      .004*     .014*     1.000     .606      1.000     N/A       1.000     1.000     1.000
OSB       1.000     .042*     1.000     1.000     .141      .008*     .027*     1.000     .330      1.000     1.000     N/A       1.000     1.000
B-C       1.000     1.000     1.000     1.000     1.000     1.000     1.000     1.000     1.000     1.000     1.000     1.000     N/A       1.000
ISC       1.000     1.000     1.000     1.000     1.000     1.000     1.000     1.000     .378      .740      1.000     1.000     1.000     N/A

*p≤0.05
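The many p-values of 1.000 in Table 3 are what a Bonferroni correction tends to produce when a large number of pairs is compared. The sketch below illustrates the adjustment under two assumptions that the text does not state: that each raw pairwise p-value was multiplied by the number of comparisons and capped at 1, and that all pairs among the 14 samples were corrected together. The raw p-values and the function name `bonferroni` are hypothetical.

```python
from itertools import combinations

def bonferroni(raw_p, n_comparisons):
    """Bonferroni-adjusted p-value: raw p times the number of comparisons, capped at 1."""
    return min(1.0, raw_p * n_comparisons)

# The 14 samples compared in Table 3.
groups = ["ALL", "MC-S EPI", "A-M VERT", "W-S CLV", "MANN", "MOO", "MIN",
          "MC-S PS", "S-B", "TODD", "LOVE", "OSB", "B-C", "ISC"]

# Every unordered pair of samples counts as one comparison.
n_comparisons = len(list(combinations(groups, 2)))

# Hypothetical raw p-values for two pairwise comparisons.
adjusted_small = bonferroni(0.0001, n_comparisons)  # remains well below 0.05
adjusted_large = bonferroni(0.02, n_comparisons)    # exceeds 1 before capping
```

Under these assumptions, a raw p-value that would pass an uncorrected test at 0.05 can still be reported as 1.000 after correction, so small sub-samples rarely differ significantly from any other group.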

Method-to-Method Comparison

All methods were compared to one another with the exception of the

epiphyseal fusion methods. These are discussed separately below. Initially, the dental

formation methods were also compared to all other methods, but high error values

initiated a reanalysis of Moorrees et al. and Mincer et al. separately. The inclusion of

terminal stages in both epiphyseal fusion and dental formation makes these methods

harder to compare with methods having continuous variable categories.

Correct and Incorrect Classifications

The numbers of correct and incorrect classifications by method (excluding

epiphyseal fusion) are given in Table 4. The methods with correct classifications above

90% included: Suchey-Brooks, Osborne et al., Buckberry-Chamberlain, and Mincer et al.

Methods with correct classifications between 80% and 90% were: McKern-Stewart

pubic symphysis, Mann et al., and Moorrees et al. Finally, methods with correct

classifications below 80% were Todd, Lovejoy et al., and Iscan et al., with a particularly

low correct classification rate of 64.6% for Lovejoy et al.

Table 4. Correct and incorrect classifications by method (excluding epiphyseal fusion).

Method                    N    # Correct  % Correct  # Incorrect  % Incorrect
McKern-Stewart PS         79   65         82.3       14           17.7
Suchey-Brooks PS          93   91         97.9       2            2.2
Todd PS                   10   7          70.0       3            30.0
Lovejoy et al. AS         147  95         64.6       52           35.4
Osborne et al. AS         151  142        94.0       9            6.0
Buckberry-Chamberlain AS  10   10         100.0      0            0.0
Iscan et al. RIB          21   15         71.4       6            28.6
Mann et al. MSUT          62   55         88.7       7            11.3
Mincer et al. DEN         160  153        95.6       7            4.4
Moorrees et al. DEN       235  209        88.9       26           11.1

Error

Overall bias, inaccuracy, and the scaled error index (SEI) for each non-

epiphyseal union method are given in Table 5. Bias is the average error in years, taking

into consideration directionality (i.e., over- or underaging) (Meindl and Lovejoy 1989). It

is calculated with the following equation: bias = Σ(estimated age − actual age)/N. Inaccuracy

is the average error in years regardless of directionality (Meindl and Lovejoy 1989). It is

calculated with the following equation: inaccuracy = Σ|estimated age − actual age|/N. The

SEI was developed for this study and compares error between estimated and actual age

regardless of scale. It can be used to compare methods, phases of methods, and observers.

The SEI is calculated with the following equation: SEI = (|estimated age − actual age| / actual age) × 100.

The mean is then computed for each group being compared.

Table 5. Error by method (excluding epiphyseal fusion).

Method                    N    Bias (Σ)  Inaccuracy (Σ)  SEI (x̄)
McKern-Stewart PS         73   -1.07     2.68            9.49
Suchey-Brooks PS          86   0.76      3.72            13.27
Todd PS                   10   0.10      1.80            6.13
Lovejoy et al. AS         147  1.89      3.16            12.40
Osborne et al. AS         113  -0.59     3.93            14.25
Buckberry-Chamberlain AS  9    6.51      6.88            25.91
Iscan et al. RIB          14   -0.50     1.76            7.21
Mann et al. MSUT          27   0.09      2.25            9.35
Mincer et al. DEN         27   -2.02     2.17            10.40
Moorrees et al. DEN       46   -3.47     3.47            16.94
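The three error measures defined above can be sketched as follows. The function names and the example ages are hypothetical; the formulas follow the definitions of bias, inaccuracy, and SEI given in the text.

```python
def bias(estimated, actual):
    """Mean signed error in years; negative values indicate underaging."""
    return sum(e - a for e, a in zip(estimated, actual)) / len(actual)

def inaccuracy(estimated, actual):
    """Mean absolute error in years, regardless of direction."""
    return sum(abs(e - a) for e, a in zip(estimated, actual)) / len(actual)

def mean_sei(estimated, actual):
    """Mean scaled error index: absolute error as a percentage of actual age."""
    return sum(abs(e - a) / a * 100 for e, a in zip(estimated, actual)) / len(actual)

estimated = [22.5, 27.5, 32.5]  # hypothetical midpoints of estimated intervals
actual = [25, 25, 26]           # hypothetical known ages-at-death

overall_bias = bias(estimated, actual)
overall_inaccuracy = inaccuracy(estimated, actual)
overall_sei = mean_sei(estimated, actual)
```

Because the SEI divides by actual age, the same 2.5-year error counts for more in a 25-year-old than in a 50-year-old, which is what allows comparison regardless of scale.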

The dental formation methods (Mincer et al.; Moorrees et al.) originally had

the highest bias and inaccuracy of all methods. Error decreased once the terminal stages

were eliminated from analyses. The calculations given in Table 5 exclude terminal

stages. The Buckberry-Chamberlain method had the highest bias, inaccuracy, and SEI of

all methods, though this may be affected by the small sample size for this method. The

Todd method had very low bias, inaccuracy, and SEI, which may also be affected by

small sample size. Excluding those methods with sample sizes smaller than 20

individuals, the Mann et al. maxillary suture method had the lowest overall error. Table 5

gives the error associated with each method and Figures 16 through 18 are graphical

representations of overall method error by error type (e.g., bias, inaccuracy, SEI).

Based on bias,1 the McKern-Stewart pubic symphysis, Osborne et al., Iscan et

al., Mincer et al., and Moorrees et al. methods generally underaged, while the Suchey-

Brooks, Todd, Lovejoy et al., Buckberry-Chamberlain, and Mann et al. methods

overaged. There was no trend in bias based on types of method (e.g., not all pubic

symphyseal methods underage). Examining those methods with large enough

sample sizes (n≥20), average error in years did not exceed four years for any method. The

Suchey-Brooks and Osborne et al. methods had higher inaccuracy compared to other

methods, which is related to the large confidence intervals provided for phases of these

methods. A high SEI can also be a function of large confidence intervals, as witnessed by

the SEI for the Buckberry-Chamberlain and Suchey-Brooks methods, or a high

percentage of incorrect classifications, such as the Lovejoy et al. method.

1 (-) bias indicates underaging, (+) bias indicates overaging.

Figure 16. Sum of bias by method (in years). [Bar chart of the ten methods listed in Table 5.]

Figure 17. Sum of inaccuracy by method (in years). [Bar chart of the ten methods listed in Table 5.]

Figure 18. Mean SEI by method. [Bar chart of the ten methods listed in Table 5.]

Table 6 shows results from the calculation of Pearson’s r, where X is the

known age-at-death and Y is the midpoint of the estimated age. The Todd and

Buckberry-Chamberlain methods show high correlations, although all methods listed here

have r-values greater than 0.70. The coefficient of determination (r²) shows how much of

the variation in predicted age can be explained by the known age-at-death. The standard

error is the average number of years one can expect to be off when using a given age

estimation method. No method has a standard error greater than six years, and the mean

for all eight methods is 3.62 years. All age estimation methods in Table 6 have a significant

linear relationship between known age-at-death and estimated age (ANOVA, p≤0.05).

Table 6. Comparison of Pearson’s r and r² by method (excluding dental methods).

Method                    N    r     r²    Standard Error (in years)  P-Value (ANOVA)
McKern-Stewart PS         73   0.79  0.63  3.04                       0.000*
Suchey-Brooks PS          86   0.80  0.63  4.55                       0.000*
Todd PS                   10   0.95  0.90  2.38                       0.000*
Lovejoy et al. AS         147  0.78  0.61  3.51                       0.000*
Osborne et al. AS         113  0.74  0.55  5.33                       0.000*
Buckberry-Chamberlain AS  9    0.92  0.85  4.67                       0.000*
Iscan et al. RIB          14   0.71  0.50  2.51                       0.004*
Mann et al. MSUT          27   0.79  0.63  2.96                       0.000*

*p≤0.05
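The quantities in Table 6 can be sketched with a short calculation. This is an illustration under the assumption that the reported standard error is the standard error of the estimate from a simple linear regression of estimated age (Y) on known age (X); the text does not name the formula. The data and function names are hypothetical.

```python
from math import sqrt

def centered_sums(x, y):
    """Return the centered sums of squares and cross-products (Sxx, Syy, Sxy)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    sxx, syy, sxy = centered_sums(x, y)
    return sxy / sqrt(sxx * syy)

def standard_error_of_estimate(x, y):
    """Residual standard deviation around the least-squares line of y on x."""
    sxx, syy, sxy = centered_sums(x, y)
    residual_ss = syy - sxy ** 2 / sxx  # sum of squared residuals
    return sqrt(max(residual_ss, 0.0) / (len(x) - 2))

known = [20, 22, 25, 30, 35]                # hypothetical known ages-at-death
estimated = [22.5, 22.5, 27.5, 27.5, 37.5]  # hypothetical interval midpoints

r = pearson_r(known, estimated)
r_squared = r ** 2
see = standard_error_of_estimate(known, estimated)
```

A method can show a high r and still have a large standard error if its estimates track age closely but sit on wide intervals, which is one reason Table 6 reports both.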

Method by Method

Epiphyseal Fusion

Table 7 gives the correct and incorrect classification rates for all epiphyseal

fusion methods used at the CIL, with the exception of Scheuer-Black. The epiphyses

listed here all fully fuse generally by late adolescence, with the exception of the medial

clavicle, vertebral centra, iliac crest, and the first two sacral segments. Results for

methods using the long bone and later-fusing epiphyses are given in separate sections

below.

Long Bone Epiphyses

All long bone epiphyses had correct classification rates above 95% and were

scored using McKern and Stewart (1957). There was no clear trend between correct

classification using early- versus late-fusing epiphyses. Early-fusing epiphyses (Group I)

are the distal humerus, medial epicondyle of the humerus, proximal radius, proximal

ulna, femoral head, greater and lesser trochanters of the femur, distal tibia, and distal

fibula (McKern and Stewart 1957). Late-fusing epiphyses (Group II) are the proximal

humerus, distal radius, distal ulna, distal femur, proximal tibia, and proximal fibula

(McKern and Stewart 1957). In the JPAC/CIL sample, the long bone epiphyses that

showed less than 100% correct classification were the distal radius, proximal femur,

distal femur, proximal tibia, distal tibia, and distal fibula. However, incorrect

classifications never occurred for more than two individuals out of each element’s

sample.

Table 7. Correct and incorrect classifications for epiphyseal fusion methods.

Method          Epiphysis          N   # Correct  % Correct  # Incorrect  % Incorrect
Albert-Maples   Vertebral Centra   24  23         95.8       1            4.2
Webb-Suchey     Medial Clavicle    33  32         97.0       1            3.0
                Iliac Crest        6   6          100.0      0            0.0
McKern-Stewart  Proximal Humerus   80  80         100.0      0            0.0
                Distal Humerus     63  63         100.0      0            0.0
                Medial Epicondyle  57  57         100.0      0            0.0
                Proximal Radius    50  50         100.0      0            0.0
                Distal Radius      51  50         98.0       1            2.0
                Proximal Ulna      56  56         100.0      0            0.0
                Distal Ulna        37  37         100.0      0            0.0
                Proximal Femur     85  83         97.7       2            2.4
                Greater Trochanter 70  70         100.0      0            0.0
                Lesser Trochanter  65  65         100.0      0            0.0
                Distal Femur       79  77         97.5       2            2.5
                Proximal Tibia     72  70         97.2       2            2.8
                Distal Tibia       65  64         98.5       1            1.5
                Proximal Fibula    36  36         100.0      0            0.0
                Distal Fibula      49  48         98.0       1            2.0
                Clavicle           72  64         88.9       8            11.1
                Iliac Crest        32  32         100.0      0            0.0
                S1-S2              28  9          32.1       19           67.9
                Vertebrae          18  16         88.9       2            11.1

Age distributions for long bone epiphyses (in %) can be found in Appendix B

(Tables B.1-B.13). For the following epiphyses, all from Group I, only stage four was

observed in the JPAC/CIL identified sample: medial epicondyle of the humerus, proximal

radius, proximal ulna, and lesser trochanter of the femur. In general, Group II epiphyses

had slightly larger age distributions for stages zero through three, but still retained a

majority of individuals in stage four.

Other Epiphyses

Epiphyseal fusion of the vertebral centra was scored using both the Albert and

Maples (1995) and McKern and Stewart (1957) methods. The Albert-Maples method had

a higher correct classification rate than the McKern-Stewart method (95.8% versus

88.9%). Both had similar sample sizes. Figures 19 and 20 depict the distributions of

known ages-at-death superimposed over the stage intervals given by the reference

method. The youngest age of complete union of vertebral centra in the Albert-Maples

sample was 23, while in the McKern-Stewart sample it was 19. Fusion was still occurring

in individuals up to the age of 25 in the Albert-Maples method sample, and 24 in the

McKern-Stewart method sample.

Epiphyseal fusion of the sternal (medial) end of the clavicle was scored using

both the Webb and Suchey (1985) and McKern and Stewart (1957) methods. The Webb-

Suchey method had a higher correct classification rate than the McKern-Stewart method

(97.0% versus 88.9%). Figures 21 and 22 depict the distribution of known ages-at-death

superimposed over the stage intervals given by the reference method. The youngest

individual to exhibit complete fusion in the Webb-Suchey method sample was 25 at the

time of death, contrasted with 21 for the McKern-Stewart method sample. Active union

was seen up until the age of 28 in both samples.

Figure 19. Comparison of known ages of identified males superimposed over the summary stage observations for the three stages of vertebral centra fusion as given in the Albert and Maples (1995) method.

Epiphyseal fusion of the iliac crest was scored using the Webb and Suchey

(1985) and McKern and Stewart (1957) methods. The overall sample size for Webb-

Suchey was only six individuals, all of whom were correctly classified. While the sample

for the McKern-Stewart method was larger, it too had a 100% correct classification. The

age distributions for stages of iliac crest union using the McKern-Stewart method in the

JPAC/CIL sample are given in Table 8. The youngest individual to show complete union

was 19 and active union was observed until the age of 22. An unfused iliac crest

epiphysis was observed in an individual 20 years of age.
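Age distributions by stage, such as those in Table 8, are row percentages: for each known age, the share of individuals observed in each fusion stage. A minimal sketch follows. The individual stage scores are hypothetical (the source reports only percentages) and were chosen to reproduce the age-20 row of Table 8; the function name `stage_percentages` is also an assumption.

```python
from collections import Counter

def stage_percentages(stages, n_stages=5):
    """Percentage of individuals observed in each stage (0 through n_stages - 1)."""
    counts = Counter(stages)
    return [round(counts.get(stage, 0) / len(stages) * 100)
            for stage in range(n_stages)]

# Hypothetical stage scores for five individuals aged 20 at death.
age_20_stages = [0, 1, 2, 2, 3]
row = stage_percentages(age_20_stages)
```

With only five individuals at a given age, each person contributes 20 percentage points, so single observations can shift a row substantially.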

Figure 20. Comparison of known ages of identified males superimposed over the summary stage observations for the four stages of vertebral centra fusion as given in the McKern and Stewart (1957) method.

Fusion of the first and second sacral segments was scored only with

McKern and Stewart (1957). Other sacral segments were not recorded. This aging

technique had the lowest percentage of correct classification in the entire JPAC/CIL

sample at 32.1%. Figure 23 shows that the problem clearly lies in the intervals given for

the first three stages; very few of the individuals in the JPAC/CIL sample actually fall

into the age intervals given by the reference method. Stages three and four do not exhibit

the same problem; however, these stages also have very large ranges. A distinct fusion

pattern related to age is not present, with an absence of fusion seen in an individual who

was 30 years old at the time of death and complete fusion in two individuals who were

29. These results suggest that fusion of sacral segments may be of limited use in age

estimation.

Figure 21. Comparison of known ages of identified males superimposed over the age intervals for the four stages of epiphyseal fusion of the medial clavicle as given in the Webb and Suchey (1985) method.

Suture Closure

Reporting for the Mann et al. maxillary suture method did not appear to

follow a standardized format, which could be due to confusion in applying the method.

This possibility will be discussed further in Chapter VII. The results were difficult to

compare between individuals since the same age class was rarely reported and in many

instances only a minimum age was given (e.g., 20+). Table 9 shows the correct and

incorrect classification of individuals in this sample using the age intervals as they were

reported by the analysts, with an overall correct classification rate of 88.7%. This table

includes age estimations based on both the 1987 and 1991 methods. The highest number

of incorrect classifications occurred in individuals under the age of 20. It is important to

note that there are far fewer individuals in the upper age categories of this method, i.e.,

individuals older than 25. Therefore, the higher percentage of correct

classification of individuals over the age of 25 may be an artifact of sample size. Of those

individuals who were incorrectly classified, four were assigned to age categories greater

than their actual age-at-death and three were assigned to categories less than their actual

age-at-death.

Figure 22. Comparison of known ages of identified males superimposed over the age intervals for the five stages of epiphyseal fusion of the medial clavicle as given in the McKern and Stewart (1957) method.

Table 8. Age distribution of stages of iliac crest union (in %): McKern-Stewart (1957).

Age    N   0    1    2    3    4
18     2   -    -    100  -    -
19     2   -    -    -    50   50
20     5   20   20   40   20   -
21     4   -    -    25   -    75
22     3   -    -    33   -    67
23     1   -    -    -    -    100
24+    13  -    -    -    -    100
Total  30

Figure 23. Comparison of known ages of identified males superimposed over the age intervals for the five stages of epiphyseal fusion of the first two sacral segments as given in the McKern and Stewart (1957) method.

Bias, inaccuracy, and SEI were calculated using the midpoint of the predicted

age interval. Age intervals were taken from Figure 2 in Mann et al. (1991:783). These

intervals represent the general pattern of suture obliteration in an adult; ages for earliest

obliteration are also provided in the original publication. Only 27 individuals could be

used for this portion of the analysis because all estimates without closed intervals had to

be eliminated due to the lack of a midpoint. Table 10 gives bias, inaccuracy, and SEI

values for intervals as they were reported by the analysts. Overall, the method had a

slight tendency to overage (bias=0.09) and had an average inaccuracy of 2.25 years. Age

estimates under the age of 20 had a tendency to underage, while those over the age of 20

overaged, with the exception of the 25-30 interval. A histogram showing the distribution

of bias for this method is given in Figure 24. The normal curve has been superimposed

over this distribution and demonstrates that error is relatively normally distributed even

with the small sample size.

Table 9. Correct and incorrect classifications by age interval of the Mann et al. maxillary suture method (1987, 1991).

Age Interval  N   # Correct  % Correct  # Incorrect  % Incorrect
<20           4   3          75.0       1            25.0
<21           1   1          100.0      0            0.0
<25           7   7          100.0      0            0.0
<30           2   2          100.0      0            0.0
20+           15  14         93.3       1            6.7
22+           1   1          100.0      0            0.0
25+           4   3          75.0       1            25.0
15-20         4   2          50.0       2            50.0
20-25         11  9          81.8       2            18.2
22-26         1   1          100.0      0            0.0
20-30         3   3          100.0      0            0.0
20-35         1   1          100.0      0            0.0
25-30         3   3          100.0      0            0.0
25-35         1   1          100.0      0            0.0
25-40         1   1          100.0      0            0.0
26-33         1   1          100.0      0            0.0
20-50         2   2          100.0      0            0.0
Total         62  55         88.7       7            11.3
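The handling of reported intervals described above can be sketched as follows: every reported interval can be checked for correct classification, but only closed intervals have a midpoint and can enter the error calculations. The parsing rules are assumptions (closed bounds are treated as inclusive, and a label such as "<20" is read as strictly under 20), and the function names and example reports are hypothetical.

```python
def parse_interval(label):
    """Parse '20-25', '<20', or '20+' into (low, high); an open end is None."""
    if label.startswith("<"):
        return (None, float(label[1:]))
    if label.endswith("+"):
        return (float(label[:-1]), None)
    low, high = label.split("-")
    return (float(low), float(high))

def is_correct(label, known_age):
    """True if the known age falls inside the reported interval (assumed rules)."""
    low, high = parse_interval(label)
    if low is None:
        return known_age < high    # '<20' read as strictly under 20
    if high is None:
        return known_age >= low    # '20+' read as 20 or older
    return low <= known_age <= high

def midpoint(label):
    """Midpoint of a closed interval, or None when one end is open."""
    low, high = parse_interval(label)
    if low is None or high is None:
        return None
    return (low + high) / 2

# Hypothetical reports: (interval as written by the analyst, known age-at-death).
reports = [("20-25", 23), ("<20", 21), ("20+", 34), ("25-40", 29)]

n_correct = sum(is_correct(label, age) for label, age in reports)
usable = [(midpoint(label), age) for label, age in reports
          if midpoint(label) is not None]
```

In this example all four reports count toward the classification rate, but only the two closed intervals remain for bias, inaccuracy, and SEI, mirroring the drop from 62 to 27 individuals in the Mann et al. sample.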

No graph of actual ages superimposed over intervals was produced for this

method due to the variation in reporting of age estimates. Figure 25 shows the correlation

between estimated and known age-at-death for individuals in the Mann et al. sample

(n=27). The estimated age values are the midpoints of the predicted age interval. There is

a significant relationship (p=0.000) between estimated and known age-at-death (r=0.79,

see Table 6).

Table 10. Error values for Mann et al. (1991) by reported interval.

Age Interval  N   Bias (Σ)  Inaccuracy (Σ)  SEI (x̄)
15-20         4   -3.00     3.00            14.38
20-25         11  -0.09     1.55            6.94
20-30         3   0.67      1.33            5.83
20-35         1   3.50      3.50            14.58
20-50         2   3.25      3.75            13.19
22-26         1   0.00      0.00            0.00
25-30         3   -0.83     3.50            12.73
25-40         1   3.50      3.50            12.07
26-33         1   2.50      2.50            10.00
ALL           27  0.09      2.25            9.35

Figure 24. Distribution of bias for the Mann et al. (1991) maxillary suture method.

Figure 25. Correlation of estimated and known ages-at-death for the Mann et al. (1991) maxillary suture method (n=27). [Scatter plot; x-axis: Known Age-at-Death; y-axis: Estimated Age; R² = 0.6308.]

Third Molar Formation

Third molar formation was reported using the Moorrees et al. (1963) and

Mincer et al. (1993) methods. When compared to other age estimation methods employed

at the JPAC/CIL, both dental formation methods exhibited higher than average bias,

inaccuracy, and SEI, while maintaining average to above-average correct classifications

(Tables 4 and 5). Both methods include terminal stages (Apices complete (Ac) and Stage

H, respectively), which give a minimum time for formation but not an upper age

boundary. Similar to epiphyseal fusion methods, an individual with closed root apices

could be 25 or 85, which limits the usefulness of these methods for age estimation of

individuals beyond late adolescence.

Tables 11 and 12 display calculations for both methods, differentiating

between results that include the terminal stage and results where the terminal stage has

Table 11. Correct and incorrect classifications of dental formation methods. Method N #

Correct%

Correct #

Incorrect %

Incorrect Moorrees et al. with “Ac” 235 209 88.9 26 11.1 Moorrees et al. excluding “Ac” 54 28 51.9 26 48.2 Mincer et al. with “H” 160 153 95.6 7 4.4 Mincer et al. excluding “H” 27 20 74.1 7 25.9

Table 12. Error of dental formation methods. Method N Bias (Σ) Inaccuracy (Σ) SEI (x̄ ) Moorrees et al. with “Ac” 227 -19.56 20.01 19.35 Moorrees et al. excluding “Ac” 46 -3.47 3.47 16.94 Mincer et al. with “H” 160 -15.57 16.46 15.74 Mincer et al. excluding “H” 27 -2.02 2.17 10.40

been removed from analyses. While the number of incorrect classifications did not

change for either method when the terminal stages were removed from analyses, there was

a dramatic shift in the percentage of correct classifications. Removing individuals

classified as “Ac” or Stage H reduced the number and percentage of correct

classifications. For the Moorrees et al. method, stages of partial root development and


apex closure had a low correct classification rate for all teeth and roots (see Tables C.1

through C.4, Appendix C). The Mincer et al. method had higher rates of correct

classification overall (see Tables D.1 through D.4, Appendix D).
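The shift in classification rates described above can be checked directly from the counts in Table 11. The following is a minimal sketch rather than part of the original analysis; the function name `percent_correct` and the dictionary layout are illustrative assumptions, and only the counts come from the table.

```python
# Counts taken from Table 11: (N, number correct, number incorrect).
table_11 = {
    'Moorrees et al. with "Ac"': (235, 209, 26),
    'Moorrees et al. excluding "Ac"': (54, 28, 26),
    'Mincer et al. with "H"': (160, 153, 7),
    'Mincer et al. excluding "H"': (27, 20, 7),
}


def percent_correct(n, correct):
    """Percentage of individuals whose known age falls in the estimated interval."""
    return round(100.0 * correct / n, 1)


rates = {method: percent_correct(n, correct)
         for method, (n, correct, _incorrect) in table_11.items()}

for method, rate in rates.items():
    print(f"{method}: {rate}% correct")
```

Because every incorrect classification comes from a non-terminal stage, the count of incorrect cases stays fixed while the denominator shrinks, which is why the percentage of correct classifications drops so sharply once the terminal stages are excluded.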

Error decreased with the elimination of the terminal stages (Table 12). Both

methods continued to underage, but by a much smaller average number of years.

Inaccuracy was reduced from 20.01 years to 3.47 years for Moorrees et al. and 16.46

years to 2.17 years for Mincer et al. The SEI does not show as large a reduction, but

still decreases compared to the results including terminal stages. It is clear from these

results that the inclusion of terminal stages in the analyses of dental formation methods

does not accurately represent overall method performance.

Non-terminal stage sample sizes were very small; therefore, distributions of

known age-at-death compared to estimated age and distributions of bias will not be

reported, as they are uninformative. Age distributions by stage of root formation

excluding terminal stages are given for each method by combining data for all teeth and

roots for that method (Tables 13 and 14). The minimum age of complete apex closure

observed in both the Moorrees et al. and Mincer et al. samples was 18 years old.

Table 13. Age distribution of stages of dental root formation (in %): Moorrees et al.

Age     N    R1/2   R3/4   Rc    A1/2
17      4    100    -      -     -
18      8    13     50     38    -
19      7    29     57     -     14
20      12   17     -      33    50
21      11   9      73     9     9
22      0    -      -      -     -
23+     4    -      -      50    50
Total   46

Table 14. Age distribution of stages of dental root formation (in %): Mincer et al.

Age     N    D    E    F     G
17      2    -    -    100   -
18      3    -    -    33    67
19      6    17   50   -     33
20      10   -    -    30    70
21+     6    -    -    -     100
Total   27

Since non-terminal stage sample sizes for both methods were too small for

more detailed statistical analyses, comparisons of bias, inaccuracy, and SEI within each

method were conducted including terminal stages. For the Moorrees et al. method, there

was a significant difference in bias (Student’s t-test, p=0.002) and inaccuracy (Student’s

t-test, p=0.002) between the mesial and distal roots, with the mesial roots having higher

bias and inaccuracy than the distal roots. The difference in bias (Student’s t-test,

p=0.236) and inaccuracy (Student’s t-test, p=0.226) between teeth 17 and 32 was not

significant. There was no significant difference in average SEI between teeth or roots

(ANOVA, p=0.263). For the Mincer et al. method, there were no significant differences

in bias (ANOVA, p=0.536), inaccuracy (ANOVA, p=0.568), or SEI (ANOVA,

p=0.837) among the four third molars.

Pubic Symphysis

Three methods were used to report age-related pubic symphyseal changes:

Todd (1920, 1921), McKern and Stewart (1957), and Suchey-Brooks. Of these, the

Suchey-Brooks method had the highest correct classification rate (97.9%, Table 4). The

Todd method had the lowest bias (0.10), inaccuracy (1.80), and SEI (6.13) of the pubic

symphyseal methods (see Table 5), but also the smallest sample size (n=10). All three


methods have significant linear relationships between known ages-at-death and estimated

ages (Table 6). Results for each method are presented below.

When employing the Todd pubic symphysis age estimation method, CIL

analysts used combined phases 50% of the time. This means that rather than giving a

single phase, five individuals were given age estimates based on a dual-phase

classification, such as three and four. Table 15 shows a breakdown of the Todd method

sample results. Those individuals that were incorrectly classified (n=3, indicated with an

asterisk in Table 15), fell just outside of the given age range. These three individuals also

Table 15. Todd (1920) pubic symphysis method sample (n=10).

Age   Phase   Range   Midpoint   Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
24    3       22-24   23         -0.1       0.1              4.17
24    3       22-24   23         -0.1       0.1              4.17
27*   3-4     22-26   24         -0.3       0.3              11.11
25*   5       27-30   28.5       0.35       0.35             14.00
26*   5       27-30   28.5       0.25       0.25             9.62
35    6-7     30-39   34.5       -0.05      0.05             1.43
35    6-7     30-39   34.5       -0.05      0.05             1.43
36    6-7     30-39   34.5       -0.15      0.15             4.17
38    7       35-39   37         -0.1       0.1              2.63
41    8-9     39-50   44.5       0.35       0.35             8.54
ALL   -       -       -          0.10       1.80             6.13

*incorrect classification

had the highest bias, inaccuracy, and SEI values in the sample, with the exception of the

individual classified as phases eight and nine. Bias, inaccuracy, and SEI were calculated

using the midpoint of the range. Phases three, four, six, and seven underaged, while five,

eight, and nine overaged. A distribution of bias is given in Figure 26. The distribution

does not conform to the superimposed normal curve, most likely because of the small

Figure 26. Distribution of bias for the Todd (1920) pubic symphysis method.

sample size. Additionally, because of the overall small sample size, ANOVA examining

possible differences in bias, inaccuracy, and SEI by phase was not conducted. Figure 27

displays the correlation between known age-at-death and estimated age for this method

(r=0.95, see Table 6). This correlation is statistically significant (ANOVA, p=0.000).
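The summary row of Table 15 can be reproduced from the individual rows. The sketch below is illustrative only: it assumes that bias is the mean signed error (estimated midpoint minus known age), that inaccuracy is the mean absolute error, and that SEI is the absolute error expressed as a percentage of known age. These assumed definitions are consistent with the per-individual values in Table 15, but the variable and function names are hypothetical.

```python
from math import sqrt

# Known age-at-death, estimated midpoint, and reported age range for each
# individual, all taken from Table 15.
known = [24, 24, 27, 25, 26, 35, 35, 36, 38, 41]
midpoint = [23, 23, 24, 28.5, 28.5, 34.5, 34.5, 34.5, 37, 44.5]
ranges = [(22, 24), (22, 24), (22, 26), (27, 30), (27, 30),
          (30, 39), (30, 39), (30, 39), (35, 39), (39, 50)]

n = len(known)
errors = [est - age for est, age in zip(midpoint, known)]

bias = sum(errors) / n                        # mean signed error (years)
inaccuracy = sum(abs(e) for e in errors) / n  # mean absolute error (years)
# Assumed SEI definition: absolute error as a percentage of known age.
sei = sum(100 * abs(e) / age for e, age in zip(errors, known)) / n

# An individual counts as correctly classified when the known age lies
# inside the reported age range.
n_correct = sum(lo <= age <= hi for age, (lo, hi) in zip(known, ranges))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(known, midpoint)
print(round(bias, 2), round(inaccuracy, 2), round(sei, 2), n_correct, round(r, 2))
```

Under these assumptions the computed values agree with the ALL row of Table 15 (bias 0.10, inaccuracy 1.80, SEI 6.13), with the three incorrect classifications, and with the reported correlation of r=0.95.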

JPAC/CIL SOP 3.4 stipulates that the McKern-Stewart pubic symphysis

method should be used for American males who died before 1960 and the Suchey-Brooks

pubic symphysis method for those who died after 1960 (JPAC/CIL 2008). These two

methods have much larger sample sizes than the Todd method, most likely because the

Todd method is not an SOP-stipulated method for age estimation at the CIL. Each of the

two approved methods will be discussed independently below and then compared to one

another.

[Figure 27 scatter plot: Estimated Age (y-axis) plotted against Known Age-at-Death (x-axis); R² = 0.8982.]

Figure 27. Correlation of estimated and known age-at-death for the Todd (1920) pubic symphysis method (n=10).

The McKern-Stewart pubic symphysis method has an overall correct

classification rate of 82.2% (Table 16). This number differs slightly from the figure in

Table 4 because six individuals with composite scores not falling into the original

Table 16. Correct and incorrect classification: McKern-Stewart (1957) pubic symphysis method.

Total Score   N    # Correct   % Correct   # Incorrect   % Incorrect
0             0    -           -           -             -
1-2           5    3           60.0        2             40.0
3             3    3           100.0       0             0.0
4-5           9    5           55.6        4             44.4
6-7           13   12          92.3        1             7.7
8-9           11   9           81.8        2             18.2
10            6    5           83.3        1             16.7
11-12-13      22   20          90.9        2             9.1
14            1    0           0.0         1             100.0
15            3    3           100.0       0             0.0
Total         73   60          82.2        13            17.8

reference categories were eliminated. It appears that most scores had high correct

classification rates, but that composite scores 1-2, 4-5, and 14 performed poorly. This


trend is also reflected in Figure 28, where the known ages of identified males have been

superimposed over age intervals for each composite score.

Figure 28. Comparison of known ages of identified males superimposed over the age intervals for the composite scores of pubic symphysis components as given in the McKern and Stewart (1957) method. Diamonds represent the mean age for each composite score.

Table 17 shows a breakdown of bias, inaccuracy, and SEI by composite score.

The McKern-Stewart pubic symphysis method underaged individuals in all composite

score categories, with the exception of scores 10 and 15. The score of 14 (n=1) also had a

high SEI, related to the incorrect classification of this individual. The method had an

average inaccuracy of 2.68 years and score 15 had the highest inaccuracy (9.28 years).

Bias is normally distributed (Figure 29). Figure 30 displays the correlation between

estimated and known age-at-death (r=0.79, see Table 6). This correlation is statistically

Table 17. Error of McKern-Stewart (1957) pubic symphysis method by composite score group.

Total Score   N    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
0             0    -          -                -
1-2           5    -0.93      1.51             9.05
3             3    -0.71      1.24             2.09
4-5           9    -1.23      1.78             9.86
6-7           13   -1.03      1.65             5.95
8-9           11   -2.66      3.05             9.01
10            6    0.03       2.11             6.96
11-12-13      22   -0.88      3.22             11.01
14            1    -1.86      1.86             49.33
15            3    1.28       9.28             14.11
ALL           73   -1.07      2.68             9.49

significant (ANOVA, p=0.000). There was a striking decrease in sample size beyond the

age of 30.

Figure 29. Distribution of bias for the McKern-Stewart (1957) pubic symphysis method.

[Figure 30 scatter plot: Estimated Age (y-axis) plotted against Known Age-at-Death (x-axis); R² = 0.629.]

Figure 30. Correlation of estimated and known age-at-death for the McKern-Stewart (1957) pubic symphysis method (n=73).

Statistical tests of significance could only be run between composite score

groups 6-7, 8-9, and 11-12-13 due to small sample sizes for all other groups. A one-way

ANOVA revealed a significant difference in bias (p=0.031) and inaccuracy (p=0.040)

between these groups, but no significant difference in mean SEI (p=0.164). ANOVA run

with the Bonferroni correction indicated that significant differences in bias occurred

between composite score groups 8-9 and 11-12-13 (p=0.029), but not between 6-7 and

either of those groups. While the difference in inaccuracy between composite score

groups was significant, there was no significant difference between groups at α≤0.05

when ANOVA was run with the Bonferroni correction. The values for composite score

group 8-9 approach significance when compared to both the 6-7 group (p=0.062) and the

11-12-13 group (p=0.076), indicating that the problem of inaccuracy may lie in the 8-9

group.


The Suchey-Brooks pubic symphysis method had the highest correct

classification of all three pubic symphysis methods (97.7%, Table 18). The percentages

in Table 18 do not include multiple phase designations and thus differ slightly from the

Table 18. Correct and incorrect classification: Suchey-Brooks pubic symphysis method.

Phase   N    # Correct   % Correct   # Incorrect   % Incorrect
1       12   10          83.3        2             16.7
2       20   20          100.0       0             0.0
3       22   22          100.0       0             0.0
4       29   29          100.0       0             0.0
5       3    3           100.0       0             0.0
6       0    -           -           -             -
Total   86   84          97.7        2             2.3

correct classification percentages given in Table 4. Phases two through five have 100%

correct classification and only two individuals scored as a phase one were incorrectly

classified. No individuals were observed in phase six in the JPAC/CIL identified sample.

While close to 100% correct classification is impressive, it is also important to recognize

that the age intervals given by the Suchey-Brooks method are very large, which is

reflected in Figure 31.

Error broken down by each phase of the Suchey-Brooks pubic symphysis is

given in Table 19. Sample sizes for the first four phases are adequate. The sample size

for phase five is small and no individuals in the JPAC/CIL sample were placed in phase

six. Overall, the method had a small tendency to overage. When broken down by phase,

phase one is the only phase that underaged. Phase five exhibited the highest positive bias

value, indicating that it overaged more than the other phases. Additionally, phase five

also had the largest inaccuracy and SEI of the phases observed in this sample. Inaccuracy,

Figure 31. Comparison of known ages of identified males superimposed over the age intervals for the six phases of the pubis symphysis as given in the Suchey-Brooks pubic symphysis method. Diamonds represent the mean age for each phase.

Table 19. Error of Suchey-Brooks pubic symphysis method by phase.

Phase   N    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       12   -2.27      3.33             11.84
2       20   0.23       2.52             9.46
3       22   1.33       2.83             12.72
4       29   1.49       5.02             16.51
5       3    5.27       7.27             17.00
6       0    -          -                -
Total   86   0.76       3.72             13.51

bias, and SEI increased with each phase after phase two. Bias is normally distributed

(Figure 32). The Suchey-Brooks pubic symphysis method has a correlation of r=0.80

(see Table 6) between estimated and known age-at-death (Figure 33). This correlation is

statistically significant (ANOVA, p=0.000).

Figure 32. Distribution of bias for the Suchey-Brooks pubic symphysis method.

Statistical tests of significance of error between phases revealed that there is a

significant difference in bias (ANOVA, p=0.004) and inaccuracy (ANOVA, p=0.006)

between the first four phases of the Suchey-Brooks method. Phase five was eliminated

from analyses because of its small sample size. Running ANOVA with the Bonferroni

correction revealed that differences in bias and inaccuracy occurred between phase one

and phases two through four (Tables 20 and 21). Phase one is the only phase that had a

[Figure 33 scatter plot: Estimated Age (y-axis) plotted against Known Age-at-Death (x-axis); R² = 0.633.]

Figure 33. Correlation of estimated and known age-at-death for the Suchey-Brooks pubic symphysis method (n=86).

Table 20. P-values from ANOVA with Bonferroni correction between the first four phases of the Suchey-Brooks method: bias.

Phase   1        2       3       4
1       N/A      -       -       -
2       0.045*   N/A     -       -
3       0.005*   1.000   N/A     -
4       0.005*   1.000   1.000   N/A

*p≤0.05

Table 21. P-values from ANOVA with Bonferroni correction between the first four phases of the Suchey-Brooks method: inaccuracy.

Phase   1        2       3       4
1       N/A      -       -       -
2       0.009*   N/A     -       -
3       0.009*   1.000   N/A     -
4       0.109    1.000   1.000   N/A

*p≤0.05

negative bias value, indicating that it underaged individuals assigned to this phase. Even

though the difference in inaccuracy between phase one and phase four is not significant,

the p-value (p=0.109) is not as high as the p-values for comparisons of phases two, three,

and four. Inaccuracy is highest overall for phase four. There was no significant difference

in SEI between the first four phases (ANOVA, p=0.189).
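The p-values of exactly 1.000 in Tables 20 and 21 are what one would expect if the reported values are Bonferroni-adjusted, since the adjustment multiplies each raw p-value by the number of comparisons and caps the result at 1. The sketch below illustrates that adjustment only. Whether the statistical software used for this study reports adjusted p-values in exactly this way is an assumption, and the raw p-values shown are hypothetical, not taken from the analysis.

```python
def bonferroni(p_values):
    """Multiply each raw p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Four phases give six pairwise comparisons. These raw p-values are
# hypothetical placeholders chosen only to illustrate the adjustment.
raw = {
    "1 vs 2": 0.0075,
    "1 vs 3": 0.0008,
    "1 vs 4": 0.0008,
    "2 vs 3": 0.40,
    "2 vs 4": 0.35,
    "3 vs 4": 0.90,
}

adjusted = dict(zip(raw, bonferroni(list(raw.values()))))

for pair, p in adjusted.items():
    flag = "*" if p <= 0.05 else ""
    print(f"phases {pair}: adjusted p = {p:.3f}{flag}")
```

Any raw p-value above 1/6 (about 0.167) is capped at 1.000 after adjustment for six comparisons, so a block of 1.000 entries indicates comparisons that were far from significant rather than identical results.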

The McKern-Stewart and Suchey-Brooks method samples were broken down

into five-year age intervals in order to compare their performance in the JPAC/CIL

identified sample. Bias and inaccuracy were calculated for each five-year interval based

on the sample size for that interval. Results are shown in Table 22. For the youngest age

Table 22. Comparison of bias and inaccuracy: McKern-Stewart and Suchey-Brooks pubic symphysis methods.

Known Age   Measure      McKern-Stewart   Suchey-Brooks
16-20       N            7                5
            Bias         0.05             -2.30
            Inaccuracy   0.72             2.30
21-25       N            30               31
            Bias         0.54             -0.39
            Inaccuracy   2.17             2.92
26-30       N            22               21
            Bias         -1.34            1.55
            Inaccuracy   2.39             2.57
31-35       N            9                10
            Bias         -3.79            4.00
            Inaccuracy   3.79             5.32
36-40       N            2                11
            Bias         -1.41            -0.84
            Inaccuracy   5.41             6.00
41+         N            3                8
            Bias         -9.27            3.25
            Inaccuracy   9.27             5.60
All Ages    N            73               86
            Bias         -1.07            0.76
            Inaccuracy   2.68             3.72

class (16-20), the McKern-Stewart method performed better than Suchey-Brooks, with a

very low bias and inaccuracy. For individuals between the ages of 21 and 25, both

methods performed equally well, but the McKern-Stewart method slightly overaged

individuals in this category while the Suchey-Brooks method slightly underaged. For the

next two age classes (26-30, 31-35), both methods performed similarly to one another,

except that the McKern-Stewart method underaged these individuals and the Suchey-

Brooks method overaged them. From 36-40, both methods underaged individuals in the

age class, with a similar average number of years of inaccuracy. Finally, the 41+ category

had the greatest discrepancy between the two methods in both bias and inaccuracy. This

group also had the smallest sample sizes, which is related to the age distribution of the

JPAC/CIL identified sample. Additionally, the Suchey-Brooks method had a higher

standard error than the McKern-Stewart method, but both methods have almost identical

correlation coefficients when comparing estimated and known age-at-death (r=0.80,

r=0.79, respectively, see Table 6).
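The scatter plots report R² while the text reports Pearson's r. For a simple linear fit with a positive slope the two are related by r = √R², so the figures and the text can be cross-checked. The sketch below does this for the two SOP-stipulated pubic symphysis methods; the dictionary names are illustrative, and the positive sign of r is assumed from the direction of the relationship described in the text.

```python
from math import sqrt

# R² values printed on the scatter plots (Figures 30 and 33) and the
# Pearson's r values reported in the text for the same methods.
r_squared = {"McKern-Stewart": 0.629, "Suchey-Brooks": 0.633}
reported_r = {"McKern-Stewart": 0.79, "Suchey-Brooks": 0.80}

# Assumes a positive relationship, so r is the positive square root of R².
derived_r = {method: round(sqrt(value), 2) for method, value in r_squared.items()}

for method in r_squared:
    print(method, derived_r[method], reported_r[method])
```

Both derived values match the reported coefficients to two decimal places, which also shows how little of the difference between the two methods is captured by correlation alone.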

Auricular Surface

Three methods were used to estimate age based on the auricular surface:

Lovejoy et al. (1985b), Osborne et al. (2004), and Buckberry and Chamberlain (2002).

SOP 3.4 of the JPAC/CIL laboratory manual stipulates the use of the Buckberry-

Chamberlain method except for very young individuals or where only a partial auricular

surface is present (JPAC/CIL 2008). When employing the Lovejoy et al. method, the

statistics from Osborne et al. (2004) should be used in place of the age intervals provided

in the original Lovejoy et al. method (JPAC/CIL 2008). Even so, analysts often report

both the Lovejoy et al. phase along with the Osborne et al. statistics. This practice allows


for the analysis of performance of both methods and results for all auricular surface

methods are presented below. No comparison will be made between the original and

revised methods because the sample size for the Buckberry-Chamberlain method is too

small.

Of the 147 individuals in the Lovejoy et al. sample, 53 were placed in multiple

phases by the analysts. The results in Tables 4 and 5 include all individuals. Table 23

shows a breakdown of all phase assignments as recorded by the analysts and Table 24

includes only those phases originally defined in Lovejoy et al. (1985b). Comparison of

these two tables shows that a higher correct classification rate is obtained when assigning

Table 23. Correct and incorrect classification: Lovejoy et al. (1985b) auricular surface method.

Phase   Range   Midpoint   N     # Correct   % Correct   # Incorrect   % Incorrect
1       20-24   22         27    22          81.5        5             18.5
1-2     20-29   24.5       15    14          93.3        1             6.7
1-3     20-34   27         3     3           100.0       0             0.0
1-4     20-39   29.5       4     4           100.0       0             0.0
1-5     20-44   32         1     1           100.0       0             0.0
2       25-29   27         35    20          57.1        15            42.9
2-3     25-34   29.5       18    10          55.6        8             44.4
2-4     25-39   32         1     1           100.0       0             0.0
3       30-34   32         20    4           20.0        16            80.0
3-4     30-39   34.5       3     3           100.0       0             0.0
3-5     30-44   37         3     2           66.7        1             33.3
4       35-39   37         6     4           66.7        2             33.3
4-5     35-44   39.5       3     3           100.0       0             0.0
5       40-44   42         6     3           50.0        3             50.0
5-6     40-49   44.5       1     0           0.0         1             100.0
5-7     40-59   49.5       1     1           100.0       0             0.0
6       45-49   47         0     -           -           -             -
7       50-59   54.5       0     -           -           -             -
8       60+     N/A        0     -           -           -             -
ALL     -       -          147   95          65.6        52            35.4

Table 24. Correct and incorrect classification for single phases only: Lovejoy et al. (1985b) auricular surface method.

Phase   N    # Correct   % Correct   # Incorrect   % Incorrect
1       27   22          81.5        5             18.5
2       35   20          57.1        15            42.9
3       20   4           20.0        16            80.0
4       6    4           66.7        2             33.3
5       6    3           50.0        3             50.0
6       0    -           -           -             -
7       0    -           -           -             -
8       0    -           -           -             -
Total   94   53          56.4        41            43.6

individuals to multiple phases. In fact, 100% correct classification is achieved only when

two or more phases are used. The original phases had a correct classification rate of only

56.4% and this problem is again reflected in Figure 34.

Figure 34. Comparison of known ages of identified males superimposed over the age intervals for the eight phases of the auricular surface as given in the Lovejoy et al. (1985b) auricular surface method.


Error for the Lovejoy et al. method was calculated for all 147 individuals by

comparing the known age-at-death with the midpoint of the assigned phase since the

reference method does not provide descriptive statistics for the phases (i.e., no confidence

intervals). Overall, the method had a tendency to overage, as indicated by the mainly

positive bias values for all phases (Table 25). Multiple-phase designations generally have

slightly higher bias, inaccuracy, and SEI values than single-phase designations. Results

Table 25. Error of Lovejoy et al. (1985b) auricular surface method by phase.

Phase   N     Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       27    -0.04      1.74             7.59
1-2     15    0.50       1.83             7.57
1-3     3     1.67       1.67             6.62
1-4     4     1.00       4.25             14.95
1-5     1     -4.00      4.00             11.11
2       35    1.80       2.66             11.33
2-3     18    3.39       4.39             18.38
2-4     1     2.00       2.00             6.67
3       20    3.30       5.00             19.28
3-4     3     2.83       2.83             9.25
3-5     3     3.67       4.33             16.88
4       6     3.00       3.67             13.29
4-5     3     0.50       1.50             3.85
5       6     5.00       5.00             15.74
5-6     1     8.50       8.50             23.61
5-7     1     -3.50      3.40             6.60
6       0     -          -                -
7       0     -          -                -
8       0     -          -                -
ALL     147   1.89       3.16             12.40

for single phases are presented in Table 26 to separate the original phases as given in the

reference method from multiple phases as assigned by the analysts. Phase five had the

Table 26. Error of Lovejoy et al. (1985b) auricular surface method: single phases only.

Phase   N    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       27   -0.04      1.74             7.59
2       35   1.80       2.66             11.33
3       20   3.30       5.00             19.28
4       6    3.00       3.67             13.29
5       6    5.00       5.00             15.74
6       0    -          -                -
7       0    -          -                -
8       0    -          -                -
ALL     94   1.87       3.11             12.35

highest bias, phases three and five had the highest inaccuracy, and phase three had the

highest SEI. Bias for the Lovejoy et al. auricular surface method is not

normally distributed around zero (Figure 35); the distribution is skewed to the right.

The method has a Pearson’s r of 0.78 (see Table 6), also represented in Figure 36. The

Figure 35. Distribution of bias for the Lovejoy et al. (1985b) auricular surface method.

[Figure 36 scatter plot: Estimated Age (y-axis) plotted against Known Age-at-Death (x-axis); R² = 0.6148.]

Figure 36. Correlation of estimated and known age-at-death for the Lovejoy et al. (1985b) auricular surface method (n=147).

correlation between known and estimated age-at-death is statistically significant

(ANOVA, p=0.000).
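The ranges and midpoints listed for multiple-phase designations in Table 23 follow from the single-phase ranges: the combined range runs from the lower bound of the first phase to the upper bound of the last, and the estimate is the midpoint of that range. The sketch below shows this derivation. The function names are hypothetical, and the example individual is illustrative rather than drawn from the sample.

```python
# Age ranges for the single phases as listed in Table 23. Phase 8 ("60+")
# is open-ended, has no midpoint, and is omitted here.
PHASE_RANGES = {1: (20, 24), 2: (25, 29), 3: (30, 34), 4: (35, 39),
                5: (40, 44), 6: (45, 49), 7: (50, 59)}


def combined_range(first_phase, last_phase):
    """Age range spanning first_phase through last_phase (inclusive)."""
    return PHASE_RANGES[first_phase][0], PHASE_RANGES[last_phase][1]


def midpoint(first_phase, last_phase):
    """Midpoint of the combined range, used as the point estimate of age."""
    lo, hi = combined_range(first_phase, last_phase)
    return (lo + hi) / 2


def signed_error(known_age, first_phase, last_phase):
    """Estimated minus known age; positive values indicate overaging."""
    return midpoint(first_phase, last_phase) - known_age


# Hypothetical individual: known age 28, assigned to phases two and three.
print(combined_range(2, 3), midpoint(2, 3), signed_error(28, 2, 3))
```

A single-phase designation is the special case where the first and last phase are the same, for example `midpoint(3, 3)` for phase three.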

One-way ANOVAs were run to compare bias, inaccuracy, and SEI of the first

three phases of the Lovejoy et al. auricular surface method. Multiple-phase designations

were not compared to single phases or to each other as they are uninformative concerning

the performance of the original method and its phases, and phases four and five were not

included in analyses because of small sample sizes. Results from three separate ANOVA

tests indicated that there are statistically significant differences in bias (p=0.001),

inaccuracy (p=0.000), and SEI (p=0.001) between the first three phases observed in the

JPAC/CIL identified sample. The Bonferroni correction for multiple comparisons

revealed that phase three differed significantly from phases one and two in bias,

inaccuracy, and SEI, but that phases one and two were not significantly different from

one another (Tables 27, 28, 29). Phase three had larger error values than phases one and

two.

Table 27. P-values from ANOVA with Bonferroni correction between the first three phases of the Lovejoy et al. (1985b) method: bias.

Phase   1        2        3
1       N/A      -        -
2       0.441    N/A      -
3       0.000*   0.015*   N/A

*p≤0.05

Table 28. P-values from ANOVA with Bonferroni correction between the first three phases of the Lovejoy et al. (1985b) method: inaccuracy.

Phase   1        2        3
1       N/A      -        -
2       1.000    N/A      -
3       0.000*   0.000*   N/A

*p≤0.05

Table 29. P-values from ANOVA with Bonferroni correction between the first three phases of the Lovejoy et al. (1985b) method: SEI.

Phase   1        2        3
1       N/A      -        -
2       0.515    N/A      -
3       0.001*   0.027*   N/A

*p≤0.05

The Osborne et al. auricular surface statistics were applied to all individuals

that were aged with the Lovejoy et al. method. Additionally, four individuals were aged

with only the Osborne et al. method, giving a total sample size of 151 individuals.

However, multiple-phase classifications were removed from further analyses because

there are no descriptive statistics for multiple phases in the reference method and

multiple-phase estimations produced extremely imprecise age estimates (e.g., phases one

through three give an estimate of less than or equal to sixty-nine years old). All phases

except phase one had 100% correct classification in the JPAC/CIL sample (Table 30).

These high rates of correct classification are also shown in Figure 37, along with the large

age intervals that are associated with the six phases of the Osborne et al. auricular surface

method.

Table 30. Correct and incorrect classification: Osborne et al. (2004) auricular surface method.

Phase   N     # Correct   % Correct   # Incorrect   % Incorrect
1       79    71          89.9        8             10.1
2       21    21          100.0       0             0.0
3       6     6           100.0       0             0.0
4       7     7           100.0       0             0.0
5       0     -           -           -             -
6       0     -           -           -             -
Total   113   105         92.9        8             7.1

The Osborne et al. auricular surface method had an overall tendency to

underage (bias=-0.59) with an average inaccuracy of 3.93 years (Table 31). This method

had a higher overall SEI than the Lovejoy et al. method. Additionally, inaccuracy and

SEI increased with each subsequent phase. Bias increased after phase two, with phase

Figure 37. Comparison of known ages of identified males superimposed over the age intervals for the six phases of the auricular surface as given in the Osborne et al. (2004) auricular surface method. Diamonds represent the mean age for each phase.

Table 31. Error of Osborne et al. (2004) auricular surface method by phase.

Phase   N     Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       79    -2.68      3.02             11.71
2       21    0.98       3.88             14.38
3       6     8.00       8.00             26.66
4       7     10.94      10.94            31.88
5       0     -          -                -
6       0     -          -                -
Total   113   -0.59      3.93             14.25

one underaging and phases two through four increasingly overaging. This trend is also

apparent in Figure 37, in which all individuals in phases three and four have known ages-

at-death below the given means for the phases. The distribution of bias for the Osborne et

al. method is slightly skewed to the right and does not fully conform to the normal

distribution (Figure 38). This method also had a lower correlation between known and

estimated age-at-death than the Lovejoy et al. method (r=0.74, see Table 6 and Figure

39), but this correlation is still statistically significant (ANOVA, p=0.000).

Figure 38. Distribution of bias for the Osborne et al. (2004) auricular surface method.

Only phases one and two had adequate sample sizes for statistical tests of

significance. Therefore, Student’s t-tests were run to compare bias, inaccuracy, and SEI

between phases one and two. There were significant differences in bias (p=0.004) and

[Figure 39 scatter plot: Estimated Age (y-axis) plotted against Known Age-at-Death (x-axis); R² = 0.5481.]

Figure 39. Correlation of estimated and known age-at-death for the Osborne et al. (2004) auricular surface method (n=113).

inaccuracy (p=0.000) between phases one and two, but no significant differences in SEI

(p=0.259). Phase one had a negative bias value, while phase two had a positive bias

value; phase two had a higher average error in number of years (i.e., inaccuracy). For

phases three and four (which were not statistically tested), individuals from the

JPAC/CIL sample placed into these phases had known ages-at-death lower than the

respective phase means.

The Buckberry-Chamberlain revised auricular surface method had 100%

correct classification in the JPAC/CIL sample. However, it also had the highest overall

bias, inaccuracy, and SEI of all methods employed at the CIL² (Table 5). Only three

stages were recorded (stages one, two, and five) and one individual was classified as

between stages three and four. Because of the 100% correct classification and the small

sample size (n=10), this method was tested with multiple observers (see Chapter VII).

Table 32 gives a summary for all individuals aged with the Buckberry-Chamberlain

method. All observed stages overaged, with the exception of stage one. Figure 40 shows

² This comparison is made using the adjusted dental formation error values (see Table 12).

Table 32. Buckberry-Chamberlain (2002) auricular surface method sample (n=10).

Age   Score   Stage   Range   Mean    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
19    5       1       16-19   17.33   -0.19      0.19             8.79
21    8       2       21-38   29.33   0.93       0.93             39.67
25    8       2       21-38   29.33   0.48       0.48             17.32
26    8       2       21-38   29.33   0.37       0.37             12.81
22    NR      2       21-38   29.33   0.81       0.81             33.33
27    NR      2       21-38   29.33   0.26       0.26             8.63
22    NR      2       21-38   29.33   0.81       0.81             33.32
25    NR      2       21-38   29.33   0.48       0.48             17.32
30    NR      3-4     16-81   N/A     N/A        N/A              N/A
37    13      5       29-88   59.94   2.55       2.55             62.00
ALL   -       -       -       -       6.51       6.88             25.91

NR=not reported

that the younger stages performed well, even with a limited sample size. No distribution

of bias is depicted because of the small sample size. The method has a strong correlation

between estimated and known age-at-death (r=0.92, see Table 6 and Figure 41). This

correlation is statistically significant (ANOVA, p=0.000). ANOVA was not conducted

between stages because of the small sample size of each stage.

Sternal Rib End

Age estimation using the sternal rib end exclusively referenced methods

published by Iscan and colleagues. Correct and incorrect classifications were calculated

using the 95% confidence intervals (CI) provided in their 1984 Journal of Forensic

Sciences publication (Table 33). This table includes multiple-phase designations as given

by CIL analysts; seven of 21 individuals were assigned to dual phases. Ranges for more

than one phase were derived by using the minimum age for the lowest phase and the

maximum age for the highest phase to compare classification of individuals into single or

multiple phases. The 100% correct classifications only occurred when individuals were


Figure 40. Comparison of known ages of identified males superimposed over the age intervals for the seven stages of the auricular surface as given in the Buckberry-Chamberlain (2002) revised auricular surface method. Diamonds represent the mean age for each stage.

[Figure 41 scatter plot: Estimated Age (y-axis) plotted against Known Age-at-Death (x-axis); R² = 0.8538.]

Figure 41. Correlation of estimated and known age-at-death for the Buckberry-Chamberlain (2002) revised auricular surface method (n=9).

Table 33. Correct and incorrect classification: Iscan et al. (1984b) sternal rib end method.

Phase   95% CI      N    # Correct   % Correct   # Incorrect   % Incorrect
1       16.5-18.0   2    1           50.0        1             50.0
1+2     16.5-23.1   1    1           100.0       0             0.0
2       20.8-23.1   7    6           85.7        1             14.3
2+3     20.8-27.7   2    1           50.0        1             50.0
3       24.1-27.7   3    1           33.3        2             66.7
4       25.7-30.6   2    1           50.0        1             50.0
4+5     25.7-42.3   4    4           100.0       0             0.0
5       34.4-42.3   0    0           -           0             -
6       44.3-55.7   0    0           -           0             -
7       54.3-64.1   0    0           -           0             -
8       65.0-78.0   0    0           -           0             -
Total               21   15          71.4        6             28.6

assigned to two phases. Nawrocki (n.d.) called into question the confidence intervals

originally published by Iscan and colleagues, saying that they are far too small. Table 34

presents correct and incorrect classifications using the prediction intervals (PI) given by

Table 34. Correct and incorrect classification using Nawrocki (n.d.) prediction intervals. Phase 95% PI N # Correct % Correct # Incorrect % Incorrect

Phase   95% PI      N    # Correct   % Correct   # Incorrect   % Incorrect
1       15.5-19.1   2    1           50.0        1             50.0
2       17.2-26.6   7    7           100.0       0             0.0
3       18.3-33.5   3    3           100.0       0             0.0
4       19.4-37.0   2    2           100.0       0             0.0
5       23.2-54.5   0    -           -           -             -
6       25.6-74.4   0    -           -           -             -
7       38.4-80.0   0    -           -           -             -
8       48.0-95.0   0    -           -           -             -
Total               14   13          92.9        1             7.1

Nawrocki (n.d.). This table does not include multiple-phase designations. Correct

classification increased from 71.4% to 92.9% with the modified age intervals. Figure 42

shows the known ages-at-death of individuals aged with the sternal rib end method; the

Iscan et al. (1984b) CI are represented by a solid line and the Nawrocki (n.d.) PI are

Figure 42. Comparison of known ages of identified males superimposed over the age intervals for the eight phases of the sternal rib end as given in the Iscan et al. (1984b) sternal rib end method (solid rectangles) and prediction intervals as calculated by Nawrocki (dashed rectangles). Diamonds represent the mean age for each phase.

represented by a dashed line. Means of intervals for both Iscan et al. (1984b) and

Nawrocki (n.d.) are the same and are represented by a diamond. The Nawrocki (n.d.)

intervals are much larger than the Iscan et al. (1984b) intervals.
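The effect of interval width on classification can be illustrated with the bounds in Tables 33 and 34. The sketch below derives a dual-phase range as described above (minimum of the lowest phase, maximum of the highest phase) and tests whether a known age falls inside an interval. The function names are hypothetical, and the example individual is illustrative, not a case from the JPAC/CIL sample.

```python
# 95% confidence intervals (Table 33) and 95% prediction intervals (Table 34)
# for sternal rib end phases one through five.
CI = {1: (16.5, 18.0), 2: (20.8, 23.1), 3: (24.1, 27.7),
      4: (25.7, 30.6), 5: (34.4, 42.3)}
PI = {1: (15.5, 19.1), 2: (17.2, 26.6), 3: (18.3, 33.5),
      4: (19.4, 37.0), 5: (23.2, 54.5)}


def span(intervals, phases):
    """Lowest lower bound and highest upper bound over the given phases."""
    return (min(intervals[p][0] for p in phases),
            max(intervals[p][1] for p in phases))


def is_correct(known_age, interval):
    """True when the known age lies within the interval (bounds inclusive)."""
    lo, hi = interval
    return lo <= known_age <= hi


# Dual-phase ranges as listed in Table 33.
print(span(CI, [1, 2]), span(CI, [2, 3]), span(CI, [4, 5]))

# Hypothetical individual: known age 25, scored as phase two.
age, phase = 25, 2
print(is_correct(age, CI[phase]), is_correct(age, PI[phase]))
```

In this hypothetical case the same individual is misclassified under the narrower confidence interval and correctly classified under the wider prediction interval, which is the mechanism behind the higher correct classification rate in Table 34. Treating the interval bounds as inclusive is an assumption of the sketch.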

Bias, inaccuracy, and SEI by reported phase are given in Table 35. The

method has an overall tendency to underage; only phase three has a positive bias value.

Average error in years (inaccuracy) was low for all four reported phases. However, phase

one had a high SEI compared to other phases in the method and other methods used in

the JPAC/CIL sample. No statistical comparisons were made between phases since

sample sizes for each phase are small. Distribution of bias does not conform to the

Table 35. Error of Iscan et al. (1984b) sternal rib end method by phase.

Phase   N    Bias (Σ)   Inaccuracy (Σ)   SEI (x̄)
1       2    -3.70      0.53             15.90
2       7    -0.53      0.39             3.42
3       3    2.23       0.48             9.61
4       2    -1.30      0.36             8.16
5       0    -          -                -
6       0    -          -                -
7       0    -          -                -
8       0    -          -                -
Total   14   -0.5       1.76             7.21

normal curve (Figure 43), but this is most likely due to the small sample size. The Iscan

et al. sternal rib end method has the lowest correlation between estimated and known age-

at-death (r=0.71) of all methods listed in Table 6 (see also Figure 44). However, this

correlation is still statistically significant (ANOVA, p=0.004).
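The correlation reported for the sternal rib end method (r=0.71) and the R2 value printed with Figure 44 (0.5038) describe the same relationship: for a simple linear fit, squaring r gives the coefficient of determination. The sketch below, in Python, shows the calculation. The `pearson_r` helper and the paired ages are hypothetical and only illustrate the arithmetic; they are not the study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical known and estimated ages, for illustration only.
known = [19, 21, 22, 24, 27, 30]
estimated = [20, 20, 24, 22, 26, 27]

r = pearson_r(known, estimated)
print(round(r, 2), round(r ** 2, 2))

# Squaring the reported r approximately recovers the R2 shown with Figure 44.
print(round(0.71 ** 2, 3))  # 0.504, close to the reported 0.5038
```

The small difference between 0.504 and 0.5038 is consistent with r having been rounded to two decimals before reporting.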

Summary

This chapter presented results from all age estimation methods employed at

the JPAC/CIL between 1972 and 31 July 2008. The age distributions of the total sample

and each individual sub-sample were given, followed by a comparison of methods to one

another and classification rates and error values for each individual method, where

applicable. The next chapter will further examine the reliability of three of the above

methods.

Figure 43. Distribution of bias for the Iscan et al. (1984b) sternal rib end method.

Figure 44. Correlation of estimated and known age-at-death for the Iscan et al. (1984b) sternal rib end method (n=14). [Scatter plot: x-axis, Known Age-at-Death (0-35); y-axis, Estimated Age (0-35); R2 = 0.5038.]

CHAPTER VII

RESULTS II: INTEROBSERVER ERROR STUDY

A total of 39 individuals voluntarily aged two skeletal samples for each of the

following age estimation methods: Buckberry and Chamberlain (2002) revised auricular

surface, Iscan et al. (1984b) sternal rib end for males, and Mann et al. (1991) maxillary

suture closure. A summary of self-reported background information is presented below,

followed by preliminary results for each method. These results include: distribution of

stages, phases, or suture obliteration, and correlation of experience with these estimates.

Participants

All participants reported their field of study as anthropology or a sub-

discipline of anthropology (Figure 45). The category “other anthropology/combination”

includes the self-reported categories of applied anthropology, bioarchaeology, physical

anthropology/archaeology, and physical/forensic anthropology. Four participants did not

report their field of study.

Approximately half of the participants reported having obtained a Master’s degree as their highest level of education (Figure 46). The next largest group was made up of those individuals with a Bachelor’s degree and the smallest group was those individuals with a Doctorate. Two of 39 participants are Diplomates of the American Board of Forensic Anthropology (ABFA). One individual did not report his or her highest degree obtained.

Figure 45. Participants’ self-reported fields of study (n=35). [Pie chart. Counts as extracted: 11, 8, 12, 4. Legend as extracted: Anthropology, Forensic Anthropology, Physical Anthropology, Other Anthropology/Combination.]

Figure 46. Participants’ self-reported highest degrees obtained (n=38). [Pie chart: Bachelor, 14; Master, 18; Doctorate, 6.]

More than half of the participants had fewer than five years of experience in skeletal age estimation (Figure 47) and had analyzed under 100 skeletons (Figure 48). Approximately one-third of the participants had between five and nine years of experience and the remaining participants had ten or more years of experience in skeletal aging. Only three individuals had analyzed over 1000 skeletons and a little over one-third of the participants had analyzed between 100 and 1000 skeletons. One individual did not give an estimate of years of experience and another individual did not give an approximate number of skeletons analyzed. Overall, participants were largely graduate students enrolled in both MA and PhD programs. A limited number of individuals who had already completed their graduate studies participated in this study.

Figure 47. Participants’ self-reported years of experience with skeletal aging (n=38). [Pie chart: 0-4 years, 21; 5-9 years, 10; 10+ years, 7.]

Figure 48. Participants’ self-reported approximate number of skeletons analyzed (n=38).¹ [Pie chart: <100, 21; 100-1000, 14; >1000, 3.]

Method Performance

Buckberry-Chamberlain Revised Auricular Surface

Approximately half of the participants (48.7%) had never used the Buckberry-Chamberlain revised auricular surface method before, while the remaining participants had (51.3%). Of those people that were familiar with the method, 26.3% use it on a regular basis and 73.7% do not. Self-reported level of comfort with the method was generally low, with the modal score being one, or “very low.” The median of all self-reported “comfort scores” was two. Several individuals included comments indicating that they were familiar with the original auricular surface method as published by Lovejoy and colleagues (1985b).

¹ “Number of skeletons analyzed” was not used for further comparisons of experience and error due to the very small sample size for individuals who had analyzed over 1000 skeletons. Additionally, the total number of skeletons analyzed may not be the best proxy for experience with specific methods, such as skeletal age estimation.

In general, participants correctly used the statistics given in the reference

method to assign age point estimates and stages based on composite scores, reporting

either the mean or median of the assigned stage. For sample A, participants were on

average 58.1% sure that their observations corresponded to the correct composite score

and 60.7% sure for sample B. There was no significant difference in level of confidence

between the samples (Student’s t-test, p=0.579). Although the sample sizes for each stage

were too small for statistical analyses, those individuals with higher self-reported

confidence levels were not more likely to assign the consensus stage (Table 36).

Table 36. Percent confidence in assigned composite score by stage: samples A and B.

            Sample A                             Sample B
Stage   N    % Confidence in            Stage   N    % Confidence in
             Assigned Score (x̄)                      Assigned Score (x̄)
1       1    60.0                       1       2    60.0
2       7    59.3                       2       3    61.7
*3      14   61.1                       3       0    -
4       8    51.9                       4       4    57.5
5       6    55.8                       5       5    65.5
6       1    70.0                       *6      11   57.7
7       0    -                          7       11   62.5
Total   37   58.1                       Total   36   60.7

* indicates median phase assigned.
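The "Total" confidence values in Table 36 can be recovered from the per-stage rows as means weighted by the number of participants in each stage. The Python sketch below illustrates that check. The values are copied from Table 36; the function name is illustrative, and the assumption that the totals were computed this way is inferred from the numbers rather than stated in the text.

```python
def weighted_mean(rows):
    """Mean of group means weighted by group size; rows are (n, group_mean)."""
    total_n = sum(n for n, _ in rows)
    return sum(n * m for n, m in rows) / total_n


# (N, mean % confidence) per stage from Table 36; stages with N = 0 are omitted.
sample_a = [(1, 60.0), (7, 59.3), (14, 61.1), (8, 51.9), (6, 55.8), (1, 70.0)]
sample_b = [(2, 60.0), (3, 61.7), (4, 57.5), (5, 65.5), (11, 57.7), (11, 62.5)]

print(round(weighted_mean(sample_a), 1))  # 58.1
print(round(weighted_mean(sample_b), 1))  # 60.7
```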

The distribution of stages assigned by the participants for sample A is given in Figure 49. One individual did not assign a stage and a second person placed the sample in both stages III and IV. The most frequent stage assignment was clearly stage III. This stage was reported by 14 of 37 participants, but this was still less than 50% of all participants. There was a relatively large variation in stage assignment, extending most notably from stages II through V, although one individual classified this sample as a stage I and a second classified it as a stage VI. Given that the method already has very large age intervals per stage, the imprecision shown here is remarkable; assigning an individual to stages II through V would give an estimated age range of 21-88 years. However, the distribution of stage assignment closely mimics the normal curve.

Figure 49. Distribution of assigned stages for sample A (n=37). [Bar chart: x-axis, Stage (I-VI); y-axis, Count of Stage (0-16).]

Based on stage III being the most frequent stage assignment for sample A,

statistics for this stage were used to calculate the SEI as related to both the highest degree

obtained and number of years of experience with skeletal age estimation. The mean age


for stage III is 37.86 years, the median age is 37 years, and the range is 16-65 years; the

SEI was calculated using the median age. Those individuals with doctorates had a higher

mean SEI than other groups (Table 37). It was not possible to run ANOVA between all

groups due to the small sample size for the “doctorate” group. A Student’s t-test between

“bachelor” and “master” groups revealed no significant difference in SEI (p=0.940).

When compared by years of experience, the 10+ group also had a higher mean SEI than

other groups (Table 38). No statistical tests of significance were run due to the small

sample sizes of both the 5-9 and 10+ groups.

Table 37. SEI by highest degree obtained: samples A and B.

             Sample A        Sample B
Degree       N     SEI       N     SEI
Bachelor     14    27.80     14    23.27
Master       17    27.03     17    13.47
Doctorate    5     35.14     5     6.67
ALL          36    28.41     36    16.26

Table 38. SEI by years of experience in skeletal aging: samples A and B.

Years of Experience     Sample A        Sample B
in Skeletal Aging       N     SEI       N     SEI
0-4                     21    26.90     21    17.10
5-9                     9     27.03     9     22.05
10+                     7     34.75     7     6.28
ALL                     37    28.48     37    16.26

Unlike that of sample A, the distribution of stages for sample B is not normal

(Figure 50), most likely because this sample appears to fall at or near the

maximum stages for the method. One individual did not assign a stage to the sample and

a second person assigned both stages IV and V to the sample. Stages VI and VII were

assigned 11 times each and the overall distribution for sample B is more varied than for

sample A, with all stages of the reference method represented at least once.

Figure 50. Distribution of assigned stages for sample B (n=37). [Bar chart: x-axis, Stage (I-VII); y-axis, Count of Stage (0-12).]

Since both stages VI and VII were assigned the same number of times, the

SEI was calculated based on the median of all stages assigned, which was stage VI. The

mean age for stage VI is 66.71 years, the median age is 66 years, and the range is 39-91


years; the SEI was calculated using the median age. For sample B, individuals with a

bachelor’s degree had the highest mean SEI (Table 37). The small sample size of the

“doctorate” group precluded comparison of all three groups by ANOVA, but a Student’s

t-test revealed that there was no significant difference in SEI between “bachelor” and

“master” groups (p=0.258). Those with five to nine years of experience in skeletal aging

had the highest mean SEI (Table 38) and no other statistical tests of significance were run

because of the small sample sizes for the 5-9 and 10+ groups.
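The choice of reference stage for samples A and B (the single most frequent stage for A, and the median stage for B because two stages tied) can be reproduced from the stage counts in Table 36. The Python sketch below does this with the standard library. The counts are taken from Table 36; the `expand` helper is a hypothetical name introduced for illustration.

```python
from statistics import median, multimode


def expand(counts):
    """Turn {stage: count} into a flat list of individual stage assignments."""
    return [stage for stage, n in counts.items() for _ in range(n)]


# Stage counts from Table 36 (stages with a count of zero are omitted).
sample_a = {1: 1, 2: 7, 3: 14, 4: 8, 5: 6, 6: 1}
sample_b = {1: 2, 2: 3, 4: 4, 5: 5, 6: 11, 7: 11}

for name, counts in (("A", sample_a), ("B", sample_b)):
    stages = expand(counts)
    # multimode returns every stage that shares the highest count.
    print(name, multimode(stages), median(stages))
```

For sample A this gives a single mode of stage 3, and for sample B a tie between stages 6 and 7 with a median of stage 6, in line with the stages used for the SEI calculations.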

Iscan et al. Sternal Rib End

All but one of the participants had used the Iscan et al. sternal rib end method

prior to this study. Of those people that were familiar with the method, 42.1% use it on a

regular basis and 57.9% do not. The median of self-reported level of comfort with the

method was three, or “medium.” Of the three methods tested, participants were most

comfortable with the sternal rib end method.

Very few of the participants used the age ranges published in the reference

article; most instead used the ranges given with the sternal rib end casts. This is not

problematic because all participants indicated which phase they had chosen based on the

cast set, and the cast phases are the same as those in the article on which they are based. Additionally, as seen

in Chapter VI, the confidence intervals assigned by Iscan and colleagues are far too small

to be statistically valid for both the reference article and the cast set. Therefore, the

published age ranges may be of limited utility. Two participants mentioned this

phenomenon, also adding that some of the age categories could probably be condensed. It

should also be noted that some participants may have used the female age intervals or

exemplars even though the questionnaire clearly stated “male.” The assigned phases were


used for all analyses, rather than the assigned age estimates, which were generally given

as intervals.

For sample C, participants were on average 77.5% sure that their observations

corresponded to the correct phase and 73.5% sure for sample D. The most frequent

percentage given for sample C was 90% and for sample D it was 80%, indicating that

several especially low percentages may be affecting the overall confidence mean. There

was no significant difference in level of confidence between the samples (Student’s t-test,

p=0.275). Although the sample sizes for each stage were too small for statistical analyses,

those individuals with higher self-reported confidence levels were not more likely to

assign the consensus phase (Table 39).

Table 39. Percent confidence in assigned phase by stage: samples C and D.

            Sample C                             Sample D
Phase   N    % Confidence in            Phase   N    % Confidence in
             Assigned Phase (x̄)                      Assigned Phase (x̄)
1       2    90.0                       1       0    -
2       13   78.8                       2       1    80.0
*3      13   77.5                       3       1    50.0
4       7    74.3                       4       1    80.0
5       1    85.0                       5       6    78.3
6       1    50.0                       6       8    72.5
7       0    -                          *7      15   72.5
8       0    -                          8       2    75.0
Total   37   77.5                       Total   34   73.5

* indicates median phase assigned.

The distribution of assigned phases for sample C is concentrated mainly on phases two through four (Figure 51). The distribution approaches normal and has a median phase assignment of three. Two participants classified the sample in dual phases and these data points are not included here. Given the distribution seen in Figure 51 and the age ranges given in Iscan et al. (1984b) for phases two (18-25 years) and three (19-33 years), condensing these two phases should be considered. The age ranges given with the cast set are even narrower: 20-23 years for phase two and 24-28 years for phase three.

Figure 51. Distribution of assigned phases for sample C (n=37). [Bar chart: x-axis, Phase (1-6); y-axis, Count of Phase (0-14).]

The SEI for sample C was calculated based on statistics for phase three of the Iscan et al. sternal rib end method. The mean for this phase is 25.9 years. Multiple-phase designations were eliminated from these analyses, as were individuals who did not report their highest degree obtained or the number of years of experience with skeletal age estimation. Individuals who have their doctorates had the lowest mean SEI of all groups (Table 40) but the sample size for this group was too small to run ANOVA between the three groups. A Student’s t-test revealed that there was no significant difference between “bachelor” and “master” groups (p=0.755). Those individuals who had ten or more years of experience also had the lowest mean SEI (Table 41). No statistical tests of significance were run due to small sample sizes for two of the three “years of experience” groups.

Table 40. SEI by highest degree obtained: samples C and D.

             Sample C        Sample D
Degree       N     SEI       N     SEI
Bachelor     14    15.80     14    11.29
Master       16    13.72     14    21.97
Doctorate    6     2.96      5     12.60
ALL          36    12.69     33    16.02

Table 41. SEI by years of experience in skeletal aging: samples C and D.

Years of Experience     Sample C        Sample D
in Skeletal Aging       N     SEI       N     SEI
0-4                     20    15.48     20    16.47
5-9                     9     12.66     7     13.80
10+                     7     5.68      6     9.21
ALL                     36    12.87     33    14.58

The distribution for sample D clusters in the upper phases generally between

phases five and seven (Figure 52). Sample D was assigned to multiple phases five times

and these data points are not represented here. Phase seven was the most frequently

assigned phase; it was assigned 15 times. The distribution has a median phase of 6.5, and

is skewed to the left. The age ranges given in the 1984b publication are larger than the

ranges reported by the majority of the participants for sample D, reflecting the

discrepancy between ranges reported in different sources.

Figure 52. Distribution of assigned phases for sample D (n=34). [Bar chart: x-axis, Phase (2-8); y-axis, Count of Phase (0-16).]

The mean for phase seven (59.2 years) was used to calculate the SEI for

sample D. These calculations do not include multiple-phase designations or individuals

who did not report degree or years of experience. Individuals who held a bachelor’s as

their highest degree had the lowest mean SEI, though the mean SEI for the doctorate

group was very close (Table 40). ANOVA was not run because of the small sample size

for the “doctorate” group and a Student’s t-test revealed no significant difference between

“bachelor” and “master” groups (p=0.098). Individuals with ten or more years of

experience had a lower mean SEI than both other groups (Table 41). No statistical tests of

significance were run because of small sample sizes in the 5-9 and 10+ groups.


Mann et al. Maxillary Sutures

Approximately half of the participants (48.7%) had never used the Mann et al.

maxillary suture method before, while 46.2% of the participants had. Two people did not

record whether or not they were familiar with the method, although they did provide age

estimations based on this method. Of those people that were already familiar with the

method, 16.7% use it on a regular basis and 83.3% do not. The median score for self-

reported level of comfort with the method was two, which corresponds to “low.”

As was observed in the JPAC/CIL identified sample, scores produced by the

participants are difficult to analyze because there was little similarity in reporting

between individuals. For example, using sample E, the only age estimates assigned more

than once were: 20-25 (n=3), 26+ (n=3), 30-85 (n=2), 35+ (n=4), and 35-50 (n=2). Two

individuals did not give an age estimate and the remaining 22 age estimates all differed

from one another. The age estimates for sample F were slightly easier to interpret than

sample E, but suffered from a comparable lack of similarity in reporting. Participants did

not appear to be using the same tables or figures from the reference method in assigning

age estimates.

Two separate “percent confidence” questions were asked for the Mann et al.

maxillary suture method: percent sure that the observations of obliteration were correct

and percent sure that interpretation of age interval was correct. For sample E,

participants’ mean level of confidence for observations of obliteration was 70.53%, with

the most frequent score reported being 60%. Participants were on average 66.7% sure

that their interpretation of the age interval was correct for this sample, although the most

frequently reported score was 75%. For sample F, the mean levels of confidence for


observations of obliteration and interpretation of age interval were higher at 79.2% and

72.2%, respectively. The most frequently reported score for both confidence levels was

80%. Sample F seemed to present less of a problem for age estimation and interpretation

than sample E.

To further illustrate the problem of interpretation, all combinations of suture

obliteration as reported by participants for sample E are shown in Figure 53. This figure

includes only those data points for which the participants circled the obliterated sutures

on the questionnaire. Several individuals made additional notes concerning stages of

suture obliteration but these are omitted here. The five categories following “none”

represent the general pattern of suture obliteration in chronological order, i.e., IN, PMP

obliteration generally occurs before IN, PMP, TP. There are 13 combinations listed here.

The incisive suture was recorded as fused in all but three groups and the posterior median

palatine in all but six. The transverse palatine suture is present in six of the groups and

the transverse palatine within the greater palatine foramina in seven. Finally, the anterior

median palatine suture is present in only three groups. One individual scored this sample

as having no obliterated sutures.
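The group counts in the preceding paragraph can be checked by tallying how many of the 13 reported combinations include each suture. The Python sketch below does so. The combinations are transcribed from the category labels of Figure 53 as recovered from the extracted text, so the transcription itself is an assumption; the abbreviations are IN (incisive), PMP (posterior median palatine), TP (transverse palatine), TPinGPF (transverse palatine within the greater palatine foramina), and AMP (anterior median palatine).

```python
from collections import Counter

# The 13 combinations, as recovered from the Figure 53 category labels.
combinations = [
    (),  # "none"
    ("IN",),
    ("IN", "PMP"),
    ("IN", "PMP", "TP"),
    ("IN", "PMP", "TP", "TPinGPF"),
    ("IN", "PMP", "TP", "TPinGPF", "AMP"),
    ("IN", "PMP", "AMP"),
    ("IN", "PMP", "TP", "AMP"),
    ("IN", "PMP", "TPinGPF"),
    ("IN", "TP", "TPinGPF"),
    ("IN", "TPinGPF"),
    ("TP", "TPinGPF"),
    ("TPinGPF",),
]

# Number of distinct combinations (groups) in which each suture appears.
groups_with_suture = Counter(s for combo in combinations for s in combo)
for suture in ("IN", "PMP", "TP", "TPinGPF", "AMP"):
    print(suture, groups_with_suture[suture])
```

This tally counts combinations, not participants; the number of participants who reported each suture as obliterated is a separate count.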

Figure 54 represents an overall count of the number of times each individual suture was recorded as obliterated for all participants. Almost all participants agreed that the incisive suture was obliterated and close to 50% of participants also recorded the posterior median palatine and the transverse palatine within the greater palatine foramina as obliterated. More than 50% of the participants agreed that the transverse palatine suture was not obliterated and almost all agreed that the anterior median palatine suture was not obliterated. Given these trends and the general pattern of suture obliteration as given in Mann et al. (1991), sample E most likely represents an individual over the age of 30. However, since age estimate reporting was so sporadic, no attempt was made to calculate the SEI by experience level for this sample.

Figure 53. All combinations of suture obliteration as reported by participants for sample E (n=38). [Bar chart: x-axis, combination of obliterated sutures (none; IN; IN, PMP; IN, PMP, TP; IN, PMP, TP, TPinGPF; IN, PMP, TP, TPinGPF, AMP; IN, PMP, AMP; IN, PMP, TP, AMP; IN, PMP, TPinGPF; IN, TP, TPinGPF; IN, TPinGPF; TP, TPinGPF; TPinGPF); y-axis, Count of E (0-8).]

While the age estimates were not clearly reported for sample F, the pattern of suture obliteration recorded exhibits high interobserver agreement. Only three categories of suture obliteration were recorded (Figure 55), and two of the three included the incisive suture. Only one individual scored the transverse palatine suture as fused and only two individuals reported the combination of incisive/transverse palatine within the greater palatine foramina. Figure 56 also displays the consensus between observers of obliteration of the incisive suture. Given this agreement, the best age estimate for this sample is between 20 and 25 years, which is also the most frequently reported age estimate (n=11). However, even with the agreement concerning obliteration of the incisive suture, age estimates were not easily comparable and no attempt was made to calculate the SEI based on level of experience.

Figure 54. Frequency of sutures scored as obliterated: sample E. [Bar chart: x-axis, Suture (None, IN, PMP, TP, TPinGPF, AMP); y-axis, Count (0-40).]

Figure 55. All combinations of suture obliteration as reported by participants for sample F (n=36). [Bar chart: x-axis, combination of obliterated sutures (IN; IN, TPinGPF; TP); y-axis, Count of F (0-35).]

Summary

This chapter presented results from tests of three skeletal age estimation

methods. All three methods were chosen based on analyses of the JPAC/CIL identified

sample. Recommendations for possible method modifications or further research based

on results from this preliminary interobserver error study will be presented in the

following chapter.

Figure 56. Frequency of sutures scored as obliterated: sample F. [Bar chart: x-axis, Suture (None, IN, PMP, TP, TPinGPF, AMP); y-axis, Count (0-40).]

CHAPTER VIII

DISCUSSION

This chapter discusses the results from both the retrospective and

interobserver error studies. Generally, all methods perform well for the JPAC/CIL

identified sample, although there are some exceptions to this rule. An overview of

method performance, error, and experience as correlated to error is given below.

General observations of skeletal age estimation for this sample and limitations and biases

of both studies will also be outlined.

Method Performance

It was hypothesized that age estimation methods would perform well in the

JPAC/CIL identified sample because of the use of McKern and Stewart (1957) for age

estimation based on epiphyseal fusion and the pubic symphysis and the young

composition of the sample. The results from the retrospective and interobserver error

studies suggest that age estimation methods are, for the most part, performing well for

individuals in the JPAC/CIL sample. Each method is discussed in further detail below,

followed by a brief comparison of all methods.

Epiphyseal fusion methods perform very well at the CIL, with close to 100%

correct classification for all of the long bone epiphyses. Long bones are usually scored

using McKern and Stewart (1957), although Scheuer and Black (2000) is also


occasionally employed. The data for long bone epiphyses are not especially useful since

most individuals in the sample are listed as “all epiphyses fused,” indicating adult age.

While this is helpful in determining a minimum age, late-fusing epiphyses are more

informative for age estimation, especially in young adults.

Late-fusing epiphyses, such as the iliac crest, medial clavicle, vertebral centra,

and the first two sacral segments, have high potential for age estimation in the JPAC/CIL

sample because they all generally fuse in one’s 20s. When examining correct and

incorrect classifications, the Webb-Suchey medial clavicle and iliac crest and the Albert-

Maples vertebral centra methods perform very well; all three epiphyses exhibit close to

100% correct classification. For the McKern-Stewart method, the iliac crest also has

100% correct classification, but the medial clavicle and vertebral centra drop below 90%.

The age intervals provided for stages of fusion of the vertebral centra are similar between

methods, while the Webb-Suchey clavicle method has much larger intervals per stage of

fusion than the McKern-Stewart clavicle method. Larger age intervals can contribute to a

higher percentage of correct classification.
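The effect of interval width on correct classification can be illustrated with a small Python sketch. All values below are hypothetical and are not taken from the Webb-Suchey or McKern-Stewart methods; the sketch only shows that the same set of known ages scores higher against a wider interval.

```python
def percent_within(ages, low, high):
    """Percentage of ages that fall inside the inclusive interval [low, high]."""
    inside = sum(1 for age in ages if low <= age <= high)
    return 100.0 * inside / len(ages)


# Hypothetical known ages for individuals all scored at the same fusion stage.
known_ages = [19, 20, 21, 22, 23, 24, 26, 28]

narrow = percent_within(known_ages, 20, 23)  # hypothetical narrow interval
wide = percent_within(known_ages, 18, 30)    # hypothetical wide interval
print(narrow, wide)  # 50.0 100.0
```

The wider interval classifies every case correctly but says less about any one individual, which is the trade-off between accuracy and precision.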

Fusion of the first two sacral segments has the lowest correct classification of

all methods employed at the CIL (32.1%). Very few of the individuals that were scored

as stages zero through three fall into the age intervals provided by McKern and Stewart

(1957) for fusion of these two elements (Figure 23). Additionally, the pattern of fusion is

sporadic, with an absence of fusion seen in an individual 30 years old and complete

fusion seen as young as 26 years in this sample. It is recommended that S1-S2 fusion no

longer be used in age estimation, unless new age intervals are devised for the stages or

further research is conducted.


The Mann et al. maxillary suture method had a surprisingly high correct

classification rate (88.7%) considering the general skepticism surrounding the use of

cranial sutures in age estimation (Brooks 1955; Masset 1989). This is in agreement with

Ginter (2005), who suggested that the revised maxillary suture method (Mann et al. 1991)

was more effective at age estimation than more commonly used methods, like the pubic

symphysis and the sternal rib ends. Most incorrect classifications occurred for individuals

under the age of 25. Age estimates given as five-year intervals have negative bias values,

indicating that they tend to underage. All other intervals have positive bias values. There

does not appear to be a clear correlation between error values and estimated age interval

based on maxillary suture obliteration. There is a significant linear relationship between

estimated and known age-at-death (Table 6, r=0.79) for the Mann et al. maxillary suture

method.

Even with low overall bias, inaccuracy, and SEI, a high correct classification

rate, and a good correlation between estimated and known age-at-death, the method

appears to be difficult to apply. Age estimates were difficult to compare between

analysts, a problem that was found both in the retrospective and interobserver error

studies. Even when the same sutures were reported as obliterated, the age intervals

produced by analysts were not always comparable. Using sample F, for which 35 of 36

participants scored the incisive suture as fused, only 11 participants gave the same age

interval (20-25 years). The discrepancies in reporting are probably due to the use of

different tables and figures from the reference article. About half of the participants had

never used the method before, and most of those who had did not use it on a regular basis,

indicating that many of the anthropologists surveyed were not familiar or comfortable

with the Mann et al. maxillary suture method and that the method is not easy to apply.

Another problem in application of this method may be the definition of suture

obliteration. The revised method, based on visual observation of obliteration, does not

clearly define suture obliteration. In fact, the senior author explained that any small

amount of obliteration along the suture counts as obliteration of that suture (personal

communication, Robert Mann 2008), but this is not detailed in the reference article.

Standardization of the method is therefore recommended to include a clear explanation of

what constitutes suture obliteration (e.g., partial or complete), defined age intervals for each

stage of suture obliteration, and summary statistics associated with each interval.

Both dental formation methods (Moorrees et al. 1963; Mincer et al. 1993)

include terminal stages, analogous to complete union in epiphyseal fusion methods.

When these stages are eliminated from analyses, there is a drastic reduction in the

number of correct classifications for both methods. For the Moorrees et al. method,

correct classification occurred for only 51.9% of individuals not classified as “apices

complete” and all incorrect classifications occurred for incomplete stages of root

formation, with the majority taking place in the R1/2 to R3/4 stages (Appendix C). For the

Mincer et al. method, correct classification was 74.1% when the terminal stage (H) was

eliminated from calculations. No incorrect classifications were recorded for stage H, and

the majority occurred in stage G, which directly precedes H in the Demirjian et al. (1973)

system.

Error values as indicated by bias, inaccuracy, and SEI are also very high when

terminal stages are included in analyses. This is to be expected because the terminal


stages offer only a minimum age. Using the mean for this stage to calculate error values

for all individuals results in large error values because older individuals are further away

from the stage mean. Eliminating terminal stage data points decreases the error associated

with both methods. It should also be noted that the JPAC/CIL sample is not a random

sample because of military enlistment requirements. There are no individuals younger

than 17 years old in the sample and this inherent bias accounts for the tendency of both

dental formation methods to underage. Both the Moorrees et al. (1963) and Mincer et al.

(1993) methods were developed using samples of mainly young children and therefore

have mean ages at attainment for each stage of root development that are generally

younger than most individuals in the JPAC/CIL sample.
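The effect of assigning a terminal-stage mean to adults of varying ages can be sketched in Python. The sketch assumes the conventional definitions of bias (mean signed error) and inaccuracy (mean absolute error), which may differ from the formulas defined earlier in this thesis; the stage mean and known ages are hypothetical.

```python
def bias(estimates, known):
    """Mean signed error (estimated minus known age); assumed definition."""
    return sum(e - k for e, k in zip(estimates, known)) / len(known)


def inaccuracy(estimates, known):
    """Mean absolute error between estimated and known age; assumed definition."""
    return sum(abs(e - k) for e, k in zip(estimates, known)) / len(known)


# Hypothetical terminal-stage mean age and hypothetical known ages of adults
# who were all assigned that terminal stage.
TERMINAL_STAGE_MEAN = 20.0
known_ages = [19.0, 22.0, 25.0, 30.0, 38.0]
estimates = [TERMINAL_STAGE_MEAN] * len(known_ages)

print(bias(estimates, known_ages))        # negative: the stage mean underages
print(inaccuracy(estimates, known_ages))  # grows as known ages exceed the mean
```

Because every individual receives the same estimate, each year of age beyond the stage mean adds directly to both error values, which is why older individuals inflate the error for terminal stages.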

The results from both dental formation methods suggest that these methods

are not appropriate for age estimation using terminal categories, except as a minimum

age. The number of incorrect classifications for stages of incomplete formation also

suggests that dental formation methods are not performing well for late adolescents and

young adults. There is also the possibility that analysts are making determinations based

on visual inspection of teeth no longer in the alveoli (personal communication, Kevin

Torske 2008). This could severely bias the age estimation because a root that appears

incompletely formed may in fact be damaged. Additionally, confusion between assigning

stages of complete root formation (Rc) versus complete apical closure (Ac) cannot be

ruled out, especially considering the small sample sizes for stages of incomplete

formation.

The Suchey-Brooks pubic symphysis method had the highest correct

classification rate of all three pubic symphysis methods, while the Todd method had the


lowest bias, inaccuracy, and SEI. The Todd method also had the smallest sample size and

is therefore not truly comparable to the Suchey-Brooks and McKern-Stewart pubic

symphysis method samples. The Todd method only had a 70% correct classification rate

and a larger sample would be beneficial to fully understand the error associated with this

method. It is important to point out that the Todd method is not included as an approved

age estimation method in SOP 3.4.

The rationale of using McKern-Stewart for males that died before 1960 is that

the method was developed on a sample of identified males from the Korean War. The

Suchey-Brooks method was developed using a more recent sample and should better

represent those individuals who died after 1960. When comparing the two methods using

five-year intervals (Table 22), both perform equally well. The most notable differences

occur in the 41+ age category, which is not unexpected because of the tendency of

skeletal age estimation methods to perform poorly for older individuals (see Meindl and

Lovejoy 1989; Schmitt 2004). Another interesting factor is the juxtaposition of bias

directionality per age interval; where one method overages, the other underages, and vice

versa. This is true for all age intervals except the last two (36-40, 41+).

The Suchey-Brooks method has a higher overall correct classification rate

than the McKern-Stewart method. When the known ages-at-death of individuals are

superimposed over the age intervals given by each method, it is clear that this difference

is attributable to the large age intervals per phase of the Suchey-Brooks method; the only

incorrect classification occurred in phase one. The Suchey-Brooks method is very

accurate, but certainly not as precise as the McKern-Stewart method. This trend was

noted by Saunders et al. (1992) in a test of multiple age estimation methods. While the

overall accuracy of the McKern-Stewart method is not as high as the Suchey-Brooks

method, it is also not as poor as other methods currently in use. However, Figure 28

demonstrates that a possible combination of certain composite score groups may be

warranted, specifically those in the lower end of the spectrum (e.g., 1-2, 3, 4-5).

Concerning Suchey-Brooks, the highest bias, inaccuracy, and SEI values were observed

in phase five, which also has the largest age range of the five observed phases¹.
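
The trade-off between accuracy and precision discussed above can be illustrated with a minimal sketch. The known ages and age intervals below are hypothetical values chosen only for illustration; they are not data from the JPAC/CIL identified sample, nor the published intervals of either pubic symphysis method. The function name `summarize` is likewise an assumption introduced here.

```python
# Hypothetical illustration: wide intervals raise the correct classification
# rate (accuracy) at the cost of interval width (precision).

def summarize(cases):
    """cases: list of (known_age, interval_low, interval_high) tuples."""
    n = len(cases)
    # An individual is correctly classified when the known age falls
    # inside the interval assigned by the method.
    correct = sum(1 for age, low, high in cases if low <= age <= high)
    mean_width = sum(high - low for _, low, high in cases) / n
    return correct / n, mean_width

# The same four hypothetical individuals, aged once with a wide-interval
# method and once with a narrow-interval method (all values invented).
wide = [(19, 15, 25), (23, 18, 35), (27, 20, 45), (34, 22, 55)]
narrow = [(19, 18, 20), (23, 20, 22), (27, 25, 28), (34, 29, 33)]

wide_rate, wide_width = summarize(wide)
narrow_rate, narrow_width = summarize(narrow)
print(f"wide:   rate={wide_rate:.2f}, mean width={wide_width:.1f} years")
print(f"narrow: rate={narrow_rate:.2f}, mean width={narrow_width:.1f} years")
```

In this invented example the wide intervals classify every individual correctly but span more than twenty years on average, whereas the narrow intervals are far more precise and misclassify half of the individuals.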

Given similar results for both the McKern-Stewart and Suchey-Brooks pubic

symphysis methods and the assumption that JPAC/CIL analysts have been applying age

estimation methods according to SOP 3.4, the continued use of both methods is

supported. It was not possible to determine how well either of the methods performs for

older individuals because of the absence of these individuals in the JPAC/CIL identified

sample. Both methods are acceptable for the age of individuals that make up this sample.

The different auricular surface methods are difficult to compare because the

sample size for the Buckberry-Chamberlain method was so small. It is very apparent that

the age intervals provided in the original Lovejoy et al. auricular surface method are far

too narrow, which is why Osborne et al. (2004) published revised statistics and combined

several of the original phases. Correct classification for the Lovejoy et al. method is low

even when multiple phases are assigned by analysts. It is preferable to use multiple

phases when employing the Lovejoy et al. (1985) auricular surface method because of the

very small age intervals given for this method. When multiple phase assignments are

removed from analyses, correct classification is just above 50%. The Lovejoy et al.

auricular surface method is not an accurate age estimation method for the JPAC/CIL

sample.

¹ This does not include phase six since this phase was never recorded by CIL analysts.

Applying the Osborne et al. phases and statistics to the same individuals

drastically increased the correct classification rate for this sample. Similar to the Suchey-

Brooks method, the age intervals provided in Osborne et al. (2004) are very large and

equally imprecise. Also similar to the Suchey-Brooks method, the only incorrect

classification using the Osborne et al. statistics occurred in phase one. Phases three and

four had a tendency to overage. All individuals classified into these two phases had

known ages-at-death less than the mean for the phase, which is most likely related to the

age distribution of the sample (Figure 10). There are no individuals with an age-at-death

greater than 42, and the mean age for both phases three and four is greater than or equal

to 42. This trend is also reflected in the error values for each phase of the Osborne et al.

method, which increase with each subsequent phase.

The Buckberry-Chamberlain revised method is the preferred method for

auricular surface age estimation at the JPAC/CIL, except for very young individuals and

partial auricular surfaces. Unfortunately, the method was only used ten times for the

JPAC/CIL identified sample. Of the ten individuals aged with this method, all were

correctly classified. The Buckberry-Chamberlain auricular surface method was also the

method with the highest bias, inaccuracy, and SEI values when compared to all methods

employed at the JPAC/CIL². Figure 40 demonstrates that the age intervals associated

with each stage are again very large, explaining the 100% correct classification yet high

error values. Only stages one, two, and five were observed for this sample. Both stages

two and five overaged individuals.

² This comparison is made excluding the terminal stages of dental formation methods.

The interobserver test of the Buckberry-Chamberlain method indicated that

analysts were correctly assigning age point estimates and intervals based on composite

scores, but that the problem may actually lie in application of the method. Almost half of

the participants had never used the revised auricular surface method before, although

some people indicated that they were familiar with the descriptive categories of the

Lovejoy et al. method. There was fairly large variation in stage assignment for both

samples, which is problematic because the age intervals are already so large that spanning

multiple stages gives very imprecise age estimates. While the Buckberry-Chamberlain

method has been described as easier to apply than the original method (Mulhern and

Jones 2005; Falys et al. 2006), this was not supported by this study. This may be due to

the fact that analysts were not familiar or comfortable with the revised method or the

design of the study (discussed further below). Further research needs to be conducted

since this method is listed as the primary method for auricular surface age estimation at

the JPAC/CIL. Overall, the results from the retrospective and interobserver error studies

do not strongly support the auricular surface as a good age indicator.

The sternal rib end method as developed by Iscan and colleagues did not have

a good correct classification rate when the individuals in the JPAC/CIL sample were

assigned to the 95% confidence intervals from the original study (Iscan et al. 1984b).

These age intervals are not the same age intervals that accompany the sternal rib end

casts. Both the age intervals published with the original method and those that

accompany the sternal rib end cast set are far too narrow. Additionally, assigning the

Nawrocki (n.d.) prediction intervals, which are much larger than the Iscan et al. (1984b)

intervals, dramatically increased the number of correct classifications. The only incorrect

classification with these prediction intervals was in phase one. The Iscan et al. sternal rib

end method also has the lowest correlation between known and estimated age-at-death for

all methods listed in Table 6.

Error values for the sternal rib end method are not extremely large because the

known ages-at-death of individuals aged with this method are generally very young and

do not differ greatly from the mean age given per phase. This indicates that the method is

performing adequately for younger individuals when the mean age point estimate is used,

although a larger sample size would be desirable to fully investigate this phenomenon.

No conclusions can be drawn about the performance of this method for older individuals

since no one was assigned to phase five or greater in the JPAC/CIL identified sample.

Results from the preliminary interobserver error study show that analysts are

generally familiar and moderately comfortable with the sternal rib end method and that

almost 50% of participants use it on a regular basis. The availability of casts makes this

method easier to apply than other methods, such as the auricular surface. The variation in

phase assignment for both samples was relatively low. Given the problems with very

small age intervals and the results for sample C, further research is warranted to

determine if phases of the sternal rib end method could possibly be condensed. This was

noted by one participant in the study and reflected by the multiple phase assignments

given by some JPAC/CIL analysts. The sternal rib end method would also benefit from

testing on larger, more varied samples.

Of all groups of methods, the auricular surface performs the poorest as an age

indicator in the JPAC/CIL identified sample based on all error values. The pubic

symphysis methods are the only methods that do not require modifications or further

research, even though smaller age intervals would always be welcome. All methods listed

in Table 6 had a significant linear relationship between estimated and known age-at-

death; this table does not include epiphyseal fusion or dental formation methods.

Error

If methods were developed from correctly balanced samples (Nawrocki 1998)

and are being applied correctly, error should be normally distributed. This error is

represented here by calculations of bias for each method. Additionally, there should be

no significant differences in bias, inaccuracy, or SEI between phases of a single method.

Differences in phases would indicate that assigning an individual to one phase over

another could increase the error in the age estimation. Finally, interobserver error should

also be low, i.e., age estimates produced by different observers should be similar.
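
As a minimal sketch of the quantities discussed in this section, the block below computes bias and inaccuracy under the conventional definitions (mean signed error and mean absolute error, respectively). These definitions are assumed here rather than restated from the methods chapter, and the SEI is deliberately omitted because its formula is not given in this chapter. The ages are hypothetical.

```python
from statistics import mean

def bias(estimated, known):
    """Mean signed error; positive values indicate overaging."""
    return mean(e - k for e, k in zip(estimated, known))

def inaccuracy(estimated, known):
    """Mean absolute error in years, regardless of direction."""
    return mean(abs(e - k) for e, k in zip(estimated, known))

# Hypothetical point estimates and known ages-at-death (years).
estimated = [22.0, 25.0, 30.0, 35.0]
known = [20.0, 27.0, 26.0, 36.0]

print(bias(estimated, known))        # signed errors: +2, -2, +4, -1
print(inaccuracy(estimated, known))  # absolute errors: 2, 2, 4, 1
```

Because signed errors cancel, a method can show bias close to zero while inaccuracy remains substantial, which is why the two values are reported together.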

No error values were calculated for epiphyseal fusion methods. Age

estimations based on complete stages of union produce high error values because full

union is observed in all skeletally mature adults. The generally high correct classification

rates for the methods employed at the JPAC/CIL and the small sample sizes for stages of

incomplete fusion did not support the use of statistical tests of significance to examine

error.

The distribution of bias for the Mann et al. maxillary suture method was

normal. No tests of significance between age intervals were conducted due to

discrepancies in reporting these intervals and small sample sizes for all but one of the

intervals. Bias, inaccuracy, and SEI were calculated using the midpoint of the reported

age interval. Larger age intervals may have larger error values because individuals

correctly classified in a large interval could fall further away from the midpoint than

those in a small interval. The highest bias, inaccuracy, and SEI values do occur in the

larger age intervals for this method (Table 10). However, two five-year age intervals (15-

20, 25-30) have error values that are comparable to and, in some cases, greater than larger

age intervals. Therefore, the large age intervals do not fully explain the error between

reported age intervals. Larger sample sizes for each age interval would allow for

statistical testing and perhaps elucidate the problem. This method has the potential to be

very accurate and precise, especially for younger age groups.
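
The effect of interval width on midpoint-based error can be sketched as follows. The intervals and known ages are hypothetical and serve only to show why a correctly classified individual in a wide interval can carry more error than one in a narrow interval; the function name `midpoint_error` is an assumption introduced for this illustration.

```python
def midpoint_error(known_age, low, high):
    """Signed error when the interval midpoint is used as the point estimate."""
    midpoint = (low + high) / 2
    return midpoint - known_age

# Two hypothetical individuals, both correctly classified
# (each known age lies inside its reported interval).
narrow_case = midpoint_error(known_age=16, low=15, high=20)  # midpoint 17.5
wide_case = midpoint_error(known_age=22, low=20, high=50)    # midpoint 35.0

print(narrow_case)  # 1.5
print(wide_case)    # 13.0
```

Both individuals count as correct classifications, yet the one in the wide interval contributes far more to bias and inaccuracy because the known age can lie much further from the midpoint.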

Distributions of bias for the dental formation methods were not created

because of the small sample sizes for non-terminal stages. Dental formation methods

suffer from the same analytical problems as epiphyseal fusion methods because complete

apex closure indicates dentally mature adults, who can be significantly older than the

given stage mean. This creates difficulty in establishing a stage mean, which is required

for statistical analysis. Additionally, when terminal stages are removed from all error

calculations, error drastically decreases for both methods (Table 12). It is apparent that

the high error values beforehand are due to the inclusion of the terminal stages.

To see if different teeth or roots are more prone to error, Student’s t-tests and

ANOVA were run including terminal stages for both methods. The Moorrees et al. dental

formation method showed no significant difference in bias, inaccuracy, or SEI between

teeth 17 and 32. There was a significant difference in bias and inaccuracy between mesial

and distal roots, with mesial roots exhibiting higher error than distal roots. This indicates

that age estimation based on the mesial roots may not be as accurate as the distal roots,

although the difference is only approximately one year. The Mincer et al. third molar

formation method does not distinguish between different tooth roots. There was no

significant difference in error between all four third molars using this method.

The distribution of bias for the Todd pubic symphysis method is not normal,

but this is most likely related to the small sample size. Bias, inaccuracy, and SEI are low

for all individuals who were correctly classified. Error values were the highest for

individuals who were incorrectly classified, since these individuals fall further away from

the interval midpoint than those individuals who were correctly classified. No tests of

significance were conducted because of the small sample size for phases of the Todd

method.

The distribution of bias for the McKern-Stewart pubic symphysis method is

fairly normal, with a slightly larger number of negative bias values. Statistical tests of

significance were run to compare possible differences in bias, inaccuracy, and SEI

between composite score groups 6-7, 8-9, and 11-12-13. These three groups were the

only groups with large enough sample sizes. ANOVA with the Bonferroni correction

revealed that composite score group 8-9 was problematic; it had a higher negative bias

than all other groups. Inaccuracy was not significantly different, but was higher than

lower composite score groups. There was no significant difference in mean SEI between

the three groups compared. The highest SEI values occurred in the last two composite

score groups (14 and 15). This can be explained by the incorrect classification of the one

individual assigned to 14 and the large age interval associated with 15 (36+). Larger

sample sizes for all score groups would greatly enhance between-group analyses.
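
For readers unfamiliar with the Bonferroni correction used in the between-group comparisons, a minimal sketch follows. It assumes the common formulation in which each raw p-value from a pairwise comparison is multiplied by the number of comparisons and capped at 1; the statistical software used for this thesis may implement the adjustment differently. The p-values and group labels are invented for illustration and are not results from this study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-value, significant?) for each raw p-value."""
    m = len(p_values)  # number of comparisons being made
    return [(min(p * m, 1.0), p * m <= alpha) for p in p_values]

# Hypothetical raw p-values for three pairwise comparisons among
# three groups (labels and values invented for illustration).
raw = {"A vs B": 0.010, "A vs C": 0.400, "B vs C": 0.030}

results = bonferroni(list(raw.values()))
for (label, p), (adjusted, significant) in zip(raw.items(), results):
    print(f"{label}: raw p={p:.3f}, adjusted p={adjusted:.3f}, "
          f"significant={significant}")
```

With three comparisons, a raw p-value of 0.030 would pass an uncorrected 0.05 threshold but not the corrected one, which shows how the correction guards against spurious differences when several groups are compared at once.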

The distribution of bias for the Suchey-Brooks pubic symphysis method is

also relatively normal, albeit slightly skewed to the right, indicating slightly

higher positive bias values. ANOVA with the Bonferroni correction run between the first

four phases indicated that phase one was significantly different in bias and inaccuracy

from phases two through four. This is logical because phase one is the only phase with

negative bias. It also has an average error in years (inaccuracy) that is greater than those

of phases two and three, but less than phase four. Additionally, phase one is the only

phase in which incorrect classification occurred for this method. There is no significant

difference in SEI between the first four phases and SEI is relatively high overall. This is

most likely related to the large confidence intervals associated with phases of the Suchey-

Brooks method, which are accurate but not precise.

The Suchey-Brooks method, while being largely supported as the most

reliable and widely employed skeletal age estimation method, has not been subjected to

analysis of interobserver error. The analyst using the Suchey-Brooks system thus assumes

that he or she is scoring a given pubis in the same manner that the method developers

would and that the method works as published (personal communication, John Byrd

2009). By reporting intervals that span two standard deviations from the mean, Suchey

and colleagues have created sufficiently large intervals to account for error associated

with the assigned phase. However, there is no discussion of error in phase assignment,

specifically by those that are less experienced in age estimation from the pubic

symphysis.

Distribution of bias for the Lovejoy et al. auricular surface method shows two

very large peaks on the positive side of the graph. The bias is not normally distributed

around zero, which is consistent with the majority of positive bias values for phases of

this method. ANOVA with the Bonferroni correction indicates that phase three is

problematic for this method. However, bias, inaccuracy, and SEI are similar for phases

four and five, which did not have large enough sample sizes for tests of

statistical significance. Even with small age intervals, the Lovejoy et al. method has large

SEI values, indicating that the problem is with the distribution of ages within the original

intervals. This method is precise, but not at all accurate.

The distribution of bias for the Osborne et al. auricular surface method is not

normal. There are numerous high peaks of negative bias values and the distribution is

skewed to the right. The only two phases with large enough sample sizes for statistical

comparisons were phases one and two. Student’s t-tests comparing bias, inaccuracy, and

SEI between these phases revealed that there were significant differences in bias and

inaccuracy, but not SEI. Error values increase for each consecutive phase of this method

as the distribution of known-aged individuals becomes less well centered around the

phase means. The Osborne et al. method has high SEI values because of the large age

intervals that are associated with each phase. The high level of correct classifications of

individuals aged using this method accompanied by high SEI values indicates that this

method is accurate but not precise.

No distribution of bias was created for the Buckberry-Chamberlain revised

auricular surface method because of the small sample size and the paucity of recorded

stages in the JPAC/CIL identified sample. No tests of significance were conducted either.

This method has low bias and inaccuracy values for all stages, with the exception of stage

five. However, the SEI values are the highest of all methods employed at the JPAC/CIL.

This method is clearly accurate but not at all precise. The interobserver error study

revealed that analysts may not be correctly applying the method due to large variation in

stage assignment, which is problematic considering the already large age intervals it

predicts.

The distribution of bias for the Iscan et al. sternal rib end method was not

normal, which is related to the small sample size available for analysis once

multiple phase designations were eliminated. No statistical tests of significance were

conducted because of small sample sizes for the four phases observed in the JPAC/CIL

identified sample. Bias and inaccuracy were very low for the first four phases, with the

exception of a high negative bias value for phase one. This is due to the incorrect

classification of one individual with a known age-at-death of 24 years. This also

influenced the SEI for this phase. These discrepancies would not be so dramatic if the

sample sizes per phase were larger. Using the Nawrocki prediction intervals for the first

four phases produced both accurate and precise age estimations. The interobserver error

study also indicated that phase assignment between individuals was relatively consistent.

The poor performance of the auricular surface methods is again confirmed by

the error values discussed above. All other methods perform better in general than the

auricular surface methods, though not without their own caveats. It is important to note

that high SEI values can be related to large confidence intervals, poorly constructed

confidence intervals from the reference method, and incorrect classifications. It is

therefore important to analyze this index in conjunction with other measures of method

performance and error.

Analyst Experience

As Maples (1989:323) suggested, successful age estimation comes from

analysts “with a long and wide experience with a variety of techniques.” It was

hypothesized that the SEI would be dependent on experience. Therefore, individuals with

more experience in skeletal aging, measured in this study by highest degree obtained

(doctorate) and number of years of experience with skeletal aging (10+), should have

lower average SEIs for measured samples than those individuals with less experience.

For the Buckberry and Chamberlain (2002) revised auricular surface method,

those individuals with a doctorate actually had the highest mean SEI for sample A, but

the lowest mean SEI for sample B. The same trend was observed for the 10+ years of

experience group. Because the sample sizes for individuals with a doctorate and 10+

years of experience were so small, no statistical tests of significance were possible

between this and other degree-level groups. No significant difference in SEI was present

between individuals with bachelor’s and master’s degrees.

Participants were not highly confident that their observations corresponded to

the correct composite score and the self-reported level of comfort was low for this

method. Half of the participants had never used the Buckberry-Chamberlain revised

auricular surface method before this study, which includes four of the five individuals

with doctorates and four of the seven individuals with 10+ years of experience. Lack of

familiarity with the method and the small sample size of individuals with doctorates or

10+ years of experience are most likely affecting the mean SEI for both groups, rendering

the comparison between experience levels of little utility.

For the Iscan et al. (1984b) sternal rib end method, individuals with a doctoral

degree had a much lower average SEI than other groups for sample C, but not for sample

D. Those individuals with greater than ten years of experience in skeletal aging had

markedly lower average SEIs for both samples. There was no significant difference in

SEI between individuals with bachelor’s and master’s degrees. The results for the Iscan et

al. sternal rib end method, especially for the average SEI based on years of experience in

skeletal aging, support the hypothesis that individuals with more experience will

generally have lower error associated with their age estimations. Additionally,

participants were fairly confident that their observations corresponded to the correct

phase and the self-reported level of comfort for this method was average. Of the three

methods tested, participants were most familiar with the Iscan et al. sternal rib end

method and all individuals with the highest levels of experience had used this method

before.

It was not possible to calculate a SEI for either sample tested using the Mann

et al. maxillary suture method because of a general lack of similarity between participants

in reporting age estimates. Similar to the Buckberry-Chamberlain method, only half of

the participants had used this method prior to this study and, of these individuals, very

few use it on a regular basis. Self-reported level of comfort with the method is low, but

confidence levels both for observations of obliteration and interpretation of age intervals

were similar to those reported for the sternal rib end method. This is interesting,

especially considering the large number of combinations of suture obliteration and age

intervals recorded for sample E. Because individuals had so little experience with this

method, analyzing error based on experience would probably produce similar results to

the SEI for the Buckberry-Chamberlain method.

Two of the three methods (Buckberry-Chamberlain and Mann et al.)

examined in the interobserver error study were chosen because it was believed that they

were not as well-known or understood as other more commonly used skeletal age

estimation methods. Therefore, the results from this portion of the study are not highly

informative with regard to a possible correlation between error and experience. The exception to this is

the Iscan et al. method, which appears to support the hypothesis that those individuals

with more experience have lower average SEI than those individuals with less

experience. These results do suggest that experience with individual methods of age

estimation is more important than overall experience in skeletal age estimation or highest

degree held.

General Observations

Analysis of known age-at-death distributions for each sample indicated that

there were significant differences in mean age-at-death between sub-samples. Notably,

epiphyseal union and dental formation methods have lower mean ages-at-death than the

total identified known age-at-death sample and the pubic symphysis and auricular surface

methods have higher mean ages-at-death. This is related to the age indicators being

measured in each group of methods. The first group is concerned with processes of late

development and the second group with age-related changes in skeletally mature adults.

Therefore, the different distributions observed are to be expected.

Sources of error in age estimation methods at the JPAC/CIL include analyst

error and the methods themselves. Analyst error is random while method error is

systematic. Neither source of error can be completely removed for skeletal age

estimation, but it is important to estimate uncertainty in measurement to avoid overstating

the performance of a method.

SOP 3.4 stipulates the use of specific age estimation methods unless other methods are

approved by lab management. Approved age estimation methods at the JPAC/CIL include: McKern

and Stewart (1957) epiphyseal fusion, Scheuer and Black (2002) epiphyseal fusion,

Moorrees et al. (1963) dental formation, Mincer et al. (1993) dental formation, McKern

and Stewart (1957) pubic symphysis, Suchey-Brooks pubic symphysis, Buckberry and

Chamberlain (2002) revised auricular surface, Lovejoy et al. (1985b) auricular surface

with Osborne et al. (2004) age intervals, and Iscan et al. (1984b) sternal rib end. All other

age estimation methods require full documentation and citation. Methods analyzed here

that are not listed in SOP 3.4 include: Albert and Maples (1995) vertebral centra union,

Webb and Suchey (1985) medial clavicle and iliac crest union, Todd (1920, 1921) pubic

symphysis, and Mann et al. (1991) maxillary suture obliteration. With the exception of

the Todd pubic symphysis method, all of these methods perform well in the JPAC/CIL

sample and their addition to the laboratory manual should be seriously considered.

Limitations of the Thesis

All scientific research has its limitations and the studies presented here are no

exception. The first problem with measuring uncertainty in age estimation for the

JPAC/CIL identified sample is the reliance on written records. Since identified

individuals are returned to their families upon identification, no skeletal remains are

available for analysis. The records at the JPAC/CIL are complete in most cases, but data

collection follows the assumption that analysts are properly recording all analyses and

following SOPs. There is no possibility to confirm or refute their conclusions based on

skeletal material. Earlier records are problematic because there has not always been a

quality assurance program in place.

Another limitation concerns which elements are available for analysis. The sample sizes of

methods analyzed in this thesis are dependent upon recovery and preservation of age-

related skeletal indicators. Elements that are poorly preserved, such as the sternal rib end,

have a smaller sample size. Age distributions per method could therefore be dependent on

those elements present at the time of analysis.

Single-method analysis does not replicate the reality of age estimation.

Maples (1989) recommended the use of as many techniques as possible when

constructing individual age intervals. This practice uses all available elements to provide

an age estimate with as much age-related information as possible. Additionally, this

procedure is designed to ensure that the anthropologist is establishing an accurate and

precise interval. The goal of this study was not to analyze overall age estimations for each

individual in the JPAC/CIL identified sample, but to quantify error for each isolated

method.

The interobserver error study was conducted largely at a professional

conference, which does not replicate laboratory conditions. The study was run in a large

exhibit hall, with, as noted by one participant, poor lighting and many possible

distractions. Participants were likely to feel rushed since their main purpose at the

conference was not to complete this study. The table space was also cramped and not

conducive to extended analyses of the samples.

There also seemed to be considerable confusion with application of at least

two of the methods (Buckberry-Chamberlain revised auricular surface and Mann et al.

maxillary suture obliteration). Very few participants referenced the Iscan et al. (1984b)

article and instead used the age intervals provided with the cast set, although these are not

the same as the original article. It was difficult for participants to read and understand

reference methods with which they were not already familiar, even when the articles were

provided. This probably introduced significant error to the study, although a partial goal

of the study was to see how easily methods could be applied by individuals that were not

familiar with them.

Summary

This chapter discussed results obtained from the retrospective and

interobserver error studies conducted for this thesis. Analysis of method performance as

indicated by classification rates and error values indicated that age estimations generally

perform well for individuals in the JPAC/CIL identified sample. Exceptions to this were

outlined. General observations for age estimation at the JPAC/CIL and limitations of the

current research were also given. The following chapter will summarize this thesis and

provide ideas for future research.

CHAPTER IX

SUMMARY

The aim of this thesis was to examine and quantify measurement uncertainty

associated with skeletal age estimation methods employed at the JPAC/CIL. This study

was the first to span so many years and methods at a single institution. This chapter

summarizes the findings of this research and discusses future research to be conducted in

age estimation and measurement uncertainty.

Uncertainty in Skeletal Age Estimation

Both the retrospective and interobserver error studies presented in this thesis

have produced valuable data concerning age estimation. Error in age estimation is both

random and systematic, caused by operators and problems with the reference methods.

Both of these sources of error will never be completely eliminated, but can be controlled

for via quality assurance programs and continued testing of and research on age

estimation methods. Research in the reliability and accuracy of methods is crucial to

further development in the forensic sciences and will continue to drive inquiry in related

fields.

Age estimation methods perform well for the JPAC/CIL identified sample.

This is most likely related to the age composition of the sample, which is largely between

the ages of 18 and 30. The legacy of McKern and Stewart also continues to provide a

strong analytical framework for age estimations conducted at the JPAC/CIL. Certain

methods still present problems, such as the auricular surface and, to some extent, the

sternal rib end. Other methods should be eliminated from age estimation procedures,

specifically the fusion of the first two sacral segments. Finally, some methods produced

surprising results and merit further consideration, such as maxillary suture closure.

Future Research

There is a vast array of future research topics that have arisen as a result of

this thesis. It is important to note that this thesis is the first attempt at estimating

uncertainty of skeletal age estimation at the JPAC/CIL and as such represents the

beginning of quantifying error associated with these methods. Undoubtedly, with the

current focus on quality assurance and identification, all methods employed at the

JPAC/CIL will need to be subjected to uncertainty analysis. This includes methods for

estimating stature, sex, ancestry, and other portions of the biological profile.

Validation studies are at the forefront of forensic science research. They

represent the efforts of forensic scientists to understand the error associated with the

methods they use and to quantify the reliability and accuracy of these methods. Accreditation

and quality assurance are the future of forensic anthropology. Therefore, future research

related to methods of human identification must be prepared to detail uncertainty in

measurement in accordance with national and international standards, such as ISO/IEC

17025.

Future research using the records at the JPAC/CIL could examine the frequency of

elements recovered, including the percentage of age-related elements that are

recovered and analyzed. The preservation of these

elements should also be considered as a possible factor in age estimation, including the

choice of method or methods used. Element preservation and recovery may also have a

significant impact on measurement uncertainty.

One method that showed promise for accurate and precise age estimation was

the Mann et al. (1991) maxillary suture method. Most participants in the interobserver

error study expressed surprise when asked to use a suture closure method. The method

was chosen deliberately because it performed well in the JPAC/CIL identified sample

despite the disdain for suture closure methods in

age estimation. The method needs to be standardized and tested on more populations,

including documented age-at-death collections.

Improvements to other methods are also recommended. The McKern-Stewart

pubic symphysis method would benefit from a reanalysis of the age intervals associated

with each composite score group; it may be possible to combine several of the lower age

categories. Işcan and colleagues’ sternal rib end method could also benefit from the

combination of some phases, as well as validation studies on different populations and

varied age groups. The Buckberry-Chamberlain auricular surface method, while not

promising given the results in these studies, may suffer from a lack of familiarity.

Training in the method as proposed by its authors would be worthwhile, as would

continued testing on different populations.

A more expansive interobserver error study is also desirable. The study

presented here was preliminary and designed to initiate research concerning three

methods that presented problems in the JPAC/CIL identified known age-at-death sample.

Further studies should find a way to simplify the methods being analyzed, such as putting

all pertinent information and a step-by-step procedure onto small posters (personal

communication, Eric Bartelink 2009). Additionally, a quieter setting would be preferable,

as would testing only one method at a time to reduce confusion. One final

improvement would be to use samples of known age-at-death in order to examine method

reliability.
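
As an illustration of how agreement between two observers in such a study might be quantified, the following minimal sketch computes Cohen's kappa for paired phase scores. Kappa is used here as an assumption, one common agreement statistic, and is not necessarily the statistic used in this thesis; the function name and the scores are hypothetical.

```python
from collections import Counter


def cohens_kappa(scores_a, scores_b):
    """Chance-corrected agreement between two observers scoring the same specimens."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(scores_a)
    # Proportion of specimens on which both observers assigned the same phase.
    observed = sum(a == b for a, b in zip(scores_a, scores_b)) / n
    # Agreement expected by chance, from each observer's marginal frequencies.
    freq_a = Counter(scores_a)
    freq_b = Counter(scores_b)
    expected = sum(freq_a[phase] * freq_b[phase] for phase in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical phase scores for six specimens, not data from this thesis.
observer_1 = [1, 2, 2, 3, 3, 4]
observer_2 = [1, 2, 3, 3, 3, 4]
kappa = cohens_kappa(observer_1, observer_2)
print(round(kappa, 3))
```

Agreement between observers speaks only to reliability; pairing the same scores with known ages-at-death would be needed to assess accuracy as well.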

Understanding the uncertainty associated with age estimation is crucial to

correct applications of these methods by forensic anthropologists, paleodemographers,

and bioarchaeologists. Quantifying the error of any scientific method allows for the

continued improvement and development of practices, procedures, and techniques related

to the method. Age estimation from adult skeletal remains will most likely always present

challenges to physical anthropologists as more reliable, precise, and accurate methods are

pursued. The emphasis now lies on elucidating the error associated with these methods.

REFERENCES CITED

Adams, Bradley J., and John E. Byrd

2002 Interobserver Variation of Selected Postcranial Skeletal Measurements. Journal of Forensic Sciences 47(6):1193-1202.

Albert, Arlene Midori, and William R. Maples

1995 Stages of Epiphyseal Union for Thoracic and Lumbar Vertebral Centra as a Method of Age Determination for Teenage and Young Adult Skeletons. Journal of Forensic Sciences 40(4):623-633.

American Society of Crime Laboratory Directors/Lab Accreditation Board

2007 Estimating Uncertainty of Measurement Policy. AL-PD-3008-Ver 2.0.

2008 Updated Uncertainty of Measurement Requirements. AL-PD-3033-Ver 1.0.

Arany, Szilvia, Mitsuyoshi Iino, and Naofumi Yoshioka

2004 Radiographic Survey of Third Molar Development in Relation to Chronological Age Among Japanese Juveniles. Journal of Forensic Sciences 49(3):1-5.

Aykroyd, Robert G., David Lucy, A. Mark Pollard, and Charlotte A. Roberts

1999 Nasty, Brutish, but Not Necessarily Short: A Reconsideration of the Statistical Methods Used to Calculate Age at Death from Adult Human Skeletal and Dental Age Indicators. American Antiquity 64(1):55-70.

Aykroyd, R. G., D. Lucy, A. M. Pollard, and T. Solheim

1997 Technical Note: Regression Analysis in Adult Age Estimation. American Journal of Physical Anthropology 104:259-265.

Baccino, Eric, Douglas H. Ubelaker, Lee-Ann C. Hayek, and A. Zerilli

1999 Evaluation of Seven Methods of Estimating Age at Death from Mature Human Skeletal Remains. Journal of Forensic Sciences 44(5):931-936.

Bass, William M.

2005 Human Osteology: A Laboratory and Field Manual. 5th Edition. Columbia, MO: Missouri Archaeological Society, Special Publication No. 2.

Bedford, M. E., K. F. Russell, and C. O. Lovejoy

1989 The Auricular Surface Aging Technique. 16 color photographs with descriptions. Kent, Ohio: Kent State University.

Bedford, M. E., K. F. Russell, C. O. Lovejoy, R. S. Meindl, S. W. Simpson, and P. L. Stuart-Macadam

1993 Test of the Multifactorial Aging Method Using Skeletons with Known Ages-at-Death from the Grant Collection. American Journal of Physical Anthropology 91(3):287-297.

Berg, Gregory E.

2008 Pubic Bone Age Estimation in Adult Women. Journal of Forensic Sciences 53(3):569-577.

Bernard, H. Russell

2002 Research Methods in Anthropology: Qualitative and Quantitative Approaches. Third Edition. Walnut Creek: AltaMira Press.

Blankenship, Jane A., Harry H. Mincer, Kenneth M. Anderson, Marjorie A. Woods, and Eddie L. Burton

2007 Third Molar Development in the Estimation of Chronologic Age in American Blacks as Compared with Whites. Journal of Forensic Sciences 52(2):428-433.

Bocquet-Appel, Jean-Pierre, and Claude Masset

1982 Farewell to Paleodemography. Journal of Human Evolution 11:321-333.

Brach, Raymond M., and Patrick F. Dunn

2004 Uncertainty Analysis in Forensic Science. Tucson, AZ: Lawyers and Judges Publishing Company, Inc.

Brooks, Sheilagh Thompson

1955 Skeletal Age at Death: the Reliability of Cranial and Pubic Age Indicators. American Journal of Physical Anthropology 13:567-597.

Brooks, S., and J. M. Suchey

1990 Skeletal Age Determination Based on the Os Pubis: a Comparison of the Acsádi-Nemeskéri and Suchey-Brooks Methods. Human Evolution 5(3):227-238.

Buckberry, J. L. and A. T. Chamberlain

2002 Age Estimation from the Auricular Surface of the Ilium: A Revised Method. American Journal of Physical Anthropology 119:231-239.

Buikstra, Jane E., and Douglas H. Ubelaker

1994 Standards for Data Collection from Human Skeletal Remains. Arkansas Archeological Survey Research Series, 44. Fayetteville: Arkansas Archeological Survey.

Byers, Steven N.

2008 Introduction to Forensic Anthropology: A Textbook. Boston: Allyn and Bacon.

Cardoso, Hugo F. V.

2008 Age Estimation of Adolescent and Young Adult Male and Female Skeletons II, Epiphyseal Union at the Upper Limb and Scapular Girdle in a Modern Portuguese Skeletal Sample. American Journal of Physical Anthropology 137:97-105.

Chaillet, Nils, and Arto Demirjian

2004 Dental Maturity in South France: A Comparison Between Demirjian’s Method and Polynomial Functions. Journal of Forensic Sciences 49(5):1-8.

Chaillet, Nils, Marjatta Nyström, Matti Kataja, and Arto Demirjian

2004 Dental Maturity Curves in Finnish Children: Demirjian’s Method Revisited and Polynomial Functions for Age Estimation. Journal of Forensic Sciences 49(6):1-8.

Chamberlain, Andrew T.

2006 Demography in Archaeology. New York: Cambridge University Press.

Christensen, Angi M.

2004 The Impact of Daubert: Implications for Testimony and Research in Forensic Anthropology (and the Use of Frontal Sinuses in Personal Identification). Journal of Forensic Sciences 49(3):1-4.

Demirjian, A., H. Goldstein, and J. M. Tanner

1973 A New System of Dental Age Assessment. Human Biology 45(2):211-227.

DiGangi, Elizabeth A., Jonathan D. Bethard, Erin H. Kimmerle, and Lyle W. Konigsberg

2009 A New Method for Estimating Age-at-Death from the First Rib. American Journal of Physical Anthropology 138(2):164-176.

Dirkmaat, Dennis C., Luis L. Cabo, Stephen D. Ousley, and Steven A. Symes

2008 New Perspectives in Forensic Anthropology. Yearbook of Physical Anthropology 51:33-52.

Edgar, Heather J. H.

2005 Prediction of Race Using Characteristics of Dental Morphology. Journal of Forensic Sciences 50(2):269-273.

Falys, Ceri G., Holger Schutkowski, and Darlene A. Weston

2006 Auricular Surface Ageing: Worse Than Expected? A Test of the Revised Method on a Documented Historic Skeletal Assemblage. American Journal of Physical Anthropology 130:508-513.

Gilbert, B. Miles, and Thomas W. McKern

1973 A Method for Aging the Female Os Pubis. American Journal of Physical Anthropology 38:31-38.

Ginter, Jaime K.

2005 A Test of the Effectiveness of the Revised Maxillary Suture Obliteration Method in Estimating Adult Age at Death. Journal of Forensic Sciences 50(6):1303-1309.

Grivas, Christopher R., and Debra A. Komar

2008 Kumho, Daubert, and the Nature of Scientific Inquiry: Implications for Forensic Anthropology. Journal of Forensic Sciences 53(4):771-776.

Gruspier, Kathy L., and Grant J. Mullen

1991 Maxillary Suture Obliteration: A Test of the Mann Method. Journal of Forensic Sciences 36(2):512-519.

Hanihara, Kazuro, and Takao Suzuki

1978 Estimation of Age from the Pubic Symphysis by Means of Multiple Regression Analysis. American Journal of Physical Anthropology 48:233-240.

Hoppa, Robert D.

2000 Population Variation in Osteological Aging Criteria: An Example from the Pubic Symphysis. American Journal of Physical Anthropology 111:185-191.

Igarashi, Yuriko, Kagumi Uesu, Tetsuaki Wakebe, and Eisaku Kanazawa

2005 New Method for Estimation of Adult Skeletal Age at Death From the Morphology of the Auricular Surface of the Ilium. American Journal of Physical Anthropology 128:324-339.

International Organization for Standardization

2004a Guide to the Expression of Uncertainty in Measurement (GUM)–Supplement 1: Numerical Methods for the Propagation of Distributions. DGUIDE 99998. Geneva, Switzerland.

2004b International Vocabulary of Basic and General Terms in Metrology (VIM). DGUIDE 99999. Geneva, Switzerland.

2005 General Requirements for the Competence of Testing and Calibration Laboratories. ISO/IEC 17025 International Standard. Geneva, Switzerland.

Işcan, Mehmet Yaşar

1989a Assessment of Age at Death in the Human Skeleton. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 5-18. Springfield: Charles C. Thomas.

1989b Research Strategies in Age Estimation: the Multiregional Approach. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 325-339. Springfield: Charles C. Thomas.

Işcan, M. Yaşar, and Susan R. Loth

1986a Determination of Age from the Sternal Rib End in White Males: A Test of the Phase Method. Journal of Forensic Sciences 31:122-132.

1986b Determination of Age from the Sternal Rib End in White Females: A Test of the Phase Method. Journal of Forensic Sciences 31:990-999.

Işcan, M. Yaşar, Susan R. Loth, and E. H. Scheuerman

1989 Assessment of Age from the Combined Use of the Sternal End of the Rib and Pubic Symphysis. Paper presented at the annual meeting of the American Academy of Forensic Sciences.

Işcan, M. Yaşar, Susan R. Loth, and Ronald K. Wright

1984a Metamorphosis at the Sternal Rib End: A New Method to Estimate Age at Death in White Males. American Journal of Physical Anthropology 65:147-156.

1984b Age Estimation from the Rib by Phase Analysis: White Males. Journal of Forensic Sciences 29(4):1094-1104.

1985 Age Estimation from the Rib by Phase Analysis: White Females. Journal of Forensic Sciences 30(3):853-863.

1987 Racial Variation in the Sternal Extremity of the Rib and Its Effect on Age Determination. Journal of Forensic Sciences 32(2):452-466.

Joint POW/MIA Accounting Command/Central Identification Laboratory

2008 JPAC Laboratory Manual. Part IV, SOP 4.0. Last revised 02 April 2008.

Katz, Darryl, and Judy Myers Suchey

1986 Age Determination of the Male Os Pubis. American Journal of Physical Anthropology 69(4):427-435.

Kerley, E. R.

1965 The Microscopic Determination of Age in Human Bone. American Journal of Physical Anthropology 23:149-163.

1970 Estimation of Skeletal Age: After about Age 30 Years. In Personal Identification in Mass Disasters. T.D. Stewart, ed. Pp. 57-70. Washington, DC: National Museum of Natural History.

Kerley, E.R., and D.H. Ubelaker

1978 Revisions in the Microscopic Method of Estimating Age at Death in Human Cortical Bone. American Journal of Physical Anthropology 49:545-546.

Komar, Debra A., and Jane E. Buikstra

2008 Forensic Anthropology: Contemporary Theory and Practice. New York: Oxford University Press.

Konigsberg, Lyle W., and Susan R. Frankenberg

1992 Estimation of Age Structure in Anthropological Demography. American Journal of Physical Anthropology 89:235-256.

2002 Deconstructing Death in Paleodemography. American Journal of Physical Anthropology 117(4):297-309.

Konigsberg, Lyle W., Susan R. Frankenberg, and Renee B. Walker

1994 Regress What on What? Paleodemographic Age Estimation as a Calibration Problem. In Integrating Archaeological Demography: Multidisciplinary Approaches to Prehistoric Populations. Richard R. Paine, ed. Pp. 64-88. Occasional Paper No. 24. Center for Archaeological Investigations, Southern Illinois University, Carbondale.

Krogman, W. M.

1939 A Guide to the Identification of Human Skeletal Material. FBI Law Enforcement Bulletin 8:1-29.

Kunos, Charles A., Scott W. Simpson, Katherine F. Russell, and Israel Hershkovitz

1999 First Rib Metamorphosis: Its Possible Utility for Human Age-at-Death Estimation. American Journal of Physical Anthropology 110:303-323.

Kutyla, Alicja K.

2008 The Sacral Auricular Surface and its Significance in Age Estimation. Paper presented at the annual meeting of the American Association of Physical Anthropologists. Columbus, OH. Annual Meeting Issue 2008: Supplement 46 (abstract).

Levin, Jack, and James Alan Fox

2007 Elementary Statistics in Social Research: The Essentials. Second Edition. Boston: Pearson Education, Inc.

Lovejoy, C. Owen, Richard S. Meindl, Robert P. Mensforth, and Thomas J. Barton

1985a Multifactorial Determination of Skeletal Age at Death: A Method and Blind Tests of Its Accuracy. American Journal of Physical Anthropology 68(1):1-14.

Lovejoy, C. Owen, Richard S. Meindl, Thomas R. Pryzbeck, and Robert P. Mensforth

1985b Chronological Metamorphosis of the Auricular Surface of the Ilium: A New Method for the Determination of Adult Skeletal Age at Death. American Journal of Physical Anthropology 68:15-28.

Loth, Susan R., and Mehmet Yaşar Işcan

1989 Morphological Assessment of Age in the Adult: the Thoracic Region. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 105-135. Springfield: Charles C. Thomas.

Lucy, D., R. G. Aykroyd, A. M. Pollard, and T. Solheim

1996 A Bayesian Approach to Adult Human Age Estimation from Dental Observations by Johanson’s Age Changes. Journal of Forensic Sciences 41(2):189-194.

Mann, Robert W., Richard L. Jantz, William M. Bass, and Patrick S. Willey

1991 Maxillary Suture Obliteration: A Visual Method for Estimating Skeletal Age. Journal of Forensic Sciences 36(3):781-791.

Mann, Robert W., Steven A. Symes, and William M. Bass

1987 Maxillary Suture Obliteration: Aging the Human Skeleton Based on Intact or Fragmentary Maxilla. Journal of Forensic Sciences 32(1):148-157.

Maples, William R.

1989 The Practical Application of Age-Estimation Techniques. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 319-324. Springfield: Charles C. Thomas.

Martrille, Laurent, Douglas H. Ubelaker, Cristina Cattaneo, Fabienne Seguret, Marie Tremblay, and Eric Baccino

2007 Comparison of Four Skeletal Methods for the Estimation of Age at Death on White and Black Adults. Journal of Forensic Sciences 52(2):302-307.

Masset, Claude

1989 Age Estimation on the Basis of Cranial Sutures. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 71-103. Springfield: Charles C. Thomas.

McKern, Thomas W.

1970 Estimation of Skeletal Age: From Puberty to About 30 Years of Age. In Personal Identification in Mass Disasters. T.D. Stewart, ed. Pp. 41-56. Washington, DC: National Museum of Natural History, Smithsonian Institution.

McKern, Thomas W., and T. D. Stewart

1957 Skeletal Age Changes in Young American Males. Technical Report EP-45. Natick, MA: Quartermaster Research and Development Command.

Meindl, Richard S., and C. Owen Lovejoy

1985 Ectocranial Suture Closure: A Revised Method for the Determination of Skeletal Age at Death Based on the Lateral-Anterior Sutures. American Journal of Physical Anthropology 68:57-66.

1989 Age Changes in the Pelvis: Implications for Paleodemography. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 137-168. Springfield: Charles C. Thomas.

Meindl, Richard S., C. Owen Lovejoy, Robert P. Mensforth, and Robert A. Walker

1985 A Revised Method of Age Determination Using the Os Pubis, With a Review and Tests of Accuracy of Other Current Methods of Pubic Symphyseal Aging. American Journal of Physical Anthropology 68:29-45.

Mensforth, Robert P.

1990 Paleodemography of the Carlston Annis (Bt-5) Late Archaic Skeletal Population. American Journal of Physical Anthropology 82:81-99.

Milner, George R., James W. Wood, and Jesper L. Boldsen

2008 Advances in Paleodemography. In Biological Anthropology of the Human Skeleton. Second Edition. M. Anne Katzenberg and Shelley R. Saunders, eds. Pp. 561-600. Hoboken: Wiley-Liss.

Mincer, Harry H., Edward F. Harris, and Hugh E. Berryman

1993 The A.B.F.O. Study of Third Molar Development and Its Use as an Estimator of Chronological Age. Journal of Forensic Sciences 38(2):379-390.

Moore-Jansen, Peer M., Stephen D. Ousley, and Richard L. Jantz

1994 Data Collection Procedures for Forensic Skeletal Material. 3rd Edition. Knoxville, TN: University of Tennessee Department of Anthropology, Report of Investigations No. 48.

Moorrees, Coenraad F.A., Elizabeth A. Fanning, and Edward E. Hunt, Jr.

1963 Age Variation of Formation Stages for Ten Permanent Teeth. Journal of Dental Research 42(6):1490-1502.

Mulhern, Dawn M., and Erica B. Jones

2005 Test of Revised Method of Age Estimation From the Auricular Surface of the Ilium. American Journal of Physical Anthropology 126:61-65.

Murray, Katherine A., and Tracy Murray

1991 A Test of the Auricular Surface Aging Technique. Journal of Forensic Sciences 36(4):1162-1169.

National Academy of Sciences

2009 Strengthening Forensic Science in the United States: A Path Forward. Washington, DC: National Academies Press.

Nawrocki, Stephen P.

N.d. Prediction Intervals for Estimates of Age at Death from the Sternal Extremity of the Rib. Manuscript in preparation (unpublished).

1998 Regression Formulae for Estimating Age at Death from Cranial Suture Closure. In Forensic Osteology: Advances in the Identification of Human Remains. Second Edition. Kathleen J. Reichs, ed. Pp. 276-292. Springfield: Charles C. Thomas Publisher, Ltd.

Nemeskéri, J., L. Harsányi, and G. Acsádi

1960 Methoden zur Diagnose des Lebensalters von Skelettfunden. Anthropol Anzeiger 24:70-95.

Neter, John, William Wasserman, and G.A. Whitmore

1988 Applied Statistics. Third Edition. Boston: Allyn and Bacon, Inc.

Osborne, Daniel

2000 Reconsidering the Auricular Surface as an Indicator of Age at Death. Masters Thesis, Department of Anthropology, Western Michigan University.

Osborne, Daniel L., Tal L. Simmons, and Stephen P. Nawrocki

2004 Reconsidering the Auricular Surface as an Indicator of Age at Death. Journal of Forensic Sciences 49(5):905-911.

Pyle, S. I., and N. L. Hoerr

1955 A Radiographic Standard of Reference for the Growing Knee. Second Edition. Springfield: Charles C. Thomas.

Rissech, Carme, George F. Estabrook, Eugenia Cunha, and Assumció Malgosa

2006 Using the Acetabulum to Estimate Age at Death of Adult Males. Journal of Forensic Sciences 51(2):213-229.

Ross, Ann H., and Lyle W. Konigsberg

2002 New Formulae for Estimating Stature in the Balkans. Journal of Forensic Sciences 47(1):165-167.

Rougé-Maillart, Clotilde, Norbert Telmon, Carme Rissech, Assumption Malgosa, and Daniel Rougé

2004 The Determination of Male Adult Age at Death by Central and Posterior Coxal Analysis – A Preliminary Study. Journal of Forensic Sciences 49(2):1-7.

Russell, Katherine F., Scott W. Simpson, Jeremy Genovese, Mary D. Kinkel, Richard S. Meindl, and C. Owen Lovejoy

1993 Independent Test of the Fourth Rib Aging Technique. American Journal of Physical Anthropology 92:53-62.

Saunders, Shelley, Carol DeVito, Ann Herring, Rebecca Southern, and Robert Hoppa

1993 Accuracy Tests of Tooth Formation Age Estimations for Human Skeletal Remains. American Journal of Physical Anthropology 92:173-188.

Saunders, S. R., C. Fitzgerald, T. Rogers, C. Dudar, and H. McKillop

1992 A Test of Several Methods of Skeletal Age Estimation Using a Documented Archaeological Sample. Canadian Society of Forensic Science Journal 25(2):97-118.

Schaefer, Maureen C., and Sue M. Black

2005 Comparison of Ages of Epiphyseal Union in North American and Bosnian Skeletal Material. Journal of Forensic Sciences 50(4):1-8.

Scheuer, Louise, and Sue Black

2000 Developmental Juvenile Osteology. San Diego, CA: Academic Press.

Schmitt, Aurore

2002 Estimation de l’âge au décès des sujets adultes à partir du squelette: des raisons d’espérer. Bulletins et Mémoires de la Société d’Anthropologie de Paris 14(1-2): 1-20.

2004 Age-at-Death Assessment Using the Os Pubis and the Auricular Surface of the Ilium: a Test on an Identified Asian Sample. International Journal of Osteoarchaeology 14:1-6.

Schmitt, Aurore, and Pascal Murail

2004 Is the first rib a reliable indicator of age at death assessment? Test of the method developed by Kunos et al (1999). Homo 54(3):207-214.

Schmitt, Aurore, Pascal Murail, Eugenia Cunha, and Daniel Rougé

2002 Variability of the Pattern of Aging on the Human Skeleton: Evidence from Bone Indicators and Implications on Age at Death Estimation. Journal of Forensic Sciences 47:1203-1205.

Sinha, A., and V. Gupta

1995 A Study on Estimation of Age from Pubic Symphysis. Forensic Science International 75:73-78.

Solari, Ana C., and Kenneth Abramovitch

2002 The Accuracy and Precision of Third Molar Development as an Indicator of Chronological Age in Hispanics. Journal of Forensic Sciences 47(3):531-535.

Steadman, Dawnie Wolfe, Bradley J. Adams, and Lyle W. Konigsberg

2006 Statistical Basis for Positive Identification in Forensic Anthropology. American Journal of Physical Anthropology 131:15-26.

Stevenson, Paul H.

1924 Age Order of Epiphyseal Union in Man. American Journal of Physical Anthropology 7(1):53-93.

Stout, Sam D.

1989 The Use of Cortical Bone Histology to Estimate Age at Death. In Age Markers in the Human Skeleton. Mehmet Yaşar Işcan, ed. Pp. 195-207. Springfield: Charles C. Thomas.

Suchey, Judy Myers

1979 Problems in the Aging of Females Using the Os Pubis. American Journal of Physical Anthropology 51:467-470.

Suchey, Judy Myers, and Darryl Katz

1998 Applications of Pubic Age Determination in a Forensic Setting. In Forensic Osteology: Advances in the Identification of Human Remains. Second Edition. Kathleen J. Reichs, ed. Pp. 204-236. Springfield: Charles C. Thomas Publisher, Ltd.

Todd, T. Wingate

1920 Age Changes in the Pubic Bone. I: The Male White Pubis. American Journal of Physical Anthropology 3(3):285-334.

1921 Age Changes in the Pubic Bone. II: The Pubis of the Male Negro-White Hybrid. III: The Pubis of the White Female. IV: The Pubis of the Female Negro-White Hybrid. American Journal of Physical Anthropology 4(1):1-70.

Todd, T. W., and J. D’Errico, Jr.

1928 The Clavicular Epiphyses. American Journal of Anatomy 4:25-50.

Todd, T. Wingate, and D. W. Lyon, Jr.

1924 Cranial Suture Closure, Its Progress and Age Relationship: Part I – Endocranial Closure in Adult Males of White Stock. American Journal of Physical Anthropology 7:325-384.

1925 Cranial Suture Closure, Its Progress and Age Relationship: Part II – Ectocranial Closure in Adult Males of White Stock. American Journal of Physical Anthropology 8(1):23-45.

Ubelaker, D. H.

1989 Human Skeletal Remains: Excavation, Analysis, and Interpretation. Second Edition. Washington, DC: Taraxacum.

Webb, Patricia A. Owings, and Judy Myers Suchey

1985 Epiphyseal Union of the Anterior Iliac Crest and Medial Clavicle in a Modern Multiracial Sample of American Males and Females. American Journal of Physical Anthropology 68(4):457-466.

White, Tim D.

1991 Human Osteology. San Diego: Academic Press.

2000 Human Osteology. Second Edition. San Diego: Academic Press.

White, Tim D., and Pieter A. Folkens

2005 The Human Bone Manual. Amsterdam: Elsevier Academic Press.

Wittwer-Backofen, Ursula, Jutta Gampe, and James W. Vaupel

2004 Tooth Cementum Annulation for Age Estimation: Results from a Large Known-Age Validation Study. American Journal of Physical Anthropology 123(2):119-129.

Wittwer-Backofen, Ursula, Jo Buckberry, Alfred Czarnetzki, Stefanie Doppler, Gisela Grupe, Gerhard Hotz, Ariane Kemkes, Clark Spencer Larsen, Debbi Prince, Joachim Wahl, Alexander Fabig, and Svenja Weise

2008 Basics in Paleodemography: A Comparison of Age Indicators Applied to the Early Medieval Skeletal Sample of Lauchheim. American Journal of Physical Anthropology 137(4):384-396.

Wood, James W., George R. Milner, Henry C. Harpending, and Kenneth M. Weiss

1992 The Osteological Paradox: Problems of Inferring Prehistoric Health from Skeletal Samples. Current Anthropology 33(4):343-370.

Yoder, C., D. H. Ubelaker, and J. F. Powell

2001 Examination of Variation in Sternal Rib End Morphology Relevant to Age Assessment. Journal of Forensic Sciences 46(2):223-227.

Youden, W. J.

1998 Experimentation and Measurement. Mineola: Dover Publications, Inc.

APPENDIX A

FINAL SAMPLE SIZES FOR

ALL METHODS

Table A.1. Sample sizes per method.

Method                          Element                     N
McKern and Stewart 1957         Pubic Symphysis (PS)        79
Suchey-Brooks                   Pubic Symphysis (PS)        93
Todd 1920, 1921                 Pubic Symphysis (PS)        10
Lovejoy et al. 1985             Auricular Surface (AS)      147
Osborne et al. 2004             Auricular Surface (AS)      151
Buckberry and Chamberlain 2002  Auricular Surface (AS)      10
Iscan et al. 1984               Sternal Rib End (RIB)       21
Mann et al. 1991                Maxillary Sutures (MSUT)    62
Meindl and Lovejoy 1985         Ectocranial Sutures (CSUT)  22
Mincer et al. 1993              Dental Formation (DEN)      160*
Moorrees et al. 1963            Dental Formation (DEN)      235*

*total number of teeth

Table A.2. Sample sizes for epiphyseal fusion methods.

Method                Element                   N
Albert-Maples 1995    Vertebrae (VERT)          24
Webb-Suchey 1985      Clavicle (CLAV)           33
                      Iliac Crest               6
McKern-Stewart 1957   All (EPIP)                161
                      Proximal Humerus          80
                      Distal Humerus            63
                      Medial Epicondyle         57
                      Proximal Radius           50
                      Distal Radius             51
                      Proximal Ulna             56
                      Distal Ulna               37
                      Proximal Femur            85
                      Greater Trochanter        70
                      Lesser Trochanter         65
                      Distal Femur              79
                      Proximal Tibia            72
                      Distal Tibia              65
                      Proximal Fibula           36
                      Distal Fibula             49
                      Clavicle                  72
                      Iliac Crest               32
                      S1-S2                     28
                      Vertebrae                 18
Scheuer-Black 2000    Glenoid Fossa: Scapula    7
                      Inferior Angle: Scapula   1
                      Coracoid: Scapula         2
                      Acromion: Scapula         1
                      Medial Clavicle           2
                      Acromion: Clavicle        2
                      Proximal Humerus          13
                      Distal Humerus            11
                      Medial Epicondyle         7
                      Proximal Radius           10
                      Distal Radius             8
                      Proximal Ulna             7
                      Distal Ulna               6
                      Proximal Femur            11
                      Greater Trochanter        8
                      Lesser Trochanter         9
                      Distal Femur              12
                      Proximal Tibia            8
                      Distal Tibia              8
                      Proximal Fibula           5
                      Distal Fibula             5
                      MC1                       2
                      MC2                       1
                      MC3                       1
                      MC4                       1
                      MC5                       1
                      MT1                       1
                      MT2                       1
                      MT3                       1
                      MT4                       2
                      MT5                       1
                      Phalanges                 1
                      Talus                     1
                      Calcaneus                 1
                      Basilar Suture            1
                      Iliac crest               2
                      Acetabulum                1
                      Ischial tuberosity        2
                      S1/S2                     2
                      S2/S3                     2
                      S3/S4                     1
                      S4/S5                     1
                      S1 Superior               1
                      Cervical Vertebrae        1
                      Lumbar Vertebrae          1
                      Rib Heads                 1
                      Sternal S2/S3             1

APPENDIX B

AGE DISTRIBUTIONS FOR LONG BONE EPIPHYSES (McKern-Stewart 1957)

Table B.1. Age distribution of stages of proximal humerus union (in %).

Age      N     0     1     2     3     4
18       3     -    33    67     -     -
19       6     -     -    33    33    33
20       7     -     -    71    14    14
21      15     -     -     -     -   100
22      12     -     -     -    25    75
23       3     -     -     -     -   100
24+     33     -     -     -     -   100
Total   79

Table B.2. Age distribution of stages of distal humerus union (in %).

Age      N     0     1     2     3     4
18       2     -     -     -    50    50
19       2     -     -     -     -   100
20       4     -     -     -     -   100
21      12     -     -     -     -   100
22      10     -     -     -     -   100
23       3     -     -     -     -   100
24+     30     -     -     -     -   100
Total   63

Table B.3. Age distributions of humeral medial epicondyle union (in %).

Age      N     0     1     2     3     4
19       1     -     -     -     -   100
20       4     -     -     -     -   100
21      10     -     -     -     -   100
22       9     -     -     -     -   100
23       3     -     -     -     -   100
24+     30     -     -     -     -   100
Total   57

Table B.4. Age distribution of stages of distal radius union (in %).

Age      N     0     1     2     3     4
17       1   100     -     -     -     -
18       2     -     -    50     -    50
19       2     -     -   100     -     -
20       5    40     -     -    60     -
21       9     -     -     -     -   100
22       5     -     -     -     -   100
23       4     -     -     -    25    75
24+     22     -     -     -     -   100
Total   50

Table B.5. Age distribution of stages of distal ulna union (in %).

Age      N     0     1     2     3     4
18       1     -     -     -   100     -
19       2     -     -    50    50     -
20       3    33     -     -    67     -
21       8     -     -     -     -   100
22       2     -     -     -     -   100
23       2     -     -     -     -   100
24+     18     -     -     -     -   100
Total   36

Table B.6. Age distribution of stages of proximal femur union (in %).

Age      N     0     1     2     3     4
17       1   100     -     -     -     -
18       2     -     -    50    50     -
19       3     -     -     -     -   100
20       6     -     -     -    17    83
21      16     -     -     -     -   100
22      10     -     -     -    10    90
23       9     -     -     -     -   100
24+     38     -     -     -     -   100
Total   85

Table B.7. Age distribution of stages of femoral greater trochanter union (in %).

Age      N     0     1     2     3     4
17       1   100     -     -     -     -
18       1     -   100     -     -     -
19       2     -     -     -     -   100
20       2     -     -     -     -   100
21      15     -     -     -     -   100
22       9     -     -     -     -   100
23       6     -     -     -     -   100
24+     34     -     -     -     -   100
Total   70

Table B.8. Age distribution of stages of femoral lesser trochanter union (in %).

Age      N     0     1     2     3     4
19       2     -     -     -     -   100
20       2     -     -     -     -   100
21      14     -     -     -     -   100
22       8     -     -     -     -   100
23       6     -     -     -     -   100
24+     33     -     -     -     -   100
Total   65

Table B.9. Age distribution of stages of distal femur union (in %).

Age      N     0     1     2     3     4
17       1   100     -     -     -     -
18       2     -     -     -   100     -
19       4    25     -    50     -    25
20       7    14     -     -    14    71
21      16     -     -     -     -   100
22       9     -     -     -     -   100
23       7     -     -     -    14    86
24+     33     -     -     -     -   100
Total   79

Table B.10. Age distribution of stages of proximal tibia union (in %).

Age      N     0     1     2     3     4
17       1   100     -     -     -     -
18       2     -    50     -    50     -
19       4     -     -    25    50    25
20       6     -     -    17    33    50
21      16     -     -     -     -   100
22       7     -     -     -     -   100
23       7     -     -     -     -   100
24       7     -     -     -     -   100
25       1     -     -     -     -   100
26       5     -     -     -    20    80
27+     16     -     -     -     -   100
Total   72

Table B.11. Age distribution of stages of distal tibia union (in %).

Age      N     0     1     2     3     4
18       2    50     -     -     -    50
19       5     -     -    20    20    60
20       4     -     -     -    25    75
21      11     -     -     -     -   100
22       8     -     -     -     -   100
23       3     -     -     -     -   100
24+     32     -     -     -     -   100
Total   65

Table B.12. Age distribution of stages of proximal fibula union (in %).

Age      N     0     1     2     3     4
20       2     -     -    50    50     -
21       7     -     -     -     -   100
22       6     -     -     -     -   100
23       3     -     -     -     -   100
24+     18     -     -     -     -   100
Total   34

Table B.13. Age distribution of stages of distal fibula union (in %).

Age      N     0     1     2     3     4
18       1     -     -     -   100     -
19       2     -     -     -    50    50
20       2     -     -     -    50    50
21       9     -     -     -     -   100
22       6     -     -     -     -   100
23       3     -     -     -     -   100
24+     26     -     -     -     -   100
Total   49

APPENDIX C

MOORREES ET AL. (1963) CORRECT AND

INCORRECT CLASSIFICATION BY

TOOTH NUMBER AND ROOT

Table C.1. #17 mesial.

Stage       N  # Correct  % Correct  # Incorrect  % Incorrect
R1/2        2          0          0            2          100
R3/4        3          2      66.67            1        33.33
Rc          2          1         50            1           50
Rc-A1/2     2          1         50            1           50
A1/2        3          2      66.67            1        33.33
Ac         42         42        100            0            0
Total      54         48      88.89            6        11.11

Table C.2. #17 distal.

Stage       N  # Correct  % Correct  # Incorrect  % Incorrect
R1/2        3          0          0            3          100
R3/4        4          2         50            2           50
Rc          2          2        100            0            0
Rc-A1/2     2          1         50            1           50
A1/2        3          2      66.67            1        33.33
Ac         48         48        100            0            0
Total      62         55      88.71            7        11.29

Table C.3. #32 mesial.

Stage       N  # Correct  % Correct  # Incorrect  % Incorrect
R1/2        2          0          0            2          100
R3/4        4          2         50            2           50
Rc          2          1         50            1           50
Rc-A1/2     2          1         50            1           50
A1/2        2          2        100            0            0
Ac         43         43        100            0            0
Total      55         49      89.09            6        10.91

Table C.4. #32 distal.

Stage       N  # Correct  % Correct  # Incorrect  % Incorrect
R1/2        3          0          0            3          100
R3/4        5          3         60            2           40
Rc          4          3         75            1           25
Rc-A1/2     2          1         50            1           50
A1/2        2          2        100            0            0
Ac         48         48        100            0            0
Total      64         57      89.06            7        10.94
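
The percentages in the Appendix C tables follow directly from the counts. As a minimal sketch, the Total row of Table C.1 (#17 mesial) can be recomputed as below. The counts are copied from Table C.1; the function and variable names are hypothetical and introduced only for illustration.

```python
def classification_rates(n, correct):
    """Return (% correct, % incorrect), each rounded to two decimals."""
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("counts must satisfy 0 <= correct <= n and n > 0")
    pct_correct = round(100 * correct / n, 2)
    return pct_correct, round(100 - pct_correct, 2)


# Stage: (N, number correct), taken from Table C.1.
table_c1 = {
    "R1/2": (2, 0),
    "R3/4": (3, 2),
    "Rc": (2, 1),
    "Rc-A1/2": (2, 1),
    "A1/2": (3, 2),
    "Ac": (42, 42),
}

total_n = sum(n for n, _ in table_c1.values())
total_correct = sum(correct for _, correct in table_c1.values())
totals = (total_n, total_correct)
print(totals, classification_rates(total_n, total_correct))
```

The same function applied to any single stage row, or to the rows of Tables C.2 through C.4, gives a quick consistency check on the tabulated percentages.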

APPENDIX D

MINCER ET AL. (1993) CORRECT AND

INCORRECT CLASSIFICATION BY

TOOTH NUMBER

Table D.1. Tooth #1.

Stage       N  # Correct  % Correct  # Incorrect  % Incorrect
D           0          -          -            -            -
E           1          1        100            0            0
F           2          2        100            0            0
G           4          2         50            2           50
H          36         36        100            0            0
Total      43         41      95.35            2         4.65

Table D.2. Tooth #16.

Stage       N  # Correct  % Correct  # Incorrect  % Incorrect
D           1          0          0            1          100
E           1          1        100            0            0
F           2          2        100            0            0
G           5          3         60            2           40
H          32         32        100            0            0
Total      41         38      92.68            3         7.32

Table D.3. Tooth #17.

Stage       N  # Correct  % Correct  # Incorrect  % Incorrect
D           0          0          -            -            -
E           0          0          -            -            -
F           1          1        100            0            0
G           4          3         75            1           25
H          32         32        100            0            0
Total      37         36       97.3            1          2.7

Table D.4. Tooth #32.

Stage       N  # Correct  % Correct  # Incorrect  % Incorrect
D           0          0          -            -            -
E           1          1        100            0            0
F           1          1        100            0            0
G           4          3         75            1           25
H          33         33        100            0            0
Total      39         38      97.44            1         2.56
