Noack Myers SAA 2014 Anthropology of Archaeological Data


Upload: dinaaproj

Post on 13-Jul-2015



The design of an archaeological data repository’s structure has important implications for the ways archaeological professionals interact with the data. The Digital Index of North American Archaeology (DINAA) brings into focus the different choices SHPOs and other repositories make regarding data collection and management. When working with digital data, structure (e.g., Booleans, lookup tables, text strings, enumerated categories) functions in combination with vocabularies to frame our understanding of the archaeological record. While each state standardizes data collection through site forms and databases, the kinds of data each state prioritizes and the ontological system used to structure those data necessarily impose differing mental constructs of how archaeological concepts relate to one another. This creates operative differences in site definition and different affordances for a researcher running queries through these imposed taxonomies of practice. DINAA circumvents the limitations created by the structures of our data communication systems. Although DINAA does not manage sensitive data, including site locations, it is a valuable tool for interpreting data sets for research, resource management, and outreach.

Further Information: To learn more about DINAA, visit the project blog at: http://ux.opencontext.org/blog/archaeology-site-data/

IMPORTANT CATEGORIES: Data that describe archaeological sites can be CULTURAL, TEMPORAL, SPATIAL, USE-RELATED, ARTIFACT-RELATED, a combination of these, or another category altogether. Many sites are not defined with all of these traits. How would your state rank the importance of having these kinds of data? Or other descriptors?

CULTURE HISTORY: The choice of terms represented in the databases reflects local organizational adaptations to site forms, government computing systems, bureaucratic needs for coordination with other non-archaeological offices, etc. Our source sets are not really scientific databases, or anthropological databases based on any one analysis of the ways in which human activities in the past existed in any kind of hierarchy; they are management tools. However, as management tools for the massive amount of data that exists, they necessarily contain scientific and anthropological information of value to stakeholders. The goal of DINAA is to make this system an open one. Using the descriptors in each state’s database, DINAA graduate research assistants work to identify each term based on the literature. Each term is assigned to the nearest abstract category in the DINAA ontology based on its temporal association, and a citation for the definition of the term is recorded. (See the information about our Zotero bibliography, above.) The beginning and ending dates for each term are also determined from the literature, cited, and converted to BP dates for consistency in the database. In this way, each term becomes explicit in terms of its temporal qualities and can be linked between states. However, a site that has not been labeled with the same types or categories of terms may not show up in searches of the database, because the information needed to locate the record was never recorded. To what extent do terms that can be either temporal or cultural descriptors (e.g., Late Woodland) cause confusion or uncertainty in your database system?
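The date-normalization step described above can be sketched in a few lines of Python. This is a minimal illustration, not DINAA's actual code: the `TermDefinition` record, its field names, the `to_bp` and `overlaps` helpers, and the second term are hypothetical. The conversion assumes the common convention that "present" in BP is AD 1950 and that calendar years use astronomical numbering (negative for BC). The only real data used are the dates embedded in the term "Reconstruction 1866-1879" listed on this poster.

```python
from dataclasses import dataclass

BP_REFERENCE_YEAR = 1950  # assumed convention: "present" in BP means AD 1950


def to_bp(calendar_year: int) -> int:
    """Convert a calendar year (negative = BC, astronomical numbering) to years BP."""
    return BP_REFERENCE_YEAR - calendar_year


@dataclass(frozen=True)
class TermDefinition:
    """Hypothetical record for one state vocabulary term."""
    term: str
    category: str
    begin_bp: int   # older bound, so the larger BP number
    end_bp: int     # younger bound, so the smaller BP number
    citation: str   # placeholder for the source that defines the term


def overlaps(a: TermDefinition, b: TermDefinition) -> bool:
    """True when two terms share any part of their BP date ranges."""
    return a.begin_bp >= b.end_bp and b.begin_bp >= a.end_bp


# Dates taken from the term label itself, as listed on the poster.
reconstruction = TermDefinition(
    term="Reconstruction 1866-1879",
    category="temporal/cultural",
    begin_bp=to_bp(1866),
    end_bp=to_bp(1879),
    citation="(citation placeholder)",
)

# Invented term from a second, unnamed state, used only to show linking.
hypothetical_term = TermDefinition(
    term="Hypothetical 19th-century term",
    category="broad temporal",
    begin_bp=to_bp(1800),
    end_bp=to_bp(1899),
    citation="(citation placeholder)",
)

linked = overlaps(reconstruction, hypothetical_term)
```

Once every term carries explicit BP bounds, terms from different state vocabularies can be compared on dates alone, even when their labels differ.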

TERMS, TERMS, TERMS: As anthropologists, we have no shortage of descriptive terms for the sites we analyze. The image above is a word cloud, which emphasizes words from a body of text based on their frequency of use. This image was made from a list of all of the descriptive terms used by partner states that have submitted their databases to DINAA. As you can see, there are a variety of term types.

[Bar graph: Number of unique ID terms used in each database. Bar values: 144, 28, 97, 165, 145.]

There are approximately 541 terms in use by just five of the states that have partnered with DINAA.

Below are examples of unique identifiers used by Indiana, Iowa, Georgia, Missouri, or Florida and the categories into which they were classified.

Amana: broad temporal
Reconstruction 1866-1879: temporal/cultural
Early Paleo-Indian: broad temporal/cultural
Havana/Hopewell: broad cultural
Caloosahatchee: cultural
Allamakee Phase: phase
Cahokia: cultural-spatial
Cretaceous: geological
Mississippian triangular: artifact type
Multi-component: N/A

[Bar graph: Most frequently used terms & number of states in which they are used (IA, IN, MO, GA, FL). Labels truncated in the graph are shown as they appear.]

5 states: Dalton
3 states: Adena, Clovis, Middle…, Mississi…, Woodla…
2 states: 20th…, African…, Alachua, Altamaha, Big…, Early…, Early…, Folsom, French, Hardin, Havana, Hopewell, Indeter…, Kolomoki, Late…, Late…, Late…, Lower…, Middle…, Middle…, Oneota, Other, Palmer, Quaker, Red…, Santa…, Shawnee, Stanley
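The tally behind a chart like this, the number of state vocabularies in which each term appears, is straightforward to compute. The sketch below is illustrative only: the state names and vocabularies are invented, and `states_per_term` is a hypothetical helper, not part of DINAA. It also shows why per-database counts can add up to more than the number of distinct terms: the five bar values on this poster sum to 579, while roughly 541 distinct terms are reported, which would be consistent with some terms being shared between states.

```python
from collections import Counter

# Invented vocabularies; real state term lists are far longer.
VOCABULARIES = {
    "State A": {"Dalton", "Term X", "Term Y"},
    "State B": {"Dalton", "Term X"},
    "State C": {"Dalton", "Term Z"},
}


def states_per_term(vocabularies: dict[str, set[str]]) -> Counter:
    """Count how many state vocabularies contain each term."""
    counts: Counter = Counter()
    for terms in vocabularies.values():
        counts.update(terms)  # a set contributes each of its terms once
    return counts


usage = states_per_term(VOCABULARIES)
total_entries = sum(len(terms) for terms in VOCABULARIES.values())
unique_terms = len(usage)
```

Here `total_entries` counts a shared term once per state, while `unique_terms` counts it once overall, so the gap between the two measures how much vocabulary the states have in common.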

EXAMPLE: The Illinois database. The column labeled “culture” contains an alphabetized list of the terms in the IL vocabulary. The rest of the columns are determined based on the process described below.

The map above shows the sites whose data are currently a part of DINAA. The examples on this poster draw specifically on the states starred above and shown below. The bar graph depicted here makes it easy to see how much database terms differ from state to state.

EXAMPLE: The maps above demonstrate the result of the term choices used to identify each site. On the left, the simple term “historic” was used to show all sites that have that association. The middle map uses the more specific term “historic Indian.” The number of sites identified drops sharply, and state boundaries are clearly delineated within the results. This indicates that sites in states that do not use this specific term will not be returned by a query, even though such sites may exist in those states. The map on the right uses the culture-specific term “Miami,” and reduces the results even further. Again, does this mean that recorded historic Miami sites do not exist anywhere else, or that they are being referred to in a different way?
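The effect seen in the maps can be reproduced with a toy query. The sketch below is an illustration under stated assumptions, not DINAA's implementation: the site records, state names, local labels, concept names, and the `CROSSWALK` table are all invented. It contrasts a literal term search with a search through a shared concept.

```python
from typing import NamedTuple


class Site(NamedTuple):
    site_id: str
    state: str
    local_term: str


# Invented records: three states label comparable sites differently.
SITES = [
    Site("S1", "State A", "Historic Indian"),
    Site("S2", "State B", "Historic Native American"),
    Site("S3", "State C", "Historic Aboriginal"),
    Site("S4", "State A", "Historic"),
]

# Hypothetical crosswalk from (state, local term) to a shared concept.
CROSSWALK = {
    ("State A", "Historic Indian"): "historic-indigenous",
    ("State B", "Historic Native American"): "historic-indigenous",
    ("State C", "Historic Aboriginal"): "historic-indigenous",
    ("State A", "Historic"): "historic",
}


def search_literal(sites: list[Site], term: str) -> list[Site]:
    """Return sites whose local label matches the query term exactly (case-insensitive)."""
    wanted = term.casefold()
    return [s for s in sites if s.local_term.casefold() == wanted]


def search_concept(sites: list[Site], concept: str) -> list[Site]:
    """Return sites whose local label maps to the shared concept."""
    return [s for s in sites if CROSSWALK.get((s.state, s.local_term)) == concept]


literal_hits = search_literal(SITES, "historic Indian")
concept_hits = search_concept(SITES, "historic-indigenous")
```

The literal search returns only the State A record, so its results follow a state boundary, while the concept search returns the comparable records from all three states.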

Acknowledgments: DINAA is a multi-institutional collaboration funded by a grant from the National Science Foundation Archaeology program.

THANK YOU to all of the participants of the March 2014 Workshop for your input!

Author affiliations:
1. Indiana University, Bloomington
2. Indiana University, South Bend
3. University of Tennessee, Knoxville
4. Open Context (http://opencontext.org) & UC Berkeley (D-Lab)
5. Alexandria Archive Institute (http://alexandriaarchive.org)

DINAA Group Library Resource on Zotero: Another product of DINAA is the group library, which holds citations for all sources used to define terms (see more about the process below). To view a live version of the sources in use by DINAA, use the link or QR code shown here: http://goo.gl/U2hbWw

Screen grabs below taken from: