
Page 1: The Archaeotools project, faceted classification and natural language processing in an archaeological context. University of York, April 2008

The Archaeotools project, faceted classification and natural language processing in an archaeological context.

University of York, April 2008

Page 2

AHRC-EPSRC-JISC eScience research grants scheme:

AIM: To allow archaeologists to discover, share and analyse datasets and legacy publications which have hitherto been very difficult to integrate into existing digital frameworks

BUILDS UPON: Common Information Environment Enhanced Geospatial browser

PARTNERS: Natural Language Processing Research Group, Department of Computer Science, University of Sheffield

Joint Information Systems Committee

Page 8

Three distinct Workpackages:

• Workpackage 1 – Advanced Faceted Classification / Geo-spatial browser – 1m+ records; 4 primary facets (What, Where, When and Media).

• Workpackage 2 – Natural language processing / Data-mining of Grey Literature; plus tagging

• Workpackage 3 – Data-mining of Historic Literature; plus geoXwalk

Page 9

Work package 1

• Datasets include:
– National Monuments Records (Scotland, Wales, England)
– Excavation Index (EH)
– Archive Holdings
– Local Authority Historic Environment Records

• Thesauri include:
– Thesaurus of Monument Types (TMT)
– Thesaurus of Object Types
– MIDAS Period list
– UK Government list of administrative areas, County, District, Parish (CDP) – Not MIDAS

Page 10

[Diagram: system architecture. Labels: Oracle RDBMS; MIDAS XML Record; XML Docs of Thesaurus; Input (twice); Information Extraction (twice); RDF Resource; Knowledge triple store; Query; User Interface. Annotation: When, Where, What ontologies as entries to faceted index.]
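One way to picture the faceted index named in the diagram is as a mapping from each facet value to the set of records that carry it, so that a query becomes an intersection of sets. The following is a minimal Python sketch under that assumption. The record fields, sample values and function names are hypothetical; they are not taken from the Archaeotools implementation, whose diagram mentions RDF resources and a knowledge triple store rather than in-memory sets.

```python
from collections import defaultdict

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "what": "BARROW", "where": "North Yorkshire", "when": "BRONZE AGE", "media": "text"},
    {"id": 2, "what": "VILLA", "where": "North Yorkshire", "when": "ROMAN", "media": "image"},
    {"id": 3, "what": "BARROW", "where": "Wiltshire", "when": "BRONZE AGE", "media": "text"},
]

# The four primary facets named on the Workpackage 1 slide.
FACETS = ("what", "where", "when", "media")


def build_index(records):
    """Map each (facet, value) pair to the set of record ids carrying it."""
    index = defaultdict(set)
    for rec in records:
        for facet in FACETS:
            value = rec.get(facet)
            if value:
                index[(facet, value)].add(rec["id"])
    return index


def query(index, all_ids, **selected):
    """Return the ids that match every selected facet value."""
    result = set(all_ids)
    for facet, value in selected.items():
        result &= index.get((facet, value), set())
    return result


def facet_counts(index, ids, facet):
    """Count the records in `ids` under each value of one facet."""
    counts = {}
    for (name, value), members in index.items():
        if name == facet:
            hits = len(members & ids)
            if hits:
                counts[value] = hits
    return counts


index = build_index(RECORDS)
all_ids = {rec["id"] for rec in RECORDS}
barrows = query(index, all_ids, what="BARROW")
print(barrows, facet_counts(index, barrows, "where"))
```

Selecting a further facet value narrows the result set, and the per-value counts are what a faceted browser would typically show beside each remaining choice.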

Page 11

“WHAT”

• Records that have no subject information

• Records that use terms not found in TMT, so these records cannot be indexed (6,442 unique terms)

[Two charts, each of all 1,001,407 records: 19,269 records (2%); 101,507 records (10.1%)]
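The two problem categories on this slide (records with no subject term, and records whose terms are absent from the thesaurus) can be illustrated with a simple vocabulary check. This is a sketch only: the `subjects` field, the tiny `THESAURUS` set and the normalisation rule are assumptions made for the example, not the project's actual data model or matching logic.

```python
from collections import Counter

# Hypothetical stand-in for a controlled vocabulary such as TMT.
THESAURUS = {"BARROW", "VILLA", "HILLFORT"}


def normalise(term):
    """Upper-case a term and collapse runs of whitespace."""
    return " ".join(term.upper().split())


def classify(records, thesaurus):
    """Sort records into no_subject / unmatched / indexable.

    A record is 'unmatched' when none of its terms is in the thesaurus.
    Every term missing from the thesaurus is collected, even when the
    record is still indexable through another term.
    """
    counts = Counter()
    unknown_terms = set()
    for rec in records:
        terms = [normalise(t) for t in rec.get("subjects", []) if t.strip()]
        unknown = [t for t in terms if t not in thesaurus]
        unknown_terms.update(unknown)
        if not terms:
            counts["no_subject"] += 1
        elif len(unknown) == len(terms):
            counts["unmatched"] += 1
        else:
            counts["indexable"] += 1
    return counts, unknown_terms


# Hypothetical sample records.
sample = [
    {"subjects": ["barrow"]},
    {"subjects": []},
    {"subjects": ["Burial  mound"]},
    {},
    {"subjects": ["villa", "bath house"]},
]
counts, unknown_terms = classify(sample, THESAURUS)
print(dict(counts), sorted(unknown_terms))
```

The set of unknown terms plays the role of the "unique terms" figure on the slide: it is the list a curator would need to review, map to preferred terms, or add to the vocabulary.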

Page 12

“WHEN”

• Records that have no temporal information

• Records that use period terms not found in MIDAS, so these records cannot be indexed (457 types of irresolvable dates)

[Two charts, each of all 1,001,407 records: 292,793 records (29.2%); 114,505 records (11.4%)]

1066, 1001-1100, 11th Centuary, C11, 11C, Eleventh Century
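The date strings listed above are different spellings of the same span of time. A rough Python sketch of how such variants might be normalised to a (start year, end year) pair follows. The patterns, the function names and the convention that the Nth century AD runs from year (N-1)*100+1 to N*100 are assumptions chosen for the example, and this is not presented as the method the project used.

```python
import re

# Ordinal words for centuries 1 to 20 (assumed AD throughout).
ORDINAL_WORDS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
    "eleventh": 11, "twelfth": 12, "thirteenth": 13, "fourteenth": 14,
    "fifteenth": 15, "sixteenth": 16, "seventeenth": 17,
    "eighteenth": 18, "nineteenth": 19, "twentieth": 20,
}


def century_range(n):
    """Years covered by the nth century AD, e.g. 11 -> (1001, 1100)."""
    return ((n - 1) * 100 + 1, n * 100)


def parse_date(text):
    """Return (start_year, end_year) for a date string, or None."""
    s = " ".join(text.strip().lower().split())
    # Explicit range such as "1001-1100".
    m = re.fullmatch(r"(\d{1,4})\s*-\s*(\d{1,4})", s)
    if m:
        return (int(m.group(1)), int(m.group(2)))
    # Single year such as "1066".
    if re.fullmatch(r"\d{3,4}", s):
        return (int(s), int(s))
    # "11th century", tolerating misspellings that start with "centu".
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)\s+centu\w*", s)
    if m:
        return century_range(int(m.group(1)))
    # Abbreviations "c11" and "11c".
    m = re.fullmatch(r"c(\d{1,2})|(\d{1,2})c", s)
    if m:
        return century_range(int(m.group(1) or m.group(2)))
    # Spelled-out ordinal such as "eleventh century".
    m = re.fullmatch(r"([a-z]+)\s+centu\w*", s)
    if m and m.group(1) in ORDINAL_WORDS:
        return century_range(ORDINAL_WORDS[m.group(1)])
    return None


for raw in ["1066", "1001-1100", "11th Centuary", "C11", "11C", "Eleventh Century"]:
    print(raw, "->", parse_date(raw))
```

Strings that fit none of the patterns return None, which corresponds to the slide's category of dates that cannot be resolved automatically and would need manual review.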

Page 13

“WHERE”

• Records that have no spatial information

• Records that use terms not found in CDP, so these records cannot be indexed.

[Two charts, each of all 1,001,407 records: 11,126 records (1.1%); 245,601 records (24.5%)]


Page 19

Three distinct Workpackages:

• Workpackage 1 – Advanced Faceted Classification / Geo-spatial browser – 1m+ records; 4 primary facets (What, Where, When and Media).

• Workpackage 2 – Natural language processing / Data-mining of Grey Literature; plus tagging

• Workpackage 3 – Data-mining of Historic Literature; plus geoXwalk

Page 20

XML tagging of semantic content

CIDOC: CRM

Page 21

University Researchers

Local authority curators

Page 25

Three distinct Workpackages:

• Workpackage 1 – Advanced Faceted Classification / Geo-spatial browser – 1m+ records; 4 primary facets (What, Where, When and Media).

• Workpackage 2 – Natural language processing / Data-mining of Grey Literature; plus tagging

• Workpackage 3 – Data-mining of Historic Literature; plus geoXwalk

Page 29

http://ads.ahds.ac.uk/project/archaeotools/
