GESIS – Vocabulary, Statistics, Time and Geography

Combining Statistics and Text for a View of Irish Cultural Heritage

IASSIST 2009, Tampere Finland, May 27, 2009

• Fredric C. Gey
– UC Data Archive & Technical Assistance
– University of California, Berkeley
– http://ucdata.berkeley.edu/gey.html

• Institute for Museum and Library Services Grants:
– Seamless search of textual and numeric databases (1999-2002)
– Going places in the catalog: Improved Geographic Access (2002-2004)
– What Where, When and Why – support for the learner (2004-2006)
– Bringing Lives to Light – Biography in Context (2006-2008)
– Context and Relationships – Ireland and Irish Studies (2007-2010)

• Colleagues: Michael Buckland, Ray Larson, Kim Carl, Jeanette Zerneke, host of students including Ryan Shaw and Vivien Petras

• Collaboration with Centre for Digitisation, Queen's University Belfast
– Paul Ell, collaborating PI

Heterogeneous Digital Information Search

Current Search Technology (multiple independent searches without search aids)

[Diagram: a single QUERY sent separately to each of: Bibliography; Full Text; Maps and other Geospatial data; Music and other media; Numeric Statistical Databases; Patents]

Heterogeneous Digital Information Search

Direct Mappings and Search Between Multiple Information Types

[Diagram: QUERY, EVMs and QUERYplus connecting: Bibliography; Full Text; Numeric Statistical Databases; Patents; Maps and other Geospatial data; Music and other media]

Context and Relationships: Ireland and Irish Studies (Goals)

(2007-2010 NEH/IMLS Grant)

• Enable automatic and manual editorial markup of scanned scholarly materials for personal names and geography

• Recognition of place/person names in Middle English and Gaelic

• Combine historical statistics with external search of documents by geographic commonality

• Utilize Hogan’s Onomasticon Goedelicum locorum et tribuum Hiberniae et Scotiae: An index, with identifications, to the Gaelic names of places and tribes (Edmund Hogan, SJ, 1909), a kind of concordance of Irish documents by place

Who, What, Where When IMLS Project

(2004-200 IMLS grant)

Developed multi-genre search using common geography (data/books)

Biography Markup and Search Goals

(2006-2006 IMLS grant)

• To develop tools for editors, archivists and compilers of historical papers
– Emma Goldman papers

• To develop display in time/space to facilitate historical discovery, i.e. who lived there at the same time and what important events occurred there

• To visualize biography as an ordered sequence of 4-tuple events (activity, time, place, other-people)
– developing biographical markup standards

• Congressional Biography – automatic markup of place, date, time-range

<biog source="cong_dict" page_start="19" page_end="19">
  <name> ADAMS, JOHN QUINCY. </name>
  <text> Born in Braintree, now Quincy, Mass., July 11, 1767. When ten years of age, he accompanied his father to France ; and when fifteen, was private secretary to the American Minister in Russia. He was graduated at Harvard University in 1787 ; studied law in Newburyport, and settled in Boston. From 1794 to 1801 he was American Minister to Holland, England, Sweden, and Prussia. He was a Senator in Congress from 1803 to 1808
  </text>
</biog>
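The idea of reading a biography as an ordered sequence of events can be sketched in a few lines of Python. This is only an illustration, not the project's markup tool: the `Event` type, the `extract_events` helper, and the regular expressions are assumptions made here, and only the time component is filled in. Place and other-people recognition would need a gazetteer and a name recognizer, so those fields are left empty.

```python
import re
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional, Tuple

# The <biog> record from the slide, used as sample input.
SAMPLE = """<biog source="cong_dict" page_start="19" page_end="19">
<name> ADAMS, JOHN QUINCY. </name>
<text> Born in Braintree, now Quincy, Mass., July 11, 1767. When ten years of
age, he accompanied his father to France ; and when fifteen, was private
secretary to the American Minister in Russia. He was graduated at Harvard
University in 1787 ; studied law in Newburyport, and settled in Boston. From
1794 to 1801 he was American Minister to Holland, England, Sweden, and
Prussia. He was a Senator in Congress from 1803 to 1808 </text>
</biog>"""


class Event(NamedTuple):
    """Hypothetical 4-tuple: activity, time, place, other people."""
    activity: str
    start_year: int
    end_year: Optional[int]
    place: Optional[str] = None          # not extracted in this sketch
    other_people: Tuple[str, ...] = ()   # not extracted in this sketch


YEAR_RANGE = re.compile(r"[Ff]rom (\d{4}) to (\d{4})")
SINGLE_YEAR = re.compile(r"\b(\d{4})\b")
# Split on semicolons, or on a period followed by a capitalized word.
CLAUSE_BREAK = re.compile(r"\s*;\s*|(?<=\.)\s+(?=[A-Z])")


def extract_events(biog_xml: str) -> list:
    """Return dated clauses of a <biog> record in document order."""
    root = ET.fromstring(biog_xml)
    text = " ".join(root.findtext("text", default="").split())
    events = []
    for clause in CLAUSE_BREAK.split(text):
        span = YEAR_RANGE.search(clause)
        if span:
            events.append(Event(clause, int(span.group(1)), int(span.group(2))))
            continue
        year = SINGLE_YEAR.search(clause)
        if year:
            events.append(Event(clause, int(year.group(1)), None))
    return events


if __name__ == "__main__":
    for event in extract_events(SAMPLE):
        print(event.start_year, event.end_year, "|", event.activity)
```

Clauses without a four-digit year are skipped, so the result is a coarse timeline rather than a full markup of the text.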

Biography Markup: Emma Goldman Travels

(2006-2009 IMLS grant)

The Atom format feeds directly into Google Maps
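A minimal sketch of what such a feed might look like, assuming coordinates are attached to each Atom entry as a GeoRSS-Simple `georss:point` element. The slide does not say which geographic encoding the project used, so that choice, the `build_feed` helper, and the two stops below are illustrative placeholders rather than real itinerary data. The sketch also omits elements such as `id` that a strict Atom validator would require.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"
ET.register_namespace("", ATOM)
ET.register_namespace("georss", GEORSS)


def build_feed(title, stops):
    """Serialize travel stops as an Atom feed with one georss:point per entry."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    for stop in stops:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = stop["title"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = stop["date"] + "T00:00:00Z"
        # GeoRSS-Simple points are written as "latitude longitude".
        ET.SubElement(entry, f"{{{GEORSS}}}point").text = f'{stop["lat"]} {stop["lon"]}'
    return ET.tostring(feed, encoding="unicode")


# Placeholder stops, not taken from the Emma Goldman papers.
EXAMPLE_STOPS = [
    {"title": "Lecture (illustrative)", "date": "1900-01-01", "lat": 41.88, "lon": -87.63},
    {"title": "Meeting (illustrative)", "date": "1900-02-01", "lat": 40.71, "lon": -74.01},
]

if __name__ == "__main__":
    print(build_feed("Example travels", EXAMPLE_STOPS))
```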

From Publishing Context to Building Context

Context and Relationships: Ireland and Irish Studies

(2007-2009 NEH/IMLS Grant)

• Collaboration with Center for Digitization, Queen's University Belfast
– Digitizing ~500,000 pages of Irish Historical and Cultural Studies

• To develop display and contextual search in time/space to facilitate scholarly discovery: http://gray.ischool.berkeley.edu/oldw4/irish/

Digital Library of Core Materials on Ireland exemplar

• £620,000 grant from JISC to digitise journals, monographs and manuscripts relating to Irish Studies and create the foundations of a digital library resource

• Initial archive of around 470,000 pages

• 100 journals covering 200 year period and about 400,000 pages

• 2,500 pages of manuscript

• 205 key monographs

• Machine-readable text for all journals and monographs and some manuscripts

• Detailed ‘object’ level metadata

Project Imperatives

• Access to rare resources without visiting Belfast

• Resource discovery – use of less common journals

• New, complex searching using detailed metadata and semantic searching

• Serendipity

• A one stop shop for journals – and more

• Enhanced research developing from better access

Ireland and Irish Studies: Statistical Data about Ireland

• Center for Digitization, Queen's University Belfast has digitized 200 years of Irish Historical Statistics

• We wish to integrate statistical data display with scholarly search and browsing by time and place

The Database of Irish Historical Statistics

• 32,934,018 data values from 1821 to 1971, and then linked to contemporary digital sources

• Mostly census data but also annual agricultural statistics, civil registration information, crime statistics . . .

• Topics include population statistics, crop and stock data, language, literacy, religion, occupations, employment, housing, emigration, industry and industrial structure, trade and commerce, wages, pauperism etc

• www.qub.ac.uk/cdda/iredb/dbhme.htm

Ireland and Irish Studies: Our new approach

Utilize the capabilities of Google Earth

• Obtain historic Irish sub-county boundary files (Baronies and Poor Law Union)

Ireland and Irish Studies: Our new approach

Utilize the capabilities of Google Earth (2)

• Utilize the KML markup language to integrate statistical data display with scholarly search and browsing by time and place
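As a rough illustration of how KML can carry a statistic together with its place and time, the sketch below builds one `Placemark` with a `TimeSpan`, a `description`, and a `Polygon` boundary. The element names come from KML 2.2; everything else is assumed for the example: the helper names, the barony name, the square boundary, and the population figure are placeholders, not values from the Database of Irish Historical Statistics.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
ET.register_namespace("", KML_NS)


def q(tag):
    """Qualify a tag name with the KML namespace."""
    return f"{{{KML_NS}}}{tag}"


def placemark(name, year, label, value, ring):
    """Build a Placemark showing one statistic for one area in one year."""
    pm = ET.Element(q("Placemark"))
    ET.SubElement(pm, q("name")).text = name
    ET.SubElement(pm, q("description")).text = f"{label} ({year}): {value}"
    span = ET.SubElement(pm, q("TimeSpan"))  # drives the Google Earth time slider
    ET.SubElement(span, q("begin")).text = str(year)
    ET.SubElement(span, q("end")).text = str(year)
    polygon = ET.SubElement(pm, q("Polygon"))
    outer = ET.SubElement(polygon, q("outerBoundaryIs"))
    linear_ring = ET.SubElement(outer, q("LinearRing"))
    # KML coordinates are "lon,lat,alt" tuples separated by whitespace.
    ET.SubElement(linear_ring, q("coordinates")).text = " ".join(
        f"{lon},{lat},0" for lon, lat in ring
    )
    return pm


def document(placemarks):
    """Wrap placemarks in a <kml><Document> and serialize to text."""
    kml = ET.Element(q("kml"))
    doc = ET.SubElement(kml, q("Document"))
    doc.extend(placemarks)
    return ET.tostring(kml, encoding="unicode")


# Placeholder boundary (closed ring) and placeholder value.
EXAMPLE_RING = [(-6.0, 54.5), (-5.9, 54.5), (-5.9, 54.6), (-6.0, 54.6), (-6.0, 54.5)]

if __name__ == "__main__":
    print(document([placemark("Example Barony", 1841, "Population", 12345, EXAMPLE_RING)]))
```

One placemark per area and census year keeps the time filtering in the viewer simple, at the cost of repeating the boundary geometry for each year.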

Ireland and Irish Studies: Google Earth (3)

Search links added to statistical data display
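One way such a link could be produced is to build a search URL from the place name and a year range and embed it in the HTML shown with the statistic. The sketch below assumes a hypothetical search endpoint (`https://example.org/search`) and hypothetical parameter names (`place`, `from`, `to`); the slides do not describe the actual search services or their query syntax.

```python
from html import escape
from urllib.parse import urlencode

SEARCH_BASE = "https://example.org/search"  # hypothetical endpoint


def search_link(place, start_year, end_year):
    """Build a search URL restricted to a place and a year range."""
    query = urlencode({"place": place, "from": start_year, "to": end_year})
    return f"{SEARCH_BASE}?{query}"


def description_html(place, label, year, value, window=10):
    """HTML for a map balloon: the statistic plus a link to related documents."""
    url = search_link(place, year, year + window)
    return (
        f"<b>{escape(place)}</b><br/>"
        f"{escape(label)} ({year}): {value}<br/>"
        f'<a href="{escape(url)}">Search documents about this place</a>'
    )


if __name__ == "__main__":
    # Placeholder place name and value, for illustration only.
    print(description_html("Example Barony", "Population", 1841, 12345))
```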

Ireland and Irish Studies: next steps

Add more statistics
– Religion (percent Catholic, Protestant, other)
– Agriculture

Add more resources to search

Begin working with and geographically indexing the 500k pages of Irish journals and books.

Refine our user interfaces and develop more prototype demonstrations

References

• M Buckland and L Lancaster, 2004. "Combining Place, Time, and Topic." D-Lib Magazine, May 2004, Volume 10 Number 5. http://www.dlib.org/dlib/may04/buckland/05buckland.html

• M Buckland, A Chen, F Gey & R Larson, 2006. "Search Across Different Media: Numeric Data Sets and Text Files." Information Technology and Libraries, December 2006, pp 181-189.

• M Buckland, A Chen, F Gey, R Larson, R Mostern & V Petras, 2007. "Geographic Search: Catalogs, Gazetteers, and Maps." College & Research Libraries, Sept 2007.

• F Gey, R Shaw, R Larson, M Buckland, B Pateman and D Melia. "Marking Up Cultural Materials for Time and Geography." In Proceedings of the Workshop on Information Access to Cultural Heritage, Aarhus, Denmark, Sept 28, 2008.

• F Gey, R Shaw, R Larson, B Pateman. "Biography as events in time and space." Proceedings of ACM GIS Conference, Irvine, California, Nov 4-7, 2008.

• Emma Goldman papers (http://sunsite.berkeley.edu/Goldman/)

• Onomasticon (http://www.ucc.ie:8080/cocoon/doi/locus)

Grant home pages

Biography project

• http://ecai.org/imls2006/

Irish project

• http://ecai.org/neh2007/