IASSIST Kansa Presentation
DESCRIPTION
A presentation given at the "Data Stewardship: Increasing the Integrity and Effectiveness of Science and Scholarship" session on Friday, June 8, 2012, at the IASSIST 2012 conference in Washington, DC. This presentation introduced data publishing, using a social science (archaeology) case study to explore editorial processes and dissemination outcomes that increasingly demand “Linked Data” capabilities.
TRANSCRIPT
![Page 1: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/1.jpg)
Case-Study: Publishing to the “Web of Data” in Archaeology
Quality and Workflows
Eric Kansa UC Berkeley / OpenContext.org
Unless otherwise indicated, this work is licensed under a Creative Commons Attribution 3.0 License <http://creativecommons.org/licenses/by/3.0/>
![Page 2: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/2.jpg)
“Small Science” data sharing is hard:
(1) Complexity
(2) Scalability
(3) Ethics, cultural property claims, IP
(4) Incentives
(5) Preservation
Image Credit: “Grand Canyon NPS” via Flickr (CC-By) http://www.flickr.com/photos/grand_canyon_nps/5975537378/
![Page 3: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/3.jpg)
Thousand Flowers
● Open Context: Open access, open licensed data for archaeology
● Archiving by California Digital Library
● Persistent Identifiers (DOIs, ARKs)
● Web services
● NSF/NEH links for data management plans
![Page 4: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/4.jpg)
Thousand Flowers
Fills a Gap:
Most data sources are institutional. Open Context publishes individual, small group contributions
![Page 5: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/5.jpg)
Thousand Flowers
Challenge: Diverse contributions need a lot of work to clean up and “link” to the Web of Data
![Page 6: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/6.jpg)
• 3-year project Oct 2010 – Sep 2013
• Funded with a National Leadership Grant from the Institute of Museum and Library Services, LG-06-10-0140-10, “Dissemination Information Packages for Information Reuse”
• Ixchel Faniel, PI & Elizabeth Yakel, Co-PI
http://www.dipir.org
![Page 7: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/7.jpg)
DIPIR Collaboration
![Page 8: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/8.jpg)
The Big DIPIR Questions
Research Questions
1. What are the significant properties of data that facilitate reuse by the designated communities at the three sites?
2. How can these significant properties be expressed as representation information to ensure the preservation of meaning and enable data reuse?
![Page 9: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/9.jpg)
Open Context Interviewees
• 22 Ph.D. or graduate students interviewed
– 13 men
– 9 women
• Novices / Experts
– 19 experts
– 3 novices
• Interviewees who were curators, or professors who also held a curatorial role: 6
![Page 10: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/10.jpg)
Raw Data is Unappetizing?
![Page 11: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/11.jpg)
Data Documentation Practices
“I use an Excel spreadsheet…which I … inherited from my research advisers. …my dissertation advisor was still recording data for each specimen on paper when I was in graduate school so that's what I started …then quickly, I was like, ‘This is ridiculous.’ … I just started using an Excel spreadsheet that has sort of slowly gotten bigger and bigger over time with more variables or columns…I've added …color coding…I also use…a very sort of primitive numerical coding system, again, that I inherited from my research advisers…So, this little book that goes with me of codes which is sort of odd, but …we all know that a 14 is a sheep.” (CCU13)
![Page 12: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/12.jpg)
Data Documentation Practices
A long way to go before we get usable, intelligible data
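The coding habit described in the interview (“a 14 is a sheep”) can be illustrated with a small sketch of what documenting such codes might involve. Only the 14 → sheep pairing comes from the quote; the other codes, the `CODEBOOK` name, and the `decode_taxa` helper are hypothetical and stand in for whatever a contributor's “little book” actually holds.

```python
# Hypothetical codebook. Only 14 -> "sheep" comes from the interview quote;
# the remaining entries are invented purely for illustration.
CODEBOOK = {
    14: "sheep",
    15: "goat",    # assumed
    20: "cattle",  # assumed
}


def decode_taxa(codes, codebook=CODEBOOK):
    """Translate numeric codes to labels and report codes the codebook lacks."""
    decoded = []
    unknown = set()
    for code in codes:
        label = codebook.get(code)
        if label is None:
            # Keep the row, but make the documentation gap visible.
            unknown.add(code)
            decoded.append(f"UNDOCUMENTED CODE {code}")
        else:
            decoded.append(label)
    return decoded, sorted(unknown)


labels, missing = decode_taxa([14, 20, 14, 99])
print(labels)   # decoded column
print(missing)  # codes an editor would need to ask the contributor about
```

The list of undocumented codes is the useful part: it is the kind of question an editor would send back to the contributor before the data can be made intelligible to anyone else.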
![Page 13: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/13.jpg)
Sometimes data is better served cooked.
![Page 14: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/14.jpg)
Thousand Flowers
● Clean-up and document contributed data
● Map to ArchaeoML (general ontology)
● Mint URIs to entities (potsherds, projects, contexts, people)
● Link to important vocabularies / collections (Pleiades, Encyclopedia of Life)
● Working on CIDOC-CRM (RDF) representations (not straightforward)
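Minting URIs and linking them to outside vocabularies can be sketched roughly as follows. This is a minimal illustration and not Open Context's actual scheme: the `https://example.org/` base, the `mint_uri` and `link` helpers, the project and specimen identifiers, and the choice of SKOS `closeMatch` as the linking predicate are all assumptions.

```python
import uuid

# Placeholder base URI; NOT Open Context's real URI pattern.
BASE_URI = "https://example.org/subjects/"
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, BASE_URI)


def mint_uri(project, entity_type, local_id):
    """Mint a stable URI: the same inputs always yield the same identifier."""
    key = f"{project}/{entity_type}/{local_id}"
    return BASE_URI + str(uuid.uuid5(NAMESPACE, key))


def link(subject_uri, predicate, object_uri):
    """Represent one Linked Data statement as a (subject, predicate, object) triple."""
    return (subject_uri, predicate, object_uri)


# Hypothetical specimen record from a hypothetical project.
bone_uri = mint_uri("demo-project", "animal-bone", "B-0042")

# Assumed predicate (SKOS closeMatch) and a placeholder vocabulary URI.
triple = link(
    bone_uri,
    "http://www.w3.org/2004/02/skos/core#closeMatch",
    "https://example.org/vocab/sheep",
)
print(triple)
```

Deriving the identifier deterministically from the project, type, and local ID is one possible design choice; it means re-importing a corrected dataset would not change the URIs already cited elsewhere.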
![Page 15: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/15.jpg)
Open Context: Record
![Page 16: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/16.jpg)
Open Context: Record
● XHTML + RDFa (Dublin Core, Open Annotation, etc.)
● XML (ArchaeoML)● Atom● RDF (draft CIDOC)● Link to GitHub versioned file
![Page 17: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/17.jpg)
Open Context: Record
![Page 18: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/18.jpg)
Open Context: Record
![Page 19: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/19.jpg)
Open Context: Visualization of Data Linked to the EOL
![Page 20: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/20.jpg)
My Precious Data
Image Credit: “Lord of the Rings” (2003, New Line), All Rights Reserved Copyright
![Page 21: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/21.jpg)
Data sharing as publication
![Page 22: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/22.jpg)
Data Publishing
![Page 23: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/23.jpg)
Data Quality and Standards Alignment
(1) Check consistency
(2) Edit functions
(3) Align to common standards (“Linked Data” if applicable)
(4) Issue tracking, version control
Publishing
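The “check consistency” step can be illustrated with a small sketch that groups spelling variants of the same value. It is loosely modeled on the key-collision (“fingerprint”) style of clustering found in tools such as Google Refine, but it is a simplified variant written for this example, not that tool's implementation. The sample column values are invented.

```python
import string
from collections import defaultdict

# Map every punctuation character to a space so "Sheep/Goat" splits into two tokens.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def fingerprint(value):
    """Build a normalized key: lowercase, no punctuation, unique sorted tokens."""
    cleaned = value.strip().lower().translate(_PUNCT_TO_SPACE)
    return " ".join(sorted(set(cleaned.split())))


def find_inconsistent(values):
    """Return groups of distinct raw values that share one fingerprint."""
    clusters = defaultdict(set)
    for value in values:
        clusters[fingerprint(value)].add(value)
    return {key: sorted(raw) for key, raw in clusters.items() if len(raw) > 1}


# Invented sample column with typical spreadsheet inconsistencies.
column = ["Sheep/Goat", "sheep goat", "Goat, Sheep", "Cattle", "cattle ", "Pig"]
for key, variants in find_inconsistent(column).items():
    print(key, "->", variants)
```

Each reported group is a candidate for an edit, which an editor or contributor would still need to confirm by hand, since values that look alike are not always meant to be the same.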
![Page 24: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/24.jpg)
Tools of the Trade
(1) Google Refine (check, edit, consistency)
(2) Mantis (issue-tracker, coordinate edits, metadata creation)
Publishing
![Page 25: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/25.jpg)
Tools of the Trade
(1) Domain scientists (Editorial Board) check data
(2) Iterative “coproduction” between contributors and editors
Publishing
![Page 26: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/26.jpg)
Publishing
Project Metadata
Column Descriptions
![Page 27: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/27.jpg)
Web of Data (2011)
Main Contributors:
● Institutions (esp. government)
● Thematic collections / projects
![Page 28: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/28.jpg)
Entity Reconciliation
(1) With Google Refine
(2) Implemented for EOL and Pleiades (gazetteer)
(3) Use existing mappings to improve future reconciliation
Publishing
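The idea of reusing existing mappings to improve later reconciliation can be sketched as a lookup table that grows as editors confirm matches. This is an illustrative sketch only: the `Reconciler` class, its methods, and the `https://example.org/taxa/...` URIs are hypothetical placeholders, not real EOL or Pleiades identifiers and not Google Refine's reconciliation API.

```python
def normalize(label):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(label.lower().split())


class Reconciler:
    """Match free-text labels to vocabulary URIs using previously confirmed mappings."""

    def __init__(self, known=None):
        self.known = {normalize(k): v for k, v in (known or {}).items()}

    def reconcile(self, labels):
        """Split labels into those already mapped and those needing editorial review."""
        matched, unmatched = {}, []
        for label in labels:
            uri = self.known.get(normalize(label))
            if uri is None:
                unmatched.append(label)
            else:
                matched[label] = uri
        return matched, unmatched

    def confirm(self, label, uri):
        """Record an editor-approved match so later datasets reconcile automatically."""
        self.known[normalize(label)] = uri


# Placeholder URIs; real reconciliation targets would come from the vocabulary itself.
reconciler = Reconciler({"Ovis aries": "https://example.org/taxa/ovis-aries"})
matched, unmatched = reconciler.reconcile(["ovis  aries", "Capra hircus"])
print(matched, unmatched)

# An editor resolves the leftover label once; the next dataset benefits.
reconciler.confirm("Capra hircus", "https://example.org/taxa/capra-hircus")
matched_later, unmatched_later = reconciler.reconcile(["capra hircus"])
print(matched_later, unmatched_later)
```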
![Page 29: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/29.jpg)
● CDL Archiving Service
● EZID for persistent identity: DOIs (aggregate resources), ARKs (granular resources) and Merritt Repository
● Helps build trust in community
![Page 30: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/30.jpg)
● Platform / Services disciplinary communities can use for “Data Publishing”
● Different communities work out semantic/interoperability needs, editorial policies, incentives, etc.
University of California (System) Repository, all disciplines (UC-funded library, grants)
CDL as Infrastructure
![Page 31: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/31.jpg)
CDL as Infrastructure
Future data publisher
Future data publisher
![Page 32: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/32.jpg)
eScholarship: UC’s OA Publishing Platform
![Page 33: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/33.jpg)
Platform for traditional publishing
![Page 34: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/34.jpg)
Also supports new genres
![Page 35: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/35.jpg)
Outcomes of Publishing Data:
(1) Communicate and set expectations about content and quality
(2) Organize workflows to improve data quality and usability
(3) Make “datasets” first class citizens in world of scholarly communications
Summary
![Page 36: IASSIT Kansa Presentation](https://reader034.vdocument.in/reader034/viewer/2022051609/547b602eb47959a4098b4e0f/html5/thumbnails/36.jpg)
Final Thoughts
Publication needs to evolve!
(1) Participating in Linked Data is a great goal, but far removed from most everyday practice
(2) Researchers need help.
(3) 19th-century publication norms are poorly suited to 21st-century methods, research, and public goals