ALA 2010 -- Jeremy York
![Page 1: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/1.jpg)
HATHI TRUST
A Shared Digital Repository
Delivering Data for New Generations of Research
Strategies and Challenges
Jeremy York
NISO/BISG Forum
ALA 2010
![Page 2: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/2.jpg)
Introduction
• Digital Repository
– Initial focus on digitized book and journal content
– “Light” archive
• Collections and Collaboration
– Comprehensive collection
– Shared strategies
– Local services
– Public Good
![Page 3: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/3.jpg)
Content Distribution
• In Copyright: 81%
• Public Domain: 19%
6,173,575 – Total
1,177,667 – Public Domain
* As of June 15, 2010
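As a quick check, the two counts on this slide imply a public-domain share of about 19% and an in-copyright share of about 81%. The sketch below only redoes that arithmetic; the variable and function names are illustrative, not from the talk.

```python
# Counts as stated on the slide (as of June 15, 2010).
total = 6_173_575
public_domain = 1_177_667

# Everything that is not public domain is treated as in copyright here.
in_copyright = total - public_domain


def share(part, whole):
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


print("Public domain:", share(public_domain, total), "%")
print("In copyright:", share(in_copyright, total), "%")
```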
![Page 4: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/4.jpg)
Language Distribution (1)
The top 10 languages make up ~86% of all content
• English: 48%
• German: 8%
• French: 7%
• Russian: 5%
• Spanish: 4%
• Chinese: 4%
• Japanese: 4%
• Italian: 3%
• Arabic: 2%
• Polish: 1%
• Remaining Languages: 14%
* As of June 15, 2010
![Page 5: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/5.jpg)
Language Distribution (2)
The next 40 languages make up ~13% of total
[Pie chart: share of each language within the next 40. Legible labels: Hindi 6%, Portuguese 6%, Hebrew 6%, Indonesian 6%, Dutch 5%, Urdu 4%, Swedish 4%, Turkish 4%, Unknown 4%, Czech 3%, Thai 3%, Persian 3%, Undetermined 3%, Vietnamese 2%, Ukrainian 2%, Hungarian 2%, Korean 2%, Bengali 2%, Norwegian 2%, Armenian 1%, Greek 1%, Panjabi 1%, Malay 1%, Catalan 1%, Malayalam 1%, Finnish 1%. Also shown: Latin, Danish, Croatian, Tamil, Sanskrit, Bulgarian, Slovak, Serbian, Romanian, Ancient Greek, Yiddish, Slovenian, Multiple.]
* As of June 15, 2010
![Page 6: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/6.jpg)
Originating Institution
• University of Michigan: 65%
• University of California: 25%
• University of Wisconsin: 6%
• Indiana University: 3%
• University of Minnesota: 1%
• Penn State University: 0%
* As of June 15, 2010
![Page 7: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/7.jpg)
Content over time
[Chart: share of content (0% to 100%) by originating institution over time, with the time axis starting at Sep-04. Series: Michigan, Wisconsin, Indiana, California, Penn State, Minnesota.]
* As of June 15, 2010
![Page 8: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/8.jpg)
Content Growth
![Page 9: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/9.jpg)
![Page 10: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/10.jpg)
Data Distribution & APIs
• OAI-PMH
• Metadata files
• Bibliographic API
• Data API
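To make the OAI-PMH item concrete, here is a minimal sketch of how a harvester might build a `ListRecords` request and read Dublin Core titles from a response. The base URL, the sample record, and the function names are assumptions for illustration; the slide does not give an endpoint, and no network call is made. Only the `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the repository's real OAI-PMH base URL.
BASE_URL = "https://example.org/oai"

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL (the optional set limits the harvest)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Made-up response body, shaped like an OAI-PMH ListRecords reply.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:0001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Waverley</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS)
        records.append((identifier, title))
    return records


print(list_records_url(BASE_URL))
print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also follow the `resumptionToken` element to page through large result sets, which this sketch omits.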
![Page 11: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/11.jpg)
Extended Services
• Community Development Environment
• Non-Google Ingest
• Non-Book/Non-Journal Ingest
• Computational Research
![Page 12: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/12.jpg)
Strategies for Computational Research
• Data distribution
• Protocol-based access
• Research Center
![Page 13: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/13.jpg)
![Page 14: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/14.jpg)
SEASR Architecture
[Architecture diagram. Labeled elements: Visualizations, Apps, Services, Plugins, Web Apps, User Interfaces, Components, Meandre Data-Intensive Flows, Developer Tools, Repositories, Meandre Workbench, Meandre Infrastructure, Visualization, Component Repository, Component Discovery, Analytics, Data, Analysis, Flows, Virtualization Infrastructure, Cloud Computing.]
![Page 15: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/15.jpg)
SEASR @ Work – Tag Cloud
• Count tokens
• Filter options supported
• Stem words
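The three steps on this slide (count tokens, filter, stem) can be sketched in a few lines. This is not SEASR code: the stop-word list and the suffix-stripping `naive_stem` function are toy stand-ins chosen for illustration, and a real flow would likely use a proper stemmer.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption, not SEASR's filter set).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are"}


def naive_stem(word):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tag_cloud_counts(text, stop_words=STOP_WORDS, stem=True):
    """Tokenize, drop stop words, optionally stem, and count what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in stop_words]
    if stem:
        tokens = [naive_stem(t) for t in tokens]
    return Counter(tokens)


counts = tag_cloud_counts("Counting tokens: the token counts are counted")
print(counts.most_common())
```

The resulting counts would drive the font size of each word in the cloud.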
![Page 16: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/16.jpg)
SEASR @ Work – Entity Mash-up
• Entity Extraction with OpenNLP or Stanford NER
• Locations viewed on Google Map
• Dates viewed on Simile Timeline
![Page 17: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/17.jpg)
SEASR @ Work – Entities To Network
• Identify entities
• Define relationships between entities within same sentence
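A minimal sketch of the same-sentence rule: two entities are linked whenever they appear in one sentence, and the edge weight counts how often that happens. The entity list is supplied by hand here as a stand-in for a real entity extractor, and the sentence splitter and sample text are deliberately naive assumptions.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, entities):
    """Count entity pairs that occur together within the same sentence."""
    edges = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        # Sort so each pair is always stored in the same order.
        present = sorted(e for e in entities if e in sentence)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Made-up sample text and entity list.
sample = "Alice met Bob in Paris. Bob wrote to Carol. Carol stayed home."
edges = cooccurrence_edges(sample, ["Alice", "Bob", "Carol", "Paris"])
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

The edge list can be passed to any graph library for layout and display.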
![Page 18: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/18.jpg)
SEASR @ Work – Text Clustering
• Clustering of Text by token counts
• Filtering options for stop words, Part of Speech
• Dendrogram Visualization
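Clustering by token counts can be sketched as follows: each text becomes a bag of token counts, and the two closest clusters are merged repeatedly until one tree remains. The slide does not say which distance or linkage SEASR uses, so cosine distance and single linkage are assumptions here. The nested tuples returned are the tree that a dendrogram would draw.

```python
import math
import re
from collections import Counter


def token_counts(text):
    """Bag-of-words vector: token -> count."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return 1.0 - dot / norm if norm else 1.0


def cluster(docs):
    """Single-linkage agglomerative clustering; needs at least one document.

    Returns the merge tree as nested tuples of document names.
    """
    vectors = {name: token_counts(text) for name, text in docs.items()}
    # Each cluster is (tree, member document names).
    clusters = [(name, [name]) for name in docs]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(
                    cosine_distance(vectors[x], vectors[y])
                    for x in clusters[i][1]
                    for y in clusters[j][1]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = (
            (clusters[i][0], clusters[j][0]),
            clusters[i][1] + clusters[j][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


docs = {"a": "cat cat dog", "b": "cat dog dog", "c": "ship sails sea"}
print(cluster(docs))
```

Stop-word or part-of-speech filtering would be applied inside `token_counts` before the vectors are compared.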
![Page 19: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/19.jpg)
SEASR @ Work – Audio Analysis
• NEMA: Executes a SEASR flow for each run
– Loads audio data
– Extracts features for every 10 sec moving window of audio
– Loads and applies the models
– Sends results back to the WebUI
• NESTER: Annotation of Audio via Spectral Analysis
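The "10 sec moving window" step can be pictured as sliding a fixed-length window over the samples and computing one feature vector per position. In the sketch below, the 5-second hop and the RMS energy feature are assumptions; the slide states neither the hop size nor which features NEMA extracts.

```python
import math


def moving_windows(samples, sample_rate, window_sec=10.0, hop_sec=5.0):
    """Yield (start time in seconds, window samples) for each full window."""
    win = int(window_sec * sample_rate)
    hop = int(hop_sec * sample_rate)
    for start in range(0, len(samples) - win + 1, hop):
        yield start / sample_rate, samples[start:start + win]


def rms(window):
    """Root-mean-square energy, used here as a placeholder feature."""
    return math.sqrt(sum(x * x for x in window) / len(window))


# Toy signal: 30 seconds of constant amplitude at 10 samples per second.
sample_rate = 10
samples = [1.0] * (30 * sample_rate)

features = [(t, rms(w)) for t, w in moving_windows(samples, sample_rate)]
print(features)
```

Each feature vector would then be passed to the loaded models, with the predictions returned to the web interface.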
![Page 20: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/20.jpg)
SEASR @ Work – Zotero
• Plugin to Firefox
• Zotero manages the collection
• Launch SEASR Analytics
– Citation Analysis uses the JUNG network importance algorithms to rank the authors in the citation network that is exported as RDF data from Zotero to SEASR
– Zotero Export to Fedora through SEASR
– Saves results from SEASR Analytics to a Collection
• Launch MONK Processing
– MONK DB Ingestion Workflow
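To illustrate what ranking authors in a citation network involves, the sketch below runs a plain PageRank iteration over a tiny made-up graph. PageRank is used only as a stand-in for a network importance measure: the slide does not say which JUNG algorithm the citation analysis applies, and this is pure Python rather than JUNG or SEASR code.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given (citing, cited) edge pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node with no outgoing links spreads its rank evenly.
            targets = out_links[src] or nodes
            portion = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += portion
        rank = new_rank
    return rank


# Made-up citation graph: A cites C, B cites C, C cites A.
ranks = pagerank([("A", "C"), ("B", "C"), ("C", "A")])
for author, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(author, round(score, 3))
```

In this toy graph the most-cited author, C, ranks highest, and B, whom nobody cites, ranks lowest.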
![Page 21: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/21.jpg)
SEASR @ Work – Emotion Tracking
The goal is a visualization of this type that tracks emotions across a text document (leveraging flare.prefuse.org)
![Page 22: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/22.jpg)
Sentiment Analysis: Visualization
![Page 23: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/23.jpg)
Person Extraction: Scott's Waverley, Ivanhoe, and The Heart of Midlothian.
![Page 24: ALA 2010 -- Jeremy York](https://reader034.vdocument.in/reader034/viewer/2022051609/5479d52bb4af9fe2158b4955/html5/thumbnails/24.jpg)
Location Extraction:
Top: Walter Scott's Waverley
Bottom: Maria Edgeworth's Castle Rackrent