![Page 1: Michael Krot, Data Manager and David Yakimischak, CTO krot@jstor, davidyak@jstor](https://reader035.vdocument.in/reader035/viewer/2022062304/568140c8550346895dac935f/html5/thumbnails/1.jpg)
Michael Krot, Data Manager, and
David Yakimischak, CTO
[email protected], [email protected]
http://www.jstor.org
JSTOR Mission
• JSTOR is a not-for-profit organization with a mission to help the scholarly community take advantage of the advances in information technology. This includes: (1) building a reliable and comprehensive archive of core scholarly journals, and (2) dramatically improving access to this scholarly material
• In pursuing its mission, JSTOR takes a system-wide perspective, seeking benefits for libraries, publishers and scholars
Currently
• Over 1,000 U.S. Participating Sites
• Over 700 International Participating Sites
• Over 200 Participating Publishers
• Over 300 Publications
• Broad coverage of disciplines
• 14 million pages scanned (and counting)
(average 10,000 pages per day)
Monthly Usage
[Chart: Meaningful Accesses per month, January 1997 through October 2003; vertical axis from 0 to 20,000,000]
OAI-PMH Project Background
• JSTOR has shared metadata for some applications
• However, we use proprietary data formats and transmission methods
• OAI-PMH had the right characteristics
• But we are rewriting our system
• Gave us a chance to learn new techniques
• Forced the separation of server from data
Purpose of this Presentation
• Overview of JSTOR OAI-PMH System
• Constraints
• Process
• Design
• Sharing our observations
Constraints
• Large amount of data (2.5 million articles)
• Content restricted by subscription
• Authorization System in transition
• Metadata store in transition
• Code must be sharable with others, in Java
• Lots of uncertainty!
Process
• Initial Requirements Gathering
- No existing software for our needs
- Current JSTOR System inadequate
• Unified Process/UML
• Outside advisors (Object Insight)
• Create pluggable parts to handle uncertainty
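One way to read "pluggable parts" is that the server talks to the metadata store and the authorization system only through small interfaces, so each can be replaced while those systems are in transition. The sketch below illustrates that idea only. `MetadataStore`, `Authorizer`, `InMemoryStore`, `OaiServer`, and the `oai:example.org` identifiers are hypothetical names invented for this example and are not taken from the JSTOR code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical plug-in points; these names are illustrative, not JSTOR's.
interface MetadataStore {
    List<String> findIdentifiers(String setSpec);
}

interface Authorizer {
    boolean mayHarvest(String harvesterId, String setSpec);
}

// Stand-in store that a real metadata system could later replace.
final class InMemoryStore implements MetadataStore {
    @Override
    public List<String> findIdentifiers(String setSpec) {
        List<String> ids = new ArrayList<>();
        ids.add("oai:example.org:" + setSpec + "/1");
        ids.add("oai:example.org:" + setSpec + "/2");
        return ids;
    }
}

// The server sees only the two interfaces, so either part can be swapped.
final class OaiServer {
    private final MetadataStore store;
    private final Authorizer authorizer;

    OaiServer(MetadataStore store, Authorizer authorizer) {
        this.store = store;
        this.authorizer = authorizer;
    }

    List<String> listIdentifiers(String harvesterId, String setSpec) {
        if (!authorizer.mayHarvest(harvesterId, setSpec)) {
            return Collections.emptyList();
        }
        return store.findIdentifiers(setSpec);
    }
}

final class PluggableDemo {
    public static void main(String[] args) {
        // A permissive authorizer plugged in as a lambda.
        Authorizer allowAll = (harvesterId, setSpec) -> true;
        OaiServer server = new OaiServer(new InMemoryStore(), allowAll);
        System.out.println(server.listIdentifiers("harvester-1", "economics"));
    }
}
```

Swapping in a different `Authorizer` or `MetadataStore` leaves `OaiServer` unchanged, which is the property that helps when the underlying systems are still being rewritten.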
![Page 11: Michael Krot, Data Manager and David Yakimischak, CTO krot@jstor, davidyak@jstor](https://reader035.vdocument.in/reader035/viewer/2022062304/568140c8550346895dac935f/html5/thumbnails/11.jpg)
Use Case Diagram
Initial Steps
• ‘Retrieve Bibliographic Records’ Use Case
• Use cases gave insight into Search/Auth requirements
– Repository would have to handle increments, counts
– Auth would have to know about harvester sets
• Use Case Analysis using Collaboration Diagram (MVC)
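The two requirements above (a repository that handles increments and counts, and authorization that knows which sets a harvester may see) can be sketched as a date-window query filtered by permitted sets. This is a minimal illustration under assumptions: `BibRecord`, `IncrementalRepository`, and the sample set names are hypothetical, and a real repository would query a database rather than scan a list.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical record shape: identifier, set, and date of last change.
final class BibRecord {
    final String id;
    final String setSpec;
    final LocalDate datestamp;

    BibRecord(String id, String setSpec, LocalDate datestamp) {
        this.id = id;
        this.setSpec = setSpec;
        this.datestamp = datestamp;
    }
}

final class IncrementalRepository {
    private final List<BibRecord> records;

    IncrementalRepository(List<BibRecord> records) {
        this.records = records;
    }

    // Increment: records changed within [from, until], both ends inclusive,
    // restricted to the sets this harvester is permitted to see.
    List<BibRecord> changedBetween(LocalDate from, LocalDate until, Set<String> permittedSets) {
        List<BibRecord> out = new ArrayList<>();
        for (BibRecord r : records) {
            boolean inWindow = !r.datestamp.isBefore(from) && !r.datestamp.isAfter(until);
            if (inWindow && permittedSets.contains(r.setSpec)) {
                out.add(r);
            }
        }
        return out;
    }

    // Count: size of the same increment, useful for reporting a list size up front.
    int countChangedBetween(LocalDate from, LocalDate until, Set<String> permittedSets) {
        return changedBetween(from, until, permittedSets).size();
    }
}

final class IncrementalDemo {
    public static void main(String[] args) {
        IncrementalRepository repo = new IncrementalRepository(Arrays.asList(
                new BibRecord("a", "economics", LocalDate.of(2003, 1, 10)),
                new BibRecord("b", "history", LocalDate.of(2003, 2, 5)),
                new BibRecord("c", "economics", LocalDate.of(2003, 3, 1))));
        Set<String> permitted = new HashSet<>(Arrays.asList("economics"));
        System.out.println(repo.countChangedBetween(
                LocalDate.of(2003, 1, 1), LocalDate.of(2003, 2, 28), permitted));
    }
}
```

Keeping the permitted sets as an argument means the repository needs no knowledge of subscriptions; the authorization component decides what to pass in.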
![Page 13: Michael Krot, Data Manager and David Yakimischak, CTO krot@jstor, davidyak@jstor](https://reader035.vdocument.in/reader035/viewer/2022062304/568140c8550346895dac935f/html5/thumbnails/13.jpg)
Retrieve Bibliographic Records Collaboration Diagram
![Page 14: Michael Krot, Data Manager and David Yakimischak, CTO krot@jstor, davidyak@jstor](https://reader035.vdocument.in/reader035/viewer/2022062304/568140c8550346895dac935f/html5/thumbnails/14.jpg)
View of Participating Classes
![Page 15: Michael Krot, Data Manager and David Yakimischak, CTO krot@jstor, davidyak@jstor](https://reader035.vdocument.in/reader035/viewer/2022062304/568140c8550346895dac935f/html5/thumbnails/15.jpg)
J2EE Design Patterns
Current Issues, Questions
• What's “new to repository” vs. “new to subscription”
• Resumption Tokens
• Compression
• Associating metadata formats with types of objects returned from search
• Development nears completion
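To make the resumption-token question concrete, here is a hedged sketch of one common approach: the token carries the state needed to continue a partial list (here an offset and the metadata prefix), encoded so that it is opaque to the harvester. It does not describe the JSTOR implementation. `ResumptionToken`, `IdentifierPage`, `PagedLister`, and the `offset|prefix` layout are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical token: the state needed to continue a partial list.
final class ResumptionToken {
    final int offset;
    final String metadataPrefix;

    ResumptionToken(int offset, String metadataPrefix) {
        this.offset = offset;
        this.metadataPrefix = metadataPrefix;
    }

    // Encode as URL-safe Base64 so the harvester treats the token as opaque.
    String encode() {
        String raw = offset + "|" + metadataPrefix;
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static ResumptionToken decode(String token) {
        String raw = new String(Base64.getUrlDecoder().decode(token), StandardCharsets.UTF_8);
        int bar = raw.indexOf('|');
        return new ResumptionToken(Integer.parseInt(raw.substring(0, bar)), raw.substring(bar + 1));
    }
}

// One page of results; nextToken is null when the list is complete.
final class IdentifierPage {
    final List<String> identifiers;
    final String nextToken;

    IdentifierPage(List<String> identifiers, String nextToken) {
        this.identifiers = identifiers;
        this.nextToken = nextToken;
    }
}

final class PagedLister {
    private final List<String> all;
    private final int pageSize;

    PagedLister(List<String> all, int pageSize) {
        this.all = all;
        this.pageSize = pageSize;
    }

    IdentifierPage first(String metadataPrefix) {
        return slice(new ResumptionToken(0, metadataPrefix));
    }

    IdentifierPage resume(String token) {
        return slice(ResumptionToken.decode(token));
    }

    private IdentifierPage slice(ResumptionToken t) {
        int end = Math.min(t.offset + pageSize, all.size());
        List<String> ids = new ArrayList<>(all.subList(t.offset, end));
        String next = end < all.size()
                ? new ResumptionToken(end, t.metadataPrefix).encode()
                : null;
        return new IdentifierPage(ids, next);
    }
}

final class ResumptionDemo {
    public static void main(String[] args) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            all.add("id" + i);
        }
        PagedLister lister = new PagedLister(all, 2);
        IdentifierPage page = lister.first("oai_dc");
        while (true) {
            System.out.println(page.identifiers);
            if (page.nextToken == null) {
                break;
            }
            page = lister.resume(page.nextToken);
        }
    }
}
```

A stateless token like this avoids server-side session storage, but an offset can drift if records are added or removed between requests. The “new to repository” vs. “new to subscription” question is one case where that matters.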
Conclusion
• Constraints, Process, Sharing our Findings
• Load Testing has been helpful
• Internal Use First
• How and when to introduce externally
• Possibility of sharing code, UML
• Need for a harvester, internally and externally
• Paper Available
• [email protected], [email protected]
• Questions?