Google Print, Million Book Project, and Google Scholar. Digital Libraries Colloquium, January 27, 2005. Gloriana St. Clair, Dean of University Libraries


Page 1: Google Print ™, Million Book Project, and Google Scholar ™ Digital Libraries Colloquium January 27, 2005 Gloriana St. Clair Dean of University Libraries

Google Print™, Million Book Project, and Google Scholar™

Digital Libraries Colloquium
January 27, 2005

Gloriana St. Clair

Dean of University Libraries

Page 2:

“This is the day the world changes.”

John Wilkin, University of Michigan [2]

“Commercialize the great research libraries with a handshake, suddenly and epochally.”

Rory Litwin, in Library Juice [1]

Page 3:

Thesis

Google’s new projects are exciting and, of course, commercial

This talk will compare Google Print™ with the NSF-funded Million Book Project, and then touch briefly on Google Scholar™

Page 4:

Main Points

● Why / Genesis - Leaders, Partners
● Realities - Collections, Logistics
● Worries - Duplication, Copyright, Copyright, Copyright, Printing . . .

Page 5:

Sources For This Talk

News / web / talks / interviews, with help from:

● Jean Alexander, Head, and the Hunt Library Reference Department
● Denise Troll Covey, Special Projects Librarian
● Missy Harvey, Computer Science Librarian
● Penn State Reference Department
● David Seaman, Digital Library Federation
● Anthony Tomasic, E-XMLMedia
● Michael Lesk, Rutgers University

Page 6:

Google Print™ Leaders/Partners

● Google, Inc.
● U. Michigan
● Stanford University
● Harvard University
● U. Oxford
● New York Public Library

Page 7:

Million Book Project Leaders/Partners in India

● Indian Institute of Science
● International Institute of Information Technology
● Indian Institute of Information Technology
● Anna University
● Mysore University
● University of Pune
● Goa University
● Tirumala Tirupati Devasthanams
● Shanmugha Arts, Science, Technology & Research Academy
● Arulmigu Kalasalingam College of Engineering
● Maharashtra Industrial Development Corporation

Page 8:

Million Book Project Leaders/Partners in China

● Chinese Academy of Science
● Chinese Ministry of Education
● Fudan University
● Nanjing University
● Peking University
● Tsinghua University
● Zhejiang University

Page 9:

Google Print™ Collections

● Stanford – entire collection
● Harvard – 40,000-volume pilot from a 15-million-volume collection
● U. Michigan – virtually the entire collection; add seven million to search engine; Michigan to “receive and own a high quality digital copy” [3] and provide access
● New York Public Library – a subset of a 20-million-volume collection; selection criteria = in public domain (pre-1923), interesting, not too fragile

Page 10:

Million Book Project Targeted Subcollections

● Books for College Libraries (best books)
● University presses / scholarly societies (copyright permissions work)
● U.N.’s Food and Agriculture Organization content

Page 11:

Google Print™ Handling the Copyright Issue

● Displays “a snippet of text” [4] online for books in copyright
● A ‘snippet’ is defined as three lines
● A search returns three snippets per book, and lists the number of times your search terms appear in the book
● BUY button
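The display rules on this slide can be sketched in a few lines of Python. This is only an illustration, not Google's implementation: it assumes a book is available as a list of text lines, treats a snippet as three consecutive lines around a matching line, and every name in it is hypothetical.

```python
from typing import List, Tuple

SNIPPET_LINES = 3  # a snippet is defined as three lines
MAX_SNIPPETS = 3   # a search returns three snippets per book


def search_book(lines: List[str], term: str) -> Tuple[int, List[List[str]]]:
    """Return (total occurrences of term, up to three three-line snippets)."""
    needle = term.lower()
    # Count every occurrence of the term in the whole book.
    total = sum(line.lower().count(needle) for line in lines)

    snippets: List[List[str]] = []
    next_free = 0  # first line index not already shown in a snippet
    for i, line in enumerate(lines):
        if len(snippets) == MAX_SNIPPETS:
            break
        if needle in line.lower() and i >= next_free:
            # Centre the snippet on the match, clamped to the book's bounds
            # and kept clear of lines already shown (assumed behaviour).
            start = max(0, min(i - 1, len(lines) - SNIPPET_LINES))
            start = max(start, next_free)
            snippets.append(lines[start:start + SNIPPET_LINES])
            next_free = start + SNIPPET_LINES
    return total, snippets


if __name__ == "__main__":
    book = ["alpha", "the whale surfaces", "beta", "gamma",
            "delta", "a whale, a whale", "epsilon"]
    count, shown = search_book(book, "whale")
    print(count, "occurrences;", len(shown), "snippets shown")
```

Capping the output at three snippets while still reporting the full occurrence count mirrors the slide: the reader learns how relevant the book is without seeing more than a few lines of in-copyright text.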

Page 12:

Million Book Project Handling the Copyright Issue

After extensive work, we are experiencing growing success in efforts to gain permission from university presses / scholarly societies to digitize books in searchable full text

Page 13:

Million Book Project Research Initiatives

● Machine translation
● Massive distributed database
● Storage formats
● Use of digital libraries
● Distribution and sustainability
● Security
● Search engines
● Image processing
● Optical Character Recognition (OCR)
● Language processing
● Copyright laws

Google™ began as a research project at Stanford in 1995.

Page 14:

Google Print™ Logistics

“Google will be doing all the digitizing with their own staff at Google headquarters and supposedly at Harvard and Michigan.” [5]

● Six-year time frame
● 2.25 books per minute
● Onsite
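As a rough check on these figures, the stated rate can be multiplied out over the stated time frame. The sketch below assumes, purely for illustration, that scanning runs around the clock every day of the year; the slide does not state a duty cycle, and the function name is hypothetical.

```python
DAYS_PER_YEAR = 365


def books_scanned(books_per_minute: float, years: float,
                  hours_per_day: float = 24.0) -> float:
    """Total books scanned at a constant rate over the given period."""
    # hours_per_day = 24 is an assumption, not a figure from the talk.
    minutes_per_year = hours_per_day * 60 * DAYS_PER_YEAR
    return books_per_minute * minutes_per_year * years


if __name__ == "__main__":
    total = books_scanned(2.25, 6)
    print(f"{total:,.0f} books in six years")
```

Under the round-the-clock assumption the total is about 7.1 million books, the same order as the seven million volumes cited for Michigan. A shorter working day would lower the total proportionally.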

Page 15:

Million Book Project Logistics

● With scanning time @ one page per second:
● 20,000 pages per day shift x 200 working days per year
● 100 years to scan 1 million books ÷ (number of operators/machines)
● Several mega scanning centers are set up in India and China
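The estimate on this slide can be reproduced with a short calculation. The average book length of 400 pages is an assumption inferred from the slide's own numbers (it is the value that yields 100 years for a single machine); the function and parameter names are illustrative.

```python
def years_to_scan(total_books: int, pages_per_book: int,
                  pages_per_shift: int, working_days: int,
                  machines: int = 1) -> float:
    """Years needed to scan a collection with a given number of machines."""
    pages_per_year = pages_per_shift * working_days * machines
    return total_books * pages_per_book / pages_per_year


if __name__ == "__main__":
    # 400 pages per book is an inferred assumption, not stated in the talk.
    for machines in (1, 10, 50):
        years = years_to_scan(1_000_000, 400, 20_000, 200, machines)
        print(f"{machines:>3} machine(s): {years:.1f} years")
```

Because the time falls in direct proportion to the number of operators and machines, large parallel scanning centers are what bring the project within a workable time frame.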

Page 16:

Million Book Project Finances

India - $25M annually to support a large set of language translation research projects

China - $8.46M from Ministry of Education over 3 yrs (2006)

United States - $3.63M from NSF over 4 yrs (2005); and equipment, staff and money from the Internet Archive

Google Print™ has funding of $???, but estimates costs at $10 per book.

Page 17:

Worries

Duplication

● “De-duplication is NOT part of the [Google Print™] process. NOTE Stanford is interested in having multiple copies of the same materials across various partners.” [6]
● Million Book Project will use OCLC’s Digital Registry as soon as batch loading is available.

Page 18:

Worries

Copyright

● “Google will be responsible for determining what’s in copyright.” [7]
● “A team is working on copyright issues but, in the meantime, Google is treating [copyright] conservatively.” [8]

Printing

● “Google will disable printing for out-of-copyright books.” [9]

Page 19:

More Worries Google Print™

Rory Litwin, “On Google’s Monetization of Libraries” [10]

1. Privacy [cookies]
2. Introduction of commercial bias
3. Questions about democratization and equity of access
4. Disintermediation issues
5. Decontextualization of knowledge
6. Closing of the information commons

Page 20:

More Worries Million Book Project

1. Getting it done

2. Sustainability

3. Cohesion of content

4. Usefulness

Page 21:

Google Scholar™ Beta

Reviewed by Péter’s [11], Anthony Tomasic, and reference librarians at Carnegie Mellon and Penn State:

● Not as good as Citebase, Research Index, RePEc/LogEc (Péter’s)
● Not as good as CiteSeer (Tomasic)
● Not as suitable as CiteSeer (Lesk)
● Not as good as Google press releases indicate (St. Clair)

Page 22:

Google Scholar™ Beta

What: [12]

● Offers free access to bibliographic records and some abstracts
● May lead to full text if the university library subscribes or if free-to-read
● May lead to a document delivery company
● Does not penetrate the invisible Web
● Has significantly enlarged the scope by crawling additional publishers, preprint and reprint servers
● Competes with other aggregators, such as SFX

Page 23:

Google Scholar™ Beta

What:

● Meets the needs of students looking for a different kind of material, and targets advertising to them
● It is easy for a human to identify a scholarly article, but it is a challenge for a machine (Tomasic)

Page 24:

Additional Challenges for a Better Scholarly Search Engine [13]

● Exploit highly structured and tagged web pages with rich metadata from scholarly publishers
● Create field-specific indexes for many distinct data elements
● Offer advanced navigation with pull-down menus for limited search by document type, publisher, publication year, journal
● Consolidate cited references
● Collect information from all relevant materials
● Develop utilities to help libraries find all materials subscribed to, not just one path
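Two of the points on this slide, field-specific indexes and limited search by document type, publisher, publication year, or journal, can be illustrated with a small sketch. The record fields, sample values, and class name are assumptions chosen for illustration; they do not describe any existing search engine.

```python
from collections import defaultdict
from typing import Dict, List, Set


class FieldIndex:
    """One inverted index per metadata field (e.g. publisher, year)."""

    def __init__(self) -> None:
        # field name -> normalized field value -> ids of matching records
        self._index: Dict[str, Dict[str, Set[int]]] = defaultdict(
            lambda: defaultdict(set))
        self._records: List[Dict[str, str]] = []

    def add(self, record: Dict[str, str]) -> int:
        """Store a record and index each of its fields separately."""
        record_id = len(self._records)
        self._records.append(record)
        for field, value in record.items():
            self._index[field][value.lower()].add(record_id)
        return record_id

    def search(self, **limits: str) -> List[Dict[str, str]]:
        """Return the records that match every field=value limit given."""
        matches = set(range(len(self._records)))
        for field, value in limits.items():
            matches &= self._index.get(field, {}).get(value.lower(), set())
        return [self._records[i] for i in sorted(matches)]


if __name__ == "__main__":
    index = FieldIndex()
    # Hypothetical records for illustration only.
    index.add({"doc_type": "article", "publisher": "Press A", "year": "2004"})
    index.add({"doc_type": "preprint", "publisher": "Press A", "year": "2004"})
    for hit in index.search(doc_type="article", year="2004"):
        print(hit)
```

Each pull-down menu in an advanced search form would map onto one keyword argument of `search`, so a limit such as publication year narrows the result set without a full-text scan.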

Page 25:

Thank you

Gloriana St. Clair
Dean of University Libraries
Carnegie Mellon
[email protected] or 412-268-2447

If you would like an electronic copy of this talk, contact Cindy Carroll, [email protected]

Page 26:

Endnotes

1. Litwin, Rory. “On Google’s Monetization of Libraries.” Library Juice 7,26 (December 17, 2004). Available: http://www.libr.org/Juice/issues/vol7/LJ_7.26.html#3.

2. Wilkin, John. Quoted in “Google to Scan Books from Major Libraries.” MSNBC Tech News & Reviews. Available: http://www.msnbc.msn.com/id/6709342.

3. University of Michigan (Nancy Connell). “Google/U-M Project Opens the Way to Universal Access to Information.” University of Michigan News Service (December 14, 2004). Available: http://www.umich.edu/news/?Releases/2004/Dec04/library/index.

4. University of Michigan. “Google/U-M Project Questions and Answers.” The University Record Online (January 7, 2005). Available: http://www.umich.edu/~urecord/0405/Dec13_04/lib_qa.shtml.

Page 27:

Endnotes

5. Misseli. “The Google Deal (Down on the Farm).” Message posted by a Stanford staff member to Confessions of a Mad Librarian. Available: http://edwards.orcas.net/~misseli/blog/archives/000222.html.

6. Ibid.

7. Ibid.

8. Adam Smith, Senior Business and Product Manager for Google Print and Google Scholar, speaking informally with the ALA Electronic Text Centers Discussion Group. American Library Association Mid-Winter Conference (January 15, 2005).

9. Price, Gary. “Google Partners with Oxford, Harvard & Others to Digitize Libraries.” Search Engine Watch (December 14, 2004). Available: http://searchenginewatch.com/searchday/article.php/3447411.

Page 28:

Endnotes

10. Litwin.

11. Péter’s Digital Reference Shelf. “Google Scholar Beta.” (December 2004). Available: http://www.galegroup.com/servlet/HTMLFileServlet?imprint=9999&region=7&fileName=reference/archive/200412/googlescholar.html.

12. Ibid.

13. Ibid.