Google Print™, Million Book Project, and Google Scholar™

Digital Libraries Colloquium
January 27, 2005
Gloriana St. Clair

Dean of University Libraries

“This is the day the world changes.”

John Wilkin, University of Michigan [2]

“Commercialize the great research libraries with a handshake, suddenly and epochally.”

Rory Litwin, in Library Juice [1]

Thesis

Google’s new projects are exciting and, of course, commercial

This talk will compare Google Print™ with the NSF-funded Million Book Project, and then touch briefly on Google Scholar™

Main Points

● Why / Genesis - Leaders, Partners
● Realities - Collections, Logistics
● Worries - Duplication, Copyright, Copyright, Copyright, Printing . . .

Sources For This Talk

News / web / talks / interviews, with help:
● Jean Alexander, Head, and the Hunt Library Reference Department
● Denise Troll Covey, Special Projects Librarian
● Missy Harvey, Computer Science Librarian
● Penn State Reference Department
● David Seaman, Digital Library Federation
● Anthony Tomasic, E-XMLMedia
● Michael Lesk, Rutgers University

Google Print™ Leaders/Partners

● Google, Inc.
● U. Michigan
● Stanford University
● Harvard University
● U. Oxford
● New York Public Library

Million Book Project Leaders/Partners in India

● Indian Institute of Science
● International Institute of Information Technology
● Indian Institute of Information Technology
● Anna University
● Mysore University
● University of Pune
● Goa University
● Tirumala Tirupati Devasthanams
● Shanmugha Arts, Science, Technology & Research Academy
● Arulmigu Kalasalingam College of Engineering
● Maharashtra Industrial Development Corporation

Million Book Project Leaders/Partners in China

● Chinese Academy of Science
● Chinese Ministry of Education
● Fudan University
● Nanjing University
● Peking University
● Tsinghua University
● Zhejiang University

Google Print™ Collections

● Stanford – entire collection
● Harvard – 40,000-volume pilot from a 15-million-volume collection
● U. Michigan – virtually the entire collection; add seven million to search engine; Michigan to “receive and own a high quality digital copy” [3] and provide access
● New York Public Library – a subset of a 20-million-volume collection; selection criteria = in the public domain (pre-1923), interesting, not too fragile

Million Book Project Targeted Subcollections

● Books for College Libraries (best books)
● University presses / scholarly societies (copyright permissions work)
● U.N.’s Food and Agriculture Organization content

Google Print™ Handling the Copyright Issue

● Displays “a snippet of text” [4] online for books in copyright
● A ‘snippet’ is defined as three lines
● A search returns three snippets per book, and lists the number of times your search terms appear in the book
● BUY button

Million Book Project Handling the Copyright Issue

After extensive work, we are experiencing growing success in efforts to gain permission from university presses / scholarly societies to digitize books in searchable full text

Million Book Project Research Initiatives

● Machine translation
● Massive distributed database
● Storage formats
● Use of digital libraries
● Distribution and sustainability
● Security
● Search engines
● Image processing
● Optical Character Recognition (OCR)
● Language processing
● Copyright laws

Google™ began as a research project at Stanford in 1995.

Google Print™ Logistics

“Google will be doing all the digitizing with their own staff at Google headquarters and supposedly at Harvard and Michigan.” [5]

● Six-year time frame
● 2.25 books per minute
● Onsite
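
As a rough check on the Google Print™ logistics figures, the sketch below multiplies out the quoted scanning rate. It assumes, hypothetically, that 2.25 books per minute is sustained around the clock, every day, for the full six years; the slide does not say how the rate is measured.

```python
# Rough throughput check for the Google Print figures on this slide.
# ASSUMPTION (not in the talk): the rate is sustained 24 hours a day,
# 365 days a year, for the whole six-year time frame.
BOOKS_PER_MINUTE = 2.25  # from the slide
YEARS = 6                # from the slide

MINUTES_PER_YEAR = 60 * 24 * 365  # ignores leap days


def books_scanned(rate_per_minute: float, years: int) -> float:
    """Total books scanned at a constant rate over the given number of years."""
    return rate_per_minute * MINUTES_PER_YEAR * years


total = books_scanned(BOOKS_PER_MINUTE, YEARS)
print(f"{total:,.0f} books in {YEARS} years")
```

Under that assumption the total comes to roughly seven million volumes, the same order as the seven million Michigan volumes mentioned on the collections slide.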

Million Book Project Logistics

● With scanning time @ one page per second:
● 20,000 pages per day shift x 200 working days per year
● 100 years to scan 1 million books ÷ (number of operators/machines)
● Several mega scanning centers are set up in India and China
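
The 100-year figure can be reproduced with a short back-of-envelope calculation. The sketch below takes the page rate and working days from the slide and assumes an average of 400 pages per book; that average is not stated in the talk and is chosen here only because it makes the slide's numbers agree.

```python
# Back-of-envelope check of the Million Book Project scanning estimate.
# ASSUMPTION (not stated in the talk): an average book has 400 pages.
PAGES_PER_DAY_SHIFT = 20_000   # from the slide
WORKING_DAYS_PER_YEAR = 200    # from the slide
BOOKS = 1_000_000              # project target
PAGES_PER_BOOK = 400           # hypothetical average


def years_to_scan(stations: int) -> float:
    """Years needed to scan every book with the given number of operators/machines."""
    pages_per_year = PAGES_PER_DAY_SHIFT * WORKING_DAYS_PER_YEAR * stations
    return BOOKS * PAGES_PER_BOOK / pages_per_year


# Hours of pure scanning in one day shift at one page per second.
scan_hours_per_shift = PAGES_PER_DAY_SHIFT / 3600

for stations in (1, 10, 25, 100):
    print(f"{stations:>3} station(s): {years_to_scan(stations):6.1f} years")
```

With a single operator and machine the estimate is 100 years; dividing by the number of stations shows why several large scanning centers are needed to finish in a few years.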

Million Book Project Finances

India - $25M annually to support a large set of language translation research projects

China - $8.46M from Ministry of Education over 3 yrs (2006)

United States - $3.63M from NSF over 4 yrs (2005); and equipment, staff and money from the Internet Archive

Google Print™ has funding of $???, but estimates costs at $10 per book.
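
To get a feel for what $10 per book implies, the sketch below applies that estimate to two collection sizes quoted earlier in the talk. Treating the per-book cost as flat across collections is an assumption made here for illustration; the talk gives no total budget for Google Print™.

```python
# Applies the talk's $10-per-book estimate to collection sizes quoted earlier.
# ASSUMPTION: the per-book cost is flat and the same for every collection.
COST_PER_BOOK_USD = 10  # estimate quoted in the talk

collections = {
    "Harvard pilot": 40_000,     # 40,000-volume pilot
    "U. Michigan": 7_000_000,    # seven million volumes
}


def estimated_cost(volumes: int, cost_per_book: int = COST_PER_BOOK_USD) -> int:
    """Estimated digitization cost in US dollars for a number of volumes."""
    return volumes * cost_per_book


for name, volumes in collections.items():
    print(f"{name}: ${estimated_cost(volumes):,}")
```

On these assumptions the Harvard pilot would cost about $400,000 and the Michigan collection about $70 million.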

Worries

Duplication
● “De-duplication is NOT part of the [Google Print™] process. NOTE Stanford is interested in having multiple copies of the same materials across various partners.” [6]
● Million Book Project will use OCLC’s Digital Registry as soon as batch loading is available.

Worries

Copyright
● “Google will be responsible for determining what’s in copyright.” [7]
● “A team is working on copyright issues but, in the meantime, Google is treating [copyright] conservatively.” [8]

Printing
● “Google will disable printing for out-of-copyright books.” [9]

More Worries Google Print™

Rory Litwin, “On Google’s Monetization of Libraries” [10]

1. Privacy [cookies]
2. Introduction of commercial bias
3. Questions about democratization and equity of access
4. Disintermediation issues
5. Decontextualization of knowledge
6. Closing of the information commons

More Worries Million Book Project

1. Getting it done

2. Sustainability

3. Cohesion of content

4. Usefulness

Google Scholar™ Beta

Reviewed by Péter’s Digital Reference Shelf [11], Anthony Tomasic, and reference librarians at Carnegie Mellon and Penn State:
● Not as good as Citebase, Research Index, RePEc/LogEc (Péter’s)
● Not as good as CiteSeer (Tomasic)
● Not as suitable as CiteSeer (Lesk)
● Not as good as Google press releases indicate (St. Clair)

Google Scholar™ Beta

What: [12]
● Offers free access to bibliographic records and some abstracts
● May lead to full text if the university library subscribes or if free-to-read
● May lead to a document delivery company
● Does not penetrate the invisible Web
● Has significantly enlarged the scope by crawling additional publishers, preprint and reprint servers
● Competes with other aggregators, such as SFX

Google Scholar™ Beta

What:
● Meets the needs of students looking for a different kind of material, and targets advertising to them
● It is easy for a human to identify a scholarly article, but it is a challenge for a machine (Tomasic)

Additional Challenges for a Better Scholarly Search Engine [13]

● Exploit highly structured and tagged web pages with rich metadata from scholarly publishers
● Create field-specific indexes for many distinct data elements
● Offer advanced navigation with pull-down menus for limited search by document type, publisher, publication year, journal
● Consolidate cited references
● Collect information from all relevant materials
● Develop utilities to help libraries find all materials subscribed to, not just one path
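
One way to picture the field-specific indexes and limited search listed above is the toy sketch below. The record fields, sample values, and function names (`build_field_indexes`, `limited_search`) are all hypothetical illustrations; nothing here describes how Google Scholar™ or any other engine is actually built.

```python
from collections import defaultdict

# Hypothetical bibliographic records; every value is made up for illustration.
RECORDS = [
    {"id": 1, "doc_type": "article", "publisher": "Example Press",
     "year": 2004, "journal": "Journal A"},
    {"id": 2, "doc_type": "preprint", "publisher": "Example Archive",
     "year": 2004, "journal": None},
    {"id": 3, "doc_type": "article", "publisher": "Example Society",
     "year": 2004, "journal": "Journal B"},
    {"id": 4, "doc_type": "article", "publisher": "Example Press",
     "year": 2003, "journal": "Journal A"},
]

# The data elements that get their own index (the pull-down menu fields).
INDEXED_FIELDS = ("doc_type", "publisher", "year", "journal")


def build_field_indexes(records):
    """Build one index per field, mapping each field value to a set of record ids."""
    indexes = {field: defaultdict(set) for field in INDEXED_FIELDS}
    for record in records:
        for field in INDEXED_FIELDS:
            indexes[field][record[field]].add(record["id"])
    return indexes


def limited_search(indexes, **limits):
    """Return the ids of records that match every field=value limit given."""
    result = None
    for field, value in limits.items():
        matches = indexes[field].get(value, set())
        # Intersect with earlier limits so that all of them must hold.
        result = set(matches) if result is None else result & matches
    return result if result is not None else set()


indexes = build_field_indexes(RECORDS)
print(sorted(limited_search(indexes, doc_type="article", year=2004)))
```

Keeping a separate index per data element is what lets a menu-driven search narrow by document type, publisher, year, or journal without scanning every record.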

Thank you

Gloriana St. Clair
Dean of University Libraries
Carnegie Mellon University
gstclair@andrew.cmu.edu or 412-268-2447

If you would like an electronic copy of this talk, contact Cindy Carroll, stell@cmu.edu

Endnotes

1. Litwin, Rory. “On Google’s Monetization of Libraries.” Library Juice 7, 26 (December 17, 2004). Available: http://www.libr.org/Juice/issues/vol7/LJ_7.26.html#3.

2. Wilkin, John. Quoted in “Google to Scan Books from Major Libraries.” MSNBC Tech News & Reviews. Available: http://www.msnbc.msn.com/id/6709342.

3. University of Michigan (Nancy Connell). “Google/U-M Project Opens the Way to Universal Access to Information.” University of Michigan News Service (December 14, 2004). Available: http://www.umich.edu/news/?Releases/2004/Dec04/library/index.

4. University of Michigan. “Google/U-M Project Questions and Answers.” The University Record Online (January 7, 2005). Available: http://www.umich.edu/~urecord/0405/Dec13_04/lib_qa.shtml.

5. Misseli. “The Google Deal (Down on the Farm).” Message posted by a Stanford staff member to Confessions of a Mad Librarian. Available: http://edwards.orcas.net/~misseli/blog/archives/000222.html.

6. Ibid.

7. Ibid.

8. Adam Smith, Senior Business and Product Manager for Google Print and Google Scholar, speaking informally with the ALA Electronic Text Centers Discussion Group. American Library Association Mid-Winter Conference (January 15, 2005).

9. Price, Gary. “Google Partners with Oxford, Harvard & Others to Digitize Libraries.” Search Engine Watch (December 14, 2004). Available: http://searchenginewatch.com/searchday/article.php/3447411.

10. Litwin.

11. Péter’s Digital Reference Shelf. “Google Scholar Beta.” (December 2004). Available: http://www.galegroup.com/servlet/HTMLFileServlet?imprint=9999&region=7&fileName=reference/archive/200412/googlescholar.html.

12. Ibid.

13. Ibid.
