CNI 17 April 2007 HASTAC 23 Feb 2007

An Update on Google Digitization at the University of Michigan

Actually, a 50-slide kitchen sink and everything in it
(Much of it stolen from John Wilkin)

Paul N. Courant

Outline of the talk

• Background: what it is and how we got there

• Status report: what’s been done, what’s being done, and how the work is done

• GBS (Google Book Search) and MBooks: two access systems

• Implications and questions

• Conversation

Part 1

Background

Beginnings

• A conversation one day in 2002

• Intensive planning and negotiations (2003-2004)

• Equipment design and refinement (2003-2004)

• Early deployment and growth (2004-2006)

Michigan’s deal

• What will be digitized?
  – 7 million volumes (University Library, bound print)

• Google covers costs

• Make searchable on Google

• What about the legal issues?

• What happens to the books?

• We get a copy of the images (more on this later)

• Contract available online

UM prior to Google

Digital initiatives going back to the 1980s:
• UMLibText (1989)
• Humanities Text Initiative (1994)
• PEAK, JSTOR, Making of America

University Library mission statement:

“To support, enhance, and collaborate in the instructional, research, and service activities of faculty, students, and staff, and contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.”

Why engage in this partnership?

Part 2

Status

Production status

• Google began capture at UM in July 2004

• UM receiving content continuously

• Large amounts of UM content went into GBS in November 2005

• Production ramp-up continues
  – New facility
  – Scaling (on pace to meet the 6-year target)
  – Pattern of doubling growth
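The slides do not put the two figures side by side, but if the 6-year target refers to the full 7 million volumes in Michigan's deal, the pace it implies can be worked out directly. The sketch below assumes exactly that, and models "doubling growth" as annual output that doubles every year; both are illustrative assumptions, since the slides list the production numbers themselves as secret.

```python
# Back-of-the-envelope pacing for the digitization target.
# ASSUMPTIONS (not stated together on the slides): the 6-year target covers
# the full ~7 million volumes, and "doubling growth" means annual output
# doubles exactly once a year. Real production figures were confidential.

TOTAL_VOLUMES = 7_000_000
YEARS = 6


def average_rates(total: int, years: int) -> tuple[float, float]:
    """Return the constant (per-year, per-day) rate that finishes on time."""
    per_year = total / years
    per_day = per_year / 365
    return per_year, per_day


def doubling_schedule(total: int, years: int) -> list[float]:
    """Split `total` over `years` so each year's output is twice the previous year's.

    Year k (counting from 0) gets 2**k shares out of 2**years - 1 shares in all.
    """
    shares = 2**years - 1
    first_year = total / shares
    return [first_year * 2**k for k in range(years)]


if __name__ == "__main__":
    per_year, per_day = average_rates(TOTAL_VOLUMES, YEARS)
    print(f"constant pace: {per_year:,.0f} volumes/year, {per_day:,.0f} volumes/day")
    for year, volumes in enumerate(doubling_schedule(TOTAL_VOLUMES, YEARS), start=1):
        print(f"year {year}: {volumes:,.0f} volumes")
```

At a constant pace this works out to roughly 1.17 million volumes a year, or about 3,200 a day. Under strict doubling, the first year would need only about 111,000 volumes and the last about 3.56 million, which is why a doubling pattern can still be "on pace" even when early totals look small.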

But how do they do it????

THE GOOGLE WORKSTATION (CONFIDENTIAL)

THE ANN ARBOR WORK GROUP

Accomplishments

• Getting books out and back
  – Developed workflows
  – Back on the shelf within 5–7 days

• Developed mechanisms for receiving content from Google

• Image quality: library preservation standards

• Pre-/post-condition survey

About the files

• Returned to UM: one package per volume, identified by barcode
  – 600 dpi TIFF, ITU G4 (bitonal) for print
  – 300 dpi JPEG2000, color/grayscale for illustrations
  – File naming conventions corresponding to UM specs
  – Checksums
  – Production notes
  – OCR (UTF-8 text files)
  – Page metadata: page numbers, page features, chapter start

• Quality control
  – Ongoing improvement of hardware/engineering
  – Image quality good and improving

• What is secret and why?
  – Technology
  – Numbers
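Because each returned package carries its own checksums, a receiving library can verify a package before ingesting it. A minimal sketch, assuming (hypothetically) that the checksums arrive as a single MD5 manifest in the md5sum format, one "digest, two spaces, file name" line per file. The slides do not say which algorithm or layout was actually used, and the manifest name checksum.md5 is invented for illustration.

```python
import hashlib
from pathlib import Path

# HYPOTHETICAL manifest name and format: one "<md5 hex digest>  <file name>"
# line per file, as written by the md5sum utility. The slides say only that
# checksums are included; algorithm and layout here are assumptions.
MANIFEST_NAME = "checksum.md5"


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_package(package_dir: Path) -> list[str]:
    """Return a sorted list of problems in one volume package (empty list = OK)."""
    manifest = package_dir / MANIFEST_NAME
    if not manifest.is_file():
        return [f"missing {MANIFEST_NAME}"]
    problems = []
    listed = set()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        listed.add(name)
        target = package_dir / name
        if not target.is_file():
            problems.append(f"listed but missing: {name}")
        elif md5_of(target) != expected.lower():
            problems.append(f"checksum mismatch: {name}")
    # Files delivered without a manifest entry are flagged too.
    for path in package_dir.iterdir():
        if path.is_file() and path.name != MANIFEST_NAME and path.name not in listed:
            problems.append(f"not in manifest: {path.name}")
    return sorted(problems)
```

Calling verify_package(Path("/path/to/package")) returns an empty list for a clean package. Flagging unlisted files as well as mismatches is a deliberate choice: an extra or missing page image is as much a delivery problem as a corrupted one.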

Workflow diagram

Accomplishments (2)

• September 2006: UM released a local implementation, MBooks
  – System for accessing our digital collections
  – Access through MIRLYN, the UM Library OPAC
  – Created Rights Database to manage access
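The Rights Database is described only as a way "to manage access". A minimal sketch of how an access system might consult such a database, assuming (hypothetically) a two-way split: full view for works recorded as public domain, search-only access for everything else, mirroring the out-of-copyright and in-copyright records shown later in the talk. The rights codes, volume ids, and the dict standing in for the database are invented for illustration.

```python
from enum import Enum

# Minimal sketch of an access check backed by a rights database.
# HYPOTHETICAL: the rights codes, the volume ids, and the dict standing in
# for the real database are invented for illustration.


class Rights(Enum):
    PUBLIC_DOMAIN = "pd"
    IN_COPYRIGHT = "ic"
    UNDETERMINED = "und"


class Access(Enum):
    FULL_VIEW = "full view"      # page images and text may be displayed
    SEARCH_ONLY = "search only"  # search within the text, no page display


def access_for(volume_id: str, rights_db: dict[str, Rights]) -> Access:
    """Map a volume's recorded rights status to an access level.

    Anything not positively recorded as public domain, including volumes
    missing from the database, falls back to search-only access.
    """
    rights = rights_db.get(volume_id, Rights.UNDETERMINED)
    if rights is Rights.PUBLIC_DOMAIN:
        return Access.FULL_VIEW
    return Access.SEARCH_ONLY


# Stand-in for the database: one out-of-copyright and one in-copyright volume.
RIGHTS_DB = {
    "vol-0001": Rights.PUBLIC_DOMAIN,
    "vol-0002": Rights.IN_COPYRIGHT,
}
```

The conservative default is the point of the design: a volume whose status has not yet been reviewed stays search-only until someone records it as public domain.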

Part 3

Two access systems

Bunker Hill

Record for out-of-copyright work

Search within the book

Walt Whitman

Record for in-copyright work

In copyright: search within text

Walt Whitman
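Search within a volume can be built from the per-page OCR text files returned in each package. A minimal sketch, assuming the OCR is already loaded into a dictionary of page number to text, and assuming that search-only (in-copyright) results expose page numbers and hit counts but not the text itself. The function name and its parameters are hypothetical.

```python
import re
from typing import Optional

# Minimal sketch of "search within the book" over per-page OCR text.
# ASSUMPTIONS: a volume's OCR is available as {page number: UTF-8 text}, and
# search-only (in-copyright) results expose page numbers and hit counts but
# no text. The function name and parameters are hypothetical.


def search_volume(
    pages: dict[int, str], query: str, full_view: bool, context: int = 30
) -> list[tuple[int, int, Optional[str]]]:
    """Return (page, hit_count, snippet) for each page containing `query`.

    Matching is case-insensitive and lets any run of whitespace (including
    OCR line breaks) separate the words of the query. When `full_view` is
    False, every snippet is None so no page text is revealed.
    """
    words = query.split()
    if not words:
        return []
    pattern = re.compile(r"\s+".join(re.escape(w) for w in words), re.IGNORECASE)
    results = []
    for page in sorted(pages):
        text = pages[page]
        matches = list(pattern.finditer(text))
        if not matches:
            continue
        snippet = None
        if full_view:
            # Show a little text around the first hit only.
            first = matches[0]
            start = max(0, first.start() - context)
            snippet = text[start : first.end() + context]
        results.append((page, len(matches), snippet))
    return results
```

Searching for "Walt Whitman" with full_view=False returns the same pages and counts as a full-view search, just without snippets, which is the distinction between the two records shown above.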

Why would UM put the materials online?

• Responsibility for the “archive”

• Michigan “audience” more specific and thus more specialized…
  – Rights that Google may not have
    • Current Section 108 provisions
    • Services for disabled users
    • Negotiated rights?
  – Functions that Google may not want to support
    • More flexible displays
    • More powerful citation tools
    • Power searches?
    • Data mining and other research applications

Part 4

Big questions and implications

Key contract provisions, pt. 1: the archive

• 4.4.1 Use of U of M Digital Copy on U of M Website. U of M shall have the right to use the U of M Digital Copy, in whole or in part at U of M's sole discretion, as part of services offered on U of M's website. U of M shall implement technological measures … to restrict automated access to any portion of the U of M Digital Copy …. U of M shall also make reasonable efforts … to prevent third parties from (a) downloading or otherwise obtaining any portion of the U of M Digital Copy for commercial purposes, (b) redistributing any portions of the U of M Digital Copy, or (c) automated and systematic downloading from its website image files from the U of M Digital Copy….

• Cf. Google: driven by a business model
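The contract asks for "technological measures" against automated and systematic downloading but does not say which. One common technique, shown here purely as an illustration and not as what UM deployed, is a per-client token bucket: each client may burst up to a fixed number of page-image requests, then is held to a steady rate. The class names, the keying by client id, and the default limits are all hypothetical.

```python
import time


class TokenBucket:
    """Per-client token bucket: allows short bursts, caps the sustained rate."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable clock, handy for testing
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; False means refuse the request."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class PageImageLimiter:
    """HYPOTHETICAL limiter for page-image requests, one bucket per client id."""

    def __init__(self, rate: float = 1.0, capacity: float = 20.0, clock=time.monotonic):
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, client_id: str) -> bool:
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[client_id] = bucket
        return bucket.allow()
```

The burst allowance lets a human reader page quickly through a handful of pages, while a script requesting every page image of a volume soon runs into the steady-rate ceiling.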

Key contract provisions, pt. 2: one archive or many?

• 4.4.2 Use of U of M Digital Copy in Cooperative Web Services. Subject to the restrictions set forth in this section, U of M shall have the right to use the U of M Digital Copy, in whole or in part at U of M's sole discretion, as part of services offered in cooperation with partner research libraries such as the institutions in the Digital Library Federation….

Creating a cooperative enterprise

• Original vision
  – Greater than the sum of our parts
  – Effort tied to the mission of our scholarly enterprise(s)
  – Extensible framework for content and services

• CIC (Committee on Institutional Cooperation) discussions
  – All 13 institutions participating
  – Two coordinated instances of replicated content
  – A foundation for creating shared definition

• Next?
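Keeping two coordinated instances of replicated content in step implies some way of checking that both copies agree. A minimal sketch, assuming each instance can export a manifest mapping volume id to a checksum of its package (a hypothetical format, not one described in the talk): comparing the two manifests shows what is missing on either side and which copies have diverged.

```python
def compare_replicas(site_a: dict[str, str], site_b: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {volume_id: checksum} manifests from replicated instances.

    Returns volume ids present only at A, only at B, and present at both
    but with different checksums (a sign that one copy is stale or damaged).
    The manifest format is a hypothetical assumption.
    """
    only_a = sorted(site_a.keys() - site_b.keys())
    only_b = sorted(site_b.keys() - site_a.keys())
    differ = sorted(v for v in site_a.keys() & site_b.keys() if site_a[v] != site_b[v])
    return {"only_a": only_a, "only_b": only_b, "differ": differ}
```

Comparing checksums rather than whole files keeps the audit cheap enough to run routinely between sites; a mismatch only says the copies disagree, not which one is correct.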

Transformative implications (2004)

• Broad, efficient, democratizing access

• Access as driver for …

• Exaggerating and resolving IP issues

• Creation of cooperative “universal” library

• Exacerbating paradox of “library as place”

• Facilitating “specialization” (ceding “generalist” role to Google)

• Freeing up resources for related issues (e.g., institutional repositories, scholarly communication)

Implications for users: the obvious

• I just came across a copy of the classic philosophy text "Principia Ethica" in your catalog of books and what I want to tell you is WOW!!! ... This is an extraordinary service to the public.... What is interesting, though, is that for those books you have online that are different editions from the one I own, I am inclined to buy the books anew so as to be able to refer to the copy you have online... I know of no better method for doing textual analysis than by using your service.

• I am writing a paper on the history of statistics and was disheartened when I picked up Claude Bernard's Introduction to Experimental Medicine - because it did not have an index. But then I found you.

Use and users, more

• Those who “rediscover” their own collections

• Those who find what they would not have found otherwise

• Not new ways of reading, but efficiencies only imagined

IP Issues

• Orphaned works

• Section 108

• The immovable object and the irresistible force

• What might a reasonable arrangement look like?
  – Let books be books?

Universal Library

• Borges’s ‘universal library’? “This ideal, although unrealizable, has influenced and continues to influence librarians and others.”
  – Now it is realizable, but here we are blocked by the rights environment. Again, we should be able to find a Pareto-improving trade.
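For readers outside economics: a trade is Pareto-improving when it leaves at least one party better off and no party worse off. Writing u_i(x) for the welfare of party i (for example rightsholders, libraries, or readers; the grouping is illustrative) under arrangement x, a move from the status quo x to a new arrangement x' is a Pareto improvement when

```latex
u_i(x') \ge u_i(x) \quad \text{for every party } i,
\qquad \text{and} \qquad
u_j(x') > u_j(x) \quad \text{for at least one party } j.
```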

Library as Place

• So, what are we going to do in the library?
  – So far, we seem to be plenty busy
  – We might even work on search tools that can be used by the ordinary scholar working outside (inside?) her own narrow field

• Will we attract users with expertise, with the promise of propinquity to other users?

Specialization

• Maybe we can specialize in the academic library way of doing things, and let Google take care of the other audiences.

• And maybe, given the nature of our business and the lives of undergraduates, not.

Freeing up resources

• Space, the final frontier…

• Institutional repositories

• Shared physical storage

• Support publication and scholarship in the library with library expertise

What’s next?

• Storage: RFP

• Move from storage to other UM libraries

• MBooks

• Collections correcting OCR???

• Rights clearance

• Services for persons with disabilities

• Shared effort

Why does the University of Michigan support this project?

At its essence, digitizing books and widening their exposure is about the public good. I believe it transcends debates about snippets, and copyright, and who owns what when, and rises to the very ideal of a university—particularly a great public university like Michigan. Our work is about the social good of promoting and sharing knowledge. As a university, we have no other choice but to make this happen.

- Mary Sue Coleman

Scholarship and Libraries in Transition, March 2006

For further information…

Michigan Digitization Project: http://www.lib.umich.edu/mdp/
  – Contract
  – Project FAQ
  – Link to information about UM access?

Mirlyn (UM online catalog) http://mirlyn.lib.umich.edu/

Two Futures