Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014

Per Møldrup-Dalum, State and University Library. SCAPE Information Day, State and University Library, Denmark, 2014-06-25. Hadoop and its applications at the State and University Library.

Upload: scape-project

Post on 28-Nov-2014


DESCRIPTION

Per Møldrup-Dalum described how the State and University Library in Denmark has deployed Hadoop in connection with the SCAPE project. With Hadoop, the library has been able to process large amounts of data much faster than before. The presentation was given at ‘SCAPE Information Day at the State and University Library, Denmark’ on 25 June 2014. The information day introduced the EU-funded project SCAPE (Scalable Preservation Environments) and its tools and services to the participants. For more information about the event, see this blog post: http://bit.ly/SCAPE_SB_Demo.

TRANSCRIPT

Page 1: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014

Per Møldrup-Dalum State and University Library

SCAPE Information Day State and University Library, Denmark, 2014-06-25

Hadoop and its applications at the State and University Library

Page 2: Agenda

• A bit on Hadoop in general
• A bit on our experience in deploying Hadoop at the library

This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐2009.4.1 (Grant Agreement number 270137).

Page 3: Origins

• MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, 2004
• In 2005 Cutting and Cafarella created Hadoop at Yahoo!
• Now an Apache project
• Commercial distributions, community editions, DIY

Page 4: Map/Reduce

Figure labels: MAP, REDUCE

Page 5: Lorem ipsum

• Count addresses that have fruits etc. in their street name
  • Kirsebærhaven
  • Jordbærvej
  • Nødde allé
• Result
  • Kirsebær (cherry): 1203
  • Nødder (nuts): 34
  • Jordbær (strawberry): 543
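The street-name count on this slide can be sketched as a map step, a grouping step, and a reduce step. The following is a minimal single-process illustration in plain Python, not the job run at the library and not Hadoop API code. The stem table `FRUIT_STEMS`, the function names, and the sample addresses are all assumptions made for the example, so the counts it produces are unrelated to the figures on the slide.

```python
from collections import defaultdict

# Hypothetical keyword table: lower-case stem to look for -> label to count under.
FRUIT_STEMS = {"kirsebær": "Kirsebær", "jordbær": "Jordbær", "nødde": "Nødder"}


def map_address(address):
    """Map step: emit (label, 1) for every fruit stem found in one address."""
    street = address.lower()
    return [(label, 1) for stem, label in FRUIT_STEMS.items() if stem in street]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum all the ones emitted for a single label."""
    return key, sum(values)


def count_fruit_streets(addresses):
    """Run map, shuffle, and reduce in sequence over a list of addresses."""
    emitted = [pair for address in addresses for pair in map_address(address)]
    return dict(reduce_counts(key, values) for key, values in shuffle(emitted).items())


# Made-up sample input, not the library's data set.
sample = ["Kirsebærhaven 12", "Jordbærvej 3", "Nødde allé 7", "Jordbærvej 18", "Hovedgaden 1"]
print(count_fruit_streets(sample))
```

On a real cluster the map calls would run in parallel on the nodes that hold each block of the address file, and the grouping would be done by the framework rather than by a local dictionary.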

Page 6: The Zoo

• HDFS – data locality
• MapReduce

•••

Page 7: Hadoop at the Library

Page 8: Can it be done?

Existing infrastructure: CPU and storage

• Blade servers with no local storage
• Storage exclusively on NAS
• We've done several experiments

Page 9: Cluster topology

4 CPU nodes
• Two 6-core CPUs
• Intel® Xeon® Processor X5670 with 12M Cache, 2.93 GHz, and 6.40 GT/s Intel® QPI
• 96 GB RAM
• 2 Gbit Ethernet interface
• CentOS
• NFS mount point on NAS for HDFS
• Reachable NAS storage: ~4 PB

Science Museum/Science & Society Picture Library

Page 10: Cloudera Hadoop Distribution

Page 11: Interface

Page 12: References

• http://hadoop.apache.org
• http://www.cloudera.com
• http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf

Page 13: Hadoop and its applications at the State and University Library, SCAPE Information Day, 25 June 2014
