Lab Session

LAB 1 Cloud Computing Virtualization Jinnah University for Women Instructor Engr S M Asim Ali

Upload: seestar99

Post on 18-Apr-2017

TRANSCRIPT

Page 1: Lab Session

LAB 1: Cloud Computing

Virtualization

Jinnah University for Women

Instructor: Engr S M Asim Ali

Page 2: Lab Session

TASK LIST

What is virtualization? Show your understanding through two examples.

Page 3: Lab Session

LAB 2: Cloud Computing

Services

Jinnah University for Women

Instructor: Engr S M Asim Ali

Page 4: Lab Session

TASK LIST

Which operating system would you prefer for creating a virtual environment?

Mention the services of Microsoft operating systems or Linux that support virtualization.

Page 5: Lab Session

LAB 3: Cloud Computing

Hadoop as a Tool for MapReduce

Jinnah University for Women

Instructor: Engr S M Asim Ali

Page 6: Lab Session

TASK LIST

Introduction

Data Grid vs. Computing Grid

Grid Computing

Cloud Computing

Data Grid (Hadoop File System)

Computing Grid (MapReduce)

Counting of Words

Conclusion

Page 7: Lab Session

Motivation

Count how frequently each word appears in the MEDLINE corpus (18 million texts)

Page 8: Lab Session

Motivation

I want to extend my research to another corpus

Need more computing resources

Page 9: Lab Session

Data Grid vs. Computing Grid

Data Grid: distributed data storage; controlled sharing and management of large amounts of distributed data.

Computing Grid: parallel execution; divides pieces of a program among several computers.

Data Grid + Computing Grid = Grid Computing

Page 10: Lab Session

Grid Computing

[Diagram: The Grid. Labels: Master, Slaves, Task.]

Page 11: Lab Session

Grid Computing

Motivation: high performance, improved resource utilization

Aims to create the illusion of a simple yet powerful computer out of a large number of heterogeneous systems

Tasks are submitted and distributed on nodes in the grid
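The idea of submitting a task and distributing it over nodes can be sketched on a single machine. This is only an illustration: threads stand in for grid nodes, and the names `master` and `run_task` are hypothetical, not part of any grid framework.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(chunk):
    """Work done by one worker (a stand-in for a grid node): sum a chunk."""
    return sum(chunk)


def master(data, n_workers=4):
    """Split the data into chunks, hand each chunk to a worker, combine results."""
    size = max(1, -(-len(data) // n_workers))  # ceiling division, at least 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(run_task, chunks))
    return sum(partial_sums)


total = master(list(range(1, 101)))
print(total)
```

The caller only sees `master`, which is the "single computer" illusion in miniature; a real grid would also have to handle node failures and data transfer, which this sketch ignores.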

Page 12: Lab Session

Cloud Computing

“The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do.”

Larry Ellison, during Oracle’s Analyst Day

Page 13: Lab Session

Cloud Computing

Pay-as-you-go

No initial investments

Reduced operation costs

Scalability

Availability

Page 14: Lab Session

Cloud Computing - Open Issues

Bandwidth and latency

Lack of standards and portability

“Black-box” implementations

Security and lack of control

Immature tools and framework support

Legal issues (ownership, auditing, etc.)

Limited Service Level Agreements (SLAs)

Page 15: Lab Session

Data Grid vs. Computing Grid

Data Grid: distributed data storage; controlled sharing and management of large amounts of distributed data.

Computing Grid: parallel execution; divides pieces of a program among several computers.

Data Grid + Computing Grid = Grid Computing

Page 16: Lab Session

Data Grid (Hadoop FS - Overview)

[Diagram labels: Namenode (master node) holding metadata (Name, .., ..) and an index; Datanodes (slave nodes); Client asking for a specific text; block ops; replication; caching of data.]

Page 17: Lab Session

Data Grid (HDFS - Replication Data)
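A toy model may help to picture what the Namenode's metadata and block replication mean. The sketch below is an assumption-laden illustration: `place_blocks` and `locate` are hypothetical names, sizes are in arbitrary units, and the round-robin placement is a simplification (real HDFS uses its own rack-aware placement policy).

```python
def place_blocks(file_size, block_size, datanodes, replication=3):
    """Return {block_index: [datanode, ...]} using a toy round-robin placement."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of datanodes")
    n_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(n_blocks):
        # Each block is copied to `replication` distinct datanodes.
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


def locate(placement, block, failed=()):
    """Namenode-style lookup: datanodes that still hold a copy of the block."""
    return [node for node in placement[block] if node not in failed]


nodes = ["dn1", "dn2", "dn3", "dn4"]
metadata = place_blocks(file_size=300, block_size=128, datanodes=nodes)
print(metadata)
print(locate(metadata, 0, failed={"dn1"}))
```

Because every block lives on several datanodes, the lookup still returns usable copies of block 0 after `dn1` is marked as failed.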

Page 18: Lab Session

Counting Words in Text Files

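Split-Operation: the input is divided into four files.

Map-Operation: countWords(File) runs once per file and returns the count of each word in that file (four values per word, one per file):

w1: 1 3 2 0

w2: 0 5 1 8

w3: 6 2 3 4

w4: 7 2 3 5

w5: 0 1 0 0

Reduce-Operation: the per-file counts are summed for each word:

w1: 6

w2: 14

w3: 15

w4: 17

w5: 1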
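The split, map, and reduce steps of the word count can be sketched in plain Python, without Hadoop. The function names are hypothetical, and the usage example assumes that each of the four numbers per word on the slide belongs to one of the four files.

```python
from collections import Counter
from functools import reduce


def count_words(text):
    """Map step: count the words in one split of the input."""
    return Counter(text.lower().split())


def merge_counts(left, right):
    """Reduce step: add two partial counts word by word."""
    return left + right


def word_count(splits):
    """Run the map step on every split, then reduce the partial counts."""
    partial_counts = map(count_words, splits)
    return reduce(merge_counts, partial_counts, Counter())


# Per-file counts as read from the slide (assumed: one Counter per file).
slide_counts = [
    Counter(w1=1, w3=6, w4=7),
    Counter(w1=3, w2=5, w3=2, w4=2, w5=1),
    Counter(w1=2, w2=1, w3=3, w4=3),
    Counter(w2=8, w3=4, w4=5),
]
totals = reduce(merge_counts, slide_counts, Counter())
print(dict(totals))
```

Since `merge_counts` is associative, the partial counts can be combined in any grouping, which is what lets a framework run the reduce step on several machines.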

Page 19: Lab Session

Advantages of Hadoop

Written purely in Java; requires installation of Cygwin under Windows

Available under the Apache 2.0 license

Usually offers only one implementation for the different features of a grid framework

May also use file systems other than Hadoop FS

Very flexible implementation of MapReduce

For the split operation, supports only FileSplit out of the box

Better suited for computations where …

… large data collections should be handled

… the reduce operation is more than a simple aggregation of the map’s output