lab session
LAB 1 Cloud Computing
Virtualization
Jinnah University for Women
Instructor: Engr S M Asim Ali
TASK LIST
What is Virtualization? Show your understanding through two examples.
LAB 2 Cloud Computing
Services
TASK LIST
Which operating system would you prefer for creating a virtual environment?
Mention the services of Microsoft operating systems or Linux that support virtualization.
LAB 3 Cloud Computing
Hadoop as a Tool for MapReduce
TASK LIST
Introduction
Data Grid vs. Computing Grid
Grid Computing
Cloud Computing
Data Grid (Hadoop File System)
Computing Grid (MapReduce)
Counting of Words
Conclusion
Motivation
Count how frequently each word appears in the MEDLINE corpus (18 million texts)
I want to extend my research to another corpus
Need more computing resources
Data Grid vs. Computing Grid
Data Grid: distributed data storage; controlled sharing and management of large amounts of distributed data.
Computing Grid: parallel execution; pieces of a program are divided among several computers.
Data Grid + Computing Grid = Grid Computing
Grid Computing
[Figure: the grid, showing a master, slaves, and a task]
Grid Computing
Motivation: high performance, improved resource utilization
Aims to create the illusion of a simple yet powerful computer out of a large number of heterogeneous systems
Tasks are submitted and distributed on nodes in the grid
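As a rough illustration of the master/slave pattern described above, the following Python sketch uses a local thread pool in place of real grid nodes. The names `run_task` and `submit_to_grid` are hypothetical, and a real grid would dispatch tasks to other machines over the network rather than to local threads.

```python
from concurrent.futures import ThreadPoolExecutor


def run_task(task):
    """Work done by one worker ("slave"): here, simply square a number."""
    return task * task


def submit_to_grid(tasks, workers=4):
    """The "master": hand each task to a free worker and collect the results.

    Assumption: a thread pool stands in for the nodes of the grid.
    Results come back in the same order as the submitted tasks.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, tasks))


if __name__ == "__main__":
    print(submit_to_grid([1, 2, 3, 4, 5]))
```

The caller only sees `submit_to_grid`, which is the "illusion of a single computer" idea in miniature: how many workers exist, and which one ran which task, stays hidden.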
Cloud Computing
“The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do.”
Larry Ellison, during Oracle’s Analyst Day
Cloud Computing
Pay-as-you-go
No initial investments
Reduced operation costs
Scalability
Availability
Cloud Computing - Open Issues
Bandwidth and latency
Lack of standards and portability
"Black-box" implementations
Security and lack of control
Immature tools and framework support
Legal issues (ownership, auditing, etc.)
Limited Service Level Agreements (SLAs)
Data Grid (Hadoop FS - Overview)
[Figure: HDFS overview. Labels: Client (asks for specific text); Namenode (master node) with metadata (Name, .., ..) and index; Datanodes (slave nodes); block ops; replication; caching of data.]
Data Grid (HDFS - Replication Data)
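To make the replication idea concrete, here is a minimal Python sketch of how a namenode-style index could map each block of a file to several datanodes. The function `place_blocks`, the tiny block size, and the round-robin placement are all assumptions made for illustration; real HDFS uses much larger blocks and a rack-aware placement policy. The default of three replicas matches the usual HDFS default.

```python
def place_blocks(data, datanodes, block_size=4, replication=3):
    """Split `data` into blocks and assign each block to distinct datanodes.

    Returns namenode-style metadata: a list of (block, [datanode names]).
    Assumption: simple round-robin placement, not the real HDFS policy.
    """
    if replication > len(datanodes):
        raise ValueError("not enough datanodes for the requested replication")

    # Cut the data into fixed-size blocks; the last block may be shorter.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]

    metadata = []
    for index, block in enumerate(blocks):
        # Start at a different node for each block so the load is spread out.
        targets = [datanodes[(index + r) % len(datanodes)]
                   for r in range(replication)]
        metadata.append((block, targets))
    return metadata


if __name__ == "__main__":
    for block, nodes in place_blocks("abcdefghij", ["dn1", "dn2", "dn3", "dn4"]):
        print(repr(block), "->", nodes)
```

Because every block lives on three different nodes in this sketch, losing any single datanode still leaves two copies of each block it held.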
Counting Words in Text Files
Split operation: the text files are divided among four countWords(File) calls.
Map operation: each call counts the words w1 … w5 in its file.
Reduce operation: the per-file counts are summed for each word.

Word   File 1   File 2   File 3   File 4   Total
w1     1        3        2        0        6
w2     0        5        1        8        14
w3     6        2        3        4        15
w4     7        2        3        5        17
w5     0        1        0        0        1
…
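The split, map, and reduce steps for word counting can be sketched in a few lines of plain Python. This is not the Hadoop API (Hadoop jobs are normally written in Java); `count_words` mirrors the countWords(File) step from the slide, while `merge_counts` and `word_count` are hypothetical helper names, and the input strings stand in for file splits.

```python
from collections import Counter
from functools import reduce


def count_words(text):
    """Map step: count the words in one split."""
    return Counter(text.lower().split())


def merge_counts(total, partial):
    """Reduce step: add one partial count into the running total."""
    total.update(partial)  # Counter.update adds counts, it does not overwrite
    return total


def word_count(splits):
    """Run the map step on every split, then reduce the partial counts."""
    partial_counts = map(count_words, splits)
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    splits = ["the grid the cloud", "the cloud", "grid grid"]
    for word, count in sorted(word_count(splits).items()):
        print(word, count)
```

Each call to `count_words` depends only on its own split, so on a grid the map calls could run on different nodes, with only the small per-split counts sent back for the reduce step.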
Advantages of Hadoop
Written purely in Java; requires installation of Cygwin under Windows
Available under the Apache 2.0 license
Usually offers only one implementation for the different features of a grid framework
Can also use file systems other than Hadoop FS
Very flexible implementation of MapReduce
For the split operation, only FileSplit is supported out of the box
Better suited for computations where …
… large data collections should be handled
… the reduce operation is more than a simple aggregation of the map's output