VIRTUALISATION OF HADOOP CLUSTERS Dr G Sudha Sadasivam Assistant Professor Department of CSE PSGCT


VIRTUALISATION OF HADOOP CLUSTERS

Dr G Sudha Sadasivam
Assistant Professor
Department of CSE
PSGCT


Introduction

• A physical machine can host a number of smaller virtual machines (VMs), each running a separate operating system instance.

• Challenges
– Partitioning of a machine
– Concurrent execution of multiple operating systems
– Isolation of virtual machines from one another
– Support for heterogeneous applications
– Low performance overhead

• Xen is a virtual machine monitor for x86 that supports execution of multiple guest operating systems. It comprises a hypervisor, a kernel and user space applications.


Objective

• Automate the creation and deletion of a virtual cluster for hosting Hadoop using Xen.
• A large physical cluster can be simulated on a few physical machines.

Steps

• The user supplies the configuration by editing configuration files.
• The tool generates the user-specified number of VMs running Hadoop.
• Users can manage the Hadoop file system.
• Users can submit jobs for each physical machine.
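The generation step can be sketched in Python as below. This is a minimal illustration, not the tool described in the slides. The names `GuestSpec`, `plan_guests` and `render_xen_config`, the `hadoop-vm` prefix, the image directory and the `xenbr0` bridge are all assumptions. The rendered keys (`name`, `memory`, `vcpus`, `disk`, `vif`) follow the usual Xen guest configuration file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuestSpec:
    """Settings for one guest VM (hypothetical structure)."""
    name: str
    memory_mb: int
    disk_image: str


def plan_guests(count, prefix="hadoop-vm", memory_mb=512,
                image_dir="/var/xen/images"):
    """Return one GuestSpec per requested VM, numbered from 1."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return [
        GuestSpec(f"{prefix}{i}", memory_mb, f"{image_dir}/{prefix}{i}.img")
        for i in range(1, count + 1)
    ]


def render_xen_config(spec, bridge="xenbr0"):
    """Render the text of a Xen guest configuration file for one VM."""
    lines = [
        f'name = "{spec.name}"',
        f"memory = {spec.memory_mb}",
        "vcpus = 1",
        # File-backed disk exposed to the guest as xvda, writable.
        f"disk = ['file:{spec.disk_image},xvda,w']",
        f"vif = ['bridge={bridge}']",
    ]
    return "\n".join(lines) + "\n"
```

Calling `plan_guests(3)` and passing each result to `render_xen_config` yields three configuration texts that a script could write to disk before starting the guests.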


Need for virtualisation

• Ability to recover quickly from software problems by saving a copy of the guest image.
• High availability by relocating guests when a server machine is inoperable.
• Dynamic load balancing by migrating guests between server machines.
• Consolidation of many services on one physical machine, each administered independently in its own VM.
• Use of the abundant computational power of the physical machine, minimising cost.
• Switching between applications on different operating systems using hypervisors.


HADOOP CLUSTER CONFIGURATION

Host node is configured as master (NN) and also acts as slave (DN).
Guest node (DN) is configured as slave.


The master is the host OS, which acts as job tracker/name node.
The slave is the guest OS, which acts as task tracker/data node.
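The role assignment above can be sketched as the settings a setup script might write for a classic (pre-YARN) Hadoop release. `fs.default.name` and `mapred.job.tracker` are the standard property names in those releases. The function name `hadoop_layout`, the host names and the ports 9000 and 9001 are assumptions chosen for illustration.

```python
def hadoop_layout(master, guests, hdfs_port=9000, jobtracker_port=9001):
    """Describe a cluster where the host is master and also a slave.

    Returns the site properties pointing at the master and the text of
    the slaves file, which lists every data node / task tracker.
    """
    # The host runs the name node and job tracker, and also a data node,
    # so it appears first in the slaves list, followed by the guests.
    workers = [master] + [g for g in guests if g != master]
    return {
        "properties": {
            "fs.default.name": f"hdfs://{master}:{hdfs_port}",
            "mapred.job.tracker": f"{master}:{jobtracker_port}",
        },
        "slaves_file": "\n".join(workers) + "\n",
    }
```

For a host called `host0` with guests `vm1` and `vm2`, the slaves file lists all three machines and both properties point at `host0`.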


Steps in implementing

• Installation of Xen kernel
• Creation of Guest OS
• Configuration of Guest OS
• Installation of Java Development Kit
• Extraction and configuration of Hadoop cluster
• Creating OS image for new guest machines
• Creation and removal of other virtual machines by copying the OS images


Automated Creation of a Hadoop Virtual cluster

An XML file holds the configuration details of each new VM.
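The slides do not show the XML schema, so the sketch below assumes a hypothetical one: a `<cluster>` root with one `<vm>` element per guest, carrying `name`, `memory` and `ip` attributes. It only illustrates how such a file could be read with Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file; the real schema is not given in the slides.
SAMPLE_CONFIG = """
<cluster>
  <vm name="hadoop-vm1" memory="512" ip="192.168.1.11"/>
  <vm name="hadoop-vm2" memory="1024" ip="192.168.1.12"/>
</cluster>
"""


def parse_vm_specs(xml_text):
    """Return a list of dicts, one per <vm> element under the root."""
    root = ET.fromstring(xml_text.strip())
    specs = []
    for vm in root.findall("vm"):
        specs.append({
            "name": vm.attrib["name"],
            "memory_mb": int(vm.attrib["memory"]),
            "ip": vm.attrib["ip"],
        })
    return specs
```

Each parsed entry carries enough detail to create one guest and to register it with the Hadoop master.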


Automated Shut down of Hadoop Virtual cluster
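The slide gives no detail of the shutdown procedure. One plausible ordering, sketched below, stops the Hadoop daemons first so that HDFS closes cleanly, then shuts down each guest. The function name `shutdown_plan` and the `/opt/hadoop` path are assumptions, as is the use of the `xm` toolstack. The sketch only builds the command lists and does not run them.

```python
def shutdown_plan(guest_names, hadoop_home="/opt/hadoop"):
    """Return the ordered commands for taking a virtual cluster down.

    Each command is an argument list suitable for subprocess.run.
    """
    # Step 1: stop all Hadoop daemons from the master node.
    plan = [[f"{hadoop_home}/bin/stop-all.sh"]]
    # Step 2: ask Xen to shut down each guest domain in turn.
    plan += [["xm", "shutdown", name] for name in guest_names]
    return plan
```

Keeping the plan as data makes it easy to log or review the commands before anything is executed on the host.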


Advantages of automated virtualization in Hadoop

1. Effective isolation of the datanode from the load caused by other processes on the machine makes the datanode more responsive and reliable.

2. Multiple virtual machines on each physical machine lower the granularity of scheduling units, making it possible to schedule multiple task trackers on the same machine and improve the overall utilization of the whole cluster.

3. A snapshot of a virtual cluster makes it possible to re-activate the same cluster in the future and resume work from the snapshot (rollback).


Enhancements

1. Providing a graphical console for monitoring and managing the virtual cluster.

2. Creation and migration of virtual machines for load balancing.

3. Enabling snapshots of virtual machines for checkpointing.

4. Providing an intelligent monitoring system that detects the failure of a virtual machine in the cluster and restarts that virtual machine, increasing reliability.
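The monitoring enhancement in item 4 is only proposed, not described. A minimal sketch of one way to approach it is a heartbeat timeout: a VM counts as failed when its last heartbeat is older than a threshold, and a restart command is prepared for it. The function names, the 30 second timeout, the `/etc/xen` directory and the use of `xm create` are assumptions.

```python
def find_failed_vms(last_heartbeat, now, timeout_s=30.0):
    """Return the names of VMs whose last heartbeat is older than timeout_s.

    last_heartbeat maps VM name to the time (in seconds) it last reported.
    """
    return sorted(
        name for name, seen in last_heartbeat.items()
        if now - seen > timeout_s
    )


def restart_commands(failed, config_dir="/etc/xen"):
    """Build one 'xm create' command per failed VM (not executed here)."""
    return [["xm", "create", f"{config_dir}/{name}.cfg"] for name in failed]
```

A monitoring loop could call `find_failed_vms` periodically with the current time and pass the result to `restart_commands`.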


Performance of Physical vs Virtual clusters

[Figure: time in nsec (0 to 2.5E+11) against number of nodes, comparing physical clusters and virtual clusters]


Master as a Physical Node

7 nodes: data nodes – 6 virtual nodes; name node – 1 physical node

[Figure: time in nsec (1.00E+09 to 1.00E+11) against number of nodes]


Master as a Virtual Node

7 nodes: data nodes – 1 physical node + 5 virtual nodes; name node – 1 virtual node

[Figure: time in nsec (1.00E+09 to 1.00E+11) against file size in MB]


Performance with varying number of Virtual nodes

[Figure: time in nanoseconds (5.74E+10 to 5.92E+10) against file size in MB (4 to 12), comparing six virtual nodes and four virtual nodes]