Ansible + Hadoop
Deploying Hortonworks Data Platform with Ansible
Michael Young
Solutions Engineer
February 23, 2017
2 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
About Me
Michael Young
– Solutions Engineer @ Hortonworks
– 16+ years of experience (almost all in Public Sector)
– Information Retrieval (Solr, Elasticsearch)
– Hadoop (HDP, MapR, Cloudera)
– DevOps (Ansible, Puppet, Docker, Vagrant)
– Development (Python, Perl, Node.js)
@jaraxal
About Hortonworks
Only 100% Open Source Hadoop Company
Over 1,000 customers
Over 2,100 partners
Hortonworks Data Platform (HDP)
Hortonworks Data Flow (HDF)
Hortonworks Community Connection (HCC)
Hortonworks Data Platform 2.5
Ambari: Management and Monitoring
HDP Provisioning Workflow
Prepare Infrastructure
– Package Repos
– DNS
– NTP
Prepare OS
– Disable Transparent Huge Pages
– Disable Swapping
– Jumbo Frames
– Format and mount disk drives
Bootstrap Ambari
– Install Ambari Server
– Install Ambari Agents
Install HDP
– Interactively via Ambari’s web-based UI
– Automatically via Ambari Blueprints
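Two of the OS preparation steps above can be verified before Ambari is bootstrapped. The following is a minimal Python sketch, not part of the talk: it parses the text a Linux host exposes in /sys/kernel/mm/transparent_hugepage/enabled and /proc/sys/vm/swappiness. The function names and the swappiness threshold of 1 are assumptions made for illustration.

```python
import re


def active_thp_mode(sysfs_text):
    """Return the active THP mode from text such as 'always madvise [never]'.

    The kernel marks the active mode with square brackets.
    """
    match = re.search(r"\[(\w+)\]", sysfs_text)
    if match is None:
        raise ValueError("unexpected THP format: %r" % sysfs_text)
    return match.group(1)


def check_os_prereqs(thp_text, swappiness_text, max_swappiness=1):
    """Return a list of problems; an empty list means both checks passed.

    max_swappiness=1 is an assumed threshold, not a value from the slides.
    """
    problems = []
    if active_thp_mode(thp_text) != "never":
        problems.append("Transparent Huge Pages are enabled")
    if int(swappiness_text.strip()) > max_swappiness:
        problems.append("vm.swappiness is above %d" % max_swappiness)
    return problems


if __name__ == "__main__":
    # Sample values as they might be read from an unprepared host.
    for problem in check_os_prereqs("[always] madvise never\n", "60\n"):
        print(problem)
```

On a real node the two strings would be read from the files named above; the sample values here only stand in for that.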
Ambari Blueprints for HDP Deployments
https://cwiki.apache.org/confluence/display/AMBARI/Blueprints
Declarative definition of a cluster written in JSON.
Preserves best practice configuration across deployments
Requires the OS configuration prerequisites to already be in place; Ambari performs checks and warns you if they are not.
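To make "declarative definition of a cluster written in JSON" concrete, here is a minimal Python sketch that builds a blueprint and a matching cluster creation template. It is an illustration under stated assumptions, not the blueprint used in the talk: the host group names, component selection, hostnames, and password placeholder are hypothetical, and the component list is too short to be a deployable topology.

```python
import json


def build_blueprint(name, stack_version, host_groups):
    """Build an Ambari blueprint document.

    host_groups maps a group name to (cardinality, [component names]).
    """
    return {
        "Blueprints": {
            "blueprint_name": name,
            "stack_name": "HDP",
            "stack_version": stack_version,
        },
        "configurations": [],
        "host_groups": [
            {
                "name": group,
                "cardinality": str(cardinality),
                "components": [{"name": component} for component in components],
            }
            for group, (cardinality, components) in host_groups.items()
        ],
    }


def build_cluster_template(blueprint_name, hosts_by_group, default_password):
    """Build the cluster creation template that maps real hosts to host groups."""
    return {
        "blueprint": blueprint_name,
        "default_password": default_password,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": fqdn} for fqdn in fqdns]}
            for group, fqdns in hosts_by_group.items()
        ],
    }


if __name__ == "__main__":
    blueprint = build_blueprint(
        "hdp-demo",  # hypothetical blueprint name
        "2.5",
        {
            "master": (1, ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]),
            "worker": (3, ["DATANODE", "NODEMANAGER"]),
        },
    )
    template = build_cluster_template(
        "hdp-demo",
        {"master": ["master01.example.com"], "worker": ["slave01.example.com"]},
        "CHANGE_ME",  # placeholder, never commit real passwords
    )
    print(json.dumps(blueprint, indent=2))
    print(json.dumps(template, indent=2))
```

The blueprint describes the topology once; the template binds it to concrete hosts, which is what lets the same definition be reused across deployments.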
Automation! Why Ansible?
Ansible for HDP Deployments
Playbooks
– Bootstrap baseline configuration
– Install DBs
– Install HDP software
Roles
– Master Servers
– Slave Servers
– Ambari Server
– Ambari Agent
Tasks
– Install prerequisite packages
– Install Ambari Server packages
– Install Ambari Agent packages
– Disable SELinux
– Turn on NTP
Templates
– /etc/hosts
– Ambari Blueprints
Files
– Disable THP
– Disable Swapping
Create 6-node Environment
• Using Amazon AWS
– 6 x c4.4xlarge instances
• Simple Ansible solution
– AWS provisioning using ec2 and ec2_group modules
– Simple inventory
– Simple playbook
– Simple ansible.cfg
Simple Inventory
All Ansible commands run locally
Uses AWS API
Using Anaconda Python
Simple Playbook: hadoop-demo.yml
• 2 Tasks
– Create Security Group
– Create EC2 Instances
Task: Provision Security Group
ec2_group module
– Region
– VPC
– Rules
Task: Provision Servers
ec2 module
– Region
– Group
– Instance type
– AMI
– Volumes
– Counts
– Tags
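The playbook shown in the talk is not reproduced in this transcript, so the following Python sketch only approximates its shape: one local play with an ec2_group task and an ec2 task, carrying the parameters listed on the two task slides. The group name, AMI and VPC placeholders, key name, port choices, volume size, and device name are all assumptions. The structure is emitted as JSON purely because JSON ships with the Python standard library; the original playbook is a YAML file.

```python
import json


def build_provisioning_play(region, vpc_id, ami_id, key_name, admin_cidr,
                            count=6, instance_type="c4.4xlarge"):
    """Return a one-play playbook (as Python data) with the two AWS tasks."""
    group_name = "hadoop-demo"  # hypothetical name, reused as the Name tag
    security_group_task = {
        "name": "Create security group",
        "ec2_group": {
            "name": group_name,
            "description": "Hadoop demo cluster",
            "region": region,
            "vpc_id": vpc_id,
            "rules": [
                # SSH and the Ambari web UI (default port 8080), admin network only.
                # A real cluster also needs rules for node-to-node traffic.
                {"proto": "tcp", "from_port": 22, "to_port": 22, "cidr_ip": admin_cidr},
                {"proto": "tcp", "from_port": 8080, "to_port": 8080, "cidr_ip": admin_cidr},
            ],
        },
    }
    instances_task = {
        "name": "Create EC2 instances",
        "ec2": {
            "region": region,
            "group": group_name,
            "instance_type": instance_type,
            "image": ami_id,
            "key_name": key_name,
            "volumes": [
                # Assumed root device name and size; both depend on the AMI.
                {"device_name": "/dev/sda1", "volume_size": 100,
                 "delete_on_termination": True},
            ],
            # exact_count plus count_tag keeps reruns from creating extra instances.
            "exact_count": count,
            "count_tag": {"Name": group_name},
            "instance_tags": {"Name": group_name},
            "wait": True,
        },
    }
    return [{
        "hosts": "localhost",
        "connection": "local",  # the AWS API is called from the control machine
        "gather_facts": False,
        "tasks": [security_group_task, instances_task],
    }]


if __name__ == "__main__":
    play = build_provisioning_play(
        region="us-east-1",
        vpc_id="vpc-EXAMPLE",    # placeholder
        ami_id="ami-EXAMPLE",    # placeholder
        key_name="demo-key",     # placeholder
        admin_cidr="203.0.113.0/24",
    )
    print(json.dumps(play, indent=2))
```

Using exact_count with a count_tag, rather than a plain count, is a design choice in this sketch so that rerunning the play converges on six tagged instances instead of adding six more each time.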
Run Playbook
ansible-playbook -i inventory/hosts playbooks/hadoop-demo.yml
Takes ~35 seconds
DEMO
Ansible AWS Ad-Hoc Examples
Dynamic Inventory
– https://aws.amazon.com/blogs/apn/getting-started-with-ansible-and-dynamic-amazon-ec2-inventory-management/
– https://raw.githubusercontent.com/ansible/ansible/devel/contrib/inventory/ec2.py
– https://raw.githubusercontent.com/ansible/ansible/devel/contrib/inventory/ec2.ini
– Handy Python script allows you to interact with AWS instances
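A dynamic inventory script reports hosts as JSON when called with --list. The Python sketch below shows how that output can be inspected; the sample document is made up for illustration (addresses, tag value, and the selection of host variables are assumptions), and the helper names are hypothetical.

```python
import json

# Made-up sample in the general shape produced by an inventory script's --list.
SAMPLE_LIST_OUTPUT = """
{
  "tag_Name_hadoop_demo": ["203.0.113.10", "203.0.113.11"],
  "us-east-1": ["203.0.113.10", "203.0.113.11"],
  "_meta": {
    "hostvars": {
      "203.0.113.10": {"ec2_private_ip_address": "10.0.0.10"},
      "203.0.113.11": {"ec2_private_ip_address": "10.0.0.11"}
    }
  }
}
"""


def hosts_in_group(inventory, group):
    """Return the sorted hosts of a group; accepts list or {'hosts': [...]} form."""
    entry = inventory.get(group, [])
    if isinstance(entry, dict):
        entry = entry.get("hosts", [])
    return sorted(entry)


def host_var(inventory, host, name, default=None):
    """Look up one host variable from the _meta.hostvars section."""
    return inventory.get("_meta", {}).get("hostvars", {}).get(host, {}).get(name, default)


if __name__ == "__main__":
    inventory = json.loads(SAMPLE_LIST_OUTPUT)
    for host in hosts_in_group(inventory, "tag_Name_hadoop_demo"):
        print(host, host_var(inventory, host, "ec2_private_ip_address"))
```

Groups derived from tags are what make ad-hoc commands against "all instances of this cluster" possible without maintaining a host list by hand.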
Ready to Create?
Inventory
– Dev
– Test
– Prod
Playbook
– Roles
– Tasks
– Templates
– Files
– Handlers
Generally an iterative process
Start small, move towards more complex
Entire process could take a couple of days to a couple of weeks
Why re-invent the wheel?
• https://github.com/objectrocket/ansible-hadoop
• ObjectRocket is a Rackspace company.
• Enables deployment of Hadoop clusters using Ansible
• Supports Rackspace cloud and existing environments
• Ansible == 2.1.3.0 (2.2 is not supported at the moment)
• Expects RHEL/CentOS 6/7 or Ubuntu 14 hosts.
• Simple – Configure, then run two scripts
DEMO
Minimal Configuration Needed
inventory/static
playbooks/group_vars/master_nodes
playbooks/group_vars/slave_nodes
playbooks/group_vars/hortonworks
ansible.cfg
Optional: custom repos and blueprints
Modify inventory/static
Add information for master, slave and edge nodes
Use public IP for ansible_host
Default user for my AMI is centos. Set ansible_ssh_user appropriately.
Using an SSH key, so no password is specified
Don’t forget to comment out unused node types (edge-nodes)
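The inventory file itself is not shown in this transcript. As a rough illustration of the points above, the Python sketch below renders an INI-style inventory with ansible_host and ansible_ssh_user set per host and with empty node types commented out. The group names, host names, and addresses are assumptions; check the template shipped with ansible-hadoop for the real layout.

```python
def render_inventory(groups, ssh_user):
    """Render an INI-style static inventory.

    groups maps a group name to a list of (hostname, public_ip) pairs.
    Groups without hosts are emitted commented out.
    """
    lines = []
    for group, hosts in groups.items():
        prefix = "" if hosts else "# "
        lines.append("%s[%s]" % (prefix, group))
        for hostname, public_ip in hosts:
            lines.append(
                "%s ansible_host=%s ansible_ssh_user=%s"
                % (hostname, public_ip, ssh_user)
            )
        lines.append("")  # blank line between groups
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_inventory(
        {
            "master-nodes": [("master01", "203.0.113.10")],
            "slave-nodes": [("slave01", "203.0.113.11"),
                            ("slave02", "203.0.113.12")],
            "edge-nodes": [],  # unused, so it is commented out
        },
        ssh_user="centos",
    ))
```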
Modify playbooks/group_vars/*_nodes
Refer to template files for examples
Most options are geared towards Rackspace cloud
Modify playbooks/group_vars/hortonworks
Specify Configuration Details
– version of HDP and Ambari to install
– components to install
– admin and service passwords
– repo URL
I left this as-is
Modify ansible.cfg
Change library value to playbooks/library/site_facts
Specify location of private_key_file.
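The two settings above both live in the [defaults] section of ansible.cfg. The Python sketch below writes that section with configparser so the result can be checked programmatically; the key file path is a placeholder, and any other settings the repository's ansible.cfg carries are left out.

```python
import configparser
import io


def render_ansible_cfg(library_path, private_key_file):
    """Return ansible.cfg text containing only the two settings discussed."""
    config = configparser.ConfigParser()
    config["defaults"] = {
        "library": library_path,
        "private_key_file": private_key_file,
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(render_ansible_cfg(
        "playbooks/library/site_facts",
        "~/.ssh/demo-key.pem",  # placeholder path to the AWS key pair
    ))
```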
Run bootstrap_static.sh
Performs the common bootstrap configurations
$ bash bootstrap_static.sh
Takes ~8 minutes
Consistent timing regardless of node count
The same tasks run on all servers in parallel, which is the Ansible approach.
DEMO
Run hortonworks_static.sh
Performs the Hortonworks installation
$ bash hortonworks_static.sh
Takes ~19 minutes (4-node m4.xlarge cluster)
master01 had significantly more tasks to implement
Retrying Tasks is Normal
The last task is waiting for the cluster to be built
Normal to see many failed checks with retry attempts.
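The pattern behind those repeated failures is a poll-until-ready loop. Here is a generic Python sketch of it, not the code ansible-hadoop runs: the check is passed in as a callable so the example stays self-contained, and the function name, retry count, and message text are assumptions.

```python
import time


def wait_until(check, retries=5, delay=0.0, sleep=time.sleep):
    """Call check() until it returns True; return the attempt number that passed.

    Raises TimeoutError if every attempt fails.
    """
    for attempt in range(1, retries + 1):
        if check():
            return attempt
        print("check failed, %d retries left" % (retries - attempt))
        if attempt < retries:
            sleep(delay)
    raise TimeoutError("cluster was not ready after %d attempts" % retries)


if __name__ == "__main__":
    # Simulated status checks: the cluster reports ready on the third poll.
    statuses = iter([False, False, True])
    attempt = wait_until(lambda: next(statuses), retries=5)
    print("ready after attempt", attempt)
```

Each failed check before the final success shows up as a failed attempt in the output, which is why a long list of retries during the install is expected rather than alarming.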
Monitor Ambari During Install
Monitor Ambari during cluster installation.
One Node: ~1,000 seconds
One node took ~1,000 seconds to complete install and startup
This node is the master node, which hosts more components
Room to decrease deployment time by adding more master nodes
Five Node Cluster
5 x m4.xlarge
2 master and 3 slave nodes
Took ~15 minutes
~3 minutes faster than 4-node cluster.
More even distribution of components on master servers
Six Node Cluster
6 x m4.xlarge
3 master and 3 slave nodes
Took ~15 minutes
No apparent improvement in deployment times over 5-node cluster.
Comparing Instance Sizes - Six Node Cluster
m4.xlarge vs c4.4xlarge
Same cluster configuration
3 master and 3 slave nodes
c4.4xlarge cluster took ~12 minutes
~3 minutes faster than the m4.xlarge cluster
Number & Size of Nodes
Tuning the number and size of nodes to decrease deployment time is interesting, but not generally important
Size your cluster based on data size and workload
– More Data: more local storage per slave node, more slave nodes
– More Queries: more memory and CPU per slave node, more slave nodes
– High Availability: use at least 3 master nodes and at least 3 slave nodes
Minimum recommended cluster size for production is ~12 nodes
Summary
Easily created an AWS environment using a simple Ansible playbook
– Takes ~1-2 minutes, including modifying the playbook
Easily deployed a 6-node HDP cluster
– Ran the playbook from an AWS node with Ansible
– Modified a couple of configuration files
– Ran 2 commands and had an HDP cluster in < 20 minutes
Demonstrated how cluster size and instance type affected deployment times
Questions?