TRANSCRIPT
Running Ansible at Scale
Ajay Chenampara, Sr. Specialist Solutions Architect, North American Public Sector
Sam Doran, Senior Software Engineer, Ansible Core
Agenda
● How Ansible grows
● Workflow and content
● Scaling Ansible Core
● Scaling further with Ansible Tower
How Ansible Grows 🌱
ansible (2.7.13) · boto3 (1.9.202) · pycrypto (2.6.1) · Jinja2 (2.9.6)
ansible (2.8.0) · boto3 (1.9.202) · pycrypto (2.6.1) · Jinja2 (2.3)
ansible (2.8.5) · boto3 (1.2.4) · Jinja2 (2.3)
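The environments above show how module and library versions drift between Ansible releases. A minimal sketch of guarding against that drift — `check_versions` is a hypothetical helper, not something from the talk, and the version numbers are the illustrative ones from the slides:

```python
def check_versions(installed, expected):
    """Return packages whose installed version differs from the tested pin.

    Both arguments map package name -> version string.
    """
    return {
        pkg: (installed.get(pkg), want)
        for pkg, want in expected.items()
        if installed.get(pkg) != want
    }

# Versions taken from the slides, purely illustrative
tested = {"ansible": "2.8.5", "boto3": "1.2.4", "Jinja2": "2.3"}
current = {"ansible": "2.8.0", "boto3": "1.2.4", "Jinja2": "2.3"}
print(check_versions(current, tested))  # {'ansible': ('2.8.0', '2.8.5')}
```

Running a check like this (or simply pinning everything in a requirements file) keeps playbook behavior reproducible across upgrades.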
Workflow and Content
[Diagram: Git branching workflow — Production, Test, Master, and Feature/Bug Fix branches]
Repository Structure
ansible
├── group_vars
│   ├── all.yml
│   ├── dev.yml
│   ├── prod.yml
│   └── web.yml
├── inventory
├── library
├── roles
│   └── requirements.yml
├── .gitignore
├── ansible.cfg
├── apache.yml
├── deploy-app.yml
└── install-updates.yml
group_vars — play and group variables that work consistently between the command line and Ansible Tower
inventory — static, dynamic, or both
library — custom modules available across all roles
roles — roles from external repositories
# Example requirements.yml
- samdoran.java
- samdoran.redhat_subscription

- src: ssh://[email protected]:8989/ansible-role-apache.git
  name: apache
  scm: git
  version: develop

- src: ssh://[email protected]:8989/ansible-role-users.git
  name: users
  scm: git
.gitignore — ignore roles and other files
*.retry
.vagrant
*.zip

# Ignore everything in roles/ except requirements.yml
roles/*
!roles/requirements.yml
ansible.cfg — main Ansible configuration
Playbooks — adjacent to group_vars and library
Another Repository Structure
├── inventory
├── library
├── playbooks
│   ├── group_vars
│   │   ├── all.yml
│   │   ├── dev.yml
│   │   ├── prod.yml
│   │   └── web.yml
│   ├── library -> ../library
│   ├── roles -> ../roles
│   ├── apache.yml
│   ├── deploy-app.yml
│   └── install-updates.yml
├── roles
│   └── requirements.yml
├── .gitignore
└── ansible.cfg
Symlinks keep library and roles adjacent to playbooks
Scaling Ansible Core
How Ansible Works
Linux/Windows hosts: module code is copied to the managed node, executed, then removed.
Networking devices: module code is executed locally on the control node.
2.6: memory ballooning (#35921)
2.7: deepdish copy (#44337) 🍕
2.8: consolidate handler tracking (#49338)
2.9: Perfy McPerfton (#58400)
Use the Latest Version
Facts
"Smart" gathering means only gather facts if needed
Gathering all facts can consume a lot of memory and cause CPU contention at higher fork counts — use gather_subset = min
Just the Facts
Several cache plugins are available:
● jsonfile
● memcached
● mongodb
● pickle
● redis
● yaml
[defaults]
gathering = smart
gather_subset = min
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/cache
fact_caching_timeout = 3600
Forks 🍴
Default is 5 (very conservative)
More forks means more parallel connections to hosts
Too many forks will overburden your system through context switching and the large number of facts held in memory
Forks
General guidelines:
● 5-25 forks on a developer workstation or laptop
● 25-50 on a dedicated server
[defaults]
gathering = smart
gather_subset = min
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/cache
fact_caching_timeout = 3600
forks = 30
Connection
Enable pipelining (requires requiretty to be disabled on managed hosts)
Increase ControlPersist timeout (default 60s)
Use scp (default is sftp)
[defaults]
gathering = smart
gather_subset = min
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/cache
fact_caching_timeout = 3600
forks = 50

[ssh_connection]
pipelining = True
scp_if_ssh = True
ssh_args = -C -o ControlMaster=auto -o ControlPersist=15m
Grab Bag
Python 3 performs better than Python 2
Bastion hosts will slow things down
Passwordless ssh authentication will speed things up
Remote shell profile can slow things down
Using native Jinja types can speed things up (DEFAULT_JINJA2_NATIVE)
Scaling Further with Ansible Tower
Challenges to scaling Ansible
Secure credential storage
Scheduler
API
Detailed auditing
Consistent Ansible version
Python libraries and module dependencies
Capacity and Jobs
Ansible Tower Capacity Determination
Memory Relative Capacity:
Number of Forks = (total_mem - 2GB)/mem_per_fork
mem_per_fork = 100 MB by default
CPU Relative Capacity:
Number of Forks = cpus * forks_per_cpu
forks_per_cpu = 4 by default
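Putting the two formulas above together, a node's capacity can be computed from its memory and CPU count. A minimal sketch — `tower_capacity` is a hypothetical helper, with the defaults quoted on the slides:

```python
def tower_capacity(total_mem_mb, cpus, mem_per_fork_mb=100, forks_per_cpu=4):
    """Compute memory-relative and CPU-relative fork capacities.

    Defaults (100 MB per fork, 4 forks per CPU) are the ones from the slides.
    """
    # Memory-relative: reserve 2 GB for Tower itself, then mem_per_fork per fork
    mem_forks = (total_mem_mb - 2048) // mem_per_fork_mb
    # CPU-relative: forks_per_cpu forks per CPU
    cpu_forks = cpus * forks_per_cpu
    return mem_forks, cpu_forks

# A 16 GB, 4-CPU node
print(tower_capacity(16384, 4))  # (143, 16)
```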
Running Jobs
Job Slicing
● Slices inventory into a number of chunks, which are then used to run smaller job slices.
● Ideal for workloads where the tasks run on each host can run independently of other hosts
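To illustrate the idea behind slicing — this is a simplified sketch, not Tower's actual implementation — the inventory is divided into independent chunks that can each run as a separate job:

```python
def slice_inventory(hosts, slice_count):
    """Distribute hosts round-robin into slice_count independent chunks."""
    chunks = [[] for _ in range(slice_count)]
    for i, host in enumerate(hosts):
        chunks[i % slice_count].append(host)
    return chunks

hosts = ["web1", "web2", "web3", "web4", "web5"]
print(slice_inventory(hosts, 2))  # [['web1', 'web3', 'web5'], ['web2', 'web4']]
```

Each chunk becomes its own job slice, so the slices only make sense when hosts do not depend on one another.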
Fact Caching
● Cache once, consume over and over
● Schedule a job to gather facts
Smart Inventory
Clustering
Bigger is not always better
Instance Groups: Setup and Use
A set of cluster nodes dedicated for a particular purpose
● Instances are shared among teams, groups, and organizations
● Each instance group has its own job queue, and any node in the group can take jobs off of that queue
● Jobs can be assigned to an instance group in three ways:
○ By the organization
○ By the inventory
○ By the individual job template
Instance Groups
[tower]
tower1.happy.company
tower2.happy.company
tower3.happy.company

[instance_group_network]
net1.happy.company
net2.happy.company
net3.happy.company
tower1.happy.company

[instance_group_compute]
compute1.happy.company
tower1.happy.company
tower2.happy.company
tower3.happy.company

[instance_group_prod]
prodtower.happy.company
tower1.happy.company
tower2.happy.company
tower3.happy.company
Isolated Nodes
A headless Ansible Tower node that can be used for local execution capacity, either in a constrained networking environment or in a remote data center
Only requirement is SSH connectivity to the central Tower cluster
Isolated Nodes: Setup and Use
[isolated_group_fortress]
solitude1.fortress
solitude2.fortress

[isolated_group_fortress:vars]
controller=tower

[tower]
chicago1.home.office
chicago2.home.office
chicago3.home.office
[isolated_group_nc]cary.remote.office controller=tower
[isolated_group_il]bridgeview.remote.office controller=tower
[isolated_group_nj]piscataway.remote.office controller=tower
[isolated_group_ut]sandy.remote.office controller=tower
[tower] — the instance group that manages tasks
Thank you
Ajay Chenampara, Sr. Specialist Solutions Architect, North American Public Sector
Sam Doran, Senior Software Engineer, Ansible Core