TRANSCRIPT
Running Ansible at Scale
Ajay Chenampara, Sr. Specialist Solutions Architect, North American Public Sector
Sam Doran, Senior Software Engineer, Ansible Core
Agenda
● How Ansible grows
● Workflow and content
● Scaling Ansible Core
● Scaling further with Ansible Tower
How Ansible Grows 🌱
ansible (2.7.13) · boto3 (1.9.202) · pycrypto (2.6.1) · Jinja2 (2.9.6)
ansible (2.8.0) · boto3 (1.9.202) · pycrypto (2.6.1) · Jinja2 (2.3)
ansible (2.8.5) · boto3 (1.2.4) · Jinja2 (2.3)
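The environments above show how module and library versions drift between Ansible releases. A minimal sketch of guarding against that drift — `check_versions` is a hypothetical helper, not something from the talk, and the version numbers are the illustrative ones from the slides:

```python
def check_versions(installed, expected):
    """Return packages whose installed version differs from the tested pin.

    Both arguments map package name -> version string.
    """
    return {
        pkg: (installed.get(pkg), want)
        for pkg, want in expected.items()
        if installed.get(pkg) != want
    }

# Versions taken from the slides, purely illustrative
tested = {"ansible": "2.8.5", "boto3": "1.2.4", "Jinja2": "2.3"}
current = {"ansible": "2.8.0", "boto3": "1.2.4", "Jinja2": "2.3"}
print(check_versions(current, tested))  # {'ansible': ('2.8.0', '2.8.5')}
```

Running a check like this (or simply pinning everything in a requirements file) keeps playbook behavior reproducible across upgrades.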
Workflow and Content
[Diagram: Git branching workflow — Production, Test, Master, and Feature/Bug Fix branches]
Repository Structure
ansible
├── group_vars
│   ├── all.yml
│   ├── dev.yml
│   ├── prod.yml
│   └── web.yml
├── inventory
├── library
├── roles
│   └── requirements.yml
├── .gitignore
├── ansible.cfg
├── apache.yml
├── deploy-app.yml
└── install-updates.yml
group_vars — play and group variables that work consistently between the command line and Ansible Tower
inventory — static, dynamic, or both
library — custom modules available across all roles
roles — roles from external repositories
# Example requirements.yml
- samdoran.java
- samdoran.redhat_subscription

- src: ssh://[email protected]:8989/ansible-role-apache.git
  name: apache
  scm: git
  version: develop

- src: ssh://[email protected]:8989/ansible-role-users.git
  name: users
  scm: git
.gitignore — ignore roles and other files
*.retry
.vagrant
*.zip

# Ignore everything in roles/ except requirements.yml
roles/*
!roles/requirements.yml
ansible.cfg — main Ansible configuration
Playbooks — adjacent to group_vars and library
Another Repository Structure
├── inventory
├── library
├── playbooks
│   ├── group_vars
│   │   ├── all.yml
│   │   ├── dev.yml
│   │   ├── prod.yml
│   │   └── web.yml
│   ├── library -> ../library
│   ├── roles -> ../roles
│   ├── apache.yml
│   ├── deploy-app.yml
│   └── install-updates.yml
├── roles
│   └── requirements.yml
├── .gitignore
└── ansible.cfg
Symlinks keep library and roles adjacent to playbooks
Scaling Ansible Core
How Ansible Works
Linux/Windows hosts: module code is copied to the managed node, executed, then removed.
Networking devices: module code is executed locally on the control node.
2.6: memory ballooning (#35921)
2.7: deepdish copy (#44337) 🍕
2.8: consolidate handler tracking (#49338)
2.9: Perfy McPerfton (#58400)
Use the Latest Version
Facts
"Smart" gathering means only gather facts if needed
Gathering all facts can consume a lot of memory and cause CPU contention at higher fork counts — use gather_subset = min
Just the Facts
Several cache plugins are available:
● jsonfile
● memcached
● mongodb
● pickle
● redis
● yaml
[defaults]
gathering = smart
gather_subset = min
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/cache
fact_caching_timeout = 3600
Forks 🍴
Default is 5 (very conservative)
More forks means more parallel connections to hosts
Too many forks will overburden your system through context switching and the large number of facts held in memory
Forks
General guidelines:
● 5-25 forks on a developer workstation or laptop
● 25-50 on a dedicated server
[defaults]
gathering = smart
gather_subset = min
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/cache
fact_caching_timeout = 3600
forks = 30
Connection
Enable pipelining (requires requiretty to be disabled on managed hosts)
Increase ControlPersist timeout (default 60s)
Use scp (default is sftp)
[defaults]
gathering = smart
gather_subset = min
fact_caching = jsonfile
fact_caching_connection = ~/.ansible/cache
fact_caching_timeout = 3600
forks = 50

[ssh_connection]
pipelining = True
scp_if_ssh = True
ssh_args = -C -o ControlMaster=auto -o ControlPersist=15m
Grab Bag
Python 3 performs better than Python 2
Bastion hosts will slow things down
Passwordless ssh authentication will speed things up
Remote shell profile can slow things down
Using native Jinja types can speed things up (DEFAULT_JINJA2_NATIVE)
Scaling Further with Ansible Tower
Challenges to scaling Ansible
Secure credential storage
Scheduler
API
Detailed auditing
Consistent Ansible version
Python libraries and module dependencies
Capacity and Jobs
Ansible Tower Capacity Determination
Memory Relative Capacity:
Number of Forks = (total_mem - 2GB)/mem_per_fork
mem_per_fork = 100 MB by default
CPU Relative Capacity:
Number of Forks = cpus * forks_per_cpu
forks_per_cpu = 4 by default
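Putting the two formulas above together, a node's capacity can be computed from its memory and CPU count. A minimal sketch — `tower_capacity` is a hypothetical helper, with the defaults quoted on the slides:

```python
def tower_capacity(total_mem_mb, cpus, mem_per_fork_mb=100, forks_per_cpu=4):
    """Compute memory-relative and CPU-relative fork capacities.

    Defaults (100 MB per fork, 4 forks per CPU) are the ones from the slides.
    """
    # Memory-relative: reserve 2 GB for Tower itself, then mem_per_fork per fork
    mem_forks = (total_mem_mb - 2048) // mem_per_fork_mb
    # CPU-relative: forks_per_cpu forks per CPU
    cpu_forks = cpus * forks_per_cpu
    return mem_forks, cpu_forks

# A 16 GB, 4-CPU node
print(tower_capacity(16384, 4))  # (143, 16)
```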
Running Jobs
Job Slicing
● Slices inventory into a number of chunks, which are then used to run smaller job slices.
● Ideal for workloads where the tasks run on each host can run independently of other hosts
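To illustrate the idea behind slicing — this is a simplified sketch, not Tower's actual implementation — the inventory is divided into independent chunks that can each run as a separate job:

```python
def slice_inventory(hosts, slice_count):
    """Distribute hosts round-robin into slice_count independent chunks."""
    chunks = [[] for _ in range(slice_count)]
    for i, host in enumerate(hosts):
        chunks[i % slice_count].append(host)
    return chunks

hosts = ["web1", "web2", "web3", "web4", "web5"]
print(slice_inventory(hosts, 2))  # [['web1', 'web3', 'web5'], ['web2', 'web4']]
```

Each chunk becomes its own job slice, so the slices only make sense when hosts do not depend on one another.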
Fact Caching
● Cache once, consume over and over
● Schedule a job to gather facts
Smart Inventory
Clustering
Bigger is not always better
Instance Groups: Setup and Use
A set of cluster nodes dedicated for a particular purpose
● Instances are shared among teams, groups, and organizations
● Each instance group has its own job queue, and any node in the group can take jobs off of that queue
● Jobs can be assigned to an instance group in three ways:
○ By the organization
○ By the inventory
○ By the individual job template
Instance Groups
[tower]
tower1.happy.company
tower2.happy.company
tower3.happy.company

[instance_group_network]
net1.happy.company
net2.happy.company
net3.happy.company
tower1.happy.company

[instance_group_compute]
compute1.happy.company
tower1.happy.company
tower2.happy.company
tower3.happy.company

[instance_group_prod]
prodtower.happy.company
tower1.happy.company
tower2.happy.company
tower3.happy.company
Isolated Nodes
A headless Ansible Tower node that can be used for local execution capacity, either in a constrained networking environment or in a remote data center
Only requirement is SSH connectivity to the central Tower cluster
Isolated Nodes: Setup and Use
[isolated_group_fortress]
solitude1.fortress
solitude2.fortress

[isolated_group_fortress:vars]
controller=tower

[tower]
chicago1.home.office
chicago2.home.office
chicago3.home.office
[isolated_group_nc]cary.remote.office controller=tower
[isolated_group_il]bridgeview.remote.office controller=tower
[isolated_group_nj]piscataway.remote.office controller=tower
[isolated_group_ut]sandy.remote.office controller=tower
[tower] — the instance group that manages tasks
Thank you
Ajay Chenampara, Sr. Specialist Solutions Architect, North American Public Sector
Sam Doran, Senior Software Engineer, Ansible Core