©2013 Enkitec
Automating (DBA) tasks with Ansible
Frits Hoogland DOAG 2015
This is the font size used for showing screen output. Be sure this is readable for you.
This is the font used to accentuate text/console output. Make sure this is readable for you too!
`whoami`
• Frits Hoogland
• Working with Oracle products since 1996
• Blog: http://fritshoogland.wordpress.com
• Twitter: @fritshoogland
• Email: [email protected]
• Oracle ACE Director
• OakTable Member
Problem
• In my experience, configuration errors and deviations become likely when:
– The number of tasks in a procedure is high
– Task descriptions are incomplete or left to interpretation
– Tasks become anonymous (can’t be put in perspective)
– There’s too little time to interpret task outcome
– There’s too much or too little output
– The UI actions are hard to describe
– A task becomes (extremely) repetitive
Problem
• This results in:
– Differences in settings
– Differences in paths
– Inconsistent usage of upper and lowercase
– Forgotten steps
– Errors, things that don’t work
– General inconsistencies
Problem
• Which essentially means:
– Stuff not working
…or…
– Stuff working, but configured (slightly) differently
• Which can cause the next configuration/update/installation/modification to fail.
Resolution
• The problem of configuration automation was solved in 1993 by Mark Burgess with:
• CFEngine – Configuration Engine
• Released as open source software
Resolution
• Okay, problem solved then?
• In the situation I described, it was considered.
• But the concept was too alien for the majority
• It never gained wide acceptance during that time
Resolution
• Time progressed…
• From the “dotcom bubble” we have moved to the “as-a-service era™”
• Get/remove/configure hosts via web UI
• Dynamic scaling of hosts needs orchestration
• Devops
Resolution
• Now the requirements expanded
• CFEngine (essentially) keeps configuration
• What was needed:
– Orchestration
– …in addition to configuration management
Resolution
• This is what the “new” generation of “IT automation” tools brought (top 4):
• Puppet
• Chef
• Ansible
• Salt
Resolution
• There’s a lot you can say on these tools.
• Puppet: agent based, dev focussed
• Chef: agent based, dev focussed
• Ansible: agentless, admin focussed
• Salt: agent based / agentless, admin focussed
• I didn’t know about Salt when I started learning Ansible…
Resolution
• Wait a minute!
• Isn’t a (large) portion of DBA work:
– Configuration of the O/S to facilitate usage
– Install of Oracle database software
– Creation of databases
– Configuration of databases
– Updates of software and database
Resolution
• This is why I started to look at Ansible:
• Be able to completely automate boring tasks
• Work agentless (!!)
• No incorporation into corporate infra needed
• Makes it a consultant’s tool
Ansible execution
Ansible server: ansible-playbook (#!/usr/bin/python), hosts, playbook.yml
Connection: OpenSSH
Destination host: listed in the inventory/hosts file
For each task:
1. mkdir ~/.ansible/tmp/ansible-tmp…
2. copy the task args to a file in that dir
3. copy the (python) module to that dir
4. execute the module
5. send the results back in JSON format
6. remove the ansible-tmp… dir
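The six steps above can be imitated on a single machine. The following is only a sketch under assumptions: it runs locally instead of over ssh, and the module source, file names and the run_task_locally helper are hypothetical stand-ins, not Ansible's real internals.

```python
import json
import os
import shutil
import subprocess
import sys
import tempfile

# Hypothetical stand-in for a module: it reads its arguments from a file
# and prints a JSON result on stdout.
MODULE_SOURCE = '''
import json, sys
with open(sys.argv[1]) as f:
    args = json.load(f)
print(json.dumps({"changed": False, "ping": args.get("data", "pong")}))
'''


def run_task_locally(module_source, task_args):
    """Walk through the six steps locally (no ssh involved in this sketch)."""
    tmp_dir = tempfile.mkdtemp(prefix="ansible-tmp-")       # 1. create temp dir
    try:
        args_path = os.path.join(tmp_dir, "args.json")
        with open(args_path, "w") as f:                     # 2. copy task args
            json.dump(task_args, f)
        module_path = os.path.join(tmp_dir, "module.py")
        with open(module_path, "w") as f:                   # 3. copy module
            f.write(module_source)
        completed = subprocess.run(                         # 4. execute module
            [sys.executable, module_path, args_path],
            capture_output=True, text=True, check=True,
        )
        return json.loads(completed.stdout), tmp_dir        # 5. JSON result
    finally:
        shutil.rmtree(tmp_dir)                              # 6. remove temp dir


result, used_dir = run_task_locally(MODULE_SOURCE, {})
print(result)
```

The point of the sketch is that the destination host only needs ssh and a Python interpreter: everything else is copied in per task and cleaned up afterwards.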
Install ansible
• RHEL/OL 5-6; via EPEL (directly)
1. Add EPEL repository
2. Install Ansible
# yum install ansible
• EPEL doesn’t contain the latest version.
Install ansible
• RHEL/OL 5-6; via pip
1. Add EPEL repository
2. Install pip
# yum install python-pip
3. Install Ansible
# pip install ansible --quiet
• Installs latest version; not sure about askpass
Install ansible
• OSX
1. Install Xcode (via appstore)
2. Install pip
$ sudo easy_install pip
3. Install Ansible
$ sudo pip install ansible --quiet
4. Install sshpass
$ curl -O -L http://downloads.sourceforge.net/project/sshpass/sshpass/1.05/sshpass-1.05.tar.gz
$ tar xzf sshpass-1.05.tar.gz; cd sshpass-1.05
$ ./configure; make
$ sudo make install
Install ansible
• Windows – Really?
• I’ve found a link which describes how to do it in Cygwin on Windows.
– No description on www.ansible.com
– https://servercheck.in/blog/running-ansible-within-windows
• Create a VM running Linux!!
Disclaimer
• I still use Ansible on a small scale
• Most projects I see have the whole directory structure created as specified on ansible.com
– This seems to follow the CFEngine way of “declaring the intended state”.
– I mostly use it to just describe changes instead
Getting started
• Simple usage, executing a single task
• Create a directory for a specific task
• Create a file called ‘hosts’ (‘inventory’)
• This is the list of hosts you want to use
$ ansible all -i hosts --list-hosts
192.168.39.142
192.168.39.139
Getting started
• Simple usage, executing a single task
• Next step: check if you can reach the hosts
$ ansible all -i hosts -m ping
192.168.39.139 | FAILED => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue
192.168.39.142 | FAILED => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue
Getting started
• Simple usage, executing a single task
• Remember: Ansible uses ssh authentication
• You need to specify the user
• Before the key is exchanged, a password is needed
$ ansible all -i hosts -u root -k -m ping
SSH password:
192.168.39.139 | success >> {
"changed": false,
"ping": "pong"
}
192.168.39.142 | FAILED => FAILED: timed out
Getting started
• The RHEL/OL ssh daemon:
• Default enabled authentication methods:
• Password and public key
• /etc/ssh/sshd_config
• PubkeyAuthentication yes
• Authorized keys on the remote server are stored in ~/.ssh/authorized_keys
Getting started
• In order to use pub key authentication
• You need a local key-pair before you can use it
• ssh-keygen command
• The public key needs to be placed in the authorized_keys file of a server to set up pub key authentication
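Placing a key in authorized_keys boils down to "append the line unless it is already there". Below is a minimal sketch of that idea, assuming a throwaway file and a placeholder key; ensure_authorized_key is a hypothetical helper name, not an existing tool.

```python
import os
import tempfile


def ensure_authorized_key(authorized_keys_path, public_key):
    """Append public_key unless it is already listed; return True if changed."""
    public_key = public_key.strip()
    existing = []
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as f:
            existing = [line.strip() for line in f]
    if public_key in existing:
        return False
    with open(authorized_keys_path, "a") as f:
        f.write(public_key + "\n")
    return True


# Demo against a throwaway file, not a real ~/.ssh/authorized_keys.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "authorized_keys")
demo_key = "ssh-rsa AAAAexamplekey user@example"   # placeholder, not a real key
first = ensure_authorized_key(demo_file, demo_key)
second = ensure_authorized_key(demo_file, demo_key)
print(first, second)
```

Running it twice changes the file only the first time, which is the "changed" versus "ok" distinction Ansible reports per task.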
Getting started
• Simple usage, executing a single task
– Now that pub key authentication has been set up
– Let’s execute a task:
$ ansible all -i hosts -u root -a "ifconfig -a"
192.168.39.139 | success | rc=0 >>
eth0 Link encap:Ethernet HWaddr 00:0C:29:D1:DD:32
...
192.168.39.142 | success | rc=0 >>
eth0 Link encap:Ethernet HWaddr 00:0C:29:1C:6F:A9
...
Getting started
• Simple usage, executing a single task
– This looks a lot like ‘dcli’, which is provided with Exadata
Advanced usage
• Advanced usage: playbook
– Playbooks use the YAML syntax
• A YAML document must begin with ‘---’
• All members of a list start with ‘-’ at the same indentation level
• A dictionary is represented in a key/value form
• Indentation sensitive
– YAML is used because it’s easier to read for humans than formats like XML or JSON.
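To make the list/dictionary rules concrete, here is a sketch of the data structure a YAML parser would produce for a small play, printed as JSON for comparison. The play itself is a hypothetical example, and the structure is written out by hand instead of being parsed, so no YAML library is needed.

```python
import json

# Hypothetical play, written as the Python structure a YAML parser would yield:
# a list (each "- " item) holding dictionaries (key: value pairs).
play = [
    {
        "hosts": "all",
        "remote_user": "root",
        "tasks": [
            {"name": "show network interfaces", "command": "ifconfig -a"},
        ],
    }
]

as_json = json.dumps(play, indent=2)
print(as_json)
```

The JSON form carries the same information, but with braces, brackets and quotes that YAML expresses through indentation alone.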
Advanced usage
• Simple playbook: install public key for root
---
- hosts: all
  gather_facts: no
  remote_user: root
  tasks:
    - name: add public key to authorized_key file of root
      authorized_key: user=root state=present
        key="{{ lookup('file','/Users/fritshoogland/.ssh/id_dsa.pub') }}"
Advanced usage
• Simple playbook: install public key for root
• Exadata!
$ ansible-playbook -i hosts -k root_pubkey.yml
SSH password:
PLAY [all] **********************************************
TASK: [copy public key to authorized_key file of root] **
failed: [enkdb02] => {"failed": true, "parsed": false}
invalid output was: Error: ansible requires a json module, none found!
failed: [enkdb01] => {"failed": true, "parsed": false}
invalid output was: Error: ansible requires a json module, none found!
FATAL: all hosts have already failed -- aborting
PLAY RECAP **********************************************
to retry, use: --limit @/home/ansible/root_pubkey.retry
enkdb01 : ok=0 changed=0 unreachable=0 failed=1
enkdb02 : ok=0 changed=0 unreachable=0 failed=1
Advanced usage
• Wait!
• Message:
Error: ansible requires a json module, none found!
• This means the python-simplejson rpm is not installed
• Does this mean this breaks Ansible??
Advanced usage
• Resolution: raw module
• Purpose: run a command without the need for python
• In our case: to add the install of python-simplejson to the playbook
Advanced usage
• Simple playbook: install public key for root
---
- hosts: all
  gather_facts: no
  remote_user: root
  tasks:
    - name: install python-simplejson rpm
      raw: eval rpm -q python-simplejson >& /dev/null || rpm -Uvh http://public-yum.oracle.com/repo/OracleLinux/OL5/10/base/x86_64/getPackage/python-simplejson-2.0.9-8.el5.x86_64.rpm
    - name: add public key to authorized_key file of root
      authorized_key: user=root state=present
        key="{{ lookup('file','/Users/fritshoogland/.ssh/id_dsa.pub') }}"
Advanced usage
• Simple playbook, install public key for root
$ ansible-playbook -i hosts -k initial_setup_exadata.yml
SSH password:
PLAY [all] ********************************************************************
TASK: [install python-simplejson rpm] *****************************************
ok: [enkdb01]
ok: [enkdb02]
TASK: [add public key to authorized_key file of root] *************************
changed: [enkdb01]
changed: [enkdb02]
PLAY RECAP ********************************************************************
enkdb01 : ok=2 changed=1 unreachable=0 failed=0
enkdb02 : ok=2 changed=1 unreachable=0 failed=0
Thoughts
• What do you think?
• A lot of fuss for something I can do quite simply and fast myself?
• Or do you see potential?
Real Life scenario
• Let me walk you through a playbook that upgrades the Oracle database software.
• In this case: PSU4 of Oracle 11.2.0.4
psu upgrade 1/6
---
- hosts: all
  vars:
    u01_size_gb: 1
    tmp_size_gb: 1
    oracle_base: /u01/app/oracle
    oracle_home: /u01/app/oracle/product/11.2.0.4/dbhome_1
    grid_home: /u01/app/11.2.0.4/grid
    patch_dir: /u01/install
    patch_grid: true
  remote_user: oracle
  sudo_user: root
  sudo: false
  gather_facts: no
  tasks:
psu upgrade 2/6
    - name: check u01 free disk space
      action: shell df -P /u01 | awk 'END { print $4 }'
      register: u01size
      failed_when: u01size.stdout|int < {{ u01_size_gb }} * 1024 * 1024
    - name: check tmp free disk space
      action: shell df -P /tmp | awk 'END { print $4 }'
      register: tmpsize
      failed_when: tmpsize.stdout|int < {{ tmp_size_gb }} * 1024 * 1024
    - name: create directory for installation files
      action: file dest={{ patch_dir }} state=directory owner=oracle group=oinstall
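The free space checks above compare the "Available" column of df -P (1024-byte blocks with GNU df on Linux) against a threshold in GB converted to KB. The same arithmetic can be sketched in Python; free_kb and has_enough_space are hypothetical helper names, and the sketch inspects the local root filesystem only as a demonstration.

```python
import shutil


def free_kb(path):
    """Free space in 1 KiB blocks, comparable to column 4 of `df -P` on Linux."""
    return shutil.disk_usage(path).free // 1024


def has_enough_space(available_kb, required_gb):
    """Mirror of the failed_when test: the task fails when this is False."""
    return available_kb >= required_gb * 1024 * 1024


available = free_kb("/")
print(available, has_enough_space(available, 1))
```

With u01_size_gb set to 1, the threshold is 1 * 1024 * 1024 = 1048576 KB, so a filesystem reporting 1048575 available blocks would make the task fail.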
psu upgrade 3/6
    - name: copy opatch and psu grid
      copy: src=../files/{{ item }} dest={{ patch_dir }} owner=oracle group=oinstall mode=0644
      when: patch_grid|bool
      with_items:
        - p6880880_112000_Linux-x86-64.zip
        - p19380115_112040_Linux-x86-64.zip
        - ocm.rsp
    - name: copy opatch and psu db
      copy: src=../files/{{ item }} dest={{ patch_dir }} owner=oracle group=oinstall mode=0644
      when: not patch_grid|bool
      with_items:
        - p6880880_112000_Linux-x86-64.zip
        - p19121551_112040_Linux-x86-64.zip
        - ocm.rsp
psu upgrade 4/6
    - name: install opatch in database home
      action: shell unzip -oq {{ patch_dir }}/p6880880_112000_Linux-x86-64.zip -d {{ oracle_home }}
    - name: install opatch in grid home
      action: shell unzip -oq {{ patch_dir }}/p6880880_112000_Linux-x86-64.zip -d {{ grid_home }}
      when: patch_grid|bool
    - name: unzip psu patch grid
      action: shell unzip -oq {{ patch_dir }}/p19380115_112040_Linux-x86-64.zip -d {{ patch_dir }}
      when: patch_grid|bool
    - name: unzip psu patch db
      action: shell unzip -oq {{ patch_dir }}/p19121551_112040_Linux-x86-64.zip -d {{ patch_dir }}
      when: not patch_grid|bool
psu upgrade 5/6
    - name: patch conflict detection
      action: shell export ORACLE_HOME={{ oracle_home }}; cd {{ patch_dir }}/19121551; $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -ph ./
      register: conflict_detection
      failed_when: "'Prereq \"checkConflictAgainstOHWithDetail\" passed.' not in conflict_detection.stdout"
      when: not patch_grid|bool
    - name: patch apply
      action: shell export ORACLE_HOME={{ oracle_home }}; cd {{ patch_dir }}/19121551; $ORACLE_HOME/OPatch/opatch apply -silent -ocmrf {{ patch_dir }}/ocm.rsp
      register: patch_apply
      failed_when: "'Composite patch 19121551 successfully applied.' not in patch_apply.stdout"
      when: not patch_grid|bool
psu upgrade 6/6
    - name: apply psu to grid home
      action: shell {{ grid_home }}/OPatch/opatch auto {{ patch_dir }}/19380115 -oh {{ grid_home }} -ocmrf {{ patch_dir }}/ocm.rsp
      register: apply_psu_to_grid_home
      failed_when: "'opatch auto succeeded.' not in apply_psu_to_grid_home.stdout"
      sudo: true
      when: patch_grid|bool
    - name: apply psu to oracle home
      action: shell {{ oracle_home }}/OPatch/opatch auto {{ patch_dir }}/19380115 -oh {{ oracle_home }} -ocmrf {{ patch_dir }}/ocm.rsp
      register: apply_psu_to_oracle_home
      failed_when: "'opatch auto succeeded.' not in apply_psu_to_oracle_home.stdout"
      sudo: true
      when: patch_grid|bool
    - name: clean up install directory
      file: path={{ patch_dir }} state=absent
Documentation
• In order to learn about all the modules:
• http://docs.ansible.com
Playbook header
---
- hosts: all
  vars:
    u01_size_gb: 1
    tmp_size_gb: 1
    oracle_base: /u01/app/oracle
    oracle_home: /u01/app/oracle/product/11.2.0.4/dbhome_1
    grid_home: /u01/app/11.2.0.4/grid
    patch_dir: /u01/install
    dbfs_resource: None
  remote_user: oracle
  sudo_user: root
  sudo: false
  gather_facts: true
  tasks:
Check environment
    - name: check u01 free disk space
      action: shell df -P /u01 | awk 'END { print $4 }'
      register: u01size
      failed_when: u01size.stdout|int < {{ u01_size_gb }} * 1024 * 1024
    - name: create directory for installation files
      action: file dest={{ patch_dir }} state=directory owner=oracle group=oinstall
Yum & selinux
    - name: upgrade all packages
      yum: name=* state=latest
    - name: install python-selinux
      yum: name=libselinux-python state=installed
    - name: disable selinux
      selinux: state=disabled
Copy module
    - name: copy opatch and psu
      copy: src=../files/{{ item }} dest={{ patch_dir }} owner=oracle group=oinstall mode=0644
      with_items:
        - p6880880_112000_Linux-x86-64.zip
        - p19023390_112040_Linux-x86-64.zip
        - ocm.rsp
    - name: install opatch in grid home
      action: shell unzip -oq {{ patch_dir }}/p6880880_112000_Linux-x86-64.zip -d {{ grid_home }}
    - name: install opatch in database home
      action: shell unzip -oq {{ patch_dir }}/p6880880_112000_Linux-x86-64.zip -d {{ oracle_home }}
    - name: unzip bundle patch
      action: shell unzip -oq {{ patch_dir }}/p19023390_112040_Linux-x86-64.zip -d {{ patch_dir }}
Copy files from webserver
    # declared variable
    http_host: http://192.168.178.25/files/
    - name: copy installation media files
      get_url: url={{ http_host }}{{ item }} dest={{ patch_dir }} mode=0644
      with_items:
        - linux.x64_11gR2_database_1of2.zip
        - linux.x64_11gR2_database_2of2.zip
git
    # declared variables
    tools_home: /u01/tools
    git_repo: git://192.168.178.19/tools.git
    - name: install git package
      yum: name=git state=present
      sudo: yes
    - name: clone tools
      git: repo={{ git_repo }} dest={{ tools_home }} accept_hostkey=True
run opatch
    - name: apply bundle patch to database home
      action: shell export ORACLE_HOME={{ oracle_home }}; $ORACLE_HOME/OPatch/opatch auto {{ patch_dir }}/19023390 -ocmrf {{ patch_dir }}/ocm.rsp -oh $ORACLE_HOME
      sudo: yes
      register: patch_oracle
      failed_when: "'opatch auto succeeded.' not in patch_oracle.stdout"
Check for newer kernel & reboot
    - name: reboot if the kernel was updated
      shell: 'if [ $(rpm -q --last kernel kernel-uek | head -1 | sed "s/^.*-\([0-9]*\.[0-9]*\.[0-9]*-.*x86_64\)\ *.*/\1/" | sed "s/^.*-\([0-9]*\.[0-9]*\.[0-9]*-.*uek\)\ *.*/\1/" ) != $(uname -r) ]; then echo "Reboot by Ansible: new kernel installed"; shutdown -r now "Reboot by Ansible: new kernel installed"; fi'
      register: reboot
      async: 0
      poll: 0
      ignore_errors: true
    - name: wait for server to come down
      when: "'Reboot by Ansible: new kernel installed' in reboot.stdout"
      local_action: wait_for host={{ inventory_hostname }} port=22 state=stopped
    - name: wait for server to come back up
      when: "'Reboot by Ansible: new kernel installed' in reboot.stdout"
      local_action: wait_for host={{ inventory_hostname }} port=22 state=started
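The reboot decision above rests on one comparison: the release string of the most recently installed kernel package versus the output of uname -r. A rough Python sketch of that comparison follows; the regular expression is a simplification of the two sed expressions, and the sample rpm line and both helper names are hypothetical.

```python
import re


def kernel_release(rpm_last_line):
    """Extract e.g. '2.6.32-431.el6.x86_64' from one line of `rpm -q --last`."""
    match = re.search(r"-(\d+\.\d+\.\d+-\S+)", rpm_last_line)
    return match.group(1) if match else None


def reboot_needed(rpm_last_line, running_release):
    """True when the newest installed kernel is not the running one."""
    return kernel_release(rpm_last_line) != running_release


# Hypothetical sample line; real output depends on the installed packages.
newest = "kernel-uek-2.6.39-400.215.10.el6uek.x86_64  Mon 10 Nov 2014 10:00:00 AM CET"
print(reboot_needed(newest, "2.6.39-400.215.10.el6uek.x86_64"))
print(reboot_needed(newest, "2.6.39-400.109.1.el6uek.x86_64"))
```

Only when the two strings differ does the task echo its marker text, and that marker is what the two wait_for tasks test for.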
Templates
• Sometimes, an unattended install needs a filled out file
• For example the Oracle grid or database install
• Introducing: Ansible templates
Templates...
#-------------------------------------------------------------------------------
# Specify the hostname of the system as set during the install. It can be used
# to force the installation to use an alternative hostname rather than using the
# first hostname found on the system. (e.g., for systems with multiple hostnames
# and network interfaces)
#-------------------------------------------------------------------------------
ORACLE_HOSTNAME={{ ansible_hostname }}
...
#------------------------------------------------------------------------------
# Specify the complete path of the Oracle Home.
#------------------------------------------------------------------------------
ORACLE_HOME={{ oracle_home }}
...
Templates
    - name: create response file from template
      template: src=/Users/fritshoogland/Documents/ansible/files/db_install_11204_single_instance.rsp.j2 dest={{ patch_dir }}/db_install_11204.rsp owner=oracle group=oinstall
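What the template module does with the response file can be pictured as variable substitution. Ansible uses Jinja2 for this; the sketch below is not Jinja2 but a plain regular expression that handles only simple "{{ name }}" placeholders, and the host name and the render helper are hypothetical.

```python
import re


def render(template_text, variables):
    """Replace every '{{ name }}' with variables[name]; KeyError if missing."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(variables[m.group(1)]),
        template_text,
    )


template_text = (
    "ORACLE_HOSTNAME={{ ansible_hostname }}\n"
    "ORACLE_HOME={{ oracle_home }}\n"
)
rendered = render(template_text, {
    "ansible_hostname": "dbhost01",   # hypothetical host name
    "oracle_home": "/u01/app/oracle/product/11.2.0.4/dbhome_1",
})
print(rendered)
```

In the real playbook, ansible_hostname comes from gathered facts and oracle_home from the vars section, so one template serves every host.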
unattended / silent install oracle
    - name: install oracle 11.2.0.4
      action: shell {{ patch_dir }}/database/runInstaller -silent -force -waitforcompletion -responsefile {{ patch_dir }}/db_install_11204.rsp
      register: install_oracle
      failed_when: "'The installation of Oracle Database 11g was successful.' not in install_oracle.stdout"
    - name: run orainstRoot.sh
      action: shell /u01/app/oraInventory/orainstRoot.sh
      when: "'/u01/app/oraInventory/orainstRoot.sh' in install_oracle.stdout"
      sudo: true
    - name: run root.sh
      action: shell {{ oracle_home }}/root.sh -silent
      sudo: true
Consistency…
$ grep 'failed_when.*install_oracle.stdout' install_oracle_*instance.yml
install_oracle_11201_oh_single_instance.yml:
failed_when: "'Successfully Setup Software.' not in install_oracle.stdout"
install_oracle_11202_oh_single_instance.yml:
failed_when: "'Successfully Setup Software.' not in install_oracle.stdout"
install_oracle_11203_oh_single_instance.yml:
failed_when: "'The installation of Oracle Database 11g was successful.' not in install_oracle.stdout"
install_oracle_11204_oh_single_instance.yml:
failed_when: "'The installation of Oracle Database 11g was successful.' not in install_oracle.stdout"
Consistency…
install_psu_112016.yml:
failed_when: "'Patch 12419378 successfully applied' not in patch_apply.stdout"
install_psu_1120212.yml:
failed_when: "'Composite patch 17082367 successfully applied.' not in patch_apply.stdout"
install_psu_1120312.yml:
failed_when: "'Composite patch 19121548 successfully applied.' not in patch_apply.stdout"
install_psu_112044.yml:
failed_when: "'Composite patch 19121551 successfully applied.' not in patch_apply.stdout"
install_psu_121015.yml:
failed_when: "'Patch 19121550 successfully applied' not in patch_apply.stdout"
install_psu_121021.yml:
failed_when: "'Patch 19303936 successfully applied' not in patch_apply.stdout"
Check/modify configuration files
    - name: check limits.conf
      lineinfile:
        dest=/etc/security/limits.conf
        state=present
        regexp="^{{ item.key }}"
        line="{{ item.key }} {{ item.value }}"
        backup=yes
      with_items:
        - { key: 'oracle soft core', value: 'unlimited' }
        - { key: 'oracle hard core', value: 'unlimited' }
        - { key: 'oracle soft nproc', value: '131072' }
        - { key: 'oracle hard nproc', value: '131072' }
        - { key: 'oracle soft nofile', value: '65536' }
        - { key: 'oracle hard nofile', value: '65536' }
        - { key: 'oracle soft memlock', value: '55679550' }
        - { key: 'oracle hard memlock', value: '55679550' }
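The lineinfile task above can be read as: find a line matching the regexp, make it equal to the wanted line, or append the wanted line when nothing matches. Here is a sketch of that behaviour on an in-memory list of lines, assuming (as I understand the module) that the last matching line is the one replaced; line_in_file and the sample content are hypothetical.

```python
import re


def line_in_file(lines, regexp, line):
    """Return (new_lines, changed) for a 'state=present' style edit."""
    pattern = re.compile(regexp)
    matches = [i for i, existing in enumerate(lines) if pattern.search(existing)]
    if not matches:
        return lines + [line], True          # no match: append at the end
    last = matches[-1]
    if lines[last] == line:
        return list(lines), False            # already as desired
    updated = list(lines)
    updated[last] = line                     # replace the last matching line
    return updated, True


limits = ["# comment", "oracle soft nofile 1024"]
limits, changed1 = line_in_file(limits, "^oracle soft nofile", "oracle soft nofile 65536")
limits, changed2 = line_in_file(limits, "^oracle soft nofile", "oracle soft nofile 65536")
limits, changed3 = line_in_file(limits, "^oracle hard nofile", "oracle hard nofile 65536")
print(limits, changed1, changed2, changed3)
```

The second call reports no change, which is why rerunning the playbook against an already configured host leaves limits.conf alone.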
Check/modify configuration files
    - name: disable pam_tally2.so in /etc/pam.d
      lineinfile:
        dest={{ item }}
        state=present
        regexp=".*auth required pam_tally2.so.*"
        line="#auth required pam_tally2.so deny=5 onerr=fail lock_time=600"
        backup=yes
      with_items:
        - /etc/pam.d/sshd
        - /etc/pam.d/login
Shell script
    - name: check fsck interval of default exadata filesystems
      shell: "[ $(tune2fs -l LABEL={{ item }} | grep '^Maximum mount count' | sed 's/^.*:\ *//') -ne -1 ] && tune2fs -c -1 LABEL={{ item }} || true"
      with_items:
        - BOOT
        - DBSYS
        - DBORA
Kernel parameters / sysctl
    - name: check sysctl.conf
      sysctl:
        name="{{ item.key }}"
        value="{{ item.value }}"
        state=present
        reload=no
      with_items:
        - { key: 'net.ipv4.conf.all.secure_redirects', value: '0' }
        - { key: 'net.ipv4.conf.default.log_martians', value: '1' }
        - { key: 'net.ipv6.conf.default.accept_redirects', value: '0' }
        - { key: 'net.ipv6.conf.all.accept_redirects', value: '0' }
        - { key: 'vm.max_map_count', value: '250000' }
        - { key: 'kernel.randomize_va_space', value: '2' }
        - { key: 'kernel.shmmax', value: "{{ (( memsize_kb.stdout|int * 1024 ) * 0.85 )|int }}" }
        - { key: 'kernel.shmall', value: "{{ (( memsize_kb.stdout|int * 1024 ) / page_size.stdout|int )|int }}" }
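The two computed values can be checked by hand. The sketch below repeats the arithmetic of the last two items, assuming memsize_kb holds physical memory in kB and page_size holds the page size in bytes (both are registered by tasks not shown on the slide); shm_settings is a hypothetical helper and the input numbers are examples only.

```python
def shm_settings(memsize_kb, page_size):
    """Return (shmmax, shmall) using the same formulas as the playbook."""
    mem_bytes = memsize_kb * 1024
    shmmax = int(mem_bytes * 0.85)        # bytes: 85% of physical memory
    shmall = int(mem_bytes / page_size)   # pages: all of memory, in pages
    return shmmax, shmall


# Example input: 1 GiB of memory (1048576 kB) and 4096-byte pages.
shmmax, shmall = shm_settings(memsize_kb=1048576, page_size=4096)
print(shmmax, shmall)
```

For this example the result is shmmax 912680550 bytes and shmall 262144 pages.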