Exploring History with Hawk - SUSECON · 2020. 7. 2.
Exploring History with Hawk
An Introduction to Cluster Forensics
Kristoffer Grönlund
High Availability Software Developer
This tutorial
• High Availability in 5 minutes
• Introduction to HAWK
‒ What's new in HAWK 2
• History Explorer
‒ Cluster Forensics
‒ Example Usage
• Summary
About me
• Kristoffer Grönlund
‒ Developer
‒ crmsh
‒ hawk
‒ resource-agents
‒ Maintainer
‒ fence-agents
‒ haproxy
High Availability
What is a cluster?
• Cluster → 1 - 32* Nodes
• Node → Single machine in cluster
‒ Hardware or virtualized
‒ Remote nodes
• Site → Physical location
‒ Local
‒ Metro
‒ Geographical
* Scale beyond 32 nodes with remote nodes
Resources
• Agent Classes
‒ Open Cluster Framework (OCF) Agents
‒ resource-agents
‒ systemd services
‒ Fencing agents
‒ Init scripts
• Examples:
‒ Web Server, File Server
‒ Databases
‒ Filesystems, IP Addresses
‒ VMs, resources in VMs...
Constraints
• Order
‒ Start resource A before resource B
• Location
‒ Resource A prefers node
• Colocation
‒ Resource A with resource B
• Score
‒ Mandatory vs. Preference
‒ Numeric value or +/- infinity
‒ Resource stickiness
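How numeric scores and +/- infinity combine can be sketched in a few lines of Python. This is only an illustration: it assumes Pacemaker's documented convention that infinity is represented by the finite value 1,000,000 and that -infinity wins when the two are combined. The function name `add_scores` is hypothetical and not part of any cluster API.

```python
INFINITY = 1_000_000  # assumed: Pacemaker's finite stand-in for "infinity"


def add_scores(a: int, b: int) -> int:
    """Combine two constraint scores.

    -INFINITY dominates everything (a ban always wins), +INFINITY
    dominates any finite value, and finite sums are clamped to the range.
    """
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


# Two finite preferences simply add up.
print(add_scores(200, 50))
# A mandatory (+infinity) score overrides any finite preference.
print(add_scores(200, INFINITY) == INFINITY)
# A ban (-infinity) overrides even a mandatory score.
print(add_scores(INFINITY, -INFINITY) == -INFINITY)
```

A finite score expresses a preference that other scores can outweigh; an infinite score expresses a rule that cannot be outweighed.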
Overview
[Figure: cluster stack overview. Layers: Corosync (Messaging / Infrastructure), Pacemaker (Resource Allocation: Cluster Resource Manager, Policy Engine, Cluster Information Base (CIB) and CIB Replica, Local Resource Manager, Designated Coordinator (DC)), Resources (Resource Agents, Resource)]
Fencing
• Dealing with Schrödinger's cat
• Goal: Preventing corruption
• Storage based: SBD
‒ Recommended if possible
‒ No special hardware required
• Hardware based: IPMI, iLO, …
‒ Many supported devices
Tools
• crmsh
‒ Command line interface
• HAWK
‒ Web interface
Learn more
• www.suse.com/documentation/sle-ha-12/
• Two node cluster in two commands
node1 # ha-cluster-init
node2 # ha-cluster-join -c node1
Introducing HAWK
HAWK - Overview
• “High Availability Web Konsole”
• Monitoring
• Configuration / Administration
• Dashboard
HAWK - Technical details
• Installed by ha-cluster-bootstrap
• Runs on the cluster nodes
• Ruby on Rails
• https://<node>:7630/
HAWK - Security
• Default user is hacluster
‒ Remember to change the password
• HTTPS for secure access
• Replace SSL certificate with your own
‒ /etc/hawk/hawk.key
‒ /etc/hawk/hawk.pem
HAWK 0.7
Status
Dashboard
HAWK 2
A New Look
• Complete visual overhaul
‒ More intuitive
‒ Similar to other SUSE tools
• Improved features
‒ History Explorer
‒ More powerful wizards
‒ Integrated help
• Supports new cluster features
Upgrading to HAWK 2
zypper install hawk2
Login
Status
Dashboard
Graph
Simulator
Simulator, node event
Simulator, results
Creating resources
Command log
Wizards
Wizards
• Apply a complete cluster configuration
• Help configure constraints and groups
• Install and configure required software
Wizards
Wizard, configuration
Wizard, verify changes
Wizard, advanced options
Wizard, optional steps
Wizard, verify changes (1)
Wizard, verify changes (2)
Command line wizards
crm script
list
show virtual-ip
verify virtual-ip id=admin-ip ip=10.13.37.42
run virtual-ip id=...
History Explorer
Cluster Forensics
• Something went wrong
‒ How can we figure it out?
‒ Pitfalls
• Understanding the cluster logs
‒ Use the history explorer
‒ Get a cluster report
Root Cause Analysis
• Start at the evidence
• Trace backwards
• Know the application
• Assume you know nothing
Jumping To Conclusions
• Always stay on the evidence
• When the evidence runs out, we are guessing
• Guessing is OK!
‒ But know when you are guessing
The Evidence
• Failed Cluster Action
‒ Software bugs, crashes
‒ Configuration error
• Failed Node
‒ Hardware failure
‒ Communication error
Collecting data
crm report -f '2015-10-10 12:00' -t '2015-10-10 14:00' strange_event
Understanding the logs
2015-10-11T19:40:11.717167+02:00 sle12sp1a crmd[1590]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
2015-10-11T19:40:19.777412+02:00 sle12sp1a apache(srv2)[20777]: INFO: Successfully retrieved http header at http://localhost:8000
2015-10-11T19:40:24.524292+02:00 sle12sp1a crmd[1590]: notice: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
2015-10-11T19:40:24.528651+02:00 sle12sp1a pengine[1589]: notice: Restart admin_addr#011(Started sle12sp1b)
2015-10-11T19:40:24.528851+02:00 sle12sp1a pengine[1589]: notice: Calculated Transition 156: /var/lib/pacemaker/pengine/pe-input-55.bz2
2015-10-11T19:40:24.530055+02:00 sle12sp1a crmd[1590]: notice: Processing graph 156 (ref=pe_calc-dc-1444585224-290) derived from /var/lib/pacemaker/pengine/pe-input-55.bz2
2015-10-11T19:40:24.530701+02:00 sle12sp1a crmd[1590]: notice: Initiating action 16: stop admin_addr_stop_0 on sle12sp1b
2015-10-11T19:40:24.740118+02:00 sle12sp1a crmd[1590]: notice: Initiating action 6: start admin_addr_start_0 on sle12sp1b
2015-10-11T19:40:24.801183+02:00 sle12sp1a crmd[1590]: notice: Initiating action 1: monitor admin_addr_monitor_10000 on sle12sp1b
2015-10-11T19:40:24.836022+02:00 sle12sp1a crmd[1590]: notice: Transition 156 (Complete=4, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-55.bz2): Complete
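To make the anatomy of such a log line explicit, here is a minimal Python sketch that splits one entry into timestamp, host, process, PID and message. The field layout is an assumption read off the excerpt above, and `parse_line` is a hypothetical helper, not something shipped with crmsh or Hawk.

```python
import re
from datetime import datetime

# Assumed layout: "<ISO timestamp> <host> <process>[<pid>]: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\S+) (?P<host>\S+) (?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def parse_line(line: str) -> dict:
    """Split one syslog-style cluster log line into named fields."""
    match = LOG_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    record = match.groupdict()
    record["ts"] = datetime.fromisoformat(record["ts"])  # keeps the UTC offset
    record["pid"] = int(record["pid"])
    return record


sample = (
    "2015-10-11T19:40:24.530701+02:00 sle12sp1a crmd[1590]: "
    "notice: Initiating action 16: stop admin_addr_stop_0 on sle12sp1b"
)
record = parse_line(sample)
print(record["host"], record["proc"], record["pid"])
print(record["msg"])
```

The process name shows which component spoke (crmd, pengine, or a resource agent such as apache), and the host shows which node wrote the line.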
Internal components
• Cluster Information Base (CIB)
• Cluster Resource Management daemon (crmd)
• Local Resource Management daemon (lrmd)
• Policy Engine (pengine)
• Fencing daemon (stonithd)
Policy Engine
• Designated Controller (DC)
‒ Elected automatically
‒ Calculates ideal cluster state
‒ Decides on actions to achieve state
Transition
• Sequence of actions to reach new state
• Records state before and after transition
• Saved to /var/lib/pacemaker/pengine/
• Numbered with sequence number
‒ Number sequence may reset to 0 if DC is re-elected
Cluster Actions
• <resource>_<action>_<nn>
• Actions
‒ start
‒ stop
‒ promote
‒ demote
‒ monitor
‒ migrate_to
‒ migrate_from
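The `<resource>_<action>_<nn>` naming can be taken apart mechanically, as in the sketch below. It assumes the trailing number is the operation interval in milliseconds (0 for one-off operations such as start and stop), which matches log entries like `admin_addr_monitor_10000` and `admin_addr_start_0`. The helper `parse_action_key` is hypothetical.

```python
from typing import Tuple

# Action names from the list above; some contain underscores themselves.
ACTIONS = ("migrate_from", "migrate_to", "start", "stop",
           "promote", "demote", "monitor")


def parse_action_key(key: str) -> Tuple[str, str, int]:
    """Split '<resource>_<action>_<nn>' into (resource, action, nn)."""
    head, _, number = key.rpartition("_")
    if not head or not number.isdigit():
        raise ValueError(f"not an action key: {key!r}")
    # Resource names may contain underscores too, so match the known
    # action names as suffixes instead of splitting on every underscore.
    for action in ACTIONS:
        suffix = "_" + action
        if head.endswith(suffix) and len(head) > len(suffix):
            return head[: -len(suffix)], action, int(number)
    raise ValueError(f"unknown action in {key!r}")


print(parse_action_key("admin_addr_monitor_10000"))
print(parse_action_key("admin_addr_stop_0"))
```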
Cluster Actions
• Error Codes
0: Success
1: Generic Error
2: Argument Error
3: Unimplemented Action
4: Insufficient Permissions
5: Required Component Is Missing
6: Configuration Error
7: Resource Was Not Running
8: Running As Primary
9: Failed As Primary
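When reading logs it helps to turn a bare number back into its meaning. The following sketch encodes the table above as a Python enum. The member names are illustrative labels derived from the slide text, not the official constant names used by resource agents.

```python
from enum import IntEnum


class ReturnCode(IntEnum):
    """Return codes of cluster actions, as listed above."""
    SUCCESS = 0
    GENERIC_ERROR = 1
    ARGUMENT_ERROR = 2
    UNIMPLEMENTED_ACTION = 3
    INSUFFICIENT_PERMISSIONS = 4
    REQUIRED_COMPONENT_MISSING = 5
    CONFIGURATION_ERROR = 6
    NOT_RUNNING = 7
    RUNNING_AS_PRIMARY = 8
    FAILED_AS_PRIMARY = 9


def describe(rc: int) -> str:
    """Return a human-readable label for a numeric return code."""
    try:
        code = ReturnCode(rc)
    except ValueError:
        return f"{rc}: unknown return code"
    return f"{code.value}: {code.name.replace('_', ' ').title()}"


print(describe(7))
print(describe(42))
```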
Cluster Action Failure
• Unexpected result when performing action
• Triggers transition
• May also trigger fencing (stop failure)
Node Failure
• Quorum = Majority vote
‒ Improves availability
‒ Avoids fence loops
‒ Downside: Need more nodes
• Smaller partitions are fenced
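The majority rule is simple arithmetic, sketched here under the assumption of one vote per node and no special two-node or tie-breaker options. Both function names are hypothetical.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """True if a partition holding `partition_votes` has the majority."""
    return partition_votes >= quorum_threshold(total_votes)


# Three nodes split 2/1: only the larger partition keeps quorum.
print(has_quorum(2, 3), has_quorum(1, 3))
# Two nodes split 1/1: neither side has a majority.
print(has_quorum(1, 2))
```

The two-node case shows why the slide lists "need more nodes" as the downside: with an even split, no partition wins the vote.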
Node Failure
• Crash / reboot
• Network issues
• Leads to chaos without fencing
‒ Cluster no longer knows if node is running resources
• Uncommunicative nodes are fenced
‒ Enforces a known state
History Explorer
• Command line:
‒ crm history
• Collect logs from cluster nodes
• Analyse transitions
• Present summary of events
• View configuration
• Transition graph
• Transition diff
• Extract logs during a particular transition
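As a rough illustration of what extracting the logs of one transition involves, the sketch below selects the lines between the "Calculated Transition" message and the matching summary line. It is not how `crm history` or Hawk implement this; the marker strings are assumptions taken from Pacemaker log output of this era, and the sample lines are shortened (timestamps and host names dropped) for brevity.

```python
import re


def transition_lines(lines, number):
    """Return the lines from 'Calculated Transition <n>:' up to and
    including the 'Transition <n> (...)' summary line."""
    start = re.compile(rf"Calculated Transition {number}:")
    end = re.compile(rf"Transition {number} \(")
    selected, active = [], False
    for line in lines:
        if not active and start.search(line):
            active = True
        if active:
            selected.append(line)
            if end.search(line):
                break
    return selected


log = [
    "crmd[1590]: notice: State transition S_IDLE -> S_POLICY_ENGINE",
    "pengine[1589]: notice: Calculated Transition 156: /var/lib/pacemaker/pengine/pe-input-55.bz2",
    "crmd[1590]: notice: Processing graph 156 derived from /var/lib/pacemaker/pengine/pe-input-55.bz2",
    "crmd[1590]: notice: Initiating action 16: stop admin_addr_stop_0 on sle12sp1b",
    "crmd[1590]: notice: Transition 156 (Complete=4, Pending=0): Complete",
    "crmd[1590]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE",
]
for line in transition_lines(log, 156):
    print(line)
```

Because sequence numbers can restart when a new DC is elected, a real tool would also need the time range or the pe-input file name to tell two transitions with the same number apart.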
History Explorer
History Explorer
History Explorer
History Explorer
History Explorer
Example configuration
[Figure: example configuration diagram. Labels: demo-node1, demo-node2, srv1, srv2, g-proxy, proxy, proxy-vip, ping; scores 200, 200, 50]
Example Description
• Two web servers
‒ Port 8000
• HAProxy
‒ Port 80
‒ Load balancer (round robin)
• Failed action: kill -9 proxy detected by monitor
Failed Action
History Explorer
History Explorer
History Explorer
History Explorer
History Explorer
History Explorer
History Explorer
History Explorer
History Explorer
History Explorer
Pitfalls
Too many logs
• History explorer can get slow
‒ Run HAWK in offline mode to avoid burdening cluster
• Find the relevant transitions
• Narrow the scope
• Command line:
‒ timeframe <from> <to>
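The same narrowing idea can be applied to raw log files outside the history explorer. The sketch below keeps only lines whose leading timestamp falls inside a window. It assumes ISO 8601 timestamps with a UTC offset at the start of each line, as in the log excerpt earlier, and the bounds must carry an offset as well. The function `in_timeframe` is hypothetical and unrelated to how `crm history` works internally.

```python
from datetime import datetime


def in_timeframe(lines, start, end):
    """Yield lines whose leading ISO 8601 timestamp lies in [start, end]."""
    lower = datetime.fromisoformat(start)
    upper = datetime.fromisoformat(end)
    for line in lines:
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if lower <= when <= upper:
            yield line


log = [
    "2015-10-11T19:40:11.717167+02:00 sle12sp1a crmd[1590]: notice: State transition",
    "2015-10-11T19:40:19.777412+02:00 sle12sp1a apache(srv2)[20777]: INFO: Successfully retrieved http header",
    "2015-10-11T19:40:24.524292+02:00 sle12sp1a crmd[1590]: notice: State transition",
    "a continuation line without a timestamp",
]
window = list(in_timeframe(log, "2015-10-11T19:40:15+02:00",
                           "2015-10-11T19:40:30+02:00"))
print(len(window))
```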
End of the tracks
• Analysing action failure
‒ Example: monitor fails for unknown reasons
‒ Probes
‒ Before starting a resource, Pacemaker checks if it is running
‒ Success Is Failure
• Know your application
‒ Start at action failure, read application logs backwards
‒ At this point, the cluster can't help you
![Page 80: Exploring History with Hawk - SUSECON · 2020. 7. 2. · Corosync Messaging / Infrastructure Resource Allocation Resource Agents Resource Resource Resource Resource Local Resource](https://reader033.vdocument.in/reader033/viewer/2022053122/60aa0966e81ce9698a2fd531/html5/thumbnails/80.jpg)
General Confusion
• Which node wrote this log?
‒ Was it even running the resource in question?
• Get back to the evidence
‒ If in doubt, start over
• Cancelled Transitions
‒ Sometimes, the history explorer gets confused
‒ Fencing can cancel a transition
‒ By default, Pacemaker fences offline nodes at startup
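To answer "which node wrote this log?", it can help to split a merged log by node before reading it. A minimal sketch, assuming lines of the form `<timestamp> <node> <message>`; the `lines_by_node` helper and the sample lines are hypothetical.

```python
from collections import defaultdict


def lines_by_node(lines):
    """Group log messages by the node name in the second field."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.split(maxsplit=2)
        if len(fields) < 3:
            continue  # not in the assumed "<timestamp> <node> <message>" form
        _stamp, node, message = fields
        grouped[node].append(message)
    return dict(grouped)


# Hypothetical sample lines
sample = [
    "2020-07-01T12:00:05 node1 crmd: start rsc1",
    "2020-07-01T12:00:06 node2 lrmd: monitor rsc1 failed",
    "2020-07-01T12:00:07 node1 crmd: stop rsc1",
]

by_node = lines_by_node(sample)
for node, messages in sorted(by_node.items()):
    print(node, messages)
```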
![Page 81: Exploring History with Hawk - SUSECON · 2020. 7. 2. · Corosync Messaging / Infrastructure Resource Allocation Resource Agents Resource Resource Resource Resource Local Resource](https://reader033.vdocument.in/reader033/viewer/2022053122/60aa0966e81ce9698a2fd531/html5/thumbnails/81.jpg)
Possible Problems
• Network Latency
‒ Does your network fulfill the requirements?
• Disk is full
• Misconfiguration
‒ Use csync2 or configuration management tool
• Fencing device failure
‒ Is fencing enabled?
‒ Does the fencing device work?
‒ Use SBD
![Page 82: Exploring History with Hawk - SUSECON · 2020. 7. 2. · Corosync Messaging / Infrastructure Resource Allocation Resource Agents Resource Resource Resource Resource Local Resource](https://reader033.vdocument.in/reader033/viewer/2022053122/60aa0966e81ce9698a2fd531/html5/thumbnails/82.jpg)
Resource tracing
• crm resource trace <resource>
• /var/lib/heartbeat/trace_ra/<agent>/
• Note: Trace is written on node where resource runs
• Complete trace of every action
‒ Can be a lot of data: remember to untrace!
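Since forgetting to untrace is the main risk, one option is to pair the two commands so the untrace always runs. The sketch below wraps `crm resource trace` and `crm resource untrace` in a Python context manager. The `traced` helper and the `run` parameter are assumptions made for this example, and the demonstration uses a recorder in place of a real cluster.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def traced(resource, run=subprocess.run):
    """Enable tracing for a resource and always disable it afterwards."""
    run(["crm", "resource", "trace", resource], check=True)
    try:
        yield
    finally:
        # Runs even if the body raises, so tracing is never left switched on
        run(["crm", "resource", "untrace", resource], check=True)


# Dry run: record the commands instead of executing them on a cluster
recorded = []
with traced("my-resource", run=lambda cmd, check: recorded.append(cmd)):
    pass  # reproduce the failing action here

for cmd in recorded:
    print(" ".join(cmd))
```

On a real cluster node, leaving out the `run` argument would execute the commands through `subprocess.run`.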
![Page 83: Exploring History with Hawk - SUSECON · 2020. 7. 2. · Corosync Messaging / Infrastructure Resource Allocation Resource Agents Resource Resource Resource Resource Local Resource](https://reader033.vdocument.in/reader033/viewer/2022053122/60aa0966e81ce9698a2fd531/html5/thumbnails/83.jpg)
Summary
• Try The New Hawk
• Use The History Explorer
• Follow The Evidence
‒ Action Failure Leads To Actions
‒ Node Failure Leads To Fencing
‒ Without Fencing, Anything Can Happen
![Page 84: Exploring History with Hawk - SUSECON · 2020. 7. 2. · Corosync Messaging / Infrastructure Resource Allocation Resource Agents Resource Resource Resource Resource Local Resource](https://reader033.vdocument.in/reader033/viewer/2022053122/60aa0966e81ce9698a2fd531/html5/thumbnails/84.jpg)
Open Source
https://github.com/ClusterLabs/hawk
https://github.com/ClusterLabs/crmsh
![Page 85: Exploring History with Hawk - SUSECON · 2020. 7. 2. · Corosync Messaging / Infrastructure Resource Allocation Resource Agents Resource Resource Resource Resource Local Resource](https://reader033.vdocument.in/reader033/viewer/2022053122/60aa0966e81ce9698a2fd531/html5/thumbnails/85.jpg)
Thank you.
Questions?
www.suse.com
![Page 86: Exploring History with Hawk - SUSECON · 2020. 7. 2. · Corosync Messaging / Infrastructure Resource Allocation Resource Agents Resource Resource Resource Resource Local Resource](https://reader033.vdocument.in/reader033/viewer/2022053122/60aa0966e81ce9698a2fd531/html5/thumbnails/86.jpg)
![Page 87: Exploring History with Hawk - SUSECON · 2020. 7. 2. · Corosync Messaging / Infrastructure Resource Allocation Resource Agents Resource Resource Resource Resource Local Resource](https://reader033.vdocument.in/reader033/viewer/2022053122/60aa0966e81ce9698a2fd531/html5/thumbnails/87.jpg)
Unpublished Work of SUSE LLC. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.