1
Doctor Fault Management
18 May 2015
Ryota Mibu, NEC
2
Doctor Overview
• One of the OPNFV requirement projects (requirement identification, gap analysis, implementation study)
• Goal
– Build a fault management and maintenance framework for high availability of network services on top of virtualized infrastructure
– Provide a framework that is valuable and acceptable to other industries
• Status
– Initial requirement study, architecture design, gap analysis: Done (see document [link])
– Collaborative development: Started (blueprints have been proposed to Nova and Ceilometer)
– Standardization sync: Ongoing (through NFV member efforts and joint meetings)
3
Use Case 1: Fault management
4
Use Case 2: Maintenance
5
High Level Architecture
[Diagram: Applications (App App App); Virtualized Infrastructure (Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, Hardware Resources); Virtualized Infrastructure Manager (VIM) = OpenStack; VIM User and Administrator]
6
Fault Management Sequence
[Diagram: same layers as the High Level Architecture slide (Applications; Virtualized Infrastructure; VIM = OpenStack; VIM User and Administrator), annotated with "Detection" and "Reaction" flows; one region is labeled "Doctor Initial Focus"]
8
Key Requirements for the VIM
• Immediate Notification
• Consistent Resource State Awareness
• Extensible Monitoring
• Fault Correlation
9
TO-BE: Functional Blocks
[Diagram: same layers as the High Level Architecture slide, with the VIM broken down into four functional blocks: Monitor, Inspector, Controller and Notifier]
10
Fault Management Scenarios (1/2)
[Diagram components: Monitors, Inspector, Resource Map, Failure Policy, Controllers, Notifier, Alarm Conf., User-side Manager, Admin-side Manager, Virtualized Infrastructure, Applications]
0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all (alternative path: 4. Notify)
5. Notify Error
6-. Action
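The numbered steps above can be traced with a small in-process sketch. It is illustrative only, not Doctor code: the class and method names are hypothetical, and it assumes that the Resource Map and Failure Policy belong to the Inspector, that Alarm Conf. belongs to the Notifier, and that the Controller forwards every state change to the Notifier ("Notify all").

```python
from dataclasses import dataclass, field

# Shared trace so the order of the numbered steps is visible.
trace: list[str] = []


@dataclass
class Notifier:
    # Alarm Conf. (assumed to live here): resource -> subscribed user-side managers.
    alarm_conf: dict[str, list[str]] = field(default_factory=dict)

    def set_alarm(self, subscriber: str, resource: str) -> None:
        # Step 0: a user-side manager asks to be told about one resource.
        self.alarm_conf.setdefault(resource, []).append(subscriber)
        trace.append(f"0. Set Alarm: {subscriber} on {resource}")

    def notify(self, resource: str, state: str) -> None:
        # Step 5: only subscribers of this resource receive the error.
        for subscriber in self.alarm_conf.get(resource, []):
            trace.append(f"5. Notify Error: {resource} {state} -> {subscriber}")


@dataclass
class Controller:
    notifier: Notifier
    states: dict[str, str] = field(default_factory=dict)

    def update_state(self, resource: str, state: str) -> None:
        # Step 3: keep the resource state consistent with the real fault.
        self.states[resource] = state
        trace.append(f"3. Update State: {resource} -> {state}")
        # Step 4: report every state change to the Notifier ("Notify all").
        trace.append(f"4. Notify all: {resource} {state}")
        self.notifier.notify(resource, state)


@dataclass
class Inspector:
    controller: Controller
    resource_map: dict[str, list[str]]  # physical resource -> virtual resources on it
    failure_policy: set[str]            # raw failure types treated as faults

    def on_raw_failure(self, resource: str, failure_type: str) -> None:
        # Step 1: a Monitor reports a raw failure; the policy filters out noise.
        if failure_type not in self.failure_policy:
            return
        trace.append(f"1. Raw Failure: {resource} {failure_type}")
        # Step 2: correlate the physical fault with the affected virtual resources.
        affected = self.resource_map.get(resource, [])
        trace.append(f"2. Find Affected: {', '.join(affected)}")
        for vm in affected:
            self.controller.update_state(vm, "down")


# Usage: one host with two VMs; only vm-a1 has an alarm set.
notifier = Notifier()
controller = Controller(notifier)
inspector = Inspector(controller,
                      resource_map={"host-a": ["vm-a1", "vm-a2"]},
                      failure_policy={"power-off"})
notifier.set_alarm("user-side-manager", "vm-a1")
inspector.on_raw_failure("host-a", "power-off")
inspector.on_raw_failure("host-a", "fan-speed-warning")  # ignored by the policy
for line in trace:
    print(line)
```

Step 6 (Action) is left to whoever receives the error, for example a User-side Manager triggering a failover. The "4. (alt) Notify" path would presumably let the Inspector call the Notifier directly; the sketch only follows the Controller path.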
11
Fault Management Scenarios (2/2)
[Diagram: same components and numbered steps (0. Set Alarm through 6-. Action) as Fault Management Scenarios (1/2)]
12
AS-IS: OpenStack Kilo (1/3)
• How can you find faults as a tenant user?
– Keep-alive checks to each VM
– Polling VM state via the Nova API
– Setting an alarm on the metering service (e.g. CPU runtime)
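Polling (or keep-alive checking) bounds how quickly a tenant can notice a failure. The sketch below computes that delay under simple assumptions: polls run at a fixed interval, and a fault is declared after a configurable number of consecutive failed polls. The function name and parameters are hypothetical, and the actual Nova API call is left out.

```python
import math


def detection_delay(failure_at: float, interval: float,
                    misses_needed: int = 1, first_poll_at: float = 0.0) -> float:
    """Seconds between a VM failure and the moment a tenant-side poller declares it.

    Assumptions (illustrative, not OpenStack behaviour):
    - polls run at first_poll_at + k * interval, k = 0, 1, 2, ...
    - a poll at time t observes the failure if t >= failure_at
    - keep-alive style: the fault is declared after `misses_needed`
      consecutive failed polls
    """
    k = max(math.ceil((failure_at - first_poll_at) / interval), 0)
    first_failed_poll = first_poll_at + k * interval
    declared_at = first_failed_poll + (misses_needed - 1) * interval
    return declared_at - failure_at


# A VM dies at t = 12 s while the tenant polls every 10 s.
print(detection_delay(12, 10))                   # first poll that sees it is at t = 20
print(detection_delay(12, 10, misses_needed=3))  # three missed keep-alives: t = 40
```

With one miss the delay is anywhere from zero up to one interval; requiring several misses, which a keep-alive check may do to avoid false alarms, adds a full interval per extra miss.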
13
AS-IS: OpenStack Kilo (2/3)
• How does the metering service work?
1. A resource controller such as Nova monitors resource usage [periodically]
2. Samples are collected from the resource controller and stored in the DB [periodically]
3. Alarm definitions are evaluated against the samples [periodically]
4. An alarm is raised depending on the result of the evaluation
[Diagram: Machine, Hypervisor and VM; Nova; Ceilometer (Heat) with Samples; arrows labeled 1 to 4 for the steps above]
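Because steps 1 to 3 each run periodically, a failure is only reported after it has passed through every stage's next run. The sketch below models that chain; the stage names and periods are hypothetical example values, not Ceilometer or Nova defaults, and step 4 is treated as instantaneous.

```python
import math
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    period: float        # seconds between runs (hypothetical values below)
    offset: float = 0.0  # time of the first run, 0 <= offset < period

    def next_run(self, t: float) -> float:
        """Earliest run of this stage at or after time t."""
        k = max(math.ceil((t - self.offset) / self.period), 0)
        return self.offset + k * self.period


def alarm_time(failure_at: float, stages: list[Stage]) -> float:
    """Each stage only sees the failure at its next run after the previous stage."""
    t = failure_at
    for stage in stages:
        t = stage.next_run(t)
    return t  # step 4 (raise alarm) assumed to take no time


pipeline = [
    Stage("1. monitor resource usage", period=10),
    Stage("2. collect samples into DB", period=30),
    Stage("3. evaluate alarm definition", period=60),
]
print(alarm_time(1.0, pipeline))  # failure at t = 1 s -> alarm at t = 60 s
```

With offsets in [0, period), each stage adds less than its own period, so the failure-to-alarm time stays below the sum of the periods (100 s here); how close it gets to that bound depends on how the stage schedules line up.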
14
AS-IS: OpenStack Kilo (3/3)
• Notification
– OpenStack components post events to the messaging queue
– Ceilometer collects, transforms and publishes those events, which can be used for billing
[Diagram: NFVI services Nova, Neutron and Cinder post events to a Queue, from which Ceilometer (Billing) produces Samples]
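The notification path differs from metering: services push events as they happen instead of being sampled. Below is a minimal sketch of that collect, transform and publish flow, with Python's `queue.Queue` standing in for the messaging queue. The payload fields, helper names and per-resource billing summary are hypothetical, and the event type strings are only illustrative.

```python
import queue
from collections import Counter
from dataclasses import dataclass

# Stand-in for the messaging queue that OpenStack services post events to.
bus = queue.Queue()


@dataclass
class Sample:
    resource_id: str
    meter: str      # here simply the event type
    volume: float


def post_event(service: str, event_type: str, resource_id: str) -> None:
    """A service such as Nova, Neutron or Cinder posts an event (illustrative payload)."""
    bus.put({"publisher": service, "event_type": event_type, "resource_id": resource_id})


def collect_and_transform() -> list[Sample]:
    """Collect: drain the queue. Transform: turn each event into a sample."""
    samples = []
    while True:
        try:
            event = bus.get_nowait()
        except queue.Empty:
            break
        samples.append(Sample(event["resource_id"], event["event_type"], 1.0))
    return samples


def publish_for_billing(samples: list[Sample]) -> dict[str, int]:
    """Publish: here just an event count per resource, as a billing system might consume."""
    return dict(Counter(s.resource_id for s in samples))


post_event("nova", "compute.instance.create.end", "vm-1")
post_event("cinder", "volume.create.end", "vol-1")
post_event("nova", "compute.instance.delete.end", "vm-1")
print(publish_for_billing(collect_and_transform()))
```

The events already arrive without any polling; in this flow they simply end up in billing rather than in a fault alarm.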
15
Implementation Plan in OpenStack
[Diagram components: Ceilometer (Event Alarm, Immediate Notification), Queue, Nova, Inspector, Zabbix with Error Injection, a possible plugin ("Plugin ?"), Virtualized Infrastructure, Applications, VIM User and Administrator]
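The "Event Alarm" / "Immediate Notification" idea is that an alarm is evaluated when an event arrives rather than on a timer. The sketch below shows that matching logic in isolation; the class names, the `vm.down` event type and the callback signature are all hypothetical and do not describe Ceilometer's actual alarm API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EventAlarm:
    alarm_id: str
    event_type: str                       # hypothetical event type to match
    resource_id: str
    action: Callable[[str, dict], None]   # e.g. notify the VIM user


@dataclass
class EventAlarmEvaluator:
    alarms: list[EventAlarm] = field(default_factory=list)

    def on_event(self, event: dict) -> int:
        """Evaluate alarms as soon as an event arrives; returns how many fired."""
        fired = 0
        for alarm in self.alarms:
            if (event.get("event_type") == alarm.event_type
                    and event.get("resource_id") == alarm.resource_id):
                alarm.action(alarm.alarm_id, event)
                fired += 1
        return fired


received: list[tuple[str, str]] = []
evaluator = EventAlarmEvaluator([
    EventAlarm("alarm-1", "vm.down", "vm-a1",
               lambda alarm_id, event: received.append((alarm_id, event["resource_id"]))),
])
evaluator.on_event({"event_type": "vm.down", "resource_id": "vm-a1"})
evaluator.on_event({"event_type": "vm.down", "resource_id": "vm-a2"})  # no alarm set
print(received)
```

Because nothing waits for a polling or evaluation period, the notification delay is essentially the time it takes to deliver the event through the queue.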
16
Demo (1/3)
• User Scenario
[Diagram: HTTP Clients on the Public Net reach a Load Balancer, which sits in front of three Web Servers on the Private Net; a "Launch New VM" step is also shown]
17
Demo (2/3)
• Demo 1
1. Collect CPU time samples
2. Alarm Heat if CPU runtime = 0
3. Create a new web server
[Diagram: Machine, Hypervisor, VM, Nova, Ceilometer (Heat), Samples]
• Demo 2
1. Hook
2. Notify as event
3. Alarm Heat
[Diagram: Machine, Hypervisor, VM, Agent, Nova, Ceilometer (Heat), Alarm]
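Demo 1's alarm can be sketched as a check on cumulative CPU-time samples. This assumes "CPU runtime = 0" means the CPU time did not increase between the two most recent samples, and it replaces Heat with a hypothetical `WebServerPool` whose `replace` method stands in for creating a new web server; none of these names come from the demo.

```python
from dataclasses import dataclass
from typing import Optional


def cpu_deltas(samples: list[tuple[float, int]]) -> list[int]:
    """samples: (timestamp, cumulative CPU time) pairs sorted by timestamp."""
    return [later[1] - earlier[1] for earlier, later in zip(samples, samples[1:])]


@dataclass
class WebServerPool:
    """Hypothetical stand-in for the load-balanced web servers Heat manages in the demo."""
    members: list[str]
    launched: int = 0

    def replace(self, failed: str) -> str:
        # Step 3: create a new web server and drop the failed one from the pool.
        self.members.remove(failed)
        self.launched += 1
        new_member = f"web-new-{self.launched}"
        self.members.append(new_member)
        return new_member


def evaluate_cpu_alarm(vm: str, samples: list[tuple[float, int]],
                       pool: WebServerPool) -> Optional[str]:
    # Step 2: alarm when the CPU time did not advance over the last sampling period.
    deltas = cpu_deltas(samples)
    if deltas and deltas[-1] == 0:
        return pool.replace(vm)
    return None


pool = WebServerPool(["web-1", "web-2", "web-3"])
healthy = [(0.0, 100), (10.0, 250)]
stalled = [(0.0, 100), (10.0, 250), (20.0, 250)]
print(evaluate_cpu_alarm("web-1", healthy, pool))  # None
print(evaluate_cpu_alarm("web-2", stalled, pool))  # web-new-1
print(pool.members)                                # ['web-1', 'web-3', 'web-new-1']
```

Since the rule only looks at stored samples, a failure becomes visible only after at least one more sampling period, which is presumably the delay that Demo 2's hook-and-event path shortens.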
18
Demo (3/3) Results
• Demo 1: 90 sec
• Demo 2: 26 sec
19
Doctor Southbound API
[Diagram: User, Admin and NFVI around the Doctor blocks (Conf., Policy, Controller, Inspector, Notifier); several Monitors are connected through a Unified Event API; labels: Configuration, Fault Messaging, Threshold, Enable]
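A unified event API means every Monitor, whatever its vendor, reports faults in one format. The dataclass below is a hypothetical example of such a format with two adapter functions; the field names, adapters and values are assumptions for illustration, not a published Doctor schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class FaultEvent:
    """One message format for every Monitor (hypothetical fields)."""
    time: str           # ISO 8601 detection time
    source: str         # which monitor reported it, e.g. "bmc", "collectd"
    resource_type: str  # "host", "vswitch", ...
    resource_id: str
    fault_type: str     # e.g. "power-off", "rx_dropped-exceeded"
    severity: str = "critical"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "FaultEvent":
        return cls(**json.loads(text))


def from_power_off(host: str, time: str) -> FaultEvent:
    # Adapter for a BMC power-off notification (input shape is illustrative).
    return FaultEvent(time, "bmc", "host", host, "power-off")


def from_threshold(resource_type: str, resource_id: str, metric: str, time: str) -> FaultEvent:
    # Adapter for a threshold crossing reported by a monitor agent.
    return FaultEvent(time, "collectd", resource_type, resource_id,
                      f"{metric}-exceeded", severity="major")


event = from_power_off("host-a", "2015-05-18T10:00:00Z")
wire = event.to_json()
print(wire)
```

Keeping the schema flat and JSON-serializable lets the Inspector treat all monitors alike; adding a new monitor only means writing another adapter.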
20
Case 1: Obvious Fault
[Sequence diagram drawn over the Doctor Southbound API blocks (User, Admin, NFVI, Conf., Policy, Controller, Inspector, Notifier, Monitor; Configuration and Fault Messaging; Enable). Participants: BMC, Zabbix (Inspector), Nova, Ceilometer, User. Messages as labeled on the slide:]
• SNMP Trap (Power-off)
• HTTP POST (Host A down)
• HTTP POST (Host A down, VM A1-A3 down)
• HTTP POST (VM A1 down)
• HTTP POST (Alert: VM A1 down)
• HTTP POST (Create Alarm)
21
Case 2: Threshold Exceeded Fault (Admin Config)
[Sequence diagram drawn over the Doctor Southbound API blocks, with labels Threshold, Enable and Admin Threshold. Participants: vSwitch, collectd (Monitor Agent), Zabbix (Inspector), Nova, Ceilometer, User. Messages as labeled on the slide:]
• HTTP POST (Switch down)
• HTTP POST (Host A down, VM A1-A3 down)
• HTTP POST (VM A1 down)
• HTTP POST (Alert: VM A1 down)
• HTTP POST (Create Alarm)
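The admin-configured threshold in this case can be sketched as a monitor that reports a fault once when a metric crosses its limit, and again only after it has recovered. The metric name, field names and one-shot behaviour are assumptions; the slide only shows that a threshold is set and enabled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Admin-set threshold for one metric of one resource (hypothetical names)."""
    resource_id: str
    metric: str
    failure_max: float
    enabled: bool = True


@dataclass
class ThresholdMonitor:
    threshold: Threshold
    in_fault: bool = False

    def observe(self, value: float) -> Optional[dict]:
        """Return a fault event only when the metric first crosses the threshold."""
        if not self.threshold.enabled:
            return None
        exceeded = value > self.threshold.failure_max
        if exceeded and not self.in_fault:
            self.in_fault = True
            return {"resource_id": self.threshold.resource_id,
                    "metric": self.threshold.metric,
                    "value": value,
                    "state": "down"}
        if not exceeded:
            self.in_fault = False  # recovered; the next crossing reports again
        return None


monitor = ThresholdMonitor(Threshold("vswitch-0", "rx_dropped_per_sec", failure_max=100.0))
events = [monitor.observe(v) for v in [5, 40, 250, 300, 20, 180]]
print([e["value"] for e in events if e is not None])  # [250, 180]
```

Reporting only on the transition keeps a persistently bad metric from flooding the Inspector with duplicate "Switch down" messages.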
22
Backup
23
Fault Management Sequence (Optional)
[Diagram: same layers as the High Level Architecture slide (Applications; Virtualized Infrastructure; VIM = OpenStack; VIM User and Administrator), annotated with "Detection" and "Reaction" flows and an additional "Auto Reaction" path]
24
Fault Management Scenarios (Optional)
[Diagram: same components and numbered steps (0. Set Alarm through 6-. Action) as Fault Management Scenarios (1/2), with an added "Auto Reaction" path]
25
Configuration / Policy Enforcement
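• Option 1: Policy Service Integration
• Option 2: Using Metadata in Controller
[Diagram: Doctor Southbound API blocks (User, Admin, NFVI, Conf., Policy, Controller, Inspector, Notifier, Monitor; Configuration and Fault Messaging), extended either with a Policy Service (Option 1) or with Metadata in the Controller holding the Policy/Threshold and Enable settings (Option 2)]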
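The two options differ mainly in where the Threshold and Enable settings are looked up. The sketch below puts both behind one resolver; the `monitor:enable` / `monitor:threshold` metadata keys, the class names and the precedence order (policy service first, then metadata, then an admin default) are hypothetical choices, not part of the slide.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class MonitorConfig:
    enabled: bool
    threshold: Optional[float] = None


class PolicyService(Protocol):
    """Option 1: an external policy service answers per-resource lookups."""
    def lookup(self, resource_id: str) -> Optional[MonitorConfig]: ...


class DictPolicyService:
    """In-memory stand-in for a real policy service such as Congress."""
    def __init__(self, policies: dict[str, MonitorConfig]) -> None:
        self.policies = policies

    def lookup(self, resource_id: str) -> Optional[MonitorConfig]:
        return self.policies.get(resource_id)


def config_from_metadata(metadata: dict[str, str]) -> Optional[MonitorConfig]:
    """Option 2: read hypothetical 'monitor:*' keys stored on the resource in the Controller."""
    if "monitor:enable" not in metadata:
        return None
    raw = metadata.get("monitor:threshold")
    return MonitorConfig(enabled=metadata["monitor:enable"].lower() == "true",
                         threshold=float(raw) if raw is not None else None)


def resolve_config(resource_id: str, metadata: dict[str, str],
                   policy_service: Optional[PolicyService] = None,
                   default: MonitorConfig = MonitorConfig(enabled=True)) -> MonitorConfig:
    # Assumed precedence: policy service, then resource metadata, then the admin default.
    if policy_service is not None:
        found = policy_service.lookup(resource_id)
        if found is not None:
            return found
    found = config_from_metadata(metadata)
    return found if found is not None else default


print(resolve_config("vm-a1", {"monitor:enable": "true", "monitor:threshold": "80"}))
```

Option 2 needs no extra service but ties the settings to each resource's metadata; Option 1 keeps them in one place at the cost of another dependency.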
26
Case 3: Threshold Exceeded Fault (User Config)
[Sequence diagram drawn over the Doctor Southbound API blocks plus a Policy Service (Congress) configured by the Admin, with labels Policy, Metadata, Threshold and Enable. Participants: vSwitch, collectd (Monitor Agent), Zabbix (Inspector), Nova, Ceilometer, User. Messages as labeled on the slide:]
• HTTP POST (Switch down)
• HTTP POST (Host A down, VM A1-A3 down)
• HTTP POST (VM A1 down)
• HTTP POST (Alert: VM A1 down)
• HTTP POST (Create Resource with Policy Label)
• HTTP POST (Set Policy)
• HTTP POST (Data)