![Page 1: OPNFV Summit 2015 Doctor - Fault Management · PDF file– Initial Requirement study, architecture design, Gap analysis : Done – Collaborative Development: ... – NTT DOCOMO, Sprint,](https://reader031.vdocument.in/reader031/viewer/2022030415/5aa0d7837f8b9a6c178eabc9/html5/thumbnails/1.jpg)
OPNFV Summit 2015
Doctor - Fault Management
Gerald Kunzmann, DOCOMO
Carlos Goncalves, NEC
Ryota Mibu, NEC
Doctor Overview
• Goal
– Build fault management and maintenance framework
• Approach
– Identify requirements
– Gap Analysis
– Implementation work in Upstream (OpenStack)
– Integration and testing
• Status
– Initial requirement study, architecture design, gap analysis: Done
– Collaborative Development: On-going (3 merged Blueprints in OpenStack Liberty)
– Standardization Sync: On-going (by NFV member efforts, joint meeting)
Doctor Members
• At project creation (Dec 2014)
– NTT DOCOMO, Sprint
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco
• Now (Oct 2015)
– NTT DOCOMO, Sprint, AT&T, Telecom Italia, KDDI
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco, Cloudbase Solutions, Spirent, Intel, ZTE
• Membership has roughly doubled (2x) since project creation
Assumption of VNF (NFV Application)
• Telco applications are typically deployed in active-standby or active-active fashion
– Figure: App (Active) and App (Standby), each running in its own VM on a separate machine
• App and App Manager (VNFM) cannot detect HW failures directly
• App state is switched when a failure occurs
Use Case 1: Fault management
• Actors
– Consumers C1, C2, C3
– Virtualized Infrastructure Manager (VIM), e.g. OpenStack, holding the Resource Map
– Resource Pool: Hardware Servers S1, S2, S3, each running a Hypervisor
• Resource Map: Server – VM mapping
– Server S1: VM-1, VM-2
– Server S2: VM-7
– Server S3: VM-4
• Resource Map: Ownership information
– VM-1, VM-7: Consumer C1
– VM-2: Consumer C2
– VM-4: Consumer C3
• Sequence
– 1. Fault Monitoring: hardware fault, hypervisor fault, host OS fault (the fault is marked X in the figure)
– 2. Inform the Consumer? If YES, find the owner of the affected VMs from the database
– 3. FaultNotification (VM ID, Fault ID), sent over the OpenStack Northbound Interface
– 4. Switch to SBY configuration
– 5. Instruction (VM ID), sent over the OpenStack Northbound Interface
– 6. Execute Instruction, e.g. migrate VM
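The lookup in step 2 of the fault management use case can be sketched in Python as follows. This is a minimal illustration using the example mapping from the slide. The names `SERVER_TO_VMS`, `VM_OWNER`, `FaultNotification`, and `build_notifications` are hypothetical and are not part of any OpenStack API.

```python
from dataclasses import dataclass
from typing import List

# Server -> VM mapping, copied from the slide's example Resource Map.
SERVER_TO_VMS = {
    "S1": ["VM-1", "VM-2"],
    "S2": ["VM-7"],
    "S3": ["VM-4"],
}

# Ownership information from the slide (VM -> owning consumer).
VM_OWNER = {"VM-1": "C1", "VM-7": "C1", "VM-2": "C2", "VM-4": "C3"}


@dataclass(frozen=True)
class FaultNotification:
    """Hypothetical message shape for step 3: FaultNotification (VM ID, Fault ID)."""
    consumer: str
    vm_id: str
    fault_id: str


def build_notifications(failed_server: str, fault_id: str) -> List[FaultNotification]:
    """Step 2: find the VMs on the failed server and the consumer owning each one."""
    return [
        FaultNotification(VM_OWNER[vm], vm, fault_id)
        for vm in SERVER_TO_VMS.get(failed_server, [])
    ]


if __name__ == "__main__":
    # "hw-fault" is an invented Fault ID used only for this example.
    for note in build_notifications("S1", "hw-fault"):
        print(note)
```

A fault on Server S1 yields one notification for Consumer C1 (VM-1) and one for Consumer C2 (VM-2). An unknown server yields an empty list.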
Use Case 2: Maintenance
• Actors
– Administrator
– Consumers C1, C2, C3
– Virtualized Infrastructure Manager (VIM), e.g. OpenStack, holding the Resource Map
– Resource Pool: Hardware Servers S1, S2, S3, each running a Hypervisor
• Resource Map: Server – VM mapping
– Server S1: VM-1, VM-2
– Server S2: VM-7
– Server S3: VM-4
• Resource Map: Ownership information
– VM-1, VM-7: Consumer C1
– VM-2: Consumer C2
– VM-4: Consumer C3
• Sequence
– 1. Maintenance Request (Server S3), issued by the Administrator
– 2. Which VMs are affected? Find the Consumer owning the VM(s) from the database
– 3. Maintenance Notification (VM ID), sent over the OpenStack Northbound Interface
– 4. Switch to SBY configuration
– 5. Instruction (VM ID), sent over the OpenStack Northbound Interface
– 6. Execute Instruction, e.g. migrate VM
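Steps 4 and 5 of the maintenance use case take place on the consumer side. The sketch below models one possible consumer reaction for active-standby pairs. The `Consumer` class, its method names, the `"migrate"` instruction string, and the standby VM `VM-9` in the usage note are assumptions made for illustration. The slides do not define this interface.

```python
from typing import Dict, Optional, Tuple


class Consumer:
    """Hypothetical consumer (e.g. a VNFM) that manages active-standby VM pairs."""

    def __init__(self, pairs: Dict[str, str]) -> None:
        # Maps each active VM to its standby VM.
        self.standby_of = dict(pairs)

    def on_maintenance_notification(self, vm_id: str) -> Optional[Tuple[str, str]]:
        """Handle step 3 and return the step-5 instruction as (action, vm_id)."""
        if vm_id in self.standby_of:
            # Step 4: the standby takes over and the affected VM becomes its standby.
            standby = self.standby_of.pop(vm_id)
            self.standby_of[standby] = vm_id
            return ("migrate", vm_id)
        if vm_id in self.standby_of.values():
            # The affected VM is already a standby, so no switch is needed.
            return ("migrate", vm_id)
        # The VM is not managed by this consumer: no instruction.
        return None
```

For example, `Consumer({"VM-4": "VM-9"}).on_maintenance_notification("VM-4")` returns `("migrate", "VM-4")` after swapping the roles of the two VMs.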
Fault Management Sequence
[Figure: NFV stack. Top: Applications (App, App, App) and the VIM User and Administrator. Bottom: Virtualized Infrastructure, consisting of Virtual Compute, Virtual Storage, Virtual Network, Virtualization Layer, and Hardware Resources. Side: Virtualized Infrastructure Manager (VIM) = OpenStack. Labels: Detection, Reaction, Doctor Scope.]
Key Requirements as VIM
• Immediate Notification
• Consistent Resource State Awareness
• Extensible Monitoring
• Fault Correlation
Doctor Architecture and Typical Scenario
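• Components
– Monitor (several instances), watching the Virtualized Infrastructure (Resource Pool)
– Inspector, with Resource Map and Failure Policy
– Controller (several instances)
– Notifier, with Alarm Conf.
– Application / Manager
• Typical scenario
– 0. Set Alarm (Manager to Notifier)
– 1. Raw Failure (Monitor to Inspector)
– 2. Find Affected (Inspector, using the Resource Map)
– 3. Update State (Inspector to Controller)
– 4. Notify all (Controller to Notifier)
– 5. Notify Error (Notifier to Manager)
– 6-. Action (Manager to Controller)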
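Steps 0 to 5 of the typical scenario can be sketched as a small in-memory pipeline. This is only an illustration of the message flow between the blocks named on the slide. The class and method names and the contents of `RESOURCE_MAP` are hypothetical. In a real deployment a Monitor would call `on_raw_failure`, and the blocks would be separate services.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical resource map: physical host -> virtual resources running on it.
RESOURCE_MAP = {"host-1": ["vm-a", "vm-b"], "host-2": ["vm-c"]}

AlarmCallback = Callable[[str, str], None]


class Notifier:
    def __init__(self) -> None:
        self._alarms: Dict[str, List[AlarmCallback]] = defaultdict(list)

    def set_alarm(self, resource: str, callback: AlarmCallback) -> None:
        """Step 0: a manager registers interest in one resource."""
        self._alarms[resource].append(callback)

    def notify(self, resource: str, state: str) -> None:
        """Step 5: deliver the error only to managers that set an alarm."""
        for callback in self._alarms.get(resource, []):
            callback(resource, state)


class Controller:
    def __init__(self, notifier: Notifier) -> None:
        self.state: Dict[str, str] = {}
        self._notifier = notifier

    def update_state(self, resource: str, state: str) -> None:
        """Step 3 records the new state; step 4 forwards every change to the Notifier."""
        self.state[resource] = state
        self._notifier.notify(resource, state)


class Inspector:
    def __init__(self, controller: Controller) -> None:
        self._controller = controller

    def on_raw_failure(self, host: str) -> None:
        """Step 1 delivers a raw failure; step 2 maps it to the affected resources."""
        for resource in RESOURCE_MAP.get(host, []):
            self._controller.update_state(resource, "error")
```

With an alarm set only on `vm-a`, a raw failure on `host-1` marks both `vm-a` and `vm-b` as `error` in the Controller, but only the `vm-a` callback fires.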
Doctor OSS Map
[Figure: the architecture and typical scenario from the previous slide, with each block mapped to an OSS project]
• Notifier: Ceilometer
• Controller: Nova, Neutron, Cinder
• Monitor: e.g. Zabbix
• Inspector: e.g. Monasca
Doctor OSS Development
[Figure: the architecture and OSS mapping from the previous slides, highlighting the features under development]
• Ceilometer (Notifier): Event Alarm
• Nova (Controller): State Correction
• Cinder, Neutron (Controller)
• Monitor: e.g. Zabbix
• Inspector: e.g. Monasca
Doctor Blueprints in Liberty Cycle
| Project | Blueprint | Spec Drafter | Developer | Status |
| --- | --- | --- | --- | --- |
| Ceilometer | Event Alarm Evaluator | Ryota Mibu (NEC) | Ryota Mibu (NEC) | ✓ Completed (Liberty) |
| Nova | New nova API call to mark nova-compute down | Tomi Juvonen (Nokia) | Roman Dobosz (Intel) | ✓ Completed (Liberty) |
| Nova | Support forcing service down | Tomi Juvonen (Nokia) | Carlos Goncalves (NEC) | ✓ Completed (Liberty) |
| Nova | Get valid server state | Tomi Juvonen (Nokia) | | Spec approved (Mitaka) |
| Nova | Add notification for service status change | Balazs Gibizer (Ericsson) | Balazs Gibizer (Ericsson) | Waiting for spec approval (Mitaka) |
Doctor BP Detail: Nova – Mark Nova-Compute Down
[Figure: a Host / Machine (BMC, Hypervisor, vSwitch, VM, nova compute, Monitoring Client), the Nova services (nova api, nova conductor, nova scheduler, nova DB, queue), and an External Monitoring Service]
• EXISTING (periodic update): service state
• NEW: Force-down API, an API to update the nova-compute service state
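As an illustration of how an external monitoring service might use the force-down API, the sketch below builds the HTTP request without sending it. The URL path, body fields, and microversion header reflect my understanding of the call as merged in Liberty (compute API microversion 2.11) and should be checked against the Nova API reference for the release in use. Newer microversions may use a different URL form. The function name, the endpoint, and the token are placeholders.

```python
import json
import urllib.request


def build_force_down_request(nova_endpoint: str, token: str, host: str,
                             forced_down: bool = True) -> urllib.request.Request:
    """Build (but do not send) a request that marks nova-compute on `host` as down."""
    body = {"host": host, "binary": "nova-compute", "forced_down": forced_down}
    return urllib.request.Request(
        url=f"{nova_endpoint.rstrip('/')}/os-services/force-down",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,
            # Assumption: force-down requires compute API microversion 2.11 or later.
            "X-OpenStack-Nova-API-Version": "2.11",
        },
    )


if __name__ == "__main__":
    # Placeholder endpoint, token, and host name.
    req = build_force_down_request("http://controller:8774/v2.1", "TOKEN", "compute-1")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing `forced_down=False` builds the request that clears the flag once the host is healthy again.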
Doctor BP Detail: Ceilometer - Event Alarm
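[Figure: Cinder, Neutron, and Nova send notifications to Ceilometer, which derives samples, stats, and events from them; alarms reach the Manager and an Audit Service]
• EXISTING (polling-based): alarm evaluation based on stats
• NEW Shortcut (notification-based): notification-driven alarm evaluator working on events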
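The notification-driven evaluation can be illustrated with a toy rule matcher. The `ALARM` dictionary is modeled on the event alarm definition as I understand it (`type: event` with an `event_rule`), so treat the field names as assumptions. The `rule_matches` function, the simplified event shape, and the action URL are hypothetical and are not Ceilometer code.

```python
from typing import Any, Dict

# Alarm definition sketch; field names are assumptions, the action URL is a placeholder.
ALARM: Dict[str, Any] = {
    "name": "vm-error-alarm",
    "type": "event",
    "event_rule": {
        "event_type": "compute.instance.update",
        "query": [
            {"field": "traits.state", "type": "string", "op": "eq", "value": "error"},
        ],
    },
    "alarm_actions": ["http://consumer.example/failure"],
}


def rule_matches(rule: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Return True if the event satisfies every condition of the rule.

    Evaluation happens once per incoming event, with no polling interval.
    Only the "eq" operator is supported in this sketch.
    """
    if event.get("event_type") != rule["event_type"]:
        return False
    for cond in rule["query"]:
        # "traits.state" -> look up "state" among the event's traits.
        _, _, trait = cond["field"].partition(".")
        if cond["op"] != "eq" or event.get("traits", {}).get(trait) != cond["value"]:
            return False
    return True


if __name__ == "__main__":
    # Simplified event shape: traits as a flat dict (an assumption of this sketch).
    event = {"event_type": "compute.instance.update",
             "traits": {"state": "error", "instance_id": "vm-1"}}
    if rule_matches(ALARM["event_rule"], event):
        print("alarm fired, would call:", ALARM["alarm_actions"][0])
```

Because the rule is checked when the event arrives, the alarm can fire without waiting for the next polling cycle, which is the point of the shortcut.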
Doctor Southbound API
[Figure: Doctor Southbound API. Actors: User, Admin, NFVI. Blocks: Controller, Inspector, Notifier, Monitor (three instances). Data: Conf., Policy, Threshold, Enable. Interfaces: Configuration, Fault Messaging, Unified Event API.]
Doctor Status
• Components: Notifier (Ceilometer), Monitor (Zabbix), Controller (Nova, Neutron, Cinder), Inspector (Monasca?)
• Other label in the figure: DPDK
• Legend: Done, Next Step
• Phases: To-Be Arch. Design, Gap Analysis, Blueprint, Coding, Integration, OPNFV Release
• Timeline: Dec 2014, Mar 2015, Sep 2015, Feb 2016
Don’t miss out...
• “Doctor – Fault Management” Project Theater, Wednesday, 3:55 pm – 4:15 pm
• “Doctor: Failure Detection and Notification for NFV” DOCOMO booth, PoC Demo Zone