1 OPNFV Summit 2015 Doctor - Fault Management Gerald Kunzmann, DOCOMO Carlos Goncalves, NEC Ryota Mibu, NEC

Upload: ilene-bailey

Post on 21-Jan-2016


Page 1

OPNFV Summit 2015

Doctor - Fault Management

Gerald Kunzmann, DOCOMO

Carlos Goncalves, NEC

Ryota Mibu, NEC

Page 2

Doctor Overview

• Goal

– Build fault management and maintenance framework

• Approach

– Identify requirements
– Gap analysis
– Implementation work in upstream (OpenStack)
– Integration and testing

• Status

– Initial requirement study, architecture design, gap analysis: Done
– Collaborative development: Ongoing (3 merged blueprints in OpenStack Liberty)
– Standardization sync: Ongoing (by NFV member efforts, joint meeting)

Page 3

Doctor Members

• At project creation (Dec 2014)

– NTT DOCOMO, Sprint
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco

• Now (Oct 2015)

– NTT DOCOMO, Sprint, AT&T, Telecom Italia, KDDI
– NEC, Nokia, Ericsson, Huawei, ClearPath Network, Cisco, Cloudbase Solutions, Spirent, Intel, ZTE

Membership: roughly 2x since project creation

Page 4

Assumption of VNF (NFV Application)

• Telco applications are basically deployed in an active-standby or active-active fashion

[Diagram: two stacks side by side, App (Active) on a VM on a Machine, and App (Standby) on a VM on a Machine]

• App and App Manager (VNFM) cannot detect HW failures directly

• App state will be switched when a failure occurs

Page 5

Use Case 1: Fault management

[Diagram: Consumer C1, Consumer C2 and Consumer C3 talk to the Virtualized Infrastructure Manager (VIM), e.g. OpenStack, through the OpenStack Northbound Interface. The VIM holds a Resource Map and manages a Resource Pool of three hardware servers (S1, S2, S3), each running a Hypervisor that hosts the VMs. A fault is marked with an X in the resource pool.]

Resource Map

Server – VM mapping
Server S1: VM-1, VM-2
Server S2: VM-7
Server S3: VM-4

Ownership information
VM-1, VM-7: Consumer C1
VM-2: Consumer C2
VM-4: Consumer C3

Sequence

1. Fault Monitoring
- Hardware fault
- Hypervisor fault
- Host OS fault
2. Inform the Consumer? If YES, find owner of affected VMs from database
3. FaultNotification(VM ID, Fault ID)
4. Switch to SBY configuration
5. Instruction(VM ID)
6. Execute Instruction
- e.g. migrate VM
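Step 2 of this use case (finding the owners of the affected VMs) can be sketched as two dictionary lookups over the Resource Map. This is a minimal illustration, not the Doctor implementation: the function name `fault_notifications`, the tuple layout, and the fault ID string are assumptions made for the example. Only the server, VM and consumer names come from the slide.

```python
# Resource map as shown on the slide: which VMs run on which server,
# and which consumer owns each VM.
SERVER_TO_VMS = {
    "S1": ["VM-1", "VM-2"],
    "S2": ["VM-7"],
    "S3": ["VM-4"],
}
VM_TO_CONSUMER = {
    "VM-1": "C1",
    "VM-7": "C1",
    "VM-2": "C2",
    "VM-4": "C3",
}


def fault_notifications(server, fault_id):
    """Return one (consumer, vm_id, fault_id) tuple per VM hosted on the faulty server.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    notifications = []
    for vm_id in SERVER_TO_VMS.get(server, []):
        owner = VM_TO_CONSUMER.get(vm_id)
        if owner is not None:  # step 2: only consumers owning an affected VM are informed
            notifications.append((owner, vm_id, fault_id))
    return notifications


if __name__ == "__main__":
    # Step 3: one FaultNotification(VM ID, Fault ID) per affected VM.
    for consumer, vm_id, fault_id in fault_notifications("S1", "HW-FAULT"):
        print(f"FaultNotification to {consumer}: VM ID={vm_id}, Fault ID={fault_id}")
```

A fault on Server S1 yields notifications for VM-1 (Consumer C1) and VM-2 (Consumer C2). Consumer C3 is not contacted because it owns no VM on that server.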

Page 6

Use Case 2: Maintenance

[Diagram: Consumer C1, Consumer C2 and Consumer C3 talk to the Virtualized Infrastructure Manager (VIM), e.g. OpenStack, through the OpenStack Northbound Interface. The VIM holds a Resource Map and manages a Resource Pool of three hardware servers (S1, S2, S3), each running a Hypervisor that hosts the VMs. An Administrator sends the maintenance request.]

Resource Map

Server – VM mapping
Server S1: VM-1, VM-2
Server S2: VM-7
Server S3: VM-4

Ownership information
VM-1, VM-7: Consumer C1
VM-2: Consumer C2
VM-4: Consumer C3

Sequence

1. Maintenance Request (Server S3), issued by the Administrator
2. Which VMs are affected? Find Consumer owning the VM(s) from the database.
3. Maintenance Notification(VM ID)
4. Switch to SBY configuration
5. Instruction(VM ID)
6. Execute Instruction
- e.g. migrate VM

Page 7

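Fault Management Sequence

[Diagram: Applications (App, App, App) run on the Virtualized Infrastructure, which consists of Virtual Compute, Virtual Storage and Virtual Network on top of a Virtualization Layer and Hardware Resources. The Virtualized Infrastructure Manager (VIM) = OpenStack sits beside the infrastructure, together with the VIM User and Administrator. The Detection and Reaction steps are marked, along with the Doctor Scope.]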

Page 8

Key Requirements as VIM

Immediate Notification

Consistent Resource State Awareness

Extensible Monitoring

Fault Correlation

Page 9

Doctor Architecture and Typical Scenario

[Diagram: components are Monitor (x3), Inspector, Controller (x3), Notifier, Manager, Application, and the Virtualized Infrastructure (Resource Pool). Data stores shown are Alarm Conf., Resource Map and Failure Policy.]

Numbered steps in the diagram:

0. Set Alarm
1. Raw Failure
2. Find Affected
3. Update State
4. Notify all
5. Notify Error
6-. Action
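One possible reading of the numbered steps, sketched as plain Python objects. The class and method names, the assignment of the Resource Map and Failure Policy to the Inspector, and the callback-style alarm registration are all assumptions made for illustration. The slide only names the components and the steps.

```python
class Notifier:
    """Holds alarm configuration and forwards state changes to subscribers."""

    def __init__(self):
        self.alarms = {}  # resource id -> list of callbacks

    def set_alarm(self, resource_id, callback):  # step 0: Set Alarm
        self.alarms.setdefault(resource_id, []).append(callback)

    def notify_all(self, resource_id, state):  # step 4, then step 5 per subscriber
        for callback in self.alarms.get(resource_id, []):
            callback(resource_id, state)


class Controller:
    """Keeps resource state and reports every state change to the notifier."""

    def __init__(self, notifier):
        self.notifier = notifier
        self.state = {}

    def update_state(self, resource_id, state):  # step 3: Update State
        self.state[resource_id] = state
        self.notifier.notify_all(resource_id, state)


class Inspector:
    """Maps a raw failure to affected resources via a resource map and a failure policy."""

    def __init__(self, controller, resource_map, failure_policy):
        self.controller = controller
        self.resource_map = resource_map      # host -> list of VM ids
        self.failure_policy = failure_policy  # raw failure type -> state to set

    def on_raw_failure(self, host, failure_type):  # step 1: Raw Failure
        new_state = self.failure_policy.get(failure_type)
        if new_state is None:
            return  # the policy defines no reaction for this failure type
        for vm_id in self.resource_map.get(host, []):  # step 2: Find Affected
            self.controller.update_state(vm_id, new_state)


def run_demo():
    received = []
    notifier = Notifier()
    controller = Controller(notifier)
    inspector = Inspector(
        controller,
        resource_map={"host-1": ["vm-a", "vm-b"]},
        failure_policy={"nic_down": "error"},
    )
    # Step 0: the manager registers interest in vm-a only.
    notifier.set_alarm("vm-a", lambda vm, state: received.append((vm, state)))
    # Step 1: a monitor reports a raw failure on host-1.
    inspector.on_raw_failure("host-1", "nic_down")
    return received, controller.state
```

In `run_demo`, both VMs on `host-1` have their state updated, but only the alarm registered for `vm-a` produces a notification. Step 6 (the action taken by the manager) is left to the callback.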

Page 10

Doctor OSS Map

[Diagram: the Doctor architecture diagram (Monitor, Inspector, Controller, Notifier, Manager, Application, Virtualized Infrastructure (Resource Pool), with steps 0 to 6-), annotated with open-source software for the components: Ceilometer; e.g. Monasca; e.g. Zabbix; and Cinder, Neutron, Nova.]

Page 11

Doctor OSS Development

[Diagram: the Doctor architecture diagram (Monitor, Inspector, Controller, Notifier, Manager, Application, Virtualized Infrastructure (Resource Pool), with steps 0 to 6-), annotated with open-source software and the development items: Ceilometer with Event Alarm; Cinder, Neutron and Nova with State Correction; e.g. Zabbix; e.g. Monasca.]

Page 12

Doctor Blueprints in Liberty Cycle

Project | Blueprint | Spec Drafter | Developer | Status

Ceilometer | Event Alarm Evaluator | Ryota Mibu (NEC) | Ryota Mibu (NEC) | Completed (Liberty)

Nova | New nova API call to mark nova-compute down | Tomi Juvonen (Nokia) | Roman Dobosz (Intel) | Completed (Liberty)

Nova | Support forcing service down | Tomi Juvonen (Nokia) | Carlos Goncalves (NEC) | Completed (Liberty)

Nova | Get valid server state | Tomi Juvonen (Nokia) | (not listed) | Spec approved (Mitaka)

Nova | Add notification for service status change | Balazs Gibizer (Ericsson) | Balazs Gibizer (Ericsson) | Waiting for spec approval (Mitaka)

Page 13

Doctor BP Detail: Nova – Mark Nova-Compute Down

[Diagram: a Host / Machine with BMC, vSwitch, Hypervisor, VM and nova-compute; the Nova control plane with nova-api, nova-conductor, nova-scheduler, the nova DB and the queue; and an External Monitoring Service with a Monitoring Client.]

• EXISTING (periodic update): service state

• NEW API to update nova-compute service state: Force-down API
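A sketch of how an external monitoring service might prepare a force-down call. The slide does not give the request format. The path, body fields and microversion header below assume the request shape documented in the OpenStack Compute API reference for forcing a service down (`PUT /os-services/force-down`, microversion 2.11), and should be checked against the API reference for the release in use. The endpoint URL, token and host name are placeholders, and the function `build_force_down_request` is hypothetical. The request is only built, not sent.

```python
import json
import urllib.request


def build_force_down_request(compute_endpoint, token, host, forced_down=True):
    """Build (but do not send) a request marking nova-compute on `host` as forced down."""
    body = {"host": host, "binary": "nova-compute", "forced_down": forced_down}
    return urllib.request.Request(
        url=f"{compute_endpoint.rstrip('/')}/os-services/force-down",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,
            # Assumption: force-down requires compute API microversion 2.11.
            "X-OpenStack-Nova-API-Version": "2.11",
        },
    )


if __name__ == "__main__":
    # Placeholder endpoint, token and host; nothing is sent over the network.
    req = build_force_down_request("http://controller:8774/v2.1", "TOKEN", "compute-1")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Calling the same function with `forced_down=False` builds the request that clears the flag once the host is healthy again.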

Page 14

Doctor BP Detail: Ceilometer - Event Alarm

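[Diagram: Cinder, Neutron and Nova emit notifications, which become events, samples and stats. Two alarm paths are shown towards the Manager and the Audit Service.]

• EXISTING (polling-based)

• NEW Shortcut (notification-based): notification-driven alarm evaluator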
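The difference between the two paths is that a notification-driven evaluator checks alarms at the moment an event arrives, instead of waiting for the next polling interval. A minimal sketch of that idea follows. The `EventAlarm` and `EventAlarmEvaluator` classes, the alarm fields and the event layout are hypothetical and do not reproduce Ceilometer's actual data model.

```python
from dataclasses import dataclass, field


@dataclass
class EventAlarm:
    """Hypothetical alarm definition: fires when an event matches type and traits."""
    name: str
    event_type: str
    required_traits: dict
    fired: list = field(default_factory=list)

    def matches(self, event):
        if event["event_type"] != self.event_type:
            return False
        traits = event.get("traits", {})
        return all(traits.get(k) == v for k, v in self.required_traits.items())


class EventAlarmEvaluator:
    """Evaluates alarms as each notification arrives, with no polling loop."""

    def __init__(self, alarms):
        self.alarms = list(alarms)

    def on_event(self, event):
        """Return the names of all alarms triggered by this event."""
        triggered = []
        for alarm in self.alarms:
            if alarm.matches(event):
                alarm.fired.append(event)
                triggered.append(alarm.name)
        return triggered


if __name__ == "__main__":
    alarm = EventAlarm(
        name="vm-1-error",
        event_type="compute.instance.update",  # illustrative event type
        required_traits={"instance_id": "vm-1", "state": "error"},
    )
    evaluator = EventAlarmEvaluator([alarm])
    healthy = {"event_type": "compute.instance.update",
               "traits": {"instance_id": "vm-1", "state": "active"}}
    failed = {"event_type": "compute.instance.update",
              "traits": {"instance_id": "vm-1", "state": "error"}}
    print(evaluator.on_event(healthy))
    print(evaluator.on_event(failed))
```

Because evaluation happens inside `on_event`, the delay between the failure event and the alarm is bounded by message delivery rather than by a polling period, which is what the Immediate Notification requirement asks for.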

Page 15

Doctor Southbound API

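[Diagram: User and Admin on one side, NFVI on the other. Components shown are Controller, Inspector, Notifier and Monitor (x3), with Conf. and Policy data. Interface labels: Configuration, Fault Messaging, Unified Event API. Monitor settings shown: Threshold, Enable, Enable.]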

Page 16

Doctor Status

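[Diagram: component status chart. Column headings: Notifier, Monitor, Controller, Inspector. Entries: Ceilometer, Zabbix, Nova, Monasca?, DPDK, Neutron, Cinder. Legend: Done, Next Step.]

[Timeline: To-Be Arch. Design, Gap Analysis, Blueprint, Coding, Integration, OPNFV Release. Date markers: Dec 2014, Mar 2015, Sep 2015, Feb 2016.]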

Page 17

Don’t miss out...

• “Doctor – Fault Management”: Project Theater, Wednesday, 3:55 pm – 4:15 pm

• “Doctor: Failure Detection and Notification for NFV”: DOCOMO booth, PoC Demo Zone