OTA000301 OptiX SDH System Troubleshooting Methods, Issue 1.20


Post on 26-Oct-2014


OptiX SDH System Troubleshooting Methods

www.huawei.com

Copyright 2006 Huawei Technologies Co., Ltd. All rights reserved.

Objectives

Upon completion of this course, you will be able to:

List the common analysis methods for fault locating.

Outline the fault handling flow.

Analyze typical faults such as traffic interruption and bit errors.


Contents

1. Troubleshooting Preparation
2. Troubleshooting Idea and Methods
3. Classified Troubleshooting Examples


1. Troubleshooting Preparation


Requirements for Maintenance Staff (I): Professional Skills

Be familiar with the hardware system and SDH fundamentals.

Be familiar with the alarm generation mechanism and the signal flow in the transmission system.

Be familiar with the basic maintenance instruments and tools.

Be familiar with the network under maintenance: network topology, network protection, and traffic.


Requirements for Maintenance Staff (II): Professional Skills

Be familiar with common alarms:

SDH line alarms (R_LOS, R_LOF, R_OOF, AU_AIS, AU_LOP, MS_AIS, MS_RDI, B1_EXC, B2_EXC, HP_LOM, HP_SLM, HP_TIM, HP_UNEQ);

PDH tributary alarms (TU_AIS, TU_LOP, T_ALOS, T_DLOS, P_LOS, EXT_LOS, UP_E1_AIS, LP_RDI, LP_SLM, LP_TIM, LP_UNEQ, B3_EXC);

Protection switching alarms (PS);

Clock alarms (LTI, SYNC_C_LOS, SYN_BAD);

Equipment alarms (POWER_FAIL, FAN_FAIL, BD_STATUS).

Collect and save on-site data: system alarms, performance event data, configurations, and NMS operation records.
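When collecting on-site data, the alarm categories listed above can be kept as a lookup table and used to sort a collected alarm list into groups. The following is a minimal Python sketch under that assumption: the table holds only the alarms named on this slide, and the names `ALARM_CATEGORIES`, `classify`, and `group_alarms` are hypothetical, not part of any Huawei NMS interface.

```python
from collections import defaultdict

# Alarm categories exactly as listed on the slide; an illustrative subset,
# not the full alarm list of any OptiX NE.
ALARM_CATEGORIES = {
    "SDH line": {
        "R_LOS", "R_LOF", "R_OOF", "AU_AIS", "AU_LOP", "MS_AIS", "MS_RDI",
        "B1_EXC", "B2_EXC", "HP_LOM", "HP_SLM", "HP_TIM", "HP_UNEQ",
    },
    "PDH tributary": {
        "TU_AIS", "TU_LOP", "T_ALOS", "T_DLOS", "P_LOS", "EXT_LOS",
        "UP_E1_AIS", "LP_RDI", "LP_SLM", "LP_TIM", "LP_UNEQ", "B3_EXC",
    },
    "Protection switching": {"PS"},
    "Clock": {"LTI", "SYNC_C_LOS", "SYN_BAD"},
    "Equipment": {"POWER_FAIL", "FAN_FAIL", "BD_STATUS"},
}


def classify(alarm: str) -> str:
    """Return the category of an alarm name, or "Other" if it is not listed.

    Names are normalized so that "TU-AIS" and "tu_ais" both match "TU_AIS".
    """
    name = alarm.strip().upper().replace("-", "_")
    for category, names in ALARM_CATEGORIES.items():
        if name in names:
            return category
    return "Other"


def group_alarms(records):
    """Group (ne, alarm) records collected on site by alarm category."""
    groups = defaultdict(list)
    for ne, alarm in records:
        groups[classify(alarm)].append((ne, alarm))
    return dict(groups)


if __name__ == "__main__":
    collected = [("NE4", "TU-AIS"), ("NE1", "LP_RDI"), ("NE3", "R_LOS")]
    for category, items in group_alarms(collected).items():
        print(category, items)
```

Normalizing hyphens to underscores lets the table match both spellings used in this course (for example TU-AIS and TU_AIS).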


Fault Handling Flow: Flow Chart

Start: record the fault.

External cause? If yes, follow the other handling flows. If no, analyze the fault to locate it.

Fault removed? If yes, go to Continue 2. If no, report the fault to Huawei and go to Continue 1.

Fault Handling Flow (cont.): Flow Chart

Continue 1: make a solution together and try the solution.

Service recovered? If no, rework the solution. If yes, observe the service running (Continue 2 also joins here).

Fault removed? If yes, archive the fault handling report. End. If no, return to Continue 1.

2. Troubleshooting Idea and Methods


Question

What is the key to troubleshooting?

To locate a failure ACCURATELY in one station


How to Locate a Fault? Basic Principles of Fault Localization

External first, then transmission: check for broken fibers, switch failures, power failures, and grounding problems.

Network first, then network elements: try your best to locate the trouble to one node.

LU first, then TU: LU alarms can lead to TU alarms.

Higher-severity alarms first, then lower-severity alarms: first analyze critical/major alarms, then move on to minor/warning alarms.
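The last two principles amount to an ordering rule for the alarm list. Below is a minimal Python sketch, assuming each alarm arrives tagged with one of the four severities named above. The `LU_ALARMS` set is an illustrative subset chosen for this example, and the tie-break (severity first, then LU before TU) is one reasonable reading of the principles, not a documented Huawei rule. The severities in the usage example are made up.

```python
from dataclasses import dataclass

# Severity ranks from the principle above: critical/major before minor/warning.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2, "warning": 3}

# Assumption: an illustrative subset of line-unit (LU) alarms. Anything not in
# this set is treated as a tributary-unit (TU) or other downstream alarm.
LU_ALARMS = {
    "R_LOS", "R_LOF", "R_OOF", "MS_AIS", "MS_RDI",
    "AU_AIS", "AU_LOP", "B1_EXC", "B2_EXC",
}


@dataclass(frozen=True)
class Alarm:
    ne: str
    name: str
    severity: str


def analysis_order(alarms):
    """Return alarms in the order they should be analyzed.

    Higher-severity alarms come first; within one severity, LU alarms come
    before TU alarms because LU alarms can lead to TU alarms.
    """
    def key(alarm: Alarm):
        layer = 0 if alarm.name in LU_ALARMS else 1
        return (SEVERITY_RANK[alarm.severity.lower()], layer)

    return sorted(alarms, key=key)


if __name__ == "__main__":
    collected = [
        Alarm("NE4", "TU_AIS", "major"),
        Alarm("NE1", "LP_RDI", "minor"),
        Alarm("NE4", "R_LOS", "critical"),
        Alarm("NE3", "MS_RDI", "major"),
    ]
    for alarm in analysis_order(collected):
        print(alarm.ne, alarm.name, alarm.severity)
```

Python's `sorted` is stable, so alarms with the same severity and layer keep the order in which they were collected.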


Common Methods of Fault Localization: Keys of Fault Localization

1. Alarm and performance analysis
2. Loopback
3. Replacement
4. Configuration data analysis
5. Configuration modification
6. Test with instruments
7. Experience


Alarm and Performance Analysis (I): How to Obtain Alarms and Performance Events?

Observe the indicators on boards and cabinets: this is not detailed, and no history alarms are available.

Use the NMS to evaluate the whole network:
Comprehensive: all alarms and performance events from the whole network are available.
Accurate: current alarms, history alarms, occurrence times, and performance event data can be queried.


Alarm and Performance Analysis (II): Main Steps

Obtain the alarms and performance events.

Select the key alarms or performance events.

Analyze the causes.

Narrow the fault down to a certain range or a single node.


Alarm and Performance Analysis (III): Case R-LOS

[Figure: chain network of nodes 1, 2, 3, and 4 (w = west port, E = east port); alarms shown: R-LOS, MS-RDI, TU-AIS, LP-RDI]
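This case relies on standard SDH behavior: a line port that detects R_LOS sends MS-RDI back to the far end, so an R_LOS on one port plus MS-RDI on the facing port of the neighbor points at the fiber between them. The following is a minimal Python sketch, assuming a chain in which each node's east port ("e") is fibered to the next node's west port ("w"). The topology format and the function name `locate_fiber_faults` are hypothetical.

```python
def _norm(name: str) -> str:
    """Normalize alarm names so "R-LOS" and "R_LOS" compare equal."""
    return name.strip().upper().replace("-", "_")


def locate_fiber_faults(chain, alarms):
    """Find fiber spans implicated by R_LOS alarms on a chain network.

    chain:  node names ordered west to east; node k's east port ("e") is
            assumed to be fibered to node k+1's west port ("w").
    alarms: iterable of (node, port, alarm_name) tuples.
    Returns a list of (receiving_node, port, far_end_node, corroborated),
    where corroborated is True if the far end reports MS_RDI on the port
    facing the same span.
    """
    present = {(node, port, _norm(name)) for node, port, name in alarms}
    results = []
    for i, node in enumerate(chain):
        for port, j in (("w", i - 1), ("e", i + 1)):
            if (node, port, "R_LOS") not in present:
                continue
            if not 0 <= j < len(chain):
                continue  # open end of the chain: no neighbor on this side
            far_end = chain[j]
            far_port = "e" if port == "w" else "w"
            corroborated = (far_end, far_port, "MS_RDI") in present
            results.append((node, port, far_end, corroborated))
    return results


if __name__ == "__main__":
    nodes = ["1", "2", "3", "4"]
    seen = [("4", "w", "R-LOS"), ("3", "e", "MS-RDI")]
    print(locate_fiber_faults(nodes, seen))  # [('4', 'w', '3', True)]
```

An uncorroborated result (R_LOS without the matching MS-RDI) is still worth checking, since the far end may be unable to report or the return fiber may also be cut.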


Loopback: What Is Loopback?

Loopback is the most common and most efficient method in troubleshooting.

[Figure: inloop and outloop on the line side and the tributary side of SDH equipment]


Loopback: Where Do We Loop?

Tributary board. Loopback options: inloop/outloop. Loopback tools: loopback cable, NMS. Loopback level: path. Application: separate switching faults from transmission faults; roughly determine tributary board failures; no need to modify the service configuration.

Line board. Loopback options: inloop/outloop. Loopback tools: patch fiber at the optical interface, loopback by NMS. Application: locate single-station faults; roughly determine line board failures; no need to modify the service configuration.

Software loopback is NOT an absolute method. Why?

Notes: a loopback may interrupt the traffic and the ECC. A loopback set from the NMS is automatically removed after 5 minutes (provisional).


Loopback: Procedure

1. Select one NE from the several faulty NEs.
2. Choose one affected traffic path on the selected NE.
3. Draw the traffic flow diagram (source, sink, pass-through).
4. Connect the test equipment.
5. Check the alarms.

Replacement: When to Use?

Objects: fiber, cable, boards, modules.

Application: external faults, board faults.

Switch-based replacement: MSP switch, SNCP switch, active/standby XC switch, TPS switch.


Configuration Data Analysis: Query and Analyze the Configuration

Check the timeslot configuration; the J1 or C2 bytes; LU and TU path loopbacks; SNCP or MSP switching conditions; external commands (e.g. locked switch); and the consistency of the configuration between the NMS and the NEs.


Configuration Modification: Fast Solution

Objects: port, timeslot, sub-rack, slot.

Application examples: no spare boards are available; the traffic must be restored temporarily.


Testing Instruments: Accurate Judgments

Bit error testing device: bit errors/traffic.
Optical power meter: optical power.
SDH analyzer: bit errors/traffic/overhead bytes.
Multimeter: voltage/current/resistance.

This method is the most reliable, but the instruments must be on hand.


Experience: Rule of Thumb

Resetting a board, powering off and on, or resending the configuration: do not consider them a cure-all. They do not help us find the real cause of the failure.


Summary

Alarm and performance analysis. Application: universal. Features: (1) evaluates the whole network situation; (2) locates the faulty point preliminarily based on the collected data; (3) causes no negative effect on normal services; (4) depends on the NMS.

Loopback. Application: locates the fault to a single station or board. Features: (1) independent of alarm and performance event analysis; (2) rapid and effective.

Replacement. Application: locates the fault to a board or isolates external faults. Features: (1) convenient; (2) requires spare parts/equipment; (3) applied together with other methods.

Configuration data analysis. Application: locates the fault to a single station or board. Features: (1) can find the fault cause; (2) fault locating takes longer; (3) depends on the NMS.

Configuration modification. Application: locates the fault to a board. Features: (1) high risk; (2) depends on the NMS.

Test with instruments. Application: isolates external faults and resolves interconnection problems. Features: (1) a general method with high accuracy; (2) has certain requirements for the meters; (3) applied together with other methods.

Experience. Application: special cases. Features: (1) fast fault handling; (2) high probability of mistakes; (3) requires accumulated experience.

3. Classified Troubleshooting Examples


Troubleshooting Sequence

Exclude external troubles: switching problem? Fiber problems? Trunk cable? Power supply system? Grounding problem? Methods: replacement, instrument testing, loopback, alarm/performance analysis.

Locate the troubles to one NE. Methods: replacement, loopback, alarm/performance analysis, configuration analysis, configuration modification, rule of thumb.

Locate the troubles to one board. Methods: loopback, alarm/performance analysis.


Classified Troubleshooting Examples

Traffic interruption

Bit errors


Traffic Interruption: Possible Causes

External causes: power supply system (equipment power off, undervoltage, etc.); switch problems; fiber or trunk cables (excessive attenuation, fiber cut, cable disconnection).

Operation causes: loopback; data modification.

Equipment failure: faulty board; performance degradation.


Traffic Interruption: Operations

Equipment operator: check the indicator status on each board; analyze the alarms; perform hardware loopbacks; replace boards.

NMS operator: check the login of each station; query and analyze alarms; loop back section by section; modify the configuration; implement switching.


Traffic Interruption: No-Protection Line Case

[Figure: chain network of nodes 1, 2, 3, and 4; the line ports (w/E) carry the service on 2:1, with tributary port t2:1 at nodes 1 and 4; node 1 reports LP-RDI and node 4 reports TU-AIS]

Network configuration: node 1 is the centralized service node. Each station has E1 services with node 1.

Failure description: the E1 service between nodes 1 and 4 is interrupted. Node 4: TU-AIS. Node 1: LP-RDI. Other services are normal.


Traffic Interruption: Where Is the Problem?

[Figure: the same chain network; node 1 reports LP-RDI and node 4 reports TU-AIS]

Query the alarms.

Alarm analysis:
TU-AIS appears in node 4 only, so node 4 cannot receive the traffic from node 1.
Other traffic between nodes 1, 2, and 3 is normal.
Therefore the failure is located between nodes 3 and 4.
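The alarm analysis above is a scan outward from the centralized node: the fault lies between the last node whose E1 service with node 1 is normal and the first node whose service is interrupted. Here is a minimal Python sketch, assuming a single fault on a chain; the function name `locate_by_service_status` and its input format are hypothetical.

```python
def locate_by_service_status(chain, service_ok):
    """Bound a single fault on a chain by checking services hop by hop.

    chain:      node names ordered outward from the centralized node
                (chain[0] is the hub, e.g. node 1).
    service_ok: dict mapping each non-hub node to True if its E1 service
                with the hub is normal (no TU-AIS), else False.
    Returns (last_good_node, first_bad_node), or None if all services are
    normal. Assumes one fault, so every node beyond it is also affected.
    """
    last_good = chain[0]
    for node in chain[1:]:
        if not service_ok[node]:
            return (last_good, node)
        last_good = node
    return None


if __name__ == "__main__":
    status = {"2": True, "3": True, "4": False}  # TU-AIS at node 4 only
    print(locate_by_service_status(["1", "2", "3", "4"], status))  # ('3', '4')
```

Like the slide, this only bounds the fault to a span; loopback is still needed to decide which side of the span is faulty.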


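Traffic Interruption: Analysis by Loopback

[Figure: the same chain network with a BER tester connected to tributary port t2:1 at node 1]

Step 1: outloop on VC4 #2 at node 4. If the test result is normal, the failure is in node 4. If not, the failure is between nodes 3 and 4.

Step 2: soft inloop on VC4 #2 at the east LU of node 3. If the test result is not normal, the failure is in node 3. If it is normal, the failure is still between nodes 3 and 4.

Step 3: hard optical port inloop at the east LU of node 3. If the test result is normal, the failure is in node 4. If not, the failure is in node 3.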
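The three loopback steps above form a small decision tree. Below is a minimal Python sketch of it. It assumes, as the earlier question about software loopback suggests, that a software inloop does not fully exercise the optical interface, which is why a hardware loop follows it. `test_ok` is a hypothetical callback that sets up the named loopback and reads the BER tester at node 1; the step labels are descriptive strings, not NMS command names.

```python
from typing import Callable

# Loopback steps from the slide, in the order they are tried. The string
# values are hypothetical labels, not NMS command names.
OUTLOOP_NODE4 = "outloop on VC4 #2 at node 4"
SOFT_INLOOP_NODE3 = "soft inloop on VC4 #2 at east LU of node 3"
HARD_INLOOP_NODE3 = "hard optical port inloop at east LU of node 3"


def locate_by_loopback(test_ok: Callable[[str], bool]) -> str:
    """Walk the loopback decision tree for the node 1 to node 4 service.

    test_ok(step) sets up the named loopback, reads the BER tester at
    node 1, and returns True if the result is normal. Steps are run
    lazily, so only the loopbacks actually needed are performed.
    """
    if test_ok(OUTLOOP_NODE4):
        # Traffic reaches node 4's line board and returns cleanly.
        return "failure in node 4"
    # From here on the failure lies between nodes 3 and 4.
    if not test_ok(SOFT_INLOOP_NODE3):
        return "failure in node 3"
    # Assumption: the software loop does not prove the optical interface,
    # so confirm with a hardware (fiber) loop at node 3's east optical port.
    if test_ok(HARD_INLOOP_NODE3):
        return "failure in node 4"
    return "failure in node 3"


if __name__ == "__main__":
    results = {OUTLOOP_NODE4: False, SOFT_INLOOP_NODE3: True,
               HARD_INLOOP_NODE3: False}
    print(locate_by_loopback(results.__getitem__))  # failure in node 3
```

Running the steps lazily matters in practice because each loopback may interrupt traffic and ECC.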


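Traffic Interruption: Final Solution

[Figure: the same chain network; node 1 reports LP-RDI and node 4 reports TU-AIS; replacement is carried out at the faulty node]

Once the failure is located to one node, the LU, TU, or XC may be faulty:
Perform a TPS switch. If the traffic becomes normal, replace the faulty TU.
If not, perform an active/standby XC switch. If the traffic becomes normal, replace the faulty XC.
If not, replace the faulty LU.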
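The switching sequence above isolates the faulty board without spare parts on hand. Here is a minimal Python sketch, reading TPS as tributary protection switching. The two callbacks are hypothetical hooks that perform the switch and report whether the traffic recovered.

```python
from typing import Callable


def isolate_by_switching(
    tps_switch_restores: Callable[[], bool],
    xc_switch_restores: Callable[[], bool],
) -> str:
    """Decide which board to replace once the failure is in one node.

    Each callable performs the switch and returns True if the traffic
    becomes normal afterward. The XC switch is only tried if the TPS
    switch does not restore the traffic.
    """
    if tps_switch_restores():
        # Traffic recovered on the protection TU: the working TU is faulty.
        return "replace faulty TU"
    if xc_switch_restores():
        # Traffic recovered on the standby XC: the active XC is faulty.
        return "replace faulty XC"
    # Neither switch helped: the remaining suspect is the LU.
    return "replace faulty LU"


if __name__ == "__main__":
    print(isolate_by_switching(lambda: False, lambda: True))  # replace faulty XC
```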


Traffic Interruption: SNCP Ring Case

[Figure: SNCP ring of nodes 1, 2, 3, and 4 (e = east port, w = west port); nodes 2, 3, and 4 report TU-AIS; node 1 reports LP-RDI]

Network configuration: node 1 is the centralized service node. Each station has E1 services with node 1.

Failure description: all E1 services are interrupted. Nodes 2, 3, 4: TU-AIS. Node 1: LP-RDI.


Traffic Interruption: Where Is the Problem?

[Figure: the same SNCP ring with the same alarms]

Thoughts and methods: alarm/performance analysis; analyze configuration correctness; disconnect the ring and convert it to a chain (line); loopback; replacement.


Traffic Interruption: MSP Ring Case

[Figure: STM-4 MSP ring of nodes 1 to 5 (e = east port, w = west port); R-LOS on the broken span between nodes 2 and 3; TU-AIS at nodes 1 and 3]

Network configuration: node 1 is the centralized service node. Each station has E1 services with node 1. Services are configured on the shortest route.

Failure description: the fibers between NE2 and NE3 are broken (R-LOS). E1 services between nodes 1 and 3 are interrupted. Nodes 1, 3: TU-AIS. Other services are normal.


Traffic Interruption: Where Is the Problem? The MSP Switching Process

LU: detects SF or SD; transmits the K1 and K2 bytes.

SCC: processes the APS protocol normally; the APS controller is started; the switch state is correct.

XC: implements the switching.

Protection channels: available.
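MSP switching only works if every condition in this chain holds, so a natural check is to walk the conditions in signal-flow order and report the first one that fails. Below is a minimal Python sketch; the `MspStatus` flags are hypothetical stand-ins for status that would in practice be read from the NMS and board indicators.

```python
from dataclasses import dataclass


@dataclass
class MspStatus:
    """Hypothetical snapshot of one node's MSP-related status flags."""
    lu_detects_sf_sd: bool = True
    lu_transmits_k1_k2: bool = True
    scc_aps_normal: bool = True
    aps_controller_started: bool = True
    switch_state_correct: bool = True
    xc_implements_switching: bool = True
    protection_channels_available: bool = True


# (unit, condition, field name) in signal-flow order: LU, SCC, XC, channels.
CHECKLIST = [
    ("LU", "SF or SD detection", "lu_detects_sf_sd"),
    ("LU", "K1 and K2 bytes transmission", "lu_transmits_k1_k2"),
    ("SCC", "APS protocol processed normally", "scc_aps_normal"),
    ("SCC", "APS controller started", "aps_controller_started"),
    ("SCC", "right switch state", "switch_state_correct"),
    ("XC", "switching implemented", "xc_implements_switching"),
    ("Protection channels", "available", "protection_channels_available"),
]


def first_failed_condition(status: MspStatus):
    """Return (unit, condition) for the first unmet MSP switching condition,
    or None if every condition in the checklist holds."""
    for unit, condition, field_name in CHECKLIST:
        if not getattr(status, field_name):
            return (unit, condition)
    return None


if __name__ == "__main__":
    print(first_failed_condition(MspStatus(aps_controller_started=False)))
```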


Traffic Interruption: Analysis

[Figure: the same STM-4 MSP ring; nodes 2 and 3 (marked S, switching) report R-LOS and APS-INDI; the other nodes (marked P, pass-through) report APS-INDI]

Query and check the alarms: R-LOS and APS-INDI.

Check the switch status. If it is not normal, the APS protocol may have stopped; restart it. If the switch status is still not normal, resend the configuration. If it is still not normal, restart the APS protocol node by node to locate the faulty LU/XC.

If the switch status is normal, draw the switched traffic flow diagram and loop back section after section to locate the faulty LU/XC.

Replace the faulty LU/XC.


Traffic Interruption: Normal Route and Switched Route

[Figure: the E1 service between nodes 1 and 3 on the STM-4 MSP ring. Normal route: node 1 to node 2 to node 3 on timeslot 1:17, dropped at tributary port t2:1. Switched route: moved to protection timeslot 3:17 and carried via nodes 1, 5, and 4 to node 3. Nodes 2 and 3 (S) report R-LOS and APS-INDI; the other nodes (P) report APS-INDI.]

Notes: the switched route is one complex line. Dichotomy can be used to locate the fault.
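Dichotomy means bisecting the loopback points along the long switched route instead of looping back section by section: each loopback halves the remaining candidates, so n points need only about log2(n) loopbacks, which matters because every loopback may interrupt traffic and ECC. Here is a minimal Python sketch, assuming a single fault (loops before it pass, loops at or beyond it fail). `loop_ok` is a hypothetical callback and the point names in the usage example are made up.

```python
from typing import Callable, Optional, Sequence, Tuple


def bisect_fault(
    points: Sequence[str], loop_ok: Callable[[str], bool]
) -> Optional[Tuple[Optional[str], str]]:
    """Locate a single fault along a path by bisection (dichotomy).

    points:  loopback points ordered outward from the tester along the
             traffic path.
    loop_ok: sets a loopback at the given point and returns True if the
             tester sees clean traffic.
    Assumes exactly one fault, so loops before it pass and loops at or
    beyond it fail. Returns (last passing point or None, first failing
    point), or None if every loop passes.
    """
    lo, hi = 0, len(points)  # invariant: points[:lo] pass, points[hi:] fail
    while lo < hi:
        mid = (lo + hi) // 2
        if loop_ok(points[mid]):
            lo = mid + 1
        else:
            hi = mid
    if lo == len(points):
        return None
    return (points[lo - 1] if lo > 0 else None, points[lo])


if __name__ == "__main__":
    route = ["node 1 w", "node 5 e", "node 5 w", "node 4 e", "node 4 w", "node 3 e"]
    broken_after = 2  # pretend the fault lies between "node 5 w" and "node 4 e"
    print(bisect_fault(route, lambda p: route.index(p) <= broken_after))
    # ('node 5 w', 'node 4 e')
```

The single-fault assumption is what makes bisection valid; with intermittent or multiple faults, a section-by-section walk is safer.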


Bit Errors: Possible Causes

External causes: performance degradation of fibers, excessive attenuation; dirty fiber connectors or incorrect connectors; poor equipment grounding; a strong interference source near the equipment; poor ventilation and high operating temperature.

Equipment failure: transmitter or receiver failure in the LU; poor synchronization; poor coordination between the XC and the LU/TU; fan failure; faulty boards or poor board performance.


Bit Errors: Operations

Equipment operator: measure the optical power; check the cable or fiber connections and the grounding; clean the fiber connectors; check ventilation and temperature; perform hardware loopbacks; replace boards; exclude interference sources.

NMS operator: query and analyze alarms/performance events; loop back section by section; modify the configuration; implement switching.


Bit Errors: No-Protection Line Case

[Figure: chain network of nodes 1, 2, 3, and 4; performance events shown: RSBBE, MSBBE, HPBBE, MSFEBBE, HPFEBBE, LPBBE, LPFEBBE]

Network configuration: node 1 is the centralized service node. Each station has E1 services with node 1.

Failure description: too many bit errors.


Bit Errors: Where Is the Problem?

[Figure: the same chain network and performance events]

Check and exclude external causes.

Performance event analysis: RSBBE/MSBBE/HPBBE from 4 to 3; LPBBE from 4 to 1. Analyze the LU first, then the TU. The failure is located between nodes 3 and 4; continue.
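The "LU first, then TU" analysis can be written as a scan of near-end background block error counts in the direction of transmission. RS and MS overhead is terminated at every node, so the first node reporting near-end line-level errors bounds the faulty span. Far-end events (MSFEBBE, HPFEBBE, LPFEBBE) are counted at the upstream end from the returned error indications, so they corroborate a finding rather than locate it. This is a minimal Python sketch; the counter format, the function name, and the sample counts are hypothetical.

```python
LINE_EVENTS = ("RSBBE", "MSBBE", "HPBBE")  # checked on the LU first


def locate_bit_errors(path, counters):
    """Apply "LU first, then TU" to near-end background block error counts.

    path:     node names in the direction of transmission, e.g. ["1", "2", "3", "4"].
    counters: dict node -> {event name: count} read at each receiving node.
    Far-end events (MSFEBBE, HPFEBBE, LPFEBBE) are ignored here because
    they are reported at the upstream end and only corroborate a finding.
    """
    for upstream, node in zip(path, path[1:]):
        seen = counters.get(node, {})
        # Stop at the first node with near-end line-level errors; errors
        # carried further downstream do not change the answer.
        if any(seen.get(event, 0) > 0 for event in LINE_EVENTS):
            return f"line errors between node {upstream} and node {node}"
    # LPBBE is only counted where the low-order path terminates (the sink TU).
    if counters.get(path[-1], {}).get("LPBBE", 0) > 0:
        return "path-level errors only: check XC/TU along the path by loopback"
    return "no bit errors"


if __name__ == "__main__":
    sample = {"4": {"RSBBE": 12, "MSBBE": 10, "HPBBE": 8, "LPBBE": 5}}
    print(locate_bit_errors(["1", "2", "3", "4"], sample))
    # line errors between node 3 and node 4
```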

Bit Errors: Analysis

[Figure: the same chain network and performance events]

Check the fans and the temperature. If they are not normal, solve those problems first.

If they are normal, measure or query the optical power. If it is not normal, check and replace the transmitter, fiber, connector, or receiver. If it is normal, continue.
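The "measure or query optical power" step amounts to comparing the received power with the receiver's working range: bit errors can come from power that is too low (near or below sensitivity) or too high (receiver overload). Below is a minimal Python sketch. The sensitivity and overload thresholds must come from the optical interface's specification; the default `margin_db` and the values in the usage example are hypothetical, not taken from any board manual.

```python
def check_rx_power(rx_dbm: float, sensitivity_dbm: float, overload_dbm: float,
                   margin_db: float = 3.0) -> str:
    """Classify received optical power against the receiver's working range.

    sensitivity_dbm and overload_dbm come from the optical interface's
    specification; margin_db is a hypothetical safety margin.
    """
    if rx_dbm < sensitivity_dbm:
        return "too low: below receiver sensitivity"
    if rx_dbm > overload_dbm:
        return "too high: receiver overload"
    if rx_dbm < sensitivity_dbm + margin_db:
        return "marginal: within margin of sensitivity"
    return "normal"


def span_loss_db(tx_dbm: float, rx_dbm: float) -> float:
    """Span attenuation (fiber plus connectors) is transmit minus received power."""
    return tx_dbm - rx_dbm


if __name__ == "__main__":
    # Example thresholds only: sensitivity -28 dBm, overload -8 dBm.
    print(check_rx_power(-26.5, -28.0, -8.0))  # marginal: within margin of sensitivity
    print(span_loss_db(-2.0, -26.5))  # 24.5
```

A marginal reading is a common cause of occasional bit errors, so cleaning connectors or checking the fiber may be worthwhile even when the power is technically within range.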


Bit Errors: Final Solution

[Figure: the same chain network and performance events, with a BER tester connected]

Loopback and replacement: loopback; active/standby XC switch; configuration modification. Locate and replace the faulty LU/XC.


Bit Errors: Think About It!

[Figure: the same chain network and performance events]

Question: how do you handle occasional bit errors? You cannot keep a loopback in place for a long time, so interchange the fiber or the LU instead.


Questions

What is the key to troubleshooting?

To locate a failure ACCURATELY to a certain station.

What are the principles of troubleshooting?

External first, then internal.
Station first, then boards.
LU first, then TU.
Higher-severity alarms first, then lower-severity alarms.


Summary

Which methods are used for troubleshooting?

1. Alarm and performance analysis
2. Loopback
3. Replacement
4. Configuration data analysis
5. Configuration modification
6. Test with instruments
7. Rule of thumb


Thank you

www.huawei.com