iub troubleshooting.pptx
TRANSCRIPT
INITIAL Iub TROUBLESHOOTING OVERVIEW
Agenda
› Objective
› CPP O&M Concepts
› Protocols
› O&M Client Services
› Counters Overview
› Performance Management
› Iub over ATM
› Initial Counters
› Iub Analysis
› Fail After Admission
› IP Iub Throughput
› Questions?
OBJECTIVE
The main idea is to introduce the transport engineer to the basic concepts of troubleshooting on the Iub interface, by presenting initial counters and KPIs that can help to define which area needs further investigation.
Based on these conclusions, network optimization services can be performed.
Moshell is a suite of tools for O&M of CPP-based nodes.
CPP is the Connectivity Packet Platform, on which the following nodes are based: RNC, RBS, MGW, RXI.
Information collected by CPP counters every 15 minutes is stored in XML files (ROP files).
This information is read and stored in an SQL database on a daily basis.
CPP O&M CONCEPTS
Protocols used for accessing these services:
› http
› unsecure protocols (unencrypted): telnet, ftp, iiop
› secure protocols (encrypted): ssh, sftp, ssliop
PROTOCOLS
Figure 1 - Protocols
[Figure labels. NODE: OSE shell (COLI), File system, MIB with CM (Configuration Mgmt), FM (Fault Mgmt) and PM (Performance Mgmt). Access protocols and ports: HTTP (80), FTP (21) / SFTP (22), TELNET (23) / SSH (22), IIOP (56834) / SSL IOP (56836). Transport: TCP/IP over Ethernet or IP over ATM; RS232. Clients: MoShell, Hyper Terminal, Scanners.]
The O&M client services
› Configuration Service (CS): Read and change configuration data; configuration data is stored in the MO attributes
› Alarm Service (AS): Retrieve the list of alarms currently active on each MO
› Notification Service (NS): Subscribe and receive notifications from the node, informing about parameter/alarm changes in the MOs
› Inventory Service (IS): Get a list of all HW and SW defined in the node
› Log Service (LS): Save a log of certain events such as changes in the configuration, alarms raising and ceasing, node and board restarts
› Performance Measurement (PM): Set up performance measurements, whose results are stored in MO pm-attributes and output to an XML file every 15 minutes.
COUNTERS OVERVIEW
Counter Types:
• Peg: a counter that is increased by 1 at each occurrence of a specific activity.
• Gauge: a counter that can be increased or decreased depending on the activity in the system.
• Accumulator: a counter that is increased by the value of a sample. It indicates the total sum of all sample values taken during a certain time. The name of an accumulator counter begins either with pmSum or pmSumOfSamp.
• Scan: a counter that is increased by 1 each time the corresponding accumulator counter is increased. It indicates how many samples have been read.
• Probability Density Function (PDF): a list of range counters. When a sampled value falls within a certain range, the counter for that range is increased.
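The relationship between an accumulator counter and its scan counter can be illustrated with a short sketch: dividing the sum of samples by the number of samples gives the average sampled value over the period. The function name and the numbers below are hypothetical and only serve the illustration.

```python
def average_from_accumulator(pm_sum: int, pm_samples: int) -> float:
    """Average sampled value: accumulator counter divided by scan counter."""
    if pm_samples == 0:
        return 0.0  # no samples were taken in this period
    return pm_sum / pm_samples


# Hypothetical ROP: 900 samples whose values add up to 4500.
print(average_from_accumulator(4500, 900))  # 5.0
```

The same sum/samples pattern is what the pmSum.../pmSamples... counter pairs later in this presentation rely on.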
COUNTERS OVERVIEW
Counter Reset Behavior
Counter values are either reset at the end of each ROP period or accumulated up to the counter limit.
For a counter that is not reset after each ROP period, the increment during a ROP period is the difference between the values of two consecutive ROPs.
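A minimal sketch of the difference calculation for a counter that is not reset between ROPs. What happens when the counter limit is reached is not described here, so the sketch simply flags any decrease as "unknown" rather than guessing; the function name and the sample readings are hypothetical.

```python
from typing import Optional


def rop_increment(previous: int, current: int) -> Optional[int]:
    """Increment of a non-reset counter between two consecutive ROPs."""
    if current < previous:
        # Assumption: a decrease means the counter hit its limit or the
        # node restarted; the true increment cannot be derived here.
        return None
    return current - previous


readings = [1200, 1350, 1350, 90]  # hypothetical values from four ROP files
increments = [rop_increment(a, b) for a, b in zip(readings, readings[1:])]
print(increments)  # [150, 0, None]
```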
Counter Classification
Counters can be grouped by NE Type:
RNC
RXI
RBS
Or by area of interest:
Radio Network – RNC specific counters
Radio Network – RBS specific counters
Transport Network counters
Iub over ATM
Figure 3 - Iub configuration example
[Figure labels. Nodes: RNC (locally terminated AAL2 connections, locally terminated AAL2 signalling); RBS 1 (hub, AAL2 switching: forwarded AAL2 signalling, AAL2 switched connections); RBS 11; RBS 12; RBS 2. Shared PVCs for AAL2 signalling towards RBS1, RBS12 and RBS 13. Shared AAL2 paths for AAL2 connections set up towards RBS1, RBS11, RBS12. Legend: PVCs for Q.2630; AAL2 paths; PVCs for NBAP, Node Synch; PVC termination symbol; AAL2 switching cluster; ATM/SDH Transport Network; AAL2 Access Point. Note: O&M (Mub) PVCs omitted for simplicity.]
Iub over ATM
AAL2 CAC and resource usage:
AAL2 connection admission control (CAC) is executed before a new AAL2 connection is set up in the system.
AAL2 connections in UTRAN are always initiated by RNC.
The RNC reserves a CID and the relevant bandwidth, and forwards the establish request message through the AP. The message contains the allocated CID, the traffic descriptors and the QoS.
Iub over ATM
CID
Because of standardization constraints, no more than 248 AAL2 connections can be simultaneously established on a single AAL2 path. More than 248 connections can be established between two adjacent nodes only if more than one AAL2 path is configured.
When an AAL2 connection is allocated on an AAL2 path, a Channel Identifier (CID) is reserved and assigned by the node that is originating or forwarding the AAL2 connection request.
Figure 4 – AAL2 Connections table
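The 248-connection limit per AAL2 path translates directly into a minimum number of paths for a given number of simultaneous connections. The sketch below only does that arithmetic; the function name is hypothetical and bandwidth is deliberately ignored.

```python
import math

MAX_CONNS_PER_PATH = 248  # CID limit per AAL2 path, as stated above


def min_aal2_paths(simultaneous_connections: int) -> int:
    """Smallest number of AAL2 paths that offers enough CIDs."""
    if simultaneous_connections <= 0:
        return 0
    return math.ceil(simultaneous_connections / MAX_CONNS_PER_PATH)


print(min_aal2_paths(248))  # 1
print(min_aal2_paths(249))  # 2
print(min_aal2_paths(600))  # 3
```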
Iub over ATM
In particular, the AAL2 path capacity assumed by CAC is equal to:
› the configured PCR, for CBR AAL2 paths
› the configured MCR, for UBR+ AAL2 paths
› zero, for UBR AAL2 paths
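The rule above can be written as a small lookup. This is only an illustration of the rule, not the node's CAC implementation; the function name, the string labels for the service categories and the use of cells per second as the unit are assumptions.

```python
def cac_path_capacity(service_category: str, pcr: int = 0, mcr: int = 0) -> int:
    """Capacity that CAC assumes for an AAL2 path (assumed unit: cells/s)."""
    category = service_category.upper()
    if category == "CBR":
        return pcr   # configured Peak Cell Rate
    if category == "UBR+":
        return mcr   # configured Minimum Cell Rate
    if category == "UBR":
        return 0     # no capacity is assumed for plain UBR paths
    raise ValueError(f"unknown service category: {service_category}")


print(cac_path_capacity("CBR", pcr=4000))             # 4000
print(cac_path_capacity("UBR+", pcr=4000, mcr=1000))  # 1000
print(cac_path_capacity("UBR", pcr=4000))             # 0
```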
Flow Control:
The Flow Control function has been conceived to dynamically adapt the transmission rate of Best Effort services to the available Iub bandwidth, by reducing the transmission rate during Iub congestion.
Initial counter check
The following counters are recommended in an initial investigation, as they give clues on whether the source of the problem is transport network based.
Checking whether the number of unsuccessful local or remote AAL2 connections is increasing indicates where potential problems exist: at the NodeB, RXI or RNC. The ‘OutConns’ counters, viewed at the AAL2 Access Points in the RNC looking towards the RXI/NodeB, and at the AAL2 Access Points in the RXI looking towards the NodeBs, are the best counters to observe.
› Aal2Ap pmUnSuccOutConnsLocalQosClassA/B/C/D
› Aal2Ap pmUnSuccInConnsLocalQosClassA/B/C/D
› Aal2Ap pmUnSuccOutConnsRemoteQosClassA/B/C/D
› Aal2Ap pmUnSuccInConnsRemoteQosClassA/B/C/D
Initial counter check
The following counters show the BW utilization:
› VclTp, VplTp, Atmport: pmBwUtilizationRx; pmBwUtilizationTx
To check ATM link utilization:
› VclTp, VplTp, Atmport: pmTransmittedAtmCells; pmReceivedAtmCells
To show the number of RRC/RAB establishment failures after admission:
› Utrancell: pmNoFailedAfterAdm
Initial counter check
To check for congestion in the control plane:
› Iub interface: UniSaalTp pmNoOfLocalCongestions; NbapCommon pmNoOfDiscardedNbapMessages; Iublink pmTotalTimeIublinkCongestedDl
› Iu/Iur interface: NniSaalTp pmNoOfLocalCongestions
To check for interface availability:
› Iub interface: UniSaalTp pmLinkInServiceTime
› Iu/Iur interface: NniSaalTp pmLinkInServiceTime
Initial counter check
The following counter shows whether Iub bandwidth is limiting HS services, measured in %:
› IubDataStreams: pmCapAllocIubHsLimitingRatioSpi<xx>
Note: if the value is > 75%, the cause could be Iub capacity or radio limitations.
To see HS frame loss:
› IubDataStreams: pmHsDataFramesLostSpi<XX>; pmHsDataFramesReceivedSpi<XX>
To check ATM link quality:
› Aal2PathVccTp, Aal5TpVccTp, VpcTp: pmBwLostCells; pmFwLostCells
Initial counter check
Check the physical layer quality of the transmission link:
› ImaLink: pmSesIma; pmSesImaFe; pmUasIma; pmUasImaFe
› ImaGroup: pmGrUasIma
› E1PhyspathTerm, E1Ttp, E3PhysPathterm: pmEs; pmSes; pmUas
› Os155SpiTtp: pmMsEs; pmMsSes; pmMsUas; pmMsBbe
› Vc12Ttp, Vc4Ttp: pmVcEs; pmVcSes; pmVcUas
Iub analysis
The following flowchart summarises an Iub link analysis procedure based on AAL2 Setup failure rate examination.
[Flowchart]
Strict Admission Traffic
› No AAL2 Setup Failure: OK
› AAL2 Setup Failure, Local: Lack of CID or Lack of Bw. Action: Create More Class A VCs
› AAL2 Setup Failure, Remote: Bad TN quality. Action: Check Physical Layer Quality
Best Effort Traffic
› No AAL2 Setup Failure: Check Flow Control Counters
› AAL2 Setup Failure, Local: Lack of CID. Action: Create More Class B&C VCs
› AAL2 Setup Failure, Remote: Bad TN quality. Action: Check Physical Layer Quality
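One possible reading of the flowchart above, sketched as a function. The branch mapping (local failures lead to VC expansion, remote failures to a physical layer check) is an interpretation of the flattened flowchart, and the function name, argument names and the "any failure count above zero" trigger are assumptions made for illustration.

```python
from typing import List


def iub_triage(traffic: str, local_failures: int, remote_failures: int) -> List[str]:
    """Suggested next steps for 'strict' or 'best_effort' traffic."""
    if traffic not in ("strict", "best_effort"):
        raise ValueError("traffic must be 'strict' or 'best_effort'")
    if local_failures == 0 and remote_failures == 0:
        # No AAL2 setup failures on this traffic type.
        if traffic == "strict":
            return ["OK"]
        return ["Check Flow Control counters"]
    actions = []
    if local_failures > 0:
        # Local rejects: lack of CIDs (or bandwidth for strict admission).
        if traffic == "strict":
            actions.append("Create more Class A VCs")
        else:
            actions.append("Create more Class B&C VCs")
    if remote_failures > 0:
        # Remote failures: suspected bad transport network quality.
        actions.append("Check physical layer quality")
    return actions


print(iub_triage("strict", 12, 0))      # ['Create more Class A VCs']
print(iub_triage("best_effort", 0, 0))  # ['Check Flow Control counters']
```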
AAL2 Setup Failure Rate
The following KPIs and Aal2Ap counters are suggested to monitor the AAL2 Setup Failure rate on an Iub link.
Counters
Aal2Ap::pmUnSuccOutConnsLocalQoSClass<x> (A/B/C/D)
Number of unsuccessful attempts to allocate AAL2 resources during establishment of outgoing connections on this Access Point (AP). Caused by rejects in Connection Admission Control (CAC).
Aal2Ap::pmUnSuccOutConnsRemoteQoSClass<x> (A/B/C/D)
Number of unsuccessful establishments of outgoing connections on this AAL2 Access Point (AP).
Aal2Ap::pmSuccOutConnsRemoteQosClass<x> (A/B/C/D)
Number of successful establishments of outgoing connections on this AAL2 Access Point (AP).
AAL2 Setup Failure Rate KPIs
$$\text{AAL2\_Fail\_Rate\_Local\_ClassA}\ [\%] = \frac{\text{pmUnSuccOutConnsLocalQoSClassA}}{\text{pmSuccOutConnsRemoteQoSClassA} + \text{pmUnSuccOutConnsLocalQoSClassA} + \text{pmUnSuccOutConnsRemoteQoSClassA}} \times 100\%$$
$$\text{AAL2\_Fail\_Rate\_Remote\_ClassA}\ [\%] = \frac{\text{pmUnSuccOutConnsRemoteQoSClassA}}{\text{pmSuccOutConnsRemoteQoSClassA} + \text{pmUnSuccOutConnsRemoteQoSClassA}} \times 100\%$$
Similar formulae can be used for Class B & Class C.
The AAL2_Fail_Rate_Local_ClassA KPI signals possible problems in the Iub section between the RNC and the next connected node (NodeB or RXI).
The AAL2_Fail_Rate_Remote_ClassA KPI signals possible problems in the Iub sections beyond any intermediate RXI.
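The two Class A KPIs above can be computed from the three Aal2Ap counters as in the following sketch. The function name and the sample values are hypothetical; returning 0.0 when there were no attempts is a choice made here, not something the presentation specifies.

```python
from typing import Tuple


def aal2_fail_rates(succ_remote: int, unsucc_local: int, unsucc_remote: int) -> Tuple[float, float]:
    """Return (local fail rate %, remote fail rate %) for one QoS class."""
    attempts = succ_remote + unsucc_local + unsucc_remote
    local = 100.0 * unsucc_local / attempts if attempts else 0.0
    # Only attempts that passed local CAC can succeed or fail remotely.
    forwarded = succ_remote + unsucc_remote
    remote = 100.0 * unsucc_remote / forwarded if forwarded else 0.0
    return local, remote


# Hypothetical ROP: 900 successes, 100 local CAC rejects, no remote failures.
print(aal2_fail_rates(900, 100, 0))  # (10.0, 0.0)
```

The same function can be reused for Class B and Class C by feeding it the corresponding counters.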
CID Utilization Estimate
This is a crude method of calculating the number of CIDs in use, as it does not distinguish between traffic types.
There is a second method using Erlang counters, which is not demonstrated in this presentation.
Counters
Aal2Ap::pmExisTransConns
Number of existing connections for the Access Point (AP) transiting this node. Gauge counter.
Aal2Ap::pmExisOrigConns
Number of existing connections for the Access Point (AP) originating in this node. Gauge counter.
Aal2Ap::pmExisTermConns
Number of existing connections for the Access Point (AP) terminating in this node. Gauge counter.
CID Utilization Estimate
KPI
$$\text{Average\_No\_Connections} = \frac{\text{pmExisOrigConns} + \text{pmExisTermConns} + \text{pmExisTransConns}}{n}$$
where n is the number of paths per AAL2 Access Point.
Note: if the RXI is a pure AAL2 switching node, the pmExisOrigConns and pmExisTermConns counters can be disregarded, as there can be no originated or terminated connections in the node, only transiting connections.
This method of CID calculation gives a basic estimate of CID utilization.
In a typical Iub link with one VC (normally vc39) defined for Strict Admission traffic and one VC (normally vc50) defined for Best Effort traffic, the division by 2 in the formula averages the total number of used CIDs over both traffic types. For example, if the counters return a total of 360, it is not known whether this is 180 CIDs in both ClassA and ClassB&C, or maybe 240 in ClassA and 120 in ClassB&C. If it is the latter, then VC expansion is needed, as the maximum number of CIDs allowed per path (248) is being reached.
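The 360-connection example can be worked through in a few lines. The sketch assumes the KPI is the sum of the three pmExis... gauges divided by the number of paths, and compares the result with the 248-CID limit per path; the function name is hypothetical.

```python
MAX_CIDS_PER_PATH = 248  # CID limit per AAL2 path


def average_connections_per_path(orig: int, term: int, trans: int, n_paths: int) -> float:
    """Average number of existing AAL2 connections per path on one AP."""
    if n_paths <= 0:
        raise ValueError("n_paths must be positive")
    return (orig + term + trans) / n_paths


# 360 existing connections spread over two paths (for example vc39 and vc50).
avg = average_connections_per_path(360, 0, 0, 2)
print(avg)                                       # 180.0
print(round(100 * avg / MAX_CIDS_PER_PATH, 1))   # 72.6
```

An average of 180 looks comfortable, but as noted above it hides the split between paths: 240 on one path would already be close to the limit.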
BW Utilization Estimate
Bandwidth utilization can be measured per VP and also per VC using counters.
To monitor Best Effort VC utilization, it is better to use the ‘Flow Control’ methodology.
Counters
VplTp::pmTransmittedAtmCells = Number of transmitted ATM cells. This counter is incremented for each transmitted ATM cell. Peg counter.
VplTp::pmReceivedAtmCells = Number of received ATM cells. This counter is incremented for each received ATM cell. Peg counter.
KPIs
$$\text{AAL2\_VP\_Utilisation\_Tx} = \frac{\text{VplTp::pmTransmittedAtmCells}}{\text{Meas\_Length(s)} \times \text{egressAtmPCR}} \times 100\%$$
$$\text{AAL2\_VP\_Utilisation\_Rx} = \frac{\text{VplTp::pmReceivedAtmCells}}{\text{Meas\_Length(s)} \times \text{ingressAtmPCR}} \times 100\%$$
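A short worked example of the VP utilisation KPI: the number of ATM cells counted in the measurement period is divided by the number of cells the VP could have carried at its configured PCR. The function name and the sample figures (a 900 s ROP and a PCR of 4000 cells/s) are hypothetical.

```python
def vp_utilisation_percent(atm_cells: int, meas_length_s: int, pcr_cells_per_s: int) -> float:
    """VP utilisation in percent for one direction (Tx or Rx)."""
    capacity_cells = meas_length_s * pcr_cells_per_s
    if capacity_cells <= 0:
        raise ValueError("measurement length and PCR must be positive")
    return 100.0 * atm_cells / capacity_cells


# 1,800,000 transmitted cells in a 900 s ROP on a VP with PCR = 4000 cells/s.
print(vp_utilisation_percent(1_800_000, 900, 4000))  # 50.0
```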
Physical Layer Quality
TN quality
Several counters are available to monitor the availability and the quality of physical and IMA terminations in CPP nodes.
Errored Seconds (ES): seconds with block errors during the PM interval. These counters are incremented for each second where one or more blocks with one or more errors are received.
Severely Errored Seconds (SES): seconds during available time having a severe bit error rate.
Unavailable Seconds (UAS): the accumulated unavailable time in seconds during the interval. Unavailable time starts when 10 consecutive SES are detected, and ends when 10 consecutive non-SES are detected. These counters are incremented for each second of unavailable time.
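The start and end rule for unavailable time can be sketched as a small state machine over a per-second SES trace. The node computes these counters itself, so the per-second input, the function name and the sample trace are hypothetical. The sketch also assumes the ITU-T G.826 convention that the 10 triggering SES belong to unavailable time and the 10 closing non-SES belong to available time.

```python
from typing import List


def unavailable_seconds(ses_flags: List[bool], threshold: int = 10) -> int:
    """Count unavailable seconds in a per-second trace (True = SES)."""
    unavailable = [False] * len(ses_flags)
    in_unavailable = False
    run = 0  # consecutive seconds contradicting the current state
    for i, ses in enumerate(ses_flags):
        if not in_unavailable:
            run = run + 1 if ses else 0
            if run == threshold:
                # Entering unavailable time: count the triggering SES too.
                in_unavailable = True
                for j in range(i - threshold + 1, i + 1):
                    unavailable[j] = True
                run = 0
        else:
            unavailable[i] = True
            run = run + 1 if not ses else 0
            if run == threshold:
                # Leaving unavailable time: the closing non-SES are available.
                in_unavailable = False
                for j in range(i - threshold + 1, i + 1):
                    unavailable[j] = False
                run = 0
    return sum(unavailable)


trace = [True] * 12 + [False] * 10 + [True] * 3
print(unavailable_seconds(trace))  # 12
```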
Flow Control
HSDPA Congestion KPIs:
$$\text{HSFrameLossRatio} = \frac{\text{pmHsDataFramesLostSpi}\langle xx\rangle}{\text{pmHsDataFramesReceivedSpi}\langle xx\rangle + \text{pmHsDataFramesLostSpi}\langle xx\rangle} \times 100\%$$
High frame loss indicates potential congestion problems. <xx> = the supported SPI (Scheduling Priority Indicator).
$$\text{HSFrameDelayDistribution} = \text{pmHsDataFrameDelayIubSpi}\langle xx\rangle$$
This counter indicates the percentage of time during which Iub congestion has occurred, per SPI (Scheduling Priority Indicator).
Experience has shown that in highly loaded Iub cases, this counter can reach values of about 65–75%.
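A minimal sketch of the HS frame loss ratio, evaluated per SPI. The dictionary layout (SPI mapped to a lost/received pair) and the sample values are hypothetical; only the ratio itself follows the KPI definition of lost frames over lost plus received frames.

```python
from typing import Dict, Tuple


def hs_frame_loss_ratio(lost: int, received: int) -> float:
    """HS frame loss ratio in percent for one SPI."""
    total = lost + received
    return 100.0 * lost / total if total else 0.0


# Hypothetical counters for one ROP: SPI -> (frames lost, frames received).
counters: Dict[int, Tuple[int, int]] = {0: (20, 980), 1: (0, 500)}
ratios = {spi: hs_frame_loss_ratio(lost, rec) for spi, (lost, rec) in counters.items()}
print(ratios)  # {0: 2.0, 1: 0.0}
```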
Flow Control
Low HS Throughput Site Analysis Study Case
Counters were extracted and graphs plotted to illustrate the HS Frame Loss Ratio and HSLimitIub KPIs over time.
Examining the resulting KPI graphs, it was evident that the channel normally reserved for ClassA traffic (vc39) was experiencing abnormally high bandwidth utilization.
The ClassB&C traffic channels (vc50 & vc51) were experiencing abnormally low utilization.
Flow Control
Enhanced Uplink Congestion KPIs
$$\text{Eul\_Frame\_Loss\_Ratio} = \frac{\text{pmEdchDataFramesLost}}{\text{pmEdchDataFramesReceived} + \text{pmEdchDataFramesLost}} \times 100\%$$
High frame loss indicates potential congestion problems.
$$\text{Eul\_Frame\_Delay\_Distribution} = \text{pmEdchDataFrameDelayIub}$$
This counter is difficult to post-process, so it is recommended only for troubleshooting rather than for performance monitoring.
Failure After Admission
What is ‘Failure After Admission’?
‘Failure After Admission’ refers to an RRC/RAB setup failure that occurs after the user has been admitted to the network.
Admission to the network occurs when the user successfully completes an initial RRC Connection Setup request.
An RRC failure after the initial admission could occur if the user wanted to upswitch to a higher rate while on an existing call and the upswitch could not be achieved due to lack of resources (Radio or Transport). This would be perceived by the user as a slow connection.
On the other hand, a RAB setup failure would be perceived by the user as a failure to set up a call.
Failure After Admission
In general, high ‘Failure After Admission’ occurrences are mainly due to:
› Transport Network: lack of BW/CIDs, or
› Radio Network: lack of Channel Element availability.
‘Failure After Admission’ Study Case
To perform this study case, the following procedure is followed:
› Identification of a problem site, by extraction of the pmNoFailedAfterAdm counter.
› AAL2 Setup Failure Rate: counter retrieval and KPI calculation.
› Graphical analysis to establish the correlation between both.
Graphical Analysis
[Chart: FCAN05A pmNoFailedAfterAdm vs Time. Y axis: pmNoFailedAfterAdm, 0 to 350. X axis: Time, 2009-04-24 00:00 to 2009-04-26 21:45.]
[Chart: AAL2 Setup Failure Rate vs Time. Y axis: Rate %, 0.00 to 70.00. X axis: Time, 2009/4/24 0:00 to 2009/4/26 21:45. Series: ClassA, ClassB, ClassC.]
IP Iub Throughput
The client should define a user throughput threshold, in order to identify the average bandwidth target to be delivered per user.
This threshold should then be compared with the actual average customer throughput, as defined below.
THROUGHPUT PER USER:
This formula calculates the average bit rate per user on the Iub interface.
$$\text{AvUserThrHs}\ [\text{kbit/s}] = \frac{\sum_{\text{Cells}} \text{pmDlTrafficVolumePsIntHs}}{\sum_{\text{Cells}} \text{AvNrHsUsersPerCell}} \times \frac{1}{\text{Meas\_Length(s)}}$$
$$\text{AvNrHsUsersPerCell} = \frac{\text{pmSumBestPsHsAdchRabEstablish}}{\text{pmSamplesBestPsHsAdchRabEstablish}} + \frac{\text{pmSumBestPsStreamHsRabEst}}{\text{pmSamplesBestPsStreamHsRabEst}} + \frac{\text{pmSumBestPsEulRabEstablish}}{\text{pmSamplesBestPsEulRabEstablish}}$$
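The throughput-per-user calculation can be sketched as follows. The sketch assumes that the three sum/samples ratios are added to obtain the average number of HS users per cell and that pmDlTrafficVolumePsIntHs is expressed in kbit; the dictionary layout, function names and sample values are hypothetical.

```python
from typing import Dict, List


def _avg(pm_sum: float, pm_samples: float) -> float:
    """Accumulator divided by its sample count, 0.0 when there are no samples."""
    return pm_sum / pm_samples if pm_samples else 0.0


def av_nr_hs_users_per_cell(c: Dict[str, float]) -> float:
    """Average number of HS users in one cell (assumed sum of three ratios)."""
    return (
        _avg(c["pmSumBestPsHsAdchRabEstablish"], c["pmSamplesBestPsHsAdchRabEstablish"])
        + _avg(c["pmSumBestPsStreamHsRabEst"], c["pmSamplesBestPsStreamHsRabEst"])
        + _avg(c["pmSumBestPsEulRabEstablish"], c["pmSamplesBestPsEulRabEstablish"])
    )


def av_user_thr_hs(cells: List[Dict[str, float]], meas_length_s: int) -> float:
    """Average HS throughput per user in kbit/s over all given cells."""
    volume_kbit = sum(c["pmDlTrafficVolumePsIntHs"] for c in cells)
    users = sum(av_nr_hs_users_per_cell(c) for c in cells)
    if users == 0 or meas_length_s <= 0:
        return 0.0
    return volume_kbit / users / meas_length_s


cell = {
    "pmDlTrafficVolumePsIntHs": 2_700_000,       # assumed unit: kbit
    "pmSumBestPsHsAdchRabEstablish": 1800, "pmSamplesBestPsHsAdchRabEstablish": 900,
    "pmSumBestPsStreamHsRabEst": 0, "pmSamplesBestPsStreamHsRabEst": 900,
    "pmSumBestPsEulRabEstablish": 900, "pmSamplesBestPsEulRabEstablish": 900,
}
print(av_user_thr_hs([cell], 900))  # 1000.0
```

The result is the value to compare against the user throughput threshold defined by the client.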
IP Iub Throughput
If the throughput per user is below the defined threshold, it should be identified whether it has been limited by ‘Flow Control’. This can be done using the Iub congestion counter:
$$\text{HSLimitIub} = \text{pmCapAllocIubHsLimitingRatioSpi}\langle xx\rangle$$
Another indication that the transport network is overloaded is given by the frame loss counter, which should present values below 2%:
$$\text{HSFrameLossRatio} = \frac{\text{pmHsDataFramesLostSpi}\langle xx\rangle}{\text{pmHsDataFramesReceivedSpi}\langle xx\rangle + \text{pmHsDataFramesLostSpi}\langle xx\rangle} \times 100\%$$
If frame loss counter returns low values, and Iub presents no limitation
IP Iub Throughput
RNC Iub throughput monitoring KPIs:
Average Iub throughput:
$$\text{IUB\_THR}\ [\text{kbit/s}] = \frac{\text{pmSumCapacity}}{\text{pmSamplesCapacity}}$$
Average Iub throughput regulated within ROP:
$$\text{IUB\_THR\_REG}\ [\text{kbit/s}] = \frac{\text{pmSumCapacityRegulation}}{\text{pmSamplesCapacityRegulation}}$$
Periods of Iub throughput limitation:
$$\text{DURATION\_REG\_THR\_IUB}\ [\text{sec}] = \text{pmTotalTimeCapacityRegulated}$$
Observing these KPIs makes it possible to understand when low performance is due to an internal RNC limitation and not to the transport network.
IP IUB EVALUATION FLOWCHART