Verification of Critical Telecommunication Systems
Mats Larsson
Ericsson
Verification of Critical Telecommunication Systems
Mats Larsson, +46 8 7273162
© Ericsson AB 2005, 2005-07-15
Outline
Content of presentation
– Introduction from the end user's perspective
– Technical introduction to a fault-tolerant platform
– How basic fault tolerance could be built into a telecommunication platform
– Fault tolerance mechanisms targeted in verification
– Special problems encountered during the verification
Introduction: Where are we?
User View:
– A number of users
– A number of services
– Users include:
Fixed (PSTN)
GSM, GPRS
3G (WCDMA, CDMA2000, UMTS and EDGE)
– Services Include:
Fixed Voice
GSM Voice
3G Voice/Video/Data
Data (internet, mail...)
Introduction: Where are we?
Network View: (WCDMA example)
– Services to users are provided by a network of nodes
– Nodes are designed differently depending on purpose and usage
– Examples of nodes:
RBS, Radio Base Stations
RNC, Radio Network Controllers
MGw, Media Gateways
MSC, Mobile Switching Centres
Introduction: Where are we?
Node View:
– Requirements differ greatly depending on the node's purpose in the network
Example RBS requirements:
- Small footprint, light weight, low power consumption, weather resistant
Example MGw requirements:
- High processing power, high level of fault tolerance, many high-speed network interfaces, support for many connections
– Common platform for many nodes?
Increased flexibility required!
Migration of functionality from platform from application
What is our challenge?
Robustness!
– Operation for long periods of time without failures
– Most bad things should be able to happen without affecting the end users!
– Operation & Maintenance without affecting end users
= fault tolerance and redundancy!
Flexibility & Scalability!
– Must support everything from the smallest RBS to the largest MGw
– Must scale efficiently
Importantly!
– Robustness, flexibility and scalability are not achieved by a perfect verification process; they may only make verification even more complex!
Conclusions:
– We have both architecture and verification issues
System Architecture
Node & Cabinet
– Each node consists of 1 to n standard 19” or special cabinets
– Multiple nodes may share the same standard cabinet
– Each standard cabinet holds up to 3 subracks
19” Subrack
– Each subrack has a backplane connecting up to 28 device boards
– Each subrack has a power and fan unit and a cable shelf
– Each subrack has 4 special slots and 24 generic slots with support for hot plug & play
System Architecture
Board Info
– 8 board types and >30 different boards
– Each board 265 x 225 x 15 mm (30 mm)
Example of Board Types
– Generic Processing Boards (GPB)
– Time Unit Boards (TUB)
– Switch Core Boards (SCB)
– Switch Extension Boards (SXB)
– Exchange Terminal Board (ETB)
– Media Stream Boards (MSB)
System Architecture
Physical and Logical View
– Physical View:
Boards physically connected through the backplane
Board backplane contains data and power bus
– Logical View:
Boards logically connected to Switch Core(s)
Each board may communicate with any other board through the Switch Core
Switch Core supports 28 boards at 622 Mbit/s
Processing units
3 different processing units
– General Purpose Processors
On General Processing Boards
For execution of application software without special hardware dependencies
1 CPU / board
– Special Purpose Processors
On Special Processing Boards
For special processing of applications in UMTS networks
3-5 CPUs / board
– Media Stream Processors
On Media Stream Boards
For special voice processing
15-20 CPUs / board
Distribution Model & Fault Tolerance
Distribution of software for fault tolerance
– N+N Processor Clusters
Active processor executing
Passive processor waiting for switch
- 50% utilization of used hardware
+ No reduced performance after failure
– N Pooled Devices
Pool of N processing units available
+ Normal utilization level >50% of total processing power
- Failure of a processing unit reduces performance to that of N-1 units
– Typical Usage
Processing Clusters for continuously reliable execution of 1 software instance
Pooled Devices for continuous execution of many software instances
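
The trade-off between the two models can be illustrated with a small Python sketch. It is a simplified model and not the platform's actual behaviour: the function names are hypothetical, and it assumes that each failure in the N+N cluster hits a different active/passive pair. The pool figure is an upper bound, since it counts every working unit as usable.

```python
def n_plus_n_capacity(n: int, failures: int) -> int:
    """Working processors in an N+N cluster after `failures` single failures.

    Assumption: every failure hits a different active/passive pair, so the
    passive partner can always take over.
    """
    if not 0 <= failures <= n:
        raise ValueError("model only covers one failure per pair")
    return n


def pool_capacity(n: int, failures: int) -> int:
    """Working processors in a pool of N units after `failures` failures."""
    if not 0 <= failures <= n:
        raise ValueError("cannot lose more units than the pool holds")
    return n - failures


def utilization(working: int, installed: int) -> float:
    """Share of the installed hardware that is doing useful work."""
    return working / installed


if __name__ == "__main__":
    n = 4
    # N+N installs 2 * n processors to get n working ones.
    print("N+N utilization:", utilization(n, 2 * n))
    print("N+N units after one failure:", n_plus_n_capacity(n, 1))
    print("Pool units after one failure:", pool_capacity(n, 1))
```

The sketch makes the slide's point explicit: the cluster pays for its stable capacity with half of the hardware idle, while the pool uses its hardware better but loses capacity with every failed unit.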
Reliable Program Execution
Reliable Programs
– For protection against software & hardware failures in the Main Processor Cluster (MPC)
– Reliable Programs execute on individual hardware, using individual memory in individual environments
– Two instances of the same software run simultaneously
Active: Executing and providing the service
Passive: On hold waiting for failure in active
– Typical Configurations:
Fault in Active: Passive goes Active, faulty instance goes Passive after fault recovery
Fault in Active: Passive goes Active, faulty instance goes Active again after fault recovery
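
The two typical configurations can be sketched as a small state model in Python. This is an illustration only: the class, method and instance names are hypothetical and do not come from the platform, and a second fault before recovery is not handled.

```python
from enum import Enum
from typing import Dict, Optional


class Role(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


class ReliableProgram:
    """Two instances of one program; names and API are hypothetical."""

    def __init__(self, revert_after_recovery: bool) -> None:
        self.revert_after_recovery = revert_after_recovery
        # None means the instance is out of service (faulty, not yet recovered).
        self.roles: Dict[str, Optional[Role]] = {
            "instance_1": Role.ACTIVE,
            "instance_2": Role.PASSIVE,
        }

    def active(self) -> str:
        """Name of the instance currently providing the service."""
        return next(n for n, r in self.roles.items() if r is Role.ACTIVE)

    def fail_active(self) -> str:
        """Fault in Active: Passive goes Active. Returns the new Active."""
        failed = self.active()
        standby = next(n for n, r in self.roles.items() if r is Role.PASSIVE)
        self.roles[failed] = None
        self.roles[standby] = Role.ACTIVE
        return standby

    def recover(self) -> None:
        """Bring the faulty instance back according to the configuration."""
        recovered = next(n for n, r in self.roles.items() if r is None)
        if self.revert_after_recovery:
            # Second configuration: the recovered instance becomes Active again.
            self.roles[self.active()] = Role.PASSIVE
            self.roles[recovered] = Role.ACTIVE
        else:
            # First configuration: the recovered instance stays Passive.
            self.roles[recovered] = Role.PASSIVE


if __name__ == "__main__":
    for revert in (False, True):
        program = ReliableProgram(revert_after_recovery=revert)
        program.fail_active()
        program.recover()
        print("revert =", revert, "-> active:", program.active())
```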
Reliable Program Execution
Reliable Program Example:
– Task A needs to be executed resistant to:
Software faults in software P executing task A
Hardware faults in hardware executing software P
– Software P loaded on two individual boards
Active and Passive executing simultaneously
Active and Passive instances seen as one Reliable Program by the system
[Figure: a Reliable Program (RP) made up of an Active Instance and a Passive Instance of software P]
Reliable Program Execution
A look at the inside of a Reliable Program
– To reduce failover times, a program may use persistent data storage to save a “resume” point.
[Figure: processing flow of the Active and the Passive instance. Labels: Input Data, Initial Processing, Processing Step 1, Processing Step 2, Processing Step 3, Resume Point, Skip, Output Data]
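
A minimal Python sketch of the resume-point idea follows. It assumes a plain dictionary standing in for the persistent storage and hypothetical names throughout (`run`, `store`, `fail_before_step`); the real mechanism is not described in the slides.

```python
from typing import Callable, Dict, List, Optional

Step = Callable[[int], int]


def run(steps: List[Step], store: Dict[str, int], input_data: int,
        fail_before_step: Optional[int] = None) -> int:
    """Run `steps` in order, saving a resume point after each completed step.

    `store` stands in for persistent storage shared by both instances.
    `fail_before_step` injects a simulated crash for demonstration purposes.
    """
    start = store.get("next_step", 0)        # skip steps already completed
    value = store.get("value", input_data)
    for index in range(start, len(steps)):
        if index == fail_before_step:
            raise RuntimeError(f"simulated failure before step {index + 1}")
        value = steps[index](value)
        store["next_step"] = index + 1       # save the resume point
        store["value"] = value
    return value


if __name__ == "__main__":
    steps: List[Step] = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    store: Dict[str, int] = {}
    try:
        # The Active instance crashes before the third step.
        run(steps, store, input_data=5, fail_before_step=2)
    except RuntimeError:
        pass
    # The Passive instance takes over and skips the two completed steps.
    print(run(steps, store, input_data=5))
```

Saving the resume point after every step keeps the failover short at the cost of one write to storage per step; a real program would choose its resume points more sparingly.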
Who monitors the monitor?
Fault Tolerant Core
– Two identical GPBs act as the Fault Tolerant Core in the node
– Each executes one supervisor that monitors the reliable programs and the other supervisor
– Failure of a Core GPB or supervisor is handled according to predefined protocols and escalation steps
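
One common way to realise this kind of mutual supervision is a heartbeat with a timeout. The Python sketch below assumes that approach; the slides do not say how the supervisors actually detect failures, and the class, the names and the timeout value are hypothetical. Escalation is left out.

```python
from typing import Dict, List


class Supervisor:
    """Tracks heartbeats from monitored parties (hypothetical model)."""

    def __init__(self, name: str, timeout: float = 3.0) -> None:
        self.name = name
        self.timeout = timeout
        self.last_heartbeat: Dict[str, float] = {}

    def heartbeat_from(self, sender: str, now: float) -> None:
        """Record that `sender` was alive at time `now`."""
        self.last_heartbeat[sender] = now

    def suspects(self, now: float) -> List[str]:
        """Monitored parties whose last heartbeat is older than the timeout."""
        return sorted(
            sender for sender, seen in self.last_heartbeat.items()
            if now - seen > self.timeout
        )


if __name__ == "__main__":
    sup_a = Supervisor("supervisor_a")
    # supervisor_a monitors one reliable program and the other supervisor.
    sup_a.heartbeat_from("supervisor_b", now=0.0)
    sup_a.heartbeat_from("reliable_program_1", now=0.0)
    sup_a.heartbeat_from("reliable_program_1", now=4.0)
    print(sup_a.suspects(now=5.0))   # supervisor_b has been silent too long
```

Time is passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test.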
Signalling System 7
SS7 Background & Usage
– SS7 is a set of standards and protocols for message exchange in telephony networks
– Typically used for tasks such as call setup and control message exchange.
SS7 vs. OSI vs. TCP/IP
– Where are we on the map?
[Figure: nodes A, B and C, with control traffic (i.e. SS7) and voice traffic (i.e. TDM/ATM/IP)]
SS7 Basics
MTP3 Terminology
– Signal Point
Logical node in the network, addressable with a unique address (point code)
– Signal Route Set & Signal Route
A Signal Route Set is a container holding a number of valid routes for traffic to a specific point code
A Signal Route is a specific route within a Signal Route Set with a specific priority
– Signal Link Set & Signal Link
A Signal Link Set is a container holding a number of valid Signal Links connected to an adjacent node
A Signal Link is a specific connection to an adjacent node used to exchange data messages
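
The containment relations between these terms can be written down as a small data model. The Python sketch below is only an illustration of the terminology: the class and field names are hypothetical, the link names are made up, and the rule that a link set is available when at least one of its links is in service is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SignalLink:
    name: str
    in_service: bool = True


@dataclass
class SignalLinkSet:
    adjacent_point_code: int
    links: List[SignalLink] = field(default_factory=list)

    def available(self) -> bool:
        # Assumption: one working link is enough to use the link set.
        return any(link.in_service for link in self.links)


@dataclass
class SignalRoute:
    priority: int
    link_set: SignalLinkSet


@dataclass
class SignalRouteSet:
    destination_point_code: int
    routes: List[SignalRoute] = field(default_factory=list)


@dataclass
class SignalPoint:
    point_code: int
    route_sets: List[SignalRouteSet] = field(default_factory=list)


if __name__ == "__main__":
    # A signal point with point code 100 and one adjacent node, point code 200.
    link_set = SignalLinkSet(200, [SignalLink("link_0"), SignalLink("link_1")])
    route_set = SignalRouteSet(200, [SignalRoute(priority=1, link_set=link_set)])
    point = SignalPoint(100, [route_set])
    print(point.route_sets[0].routes[0].link_set.available())
```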
SS7 Network Example
[Figure: Node A and Node B connected by a cable. The two Signal Points have PC=100 and PC=200. Each node has a Signal Link Set containing Signal Links towards the other node, and a Signal Route Set towards the other Signal Point (Destination=200 or Destination=100) containing one Signal Route with Priority=1.]
Halfway Summary
Fault tolerance in one example architecture
– Available processing units/boards
– Reliable Program Execution
Introduction to SS7
– Terminology
– SS7 network example
Parts To Cover
– Robustness features covered in verification
– Problems encountered due to robustness features
– Verification examples from the lab
Covering Robustness
Examples of covered robustness areas
– Software Failures
Program Restarts
Reliable Program Switch
– Hardware Failures
Board Failures
Subrack Failures
Node Failures
– Bearer Failures
Bearer Failure
Bearer Breakage
– Network Failures
Traffic Rerouting
Congestion Handling
Software Failures
Reliable Program Failures
– What reliable programs exist in the scope?
– What conditions need to be covered?
Idle system
Loaded system
Non-Reliable Program Failures
– What programs on the node exist in the scope?
– What conditions need to be covered?
Idle system
Loaded system
Hardware Failures
Board Failures
– What boards on the node exist in the scope?
What boards are used by reliable programs?
What boards are used by non-reliable programs?
– Conditions to be covered
Failure in idle board
Failure in loaded board
Subrack Failures
– For multi-subrack nodes
– Conditions to be covered
Failure in idle subrack
Failure in loaded subrack
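
The scoping questions above amount to building a test matrix. A minimal Python sketch, assuming three of the board types named earlier as the scope (the selection and the wording of the generated test names are hypothetical):

```python
from itertools import product
from typing import List, Sequence


def board_failure_cases(boards: Sequence[str],
                        program_kinds: Sequence[str],
                        load_states: Sequence[str]) -> List[str]:
    """One test case per combination of board, program kind and load state."""
    return [
        f"Fail {board} board used by {kind} programs while {load}"
        for board, kind, load in product(boards, program_kinds, load_states)
    ]


if __name__ == "__main__":
    cases = board_failure_cases(
        boards=["GPB", "SCB", "ETB"],                # sample of board types
        program_kinds=["reliable", "non-reliable"],  # what runs on the board
        load_states=["idle", "loaded"],              # condition at fault injection
    )
    print(len(cases), "board failure cases")
    for case in cases:
        print(case)
```

Even this small scope gives 3 x 2 x 2 = 12 cases, and every further dimension (subracks, node configurations) multiplies the count again.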
Network and Bearer Failures
[Figure: the SS7 network example extended with Node C. Signal Points: PC=100, PC=200 and PC=300 (Node C). The nodes are connected by Signal Link Sets containing Signal Links. The Signal Route Sets for Destination=100 and Destination=200 contain Signal Routes with Priority=1 and Priority=2.]
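
Traffic rerouting in such a network can be sketched as choosing the best available route from a route set. The Python example below rests on two assumptions that the slides do not state: that a lower priority number means a more preferred route, and that the direct route has Priority=1 while the route via the third node has Priority=2. The `Route` type and `select_route` function are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Route(NamedTuple):
    priority: int     # assumption: lower number = more preferred
    via: str          # where the traffic is sent first
    available: bool   # False when the link set behind the route is down


def select_route(routes: Sequence[Route]) -> Optional[Route]:
    """Return the most preferred available route, or None if all are down."""
    usable = [route for route in routes if route.available]
    return min(usable, key=lambda route: route.priority, default=None)


if __name__ == "__main__":
    # Route set towards destination point code 200.
    direct_up = [Route(1, "direct", True), Route(2, "via PC=300", True)]
    direct_down = [Route(1, "direct", False), Route(2, "via PC=300", True)]
    print(select_route(direct_up).via)     # normal operation
    print(select_route(direct_down).via)   # bearer failure on the direct route
```

A verification case for rerouting then needs to check not only that traffic still arrives, but also which route it actually took.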
Node and Bearer Failures
[Figure: Node A and Node B (Signal Points PC=100 and PC=200) with their Signal Link Sets and Signal Links, together with Node 1, Node 2, Node 3 and Node 4. The Signal Route Sets for Destination=200 and Destination=100 each contain Signal Routes with Priority=1, 2, 3 and 4.]
Problems Encountered…
Quotes from the lab
– “We did send some data from node A to node B but we have no idea of how it got there!”
– “It should not be possible to send the traffic but somehow it is running smoothly”
Configurations…
– What redundancy is relevant to verify?
– What failures are relevant to cover?
Finding and Fixing Faults…
– For each fault:
What hardware was used and where was it located?
What software revisions were used in what software?
How was that software configured to interact with the network?
How was the network configured?
Summary and Conclusions
Robustness and fault tolerance are complex to handle
– It is impossible to cover it all!
– Robustness adds complexity and complexity adds test cases
– Test cases for robustness are complex
Flexibility
– Flexibility adds even more test cases
– Flexibility adds configurations to execute test cases on
The Challenge!
– Lower observability
– Lower controllability
The Benefit!
– There are no boring days at work!
Future Challenges?
The evolution in networks
– Migration to IP in telecommunication networks
– “All IP” solutions
The evolution in hardware
– Increasing demand for processing power
Increasing number of CPUs
Increased speed of each CPU
– Consolidation of different platforms to blades
Common subrack
Shared device boards
The evolution in services
– Migration of services
Circuit to Packet
Data explosion in services
Questions
Questions and Comments
References & Pointers
References:
Ericsson public articles
– http://www.ericsson.com/about/publications/review
SS7 Tutorials
– http://www.pt.com/tutorials/ss7/
SS7 ITU-T Standards
– www.itu.int