torino, january, 15 2019 stefano gollinucci vodafone italy … · 2019-01-17 · opt dpi data...
TRANSCRIPT
C2 General
Torino, January, 15th 2019
Stefano Gollinucci
Vodafone Italy
Transport Network Operation
+39 347 2270066
C2 General
Agenda
Network architecture overview
IP backhauling, IP Core and underlying TX
IP MPLS
IP Core overview
Challenges: traffic growth, latency, resiliency
Network operation evolution
1
2
3
4
5
2
6
C2 General
Network overview
Mobile Fixed
Data Core
Voice Core
SGSN/MME GGSN/GTW
Intelligent Network
OCS OCSOCS
Voice & SMS ServiceFMP
VMS SMSC
Backbone
VPNVO TAS
Application Server
CSDBHLR FEHSS FEEIR FE
BSC RNCBNG
CSCFMGW
MSS/VLR
MGWMGW
SBC
PCRF
VRS
OPT
DPI
Data Service
FW
C2 General
Agenda
Network architecture overview
IP backhauling, IP Core and underlying TX
IP MPLS
IP Core overview
Challenges: traffic growth, latency, resiliency
Network operation evolution
1
2
3
4
5
4
6
C2 General
Transport network: mobile-fixed convergence on IP networks
5
PE
PE
PE
PE
IP Back-hauling AS IP Core AS
Optical DWDM Long DistanceOptical DWDM MAN
PoC3
~500 PoC2
PoC1
Mobile Domain
Fixed Domain
Underlying TX
MicroWave
32 Core sites
~20,000 sites
C2 General
RMX
RM1
GE1
RM4
ATB
VR1
SOR
CTG
LEV
LIB
OBT
CIV
PRT
GAL
TDI
LOI
COR
OLG
MALROG
BSK
ALT
TNT
CIS
BOR
VEA PAL
ANM
SBM
CAS
APR
CAN
PIGRCS
VAL
INV
LAT
CAP
GIO
CIT
LCG
SGV
FIR
RIO
PIN
CRU
SVB
IMP
UMT
SLU
MAM
THI
OSK
TEO
ORO
GAV
MUR
VIL
MI4
MI3
MI5
CO1
VA1
BS1
BG1
BBG TNB
VIB
BL1PN1
TV1
PD1
RO1
FE1
UDB
TRI
AN2
FR1
LT1
RMA
RM3
MOB
MAN
PR1
PCB
CR1
PI1
GR1
TO2
TO1CN1
SV1
IMB
PI2
TOA
FG1
CE1NA1
FI1
AR1
BO2
BO1
VE1
ALBPV1
VCC
LI1
TR1
VT1 PE2
MS1
LU1
PT1
CA2
OR1
SS1
OT1
NU1
BI1
RE1
MN1
RI1
AQ1
TE1
BN1
GO1BOR
CAJ
CB1
SI1
OG1
TER
SPB
BLC
IM1
BGK
SDO
TARPOL
INS
CAR
BR1
TA1
KR1
CDL
GROLTB LTC
TUP
POM
TUP
SALSCG
LAG
CTV
BAR
MOT
GRSSCG
BAR
VLG
SIS
SOV
RCBBRL
LAM
ROS
RCT
TAO
LEB
NA2
MT1PZB
BA1
BA2
CZ2
CS1
VV1
MEI
AV1
SA1
RCA
FGB
NOL
CAG
MZM
GIX
MIX
MIY
BT1
TAT
FIY
RMS
INV
NA1a
NAY
TOY
PDY VEN
BOY
PRN
42
41
33
39
37
37
37
32
32
33
36 36
31
31
30
30
26
26
27
27
24
24
17
17
3
1
1
3
2 2
2
1212
4
4
6
65
7
14
CH1
VSM
PSK
BAY
GEN
21
18
PRS
PIS
MOI
ALS
MES
PAT
SAG
CFL
BAG
SCL
CNU
SGJ
SCC
LCT
PA1
EN1 CT2
CTN
SR1
RG1CL1AG1
TPB
TP1
MAZ
CTY
9
8
10
10
1111
COD
BSKBES
41
SAV
BZB
TS1
14
CZT
MC1
AP1
MGR
MSV
23
FOL FBR
PG1PGT
23
CHR
RAV
PES
RAB RNB
PS1
FO2 RNT
PMZ
BIT
FOG
7
5
MAL
VIZ
AOA
VBA
39MI1
CNSTVS
LOI
MON
FER
ROV
PCZ
NOV
MI14
MI11
MI13
MID
CSX
SPM
17
ME1
8
9
34 34
35 35
INV
PI22
MI24 MI23VCQ
TO21
BDQ
ALQ
GEQ
CGQ
SAQ
RM21
CIQ
ORQ
CAQ
RM24
MIQ
CTQ
SOQ
REQ
BOQ
BO22
BO21
LOQ
FIQ
PEQ
COQ
TDQ
GAQRMQ
FI21
MOQ
TOQ
MI25
MI21
11
2
2
3
3
4
5
5
59
59
5454
51
5152
52
57 57
58 58
5757
45
45
60 60
61
182121
49
49
46
46
48
48
51
5156
56
50
50
54
54
55
56
56
55
55
47 47
MIQ MZQ RZQ
VRQ
CMQ
PD21
FEQ
FOQ
PSQ
CHQ
FMQ
AAQ
PE22
TEQ
FGQ
ADQ
CNQ
BA22
GRQNA22NA21
RIQ
LTQ
VEQ
DUQ
CDQ
MTQ
NAQ
ARX
18
PET
SVC
OLT
4
2
62
62
63
63
RC1
CAJ
INV
POG
SGI
61TRG
ACQ
AFR
42
DWDM (Dense Wavelength Division Mux)
3rd window (1,550 nm)
240 ROADM nodes (Reconfigurable optical add-drop mux)
140 regeneration nodes
10/100/200Gbits optical channels
The underlying TX layer: DWDM
Transponder
Mux
DemuxDCM
Amplifier
EDFA
TX
RX
Amplifier
EDFA
Host
Host
C2 General
Agenda
Network architecture overview
IP backhauling, IP Core and underlying TX
IP MPLS
IP Core overview
Challenges: traffic growth, latency, resiliency
Network operation evolution
1
2
3
4
5
7
6
C2 General
MPLS
8
MPLS (Multi Protocol Label Switching):
Make it possible to segregate traffic flow through the creation of VPNs (Virtual Private
Networks)
Manage traffic routing implementing a FEC (Forwarding Equivalence Class), that is a
group of IP packets which are forwarded in the same manner, over the same path,
and with the same forwarding treatment. While in a plain IP network the FEC is
usually determined at each hop, on an MPLS network the FEC is determined once, at
the ingress of the network.
Routing is based on distribution and swap of labels between routers rather than less
efficient IP routing table lookup
Traffic engineering is supported
Fast re-routing (<50 msec) in case of faults
C2 General
The MPLS Shim Header
IP PacketMPLS Shim Header(s)L2 Header
Label TTLSEXP
4 bytes
20 bits 3 bits 1 bit 8 bits
Bottom of label stack
EXP Bits: QoS
C2 General
MPLS Header
max number of headers in the stack is 3 => MTU must increased by 3 x 4 = 12 bytes
IP PacketL2 HeaderForwarding Label
(LDP)
IP PacketL2 Header
IP PacketL2 HeaderTraffic Eng FRRLabel (RSVP)
Forwarding Label(LDP)
IP PacketL2 HeaderForwarding Label
(LDP)L3 VPN Label
(MP-BGP)
IP PacketL2 HeaderForwarding Label
(LDP)L3 VPN Label
(MP-BGP)Traffic Eng FRRLabel (RSVP)
C2 General
Global RT VRF A VRF B VRF C
PE
VRF A
VRF B
VRF C
VRF C
CE1
CE2CE3
CE4
Other
MPLS PEs support the creation VRFs. Each VRF
constitutes a separate routing and forwarding table,
isolated from the others.
The “regular” routing table is called Global
Routing Table, and routes/packets default to this
table when a VRF is not specified
VRF names have only local significance. Having the
same VRF name among different routers does not a
mean the two VRFs are part of the same VPN.
Each VRF has an associated (unique to the router)
Route Distinguisher (RD). The RD is used by MP-
BGP to avoid confusing routes with overlapping IP
addresses from different VPNs. It’s not used to
decide which route will be part of which VPN (the
Route Target is used instead for this).
Even if it’s common to use the same RD for “similar”
VRFs on different PEs, this is not always the case.
Each physical/logical router interface can be
associated (at most) to one VRF. By doing so that
interface will be bound to the corresponding VRF. If
a VRF is not specified, the interface will be bound to
the Global RT.
One common situation is to have many CEs, each
connected with a physical interface to the PE, with
each interface associated to one of the PE’s VRFs.
RD 100 RD 150 RD 200
VRF – VIRTUAL ROUTING AND FORWARDING
C2 General
PE1
PE2
PE3
PE4
For example, using a RT value to denote a specific VPN,
we can build full-mesh VPNs, completely isolated one
from each other.
In this example we have two full-meshed VPNs, one
associated with RT 100 and the other with RT 200.
MP-iBGP
RT 100
RT 100
RT 100
RT 100
RT 200
RT 200
RT 100
RT 100
RT 200
RT 200
GRT
GRT
GRT
GRT
import
export
export
import
export
import
import
export
import
export
VPN – VIRTUAL PRIVATE NETWORK
C2 General
P
BGP
PECE PE
CE
10.10.10.0/24
Customer
“PINOCOM”
Milano
Customer
“GINONET”
Milano
CE
10.10.10.0/24
VRF PINOCOM
VRF GINONET_MIVRF GINONET_BO
VRF PINOCOM
* PE-CE Protocol
(OSPF, BGP, Static
routes, …)RD 30722:100
RD 30722:200
RD 30722:100
RD 30722:300
70
RT 30722:80
VPN LABEL: 6666
30722:200:10.10.10.0/24
30722:100:10.10.10.0/24RT 30722:70
VPN LABEL: 5555
80
80
5555
Route Distinguisher
4 byte – To manage IP
address overlapping
VPNv4 Address
Family
Route Target – BGP Extended
Community (4 byte) – To
import/export routes in VRFs
VPN Labels are piggybacked on MP-BGP Announcements
Customer
“PINOCOM”
Bologna
Customer
“GINONET”
Bologna
10.10.10.1
to 10.10.10.110.10.10.0/24
MP-
Multi-Protocol BGP
70
✓P P
C2 General
QOS
14
QOS class mapping in MPLS networks done using EXP bits.
Three strict priority classes, for voice services and signaling.
Data traffic is assigned Default, Standard or Enhanced classes, with WRED algorithm ti handle
congestion
Class DSCP EXP bits
Queuing
Algorithm Scheduler
Control Plane CS6, CS7 6,7 - PQ
Voice EF 5 - PQ
Enhanced/Standard
AF31, AF32,
AF41, AF42 1,2,3,4 WRED
PQ, CBWF,
MDRR
Default default 0 WRED
PQ, CBWF,
MDRR
C2 General
WRED
15
K = % discard
0%
20%
Queue length
Starting from A, WRED
starts drop packets
WRED drops the 100 % of
packets at C (Higher
threshold)
A C
K
Before A (Lower
threshold), there is
buffering without Drops
100%
% packet discard
WRED (Weighted Random Early Detection) is a congestion notification method for TCP/IP
queues. It is designed to manage bursty traffic (i.e. 4G)
C2 General
Agenda
Network architecture overview
IP backhauling, IP Core and underlying TX
IP MPLS
IP Core overview
Challenges: traffic growth, latency, resiliency
Network operation evolution
1
2
3
4
5
16
6
C2 General
IP Core – POC1 Core sites
17
IP Core is deployed in 32 Core sites, each of them including:
Redundant POC1 Aggregation routers to collect traffic from IP backhauling POC2
network
BNG (BRAS) servers to manage fixed customers data traffic towards IP Core
2G, 3G controllers to manage mobile traffic, they forward voice traffic towards Voice
Core, and data traffic towards Data Core
Redundant 4G SecureGateways, terminating IPSEC tunneling, and forwarding VOLTE
(Voice over 4G) towards IMS Voice Core and data traffic towards Data Core
Voice Core systems, MSS, MGW, IMS, etc.
Data Core systems, MME, GTW, etc.
Redundant PE routers as MPLS ingress points of IP core
C2 General
IP Core – Core sites overview
PE
PE
POC1 - CORE SITES
POC1
Aggregation router
RNC
(3G)
SecureGW
(4G)
BSC
(2G)
BNG
(BRAS- fixed)
POC2
POC2
MSS/MGW
MME/GTW
BACKHAULING
IP CORE
C2 General
IP Core Network
19
The IP Core consists of:
64 PE routers (Label Switching Edge) located in 32 Core sites
8 P routers (Label Switching Router) in a two-layers fully redundant architecture
8 Route Reflectors to manage iBGP sessions scalability issue
Dedicated PEs towards Internet, with cache servers to increase efficiency with OTTs
traffic and to announce eBGP routes
DPI (Deep Packet Inspection) layer to implement optimisation policies on Internet
traffic
Internet Peering points as well as interconnections with internatonal carrier in Roma,
Milano
Anti-DDOS systems
C2 General
IP Core Network Layout
20
PE
PEPE PE
PE
PE
LSR (LABEL SWITCHING ROUTERS)
PE (Provider Edge)
P
P
P P P
PCE
C&W
DPI (DEEP PACKET INSPECTION)
INTERNET EDGE + CACHE SERVERS
CORE SITES
BRAS
(fixed)
RNC
(3G)
SECGW
(4G)
PEERING (RM,MI)
ANTI-DDOS
MAN INTERNET (RM,MI)
C2 General
IP Core security: a few examples
21
DDOS attack
An ANTI-DDOS system elaborates real time traffic profiles from the MAN
Internet routers. When an anomaly is detected the system commands the
routers to deviate the attacked prefix towards the system itself. The system
implements a deep inspection algorithm to clean it up and and re-inject the
traffic to the routers
BGP hijacking
BGP protocol is used to announce IP prefixes (i.e. up to /16). Whenever an
operator announce a more specific route (i.e. /24) by mistake or aiming at
illegal scope the traffic is deviated. Operators can detect prefix
announcements on routers which are visible on public platforms like looking
glass. The remedy is to announce a more specific route. Carriers usually
implement automatic consistency checks between BGP routes and RIPE
registrations
C2 General
Agenda
Network architecture overview
IP backhauling, IP Core and underlying TX
IP MPLS
IP Core overview
Challenges: traffic growth, latency, resiliency
Network operation evolution
1
2
3
4
5
22
6
C2 General
Challenges: traffic trends and capacity management
23
Data traffic growth is both technology and market driven
Non linear effects play an increasing role in traffic profiles:
Gaming platform new releases
Simutaneous software upgrades downloads
Streaming of special events, football matches, concerts, etc.
Cache servers increasing in number and moved towards the customer to relieve
bandwidth requirements and costs on long distance
Increase of direct peering points
QOS!
SDN (Software Defined Network) offers capabilities to support and enhance capacity
planning processes
C2 General
Latency, latency, latency!
24
Throughput is depending on latency!
IoT applications rely on low latency
Real time applications with very low latency
requirements also to drive the future evolution of 5G
mobile networks
QOS is key to manage buffering and queuing privileging
the low latency applications
Moving intelligence towards peripheral data center to
shorten the physical path and reduce RTTs
C2 General
Resiliency (I)
25
Resiliency policy: active-standby versus load sharing
Active-standby optimizes latency, since primary and secondary path are
likely to have different RTTs being on different physical paths. On the
other end load sharing is in principle more efficient, and makes sure
there is no unused link in the network minimising the risk of ‘silent’
issues
Minimum-link configuration
Due to technical constraints primary/secondary paths may consist in
10/100Gbits link bundles. In case of a failure on a single link the primary
path remains still active but with reduced capacity. The minimum-link
parameter defines the minimum number of links active on the bundle
which has to trigger switching on full capacity secondary path to avoid
congestion
SDN (Software Defined Network) advanced features include
automatic re-configuration and intelligent re-routing
C2 General
Resiliency (II)
26
Critical components and risk analysis
Likelihood of double failures has to be carefully
estimated, taking into account critical
components failure rate and the expected time to
recovery. For example islands connectivity via
submarine cables shows a relatively low failure rate
but a potentially high time to recovery
Disaster recovery
Extraordinary events like heartquakes, flood,
accidental fire may seriously impact service
continuity of telco networks, which play strategic
roles during an emergency situation. Specific
countermeasures on site robustness, let alone
recovery plans have to be designed and
periodically tested to ensure Business Continuity
C2 General
Agenda
Network architecture overview
IP backhauling, IP Core and underlying TX
IP MPLS
IP Core overview
Challenges: traffic growth, latency, resiliency
Network operation evolution
1
2
3
4
5
27
6
C2 General
Monitoring, preventive maintenance, robots (I)
28
Real time monitoring is usually done looking at alarms
propagated by network equipments. End to End service KPIs and
traffic performance are also used though, being more effective
to quickly identify the root-cause
Extensive preventive checks on disturbances, event logs,
misconfigurations do provide useful hints to prevent failures
taking the proper actions proactively
Automation is key to improve both reactivity and prevention,
providing intelligent correlation engines, relieving humans in
repetitive tasks and zeroing the risk of missing critical
information
C2 General
Monitoring, preventive maintenance, robots (II)
29
Growing complexity makes increasingly difficult and risky
manual tasks (i.e. manual correlation on VRF/VPN configuration
data, or massive parameter changes in a large MPLS network )
Traditional Network Inventory systems based on manual
documentation could be hard to maintain and is prone to errors
in matching data between network layers
SDN can provide self discovery inventories, error free and
automatically up to date, matching data between different layers
The level of automation is of paramount importance to
overcome the complexity in trouble-shooting, to minimize errors
in configuration tasks
C2 General
SDN impact on operation: recap
30
SDN solutions:
provide standardised Northbound APIs and third parties
management
support DevOps and programmable interfaces
discovery based inventory
provide optional analytics and Artificial Intelligence
modules for advanced features lik automatic re-routing
or predictive maintenance
C2 General
From Preventive To Predictive
31
Extensive routine checks have been improving quality and reducing outage probability,
Today exploiting big data and applying machine learning algorithms is expected to boost
maintenance processes efficiency and achieve perfect quality.
C2 General
Predictive Models
32
Deep learning/machine learning algorithms (Neural Networks, Random
Forest, Vector Machine, Logistic Regression, etc.) are able to correlate
events, identify patterns and make predictions
Precision, Recall, AUC KPIs used to estimate suitability to provide reliable
predictions
Algorithms need training and fine-tuning.
Training could take advantage on expert hints rather than relying on a
black box approach
Nr. of events correctly predicted TRUE
Nr. Of events predicted TRUE
PRECISION
Nr. of events correctly predicted TRUE
Nr. of events actually occurred TRUE
RECALL
5
15= 33%
5
10= 50%
Prediction
Re
ali
ty
False True
Fa
lse
Tru
e
80 10
5 5
Reality and model coincide Reality and model diverge
Confusion Matrix
C2 General
From predictive to prescriptive
33
Deep learning/Machine learning predictions are usually ‘blind’, they
predict what will happen but cannot tell you why it will happen
Algorithms can learn from humans and provide more insights on root
causes
Predictions should pave the way to proactive action: not an obvious step!
Helping to govern an ever increasing complexity in an effective and
efficient way is the ultimate goal
C2 General
In summary:
34
IP networks are a key component of telco networks, they are growing in
size and complexity, a growth that is going to promise considerable
challenges
SDN, automation and the promising frontier of Artificial Intelligence are
likely to become an indispensable aid for both design and operation
IP technologies do require from engineers high profile, very specialized
skills
Nevertheless the capability to cooperate with ‘clients’ as well as experts
from other areas, as well as the unrelenting thirst to learn new things will
be the key for success