Optimised redundancy for Security Gateway deployments

2 Copyright © 2011 Juniper Networks, Inc. www.juniper.net

RECAP:- JUNIPER LTE SECURITY OFFERING

Customer Priorities
• RAN and UE protection
• Secure business and access to all services from any to any
• Mission critical availability
• Voice over LTE
• SCTP protection
• Scalability
• Core elements protection
• Coordinated protection

Juniper LTE Solution
• IP & GTP & SCTP Firewall
• QoS
• DoS Protection
• IPv6
• IPSec
• High Availability

SRX Security Service Gateways
• 120G FW
• 30G IPS
• 10M Sessions
• 350k SPS
• 21M pps (64B)
• TL 9000 certification
• In Service SW Upgrades
• NEBS III / DC Power
• CC EAL
• Hot Swap I/O Cards
• ICSA


RESILIENCY CONSIDERATIONS FOR LTE/SEGW

[Diagram labels: eUTRAN, Cell sites, Security Gateway, Evolved Packet Core (S-GW, MME), Services/Internet]

Catastrophic Act of Nature/Criminality/Terrorism

Geographic site distribution

Highly available Security Gateway

Clustered mode with IPSec tunnel and S1-U/S1-MME session synchronisation

Redundant everything

Inter-node cluster links, power feeds and PSUs, physical SeGWs

Fast failover for latency-sensitive services like VoLTE

Provide lowest possible failover times, under 0.5s

Maintain signalling

Ensure SeGW does not cause problems with common signalling failover times (800ms)

Node maintenance

Firmware and hardware upgrades with near-zero downtime


ANATOMY OF A REDUNDANT SOLUTION

[Diagram labels: Cell-site, BACKHAUL, Aggregation site 1, Aggregation site 2, Core Site (S-GW, P-GW, MME), SRX5800 Rear]

Geographic distribution
• Requires inter-site L2 connectivity
• No hard distance limitations
• Latency between sites must be less than 100ms

Redundant HA links
• Dual links for control and data plane HA
• Separate physical paths for best redundancy

2+2 Redundant power
• Dual power supplies on dual zones per site
• Resilient against loss of 1 entire feed or 2 PSUs

High Availability
• Synchronisation of IPSec SAs for rapid failover
• Failover time commonly ~1s

L3 redundancy
• BFD used to provide link failover at L3 in ~300ms
• Mitigates loss of adjacent routers or links

Active/Active VPN
• Split SCTP signalling for dual-homed nodes
• SCTP handles subsecond signalling failover


GEOGRAPHIC CLUSTER DISTRIBUTION

[Diagram labels: Site A, Site B, L2 Infrastructure, Cluster Jurisdiction, HA Links]

Mitigate catastrophic event by distributing SeGW cluster members between physical sites with L2 connectivity (required)

No hard maximum distance

Latency between sites should be less than 100ms

HA connections can be directly cabled or over a switched infrastructure

Appnote enclosed explains design guidelines


MULTIPLE HA LINKS

Dual links can be used for control and forwarding plane (Fabric) HA

Maximum availability of cluster links across distributed sites

Requires additional Routing Engine (RE) per node for dual control links

2 I/O ports per node required for dual Fabric links (1Gbps or 10Gbps)

Should be cabled over separate physical paths/infrastructures for greatest resilience

[Diagram labels: SRX Node 0 (control plane, dataplane), SRX Node 1 (control plane, dataplane), separate physical paths between sites]


REDUNDANT POWER OPTIONS

Fully redundant, 2+2 power (DC or high-capacity AC) available

Dual zones on SRX (as above)

Dual power feeds in aggregation site should be distributed across zones

E.g. Feed 1 goes to PEM 0 and PEM 1, Feed 2 to PEM 2 and PEM 3

SRX can continue to function fully through loss of

Entire single power feed

Up to 2 PSUs, provided they are in different zones

[Diagram labels: Power feed 1, Power feed 2]
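
The zone rule on this slide can be checked with a short sketch. The PEM-to-zone mapping below (PEM 0 and PEM 2 in one zone, PEM 1 and PEM 3 in the other) is an assumption inferred from the feed example above, and the function names are hypothetical; this is an illustration, not a vendor tool.

```python
# Assumed mapping, inferred from the slide's feed example (not a vendor API):
# each feed must reach both zones, so PEM 0/2 share a zone and PEM 1/3 share a zone.
ZONE_OF_PEM = {0: 0, 1: 1, 2: 0, 3: 1}
FEED_OF_PEM = {0: 1, 1: 1, 2: 2, 3: 2}


def pems_on_feed(feed):
    """Return the set of PEMs cabled to the given power feed."""
    return {pem for pem, f in FEED_OF_PEM.items() if f == feed}


def chassis_fully_powered(failed_pems):
    """True if every zone still has at least one working PEM."""
    working = set(ZONE_OF_PEM) - set(failed_pems)
    powered_zones = {ZONE_OF_PEM[pem] for pem in working}
    return powered_zones == set(ZONE_OF_PEM.values())


print(chassis_fully_powered(pems_on_feed(1)))  # loss of an entire feed
print(chassis_fully_powered({0, 1}))           # two PSUs in different zones
print(chassis_fully_powered({0, 2}))           # two PSUs in the same zone
```

Under these assumptions the first two cases keep both zones powered and the third does not, which matches the two survivable failures listed on the slide.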

8 Copyright © 2009 Juniper Networks, Inc. www.juniper.net

HIGH AVAILABILITY:- CORE FUNCTIONALITY

JUNOS HA provides a number of core resilience functions on SeGW

Synchronisation of IPSec SAs

No tunnel re-establishment = minimal downtime for SeGW failover

Synchronisation of underlying clear-text sessions – SCTP and GTP

Allows for stateful security and HA for SCTP signalling

ISSU (In-Service Software Upgrades)*

Upgrade JUNOS with minimal downtime (potentially subsecond)

SPC capacity upgrade

Scale performance with minimal downtime (potentially subsecond)

[Diagram labels: IPSec tunnels; IPSec SA and session sync]

*IPSec support for ISSU coming 2H2012


OPTIMISED L3 FAILOVER

[Diagram labels: RAN, EPC, Site A, Site B, OSPF/BFD adjacency, L3 forwarding interface (Reth)]

Use 2 x L3 links up and down stream for optimised failover

BFD (+DRP) runs between SRX and adjacent aggregation/PE routers

Loss of aggregation/PE router or a link causes L3 route failover

HA failover occurs only if both L3 interfaces (up or down stream) on a node are down

Failover with BFD occurs with an absolute downtime of ~350ms

Ideal for high priority traffic requirements, eg VoLTE
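
The distinction between an L3 route failover and a full HA failover can be sketched as a small decision function. The class and function names are hypothetical, and the model is a simplification of the rule stated on this slide rather than a description of how JUNOS implements it.

```python
from dataclasses import dataclass


@dataclass
class NodeLinks:
    """Link state on one SRX node; True means the L3 link is up."""
    upstream: tuple    # (link_a_up, link_b_up) towards the EPC
    downstream: tuple  # (link_a_up, link_b_up) towards the RAN


def failover_action(links):
    """Classify the reaction to the current link state on one node."""
    sides = (links.upstream, links.downstream)
    if any(not any(side) for side in sides):
        # Both links in one direction are down: the node cannot forward.
        return "ha-failover"
    if any(not all(side) for side in sides):
        # One link lost: BFD plus the routing protocol move traffic
        # to the surviving link, no cluster failover is needed.
        return "l3-reroute"
    return "none"


print(failover_action(NodeLinks((True, False), (True, True))))
print(failover_action(NodeLinks((False, False), (True, True))))
```

The point of the design is that the faster L3 reroute absorbs single link or router failures, so the slower cluster failover is reserved for the case where a node loses a whole direction.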


OPTIMISED L3 FAILOVER – IPSEC TERMINATION

L3 ingress IP changes as interface fails over

Needs an agnostic logical interface for IPSec termination

‘Loopback Reth’

A physical interface is kept up with a local loop cable

Used as the outgoing interface for IKE negotiation – but no traffic traverses the looped cable

Can be 1Gbps or 10Gbps – no forwarding needed

Can be migrated to logical loopback from JUNOS 12.3 (loopback currently not supported for IPSec termination in cluster mode)

[Diagram labels: SRX, IKE/IPSec termination point, Loopback cable, L3 interfaces, Possible IPSec tunnel paths, Aggregation router (site A), Aggregation router (site B). NB Logical view only, SRX cluster not shown]


SIGNALLING OPTIMISATION

[Diagram labels: eNB, MME, Association setup (INIT exchange) + primary SCTP path, Secondary SCTP path]

The problem:-

• SCTP signalling applications typically fail over in 800ms or less
• For dual-homed signalling, primary AND secondary paths could both fail in 1.6s
• Under certain conditions, SeGW HA failover takes > 1.6s
• HA failover could lead to complete loss of signalling

The solution:-

• Split the primary and secondary SCTP sessions, both from a RAN path perspective and also an SeGW termination point perspective
• Use Active/Active HA and divide the homing across cluster members
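
The timing argument above can be written out as a minimal sketch. The 800ms figure is the one quoted on this slide; the function name is hypothetical, and the model assumes that a path terminated on the other cluster member is unaffected by the failing node.

```python
SCTP_PATH_FAILOVER_MS = 800  # per-path failover figure quoted on the slide


def signalling_survives(ha_failover_ms, paths_split_across_nodes):
    """Return True if at least one SCTP path outlives an HA failover."""
    if paths_split_across_nodes:
        # Assumption: the secondary path terminates on the other cluster
        # member, so the failing node does not interrupt it.
        return True
    # Both paths cross the failing node: the primary times out first,
    # then the secondary, giving 2 x 800ms = 1.6s before total loss.
    worst_case_ms = 2 * SCTP_PATH_FAILOVER_MS
    return ha_failover_ms <= worst_case_ms


print(signalling_survives(2000, paths_split_across_nodes=False))
print(signalling_survives(2000, paths_split_across_nodes=True))
```

With both paths on one node, any HA failover longer than 1.6s loses the association; splitting the homing removes the dependency on the HA failover time altogether.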


SIGNALLING RESILIENCE WITH ACTIVE/ACTIVE HA

[Diagram labels: MS, eNB, RAN, VPN A, VPN B, S-GW, MME, User plane, Primary SCTP, Secondary SCTP]

SCTP dual-homed association split down dual IPSec tunnels

In case of loss of primary path or primary SeGW, signalling fails over to the secondary VPN

Secondary VPN always up

Signalling timers (~800ms) are catered for

User plane is not rerouted to secondary VPN

Assumes failover time (1-3s) is acceptable for user plane


SIGNALLING RESILIENCE WITH ACTIVE/ACTIVE HA – FAILOVER WALKTHROUGH

[Diagram labels: MS, eNB, RAN, VPN A, VPN B, S-GW, MME, User plane, Primary SCTP, Secondary SCTP; markers 1 to 5 on the diagram match the steps below]

1 Normal operating conditions – User plane and primary SCTP through RG1, secondary SCTP through RG2

2 RG1 failure (eg SRX loses power). User plane forwarding and primary SCTP path lost

3 RG1 begins to fail over; SCTP detects path down and uses secondary path

4 Failover completes, RG1 and RG2 active on same node. User plane traffic resumes

5 Primary signalling path recovered through SCTP heartbeats. HA preemption can be optionally configured to fail back
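
The five steps can be replayed as a toy state model. The node names (`node0`, `node1`) and the tuple layout are hypothetical; the sketch only tracks which node owns each redundancy group and which SCTP path is in use, and does not model timers.

```python
def replay_failover():
    """Replay the walkthrough.

    Returns a list of (step, sctp_path_in_use, signalling_up, user_plane_up).
    """
    rg_owner = {"RG1": "node0", "RG2": "node1"}   # hypothetical node names
    node_up = {"node0": True, "node1": True}
    path_rg = {"primary": "RG1", "secondary": "RG2"}
    sctp_path = "primary"
    timeline = []

    def rg_active(rg):
        return node_up[rg_owner[rg]]

    def record(step):
        # The user plane follows RG1; signalling follows the path in use.
        timeline.append(
            (step, sctp_path, rg_active(path_rg[sctp_path]), rg_active("RG1"))
        )

    record(1)                      # 1: normal operating conditions
    node_up["node0"] = False
    record(2)                      # 2: RG1's node fails
    sctp_path = "secondary"
    record(3)                      # 3: SCTP switches to the secondary path
    rg_owner["RG1"] = "node1"
    record(4)                      # 4: RG1 now active on the surviving node
    sctp_path = "primary"
    record(5)                      # 5: heartbeats restore the primary path
    return timeline


for entry in replay_failover():
    print(entry)
```

In this model signalling is down only at step 2, while the user plane stays down until step 4, which is the trade-off the slide describes.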


A/A ADDITIONAL BENEFIT:- SCTP ALG

[Diagram labels: MME, RAN, Primary SCTP, Secondary SCTP, IP A, IP B, IP C, IP D, Init exchange, SCTP Association (SIP=A,B; DIP=C,D)]

SCTP Association is synchronised across cluster

Possible sessions for a given association are clearly defined by src/dst IP addresses in the INIT exchange

Turning on SCTP ALG allows SCTP to be handled statefully

Prevents any potential attacks listed in RFC5062, eg hijacking, bombing
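
How the INIT exchange bounds the set of legitimate sessions can be illustrated with a toy model. The function names are hypothetical and this is not how the SRX SCTP ALG is implemented; it only shows that the addresses A, B and C, D from the slide define a closed set of paths.

```python
from itertools import product


def allowed_paths(init_src_ips, init_dst_ips):
    """All (src, dst) pairs a multi-homed association may legitimately use."""
    forward = set(product(init_src_ips, init_dst_ips))
    reverse = {(dst, src) for src, dst in forward}
    return forward | reverse


def permit(packet_src, packet_dst, paths):
    """Stateful check: only paths learned from the INIT exchange pass."""
    return (packet_src, packet_dst) in paths


paths = allowed_paths(["A", "B"], ["C", "D"])
print(len(paths))                # four forward pairs plus four reverse pairs
print(permit("B", "C", paths))   # secondary source to primary destination
print(permit("E", "C", paths))   # address that never appeared in the INIT
```

A packet claiming to belong to the association from an address outside this set is rejected, which is the property that blocks the address-based attacks the slide refers to.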


USER PLANE FAILOVER WITH DUAL TUNNEL

User plane failover requires a mechanism to detect that the tunnel is down (or not passing traffic due to a problem in the path)

This could be DPD
• Tends to have long timers which do not facilitate rapid failover
• 30s+ common for DPD to detect tunnel down
• Checks tunnel liveness only via IKE (does not extend to forwarding plane checking)

Could also be a DRP
• Not necessarily supported on eNBs


FUTURE FOR TUNNEL FAILOVER – BFDoIPSEC?

BFD could offer a solution

Could be run in conjunction with static routes

Granular timing options for BFD keepalives

50ms is typical minimum

Can give high speed failover between tunnels including user plane

Currently supported over IPSec on SRX

Not supported on all (any?) base stations today, but planned*

*caveat:- Juniper is not a basestation vendor, this is what we have heard!
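
A rough comparison of detection times shows why BFD is attractive here. The sketch uses the common approximation that a peer is declared down after a fixed number of consecutive missed probes; the multiplier of 3 and the 10s DPD interval are assumed example values, not figures from this deck, and real DPD behaviour varies by implementation.

```python
def detection_time_ms(interval_ms, missed_before_down):
    """Approximate time to declare a peer down: interval x missed probes."""
    return interval_ms * missed_before_down


# 50ms is the typical minimum BFD interval quoted above; multiplier 3 is assumed.
bfd_ms = detection_time_ms(50, 3)
# Assumed DPD settings chosen to land on the 30s figure quoted for DPD.
dpd_ms = detection_time_ms(10_000, 3)

print(bfd_ms)            # milliseconds for BFD
print(dpd_ms)            # milliseconds for DPD
print(dpd_ms // bfd_ms)  # how many times slower DPD is in this example
```

With these example values BFD detects a dead tunnel in 150ms against 30s for DPD, which is what makes sub-second user plane failover between tunnels plausible.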


GEOGRAPHIC MIGRATION OF SEGW

SeGW deployments tending towards a large scale centralised deployment

A more distributed architecture has advantages
• More efficient X2 transport
• Minimal impact of SeGW node failure
• Lower performance requirements per node

Loopback termination of IPSec VPNs could offer a simple migration path in conjunction with A/A

Dual tunnels could exist on different clusters during migration

[Diagram labels: MME, S-GW. One VPN migrated; traffic failed over; 2nd VPN migrated]


SEGW:- REDUNDANCY SUMMARY MATRIX

Requirement           | Solution component                    | Notes
Redundant power       | 2+2 PSUs                              | Dual feeds per site required
Redundant HA links    | Dual control/Dual data plane HA links | Link pairs should traverse disparate paths
High Availability     | SRX cluster                           | Provides IPSec SA and session synchronisation
Fast failover at L3   | Dual L3 links with BFD                | Mitigates loss of adjacent routers or links
Signalling failover   | Active/Active Dual tunnel             | Design may not be supported by all radio vendors
Geographic redundancy | Dispersed cluster                     | L2 needed between sites


PROVISIONING – AUTO CONFIGURATION PROTOCOL WORKFLOW

[Diagram labels: eNodeB, SGW, RELAY, DHCP (can be coresident on SRX), PKI – BE, PKI – FE 1, PKI – FE 2, Conf Server, Initial Tunnel, Permanent Tunnel]

REBOOT

Create Temporary IPSec Tunnel

DHCP: eNB-IP@ & operator specific CA-IP@ / SEG-IP@ / NetAct-IP@

Authenticate to Operator’s CA with eNB vendor Certificate & key signing request

Create, sign & download operator’s eNB Certificate

Create Permanent IPSec Tunnel


JUNIPER SRX AS SEGW:- INVESTMENT PROTECTION AND FUTURE SCALE

Hardware Refresh:- Key points

Next-generation SPC

Scale Performance
• Up to 8x jump in scale
• Headroom for future growth
• 2x-3x boost in performance

Non-stop services
• Redundant components
• Stateful HA
• In-service SW upgrade
• In-service HW upgrade

Investment Protection
• Backward compatible - Low upgrade cost
• Operational Simplicity – No change to security config
