mobicents summit 2012 - vladimir ralev - mobicents load balancer and high availability
HA and SIP Load Balancing
Design documentation
Revised: Sep, 2012
Important!
● HA here doesn't imply replication.
● HA quality units
  ○ one nine, two nines, three nines and so on
  ○ can claim them without any replication
    ■ it's not even cheating
● It is sufficient to partition your calls onto a set of machines
  ○ The SIP protocol doesn't mind address changes
  ○ Can go into production against any production-ready phone
Deployment Scenarios Overview
● Pure IP load-balancing
  ○ No SIP-based affinity, only IP-based affinity
  ○ Might violate some SIP rules
  ○ Not recommended
● Standalone SIP-based load-balancing
  ○ Provides SIP-based affinity
  ○ Provides SIP protocol compliance
● Distributed load balancing
  ○ IP load balancer in front
  ○ Multiple SIP load balancers at the back end
  ○ Eliminates the Single Point of Failure problem
  ○ Scales better when SIP LB capacity is exceeded
● Cooperative load balancing with HTTP (integrated, mod_jk and mod_cluster)
NOT ENTIRELY BAD
ALMOST USELESS
Deployment Scenarios Overview
● New this year!
● DNS load balancing
  ○ No SIP message affinity
  ○ Affinity is temporary per UA by TTL
  ○ No built-in heartbeats with the SIP servers - must do it on your own, with a module or by the UA
  ○ You better try to keep all your IPs up
    ■ Use IP takeover for fast recovery
Two kinds of DNS load balancing
● Dynamic record DNS
  ○ Round-Robin DNS
● DNS SRV
  ○ Built-in load balancing with statistical weights
  ○ Requires support from the SIP phones (very common)
_service._proto.name TTL class SRV priority weight port target
For instance:
_sip._tcp.example.com. 86400 IN SRV 0 5 5060 sipserver.example.com.
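The "statistical weights" of DNS SRV can be illustrated with a small sketch of what a phone might do after resolving the records: keep only the records with the lowest priority value, then pick one with probability proportional to its weight. The classes SrvRecord and SrvSelector are hypothetical names for illustration, the host names in the usage note are made up, and the zero-weight handling is simplified compared to RFC 2782.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical holder for the fields of one SRV record (not a real DNS API). */
final class SrvRecord {
    final int priority;
    final int weight;
    final int port;
    final String target;

    SrvRecord(int priority, int weight, int port, String target) {
        this.priority = priority;
        this.weight = weight;
        this.port = port;
        this.target = target;
    }
}

final class SrvSelector {
    /**
     * Picks one record among those with the lowest priority value,
     * with probability proportional to its weight.
     */
    static SrvRecord pick(List<SrvRecord> records, Random random) {
        if (records.isEmpty()) {
            throw new IllegalArgumentException("no SRV records");
        }
        int best = Integer.MAX_VALUE;
        for (SrvRecord r : records) {
            best = Math.min(best, r.priority);
        }
        List<SrvRecord> candidates = new ArrayList<>();
        int totalWeight = 0;
        for (SrvRecord r : records) {
            if (r.priority == best) {
                candidates.add(r);
                totalWeight += r.weight;
            }
        }
        // Simplification: if every weight is zero, choose uniformly.
        if (totalWeight == 0) {
            return candidates.get(random.nextInt(candidates.size()));
        }
        int ticket = random.nextInt(totalWeight); // 0 .. totalWeight - 1
        for (SrvRecord r : candidates) {
            ticket -= r.weight;
            if (ticket < 0) {
                return r;
            }
        }
        throw new IllegalStateException("unreachable: weights sum to totalWeight");
    }
}
```

With two records of priority 0 and weight 5 each, roughly half of the picks land on each target; a record with priority 1 is only a backup and is never picked while a priority 0 record is present in the list.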
Deployment Scenarios Overview
● New this year!
● Dedicated IP load balancing by SIP headers
  ○ Equivalent to DNS SRV without the need of phone support
Pure IP load balancing
● Low priority
● Not very useful - just to clarify theoretically
  ○ Routes based on IP/UDP/TCP (Layer 3/4) fields - IP address, source or destination port, etc.
  ○ Cannot make routing decisions based on SIP messages (BigIP F5 and similar products are exceptions)
  ○ SIP AS-initiated requests and responses are hard to route
  ○ The IP LB is not a SIP entity.
Pure IP load balancing (cont'd)
Retransmissions
● UDP
  ○ Retransmissions will get sprayed randomly
  ○ Forks and race conditions
● TCP
  ○ Works fine
    ■ Unless the TCP connection fails and the new connection ends up on a new node
Via headers in IP load balancing
● Via headers should contain the original node address, not the address of the IP load balancer. Otherwise the SIP phones will follow the SIP spec and route responses through the IP load balancer, creating additional traffic.
● Via headers are per-transaction. Mid-transaction fail-over is not supported although we are able to recover from it with retransmissions.
● Before JBCP 1.2.6 and MSS 1.3 the Via headers carried the IP load balancer address.
● Via headers should carry the IP load balancer address only if the load balancer is capable of Via branch affinity
● The SIP LB is a stateless SIP proxy
● Responses and subsequent requests will follow the same path - UDP follows Vias and TCP/TLS follows the established connections as required per SIP spec.
● SIP LB can make routing decisions based on SIP headers or content. It parses the SIP messages.
● The Standalone SIP LB is a Single Point of Failure
Standalone Mobicents SIP Load Balancer
Note: In case of SIP AS or SIP LB failure in step 4 the response will be lost.
Standalone Load Balancer (proxy)
Distributed Load Balancer
● IP Load Balancer in front of the SIP LB
● The SIP LB will advertise the IP LB address instead of its own
● Support for multiple SIP LBs. The IP LB will distribute the load among the SIP LBs
● Support for bidirectional load-balancing
Distributed Load Balancer
● The SIP LBs may maintain shared state when it is needed for certain load-balancing algorithms
● Certain algorithms don't need shared state (like consistent hash)
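Why consistent hashing needs no shared state can be seen in a minimal sketch: every SIP LB that knows the same node list computes the same node for a given Call-ID on its own. This is an illustrative ring, not the balancer's actual implementation; the class name ConsistentHashRing, the MD5-based hash and the 100 virtual points per node are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.TreeMap;

/** Illustrative consistent-hash ring (hypothetical, not the Mobicents class). */
final class ConsistentHashRing {
    private static final int VIRTUAL_POINTS = 100; // assumed value

    private final TreeMap<Long, String> ring = new TreeMap<>();

    ConsistentHashRing(Collection<String> nodes) {
        for (String node : nodes) {
            addNode(node);
        }
    }

    void addNode(String node) {
        // Several points per node spread the load more evenly on the ring.
        for (int i = 0; i < VIRTUAL_POINTS; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        ring.values().removeIf(node::equals);
    }

    /** Returns the node owning the first ring point at or after the key's hash. */
    String nodeFor(String callId) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes");
        }
        Long point = ring.ceilingKey(hash(callId));
        return ring.get(point != null ? point : ring.firstKey());
    }

    /** First 8 bytes of the MD5 digest, packed into a long. */
    static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF);
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Two rings built independently from the same node list agree on every Call-ID, and removing a node only moves the calls that were on that node; calls on the surviving nodes stay where they were.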
Bidirectional Distributed SIP LB
● The SIP Application Servers not only receive client requests, but can initiate SIP requests and transactions on their own
Solving the case for both directions in the Distributed LB
When requests come from SIP phones (clients) it is clear we should use an IP load balancer in front of the external ports of the SIP LBs. However, when the Application Servers are initiating requests there are two options:
1. The Application Server is always aware which SIP LBs are alive, so if one dies the AS will pick another one on its own. No IP load balancer is needed. This method works at the resolution of the heartbeats.
2. Put an IP load balancer between the Application Servers and the internal ports of the SIP load balancers as shown in the next slide.
Distributed Load Balancer (2 IP LBs) BIDIRECTIONAL
Bidirectional Distributed SIP LB Deployment Scenarios
● The SIP LB can be configured with separate ports for inbound and outbound messages (simply specify the internalPort property in the SIP LB configuration file)
● Two IP LBs - use separate IP LBs for requests that come from clients and requests that come from servers
● One IP LB - use the same IP LB for both types of requests. The problem with this one is that direction analysis must be done using the Via header. If the SIP AS is a non-Record-Routing proxy application then non-initial requests initiated by the callee will bypass the SIP AS and there will be no SIP AS Via header to give a hint that the request comes from the callee and should not go to the SIP AS.
Distributed Load Balancer (2 IP LBs) BIDIRECTIONAL
Distributed Load Balancer (or 1 IP LB) BIDIRECTIONAL
Converged and Cooperative Load Balancing
SIP, HTTP and other protocols
Integrated HTTP forwarding
● The Mobicents Load Balancer supports HTTP forwarding
● SIP and HTTP can use a common consistent hash affinity key to group and fail over SIP and HTTP sessions together
● Example
  ○ SIP URI sip:[email protected]
  ○ HTTP URL http://host.com/page?appsession=app1
  ○ app1 is the SIP user in the URI and the value of the appsession parameter
  ○ app1 will be hashed against the AS nodes for both and will cause the SIP and the HTTP request to always stick to the same node
● When there is no key, the balancer behaves like mod_jk and analyses the jvmRoute component to select the node
● Alternatively, you can use mod_jk and mod_cluster
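The common affinity key idea can be sketched as two small extractors, one for the SIP URI user and one for an HTTP GET parameter, feeding the same node choice. AffinityKeys and its methods are hypothetical names, the host name in the usage note is made up, and pickNode uses a plain modulo hash only as a stand-in for the balancer's consistent hash.

```java
import java.util.List;

/** Illustrative affinity-key helpers (hypothetical, not the Mobicents API). */
final class AffinityKeys {

    /** Returns the user part of a sip: or sips: URI, or null if there is none. */
    static String sipUser(String sipUri) {
        int colon = sipUri.indexOf(':');
        int at = sipUri.indexOf('@');
        if (colon < 0 || at < colon) {
            return null;
        }
        String user = sipUri.substring(colon + 1, at);
        // Drop an optional password of the form "user:password".
        int pw = user.indexOf(':');
        return pw >= 0 ? user.substring(0, pw) : user;
    }

    /** Returns the raw value of a GET parameter, or null if it is absent. */
    static String httpParam(String url, String name) {
        int q = url.indexOf('?');
        if (q < 0) {
            return null;
        }
        String query = url.substring(q + 1);
        int fragment = query.indexOf('#');
        if (fragment >= 0) {
            query = query.substring(0, fragment);
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(name)) {
                return pair.substring(eq + 1);
            }
        }
        return null;
    }

    /** Stand-in node choice: the same key always yields the same node. */
    static String pickNode(String key, List<String> nodes) {
        return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
    }
}
```

For a SIP URI such as sip:app1@host.example and the URL http://host.example/page?appsession=app1, both extractors return app1, so pickNode sends the SIP request and the HTTP request to the same node.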
Distributed Load Balancer (3 IP LBs) BIDIRECTIONAL
HTTP and SIP consistent hashing
Session affinity with mod_jk
mod_jk support
mod_jk and mod_cluster can be manipulated by changing the jsessionid cookie to reroute requests to a node of choice!
Additionally, mod_cluster can be controlled by the MCMP protocol
mod_jk hints
Rolling Upgrades from the LB
● Each node is bootstrapped with a version system property
  ○ Each node is started with -Dversion=1
● The version is advertised in the SIP LB heartbeat
● The SIP LB has awareness of the groups with particular version and can detect conditions that jeopardize the operations
  ○ More than two versions
  ○ Node count dangerously low
  ○ Stalled upgrade with idle nodes
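Two of the conditions above can be checked from nothing more than the versions advertised in the heartbeats. The sketch below assumes a map from node address to advertised version; UpgradeMonitor and the minNodes threshold are hypothetical, and the stalled-upgrade check is left out because it would need traffic and timing data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative rolling-upgrade checks (hypothetical, not the Mobicents code). */
final class UpgradeMonitor {

    /**
     * @param nodeVersions node address -> version advertised in its heartbeat
     * @param minNodes     assumed lower bound on a safe cluster size
     * @return human-readable warnings, empty when nothing looks wrong
     */
    static List<String> warnings(Map<String, String> nodeVersions, int minNodes) {
        // Count how many nodes advertise each version.
        Map<String, Integer> countByVersion = new TreeMap<>();
        for (String version : nodeVersions.values()) {
            countByVersion.merge(version, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        if (countByVersion.size() > 2) {
            result.add("more than two versions: " + countByVersion.keySet());
        }
        if (nodeVersions.size() < minNodes) {
            result.add("node count dangerously low: " + nodeVersions.size());
        }
        return result;
    }
}
```

During a normal upgrade from version 1 to version 2 the map holds at most two distinct versions, so the first warning only fires when a third version shows up before the previous upgrade has finished.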
Cluster groups
Divide the cluster into subgroups that failover only internally
subclusterMap=( 192.168.1.1, 192.168.1.2 ) ( 10.10.10.10, 20.20.20.20, 30.30.30.30)
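How a subclusterMap value of this shape could be interpreted is sketched below: each parenthesised group becomes a list of addresses, and a failed node may only be replaced by members of its own group. The SubclusterMap class is hypothetical and only mirrors the syntax shown on the slide, not the balancer's real parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for a subclusterMap value (hypothetical class). */
final class SubclusterMap {
    private static final Pattern GROUP = Pattern.compile("\\(([^)]*)\\)");

    private final List<List<String>> groups = new ArrayList<>();

    SubclusterMap(String value) {
        Matcher m = GROUP.matcher(value);
        while (m.find()) {
            List<String> group = new ArrayList<>();
            for (String address : m.group(1).split(",")) {
                String trimmed = address.trim();
                if (!trimmed.isEmpty()) {
                    group.add(trimmed);
                }
            }
            groups.add(group);
        }
    }

    List<List<String>> groups() {
        return groups;
    }

    /** Other members of the failed node's group; empty if it is in no group. */
    List<String> failoverCandidates(String failedNode) {
        for (List<String> group : groups) {
            if (group.contains(failedNode)) {
                List<String> others = new ArrayList<>(group);
                others.remove(failedNode);
                return others;
            }
        }
        return new ArrayList<>();
    }
}
```

With the value from the slide, a failure of 192.168.1.1 leaves 192.168.1.2 as the only candidate, and none of the nodes in the second group is ever considered.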
SIP LB Internal Architecture
What is needed for the SIP LB to support the deployment scenarios?
Quick SIP LB functional spec
● Dumb SIP parsing - as dumb as possible with JSIP stateless.
● Pluggable routing decision algorithms. Pass the message to the algorithm and it will return the node where to send the message.
● Shared store - use JBoss Cache 3.2.1
● No need to translate between TCP and UDP. The SIP AS will be able to handle both anyway.
● Support separate SIP ports for inbound and outbound messages (add an internalPort property in the config file)
● Support for single SIP port (delete the internalPort property)
● RMI and JGroups heartbeats (right now RMI)
● Protocol to allow AS or other entity to give instructions to the SIP LB (like the mod_cluster protocol)
Example Algorithms
Sample pluggable algorithms for the SIP load balancer
#1 - Call-ID affinity workflow
#2 - Consistent-hash on Call-ID
#3 - Persistent Consistent Hash
Balancer Algorithm Interface
public interface BalancerAlgorithm {
    SIPNode processExternalRequest(Request request);
    SIPNode processHttpRequest(HttpRequest request);
    void processInternalRequest(Request request);
    void processExternalResponse(Response response);
    void processInternalResponse(Response response);
    void nodeRemoved(SIPNode node);
    void nodeAdded(SIPNode node);
    Properties getProperties();
    void setProperties(Properties properties);
    BalancerContext getBalancerContext();
    void jvmRouteSwitchover(String fromJvmRoute, String toJvmRoute);
    void init();
    void stop();
    void assignToNode(String id, SIPNode node);
}
● Click here to see the full interface with documentation e/browse/trunk/tool
● Click here to see one example algorithm Call-ID affinity with association map
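The idea of Call-ID affinity with an association map can be sketched without the balancer's own types: the first request of a call is assigned a node, later requests with the same Call-ID follow it, and stale associations are evicted so that dead calls do not leak memory. CallIdAffinity is a hypothetical class, node names are plain strings, and round-robin assignment of new calls is an assumption.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative Call-ID association map (hypothetical, not the Mobicents class). */
final class CallIdAffinity {
    private static final class Association {
        final String node;
        long lastUsedMillis;

        Association(String node, long lastUsedMillis) {
            this.node = node;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    private final Map<String, Association> byCallId = new HashMap<>();
    private final long maxAgeMillis;
    private int cursor; // round-robin position for new calls

    CallIdAffinity(long maxTimeInCacheSeconds) {
        this.maxAgeMillis = maxTimeInCacheSeconds * 1000L;
    }

    /** Returns the node for this Call-ID, assigning one if needed. */
    synchronized String route(String callId, List<String> aliveNodes, long nowMillis) {
        if (aliveNodes.isEmpty()) {
            throw new IllegalStateException("no alive nodes");
        }
        Association a = byCallId.get(callId);
        // New call, or its node has died: pick the next alive node.
        if (a == null || !aliveNodes.contains(a.node)) {
            String node = aliveNodes.get(Math.floorMod(cursor++, aliveNodes.size()));
            a = new Association(node, nowMillis);
            byCallId.put(callId, a);
        }
        a.lastUsedMillis = nowMillis;
        return a.node;
    }

    /** Removes associations idle for longer than the configured maximum. */
    synchronized int evictExpired(long nowMillis) {
        int evicted = 0;
        Iterator<Association> it = byCallId.values().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().lastUsedMillis > maxAgeMillis) {
                it.remove();
                evicted++;
            }
        }
        return evicted;
    }

    synchronized int size() {
        return byCallId.size();
    }
}
```

Unlike consistent hashing, this map is state: a second SIP LB would not know the associations unless they are shared or replicated.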
The SIP LB configuration file
# The binding address of the load balancer
host=127.0.0.1

# The RMI port used for heartbeat signals
rmiRegistryPort=2000

# The SIP port where clients should connect
externalPort=5060

# The SIP port from which servers will receive messages
# (delete it if you want to use only one port for both inbound and outbound)
internalPort=5065

# The HTTP port for HTTP forwarding.
# If you would like an integrated HTTP load balancer, this is the entry point
httpPort=8080

# Specify UDP or TCP (for now both must be the same)
internalTransport=UDP
externalTransport=UDP

# If you are using an IP load balancer, put its IP address and port here
externalIpLoadBalancerAddress=127.0.0.1
externalIpLoadBalancerPort=111

# Requests initiated from the App Servers can route to this address
# (if you are using 2 IP load balancers for bidirectional SIP LB)
internalIpLoadBalancerAddress=127.0.0.1
internalIpLoadBalancerPort=111

# Designate extra IP addresses as server nodes
#extraServerNodes=222.221.21.12:21,45.6.6.7:9003,33.5.6.7,33.9.9.2
...the SIP LB configuration file
# Call-ID affinity algorithm settings. This algorithm is the default. No need to uncomment it.
#algorithmClass=org.mobicents.tools.sip.balancer.CallIDAffinityBalancerAlgorithm
# This property specifies how long to keep an association before it is evicted.
# It is needed to avoid memory leaks on dead calls. The time is in seconds.
#callIdAffinityMaxTimeInCache=500

# Uncomment to enable the consistent hash based on Call-ID algorithm.
#algorithmClass=org.mobicents.tools.sip.balancer.HeaderConsistentHashBalancerAlgorithm
# This property is not required, it defaults to Call-ID if not set; it can be "from.user" or "to.user" when you want the SIP URI username
#sipHeaderAffinityKey=Call-ID
# Specify the GET HTTP parameter to be used as hash key
#httpAffinityKey=appsession

# Uncomment to enable the persistent consistent hash based on Call-ID algorithm.
#algorithmClass=org.mobicents.tools.sip.balancer.PersistentConsistentHashBalancerAlgorithm
# This property is not required, it defaults to Call-ID if not set
#sipHeaderAffinityKey=Call-ID
# Specify the GET HTTP parameter to be used as hash key
#httpAffinityKey=appsession
# This is the JBoss Cache 3.1 configuration file (with jgroups); if not specified, the default is used
#persistentConsistentHashCacheConfiguration=/home/config.xml
...the SIP LB configuration file
#NEW PROPERTIES IN MSS 1.2
# If a node doesn't check in within that time, it is considered dead
nodeTimeout=5100
# The above condition is checked every heartbeatInterval milliseconds
heartbeatInterval=5000
#JSIP stack configuration.....
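How nodeTimeout and heartbeatInterval might interact is sketched below: nodes record a timestamp on every heartbeat, and a periodic sweep, meant to run every heartbeatInterval milliseconds, declares dead any node that has not checked in within nodeTimeout. NodeRegistry is a hypothetical class, and treating nodeTimeout as milliseconds is an assumption based on the heartbeatInterval comment.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrative heartbeat bookkeeping (hypothetical, not the Mobicents code). */
final class NodeRegistry {
    private final long nodeTimeoutMillis; // assumed unit: milliseconds
    private final Map<String, Long> lastSeen = new HashMap<>();

    NodeRegistry(long nodeTimeoutMillis) {
        this.nodeTimeoutMillis = nodeTimeoutMillis;
    }

    /** Called whenever a node checks in. */
    synchronized void heartbeat(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    /**
     * Intended to run every heartbeatInterval milliseconds.
     * Removes and returns the nodes that missed the timeout.
     */
    synchronized List<String> sweep(long nowMillis) {
        List<String> dead = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > nodeTimeoutMillis) {
                dead.add(entry.getKey());
                it.remove();
            }
        }
        return dead;
    }

    synchronized boolean isAlive(String node) {
        return lastSeen.containsKey(node);
    }
}
```

With nodeTimeout=5100 and a sweep every 5000 ms, a node that stops sending heartbeats survives the first sweep after its last heartbeat and is removed by the second one, so detection can take up to roughly two sweep periods.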
Example Configurations
How to configure common scenarios?
Note: All distributed configurations must use a consistent hash routing algorithm.
Example with Call-ID header hashing:
# Uncomment to enable the consistent hash based on Call-ID algorithm.
algorithmClass=org.mobicents.tools.sip.balancer.HeaderConsistentHashBalancerAlgorithm
# This property is not required, it defaults to Call-ID if not set
sipHeaderAffinityKey=Call-ID
# Specify the GET HTTP parameter to be used as hash key
httpAffinityKey=appsession
Two SIP LBs with client pools
Request coming from phone. If a load balancer fails then a phone pool will experience outage.
Two SIP LBs with Internal and External IP LBs in the same network with sample configurations.
Request coming from phone.
Two SIP LBs with Internal and External IP LBs in the same network with sample configurations.
Request coming from Application Server.
Two SIP LBs with Internal and External IP LBs in the same network with sample configurations.
Full picture
IP load balancer problems
SIP LB Topics 2011
Large scale tests
● Simulate IP load balancing and address rewriting
● Run many nodes, many load balancers
● Execute the test as part of a larger scenario
● Simulate application server
  ○ Heartbeat
  ○ Handling and initiating requests
● TLS
Cluster groups
Divide the cluster into subgroups that failover only internally
subclusterMap=( 192.168.1.1, 192.168.1.2 ) ( 10.10.10.10, 20.20.20.20, 30.30.30.30)
Worst-case testing
Load balancing without affinity
Rolling Upgrades from the LB
● Each node is bootstrapped with a version system property
● The version is advertised in the SIP LB heartbeat
● The SIP LB has awareness of the groups with particular version and can detect conditions that jeopardize the operations
  ○ More than two versions
  ○ Node count dangerously low
  ○ Stalled upgrade with idle nodes
Performance testing
10K requests/s
Moving the load balancing on the server side
● Eliminates the worst case
● Some requests will go deeper in the pipeline, costing more
● Can't change the Route headers
● Delay the load balancing decision as much as possible
NIO in the SIP LB
BIO is limited to 2500-10000 concurrent sockets on servers