Scaling SIP Servers



Scaling SIP Servers

Sankaran Narayanan, joint work with the CINEMA team

IRT Group Meeting – April 17, 2002

Agenda: Introduction; issues in scaling; facets of the sipd architecture; some results; conclusion and future work.

Introduction – SIP servers: SIP signaling is handled by proxy and redirect servers. Proxies route calls by contact location, run over UDP/TCP/TLS, can be stateful or stateless, and support programmable scripts. User location is handled by registrars backed by an SQL database.

What is scale? Large call volumes on commodity hardware [Schu0012:Industrial], measured by response times (mean and deviation) and turnaround time. Goals: a delay budget [SIPstone] of R1 < 500 ms and R2 < 2 s; for comparison, class-5 switches handle > 750K BHCA (busy-hour call attempts).

[Call-flow figure: a REGISTER/200 OK exchange and an INVITE/180/200/ACK exchange through the proxy, with the measurement points R1 and R2 marked.]
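As a point of reference, 750K BHCA works out to roughly 750,000 / 3,600 ≈ 208 call attempts per second, and each attempt produces several SIP messages at the proxy, so the sustained message rate the server must handle is a small multiple of that.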

Limits to scaling: The server is not CPU bound. Network I/O blocks while waiting for responses, and contact and DNS lookups add latency. OS resource limits also bite: open files (<= 1024 on Unix) and LWPs (Solaris) vs. user/kernel threads (Linux, Windows). Try not to customize and recompile (parts of) the OS or move the server into the kernel (khttpd, AFPA, …).

The problem: Scaling CPU-bound jobs (throughput = 1/delay) is a matter of hardware (CPU speed, RAM, …), software (a better OS, scheduler, …), and algorithms (optimized protocol processing). Blocking on network and disk I/O is expensive. Hypothesis: turn the I/O-bound server into a CPU-bound one by reducing blocking, and optimize resource usage for stability at high loads.

Facets of the sipd architecture: blocking, process models, socket management, protocol processing.

Blocking: Threads block on mutexes, events (socket, timeout), and fread, so a queue builds up with potentially high variability (a tandem queue system). This is easy to fix: use non-blocking calls (event driven, covered later) or move the queue to a different thread (the lazy logger, sketched below). The straightforward logger is essentially: Logger { lock; write; unlock; }
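The lazy logger could look roughly like the sketch below, assuming a pthread-based server; the names (log_msg, logger_thread) are illustrative, not sipd's actual interface. Request-processing threads only take a short-lived lock to append to an in-memory queue, while a dedicated thread performs the blocking write.

```c
/* lazy_logger.c - illustrative sketch of moving blocking log writes
 * off the request-processing threads (names are hypothetical). */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct log_entry {
    char *line;
    struct log_entry *next;
};

static struct log_entry *head, *tail;      /* in-memory log queue */
static pthread_mutex_t  qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t   qcond = PTHREAD_COND_INITIALIZER;

/* Called from request-processing threads: only a quick lock + append. */
void log_msg(const char *line)
{
    struct log_entry *e = malloc(sizeof *e);
    if (!e)
        return;
    e->line = strdup(line);
    e->next = NULL;

    pthread_mutex_lock(&qlock);
    if (tail) tail->next = e; else head = e;
    tail = e;
    pthread_cond_signal(&qcond);
    pthread_mutex_unlock(&qlock);
}

/* Dedicated logger thread: the slow write happens here, so the
 * request threads never wait on disk I/O. */
void *logger_thread(void *arg)
{
    FILE *fp = arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (!head)
            pthread_cond_wait(&qcond, &qlock);
        struct log_entry *e = head;
        head = e->next;
        if (!head) tail = NULL;
        pthread_mutex_unlock(&qlock);

        fputs(e->line, fp);                /* blocking write, off the fast path */
        fputc('\n', fp);
        free(e->line);
        free(e);
    }
    return NULL;
}
```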

Blocking (2): Call routing involves one or more contact lookups at roughly 10 ms per query. A cache works well for sipd-style servers: fetch-on-demand with replacement is harder, while loading the entire database is easy, although long-lived servers then need periodic refresh. The same idea is potentially useful for DNS SRV lookups. A sketch follows the figure below.

[Figure: contact lookups are answered from an in-memory cache in < 1 ms; the cache is periodically refreshed from the SQL database.]
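A minimal sketch of the load-everything cache with periodic refresh; the interfaces here (db_load_all, contact_lookup) are hypothetical stand-ins for sipd's real database layer. Lookups are answered from memory well under a millisecond, and a background thread reloads the table so a long-lived server eventually sees changes made directly in the database.

```c
/* contact_cache.c - sketch of caching the registrar's contact table in
 * memory with periodic refresh (interfaces are hypothetical). */
#include <pthread.h>
#include <string.h>
#include <unistd.h>

#define MAX_BINDINGS 10000

struct binding { char aor[128]; char contact[256]; };

static struct binding   table[MAX_BINDINGS];
static int              nbindings;
static pthread_rwlock_t cache_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Placeholder for the real SQL query; fills 'out' and returns a count. */
extern int db_load_all(struct binding *out, int max);

/* Lookup path used while proxying: no SQL round trip. Returns 1 if found. */
int contact_lookup(const char *aor, char *contact_out, size_t len)
{
    int found = 0;
    pthread_rwlock_rdlock(&cache_lock);
    for (int i = 0; i < nbindings; i++) {
        if (strcmp(table[i].aor, aor) == 0) {
            strncpy(contact_out, table[i].contact, len - 1);
            contact_out[len - 1] = '\0';
            found = 1;
            break;
        }
    }
    pthread_rwlock_unlock(&cache_lock);
    return found;
}

/* Background thread: reload the whole table every 'period' seconds so a
 * long-lived server eventually picks up external changes. */
void *cache_refresher(void *arg)
{
    unsigned period = *(unsigned *)arg;
    static struct binding fresh[MAX_BINDINGS];
    for (;;) {
        int n = db_load_all(fresh, MAX_BINDINGS);
        pthread_rwlock_wrlock(&cache_lock);
        memcpy(table, fresh, n * sizeof fresh[0]);
        nbindings = n;
        pthread_rwlock_unlock(&cache_lock);
        sleep(period);
    }
    return NULL;
}
```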

REGISTER performance (single-CPU Sun Ultra 10): response time stays constant when the cache (FastSQL) is used.

Process models (1): One thread per request does not scale: too many threads are created over a short timescale (a stateless proxy uses 2-4 threads per transaction), and high load hurts throughput.

[Figure: incoming requests R1-R4 each get their own thread; the accompanying load vs. throughput curve falls off at high load.]

Process models (2): A thread pool fed by a queue keeps thread overhead low, leaving more time for useful processing, and allows overload management (prefer dropping requests over responses; drop tail). It is not enough when holding times are high, because each request still holds (blocks) a thread. A sketch follows the figure below.

[Figure: incoming requests R1-R4 are queued and served by a fixed number of threads, with the corresponding load vs. throughput curve.]
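A sketch of the fixed pool plus bounded queue, with a deliberately simplified overload policy; struct sip_msg, process_message(), and the eviction rule are illustrative, not sipd's actual code. New messages are dropped at the tail when the queue is full, except that responses are still admitted by evicting the most recently queued item, approximating "drop requests over responses."

```c
/* worker_pool.c - sketch of a fixed thread pool fed by a bounded queue,
 * with a simple overload policy (keep responses, drop new requests). */
#include <pthread.h>
#include <stdlib.h>

#define QUEUE_MAX   512
#define NUM_WORKERS 8

struct sip_msg { int is_response; /* ... parsed message ... */ };

static struct sip_msg *queue[QUEUE_MAX];
static int qhead, qtail, qlen;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

extern void process_message(struct sip_msg *m);   /* the actual SIP handling */

/* Called by the receive loop.  Returns 0 if the message was dropped. */
int enqueue(struct sip_msg *m)
{
    int accepted = 0;
    pthread_mutex_lock(&qlock);
    if (qlen < QUEUE_MAX) {
        accepted = 1;                     /* room: always admit */
    } else if (m->is_response) {
        /* Queue full: make room for the response by evicting the most
         * recently queued message (a real server would prefer to scan
         * for a request), since a dropped response wastes work already
         * done upstream. */
        qtail = (qtail + QUEUE_MAX - 1) % QUEUE_MAX;
        free(queue[qtail]);
        qlen--;
        accepted = 1;
    }
    if (accepted) {
        queue[qtail] = m;
        qtail = (qtail + 1) % QUEUE_MAX;
        qlen++;
        pthread_cond_signal(&qcond);
    }
    pthread_mutex_unlock(&qlock);
    return accepted;                      /* 0 = drop-tail under overload */
}

/* Each of the NUM_WORKERS threads runs this loop. */
void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&qlock);
        while (qlen == 0)
            pthread_cond_wait(&qcond, &qlock);
        struct sip_msg *m = queue[qhead];
        qhead = (qhead + 1) % QUEUE_MAX;
        qlen--;
        pthread_mutex_unlock(&qlock);
        process_message(m);
        free(m);
    }
    return NULL;
}
```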

Stateless proxy (Solaris): Turnaround time is almost constant for the stateless proxy.
• The sudden increase in response time is a client-side problem.
• UDP losses occur on the Ultra 10 at (120 * 6 * 500 * 8) bps.

Stateless proxy (Linux): Request turnaround time breaks down while response turnaround time stays constant, an effect of high holding times and thread scheduling. How to set the queue size remains to be investigated.

Queue evolution for sipd: number of requests (y-axis) waiting in the queue for a free thread on Solaris (left) and Linux (right) over a period of up-time (x-axis).

Process models (3): The blocking thread model needs "too many" threads, since a stateful transaction stays around for 30 s; instead, return the thread to the free pool rather than letting it block. In an event-driven architecture, state transitions are triggered by a global event scheduler through callbacks such as OnIncoming1xx(), OnInviteTimeout(), …; SIP-CGI uses multiple pre-forked processes.
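The event-driven model can be sketched as follows; the callback names echo the ones above, but the types and the dispatcher are hypothetical, not sipd's real interface. A transaction is just a state record plus handlers, so no thread sits blocked for the ~30 s a stateful transaction lives; in a real server the dispatcher would be driven by select()/poll() and a timer list.

```c
/* event_sched.c - sketch of the event-driven transaction model.
 * All names here are illustrative. */
#include <stdio.h>

enum event_type { EV_INCOMING_1XX, EV_INCOMING_2XX, EV_INVITE_TIMEOUT };

/* A transaction is state plus callbacks; it never owns a thread. */
struct transaction {
    int  id;
    int  completed;
    void (*on_incoming_1xx)(struct transaction *);
    void (*on_incoming_2xx)(struct transaction *);
    void (*on_invite_timeout)(struct transaction *);
};

/* The global scheduler delivers each event to the transaction's handler. */
void dispatch(struct transaction *t, enum event_type ev)
{
    switch (ev) {
    case EV_INCOMING_1XX:   t->on_incoming_1xx(t);   break;
    case EV_INCOMING_2XX:   t->on_incoming_2xx(t);   break;
    case EV_INVITE_TIMEOUT: t->on_invite_timeout(t); break;
    }
}

/* Example handlers: each state transition runs to completion and returns,
 * freeing the thread for other transactions. */
static void got_1xx(struct transaction *t)   { printf("txn %d: ringing\n", t->id); }
static void got_2xx(struct transaction *t)   { t->completed = 1; }
static void timed_out(struct transaction *t) { t->completed = 1; }

int main(void)
{
    struct transaction t = { 1, 0, got_1xx, got_2xx, timed_out };
    dispatch(&t, EV_INCOMING_1XX);      /* provisional response arrives */
    dispatch(&t, EV_INCOMING_2XX);      /* final response completes the txn */
    return t.completed ? 0 : 1;
}
```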

Socket management: The problems are the open-socket limit (1024), "liveness" detection, and retransmission. One socket per transaction does not scale; a global socket plus soft state recording whether the downstream server is alive works for UDP. It is hard for TCP/TLS, which maintain connections, and worse for Java servers, which lack select/poll.
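For the UDP case, the shared-socket idea might look like the sketch below; all names are illustrative and the peer table is not thread-safe here. Every transaction sends through one descriptor, so the per-process file limit no longer grows with the number of concurrent transactions, and a soft-state table of "last heard from" times approximates liveness detection.

```c
/* shared_udp.c - sketch: one UDP socket shared by all transactions,
 * plus a soft-state table of "last heard from" times per peer.
 * (Not thread-safe; a real server would lock the table.) */
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <time.h>

#define MAX_PEERS 256

static int shared_fd = -1;               /* single socket for all sends */

struct peer_state {
    struct sockaddr_in addr;
    time_t last_heard;                   /* soft state: refreshed on receive */
};
static struct peer_state peers[MAX_PEERS];
static int npeers;

int sip_transport_init(void)
{
    shared_fd = socket(AF_INET, SOCK_DGRAM, 0);
    return shared_fd < 0 ? -1 : 0;
}

/* Every transaction sends through the same descriptor, so the number of
 * open files does not track the number of concurrent transactions. */
int sip_send(const struct sockaddr_in *to, const char *msg, size_t len)
{
    return sendto(shared_fd, msg, len, 0,
                  (const struct sockaddr *)to, sizeof *to) < 0 ? -1 : 0;
}

/* Called whenever anything is received from a peer. */
void note_alive(const struct sockaddr_in *from)
{
    for (int i = 0; i < npeers; i++) {
        if (peers[i].addr.sin_addr.s_addr == from->sin_addr.s_addr &&
            peers[i].addr.sin_port == from->sin_port) {
            peers[i].last_heard = time(NULL);
            return;
        }
    }
    if (npeers < MAX_PEERS) {
        peers[npeers].addr = *from;
        peers[npeers].last_heard = time(NULL);
        npeers++;
    }
}

/* A peer that answered recently is assumed alive; otherwise the caller
 * falls back to its usual retransmission/failover logic. */
int peer_is_alive(const struct sockaddr_in *p, int max_silence_sec)
{
    for (int i = 0; i < npeers; i++) {
        if (peers[i].addr.sin_addr.s_addr == p->sin_addr.s_addr &&
            peers[i].addr.sin_port == p->sin_port)
            return time(NULL) - peers[i].last_heard <= max_silence_sec;
    }
    return 0;                            /* never heard from: unknown/down */
}
```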

Optimizing protocol processing: Not very useful if the CPU is not the bottleneck. SIP is a text protocol, so parsing and formatting carry overhead, and the order of headers matters (Via). Other optimizations include parse-on-demand, date formatting, and so on.
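Parse-on-demand can be sketched like this; struct sip_msg and the helpers are hypothetical and the parsing is deliberately crude. A single cheap pass only records where each header line starts; the value bytes are handed out (and fully parsed) only if some later processing step actually asks for that header.

```c
/* lazy_parse.c - sketch of parse-on-demand for a text protocol:
 * index header lines by offset first, look at a value only when asked. */
#include <string.h>
#include <strings.h>

#define MAX_HDRS 64

struct hdr_index {
    const char *name_start;
    const char *value_start;
    size_t      value_len;
};

struct sip_msg {
    const char      *raw;               /* the received datagram, unmodified */
    struct hdr_index hdrs[MAX_HDRS];
    int              nhdrs;
};

/* One cheap pass: remember where each header line starts; do NOT parse
 * addresses, parameters, dates, etc. yet. */
void index_headers(struct sip_msg *m)
{
    const char *p = strchr(m->raw, '\n');      /* skip the request line */
    m->nhdrs = 0;
    while (p && m->nhdrs < MAX_HDRS) {
        p++;
        if (*p == '\r' || *p == '\n' || *p == '\0')
            break;                              /* blank line: end of headers */
        const char *colon = strchr(p, ':');
        const char *eol   = strchr(p, '\n');
        if (!colon || !eol || colon > eol)
            break;
        m->hdrs[m->nhdrs].name_start  = p;
        m->hdrs[m->nhdrs].value_start = colon + 1;
        m->hdrs[m->nhdrs].value_len   = (size_t)(eol - colon - 1);
        m->nhdrs++;
        p = eol;
    }
}

/* The value is touched only here, and only if some part of request
 * processing actually needs that header.  The returned bytes are raw
 * (leading space / trailing CR included); trimming happens on use. */
const char *get_header(struct sip_msg *m, const char *name, size_t *len)
{
    size_t nlen = strlen(name);
    for (int i = 0; i < m->nhdrs; i++) {
        if (strncasecmp(m->hdrs[i].name_start, name, nlen) == 0 &&
            m->hdrs[i].name_start[nlen] == ':') {
            *len = m->hdrs[i].value_len;
            return m->hdrs[i].value_start;
        }
    }
    return NULL;
}
```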

Conclusion: Unlike web servers, SIP servers can be stateful, do less disk I/O, and are less affected by the TCP stack and its behavior, … Pros: UDP, stateless routing, load balancing using DNS, … Challenges: scaling the state machine and getting towards 2.5M BHCA (3600 messages/s), which points to an event-driven architecture (SEDA?), resource management (file limits, threads), and operating-system tuning (scheduler, …).

Future work: Stateful proxy performance; evaluating the event-driven architecture; the effect of request forking (> 1 contact) on server behavior; programmable scripts; queue management and overload control; other types of servers (conference servers, media servers, etc.).

References

CINEMA web page. http://www.cs.columbia.edu/IRT/cinema

[Schu0012:Industrial] H. Schulzrinne, "Industrial strength Internet telephony," presentation at the 6th SIP bakeoff, Dec. 2000.

[SIPstone] H. Schulzrinne et al., "SIPstone – Benchmarking SIP server performance," CS technical report, Columbia University.
