Reliable and Scalable Internet Telephony
Kundan Singh and Henning Schulzrinne
Internet Real Time Lab – Internal Talk
Sept 24, 2004
2
Telephone reliability (PSTN: Public Switched Telephone Network)
Figure: PSTN elements and their capacities
  “bearer” network
  telephone switch (SSP)
  database (SCP) for freephone, calling card, …
  signaling network (SS7)
  signaling router (STP)
  local telephone switch (class 5 switch): 10,000 customers, 20,000 calls/hour
  database (SCP): 10 million customers, 2 million lookups/hour
  signaling router (STP): 1 million customers, 1.5 million calls/hour
  regional telephone switch (class 4 switch): 100,000 customers, 150,000 calls/hour
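SIP server benchmarks are usually quoted per second, so it helps to convert the hourly PSTN figures; 2 million SCP lookups/hour, for instance, averages about 556 lookups/s. A small sketch that computes averages only (the slide gives no busy-hour peak factor):

```python
# Hourly figures from the PSTN slide, converted to average per-second rates.
PSTN_RATES_PER_HOUR = {
    "local switch (class 5), calls": 20_000,
    "regional switch (class 4), calls": 150_000,
    "signaling router (STP), calls": 1_500_000,
    "database (SCP), lookups": 2_000_000,
}


def per_second(per_hour: float) -> float:
    """Average rate per second for a rate given per hour."""
    return per_hour / 3600.0


for element, rate in PSTN_RATES_PER_HOUR.items():
    print(f"{element:34s} {per_second(rate):7.1f} /s")
```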
3
Internet telephony (SIP: Session Initiation Protocol)
Figure: SIP registration and call setup between yahoo.com and example.com (labels: REGISTER, INVITE, DNS, DB, 192.1.2.4, 129.1.2.3)
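In SIP, REGISTER stores a binding from a user's address-of-record to the device's current contact address, and a proxy handling an INVITE consults those bindings (the DB in the figure) to route the call. A minimal in-memory sketch of that location service, assuming a plain dictionary instead of a database; the class, method names, and addresses are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class LocationService:
    """Toy SIP registrar database: address-of-record -> {contact: expiry time}."""
    bindings: dict = field(default_factory=dict)

    def register(self, aor: str, contact: str, expires: int, now: float) -> None:
        # REGISTER adds or refreshes a binding; Expires: 0 removes it.
        contacts = self.bindings.setdefault(aor, {})
        if expires == 0:
            contacts.pop(contact, None)
        else:
            contacts[contact] = now + expires

    def lookup(self, aor: str, now: float) -> list:
        # A proxy handling an INVITE for `aor` forwards it to the unexpired contacts.
        contacts = self.bindings.get(aor, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)


# Hypothetical user and contact address.
db = LocationService()
db.register("sip:alice@example.com", "sip:alice@192.0.2.10", expires=3600, now=0)
print(db.lookup("sip:alice@example.com", now=10))    # ['sip:alice@192.0.2.10']
print(db.lookup("sip:alice@example.com", now=4000))  # [] (binding expired)
```

Because a user may register several contacts, `lookup` returns a list; a proxy could then fork the INVITE to all of them.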
4
SIP network architecture: scalability requirement depends on role
Figure: SIP network architecture. Carrier network: SIP/PSTN gateway, SIP/MGCs, media gateways (MG), gateways (GW); ISPs and a cybercafe; enterprise: PBX with IP phones and PSTN phones (T1 PRI/BRI) and a PSTN gateway; all connected through the IP network and the PSTN
5
Reliability and scalability for call routing, registration, conferencing, voicemail
Requirements
  Reliable: Mean Time Between Failures (MTBF), Mean Time To Recover (MTTR)
  Scalable: registration rate, call rate, #requests/s
Proposed solutions
  Server redundancy: apply existing web-redundancy designs; evaluate quantitatively (future work)
  Peer-to-peer: novel P2P-SIP architecture; evaluate quantitatively (future work)
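The two reliability metrics combine into steady-state availability, A = MTBF / (MTBF + MTTR), so shortening recovery matters as much as avoiding failures. A short sketch with hypothetical input values:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(a: float) -> float:
    """Expected downtime in minutes per 365-day year at availability a."""
    return (1.0 - a) * 365 * 24 * 60


# Hypothetical server: fails once every 30 days, recovers in 10 minutes.
a = availability(mtbf_hours=30 * 24, mttr_hours=10 / 60)
print(f"A = {a:.5f}, about {downtime_minutes_per_year(a):.0f} min of downtime per year")
```

When MTTR is much smaller than MTBF, halving MTTR cuts downtime roughly as much as doubling MTBF.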
6
Server redundancy: the problem is failure or overload
Figure: REGISTER and INVITE requests
7
Server redundancy: replicate registration or search on call
Figure: REGISTER and INVITE requests for each of the two approaches
8
Server redundancy: known techniques
Client-based: Cisco phones: primary and backup proxy
DNS: NAPTR, SRV
IP address takeover
Database redundancy
. . .
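SRV records let a client find several servers for one domain and try them in order: lower priority values first and, among equal priorities, a weighted random pick that spreads load. A sketch of the ordering rule from RFC 2782; the DNS query itself is omitted, and `SrvRecord` and `order_srv` are hypothetical names:

```python
import random
from dataclasses import dataclass


@dataclass
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records, rng=random):
    """Return SRV records in the order a client should try them (RFC 2782):
    ascending priority; within one priority, repeated weighted random selection."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Weight-0 records go first so they are picked only when the draw is 0.
        group = sorted((r for r in records if r.priority == prio), key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            draw = rng.randint(0, total)  # inclusive range, as RFC 2782 specifies
            running = 0
            for i, r in enumerate(group):
                running += r.weight
                if running >= draw:
                    ordered.append(group.pop(i))
                    break
    return ordered


# The CINEMA failover records: phone.cs first, sip2.cs as backup.
cinema = [
    SrvRecord(1, 0, 5060, "sip2.cs.columbia.edu"),
    SrvRecord(0, 0, 5060, "phone.cs.columbia.edu"),
]
print([r.target for r in order_srv(cinema)])
# ['phone.cs.columbia.edu', 'sip2.cs.columbia.edu']
```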
9
High availability: failover in CINEMA
Figure: two proxy/database pairs (P1 with D1, master/slave; P2 with D2, slave/master), each with web scripts, kept in sync by replication; hosts phone.cs.columbia.edu and sip2.cs.columbia.edu; phones REGISTER using proxy1 = phone.cs, backup = sip2.cs
DNS records:
_sip._udp  SRV 0 0 5060 phone.cs.columbia.edu
           SRV 1 0 5060 sip2.cs.columbia.edu
10
High availability: time to recover
Client re-sends INVITE to P2: immediately on an ICMP error, otherwise after 10 s
sipd has an in-memory cache; registrations are refreshed well before expiry; registrations are additive
Measurement of recovery time; optimal number of servers
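The client-side part of the recovery time is either the ICMP round trip or the full timeout. A sketch assuming the 10 s is counted from the original transmission (the function name is hypothetical); server-side factors such as replication and registration refresh are not modeled:

```python
from typing import Optional

RETRY_TIMEOUT_S = 10.0  # "or after 10s otherwise"


def failover_time(sent_at: float, icmp_error_at: Optional[float]) -> float:
    """When the client re-sends its INVITE to the backup proxy P2.

    An ICMP error (e.g., port unreachable because sipd on P1 is not running)
    triggers an immediate retry; without one, the client waits for the timeout.
    """
    deadline = sent_at + RETRY_TIMEOUT_S
    if icmp_error_at is not None and icmp_error_at < deadline:
        return icmp_error_at
    return deadline


print(failover_time(0.0, 0.05))  # 0.05: ICMP error arrives, retry at once
print(failover_time(0.0, None))  # 10.0: no ICMP error (e.g., P1 host down)
```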
11
Scalability: load sharing with redundant proxies and databases
REGISTER: write to D1 and D2
INVITE: read from D1 or D2
Database write/synchronization traffic becomes the bottleneck
Figure: proxies P1, P2, P3 sharing databases D1 and D2 (REGISTER and INVITE flows)
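The bottleneck follows from the access pattern: reads spread across replicas, but every replica must absorb every write. A toy in-memory model of that pattern (class and method names are hypothetical; real replication and synchronization are not modeled):

```python
import random


class ReplicatedLocationDB:
    """Toy model of the first load-sharing design: every REGISTER is written
    to all database replicas, and every INVITE lookup reads from just one."""

    def __init__(self, n_replicas: int, rng=random):
        self.replicas = [{} for _ in range(n_replicas)]  # D1, D2, ...
        self.writes = [0] * n_replicas  # writes applied by each replica
        self.reads = [0] * n_replicas   # reads served by each replica
        self.rng = rng

    def register(self, user: str, contact: str) -> None:
        # Write to D1 and D2 (all replicas): write load is not divided.
        for i, db in enumerate(self.replicas):
            db[user] = contact
            self.writes[i] += 1

    def invite(self, user: str):
        # Read from D1 or D2 (any one replica): read load is divided.
        i = self.rng.randrange(len(self.replicas))
        self.reads[i] += 1
        return self.replicas[i].get(user)


db = ReplicatedLocationDB(2, rng=random.Random(7))
for k in range(100):
    db.register(f"user{k}", f"sip:user{k}@192.0.2.1")
for k in range(200):
    db.invite(f"user{k % 100}")
print(db.writes, db.reads)  # every replica applied all 100 writes; the 200 reads are split
```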
12
Scalability: load sharing by dividing the user space
Proxy and database on the same host
Stateless proxy can become overloaded
Hashing: static vs dynamic
Figure: proxies P1, P2, P3 in front of databases D1 (users a-h), D2 (i-q), D3 (r-z)
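Both ways of dividing the user space can be written as a function from user name to database. In this sketch the letter ranges come from the figure, while the SHA-1-modulo scheme is one assumed way to implement the hashing variant (function names are hypothetical):

```python
import hashlib

# Static split from the figure: first letter of the user name picks the database.
STATIC_RANGES = [("a", "h", "D1"), ("i", "q", "D2"), ("r", "z", "D3")]


def static_partition(user: str) -> str:
    first = user[0].lower()
    for lo, hi, db in STATIC_RANGES:
        if lo <= first <= hi:
            return db
    raise ValueError(f"no partition for {user!r}")


def hash_partition(user: str, servers: list) -> str:
    """Hash-based split: evens out load even if names cluster on a few letters."""
    digest = hashlib.sha1(user.lower().encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


print(static_partition("alice"), static_partition("kundan"), static_partition("sam"))
# D1 D2 D3
print(hash_partition("alice", ["D1", "D2", "D3"]))
```

With modulo hashing, changing the number of servers remaps most users; consistent hashing, as used by the DHT later in the talk, limits that movement.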
13
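Scalability: comparison of the two designs
Total time per DB, for replicated databases (first equation) and a partitioned user space (second equation):
$$\left(\frac{tr}{D} + 1\right) TN = \frac{A}{D} + B$$
$$\frac{tr + 1}{D}\, TN = \frac{A}{D} + \frac{B}{D}$$
where A = trTN is the total read time and B = TN the total write time. Replication divides only the reads among the D databases; partitioning divides both reads and writes.
D = number of database servers
N = number of writes (REGISTER)
r = #reads/#writes = (INV+REG)/REG ≈ 2
T = write latency
t = read latency/write latency
Figure: proxies P1, P2, P3 with replicated databases D1, D2; proxies P1, P2, P3 with the user space partitioned across databases (a-h, i-q, r-z)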
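Plugging numbers into the two per-DB expressions, ((tr/D)+1)TN for replicated databases and ((tr+1)/D)TN for a partitioned user space, shows how the gap grows with D. A sketch with a hypothetical workload (only r = 2 comes from the slide):

```python
def time_per_db_replicated(N, r, t, T, D):
    """Replicated DBs: each applies all N writes; the r*N reads are split D ways.
    ((tr/D) + 1) T N"""
    return (t * r / D + 1) * T * N


def time_per_db_partitioned(N, r, t, T, D):
    """Partitioned user space: each DB gets 1/D of both reads and writes.
    ((tr + 1)/D) T N"""
    return (t * r + 1) / D * T * N


# Hypothetical workload: N = 10,000 REGISTERs, T = 1 ms per write,
# reads as fast as writes (t = 1), r = 2 as on the slide.
for D in (1, 2, 4):
    rep = time_per_db_replicated(10_000, 2, 1.0, 0.001, D)
    par = time_per_db_partitioned(10_000, 2, 1.0, 0.001, D)
    print(f"D={D}: replicated {rep:.1f} s, partitioned {par:.1f} s")
```

The difference between the two is TN(D - 1)/D, the cost of applying every write on all D replicas; with these numbers the designs tie at D = 1 (30 s per DB), and at D = 4 the replicated design needs 15 s per DB against 7.5 s.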
14
Reliability and scalability: two-stage architecture for CINEMA
Figure: first-stage servers s1, s2, s3 route requests for sip:[email protected] and sip:[email protected] to second-stage master/slave pairs: a1/a2 for a*@example.com, b1/b2 for b*@example.com
DNS records:
example.com    _sip._udp  SRV 0 0 s1.example.com
                          SRV 0 0 s2.example.com
                          SRV 0 0 s3.example.com
                          SRV 1 0 ex.backup.com
a.example.com  _sip._udp  SRV 0 0 a1.example.com
                          SRV 1 0 a2.example.com
b.example.com  _sip._udp  SRV 0 0 b1.example.com
                          SRV 1 0 b2.example.com
Request rate = f(#stateless servers, #groups)
Bottleneck: CPU, memory, bandwidth? Failover latency: ?
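The division of labor between the two stages can be sketched as two lookups: the stateless first stage only picks a group, and the group's own SRV records supply the master and its slave. A sketch assuming grouping by first letter, as the a*/b* labels suggest; function names are hypothetical, the DNS lookups are replaced by a fixed table, and the choice among s1-s3 is not modeled:

```python
# Second-stage groups, one per DNS name, listed as (master, slave) in SRV priority order.
GROUPS = {
    "a": ["a1.example.com", "a2.example.com"],  # a.example.com: SRV 0 a1, SRV 1 a2
    "b": ["b1.example.com", "b2.example.com"],  # b.example.com: SRV 0 b1, SRV 1 b2
}


def first_stage(user: str) -> str:
    """Stateless stage (s1, s2, s3): pick the user's group from the name prefix."""
    group = user[0].lower()
    if group not in GROUPS:
        raise LookupError(f"no group serves {user!r}")
    return group


def second_stage(group: str, up: set) -> str:
    """Group stage: the master if it is up, otherwise its slave."""
    for host in GROUPS[group]:
        if host in up:
            return host
    raise RuntimeError(f"both servers of group {group!r} are down")


def route(user: str, up: set) -> str:
    return second_stage(first_stage(user), up)


all_up = {"a1.example.com", "a2.example.com", "b1.example.com", "b2.example.com"}
print(route("alice", all_up))                        # a1.example.com
print(route("alice", all_up - {"a1.example.com"}))   # a2.example.com
print(route("bob", all_up))                          # b1.example.com
```

Because the first stage keeps no per-user state, any of s1-s3 can handle any request, so capacity can grow with both the number of stateless servers and the number of groups.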
15
Server-based vs peer-to-peer
Server-based
  Cost: maintenance, configuration
  Central points of failure
  Controlled infrastructure (e.g., DNS)
Peer-to-peer
  Robust: no central dependency
  Self-organizing, no configuration
  Scalability?
Figure: clients (C) attached to a central server (S), contrasted with peers (P)
16
Related work: Skype (from the KaZaA community)
Host cache of some super nodes
  Bootstrap IP addresses
Auto-detect NAT/firewall settings
  STUN and TURN
Protocol among super nodes: ??
Allows searching for a user (e.g., kun*)
History of known buddies
All communication is encrypted
Promote to super node
  Based on availability, capacity
Conferencing
17
We propose: P2P-SIP
Unlike server-based SIP architecture
  Robust and efficient lookup using DHT
Unlike proprietary Skype architecture
  Interoperability
  DHT algorithm uses SIP communication
Hybrid architecture
  Lookup in SIP+P2P
Unlike file-sharing applications
  Data storage, caching, delay, reliability
Disadvantages
  Lookup delay and security
18
P2P-SIP background: DHT (Chord)
Identifier circle
Keys assigned to successor
Evenly distributed keys and nodes
Finger table: log N
The ith finger points to the first node that succeeds n by at least $2^{i-1}$
Stabilization for join/leave
Figure: Chord identifier circle; finger table of node 8 (start → node):
  8+1 = 9 → 14
  8+2 = 10 → 14
  8+4 = 12 → 14
  8+8 = 16 → 21
  8+16 = 24 → 32
  8+32 = 40 → 42
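The finger rule (finger i of node n is the successor of n + 2^(i-1)) can be checked against node 8's table. A sketch assuming a 6-bit identifier space and a hypothetical node set chosen to be consistent with those fingers:

```python
M = 6                                      # hypothetical 6-bit identifier space: 0..63
NODES = [8, 14, 21, 32, 38, 42, 47, 58]    # hypothetical node IDs, sorted


def successor(ident: int) -> int:
    """First node whose ID equals or follows `ident` on the identifier circle."""
    ident %= 2 ** M
    for node in NODES:
        if node >= ident:
            return node
    return NODES[0]  # wrap around past the largest ID


def finger_table(n: int) -> list:
    """(start, node) pairs: finger i, for i = 1..M, is successor(n + 2**(i-1))."""
    starts = [(n + 2 ** (i - 1)) % 2 ** M for i in range(1, M + 1)]
    return [(s, successor(s)) for s in starts]


for start, node in finger_table(8):
    print(f"{start:2d} -> {node}")
print("key 54 is stored at node", successor(54))  # 58
```

This computes successors from a global node list for illustration; a Chord node knows only its own fingers and resolves a lookup in O(log N) hops by forwarding toward the closest preceding finger.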
19
P2P-SIP design alternatives
Figure: DHT routing of Route(d46a1c) among nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1; Chord rings labeled servers and clients
Use DHT in server farm
Use DHT for all clients; but some are resource-limited
Use DHT among super-nodes
  1. Hierarchy
  2. Dynamically adapt
20
P2P-SIP node architecture: registrar, proxy, user agent
DHT communication using SIP REGISTER
  Known node: sip:[email protected]
  Unknown node: sip:[email protected]
  User: sip:[email protected]
Figure: node components: user interface (buddy list, etc.); SIP, ICE, RTP/RTCP, codecs; audio devices; DHT (Chord). Operations: on startup, discover (multicast REG), peer found/detect NAT; signup, find buddies; join, find, leave (REG; REG, INVITE, MESSAGE); user location; IM, call; on reset, sign out, transfer
21
P2P-SIP node startup
SIP: REGISTER with SIP registrar
DHT
  Discover peers: multicast REGISTER
  Join DHT using node-key = Hash(ip)
  REGISTER with DHT using user-key = Hash([email protected])
Dialing out
  Call, instant message, etc.: INVITE sip:[email protected], MESSAGE sip:[email protected]
  Last seen, SIP NAPTR/SRV, DHT
Figure: a node registers with sipd (columbia.edu, REGISTER to DB) and detects peers; DHT ring with IDs 14, 32, 58; REGISTER alice=42 and REGISTER bob=12
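Node keys and user keys are drawn from the same identifier circle, so the node that stores a user's registration is the successor of that user's key. A sketch of the key derivation, assuming SHA-1 reduced to a hypothetical 32-bit space (the slide only says Hash; the address and user name are placeholders):

```python
import hashlib

M = 32  # hypothetical identifier length in bits


def chord_id(text: str, bits: int = M) -> int:
    """Place a string on the identifier circle: SHA-1 digest reduced to `bits` bits."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


node_key = chord_id("192.0.2.7")           # node-key = Hash(ip); placeholder address
user_key = chord_id("alice@example.com")   # user-key = Hash(user@domain); placeholder user
print(f"node-key = {node_key:#010x}, user-key = {user_key:#010x}")
```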
22
P2P-SIP node leaves
Graceful leave
  Un-REGISTER
  Transfer registrations
Failure
  Attached nodes detect and re-REGISTER
  New REGISTER goes to new super-nodes
  Super-nodes adjust DHT accordingly
Figure: DHT with key 42; OPTIONS and REGISTER key=42 messages
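The two departure cases differ in who moves the registrations: the leaving node itself, or the users' periodic re-REGISTER once the failure is detected. A toy model, assuming successor-based key placement as in Chord and a hypothetical 6-bit ring (class and method names are illustrative):

```python
import bisect


class Ring:
    """Toy DHT in which each registration is stored at the successor of its key."""

    def __init__(self, nodes, bits=6):
        self.size = 2 ** bits
        self.nodes = sorted(nodes)
        self.store = {n: {} for n in self.nodes}  # node -> {key: contact}

    def successor(self, key: int) -> int:
        i = bisect.bisect_left(self.nodes, key % self.size)
        return self.nodes[i % len(self.nodes)]  # wrap around the circle

    def register(self, key: int, contact: str) -> int:
        node = self.successor(key)
        self.store[node][key] = contact
        return node

    def leave(self, node: int, graceful: bool) -> None:
        """Graceful leave hands registrations to the new successor; a failure drops them."""
        held = self.store.pop(node)
        self.nodes.remove(node)
        if graceful:
            for key, contact in held.items():
                self.register(key, contact)


ring = Ring([14, 32, 42, 58])
ring.register(42, "sip:alice@192.0.2.10")   # stored at node 42
ring.leave(42, graceful=True)               # transferred to node 58
ring.leave(58, graceful=False)              # lost until the user re-REGISTERs
print(ring.register(42, "sip:alice@192.0.2.10"))  # 14: the re-REGISTER wraps around
```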
23
P2P-SIP implementation
sippeer: C++, Unix (Linux), Chord
  Nodes join and form the DHT
  Node failure is detected and the DHT updated
  Registrations transferred on node shutdown
  Co-located sipc can use sippeer service
24
P2P-SIP evaluation
Number of super-nodes needed depends on
  Registration refresh rate, replication
  Join/leave rate, uptime
  Call arrival rate
  CPU, memory, bandwidth limits
Other metrics
  Call setup latency
  Recovery time after super-node failure
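The listed dependencies can be turned into a rough sizing estimate. The sketch uses a hypothetical load model, not the evaluation from this work: it counts only registration refreshes and call lookups, treats super-node capacity as a single message rate, and leaves out join/leave churn and uptime:

```python
import math


def supernodes_needed(users, refresh_interval_s, replicas, calls_per_user_hour,
                      lookup_hops, per_node_msgs_per_s):
    """Rough count of super-nodes under a hypothetical load model.

    Registration load: each user's refresh is stored on `replicas` nodes.
    Call load: each call lookup costs `lookup_hops` messages.
    Join/leave churn and uptime are ignored.
    """
    register_load = users * replicas / refresh_interval_s          # messages/s
    call_load = users * calls_per_user_hour / 3600 * lookup_hops   # messages/s
    return math.ceil((register_load + call_load) / per_node_msgs_per_s)


# Hypothetical inputs: 100,000 users refreshing hourly with 2 replicas, one call
# per user per hour, 10 hops per lookup, 100 messages/s per super-node.
print(supernodes_needed(100_000, 3600, 2, 1, 10, 100))  # 4
```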
25
P2P-SIP advanced services and open issues
Offline messages
  If INVITE or MESSAGE fails, the responsible node stores the voicemail or instant message
Conferencing
  Mixer, full mesh, multicast
Open issues
  P2P reputation system
  Motivation to become super node
  Security (spam, DoS, spy, …)
  . . .
26
Server-based vs peer-to-peer
Reliability, failover latency
  Server-based: DNS-based; depends on client retry timeout, DB replication latency, registration refresh interval
  Peer-to-peer: DHT self-organization and periodic registration refresh; depends on client timeout, registration refresh interval
Scalability, number of users
  Server-based: depends on the number of servers in the two stages
  Peer-to-peer: depends on refresh rate, join/leave rate, uptime
Call setup latency
  Server-based: one or two steps
  Peer-to-peer: O(log N) steps
Security
  Server-based: TLS, digest authentication, S/MIME
  Peer-to-peer: additionally needs a reputation system, working around spy nodes
Maintenance, configuration
  Server-based: administrator: DNS, database, middle-box
  Peer-to-peer: automatic: one-time bootstrap node addresses
PSTN interoperability
  Server-based: gateways, TRIP, ENUM
  Peer-to-peer: interact with server-based infrastructure or co-locate peer node with the gateway
27
Summary
Motivation
  PSTN is reliable and scalable; can IP telephony do better?
Server-based
  DNS, stateless, DB replication, two-stage
Peer-to-peer
  SIP, DHT, soft state, self-organizing
28
Beyond proxy/registrar: CINEMA (Columbia InterNet Extensible Multimedia Architecture)
Figure: CINEMA servers and their surroundings
  sipd: proxy, redirect, registrar server
  sipconf: conference server
  sipum: unified messaging
  siph323: SIP-H.323 translator (H.323 clients such as NetMeeting)
  rtspd: media server (RTSP clients such as QuickTime)
  vxmlcgi (VXML)
  SQL database, web server, web-based configuration
  SIP/PSTN gateway to the department PBX and telephone switch (internal telephone ext. 7040, extensions 713x) and to the PSTN (local/long distance, e.g., 1-212-5551212)
29
Communication to collaboration
Synchronous (tightly coupled)
  Video conference, IM, screen sharing, …
Asynchronous (loosely coupled)
  File sharing, message board, …
Messaging and notifications
Personalized view
  Per-user calendar, access control, address book
Goal: provide personalized access, alternate between synchronous and asynchronous communication, and access from different devices and clients.