8/17/2019 RTP Audio and Video for the Internet
[ Te am LiB ]
• Table of Contents
RTP: Audio and Video for the Internet
By Colin Perkins
Publisher: Addison Wesley
Pub Date: June 12, 2003
ISBN: 0-672-32249-8
Pages: 432
The Real-time Transport Protocol (RTP) provides a framework for delivery of audio and video across IP networks
with unprecedented quality and reliability. In RTP: Audio and Video for the Internet, Colin Perkins, a leader of the
RTP standardization process in the IETF, offers readers detailed technical guidance for designing, implementing, and
managing any RTP-based system.
By bringing together crucial information that was previously scattered or difficult to find, Perkins has created an
incredible resource that enables professionals to leverage RTP's benefits in a wide range of Voice-over-IP (VoIP) and
streaming media applications. He demonstrates how RTP supports audio/video transmission in IP networks, and
shares strategies for maximizing performance, robustness, security, and privacy.
Comprehensive, exceptionally clear, and replete with examples, this book is the definitive RTP reference for every
audio/video application designer, developer, researcher, and administrator.
i s d o c um e n t i s c r e at e d w i t h t h e u n r eg i s te r ed v er s i on o f C H M 2P D F P i l o t
Table of Contents
Copyright
Preface
Introduction
Organization of the Book
Intended Audience
Acknowledgments
Part I. Introduction to Networked Multimedia
Chapter 1. An Introduction to RTP
A Brief History of Audio/Video Networking
A Snapshot of RTP
Related Standards
Overview of an RTP Implementation
Summary
Chapter 2. Voice and Video Communication Over Packet Networks
TCP/IP and the OSI Reference Model
Performance Characteristics of an IP Network
Measuring IP Network Performance
Effects of Transport Protocols
Requirements for Audio/Video Transport in Packet Networks
Summary
Part II. Media Transport Using RTP
Chapter 3. The Real-Time Transport Protocol
Fundamental Design Philosophies of RTP
Standard Elements of RTP
Related Standards
Future Standards Development
Summary
Chapter 4. RTP Data Transfer Protocol
RTP Sessions
The RTP Data Transfer Packet
Packet Validation
Translators and Mixers
Summary
Chapter 5. RTP Control Protocol
Components of RTCP
Transport of RTCP Packets
RTCP Packet Formats
Security and Privacy
Packet Validation
Participant Database
Timing Rules
Summary
Chapter 6. Media Capture, Playout, and Timing
Behavior of a Sender
Media Capture and Compression
Generating RTP Packets
Behavior of a Receiver
Packet Reception
The Playout Buffer
Adapting the Playout Point
Decoding, Mixing, and Playout
Summary
Chapter 7. Lip Synchronization
Sender Behavior
Receiver Behavior
Synchronization Accuracy
Summary
Part III. Robustness
Chapter 8. Error Concealment
Techniques for Audio Loss Concealment
Techniques for Video Loss Concealment
Interleaving
Summary
Chapter 9. Error Correction
Forward Error Correction
Channel Coding
Retransmission
Implementation Considerations
Summary
Chapter 10. Congestion Control
The Need for Congestion Control
Congestion Control on the Internet
Implications for Multimedia
Congestion Control for Multimedia
Summary
Part IV. Advanced Topics
Chapter 11. Header Compression
Introductory Concepts
Compressed RTP
Robust Header Compression
Considerations for RTP Applications
Summary
Chapter 12. Multiplexing and Tunneling
The Motivation for Multiplexing
Tunneling Multiplexed Compressed RTP
Other Approaches to Multiplexing
Summary
Chapter 13. Security Considerations
Privacy
Confidentiality
Authentication
Replay Protection
Denial of Service
Mixers and Translators
Active Content
Other Considerations
Summary
References
IETF RFC Standards
IETF Internet-Drafts
Other Standards
Conference and Journal Papers
Books
Web Sites
Other References
Copyright
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks.
Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the
designations have been printed with initial capital letters or in all capitals.
The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty
of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential
damages in connection with or arising out of the use of the information or programs contained herein.
The publisher offers discounts on this book when ordered in quantity for bulk purchases and special sales. For more
information, please contact:
U.S. Corporate and Government Sales
(800) 382-3419
For sales outside of the U.S., please contact:
International Sales
(317) 581-3793
Visit Addison-Wesley on the Web: www.awprofessional.com
Library of Congress Cataloging-in-Publication Data
LCCN: 2001089234
Copyright © 2003 by Pearson Education, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any
form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of
the publisher. Printed in the United States of America. Published simultaneously in Canada.
For information on obtaining permission for use of material from this work, please submit a written request to:
Pearson Education, Inc.
Rights and Contracts Department
75 Arlington Street, Suite 300
Preface
Introduction
Organization of the Book
Intended Audience
Introduction
This book describes the protocols, standards, and architecture of systems that deliver real-time voice, music, and
video over IP networks, such as the Internet. These systems include voice-over-IP, telephony, teleconferencing,
streaming video, and webcasting applications. The book focuses on media transport: how to deliver audio and video
reliably across an IP network, how to ensure high quality in the face of network problems, and how to ensure that the
system is secure.
The book adopts a standards-based approach, based around the Real-time Transport Protocol (RTP) and its
associated profiles and payload formats. It describes the RTP framework, how to build a system that uses that
framework, and extensions to RTP for security and reliability.
Many media codecs are suitable for use with RTP. Examples include MPEG audio and video; ITU H.261 and H.263
video; G.711, G.722, G.726, G.728, and G.729 audio; and industry standards such as GSM, QCELP, and AMR
audio. RTP implementations typically integrate existing media codecs, rather than developing them specifically.
Accordingly, this book describes how media codecs are integrated into an RTP system, but not how media codecs
are designed.
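To make the integration point concrete: an RTP packet does not carry a codec name, only a payload type number, and the profile in use maps that number to a media format and a clock rate for the RTP timestamp. The sketch below lists a few static assignments from the RTP audio/video profile (as given in RFC 3551) and uses the clock rate to work out how far the timestamp advances per frame. The table name and the `timestamp_increment` helper are hypothetical, for illustration only; dynamic payload types, which are bound to a format through signaling, are not covered.

```python
# Hypothetical lookup table: a few static payload types from the RTP/AVP
# profile (RFC 3551), mapping encoding name -> (payload type, clock rate in Hz).
STATIC_PAYLOAD_TYPES = {
    "PCMU": (0, 8000),    # G.711 mu-law audio
    "GSM": (3, 8000),     # GSM full-rate audio
    "PCMA": (8, 8000),    # G.711 A-law audio
    "G722": (9, 8000),    # G.722: clock rate is 8000 even though it samples at 16 kHz
    "MPA": (14, 90000),   # MPEG audio
    "G728": (15, 8000),   # G.728 audio
    "G729": (18, 8000),   # G.729 audio
    "H261": (31, 90000),  # ITU H.261 video
    "MPV": (32, 90000),   # MPEG-1/MPEG-2 video
    "H263": (34, 90000),  # ITU H.263 video
}


def timestamp_increment(encoding: str, frame_ms: float) -> int:
    """How far the RTP timestamp advances for frame_ms milliseconds of media."""
    _payload_type, clock_rate = STATIC_PAYLOAD_TYPES[encoding]
    return round(clock_rate * frame_ms / 1000)
```

For example, 20 ms of G.711 µ-law audio advances the timestamp by 160, and one frame of 30 frame/s video on the 90 kHz clock advances it by 3,000.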
Call setup, session initiation, and control protocols, such as SIP, RTSP, and H.323, are also outside the scope of this
book. Most RTP implementations are used as part of a complete system, driven by one of these control protocols.
However, the interactions between the various parts of the system are limited, and it is possible to understand media
transport without understanding the signaling. Similarly, session description using SDP is not covered, because it is
part of the signaling.
Resource reservation is useful in some situations, but it is not required for the correct operation of RTP. This book
touches on the use of resource reservation through both the Integrated Services and the Differentiated Services
frameworks, but it does not go into details.
That these areas are not covered in this book does not mean that they are unimportant. A system using RTP will use
a range of media codecs and will employ some form of call setup, session initiation, or control. The way this is done
depends on the application, though: The needs of a telephony system are very different from those of a webcasting
application. This book describes only the media transport layer that is common to all those systems.
Organization of the Book
The book is logically divided into four parts. Part I, Introduction to Networked Multimedia, introduces the problem
space, provides background, and outlines the properties of the Internet that affect audio/video transport:
Chapter 1, An Introduction to RTP, gives a brief introduction to the Real-time Transport Protocol, outlines
the relationship between RTP and other standards, and describes the scope of the book.
Chapter 2, Voice and Video Communication over Packet Networks, describes the unique environment
provided by IP networks, and how this environment affects packet audio/video applications.
The next five chapters, which constitute Part II, Media Transport Using RTP, discuss the basics of the Real-time
Transport Protocol.
Road Map for This Book
You will need this information to design and build a tool for voice-over-IP, streaming music or video, and so on.
Intended Audience
This book describes audio/video transport over IP networks in considerable detail. It assumes some basic familiarity
with IP network programming and the operation of network protocol stacks, and it builds on this knowledge to
describe the features unique to audio/video transport. An extensive list of references is included, pointing readers to
additional information on specific topics and to background reading material.
Several classes of readers might be expected to find this book useful:
Engineers. The primary audience is those building voice-over-IP applications, teleconferencing systems, and
streaming media and webcasting applications. This book is a guide to the design and implementation of the
media engine of such systems. It should be read in conjunction with the relevant technical standards, and it
builds on those standards to show how a system is built. This book does not discuss signaling (for example,
SIP, RTSP, or H.323), which is a separate subject worthy of a book in its own right. Instead it talks in detail
about media transport, and how to achieve good-quality audio and smooth-motion video over IP networks.
Students. The book can be read as an accompaniment to a course in network protocol design or
telecommunications, at either a graduate or an advanced undergraduate level. Familiarity with IP networks
and layered protocol architectures is assumed. The unique aspects of protocols for real-time audio/video
transport are highlighted, as are the differences from a typical layered system model. The cross-disciplinary
nature of the subject is also emphasized, in particular the relation between the psychology of human perception
and the demands of robust media delivery.
Researchers. Academics and industrial researchers can use this book as a source of information about the
standards and algorithms that constitute the current state of the art in real-time audio/video transport over IP
networks. Pointers to the literature are included in the References section, and they will be useful starting
points for those seeking further depth and areas where more research is needed.
Network administrators. An understanding of the technical protocols underpinning the common streaming
audio/video applications is useful for those administering computer networks—to show how those
applications can affect the behavior of the network, and how the network can be engineered to suit those
applications better. This book includes extensive discussion of the most common network behavior (and how
applications can adapt to it), the needs of congestion control, and the security implications of real-time
audio/video traffic.
In summary, this book can be used as a reference, in conjunction with the technical standards, as a study guide, or as
part of an advanced course on network protocol design or communication technology.
Acknowledgments
A book such as this is not written in isolation; rather it is shaped by the author's experiences and interactions with
other researchers and practitioners. In large part, I gained my experience while working at University College
London. I am grateful to Vicky Hardman, Peter Kirstein, and Angela Sasse for the opportunity to work on their
projects, and to Anna Bouch, Ian Brown, Anna Conniff, Jon Crowcroft, Panos Gevros, Atanu Ghosh, Mark
Handley, Tristan Henderson, Orion Hodson, Nadia Kausar, Isidor Kouvelas, Piers O'Hanlon, Louise Sheeran,
Lorenzo Vicisano, and everyone else associated with G11, for providing a stimulating working environment, and a
distracting social scene.
I wish to thank my colleagues at the USC Information Sciences Institute for their support, in particular Alec
Aakesson, Helen Ellis, Jarda Flidr, Ladan Gharai, Tom Lehman, Dan Massey, and Nikhil Mittal. Allison Mankin
provided the opportunity to work at USC/ISI, for which I am grateful.
On a personal note, Peter Phillips, Stewart Cambridge, Sonja Krugmann, and Alison Gardiner each helped me make
the big move, in their own special way. I thank you.
The staff at Addison-Wesley did an excellent job in the production of this book. In particular, Dayna Isley and Jessica
Goldstein provided encouragement to a new author and showed great patience during endless revisions. Thanks are
also due to Amy Fleischer, Elizabeth Finney, Laurie McGuire, Cheri Clark, Rebecca Martin, and Stephanie Hiebert.
The technical editors—Steve Casner and Orion Hodson—did sterling work, significantly improving the quality of the
book, correcting many mistakes and contributing significantly to the text. Any errors that remain are mine alone.
Part I: Introduction to Networked Multimedia
1 An Introduction to RTP
2 Voice and Video Communication over Packet Networks
Chapter 1. An Introduction to RTP
A Brief History of Audio/Video Networking
A Snapshot of RTP
Related Standards
Overview of an RTP Implementation
The Internet is changing: Static content is giving way to streaming video, text is being replaced by music and the
spoken word, and interactive audio and video is becoming commonplace. These changes require new applications,
and they pose new and unique challenges for application designers.
This book describes how to build these new applications: voice-over-IP, telephony, teleconferencing, streaming
video, and webcasting. It looks at the challenges inherent in reliable delivery of audio and video across an IP network,
and it explains how to ensure high quality in the face of network problems, as well as how to ensure that the system is
secure. The emphasis is on open standards, in particular those devised by the Internet Engineering Task Force (IETF)
and the International Telecommunication Union (ITU), rather than on proprietary solutions.
This chapter begins our examination of the Real-time Transport Protocol (RTP) with a brief look at the history of
audio/video networking and an overview of RTP and its relation to other standards.
Throughout this text, extensive references are provided, as indicated by superscript numbers that map to
entries in the References section at the end of the book. Because the RTP standard is still evolving, and
because it intersects with so many other technologies, these references are provided to help readers gain
additional background information and pursue further research interests.
A Brief History of Audio/Video Networking
The idea of using packet networks, such as the Internet, to transport voice and video is not new. Experiments with
voice over packet networks stretch back to the early 1970s. The first RFC on this subject, the Network Voice
Protocol (NVP)1, dates from 1977. Video came later, but there is still more than ten years of experience with
audio/video conferencing and streaming on the Internet.
Early Packet Voice and Video Experiments
The initial developers of NVP were researchers transmitting packet voice over the ARPANET, the predecessor to
the Internet. The ARPANET provided a reliable-stream service (analogous to TCP/IP), but this introduced too much
delay, so an "uncontrolled packet" service was developed, akin to the modern UDP/IP datagrams used with RTP.
The NVP was layered directly over this uncontrolled packet service. Later the experiments were extended beyond
the ARPANET to interoperate with the Packet Radio Network and the Atlantic Satellite Network (SATNET),
running NVP over those networks.
All of these early experiments were limited to one or two voice channels at a time by the low bandwidth of the early
networks. In the 1980s, the creation of the 3-Mbps Wideband Satellite Network enabled not only a larger number of
voice channels but also the development of packet video. To access the one-hop, reserved-bandwidth, multicast
service of the satellite network, a connection-oriented inter-network protocol called the Stream Protocol (ST) was
developed. Both a second version of NVP, called NVP-II, and a companion Packet Video Protocol were
transported over ST to provide a prototype packet-switched video teleconferencing service.
In 1989–1990, the satellite network was replaced with the Terrestrial Wideband Network and a research network
called DARTnet, while ST evolved into ST-II. The packet video conferencing system was put into scheduled
production to support geographically distributed meetings of network researchers and others at up to five sites
simultaneously.
ST and ST-II were operated in parallel with IP at the inter-network layer but achieved only limited deployment on
government and research networks. As an alternative, initial deployment of conferencing using IP began on DARTnet,
enabling multiparty conferences with NVP-II transported over multicast UDP/IP. At the March 1992 meeting of the
IETF, audio was transmitted across the Internet to 20 sites on three continents over multicast "tunnels"—the Mbone
(which stands for "multicast backbone")—extended from DARTnet. At that same meeting, development of RTP was
begun.
Audio and Video on the Internet
Following from these early experiments, interest in video conferencing within the Internet community took hold in the
early 1990s. At about this time, the processing power and multimedia capabilities of workstations and PCs became
sufficient to enable the simultaneous capture, compression, and playback of audio and video streams. In parallel,
development of IP multicast allowed the transmission of real-time data to any number of recipients connected to the
Internet.
Video conferencing and multimedia streaming were obvious and well-executed multicast applications. Research
groups took to developing tools such as vic and vat from the Lawrence Berkeley Laboratory,87 nevot from the
University of Massachusetts, the INRIA video conferencing system, nv from Xerox PARC, and rat from University
College London.77 These tools followed a new approach to conferencing, based on connectionless protocols, the
end-to-end argument, and application-level framing.65,70,76 Conferences were minimally managed, with no
admission or floor control, and the transport layer was thin and adaptive. Multicast was used both for wide-area data
transmission and as an interprocess communication mechanism between applications on the same machine (to
exchange synchronization information between audio and video tools). The resulting collaborative environment
consisted of lightly coupled applications and highly distributed participants.
The multicast conferencing (Mbone) tools had a significant impact: They led to widespread understanding of the
problems inherent in delivering real-time media over IP networks, the need for scalable solutions, and error and
A Snapshot of RTP
The key standard for audio/video transport in IP networks is the Real-time Transport Protocol (RTP), along with its
associated profiles and payload formats. RTP aims to provide services useful for the transport of real-time media,
such as audio and video, over IP networks. These services include timing recovery, loss detection and correction,
payload and source identification, reception quality feedback, media synchronization, and membership management.
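Several of these services rest directly on fields in the fixed RTP header: the sequence number supports loss detection, the timestamp supports timing recovery, the payload type identifies the payload format, and the SSRC identifies the source. As a preview of the packet format covered in Chapter 4, here is a minimal Python sketch that packs and parses the 12-byte fixed header. The function names are hypothetical, and the sketch assumes no padding, no header extension, and an empty CSRC list.

```python
import struct

RTP_VERSION = 2
FIXED_HEADER = struct.Struct("!BBHII")  # network byte order, 12 bytes total


def pack_rtp_header(payload_type: int, seq: int, timestamp: int, ssrc: int,
                    marker: bool = False) -> bytes:
    """Build a fixed RTP header with P=0, X=0 and CC=0 (no CSRC list)."""
    if not 0 <= payload_type <= 127:
        raise ValueError("payload type is a 7-bit field")
    first = RTP_VERSION << 6                       # V=2 in the top two bits
    second = (int(marker) << 7) | payload_type     # M bit, then 7-bit PT
    return FIXED_HEADER.pack(first, second, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Parse the fixed header; any CSRCs (csrc_count * 4 bytes) follow it."""
    if len(data) < FIXED_HEADER.size:
        raise ValueError("too short to hold an RTP header")
    first, second, seq, timestamp, ssrc = FIXED_HEADER.unpack_from(data)
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence_number": seq,      # used for loss detection
        "timestamp": timestamp,      # used for timing recovery
        "ssrc": ssrc,                # identifies the source
    }
```

A packet is then simply this header followed by the payload bytes, for example `pack_rtp_header(0, 1, 160, ssrc) + payload`.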
RTP was originally designed for use in multicast conferences, using the lightweight sessions model. Since that time, it
has proven useful for a range of other applications: in H.323 video conferencing, webcasting, and TV distribution; and
in both wired and cellular telephony. The protocol has been demonstrated to scale from point-to-point use to
multicast sessions with thousands of users, and from low-bandwidth cellular telephony applications to the delivery of
uncompressed High-Definition Television (HDTV) signals at gigabit rates.
RTP was developed by the Audio/Video Transport working group of the IETF and has since been adopted by the
ITU as part of its H.323 series of recommendations, and by various other standards organizations. The first version of
RTP was completed in January 1996.6 RTP needs to be profiled for particular uses before it is complete; an initial
profile was defined along with the RTP specification,7 and several more profiles are under development. Profiles are
accompanied by several payload format specifications, describing the transport of a particular media format.
Development of RTP is ongoing, and a revision is nearing completion at the time of this writing.49,50
A detailed introduction to RTP is provided in Chapter 3, The Real-time Transport Protocol, and most of this book
discusses the design of systems that use RTP and its various extensions.
Related Standards
In addition to RTP, a complete system typically requires the use of various other protocols and standards for session
announcement, initiation, and control; media compression; and network transport.
Figure 1.1 shows how the negotiation and call control protocols, the media transport layer (provided by RTP), the
compression-decompression algorithms (codecs), and the underlying network are related, according to both the IETF
and ITU conferencing frameworks. The two parallel sets of call control and media negotiation standards use the same
media transport framework. Likewise, the media codecs are common no matter how the session is negotiated and
irrespective of the underlying network transport.
Figure 1.1. IETF and ITU Protocols for Audio/Video Transport on the Internet
The relation between these standards and RTP is outlined further in Chapter 3, The Real-time Transport Protocol.
However, the main focus of this book is media transport, rather than signaling and control.
Overview of an RTP Implementation
As Figure 1.1 shows, the core of any system for delivery of real-time audio/video over IP is RTP: It provides the
common media transport layer, independent of the signaling protocol and application. Before we look in more detail
at RTP and the design of systems using RTP, it will be useful to have an overview of the responsibilities of RTP
senders and receivers in a system.
Behavior of an RTP Sender
A sender is responsible for capturing and transforming audiovisual data for transmission, as well as for generating
RTP packets. It may also participate in error correction and congestion control by adapting the transmitted media
stream in response to receiver feedback. A diagram of the sending process is shown in Figure 1.2.
Figure 1.2. Block Diagram of an RTP Sender
Uncompressed media data—audio or video—is captured into a buffer, from which compressed frames are
produced. Frames may be encoded in several ways depending on the compression algorithm used, and encoded
frames may depend on both earlier and later data.
Compressed frames are loaded into RTP packets, ready for sending. If frames are large, they may be fragmented
into several RTP packets; if they are small, several frames may be bundled into a single RTP packet. Depending on
the error correction scheme in use, a channel coder may be used to generate error correction packets or to reorder
packets before transmission.
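The fragment-or-bundle decision can be sketched as follows. This is a simplification under stated assumptions: `max_payload` is a hypothetical per-packet byte budget (in practice derived from the path MTU minus IP, UDP, and RTP header overhead), and the sketch ignores the payload headers and fragmentation rules that each real payload format specification defines. Both function names are illustrative, not from any library.

```python
from typing import Iterable, List


def fragment_frame(frame: bytes, max_payload: int) -> List[bytes]:
    """Split one large compressed frame into pieces of at most max_payload bytes."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)]


def bundle_frames(frames: Iterable[bytes], max_payload: int) -> List[List[bytes]]:
    """Group consecutive small frames so that each group fits in one payload."""
    groups: List[List[bytes]] = []
    current: List[bytes] = []
    size = 0
    for frame in frames:
        if len(frame) > max_payload:
            raise ValueError("frame too large to bundle; fragment it instead")
        if current and size + len(frame) > max_payload:
            groups.append(current)          # this packet is full; start another
            current, size = [], 0
        current.append(frame)
        size += len(frame)
    if current:
        groups.append(current)              # flush the final, partly filled packet
    return groups
```

With a 1,200-byte budget, a 3,000-byte video frame becomes three packets of 1,200, 1,200, and 600 bytes, while five 33-byte audio frames with a 100-byte budget are bundled as three frames plus two frames. The greedy grouping keeps frames in order, which matters because the RTP timestamp of a packet refers to its first frame.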
After the RTP packets have been sent, the buffered media data corresponding to those packets is eventually freed.
The sender must not discard data that might be needed for error correction or for the encoding process. This
requirement may mean that the sender must buffer data for some time after the corresponding packets have been sent,
depending on the codec and error correction scheme used.
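One simple way to meet this retention requirement is a bounded buffer of recently sent packets, indexed by sequence number, that frees the oldest entry once the bound is reached. The `RetentionBuffer` class below is a hypothetical sketch: it bounds retention by packet count for simplicity, whereas a real sender would size the window according to the codec and error correction scheme, as described above.

```python
from collections import OrderedDict
from typing import Optional


class RetentionBuffer:
    """Hypothetical store of recently sent packets, kept after transmission
    so they remain available, e.g. for retransmission or building FEC packets."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._packets = OrderedDict()  # sequence number -> packet bytes, oldest first

    def sent(self, seq: int, packet: bytes) -> None:
        """Record a packet that has just been sent; free the oldest when full."""
        self._packets[seq] = packet
        self._packets.move_to_end(seq)          # newest entries live at the end
        while len(self._packets) > self.capacity:
            self._packets.popitem(last=False)   # discard the oldest packet

    def get(self, seq: int) -> Optional[bytes]:
        """Return a still-buffered packet, or None if it has been freed."""
        return self._packets.get(seq)
```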
The sender is responsible for generating periodic status reports for the media streams it is generating, including those
required for lip synchronization. It also receives reception quality feedback from other participants and may use that
information to adapt its transmission.
Behavior of an RTP Receiver
A receiver is responsible for collecting RTP packets from the network, correcting any losses, recovering the timing,
decompressing the media, and presenting the result to the user. It also sends reception quality feedback, allowing the
sender to adapt the transmission to the receiver, and it maintains a database of participants in the session. A possible
block diagram for the receiving process is shown in Figure 1.3; implementations sometimes perform the operations in
a different order depending on their needs.
Figure 1.3. Block Diagram of an RTP Receiver
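Before a receiver can correct losses, it has to notice them, and the usual signal is a gap in the 16-bit RTP sequence number. The sketch below computes the gap modulo 2^16 so that it survives wraparound. The helper names are hypothetical, and the `max_gap` threshold (3000 here, an assumed value) is one way to avoid mistaking a late, reordered packet or a sequence-number jump for a huge burst of losses.

```python
from typing import List

SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap around


def seq_delta(prev: int, cur: int) -> int:
    """Forward distance from prev to cur, modulo 2**16."""
    return (cur - prev) % SEQ_MOD


def missing_between(prev: int, cur: int, max_gap: int = 3000) -> List[int]:
    """Sequence numbers skipped between two consecutively received packets.

    A duplicate (delta 0) or a jump larger than max_gap, which is what a
    late, reordered packet looks like, is not reported as loss.
    """
    delta = seq_delta(prev, cur)
    if delta == 0 or delta > max_gap:
        return []
    return [(prev + i) % SEQ_MOD for i in range(1, delta)]
```

A late packet arriving after its successors produces a delta close to 65,536, so it falls above the threshold and is not reported as a loss; whether such a packet can still fill a hole in the playout buffer is a separate decision.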
Summary
This chapter has introduced the protocols and standards for real-time delivery of multimedia over IP networks, in
particular the Real-time Transport Protocol (RTP). The remainder of this book discusses the features and use of RTP
in detail. The aim is to expand on the standards documents, explaining both the rationale behind the standards and
possible implementation choices and their trade-offs.
Chapter 2. Voice and Video Communication over Packet Networks
TCP/IP and the OSI Reference Model
Performance Characteristics of an IP Network
Measuring IP Network Performance
Effects of Transport Protocols
Requirements for Audio/Video Transport in Packet Networks
Before delving into details of RTP, you should understand the properties of IP networks such as the Internet, and
how they affect voice and video communication. This chapter reviews the basics of the Internet architecture and
outlines typical behavior of a network connection. This review is followed by a discussion of the transport
requirements for audio and video, and how well these requirements are met by the network.
IP networks have unique characteristics that influence the design of applications and protocols for audio/video
transport. Understanding these characteristics is vital if you are to appreciate the trade-offs involved in the design of
RTP, and how they influence applications that use RTP.
TCP/IP and the OSI Reference Model
When you're thinking about computer networks, it is important to understand the concepts and implications of
protocol layering. The OSI reference model,93 illustrated in Figure 2.1, provides a useful basis for discussion and
comparison of layered systems.
Figure 2.1. The OSI Reference Model
The model comprises a set of seven layers, each building on the services provided by the lower layer and, in turn,
providing a more abstract service to the layer above. The functions of the layers are as listed here:
1. Physical layer. The lowest layer, the physical layer, includes the physical network connection devices and
protocols, such as cables, plugs, switches, and electrical standards.
2. Data link layer. The data link layer builds on the physical connection; for example, it turns a twisted-pair cable
into Ethernet. This layer provides framing for data transport units, defines how the link is shared among
multiple connected devices, and supplies addressing for devices on each link.
3. Network layer. The network layer connects links, unifying them into a single network. It provides addressing
and routing of messages through the network. It may also provide control of congestion in the switches,
prioritization of certain messages, billing, and so on. A network layer device processes messages received
from one link and dispatches them to another, using routing information exchanged with its peers at the far
ends of those links.
4. Transport layer. The transport layer is the first end-to-end layer. It takes responsibility for delivery of
messages from one system to another, using the services provided by the network layer. This responsibility
includes providing reliability and flow control if they are needed by the session layer and not provided by the
network layer.
Performance Characteristics of an IP Network
As is apparent from the hourglass model of the Internet architecture, an application is hidden from the details of the
lower layers by the abstraction of IP. This means it's not possible to determine directly the types of networks across
which an IP packet will have traveled (it could be anything from a 14.4-kilobit cellular radio connection to a
multi-gigabit optical fiber) or the level of congestion of that network. The only means of discovering the performance
of the network are observation and measurement.
So what do we need to measure, and how do we measure it? Luckily, the design of the IP layer means that the
number of parameters is limited, and that number often can be further constrained by the needs of the application. The
most important questions we can ask are these:
What is the probability that a packet will be lost in the network?
What is the probability that a packet will be corrupted in the network?
How long does a packet take to traverse the network? Is the transit time constant or variable?
What size of packet can be accommodated?
What is the maximum rate at which we can send packets?
The next section provides some sample measurements for the first four listed parameters. The maximum rate is
closely tied to the probability that packets are lost in the network, as discussed in Chapter 10, Congestion Control.
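The following is a rough sketch of how a probe experiment might answer the loss and transit-time questions and also count duplicates. It assumes that each probe carries a sequence number and that the sender and receiver log send and receive times. The function `summarize_probe_trace` and its input format are hypothetical. The two logs use unsynchronized clocks, so only the spread of transit times is meaningful here, not their absolute value. Corruption and maximum packet size would need separate tests.

```python
import statistics


def summarize_probe_trace(sent, received):
    """Summarize a hypothetical probe experiment.

    sent:     dict seq -> send time in seconds (sender's clock)
    received: list of (seq, receive time) in arrival order (receiver's clock)
    """
    seen = set()
    duplicates = 0
    transit = []
    for seq, recv_time in received:
        if seq in seen:
            duplicates += 1  # the network delivered this probe more than once
            continue
        seen.add(seq)
        transit.append(recv_time - sent[seq])  # includes unknown clock offset
    lost = len(sent) - len(seen)
    return {
        "loss_fraction": lost / len(sent),
        "duplicates": duplicates,
        "mean_transit": statistics.mean(transit) if transit else None,
        "transit_stdev": statistics.pstdev(transit) if transit else None,
    }
```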
What affects such measurements? The obvious factor is the location of the measurement stations. Measurements
taken between two systems on a LAN will clearly show properties different from those of a transatlantic connection!
But geography is not the only factor; the number of links traversed (often referred to as the number of hops), the
number of providers crossed, and the times at which the measurements are taken all are factors. The Internet is a
large, complex, and dynamic system, so care must be taken to ensure that any measurements are representative of the
part of the network where an application is to be used.
We also have to consider what sort of network is being used, what other traffic is present, and how much other
traffic is present. To date, the vast majority of network paths are fixed, wired (either copper or optical fiber)
connections, and the vast majority of traffic (96% of bytes, 62% of flows, according to a recent estimate123) is TCP
based. The implications of these traffic patterns are as follows:
Because the infrastructure is primarily wired and fixed, the links are very reliable, and loss is caused mostly
by congestion in the routers.
TCP transport makes the assumption that packet loss is a signal that the bottleneck bandwidth has been
reached, that congestion is occurring, and that the sender should reduce its sending rate. A TCP flow will increase
its sending rate until loss is observed, and then back off, as a way of determining the maximum rate a particular
connection can support. Of course, the result is a temporary overloading of the bottleneck link, which may
affect other traffic.
If the composition of the network infrastructure or the traffic changes, other sources of loss may become important.
For example, a large increase in the number of wireless users would likely increase the proportion of loss due to
Measuring IP Network Performance
This section outlines some of the available data on IP network performance, including published results on average
packet loss, patterns of loss, packet corruption and duplication, transit time, and the effects of multicast.
Several studies have measured network behavior over a wide range of conditions on the public Internet. For
example, Paxson reports on the behavior of 20,000 transfers among 35 sites in 9 countries;124,95 Handley122 and
Bolot67,66 report on the behavior of multicast sessions; and Yajnik, Moon, Kurose, and Towsley report on the
temporal dependence in packet loss statistics.89,108,109 Other sources of data include the traffic archives
maintained by CAIDA (the Cooperative Association for Internet Data Analysis),117 the NLANR (National
Laboratory for Applied Network Research),119 and the ACM (Association for Computing Machinery).116
Average Packet Loss
Various packet loss metrics can be studied. For example, the average loss rate gives a general measure of network
congestion, while loss patterns and correlation give insights into the dynamics of the network.
The reported measurements of average packet loss rate show a range of conditions. For example, measurements of
TCP/IP traffic taken by Paxson in 1994 and 1995 show that 30% to 70% of flows, depending on path taken and
date, showed no packet loss, but of those flows that did show loss, the average loss ranged from 3% to 17% (these
results are summarized in Table 2.1). Data from Bolot, using 64-kilobit PCM-encoded audio, shows similar patterns,
with loss rates between 4% and 16% depending on time of day, although this data also dates from 1995. More recent
results from Yajnik et al., taken using simulated audio traffic in 1997–1998, show lower loss rates of 1.38% to
11.03%. Handley's results (two sets of approximately 3.5 million packets of data and reception report statistics for
multicast video sessions in May and September 1996) show loss averaged over five-second intervals varying
between 0% and 100%, depending on receiver location and time of day. A sample for one particular receiver during
a ten-hour period on May 29, 1996, plotted in Figure 2.5, shows the average loss rate, sampled over five-second
intervals, varying between 0% and 20%.
Figure 2.5. Loss Rate Distribution versus Time 122
Table 2.1. Packet Loss Rates for Various Regions95
                 Fraction of flows         Average loss rate
                 showing no loss           for flows with loss
Region           Dec. 1994    Dec. 1995    Dec. 1994    Dec. 1995
Within Europe    48%          58%          5.3%         5.9%
Within U.S.      66%          69%          3.6%         4.4%
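Handley's figures above are loss rates averaged over five-second intervals. The following is a minimal sketch of that kind of averaging. It assumes per-packet records giving each packet's send time and whether it arrived. Handley's data also drew on reception reports, which this sketch ignores. The function name and record format are hypothetical.

```python
from collections import defaultdict


def loss_per_interval(probes, interval_s=5.0):
    """Loss percentage for each fixed-length interval of a probe trace.

    probes: iterable of (send_time_s, was_received) pairs.
    Returns {interval_index: loss_percent}, ordered by interval; intervals
    in which nothing was sent do not appear.
    """
    sent = defaultdict(int)
    lost = defaultdict(int)
    for send_time, was_received in probes:
        bucket = int(send_time // interval_s)  # interval this probe falls in
        sent[bucket] += 1
        if not was_received:
            lost[bucket] += 1
    return {b: 100.0 * lost[b] / sent[b] for b in sorted(sent)}
```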
Effects of Transport Protocols
Thus far, our consideration of network characteristics has focused on IP. Of course, programmers almost never use
the raw IP service. Instead, they build their applications on top of one of the higher-layer transport protocols, typically
either UDP or TCP. These protocols provide additional features beyond those provided by IP. How do these added
features affect the behavior of the network as seen by the application?
UDP/IP
The User Datagram Protocol (UDP) provides a minimal set of extensions to IP. The UDP header is shown in Figure
2.14. It comprises 64 bits of additional header representing source and destination port identifiers, a length field, and a
checksum.
Figure 2.14. Format of a UDP Header
The source and destination ports identify the endpoints within the communicating hosts, allowing for multiplexing of
different services onto different ports. Some services run on well-known ports; others use a port that is dynamically
negotiated during call setup. The length field is redundant with that in the IP header. The checksum is used to detect
corruption of the payload and is optional when UDP runs over IPv4 (it is set to zero for applications that have no use
for a checksum); over IPv6 it is mandatory.
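The four UDP header fields are each 16 bits wide, which gives the 64 bits of header. The following is a minimal sketch of building and parsing that header with Python's standard `struct` module. The helper names are hypothetical. The checksum is left at zero, which IPv4 permits, rather than computed over the IPv4 or IPv6 pseudo-header, so this is for illustration only. Applications normally let the operating system's sockets interface build this header.

```python
import struct

# UDP header: four 16-bit fields in network byte order (8 bytes = 64 bits).
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_header(src_port, dst_port, payload_len, checksum=0):
    """Pack a UDP header. The length field covers header plus payload.
    A checksum of 0 means "no checksum", which only IPv4 allows."""
    length = UDP_HEADER.size + payload_len
    return UDP_HEADER.pack(src_port, dst_port, length, checksum)


def parse_udp_header(datagram):
    """Split a UDP datagram into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum,
            "payload": datagram[UDP_HEADER.size:length]}
```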
Apart from the addition of ports and a checksum, UDP provides the raw IP service. It does not provide any
enhanced reliability to the transport (although the checksum does allow for detection of payload errors that IP does
not detect), nor does it affect the timing of packet delivery. An application using UDP provides data packets to the
transport layer, which delivers them to a port on the destination machine (or to a group of machines if multicast is
used). Those packets may be lost, delayed, or misordered in transit, exactly as observed for the raw IP service.
TCP/IP
The most common transport protocol on the Internet is TCP. Although UDP provides only a small set of additions to
the IP service, TCP adds a significant amount of additional functionality: It abstracts the unreliable packet delivery
service of IP to provide reliable, sequential delivery of a byte stream between ports on the source and a single
destination host.
An application using TCP provides a stream of data to the transport layer, which divides it into appropriately sized
packets for transmission and sends them at a rate suitable for the network. Packets are acknowledged by the receiver,
and those that are lost in transit are retransmitted by the source. When data arrives, it is buffered at the receiver so that
it can be delivered in order. This process is transparent to the application, which simply sees a "pipe" of data flowing
across the network.
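The following sketch shows why in-order delivery costs time. Segments that arrive early must wait until every earlier segment has arrived, so one missing segment stalls everything behind it. The class is a hypothetical simplification that numbers segments 0, 1, 2, and so on. Real TCP tracks byte offsets instead.

```python
class InOrderReceiver:
    """Illustrative in-order delivery in the spirit of TCP's receive buffer.
    Simplified: segments are numbered 0, 1, 2, ... instead of by byte offset."""

    def __init__(self):
        self.next_expected = 0
        self.pending = {}  # segment number -> data that arrived out of order

    def receive(self, seg, data):
        """Accept one segment; return the data that can now be delivered."""
        if seg < self.next_expected or seg in self.pending:
            return []  # duplicate: already delivered or already buffered
        self.pending[seg] = data
        delivered = []
        # Release the contiguous run starting at the next expected segment.
        while self.next_expected in self.pending:
            delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return delivered
```

If segment 1 is lost and later retransmitted, segments 2 and 3 sit in `pending` until it arrives. This is the added delay that makes TCP a poor fit for interactive media.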
As long as the application provides sufficient data, the TCP transport layer will increase its sending rate until the
network exhibits packet loss. Packet loss is treated as a signal that the bandwidth of the bottleneck link has been
exceeded and the connection should reduce its sending rate to match. Accordingly, TCP reduces its sending rate
when loss occurs. This process continues, with TCP continually probing the sustainable transmission rate across the
network; the result is a sending rate such as that illustrated in Figure 2.15.
Figure 2.15. Sample TCP Sending Rate
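TCP's congestion avoidance is commonly modeled as additive increase, multiplicative decrease (AIMD), which produces a sawtooth like the one in Figure 2.15. The sketch below is a deliberately simplified model. It assumes one rate decision per round-trip time and halving on loss, with no slow start or timeouts. It is not a TCP implementation, and the function and parameter names are hypothetical.

```python
def aimd_rates(loss_events, start=1.0, increase=1.0, decrease_factor=0.5):
    """Simplified additive-increase, multiplicative-decrease (AIMD) trace.

    loss_events: one boolean per round-trip time, True if loss was seen.
    Rates are in arbitrary units (e.g., packets per round trip). Slow
    start, timeouts, and other TCP details are deliberately left out.
    """
    rate = start
    trace = []
    for lost in loss_events:
        if lost:
            rate = max(start, rate * decrease_factor)  # back off on loss
        else:
            rate += increase  # probe for spare capacity
        trace.append(rate)
    return trace
```

With a loss every few round trips, the rate climbs linearly, drops by half, and climbs again, so the average sits well below the peak.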
Requirements for Audio/Video Transport in Packet Networks
So far, this chapter has explored the characteristics of IP networks in some detail, and has looked briefly at the
behavior of the transport protocols layered above them. We can now relate this discussion to real-time audio and
video transport, consider the requirements for delivery of media streams over an IP network, and determine how well
the network meets those requirements.
When we describe media as real-time, we mean simply that the receiver is playing out the media stream as it is
received, rather than storing the complete stream in a file for later playback. In the ideal case, playout at the
receiver is immediate and synchronous, although in practice some unavoidable transmission delay is imposed by the
network.
The primary requirement that real-time media places on the transport protocol is for predictable variation in network
transit time. Consider, for example, an IP telephony system transporting encoded voice in 20-millisecond frames: The
source will transmit one packet every 20 milliseconds, and ideally we would like those to arrive with the same spacing
so that the speech they contain can be played out immediately. Some variation in transit time can be accommodated
by the insertion of additional buffering delay at the receiver, but this is possible only if that variation can be
characterized and the receiver can adapt to match the variation (this process is described in detail in Chapter 6,
Media Capture, Playout, and Timing).
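The following is a minimal fixed-delay playout sketch for the 20-millisecond example that makes the buffering trade-off concrete. It assumes that the first packet to arrive sets the time base and that a constant extra buffer, `buffer_ms`, is added on top. Real receivers adapt this delay, as Chapter 6 describes. The function and its parameters are hypothetical, and sequence number wraparound is ignored.

```python
def schedule_playout(packets, frame_ms=20, buffer_ms=40):
    """Fixed-delay playout for a stream of fixed-duration voice frames.

    packets: (seq, arrival_ms) pairs in arrival order; frame seq covers
    media time seq * frame_ms. The first arrival fixes the time base, and
    every frame is scheduled buffer_ms later than that base implies.
    Returns (schedule, late): seq -> playout time, and seqs that missed it.
    """
    if not packets:
        return {}, []
    first_seq, first_arrival = packets[0]
    schedule, late = {}, []
    for seq, arrival_ms in packets:
        playout_ms = first_arrival + (seq - first_seq) * frame_ms + buffer_ms
        if arrival_ms <= playout_ms:
            schedule[seq] = playout_ms
        else:
            late.append(seq)  # too late to play: treat as lost and conceal
    return schedule, late
```

Consider frames 0, 1, 3, and 2 arriving at 100, 125, 161, and 190 ms. Frame 2 misses its 180 ms playout point by 10 ms and is dropped, leaving a one-fiftieth-of-a-second gap. Raising `buffer_ms` to 50 would save it, at the cost of 10 ms more delay for every frame.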
A lesser requirement is reliable delivery of all packets by the network. Clearly, reliable delivery is desirable, but many
audio and video applications can tolerate some loss: In our IP telephony example, loss of a single packet will result in
a dropout of one-fiftieth of a second, which, with suitable error concealment, is barely noticeable. Because of the
time-varying nature of media streams, some loss is usually acceptable because its effects are quickly corrected by the
arrival of new data. The amount of loss that is acceptable depends on the application, the encoding method used, and
the pattern of loss. Chapter 8, Error Concealment, and Chapter 9, Error Correction, discuss loss tolerance.
These requirements drive the choice of transport protocol. It should be clear that TCP/IP is not appropriate because
it favors reliability over timeliness, and our applications require timely delivery. A UDP/IP-based transport should be
suitable, provided that the variation in transit time of the network can be characterized and loss rates are acceptable.
The standard Real-time Transport Protocol (RTP) builds on UDP/IP, and provides timing recovery and loss
detection, to enable the development of robust systems. RTP and associated standards will be discussed in extensive
detail in the remainder of this book.
Despite TCP's limitations for real-time applications, some audio/video applications use it for their transport. Such
applications attempt to estimate the average throughput of the TCP connection and adapt their send rate to match.
This approach can be made to work when tight end-to-end delay bounds are not required and an application has
several seconds' worth of buffering to cope with the variation in delivery time caused by TCP retransmission and
congestion control. It does not work reliably for interactive applications, which need short end-to-end delay, because
the variation in transit time caused by TCP is too great.
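The throughput-matching approach might be sketched as follows. The sketch assumes that the application measures TCP throughput once per interval and smooths it with an exponentially weighted moving average. It then chooses among a fixed set of pre-encoded bit rates. The smoothing weight, the safety margin, and the function name are hypothetical choices, not values from the text.

```python
def choose_bitrate(throughput_kbps, encodings_kbps, alpha=0.125, margin=0.8):
    """Hypothetical rate choice for a TCP-based streaming application.

    throughput_kbps: per-interval throughput measurements (at least one).
    encodings_kbps: bit rates the content is available in.
    Returns (chosen bit rate, smoothed throughput estimate).
    """
    estimate = throughput_kbps[0]
    for sample in throughput_kbps[1:]:
        # Exponentially weighted moving average damps TCP's rate swings.
        estimate = (1 - alpha) * estimate + alpha * sample
    budget = margin * estimate  # leave headroom below the estimate
    fitting = [r for r in sorted(encodings_kbps) if r <= budget]
    choice = fitting[-1] if fitting else min(encodings_kbps)
    return choice, estimate
```

The smoothing trades responsiveness for stability. It only works when the receiver holds enough buffered media to ride out the periods when TCP delivers below the average rate.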
The primary rationale for the use of TCP/IP transport is that many firewalls pass TCP connections but block UDP.
This situation is changing rapidly, as RTP-based systems become more prevalent and firewalls smarter. I strongly
recommend that new applications be based on RTP-over-UDP/IP. RTP provides for higher quality by enabling
applications to adapt in a way that is appropriate for real-time media, and by promoting interoperability (because it is
an open standard).
Benefits of Packet-Based Audio/Video
At this stage you may be wondering why anyone would consider a packet-based audio or video application over an
IP network. Such a network clearly poses challenges to the reliable delivery of real-time media streams. Although
these challenges are real, an IP network has some distinct advantages that lead to the potential for significant gains in
efficiency and flexibility, which can outweigh the disadvantages.
Summary
The properties of an IP network are significantly different from those of traditional telephony, audio, or video
distribution networks. When designing applications that work over IP, you need to be aware of these unique
characteristics, and make your system robust to their effects.
The remainder of this book will describe an architecture for such systems, explaining RTP and its model for timing
recovery and lip synchronization, error correction and concealment, congestion control, header compression,
multiplexing and tunneling, and security.
Part II: Media Transport Using RTP
3 The Real-time Transport Protocol
4 RTP Data Transfer Protocol
5 RTP Control Protocol
6 Media Capture, Playout, and Timing
7 Lip Synchronization
Chapter 3. The Real-Time Transport Protocol
Fundamental Design Philosophies of RTP
Standard Elements of RTP
Related Standards
Future Standards Development
This chapter describes the design of the RTP framework starting with the philosophy and background of the design,
gives an overview of the applicable standards, and explains how those standards interrelate. It concludes with a
discussion of possible future directions for the development of those standards.
Fundamental Design Philosophies of RTP
The challenge facing the designers of RTP was to build a mechanism for robust, real-time media delivery above an
unreliable transport layer. They achieved this goal with a design that follows the twin philosophies of application-level
framing and the end-to-end principle.
Application-Level Framing
The concepts behind application-level framing were first elucidated by Clark and Tennenhouse65 in 1990. Their
central thesis is that only the application has sufficient knowledge of its data to make an informed decision about how
that data should be transported. The implication is that a transport protocol should accept data in
application-meaningful units (application data units, ADUs) and expose the details of their delivery as much as
possible so that the application can make an appropriate response if an error occurs. The application partners with the
transport, cooperating to achieve reliable delivery.
Application-level framing comes from the recognition that there are many ways in which an application can recover
from network problems, and that the correct approach depends on both the application and the scenario in which it is
being used. In some cases it is necessary to retransmit an exact copy of the lost data. In others, a lower-fidelity copy
may be used, or the data may have been superseded, so the replacement is different from the original. Alternatively,
the loss can be ignored if the data was of only transient interest. These choices are possible only if the application
interacts closely with the transport.
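The choices just listed can be expressed as a per-ADU decision. The sketch below is purely illustrative. The ADU fields (`deadline_ms`, `superseded`, `transient`), the `bandwidth_tight` flag, and the order of the checks are hypothetical. The point is only that this logic depends on knowledge that the application has and a generic transport does not.

```python
from enum import Enum


class Recovery(Enum):
    RETRANSMIT = "resend an exact copy of the lost data"
    REPLACE = "send newer or lower-fidelity data instead"
    IGNORE = "accept the loss (and conceal it)"


def choose_recovery(adu, now_ms, rtt_ms, bandwidth_tight=False):
    """Pick a recovery action for one lost ADU (hypothetical policy).

    adu: dict with 'deadline_ms' (when the data must be at the receiver),
    'superseded' (newer data replaces it) and 'transient' (only of
    momentary interest).
    """
    # A receiver-requested resend takes roughly one round trip to arrive.
    too_late = now_ms + rtt_ms > adu["deadline_ms"]
    if adu["transient"] or too_late:
        return Recovery.IGNORE
    if adu["superseded"] or bandwidth_tight:
        return Recovery.REPLACE
    return Recovery.RETRANSMIT
```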
The goal of application-level framing is somewhat at odds with the design of TCP, which hides the lossy nature of the
underlying IP network to achieve reliable delivery at the expense of timeliness. It does, however, fit well with
UDP-based transport and with the characteristics of real-time media. As noted in Chapter 2, Voice and Video
Communication over Packet Networks, real-time audio and visual media is often loss tolerant but has strict timing
bounds. By using application-level framing with UDP-based transport, we are able to accept losses where necessary,
but we also have the flexibility to use the full spectrum of recovery techniques, such as retransmission and forward
error correction, where appropriate.
These techniques give an application great flexibility to react to network problems in a suitable manner, rather than
being constrained by the dictates of a single transport layer.
A network that is designed according to the principles of application-level framing should not be specific to a
particular application. Rather it should expose the limitations of a generic transport layer so that the application can
cooperate with the network in achieving the best possible delivery. Application-level framing implies a weakening of
the strict layers defined by the OSI reference model. It is a pragmatic approach, acknowledging the importance of
layering, but accepting the need to expose some details of the lower layers.
The philosophy of application-level framing implies smart, network-aware applications that are capable of reacting to
problems.
The End-to-End Principle
The other design philosophy adopted by RTP is the end-to-end principle.70 It is one of two approaches to designing
a system that must communicate reliably across a network. In one approach, the system can pass responsibility for the
correct delivery of data along with that data, thus ensuring reliability hop by hop. In the other approach, the
responsibility for data can remain with the endpoints, ensuring reliability end-to-end even if the individual hops are
unreliable. It is this second, end-to-end approach that permeates the design of the Internet, with both TCP and RTP
following the end-to-end principle.
The main consequence of the end-to-end principle is that intelligence tends to bubble up toward the top of the
protocol stack. If the systems that make up the network path never take responsibility for the data, they can be simple
and do not need to be robust. They may discard data that they cannot deliver, because the endpoints will recover
without their help. The end-to-end principle implies that intelligence is at the endpoints, not within the network.
Standard Elements of RTP
The primary standard for audio/video transport in IP networks is the Real-time Transport Protocol (RTP), along with
associated profiles and payload formats. RTP was developed by the Audio/Video Transport working group of the
Internet Engineering Task Force (IETF), and it has since been adopted by the International Telecommunication
Union (ITU) as part of its H.323 series of recommendations, and by several other standards organizations.
RTP provides a framework for the transport of real-time media and needs to be profiled for particular uses before it
is complete. The RTP profile for audio and video conferences with minimal control was standardized along with RTP,
and several more profiles are under development. Each profile is accompanied by several payload format
specifications, each of which describes the transport of a particular media format.
The RTP Specification
RTP was published as an IETF proposed standard (RFC 1889) in January 1996,6 and its revision for draft standard
status is almost complete.50 The first revision of ITU recommendation H.323 included a verbatim copy of the RTP
specification; later revisions reference the current IETF standard.
In the IETF standards process,8 a specification undergoes a development cycle in which multiple Internet
drafts are produced as the details of the design are worked out. When the design is complete, it is
published as a proposed standard RFC. A proposed standard is generally considered stable, with all
known design issues worked out, and suitable for implementation. If that proposed standard proves
useful, and if there are independent and interoperable implementations of each feature of that standard, it
can then be advanced to draft standard status (possibly involving changes to correct any problems found
in the proposed standard). Finally, after extensive experience, it may be published as a full standard
RFC. Advancement beyond proposed standard status is a significant hurdle that many protocols never
achieve.
RTP typically sits on top of UDP/IP transport, enhancing that transport with loss detection and reception quality
reporting, provision for timing recovery and synchronization, payload and source identification, and marking of
significant events within the media stream. Most implementations of RTP are part of an application or library that is
layered above the UDP/IP sockets interface provided by the operating system. This is not the only possible design,
though, and nothing in the RTP protocol requires UDP or IP. For example, some implementations layer RTP above
TCP/IP, and others use RTP on non-IP networks, such as Asynchronous Transfer Mode (ATM) networks.
There are two parts to RTP: the data transfer protocol and an associated control protocol. The RTP data transfer
protocol manages delivery of real-time data, such as audio and video, between end systems. It defines an additional
level of framing for the media payload, incorporating a sequence number for loss detection, a timestamp to enable
timing recovery, payload type and source identifiers, and a marker for significant events within the media stream.
Also specified are rules for timestamp and sequence number usage, although these rules are somewhat dependent on
the profile and payload format in use, and for multiplexing multiple streams within a session. The RTP data transfer
protocol is discussed further in Chapter 4.
The RTP control protocol (RTCP) provides reception quality feedback, participant identification, and
synchronization between media streams. RTCP runs alongside RTP and provides periodic reporting of this
information. Although data packets are typically sent every few milliseconds, the control protocol operates on the
scale of seconds. The information sent in RTCP is necessary for synchronization between media streams (for
example, for lip synchronization between audio and video). It can also be useful for adapting the transmission according
to reception quality feedback and for identifying the participants. The RTP control protocol is discussed further in
Chapter 5.
RTP supports the notion of mixers and translators, middle boxes that can operate on the media as it flows between
endpoints. These may be used to translate an RTP session between different lower-layer protocols—for example,
bridging between participants on IPv4 and IPv6 networks, or bringing a unicast-only participant into a multicast
Future Standards Development
With the revision of RTP for draft standard status, there are no known unresolved issues with the protocol
specification, and RTP itself is not expected to change in the foreseeable future. This does not mean that the standards
work is finished, though. New payload formats are always under development, and work on new profiles will extend
RTP to encompass new functionality (for example, the profiles for secure RTP and enhanced feedback).
In the long term, we expect the RTP framework to evolve along with the network itself. Future changes in the
network may also affect RTP, and we expect new profiles to be developed to take advantage of any changes. We
also expect a continual series of new payload format specifications, to keep up with changes in codec technology and
to provide new error resilience schemes.
Finally, we can expect considerable changes in the related protocols for call setup and control, resource reservation,
and quality of service. These protocols are newer than RTP, and they are currently undergoing rapid development,
implying that changes here will likely be more substantial than changes to RTP, its profile, and payload formats.
Summary
RTP provides a flexible framework for delivery of real-time media, such as audio and video, over IP networks. Its
core philosophies (application-level framing and the end-to-end principle) make it well suited to the unique
environment of IP networks.
This chapter has provided an overview of the RTP specification, profiles, and payload formats. Related standards
cover call setup, control and advertisement, and resource reservation.
The two parts of RTP introduced in this chapter, the data transfer protocol and the control protocol, are covered
in detail in the next two chapters.
Chapter 4. RTP Data Transfer Protocol
RTP Sessions
The RTP Data Transfer Packet
Packet Validation
Translators and Mixers
This chapter explains the RTP data transfer protocol, the means by which real-time media is exchanged. The
discussion focuses on the "on-the-wire" aspects of RTP (that is, the packet formats and requirements for
interoperability); the design of a system using RTP is explained in later chapters.
RTP Sessions
A session consists of a group of participants who are communicating using RTP. A participant may be active in
multiple RTP sessions (for instance, one session for exchanging audio data and another session for exchanging video
data). For each participant, the session is identified by a network address and port pair to which data should be sent,
and a port pair on which data is received. The send and receive ports may be the same. Each port pair comprises two
adjacent ports: an even-numbered port for RTP data packets, and the next higher (odd-numbered) port for RTCP
control packets. The default port pair is 5004 and 5005 for UDP/IP, but many applications dynamically allocate ports
during session setup and ignore the default. RTP sessions are designed to transport a single type of media; in a
multimedia communication, each media type should be carried in a separate RTP session.
The latest revision to the RTP specification relaxes the requirement that the RTP data port be
even-numbered, and allows non-adjacent RTP and RTCP ports. This change makes it possible to use
RTP in environments where certain types of Network Address Translation (NAT) devices are present. If
possible, for compatibility with older implementations, it is wise to use adjacent ports, even though this is
not strictly required.
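The conventional port layout can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions. The helper names `rtcp_port_for` and `choose_port_pair` are hypothetical, and the default search range of 16384 to 32767 is an arbitrary choice for the example. The sketch follows the traditional even/odd adjacent-port rule, which remains the safe choice for interoperating with older implementations even though the latest revision no longer requires it.

```python
import random
from typing import Tuple

# Default RTP data port for UDP/IP; RTCP conventionally uses the next port up.
DEFAULT_RTP_PORT = 5004


def rtcp_port_for(rtp_port: int) -> int:
    """Return the conventional RTCP port: the next higher (odd) port.

    Assumes the traditional layout in which the RTP data port is even.
    """
    if rtp_port % 2 != 0:
        raise ValueError("conventional RTP data ports are even-numbered")
    return rtp_port + 1


def choose_port_pair(low: int = 16384, high: int = 32767, rng=random) -> Tuple[int, int]:
    """Pick a random adjacent (RTP, RTCP) port pair within [low, high].

    The range is a hypothetical choice; a real application must also check
    that both ports can be bound before advertising them in session setup.
    """
    first_even = low + (low % 2)
    # Count the even ports whose odd neighbour also lies within the range.
    count = (high - first_even + 1) // 2
    rtp = first_even + 2 * rng.randrange(count)
    return rtp, rtcp_port_for(rtp)
```

For example, `rtcp_port_for(5004)` gives 5005, the default RTCP port.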
A session can be unicast, either directly between two participants (a point-to-point session) or to a central server that
redistributes the data. Or it can be multicast to a group of participants. A session also need not be restricted to a
single transport address space. For example, RTP translators can be used to bridge a session between unicast and
multicast, or between IPv4 and another transport, such as IPv6 or ATM. Translators are discussed in more detail later
in this chapter, in the section titled Translators and Mixers. Some examples of session topologies are shown in
Figure 4.1.
Figure 4.1. Types of RTP Sessions
The range of possible sessions means that an RTP end system should be written to be essentially agnostic about the
underlying transport. It is good design to restrict knowledge of the transport address and ports to your low-level
networking code only, and to use RTP-level mechanisms for participant identification. RTP provides a
"synchronization source" for this purpose, described in more detail later in this chapter.
In particular, note these tips:
You should not use a transport address as a participant identifier, because the data may have passed through
a translator or mixer that may hide the original source address. Instead, use the synchronization source
identifiers.
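Here is a minimal Python sketch of this advice. The hypothetical `ParticipantTable` class below keys all per-source state on the 32-bit synchronization source identifier (SSRC). It records the transport address only as the place the most recent packet came from. The class and field names are illustrative assumptions and are not part of any RTP library.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Source:
    ssrc: int
    packets_received: int = 0
    # Last transport address seen. This is informational only: a translator
    # or mixer may have rewritten it, so it never identifies the participant.
    last_addr: Optional[Tuple[str, int]] = None


@dataclass
class ParticipantTable:
    # Per-participant state, keyed by SSRC rather than by address.
    sources: Dict[int, Source] = field(default_factory=dict)

    def on_packet(self, ssrc: int, addr: Tuple[str, int]) -> Source:
        """Update state for the source identified by its SSRC."""
        src = self.sources.get(ssrc)
        if src is None:
            src = Source(ssrc)
            self.sources[ssrc] = src
        src.packets_received += 1
        src.last_addr = addr
        return src
```

In this sketch, two packets with the same SSRC count as one participant even if they arrive from different addresses. Two packets with different SSRCs count as two participants even if they share an address.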
The RTP Data Transfer Packet
The format of an RTP data transfer packet is illustrated in Figure 4.2. There are four parts to the packet:
1. The mandatory RTP header
2. An optional header extension
3. An optional payload header (depending on the payload format used)
4. The payload data itself
Figure 4.2. An RTP Data Transfer Packet
The entire RTP packet is contained within a lower-layer payload, typically UDP/IP.
Header Elements
The mandatory RTP data packet header is typically 12 octets in length, although it may contain a contributing source
list, which can expand the length by 4 to 60 additional octets. The fields in the mandatory header are the payload
type, sequence number, timestamp, and synchronization source identifier. In addition, there is a count of contributing
sources, a marker for interesting events, support for padding and a header extension, and a version number.
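Here is a minimal Python sketch that unpacks the 12-octet fixed header and any contributing source (CSRC) list from a received datagram. It uses the bit layout defined in the RTP specification, in network byte order:

- 2-bit version, padding bit, extension bit, and 4-bit CSRC count
- marker bit and 7-bit payload type
- 16-bit sequence number, 32-bit timestamp, and 32-bit SSRC

The `RtpHeader` class and `parse_rtp_header` function are hypothetical names used only for illustration.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: List[int]


def parse_rtp_header(data: bytes) -> RtpHeader:
    """Decode the fixed RTP header and CSRC list (network byte order)."""
    if len(data) < 12:
        raise ValueError("shorter than the 12-octet fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    cc = b0 & 0x0F                   # contributing source count, 0..15
    end = 12 + 4 * cc                # each CSRC adds 4 octets (at most 60)
    if len(data) < end:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack("!%dI" % cc, data[12:end]))
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

The padding and extension bits are only reported here. Locating the header extension and stripping padding would be the next steps in a fuller parser.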
PAYLOAD TYPE
The payload type, or PT, field of the RTP header identifies the media transported by an RTP packet. The receiving
application examines the payload type to determine how to treat the data—for example, passing it to a particular
Packet Validation
Because RTP sessions typically use a dynamically negotiated port pair, it is especially important to validate that
packets received really are RTP, and not misdirected other data. At first glance, confirming this fact is nontrivial
because RTP packets do not contain an explicit protocol identifier; however, by observing the progression of header
fields over several packets, we can quickly obtain strong confidence in the validity of an RTP stream.
Possible validity checks that can be performed on a stream of RTP packets are outlined in Appendix A of the RTP
specification. There are two types of tests:
1. Per-packet checking, based on fixed known values of the header fields. For example, packets in which the
version number is not equal to 2 are invalid, as are those with an unexpected payload type.
2. Per-flow checking, based on patterns in the header fields. For example, if the SSRC is constant, the
sequence number increments by one with each packet received, and the timestamp intervals are appropriate
for the payload type, this is almost certainly an RTP flow and not a misdirected stream.
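The per-packet tests need no state, so they can run on every datagram before anything else. The Python sketch below is a hypothetical illustration of the kinds of checks described above. It rejects a packet if any of the following hold:

- the version is not 2;
- the payload type is not in `expected_pts`, which is assumed to be the set negotiated during session setup;
- the CSRC count, header extension length, or padding count is inconsistent with the packet length.

```python
import struct
from typing import Set


def valid_rtp_packet(data: bytes, expected_pts: Set[int]) -> bool:
    """Cheap per-packet checks, applied before any per-source state is created."""
    if len(data) < 12:
        return False
    b0, b1 = data[0], data[1]
    if b0 >> 6 != 2:                     # version must be 2
        return False
    if (b1 & 0x7F) not in expected_pts:  # payload type must be one we expect
        return False
    header_len = 12 + 4 * (b0 & 0x0F)    # fixed header plus CSRC list
    if len(data) < header_len:
        return False
    if b0 & 0x10:                        # header extension present
        if len(data) < header_len + 4:
            return False
        # The second 16-bit word gives the extension length in 32-bit words.
        ext_words = struct.unpack("!H", data[header_len + 2:header_len + 4])[0]
        header_len += 4 + 4 * ext_words
        if len(data) < header_len:
            return False
    if b0 & 0x20:                        # padding: last octet is the pad count
        pad = data[-1]
        if pad == 0 or header_len + pad > len(data):
            return False
    return True
```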
The per-flow checks are more likely to detect invalid packets, but they require additional state to be kept in the
receiver. This state is required for a valid source, but care must be taken because holding too much state to detect
invalid sources can lead to a denial-of-service attack, in which a malicious source floods a receiver with a stream of
bogus packets designed to use up resources.
A robust implementation will employ strong per-packet validity checks to weed out as many invalid packets as
possible before committing resources to the per-flow checks to catch the others. It should also be prepared to
aggressively discard state for sources that appear to be bogus, to mitigate the effects of denial-of-service attacks.
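The Python sketch below combines these ideas under stated assumptions. It is a simplified version of the probation approach in Appendix A of the RTP specification. A new SSRC is held on probation until `MIN_SEQUENTIAL` packets arrive with consecutive sequence numbers. The number of probationary sources is capped, so a flood of bogus SSRCs cannot grow the receiver's state without limit.

The class name `FlowValidator` and the cap `MAX_PROBATIONARY` of 64 sources are hypothetical. The sketch checks only sequence numbers, not timestamp intervals. It assumes its input has already passed the per-packet checks.

```python
from collections import OrderedDict

MIN_SEQUENTIAL = 2       # consecutive packets needed, as in the spec's example code
MAX_PROBATIONARY = 64    # hypothetical cap on state held for unvalidated sources


class FlowValidator:
    """Per-flow check: a source is valid once its sequence numbers advance by one."""

    def __init__(self) -> None:
        self.valid = set()
        # ssrc -> (last sequence number, packets still needed); oldest entry first.
        self.probation = OrderedDict()

    def on_packet(self, ssrc: int, seq: int) -> bool:
        """Return True if the packet belongs to a validated source."""
        if ssrc in self.valid:
            return True
        state = self.probation.pop(ssrc, None)
        if state is not None and seq == (state[0] + 1) & 0xFFFF:
            needed = state[1] - 1          # in sequence (16-bit wraparound handled)
        else:
            needed = MIN_SEQUENTIAL - 1    # first packet, or sequence broken: restart
        if needed == 0:
            self.valid.add(ssrc)
            return True
        self.probation[ssrc] = (seq, needed)
        if len(self.probation) > MAX_PROBATIONARY:
            self.probation.popitem(last=False)   # drop the least recently seen suspect
        return False
```

Evicting the least recently updated probationary source is one simple policy. A genuine source that gets evicted simply re-enters probation and validates again after a couple of packets.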
It is also possible to validate the contents of an RTP data stream against the corresponding RTCP control packets.
To do this, the application discards RTP packets until an RTCP source description packet with the same SSRC is
received. This is a very strong validity check, but it can result in significant validation delay, particularly in large
sessions (because the RTCP reporting interval can be many seconds). For this reason we recommend that
applications validate the RTP data stream directly, using RTCP as confirmation rather than the primary means of
validation.
Translators and Mixers
In addition to normal end systems, RTP supports middle boxes that can operate on a media stream within a session.
Two classes of middle boxes are defined: translators and mixers.
Translators
A translator is an intermediate system that operates on RTP data while maintaining the synchronization source and
timeline of a stream. Examples include systems that convert between media-encoding formats without mixing, that
bridge between different transport protocols, that add or remove encryption, or that filter media streams. A translator
is invisible to the RTP end systems unless those systems have prior knowledge of the untranslated media. There are a
few classes of translators:
Bridges. Bridges are one-to-one translators that don't change the media encoding—for example, gateways
between different transport protocols, like RTP/UDP/IP and RTP/ATM, or RTP/UDP/IPv4 and
RTP/UDP/IPv6. Bridges make up the simplest class of translator, and typically they cause no changes to the
RTP or RTCP data.
Transcoders. Transcoders are one-to-one translators that change the media encoding (for example, decoding the
compressed data and reencoding it with a different payload format) to better suit the
characteristics of the output network. The payload type usually changes, as may the padding, but other RTP
header fields generally remain unchanged. These translations require state to be maintained so that the RTCP
sender reports can be adjusted to match, because they contain the sender's packet and octet counts, which change
when the media is reencoded.
Exploders. Exploders are one-to-many translators, which take in a single packet and produce multiple packets. For example, they receive a stream in which multiple frames of codec output are included within
each RTP packet, and they produce output with a single frame per packet. The generated packets have the
same SSRC, but the other RTP header fields may have to be changed, depending on the translation. These
translations require maintenance of bidirectional state: The translator must adjust both outgoing RTCP sender
reports and returning receiver reports to match.
Mergers. Mergers are many-to-one translators, combining multiple packets into one. This is the inverse of the
previous category, and the same issues apply.
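The exploder case can be sketched as follows for a payload format with fixed-size frames. One example is G.729 (payload type 18), where each 10-octet frame covers 10 ms, or 80 ticks of the 8 kHz RTP clock. The `RtpPacket` and `Exploder` types are hypothetical.

The sketch covers only the data path. It keeps the SSRC, gives each frame its own timestamp, and renumbers packets from the translator's own sequence counter. It does not perform the RTCP adjustments described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RtpPacket:
    ssrc: int
    sequence: int
    timestamp: int
    payload_type: int
    marker: bool
    payload: bytes


class Exploder:
    """Split packets carrying several fixed-size frames into one frame per packet."""

    def __init__(self, frame_octets: int, frame_ticks: int, first_out_seq: int = 0) -> None:
        self.frame_octets = frame_octets   # bytes per codec frame (assumed fixed)
        self.frame_ticks = frame_ticks     # RTP timestamp units per frame
        self.next_seq = first_out_seq & 0xFFFF   # translator's own output sequence

    def explode(self, pkt: RtpPacket) -> List[RtpPacket]:
        out = []
        n = len(pkt.payload) // self.frame_octets
        for i in range(n):
            frame = pkt.payload[i * self.frame_octets:(i + 1) * self.frame_octets]
            out.append(RtpPacket(
                ssrc=pkt.ssrc,                                   # SSRC is preserved
                sequence=self.next_seq,
                timestamp=(pkt.timestamp + i * self.frame_ticks) & 0xFFFFFFFF,
                payload_type=pkt.payload_type,
                marker=pkt.marker and i == 0,   # marker stays on the first frame only
                payload=frame,
            ))
            self.next_seq = (self.next_seq + 1) & 0xFFFF
        return out
```

This sketch has one limitation. Output sequence numbers come from a simple counter, so a packet lost before the exploder leaves no gap in the output numbering, and downstream receivers cannot detect it. A fuller version would advance the counter when the input sequence number jumps.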
The defining characteristic of a translator is that each input stream produces a single output stream, with the same
SSRC. The translator itself is not a participant in the RTP se