RTP: Audio and Video for the Internet

Upload: pvsairam

Post on 06-Jul-2018


  • 8/17/2019 RTP Audio and Video for the Internet


    RTP: Audio and Video for the Internet

    By Colin Perkins

    Publisher: Addison Wesley

    Pub Date: June 12, 2003

    ISBN: 0-672-32249-8

    Pages: 432

    The Real-time Transport Protocol (RTP) provides a framework for delivery of audio and video across IP networks

    with unprecedented quality and reliability. In RTP: Audio and Video for the Internet, Colin Perkins, a leader of the

    RTP standardization process in the IETF, offers readers detailed technical guidance for designing, implementing, and

    managing any RTP-based system.

      By bringing together crucial information that was previously scattered or difficult to find, Perkins has created an

    incredible resource that enables professionals to leverage RTP's benefits in a wide range of Voice-over IP (VoIP) and

    streaming media applications. He demonstrates how RTP supports audio/video transmission in IP networks, and

    shares strategies for maximizing performance, robustness, security, and privacy.

      Comprehensive, exceptionally clear, and replete with examples, this book is the definitive RTP reference for every

    audio/video application designer, developer, researcher, and administrator.


    Copyright

    Preface

    Introduction

    Organization of the Book 

    Intended Audience

    Acknowledgments

    Part I. Introduction to Networked Multimedia

    Chapter 1. An Introduction to RTP

    A Brief History of Audio/Video Networking

    A Snapshot of RTP

    Related Standards

    Overview of an RTP Implementation

    Summary

    Chapter 2. Voice and Video Communication Over Packet Networks

    TCP/IP and the OSI Reference Model

    Performance Characteristics of an IP Network 

    Measuring IP Network Performance

    Effects of Transport Protocols

    Requirements for Audio/Video Transport in Packet Networks

    Summary

    Part II. Media Transport Using RTP

    Chapter 3. The Real-Time Transport Protocol

    Fundamental Design Philosophies of RTP

    Standard Elements of RTP

    Related Standards

    Future Standards Development

    Summary


    Chapter 4. RTP Data Transfer Protocol

    RTP Sessions

    The RTP Data Transfer Packet

    Packet Validation

    Translators and Mixers

    Summary

    Chapter 5. RTP Control Protocol

    Components of RTCP

    Transport of RTCP Packets

    RTCP Packet Formats

    Security and Privacy

    Packet Validation

    Participant Database

    Timing Rules

    Summary

    Chapter 6. Media Capture, Playout, and Timing

    Behavior of a Sender

    Media Capture and Compression

    Generating RTP Packets

    Behavior of a Receiver

    Packet Reception

    The Playout Buffer 

    Adapting the Playout Point

    Decoding, Mixing, and Playout

    Summary

    Chapter 7. Lip Synchronization

    Sender Behavior

    Receiver Behavior

    Synchronization Accuracy

    Summary

    Part III. Robustness

    Chapter 8. Error Concealment

    Techniques for Audio Loss Concealment

    Techniques for Video Loss Concealment

    Interleaving

    Summary

    Chapter 9. Error Correction

    Forward Error Correction

    Channel Coding

    Retransmission

    Implementation Considerations

    Summary

    Chapter 10. Congestion Control

    The Need for Congestion Control

    Congestion Control on the Internet

    Implications for Multimedia

    Congestion Control for Multimedia

    Summary


    Part IV. Advanced Topics

    Chapter 11. Header Compression

    Introductory Concepts

    Compressed RTP

    Robust Header Compression

    Considerations for RTP Applications

    Summary

    Chapter 12. Multiplexing and Tunneling

    The Motivation for Multiplexing

    Tunneling Multiplexed Compressed RTP

    Other Approaches to Multiplexing

    Summary

    Chapter 13. Security Considerations

    Privacy

    Confidentiality

    Authentication

    Replay Protection

    Denial of Service

    Mixers and Translators

    Active Content

    Other Considerations

    Summary

    References

    IETF RFC Standards

    IETF Internet-Drafts

    Other Standards

    Conference and Journal Papers

    Books

    Web Sites

    Other References


    Copyright

    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks.

    Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the

    designations have been printed with initial capital letters or in all capitals.

      The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty

    of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential

    damages in connection with or arising out of the use of the information or programs contained herein.

      The publisher offers discounts on this book when ordered in quantity for bulk purchases and special sales. For more

    information, please contact:

      U.S. Corporate and Government Sales

    (800) 382-3419

    [email protected]

      For sales outside of the U.S., please contact:

      International Sales

    (317) 581-3793

    [email protected]

      Visit Addison-Wesley on the Web: www.awprofessional.com

      Library of Congress Cataloging-in-Publication Data

      LCCN: 2001089234

      Copyright © 2003 by Pearson Education, Inc.

      All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any

    form, or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior consent of                 

    the publisher. Printed in the United States of America. Published simultaneously in Canada.

      For information on obtaining permission for use of material from this work, please submit a written request to:

      Pearson Education, Inc.

    Rights and Contracts Department

    75 Arlington Street, Suite 300


    Preface

      Introduction

    Organization of the Book 

    Intended Audience


    Introduction

      This book describes the protocols, standards, and architecture of systems that deliver real-time voice, music, and

    video over IP networks, such as the Internet. These systems include voice-over-IP, telephony, teleconferencing,

    streaming video, and webcasting applications. The book focuses on media transport: how to deliver audio and video

    reliably across an IP network, how to ensure high quality in the face of network problems, and how to ensure that the

    system is secure.

      The book adopts a standards-based approach, based around the Real-time Transport Protocol (RTP) and its

    associated profiles and payload formats. It describes the RTP framework, how to build a system that uses that

    framework, and extensions to RTP for security and reliability.

      Many media codecs are suitable for use with RTP, for example MPEG audio and video; ITU H.261 and H.263

    video; G.711, G.722, G.726, G.728, and G.729 audio; and industry standards such as GSM, QCELP, and AMR 

    audio. RTP implementations typically integrate existing media codecs, rather than developing them specifically.

    Accordingly, this book describes how media codecs are integrated into an RTP system, but not how media codecs

    are designed.

      Call setup, session initiation, and control protocols, such as SIP, RTSP, and H.323, are also outside the scope of this

     book. Most RTP implementations are used as part of a complete system, driven by one of these control protocols.

    However, the interactions between the various parts of the system are limited, and it is possible to understand media

    transport without understanding the signaling. Similarly, session description using SDP is not covered, because it is

     part of the signaling.

      Resource reservation is useful in some situations, but it is not required for the correct operation of RTP. This book

    touches on the use of resource reservation through both the Integrated Services and the Differentiated Services

    frameworks, but it does not go into details.

      That these areas are not covered in this book does not mean that they are unimportant. A system using RTP will use

    a range of media codecs and will employ some form of call setup, session initiation, or control. The way this is done

    depends on the application, though: The needs of a telephony system are very different from those of a webcasting

    application. This book describes only the media transport layer that is common to all those systems.


    Organization of the Book

      The book is logically divided into four parts: Part I, Introduction to Networked Multimedia, introduces the problem

    space, provides background, and outlines the properties of the Internet that affect audio/video transport:

      Chapter 1, An Introduction to RTP, gives a brief introduction to the Real-time Transport Protocol, outlines

    the relationship between RTP and other standards, and describes the scope of the book.

    Chapter 2, Voice and Video Communication over Packet Networks, describes the unique environment

    provided by IP networks, and how this environment affects packet audio/video applications.

      The next five chapters, which constitute Part II, Media Transport Using RTP, discuss the basics of the Real-time

    Transport Protocol.

      Road Map for This Book

    You will need this information to design and build a tool for voice-over-IP, streaming music or video, and so on.


    Intended Audience

      This book describes audio/video transport over IP networks in considerable detail. It assumes some basic familiarity

    with IP network programming and the operation of network protocol stacks, and it builds on this knowledge to

    describe the features unique to audio/video transport. An extensive list of references is included, pointing readers to

    additional information on specific topics and to background reading material.

      Several classes of readers might be expected to find this book useful:

      Engineers. The primary audience is those building voice-over-IP applications, teleconferencing systems, and

    streaming media and webcasting applications. This book is a guide to the design and implementation of the

    media engine of such systems. It should be read in conjunction with the relevant technical standards, and it

     builds on those standards to show how a system is built. This book does not discuss signaling (for example,

    SIP, RTSP, or H.323), which is a separate subject worthy of a book in its own right. Instead it talks in detail

    about media transport, and how to achieve good-quality audio and smooth-motion video over IP networks.

    Students. The book can be read as an accompaniment to a course in network protocol design or

    telecommunications, at either a graduate or an advanced undergraduate level. Familiarity with IP networks

    and layered protocol architectures is assumed. The unique aspects of protocols for real-time audio/video

    transport are highlighted, as are the differences from a typical layered system model. The cross-disciplinary

    nature of the subject is highlighted, in particular the relation between the psychology of human perception and

    the demands of robust media delivery.

    Researchers. Academics and industrial researchers can use this book as a source of information about the

    standards and algorithms that constitute the current state of the art in real-time audio/video transport over IP

    networks. Pointers to the literature are included in the References section, and they will be useful starting

     points for those seeking further depth and areas where more research is needed.

      Network administrators. An understanding of the technical protocols underpinning the common streaming

    audio/video applications is useful for those administering computer networks—to show how those

    applications can affect the behavior of the network, and how the network can be engineered to suit those

    applications better. This book includes extensive discussion of the most common network behavior (and how

    applications can adapt to it), the needs of congestion control, and the security implications of real-time

    audio/video traffic.

      In summary, this book can be used as a reference, in conjunction with the technical standards, as a study guide, or as

     part of an advanced course on network protocol design or communication technology.


    Acknowledgments

    A book such as this is not written in isolation; rather it is shaped by the author's experiences and interactions with

    other researchers and practitioners. In large part, I gained my experience while working at University College

    London. I am grateful to Vicky Hardman, Peter Kirstein, and Angela Sasse for the opportunity to work on their

    projects, and to Anna Bouch, Ian Brown, Anna Conniff, Jon Crowcroft, Panos Gevros, Atanu Ghosh, Mark

    Handley, Tristan Henderson, Orion Hodson, Nadia Kausar, Isidor Kouvelas, Piers O'Hanlon, Louise Sheeran,

    Lorenzo Vicisano, and everyone else associated with G11, for providing a stimulating working environment, and a

    distracting social scene.

    I wish to thank my colleagues at the USC Information Sciences Institute for their support, in particular Alec

    Aakesson, Helen Ellis, Jarda Flidr, Ladan Gharai, Tom Lehman, Dan Massey, and Nikhil Mittal. Allison Mankin

     provided the opportunity to work at USC/ISI, for which I am grateful.

    On a personal note, Peter Phillips, Stewart Cambridge, Sonja Krugmann, and Alison Gardiner each helped me make

    the big move, in their own special way. I thank you.

    The staff at Addison-Wesley did an excellent job in the production of this book. In particular, Dayna Isley and Jessica

    Goldstein provided encouragement to a new author and showed great patience during endless revisions. Thanks are

    also due to Amy Fleischer, Elizabeth Finney, Laurie McGuire, Cheri Clark, Rebecca Martin, and Stephanie Hiebert.

    The technical editors—Steve Casner and Orion Hodson—did sterling work, significantly improving the quality of the

     book, correcting many mistakes and contributing significantly to the text. Any errors that remain are mine alone.


    Part I: Introduction to Networked Multimedia

    1 An Introduction to RTP

    2 Voice and Video Communication over Packet Networks


    Chapter 1. An Introduction to RTP

      A Brief History of Audio/Video Networking

    A Snapshot of RTP

    Related Standards

    Overview of an RTP Implementation

      The Internet is changing: Static content is giving way to streaming video, text is being replaced by music and the

    spoken word, and interactive audio and video is becoming commonplace. These changes require new applications,

    and they pose new and unique challenges for application designers.

      This book describes how to build these new applications: voice-over-IP, telephony, teleconferencing, streaming

    video, and webcasting. It looks at the challenges inherent in reliable delivery of audio and video across an IP network,

    and it explains how to ensure high quality in the face of network problems, as well as how to ensure that the system is

    secure. The emphasis is on open standards, in particular those devised by the Internet Engineering Task Force (IETF)

    and the International Telecommunications Union (ITU), rather than on proprietary solutions.

      This chapter begins our examination of the Real-time Transport Protocol (RTP) with a brief look at the history of

    audio/video networking and an overview of RTP and its relation to other standards.

    Throughout this text, extensive references are provided, as indicated by superscript numbers that map to

    entries in the References section at the end of the book. Because the RTP standard is still evolving, and

     because it intersects with so many other technologies, these references are provided to help readers gain

    additional background information and pursue further research interests.


    A Brief History of Audio/Video Networking

      The idea of using packet networks, such as the Internet, to transport voice and video is not new. Experiments with

    voice over packet networks stretch back to the early 1970s. The first RFC on this subject, the Network Voice

    Protocol (NVP),1 dates from 1977. Video came later, but still there is over ten years of experience with audio/video

    conferencing and streaming on the Internet.

      Early Packet Voice and Video Experiments

      The initial developers of NVP were researchers transmitting packet voice over the ARPANET, the predecessor to

    the Internet. The ARPANET provided a reliable-stream service (analogous to TCP/IP), but this introduced too much

    delay, so an "uncontrolled packet" service was developed, akin to the modern UDP/IP datagrams used with RTP.

    The NVP was layered directly over this uncontrolled packet service. Later the experiments were extended beyond

    the ARPANET to interoperate with the Packet Radio Network and the Atlantic Satellite Network (SATNET),

    running NVP over those networks.

      All of these early experiments were limited to one or two voice channels at a time by the low bandwidth of the early

    networks. In the 1980s, the creation of the 3-Mbps Wideband Satellite Network enabled not only a larger number of

    voice channels but also the development of packet video. To access the one-hop, reserved-bandwidth, multicast

    service of the satellite network, a connection-oriented inter-network protocol called the Stream Protocol (ST) was

    developed. Both a second version of NVP, called NVP-II, and a companion Packet Video Protocol were

    transported over ST to provide a prototype packet-switched video teleconferencing service.

      In 1989–1990, the satellite network was replaced with the Terrestrial Wideband Network and a research network

    called DARTnet while ST evolved into ST-II. The packet video conferencing system was put into scheduled

     production to support geographically distributed meetings of network researchers and others at up to five sites

    simultaneously.

      ST and ST-II were operated in parallel with IP at the inter-network layer but achieved only limited deployment on

    government and research networks. As an alternative, initial deployment of conferencing using IP began on DARTnet,

    enabling multiparty conferences with NVP-II transported over multicast UDP/IP. At the March 1992 meeting of the

    IETF, audio was transmitted across the Internet to 20 sites on three continents over multicast "tunnels"—the Mbone

    (which stands for "multicast backbone")—extended from DARTnet. At that same meeting, development of RTP was

     begun.

      Audio and Video on the Internet

      Following from these early experiments, interest in video conferencing within the Internet community took hold in the

    early 1990s. At about this time, the processing power and multimedia capabilities of workstations and PCs became

    sufficient to enable the simultaneous capture, compression, and playback of audio and video streams. In parallel,

    development of IP multicast allowed the transmission of real-time data to any number of recipients connected to the

    Internet.

      Video conferencing and multimedia streaming were obvious and well-executed multicast applications. Research

    groups took to developing tools such as vic and vat from the Lawrence Berkeley Laboratory,87 nevot from the

    University of Massachusetts, the INRIA video conferencing system, nv from Xerox PARC, and rat from University

    College London.77 These tools followed a new approach to conferencing, based on connectionless protocols, the

    end-to-end argument, and application-level framing.65,70,76 Conferences were minimally managed, with no

    admission or floor control, and the transport layer was thin and adaptive. Multicast was used both for wide-area data

    transmission and as an interprocess communication mechanism between applications on the same machine (to

    exchange synchronization information between audio and video tools). The resulting collaborative environment

    consisted of lightly coupled applications and highly distributed participants.

      The multicast conferencing (Mbone) tools had a significant impact: They led to widespread understanding of the

     problems inherent in delivering real-time media over IP networks, the need for scalable solutions, and error and


    A Snapshot of RTP

      The key standard for audio/video transport in IP networks is the Real-time Transport Protocol (RTP), along with its

    associated profiles and payload formats. RTP aims to provide services useful for the transport of real-time media,

    such as audio and video, over IP networks. These services include timing recovery, loss detection and correction,

     payload and source identification, reception quality feedback, media synchronization, and membership management.

    RTP was originally designed for use in multicast conferences, using the lightweight sessions model. Since that time, it

    has proven useful for a range of other applications: in H.323 video conferencing, webcasting, and TV distribution; and

    in both wired and cellular telephony. The protocol has been demonstrated to scale from point-to-point use to

    multicast sessions with thousands of users, and from low-bandwidth cellular telephony applications to the delivery of                 

    uncompressed High-Definition Television (HDTV) signals at gigabit rates.

      RTP was developed by the Audio/Video Transport working group of the IETF and has since been adopted by the

    ITU as part of its H.323 series of recommendations, and by various other standards organizations. The first version of

    RTP was completed in January 1996.6 RTP needs to be profiled for particular uses before it is complete; an initial

    profile was defined along with the RTP specification,7 and several more profiles are under development. Profiles are

    accompanied by several payload format specifications, describing the transport of a particular media format.

    Development of RTP is ongoing, and a revision is nearing completion at the time of this writing.49,50

      A detailed introduction to RTP is provided in Chapter 3, The Real-time Transport Protocol, and most of this book

    discusses the design of systems that use RTP and its various extensions.


    Related Standards

      In addition to RTP, a complete system typically requires the use of various other protocols and standards for session

    announcement, initiation, and control; media compression; and network transport.

      Figure 1.1  shows how the negotiation and call control protocols, the media transport layer (provided by RTP), the

    compression-decompression algorithms (codecs), and the underlying network are related, according to both the IETF

    and ITU conferencing frameworks. The two parallel sets of call control and media negotiation standards use the same

    media transport framework. Likewise, the media codecs are common no matter how the session is negotiated and

    irrespective of the underlying network transport.

      Figure 1.1. IETF and ITU Protocols for Audio/Video Transport on the Internet

      The relation between these standards and RTP is outlined further in Chapter 3, The Real-time Transport Protocol.

    However, the main focus of this book is media transport, rather than signaling and control.


    Overview of an RTP Implementation

      As Figure 1.1 shows, the core of any system for delivery of real-time audio/video over IP is RTP: It provides the

    common media transport layer, independent of the signaling protocol and application. Before we look in more detail

    at RTP and the design of systems using RTP, it will be useful to have an overview of the responsibilities of RTP

    senders and receivers in a system.

      Behavior of an RTP Sender

      A sender is responsible for capturing and transforming audiovisual data for transmission, as well as for generating

    RTP packets. It may also participate in error correction and congestion control by adapting the transmitted media

    stream in response to receiver feedback. A diagram of the sending process is shown in Figure 1.2.

      Figure 1.2. Block Diagram of an RTP Sender

      Uncompressed media data—audio or video—is captured into a buffer, from which compressed frames are

     produced. Frames may be encoded in several ways depending on the compression algorithm used, and encoded

    frames may depend on both earlier and later data.

      Compressed frames are loaded into RTP packets, ready for sending. If frames are large, they may be fragmented

    into several RTP packets; if they are small, several frames may be bundled into a single RTP packet. Depending on

    t he e rr or c o rr ec tio n s c he me in us e , a c ha nne l c o de r ma y b e us e d t o ge ne ra te e r ro r c o rr e ct io n p a ck e ts o r t o r e or de r  

     packets before transmission.
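The fragmentation and bundling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the behavior of any particular payload format: the `Packet` class, the `packetize` function, and the 1,400-byte payload budget are all assumptions made for the example, and real payload formats define their own rules for how frames are split or combined.

```python
from dataclasses import dataclass
from typing import List

# Assumed payload budget in bytes, chosen to stay below a typical 1500-byte MTU.
MAX_PAYLOAD = 1400


@dataclass
class Packet:
    pieces: List[bytes]  # whole frames (bundled) or one fragment of a large frame
    fragmented: bool     # True when the packet carries part of a larger frame


def packetize(frames: List[bytes], max_payload: int = MAX_PAYLOAD) -> List[Packet]:
    """Fragment large frames and bundle small ones (hypothetical helper)."""
    packets: List[Packet] = []
    pending: List[bytes] = []  # small frames waiting to be bundled
    pending_size = 0

    def flush() -> None:
        nonlocal pending, pending_size
        if pending:
            packets.append(Packet(pending, fragmented=False))
            pending, pending_size = [], 0

    for frame in frames:
        if len(frame) > max_payload:
            # Large frame: emit any waiting bundle first to preserve order,
            # then split the frame into payload-sized fragments.
            flush()
            for offset in range(0, len(frame), max_payload):
                piece = frame[offset:offset + max_payload]
                packets.append(Packet([piece], fragmented=True))
        else:
            # Small frame: start a new bundle if this one would overflow.
            if pending_size + len(frame) > max_payload:
                flush()
            pending.append(frame)
            pending_size += len(frame)

    flush()
    return packets
```

With these assumptions, two 100-byte frames followed by a 3,000-byte frame and a 50-byte frame become five packets: one bundle, three fragments, and a final single-frame packet.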

    After the RTP packets have been sent, the buffered media data corresponding to those packets is eventually freed. The sender must not discard data that might be needed for error correction or for the encoding process. This requirement may mean that the sender must buffer data for some time after the corresponding packets have been sent, depending on the codec and error correction scheme used.

    The sender is responsible for generating periodic status reports for the media streams it is generating, including those required for lip synchronization. It also receives reception quality feedback from other participants and may use that information to adapt its transmission.

    Behavior of an RTP Receiver

    A receiver is responsible for collecting RTP packets from the network, correcting any losses, recovering the timing, decompressing the media, and presenting the result to the user. It also sends reception quality feedback, allowing the sender to adapt the transmission to the receiver, and it maintains a database of participants in the session. A possible block diagram for the receiving process is shown in Figure 1.3; implementations sometimes perform the operations in a different order depending on their needs.

    Figure 1.3. Block Diagram of an RTP Receiver

    Summary

      This chapter has introduced the protocols and standards for real-time delivery of multimedia over IP networks, in

     particular the Real-time Transport Protocol (RTP). The remainder of this book discusses the features and use of RTP

    in detail. The aim is to expand on the standards documents, explaining both the rationale behind the standards and

     possible implementation choices and their trade-offs.

    Chapter 2. Voice and Video Communication Over Packet Networks

    TCP/IP and the OSI Reference Model

    Performance Characteristics of an IP Network 

    Measuring IP Network Performance

    Effects of Transport Protocols

    Requirements for Audio/Video Transport in Packet Networks

    Before delving into details of RTP, you should understand the properties of IP networks such as the Internet, and how they affect voice and video communication. This chapter reviews the basics of the Internet architecture and outlines typical behavior of a network connection. This review is followed by a discussion of the transport requirements for audio and video, and how well these requirements are met by the network.

    IP networks have unique characteristics that influence the design of applications and protocols for audio/video transport. Understanding these characteristics is vital if you are to appreciate the trade-offs involved in the design of RTP, and how they influence applications that use RTP.

    TCP/IP and the OSI Reference Model

    When you're thinking about computer networks, it is important to understand the concepts and implications of protocol layering. The OSI reference model [93], illustrated in Figure 2.1, provides a useful basis for discussion and comparison of layered systems.

    Figure 2.1. The OSI Reference Model

    The model comprises a set of seven layers, each building on the services provided by the lower layer and, in turn, providing a more abstract service to the layer above. The functions of the layers are as listed here:

    1. Physical layer. The lowest layer, the physical layer, includes the physical network connection devices and protocols, such as cables, plugs, switches, and electrical standards.

    2. Data link layer. The data link layer builds on the physical connection; for example, it turns a twisted-pair cable into Ethernet. This layer provides framing for data transport units, defines how the link is shared among multiple connected devices, and supplies addressing for devices on each link.

    3. Network layer. The network layer connects links, unifying them into a single network. It provides addressing and routing of messages through the network. It may also provide control of congestion in the switches, prioritization of certain messages, billing, and so on. A network layer device processes messages received from one link and dispatches them to another, using routing information exchanged with its peers at the far ends of those links.

    4. Transport layer. The transport layer is the first end-to-end layer. It takes responsibility for delivery of messages from one system to another, using the services provided by the network layer. This responsibility includes providing reliability and flow control if they are needed by the session layer and not provided by the

    Performance Characteristics of an IP Network

    As is apparent from the hourglass model of the Internet architecture, an application is hidden from the details of the lower layers by the abstraction of IP. This means it's not possible to determine directly the types of networks across which an IP packet will have traveled (it could be anything from a 14.4-kilobit cellular radio connection to a multi-gigabit optical fiber) or the level of congestion of that network. The only means of discovering the performance of the network are observation and measurement.

    So what do we need to measure, and how do we measure it? Luckily, the design of the IP layer means that the number of parameters is limited, and that number often can be further constrained by the needs of the application. The most important questions we can ask are these:

    What is the probability that a packet will be lost in the network?

    What is the probability that a packet will be corrupted in the network?

    How long does a packet take to traverse the network? Is the transit time constant or variable?

    What size of packet can be accommodated?

    What is the maximum rate at which we can send packets?

    The next section provides some sample measurements for the first four listed parameters. The maximum rate is closely tied to the probability that packets are lost in the network, as discussed in Chapter 10, Congestion Control.

    What affects such measurements? The obvious factor is the location of the measurement stations. Measurements taken between two systems on a LAN will clearly show properties different from those of a transatlantic connection! But geography is not the only factor; the number of links traversed (often referred to as the number of hops), the number of providers crossed, and the times at which the measurements are taken all are factors. The Internet is a large, complex, and dynamic system, so care must be taken to ensure that any measurements are representative of the part of the network where an application is to be used.

    We also have to consider what sort of network is being used, what other traffic is present, and how much other traffic is present. To date, the vast majority of network paths are fixed, wired (either copper or optical fiber) connections, and the vast majority of traffic (96% of bytes, 62% of flows, according to a recent estimate [123]) is TCP based. The implications of these traffic patterns are as follows:

    Because the infrastructure is primarily wired and fixed, the links are very reliable, and loss is caused mostly by congestion in the routers.

    TCP transport makes the assumption that packet loss is a signal that the bottleneck bandwidth has been reached, congestion is occurring, and it should reduce its sending rate. A TCP flow will increase its sending rate until loss is observed, and then back off, as a way of determining the maximum rate a particular connection can support. Of course, the result is a temporary overloading of the bottleneck link, which may affect other traffic.

    If the composition of the network infrastructure or the traffic changes, other sources of loss may become important. For example, a large increase in the number of wireless users would likely increase the proportion of loss due to

    Measuring IP Network Performance

    This section outlines some of the available data on IP network performance, including published results on average packet loss, patterns of loss, packet corruption and duplication, transit time, and the effects of multicast.

    Several studies have measured network behavior over a wide range of conditions on the public Internet. For example, Paxson reports on the behavior of 20,000 transfers among 35 sites in 9 countries [124, 95]; Handley [122] and Bolot [67, 66] report on the behavior of multicast sessions; and Yajnik, Moon, Kurose, and Towsley report on the temporal dependence in packet loss statistics [89, 108, 109]. Other sources of data include the traffic archives maintained by CAIDA (the Cooperative Association for Internet Data Analysis) [117], the NLANR (National Laboratory for Applied Network Research) [119], and the ACM (Association for Computing Machinery) [116].

    Average Packet Loss

    Various packet loss metrics can be studied. For example, the average loss rate gives a general measure of network congestion, while loss patterns and correlation give insights into the dynamics of the network.

    The reported measurements of average packet loss rate show a range of conditions. For example, measurements of TCP/IP traffic taken by Paxson in 1994 and 1995 show that 30% to 70% of flows, depending on path taken and date, showed no packet loss, but of those flows that did show loss, the average loss ranged from 3% to 17% (these results are summarized in Table 2.1). Data from Bolot, using 64-kilobit PCM-encoded audio, shows similar patterns, with loss rates between 4% and 16% depending on time of day, although this data also dates from 1995. More recent results from Yajnik et al., taken using simulated audio traffic in 1997–1998, show lower loss rates of 1.38% to 11.03%. Handley's results (two sets of approximately 3.5 million packets of data and reception report statistics for multicast video sessions in May and September 1996) show loss averaged over five-second intervals varying between 0% and 100%, depending on receiver location and time of day. A sample for one particular receiver during a ten-hour period on May 29, 1996, plotted in Figure 2.5, shows the average loss rate, sampled over five-second intervals, varying between 0% and 20%.
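To make the five-second averaging concrete, here is a minimal Python sketch that groups a packet trace into fixed intervals and reports the fraction lost in each. The trace format, a list of (send time in seconds, received flag) pairs, and the function name `loss_by_interval` are assumptions made for illustration; they do not reproduce the tools used in the cited studies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def loss_by_interval(trace: Iterable[Tuple[float, bool]],
                     interval: float = 5.0) -> Dict[int, float]:
    """Return the loss fraction for each interval that contains sent packets.

    trace    -- (send_time_seconds, was_received) pairs; hypothetical format
    interval -- averaging window in seconds (five seconds, as in the text)
    """
    sent = defaultdict(int)
    lost = defaultdict(int)
    for send_time, was_received in trace:
        bucket = int(send_time // interval)  # index of the interval
        sent[bucket] += 1
        if not was_received:
            lost[bucket] += 1
    # Only intervals in which at least one packet was sent appear in the result.
    return {bucket: lost[bucket] / sent[bucket] for bucket in sorted(sent)}
```

Averaging over short intervals like this exposes how loss changes over time, which a single session-wide average would hide.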

    Figure 2.5. Loss Rate Distribution versus Time [122]

    Table 2.1. Packet Loss Rates for Various Regions [95]

    Region           Fraction of Flows Showing No Loss    Average Loss Rate for Flows with Loss
                     Dec. 1994    Dec. 1995               Dec. 1994    Dec. 1995
    Within Europe    48%          58%                     5.3%         5.9%
    Within U.S.      66%          69%                     3.6%         4.4%

    Effects of Transport Protocols

    Thus far, our consideration of network characteristics has focused on IP. Of course, programmers almost never use the raw IP service. Instead, they build their applications on top of one of the higher-layer transport protocols, typically either UDP or TCP. These protocols provide additional features beyond those provided by IP. How do these added features affect the behavior of the network as seen by the application?

    UDP/IP

    The User Datagram Protocol (UDP) provides a minimal set of extensions to IP. The UDP header is shown in Figure 2.14. It comprises 64 bits of additional header representing source and destination port identifiers, a length field, and a checksum.

    Figure 2.14. Format of a UDP Header

    The source and destination ports identify the endpoints within the communicating hosts, allowing for multiplexing of different services onto different ports. Some services run on well-known ports; others use a port that is dynamically negotiated during call setup. The length field is redundant with that in the IP header. The checksum is used to detect corruption of the payload and is optional (it is set to zero for applications that have no use for a checksum).
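As a rough illustration of the 64-bit header just described, the following Python sketch packs and unpacks the four 16-bit fields using the standard `struct` module. The function names are hypothetical; the sketch assumes the length field counts the 8-byte header plus the payload, as specified for UDP in RFC 768, and it leaves the checksum at zero (unused) rather than computing it.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order: 64 bits in total.
UDP_HEADER = struct.Struct("!HHHH")


def build_udp_header(src_port: int, dst_port: int, payload_len: int,
                     checksum: int = 0) -> bytes:
    """Build a UDP header; a checksum of zero means 'no checksum supplied'."""
    length = UDP_HEADER.size + payload_len  # header plus payload, in bytes
    return UDP_HEADER.pack(src_port, dst_port, length, checksum)


def parse_udp_header(data: bytes) -> dict:
    """Decode the first eight bytes of a UDP datagram into named fields."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(data)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
    }
```

Applications rarely build this header themselves, because the operating system's socket interface does it for them; the sketch is only meant to show how little UDP adds to IP.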

    Apart from the addition of ports and a checksum, UDP provides the raw IP service. It does not provide any enhanced reliability to the transport (although the checksum does allow for detection of payload errors that IP does not detect), nor does it affect the timing of packet delivery. An application using UDP provides data packets to the transport layer, which delivers them to a port on the destination machine (or to a group of machines if multicast is used). Those packets may be lost, delayed, or misordered in transit, exactly as observed for the raw IP service.

    TCP/IP

    The most common transport protocol on the Internet is TCP. Although UDP provides only a small set of additions to the IP service, TCP adds a significant amount of additional functionality: It abstracts the unreliable packet delivery service of IP to provide reliable, sequential delivery of a byte stream between ports on the source and a single destination host.

    An application using TCP provides a stream of data to the transport layer, which fragments it for transmission in appropriately sized packets, and at a rate suitable for the network. Packets are acknowledged by the receiver, and those that are lost in transit are retransmitted by the source. When data arrives, it is buffered at the receiver so that it can be delivered in order. This process is transparent to the application, which simply sees a "pipe" of data flowing across the network.

    As long as the application provides sufficient data, the TCP transport layer will increase its sending rate until the network exhibits packet loss. Packet loss is treated as a signal that the bandwidth of the bottleneck link has been exceeded and the connection should reduce its sending rate to match. Accordingly, TCP reduces its sending rate when loss occurs. This process continues, with TCP continually probing the sustainable transmission rate across the network; the result is a sending rate such as that illustrated in Figure 2.15.

    Figure 2.15. Sample TCP Sending Rate
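The probing behavior can be mimicked with a toy model. The Python sketch below assumes an additive-increase, multiplicative-decrease rule with made-up parameters and a fixed bottleneck capacity; it is a deliberately simplified illustration of the sawtooth pattern, not an implementation of TCP's congestion control.

```python
from typing import List


def simulate_probing(capacity: float, rounds: int, start: float = 1.0,
                     increase: float = 1.0, decrease: float = 0.5) -> List[float]:
    """Return the sending rate used in each round of a toy probing model.

    capacity -- bottleneck rate; sending faster than this is treated as loss
    increase -- amount added to the rate after a round without loss (assumed)
    decrease -- factor applied to the rate after a round with loss (assumed)
    """
    rate = start
    history = []
    for _ in range(rounds):
        history.append(rate)
        if rate > capacity:
            # The bottleneck was exceeded: treat it as loss and back off.
            rate *= decrease
        else:
            # No loss observed: keep probing for more bandwidth.
            rate += increase
    return history
```

Plotting the returned list for, say, a capacity of 10 units over 40 rounds gives a repeating ramp-and-drop shape: the rate climbs past the capacity, is cut back, and climbs again.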

    Requirements for Audio/Video Transport in Packet Networks

    So far, this chapter has explored the characteristics of IP networks in some detail, and has looked briefly at the behavior of the transport protocols layered above them. We can now relate this discussion to real-time audio and video transport, consider the requirements for delivery of media streams over an IP network, and determine how well the network meets those requirements.

    When we describe media as real-time, we mean simply that the receiver is playing out the media stream as it is received, rather than simply storing the complete stream in a file for later play-back. In the ideal case, playout at the receiver is immediate and synchronous, although in practice some unavoidable transmission delay is imposed by the network.

    The primary requirement that real-time media places on the transport protocol is for predictable variation in network transit time. Consider, for example, an IP telephony system transporting encoded voice in 20-millisecond frames: The source will transmit one packet every 20 milliseconds, and ideally we would like those to arrive with the same spacing so that the speech they contain can be played out immediately. Some variation in transit time can be accommodated by the insertion of additional buffering delay at the receiver, but this is possible only if that variation can be characterized and the receiver can adapt to match the variation (this process is described in detail in Chapter 6, Media Capture, Playout, and Timing).
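A simple way to see how buffering absorbs transit-time variation is to work it out for a short trace. The Python sketch below is a retrospective calculation under stated assumptions: it is given the send and arrival times (in milliseconds) of every packet in advance, and it picks the smallest fixed buffering delay that lets each packet be played out on schedule. A real receiver has to estimate the variation as packets arrive; the function name and the optional `margin` parameter are hypothetical.

```python
from typing import List, Tuple


def playout_schedule(send_times: List[int], arrival_times: List[int],
                     margin: int = 0) -> Tuple[int, List[int]]:
    """Compute a fixed buffering delay and the resulting playout times.

    All times are in milliseconds. A constant offset between the sender's
    and receiver's clocks does not matter, because only differences in
    transit time are used to size the buffer.
    """
    transits = [arrival - sent for sent, arrival in zip(send_times, arrival_times)]
    fastest = min(transits)
    # Buffer for the spread between the fastest and slowest packet,
    # plus an optional safety margin.
    buffering = (max(transits) - fastest) + margin
    # Playout keeps the sender's spacing, shifted by transit plus buffering.
    playout = [sent + fastest + buffering for sent in send_times]
    return buffering, playout
```

For four packets sent 20 ms apart that arrive at 50, 75, 92, and 115 ms, the transit times are 50, 55, 52, and 55 ms, so 5 ms of buffering is enough to play every packet out at an even 20 ms spacing.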

    A lesser requirement is reliable delivery of all packets by the network. Clearly, reliable delivery is desirable, but many audio and video applications can tolerate some loss: In our IP telephony example, loss of a single packet will result in a dropout of one-fiftieth of a second, which, with suitable error concealment, is barely noticeable. Because of the time-varying nature of media streams, some loss is usually acceptable because its effects are quickly corrected by the arrival of new data. The amount of loss that is acceptable depends on the application, the encoding method used, and the pattern of loss. Chapter 8, Error Concealment, and Chapter 9, Error Correction, discuss loss tolerance.

    These requirements drive the choice of transport protocol. It should be clear that TCP/IP is not appropriate because it favors reliability over timeliness, and our applications require timely delivery. A UDP/IP-based transport should be suitable, provided that the variation in transit time of the network can be characterized and loss rates are acceptable.

    The standard Real-time Transport Protocol (RTP) builds on UDP/IP, and provides timing recovery and loss detection, to enable the development of robust systems. RTP and associated standards will be discussed in extensive detail in the remainder of this book.

    Despite TCP's limitations for real-time applications, some audio/video applications use it for their transport. Such applications attempt to estimate the average throughput of the TCP connection and adapt their send rate to match. This approach can be made to work when tight end-to-end delay bounds are not required and an application has several seconds' worth of buffering to cope with the variation in delivery time caused by TCP retransmission and congestion control. It does not work reliably for interactive applications, which need short end-to-end delay, because the variation in transit time caused by TCP is too great.

    The primary rationale for the use of TCP/IP transport is that many firewalls pass TCP connections but block UDP. This situation is changing rapidly, as RTP-based systems become more prevalent and firewalls smarter. I strongly recommend that new applications be based on RTP-over-UDP/IP. RTP provides for higher quality by enabling applications to adapt in a way that is appropriate for real-time media, and by promoting interoperability (because it is an open standard).

    Benefits of Packet-Based Audio/Video

    At this stage you may be wondering why anyone would consider a packet-based audio or video application over an IP network. Such a network clearly poses challenges to the reliable delivery of real-time media streams. Although these challenges are real, an IP network has some distinct advantages that lead to the potential for significant gains in efficiency and flexibility, which can outweigh the disadvantages.

    Summary

      The properties of an IP network are significantly different from those of traditional telephony, audio, or video

    distribution networks. When designing applications that work over IP, you need to be aware of these unique

    characteristics, and make your system robust to their effects.

      The remainder of this book will describe an architecture for such systems, explaining RTP and its model for timing

    recovery and lip synchronization, error correction and concealment, congestion control, header compression,

    multiplexing and tunneling, and security.

    Part II: Media Transport Using RTP

    3 The Real-time Transport Protocol

    4 RTP Data Transfer Protocol

    5 RTP Control Protocol

    6 Media Capture, Playout, and Timing

    7 Lip Synchronization

    Chapter 3. The Real-Time Transport Protocol

      Fundamental Design Philosophies of RTP

    Standard Elements of RTP

    Related Standards

    Future Standards Development

    This chapter describes the design of the RTP framework starting with the philosophy and background of the design, gives an overview of the applicable standards, and explains how those standards interrelate. It concludes with a discussion of possible future directions for the development of those standards.

    Fundamental Design Philosophies of RTP

    The challenge facing the designers of RTP was to build a mechanism for robust, real-time media delivery above an unreliable transport layer. They achieved this goal with a design that follows the twin philosophies of application-level framing and the end-to-end principle.

    Application-Level Framing

    The concepts behind application-level framing were first elucidated by Clark and Tennenhouse [65] in 1990. Their central thesis is that only the application has sufficient knowledge of its data to make an informed decision about how that data should be transported. The implication is that a transport protocol should accept data in application-meaningful units (application data units, ADUs) and expose the details of their delivery as much as possible so that the application can make an appropriate response if an error occurs. The application partners with the transport, cooperating to achieve reliable delivery.

    Application-level framing comes from the recognition that there are many ways in which an application can recover from network problems, and that the correct approach depends on both the application and the scenario in which it is being used. In some cases it is necessary to retransmit an exact copy of the lost data. In others, a lower-fidelity copy may be used, or the data may have been superseded, so the replacement is different from the original. Alternatively, the loss can be ignored if the data was of only transient interest. These choices are possible only if the application interacts closely with the transport.
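One way to picture these choices is as a policy function that the application supplies and the transport consults when it reports a loss. The sketch below is purely illustrative: the `LostADU` fields, the `Recovery` options, and the order in which the tests are applied are assumptions made for this example, not part of RTP or of the original application-level framing proposal.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Recovery(Enum):
    RETRANSMIT_EXACT = auto()     # resend an exact copy of the lost data
    SEND_LOWER_FIDELITY = auto()  # resend a cheaper approximation
    SEND_REPLACEMENT = auto()     # send newer data that supersedes the loss
    IGNORE = auto()               # the data was only of transient interest


@dataclass
class LostADU:
    """Hypothetical description of a lost application data unit."""
    deadline: float   # time (seconds) by which a repair must arrive to be useful
    superseded: bool  # True if newer data has already replaced this unit
    essential: bool   # True if later data cannot be decoded without it


def choose_recovery(adu: LostADU, now: float, rtt: float) -> Recovery:
    """Pick a response to a reported loss (an assumed policy, for illustration)."""
    if adu.superseded:
        return Recovery.SEND_REPLACEMENT
    if now + rtt > adu.deadline:
        # A repair could not arrive in time to be played out.
        return Recovery.IGNORE
    if adu.essential:
        return Recovery.RETRANSMIT_EXACT
    return Recovery.SEND_LOWER_FIDELITY
```

The point of the sketch is the division of labor: the transport exposes the loss, and the application, which knows what the data means, decides what to do about it.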

    The goal of application-level framing is somewhat at odds with the design of TCP, which hides the lossy nature of the underlying IP network to achieve reliable delivery at the expense of timeliness. It does, however, fit well with UDP-based transport and with the characteristics of real-time media. As noted in Chapter 2, Voice and Video Communication over Packet Networks, real-time audio and visual media is often loss tolerant but has strict timing bounds. By using application-level framing with UDP-based transport, we are able to accept losses where necessary, but we also have the flexibility to use the full spectrum of recovery techniques, such as retransmission and forward error correction, where appropriate.

    These techniques give an application great flexibility to react to network problems in a suitable manner, rather than being constrained by the dictates of a single transport layer.

    A network that is designed according to the principles of application-level framing should not be specific to a particular application. Rather it should expose the limitations of a generic transport layer so that the application can cooperate with the network in achieving the best possible delivery. Application-level framing implies a weakening of the strict layers defined by the OSI reference model. It is a pragmatic approach, acknowledging the importance of layering, but accepting the need to expose some details of the lower layers.

    The philosophy of application-level framing implies smart, network-aware applications that are capable of reacting to problems.

      T h e E nd - to - E n d P r i nc i p l e

      The other design philosophy adopted by RTP is the end-to-end principle.70  I t is o ne o f t wo a p pr oa c he s t o d e signing

    a s ys t em t hat mus t co mmunicate r eliab ly acr os s a net wo r k. I n o ne ap p ro ach, t he s ys t em can p as s r es po ns ib ilit y fo r t he

    correct delivery of data along with that data, thus ensuring reliability hop by hop. In the other approach, the

    responsibility for data can remain with the endpoints, ensuring reliability end-to-end even if the individual hops are

    unr eliab le. I t is t his s econd end - t o - end ap p ro ach t hat p ermeates t he d es ign o f t he I nt ernet , w it h b o th TC P and R TPfollowing the end-to - end principle.

      The main consequence of the end-to-end principle is that intelligence tends to bubble up toward the top of the

     protocol stack. If the systems that make up the network path never take responsibility for the data, they can be simple

    and do not need to be robust. They may discard data that they cannot deliver, because the endpoints will recover

    without their help. The end-to-end principle implies that intelligence is at the endpoints, not within the network.


    Standard Elements of RTP

      The primary standard for audio/video transport in IP networks is the Real-time Transport Protocol (RTP), along with

    associated profiles and payload formats. RTP was developed by the Audio/Video Transport working group of the

    Internet Engineering Task Force (IETF), and it has since been adopted by the International Telecommunications

    Union (ITU) as part of its H.323 series of recommendations, and by several other standards organizations.

      RTP provides a framework for the transport of real-time media and needs to be profiled for particular uses before it

    is complete. The RTP profile for audio and video conferences with minimal control was standardized along with RTP,

    and several more profiles are under development. Each profile is accompanied by several payload format

    specifications, each of which describes the transport of a particular media format.

      The RTP Specification

      RTP was published as an IETF proposed standard (RFC 1889) in January 1996,6 and its revision for draft standard

    status is almost complete.50  The first revision of ITU recommendation H.323 included a verbatim copy of the RTP

    specification; later revisions reference the current IETF standard.

    In the IETF standards process,8 a specification undergoes a development cycle in which multiple Internet

    drafts are produced as the details of the design are worked out. When the design is complete, it is

     published as a proposed standard RFC. A proposed standard is generally considered stable, with all

    known design issues worked out, and suitable for implementation. If that proposed standard proves

    useful, and if there are independent and interoperable implementations of each feature of that standard, it

    can then be advanced to draft standard status (possibly involving changes to correct any problems found

    in the proposed standard). Finally, after extensive experience, it may be published as a full standard

    RFC. Advancement beyond proposed standard status is a significant hurdle that many protocols never 

    achieve.

    RTP typically sits on top of UDP/IP transport, enhancing that transport with loss detection and reception quality

    reporting, provision for timing recovery and synchronization, payload and source identification, and marking of

    significant events within the media stream. Most implementations of RTP are part of an application or library that is

    layered above the UDP/IP sockets interface provided by the operating system. This is not the only possible design,

    though, and nothing in the RTP protocol requires UDP or IP. For example, some implementations layer RTP above

    TCP/IP, and others use RTP on non-IP networks, such as Asynchronous Transfer Mode (ATM) networks.

      There are two parts to RTP: the data transfer protocol and an associated control protocol. The RTP data transfer

     protocol manages delivery of real-time data, such as audio and video, between end systems. It defines an additional level of framing for the media payload, incorporating a sequence number for loss detection, timestamp to enable timing

    recovery, payload type and source identifiers, and a marker for significant events within the media stream. Also

    specified are rules for timestamp and sequence number usage, although these rules are somewhat dependent on the

     profile and payload format in use, and for multiplexing multiple streams within a session. The RTP data transfer 

     protocol is discussed further in Chapter 4.
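
      As a rough illustration of how a receiver might use the sequence number for loss detection, the sketch below counts the gaps between consecutively received packets. It assumes the 16-bit sequence number of the RTP header, which wraps around to zero, and it assumes packets arrive in order. The name count_lost is hypothetical and is not taken from the RTP specification.

```python
SEQ_MOD = 1 << 16  # RTP sequence numbers are 16 bits and wrap around


def count_lost(seqs):
    """Count packets missing from an in-order list of received sequence numbers.

    Simplifying assumptions: no reordering, and no single gap of 2**16 or
    more packets.
    """
    lost = 0
    prev = None
    for seq in seqs:
        if prev is not None:
            # Forward distance modulo 2**16; a distance of 1 means no loss.
            gap = (seq - prev) % SEQ_MOD
            if gap > 1:
                lost += gap - 1
        prev = seq
    return lost
```

      A real receiver would also have to cope with reordered and duplicated packets, which this sketch does not attempt.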

      The RTP control protocol (RTCP) provides reception quality feedback, participant identification, and

    synchronization between media streams. RTCP runs alongside RTP and provides periodic reporting of this

    information. Although data packets are typically sent every few milliseconds, the control protocol operates on the

    scale of seconds. The information sent in RTCP is necessary for synchronization between media streams (for

    example, for lip synchronization between audio and video) and can be useful for adapting the transmission according to reception quality feedback, and for identifying the participants. The RTP control protocol is discussed further in

    Chapter 5.

      RTP supports the notion of mixers and translators, middle boxes that can operate on the media as it flows between

    endpoints. These may be used to translate an RTP session between different lower-layer protocols—for example,

     bridging between participants on IPv4 and IPv6 networks, or bringing a unicast-only participant into a multicast


    Future Standards Development

      With the revision of RTP for draft standard status, there are no known unresolved issues with the protocol

    specification, and RTP itself is not expected to change in the foreseeable future. This does not mean that the standards

    work is finished, though. New payload formats are always under development, and work on new profiles will extend

    RTP to encompass new functionality (for example, the profiles for secure RTP and enhanced feedback).

      In the long term, we expect the RTP framework to evolve along with the network itself. Future changes in the

    network may also affect RTP, and we expect new profiles to be developed to take advantage of any changes. We

    also expect a continual series of new payload format specifications, to keep up with changes in codec technology and

    to provide new error resilience schemes.

      Finally, we can expect considerable changes in the related protocols for call setup and control, resource reservation,

    and quality of service. These protocols are newer than RTP, and they are currently undergoing rapid development,

    implying that changes here will likely be more substantial than changes to RTP, its profile, and payload formats.


    Summary

      RTP provides a flexible framework for delivery of real-time media, such as audio and video, over IP networks. Its

    core philosophies, application-level framing and the end-to-end principle, make it well suited to the unique

    environment of IP networks.

      This chapter has provided an overview of the RTP specification, profiles, and payload formats. Related standards

    cover call setup, control and advertisement, and resource reservation.

      The two parts of RTP introduced in this chapter, the data transfer protocol and the control protocol, are covered

    in detail in the next two chapters.


    Chapter 4. RTP Data Transfer Protocol

      RTP Sessions

    The RTP Data Transfer Packet

    Packet Validation

    Translators and Mixers

      This chapter explains the RTP data transfer protocol, the means by which real-time media is exchanged. The

    discussion focuses on the "on-the-wire" aspects of RTP, that is, the packet formats and requirements for

    interoperability; the design of a system using RTP is explained in later chapters.


    RTP Sessions

      A session consists of a group of participants who are communicating using RTP. A participant may be active in

    multiple RTP sessions—for instance, one session for exchanging audio data and another session for exchanging video

    data. For each participant, the session is identified by a network address and port pair to which data should be sent,

    and a port pair on which data is received. The send and receive ports may be the same. Each port pair comprises two

    adjacent ports: an even-numbered port for RTP data packets, and the next higher (odd-numbered) port for RTCP

    control packets. The default port pair is 5004 and 5005 for UDP/IP, but many applications dynamically allocate ports

    during session setup and ignore the default. RTP sessions are designed to transport a single type of media; in a

    multimedia communication, each media type should be carried in a separate RTP session.

    The latest revision to the RTP specification relaxes the requirement that the RTP data port be

    even-numbered, and allows non-adjacent RTP and RTCP ports. This change makes it possible to use

    RTP in environments where certain types of Network Address Translation (NAT) devices are present. If                 

     possible, for compatibility with older implementations, it is wise to use adjacent ports, even though this is

    not strictly required.
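
      The port-pair convention described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and because the revised specification no longer requires even-numbered or adjacent ports, is_classic_pair is best read as a compatibility hint rather than a validity test.

```python
DEFAULT_RTP_PORT = 5004  # default RTP port for UDP/IP; RTCP then uses 5005


def classic_port_pair(rtp_port=DEFAULT_RTP_PORT):
    """Return (rtp_port, rtcp_port) following the even/odd adjacent convention."""
    if not 0 < rtp_port < 65535:
        raise ValueError("RTP port must leave room for the next higher RTCP port")
    if rtp_port % 2 != 0:
        raise ValueError("the classic convention expects an even-numbered RTP port")
    return rtp_port, rtp_port + 1


def is_classic_pair(rtp_port, rtcp_port):
    """True if the pair is an even-numbered RTP port plus the adjacent odd RTCP port."""
    return rtp_port % 2 == 0 and rtcp_port == rtp_port + 1
```

      An application that allocates ports dynamically could try classic_port_pair on a candidate even port first and fall back to non-adjacent ports only when the environment (for example, a NAT device) makes that necessary.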

    A session can be unicast, either directly between two participants (a point-to-point session) or to a central server that

    redistributes the data. Or it can be multicast to a group of participants. A session also need not be restricted to a

    single transport address space. For example, RTP translators can be used to bridge a session between unicast and

    multicast, or between IP and another transport, such as IPv6 or ATM. Translators are discussed in more detail later 

    in this chapter, in the section titled Translators and Mixers. Some examples of session topologies are shown in Figure

    4.1.

      Figure 4.1. Types of RTP Sessions

      The range of possible sessions means that an RTP end system should be written to be essentially agnostic about the

    underlying transport. It is good design to restrict knowledge of the transport address and ports to your low-level

    networking code only, and to use RTP-level mechanisms for participant identification. RTP provides a

    "synchronization source" for this purpose, described in more detail later in this chapter.

      In particular, note these tips:

      You should not use a transport address as a participant identifier because the data may have passed through

    a translator or mixer that may hide the original source address. Instead, use the synchronization source

    identifiers.
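
      One way to follow this tip is to key all per-participant state on the synchronization source (SSRC) and to treat the transport address purely as metadata. The sketch below is a minimal illustration under that assumption; the class and field names are hypothetical.

```python
class ParticipantTable:
    """Tracks participants by SSRC; the transport address is kept only as metadata."""

    def __init__(self):
        self._by_ssrc = {}

    def on_packet(self, ssrc, transport_addr):
        """Record a packet and return the entry for the SSRC that sent it."""
        entry = self._by_ssrc.setdefault(ssrc, {"packets": 0, "last_addr": None})
        entry["packets"] += 1
        # The address may belong to a translator or mixer, not the original source.
        entry["last_addr"] = transport_addr
        return entry

    def __len__(self):
        return len(self._by_ssrc)
```

      With this layout, two sources relayed through the same translator address still appear as two participants, and one source whose address changes remains a single participant.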


    The RTP Data Transfer Packet

      The format of an RTP data transfer packet is illustrated in Figure 4.2. There are four parts to the packet:

    1. The mandatory RTP header

    2. An optional header extension

    3. An optional payload header (depending on the payload format used)

    4. The payload data itself

      Figure 4.2. An RTP Data Transfer Packet

      The entire RTP packet is contained within a lower-layer payload, typically UDP/IP.

      Header Elements

      The mandatory RTP data packet header is typically 12 octets in length, although it may contain a contributing source

    list, which can expand the length by 4 to 60 additional octets. The fields in the mandatory header are the payload

    type, sequence number, timestamp, and synchronization source identifier. In addition, there is a count of contributing

    sources, a marker for interesting events, support for padding and a header extension, and a version number.
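
      To make the header fields concrete, here is a minimal sketch of a parser for the mandatory header and the contributing source (CSRC) list. The bit positions follow the header layout defined in the RTP specification rather than anything spelled out in the text above, and the names RtpHeader and parse_rtp_header are hypothetical. The 4-bit CSRC count allows up to 15 entries of 4 octets each, which is where the 4 to 60 additional octets come from.

```python
import struct
from collections import namedtuple

RtpHeader = namedtuple(
    "RtpHeader",
    "version padding extension csrc_count marker payload_type "
    "sequence timestamp ssrc csrcs",
)


def parse_rtp_header(data):
    """Parse the mandatory RTP header and any contributing source (CSRC) list."""
    if len(data) < 12:
        raise ValueError("shorter than the 12-octet mandatory header")
    # Network byte order: two flag octets, 16-bit sequence, 32-bit timestamp and SSRC.
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    cc = b0 & 0x0F  # 4-bit count of contributing sources (0 to 15)
    end = 12 + 4 * cc  # each CSRC entry is 4 octets, so at most 60 extra octets
    if len(data) < end:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack("!%dI" % cc, data[12:end])
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        csrc_count=cc,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
    )
```

      The sketch stops at the end of the CSRC list; handling of the optional header extension, padding, and payload header is left out.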

      PAYLOAD TYPE

      The payload type, or PT, field of the RTP header identifies the media transported by an RTP packet. The receiving

    application examines the payload type to determine how to treat the data—for example, passing it to a particular 


    Packet Validation

      Because RTP sessions typically use a dynamically negotiated port pair, it is especially important to validate that

     packets received really are RTP, and not misdirected other data. At first glance, confirming this fact is nontrivial

     because RTP packets do not contain an explicit protocol identifier; however, by observing the progression of header 

    fields over several packets, we can quickly obtain strong confidence in the validity of an RTP stream.

      Possible validity checks that can be performed on a stream of RTP packets are outlined in Appendix A of the RTP

    specification. There are two types of tests:

    1. Per-packet checking, based on fixed known values of the header fields. For example, packets in which the

    version number is not equal to 2 are invalid, as are those with an unexpected payload type.

    2. Per-flow checking, based on patterns in the header fields. For example, if the SSRC is constant, and the sequence number increments by one with each packet received, and the timestamp intervals are appropriate

    for the payload type, this is almost certainly an RTP flow and not a misdirected stream.
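
      The per-packet test can be sketched in a few lines. This example checks only the two conditions named above (version number and expected payload type) plus a minimum length; the function name and the idea of passing the expected payload types as a set are assumptions made for illustration.

```python
RTP_VERSION = 2


def packet_looks_valid(data, expected_payload_types):
    """Per-packet check: correct version and an expected payload type."""
    if len(data) < 12:  # too short to hold the mandatory header
        return False
    version = data[0] >> 6          # top two bits of the first octet
    payload_type = data[1] & 0x7F   # low seven bits of the second octet
    return version == RTP_VERSION and payload_type in expected_payload_types
```

      Because this check needs no stored state, it is cheap enough to run on every arriving packet before any per-source resources are allocated.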

      The per-flow checks are more likely to detect invalid packets, but they require additional state to be kept in the

    receiver. This state is required for a valid source, but care must be taken because holding too much state to detect

    invalid sources can lead to a denial-of-service attack, in which a malicious source floods a receiver with a stream of

     bogus packets designed to use up resources.

      A robust implementation will employ strong per-packet validity checks to weed out as many invalid packets as

     possible before committing resources to the per-flow checks to catch the others. It should also be prepared to aggressively discard state for sources that appear to be bogus, to mitigate the effects of denial-of-service attacks.
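
      A per-flow check with bounded state might look like the sketch below. It is loosely inspired by the idea of putting a new source on probation until its sequence numbers have been seen to increment, but it is not the algorithm from the RTP specification: the class name, the constants, and the eviction policy are all assumptions, and timestamp checks are omitted for brevity.

```python
MIN_SEQUENTIAL = 2  # consecutive in-order packets required (assumed value)
MAX_PENDING = 64    # cap on unvalidated sources held at once (assumed value)


class FlowValidator:
    def __init__(self):
        self.valid = set()  # SSRCs that have passed the per-flow check
        self.pending = {}   # SSRC -> (next expected seq, packets still needed)

    def accept(self, ssrc, seq):
        """Return True once the SSRC has shown consecutive sequence numbers."""
        if ssrc in self.valid:
            return True
        state = self.pending.get(ssrc)
        if state is not None and seq == state[0]:
            remaining = state[1] - 1
        else:
            # First packet, or a break in sequence: (re)start probation.
            remaining = MIN_SEQUENTIAL - 1
            if state is None and len(self.pending) >= MAX_PENDING:
                # Discard the oldest unvalidated source to bound memory use.
                self.pending.pop(next(iter(self.pending)))
        if remaining == 0:
            self.pending.pop(ssrc, None)
            self.valid.add(ssrc)
            return True
        self.pending[ssrc] = ((seq + 1) % 65536, remaining)
        return False
```

      The cap on pending sources is what limits the damage from a flood of bogus SSRCs: however many arrive, the receiver never holds probation state for more than MAX_PENDING of them.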

      It is also possible to validate the contents of an RTP data stream against the corresponding RTCP control packets.

    To do this, the application discards RTP packets until an RTCP source description packet with the same SSRC is

    received. This is a very strong validity check, but it can result in significant validation delay, particularly in large

    sessions (because the RTCP reporting interval can be many seconds). For this reason we recommend that

    applications validate the RTP data stream directly, using RTCP as confirmation rather than the primary means of                 

    validation.


    Translators and Mixers

      In addition to normal end systems, RTP supports middle boxes that can operate on a media stream within a session.

    Two classes of middle boxes are defined: translators and mixers.

      Translators

      A translator is an intermediate system that operates on RTP data while maintaining the synchronization source and

    timeline of a stream. Examples include systems that convert between media-encoding formats without mixing, that

     bridge between different transport protocols, that add or remove encryption, or that filter media streams. A translator 

    is invisible to the RTP end systems unless those systems have prior knowledge of the untranslated media. There are a

    few classes of translators:

      Bridges. Bridges are one-to-one translators that don't change the media encoding—for example, gateways

     between different transport protocols, like RTP/UDP/IP and RTP/ATM, or RTP/UDP/IPv4 and

    RTP/UDP/IPv6. Bridges make up the simplest class of translator, and typically they cause no changes to the

    RTP or RTCP data.

    Transcoders. Transcoders are one-to-one translators that change the media encoding (for example,

    decoding the compressed data and reencoding it with a different payload format) to better suit the

    characteristics of the output network. The payload type usually changes, as may the padding, but other RTP

    header fields generally remain unchanged. These translations require state to be maintained so that the RTCP

    sender reports can be adjusted to match, because they contain counts of source bit rate.

    Exploders. Exploders are one-to-many translators, which take in a single packet and produce multiple packets. For example, they receive a stream in which multiple frames of codec output are included within

    each RTP packet, and they produce output with a single frame per packet. The generated packets have the

    same SSRC, but the other RTP header fields may have to be changed, depending on the translation. These

    translations require maintenance of bidirectional state: The translator must adjust both outgoing RTCP sender 

    reports and returning receiver reports to match.

    Mergers. Mergers are many-to-one translators, combining multiple packets into one. This is the inverse of the

     previous category, and the same issues apply.
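
      As an illustration of the exploder case, the sketch below splits each multi-frame packet into single-frame packets that keep the original SSRC while receiving new sequence numbers and per-frame timestamps. It assumes every frame has the same fixed duration in timestamp units, which is not true of all payload formats; the Packet and Exploder names are hypothetical, and the RTCP report adjustment mentioned above is not shown.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    ssrc: int
    seq: int
    timestamp: int
    frames: tuple  # codec frames carried in this packet


class Exploder:
    """Splits each multi-frame packet into single-frame packets with the same SSRC."""

    def __init__(self, frame_duration, first_seq=0):
        self.frame_duration = frame_duration  # timestamp units per frame (assumed fixed)
        self.next_seq = first_seq

    def explode(self, packet):
        out = []
        for i, frame in enumerate(packet.frames):
            out.append(replace(
                packet,
                seq=self.next_seq,  # output stream has its own sequence space
                timestamp=(packet.timestamp + i * self.frame_duration) % (1 << 32),
                frames=(frame,),
            ))
            self.next_seq = (self.next_seq + 1) % (1 << 16)
        return out
```

      A merger would perform the inverse mapping, which is why both directions need state: the sequence numbers seen by receivers no longer match those sent by the source.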

      The defining characteristic of a translator is that each input stream produces a single output stream, with the same

    SSRC. The translator itself is not a participant in the RTP se