RTP - Audio and Video for the Internet
Addison-Wesley, 2003
ISBN-10: 0672322498
ISBN-13: 978-0672322495
First Edition (English)
Note: This book was published in 2003. While the content is still an
accurate description of the core RTP protocol, it's missing details of
many important and widely used extensions.
This book describes the protocols, standards, and architecture of systems
that deliver real-time voice, music, and video over IP networks, such as
the Internet. Relevant applications include voice-over-IP, telephony,
teleconferencing, streaming video, and web-casting. The focus of the book
is media transport: how to reliably deliver audio and video across an IP
network, how to ensure high quality in the face of network problems, and
how to ensure that the system is secure. The book adopts a standards-based
approach, based around the Real-time Transport Protocol, RTP, and its
associated profiles and payload formats. It describes the RTP framework,
how to build a system that uses that framework, and extensions to RTP for
security and reliability.
Contents
The book is logically divided into four sections: The first section
introduces the problem space, provides background, and outlines the
properties of the Internet that affect audio/video transport. These
are the chapters in the first section:
- Chapter 1, Introduction gives a brief introduction to
the Real-time Transport Protocol, outlines the relationship
between RTP and other standards, and describes the scope of
the book.
- Chapter 2, Audio/Video Communication over Packet
Networks describes the unique environment provided by IP
networks, and how this affects packet audio/video applications.
The second part, consisting of the next five chapters, discusses the
basics of the Real-time Transport Protocol. This is information you
need to design and build a tool for voice-over-IP, streaming music or
video, and so forth. Following are the related chapters:
- Chapter 3, The Real-Time Transport Protocol introduces
RTP and related standards, describes how they fit together, and
outlines the design philosophy underpinning the protocol.
- Chapter 4, RTP Data Transfer Protocol is a detailed
description of the transport protocol, used to convey audio/visual
data over IP networks.
- Chapter 5, RTP Control Protocol describes the control
protocol, which provides reception quality feedback, membership
control, and synchronization.
- Chapter 6, Media Capture, Playout and Timing explains
how a receiver can reconstruct the audio/visual data and play
it out to the user with correct timing.
- Chapter 7, Lip Synchronization addresses a related
problem: how to synchronize media streams, for example, to get
lip synchronization.
The third part of the book discusses robustness: how to make your
application reliable in the face of network problems, and how to
compensate for loss and congestion in the network. You can build a
system without using these techniques, but it will sound a lot
better, and the pictures will be smoother and less susceptible to
corruption, if you apply them. These chapters make up the third part
of the book:
- Chapter 8, Error Concealment addresses the issue of
concealing errors caused by incomplete reception, describing
several techniques a receiver can use to hide network problems.
- Chapter 9, Error Correction describes techniques in which the
sender and receiver cooperate to repair damaged media streams.
- Chapter 10, Congestion Control discusses the way the
Internet responds to congestion, and how this affects the design
of audio/video applications.
The final section describes a number of techniques that have more
specialized use. Many implementations do not use these features, but
they can give a significant performance increase in some cases. These
are the chapters:
- Chapter 11, Header Compression outlines a technique
that can significantly increase the efficiency of RTP on
low-speed network links, such as dial-up modems or cellular
radio links.
- Chapter 12, Multiplexing and Tunnelling presents
alternatives to header compression that work by combining
several media streams into one. Again, the intent is to
improve efficiency on low-speed links.
- Chapter 13, Security Considerations describes how
encryption and authentication technology can be used to protect
RTP sessions; it also describes common security and privacy
issues.
Audience
This book describes audio/video transport over IP networks in
considerable detail. It assumes some basic familiarity with IP
network programming, and the operation of network protocols,
and builds on this knowledge to describe the features unique to
audio/video transport. An extensive list of references is included,
pointing readers to additional information on specific topics and
to background reading material.
Several audiences will find this book useful:
- Engineers: The primary audience is those building Voice-over-IP
applications, teleconferencing systems, and streaming media and
web-casting applications. This book is a guide to the design and
implementation of the media transport engine part of such systems.
It should be read in conjunction with the relevant technical
standards, and it builds from those standards to show how a system
is designed and engineered. This book does not discuss signalling
(for example, SIP, RTSP, or H.323) since that is a separate
subject worthy of a book in its own right. Instead it talks in
detail about media transport, and how to achieve good quality
audio and smooth motion video over IP networks.
- Students: The book can be read as an accompaniment to a course in
network protocol design or telecommunications, at either
graduate or advanced undergraduate level. Familiarity with IP
networks, and layered protocol architectures, is assumed. The
unique aspects of protocols for real-time audio/video transport
are highlighted, as are the differences from a typical layered
system model. The cross-disciplinary nature of the subject is
noted, in particular the relation between the psychology of human
perception and the demands of robust media delivery.
- Researchers: Academics and industrial researchers can use this
book as a source of information about the standards and algorithms
comprising the current state of the art in real-time audio/video
transport over IP networks. Pointers to the literature are included
in the References section, and will be useful starting points for
those seeking further depth and areas where more research is
needed.
- Network Administrators: Understanding the protocols underpinning
common streaming audio/video applications shows how those
applications can affect the behaviour of the network, and how the
network can be engineered to better suit them. This book includes
extensive discussion of the network behaviour commonly seen, and
covers how applications can adapt to it, the needs of congestion
control, and the security implications of real-time audio/video
traffic.
This book can be used as a reference, in conjunction with the technical
standards, as a study guide, or as part of an advanced course on network
protocol design or communication technology.
Errata
If you believe you have found a mistake in the book, please
contact me.
Chapter 2
- Figure 2.12: The arrow labelled "delay" should cover the
gap between the onset of the delay spike and the first delayed packet,
rather than the onset of the delay spike and the time the transit
delay returns to normal.
- Figure 2.13: The vertical axis should be labelled "Average loss
probability" rather than "Loss probability"
Chapter 4
Chapter 5
- Figure 5.1: RTCP packets are a multiple of 32 bits in length. Padding
is used to increase the length of the packet to another multiple of
32 bits, longer than the natural length of the packet. Accordingly,
the figure is incorrect, since it implies that the padding is needed
to pad to a 32-bit boundary.
- Figure 5.3: The brace labelling "First Report Block" should
include the fields from the Reportee SSRC down to the DLSR. It should
not include the Reporter SSRC, V, P, RC, PT and Length fields.
- Figure 5.12: The length of the SDES packet should be 10, since the
padding is included in the length.
- Page 122: Listing 5.1 is missing an opening brace in the function
definition. The listing should begin:
validate_rtcp(rtcp_t *packet, int length)
{
rtcp_t *end = (rtcp_t *) (((char *) packet) + length);
- Page 123: The code fragment to check that all RTCP packets are
compound packets is missing an opening parenthesis, and should
read:
if (((packet->length + 1) * 4) == length) {
- Page 130: The code fragment is incorrectly wrapped; "75% of
RTCP bandwidth" should all be on one line.
- Page 130: The code fragment that says:
If (senders > 0) and (senders <(25% of total number of participants)) {
should read:
If (senders > 0) and (senders <= (25% of total number of participants)) {
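The length arithmetic behind the Page 122 and Page 123 corrections can be sketched as follows. This is a minimal illustration, not the book's listing: it assumes the standard RTCP header layout (version in the top two bits of the first byte, a 16-bit big-endian length field in bytes 2 and 3, counting 32-bit words minus one), and the function names rtcp_packet_bytes and rtcp_count_packets are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of one RTCP packet, given its 16-bit length field in
 * host byte order. The field counts 32-bit words minus one, which is
 * the same (length + 1) * 4 arithmetic used in the erratum above. */
static size_t rtcp_packet_bytes(uint16_t length_field)
{
    return ((size_t)length_field + 1) * 4;
}

/* Walk a received datagram and count the RTCP packets it contains.
 * Returns the packet count, or -1 if a version field is not 2 or the
 * length fields do not add up to exactly len bytes. */
static int rtcp_count_packets(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (off + 4 <= len) {
        unsigned version = buf[off] >> 6;
        uint16_t length_field =
            (uint16_t)((buf[off + 2] << 8) | buf[off + 3]);
        size_t bytes = rtcp_packet_bytes(length_field);

        if (version != 2 || bytes > len - off)
            return -1;          /* bad version or packet overruns buffer */
        off += bytes;
        count++;
    }
    /* Leftover bytes, or an empty buffer, mean the datagram is invalid. */
    return (off == len && count > 0) ? count : -1;
}
```

A return value of 1 corresponds to the case the Page 123 test detects: the first packet fills the whole datagram, so it is not a compound packet.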
Chapter 6
- Page 154: In the NTSC video example, the timestamp should increase by
exactly 3003 per frame (not per packet).
- Page 178: The calculation of d_n is reversed compared to the
formula on page 176, and should read:
uint32_t d_n = curr_time - p->ts
Since there is an unknown constant random offset applied to the RTP
timestamp, this change makes no difference in practice.
- Page 190: Comments in the code sample are incorrectly formatted. Also
the calculation of delta_var is missing a minus sign, and
should read:
delta_var = (abs(transit - last_transit) + abs(transit - last_last_transit))/8;
- Page 196: The first line of the code sample should read:
if ((curr->pt == COMFORT_NOISE_PT) || is_comfort_noise(curr)) {
- Page 205: Just before the figure, "directory memory access"
should be "direct memory access"
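The figure of 3003 in the Page 154 correction follows from simple arithmetic, sketched below. The sketch assumes a 90 kHz RTP media clock for video and the NTSC frame rate of 30000/1001 frames per second. The constant and function names are hypothetical and do not come from the book.

```c
#include <stdint.h>

#define RTP_VIDEO_CLOCK_HZ 90000u   /* assumed 90 kHz media clock */
#define NTSC_FPS_NUM       30000u   /* NTSC frame rate is 30000/1001 */
#define NTSC_FPS_DEN       1001u    /* (about 29.97 frames per second) */

/* Clock ticks per frame: 90000 / (30000/1001) = 90000 * 1001 / 30000.
 * The division is exact, so no rounding error accumulates. */
static uint32_t ntsc_ticks_per_frame(void)
{
    return RTP_VIDEO_CLOCK_HZ * NTSC_FPS_DEN / NTSC_FPS_NUM;
}

/* RTP timestamp of the frame with the given index. All packets that
 * carry part of the same frame share this value, so the timestamp
 * advances per frame rather than per packet. Unsigned arithmetic
 * wraps modulo 2^32, as RTP timestamps do. */
static uint32_t ntsc_frame_timestamp(uint32_t initial_ts,
                                     uint32_t frame_index)
{
    return initial_ts + frame_index * ntsc_ticks_per_frame();
}
```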
Chapter 7
- Page 217: The equation should be:
Ts = Tssr + (M - Msr) / R
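The corrected equation can be evaluated as in the sketch below. The symbols are not defined on this page, so the interpretation is an assumption: Tssr and Msr are taken to be the sender reference-clock time (in seconds) and the RTP timestamp from the most recent RTCP sender report, M is the RTP timestamp of the packet being mapped, and R is the media clock rate in Hz. The function name is hypothetical.

```c
#include <stdint.h>

/* Ts = Tssr + (M - Msr) / R
 *
 * t_sr : assumed reference-clock time from the last sender report (s)
 * m    : RTP timestamp of the packet being mapped
 * m_sr : assumed RTP timestamp from the same sender report
 * rate : media clock rate in Hz
 *
 * The difference is taken in unsigned 32-bit arithmetic and then
 * reinterpreted as signed, so that timestamp wrap-around and packets
 * slightly older than the sender report both give a small offset.
 * The unsigned-to-signed conversion is implementation-defined in C,
 * but yields two's complement on common platforms. */
static double rtp_to_sender_time(double t_sr, uint32_t m,
                                 uint32_t m_sr, double rate)
{
    int32_t delta = (int32_t)(m - m_sr);
    return t_sr + (double)delta / rate;
}
```

Applying the same mapping to an audio stream and a video stream from the same sender places both on a common timeline, which is what lip synchronization requires.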
Chapter 8
Chapter 9
- Figures 9.3 and 9.9: The legend should read "CC = contributing source
count" instead of "CC = list of contributing sources"
- Page 261: Just before the SDP, "for example, 122" should
read "in this example, 122"
Chapter 10
- Figure 10.1: The horizontal axis should be labelled "Packet
Sending Rate"
Chapter 11
- Figure 11.3: The field containing the M'S'T'I'
headers should contain the CC header, not a Link Sequence header
References
- Reference 43 has now been published as RFC 3545.
- Reference 49 has now been published as RFC 3551.
- Reference 50 has now been published as RFC 3550.
- Reference 51 has now been published as RFC 3555.
- Reference 54 has now been published as RFC 3569.
There are no significant technical changes in the RFC versions of these
references, compared to the draft versions used in the preparation of
the book.
Thanks are due to Akimichi Ogawa, Badri Natarajan, Edd Inglis,
Jason Van Eaton, David Tod, Jungkhun Byun, Yeong-Chuan Lim,
Jeffrey Jo, and Ralf Globisch for reporting a number of errata.