Doc.
A/52
10 Nov 94
12 Apr 95
24 May 95
20 Dec 95
DIGITAL AUDIO COMPRESSION STANDARD
(AC-3)
ADVANCED TELEVISION SYSTEMS COMMITTEE
James C. McKinney, Chairman
Dr. Robert Hopkins, Executive Director
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Blank Page
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
DIGITAL AUDIO COMPRESSION (AC-3)
ATSC STANDARD
Table of Contents
LIST OF FIGURES v
LIST OF TABLES vi
FOREWORD...............................................................................................................................1
1. INTRODUCTION.....................................................................................................................2
1.1 Motivation 2
1.2 Encoding 2
1.3 Decoding 4
2. SCOPE...................................................................................................................................6
3. REFERENCES........................................................................................................................6
3.1 Normative references 6
3.2 Informative references 6
4. NOTATION, DEFINITIONS, AND TERMINOLOGY
................................................................7
4.1 Compliance notation 7
4.2 Definitions 7
4.3 Terminology abbreviations 8
5. BIT STREAM SYNTAX.........................................................................................................12
5.1 Synchronization frame 12
5.2 Semantics of syntax specification 12
5.3 Syntax specification 12
5.3.1 syncinfo - synchronization information 13
5.3.2 bsi - bit stream information 13
5.3.3 audblk - audio block 14
5.3.4 auxdata - auxiliary data 19
5.3.5 errorcheck - error detection code 19
5.4 Description of bit stream elements 19
5.4.1 syncinfo - synchronization information 19
5.4.2 bsi - bit stream information 20
5.4.3 audblk audio block 26
5.4.4 auxdata - auxiliary data field 35
—i—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
5.4.5 errorcheck - frame error detection field 36
5.5 Bit stream constraints 37
6. DECODING THE AC-3 BIT STREAM
...................................................................................40
6.1 Introduction 40
6.2 Summary of the decoding process 40
6.2.1 Input bit stream 40
6.2.2 Synchronization and error detection 42
6.2.3 Unpack BSI, side information 42
6.2.4 Decode exponents 42
6.2.5 Bit allocation 43
6.2.6 Process mantissas 43
6.2.7 De-coupling 43
6.2.8 Rematrixing 43
6.2.9 Dynamic range compression 43
6.2.10 Inverse transform 44
6.2.11 Window, overlap/add 44
6.2.12 Downmixing 44
6.2.13 PCM output buffer 44
6.2.14 Output PCM 44
7. ALGORITHMIC DETAILS.....................................................................................................45
7.1 Exponent coding 45
7.1.1 Overview 45
7.1.2 Exponent strategy 46
7.1.3 Exponent decoding 47
7.2 Bit allocation 50
7.2.1 Overview 50
7.2.2 Parametric bit allocation 51
7.2.3 Bit allocation tables 58
7.3 Quantization and decoding of mantissas 65
7.3.1 Overview 65
7.3.2 Expansion of mantissas for asymmetric quantization (6≤ bap ≤ 15) 66
7.3.3 Expansion of mantissas for symmetrical quantization (1≤ bap ≤ 5) 66
7.3.4 Dither for zero bit mantissas (bap=0) 67
7.3.5 Ungrouping of mantissas 69
7.4 Channel coupling 69
7.4.1 Overview 69
7.4.2 Sub-band structure for coupling 70
7.4.3 Coupling coordinate format 71
7.5 Rematrixing 72
7.5.1 Overview 72
7.5.2 Frequency band definitions 73
7.5.3 Encoding technique 74
7.5.4 Decoding technique 74
7.6 Dialogue normalization 75
7.6.1 Overview 75
— ii —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
7.7 Dynamic range compression 76
7.7.1 Dynamic range control; dynrng, dynrng2 76
7.7.2 Heavy compression; compr, compr2 79
7.8 Downmixing 81
7.8.1 General downmix procedure 82
7.8.2 Downmixing into two channels 85
7.9 Transform equations and block switching 87
7.9.1 Overview 87
7.9.2 Technique 87
7.9.3 Decoder implementation 88
7.9.4 Transformation equations 88
7.9.5 Channel gain range code 93
7.10 Error detection 93
7.10.1 CRC checking 94
7.10.2 Checking bit stream consistency 96
8. ENCODING THE AC-3 BIT STREAM
...................................................................................98
8.1 Introduction 98
8.2 Summary of the encoding process 98
8.2.1 Input PCM 98
8.2.2 Transient detection 100
8.2.3 Forward transform 101
8.2.4 Coupling strategy 101
8.2.5 Form coupling channel 102
8.2.6 Rematrixing 102
8.2.7 Extract exponents 102
8.2.8 Exponent strategy 103
8.2.9 Dither strategy 103
8.2.10 Encode exponents 103
8.2.11 Normalize mantissas 104
8.2.12 Core bit allocation 104
8.2.13 Quantize mantissas 104
8.2.14 Pack AC-3 frame 105
ANNEX A - AC-3 ELEMENTARY STREAMS IN AN MPEG-2 MULTIPLEX
Normative
( ) .......107
1. SCOPE............................................................................................................................... 107
2. INTRODUCTION................................................................................................................. 107
3. DETAILED SPECIFICATION.............................................................................................. 107
3.1 Stream_type 107
3.2 Stream_id 108
3.2.1 Transport stream 108
3.2.2 Program stream 108
3.3 Registration descriptor 108
— iii —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
3.4 AC-3 audio descriptor 109
3.5 ISO_639_language_code 112
3.6 STD audio buffer size 112
4. PES CONSTRAINTS.......................................................................................................... 113
4.1 Encoding 113
4.2 Decoding 113
4.3 Byte-alignment 114
ANNEX B - AC-3 DATA STREAM IN IEC958 INTERFACE
Informative
( ) .............................. 115
1. SCOPE............................................................................................................................... 115
2. INTRODUCTION................................................................................................................. 115
3. BASIC PARAMETERS OF IEC958:1989 INTERFACE
....................................................... 115
4. DETAILED SPECIFICATION.............................................................................................. 116
4.1 Channel status word 116
4.1.1 Channel status word — consumer application 117
4.1.2 Channel status word — professional application 117
4.2 Placement of data into sub-frames 118
4.2.1 32-bit mode 118
4.2.2 16-bit mode 119
4.3 Validity flag 119
4.4 Coding of preamble 119
4.4.1 32-bit mode 120
4.4.2 16-bit mode 120
4.4.3 burst_info 120
4.4.4 length_code 121
4.5 Burst spacing 121
4.6 The null data_type 121
4.7 The AC-3 data_type 122
4.7.1 Placement of AC-3 frames into data bursts 122
4.7.2 Symbol frequency 123
4.8 The time stamp data_type 123
4.8.1 Preamble values 124
4.8.2 Time stamp payload 124
5. AUTO DETECTION OF AUDIO/DATA MODE
....................................................................126
— iv —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
ANNEX C - AC-3 KARAOKE MODEInformative
( ).................................................................127
1. SCOPE............................................................................................................................... 127
2. INTRODUCTION................................................................................................................. 127
3. DETAILED SPECIFICATION.............................................................................................. 128
3.1 Karaoke mode indication 128
3.2 Karaoke mode channel assignment 128
3.3 Reproduction of karaoke mode bit streams 128
3.3.1 Karaoke aware decoders 128
3.3.2 Karaoke capable decoders 129
List of Figures
Figure 1.1. Example application of AC-3 to satellite audio transmission. 3
Figure 1.2. The AC-3 encoder. 4
Figure 1.3. The AC-3 decoder. 5
Figure 5.1. AC-3 synchronization frame. 12
Figure 6.1. Flow diagram of the decoding process. 41
Figure 8.1. Flow diagram of the encoding process. 99
ANNEX A
None
ANNEX B
Figure 1. Encoding audio with time code. 124
Figure 2. Time stamps and AC-3 frames in the IEC958 data stream. 125
Figure 3. PCM-DATA auto mode detection. 126
ANNEX C
None
—v—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
List of Tables
Table 5.1 Sample Rate Codes 20
Table 5.2 Bit Stream Mode 20
Table 5.3 Audio Coding Mode 21
Table 5.4 Center Mix Level 21
Table 5.5 Surround Mix Level 22
Table 5.6 Dolby Surround Mode 22
Table 5.7 Room Type 24
Table 5.8 Time Code Exists 25
Table 5.9 Master Coupling Coordinate 29
Table 5.10 Number of Rematrixing Bands 30
Table 5.11 Delta Bit Allocation Exists States 33
Table 5.12 Bit Allocation Deltas 34
Table 5.13 Frame Size Code Table (1 word = 16 bits) 38
Table 5.14 Language Code Table 39
Table 7.1 Mapping of Differential Exponent Values, D15 Mode 46
Table 7.2 Mapping of Differential Exponent Values, D25 Mode 46
Table 7.3 Mapping of Differential Exponent Values, D45 Mode 47
Table 7.4 Exponent Strategy Coding 47
Table 7.5 LFE Channel Exponent Strategy Coding 47
Table 7.6 Slow Decay Table, slowdec[] 58
Table 7.7 Fast Decay Table, fastdec[] 58
Table 7.8 Slow Gain Table, slowgain[] 58
Table 7.9 dB/Bit Table, dbpbtab[] 58
Table 7.10 Floor Table, floortab[] 59
Table 7.11 Fast Gain Table, fastgain[] 59
Table 7.12 Banding Structure Tables, bndtab[], bndsz[] 60
× A) + B
Table 7.13 Bin Number to Band Number Table, masktab[bin], bin = (10 61
Table 7.14 Log-Addition Table, latab[val], val = (10× A) + B 62
Table 7.15 Hearing Threshold Table, hth[fscod][band] 63
Table 7.16 Bit Allocation Pointer Table, baptab[] 64
Table 7.17 Quantizer Levels and Mantissa Bits vs. bap 65
— vi —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.18 Mapping of bap to Quantizer 66
Table 7.19 bap=1 (3-Level) Quantization 67
Table 7.20 bap=2 (5-Level) Quantization 67
Table 7.21 bap=3 (7-Level) Quantization 68
Table 7.22 bap=4 (11-Level) Quantization 68
Table 7.23 bap=5 (15-Level) Quantization 68
Table 7.24 Coupling Sub-Bands 70
Table 7.25 Rematrix Banding Table A 73
Table 7.26 Rematrixing Banding Table B 73
Table 7.27 Rematrixing Banding Table C 74
Table 7.28 Rematrixing Banding Table D 74
Table 7.29 Meaning of 3 msb of dynrng 79
Table 7.30 Meaning of 3 msb of compr 81
Table 7.31 LoRo Scaled Downmix Coefficients 86
Table 7.32 LtRt Scaled Downmix Coefficients 86
Table 7.33 Transform Window Sequence (w[addr]), Where addr = (10 * A) + B 92
Table 7.34 5/8_frame Size Table; Number of Words in the First 5/8 of the Frame 95
ANNEX A
Table 1 AC-3 Registration Descriptor 108
Table 2 AC-3 Audio Descriptor Syntax 109
Table 3 Sample Rate Code Table 110
Table 4 Bit Rate Code Table 110
Table 5 dsurmod Table 111
Table 6 num_channels Table 111
ANNEX B
Table 1 IEC958 Sub-frame 116
Table 2 Channel Status Bits 117
Table 3 Channel Status Bits in Byte 0 118
Table 4 Preamble Words 119
Table 5 burst_info 120
Table 6 Values of data_type 120
Table 7 Values of data_type_dependent When data_type = 1 122
— vii —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 8 Time Stamp Payload 124
Table 9 Frame Rate Code 125
ANNEX C
Table 1 Channel Array Ordering 128
Table 2 Coefficient Values for Karaoke Aware Decoders 129
Table 3 Default Coefficient Values for Karaoke Capable Decoders 129
— viii —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
DIGITAL AUDIO COMPRESSION (AC-3)
ATSC STANDARD
FOREWORD
The United States Advanced Television Systems Committee (ATSC) was formed
by the member organizations of the Joint Committee on InterSociety Coordination
(JCIC)1, recognizing that the prompt, efficient and effective development of a coordinated
set of national standards is essential to the future development of domestic television
services.
One of the activities of the ATSC is exploring the need for and, where appropriate,
coordinating the development of voluntary national technical standards for Advanced
Television Systems (ATV). The ATSC Executive Committee assigned the work of
documenting the U.S. ATV standard to a number of specialist groups working under the
Technology Group on Distribution (T3). The Audio Specialist Group (T3/S7) was
charged with documenting the ATV audio standard.
This document was prepared initially by the Audio Specialist Group as part of its
efforts to document the United States Advanced Television broadcast standard. It was
approved by the Technology Group on Distribution on September 26, 1994, and by the
full ATSC Membership as an ATSC Standard on November 10, 1994. Annex A, “AC-3
Elementary Streams in an MPEG-2 Multiplex,” was approved by the Technology Group
on Distribution on February 23, 1995, and by the full ATSC Membership on April 12,
1995. Annex B, “AC-3 Data Stream in IEC958 Interface,” and Annex C, “AC-3 Karaoke
Mode,” were approved by the Technology Group on Distribution on October 24, 1995
and by the full ATSC Membership on December 20, 1995. ATSC Standard A/53, “Digital
Television Standard for HDTV Transmission”, references this document and describes
how the audio coding algorithm described herein is applied in the U.S. ATV standard.
At the time of release of this document, the system description contained herein
had not been verified by the transmission of signals from independently developed
encoders to separately developed decoders.
1
The JCIC is presently composed of: the Electronic Industries Association (EIA), the Institute of Electrical
and Electronic Engineers (IEEE), the National Association of Broadcasters (NAB), the National Cable
Television Association (NCTA), and the Society of Motion Picture and Television Engineers (SMPTE).
NOTE: The user’s attention is called to the possibility that compliance with this standard may require use
of an invention covered by patent rights. By publication of this standard, no position is taken with respect
to the validity of this claim, or of any patent rights in connection therewith. The patent holder has,
however, filed a statement of willingness to grant a license under these rights on reasonable and
nondiscriminatory terms and conditions to applicants desiring to obtain such a license. Details may be
obtained from the publisher.
—1—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
1. INTRODUCTION
1.1 Motivation
In order to more efficiently broadcast or record audio signals, the amount of
information required to represent the audio signals may be reduced. In the case of digital
audio signals, the amount of digital information needed to accurately reproduce the
original pulse code modulation (PCM) samples may be reduced by applying a digital
compression algorithm, resulting in a digitally compressed representation of the original
signal. (The term compression used in this context means the compression of the amount
of digital information which must be stored or recorded, and not the compression of
dynamic range of the audio signal.) The goal of the digital compression algorithm is to
produce a digital representation of an audio signal which, when decoded and reproduced,
sounds the same as the original signal, while using a minimum of digital information (bit-
rate) for the compressed (or encoded) representation. The AC-3 digital compression
algorithm specified in this document can encode from 1 to 5.1 channels of source audio
from a PCM representation into a serial bit stream at data rates ranging from 32 kbps to
640 kbps. The 0.1 channel refers to a fractional bandwidth channel intended to convey
only low frequency s(ubwoofer) signals.
A typical application of the algorithm is shown in Figure 1.1. In this example, a 5.1
channel audio program is converted from a PCM representation requiring more than 5
Mbps (6 channels × 48 kHz × 18 bits = 5.184 Mbps) into a 384 kbps serial bit stream by
the AC-3 encoder. Satellite transmission equipment converts this bit stream to an RF
transmission which is directed to a satellite transponder. The amount of bandwidth and
power required by the transmission has been reduced by more than a factor of 13 by the
AC-3 digital compression. The signal received from the satellite is demodulated back into
the 384 kbps serial bit stream, and decoded by the AC-3 decoder. The result is the original
5.1 channel audio program.
Digital compression of audio is useful wherever there is an economic benefit to be
obtained by reducing the amount of digital information required to represent the audio.
Typical applications are in satellite or terrestrial audio broadcasting, delivery of audio over
metallic or optical cables, or storage of audio on magnetic, optical, semiconductor, or
other storage media.
1.2 Encoding
The AC-3 encoder accepts PCM audio and produces an encoded bit stream
consistent with this standard. The specifics of the audio encoding process are not
normative requirements of this standard. Nevertheless, the encoder must produce a bit
stream matching the syntax described in Section 5, which, when decoded according to
Sections 6 and 7, produces audio of sufficient quality for the intended application. Section
8 contains informative information on the encoding process. The encoding process is
briefly described below.
The AC-3 algorithm achieves high coding gain (the ratio of the input bit-rate to the
output bit-rate) by coarsely quantizing a frequency domain representation of the audio
—2—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
signal. A block diagram of this process is shown in Figure 1.2. The first step in the
encoding process is to transform the representation of audio from a sequence of PCM time
samples into a sequence of blocks of frequency coefficients. This is done in the analysis
filter bank. Overlapping blocks of 512 time samples are multiplied by a time window and
transformed into the frequency domain. Due to the overlapping blocks, each PCM input
sample is represented in two sequential transformed blocks. The frequency domain
representation may then be decimated by a factor of two so that each block contains 256
frequency coefficients. The individual frequency coefficients are represented in binary
exponential notation as a binary exponent and a mantissa. The set of exponents is encoded
into a coarse representation of the signal spectrum which is referred to as the spectral
envelope. This spectral envelope is used by the core bit allocation routine which
determines how many bits to use to encode each individual mantissa. The spectral
envelope and the coarsely quantized mantissas for 6 audio blocks (1536 audio samples)
are formatted into an AC-3 frame. The AC -3 bit stream is a sequence of AC-3 frames.
The actual AC-3 encoder is more complex than indicated in Figure 1.2. The
following functions not shown above are also included:
1. A frame header is attached which contains information (bit-rate, sample rate, number
of encoded channels, etc.) required to synchronize to and decode the encoded bit
stream.
Transmission
Input Audio
Signals
Left
Encoded
Center Bit-Stream Modulated
Right 384 kb/s Signal
Transmission
Left Surround AC-3 Encoder
Equipment
Right Surround
Low Frequency
Effects Satellite Dish
Reception
Output Audio
Signals
Left
Encoded
Modulated Bit-Stream Center
Signal 384 kb/s Right
Reception
AC-3 Decoder Left Surround
Equipment
Right Surround
Low Frequency
Satellite Dish
Effects
Figure 1.1. Example application of AC-3 to satellite audio transmission.
—3—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
PCM Time Exponents Spectral
Analysis Filter
Samples Envelope Bit Allocation
Bank
Encoding
Mantissas
Mantissa Bit Allocation Information
Quantization
Quantized Encoded
Mantissas Spectral
Envelope
Encoded AC-3
AC-3 Frame Formatting
Bit-Stream
Figure 1.2. The AC-3 encoder.
2. Error detection codes are inserted in order to allow the decoder to verify that a
received frame of data is error free.
3. The analysis filterbank spectral resolution may be dynamically altered so as to better
match the time/frequency characteristic of each audio block.
4. The spectral envelope may be encoded with variable time/frequency resolution.
5. A more complex bit allocation may be performed, and parameters of the core bit
allocation routine modified so as to produce a more optimum bit allocation.
6. The channels may be coupled together at high frequencies in order to achieve higher
coding gain for operation at lower bit-rates.
7. In the two-channel mode a rematrixing process may be selectively performed in order
to provide additional coding gain, and to allow improved results to be obtained in the
event that the two-channel signal is decoded with a matrix surround decoder.
1.3 Decoding
The decoding process is basically the inverse of the encoding process. The
decoder, shown in Figure 1.3, must synchronize to the encoded bit stream, check for
errors, and de-format the various types of data such as the encoded spectral envelope and
the quantized mantissas. The bit allocation routine is run and the results used to unpack
and de-quantize the mantissas. The spectral envelope is decoded to produce the
exponents. The exponents and mantissas are transformed back into the time domain to
produce the decoded PCM time samples.
—4—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Encoded AC-3 AC-3 Frame Syncronization, Error Detection,
Bit-Stream and Frame De-formatting
Quantized
Mantissas
Bit Allocation
Encoded
Bit Information Mantissa
Spectral
Allocation De-quantization
Envelope
Mantissas
Spectral Exponents
Synthesis PCM Time
Envelope
Filter Bank Samples
Decoding
Figure 1.3. The AC-3 decoder.
The actual AC-3 decoder is more complex than indicated in Figure 1.3. The
following functions not shown above are included:
1. Error concealment or muting may be ap plied in case a data error is detected.
2. Channels which have had their high-frequency content coupled together must be de-
coupled.
3. Dematrixing must be applied (in the 2-channel mode) whenever the channels have been
rematrixed.
4. The synthesis filterbank resolution must be dynamically altered in the same manner as
the encoder analysis filter bank had been during the encoding process.
—5—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
2. SCOPE
The normative portions of this standard specify a coded representation of audio
information, and specify the decoding process. Informative information on the encoding
process is included. The coded representation specified herein is suitable for use in digital
audio transmission and storage applications. The coded representation may convey from 1
to 5 full bandwidth audio channels, along with a low frequency enhancement channel. A
wide range of encoded bit-rates is supported by this specification.
A short form designation of this audio coding algorithm is “AC-3”.
3. REFERENCES
3.1 Normative references
The following documents contain provisions which, through reference in this text,
constitute provisions of this standard. At the time of publication, the editions indicated
were valid. All standards are subject to revision, and parties to agreement based on this
standard are encouraged to investigate the possibility of applying the most recent editions
of the documents listed below.
None.
3.2 Informative references
The following documents contain information on the algorithm described in this
standard, and may be useful to those who are using or attempting to understand this
standard. In the case of conflicting information, the information contained in this standard
should be considered correct.
Todd, C. et. al., “AC-3: Flexible Perceptual Coding for Audio Transmission and Storage”,
AES 96th Convention, Preprint 3796, Feb. 1994.
Ehmer, R. H., "Masking Patterns of Tones," J. Acoust. Soc. Am., vol. 31, pp. 1115-1120
(1959 Aug.).
Ehmer, R H., "Masking of Tones vs. Noise Bands," J. Acoust. Soc. Am., vol. 31, pp
1253-1256 (1959 Sept.).
Moore, B.C.J., and Glasberg, B.R., “Formulae Describing Frequency Selectivity as a
Function of Frequency and Level, and Their Use in Calculating Excitation Patterns,”
Hearing Research, Vol. 28, pp. 209-225 (1987).
Zwicker, E. “Subdivision of the Audible Frequency Range into Critical Bands
(Frequenzgruppen),” [Link]. Soc. of Am., Vol. 33, p. 248 (Feb. 1961).
—6—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
4. NOTATION, DEFINITIONS, AND TERMINOLOGY
4.1 Compliance notation
As used in this document, “must”, “shall” or “will” denotes a mandatory provision
of this standard. “Should” denotes a provision that is recommended but not mandatory.
“May” denotes a feature whose presence does not preclude compliance, and that may or
may not be present at the option of theimplementor.
4.2 Definitions
A number of terms are used in this document. Below are definitions which explain
the meaning of some of the terms which are used.
audio block: A set of 512 audio samples consisting of 256 samples of the preceding audio
block, and 256 new time samples. A new audio block occurs every 256 audio
samples. Each audio sample is represented in two audio blocks.
bin: The number of the frequency coefficient, as in frequency bin number n. The
512 point TDAC transform produces 256 frequency coefficients or frequency
bins.
coefficient: The time domain samples are converted into frequency domain coefficients
by the transform.
coupled channel: A full bandwidth channel whose high frequency information is combined
into the coupling channel.
coupling band: A band of coupling channel transform coefficients covering one or more
coupling channel sub-bands.
coupling channel: The channel formed by combining the high frequency information from the
coupled channels.
coupling sub-band: A sub-band consisting of a group of 12 coupling channel transform
coefficients.
downmixing: Combining (or mixing down) the content of n original channels to produce
m channels, where m<n.
exponent set: The set of exponents for an independent channel, for the coupling channel,
or for the low frequency portion of a coupled channel.
full bandwidth (fbw) An audio channel capable of full audio bandwidth. All channels (left, center,
channel: right, left surround, right surround) except thelfe channel are fbw channels.
independent channel: A channel whose high frequency information is not combined into the
coupling channel. (The lfe channel is always independent.)
low frequency effects l(fe) An optional single channel of limited (<120 Hz) bandwidth, which is
channel: intended to be reproduced at a level +10 dB with respect to the fbw channels.
The optional lfe channel allows high sound pressure levels to be provided for
low frequency sounds.
spectral envelope: A spectral estimate consisting of the set of exponents obtained by decoding
the encoded exponents. Similar (but not identical) to the original set of
exponents.
synchronization frame: A unit of the serial bit stream capable of being fully decoded. The
synchronization frame begins with a sync code and contains 1536 coded
audio samples.
—7—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
window: A time vector which is multiplied by an audio block to provide a windowed
audio block. The window shape establishes the frequency selectivity of the
filterbank, and provides for the proper overlap/add characteristic to avoid
blocking artifacts.
4.3 Terminology abbreviations
A number of abbreviations are used to refer to elements employed in the AC-3
format. The following list is a cross reference from each abbreviation to the terminology
which it represents. For most items, a reference to further information is provided. This
document makes extensive use of these abbreviations. The abbreviations are lower case
with a maximum length of 12 characters, and are suitable for use in either high level or
assembly language computer software coding. Those who implement this standard are
encouraged to use these same abbreviations in any computer source code, or other
hardware or software implementation documentation.
Abbreviation Terminology Reference
acmod audio coding mode Section [Link] on page21
addbsi additional bit stream information Section [Link] on page26
addbsie additional bit stream information exists Section [Link] on page26
addbsil additional bit stream information length Section [Link] on page26
audblk audio block Section 5.4.3 on page26
audprodie audio production information exists Section [Link] on page23
audprodi2e audio production information exists, ch2 Section [Link] on page24
auxbits auxiliary data bits Section [Link] on page36
auxdata auxiliary data field Section [Link] on page36
auxdatae auxiliary data exists Section [Link] on page36
auxdatal auxiliary data length Section [Link] on page36
baie bit allocation information exists Section [Link] on page31
bap bit allocation pointer
bin frequency coefficient bin in index bin
[ ] Section [Link] on page28
blk block in array index [blk]
blksw block switch flag Section [Link] on page26
bnd band in array index [bnd]
bsi bit stream information Section 5.4.2 on page20
bsid bit stream identification Section [Link] on page20
bsmod bit stream mode Section [Link] on page20
ch channel in array index [ch]
chbwcod channel bandwidth code Section [Link] on page30
chexpstr channel exponent strategy Section [Link] on page30
chincpl channel in coupling Section [Link] on page27
chmant channel mantissas Section [Link] on page35
clev center mixing level coefficient Section [Link] on page21
cmixlev center mix level Section [Link] on page21
compr compression gain word Section [Link] on page23
compr2 compression gain word, ch2 Section [Link] on page24
compre compression gain word exists Section [Link] on page22
—8—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Abbreviation Terminology Reference
compr2e compression gain word exists, ch2 Section [Link] on page24
copyrightb copyright bit Section [Link] on page25
cplabsexp coupling absolute exponent Section [Link] on page30
cplbegf coupling begin frequency code Section [Link] on page27
cplbndstrc coupling band structure Section [Link] on page28
cplco coupling coordinate Section 7.4.3 on page71
cplcoe coupling coordinates exist Section [Link] on page28
cplcoexp coupling coordinate exponent Section [Link] on page29
cplcomant coupling coordinate mantissa Section [Link] on page29
cpldeltba coupling dba Section [Link] on page34
cpldeltbae coupling dba exists Section [Link] on page33
cpldeltlen coupling dba length Section [Link] on page34
cpldeltnseg coupling dba number of segments Section [Link] on page33
cpldeltoffst coupling dba offset Section [Link] on page33
cplendf coupling end frequency code Section [Link] on page27
cplexps coupling exponents Section [Link] on page30
cplexpstr coupling exponent strategy Section [Link] on page30
cplfgaincod coupling fast gain code Section [Link] on page32
cplfleak coupling fast leak initialization Section [Link] on page33
cplfsnroffst coupling fine SNR offset Section [Link] on page32
cplinu coupling in use Section [Link] on page27
cplleake coupling leak initialization exists Section [Link] on page32
cplmant coupling mantissas Section [Link] on page35
cplsleak coupling slow leak initialization Section [Link] on page33
cplstre coupling strategy exists Section [Link] on page27
crc1 crc - cyclic redundancy check word 1 Section [Link] on page19
crc2 crc - cyclic redundancy check word 2 Section [Link] on page36
crcrsv crc reserved bit Section [Link] on page36
csnroffst coarse SNR offset Section [Link] on page32
d15 d15 exponent coding mode Section [Link] on page30
d25 d25 exponent coding mode Section [Link] on page30
d45 d45 exponent coding mode Section [Link] on page30
dba delta bit allocation Section [Link] on page33
dbpbcod dB per bit code Section [Link] on page31
deltba channel dba Section [Link] on page34
deltbae channel dba exists Section [Link] on page33
deltbaie dba information exists Section [Link] on page33
deltlen channel dba length Section [Link] on page34
deltnseg channel dba number of segments Section [Link] on page34
deltoffst channel dba offset Section [Link] on page34
dialnorm dialogue normalization word Section [Link] on page22
dialnorm2 dialogue normalization word, ch2 Section [Link] on page24
dithflag dither flag Section [Link] on page26
dsurmod Dolby surround mode Section [Link] on page22
dynrng dynamic range gain word Section [Link] on page26
dynrng2 dynamic range gain word, ch2 Section [Link] on page27
dynrnge dynamic range gain word exists Section [Link] on page26
dynrng2e dynamic range gain word exists, ch2 Section [Link] on page27
exps channel exponents Section [Link] on page31
—9—
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Abbreviation Terminology Reference
fbw full bandwidth
fdcycod fast decay code Section [Link] on page31
fgaincod channel fast gain code Section [Link] on page32
floorcod masking floor code Section [Link] on page32
floortab masking floor table Section [Link] on page57
frmsizecod frame size code Section [Link] on page20
fscod sampling frequency code Section [Link] on page19
fsnroffst channel fine SNR offset Section [Link] on page32
gainrng channel gain range code Section [Link] on page31
grp group in index [grp]
langcod language code Section [Link] on page23
langcod2 language code, ch2 Section [Link] on page24
langcode language code exists Section [Link] on page23
langcod2e language code exists, ch2 Section [Link] on page24
lfe low frequency effects
lfeexps lfe exponents Section [Link] on page31
lfeexpstr lfe exponent strategy Section [Link] on page30
lfefgaincod lfe fast gain code Section [Link] on page32
lfefsnroffst lfe fine SNR offset Section [Link] on page32
lfemant lfe mantissas Section [Link] on page35
lfeon lfe on Section [Link] on page22
mixlevel mixing level Section [Link] on page23
mixlevel2 mixing level, ch2 Section [Link] on page24
mstrcplco master coupling coordinate Section [Link] on page28
nauxbits number of auxiliary bits Section [Link] on page36
nchans number of channels Section [Link] on page21
nchgrps number of fbw channel exponent groups Section [Link] on page31
nchmant number of fbw channel mantissas Section [Link] on page35
ncplbnd number of structured coupled bands Section [Link] on page28
ncplgrps number of coupled exponent groups Section [Link] on page30
ncplmant number of coupled mantissas Section [Link] on page35
ncplsubnd number of coupling sub-bands Section [Link] on page27
nfchans number of fbw channels Section [Link] on page21
nlfegrps number of lfe channel exponent groups Section [Link] on page31
nlfemant number of lfe channel mantissas Section [Link] on page35
origbs original bit stream Section [Link] on page25
phsflg phase flag Section [Link] on page29
phsflginu phase flags in use Section [Link] on page27
rbnd rematrix band in index [rbnd]
rematflg rematrix flag Section [Link] on page29
rematstr rematrixing strategy Section [Link] on page29
roomtyp room type Section [Link] on page23
roomtyp2 room type, ch2 Section [Link] on page25
sbnd sub-band in index [sbnd]
sdcycod slow decay code Section [Link] on page31
seg segment in index [seg]
sgaincod slow gain code Section [Link] on page31
skipfld skip field Section [Link] on page35
skipl skip length Section [Link] on page35
— 10 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Abbreviation Terminology Reference
skiple skip length exists Section [Link] on page35
slev surround mixing level coefficient Section [Link] on page21
snroffste SNR offset exists Section [Link] on page32
surmixlev surround mix level Section [Link] on page21
syncframe synchronization frame Section 5.1 on page12
syncinfo synchronization information Section 5.3.1 on page13
syncword synchronization word Section [Link] on page19
tdac time division aliasing cancellation
timecod1 time code first half Section [Link] on page25
timecod2 time code second half Section [Link] on page25
timecod1e time code first half exists Section [Link] on page25
timecod2e time code second half exists Section [Link] on page25
— 11 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
5. BIT STREAM SYNTAX
5.1 Synchronization frame
An AC-3 serial coded audio bit stream is made up of a sequence of
synchronization frames (see Figure 5.1). Each synchronization frame contains 6 coded
audio blocks (AB), each of which represent 256 new audio samples. A synchronization
information (SI) header at the beginning of each frame contains information needed to
acquire and maintain synchronization. A bit stream information (BSI) header follows SI,
and contains parameters describing the coded audio service. The coded audio blocks may
be followed by an auxiliary data (Aux) field. At the end of each frame is an error check
field that includes a CRC word for error detection. An additional CRC word is located in
the SI header, the use of which is optional.
SI BSI
SI BSI
C
AB 0 AB 1 AB 2 AB 3 AB 4 AB 5 Aux R
C
Sync Frame
Figure 5.1. AC-3 synchronization frame.
5.2 Semantics of syntax specification
The following pseudo code describes the order of arrival of information within the
bit stream. This pseudo code is roughly based on C language syntax, but simplified for
ease of reading. For bit stream elements which are larger than 1-bit, the order of the bits in
the serial bit stream is either most-significant-bit-first (for numerical values), or left-bit-
first (for bit-field values). Fields or elements contained in the bit stream are indicated with
bold type. Syntactic elements are typographically distinguished by the use of a different
font (e.g., dynrng).
Some AC-3 bit stream elements naturally form arrays. This syntax specification
treats all bit stream elements individually, whether or not they would naturally be included
in arrays. Arrays are thus described as multiple elements (as in blksw[ch] as opposed to
simply blksw or blksw[]), and control structures such as for loops are employed to increment
the index ([ch] for channel in this example).
5.3 Syntax specification
A continuous audio bit stream would consist of a sequence of synchronization
frames:
— 12 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Syntax
AC-3_bitstream()
{
while(true)
{
syncframe() ;
}
} /* end of AC-3 bit stream */
The syncframe consists of the syncinfo and bsi fields, the 6 coded audblk fields, the
auxdata field, and theerrorcheck field.
Syntax
syncframe()
{
syncinfo() ;
bsi() ;
for(blk = 0; blk < 6; blk++)
{
audblk() ;
}
auxdata() ;
errorcheck() ;
} /* end of syncframe */
Each of the bit stream elements, and their length, are itemized in the following
pseudo code. Note that all bit stream elements arrive most significant bit first, or left bit
first, in time.
5.3.1 syncinfo - synchronization information
Syntax word size
syncinfo()
{
syncword................................ ................................ ................................ ................................ ........16
crc1 ................................ ................................ ................................ ................................ ................. 16
fscod................................ ................................ ................................ ................................ .................2
frmsizecod................................ ................................ ................................ ................................ ........6
} /* end of syncinfo */
5.3.2 bsi - bit stream information
Syntax word size
bsi()
{
bsid ................................ ................................ ................................ ................................ ...................5
bsmod................................ ................................ ................................ ................................ ...............3
acmod................................ ................................ ................................ ................................ ...............3
if((acmod & 0x1) && (acmod != 0x1)) /* if 3 front channels */ {cmixlev}................................ ................2
if(acmod & 0x4) /* if a surround channel exists */ {surmixlev} ................................ ............................... 2
if(acmod == 0x2) /* if in 2/0 mode */ {dsurmod}................................ ................................ ..................2
— 13 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Syntax word size
lfeon................................ ................................ ................................ ................................ ..................1
dialnorm................................ ................................ ................................ ................................ ............5
compre................................ ................................ ................................ ................................ ..............1
if(compre) {compr}................................ ................................ ................................ ............................ 8
langcode................................ ................................ ................................ ................................ ...........1
if(langcode) {langcod}................................ ................................ ................................ ....................... 8
audprodie................................ ................................ ................................ ................................ ..........1
if(audprodie)
{
mixlevel................................ ................................ ................................ ................................ ......5
roomtyp................................ ................................ ................................ ................................ ......2
}
if(acmod == 0) /* if 1+1 mode (dual mono, so some items need a second value) */
{
dialnorm2................................ ................................ ................................ ................................ ...5
compr2e................................ ................................ ................................ ................................ .....1
if(compr2e) {compr2}................................ ................................ ................................ ..................8
lngcod2e................................ ................................ ................................ ................................ ....1
if(langcod2e) {langcod2}................................ ................................ ................................ .............8
audprodi2e................................ ................................ ................................ ................................ .1
if(audprodi2e)
{
mixlevel2................................ ................................ ................................ ............................. 5
roomtyp2................................ ................................ ................................ ............................. 2
}
}
copyrightb................................ ................................ ................................ ................................ .........1
origbs................................ ................................ ................................ ................................ ................1
timecod1e................................ ................................ ................................ ................................ .........1
if(timecod1e) {timecod1}................................ ................................ ................................ ................. 14
timecod2e................................ ................................ ................................ ................................ .........1
if(timecod2e) {timecod2}................................ ................................ ................................ ................. 14
addbsie................................ ................................ ................................ ................................ .............1
if(addbsie)
{
addbsil ................................ ................................ ................................ ................................ .......6
addbsi ................................ ................................ ................................ .................... (addbsil+1)×8
}
} /* end of bsi */
5.3.3 audblk - audio block
Syntax word size
audblk()
{
/* These fields for block switch and dither flags */
for(ch = 0; ch < nfchans; ch++) {blksw[ch]}................................ ................................ ....................... 1
for(ch = 0; ch < nfchans; ch++) {dithflag[ch]}................................ ................................ ....................1
/* These fields for dynamic range control */
dynrnge................................ ................................ ................................ ................................ .............1
if(dynrnge) {dynrng}................................ ................................ ................................ ........................... 8
— 14 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Syntax word size
if(acmod == 0) /* if 1+1 mode */
{
dynrng2e................................ ................................ ................................ ................................ ....1
if(dynrng2e) {dynrng2}................................ ................................ ................................ ................8
}
/* These fields for coupling strategy information */
cplstre................................ ................................ ................................ ................................ ...............1
if(cplstre)
{
cplinu ................................ ................................ ................................ ................................ .........1
if(cplinu)
{
for(ch = 0; ch < nfchans; ch++) {chincpl[ch]}................................ ................................ .......1
if(acmod == 0x2) {phsflginu} /* if in 2/0 mode */................................ ................................ ....1
cplbegf ................................ ................................ ................................ ................................ 4
cplendf ................................ ................................ ................................ ................................ 4
/* ncplsubnd = 3 + cplendf - cplbegf */
for(bnd = 1; bnd < ncplsubnd; bnd++) {cplbndstrc[bnd]}................................ ..................... 1
}
}
/* These fields for coupling coordinates, phase flags */
if(cplinu)
{
for(ch = 0; ch < nfchans; ch++)
{
if(chincpl[ch])
{
cplcoe[ch]................................ ................................ ................................ ..................... 1
if(cplcoe[ch])
{
mstrcplco[ch]................................ ................................ ................................ ........2
/* ncplbnd derived from ncplsubnd, and cplbndstrc */
for(bnd = 0; bnd < ncplbnd; bnd++)
{
cplcoexp[ch][bnd]................................ ................................ .......................... 4
cplcomant[ch][bnd]................................ ................................ ........................ 4
}
}
}
}
if((acmod == 0x2) && phsflginu && (cplcoe[0] || cplcoe[1]))
{
for(bnd = 0; bnd < ncplbnd; bnd++) {phsflg[bnd]}................................ ................................ 1
}
}
/* These fields for rematrixing operation in the 2/0 mode */
if(acmod == 0x2) /* if in 2/0 mode */
{
rematstr................................ ................................ ................................ ................................ ......1
if(rematstr)
{
— 15 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Syntax word size
if((cplbegf > 2) || (cplinu == 0))
{
for(rbnd = 0; rbnd < 4; rbnd++) {rematflg[rbnd]}................................ ........................... 1
}
if((2 >= cplbegf > 0) && cplinu)
{
for(rbnd = 0; rbnd < 3; rbnd++) {rematflg[rbnd]}................................ ........................... 1
}
if((cplbegf == 0) && cplinu)
{
for(rbnd = 0; rbnd < 2; rbnd++) {rematflg[rbnd]}................................ ........................... 1
}
}
}
/* These fields for exponent strategy */
if(cplinu) {cplexpstr}................................ ................................ ................................ .......................... 2
for(ch = 0; ch < nfchans; ch++) {chexpstr[ch]}................................ ................................ ...................2
if(lfeon) {lfeexpstr}................................ ................................ ................................ ............................. 1
for(ch = 0; ch < nfchans; ch++)
{
if(chexpstr[ch] != reuse)
{
if(!chincpl[ch]) {chbwcod[ch]}................................ ................................ .............................. 6
}
}
/* These fields for exponents */
if(cplinu) /* exponents for the coupling channel */
{
if(cplexpstr != reuse)
{
cplabsexp................................ ................................ ................................ ............................ 4
/* ncplgrps derived from ncplsubnd, cplexpstr */
for(grp = 0; grp< ncplgrps; grp++) {cplexps[grp]}................................ ................................ .7
}
}
for(ch = 0; ch < nfchans; ch++) /* exponents for full bandwidth channels */
{
if(chexpstr[ch] != reuse)
{
exps[ch][0]................................ ................................ ................................ .......................... 4
/* nchgrps derived from chexpstr[ch], and cplbegf or chbwcod[ch] */
for(grp = 1; grp <= nchgrps[ch]; grp++) {exps[ch][grp]}................................ ........................ 7
gainrng[ch]................................ ................................ ................................ .......................... 2
}
}
if(lfeon) /* exponents for the low frequency effects channel */
{
if(lfeexpstr != reuse)
{
lfeexps[0]................................ ................................ ................................ ............................. 4
/* nlfegrps = 2 */
for(grp = 1; grp <= nlfegrps; grp++) {lfeexps[grp]}................................ ................................ 7
— 16 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Syntax word size
}
}
/* These fields for bit-allocation parametric information */
baie ................................ ................................ ................................ ................................ ...................1
if(baie)
{
sdcycod................................ ................................ ................................ ................................ ......2
fdcycod................................ ................................ ................................ ................................ ......2
sgaincod................................ ................................ ................................ ................................ ....2
dbpbcod................................ ................................ ................................ ................................ .....2
floorcod................................ ................................ ................................ ................................ ......3
}
snroffste................................ ................................ ................................ ................................ ............1
if(snroffste)
{
csnroffst................................ ................................ ................................ ................................ .....6
if(cplinu)
{
cplfsnroffst................................ ................................ ................................ .......................... 4
cplfgaincod................................ ................................ ................................ ......................... 3
}
for(ch = 0; ch < nfchans; ch++)
{
fsnroffst[ch]................................ ................................ ................................ ........................ 4
fgaincod[ch]................................ ................................ ................................ ........................ 3
}
if(lfeon)
{
lfefsnroffst................................ ................................ ................................ ........................... 4
lfefgaincod................................ ................................ ................................ .......................... 3
}
}
if(cplinu)
{
cplleake................................ ................................ ................................ ................................ ......1
if(cplleake)
{
cplfleak................................ ................................ ................................ ................................ 3
cplsleak................................ ................................ ................................ ............................... 3
}
}
/* These fields for delta bit allocation information */
deltbaie................................ ................................ ................................ ................................ .............1
if(deltbaie)
{
if(cplinu) {cpldeltbae}................................ ................................ ................................ .................2
for(ch = 0; ch < nfchans; ch++) {deltbae[ch]}................................ ................................ ..............2
if(cplinu)
{
if(cpldeltbae==new info follows)
{
cpldeltnseg................................ ................................ ................................ ...................3
— 17 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Syntax word size
for(seg = 0; seg <= cpldeltnseg; seg++)
{
cpldeltoffst[seg]................................ ................................ ................................ ....5
cpldeltlen[seg]................................ ................................ ................................ .......4
cpldeltba[seg]................................ ................................ ................................ ........3
}
}
}
for(ch = 0; ch < nfchans; ch++)
{
if(deltbae[ch]==new info follows)
{
deltnseg[ch]................................ ................................ ................................ .................3
for(seg = 0; seg <= deltnseg[ch]; seg++)
{
deltoffst[ch][seg]................................ ................................ ................................ ...5
deltlen[ch][seg]................................ ................................ ................................ .....4
deltba[ch][seg]................................ ................................ ................................ ......3
}
}
}
}
/* These fields for inclusion of unused dummy data */
skiple................................ ................................ ................................ ................................ .................1
if(skiple)
{
skipl................................ ................................ ................................ ................................ ............9
skipfld ................................ ................................ ................................ ............................. skipl × 8
}
/* These fields for quantized mantissa values */
ch = 0
do /* mantissas of chs up to and including first coupled ch */
{
for(bin = 0; bin < nchmant[ch]; bin++) {chmant[ch][bin]}................................ .................... (0-16)
ch += 1
}
while(chinclp[ch] == 0 && ch < nfchans)
if(cplinu) /* mantissas of coupling channel */
{
for(bin = 0; bin < ncplmant; bin++) {cplmant[bin]}................................ .............................. (0-16)
}
while(ch<nfchans) /* mantissas of remaining channels, whether or not coupled */
{
for(bin = 0; bin < nchmant[ch]; bin++) {chmant[ch][bin]}................................ .................... (0-16)
ch += 1
}
if(lfeon) /* mantissas of low frequency effects channel */
{
for(bin = 0; bin < nlfemant; bin++) {lfemant[bin]}................................ ................................ (0-16)
}
} /* end of audblk */
— 18 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
5.3.4 auxdata - auxiliary data
Syntax word size
auxdata()
{
auxbits................................ ................................ ................................ ................................ ...nauxbits
if(auxdatae)
{
auxdatal................................ ................................ ................................ ................................ ....14
}
auxdatae ................................ ................................ ................................ ................................ ...........1
} /* end of auxdata */
5.3.5 errorcheck - error detection code
Syntax word size
errorcheck()
{
crcrsv................................ ................................ ................................ ................................ ................1
crc2 ................................ ................................ ................................ ................................ ................. 16
} /* end of errorcheck */
5.4 Description of bit stream elements
A number of bit stream elements have values which may be transmitted, but whose
meaning has been reserved. If a decoder receives a bit stream which contains reserved
values, the decoder may or may not be able to decode and produce audio. In the
description of bit stream elements which have reserved codes, there is an indication of
what the decoder can do if the reserved code is received. In some cases, the decoder can
not decode audio. In other cases, the decoder can still decode audio by using a default
value for a parameter which was indicated by a reserved code.
5.4.1 syncinfo - synchronization information
[Link] syncword - synchronization word - 16 bits
The syncword is always 0x0B77, or ‘0000 1011 0111 0111’. Transmission of the
syncword, like other bit field elements, is left bit first.
[Link] crc1 - cyclic redundancy check 1 - 16 bits
This 16 bit-CRC applies to the first 5/8 of the frame. Transmission of the CRC,
like other numerical values, is most significant bit first.
[Link] fscod - sample rate code - 2 bits
This is a 2-bit code indicating sample rate according to Table 5.1. If the reserved
code is indicated, the decoder should not attempt to decode audio and should mute.
— 19 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 5.1 Sample Rate Codes
fscod sampling rate, kHz
‘00’ 48
‘01’ 44.1
‘10’ 32
‘11’ reserved
[Link] frmsizecod - frame size code - 6 bits
The frame size code is used along with the sample rate code to determine the
number of (2-byte) words before the nextsyncword. See Table 5.13 on page38.
5.4.2 bsi - bit stream information
[Link] bsid - bit stream identification - 5 bits
This bit field has a value of ‘01000’ (=8) in this version of this standard. Future
modifications of this standard may define other values. Values of bsid smaller than 8 will
be used for versions of AC-3 which implement subsets of the version 8 syntax. Decoders
which can decode version 8 will thus be able to decode version numbers less than 8. If this
standard is extended by the addition of additional elements or features, a value of bsid
greater than 8 will be used. Decoders built to this version of the standard will not be able
to decode versions with bsid greater than 8. Thus, decoders built to this standard shall
mute if the value of bsid is greater than 8, and should decode and reproduce audio if the
value of bsid is less than or equal to 8.
[Link] bsmod - bit stream mode - 3 bits
This 3-bit code indicates the type of service that the bit stream conveys as defined
in Table 5.2.
Table 5.2 Bit Stream Mode
bsmod acmod type of service
‘000’ any main audio service: complete main (CM)
‘001’ any main audio service: music and effects (ME)
‘010’ any associated service: visually impaired (VI)
‘011’ any associated service: hearing impaired (HI)
‘100’ any associated service: dialogue (D)
‘101’ any associated service: commentary (C)
‘110’ any associated service: emergency (E)
‘111’ ‘001’ associated service: voice over (VO)
‘111’ ‘010’ - ‘111’ main audio service: karaoke
— 20 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] acmod - audio coding mode - 3 bits
This 3-bit code, shown in Table 5.3, indicates which of the main service channels
are in use, ranging from 3/2 to 1/0. If the msb of acmod is a 1, surround channels are in use
and surmixlev follows in the bit stream. If the msb of acmod is a 0, the surround channels are
not in use and surmixlev does not follow in the bit stream. If the lsb of acmod is a 0, the
center channel is not in use. If the lsb of acmod is a 1, the center channel is in use. Note:
The state of acmod sets the number of full-bandwidth channels parameter, nfchans, (e.g., for
3/2 mode, nfchans = 5; for 2/1 mode, nfchans = 3; etc.). The total number of channels,
nchans, is equal to nfchans if the lfe channel is off, and is equal to 1+nfchans if the lfe
channel is on. If acmod is 0, then two completely independent program channels (dual
mono) are encoded into the bit stream, and are referenced as Ch1, Ch2. In this case, a
number of additional items are present in BSI or audblk to fully describe Ch2. Table 5.3
also indicates the channel ordering (the order in which the channels are processed) for
each of the modes.
Table 5.3 Audio Coding Mode
acmod audio coding mode nfchans channel array ordering
‘000’ 1+1 2 Ch1, Ch2
‘001’ 1/0 1 C
‘010’ 2/0 2 L, R
‘011’ 3/0 3 L, C, R
‘100’ 2/1 3 L, R, S
‘101’ 3/1 4 L, C, R, S
‘110’ 2/2 4 L, R, SL, SR
‘111’ 3/2 5 L, C, R, SL, SR
[Link] cmixlev - center mix level - 2 bits
When three front channels are in use, this 2-bit code, shown in Table 5.4, indicates
the nominal down mix level of the center channel with respect to the left and right
channels. If cmixlev is set to the reserved code, decoders should still reproduce audio. The
intermediate value ofcmixlev (-4.5 dB) may be used in this case.
Table 5.4 Center Mix Level
cmixlev clev
‘00’ 0.707 (-3.0 dB)
‘01’ 0.596 (-4.5 dB)
‘10’ 0.500 (-6.0 dB)
‘11’ reserved
[Link] surmixlev - surround mix level - 2 bits
If surround channels are in use, this 2-bit code, shown in Table 5.5, indicates the
nominal down mix level of the surround channels. If surmixlev is set to the reserved code,
— 21 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
the decoder should still reproduce audio. The intermediate value of surmixlev (-6 dB) may
be used in this case.
Table 5.5 Surround Mix Level
surmixlev slev
‘00’ 0.707 (-3 dB)
‘01’ 0.500 (-6 dB)
‘10’ 0
‘11’ reserved
[Link] dsurmod - Dolby surround mode - 2 bits
When operating in the two channel mode, this 2-bit code, as shown in Table 5.6,
indicates whether or not the program has been encoded in Dolby Surround. This
information is not used by the AC-3 decoder, but may be used by other portions of the
audio reproduction equipment. If dsurmod is set to the reserved code, the decoder should
still reproduce audio. The reserved code may be interpreted as “not indicated”.
Table 5.6 Dolby Surround Mode
dsurmod indication
‘00’ not indicated
‘01’ NOT Dolby Surround encoded
‘10’ Dolby Surround encoded
‘11’ reserved
[Link] lfeon - low frequency effects channel on - 1 bit
This bit has a value of 1 if the lfe (sub woofer) channel is on, and a value of 0 if the
lfe channel is off.
[Link] dialnorm - dialogue normalization - 5 bits
This 5-bit code indicates how far the average dialogue level is below digital 100%.
Valid values are 1-31. The value of 0 is reserved. The values of 1 to 31 are interpreted as
-1 dB to -31 dB with respect to digital 100%. If the reserved value of 0 is received, the
decoder shall use -31 dB. The value of dialnorm shall affect the sound reproduction level. If
the value is not used by the AC-3 decoder itself, the value shall be used by other parts of
the audio reproduction equipment. Dialogue normalization is further explained in Section
7.6 on page 75.
[Link] compre - compression gain word exists - 1 bit
If this bit is a 1, the following 8 bits represent a compression control word.
— 22 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] compr - compression gain word - 8 bits
This encoder generated gain word may be present in the bit stream. If so, it may
used to scale the reproduced audio level in order to reproduce a very narrow dynamic
range, with an assured upper limit of instantaneous peak reproduced signal level in the
monophonic downmix. The meaning and use of compr is described further in Section 7.7.2
on page 79.
[Link] langcode - language code exists - 1 bit
If this bit is a 1, the following 8 bits represent a language code. If this bit is a 0, the
language of the audio service is not indicated.
[Link] langcod - language code - 8 bits
This is an 8 bit code representing the language of the audio service. See Table 5.14
on page 39 for the mapping oflangcod into language.
[Link] audprodie - audio production information exists - 1 bit
If this bit is a 1, the mixlevel and roomtyp fields exist, indicating information about
the audio production environment (mixing room).
[Link] mixlevel - mixing level - 5 bits
This 5-bit code indicates the absolute acoustic sound pressure level of an individual
channel during the final audio mixing session. The 5-bit code represents a value in the
range 0 to 31. The peak mixing level is 80 plus the value of mixlevel dB SPL, or 80 to 111
dB SPL. The peak mixing level is the acoustic level of a sine wave in a single channel
whose peaks reach 100% in the PCM representation. The absolute SPL value is typically
measured by means of pink noise with an RMS value of -20 or -30 dB with respect to the
peak RMS sine wave level. The value of mixlevel is not typically used within the AC-3
decoder, but may be used by other parts of the audio reproduction equipment.
[Link] roomtyp - room type - 2 bits
This 2-bit code, shown in Table 5.7, indicates the type and calibration of the
mixing room used for the final audio mixing session. The value of roomtyp is not typically
used by the AC-3 decoder, but may be used by other parts of the audio reproduction
equipment. If roomtyp is set to the reserved code, the decoder should still reproduce audio.
The reserved code may be interpreted as “not indicated”.
— 23 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 5.7 Room Type
roomtyp type of mixing room
‘00’ not indicated
‘01’ large room, X curve monitor
‘10’ small room, flat monitor
‘11’ reserved
[Link] dialnorm2 - dialogue normalization, ch2 - 5 bits
This 5-bit code has the same meaning as dialnorm , except that it applies to the
second audio channel when acmod indicates two independent channels (dual mono 1+1
mode).
[Link] compr2e - compression gain word exists, ch2 - 1 bit
If this bit is a 1, the following 8 bits represent a compression gain word for Ch2.
[Link] compr2 - compression gain word, ch2 - 8 bits
This 8-bit word has the same meaning as compr, except that it applies to the second
audio channel whenacmod indicates two independent channels (dual mono 1+1 mode).
[Link] langcod2e - language code exists, ch2 - 1 bit
If this bit is a 1, the following 8 bits represent a language code for Ch2. If this bit is
a 0, the language of the Ch2 is not indicated.
[Link] langcod2 - language code, ch2 - 8 bits
This 8-bit code has the same meaning as langcod, except that it applies to the
second audio channel when acmod indicates two independent channels (dual mono, 1+1
mode).
[Link] audprodi2e - audio production information exists, ch2 - 1 bit
If this bit is a 1, the following two data fields exist indicating information about the
audio production for Ch2.
[Link] mixlevel2 - mixing level, ch2 - 5 bits
This 5-bit code has the same meaning as mixlevel, except that it applies to the
second audio channel when acmod indicates two independent channels (dual mono 1+1
mode).
— 24 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] roomtyp2 - room type, ch2 - 2 bits
This 2-bit code has the same meaning as roomtyp , except that it applies to the
second audio channel when acmod indicates two independent channels (dual mono 1+1
mode).
[Link] copyrightb - copyright bit - 1 bit
If this bit has a value of 1, the information in the bit stream is indicated as
protected by copyright. It has a value of 0 if the information is not indicated as protected.
[Link] origbs - original bit stream - 1 bit
This bit has a value of 1 if this is an original bit stream. This bit has a value of 0 if
this is a copy of another bit stream.
[Link] timecod1e, timcode2e - time code (first and second) halves exist - 2 bits
These values indicate, as shown in Table 5.8, whether time codes follow in the bit
stream. The time code can have a resolution of 1/64th of a frame (one frame = 1/30th of a
second). Since only the high resolution portion of the time code is needed for fine
synchronization, the 28 bit time code is broken into two 14 bit halves. The low resolution
first half represents the code in 8 second increments up to 24 hours. The high resolution
second half represents the code in 1/64th frame increments up to 8 seconds.
Table 5.8 Time Code Exists
timecod2e,timecod1e time code present
‘0’,’0’ not present
‘0’,’1’ first half (14 bits) present
‘1’,’0’ second half (14 bits) present
‘1’,’1’ both halves (28 bits) present
[Link] timecod1 - time code first half - 14 bits
The first 5 bits of this 14-bit field represent the time in hours, with valid values of
0-23. The next 6 bits represent the time in minutes, with valid values of 0-59. The final 3
bits represents the time in 8 second increments, with valid values of 0-7 (representing 0, 8,
16, ... 56 seconds).
[Link] timecod2 - time code second half - 14 bits
The first 3 bits of this 14-bit field represent the time in seconds, with valid values
from 0-7 (representing 0-7 seconds). The next 5 bits represents the time in frames, with
valid values from 0-29. The final 6 bits represents fractions of 1/64 of a frame, with valid
values from 0-63.
— 25 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] addbsie - additional bit stream information exists - 1 bit
If this bit has a value of 1 there is additional bit stream information, the length of
which is indicated by the next field. If this bit has a value of 0, there is no additional bit
stream information.
[Link] addbsil - additional bit stream information length - 6 bits
This 6-bit code, which exists only if addbside is a 1, indicates the length in bytes of
additional bit stream information. The valid range of addbsil is 0-63, indicating 1-64
additional bytes, respectively. The decoder is not required to interpret this information,
and thus shall skip over this number of bytes following in the data stream.
× 8) bits
[Link] addbsi - additional bit stream information - ((addbsil+1)
This field contains 1 to 64 bytes of any additional information included with the bit
stream information structure.
5.4.3 audblk audio block
[Link] blksw[ch] - block switch flag - 1 bit
This flag, for channel [ch], indicates whether the current audio block was split into
2 sub-blocks during the transformation from the time domain into the frequency domain.
A value of 0 indicates that the block was not split, and that a single 512 point TDAC
transform was performed. A value of 1 indicates that the block was split into 2 sub-blocks
of length 256, that the TDAC transform length was switched from a length of 512 points
to a length of 256 points, and that 2 transforms were performed on the audio block (one
on each sub-block). Transform length switching is described in more detail in Section 7.9
on page 87.
[Link] dithflag[ch] - dither flag - 1 bit
This flag, for channel [ch], indicates that the decoder should activate dither during
the current block. Dither is described in detail in Section 7.3.4 on page
67.
[Link] dynrnge - dynamic range gain word exists - 1 bit
If this bit is a 1, the dynamic range gain word follows in the bit stream. If it is 0,
the gain word is not present, and the previous value is reused, except for block 0 of a
frame where if the control word is not present the current value of
dynrng is set to 0.
[Link] dynrng - dynamic range gain word - 8 bits
This encoder-generated gain word is applied to scale the reproduced audio as
described in Section 7.7.1 on page76.
— 26 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] dynrng2e - dynamic range gain word exists, ch2 - 1 bit
If this bit is a 1, the dynamic range gain word for channel 2 follows in the bit
stream. If it is 0, the gain word is not present, and the previous value is reused, except for
block 0 of a frame where if the control word is not present the current value of dynrng2 is
set to 0.
[Link] dynrng2 - dynamic range gain word ch2 - 8 bits
This encoder-generated gain word is applied to scale the reproduced audio of Ch2,
in the same manner asdynrng is applied to Ch1, as described in Section 7.7.1 on page76.
[Link] cplstre - coupling strategy exists - 1 bit
If this bit is a 1, coupling information follows in the bit stream. If it is 0, new
coupling information is not present, and coupling parameters previously sent are reused.
[Link] cplinu - coupling in use - 1 bit
If this bit is a 1, coupling is currently being utilized, and coupling parameters
follow. If it is 0, coupling is not being utilized (all channels are independent) and no
coupling parameters follow in the bit stream.
[Link] chincpl[ch] - channel in coupling - 1 bit
If this bit is a 1, then the channel indicated by the index [ch] is a coupled channel. If
the bit is a 0, then this channel is not coupled. Since coupling is not used in the 1/0 mode,
if any chincpl[] values exist there will be 2 to 5 values. Of the values present, at least two
values will be 1, since coupling requires more than one coupled channel to be coupled.
[Link] phsflginu - phase flags in use - 1 bit
If this bit (defined for 2/0 mode only) is a 1, phase flags are included with coupling
coordinate information. Phase flags are described in Section 7.4 on page 69.
[Link] cplbegf - coupling begin frequency code - 4 bits
This 4-bit code is interpreted as the sub-band number (0 to 15) which indicates the
lower frequency band edge of the coupling channel (or the first active sub-band) as shown
in Table 7.24 on page70.
[Link] cplendf - coupling end frequency code - 4 bits
This 4-bit code indicates the upper band edge of the coupling channel. The upper
band edge (or last active sub-band) is cplendf+2, or a value between 2 and 17. See Table
7.24 on page 70.
The number of active coupling sub-bands is equal to
ncplsubnd , which is calculated:
ncplsubnd = 3 + cplendf - cplbegf ;
— 27 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] cplbndstrc[sbnd] - coupling band structure - 1 bit
There are 18 coupling sub-bands defined in Table 7.24 on page 70, each containing
12 frequency coefficients. The fixed 12-bin wide coupling sub-bands are converted into
coupling bands, each of which may be wider than (a multiple of) 12 frequency bins. Each
coupling band may contain one or more coupling sub-bands. Coupling coordinates are
transmitted for each coupling band. Each band’s coupling coordinate must be applied to
all the coefficients in the coupling band.
The coupling band structure indicates which coupling sub-bands are combined into
wider coupling bands. When cplbndstrc[sbnd] is a 0, the sub-band number [sbnd] is not
combined into the previous band to form a wider band, but starts a new 12 wide coupling
band. When cplbndstrc[sbnd] is a 1, then the sub-band [sbnd] is combined with the previous
band, making the previous band 12 bins wider. Each successive value of cplbndstrc which is
a 1 will continue to combine sub-bands into the current band. When another cplbndstrc
value of 0 is received, then a new band will be formed, beginning with the 12 bins of the
current sub-band. The set ofcplbndstrc[sbnd] values is typically considered an array.
Each bit in the array corresponds to a specific coupling sub-band in ascending
frequency order. The first element of the array corresponds to the sub-band cplbegf, is
always 0, and is not transmitted. (There is no reason to send a cplbndstrc bit for the first
sub-band at cplbegf, since this bit would always be 0.) Thus, there are ncplsubnd -1 values of
cplbndstrc transmitted. If there is only one coupling sub-band, then no cplbndstrc bits are
sent.
The number of coupling bands, ncplbnd , may be computed from ncplsubnd and
cplbnstrc :
ncplbnd = (ncplsubnd - (cplbndstrc[cplbegf+1] + ... + cplbndstrc[cplendf+2])) ;
[Link] cplcoe[ch] - coupling coordinates exist - 1 bit
Coupling coordinates indicate, for a given channel and within a given coupling
band, the fraction of the coupling channel frequency coefficients to use to re-create the
individual channel frequency coefficients. Coupling coordinates are conditionally
transmitted in the bit stream. If new values are not delivered, the previously sent values
remain in effect. See Section 7.4 on page69 for further information on coupling.
If cplcoe[ch] is 1, the coupling coordinates for the corresponding channel [ch] exist
and follow in the bit stream. If the bit is 0, the previously transmitted coupling coordinates
for this channel are reused. All coupling coordinates are always transmitted in block 0 of
each syncframe.
[Link] mstrcplco[ch] - master coupling coordinate - 2 bits
This per channel parameter establishes a per channel gain factor (increasing the
dynamic range) for the coupling coordinates as shown in Table 5.9.
— 28 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 5.9 Master Coupling Coordinate
mstrcplco[ch] cplco[ch][bnd] gain multiplier
‘00’ 1
‘01’ 2-3
‘10’ 2-6
‘11’ 2-9
[Link] cplcoexp[ch][bnd] - coupling coordinate exponent - 4 bits
Each coupling coordinate is composed of a 4-bit exponent and a 4-bit mantissa.
This element is the value of the coupling coordinate exponent for channel [ch] and band
[bnd]. The index [ch] only will exist for those channels which are coupled. The index [bnd]
will range from 0 to ncplbnds. See Section 7.4.3 on page 71 for further information on how
to interpret coupling coordinates.
[Link] cplcomant[ch][bnd] - coupling coordinate mantissa- 4 bits
This element is the 4-bit coupling coordinate mantissa for channel [ch] and band
[bnd].
[Link] phsflg[bnd] - phase flag - 1 bit
This element (only used in the 2/0 mode) indicates whether the decoder should
phase invert the coupling channel mantissas when reconstructing the right output channel.
The index [bnd] can range from 0 to ncplbnd. Phase flags are described in Section 7.4 on
page 69.
[Link] rematstr - rematrixing strategy - 1 bit
If this bit is a 1, then new rematrix flags are present in the bit stream. If it is 0,
rematrix flags are not present, and the previous values should be reused. The rematstr
parameter is present only in the 2/0 audio coding mode.
[Link] rematflg[rbnd] - rematrix flag - 1 bit
This bit indicates whether the transform coefficients in rematrixing band [rbnd] have
been rematrixed. If this bit is a 1, then the transform coefficients in [rbnd] were rematrixed
into sum and difference channels. If this bit is a 0, then rematrixing has not been performed
in band [rbnd]. The number of rematrixing bands (and the number of values of [rbnd])
depend on coupling parameters as shown in Table 5.10. Rematrixing is described in
Section 7.5 on page72.
— 29 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 5.10 Number of Rematrixing Bands
condition # of rematrixing bands
cplinu == 0 4
(cplinu == 1) && (cplbegf > 2) 4
(cplinu == 1) && (2 ≥ cplbegf > 0) 3
(cplinu == 1) && (cplbegf == 0) 2
[Link] cplexpstr - coupling exponent strategy - 2 bits
This element indicates the method of exponent coding that is used for the coupling
channel as shown in Table 7.4 on page 47. See Section 7.1 on page 45 for explanation of
each exponent strategy.
[Link] chexpstr[ch] - channel exponent strategy - 2 bits
This element indicates the method of exponent coding that is used for channel [ch],
as shown in Table 7.4 on page47. This element exists for each full bandwidth channel.
[Link] lfeexpstr - low frequency effects channel exponent strategy - 1 bit
This element indicates the method of exponent coding that is used for the lfe
channel, as shown in Table 7.5 on page47.
[Link] chbwcod[ch] - channel bandwidth code - 6 bits
The chbwcod[ch] element is an unsigned integer which defines the upper band edge
for full-bandwidth channel [ch]. This parameter is only included for fbw channels which are
not coupled. (See Section 7.1.3 on page 47 on exponents for the definition of this
parameter.) Valid values are in the range of 0-60. If a value greater than 60 is received, the
bit stream is invalid and the decoder shall cease decoding audio and mute.
[Link] cplabsexp - coupling absolute exponent - 4 bits
This is an absolute exponent, which is used as a reference when decoding the
differential exponents for the coupling channel.
[Link] cplexps[grp] - coupling exponents - 7 bits
Each value of cplexps indicates the value of 3, 6, or 12 differentially-coded coupling
channel exponents for the coupling exponent group [grp] for the case of D15, D25, or D45
coding, respectively. The number of cplexps values transmitted equals ncplgrps, which may
be determined from cplbegf, cplendf, and cplexpstr. Refer to Section 7.1.3 on page 47 for
further information.
— 30 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] exps[ch][grp] - channel exponents - 4 or 7 bits
These elements represent the encoded exponents for channel [ch]. The first element
([grp]=0) is a 4-bit absolute exponent for the first (DC term) transform coefficient. The
subsequent elements ([grp]>0) are 7-bit representations of a group of 3, 6, or 12
differentially coded exponents (corresponding to D15, D25, D45 exponent strategies
respectively). The number of groups for each channel, nchgrps[ch] , is determined from
cplbegf if the channel is coupled, or chbwcod[ch] of the channel is not coupled. Refer to
Section 7.1.3 on page47 for further information.
[Link] gainrng[ch] - channel gain range code - 2 bits
This per channel 2-bit element may be used to determine a block floating-point
shift value for the inverse TDAC transform filterbank. Use of this code allows increased
dynamic range to be obtained from a limited word length transform computation. For
further information see Section 7.9.5 on page93.
[Link] lfeexps[grp] - low frequency effects channel exponents - 4 or 7 bits
These elements represent the encoded exponents for the LFE channel. The first
element ([grp]=0) is a 4-bit absolute exponent for the first (DC term) transform coefficient.
There are two additional elements (nlfegrps=2) which are 7-bit representations of a group
of 3 differentially coded exponents. The total number of lfe channel exponents (nlfemant ) is
7.
[Link] baie - bit allocation information exists - 1 bit
If this bit is a 1, then five separate fields (totaling 11 bits) follow in the bit stream.
Each field indicates parameter values for the bit allocation process. If this bit is a 0, these
fields do not exist. Further details on these fields may be found in Section 7.2 on page 50.
[Link] sdcycod - slow decay code - 2 bits
This is a 2-bit code specifies the slow decay parameter in the bit allocation process.
[Link] fdcycod - fast decay code - 2 bits
This is a 2-bit code specifies the fast decay parameter in the decode bit allocation
process.
[Link] sgaincod - slow gain code - 2 bits
This is a 2-bit code specifies the slow gain parameter in the decode bit allocation
process.
[Link] dbpbcod - dB per bit code - 2 bits
This 2-bit code specifies the dB per bit parameter in the bit allocation process.
— 31 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] floorcod - masking floor code - 3 bits
This 3-bit code specifies the floor code parameter in the bit allocation process.
[Link] snroffste - SNR offset exists - 1 bit
If this bit has a value of 1, a number of bit allocation parameters follow in the bit
stream. If this bit has a value of 0, SNR offset information does not follow, and the
previously transmitted values should be used for this block. The bit allocation process and
these parameters are described in Section 7.2.2 on page 51.
[Link] csnroffst - coarse SNR offset - 6 bits
This 6-bit code specifies the coarse SNR offset parameter in the bit allocation
process.
[Link] cplfsnroffst - coupling fine SNR offset - 4 bits
This 4-bit code specifies the coupling channel fine SNR offset in the bit allocation
process.
[Link] cplfgaincod - coupling fast gain code - 3 bits
This 3-bit code specifies the coupling channel fast gain code used in the bit
allocation process.
[Link] fsnroffst[ch] - channel fine SNR offset - 4 bits
This 4-bit code specifies the fine SNR offset used in the bit allocation process for
channel [ch].
[Link] fgaincod[ch] - channel fast gain code - 3 bits
This 3-bit code specifies the fast gain parameter used in the bit allocation process
for channel[ch].
[Link] lfefsnroffst - low frequency effects channel fine SNR offset - 4 bits
This 4-bit code specifies the fine SNR offset parameter used in the bit allocation
process for the lfe channel.
[Link] lfefgaincod - low frequency effects channel fast gain code - 3 bits
This 3-bit code specifies the fast gain parameter used in the bit allocation process
for the lfe channel.
[Link] cplleake - coupling leak initialization exists - 1 bit
If this bit is a 1, leak initialization parameters follow in the bit stream. If this bit is a
0, the previously transmitted values still apply.
— 32 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] cplfleak - coupling fast leak initialization - 3 bits
This 3-bit code specifies the fast leak initialization value for the coupling channel's
excitation function calculation in the bit allocation process.
[Link] cplsleak - coupling slow leak initialization - 3 bits
This 3-bit code specifies the slow leak initialization value for the coupling channel's
excitation function calculation in the bit allocation process.
[Link] deltbaie - delta bit allocation information exists - 1 bit
If this bit is a 1, some delta bit allocation information follows in the bit stream. If
this bit is a 0, the previously transmitted delta bit allocation information still applies,
except for block 0. If deltbaie is 0 in block 0, then cpldeltnseg and deltnseg[ch] are set to 0,
and no delta bit allocation is applied. Delta bit allocation is described in Section [Link] on
page 56.
[Link] cpldeltbae - coupling delta bit allocation exists - 2 bits
This 2-bit code indicates the delta bit allocation strategy for the coupling channel,
as shown in Table 5.11. If the reserved state is received, the decoder should not decode
audio, and should mute.
Table 5.11 Delta Bit Allocation Exists States
cpldeltbae, deltbae code
‘00’ reuse previous state
‘01’ new info follows
‘10’ perform no delta alloc
‘11’ reserved
[Link] deltbae[ch] - delta bit allocation exists - 2 bits
This per full bandwidth channel 2-bit code indicates the delta bit allocation strategy
for the corresponding channel, as shown in Table 5.11.
[Link] cpldeltnseg - coupling delta bit allocation number of segments - 3 bits
This 3-bit code indicates the number of delta bit allocation segments that exist for
the coupling channel. The value of this parameter ranges from 1 to 8, and is calculated by
adding 1 to the 3-bit binary number represented by the code.
[Link] cpldeltoffst[seg] - coupling delta bit allocation offset - 5 bits
The first 5-bit code ([seg]=0) indicates the number of the first bit allocation band
(as specified in 7.4.2 on page 70) of the coupling channel for which delta bit allocation
values are provided. Subsequent codes indicate the offset from the previous delta segment
end point to the next bit allocation band for which delta bit allocation values are provided.
— 33 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] cpldeltlen[seg] - coupling delta bit allocation length - 4 bits
Each 4-bit code indicates the number of bit allocation bands that the corresponding
segment spans.
[Link] cpldeltba[seg] - coupling delta bit allocation - 3 bits
This 3-bit value is used in the bit allocation process for the coupling channel
.
Each 3-bit code indicates an adjustment to the default masking curve computed in
the decoder. The deltas are coded as shown in Table 5.12.
Table 5.12 Bit Allocation Deltas
cpldeltba, deltba adjustment
‘000’ -24 dB
‘001’ -18 dB
‘010’ -12 dB
‘011’ -6 dB
‘100’ +6 dB
‘101’ +12 dB
‘110’ +18 dB
‘111’ +24 dB
[Link] deltnseg[ch] - channel delta bit allocation number of segments - 3 bits
These per full bandwidth channel elements are 3-bit codes indicating the number of
delta bit allocation segments that exist for the corresponding channel. The value of this
parameter ranges from 1 to 8, and is calculated by adding 1 to the 3-bit binary code.
[Link] deltoffst[ch][seg] - channel delta bit allocation offset - 5 bits
The first 5-bit code ([seg]=0) indicates the number of the first bit allocation band
(see Section [Link] on page 56) of the corresponding channel for which delta bit
allocation values are provided. Subsequent codes indicate the offset from the previous
delta segment end point to the next bit allocation band for which delta bit allocation values
are provided.
[Link] deltlen[ch][seg] - channel delta bit allocation length - 4 bits
Each 4-bit code indicates the number of bit allocation bands that the corresponding
segment spans.
[Link] deltba[ch][seg] - channel delta bit allocation - 3 bits
This 3-bit value is used in the bit allocation process for the indicated channel. Each
3-bit code indicates an adjustment to the default masking curve computed in the decoder.
The deltas are coded as shown in Table 5.12.
— 34 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] skiple - skip length exists - 1 bit
If this bit is a 1, then the skipl parameter follows in the bit stream. If this bit is a 0,
skipl does not exist.
[Link] skipl - skip length - 9 bits
This 9-bit code indicates the number of dummy bytes to skip (ignore) before
unpacking the mantissas of the current audio block.
[Link] skipfld - skip field - (skipl× 8) bits
This field contains the null bytes of data to be skipped, as indicated by the skipl
parameter.
[Link] chmant[ch][bin] - channel mantissas - 0 to 16 bits
The actual quantized mantissa values for the indicated channel. Each value may
contain from 0 to as many as 16 bits. The number of mantissas for the indicated channel is
equal to nchmant[ch] , which may be determined from chbwcod[ch] (see Section 7.1.3 on
page 47) if the channel is not coupled, or from cplbegf (see Section 7.4.2 on page 70) if the
channel is coupled. Detailed information on packed mantissa data is in Section 7.3 on page
65.
[Link] cplmant[bin] - coupling mantissas - 0 to 16 bits
The actual quantized mantissa values for the coupling channel. Each value may
contain from 0 to as many as 16 bits. The number of mantissas for the coupling channel is
equal to ncplmant , which may be determined from:
ncplmant = 12 × ncplsubnd.
[Link] lfemant[bin] - low frequency effects channel mantissas - 0 to 16 bits
The actual quantized mantissa values for the lfe channel. Each value may contain
from 0 to as many as 16 bits. The value of nlfemant is 7, so there are 7 mantissa values for
the lfe channel.
5.4.4 auxdata - auxiliary data field
Unused data at the end of a frame will exist whenever the encoder does not utilize
all available data for encoding the audio signal. This may occur if the final bit allocation
falls short of using all available bits, or if the input audio signal simply does not require all
available bits to be coded transparently. Or, the encoder may be instructed to intentionally
leave some bits unused by audio so that they are available for use by auxiliary data. Since
the number of bits required for auxiliary data may be smaller than the number of bits
available (which will be time varying) in any particular frame, a method is provided to
signal the number of actual auxiliary data bits in each frame.
— 35 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] auxbits - auxiliary data bits - nauxbits bits
This field contains auxiliary data. The total number of bits in this field is:
nauxbits = (bits in frame) - (bits used by all bit stream elements except for auxbits) ;
The number of bits in the frame can be determined from the frame size code
(frmsizcod ) and Table 5.13 on page 38. The number of bits used includes all bits used by bit
stream elements with the exception of auxbits. Any dummy data which has been included
with skip fields (skipfld) is included in the used bit count. The length of the auxbits field is
adjusted by the encoder such that the crc2 element falls on the last 16-bit word of the
frame.
If the number of user bits indicated by auxdatal is smaller than the number of
available aux bits nauxbits, the user data is located at the end of the auxbits field. This allows
a decoder to find and unpack the auxdatal user bits without knowing the value of nauxbits
(which can only be determined by decoding the audio in the entire frame). The order of the
user data in the auxbits field is forward. Thus the aux data decoder (which may not decode
any audio) may simply look to the end of the AC-3 syncframe to find auxdatal, backup
auxdatal bits (from the beginning of auxdatal ) in the data stream, and then unpack auxdatal
bits moving forward in the data stream.
[Link] auxdatal - auxiliary data length - 14 bits
This 14-bit integer value indicates the length, in bits, of the user data in the auxbits
auxiliary field.
[Link] auxdatae - auxiliary data exists - 1 bit
If this bit is a 1, then the auxdatal parameter precedes in the bit stream. If this bit is
a 0, auxdatal does not exist, and there is no user data.
5.4.5 errorcheck - frame error detection field
[Link] crcrsv - CRC reserved bit - 1 bit
Reserved for use in specific applications to ensure crc2 will not be equal to the sync
word. Use of this bit is optional by encoders. If the crc2 calculation results in a value equal
to the syncword, the crcrsv bit may be inverted. This will result in a crc2 value which is not
equal to the syncword.
[Link] crc2 - cyclic redundancy check 2 - 16 bits
The 16 bit CRC applies to the entire frame. The details of the CRC checking are
described in Section 7.10.1 on page94.
— 36 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
5.5 Bit stream constraints
The following constraints are placed upon the encoded bit stream by the AC-3
encoder. These constraints allow AC-3 decoders to be manufactured with smaller input
memory buffers.
1. The size of block 0 and block 1 combined, will never exceed 5/8 of the frame.
2. The sum of block 5 mantissa data and auxiliary data will never exceed the final 3/8 of
the frame.
3. Block 0 always contains all necessary information to begin correctly decoding the bit
stream.
4. Whenever the state of cplinu changes from off to on, all coupling information is
included in the block in which coupling is turned on. No coupling related information
is reused from any previous blocks where coupling may have been on.
— 37 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 5.13 Frame Size Code Table (1 word = 16 bits)
frmsizecod nominal bit rate fs = 32 kHz fs = 44.1 kHz fs = 48 kHz
words/syncframe words/syncframe words/syncframe
‘000000’ (0) 32 kbps 96 69 64
‘000001’ (0) 32 kbps 96 70 64
‘000010’ (1) 40 kbps 120 87 80
‘000011’ (1) 40 kbps 120 88 80
‘000100’ (2) 48 kbps 144 104 96
‘000101’ (2) 48 kbps 144 105 96
‘000110’ (3) 56 kbps 168 121 112
‘000111’ (3) 56 kbps 168 122 112
‘001000’ (4) 64 kbps 192 139 128
‘001001’ (4) 64 kbps 192 140 128
‘001010’ (5) 80 kbps 240 174 160
‘001011’ (5) 80 kbps 240 175 160
‘001100’ (6) 96 kbps 288 208 192
‘001101’ (6) 96 kbps 288 209 192
‘001110’ (7) 112 kbps 336 243 224
‘001111’ (7) 112 kbps 336 244 224
‘010000’ (8) 128 kbps 384 278 256
‘010001’ (8) 128 kbps 384 279 256
‘010010’ (9) 160 kbps 480 348 320
‘010011’ (9) 160 kbps 480 349 320
‘010100’ (10) 192 kbps 576 417 384
‘010101’ (10) 192 kbps 576 418 384
‘010110’ (11) 224 kbps 672 487 448
‘010111’ (11) 224 kbps 672 488 448
‘011000’ (12) 256 kbps 768 557 512
‘011001’ (12) 256 kbps 768 558 512
‘011010’ (13) 320 kbps 960 696 640
‘011011’ (13) 320 kbps 960 697 640
‘011100’ (14) 384 kbps 1152 835 768
‘011101’ (14) 384 kbps 1152 836 768
‘011110’ (15) 448 kbps 1344 975 896
‘011111’ (15) 448 kbps 1344 976 896
‘100000’ (16) 512 kbps 1536 1114 1024
‘100001’ (16) 512 kbps 1536 1115 1024
‘100010’ (17) 576 kbps 1728 1253 1152
‘100011’ (17) 576 kbps 1728 1254 1152
‘100100’ (18) 640 kbps 1920 1393 1280
‘100101’ (18) 640 kbps 1920 1394 1280
— 38 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 5.14 Language Code Table
langcod language langcod language langcod language langcod language
0x00 unknown/not 0x20 Polish 0x40 background 0x60 Moldavian
applicable sound/clean feed
0x01 Albanian 0x21 Portuguese 0x41 0x61 Malaysian
0x02 Breton 0x22 Romanian 0x42 0x62 Malagasay
0x03 Catalan 0x23 Romansh 0x43 0x63 Macedonian
0x04 Croatian 0x24 Serbian 0x44 0x64 Laotian
0x05 Welsh 0x25 Slovak 0x45 Zulu 0x65 Korean
0x06 Czech 0x26 Slovene 0x46 Vietnamese 0x66 Khmer
0x07 Danish 0x27 Finnish 0x47 Uzbek 0x67 Kazakh
0x08 German 0x28 Swedish 0x48 Urdu 0x68 Kannada
0x09 English 0x29 Turkish 0x49 Ukrainian 0x69 Japanese
0x0A Spanish 0x2A Flemish 0x4A Thai 0x6A Indonesian
0x0B Esperanto 0x2B Walloon 0x4B Telugu 0x6B Hindi
0x0C Estonian 0x2C 0x4C Tatar 0x6C Hebrew
0x0D Basque 0x2D 0x4D Tamil 0x6D Hausa
0x0E Faroese 0x2E 0x4E Tadzhik 0x6E Gurani
0x0F French 0x2F 0x4F Swahili 0x6F Gujurati
0x10 Frisian 0x30 reserved for nat’l 0x50 Sranan Tongo 0x70 Greek
assignment
0x11 Irish 0x31 " 0x51 Somali 0x71 Georgian
0x12 Gaelic 0x32 " 0x52 Sinhalese 0x72 Fulani
0x13 Galician 0x33 " 0x53 Shona 0x73 Dari
0x14 Icelandic 0x34 " 0x54 Serbo-Croat 0x74 Churash
0x15 Italian 0x35 " 0x55 Ruthenian 0x75 Chinese
0x16 Lappish 0x36 " 0x56 Russian 0x76 Burmese
0x17 Latin 0x37 " 0x57 Quechua 0x77 Bulgarian
0x18 Latvian 0x38 " 0x58 Pustu 0x78 Bengali
0x19 Luxembourgian 0x39 " 0x59 Punjabi 0x79 Belorussian
0x1A Lithuanian 0x3A " 0x5A Persian 0x7A Bambora
0x1B Hungarian 0x3B " 0x5B Papamiento 0x7B Azerbijani
0x1C Maltese 0x3C " 0x5C Oriya 0x7C Assamese
0x1D Dutch 0x3D " 0x5D Nepali 0x7D Armenian
0x1E Norwegian 0x3E " 0x5E Ndebele 0x7E Arabic
0x1F Occitan 0x3F " 0x5F Marathi 0x7F Amharic
— 39 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
6. DECODING THE AC-3 BIT STREAM
6.1 Introduction
Section 5 of this standard specifies the details of the AC-3 bit stream syntax. This
section gives an overview of the AC-3 decoding process as diagrammed in Figure 6.1,
where the decoding process flow is shown as a sequence of blocks down the center of the
page, and some of the information flow is indicated by arrowed lines at the sides of the
page. More detailed information on some of the processing blocks will be found in Section
7. The decoder described in this section should be considered one example of a decoder.
Other methods may exist to implement decoders, and these other methods may have
advantages in certain areas (such as instruction count, memory requirement, number of
transforms required, etc.).
6.2 Summary of the decoding process
6.2.1 Input bit stream
The input bit stream will typically come from a transmission or storage system.
The interface between the source of AC-3 data and the AC-3 decoder is not specified in
this standard. The details of the interface effect a number of decoder implementation
details.
[Link] Continuous or burst input
The encoded AC-3 data may be input to the decoder as a continuous data stream
at the nominal bit-rate, or chunks of data may be burst into the decoder at a high rate with
a low duty cycle. For burst mode operation, either the data source or the decoder may be
the master controlling the burst timing. The AC-3 decoder input buffer may be smaller in
size if the decoder can request bursts of data on an as-needed basis. However, the external
buffer memory may be larger in this case.
[Link] Byte or word alignment
Most applications of this standard will convey the elementary AC-3 bit stream with
byte or (16-bit) word alignment. The syncframe is always an integral number of words in
length. The decoder may receive data as a continuous serial stream of bits without any
alignment. Or, the data may be input to the decoder with either byte or word (16-bit)
alignment. Byte or word alignment of the input data may allow some simplification of the
decoder. Alignment does reduce the probability of false detection of the sync word.
— 40 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Input Bit-Stream
Synchronization,
Error Detection
Main Information Unpack BSI, Side Information
Side Information
Packed Exponents Exponent Strategies
Decode Exponents
Bit Allocation Parameters
Bit Allocation
Packed Mantissas Unpack, Ungroup,
Dither Flags
Dequantize, Dither
Mantissas
Coupling Parameters
De-Coupling
Rematrixing Flags
Rematrixing
Dynamic Range Dynamic Range Words
Compression
Block Sw flags
Inverse Transform
Window
Overlap/Add
Downmix
PCM Output Buffer
Output PCM
Figure 6.1. Flow diagram of the decoding process.
— 41 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
6.2.2 Synchronization and error detection
The AC-3 bit-steam format allows rapid synchronization. The 16-bit sync word has
a low probability of false detection. With no input stream alignment the probability of false
detection of the sync word is 0.0015% per input stream bit position. For a bit-rate of 384
kbps, the probability of false sync word detection is 19% per frame. Byte-alignment of the
input stream drops this probability to 2.5%, and word alignment drops it to 1.2%.
When a sync pattern is detected the decoder may be estimated to be in sync and
one of the CRC words (crc1 or crc2) may be checked. Since crc1 comes first and covers the
first 5/8 of the frame, the result of a crc1 check may be available after only 5/8 of the frame
has been received. Or, the entire frame size can be received and crc2 checked. If either
CRC checks, the decoder may safely be presumed to be in sync and decoding and
reproduction of audio may proceed. The chance of false sync in this case would be the
concatenation of the probabilities of a false sync word detection and a CRC misdetection
of error. The CRC check is reliable to 0.0015%. This probability, concatenated with the
probability of a false sync detection in a byte-aligned input bit stream, yield a probability of
false synchronization of 0.000035% (or about once in 3 million synchronization attempts).
If this small probability of false sync is too large for an application, there are
several methods which may reduce it. The decoder may only presume correct sync in the
case that both CRC words check properly. The decoder may require multiple sync words
to be received with the proper alignment. If the data transmission or storage system is
aware that data is in error, this information may be made known to the decoder.
Additional details on methods of bit stream synchronization are not provided in
this standard. Details on the CRC calculation are provided in Section 7.10 on page
93.
6.2.3 Unpack BSI, side information
Inherent to the decoding process is the unpacking (de-multiplexing) of the various
types of information included in the bit stream. Some of these items may be copied from
the input buffer to dedicated registers, some may be copied to specific working memory
location, and some of the items may simply be located in the input buffer with pointers to
them saved to another location for use when the information is required. The information
which must be unpacked is specified in detail in Section 5.3. Further details on the
unpacking of BSI and side information are not provided in this standard.
6.2.4 Decode exponents
The exponents are delivered in the bit stream in an encoded form. In order to
unpack and decode the exponents two types of side information are required. First, the
number of exponents must be known. For fbw channels this may be determined from
either chbwcod[ch] (for uncoupled channels) or from cplbegf (for coupled channels). For the
coupling channel, the number of exponents may be determined from cplbegf and cplendf. For
the lfe channel (when on), there are always 7 exponents. Second, the exponent strategy in
use (D15, etc.) by each channel must be known. The details on how to unpack and decode
exponents are provided in Section 7.1 on page45.
— 42 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
6.2.5 Bit allocation
The bit allocation computation reveals how many bits are used for each mantissa.
The inputs to the bit allocation computation are the decoded exponents, and the bit
allocation side information. The outputs of the bit allocation computation are a set of bit
allocation pointers (baps), one bap for each coded mantissa. The bap indicates the quantizer
used for the mantissa, and how many bits in the bit stream were used for each mantissa.
The bit allocation computation is described in detail in Section 7.2 on page50.
6.2.6 Process mantissas
The coarsely quantized mantissas make up the bulk of the AC-3 data stream. Each
mantissa is quantized to a level of precision indicated by the corresponding bap. In order to
pack the mantissa data more efficiently, some mantissas are grouped together into a single
transmitted value. For instance, two 11-level quantized values are conveyed in a single 7-
bit code (3.5 bits/value) in the bit stream.
The mantissa data is unpacked by peeling off groups of bits as indicated by the
baps. Grouped mantissas must be ungrouped. The individual coded mantissa values are
converted into a de-quantized value. Mantissas which are indicated as having zero bits
may be reproduced as either zero, or by a random dither value (under control of the dither
flag). The mantissa processing is described in full detail in Section 7.3 on page
65.
6.2.7 De-coupling
When coupling is in use, the channels which are coupled must be decoupled.
Decoupling involves reconstructing the high frequency section (exponents and mantissas)
of each coupled channel, from the common coupling channel and the coupling coordinates
for the individual channel. Within each coupling band, the coupling channel coefficients
(exponent and mantissa) are multiplied by the individual channel coupling coordinates. The
coupling process is described in detail in Section 7.4 on page
69.
6.2.8 Rematrixing
In the 2/0 audio coding mode rematrixing may be employed, as indicated by the
rematrix flags (rematflg[rbnd] ). Where the flag indicates a band is rematrixed, the coefficients
encoded in the bit stream are sum and difference values instead of left and right values.
Rematrixing is described in detail in Section 7.5 on page 72.
6.2.9 Dynamic range compression
For each block of audio a dynamic range control value (dynrng) may be included in
the bit stream. The decoder, by default, shall use this value to alter the magnitude of the
coefficient (exponent and mantissa) as specified in Section 7.7.1 on page 76.
— 43 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
6.2.10 Inverse transform
The decoding steps described above will result in a set of frequency coefficients for
each encoded channel. The inverse transform converts the blocks of frequency coefficients
into blocks of time samples. The inverse transform is detailed in Section 7.9 on page 87.
6.2.11 Window, overlap/add
The individual blocks of time samples must be windowed, and adjacent blocks
must be overlapped and added together in order to reconstruct the final continuous time
output PCM audio signal. The window and overlap/add steps are described along with the
inverse transform in Section 7.9 on page87.
6.2.12 Downmixing
If the number of channels required at the decoder output is smaller than the
number of channels which are encoded in the bit stream, then downmixing is required.
Downmixing in the time domain is shown in this example decoder. Since the inverse
transform is a linear operation, it is also possible to downmix in the frequency domain
prior to transformation. Section 7.8 on page 81 describes downmixing and specifies the
downmix coefficients which decoders shall employ.
6.2.13 PCM output buffer
Typical decoders will provide PCM output samples at the PCM sampling rate.
Since blocks of samples result from the decoding process, an output buffer is typically
required. This standard does not specify or describe output buffering in any further detail.
6.2.14 Output PCM
The output PCM samples may be delivered in form suitable for interconnection to
a digital to analog converter (DAC), or in any other form. This standard does not specify
the output PCM format.
— 44 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
7. ALGORITHMIC DETAILS
The following sections describe various aspects of AC-3 coding in detail.
7.1 Exponent coding
7.1.1 Overview
The actual audio information conveyed by the AC-3 bit stream consists of the
quantized frequency coefficients. The coefficients are delivered in floating point form, with
each coefficient consisting of an exponent and a mantissa. This section describes how the
exponents are encoded and packed into the bit stream.
Exponents are 5-bit values which indicate the number of leading zeros in the binary
representation of a frequency coefficient. The exponent acts as a scale factor for each
mantissa, equal to 2-exp. Exponent values are allowed to range from 0 (for the largest value
coefficients with no leading zeroes) to 24. Exponents for coefficients which have more
than 24 leading zeroes are fixed at 24, and the corresponding mantissas are allowed to
have leading zeros. Exponents require 5 bits in order to represent all allowed values.
AC-3 bit streams contain coded exponents for all independent channels, all coupled
channels, and for the coupling and low frequency effects channels (when they are enabled).
Since audio information is not shared across frames, block 0 of every frame will include
new exponents for every channel. Exponent information may be shared across blocks
within a frame, so blocks 1 through 5 may reuse exponents from previous blocks.
AC-3 exponent transmission employs differential coding, in which the exponents
for a channel are differentially coded across frequency. The first exponent of a fbw or lfe
channel is always sent as a 4-bit absolute value, ranging from 0-15. The value indicates the
number of leading zeros of the first (DC term) transform coefficient. Successive (going
higher in frequency) exponents are sent as differential values which must be added to the
prior exponent value in order to form the next absolute value.
The differential exponents are combined into groups in the audio block. The
grouping is done by one of three methods, D15, D25, or D45, which are referred to as
exponent strategies. The number of grouped differential exponents placed in the audio
block for a particular channel depends on the exponent strategy and on the frequency
bandwidth information for that channel. The number of exponents in each group depends
only on the exponent strategy.
An AC-3 audio block contains two types of fields with exponent information. The
first type defines the exponent coding strategy for each channel, and the second type
contains the actual coded exponents for channels requiring new exponents. For
independent channels, frequency bandwidth information is included along with the
exponent strategy fields. For coupled channels, and the coupling channel, the frequency
information is found in the coupling strategy fields.
— 45 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
7.1.2 Exponent strategy
Exponent strategy information for every channel is included in every AC-3 audio
block. Information is never shared across frames, so block 0 will always contain a strategy
indication (D15, D25, or D45) for each channel. Blocks 1 through 5 may indicate reuse of
the prior (within the same frame) exponents. The three exponent coding strategies provide
a tradeoff between data rate required for exponents, and their frequency resolution. The
D15 mode provides the finest frequency resolution, and the D45 mode requires the least
amount of data. In all three modes, a number differential exponents are combined into 7-
bit words when coded into an audio block. The main difference between the modes is how
many differential exponents are combined together.
The absolute exponents found in the bit stream at the beginning of the differentially
coded exponent sets are sent as 4-bit values which have been limited in either range or
resolution in order to save one bit. For fbw and lfe channels, the initial 4-bit absolute
exponent represents a value from 0 to 15. Exponent values larger than 15 are limited to a
value of 15. For the coupled channel, the 5-bit absolute exponent is limited to even values,
and the lsb is not transmitted. The resolution has been limited to valid values of 0,2,4...24.
Each differential exponent can take on one of five values: -2, -1, 0, +1, +2. This allows
deltas of up to ±2 (± 12 dB) between exponents. These five values are mapped into the
values 0, 1, 2, 3, 4 before being grouped, as shown in Table 7.1.
Table 7.1 Mapping of Differential Exponent Values, D15 Mode
diff exp mapped value
+2 4
+1 3
0 2
-1 1
-2 0
Mapped Value = Diff Exp + 2 ;
Diff Exp = Mapped Value - 2 ;
In the D15 mode, the above mapping is applied to each individual differential
exponent for coding into the bit stream. In the D25 mode, each pair of differential
exponents is represented by a single mapped value in the bit stream. In this mode the
second differential exponent of each pair is implied as a delta of 0 from the first element of
the pair as indicated in Table 7.2.
Table 7.2 Mapping of Differential Exponent Values, D25 Mode
diff exp n diff exp n+1 mapped value
+2 0 4
+1 0 3
0 0 2
-1 0 1
-2 0 0
— 46 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
The D45 mode is similar to the D25 mode except that quads of differential
exponents are represented by a single mapped value, as indicated by Table 7.3.
Table 7.3 Mapping of Differential Exponent Values, D45 Mode
diff exp n diff exp n+1 diff exp n+2 diff exp n+3 mapped value
+2 0 0 0 4
+1 0 0 0 3
0 0 0 0 2
-1 0 0 0 1
-2 0 0 0 0
Since a single exponent is effectively shared by 2 or 4 different mantissas, encoders
must ensure that the exponent chosen for the pair or quad is the minimum absolute value
(corresponding to the largest exponent) needed to represent all the mantissas.
For all modes, sets of three adjacent (in frequency) mapped values (M1,M2 and
M3) are grouped together and coded as a 7 bit value according to the following formula:
Coded 7 bit Grouped Value = (25 × M1) + (5 × M2) + M3 ;
The exponent field for a given channel in an AC-3 audio block consists of a single
absolute exponent followed by a number of these grouped values.
7.1.3 Exponent decoding
The exponent strategy for each coupled and independent channel is included in a
set of 2-bit fields designated chexpstr[ch] . When the coupling channel is present, a cplexpstr
strategy code is also included. Table 7.4 shows the mapping from exponent strategy code
into exponent strategy.
Table 7.4 Exponent Strategy Coding
chexpstr[ch], cplexpstr exponent strategy exponents per group
‘00’ reuse prior exponents 0
‘01’ D15 3
‘10’ D25 6
‘11’ D45 12
When the low frequency effects channel is enabled the lfeexpstr field is present. It is
decoded as shown in Table 7.5.
Table 7.5 LFE Channel Exponent Strategy Coding
lfeexpstr exponent strategy exponents per group
‘0’ reuse prior exponents 0
‘1’ D15 3
— 47 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Following the exponent strategy fields in the bit stream is a set of channel
bandwidth codes, chbwcod[ch] . These are only present for independent channels (channels
not in coupling) that have new exponents in the current block. The channel bandwidth
code defines the end mantissa bin number for that channel according to the following:
endmant[ch] = ((chbwcod[ch] + 12) * 3) + 37 ; /* (ch is not coupled) */
For coupled channels the end mantissa bin number is defined by the starting bin
number of the coupling channel:
endmant[ch] = cplstrtmant ; /* (ch is coupled) */
where cplstrtmant is as derived below. By definition the starting mantissa bin number for
independent and coupled channels is 0.
strtmant[ch] = 0 ;
For the coupling channel, the frequency bandwidth information is derived from the
fields cplbegf and cplendf found in the coupling strategy information. The coupling channel
starting and ending mantissa bins are defined as:
cplstrtmant = (cplbegf * 12) + 37 ;
cplendmant = ((cplendf + 3) * 12) + 37 ;
The low frequency effects channel, when present, always starts in bin 0 and always
has the same number of mantissas:
lfestrtmant = 0 ;
lfeendmant = 7 ;
The second set of fields contains coded exponents for all channels indicated to
have new exponents in the current block. These fields are designated as exps[ch][grp] for
independent and coupled channels, cplexps[grp] for the coupling channel, and lfeexps[grp] for
the low frequency effects channel. The first element of the exps fields (exps[ch][0] ) and the
lfeexps field (lfeexps[0] ) is always a 4-bit absolute number. For these channels the absolute
exponent always contains the exponent value of the first transform coefficient (bin #0).
These 4 bit values correspond to a 5-bit exponent which has been limited in range (0 to
15, instead of 0 to 24), i.e., the most significant bit is zero. The absolute exponent for the
coupled channel, cplabsexp, is only used as a reference to begin decoding the differential
exponents for the coupling channel (i.e. it does not represent an actual exponent). The
cplabsexp is contained in the audio block as a 4-bit value, however it corresponds to a 5-bit
value. The LSB of the coupled channel initial exponent is always 0, so the decoder must
take the 4-bit value which was sent, and double it (left shift by 1) in order to obtain the 5-
bit starting value.
For each coded exponent set the number of grouped exponents (not including the
first absolute exponent) to decode from the bit stream is derived as follows:
For independent and coupled channels:
nchgrps[ch] = truncate ((endmant[ch] - 1) / 3) ; /* for D15 mode */
= truncate ((endmant[ch] - 1 + 3) / 6) ; /* for D25 mode */
= truncate ((endmant[ch] - 1 + 9) / 12) ; /* for D45 mode */
For the coupling channel:
— 48 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
ncplgrps = (cplendmant - cplstrtmant) / 3 ; /* for D15 mode */
= (cplendmant - cplstrtmant) / 6 ; /* for D25 mode */
= (cplendmant - cplstrtmant) / 12 ; /* for D45 mode */
For the low frequency effects channel:
nlfegrps =2;
Decoding a set of coded grouped exponents will create a set of 5-bit absolute
exponents. The exponents are decoded as follows:
1. Each 7 bit grouping of mapped values (gexp) is decoded using the inverse of the
encoding procedure:
M1 = truncate (gexp / 25) ;
M2 = truncate ((gexp % 25) / 5) ;
M3 = (gexp % 25) % 5 ;
4. Each mapped value is converted to a differential exponent (dexp) by subtracting the
mapping offset:
dexp = M - 2 ;
3. The set of differential exponents if converted to absolute exponents by adding each
differential exponent to the absolute exponent of the previous frequency bin:
exp[n] = exp[n-1] + dexp[n] ;
4. For the D25 and D45 modes each absolute exponent is copied to the remaining
members of the pair or quad.
The above procedure can be summarized as follows:
Pseudo code
/* unpack the mapped values */
for (grp = 0; grp < ngrps; grp++)
{
expacc = gexp[grp] ;
dexp[grp * 3] = truncate (expacc / 25) ;
expacc = expacc - ( 25 * dexp[grp * 3]) ;
dexp[(grp * 3) + 1] = truncate ( expacc / 5) ;
expacc = expacc - (5 * dexp[(grp * 3) + 1]) ;
dexp[(grp * 3) + 2] = expacc ;
}
/* unbiased mapped values */
for (grp = 0; grp < (ngrps * 3); grp++)
{
dexp[grp] = dexp[grp] - 2 ;
}
/* convert from differentials to absolutes */
prevexp = absexp ;
for (i = 0; i < (ngrps * 3); i++)
{
aexp[i] = prevexp + dexp[i] ;
prevexp = aexp[i] ;
}
— 49 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
/* expand to full absolute exponent array, using grpsize */
exp[0] = absexp ;
for (i = 0; i < (ngrps * 3); i++)
{
for (j = 0; j < grpsize; j++)
{
exp[(i * grpsize) + j +1] = aexp[i] ;
}
}
where,
ngrps = number of grouped exponents (nchgrps[ch], ncplgrps, or nlfegrps)
grpsize = 1 for D15
= 2 for D25
= 4 for D45
absexp = absolute exponent (exps[ch][0], (cplabsexp<<1), or lfeexps[0])
For the coupling channel the above output array, exp[n], should be offset to
correspond to the coupling start mantissa bin:
cplexp[n + cplstrtmant] = exp[n + 1] ;
For the remaining channels exp[n] will correspond directly to the absolute exponent
array for that channel.
7.2 Bit allocation
7.2.1 Overview
The bit allocation routine analyzes the spectral envelope of the audio signal being
coded with respect to masking effects to determine the number of bits to assign to each
transform coefficient mantissa. In the encoder, the bit allocation is performed globally on
the ensemble of channels as an entity, from a common bit pool. There are no preassigned
exponent or mantissa bits, allowing the routine to flexibly allocate bits across channels,
frequencies, and audio blocks in accordance with signal demand.
The bit allocation contains a parametric model of human hearing for estimating a
noise level threshold, expressed as a function of frequency, which separates audible from
inaudible spectral components. Various parameters of the hearing model can be adjusted
by the encoder depending upon signal characteristics. For example, a prototype masking
curve is defined in terms of two piecewise continuous line segments, each with its own
slope and y-axis intercept. One of several possible slopes and intercepts is selected by the
encoder for each line segment. The encoder may iterate on one or more such parameters
until an optimal result is obtained. When all parameters used to estimate the noise level
threshold have been selected by the encoder, the final bit allocation is computed. The
model parameters are conveyed to the decoder with other side information. The decoder
executes the routine in a single pass.
The estimated noise level threshold is computed over 50 bands of nonuniform
bandwidth (an approximate 1/6 octave scale). The banding structure, defined by tables in
— 50 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
the next section, is independent of sampling frequency. The required bit allocation for each
mantissa is established by performing a table lookup based upon the difference between
the input signal power spectral density (PSD) evaluated on a fine-grain uniform frequency
scale, and the estimated noise level threshold evaluated on the coarse-grain (banded)
frequency scale. Therefore, the bit allocation result for a particular channel has spectral
granularity corresponding to the exponent strategy employed. More specifically, a
separate bit allocation will be computed for each mantissa within a D15 exponent set, each
pair of mantissas within a D25 exponent set, and each quadruple of mantissas within a
D45 exponent set.
The bit allocation must be computed in the decoder whenever the exponent
strategy (chexpstr, cplexpstr, lfeexpstr) for one or more channels does not indicate reuse, or
whenever baie, snroffste, or deltbaie = 1. Accordingly, the bit allocation can be updated at a
rate ranging from once per audio block to once per 6 audio blocks, including the integral
steps in between. A complete set of new bit allocation information is always transmitted in
audio block 0.
Since the parametric bit allocation routine must generate identical results in all
encoder and decoder implementations, each step is defined exactly in terms of fixed-point
integer operations and table lookups. Throughout the discussion below, signed two's
complement arithmetic is employed. All additions are performed with an accumulator of
14 or more bits. All intermediate results and stored values are 8-bit values.
7.2.2 Parametric bit allocation
This section describes the seven-step procedure for computing the output of the
parametric bit allocation routine in the decoder. The approach outlined here starts with a
single uncoupled or coupled exponent set and processes all the input data for each step
prior to continuing to the next one. This technique, called vertical execution, is
conceptually straightforward to describe and implement. Alternatively, the seven steps can
be executed horizontally, in which case multiple passes through all seven steps are made
for separate subsets of the input exponent set.
The choice of vertical vs. horizontal execution depends upon the relative
importance of execution time vs. memory usage in the final implementation. Vertical
execution of the algorithm is usually faster due to reduced looping and context save
overhead. However, horizontal execution requires less RAM to store the temporary arrays
generated in each step. Hybrid horizontal/vertical implementation approaches are also
possible which combine the benefits of both techniques.
[Link] Initialization
Compute start/end frequencies for the channel being decoded. These are computed
from parameters in the bit stream as follows:
— 51 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
/* for fbw channels */
for(ch=0; ch<nfchans; ch++)
{
strtmant[ch] = 0;
if(chincpl[ch]) endmant[ch] = 37 + (12 × cplbegf) ; /* channel is coupled */
else endmant[ch] = 37 + (3 × (chbwcod + 12)) ; /* channel is not coupled */
}
/* for coupling channel */
cplstrtmant = 37 + (12 × cplbegf) ;
cplendmant = 37 + (12 × (cplendf + 3)) ;
/* for lfe channel */
lfestartmant = 0 ;
lfeendmant = 7 ;
Special case processing step:
Before continuing with the initialization procedure, all SNR offset parameters from
the bit stream should be evaluated. These include csnroffst , fsnroffst[ch] , cplfsnroffst , and
lfefsnroffst . If they are all found to be equal to zero, then all elements of the bit allocation
pointer array bap[] should be set to zero, and no other bit allocation processing is required
for the current audio block.
Perform table lookups to determine the values of sdecay, fdecay, sgain, dbknee, and
floor from parameters in the bit stream as follows:
Pseudo code
sdecay = slowdec[sdcycod] ; /* Table 7.6 */
fdecay = fastdec[fdcycod] ; /* Table 7.7 */
sgain = slowgain[sgaincod] ; /* Table 7.8 */
dbknee = dbpbtab[dbpbcod] ; /* Table 7.9 */
floor = floortab[floorcod] ; /* Table 7.10 */
Initialize as follows for uncoupled portion of fbw channel:
Pseudo code
start = strtmant[ch] ;
end = endmant[ch] ;
lowcomp = 0 ;
fgain = fastgain[fgaincod[ch]] ; /* Table 7.11 */
snroffset[ch] = ((csnroffst − 15) << 4 + fsnroffst[ch]) << 2 ;
Initialize as follows for coupling channel:
— 52 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
start = cplstrtmant ;
end = cplendmant ;
fgain = fastgain[cplfgaincod] ; /* Table 7.11 */
snroffset = ((csnroffst − 15) << 4 + cplfsnroffst) << 2 ;
if (cplleake)
{
fastleak = (cplfleak << 8) + 768 ;
slowleak = (cplsleak << 8) + 768 ;
}
Initialize as follows for lfe channel:
Pseudo code
start = lfestrtmant ;
end = lfeendmant ;
lowcomp = 0 ;
fgain = fastgain[lfefgaincod] ;
snroffset = ((csnroffst - 15) << 4 + lfefsnroffst) << 2 ;
[Link] Exponent mapping into PSD
This step maps decoded exponents into a 13-bit signed log power-spectral density
function.
Pseudo code
for (bin=start; bin<end; bin++)
{
psd[bin] = (3072 - (exp[bin] << 7)) ;
}
Since exp[k] assumes integral values ranging from 0 to 24, the dynamic range of the
psd[] values is from 0 (for the lowest-level signal) to 3072 for the highest-level signal. The
resulting function is represented on a fine-grain, linear frequency scale.
[Link] PSD integration
This step of the algorithm integrates fine-grain PSD values within each of a
multiplicity of 1/6th octave bands. Table 7.12 contains the 50 array values for bndtab[] and
bndsz. The bndtab[] array gives the first mantissa number in each band. The bndsz[] array
provides the width of each band in number of included mantissas. Table 7.13 contains the
256 array values for masktab[] , showing the mapping from mantissa number into the
associated 1/6 octave band number. These two tables contain duplicate information, all of
which need not be available in an actual implementation. They are shown here for
simplicity of presentation only.
The integration of PSD values in each band is performed with log-addition. The
log-addition is implemented by computing the difference between the two operands and
using the absolute difference divided by 2 as an address into a length 256 lookup table,
latab[] , shown in Table 7.14.
— 53 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
j = start ;
k = masktab[start] ;
do
{
bndpsd[k] = psd[j] ;
j++ ;
for (i = j; i < min(bndtab[k+1], end); i++)
{
bndpsd[k] = logadd(bndpsd[k], psd[j]) ;
j++ ;
}
k++ ;
}
while (end > bndtab[k++]) ;
logadd(a, b)
{
c=a−b;
address = min((abs(c) >> 1), 255) ;
if (c >= 0)
{
return(a + latab(address)) ;
}
else
{
return(b + latab(address)) ;
}
}
[Link] Compute excitation function
The excitation function is computed by applying the prototype masking curve
selected by the encoder (and transmitted to the decoder) to the integrated PSD spectrum
(bndpsd[]). The result of this computation is then offset downward in amplitude by the fgain
and sgain parameters, which are also obtained from the bit stream.
Pseudo code
bndstrt = masktab[start] ;
bndend = masktab[end - 1] + 1 ;
if (bndstrt == 0) /* For fbw and lfe channels */
{ /* Note: Do not call calc_lowcomp() for the last band of the lfe channel, (bin = 6) */
lowcomp = calc_lowcomp(lowcomp, bndpsd[0], bndpsd[1], 0) ;
excite[0] = bndpsd[0] - fgain - lowcomp ;
lowcomp = calc_lowcomp(lowcomp, bndpsd[1], bndpsd[2], 1) ;
excite[1] = bndpsd[1] - fgain - lowcomp ;
begin = 7 ;
for (bin = 2; bin < 7; bin++)
{
lowcomp = calc_lowcomp(lowcomp, bndpsd[bin], bndpsd[bin+1], bin) ;
fastleak = bndpsd[bin] - fgain ;
slowleak = bndpsd[bin] - sgain ;
excite[bin] = fastleak - lowcomp ;
if (bndpsd[bin] <= bndpsd[bin+1])
{
— 54 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
begin = bin + 1 ;
break ;
}
}
for (bin = begin; bin < min(bndend, 22); bin++)
{
lowcomp = calc_lowcomp(lowcomp, bndpsd[bin], bndpsd[bin+1], bin) ;
fastleak -= fdecay ;
fastleak = max(fastleak, bndpsd[bin] - fgain) ;
slowleak -= sdecay ;
slowleak = max(slowleak, bndpsd[bin] - sgain) ;
excite[bin] = max(fastleak - lowcomp, slowleak) ;
}
begin = 22 ;
}
else /* For coupling channel */
{
begin = bndstrt ;
}
for (bin = begin; bin < bndend; bin++)
{
fastleak -= fdecay ;
fastleak = max(fastleak, bndpsd[bin] - fgain) ;
slowleak -= sdecay ;
slowleak = max(slowleak, bndpsd[bin] - sgain) ;
excite[bin] = max(fastleak, slowleak) ;
}
calc_lowcomp(a, b0, b1, bin)
{
if (bin < 7)
{
if ((b0 + 256) == b1) ;
{
a = 384 ;
}
else if (b0 > b1)
{
a = max(0, a - 64) ;
}
}
else if (bin < 20)
{
if ((b0 + 256) == b1)
{
a = 320 ;
}
else if (b0 > b1)
{
a = max(0, a - 64) ;
}
}
— 55 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
else
{
a = max(0, a - 128) ;
}
return(a) ;
}
[Link] Compute masking curve
This step computes the masking (noise level threshold) curve from the excitation
function, as shown below. The hearing threshold hth[][] is shown in Table 7.15. The fscod
and dbpbcod variables are received by the decoder in the bit stream.
Pseudo code
for (bin = bndstrt; bin < bndend; bin++)
{
if (bndpsd[bin] < dbknee)
{
excite[bin] += ((dbknee - bndpsd[bin]) >> 2) ;
}
mask[bin] = max(excite[bin], hth[fscod][bin]) ;
}
[Link] Apply delta bit allocation
The optional delta bit allocation information in the bit stream provides a means for
the encoder to transmit side information to the decoder which directly increases or
decreases the masking curve obtained by the parametric routine. Delta bit allocation can
be enabled by the encoder for audio blocks which derive an improvement in audio quality
when the default bit allocation is appropriately modified. The delta bit allocation option is
available for each fbw channel and the coupling channel.
In the event that delta bit allocation is not being used, and no dba information is
included in the bit stream, the decoder must not modify the default allocation. One way to
insure this is to initialize the cpldeltnseg and deltnseg[ch] delta bit allocation variables to 0 at
the beginning of each frame. This makes the dba processing (shown below) to immediately
terminate, unless dba information (including cpldeltnseg and deltnseg[ch] ) is included in the
bit stream.
The dba information which modifies the decoder bit allocation are transmitted as
side information. The allocation modifications occur in the form of adjustments to the
default masking curve computed in the decoder. Adjustments can be made in multiples of
±6 dB. On the average, a masking curve adjustment of -6 dB corresponds to an increase
of 1 bit of resolution for all the mantissas in the affected 1/6th octave band. The following
code indicates, for a single channel, how the modification is performed. The modification
calculation is performed on the coupling channel (where deltnseg below equals cpldeltnseg )
and on each fbw channel (wheredeltnseg equals deltnseg[ch]).
— 56 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
if ((deltbae == 0) || (deltbae == 1))
{
band = 0 ;
for (seg = 0; seg < deltnseg+1; seg++)
{
band += deltoffst[seg] ;
if (deltba[seg] >= 4)
{
delta = (deltba[seg] - 3) << 7 ;
}
else
{
delta = (deltba[seg] - 4) << 7 ;
}
for (k = 0; k < deltlen[seg]; k++)
{
mask[band] += delta ;
band++ ;
}
}
}
[Link] Compute bit allocation
The bit allocation pointer array (bap[]) is computed in this step. The masking curve,
adjusted by snroffset in an earlier step and then truncated, is subtracted from the fine-grain
psd[] array. The difference is right-shifted by 5 bits, thresholded, and then used as an
address into baptab[] to obtain the final allocation. Thebaptab[] array is shown in Table 7.16.
The sum of all channel mantissa allocations in one frame is constrained by the
encoder to be less than or equal to the total number of mantissa bits available for that
frame. The encoder accomplishes this by iterating on the values of csnroffst and fsnroffst (or
cplfsnroffst or lfefsnroffst for the coupling and low frequency effects channels) to obtain an
appropriate result. The decoder is guaranteed to receive a mantissa allocation which meets
the constraints of a fixed transmission bit-rate.
At the end of this step, the bap[] array contains a series of 4-bit pointers. The
pointers indicate how many bits are assigned to each mantissa. The correspondence
between bap pointer value and quantization accuracy is shown in Table 7.17.
Pseudo code
i = start ;
j = masktab[start] ;
do
{
mask[j] -= snroffset ;
mask[j] -= floor ;
if (mask[j] < 0)
{
mask[j] = 0 ;
}
mask[j] &= 0x1fe0 ;
— 57 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
mask[j] += floor ;
for (k = i; k < min(bndtab[j] + bndsz[j], end); k++)
{
address = (psd[i] - mask[j]) >> 5 ;
address = min(63, max(0, address)) ;
bap[i] = baptab[address] ;
i++ ;
}
}
while (end > bndtab[j++]) ;
7.2.3 Bit allocation tables
Table 7.6 Slow Decay Table, slowdec[]
address slowdec[address]
0 0x0f
1 0x11
2 0x13
3 0x15
Table 7.7 Fast Decay Table, fastdec[]
address fastdec[address]
0 0x3f
1 0x53
2 0x67
3 0x7b
Table 7.8 Slow Gain Table, slowgain[]
address slowgain[address]
0 0x540
1 0x4d8
2 0x478
3 0x410
Table 7.9 dB/Bit Table, dbpbtab[]
address dbpbtab[address]
0 0x000
1 0x700
2 0x900
3 0xb00
— 58 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.10 Floor Table, floortab[]
address floortab[address]
0 0x2f0
1 0x2b0
2 0x270
3 0x230
4 0x1f0
5 0x170
6 0x0f0
7 0xf800
Table 7.11 Fast Gain Table, fastgain[]
address fastgain[address]
0 0x080
1 0x100
2 0x180
3 0x200
4 0x280
5 0x300
6 0x380
7 0x400
— 59 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.12 Banding Structure Tables, bndtab[], bndsz[]
band # bndtab[band] bndsz[band] band # bndtab[band] bndsz[band]
0 0 1 25 25 1
1 1 1 26 26 1
2 2 1 27 27 1
3 3 1 28 28 3
4 4 1 29 31 3
5 5 1 30 34 3
6 6 1 31 37 3
7 7 1 32 40 3
8 8 1 33 43 3
9 9 1 34 46 3
10 10 1 35 49 6
11 11 1 36 55 6
12 12 1 37 61 6
13 13 1 38 67 6
14 14 1 39 73 6
15 15 1 40 79 6
16 16 1 41 85 12
17 17 1 42 97 12
18 18 1 43 109 12
19 19 1 44 121 12
20 20 1 45 133 24
21 21 1 46 157 24
22 22 1 47 181 24
23 23 1 48 205 24
24 24 1 49 229 24
— 60 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.13 Bin Number to Band Number Table,
masktab[bin], bin = (10× A) + B
B=0 B=1 B=2 B=3 B=4 B=5 B=6 B=7 B=8 B=9
A=0 0 1 2 3 4 5 6 7 8 9
A=1 10 11 12 13 14 15 16 17 18 19
A=2 20 21 22 23 24 25 26 27 28 28
A=3 28 29 29 29 30 30 30 31 31 31
A=4 32 32 32 33 33 33 34 34 34 35
A=5 35 35 35 35 35 36 36 36 36 36
A=6 36 37 37 37 37 37 37 38 38 38
A=7 38 38 38 39 39 39 39 39 39 40
A=8 40 40 40 40 40 41 41 41 41 41
A=9 41 41 41 41 41 41 41 42 42 42
A=10 42 42 42 42 42 42 42 42 42 43
A=11 43 43 43 43 43 43 43 43 43 43
A=12 43 44 44 44 44 44 44 44 44 44
A=13 44 44 44 45 45 45 45 45 45 45
A=14 45 45 45 45 45 45 45 45 45 45
A=15 45 45 45 45 45 45 45 46 46 46
A=16 46 46 46 46 46 46 46 46 46 46
A=17 46 46 46 46 46 46 46 46 46 46
A=18 46 47 47 47 47 47 47 47 47 47
A=19 47 47 47 47 47 47 47 47 47 47
A=20 47 47 47 47 47 48 48 48 48 48
A=21 48 48 48 48 48 48 48 48 48 48
A=22 48 48 48 48 48 48 48 48 48 49
A=23 49 49 49 49 49 49 49 49 49 49
A=24 49 49 49 49 49 49 49 49 49 49
A=25 49 49 49 0 0 0
— 61 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.14 Log-Addition Table, latab[val], val = (10× A) + B
B=0 B=1 B=2 B=3 B=4 B=5 B=6 B=7 B=8 B=9
A=0 0x0040 0x003f 0x003e 0x003d 0x003c 0x003b 0x003a 0x0039 0x0038 0x0037
A=1 0x0036 0x0035 0x0034 0x0034 0x0033 0x0032 0x0031 0x0030 0x002f 0x002f
A=2 0x002e 0x002d 0x002c 0x002c 0x002b 0x002a 0x0029 0x0029 0x0028 0x0027
A=3 0x0026 0x0026 0x0025 0x0024 0x0024 0x0023 0x0023 0x0022 0x0021 0x0021
A=4 0x0020 0x0020 0x001f 0x001e 0x001e 0x001d 0x001d 0x001c 0x001c 0x001b
A=5 0x001b 0x001a 0x001a 0x0019 0x0019 0x0018 0x0018 0x0017 0x0017 0x0016
A=6 0x0016 0x0015 0x0015 0x0015 0x0014 0x0014 0x0013 0x0013 0x0013 0x0012
A=7 0x0012 0x0012 0x0011 0x0011 0x0011 0x0010 0x0010 0x0010 0x000f 0x000f
A=8 0x000f 0x000e 0x000e 0x000e 0x000d 0x000d 0x000d 0x000d 0x000c 0x000c
A=9 0x000c 0x000c 0x000b 0x000b 0x000b 0x000b 0x000a 0x000a 0x000a 0x000a
A=10 0x000a 0x0009 0x0009 0x0009 0x0009 0x0009 0x0008 0x0008 0x0008 0x0008
A=11 0x0008 0x0008 0x0007 0x0007 0x0007 0x0007 0x0007 0x0007 0x0006 0x0006
A=12 0x0006 0x0006 0x0006 0x0006 0x0006 0x0006 0x0005 0x0005 0x0005 0x0005
A=13 0x0005 0x0005 0x0005 0x0005 0x0004 0x0004 0x0004 0x0004 0x0004 0x0004
A=14 0x0004 0x0004 0x0004 0x0004 0x0004 0x0003 0x0003 0x0003 0x0003 0x0003
A=15 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0003 0x0002
A=16 0x0002 0x0002 0x0002 0x0002 0x0002 0x0002 0x0002 0x0002 0x0002 0x0002
A=17 0x0002 0x0002 0x0002 0x0002 0x0002 0x0002 0x0002 0x0002 0x0001 0x0001
A=18 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001
A=19 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001
A=20 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001 0x0001
A=21 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
A=22 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
A=23 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
A=24 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
A=25 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000 0x0000
— 62 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.15 Hearing Threshold Table, hth[fscod][band]
band # hth[0][band] hth[1][band] hth[2][band] band hth[0][band] hth[1][band] hth[2][band]
(fs=48 kHz) (fs=44.1 kHz) (fs=32 kHz) number (fs=48 kHz) (fs=44.1 kHz) (fs=32 kHz)
0 0x04d0 0x04f0 0x0580 25 0x0340 0x0350 0x0380
1 0x04d0 0x04f0 0x0580 26 0x0330 0x0340 0x0380
2 0x0440 0x0460 0x04b0 27 0x0320 0x0340 0x0370
3 0x0400 0x0410 0x0450 28 0x0310 0x0320 0x0360
4 0x03e0 0x03e0 0x0420 29 0x0300 0x0310 0x0350
5 0x03c0 0x03d0 0x03f0 30 0x02f0 0x0300 0x0340
6 0x03b0 0x03c0 0x03e0 31 0x02f0 0x02f0 0x0330
7 0x03b0 0x03b0 0x03d0 32 0x02f0 0x02f0 0x0320
8 0x03a0 0x03b0 0x03c0 33 0x02f0 0x02f0 0x0310
9 0x03a0 0x03a0 0x03b0 34 0x0300 0x02f0 0x0300
10 0x03a0 0x03a0 0x03b0 35 0x0310 0x0300 0x02f0
11 0x03a0 0x03a0 0x03b0 36 0x0340 0x0320 0x02f0
12 0x03a0 0x03a0 0x03a0 37 0x0390 0x0350 0x02f0
13 0x0390 0x03a0 0x03a0 38 0x03e0 0x0390 0x0300
14 0x0390 0x0390 0x03a0 39 0x0420 0x03e0 0x0310
15 0x0390 0x0390 0x03a0 40 0x0460 0x0420 0x0330
16 0x0380 0x0390 0x03a0 41 0x0490 0x0450 0x0350
17 0x0380 0x0380 0x03a0 42 0x04a0 0x04a0 0x03c0
18 0x0370 0x0380 0x03a0 43 0x0460 0x0490 0x0410
19 0x0370 0x0380 0x03a0 44 0x0440 0x0460 0x0470
20 0x0360 0x0370 0x0390 45 0x0440 0x0440 0x04a0
21 0x0360 0x0370 0x0390 46 0x0520 0x0480 0x0460
22 0x0350 0x0360 0x0390 47 0x0800 0x0630 0x0440
23 0x0350 0x0360 0x0390 48 0x0840 0x0840 0x0450
24 0x0340 0x0350 0x0380 49 0x0840 0x0840 0x04e0
— 63 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.16 Bit Allocation Pointer Table, baptab[]
address baptab[address] address baptab[address]
0 0 32 10
1 1 33 10
2 1 34 10
3 1 35 11
4 1 36 11
5 1 37 11
6 2 38 11
7 2 39 12
8 3 40 12
9 3 41 12
10 3 42 12
11 4 43 13
12 4 44 13
13 5 45 13
14 5 46 13
15 6 47 14
16 6 48 14
17 6 49 14
18 6 50 14
19 7 51 14
20 7 52 14
21 7 53 14
22 7 54 14
23 8 55 15
24 8 56 15
25 8 57 15
26 8 58 15
27 9 59 15
28 9 60 15
29 9 61 15
30 9 62 15
31 10 63 15
— 64 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.17 Quantizer Levels and Mantissa Bits vs. bap
bap quantizer levels mantissa bits
(group bits / num in group)
0 0 0
1 3 1.67 (5/3)
2 5 2.33 (7/3)
3 7 3
4 11 3.5 (7/2)
5 15 4
6 32 5
7 64 6
8 128 7
9 256 8
10 512 9
11 1024 10
12 2048 11
13 4096 12
14 16,384 14
15 65,536 16
7.3 Quantization and decoding of mantissas
7.3.1 Overview
All mantissas are quantized to a fixed level of precision indicated by the
corresponding bap. Mantissas quantized to 15 or fewer levels use symmetric quantization.
Mantissas quantized to more than 15 levels use asymmetric quantization which is a
conventional two’s complement representation.
Some quantized mantissa values are grouped together and encoded into a common
codeword. In the case of the 3-level quantizer, 3 quantized values are grouped together
and represented by a 5-bit codeword in the data stream. In the case of the 5-level
quantizer, 3 quantized values are grouped and represented by a 7-bit codeword. For the
11-level quantizer, 2 quantized values are grouped and represented by a-bit
7 codeword.
In the encoder, each transform coefficient (which is always < 1.0) is left justified
by shifting its binary representation left the number of times indicated by its exponent (0 to
24 left shifts). The amplified coefficient is then quantized to a number of levels indicated
by the correspondingbap.
The following table indicates which quantizer to use for each bap. If a bap equals 0,
no bits are sent for the mantissa. Grouping is used for baps of 1, 2 and 4 (3, 5, and 11 level
quantizers.)
— 65 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.18 Mapping of bap to Quantizer
bap quantizer levels quantization type mantissa bits (qntztab[bap])
(group bits / num in group)
0 0 none 0
1 3 symmetric 1.67 (5/3)
2 5 symmetric 2.33 (7/3)
3 7 symmetric 3
4 11 symmetric 3.5 (7/2)
5 15 symmetric 4
6 32 asymmetric 5
7 64 asymmetric 6
8 128 asymmetric 7
9 256 asymmetric 8
10 512 asymmetric 9
11 1024 asymmetric 10
12 2048 asymmetric 11
13 4096 asymmetric 12
14 16,384 asymmetric 14
15 65,536 asymmetric 16
During the decode process, the mantissa data stream is parsed up into single
mantissas of varying length, interspersed with groups representing combined coding of
either triplets or pairs of mantissas. In the bit stream, the mantissas in each exponent set
are arranged in frequency ascending order. However, groups occur at the position of the
first mantissa contained in the group. Nothing is unpacked from the bit stream for the
subsequent mantissas in the group.
≤ bap ≤ 15)
7.3.2 Expansion of mantissas for asymmetric quantization (6
For bit allocation pointer array values, 6 ≤ bap ≤ 15, asymmetric fractional two’s
complement quantization is used. Each mantissa, along with its exponent, are the floating
point representation of a transform coefficient. The decimal point is considered to be to
the left of the MSB; therefore the mantissa word represents the range of
(1.0 - 2 -(qntztab[bap] - 1) ) to -1.0.
The mantissa number k, of length qntztab[bap[k]] , is extracted from the bit stream.
Conversion back to a fixed point representation is achieved by right shifting the mantissa
by its exponent. This process is represented by the following formula:
transform_coefficient[k] = mantissa[k] >> exponent[k] ;
No grouping is done for asymmetrically quantized mantissas.
≤ bap ≤ 5)
7.3.3 Expansion of mantissas for symmetrical quantization (1
For bap values of 1 through 5 (1 ≤ bap ≤ 5), the mantissas are represented by
coded values. The coded values are converted to standard 2’s complement fractional
binary words by a table lookup. The number of bits indicated by a mantissa’s bap are
— 66 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
extracted from the bit stream and right justified. This coded value is treated as a table
index and is used to look up the mantissa value. The resulting mantissa value is right
shifted by the corresponding exponent to generate the transform coefficient value.
transform_coefficient[k] = quantization_table[mantissa_code[k]] >> exponent[k] ;
The mapping of coded mantissa value into the actual mantissa value is shown in
tables Table 7.19 through Table 7.23.
7.3.4 Dither for zero bit mantissas (bap=0)
The AC-3 decoder uses random noise (dither) values instead of quantized values
when the number of bits allocated to a mantissa is zero (bap = 0). The use of the random
value is conditional on the value of dithflag. When the value of dithflag is 1, the random
noise value is used. When the value of dithflag is 0, a true zero value is used. There is a
dithflag variable for each channel. Dither is applied after the individual channels are
extracted from the coupling channel. In this way, the dither applied to each channel's
upper frequencies is uncorrelated.
Any reasonably random sequence may be used to generate the dither values. The
word length of the dither values is not critical. Eight bits is sufficient. The optimum scaling
for the dither words is to take a uniform distribution of values between -1 and +1, and
scale this by 0.707, resulting in a uniform distribution between +0.707 and -0.707. A
scalar of 0.75 is close enough to also be considered optimum. A scalar of 0.5 (uniform
distribution between +0.5 and -0.5) is also acceptable.
Once a dither value is assigned to a mantissa, the mantissa is right shifted
according to its exponent to generate the corresponding transform coefficient.
transform_coefficient[k] = scaled_dither_value >> exponent[k] ;
Table 7.19 bap=1 (3-Level) Quantization
mantissa code mantissa value
0 -2./3
1 0
2 2./3
Table 7.20 bap=2 (5-Level) Quantization
mantissa code mantissa value
0 -4./5
1 -2./5
2 0
3 2./5
4 4./5
— 67 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.21 bap=3 (7-Level) Quantization
mantissa code mantissa value
0 -6./7
1 -4./7
2 -2./7
3 0
4 2./7
5 4./7
6 6./7
Table 7.22 bap=4 (11-Level) Quantization
mantissa code mantissa value
0 -10./11
1 -8./11
2 -6./11
3 -4./11
4 -2./11
5 0
6 2./11
7 4./11
8 6./11
9 8./11
10 10./11
Table 7.23 bap=5 (15-Level) Quantization
mantissa code mantissa value
0 -14./15
1 -12./15
2 -10./15
3 -8./15
4 -6./15
5 -4./15
6 -2./15
7 0
8 2./15
9 4./15
10 6./15
11 8./15
12 10./15
13 12./15
14 14./15
— 68 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
7.3.5 Ungrouping of mantissas
In the case when bap = 1, 2, or 4, the coded mantissa values are compressed
further by combining 3 level words and 5 level words into separate groups representing
triplets of mantissas, and 11 level words into groups representing pairs of mantissas.
Groups are filled in the order that the mantissas are processed. If the number of mantissas
in an exponent set does not fill an integral number of groups, the groups are shared across
exponent sets. The next exponent set in the block continues filling the partial groups. If
the total number of 3 or 5 level quantized transform coefficient derived words are not each
divisible by 3, or if the 11 level words are not divisible by 2, the final groups of a block are
padded with dummy mantissas to complete the composite group. Dummies are ignored by
the decoder. Groups are extracted from the bit stream using the length derived from bap.
Three level quantized mantissas (bap = 1) are grouped into triples each of 5 bits. Five level
quantized mantissas (bap = 2) are grouped into triples each of 7 bits. Eleven level
quantized mantissas b(ap = 4) are grouped into pairs each of 7 bits.
Encoder equations
bap = 1:
group_code = 9 * mantissa_code[a] + 3 * mantissa_code[b] + m antissa_code[c] ;
bap = 2:
group_code = 25 * mantissa_code[a] + 5 * mantissa_code[b] + mantissa_code[c] ;
bap = 4:
group_code = 11 * mantissa_code[a] + mantissa_code[b] ;
Decoder equations
bap = 1:
mantissa_code[a] = truncate (group_code / 9) ;
mantissa_code[b] = truncate ((group_code % 9) / 3 ) ;
mantissa_code[c] = (group_code % 9) % 3 ;
bap = 2:
mantissa_code[a] = truncate (group_code / 25) ;
mantissa_code[b] = truncate ((group_code % 25) / 5 ) ;
mantissa_code[c] = (group_code % 25) % 5 ;
bap = 4:
mantissa_code[a] = truncate (group_code / 11) ;
mantissa_code[b] = group_code % 11 ;
where mantissa a comes before mantissa b, which comes before mantissa c
7.4 Channel coupling
7.4.1 Overview
If enabled, channel coupling is performed on encode by averaging the transform
coefficients across channels that are included in the coupling channel. Each coupled
channel has a unique set of coupling coordinates which are used to preserve the high
frequency envelopes of the original channels. The coupling process is performed above a
coupling frequency that is defined by thecplbegf value.
— 69 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
The decoder converts the coupling channel back into individual channels by
multiplying the coupled channel transform coefficient values by the coupling coordinate
for that channel and frequency sub-band. An additional processing step occurs for the 2/0
mode. If the phsflginu bit = 1 or the equivalent state is continued from a previous block,
then phase restoration bits are sent in the bit stream via phase flag bits. The phase flag bits
represent the coupling sub-bands in a frequency ascending order. If a phase flag bit = 1 for
a particular sub-band, all the right channel transform coefficients within that coupled sub-
band are negated after modification by the coupling coordinate, but before inverse
transformation.
7.4.2 Sub-band structure for coupling
Transform coefficients # 37 through # 252 are grouped into 18 sub-bands of 12
coefficients each, as shown in Table 7.24. The parameter cplbegf indicates the number of
the coupling sub-band which is the first to be included in the coupling process. Below the
frequency (or transform coefficient number) indicated by cplbegf all channels are
independently coded. Above the frequency indicated by cplbegf, channels included in the
coupling process (chincpl[ch] = 1) share the common coupling channel up to the frequency
(or tc #) indicated by cplendf. The coupling channel is coded up to the frequency (or tc #)
indicated by cplendf, which indicates the last coupling sub-band which is coded. The
parameter cplendf is interpreted by adding 2 to its value, so the last coupling sub-band
which is coded can range from 2-17.
Table 7.24 Coupling Sub-Bands
coupling low tc # high tc # lf cutoff (kHz) hf cutoff (kHz) lf cutoff (kHz) hf cutoff (kHz)
sub-band # @ fs=48 kHz @ fs=48 kHz @ fs=44.1 kHz @ fs=44.1 kHz
0 37 48 3.42 4.55 3.14 4.18
1 49 60 4.55 5.67 4.18 5.21
2 61 72 5.67 6.80 5.21 6.24
3 73 84 6.80 7.92 6.24 7.28
4 85 96 7.92 9.05 7.28 8.31
5 97 108 9.05 10.17 8.31 9.35
6 109 120 10.17 11.30 9.35 10.38
7 121 132 11.30 12.42 10.38 11.41
8 133 144 12.42 13.55 11.41 12.45
9 145 156 13.55 14.67 12.45 13.48
10 157 168 14.67 15.80 13.48 14.51
11 169 180 15.80 16.92 14.51 15.55
12 181 192 16.92 18.05 15.55 16.58
13 193 204 18.05 19.17 16.58 17.61
14 205 216 19.17 20.30 17.61 18.65
15 217 228 20.30 21.42 18.65 19.68
16 229 240 21.42 22.55 19.68 20.71
17 241 252 22.55 23.67 20.71 21.75
Note: At 32 kHz sampling rate the sub-band frequency ranges are 2/3 the values of those for 48 kHz.
— 70 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
The coupling sub-bands are combined into coupling bands for which coupling
coordinates are generated (and included in the bit stream). The coupling band structure is
indicated by cplbndstrc[sbnd] . Each bit of the cplbndstrc[] array indicates whether the sub-band
indicated by the index is combined into the previous (lower in frequency) coupling band.
Coupling bands are thus made from integral numbers of coupling sub-bands. (See clause
[Link] on page 28.)
7.4.3 Coupling coordinate format
Coupling coordinates exist for each coupling band [bnd] in each channel [ch] which
is coupled (chincp[ch] ==1). Coupling coordinates are sent in a floating point format. The
exponent is sent as a 4-bit value (cplcoexp[ch][bnd] ) indicating the number of right shifts
which should be applied to the fractional mantissa value. The mantissas are transmitted as
4-bit values (cplcomant[ch][bnd] ) which must be properly scaled before use. Mantissas are
unsigned values so a sign bit is not used. Except for the limiting case where the exponent
value = 15, the mantissa value is known to be between 0.5 and 1.0. Therefore, when the
exponent value < 15, the msb of the mantissa is always equal to ‘1’ and is not transmitted;
the next 4 bits of the mantissa are transmitted. This provides one additional bit of
resolution. When the exponent value = 15 the mantissa value is generated by dividing the
4-bit value of cplcomant by 16. When the exponent value is < 15 the mantissa value is
generated by adding 16 to the 4-bit value ofcplcomant and then dividing the sum by 32.
Coupling coordinate dynamic range is increased beyond what the 4-bit exponent
can provide by the use of a per channel 2-bit master coupling coordinate (mstrcplco[ch] )
which is used to range all of the coupling coordinates within that channel. The exponent
values for each channel are increased by 3 times the value of mstrcplco which applies to that
channel. This increases the dynamic range of the coupling coordinates by an additional 54
dB.
The following pseudo code indicates how to generate the coupling coordinate
(cplco) for each coupling band[bnd] in each channel[ch].
Pseudo code
if (cplcoexp[ch, bnd] == 15)
{
cplco_temp[ch,bnd] = cplcomant[ch,bnd] / 16 ;
}
else
{
cplco_temp[ch,bnd] = (cplcomant[ch,bnd] + 16) / 32 ;
}
cplco[ch,bnd] = cplco_temp[ch,bnd] >> (cplcoexp[ch,bnd] + 3 * mstrcplco[ch]) ;
Using the cplbndstrc[] array, the values of coupling coordinates which apply to
coupling bands are converted (by duplicating values as indicated by values of ‘1’ in
cplbandstrc[] ) to values which apply to coupling sub-bands.
Individual channel mantissas are then reconstructed from the coupled channel as
follows:
— 71 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
for(sbnd = cplbegf; sbnd < 3 + cplendf; sbnd++)
{
for (bin = 0; bin < 12; bin++)
{
chmant[ch, sbnd*12+bin+37] = cplmant[sbnd*12+bin+37] * cplco[ch, sbnd] * 8 ;
}
}
7.5 Rematrixing
7.5.1 Overview
Rematrixing in AC-3 is a channel combining technique in which sums and
differences of highly correlated channels are coded rather than the original channels
themselves. That is, rather than code and pack left and right in a two channel coder, we
construct:
left' = 0.5*(left + right) ;
right' = 0.5*(left - right) ;
The usual quantization and data packing operations are then performed on left' and
right'. Clearly, if the original stereo signal were identical in both channels (i.e. two-channel
mono), this technique will result in a left' signal that is identical to the original left and
right channels, and a right' signal that is identically zero. As a result, we can code the right'
channel with very few bits, and increase accuracy in the more important left' channel.
This technique is especially important for preserving Dolby Surround
compatibility. To see this, consider a two channel mono source signal such as that
described above. A Dolby Pro Logic decoder will try to steer all in-phase information to
the center channel, and all out-of-phase information to the surround channel. If
rematrixing is not active, the Pro Logic decoder will receive the following signals:
Received left = left + QN1 ;
Received right = right + QN2 ;
where QN1 and QN2 are independent (i.e. uncorrelated) quantization noise sequences,
which correspond to the AC-3 coding algorithm quantization, and are program dependent.
The Pro Logic decoder will then construct center and surround channels as:
center = 0.5*(left + QN1) + 0.5*(right + QN2) ;
surround = 0.5*(left + QN1) - 0.5*(right + QN2) ; /* ignoring the 90 degree phase shift */
In the case of the center channel, QN1 and QN2 add, but remain masked by the
dominant signal left + right. In the surround channel, however, left - right cancels to zero,
and the surround speakers are left to reproduce the difference in the quantization noise
sequences (QN1 - QN2).
If channel rematrixing is active, the center and surround channels will be more
easily reproduced as:
center = left' + QN1 ;
surround = right' + QN2 ;
— 72 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
In this case, the quantization noise in the surround channel QN2 is much lower in
level, and it is masked by the difference signal, right'.
7.5.2 Frequency band definitions
In AC-3, rematrixing is performed independently in separate frequency bands.
There are four bands with boundary locations dependent on coupling information. The
boundary locations are by coefficient bin number, and the corresponding rematrixing band
frequency boundaries change with sampling frequency. The tables below indicate the
rematrixing band frequencies for sampling rates of 48 kHz and 44.1 kHz. At 32 kHz
sampling rate the rematrixing band frequencies are 2/3 the values of those shown for
48 kHz.
[Link] Coupling not in use
If coupling is not in use (cplinu = 0), then there are 4 rematrixing bands, (nrematbd =
4).
Table 7.25 Rematrix Banding Table A
band # low coeff # high coeff # low freq (kHz) high freq (kHz) low freq (kHz) high freq (kHz)
fs = 48 kHz fs = 48 kHz fs = 44.1 kHz fs = 44.1 kHz
0 13 24 1.17 2.30 1.08 2.11
1 25 36 2.30 3.42 2.11 3.14
2 37 60 3.42 5.67 3.14 5.21
3 61 252 5.67 23.67 5.21 21.75
[Link] Coupling in use, cplbegf > 2
If coupling is in use (cplinu = 1), and cplbegf > 2, there are 4 rematrixing bands
(nrematbd = 4). The last (fourth) rematrixing band ends at the point where coupling begins.
Table 7.26 Rematrixing Banding Table B
band # low coeff # high coeff # low freq (kHz) high freq (kHz) low freq (kHz) high freq (kHz)
fs = 48 kHz fs = 48 kHz fs = 44.1 kHz fs = 44.1 kHz
0 13 24 1.17 2.30 1.08 2.11
1 25 36 2.30 3.42 2.11 3.14
2 37 60 3.42 5.67 3.14 5.21
3 61 A 5.67 B 5.21 C
A = 36 + cplbegf*12 B = (A+1/2)*0.09375 kHz C = (A+1/2)*0.08613 kHz
[Link] Coupling in use, 2≥ cplbegf > 0
If coupling is in use (cplinu = 1), and 2 ≥ cplbegf > 0, there are 3 rematrixing bands
(nrematbd = 3). The last (third) rematrixing band ends at the point where coupling begins.
— 73 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.27 Rematrixing Banding Table C
band # low coeff # high coeff # low freq (kHz) high freq (kHz) low freq (kHz) high freq (kHz)
fs = 48 kHz fs = 48 kHz fs = 44.1 kHz fs = 44.1 kHz
0 13 24 1.17 2.30 1.08 2.11
1 25 36 2.30 3.42 2.11 3.14
2 37 A 3.42 B 3.14 C
A = 36 + cplbegf*12 B = (A+1/2)*0.09375 kHz C = (A+1/2)*0.08613 kHz
[Link] Coupling in use, cplbegf=0
If coupling is in use (cplinu = 1), and cplbegf = 0, there are 2 rematrixing bands
(nrematbd = 2).
Table 7.28 Rematrixing Banding Table D
band # low coeff # high coeff # low freq (kHz) high freq (kHz) low freq (kHz) high freq (kHz)
fs = 48 kHz fs = 48 kHz fs = 44.1 kHz fs = 44.1 kHz
0 13 24 1.17 2.30 1.08 2.11
1 25 36 2.30 3.42 2.11 3.14
7.5.3 Encoding technique
If the 2/0 mode is selected, then rematrixing is employed by the encoder. The
squares of the transform coefficients are summed up over the previously defined
rematrixing frequency bands for the following combinations: L, R, L+R, L-R.
Pseudo code
if(minimum sum for a rematrixing sub-band n is L or R)
{
the variable rematflg[n] = 0 ;
transmitted left = input L ;
transmitted right = input R ;
}
if(minimum sum for a rematrixing sub-band n is L+R or L-R)
{
the variable rematflg[n] = 1 ;
transmitted left = 0.5 * input (L+R) ;
transmitted right = 0.5 * input (L-R) ;
}
This selection of matrix combination is done on a block by block basis. The
remaining encoder processing of the transmitted left and right channels is identical whether
or not the rematrixing flags are 0 or 1.
7.5.4 Decoding technique
For each rematrixing band, a single bit (the rematrix flag) is sent in the data stream,
indicating whether or not the two channels have been rematrixed for that band. If the bit is
— 74 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
clear, no further operation is required. If the bit is set, the AC-3 decoder performs the
following operation to restore the individual channels:
left(band n) = received left(band n) + received right(band n) ;
right(band n) = received left(band n) - received right(band n) ;
Note that if coupling is not in use, the two channels may have different
bandwidths. As such, rematrixing is only applied up to the lower bandwidth of the two
channels. Regardless of the actual bandwidth, all four rematrixing flags are sent in the data
stream (assuming the rematrixing strategy bit is set).
7.6 Dialogue normalization
The AC-3 syntax provides elements which allow the encoded bit stream to satisfy
listeners in many different situations. The dialnorm element allows for uniform reproduction
of spoken dialogue when decoding any AC -3 bit stream.
7.6.1 Overview
When audio from different sources is reproduced, the apparent loudness often
varies from source to source. The different sources of audio might be different program
segments during a broadcast (i.e. the movie vs. a commercial message); different
broadcast channels; or different media (disc vs. tape). The AC-3 coding technology solves
this problem by explicitly coding an indication of loudness into the AC-3 bit stream.
The subjective level of normal spoken dialogue is used as a reference. The 5-bit
dialogue normalization word which is contained in BSI, dialnorm , is an indication of the
subjective loudness of normal spoken dialogue compared to digital 100%. The 5-bit value
is interpreted as an unsigned integer (most significant bit transmitted first) with a range of
possible values from 1 to 31. The unsigned integer indicates the headroom in dB above the
subjective dialogue level. This value can also be interpreted as an indication of how many
dB the subjective dialogue level is below digital 100%.
The dialnorm value is not directly used by the AC-3 decoder. Rather, the value is
used by the section of the sound reproduction system responsible for setting the
reproduction volume, e.g. the system volume control. The system volume control is
generally set based on listener input as to the desired loudness, or sound pressure level
(SPL). The listener adjusts a volume control which generally directly adjusts the
reproduction system gain. With AC-3 and the dialnorm value, the reproduction system gain
becomes a function of both the listeners desired reproduction sound pressure level for
dialogue, and the dialnorm value which indicates the level of dialogue in the audio signal.
The listener is thus able to reliably set the volume level of dialogue, and the subjective
level of dialogue will remain uniform no matter which AC-3 program is decoded.
Example
The listener adjusts the volume control to 67 dB. (With AC-3 dialogue
normalization, it is possible to calibrate a system volume control directly in sound
pressure level, and the indication will be accurate for any AC-3 encoded audio source).
A high quality entertainment program is being received, and the AC-3 bit stream
— 75 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
indicates that dialogue level is 25 dB below 100% digital level. The reproduction
system automatically sets the reproduction system gain so that full scale digital signals
reproduce at a sound pressure level of 92 dB. The spoken dialogue (down 25 dB) will
thus reproduce at 67 dB SPL.
The broadcast program cuts to a commercial message, which has dialogue level at
-15 dB with respect to 100% digital level. The system level gain automatically drops,
so that digital 100% is now reproduced at 82 dB SPL. The dialogue of the commercial
(down 15 dB) reproduces at a 67 dB SPL, as desired.
In order for the dialogue normalization system to work, the dialnorm value must be
communicated from the AC-3 decoder to the system gain controller so that dialnorm can
interact with the listener adjusted volume control. If the volume control function for a
system is performed as a digital multiply inside the AC-3 decoder, then the listener
selected volume setting must be communicated into the AC-3 decoder. The listener
selected volume setting and the dialnorm value must be brought together and combined in
order to adjust the final reproduction system gain.
Adjustment of the system volume control is not an AC-3 function. The AC-3 bit
stream simply conveys useful information which allows the system volume control to be
implemented in a way which automatically removes undesirable level variations between
program sources. It is mandatory that the dialnorm value and the user selected volume
setting both be used to set the reproduction system gain.
7.7 Dynamic range compression
7.7.1 Dynamic range control; dynrng, dynrng2
The dynrng element allows the program provider to implement subjectively pleasing
dynamic range reduction for most of the intended audience, while allowing individual
members of the audience the option to experience more (or all) of the original dynamic
range.
[Link] Overview
A consistent problem in the delivery of audio programming is that different
members of the audience wish to enjoy different amounts of dynamic range. Original high
quality programming (such as feature films) are typically mixed with quite a wide dynamic
range. Using dialogue as a reference, loud sounds like explosions are often 20 dB or more
louder, and faint sounds like leaves rustling may be 50 dB quieter. In many listening
situations it is objectionable to allow the sound to become very loud, and thus the loudest
sounds must be compressed downwards in level. Similarly, in many listening situations the
very quiet sounds would be inaudible, and must be brought upwards in level to be heard.
Since most of the audience will benefit from a limited program dynamic range,
soundtracks which have been mixed with a wide dynamic range are generally compressed:
the dynamic range is reduced by bringing down the level of the loud sounds and bringing
up the level of the quiet sounds. While this satisfies the needs of much of the audience, it
removes the ability of some in the audience to experience the original sound program in its
— 76 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
intended form. The AC-3 audio coding technology solves this conflict by allowing
dynamic range control values to be placed into the AC-3 bit stream.
The dynamic range control values, dynrng, indicate a gain change to be applied in
the decoder in order to implement dynamic range compression. Each dynrng value can
indicate a gain change of ± 24 dB. The sequence of dynrng values are a compression
control signal. An AC-3 encoder (or a bit stream processor) will generate the sequence of
dynrng values. Each value is used by the AC-3 decoder to alter the gain of one or more
audio blocks. The dynrng values typically indicate gain reduction during the loudest signal
passages, and gain increases during the quiet passages. For the listener, it is desirable to
bring the loudest sounds down in level towards dialogue level, and the quiet sounds up in
level, again towards dialogue level. Sounds which are at the same loudness as the normal
spoken dialogue will typically not have their gain changed.
The compression is actually applied to the audio in the AC-3 decoder. The
encoded audio has full dynamic range. It is permissible for the AC-3 decoder to
(optionally, under listener control) ignore the dynrng values in the bit stream. This will
result in the full dynamic range of the audio being reproduced. It is also permissible (again
under listener control) for the decoder to use some fraction of the dynrng control value,
and to use a different fraction of positive or negative values. The AC-3 decoder can thus
reproduce either fully compressed audio (as intended by the compression control circuit in
the AC-3 encoder); full dynamic range audio; or audio with partially compressed dynamic
range, with different amounts of compression for high level signals and low level signals.
Example
A feature film soundtrack is encoded into AC-3. The original program mix has
dialogue level at -25 dB. Explosions reach full scale peak level of 0 dB. Some quiet
sounds which are intended to be heard by all listeners are 50 dB below dialogue level
(or -75 dB). A compression control signal (sequence of dynrng values) is generated by
the AC-3 encoder. During those portions of the audio program where the audio level
is higher than dialogue level the dynrng values indicate negative gain, or gain reduction.
For full scale 0 dB signals (the loudest explosions), gain reduction of -15 dB is
encoded into dynrng. For very quiet signals, a gain increase of 20 dB is encoded into
dynrng.
A listener wishes to reproduce this soundtrack quietly so as not to disturb anyone,
but wishes to hear all of the intended program content. The AC-3 decoder is allowed
to reproduce the default, which is full compression. The listener adjusts dialogue level
to 60 dB SPL. The explosions will only go as loud as 70 dB (they are 25 dB louder
than dialogue but get -15 dB of gain applied), and the quiet sounds will reproduce at
30 dB SPL (20 dB of gain is applied to their original level of 50 dB below dialogue
level). The reproduced dynamic range will be 70 dB - 30 dB = 40 dB.
The listening situation changes, and the listener now wishes to raise the
reproduction level of dialogue to 70 dB SPL, but still wishes to limit how loud the
program plays. Quiet sounds may be allowed to play as quietly as before. The listener
instructs the AC-3 decoder to continue using the dynrng values which indicate gain
reduction, but to attenuate the values which indicate gain increases by a factor of ½.
— 77 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
The explosions will still reproduce 10 dB above dialogue level, which is now 80 dB
SPL. The quiet sounds are now increased in level by 20 dB / 2 = 10 dB. They will now
be reproduced 40 dB below dialogue level, at 30 dB SPL. The reproduced dynamic
range is now 80dB - 30 dB = 50 dB.
Another listener wishes the full original dynamic range of the audio. This listener
adjusts the reproduced dialogue level to 75 dB SPL, and instructs the AC-3 decoder to
ignore the dynamic range control signal. For this listener the quiet sounds reproduce at
25 dB SPL, and the explosions hit 100 dB SPL. The reproduced dynamic range is
100 dB - 25 dB = 75 dB. This reproduction is exactly as intended by the original
program producer.
In order for this dynamic range control method to be effective, it should be used by
all program providers. Since all broadcasters wish to supply programming in the form that
is most usable by their audience, nearly all broadcasters will apply dynamic range
compression to any audio program which has a wide dynamic range. This compression is
not reversible unless it is implemented by the technique embedded in AC-3. If broadcasters
make use of the embedded AC-3 dynamic range control system, then listeners can have
some control over their reproduced dynamic range. Broadcasters must be confident that
the compression characteristic that they introduce into AC-3 will, by default, be heard by
the listeners. Therefore, the AC-3 decoder shall, by default, implement the compression
characteristic indicated by the dynrng values in the data stream. AC-3 decoders may
optionally allow listener control over the use of the dynrng values, so that the listener may
select full or partial dynamic range reproduction.
[Link] Detailed implementation
The dynrng field in the AC-3 data stream is 8-bits in length. In the case that acmod
= 0 (1+1 mode, or 2 completely independent channels) dynrng applies to the first channel
(Ch1), and dynrng2 applies to the second channel (Ch2). While dynrng is described below,
dynrng2 is handled identically. The dynrng value may be present in any audio block. When
the value is not present, the value from the previous block is used, except for block 0. In
the case of block 0, if a new value of dynrng is not present, then a value of ‘0000 0000’
should be used. The most significant bit of dynrng (and of dynrng2) is transmitted first. The
first three bits indicate gain changes in 6.02 dB increments which can be implemented with
an arithmetic shift operation. The following five bits indicate linear gain changes, and
require a 6-bit multiply. We will represent the 3 and 5 bit fields dynrng
of as following:
X0 X1 X2 . Y3 Y4 Y5 Y6 Y7
The meaning of the X values is most simply described by considering X to represent a 3-
bit signed integer with values from -4 to 3. The gain indicated by X is then (X+1) * 6.02
dB. The following table shows this in detail.
— 78 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.29 Meaning of 3 msb of dynrng
X0 X1 X2 Integer Value Gain Indicated Arithmetic Shifts
0 1 1 3 +24.08 dB 4 left
0 1 0 2 +18.06 dB 3 left
0 0 1 1 +12.04 dB 2 left
0 0 0 0 +6.02 dB 1 left
1 1 1 -1 0 dB none
1 1 0 -2 -6.02 dB 1 right
1 0 1 -3 -12.04 dB 2 right
1 0 0 -4 -18.06 dB 3 right
The value of Y is a linear representation of a gain change of up to 6 dB. Y is
considered to be an unsigned fractional integer, with a leading value of 1, or: 0.1Y3 Y4 Y5
Y6 Y7 (base 2). Y can represent values between 0.1111112 (or 63/64) and 0.1000002 (or
1/2). Thus, Y can represent gain changes from -0.14 dB to -6.02 dB.
The combination of X and Y values allows dynrng to indicate gain changes from
24.08 - 0.14 = +23.94 dB, to -18.06 - 6 = -24.06 dB. The bit code of ‘0000 0000’
indicates 0 dB (unity) gain.
Partial Compression
The dynrng value may be operated on in order to make it represent a gain change
which is a fraction of the original value. In order to alter the amount of compression which
will be applied, consider thedynrng to represent a signed fractional number, or:
X0 . X1 X2 Y3 Y4 Y5 Y6 Y7
where X0 is the sign bit and X1 X2 Y3 Y4 Y5 Y6 Y7 are a 7-bit fraction. This 8 bit signed
fractional number may be multiplied by a fraction indicating the fraction of the original
compression to apply. If this value is multiplied by 1/2, then the compression range of ± 24
dB will be reduced to ±12 dB. After the multiplicative scaling, the 8-bit result is once
again considered to be of the original form 0XX1 X2 . Y3 Y4 Y5 Y6 Y7 and used normally.
7.7.2 Heavy compression; compr, compr2
The compr element allows the program provider (or broadcaster) to implement a
large dynamic range reduction (heavy compression) in a way which assures that a
monophonic downmix will not exceed a certain peak level. The heavily compressed audio
program may be desirable for certain listening situations such as movie delivery to a hotel
room, or to an airline seat. The peak level limitation is useful when, for instance, a
monophonic downmix will feed an RF modulator and overmodulation must be avoided.
[Link] Overview
Some products which decode the AC-3 bit stream will need to deliver the resulting
audio via a link with very restricted dynamic range. One example is the case of a television
signal decoder which must modulate the received picture and sound onto an RF channel in
order to deliver a signal usable by a low cost television receiver. In this situation, it is
— 79 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
necessary to restrict the maximum peak output level to a known value with respect to
dialogue level, in order to prevent overmodulation. Most of the time, the dynamic range
control signal, dynrng, will produce adequate gain reduction so that the absolute peak level
will be constrained. However, since the dynamic range control system is intended to
implement a subjectively pleasing reduction in the range of perceived loudness, there is no
assurance that it will control instantaneous signal peaks adequately to prevent
overmodulation.
In order to allow the decoded AC-3 signal to be constrained in peak level, a
second control signal, compr, (compr2 for Ch2 in 1+1 mode) may be present in the AC-3
data stream. This control signal should be present in all bit streams which are intended to
be receivable by, for instance, a television set top decoder. The compr control signal is
similar to the dynrng control signal in that it is used by the decoder to alter the reproduced
audio level. The compr control signal has twice the control range as dynrng (±48 dB
compared to ±24 dB) with 1/2 the resolution (0.5 dB vs. 0.25 dB). Also, since the compr
control signal lives in BSI, it only has a time resolution of an AC-3 frame (32 ms) instead
of a block (5.3 ms).
Products which require peak audio level to be constrained should use compr instead
of dynrng when compr is present in BSI. Since most of the time the use of dynrng will
prevent large peak levels, the AC-3 encoder may only need to insert compr occasionally,
i.e., during those instants when the use of dynrng would lead to excessive peak level. If the
decoder has been instructed to use compr, and compr is not present for a particular frame,
then the dynrng control signal shall be used for that frame.
In some applications of AC-3, some receivers may wish to reproduce a very
restricted dynamic range. In this case, the compr control signal may be present at all times.
Then, the use of compr instead of dynrng will allow the reproduction of audio with very
limited dynamic range. This might be useful, for instance, in the case of audio delivery to a
hotel room or an airplane seat.
[Link] Detailed implementation
The compr field in the AC-3 data stream is 8-bits in length. In the case that acmod =
0 (1+1 mode, or 2 completely independent channels) compr applies to the first channel
(Ch1), and compr2 applies to the second channel (Ch2). While compr is described below
(for Ch1), compr2 is handled identically (but for Ch2).
The most significant bit is transmitted first. The first four bits indicate gain changes
in 6.02 dB increments which can be implemented with an arithmetic shift operation. The
following four bits indicate linear gain changes, and require a 5-bit multiply. We will
represent the two 4-bit fields ofcompr as follows:
X0 X1 X2 X3 . Y4 Y5 Y6 Y7
The meaning of the X values is most simply described by considering X to
represent a 4-bit signed integer with values from -8 to +7. The gain indicated by X is then
(X+1) * 6.02 dB. The following table shows this in detail.
— 80 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.30 Meaning of 3 msb of compr
X0 X1 X2 X3 Integer Value Gain Indicated Arithmetic Shifts
0 1 1 1 7 +48.16 dB 8 left
0 1 1 0 6 +42.14 dB 7 left
0 1 0 1 5 +36.12 dB 6 left
0 1 0 0 4 +30.10 dB 5 left
0 0 1 1 3 +24.08 dB 4 left
0 0 1 0 2 +18.06 dB 3 left
0 0 0 1 1 +12.04 dB 2 left
0 0 0 0 0 +6.02 dB 1 left
1 1 1 1 -1 0 dB none
1 1 1 0 -2 -6.02 dB 1 right
1 1 0 1 -3 -12.04 dB 2 right
1 1 0 0 -4 -18.06 dB 3 right
1 0 1 1 -5 -24.08 dB 4 right
1 0 1 0 -6 -30.10 dB 5 right
1 0 0 1 -7 -36.12 dB 6 right
1 0 0 0 -8 -42.14 dB 7 right
The value of Y is a linear representation of a gain change of up to -6 dB. Y is
considered to be an unsigned fractional integer, with a leading value of 1, or: 0.1 Y4 Y5 Y6
Y7 (base 2). Y can represent values between 0.111112 (or 31/32) and 0.100002 (or 1/2).
Thus, Y can represent gain changes from -0.28 dB to -6.02 dB.
The combination of X and Y values allows compr to indicate gain changes from
48.16 - 0.28 = +47.88 dB, to-42.14 - 6 = -48.14 dB.
7.8 Downmixing
In many reproduction systems the number of loudspeakers will not match the
number of encoded audio channels. In order to reproduce the complete audio program
downmixing is required. It is important that downmixing be standardized, so that program
providers can be confident of how their program will be reproduced over systems with
various numbers of loudspeakers. With standardized downmixing equations, program
producers can monitor how the downmixed version will sound and make any alterations
necessary so that acceptable results are achieved for all listeners. The program provider
can make use of the cmixlev and smixlev syntactical elements in order to affect the relative
balance of center and surround channels with respect to the left and right channels.
Downmixing of the lfe channel is optional. An ideal downmix would have the lfe
channel reproduce at an acoustic level of +10 dB with respect to the left and right
channels. Since the inclusion of this channel is optional, any downmix coefficient may be
used in practice. Care should be taken to assure that loudspeakers are not overdriven by
the full scale low frequency content of the lfe channel.
— 81 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
7.8.1 General downmix procedure
The following pseudo code describes how to arrive at un-normalized downmix
coefficients. In a practical implementation it may be necessary to then normalize the
downmix coefficients in order to prevent any possibility of overload. Normalization is
achieved by attenuating all downmix coefficients equally, such that the sum of coefficients
used to create any single output channel never exceeds 1.
Pseudo code
downmix()
{
if (acmod == 0) /* 1+1 mode, dual independent mono channels present */
{
if (output_nfront == 1) /* 1 front loudspeaker (center) */
{
if (dualmode == Chan 1) /* Ch1 output requested */
{
route left into center ;
}
else if (dualmode == Chan 2) /* Ch2 output requested */
{
route right into center ;
}
else
{
mix left into center with -6 dB gain ;
mix right into center with -6 dB gain ;
}
}
else if (output_nfront == 2) /* 2 front loudspeakers (left, right) */
{
if (dualmode == Stereo) /* output of both mono channels requested */
{
route left into left ;
route right into right ;
}
else if (dualmode == Chan 1)
{
mix left into left with -3 dB gain ;
mix left into right with -3 dB gain ;
}
else if (dualmode == Chan 2)
{
mix right into left with -3 dB gain ;
mix right into right with -3 dB gain ;
}
else /* mono sum of both mono channels requested */
{
mix left into left with -6 dB gain ;
mix right into left with -6 dB gain ;
mix left into right with -6 dB gain ;
mix right into right with -6 dB gain ;
}
}
else /* output_nfront == 3 */
— 82 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
{
if (dualmode == Stereo)
{
route left into left ;
route right into right ;
}
else if (dualmode == Chan 1)
{
route left into center ;
}
else if (dualmode == Chan 2)
{
route right into center ;
}
else
{
mix left into center with -6 dB gain ;
mix right into center with -6 dB gain ;
}
}
}
else /* acmod > 0 */
{
for i = { left, center, right, leftsur/monosur, rightsur }
{
if (exists(input_chan[i])) and (exists(output_chan[i]))
{
route input_chan[i] into output_chan[i] ;
}
}
if (output_mode == 2/0 Dolby Surround compatible)
/* 2 ch matrix encoded output requested */
{
if (input_nfront != 2)
{
mix center into left with -3 dB gain ;
mix center into right with -3 dB gain ;
}
if (input_nrear == 1)
{
mix -mono surround into left with -3 dB gain ;
mix mono surround into right with -3 dB gain ;
}
else if (input_nrear == 2)
{
mix -left surround into left with -3 dB gain ;
mix -right surround into left with -3 dB gain ;
mix left surround into right with -3 dB gain ;
mix right surround into right with -3 dB gain ;
}
}
else if (output_mode == 1/0) /* center only */
{
if (input_nfront != 1)
{
— 83 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
mix left into center with -3 dB gain ;
mix right into center with -3 dB gain ;
}
if (input_nfront == 3)
{
mix center into center using clev and +3 dB gain ;
}
if (input_nrear == 1)
{
mix mono surround into center using slev and -3 dB gain ;
}
else if (input_nrear == 2)
{
mix left surround into center using slev and -3 dB gain ;
mix right surround into center using slev and -3 dB gain ;
}
}
else /* more than center output requested */
{
if (output_nfront == 2)
{
if (input_nfront == 1)
{
mix center into left with -3 dB gain ;
mix center into right with -3 dB gain ;
}
else if (input_nfront == 3)
{
mix center into left using clev ;
mix center into right using clev ;
}
}
if (input_nrear == 1) /* single surround channel coded */
{
if (output_nrear == 0) /* no surround loudspeakers */
{
mix mono surround into left with slev and -3 dB gain ;
mix mono surround into right with slev and -3 dB gain ;
}
else if (output_nrear == 2) /* two surround loudspeaker channels */
{
mix mono srnd into left surround with -3 dB gain ;
mix mono srnd into right surround with -3 dB gain ;
}
}
else if (input_nrear == 2) /* two surround channels encoded */
{
if (output_nrear == 0)
{
mix left surround into left using slev ;
mix right surround into right using slev ;
}
else if (output_nrear == 1) .
{
— 84 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Pseudo code
mix left srnd into mono surround with -3 dB gain ;
mix right srnd into mono surround with -3 dB gain ;
}
}
}
}
}
The actual coefficients used for downmixing will affect the absolute level of the
center channel. If dialogue level is to be established with absolute SPL calibration, this
should be taken into account.
7.8.2 Downmixing into two channels
Let L, C, R, Ls, Rs refer to the 5 discrete channels which are to be mixed down to
2 channels. In the case of a single surround channel (n/1 modes), S refers to the single
surround channel. Two types of downmix should be provided: downmix to an LtRt matrix
surround encoded stereo pair; and downmix to a conventional stereo signal, LoRo. The
downmixed stereo signal (LoRo, or LtRt) may be further mixed to mono, M, by a simple
summation of the 2 channels. If the LtRt downmix is combined to mono, the surround
information will be lost. The LoRo downmix is preferred when a mono signal is desired.
Downmix coefficients shall have relative accuracy of at least0.25
± dB.
Prior to the scaling needed to prevent overflow, the general 3/2 downmix
equations for an LoRo stereo signal are:
Lo = 1.0 * L + clev * C + slev * Ls ;
Ro = 1.0 * R + clev * C + slev * Rs ;
If LoRo are subsequently combined for monophonic reproduction, the effective
mono downmix equation becomes:
M = 1.0 * L + 2.0 * clev * C + 1.0 * R + slev * Ls + slev * Rs ;
If only a single surround channel, S, is present (3/1 mode) the downmix equations
are:
Lo = 1.0 * L + clev * C + 0.7 * slev * S ;
Ro = 1.0 * R + clev * C + 0.7 * slev * S ;
M = 1.0 * L + 2.0 * clev * C + 1.0 * R + 1.4 * slev * S ;
The values of clev and slev are indicated by the cmixlev and surmixlev bit fields in the
BSI data, as shown in Table 5.4 on page21 and Table 5.5 on page22 respectively.
If the cmixlev or surmixlev bit fields indicate the reserved state (value of ‘11’), the
decoder should use the intermediate coefficient values indicated by the bit field value of 0
1. If the Center channel is missing (2/1 or 2/2 mode), the same equations may be used
without the C term. If the surround channels are missing, the same equations may be used
without the Ls,Rs, or S terms.
Prior to the scaling needed to prevent overflow, the 3/2 downmix equations for an
LtRt stereo signal are:
— 85 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Lt = 1.0 * L + 0.707 * C - 0.707 * Ls - 0.707 * Rs ;
Rt = 1.0 * R + 0.707 * C + 0.707 * Ls + 0.707 * Rs ;
If only a single surround channel, S, is present (3/1 mode) these equati
ons become:
Lt = 1.0 L + 0.707 C - 0.707 S ;
Rt = 1.0 R + 0.707 C + 0.707 S ;
If the center channel is missing (2/2 or 2/1 mode) the C term is dropped.
The actual coefficients used must be scaled downwards so that arithmetic overflow
does not occur if all channels contributing to a downmix signal happen to be at full scale.
For each audio coding mode, a different number of channels contribute to the downmix,
and a different scaling could be used to prevent overflow. For simplicity, the scaling for
the worst case may be used in all cases. This minimizes the number of coefficients
required. The worst case scaling occurs when clev and slev are both 0.707. In the case of
the LoRo downmix, the sum of the unscaled coefficients is 1 + 0.707 + 0.707 = 2.414, so
all coefficients must be multiplied by 1 / 2.414 = 0.4143 (downwards scaling by 7.65 dB).
In the case of the LtRt downmix, the sum of the unscaled coefficients is 1 + 0.707 + 0.707
+ 0.707 = 3.121, so all coefficients must be multiplied by 1 / 3.121, or 0.3204 (downwards
scaling by 9.89 dB). The scaled coefficients will typically be converted to binary values
with limited wordlength. The 6-bit coefficients shown below have sufficient accuracy.
In order to implement the LoRo 2-channel downmix, scaled (by 0.453) coefficient
values are needed which correspond to the values of 1.0, 0.707, 0.596, 0.500, 0.354.
Table 7.31 LoRo Scaled Downmix Coefficients
Unscaled Scaled 6-bit Quantized Gain Relative Coefficient
Coefficient Coefficient Coefficient Gain Error
1.0 0.414 26/64 -7.8 dB 0.0 dB ---
0.707 0.293 18/64 -11.0 dB -3.2 dB -0.2 dB
0.596 0.247 15/64 -12.6 dB -4.8 dB +0.3 dB
0.500 0.207 13/64 -13.8 dB -6.0 dB 0.0 dB
0.354 0.147 9/64 -17.0 dB -9.2 dB -0.2 dB
In order to implement the LtRt 2-ch downmix, scaled (by 0.3204) coefficient
values are needed which correspond to the values of 1.0 and 0.707.
Table 7.32 LtRt Scaled Downmix Coefficients
Unscaled Scaled 6-bit Quantized Gain Relative Coefficient
Coefficient Coefficient Coefficient Gain Error
1.0 0.3204 20/64 -10.1 dB 0.0 dB ---
0.707 0.2265 14/64 -13.20 dB -3.1 dB -0.10 dB
If it is necessary to implement a mixdown to mono, a further scaling of 1/2 will
have to be applied to the LoRo downmix coefficients to prevent overload of the mono
sum of Lo+Ro.
— 86 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
7.9 Transform equations and block switching
7.9.1 Overview
The choice of analysis block length is fundamental to any transform-based audio
coding system. A long transform length is most suitable for input signals whose spectrum
remains stationary, or varies only slowly, with time. A long transform length provides
greater frequency resolution, and hence improved coding performance for such signals. On
the other hand, a shorter transform length, possessing greater time resolution, is more
desirable for signals which change rapidly in time. Therefore, the time vs. frequency
resolution tradeoff should be considered when selecting a transform block length.
The traditional approach to solving this dilemma is to select a single transform
length which provides the best tradeoff of coding quality for both stationary and dynamic
signals. AC-3 employs a more optimal approach, which is to adapt the frequency/time
resolution of the transform depending upon spectral and temporal characteristics of the
signal being processed. This approach is very similar to behavior known to occur in human
hearing. In transform coding, the adaptation occurs by switching the block length in a
signal dependent manner.
7.9.2 Technique
In the AC-3 transform block switching procedure, a block length of either 512 or
256 samples (time resolution of 10.7 or 5.3 ms for sampling frequency of 48 kHz) can be
employed. Normal blocks are of length 512 samples. When a normal windowed block is
transformed, the result is 256 unique frequency domain transform coefficients. Shorter
blocks are constructed by taking the usual 512 sample windowed audio segment and
splitting it into two segments containing 256 samples each. The first half of an MDCT
block is transformed separately but identically to the second half of that block. Each half
of the block produces 128 unique non-zero transform coefficients representing frequencies
from 0 to fs/2, for a total of 256. This is identical to the number of coefficients produced
by a single 512 sample block, but with two times improved temporal resolution.
Transform coefficients from the two half-blocks are interleaved together on a coefficient-
by-coefficient basis to form a single block of 256 values. This block is quantized and
transmitted identically to a single long block. A similar, mirror image procedure is applied
in the decoder during signal reconstruction.
Transform coefficients for the two 256 length transforms arrive in the decoder
interleaved together bin-by-bin. This interleaved sequence contains the same number of
transform coefficients as generated by a single 512-sample transform. The decoder
processes interleaved sequences identically to noninterleaved sequences, except during the
inverse transformation described below.
Prior to transforming the audio signal from time to frequency domain, the encoder
performs an analysis of the spectral and/or temporal nature of the input signal and selects
the appropriate block length. This analysis occurs in the encoder only, and therefore can
be upgraded and improved without altering the existing base of decoders. A one bit code
per channel per transform block (blksw[ch] ) is embedded in the bit stream which conveys
— 87 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
length information: (blksw[ch] = 0 or 1 for 512 or 256 samples, respectively). The decoder
uses this information to deformat the bit stream, reconstruct the mantissa data, and apply
the appropriate inverse transform equations.
7.9.3 Decoder implementation
TDAC transform block switching is accomplished in AC-3 by making an
adjustment to the conventional forward and inverse transformation equations for the 256
length transform. The same window and FFT sine/cosine tables used for 512 sample
blocks can be reused for inverse transforming the 256 sample blocks; however, the pre-
and post-FFT complex multiplication twiddle requires an additional 128 table values for
the block-switched transform.
Since the input and output arrays for blksw[ch] = 1 are exactly one half of the length
of those for blksw = 0, the size of the inverse transform RAM and associated buffers is the
same with block switching as without.
The adjustments required for inverse transforming the 256 sample blocks are:
1. The input array contains 128 instead of 256 coefficients.
2. The IFFT pre and post-twiddle use a different cosine table, requiring an additional 128
table values (64 cosine, 64 sine).
3. The complex IFFT employs 64 points instead of 128. The same FFT cosine table can
be used with sub-sampling to retrieve only the even numbered entries.
4. The input pointers to the IFFT post-windowing operation are initialized to different
start addresses, and operate modulo 128 instead of modulo 256.
7.9.4 Transformation equations
[Link] 512-sample IMDCT transform
The following procedure describes the technique used for computing the IMDCT
for a single N=512 length real data block using a single N/4 point complex IFFT with
simple pre- and post-twiddle operations. These are the inverse transform equations used
when the blksw flag is set to zero (indicating absence of a transient, and 512 sample
transforms).
1) Define the MDCT transform coefficients = X[k], k=0,1,...N/2-1.
2) Pre-IFFT complex multiply step.
Compute N/4-point complex multiplication product Z[k], k=0,1,...N/4-1:
Pseudo code
for(k=0; k<N/4; k++)
{
/* Z[k] = (X[N/2-2*k-1] + j * X[2*k]) * (xcos1[k] + j * xsin1[k]) ; */
Z[k]=(X[N/2-2*k-1]*xcos1[k]-X[2*k]*xsin1[k])+j*(X[2*k]*xcos1[k]+X[N/2-2*k-1]*xsin1[k]);
}
— 88 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
where
xcos1[k] = -cos(2pi * (8*k+1)/(8*N)) ;
xsin1[k] = -sin(2pi * (8*k+1)/(8*N)) ;
3) Complex IFFT step.
Compute N/4-point complex IFFT of Z(k) to generate complex-valued sequence
z(n).
Pseudo code
for(n=0; n<N/4; n++)
{
z[n] = 0 ;
for(k=0; k<N/4; k++)
{
z[n] + = Z[k] * (cos(8*pi*k*n/N) + j * sin(8*pi*k*n/N)) ;
}
}
4) Post-IFFT complex multiply step.
Compute N/4-point complex multiplication produc
t y(n), n=0,1,...N/4-1 as:
Pseudo code
for(n=0; n<N/4; n++)
{
/* y[n] = z[n] * (xcos1[n] + j * xsin1[n]) ; */
y[n] = (zr[n] * xcos1[n] - zi[n] * xsin1[n]) + j * (zi[n] * xcos1[n] + zr[n] * xsin1[n]) ;
}
where
zr[n] = real(z[n]) ;
zi[n] = imag(z[n]) ;
and xcos1[n] and xsin1[n] are as defined in step 2 above.
5) Windowing and de-interleaving step.
Compute windowed time-domain samples x[n]:
Pseudo code
for(n=0; n<N/8; n++)
{
x[2*n] = -yi[N/8+n] * w[2*n] ;
x[2*n+1] = yr[N/8-n-1] * w[2*n+1] ;
x[N/4+2*n] = -yr[n] * w[N/4+2*n] ;
x[N/4+2*n+1] = yi[N/4-n-1] * w[N/4+2*n+1] ;
x[N/2+2*n] = -yr[N/8+n] * w[N/2-2*n-1] ;
x[N/2+2*n+1] = yi[N/8-n-1] * w[N/2-2*n-2] ;
x[3*N/4+2*n] = yi[n] * w[N/4-2*n-1] ;
x[3*N/4+2*n+1] = -yr[N/4-n-1] * w[N/4-2*n-2] ;
}
where
yr[n] = real(y[n]);
yi[n] = imag(y[n]) ;
w[n] is the transform window sequence (see Table 7.33).
— 89 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
6) Overlap and add step.
The first half of the windowed block is overlapped with the second half of the
previous block to produce PCM samples (the factor of 2 scaling undoes headroom scaling
performed in the encoder):
Pseudo code
for(n=0; n<N/2; n++)
{
pcm[n] = 2 * (x[n] + delay[n]) ;
delay[n] = x[N/2+n) ;
}
Note that the arithmetic processing in the overlap/add processing must use
saturation arithmetic to prevent overflow (wraparound). Since the output signal consists
of the original signal plus coding error, it is possible for the output signal to exceed 100%
level even though the original input signal was less than or equal to 100% level.
[Link] 256-sample IMDCT transforms
The following equations should be used for computing the inverse transforms in
the case of blksw = 1, indicating the presence of a transient and two 256 sample transforms
(N below still equals 512).
1) Define the MDCT transform coefficients = X[k], k=0,1,...N/2.
Pseudo code
for(k=0; k<N/4; k++)
{
X1[k] = X[2*k] ;
X2[k] = X[2*k+1] ;
}
2) Pre-IFFT complex multiply step.
Compute N/8-point complex multiplication products Z1(k) and Z2(k),
k=0,1,...N/8-1.
Pseudo code
for(k=0; k<N/8; k++)
{
/* Z1[k] = (X1[N/4-2*k-1] + j * X1[2*k]) * (xcos2[k] + j * xsin2[k]); */
Z1[k]=(X1[N/4-2*k-1]*xcos2[k]-X1[2k]*xsin2[k])+j*(X1[2*k]*xcos2[k]+X1[N/4-2*k-1]*xsin2[k]) ;
/* Z2[k] = (X2[N/4-2*k-1] + j * X2[2*k]) * (xcos2[k] + j * xsin2[k]) ; */
Z2[k]=(X2[N/4-2*k-1]*xcos2[k]-X2[2*k]*xsin2[k])+j*(X2[2*k]*xcos2[k]+X2[N/4-2*k-1]*xsin2[k]) ;
}
where
xcos2[k] = -cos(2pi*(8*k+1)/(4*N)), xsin2(k) = -sin(2pi*(8*k+1)/(4*N))
3) Complex IFFT step.
— 90 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Compute N/8-point complex IFFTs of Z1[k] and Z2[k] to generate complex-
valued sequences z1[n] and z2[n].
Pseudo code
for(n=0; n<N/8; n++)
{
z1[n] = 0. ;
z2[n] = 0. ;
for(k=0; k<N/8; k++)
{
z1[n] + = Z1[k] * (cos(16*pi*k*n/N) + j * sin(16*pi*k*n/N)) ;
z2[n] + = Z2[k] * (cos(16*pi*k*n/N) + j * sin(16*pi*k*n/N)) ;
}
}
4) Post-IFFT complex multiply step:
Compute N/8-point complex multiplication products y1[n] and y2[n], n=0,1,...N/8-1.
Pseudo code
for(n=0; n<N/8; n++)
{
/* y1[n] = z1[n] * (xcos2[n] + j * xsin2[n]) ; */
y1[n] = (zr1[n] * xcos2[n] - zi1[n] * xsin2[n]) + j * (zi1[n] * xcos2[n] + zr1[n] * xsin2[n]) ;
/* y2[n] = z2[n] * (xcos2[n] + j * xsin2[n]) ; */
y2[n] = (zr2[n] * xcos2[n] - zi2[n] * xsin2[n]) + j * (zi2[n] * xcos2[n] + zr2[n] * xsin2[n]) ;
}
where
zr1[n] = real(z1[n]) ;
zi1[n] = imag(z1[n]) ;
zr2[n] = real(z2[n]) ;
zi2[n] = imag(z2[n]) ;
and xcos2[n] and xsin2[n] are as defined in step 2 above.
5) Windowing and de-interleaving step.
Compute windowed time-domain samples x[n
].
Pseudo code
for(n=0; n<N/8; n++)
{
x[2*n] = -yi1[n] * w[2*n] ;
x[2*n+1] = yr1[N/8-n-1] * w[2*n+1] ;
x[N/4+2*n] = -yr1[n] * w[N/4+2*n] ;
x[N/4+2*n+1] = yi1[N/8-n-1] * w[N/4+2*n+1] ;
x[N/2+2*n] = -yr2[n] * w[N/2-2*n-1] ;
x[N/2+2*n+1] = yi2[N/8-n-1] * w[N/2-2*n-2] ;
x[3N/4+2*n] = yi2[n] * w[N/4-2*n-1] ;
x[3N/4+2*n+1] = -yr2[N/8-n-1] * w[N/4-2*n-2] ;
}
where
yr1[n] = real(y1[n]) ;
yi1[n] = imag(y1[n]) ;
— 91 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
yr2[n] = real(y2[n]) ;
yi2[n] = imag(y2[n]) ;
and w[n] is the transform window sequence (see Table 7.33).
Table 7.33 Transform Window Sequence (w[addr]),
Where addr = (10 * A) + B
B=0 B=1 B=2 B=3 B=4 B=5 B=6 B=7 B=8 B=9
A=0 0.00014 0.00024 0.00037 0.00051 0.00067 0.00086 0.00107 0.00130 0.00157 0.00187
A=1 0.00220 0.00256 0.00297 0.00341 0.00390 0.00443 0.00501 0.00564 0.00632 0.00706
A=2 0.00785 0.00871 0.00962 0.01061 0.01166 0.01279 0.01399 0.01526 0.01662 0.01806
A=3 0.01959 0.02121 0.02292 0.02472 0.02662 0.02863 0.03073 0.03294 0.03527 0.03770
A=4 0.04025 0.04292 0.04571 0.04862 0.05165 0.05481 0.05810 0.06153 0.06508 0.06878
A=5 0.07261 0.07658 0.08069 0.08495 0.08935 0.09389 0.09859 0.10343 0.10842 0.11356
A=6 0.11885 0.12429 0.12988 0.13563 0.14152 0.14757 0.15376 0.16011 0.16661 0.17325
A=7 0.18005 0.18699 0.19407 0.20130 0.20867 0.21618 0.22382 0.23161 0.23952 0.24757
A=8 0.25574 0.26404 0.27246 0.28100 0.28965 0.29841 0.30729 0.31626 0.32533 0.33450
A=9 0.34376 0.35311 0.36253 0.37204 0.38161 0.39126 0.40096 0.41072 0.42054 0.43040
A=10 0.44030 0.45023 0.46020 0.47019 0.48020 0.49022 0.50025 0.51028 0.52031 0.53033
A=11 0.54033 0.55031 0.56026 0.57019 0.58007 0.58991 0.59970 0.60944 0.61912 0.62873
A=12 0.63827 0.64774 0.65713 0.66643 0.67564 0.68476 0.69377 0.70269 0.71150 0.72019
A=13 0.72877 0.73723 0.74557 0.75378 0.76186 0.76981 0.77762 0.78530 0.79283 0.80022
A=14 0.80747 0.81457 0.82151 0.82831 0.83496 0.84145 0.84779 0.85398 0.86001 0.86588
A=15 0.87160 0.87716 0.88257 0.88782 0.89291 0.89785 0.90264 0.90728 0.91176 0.91610
A=16 0.92028 0.92432 0.92822 0.93197 0.93558 0.93906 0.94240 0.94560 0.94867 0.95162
A=17 0.95444 0.95713 0.95971 0.96217 0.96451 0.96674 0.96887 0.97089 0.97281 0.97463
A=18 0.97635 0.97799 0.97953 0.98099 0.98236 0.98366 0.98488 0.98602 0.98710 0.98811
A=19 0.98905 0.98994 0.99076 0.99153 0.99225 0.99291 0.99353 0.99411 0.99464 0.99513
A=20 0.99558 0.99600 0.99639 0.99674 0.99706 0.99736 0.99763 0.99788 0.99811 0.99831
A=21 0.99850 0.99867 0.99882 0.99895 0.99908 0.99919 0.99929 0.99938 0.99946 0.99953
A=22 0.99959 0.99965 0.99969 0.99974 0.99978 0.99981 0.99984 0.99986 0.99988 0.99990
A=23 0.99992 0.99993 0.99994 0.99995 0.99996 0.99997 0.99998 0.99998 0.99998 0.99999
A=24 0.99999 0.99999 0.99999 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
A=25 1.00000 1.00000 1.00000 1.00000 1.00000 1.00000
6) Overlap and add step.
The first half of the windowed block is overlapped with the second half of the
previous block to produce PCM samples (the factor of 2 scaling undoes headroom scaling
performed in the encoder):
Pseudo code
for(n=0; n<N/2; n++)
{
pcm[n] = 2 * (x[n] + delay[n]) ;
delay[n] = x[N/2+n] ;
}
— 92 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Note that the arithmetic processing in the overlap/add processing must use
saturation arithmetic to prevent overflow (wrap around). Since the output signal consists
of the original signal plus coding error, it is possible for the output signal to exceed 100%
level even though the original input signal was less than or equal to 100% level.
7.9.5 Channel gain range code
When the signal level is low, the dynamic range of the decoded audio is typically
limited by the wordlength used in the transform computation. The use of longer
wordlength improves dynamic range but increases cost, as the wordlength of both the
arithmetic units and the working RAM must be increased. In order to allow the
wordlength of the transform computation to be reduced, the AC-3 bit stream includes a
syntactic element gainrng[ch] . This 2-bit element exists for each encoded block for each
channel.
The gainrng element is a value in the range of 0-3. The value is an indication of the
maximum sample level within the coded block. Each block represents 256 new audio
samples and 256 previous audio samples. Prior to the application of the 512 point
window, the maximum absolute value of the 512 PCM values is determined. Based on the
maximum value within the block, the value of gainrng is set as indicated below:
Maximum absolute value (max) gainrng
max ≥ 0.5 0
0.5 > max ≥ 0.25 1
0.25 > max ≥ 0.125 2
0.125 > max 3
If the encoder does not perform the step of finding the maximum absolute value
within each block then the value ofgainrng should be set to 0.
The decoder may use the value of gainrng to pre-scale the transform coefficients
prior to the transform and to post-scale the values after the transform. With careful design,
the post-scaling process can be performed right at the PCM output stage allowing a 16-bit
output buffer RAM to provide 18-bit dynamic range audio.
7.10 Error detection
There are several ways in which the AC-3 data may determine that errors are
contained within a frame of data. The decoder may be informed of that fact by the
transport system which has delivered the data. The data integrity may be checked using
the embedded CRCs. Also, some simple consistency checks on the received data can
indicate that errors are present. The decoder strategy when errors are detected is user
definable. Possible responses include muting, block repeats, or frame repeats. The amount
of error checking performed, and the behavior in the presence of errors are not specified in
this standard, but are left to the application and implementation.
— 93 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
7.10.1 CRC checking
Each AC-3 frame contains two 16-bit CRC words. crc1 is the second 16-bit word
of the frame, immediately following the sync word. crc2 is the last 16-bit word of the
frame, immediately preceding the sync word of the following frame. crc1 applies to the first
5/8 of the frame, not including the sync word. crc2 provides coverage for the last 3/8 of the
frame as well as for the entire frame (not including the sync word). Decoding of CRC
word(s) allows errors to be detected.
The following generator polynomial is used to generate each of the 16-bit CRC
words: x16 + x15 + x2 + 1.
The 5/8 of a frame is defined in Table 7.34, and may be calculated by:
5/8_framesize = truncate(framesize ÷ 2) + truncate(framesize ÷ 8) ;
or
5/8_framesize = (int) (framesize>>1) + (int) (framesize>>3) ;
where framesize is in units of 16-bit words. Table 7.34 shows the value of 5/8 of the frame
size as a function of AC-3 bit-rate and audio sample rate.
The CRC calculation may be implemented by one of several standard techniques.
A convenient hardware implementation is a linear feedback shift register (LFSR). An
example of an LFSR circuit for the above generator polynomial is the following:
b0 b1 + b2 b3 b b + b +
13 14 15
u(x)
Checking for valid CRC with the above circuit consists of resetting all registers to
zero, and then shifting the AC-3 data bits serially into the circuit in the order in which they
appear in the data stream. The sync word is not covered by either CRC (but is included in
the indicated 5/8_framesize) so it should not be included in the CRC calculation. crc1 is
considered valid if the above register contains all zeros after the first 5/8 of the frame has
been shifted in. If the calculation is continued until all data in the frame has been shifted
through, and the value is again equal to zero, then crc2 is considered valid. Some decoders
may choose to only check crc2, and not check for a valid crc1 at the 5/8 point in the frame.
If crc1 is invalid, it is possible to reset the registers to zero and then check crc2. If crc2 then
checks, then the last 3/8 of the frame is probably error free. This is of little utility however,
since if errors are present in the initial 5/8 of a frame it is not possible to decode any audio
from the frame even if the final 3/8 is error free.
Note that crc1 is generated by encoders such that the CRC calculation will produce
zero at the 5/8 point in the frame. It is not the value generated by calculating the CRC of
the first 5/8 of the frame using the above generator polynomial. Therefore, decoders
should not attempt to save crc1, calculate the CRC for the first 5/8 of the frame, and then
compare the two.
— 94 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Table 7.34 5/8_frame Size Table; Number of Words
in the First 5/8 of the Frame
frmsizecod nominal bit-rate fs = 32 kHz fs = 44.1 kHz fs = 48 kHz
5/8_framesize 5/8_framesize 5/8_framesize
‘000000’ (0) 32 kbps 60 42 40
‘000001’ (0) 32 kbps 60 43 40
‘000010’ (1) 40 kbps 75 53 50
‘000011’ (1) 40 kbps 75 55 50
‘000100’ (2) 48 kbps 90 65 60
‘000101’ (2) 48 kbps 90 65 60
‘000110’ (3) 56 kbps 105 75 70
‘000111’ (3) 56 kbps 105 76 70
‘001000’ (4) 64 kbps 120 86 80
‘001001’ (4) 64 kbps 120 87 80
‘001010’ (5) 80 kbps 150 108 100
‘001011’ (5) 80 kbps 150 108 100
‘001100’ (6) 96 kbps 180 130 120
‘001101’ (6) 96 kbps 180 130 120
‘001110’ (7) 112 kbps 210 151 140
‘001111’ (7) 112 kbps 210 152 140
‘010000’ (8) 128 kbps 240 173 160
‘010001’ (8) 128 kbps 240 173 160
‘010010’ (9) 160 kbps 300 217 200
‘010011’ (9) 160 kbps 300 217 200
‘010100’ (10) 192 kbps 360 260 240
‘010101’ (10) 192 kbps 360 261 240
‘010110’ (11) 224 kbps 420 303 280
‘010111’ (11) 224 kbps 420 305 280
‘011000’ (12) 256 kbps 480 347 320
‘011001’ (12) 256 kbps 480 348 320
‘011010’ (13) 320 kbps 600 435 400
‘011011’ (13) 320 kbps 600 435 400
‘011100’ (14) 384 kbps 720 521 480
‘011101’ (14) 384 kbps 720 522 480
‘011110’ (15) 448 kbps 840 608 560
‘011111’ (15) 448 kbps 840 610 560
‘100000’ (16) 512 kbps 960 696 640
‘100001’ (16) 512 kbps 960 696 640
‘100010’ (17) 576 kbps 1080 782 720
‘100011’ (17) 576 kbps 1080 783 720
‘100100’ (18) 640 kbps 1200 870 800
‘100101’ (18) 640 kbps 1200 871 800
— 95 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Syntactical block size restrictions within each frame (enforced by encoders),
guarantee that blocks 0 and 1 are completely covered by crc1. Therefore, decoders may
immediately begin processing block 0 when the 5/8 point in the data frame is reached. This
may allow smaller input buffers in some applications. Decoders that are able to store an
entire frame may choose to process only crc2. These decoders would not begin processing
block 0 of a frame until the entire frame is received.
7.10.2 Checking bit stream consistency
It is always possible that an AC-3 frame could have valid sync information and
valid CRCs, but otherwise be undecodable. This condition may arise if a frame is
corrupted such that the CRC word is nonetheless valid, or in the case of an encoder error
(bug). One safeguard against this is to perform some error checking tests within the AC-3
decoder and bit stream parser. Despite its coding efficiency, there are some redundancies
inherent in the AC-3 bit stream. If the AC-3 bit stream contains errors, a number of illegal
syntactical constructions are likely to arise. Performing checks for these illegal constructs
will detect a great many significant error conditions.
The following is a list of known bit stream error conditions. In some
implementations it may be important that the decoder be able to benignly deal with these
errors. Specifically, decoders may wish to ensure that these errors do not cause reserved
memory to be overwritten with invalid data, and do not cause processing delays by
looping with illegal loop counts. Invalid audio reproduction may be allowable, so long as
system stability is preserved.
1) (blknum == 0) &&
(cplstre == 0) ;
2) (cplinu == 1) &&
(no channels in coupling) ;
3) (cplinu == 1) &&
(cplbegf > (cplendf+2)) ;
4) (cplinu == 1) &&
((blknum == 0) || (previous cplinu == 0)) &&
(chincpl[n] == 1) &&
(cplcoe[n] == 0) ;
5) (blknum == 0) &&
(acmod == 2) &&
(rematstr == 0) ;
6) (cplinu == 1) &&
((blknum == 0) || (previous cplinu == 0)) &&
(cplexpstr == 0) ;
7) (cplinu == 1) &&
((cplbegf != previous cplbegf) || (cplendf != previous cplendf)) &&
(cplexpstr == 0) ;
8) (blknum == 0) &&
(chexpstr[n] == 0) ;
— 96 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
9) (cplinu == 1) &&
(cplbegf != previous cplbegf) &&
(chincpl[n] == 1) &&
(chexpstr[n] == 0) ;
10) (blknum == 0) &&
(lfeon == 1) &&
(lfeexpstr == 0) ;
11) (chincpl[n] == 0) &&
(chbwcod[n] > 60) ;
12) (blknum == 0) &&
(baie == 0) ;
13) (blknum == 0) &&
(snroffste == 0) ;
14) (blknum == 0) &&
(cplinu == 1) &&
(cplleake == 0) ;
15) (cplinu == 1) &&
(expanded length of cpl delta bit allocation > 50) ;
16) expanded length of delta bit allocation[n] > 50 ;
17) compositely coded 5-level exponent value > 124 ;
18) compositely coded 3-level mantissa value > 26 ;
19) compositely coded 5-level mantissa value > 124 ;
20) compositely coded 11-level mantissa value > 120 ;
21) bit stream unpacking continues past the end of the frame ;
Note that some of these conditions (such as #17 through #20) can only be tested
for at low-levels within the decoder software, resulting in a potentially significant MIPS
impact. So long as these conditions do not affect system stability, they do not need to be
specifically prevented.
— 97 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
8. ENCODING THE AC-3 BIT STREAM
8.1 Introduction
This section provides some guidance on AC-3 encoding. Since AC-3 is specified
by the syntax and decoder processing, the encoder is not precisely specified. The only
normative requirement on the encoder is that the output elementary bit stream follow
AC-3 syntax. Encoders of varying levels of sophistication may be produced. More
sophisticated encoders may offer superior audio performance, and may make operation at
lower bit-rates acceptable. Encoders are expected to improve over time. All decoders will
benefit from encoder improvements. The encoder described in this section, while basic in
operation, provides good performance. The description which follows indicates several
avenues of potential improvement. A flow diagram of the encoding process is shown in
Figure 8.1.
8.2 Summary of the encoding process
8.2.1 Input PCM
[Link] Input word length
The AC-3 encoder accepts audio in the form of PCM words. The internal dynamic
range of AC-3 allows input wordlengths of up to 24 bits to be useful.
[Link] Input sample rate
The input sample rate must be locked to the output bit rate so that each AC-3 sync
frame contains 1536 samples of audio. If the input audio is available in a PCM format at a
different sample rate than that required, sample rate conversion must be performed to
conform the sample rate.
[Link] Input filtering
Individual input channels may be high-pass filtered. Removal of DC components of
signals can allow more efficient coding since data rate is not used up encoding DC.
However, there is the risk that signals which do not reach 100% PCM level before high-
pass filtering will exceed 100% level after filtering, and thus be clipped. A typical encoder
would high-pass filter the input signals with a single pole filter at 3 Hz.
The lfe channel should be low-pass filtered at 120 Hz. A typical encoder would
filter the lfe channel with an 8th order elliptic filter with a cutoff frequency of 120 Hz.
— 98 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Input PCM
blksw flags
Transient Detect
Forward Transform
cplg strat
Coupling Strategy
Form Coupling
Channel
rematflgs
Rematrixing
Extract Exponents
expstrats
Exponent Strategy
dithflgs
Dither Strategy
Encoded Spectral Envelope
Encode Exponents
Mantissas
Normalize Mantissas
bitalloc params baps Quantize
Core Bit Allocation
Mantissas
Pack AC-3 Frame
Side Information Main Information
Output Frame
Figure 8.1. Flow diagram of the encoding process.
— 99 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
8.2.2 Transient detection
Transients are detected in the full-bandwidth channels in order to decide when to
switch to short length audio blocks to improve pre-echo performance. High-pass filtered
versions of the signals are examined for an increase in energy from one sub-block time-
segment to the next. Sub-blocks are examined at different time scales. If a transient is
detected in the second half of an audio block in a channel, that channel switches to a short
block. A channel that is block-switched uses the D45 exponent strategy.
The transient detector is used to determine when to switch from a long transform
block (length 512), to the short block (length 256). It operates on 512 samples for every
audio block. This is done in two passes, with each pass processing 256 samples. Transient
detection is broken down into four steps: 1) high-pass filtering, 2) segmentation of the
block into submultiples, 3) peak amplitude detection within each sub-block segment, and
4) threshold comparison. The transient detector outputs a flag blksw[n] for each full-
bandwidth channel, which when set to "one" indicates the presence of a transient in the
second half of the 512 length input block for the corresponding channel.
1) High-pass filtering: The high-pass filter is implemented as a cascaded biquad direct
form II IIR filter with a cutoff of 8 kHz.
2) Block Segmentation: The block of 256 high-pass filtered samples are segmented into a
hierarchical tree of levels in which level 1 represents the 256 length block, level 2 is two
segments of length 128, and level 3 is four segments of length 64.
3) Peak Detection: The sample with the largest magnitude is identified for each segment
on every level of the hierarchical tree. The peaks for a single level are found as follows:
P[j][k] = max(x(n))
for n = (512 × (k-1) / 2^j), (512 × (k-1) / 2^j) + 1, ...(512 × k / 2^j) - 1
and k = 1, ..., 2^(j-1) ;
where: x(n) = the nth sample in the 256 length block
j = 1, 2, 3 is the hierarchical level number
k = the segment number within level j
Note that P[j][0], (i.e., k=0) is defined to be the peak of the last segment on level j
of the tree calculated immediately prior to the current tree. For example, P[3][4] in the
preceding tree isP[3][0] in the current tree.
4) Threshold Comparison: The first stage of the threshold comparator checks to see if
there is significant signal level in the current block. This is done by comparing the overall
peak value P[1][1] of the current block to a “silence threshold”. If P[1][1] is below this
threshold then a long block is forced. The silence threshold value is 100/32768. The next
stage of the comparator checks the relative peak levels of adjacent segments on each level
of the hierarchical tree. If the peak ratio of any two adjacent segments on a particular level
exceeds a pre-defined threshold for that level, then a flag is set to indicate the presence of
a transient in the current 256 length block. The ratios are compared as follows:
mag(P[j][k]) × T[j] > mag(P[j][(k-1)])
where: T[j] is the pre-defined threshold for level j, defined as:
T[1] = .1
T[2] = .075
T[3] = .05
— 100 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
If this inequality is true for any two segment peaks on any level, then a transient is
indicated for the first half of the 512 length input block. The second pass through this
process determines the presence of transients in the second half of the 512 length input
block.
8.2.3 Forward transform
[Link] Windowing
The audio block is multiplied by a window function to reduce transform boundary
effects and to improve frequency selectivity in the filter bank. The values of the window
function are included in Table 7.33 on page 92. Note that the 256 coefficients given are
used back-to-back to form a 512-point symmetrical window.
[Link] Time to frequency transformation
Based on the block switch flags, each audio block is transformed into the
frequency domain by performing one long N=512 point transform, or two short N=256
point transforms. Let x[n] represent the windowed input time sequence. The output
frequency sequence, XD[k] is defined by:
−
-2 N 1 2π π
XD[k] =
N
∑ x[n] cos
4N
(2n + 1)(2k + 1) + (2k + 1)(1 + α )
4
n=0 for 0 ≤ k < N/2
-1 for the first short transform
where α = 0 for the long transform
+1 for the second short transform
8.2.4 Coupling strategy
[Link] Basic encoder
For a basic encoder, a static coupling strategy may be employed. Suitable coupling
parameters are:
cplbegf = 6 ; /* coupling starts at 10.2 kHz */
cplendf = 12 ; /* coupling channel ends at 20.3 kHz */
cplbndstrc = 0, 0, 1, 1, 0, 1, 1, 1;
cplinu = 1; /* coupling always on */
/* all non-block switched channels are coupled */
for(ch=0; ch<nfchans; ch++) if(blksw[ch]) chincpl[ch] = 0; else chincpl[ch] = 1;
Coupling coordinates for all channels may be transmitted for every other block, i.e.
blocks 0, 2, and 4. During blocks 1, 3, and 5, coupling coordinates are reused.
— 101 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
[Link] Advanced encoder
More advanced encoders may make use of dynamically variable coupling
parameters. The coupling frequencies may be made variable based on bit demand and on a
psychoacoustic model which compares the audibility of artifacts caused by bit starvation
vs. those caused by the coupling process. Channels with a rapidly time varying power level
may be removed from coupling. Channels with slowly varying power levels may have their
coupling coordinates sent less often. The coupling band structure may be made dynamic.
8.2.5 Form coupling channel
[Link] Coupling channel
The most basic encoder can form the coupling channel by simply adding all of the
individual channel coefficients together, and dividing by 8. The division by 8 prevents the
coupling channel from exceeding a value of 1. Slightly more sophisticated encoders can
alter the sign of individual channels before adding them into the sum so as to avoid phase
cancellations.
[Link] Coupling coordinates
Coupling coordinates are formed by taking power ratios within of each coupling
band. The power in the original channel within a coupling band is divided by the power in
the coupling channel within the coupling band. This power ratio becomes the coupling
coordinate. The coupling coordinates are converted to floating point format and
quantized. The exponents for each channel are examined to see if they can be further
scaled by 3, 6, or 9. This generates the 2-bit master coupling coordinate for that channel.
(The master coupling coordinates allow the dynamic range represented by the coupling
coordinate to be increased.)
8.2.6 Rematrixing
Rematrixing is active only in the 2/0 mode. Within each rematrixing band, power
measurements are made on the L, R, L+R, and L-R signals. If the maximum power is
found in the L or R channels, the rematrix flag is not set for that band. If the maximum
power is found in the L+R or L-R signal, then the rematrix flag is set. When the rematrix
flag for a band is set, the encoder codes L+R and L-R instead of L and R. Rematrixing is
described in Section 7.5 on page72.
8.2.7 Extract exponents
The binary representation of each frequency coefficient is examined to determine
the number of leading zeros. The number of leading zeroes (up to a maximum of 24)
becomes the initial exponent value. These exponents are extracted and the exponent sets
(one for each block for each channel, including the coupling channel) are used to
determine the appropriate exponent strategies.
— 102 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
8.2.8 Exponent strategy
For each channel, the variation in exponents over frequency and time is examined.
If the exponents indicate a relatively flat spectrum, an exponent strategy such as D25 or
D45 may be used. If the spectrum is very tonal, then a high spectral resolution exponent
strategy such as D15 or D25 would be used. If the spectrum changes little over the 6
blocks in a frame, the exponents may be sent only for block 0, and reused for blocks 1-5.
If the exponents are changing rapidly during the frame, exponents may be sent for block 0
and for those blocks which have exponent sets which differ significantly from the
previously sent exponents. There is a tradeoff between fine frequency resolution, fine time
resolution, and the number of bits required to send exponents. In general, when operating
at very low bit rates, it is necessary to trade off time vs. frequency resolution.
In a basic encoder a simple algorithm may be employed. First, look at the variation
of exponents over time. When the variation exceeds a threshold new exponents will be
sent. The exponent strategy used is made dependent on how many blocks the new
exponent set is used for. If the exponents will be used for only a single block, then use
strategy D45. If the new exponents will be used for 2 or 3 blocks, then use strategy D25.
If the new exponents will be used for 4,5, or 6 blocks, use strategy D15.
8.2.9 Dither strategy
The encoder controls, on a per channel basis, whether coefficients which will be
quantized to zero bits will be reproduced with dither. The intent is to maintain
approximately the same energy in the reproduced spectrum even if no bits are allocated to
portions of the spectrum. Depending on the exponent strategy, and the accuracy of the
encoded exponents, it may be beneficial to defeat dither for some blocks.
A basic encoder can implement a simple dither strategy on a per channel basis.
When blksw[ch] is 1, defeat dither for that block and for the following block.
8.2.10 Encode exponents
Based on the selected exponent strategy, the exponents of each exponent set are
preprocessed. D25 and D45 exponent strategies require that a single exponent be shared
over more than one mantissa. The exponents will be differentially encoded for
transmission in the bit stream. The difference between successive raw exponents does not
necessarily produce legal differential codes (maximum value of ±2) if the slew rate of the
raw exponents is greater than that allowed by the exponent strategy. Preprocessing adjusts
exponents so that transform coefficients that share an exponent have the same exponent
and so that differentials are legal values. The result of this processing is that some
exponents will have their values decreased, and the corresponding mantissas will have
some leading zeroes.
The exponents are differentially encoded to generate the encoded spectral
envelope. As part of the encoder processing, a set of exponents is generated which is
equal to the set of exponents which the decoder will have when it decodes the encoded
spectral envelope.
— 103 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
8.2.11 Normalize mantissas
Each channel's transform coefficients are normalized by left shifting each
coefficient the number of times given by its corresponding exponent to create normalized
mantissas. The original binary frequency coefficients are left shifted according to the
exponents which the decoder will use. Some of the normalized mantissas will have leading
zeroes. The normalized mantissas are what are quantized.
8.2.12 Core bit allocation
A basic encoder may use the core bit allocation routine with all parameters fixed at
nominal default values.
sdcycod = 2 ;
fdcycod = 1 ;
sgaincod = 1 ;
dbpbcod = 2 ;
floorcod = 4 ;
cplfgaincod = 4 ;
fgaincod[ch] = 4 ;
lfegaincod = 4 ;
cplsnroffst = fsnroffst[ch] = lfesnroffst = fineoffset ;
Since the bit allocation parameters are static, they are only sent during block 0.
Delta bit allocation is not used, so deltbaie = 0. The core bit allocation routine (described in
Section 7.2 on page 50) is run, and the coarse and fine SNR offsets are adjusted until all
available bits in the frame are used up. The coarse SNR offset adjusts in 6 dB increments,
and the fine offset adjusts in 3/8 dB increments. Bits are allocated globally from a common
bit pool to all channels. The combination of csnroffst and fineoffset are chosen which uses the
largest number of bits without exceeding the frame size. This involves an iterative process.
When, for a given iteration, the number of bits exceeds the pool, the SNR offset is
decreased for the next iteration. On the other hand, if the allocation is less than the pool,
the SNR offset is increased for the next iteration. When the SNR offset is at its maximum
without causing the allocation to exceed the pool, the iterating is complete. The result of
the bit allocation routine are the final values of csnroffst and fineoffset , and the set of bit
allocation pointers (baps). The SNR offset values are included in the bit stream so that the
decoder does not need to iterate.
8.2.13 Quantize mantissas
The baps are used by the mantissa quantization block. There is a bap for each
individual transform coefficient. Each normalized mantissas is quantized by the quantizer
indicated by the corresponding bap. Asymmetrically quantized mantissas are quantized by
rounding to the number of bits indicated by the corresponding bap. Symmetrically
quantized mantissas are quantized through the use of a table lookup. Mantissas with baps
of 1, 2, and 4 are grouped into triples or duples.
— 104 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
8.2.14 Pack AC-3 frame
All of the data is packed into the encoded AC-3 frame. Some of the quantized
mantissas are grouped together and coded by a single codeword. The output format is
dependent on the application. The frame may be output in a burst, or delivered as a serial
data stream at a constant rate.
— 105 —
ATSC Digital Audio Compression (AC-3) Standard 20 Dec 95
Blank Page
— 106 —
ATSC Digital Audio Compression (AC-3) Standard (Annex A) 20 Dec 95
ANNEX A
(Normative)
AC-3 ELEMENTARY STREAMS IN AN MPEG-2 MULTIPLEX
1. SCOPE
This Annex contains specifications on how to combine one or more AC-3
elementary streams into an MPEG-2 “Transport Stream” or “Program Stream” (ISO/IEC
13818-1). Applications which reference this specification may need to specify precisely the
values of some of the parameters described in this Annex.
2. INTRODUCTION
The AC-3 elementary bit stream is included in an MPEG-2 multiplex bit stream in
much the same way an MPEG-1 audio stream would be included. The AC-3 bit stream is
packetized into PES packets. An MPEG-2 multiplex bit stream containing AC-3
elementary streams must meet all audio constraints described in the STD model in Section
3.6. It is necessary to unambiguously indicate that an AC-3 stream is, in fact, an AC-3
stream (and not an MPEG audio stream). The MPEG-2 standard does not explicitly
indicate codes to be used to indicate an AC-3 stream. Also, the MPEG-2 standard does
not have an audio descriptor adequate to describe the contents of the AC-3 bit stream in
the PSI tables.
The AC-3 audio access unit (AU) or presentation unit (PU) is an AC-3 sync frame.
The AC-3 sync frame contains 1536 audio samples. The duration of an AC-3 access (or
presentation) unit is 32 ms for audio sampled at 48 kHz, approximately 34.83 ms for audio
sampled at 44.1 kHz, and 48 ms for audio sampled at 32 kHz.
The items which need to be specified in order to include AC-3 within the MPEG-2
bit stream are: stream_type , stream_id , registration descriptor, and AC-3 audio descriptor.
The use of the ISO 639 language descriptor is optional. Some constraints are placed on
the PES layer for the case of multiple audio streams intended to be reproduced in exact
sample synchronism.
3. DETAILED SPECIFICATION
3.1 Stream_type
The preferred value of stream_type for AC-3 is 0x81. Other values which MPEG
has assigned as “User Private” may be used also. If the value of 0x81 is used, depending
on the particular application, the decoder may assume that stream_type 0x81 indicates AC-3
audio. If the potential for ambiguity exists, the AC-3 registration descriptor should be
included (see Annex A, Section 3.3).
— 107 —
ATSC Digital Audio Compression (AC-3) Standard (Annex A) 20 Dec 95
3.2 Stream_id
3.2.1 Transport stream
For transport streams, the value of stream_id in the PES header shall be 0xBD
(indicating private_stream_1 ). Multiple AC-3 streams may share the same value of stream_id
since each stream is carried with a unique PID value. The mapping of values of PID to
stream_type is indicated in the transport stream Program Map Table (PMT).
3.2.2 Program stream
In program streams, the stream_id is intended to specify the type and number of the
elementary stream. Multiple AC-3 elementary streams can not use a common value of
stream_id ; unique values are required. If a single AC-3 elementary stream is carried in a
program stream, stream_id may use the value 0xBD (indicating private_stream_1 ). ISO/IEC
13818-1 does not provide values of stream_id suitable for identifying multiple AC-3
elementary streams. If multiple AC-3 elementary streams are carried in a program stream,
stream_id shall use the values 110x xxxx, where x xxxx indicates a stream number with a
value of 0-31. This value for stream_id is identical to the value used for MPEG-1 or
MPEG-2 audio. Confusion between MPEG audio and AC-3 audio may be avoided by a
Program Stream Map, which associates values of stream_id with values of stream_type .
Streams which use a stream_id of 110x xxxx are clearly identified as to the type of audio
coding employed by the value ofstream_type which is linked to the each value ofstream_id .
3.3 Registration descriptor
The syntax of the AC-3 registration descriptor is shown in Table 1. If the
stream_type value used for AC-3 is not 0x81, then the AC-3 registration descriptor shall be
included in the TS_program_map_section (for transport streams) or the program_stream_map
(for program streams). If the stream_type value used for AC-3 is 0x81, the AC-3
registration may be included optionally (it should be included if there is any chance of
ambiguity).
Table 1 AC-3 Registration Descriptor
Syntax No. of bits Mnemonic
registration_descriptor() {
descriptor_tag 8 uimsbf
descriptor_length 8 uimsbf
format_identifier 32 uimsbf
}
descriptor_tag — 0x05.
descriptor_length— 0x04.
format_identifier — The AC-3 format_identifier is 0x41432D33 (“AC-3”).
— 108 —
ATSC Digital Audio Compression (AC-3) Standard (Annex A) 20 Dec 95
3.4 AC-3 audio descriptor
The AC-3 audio descriptor, shown in Table 2, allows information about individual
AC-3 elementary streams to be included in the program specific information (PSI) tables.
This information is useful to allow the appropriate AC-3 stream(s) to be directed to the
audio decoder. Note that horizontal lines in the table indicate allowable termination points
for the descriptor.
Table 2 AC-3 Audio Descriptor Syntax
Syntax No. of Mnemonic
bits
audio_stream_descriptor( ) {
descriptor_tag 8 uimsbf
descriptor_length 8 uimsbf
sample_rate_code 3 bslbf
bsid 5 bslbf
bit_rate_code 6 bslbf
surround_mode 2 bslbf
bsmod 3 bslbf
num_channels 4 bslbf
full_svc 1 bslbf
langcod 8 bslbf
if(num_channels==0) /* 1+1 mode */
langcod2 8 bslbf
if(bsmod<2) {
mainid 3 uimsbf
reserved 5 bslbf
}
else asvcflags 8 bslbf
textlen 7 uimsbf
text_code 1 bslbf
for(i=0; i<M; i++) {
text[i] 8 bslbf
}
for(i=0; i<N; i++) {
additional_info[i] N×8 bslbf
}
}
— The preferred value for the AC-3 descriptor tag is 0x81. Other values
descriptor_tag
which MPEG has assigned as User Private may also be used.
— This is an 8-bit field specifying the number of bytes of the descriptor
descriptor_length
immediately followingdescriptor_length field.
sample_rate_code — This is a 3-bit field which indicates the sample rate of the encoded
audio. The indication may be of one specific sample rate, or may be of a set of values
which include the sample rate of the encoded audio. See Table 3.
— 109 —
ATSC Digital Audio Compression (AC-3) Standard (Annex A) 20 Dec 95
Table 3 Sample Rate Code Table
sample_rate_code Sample rate
‘000’ 48 kHz
‘001’ 44.1 kHz
‘010’ 32 kHz
‘011’ reserved
‘100’ 48 kHz or 44.1 kHz
‘101’ 48 kHz or 32 kHz
‘110’ 44.1 kHz or 32 kHz
‘111’ 48 kHz or 44.1 kHz or 32 kHz
bsid— This is a 5-bit field which is set to the same value as the bsid field in the AC-3
elementary stream.
— This is a 6-bit field. The lower 5 bits indicate a nominal bit rate. The MSB
bit_rate_code
indicates whether the indicated bit rate is exact (MSB=0) or an upper limit (MSB=1). See
Table 4.
Table 4 Bit Rate Code Table
bit_rate_code exact bit rate bit_rate_code bit rate upper limit
‘000000’ (0.) 32 kbps ‘100000’ (32.) 32 kbps
‘000001’ (1.) 40 kbps ‘100001’ (33.) 40 kbps
‘000010’ (2.) 48 kbps ‘100010’ (34.) 48 kbps
‘000011’ (3.) 56 kbps ‘100011’ (35.) 56 kbps
‘000100’ (4.) 64 kbps ‘100100’ (36.) 64 kbps
‘000101’ (5.) 80 kbps ‘100101’ (37.) 80 kbps
‘000110’ (6.) 96 kbps ‘100110’ (38.) 96 kbps
‘000111’ (7.) 112 kbps ‘100111’ (39.) 112 kbps
‘001000’ (8.) 128 kbps ‘101000’ (40.) 128 kbps
‘001001’ (9.) 160 kbps ‘101001’ (41.) 160 kbps
‘001010’ (10.) 192 kbps ‘101010’ (42.) 192 kbps
‘001011’ (11.) 224 kbps ‘101011’ (43.) 224 kbps
‘001100’ (12.) 256 kbps ‘101100’ (44.) 256 kbps
‘001101’ (13.) 320 kbps ‘101101’ (45.) 320 kbps
‘001110’ (14.) 384 kbps ‘101110’ (46.) 384 kbps
‘001111’ (15.) 448 kbps ‘101111’ (47.) 448 kbps
‘010000’ (16.) 512 kbps ‘110000’ (48.) 512 kbps
‘010001’ (17.) 576 kbps ‘110001’ (49.) 576 kbps
‘010010’ (18.) 640 kbps ‘110010’ (50.) 640 kbps
dsurmod — This is a 2-bit field which may be set to the same value as the dsurmod field in
the AC-3 elementary stream, or which may be set to ‘00’ (not indicated). See Table 5.
— 110 —
ATSC Digital Audio Compression (AC-3) Standard (Annex A) 20 Dec 95
Table 5 dsurmod Table
surround_mode Meaning
‘00’ not indicated
‘01’ NOT Dolby Surround encoded
‘10’ Dolby Surround encoded
‘11’ reserved
bsmod — This is a 3-bit field which is set to the same value as the bsmod field in the AC-3
elementary stream.
num_channels — This is a 4-bit field which indicates the number of channels in the AC-3
elementary stream. When the MSB is 0, the lower 3 bits are set to the same value as the
acmod field in the AC-3 elementary stream. When the MSB field is 1, the lower 3 bits
indicate the maximum number of encoded audio channels (counting the LFE channel as 1).
If the value of acmod in the AC-3 elementary stream is ‘000’ (1+1 mode), then the value of
num_channels shall be set to ‘0000’. See Table 6.
Table 6 num_channels Table
num_channels audio coding mode num_channels number of encoded
(acmod) channels
‘0000’ 1+1 ‘1000’ 1
‘0001’ 1/0 ‘1001’ ≤2
‘0010’ 2/0 ‘1010’ ≤3
‘0011’ 3/0 ‘1011’ ≤4
‘0100’ 2/1 ‘1100’ ≤5
‘0101’ 3/1 ‘1101’ ≤6
‘0110’ 2/2 ‘1110’ reserved
‘0111’ 3/2 ‘1111’ reserved
full_svc — This is a 1-bit field which indicates whether or not this audio service is a full
service suitable for presentation, or whether this audio service is only a partial service
which should be combined with another audio service before presentation. This bit should
be set to a ‘1’ if this audio service is sufficiently complete to be presented to the listener
without being combined with another audio service (for example, a visually impaired
service which contains all elements of the program; music, effects, dialogue, and the visual
content descriptive narrative). This bit should be set to a ‘0’ if the service is not
sufficiently complete to be presented without being combined with another audio service
(e.g., a visually impaired service which only contains a narrative description of the visual
program content and which needs to be combined with another audio service which
contains music, effects, and dialogue).
langcod — This is an 8-bit field which is set to the same value as the langcod field in the
AC-3 elementary stream. A value of 0x00 indicates that the language is unknown or not
indicated.
— 111 —
ATSC Digital Audio Compression (AC-3) Standard (Annex A) 20 Dec 95
langcod2— This is an 8-bit field which is set to the value of the langcod2 field in the AC-3
elementary stream. This field indicates the language of the audio contained in the second
mono audio channel (1+1 mode only).
mainid — This is a 3-bit field which contains a number in the range 0-7 which identifies a
main audio service. Each main service should be tagged with a unique number. This value
is used as an identifier to link associated services with particular main services.
asvcflags — This is an 8-bit field. Each bit (0-7) indicates with which main service(s) this
associated service is associated. The left most bit, bit 7, indicates whether this associated
service may be reproduced along with main service number 7. If the bit has a value of 1,
the service is associated with main service number 7. If the bit has a value of 0, the service
is not associated with main service number 7.
textlen — This is an unsigned integer which indicates the length, in bytes, of a descriptive
text field which follows.
text_code — This is a 1-bit field which indicates how the following text field is encoded. If
this bit is a ‘1’, the text is encoded as 1-byte characters using the ISO Latin-1 alphabet
(ISO 8859-1). If this bit is a ‘0’, the text is encoded with 2-byte unicode characters.
text[i] — The text field may contain a brief textual description of the audio service.
— This is a set of additional bytes filling out the remainder of the
additional_info[j]
descriptor. The purpose of these bytes is not currently defined. This field is provided to
allow the descriptor to be extended in the future.
3.5 ISO_639_language_code
The ISO_639_language_code descriptor allows a stream to be tagged with the 24-bit
ISO 639 language code. The AC-3 bit stream and the AC-3 audio descriptor both contain
an (identical) 8-bit language code which is adequate for most applications. Additional use
of the ISO_639_language_code descriptor is thus redundant. If the ISO_639_language_descriptor
is included in the TS_program_map_section (for transport streams) or the program_stream_map
(for program streams), then the audio_type field of this descriptor shall have a value of 0x00
(undefined).
3.6 STD audio buffer size
For an MPEG-2 transport stream, the T-STD model defines the main audio buffer
size (BSn) as:
BSn = BSmux + BSdec + BSoh
where
BSmux = 736 bytes
BSoh = PES header overhead
BSdec = access unit buffer.
— 112 —
ATSC Digital Audio Compression (AC-3) Standard (Annex A) 20 Dec 95
MPEG-2 specifies a fixed value for BSn (3584 bytes) and indicates that any excess
buffer may be used for additional multiplexing.
When an AC-3 elementary stream is carried by an MPEG-2 transport stream, the
transport stream shall be compliant with a main audio buffer size of:
BSn = BSmux + BSpad + BSdec
where
BSmux = 736 bytes
BSpad = 64 bytes
BSdec may be found in Table 5.13 of ATSC Standard A/52 (for the case of 44.1
kHz sample rate, the larger of the two values shown shall be used). The 64 bytes in BSpad
are available for BSoh and additional multiplexing. This constraint makes it possible to
implement decoders with the minimum possible memory buffer.
Applications which employ program streams should specify appropriate
constraints.
4. PES CONSTRAINTS
4.1 Encoding
In some applications, the audio decoder may be capable of simultaneously
decoding two elementary streams containing different program elements, and then
combining the program elements into a complete program. Most of the program elements
are found in the main audio service. Another program element (such as a narration of the
picture content intended for the visually impaired listener) may be found in the associated
audio service. In this case the audio decoder may sequentially decode audio frames (or
audio blocks) from each elementary stream and do the combining (mixing together) on a
frame or (block) basis. In order to have the audio from the two elementary streams
reproduced in exact sample synchronism, it is necessary for the original audio elementary
stream encoders to have encoded the two audio program elements frame synchronously;
i.e., if audio stream 1 has sample 0 of frame n taken at time t0, then audio stream 2 should
also have frame n beginning with its sample 0 taken the identical time t0. If the encoding
of multiple audio services is done frame and sample synchronous, and decoding is intended
to be frame and sample synchronous, then the PES packets of these audio services shall
contain identical values of PTS which refer to the audio access units intended for
synchronous decoding.
Audio services intended to be combined together for reproduction shall be
encoded at an identical sample rate.
4.2 Decoding
If audio access units from two audio services which are to be simultaneously
decoded have identical values of PTS indicated in their corresponding PES headers, then
— 113 —
ATSC Digital Audio Compression (AC-3) Standard (Annex A) 20 Dec 95
the corresponding audio access units shall be presented to the audio decoder for
simultaneous synchronous decoding. Synchronous decoding means that for corresponding
audio frames (access units), corresponding audio samples are presented at the identical
time.
If the PTS values do not match (indicating that the audio encoding was not frame
synchronous) then the audio frames (access units) of the main audio service shall be
presented to the audio decoder for decoding and presentation at the time indicated by PTS.
An associated service which is being simultaneously decoded should have its audio frames
(access units), which are in closest time alignment (as indicated by PTS) to those of the
main service being decoded, presented to the audio decoder for simultaneous decoding. In
this case the associated service may be reproduced out of sync by as much as 1/2 of a
frame time. (This is typically satisfactory; a visually impaired narration does not require
highly precise timing.)
4.3 Byte-alignment
The AC-3 elementary stream shall be byte-aligned within the MPEG-2 data stream.
This means that the initial 8 bits of an AC-3 frame shall reside in a single byte which is
carried by the MPEG-2 data stream.
— 114 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
ANNEX B
(Informative)
AC-3 DATA STREAM IN IEC958 INTERFACE
1. SCOPE
This Annex provides specifications for one method to incorporate one or more
AC-3 data streams and time stamps into a single bit stream consistent with the physical
and logical IEC958 interface specification.
2. INTRODUCTION
It is advantageous to have standardized methods to route AC-3 elementary
streams. While this annex specifies one particular interface method, other organizations
(e.g., AES, EIA, IEEE, ISO/IEC, SMPTE) may also document suitable interface
standards. Parties who require a standardized AC-3 elementary stream interconnection are
encouraged to investigate the status of standards development in other organizations.
The IEC958 standard specifies a widely used method of interconnecting digital
audio equipment with two channels of linear PCM audio. This Annex specifies a way in
which the IEC958 interface may be used in order to convey AC-3 elementary streams.
Such an interface can facilitate the interconnection of consumer and professional
equipment which may be capable of working with either linear PCM or AC-3 encoded
audio signals. In the consumer area, cost savings result from being able to do two
functions with a single connector (or IC pin) rather than requiring multiple connectors
(and IC pins). In the professional area, the method allows some existing equipment which
is capable of recording linear PCM to also record AC-3 bit streams. This Annex also
specifies how to include time stamps which indicate the absolute time at which the
encoded audio samples were taken. The time stamps are encoded as values of SMPTE
time code. Conversion of these values to values of MPEG-2 PTS is possible. The methods
specified in this Annex, which allow one or more AC-3 elementary streams and time
stamps to be conveyed within an IEC958 data stream, could easily be extended to allow
other types of data to also be conveyed.
3. BASIC PARAMETERS OF IEC958:1989 INTERFACE
The logical format of the IEC958 interface consists of a sequence of sub-frames.
Each sub-frame is intended to convey one linear PCM sample, and contains 32 time slots,
each of which (excluding the four time slots used for synchronization purposes) can carry
a single bit of information. The specified usage of the 32 time slots is shown in Table 1.
A pair of sub-frames, each containing the PCM word of one audio channel, make
up an IEC958 frame containing two PCM words, one from channel 1 and one from
channel 2. A sequence of 192 frames makes up a block. The 192 channel status bits for
each channel during a block make up the 192 bit (24 byte) channel status word for that
— 115 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
channel. The channel status word contains information such as sampling frequency and
emphasis, as well as other information about the nature of the included audio data.
Table 1 IEC958 Sub-frame
Time Slots Bits Specified Contents
0-3 - Sync preamble.
4-7 4 Aux data (or LSB’s of 24 bit linear PCM word).
8 - 27 20 Linear PCM word of up to 20 bits. MSB in slot 27.
28 1 Validity flag. Set to ‘0’ if PCM word is reliable, otherwise set to ‘1’.
29 1 User data. Default value of ‘0’.
30 1 One bit of channel status word.
31 1 Parity bit. Set so time slots 4 to 31 inclusive have an even number of
ones and zeros.
The AC-3 (or other) data streams to be conveyed are formed into data bursts, each
consisting of a 64-bit preamble containing information about the burst followed by a data
payload. Data bursts are tagged with a number indicating to which data stream they
belong. Up to eight different data streams may be time multiplexed together to form a set
of data bit streams.
The data is placed in time slots 12-27, which are normally used to carry linear
16-bit PCM words. This location allows some recording equipment to record and
playback either linear 16-bit PCM audio, or encoded data streams. In the consumer
(S/PDIF) application, both sub-frames (Ch1, Ch2) are simultaneously employed to carry
32-bit data words (32-bit mode). This allows the consumer IEC958 bit stream to convey
either 2-channel linear PCM audio, or a set of alternate bit streams, but not both
simultaneously. In the professional (AES/EBU) application, the sub-frames of each
channel may be used together (as in the consumer application) to carry 32-bit data words,
or they may be used independently with one or both sub-frames carrying 16-bit data words
(16-bit mode). This allows the professional IEC958 bit stream to simultaneously convey:
two linear PCM channels; or one linear PCM channel and one set of data bit streams; or
two sets of data bit streams.
The electrical specification of IEC958 is used without modification.
4. DETAILED SPECIFICATION
This section contains the detailed specification as to how data is placed into the
IEC958 logical format. Specifications cover both the consumer application (S/PDIF, bit 0
of channel status equals 0), and the professional application (AES/EBU, bit 0 of channel
status equals 1). For the professional application, two modes are available: 32-bit and
16-bit. The consumer application may employ only the 32-bit mode.
4.1 Channel status word
The primary bit of interest in the channel status word is bit 1 which indicates
whether the sub-frame contains PCM audio or data. This bit should be set to a value of ‘1’
— 116 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
to indicate digital data. Consumer application equipment may use the value of this bit to
determine whether or not to interpret the IEC958 signal as stereo PCM audio, or digital
data. In some cases, it is desirable for decoders to have the capability to ignore this bit and
attempt to automatically detect whether the sub-frame contains data or PCM audio. For
example, if an IEC958 stream containing data (instead of PCM audio) is recorded on, and
played back from, a media designed for stereo PCM, it is possible that the output IEC958
stream would incorrectly be indicated (in bit 1 of channel status) to contain PCM audio. In
this case, the IEC958 stream would not be fully compliant with this specification. If this
data stream is interpreted by equipment as containing PCM audio, improper results would
be obtained. Section 5 of this annex contains suggestions on implementation of an
autodetect function. The auto detection function should be implemented in professional
equipment, and may be implemented optionally in consumer equipment.
4.1.1 Channel status word — consumer application
The first bit of the channel status word is set to ‘0’ to indicate consumer use. In
order to indicate that this IEC958 data stream does not contain PCM audio data, the
digital data bit of the channel status word shall be used. The channel status word is made
identical for channel 1 and channel 2. When the IEC958 interface conveys AC-3 data, the
bit assignment of channel status shall be as shown in Table 2.
Table 2 Channel Status Bits
Bit(s) Value Comments
bit 0 ‘0’ Consumer use of channel status block
bit 1 ‘1’ Digital data
bit 2 --- Copyright indication.
No deviation from IEC958
bits 3,4,5 ‘0’,’0’,’0’ The current IEC958 specification shows these
bits = 0, with the values of 1 as “reserved”.
bits 6,7 ‘0’,’0’ Mode 0
bits 8-191 --- No deviation from IEC958
bits 24-27 --- Shall indicate audio sampling frequency
4.1.2 Channel status word — professional application
In some professional applications it may be acceptable to not implement the
channel status word. In this case, all channel status bits should be set to ‘0’, and the
receiving equipment shall default to 48 kHz sample rate. If the actual sample rate deviates
from 48 kHz the burden falls to the receiver or the human operator to properly deal with
that fact.
If the channel status word is implemented, the first bit is set to ‘1’ to indicate
professional use. In order to indicate that one or both channels of this IEC958 data stream
do not contain PCM audio data, the audio/non-audio bits of the channel status words
should be used. When one of the channels of the IEC958 interface conveys AC-3 data, the
bit assignment of the channel status word for byte 0 of that channel shall be as shown in
— 117 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
Table 3, where mandatory indicates that the bit(s) must be set as shown; and
recommended indicates that the bit(s) should be set as shown, but that some applications
may be workable with other settings. (For example, when replaying a recording of the bit
stream, some equipment may automatically set bit 1 to a value of ‘0’ to indicate audio
data, even though the recording actually contains AC-3 data. In this case the burden falls
to the receiver or the human operator to determine whether the data is linear PCM or
AC-3.)
Table 3 Channel Status Bits in Byte 0
Bit(s) Value Mandatory/ Comments
Recommended
bit 0 ‘1’ Mandatory Professional use of channel status block.
bit 1 ‘1’ Recommended Non-audio mode.
bits 2-4 ‘000’ Recommended Emphasis not indicated.
(Not used by decoder.)
bit 5 ‘0’ Recommended Sampling frequency locked.
bits 6,7 -- Recommended Indicates sampling frequency per IEC958.
4.2 Placement of data into sub-frames
The method to place the data into the IEC958 bit stream is to format the data to be
transmitted into data blocks, and to send each block in a continuous sequence, or burst, of
IEC958 frames. The potential length of any individual data burst is limited to 65535 bits.
In between data bursts, the IEC958 frame contents may be set to all 0’s, and will ignored
by the receiver. Each data burst contains a 64-bit preamble consisting of a sync code, an
indicator of the burst length, and information about the type of data contained in the burst.
The coding of the preamble allows the time division multiplexing of up to 8 different data
streams (a data set) into one IEC958 stream.
4.2.1 32-bit mode
In the 32-bit mode, data is placed into time slots 12-27 (16 bits) of the sub-frames.
The sub-frames of both channels are used to form a single 32-bit word data channel. In
this case the interface may convey either two linear PCM audio channels, or data, but not
both simultaneously. Each frame can carry 32 bits of data.
Considering the data payload as a serial stream of bits, the first bit of the payload
in a burst would occupy the MSB of sub-frame 1 (time slot 27), and the 32’nd bit would
occupy the LSB (or what would be the LSB for 16-bit PCM audio) of sub-frame 2 (time
slot 12). The next 32 bits of the data burst would occupy the next frame. The last data bits
of the data burst might occupy only a fraction of the last frame. Any unused bits in the last
frame will be ignored by the receiver.
— 118 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
4.2.2 16-bit mode
The 16-bit mode is only specified for use in the professional application. In the 16-
bit mode, data is placed into time slots 12-27 (16 bits) of the sub-frames. Each sub-frame
can carry a 16-bit word of data. The sub-frames of Ch1 are considered a separate data
channel from the sub-frames of Ch2. Each of these channels may convey either a single
channel of linear PCM, or data, but not both simultaneously. Typically, the sub-frames for
one channel will be used to carry a single channel of linear PCM, while the sub-frames for
the other channel will convey data.
Considering the data payload as a serial stream of bits, the first bit of the payload
in a burst would occupy the MSB of the sub-frame (time slot 27), and the 16’th bit would
occupy the LSB (or what would be the LSB for 16-bit PCM audio) of the sub-frame (time
slot 12). The next 16 bits of the data burst would occupy the next sub-frame of the same
channel. The last data bits of the data burst might occupy only a fraction of the last sub-
frame. Any unused bits in the last sub-frame will be ignored by the receiver.
4.3 Validity flag
The validity flag in time slot 28 may be used to indicate individual 16-bit words of
data which are thought to be in error. If the data source believes a particular 16-bit word
contained in a sub-frame contains an error, the validity bit may be set to ‘1’. Otherwise
this bit shall be set to a ‘0’ which indicates valid data. The use of this bit by receivers is
optional.
4.4 Coding of preamble
The data to be transmitted is formed into bursts of data. A 64-bit preamble is
added to the beginning of each burst. The remainder of the burst is then the data payload.
The preamble occupies 16 bits in each of 4 sub-frames. The preamble is considered to be
four 16-bit words designated as Pa, Pb, Pc, Pd. The contents of these four words are
specified in Table 4. When placed into a sub-frame, the MSB of a 16-bit preamble word is
placed into time slot 27, and the LSB is placed into time slot 12. The combination of Pa
and Pb form a 32-bit sync code. This allows a receiver to find the preamble with a very
small probability of mis-detection.
Table 4 Preamble Words
Preamble word Contents
Pa 16 bit sync word 1 = 0xF872
Pb 16 bit sync word 2 = 0x4E1F
Pc 16 bit burst_info value.
Pd 16 bit length_code (unsigned integer), equal to the
number of data bits in the following data burst
— 119 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
4.4.1 32-bit mode
The 4 preamble words are contained in 2 sequential frames. The frame beginning
the data burst contains preamble word Pa in the Ch1 sub-frame, and Pb in the Ch2 sub-
frame. The next frame containsPc in Ch1 andPd in Ch2.
4.4.2 16-bit mode
The 4 preamble words are contained in 4 sequential sub-frames of the individual
channel (Ch1 or Ch2) being employed to convey the AC-3 data stream. The sub-frame (of
the channel being used) beginning the data burst contains preamble word Pa, the next sub-
frame (of the channel) in the burst contains
Pb, etc.
4.4.3 burst_info
The 16-bit burst_info value contains information about the data which will be found
in the burst. The contents of burst_info is specified in Table 5. Bit 15 of burst_info is
considered the MSB.
Table 5 burst_info
Bit(s) Value
0-4 data_type (5-bit unsigned integer = 0-31)
5-6 Reserved (shall be set to ‘00’)
7 error_flag 1 indicates data burst may contain errors,
0 indicates data may be valid
8-12 data_type_dependent
13-15 data_stream_number
[Link] data_type
The 5-bit data_type field indicates what type of data, (AC-3, time stamp, etc.) will
be found in the [Link] values ofdata_type are defined in this [Link] Table 6.
Table 6 Values of data_type
Value Meaning
0 Null data
1 AC-3 data
2 Time Stamp
3-31 Reserved
[Link] Reserved bits
Bits 5 and 6 are reserved. These bits shall be set to a value of ‘00’. Receivers of
this data stream may ignore the contents of these bits.
— 120 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
[Link] error_flag
The error_flag bit is available to indicate if the contents of the burst contains data
errors. If a data burst is thought to be error free, or if the data source does not know if the
data contains errors, then the value of this bit shall be set to a ‘0’. If the data source does
know that a particular data burst contains some errors this bit may be set to a ‘1’. The use
of this bit by receivers is optional.
[Link] data_type_dependent
The data_type_dependent field contains 5 bits whose meaning is intended to be
dependent on the value ofdata_type.
[Link] data_stream_number
The 3-bit data_stream_number indicates to which virtual data stream the burst
belongs. Eight codes (0-7) are available so that up to eight independent data streams (each
of any assigned data type) may be carried in the IEC958 data stream in a time multiplex.
Each independent data stream shall use a unique value for data_stream_type .
In the consumer application the following constraints shall apply. If a single data
stream is carried, the value of data_stream_number shall be 0. If a set of data streams are
carried, one of the streams shall have a data_stream_number of 0. If a receiver is only
capable of selecting and processing a single data stream, it shall receive and process
data_stream_number 0. Stream 0 thus has the highest priority, and should carry the most
important data. The MSB of the 3-bit stream number is placed in bit number 15.
4.4.4 length_code
The length_code indicates the length of the data payload in bits, from 0 to 65535.
The size of the preamble is not counted in the value of
length_code.
4.5 Burst spacing
In order to facilitate the implementation of the autodetection function (see Section
5) there is one requirement on burst spacing. There shall not be a sequence of 4096 or
more IEC958 frames which contain at least one data burst, without the beginning of at
least one of the data bursts preceded by two IEC958 frames which have sub-frame
contents in time slots 12-27 of all 0’s. Since the sub-frame contents of time slots 12-27 are
set to all zeros between data bursts, this requirement is automatically met unless there are
sequences of data bursts so tightly packed that there is never (in a span of 4096 IEC958
frames) a sequence of 2 all-zero frames preceding any burst.
4.6 The null data_type
A null data type is provided so that the preamble sync codes may be inserted
occasionally into the data stream. This could potentially enhance reliable autodetection of
whether or not the sub-frame contains PCM audio or digital data.
— 121 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
The null burst data_type has a value of 0x0. In a null data burst, the length_code ,
error_flag , and data_type_dependent values shall all be set to ‘0’. The data_stream_number shall
be set to 0x7.
If the burst frequency of the data being conveyed is low, or the interface is idle (no
data to convey) there may be long periods of inactivity which may be autodetected as
PCM silence. Placement of null data bursts allows sync codes to be detected, allowing an
autodetector to realize that the sub-frame contents should be considered to be data and
not PCM audio. Thus use of null data bursts is optional.
4.7 The AC-3 data_type
When AC-3 data is conveyed, data_type has a value of 0x1. In this case, the value
of data_type_dependent shall be as shown in Table 7.
Table 7 Values of data_type_dependentWhen data_type = 1
burst_info data_type_dependent Meaning
bit number bit number
8-10 0-2 Value of bsmod in AC-3 elementary stream
11-12 3-4 Reserved, shall be set to ‘00’
The AC-3 syntactical element bsmod is a 3-bit field. The left-most bit of this value
is placed in burst_info bit number 10. Receivers may ignore the contents of the reserved
bits.
4.7.1 Placement of AC-3 frames into data bursts
The AC-3 data stream consists of a sequence of AC-3 sync frames. Each AC-3
sync frame represents 1536 encoded audio samples. AC-3 sync frame boundaries occur at
a frequency of exactly once every 1536 IEC958 frames. Each burst of AC-3 data shall
contain one complete AC-3 sync frame. The length of the AC-3 data burst will depend on
the encoded bit rate (which determines the AC-3 sync frame length). The data bursts
containing AC-3 sync frames shall occur at a regular rate, with each AC-3 burst beginning
1536 IEC958 frames after the beginning of the preceding AC-3 burst (of the same
data_stream_number ).
It is possible for this interface to simultaneously convey multiple AC-3 data
streams. One of the applications of this capability would be to convey both a main audio
service and an associated audio service. In this case, it is important to identify which of the
time-sequential AC-3 bursts represent audio encoded during the same time interval. In the
case that a main audio service and an associated audio service are placed into this
interface, the burst of the associated service shall occur prior to the burst of the main
audio service with which it is associated. The main service audio burst shall have its
data_stream_number set to ‘0’.
— 122 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
4.7.2 Symbol frequency
When the IEC958 data stream conveys linear PCM audio, the symbol frequency is
64 times the PCM sampling frequency (32 time slots per PCM sample, times 2 channels).
When AC-3 data is conveyed by the interface, the symbol frequency shall be 64 times the
sampling rate of the AC-3 encoded audio. When more than one coded AC-3 bit stream is
transmitted through the same interface, the audio sampling frequencies shall be identical.
In the consumer application, bits 24-27 of the channel status word shall indicate the
sampling frequency, as specified in IEC958.
4.8 The time stamp data_type
Time stamps are useful in applications where time information must be kept closely
associated with encoded audio data (see Figure 1). An example of this would be in a
digital audio/video transmission system where both audio and video sources have SMPTE
time code. When the audio and video are digitally compressed it is useful if each output
compressed bit stream contains the original SMPTE time code information. When a time
stamp is included in this interface, its value applies to the single coded audio access unit
which immediately follows.
Values of SMPTE time code occur only once per picture frame, and thus have a
resolution in their value of approximately 33 ms (for 30 Hz frame rate). Audio samples
occur much more frequently, approximately once every 21 µs (48 kHz sample rate). The
AC-3 audio access units occur every 32 ms (48 kHz sample rate). It would be desirable
for the time stamp to precisely indicate the time of the first audio sample contained in each
audio access unit, but this is not practical due to the coarse nature of the source of the
timing information (SMPTE time code). The method adopted here is to let the time stamp
contain both a SMPTE time code value, and an indicator as to the audio sample within the
following audio access unit to which the time code value applies. Since, in general, there is
not an exact integer relationship between the frequency of time code values and the
frequency of audio samples, there will always be an inherent ambiguity of exactly when in
the audio stream the exact time code value is valid, since the exact point of validity will
typically be between two audio samples. Depending on the precise time code frame rate,
and the audio access unit frequency (which depends on the audio sample rate), it is
possible for all audio samples within a single audio access unit to be between two
sequential time code values. In this case, the time stamp cannot point to a sample in the
audio access unit, but must point to a sample in the following audio access unit. It is also
possible for two time code values to occur within a single audio access unit. In this case,
the time code value which applies to the earliest sample in the access unit shall be used. It
should be recognized that the time stamp values will inherently have a small amount of
jitter. The sources of the jitter will be: the inherent + or - 1 sample ambiguity as to which
sample the time code value applies; bandwidth limitations in some sources of linear time
code; and interrupt latencies in some hardware implementations. In some applications
(such as converting the time stamp values to values of MPEG-2 PTS), this jitter may have
to be removed by subsequent processing equipment.
— 123 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
Audio Signals
to be encoded
IEC 958 signal
Digital Audio Encoder Coded audio + time code
SMPTE
Time Code
Figure 1. Encoding audio with time code.
4.8.1 Preamble values
Time stamps are conveyed by data bursts with a data_type value of 0x2. The value
of data_type_dependent shall be set to 0x0 for the payload defined below. (In the future,
other payload types may be defined for different values of data_type_dependent .) The
length_code shall indicate the actual length of the time stamp payload.
4.8.2 Time stamp payload
The time stamp payload, shown in Table 8, has a minimum length of six 16-bit
words which have a defined meaning. Additional 16-bit words may be optionally added,
but the meaning of these words is not specified.
Table 8 Time Stamp Payload
Time Stamp MSB Bit Number LSB
Payload Word 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Usr8, Usr7, Hours [63] [62] [61] [60] [55] [54] [53] [52] [59] [58] H20 H10 H8 H4 H2 H1
1 Usr6, Usr5, Minutes [47] [46] [45] [44] [39] [38] [37] [36] [43] M40 M20 M10 M8 M4 M2 M1
2 Usr4, Usr3, Seconds [31] [30] [29] [28] [23] [22] [21] [20] [27] S40 S20 S10 S8 S4 S2 S1
3 Usr2, Usr1, cf, df, [15] [14] [13] [12] [7] [6] [5] [4] [11] [10] F20 F10 F8 F4 F2 F1
Frames
4 Sample Number S15 S14 S13 S12 S11 S10 S9 S8 S7 S6 S5 S4 S3 S2 S1 S0
5 Reserved, Flags R R R R R R R R R R a3 a2 a1 a0 f1 [10]
Table Entries
[63] Bit number 63 of SMPTE time code word
H20 This bit has a value of 20 hours
R Reserved bit, set to ‘0’
Usr8 The 8th group of user bits in the SMPTE time code word
cf Color frame flag bit
df Drop-frame flag bit
f1 Flag bit number 1
S15 Sample number, bit 15
a3 Frame rate code, bit 3
The first four words contain an hours, minutes, seconds, frame count. Space is
available to carry the user group, color frame flag, drop frame flag, and unassigned bits
— 124 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
from a SMPTE time code word. Flag bit f1 (in word 5) is set to a ‘1’ if this information
has been copied from a source of SMPTE time code into the upper bits of payload words
0-3. If flag bit f1 is set to a ‘0’, this information has not been provided, and the upper bits
of payload words marked [ ] are all set to ‘0’. The sample number in word 4 is an
unsigned integer which indicates the sample number (Sn in Figure 2) to which the time
code value applies. The sample number does not have to be exactly correct, but should
indicate an audio sample within ±0.5 ms of the ideal value. Word 5 contains 10 reserved
bits (in bits 6-15), a 4-bit frame rate code (a3-a0), the f1 flag bit, and the drop-frame flag
bit (bit 10 of the SMPTE time code word) if the timing source is SMPTE time code. The
drop-frame flag bit is always provided in bit 0 of word 5; its presence in bit 6 of word 3 is
conditional on the value of the f1 flag bit. The meaning of the frame rate code is shown in
Table 9.
Increasing Time
TS TS
AC-3 Frame n AC-3 Frame n+1
n n+1
Sample Sn Sample Sn+1
Time Stamp n contains the Time Stamp n+1 contains the
time information for sample time information for sample
Sn in AC-3 frame n Sn+1 in AC-3 frame n+1
If the value of sample number Sn contained in time
stamp n is >= 1536 (the number of samples in an AC-3
frame), it points to sample 1536-Sn in AC-3 frame n+1
Figure 2. Time stamps and AC-3 frames in the IEC958 data stream.
Table 9 Frame Rate Code
frame rate code frame rate
a3 a2 a1 a0
0 0 0 0 not indicated
0 0 0 1 24 1001 (23.98)
0 0 1 0 24
0 0 1 1 25
0 1 0 0 30 1001 (29.97)
0 1 0 1 30
0 1 1 0 50
0 1 1 1 60 1001 (59.94)
1 0 0 0 60
- - - - reserved
1 1 1 1 reserved
Additional payload words containing arbitrary information may be optionally
provided. The meaning of any additional payload information is not specified. Receivers
should be capable of operating whether or not additional information is present. The
presence of additional information may be determined by the value of the length_code in the
— 125 —
ATSC Digital Audio Compression (AC-3) Standard (Annex B) 20 Dec 95
burst preamble. If the value length_code is 0x0060 then no additional information is present.
If the value oflength_code is greater than 0x0060 then additional information is present.
5. AUTO DETECTION OF AUDIO/DATA MODE
The IEC958 interface can convey either PCM audio, or data. The receiver needs to
know whether the IEC958 information is to be considered PCM audio, or data. This
information is best conveyed by setting bit 1 of the channel status word to indicate data. In
some cases this bit may not be set correctly. In some applications it may be useful for
receivers to be able to determine whether the IEC958 contents are PCM audio or data,
without referring to bit 1 of the channel status word. This may be done quite reliably by
recognizing that the 32-bit sync code formed by the first two 16-bit words of the preamble
(Pa, Pb) are unlikely to occur very often in natural PCM audio (approximately once every
24 hours). By looking for an extended 96-bit sync code consisting of six 16-bit words ( 4
zeros followed by Pa, Pb, or: 0x0000, 0x0000, 0x0000, 0x0000, 0xF872, 0x4E1F) the
probability of a false occurrence of sync will be vanishingly small. The decision process
which may be followed is shown in Figure 3. In this diagram, the mode of the receiver can
be switched between PCM and DATA. The SYNC function is meant to indicate whether,
in a span of 4096 IEC958 frames, the extended 96-bit sync code is found. Note that if the
IEC958 stream goes idle (all zeros), the autodetector will go into PCM mode, and only
switch back to DATA mode when a data burst appears. If this behavior is undesirable, it
can be prevented by inserting null data bursts at least once every 4096 IEC958 frames.
PCM Mode
No
No
SYNC ? Yes DATA Mode SYNC ?
Yes
Figure 3. PCM-DATA auto mode detection.
— 126 —
ATSC Digital Audio Compression (AC-3) Standard (Annex C) 20 Dec 95
ANNEX C
(Informative)
AC-3 KARAOKE MODE
1. SCOPE
This Annex contains specifications for how karaoke aware and karaoke capable
AC-3 decoders should reproduce karaoke AC-3 bit streams. A minimum level of
functionality is defined which allows a karaoke aware decoder to produce an appropriate
2/0 or 3/0 default output when presented with a karaoke mode AC-3 bit stream. An
additional level of functionality is defined for the karaoke capable decoder so that the
listener may optionally control the reproduction of the karaoke bit stream.
2. INTRODUCTION
The AC-3 karaoke mode has been defined in order to allow the multi-channel
AC-3 bit stream to convey audio channels designated as L, R (e.g., 2-channel stereo
music), M (e.g., guide melody), and V1, V2 (e.g., one or two vocal tracks). This Annex
does not specify the contents of L, R, M, V1, and V2, but does specify the behavior of
AC-3 decoding equipment when receiving a karaoke bit stream containing these channels.
An AC-3 decoder which is karaoke capable will allow the listener to optionally reproduce
the V1 and V2 channels, and may allow the listener to adjust the relative levels (mixing
balance) of the M, V1, and V2 channels. An AC-3 decoder which is karaoke aware will
reproduce the L, R, and M channels, and will reproduce the V1 and V2 channels at a level
indicated by the encoded bit stream.
The 2-channel karaoke aware decoder will decode the karaoke bit stream using the
Lo, Ro downmix. The L and R channels will be reproduced out of the left and right
outputs, and the M channel will appear as a phantom center. The precise level of the M
channel is determined by cmixlev which is under control of the program provider. The level
of the V1 and V2 channels which will appear in the downmix is determined by surmixlev ,
which is under control of the program provider. A single V channel (V1 only) will appear
as a phantom center. A pair of V channels (V1 and V2) will be reproduced with V1 in left
output and V2 in right output.
The 5-channel karaoke aware decoder will reproduce the L, R channels out of the
left and right outputs, and the M channel out of the center output. A single V channel (V1
only) will be reproduced in the center channel output. A pair of V channels (V1 and V2)
will be reproduced with V1 in left output and V2 in right output. The level of the V1 and
V2 channels which will appear in the output is determined by surmixlev.
The karaoke capable decoder gives some control of the reproduction to the
listener. The V1, V2 channels may be selected for reproduction independent of the value
of surmixlev in the bit stream. The decoder may optionally allow the reproduction level and
location of the M, V1, and V2 channels to be adjusted by the listener. The detailed
— 127 —
ATSC Digital Audio Compression (AC-3) Standard (Annex C) 20 Dec 95
implementation of the flexible karaoke capable decoder is not specified; it is left up to the
implementation as to the degree of adjustability to be offered to the listener.
3. DETAILED SPECIFICATION
3.1 Karaoke mode indication
AC-3 bit streams are indicated as karaoke type when bsmod = ‘111’ and acmod >=
0x2.
3.2 Karaoke mode channel assignment
The channel assignments for both the normal mode and the karaoke mode are
shown in Table 1.
Table 1 Channel Array Ordering
audio Normal channel Karaoke channel
acmod coding assignment assignment
mode (bsmod ! = ‘111’) (bsmod =‘111’)
‘010’ 2/0 L,R L,R
‘011’ 3/0 L,C,R L,M,R
‘100’ 2/1 L,R,S L,R,V1
‘101’ 3/1 L,C,R,S L,M,R,V1
‘110’ 2/2 L,R,Ls,Rs L,R,V1,V2
‘111’ 3/2 L,C,R,Ls,Rs L,M,R,V1,V2
3.3 Reproduction of karaoke mode bit streams
This section contains the specifications which shall be met by decoders which are
designated as karaoke aware or karaoke capable. The following general equations indicate
how the AC-3 decoder’s output channels, LK, CK, RK, are formed from the encoded
channels L, M, R, V1, V2. Typically, the surround loudspeakers are not used when
reproducing karaoke bit streams.
LK = L + a * V1 + b * V2 + c * M
CK = d * V1 + e * V2 + f * M
RK = R + g * V1 + h * V2 + i * M
3.3.1 Karaoke aware decoders
The values of the coefficients a-i, which are used by karaoke aware decoders, are
given in Table 2. Values are shown for both 2-channel (2/0) and multi-channel (3/0)
reproduction. For each of these situations, a coefficient set is shown for the case of a
single encoded V channel (V1 only) or two encoded V channels (V1, V2). The actual
coefficients used must be scaled downwards so that arithmetic overflow does not occur if
all channels contributing to an output channel happen to be at full scale. Monophonic
— 128 —
ATSC Digital Audio Compression (AC-3) Standard (Annex C) 20 Dec 95
reproduction would be obtained by summing the left and right output channels of the 2/0
reproduction. Any AC-3 decoder will produce the appropriate output if it is set to perform
an Lo, Ro 2-channel downmix.
Table 2 Coefficient Values for Karaoke Aware Decoders
2/0 Reproduction 3/0 Reproduction
Coefficient 1 vocal 2 vocals 1 vocal 2 vocals
a 0.7 * slev slev 0.0 slev
b --- 0.0 --- 0.0
c clev clev 0.0 0.0
d --- --- slev 0.0
e --- --- --- 0.0
f --- --- 1.0 1.0
g 0.7 * slev 0.0 0.0 0.0
h --- slev --- slev
i clev clev 0.0 0.0
3.3.2 Karaoke capable decoders
Karaoke capable decoders allow the user to choose to have the decoder reproduce
none, one, or both of the V channels. The default coefficient values for the karaoke
capable decoder are given in Table 2. When the listener selects to have none, one, or both
of the V channels reproduced, the default coefficients are given in Table 3. Values are
shown for both 2-channel (2/0) and multi-channel (3/0) reproduction, and for the cases of
user selected reproduction of no V channel (None), one V channel (either V1 or V2), or
both V channels (V1+V2). The M channel and a single V channel are reproduced out of
the center output (phantom center in 2/0 reproduction), and a pair of V channels are
reproduced out of the left (V1) and right (V2) outputs. The actual coefficients used must
be scaled downwards so that arithmetic overflow does not occur if all channels
contributing to an output happen to be at full scale.
Table 3 Default Coefficient Values for Karaoke Capable Decoders
2/0 Reproduction 3/0 Reproduction
Coefficient None V1 V2 V1+V2 None V1 V2 V1+V2
a 0.0 0.7 0.0 1.0 0.0 0.0 0.0 1.0
b 0.0 0.0 0.7 0.0 0.0 0.0 0.0 0.0
c clev clev clev clev 0.0 0.0 0.0 0.0
d --- --- --- --- 0.0 1.0 0.0 0.0
e --- --- --- --- 0.0 0.0 1.0 0.0
f --- --- --- --- 1.0 1.0 1.0 1.0
g 0.0 0.7 0.0 0.0 0.0 0.0 0.0 0.0
h 0.0 0.0 0.7 1.0 0.0 0.0 0.0 1.0
i clev clev clev clev 0.0 0.0 0.0 0.0
— 129 —
ATSC Digital Audio Compression (AC-3) Standard (Annex C) 20 Dec 95
Additional flexibility may be offered optionally to the user of the karaoke decoder.
For instance, the coefficients a, d, and g might be adjusted to allow the V1 channel to be
reproduced in a different location and with a different level. Similarly the level and
location of the V2 and M channels could be adjusted. The details of these additional
optional user controls are not specified and are left up to the implementation. Also left up
to the implementation is what use might be made of the Ls, Rs outputs of the 5-channel
decoder, which would naturally reproduce the V1, V2 channels.
— 130 —