MPEG Audio Byte Stream Format

W3C Group Note 23 July 2024

Abstract

This specification defines a Media Source Extensions™ [MEDIA-SOURCE] byte stream format based on MPEG audio streams.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

The working group maintains a list of all bug reports that the editors have not yet tried to address; there may also be related open bugs in the GitHub repository of the Media Source Extensions™ specification.

This document was published by the Media Working Group as a Group Note using the Note track.

This Group Note is endorsed by the Media Working Group, but is not endorsed by W3C itself nor its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The W3C Patent Policy does not carry any licensing requirements or commitments on this document.

This document is governed by the 03 November 2023 W3C Process Document.

1. Introduction

This specification defines segment formats for implementations of Media Source Extensions™ [MEDIA-SOURCE] that choose to support MPEG audio streams specified in [ISO11172-3], [ISO13818-3], and [ISO14496-3].

It defines the MIME-types (see 2. MIME-types) used to signal codecs, and provides the necessary format-specific definitions for initialization segments, media segments, and random access points required by the Byte Stream Formats section of the Media Source Extensions™ specification. This document also defines extra behaviors and state that apply only to this byte stream format.

2. MIME-types

This section specifies the MIME-types that may be passed to isTypeSupported() or addSourceBuffer() for byte streams that conform to this specification.

"audio/aac" for sequences of ADTS frames, as specified in [ISO14496-3].
"audio/mpeg" for MPEG-1/2/2.5 Layer I/II/III streams, as specified in [RFC3003].

The "codecs" MIME-type parameter MUST NOT be used with these MIME-types.
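
This example is non-normative. The rules above can be sketched as a small check on a type string before it is handed to isTypeSupported() or addSourceBuffer(). The function name is_acceptable_mime_type and the constant SUPPORTED_MIME_TYPES are hypothetical names introduced for illustration, and the naive split on ";" assumes that no quoted parameter value contains a semicolon.

```python
# Hypothetical helper, not part of any API defined by this specification.
SUPPORTED_MIME_TYPES = {"audio/aac", "audio/mpeg"}


def is_acceptable_mime_type(type_string: str) -> bool:
    """Return True if the type string names one of the two MIME-types
    and does not carry a "codecs" parameter."""
    # Split "type/subtype; name=value; ..." into the essence and its parameters.
    essence, *params = [part.strip() for part in type_string.split(";")]
    if essence.lower() not in SUPPORTED_MIME_TYPES:
        return False
    for param in params:
        name = param.split("=", 1)[0].strip().lower()
        if name == "codecs":
            # The "codecs" parameter MUST NOT be used with these MIME-types.
            return False
    return True
```

Under these assumptions, "audio/aac" and "audio/mpeg" pass the check, while a string such as 'audio/aac; codecs="mp4a.40.2"' is rejected.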

3. MPEG Audio Frames

The format of an MPEG Audio Frame depends on the MIME-type used (see 2. MIME-types).

If the "audio/aac" MIME-type is used, an MPEG Audio Frame is a sequence of bytes that conform to the adts_frame() syntax specified in Table 1.A.5 of [ISO14496-3].
If the "audio/mpeg" MIME-type is used, an MPEG Audio Frame is a sequence of bytes that conform to the frame() syntax element specified in Section 2.4.1.2 of [ISO11172-3] or the corresponding definition in [ISO13818-3].
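
This example is non-normative. As a rough illustration of how the two frame syntaxes differ at the byte level, the sketch below checks whether a buffer position looks like the start of an MPEG Audio Frame for a given MIME-type, and reads the 13-bit frame length field from an ADTS header. The function names are hypothetical, and the bit layouts are stated from the commonly documented header structures (a 12-bit syncword with layer bits 00 for ADTS; an 11-bit sync pattern, which also admits MPEG-2.5, for "audio/mpeg"). Consult the referenced ISO documents for the authoritative syntax.

```python
def is_frame_start(mime_type: str, data: bytes, offset: int = 0) -> bool:
    """Heuristically test whether data[offset:] begins an MPEG Audio Frame."""
    if len(data) - offset < 2:
        return False
    b0, b1 = data[offset], data[offset + 1]
    if b0 != 0xFF:
        return False
    if mime_type == "audio/aac":
        # Upper 4 bits complete the 12-bit syncword; the 2 layer bits must be 00.
        return (b1 & 0xF6) == 0xF0
    if mime_type == "audio/mpeg":
        version = (b1 >> 3) & 0x03  # 0b01 is a reserved version value
        layer = (b1 >> 1) & 0x03    # 0b00 is a reserved layer value
        return (b1 & 0xE0) == 0xE0 and version != 0b01 and layer != 0b00
    raise ValueError(f"unsupported MIME-type: {mime_type}")


def adts_frame_length(data: bytes, offset: int = 0) -> int:
    """Read the 13-bit frame length (header included) from an ADTS header."""
    if len(data) - offset < 7:
        raise ValueError("an ADTS header needs at least 7 bytes")
    return (
        ((data[offset + 3] & 0x03) << 11)
        | (data[offset + 4] << 3)
        | (data[offset + 5] >> 5)
    )
```

A sync match alone does not prove that a valid frame follows; a fuller parser would also validate the remaining header fields before accepting the frame.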

4. Metadata Frames

Since [ID3v1] and [ID3v2] metadata frames and Icecast headers are common in existing MPEG audio streams, implementations SHOULD gracefully handle such frames. Zero or more of these metadata frames are allowed to occur before, after, or between MPEG Audio Frames. Minimal implementations MUST accept, consume, and ignore these frames. More advanced implementations MAY choose to expose the metadata information via an in-band TextTrack or some other mechanism.
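
This example is non-normative. A minimal implementation that consumes and ignores ID3 metadata needs to know how many bytes to skip. The sketch below, with the hypothetical name id3_frame_length, assumes the ID3v2.3.0 layout (a 10-byte header whose last four bytes hold a synchsafe size, 7 bits per byte) and the fixed 128-byte ID3v1 tag beginning with "TAG". It does not handle extensions from other ID3 versions, such as footers.

```python
from typing import Optional

ID3V2_HEADER_SIZE = 10  # "ID3", 2 version bytes, 1 flags byte, 4 size bytes
ID3V1_TAG_SIZE = 128    # fixed-size tag beginning with "TAG"


def id3_frame_length(data: bytes, offset: int = 0) -> Optional[int]:
    """Return the total length of the ID3 tag starting at offset, or None
    if no complete, well-formed ID3 header is found there."""
    if data.startswith(b"ID3", offset):
        header = data[offset:offset + ID3V2_HEADER_SIZE]
        if len(header) < ID3V2_HEADER_SIZE:
            return None  # header not fully buffered yet
        size_bytes = header[6:10]
        if any(b & 0x80 for b in size_bytes):
            return None  # not a valid synchsafe integer
        size = 0
        for b in size_bytes:
            size = (size << 7) | b
        return ID3V2_HEADER_SIZE + size
    if data.startswith(b"TAG", offset):
        return ID3V1_TAG_SIZE
    return None
```

The returned length may exceed the bytes currently buffered; a caller would then wait for more data before skipping the tag.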

4.1 Icecast headers

There is no normative specification for Icecast/SHOUTcast headers, only examples. For the purpose of this specification, an Icecast header is defined as beginning with the 4-character sequence "ICY " (U+0049 I, U+0043 C, U+0059 Y, U+0020 SPACE) and ending with a pair of carriage-return line-feed sequences (U+000D CARRIAGE RETURN, U+000A LINE FEED, U+000D CARRIAGE RETURN, U+000A LINE FEED).

Note

Icecast headers are allowed in the byte streams because some Icecast and SHOUTcast servers return a status line that looks like "ICY 200 OK" instead of a standard HTTP status line. User-agent network stacks typically interpret this as an HTTP 0.9 response and include the header in the response body. Allowing these headers to appear provides a simple way to interoperate with these servers.
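
This example is non-normative. The Icecast header definition in this section translates directly into a prefix test followed by a search for the terminating pair of CRLF sequences. The function name icecast_header_length and the sample header contents are hypothetical and are used only for illustration.

```python
from typing import Optional

ICY_PREFIX = b"ICY "          # U+0049, U+0043, U+0059, U+0020
ICY_TERMINATOR = b"\r\n\r\n"  # a pair of CRLF sequences


def icecast_header_length(data: bytes, offset: int = 0) -> Optional[int]:
    """Return the length of the Icecast header starting at offset, or None
    if no header starts there or its terminator has not arrived yet."""
    if not data.startswith(ICY_PREFIX, offset):
        return None
    end = data.find(ICY_TERMINATOR, offset + len(ICY_PREFIX))
    if end == -1:
        return None  # header incomplete; more data is needed
    return end + len(ICY_TERMINATOR) - offset


# Usage: skip the header, then continue parsing at the first audio byte.
stream = b"ICY 200 OK\r\nicy-name: Example\r\n\r\n\xff\xfb\x90\x00"
header_length = icecast_header_length(stream)
audio = stream[header_length:] if header_length is not None else stream
```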

5. Segment Definitions

The MPEG audio byte stream is a combination of one or more MPEG Audio Frames and zero or more metadata frames.

Every MPEG Audio Frame is a random access point.
Every MPEG Audio Frame header is an initialization segment.
The coded audio in each MPEG Audio Frame is a media segment.
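
This example is non-normative. To make the segment definitions above concrete, the sketch below walks an "audio/aac" byte stream and yields, for each MPEG Audio Frame, the header bytes (the initialization segment) and the remaining bytes (the media segment). The name split_adts_stream is hypothetical. Treating the first 7 bytes of each ADTS frame as its header is an assumption of this sketch, as is the simplification that the stream contains no metadata frames.

```python
from typing import Iterator, Tuple

ADTS_HEADER_SIZE = 7  # assumption: fixed plus variable ADTS header, no CRC


def split_adts_stream(data: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (initialization_segment, media_segment) for each complete
    ADTS frame in data. Every yielded frame is a random access point."""
    offset = 0
    while offset + ADTS_HEADER_SIZE <= len(data):
        # 12-bit syncword 0xFFF followed by layer bits 00.
        if data[offset] != 0xFF or (data[offset + 1] & 0xF6) != 0xF0:
            raise ValueError(f"no ADTS syncword at offset {offset}")
        frame_length = (
            ((data[offset + 3] & 0x03) << 11)
            | (data[offset + 4] << 3)
            | (data[offset + 5] >> 5)
        )
        if frame_length < ADTS_HEADER_SIZE:
            raise ValueError(f"malformed frame length at offset {offset}")
        if offset + frame_length > len(data):
            break  # trailing frame is incomplete; wait for more data
        header = data[offset:offset + ADTS_HEADER_SIZE]
        coded_audio = data[offset + ADTS_HEADER_SIZE:offset + frame_length]
        yield header, coded_audio
        offset += frame_length
```

Because every frame carries its own header, playback can begin at any frame boundary, which is why each frame qualifies as a random access point.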

6. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, MUST NOT, and SHOULD in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

A. References

A.1 Normative references

[HTML]: HTML Standard. Anne van Kesteren; Domenic Denicola; Ian Hickson; Philip Jägenstedt; Simon Pieters. WHATWG. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[ID3v1]: ID3 tag version 1. id3.org. URL: https://id3.org/ID3v1
[ID3v2]: ID3 tag version 2.3.0. id3.org. URL: https://id3.org/id3v2.3.0
[ISO11172-3]: Information technology — Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s — Part 3: Audio. ISO/IEC. August 1993. Published. URL: https://www.iso.org/standard/22412.html
[ISO13818-3]: Information technology — Generic coding of moving pictures and associated audio information — Part 3: Audio. ISO/IEC. April 1998. Published. URL: https://www.iso.org/standard/26797.html
[ISO14496-3]: Information technology — Coding of audio-visual objects — Part 3: Audio. ISO/IEC. December 2019. Published. URL: https://www.iso.org/standard/76383.html
[MEDIA-SOURCE]: Media Source Extensions™. Jean-Yves Avenard; Mark Watson. W3C. 4 July 2024. W3C Working Draft. URL: https://www.w3.org/TR/media-source-2/
[RFC2119]: Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[RFC3003]: The audio/mpeg Media Type. M. Nilsson. IETF. November 2000. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc3003
[RFC8174]: Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174
