<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>
<!DOCTYPE rfc [
<!ENTITY docname "draft-swhited-ogg-stems-05">
]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="info" docName="&docname;" ipr="trust200902" obsoletes="" updates="" xml:lang="en" version="3" submissionType="independent">
  <front>
    <title abbrev="Ogg Stem">Ogg Stem Files</title>
    <seriesInfo name="Internet-Draft" value="&docname;"/>
    <author fullname="Sam Whited" initials="ssw" role="editor" surname="Whited">
      <address>
        <email>sam@samwhited.com</email>
        <uri>https://blog.samwhited.com</uri>
      </address>
    </author>
    <date year="2026" month="4" day="1"/>
    <area>General</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <keyword>audio</keyword>
    <keyword>ogg</keyword>
    <keyword>stems</keyword>
    <keyword>djing</keyword>
    <abstract>
      <t>
        This document defines a multi-track profile of the Ogg container format
        for storing stems for use by DJ applications while remaining backward
        compatible with existing media players.
      </t>
    </abstract>
  </front>
  <middle>
    <section>
      <name>Introduction</name>
      <t>
        Stems are recordings of individual instruments, or clusters of
        instruments, used by DJs and music producers for live mixing of music.
        Historically, stem files have been stored as individual audio files or
        in patent-encumbered or vendor-specific proprietary container formats.
        The Ogg container format, developed by the Xiph.Org Foundation and
        specified in <xref target="RFC3533"/> and <xref target="RFC5334"/>, is
        well suited as a container for stems because it can interleave several
        logical bitstreams, such as one per stem, in a single file.
      </t>
      <t>
        This specification documents a profile of the Ogg container format
        that allows it to store lossless or lossy stems, as well as metadata
        about the stems, in a single file for use in DJ applications.
      </t>
      <section anchor="requirements">
        <name>Requirements Language</name>
        <t>
          The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
          "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>",
          "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>",
          "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>",
          "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and
          "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
          described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/>
          when, and only when, they appear in all capitals, as shown here.
        </t>
      </section>
    </section>
    <section>
      <name>Requirements</name>
      <t>
        Stem files have a few basic requirements:
      </t>
      <ul spacing="normal">
        <li>Backwards compatibility with existing media players</li>
        <li>The ability to store multiple audio tracks</li>
        <li>The ability to synchronize playback of multiple audio tracks</li>
        <li>The ability to store file-level or bitstream-level metadata and per-stem metadata</li>
        <li>Backwards compatibility when additional tracks have unknown formats that cannot be decoded</li>
      </ul>
    </section>
    <section>
      <name>Bitstream Layout</name>
      <section>
        <name>Audio Streams</name>
        <t>
          Each stem file may contain an arbitrary number of logical
          bitstreams containing audio and <bcp14>MUST</bcp14> include at least
          three audio streams: the main mix and at least two stems.
          Each stream <bcp14>SHOULD</bcp14> be encoded using the same codec with
          the same parameters, including bitrate, number of channels, channel
          layout, and sample rate.
        </t>
        <t>
          The first logical bitstream containing audio data <bcp14>MUST</bcp14>
          be the final post-mix audio.
          This preserves backward compatibility with media players that do not
          support this format, which typically play the first audio stream
          they find.
          The remaining audio logical bitstreams are individual stems and
          <bcp14>SHOULD</bcp14> have the same effective audio length (after
          calculating offsets from the granule position) as the first logical
          bitstream, such that playing all of the stem streams together from
          the beginning would result in the same audio (excluding mastering) as
          the final mix present in the first logical bitstream.
        </t>
        <t>
          For example, if the main mix is three minutes long and the stem file
          includes a percussion stem, but the percussion does not start until
          the one-minute mark, the percussion stem would still be three minutes
          long and would contain a minute of silence at the start of the track.
          Alternatively, depending on the codec in use, it could contain two
          minutes of audio with granule positions offset by the equivalent of
          one minute.
        </t>
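        <t>
          The following non-normative Python sketch works through the
          arithmetic of this example. It assumes that granule positions count
          PCM samples at a fixed 48 kHz rate (the granule clock that Ogg Opus
          always uses) and models a stem's start offset as a constant added to
          its granule positions. Codec-specific details such as Opus pre-skip
          are ignored, and the helper names are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
# Non-normative sketch: granule arithmetic for a stem that starts late.
# Assumption: granule positions count PCM samples at GRANULE_RATE, and a
# stem's start offset is a constant added to its granule positions.
GRANULE_RATE = 48_000  # assumed granule clock (Ogg Opus always uses 48 kHz)


def seconds_to_granules(seconds: float, rate: int = GRANULE_RATE) -> int:
    """Convert a duration in seconds to a number of granules."""
    return round(seconds * rate)


def effective_end(start_offset: int, sample_count: int) -> int:
    """Granule position at which a stream's audio ends."""
    return start_offset + sample_count


# Three-minute main mix starting at granule 0.
main_mix_end = effective_end(0, seconds_to_granules(180))

# Option 1: a three-minute stem that begins with one minute of silence.
padded_end = effective_end(0, seconds_to_granules(180))

# Option 2: a two-minute stem whose granule positions start one minute in.
trimmed_end = effective_end(seconds_to_granules(60), seconds_to_granules(120))

# Both layouts end at the same granule position as the main mix.
assert padded_end == trimmed_end == main_mix_end == 8_640_000
]]></sourcecode>
        <t>
          Both layouts end on the same granule position as the main mix, which
          is the property the effective-length recommendation above is meant
          to ensure.
        </t>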
      </section>
      <section anchor="skeleton">
        <name>Skeleton Track</name>
        <t>
          Ogg Skeleton <xref target="I-D.swhited-ogg-skeleton"/> is a format
          designed to provide structuring information for multi-track Ogg files.
          Each stem file <bcp14>MUST</bcp14> include a Skeleton bitstream, which
          <bcp14>SHOULD</bcp14> include keypoint indexes for each stem and for
          the main mix.
        </t>
        <t>
          Each fisbone secondary header packet describing a logical bitstream
          containing a stem track <bcp14>SHOULD</bcp14> set the <tt>role</tt>
          header to the value <tt>audio/stem</tt>.
          Similarly, the fisbone secondary header packet describing the first
          logical bitstream containing the main audio <bcp14>SHOULD</bcp14> set
          the <tt>role</tt> header to <tt>audio/main</tt>.
        </t>
        <t>
          In addition, fisbone headers describing a stem track
          <bcp14>SHOULD</bcp14> set a header with the name <tt>stem_color</tt>
          to a color value in RGB hex format, such as <tt>#135374</tt>, which
          <bcp14>MAY</bcp14> be used to represent the stem in graphical playback
          software such as DJ control software.
        </t>
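        <t>
          As a non-normative sketch, the message header fields for the main mix
          and for a stem might be assembled as shown below. The sketch assumes
          that fisbone message header fields are encoded as CRLF-terminated
          "Name: value" lines and carry a Content-Type field, as in earlier
          versions of Ogg Skeleton; <xref target="I-D.swhited-ogg-skeleton"/>
          is authoritative for the actual encoding. It also assumes the
          six-digit color form shown above. The function names are
          hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
import re
from typing import Optional

# Assumption: stem colors use the six-digit "#RRGGBB" form shown above.
STEM_COLOR_RE = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_stem_color(value: str) -> tuple[int, int, int]:
    """Return the (red, green, blue) components of a stem_color value."""
    if not STEM_COLOR_RE.fullmatch(value):
        raise ValueError(f"unsupported stem_color: {value!r}")
    return int(value[1:3], 16), int(value[3:5], 16), int(value[5:7], 16)


def fisbone_fields(content_type: str, role: str,
                   stem_color: Optional[str] = None) -> bytes:
    """Build fisbone message header fields for a main-mix or stem bitstream.

    Assumption: each field is a CRLF-terminated "Name: value" line.
    """
    fields = [("Content-Type", content_type), ("role", role)]
    if stem_color is not None:
        parse_stem_color(stem_color)  # reject malformed colors early
        fields.append(("stem_color", stem_color))
    return "".join(f"{name}: {value}\r\n" for name, value in fields).encode("utf-8")
]]></sourcecode>
        <t>
          For example, <tt>fisbone_fields("audio/opus", "audio/stem",
          "#135374")</tt> would produce fields for an Opus stem drawn in the
          color (19, 83, 116).
        </t>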
      </section>
      <section>
        <name>DSP Metadata</name>
        <t>
          Metadata that applies to all of the stems is poorly suited to storage
          in the metadata blocks of individual streams, for several reasons:
        </t>
        <ol spacing="normal">
          <li>
            In the absence of a standard, many applications only store
            metadata on the first stream, but in a stem file this is the one
            stream to which none of this metadata applies
          </li>
          <li>
            Applications meant for writing general metadata may remove unknown
            values from the first stream's metadata
          </li>
          <li>
            Some stem metadata applies to all stem streams but not to the main
            mix stream, and duplicating it on every stem stream is redundant
            and allows the copies to disagree
          </li>
        </ol>
        <t>
          Storing this metadata in Skeleton fisbone headers
          (<xref target="skeleton"/>) is similarly unsuitable: each fisbone
          header describes a single logical bitstream, whereas this metadata
          applies to the mix as a whole rather than to any individual stem
          track.
        </t>
        <t>
          To work around these limitations, stem files store metadata that
          applies to all stems (notably information for configuring a basic
          Digital Signal Processor, or DSP) in a separate logical bitstream,
          the first packet of which is structured according to the following
          table:
        </t>
        <table>
          <name>Stem metadata logical bitstream header</name>
          <thead>
            <tr>
              <th>Length</th>
              <th>Description</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>8 bytes</td>
              <td>0x53 0x74 0x65 0x6d 0x4d 0x65 0x74 0x61 ("StemMeta")</td>
            </tr>
            <tr>
              <td>2 bytes</td>
              <td>
                Version number of the mapping used by the metadata logical
                bitstream (notably, this is not a version of the metadata it
                carries).
                The first byte is the major version and the second is the minor
                version; these bytes are 0x01 0x00, meaning version 1.0 of the
                mapping.
              </td>
            </tr>
          </tbody>
        </table>
        <t>
          The remainder of the logical bitstream comprises a Vorbis comment
          metadata block containing human-readable information coded in
          UTF-8.
          This block is identical to the Vorbis comment metadata block defined
          in <xref target="RFC9639"/>, Section 8.6 ("Vorbis Comment").
          The name "Vorbis comment" reflects the fact that the Vorbis codec
          stores such metadata in almost the same way (see
          <xref target="Vorbis"/>).
          A stem file <bcp14>MUST NOT</bcp14> contain more than one Vorbis
          comment metadata block.
        </t>
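        <t>
          A non-normative sketch of reading the first packet of the stem
          metadata bitstream follows. It assumes that the Vorbis comment block
          immediately follows the 10-byte header in the same packet, which is
          one reading of the layout above; an implementation that finds the
          block in a later packet would pass that packet to
          <tt>parse_vorbis_comment</tt> instead. Length fields are unsigned
          32-bit little-endian integers, as in the Vorbis comment block of
          <xref target="RFC9639"/>. Both function names are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
import struct

STEM_META_MAGIC = b"StemMeta"  # 0x53 0x74 0x65 0x6d 0x4d 0x65 0x74 0x61


def parse_stem_meta_header(packet: bytes) -> tuple[int, int, bytes]:
    """Return (major, minor, rest) for the first stem metadata packet."""
    if len(packet) < 10 or packet[:8] != STEM_META_MAGIC:
        raise ValueError("not a stem metadata packet")
    return packet[8], packet[9], packet[10:]


def parse_vorbis_comment(block: bytes) -> tuple[str, list[tuple[str, str]]]:
    """Parse a Vorbis comment block into (vendor, [(NAME, value), ...])."""
    offset = 0

    def read_u32() -> int:
        nonlocal offset
        if offset + 4 > len(block):
            raise ValueError("truncated Vorbis comment block")
        (value,) = struct.unpack_from("<I", block, offset)  # little-endian
        offset += 4
        return value

    def read_utf8() -> str:
        nonlocal offset
        length = read_u32()
        if offset + length > len(block):
            raise ValueError("truncated Vorbis comment block")
        text = block[offset:offset + length].decode("utf-8")
        offset += length
        return text

    vendor = read_utf8()
    comments = []
    for _ in range(read_u32()):
        field = read_utf8()
        name, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"comment without '=': {field!r}")
        # Field names are case-insensitive; normalize them to upper case.
        comments.append((name.upper(), value))
    return vendor, comments
]]></sourcecode>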
        <t>
          The Vorbis comment metadata block <bcp14>SHOULD NOT</bcp14> be used
          for arbitrary metadata that is unrelated to stems (for example, a
          track title or author).
          Vendor-specific tags <bcp14>MAY</bcp14> be included in the metadata
          block.
          Vendor-specific tags in the block <bcp14>SHOULD</bcp14> use a
          vendor-specific namespace and <bcp14>MUST NOT</bcp14> prefix their
          tags with "STEM:".
          The tags defined by this document are listed in
          <xref target="mastering"/>.
        </t>
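        <t>
          The namespace rule above could be checked with a non-normative sketch
          like the following, which flags any "STEM:" tag that is not one of
          those listed in <xref target="mastering"/>. Such a tag is either a
          vendor tag that violates the rule above or comes from a later
          revision of this document. The tag names are copied from the tables
          in that section; the function name is hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
# Tag names copied from the compressor and limiter tables in this document.
_COMPRESSOR = ("ENABLED", "RATIO", "OUTPUT_GAIN", "THRESHOLD", "ATTACK",
               "INPUT_GAIN", "RELEASE", "HP_CUTOFF", "HP_DRY_WET")
_LIMITER = ("ENABLED", "RELEASE", "THRESHOLD", "CEILING")

DEFINED_STEM_TAGS = frozenset(
    [f"STEM:COMPRESSOR:{name}" for name in _COMPRESSOR]
    + [f"STEM:LIMITER:{name}" for name in _LIMITER]
)


def undefined_stem_tags(names):
    """Return the upper-cased "STEM:" tags that this document does not define.

    Vorbis comment field names are case-insensitive, so names are compared
    in upper case. The result is sorted for stable reporting.
    """
    upper = {name.upper() for name in names}
    return sorted(n for n in upper
                  if n.startswith("STEM:") and n not in DEFINED_STEM_TAGS)
]]></sourcecode>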
      </section>
    </section>
    <section>
      <name>Mixing</name>
      <t>
        The stem tracks <bcp14>SHOULD NOT</bcp14> have any gain normalization
        applied.
        Instead, they should retain the same levels that they have in the final
        mix present in the first track, so that playing all of the stems
        together at unity gain reproduces the levels of the final mix.
      </t>
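      <t>
        The unity-gain property can be illustrated with a short, non-normative
        sketch that sums decoded stems sample by sample. It assumes the stems
        have already been decoded to floating-point PCM of equal length and
        sample rate with their start offsets applied; <tt>mix_stems</tt> and
        its <tt>gains</tt> parameter are hypothetical.
      </t>
      <sourcecode type="python"><![CDATA[
from typing import Optional, Sequence


def mix_stems(stems: Sequence[Sequence[float]],
              gains: Optional[Sequence[float]] = None) -> list[float]:
    """Sum decoded stems sample by sample, applying an optional per-stem gain.

    With the default gains of 1.0 (unity), the result should match the main
    mix, excluding mastering, when the stems follow this profile.
    """
    if not stems:
        return []
    if gains is None:
        gains = [1.0] * len(stems)
    if len(gains) != len(stems):
        raise ValueError("need exactly one gain per stem")
    length = len(stems[0])
    if any(len(stem) != length for stem in stems):
        raise ValueError("stems must have the same effective length")
    return [sum(gain * stem[i] for gain, stem in zip(gains, stems))
            for i in range(length)]
]]></sourcecode>
      <t>
        A DJ application would vary the per-stem gains during a performance;
        leaving them all at 1.0 is the case the recommendation above
        addresses.
      </t>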
    </section>
    <section anchor="mastering">
      <name>Mastering</name>
      <t>
        Because mastering happens post-mix and the stems are pre-mix audio, the
        stem tracks <bcp14>SHOULD NOT</bcp14> have any mastering steps applied.
        Instead, metadata for configuring a compressor and limiter
        <bcp14>SHOULD</bcp14> be included in the Vorbis comment metadata block
        of the stem metadata bitstream.
        After mixing, playback applications <bcp14>MAY</bcp14> choose to feed
        the mix through a Digital Signal Processor (DSP) configured with the
        limiter and compressor settings read from the metadata.
      </t>
      <t>
        Each numeric DSP setting is stored as a floating-point number with a
        minimum value of 0.0 and a maximum value of 1.0.
        These numbers are stored as strings and <bcp14>MUST</bcp14> use the "."
        character rather than "," as the decimal separator.
        The strings <bcp14>MUST</bcp14> consist only of the ASCII digits "0"
        to "9" and the "." character.
        Digit grouping delimiters <bcp14>MUST NOT</bcp14> be used.
        Both the integer and fractional parts are in base 10.
      </t>
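      <t>
        A strict parser for these strings might look like the following
        non-normative sketch. It checks the string against the permitted
        characters before converting it, because Python's <tt>float()</tt> on
        its own would also accept forms such as "1e-1", "nan", "1_0", or
        surrounding whitespace, which this format does not allow. Whether
        forms such as ".5" or "1." are acceptable is not stated above; the
        sketch accepts them, which is an assumption. The name
        <tt>parse_dsp_value</tt> is hypothetical.
      </t>
      <sourcecode type="python"><![CDATA[
import re

# ASCII digits only, at most one ".", and at least one digit overall.
# Assumption: leading or trailing "." (".5", "1.") is tolerated.
DSP_VALUE_RE = re.compile(r"[0-9]+(?:\.[0-9]*)?|\.[0-9]+")


def parse_dsp_value(text: str) -> float:
    """Parse a DSP setting string such as "0.75" into a float in [0.0, 1.0]."""
    if not DSP_VALUE_RE.fullmatch(text):
        raise ValueError(f"invalid DSP setting: {text!r}")
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"DSP setting out of range: {text!r}")
    return value
]]></sourcecode>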
      <t>
        It is <bcp14>RECOMMENDED</bcp14> that applications displaying the
        compressor or limiter settings support replacing the "." with a
        locale-specific decimal separator.
        Locale-specific digit grouping <bcp14>MAY</bcp14> be used by
        applications displaying the settings.
      </t>
      <t>
        Because different DSPs may use different ranges or scales for each
        value, the playback software <bcp14>SHOULD</bcp14> interpret the 0-1
        values as a linear scale and map them to the range and scale required
        by the DSP when configuring the DSP for playback.
        This may result in a loss of fidelity on some DSPs, but this is deemed
        an acceptable trade-off for stem playback, which would otherwise have
        no mastering step at all.
      </t>
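      <t>
        For a DSP parameter with a linear range, the mapping might be sketched
        as follows. The ranges in <tt>EXAMPLE_RANGES</tt> are hypothetical and
        do not come from this document or from any particular DSP. A DSP
        whose control uses a non-linear scale would need its own conversion,
        which is outside the scope of this sketch.
      </t>
      <sourcecode type="python"><![CDATA[
def to_dsp_range(value: float, low: float, high: float) -> float:
    """Linearly map a normalized 0.0-1.0 setting onto [low, high]."""
    value = min(max(value, 0.0), 1.0)  # clamp defensively
    return low + value * (high - low)


# Hypothetical parameter ranges for an example compressor.
EXAMPLE_RANGES = {
    "STEM:COMPRESSOR:RATIO": (1.0, 20.0),       # ratio, 1:1 to 20:1
    "STEM:COMPRESSOR:THRESHOLD": (-60.0, 0.0),  # dBFS
}
]]></sourcecode>
      <t>
        With these example ranges, a stored threshold of "0.5" would configure
        the DSP at -30 dBFS, and a stored ratio of "0.25" at 5.75:1.
      </t>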
      <section>
        <name>Compressor Metadata</name>
        <table>
          <name>Compressor metadata tags</name>
          <thead>
            <tr>
              <th>Tag</th>
              <th>Requirement Level</th>
              <th>Values</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>STEM:COMPRESSOR:ENABLED</td>
              <td>
                <bcp14>REQUIRED</bcp14>
              </td>
              <td>"TRUE" or "FALSE"</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:RATIO</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:OUTPUT_GAIN</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:THRESHOLD</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:ATTACK</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:INPUT_GAIN</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:RELEASE</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:HP_CUTOFF</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>STEM:COMPRESSOR:HP_DRY_WET</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
          </tbody>
        </table>
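        <t>
          Reading the tags in this table into a settings mapping could be
          sketched as below. The same helper also works for the limiter tags
          when passed the "LIMITER" group. The sketch assumes the comments have
          already been parsed into a dictionary with upper-case names. It
          matches "TRUE" and "FALSE" exactly as written in the tables, which is
          an assumption, and leaves absent OPTIONAL settings as
          <tt>None</tt> so the application can fall back to its own defaults.
          <tt>read_dsp_group</tt> is a hypothetical name.
        </t>
        <sourcecode type="python"><![CDATA[
import re
from typing import Dict, Optional

# ASCII digits with at most one "." (see the numeric rules above).
_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]*)?|\.[0-9]+")

COMPRESSOR_SETTINGS = ("RATIO", "OUTPUT_GAIN", "THRESHOLD", "ATTACK",
                       "INPUT_GAIN", "RELEASE", "HP_CUTOFF", "HP_DRY_WET")


def read_dsp_group(tags: Dict[str, str], group: str,
                   settings: tuple[str, ...]) -> Optional[Dict[str, Optional[float]]]:
    """Read STEM:<group>:* tags, returning None if the group is disabled."""
    enabled = tags.get(f"STEM:{group}:ENABLED")
    if enabled not in ("TRUE", "FALSE"):
        raise ValueError(f"STEM:{group}:ENABLED is required and must be TRUE or FALSE")
    if enabled == "FALSE":
        return None
    result: Dict[str, Optional[float]] = {}
    for name in settings:
        raw = tags.get(f"STEM:{group}:{name}")
        if raw is None:
            result[name] = None  # OPTIONAL and absent: use a DSP default
        elif _NUMBER.fullmatch(raw) and 0.0 <= float(raw) <= 1.0:
            result[name] = float(raw)
        else:
            raise ValueError(f"invalid value for STEM:{group}:{name}: {raw!r}")
    return result
]]></sourcecode>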
      </section>
      <section>
        <name>Limiter Metadata</name>
        <table>
          <name>Limiter metadata tags</name>
          <thead>
            <tr>
              <th>Tag</th>
              <th>Requirement Level</th>
              <th>Values</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>STEM:LIMITER:ENABLED</td>
              <td>
                <bcp14>REQUIRED</bcp14>
              </td>
              <td>"TRUE" or "FALSE"</td>
            </tr>
            <tr>
              <td>STEM:LIMITER:RELEASE</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>STEM:LIMITER:THRESHOLD</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>STEM:LIMITER:CEILING</td>
              <td>
                <bcp14>OPTIONAL</bcp14>
              </td>
              <td>0.0-1.0</td>
            </tr>
          </tbody>
        </table>
      </section>
    </section>
    <section anchor="IANA">
      <name>IANA Considerations</name>
      <t>This memo includes no request to IANA.</t>
    </section>
    <section anchor="Security">
      <name>Security Considerations</name>
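      <t>
        This document defines a profile of existing formats and introduces no
        security considerations beyond those of the Ogg container format
        <xref target="RFC3533"/> and of the Vorbis comment metadata block
        <xref target="RFC9639"/>.
      </t>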
    </section>
  </middle>
  <back>
    <references>
      <name>Normative References</name>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.3533.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5334.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9639.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.swhited-ogg-skeleton.xml"/>
    </references>
    <references>
      <name>Informative References</name>
      <reference anchor="Vorbis" target="https://xiph.org/vorbis/doc/Vorbis_I_spec.html">
        <front>
          <title>Vorbis I specification</title>
          <author>
            <organization>Xiph.Org Foundation</organization>
          </author>
          <date year="2020" month="07" day="04"/>
        </front>
      </reference>
    </references>
  </back>
</rfc>
