<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>
<!DOCTYPE rfc [
<!ENTITY docname "draft-swhited-mka-stems-07">
]>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="info" docName="&docname;" ipr="trust200902" obsoletes="" updates="" xml:lang="en" version="3" submissionType="independent">
  <front>
    <title abbrev="MKA Stem">Matroska Stem Files</title>
    <seriesInfo name="Internet-Draft" value="&docname;"/>
    <author fullname="Sam Whited" initials="ssw" role="editor" surname="Whited">
      <address>
        <email>sam@samwhited.com</email>
        <uri>https://blog.samwhited.com</uri>
      </address>
    </author>
    <date year="2026" month="3" day="28"/>
    <area>General</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <keyword>audio</keyword>
    <keyword>matroska</keyword>
    <keyword>stems</keyword>
    <keyword>djing</keyword>
    <abstract>
      <t>
        This document defines a multi-track profile of the Matroska container
        format for storing stems for use by DJ applications while remaining
        backwards compatible with existing media players.
      </t>
    </abstract>
  </front>
  <middle>
    <section>
      <name>Introduction</name>
      <t>
        Stems are recordings of individual instruments, or clusters of
        instruments, used by DJs and music producers for live mixing of music.
        Historically, stems have been stored as individual audio files or in
        patent-encumbered or vendor-specific proprietary container formats.
      </t>
      <t>
        A common feature of modern software used by DJs is "dynamic" or "live"
        stem separation where the DJ software attempts to algorithmically
        separate the audio signals in a track to allow the DJ to mute, solo, or
        apply effects to individual instruments.
        The results of such dynamic separation vary but are, generally speaking,
        noticeably different from the original stems used by the producer and
        frequently contain distortions and other artifacts that sound
        undesirable.
        A better model is for the producer to release the original stems and
        information about the mastering alongside the original track, which
        gives the producer an advantage when trying to convince DJs to give
        the track air time.
        This allows the final mix to sound better and closer to the producer's
        original vision for the track, even while it is being remixed and
        interpreted by the DJ.
      </t>
      <t>
        This specification documents a profile for the Matroska container
        format <xref target="RFC9559"/> that allows it to store the final mix
        for a track alongside the lossless or lossy stems used to mix the
        track in a single file.
        In addition, it specifies metadata for storing mastering information so
        that remixes using the stems can remain as close as possible to the
        producer's original intent for the track.
        The target consumers of these stem files are DJ applications meant for
        live remixing and performance, as well as Digital Audio Workstations
        (DAWs) used by producers who want their music to be played by DJs.
      </t>
      <section anchor="requirements">
        <name>Requirements Language</name>
        <t>
          The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
          "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>",
          "<bcp14>SHALL NOT</bcp14>", "<bcp14>SHOULD</bcp14>",
          "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>",
          "<bcp14>NOT RECOMMENDED</bcp14>", "<bcp14>MAY</bcp14>", and
          "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
          described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/>
          when, and only when, they appear in all capitals, as shown here.
        </t>
      </section>
    </section>
    <section>
      <name>Requirements</name>
      <t>
        Stem files have a few basic requirements, including:
      </t>
      <ul spacing="normal">
        <li>Backwards compatibility with existing media players,</li>
        <li>The ability to store multiple audio tracks,</li>
        <li>
          The ability to store file-level metadata and track-level metadata, and
        </li>
        <li>
          Backwards compatibility when additional tracks have unknown formats
          that cannot be decoded.
        </li>
      </ul>
      <t>
        The following are explicitly <em>not</em> requirements of this design:
      </t>
      <ul spacing="normal">
        <li>
          Support for streaming over high-latency connections (e.g., the
          Internet), and
        </li>
        <li>Substream re-multiplexing.</li>
      </ul>
    </section>
    <section>
      <name>Track Layout</name>
      <section anchor="s_audio_streams">
        <name>Audio Streams</name>
        <t>
          Each stem file may contain an arbitrary number of tracks containing
          audio and <bcp14>MUST</bcp14> include at least three audio tracks (the
          mixed audio and at least two stems).
          For stem files meant for live DJ use, it is <bcp14>RECOMMENDED</bcp14>
          that four or fewer stem tracks be used (as opposed to stem files meant
          for music production or non-live remixing where a DAW may utilize a
          significantly larger number of tracks).
        </t>
        <t>
          For ease of decoding, each track <bcp14>SHOULD</bcp14> be encoded using
          the same codec with the same parameters, including bitrate and sample
          rate.
          Stems are often recorded with a single channel and only the final mix
          is in stereo.
          Stems <bcp14>MAY</bcp14> have a different channel count or layout than
          the main audio track; however, it is <bcp14>RECOMMENDED</bcp14>
          that all stem tracks maintain the same channel count and layout as the
          main track and have the same channel balance as their component parts
          in the final mix.
          For example, if the final mix is a stereo track that contains a fiddle
          that is 75% in the right channel and only 25% in the left channel,
          the stem track for the fiddle would also be in stereo with the stem
          mostly appearing from the right channel as in the final mix.
        </t>
        <t>
          The first track containing audio data <bcp14>MUST</bcp14> be the final
          post-mix audio in the default language.
          All tracks containing the final post-mix audio regardless of language
          <bcp14>MUST</bcp14> have the Matroska "<tt>Default</tt>" flag set to
          "<tt>1</tt>"
          (<xref target="RFC9559" sectionFormat="comma" section="18.1"/>,
          <xref target="RFC9559" sectionFormat="bare" section="5.1.4.1.5"/>).
          This helps preserve backwards compatibility with media players that
          do not support this format, which typically play the first audio
          stream found or select a stream based on the default flag.
          In addition, the "<tt>Enabled</tt>" flag for any main tracks
          <bcp14>MUST</bcp14> be set to "<tt>1</tt>"
          (<xref target="RFC9559" sectionFormat="comma" section="5.1.4.1.4"/>).
        </t>
        <t>
          The remaining audio tracks will be individual stems and
          <bcp14>MUST</bcp14> have the same effective length as the first track
          such that playing each stem track from the beginning would result in
          the same audio (excluding mastering) as the final mix present in the
          first track.
          For example, if the original track is three minutes long and the
          stem file includes a percussion track in which the percussion does
          not start until one minute into the track, the percussion stem would
          still have an effective length of three minutes: it would either
          contain a minute of silence at the start of the track or have a block
          timestamp
          (<xref target="RFC9559" sectionFormat="comma" section="10"/>)
          that sets the effective start time to one minute.
        </t>
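        <t>
          The silence-padding alternative in the example above can be sketched
          as follows.
          This is a minimal illustration and not part of the profile: the
          function name <tt>pad_to_mix_length</tt> and the representation of
          a stem as a plain list of samples are assumptions made for the
          example only.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def pad_to_mix_length(stem, offset_samples, mix_length):
    """Pad a stem with digital silence so it spans the whole mix.

    stem           -- samples of the recorded stem (hypothetical list)
    offset_samples -- where the stem starts inside the final mix
    mix_length     -- length of the final mix, in samples
    """
    if offset_samples < 0:
        raise ValueError("offset must not be negative")
    if offset_samples + len(stem) > mix_length:
        raise ValueError("stem does not fit inside the mix")
    trailing = mix_length - offset_samples - len(stem)
    # Leading silence, then the stem itself, then trailing silence.
    return [0.0] * offset_samples + list(stem) + [0.0] * trailing
```
]]></sourcecode>
        <t>
          For instance, at a sample rate of 48 kHz a one-minute offset
          corresponds to 60 * 48000 = 2,880,000 samples of leading silence.
        </t>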
        <t>
          Each stem track <bcp14>MUST</bcp14> have the Matroska
          "<tt>Default</tt>" flag set to "<tt>0</tt>" and <bcp14>MUST</bcp14>
          have the "<tt>Enabled</tt>" flag set to "<tt>0</tt>".
        </t>
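        <t>
          The track count and flag rules above could be checked along the
          following lines.
          This is a sketch under stated assumptions, not a reference
          validator: the <tt>Track</tt> record and <tt>check_layout</tt>
          function are hypothetical, and the tracks are assumed to have
          already been read from the file by some Matroska parser.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical, already-parsed view of an audio TrackEntry."""
    name: str
    flag_default: bool
    flag_enabled: bool


def check_layout(audio_tracks):
    """Return a list of rule violations (empty if none were found)."""
    problems = []
    if len(audio_tracks) < 3:
        problems.append("fewer than three audio tracks")
    if audio_tracks and not audio_tracks[0].flag_default:
        problems.append("first audio track is not flagged Default")
    for index, track in enumerate(audio_tracks):
        # Default=1 marks a final-mix track; Default=0 marks a stem.
        if track.flag_default and not track.flag_enabled:
            problems.append(f"main track {index} is not Enabled")
        if not track.flag_default and track.flag_enabled:
            problems.append(f"stem track {index} is Enabled")
    stems = [t for t in audio_tracks if not t.flag_default]
    if len(stems) < 2:
        problems.append("fewer than two stem tracks")
    return problems
```
]]></sourcecode>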
        <t>
          The stem tracks <bcp14>SHOULD NOT</bcp14> have any gain normalization
          applied to bring the stems up to the same perceived volume.
          Instead, they should retain the same levels as they would have in the
          final mix present in the default track so that, if all stems were
          played at unity gain, the levels would be equivalent to the final mix.
        </t>
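        <t>
          The unity-gain property described above can be illustrated with a
          short sketch.
          The helper names <tt>mix_at_unity</tt> and <tt>max_deviation</tt>
          are hypothetical, and stems are modelled as equal-length lists of
          samples purely for illustration.
          Because the final mix is normally mastered, a real comparison would
          be expected to show some deviation rather than an exact match.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def mix_at_unity(stems):
    """Sum equal-length stems sample by sample with no gain change."""
    if not stems:
        return []
    length = len(stems[0])
    if any(len(stem) != length for stem in stems):
        raise ValueError("stems must have the same length")
    return [sum(samples) for samples in zip(*stems)]


def max_deviation(stems, final_mix):
    """Largest absolute per-sample gap between the sum and the mix."""
    summed = mix_at_unity(stems)
    if len(summed) != len(final_mix):
        raise ValueError("mix and stems must have the same length")
    gaps = (abs(a - b) for a, b in zip(summed, final_mix))
    return max(gaps, default=0.0)
```
]]></sourcecode>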
        <t>
          Each stem track (i.e., all tracks that are not the first track)
          <bcp14>MUST</bcp14> set the value of the
          <tt>\Segment\Tracks\TrackEntry\Name</tt> field
          (<xref target="RFC9559" sectionFormat="comma" section="5.1.4.1.18"/>)
          to a short, human-meaningful track name for the stem that describes
          its contents, for example "Percussion" or "Vocals".
          These names are intended for display in playback applications and
          therefore should remain concise (generally no more than one word),
          but no specific format or length requirement is defined.
        </t>
        <t>
          For each stem track a <tt>\Segment\Tags\Tag</tt>
          (<xref target="RFC9559" sectionFormat="comma" section="5.1.8"/>)
          <bcp14>SHOULD</bcp14> also be set with its target set to the stem
          track.
          The tag, if present, <bcp14>MUST</bcp14> contain a <tt>SimpleTag</tt>
          element with the <tt>TagName</tt> field set to "<tt>STEM_COLOR</tt>"
          and the <tt>TagString</tt> field set to a color representing the track
          in RGB hex format (e.g., "#145374").
        </t>
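        <t>
          A reader of the <tt>STEM_COLOR</tt> tag might parse its value as
          sketched below.
          The function name <tt>parse_stem_color</tt> is hypothetical, and
          the sketch assumes the six-digit form with a leading "#" shown in
          the example above; whether other notations are acceptable is not
          settled by this example.
        </t>
        <sourcecode type="python"><![CDATA[
```python
import re

# Assumption: exactly six hexadecimal digits after a leading '#'.
_HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def parse_stem_color(value):
    """Return (red, green, blue) for a '#RRGGBB' string, else raise."""
    if not _HEX_COLOR.fullmatch(value):
        raise ValueError(f"not an RGB hex color: {value!r}")
    # Each channel is two hex digits starting at offsets 1, 3 and 5.
    return tuple(int(value[i:i + 2], 16) for i in (1, 3, 5))
```
]]></sourcecode>
        <t>
          With the example value above, <tt>parse_stem_color("#145374")</tt>
          yields the channel values 20, 83, and 116.
        </t>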
      </section>
    </section>
    <section>
      <name>Digital Signal Processor</name>
      <t>
        Because mastering happens post-mix and the stems are pre-mix audio, the
        stem tracks <bcp14>SHOULD NOT</bcp14> have any mastering steps applied.
        Instead, metadata for configuring a compressor and limiter
        <bcp14>SHOULD</bcp14> be included in the file's global metadata as
        simple tags (see <xref target="RFC9559" section="5.1.8.1.2"/>).
        After mixing, playback applications <bcp14>MAY</bcp14> choose to feed
        the mix through a Digital Signal Processor (DSP) configured with the
        limiter and compressor settings read from the metadata.
      </t>
      <t>
        Each binary setting for the compressor or limiter is stored as a
        floating-point number in either the 32-bit or the 64-bit binary
        interchange format defined in <xref target="IEEE_754_2019"/>, with the
        additional restriction that its value is limited to a minimum of 0.0
        and a maximum of 1.0.
        Because different DSPs may use different ranges or scales for each
        value, the playback software <bcp14>SHOULD</bcp14> interpret the 0-1
        values as a linear scale and map them to the range and scale required
        by the DSP when configuring the DSP for playback.
        This may result in a loss of fidelity on some DSPs, but this is deemed
        an acceptable trade-off for stem playback, which would not normally
        be able to have a mastering step at all.
      </t>
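      <t>
        Decoding a setting and mapping it onto a DSP parameter might look like
        the following sketch.
        Several details here are assumptions rather than requirements of this
        document: the function names are hypothetical, the payload is assumed
        to be big-endian, and the target range passed to
        <tt>to_dsp_range</tt> depends entirely on the DSP in use.
      </t>
      <sourcecode type="python"><![CDATA[
```python
import struct


def decode_setting(payload):
    """Decode a 4- or 8-byte IEEE 754 payload.

    Assumption: the bytes are big-endian; the byte order is chosen
    here for illustration only.
    """
    formats = {4: ">f", 8: ">d"}
    if len(payload) not in formats:
        raise ValueError("expected a 32-bit or 64-bit float")
    (value,) = struct.unpack(formats[len(payload)], payload)
    if not 0.0 <= value <= 1.0:  # this comparison also rejects NaN
        raise ValueError("setting outside the 0.0-1.0 range")
    return value


def to_dsp_range(value, low, high):
    """Map a normalized 0.0-1.0 setting linearly onto [low, high]."""
    return low + value * (high - low)
```
]]></sourcecode>
      <t>
        For a hypothetical DSP whose threshold runs from -60 dB to 0 dB, a
        stored value of 0.5 would map to
        <tt>to_dsp_range(0.5, -60.0, 0.0)</tt>, that is, -30 dB.
      </t>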
      <t>
        During production of a stem track, vendor-specific metadata
        <bcp14>MAY</bcp14> be embedded in the Matroska file to configure a
        specific DSP more accurately.
        If such metadata is included, the scaled values <bcp14>SHOULD</bcp14>
        also be present for those without access to the specific DSP used for
        the track, and the vendor-specific metadata <bcp14>MUST</bcp14> use tag
        names that do not conflict with the tag names defined for the generic
        compressor or limiter.
      </t>
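      <t>
        An encoder could guard against such conflicts with a check like the
        one sketched here.
        The set contains the compressor and limiter tag names listed in the
        tables of this document; the function name
        <tt>conflicting_names</tt> is hypothetical, and comparing names
        case-insensitively is a conservative choice made for this example
        rather than a rule stated by this document.
      </t>
      <sourcecode type="python"><![CDATA[
```python
GENERIC_TAGS = frozenset({
    "COMPRESSOR_ENABLED", "COMPRESSOR_RATIO",
    "COMPRESSOR_OUTPUT_GAIN", "COMPRESSOR_THRESHOLD",
    "COMPRESSOR_ATTACK", "COMPRESSOR_INPUT_GAIN",
    "COMPRESSOR_RELEASE", "COMPRESSOR_HP_CUTOFF",
    "COMPRESSOR_HP_DRY_WET", "LIMITER_ENABLED",
    "LIMITER_RELEASE", "LIMITER_THRESHOLD", "LIMITER_CEILING",
})


def conflicting_names(vendor_tag_names):
    """Return vendor tag names that collide with the generic tags."""
    return sorted(
        name for name in set(vendor_tag_names)
        # Assumption: treat names differing only in case as a clash.
        if name.upper() in GENERIC_TAGS
    )
```
]]></sourcecode>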
      <section anchor="s_compressor_metadata">
        <name>Compressor Metadata</name>
        <table>
          <name>Compressor metadata tags</name>
          <thead>
            <tr>
              <th>Tag Name</th>
              <th>Type</th>
              <th>Values</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>COMPRESSOR_ENABLED</td>
              <td>UTF-8</td>
              <td>"TRUE" or "FALSE"</td>
            </tr>
            <tr>
              <td>COMPRESSOR_RATIO</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>COMPRESSOR_OUTPUT_GAIN</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>COMPRESSOR_THRESHOLD</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>COMPRESSOR_ATTACK</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>COMPRESSOR_INPUT_GAIN</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>COMPRESSOR_RELEASE</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>COMPRESSOR_HP_CUTOFF</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>COMPRESSOR_HP_DRY_WET</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
          </tbody>
        </table>
      </section>
      <section anchor="s_limiter_metadata">
        <name>Limiter Metadata</name>
        <table>
          <name>Limiter metadata tags</name>
          <thead>
            <tr>
              <th>Tag Name</th>
              <th>Type</th>
              <th>Values</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>LIMITER_ENABLED</td>
              <td>UTF-8</td>
              <td>"TRUE" or "FALSE"</td>
            </tr>
            <tr>
              <td>LIMITER_RELEASE</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>LIMITER_THRESHOLD</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
            <tr>
              <td>LIMITER_CEILING</td>
              <td>binary</td>
              <td>0.0-1.0</td>
            </tr>
          </tbody>
        </table>
      </section>
    </section>
    <section>
      <name>Format Support</name>
      <t>
        The Matroska container format can store many types of audio, not all of
        which are suitable for DJing or music production.
        To ensure compatibility between playback and encoding applications, the
        following formats should be supported based on the use case of the
        software.
        Formats with the use case "Live remixing" are intended largely for
        playback applications meant for live performance (e.g., DJ software).
        Formats with the use case "Music production" are intended to be
        distributed for remixing in a non-live setting (e.g., with a DAW).
      </t>
      <table>
        <name>Audio codec support</name>
        <thead>
          <tr>
            <th>Codec</th>
            <th>Use Case</th>
            <th>Codec ID</th>
            <th>Requirement Level</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>FLAC <xref target="RFC9639"/></td>
            <td>Live remixing, Music production</td>
            <td>A_FLAC <xref target="RFC9639" sectionFormat="comma" section="10.2"/></td>
            <td><bcp14>SHOULD</bcp14></td>
          </tr>
          <tr>
            <td>Opus <xref target="RFC6716"/></td>
            <td>Live remixing</td>
            <td>A_OPUS <xref target="I-D.ietf-cellar-codec" sectionFormat="comma" section="3.4.32"/></td>
            <td><bcp14>SHOULD</bcp14></td>
          </tr>
          <tr>
            <td>Raw PCM (IEEE float, little endian)</td>
            <td>Music production</td>
            <td>A_PCM/FLOAT/IEEE <xref target="I-D.ietf-cellar-codec" sectionFormat="comma" section="3.4.33"/></td>
            <td><bcp14>SHOULD</bcp14></td>
          </tr>
          <tr>
            <td>Raw PCM (integer, big endian)</td>
            <td>Music production</td>
            <td>A_PCM/INT/BIG <xref target="I-D.ietf-cellar-codec" sectionFormat="comma" section="3.4.34"/></td>
            <td><bcp14>SHOULD</bcp14></td>
          </tr>
          <tr>
            <td>Raw PCM (integer, little endian)</td>
            <td>Music production</td>
            <td>A_PCM/INT/LIT <xref target="I-D.ietf-cellar-codec" sectionFormat="comma" section="3.4.35"/></td>
            <td><bcp14>SHOULD</bcp14></td>
          </tr>
        </tbody>
      </table>
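      <t>
        An application could encode the table above as a simple lookup, as in
        the following sketch.
        The Codec IDs and use cases are copied from the table; the names
        <tt>CODECS</tt> and <tt>codecs_for</tt> are hypothetical.
      </t>
      <sourcecode type="python"><![CDATA[
```python
# Codec ID -> use cases, transcribed from the codec support table.
CODECS = {
    "A_FLAC": {"Live remixing", "Music production"},
    "A_OPUS": {"Live remixing"},
    "A_PCM/FLOAT/IEEE": {"Music production"},
    "A_PCM/INT/BIG": {"Music production"},
    "A_PCM/INT/LIT": {"Music production"},
}


def codecs_for(use_case):
    """Return the sorted Codec IDs listed for a use case."""
    return sorted(
        codec_id for codec_id, uses in CODECS.items()
        if use_case in uses
    )
```
]]></sourcecode>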
    </section>
    <section anchor="IANA">
      <name>IANA Considerations</name>
      <t>
        This memo modifies the "Matroska Tag Names" registry to add the
        following values:
      </t>
      <table>
        <name>Additions to the "Matroska Tag Names" Registry</name>
        <thead>
          <tr>
            <th>Tag Name</th>
            <th>Tag Type</th>
            <th>Reference</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td>STEM_COLOR</td>
            <td>UTF-8</td>
            <td>This document, <xref target="s_audio_streams"/></td>
          </tr>
          <tr>
            <td>COMPRESSOR_ENABLED</td>
            <td>UTF-8</td>
            <td>This document, <xref target="s_compressor_metadata"/></td>
          </tr>
          <tr>
            <td>COMPRESSOR_RATIO</td>
            <td>binary</td>
            <td>This document, <xref target="s_compressor_metadata"/></td>
          </tr>
          <tr>
            <td>COMPRESSOR_OUTPUT_GAIN</td>
            <td>binary</td>
            <td>This document, <xref target="s_compressor_metadata"/></td>
          </tr>
          <tr>
            <td>COMPRESSOR_THRESHOLD</td>
            <td>binary</td>
            <td>This document, <xref target="s_compressor_metadata"/></td>
          </tr>
          <tr>
            <td>COMPRESSOR_ATTACK</td>
            <td>binary</td>
            <td>This document, <xref target="s_compressor_metadata"/></td>
          </tr>
          <tr>
            <td>COMPRESSOR_INPUT_GAIN</td>
            <td>binary</td>
            <td>This document, <xref target="s_compressor_metadata"/></td>
          </tr>
          <tr>
            <td>COMPRESSOR_RELEASE</td>
            <td>binary</td>
            <td>This document, <xref target="s_compressor_metadata"/></td>
          </tr>
          <tr>
            <td>COMPRESSOR_HP_CUTOFF</td>
            <td>binary</td>
            <td>This document, <xref target="s_compressor_metadata"/></td>
          </tr>
          <tr>
            <td>COMPRESSOR_HP_DRY_WET</td>
            <td>binary</td>
            <td>This document, <xref target="s_compressor_metadata"/></td>
          </tr>
          <tr>
            <td>LIMITER_ENABLED</td>
            <td>UTF-8</td>
            <td>This document, <xref target="s_limiter_metadata"/></td>
          </tr>
          <tr>
            <td>LIMITER_RELEASE</td>
            <td>binary</td>
            <td>This document, <xref target="s_limiter_metadata"/></td>
          </tr>
          <tr>
            <td>LIMITER_THRESHOLD</td>
            <td>binary</td>
            <td>This document, <xref target="s_limiter_metadata"/></td>
          </tr>
          <tr>
            <td>LIMITER_CEILING</td>
            <td>binary</td>
            <td>This document, <xref target="s_limiter_metadata"/></td>
          </tr>
        </tbody>
      </table>
    </section>
    <section anchor="Security">
      <name>Security Considerations</name>
      <t>This document should not affect the security of the Internet.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>Normative References</name>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9559.xml"/>
    </references>
    <references>
      <name>Informative References</name>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6716.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml6/reference.R.IEEE.754-2019.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9639.xml"/>
      <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-cellar-codec.xml"/>
    </references>
    <section anchor="Acknowledgements" numbered="false">
      <name>Acknowledgements</name>
      <t>
        Thanks to the members of <tt>#matroska</tt> on the <tt>libera.chat</tt>
        IRC network, and to mosu and JanC in particular, for patiently
        explaining the basics of the format to me and for all their feedback.
      </t>
      <t>
        Thanks also to the members of the Ardour forums for their feedback
        on DAWs and mastering.
      </t>
      <t>
        Finally, thanks to the members of the IETF CELLAR working group,
        especially Steve Lhomme, for their feedback.
      </t>
    </section>
  </back>
</rfc>
