Proposal for an XML Specification modeling Time, Positions and Parts of Audio Waveforms

A first draft of a subset of functionality for an open XML standard for musical instruments.

version 0.1

Scope of this Document

This document describes a model for handling the notions of time, positions and parts of audio waveforms. The model is intended as the basis from which a partial XML specification for musical instruments is derived. In practice, an implementation of the model would provide functionality to cut an audio waveform into pieces and recombine them in a different order, or repeat some pieces.

The specification could become part of an overall XML standard for musical instruments, as discussed on the open-instruments mailing list. The proposal given here is not primarily meant to be used independently of further specification work. However, a prototype implementation that realizes a functional subset is available.

Table of contents

  1. Conceptual Model

  2. Formal Design

  3. XML Specification

  4. Tag Reference


1. Conceptual Model

Requirements

While terms like "time" and "position" seem clearly defined in a physical sense (locating coordinates in 4D space-time), in a musical context they have more facets and multiple dimensions of meaning. The notions of time and position in musical pieces are rarely used in an absolute sense. Instead, an amount of time or a position in music usually depends on other instances of times and positions, and on other musical characteristics of the sound, e.g. the piece's tempo or a waveform's sample-rate. Times and positions are linked relative to each other, thus the meanings of "time" and "position" are recursively interdependent in a musical context.

The model proposed here tries to capture these notions of time and positions as 'musically' as possible. This requires time, positions and parts of sound pieces to be understood as each having several possible different meaning extensions, depending on the way they are used in a musical context. Important conceptual requirements that arise from the musical interpretation of the terms are:

Terminology

To embed the above requirements into a formalized model, conceptual terms and additional helper terms are introduced as model elements. These basic model-terms are as follows:

Waveform

The physical representation of a sound or musical piece. A waveform has a sample-rate and a length in frames associated with it, as well as a beats per minute measure (which may default to a fixed value and remain unused on sounds that are not tempo-structured). Optionally, a waveform can also have a name to refer to it. Every waveform owns an implicit reference to exactly one master-slice, from which any number of slices can be derived.

Slice

A technical alias for "part of a waveform". A slice is delimited by a start-anker and an end-anker. Every slice also references exactly one parent-slice. The positions specified by the start-anker and end-anker usually are relative to the parent-slice. In most cases, slices also have a name which identifies them uniquely.
A slice is recursively specified as always having a parent-slice associated with it. This way, a slice is always a child of another slice, which again is child of another slice, etc.
Different slices may overlap.
The top-most slice which represents a whole waveform is called the master-slice. The master-slice never gets created explicitly; instead there is an implicit 1:1 relationship between each waveform and the master-slice representing it. (I.e., every waveform 'automatically' has one master-slice associated with it.) As the relationship between a waveform and its master-slice is both 1:1 and implicit, a waveform can safely be said to be its own master-slice, or to be used transparently in the role of its own master-slice. (I.e., the waveform is treated as if it were the master-slice itself, which allows the concept of master-slices to be hidden from end-users. See the specification of the <wave> tag below in the XML specification.)

Anker

A technical alias for "position in a slice". Ankers are always specified relative to another anker, usually the start-anker of the current slice's parent-slice. As each anker refers to another anker it is relative to, the concept of ankers is again a recursive one (as well as the concept of slices, see above). The time offset between the relative anker and the specified anker is given by a time instance.
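
The recursive definition of ankers can be illustrated with a small Python sketch. The class name Anker and its fields offset_secs and relative_to are hypothetical and not taken from the prototype implementation; offsets are simplified to plain seconds here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anker:
    """Hypothetical model class: a position given relative to another anker."""
    offset_secs: float                      # time offset to the relative anker
    relative_to: Optional["Anker"] = None   # None marks the start of the master-slice

    def absolute_secs(self) -> float:
        """Walk the chain of relative ankers and sum up all offsets."""
        if self.relative_to is None:
            return self.offset_secs
        return self.relative_to.absolute_secs() + self.offset_secs


# A chain of three ankers, each relative to the previous one.
master_start = Anker(0.0)
vocals_start = Anker(9.35, relative_to=master_start)
phrase_start = Anker(2.0, relative_to=vocals_start)
```

In this sketch the absolute position of an anker is only computed on demand, so moving one anker implicitly moves every anker that is declared relative to it.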

Time

Time in a sound or musical piece. As demanded by the requirements, time can be specified either as measures in beats, given in sample-frames, as time in seconds, or be relative to the length of the parent-slice. Such different possible descriptions of time are named time-values.

Time-Value

A symbolic value from which time can be derived. The string is composed of a numeric part and an optional suffix which determines its type (beats, sample-frames, seconds, etc.). The suffix can be

Note that the same time can be expressed by different time-values.
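
As an illustration of this point, the following Python sketch converts time-values of different types into seconds. The function name to_seconds and its parameters are assumptions made for this example; the unit names are borrowed from the child tags of <time> in the tag reference.

```python
def to_seconds(value: float, unit: str, bpm: float, sample_rate: int,
               parent_length_secs: float) -> float:
    """Convert a time-value of the given unit into seconds."""
    if unit == "secs":
        return value
    if unit == "beats":
        return value * 60.0 / bpm            # one beat lasts 60/bpm seconds
    if unit == "frames":
        return value / sample_rate           # sample-frames per second
    if unit == "rel":
        return value * parent_length_secs    # fraction of the parent-slice
    raise ValueError(f"unknown time-value unit: {unit}")


# With 120 bpm, 44100 Hz and a parent-slice of 4 seconds, these four
# different time-values all describe the same time.
same_time = [
    to_seconds(1.0, "secs", 120, 44100, 4.0),
    to_seconds(2.0, "beats", 120, 44100, 4.0),
    to_seconds(44100, "frames", 120, 44100, 4.0),
    to_seconds(0.25, "rel", 120, 44100, 4.0),
]
```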

Cuelist

One way to gain practical use from cutting musical pieces into slices is to build a new sequence of musical pieces from the slices. This way, e.g. a song could first be divided into slices of verses, choruses, bridge-parts etc., and later get combined in a different order. Such a re-combined sequence is called a cuelist.

Cue

A reference to a slice, to be used inside a cuelist. The advantage of combining cues to cuelists (and not directly combining slices to possible 'slicelists') is that by using cues as explicit references to slices, confusion about identity and multiplicity is avoided. Multiple occurrences of the same slice inside a cuelist become cleanly modeled as multiple distinct cues that reference one and the same slice. For convenience, each cue SHOULD be repeatable/loopable n times, so that repetitions of single slice-occurrences can be described by a single cue that gets repeated n times.

Other terms

The uppercase key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are intended to be interpreted as described in RFC 2119.

Visual Example

The following figure gives a visual representation of the model terms, displaying a waveform and its slices as amplitude-graphs on a timeline:

The relationships between the introduced terms are displayed in the following diagram:


2. Formal Design

In order to prepare an implementation of the model in XML, a formalized version of the above concepts is required. This is given here in a first attempt by a class-diagram in the Unified Modeling Language (UML). The model also provides a basis for the SuperCollider prototype implementation, which is why the classes SF3 and BBCutProc from the BBCut library are referenced in addition to the declaration of the model concepts.

Additional formal semantics will be given by the XML specification in the next chapter.

For details on the methods of these classes, see the prototype implementation.

In order to apply the model in the prototype implementation, a basic concept of an XML-configurable instrument needs to be introduced. An overall open-instruments XML standard should define the functionality of such an instrument much more precisely, but for now a preliminary class SFSliceInstrument is sufficient for this task. The following class-diagram shows its declaration, as it is part of the prototype implementation.

The notion of being 'playable'

One important concept which is orthogonal to several entity-types of the model is the notion of playability. In the musical context modeled here, a play-action can be applied to several different kinds of objects. The following objects can be played:

It is important to note that 'to play' does not have exactly the same meaning when applied to these different entity-types. In the formal model we may express the common concept of playability by an interface Playable on a syntactical level. However, it is up to the individual implementations of the play() methods in the corresponding classes to give precise semantics to what 'playing' means for that specific entity-type. Note that these play() methods are not polymorphically related to each other, as their classes do not share one common inheritance hierarchy. The class diagram below shows the playable entity-types:

It remains a task of an overall XML instrument specification to formally distinguish between several variants of what 'to play' means. E.g. it is obvious that 'to play an instrument' has different semantics than 'to play a sound' or 'to play a score'. It appears that playing a sound also means playing an instrument at the same time. And to play a score always also implies playing sounds, which in turn implies playing instruments.


3. XML Specification

The formal model developed in the previous sections is now used as a basis for specifying a set of XML tags.

Example

This is an example of how a waveform is sliced into parts and re-joined as a cuelist in which the original parts are played in a different order and with repetitions:

<instrument> <!-- just a placeholder for possibly more flexible "instruments"  -->

    <wave src="examples/audio/NosNodOrr.wav"> <!-- wavefile implicitly used as master-slice -->
        <slice name="start" to="20.55839s"/>
        <slice name="leer" to="27.69878s"/>
        <slice name="zwischen" to="34.83637s"/>
        <slice name="wummer" from="33.06s" to="34.82s"/>
        <slice name="basis" from="34.9s" to="49.16338s"/>
        <slice name="funny" from="49.19s" to="56.33s"/>
        <slice name="melody1" from="56.35s" to="63.46s"/>
        <slice name="melody2" from="63.5s" to="70.65s"/>
        <slice name="melody3" from="70.7s" to="77.8371s"/>
        <slice name="end" to="1.0"/>
    </wave>
    
    <cuelist name="song" loop="false">
        <cue slice-ref="start"/>
        <cue slice-ref="zwischen"/>
        <cue slice-ref="wummer" repeat="4"/>
        <cue slice-ref="basis"/>
        <cue slice-ref="funny"/>
        <cue slice-ref="melody1" repeat="2"/>
        <cue slice-ref="melody2"/>
        <cue slice-ref="basis"/>
        <cue slice-ref="melody1" repeat="2"/>
        <cue slice-ref="melody3"/>
        <cue slice-ref="wummer" repeat="8"/>
        <cue slice-ref="funny"/>
        <cue slice-ref="melody1"/>
        <cue slice-ref="melody2"/>
        <cue slice-ref="melody3"/>
        <cue slice-ref="end"/>
        <cue slice-ref="melody2"/>
        <cue slice-ref="melody1"/>
        <cue slice-ref="melody2"/>
        <cue slice-ref="melody3"/>
        <!--<cue slice-ref="melody1" repeat="2"/>
        <cue slice-ref="melody2"/>-->
        <cue slice-ref="end"/>        
    </cuelist>

</instrument>


Let's look at this example step by step:

The <instrument> tag is an outer container tag that initiates the document. In an overall XML specification for musical instruments, <instrument> could carry far more functionality than listed here.

The <wave src=".."> tag declares a wavefile to be loaded.

The <slice> tags inside the <wave> tag describe parts into which the waveform is divided. Note that the conceptual model requires all slice tags to appear inside other slices (their parent-slice), but here the <wave> tag is parsed to implicitly provide a master-slice, so that <slice> tags can appear inside <wave>.

Specifying Slices

Using shortcut anker specifications

Expressions of time can be specified in either a long, full-sized form, or a short form via a parseable string. In most cases the short form will be more suitable to write and read, so it is the primary recommended form. See below for the full-sized form.

Using the short form, time-values get specified via attributes start, end, to, length etc. inside their owning XML tag (e.g. <slice>). Time values of different types are distinguished via a suffix in the string, which can be

These are examples of how to specify time-values directly via attributes:

<slice name="test" start="0.1s" end="3.5s"/>
<slice name="test2" start="2.5~" end="1.0"/>
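
A parser for the short form could look like the following Python sketch. The suffix mapping used here is an assumption inferred only from the example strings appearing in this document ('5.2s', '3.5~', '278043#' and the suffix-less '1.0'); the names SUFFIX_UNITS and parse_time_value are hypothetical.

```python
import re
from typing import Tuple

# ASSUMPTION: suffix meanings are inferred from the examples in this document,
# they are not normative. No suffix is read as a relative time-value.
SUFFIX_UNITS = {"s": "secs", "~": "beats", "#": "frames", "": "rel"}

# A non-negative number, optionally followed by one of the known suffixes.
_TIME_VALUE = re.compile(r"^\s*([0-9]+(?:\.[0-9]+)?)\s*([s~#]?)\s*$")


def parse_time_value(text: str) -> Tuple[float, str]:
    """Split a short-form time-value string into (number, unit)."""
    match = _TIME_VALUE.match(text)
    if match is None:
        raise ValueError(f"not a time-value: {text!r}")
    number, suffix = match.groups()
    return float(number), SUFFIX_UNITS[suffix]
```

Keeping the unit as a symbolic name, instead of converting to seconds right away, preserves the distinction between time-value types until the bpm and sample-rate of the waveform are known.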


The use of the from and to attributes in slice tags is important and distinguishes between different semantics of the <slice> tag. Three cases need to be distinguished:

  1. both a from and a to attribute are given:
    in this case the slice is placed between two ankers which are implicitly created at the time specified by the time-values given via from and to

  2. only a from attribute is given:
    in this case a start-anker is implicitly created at the time given by the from attribute, and the slice's end is set to the end of its parent-slice

  3. only a to attribute is given:
    in this case the slice starts directly after the previously declared slice (in terms of XML: the <slice> tag which is the previous sibling), or at the beginning of the parent-slice if the current <slice> tag is the first at its hierarchy level

Example:

<wave name="mywave" src="/data/wav/loop3.wav">
    <slice name="intro" to="12.305s"/> <!-- first slice starts at time-value 0.0 -->
    <slice name="vocals1" to="37.975s"/> <!-- identical: from="12.305s" to="37.975s" -->
    <slice name="bridge1" to="49.534s"/>
</wave>
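
The three cases can be sketched in Python as follows. This is a simplified illustration, assuming all times are already given in seconds relative to the parent-slice; the function name resolve_slices is hypothetical.

```python
from typing import List, Optional, Tuple


def resolve_slices(specs: List[Tuple[str, Optional[float], Optional[float]]],
                   parent_length: float) -> List[Tuple[str, float, float]]:
    """Resolve (name, from, to) triples into (name, start, end) triples."""
    resolved = []
    previous_end = 0.0  # the first slice starts at the beginning of the parent
    for name, from_, to in specs:
        if from_ is not None and to is not None:   # case 1: from and to
            start, end = from_, to
        elif from_ is not None:                    # case 2: only from
            start, end = from_, parent_length
        elif to is not None:                       # case 3: only to
            start, end = previous_end, to
        else:
            raise ValueError(f"slice {name!r} has neither from nor to")
        resolved.append((name, start, end))
        previous_end = end
    return resolved


# The slices of the example above, declared with to attributes only.
example = resolve_slices(
    [("intro", None, 12.305), ("vocals1", None, 37.975), ("bridge1", None, 49.534)],
    parent_length=60.0,  # hypothetical length of loop3.wav in seconds
)
```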

Usages of the <slice>-tag

As already shown in some of the above examples, a slice can be specified in alternative semantic ways. A slice's overall position, as relative to the master waveform, results from specifying either of the following:

  1. Independently specifying both the start-anker and the end-anker:
    <slice start-anker-ref="time-value" end-anker-ref="time-value" name="..."/>

  2. Setting a start anker and letting the end-anker follow after a given time:*
    <slice start="time-value" length="time-value" name="..."/>

  3. Setting an end-anker and letting the start-anker appear a specified amount of time earlier:*
    <slice end="time-value" length="time-value" name="..."/>

  4. Setting an end-anker and using the end of the most recently declared slice as the start-anker:*
    <slice to="time-value" name="..."/>

  5. Using the end-time of the most recently declared slice as the start-anker's time and creating an end-anker the specified length after the start-anker:*
    <slice length="time-value" name="..."/>

* An implementation may decide to internally create the new anker as relative to the slice's specified start-anker, or to arithmetically sum up the times to create an end-anker that is relative to the same anker as the start-anker.

A parser SHOULD be aware of these different combinations and report an error if invalid attribute sets (e.g. an end-attribute combined with a to-attribute) are encountered.
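
Such a check could be sketched in Python as shown below. The set of accepted combinations is an assumption collected from the forms that appear in this document's examples and lists; it is not a normative definition, and the name check_slice_attributes is hypothetical.

```python
# Attributes that take part in positioning a slice.
POSITION_ATTRS = {"from", "to", "start", "end", "length",
                  "start-anker-ref", "end-anker-ref"}

# ASSUMPTION: combinations gathered from the forms shown in this document.
ALLOWED_COMBINATIONS = [frozenset(combo) for combo in (
    {"start-anker-ref", "end-anker-ref"},
    {"start", "end"},
    {"start", "length"},
    {"end", "length"},
    {"from", "to"},
    {"from"},
    {"to"},
    {"length"},
)]


def check_slice_attributes(attrs: dict) -> None:
    """Raise ValueError if the positioning attributes form an invalid set."""
    used = frozenset(attrs) & POSITION_ATTRS
    if used and used not in ALLOWED_COMBINATIONS:
        raise ValueError(f"invalid attribute combination: {sorted(used)}")
```

Attributes that do not affect the position, such as name or parent-ref, are ignored by this check. A slice without any positioning attribute is accepted here, since its ankers may be given in the explicit long form.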

Explicit anker and time declarations

The short form of the <slice> tag notation which has been introduced above makes use of from and to attributes which lead to implicit creation of the corresponding start-anker and end-anker of a slice by the parsing application.

Instead of using the short notation, both ankers and times can also be notated explicitly in a full-size XML notation. The conceptual distinctions between ankers, times and a time-value's type thus remain completely projected into XML semantics.

Example:

<slice name="myslice">
    <start>
        <anker name="my_unique_start_anker"> <!-- a name is given for also referencing this anker from elsewhere -->
            <time>
                <beats>2.5</beats>
            </time>
        </anker>
    </start>
    <end>
        <anker-ref name="my_unique_end_anker-declared_somewhere_else"/>
    </end>
</slice>

Padding

In order to mark areas of the waveform as unused (among the children of one parent-slice), a 'null-slice' can be specified using the <pad>-tag. The <pad>-tag only makes sense in a sequence of <slice>-tags which are specified as consecutively following each other (i.e. tags of the form <slice to="..." name="..."> or <slice length="..." name="...">). According to this usage inside slice-sequences, the <pad>-tag has two semantic variants for declaring an unused part of the sound or musical piece:

  1. Padding to a specific anker (any slice or pad created afterwards and following consecutively will start at this anker time):
    <pad to="time-value"/>

  2. Padding a specific amount of time:
    <pad length="time-value"/>
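
The effect of both variants on a consecutive sequence can be sketched in Python. The tuple layout (tag, name, attribute, value) and the function name resolve_sequence are assumptions for this illustration; all values are taken to be seconds relative to the parent-slice.

```python
from typing import Dict, List, Optional, Tuple

Item = Tuple[str, Optional[str], str, float]  # (tag, name, "to" or "length", value)


def resolve_sequence(items: List[Item]) -> Dict[str, Tuple[float, float]]:
    """Resolve consecutive <slice> and <pad> items into named (start, end) pairs."""
    cursor = 0.0  # end of the previous slice or pad
    slices = {}
    for tag, name, attr, value in items:
        end = value if attr == "to" else cursor + value
        if tag == "slice":
            slices[name] = (cursor, end)
        # A pad only advances the cursor and leaves no slice behind.
        cursor = end
    return slices


example = resolve_sequence([
    ("slice", "a", "to", 2.0),
    ("pad", None, "length", 1.0),   # skip one second
    ("slice", "b", "length", 3.0),
    ("pad", None, "to", 10.0),      # skip up to second 10
    ("slice", "c", "to", 12.0),
])
```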

Referencing other declared elements

Referencing a parent-slice

Slices can be implicitly specified as relative to a parent-slice by placing their definitions inside a sub-tree of the parent-slice's declaration.

Example:

<slice name="vocals1" from="9.35s" to="37.975s">
    <slice name="first-half-of-vocals1" start="0.0" end="0.5"/>
</slice>


An alternative way of referencing a parent-slice, without nesting a <slice>-tag inside another <slice>-tag, is to reference the parent-slice explicitly via the parent-ref attribute.

Example:

<slice name="vocals1" from="9.35s" to="37.975s">
    ...
</slice>

...

<slice parent-ref="vocals1" name="first-half-of-vocals1" start="0.0" end="0.5"/>
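
Whichever of the two notations is used, the absolute position of a child-slice follows from its parent chain. The Python sketch below assumes, for simplicity, that top-level slices are given in seconds and child-slices as relative time-values (fractions of the parent's length); the names Spec and absolute_bounds are hypothetical.

```python
from typing import Dict, Optional, Tuple

# name -> (parent name, start, end); a parent of None means the master-slice.
Spec = Tuple[Optional[str], float, float]


def absolute_bounds(name: str, specs: Dict[str, Spec]) -> Tuple[float, float]:
    """Return (start, end) in seconds, resolved through all parent-slices."""
    parent, start, end = specs[name]
    if parent is None:
        return start, end  # already given in seconds
    parent_start, parent_end = absolute_bounds(parent, specs)
    length = parent_end - parent_start
    return parent_start + start * length, parent_start + end * length


specs = {
    "vocals1": (None, 9.35, 37.975),
    "first-half-of-vocals1": ("vocals1", 0.0, 0.5),
}
half_start, half_end = absolute_bounds("first-half-of-vocals1", specs)
```

Because the lookup goes by name, it does not matter whether the child was nested inside its parent or attached via parent-ref.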

Referencing ankers

Just as slices can be referenced via their identifier names, ankers can be referenced via their names. This is especially useful when multiple slices are to share common ankers.

Example:

<wave name="mywave" src="/data/wav/loop3.wav"/>

<slice parent-ref="mywave" name="vocals1">
    <anker role="start" name="my_unique_start_anker"> <!-- a name is given for also referencing this anker from elsewhere -->
        <time>
            <beats>2.5</beats>
        </time>
    </anker>
    <anker role="end" name="my_unique_end_anker">
        <time>
            <beats>4.0</beats>
        </time>
    </anker>
</slice>

...

<slice parent-ref="vocals1" name="first-half-of-vocals1" start="0.0" end="0.5"/>


An implementation SHOULD evaluate references to other declared elements after having completely parsed the XML, in order to allow forward-references to identifiers declared later.
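
One way to follow this recommendation is a two-pass approach: first collect all declared names, then check the references. The Python sketch below uses the standard xml.etree.ElementTree module; the function name unresolved_refs and the file name in the sample document are hypothetical, and only slice-ref and parent-ref are covered.

```python
import xml.etree.ElementTree as ET

# The cuelist refers to a slice that is declared later in the document.
DOC = """
<instrument>
    <cuelist name="song">
        <cue slice-ref="intro"/>
    </cuelist>
    <wave name="mywave" src="loop.wav">
        <slice name="intro" to="12.305s"/>
    </wave>
</instrument>
"""


def unresolved_refs(xml_text: str) -> list:
    """Return all slice-ref and parent-ref values that name no declared element."""
    root = ET.fromstring(xml_text)
    # Pass 1: collect the names of all waves and slices, in any position.
    declared = {el.get("name")
                for tag in ("wave", "slice")
                for el in root.iter(tag)
                if el.get("name")}
    # Pass 2: gather the references and keep those without a declaration.
    refs = [el.get("slice-ref") for el in root.iter("cue") if el.get("slice-ref")]
    refs += [el.get("parent-ref") for el in root.iter("slice") if el.get("parent-ref")]
    return [ref for ref in refs if ref not in declared]
```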

Cuelists

A cuelist is a sequential collection of references to slices. Slices can appear in any order and any number of times in the cuelist. When the cuelist gets played (how to play is not the subject of this document), it sequentially outputs the slices in the same way as if they had been joined into one larger waveform being played.

Cuelists can be named. Each slice referenced can be repeated any number of times.

Example:

<cuelist name="myCuelist">
    <cue slice-ref="intro" repeat="2"/>
    <cue slice-ref="vocals1"/>
    <cue slice-ref="bridge1"/>
    <cue slice-ref="intro"/>
    <cue slice-ref="vocals2"/>
    <cue slice-ref="chorus1" repeat="3"/>
</cuelist>

Recursive use of cuelists

Cuelists can be declared inside cuelists. This is especially useful when the inner cuelist is to be repeated multiple times (by use of the repeat attribute).

In order to refer to cuelists declared elsewhere in the document, the <cuelist-ref> tag can be used.

Example:

<cuelist name="song">
    <cue slice-ref="intro" repeat="2"/>
    <cue slice-ref="vocals1"/>
    
    <cuelist repeat="2">
        <cue slice-ref="bridge1"/>
        <cue slice-ref="intro"/>
    </cuelist>
    
    <cuelist-ref name="more"/>
</cuelist>

...

<cuelist name="more">
    ...
</cuelist>
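
How nested cuelists, references and repetitions unfold into a flat playing order can be sketched in Python with the standard xml.etree.ElementTree module. The function name expand is hypothetical, the sample document is a shortened variant of the example above, and the sketch does not guard against cuelists that reference each other in a cycle.

```python
import xml.etree.ElementTree as ET

DOC = """
<instrument>
    <cuelist name="song">
        <cue slice-ref="intro" repeat="2"/>
        <cuelist repeat="2">
            <cue slice-ref="bridge1"/>
            <cue slice-ref="intro"/>
        </cuelist>
        <cuelist-ref name="more"/>
    </cuelist>
    <cuelist name="more">
        <cue slice-ref="chorus1"/>
    </cuelist>
</instrument>
"""


def expand(cuelist, named, repeat_override=None):
    """Return the flat list of slice names played by a <cuelist> element."""
    once = []
    for child in cuelist:
        if child.tag == "cue":
            once += [child.get("slice-ref")] * int(child.get("repeat", "1"))
        elif child.tag == "cuelist":
            once += expand(child, named)
        elif child.tag == "cuelist-ref":
            # A repeat attribute on the reference replaces the one of the target.
            once += expand(named[child.get("name")], named, child.get("repeat"))
    repeat = repeat_override if repeat_override is not None else cuelist.get("repeat", "1")
    return once * int(repeat)


root = ET.fromstring(DOC)
named = {el.get("name"): el for el in root.iter("cuelist") if el.get("name")}
order = expand(named["song"], named)
```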


4. Tag Reference

<instrument [name=".."]>
{ <wave>
| <slice parent-ref="..">
| <anker relative-anker-ref="..">
| <cuelist>
| ... many many more in a complete specification ... }

</instrument>

The top-level tag of a possible XML specification for musical instruments. Of course, most features of a complete instrument are left out here.

Attribute:

  • name (optional): An identifier with which the instrument can be referenced.

Child-tags:

  • <wave> tags which represent waveform data and associated master-slices.

  • <anker> tags with explicitly set relative-anker-ref attributes (needed to allow an <anker> tag to be specified outside of a <slice> tag).

  • <slice> tags with explicitly set parent-ref attributes (needed to allow a <slice> tag to be specified outside a <wave> or another <slice> tag).

  • <cuelist> tags which declare lists that recombine slices declared in the instrument.



<wave [name=".."] src=".." [bpm=".."] [sample-rate=".."]>
{ <slice> .. </slice>
| <anker name=".."> .. </anker> }

</wave>

(What about <binary-data>.. base64-encoded data..</binary-data> to store the waveform directly in the XML?)

Attributes:

  • name (optional): An identifier with which the waveform could be referenced by a host application.

  • src: path to a file from which to load the waveform (maybe also a URL?)

Child-tags:

  • <anker> tags with a name attribute may be specified which can be referenced by anker-ref, start-anker-ref, end-anker-ref and relative-anker-ref attributes.

  • Inner <slice> tags will implicitly have the wave's master-slice representation as their parent-slice.



<slice name=".." [from=".."] [to=".."] [length=".."] [start-anker-ref=".."] [end-anker-ref=".."] [parent-ref=".."]>
[ <start> <anker> .. </anker>|<anker-ref/> </start> ]
[ <end> <anker> .. </anker>|<anker-ref/> </end> ]
{ <anker name=".."> .. </anker> }
{ <slice> .. </slice> }

{ <pad> .. </pad> }
</slice>

Attributes:

  • name: An identifier with which the slice can be referenced.

  • from (optional): A time-offset as string (e.g. '5.2s', '3.5~', '278043#') which denotes the start time of the slice relative to the start anker of the parent-slice. A start-anker representing this start position gets implicitly created.

  • to (optional): A time-offset as string which denotes the end time of the slice relative to the start anker of the parent-slice. An end-anker representing this end position gets implicitly created.

  • length (optional): A time-value as string which denotes the length of the slice. Corresponding start-ankers and end-ankers are implicitly created, relative to the end-anker of the previous slice in a consecutive sequence of slices (all created as child-slices of the same parent-slice), or relative to the parent-slice's start-anker if the currently declared slice is the first child-slice.

  • start-anker-ref (optional): Reference to an anker declared elsewhere in the document which marks the start-position of the slice. An application SHOULD evaluate this reference dynamically after parsing the whole XML document in order to allow forward references in the document.

  • end-anker-ref (optional): Reference to an anker declared elsewhere in the document which marks the end-position of the slice. An application SHOULD evaluate this reference dynamically after parsing the whole XML document in order to allow forward references in the document.

  • parent-ref (optional): Reference to a slice declared elsewhere in the document which will become the parent-slice of the currently declared slice. By default, the outer slice, inside of which the current slice is declared, will be used as parent-slice. An application SHOULD evaluate this reference dynamically after parsing the whole XML document in order to allow forward references in the document.

Child-tags:

  • Inside either a <start> or an <end> tag, <anker> or <anker-ref> tags can be specified to mark the slice's start and end ankers. This is the long syntax alternative to the short form of referencing ankers via the start-anker-ref or end-anker-ref attributes.

  • <anker> tags with a name attribute may be specified which can be referenced by anker-ref, start-anker-ref, end-anker-ref and relative-anker-ref attributes.

  • Inner <slice> tags will implicitly have the current slice referenced as their parent-slice and may be specified in consecutive order, giving their lengths only.

  • <pad> tags can help to quickly create anonymous slices in a consecutive sequence of slices which will remain unused.



<start>
( <anker> .. </anker>
| <anker-ref/> )

</start>

Child-tag, exactly one of the following must occur:

  • <anker>: An anker which serves as the start-anker of the slice inside of which this <start> tag occurs.

  • <anker-ref>: A reference to an anker declared elsewhere in the document which serves as the start-anker of the slice inside of which this <start> tag occurs.

Parent-tag:

  • The semantics specified here is only valid if this tag appears as a direct child of a <slice> tag. Other cases may be covered elsewhere.



<end>
( <anker> .. </anker>
| <anker-ref/> )

</end>

Child-tag, exactly one of the following must occur:

  • <anker>: An anker which serves as the end-anker of the slice inside of which this <end> tag occurs.

  • <anker-ref>: A reference to an anker declared elsewhere in the document which serves as the end-anker of the slice inside of which this <end> tag occurs.

Parent-tag:

  • The semantics specified here is only valid if this tag appears as a direct child of a <slice> tag. Other cases may be covered elsewhere.



<anker [name=".."] [relative-anker-ref=".."]>
<time> .. </time>
</anker>

Attributes:

  • name (optional): An identifier with which the anker can be referenced.

  • relative-anker-ref (optional): Another anker relative to which this anker is placed, with the specified time as offset. If this <anker> tag is specified inside a <slice> tag, the relative anker defaults to this slice's start-anker. An application SHOULD evaluate this reference dynamically after parsing the whole XML document in order to allow forward references in the document.

Child-tag:

  • <time>: A time instance giving the offset between the relative-anker and the anker currently declared.



<anker-ref name=".."/>

References an anker which is declared elsewhere. The referenced anker is treated as if it was declared in the current place by an <anker> declaration.

Attributes:

  • name: The identifier of an anker to reference. An application SHOULD evaluate this reference dynamically after parsing the whole XML document in order to allow forward references in the document.



<time>
( <beats>float-value</beats>
| <secs>float-value</secs>
| <frames>int-value</frames>
| <rel>float-value</rel> )
</time>

Child-tags, exactly one of the following must occur:

  • <beats>: Time-value in beats. The physical time in seconds that results from this value depends on the bpm value associated with the waveform.

  • <secs>: Time-value in seconds. The position in the waveform data (number of sample-frames) depends on the sample-rate associated with the waveform.

  • <frames>: Time-value in sample-frames of the waveform. The physical time in seconds depends on the sample-rate associated with the waveform.

  • <rel>: Relative time-value, representing a portion of the length of the current slice inside which this time-value is used.
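
Reading a <time> element and enforcing the "exactly one child" rule can be sketched in Python with the standard xml.etree.ElementTree module. The function name parse_time_element is hypothetical; the value types follow the float-value and int-value notation above.

```python
import xml.etree.ElementTree as ET

TIME_UNITS = {"beats", "secs", "frames", "rel"}


def parse_time_element(time_el):
    """Return (unit, value) for a <time> element with exactly one unit child."""
    children = list(time_el)
    if len(children) != 1 or children[0].tag not in TIME_UNITS:
        raise ValueError("<time> needs exactly one of <beats>, <secs>, <frames>, <rel>")
    child = children[0]
    # Sample-frames are whole numbers, all other units are floats.
    value = int(child.text) if child.tag == "frames" else float(child.text)
    return child.tag, value


beats = parse_time_element(ET.fromstring("<time><beats>2.5</beats></time>"))
frames = parse_time_element(ET.fromstring("<time><frames>278043</frames></time>"))
```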



<pad ( to=".." | length=".." ) />

Inserts an unused gap in the sequence of slices. This is equivalent to creating a slice with just a dummy-name and then simply not using it. Compared to using <slice> with a dummy-name, the <pad> tag greatly improves human readability when inserting such gaps.

Attribute, exactly one must occur:

  • to: A time-value specifying an offset beginning from the parent-slice's start-anker to pad until.

  • length: A time-value specifying the length to pad.



<cuelist [name=".."] [repeat=".."] [loop="true|false"]>
{ <cue ../>
| <cuelist> .. </cuelist>
| <cuelist-ref/> }

</cuelist>

Declares a cuelist, i.e. a sequence of references to slices. A host application could e.g. output the resulting audio when the user invokes a 'play' command on the cuelist.

Attributes:

  • name (optional): An identifier with which the cuelist can be referenced.

  • repeat (optional): Number of times to repeat this cuelist when it gets played. Default: 1.

  • loop (optional): Flag to indicate whether to endlessly repeat the cuelist when played by a host application. Gets ignored if this cuelist is used as an inner cuelist inside another cuelist.

Child-tags:

  • <cue ../>: Sequence of references to slices to be played consecutively in the cuelist.

  • <cuelist>: Inner cuelists.

  • <cuelist-ref ../>: Reference to a cuelist declared elsewhere in the document.



<cuelist-ref name=".." [repeat=".."]/>

References a cuelist declared elsewhere in the document. The referenced cuelist is treated as if it was declared in the current place by a <cuelist> declaration.

Attributes:

  • name: Cuelist to reference. An application SHOULD evaluate this reference dynamically after parsing the whole XML document in order to allow forward references in the document.

  • repeat (optional): Number of times to repeat the referenced cuelist. This value replaces a possibly set repeat-attribute of the referenced cuelist. If not set, the number of repetitions depends on the repeat-attribute of the referenced cuelist.



<cue slice-ref=".." [repeat=".."]/>

Places a reference to a slice into a cuelist.

Attributes:

  • slice-ref: Name of a slice to include in the cuelist. An application SHOULD evaluate this reference dynamically after parsing the whole XML document in order to allow forward references in the document.

  • repeat (optional): An integer number specifying how often the referenced slice is to be repeated. Default: 1.



Authors so far:
2005-09-16 Jens Gulden, jgulden@cs.tu-berlin.de