A first draft of a subset of functionality for an open XML standard for musical instruments.
version 0.1
This document describes a model for handling the notions of time, positions and parts of audio waveforms. The model is intended as the basis from which a partial XML specification for musical instruments is derived. In practice, an implementation of the model would provide functionality to cut an audio waveform into pieces and recombine them in a different order, or to repeat some pieces.
The specification may become part of an overall XML standard for musical instruments, as discussed on the open-instruments mailing list. The proposal given here is not primarily meant to be used independently of further specification work. However, a prototype implementation that realizes a functional subset is available.
While terms like "time" and "position" seem clearly defined in a physical sense (locating coordinates in 4D space-time), in a musical context "time" and "position" are terms with more facets and multiple dimensions of meaning. The notions of time and positions in musical pieces are rarely used in an absolute sense. Instead, an amount of time or a position in music usually depends on other instances of times and positions, and on other musical characteristics of the sound, e.g. the piece's tempo or a waveform's sample-rate. Times and positions are linked relative to each other, so the meanings of "time" and "position" are recursively interdependent in a musical context.
The model proposed here tries to capture these notions of time and positions as 'musically' as possible. This requires time, positions and parts of sound pieces to be understood as each having several possible different meaning extensions, depending on the way they are used in a musical context. Important conceptual requirements that arise from the musical interpretation of the terms are:
Time in a musical piece may simply refer to clock time
(e.g. '5.2 s'). But especially in the context of music, time may
also refer to a beat-measurement scale (e.g. 'three...and'-beats)
or, when handling waveform data, may be given as number of
sample-frames. Time can also be understood as being relative to a
part (e.g. 'two-thirds of part A').
Just as in a physical or geometric sense, a time-value is a differential value. It denotes the length of musical pieces, not a point in time or a position.
Positions in musical pieces are understood as inherently relative to other positions (e.g. 'at beat 3 after the end of the second verse'), not just as absolute positions denoted by a single time-value. Thus, a position is specified recursively by referring to another position, combined with a time offset that denotes the distance between the two positions.
Parts of musical pieces can be related to each other in several different ways, but parts may also be completely unrelated to each other. The specification is thus required to allow flexible declarations of parts, e.g. giving parts in a consecutive sequence, placing them in a hierarchy (as parts of other parts (of other parts,... etc.)), or freely overlapping.
To embed the above requirements into a formalized model, conceptual terms and additional helper terms are introduced as model elements. These basic model-terms are as follows:
A waveform is the physical representation of a sound or musical piece. A waveform has a sample-rate and a length in frames associated with it, as well as a beats-per-minute measure (which may default to a fixed value and remain unused for sounds that are not tempo-structured). Optionally, a waveform can also have a name by which it can be referred to. Every waveform owns an implicit reference to exactly one master-slice, from which any number of slices can be derived.
A slice is a technical alias for "part of a waveform". A slice is delimited by a start-anker and an end-anker. Every slice also references exactly one parent-slice. The positions specified by the start-anker and end-anker are usually relative to the parent-slice. In most cases, slices also have a name which identifies them uniquely.
A slice is specified recursively: it always has a parent-slice associated with it. This way, a slice is always a child of another slice, which in turn is a child of another slice, and so on.
Different slices may overlap.
The top-most slice, which represents a whole waveform, is called the master-slice. The master-slice is never created explicitly. Instead, there is an implicit 1:1 relationship between each waveform and the one master-slice representing it (i.e., every waveform 'automatically' has one master-slice associated with it). As the relationship between a waveform and its master-slice is both 1:1 and implicit, a waveform can safely be said to be its own master-slice, or to be used transparently in the role of its own master-slice. That is, the waveform is treated as if it were the master-slice itself, which allows the concept of master-slices to be hidden from end-users. See the specification of the <waveform> tag below in the XML specification.
An anker is a technical alias for "position in a slice". Ankers are always specified relative to another anker, usually the start-anker of the current slice's parent-slice. As each anker refers to another anker it is relative to, the concept of ankers is again a recursive one (like the concept of slices, see above). The time offset between the relative anker and the specified anker is given by a time instance.
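The recursive definition of ankers can be illustrated with a minimal Python sketch. The class name `Anker`, its fields and the helper `absolute_seconds` are hypothetical and not part of the prototype implementation; offsets are assumed to be already converted to seconds, and an anker without a relative anker is assumed to sit at the start of the master-slice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Anker:
    # Time offset (assumed to be in seconds) from the anker this one is relative to.
    offset_seconds: float
    # None is taken to mean "relative to the start of the master-slice".
    relative_to: Optional["Anker"] = None


def absolute_seconds(anker: Anker) -> float:
    """Sum the offsets along the chain of relative ankers."""
    total = 0.0
    current: Optional[Anker] = anker
    while current is not None:
        total += current.offset_seconds
        current = current.relative_to
    return total


# 'at 1.5 s after the end of the second verse', with the verse ending at 30 s
verse_end = Anker(30.0)
after_verse = Anker(1.5, relative_to=verse_end)
```

In this sketch, `absolute_seconds(after_verse)` walks two links of the chain and adds 1.5 and 30.0.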
Time in a sound or musical piece. As demanded by the requirements, time can be specified either as measures in beats, given in sample-frames, as time in seconds, or be relative to the length of the parent-slice. Such different possible descriptions of time are named time-values.
A time-value is a symbolic value from which time can be derived. The string is composed of a numeric part and an optional suffix which determines its type (beats, sample-frames, seconds, etc.). The suffix can be
"~" for beat measurements,
"#" for sample-frames,
"s" for seconds,
"ms" for milliseconds,
or none for a relative value between 0.0 (start of the parent-slice) and 1.0 (end of the parent-slice).
Note that the same time can be expressed by different time-values.
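To make the suffix rules concrete, here is a minimal Python sketch of how a time-value string might be parsed and converted to seconds. The names `TimeValue`, `parse_time_value` and `to_seconds` are hypothetical, and the conversion formulas (beats via beats per minute, sample-frames via the sample-rate, relative values via the parent-slice length) are assumptions derived from the definitions above, not taken from the prototype.

```python
from dataclasses import dataclass

# Suffixes as listed above; "ms" is checked before "s" so it is not misread.
SUFFIXES = (("ms", "milliseconds"), ("~", "beats"), ("#", "frames"), ("s", "seconds"))


@dataclass(frozen=True)
class TimeValue:
    amount: float
    unit: str  # "beats", "frames", "seconds", "milliseconds" or "relative"


def parse_time_value(text: str) -> TimeValue:
    """Split a time-value string into its numeric part and its unit."""
    text = text.strip()
    for suffix, unit in SUFFIXES:
        if text.endswith(suffix):
            return TimeValue(float(text[: -len(suffix)]), unit)
    return TimeValue(float(text), "relative")


def to_seconds(tv: TimeValue, bpm: float, sample_rate: float, parent_seconds: float) -> float:
    """Convert a time-value to seconds, given the waveform and parent-slice context."""
    if tv.unit == "seconds":
        return tv.amount
    if tv.unit == "milliseconds":
        return tv.amount / 1000.0
    if tv.unit == "beats":
        return tv.amount * 60.0 / bpm
    if tv.unit == "frames":
        return tv.amount / sample_rate
    return tv.amount * parent_seconds  # relative to the parent-slice length
```

With 120 beats per minute, a sample-rate of 44100 and a parent-slice of 2 seconds, the strings "2~", "44100#", "1s", "1000ms" and "0.5" all describe the same duration of one second in this sketch.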
One way to gain practical use from cutting musical pieces into slices is to build a new sequence of musical pieces from the slices. This way, e.g. a song could first be divided into slices of verses, choruses, bridge-parts etc., and later get combined in a different order. Such a re-combined sequence is called a cuelist.
A cue is a reference to a slice, to be used inside a cuelist. The advantage of combining cues into cuelists (and not directly combining slices into possible 'slicelists') is that by using cues as explicit references to slices, confusion about identity and multiplicity is avoided. Multiple occurrences of the same slice inside a cuelist become cleanly modeled as multiple distinct cues that reference one and the same slice. For convenience, each cue SHOULD be repeatable/loopable n times, so that repetitions of single slice-occurrences can be described by a single cue that gets repeated n times.
The uppercase key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are intended to be interpreted as described in RFC 2119.
The following figure gives a visual representation of the model terms, displaying a waveform and its slices as amplitude-graphs on a timeline:
The relationships between the introduced terms are displayed in the following diagram:
In order to prepare an implementation of the model in XML, a formalized version of the above concepts is required. A first attempt is given here as a class diagram in the Unified Modeling Language (UML). The model also provides a basis for the SuperCollider prototype implementation, which is why the classes SF3 and BBCutProc from the BBCut library are referenced in addition to the declaration of the model concepts.
Additional formal semantics will be given by the XML specification in the next chapter.
For details on the methods of these classes, see the prototype implementation.
In order to apply the model in the prototype implementation, a basic concept of an XML-configurable instrument needs to be introduced. An overall open-instruments XML standard should define the functionality of such an instrument much more precisely, but for now a preliminary class SFSliceInstrument is sufficient for this task. The following class diagram shows its declaration, as it is part of the prototype implementation.
One important concept which is orthogonal to several entity-types of the model is the notion of playability. In the musical context modeled here, a play-action can be applied to several different kinds of objects. The following objects can be played:
instruments
waveforms
slices
cuelists
It is important to note that 'to play' does not have exactly the same meaning when applied to these different entity-types. In the formal model we may express the common concept of playability by an interface Playable on a syntactical level. However, it is up to the individual implementations of the play() methods in the corresponding classes to give precise semantics to what 'playing' means for that specific entity-type. Note that these play() methods are not polymorphically related to each other, as their classes do not share a common inheritance hierarchy. The class diagram below shows the playable entity-types:
It remains a task of an overall XML instrument specification to formally distinguish between several variants of what 'to play' means. For example, it is obvious that 'to play an instrument' has different semantics from 'to play a sound' or 'to play a score'. It appears that playing a sound also means playing an instrument at the same time, and that playing a score always implies playing sounds, which in turn implies playing instruments.
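The idea of a purely syntactical Playable interface, shared by classes that are not related by inheritance, can be sketched in Python with a structural `Protocol`. The classes `Slice` and `Cuelist` below are simplified stand-ins for the model elements, and returning a description string from `play()` is an assumption made only so that the sketch has observable behaviour.

```python
from typing import List, Protocol, runtime_checkable


@runtime_checkable
class Playable(Protocol):
    """Anything offering a play() method counts as playable (no base class needed)."""

    def play(self) -> str: ...


class Slice:
    def __init__(self, name: str) -> None:
        self.name = name

    def play(self) -> str:
        # For a slice, 'playing' means rendering its own part of the waveform.
        return f"slice {self.name}"


class Cuelist:
    def __init__(self, slices: List[Slice]) -> None:
        self.slices = slices

    def play(self) -> str:
        # For a cuelist, 'playing' means playing the referenced slices in order.
        return " + ".join(s.play() for s in self.slices)


song = Cuelist([Slice("intro"), Slice("vocals1")])
```

Both classes satisfy `Playable` without a common superclass, which mirrors the remark that the play() methods are not polymorphically related; each class gives 'playing' its own meaning.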
The formal model developed in the previous sections is now used as a basis for specifying a set of XML tags.
This is an example of how a waveform is sliced into parts and re-joined as a cuelist in which the original parts are played in a different order and with repetitions:
<instrument> <!-- just a placeholder for possibly more flexible "instruments" -->
  <wave src="examples/audio/NosNodOrr.wav"> <!-- wavefile implicitly used as master-slice -->
    <slice name="start" to="20.55839s"/>
    <slice name="leer" to="27.69878s"/>
    <slice name="zwischen" to="34.83637s"/>
    <slice name="wummer" from="33.06s" to="34.82s"/>
    <slice name="basis" from="34.9s" to="49.16338s"/>
    <slice name="funny" from="49.19s" to="56.33s"/>
    <slice name="melody1" from="56.35s" to="63.46s"/>
    <slice name="melody2" from="63.5s" to="70.65s"/>
    <slice name="melody3" from="70.7s" to="77.8371s"/>
    <slice name="end" to="1.0"/>
  </wave>
  <cuelist name="song" loop="false">
    <cue slice-ref="start"/>
    <cue slice-ref="zwischen"/>
    <cue slice-ref="wummer" repeat="4"/>
    <cue slice-ref="basis"/>
    <cue slice-ref="funny"/>
    <cue slice-ref="melody1" repeat="2"/>
    <cue slice-ref="melody2"/>
    <cue slice-ref="basis"/>
    <cue slice-ref="melody1" repeat="2"/>
    <cue slice-ref="melody3"/>
    <cue slice-ref="wummer" repeat="8"/>
    <cue slice-ref="funny"/>
    <cue slice-ref="melody1"/>
    <cue slice-ref="melody2"/>
    <cue slice-ref="melody3"/>
    <cue slice-ref="end"/>
    <cue slice-ref="melody2"/>
    <cue slice-ref="melody1"/>
    <cue slice-ref="melody2"/>
    <cue slice-ref="melody3"/>
    <!--<cue slice-ref="melody1" repeat="2"/> <cue slice-ref="melody2"/>-->
    <cue slice-ref="end"/>
  </cuelist>
</instrument>
Let's look at this example step by step:
The <instrument> tag is an outer container tag that initiates the document. In an overall XML specification for musical instruments, <instrument> could carry far more functionality than listed here.
The <wave src=".."> tag declares a wavefile to be loaded.
The <slice> tags inside the <wave> tag describe parts into which the waveform is divided. Note that the conceptual model requires all slice tags to appear inside other slices (their parent-slice), but here the <wave> tag is parsed to implicitly provide a master-slice, so that <slice> tags can appear inside <wave>.
Expressions of time can be specified in either a long, full-sized form, or a short form via a parseable string. In most cases the short form will be more suitable to write and read, so it is the primary recommended form. See below for the full-sized form.
Using the short form, time-values are specified via attributes start, end, to, length etc. inside their owning XML tag (e.g. <slice>). Time-values of different types are distinguished via a suffix in the string, which can be
"~" for beat measurements,
"#" for sample-frames,
"s" for seconds,
"ms" for milliseconds,
or none for a relative value between 0.0 (start of the parent-slice) and 1.0 (end of the parent-slice).
These are examples of how to specify time-values directly via attributes:
<slice name="test" start="0.1s" end="3.5s"/>
<slice name="test2" start="2.5~" end="1.0"/>
The use of the from and to attributes in slice tags is important and distinguishes between different semantics of the <slice> tag. Three cases need to be distinguished:
Both a from and a to attribute are given: in this case the slice is placed between two ankers which are implicitly created at the times specified by the time-values given via from and to.
Only a from attribute is given: in this case a start-anker is implicitly created at the time given by the from attribute, and the slice's end is set to the end of its parent-slice.
Only a to attribute is given: in this case the slice starts directly after the previously declared slice (in terms of XML: the <slice> tag which is the previous sibling), or at the beginning of the parent-slice if the current <slice> tag is the first on its hierarchy level.
Example:
<wave name="mywave" src="/data/wav/loop3.wav">
  <slice name="intro" to="12.305s"/> <!-- first slice starts at time-value 0.0 -->
  <slice name="vocals1" to="37.975s"/> <!-- identical: from="12.305s" to="37.975s" -->
  <slice name="bridge1" to="49.534s"/>
</wave>
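The three from/to cases can be sketched in Python as follows. This is a simplified illustration under several assumptions: only time-values with the "s" suffix are handled, the helper names `seconds` and `slice_bounds` are hypothetical, the length of the parent is passed in explicitly, and a slice with neither attribute is rejected because the short form does not define it. The third slice in the sample XML is changed to a from-only slice in order to show that case as well.

```python
import xml.etree.ElementTree as ET

XML = """
<wave name="mywave" src="/data/wav/loop3.wav">
  <slice name="intro" to="12.305s"/>
  <slice name="vocals1" to="37.975s"/>
  <slice name="bridge1" from="40s"/>
</wave>
"""


def seconds(text: str) -> float:
    """Read a time-value given in seconds (other suffixes are left out of this sketch)."""
    if text.endswith("ms") or not text.endswith("s"):
        raise ValueError(f"this sketch only handles seconds: {text!r}")
    return float(text[:-1])


def slice_bounds(parent: ET.Element, parent_length: float) -> dict:
    """Map each child slice name to (start, end) in seconds within the parent."""
    bounds = {}
    previous_end = 0.0  # the first slice starts at the beginning of the parent
    for child in parent.findall("slice"):
        start_attr, end_attr = child.get("from"), child.get("to")
        if start_attr is None and end_attr is None:
            raise ValueError("neither 'from' nor 'to' given")
        # Only 'to': start where the previous sibling ended.
        start = seconds(start_attr) if start_attr is not None else previous_end
        # Only 'from': end at the end of the parent-slice.
        end = seconds(end_attr) if end_attr is not None else parent_length
        bounds[child.get("name")] = (start, end)
        previous_end = end
    return bounds


result = slice_bounds(ET.fromstring(XML), parent_length=60.0)
```

Here "vocals1" starts at 12.305 s because its previous sibling ends there, and "bridge1" runs from 40 s to the assumed parent length of 60 s.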
As already shown in some of the above examples, a slice can be specified in alternative semantic ways. A slice's overall position, relative to the master waveform, results from specifying any one of the following:
Specifying both the start-anker and the end-anker independently: <slice start-anker-ref="time-value" end-anker-ref="time-value" name="..."/>
Setting a start-anker and letting the end-anker follow after a given time:* <slice start="time-value" length="time-value" name="..."/>
Setting an end-anker and letting the start-anker appear a specified amount of time earlier:* <slice end="time-value" length="time-value" name="..."/>
Setting an end-anker and using the end of the most recently declared slice as the start: <slice to="time-value" name="..."/>
Using the end-time of the most recently declared slice as the start-anker's time and creating an end-anker the specified length after the start-anker:* <slice length="time-value" name="..."/>
* An implementation may decide to internally create the new anker relative to the slice's specified start-anker, or to sum up the times algorithmically and create an end-anker that is relative to the same anker as the start-anker.
A parser SHOULD be aware of these different combinations and report an error if invalid attribute sets (e.g. an end attribute combined with a to attribute) are encountered.
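A parser-side check of attribute combinations might look like the following Python sketch. The set of allowed combinations is an assumption collected from the list above and from the from/to and start/end examples in this document; the names `ALLOWED`, `NON_POSITIONAL` and `check_slice_attributes` are hypothetical.

```python
# Attribute sets that position a slice, collected from this document (assumed complete).
ALLOWED = {
    frozenset({"start-anker-ref", "end-anker-ref"}),
    frozenset({"start", "end"}),
    frozenset({"start", "length"}),
    frozenset({"end", "length"}),
    frozenset({"from", "to"}),
    frozenset({"from"}),
    frozenset({"to"}),
    frozenset({"length"}),
}

# Attributes that do not take part in positioning and are ignored by the check.
NON_POSITIONAL = {"name", "parent-ref"}


def check_slice_attributes(attrs: dict) -> frozenset:
    """Return the positional attribute set of a <slice>, or raise on an invalid mix."""
    positional = frozenset(attrs) - NON_POSITIONAL
    if positional not in ALLOWED:
        raise ValueError(f"invalid attribute combination: {sorted(positional)}")
    return positional
```

For instance, a slice carrying only `to` passes the check, while the combination of `end` and `to` mentioned above raises a `ValueError`.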
The short form of the <slice> tag notation introduced above makes use of from and to attributes, which lead to the implicit creation of the corresponding start-anker and end-anker of a slice by the parsing application.
Instead of using the short notation, both ankers and times can also be notated explicitly in a full-size XML notation. The conceptual distinctions between ankers, times and a time-value's type thus remain completely projected into XML semantics.
Example:
<slice name="myslice">
  <start>
    <anker name="my_unique_start_anker"> <!-- a name is given for also referencing this anker from elsewhere -->
      <time>
        <beats>2.5</beats>
      </time>
    </anker>
  </start>
  <end>
    <anker-ref name="my_unique_end_anker-declared_somewhere_else"/>
  </end>
</slice>
In order to mark areas of the waveform as unused (among the children of one parent-slice), a 'null-slice' can be specified using the <pad> tag. The <pad> tag only makes sense in a sequence of <slice> tags which are specified as consecutively following each other (i.e. tags of the form <slice to="..." name="..."> or <slice length="..." name="...">).
In line with this usage inside slice sequences, the <pad> tag has two semantic variants for declaring an unused part of the sound or musical piece:
Padding to a specific anker (any slice or pad created afterwards and following consecutively will start at this anker time): <pad to="time-value"/>
Padding a specific amount of time: <pad length="time-value"/>
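The effect of pads inside a consecutive sequence can be illustrated with a small Python sketch. It assumes that all time-values have already been converted to seconds and represents each tag as a plain dictionary; the function name `layout` and this representation are hypothetical.

```python
def layout(items: list) -> dict:
    """Place consecutive slices and pads; return {slice name: (start, end)} in seconds.

    Each item is a dict with a "tag" ("slice" or "pad") and either a "to" or a
    "length" entry, mirroring <slice to=..>, <slice length=..>, <pad to=..> and
    <pad length=..>.
    """
    bounds = {}
    cursor = 0.0  # end of the previously declared slice or pad
    for item in items:
        end = item["to"] if "to" in item else cursor + item["length"]
        if item["tag"] == "slice":
            bounds[item["name"]] = (cursor, end)
        # A pad declares no slice; it only moves the cursor forward.
        cursor = end
    return bounds


sequence = [
    {"tag": "slice", "name": "intro", "to": 10.0},
    {"tag": "pad", "length": 2.0},
    {"tag": "slice", "name": "verse", "length": 8.0},
    {"tag": "pad", "to": 25.0},
    {"tag": "slice", "name": "outro", "to": 30.0},
]
placed = layout(sequence)
```

In this example the span from 10 s to 12 s and the span from 20 s to 25 s remain unused, and "verse" and "outro" start right after the respective pad.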
Slices can be implicitly specified as relative to a parent-slice by placing their definitions inside a sub-tree of the parent-slice's declaration.
Example:
<slice name="vocals1" from="9.35s" to="37.975s">
  <slice name="first-half-of-vocals1" start="0.0" end="0.5"/>
</slice>
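As a worked example of the arithmetic behind nested slices, the following Python sketch maps relative time-values (without suffix) onto the parent-slice's span. The function name `resolve_relative` is hypothetical, and the parent's bounds are assumed to be already known in seconds.

```python
def resolve_relative(parent_start: float, parent_end: float,
                     rel_start: float, rel_end: float) -> tuple:
    """Map relative positions (0.0 = parent start, 1.0 = parent end) to seconds."""
    length = parent_end - parent_start
    return (parent_start + rel_start * length, parent_start + rel_end * length)


# "first-half-of-vocals1" inside "vocals1" (9.35 s to 37.975 s)
first_half = resolve_relative(9.35, 37.975, 0.0, 0.5)
```

The parent is 28.625 s long, so the child slice spans from 9.35 s to roughly 23.6625 s (up to floating-point rounding).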
An alternative to nesting a <slice> tag inside another <slice> tag is to reference the parent-slice explicitly via the parent-ref attribute.
Example:
<slice name="vocals1" from="9.35s" to="37.975s">
  ...
</slice>
...
<slice parent-ref="vocals1" name="first-half-of-vocals1" start="0.0" end="0.5"/>
Just as slices can be referenced via their identifier names, ankers can be referenced via their names. This is especially useful when multiple slices are to share common ankers.
Example:
<wave name="mywave" src="/data/wav/loop3.wav"/>
<slice parent-ref="mywave" name="vocals1">
  <anker role="start" name="my_unique_start_anker"> <!-- a name is given for also referencing this anker from elsewhere -->
    <time>
      <beats>2.5</beats>
    </time>
  </anker>
  <anker role="end" name="my_unique_end_anker">
    <time>
      <beats>4.0</beats>
    </time>
  </anker>
</slice>
...
<slice parent-ref="vocals1" name="first-half-of-vocals1" start="0.0" end="0.5"/>
An implementation SHOULD evaluate references to other declared
elements after having completely parsed the XML, in order to
allow forward-references to identifiers declared later.
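One way to support forward-references is a two-pass approach, sketched here in Python with the standard `xml.etree.ElementTree` module. The function name `resolve_parent_refs` is hypothetical, and only parent-ref references between named slices are covered.

```python
import xml.etree.ElementTree as ET

# The child slice is declared before the parent it refers to.
XML = """
<instrument>
  <slice parent-ref="vocals1" name="first-half-of-vocals1" start="0.0" end="0.5"/>
  <slice name="vocals1" from="9.35s" to="37.975s"/>
</instrument>
"""


def resolve_parent_refs(root: ET.Element) -> dict:
    """Return {slice name: parent element} for all slices that use parent-ref."""
    # Pass 1: index every named slice, wherever it is declared.
    by_name = {el.get("name"): el for el in root.iter("slice") if el.get("name")}
    # Pass 2: look up each parent-ref now that all names are known.
    parents = {}
    for el in root.iter("slice"):
        ref = el.get("parent-ref")
        if ref is None:
            continue
        if ref not in by_name:
            raise KeyError(f"unknown slice referenced: {ref!r}")
        parents[el.get("name")] = by_name[ref]
    return parents


parents = resolve_parent_refs(ET.fromstring(XML))
```

Because the lookup happens only after the whole document has been indexed, the order of declarations does not matter; a reference to a name that is never declared is reported as an error.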
A cuelist is a sequential collection of references to slices. Slices can appear in any order and any number of times in the cuelist. When the cuelist is played (how to play is not the subject of this document), it sequentially outputs the slices as if they had been joined into one larger waveform.
Cuelists can be named. Each slice referenced can be repeated any number of times.
Example:
<cuelist name="myCuelist">
  <cue slice-ref="intro" repeat="2"/>
  <cue slice-ref="vocals1"/>
  <cue slice-ref="bridge1"/>
  <cue slice-ref="intro"/>
  <cue slice-ref="vocals2"/>
  <cue slice-ref="chorus1" repeat="3"/>
</cuelist>
Cuelists can be declared inside cuelists. This is especially useful when the inner cuelist is to be repeated multiple times (by use of the repeat attribute).
In order to refer to cuelists declared elsewhere in the document, the <cuelist-ref> tag can be used.
Example:
<cuelist name="song">
  <cue slice-ref="intro" repeat="2"/>
  <cue slice-ref="vocals1"/>
  <cuelist repeat="2">
    <cue slice-ref="bridge1"/>
    <cue slice-ref="intro"/>
  </cuelist>
  <cuelist-ref name="more"/>
</cuelist>
...
<cuelist name="more">
  ...
</cuelist>
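How nested cuelists, repeat attributes and <cuelist-ref> tags unfold into a flat playback order can be sketched in Python as follows. The function name `expand` and the contents of the "more" cuelist are hypothetical; the sketch does not guard against cuelists that reference each other cyclically, and it ignores the loop attribute.

```python
import xml.etree.ElementTree as ET

XML = """
<instrument>
  <cuelist name="song">
    <cue slice-ref="intro" repeat="2"/>
    <cuelist repeat="2">
      <cue slice-ref="bridge1"/>
      <cue slice-ref="intro"/>
    </cuelist>
    <cuelist-ref name="more"/>
  </cuelist>
  <cuelist name="more">
    <cue slice-ref="vocals2"/>
  </cuelist>
</instrument>
"""


def expand(cuelist: ET.Element, named: dict) -> list:
    """Return the slice names in playback order for one pass through the cuelist."""
    sequence = []
    for child in cuelist:
        if child.tag == "cue":
            once = [child.get("slice-ref")]
        elif child.tag == "cuelist":
            once = expand(child, named)
        elif child.tag == "cuelist-ref":
            once = expand(named[child.get("name")], named)
        else:
            continue  # unknown tags are skipped in this sketch
        # Each cue, inner cuelist or reference may be repeated n times.
        sequence.extend(once * int(child.get("repeat", "1")))
    return sequence


root = ET.fromstring(XML)
named = {el.get("name"): el for el in root.iter("cuelist") if el.get("name")}
order = expand(named["song"], named)
```

For the sample above, `order` is intro, intro, bridge1, intro, bridge1, intro, vocals2: the inner cuelist is unfolded twice, and the referenced cuelist "more" is spliced in at the end.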
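<instrument> .. </instrument>

<wave [name=".."] src=".." [bpm=".."] [sample-rate=".."]>
  { <slice> .. </slice> | <anker name=".."> .. </anker> | .. }
</wave>

<slice [name=".."] [from=".."] [to=".."] ..>
  [ <start> <anker> .. </anker> | <anker-ref/> </start> ]
  [ <end> <anker> .. </anker> | <anker-ref/> </end> ]
</slice>

<start> ( <anker> .. </anker> | <anker-ref/> ) </start>

<end> ( <anker> .. </anker> | <anker-ref/> ) </end>

<anker ..> .. </anker>
  Attributes: ..
  Child-tag: ..

<anker-ref ../>

<time> ( <beats>float-value</beats> | <secs>float-value</secs> | .. ) </time>

<pad to=".." ../>

<cuelist [name=".."] [repeat=".."] [loop="true|false"]>
  { <cue ../> | .. }
</cuelist>

<cue slice-ref=".." [repeat=".."]/>

<cuelist-ref name=".." [repeat=".."]/>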
Authors so far:
2005-09-16 Jens Gulden, jgulden@cs.tu-berlin.de