Part C - DirectAudio

Three-Dimensional Sound

Describe the properties of a three-dimensional sound
Create the transfer matrix for a three-dimensional sound


Mobile sounds within a digital game improve the multi-media experience considerably.  Three-dimensional sound effects exhibit not only locality but also orientation and range.  The intensity of a local sound varies with the position and orientation of the listener with respect to the sound source.  Sound travels in straight lines and wraps around objects with some difficulty.  Pockets of silence or barely audible sound are not uncommon in three-dimensional multi-media games.  A first approximation to such effects is a projection cone from the sound source. 

In this chapter, we introduce three-dimensional sound.  We describe the properties of the sound cone and the attenuation with distance from the listener. 

3DSound Sample

The 3DSound Sample introduces three local sounds:  a stationary one at [-30, 5, 40] in world space, a stationary one at [30, 5, 40] in world space and one attached to the spinning box on the left.

3D Sound Sample

The user can change the sound file associated with any model sound. 

The projection cone for each local sound is set at 90 degrees.  The listener's position and orientation are those of the current camera.  The projection cones of both stationary sounds are directed along the negative z world axis (viewed in a left-handed system).  Each local sound is audible only if the current camera is within its projection cone.  The sound attached to the spinning box has its projection cone directed along the box's local positive z axis.  As the box rotates, its sound cone may sweep past the current camera, in which case the sound is audible. 


Four components require upgrades to accommodate local audio:

  • Design - the Design class defines the local sounds and toggles them at the user's initiative.
  • Coordinator - updates the sounds created by the Design object
  • Audio - manages the listener.
  • Sound - holds the information for each local sound and manages the playing of each sound.

Audio and Sound Components

Three-Dimensional Effects

The intensity of a three-dimensional sound as heard by a listener depends upon:

  • the position of the sound source
  • the position of the listener
  • the volume-distance formula
  • the rolloff factor
  • the orientation of the listener
  • the properties of the sound projection cone


The intensity varies between a minimum separation and a maximum separation, where separation is the distance between the sound source and the listener.  The minimum separation is the distance between the listener and the sound source within which the sound is at its maximum intensity and does not vary.  The maximum separation is the distance between the listener and the sound source beyond which the sound is at its minimum intensity and does not decrease further.

sound source and listener

The rolloff factor describes the attenuation with increased distance between the listener and the sound source.  A rolloff factor other than zero specifies an attenuation that is a multiple of the real-world attenuation.  A rolloff factor of 0.0 specifies no rolloff with distance. 

Sound Projection Cone

The sound projection cone describes the intensity within three distinct regions: the inside cone, the outside cone, and the silent region.  The properties of the cone include:

  • its orientation
  • the interior angle of the inner cone
  • the interior angle of the outer cone
  • the inner volume
  • the outer volume

sound source and listener
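
The effect of the cone properties listed above on what the listener hears can be sketched as follows.  The Vec3 type and the coneGain() function are hypothetical stand-ins for illustration; they are not the framework's Vector class or an X3DAudio call.  The sketch assumes that the angles are full cone angles in radians and that the volume varies linearly between the inner cone and the outer cone, which may not match the interpolation that the audio API actually uses.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical stand-in for a 3D vector

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// coneGain is a hypothetical helper for illustration only.
// source, listener - positions in world space
// front            - direction of the cone's axis (any non-zero length)
// innerAngle, outerAngle - full cone angles in radians (assumption)
// Returns innerVolume inside the inner cone, outerVolume outside the
// outer cone and a linear blend in between (assumption).
float coneGain(const Vec3& source, const Vec3& front, const Vec3& listener,
 float innerAngle, float outerAngle, float innerVolume, float outerVolume) {
    Vec3 to = {listener.x - source.x, listener.y - source.y,
     listener.z - source.z};
    float lenTo    = std::sqrt(dot(to, to));
    float lenFront = std::sqrt(dot(front, front));
    // listener at the source or no axis: treat as inside the cone
    if (lenTo == 0.0f || lenFront == 0.0f) return innerVolume;
    // angle between the cone's axis and the direction to the listener
    float c = dot(to, front) / (lenTo * lenFront);
    if (c > 1.0f)  c = 1.0f;
    if (c < -1.0f) c = -1.0f;
    float angle     = std::acos(c);
    float halfInner = 0.5f * innerAngle;
    float halfOuter = 0.5f * outerAngle;
    if (angle <= halfInner) return innerVolume;
    if (angle >= halfOuter) return outerVolume;
    float t = (angle - halfInner) / (halfOuter - halfInner);
    return innerVolume + t * (outerVolume - innerVolume);
}
```

For a source at [-30, 5, 40] with a 90 degree cone directed along the negative z axis, an inner volume of 1 and an outer volume of 0, this sketch returns 1 for a listener at [-30, 5, 0] directly in front of the source and 0 for a listener at [30, 5, 40] off to the side.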

X3DAudio Implementation

X3DAudio is an API that augments XAudio2 to account for the position and orientation of sound sources in three dimensions.  X3DAudio uses a listener and an emitter to describe the relative position and orientation of a sound source with respect to the listener (in world coordinates).  X3DAudio connects to XAudio2 through two functions, X3DAudioInitialize() and X3DAudioCalculate(), along with a matrix that converts three-dimensional data into sound output on each of the supported channels. 

X3DAudioInitialize() initializes the speaker configuration and usually is only called once in an application.  X3DAudioCalculate() calculates the matrix that transforms the input volume into output volume. 

We set the three-dimensional properties of a sound source through the SetOutputMatrix() method on the IXAudio2Voice interface. 

Sound Parameters

X3DAudio uses a left-handed Cartesian coordinate system: x left to right, y down to up, z near to far.  To scale the audio coordinates to the graphic coordinates, X3DAudio accepts a distance conversion factor through the CurveDistanceScaler member of the X3DAUDIO_EMITTER structure. 

X3DAudio accepts the position and orientation of a sound source through an X3DAUDIO_EMITTER structure and the properties of its projection cone through an X3DAUDIO_CONE structure.  The X3DAudioCalculate() function uses these values to construct the transfer matrix.  We submit this matrix to the SetOutputMatrix() method on the IXAudio2SourceVoice interface to produce the appropriate volume output for the mastering voice. 


Translation Layer

The Translation Layer settings define the macros for

  • describing the action that clones the sound attached to the left box
  • identifying the default key for the sound cloning action
  • describing the local model sounds
  • identifying the default names of the sound files attached to the local model sounds
  • the sound parameters
    • the ratio of the interior cone angle to the outside cone angle
    • the distance conversion factor (the number of metres in one unit)
    • the minimum distance factor (1 = 1 metre)

This configuration file also defines the enumeration constants for

  • cloning the sound attached to the left box
  • local and mobile sound types
  • model object sound and the local sounds
 // Translation.h
 // ...
 typedef enum Action {
     // ...
 } Action;

     // ...
     L"Toggle Local Left Sound", \
     L"Toggle Local Right Sound", \
     L"Toggle Object Sound", \

 #define ACTION_KEY_MAP {\
     // ...
     KEY_F3, KEY_F4, KEY_F6, KEY_F7, KEY_W, KEY_S, KEY_F8, KEY_F9, KEY_0, \
 // ...
 typedef enum ModelSound {
 } ModelSound;

     L"Local Left",\
     L"Local Right",\

 #define SOUND_MAPPINGS {\
     L"Crickets (by reinsamba) .wav",\
     L"Gong (by reinsamba) .wav",\
     L"Goat (by reinsamba) .wav",\
     L"Fortaleza election campaign (by reinsamba) .wav",\
     L"Street_accordeonist (by reinsamba) .wav",\
 // ...


The Design component creates the local sound sources and controls their on/off states through user initiated toggles. 

The Design class defines instance pointers to the local sounds:

 class Design : public Coordinator {
     // ...
     iSound*  background;   // points to background sound
     iSound*  discrete;     // points to discrete sound
     iSound*  locall;       // points to local sound on the left
     iSound*  localr;       // points to local sound on the right
     iSound*  objectSnd;    // points to object sound
     // ...
     Design(void*, int);
     void initialize();
     void update();



The constructor initializes the instance pointers:

 Design::Design(void* h, int s) : Coordinator(h, s) {
     // ...
     // pointers to the sounds
     background   = nullptr;
     discrete     = nullptr;
     locall       = nullptr;
     localr       = nullptr;
     objectSnd    = nullptr;


The initialize() method creates three local Sound objects, attaches the third local sound to the spinner box on the left, and creates Text items to report the on/off state of each sound:

 void Design::initialize() {
    // ...
    // audio ----------------------------------------------------------------
    // ...
    // create local sound on the left
    if (file(SND_LOCAL_L)) {
        locall = CreateLocalSound(file(SND_LOCAL_L), true, 90);
        locall->translate(-30, 5, 40);
        CreateText(Rectf(0.5f, 0.82f, 1, 0.90f), hud, L"Local left ", onOff,
    // create local sound on the right
    if (file(SND_LOCAL_R)) {
        localr = CreateLocalSound(file(SND_LOCAL_R), true, 90);
        localr->translate(30, 5, 40);
        CreateText(Rectf(0.5f, 0.90f, 1, 0.98f), hud, L"Local right ", onOff,
    // create a local sound attached to right object
    if (file(SND_OBJECT)) {
        objectSnd = CreateLocalSound(file(SND_OBJECT), true, 90);


The update() method toggles the local sounds in response to the user's initiative:

 void Design::update() {
     // ...
     // audio -------------------------------------------------------------
     if (pressed(AUD_BKGRD) && background) background->toggle();
     if (pressed(AUD_IMPLS) && discrete)   discrete->toggle();
     if (pressed(AUD_LOCALL) && locall)    locall->toggle();
     if (pressed(AUD_LOCALR) && localr)    localr->toggle();
     if (pressed(AUD_OBJECT) && objectSnd) objectSnd->toggle();


The Coordinator component adds updating of sound data to account for possible changes in position and orientation. 

The update() method updates the positions and orientations of each Sound object before each frame is rendered:

 void Coordinator::update() {
     // ...
     // update the sound sources
     for (unsigned i = 0; i < sound.size(); i++)
         if (sound[i])
             sound[i]->update();
     // ...


The Audio component manages the three-dimensional processing that is independent of sound source data.  This includes the position and orientation of the listener and the channel properties of the audio device. 

The APIAudio class includes two new instance variables:

 class APIAudio : public iAPIAudio, public APIAudioBase {
    X3DAUDIO_HANDLE   X3DInstance; // X3DAudio properties
    X3DAUDIO_LISTENER Listener;    // listener's position + orientation
    // ...
    // ...



The setup() method retrieves a handle to the X3DAudio engine and initializes the engine for the device configuration, initializes the listener, and copies the engine and listener addresses to the APIAudioBase class variables:

 bool APIAudio::setup() {
     // ...
     else {
         XAUDIO2_DEVICE_DETAILS deviceDetails;
         pXAudio2->GetDeviceDetails(0, &deviceDetails);
         DWORD channelMask = deviceDetails.OutputFormat.dwChannelMask;
         // Initialize the X3DAudio engine
         X3DAudioInitialize(channelMask, X3DAUDIO_SPEED_OF_SOUND,
          X3DInstance);
         ZeroMemory(&Listener, sizeof(X3DAUDIO_LISTENER));
         // set the APIAudioBase class variables
         pX3DInstance = &X3DInstance;
         pListener    = &Listener;
         rc = true;
     return rc;

The GetDeviceDetails() method on the IXAudio2 interface populates the XAUDIO2_DEVICE_DETAILS structure.  The OutputFormat.dwChannelMask member holds the channel to speaker position assignments. 

The X3DAudioInitialize() function initializes the X3DAudio engine for the retrieved channel mask and the speed of sound in world units per second. 


The update() method extracts the listener's position and orientation from the view transformation matrix and saves that data in the Listener class variable for subsequent use by the individual sound sources:

 void APIAudio::update(const void* view) {
     if (view) {
         Vector position           = ((Matrix*)view)->position();
         Vector front              = ::normal(((Matrix*)view)->direction('z'));
         Vector up                 = ::normal(((Matrix*)view)->direction('y'));
         X3DAUDIO_VECTOR eFront    = {front.x, front.y, front.z};
         X3DAUDIO_VECTOR ePosition = {position.x, position.y, position.z};
         X3DAUDIO_VECTOR eUp       = {up.x, up.y, up.z};
         Listener.OrientFront = eFront;
         Listener.Position    = ePosition;
         Listener.OrientTop   = eUp;

The OrientFront, Position, and OrientTop members of the X3DAUDIO_LISTENER structure hold the heading, position, and up direction respectively. 
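
X3DAudio expects the listener's front direction to be of unit length and its top direction to be orthonormal with the front direction.  The 'z' and 'y' directions of a pure rotation already satisfy this once normalized.  If the source matrix might include scaling or shear, a Gram-Schmidt step along the following lines could be applied first.  The Vec3 and Orientation types and the orthonormalize() function are hypothetical and are not part of the framework.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };           // hypothetical 3D vector
struct Orientation { Vec3 front, top; };  // hypothetical result type

static float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// returns v scaled to unit length, or v unchanged if its length is zero
static Vec3 normalize(const Vec3& v) {
    float len = std::sqrt(dot(v, v));
    if (len == 0.0f) return v;
    return Vec3{v.x / len, v.y / len, v.z / len};
}

// orthonormalize is a hypothetical helper for illustration only.
// It returns a unit-length front direction and a unit-length top
// direction perpendicular to it (one Gram-Schmidt step).
// If up is parallel to front, the returned top is the zero vector.
Orientation orthonormalize(const Vec3& front, const Vec3& up) {
    Vec3 f  = normalize(front);
    float k = dot(up, f);  // component of up along front
    Vec3 t  = {up.x - k * f.x, up.y - k * f.y, up.z - k * f.z};
    return Orientation{f, normalize(t)};
}
```

The two resulting vectors could then be copied into the OrientFront and OrientTop members in the same way that update() copies front and up.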


The Sound component holds the information for the individual sound sources and processes that information as required for each frame.  The upgrade to this component includes the introduction of the sound projection cone and updating to account for changes in position and orientation.

Sound Class

The iSound interface derives from Frame and exposes one new virtual method for updating the sound source:

 class iSound : public Frame, public iSwitch, public Base {
     // ...
     virtual void update() = 0;
     // ...
 iSound* CreateSound(const wchar_t*, bool = true);
 iSound* CreateLocalSound(const wchar_t*, bool = true, float = 0, float = 0);
 iSound* Clone(const iSound*);

The Sound class includes an instance variable that identifies the locality of the sound source:

 class Sound : public iSound {
     // ...
     bool local; // is this sound local ?
     // ...
     Sound(const wchar_t*, bool, bool, bool, float = 0, float = 0);
     // ...
     void update();
     // ...


The constructor passes the angles of the sound projection cone to the APISound object:

 Sound::Sound(const wchar_t* file, bool l, bool c, bool o, float q, float i) :
 local(l), continuous(c), on(o)  {
     // ...
     apiSound = CreateAPISound(q, i);
     // ...


The update() method passes the position and orientation of each local sound's reference frame to the APISound object:

 void Sound::update() {
     if (apiSound && local)
         apiSound->update(position(), orientation('z'));

APISound Class

The iAPISound interface exposes a virtual method that updates the position and orientation of the sound source:

 class iAPISound {
     // ...
     virtual void update(const Vector&, const Vector&) = 0;
     // ...
 iAPISound* CreateAPISound(float, float);

The APISound class defines instance variables that hold emitter data, 3D settings, and cone data, and an instance pointer to the matrix that translates sound source input into output for the mastering voice:

 class APISound : public iAPISound, public APIAudioBase {
     // ...
     X3DAUDIO_EMITTER      Emitter;       // Represents the frame in 3D space
     X3DAUDIO_DSP_SETTINGS DSPSettings;   // Stores 3D audio settings
     X3DAUDIO_CONE         cone;          // Stores the sound cone settings
     float                 outerCone;     // outer cone angle in degrees
     float                 innerCone;     // inner cone angle in degrees
     unsigned              matrixSize;    // size of translation matrix
     float*                matrix;        // calculates output volumes
     // ...
     APISound(float, float);
     // ...
     bool setup(const wchar_t*, bool, bool);
     void update(const Vector&, const Vector&);
     void play(const wchar_t*, const Vector&, const Vector&, bool, bool);
     // ...


The constructor initializes the instance variables and ensures that the inner cone angle is not larger than the outer cone angle received:

 APISound::APISound(float o, float i) : outerCone(o), innerCone(i),
 matrix(nullptr), matrixSize(0) {
    // align inner and outer cones
    if (!outerCone)
        innerCone = 0;
    if (!innerCone || innerCone > outerCone)
        innerCone = outerCone;
    // ...
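
The alignment rules in this constructor can be checked in isolation with a small free function.  The alignCones() function below is a hypothetical restatement of the same two tests for illustration; it is not part of the framework.

```cpp
#include <utility>

// alignCones is a hypothetical helper that mirrors the constructor's
// alignment logic.  Angles are in degrees.
// Returns the pair {outer, inner} after alignment.
std::pair<float, float> alignCones(float outer, float inner) {
    // no outer cone means an omni-directional sound: no inner cone either
    if (!outer)
        inner = 0;
    // an unspecified or oversized inner cone collapses onto the outer cone
    if (!inner || inner > outer)
        inner = outer;
    return {outer, inner};
}
```

Under these rules, the call CreateLocalSound(file, true, 90) in the Design class yields an inner cone of 90 degrees, since the inner angle defaults to 0.  An inner angle of 120 with an outer angle of 90 is also reduced to 90, while an inner angle of 30 is left unchanged.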


The setup() method initializes the three-dimensional parameters for the local sound:

 bool APISound::setup(const wchar_t* sound, bool local, bool continuous) {
     // ...
     else {
         // ...
         if(FAILED(pXAudio2->CreateSourceVoice( &pSourceVoice,
          // ...
         else if (fileType == fourccXWMA) {
             // ...
         else {
             // ...

         if (local) { // Setup X3DAUDIO_DSP_SETTINGS and X3DAUDIO_EMITTER
             XAUDIO2_DEVICE_DETAILS deviceDetails;
             pXAudio2->GetDeviceDetails(0, &deviceDetails);

             X3DAUDIO_EMITTER      clear2 = {0};
             X3DAUDIO_DSP_SETTINGS clear3 = {0};
             X3DAUDIO_CONE         clear4 = {0};

             unsigned nInput  = wfx.Format.nChannels;
             unsigned nOutput = deviceDetails.OutputFormat.Format.nChannels;
             matrixSize       = nInput * nOutput;

             matrix = new float[matrixSize];
             for (unsigned i = 0; i < matrixSize; i++) matrix[i] = 0;
             Emitter     = clear2;
             DSPSettings = clear3;

             DSPSettings.SrcChannelCount     = nInput;
             DSPSettings.DstChannelCount     = nOutput;
             DSPSettings.pMatrixCoefficients = matrix;

             Emitter.ChannelCount = 1;
             Emitter.CurveDistanceScaler = distanceScale;
             // If the sound is omni-directional, we dont need a sound cone
             if (outerCone) {
                 cone = clear4;
                 cone.OuterAngle  = outerCone * 0.0174532f; // degs->rads
                 cone.OuterVolume = 0.0f; // silent outside Outer Cone
                 cone.InnerAngle  = innerCone * 0.0174532f; // degs->rads
                 cone.InnerVolume = 1.0f; // no change within Inner Cone
                 Emitter.pCone    = &cone;
     // ...

The X3DAUDIO_DSP_SETTINGS instance holds the data for a call to X3DAudioCalculate().  This call (in update()) determines the transfer matrix from the listener and emitter data. 

The pCone member of the X3DAUDIO_EMITTER structure holds the address of the X3DAUDIO_CONE instance that contains the sound projection cone information. 


The update() method calculates the transfer matrix for the current listener and the sound source and sets the output matrix for the voice:

 void APISound::update(const Vector& position, const Vector& front) {
     if (pSourceVoice) {
         X3DAUDIO_VECTOR eFront    = {front.x, front.y, front.z};
         X3DAUDIO_VECTOR ePosition = {position.x, position.y, position.z};

         Emitter.OrientFront = eFront;
         Emitter.Position    = ePosition;
         // Velocity (Doppler) and OrientTop (for multi-channel audio?)

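         // calculate the transfer matrix
         X3DAudioCalculate(*pX3DInstance, pListener, &Emitter,
          X3DAUDIO_CALCULATE_MATRIX, &DSPSettings );
         // set the output matrix for the source voice
         pSourceVoice->SetOutputMatrix(pMasteringVoice,
          DSPSettings.SrcChannelCount, DSPSettings.DstChannelCount,
          DSPSettings.pMatrixCoefficients);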

The X3DAudioCalculate() function calculates the matrix coefficients for the current listener and emitter positions and orientations.  The SetOutputMatrix() method on the IXAudio2SourceVoice interface sets the volume level of each channel of the final output for the source voice.  The channels are mapped to the input channels of the mastering voice.
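
To see what the transfer matrix does to a sound, consider a single frame of samples.  The sketch below mixes one sample per source channel into the destination channels using the element layout that SetOutputMatrix() documents, in which the level from source channel S to destination channel D is stored at index SourceChannels * D + S.  The applyOutputMatrix() function is a hypothetical illustration; the audio engine performs this mixing internally.

```cpp
#include <cstddef>
#include <vector>

// applyOutputMatrix is a hypothetical helper for illustration only.
// matrix - nSource * nDest volume levels, with the level from source
//          channel s to destination channel d at matrix[nSource * d + s]
// in     - one sample per source channel
// Returns one sample per destination channel, or silence if the
// matrix size does not match the channel counts.
std::vector<float> applyOutputMatrix(const std::vector<float>& matrix,
 const std::vector<float>& in, std::size_t nDest) {
    std::size_t nSource = in.size();
    std::vector<float> out(nDest, 0.0f);
    if (matrix.size() != nSource * nDest) return out;
    for (std::size_t d = 0; d < nDest; d++)
        for (std::size_t s = 0; s < nSource; s++)
            out[d] += matrix[nSource * d + s] * in[s];
    return out;
}
```

For a mono source with a sample value of 0.5 and a matrix of {1.0, 0.25} for two output channels, the first channel receives 0.5 and the second receives 0.125, so the sound is heard mostly in the first channel.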


The play() method updates the current object if the source is local and the object needed to be set up:

 void APISound::play(const wchar_t* file, const Vector& position,
  const Vector& heading, bool local, bool continuous) {

     // create the voice if it doesn't yet exist
     if (!pSourceVoice && setup(file, local, continuous) && local)
         update(position, heading);
     if (pSourceVoice) {


The release() method deallocates the dynamic memory allocated for the transfer matrix:

 void APISound::release() {
     // ...
     if (matrix) {
         delete [] matrix;
         matrix = nullptr;
         matrixSize = 0;


The APIAudioBase class holds the addresses of the X3DAudio constants and the current listener, along with the distance scale, for access by any of the local APISound objects. 

 class APIAudioBase {
     // ...
     static X3DAUDIO_HANDLE*        pX3DInstance;    // X3DAudio constants
     static X3DAUDIO_LISTENER*      pListener;       // listener's frame
     static float                   volume;
     static float                   frequencyRatio;
     static float                   distanceScale;   // scale to user units
     void error(const wchar_t*, const wchar_t* = 0) const;
     void logError(const wchar_t*) const;

The pointers are initially null and the volume is initially zero.  The frequency ratio and the distance scale default to 1.0. 

 // APIAudioBase.cpp

 iAPIAudio*              APIAudioBase::audio           = nullptr;
 IXAudio2*               APIAudioBase::pXAudio2        = nullptr;
 X3DAUDIO_HANDLE*        APIAudioBase::pX3DInstance    = nullptr;
 X3DAUDIO_LISTENER*      APIAudioBase::pListener       = nullptr;
 IXAudio2MasteringVoice* APIAudioBase::pMasteringVoice = nullptr;
 float                   APIAudioBase::volume          = 0;
 float                   APIAudioBase::frequencyRatio  = 1.0f;
 float                   APIAudioBase::distanceScale   = 1.0f;
 // ...


Exercises

  • Attach your own sound to another object in the scene
  • Read the MSDN Documentation on X3DAudio


  Designed by Chris Szalwinski