Part C - Direct Audio


Introduce sound functionality into the framework
Use XAudio2 to implement continuous and discrete sounds


One way to enhance the quality of a digital game is to augment its graphical representation with audio.  Audio hardware is typically distinct from graphics hardware.  To appreciate how audio hardware operates, we examine the steps involved in processing sound on modern computers.

This chapter introduces audio functionality into the framework.  The model sounds are global and steady, as if at a constant distance from the listener and emanating from every point on a sphere of that radius.  Their volume and pitch do not depend on the listener's position or orientation. 

Sound Sample

The Sound Sample adds two global sounds: a continuous background sound and a short discrete sound.

Sound Sample File Selection

Pressing the 'Toggle Background Sound' key toggles the continuous sound.  Pressing the 'Discrete Sound' key plays the discrete sound once.

The dialog box includes a Sound identifier combo box and a Sound File combo box.  The Sound identifier combo box lists the sounds modelled in the application.  The Sound File combo box lists the sound files available for selection by the user.  The user can select a model sound and match any sound file in the list to that model sound. 

The sound files distributed with the framework are Creative Commons licensed sounds downloaded from the Freesound site.  The selected files were uploaded to that site by "reinsamba", a biology teacher and nature and ambient sound collector from Cologne, Germany.


Three components require upgrades to accommodate global audio:

  • Design - the Design class defines the global sounds and toggles them at the user's initiative.
  • Coordinator - the Coordinator class manages the sounds created by the Design object.
  • User Input - the APIUserInput class accepts the sound configuration data.

Two new components are included with this sample:

  • Audio - represents the audio hardware.
  • Sound - holds the information for each sound in the framework and manages the playing of those sounds. 

Audio and Sound Components

The audio classes in the Translation Layer share interfaces to the supporting API.  This represents a form of connectivity across the layer.  The shared variables include object and interface addresses as well as global output settings.  The framework implements this connectivity using protected class variables within a base class from which the other classes derive.  We call this class APIAudioBase and describe its details at the end of this chapter. 

The topics raised in this sample include:

  • processing of sound on modern hardware
  • XAudio2 implementation
  • retrieving the names of sound files from a Windows directory

Processing of Sound on Modern Hardware

Sound, either emanating from a source or impacting upon a listener, takes the form of changes in air pressure.  Sound is analogue in nature, when produced, when travelling, and when heard.  We can clean, modify, and mix sound when it is in digital form.  A sound card converts analogue signals into digital ones, mixes the digital signals, and subsequently converts them back into analogue signals.  The analogue-to-digital converter (ADC) on a sound card samples and quantizes input from a sound source, while the digital-to-analogue converter (DAC) converts the processed digital signal into an electrical current for playback through an analogue device like a speaker or earphones.  Between the ADC's translation of the analogue input and the DAC's translation of the digital signal into analogue output, we use software to transform the digital signals and mix them with other signals. 
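The ADC stage can be sketched in a few lines.  The following illustration (not framework code) samples a pure sine tone at uniform intervals and quantizes each sample to a discrete 16-bit level, as on an audio CD:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Sketch of the ADC stage: sample a continuous signal at uniform intervals
// and quantize each sample to a discrete 16-bit level.
std::vector<int16_t> sampleAndQuantize(double freqHz, double seconds,
                                       double samplesPerSec) {
    const double pi = 3.14159265358979323846;
    std::vector<int16_t> samples;
    int n = static_cast<int>(seconds * samplesPerSec);
    for (int i = 0; i < n; i++) {
        double t = i / samplesPerSec;                    // sampling instant
        double analogue = std::sin(2 * pi * freqHz * t); // "analogue" value
        samples.push_back(static_cast<int16_t>(analogue * 32767)); // quantize
    }
    return samples;
}
```

Sampling a 440 Hz tone for 10 milliseconds at 44.1 kHz, for example, yields 441 discrete samples.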

Conversion of Audio Signals

Pulse-Code Modulation (PCM) is the standard method of digitally representing analogue signals in computers and in the Blu-ray, compact disc, and DVD formats.  In a PCM stream, the magnitude of the analogue signal is sampled regularly at uniform intervals and each sample is quantized to the nearest value within a range of discrete values.  A PCM stream has two properties:

  • sampling rate - number of times per second that samples are taken
  • bit depth - the number of possible discrete values that a sample can take
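These two properties, together with the channel count, determine the data rate of the stream.  The arithmetic can be captured in an illustrative helper (not part of the framework):

```cpp
#include <cstdint>

// Data rate of an uncompressed PCM stream:
//   bytes per second = sampling rate * channels * bit depth / 8
uint32_t pcmBytesPerSecond(uint32_t samplesPerSec, uint32_t channels,
                           uint32_t bitsPerSample) {
    return samplesPerSec * channels * bitsPerSample / 8;
}
```

CD audio (44.1 kHz, 16-bit, stereo) therefore streams 44100 * 2 * 16 / 8 = 176400 bytes per second.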


Adaptive differential pulse-code modulation (ADPCM) varies the size of the quantization step and allows reduction of the required bandwidth for a given signal-to-noise ratio. 
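The adaptive step idea can be sketched with a toy coder.  This is an illustration of the principle only - real IMA or Microsoft ADPCM uses standardized step and index tables - but it shows how encoding differences with an adapting step lets a 4-bit code track a 16-bit signal:

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Toy ADPCM-style coder: each 4-bit code represents the difference from a
// predictor, divided by the current step; the step then adapts - large
// codes grow it, small codes shrink it.
struct ToyAdpcm {
    int predictor = 0;
    int step = 4;
    int encode(int sample) {
        int code = std::clamp((sample - predictor) / step, -8, 7);
        predictor += code * step;   // track what the decoder will reconstruct
        adapt(code);
        return code;
    }
    int decode(int code) {          // mirrors encode exactly
        predictor += code * step;
        adapt(code);
        return predictor;
    }
private:
    void adapt(int code) {
        if (std::abs(code) >= 6)      step = std::min(step * 2, 1024);
        else if (std::abs(code) <= 1) step = std::max(step / 2, 1);
    }
};
```

Because the decoder applies exactly the same predictor and step updates as the encoder, the code stream alone reproduces the encoder's reconstruction.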

XAudio2 Implementation

XAudio2 is the Audio API that ships with DirectX.  The XAudio2 specific topics raised in this sample include:

  • the architecture of XAudio2
  • the audio file structures for XAudio2
  • the COM considerations with XAudio2
  • using XAudio2 to process an audio source

The last version of XAudio2 that shipped with the DirectX SDK is version 2.7.  With version 2.8, XAudio2 started shipping as a system component in Windows 8. 

Architecture of XAudio2

XAudio2 is the successor to DirectSound and XAudio that provides flexible and powerful digital signal processing (DSP) effects, including a filter on every voice, as well as support for compressed audio.  XAudio2 converts all audio data into floating-point PCM format.  If the data has been compressed, XAudio2 decodes it before processing it further.

XAudio2 uses voices to process, manipulate, and play audio data.  Each source voice sends its own audio data to the mastering voice.  The mastering voice receives data from other voices and sends it to the audio hardware for output. 


Audio File Structure

Audio files supported by XAudio2 use the Resource Interchange File Format (RIFF).  RIFF files consist of chunks.  A four-character code (FOURCC) identifies the data within each chunk.  The identifiers common to PCM and ADPCM formats are

  • 'RIFF' - standard RIFF chunk containing the file type WAVE or XWMA in the first four bytes of its data section
  • 'fmt ' - the format header for the audio file (WAVEFORMATEX structure)
  • 'data' - the audio data for the audio file

The RIFF chunk has the form

  • 'RIFF' - its literal fourcc code
  • fileSize - size of the data in the file (including the size of the file type and the size of the data)
  • fileType - fourcc code that identifies the file type
  • data - consists of chunks in any order

The non-RIFF chunks have the form

  • chunkID - the fourcc code that identifies the data contained in the chunk
  • chunkSize - size of the data section of the chunk (size of the valid data in the chunk - excluding padding, the size of chunkID, or the size of chunkSize)
  • data - zero or more bytes of data (padded to the nearest word boundary)
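The chunk layout above can be walked with a few lines of code.  The following sketch operates on an in-memory buffer for illustration (the framework reads from a file with FindChunk()); it scans past the RIFF header and records the offset and size of each chunk's data section, honoring the word-boundary padding:

```cpp
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Walk the chunks of an in-memory RIFF file: [chunkID:4][chunkSize:4][data].
// Returns a map from fourcc to (offset, size) of each chunk's data section.
std::map<std::string, std::pair<size_t, uint32_t>>
walkRiff(const std::vector<uint8_t>& file) {
    std::map<std::string, std::pair<size_t, uint32_t>> chunks;
    size_t pos = 12;                 // skip 'RIFF', fileSize, and the file type
    while (pos + 8 <= file.size()) {
        std::string id(reinterpret_cast<const char*>(&file[pos]), 4);
        uint32_t size;
        std::memcpy(&size, &file[pos + 4], 4);  // assumes a little-endian host
        chunks[id] = { pos + 8, size };
        pos += 8 + size + (size & 1);           // data padded to a word boundary
    }
    return chunks;
}
```

Running this over a minimal WAVE file locates the 'fmt ' and 'data' chunks regardless of the order in which they appear.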

The WAVEFORMATEXTENSIBLE structure provides support for more than two channels and has the following members

  • Format - a WAVEFORMATEX structure that specifies the basic format (see below)
  • wValidBitsPerSample - number of bits of precision in the signal
  • wSamplesPerBlock - number of samples contained in one compressed block
  • wReserved - reserved for system use
  • dwChannelMask - bit mask for assigning channels to speaker positions
  • SubFormat - sub-format of the data

The WAVEFORMATEX structure holds the basic format and has the following members

  • wFormatTag - waveform-audio format type
  • nChannels - number of channels
  • nSamplesPerSec - sample rate in Hertz (Hz) - for WAVE_FORMAT_PCM 8.0kHz, 11.025kHz, 22.05kHz, 44.1kHz
  • nAvgBytesPerSec - average data transfer rate in bytes per second - for WAVE_FORMAT_PCM = nSamplesPerSec * nBlockAlign
  • nBlockAlign - block alignment in bytes - for WAVE_FORMAT_PCM = nChannels * wBitsPerSample / 8
  • wBitsPerSample - bits per sample - for WAVE_FORMAT_PCM 8 or 16
  • cbSize - size of extra format information - for WAVE_FORMAT_PCM ignored
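The PCM derivations in this list can be checked with a short sketch.  The struct below is a local mirror of the WAVEFORMATEX layout so that the example stays self-contained; the real definition comes from the Windows headers:

```cpp
#include <cstdint>

// Local mirror of the WAVEFORMATEX layout (illustration only).
struct WaveFormatEx {
    uint16_t wFormatTag;
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;
    uint16_t wBitsPerSample;
    uint16_t cbSize;
};

// Fill in the PCM derivations from the member list above.
WaveFormatEx makePcmFormat(uint16_t channels, uint32_t rate, uint16_t bits) {
    WaveFormatEx wfx{};
    wfx.wFormatTag      = 1;                        // WAVE_FORMAT_PCM
    wfx.nChannels       = channels;
    wfx.nSamplesPerSec  = rate;
    wfx.wBitsPerSample  = bits;
    wfx.nBlockAlign     = static_cast<uint16_t>(channels * bits / 8);
    wfx.nAvgBytesPerSec = rate * wfx.nBlockAlign;   // bytes per second
    wfx.cbSize          = 0;                        // ignored for PCM
    return wfx;
}
```

For 16-bit stereo at 44.1 kHz, nBlockAlign works out to 4 bytes per sample frame and nAvgBytesPerSec to 176400.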

COM Considerations

The XAudio2 API runs in its own thread.  To initialize the COM library for use by the calling thread, we call

  • CoInitializeEx() - once at the beginning of the application before creating any instance of the COM object
  • CoUninitialize() - once at the end of the application after all interfaces to the COM objects have been released

The XAudio2 COM object manages all audio engine states and is the only XAudio2 object that is reference-counted.  All other XAudio2 objects are controlled using create and destroy calls. 

A simple XAudio2 application accesses the following COM interfaces:

  • IXAudio2 - interface to the XAudio2 engine
  • IXAudio2MasteringVoice - interface to the mastering voice
  • IXAudio2SourceVoice - interface to a sound voice constructed from an audio file

Processing a Source Voice

In this sample, the processing of a source voice involves five steps:

  • extracting the format information for the audio data from the file
  • loading the audio data from the file into a buffer
  • creating a sound voice
  • managing the sound voice
  • destroying the sound voice

We load an audio file by finding its RIFF chunk and looping through the chunk to find its individual chunks.  The MSDN documentation provides the FindChunk() and ReadChunkData() functions to perform these operations. 

We open the audio file by calling CreateFile(), locate the file's RIFF chunk by calling FindChunk( , fourccRIFF, ), and extract the file type by calling ReadChunkData( , &fileType, ).  We then locate the file's 'fmt ' chunk by calling FindChunk( , fourccFMT, ) and copy its contents by calling ReadChunkData( , &wfx, ).

Based on the extracted format, we locate the 'data' chunk and find its size by calling FindChunk( , fourccDATA, chunkSize, ).  We allocate memory for the data buffer and copy the audio data into that buffer by calling ReadChunkData( , &buffer, ).  Finally, we populate the members of an XAUDIO2_BUFFER structure with the following buffer information:

  • AudioBytes - the size of the audio buffer in bytes
  • pAudioData - the address of the data buffer
  • Flags - the buffer flags - XAUDIO2_END_OF_STREAM = no data after this buffer
  • LoopCount - the repetition count - XAUDIO2_LOOP_INFINITE if continuous, 0 otherwise

We create a source voice by passing the WAVEFORMATEX structure to the CreateSourceVoice() method on the IXAudio2 interface and then passing the address of the XAUDIO2_BUFFER to the SubmitSourceBuffer() method on the IXAudio2SourceVoice interface. 

The MSDN documentation provides sample code for these operations. 

We start playing a source voice by calling the Start(NULL) method on the IXAudio2SourceVoice interface. 

We control the source voice by passing its current volume to the SetVolume() method and the current frequency ratio to the SetFrequencyRatio() method on the IXAudio2SourceVoice interface. 

The MSDN documentation provides details regarding the volume and frequency ratio ranges used by XAudio2. 
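The framework stores volume and frequency as integers between MIN and MAX (0 to 100 here), while SetVolume() expects a linear amplitude multiplier and SetFrequencyRatio() a playback-rate ratio.  One plausible mapping - an assumption for illustration, since the framework's conversion methods are not shown here - maps volume linearly onto [0.0, 1.0] and frequency exponentially onto [0.5, 2.0] so that the midpoint plays at normal pitch:

```cpp
#include <cmath>

// Hypothetical conversions from the framework's integer settings to the
// float values XAudio2 expects (assumed mappings, not framework code).
float volumeToAmplitude(int v) {    // 0..100 -> 0.0..1.0 (linear amplitude)
    return v / 100.0f;
}
float frequencyToRatio(int f) {     // 0..100 -> 0.5..2.0, with 50 -> 1.0
    return std::pow(2.0f, (f - 50) / 50.0f);
}
```

An exponential mapping suits pitch because each equal step multiplies the playback rate by the same factor, which matches how pitch is perceived.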

We stop playing a source voice by calling the Stop(NULL) method on the IXAudio2SourceVoice interface. 

We destroy a source voice by calling the DestroyVoice() method on the IXAudio2SourceVoice interface. 

Retrieving File Names from a Windows Directory

To provide the user with a list of files from which to select those for mapping to the model sounds, the application needs to populate the sound file combo box of the User Dialog with the names of the files in the audio directory.

The Windows operating system includes library support for retrieving the file names in a specified directory.  The WIN32_FIND_DATA structure holds the name of a file and a flag that distinguishes a file from a directory.  The FindFirstFile() function populates an instance of this structure for a specified directory.  The FindNextFile() function populates the instance for the subsequent entities in the same directory.  The cFileName member of the instance holds the name of the file or directory and the dwFileAttributes member holds the flag that distinguishes a file from a directory. 


Translation Layer

The settings file for the translation layer defines the enumeration constants for

  • the sound actions
  • the model sounds

and the macros for

  • the sound action descriptions
  • the sound action key mappings
  • the model sound descriptions
  • the default sound file mappings
 // Translation.h
 // ...
 typedef enum Action {
     // ...
 } Action;

     // ...
     L"Toggle Background Sound", \
     L"Discrete Sound", \
     L"Increase Volume", \
     L"Decrease Volume", \
     L"Speed Up Sound",  \
     L"Slow Down Sound", \

 #define ACTION_KEY_MAP {\
     // ...
     KEY_F3, KEY_F4, KEY_F6, KEY_F7, KEY_W, KEY_S, \
 // ...

 // Model Sounds
 // to add a configurable sound
 // - add its enumeration constant
 // - add its description
 // - add the default filename for the sound
 // - reset MAX_DESC in Configuration.h if necessary
 typedef enum ModelSound {
 } ModelSound;

 // friendly descriptions of configurable sounds as listed in user dialog 
     L"Discrete", \

 // initial selection of configurable sounds
 // include the author's name for CCS+ accreditation
 #define SOUND_MAPPINGS {\
     L"Crickets (by reinsamba) .wav",\
     L"Gong (by reinsamba) .wav" \

 // ...

Model Layer

The settings for the Model Layer define the path to the sound file directory, the enumeration constant for sounds, and the volume and frequency parameters:

 // Model.h
 // ...
 #define AUDIO_DIRECTORY   L"..\\..\\resources\\audio"
 // ...
 typedef enum Category {
     // ...
 } Category;

 // audio controls
 // initial volume settings
 #define MIN_VOLUME         0
 #define DEFAULT_VOLUME    50
 #define MAX_VOLUME       100
 #define STEP_VOLUME        1
 // initial frequency settings
 #define MIN_FREQUENCY       0
 #define MAX_FREQUENCY     100
 #define STEP_FREQUENCY      1


The Design component creates the model sounds and the text strings that report their state, and processes individual toggling of these sounds. 

The Design class defines instance pointers to the background and discrete sounds:

 class Design : public Coordinator {
     // ...
     iSound*   background;      // points to the background sound
     iSound*   discrete;        // points to the discrete sound
     // ...
     // ...



The constructor initializes the instance pointers:

 Design::Design(void* h, int s) : Coordinator(h, s) { 
     // ...
     // pointers to the sounds
     background   = nullptr;
     discrete     = nullptr;


The initialize() method creates the Sound objects and the Text items that report their state:

 void Design::initialize() {
     // ...
     // audio -------------------------------------------------------------

     if (file(SND_BKGRD)) {
         background = CreateSound(file(SND_BKGRD));
         CreateText(Rectf(0.5f, 0.66f, 1, 0.74f), hud, L"Background ", onOff,
     if (file(SND_DISCR)) {
         discrete = CreateSound(file(SND_DISCR), false);
         CreateText(Rectf(0.5f, 0.74f, 1, 0.82f), hud, L"Discrete ", onOff,


The update() method toggles the state of the Sound objects at the user's request:

 void Design::update() {
     // ...
     // audio ------------------------------------------------------------- 

     if (pressed(AUD_BKGRD)) background->toggle();
     if (pressed(AUD_IMPLS)) discrete->toggle();


The Coordinator component manages the Audio component that represents the sound card and the sounds created by the Design object.  This component also configures the framework for the sound files selected by the user and processes user initiated changes in volume and frequency. 

The iCoordinator interface exposes two virtual methods to the framework:

 class iCoordinator {
      // ...
      virtual void add(iSound* s)    = 0; 
      // ...
      virtual void remove(iSound* s) = 0;
      // ...
 iCoordinator* CoordinatorAddress();

The Coordinator class holds the addresses of the APIAudio object and all of the Sound objects in the framework and keeps track of the time of the last audio update:

 class Coordinator : public iCoordinator {
     // ...
     iAPIAudio*           audio;            // points to the audio object
     std::vector<iSound*> sound;            // points to sound sources
     unsigned             lastAudioUpdate;  // most recent audio update
     // ...
     void adjustVolume(int);
     void adjustFrequency(int);
     const wchar_t* file (ModelSound s) const;
     // ...
     // ...
     void add(iSound* s)    { ::add(sound, s); }
     // ...
     void remove(iSound* s) { ::remove(sound, s); }
     // ...



The constructor passes the path to the audio file directory to the APIUserInput object, creates the APIAudio object, and initializes the audio timer, frequency, and volume:

 Coordinator::Coordinator(void* hinst, int show) {
    // ...
    userInput   = CreateAPIUserInput(AUDIO_DIRECTORY);
    audio       = CreateAPIAudio(1.0f, MIN_VOLUME, MAX_VOLUME, MIN_FREQUENCY,
    // ...
    lastAudioUpdate  = 0;
    // ...
    // volume and frequency settings
    frequency = DEFAULT_FREQUENCY;
    volume    = DEFAULT_VOLUME;
    // ...

Set Configuration

The setConfiguration() method sets up the APIAudio object once all other setups have succeeded:

 bool Coordinator::setConfiguration() {
     // ...
     if (userInput->getConfiguration()) {
         // ...
         if (window->setup()) {
             // ...
             if (display->setup()) {
                 // ...
                 rc = true;
             } else
     // ...


The reset() method changes the name of the file associated with each model sound that the user has reconfigured:

 void Coordinator::reset() {
     if (setConfiguration()) {
         // reset the sound files
         for (unsigned i = 0; i < sound.size(); i++) {
             if (sound[i]->relFileName() &&
              strcmp(file((ModelSound)i), sound[i]->relFileName()))

Adjust Volume and Frequency

The adjustVolume() method increments or decrements the volume of all Sound objects that are on by the prescribed step:

 void Coordinator::adjustVolume(int factor) {
     if (factor > 0)      volume += STEP_VOLUME;
     else if (factor < 0) volume -= STEP_VOLUME;
     if (volume > MAX_VOLUME)      volume = MAX_VOLUME;
     else if (volume < MIN_VOLUME) volume = MIN_VOLUME;
     lastAudioUpdate = now;

The adjustFrequency() method increments or decrements the frequency of all Sound objects that are on by the prescribed step:

 void Coordinator::adjustFrequency(int factor) {
     if (factor < 0) {
         frequency = frequency - STEP_FREQUENCY;
         frequency = frequency < MIN_FREQUENCY ? MIN_FREQUENCY : frequency;
     }
     else if (factor > 0) {
         frequency = frequency + STEP_FREQUENCY;
         frequency = frequency > MAX_FREQUENCY ? MAX_FREQUENCY : frequency;
     }
     lastAudioUpdate = now;


The file() method returns the address of the string that holds the file name for the specified model sound:

 const wchar_t* Coordinator::file(ModelSound s) const {
     return userInput->file(s);


The update() method adjusts the volume and frequency of all sounds in response to user key presses:

 void Coordinator::update() {
     // ...
     // update the volume and the frequency
     if (now - lastAudioUpdate > KEY_LATENCY) {
         if (userInput->pressed(AUD_VOLUME_DEC)) adjustVolume(-1);
         if (userInput->pressed(AUD_VOLUME_INC)) adjustVolume(1);
         if (userInput->pressed(AUD_FREQ_DEC)) adjustFrequency(-1);
         if (userInput->pressed(AUD_FREQ_INC)) adjustFrequency(1);
     // ...


The no-argument render() method updates the volume and frequency settings on the APIAudio object and renders all of the sounds immediately after drawing the frame:

 void Coordinator::render() {
    // ...
    // update the audio
    // ...
    // render all of the sounds

The one-argument render() method renders all of the sounds:

 void Coordinator::render(Category category) {

     switch (category) {
         // ...
         case SOUNDS:
             for (unsigned i = 0; i < sound.size(); i++) 
                 if (sound[i]) sound[i]->render();
         // ...

Suspend, Restore, and Release

The suspend() method suspends the playing of all Sounds that are on:

 void Coordinator::suspend() {
     // ...
     for (unsigned i = 0; i < sound.size(); i++) 
         if (sound[i]) sound[i]->suspend();
     // ...

The restore() method restores the APIAudio object and restarts the Sounds that were suspended:

 void Coordinator::restore() {
     // ...
     // ...
     for (unsigned i = 0; i < sound.size(); i++) 
         if (sound[i]) sound[i]->restore();
     // ...

The release() method releases the APIAudio object's and all Sound objects' connections to the Audio API:

 void Coordinator::release() {
     // ...
     for (unsigned i = 0; i < sound.size(); i++) 
         if (sound[i]) sound[i]->release();
     // ...
     // ...


The destructor destroys all of the Sound objects that still exist as well as the APIAudio object: 

 Coordinator::~Coordinator() {
     // ...
     for (unsigned i = 0; i < sound.size(); i++) 
         if (sound[i]) sound[i]->Delete();
     // ...
     // ...

User Input

The UserInput component handles the sound file configuration selected by the user.  The APIUserInput object retrieves the names of the files stored in the audio directory, lists the files available for selection in the dialog box's sound file combo box, and saves the user's selections in its instance variables.

The iAPIUserInput interface exposes two virtual methods for handling sound file selection:

 class iAPIUserInput {
     // ...
     virtual const wchar_t* file(ModelSound s) const = 0;
     // ...
     virtual void showSoundMapping(void*)        = 0;
     virtual void updateSoundMapping(void*)      = 0;
     // ...
 iAPIUserInput* CreateAPIUserInput(const wchar_t*);
 iAPIUserInput* APIUserInputAddress();

The APIUserInput class includes an instance variable that holds the index of the model sound currently selected in the model sound combo box:

 class APIUserInput : public iAPIUserInput, public APIBase {
     // ...
     const wchar_t* audioDirectory; // points to the Audio File Directory
     // ...
     // audio
     unsigned       nSounds;
     wchar_t        (*sndDesc)[MAX_DESC + 1];
     wchar_t        (*sndFile)[MAX_DESC + 1];
     // ...
     // most recent configuration memory
     // ...
     int            sound;
     // ...
     // ...
     APIUserInput(const wchar_t*);
     // ...
     const wchar_t* file(ModelSound s) const { return sndFile[s]; }
     // ...
     void populateSoundFileList(void*);
     // ...
     void showSoundMapping(void*);
     void updateSoundMapping(void*);
     // ...



The constructor initializes the model sound currently selected in the model sound combo box and stores the model sounds and default sound file mappings: 

 APIUserInput::APIUserInput(const wchar_t* a) : audioDirectory(a) {
     // ...
     sound       = 0;
     // ...
     // allocate memory for sound descriptions and initial sound mappings
     const wchar_t* soundDesc[] = SOUND_DESCRIPTIONS;
     const wchar_t* defFile[]   = SOUND_MAPPINGS;
     nSounds = sizeof soundDesc / sizeof (wchar_t*);
     sndDesc = new wchar_t[nSounds][MAX_DESC + 1];
     sndFile = new wchar_t[nSounds][MAX_DESC + 1];
     for (unsigned i = 0; i < nSounds; i++) {
         strcpy(sndDesc[i], soundDesc[i], MAX_DESC);
         strcpy(sndFile[i],   defFile[i], MAX_DESC);


The destructor deallocates the memory used for the model sound descriptions and the sound file names: 

 APIUserInput::~APIUserInput() {
     // ...
     delete [] sndDesc;
     delete [] sndFile;

Populate User Dialog

The populateAPIUserDialog() method populates the sound file list:

 void APIUserInput::populateAPIUserDialog(void* hwnd) {
     // ...

Populate Sound File List

The populateSoundFileList() method populates the model sound and sound file combo boxes and sets their cursors to the most recent selections.  This method

  • empties both combo boxes
  • retrieves the names of the sound files in the audio directory
  • populates the sound file combo box with the file names
  • retrieves the model sound descriptions from the Context object
  • populates the model sound combo boxes with those descriptions
  • sets the selected model sound to that previous selection
  • sets the selected sound file for the selected sound to the previous selection
 void APIUserInput::populateSoundFileList(void* hwndw) {
     HWND hwnd = (HWND)hwndw; // handle to current window
     WIN32_FIND_DATA ffd;
     wchar_t directory[MAX_DESC+1];
     unsigned length;
     bool keepsearching;

     SendDlgItemMessage(hwnd, IDC_SFL, CB_RESETCONTENT, 0, 0L);
     length = wcslen(audioDirectory);
     if (length > MAX_DESC - 3)
         error(L"APIUserInput::10 Audio Directory name is too long");
     else {
         strcpy(directory, audioDirectory, MAX_DESC);
         strcat(directory, L"\\*", MAX_DESC);
         handle = FindFirstFile(directory, &ffd);
         keepsearching = handle != INVALID_HANDLE_VALUE;
         bool foundAFile = false;
         while (keepsearching) {
             if (!(ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
                 if (wcslen(ffd.cFileName) < MAX_DESC) {
                     SendDlgItemMessage(hwnd, IDC_SFL, CB_ADDSTRING, 0,
                     foundAFile = true;
             keepsearching = FindNextFile(handle, &ffd) != 0;
         if (foundAFile) {
             SendDlgItemMessage(hwnd, IDC_SFL, CB_SETCURSEL, 0, 0L);
             EnableWindow(GetDlgItem(hwnd, IDC_SFL), TRUE);
     SendDlgItemMessage(hwnd, IDC_AUD, CB_RESETCONTENT, 0, 0L);

     // populate the configurable sound combo box with a list of
     // configurable sounds
     for (int msnd = 0; msnd < nSounds; msnd++) {
         SendDlgItemMessage(hwnd, IDC_AUD, CB_ADDSTRING, 0,
         // find the previously selected match in the sound file list and 
         // store the sound file item index in the sound list data item
         // default to the top of the list, if match not found
         int i = 0;
         bool found = false;
         const wchar_t* storedFilename = sndFile[msnd];
         int nf = SendDlgItemMessage(hwnd, IDC_SFL, CB_GETCOUNT, 0, 0L);
         wchar_t availableFile[MAX_DESC + 1] = L"";
         for (int file = 0; file < nf && !found; file++) {
             SendDlgItemMessage(hwnd, IDC_SFL, CB_GETLBTEXT, file,
             if (!wcscmp(availableFile, storedFilename)) {
                 found = true;
                 i = file;
         SendDlgItemMessage(hwnd, IDC_AUD, CB_SETITEMDATA, msnd, i);

     SendDlgItemMessage(hwnd, IDC_AUD, CB_SETCURSEL, sound, 0L);
     int file = SendDlgItemMessage(hwnd, IDC_AUD, CB_GETITEMDATA,
      sound, 0L);
     SendDlgItemMessage(hwnd, IDC_SFL, CB_SETCURSEL, file, 0L);

The SendDlgItemMessage(), EnableWindow(), and GetDlgItem() functions and their enumeration constants are described in the chapter on Windows Programming.

The FindFirstFile() function populates an instance of a WIN32_FIND_DATA struct for the directory specified in the first argument and returns a handle to the directory.  The FindNextFile() function populates an instance of a WIN32_FIND_DATA struct for the handle specified in the first argument.  The dwFileAttributes member of the instance contains the flag FILE_ATTRIBUTE_DIRECTORY that distinguishes a directory from a file.  The cFileName member of the instance contains the address of the C style null-terminated string that holds the file name. 

Show Sound Mapping

The showSoundMapping() method retrieves the index of the currently selected model sound and resets the cursor in the sound file combo box to the file that the user previously selected for that model sound:

 void APIUserInput::showSoundMapping(void* hwndw) {
     HWND hwnd = (HWND)hwndw; // handle to current window

     int s = SendDlgItemMessage(hwnd, IDC_AUD, CB_GETCURSEL, 0, 0L);
     if (s == CB_ERR)
         error(L"APIUserInput::42 Sound selection failed");
     else {
         int f = SendDlgItemMessage(hwnd, IDC_AUD, CB_GETITEMDATA, s, 0L); 
         SendDlgItemMessage(hwnd, IDC_SFL, CB_SETCURSEL, f, 0L);

The SendDlgItemMessage() function and its enumeration constants are described in the chapter on Windows Programming.

Update Sound Mapping

The updateSoundMapping() method retrieves the index for the selected model sound and the index for the selected sound file and stores the sound file index in the data item of the line item that describes the selected model sound:

 void APIUserInput::updateSoundMapping(void* hwndw) {
     HWND hwnd = (HWND)hwndw; // handle to current window

     int s = SendDlgItemMessage(hwnd, IDC_AUD, CB_GETCURSEL, 0, 0L);
     int f = SendDlgItemMessage(hwnd, IDC_SFL, CB_GETCURSEL, 0, 0L);
     if (s != CB_ERR && f != CB_ERR)
         SendDlgItemMessage(hwnd, IDC_AUD, CB_SETITEMDATA, s, f);

The SendDlgItemMessage() function and its enumeration constants are described in the chapter on Windows Programming.

Save User Choices

The saveUserChoices() method extracts the file names selected for each model sound and stores those names in sndFile:

 bool APIUserInput::saveUserChoices(void* hwndw) {
     // ...
     //----- sound mappings ----------------------------------------------- 

     // define the files associated with the configurable sounds
     wchar_t f[MAX_DESC + 1];
     for (int s = 0; s < nSounds; s++) {
         int i = SendDlgItemMessage(hwnd, IDC_AUD, CB_GETITEMDATA, s, 0L);
         // extract the filename from the string parameter of the line item
         SendDlgItemMessage(hwnd, IDC_SFL, CB_GETLBTEXT, i, (LPARAM)f);
         strcpy(sndFile[s], f, MAX_DESC);
     sound = SendDlgItemMessage(hwnd, IDC_AUD, CB_GETCURSEL, 0, 0L);
     return rcd;

The SendDlgItemMessage() function and its enumeration constants are described in the chapter on Windows Programming.

Window Procedure

The window procedure for the APIUserInput object includes code that is triggered when the user accesses the model sound combo box or the user accesses the sound file combo box:

 BOOL CALLBACK dlgProc(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp) {
     // ...
     // Process message msg
     switch (msg) {
       // ...
       case WM_COMMAND:          // user accessed a dialog box control
         switch (LOWORD(wp)) {   // which control?
           // ...
           case IDC_AUD:  // user accessed the audio combo box
             // show the current model sound and file associated with it
           case IDC_SFL:  // user accessed mappable file combo box
             if (HIWORD(wp) == CBN_SELCHANGE)
                 // associate the selected file with selected model sound 
           // ...
     return rc;

The HIWORD() and CBN_SELCHANGE macros are described in the chapter on Windows Programming.

Resource Script

The APIUserInput resource script includes labels and combo boxes for the model sound descriptions and the list of sound file names:

 IDD_DLG DIALOGEX 200, 100, 310, 280
 // ...
     // ...
     LTEXT           "Sound", IDC_STATIC, 10, 95, 55, 10
     COMBOBOX        IDC_AUD, 10, 105, 55, 66, CBS_DROPDOWNLIST | \ 
                     WS_VSCROLL | WS_TABSTOP
     LTEXT           "Sound File", IDC_STATIC, 68, 95, 112, 10
     COMBOBOX        IDC_SFL, 68, 105, 122, 127, CBS_DROPDOWNLIST | \
                     WS_DISABLED | WS_VSCROLL | WS_TABSTOP
     // ...
                     IDC_STATIC, 8, 252, 300, 10
                     IDC_STATIC, 8, 262, 200, 10


The Audio component represents the sound card.  This component consists of one class - the APIAudio class, which manages those aspects of audio that apply to all sounds. 

The iAPIAudio interface exposes eight virtual methods to the Coordinator class:

 class iAPIAudio {
     virtual void setVolume(int)         = 0;
     virtual bool setup()                = 0;
     virtual void setFrequencyRatio(int) = 0;
     virtual void update(const void*)    = 0;
     virtual void suspend()              = 0;
     virtual bool restore()              = 0;
     virtual void release()              = 0;
     virtual void Delete() const         = 0;
     friend class Coordinator;
 };

 iAPIAudio* CreateAPIAudio(float, int, int, int, int, int, int);

The APIAudio class includes six instance variables that hold the volume and frequency parameters:

 class APIAudio : public iAPIAudio, public APIAudioBase {
     int minVolume;
     int maxVolume;
     int defVolume;
     int minFrequency;
     int maxFrequency;
     int defFrequency;

     APIAudio(const APIAudio& s);            // prevent copying
     APIAudio& operator=(const APIAudio& s); // prevent assignments
     virtual ~APIAudio();
     void convertVolume(int);
     void convertFrequency(int);
     APIAudio(float, int, int, int, int, int, int);
     bool setup();
     void setVolume(int v)         { convertVolume(v); }
     void setFrequencyRatio(int f) { convertFrequency(f); }
     void update(const void*);
     void suspend();
     bool restore();
     void release();
     void Delete() const { delete this; }
 };



The constructor initializes COM for multi-threading, copies the received volume and frequency limits, and sets the base class pointer to the current object:

 APIAudio::APIAudio(float d, int mnv, int mxv, int mnf, int mxf,
  int dv, int df) : minVolume(mnv), maxVolume(mxv), minFrequency(mnf),
  maxFrequency(mxf), defVolume(dv), defFrequency(df) {
     CoInitializeEx(nullptr, COINIT_MULTITHREADED);
     audio = this;
 }


The setup() method retrieves an interface to the XAudio2 COM object and an interface to the mastering voice:

 bool APIAudio::setup() {
     bool rc = false;
     UINT32 flags = 0;

     // enable XAudio2 debugging if we're running in debug mode
     #ifdef _DEBUG
         flags |= XAUDIO2_DEBUG_ENGINE;
     #endif

     if (FAILED(XAudio2Create(&pXAudio2, flags)))
         error(L"APIAudio::11 Failed to initialize the XAudio2 engine");
     else if (FAILED(pXAudio2->CreateMasteringVoice(&pMasteringVoice)))
         error(L"APIAudio::12 Failed to create the Mastering Voice");
     else
         rc = true;

     return rc;
 }


The update() method sets the volume of the mastering voice to the base class value:

 void APIAudio::update(const void* view) {
     if (pMasteringVoice) pMasteringVoice->SetVolume(volume);
 }

Suspend, Restore, and Release

The suspend() method turns off the mastering voice's volume:

 void APIAudio::suspend() {
     if (pMasteringVoice) pMasteringVoice->SetVolume(0);
 }

The restore() method restores the mastering voice's volume to the base class value:

 bool APIAudio::restore() {
     if (pMasteringVoice) pMasteringVoice->SetVolume(volume);
     return true;
 }

The release() method destroys the mastering voice and releases the XAudio2 COM interface:

 void APIAudio::release() {

     if (pMasteringVoice) {
         pMasteringVoice->DestroyVoice();
         pMasteringVoice = nullptr;
     }
     if (pXAudio2) {
         pXAudio2->Release();
         pXAudio2 = nullptr;
     }
 }


The destructor releases the APIAudio object from its connections to the Audio API and uninitializes COM:

 APIAudio::~APIAudio() {
     release();
     audio = nullptr;
     CoUninitialize();
 }

Conversion Methods

The convertVolume() method converts the model volume into an XAudio2 volume:

 void APIAudio::convertVolume(int v) {
     if (v < defVolume)                           // from [MIN, DEF-]
         volume = (v - (float)minVolume) /
          (defVolume - minVolume);                // to   [0, 0.9999]
     else if (v > defVolume)                      // from [DEF+, MAX]
         volume = powf(2, (v - (float)defVolume) /
          (maxVolume - defVolume) * 24);          // to   [1+, 2^24]
     else
         volume = 1.0f;
 }

The convertFrequency() method converts the model frequency into an XAudio2 frequency:

 void APIAudio::convertFrequency(int f) {
     if (f < defFrequency)                          // from [MIN, DEF-]
         frequencyRatio = float(f - minFrequency) / // to   [0, 0.9999]
          (defFrequency - minFrequency);
     else if (f > defFrequency)                     // from [DEF+, MAX]
         frequencyRatio = float(f - defFrequency) / // to   [1+, 1024]
          (maxFrequency - defFrequency) * (1024 - 1) + 1.0f;
     else
         frequencyRatio = 1.0f;
 }


The Sound component manages the information and processing of all of the sounds in the framework.  This component consists of two classes: the Sound class, which interfaces with the Design object, and the APISound class, which connects to the Audio API. 

Sound Component

The APISound objects access the audio hardware through two interfaces:

  • IXAudio2SourceVoice - to the XAudio2 Source Voice object
  • IXAudio2 - to the XAudio2 COM object

Sound Class

The iSound interface exposes thirteen virtual methods to the Coordinator and Design objects:

 class iSound : public Frame, public iSwitch, public Base {
     virtual const wchar_t* relFileName() const = 0;
     virtual void change(const wchar_t* f)      = 0;
     virtual void loop(bool)                    = 0;
     virtual bool isOn() const                  = 0;
     virtual bool toggle()                      = 0;
     virtual bool stop()                        = 0;
     virtual bool play()                        = 0;
     virtual void render()                      = 0;
     virtual void suspend()                     = 0;
     virtual void restore()                     = 0;
     virtual void release()                     = 0;
     virtual void Delete() const                = 0;
     friend class Coordinator;
     friend class Design;
     virtual iSound* clone() const              = 0;
 };

 iSound* CreateSound(const wchar_t*, bool = true);
 iSound* Clone(const iSound*);

The Sound class defines three instance pointers and five instance variables:

class Sound : public iSound {
    iAPISound* apiSound;          // points to the sound at the api level
    wchar_t*   fileWithPath;      // name of sound file with the path
    wchar_t*   relFile;           // name of sound file without the path
    bool       on;                // is this sound on?
    bool       setToStart;        // is this sound ready to start playing?
    bool       setToStop;         // is this sound ready to stop playing?
    bool       continuous;        // is this sound continuous?
    unsigned   lastToggle;        // time of the last toggle

    Sound(const Sound&);
    virtual ~Sound();
    Sound(const wchar_t*, bool, bool);
    Sound& operator=(const Sound&);
    iSound* clone() const              { return new Sound(*this); }
    const wchar_t* relFileName() const { return relFile; }
    void change(const wchar_t* f);
    void loop(bool on)                { this->on = on; }
    bool isOn() const                 { return on; }
    bool toggle();
    void update();
    bool play();
    bool stop();
    void render();
    void suspend();
    void restore();
    void release();
    void Delete() const { delete this; }
 };


The constructor

  • stores the type and the description flags for the sound
  • adds the address of the Sound object to the Audio coordinator
  • creates the APISound object that connects to the Audio API
  • stores the relative file name and creates the path based file name
  • initializes the status flags and the time of the last toggle

 Sound::Sound(const wchar_t* file, bool c, bool o) :
  continuous(c), on(o)  {
     apiSound = CreateAPISound();

     if (file) {
         // store the file name without the path
         int len = strlen(file);
         relFile = new wchar_t[len + 1];
         strcpy(relFile, file, len);
         // prepend the directory to create the file name with the path
         len += strlen(AUDIO_DIRECTORY) + 1;
         fileWithPath = new wchar_t[len + 1];
         ::nameWithDir(fileWithPath, AUDIO_DIRECTORY, relFile, len);
     }
     else {
         relFile      = nullptr;
         fileWithPath = nullptr;
     }
     setToStart = continuous && on;
     setToStop  = false;
     lastToggle = 0;
 }


The change() method on the Sound object changes the name of the sound file associated with the Sound object:

 void Sound::change(const wchar_t* file) {
     if (file) {
         // detach the current source voice so the new file will be loaded
         if (apiSound) apiSound->release();
         int len = strlen(file);
         if (relFile)
             delete [] relFile;
         relFile = new wchar_t[len + 1];
         strcpy(relFile, file, len);
         if (fileWithPath)
             delete [] fileWithPath;
         len += strlen(AUDIO_DIRECTORY) + 1;
         fileWithPath = new wchar_t[len + 1];
         ::nameWithDir(fileWithPath, AUDIO_DIRECTORY, relFile, len);
     }
 }

The toggle() method toggles the on/off state of the Sound object:

 bool Sound::toggle() {
     bool rc = false;

     if (now - lastToggle > KEY_LATENCY) {
         if (on) setToStop  = true;
         else    setToStart = true;
         lastToggle = now;
         rc = true;
     }
     return rc;
 }


The play() method prepares for the playing of the APISound object:

 bool Sound::play() {
     bool rc = false;

     if (now - lastToggle > KEY_LATENCY) {
         setToStart = true;
         lastToggle = now;
         rc = true;
     }
     return rc;
 }


The stop() method on the Sound object prepares to stop the playing of the APISound object:

 bool Sound::stop() {
     bool rc = false;

     if (now - lastToggle > KEY_LATENCY) {
         setToStop = true;
         lastToggle = now;
         rc = true;
     }
     return rc;
 }


The render() method implements the start or stop action as required:

 void Sound::render() {
     if (setToStop) {
         if (apiSound) apiSound->stop();
         setToStop  = false;
         on         = false;
     }
     if (setToStart) {
         if (apiSound) apiSound->play(fileWithPath, continuous);
         setToStart = false;
         setToStop  = false;
         on         = true;
     }
 }

Suspend, Restore, and Release

The suspend() method stops the playing of the APISound object and prepares the flags for subsequent action upon restoration:

 void Sound::suspend() {
     if (apiSound) apiSound->stop();
     setToStart = continuous && (setToStart || on);
     on         = false;
 }

The restore() method resets the time of the last toggle:

 void Sound::restore() { lastToggle = now; }

The release() method suspends playing and releases the APISound object from the Audio API:

 void Sound::release() {
     if (apiSound) apiSound->release();
 }


The destructor deletes the memory used for the file names, deletes the APISound object, and removes the current object's address from the Coordinator's list:

 Sound::~Sound() {
     if (fileWithPath) delete [] fileWithPath;
     if (relFile)      delete [] relFile;
     if (apiSound)     apiSound->Delete();
 }

APISound Class

The iAPISound interface exposes five virtual methods to the Sound class:

  • clone() - creates a clone of the APISound object
  • play() - starts playing the source voice
  • stop() - stops playing the source voice
  • release() - releases the source voice
  • Delete() - deletes the APISound object

 class iAPISound {
     virtual iAPISound* clone() const        = 0;
     virtual void play(const wchar_t*, bool) = 0;
     virtual void stop()                     = 0;
     virtual void release()                  = 0;
     virtual void Delete() const             = 0;
     friend class Sound;
 };

 iAPISound* CreateAPISound();

The APISound class defines three instance pointers that hold the addresses of the source voice interface and the two audio data buffers:

 class APISound : public iAPISound, public APIAudioBase {
     IXAudio2SourceVoice*  pSourceVoice;  // sound source
     BYTE*                 pDataBuffer;   // Stores WAVE buffer
     UINT32*               pDpdsBuffer;   // Stores xWMA buffer

     virtual ~APISound();
     HRESULT ReadChunkData(HANDLE, void*, DWORD, DWORD);
     APISound(const APISound& s);
     APISound& operator=(const APISound& s);
     iAPISound* clone() const { return new APISound(*this); }
     bool setup(const wchar_t*, bool continuous);
     void play(const wchar_t*, bool);
     void stop();
     void release();
     void Delete() const { delete this; }
 };

Fourcc identifiers are pre-defined in little-endian order:

 #define fourccRIFF 'FFIR' // RIFF
 #define fourccFMT  ' tmf' // fmt
 #define fourccDATA 'atad' // data
 #define fourccWAVE 'EVAW' // WAVE
 #define fourccXWMA 'AMWX' // XWMA
 #define fourccDPDS 'sdpd' // dpds


The constructor initializes the instance pointers:

 APISound::APISound() {
     pSourceVoice = nullptr;
     pDataBuffer  = nullptr;
     pDpdsBuffer  = nullptr;
 }


The setup() method creates the source voice for the sound file, retrieves the interface to its COM object and downloads the sound segment to the hardware:

 bool APISound::setup(const wchar_t* sound, bool continuous) {
     bool   rc = false;
     HANDLE file;
     DWORD  chunkSize = 0, chunkDataPosition = 0, fileType = 0;
     WAVEFORMATEXTENSIBLE wfx     = {0};
     XAUDIO2_BUFFER buffer        = {0};
     XAUDIO2_BUFFER_WMA wmaBuffer = {0};

     file = CreateFile(sound, GENERIC_READ, FILE_SHARE_READ,
      nullptr, OPEN_EXISTING, 0, nullptr);
     if (file == INVALID_HANDLE_VALUE)
         error(L"APISound::10 Failed to open audio file");
     else if (FAILED(FindChunk(file, fourccRIFF, chunkSize,
      chunkDataPosition)))
         error(L"APISound::11 Failed to find RIFF segment");
     else if (FAILED(ReadChunkData(file, &fileType, sizeof(DWORD),
      chunkDataPosition)))
         error(L"APISound::12 Failed to read RIFF segment");
     else if ( fileType != fourccWAVE && fileType != fourccXWMA )
         error(L"APISound::13 File is not a valid WAVE or XWMA file");
     else {
         // No more file-related error handling from this point forward
         // Read in WAVEFORMATEXTENSIBLE from 'fmt ' chunk
         FindChunk(file, fourccFMT, chunkSize, chunkDataPosition);
         ReadChunkData(file, &wfx, chunkSize, chunkDataPosition);

         // Fill out audio data buffer with contents of the fourccDATA chunk
         FindChunk(file, fourccDATA, chunkSize, chunkDataPosition);
         pDataBuffer = new BYTE[chunkSize];
         ReadChunkData(file, pDataBuffer, chunkSize, chunkDataPosition);

         // Populate XAUDIO2_BUFFER, set looping here
         buffer.AudioBytes = chunkSize;
         buffer.Flags      = XAUDIO2_END_OF_STREAM;
         buffer.pAudioData = pDataBuffer;
         buffer.PlayBegin  = 0;
         buffer.PlayLength = 0;
         buffer.LoopBegin  = 0;
         buffer.LoopLength = 0;
         buffer.LoopCount  = continuous ? XAUDIO2_LOOP_INFINITE : 0;

         // If the file is an XWMA file, then load in the additional buffer
         if (fileType == fourccXWMA) {
             FindChunk(file, fourccDPDS, chunkSize, chunkDataPosition);
             // Divide by 4 to get a DWORD packet count
             // (Dugan)
             wmaBuffer.PacketCount = chunkSize / 4;

             pDpdsBuffer = new UINT32[chunkSize];
             ReadChunkData(file, pDpdsBuffer, chunkSize, chunkDataPosition);
             wmaBuffer.pDecodedPacketCumulativeBytes = pDpdsBuffer;
         }

         // Create the source voice and submit the buffer(s) to it
         if (FAILED(pXAudio2->CreateSourceVoice(&pSourceVoice,
          (WAVEFORMATEX*)&wfx)))
             error(L"APISound::14 Failed to create Source Voice");
         else if (fileType == fourccXWMA) {
             if (FAILED(pSourceVoice->SubmitSourceBuffer(&buffer, &wmaBuffer)))
                 error(L"APISound::15 Failed to submit XWMA Source Buffer");
             else
                 rc = true;
         }
         else {
             if (FAILED(pSourceVoice->SubmitSourceBuffer(&buffer)))
                 error(L"APISound::16 Failed to submit WAVE Source Buffer");
             else
                 rc = true;
         }
     }
     if (file != INVALID_HANDLE_VALUE) CloseHandle(file);

     return rc;
 }

The CreateSourceVoice() method on the XAudio2 engine interface creates a source voice and retrieves the address of the interface to that object.  The first argument is the address of the pointer in which to store the interface address.  The second argument is the address of the WAVEFORMATEX object that holds the format information read from the audio file.  The third argument specifies whether the filter effect should be available on the voice.  The fourth argument specifies the default value for the maximum frequency ratio. 

The SubmitSourceBuffer() method on the source voice object for WAVE files loads the buffer into the object.  The argument to this method is the address of the XAUDIO2_BUFFER buffer that contains the audio data. 

The SubmitSourceBuffer() method on the source voice object for xWMA files loads two buffers into the object.  The first argument to this method is the address of the XAUDIO2_BUFFER buffer that contains the audio data.  The second argument to this method is the address of the XAUDIO2_BUFFER_WMA buffer that contains the wma data. 


The play() method on the APISound object sets up the sound voice if necessary, and starts playing it:

 void APISound::play(const wchar_t* sound, bool continuous) { 
     if (!pSourceVoice) setup(sound, continuous);
     if (pSourceVoice)  pSourceVoice->Start(0);
 }

The Start() method on the source voice object plays the voice.  The first argument must be 0.  The second argument is optional and used to group voices into operation sets.


The stop() method stops the playing of the source voice:

 void APISound::stop() { if (pSourceVoice) pSourceVoice->Stop(0); }

The Stop() method on the source voice object stops the voice.  The first argument is 0 or XAUDIO2_PLAY_TAILS.  The second argument is optional and used to group voices into operation sets.


The release() method destroys the sound voice and deallocates its buffer(s):

 void APISound::release() {
     if (pSourceVoice) {
         pSourceVoice->DestroyVoice();
         pSourceVoice = nullptr;
     }
     if (pDataBuffer) {
         delete [] pDataBuffer;
         pDataBuffer = nullptr;
     }
     if (pDpdsBuffer) {
         delete [] pDpdsBuffer;
         pDpdsBuffer = nullptr;
     }
 }


The destructor releases the connection to the Audio API:

 APISound::~APISound() { release(); }


The APIAudioBase class holds the information for the Translation Layer that the APIAudio and APISound classes share with one another.  The APIAudioBase class embodies the audio connectivity across the Translation Layer and serves as the base class for its derived audio classes. 

Translation Layer Audio Connectivity

The APIAudioBase class contains five class variables:

  • a pointer that holds the address of APIAudio object
  • a pointer to the IXAudio2 interface
  • a pointer to the IXAudio2MasteringVoice interface
  • the volume and frequency ratio settings

 class APIAudioBase {
     static iAPIAudio*              audio;           // the sound card object
     static IXAudio2*               pXAudio2;        // XAudio2 engine
     static IXAudio2MasteringVoice* pMasteringVoice; // masteringVoice
     static float                   volume;
     static float                   frequencyRatio;
     void error(const wchar_t*, const wchar_t* = 0) const;
     void logError(const wchar_t*) const;
 };


The pointers are initially null, the volume is initially zero, and the frequency ratio is initially one. 

 // APIAudioBase.cpp

 iAPIAudio*              APIAudioBase::audio           = nullptr;
 IXAudio2*               APIAudioBase::pXAudio2        = nullptr;
 IXAudio2MasteringVoice* APIAudioBase::pMasteringVoice = nullptr;
 float                   APIAudioBase::volume          = 0;
 float                   APIAudioBase::frequencyRatio  = 1.0f;


The error() method concatenates the error strings if there are two, pops up a message box with the concatenated string, and logs the error:

 void APIAudioBase::error(const wchar_t* msg, const wchar_t* more) const {

     int len = strlen(msg);
     if (more) len += strlen(more);
     wchar_t* str = new wchar_t[len + 1];
     strcpy(str, msg, len);
     if (more) strcat(str, more, len);
     if (hwnd) MessageBox((HWND)hwnd, str, L"Error", MB_OK);
     logError(str);
     delete [] str;
 }

Log Error

The logError() method appends a record consisting of the error string to the error.log file:

 void APIAudioBase::logError(const wchar_t* msg) const {
     std::wofstream fp("error.log", std::ios::app);
     if (fp) {
          fp << msg << std::endl;
     }
 }


  • Read Microsoft's Documentation on XAudio2
  • Read the Wikipedia article on XAudio2
  • Add a third global sound to the application, using your own .wav music file


Designed by Chris Szalwinski