teaching machines

Generating utterance WAVs

December 27, 2012 by . Filed under algorithms, public.

I’m going to ask my students to write some C code to create an audio pronunciation of a number. For instance, 137 is pronounced “one hundred thirty-seven.” As input, I will give them WAV files containing the separate pronunciations of all the number words that are needed. They’ll need to read these files in, concatenate their samples in the right order, and write the result back out to a single WAV file.

I generated the individual number words programmatically with AppleScript’s say command, which can save its spoken audio to an AIFF file:

-- The directory on the desktop to write to.
set targetDir to "pronunciations"

-- The things to say.
set utterances to {"zero", "one", "two", "three", "four", "five", "six", "seven", ¬
	"eight", "nine", "ten", "eleven", "twelve", "thirteen", ¬
	"fourteen", "fifteen", "sixteen", "seventeen", "eighteen", ¬
	"nineteen", "twenty", "thirty", "forty", "fifty", "sixty", ¬
	"seventy", "eighty", "ninety", "hundred", "thousand", "million", ¬
	"billion", "trillion", "infinity", "negative"}

set desktopPath to the POSIX path of (path to desktop)
set destinationDir to desktopPath & targetDir

-- Create the directory we're writing to.
tell application "Finder"
  if not (exists (destinationDir as POSIX file)) then
    make new folder at (path to desktop) with properties {name:targetDir}
  end if
end tell

-- Speak the utterances and record the results to files.
repeat with utterance in utterances
  say utterance using "Ralph" saving to (destinationDir & "/" & utterance & ".aiff")
end repeat

The uncompressed WAV format can be pretty straightforward to parse and generate, depending on the encoding settings. AIFF, on the other hand, I know nothing about and do not care to learn. I convert the AIFFs to mono (-ac 1), 16-bit, little-endian (-acodec pcm_s16le) WAV files with this shell script:

# ${i%.aiff} strips the .aiff extension (one.aiff becomes one.wav) in both bash and zsh.
for i in *.aiff; do
  ffmpeg -i "$i" -ac 1 -acodec pcm_s16le "${i%.aiff}.wav"
done