Guidelines to Record a TTS Dataset at Home
This section describes how to get the best audio quality when recording TTS data at home.
Recommended Data#
Start by recording the Harvard sentences, which should produce about 20 to 40 minutes of usable data for TTS.
Hardware Requirements#
Use a microphone like the Audio-Technica AT2020USB+ or the Blue Yeti USB mic.
Get a pop filter or windscreen.
Software Requirements#
This document focuses on Audacity: it’s free, and it has a dB meter and keyboard shortcuts that make recording easy.
Reaper is also quite good but is not yet covered in this document.
Do not use software that lacks a numbered dB meter.
Recording Prerequisites#
Connect your microphone to your computer.
Open Audacity.
Select your audio interface.
Set the bit depth to 24-bit (preferred) or 16-bit.
Set the sampling rate to 96 kHz (preferred) or 44.1 kHz.
On the microphone, set the microphone pattern to Cardioid if you have that option.
Set up the pop filter or windscreen.
Select the quietest room available, and close the windows and doors.
Eliminate external sources of noise, such as air conditioning and computer fans.
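A quick test recording can confirm that the bit depth and sampling rate were actually applied. The following is a minimal sketch, assuming the test clip was exported as an uncompressed PCM WAV file; the function names and the accepted-value sets are illustrative and not part of Audacity. Python's standard `wave` module does not read 32-bit float WAV files, so those would need a different reader.

```python
import wave

# Accepted values mirror the prerequisites above. 44100 Hz is assumed to be
# the concrete rate meant by the lower sampling-rate option.
ACCEPTED_SAMPLE_RATES = {44100, 96000}
ACCEPTED_BIT_DEPTHS = {16, 24}


def describe_wav(path):
    """Return (sample_rate_hz, bit_depth, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def format_problems(path):
    """List the ways a file deviates from the recommended settings."""
    rate, bits, _channels = describe_wav(path)
    problems = []
    if rate not in ACCEPTED_SAMPLE_RATES:
        problems.append(f"sample rate {rate} Hz is not 44.1 kHz or 96 kHz")
    if bits not in ACCEPTED_BIT_DEPTHS:
        problems.append(f"bit depth {bits} is not 16 or 24")
    return problems
```

An empty list from `format_problems("test_clip.wav")` (a hypothetical file name) suggests the settings took effect.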
Adjusting the Microphone Level and Body Position Before Recording#
Turn the microphone gain knob to the 9 o’clock position, as on a clock face.
Press the Record button in Audacity.
Make sure you’re speaking into the side of the microphone that has the brand logo.
Position yourself at least a fist’s width away from the microphone and no more than a foot away.
Speak into the microphone with the voice you will use during the TTS data recording.
Adjust the microphone gain to optimize the signal-to-noise ratio:
If you’re recording at 24-bit, make sure the dB meter reads between -24 dB and -6 dB while you’re speaking.
If you’re recording at 16-bit, make sure the dB meter reads between -12 dB and -6 dB while you’re speaking.
Do not change the microphone gain after you’ve adjusted it.
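The level targets above can also be checked offline on a short test clip. This is a minimal sketch, assuming the meter readings correspond to peak level in dBFS and that the clip is a 16-bit or 24-bit PCM WAV file; `peak_dbfs`, `peak_in_target`, and `TARGET_PEAK_DBFS` are illustrative names, not Audacity features.

```python
import math
import wave

# Target peak windows in dBFS, keyed by bit depth, taken from the guideline above.
TARGET_PEAK_DBFS = {24: (-24.0, -6.0), 16: (-12.0, -6.0)}


def peak_dbfs(path):
    """Return (peak level in dBFS, bit depth) for a 16- or 24-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()  # bytes per sample
        frames = wav.readframes(wav.getnframes())
    if width not in (2, 3):
        raise ValueError("expected 16-bit or 24-bit PCM audio")
    full_scale = 2 ** (8 * width - 1)
    peak = 0
    # Samples are little-endian signed integers; all channels are scanned together.
    for i in range(0, len(frames) - width + 1, width):
        sample = int.from_bytes(frames[i:i + width], "little", signed=True)
        peak = max(peak, abs(sample))
    if peak == 0:
        return float("-inf"), width * 8  # digital silence
    return 20 * math.log10(peak / full_scale), width * 8


def peak_in_target(path):
    """True if the clip's peak falls inside the window for its bit depth."""
    level, bits = peak_dbfs(path)
    low, high = TARGET_PEAK_DBFS[bits]
    return low <= level <= high
```

A clip that peaks above the window suggests lowering the gain; one that peaks below it suggests raising the gain or moving closer.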
Positioning Yourself Just Right, Too Far or Too Close#
1 fist away (just right) – no distortion, minimal room sound, good signal-to-noise ratio.
1 inch away – muffled, boomy sound (proximity effect) and distortion from plosives such as /b/ and /p/.
2 feet away – lots of room sound and a poor signal-to-noise ratio.
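The effect of distance on signal-to-noise ratio can be approximated with the inverse-distance law. This is a rough sketch, assuming the voice behaves like a point source in a free field (real rooms add reflections) and that a fist is about 10 cm wide; both are assumptions, and the function name is illustrative.

```python
import math


def direct_level_change_db(reference_distance, new_distance):
    """Change in direct-sound level (dB) when moving from one distance to another.

    Uses the free-field inverse-distance law: level drops by about 6 dB
    for each doubling of distance. Both distances must use the same unit.
    """
    if reference_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return -20 * math.log10(new_distance / reference_distance)


# Assumed distances: one fist is roughly 10 cm, two feet is 60.96 cm.
change = direct_level_change_db(10.0, 60.96)
print(f"Direct voice level at 2 feet vs. 1 fist: {change:.1f} dB")
```

Under these assumptions the voice arrives roughly 16 dB quieter at 2 feet, while the room noise stays about the same, which is why the signal-to-noise ratio suffers.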
Recording the TTS Data#
Prepare your script so that you can read it easily.
Press Shift+R to record into a new track, read the first sentence, and press the space bar to stop recording.
Repeat the previous step for each remaining sentence in your dataset.
Export all files by clicking on File > Export > Export Multiple….
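After exporting, it can help to pair each audio file with its sentence and confirm that nothing is missing. This is a minimal sketch under several assumptions: the script is a plain-text file with one sentence per line, the exported files are `.wav` files whose names sort in recording order, and the manifest is a JSON Lines file with the hypothetical keys `audio` and `text`. Adjust these to whatever your TTS toolkit expects.

```python
import json
import re
from pathlib import Path


def natural_key(path):
    """Sort key that orders '2.wav' before '10.wav' by comparing digit runs as numbers."""
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def build_manifest(wav_dir, script_path, manifest_path):
    """Write one JSON line per (audio file, sentence) pair and return the pair count."""
    wav_files = sorted(Path(wav_dir).glob("*.wav"), key=natural_key)
    lines = Path(script_path).read_text(encoding="utf-8").splitlines()
    sentences = [line.strip() for line in lines if line.strip()]
    if len(wav_files) != len(sentences):
        raise ValueError(
            f"{len(wav_files)} audio files but {len(sentences)} sentences"
        )
    with open(manifest_path, "w", encoding="utf-8") as out:
        for wav, text in zip(wav_files, sentences):
            out.write(json.dumps({"audio": str(wav), "text": text}) + "\n")
    return len(sentences)
```

A count mismatch usually means a sentence was skipped or recorded twice, which is easier to fix right after the session than later.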