The torchaudio-based processing of audio files comes from the NVIDIA Deep Learning Examples project: https://github.com/NVIDIA/DeepLearningExamples/
Specifically, jasper_python/1/features.py comes from: https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechRecognition/Jasper/common/features.py
The audio sample used for the benchmarks comes from the LibriSpeech dataset.