The torchaudio-based audio file processing comes from the NVIDIA Deep Learning Examples project: https://github.com/NVIDIA/DeepLearningExamples/

jasper_python/1/features.py: https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/SpeechRecognition/Jasper/common/features.py

The audio sample used for benchmarks comes from the LibriSpeech dataset.