Docsoft, Incorporated

Improve Automatic Speech Recognition

Author: Yury Delendik, Docsoft Inc.
Created: 7.28.2009

Hundreds of factors can affect how well audio or video works with automatic speech-to-text recognition. The following list describes ways to improve audio quality, which in turn can improve the transcription accuracy of Docsoft:AV. Good-quality media files are essential for both the speaker training and transcribing processes.

The Do’s:

  1. Use good quality recording equipment
  2. Ensure the microphone is not blocked or touching articles of clothing
  3. Ensure the microphone is close to the speaker and free from outside interference
  4. Record using native audio capture settings without sound improvement filters
  5. Ensure the speaker pronounces all words clearly
  6. Eliminate as much background noise as possible, such as music, applause, or other loud sounds

The Do Not’s:

  1. Do not compress audio files
  2. Do not re-encode the audio file from the original recording
  3. Do not attempt to digitally alter the audio file
  4. Do not try to digitally enhance the audio quality

Please keep in mind that the best files for audio mining are those captured at CD quality (44,100 Hz, 16-bit) and saved in lossless or lightly compressed formats (e.g. MP3 at 128 kbps). Altering or up-converting low-quality sound to high-quality parameters does not improve speech recognition accuracy.
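
As a rough illustration, the capture parameters of an uncompressed WAV file can be checked before it is submitted for transcription. The sketch below is not part of Docsoft:AV; it uses only Python's standard wave module, and the function names describe_wav and meets_cd_quality are hypothetical. It assumes the recording is a PCM WAV file and treats "CD quality" as a sample rate of at least 44,100 Hz and at least 16 bits per sample.

```python
import wave

# Assumed thresholds for "CD quality": 44,100 Hz sample rate, 16-bit samples.
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        # getsampwidth() reports bytes per sample, so convert to bits.
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def meets_cd_quality(path):
    """Check the header parameters only; the audio content is not inspected."""
    rate, bits, _channels = describe_wav(path)
    return rate >= CD_SAMPLE_RATE_HZ and bits >= CD_BITS_PER_SAMPLE
```

A check like this reads only the file header. It cannot tell whether a file was up-converted from a lower-quality source, so it complements the recording guidelines above and does not replace them.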