The basic components of a speech recognition system are:
- Microphone
- Sound card
- Recognition engine
- Vocabulary
- Speaker profile
- Language model
Speech recognition technology (SRT) is a process by which a computer transcribes verbal dictation directly into text, eliminating the need for human transcription. The microphone and sound card convert analog human speech into a digital waveform. A noise-canceling microphone is often preferred because it reduces background noise.
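The analog-to-digital step can be illustrated with a minimal sketch. It samples a continuous signal at fixed intervals and rounds each sample to a whole-number level, which is broadly what a sound card does. The 16 kHz sampling rate, the 16-bit depth, the 440 Hz test tone, and the names `digitize` and `tone` are assumptions chosen for illustration; none of them come from a particular product.

```python
import math

def digitize(signal, duration_s, sample_rate_hz=16000, bit_depth=16):
    """Sample a continuous signal and quantize it to signed integers.

    `signal` is a function of time (seconds) returning values in [-1.0, 1.0].
    The default rate and bit depth are illustrative assumptions.
    """
    max_level = 2 ** (bit_depth - 1) - 1       # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # time of the n-th sample
        value = max(-1.0, min(1.0, signal(t))) # clip to the valid range
        samples.append(round(value * max_level))
    return samples

def tone(t):
    """Hypothetical stand-in for a voice: a pure 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)

# Ten milliseconds of "speech" becomes a list of integer samples.
waveform = digitize(tone, duration_s=0.01)
```

The resulting list of integers is the digital waveform that the recognition engine receives as input.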
A recognition engine is integrated software that contains medical dictionaries and can therefore recognize words specific to the anatomical pathology laboratory. Some engines can recognize up to 300,000 medical terms. This vocabulary, in conjunction with a high-speed processor, allows the user to speak at a natural rate while dictating. Some engines can also understand verbal commands to open applications or edit text.
For the most accurate transcription, a speaker profile must be created by comparing the user’s speech with the system’s vocabulary. The profile accounts for differing accents and for individual ways of pronouncing certain terms. Creating a speaker profile requires reading a 6,000-word document filled with words commonly used in a pathology laboratory. The user reads that document into the system and then corrects any misinterpretations. With a unique speaker profile in place, the engine should produce an accurate transcription for that individual.
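One common way to quantify how many misinterpretations remain after a reading is the word error rate (WER): the number of word substitutions, insertions, and deletions needed to turn the transcript into the reference text, divided by the length of the reference. The sketch below is a generic implementation of that metric, not the method any particular engine uses to build its profile; the function name and the sample sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference text must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# Hypothetical training sentence and what the engine heard.
expected = "the specimen shows chronic gastritis"
heard = "the specimen shows chronic gas tritis"
wer = word_error_rate(expected, heard)  # 2 edits over 5 reference words
```

A lower rate after the user's corrections would suggest that the profile is capturing that speaker's pronunciation.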
The recognition engine can also be integrated with language models. Language models rely on cloud-based processing and ongoing data collection projects to continuously improve their ability to recognize and understand a wider variety of words, phrases, specialized medical terminology, accents, and languages.
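To give a feel for what a language model contributes, the toy sketch below scores candidate transcriptions by how often their word pairs (bigrams) occur in previously collected text, then picks the likelier one. This is a deliberately simplified stand-in: production language models are far larger and typically run in the cloud. The three-sentence corpus, the function names, and the add-one smoothing are all assumptions made for illustration.

```python
from collections import Counter

def train_bigrams(sentences):
    """Count word pairs and their left-hand contexts in a text corpus."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        vocab.update(words)
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, len(vocab)

def score(sentence, contexts, bigrams, vocab_size):
    """Product of add-one smoothed bigram probabilities for a sentence."""
    words = ["<s>"] + sentence.lower().split()
    probability = 1.0
    for left, right in zip(words, words[1:]):
        probability *= (bigrams[(left, right)] + 1) / (contexts[left] + vocab_size)
    return probability

# Hypothetical corpus of previously dictated pathology text.
corpus = [
    "sections show chronic gastritis",
    "the ileum shows no inflammation",
    "sections of ileum show normal mucosa",
]
model = train_bigrams(corpus)

# Two candidates that sound alike; only the spelling of one word differs.
candidates = [
    "sections of ilium show normal mucosa",
    "sections of ileum show normal mucosa",
]
best = max(candidates, key=lambda text: score(text, *model))
```

Because "ileum" appears in the collected text and "ilium" does not, the second candidate scores higher. Adding more dictated text to the corpus changes the counts, which is a small-scale picture of how ongoing data collection can improve recognition.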