Speech Recognition – Introduction
What is Speech Recognition?
- Speech recognition, also called speech-to-text conversion, automatic speech recognition (ASR), or computer speech recognition, is the ability of a machine or program to identify spoken words and convert them into readable text.
- In other words, speech recognition is a capability which enables programs to process human speech into a readable format.
- Speech recognition technology uses natural language processing (NLP) and machine learning (ML) to convert human speech into text.
- Difference between speech recognition and voice recognition: Speech recognition converts spoken words into text, whereas voice recognition only seeks to identify an individual speaker by their voice.
How Does Speech Recognition Work?
- Speech recognition systems use algorithms to analyze and interpret human speech and convert it into text.
- To perform speech recognition, human speech is first detected by a microphone, which passes the sound recording to a software program. The program then performs multiple steps on this recording, most importantly feature extraction and phonetic unit recognition. Finally, it converts the digitally processed sound into human-readable text.
- The step-by-step operation of a speech recognition system is as follows:
- Recording human speech: A microphone translates the sound vibrations of human speech into electrical signals.
- Digitizing human speech: A computer then converts that analog electrical signal into a digital signal.
- Speech enhancement: A preprocessing unit improves the quality of the speech signal and reduces background noise so that the speech is clearer.
- Feature Extraction: The software splits the enhanced signal into short frames and computes compact acoustic features from each one. These features are the input to phonetic recognition.
- Phonetic Unit Recognition: The speech recognition software analyzes the signal using acoustic modeling to register phonemes. Phonemes are the distinct units of speech sound that distinguish one word from another.
- Speech Recognition: Each word is then recognized by comparing it with a generic voice pattern of the same word that is already stored in the software.
- Currently available algorithms typically achieve an accuracy of 90 to 95% in speech recognition.
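As a rough illustration of the steps above, the following Python sketch digitizes a signal, extracts one simple feature per frame, and recognizes a "word" by comparing it with stored patterns. Everything in it is an assumption made for illustration: synthetic tones stand in for recorded words, the zero-crossing rate stands in for real acoustic features, and dynamic time warping (DTW) stands in for the pattern comparison. The function names, the 8 kHz sample rate, and the frame sizes are likewise hypothetical. Practical systems use far richer features and statistical or neural models.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed value)


def digitize_tone(freq_hz, duration_s, bits=16):
    """Sample a sine tone (a stand-in for a spoken word) and quantize it to signed integers."""
    max_level = 2 ** (bits - 1) - 1
    count = int(duration_s * SAMPLE_RATE)
    return [round(math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) * max_level)
            for i in range(count)]


def split_frames(samples, size=200, step=100):
    """Cut the signal into overlapping frames of `size` samples, one every `step` samples."""
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (a crude pitch-like feature)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def extract_features(samples):
    """Feature extraction: one zero-crossing rate per frame."""
    return [zero_crossing_rate(f) for f in split_frames(samples)]


def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences of any lengths."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step_cost = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step_cost + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def recognize(samples, templates):
    """Return the label of the stored pattern closest to the input."""
    observed = extract_features(samples)
    return min(templates, key=lambda label: dtw_distance(observed, templates[label]))


# Stored "generic patterns": hypothetical words represented by tones.
TEMPLATES = {
    "low": extract_features(digitize_tone(300, 0.30)),
    "high": extract_features(digitize_tone(1200, 0.30)),
}

# An input with a slightly different pitch and length than either stored pattern.
print(recognize(digitize_tone(320, 0.20), TEMPLATES))
```

DTW is used here because it tolerates differences in length between the input and the stored pattern, much as the same word can be spoken faster or slower. In this toy setup the 320 Hz input is matched to the "low" pattern because its zero-crossing rates are much closer to those of the 300 Hz tone than to those of the 1200 Hz tone.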
Applications of Speech Recognition
- Note taking/Writing: Voice assistants that offer speech-to-text conversion, such as Siri and Alexa, are among the most common applications. Speech-to-text platforms such as Speechmatics and Google’s speech-to-text engine are further examples.
- Voice control: Speech recognition can be used to give commands to, and control, Voice User Interface (VUI) devices, for example asking a car infotainment system to play music or get directions.
- Helping people with disabilities: Speech recognition allows people who are blind or have low vision to write fluently in the language of their choice using only their speech. It also allows people who are deaf or hard of hearing, and those with learning or other disabilities, to use computers and similar hardware and to engage with media through features such as auto-captioning and Dictaphones.
- Voice Biometrics: Verifying a user's identity is an example of voice recognition. It is especially useful in the banking and financial industries for security purposes. As with facial recognition, an individual can use voice recognition to log into their accounts.
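To make the voice biometrics idea more concrete, here is a minimal Python sketch of verification by comparing two voiceprints. It assumes each voice has already been reduced to a fixed-length numeric vector and that similarity is measured with cosine similarity, which is one possible choice among several. The vectors, the function names, and the 0.9 threshold are invented for illustration and are not taken from any real product.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled, sample, threshold=0.9):
    """Accept the login attempt only if the sample is close enough to the enrolled voiceprint."""
    return cosine_similarity(enrolled, sample) >= threshold


# Hypothetical voiceprints; a real system would derive these from audio.
enrolled_voiceprint = [0.80, 0.10, 0.50, 0.30]
same_speaker_sample = [0.75, 0.15, 0.55, 0.28]
other_speaker_sample = [0.10, 0.90, 0.20, 0.70]

print(verify_speaker(enrolled_voiceprint, same_speaker_sample))
print(verify_speaker(enrolled_voiceprint, other_speaker_sample))
```

Note that the check never looks at which words were spoken, only at whose voice it is. This is the distinction between voice recognition and speech recognition drawn earlier.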
Although speech and voice recognition work differently, they are deeply intertwined to provide many cross-functional capabilities to improve our daily lives.