Robust speech recognition in the presence of echo and noise using multiple signals for discrimination
Abstract
Systems and methods are described for a speech recognition system whose speech processor is trained to recognize speech by considering (1) a raw microphone signal that includes an echo signal and (2) different types of echo information signals from an echo cancellation system (and, optionally, different types of ambient noise suppression signals from a noise suppressor). The different types of echo information signals may include those used for echo cancellation and those having echo information. The speech recognition system may convert the raw microphone signal and the different types of echo information signals (and the optional noise suppression signals) into spectral features in the form of feature vectors, and may use a concatenator to combine the feature vectors into a total vector (for a period of time). The total vector is used both to train the speech processor and, during use of the speech processor, to recognize speech.
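The feature-and-concatenate step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the naive DFT log-spectrum, the function names `log_spectrum` and `total_vector`, the 8-sample frame length, and the two example echo information signals (an echo estimate and a far-end reference) are all assumptions chosen for illustration.

```python
import cmath
import math


def log_spectrum(frame):
    """Log-magnitude spectrum of one frame via a naive DFT (bins 0..N/2)."""
    n = len(frame)
    feats = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        # Small offset avoids log(0) for empty bins.
        feats.append(math.log(abs(acc) + 1e-10))
    return feats


def total_vector(raw_mic_frame, echo_info_frames, noise_info_frames=()):
    """Concatenate per-signal spectral features for one period of time."""
    total = log_spectrum(raw_mic_frame)
    for frame in echo_info_frames:
        total.extend(log_spectrum(frame))
    for frame in noise_info_frames:  # optional noise suppression signals
        total.extend(log_spectrum(frame))
    return total


# Hypothetical 8-sample frames covering the same period of time.
N = 8
# Raw microphone: "voice" tone in bin 1 plus "echo" tone in bin 2.
raw_mic = [math.sin(2 * math.pi * 1 * t / N)
           + 0.5 * math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
# Assumed echo information type 1: the canceller's echo estimate.
echo_estimate = [0.5 * math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
# Assumed echo information type 2: the far-end (loudspeaker) reference.
far_end_ref = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]

vec = total_vector(raw_mic, [echo_estimate, far_end_ref])
```

Here `vec` holds the raw microphone features followed by the features of each echo information signal, so a recognizer trained on such vectors sees the unmodified microphone spectrum alongside evidence of where the echo energy lies. The same total vector layout would be used at training time and at recognition time.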
21 Claims
1. A speech recognition system (SRS) comprising:
a first input to receive a raw microphone signal having a user voice signal based on user speech during a period of time and an echo signal based on sound produced by a speaker during the period of time;
a second input to receive a plurality of types of echo information signals during the period of time, each type of echo information signal including information derived from the echo signal by an echo cancellation system; and
a trained speech recognition processor to recognize speech based on the raw microphone signal and the plurality of types of echo information signals, wherein the processor was trained at least by inputting a plurality of different samples of raw microphone signals and a plurality of different samples of each of a plurality of the types of echo information signals.
(Dependent claims: 2, 3, 4, 5, 6, 7)
8. A method of speech recognition comprising:
receiving at a first input, a raw microphone signal having a user voice signal based on user speech during a period of time and an echo signal based on sound produced by a speaker during the period of time;
receiving at a second input, a plurality of types of echo information signals during the period of time, each type of echo information signal including information derived by an echo cancellation system from the echo signal; and
recognizing speech at a trained speech recognition processor, based on the raw microphone signal and the plurality of types of echo information signals, wherein the processor was trained at least by inputting a plurality of different samples of raw microphone signals and a plurality of different samples of each of a plurality of the types of echo information signals.
(Dependent claims: 9, 10, 11, 12, 13, 14)
15. A method of training a speech recognition system (SRS) comprising:
receiving at a first input, a plurality of raw microphone signals having a user voice signal based on a sample of user speech during a period of time and an echo signal based on a sample of sound produced by a speaker during the period of time;
receiving at a second input, a plurality of types of echo information signals during the period of time, each type of echo information signal including information derived by an echo cancellation system from the echo signal; and
training a speech recognition processor to recognize speech based on the raw microphone signals and the plurality of types of echo information signals.
(Dependent claims: 16, 17, 18, 19, 20, 21)
Specification