Transient noise rejection for speech recognition
First Claim
1. A method of speech recognition, comprising the steps of:
(a) receiving audio including user speech and at least some transient noise associated with the speech;
(b) converting the received audio into digital data;
(c) segmenting the digital data into acoustic frames;
(d) extracting acoustic feature vectors from the acoustic frames;
(e) evaluating the acoustic frames for transient noise on a frame-by-frame basis;
(f) rejecting those acoustic frames having transient noise, wherein steps (e) and (f) include assessing at least two time spaced samples within an acoustic frame to determine autocorrelation of the samples within the frame, and rejecting the acoustic frame if the autocorrelation is determined to be insufficient;
(g) accepting as speech frames those acoustic frames having no transient noise; and thereafter
(h) recognizing the user speech using the speech frames.
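The autocorrelation test in steps (e) and (f) can be illustrated with a minimal sketch. The claim does not say how autocorrelation is computed or what counts as "insufficient", so the normalized-correlation formula, the lag range, and the 0.5 threshold below are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def normalized_autocorrelation(frame, lag):
    """Correlate a frame with a copy of itself delayed by `lag` samples.

    Returns a value in [-1, 1]; 0.0 is returned when either segment is silent.
    """
    n = len(frame) - lag
    if n <= 0:
        raise ValueError("lag must be smaller than the frame length")
    head, tail = frame[:n], frame[lag:]
    num = sum(a * b for a, b in zip(head, tail))
    energy = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return num / energy if energy > 0 else 0.0

def has_sufficient_autocorrelation(frame, lags=range(20, 161), threshold=0.5):
    """Assumed decision rule: keep the frame if any lag correlates strongly.

    `lags` and `threshold` are illustrative values, not taken from the patent.
    """
    peak = max(normalized_autocorrelation(frame, lag) for lag in lags)
    return peak >= threshold
```

Under these assumptions, a periodic signal such as a sine wave with a 40-sample period passes the check, while a frame containing a single isolated click correlates with none of its delayed copies and is rejected.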
Abstract
A method of and system for transient noise rejection for improved speech recognition. The method comprises the steps of (a) receiving audio including user speech and at least some transient noise associated with the speech, (b) converting the received audio into digital data, (c) segmenting the digital data into acoustic frames, and (d) extracting acoustic feature vectors from the acoustic frames. The method also comprises the steps of (e) evaluating the acoustic frames for transient noise on a frame-by-frame basis, (f) rejecting those acoustic frames having transient noise, (g) accepting as speech frames those acoustic frames having no transient noise and, thereafter, (h) recognizing the user speech using the speech frames.
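Steps (c) and (e) through (g) of the abstract amount to framing the digital data and then filtering the frames one at a time. The following is a minimal sketch of that flow. The frame length and hop size (400 and 160 samples, which would correspond to 25 ms and 10 ms at an assumed 16 kHz sampling rate) are not stated in the abstract, and `is_transient` is a hypothetical placeholder for whatever per-frame test is used.

```python
def segment_into_frames(samples, frame_len=400, hop=160):
    """Split digital samples into overlapping fixed-length frames.

    A trailing partial frame is dropped. frame_len and hop are assumed values.
    """
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]

def select_speech_frames(frames, is_transient):
    """Evaluate frames one by one and keep only those not flagged as transient.

    `is_transient` is a caller-supplied predicate (hypothetical here).
    """
    return [frame for frame in frames if not is_transient(frame)]
```

Only the frames returned by `select_speech_frames` would then be passed on to recognition, which matches the ordering in the abstract: rejection happens before step (h).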
17 Claims
1. A method of speech recognition (independent claim, recited in full under First Claim above). Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11.
12. A computer program product including instructions on a computer readable medium and executable by a computer processor of a speech recognition system to cause the system to implement steps comprising:
(a) receiving audio including user speech and at least some transient noise associated with the speech;
(b) converting the received audio into digital data;
(c) segmenting the digital data into acoustic frames;
(d) extracting acoustic feature vectors from the acoustic frames;
(e) evaluating the acoustic frames for transient noise on a frame-by-frame basis;
(f) rejecting those acoustic frames having transient noise;
(g) accepting as speech frames those acoustic frames having no transient noise, wherein the steps (e), (f), and (g) include:
analyzing an acoustic frame to determine whether an acoustic frame includes a voiced or an unvoiced signal;
accepting the acoustic frame if the acoustic frame is determined to include a voiced signal;
otherwise, assessing at least two time spaced samples within the acoustic frame to determine autocorrelation of the samples within the frame;
rejecting the acoustic frame if the autocorrelation is determined to be insufficient;
otherwise, comparing the acoustic frame to a preceding acoustic frame accepted as a speech frame to determine cross-correlation between the acoustic frames;
rejecting the acoustic frame if the cross-correlation is determined to be insufficient;
otherwise, applying a codebook trained on speech samples to the acoustic frame to determine a location of the acoustic frame in multidimensional feature space; and
rejecting the acoustic frame if it is determined that the acoustic frame is not assigned to a speech cluster; and thereafter
(h) recognizing the user speech using the speech frames.
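The ordering of the tests in claim 12 can be read as a short-circuit cascade. The sketch below shows only that control flow. The four checks are passed in as caller-supplied predicates with hypothetical names, because the claim does not define how each is computed. The handling of the first frame, where no previously accepted speech frame exists, is also an assumption: the cross-correlation test is simply skipped.

```python
def classify_frame(frame, prev_speech_frame, *, is_voiced, autocorr_ok,
                   crosscorr_ok, in_speech_cluster):
    """Return True to accept the frame as speech, False to reject it."""
    if is_voiced(frame):
        return True                      # voiced frames are accepted outright
    if not autocorr_ok(frame):
        return False                     # insufficient autocorrelation
    if prev_speech_frame is not None and not crosscorr_ok(frame, prev_speech_frame):
        return False                     # insufficient cross-correlation
    return in_speech_cluster(frame)      # codebook decides the remaining cases

def filter_frames(frames, **checks):
    """Run the cascade over frames, tracking the last accepted speech frame."""
    speech_frames, prev = [], None
    for frame in frames:
        if classify_frame(frame, prev, **checks):
            speech_frames.append(frame)
            prev = frame
    return speech_frames
```

Because each test runs only when the previous one did not settle the outcome, the cheaper checks in this sketch screen most frames before the codebook lookup is reached.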
13. A speech recognition system, comprising:
an acoustic interface to receive audio including user speech and at least some transient noise associated with the speech;
a pre-processor to pre-process the received audio, including segmenting the digital data into acoustic frames, extracting acoustic feature vectors from the acoustic frames, evaluating the acoustic frames for transient noise on a frame-by-frame basis, rejecting those acoustic frames having transient noise, and accepting as speech frames those acoustic frames having no transient noise, wherein the pre-processor rejects those acoustic frames having transient noise by assessing at least two time spaced samples within an acoustic frame to determine autocorrelation of the samples within the frame, and rejecting the acoustic frame if the autocorrelation is determined to be insufficient; and
a decoder to recognize the user speech using the speech frames.
Dependent claims: 14, 15, 16, 17.
Specification