Multi-pass speech activity detection strategy to improve automatic speech recognition
Abstract
An automatic speech recognition system and a method performed by an automatic speech recognition system are provided. The method includes performing at least two passes of speech activity detection on an acoustic utterance uttered by a speaker. The at least two passes include an initial pass and a subsequent pass. The method further includes estimating at least one of feature statistics and transforms for acoustic feature extraction and acoustic modeling based on an output of an initial pass. The method further includes performing automatic speech recognition using an output of the subsequent pass while bypassing an output of the initial pass to recognize the acoustic utterance.
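The flow described in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the energy-threshold detector, the use of frame energy as the only acoustic feature, the mean and standard deviation as the "feature statistics", and the hypothetical names `frame_energies`, `detect_speech`, `estimate_stats`, and `two_pass_front_end`. The threshold values, and the choice to make the initial pass the more permissive one, are likewise arbitrary.

```python
from statistics import mean, pstdev


def frame_energies(samples, frame_len=4):
    """Split samples into non-overlapping frames and return the mean energy of each."""
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [sum(x * x for x in frame) / frame_len for frame in frames]


def detect_speech(energies, threshold):
    """One pass of speech activity detection: flag frames whose energy exceeds the threshold."""
    return [e > threshold for e in energies]


def estimate_stats(features, mask):
    """Estimate feature statistics (mean, std) over the frames a pass marked as speech."""
    selected = [f for f, keep in zip(features, mask) if keep]
    if not selected:
        return 0.0, 1.0  # fall back to an identity normalization
    return mean(selected), (pstdev(selected) or 1.0)


def two_pass_front_end(samples, initial_threshold=0.01, subsequent_threshold=0.1):
    """Return normalized features for the frames that would be sent to the recognizer."""
    energies = frame_energies(samples)

    # Initial pass: its output is used only to estimate the feature statistics.
    initial_mask = detect_speech(energies, initial_threshold)
    mu, sigma = estimate_stats(energies, initial_mask)

    # Subsequent pass: its output decides which frames are recognized.
    subsequent_mask = detect_speech(energies, subsequent_threshold)

    # The initial-pass segmentation is bypassed here; only its statistics are reused.
    return [(e - mu) / sigma
            for e, keep in zip(energies, subsequent_mask) if keep]
```

In this toy setup the recognizer input is selected by the subsequent pass alone, while the normalization applied to that input comes from the initial pass. A real system would replace the energy feature with cepstral or similar features and the thresholding with a trained detector.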
19 Claims
1. A method performed by an automatic speech recognition system having a processor, comprising:
performing, by the processor, at least two passes of speech activity detection on an acoustic utterance uttered by a speaker, the at least two passes including an initial pass and a subsequent pass;
estimating, by the processor, at least one of feature statistics and transforms for acoustic feature extraction and acoustic modeling based on an output of an initial pass; and
performing, by the processor, automatic speech recognition using an output of the subsequent pass and the at least one of the feature statistics and transforms estimated from the initial pass while bypassing an output of the initial pass to recognize the acoustic utterance and output a textual representation of the acoustic utterance,
wherein the at least two passes are performed simultaneously or separately, based on a user selection.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A computer program product for automatic speech recognition, the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method comprising:
performing, by a processor of an automatic speech recognition system, at least two passes of speech activity detection on an acoustic utterance uttered by a speaker, the at least two passes including an initial pass and a subsequent pass;
estimating, by the processor, at least one of feature statistics and transforms for acoustic feature extraction and acoustic modeling based on an output of an initial pass; and
performing, by the processor, automatic speech recognition using an output of the subsequent pass and the at least one of the feature statistics and transforms estimated from the initial pass while bypassing an output of the initial pass to recognize the acoustic utterance and output a textual representation of the acoustic utterance,
wherein the at least two passes are performed simultaneously or separately, based on a user selection.
Dependent claims: 17, 18
19. An automatic speech recognition system having a processor, comprising:
a speech activity detector, implemented by the processor, for performing at least two passes of speech activity detection on an acoustic utterance uttered by a speaker, the at least two passes including an initial pass and a subsequent pass, and for estimating at least one of feature statistics and transforms for acoustic feature extraction and acoustic modeling based on an output of an initial pass; and
a speech decoder, implemented by the processor, for performing automatic speech recognition using an output of the subsequent pass and the at least one of the feature statistics and transforms estimated from the initial pass while bypassing an output of the initial pass to recognize the acoustic utterance and output a textual representation of the acoustic utterance,
wherein the at least two passes are performed simultaneously or separately, based on a user selection.
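The split between a speech activity detector and a speech decoder in claim 19, together with the user-selected choice between simultaneous and separate passes, might be organized roughly as in the Python sketch below. The class names, method names, the `simultaneous` flag, the energy-threshold detector, and the use of a thread pool for the simultaneous case are all hypothetical choices made for illustration. The decoder is a stub that only reports how many frames it received; it stands in for a real recognizer and does not produce a transcription.

```python
from concurrent.futures import ThreadPoolExecutor


class SpeechActivityDetector:
    """Runs two detection passes and estimates a feature statistic from the initial one."""

    def __init__(self, initial_threshold, subsequent_threshold):
        self.initial_threshold = initial_threshold
        self.subsequent_threshold = subsequent_threshold

    @staticmethod
    def _single_pass(energies, threshold):
        # Assumed detector: a frame counts as speech if its energy exceeds the threshold.
        return [e > threshold for e in energies]

    def run(self, energies, simultaneous):
        """Return (initial_mask, subsequent_mask); `simultaneous` models the user selection."""
        if simultaneous:
            with ThreadPoolExecutor(max_workers=2) as pool:
                first = pool.submit(self._single_pass, energies, self.initial_threshold)
                second = pool.submit(self._single_pass, energies, self.subsequent_threshold)
                return first.result(), second.result()
        return (self._single_pass(energies, self.initial_threshold),
                self._single_pass(energies, self.subsequent_threshold))

    @staticmethod
    def estimate_mean(energies, initial_mask):
        """Feature statistic from the initial pass (here just a mean, as an assumption)."""
        selected = [e for e, keep in zip(energies, initial_mask) if keep]
        return sum(selected) / len(selected) if selected else 0.0


class SpeechDecoder:
    """Placeholder decoder: consumes subsequent-pass frames plus initial-pass statistics."""

    def decode(self, energies, subsequent_mask, feature_mean):
        normalized = [e - feature_mean
                      for e, keep in zip(energies, subsequent_mask) if keep]
        # A real decoder would return the recognized text; this stub returns a summary.
        return f"[{len(normalized)} speech frames]"
```

Both settings of `simultaneous` yield the same masks in this sketch because the two passes are independent of each other; the flag only changes how they are scheduled.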
Specification