Method and apparatus for accurate endpointing of speech in the presence of noise
1 Assignment
0 Petitions
Abstract
An apparatus for accurate endpointing of speech in the presence of noise includes a processor and a software module. The processor executes the instructions of the software module to compare an utterance with a first signal-to-noise ratio (SNR) threshold value to determine a first starting point and a first ending point of the utterance. The processor then compares a part of the utterance that predates the first starting point with a second SNR threshold value to determine a second starting point of the utterance. Likewise, the processor compares a part of the utterance that postdates the first ending point with the second SNR threshold value to determine a second ending point of the utterance. The first and second SNR threshold values are recalculated periodically to reflect changing SNR conditions. The first SNR threshold value advantageously exceeds the second SNR threshold value.
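The two-pass search summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes each frame has already been reduced to a single level in dB, that the caller supplies one (first, second) threshold pair per frame with the first exceeding the second, and that the second pass extends each endpoint only through contiguous frames above the second threshold. The function name `detect_endpoints` and its parameters are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def detect_endpoints(
    frame_level_db: Sequence[float],
    t1: Sequence[float],
    t2: Sequence[float],
) -> Optional[Tuple[int, int]]:
    """Two-pass endpoint search over per-frame levels (hypothetical sketch).

    t1[i] and t2[i] are the first and second thresholds for frame i,
    assumed to be recomputed every frame with t1[i] > t2[i].
    Returns inclusive (start, end) frame indices, or None when no frame
    exceeds the first threshold.
    """
    n = len(frame_level_db)
    if not (len(t1) == len(t2) == n):
        raise ValueError("need one threshold pair per frame")

    # Pass 1: locate frames above the stricter first threshold.
    above_t1 = [i for i in range(n) if frame_level_db[i] > t1[i]]
    if not above_t1:
        return None
    start, end = above_t1[0], above_t1[-1]

    # Pass 2a: walk backward from the first starting point while the
    # preceding frames still exceed the looser second threshold.
    while start > 0 and frame_level_db[start - 1] > t2[start - 1]:
        start -= 1

    # Pass 2b: walk forward from the first ending point the same way.
    while end < n - 1 and frame_level_db[end + 1] > t2[end + 1]:
        end += 1

    return start, end
```

For instance, with frame levels [1, 2, 6, 9, 15, 14, 12, 7, 5, 1] dB and a constant pair of 10 dB and 4 dB, the first pass finds frames 4 through 6, and the second pass widens the result to (2, 8), recovering the weaker onset and trailing frames that the stricter threshold alone would clip.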
112 Citations
13 Claims
1. A device for detecting endpoints of an utterance in frames of a received signal, comprising:
a processor; and
a software module executable by the processor to compare an utterance with a first threshold value to determine a first starting point and a first ending point of the utterance, compare with a second threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance, and compare with the second threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance, wherein the first and second threshold values are calculated once per frame from a signal-to-noise ratio for the utterance.
4. A method of detecting endpoints of an utterance in frames of a received signal, comprising the steps of:
comparing an utterance with a first threshold value to determine a first starting point and a first ending point of the utterance;
comparing with a second threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance; and
comparing with the second threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance, wherein the first and second threshold values are calculated once per frame from a signal-to-noise ratio for the utterance.
7. A device for detecting endpoints of an utterance in frames of a received signal, comprising:
means for comparing an utterance with a first threshold value to determine a first starting point and a first ending point of the utterance;
means for comparing with a second threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance; and
means for comparing with the second threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance, wherein the first and second threshold values are calculated once per frame from a signal-to-noise ratio for the utterance.
10. A voice recognition system, comprising:
an acoustic processor configured to determine parameters of an utterance contained in received frames of a speech signal, the acoustic processor including an endpoint detector configured to compare the utterance with a first threshold value to determine a first starting point and a first ending point of the utterance, compare with a second threshold value a part of the utterance that predates the first starting point to determine a second starting point of the utterance, and compare with the second threshold value a part of the utterance that postdates the first ending point to determine a second ending point of the utterance, wherein the first and second threshold values are calculated once per frame from a signal-to-noise ratio for the utterance;
pattern comparison logic coupled to the acoustic processor and configured to compare stored word templates with parameters associated with the utterance; and
a database coupled to the pattern comparison logic and configured to store the word templates.
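Each independent claim above requires the first and second threshold values to be calculated once per frame from a signal-to-noise ratio, but the claims do not specify the mapping. The sketch below assumes one plausible scheme purely for illustration: it tracks a noise-floor level and a speech level with exponential smoothing, takes their difference as the SNR, and places each threshold a fixed fraction of that SNR above the noise floor, which keeps the first threshold above the second. The class `ThresholdTracker`, the helper `thresholds_per_frame`, and every constant are hypothetical, not values from the patent.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ThresholdTracker:
    """Recompute two endpointing thresholds once per frame (hypothetical sketch).

    All defaults are illustrative placeholders; the source states only that
    both thresholds derive from the SNR and that the first exceeds the second.
    """
    noise_db: float = 30.0     # initial noise-floor estimate (hypothetical)
    speech_db: float = 60.0    # initial speech-level estimate (hypothetical)
    alpha: float = 0.95        # exponential smoothing factor, 0 < alpha < 1
    t1_fraction: float = 0.5   # first threshold: 50% of the SNR above noise
    t2_fraction: float = 0.2   # second threshold: 20% of the SNR above noise
    min_snr_db: float = 1.0    # SNR floor so the thresholds never coincide

    def __post_init__(self) -> None:
        if not self.t1_fraction > self.t2_fraction > 0:
            raise ValueError("need t1_fraction > t2_fraction > 0")

    def update(self, frame_db: float) -> Tuple[float, float]:
        """Fold one frame's level (dB) into the estimates and return (t1, t2)."""
        # Treat the frame as noise or speech by which estimate it is closer to,
        # then smooth it into that estimate only.
        if frame_db < (self.noise_db + self.speech_db) / 2:
            self.noise_db = self.alpha * self.noise_db + (1 - self.alpha) * frame_db
        else:
            self.speech_db = self.alpha * self.speech_db + (1 - self.alpha) * frame_db
        snr_db = max(self.speech_db - self.noise_db, self.min_snr_db)
        # Both thresholds scale with the current SNR; t1 > t2 holds because
        # t1_fraction > t2_fraction and snr_db is positive.
        t1 = self.noise_db + self.t1_fraction * snr_db
        t2 = self.noise_db + self.t2_fraction * snr_db
        return t1, t2


def thresholds_per_frame(
    frame_levels_db: Sequence[float], tracker: ThresholdTracker
) -> Tuple[List[float], List[float]]:
    """Run the tracker over a whole signal, producing one threshold pair per frame."""
    t1s: List[float] = []
    t2s: List[float] = []
    for level in frame_levels_db:
        t1, t2 = tracker.update(level)
        t1s.append(t1)
        t2s.append(t2)
    return t1s, t2s
```

With the defaults, a first frame at 20 dB pulls the noise estimate from 30 dB down to 29.5 dB, giving a first threshold of 44.75 dB and a second threshold of 35.6 dB. Because both thresholds move with the tracked SNR, they tighten in clean conditions and relax as the noise floor rises, which is the behavior the claims' per-frame recalculation is meant to capture.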
Specification