Method and apparatus for performing real-time endpoint detection in automatic speech recognition
Abstract
A method and apparatus for performing real-time endpoint detection for use in automatic speech recognition. A filter is applied to the input speech signal and the filter output is then evaluated with use of a state transition diagram (i.e., a finite state machine). The filter is advantageously designed in light of several criteria in order to increase the accuracy and robustness of detection. The state transition diagram advantageously has three states. The endpoints which are detected may then be advantageously applied to the problem of energy normalization of the speech portion of the signal.
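The first two stages described in the abstract, feature extraction followed by an edge-detecting filter, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the choice of per-frame log energy as the feature, the hypothetical names `log_energy` and `edge_filter`, the `frame_len` and `half_width` parameters, and above all the filter itself, which is a plain antisymmetric difference filter standing in for the claimed filter.

```python
import math


def log_energy(samples, frame_len=160):
    """Return one feature value per non-overlapping frame.

    The feature is the log of the frame's mean squared amplitude.
    (Assumed feature; the claims only require "one or more features".)
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        feats.append(math.log(power + 1e-10))  # small floor avoids log(0)
    return feats


def edge_filter(feats, half_width=3):
    """Apply a simple antisymmetric difference filter (a stand-in only).

    The output at frame t is the mean of the next `half_width` feature
    values minus the mean of the previous `half_width` values. It is
    positive on a rising edge, negative on a falling edge, and close to
    zero where the feature is flat. Frames past either end of the
    sequence are padded by repeating the first or last value.
    """
    n = len(feats)
    out = []
    for t in range(n):
        ahead = sum(feats[min(n - 1, t + k)] for k in range(1, half_width + 1))
        behind = sum(feats[max(0, t - k)] for k in range(1, half_width + 1))
        out.append((ahead - behind) / half_width)
    return out


# Toy input: quiet, loud, quiet (amplitudes chosen arbitrarily).
samples = [0.001] * 40 + [0.5] * 40 + [0.001] * 40
filter_output = edge_filter(log_energy(samples, frame_len=4))
```

Because the filter looks ahead by `half_width` frames, a real-time implementation along these lines would report each output value with a delay of that many frames.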
106 Citations
28 Claims
-
1. A method for performing real-time endpoint detection for use in automatic speech recognition applied to an input signal, the method comprising the steps of:
-
extracting one or more features from said input signal to generate a sequence of extracted feature values;
applying a filter to said sequence of extracted feature values to generate a sequence of filter output values, said filter comprising an edge detecting filter and said filter output values indicative of whether an edge is present in said sequence of extracted feature values; and
applying a state transition diagram to said sequence of filter output values to identify endpoints within said input signal.
-
-
6. The method of claim 5 wherein said filter parameters are set approximately to s=0.5385; A=0.2208; and K1 . . . K6={1.583, 1.468, −0.078, −0.036, −0.872, −0.56}.
-
7. The method of claim 4 wherein said predetermined window is of a size approximately equal to 25.
-
8. The method of claim 1 wherein said state transition diagram has at least three states.
-
9. The method of claim 8 wherein said at least three states include a silence state, an in-speech state and a leaving-speech state.
-
10. The method of claim 1 wherein one or more transitions of said state transition diagram operates based on a comparison of one of said filter output values with one or more predetermined thresholds.
-
11. The method of claim 10 wherein said one or more thresholds comprise a lower threshold and an upper threshold.
-
12. The method of claim 11 wherein said state transition diagram has at least three states including a silence state, an in-speech state and a leaving-speech state, and wherein one or more transitions originating from the leaving-speech state operates based on a count of a number of frames which have elapsed since said leaving-speech state was last entered.
-
13. The method of claim 1 wherein said identified endpoints comprise speech beginning points and speech ending points.
-
14. The method of claim 1 further comprising the step of performing real-time energy normalization on said input signal based on said identified endpoints.
-
15. An apparatus for performing real-time endpoint detection for use in automatic speech recognition applied to an input signal, the apparatus comprising:
-
means for extracting one or more features from said input signal to generate a sequence of extracted feature values;
a filter applied to said sequence of extracted feature values which generates a sequence of filter output values, said filter comprising an edge detecting filter and said filter output values indicative of whether an edge is present in said sequence of extracted feature values; and
a state transition diagram applied to said sequence of filter output values which identifies endpoints within said input signal.
-
-
20. The apparatus of claim 19 wherein said filter parameters are set approximately to s=0.5385; A=0.2208; and K1 . . . K6={1.583, 1.468, −0.078, −0.036, −0.872, −0.56}.
-
21. The apparatus of claim 18 wherein said predetermined window is of a size approximately equal to 25.
-
22. The apparatus of claim 15 wherein said state transition diagram has at least three states.
-
23. The apparatus of claim 22 wherein said at least three states include a silence state, an in-speech state and a leaving-speech state.
-
24. The apparatus of claim 15 wherein one or more transitions of said state transition diagram operates based on a comparison of one of said filter output values with one or more predetermined thresholds.
-
25. The apparatus of claim 24 wherein said one or more thresholds comprise a lower threshold and an upper threshold.
-
26. The apparatus of claim 25 wherein said state transition diagram has at least three states including a silence state, an in-speech state and a leaving-speech state, and wherein one or more transitions originating from the leaving-speech state operates based on a count of a number of frames which have elapsed since said leaving-speech state was last entered.
-
27. The apparatus of claim 15 wherein said identified endpoints comprise speech beginning points and speech ending points.
-
28. The apparatus of claim 15 further comprising means for performing real-time energy normalization on said input signal based on said identified endpoints.
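The state transition diagram recited in claims 8 through 12 (and in apparatus claims 22 through 26) has a silence state, an in-speech state and a leaving-speech state, with transitions driven by a lower threshold, an upper threshold and a frame count. A minimal sketch of one such machine follows. The claims do not say which threshold governs which transition, so the rules below are assumptions, as are the hypothetical names `detect_endpoints`, `upper`, `lower` and `hangover` and their default values.

```python
from enum import Enum


class State(Enum):
    SILENCE = 0
    IN_SPEECH = 1
    LEAVING_SPEECH = 2


def detect_endpoints(filter_out, upper=2.0, lower=-2.0, hangover=5):
    """Return a list of (frame_index, "begin" | "end") events.

    Assumed transition rules (not taken from the claims):
      SILENCE        -> IN_SPEECH       when value >= upper (rising edge)
      IN_SPEECH      -> LEAVING_SPEECH  when value <= lower (falling edge)
      LEAVING_SPEECH -> IN_SPEECH       when value >= upper (speech resumed)
      LEAVING_SPEECH -> SILENCE         after `hangover` frames in the state
    """
    state = State.SILENCE
    frames_in_leaving = 0
    events = []
    for t, value in enumerate(filter_out):
        if state is State.SILENCE:
            if value >= upper:
                state = State.IN_SPEECH
                events.append((t, "begin"))
        elif state is State.IN_SPEECH:
            if value <= lower:
                state = State.LEAVING_SPEECH
                frames_in_leaving = 0  # count restarts on each entry
        else:  # State.LEAVING_SPEECH
            frames_in_leaving += 1
            if value >= upper:
                state = State.IN_SPEECH  # short pause, same utterance
            elif frames_in_leaving >= hangover:
                state = State.SILENCE
                # The event is reported at the frame where the count expires.
                events.append((t, "end"))
    return events


# Hypothetical filter output values: one rising edge, one falling edge.
events = detect_endpoints([0, 0, 3, 0, 0, -3, 0, 0, 0, 0], hangover=3)
```

The frame count in the leaving-speech state lets a brief pause inside an utterance return to the in-speech state without emitting a spurious ending point.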
Specification