Method and apparatus for performing real-time endpoint detection in automatic speech recognition

US 6,782,363 B2
Filed: 05/04/2001
Issued: 08/24/2004
Est. Priority Date: 05/04/2001
Status: Expired due to Term

Abstract

A method and apparatus for performing real-time endpoint detection for use in automatic speech recognition. A filter is applied to the input speech signal and the filter output is then evaluated with use of a state transition diagram (i.e., a finite state machine). The filter is advantageously designed in light of several criteria in order to increase the accuracy and robustness of detection. The state transition diagram advantageously has three states. The endpoints which are detected may then be advantageously applied to the problem of energy normalization of the speech portion of the signal.
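
The front end described in the abstract can be illustrated with a minimal sketch, offered only under stated assumptions. The frame length, hop size, and the simple antisymmetric step kernel below are hypothetical stand-ins; the patent's own filter profile is not reproduced in this text, so this is not the claimed filter. The function names `short_term_log_energy` and `edge_filter` are likewise illustrative.

```python
import numpy as np

def short_term_log_energy(signal, frame_len=240, hop=80):
    """Return one log-energy value (dB) per frame.

    frame_len and hop are assumed values, not taken from the patent.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energy = np.empty(n_frames)
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len]
        # Small floor avoids log10(0) on digital silence.
        energy[t] = 10.0 * np.log10(np.sum(frame ** 2) + 1e-10)
    return energy

def edge_filter(features, half_width=12):
    """Stand-in edge detector over a window of 2 * half_width + 1 frames.

    Output is the mean of the next W frames minus the mean of the
    previous W frames: positive on rising edges, negative on falling
    edges, and near zero on flat regions.
    """
    features = np.asarray(features, dtype=float)
    w = half_width
    padded = np.pad(features, w, mode="edge")
    out = np.empty(len(features))
    for t in range(len(features)):
        c = t + w  # index of frame t inside the padded array
        out[t] = padded[c + 1 : c + w + 1].mean() - padded[c - w : c].mean()
    return out
```

With `half_width=12` the window spans 25 frames. Because the kernel looks ahead by `half_width` frames, a streaming version would emit each output with that much delay.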

28 Claims

1. A method for performing real-time endpoint detection for use in automatic speech recognition applied to an input signal, the method comprising the steps of:
- extracting one or more features from said input signal to generate a sequence of extracted feature values;
- applying a filter to said sequence of extracted feature values to generate a sequence of filter output values, said filter comprising an edge detecting filter and said filter output values indicative of whether an edge is present in said sequence of extracted feature values; and
- applying a state transition diagram to said sequence of filter output values to identify endpoints within said input signal.
2. The method of claim 1 wherein said one or more features comprise cepstral features.
3. The method of claim 2 wherein said one or more features comprises a one-dimensional short-term energy feature.
4. The method of claim 1 wherein said filter comprises a moving-average filter applied to a predetermined window of said sequence of said extracted feature values.
5. The method of claim 4 wherein said filter comprises a filter having a profile of the form:
6. The method of claim 5 wherein said filter parameters are set approximately to s=0.5385; A=0.2208; and K₁ . . . K₆ = {1.583, 1.468, −0.078, −0.036, −0.872, −0.56}.
7. The method of claim 4 wherein said predetermined window is of a size approximately equal to 25.
8. The method of claim 1 wherein said state transition diagram has at least three states.
9. The method of claim 8 wherein said at least three states include a silence state, an in-speech state and a leaving-speech state.
10. The method of claim 1 wherein one or more transitions of said state transition diagram operates based on a comparison of one of said filter output values with one or more predetermined thresholds.
11. The method of claim 10 wherein said one or more thresholds comprise a lower threshold and an upper threshold.
12. The method of claim 11 wherein said state transition diagram has at least three states including a silence state, an in-speech state and a leaving-speech state, and wherein one or more transitions originating from the leaving-speech state operates based on a count of a number of frames which have elapsed since said leaving-speech state was last entered.
13. The method of claim 1 wherein said identified endpoints comprise speech beginning points and speech ending points.
14. The method of claim 1 further comprising the step of performing real-time energy normalization on said input signal based on said identified endpoints.
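
Claims 8 through 13 describe a three-state diagram driven by two thresholds and a frame count. The following is a minimal sketch of one way such a diagram could be wired up. The specific transition rules, the threshold values `upper` and `lower`, and the `max_gap` frame count are assumptions chosen for illustration; the claims do not fix them.

```python
from enum import Enum

class State(Enum):
    SILENCE = 0
    IN_SPEECH = 1
    LEAVING_SPEECH = 2

def detect_endpoints(filter_out, upper=3.0, lower=-3.0, max_gap=30):
    """Return a list of ("begin", frame) and ("end", frame) tuples.

    Assumed transition rules (illustrative only):
      SILENCE        -> IN_SPEECH       when output >= upper (beginning point)
      IN_SPEECH      -> LEAVING_SPEECH  when output <= lower
      LEAVING_SPEECH -> IN_SPEECH       when output >= upper (speech resumed)
      LEAVING_SPEECH -> SILENCE         after max_gap frames (ending point)
    """
    state = State.SILENCE
    endpoints = []
    leave_frame = 0  # frame at which LEAVING_SPEECH was last entered
    for t, f in enumerate(filter_out):
        if state is State.SILENCE:
            if f >= upper:
                endpoints.append(("begin", t))
                state = State.IN_SPEECH
        elif state is State.IN_SPEECH:
            if f <= lower:
                leave_frame = t
                state = State.LEAVING_SPEECH
        else:  # State.LEAVING_SPEECH
            if f >= upper:
                state = State.IN_SPEECH
            elif t - leave_frame >= max_gap:
                endpoints.append(("end", leave_frame))
                state = State.SILENCE
    return endpoints
```

In this sketch the ending point is reported at the frame where the falling edge was seen, but it is only confirmed `max_gap` frames later, which lets a short pause inside an utterance return to the in-speech state without producing an endpoint.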

15. An apparatus for performing real-time endpoint detection for use in automatic speech recognition applied to an input signal, the apparatus comprising:
- means for extracting one or more features from said input signal to generate a sequence of extracted feature values;
- a filter applied to said sequence of extracted feature values which generates a sequence of filter output values, said filter comprising an edge detecting filter and said filter output values indicative of whether an edge is present in said sequence of extracted feature values; and
- a state transition diagram applied to said sequence of filter output values which identifies endpoints within said input signal.
16. The apparatus of claim 15 wherein said one or more features comprise cepstral features.
17. The apparatus of claim 16 wherein said one or more features comprises a one-dimensional short-term energy feature.
18. The apparatus of claim 15 wherein said filter comprises a moving-average filter and is applied to a predetermined window of said sequence of said extracted feature values.
19. The apparatus of claim 18 wherein said filter comprises a filter having a profile of the form:
20. The apparatus of claim 19 wherein said filter parameters are set approximately to s=0.5385; A=0.2208; and K₁ . . . K₆ = {1.583, 1.468, −0.078, −0.036, −0.872, −0.56}.
21. The apparatus of claim 18 wherein said predetermined window is of a size approximately equal to 25.
22. The apparatus of claim 15 wherein said state transition diagram has at least three states.
23. The apparatus of claim 22 wherein said at least three states include a silence state, an in-speech state and a leaving-speech state.
24. The apparatus of claim 15 wherein one or more transitions of said state transition diagram operates based on a comparison of one of said filter output values with one or more predetermined thresholds.
25. The apparatus of claim 24 wherein said one or more thresholds comprise a lower threshold and an upper threshold.
26. The apparatus of claim 25 wherein said state transition diagram has at least three states including a silence state, an in-speech state and a leaving-speech state, and wherein one or more transitions originating from the leaving-speech state operates based on a count of a number of frames which have elapsed since said leaving-speech state was last entered.
27. The apparatus of claim 15 wherein said identified endpoints comprise speech beginning points and speech ending points.
28. The apparatus of claim 15 further comprising means for performing real-time energy normalization on said input signal based on said identified endpoints.
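
Claims 14 and 28 recite energy normalization based on the identified endpoints without spelling out a procedure in this text. One plausible reading, sketched below purely as an assumption, is to subtract a running peak of the log energy that restarts at each detected beginning point. The function name `normalize_energy` and the `initial_peak` parameter are hypothetical.

```python
def normalize_energy(log_energy, endpoints, initial_peak=0.0):
    """Subtract a running peak estimate from each frame's log energy.

    endpoints is a list of ("begin", frame) / ("end", frame) tuples.
    The peak estimate uses only frames up to the current one; the
    endpoint list is passed in up front here only for simplicity.
    """
    begins = {t for kind, t in endpoints if kind == "begin"}
    ends = {t for kind, t in endpoints if kind == "end"}
    peak = initial_peak
    in_speech = False
    out = []
    for t, e in enumerate(log_energy):
        if t in begins:
            in_speech = True
            peak = e  # restart the peak estimate for the new utterance
        if in_speech:
            peak = max(peak, e)  # peak is tracked only inside speech
        out.append(e - peak)
        if t in ends:
            in_speech = False
    return out
```

Restricting the peak update to detected speech keeps noise bursts in silence from shifting the normalization level, which is one reason the endpoints are useful for this step.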

Current Assignee: WSOU Investments, LLC (WSOU Holdings, LLC)
Original Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Inventors: Li, Qi P.; Lee, Chin-hui; Zhou, Qiru; Zheng, Jinsong
Primary Examiner(s): CHAWAN, VIJAY B
Application Number: US09/848,897
Publication Number: US 20020184017A1
Time in Patent Office: 1,208 Days
Field of Search: 704/210, 704/215, 704/233, 704/248, 704/253, 704/249, 704/250, 704/254, 704/255, 704/208, 704/214
US Class Current: 704/248
CPC Class Codes:
G10L 25/24 the extracted parameters be...
G10L 25/87 Detection of discrete point...
