System and method for an endpoint detection of speech for improved speech recognition in noisy environments

US 8,175,876 B2
Filed: 06/25/2009
Issued: 05/08/2012
Est. Priority Date: 03/02/2001
Status: Expired due to Fees

Abstract

According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
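The abstract refers to an energy measurement and to cepstral features for each portion of the signal, without fixing a particular feature type in this excerpt. As a minimal sketch, assuming mean squared amplitude as the energy measure and a plain real cepstrum (inverse DFT of the log magnitude spectrum) as the feature vector, per-frame extraction might look like the following. The function names `frame_energy`, `dft`, and `real_cepstrum`, and the parameters `num_coeffs` and `floor`, are hypothetical and do not come from the patent.

```python
import cmath
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame (assumed energy measure)."""
    return sum(s * s for s in frame) / len(frame)


def dft(x):
    """Naive O(n^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, num_coeffs=8, floor=1e-12):
    """First num_coeffs real-cepstrum coefficients of a frame.

    The log magnitude is floored to avoid log(0) on empty spectral bins.
    Coefficient 0 tracks the overall signal level.
    """
    n = len(frame)
    log_mag = [math.log(max(abs(c), floor)) for c in dft(frame)]
    ceps = []
    for q in range(num_coeffs):
        acc = sum(
            log_mag[k] * cmath.exp(2j * cmath.pi * k * q / n) for k in range(n)
        )
        ceps.append(acc.real / n)  # imaginary part vanishes for real input
    return ceps
```

A production front end would more likely use an FFT and a perceptually motivated cepstrum; the naive transform here only keeps the sketch dependency-free.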

171 Citations

26 Claims

1. A method for end-point decision for a speech signal, the method comprising:
- receiving a plurality of frames of the speech signal;
  
  extracting, using a processor, an energy parameter and a cepstral vector parameter for at least one frame of the plurality of frames;
  
  calculating, using the processor, a cepstral distance between the cepstral vector parameter and a silence mean cepstral vector;
  
  using a first condition, by the processor, to make a first end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a first energy threshold; and
  
  using a second condition, by the processor, to make a second end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a second energy threshold and by comparing the cepstral distance to a first cepstral distance threshold, wherein the second energy threshold is lower than the first energy threshold.
  - 2. The method of claim 1 further comprising:
    - using a third condition to make a third end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a third energy threshold and by comparing the cepstral distance to a second cepstral distance threshold, wherein the third energy threshold is lower than the second energy threshold and the second cepstral distance threshold is higher than the first cepstral distance threshold.
  - 3. The method of claim 2 further comprising:
    - receiving an initial plurality of frames of the speech signal;
      
      calculating a silence average background energy parameter using the initial plurality of frames;
      
      obtaining the first energy threshold, the second energy threshold and the third energy threshold using the silence average background energy parameter.
  - 4. The method of claim 3, wherein the first energy threshold is obtained from the silence average background energy parameter by a multiplication by a first constant, the second energy threshold is obtained from the silence background energy parameter by a multiplication by a second constant and the third energy threshold is obtained from the silence background energy parameter by a multiplication by a third constant.
  - 5. The method of claim 2 further comprising:
    - receiving an initial plurality of frames of the speech signal;
      
      calculating the silence mean cepstral vector using the initial plurality of frames;
      
      calculating a silence cepstral distance of the initial plurality of frames using the silence mean cepstral vector;
      
      obtaining the first cepstral distance threshold and the second cepstral distance threshold using the silence cepstral distance.
  - 6. The method of claim 5, wherein the second cepstral distance threshold is obtained from the silence cepstral distance by multiplying by a fourth constant.
  - 7. The method of claim 2 further comprising:
    - receiving an initial plurality of frames of the speech signal;
      
      calculating a silence average background energy parameter using the initial plurality of frames;
      
      calculating the silence mean cepstral vector using the initial plurality of frames;
      
      calculating a silence cepstral distance of the initial plurality of frames using the silence mean cepstral vector;
      
      obtaining the first energy threshold, the second energy threshold and the third energy threshold using the silence average background energy parameter and obtaining the first cepstral distance threshold and the second cepstral distance using the silence cepstral distance.
  - 8. The method of claim 7, wherein the first energy threshold is obtained from the silence average background energy parameter by a multiplication by a first constant, the second energy threshold is obtained from the silence background energy parameter by a multiplication by a second constant, the third energy threshold is obtained from the silence background energy parameter by a multiplication by a third constant and the second cepstral distance is obtained from the silence cepstral distance by multiplying by a fourth constant.
  - 9. The method of claim 1 further comprising:
    - receiving an initial plurality of frames of the speech signal;
      
      calculating a silence average background energy parameter using the initial plurality of frames;
      
      obtaining the first energy threshold and the second energy threshold using the silence average background energy parameter.
  - 10. The method of claim 9, wherein the first energy threshold is obtained from the silence average background energy parameter by a multiplication by a first constant and the second energy threshold is obtained from the silence background energy parameter by a multiplication by a second constant.
  - 11. The method of claim 1 further comprising:
    - receiving an initial plurality of frames of the speech signal;
      
      calculating the silence mean cepstral vector using the initial plurality of frames;
      
      calculating a silence cepstral distance of the initial plurality of frames using the silence mean cepstral vector;
      
      obtaining the first cepstral distance threshold using the silence cepstral distance.
  - 12. The method of claim 1 further comprising:
    - receiving an initial plurality of frames of the speech signal;
      
      calculating a silence average background energy parameter using the initial plurality of frames;
      
      calculating the silence mean cepstral vector using the initial plurality of frames;
      
      calculating a silence cepstral distance of the initial plurality of frames using the silence mean cepstral vector;
      
      obtaining the first energy threshold and the second energy threshold using the silence average background energy parameter and obtaining the first cepstral distance threshold using the silence cepstral distance.
  - 13. The method of claim 12, wherein the first energy threshold is obtained from the silence average background energy parameter by a multiplication by a first constant and the second energy threshold is obtained from the silence background energy parameter by a multiplication by a second constant.
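
The method claims above can be read as a calibration step followed by a tiered decision. The sketch below is one possible reading, not the patented implementation. It assumes a Euclidean cepstral distance, assumes that "comparing" means "greater than", and assumes a frame is treated as speech when any condition holds; none of these is stated in the claims. The constants `k1` to `k4` stand in for the claimed first to fourth constants, while `d1_scale` and all default values are hypothetical.

```python
import math


def cepstral_distance(c, ref):
    """Euclidean distance between two cepstral vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, ref)))


def calibrate(silence_energies, silence_cepstra,
              k1=4.0, k2=2.0, k3=1.5, d1_scale=1.5, k4=3.0):
    """Derive thresholds from an initial run of frames assumed to be silence.

    Choosing k1 > k2 > k3 and k4 > d1_scale gives the ordering the claims
    require: descending energy thresholds and an ascending pair of
    cepstral distance thresholds.
    """
    n = len(silence_energies)
    background = sum(silence_energies) / n  # silence average background energy
    dim = len(silence_cepstra[0])
    mean_c = [sum(c[i] for c in silence_cepstra) / n for i in range(dim)]
    # Silence cepstral distance: mean distance of the initial frames
    # to their own mean cepstral vector.
    silence_d = sum(cepstral_distance(c, mean_c) for c in silence_cepstra) / n
    return {
        "mean_cepstrum": mean_c,
        "e1": k1 * background,
        "e2": k2 * background,
        "e3": k3 * background,
        "d1": d1_scale * silence_d,
        "d2": k4 * silence_d,
    }


def is_speech(energy, cepstrum, cal):
    """Apply the three conditions in order of decreasing energy threshold."""
    dist = cepstral_distance(cepstrum, cal["mean_cepstrum"])
    if energy > cal["e1"]:                        # first condition
        return True
    if energy > cal["e2"] and dist > cal["d1"]:   # second condition
        return True
    if energy > cal["e3"] and dist > cal["d2"]:   # third condition
        return True
    return False
```

The ordering trades energy evidence against spectral evidence: the lower a frame's energy, the further its cepstrum must lie from the silence mean before the frame counts as speech.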

14. A system for end-point decision for a speech signal, the system comprising:
- a processor configured to:
  
  receive a plurality of frames of the speech signal;
  
  extract an energy parameter and a cepstral vector parameter for at least one frame of the plurality of frames;
  
  calculate a cepstral distance between the cepstral vector parameter and a silence mean cepstral vector;
  
  use a first condition to make a first end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a first energy threshold; and
  
  use a second condition to make a second end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a second energy threshold and by comparing the cepstral distance to a first cepstral distance threshold, wherein the second energy threshold is lower than the first energy threshold.
  - 15. The system of claim 14, wherein the processor is further configured to:
    - use a third condition to make a third end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a third energy threshold and by comparing the cepstral distance to a second cepstral distance threshold, wherein the third energy threshold is lower than the second energy threshold and the second cepstral distance threshold is higher than the first cepstral distance threshold.
  - 16. The system of claim 15, wherein the processor is further configured to:
    - receive an initial plurality of frames of the speech signal;
      
      calculate a silence average background energy parameter using the initial plurality of frames;
      
      obtain the first energy threshold, the second energy threshold and the third energy threshold using the silence average background energy parameter.
  - 17. The system of claim 16, wherein the first energy threshold is obtained from the silence average background energy parameter by a multiplication by a first constant, the second energy threshold is obtained from the silence background energy parameter by a multiplication by a second constant and the third energy threshold is obtained from the silence background energy parameter by a multiplication by a third constant.
  - 18. The system of claim 15, wherein the processor is further configured to:
    - receive an initial plurality of frames of the speech signal;
      
      calculate the silence mean cepstral vector using the initial plurality of frames;
      
      calculate a silence cepstral distance of the initial plurality of frames using the silence mean cepstral vector;
      
      obtain the first cepstral distance threshold and the second cepstral distance threshold using the silence cepstral distance.
  - 19. The system of claim 18, wherein the second cepstral distance threshold is obtained from the silence cepstral distance by multiplying by a fourth constant.
  - 20. The system of claim 15, wherein the processor is further configured to:
    - receive an initial plurality of frames of the speech signal;
      
      calculate a silence average background energy parameter using the initial plurality of frames;
      
      calculate the silence mean cepstral vector using the initial plurality of frames;
      
      calculate a silence cepstral distance of the initial plurality of frames using the silence mean cepstral vector;
      
      obtain the first energy threshold, the second energy threshold and the third energy threshold using the silence average background energy parameter and obtaining the first cepstral distance threshold and the second cepstral distance using the silence cepstral distance.
  - 21. The system of claim 20, wherein the first energy threshold is obtained from the silence average background energy parameter by a multiplication by a first constant, the second energy threshold is obtained from the silence background energy parameter by a multiplication by a second constant, the third energy threshold is obtained from the silence background energy parameter by a multiplication by a third constant and the second cepstral distance is obtained from the silence cepstral distance by multiplying by a fourth constant.
  - 22. The system of claim 14, wherein the processor is further configured to:
    - receive an initial plurality of frames of the speech signal;
      
      calculate a silence average background energy parameter using the initial plurality of frames;
      
      obtain the first energy threshold and the second energy threshold using the silence average background energy parameter.
  - 23. The system of claim 22, wherein the first energy threshold is obtained from the silence average background energy parameter by a multiplication by a first constant and the second energy threshold is obtained from the silence background energy parameter by a multiplication by a second constant.
  - 24. The system of claim 14, wherein the processor is further configured to:
    - receive an initial plurality of frames of the speech signal;
      
      calculate the silence mean cepstral vector using the initial plurality of frames;
      
      calculate a silence cepstral distance of the initial plurality of frames using the silence mean cepstral vector;
      
      obtain the first cepstral distance threshold using the silence cepstral distance.
  - 25. The system of claim 14, wherein the processor is further configured to:
    - receive an initial plurality of frames of the speech signal;
      
      calculate a silence average background energy parameter using the initial plurality of frames;
      
      calculate the silence mean cepstral vector using the initial plurality of frames;
      
      calculate a silence cepstral distance of the initial plurality of frames using the silence mean cepstral vector;
      
      obtain the first energy threshold and the second energy threshold using the silence average background energy parameter and obtaining the first cepstral distance threshold using the silence cepstral distance.
  - 26. The system of claim 25, wherein the first energy threshold is obtained from the silence average background energy parameter by a multiplication by a first constant and the second energy threshold is obtained from the silence background energy parameter by a multiplication by a second constant.
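
For the system claims, a frame-by-frame arrangement is one plausible way to organize the processor's work. The sketch below follows the two-condition form of claims 14, 25 and 26 under stated assumptions: the first `num_init_frames` frames are taken to be silence, the cepstral distance is Euclidean, and a frame is labeled speech when either condition's comparisons are exceeded. The class name `Endpointer` echoes the abstract's term, but its interface, the `dist_scale` factor, and all default values are hypothetical.

```python
import math


class Endpointer:
    """Streaming two-condition end-point decision (illustrative only)."""

    def __init__(self, num_init_frames=10, k1=4.0, k2=2.0, dist_scale=1.5):
        self.num_init_frames = num_init_frames
        self.k1, self.k2, self.dist_scale = k1, k2, dist_scale
        self._init = []      # buffered (energy, cepstrum) pairs
        self.ready = False   # True once thresholds are calibrated

    @staticmethod
    def _dist(c, ref):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, ref)))

    def _calibrate(self):
        n = len(self._init)
        background = sum(e for e, _ in self._init) / n
        dim = len(self._init[0][1])
        self.mean_c = [sum(c[i] for _, c in self._init) / n for i in range(dim)]
        silence_d = sum(self._dist(c, self.mean_c) for _, c in self._init) / n
        self.e1 = self.k1 * background   # first energy threshold
        self.e2 = self.k2 * background   # second, lower energy threshold
        self.d1 = self.dist_scale * silence_d
        self.ready = True

    def process(self, energy, cepstrum):
        """Return None while calibrating, then True for speech, else False."""
        if not self.ready:
            self._init.append((energy, list(cepstrum)))
            if len(self._init) == self.num_init_frames:
                self._calibrate()
            return None
        dist = self._dist(cepstrum, self.mean_c)
        return energy > self.e1 or (energy > self.e2 and dist > self.d1)
```

Buffering the initial frames keeps calibration and classification in one object, so a caller only has to feed `(energy, cepstrum)` pairs in arrival order.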

Current Assignee: WIAV Solutions LLC
Original Assignee: WIAV Solutions LLC
Inventors: Bou-Ghazale, Sahar E.; Asadi, Ayman O.; Assaleh, Khaled
Primary Examiner(s): Han, Qi
Application Number: US12/459,168
Publication Number: US 20100030559A1
Time in Patent Office: 1,048 Days
Field of Search: 704/248, 704/233, 704/253, 704/210, 704/215
US Class Current: 704/248
CPC Class Codes: G10L 25/87 Detection of discrete point...
