System and method for an endpoint detection of speech for improved speech recognition in noisy environments
First Claim
1. A method for end-point decision for a speech signal, the method comprising:
receiving a plurality of frames of the speech signal;
extracting, using a processor, an energy parameter and a cepstral vector parameter for at least one frame of the plurality of frames;
calculating, using the processor, a cepstral distance between the cepstral vector parameter and a silence mean cepstral vector;
using a first condition, by the processor, to make a first end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a first energy threshold; and
using a second condition, by the processor, to make a second end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a second energy threshold and by comparing the cepstral distance to a first cepstral distance threshold, wherein the second energy threshold is lower than the first energy threshold.
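The two conditions in claim 1 can be sketched as a small per-frame decision function. This is a minimal illustration, not the patented implementation: the claim does not state the direction of each comparison, how the two decisions are combined, or which distance measure is used. The sketch assumes "greater than" comparisons, an OR combination, and a Euclidean distance. The names `Thresholds`, `cepstral_distance`, and `is_speech_frame`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass(frozen=True)
class Thresholds:
    # Illustrative values only; none of them come from the patent.
    first_energy: float = 12.0       # higher energy threshold (first condition)
    second_energy: float = 6.0       # lower energy threshold (second condition)
    cepstral_distance: float = 1.5   # first cepstral distance threshold


def cepstral_distance(cepstrum: Sequence[float], silence_mean: Sequence[float]) -> float:
    """Euclidean distance to the silence mean cepstral vector (assumed metric)."""
    if len(cepstrum) != len(silence_mean):
        raise ValueError("cepstral vectors must have the same length")
    return sqrt(sum((c - s) ** 2 for c, s in zip(cepstrum, silence_mean)))


def first_condition(energy: float, t: Thresholds) -> bool:
    """First decision: energy alone against the higher threshold."""
    return energy > t.first_energy


def second_condition(energy: float, distance: float, t: Thresholds) -> bool:
    """Second decision: lower energy threshold plus a cepstral distance check."""
    return energy > t.second_energy and distance > t.cepstral_distance


def is_speech_frame(
    energy: float,
    cepstrum: Sequence[float],
    silence_mean: Sequence[float],
    t: Thresholds = Thresholds(),
) -> bool:
    """Assumed combination: a frame counts as speech if either condition holds."""
    if first_condition(energy, t):
        return True
    distance = cepstral_distance(cepstrum, silence_mean)
    return second_condition(energy, distance, t)
```

Under these assumptions, a quiet frame whose spectrum differs clearly from silence can still be accepted through the second condition, which is the role the lower second energy threshold appears to play in the claim.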
Abstract
According to a disclosed embodiment, an endpointer determines the background energy of a first portion of a speech signal, and a cepstral computing module extracts one or more features of the first portion. The endpointer calculates an average distance of the first portion based on the features. Subsequently, an energy computing module measures the energy of a second portion of the speech signal, and the cepstral computing module extracts one or more features of the second portion. Based on the features of the second portion, the endpointer calculates a distance of the second portion. Thereafter, the endpointer contrasts the energy of the second portion with the background energy of the first portion, and compares the distance of the second portion with the distance of the first portion. The second portion of the speech signal is classified by the endpointer as speech or non-speech based on the contrast and the comparison.
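The flow described in the abstract can be sketched in two steps: calibrate on the first portion, then classify the second portion. The sketch below is only an illustration under stated assumptions. It assumes the first portion contains background only, takes the background energy as the mean frame energy, measures distance as Euclidean distance to the mean feature vector, and requires both the energy contrast and the distance comparison to pass. The functions `calibrate` and `classify` and the parameters `energy_margin` and `distance_factor` are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def calibrate(
    first_energies: Sequence[float],
    first_features: Sequence[Sequence[float]],
) -> Tuple[float, List[float], float]:
    """Estimate background statistics from the first portion of the signal.

    Returns (background_energy, mean_vector, average_distance).
    """
    background_energy = mean(first_energies)
    dims = len(first_features[0])
    mean_vector = [mean(f[i] for f in first_features) for i in range(dims)]
    average_distance = mean(euclidean(f, mean_vector) for f in first_features)
    return background_energy, mean_vector, average_distance


def classify(
    energy: float,
    features: Sequence[float],
    background_energy: float,
    mean_vector: Sequence[float],
    average_distance: float,
    energy_margin: float = 3.0,      # assumed margin above background energy
    distance_factor: float = 2.0,    # assumed multiple of the average distance
) -> str:
    """Label a second-portion frame as 'speech' or 'non-speech'."""
    energy_contrast = energy - background_energy
    distance = euclidean(features, mean_vector)
    is_speech = (
        energy_contrast > energy_margin
        and distance > distance_factor * average_distance
    )
    return "speech" if is_speech else "non-speech"
```

A caller would run `calibrate` once on the leading frames and then pass its three return values to `classify` for each later frame.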
171 Citations
26 Claims
1. A method for end-point decision for a speech signal, the method comprising:
receiving a plurality of frames of the speech signal;
extracting, using a processor, an energy parameter and a cepstral vector parameter for at least one frame of the plurality of frames;
calculating, using the processor, a cepstral distance between the cepstral vector parameter and a silence mean cepstral vector;
using a first condition, by the processor, to make a first end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a first energy threshold; and
using a second condition, by the processor, to make a second end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a second energy threshold and by comparing the cepstral distance to a first cepstral distance threshold, wherein the second energy threshold is lower than the first energy threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A system for end-point decision for a speech signal, the system comprising:
a processor configured to:
receive a plurality of frames of the speech signal;
extract an energy parameter and a cepstral vector parameter for at least one frame of the plurality of frames;
calculate a cepstral distance between the cepstral vector parameter and a silence mean cepstral vector;
use a first condition to make a first end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a first energy threshold; and
use a second condition to make a second end-point decision for the at least one frame of the plurality of frames by comparing the energy parameter to a second energy threshold and by comparing the cepstral distance to a first cepstral distance threshold, wherein the second energy threshold is lower than the first energy threshold.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
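The extraction step recited in the claims (an energy parameter and a cepstral vector per frame) can be illustrated as follows. The claims do not say how either parameter is computed, so this sketch substitutes common textbook definitions: log energy in decibels, and the real cepstrum obtained as the inverse DFT of the log magnitude spectrum. A plain O(n^2) DFT keeps the example dependency-free. The function names, the floor value, and the number of coefficients are assumptions.

```python
import cmath
import math
from typing import List, Sequence

FLOOR = 1e-10  # assumed floor that avoids taking the log of zero


def log_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of the frame, in decibels."""
    power = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(power, FLOOR))


def dft(x: Sequence[float]) -> List[complex]:
    """Direct discrete Fourier transform; fine for short frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame: Sequence[float], num_coeffs: int = 4) -> List[float]:
    """Return cepstral coefficients 1..num_coeffs of the frame.

    Coefficient 0 is skipped because it mostly reflects overall level,
    which the energy parameter already captures (a design assumption).
    """
    n = len(frame)
    log_mag = [math.log(max(abs(v), FLOOR)) for v in dft(frame)]
    # Inverse DFT of the log magnitude spectrum; the result is real
    # up to rounding, so only the real part is kept.
    cepstrum = [
        sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * q / n) for k in range(n)).real / n
        for q in range(n)
    ]
    return cepstrum[1:num_coeffs + 1]
```

The resulting energy value and cepstral vector are the two per-frame inputs that the first and second conditions in the claims operate on.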
Specification