Prosody based endpoint detection
Abstract
A method and apparatus are provided for performing prosody based endpoint detection of speech in a speech recognition system. Input speech represents an utterance, which has an intonation pattern. An end-of-utterance condition is identified based on prosodic parameters of the utterance, such as the intonation pattern and the duration of the final syllable of the utterance, as well as non-prosodic parameters, such as the log energy of the speech.
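The log-energy check mentioned in the abstract can be illustrated with a minimal sketch. The frame layout, the decibel scaling, the threshold value, and the function names below are assumptions made for illustration and are not taken from the patent.

```python
import math

def frame_log_energy(samples, eps=1e-10):
    """Log energy in dB of one frame of samples scaled to [-1, 1].

    eps avoids log(0) on an all-zero frame (illustrative choice).
    """
    energy = sum(s * s for s in samples)
    return 10.0 * math.log10(energy + eps)

def find_energy_drop(frames, threshold_db):
    """Return the index of the first frame that falls below threshold_db
    after at least one frame at or above it, or None if no drop occurs.

    Leading silence is skipped so that only a drop following speech counts.
    """
    seen_speech = False
    for i, frame in enumerate(frames):
        if frame_log_energy(frame) >= threshold_db:
            seen_speech = True
        elif seen_speech:
            return i
    return None
```

With frames of leading silence, speech, and trailing silence, `find_energy_drop` returns the index of the first trailing low-energy frame, which is the point from which a pause would be timed.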
8 Claims
1. A method of operating an endpoint detector for speech recognition, the method comprising:
inputting speech representing an utterance;
determining that a value of the speech has dropped below a threshold value;
computing an intonation of the utterance;
referencing the intonation of the utterance against an intonation model to determine a first end-of-utterance probability;
determining a period of time that has elapsed since the value of the speech dropped below the threshold value;
referencing the period of time against an elapsed time model to determine a second end-of-utterance probability;
computing an overall end-of-utterance probability as a function of the first and second end-of-utterance probabilities; and
determining whether an end-of-utterance has occurred based on the overall end-of-utterance probability.
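The last three steps of claim 1 can be sketched as follows. The claim does not specify the form of the elapsed time model or of the combining function, so the logistic curve, the weighted geometric mean, the decision threshold, and all names and parameter values here are hypothetical stand-ins rather than the patented method.

```python
import math

def elapsed_time_probability(pause_s, midpoint_s=0.5, steepness=10.0):
    """Hypothetical elapsed time model: a logistic curve in pause length.

    Returns 0.5 at midpoint_s and approaches 1.0 as the pause grows.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (pause_s - midpoint_s)))

def overall_probability(p_intonation, p_elapsed, w_intonation=0.5):
    """Assumed combining function: weighted geometric mean of the two inputs."""
    w_elapsed = 1.0 - w_intonation
    return (p_intonation ** w_intonation) * (p_elapsed ** w_elapsed)

def end_of_utterance(p_intonation, pause_s, decision_threshold=0.5):
    """Decide end-of-utterance from an intonation probability and a pause length.

    p_intonation is taken as given (the output of some intonation model).
    """
    p_elapsed = elapsed_time_probability(pause_s)
    return overall_probability(p_intonation, p_elapsed) >= decision_threshold
```

Under these assumed parameters, a strong intonation cue (0.9) with a one-second pause is classified as an end-of-utterance, while the same cue with a 0.1-second pause is not, because the low elapsed-time probability pulls the geometric mean down.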
4. A method of operating an endpoint detector for speech recognition, the method comprising:
inputting speech representing an utterance;
computing an intonation of the utterance;
referencing the intonation of the utterance against an intonation model to determine a first end-of-utterance probability;
determining a duration of a final syllable of the utterance;
referencing the duration of the final syllable against a syllable duration model to determine a second end-of-utterance probability;
computing an overall end-of-utterance probability as a function of the first and second end-of-utterance probabilities; and
determining whether an end-of-utterance has occurred based on the overall end-of-utterance probability.
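One way to picture the syllable duration model of claim 4 is as a two-class Bayes computation. The sketch below assumes Gaussian duration distributions for utterance-final and non-final syllables; the means, standard deviations, prior, and function names are invented for illustration, and the claim itself does not say how the model is built.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def syllable_duration_probability(duration_s,
                                  final=(0.30, 0.08),
                                  nonfinal=(0.18, 0.06),
                                  prior_final=0.5):
    """Posterior probability that a syllable of duration_s is utterance-final.

    final and nonfinal are (mean, std) pairs in seconds; the values are
    hypothetical and would normally be estimated from labeled speech.
    """
    like_final = gaussian_pdf(duration_s, *final)
    like_nonfinal = gaussian_pdf(duration_s, *nonfinal)
    numerator = like_final * prior_final
    denominator = numerator + like_nonfinal * (1.0 - prior_final)
    # Fall back to the prior if both likelihoods underflow to zero.
    return numerator / denominator if denominator > 0.0 else prior_final
```

With these assumed parameters, a longer final syllable yields a probability above 0.5 and a short one yields a probability below 0.5.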
7. A method of operating an endpoint detector for speech recognition, the method comprising:
inputting speech representing an utterance, the utterance having a time-varying fundamental frequency;
determining that a value of the speech has dropped below a threshold value;
computing an intonation of the utterance by determining the fundamental frequency of the utterance as a function of time;
referencing the intonation of the utterance against an intonation model to determine a first end-of-utterance probability;
determining a period of time that has elapsed since a value of the speech dropped below the threshold value;
referencing the period of time against an elapsed time model to determine a second end-of-utterance probability;
determining a duration of a final syllable of the utterance;
referencing the duration of the final syllable against a syllable duration model to determine a third end-of-utterance probability;
computing an overall end-of-utterance probability as a function of the first, second, and third end-of-utterance probabilities; and
determining whether an end-of-utterance has occurred by comparing the overall end-of-utterance probability to a threshold probability.
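Claim 7 derives intonation from the fundamental frequency (F0) as a function of time and compares a combination of three probabilities to a threshold. The sketch below illustrates one possible reading: it summarizes the F0 contour by its least-squares slope, maps a falling slope to a higher probability, and combines the three component probabilities with a geometric mean. The slope feature, the logistic mapping, the `scale` parameter, the geometric mean, and every function name are assumptions, since the claim leaves these choices open.

```python
import math

def f0_slope(times_s, f0_hz):
    """Least-squares slope (Hz per second) of F0 over voiced frames.

    Frames with f0 <= 0 are treated as unvoiced and ignored (a convention
    assumed here). Returns 0.0 if fewer than two voiced frames exist.
    """
    pts = [(t, f) for t, f in zip(times_s, f0_hz) if f > 0.0]
    n = len(pts)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in pts) / n
    mean_f = sum(f for _, f in pts) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in pts)
    if var_t == 0.0:
        return 0.0
    cov = sum((t - mean_t) * (f - mean_f) for t, f in pts)
    return cov / var_t

def intonation_probability(slope_hz_per_s, scale=50.0):
    """Hypothetical intonation model: falling F0 maps above 0.5."""
    return 1.0 / (1.0 + math.exp(slope_hz_per_s / scale))

def is_end_of_utterance(probabilities, threshold=0.5):
    """Compare the geometric mean of the component probabilities to a threshold."""
    if any(p <= 0.0 for p in probabilities):
        return False
    log_mean = sum(math.log(p) for p in probabilities) / len(probabilities)
    return math.exp(log_mean) >= threshold
```

In use, the first probability would come from `intonation_probability(f0_slope(times, f0))`, and the second and third from whatever elapsed time and syllable duration models are available. A trained intonation model would replace the fixed logistic mapping, which cannot distinguish, for example, rising question contours from continuation rises.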
8. An apparatus for performing endpoint detection comprising:
means for inputting speech representing an utterance, the utterance having a time-varying fundamental frequency;
means for determining that a value of the speech has dropped below a threshold value;
means for computing an intonation of the utterance by determining the fundamental frequency of the utterance as a function of time;
means for referencing the intonation of the utterance against an intonation model to determine a first end-of-utterance probability;
means for determining a period of time that has elapsed since the speech dropped below the threshold value;
means for referencing the period of time against an elapsed time model to determine a second end-of-utterance probability;
means for computing the duration of the final syllable of the utterance against a syllable duration model to determine a third end-of-utterance probability;
means for determining an overall end-of-utterance probability as a function of the first, second, and third end-of-utterance probabilities; and
means for determining whether an end-of-utterance has occurred by comparing the overall end-of-utterance probability to a threshold probability.
Specification