Prosody based endpoint detection

US 6,873,953 B1
Filed: 05/22/2000
Issued: 03/29/2005
Est. Priority Date: 05/22/2000
Status: Expired due to Fees

Abstract

A method and apparatus are provided for performing prosody based endpoint detection of speech in a speech recognition system. Input speech represents an utterance, which has an intonation pattern. An end-of-utterance condition is identified based on prosodic parameters of the utterance, such as the intonation pattern and the duration of the final syllable of the utterance, as well as non-prosodic parameters, such as the log energy of the speech.
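The non-prosodic cue named in the abstract, the log energy of the speech, can be illustrated with a minimal sketch. The framing scheme, the function names `frame_log_energy` and `first_frame_below`, the energy floor, and the threshold value are all assumptions made for illustration; the record does not specify them.

```python
import math

def frame_log_energy(frame, floor=1e-10):
    """Natural log of the mean squared amplitude of one frame.

    The floor (an assumed value) avoids taking log(0) on digital silence.
    """
    energy = sum(s * s for s in frame) / len(frame)
    return math.log(max(energy, floor))

def first_frame_below(samples, frame_len, threshold):
    """Index of the first non-overlapping frame whose log energy is below
    the threshold, or None if no frame drops below it."""
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if frame_log_energy(samples[start:start + frame_len]) < threshold:
            return start // frame_len
    return None

# Toy signal: three loud frames followed by two near-silent frames.
loud = [0.5, -0.5] * 40        # 80 samples per frame
quiet = [0.001, -0.001] * 40
signal = loud * 3 + quiet * 2
drop_frame = first_frame_below(signal, frame_len=80, threshold=-6.0)
```

In this toy signal the loud frames have a log energy of about -1.4 and the quiet frames about -13.8, so the drop is reported at the fourth frame (index 3).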

8 Claims

1. A method of operating an endpoint detector for speech recognition, the method comprising:
- inputting speech representing an utterance;
  
  determining that a value of the speech has dropped below a threshold value;
  
  computing an intonation of the utterance;
  
  referencing the intonation of the utterance against an intonation model to determine a first end-of-utterance probability;
  
  determining a period of time that has elapsed since the value of the speech dropped below the threshold value;
  
  referencing the period of time against an elapsed time model to determine a second end-of-utterance probability;
  
  computing an overall end-of-utterance probability as a function of the first and second end-of-utterance probabilities; and
  
  determining whether an end-of-utterance has occurred based on the overall end-of-utterance probability.

2. A method as recited in claim 1, wherein said computing an intonation of the utterance comprises computing an intonation of the utterance by determining the fundamental frequency of the utterance as a function of time.

3. A method as recited in claim 2, further comprising:
- determining a duration of a final syllable of the utterance; and

  referencing the duration of the final syllable against a syllable duration model to determine a third end-of-utterance probability;

  wherein said computing an overall end-of-utterance probability comprises computing the overall end-of-utterance probability as a function of the first, second, and third end-of-utterance probabilities.
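
Claim 2 recites determining the fundamental frequency as a function of time but does not say how. One common approach, used here purely as an assumption, is to pick the autocorrelation peak in each frame. The function names `estimate_f0` and `f0_contour`, the 60 to 400 Hz search range, and the frame and hop sizes are hypothetical choices for this sketch, which also skips voicing detection.

```python
import math

def estimate_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate f0 (Hz) of one frame from the autocorrelation peak.

    Assumed method: the claims do not name a pitch tracker.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)
    window = len(frame) - max_lag   # same number of terms for every lag
    if window <= 0:
        raise ValueError("frame too short for the requested f_min")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(window))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def f0_contour(samples, sample_rate, frame_len, hop):
    """f0 as a function of time: a list of (time_in_seconds, f0_in_hz)."""
    contour = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        f0 = estimate_f0(samples[start:start + frame_len], sample_rate)
        contour.append((start / sample_rate, f0))
    return contour

# Synthetic 100 Hz tone, 0.2 s long, sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 100.0 * n / sr) for n in range(1600)]
contour = f0_contour(tone, sr, frame_len=400, hop=200)
```

Because the lag is an integer number of samples, the estimate is quantized: on this tone each frame should land within a few hertz of 100 Hz rather than exactly on it. A contour like this is what an intonation model would be referenced against, for example by looking at its slope near the end of the utterance.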

4. A method of operating an endpoint detector for speech recognition, the method comprising:
- inputting speech representing an utterance;
  
  computing an intonation of the utterance;
  
  referencing the intonation of the utterance against an intonation model to determine a first end-of-utterance probability;
  
  determining a duration of a final syllable of the utterance;
  
  referencing the duration of the final syllable against a syllable duration model to determine a second end-of-utterance probability;
  
  computing an overall end-of-utterance probability as a function of the first and second end-of-utterance probabilities; and
  
  determining whether an end-of-utterance has occurred based on the overall end-of-utterance probability.

5. A method as recited in claim 4, wherein said computing an intonation of the utterance comprises computing an intonation of the utterance by determining the fundamental frequency of the utterance as a function of time.

6. A method as recited in claim 4, further comprising:
- determining that a value of the speech has dropped below a threshold value;

  determining a period of time that has elapsed since the value of the speech dropped below the threshold value; and

  referencing the period of time against an elapsed time model to determine a second end-of-utterance probability;

  wherein said computing an overall end-of-utterance probability comprises computing the overall end-of-utterance probability as a function of the first, second, and third end-of-utterance probabilities.

7. A method of operating an endpoint detector for speech recognition, the method comprising:
- inputting speech representing an utterance, the utterance having a time-varying fundamental frequency;
  
  determining that a value of the speech has dropped below a threshold value;
  
  computing an intonation of the utterance by determining the fundamental frequency of the utterance as a function of time;
  
  referencing the intonation of the utterance against an intonation model to determine a first end-of-utterance probability;
  
  determining a period of time that has elapsed since a value of the speech dropped below the threshold value;
  
  referencing the period of time against an elapsed time model to determine a second end-of-utterance probability;
  
  determining a duration of a final syllable of the utterance;
  
  referencing the duration of the final syllable against a syllable duration model to determine a third end-of-utterance probability;
  
  computing an overall end-of-utterance probability as a function of the first, second, and third end-of-utterance probabilities; and
  
  determining whether an end-of-utterance has occurred by comparing the overall end-of-utterance probability to a threshold probability.

8. An apparatus for performing endpoint detection comprising:
- means for inputting speech representing an utterance, the utterance having a time-varying fundamental frequency;
  
  means for determining that a value of the speech has dropped below a threshold value;
  
  means for computing an intonation of the utterance by determining the fundamental frequency of the utterance as a function of time;
  
  means for referencing the intonation of the utterance against an intonation model to determine a first end-of-utterance probability;
  
  means for determining a period of time that has elapsed since the speech dropped below the threshold value;
  
  means for referencing the period of time against an elapsed time model to determine a second end-of-utterance probability;
  
  means for computing the duration of the final syllable of the utterance against a syllable duration model to determine a third end-of-utterance probability;
  
  means for determining an overall end-of-utterance probability as a function of the first, second, and third end-of-utterance probabilities; and
  
  means for determining whether an end-of-utterance has occurred by comparing the overall end-of-utterance probability to a threshold probability.
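
The decision flow recited in claims 7 and 8 (three model lookups, one combining function, one threshold comparison) can be sketched as follows. Everything concrete here is an assumption: the claims leave the three models and the combining function unspecified, so the logistic and exponential shapes, their constants, the geometric-mean combination, the 0.5 threshold, and all function names are hypothetical stand-ins.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def intonation_probability(f0_slope_hz_per_s):
    """Hypothetical intonation model: falling pitch suggests an utterance end."""
    return logistic(-f0_slope_hz_per_s / 20.0)

def elapsed_time_probability(silence_s, tau=0.5):
    """Hypothetical elapsed time model: rises toward 1 as the time since the
    speech value dropped below the threshold grows."""
    return 1.0 - math.exp(-silence_s / tau)

def syllable_duration_probability(duration_s, typical_s=0.2):
    """Hypothetical syllable duration model: a lengthened final syllable
    suggests an utterance end."""
    return logistic((duration_s - typical_s) / 0.05)

def overall_probability(p1, p2, p3):
    """Assumed combining function: geometric mean of the three probabilities."""
    return (p1 * p2 * p3) ** (1.0 / 3.0)

def end_of_utterance(f0_slope, silence_s, final_syllable_s, threshold=0.5):
    """Return (decision, overall probability) for one candidate endpoint."""
    p = overall_probability(
        intonation_probability(f0_slope),
        elapsed_time_probability(silence_s),
        syllable_duration_probability(final_syllable_s),
    )
    return p >= threshold, p

# Falling pitch, one second of silence, long final syllable.
ended, p_end = end_of_utterance(-40.0, 1.0, 0.30)
# Rising pitch, brief pause, short final syllable.
paused, p_pause = end_of_utterance(40.0, 0.1, 0.15)
```

With these assumed models the first case is classified as an end-of-utterance and the second as a mid-utterance pause. A geometric mean was chosen only because it lets any single low probability pull the overall value down; a weighted sum or a trained classifier would equally satisfy "a function of" the three probabilities.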

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Lennig, Matthew
Primary Examiner(s): Azad, Abul K.
Application Number: US09/576,116
Time in Patent Office: 1,772 Days
Field of Search: 704/251, 704/252, 704/253, 704/254, 704/247, 704/248, 704/249, 704/250, 704/207, 704/208, 704/209, 704/210, 704/218, 704/219, 704/214, 704/215, 704/226, 704/231, 704/256, 704/261, 704/267, 704/268
US Class Current: 704/253
CPC Class Codes: G10L 25/87 Detection of discrete point...
