Method and apparatus for word counting in continuous speech recognition useful for reliable barge-in and early end of speech detection
Abstract
Speech recognition technology has matured to the point that the most likely recognition result is available before an energy-based end-of-speech determination can be made. The present invention uses these rapidly available speech recognition results to provide intelligent barge-in for voice-response systems, to count words so that sub-sequences can be output for parallel and/or pipelined processing of tasks related to the entire word sequence, and to count words to provide rapid, speech-recognition-based termination of speech processing and output of the recognized word sequence.
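The pipelining idea in the abstract, outputting each n-word sub-sequence as soon as it is recognized rather than waiting for the whole utterance, can be illustrated with a minimal, hypothetical sketch. The function name `pipeline_subsequences` and the callback interface are assumptions for illustration, not from the patent.

```python
# Hypothetical sketch: hand each completed n-word sub-sequence to a
# downstream task immediately, instead of waiting for the full utterance.

def pipeline_subsequences(recognized_words, n, handle):
    """Call `handle` on each n-word sub-sequence as soon as it is complete."""
    chunk = []
    for word in recognized_words:   # words arrive incrementally from the decoder
        chunk.append(word)
        if len(chunk) == n:
            handle(list(chunk))     # downstream task can start immediately
            chunk = []
    if chunk:
        handle(list(chunk))         # flush any trailing partial sub-sequence
```

With n = 3, a seven-word utterance would be delivered as two complete three-word chunks plus a final one-word remainder, so downstream processing overlaps with continued recognition.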
12 Claims
1. A method comprising the steps of:
a. determining if a speech utterance has started; if an utterance has not started, then obtaining the next frame and re-running step a, otherwise continuing to step b;
b. obtaining a speech frame of the speech utterance that represents a frame period that is next in time;
c. extracting features from the speech frame;
d. performing dynamic programming to build a speech recognition network;
e. performing a beam search using the speech recognition network;
f. updating a decoding tree of the speech utterance after the beam search;
g. determining if a first word of the speech utterance has been received and, if it has been received, disabling any aural prompt and continuing to step h; otherwise, if the first word has not been determined, continuing to step h;
h. determining if n words have been received and if n words have not been received then returning to step b, otherwise continuing to step i;
i. backtracking through the beam search path having the greatest likelihood score to obtain a string having a greatest likelihood of corresponding to the received utterance when speech recognition of the word sequence has completed; and
j. outputting the string.

2. The method of claim 1, further comprising the step of:

in parallel with step h, determining if a low energy gap time has been reached in a sequence of frames, and if such a gap time has not been reached returning to step b, and if such a gap time has been reached continuing to step i.
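The step a-j control flow above can be sketched in a minimal, hypothetical form. To keep the sketch self-contained, each frame is assumed to arrive pre-decoded as either a completed word or None (silence), standing in for the feature-extraction, beam-search, and decoding-tree steps (c-f); the function name `recognize` and the `prompt` dictionary are illustrative assumptions, not from the patent.

```python
def recognize(frames, n, prompt):
    """Steps a-j: frame loop with barge-in and n-word early termination.

    `frames` is a sequence where each element is a word label (str)
    completed at that frame, or None for a silence/no-new-word frame.
    """
    words = []                    # stands in for the best beam-search path
    started = False
    for frame in frames:          # steps a/b: wait for speech, then consume frames
        if not started:
            if frame is None:     # step a: utterance has not started yet
                continue
            started = True
        # Steps c-f would extract features, score models, run the beam
        # search, and update the decoding tree; here the toy frame already
        # carries the decoded word (or None).
        if frame is not None:
            words.append(frame)
            if len(words) == 1:   # step g: barge-in on the first word
                prompt["playing"] = False
        if len(words) >= n:       # step h: n words received -> stop early
            break
    # Steps i/j: backtrack the most likely path and output the string.
    return " ".join(words)
```

Note how termination is driven by the word count rather than by waiting for a low-energy gap, which is the early end-of-speech behavior the claim describes.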
4. A method for speech recognition of a speech utterance, comprising the steps of:
a. determining if a speech utterance has started, if an utterance has not started then returning to the beginning of step a, otherwise continuing to step b;
b. getting a speech frame that represents a frame period that is next in time;
c. extracting features from the speech frame;
d. using the features extracted from the present speech frame to score word models of a speech recognition grammar;
e. dynamically programming an active network of word sequences using a Viterbi algorithm;
f. pruning unlikely words and extending likely words to update the active network;
g. updating a decoding tree;
h. determining a word count n for this speech frame of the speech utterance;
i. examining the word count n: if the word count n is equal to one, disabling any aural prompt and continuing with step b; if the word count n is greater than one but less than a termination count N, continuing with step j; and if the word count n is at least equal to the termination count N, continuing with step k;
j. determining if n words have been determined as recognized by each of the word counts; if n words have not been determined as recognized, then returning to step b; and if n words have been recognized, outputting the n words and returning to step b;
k. determining if the end of the utterance has been reached by determining if the word count of each of the presently active word sequences is equal to the same termination count N; if each of the word counts of the presently active word sequences is equal to N, then declaring the utterance ended and continuing to step m, otherwise continuing to step l;
l. determining if there has not been any speech energy for a pre-specified gap time and if there has not been any then declaring the utterance ended and continuing to step m, otherwise returning to step b;
m. backtracking through the various active word sequences to obtain the word sequence with the greatest likelihood of matching the utterance; and
n. outputting the word sequence corresponding to the greatest likelihood.

5. The method of claim 4, wherein the step of determining the word count n comprises the steps of:

examining all viable word sequences contained in the decoding tree for the present speech frame;
traversing through pointers that are associated with non-silence nodes of the decoding tree; and
counting a number of words of all the viable word sequences.
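The decoding-tree traversal described above can be sketched as follows: each active hypothesis is a leaf node holding a back-pointer toward the root, and the word count of that hypothesis is the number of non-silence nodes on its path. The `Node` class and its field names are illustrative assumptions, not from the patent.

```python
class Node:
    """One node of a toy decoding tree, with a back-pointer to its parent."""

    def __init__(self, parent=None, silence=False):
        self.parent = parent      # pointer toward the root of the decoding tree
        self.silence = silence    # silence nodes do not count as words


def word_count(leaf):
    """Count words along one viable sequence by walking the back-pointers."""
    n = 0
    node = leaf
    while node is not None:
        if not node.silence:      # only non-silence nodes contribute a word
            n += 1
        node = node.parent
    return n


def word_counts(active_leaves):
    """Word count for every viable word sequence in the decoding tree."""
    return [word_count(leaf) for leaf in active_leaves]
```

Comparing `word_counts` against the termination count N then gives the claim-4 end-of-utterance test: the utterance is declared ended once every active hypothesis reports the same count N.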
6. The method of claim 4, wherein said word sequence must be found in a pre-specified grammar.
7. The method of claim 4, further comprising the step of:
after step j, determining if the partial word sequence corresponds to a word sequence requiring a different maximum word count, and if a different maximum word count is required adjusting the maximum word count N to the different maximum word count.
8. The method of claim 7, wherein the partial word sequence requiring a different maximum word count is a telephone number prefix.
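Claims 7 and 8 describe adjusting the termination count N when the partial word sequence implies a different expected length, e.g. a telephone number prefix. A minimal sketch follows; the prefix table and its digit counts are invented for illustration and are not from the patent.

```python
# Hypothetical prefix table: a recognized prefix implies a different total
# number of words (digits) before recognition may terminate.
PREFIX_LENGTHS = {
    ("0",): 11,        # e.g. a national long-distance prefix -> 11 digits
    ("0", "0"): 13,    # e.g. an international prefix -> 13 digits
}


def adjust_termination_count(partial_words, default_n):
    """Return the termination count N implied by the recognized prefix."""
    n = default_n
    for prefix, required_n in PREFIX_LENGTHS.items():
        if tuple(partial_words[: len(prefix)]) == prefix:
            n = max(n, required_n)  # keep the longest applicable requirement
    return n
```

Called after step j on each pass through the loop, this lets the recognizer keep listening for a longer number as soon as the decoded prefix reveals that more digits are coming.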
9. An apparatus for speech recognition of a speech utterance comprising:
means for determining if the speech utterance has started;
means responsive to said speech utterance start determining means for obtaining a speech frame of the speech utterance that represents a frame period that is next in time;
means for extracting features from said speech frame;
means for building a speech recognition network using dynamic programming;
means for performing a beam search using the speech recognition network;
means for updating a decoding tree of the speech utterance after the beam search;
means for determining if a first word of the speech utterance has been received and if it has been received disabling any aural prompt;
means for determining if N words have been received to quickly end further speech recognition processing of the speech utterance;
means responsive to said N word determining means for backtracking through the beam search path having the greatest likelihood score to obtain a word sequence having a greatest likelihood of corresponding to the received speech utterance; and
means for outputting said word sequence.
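The backtracking means can be sketched as follows: at termination, pick the active hypothesis with the greatest likelihood score and follow its back-pointers to recover the word sequence. The `Hypothesis` structure and field names are illustrative assumptions, not from the patent.

```python
class Hypothesis:
    """One entry of a toy beam: a word, its path score, and a back-pointer."""

    def __init__(self, word, score, parent=None):
        self.word = word        # word ending at this node ("" for the root)
        self.score = score      # accumulated log-likelihood of the path
        self.parent = parent    # back-pointer used for the backtrace


def backtrack(active):
    """Pick the best-scoring hypothesis and walk its back-pointers."""
    best = max(active, key=lambda h: h.score)
    words = []
    node = best
    while node is not None:
        if node.word:           # skip the empty root marker
            words.append(node.word)
        node = node.parent
    words.reverse()             # back-pointers run end -> start
    return " ".join(words)
```

Because only back-pointers are stored during decoding, the full string is materialized once, at the end, which is what makes the early, count-based termination cheap.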
11. A method for use with an interactive speech recognition system having an aural prompt, comprising the steps of:

a. determining if a speech utterance has started; if an utterance has not started, then obtaining the next frame and re-running step a, otherwise continuing to step b;

12. A method for use with an interactive speech recognition system having an aural prompt, comprising the steps of:

a. determining if a speech utterance has started; if an utterance has not started, then obtaining the next frame and re-running step a, otherwise continuing to step b;
Specification