Method and apparatus for recognition-based barge-in detection in the context of subword-based automatic speech recognition

US 6,574,595 B1
Filed: 07/11/2000
Issued: 06/03/2003
Est. Priority Date: 07/11/2000
Status: Expired due to Term

Abstract

A robust, multi-faceted sub-word method is presented for rapidly and reliably detecting a barge-in condition, in which a speaker talks while an automated audio prompt is being played. The method allows the prompt to be stopped rapidly, which improves automatic speech recognition and reduces speaker confusion or frustration. An automatic speech recognition (ASR) system that practices the method is also presented.

16 Claims

1. A method comprising the steps of:
- a. determining if a speech utterance has started, if an utterance has not started then obtaining next frame and re-running step a, otherwise continuing to step b;
  
  b. obtaining a speech frame of the speech utterance that represents a frame period that is next in time;
  
  c. extracting features from the speech frame;
  
  d. computing likelihood scores for all active sub-word models for the present frame of speech;
  
  e. performing dynamic programming to build a speech recognition network of likely sub-word paths;
  
  f. performing a beam search using the speech recognition network;
  
  g. updating a decoding tree of the speech utterance after the beam search;
  
  h. finding the best scoring sub-word path of said likely sub-word paths and determining a number of sub-words in said best scoring sub-word path;
  
  i. determining if said best scoring sub-word path has a sub-word length greater than a minimum number of sub-words and if the best scoring path is greater proceeding to step j, otherwise returning to step b;
  
  j. determining if recorded root is a sub-string of best path and if recorded root is not a sub-string of best path recording best path as recorded root and returning to step b, otherwise proceeding to step k;
  
  k. determining if the recorded root has remained stable for a threshold number of additional sub-words and if said root of said best scoring path has not remained stable for the threshold number returning to step b otherwise proceeding to step l;
  
  l. declaring barge-in;
  
  m. disabling any prompt that is playing; and
  
  n. backtracking through the best scoring path to obtain a string having a greatest likelihood of corresponding to the utterance; and
  
  outputting the string.
2. The method of claim 1, wherein said sub-word sequence recognized must be a sub-word sequence found in a pre-specified grammar.
3. The method of claim 1, further comprising the step of:
4. The method of claim 3, further comprising the step of:
- in parallel with step i, determining if a speech endpoint has been reached, if yes said speech endpoint has been reached then begin backtracking to obtain recognized string and declaring barge-in and proceeding to step m, and if no said speech endpoint has not been reached then proceeding to step b.
5. The method of claim 1, further comprising the step of:
- in parallel with step i, determining if a speech endpoint has been reached, if yes said speech endpoint has been reached then begin backtracking to obtain recognized string and declaring barge-in and proceeding to step m, and if no said speech endpoint has not been reached then proceeding to step b.
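
Steps i through l of claim 1 describe a per-frame decision: wait until the best path is long enough, record its root, and declare barge-in once that root has survived a threshold number of additional sub-words. The Python sketch below is one possible reading of those steps, not the patented implementation. The class name `BargeInDetector`, the parameters `min_subwords` and `stable_extra`, and the phoneme labels are hypothetical, and the sketch assumes that "sub-string" means a leading sub-sequence (prefix) of the best path.

```python
from typing import List, Optional, Sequence


class BargeInDetector:
    """Tracks root stability of the best sub-word path (one reading of steps i-l)."""

    def __init__(self, min_subwords: int, stable_extra: int) -> None:
        self.min_subwords = min_subwords  # hypothetical name: minimum path length (step i)
        self.stable_extra = stable_extra  # hypothetical name: stability threshold (step k)
        self.root: Optional[List[str]] = None

    def update(self, best_path: Sequence[str]) -> bool:
        """Feed the best path for the current frame; return True to declare barge-in."""
        path = list(best_path)

        # Step i: the best path must be longer than the minimum number of sub-words.
        if len(path) <= self.min_subwords:
            return False

        # Step j: if the recorded root is not a prefix of the best path
        # (assumed meaning of "sub-string"), record the best path as the new root.
        if self.root is None or path[: len(self.root)] != self.root:
            self.root = path
            return False

        # Step k: the root is stable once the path has grown by at least
        # the threshold number of additional sub-words beyond the root.
        return len(path) - len(self.root) >= self.stable_extra


if __name__ == "__main__":
    detector = BargeInDetector(min_subwords=2, stable_extra=2)
    frames = [
        ["k"],
        ["k", "ao", "l"],
        ["k", "ao", "l", "hh"],
        ["k", "ao", "l", "hh", "ow"],
    ]
    for path in frames:
        print(path, detector.update(path))
```

With these illustrative settings, the first three paths return `False` (too short, root just recorded, only one additional sub-word) and the fourth returns `True`, at which point a caller would disable the prompt (step m) and backtrack (step n).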

6. A method for speech recognition using barge-in comprising the steps of:
- a. determining if a speech utterance has started, if an utterance has not started then returning to the beginning of step a, otherwise continuing to step b;
  
  b. getting a speech frame that represents a frame period that is next in time;
  
  c. extracting features from the speech frame;
  
  d. using the features extracted from the present speech frame to score sub-word models of a speech recognition grammar;
  
  e. dynamically programming an active network of sub-word sequences using a Viterbi algorithm;
  
  f. pruning unlikely sub-word sequences and extending likely sub-word sequences to update the active network;
  
  g. updating a decoding tree to said likely sub-word sequences;
  
  h. finding the best scoring sub-word path of said likely sub-word paths and determining a number of sub-words in said best scoring sub-word path;
  
  i. determining if said best scoring sub-word path has a sub-word length greater than a minimum number of sub-words and if the best scoring path is greater proceeding to step j, otherwise returning to step b;
  
  j. determining if recorded root is a sub-string of best path and if recorded root is not a sub-string of best path recording best path as recorded root and returning to step b, otherwise proceeding to step k;
  
  k. determining if the recorded root has remained stable for a threshold number of additional sub-words and if said root of said best scoring path has not remained stable for the threshold number returning to step b otherwise proceeding to step l;
  
  l. declaring barge-in;
  
  m. disabling any prompt that is playing; and
  
  n. outputting the string corresponding to said best scoring path.
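
Steps d through f of claim 6 (score the sub-word models, extend paths with a Viterbi update, prune unlikely sequences) can be illustrated with a single frame-synchronous update. The following Python sketch is a simplified illustration under stated assumptions, not the recognizer described in the patent: the function name `viterbi_beam_step`, the dictionary-based network, the state names, and all scores are hypothetical, and scores are treated as log-likelihoods.

```python
import math
from typing import Dict, List, Optional, Tuple

Hyp = Tuple[float, List[str]]              # (log score, sub-word sequence so far)
Arc = Tuple[str, float, Optional[str]]     # (next state, log transition score, emitted sub-word)


def viterbi_beam_step(
    active: Dict[str, Hyp],
    transitions: Dict[str, List[Arc]],
    frame_scores: Dict[str, float],
    beam: float,
) -> Dict[str, Hyp]:
    """Advance the active network by one frame, then prune with a beam."""
    new_active: Dict[str, Hyp] = {}

    # Viterbi update: for each reachable state keep only the best incoming path.
    for state, (score, history) in active.items():
        for nxt, log_trans, label in transitions.get(state, []):
            cand = score + log_trans + frame_scores.get(nxt, -math.inf)
            if nxt not in new_active or cand > new_active[nxt][0]:
                new_history = history + [label] if label is not None else history
                new_active[nxt] = (cand, new_history)

    if not new_active:
        return new_active

    # Beam pruning: drop hypotheses scoring more than `beam` below the best one.
    best = max(score for score, _ in new_active.values())
    return {s: hyp for s, hyp in new_active.items() if hyp[0] >= best - beam}


if __name__ == "__main__":
    active = {"a": (0.0, [])}
    transitions = {"a": [("a", -1.0, None), ("b", -1.0, "k"), ("c", -1.0, "g")]}
    frame_scores = {"a": -5.0, "b": -1.0, "c": -2.5}
    print(viterbi_beam_step(active, transitions, frame_scores, beam=2.0))
```

In this toy network the self-loop on state `a` scores -6.0 and falls outside the beam of 2.0 around the best score of -2.0, so only the hypotheses ending in `b` and `c` survive to the next frame.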
7. The method of claim 6, wherein said sub-word sequence recognized must be a sub-word sequence found in a pre-specified grammar.
8. The method of claim 6, further comprising the step of:
9. The method of claim 8, wherein step h further comprises:
- examining all viable sub-word sequences contained in the decoding tree for the present speech frame;
  
  traversing through pointers that are associated with sub-word sequences of the decoding tree; and
  
  counting a number of sub-words in the best scoring sub-word sequence path.
10. The method of claim 9, wherein only pointers that are associated with sub-word sequences of the decoding tree that have speech content are traversed.
11. The method of claim 6, wherein step h further comprises:
- examining all viable sub-word sequences contained in the decoding tree for the present speech frame;
  
  traversing through pointers that are associated with sub-word sequences of the decoding tree; and
  
  counting a number of sub-words in the best scoring sub-word sequence path.
12. The method of claim 11, wherein only pointers that are associated with sub-word sequences of the decoding tree that have speech content are traversed.
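
Claims 9 through 12 refine step h: the sub-word count is obtained by following pointers back through the decoding tree and counting only entries that have speech content. A minimal Python sketch of that idea follows. The `TreeNode` structure, its field names, and the phoneme labels are hypothetical, and for simplicity the sketch visits every node on the path and skips non-speech entries (such as silence) when counting, which is an assumption about how "speech content" is marked.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TreeNode:
    """One decoding-tree entry with a back-pointer to its predecessor (hypothetical layout)."""
    label: str
    is_speech: bool
    parent: Optional["TreeNode"] = None


def count_speech_subwords(leaf: Optional[TreeNode]) -> int:
    """Follow back-pointers from the best-scoring leaf, counting speech sub-words only."""
    count = 0
    node = leaf
    while node is not None:
        if node.is_speech:
            count += 1
        node = node.parent
    return count


def backtrack_labels(leaf: Optional[TreeNode]) -> List[str]:
    """Recover the speech sub-word string in time order by reversing the back-pointer walk."""
    labels: List[str] = []
    node = leaf
    while node is not None:
        if node.is_speech:
            labels.append(node.label)
        node = node.parent
    labels.reverse()
    return labels


if __name__ == "__main__":
    # Path: sil -> k -> ao -> sil -> l
    n1 = TreeNode("sil", False)
    n2 = TreeNode("k", True, n1)
    n3 = TreeNode("ao", True, n2)
    n4 = TreeNode("sil", False, n3)
    n5 = TreeNode("l", True, n4)
    print(count_speech_subwords(n5), backtrack_labels(n5))
```

For the example path the count is 3 and the recovered string is `k ao l`; the two silence entries contribute nothing to the length that step i compares against the minimum.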

13. An apparatus for automatic speech recognition of a speech utterance to declare barge-in comprising:
- means for determining if the speech utterance has started, means responsive to said speech utterance start determining means for obtaining a speech frame of the speech utterance that represents a frame period that is next in time;
  
  means for extracting features from said speech frame;
  
  means for performing dynamic programming to build a speech recognition network of likely sub-word paths;
  
  means for performing a beam search using the speech recognition network;
  
  means for updating a decoding tree of the speech utterance after the beam search;
  
  means for finding the best scoring sub-word path of said likely sub-word paths and determining a number of sub-words in said best scoring sub-word path; and
  
  means for determining if said best scoring sub-word path has a sub-word length greater than a minimum number of sub-words;
  
  means responsive to a condition that the best scoring path is greater recording a root of a sub-word sequence corresponding to said best scoring path for determining if a count of times the recorded root has remained stable for a threshold number of additional sub-words;
  
  means responsive to a condition of the root of said best scoring path has remained stable during at least the threshold number of additional phonemes and declaring barge-in and disabling any prompt that is playing when the recorded count exceeds the threshold number.
14. The apparatus for automatic speech recognition of claim 13, further comprising:
15. The apparatus of claim 14, wherein all said means comprise a system having a processor running a program stored in connected memory.
16. The apparatus of claim 13, wherein all said means comprise a system having a processor running a program stored in connected memory.

Current Assignee: WSOU Investments, LLC (WSOU Holdings, LLC)
Original Assignee: Lucent Technologies, Inc. (Nokia Corporation)
Inventors: Sukkar, Rafid Antoon; Setlur, Anand Rangaswamy; Mitchell, Carl Dennis
Primary Examiner(s): Banks-Harold, Marsha D.
Assistant Examiner(s): Azad, Abul K.
Application Number: US09/614,018
Time in Patent Office: 1,057 Days
Field of Search: 704/240, 704/241, 704/242, 704/251, 704/252, 704/253, 704/254, 704/255, 704/256
US Class Current: 704/242
CPC Class Codes: G10L 15/22 Procedures used during a sp...
