Method and system for considering information about an expected response when performing speech recognition

US 8,612,235 B2
Filed: 06/08/2012
Issued: 12/17/2013
Est. Priority Date: 02/04/2005
Status: Active Grant


Abstract

A speech recognition system receives and analyzes speech input from a user in order to recognize and accept a response from the user. Under certain conditions, information about the response expected from the user may be available. In these situations, the available information about the expected response is used to modify the behavior of the speech recognition system by taking this information into account. The modified behavior of the speech recognition system according to the invention has several embodiments including: comparing the observed speech features to the models of the expected response separately from the usual hypothesis search in order to speed up the recognition system; modifying the usual hypothesis search to emphasize the expected response; updating and adapting the models when the recognized speech matches the expected response to improve the accuracy of the recognition system.

173 Citations

18 Claims

1. A method for recognizing speech in a speech recognition system comprising the steps of:
- receiving input speech from a user in a speech dialog where there is at least one point in the speech dialog where there is a grammar of possible responses and at least one expected response, the at least one expected response being a subset of the grammar and known to be the at least one expected response in the speech recognition system before receiving the user input speech, the at least one expected response including the most likely response or responses expected to be uttered by the user at the at least one point in the speech dialog and having a model associated therewith;
  
  generating acoustic features of the input speech;
  
  directly comparing the input speech acoustic features only to the model that is associated with the at least one expected response, rather than to models associated with additional possible responses, to generate an associated confidence factor;
  
  comparing the confidence factor to an acceptance threshold for accepting the expected response as the result of the speech recognition.
  - 2. The method of claim 1 further comprising adjusting the acceptance threshold based on the comparison of the input speech acoustic features to the model that is associated with the at least one expected response in order to affect the acceptance of the expected response.
  - 3. The method of claim 1 further comprising, if the comparison to the model that is associated with the expected response does not yield a confidence factor exceeding the acceptance threshold, comparing the input speech acoustic features to models associated with additional possible responses.
  - 4. The method of claim 1 wherein the subset includes only a single expected response.
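
The flow of claims 1 and 3 can be illustrated with a minimal sketch. It is not the patented implementation: the template-based `confidence` function, the `recognize` helper, the `GRAMMAR` table, and the 0.8 threshold are all hypothetical stand-ins chosen so the example runs with the standard library alone. A real system would score acoustic feature frames against trained acoustic models.

```python
import math

# Hypothetical grammar: each possible response has a toy "model",
# here just a template of expected feature values per frame.
GRAMMAR = {
    "one": [1.0, 1.0, 1.0],
    "two": [2.0, 2.0, 2.0],
    "three": [3.0, 3.0, 3.0],
}

def confidence(features, model):
    """Toy confidence factor in (0, 1]; 1.0 means a perfect match.

    Assumes both sequences are non-empty. This is an illustrative
    stand-in for a real acoustic score, not the patent's formula.
    """
    n = min(len(features), len(model))
    mse = sum((f - m) ** 2 for f, m in zip(features, model)) / n
    length_penalty = abs(len(features) - len(model))
    return math.exp(-(mse + length_penalty))

def recognize(features, grammar_models, expected, accept_threshold=0.8):
    """Return (response, confidence, route) for one utterance."""
    # Claim 1: compare the features only to the expected response's model.
    conf = confidence(features, grammar_models[expected])
    if conf > accept_threshold:
        return expected, conf, "expected-only"

    # Claim 3: only if that fails, compare to the additional responses.
    others = {w: m for w, m in grammar_models.items() if w != expected}
    if others:
        word = max(others, key=lambda w: confidence(features, others[w]))
        conf = confidence(features, others[word])
        if conf > accept_threshold:
            return word, conf, "fallback"
    return None, conf, "rejected"

print(recognize([1.1, 0.9, 1.0], GRAMMAR, expected="one"))
print(recognize([2.0, 2.1, 1.9], GRAMMAR, expected="one"))
```

In this sketch the first call is accepted on the expected-response path without scoring the rest of the grammar, which is the source of the speed-up the abstract describes. The second call falls through to the wider comparison.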

5. An apparatus for recognizing speech comprising:
- circuitry for receiving input speech from a user as part of a speech dialog where there is at least one point in the speech dialog where there is a grammar of possible responses and at least one expected response, the at least one expected response being a subset of the grammar and known to be the at least one expected response in the speech recognition system before receiving the user input speech, the at least one expected response including the most likely response or responses expected to be uttered by the user at the at least one point in the speech dialog and having a model associated therewith, the circuitry configured for generating acoustic features of the input speech;
  
  processing circuitry operable for directly comparing the input speech acoustic features to the model that is associated with the at least one expected response, rather than to models associated with additional possible responses, to generate an associated confidence factor and for comparing the confidence factor to an acceptance threshold for accepting the expected response as the recognized speech.
  - 6. The apparatus of claim 5 wherein the comparison of the input speech acoustic features includes using the model that is associated with at least one expected response in a match/search algorithm.
  - 7. The apparatus of claim 5 wherein the processing circuitry is further operable to adjust the acceptance threshold based on the comparison of the input speech features to the model that is associated with at least one expected response in order to affect the acceptance of the expected response.
  - 8. The apparatus of claim 5 wherein the processing circuitry compares the input speech features to models associated with additional responses, if the comparison to the model that is associated with the expected response does not yield a confidence factor exceeding the acceptance threshold.

9. A method for recognizing speech in a speech recognition system comprising the steps of:
- in a speech dialog where there is a grammar of possible responses and at least one expected response, determining an expected response to be received as input speech from a user in at least one point in the speech dialog, the at least one expected response being a subset of the grammar and known to be the at least one expected response in the speech recognition system before receiving the user input speech, the at least one expected response including the most likely response or responses expected to be uttered by the user at the at least one point in the speech dialog and having a model associated therewith;
  
  based upon the determined expected response, modifying a speech recognition match/search algorithm with respect to the model that is associated with the expected response so that the match/search algorithm is configured to favor the expected response and boost a confidence factor associated with a hypothesis generated by the modified speech recognition match/search algorithm;
  
  processing input speech from a user using the modified match/search algorithm to generate a hypothesis and confidence factor for the input speech.
  - 10. The method of claim 9 further comprising comparing the hypothesis confidence factor to an acceptance threshold for accepting the hypothesis.
  - 11. The method of claim 9 wherein the match/search algorithm processes the input speech through a plurality of states and the method further comprises at least one of modifying transition probabilities associated with transitions between multiple states in the match/search algorithm or modifying initial state probabilities associated with a path through multiple states in the match/search algorithm.
  - 12. The method of claim 9 wherein the match/search algorithm utilizes multiple paths through acoustic models for the input speech and the method further comprises at least one of modifying the acoustic models based on the expected response or modifying the insertion penalty associated with a given acoustic model.
  - 13. The method of claim 9 wherein processing of the input speech results in frames of the features of the input speech and the method further comprises utilizing multiple paths through acoustic models for the input speech feature frames to generate scores, and applying at least one of a bias or penalty per frame to the input speech to affect the scores of paths through the models.
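
One way to picture the per-frame bias of claim 13 is the sketch below. It is an assumption-laden toy, not the patented search: the squared-distance frame score, the `biased_search` helper, the `bias_per_frame` value, and the softmax-style confidence are all hypothetical choices made for illustration.

```python
import math

def frame_scores(features, model):
    """Toy per-frame log-score: higher (closer to 0) is a better match."""
    return [-(f - m) ** 2 for f, m in zip(features, model)]

def biased_search(features, grammar_models, expected, bias_per_frame=0.5):
    """Score every response, adding a per-frame bias to the expected one.

    Returns (hypothesis, confidence, scores). The confidence is a
    softmax over path scores, a hypothetical choice for this sketch.
    """
    scores = {}
    for word, model in grammar_models.items():
        per_frame = frame_scores(features, model)
        if word == expected:
            # Favor the expected response on every frame of its path.
            per_frame = [s + bias_per_frame for s in per_frame]
        scores[word] = sum(per_frame)

    best = max(scores, key=scores.get)
    # Subtract the best score before exponentiating for numeric stability.
    denom = sum(math.exp(s - scores[best]) for s in scores.values())
    return best, 1.0 / denom, scores

models = {"one": [1.0, 1.0, 1.0], "two": [2.0, 2.0, 2.0]}
ambiguous = [1.6, 1.6, 1.6]  # slightly closer to "two"

print(biased_search(ambiguous, models, expected=None)[0])   # unbiased search
print(biased_search(ambiguous, models, expected="one")[0])  # biased search
```

With these toy numbers the bias tips an ambiguous utterance toward the expected response, while a clear utterance such as `[2.0, 2.0, 2.0]` still resolves to "two" even when "one" is expected. The bias size controls that trade-off.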

14. An apparatus for recognizing speech comprising:
- processing circuitry including a match/search algorithm for performing speech recognition in a speech recognition system having a speech dialog with a grammar of possible responses and at least one expected response, the expected response to be received in user input speech in at least one point in the speech dialog, the at least one expected response being a subset of the grammar and known to be the at least one expected response in the speech recognition system before receiving the user input speech, the at least one expected response including the most likely response or responses expected to be uttered by the user at the at least one point in the speech dialog and having a model associated therewith;
  
  the processing circuitry configured for modifying the speech recognition match/search algorithm with respect to the model that is associated with the expected response to be received as input speech from a user so that the match/search algorithm is configured to favor the expected response and boost a confidence factor associated with a hypothesis generated by the modified speech recognition match/search algorithm;
  
  the processing circuitry further being configured for processing input speech from a user using the modified match/search algorithm to generate a hypothesis and confidence factor for the input speech.
  - 15. The apparatus of claim 14 wherein the processing circuitry is further configured for comparing the hypothesis confidence factor to an acceptance threshold for accepting the hypothesis.
  - 16. The apparatus of claim 14 wherein the match/search algorithm processes the input speech through a plurality of states and is modifiable by at least one of modifying transition probabilities associated with transitions between multiple states in the match/search algorithm or modifying initial state probabilities associated with a path through multiple states in the match/search algorithm.
  - 17. The apparatus of claim 14 wherein the match/search algorithm utilizes multiple paths through acoustic models for the input speech and is modifiable by at least one of modifying the acoustic models based on the expected response or modifying the insertion penalty associated with a given acoustic model.
  - 18. The apparatus of claim 14 wherein the processing circuitry creates frames of the features of the input speech and the match/search algorithm utilizes multiple paths through acoustic models for the input speech feature frames to generate scores, the algorithm applying at least one of a bias or penalty per frame to the input speech to affect the scores of paths through the models.
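
Claims 11 and 16 mention modifying initial state probabilities in a state-based match/search algorithm. The sketch below shows the idea on a two-state toy hidden Markov model decoded with the Viterbi algorithm. The states, probabilities, the `boost_initial` helper, and the boost factor are all hypothetical; the patent does not specify them, and a real recognizer would use many states per word.

```python
import math

def viterbi(obs, states, init, trans, emit):
    """Log-domain Viterbi; returns (best state path, its log-probability).

    Assumes every probability used is strictly positive.
    """
    v = {s: math.log(init[s]) + math.log(emit[s][obs[0]]) for s in states}
    backpointers = []
    for o in obs[1:]:
        prev, v, ptr = v, {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: prev[p] + math.log(trans[p][s]))
            v[s] = prev[best_prev] + math.log(trans[best_prev][s]) + math.log(emit[s][o])
            ptr[s] = best_prev
        backpointers.append(ptr)
    last = max(states, key=lambda s: v[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, v[last]

def boost_initial(init, expected_state, factor):
    """Multiply the expected state's initial probability and renormalize."""
    raw = {s: p * (factor if s == expected_state else 1.0) for s, p in init.items()}
    total = sum(raw.values())
    return {s: p / total for s, p in raw.items()}

states = ["yes", "no"]
init = {"yes": 0.5, "no": 0.5}
trans = {"yes": {"yes": 0.9, "no": 0.1}, "no": {"yes": 0.1, "no": 0.9}}
emit = {"yes": {"a": 0.6, "b": 0.4}, "no": {"a": 0.4, "b": 0.6}}
obs = ["a", "b", "b"]

print(viterbi(obs, states, init, trans, emit)[0])
print(viterbi(obs, states, boost_initial(init, "yes", 4.0), trans, emit)[0])
```

Here the unmodified search decodes the observations as "no", and boosting the initial probability of the expected "yes" state changes the best path to "yes". The same pattern would apply to adjusting transition probabilities, which this sketch leaves untouched.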

Current Assignee: Vocollect, Inc. (Honeywell International Inc.)
Original Assignee: Vocollect, Inc. (Honeywell International Inc.)
Inventors: Braho, Keith; El-Jaroudi, Amro
Primary Examiner(s): He, Jialong
Application Number: US13/492,202
Publication Number: US 20120245939A1
Time in Patent Office: 557 Days
Field of Search: 704/251, 704/256, 704/270, 704/275
US Class Current: 704/275
CPC Class Codes:
G10L 15/065   Adaptation
G10L 15/22   Procedures used during a sp...
G10L 2015/228   of application context
