Concatenated expected responses for speech recognition using expected response boundaries to determine corresponding hypothesis boundaries

US 9,984,685 B2
Filed: 11/07/2014
Issued: 05/29/2018
Est. Priority Date: 11/07/2014
Status: Active Grant

Abstract

A speech recognition system used for hands-free data entry receives and analyzes speech input to recognize and accept a user's response. Under certain conditions, a user's response might be expected. In these situations, the expected response may modify the behavior of the speech recognition system to improve performance. For example, if the hypothesis of a user's response matches the expected response then there is a high probability that the user's response was recognized correctly. This information may be used to make adjustments. An expected response may include expected response parts, each part containing expected words. By considering an expected response as the concatenation of expected response parts, each part may be considered independently for the purposes of adjusting an acceptance algorithm, adjusting a model, or recording an apparent error. In this way, the speech recognition system may make modifications based on a wide range of user responses.
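
The idea of treating an expected response as a concatenation of parts can be illustrated with a short Python sketch. It is not taken from the patent: the function names are hypothetical, and it assumes that a boundary between expected response parts can be expressed as a word offset, so that the hypothesis is cut at the same offsets. Hypothesis words beyond the last expected part are simply ignored here.

```python
from typing import List


def split_by_expected_boundaries(
    hypothesis: List[str], expected_parts: List[List[str]]
) -> List[List[str]]:
    """Cut the hypothesis at the word offsets where the expected parts end.

    Assumption (not from the patent): boundaries are plain word counts.
    """
    parts, start = [], 0
    for expected in expected_parts:
        end = start + len(expected)
        parts.append(hypothesis[start:end])
        start = end
    return parts


def match_parts(hypothesis: List[str], expected_parts: List[List[str]]) -> List[bool]:
    """Compare each hypothesis part with its own expected part, independently."""
    parts = split_by_expected_boundaries(hypothesis, expected_parts)
    return [hyp == exp for hyp, exp in zip(parts, expected_parts)]


# Hypothetical two-part expected response and a hypothesis whose
# second part was spoken (or recognized) differently.
expected = [["one", "two", "three"], ["four", "five"]]
hypothesis = ["one", "two", "three", "four", "nine"]

hypothesis_parts = split_by_expected_boundaries(hypothesis, expected)
part_matches = match_parts(hypothesis, expected)
```

In this example the first part matches word-for-word and the second does not, so only the first part would qualify for any match-based adjustment.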

30 Claims

1. A method for accepting or rejecting hypothesis words in a hypothesis part using an adjustable acceptance threshold as part of a speech recognition system, the method comprising:
- receiving a single speech input from a user, the speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part;
  
  processing the single speech input to generate a single hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words, and each hypothesis word having a corresponding confidence score;
  
  independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively;
  
  adjusting an acceptance threshold for each hypothesis word in the first hypothesis part if the first hypothesis part matches word-for-word the first expected response part, otherwise not adjusting the acceptance threshold for each hypothesis word in the first hypothesis part, and independently adjusting an acceptance threshold for each hypothesis word in the second hypothesis part if the second hypothesis part matches word-for-word the second expected response part, otherwise not adjusting the acceptance threshold for each hypothesis word in the second hypothesis part;
  
  comparing the confidence score for each hypothesis word in each of the first hypothesis part and second hypothesis part to its acceptance threshold; and
  
  accepting or rejecting each hypothesis word in each of the first hypothesis part and the second hypothesis part based on the results of the comparison.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method according to claim 1, wherein after a first hypothesis part is evaluated, the steps of determining, adjusting, comparing, and accepting or rejecting are repeated for subsequent hypothesis parts until all hypothesis parts comprising the hypothesis have been evaluated.
  - 3. The method according to claim 1, comprising dividing the hypothesis into sequential and non-overlapping hypothesis parts.
  - 4. The method according to claim 1, comprising (i) accepting a hypothesis word when the hypothesis word's confidence score exceeds the hypothesis word's acceptance threshold and (ii) rejecting a hypothesis word when the hypothesis word's confidence score does not exceed the hypothesis word's acceptance threshold.
  - 5. The method according to claim 1, wherein each hypothesis word is assigned a default acceptance threshold prior to adjustment.
  - 6. The method according to claim 1, wherein the acceptance threshold adjustment for a hypothesis word in a hypothesis part does not affect acceptance threshold adjustments for any other hypothesis word in the hypothesis part.
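
The flow of claims 1 to 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the 0 to 100 confidence scale, `DEFAULT_THRESHOLD`, `MATCH_ADJUSTMENT`, and the choice to lower the threshold on a match are all hypothetical, and part boundaries are taken to be word counts.

```python
from dataclasses import dataclass
from typing import List

DEFAULT_THRESHOLD = 70  # hypothetical default acceptance threshold (0-100 scale)
MATCH_ADJUSTMENT = 20   # hypothetical amount by which a match lowers the threshold


@dataclass
class HypothesisWord:
    text: str
    confidence: int
    threshold: int = DEFAULT_THRESHOLD  # default assigned before any adjustment
    accepted: bool = False


def evaluate(
    hypothesis: List[HypothesisWord], expected_parts: List[List[str]]
) -> List[HypothesisWord]:
    """Evaluate each hypothesis part against its own expected part, in sequence."""
    start = 0
    for expected in expected_parts:
        # Assumption: the expected part's word count fixes the hypothesis boundary.
        part = hypothesis[start:start + len(expected)]
        start += len(expected)
        # Adjust thresholds only if this part matches word-for-word.
        if [word.text for word in part] == expected:
            for word in part:
                word.threshold = DEFAULT_THRESHOLD - MATCH_ADJUSTMENT
        # Accept a word only when its confidence exceeds its threshold.
        for word in part:
            word.accepted = word.confidence > word.threshold
    return hypothesis


expected = [["one", "two", "three"], ["four", "five"]]
scored = [
    HypothesisWord("one", 55),
    HypothesisWord("two", 80),
    HypothesisWord("three", 60),
    HypothesisWord("four", 55),
    HypothesisWord("nine", 90),
]
result = evaluate(scored, expected)
accepted_flags = [word.accepted for word in result]
```

Here the first part matches, so its words are judged against the adjusted threshold of 50 and all three are accepted. The second part does not match, so "four" (confidence 55) is rejected against the unadjusted threshold of 70 while "nine" (confidence 90) is accepted.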

7. A method for marking hypothesis words in a hypothesis part as suitable for adaptation in a speech recognition system, the method comprising:
- receiving a single speech input from a user with a speech recognition system comprising a microphone, processor, and memory, the speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part;
  
  processing the single speech input to generate a hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words;
  
  independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively; and
  
  marking each hypothesis word in the first or the second hypothesis part as suitable for adaptation if the first or the second hypothesis part matches word-for-word the first or the second expected response part, otherwise not marking any hypothesis word in the first or the second hypothesis part as suitable for adaptation.
- Dependent claims (8, 9, 10, 11):
  - 8. The method according to claim 7, wherein the step of determining a hypothesis part corresponding to an expected response part stored in the memory comprises using boundaries between expected response parts to determine boundaries between hypothesis parts.
  - 9. The method according to claim 8, comprising dividing the hypothesis into sequential and non-overlapping hypothesis parts.
  - 10. The method according to claim 7, comprising adapting the models for the hypothesis words marked as suitable for adaptation using acoustic data corresponding to those hypothesis words.
  - 11. The method according to claim 7, comprising not using data corresponding to hypothesis words that are marked as not suitable for adaptation in adapting the models corresponding to those hypothesis words.
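
A rough Python sketch of the marking step in claims 7 to 11 follows. The function names and the placeholder acoustic data are hypothetical, and boundaries are again assumed to be word counts. Actual model adaptation is outside the scope of the sketch, which only selects the data that would be eligible for it.

```python
from typing import List, Tuple


def mark_for_adaptation(
    hypothesis: List[str], expected_parts: List[List[str]]
) -> List[Tuple[str, bool]]:
    """Pair each hypothesis word with a 'suitable for adaptation' flag.

    Every word in a part is marked only if the whole part matches its
    expected response part word-for-word.
    """
    marked, start = [], 0
    for expected in expected_parts:
        part = hypothesis[start:start + len(expected)]
        start += len(expected)
        suitable = part == expected
        marked.extend((word, suitable) for word in part)
    return marked


def select_adaptation_data(
    marked: List[Tuple[str, bool]], acoustic_data: List[str]
) -> List[Tuple[str, str]]:
    """Keep acoustic data only for words marked as suitable for adaptation."""
    return [
        (word, data)
        for (word, suitable), data in zip(marked, acoustic_data)
        if suitable
    ]


expected = [["one", "two", "three"], ["four", "five"]]
hypothesis = ["one", "two", "three", "four", "nine"]
acoustic_data = ["f0", "f1", "f2", "f3", "f4"]  # placeholders for per-word audio

marks = mark_for_adaptation(hypothesis, expected)
selected = select_adaptation_data(marks, acoustic_data)
```

Only the three words of the matching first part are selected. The data for "four" and "nine" is left out because their part did not match.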

12. A speech recognition system configured to adjust acceptance thresholds for words in a hypothesis part, comprising:
- a storage medium for storing information and processor-executable instructions;
  
  a microphone for receiving speech input from a user;
  
  a computing device comprising a processor communicatively coupled to the storage medium, the processor configured by the processor-executable instructions to perform the steps of;
  
  (i) receiving the single speech input from the microphone, the speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part,
  
  (ii) processing the single speech input to determine a single hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words, the single hypothesis being stored on the storage medium,
  
  (iii) independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively, and
  
  (iii) if the first or the second hypothesis part matches the first or the second expected response part, then adjusting acceptance thresholds for hypothesis words in the first or the second hypothesis part.
- Dependent claims (13, 14, 15, 16, 17):
  - 13. The speech recognition system according to claim 12, wherein the hypothesis part is one of a plurality of hypothesis parts which form a hypothesis.
  - 14. The speech recognition system according to claim 13, wherein boundaries between hypothesis parts are determined using boundaries between the expected response parts.
  - 15. The speech recognition system according to claim 13, wherein after a first hypothesis part is evaluated, subsequent hypothesis parts are evaluated in sequence until all hypothesis parts comprising the hypothesis have been evaluated.
  - 16. The speech recognition system according to claim 12, wherein a confidence score corresponding to a hypothesis word is compared to the adjusted acceptance threshold to either accept or reject the hypothesis word as recognized speech.
  - 17. The speech recognition system according to claim 12, wherein the hypothesis part's adjustment is not affected by the matching conditions between any other hypothesis parts and their corresponding expected response parts.

18. A speech recognition system configured to mark words in a hypothesis part as suitable for adaptation, comprising:
- a storage medium for storing information and processor-executable instructions;
  
  a microphone for receiving speech input from a user;
  
  a computing device comprising a processor communicatively coupled to the storage medium, the processor configured by the processor-executable instructions to perform the steps of;
  
  (i) receiving a single speech input from the microphone, the speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part,
  
  (ii) processing the single speech input to determine a hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words, the hypothesis being stored on the storage medium,
  
  (iii) independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively; and
  
  (iii) if the first or the second hypothesis part matches the first or the second expected response part, then marking the hypothesis words in the first or the second hypothesis part as suitable for adaptation.
- Dependent claims (19, 20, 21):
  - 19. The speech recognition system according to claim 18, wherein acoustic data corresponding to a hypothesis word that is marked as suitable for adaptation is used to adapt a model corresponding to that hypothesis word.
  - 20. The speech recognition system according to claim 18, wherein the hypothesis part's word marking is not affected by the matching conditions between any other hypothesis parts and their corresponding expected response parts.
  - 21. The speech recognition system according to claim 18, wherein acoustic data for the marked hypothesis words is stored on the storage medium for future use.

22. A method for counting errors in a speech recognition system, the method comprising:
- receiving a single speech input from a user with a speech recognition system comprising a microphone, processor, and memory, the single speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part;
  
  processing the single speech input to generate a hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words, and each hypothesis word having a corresponding confidence score;
  
  independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, the first expected response part and the second response part being stored in the memory, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively;
  
  analyzing the first or the second hypothesis part for recognition errors and/or correct recognitions; and
  
  adding the number of recognition errors to an error count and adding the number of correct recognitions to a correct count, the error count and correct count representing a running total of recognition errors and correct recognitions respectively.
- Dependent claims (23, 24, 25, 26, 27, 28, 29, 30):
  - 23. The method according to claim 22, wherein the error count and correct count are stored in the memory.
  - 24. The method according to claim 22, wherein the error count and correct count represent running totals of recognition errors and correct recognitions that correspond to a particular user.
  - 25. The method according to claim 22, comprising determining an error rate for the speech recognition system from the error count and correct count.
  - 26. The method according to claim 22, wherein each hypothesis word has a corresponding confidence score and the recognition errors comprise hypothesis words with low confidence scores.
  - 27. The method according to claim 22, wherein the recognition errors comprise substitution errors.
  - 28. The method according to claim 22, wherein the recognition errors comprise insertion errors.
  - 29. The method according to claim 22, wherein the recognition errors comprise deletion errors.
  - 30. The method according to claim 22, wherein the error count comprises a combination of low-confidence errors, substitution errors, insertion errors, and/or deletion errors.
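
The running error and correct counts of claims 22 to 30 might be kept along the following lines. This is only a sketch: the `RecognitionStats` class is hypothetical, and aligning each hypothesis part with its expected part through Python's standard `difflib.SequenceMatcher` is a stand-in chosen for brevity, not a technique named in the claims. Low-confidence errors (claim 26) are not modeled.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List


@dataclass
class RecognitionStats:
    """Running totals of recognition errors and correct recognitions."""
    errors: int = 0
    correct: int = 0

    def add_part(self, hypothesis_part: List[str], expected_part: List[str]) -> None:
        """Analyze one hypothesis part and add its results to the totals."""
        matcher = SequenceMatcher(None, expected_part, hypothesis_part, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                self.correct += i2 - i1
            else:
                # "replace" ~ substitution, "delete" ~ deletion, "insert" ~ insertion.
                # Count the longer side of the mismatch as errors.
                self.errors += max(i2 - i1, j2 - j1)

    def error_rate(self) -> float:
        """Errors as a fraction of all counted words (0.0 if nothing counted)."""
        total = self.errors + self.correct
        return self.errors / total if total else 0.0


stats = RecognitionStats()
stats.add_part(["one", "two", "three"], ["one", "two", "three"])  # all correct
stats.add_part(["four", "nine"], ["four", "five"])                # one substitution

deletion_stats = RecognitionStats()
deletion_stats.add_part(["one", "three"], ["one", "two", "three"])  # "two" missing
```

After the two calls on `stats`, the totals are four correct recognitions and one error, giving an error rate of 0.2. Keeping one `RecognitionStats` instance per user would correspond to the per-user running totals of claim 24.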

Current Assignee
Hand Held Products Incorporated (Honeywell International Inc.)
Original Assignee
Hand Held Products Incorporated (Honeywell International Inc.)
Inventors
Braho, Keith; Makay, Jason M.
Primary Examiner(s)
Kazeminezhad, Farzad

Application Number

US14/535,764
Publication Number

US 20160133253A1
Time in Patent Office

1,299 Days
Field of Search

704232, 704251, 704242, 704 2
CPC Class Codes

G06F 40/226   Validation

G10L 13/04   Details of speech synthesis...

G10L 15/01   Assessment or evaluation of...

G10L 15/065   Adaptation

G10L 15/08   Speech classification or se...

G10L 15/22   Procedures used during a sp...

G10L 2015/088   Word spotting
