Concatenated expected responses for speech recognition using expected response boundaries to determine corresponding hypothesis boundaries
Abstract
A speech recognition system used for hands-free data entry receives and analyzes speech input to recognize and accept a user's response. Under certain conditions, a user's response might be expected. In these situations, the expected response may modify the behavior of the speech recognition system to improve performance. For example, if the hypothesis of a user's response matches the expected response then there is a high probability that the user's response was recognized correctly. This information may be used to make adjustments. An expected response may include expected response parts, each part containing expected words. By considering an expected response as the concatenation of expected response parts, each part may be considered independently for the purposes of adjusting an acceptance algorithm, adjusting a model, or recording an apparent error. In this way, the speech recognition system may make modifications based on a wide range of user responses.
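The idea of treating an expected response as a concatenation of parts can be illustrated with a minimal Python sketch. It rests on one loud assumption: part boundaries are carried over to the hypothesis by word position, which the abstract does not specify. The function name `split_hypothesis` and the sample words are hypothetical.

```python
from typing import List


def split_hypothesis(hypothesis: List[str],
                     expected_parts: List[List[str]]) -> List[List[str]]:
    """Cut a hypothesis into parts at the word offsets where the expected parts end.

    Assumption (not from the source): boundaries are transferred by word
    position, and the last part absorbs any leftover hypothesis words.
    """
    parts = []
    start = 0
    for i, expected in enumerate(expected_parts):
        is_last = i == len(expected_parts) - 1
        end = len(hypothesis) if is_last else start + len(expected)
        parts.append(hypothesis[start:end])
        start = end
    return parts


# Hypothetical expected response made of two independent parts.
expected_parts = [["one", "two"], ["aisle", "five"]]
hypothesis = ["one", "two", "aisle", "nine"]

hyp_parts = split_hypothesis(hypothesis, expected_parts)
# Each hypothesis part is compared only with its own expected part.
matches = [hyp == exp for hyp, exp in zip(hyp_parts, expected_parts)]
```

Here the first part matches and the second does not, so any adjustment could still be applied to the first part even though the response as a whole differs from what was expected.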
30 Claims
1. A method for accepting or rejecting hypothesis words in a hypothesis part using an adjustable acceptance threshold as part of a speech recognition system, the method comprising:
receiving a single speech input from a user, the speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part;
processing the single speech input to generate a single hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words, and each hypothesis word having a corresponding confidence score;
independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively;
adjusting an acceptance threshold for each hypothesis word in the first hypothesis part if the first hypothesis part matches word-for-word the first expected response part, otherwise not adjusting the acceptance threshold for each hypothesis word in the first hypothesis part, and independently adjusting an acceptance threshold for each hypothesis word in the second hypothesis part if the second hypothesis part matches word-for-word the second expected response part, otherwise not adjusting the acceptance threshold for each hypothesis word in the second hypothesis part;
comparing the confidence score for each hypothesis word in each of the first hypothesis part and second hypothesis part to its acceptance threshold; and
accepting or rejecting each hypothesis word in each of the first hypothesis part and the second hypothesis part based on the results of the comparison.
Dependent claims: 2, 3, 4, 5, 6
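The per-part threshold logic of claim 1 might look like the following Python sketch. The claim says only that the threshold is "adjusted" on a word-for-word match; lowering it is an assumption made here, and `BASE_THRESHOLD`, `MATCH_ADJUSTMENT`, `Word`, and `accept_part` are hypothetical names and values, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

BASE_THRESHOLD = 0.70    # hypothetical default acceptance threshold
MATCH_ADJUSTMENT = 0.20  # hypothetical amount; lowering on a match is an assumption


@dataclass
class Word:
    text: str
    confidence: float


def accept_part(part: List[Word], expected: List[str]) -> List[Tuple[str, bool]]:
    """Accept or reject each word of one hypothesis part.

    The threshold is adjusted only when this part matches its own expected
    part word for word; other parts do not influence the decision.
    """
    matched = [w.text for w in part] == expected
    threshold = BASE_THRESHOLD - MATCH_ADJUSTMENT if matched else BASE_THRESHOLD
    return [(w.text, w.confidence >= threshold) for w in part]


# First part matches its expected part, so its threshold is adjusted.
first = accept_part([Word("one", 0.60), Word("two", 0.90)], ["one", "two"])
# Second part does not match, so the unadjusted threshold applies.
second = accept_part([Word("aisle", 0.60), Word("nine", 0.90)], ["aisle", "five"])
```

In this example a word with confidence 0.60 is accepted in the matching first part and rejected in the non-matching second part, which is the independence between parts that the claim describes.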
7. A method for marking hypothesis words in a hypothesis part as suitable for adaptation in a speech recognition system, the method comprising:
receiving a single speech input from a user with a speech recognition system comprising a microphone, processor, and memory, the speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part;
processing the single speech input to generate a hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words;
independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively; and
marking each hypothesis word in the first or the second hypothesis part as suitable for adaptation if the first or the second hypothesis part matches word-for-word the first or the second expected response part, otherwise not marking any hypothesis word in the first or the second hypothesis part as suitable for adaptation.
Dependent claims: 8, 9, 10, 11
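As a rough illustration of the marking step in claim 7, the sketch below flags every word of a hypothesis part as suitable for adaptation only when that part matches its expected part word for word. It assumes the hypothesis has already been divided into parts; the function name `mark_for_adaptation` and the sample data are hypothetical.

```python
from typing import List, Tuple


def mark_for_adaptation(hyp_parts: List[List[str]],
                        expected_parts: List[List[str]]) -> List[Tuple[str, bool]]:
    """Return (word, suitable_for_adaptation) pairs for every hypothesis word.

    A word is marked only if its whole part matches the corresponding
    expected part word for word; parts are judged independently.
    """
    marked = []
    for hyp, expected in zip(hyp_parts, expected_parts):
        suitable = hyp == expected
        marked.extend((word, suitable) for word in hyp)
    return marked


marks = mark_for_adaptation(
    [["one", "two"], ["aisle", "nine"]],   # hypothesis parts (hypothetical)
    [["one", "two"], ["aisle", "five"]],   # expected response parts (hypothetical)
)
```

Note that "aisle" is left unmarked even though it equals the expected word at its position, because the claim conditions marking on the whole part matching.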
12. A speech recognition system configured to adjust acceptance thresholds for words in a hypothesis part, comprising:
a storage medium for storing information and processor-executable instructions;
a microphone for receiving speech input from a user;
a computing device comprising a processor communicatively coupled to the storage medium, the processor configured by the processor-executable instructions to perform the steps of;
(i) receiving the single speech input from the microphone, the speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part,
(ii) processing the single speech input to determine a single hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words, the single hypothesis being stored on the storage medium,
(iii) independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively, and
(iii) if the first or the second hypothesis part matches the first or the second expected response part, then adjusting acceptance thresholds for hypothesis words in the first or the second hypothesis part.
Dependent claims: 13, 14, 15, 16, 17
18. A speech recognition system configured to mark words in a hypothesis part as suitable for adaptation, comprising:
a storage medium for storing information and processor-executable instructions;
a microphone for receiving speech input from a user;
a computing device comprising a processor communicatively coupled to the storage medium, the processor configured by the processor-executable instructions to perform the steps of;
(i) receiving a single speech input from the microphone, the speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part,
(ii) processing the single speech input to determine a hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words, the hypothesis being stored on the storage medium,
(iii) independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively; and
(iii) if the first or the second hypothesis part matches the first or the second expected response part, then marking the hypothesis words in the first or the second hypothesis part as suitable for adaptation.
Dependent claims: 19, 20, 21
22. A method for counting errors in a speech recognition system, the method comprising:
receiving a single speech input from a user with a speech recognition system comprising a microphone, processor, and memory, the single speech input comprising a first speech input part and a second speech input part, the first speech input part and the second speech input part each having information independent from the other speech input part;
processing the single speech input to generate a hypothesis comprising a sequence of a first hypothesis part corresponding to the first input part and a second hypothesis part corresponding to the second input part, each of the first hypothesis part and the second hypothesis part having one or more hypothesis words, and each hypothesis word having a corresponding confidence score;
independently comparing each of the first hypothesis part and the second hypothesis part with a first expected response part and a second expected response part, respectively, the first expected response part and the second expected response part having information different and independent from the other expected response part, and the first expected response part being independently compared with the first hypothesis part and the second expected response part being independently compared with the second hypothesis part, the first expected response part and the second response part being stored in the memory, and using boundaries between the first or the second expected response parts to determine boundaries between the first or the second hypothesis parts respectively;
analyzing the first or the second hypothesis part for recognition errors and/or correct recognitions; and
adding the number of recognition errors to an error count and adding the number of correct recognitions to a correct count, the error count and correct count representing a running total of recognition errors and correct recognitions respectively.
Dependent claims: 23, 24, 25, 26, 27, 28, 29, 30
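The running totals of claim 22 can be sketched in Python as below. The claim does not say how a hypothesis part is analyzed for errors, so the rule used here is an assumption: words are compared position by position against the expected part, and any mismatch or length difference counts as an apparent error. `RecognitionCounter` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognitionCounter:
    """Running totals of apparent recognition errors and correct recognitions."""
    error_count: int = 0
    correct_count: int = 0

    def update(self, hyp_part: List[str], expected_part: List[str]) -> None:
        # Assumed rule (not from the source): position-wise comparison;
        # missing or extra words are counted as errors.
        correct = sum(h == e for h, e in zip(hyp_part, expected_part))
        errors = max(len(hyp_part), len(expected_part)) - correct
        self.correct_count += correct
        self.error_count += errors


counter = RecognitionCounter()
# Each hypothesis part is analyzed against its own expected part.
counter.update(["one", "two"], ["one", "two"])
counter.update(["aisle", "nine"], ["aisle", "five"])
```

After both parts are processed the counter holds three correct recognitions and one apparent error, and later utterances would keep adding to the same totals.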
Specification