Method, system and computer program for enhanced speech recognition of digits input strings
First Claim
1. A method for speech recognition, the method comprising:
for an expected input string comprising a plurality of expected string segments comprising a first expected string segment and a second expected string segment, receiving a first speech segment for the first expected string segment and a second speech segment for the second expected string segment, wherein the first speech segment is different from the second speech segment;
performing speech recognition separately on the first speech segment and the second speech segment, wherein said performing speech recognition comprises generating a first segment n-best list and a second segment n-best list, the first segment n-best list comprising n highest confidence score results of said speech recognition on the first speech segment and the second segment n-best list comprising n highest confidence score results of said speech recognition on the second speech segment, where n comprises at least one integer;
generating a global n-best list corresponding to said expected input string, wherein the global n-best list comprises a plurality of results each generated at least in part by combining a result from said first segment n-best list with a result from said second segment n-best list; and
determining a final global speech recognition result corresponding to said expected input string, wherein said determining said final global speech recognition result comprises pruning results of said global n-best list utilizing a pruning criterion.
Abstract
The present invention proposes a method, system and computer program for speech recognition. According to one embodiment, a method is provided wherein, for an expected input string divided into a plurality of expected string segments, a speech segment is received for each expected string segment. Speech recognition is then performed separately on each said speech segment via the generation, for each said speech segment, of a segment n-best list comprising n highest confidence score results. A global n-best list is then generated corresponding to the expected input string utilizing the segment n-best lists and a final global speech recognition result corresponding to said expected input string is determined via the pruning of the results of the global n-best list utilizing a pruning criterion.
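The flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the product-of-confidences rule for scoring a combined result, and the lookup-set pruning criterion are all assumptions made for illustration, since this section does not specify how scores are combined or which pruning criterion is used.

```python
from itertools import product


def global_n_best(segment_lists, n):
    """Combine per-segment n-best lists into a global n-best list.

    Each segment list holds (recognized_text, confidence) pairs.
    ASSUMPTION: a combined result is scored by multiplying the
    confidences of its parts; the source does not state the rule.
    """
    candidates = []
    for combo in product(*segment_lists):
        text = "".join(result for result, _ in combo)
        score = 1.0
        for _, confidence in combo:
            score *= confidence
        candidates.append((text, score))
    # Keep the n combined results with the highest score.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:n]


def prune(global_list, criterion):
    """Drop global results that fail the pruning criterion."""
    return [c for c in global_list if criterion(c[0])]


def recognize(segment_lists, n, criterion):
    """Return the best surviving global result, or None if all are pruned."""
    survivors = prune(global_n_best(segment_lists, n), criterion)
    return survivors[0] if survivors else None


# Hypothetical segment n-best lists for an 8-digit string spoken in two parts.
first_segment = [("1234", 0.9), ("1284", 0.6)]
second_segment = [("5670", 0.7), ("5678", 0.5)]

# ASSUMPTION: the pruning criterion is membership in a set of known-valid strings.
valid_strings = {"12345678"}

final = recognize(
    [first_segment, second_segment],
    n=4,
    criterion=lambda s: s in valid_strings,
)
print(final[0])  # 12345678
```

In this example the highest-scoring combination, "12345670", is pruned because it fails the assumed validity check, so the second-ranked combination becomes the final global result. This is the kind of correction that pruning over a global list allows and that picking the top result of each segment independently would miss.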
20 Claims
1. A method for speech recognition, the method comprising:
for an expected input string comprising a plurality of expected string segments comprising a first expected string segment and a second expected string segment, receiving a first speech segment for the first expected string segment and a second speech segment for the second expected string segment, wherein the first speech segment is different from the second speech segment;
performing speech recognition separately on the first speech segment and the second speech segment, wherein said performing speech recognition comprises generating a first segment n-best list and a second segment n-best list, the first segment n-best list comprising n highest confidence score results of said speech recognition on the first speech segment and the second segment n-best list comprising n highest confidence score results of said speech recognition on the second speech segment, where n comprises at least one integer;
generating a global n-best list corresponding to said expected input string, wherein the global n-best list comprises a plurality of results each generated at least in part by combining a result from said first segment n-best list with a result from said second segment n-best list; and
determining a final global speech recognition result corresponding to said expected input string, wherein said determining said final global speech recognition result comprises pruning results of said global n-best list utilizing a pruning criterion.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. One or more machine-readable storage devices having stored therein a program product, which, when executed by a set of one or more processors, causes the set of one or more processors to perform a method comprising:
for an expected input string comprising a plurality of expected string segments comprising a first expected string segment and a second expected string segment, receiving a first speech segment for the first expected string segment and a second speech segment for the second expected string segment, wherein the first speech segment is different from the second speech segment;
performing speech recognition separately on the first speech segment and the second speech segment, wherein said performing speech recognition comprises generating a first segment n-best list and a second segment n-best list, the first segment n-best list comprising n highest confidence score results of said speech recognition on the first speech segment, and the second segment n-best list comprising n highest confidence score results of said speech recognition on the second speech segment, where n comprises at least one integer;
generating a global n-best list corresponding to said expected input string, wherein the global n-best list comprises a plurality of results each generated at least in part by combining a result from said first segment n-best list with a result from said second segment n-best list; and
determining a final global speech recognition result corresponding to said expected input string, wherein said determining said final global speech recognition result comprises pruning results of said global n-best list utilizing a pruning criterion.
Dependent claims: 16, 17
18. A system for speech recognition comprising:
a set of one or more processors;
a memory unit coupled with the set of one or more processors; and
a speech recognition unit operable to,
for an expected input string comprising a plurality of expected string segments comprising a first expected string segment and a second expected string segment, receive a first speech segment for the first expected string segment and a second speech segment for the second expected string segment, wherein the first speech segment is different from the second speech segment;
perform speech recognition separately on said first speech segment and said second speech segment, wherein said speech recognition comprises generating a first segment n-best list and a second segment n-best list, the first segment n-best list comprising n highest confidence score results of said speech recognition on the first speech segment, and the second segment n-best list comprising n highest confidence score results of said speech recognition on the second speech segment, where n comprises at least one integer;
generating a global n-best list corresponding to said expected input string, wherein the global n-best list comprises a plurality of results each generated at least in part by combining a result from said first segment n-best list with a result from said second segment n-best list; and
determine a final global speech recognition result corresponding to said expected input string, wherein determining said final global speech recognition result comprises pruning results of said global n-best list utilizing a pruning criterion.
Dependent claims: 19, 20
Specification