Method, system and computer program for enhanced speech recognition of digits input strings

US 8,589,162 B2
Filed: 09/19/2008
Issued: 11/19/2013
Est. Priority Date: 09/19/2007
Status: Active Grant

Abstract

The present invention proposes a method, system and computer program for speech recognition. According to one embodiment, a method is provided wherein, for an expected input string divided into a plurality of expected string segments, a speech segment is received for each expected string segment. Speech recognition is then performed separately on each said speech segment via the generation, for each said speech segment, of a segment n-best list comprising n highest confidence score results. A global n-best list is then generated corresponding to the expected input string utilizing the segment n-best lists and a final global speech recognition result corresponding to said expected input string is determined via the pruning of the results of the global n-best list utilizing a pruning criterion.
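
The flow described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Hypothesis` tuple, the function names, the integer confidence scores, the additive scoring, and the even-digit-sum pruning rule are all assumptions chosen to keep the example small.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# Assumed representation: (recognized text, integer confidence score).
Hypothesis = Tuple[str, int]


def global_n_best(segment_lists: Sequence[Sequence[Hypothesis]]) -> List[Hypothesis]:
    """Combine one hypothesis from each segment n-best list into full-string candidates."""
    combined = []
    for combo in product(*segment_lists):
        text = "".join(h[0] for h in combo)
        score = sum(h[1] for h in combo)  # assumption: segment scores are additive
        combined.append((text, score))
    # Highest combined score first.
    combined.sort(key=lambda h: h[1], reverse=True)
    return combined


def final_result(
    segment_lists: Sequence[Sequence[Hypothesis]],
    keep: Callable[[str], bool],
) -> Optional[Hypothesis]:
    """Prune the global list with a caller-supplied criterion and return the best survivor."""
    survivors = [h for h in global_n_best(segment_lists) if keep(h[0])]
    return survivors[0] if survivors else None


# Hypothetical recognizer output for a string spoken as two segments.
# The two lists deliberately have different lengths.
first = [("12", 90), ("72", 60)]
second = [("35", 80), ("39", 70), ("34", 20)]


def even_digit_sum(s: str) -> bool:
    """Placeholder pruning criterion: digits must sum to an even number."""
    return sum(int(c) for c in s) % 2 == 0


best = final_result([first, second], even_digit_sum)
print(best)
```

In this toy run the top-scoring combination fails the placeholder criterion, so a lower-scoring candidate is returned. That is the point of pruning: a validity check on the whole string can override per-segment confidence.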

20 Claims

1. A method for speech recognition, the method comprising:
- for an expected input string comprising a plurality of expected string segments comprising a first expected string segment and a second expected string segment, receiving a first speech segment for the first expected string segment and a second speech segment for the second expected string segment, wherein the first speech segment is different from the second speech segment;
  
  performing speech recognition separately on the first speech segment and the second speech segment, wherein said performing speech recognition comprises generating a first segment n-best list and a second segment n-best list, the first segment n-best list comprising n highest confidence score results of said speech recognition on the first speech segment and the second segment n-best list comprising n highest confidence score results of said speech recognition on the second speech segment, where n comprises at least one integer;
  
  generating a global n-best list corresponding to said expected input string, wherein the global n-best list comprises a plurality of results each generated at least in part by combining a result from said first segment n-best list with a result from said second segment n-best list; and
  
  determining a final global speech recognition result corresponding to said expected input string, wherein said determining said final global speech recognition result comprises pruning results of said global n-best list utilizing a pruning criterion.
- - 2. The method according to claim 1, wherein receiving the first speech segment for the first expected string segment and receiving the second speech segment for the second expected string segment further comprises:
    - receiving the first speech segment corresponding to the first expected string segment from a speaker; and
      
      prompting the speaker to speak the second speech segment corresponding to the second expected string segment.
  - 3. The method according to claim 2, further comprising:
    - receiving data representing a length of at least one expected string segment of said plurality of expected string segments from said speaker; and
      
      restricting, utilizing said data, a grammar analysis of a speech recognition on said at least one expected string segment.
  - 4. The method according to claim 1, wherein performing speech recognition comprises:
    - performing a grammar analysis speech recognition utilizing a determined maximum length of the expected input string.
  - 5. The method according to claim 1, wherein performing speech recognition comprises:
    - performing a grammar analysis speech recognition utilizing a determined exact length of the expected input string.
  - 6. The method according to claim 1, further comprising:
    - receiving a single continuous speech segment input corresponding to said expected input string from a speaker;
      
      determining one or more time positions within said single continuous speech segment input utilizing a signal received from said speaker to indicate one or more input speech segments; and
      
      dividing said single continuous speech segment input into a plurality of input speech segments utilizing said one or more time positions.
  - 7. The method according to claim 1, further comprising:
    - determining a weight for each result in said global n-best list, wherein said determining said weight comprises calculating each said weight utilizing a plurality of weights associated with corresponding segment n-best lists results composing said each result in said global n-best list.
  - 8. The method according to claim 7, wherein calculating each said weight comprises:
    - summing said plurality of weights associated with said corresponding segment n-best lists results.
  - 9. The method according to claim 1, further comprising:
    - prompting a speaker to repeat a speech segment in response to a determination that speech recognition performed on said speech segment fails to meet a predetermined accuracy threshold.
  - 10. The method according to claim 1, further comprising:
    - determining an accuracy level of said final global speech recognition result utilizing user input.
  - 11. The method according to claim 1, wherein the expected input string corresponds to a credit card number.
  - 12. The method according to claim 11, wherein the pruning criterion comprises the Luhn algorithm.
  - 13. The method according to claim 1, wherein the first segment n-best list comprises a first number of results and the second segment n-best list comprises a second number of results, and the first number is different from the second number.
  - 14. The method according to claim 1, wherein generating a global n-best list comprises:
    - generating a first result of the global n-best list by concatenating a first result of the first segment n-best list with a second result of the second segment n-best list; and
      
      generating a second result of the global n-best list by concatenating the first result of the first segment n-best list with a third result of the second segment n-best list.
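
Claims 11 and 12 name the Luhn algorithm as a pruning criterion for credit card numbers. A minimal sketch of that check follows. The function name `luhn_valid`, the candidate list, and its weights are hypothetical; the weights are assumed to have already been summed from the segment results, as in claim 8.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the doubled value
        total += d
    return total % 10 == 0


# Hypothetical global n-best list: (candidate string, summed weight),
# already ordered from highest to lowest weight.
candidates = [
    ("4111111111111112", 170),
    ("4111111111111111", 160),
    ("4111111111111117", 150),
]

# Pruning: discard every candidate that fails the checksum.
pruned = [c for c in candidates if luhn_valid(c[0])]
print(pruned)
```

Here the highest-weight candidate differs from a valid number by one digit and is discarded, so the second candidate becomes the final result.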

15. One or more machine-readable storage devices having stored therein a program product, which, when executed by a set of one or more processors, causes the set of one or more processors to perform a method comprising:
- for an expected input string comprising a plurality of expected string segments comprising a first expected string segment and a second expected string segment, receiving a first speech segment for the first expected string segment and a second speech segment for the second expected string segment, wherein the first speech segment is different from the second speech segment;
  
  performing speech recognition separately on the first speech segment and the second speech segment, wherein said performing speech recognition comprises generating a first segment n-best list and a second segment n-best list, the first segment n-best list comprising n highest confidence score results of said speech recognition on the first speech segment, and the second segment n-best list comprising n highest confidence score results of said speech recognition on the second speech segment, where n comprises at least one integer;
  
  generating a global n-best list corresponding to said expected input string, wherein the global n-best list comprises a plurality of results each generated at least in part by combining a result from said first segment n-best list with a result from said second segment n-best list; and
  
  determining a final global speech recognition result corresponding to said expected input string, wherein said determining said final global speech recognition result comprises pruning results of said global n-best list utilizing a pruning criterion.
- - 16. The one or more machine-readable storage devices according to claim 15, wherein generating a global n-best list comprises:
    - generating a first result of the global n-best list by concatenating a first result of the first segment n-best list with a second result of the second segment n-best list; and
      
      generating a second result of the global n-best list by concatenating the first result of the first segment n-best list with a third result of the second segment n-best list.
  - 17. The one or more machine-readable storage devices according to claim 15, wherein the first segment n-best list comprises a first number of results and the second segment n-best list comprises a second number of results, and the first number is different from the second number.

18. A system for speech recognition comprising:
- a set of one or more processors;
  
  a memory unit coupled with the set of one or more processors; and
  
  a speech recognition unit operable to, for an expected input string comprising a plurality of expected string segments comprising a first expected string segment and a second expected string segment, receive a first speech segment for the first expected string segment and a second speech segment for the second expected string segment, wherein the first speech segment is different from the second speech segment;
  
  perform speech recognition separately on said first speech segment and said second speech segment, wherein said speech recognition comprises generating a first segment n-best list and a second segment n-best list, the first segment n-best list comprising n highest confidence score results of said speech recognition on the first speech segment, and the second segment n-best list comprising n highest confidence score results of said speech recognition on the second speech segment, where n comprises at least one integer;
  
  generate a global n-best list corresponding to said expected input string, wherein the global n-best list comprises a plurality of results each generated at least in part by combining a result from said first segment n-best list with a result from said second segment n-best list; and
  
  determine a final global speech recognition result corresponding to said expected input string, wherein determining said final global speech recognition result comprises pruning results of said global n-best list utilizing a pruning criterion.
- - 19. The system according to claim 18, wherein the first segment n-best list comprises a first number of results and the second segment n-best list comprises a second number of results, and the first number is different from the second number.
  - 20. The system according to claim 18, wherein generating a global n-best list comprises:
    - generating a first result of the global n-best list by concatenating a first result of the first segment n-best list with a second result of the second segment n-best list; and
      
      generating a second result of the global n-best list by concatenating the first result of the first segment n-best list with a third result of the second segment n-best list.
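
Claim 6 describes dividing a single continuous input into segments at time positions indicated by a signal from the speaker. One way this might look is sketched below. It assumes the audio is a flat sequence of samples and that the speaker's signal has already been converted to timestamps in seconds; `split_at_times` and its parameters are hypothetical names.

```python
from typing import List, Sequence


def split_at_times(
    samples: Sequence[int],
    sample_rate: int,
    cut_times: Sequence[float],
) -> List[Sequence[int]]:
    """Split a continuous recording into segments at the given time positions."""
    # Convert each time position to a sample index, clamped to the recording.
    cuts = sorted(
        min(max(round(t * sample_rate), 0), len(samples)) for t in cut_times
    )
    bounds = [0] + cuts + [len(samples)]
    # Each consecutive pair of bounds delimits one input speech segment.
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]


# Toy recording: 10 samples at 10 samples per second, with the speaker
# signaling segment boundaries at 0.3 s and 0.7 s.
recording = list(range(10))
segments = split_at_times(recording, sample_rate=10, cut_times=[0.3, 0.7])
print(segments)
```

Each resulting segment could then be recognized separately, producing the per-segment n-best lists that the independent claims combine.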

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Lejeune, Remi; Crepy, Hubert
Primary Examiner(s): Armstrong, Angela A
Application Number: US12/234,176
Publication Number: US 20090125306A1
Time in Patent Office: 1,887 Days
Field of Search: 704/231, 704/251, 704/254
US Class Current: 704/254
CPC Class Codes: G10L 15/08 Speech classification or se...
