HANDWRITING-BASED USER INTERFACE FOR CORRECTION OF SPEECH RECOGNITION ERRORS

US 20090228273A1
Filed: 03/05/2008
Published: 09/10/2009
Est. Priority Date: 03/05/2008
Status: Abandoned Application

Abstract

A speech recognition result is displayed for review by a user. If it is incorrect, the user provides pen-based editing marks. An error type and location (within the speech recognition result) are identified based on the pen-based editing marks. An alternative result template is generated, and an N-best alternative list is also generated by applying the template to intermediate recognition results from an automatic speech recognizer. The N-best alternative list is output for use in correcting the speech recognition results.
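
As an illustration of the first steps in the abstract, the following minimal sketch shows how an identified error type and location might be applied to a token sequence. The names `ErrorMark` and `apply_correction`, and the three error-type labels (`"replace"`, `"delete"`, `"insert"`), are assumptions made for this example; the source does not enumerate error types or name any data structures.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ErrorMark:
    """Hypothetical result of interpreting a pen-based editing mark."""
    error_type: str  # assumed labels: "replace", "delete", or "insert"
    start: int       # index of the first marked token
    end: int         # index one past the last marked token


def apply_correction(tokens: Sequence[str], mark: ErrorMark,
                     alternative: Sequence[str]) -> List[str]:
    """Return a revised token sequence for one marked error."""
    if not 0 <= mark.start <= mark.end <= len(tokens):
        raise ValueError("mark lies outside the token sequence")
    head, tail = list(tokens[:mark.start]), list(tokens[mark.end:])
    if mark.error_type == "replace":
        # Marked tokens are wrong: swap in the alternative tokens.
        return head + list(alternative) + tail
    if mark.error_type == "delete":
        # Marked tokens are spurious: drop them.
        return head + tail
    if mark.error_type == "insert":
        # Tokens are missing at `start`: insert without removing anything.
        return head + list(alternative) + list(tokens[mark.start:])
    raise ValueError(f"unknown error type: {mark.error_type!r}")
```

For instance, under these assumptions, `apply_correction(["the", "cat"], ErrorMark("insert", 1, 1), ["black"])` yields the revised sequence `["the", "black", "cat"]`.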

20 Claims

1. A method of correcting speech recognition result output by a speech recognizer, comprising:
- displaying the speech recognition result as a sequence of tokens on a user interface display;
  
  receiving editing marks on the displayed speech recognition result, input by a user, through the user interface display;
  
  identifying an error type and error position within the speech recognition result based on the editing marks; and
  
  replacing tokens in the speech recognition result, marked by the editing marks as being incorrect, with alternative tokens, based on the error type and error position identified, to obtain a revised speech recognition result; and
  
  outputting the revised speech recognition result for display on the user interface display.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The method of claim 1 wherein identifying an error type and error position comprises:
    - performing handwriting recognition on symbols in the editing marks to identify a type of error represented by the symbols; and
      
      identifying a position in the speech recognition result that the editing marks occur to identify the error position.
  - 3. The method of claim 2 and further comprising:
    - prior to replacing tokens, generating a list of alternative tokens based on the error type and error position.
  - 4. The method of claim 3 wherein generating a list of alternative tokens, comprises:
    - generating a template indicative of a structure of alternative speech recognition results that are hypothesis error corrections for the speech recognition result.
  - 5. The method of claim 4 wherein the speech recognizer generates a plurality of intermediate recognition results prior to outputting the speech recognition result, and wherein generating a list of alternative tokens further comprises:
    - comparing the template against the intermediate recognition results, generated for a position in the speech recognition result that corresponds to the error position, to identify as the list of alternative tokens, a list of intermediate recognition results that match the template.
  - 6. The method of claim 5 and further comprising:
    - generating a posterior probability confidence measure for each of the intermediate recognition results; and
      
      ranking the list of intermediate recognition results in order of the confidence measure.
  - 7. The method of claim 6 wherein the speech recognizer generates language model scores and acoustic model scores for each of the intermediate recognition results and wherein generating the posterior probability confidence measure comprises:
    - generating the posterior probability confidence measure based on the acoustic model scores and language model scores for each of the intermediate recognition results.
  - 8. The method of claim 6 wherein replacing tokens comprises:
    - automatically replacing the tokens in the speech recognition result with a top ranked intermediate recognition result from the ranked list of intermediate recognition results.
  - 9. The method of claim 8 and further comprising:
    - displaying, as the revised speech recognition result, the speech recognition result with tokens replaced by the top ranked intermediate recognition result;
      
      displaying the ranked list of intermediate recognition results;
      
      if the revised speech recognition result is incorrect, receiving a user selection, through the user interface display, of a correct one of the intermediate recognition results in the ranked list; and
      
      displaying the speech recognition result as the correct one of the intermediate recognition results.
  - 10. The method of claim 9 and further comprising:
    - if none of the intermediate recognition results in the ranked list is correct, receiving a user handwriting input of the correct speech recognition result;
      
      performing handwriting recognition on the user handwriting input to obtain a handwriting recognition result; and
      
      displaying as the revised speech recognition result, the handwriting recognition result.
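
Claims 4 through 7 describe matching a template against intermediate recognition results and ranking the matches by a posterior probability confidence measure built from acoustic and language model scores. The sketch below is one possible reading, not the method of the application. It assumes that a template is a fixed-length sequence of slots where `None` is a one-token wildcard, that scores are log probabilities, and that the posterior is a softmax normalized over the matching candidates only. `Hypothesis`, `matches`, `n_best`, and `lm_weight` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Hypothesis:
    """Hypothetical intermediate recognition result with its two scores."""
    tokens: Tuple[str, ...]
    acoustic_logprob: float
    lm_logprob: float


# Assumed template form: a string slot must match exactly, None matches any token.
Template = Sequence[Optional[str]]


def matches(template: Template, tokens: Sequence[str]) -> bool:
    """True if the tokens fit the template slot by slot."""
    return len(template) == len(tokens) and all(
        slot is None or slot == tok for slot, tok in zip(template, tokens))


def n_best(template: Template, hypotheses: Sequence[Hypothesis], n: int,
           lm_weight: float = 1.0) -> List[Tuple[Hypothesis, float]]:
    """Return up to n matching hypotheses with posteriors, best first."""
    candidates = [h for h in hypotheses if matches(template, h.tokens)]
    if not candidates:
        return []
    # Combine acoustic and language model log scores (weighting is an assumption).
    scores = [h.acoustic_logprob + lm_weight * h.lm_logprob for h in candidates]
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    ranked = sorted(zip(candidates, (w / total for w in weights)),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:n]
```

A real recognizer would more likely normalize over every path in its lattice rather than over the matching candidates alone; the narrower normalization here only keeps the example short.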

11. A user interface system used for performing correction of speech recognition results generated by a speech recognizer, comprising:
- a user interface display displaying a speech recognition result;
  
  a user interface component configured to receive through the user interface display, handwritten editing marks on the speech recognition result and being indicative of an error type of an error located at an error position in the speech recognition result where the handwritten editing mark is made;
  
  a template generator generating a template indicative of alternative speech recognition results based on the error type and error position;
  
  an N-best alternative generator configured to identify intermediate speech recognition results output by the speech recognizer that match the template and to score each matching intermediate speech recognition result to obtain an N-best list of alternatives comprising the N-best scoring intermediate speech recognition results that match the template; and
  
  an error correction component configured to generate a revised speech recognition result by revising the speech recognition result with one of the N-best alternatives and to display the revised speech recognition result on the user interface display.
- Dependent claims (12, 13, 14, 15)
  - 12. The user interface system of claim 11 and further comprising:
    - a handwriting recognition component configured to identify the error type based on symbols in the handwritten editing marks.
  - 13. The user interface system of claim 11 wherein the error correction component is configured to automatically generate the revised speech recognition result using a top ranked one of the N-best alternatives.
  - 14. The user interface system of claim 12 wherein the error correction component is configured to generate the revised speech recognition result using a user selected one of the N-best alternatives.
  - 15. The user interface system of claim 12 wherein the handwriting recognition component receives a handwriting input indicative of a handwritten correction of the displayed speech recognition result and generates a handwriting recognition result based on the handwritten correction, and wherein the error correction component is configured to generate the revised speech recognition result using the handwriting recognition result.

16. A method of correcting a speech recognition result displayed on a touch sensitive user interface display, comprising:
- receiving a handwritten input identifying an error type and error position of an error in the speech recognition result, through the touch sensitive user interface display;
  
  generating a list of alternatives for the speech recognition result at the error position; and
  
  performing error correction by:
  
  automatically generating a revised speech recognition result using a first alternative in the list and displaying the revised speech recognition result;
  
  displaying the list of alternatives, and, if the revised speech recognition result is incorrect, receiving a user selection of a correct one of the alternatives and displaying the revised speech recognition result using the selected correct alternative, and
  
  if a user input is received indicative of there being no correct alternative in the list, receiving a user handwriting input indicative of a user written correction of the error, performing handwriting recognition on the user handwriting input to generate a handwriting recognition result and displaying the revised speech recognition result using the handwriting recognition result.
- Dependent claims (17, 18, 19, 20)
  - 17. The method of claim 16 wherein generating a list of alternatives comprises:
    - generating an alternative template identifying a structure of alternative results used to correct the speech recognition result; and
      
      matching the template against intermediate speech recognition results output by a speech recognition system to identify a list of matching alternatives;
      
      calculating a posterior probability score for each of the matching alternatives; and
      
      ranking the matching alternatives based on the score to obtain a ranked list of a top N scoring alternatives.
  - 18. The method of claim 16 and further comprising:
    - performing handwriting recognition on the handwritten input to identify the error type and error position.
  - 19. The method of claim 18 wherein the user interface display comprises a touch sensitive screen, and wherein the handwritten input comprises pen-based editing inputs on the speech recognition result displayed on the touch sensitive screen.
  - 20. The method of claim 17 wherein calculating comprises:
    - calculating the posterior probability score using language model scores and acoustic model scores generated for the intermediate speech recognition results by the speech recognition system.
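
The three-stage error correction in claim 16 (automatic use of the first alternative, user selection from the list, then handwritten correction) can be pictured as a simple fallback cascade. This is a sketch under stated assumptions: the function `correct_with_fallback` and its callback parameters are hypothetical stand-ins for the user interactions, and real code would be driven by display events rather than blocking calls.

```python
from typing import Callable, List, Optional


def correct_with_fallback(
    alternatives: List[str],
    is_correct: Callable[[str], bool],
    choose_from_list: Callable[[List[str]], Optional[str]],
    recognize_handwriting: Callable[[], str],
) -> str:
    """Return the revised result using the first stage that succeeds.

    is_correct            -- stands in for the user accepting a displayed result
    choose_from_list      -- stands in for the user picking an alternative;
                             returns None when no alternative is correct
    recognize_handwriting -- stands in for handwriting recognition of a
                             user-written correction
    """
    if alternatives:
        # Stage 1: automatically apply the first (top ranked) alternative.
        first = alternatives[0]
        if is_correct(first):
            return first
        # Stage 2: show the list and let the user select the correct entry.
        choice = choose_from_list(alternatives)
        if choice is not None:
            return choice
    # Stage 3: no listed alternative is correct, so use the handwritten input.
    return recognize_handwriting()
```

Passing the interactions in as callables keeps the control flow testable without a display; the ordering of the stages follows the claim text, while everything else about the interface is assumed.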

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Soong, Frank Kao-Ping; Wang, Lijuan
Application Number: US12/042,344
Publication Number: US 20090228273A1
Field of Search
US Class Current: 704/235
CPC Class Codes:
- G06F 3/04883   for inputting data by handw...
- G06F 40/232   Orthographic correction, e....
- G06V 30/1423   the instrument generating s...
- G10L 15/22   Procedures used during a sp...
