Dictation with incremental recognition of speech

US 9,361,883 B2
Filed: 05/01/2012
Issued: 06/07/2016
Est. Priority Date: 05/01/2012
Status: Active Grant

Abstract

A dictation module is described herein which receives and interprets a complete utterance of the user in incremental fashion, that is, one incremental portion at a time. The dictation module also provides rendered text in incremental fashion. The rendered text corresponds to the dictation module's interpretation of each incremental portion. The dictation module also allows the user to modify any part of the rendered text, as it becomes available. In one case, for instance, the dictation module provides a marking menu which includes multiple options by which a user can modify a selected part of the rendered text. The dictation module also uses the rendered text (as modified or unmodified by the user using the marking menu) to adjust one or more models used by the dictation model to interpret the user's utterance.
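
The flow described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name `ToyDictation`, its methods, and the word-count "model" are hypothetical, and the recognizer is a text placeholder rather than an acoustic decoder. The sketch only shows the order of operations: interpret one incremental portion, render it immediately, let the user modify any part, then adapt a model from the resulting text.

```python
from collections import Counter


class ToyDictation:
    """Hypothetical stand-in for the dictation module in the abstract."""

    def __init__(self):
        self.rendered = []            # rendered text, one entry per incremental portion
        self.word_counts = Counter()  # toy "model" adjusted from the final text

    def interpret(self, portion):
        # Placeholder recognizer: a real system would decode a speech signal.
        return portion.strip().lower()

    def receive(self, portion):
        # Interpret and render one portion before the utterance is complete.
        self.rendered.append(self.interpret(portion))
        return " ".join(self.rendered)

    def modify(self, index, replacement):
        # The user may modify any rendered part as soon as it is available.
        self.rendered[index] = replacement

    def adapt(self):
        # Use the rendered text (modified or not) to adjust the toy model.
        for part in self.rendered:
            self.word_counts.update(part.split())


dictation = ToyDictation()
dictation.receive("The quick")
dictation.receive(" brown fax")      # misrecognized portion
dictation.modify(1, "brown fox")     # user correction
dictation.adapt()
```

In this sketch, text that the user leaves untouched is treated as implicitly confirmed, which is one reading of "as modified or unmodified by the user".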

31 Citations

20 Claims

1. A method, performed by a computing system, for providing a dictating service, comprising:
- receiving a speech signal in response to vocalization, by a user, of an incremental portion of a complete utterance, the speech signal being from a microphone;
  
  interpreting the incremental portion based on the speech signal, to provide recognized speech, prior to the user finishing the complete utterance; and
  
  providing rendered text associated with the recognized speech on an output presentation displayed on a display screen prior to the user finishing the complete utterance, wherein providing the rendered text on the output presentation further comprises modifying a rate at which the rendered text is presented on the output presentation, the rate being modified based on a level of uncertainty associated with each part of the rendered text.
  - 2. The method of claim 1, wherein one or more incremental portions of the complete utterance correspond to respective words of the complete utterance.
  - 3. The method of claim 1, wherein one or more incremental portions of the complete utterance correspond to respective phrases of the complete utterance.
  - 4. The method of claim 1, wherein said interpreting comprises performing a search over a plurality of possible paths in a linguistic lattice, to find a most likely path, each path comprising a different interpretation of the incremental portion, wherein the search is commenced in response to detection of a pause in the vocalization of the complete utterance, the pause having a duration expressed by a silence threshold, and wherein the silence threshold is set to approximately zero.
  - 5. The method of claim 1, wherein said interpreting comprises treating each incremental portion as linguistically dependent on a preceding incremental portion, when a preceding incremental portion exists.
  - 6. The method of claim 1, wherein said interpreting comprises boosting a relevance of an output of a language model after a user chooses to modify an incremental portion.
  - 7. The method of claim 1, wherein said providing the rendered text comprises providing the rendered text in a character-by-character serial manner with a delay between characters to simulate operation of a manual typewriter, each character forming a part of a representation of the complete utterance.
  - 8. The method of claim 1, wherein said providing the rendered text comprises providing sound that simulates operation of a manual typewriter.
  - 9. The method of claim 1, wherein said interpreting is performed using an acoustic model and a language model, wherein the method further comprises modifying at least one of the acoustic model and the language model based on the rendered text, the rendered text being confirmed by the user as being a correct representation of the incremental portion by virtue of a decision by the user to either modify, or not to modify, the rendered text.
  - 10. The method of claim 1, further comprising providing at least one option to modify a selected part of the rendered text, the selected part of the rendered text being modifiable based on received input.
  - 11. The method of claim 10, wherein said providing the at least one option to modify the selected part of the rendered text comprises:
    - receiving a selection of a part of the rendered text, to provide the selected part; and
      
      providing a menu that includes a plurality of options regarding different respective ways in which the selected part is modifiable.
  - 12. The method of claim 11, wherein the menu corresponds to a marking menu, the marking menu either:
    - explicitly displaying the menu options;
      
      or implicitly enabling the menu options without explicitly displaying the menu options.
  - 13. The method of claim 11, wherein the plurality of options include at least one of:
    - an option to choose an alternative word or phrase to replace the selected part;
      
      an option to add a punctuation mark to the selected part;
      
      an option to add formatting to the selected part;
      
      an option to delete the selected part;
      
      an option to spell out the selected part by voice;
      
      an option to invoke a soft keyboard; and
      
      an option to re-speak the selected part.
  - 14. The method of claim 1, further comprising:
    - presenting a set of soft keys for use by the user in manually inputting text; and
      
      modifying at least one aspect of the set of soft keys based on the rendered text.
  - 15. The method of claim 1, wherein providing the rendered text on the output presentation further comprises presenting a symbol in proximity of a part of the rendered text responsive to the part of the rendered text being assessed to be uncertain, wherein the part of the rendered text is assessed to be uncertain based on the level of uncertainty associated with each part of the rendered text.
  - 16. The method of claim 1, wherein providing the rendered text on the output presentation further comprises modifying at least one of a color, a size, a font, or a transparency level at which the rendered text is presented on the output presentation based on the level of uncertainty associated with each part of the rendered text.
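
Claims 1 and 7 together describe text that appears character by character at a rate tied to the uncertainty of each part. The following is a minimal sketch under stated assumptions, not the patented implementation: the function names, the `base_delay` and `extra_delay` parameters, the linear mapping, and the choice that higher uncertainty means slower output are all hypothetical. The claims only say the rate is modified based on the level of uncertainty.

```python
import time


def char_schedule(parts, base_delay=0.03, extra_delay=0.20):
    """Yield (character, delay_in_seconds) pairs for the rendered text.

    `parts` is an iterable of (text, uncertainty) pairs, with uncertainty
    expected in [0, 1]. Assumption: more uncertain parts are shown more slowly.
    """
    for text, uncertainty in parts:
        u = min(max(uncertainty, 0.0), 1.0)   # clamp to [0, 1]
        delay = base_delay + extra_delay * u  # hypothetical linear mapping
        for ch in text:
            yield ch, delay


def render(parts, write=None, sleep=time.sleep):
    """Emit characters serially with a per-character delay, typewriter style."""
    if write is None:
        write = lambda ch: print(ch, end="", flush=True)
    for ch, delay in char_schedule(parts):
        sleep(delay)
        write(ch)


if __name__ == "__main__":
    # The confident part appears quickly; the uncertain part appears slowly.
    render([("recognize ", 0.1), ("speech", 0.9)])
    print()
```

Passing `write` and `sleep` as parameters keeps the timing logic separate from the display, so the schedule can be inspected without waiting on real delays.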

17. A computing system, comprising:
- at least one processing device; and
  
  memory that comprises computer readable instructions that, when executed by the at least one processing device, cause the at least one processing device to perform acts including:
  
  extracting features from a speech signal, the speech signal being received in response to vocalization, by a user, of an incremental portion of a complete utterance, the speech signal being from a microphone;
  
  interpreting the incremental portion based on the features extracted from the speech signal, prior to the user finishing the complete utterance, to provide recognized speech, the speech signal being acoustically interpreted using an acoustic model and linguistically interpreted using a language model; and
  
  providing rendered text associated with the recognized speech on an output presentation displayed on a display screen prior to the user finishing the complete utterance, wherein providing the rendered text on the output presentation further comprises modifying a rate at which the rendered text is presented on the output presentation, the rate being modified based on a level of uncertainty associated with each part of the rendered text.
  - 18. The computing system of claim 17, wherein the memory further comprises computer readable instructions that, when executed by the at least one processing device, cause the at least one processing device to perform acts including:
    - presenting a plurality of options for modifying a selected part of the rendered text in different respective ways, the plurality of options being presented in a form of a marking menu, the marking menu being placed in proximity to the selected part of the rendered text; and
      
      modifying at least one of the acoustic model or the language model based on the rendered text that is either unmodified or modified based on input received responsive to the plurality of options.

19. A computer readable storage device for storing computer readable instructions, the computer readable instructions providing a dictation module when executed by one or more processing devices, the computer readable instructions comprising:
- logic configured to present rendered text associated with a vocalization, by a user, of an incremental portion of a complete utterance on a display screen prior to the user finishing the complete utterance, said logic configured to present the rendered text further being configured to modify a rate at which the rendered text is presented, the rate being modified based on a level of uncertainty associated with the rendered text; and
  
  logic configured to present a marking menu on the display screen to the user that provides a plurality of options, the plurality of options giving the user an opportunity to modify any part of the rendered text in different respective ways.
  - 20. The computer readable storage device of claim 19, wherein the marking menu is a radial marking menu, and wherein the plurality of options are radially arrayed around the selected part of the rendered text.
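
The radial marking menu of claims 12 and 20 can be sketched as simple geometry. This is an illustrative sketch only: the function names, the choice to start at the top and proceed clockwise, and the four option labels (loosely taken from claim 13) are assumptions. `radial_positions` places options evenly on a circle around the selected text, and `pick_option` maps a stroke direction to an option, which is one way a menu could be used without being explicitly displayed.

```python
import math


def radial_positions(options, center, radius):
    """Place options evenly on a circle around `center` (screen coordinates,
    y grows downward). The first option sits at the top; order is clockwise.
    `options` must be non-empty."""
    cx, cy = center
    n = len(options)
    positions = []
    for i, label in enumerate(options):
        angle = 2 * math.pi * i / n
        x = cx + radius * math.sin(angle)
        y = cy - radius * math.cos(angle)
        positions.append((label, x, y))
    return positions


def pick_option(options, dx, dy):
    """Pick the option whose direction is closest to the stroke (dx, dy)."""
    n = len(options)
    # Angle measured clockwise from "up", normalized to [0, 2*pi).
    angle = math.atan2(dx, -dy) % (2 * math.pi)
    sector = 2 * math.pi / n
    return options[round(angle / sector) % n]


# Hypothetical option labels for illustration.
options = ["alternative", "punctuation", "delete", "re-speak"]
layout = radial_positions(options, center=(100, 100), radius=40)
choice = pick_option(options, dx=25, dy=0)  # a stroke to the right
```

Because selection depends only on stroke direction, the same `pick_option` logic serves both the displayed menu and the implicit, non-displayed mode.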

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Paek, Timothy S.; Lee, Bongshin; Hsu, Bo-June
Primary Examiner(s): He, Jialong
Application Number: US13/460,854
Publication Number: US 20130297307A1
Time in Patent Office: 1,498 Days
Field of Search: 704/251, 704/253, 704/256, 704/270
US Class Current: 1/1
CPC Class Codes:
G10L 15/22   Procedures used during a sp...
G10L 15/26   Speech to text systems G10L...
G10L 2015/221   Announcement of recognition...
