Common word graph based multimodal input

US 7,848,917 B2
Filed: 03/30/2006
Issued: 12/07/2010
Est. Priority Date: 03/30/2006
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

Multiple input modalities are selectively used by a user or process to prune a word graph. Pruning initiates rescoring in order to generate a new word graph with a revised best path.
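
The effect described in the abstract can be illustrated with a minimal sketch. It assumes a word graph stored as a list of `(from_node, to_node, word, probability)` edges whose node ids are topologically ordered; the edge layout, the `best_path` helper, and the example words are all hypothetical and do not come from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical edge layout: (from_node, to_node, word, probability).
Edge = Tuple[int, int, str, float]


def best_path(edges: List[Edge], start: int, end: int) -> Tuple[List[str], float]:
    """Return the highest-probability word sequence from start to end.

    Assumes every edge goes from a lower node id to a higher one, so a
    single pass over the edges sorted by source node is sufficient.
    """
    best: Dict[int, Tuple[float, List[str]]] = {start: (1.0, [])}
    for src, dst, word, prob in sorted(edges):
        if src not in best:
            continue  # source node not reachable from start
        score, words = best[src]
        candidate = (score * prob, words + [word])
        if dst not in best or candidate[0] > best[dst][0]:
            best[dst] = candidate
    score, words = best[end]
    return words, score


# Word graph after decoding the first modality (for example, speech).
edges: List[Edge] = [
    (0, 1, "write", 0.55),
    (0, 1, "right", 0.45),
    (1, 2, "now", 1.0),
]
before, _ = best_path(edges, 0, 2)  # ["write", "now"]

# A second modality rules out "write"; pruning that edge revises the best path.
pruned = [e for e in edges if e[2] != "write"]
after, after_score = best_path(pruned, 0, 2)  # ["right", "now"]
```

Here pruning is a plain filter on the edge list. How a real recognizer decides what to prune is not specified in this sketch.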

29 Citations

14 Claims

1. A method for processing input received by a computing device comprising one or more processors, the method comprising:
- decoding input from a first input modality to produce posterior probabilities for words along paths in a common word graph;
- recording a decoding front for each of a plurality of possible input modalities, each decoding front comprising a set of nodes in the common word graph that define an end of a last word along a path in the common word graph that was assigned a probability by decoding an input from the respective input modality;
- receiving input from a second input modality after recording the decoding front for the second input modality;
- using the nodes of the recorded decoding front for the second input modality to determine where in the common word graph to begin rescoring and pruning the common word graph based on the input from the second input modality;
- using one or more of the processors, rescoring and pruning the common word graph based on averaging posterior probabilities from decoding input from the first input modality and decoding input from the second input modality; and
- outputting a hypothesis for the input based on the common word graph.

Dependent Claims (2, 3, 4, 5, 6, 7)
- 2. The method of claim 1 wherein outputting the hypothesis comprises rendering the hypothesis to a user.
- 3. The method of claim 1 wherein rescoring comprises rescoring the common word graph from a node in the decoding front to a furthest node available in the word graph.
- 4. The method of claim 1, in which the first input modality is a speech modality, and the second input modality is a handwriting modality.
- 5. The method of claim 4, in which the first input modality is processed using a speech recognizer with access to an acoustic model, and the second input modality is processed using a handwriting recognizer with access to a hand stroke model.
- 6. The method of claim 1, in which the plurality of possible modalities comprise speech, handwriting, and keystrokes.
- 7. The method of claim 1 further comprising updating the decoding front for the second input modality so that it comprises a set of nodes in the common word graph where each node defines the end of a word that was assigned a probability by decoding the input of the second input modality.
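
The steps of claim 1, together with the front update of claim 7, can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the `Edge` and `WordGraph` classes, the `rescore_and_prune` function, the fixed pruning threshold, and the example posteriors are all hypothetical. The claims say only that rescoring and pruning are based on averaged posteriors; pruning by threshold is one possible choice.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Edge:
    src: int
    dst: int
    word: str
    # Posterior per modality that has scored this word (hypothetical layout).
    posteriors: Dict[str, float] = field(default_factory=dict)

    def averaged(self) -> float:
        """Average the posteriors over the modalities that scored this word."""
        if not self.posteriors:
            return 0.0
        return sum(self.posteriors.values()) / len(self.posteriors)


@dataclass
class WordGraph:
    edges: List[Edge]
    fronts: Dict[str, Set[int]]  # modality -> recorded decoding front


def reachable(edges: List[Edge], starts: Iterable[int]) -> Set[int]:
    """Nodes reachable from the given start nodes, start nodes included."""
    seen = set(starts)
    stack = list(seen)
    while stack:
        node = stack.pop()
        for e in edges:
            if e.src == node and e.dst not in seen:
                seen.add(e.dst)
                stack.append(e.dst)
    return seen


def rescore_and_prune(graph: WordGraph, modality: str,
                      scores: Dict[Tuple[int, int, str], float],
                      threshold: float) -> None:
    """Rescore from the modality's decoding front onward, then prune."""
    region = reachable(graph.edges, graph.fronts[modality])
    kept: List[Edge] = []
    scored_kept: List[Edge] = []
    for e in graph.edges:
        key = (e.src, e.dst, e.word)
        was_scored = e.src in region and key in scores
        if was_scored:
            e.posteriors[modality] = scores[key]
        # Assumption: prune words whose averaged posterior falls below threshold.
        if e.src in region and e.averaged() < threshold:
            continue
        kept.append(e)
        if was_scored:
            scored_kept.append(e)
    graph.edges = kept
    # New front: ends of the last words this modality scored along each path.
    sources = {e.src for e in scored_kept}
    front = {e.dst for e in scored_kept if e.dst not in sources}
    if front:
        graph.fronts[modality] = front


# Speech has already scored every word; handwriting has scored nothing yet.
graph = WordGraph(
    edges=[
        Edge(0, 1, "write", {"speech": 0.55}),
        Edge(0, 1, "right", {"speech": 0.45}),
        Edge(1, 2, "now", {"speech": 1.0}),
    ],
    fronts={"speech": {2}, "handwriting": {0}},
)
# Hypothetical handwriting posteriors for the first word only.
rescore_and_prune(graph, "handwriting",
                  {(0, 1, "write"): 0.1, (0, 1, "right"): 0.9},
                  threshold=0.5)
remaining = [e.word for e in graph.edges]  # ["right", "now"]
```

After the call, the handwriting front has advanced from node 0 to node 1, so a later handwriting input would be applied starting at the second word.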

8. A computer storage medium having computer-executable instructions that when executed by a computer perform steps to process input received by the computer comprising the steps of:
- receiving input using a first modality;
- modifying a word graph based on the input; and
- rendering a hypothesis to a user for the input based on the word graph, and repeating the following steps until a desired hypothesis is obtained;
  - modifying the word graph based on complementary information received using a second modality, the complementary information corresponding to at least a portion of the input, wherein the second modality is different from the first modality, in which modifying the word graph includes rescoring the word graph based on averaging posterior probabilities from the modalities of the input and the complementary information; and
  - rendering a new hypothesis to the user for the input based on the word graph.

Dependent Claims (9, 10)
- 9. The computer-readable medium of claim 8 wherein the step of modifying the word graph includes pruning a word graph.
- 10. The computer-readable medium of claim 8 wherein the step of modifying the word graph includes recording relevant nodes of each modality with respect to the word graph and rescoring the word graph includes rescoring the word graph beginning from a recorded node.
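
The render-and-correct loop of claim 8 might look like the following sketch. To keep it short, the word graph is simplified to a list of slots, each mapping candidate words to the posteriors collected so far. That simplification, the function names (`hypothesis`, `interactive_loop`), and the callbacks `is_desired` and `render` are assumptions made for illustration and are not taken from the patent.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Simplified stand-in for a word graph: one slot per word position,
# mapping each candidate word to the posteriors gathered so far.
Slot = Dict[str, List[float]]


def hypothesis(slots: List[Slot]) -> List[str]:
    """Pick, per slot, the word with the highest averaged posterior."""
    return [max(slot, key=lambda w: sum(slot[w]) / len(slot[w]))
            for slot in slots]


def interactive_loop(slots: List[Slot],
                     corrections: Iterable[Tuple[int, Dict[str, float]]],
                     is_desired: Callable[[List[str]], bool],
                     render: Callable[[List[str]], None]) -> List[str]:
    """Render a hypothesis, then rescore with complementary input until accepted."""
    hyp = hypothesis(slots)
    render(hyp)
    for position, scores in corrections:
        if is_desired(hyp):
            break
        # Complementary information covers only part of the input (one slot).
        for word, posterior in scores.items():
            if word in slots[position]:
                slots[position][word].append(posterior)
        hyp = hypothesis(slots)
        render(hyp)
    return hyp


# First modality (for example, speech) produced these posteriors.
slots: List[Slot] = [{"write": [0.55], "right": [0.45]}, {"now": [1.0]}]
# Second modality (for example, handwriting) corrects the first word.
corrections = [(0, {"write": 0.1, "right": 0.9})]

rendered: List[List[str]] = []
final = interactive_loop(slots, corrections,
                         is_desired=lambda h: h == ["right", "now"],
                         render=rendered.append)
```

In this run two hypotheses are rendered: the initial one from the first modality, and the revised one after the correction is averaged in.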

11. A computing device comprising:
- a first component configured to provide input into the computing device using a first modality;
- a second component configured to provide input into the computing device using a second modality; and
- a recognizer configured to receive input from the first component and the second component and configured to modify a common word graph based on input from the first component and input from the second component, wherein modifying a common word graph based on input from the second component comprises rescoring words in the common word graph beginning with words that occur after nodes set in a recorded decoding front for the second modality based on the input from the second component.

Dependent Claims (12, 13, 14)
- 12. The computing device of claim 11 wherein the recognizer is configured to render a hypothesis to the user based on the word graph.
- 13. The computing device of claim 12 wherein the recognizer is configured to modify the word graph by pruning and rescoring the word graph.
- 14. The computing device of claim 13 wherein the recognizer is configured to repetitively receive corrective information using the second modality, modify the common word graph, and render a new hypothesis to the user based on the word graph until the new hypothesis is a desired hypothesis.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Soong, Frank Kao-Ping K.; Zhou, Jian-Lai; Liu, Peng
Primary Examiner(s): Hudspeth, David R
Assistant Examiner(s): Spooner, Lamont M
Application Number: US11/394,809
Publication Number: US 20070239432A1
Time in Patent Office: 1,713 Days
Field of Search: 704/1, 704/9, 704/10, 704/231, 704/251, 704/257, 707/2-6, 707/706-708, 382/159, 382/116, 382/187, 382/228
US Class Current: 704/9
CPC Class Codes: G06F 40/237 Lexical tools
