MULTIMODAL DISAMBIGUATION OF SPEECH RECOGNITION

  • US 20120143607A1
  • Filed: 12/06/2011
  • Published: 06/07/2012
  • Est. Priority Date: 06/02/2004
  • Status: Active Grant
First Claim

1. A computer-implemented method for processing language input in a system that includes a mobile computer, the mobile computer including a microphone and a display and a text input device operable by a user, the method comprising operations of:

    (a) responsive to the mobile computing device receiving via the microphone voice input comprising multiple discrete utterances from a user, converting the voice input into a digital sequence of vectors;

    (b) the mobile computing device creating an initial N-best list of words corresponding to each of the utterances by conducting speech recognition operations including matching the vectors to potential phonemes and matching the phonemes against a lexicon model and a language model, the operation of creating each initial N-best list of words further considering context of the corresponding utterance with respect to words of N-best lists corresponding to others of the received utterances, said context including subject-verb agreement, proper case, proper gender, and numerical agreement;

    (c) for each of said utterances, the mobile computing device visually displaying a best word from the initial N-best list of words corresponding to said utterance;

    (d) responsive to implied or explicit user selection of one of the displayed best words, said selected word being from a given N-best list of words corresponding to a given utterance, the mobile computing device causing the display to present additional words from the given initial N-best list of words;

    (e) during said presentation of the additional words, the mobile computing device receiving via the text input device hand-entered input from a user, and responsive to said text input, constraining said presentation of the additional words to exclude words of the given initial N-best list that are inconsistent with the textual input;

    (f) responsive to said presentation of the additional words being constrained to a resultant word, displaying the resultant word instead of the selected word;

    (g) the mobile computing device updating the initial N-best lists of others of the utterances besides the given utterance to provide subject-verb agreement, employ proper case, use proper gender, and exhibit numerical agreement when considered in context of the resultant word;

    (h) for each of the utterances having an updated N-best list, the mobile computing device causing the display to present a best word of the updated N-best list of words for that utterance; and

    (i) for each of the utterances without an updated N-best list, the mobile computing device causing the display to present a best word of the initial N-best list of words for said utterance.
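
To make the data flow of operations (c) through (i) easier to follow, here is a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation: the names `Utterance`, `constrain`, `disambiguate`, and `rerank` are hypothetical, "inconsistent with the textual input" is assumed to mean "does not start with the typed characters", and the grammatical-agreement check of operation (g) is reduced to a caller-supplied predicate. Speech recognition itself (operations (a) and (b)) is not modeled; the N-best lists are taken as given.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Utterance:
    """One spoken utterance with its ranked candidates, best first."""
    n_best: List[str]
    chosen: Optional[str] = None  # set once the user has disambiguated it

    def best_word(self) -> str:
        # Operations (c), (h), (i): display the resultant word if there is
        # one, otherwise the top entry of the current N-best list.
        return self.chosen if self.chosen is not None else self.n_best[0]


def constrain(candidates: List[str], typed: str) -> List[str]:
    """Operation (e): exclude candidates inconsistent with the typed text.

    Assumption: consistency is a case-insensitive prefix match.
    """
    prefix = typed.lower()
    return [w for w in candidates if w.lower().startswith(prefix)]


def disambiguate(utterance: Utterance, keystrokes: str) -> List[str]:
    """Operations (d)-(f): narrow the list as each character is entered."""
    remaining = list(utterance.n_best)
    typed = ""
    for ch in keystrokes:
        typed += ch
        remaining = constrain(remaining, typed)
        if len(remaining) == 1:
            # Operation (f): a single resultant word replaces the selection.
            utterance.chosen = remaining[0]
            break
    return remaining


def rerank(others: List[Utterance], resultant: str,
           agrees: Callable[[str, str], bool]) -> None:
    """Operation (g): move candidates that agree with the resultant word
    to the front of the other utterances' lists.

    `agrees(candidate, resultant)` is a hypothetical stand-in for the
    subject-verb, case, gender, and number checks named in the claim.
    """
    for u in others:
        if u.chosen is None:
            # sorted() is stable, so the original ranking is preserved
            # within the agreeing and non-agreeing groups.
            u.n_best = sorted(u.n_best, key=lambda w: not agrees(w, resultant))
```

For example, given an utterance with candidates `["wreck", "recognize", "reckon"]`, typing `r` removes "wreck", and typing on through `reck` leaves only "reckon", which then becomes the displayed word. A neighboring utterance with candidates `["is", "are"]` can afterwards be passed to `rerank` with a number-agreement predicate so that its displayed best word changes accordingly.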
