Techniques for disambiguating speech input using multimodal interfaces

  • US 7,684,985 B2
  • Filed: 12/10/2003
  • Issued: 03/23/2010
  • Est. Priority Date: 12/10/2002
  • Status: Active Grant
First Claim

1. A system for disambiguating speech input using one of voice mode interaction, visual mode interaction, or a combination of voice mode interaction and visual mode interaction with an application comprising:

  • a speech disambiguation mechanism resident on one of an end user device and a remote server, and accessed through said end user device possessing multimodal user interfaces, said speech disambiguation mechanism comprising:

    an options and parameters component for receiving and storing user parameters and receiving application parameters for controlling the speech disambiguation mechanism, wherein the speech disambiguation mechanism is controlled by parameters set by the user and parameters set by the application, and wherein the parameters include confidence thresholds governing unambiguous recognition and close matches;

    a speech recognition component that receives recorded audio, speech input or a combination of the recorded audio and the speech input through one of said multimodal user interfaces, and generates:

    a plurality of tokens corresponding to disambiguated words for presentation to the user; and

    for each of the one or more tokens, a confidence value indicative of the likelihood that a given token correctly represents the speech input;

    a selection component that identifies, according to a selection algorithm, two or more of the tokens to be presented to the user;

    one or more disambiguation components directing one or more of said multimodal user interfaces to present the alternatives to the user in one of voice mode, visual mode, or a combination of the voice mode and the visual mode, and directing the multimodal user interfaces to receive an alternative selected by the user in one of the voice mode, the visual mode, or a combination of the voice mode and the visual mode; and

    an output interface for communicating the selected alternative without translation of the speech input to the application as input.
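
The data flow recited in the claim (confidence-scored tokens, threshold parameters, selection of two or more alternatives, user choice, untranslated output) can be illustrated with a minimal sketch. It is not the patented implementation: the names `Token`, `Options`, `select_alternatives`, and `disambiguate`, the threshold defaults, and the ranking rule are all hypothetical, since the claim does not specify a selection algorithm. The `choose` callback stands in for whichever voice or visual interface presents the alternatives.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Token:
    text: str          # candidate word produced by the recognizer
    confidence: float  # assumed to lie in [0, 1]


@dataclass(frozen=True)
class Options:
    # Hypothetical defaults; the claim only says such thresholds exist.
    unambiguous_threshold: float = 0.90
    close_match_threshold: float = 0.50
    max_alternatives: int = 3


def select_alternatives(tokens: List[Token], options: Options) -> List[Token]:
    """Return one token if recognition is unambiguous, else two or more."""
    ranked = sorted(tokens, key=lambda t: t.confidence, reverse=True)
    if not ranked:
        return []
    # Assumption: a top score above the threshold needs no disambiguation.
    if ranked[0].confidence >= options.unambiguous_threshold:
        return [ranked[0]]
    close = [t for t in ranked if t.confidence >= options.close_match_threshold]
    # Assumption: always offer at least two alternatives when ambiguous.
    if len(close) < 2:
        close = ranked[:2]
    return close[:max(2, options.max_alternatives)]


def disambiguate(
    tokens: List[Token],
    options: Options,
    choose: Callable[[List[str]], str],
) -> Optional[str]:
    """Ask the user to pick among alternatives and return the pick as-is."""
    alternatives = select_alternatives(tokens, options)
    if not alternatives:
        return None
    if len(alternatives) == 1:
        return alternatives[0].text
    texts = [t.text for t in alternatives]
    picked = choose(texts)  # voice prompt, on-screen list, or both
    if picked not in texts:
        raise ValueError(f"selection {picked!r} was not among {texts}")
    return picked  # passed to the application without further translation
```

In this sketch, tokens scored 0.62 ("Austin"), 0.58 ("Boston"), and 0.20 ("Houston") would lead to the first two being offered to the user, while a token scored 0.95 would be returned directly without prompting.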
