Supporting multi-lingual user interaction with a multimodal application

US 8,909,532 B2
Filed: 03/23/2007
Issued: 12/09/2014
Est. Priority Date: 03/23/2007
Status: Active Grant

Abstract

Methods, apparatus, and products are disclosed for supporting multi-lingual user interaction with a multimodal application, the application including a plurality of VoiceXML dialogs, each dialog characterized by a particular language, supporting multi-lingual user interaction implemented with a plurality of speech engines, each speech engine having a grammar and characterized by a language corresponding to one of the dialogs, with the application operating on a multimodal device supporting multiple modes of interaction including a voice mode and one or more non-voice modes, the application operatively coupled to the speech engines through a VoiceXML interpreter, the VoiceXML interpreter: receiving a voice utterance from a user; determining in parallel, using the speech engines, recognition results for each dialog in dependence upon the voice utterance and the grammar for each speech engine; administering the recognition results for the dialogs; and selecting a language for user interaction in dependence upon the administered recognition results.
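The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Engine` callable type, the `RecognitionResult` record, and the stub engines are hypothetical stand-ins for real per-language speech engines, and a thread pool stands in for whatever mechanism actually runs the engines in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RecognitionResult:
    language: str      # language of the dialog/engine that produced the result
    text: str          # input matched by that engine's grammar
    confidence: float  # engine's confidence in the match, 0.0 to 1.0


# Hypothetical engine interface: audio bytes in, one recognition result out.
Engine = Callable[[bytes], RecognitionResult]


def recognize_in_parallel(utterance: bytes, engines: Dict[str, Engine]) -> List[RecognitionResult]:
    """Submit the same utterance to every engine and collect the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = [pool.submit(engine, utterance) for engine in engines.values()]
        return [f.result() for f in futures]


def select_language(results: List[RecognitionResult]) -> str:
    """Pick the language whose result has the highest confidence."""
    return max(results, key=lambda r: r.confidence).language


def make_stub_engine(language: str, text: str, confidence: float) -> Engine:
    """Build a fake engine that always returns a fixed result (illustration only)."""
    def engine(_utterance: bytes) -> RecognitionResult:
        return RecognitionResult(language, text, confidence)
    return engine


# Assumed example values; a real engine would compute these from the audio.
engines = {
    "en-US": make_stub_engine("en-US", "hello", 0.41),
    "fr-FR": make_stub_engine("fr-FR", "bonjour", 0.93),
    "de-DE": make_stub_engine("de-DE", "hallo", 0.37),
}
results = recognize_in_parallel(b"<audio samples>", engines)
chosen = select_language(results)
```

With these stub values, `chosen` is the language of the engine that reported the highest confidence.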

21 Claims

1. A method comprising:
- receiving a voice utterance from a user;
  
  determining, using at least one speech engine operating on at least one processor and a plurality of grammars that each specifies a limited set of one or more acceptable inputs in a language of a plurality of languages, a plurality of speech recognition results for the voice utterance and a plurality of confidence levels, the at least one speech engine determining each of the plurality of speech recognition results by using at least one of the plurality of grammars and matching the voice utterance to the limited set of acceptable inputs identified by the at least one grammar of the plurality of grammars, each confidence level of the plurality of confidence levels corresponding to a respective speech recognition result of the plurality of speech recognition results and each of the plurality of speech recognition results corresponding to a respective language of the plurality of languages, wherein each of the plurality of confidence levels determined using the at least one speech engine indicates a confidence of the at least one speech engine that the voice utterance matches a matched input of the limited set of acceptable inputs identified by the at least one grammar used to determine the speech recognition result;
  
  evaluating the plurality of confidence levels for the plurality of speech recognition results to determine a speech recognition result of the plurality of speech recognition results having a highest confidence level of the plurality of confidence levels determined by the at least one speech engine; and
  
  selecting one of the plurality of languages for use in subsequently interacting with the user by selecting a language corresponding to the speech recognition result having the highest confidence level of the plurality of confidence levels determined by the at least one speech engine.
- - 2. The method of claim 1, further comprising:
    - prompting the user for the voice utterance.
  - 3. The method of claim 2, wherein prompting the user for the voice utterance further comprises rendering, in a sequential manner, a prompt in each of the plurality of languages.
  - 4. The method of claim 1, wherein:
    - an application having a voice interface uses a plurality of dialogs to interact with the user, each dialog being arranged to cause the application to interact with the user in a particular one of the plurality of languages;
      
      each grammar of the plurality of grammars corresponds to one of the plurality of dialogs and the particular language of the corresponding dialog;
      
      receiving the voice utterance comprises processing the voice utterance for each dialog using a Form Interpretation Algorithm (‘FIA’) corresponding to the dialog; and
      
      determining speech recognition results further comprises collecting speech recognition results, using the plurality of grammars, for each dialog according to the FIA corresponding to each dialog.
  - 5. The method of claim 4, further comprising:
    - configuring the application to interact with the user in the language corresponding to the speech recognition result corresponding to the highest confidence value, wherein the configuring comprises activating an event that updates user interaction attributes of a Document Object Model representing the application.
  - 6. The method of claim 1, wherein determining the speech recognition results using the at least one speech engine comprises determining the speech recognition results using, in parallel, a plurality of speech engines.
  - 7. The method of claim 1, wherein each language of the plurality of languages is associated with a different speech engine of a plurality of speech engines, and wherein determining the plurality of speech recognition results comprises determining the plurality of speech recognition results by respectively using the plurality of speech engines.
  - 8. The method of claim 1, further comprising:
    - interacting with the user in the language corresponding to the speech recognition result corresponding to the highest confidence value of the plurality of confidence levels determined using the at least one speech engine.
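The core steps of claim 1 (grammar-constrained matching, per-result confidence levels, and selection of the language with the highest confidence) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `GRAMMARS` table, the `Match` type, and the function names are hypothetical, the utterance is represented as already-transcribed text, and string similarity from `difflib` is only a stand-in for the acoustic scoring a real speech engine would perform.

```python
from difflib import SequenceMatcher
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    language: str
    matched_input: str
    confidence: float


# Hypothetical grammars: each lists the only inputs accepted in one language.
GRAMMARS: Dict[str, List[str]] = {
    "en-US": ["yes", "no", "help"],
    "es-ES": ["sí", "no", "ayuda"],
    "de-DE": ["ja", "nein", "hilfe"],
}


def recognize(heard: str, language: str, grammar: List[str]) -> Match:
    """Match the utterance against one grammar's limited set of inputs.

    String similarity is a toy stand-in for a real engine's acoustic scoring.
    """
    scored = [(SequenceMatcher(None, heard, phrase).ratio(), phrase) for phrase in grammar]
    confidence, phrase = max(scored)
    return Match(language, phrase, confidence)


def select_interaction_language(heard: str) -> Match:
    """Produce one result per language, then keep the most confident one."""
    results = [recognize(heard, lang, grammar) for lang, grammar in GRAMMARS.items()]
    return max(results, key=lambda m: m.confidence)


best = select_interaction_language("ayuda")
```

An input such as "no" appears in more than one of these toy grammars and would tie; this sketch simply keeps the first tied result, which a real system would need to handle more carefully.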

9. An apparatus comprising:
- at least one computer processor programmed to:
  
  receive a voice utterance from a user;
  
  determine, using at least one speech engine and a plurality of grammars that each specifies a limited set of one or more acceptable inputs in a language of a plurality of languages, a plurality of speech recognition results for the voice utterance and a plurality of confidence levels, the at least one speech engine determining each of the plurality of speech recognition results by using at least one of the plurality of grammars and matching the voice utterance to the limited set of acceptable inputs identified by the at least one grammar of the plurality of grammars, each confidence level of the plurality of confidence levels corresponding to a respective speech recognition result of the plurality of speech recognition results and each of the plurality of speech recognition results corresponding to a respective language of the plurality of languages, wherein each of the plurality of confidence levels determined using the at least one speech engine indicates a confidence of the at least one speech engine that the voice utterance matches a matched input of the limited set of acceptable inputs identified by the at least one grammar used to determine the speech recognition result;
  
  evaluate the plurality of confidence levels for the plurality of speech recognition results to determine a speech recognition result of the plurality of speech recognition results having a highest confidence level of the plurality of confidence levels determined by the at least one speech engine; and
  
  select one of the plurality of languages for use in subsequently interacting with the user by selecting a language corresponding to the speech recognition result having the highest confidence level of the plurality of confidence levels determined by the at least one speech engine.
- - 10. The apparatus of claim 9, wherein:
    - an application having a voice interface is configured to provide a prompt to the user in each language of the plurality of languages, and the at least one processor is further programmed to prompt the user for the voice utterance, the prompting including rendering, in a sequential manner, the prompt of each language.
  - 11. The apparatus of claim 9, wherein:
    - an application having a voice interface uses a plurality of dialogs to interact with the user, each dialog being arranged to cause the application to interact with the user in a particular one of the plurality of languages;
      
      each grammar of the plurality of grammars corresponds to one of the plurality of dialogs and the particular language of the corresponding dialog;
      
      the at least one processor is programmed to receive the voice utterance by processing the voice utterance for each dialog using a Form Interpretation Algorithm (‘FIA’) for the dialog; and
      
      the at least one processor is programmed to determine the speech recognition results using the plurality of grammars by collecting recognition results for each dialog according to the FIA corresponding to each dialog.
  - 12. The apparatus of claim 11, wherein the at least one processor is programmed to select the language for user interaction at least in part by configuring the application to interact with the user in the language, wherein the configuring comprises activating an event that updates user interaction attributes of a Document Object Model representing the application.
  - 13. The apparatus of claim 9, wherein the at least one processor is programmed to determine the speech recognition results using the at least one speech engine at least in part by determining the speech recognition results using, in parallel, a plurality of speech engines.
  - 14. The apparatus of claim 9, wherein each language of the plurality of languages is associated with a different speech engine of a plurality of speech engines, and wherein the at least one processor is programmed to determine the plurality of speech recognition results at least in part by determining the plurality of speech recognition results respectively using the plurality of speech engines.
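Claims 10 and 12 describe rendering a prompt in each language in sequence and, once a language is selected, activating an event that updates user interaction attributes of a Document Object Model. A rough sketch of those two ideas follows. It rests on several assumptions: the `PROMPTS` table, the `languageselected` event name, and the tiny `on`/`fire` registry are hypothetical, a `lang` attribute is assumed to be the "user interaction attribute" being updated, and an `xml.etree.ElementTree` tree stands in for the application's DOM.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# Hypothetical prompts, one per supported language.
PROMPTS: Dict[str, str] = {
    "en-US": "Please say your choice.",
    "fr-FR": "Veuillez dire votre choix.",
}


def render_prompts(speak: Callable[[str, str], None]) -> None:
    """Render the prompt for each language one after another."""
    for language, text in PROMPTS.items():
        speak(language, text)


# Toy document standing in for the DOM that represents the application.
document = ET.fromstring(
    '<html lang="en-US"><body><form id="order" lang="en-US"/></body></html>'
)

# Minimal event registry; a browser DOM would supply its own event mechanism.
listeners: Dict[str, List[Callable[[str], None]]] = {}


def on(event: str, handler: Callable[[str], None]) -> None:
    """Register a handler for a named event."""
    listeners.setdefault(event, []).append(handler)


def fire(event: str, payload: str) -> None:
    """Invoke every handler registered for the event."""
    for handler in listeners.get(event, []):
        handler(payload)


def update_language_attributes(language: str) -> None:
    """Rewrite every lang attribute in the document to the selected language."""
    for element in document.iter():
        if "lang" in element.attrib:
            element.set("lang", language)


on("languageselected", update_language_attributes)

# Collect the prompts in a list instead of synthesizing speech.
spoken: List[str] = []
render_prompts(lambda language, text: spoken.append(f"[{language}] {text}"))
fire("languageselected", "fr-FR")
```

Passing `speak` as a callable keeps the sketch independent of any particular text-to-speech engine.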

15. At least one non-transitory recordable medium encoded with computer-executable instructions that, when executed by a computer, cause the computer to carry out a method comprising:
- receiving a voice utterance from a user;
  
  determining, using at least one speech engine and a plurality of grammars that each specifies a limited set of one or more acceptable inputs in a language of a plurality of languages, a plurality of speech recognition results for the voice utterance and a plurality of confidence levels, the at least one speech engine determining each of the plurality of speech recognition results by using at least one of the plurality of grammars and matching the voice utterance to the limited set of acceptable inputs identified by the at least one grammar of the plurality of grammars, each confidence level of the plurality of confidence levels corresponding to a respective speech recognition result of the plurality of speech recognition results and each of the plurality of speech recognition results corresponding to a respective language of the plurality of languages, wherein each of the plurality of confidence levels determined using the at least one speech engine indicates a confidence of the at least one speech engine that the voice utterance matches a matched input of the limited set of acceptable inputs identified by the at least one grammar used to determine the speech recognition result;
  
  evaluating the plurality of confidence levels for the plurality of speech recognition results to determine a speech recognition result of the plurality of speech recognition results having a highest confidence level of the plurality of confidence levels determined using the at least one speech engine; and
  
  selecting one of the plurality of languages for use in subsequently interacting with the user by selecting a language corresponding to the speech recognition result having the highest confidence level of the plurality of confidence levels determined using the at least one speech engine.
- - 16. The at least one recordable medium of claim 15, wherein the method further comprises prompting the user for the voice utterance.
  - 17. The at least one recordable medium of claim 16, wherein prompting the user for the voice utterance comprises rendering, in a sequential manner, a prompt in each of the plurality of languages.
  - 18. The at least one recordable medium of claim 15, wherein:
    - an application having a voice interface uses a plurality of dialogs to interact with the user, each dialog being arranged to cause the application to interact with the user in a particular one of the plurality of languages;
      
      each grammar of the plurality of grammars corresponds to one of the plurality of dialogs and the particular language of the corresponding dialog;
      
      receiving the voice utterance comprises processing the voice utterance for each dialog using a Form Interpretation Algorithm (‘FIA’) corresponding to the dialog; and
      
      determining speech recognition results further comprises collecting speech recognition results, using the plurality of grammars, for each dialog according to the FIA corresponding to each dialog.
  - 19. The at least one recordable medium of claim 15, wherein selecting the language for user interaction further comprises configuring an application having a voice interface to interact with the user in the language corresponding to the speech recognition result corresponding to the highest confidence value, wherein the configuring comprises activating an event that updates user interaction attributes of a Document Object Model representing the application.
  - 20. The at least one recordable medium of claim 15, wherein determining the speech recognition results using the at least one speech engine comprises determining the speech recognition results using, in parallel, a plurality of speech engines.
  - 21. The at least one recordable medium of claim 15, wherein each language of the plurality of languages is associated with a different speech engine of a plurality of speech engines, and wherein determining the plurality of speech recognition results comprises determining the plurality of speech recognition results respectively using the plurality of speech engines.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: Cross, Charles W. Jr.
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Villena, Mark
Application Number: US11/690,423
Publication Number: US 20080235027A1
Time in Patent Office: 2,818 Days
Field of Search: 704/8, 704/251, 704/270, 704/270.1, 704/275, 715/234
US Class Current: 704/270.1
CPC Class Codes: G10L 15/22 Procedures used during a sp...