Synchronizing visual and speech events in a multimodal application
Abstract
Exemplary methods, systems, and products are disclosed for synchronizing visual and speech events in a multimodal application, including receiving speech from a user; determining a semantic interpretation of the speech; calling a global application update handler; identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation; and executing the additional function. Typical embodiments may include updating a visual element after executing the additional function. Typical embodiments may include updating a voice form after executing the additional function. Typical embodiments also may include updating a state table after updating the voice form. Typical embodiments also may include restarting the voice form after executing the additional function.
18 Claims
1. A method for synchronizing visual and speech events in a multimodal application, the method comprising:
- calling a voice form of the multimodal application, wherein the multimodal application is run using at least one computer processor, wherein the multimodal application provides a multimodal web page to a client device over a network;
- receiving speech from a user;
- determining a semantic interpretation of the speech;
- calling a global application update handler of the multimodal application including exiting a voice form;
- identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation;
- executing the additional processing function;
- updating a visual element after executing the additional processing function;
- updating a voice form after executing the additional processing function; and
- restarting the voice form after executing the additional processing function,
wherein determining a semantic interpretation of the speech comprises determining a plurality of semantic interpretations of the speech, and wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.

Dependent claims: 2, 3, 4, 5, 6.
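The control flow recited in the claim can be sketched in code. The sketch below is a hypothetical illustration only; all class, function, and interpretation names are invented for this example, and the patent does not prescribe any particular implementation, language, or API.

```python
# Hypothetical sketch of the claimed synchronization flow.
# Names (MultimodalApp, recognize, the interpretation strings) are
# invented for illustration and are not drawn from the patent.

def recognize(speech):
    """Stand-in for a recognizer that returns a plurality of
    semantic interpretations of one utterance."""
    return ["drink=coffee", "size=large"]

class MultimodalApp:
    def __init__(self):
        # Maps each semantic interpretation to an additional
        # processing function.
        self.handlers = {
            "drink=coffee": lambda: print("visual: highlight coffee"),
            "size=large": lambda: print("visual: select large size"),
        }
        self.voice_form_active = False

    def call_voice_form(self):
        self.voice_form_active = True   # form begins collecting speech

    def exit_voice_form(self):
        self.voice_form_active = False  # form is exited before updates

    def update_visual_element(self):
        print("visual element updated")

    def update_voice_form(self):
        print("voice form updated")

    def global_update_handler(self, interpretations):
        # Calling the handler includes exiting the voice form; the handler
        # then identifies one additional processing function per semantic
        # interpretation, executes it, updates the visual element and the
        # voice form, and restarts the voice form.
        self.exit_voice_form()
        for interp in interpretations:
            fn = self.handlers.get(interp)
            if fn:
                fn()                    # execute the additional function
        self.update_visual_element()
        self.update_voice_form()
        self.call_voice_form()          # restart the voice form

app = MultimodalApp()
app.call_voice_form()                   # call the voice form
speech = "large coffee"                 # speech received from the user
app.global_update_handler(recognize(speech))
```

The point of routing every interpretation through one global handler, rather than per-field handlers, is that a single utterance carrying several interpretations can update several visual elements and the voice form in one pass before the form restarts.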
7. A system for synchronizing visual and speech events in a multimodal application, the system comprising at least one computer processor, at least one computer memory operatively coupled to the computer processor, and computer program instructions disposed within the computer memory configured for:
- calling a voice form of the multimodal application, wherein the multimodal application provides a multimodal web page to a client device over a network;
- receiving speech from a user;
- determining a semantic interpretation of the speech;
- calling a global application update handler of the multimodal application including exiting a voice form;
- identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation;
- executing the additional processing function;
- updating a visual element after executing the additional processing function;
- updating a voice form after executing the additional processing function; and
- restarting the voice form after executing the additional processing function,
wherein determining a semantic interpretation of the speech comprises determining a plurality of semantic interpretations of the speech, and wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.

Dependent claims: 8, 9, 10, 11, 12.
13. A non-transitory computer-readable storage medium comprising computer program instructions that, when executed on at least one processor in a computer, perform a method of synchronizing visual and speech events in a multimodal application, the method comprising:
- calling a voice form of the multimodal application, wherein the multimodal application provides a multimodal web page to a client device over a network;
- receiving speech from a user;
- determining a semantic interpretation of the speech;
- calling a global application update handler of the multimodal application including exiting a voice form;
- identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation;
- executing the additional processing function;
- updating a visual element after executing the additional processing function;
- updating a voice form after executing the additional processing function; and
- restarting the voice form after executing the additional processing function,
wherein determining a semantic interpretation of the speech comprises determining a plurality of semantic interpretations of the speech, and wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.

Dependent claims: 14, 15, 16, 17, 18.
Specification