Synchronizing visual and speech events in a multimodal application
First Claim
1. A method, comprising:
receiving, by a multimodal application executing on a computer processor, multimodal input from a multimodal browser of a device, wherein the multimodal input comprises speech from a user;
determining a semantic interpretation of at least a portion of the speech using a voice form;
calling a global application update handler of the multimodal application;
identifying, by the global application update handler, an additional processing function based at least in part upon the semantic interpretation and a geographical location, wherein the additional processing function is independent of the voice form; and
executing the additional processing function, wherein the additional processing function executed depends on the semantic interpretation of the at least a portion of the speech,
wherein determining a semantic interpretation of at least a portion of the speech comprises determining a plurality of semantic interpretations of the at least a portion of the speech, and
wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.
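The flow of claim 1 can be sketched in code. This is a minimal illustration, assuming invented names throughout (`REGISTRY`, `interpret`, `global_update_handler`); none of it reflects the patent's actual implementation. The key points it shows are that the speech may yield a plurality of semantic interpretations, and that the handler identifies one additional processing function per interpretation, keyed on the interpretation together with a geographical location.

```python
# Maps (semantic interpretation, region) to an additional processing
# function that is independent of the voice form. All entries invented.
REGISTRY = {
    ("order-pizza", "US"): lambda: "show-us-pizza-menu",
    ("order-pizza", "EU"): lambda: "show-eu-pizza-menu",
    ("check-weather", "US"): lambda: "show-weather-widget",
}

def interpret(speech):
    """Stand-in for the voice form: returns every semantic
    interpretation of (a portion of) the speech. Note the result
    may be plural, as the claim requires."""
    table = {
        "pizza please": ["order-pizza"],
        "pizza and weather": ["order-pizza", "check-weather"],
    }
    return table.get(speech, [])

def global_update_handler(interpretations, region):
    """Identifies one additional processing function per semantic
    interpretation, based on the interpretation and the location."""
    return [REGISTRY[(i, region)]
            for i in interpretations if (i, region) in REGISTRY]

def handle_multimodal_input(speech, region):
    interpretations = interpret(speech)  # possibly several
    functions = global_update_handler(interpretations, region)
    return [fn() for fn in functions]    # execute each identified function
```

For example, `handle_multimodal_input("pizza and weather", "US")` yields two interpretations and therefore executes two functions, returning `["show-us-pizza-menu", "show-weather-widget"]`.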
Abstract
Exemplary methods, systems, and products are disclosed for synchronizing visual and speech events in a multimodal application, including receiving speech from a user; determining a semantic interpretation of the speech; calling a global application update handler; identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation; and executing the additional function. Typical embodiments may include updating a visual element after executing the additional function. Typical embodiments may include updating a voice form after executing the additional function. Typical embodiments also may include updating a state table after updating the voice form. Typical embodiments also may include restarting the voice form after executing the additional function.
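The "typical embodiments" in the abstract describe a fixed sequence of updates after the additional function runs. The sketch below only illustrates that ordering; the class and every method name are hypothetical, not taken from the patent: execute the additional function, update the visual element, update the voice form, update the state table (after the voice form), and restart the voice form.

```python
class MultimodalApp:
    """Records the synchronization steps in the order they occur."""

    def __init__(self):
        self.events = []  # what happened, in order

    def run_additional_function(self, fn):
        self.events.append(fn())      # execute the additional function
        self.update_visual_element()  # sync the visual side
        self.update_voice_form()      # sync the voice side
        self.update_state_table()     # after the voice form is updated
        self.restart_voice_form()     # ready for the next utterance

    def update_visual_element(self):
        self.events.append("visual-element-updated")

    def update_voice_form(self):
        self.events.append("voice-form-updated")

    def update_state_table(self):
        self.events.append("state-table-updated")

    def restart_voice_form(self):
        self.events.append("voice-form-restarted")
```

Running `MultimodalApp().run_additional_function(lambda: "ad-shown")` records the five events in exactly the order the abstract lists them.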
20 Claims
1. A method, comprising:
receiving, by a multimodal application executing on a computer processor, multimodal input from a multimodal browser of a device, wherein the multimodal input comprises speech from a user;
determining a semantic interpretation of at least a portion of the speech using a voice form;
calling a global application update handler of the multimodal application;
identifying, by the global application update handler, an additional processing function based at least in part upon the semantic interpretation and a geographical location, wherein the additional processing function is independent of the voice form; and
executing the additional processing function, wherein the additional processing function executed depends on the semantic interpretation of the at least a portion of the speech,
wherein determining a semantic interpretation of at least a portion of the speech comprises determining a plurality of semantic interpretations of the at least a portion of the speech, and
wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.
(Dependent claims: 2, 3, 4, 5, 6)
7. A system, comprising:
at least one computer processor;
at least one computer memory operatively coupled to the computer processor; and
computer program instructions disposed within the computer memory that, when executed, cause the at least one computer processor to:
receive, by a multimodal application executing on a computer processor, multimodal input from a multimodal browser of a device, wherein the multimodal input comprises speech from a user;
determine a semantic interpretation of at least a portion of the speech using a voice form, the semantic interpretation comprising a plurality of semantic interpretations of the at least a portion of the speech;
call a global application update handler of the multimodal application;
identify, by the global application update handler, an additional processing function based at least in part upon the semantic interpretation and a geographical location, for each of the plurality of semantic interpretations, wherein the additional processing function is independent of the voice form; and
execute the additional processing function, wherein the additional processing function executed depends on the semantic interpretation of the at least a portion of the speech.
(Dependent claims: 8, 9, 10, 11, 12)
13. A non-transitory computer-readable storage medium comprising instructions that, when executed on at least one computer processor, perform a method, comprising:
receiving, by a multimodal application executing on a computer processor, multimodal input from a multimodal browser of a device, wherein the multimodal input comprises speech from a user;
determining a semantic interpretation of at least a portion of the speech using a voice form;
calling a global application update handler of the multimodal application;
identifying, by the global application update handler, an additional processing function based at least in part upon the semantic interpretation and a geographical location, wherein the additional processing function is independent of the voice form; and
executing the additional processing function, wherein the additional processing function executed depends on the semantic interpretation of the at least a portion of the speech,
wherein determining a semantic interpretation of at least a portion of the speech comprises determining a plurality of semantic interpretations of the at least a portion of the speech, and
wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.
(Dependent claims: 14, 15, 16, 17, 18)
19. A non-transitory computer-readable storage medium comprising instructions that, when executed on at least one computer processor, perform a method, comprising:
receiving speech from a user;
determining a semantic interpretation of at least a portion of the speech;
identifying, by a global application update handler, an additional processing function in dependence upon the semantic interpretation; and
executing the additional processing function to provide an advertisement based at least in part upon the semantic interpretation and a geographical location,
wherein determining a semantic interpretation of at least a portion of the speech comprises determining a plurality of semantic interpretations of the at least a portion of the speech, and
wherein identifying, by the global application update handler, an additional processing function in dependence upon the semantic interpretation comprises identifying, by the global application update handler, an additional processing function for each of the plurality of semantic interpretations.
(Dependent claims: 20)
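Claim 19 narrows the additional processing function to providing an advertisement selected from the semantic interpretation plus the geographical location. A minimal sketch of that selection follows; the ad table and both function names are invented for illustration and are not from the patent.

```python
# Hypothetical advertisement lookup keyed on (semantic interpretation,
# geographical location), one lookup per interpretation.
ADS = {
    ("order-pizza", "Chicago"): "Deep-dish special near you",
    ("order-pizza", "New York"): "Two-for-one slice deal",
    ("book-flight", "Chicago"): "O'Hare parking discount",
}

def select_advertisements(interpretations, location):
    """Return one advertisement per semantic interpretation that has
    an entry for the user's location; interpretations without a
    matching ad are skipped."""
    return [ADS[(i, location)]
            for i in interpretations if (i, location) in ADS]
```

With two interpretations and the location `"Chicago"`, `select_advertisements(["order-pizza", "book-flight"], "Chicago")` returns one ad per interpretation.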
Specification