Integrating conversational speech into Web browsers

US 20060235694A1
Filed: 04/14/2005
Published: 10/19/2006
Est. Priority Date: 04/14/2005
Status: Abandoned Application

2 Assignments

0 Petitions

Abstract

A method of integrating conversational speech into a multimodal, Web-based processing model can include speech recognizing a user spoken utterance directed to a voice-enabled field of a multimodal markup language document presented within a browser. A statistical grammar can be used to determine a recognition result. The method further can include providing the recognition result to the browser, receiving, within a natural language understanding (NLU) system, the recognition result from the browser, and semantically processing the recognition result to determine a meaning. Accordingly, a next programmatic action to be performed can be selected according to the meaning.

20 Claims

1. A method of integrating conversational speech into a multimodal, Web-based processing model, said method comprising:
- speech recognizing a user spoken utterance directed to a voice-enabled field of a multimodal markup language document presented within a browser using a statistical grammar to determine a recognition result;
  
  providing the recognition result to the browser;
  
  receiving, within a natural language understanding (NLU) system, the recognition result from the browser;
  
  semantically processing the recognition result to determine a meaning; and
  
  selecting a next programmatic action to be performed according to the meaning.
  - 2. The method of claim 1, further comprising, prior to said speech recognizing step, sending at least a visual portion of the multimodal markup language document to the browser, wherein the statistical grammar is associated with the voice-enabled field.
  - 3. The method of claim 1, further comprising, responsive to a notification that the requesting browser is executing at least a visual portion of the multimodal markup language document, loading the statistical grammar for processing user speech directed to the voice-enabled field.
  - 4. The method of claim 1, wherein the recognition result comprises speech recognized text.
  - 5. The method of claim 1, wherein the recognition result comprises a tokenized representation of the user spoken utterance.
  - 6. The method of claim 1, wherein the recognition result comprises at least one of speech recognized text and a tokenized representation of the user spoken utterance, said speech recognizing step further comprising parsing the recognition result to determine data for at least one input element of the multimodal markup language document presented within the browser, such that the data is used in said semantically processing step with the recognition result.
  - 7. The method of claim 1, further comprising receiving, within the NLU system, additional data that was entered, through a non-voice user input, into the multimodal markup language document presented by the browser, wherein said semantically processing step is performed using the recognition result and the additional data.
  - 8. The method of claim 1, said determining step comprising generating a next multimodal markup language document that is provided to the browser.
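
The five steps of claim 1, together with the non-voice input of claim 7, can be sketched in Python as follows. This is an illustration under stated assumptions, not the applicants' implementation: every name (`RecognitionResult`, `Browser`, `understand`, `NEXT_ACTION`) is hypothetical, the recognizer is a stand-in that takes already-decoded text instead of audio and a statistical grammar, and the keyword rules only approximate what an NLU system would do.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecognitionResult:
    """Hypothetical result type: recognized text plus its tokens (claims 4 and 5)."""
    text: str
    tokens: list


def recognize(utterance: str) -> RecognitionResult:
    """Step 1 stand-in for speech recognition.

    A real recognizer would decode audio with a statistical grammar; this
    toy version treats the input string as the already-decoded utterance.
    """
    text = utterance.lower().strip()
    return RecognitionResult(text=text, tokens=text.split())


@dataclass
class Browser:
    """Toy browser state for one multimodal document."""
    fields: dict = field(default_factory=dict)  # non-voice input (claim 7)
    last_result: Optional[RecognitionResult] = None

    def receive_result(self, result: RecognitionResult) -> None:
        # Step 2: the recognition result is provided to the browser.
        self.last_result = result

    def submit(self):
        # Step 3: the browser forwards the result, with any typed data, to the NLU.
        return self.last_result, dict(self.fields)


def understand(result: RecognitionResult, extra: dict) -> dict:
    """Step 4: toy semantic processing using keyword rules (an assumption)."""
    tokens = set(result.tokens)
    if tokens & {"transfer", "send"}:
        intent = "transfer_funds"
    elif "balance" in tokens:
        intent = "check_balance"
    else:
        intent = "unknown"
    return {"intent": intent, **extra}


# Hypothetical mapping from a meaning to the next programmatic action.
NEXT_ACTION = {
    "transfer_funds": "render_transfer_form",
    "check_balance": "render_balance_page",
}


def select_next_action(meaning: dict) -> str:
    """Step 5: choose the next action; fall back to a re-prompt."""
    return NEXT_ACTION.get(meaning["intent"], "reprompt_user")


browser = Browser(fields={"account": "savings"})
browser.receive_result(recognize("I want to check my balance"))
meaning = understand(*browser.submit())
print(select_next_action(meaning))
```

Routing the result through the browser, as opposed to sending it straight from the recognizer to the NLU step, mirrors the claim language and lets typed form data travel with the spoken input in a single submission.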

9. A system for processing multimodal interactions including conversational speech using a Web-based processing model, said system comprising:
- a multimodal server configured to process a multimodal markup language document and store non-visual portions of the multimodal markup language document, wherein the multimodal server provides visual portions of the multimodal markup language document to a client browser;
  
  a voice server configured to perform automatic speech recognition upon a user spoken utterance directed to a voice-enabled field of the multimodal markup language document, wherein said voice server utilizes a statistical grammar to process the user spoken utterance directed to the voice-enabled field, wherein the client browser is provided with a result from the automatic speech recognition;
  
  a conversational server configured to semantically process the result of the automatic speech recognition to determine a meaning that is provided to a Web server, wherein the conversational server receives the result of the automatic speech recognition to be semantically processed from the client browser via the Web server; and
  
  an application server configured to provide data responsive to an instruction from the Web server, wherein the Web server issues the instruction according to the meaning.
  - 10. The system of claim 9, wherein the conversational server further is provided non-voice user input originating from at least one graphical user interface element of the multimodal markup language document such that the meaning is determined according to the non-voice user input and the result of the automatic speech recognition.
  - 11. The system of claim 9, wherein the result of the automatic speech recognition comprises a tokenized representation of the user spoken utterance and at least one of speech recognized text derived from the user spoken utterance and data derived from the user spoken utterance that corresponds to at least one input mechanism of a visual portion of the multimodal markup language document.
  - 12. The system of claim 9, wherein the Web server generates a multimodal markup language document to be provided to the client browser, wherein the multimodal markup language document comprises data obtained from the application server.
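
The division of labor among the servers in claim 9 can be pictured with a small Python sketch. It is a simplification under assumptions: the class names echo the claim terms, but the method names, the document dictionary layout, the intent and instruction strings, and the returned markup are all hypothetical. Network transport is omitted, and the voice server lowercases and splits text where a real one would run automatic speech recognition with a statistical grammar.

```python
class MultimodalServer:
    """Keeps the non-visual portions and hands out the visual portions."""

    def __init__(self, document: dict):
        self.visual = document["visual"]
        self.non_visual = document["voice"]  # stored server-side

    def visual_portion(self) -> str:
        return self.visual


class VoiceServer:
    """Stand-in for ASR; a real server would decode audio with a statistical grammar."""

    def recognize(self, utterance: str) -> list:
        return utterance.lower().split()


class ConversationalServer:
    """Toy semantic processing: a single keyword rule (an assumption)."""

    def meaning(self, tokens: list) -> str:
        return "weather_lookup" if "weather" in tokens else "unknown"


class ApplicationServer:
    """Returns data in response to an instruction from the Web server."""

    def handle(self, instruction: str) -> str:
        return {"get_forecast": "sunny"}.get(instruction, "")


class WebServer:
    """Relays the browser's ASR result, then issues an instruction by meaning."""

    INSTRUCTIONS = {"weather_lookup": "get_forecast"}  # hypothetical mapping

    def __init__(self, conversational: ConversationalServer, app: ApplicationServer):
        self.conversational = conversational
        self.app = app

    def handle_submission(self, asr_result: list) -> str:
        meaning = self.conversational.meaning(asr_result)
        instruction = self.INSTRUCTIONS.get(meaning)
        if instruction is None:
            return "<p>Sorry, please rephrase.</p>"
        data = self.app.handle(instruction)
        # Claim 12: the next document carries data from the application server.
        return f"<p>Forecast: {data}</p>"


multimodal = MultimodalServer({"visual": "<form>...</form>", "voice": "grammar-ref"})
page = multimodal.visual_portion()            # sent to the client browser
tokens = VoiceServer().recognize("What is the weather today")  # returned to the browser
web = WebServer(ConversationalServer(), ApplicationServer())
print(web.handle_submission(tokens))          # browser submits via the Web server
```
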

13. A machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
- speech recognizing a user spoken utterance directed to a voice-enabled field of a multimodal markup language document presented within a browser using a statistical grammar to determine a recognition result;
  
  providing the recognition result to the browser;
  
  receiving, within a natural language understanding (NLU) system, the recognition result from the browser;
  
  semantically processing the recognition result to determine a meaning; and
  
  selecting a next programmatic action to be performed according to the meaning.
  - 14. The machine readable storage of claim 13, further comprising, prior to said speech recognizing step, sending at least a visual portion of the multimodal markup language document to the browser, wherein the statistical grammar is associated with the voice-enabled field.
  - 15. The machine readable storage of claim 13, further comprising, responsive to a notification that the requesting browser is executing at least a visual portion of the multimodal markup language document, loading the statistical grammar for processing user speech directed to the voice-enabled field.
  - 16. The machine readable storage of claim 13, wherein the recognition result comprises speech recognized text.
  - 17. The machine readable storage of claim 13, wherein the recognition result comprises a tokenized representation of the user spoken utterance.
  - 18. The machine readable storage of claim 13, wherein the recognition result comprises at least one of speech recognized text and a tokenized representation of the user spoken utterance, said speech recognizing step further comprising parsing the recognition result to determine data for at least one input element of the multimodal markup language document presented within the browser, such that the data is used in said semantically processing step with the recognition result.
  - 19. The machine readable storage of claim 13, further comprising receiving, within the NLU system, additional data that was entered, through a non-voice user input, into the multimodal markup language document presented by the browser, wherein said semantically processing step is performed using the recognition result and the additional data.
  - 20. The machine readable storage of claim 13, said determining step comprising generating a next multimodal markup language document that is provided to the browser.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Muschett, Brien H.; Ruback, Harvey M.; Cross, Charles W.; Wilson, Leslie R.
Application Number: US11/105,865
Publication Number: US 20060235694A1
US Class Current: 704/270.100
CPC Class Codes:
G06F 16/95   Retrieval from the web
G10L 15/26   Speech to text systems G10L...
H04M 3/4936   Speech interaction details ...
