Integrating conversational speech into Web browsers
Abstract
A method of integrating conversational speech into a multimodal, Web-based processing model can include speech recognizing a user spoken utterance directed to a voice-enabled field of a multimodal markup language document presented within a browser. A statistical grammar can be used to determine a recognition result. The method further can include providing the recognition result to the browser, receiving, within a natural language understanding (NLU) system, the recognition result from the browser, and semantically processing the recognition result to determine a meaning. Accordingly, a next programmatic action to be performed can be selected according to the meaning.
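The flow described in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: every name here (`recognize`, `Browser`, `understand`, `ACTIONS`, `handle_utterance`) is hypothetical, the recognizer is a stub that treats its input as already-transcribed text instead of decoding audio with a statistical grammar, and the keyword-based semantic processing is a toy stand-in for an NLU system.

```python
from dataclasses import dataclass, field


@dataclass
class Meaning:
    """Hypothetical container for the output of semantic processing."""
    intent: str
    slots: dict = field(default_factory=dict)


def recognize(utterance: str) -> str:
    """Stand-in for the speech recognizer (assumption).

    A real system would decode audio with a statistical grammar; here the
    "audio" is already text, so it is only normalized.
    """
    return utterance.strip().lower()


class Browser:
    """Stand-in for the browser presenting the multimodal document."""

    def __init__(self) -> None:
        self.fields: dict[str, str] = {}

    def receive_result(self, field_name: str, text: str) -> None:
        # Step 2: the recognition result is provided to the browser.
        self.fields[field_name] = text

    def forward_to_nlu(self, field_name: str) -> str:
        # Step 3: the NLU system receives the result from the browser.
        return self.fields[field_name]


def understand(text: str) -> Meaning:
    """Toy semantic processing via keyword spotting (illustrative only)."""
    if "balance" in text:
        return Meaning("check_balance")
    if "transfer" in text:
        amount = next((w for w in text.split() if w.isdigit()), None)
        return Meaning("transfer", {"amount": amount})
    return Meaning("unknown")


# Hypothetical mapping from a meaning to the next programmatic action.
ACTIONS = {
    "check_balance": "load_balance_page",
    "transfer": "load_transfer_form",
    "unknown": "reprompt_user",
}


def handle_utterance(browser: Browser, field_name: str, utterance: str) -> str:
    text = recognize(utterance)                 # step 1: speech recognition
    browser.receive_result(field_name, text)    # step 2: result to browser
    meaning = understand(browser.forward_to_nlu(field_name))  # steps 3-4
    return ACTIONS[meaning.intent]              # step 5: select next action
```

Routing the recognized text through the `Browser` object before it reaches `understand` mirrors the ordering in the abstract, where the NLU system receives the recognition result from the browser and not directly from the recognizer.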
20 Claims
1. A method of integrating conversational speech into a multimodal, Web-based processing model, said method comprising:
speech recognizing a user spoken utterance directed to a voice-enabled field of a multimodal markup language document presented within a browser using a statistical grammar to determine a recognition result;
providing the recognition result to the browser;
receiving, within a natural language understanding (NLU) system, the recognition result from the browser;
semantically processing the recognition result to determine a meaning; and
selecting a next programmatic action to be performed according to the meaning.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)
9. A system for processing multimodal interactions including conversational speech using a Web-based processing model, said system comprising:
a multimodal server configured to process a multimodal markup language document and store non-visual portions of the multimodal markup language document, wherein the multimodal server provides visual portions of the multimodal markup language document to a client browser;
a voice server configured to perform automatic speech recognition upon a user spoken utterance directed to a voice-enabled field of the multimodal markup language document, wherein said voice server utilizes a statistical grammar to process the user spoken utterance directed to the voice-enabled field, wherein the client browser is provided with a result from the automatic speech recognition;
a conversational server configured to semantically process the result of the automatic speech recognition to determine a meaning that is provided to a Web server, wherein the conversational server receives the result of the automatic speech recognition to be semantically processed from the client browser via the Web server; and
an application server configured to provide data responsive to an instruction from the Web server, wherein the Web server issues the instruction according to the meaning.
(Dependent claims: 10, 11, 12)
13. A machine readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
speech recognizing a user spoken utterance directed to a voice-enabled field of a multimodal markup language document presented within a browser using a statistical grammar to determine a recognition result;
providing the recognition result to the browser;
receiving, within a natural language understanding (NLU) system, the recognition result from the browser;
semantically processing the recognition result to determine a meaning; and
selecting a next programmatic action to be performed according to the meaning.
(Dependent claims: 14, 15, 16, 17, 18, 19, 20)
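The division of responsibilities among the servers recited in claim 9 can be pictured with a rough sketch. Everything here is an assumption made for illustration: the class and method names, the dictionary-based document with `"visual"` and `"voice"` keys, the keyword-matching interpretation, and the canned application data are hypothetical, and the voice server is a stub that does no real speech recognition.

```python
class MultimodalServer:
    """Keeps non-visual portions; hands visual portions to the browser."""

    def __init__(self):
        self.non_visual = {}

    def process(self, doc_id, document):
        # Hypothetical document shape: {"visual": ..., "voice": ...}.
        self.non_visual[doc_id] = document["voice"]
        return document["visual"]


class VoiceServer:
    """Stub for automatic speech recognition with a statistical grammar."""

    def recognize(self, utterance):
        # Assumption: input is already text; a real server decodes audio.
        return utterance.lower()


class ConversationalServer:
    """Toy semantic processing that maps recognized text to a meaning."""

    def interpret(self, text):
        return "weather_query" if "weather" in text else "unknown"


class ApplicationServer:
    """Returns data in response to an instruction from the Web server."""

    def fetch(self, instruction):
        canned = {"get_forecast": "sunny"}  # placeholder data
        return canned.get(instruction, "no data")


class WebServer:
    """Relays the browser's result and issues an instruction by meaning."""

    def __init__(self, conversational, application):
        self.conversational = conversational
        self.application = application

    def handle(self, text_from_browser):
        # The conversational server receives the result via the Web server.
        meaning = self.conversational.interpret(text_from_browser)
        # The Web server chooses an instruction according to the meaning.
        instruction = {"weather_query": "get_forecast"}.get(meaning, "none")
        return self.application.fetch(instruction)
```

In this arrangement the client browser would call `VoiceServer.recognize`, receive the text, and then pass it to `WebServer.handle`, so the recognition result reaches the conversational server only through the browser and the Web server.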
Specification