Distributed voice web architecture and associated components and methods
Abstract
A speech-enabled distributed processing system forming a Voice Web includes a gateway, one or more voice content sites coupled to the gateway over a wide area network, and a browser coupled to the gateway over a network, which may or may not be the wide area network. The gateway receives telephone calls from one or more users over telephony connections and performs endpointing of speech of each user. The browser provides the gateway with information enabling the gateway to selectively direct the endpointed speech to a voice content site via the wide area network. The gateway outputs the endpointed speech in the form of application protocol requests onto the wide area network to the appropriate site, as specified by the browser, or to the browser. The gateway receives prompts in the form of application protocol responses from the browser or a voice content site and plays the prompts to the appropriate user over the telephony connection. While accessing a selected voice content site, the gateway reroutes the endpointed speech to the browser if the endpointing result represents a hotword candidate.
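The routing rule in the abstract can be illustrated with a minimal sketch. It assumes the gateway keeps, per call, the address of the browser and of the voice content site the browser last selected, and that an application protocol request can be modeled as a plain dictionary. The names `Session`, `choose_destination`, and `build_request`, and the example URLs, are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Per-call routing state held by the gateway (hypothetical structure)."""
    browser_url: str        # where hotword candidates are sent
    current_site_url: str   # site most recently selected via the browser


def choose_destination(session: Session, hotword_candidate: bool) -> str:
    """Reroute to the browser when the endpointing result is a hotword candidate."""
    return session.browser_url if hotword_candidate else session.current_site_url


def build_request(session: Session, audio: bytes, hotword_candidate: bool) -> dict:
    """Wrap endpointed speech in a request-like dictionary (a stand-in for a real request)."""
    return {
        "method": "POST",
        "url": choose_destination(session, hotword_candidate),
        "headers": {"Content-Type": "application/octet-stream"},
        "body": audio,
    }


session = Session(
    browser_url="http://browser.example/speech",
    current_site_url="http://weather.example/speech",
)
normal = build_request(session, b"\x01\x02", hotword_candidate=False)
rerouted = build_request(session, b"\x01\x02", hotword_candidate=True)
```

In this sketch the only decision the gateway makes is the destination; recognition itself stays at the remote end, as the abstract describes.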
54 Claims
-
1. A method comprising:
-
receiving speech of a user;
endpointing the speech of the user locally for automatic speech recognition;
transmitting the endpointed speech of the user to a remote site over a wide area network for remote speech recognition;
receiving remotely generated prompts transmitted over the wide area network; and
playing the prompts to the user.
Dependent claims: 2-27
receiving a first voice hyperlink control message from a remote speech application; and
transmitting a second voice hyperlink control message to a remote voice browser in response to the first voice hyperlink control message.
-
-
8. A method as recited in claim 1, further comprising receiving and responding to a control message from a remote voice browser, the control message sent via a second network separate from the wide area network.
-
9. A method as recited in claim 1, wherein said endpointing comprises concurrently applying, to the speech of the user, a set of endpointing parameters for a voice browser and a set of endpointing parameters for a speech application other than the voice browser.
-
10. A method as recited in claim 9, further comprising transmitting the endpointed speech to the voice browser if the speech of the user satisfies the set of endpointing parameters for the voice browser.
-
11. A method as recited in claim 10, wherein the set of endpointing parameters for the voice browser corresponds to a browser hotword.
-
12. A method as recited in claim 11, further comprising transmitting the endpointed speech to said speech application other than the voice browser even if the speech of the user satisfies the set of endpointing parameters for the browser hotword.
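Claims 9 through 12 can be read alongside a small sketch of applying two parameter sets to the same utterance. Everything concrete here is an assumption made for illustration: the endpointer is a simple energy-threshold detector over per-frame energies, a parameter set is "satisfied" when it yields a speech segment whose length falls within its bounds, and "concurrently" is approximated by evaluating both sets over the same frames. The names `EndpointParams`, `endpoint`, and `satisfied_sets` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class EndpointParams:
    threshold: float        # frame energy at or above this counts as speech
    min_speech: int         # minimum segment length, in frames
    max_speech: int         # maximum segment length, in frames
    trailing_silence: int   # silent frames that close a segment


def endpoint(energies: Sequence[float], p: EndpointParams) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first acceptable segment, end exclusive, or None."""
    start = None
    silence = 0
    for i, e in enumerate(energies):
        if e >= p.threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= p.trailing_silence:
                end = i - silence + 1
                if p.min_speech <= end - start <= p.max_speech:
                    return (start, end)
                start, silence = None, 0  # reject and keep scanning
    if start is not None:
        end = len(energies) - silence
        if p.min_speech <= end - start <= p.max_speech:
            return (start, end)
    return None


def satisfied_sets(energies: Sequence[float],
                   param_sets: Dict[str, EndpointParams]) -> Dict[str, Tuple[int, int]]:
    """Apply every parameter set to the same frames; keep those that are satisfied."""
    results = {}
    for name, p in param_sets.items():
        segment = endpoint(energies, p)
        if segment is not None:
            results[name] = segment
    return results


# Illustrative values only: a short-utterance set for a browser hotword
# and a more permissive set for the speech application.
PARAM_SETS = {
    "browser": EndpointParams(threshold=5.0, min_speech=2, max_speech=4, trailing_silence=2),
    "application": EndpointParams(threshold=5.0, min_speech=2, max_speech=100, trailing_silence=3),
}
```

A short utterance can satisfy both sets, in which case the gateway may forward it to the browser (claim 10) and also to the application (claim 12); a long utterance satisfies only the application set.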
-
13. A method as recited in claim 1, further comprising locally recognizing a browser hotword in the speech of the user.
-
14. A method as recited in claim 1, wherein said endpointing comprises:
-
using a set of endpointing parameters; and
dynamically adjusting the endpointing parameters during a session with the user based on a response received from the remote site.
-
-
15. A method as recited in claim 14, wherein the endpointing parameters are modifiable on a per-utterance basis.
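As an illustration of claims 14 and 15, the sketch below updates the endpointing parameters after each response from the remote site, so that the next utterance is endpointed with the new values. The response format is an assumption: a dictionary with an optional `"endpointing"` field carrying overrides. The field name, the parameter names, and the default values are all hypothetical.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class EndpointParams:
    threshold: float = 5.0           # illustrative default
    trailing_silence_ms: int = 800   # illustrative default
    max_speech_ms: int = 10000       # illustrative default


def apply_response(params: EndpointParams, response: dict) -> EndpointParams:
    """Return parameters for the next utterance, updated from a site response.

    Unknown override keys are ignored so that a malformed response cannot
    break the endpointer.
    """
    known = {f.name for f in fields(params)}
    overrides = {k: v for k, v in response.get("endpointing", {}).items() if k in known}
    return replace(params, **overrides)


params = EndpointParams()
# Hypothetical response: a yes/no prompt, so the site asks for a shorter silence window.
response = {"prompt": "Did you say Boston?", "endpointing": {"trailing_silence_ms": 300}}
next_params = apply_response(params, response)
```

Because `apply_response` is called once per response, the parameters can change on a per-utterance basis within a single session.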
-
16. A method as recited in claim 1, performed concurrently for each of a plurality of users, to allow each of the users to sequentially access selected ones of a plurality of remote sites on the wide area network.
-
17. A method as recited in claim 1, performed locally within a telephony end user device.
-
18. A method as recited in claim 1, wherein said receiving speech of the user comprises receiving the speech of the user over the wide area network.
-
19. A method as recited in claim 1, wherein said receiving speech of the user comprises receiving the speech of the user using Internet Protocol (IP) telephony.
-
20. A method as recited in claim 1, wherein said receiving speech of the user comprises receiving the speech of the user over a Public Switched Telephone Network (PSTN).
-
21. A method as recited in claim 1, wherein said receiving speech of the user comprises receiving the speech of the user from a local microphone, and said playing the prompts to the user comprises playing the prompts to the user via a local speaker.
-
22. A method as recited in claim 1, further comprising:
-
receiving a Hypertext Transport Protocol (HTTP) cookie from a remote speech application; and
using the HTTP cookie to maintain state of the remote speech application.
-
-
23. A method as recited in claim 22, further comprising using the HTTP cookie to maintain state of the remote speech application within a user session.
-
24. A method as recited in claim 22, further comprising using the HTTP cookie to maintain state of the remote speech application between user sessions.
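A minimal sketch of the cookie handling in claims 22 through 24 follows, using Python's standard `http.cookies` module to parse a `Set-Cookie` header value. The storage scheme is an assumption: cookies are kept per user and per site, so they are available on later requests in the same call and, if the store outlives the call, in later calls as well. The class name `GatewayCookieStore` and the example identifiers are hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Dict, Tuple


class GatewayCookieStore:
    """Keeps cookies from speech applications, keyed by (user, site)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Dict[str, str]] = {}

    def remember(self, user_id: str, site: str, set_cookie_value: str) -> None:
        """Record the cookies carried by one Set-Cookie header value."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_value)
        jar = self._store.setdefault((user_id, site), {})
        for name, morsel in parsed.items():
            jar[name] = morsel.value

    def cookie_header(self, user_id: str, site: str) -> str:
        """Build the Cookie header value to send back to the same site."""
        jar = self._store.get((user_id, site), {})
        return "; ".join(f"{name}={value}" for name, value in jar.items())


store = GatewayCookieStore()
store.remember("caller-1", "weather.example", "dialog_state=ask_city; Path=/")
header = store.cookie_header("caller-1", "weather.example")
```

Keying the store by a user identifier rather than by a call identifier is what would let state carry over between sessions in this sketch; an in-memory dictionary alone only covers state within a running gateway process.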
-
25. A method as recited in claim 1, further comprising receiving verification information resulting from a speaker identity verification process executing in a remote site.
-
26. A method as recited in claim 25, wherein the remote site executing the speaker identity verification process is a site of a voice browser.
-
27. A method as recited in claim 25, further comprising providing the verification information to a remote voice content site over the wide area network in response to a user attempting to access said remote voice content site.
-
28. A method of providing a user with access to voice content on a network, the method comprising:
-
receiving a first voice hyperlink control message transmitted from a voice browser in a first remote voice content site;
activating a voice hyperlink in response to the voice hyperlink control message to provide a user with voice access to a speech application in a second remote voice content site over a wide area network;
receiving speech of the user;
endpointing the speech of the user locally for automatic speech recognition, including concurrently applying, to the speech of the user, a first set of endpointing parameters for the voice browser and a second set of endpointing parameters for said speech application;
transmitting endpointed speech of the user to the second remote voice content site via the wide area network for speech recognition;
receiving remotely generated prompts transmitted over the wide area network; and
playing the prompts to the user.
-
-
29. A method comprising:
-
receiving remotely transmitted endpointed speech of a user over a network, the endpointed speech having been endpointed for automatic speech recognition by a remote device and transmitted onto the network by the remote device;
recognizing the speech locally;
generating a prompt in response to the speech;
transmitting the prompt to the remote device over the network; and
providing a voice hyperlink control message to the remote device over the network to allow the remote device to access a remote voice content site.
Dependent claims: 30-38
-
-
39. A method comprising:
-
receiving endpointed speech of a user transmitted remotely over a wide area network, the endpointed speech having been endpointed for automatic speech recognition by a remote device and transmitted onto the wide area network by the remote device;
recognizing the speech locally;
generating a prompt in response to the speech; and
transmitting the prompt to the remote device over the wide area network.
Dependent claims: 40-44
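The site-side steps of claim 39 (receive endpointed speech, recognize it, generate a prompt, send the prompt back) can be sketched as a single handler. The recognizer here is a stand-in lookup table, not a real speech recognizer, and the response is modeled as a dictionary; `handle_endpointed_speech`, `fake_recognize`, and the prompt table are hypothetical names used only for illustration.

```python
from typing import Callable, Dict

# Stand-in recognizer: maps known audio payloads to text (assumption for the demo).
_KNOWN_AUDIO: Dict[bytes, str] = {b"audio-weather": "weather", b"audio-news": "news"}


def fake_recognize(audio: bytes) -> str:
    """Return the recognized text, or an empty string when nothing matches."""
    return _KNOWN_AUDIO.get(audio, "")


PROMPTS: Dict[str, str] = {
    "weather": "For which city would you like the weather?",
    "news": "Here are today's headlines.",
}


def handle_endpointed_speech(audio: bytes,
                             recognize: Callable[[bytes], str],
                             prompts: Dict[str, str],
                             fallback: str = "Sorry, I did not understand.") -> dict:
    """Recognize endpointed speech and build the prompt response for the remote device."""
    text = recognize(audio)
    return {"status": 200, "recognized": text, "prompt": prompts.get(text, fallback)}


reply = handle_endpointed_speech(b"audio-weather", fake_recognize, PROMPTS)
unknown = handle_endpointed_speech(b"audio-other", fake_recognize, PROMPTS)
```

Passing the recognizer in as a callable keeps the handler independent of any particular recognition engine, which the claim does not specify.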
-
-
45. A device comprising:
-
a voice interface to allow the device to receive speech from a user;
an endpointer to perform endpointing of the speech of the user for automatic speech recognition;
a network interface to connect the device to a wide area network; and
a processor to control the device to cause the device to provide the user with voice access to a remote content site maintaining a speech application over the wide area network, wherein the processor is configured to transmit the endpointed speech of the user to a remote speech recognizer over the wide area network, to receive prompts via the wide area network, and to play the prompts to the user via the voice interface.
-
-
46. A gateway for use in a speech-enabled processing system, the gateway comprising:
-
voice interface means for receiving speech from a user;
endpointer means for endpointing the speech of the user for automatic speech recognition;
network interface means for connecting the gateway to a wide area network; and
control means for controlling the gateway to provide the user with voice access to a remote content site maintaining a speech-enabled application via the wide area network, the control means including means for transmitting results of said endpointing to the remote speech application over the wide area network, means for receiving prompts transmitted over the wide area network, and means for playing the prompts to the user using the voice interface means.
-
-
47. A voice content site on a network, the voice content site comprising:
-
means for receiving remotely transmitted endpointed speech of a user over a network, the endpointed speech having been endpointed for automatic speech recognition and transmitted onto the network by a remote gateway having a voice connection with the user;
means for recognizing the speech;
means for generating a prompt in response to the speech; and
means for transmitting the prompt to the gateway over the network.
-
-
48. An apparatus for operating a voice content site on a network, the apparatus comprising:
-
means for receiving endpointed speech of a user transmitted remotely over a wide area network, the endpointed speech having been endpointed for automatic speech recognition and transmitted onto the wide area network by a remote gateway having a voice connection with the user;
means for recognizing the speech locally;
means for generating a prompt in response to the speech; and
means for transmitting the prompt to the gateway over the wide area network.
-
-
49. A speech-enabled distributed processing system comprising:
-
a gateway configured to provide a user with sequential voice access to selected ones of a plurality of remote voice content sites on a first network, each of the remote voice content sites operating a speech application, the gateway coupled to receive speech from the user via a voice interface, the gateway further configured to perform endpointing of the speech for automatic speech recognition, to transmit the endpointed speech onto the first network, and to receive prompts transmitted over the first network and to play the prompts to the user; and
a first voice content site remotely coupled to the gateway on the first network, the first voice content site configured to receive endpointed speech of the user transmitted by the gateway over the first network, to perform speech recognition on the endpointed speech, to generate prompts and to transmit the generated prompts to the gateway over the first network, and to provide control messages to the gateway to configure the gateway to provide the user with access to another remote voice content site on the first network in response to an utterance of the user.
-
-
50. A speech-enabled processing system comprising:
-
a gateway coupled to concurrently receive a plurality of telephone communications, each from a different user, via a telephony enabled network, the gateway configured to provide concurrently each of the users with sequential voice access to selected ones of a plurality of remote voice content sites, each of the remote voice content sites operating a speech application, at least some of the remote voice content sites coupled to the gateway via a wide area network, the gateway further configured to perform endpointing of speech of the user for automatic speech recognition and to output results of said endpointing in requests onto the wide area network, the results of said endpointing selectively directed by the gateway to appropriate ones of the remote content sites, the gateway further configured to receive prompts in responses via the wide area network and to play the prompts to the user;
a first voice content site of the plurality of voice content sites, coupled to the gateway remotely via the wide area network, the voice content site configured to receive the requests via the wide area network and to perform speech recognition on the endpointed speech contained therein, the voice content site further configured to generate the prompts and to output the packetized prompts in the responses onto the wide area network; and
a second voice content site of the plurality of voice content sites, coupled to the gateway, the second voice content site including a voice browser configured to control access by the gateway to the plurality of remote voice content sites, the voice browser configured to provide the gateway with voice hyperlink control messages to configure the gateway to selectively direct the results of said endpointing in response to speech from a user, to activate a voice hyperlink to a selected voice content site.
-
-
51. An apparatus comprising:
-
a telephony device to provide telephonic communication between a local user and a remote user on a wide area network, the telephony device including an audio input device to receive speech from the local user and an audio output device to output speech of the remote user to the local user; and
a gateway to provide the local user with voice access to any of a plurality of remote voice content sites via the wide area network, the gateway including an endpointer to endpoint speech of the local user for automatic speech recognition, the gateway configured to transmit results of endpointing the speech of the local user to a remote speech application over the wide area network, to receive prompts transmitted over the wide area network by the speech application, and to play the prompts to the local user using the audio output device.
Dependent claims: 52-54
-
Specification