Distributed voice web architecture and associated components and methods

US 6,785,653 B1
Filed: 05/01/2000
Issued: 08/31/2004
Est. Priority Date: 05/01/2000
Status: Expired due to Term

5 Assignments

0 Petitions

Abstract

A speech-enabled distributed processing system forming a Voice Web includes a gateway, one or more voice content sites coupled to the gateway over a wide area network, and a browser coupled to the gateway over a network, which may or may not be the wide area network. The gateway receives telephone calls from one or more users over telephony connections and performs endpointing of speech of each user. The browser provides the gateway with information enabling the gateway to selectively direct the endpointed speech to a voice content site via the wide area network. The gateway outputs the endpointed speech in the form of application protocol requests onto the wide area network to the appropriate site, as specified by the browser, or to the browser. The gateway receives prompts in the form of application protocol responses from the browser or a voice content site and plays the prompts to the appropriate user over the telephony connection. While accessing a selected voice content site, the gateway reroutes the endpointed speech to the browser if the endpointing result represents a hotword candidate.
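
The rerouting rule in the last sentence of the abstract can be sketched as a small routing function. This is an illustrative Python sketch only, not the patented implementation. The `Utterance` and `Session` types, the `hotword_candidate` flag, and the example URLs are hypothetical. The sketch also assumes the endpointer has already decided whether a segment is a hotword candidate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Utterance:
    """An endpointed utterance as a gateway might forward it (hypothetical shape)."""
    user_id: str
    audio: bytes
    hotword_candidate: bool  # assumed to be set by the endpointer


@dataclass
class Session:
    """Per-call routing state kept by the gateway (hypothetical)."""
    browser_url: str
    current_site_url: Optional[str] = None  # None while the user is talking to the browser


def route(session: Session, utt: Utterance) -> str:
    """Return the URL that should receive this endpointed speech.

    While a content site is active, speech goes to that site unless the
    endpointer flagged it as a hotword candidate; then it is rerouted
    to the browser.
    """
    if session.current_site_url is None or utt.hotword_candidate:
        return session.browser_url
    return session.current_site_url
```

Keeping the decision in one pure function makes the per-call state (which site is active) explicit. A real gateway would also need to process the browser's control messages that change `current_site_url`.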

54 Claims

1. A method comprising:
- receiving speech of a user;
- endpointing the speech of the user locally for automatic speech recognition;
- transmitting the endpointed speech of the user to a remote site over a wide area network for remote speech recognition;
- receiving remotely generated prompts transmitted over the wide area network; and
- playing the prompts to the user.
2. A method as recited in claim 1, wherein the remote site is a remote voice content site, the method further comprising activating a voice hyperlink to provide the user with voice access to the remote site over the wide area network.
3. A method as recited in claim 2, further comprising receiving a first voice hyperlink control message from a remote speech application, wherein said activating the voice hyperlink comprises responding to the voice hyperlink control message to provide the user with access to the remote voice content site.
4. A method as recited in claim 3, wherein the remote speech application is in a remote voice content site other than said remote voice content site.
5. A method as recited in claim 3, wherein the remote speech application is a voice browser.
6. A method as recited in claim 3, wherein the remote speech application is a content application.
7. A method as recited in claim 1, further comprising:
8. A method as recited in claim 1, further comprising receiving and responding to a control message from a remote voice browser, the control message sent via a second network separate from the wide area network.
9. A method as recited in claim 1, wherein said endpointing comprises concurrently applying, to the speech of the user, a set of endpointing parameters for a voice browser and a set of endpointing parameters for a speech application other than the voice browser.
10. A method as recited in claim 9, further comprising transmitting the endpointed speech to the voice browser if the speech of the user satisfies the set of endpointing parameters for the voice browser.
11. A method as recited in claim 10, wherein the set of endpointing parameters for the voice browser corresponds to a browser hotword.
12. A method as recited in claim 11, further comprising transmitting the endpointed speech to said speech application other than the voice browser even if the speech of the user satisfies the set of endpointing parameters for the browser hotword.
13. A method as recited in claim 1, further comprising locally recognizing a browser hotword in the speech of the user.
14. A method as recited in claim 1, wherein said endpointing comprises:
- using a set of endpointing parameters; and
- dynamically adjusting the endpointing parameters during a session with the user based on a response received from the remote site.
15. A method as recited in claim 14, wherein the endpointing parameters are modifiable on a per-utterance basis.
16. A method as recited in claim 1, performed concurrently for each of a plurality of users, to allow each of the users to sequentially access selected ones of a plurality of remote sites on the wide area network.
17. A method as recited in claim 1, performed locally within a telephony end user device.
18. A method as recited in claim 1, wherein said receiving speech of the user comprises receiving the speech of the user over the wide area network.
19. A method as recited in claim 1, wherein said receiving speech of the user comprises receiving the speech of the user using Internet Protocol (IP) telephony.
20. A method as recited in claim 1, wherein said receiving speech of the user comprises receiving the speech of the user over a Public Switched Telephone Network (PSTN).
21. A method as recited in claim 1, wherein said receiving speech of the user comprises receiving the speech of the user from a local microphone, and said playing the prompts to the user comprises playing the prompts to the user via a local speaker.
22. A method as recited in claim 1, further comprising:
- receiving a Hypertext Transport Protocol (HTTP) cookie from a remote speech application; and
- using the HTTP cookie to maintain state of the remote speech application.
23. A method as recited in claim 22, further comprising using the HTTP cookie to maintain state of the remote speech application within a user session.
24. A method as recited in claim 22, further comprising using the HTTP cookie to maintain state of the remote speech application between user sessions.
25. A method as recited in claim 1, further comprising receiving verification information resulting from a speaker identity verification process executing in a remote site.
26. A method as recited in claim 25, wherein the remote site executing the speaker identity verification process is a site of a voice browser.
27. A method as recited in claim 25, further comprising providing the verification information to a remote voice content site over the wide area network in response to a user attempting to access said remote voice content site.
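
Claims 9 through 12 apply two endpointing parameter sets to the same speech at once. One set is tuned for the browser hotword and the other for the active speech application. The patent does not specify the endpointing algorithm, so the sketch below substitutes a simple frame-energy endpointer. The `EndpointParams` fields and all numeric values are hypothetical. The sketch shows how a short-pause, short-duration setting can isolate a brief hotword-like segment while a more tolerant setting keeps a longer utterance with an internal pause intact.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Segment = Optional[Tuple[int, int]]


@dataclass(frozen=True)
class EndpointParams:
    """Hypothetical endpointing parameter set; durations are in frames."""
    energy_threshold: float  # a frame above this energy counts as speech
    trailing_silence: int    # consecutive silent frames that end a segment
    max_speech: int          # longest segment accepted, in frames


def endpoint(energies: List[float], p: EndpointParams) -> Segment:
    """Find the first speech segment as (start, end), with end exclusive.

    Returns None if no segment has ended yet or if the segment is longer
    than p.max_speech.
    """
    start: Optional[int] = None
    silence = 0
    for i, e in enumerate(energies):
        if start is None:
            if e > p.energy_threshold:
                start = i  # first speech frame
            continue
        if e > p.energy_threshold:
            silence = 0  # speech resumed, so the pause does not end the segment
        else:
            silence += 1
            if silence >= p.trailing_silence:
                end = i - silence + 1  # index just after the last speech frame
                return (start, end) if end - start <= p.max_speech else None
    return None


def endpoint_both(energies: List[float], browser_p: EndpointParams,
                  app_p: EndpointParams) -> Tuple[Segment, Segment]:
    """Apply the browser and application parameter sets to the same frames."""
    return endpoint(energies, browser_p), endpoint(energies, app_p)


# Hypothetical settings: hotwords are short and end quickly, while
# application utterances may be long and contain brief pauses.
BROWSER_HOTWORD = EndpointParams(energy_threshold=1.0, trailing_silence=2, max_speech=5)
APPLICATION = EndpointParams(energy_threshold=1.0, trailing_silence=4, max_speech=100)
```

For example, with `frames = [0, 5, 5, 0, 0, 5, 5, 5, 0, 0, 0, 0, 0]`, `endpoint_both(frames, BROWSER_HOTWORD, APPLICATION)` gives `(1, 3)` for the browser set, a short hotword candidate. For the application set it gives `(1, 8)` because that endpointer bridges the two-frame pause. Following claim 12, a gateway using this scheme could still forward the application's segment to the speech application when the browser's parameters also fire.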

28. A method of providing a user with access to voice content on a network, the method comprising:
- receiving a first voice hyperlink control message transmitted from a voice browser in a first remote voice content site;
- activating a voice hyperlink in response to the voice hyperlink control message to provide a user with voice access to a speech application in a second remote voice content site over a wide area network;
- receiving speech of the user;
- endpointing the speech of the user locally for automatic speech recognition, including concurrently applying, to the speech of the user, a first set of endpointing parameters for the voice browser and a second set of endpointing parameters for said speech application;
- transmitting endpointed speech of the user to the second remote voice content site via the wide area network for speech recognition;
- receiving remotely generated prompts transmitted over the wide area network; and
- playing the prompts to the user.

29. A method comprising:
- receiving remotely transmitted endpointed speech of a user over a network, the endpointed speech having been endpointed for automatic speech recognition by a remote device and transmitted onto the network by the remote device;
- recognizing the speech locally;
- generating a prompt in response to the speech;
- transmitting the prompt to the remote device over the network; and
- providing a voice hyperlink control message to the remote device over the network to allow the remote device to access a remote voice content site.
30. A method as recited in claim 29, wherein the network is a local area network.
31. A method as recited in claim 29, wherein the network is a wide area network.
32. A method as recited in claim 29, wherein the remote device has a voice connection with the user.
33. A method as recited in claim 29, further comprising transmitting a voice hyperlink control message to the remote device in response to speech from the user.
34. A method as recited in claim 29, further comprising transmitting a control message to the remote device indicating whether the speech of the user represents a hotword.
35. A method as recited in claim 29, further comprising transmitting a set of endpointing parameters to the remote device over the network.
36. A method as recited in claim 29, wherein the endpointed speech is received in packetized form, and wherein said transmitting the prompt comprises transmitting the prompt in packetized form.
37. A method as recited in claim 29, further comprising performing speaker verification on behalf of a plurality of remote voice content sites, including transmitting speaker identification information to said remote device over the network.
38. A method as recited in claim 29, further comprising locally storing an HTTP cookie to maintain state of a speech application between user sessions.
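
Claims 22 to 24 and 38 use ordinary HTTP cookies to carry a speech application's state within and between user sessions. One way a gateway could do this is to keep cookies per user and per site and replay them on later requests. The sketch below is an illustration under stated assumptions, not the patent's design. The `CookieStore` class and the idea of keying cookies by a caller identifier are hypothetical. Only `http.cookies.SimpleCookie` is a real standard-library class.

```python
from http.cookies import SimpleCookie
from typing import Dict, Tuple


class CookieStore:
    """Hypothetical per-user, per-site cookie storage for a gateway.

    Cookies set by a speech application are remembered and sent back on
    later requests to the same site, so the application can recover its
    state later in the same session or in a later session.
    """

    def __init__(self) -> None:
        self._jars: Dict[Tuple[str, str], SimpleCookie] = {}

    def save(self, user_id: str, site: str, set_cookie_header: str) -> None:
        """Parse a Set-Cookie value (e.g. "city=Boston; Path=/") and keep it."""
        jar = self._jars.setdefault((user_id, site), SimpleCookie())
        jar.load(set_cookie_header)  # attributes such as Path are parsed, not sent back

    def header_for(self, user_id: str, site: str) -> str:
        """Build the Cookie request header for this user and site ("" if none)."""
        jar = self._jars.get((user_id, site))
        if not jar:
            return ""
        return "; ".join(f"{m.key}={m.coded_value}" for m in jar.values())
```

Because the store is keyed by user rather than by call, a value saved in one session (such as the caller's city) would still be sent on the user's next call, which corresponds to the between-sessions case of claim 24. A production gateway would also need to honor attributes such as `Max-Age` and `Path`, which this sketch ignores.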

39. A method comprising:
- receiving endpointed speech of a user transmitted remotely over a wide area network, the endpointed speech having been endpointed for automatic speech recognition by a remote device and transmitted onto the wide area network by the remote device;
- recognizing the speech locally;
- generating a prompt in response to the speech; and
- transmitting the prompt to the remote device over the wide area network.
40. A method as recited in claim 39, wherein the remote device has a voice connection with the user.
41. A method as recited in claim 39, further comprising transmitting a voice hyperlink control message to the remote device over the wide area network to allow the remote device to access a remote voice content site on the network.
42. A method as recited in claim 39, further comprising transmitting a set of endpointing parameters to the remote device over the network.
43. A method as recited in claim 42, wherein said transmitting the set of endpointing parameters comprises transmitting the set of endpointing parameters in a response to the endpointed speech, such that the remote device can implement the set of endpointing parameters dynamically during a session with a user.
44. A method as recited in claim 39, wherein the endpointed speech is received in packetized form, and wherein said transmitting the prompt comprises transmitting the prompt in packetized form.
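
Claims 41 to 43, and claims 14 and 15 on the gateway side, have a site return endpointing parameters and voice hyperlink control messages alongside its prompt, so the gateway can change its behavior for the next utterance. The patent speaks of application protocol responses without fixing a message format. The JSON layout, field names, and `GatewaySession` type below are hypothetical, and prompts are modeled as text rather than audio.

```python
import json
from dataclasses import dataclass, field, replace
from typing import List


@dataclass(frozen=True)
class EndpointParams:
    """Hypothetical endpointing parameter set; durations are in frames."""
    energy_threshold: float
    trailing_silence: int
    max_speech: int


@dataclass
class GatewaySession:
    """Hypothetical per-call gateway state."""
    current_site_url: str
    params: EndpointParams
    played: List[str] = field(default_factory=list)  # prompts "played" so far


def apply_response(session: GatewaySession, body: str) -> None:
    """Apply one site response to the gateway's per-call state.

    Assumed JSON layout:
      {"prompt": str,
       "hyperlink": optional URL of another voice content site,
       "endpointing": optional partial override of EndpointParams fields}
    """
    msg = json.loads(body)
    session.played.append(msg["prompt"])  # stand-in for playing audio to the user
    if "endpointing" in msg:
        # New parameters take effect for the next utterance (per-utterance update).
        session.params = replace(session.params, **msg["endpointing"])
    if "hyperlink" in msg:
        # Voice hyperlink control message: later speech goes to the new site.
        session.current_site_url = msg["hyperlink"]
```

Applying overrides with `dataclasses.replace` leaves unspecified fields at their previous values. A site can therefore adjust a single setting for the next utterance without resending the whole parameter set.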

45. A device comprising:
- a voice interface to allow the device to receive speech from a user;
- an endpointer to perform endpointing of the speech of the user for automatic speech recognition;
- a network interface to connect the device to a wide area network; and
- a processor to control the device to cause the device to provide the user with voice access to a remote content site maintaining a speech application over the wide area network, wherein the processor is configured to transmit the endpointed speech of the user to a remote speech recognizer over the wide area network, to receive prompts via the wide area network, and to play the prompts to the user via the voice interface.

46. A gateway for use in a speech-enabled processing system, the gateway comprising:
- voice interface means for receiving speech from a user;
- endpointer means for endpointing the speech of the user for automatic speech recognition;
- network interface means for connecting the gateway to a wide area network; and
- control means for controlling the gateway to provide the user with voice access to a remote content site maintaining a speech-enabled application via the wide area network, the control means including means for transmitting results of said endpointing to the remote speech application over the wide area network, means for receiving prompts transmitted over the wide area network, and means for playing the prompts to the user using the voice interface means.

47. A voice content site on a network, the voice content site comprising:
- means for receiving remotely transmitted endpointed speech of a user over a network, the endpointed speech having been endpointed for automatic speech recognition and transmitted onto the network by a remote gateway having a voice connection with the user;
- means for recognizing the speech;
- means for generating a prompt in response to the speech; and
- means for transmitting the prompt to the gateway over the network.

48. An apparatus for operating a voice content site on a network, the apparatus comprising:
- means for receiving endpointed speech of a user transmitted remotely over a wide area network, the endpointed speech having been endpointed for automatic speech recognition and transmitted onto the wide area network by a remote gateway having a voice connection with the user;
- means for recognizing the speech locally;
- means for generating a prompt in response to the speech; and
- means for transmitting the prompt to the gateway over the wide area network.

49. A speech-enabled distributed processing system comprising:
- a gateway configured to provide a user with sequential voice access to selected ones of a plurality of remote voice content sites on a first network, each of the remote voice content sites operating a speech application, the gateway coupled to receive speech from the user via a voice interface, the gateway further configured to perform endpointing of the speech for automatic speech recognition, to transmit the endpointed speech onto the first network, and to receive prompts transmitted over the first network and to play the prompts to the user; and
- a first voice content site remotely coupled to the gateway on the first network, the first voice content site configured to receive endpointed speech of the user transmitted by the gateway over the first network, to perform speech recognition on the endpointed speech, to generate prompts and to transmit the generated prompts to the gateway over the first network, and to provide control messages to the gateway to configure the gateway to provide the user with access to another remote voice content site on the first network in response to an utterance of the user.

50. A speech-enabled processing system comprising:
- a gateway coupled to concurrently receive a plurality of telephone communications, each from a different user, via a telephony enabled network, the gateway configured to provide concurrently each of the users with sequential voice access to selected ones of a plurality of remote voice content sites, each of the remote voice content sites operating a speech application, at least some of the remote voice content sites coupled to the gateway via a wide area network, the gateway further configured to perform endpointing of speech of the user for automatic speech recognition and to output results of said endpointing in requests onto the wide area network, the results of said endpointing selectively directed by the gateway to appropriate ones of the remote content sites, the gateway further configured to receive prompts in responses via the wide area network and to play the prompts to the user;
- a first voice content site of the plurality of voice content sites, coupled to the gateway remotely via the wide area network, the voice content site configured to receive the requests via the wide area network and to perform speech recognition on the endpointed speech contained therein, the voice content site further configured to generate the prompts and to output the packetized prompts in the responses onto the wide area network; and
- a second voice content site of the plurality of voice content sites, coupled to the gateway, the second voice content site including a voice browser configured to control access by the gateway to the plurality of remote voice content sites, the voice browser configured to provide the gateway with voice hyperlink control messages to configure the gateway to selectively direct the results of said endpointing in response to speech from a user, to activate a voice hyperlink to a selected voice content site.

51. An apparatus comprising:
- a telephony device to provide telephonic communication between a local user and a remote user on a wide area network, the telephony device including an audio input device to receive speech from the local user and an audio output device to output speech of the remote user to the local user; and
- a gateway to provide the local user with voice access to any of a plurality of remote voice content sites via the wide area network, the gateway including an endpointer to endpoint speech of the local user for automatic speech recognition, the gateway configured to transmit results of endpointing the speech of the local user to a remote speech application over the wide area network, to receive prompts transmitted over the wide area network by the speech application, and to play the prompts to the local user using the audio output device.
52. An apparatus as recited in claim 51, wherein said telephony device uses Internet Protocol (IP) telephony to provide the telephonic communication.
53. An apparatus as recited in claim 51, wherein the apparatus is part of a personal computer (PC).
54. An apparatus as recited in claim 51, wherein the telephony device uses Internet Protocol (IP) telephony to provide the telephonic communication and the apparatus is part of a personal computer (PC).

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Inventors: White, James E.; Lennig, Matthew
Primary Examiner(s): MCFADDEN, SUSAN IRIS
Application Number: US09/561,680
Time in Patent Office: 1,583 days
Field of Search: 704/270.1, 704/275, 704/270
US Class Current: 704/270.1
CPC Class Codes:
- G10L 15/30: Distributed recognition, e....
- G10L 25/87: Detection of discrete point...
- H04M 3/4938: comprising a voice browser ...
