Interactive voice recognition and response over the internet

US 8,126,719 B1
Filed: 11/09/2010
Issued: 02/28/2012
Est. Priority Date: 10/04/1999
Status: Expired due to Term

3 Assignments

0 Petitions

Abstract

Methods and systems for handling speech recognition processing in effectively real-time, via the Internet, in order that users do not experience noticeable delays from the start until they receive responsive feedback. A user uses a client to access the Internet and a server supporting speech recognition processing. The user inputs speech to the client, which transmits the user speech to the server in approximate real-time. The server evaluates the user speech, and provides responsive feedback to the client, again, in approximate real-time, with minimum latency delays. The client, upon receiving responsive feedback from the server, displays, or otherwise provides, the feedback to the user.

65 Citations

26 Claims

1. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers in an uncompressed audio format, each buffer comprising a portion of the received audio speech, encode the stored audio speech in the one or more buffers before all of the audio speech is received, package the encoded audio speech into one or more packets to be transmitted over the Internet before all of the audio speech is received, and transmit a packet of encoded audio speech over the Internet before all of the audio speech is received;
  
  a server, the server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant speech into one or more buffers for the respective client, and evaluate the resultant speech received from each of the at least two clients, wherein the server further comprises the capability to transmit a response to a client, the response a result of the server's evaluation of the resultant speech received from the client, and the server alters a processing time used to evaluate the resultant speech based on a value communicated between the client and the server, and a client of the two or more clients further comprises the capability to receive the response from the server.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 26):
  - 2. The system of claim 1 wherein based on a URL sent by the client, the server determines whether to expect speech data for processing from the client.
  - 3. The system of claim 1 wherein the response is in text format, and a client of the two or more clients comprises a text-to-speech engine which converts a text format response to audio data, and an audio output device that the client uses to output the audio data to a user.
  - 4. The system of claim 1 wherein the server further comprises two or more stored text format files, and the server selects a stored text format file to transmit to a client of the two or more clients as a result of the server's evaluation of the resultant speech received from the client.
  - 5. The system of claim 4 wherein the server further comprises the capability to partition a stored text format file into two or more packets for the transmission over the Internet, and to transmit each packet over the Internet to a client.
  - 6. The system of claim 5 wherein a client further comprises an audio output device, and the capability to receive the packets of text format, convert the packets of text format to audio data and play the audio data to a user.
  - 7. The system of claim 1 wherein the one or more buffers comprises a linked list of buffers.
  - 8. The system of claim 1 wherein the one or more buffers comprises at least two sets of a linked list of buffers.
  - 9. The system of claim 1 wherein the one or more buffers comprises a predefined number of buffers.
  - 10. The system of claim 1 wherein the encoded audio speech is in a compressed format.
  - 11. The system of claim 1 wherein the one or more packets are TCP packets.
  - 12. The system of claim 1 wherein the server, based on its evaluation of the resultant speech from a first client of the at least two clients, generates a textual response and sends the textual response to the first client over the Internet.
  - 13. The system of claim 1 wherein the value comprises at least one numerical value.
  - 26. The system of claim 1 wherein the value comprises at least one numerical value.
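
The client side of claim 1 (buffer a portion of uncompressed audio, encode it, packetize it, and transmit it before the user has finished speaking) can be illustrated with a minimal Python sketch. This is not the patented implementation: the patent text shown here names no codec or wire format, so `zlib` as the encoder, the 8-byte `HEADER` layout, and the function name `stream_packets` are all assumptions made for illustration. A `deque` stands in for the linked list of buffers mentioned in claim 7.

```python
import struct
import zlib
from collections import deque
from typing import Iterable, Iterator

# Hypothetical wire format (not from the patent):
# 4-byte sequence number followed by 4-byte payload length, network byte order.
HEADER = struct.Struct("!II")


def stream_packets(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield one packet per captured buffer without waiting for the utterance to end."""
    pending = deque()  # stand-in for a linked list of buffers
    seq = 0
    for chunk in chunks:        # each chunk is a portion of uncompressed audio
        pending.append(chunk)   # store the raw audio in a buffer first
        while pending:
            raw = pending.popleft()
            payload = zlib.compress(raw)  # assumed stand-in for a speech codec
            yield HEADER.pack(seq, len(payload)) + payload
            seq += 1
```

Because `stream_packets` is a generator, the first packet is available as soon as the first portion of audio arrives; a caller would hand each yielded packet to a socket send call while the capture source is still producing audio.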

14. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user in an uncompressed audio format, encode the audio speech before all of the audio speech is received and storing the encoded speech in one or more buffers, package the encoded audio speech into one or more packets to be transmitted over the Internet before all of the audio speech is received, and transmit a packet of encoded audio speech over the Internet before all of the audio speech is received; and
  
  a server, the server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant speech into one or more buffers for the respective client, and evaluate the resultant speech received from each of the at least two clients, wherein the server further comprises the capability to transmit a response to a client, the response a result of the server's evaluation of the resultant speech received from the client, and the server alters a processing time used to evaluate the resultant speech based on a value communicated between the client and the server, and a client of the two or more clients further comprises the capability to receive the response from the server.
- Dependent claims (15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25):
  - 15. The system of claim 14 wherein based on a URL sent by the client, the server determines whether to expect speech data for processing from the client.
  - 16. The system of claim 14 wherein the response is in text format, and a client of the two or more clients comprises a text-to-speech engine which converts a text format response to audio data, and an audio output device that the client uses to output the audio data to a user.
  - 17. The system of claim 14 wherein the server further comprises two or more stored text format files, and the server selects a stored text format file to transmit to a client of the two or more clients as a result of the server's evaluation of the resultant speech received from the client.
  - 18. The system of claim 17 wherein the server further comprises the capability to partition a stored text format file into two or more packets for the transmission over the Internet, and to transmit each packet over the Internet to a client.
  - 19. The system of claim 18 wherein a client further comprises an audio output device, and the capability to receive the packets of text format, convert the packets of text format to audio data and play the audio data to a user.
  - 20. The system of claim 14 wherein the one or more buffers comprises a linked list of buffers.
  - 21. The system of claim 14 wherein the one or more buffers comprises at least two sets of a linked list of buffers.
  - 22. The system of claim 14 wherein the one or more buffers comprises a predefined number of buffers.
  - 23. The system of claim 14 wherein the encoded audio speech is in a compressed format.
  - 24. The system of claim 14 wherein the one or more packets are TCP packets.
  - 25. The system of claim 14 wherein the server, based on its evaluation of the resultant speech from a first client of the at least two clients, generates a textual response and sends the textual response to the first client over the Internet.
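
The server element shared by claims 1 and 14 (decode packets into per-client buffers, evaluate the speech, and alter the processing time based on a value exchanged with the client) might be sketched as follows. Everything concrete here is an assumption for illustration: the class name `SpeechServer`, the `HEADER` layout, `zlib` as the decoder, the reading of the "value" as a time budget in seconds, and the byte-matching score, which is only a placeholder for a real recognizer. Packets are assumed to arrive in order, as they would over TCP (claims 11 and 24), so the sequence number is not used for reordering.

```python
import struct
import time
import zlib
from collections import defaultdict

# Hypothetical wire format: 4-byte sequence number, 4-byte payload length.
HEADER = struct.Struct("!II")


class SpeechServer:
    def __init__(self, clock=time.monotonic):
        self._clock = clock                # injectable so the budget can be tested
        self._buffers = defaultdict(list)  # client id -> decoded audio portions
        self._budgets = {}                 # client id -> time budget in seconds

    def set_budget(self, client_id, seconds):
        """Record the value communicated by a client (assumed to be seconds)."""
        self._budgets[client_id] = seconds

    def receive(self, client_id, packet):
        """Decode one packet and append it to that client's own buffers."""
        _seq, length = HEADER.unpack_from(packet)
        payload = packet[HEADER.size:HEADER.size + length]
        self._buffers[client_id].append(zlib.decompress(payload))

    def evaluate(self, client_id, candidates, default_budget=0.05):
        """Return the best-scoring label found before the time budget runs out."""
        audio = b"".join(self._buffers[client_id])
        deadline = self._clock() + self._budgets.get(client_id, default_budget)
        best_label, best_score = None, -1
        for label, template in candidates:
            # Always score at least one candidate, then respect the deadline.
            if best_label is not None and self._clock() >= deadline:
                break
            # Placeholder score: count of byte positions that match the template.
            score = sum(a == b for a, b in zip(audio, template))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In this sketch a larger budget lets the server compare the audio against more candidates before answering, while a smaller one trades accuracy for a faster response; whether the patent's value works this way is not stated in the claims shown here.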

Current Assignee: Pearson Education Incorporated (Pearson plc)
Original Assignee: GlobalEnglish Corporation (Pearson plc)
Inventors: Jochumson, Christopher S.
Primary Examiner(s): Lerner, Martin
Application Number: US12/942,834
Time in Patent Office: 476 Days
Field of Search: 704/231, 704/235, 704/260, 704/270, 704/270.1, 704/275, 704/201, 704/258
US Class Current: 704/270.1
CPC Class Codes:
G10L 13/08   Text analysis or generation...
G10L 15/00   Speech recognition G10L17/0...
G10L 15/22   Procedures used during a sp...
G10L 15/30   Distributed recognition, e....
G10L 19/00   Speech or audio signals ana...
G10L 19/0018   Speech coding using phoneti...
