Method and system for network-based speech recognition

US 6,865,536 B2
Filed: 07/19/2002
Issued: 03/08/2005
Est. Priority Date: 10/04/1999
Status: Expired due to Term

Abstract

Methods and systems for handling speech recognition processing in effectively real-time, via the internet, in order that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximate real-time. The server evaluates the user speech in the context of the current speech recognition exercise being executed, and provides responsive feedback to the client, again in approximate real-time, with minimal latency. The client, upon receiving responsive feedback from the server, displays, or otherwise provides, the feedback to the user.

26 Claims

1. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers organized as a linked list, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over a network before all of the audio speech is received, and transmit a packet of encoded audio speech over the network before all of the audio speech is received; and
  
  a server, said server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein a linked list of buffers holds of a client about 0.1 seconds or less of uncompressed audio speech.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The system of claim 1 wherein the encoded audio speech is in a compressed format.
  - 3. The system of claim 1 wherein the server further comprises the capability to transmit a response to a client or the two or more clients, the response a result of the server's evaluation of the resultant raw speech received from the client or the two or more clients, and where the client or the two or more clients further comprises the capability to receive a response from the server.
  - 4. The system of claim 3 wherein the response is a text response, and a client of the two or more clients comprises a screen on which the client displays the text response.
  - 5. The system of claim 3 wherein the response is in a text format, and a client of the two or more clients comprises a text-to-speech engine which converts a text format response to audio data, and an audio output device that the client uses to output the audio data to the user.
  - 6. The system of claim 1 wherein the server further comprises two or more stored text format files, and the server selects a stored text format file to transmit to a client of the two or more clients as a result of the server's evaluation of the resultant raw speech received from the client.
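
The client-side behavior recited in claim 1 (buffer, encode, packetize, and transmit while speech is still arriving) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the 8 kHz 16-bit mono format, `zlib` as a stand-in codec, the 512-byte payload limit, the packet header layout, and the names `stream_speech` and `packetize`. A `collections.deque` stands in for the linked list of buffers.

```python
import struct
import zlib
from collections import deque
from typing import Callable, Iterable, Iterator

SAMPLE_RATE = 8000        # assumed: 8 kHz mono capture
BYTES_PER_SAMPLE = 2      # assumed: 16-bit PCM
BUFFER_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE // 10   # 0.1 s of raw audio
MAX_PAYLOAD = 512         # assumed per-packet payload limit
HEADER = struct.Struct("!IHB")  # buffer index, payload length, last-fragment flag


def packetize(index: int, encoded: bytes) -> Iterator[bytes]:
    """Split one encoded buffer into packets, each with a small header."""
    for start in range(0, len(encoded), MAX_PAYLOAD):
        chunk = encoded[start:start + MAX_PAYLOAD]
        last = start + MAX_PAYLOAD >= len(encoded)
        yield HEADER.pack(index, len(chunk), int(last)) + chunk


def stream_speech(audio_chunks: Iterable[bytes],
                  send: Callable[[bytes], None]) -> int:
    """Send each 0.1 s buffer as soon as it fills; return the buffer count."""
    pending = deque()      # filled buffers waiting to be encoded
    current = bytearray()  # buffer currently being filled
    index = 0
    for chunk in audio_chunks:
        current.extend(chunk)
        while len(current) >= BUFFER_BYTES:
            pending.append(bytes(current[:BUFFER_BYTES]))
            del current[:BUFFER_BYTES]
        # Encode and transmit now, before the rest of the speech arrives.
        while pending:
            for packet in packetize(index, zlib.compress(pending.popleft())):
                send(packet)
            index += 1
    if current:  # flush the final, partially filled buffer
        for packet in packetize(index, zlib.compress(bytes(current))):
            send(packet)
        index += 1
    return index
```

In this sketch `send` would typically wrap a socket write; because it is called inside the capture loop, the first packets leave the client while the user is still speaking.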

7. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers organized as a linked list, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over a network before all of the audio speech is received, and transmit a packet of encoded audio speech over the network before all of the audio speech is received; and
  
  a server, said server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein the server comprises the capability of receiving from a client a grammar reference number, and the server will decode each of the packets of audio speech received from the client according to the grammar reference number.
- Dependent claims (8, 9, 10, 11, 12, 13, 14):
  - 8. The system of claim 7 wherein each buffer in the linked list of buffers of a client holds about 0.1 seconds or less of uncompressed audio speech.
  - 9. The system of claim 7 wherein the encoded audio speech is in a compressed format.
  - 10. The system of claim 7 wherein the server further comprises the capability to transmit a response to a client or the two or more clients, the response a result of the server's evaluation of the resultant raw speech received from the client or the two or more clients, and where the client or the two or more clients further comprises the capability to receive a response from the server.
  - 11. The system of claim 10 wherein the response is a text response, and a client of the two or more clients comprises a screen on which the client displays the text response.
  - 12. The system of claim 10 wherein the response is in a text format, and a client of the two or more clients comprises a text-to-speech engine which converts a text format response to audio data, and an audio output device that the client uses to output the audio data to the user.
  - 13. The system of claim 7 wherein the server further comprises two or more stored text format files, and the server selects a stored text format file to transmit to a client of the two or more clients as a result of the server's evaluation of the resultant raw speech received from the client.
  - 14. The system of claim 7 wherein a linked list of buffers holds of the client about 0.1 seconds or less of uncompressed audio speech.
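
The server side of claim 7 (per-client buffers plus a grammar reference number supplied by the client) might look roughly like the following Python sketch. It is illustrative only: the `GRAMMARS` table, the `SpeechServer` and `ClientSession` names, the use of `zlib` as the codec, the assumption that each packet carries one whole compressed buffer, and the injected `recognize` callable are all hypothetical, and no real recognition engine is modeled.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical grammar table: reference number -> phrases valid for an exercise.
GRAMMARS: Dict[int, List[str]] = {
    1: ["yes", "no"],
    2: ["good morning", "good evening"],
}


@dataclass
class ClientSession:
    grammar_ref: int                                   # sent by the client
    raw_buffers: List[bytes] = field(default_factory=list)


class SpeechServer:
    def __init__(self, recognize: Callable[[bytes, List[str]], str]) -> None:
        # `recognize` is a placeholder for a recognition engine (assumption).
        self._recognize = recognize
        self._sessions: Dict[str, ClientSession] = {}

    def open_session(self, client_id: str, grammar_ref: int) -> None:
        """Record which grammar this client's speech is evaluated against."""
        if grammar_ref not in GRAMMARS:
            raise KeyError(f"unknown grammar reference number: {grammar_ref}")
        self._sessions[client_id] = ClientSession(grammar_ref)

    def receive(self, client_id: str, encoded_packet: bytes) -> None:
        """Decode one packet and store the raw speech in that client's buffers."""
        raw = zlib.decompress(encoded_packet)
        self._sessions[client_id].raw_buffers.append(raw)

    def evaluate(self, client_id: str) -> str:
        """Evaluate the client's buffered speech using its selected grammar."""
        session = self._sessions[client_id]
        raw = b"".join(session.raw_buffers)
        return self._recognize(raw, GRAMMARS[session.grammar_ref])
```

Keeping one `ClientSession` per client keeps each client's raw speech and grammar choice separate, so two clients working on different exercises do not affect each other.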

15. A system comprising:
- one or more clients, each client provides a user with a series of questions, the capability to receive audio speech from a user provided as answers to the series of questions, store the audio speech in one or more buffers organized as a linked list, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over a network before all of the audio speech is received, and transmit a packet of encoded audio speech over the network before all of the audio speech is received; and
  
  a server, said server comprising the capability to receive packets of encoded audio speech from the client, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the client, and evaluate the resultant raw speech received from each of the clients in relation to the series of questions.
- Dependent claims (16, 17, 18, 19, 20, 21, 22, 23, 24, 25):
  - 16. The system of claim 15 wherein the server causes the client to provide a user-discernable indication whether an answer to one of the series of questions is correct or incorrect.
  - 17. The system of claim 15 based on a response to one of the series of questions, the server transmits a response to the client of at least one of text, audio, visual, or audiovisual content.
  - 18. The system of claim 15 wherein a linked list of buffers holds of the client about 0.1 seconds or less of uncompressed audio speech.
  - 19. The system of claim 15 wherein each buffer in the linked list of buffers of a client holds about 0.1 seconds or less of uncompressed audio speech.
  - 20. The system of claim 15 wherein the server comprises the capability of receiving from a client a grammar reference number, and the server will decode each of the packets of audio speech received from the client according to the grammar reference number.
  - 21. The system of claim 15 wherein the encoded audio speech is in a compressed format.
  - 22. The system of claim 15 wherein the server further comprises the capability to transmit a response to a client of the one or more clients, the response a result of the server's evaluation of the resultant raw speech received from the client, and where the client further comprises the capability to receive a response from the server.
  - 23. The system of claim 22 wherein the response is a text response, and a client of the one or more clients comprises a screen on which the client displays the text response.
  - 24. The system of claim 22 wherein the response is in a text format, and a client of the one or more clients comprises a text-to-speech engine which converts a text format response to audio data, and an audio output device that the client uses to output the audio data to the user.
  - 25. The system of claim 15 wherein the server further comprises two or more stored text format files, and the server selects a stored text format file to transmit to a client of the one or more clients as a result of the server's evaluation of the resultant raw speech received from the client.
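
Claims 15 and 16 describe evaluating spoken answers against a series of questions and indicating whether each is correct. As a rough illustration, the Python sketch below grades text that a recognizer has already produced. The `Question`, `Feedback`, `normalize`, and `grade` names, the sample questions, and the feedback wording are hypothetical; the patent does not specify how answers are compared.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Question:
    prompt: str
    accepted: Sequence[str]  # answers counted as correct (hypothetical data)


@dataclass(frozen=True)
class Feedback:
    correct: bool
    text: str  # text response the server could return to the client


def normalize(utterance: str) -> str:
    """Lower-case and collapse whitespace so case and spacing are ignored."""
    return " ".join(utterance.lower().split())


def grade(questions: Sequence[Question],
          recognized: Sequence[str]) -> List[Feedback]:
    """Compare each recognized answer with the accepted answers for its question."""
    if len(questions) != len(recognized):
        raise ValueError("expected one recognized answer per question")
    results = []
    for question, heard in zip(questions, recognized):
        accepted = {normalize(answer) for answer in question.accepted}
        ok = normalize(heard) in accepted
        text = "Correct." if ok else f"Incorrect. Expected: {question.accepted[0]}"
        results.append(Feedback(ok, text))
    return results
```

For example, grading `["  It is  BLUE", "six"]` against a question accepting `"it is blue"` and one accepting `"seven"` marks the first answer correct and the second incorrect.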

26. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers organized as a linked list, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over a network before all of the audio speech is received, and transmit a packet of encoded audio speech over the network before all of the audio speech is received; and
  
  a server, said server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein each buffer in the linked list of buffers of a client holds about 0.1 seconds or less of uncompressed audio speech.
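
The "about 0.1 seconds or less of uncompressed audio speech" limit translates into a byte count once an audio format is chosen. The claims apply the limit in two ways: to each buffer (claims 8, 19, and 26) or to the linked list as a whole (claims 1, 14, and 18). The Python sketch below works through that arithmetic; the function names and the sample formats are assumptions, since the claims do not fix a sample rate or sample width.

```python
def bytes_for_duration(sample_rate_hz: int, bytes_per_sample: int,
                       channels: int, duration_ms: int) -> int:
    """Size in bytes of uncompressed PCM audio lasting duration_ms."""
    return sample_rate_hz * bytes_per_sample * channels * duration_ms // 1000


def buffers_within_budget(buffer_ms: int, list_budget_ms: int = 100) -> int:
    """Number of buffer_ms buffers that fit when the whole list holds list_budget_ms."""
    if buffer_ms <= 0:
        raise ValueError("buffer_ms must be positive")
    return list_budget_ms // buffer_ms


# Per-buffer limit: one 0.1 s buffer of 8 kHz, 16-bit, mono audio.
per_buffer = bytes_for_duration(8000, 2, 1, 100)   # 1600 bytes

# Whole-list limit: with 20 ms buffers, five buffers total 0.1 s.
list_length = buffers_within_budget(20)            # 5 buffers
```

Integer milliseconds are used instead of a floating-point `0.1` so the byte counts are exact.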

Current Assignee: Pearson Education Incorporated (Pearson plc)
Original Assignee: GlobalEnglish Corporation (Pearson plc)
Inventors: Jochumson, Christopher S.
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Lerner, Martin
Application Number: US10/199,395
Publication Number: US 20030046065A1
Time in Patent Office: 963 Days
Field of Search: 704/235, 704/260, 704/270, 704/270.1, 704/275, 709/203, 707/3
US Class Current: 704/270.1
CPC Class Codes:
G09B 19/04   Speaking with audible prese...
G10L 15/19   Grammatical context, e.g. d...
G10L 15/30   Distributed recognition, e....
G10L 2015/228   of application context
Y10S 707/99933   Query processing, i.e. sear...
