Method and system for network-based speech recognition

US 7,330,815 B1
Filed: 08/24/2004
Issued: 02/12/2008
Est. Priority Date: 10/04/1999
Status: Expired due to Term

3 Assignments

0 Petitions

Abstract

Methods and systems for handling speech recognition processing in effectively real-time, via the internet, in order that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximate real-time. The server evaluates the user speech in context of the current speech recognition exercise being executed, and provides responsive feedback to the client, again, in approximate real-time, with minimum latency delays. The client upon receiving responsive feedback from the server, displays, or otherwise provides, the feedback to the user.

66 Citations

25 Claims

1. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over the internet before all of the audio speech is received, and transmit a packet of encoded audio speech over the internet before all of the audio speech is received; and
  
  a server, the server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and a processing time used to evaluate the resultant raw speech will vary based on a value communicated to the server from each respective client.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The system of claim 1 wherein the server further comprises the capability to transmit a response to a client, the response a result of the server's evaluation of the resultant raw speech received from the client, and a client of the two or more clients further comprises the capability to receive the response from the server.
  - 3. The system of claim 2 wherein the response is a text response, and a client of the two or more clients comprises a screen on which the client displays the text response.
  - 4. The system of claim 1 wherein the one or more buffers comprise a linked list of buffers.
  - 5. The system of claim 1 wherein a user selects a user objective at a client, the client transmits the user objective to the server, and the server evaluates the resultant raw speech received from the client based on the user objective.
  - 6. The system of claim 5 wherein the user objective comprises pronunciation accuracy.
  - 7. The system of claim 5 wherein the user objective comprises grammar.
  - 8. The system of claim 1 wherein the encoded audio speech is in a compressed format.
  - 9. The system of claim 1 wherein the value is communicated by URL.
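
The client side of claim 1 (buffer a portion of the speech, encode it, packetize it, and transmit it before the rest of the speech has arrived) can be illustrated with a minimal sketch. This is not the patented implementation: the buffer and packet sizes, the function names, and the use of `zlib` as a stand-in for a speech codec are all assumptions made for illustration, since the claims name no particular codec or size.

```python
import zlib
from typing import Callable, Iterable, Iterator

BUFFER_BYTES = 4096  # hypothetical size of one capture buffer
PACKET_BYTES = 1024  # hypothetical maximum payload per packet


def capture_buffers(audio_source: Iterable[bytes]) -> Iterator[bytes]:
    """Yield raw buffers as audio arrives, without waiting for the end of speech."""
    pending = bytearray()
    for chunk in audio_source:
        pending.extend(chunk)
        while len(pending) >= BUFFER_BYTES:
            yield bytes(pending[:BUFFER_BYTES])
            del pending[:BUFFER_BYTES]
    if pending:
        # Final, possibly short, buffer once the user stops speaking.
        yield bytes(pending)


def packetize(encoded: bytes, buffer_index: int) -> list:
    """Split one encoded buffer into (buffer_index, sequence, payload) packets."""
    return [
        (buffer_index, seq, encoded[start:start + PACKET_BYTES])
        for seq, start in enumerate(range(0, len(encoded), PACKET_BYTES))
    ]


def stream_speech(audio_source: Iterable[bytes], send: Callable) -> int:
    """Encode and send each buffer as soon as it fills; return the packet count."""
    sent = 0
    for index, raw in enumerate(capture_buffers(audio_source)):
        encoded = zlib.compress(raw)  # stand-in for a real speech codec (assumption)
        for packet in packetize(encoded, index):
            send(packet)
            sent += 1
    return sent
```

Because `capture_buffers` is a generator, the first packet is handed to `send` while later audio chunks have not yet been produced, which mirrors the "before all of the audio speech is received" limitation.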

10. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers in a raw uncompressed audio format, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over the Internet before all of the audio speech is received, and transmit a packet of encoded audio speech over the Internet before all of the audio speech is received; and
  
  a server, the server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein the server further comprises two or more stored text format files, and the server selects a stored text format file to transmit to a client of the two or more clients as a result of the server's evaluation of the resultant raw speech received from the client, and the server adjusts a processing time used to evaluate the resultant raw speech based on a value in a URL connection between the client and the server.
- Dependent Claims (11)
  - 11. The system of claim 10 wherein the server further comprises the capability to partition a stored text format file into two or more packets for the transmission over the Internet, and to transmit each packet over the Internet to a client.
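
Claims 10 and 11 describe two server behaviours that a short sketch can make concrete: adjusting a processing time from a value carried in the URL, and partitioning a stored text file into packets. The sketch below is only one possible reading. The query parameter name `budget_ms`, the default and limit values, and the packet size are hypothetical, as the claims do not say how the value is encoded in the URL or what it maps to.

```python
from urllib.parse import parse_qs, urlparse

DEFAULT_BUDGET_MS = 500  # hypothetical default processing time
MIN_BUDGET_MS = 100      # hypothetical lower bound
MAX_BUDGET_MS = 5000     # hypothetical upper bound


def processing_budget_ms(url: str) -> int:
    """Read a processing-time value from the URL query string and clamp it."""
    values = parse_qs(urlparse(url).query).get("budget_ms")  # hypothetical name
    if not values:
        return DEFAULT_BUDGET_MS
    try:
        requested = int(values[0])
    except ValueError:
        # Non-numeric value: fall back rather than fail the request.
        return DEFAULT_BUDGET_MS
    return max(MIN_BUDGET_MS, min(MAX_BUDGET_MS, requested))


def partition_text(text: str, packet_bytes: int = 512) -> list:
    """Split a stored text response into byte packets for transmission.

    Packets may split a multi-byte character, so the receiver is assumed to
    join all packets before decoding the text.
    """
    data = text.encode("utf-8")
    return [data[i:i + packet_bytes] for i in range(0, len(data), packet_bytes)]
```

For example, `processing_budget_ms("http://example.invalid/recognize?budget_ms=1200")` yields 1200, while a missing or malformed value falls back to the default. Clamping keeps a single client from requesting an unbounded share of server time.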

12. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers organized as a linked list in a raw uncompressed audio format, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over a network before all of the audio speech is received, and transmit a packet of encoded audio speech over the network before all of the audio speech is received; and
  
  a server, the server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein a level of processing used in the evaluation of the resultant raw speech received from each of the at least two clients is alterable based on a value communicated between the clients and the server.
- Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21)
  - 13. The system of claim 12 wherein the encoded audio speech is in a compressed format.
  - 14. The system of claim 12 wherein the server further comprises the capability to transmit a response to a client, the response a result of the server's evaluation of the resultant raw speech received from the client, and a client of the two or more clients further comprises the capability to receive the response from the server.
  - 15. The system of claim 14 wherein the response is a text response, and a client of the two or more clients comprises a screen on which the client displays the text response.
  - 16. The system of claim 12 wherein a user selects a user objective at a client, the client transmits the user objective to the server, and the server evaluates the resultant raw speech received from the client based on the user objective.
  - 17. The system of claim 16 wherein the user objective comprises pronunciation accuracy.
  - 18. The system of claim 16 wherein the user objective comprises grammar.
  - 19. The system of claim 12 wherein the one or more buffers comprise a linked list of buffers.
  - 20. The system of claim 12 wherein the server further comprises the capability to partition a stored text format file into two or more packets for the transmission over the network, and to transmit each packet over the network to a client.
  - 21. The system of claim 12 wherein the value is communicated by URL.
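
The server side recited in claim 12 (receive packets from at least two clients, decode each one, and store the resulting raw speech in buffers kept per client) might look roughly like the sketch below. It rests on several assumptions: the class and method names are hypothetical, `zlib` again stands in for the unspecified codec, each packet is assumed to be independently decodable, and a `deque` is used in place of the linked list of buffers. The speech evaluation step itself is out of scope here.

```python
import zlib
from collections import defaultdict, deque


class SpeechServer:
    """Keeps decoded speech separate for each connected client."""

    def __init__(self) -> None:
        # One chain of raw buffers per client id.
        self._buffers = defaultdict(deque)

    def receive(self, client_id: str, payload: bytes) -> None:
        """Decode one packet and append the raw speech to that client's buffers."""
        raw = zlib.decompress(payload)  # stand-in for a real speech decoder
        self._buffers[client_id].append(raw)

    def raw_speech(self, client_id: str) -> bytes:
        """Return everything received so far from one client, in arrival order."""
        return b"".join(self._buffers.get(client_id, ()))
```

Keying the buffers by client keeps interleaved packets from different users apart, so each user's speech can be evaluated on its own even when packets arrive in mixed order.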

22. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers in a raw uncompressed audio format, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over the internet before all of the audio speech is received, and transmit a packet of encoded audio speech over the internet before all of the audio speech is received; and
  
  a server, the server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein a user selects a user objective at a client, the client transmits the user objective to the server, and the server evaluates the resultant raw speech received from the client based on the user objective and a value communicated to the server by URL.

23. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers in a raw uncompressed audio format, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over the internet before all of the audio speech is received, and transmit a packet of encoded audio speech over the internet before all of the audio speech is received; and
  
  a server, the server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein before the client receives audio speech from a user, the server transmits a file to a client, the client presents the file in at least one of an audio or visual format to the user, and the server evaluates the resultant raw speech received from the client in connection with the file transmitted from the server to the client and a processing time used to evaluate the resultant raw speech will vary based on a value communicated to the server from the client.

24. A system supporting speech recognition comprising:
- two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers in a raw uncompressed audio format, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over the internet before all of the audio speech is received, and transmit a packet of encoded audio speech over the internet before all of the audio speech is received; and
  
  a server, the server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein the server transmits a first file to a client, the client presents the first file in at least one of an audio or visual format to the user, after presenting the first file to the user, the client receives audio speech from the user, and the server evaluates the resultant raw speech received from the client in connection with the first file transmitted from the server to the client and a processing time used by the server to evaluate the resultant raw speech is alterable based on a value communicated from the client to the server.
- Dependent Claims (25)
  - 25. The system of claim 24 wherein the server transmits a second file to the client, the client presents the second file in at least one of an audio or visual format to the user, after presenting the second file to the user, the client receives audio speech from the user, and the server evaluates the resultant raw speech received from the client in connection with the second file transmitted from the server to the client.

Current Assignee: Pearson Education Incorporated (Pearson plc)
Original Assignee: GlobalEnglish Corporation (Pearson plc)
Inventors: Jochumson, Christopher S
Primary Examiner(s): Lerner; Martin
Application Number: US10/711,114
Time in Patent Office: 1,267 Days
Field of Search: 704/235, 704/260, 704/270, 704/270.1, 704/275, 704/231, 709/202, 709/250, 434/185
US Class Current: 704/231
CPC Class Codes:
- G10L 13/08   Text analysis or generation...
- G10L 15/00   Speech recognition G10L17/0...
- G10L 15/22   Procedures used during a sp...
- G10L 15/30   Distributed recognition, e....
- G10L 19/00   Speech or audio signals ana...
- G10L 19/0018   Speech coding using phoneti...
