Method and system for network-based speech recognition
Abstract
Methods and systems for handling speech recognition processing in effectively real time over the internet, so that users do not experience noticeable delays from the start of an exercise until they receive responsive feedback. A user uses a client to access the internet and a server supporting speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximately real time. The server evaluates the user speech in the context of the speech recognition exercise currently being executed and provides responsive feedback to the client, again in approximately real time and with minimal latency. Upon receiving responsive feedback from the server, the client displays or otherwise provides the feedback to the user.
26 Claims
1. A system supporting speech recognition comprising:
two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers organized as a linked list, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over a network before all of the audio speech is received, and transmit a packet of encoded audio speech over the network before all of the audio speech is received; and
a server, said server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein a linked list of buffers of a client holds about 0.1 seconds or less of uncompressed audio speech. (Dependent claims: 2, 3, 4, 5, 6)
7. A system supporting speech recognition comprising:
two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers organized as a linked list, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over a network before all of the audio speech is received, and transmit a packet of encoded audio speech over the network before all of the audio speech is received; and
a server, said server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein the server comprises the capability of receiving from a client a grammar reference number, and the server will decode each of the packets of audio speech received from the client according to the grammar reference number. (Dependent claims: 8, 9, 10, 11, 12, 13, 14)
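The server-side bookkeeping recited in claim 7 can be illustrated with a minimal Python sketch. It is not taken from the patent. The names `GRAMMARS`, `ClientSession`, `Server`, `set_grammar`, `receive_packet`, `active_grammar`, and `raw_speech` are hypothetical, `zlib` stands in for whatever speech codec a real system would use, and the recognizer itself is left out. The sketch only shows one per-client buffer store plus a grammar selected by the reference number the client sent.

```python
import zlib
from dataclasses import dataclass, field

# Hypothetical grammar table keyed by reference number (illustrative only).
GRAMMARS = {
    1: ["yes", "no"],
    2: ["red", "green", "blue"],
}


@dataclass
class ClientSession:
    grammar_ref: int                                  # reference number sent by the client
    raw_buffers: list = field(default_factory=list)   # decoded audio, one entry per packet


class Server:
    def __init__(self) -> None:
        self.sessions = {}  # client id -> ClientSession

    def set_grammar(self, client_id: str, grammar_ref: int) -> None:
        """Record the grammar reference number received from a client."""
        if grammar_ref not in GRAMMARS:
            raise KeyError(f"unknown grammar reference {grammar_ref}")
        self.sessions[client_id] = ClientSession(grammar_ref)

    def receive_packet(self, client_id: str, payload: bytes) -> None:
        """Decode one packet and store the raw speech for that client only."""
        # zlib is an assumed stand-in for the speech codec.
        self.sessions[client_id].raw_buffers.append(zlib.decompress(payload))

    def active_grammar(self, client_id: str) -> list:
        """Return the grammar a recognizer would use for this client."""
        return GRAMMARS[self.sessions[client_id].grammar_ref]

    def raw_speech(self, client_id: str) -> bytes:
        """Concatenate the raw speech received so far from this client."""
        return b"".join(self.sessions[client_id].raw_buffers)
```

Keeping the grammar reference inside the per-client session means two clients working on different exercises can be served at the same time without their audio or grammars mixing.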
15. A system comprising:
one or more clients, each client provides a user with a series of questions, the capability to receive audio speech from a user provided as answers to the series of questions, store the audio speech in one or more buffers organized as a linked list, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over a network before all of the audio speech is received, and transmit a packet of encoded audio speech over the network before all of the audio speech is received; and
a server, said server comprising the capability to receive packets of encoded audio speech from the client, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the client, and evaluate the resultant raw speech received from each of the clients in relation to the series of questions. (Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25)
26. A system supporting speech recognition comprising:
two or more clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers organized as a linked list, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer to receive audio speech into one or more packets to be transmitted over a network before all of the audio speech is received, and transmit a packet of encoded audio speech over the network before all of the audio speech is received; and
a server, said server comprising the capability to receive packets of encoded audio speech from at least two clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers for the respective client, and evaluate the resultant raw speech received from each of the at least two clients, wherein each buffer in the linked list of buffers of a client holds about 0.1 seconds or less of uncompressed audio speech.
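The client behavior recited in the claims above (buffer, encode, packetize, and transmit before the user has finished speaking) can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation. The 8 kHz, 16-bit mono audio format, the 512-byte payload limit, the 8-byte packet header, and the use of `zlib` as a stand-in codec are all assumptions, and `BufferNode`, `BufferList`, and `stream_packets` are hypothetical names. Each buffer holds at most 0.1 seconds of uncompressed audio, as in claim 26.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

SAMPLE_RATE_HZ = 8000    # assumption: telephone-quality audio
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM
BUFFER_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE // 10  # 0.1 s of audio = 1600 bytes
MAX_PAYLOAD = 512        # assumption: maximum encoded bytes per packet
HEADER = struct.Struct("!IHH")  # hypothetical header: buffer index, part index, part count


@dataclass
class BufferNode:
    data: bytes
    next: Optional["BufferNode"] = None


class BufferList:
    """Singly linked list of audio buffers: append at the tail, consume from the head."""

    def __init__(self) -> None:
        self.head: Optional[BufferNode] = None
        self.tail: Optional[BufferNode] = None

    def append(self, data: bytes) -> None:
        node = BufferNode(data)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def pop_head(self) -> Optional[bytes]:
        node = self.head
        if node is None:
            return None
        self.head = node.next
        if self.head is None:
            self.tail = None
        return node.data


def stream_packets(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield packets as soon as each buffer fills, while audio is still arriving."""
    buffers = BufferList()
    pending = bytearray()
    index = 0

    def drain() -> Iterator[bytes]:
        nonlocal index
        while (raw := buffers.pop_head()) is not None:
            encoded = zlib.compress(raw)  # stand-in for a speech codec
            parts = [encoded[i:i + MAX_PAYLOAD]
                     for i in range(0, len(encoded), MAX_PAYLOAD)]
            for part_no, part in enumerate(parts):
                yield HEADER.pack(index, part_no, len(parts)) + part
            index += 1

    for chunk in chunks:
        pending.extend(chunk)
        while len(pending) >= BUFFER_BYTES:
            buffers.append(bytes(pending[:BUFFER_BYTES]))
            del pending[:BUFFER_BYTES]
        yield from drain()
    if pending:  # final, partially filled buffer
        buffers.append(bytes(pending))
    yield from drain()
```

Because `stream_packets` is a generator, a caller can hand each packet to the network as it is produced. With these assumed parameters the first packet becomes available after 1600 bytes (0.1 seconds) of audio rather than after the whole utterance.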
Specification