Method and system for network-based speech recognition
Abstract
Methods and systems for handling speech recognition processing in effectively real time via the internet, so that users do not experience noticeable delays between the start of an exercise and the receipt of responsive feedback. A user uses a client to access the internet and a server that supports speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximately real time. The server evaluates the user speech in the context of the current speech recognition exercise and provides responsive feedback to the client, again in approximately real time and with minimal latency. The client, upon receiving the responsive feedback from the server, displays or otherwise provides the feedback to the user.
16 Claims
-
1. A method for supporting speech recognition on a server, comprising:
-
receiving a URL from a client remote from the server, the URL comprising a grammar context number;
receiving one or more input packets of encoded audio speech data from the client;
decoding each of the one or more input packets of encoded audio speech data into a portion of raw speech data upon receipt of the respective input packet;
storing each portion of raw speech data into a buffer of a linked list of buffers;
indicating a grammar associated with the grammar context number to a speech recognition engine;
providing each buffer containing a portion of raw speech data to the speech recognition engine as the speech recognition engine is ready to accept it; and
receiving a response from the speech recognition engine, wherein the response is based on an evaluation of the raw speech data in relation to the grammar.
-
2. The method of claim 1, further comprising:
-
packaging a text response based on the evaluation of the raw speech data in relation to the grammar into one or more transmission packets to be transmitted over the internet; and
transmitting the one or more transmission packets to the client.
-
-
3. The method of claim 2, wherein the one or more input packets of encoded audio speech data received from the client are transmitted via an internet connection, and the one or more transmission packets are transmitted to the client via the same internet connection.
-
4. The method of claim 1, wherein the one or more input packets of encoded audio speech data are received from the client for a language learning exercise.
-
5. The method of claim 1, wherein a first buffer containing a first portion of raw speech data is provided to the speech recognition engine before all of the input packets of encoded audio speech data are received from the client.
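The server-side flow recited in claims 1 and 5 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `ctx` query parameter, the `GRAMMARS` table, and the `ToyEngine` class are hypothetical, `zlib` stands in for the unspecified audio codec, and a `deque` stands in for the linked list of buffers. The point is only the ordering: each packet is decoded on receipt, queued, and handed to the engine before later packets arrive.

```python
import zlib
from collections import deque
from urllib.parse import urlparse, parse_qs

# Hypothetical grammar table keyed by grammar context number.
GRAMMARS = {7: {"hello", "goodbye"}}


class ToyEngine:
    """Stand-in for a speech recognition engine (hypothetical interface)."""

    def __init__(self):
        self.grammar = set()
        self.heard = bytearray()

    def set_grammar(self, grammar):
        self.grammar = grammar

    def accept(self, raw):
        # A real engine would process audio; this one just accumulates bytes.
        self.heard.extend(raw)

    def result(self):
        text = self.heard.decode("ascii")
        return text if text in self.grammar else "<no match>"


def grammar_context_number(url):
    # Assumes the number travels as a "ctx" query parameter.
    return int(parse_qs(urlparse(url).query)["ctx"][0])


def recognize(url, packets, engine):
    # Indicate the grammar associated with the context number to the engine.
    engine.set_grammar(GRAMMARS[grammar_context_number(url)])
    buffers = deque()
    for packet in packets:
        # Decode each packet as soon as it is received, then queue it.
        buffers.append(zlib.decompress(packet))
        # The toy engine is always ready, so queued buffers drain at once,
        # before the remaining packets have arrived.
        while buffers:
            engine.accept(buffers.popleft())
    return engine.result()
```

In a real server the drain loop would run only when the engine signals readiness, so the queue could grow while the engine is busy.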
-
6. The method of claim 1, further comprising:
-
identifying a speech file stored on the server based on receiving a second URL from the client;
encoding the speech file into a smaller file representation;
packaging the encoded speech file into one or more transmission packets; and
transmitting each of the one or more transmission packets to the client.
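The download path of claim 6 can be sketched in a few lines. This is an illustration under stated assumptions, not the claimed implementation: the `SPEECH_FILES` table and the URL path are hypothetical, `zlib` stands in for whatever encoder produces the smaller file representation, and the fixed packet size is arbitrary.

```python
import zlib
from urllib.parse import urlparse

# Hypothetical store of speech files keyed by URL path.
SPEECH_FILES = {"/speech/prompt1": bytes(2000)}


def packets_for(url: str, packet_size: int = 256) -> list[bytes]:
    # Identify the speech file from the URL the client sent.
    speech = SPEECH_FILES[urlparse(url).path]
    # Encode it into a smaller representation (zlib is only a stand-in).
    encoded = zlib.compress(speech)
    # Package the encoded file into fixed-size transmission packets.
    return [encoded[i:i + packet_size]
            for i in range(0, len(encoded), packet_size)]
```

The client would reassemble the packets in order and decode the result to recover the speech file.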
-
-
7. A method for supporting speech recognition on a server, comprising:
-
receiving a first URL from a first client remote from the server, the first URL comprising a first grammar context;
associating a first set of a plurality of buffers with the first client;
generating a first instance of a speech recognition engine for the first client;
indicating a first grammar associated with the first grammar context to the first instance of the speech recognition engine;
receiving a second URL from a second client remote from the server, the second URL comprising a second grammar context;
associating a second set of a plurality of buffers with the second client;
generating a second instance of the speech recognition engine for the second client;
indicating a second grammar associated with the second grammar context to the second instance of the speech recognition engine;
receiving a packet of encoded audio speech data from the first client;
decoding the packet of encoded audio speech data from the first client into a first client portion of raw data;
storing the first client portion of raw data into a buffer of the first set of a plurality of buffers;
providing the buffer containing the first client portion of raw data to the first instance of a speech recognition engine for processing with the first grammar;
receiving a packet of encoded audio speech data from the second client;
decoding the packet of encoded audio speech data from the second client into a second client portion of raw data;
storing the second client portion of raw data into a buffer of the second set of a plurality of buffers; and
providing the buffer containing the second client portion of raw data to the second instance of the speech recognition engine for processing with the second grammar.
-
8. The method of claim 7, further comprising:
-
receiving a first response from the first instance of the speech recognition engine, wherein the first response is based on an evaluation of the raw data provided to the first instance of the speech recognition engine in relation to the first grammar;
packaging a first text response based on the evaluation of the raw data provided to the first instance of the speech recognition engine in relation to the first grammar into one or more first client transmission packets;
transmitting the one or more first client transmission packets to the first client over the internet;
receiving a second response from the second instance of the speech recognition engine, wherein the second response is based on an evaluation of the raw data provided to the second instance of the speech recognition engine in relation to the second grammar;
packaging a second text response based on the evaluation of the raw data provided to the second instance of the speech recognition engine in relation to the second grammar into one or more second client transmission packets; and
transmitting the one or more second client transmission packets to the second client over the internet.
-
-
9. The method of claim 8, wherein the packet of encoded audio speech data from the first client is received via a first TCP/IP connection and the one or more first client transmission packets are transmitted to the first client via the same first TCP/IP connection, and wherein the packet of encoded audio speech data from the second client is received via a second TCP/IP connection and the one or more second client transmission packets are transmitted to the second client via the same second TCP/IP connection.
-
10. The method of claim 9, further comprising:
-
releasing the first TCP/IP connection following the transmission of the last packet of the one or more first client transmission packets to the first client;
terminating the first instance of the speech recognition engine some time after receiving the first response from the first instance of the speech recognition engine;
releasing the second TCP/IP connection following the transmission of the last packet of the one or more second client transmission packets to the second client; and
terminating the second instance of the speech recognition engine some time after receiving the second response from the second instance of the speech recognition engine.
-
-
11. The method of claim 7, wherein the packet of encoded audio speech data from the first client is for a first language learning exercise, and the packet of encoded audio speech data from the second client is for a second language learning exercise.
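The per-client bookkeeping of claims 7 through 10 can be pictured as one session object per client, each holding its own buffer set and its own engine instance. The sketch below is illustrative only: `ToyEngine`, `Session`, `Server`, and the client identifiers are hypothetical names, base64 stands in for the audio codec, and network handling is omitted entirely.

```python
import base64
from collections import deque
from dataclasses import dataclass, field


class ToyEngine:
    """Stand-in for one instance of a speech recognition engine."""

    def __init__(self, grammar):
        self.grammar = grammar
        self.heard = bytearray()

    def accept(self, raw):
        self.heard.extend(raw)

    def result(self):
        text = self.heard.decode("ascii")
        return text if text in self.grammar else "<no match>"


@dataclass
class Session:
    engine: ToyEngine                              # engine instance for this client
    buffers: deque = field(default_factory=deque)  # buffer set for this client


class Server:
    def __init__(self, grammars):
        self.grammars = grammars  # grammar context -> grammar
        self.sessions = {}        # client id -> Session

    def open(self, client_id, grammar_context):
        # Associate buffers and a fresh engine instance with the client.
        engine = ToyEngine(self.grammars[grammar_context])
        self.sessions[client_id] = Session(engine=engine)

    def on_packet(self, client_id, packet):
        session = self.sessions[client_id]
        session.buffers.append(base64.b64decode(packet))  # decode, then store
        session.engine.accept(session.buffers.popleft())  # hand to the engine

    def close(self, client_id):
        # Collect the response and drop the session (engine instance included).
        return self.sessions.pop(client_id).engine.result()
```

Because each session owns its buffers and engine, packets from different clients can interleave in any order without the two recognitions affecting each other.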
-
12. A method for supporting speech recognition on a user processing device, comprising:
-
receiving a stream of audio speech data;
storing a portion of the stream of audio speech data into a buffer of a linked list of buffers as it is received;
transmitting a URL comprising a grammar context number which is indicative of a speech recognition exercise that the stream of audio speech data is for;
at a time t1, wherein t1 is prior to the time when the entirety of the stream of audio speech data is received, encoding a buffer of audio speech data into a smaller file representation;
at a time t2, wherein t2 is prior to the time when the entirety of the stream of audio speech data is received, formatting a portion of the smaller file representation into a packet for transmitting over the internet;
at a time t3, wherein t3 is prior to the time when the entirety of the stream of audio speech data is received, transmitting the packet over the internet; and
establishing an internet connection prior to time t3.
-
writing a portion of the audio stream data contained in a buffer in the linked list of buffers to a second buffer in a second linked list of buffers; and
freeing the buffer in the linked list of buffers to receive another portion of the stream of audio speech data.
-
-
16. The method of claim 12, further comprising, at a time t4, receiving one or more packets of a text response, decoding a packet of text response after it is received, and displaying the text response.
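The timing constraint in claim 12 (encoding, packetizing, and transmitting at times t1, t2, and t3, all before the whole stream has been received) amounts to a streaming loop on the client. The following is a minimal sketch under assumptions: `stream_audio` and `make_packet` are hypothetical names, `zlib` stands in for the audio encoder, the 4-byte length prefix is an invented packet format, and `send` is any callable that writes to an already established connection. Transmitting the URL and handling the text response are left out.

```python
import struct
import zlib
from collections import deque


def make_packet(encoded: bytes) -> bytes:
    # Hypothetical packet format: 4-byte big-endian length, then the payload.
    return struct.pack(">I", len(encoded)) + encoded


def stream_audio(chunks, send):
    """Encode and send each buffered chunk while the stream is still arriving."""
    buffers = deque()  # stands in for the linked list of buffers
    count = 0
    for chunk in chunks:       # the rest of the stream has not arrived yet
        buffers.append(chunk)  # store the portion as it is received
        while buffers:
            encoded = zlib.compress(buffers.popleft())  # encode (t1)
            send(make_packet(encoded))                  # format (t2), transmit (t3)
            count += 1
    return count
```

Passing a generator as `chunks` shows the interleaving: the first packet goes out before the generator has produced its last chunk.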
Specification