Method and system for network-based speech recognition
Abstract
Methods and systems for handling speech recognition processing effectively in real time via the internet, so that users do not experience noticeable delays between the start of an exercise and the receipt of responsive feedback. A user employs a client to access the internet and a server that supports speech recognition processing, e.g., for language learning activities. The user inputs speech to the client, which transmits the user speech to the server in approximately real time. The server evaluates the user speech in the context of the current speech recognition exercise being executed and provides responsive feedback to the client, again in approximately real time, with minimal latency. Upon receiving the responsive feedback from the server, the client displays, or otherwise provides, the feedback to the user.
20 Claims
1. A method for supporting speech recognition on a user processing device, comprising:
receiving a stream of audio speech data;
storing a portion of the stream of audio speech data into a buffer of a linked list of buffers as it is received;
at a time t1, wherein t1 is prior to the time when the entirety of the stream of audio speech data is received, encoding a buffer of audio speech data into a smaller file representation;
at a time t2, wherein t2 is prior to the time when the entirety of the stream of audio speech data is received, formatting a portion of the smaller file representation into a packet for transmitting over the internet; and
at a time t3, wherein t3 is prior to the time when the entirety of the stream of audio speech data is received, transmitting the packet over the internet.
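The client-side steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `zlib.compress` stands in for whatever speech codec produces the "smaller file representation", the 8-byte packet header (sequence number and payload length) is a hypothetical wire format, and `send` is a caller-supplied callback standing in for an internet connection. What the sketch shows is the timing: each full buffer is encoded (t1), packetized (t2), and transmitted (t3) inside the receive loop, before the last chunk of the stream has arrived.

```python
import struct
import zlib
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class BufferNode:
    """One buffer in a singly linked list of captured audio."""
    data: bytearray
    next: Optional["BufferNode"] = None


def make_packet(seq: int, payload: bytes) -> bytes:
    # Hypothetical wire format: 4-byte sequence number and 4-byte payload
    # length (network byte order), followed by the encoded payload.
    return struct.pack("!II", seq, len(payload)) + payload


def stream_audio(chunks: Iterable[bytes], buffer_size: int,
                 send: Callable[[bytes], None]) -> BufferNode:
    """Store, encode, packetize and send audio while the stream is still arriving."""
    head = tail = BufferNode(bytearray())
    seq = 0

    def flush(node: BufferNode) -> None:
        nonlocal seq
        encoded = zlib.compress(bytes(node.data))  # t1: encode (zlib stands in for a speech codec)
        packet = make_packet(seq, encoded)          # t2: format into a packet
        send(packet)                                # t3: transmit
        seq += 1

    for chunk in chunks:  # chunks arrive incrementally, e.g. from a microphone
        pos = 0
        while pos < len(chunk):
            if len(tail.data) == buffer_size:  # current buffer is full: link a new one
                tail.next = BufferNode(bytearray())
                tail = tail.next
            room = buffer_size - len(tail.data)
            piece = chunk[pos:pos + room]
            tail.data += piece
            pos += len(piece)
            if len(tail.data) == buffer_size:
                flush(tail)  # runs before the stream has ended
    if 0 < len(tail.data) < buffer_size:
        flush(tail)  # final, partially filled buffer
    return head
```

In a real client, `chunks` would be a generator reading from the audio input and `send` might wrap a socket's `sendall`; both are left to the caller here so the pipeline logic stays testable.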
7. A method for supporting speech recognition on a server, comprising:
receiving a URL from a client remote from the server, the URL comprising a grammar context number;
receiving one or more input packets of encoded audio speech data from the client;
decoding each of the one or more input packets of encoded audio speech data into a portion of raw speech data upon receipt of the respective input packet;
storing each portion of raw speech data into a buffer of a linked list of buffers;
indicating a grammar associated with the grammar context number to a speech recognition engine;
providing each buffer containing a portion of raw speech data to the speech recognition engine as the speech recognition engine is ready to accept it; and
receiving a response from the speech recognition engine, wherein the response is based on an evaluation of the raw speech data in relation to the grammar.
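The server-side flow of claim 7 can be sketched as below, again as an illustration under assumptions rather than the patentee's code. The URL query parameter name `context`, the `GRAMMARS` table, the zlib-compressed payloads, and the `StubEngine` class (which only records the audio it is given instead of recognizing anything) are all hypothetical. What the sketch does show is the ordering in the claim: each packet is decoded as soon as it arrives, appended to a linked list of raw-speech buffers, and handed to the engine only while the engine reports that it is ready.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical grammar table keyed by grammar context number.
GRAMMARS: Dict[int, List[str]] = {7: ["good morning", "good evening"]}


@dataclass
class Node:
    data: bytes
    next: Optional["Node"] = None


class BufferList:
    """FIFO queue of raw-speech buffers built as a singly linked list."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.tail: Optional[Node] = None

    def append(self, data: bytes) -> None:
        node = Node(data)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def pop(self) -> Optional[bytes]:
        node = self.head
        if node is None:
            return None
        self.head = node.next
        if self.head is None:
            self.tail = None
        return node.data


class StubEngine:
    """Hypothetical stand-in for a speech recognition engine."""

    def __init__(self) -> None:
        self.grammar: Optional[List[str]] = None
        self.busy = False  # True while the engine cannot take more audio
        self.received: List[bytes] = []

    def set_grammar(self, grammar: List[str]) -> None:
        self.grammar = grammar

    def ready(self) -> bool:
        return not self.busy

    def accept(self, buf: bytes) -> None:
        self.received.append(buf)

    def result(self) -> dict:
        # A real engine would score the audio against the grammar; the stub
        # only reports what it was given.
        return {"grammar": self.grammar, "bytes": sum(len(b) for b in self.received)}


class RecognitionSession:
    def __init__(self, url: str, engine: StubEngine, grammars: Dict[int, List[str]]) -> None:
        query = parse_qs(urlparse(url).query)
        context = int(query["context"][0])           # assumed query parameter name
        self.engine = engine
        self.engine.set_grammar(grammars[context])   # indicate the grammar to the engine
        self.buffers = BufferList()

    def on_packet(self, packet: bytes) -> None:
        self.buffers.append(zlib.decompress(packet))  # decode upon receipt, then store
        self.pump()

    def pump(self) -> None:
        # Hand over buffers only while the engine says it can accept them.
        while self.engine.ready():
            buf = self.buffers.pop()
            if buf is None:
                break
            self.engine.accept(buf)

    def finish(self) -> dict:
        self.pump()
        return self.engine.result()
```

Keeping the decoded buffers in a queue decouples network arrival from engine throughput: if the engine is busy, packets keep being decoded and queued, and `pump` drains the backlog once the engine is ready again.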
13. A method for supporting speech recognition on a server, comprising:
receiving a first URL from a first client remote from the server, the first URL comprising a first grammar context;
associating a first set of a plurality of buffers with the first client;
generating a first instance of a speech recognition engine for the first client;
indicating a first grammar associated with the first grammar context to the first instance of the speech recognition engine;
receiving a second URL from a second client remote from the server, the second URL comprising a second grammar context;
associating a second set of a plurality of buffers with the second client;
generating a second instance of the speech recognition engine for the second client;
indicating a second grammar associated with the second grammar context to the second instance of the speech recognition engine;
receiving a packet of encoded audio speech data from the first client;
decoding the packet of encoded audio speech data from the first client into a first client portion of raw data;
storing the first client portion of raw data into a buffer of the first set of a plurality of buffers;
providing the buffer containing the first client portion of raw data to the first instance of a speech recognition engine for processing with the first grammar;
receiving a packet of encoded audio speech data from the second client;
decoding the packet of encoded audio speech data from the second client into a second client portion of raw data;
storing the second client portion of raw data into a buffer of the second set of a plurality of buffers; and
providing the buffer containing the second client portion of raw data to the second instance of the speech recognition engine for processing with the second grammar.
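Claim 13 extends the server flow to several clients at once. A minimal sketch, assuming a hypothetical `context` query parameter and zlib-compressed payloads, keeps one buffer queue and one engine instance per client so that audio and grammars never cross between sessions. `EchoEngine` is a placeholder that accumulates audio rather than a real recognizer.

```python
import zlib
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Optional
from urllib.parse import parse_qs, urlparse


class EchoEngine:
    """Hypothetical recognizer stand-in that records what it is asked to process."""

    def __init__(self) -> None:
        self.grammar: Optional[List[str]] = None
        self.audio = bytearray()

    def set_grammar(self, grammar: List[str]) -> None:
        self.grammar = grammar

    def process(self, raw: bytes) -> None:
        # A real engine would decode `raw` against self.grammar here.
        self.audio += raw


@dataclass
class ClientState:
    engine: EchoEngine
    buffers: Deque[bytes] = field(default_factory=deque)  # this client's own buffers


class MultiClientServer:
    def __init__(self, grammars: Dict[str, List[str]],
                 engine_factory: Callable[[], EchoEngine] = EchoEngine) -> None:
        self.grammars = grammars
        self.engine_factory = engine_factory
        self.clients: Dict[str, ClientState] = {}

    def open(self, client_id: str, url: str) -> None:
        context = parse_qs(urlparse(url).query)["context"][0]  # assumed parameter name
        engine = self.engine_factory()  # separate engine instance per client
        engine.set_grammar(self.grammars[context])
        self.clients[client_id] = ClientState(engine)

    def on_packet(self, client_id: str, packet: bytes) -> None:
        state = self.clients[client_id]
        state.buffers.append(zlib.decompress(packet))  # decode into raw data and store
        while state.buffers:  # feed only this client's engine instance
            state.engine.process(state.buffers.popleft())
```

Routing by `client_id` is the essential design choice: interleaved packets from different clients land in different buffer sets and reach different engine instances, each already configured with its own grammar.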
18. A system supporting speech recognition, comprising:
a plurality of clients, each client comprising the capability to receive audio speech from a user, store the audio speech in one or more buffers, each buffer comprising a portion of the received audio speech, encode a buffer of the received audio speech before all of the audio speech is received, package the encoded buffer of received audio speech into one or more packets to be transmitted over the internet before all of the audio speech is received, and transmit a packet of encoded audio speech over the internet before all of the audio speech is received; and
a server, said server comprising the capability to receive packets of encoded audio speech from said plurality of clients, decode each of the packets of audio speech and store the resultant raw speech into one or more buffers, evaluate the resultant raw speech received from each of said plurality of clients, and transmit a feedback response to each of said plurality of clients from which said server received packets of encoded audio speech.
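The system of claim 18 can be simulated in a single process, with `asyncio` queues standing in for the internet. In this sketch clients send encoded packets while they are still "capturing" audio, and the server decodes each packet on arrival and returns a feedback message to every client it heard from. The end-of-utterance marker `END`, the zlib codec, and the byte-count "evaluation" are assumptions made only to keep the example self-contained.

```python
import asyncio
import zlib
from typing import Dict, List

END = None  # hypothetical end-of-utterance marker


async def client(name: str, chunks: List[bytes],
                 uplink: asyncio.Queue, downlink: asyncio.Queue) -> str:
    """Encode and send each chunk as soon as it is captured, then await feedback."""
    for chunk in chunks:
        await uplink.put((name, zlib.compress(chunk)))  # zlib stands in for a speech codec
        await asyncio.sleep(0)  # yield, as if waiting for more audio input
    await uplink.put((name, END))
    return await downlink.get()


async def server(uplink: asyncio.Queue, downlinks: Dict[str, asyncio.Queue]) -> None:
    raw: Dict[str, bytearray] = {name: bytearray() for name in downlinks}
    open_clients = set(downlinks)
    while open_clients:
        name, packet = await uplink.get()
        if packet is END:
            # Stand-in evaluation; a real server would run a recognizer here.
            await downlinks[name].put(f"{name}: received {len(raw[name])} bytes")
            open_clients.discard(name)
        else:
            raw[name] += zlib.decompress(packet)  # decode each packet on arrival


async def main() -> List[str]:
    uplink: asyncio.Queue = asyncio.Queue()
    downlinks = {"alice": asyncio.Queue(), "bob": asyncio.Queue()}
    audio = {"alice": [b"a" * 100] * 3, "bob": [b"b" * 50] * 2}
    results = await asyncio.gather(
        server(uplink, downlinks),
        *(client(n, audio[n], uplink, downlinks[n]) for n in downlinks),
    )
    return results[1:]  # drop the server coroutine's None result


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because every client yields after each packet, the server interleaves packets from both clients instead of waiting for either utterance to finish, which mirrors the "before all of the audio speech is received" language of the claim.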
Specification