Distributed real time speech recognition system
4 Assignments
0 Petitions
Abstract
A real-time system, distributed between a client and a server, that incorporates speech recognition and linguistic processing to recognize a user's spoken query is disclosed. The system accepts the user's query as speech at the client, where minimal processing extracts a sufficient number of acoustic speech vectors to represent the utterance. These vectors are sent via a communications channel to the server, where additional acoustic vectors are derived. Using Hidden Markov Models (HMMs) together with grammars and dictionaries conditioned by the selections the user has made, the server fully decodes the speech representing the user's query into text (or another suitable form). This text is then sent simultaneously to a natural language engine and to a database processor, which constructs optimized SQL statements for a full-text search of a database, retrieving a recordset of several stored questions that best match the user's query. Further processing in the natural language engine narrows the search to a single stored question. The answer corresponding to this question is then retrieved from its file path and sent to the client in compressed form. At the client, a text-to-speech engine articulates the answer to the user in his or her native natural language. The system requires no training and can operate in several natural languages.
563 Citations
5 Claims
1. A method of performing distributed voice recognition, the method comprising:
(a) receiving speech utterance signals representing utterances of a user, the speech utterance signals comprising one or more words;
(b) generating, via a processing circuit of a client device, speech data values from the utterance signals during an utterance evaluation time frame corresponding to each utterance signal, wherein the speech data values comprise compressed mel-frequency cepstral coefficient vectors (MFCC vectors) further comprising MFCC delta parameters and MFCC acceleration parameters automatically determined based on at least an amount of computational resources available on the client device and a speed of a transceiver used to transmit data between the client device and a server;
(c) encoding the speech data values into a transmission format suitable for transmission over a communications channel to the server; and
(d) communicating user context information over the communications channel to the server, wherein the server uses the context information to dynamically select a grammar to use for recognizing the speech data values.
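Claim 1 does not specify how the delta and acceleration parameters are computed, how available computation and link speed steer them, or what the transmission format looks like. The following is a minimal sketch of steps (b) through (d) under explicit assumptions: the static MFCC frames are taken as already computed; deltas use the common regression formula over a window of plus or minus N frames, with acceleration taken as the deltas of the deltas; and `choose_params` (CPU and bandwidth thresholds), the packed-and-zlib-compressed payload layout, and the `GRAMMARS` context table are all hypothetical illustrations, not the patent's method.

```python
import json
import struct
import zlib

HEADER = "!HHBf"  # frames, dims, bits, scale (hypothetical wire format)
HEADER_SIZE = struct.calcsize(HEADER)


def deltas(frames, window):
    """Regression deltas over +/- `window` frames; edge frames are replicated."""
    T, dims = len(frames), len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(dims):
            acc = sum(n * (frames[min(t + n, T - 1)][d] - frames[max(t - n, 0)][d])
                      for n in range(1, window + 1))
            row.append(acc / denom)
        out.append(row)
    return out


def choose_params(cpu_free, link_kbps):
    """Hypothetical policy: thresholds are illustrative, not from the patent."""
    window = 2 if cpu_free >= 0.5 else 1   # wider window costs more CPU
    if link_kbps >= 64:
        return window, 16, 0.01            # 16-bit, range about +/-327
    return window, 8, 0.25                 # 8-bit, range about +/-31.75


def encode(static, cpu_free, link_kbps):
    """Client side: static MFCCs -> [static | delta | acceleration] -> bytes."""
    window, bits, scale = choose_params(cpu_free, link_kbps)
    d1 = deltas(static, window)
    d2 = deltas(d1, window)                # acceleration = delta of deltas
    vectors = [s + a + b for s, a, b in zip(static, d1, d2)]
    limit = 2 ** (bits - 1) - 1            # values outside the range are clipped
    flat = [max(-limit, min(limit, round(v / scale))) for vec in vectors for v in vec]
    code = "h" if bits == 16 else "b"
    header = struct.pack(HEADER, len(vectors), len(vectors[0]), bits, scale)
    body = struct.pack("!%d%s" % (len(flat), code), *flat)
    return zlib.compress(header + body)


def decode(payload):
    """Server side: bytes -> list of dequantized feature vectors."""
    raw = zlib.decompress(payload)
    frames, dims, bits, scale = struct.unpack(HEADER, raw[:HEADER_SIZE])
    code = "h" if bits == 16 else "b"
    flat = struct.unpack("!%d%s" % (frames * dims, code), raw[HEADER_SIZE:])
    return [[q * scale for q in flat[i * dims:(i + 1) * dims]] for i in range(frames)]


# Hypothetical context -> grammar table on the server.
GRAMMARS = {"physics": "physics_questions.gram", "history": "history_questions.gram"}


def encode_context(context):
    """Client side: user context sent alongside the speech payload."""
    return json.dumps(context).encode("utf-8")


def select_grammar(context_bytes):
    """Server side: pick a grammar from the user's current selection."""
    context = json.loads(context_bytes.decode("utf-8"))
    return GRAMMARS.get(context.get("topic"), "general.gram")
```

A client with spare CPU and a fast link would call `encode(static, 0.9, 128)` and get a wider delta window with 16-bit quantization; a constrained client would fall back to a narrower window and 8-bit values, trading precision for fewer bytes on the channel. The context bytes travel separately so the server can choose a grammar before decoding starts.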
Specification