Distributed real time speech recognition system
Abstract
A real-time system incorporating speech recognition and linguistic processing for recognizing a spoken query by a user and distributed between client and server, is disclosed. The system accepts user's queries in the form of speech at the client where minimal processing extracts a sufficient number of acoustic speech vectors representing the utterance. These vectors are sent via a communications channel to the server where additional acoustic vectors are derived. Using Hidden Markov Models (HMMs), and appropriate grammars and dictionaries conditioned by the selections made by the user, the speech representing the user's query is fully decoded into text (or some other suitable form) at the server. This text corresponding to the user's query is then simultaneously sent to a natural language engine and a database processor where optimized SQL statements are constructed for a full-text search from a database for a recordset of several stored questions that best matches the user's query. Further processing in the natural language engine narrows the search to a single stored question. The answer corresponding to this single stored question is next retrieved from the file path and sent to the client in compressed form. At the client, the answer to the user's query is articulated to the user using a text-to-speech engine in his or her native natural language. The system requires no training and can operate in several natural languages.
70 Claims
-
1. (Canceled)
-
2. (Canceled)
-
3. (Canceled)
-
4. (Canceled)
-
5. (Canceled)
-
6. (Canceled)
-
7. (Canceled)
-
8. (Canceled)
-
9. (Canceled)
-
10. (Canceled)
-
11. A machine executable program for use in a voice query recognition system that is distributed across a client system and a separate server system, the program comprising:
-
a first audio signal receiving routine for receiving user speech utterance signals representing speech utterances to be recognized during a sequence of speech utterance evaluation time frames, said speech utterances including sentences comprised of one or more words; and
a first signal processing routine adapted to generate representative speech data values for each speech utterance evaluation time frame during which speech utterance signals are received, said representative speech data values including a set of compressed mel-frequency cepstral coefficients (MFCC);
a formatting routine for rendering said representative speech data values into a transmission format suitable for transmission from the client system over a communications channel to a second processing routine executing on the server computing system; and
wherein said representative speech data values are transmitted continuously during said speech utterances within streaming packets and without waiting for silence to be detected and/or said speech utterances to be completed;
further wherein said representative speech data values constitute a minimum amount of information that can be used by said second processing routine to complete accurate recognition of said one or more words and said sentences. - View Dependent Claims (12, 13, 14, 15, 46)
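As an illustration only, the continuous transmission recited in claim 11 can be sketched as a generator that emits one packet per evaluation time frame as soon as that frame's features exist, rather than buffering until silence or the end of the utterance. The packet layout, the names `stream_packets` and `decode_packet`, and the pluggable `extract` callable (standing in for the MFCC computation and compression) are assumptions made for this sketch and do not come from the claims.

```python
import struct
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical packet layout (not from the claims): big-endian
# sequence number (uint32), coefficient count (uint16), then float32 values.
HEADER = struct.Struct(">IH")


def stream_packets(
    frames: Iterable[List[float]],
    extract: Callable[[List[float]], List[float]],
) -> Iterator[bytes]:
    """Yield one packet per time frame as soon as its features are ready.

    Nothing is buffered until the utterance ends: each frame is pulled,
    converted by `extract`, packed, and handed to the caller immediately.
    """
    for seq, frame in enumerate(frames):
        coeffs = extract(frame)
        body = struct.pack(f">{len(coeffs)}f", *coeffs)
        yield HEADER.pack(seq, len(coeffs)) + body


def decode_packet(packet: bytes) -> Tuple[int, List[float]]:
    """Server-side inverse of the hypothetical layout above."""
    seq, count = HEADER.unpack_from(packet)
    coeffs = struct.unpack_from(f">{count}f", packet, HEADER.size)
    return seq, list(coeffs)
```

Because `stream_packets` is lazy, a caller can write each packet to the communications channel while later frames are still being captured.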
-
-
16. (Canceled)
-
17. (Canceled)
-
18. (Canceled)
-
19. (Canceled)
-
20. (Canceled)
-
21. (Canceled)
-
22. (Canceled)
-
23. (Canceled)
-
24. (Canceled)
-
25. (Canceled)
-
26. (Canceled)
-
27. A system for assisting a client computing device to perform speech recognition in cooperation with a server computing device, the system comprising:
-
a speech utterance capture circuit for receiving a speech utterance and generating associated speech utterance signals, where said speech utterance can include an articulated sentence of one or more articulated words; and
a speech utterance signal processing circuit, said signal processing being configurable to perform data extracting operations on said speech utterance signals to generate a set of frequency related speech utterance signals for said articulated sentence; and
wherein said set of frequency related speech utterance signals include a set of compressed mel-frequency cepstral coefficients (MFCC);
a transmission circuit for coding said set of frequency related speech utterance signals into a format suitable for transmission over a communications channel to the server;
a receiving circuit for receiving a response to said articulated sentence through said communications channel from the server, said response being generated by said server using said set of frequency related speech utterance signals to perform a word recognition operation on said one or more articulated words and a sentence recognition operation on said articulated sentence; and
wherein a latency associated with performing said speech recognition is minimized by optimizing an allocation of signal processing responsibilities for said speech utterance signals between the client computing device and the server computing device on a case-by-case basis in accordance with signal processing capabilities of the client computing device.
-
-
28. A system for assisting a client computing device to perform real-time speech recognition in cooperation with a server computing device, the system comprising:
-
a sound processing circuit integrated within the client computing device, said sound processing circuit being adapted to receive a continuous speech utterance and to generate associated speech utterance signals therefrom, wherein said speech utterance can include an articulated sentence of one or more articulated words; and
a first signal processing routine adapted to be executed by the client computing device, and which first signal processing routine is further adapted to continuously generate a set of speech-based vector coefficients as needed from said speech utterance signals; and
a transmission circuit coupled to the client computing device for coding said set of speech based vector coefficients into a format suitable for transmission over a communications channel to the server, said set of speech-based vector coefficients being continuously transmitted in real-time within a Hypertext Transport Protocol (HTTP) byte stream as said speech utterances occur;
a receiving circuit coupled to the client computing device for receiving a real-time response to said articulated sentence through said communications channel from the server;
wherein said response is generated by said server substantially on a real-time basis using said set of speech based vector coefficients to perform a second signal processing routine which completes a word recognition operation on said one or more articulated words, as well as a sentence recognition operation on said articulated sentence;
further wherein at least some words are recognized in real-time before said speech utterance is completed.
-
-
29. A distributed speech recognition system for processing a speech utterance comprising:
-
a first signal processing circuit associated with a client computing system, said first signal processing circuit being adapted to generate a first set of speech data values from speech utterance signals, wherein said first set of speech data values have a limited data content and are compressed without quantization to reduce processing and transmission latencies in the distributed speech recognition system;
a second signal processing circuit associated with a separate server computing system, said second signal processing circuit being configured to generate a second set of speech data values derived from said first set of speech data values, and being further configured to generate a combined speech data value set consisting of said second set of speech data values and said first set of data values;
a word recognition circuit adapted to use said combined speech data value set for recognizing words in the speech utterance, said word recognition circuit being configured to recognize words before said speech utterance is finished. - View Dependent Claims (30, 31, 32, 33, 34)
-
-
35. (Canceled)
-
36. (Canceled)
-
37. (Canceled)
-
38. (Canceled)
-
39. (Canceled)
-
40. (Canceled)
-
41. (Canceled)
-
42. (Canceled)
-
43. (Canceled)
-
44. (Canceled)
-
45. (Canceled)
-
47. (Canceled)
-
48. (Canceled)
-
49. (Canceled)
-
50. (Canceled)
-
51. (Canceled)
-
52. (Canceled)
-
53. A method of performing distributed voice recognition comprising the steps of:
-
(a) receiving user speech utterance signals representing speech utterances to be recognized during a sequence of speech utterance evaluation time frames, said speech utterances including sentences comprised of one or more words; and
(b) generating representative speech data values with a first processing circuit for each speech utterance evaluation time frame during which speech utterance signals are received, said representative speech data values including a set of compressed mel-frequency cepstral coefficients (MFCC);
(c) encoding said representative speech data values into a transmission format suitable for transmission over a communications channel to a second processing circuit; and
further wherein said representative speech data values constitute a minimum amount of information that can be used by said second processing circuit to complete accurate recognition of said one or more words and said sentences. - View Dependent Claims (54, 55, 56, 57)
-
-
58. A method of performing distributed speech recognition using a first computing device and a second computing device, the method comprising the steps of:
-
(a) evaluating speech processing capabilities of the first computing device using an initialization routine, and (b) evaluating a transmission latency of a communications channel coupling the first computing device and the second computing device, and (c) allocating speech processing tasks between the first computing device and the second computing device based on results of steps (a) and (b), such that an overall speech recognition process is customized on a case-by-case basis for performance characteristics of the first computing device and the second computing device; and
(d) receiving a speech utterance at the first computing device; and
(e) generating associated speech utterance signals from said speech utterance with the first computing device; and
(f) generating a first set of speech data values from said speech utterance signals at the first computing device, said first set of speech data values being insufficient by themselves for permitting recognition of words articulated in said speech utterance; and
(g) formatting said first set of speech data values at the first computing device to be compatible with a communications protocol used by said communications channel;
(h) transmitting said first set of speech data values through said channel to the second computing device; and
(i) generating a second set of speech data values based on said speech data values, such that said second set of speech data values contain sufficient information to be usable by a word recognition engine for recognizing words in said speech utterance. - View Dependent Claims (59, 60, 61, 62, 63, 64)
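Steps (a) through (c) of claim 58 can be pictured with a small sketch, under loudly stated assumptions: the stage names, the per-stage timings, the frame budget, and the greedy rule below are all hypothetical and are not taken from the claim, which does not say how the allocation is computed. The sketch keeps front-end stages on the first computing device only while their measured cost fits in the time left after the measured channel latency, and sends every later stage to the second computing device.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical, ordered front-end stages; names are illustrative only.
STAGES = ["framing", "mfcc", "delta_features"]


@dataclass
class Allocation:
    client_tasks: List[str]
    server_tasks: List[str]


def allocate_tasks(
    client_ms_per_stage: Dict[str, float],  # result of step (a), assumed measured
    round_trip_ms: float,                   # result of step (b), assumed measured
    frame_budget_ms: float,                 # assumed latency target per frame
) -> Allocation:
    """Step (c): split the pipeline into a client prefix and a server suffix."""
    remaining = frame_budget_ms - round_trip_ms
    client: List[str] = []
    server: List[str] = []
    for stage in STAGES:
        cost = client_ms_per_stage[stage]
        # Once one stage has moved to the server, all later stages follow it,
        # so the data crosses the channel only once.
        if not server and cost <= remaining:
            client.append(stage)
            remaining -= cost
        else:
            server.append(stage)
    return Allocation(client, server)
```

With these assumed numbers, a faster client keeps more of the pipeline locally, while a slower client or a slower channel pushes stages to the server.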
-
-
65. A method of performing distributed recognition of a speech utterance comprising the steps of:
-
(a) generating a first set of speech data values from speech utterance signals at a first computing system, wherein said first set of speech data values have a limited data content to reduce processing and transmission latencies; and
wherein said first set of speech data values include a set of compressed mel-frequency cepstral coefficients (MFCC);
(b) generating a second set of speech data values derived from said first set of speech data values at a second computing system, said second computing system being independently operable from said first computing system; and
(c) generating a combined speech data value set at said second computing system consisting of said second set of speech data values and said first set of data values;
(d) generating a list of recognized words in said speech utterance, said list being generated at least in part before said speech utterance is finished. - View Dependent Claims (66, 67, 68, 69, 70)
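Steps (b) and (c) of claim 65 can be illustrated with a minimal sketch. The claim does not say how the second set of speech data values is derived; the sketch assumes, purely for illustration, that it consists of first-order difference ("delta") coefficients computed from the received coefficients, a common choice in speech front ends. The function names `delta` and `combine` are hypothetical.

```python
from typing import List


def delta(frames: List[List[float]]) -> List[List[float]]:
    """Assumed derivation of the second set: d[t] = (c[t+1] - c[t-1]) / 2.

    Frame indices are clamped at the edges, so the first and last frames
    use a one-sided difference scaled by the same factor.
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, last)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def combine(first: List[List[float]]) -> List[List[float]]:
    """Step (c): per-frame vectors made of the first set plus the second set."""
    second = delta(first)
    return [c + d for c, d in zip(first, second)]
```

For example, `combine([[1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])` doubles the length of each frame vector by appending its difference coefficients to the received ones.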
-
Specification