Partial speech processing device & method for use in distributed systems
3 Assignments
0 Petitions
Abstract
A client device incorporates partial speech recognition for recognizing a spoken query by a user. The full recognition process is distributed over a client/server architecture, so that the amount of partial recognition signal processing tasks can be allocated on a dynamic basis based on processing resources, channel conditions, etc. Partially processed speech data from the client device can be streamed to a server for a real-time response. Additional natural language processing operations can also be performed to implement sentence recognition functionality.
332 Citations
33 Claims
1. A speech processing device for use in a distributed voice recognition system comprising:
a sound processing circuit adapted to receive a speech utterance and to generate associated speech utterance signals therefrom; and
a first signal processing circuit adapted to generate a first set of speech data values from said speech utterance signals, said first set of speech data values being insufficient by themselves for permitting recognition of words articulated in said speech utterance; and
a transmission circuit for formatting and transmitting said first set of speech data values over a communications channel to a second signal processing circuit;
wherein said first set of speech data values can be sent in a data stream over said channel, during periods when silence is not detected, to a second signal processing circuit which can perform a full recognition of said words.
Dependent claims: 2, 3, 4, 6, 7, 8, 9, 10, 11, 12
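The client-side behavior recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the frame size, the RMS silence threshold, and the two stand-in features (log energy and zero-crossing rate) are assumptions chosen only to show partial values being computed per frame and sent only for frames not judged silent.

```python
import math
import struct
from typing import Iterator, List, Sequence

FRAME_SIZE = 160          # assumption: 20 ms frames at 8 kHz
SILENCE_THRESHOLD = 0.01  # assumption: RMS level below which a frame counts as silence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame, used here as a simple silence detector."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def partial_features(frame: Sequence[float]) -> List[float]:
    """Hypothetical client-side front end: log energy and zero-crossing rate.

    These two values alone are not enough to recognize words; they stand in
    for the 'first set of speech data values' that a server would complete.
    """
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [math.log(energy + 1e-12), crossings / (len(frame) - 1)]


def stream_features(samples: Sequence[float]) -> Iterator[bytes]:
    """Yield one packed feature record per non-silent frame; silent frames send nothing."""
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = samples[start:start + FRAME_SIZE]
        if frame_rms(frame) < SILENCE_THRESHOLD:
            continue  # silence detected: nothing goes onto the channel
        feats = partial_features(frame)
        yield struct.pack(f"<{len(feats)}f", *feats)
```

In a real system the generator's output would be written to a socket or similar channel; here it simply yields the bytes that would be sent.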
13. A speech processing device for use in a distributed speech recognition system for processing a speech utterance comprising:
a first signal processing circuit adapted to generate a first set of speech data values from speech utterance signals associated with the user utterance, wherein said first set of speech data values have a limited data content and are compressed without quantization;
a transmission circuit for formatting and transmitting said first set of speech data values over a communications channel to a second signal processing circuit;
wherein the speech processing device is configured so that said first set of speech data values can be sent in a data stream over said channel, during periods when silence is not detected, to a server system which includes a second signal processing circuit which can perform a full recognition of text words in the user utterance as well as a natural language engine for performing a recognition of a meaning of a sentence presented in said text words.
Dependent claims: 14, 15, 16
17. A method of performing voice recognition comprising the steps of:
(a) receiving user speech utterance signals representing speech utterances to be recognized, said speech utterances including sentences comprised of one or more words; and
(b) processing said speech utterance signals with a first computing device to generate speech data values which are insufficient by themselves for recognizing words in said speech utterance; and
(c) formatting said speech data values into a transmission format suitable for transmission over a communications channel from said first computing device to a second computing device; and
wherein said representative speech data values are transmitted within a byte stream in said communications channel until silence is detected; and
further wherein said speech data values contain sufficient data content such that recognition of said one or more words can be completed by a speech recognition engine in said second computing device.
Dependent claims: 5, 18, 19, 20, 21, 26
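Steps (c) and the byte-stream limitation of claim 17 can be sketched as a simple framing scheme. The wire format below is an assumption made for illustration (a 16-bit count followed by 32-bit floats, with a zero count as an end-of-utterance marker); the claim itself does not specify any particular format. A `None` entry in the input is a hypothetical signal that the silence detector fired.

```python
import struct
from typing import Iterable, List, Optional

END_OF_UTTERANCE = 0  # assumption: a zero-length record closes the stream


def encode_utterance(frames: Iterable[Optional[List[float]]]) -> bytes:
    """Pack non-empty feature vectors into a byte stream until silence (None) is seen."""
    out = bytearray()
    for feats in frames:
        if feats is None:  # silence detected: stop transmitting
            break
        out += struct.pack("<H", len(feats))
        out += struct.pack(f"<{len(feats)}f", *feats)
    out += struct.pack("<H", END_OF_UTTERANCE)
    return bytes(out)


def decode_utterance(stream: bytes) -> List[List[float]]:
    """Second-device side: recover the feature vectors up to the end marker."""
    frames: List[List[float]] = []
    offset = 0
    while True:
        (count,) = struct.unpack_from("<H", stream, offset)
        offset += 2
        if count == END_OF_UTTERANCE:
            return frames
        frames.append(list(struct.unpack_from(f"<{count}f", stream, offset)))
        offset += 4 * count
```

The decoded vectors would then be handed to whatever recognition engine runs on the second computing device; that engine is outside the scope of this sketch.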
22. A method of performing distributed voice recognition comprising the steps of:
(a) receiving user speech utterance signals representing speech utterances to be recognized during a sequence of speech utterance evaluation time frames, said speech utterances including sentences comprised of one or more words; and
(b) generating speech data values with a first processing circuit for each speech utterance evaluation time frame during which speech utterance signals are received;
(c) encoding said speech data values into a transmission format suitable for transmission over a communications channel to a second processing circuit; and
wherein said speech data values are compressed without being quantized;
further wherein said compressed speech data values constitute a sufficient amount of information that can be used by said second processing circuit to complete accurate recognition of said one or more words and said sentences.
Dependent claims: 23, 24, 25
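One possible reading of "compressed without being quantized" in claim 22 is that the values are compressed losslessly, so the receiver recovers exactly the numbers the sender computed. The sketch below takes that reading as an assumption and uses Python's standard `zlib` module on 64-bit floats; the patent may intend a different compression scheme.

```python
import struct
import zlib
from typing import List


def compress_frame(values: List[float]) -> bytes:
    """Serialize values as little-endian 64-bit floats, then compress losslessly."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return zlib.compress(raw)


def decompress_frame(blob: bytes) -> List[float]:
    """Invert compress_frame; the values come back bit-for-bit identical."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))
```

Because no values are rounded to a codebook or a reduced bit depth, the round trip is exact. For very short frames a general-purpose compressor may not reduce the size at all, so a practical system would likely batch several frames per call.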
27. A method of performing distributed speech recognition using a first computing device and a second computing device, the method comprising the steps of:
(a) evaluating speech processing capabilities of the first computing device using an initialization routine; and
(b) allocating speech processing tasks between the first computing device and the second computing device based on results of step (a), such that an overall speech recognition process is dynamically customized for performance characteristics of the first computing device and the second computing device; and
(c) receiving a speech utterance at the first computing device; and
(d) generating associated speech utterance signals from said speech utterance with the first computing device; and
(e) generate a first set of speech data values from said speech utterance signals at the first computing device, said first set of speech data values being insufficient by themselves for permitting recognition of words articulated in said speech utterance; and
(f) compressing said first set of speech data values at the first computing device;
(g) transmitting said compressed first set of speech data values through said channel to the second computing device in a byte stream except when silence is detected; and
(h) generating a second set of speech data values based on said speech data values, such that second set of speech data values contain sufficient information to be usable by a word recognition engine for recognizing words in said speech utterance.
Dependent claims: 28, 29, 30, 31, 32, 33
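Steps (a) and (b) of claim 27 can be sketched as a benchmark followed by a split of the processing pipeline. Everything specific here is hypothetical: the stage names, their relative costs, the benchmark loop, and the rule that the client takes a leading run of stages that fits within its budget. The claim does not say how capabilities are scored or how the split is chosen.

```python
import time
from typing import List, Sequence, Tuple

# Assumption: an ordered pipeline with made-up relative costs per stage.
PIPELINE: List[Tuple[str, float]] = [
    ("framing", 1.0),
    ("spectral_features", 4.0),
    ("delta_features", 2.0),
    ("word_decoding", 50.0),
]


def measure_client_score(iterations: int = 200_000) -> float:
    """Hypothetical initialization routine: millions of loop iterations per second."""
    start = time.perf_counter()
    acc = 0.0
    for i in range(iterations):
        acc += i * 0.5
    elapsed = time.perf_counter() - start
    return iterations / max(elapsed, 1e-9) / 1e6


def allocate(client_budget: float,
             pipeline: Sequence[Tuple[str, float]] = PIPELINE
             ) -> Tuple[List[str], List[str]]:
    """Give the client the longest leading run of stages within its budget.

    Once one stage goes to the server, all later stages do too, so the
    data crosses the channel only once.
    """
    client: List[str] = []
    server: List[str] = []
    spent = 0.0
    for name, cost in pipeline:
        if not server and spent + cost <= client_budget:
            client.append(name)
            spent += cost
        else:
            server.append(name)
    return client, server
```

How a measured score maps to a budget is left open on purpose; a caller might pass `measure_client_score()` directly or scale it by battery level or channel bandwidth.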
Specification