Partial speech processing device & method for use in distributed systems

US 20050086059A1
Filed: 12/03/2004
Published: 04/21/2005
Est. Priority Date: 11/12/1999
Status: Active Grant


3 Assignments

0 Petitions

Abstract

A client device incorporates partial speech recognition for recognizing a spoken query by a user. The full recognition process is distributed over a client/server architecture, so that the amount of partial recognition signal processing tasks can be allocated on a dynamic basis based on processing resources, channel conditions, etc. Partially processed speech data from the client device can be streamed to a server for a real-time response. Additional natural language processing operations can also be performed to implement sentence recognition functionality.

332 Citations

33 Claims

1. A speech processing device for use in a distributed voice recognition system comprising:
- a sound processing circuit adapted to receive a speech utterance and to generate associated speech utterance signals therefrom; and
  
  a first signal processing circuit adapted to generate a first set of speech data values from said speech utterance signals, said first set of speech data values being insufficient by themselves for permitting recognition of words articulated in said speech utterance; and
  
  a transmission circuit for formatting and transmitting said first set of speech data values over a communications channel to a second signal processing circuit;
  
  wherein said first set of speech data values can be sent in a data stream over said channel, during periods when silence is not detected, to a second signal processing circuit which can perform a full recognition of said words.
- Dependent claims (2, 3, 4, 6, 7, 8, 9, 10, 11, 12):
  - 2. The speech processing device of claim 1, wherein said second signal processing circuit uses a second set of speech data values that includes said first set of speech data values and a derived set of speech data values, which derived set of speech data values are computed based on said first speech data values.
  - 3. The speech processing device of claim 2, wherein said first set of data values are MFCC vector coefficients, and said derived set of speech data values are MFCC delta coefficients and MFCC acceleration coefficients derived from said MFCC vector coefficients.
  - 4. The speech processing device of claim 2, wherein said second set of speech data values can be generated by said second signal processing circuit in a time that is less than the combination of a first time which would be required by said first signal processing circuit to generate said second set of speech data values from said first set of speech data values combined with a second time which would be required by said transmission circuit to format and transmit said second set of speech data values.
  - 6. The speech processing device of claim 1, wherein signal processing responsibilities of said first and second signal processing circuits are allocated such that said first signal processing circuit performs less than approximately ½ the required signal processing operations needed to convert said speech utterance signals into a form usable by a word recognition engine.
  - 7. The speech processing device of claim 2, wherein signal processing functions required to generate said first and second set of speech data values can be allocated between said first signal processing circuit and second signal processing circuit as needed based on computing resources available to said first and second signal processing circuits respectively.
  - 8. The speech processing device of claim 2, wherein signal processing functions performed by said first signal processing circuit are adjustable.
  - 9. The speech processing device of claim 1, wherein signal processing functions performed by said first signal processing circuit and second signal processing circuit are configured based on:
    - (i) computing resources available to said first and second signal processing circuits;
      
      (ii) performance characteristics of said transmission circuit; and
      
      (iii) transmission latencies of said communications channel.
  - 10. The speech processing device of claim 2, wherein said first signal processing circuit is also configured to assist said second signal processing circuit with signal processing computations required to generate said second set of speech data values.
  - 11. The speech processing device of claim 1, wherein said first set of speech data values represent the least amount of data that can be used by said second signal processing circuit to generate a second set of data values usable for a word recognition process.
  - 12. The speech processing device of claim 3, wherein a Hidden Markov Model (HMM) processes said derived set of speech data values to perform a full recognition of said words.
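
Claims 2, 3 and 12 describe the receiving side expanding the transmitted MFCC vectors into a larger feature set by deriving delta and acceleration coefficients from them. The sketch below illustrates one common way to do this, the HTK-style regression formula over a window of ±2 frames with edge frames replicated. The formula, the window size and the function names are assumptions; the claims do not specify them.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """HTK-style regression deltas over +/- `window` frames (assumed formula).

    Frames beyond either end of the utterance are replaced by the first
    or last frame (edge replication).
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out: List[Frame] = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def expand_features(mfcc: List[Frame], window: int = 2) -> List[Frame]:
    """Receiver-side expansion: static MFCCs + deltas + accelerations per frame."""
    d = deltas(mfcc, window)
    a = deltas(d, window)  # acceleration = delta of the deltas
    return [c + dc + ac for c, dc, ac in zip(mfcc, d, a)]
```

With 13 static coefficients per frame, `expand_features` returns 39 values per frame, so in this arrangement only a third of the final feature vector has to cross the channel.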

13. A speech processing device for use in a distributed speech recognition system for processing a speech utterance comprising:
- a first signal processing circuit adapted to generate a first set of speech data values from speech utterance signals associated with the user utterance, wherein said first set of speech data values have a limited data content and are compressed without quantization;
  
  a transmission circuit for formatting and transmitting said first set of speech data values over a communications channel to a second signal processing circuit;
  
  wherein the speech processing device is configured so that said first set of speech data values can be sent in a data stream over said channel, during periods when silence is not detected, to a server system which includes a second signal processing circuit which can perform a full recognition of text words in the user utterance as well as a natural language engine for performing a recognition of a meaning of a sentence presented in said text words.
- Dependent claims (14, 15, 16):
  - 14. The speech processing device of claim 13, wherein said articulated sentence can include one of a number of predefined sentences recognizable by said natural language engine and said articulated sentence is recognized by identifying a candidate set of potential sentences from said number of predefined sentences corresponding to said articulated sentence, and then comparing each entry in the candidate set of potential sentences to said articulated sentence to determine a matching recognized sentence.
  - 15. The speech processing device of claim 14, wherein said articulated sentence is compared against said candidate set of potential sentences by examining noun phrases.
  - 16. The speech processing device of claim 15, wherein said candidate set of potential sentences are determined in part by a context dictionary loaded by said natural language engine in response to an operating environment presented by said system to a user.
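
Claims 14 to 16 describe a two-stage match: a context dictionary narrows the predefined sentences to a candidate set, and each candidate is then compared with the recognized sentence by examining noun phrases. The following is a toy sketch of that flow, not the patent's method. The stop-word-based `noun_phrases` helper is a crude stand-in for a real noun-phrase chunker, and the candidate filter (any shared content word) and the scoring rule (number of shared noun phrases) are illustrative assumptions.

```python
from typing import Dict, List, Optional, Set

# Hypothetical stop list; a real engine would use a POS tagger or chunker.
STOP_WORDS = {"what", "is", "are", "the", "a", "an", "of", "how", "do", "does",
              "i", "to", "in", "on", "for", "can", "you", "me", "my"}


def _tokens(text: str) -> List[str]:
    return text.lower().replace("?", " ").split()


def noun_phrases(sentence: str) -> Set[str]:
    """Crude stand-in: maximal runs of non-stop words count as noun phrases."""
    phrases: Set[str] = set()
    run: List[str] = []
    for word in _tokens(sentence):
        if word in STOP_WORDS:
            if run:
                phrases.add(" ".join(run))
            run = []
        else:
            run.append(word)
    if run:
        phrases.add(" ".join(run))
    return phrases


def candidate_set(sentence: str, context: str,
                  context_dictionary: Dict[str, List[str]]) -> List[str]:
    """Stage 1: predefined sentences for the active context sharing a content word."""
    words = set(_tokens(sentence)) - STOP_WORDS
    return [s for s in context_dictionary.get(context, [])
            if words & (set(_tokens(s)) - STOP_WORDS)]


def recognize_sentence(sentence: str, context: str,
                       context_dictionary: Dict[str, List[str]]) -> Optional[str]:
    """Stage 2: choose the candidate whose noun phrases best overlap the input's."""
    target = noun_phrases(sentence)
    best: Optional[str] = None
    best_score = 0
    for candidate in candidate_set(sentence, context, context_dictionary):
        score = len(target & noun_phrases(candidate))
        if score > best_score:
            best, best_score = candidate, score
    return best
```

For example, with the context dictionary `{"physics": ["What is the speed of light?", "What is the boiling point of water?"]}` and the active context `"physics"`, the recognized text `what is the speed of light` selects the first predefined sentence.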

17. A method of performing voice recognition comprising the steps of:
- (a) receiving user speech utterance signals representing speech utterances to be recognized, said speech utterances including sentences comprised of one or more words; and
  
  (b) processing said speech utterance signals with a first computing device to generate speech data values which are insufficient by themselves for recognizing words in said speech utterance; and
  
  (c) formatting said speech data values into a transmission format suitable for transmission over a communications channel from said first computing device to a second computing device; and
  
  wherein said representative speech data values are transmitted within a byte stream in said communications channel until silence is detected; and
  
  further wherein said speech data values contain sufficient data content such that recognition of said one or more words can be completed by a speech recognition engine in said second computing device.
- Dependent claims (5, 18, 19, 20, 21, 26):
  - 5. The speech processing device of claim 17, wherein said second set of speech data values can be generated by said second signal processing circuit in less time than that which would be required by said first signal processing circuit to generate said second set of speech data values from said first set of speech data values.
  - 18. The method of claim 17, wherein said recognition can be performed in real-time by a Hidden Markov Model (HMM).
  - 19. The method of claim 17, wherein complete recognition of said one or more words is achieved with less latency than that resulting if said complete recognition were performed entirely by said first computing device or said second computing device.
  - 20. The method of claim 17, wherein signal processing functions required to perform speech recognition can be allocated between said first computing device and said second computing device as needed based on computing resources available to said first and second computing devices respectively.
  - 21. The method of claim 17, wherein said first computing device is part of a client computing system, and said second computing device is part of a server computing system, and said communications channel is a network.
  - 26. The method of claim 21, wherein said second processing circuit performs accurate recognition of said one or more words with less latency than would result if said one or more words were recognized by either said first processing circuit or said second processing circuit operating alone.
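
Claims 1, 17 and 27 place the partial speech data on the channel as a stream only while silence is not detected. A minimal sketch of that gating follows. It assumes a simple RMS-energy silence detector with a fixed threshold and feature vectors packed as little-endian doubles; the detector, the threshold and the byte layout are all assumptions, since the claims specify none of them.

```python
import math
import struct
from typing import Iterable, Iterator, Sequence, Tuple


def is_silence(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """Assumed detector: a frame is silent if its RMS energy is below threshold."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold


def stream_features(
    frames: Iterable[Tuple[Sequence[float], Sequence[float]]],
    threshold: float = 0.01,
) -> Iterator[bytes]:
    """Yield one packed feature vector per non-silent frame.

    Each item pairs the raw audio samples of a frame (used only for the
    silence test) with the partial feature vector computed for that frame.
    """
    for samples, features in frames:
        if is_silence(samples, threshold):
            continue  # nothing is placed on the channel while silence is detected
        yield struct.pack(f"<{len(features)}d", *features)
```

Because `stream_features` is a generator, each packed frame can be written to the channel as soon as it is produced rather than after the whole utterance ends, which is what allows the server to respond in near real time.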

22. A method of performing distributed voice recognition comprising the steps of:
- (a) receiving user speech utterance signals representing speech utterances to be recognized during a sequence of speech utterance evaluation time frames, said speech utterances including sentences comprised of one or more words; and
  
  (b) generating speech data values with a first processing circuit for each speech utterance evaluation time frame during which speech utterance signals are received;
  
  (c) encoding said speech data values into a transmission format suitable for transmission over a communications channel to a second processing circuit; and
  
  wherein said speech data values are compressed without being quantized;
  
  further wherein said compressed speech data values constitute a sufficient amount of information that can be used by said second processing circuit to complete accurate recognition of said one or more words and said sentences.
- Dependent claims (23, 24, 25):
  - 23. The method of claim 22, wherein said recognition of said one or more words occurs in real-time.
  - 24. The method of claim 22, wherein said speech data values correspond to separate cepstral coefficient values, with each coefficient being used for a corresponding frequency component of said user speech utterance signals, and said sufficient amount of information corresponds to a set of cepstral coefficients for frequency components spanning an audible speech frequency range.
  - 25. The method of claim 24, wherein a set of delta and acceleration coefficients are computed from said cepstral coefficient values to complete recognition of said one or more words and said sentences.
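
Claims 13 and 22 require the speech data values to be compressed without quantization, i.e. with a lossless coder, so the receiver recovers exactly the values that were sent. This sketch uses `zlib` as the lossless coder and a hypothetical 6-byte header (coefficient count and payload length). It assumes the cepstral coefficients are already 32-bit floats on the device, so packing them with `struct` loses nothing.

```python
import struct
import zlib
from typing import List

HEADER = struct.Struct("<HI")  # hypothetical header: coefficient count, payload length


def encode_frame(coeffs: List[float]) -> bytes:
    """Pack float32 coefficients and compress them losslessly (no quantizer)."""
    raw = struct.pack(f"<{len(coeffs)}f", *coeffs)
    payload = zlib.compress(raw)
    return HEADER.pack(len(coeffs), len(payload)) + payload


def decode_frame(packet: bytes) -> List[float]:
    """Inverse of encode_frame: the receiver recovers the exact float32 values."""
    count, length = HEADER.unpack_from(packet)
    raw = zlib.decompress(packet[HEADER.size:HEADER.size + length])
    return list(struct.unpack(f"<{count}f", raw))
```

zlib adds a few bytes of header and checksum, so a single short frame may not shrink at all; compressing several frames together may give the coder more redundancy to exploit.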

27. A method of performing distributed speech recognition using a first computing device and a second computing device, the method comprising the steps of:
- (a) evaluating speech processing capabilities of the first computing device using an initialization routine; and
  
  (b) allocating speech processing tasks between the first computing device and the second computing device based on results of step (a), such that an overall speech recognition process is dynamically customized for performance characteristics of the first computing device and the second computing device; and
  
  (c) receiving a speech utterance at the first computing device; and
  
  (d) generating associated speech utterance signals from said speech utterance with the first computing device; and
  
  (e) generating a first set of speech data values from said speech utterance signals at the first computing device, said first set of speech data values being insufficient by themselves for permitting recognition of words articulated in said speech utterance; and
  
  (f) compressing said first set of speech data values at the first computing device;
  
  (g) transmitting said compressed first set of speech data values through said channel to the second computing device in a byte stream except when silence is detected; and
  
  (h) generating a second set of speech data values based on said speech data values, such that said second set of speech data values contain sufficient information to be usable by a word recognition engine for recognizing words in said speech utterance.
- Dependent claims (28, 29, 30, 31, 32, 33):
  - 28. The method of claim 27, wherein said second set of speech data values include said first set of speech data values and a derived set of speech data values, which derived set of speech data values are computed based on said first speech data values.
  - 29. The method of claim 27, wherein said second set of speech data values can be generated by said second computing device in a time that is less than the combination of a first time which would be required by said first computing device to generate said second set of speech data values from said first set of speech data values combined with a second time which would be required to format and transmit said second set of speech data values.
  - 30. The method of claim 27, wherein signal processing responsibilities of said first and second computing devices are allocated such that said first computing device performs less than approximately ½ the required signal processing operations needed to convert said speech utterance signals into a form usable by a word recognition engine.
  - 31. The method of claim 27, wherein said speech processing tasks performed by said first and second computing devices are further allocated based on:
    - (i) computing resources available to said first and second computing devices; and
      
      (ii) transmission latencies of said communications channel.
  - 32. The method of claim 27, wherein said first processing device is also configured to assist said second processing device with signal processing computations required to generate said second set of speech data values.
  - 33. The method of claim 27, wherein said first set of speech data values represent the least amount of data that can be used by said second processing device to generate said second set of data values usable for a word recognition process.
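
Claims 27, 29 and 31 describe an initialization routine that evaluates the device's capabilities, followed by an allocation of tasks that takes computing resources and channel latency into account. The sketch below shows one way such a decision could be made. It applies the timing comparison stated in claims 4 and 29: derive the second set on the server if that is faster than deriving it on the client and then formatting and transmitting it. The `Profile` fields, the task names and the `measure_ms` benchmark helper are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


def measure_ms(workload: Callable[[], None], repeats: int = 5) -> float:
    """Hypothetical initialization routine: average wall time of a workload, in ms."""
    start = time.perf_counter()
    for _ in range(repeats):
        workload()
    return (time.perf_counter() - start) * 1000.0 / repeats


@dataclass
class Profile:
    client_derive_ms: float    # client time to derive the second set from the first
    server_derive_ms: float    # server time for the same derivation
    transmit_ms_per_kb: float  # cost of formatting and sending data over the channel
    second_set_kb: float       # size of the second set per frame


def derive_on_server(p: Profile) -> bool:
    """Claims 4/29 test: server derivation vs. client derivation plus transmission."""
    client_path = p.client_derive_ms + p.transmit_ms_per_kb * p.second_set_kb
    return p.server_derive_ms < client_path


def allocate(p: Profile) -> Dict[str, List[str]]:
    """Split hypothetical task names between the two devices."""
    client = ["framing", "mfcc"]  # first set is always produced on the device
    server = ["hmm_decode"]       # word recognition stays on the server
    derived = ["delta", "acceleration"]
    if derive_on_server(p):
        server = derived + server
    else:
        client = client + derived
    return {"client": client, "server": server}
```

In practice `client_derive_ms` could come from running `measure_ms` on the device at start-up and `transmit_ms_per_kb` from probing the channel; re-running the allocation when either changes would give the dynamic behavior the abstract describes.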

Current Assignee
Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee
Phoenix Solutions Incorporated (CDC Corporation)
Inventors
Bennett, Ian M.

Granted Patent

US 7,729,904 B2
US Class Current

704/270
CPC Class Codes

G06F 16/243   Natural language query form...

G06F 16/24522   Translation of natural lang...

G06F 16/3344   using natural language anal...

G06F 40/216   using statistical methods

G06F 40/237   Lexical tools

G06F 40/30   Semantic analysis

G06F 40/42   Data-driven translation

G06F 40/44   Statistical methods, e.g. p...

G09B 5/04   with audible presentation o...

G09B 7/00   Electrically-operated teach...

G10L 15/142   Hidden Markov Models [HMMs]

G10L 15/18   using natural language mode...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/183   using context dependencies,...

G10L 15/22   Procedures used during a sp...

G10L 15/285   Memory allocation or algori...

G10L 15/30   Distributed recognition, e....

H04M 2250/74   with voice recognition mean...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...
