Method and apparatus for speech reconstruction in a distributed speech recognition system

US 6,633,839 B2
Filed: 02/02/2001
Issued: 10/14/2003
Est. Priority Date: 02/02/2001
Status: Active Grant

4 Assignments

0 Petitions

Abstract

In a distributed speech recognition system comprising a first communication device, which receives a speech input (34), encodes data representative of the speech input, and transmits the encoded data, and a second remotely-located communication device, which receives the encoded data and compares the encoded data with a known data set, the second device includes a processor with a program that controls the processor to operate according to a method of reconstructing the speech input. The method includes the step of receiving encoded data including encoded spectral data and encoded energy data. The method further includes the step of decoding the encoded spectral data and encoded energy data to determine the spectral data and energy data. The method also includes the step of combining the spectral data and energy data to reconstruct the speech input.

80 Citations

22 Claims

1. In a distributed speech recognition system comprising a first communication device which receives a speech input and a second communication device remotely located from the first communication device and communicatively coupled to the first communication device, a method of reconstructing the speech input at the second communication device comprising the steps of:
- receiving at the second communication device of the distributed speech recognition system encoded data sent by the first communication device of the distributed speech recognition system, the encoded data including encoded spectral data and encoded energy data;
  
  selectively at the second communication device decoding the encoded spectral data and encoded energy data to determine the spectral data and energy data and extracting a speech recognition parameter from the encoded data; and
  
  selectively combining the spectral data and energy data to reconstruct the speech input at the second communication device and matching the speech recognition parameter with a speech recognition data set.
2. The method of reconstructing the speech input according to claim 1, wherein the receiving step comprises the step of receiving encoded data including spectral data encoded as a series of mel-frequency cepstral coefficients.
3. The method of reconstructing the speech input according to claim 2, wherein the speech input has a pitch period and the decoding step comprises the steps of:
4. The method of reconstructing the speech input according to claim 3, wherein the step of performing the inverse discrete cosine transform comprises the steps of:
- determining a matrix comprising a plurality of column vectors, each column vector corresponding to one of a plurality of mel-frequencies;
  
  selecting a column vector from the matrix corresponding to one of the plurality of mel-frequencies closest in value to one of the harmonic mel-frequencies; and
  
  forming an inner product between a row vector formed from the series of mel-frequency cepstral coefficients and the selected column vector.
5. The method of reconstructing the speech input according to claim 2, wherein the decoding step comprises the steps of:
- determining mel-frequencies corresponding to a set of frequencies; and
  
  performing an inverse discrete cosine transform on the mel-frequency cepstral coefficients at the mel-frequencies to determine log-spectral magnitudes of the speech input at the mel-frequencies.
6. The method of reconstructing the speech input according to claim 1, wherein:
- the receiving step comprises the step of receiving encoded data including encoded additional excitation data;
  
  the decoding step comprises the step of decoding the encoded additional excitation data to determine the additional excitation data; and
  
  the combining step comprises the step of combining the spectral, energy and excitation data to reconstruct the speech input.
7. The method of reconstructing the speech input according to claim 6, wherein the decoding step comprises the step of decoding the encoded additional excitation data to determine a pitch period and a voice class.
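Claims 2 to 5 recover log-spectral magnitudes by evaluating an inverse discrete cosine transform (IDCT) of the received mel-frequency cepstral coefficients at chosen mel-frequencies. Claim 5 does this directly. Claim 4 does it by picking the precomputed matrix column closest to the target mel-frequency and forming an inner product with the cepstral row vector. The Python sketch below illustrates both routes. It is not the patent's implementation. The mel mapping `2595 * log10(1 + f / 700)`, the 23-channel filterbank with 13 transmitted coefficients, the uniform spacing of channel centres on the mel scale, the DCT-II convention, the 64 Hz and 4000 Hz edges in the demo, and every function name are assumptions made for illustration.

```python
import math

# Hypothetical front-end parameters, assumed for illustration only.
NUM_CHANNELS = 23  # mel filterbank channels (N)
NUM_CEPS = 13      # cepstral coefficients C_0..C_12 assumed to be transmitted


def hz_to_mel(f_hz):
    """A common mel-scale mapping (assumed; the claims do not specify one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_position(mel, mel_low, mel_high, n=NUM_CHANNELS):
    """Map a mel value to a fractional channel index, assuming the n channel
    centres sit at mel_low + j * step for j = 1..n."""
    step = (mel_high - mel_low) / (n + 1)
    return (mel - mel_low) / step


def dct_ii(log_channels):
    """Assumed encoder: C_i = sum_j x_j * cos(pi * i * (j - 0.5) / N), j = 1..N."""
    n = len(log_channels)
    return [sum(x * math.cos(math.pi * i * (j - 0.5) / n)
                for j, x in enumerate(log_channels, start=1))
            for i in range(n)]


def idct_at(ceps, position, n=NUM_CHANNELS):
    """Claim 5 style: inverse DCT evaluated at a (possibly fractional) channel
    position. With all n coefficients and an integer position this inverts
    dct_ii exactly; with fewer it yields a smoothed log-spectral envelope."""
    total = ceps[0]
    for i in range(1, len(ceps)):
        total += 2.0 * ceps[i] * math.cos(math.pi * i * (position - 0.5) / n)
    return total / n


def build_idct_matrix(num_ceps, grid_positions, n=NUM_CHANNELS):
    """Claim 4 style: one column per precomputed channel position and one row
    per cepstral index, each entry holding an IDCT basis weight."""
    return [[(1.0 if i == 0 else 2.0) / n * math.cos(math.pi * i * (p - 0.5) / n)
             for p in grid_positions]
            for i in range(num_ceps)]


def idct_via_matrix(ceps, matrix, grid_positions, target_position):
    """Select the column closest to the target position, then take the inner
    product of the cepstral row vector with that column."""
    k = min(range(len(grid_positions)),
            key=lambda idx: abs(grid_positions[idx] - target_position))
    return sum(c * matrix[i][k] for i, c in enumerate(ceps))


if __name__ == "__main__":
    log_env = [math.log(1.0 + 0.1 * j) for j in range(1, NUM_CHANNELS + 1)]
    ceps = dct_ii(log_env)[:NUM_CEPS]             # truncated, as if transmitted
    grid = [0.5 + 0.01 * k for k in range(2301)]  # positions 0.5 .. 23.5
    table = build_idct_matrix(NUM_CEPS, grid)
    pos = mel_to_position(hz_to_mel(1000.0), hz_to_mel(64.0), hz_to_mel(4000.0))
    # Exponentiate to turn log-spectral magnitudes back into magnitudes.
    print(math.exp(idct_at(ceps, pos)), math.exp(idct_via_matrix(ceps, table, grid, pos)))
```

The matrix route trades memory for speed. Once the table is built, each magnitude costs one nearest-column lookup and a 13-term inner product, at the price of quantizing the target mel-frequency to the grid spacing.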

8. In a distributed speech recognition system comprising a first communication device which receives a speech input, encodes data representative of the speech input, and transmits the encoded data and a second remotely-located communication device which receives the encoded data and compares the encoded data with a known data set, a method of reconstructing the speech input at the second communication device comprising the steps of:
- receiving encoded data including encoded spectral data, the spectral data encoded as a series of mel-frequency cepstral coefficients, and encoded energy data;
  
  performing an inverse discrete cosine transform on the mel-frequency cepstral coefficients at harmonic mel-frequencies corresponding to a pitch period of the speech input to determine log-spectral magnitudes of the speech input at the harmonic mel-frequencies;
  
  exponentiating the log-spectral magnitudes to determine the spectral magnitudes of the speech input;
  
  decoding the encoded energy data to determine the energy data; and
  
  combining the spectral magnitudes and the energy data to reconstruct the speech input.
9. The method of reconstructing the speech input according to claim 8, wherein the step of performing the inverse discrete cosine transform comprises the steps of:
10. The method of reconstructing the speech input according to claim 8, further comprising the step of comparing the series of mel-frequency cepstral coefficients to a series of mel-frequency cepstral coefficients corresponding to an impulse response.
11. The method of reconstructing the speech input according to claim 10, wherein the step of comparing comprises the step of subtracting a series of mel-frequency cepstral coefficients corresponding to an impulse response of a pre-emphasis filter from the series of mel-frequency cepstral coefficients.
12. The method of reconstructing the speech input according to claim 8, wherein the speech input is divided into a series of frames and:
- the step of receiving encoded data comprises the step of receiving encoded energy data including a natural logarithm of an average energy value for each frame in the series of frames; and
  
  the step of decoding the encoded energy data comprises the step of exponentiating the natural logarithm of the average energy value for each frame in the series of frames.
13. The method of reconstructing the speech input according to claim 8, wherein:
- the receiving step comprises the step of receiving encoded data including encoded additional excitation data;
  
  the decoding step comprises the step of decoding the encoded additional excitation data to determine the additional excitation data; and
  
  the combining step comprises the step of combining the spectral, energy and excitation data to reconstruct the speech input.
14. The method of reconstructing the speech input according to claim 13, wherein the decoding step comprises the step of decoding the encoded excitation data to determine a pitch period and a voice class.
15. The method of reconstructing the speech input according to claim 14, wherein the decoding step includes the step of decoding the encoded excitation data to determine sub-frame energy data.
16. The method of reconstructing the speech input according to claim 8, wherein the step of performing an inverse discrete cosine transform includes the step of performing an inverse discrete cosine transform of higher resolution than a discrete cosine transform used to encode the spectral data as a series of mel-frequency cepstral coefficients.
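Claim 8 chains the spectral steps together: locate the harmonics of the pitch, evaluate the IDCT of the cepstrum at their mel-frequencies, and exponentiate. Claims 10 and 11 first subtract the cepstrum of a pre-emphasis filter's impulse response. The Python sketch below strings these steps together under assumptions that are not stated in the claims: an 8 kHz sample rate, 23 channels spaced uniformly in mel between 64 Hz and 4000 Hz, the `2595 * log10(1 + f / 700)` mel mapping, and a pitch period given in samples. All names are hypothetical. The pre-emphasis cepstrum is simply passed in because the claims do not say how it is obtained. Evaluating the IDCT at arbitrary fractional channel positions, rather than only at the 23 channel centres, is one possible reading of the "higher resolution" transform of claim 16.

```python
import math

# Hypothetical parameters, assumed for illustration only.
SAMPLE_RATE = 8000.0          # Hz
NUM_CHANNELS = 23             # mel filterbank channels
F_LOW, F_HIGH = 64.0, 4000.0  # assumed filterbank edges in Hz


def hz_to_mel(f_hz):
    """Common mel-scale mapping, assumed here."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def idct_at(ceps, position, n=NUM_CHANNELS):
    """Inverse DCT-II of the cepstrum evaluated at a fractional channel position."""
    total = ceps[0]
    for i in range(1, len(ceps)):
        total += 2.0 * ceps[i] * math.cos(math.pi * i * (position - 0.5) / n)
    return total / n


def harmonic_frequencies(pitch_period, fs=SAMPLE_RATE, f_max=F_HIGH):
    """Harmonics k * f0 below f_max, where f0 = fs / pitch_period (in samples)."""
    f0 = fs / pitch_period
    return [k * f0 for k in range(1, int(f_max // f0) + 1) if k * f0 < f_max]


def harmonic_magnitudes(ceps, pitch_period, preemph_ceps=None):
    """Spectral magnitudes at the pitch harmonics (claim 8 style).

    If preemph_ceps is given, it is taken to be the cepstrum of the
    pre-emphasis filter's impulse response and is subtracted first
    (claims 10-11 style), which approximately removes the filter's tilt.
    """
    if preemph_ceps is not None:
        if len(preemph_ceps) != len(ceps):
            raise ValueError("cepstra must have the same length")
        ceps = [c - p for c, p in zip(ceps, preemph_ceps)]
    mel_low, mel_high = hz_to_mel(F_LOW), hz_to_mel(F_HIGH)
    step = (mel_high - mel_low) / (NUM_CHANNELS + 1)  # uniform channel spacing in mel
    magnitudes = []
    for f in harmonic_frequencies(pitch_period):
        position = (hz_to_mel(f) - mel_low) / step  # fractional channel index
        log_mag = idct_at(ceps, position)            # log-spectral magnitude
        magnitudes.append(math.exp(log_mag))         # undo the natural log
    return magnitudes
```

For example, at the assumed 8 kHz rate, `harmonic_magnitudes(ceps, 80)` returns one magnitude for each harmonic of 100 Hz below 4 kHz, which gives 39 values.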

17. In a distributed speech recognition system comprising a first communication device which receives a speech input, encodes data about the speech input, and transmits the encoded data and a second remotely-located communication device which receives the encoded data and compares the encoded data with a known data set, the second remotely-located communication device comprising:
- a processor including a program which controls the processor (i) to receive the encoded data including encoded spectral data, the spectral data encoded as a series of mel-frequency cepstral coefficients, and encoded energy data, (ii) to perform an inverse discrete cosine transform on the mel-frequency cepstral coefficients at harmonic mel-frequencies corresponding to a pitch period of the speech input to determine log-spectral magnitudes of the speech input at the harmonic mel-frequencies, (iii) to exponentiate the log-spectral magnitudes to determine the spectral magnitudes of the speech input, and (iv) to decode the encoded energy data to determine the energy data; and
  
  a speech synthesizer which combines the spectral magnitudes and the energy data to reconstruct the speech input.
18. The communication device according to claim 17, wherein the program further controls the processor (i) to determine a matrix comprising a plurality of column vectors, each column vector corresponding to one of a plurality of mel-frequencies, (ii) to select a column vector from the matrix corresponding to one of the plurality of mel-frequencies closest in value to one of the harmonic mel-frequencies, and (iii) to form an inner product between a row vector formed from the series of mel-frequency cepstral coefficients and the selected column vector so as to perform the inverse discrete cosine transform.
19. The communication device according to claim 18, wherein the program further controls the processor to subtract a series of mel-frequency cepstral coefficients corresponding to an impulse response from the series of mel-frequency cepstral coefficients before performing the inverse discrete cosine transform.
20. The communication device according to claim 17, wherein the speech input is divided into a series of frames and the program further controls the processor (i) to receive encoded energy data including a natural logarithm of an average energy value for each frame in the series of frames, and (ii) to exponentiate the natural logarithm of the average energy value for each frame in the series of frames to determine the energy data.
21. The communication device according to claim 17, wherein:
22. The communication device according to claim 21, wherein the speech synthesizer comprises a sinusoidal vocoder-synthesizer.
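Claims 12 and 20 recover each frame's energy by exponentiating the transmitted natural logarithm. Claims 17 and 22 then combine that energy with the harmonic magnitudes in a speech synthesizer, for example a sinusoidal one. The Python sketch below is a deliberately simple stand-in and not the patent's synthesizer. It assumes that "average energy" means the mean squared sample value of the frame, gives every harmonic zero phase, and scales the amplitudes so that the synthesized frame has that mean power. The sample rate, the function names, and the scaling rule are all assumptions.

```python
import math

# Hypothetical parameter, assumed for illustration only.
SAMPLE_RATE = 8000.0  # Hz


def decode_frame_energy(log_energy):
    """Claims 12 and 20 style: the transmitted value is ln(average energy),
    so exponentiating it recovers the average energy of the frame."""
    return math.exp(log_energy)


def synthesize_frame(magnitudes, f0, avg_energy, frame_len, fs=SAMPLE_RATE):
    """Toy sinusoidal synthesis of one frame.

    Harmonic k has frequency k * f0 and an amplitude proportional to
    magnitudes[k - 1]. All amplitudes share one scale factor chosen so that
    the ideal mean power sum(A_k ** 2) / 2 equals avg_energy. Phases are
    fixed at zero, a simplification: a practical sinusoidal synthesizer
    would keep phase continuous across frames.
    """
    raw_power = sum(m * m for m in magnitudes) / 2.0
    scale = math.sqrt(avg_energy / raw_power) if raw_power > 0 else 0.0
    amps = [scale * m for m in magnitudes]
    return [sum(a * math.cos(2.0 * math.pi * k * f0 * n / fs)
                for k, a in enumerate(amps, start=1))
            for n in range(frame_len)]
```

With f0 = 100 Hz at the assumed 8 kHz rate, a 160-sample frame spans exactly two pitch periods. The harmonics are therefore orthogonal over the frame, and the frame's measured mean power equals the decoded energy up to rounding. For frames that do not span whole pitch periods, the match is only approximate.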

Current Assignee
Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee
Motorola, Inc. (Motorola Solutions, Inc.)
Inventors
Jasiuk, Mark A.; Ramabadran, Tenkasi V.; Kushner, William M.; Meunier, Jeffrey
Primary Examiner(s)
MCFADDEN, SUSAN IRIS

Application Number

US09/775,951
Publication Number

US 20020147579A1
Time in Patent Office

984 Days
Field of Search

704/203, 704/207, 704/204, 704/216, 704/217, 704/231, 704/236, 704/255, 704/254, 704/205, 704/206
US Class Current

704/205
CPC Class Codes

G10L 15/30   Distributed recognition, e....

G10L 19/00   Speech or audio signals ana...

G10L 19/093   using sinusoidal excitation...

G10L 25/18   the extracted parameters be...
