Method and apparatus for encoding speech using neural network technology for speech classification

US 5,737,716 A
Filed: 12/26/1995
Issued: 04/07/1998
Est. Priority Date: 12/26/1995
Status: Expired due to Term

Abstract

A low-rate voice coding method and apparatus uses vocoder-embedded neural network techniques. A neural network controlled speech analysis processor includes a neural network which manages speech characterization, encoding, decoding, and reconstruction methodologies. The voice coding method and apparatus uses multi-layer perceptron (MLP) based neural network structures in single or multi-stage arrangements.
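
As a rough illustration of the MLP-based classification the abstract describes, the following is a minimal sketch of a single-stage forward pass in plain Python. The class labels, the two-element feature vector, the layer sizes, the tanh and softmax choices, and the hand-picked toy weights are all assumptions made for illustration; the patent does not specify them here, and real connection weights would come from a prior training (adaptation) process.

```python
import math

# Hypothetical class labels; the source only says the network yields a
# speech classification.
LABELS = ["unvoiced", "mixed", "voiced"]


def mlp_forward(features, layers):
    """Run a feature vector through fully connected layers.

    `layers` is a list of (weights, biases) pairs, where `weights` holds one
    row of input weights per output unit. Hidden layers use tanh (an
    assumption); the last layer returns raw scores.
    """
    activations = list(features)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations = sums if is_output else [math.tanh(s) for s in sums]
    return activations


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, layers):
    """Return the most probable label and the full probability list."""
    probs = softmax(mlp_forward(features, layers))
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Toy weights chosen by hand for this sketch, not trained values.
TOY_LAYERS = [
    ([[4.0, 0.0], [0.0, 4.0]], [-2.0, -2.0]),                    # hidden layer
    ([[-2.0, -0.5], [0.0, 0.0], [2.0, 0.5]], [0.0, 0.5, 0.0]),   # output layer
]

label, probs = classify([0.9, 0.8], TOY_LAYERS)
print(label, [round(p, 3) for p in probs])
```

In a multi-stage arrangement, a second network of the same shape could take `probs` from the first stage as its input and refine the classification.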

41 Claims

1. A speech coding apparatus for encoding speech data which is input to the speech coding apparatus, the speech coding apparatus comprising:
- an input device for receiving the speech data; and
  
  at least one processor coupled to the input device, the at least one processor for parameterizing the speech data to produce at least one feature vector which describes parameters of the speech data, applying a first neural network to the at least one feature vector to obtain at least one speech classification of the speech data, creating characterized speech data by characterizing the speech data using a characterization methodology which depends on the at least one speech classification, and creating an encoded bitstream by encoding the characterized speech data.
  - 2. The speech coding apparatus as claimed in claim 1 further comprising:
    - a memory device coupled to the at least one processor, the memory device for storing connection weight information used by the first neural network, wherein the connection weight information was predetermined by an adaptation process which stored the connection weight information in the memory device, wherein the at least one processor, during the step of applying the first neural network to the at least one feature vector, is also for reading the connection weight information from the memory device and using the connection weight information in conjunction with the first neural network when the first neural network is applied to the at least one feature vector.
  - 3. The speech coding apparatus as claimed in claim 2, wherein the at least one processor is also for determining an interference estimate which estimates a level of interference co-existent with the speech data, and for inputting the interference estimate into the first neural network when the first neural network is applied to the at least one feature vector.
  - 4. The speech coding apparatus as claimed in claim 2, wherein the at least one processor is also for determining an interference estimate which estimates a level of interference co-existent with the speech data, wherein the connection weight information comprises multiple sets of weights, each set of weights corresponding to an interference level, the at least one processor also for selecting the set of weights from the multiple sets of weights based on the interference estimate, and for using the set of weights as the connection weight information.
  - 5. The speech coding apparatus as claimed in claim 2, wherein the at least one processor is further for using at least one previous speech classification which was determined by the first neural network as an input to the first neural network when the first neural network is being applied to the at least one feature vector.
  - 6. The speech coding apparatus as claimed in claim 5, wherein the at least one processor is also for determining an interference estimate which estimates a level of interference co-existent with the speech data, and for inputting the interference estimate into the first neural network when the first neural network is applied to the at least one feature vector.
  - 7. The speech coding apparatus as claimed in claim 5, wherein the at least one processor is also for determining an interference estimate which estimates a level of interference co-existent with the speech data, wherein the connection weight information comprises multiple sets of weights, each set of weights corresponding to an interference level, the at least one processor also for selecting the set of weights from the multiple sets of weights based on the interference estimate, and for using the set of weights as the connection weight information.
  - 8. The speech coding apparatus as claimed in claim 2, wherein the memory device is also for storing second connection weight information used by a second neural network, and the at least one processor is also for applying the second neural network to the at least one speech classification which is output from the first neural network, wherein the second neural network uses the second connection weight information in conjunction with the second neural network and uses the at least one speech classification as an input to determine a more accurate speech classification, wherein the characterization methodology depends on the more accurate speech classification.
  - 9. The speech coding apparatus as claimed in claim 8, wherein the at least one processor is also for determining an interference estimate which estimates a level of interference co-existent with the speech data, and for inputting the interference estimate into the first neural network when the first neural network is applied to the at least one feature vector.
  - 10. The speech coding apparatus as claimed in claim 9, wherein the at least one processor is also for inputting the interference estimate into the second neural network when the second neural network is applied to the at least one speech classification which is output from the first neural network.
  - 11. The speech coding apparatus as claimed in claim 8, wherein the at least one processor is also for determining an interference estimate which estimates a level of interference co-existent with the speech data, wherein the connection weight information comprises multiple sets of weights, each set of weights corresponding to an interference level, the at least one processor also for selecting the set of weights from the multiple sets of weights based on the interference estimate, and for using the set of weights as the connection weight information for the first neural network.
  - 12. The speech coding apparatus as claimed in claim 11, wherein the at least one processor is also for selecting a second set of weights from the multiple sets of weights based on the interference estimate, and for using the second set of weights as the connection weight information for the second neural network.
  - 13. The speech coding apparatus as claimed in claim 1, further comprising:
    - a transmission channel interface coupled to the processor, wherein the transmission channel interface is for sending the encoded bitstream to a speech decoding apparatus which performs inverse processes to those performed by the speech coding apparatus so that synthesized speech data which approximates the speech data can be obtained.
  - 14. The speech coding apparatus as claimed in claim 1, wherein the at least one processor is also for applying the first neural network to the at least one feature vector to obtain the at least one speech classification of the speech data, wherein the at least one speech classification comprises at least two degrees of periodicity of the speech data.
  - 15. The speech coding apparatus as claimed in claim 1, wherein the at least one processor is also for applying the first neural network to the at least one feature vector to obtain the at least one speech classification of the speech data, wherein the at least one speech classification comprises multiple phonemes which approximate the speech data.
  - 16. The speech coding apparatus as claimed in claim 1, wherein the at least one processor is also for parameterizing the speech data to produce the at least one feature vector, wherein the at least one feature vector comprises a subframe correlation coefficient over expected pitch range, a subframe LPC gain, a subframe low-band to high-band energy ratio, and a subframe energy ratio of a segment of the speech data against a maximum energy of multiple prior segments of the speech data.
  - 17. The speech coding apparatus as claimed in claim 1, wherein the at least one processor, during the step of encoding the characterized speech data, is also for using an encoding methodology which depends on the at least one speech classification.
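
To make the feature vector of claim 16 more concrete, here is a minimal sketch of two of its four elements: the subframe correlation coefficient over an expected pitch range, and the subframe energy ratio against the maximum energy of prior segments. The function names, the lag range, and the fallback value of 1.0 when no prior energy is available are assumptions for illustration. The LPC gain and the low-band to high-band energy ratio are deliberately omitted.

```python
import math


def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


def pitch_correlation(samples, min_lag, max_lag):
    """Largest normalized correlation between the subframe and a copy of
    itself delayed by each lag in [min_lag, max_lag]."""
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        ahead = samples[lag:]
        behind = samples[:-lag]
        denom = math.sqrt(energy(ahead) * energy(behind))
        if denom > 0.0:
            numer = sum(a * b for a, b in zip(ahead, behind))
            best = max(best, numer / denom)
    return best


def energy_ratio(samples, prior_energies):
    """Energy of this segment relative to the largest energy seen in the
    prior segments. Returns 1.0 when there is no usable history (an
    assumption of this sketch)."""
    reference = max(prior_energies, default=0.0)
    if reference <= 0.0:
        return 1.0
    return energy(samples) / reference


# A 160-sample subframe of a sine wave with a 40-sample period.
subframe = [math.sin(2.0 * math.pi * n / 40.0) for n in range(160)]
features = [
    pitch_correlation(subframe, min_lag=20, max_lag=80),
    energy_ratio(subframe, prior_energies=[50.0, 120.0]),
]
print(features)
```

A strongly periodic subframe yields a correlation close to 1, which is the kind of cue a classifier could use to separate degrees of periodicity.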

18. A speech decoding apparatus for decoding an encoded bitstream to produce synthesized speech data, the speech decoding apparatus comprising:
- a transmission channel interface for receiving the encoded bitstream from a speech encoding apparatus; and
  
  at least one processor coupled to the transmission channel interface, the at least one processor for decoding a speech classification from a first portion of the encoded bitstream, wherein the speech classification was derived by a neural network in the speech encoding apparatus, the at least one processor also for decoding a remainder of the encoded bitstream using a decoding methodology which depends on the speech classification, resulting in a decoded bitstream, the at least one processor also for creating reconstructed speech basis elements from the decoded bitstream and producing the synthesized speech data using the reconstructed speech basis elements.
  - 19. The speech decoding apparatus as claimed in claim 18, wherein the at least one processor, during the step of creating the reconstructed speech basis elements, is also for using a reconstruction methodology which is an inverse process to a characterization methodology used by the speech encoding apparatus, the characterization methodology having been determined from the speech classification.

20. A method for encoding speech data by a speech coding apparatus comprising the steps of:
- a) acquiring a segment of the speech data;
  
  b) parameterizing the segment of the speech data to produce at least one feature vector which describes parameters of the speech data;
  
  c) applying a first neural network to the at least one feature vector to obtain at least one speech classification of the speech data;
  
  d) creating characterized speech data by characterizing the speech data using a characterization methodology which depends on the at least one speech classification; and
  
  e) creating an encoded bitstream by encoding the characterized speech data.
  - 21. The method as claimed in claim 20 further comprising the steps of:
    - f) storing connection weight information used by the first neural network, wherein the connection weight information was predetermined by an adaptation process;
      
      wherein step c) comprises the steps of;
      
      c1) reading the connection weight information; and
      
      c2) using the connection weight information in conjunction with the first neural network when the first neural network is applied to the at least one feature vector.
  - 22. The method as claimed in claim 21, further comprising the step of:
    - g) determining an interference estimate which estimates a level of interference co-existent with the speech data, wherein the connection weight information comprises multiple sets of weights, each set of weights corresponding to an interference level;
      
      wherein step c) further comprises the steps of;
      
      c3) selecting the set of weights from the multiple sets of weights based on the interference estimate; and
      
      c4) using the set of weights as the connection weight information.
  - 23. The method as claimed in claim 21, wherein step c) further comprises the step of:
    - c3) using at least one previous speech classification which was determined by the first neural network as an input to the first neural network when the first neural network is being applied to the at least one feature vector.
  - 24. The method as claimed in claim 23, further comprising the step of:
    - g) determining an interference estimate which estimates a level of interference co-existent with the speech data; and
      
      wherein step c) further comprises the step of;
      
      c4) inputting the interference estimate into the first neural network when the first neural network is applied to the at least one feature vector.
  - 25. The method as claimed in claim 23, further comprising the step of:
    - g) determining an interference estimate which estimates a level of interference co-existent with the speech data, wherein the connection weight information comprises multiple sets of weights, each set of weights corresponding to an interference level;
      
      wherein step c) further comprises the steps of;
      
      c4) selecting the set of weights from the multiple sets of weights based on the interference estimate; and
      
      c5) using the set of weights as the connection weight information.
  - 26. The method as claimed in claim 21, further comprising the steps of:
    - g) storing the connection weight information to be used by a second neural network;
      
      h) applying the second neural network to the at least one speech classification which is output from the first neural network;
      
      i) using the connection weight information in conjunction with the second neural network when the second neural network is applied to the at least one speech classification; and
      
      j) using the at least one speech classification as an input to the second neural network to determine a more accurate speech classification, wherein the characterization methodology and the encoding methodology depend on the more accurate speech classification.
  - 27. The method as claimed in claim 26, further comprising the step of:
    - k) determining an interference estimate which estimates a level of interference co-existent with the speech data; and
      
      wherein step c) comprises the step of;
      
      c3) inputting the interference estimate into the first neural network when the first neural network is applied to the at least one feature vector.
  - 28. The method as claimed in claim 27, wherein step j) comprises the step of:
    - j1) inputting the interference estimate into the second neural network when the second neural network is applied to the at least one speech classification which is output from the first neural network.
  - 29. The method as claimed in claim 26, further comprising the step of:
    - k) determining an interference estimate which estimates a level of interference co-existent with the speech data, wherein the connection weight information comprises multiple sets of weights, each set of weights corresponding to an interference level;
      
      wherein the step c) further comprises the steps of;
      
      c4) selecting the set of weights from the multiple sets of weights based on the interference estimate; and
      
      c5) using the set of weights as the connection weight information for the first neural network.
  - 30. The method as claimed in claim 29, further comprising the step of:
    - l) selecting a second set of weights from the multiple sets of weights based on the interference estimate; and
      
      wherein step j) comprises the step of;
      
      j1) using the second set of weights as the connection weight information for the second neural network.
  - 31. The method as claimed in claim 21, further comprising the step of:
    - g) determining an interference estimate which estimates a level of interference co-existent with the speech data; and
      
      wherein step c) further comprises the step of;
      
      c3) inputting the interference estimate into the first neural network when the first neural network is applied to the at least one feature vector.
  - 32. The method as claimed in claim 20, further comprising the step of:
    - f) sending the encoded bitstream to a speech decoding apparatus which performs inverse processes to those performed by the speech coding apparatus so that synthesized speech data which approximates the speech data can be obtained.
  - 33. The method as claimed in claim 20, wherein step c) comprises the step of:
    - c1) applying the first neural network to the at least one feature vector to obtain the at least one speech classification of the speech data, wherein the at least one speech classification comprises at least two degrees of periodicity of the speech data.
  - 34. The method as claimed in claim 20, wherein step c) comprises the step of:
    - c1) applying the first neural network to the at least one feature vector to obtain the at least one speech classification of the speech data, wherein the at least one speech classification comprises multiple phonemes which approximate the speech data.
  - 35. The method as claimed in claim 20, wherein step b) comprises the step of:
    - b1) parameterizing the speech data to produce the at least one feature vector, wherein the at least one feature vector comprises a subframe correlation coefficient over expected pitch range, a subframe LPC gain, a subframe low-band to high-band energy ratio, and a subframe energy ratio of the segment against a maximum energy of multiple prior segments.
  - 36. The method as claimed in claim 20, wherein step e) comprises the step of:
    - e1) encoding the characterized speech data using an encoding methodology which depends on the at least one speech classification.
  - 37. The method as claimed in claim 20, wherein the characterized speech data includes at least one parameter that represents the speech data, and step e) comprises the steps of:
    - e1) determining whether the at least one speech classification indicates that a particular parameter of the at least one parameter is a dominant parameter of the speech data;
      
      e2) when the at least one speech classification indicates that the particular parameter is the dominant parameter of the speech data, encoding the particular parameter using a first quantization codebook having a first number of codebook entries; and
      
      e3) when the at least one speech classification indicates that the particular parameter is a less dominant parameter of the speech data, encoding the particular parameter using a second quantization codebook having a second number of the codebook entries, wherein the second number is smaller than the first number.
  - 38. The method as claimed in claim 20, wherein the characterized speech data includes at least one parameter that represents the speech data, multiple quantizer stages are available to encode each of the at least one parameter, and step e) comprises the steps of:
    - e1) determining whether the at least one speech classification indicates that a particular parameter of the at least one parameter is a dominant parameter of the speech data;
      
      e2) when the at least one speech classification indicates that the particular parameter is the dominant parameter of the speech data, encoding the particular parameter using a first number of quantization stages; and
      
      e3) when the at least one speech classification indicates that the particular parameter is a less dominant parameter of the speech data, encoding the particular parameter using a second number of quantization stages, wherein the second number is smaller than the first number.
  - 39. The method as claimed in claim 20, further comprising the step, performed before step b) of:
    - f) adding a small level of Gaussian noise to the segment of the speech data.
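
The classification-dependent quantization of claim 37 can be sketched as follows: a parameter that the classification marks as dominant is coded with a larger codebook, and a less dominant one with a smaller codebook. This is only an illustrative sketch. The uniform scalar codebooks, the codebook sizes (128 and 8), the pitch range, the class labels, and the `DOMINANT_FOR` table are all hypothetical and are not taken from the patent.

```python
def make_codebook(low, high, n_entries):
    """Build a uniform scalar codebook with n_entries values in [low, high]."""
    step = (high - low) / (n_entries - 1)
    return [low + i * step for i in range(n_entries)]


def quantize(value, codebook):
    """Return the index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))


# Hypothetical table: for which classifications each parameter is dominant.
DOMINANT_FOR = {"pitch": {"voiced"}}


def encode_parameter(name, value, classification, large_book, small_book):
    """Pick the codebook from the classification, then quantize.

    Returns (index, bits used for the index, reconstructed value).
    """
    dominant = classification in DOMINANT_FOR.get(name, set())
    codebook = large_book if dominant else small_book
    index = quantize(value, codebook)
    bits = (len(codebook) - 1).bit_length()
    return index, bits, codebook[index]


LARGE = make_codebook(20.0, 147.0, 128)  # first codebook, more entries
SMALL = make_codebook(20.0, 147.0, 8)    # second codebook, fewer entries

print(encode_parameter("pitch", 57.3, "voiced", LARGE, SMALL))
print(encode_parameter("pitch", 57.3, "unvoiced", LARGE, SMALL))
```

The same selection logic would carry over to claim 38 by choosing a number of quantizer stages instead of a codebook size.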

40. A method for decoding an encoded bitstream to produce synthesized speech data, the method comprising the steps of:
- a) receiving the encoded bitstream from a speech encoding apparatus;
  
  b) decoding a speech classification from a first portion of the encoded bitstream, wherein the speech classification was derived by a neural network in the speech encoding apparatus;
  
  c) decoding a remainder of the encoded bitstream using a decoding methodology which depends on the speech classification, resulting in a decoded bitstream;
  
  d) creating reconstructed speech basis elements from the decoded bitstream; and
  
  e) producing the synthesized speech data using the reconstructed speech basis elements.
  - 41. The method as claimed in claim 40, wherein step d) comprises the step of:
    - d1) using a reconstruction methodology which is an inverse process to a characterization methodology used by the speech encoding apparatus, the characterization methodology having been determined from the speech classification.
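
Steps b) and c) of claim 40 can be sketched as a small frame parser: the classification is read from the first portion of the bitstream, and it then selects how the remainder is decoded. This is a sketch under stated assumptions. The 2-bit class field, the class labels, the field names, and the bit widths in `LAYOUTS` are hypothetical, and the reconstruction and synthesis of steps d) and e) are not shown. A matching `pack_frame` helper is included only so the example can be run end to end.

```python
# Hypothetical class labels and frame layouts; none of these values come
# from the patent.
CLASSES = ["unvoiced", "mixed", "voiced"]
CLASS_BITS = 2
LAYOUTS = {
    "unvoiced": [("gain", 5)],
    "mixed": [("pitch", 7), ("gain", 5), ("mix", 3)],
    "voiced": [("pitch", 7), ("gain", 5)],
}


def pack_frame(classification, fields):
    """Encoder side: write the class first, then the class-specific fields."""
    bits = format(CLASSES.index(classification), f"0{CLASS_BITS}b")
    for name, width in LAYOUTS[classification]:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        bits += format(value, f"0{width}b")
    return bits


def unpack_frame(bits):
    """Decoder side: read the class from the first portion of the frame,
    then decode the remainder using the layout that class selects."""
    class_index = int(bits[:CLASS_BITS], 2)
    if class_index >= len(CLASSES):
        raise ValueError(f"unknown class index {class_index}")
    classification = CLASSES[class_index]
    position = CLASS_BITS
    fields = {}
    for name, width in LAYOUTS[classification]:
        fields[name] = int(bits[position:position + width], 2)
        position += width
    return classification, fields


frame = pack_frame("voiced", {"pitch": 40, "gain": 17})
print(frame, unpack_frame(frame))
```

Because the layout is chosen by the decoded class, frames of different classes can have different lengths, which is one way a classification could lower the average bit rate.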

Current Assignee: CDC Propriete Intellectuelle
Original Assignee: Motorola, Inc. (Motorola Solutions, Inc.)
Inventors: Bergstrom, Chad Scott; Garrison, III, Sidney Clarence (deceased)
Primary Examiner(s): MacDonald, Allen R.
Assistant Examiner(s): Collins, Alphonso
Application Number: US08/578,730
Time in Patent Office: 833 Days
Field of Search: 395/2.1, 395/2.11, 395/2.2, 395/2.23, 395/2.24, 395/2.35, 395/2.41, 395/2.45, 395/2.68, 395/2.3
US Class Current: 704/202
CPC Class Codes: G10L 19/02 using spectral analysis, e....; G10L 25/30 using neural networks
