Voice audio encoding device, voice audio decoding device, voice audio encoding method, and voice audio decoding method

US 10,685,660 B2
Filed: 09/25/2018
Issued: 06/16/2020
Est. Priority Date: 12/13/2012
Status: Active Grant

Abstract

Provided are a voice audio encoding device, a voice audio decoding device, a voice audio encoding method, and a voice audio decoding method that distribute bits efficiently and improve sound quality. A dominant frequency band identification unit identifies a dominant frequency band, that is, a band whose norm factor value is the maximum within the spectrum of an input voice audio signal. Dominant group determination units and a non-dominant group determination unit group all sub-bands into a dominant group, which contains the dominant frequency band, and a non-dominant group, which contains no dominant frequency band. A group bit distribution unit distributes bits to each group on the basis of the energy and norm variance of that group. A sub-band bit distribution unit then redistributes the bits given to each group among its sub-bands in accordance with the ratio of each sub-band's norm to the energy of its group.
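
The two-stage distribution described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's equations, which this page does not reproduce: a group's energy is taken to be the sum of its sub-band norms, the group weight is a hypothetical blend of energy share and norm-variance share controlled by a made-up parameter `scale1`, and integer bit counts are obtained by largest-remainder rounding. All function names and the sample norms are invented for the example.

```python
from statistics import pvariance


def split_proportionally(total, weights):
    """Split an integer bit budget by weight with largest-remainder rounding."""
    weight_sum = sum(weights)
    if weight_sum <= 0:  # degenerate case: fall back to an even split
        weights = [1.0] * len(weights)
        weight_sum = float(len(weights))
    exact = [total * w / weight_sum for w in weights]
    bits = [int(x) for x in exact]  # floor, since every share is non-negative
    leftover = max(0, total - sum(bits))
    by_remainder = sorted(range(len(bits)), key=lambda i: exact[i] - bits[i], reverse=True)
    for i in by_remainder[:leftover]:
        bits[i] += 1
    return bits


def allocate_group_bits(groups, total_bits, scale1=0.8):
    """Stage 1 (assumed form): blend each group's energy share and variance share."""
    energies = [sum(g) for g in groups]
    variances = [pvariance(g) for g in groups]
    e_total = sum(energies) or 1.0
    v_total = sum(variances) or 1.0
    weights = [scale1 * e / e_total + (1.0 - scale1) * v / v_total
               for e, v in zip(energies, variances)]
    return split_proportionally(total_bits, weights)


def allocate_subband_bits(group, group_bits):
    """Stage 2: share a group's bits by each sub-band norm over the group energy."""
    return split_proportionally(group_bits, group)


# Made-up norms: one peaky group and one flat group.
groups = [[2.0, 9.0, 3.0], [1.0, 1.0]]
group_bits = allocate_group_bits(groups, total_bits=64)
subband_bits = [allocate_subband_bits(g, b) for g, b in zip(groups, group_bits)]
print(group_bits, subband_bits)
```

In this sketch the peaky group receives most of the budget because it leads on both energy and variance, and within it the sub-band with the largest norm receives the largest share.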

36 Citations

25 Claims

1. A speech or audio coding apparatus comprising:
- a transformation section that transforms an input signal from a time domain to a frequency domain to obtain a frequency spectrum comprising spectral coefficients;
- an estimation section that estimates an energy envelope which represents an energy level for each subband of a plurality of subbands achieved by splitting the frequency spectrum of the input signal, each subband having at least two spectral coefficients;
- a quantization section that quantizes the energy envelope to obtain a quantized energy envelope;
- a group determining section that splits the quantized energy envelopes into a plurality of groups, each group having a plurality of at least two subbands;
- a first bit allocation section that allocates bits to each group of the plurality of groups to obtain a group-specific number of bits for each group of the plurality of groups;
- a second bit allocation section that allocates, for each group of the plurality of groups, the group-specific number of bits allocated to a respective group of the plurality of groups to the plurality of subbands belonging to the respective group; and
- a coding section that encodes, for each subband of the plurality of subbands, the spectral coefficients included in the respective subband using bits allocated to the respective subbands.
  - 2. The speech or audio coding apparatus according to claim 1, further comprising a dominant frequency band identification section that identifies a dominant frequency band which is a subband in which the energy envelope of the frequency spectrum exhibits a local maximum value, wherein the group determining section determines the dominant frequency band and subbands on both sides of the dominant frequency band each forming a descending slope of the energy envelope as dominant groups and determines continuous subbands other than the dominant frequency band as non-dominant groups.
  - 3. The speech or audio coding apparatus according to claim 1, further comprising:
    - an energy calculation section that calculates a group-specific energy; and
    - a distribution calculation section that calculates a group-specific energy envelope distribution,

    wherein the first bit allocation section allocates, based on the calculated group-specific energy and the group-specific energy envelope distribution, more bits to a group when at least one of the energy and the energy envelope distribution is greater and allocates fewer bits to a group when at least one of the energy and the energy envelope distribution is smaller.
  - 4. The speech or audio coding apparatus according to claim 1, wherein the second bit allocation section allocates more bits to a subband comprising a greater energy envelope and allocates fewer bits to a subband comprising a smaller energy envelope.
  - 5. The speech or audio coding apparatus according to claim 1, wherein the second bit allocation section is configured to allocate more bits to a perceptually more important subband and fewer bits to a perceptually less important subband.
  - 6. The speech or audio coding apparatus according to claim 1, wherein the second bit allocation section is configured to allocate more bits to the subbands in a group having a higher energy variance and to allocate fewer bits to the subbands in a group having a lower energy variance.
  - 7. The speech or audio coding apparatus according to claim 1, wherein the second bit allocation section is configured to allocate more bits to the subbands in a group having a peak in the frequency spectrum and to allocate fewer bits to the subbands in a group having a valley in the frequency spectrum.
  - 8. The speech or audio coding apparatus according to claim 1, wherein the second bit allocation section is configured to operate based on the following equation:
  - 9. The speech or audio coding apparatus according to claim 1, wherein the first bit allocation section is configured to allocate more bits to a dominant group and fewer bits to a non-dominant group.
  - 10. The speech or audio coding apparatus according to claim 1, wherein the first bit allocation section is configured to allocate bits on a group-by-group basis based on a group-specific energy, a total energy of all groups, a group-specific energy variance and a total energy variance of all groups.
  - 11. The speech or audio coding apparatus according to claim 1, wherein the first bit allocation section is configured to operate based on the following equation:
  - 12. The speech or audio coding apparatus according to claim 11, wherein a value of scale1 is between 0 and 1.
  - 13. The speech or audio coding apparatus according to claim 1, wherein the first bit allocation section is configured to determine a perceptual importance of each group by using an energy and an energy variance of the group and to enhance a dominant group.
  - 14. The speech or audio coding apparatus according to claim 1, wherein the first bit allocation section is configured to determine a perceptual importance of a group based on an energy of the group and an energy distribution and to determine bits to be allocated to each group based on the perceptual importance for the respective group.
  - 15. The speech or audio coding apparatus according to claim 1, wherein the group determining section is configured to adaptively determine group widths of the plurality of groups according to a characteristic of the input signal.
  - 16. The speech or audio coding apparatus according to claim 1, wherein the group determining section is configured to use quantized subband energies.
  - 17. The speech or audio coding apparatus according to claim 1, wherein the group determining section is configured to separate peaks of the frequency spectrum from valleys of the frequency spectrum, wherein a peak of the frequency spectrum is located in a dominant group and a valley of the frequency spectrum is located in a non-dominant group.
  - 18. The speech or audio coding apparatus according to claim 1, wherein the group determining section is configured to identify dominant frequency bands, in which subband energy values in the frequency spectrum of the input signal have local maximum values, and to group subbands including the dominant frequency bands into dominant groups and other subbands into non-dominant groups, wherein the first bit allocation section is configured to allocate bits to a respective group based on an energy of the respective group and an energy variance of the respective group, and wherein the second bit allocation section is configured to allocate the bits, allocated on a group-by-group basis to the respective group, to a respective subband in the respective group according to a ratio of an energy of the respective subband to an energy of the respective group.
  - 19. The speech or audio coding apparatus according to claim 1, wherein the first bit allocation section is configured to allocate more bits to a perceptually more important group and less bits to a perceptually less important group, and wherein the second bit allocation section is configured to allocate more bits to a perceptually more important subband and less bits to a perceptually less important subband.
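
The grouping recited in claims 2, 17 and 18 can be pictured with a short sketch. It is one possible reading, not the patented procedure: a dominant frequency band is taken to be a strict local maximum of the quantized envelope, the dominant group extends outward from it while the envelope keeps descending, and every remaining run of contiguous subbands becomes a non-dominant group. The function names and the sample envelope are hypothetical, and the sketch does not enforce the requirement of claim 1 that each group hold at least two subbands.

```python
def find_dominant_groups(envelope):
    """Return half-open (start, end) ranges around strict local maxima."""
    n = len(envelope)
    taken = [False] * n  # True where a subband already sits in a dominant group
    groups = []
    for i in range(n):
        left = envelope[i - 1] if i > 0 else float("-inf")
        right = envelope[i + 1] if i + 1 < n else float("-inf")
        if not (envelope[i] > left and envelope[i] > right):
            continue  # not a strict local maximum
        lo = i
        while lo > 0 and not taken[lo - 1] and envelope[lo - 1] < envelope[lo]:
            lo -= 1  # walk down the slope on the low-frequency side
        hi = i
        while hi + 1 < n and envelope[hi + 1] < envelope[hi]:
            hi += 1  # walk down the slope on the high-frequency side
        for k in range(lo, hi + 1):
            taken[k] = True
        groups.append((lo, hi + 1))
    return groups, taken


def determine_groups(envelope):
    """Label every subband range as dominant (True) or non-dominant (False)."""
    dominant, taken = find_dominant_groups(envelope)
    labelled = [(lo, hi, True) for lo, hi in dominant]
    start = None
    for k in range(len(envelope) + 1):
        free = k < len(envelope) and not taken[k]
        if free and start is None:
            start = k
        elif not free and start is not None:
            labelled.append((start, k, False))
            start = None
    return sorted(labelled)


# Made-up quantized envelope with two peaks separated by flat valleys.
envelope = [3, 3, 3, 9, 5, 2, 2, 2, 2, 7, 4, 4, 4]
layout = determine_groups(envelope)
print(layout)
```

Because the group widths follow the peaks and valleys of the envelope, they adapt to the input signal, which is the property claim 15 describes.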

20. A speech or audio decoding apparatus, comprising:
- a de-quantization section that de-quantizes a quantized spectral envelope to obtain a dequantized spectral envelope;
- a group determining section that splits the quantized spectral envelope into a plurality of groups, each group having a plurality of at least two subbands;
- a first bit allocation section that allocates bits to each group of the plurality of groups to obtain a group-specific number of bits for each group of the plurality of groups;
- a second bit allocation section that allocates, for each group of the plurality of groups, the group-specific number of bits allocated to a respective group of the plurality of groups to the plurality of subbands belonging to the respective group;
- a decoding section that decodes, for each subband of the plurality of subbands, encoded spectral coefficients included in a respective subband of a speech or audio signal using the bits allocated to the respective subband to obtain a decoded frequency spectrum;
- an envelope shaping section that applies the de-quantized spectral envelope to the decoded frequency spectrum to obtain a shaped spectrum; and
- an inverse transformation section that inversely transforms the shaped spectrum from a frequency domain to a time domain.
  - 21. The speech or audio decoding apparatus according to claim 20, further comprising a dominant frequency band identification section that identifies a dominant frequency band which is a subband in which the energy envelope of the frequency spectrum exhibits a local maximum value, wherein the group determining section determines the dominant frequency band and subbands on both sides of the dominant frequency band each forming a descending slope of the energy envelope as dominant groups and determines continuous subbands other than the dominant frequency band as non-dominant groups.
  - 22. The speech or audio decoding apparatus according to claim 20, further comprising:
    - an energy calculation section that calculates a group-specific energy; and
    - a distribution calculation section that calculates a group-specific energy envelope,

    wherein the first bit allocation section allocates, based on the calculated group-specific energy and the group-specific energy envelope distribution, more bits to a group when at least one of the energy and the energy envelope distribution is greater and allocates fewer bits to a group when at least one of the energy and the energy envelope distribution is smaller.
  - 23. The speech or audio decoding apparatus according to claim 20, wherein the second bit allocation section allocates more bits to a subband comprising a greater energy envelope and allocates fewer bits to a subband comprising a smaller energy envelope.
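
The de-quantization and envelope shaping sections of claim 20 can be illustrated with a small sketch. The claims do not state how the envelope is quantized, so the mapping used here is an assumption: each quantized value is an integer index `q` standing for a gain of `2 ** (q / 2)`, and the decoded spectrum is assumed to be normalized so that shaping is a per-subband multiplication. The function names, the `band_edges` layout and the sample values are hypothetical.

```python
def dequantize_envelope(indices):
    """Map integer envelope indices to linear gains (assumed 2 ** (q / 2) rule)."""
    return [2.0 ** (q / 2.0) for q in indices]


def shape_spectrum(decoded, band_edges, envelope):
    """Scale the coefficients of each subband by that subband's envelope gain.

    band_edges holds len(envelope) + 1 coefficient indices; subband b covers
    decoded[band_edges[b]:band_edges[b + 1]].
    """
    shaped = list(decoded)
    for b, gain in enumerate(envelope):
        for k in range(band_edges[b], band_edges[b + 1]):
            shaped[k] = decoded[k] * gain
    return shaped


# Three made-up subbands of two coefficients each.
gains = dequantize_envelope([0, 2, 4])
shaped = shape_spectrum([1.0, -1.0, 0.5, 0.5, 0.25, -0.25], [0, 2, 4, 6], gains)
print(gains, shaped)
```

The shaped spectrum would then be handed to the inverse transformation section, which is outside this sketch.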

24. A speech or audio coding method, comprising:
- transforming an input signal from a time domain to a frequency domain to obtain a frequency spectrum comprising spectral coefficients;
- estimating an energy envelope that represents an energy level for each subband of a plurality of subbands achieved by splitting the frequency spectrum of the input signal, each subband having at least two spectral coefficients;
- quantizing the energy envelope to obtain a quantized energy envelope;
- splitting the quantized energy envelopes into a plurality of groups, each group having a plurality of at least two subbands;
- allocating, for each group of the plurality of groups, bits to each group of the plurality of groups to obtain a group-specific number of bits for each group of the plurality of groups;
- allocating, for each group of the plurality of groups, the group-specific number of bits allocated to a respective group of the plurality of groups to the plurality of subbands belonging to the respective group; and
- encoding, for each subband of the plurality of subbands, the spectral coefficients included in the respective subband using bits allocated to the respective subband.
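
The envelope estimation and quantization steps of claim 24 might look like the sketch below. It assumes the spectral coefficients are already available (the transform itself is left out), takes the energy level of a subband to be the root-mean-square of its coefficients, and quantizes on a hypothetical half-step log2 grid, so that index `q` stands for a level of `2 ** (q / 2)`. None of these choices comes from the claims; the function names, `band_edges` and the sample spectrum are illustrative only.

```python
import math


def estimate_envelope(spectrum, band_edges):
    """Return the RMS level of each subband (assumed energy measure)."""
    envelope = []
    for lo, hi in zip(band_edges, band_edges[1:]):
        band = spectrum[lo:hi]
        envelope.append(math.sqrt(sum(c * c for c in band) / len(band)))
    return envelope


def quantize_envelope(envelope, floor=1e-9):
    """Quantize each level to an integer index on a half-step log2 grid."""
    return [round(2.0 * math.log2(max(level, floor))) for level in envelope]


# Made-up spectrum split into three subbands of two coefficients each.
spectrum = [1.0, -1.0, 2.0, 2.0, 4.0, -4.0]
band_edges = [0, 2, 4, 6]
envelope = estimate_envelope(spectrum, band_edges)
indices = quantize_envelope(envelope)
print(envelope, indices)
```

Grouping and bit allocation operate on the quantized indices rather than on the unquantized levels, consistent with the claim's step of splitting the quantized energy envelopes into groups.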

25. A speech or audio decoding method, comprising:
- de-quantizing a quantized spectral envelope to obtain a dequantized spectral envelope;
- splitting the quantized spectral envelope into a plurality of groups, each group having a plurality of at least two subbands;
- allocating bits to each group of the plurality of groups to obtain a group-specific number of bits for each group of the plurality of groups;
- allocating, for each group of the plurality of groups, the group-specific number of bits allocated to a respective group of the plurality of groups to the plurality of subbands belonging to the respective group;
- decoding, for each subband of the plurality of subbands, encoded spectral coefficients included in a respective subband of a speech/audio signal using the bits allocated to the respective subband to obtain a decoded frequency spectrum;
- applying the de-quantized spectral envelope to the decoded frequency spectrum to obtain a shaped spectrum; and
- inversely transforming the shaped spectrum from a frequency domain to a time domain.
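
Claim 25 decodes each subband "using the bits allocated to the respective subband". One way to read this is that the allocation tells the decoder where each subband's field begins and ends in the payload. The sketch below illustrates that reading only; it assumes, hypothetically, that the fields are packed back to back in subband order and represents the payload as a string of `0` and `1` characters. The function name and the sample data are invented.

```python
def split_payload(payload_bits, bits_per_subband):
    """Cut a bit string into one field per subband using the allocated widths."""
    fields = []
    pos = 0
    for width in bits_per_subband:
        fields.append(payload_bits[pos:pos + width])
        pos += width
    if pos != len(payload_bits):
        raise ValueError("allocation does not match payload length")
    return fields


# Made-up allocation: the second subband received no bits at all.
fields = split_payload("110100011", [3, 0, 4, 2])
print(fields)
```

Under this reading the decoder has to arrive at the same per-subband widths as the encoder, which is why the method repeats the grouping and both allocation steps on the quantized spectral envelope.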

Current Assignee: Fraunhofer Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
Original Assignee: Fraunhofer Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.
Inventors: Liu, Zongxian; Nagisetty, Srikanth; Oshikiri, Masahiro
Primary Examiner(s): Shin, Seong-Ah A
Application Number: US16/141,934
Publication Number: US 20190027155A1
Time in Patent Office: 630 Days
Field of Search: None
CPC Class Codes:
G10L 19/0204 using subband decomposition
G10L 19/035 Scalar quantisation
