Encoding device and encoding method, decoding device and decoding method, and program
First Claim
1. An encoding device, comprising:
- processing circuitry configured to perform a process including:
receiving an input audio signal;
generating a low frequency sub-band signal of a sub-band on a low frequency side of the input audio signal and a high frequency sub-band signal of a sub-band on a high frequency side of the input audio signal;
calculating a quasi-high frequency sub-band power that is an estimated value of a high frequency sub-band power of the high frequency sub-band signal based on the low frequency sub-band signal and a predetermined estimation coefficient;
calculating a number-of-sections determining feature amount by calculating a sub-band power sum of the power of the sub-band signal of the sub-bands on the high frequency side of the input signal, wherein the sub-band power sum is an estimated bandwidth of a frame to be processed;
determining the number of continuous frame sections including frames for which the same estimation coefficient is selected in a process target section including a plurality of frames of the input signal, based on the number-of-sections determining feature amount;
selecting the estimation coefficient of a frame that constitutes the continuous frame section from a plurality of estimation coefficients based on the quasi-high frequency sub-band power and the high frequency sub-band power in each continuous frame section obtained by dividing the process target section based on the determined number of continuous frame sections;
generating data for obtaining the estimation coefficient selected in a frame of each of the continuous frame sections constituting the process target section;
encoding a low frequency signal of the input signal to generate low frequency encoded data;
multiplexing the data and the low frequency encoded data to generate an output code string representative of the input audio signal; and
outputting the output code string.
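The coefficient selection step recited above can be sketched as follows. This is a minimal illustration, not the patented implementation: the claim does not say how sub-band power is measured or how the estimate is formed, so the dB mean-square power, the linear estimate (weights plus offset per high sub-band), the squared-error cost, and all function names are assumptions made for the example.

```python
import math

def subband_power_db(samples):
    """Mean-square power of one sub-band signal over a frame, in dB (assumed measure)."""
    mean_sq = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, 1e-12))  # floor avoids log10(0)

def estimate_high_powers(low_powers, coeff_set):
    """Quasi-high sub-band powers for one frame.

    coeff_set holds one (weights, offset) pair per high sub-band; the
    linear form is an assumption, the claim only says the estimate is
    based on the low sub-band signal and an estimation coefficient.
    """
    return [sum(w * p for w, p in zip(weights, low_powers)) + offset
            for weights, offset in coeff_set]

def select_coefficient(section_frames, coeff_table):
    """Return the index of the coefficient set that best fits a section.

    section_frames: list of (low_powers, high_powers), one per frame in
    a continuous frame section. The same index is used for every frame
    of the section, so the cost is summed over the whole section.
    """
    def cost(coeff_set):
        total = 0.0
        for low, high in section_frames:
            estimate = estimate_high_powers(low, coeff_set)
            total += sum((h - e) ** 2 for h, e in zip(high, estimate))
        return total

    return min(range(len(coeff_table)), key=lambda i: cost(coeff_table[i]))
```

Summing the cost over all frames of a section is what lets a single coefficient index stand for the whole section, so only one index per section needs to be placed in the output code string.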
Abstract
The present technology relates to an encoding device and an encoding method, a decoding device and a decoding method, and a program, configured to obtain high quality audio with a smaller amount of encoded data. A number-of-sections determining feature amount calculating circuit calculates, based on sub-band signals of a plurality of sub-bands constituting an input signal, a number-of-sections determining feature amount. This feature amount is used to determine the number of divisions by which a process target section is divided into continuous frame sections, each including frames for which the same estimation coefficient is selected. A quasi-high frequency sub-band power difference calculating circuit determines the number of continuous frame sections in the process target section based on the number-of-sections determining feature amount, selects, for each continuous frame section, an estimation coefficient for obtaining a high frequency component of the input signal by estimation, and generates data including a coefficient index for obtaining the estimation coefficient. A high frequency encoding circuit encodes the obtained data and generates high frequency encoded data. The present technology can be applied to an encoding device.
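One way to picture how a feature amount could drive the number of divisions is sketched below. The abstract does not state the mapping, so everything here is hypothetical: the use of the spread of the per-frame feature across the process target section, the threshold values, the equal-length division, and the function names are all assumptions chosen for illustration.

```python
def feature_amount(high_powers):
    """Sub-band power sum over the high-frequency sub-bands of one frame."""
    return sum(high_powers)

def number_of_sections(features, thresholds=(3.0, 9.0)):
    """Map the variation of the feature across a section to a section count.

    Assumed rule: the more the feature varies over the process target
    section, the more continuous frame sections are used. The thresholds
    are placeholders, not values from the source.
    """
    spread = max(features) - min(features)
    return 1 + sum(spread > t for t in thresholds)

def split_equal(num_frames, num_sections):
    """Return (start, end) frame ranges, end exclusive, of near-equal length."""
    base, extra = divmod(num_frames, num_sections)
    ranges, start = [], 0
    for i in range(num_sections):
        length = base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, start + length))
        start += length
    return ranges
```

For a 16-frame process target section, `split_equal(16, 2)` gives two 8-frame sections. A stationary signal would then need a single coefficient index for all 16 frames, while a rapidly changing one would spend more bits on additional indices.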
61 Citations
18 Claims
10. An encoding method, comprising:
-
receiving, by processing circuitry, an input audio signal; generating, by the processing circuitry, a low frequency sub-band signal of a sub-band on a low frequency side of the input audio signal and a high frequency sub-band signal of a sub-band on a high frequency side of the input audio signal; calculating, by the processing circuitry, a quasi-high frequency sub-band power that is an estimated value of a high frequency sub-band power of the high frequency sub-band signal based on the low frequency sub-band signal and a predetermined estimation coefficient; calculating, by the processing circuitry, a number-of-sections determining feature amount by calculating a sub-band power sum of the power of the sub-band signal of the sub-bands on the high frequency side of the input signal, wherein the sub-band power sum is an estimated bandwidth of a frame to be processed; determining, by the processing circuitry, the number of continuous frame sections including frames for which the same estimation coefficient is selected in a process target section including a plurality of frames of the input signal, based on the number-of-sections determining feature amount; selecting, by the processing circuitry, the estimation coefficient of a frame that constitutes the continuous frame section from a plurality of estimation coefficients based on the quasi-high frequency sub-band power and the high frequency sub-band power in each continuous frame section obtained by dividing the process target section based on the determined number of continuous frame sections; generating, by the processing circuitry, data for obtaining the estimation coefficient selected in a frame of each of the continuous frame sections constituting the process target section; generating, by the processing circuitry, low frequency encoded data by encoding a low frequency signal of the input signal; generating, by the processing circuitry, an output code string by multiplexing the data and the low frequency encoded data, the output code string being representative of the input audio signal; and outputting, by the processing circuitry, the output code string.
-
-
11. A computer-readable storage device encoded with computer-executable instructions that, when executed by processing circuitry, perform an encoding method comprising:
-
receiving an input audio signal; generating a low frequency sub-band signal of a sub-band on a low frequency side of the input audio signal and a high frequency sub-band signal of a sub-band on a high frequency side of the input audio signal; calculating a quasi-high frequency sub-band power that is an estimated value of a high frequency sub-band power of the high frequency sub-band signal based on the low frequency sub-band signal and a predetermined estimation coefficient; calculating a number-of-sections determining feature amount by calculating a sub-band power sum of the power of the sub-band signal of the sub-bands on the high frequency side of the input signal, wherein the sub-band power sum is an estimated bandwidth of a frame to be processed; determining the number of continuous frame sections including frames for which the same estimation coefficient is selected in a process target section including a plurality of frames of the input signal, based on the number-of-sections determining feature amount; selecting the estimation coefficient of a frame that constitutes the continuous frame section from a plurality of estimation coefficients based on the quasi-high frequency sub-band power and the high frequency sub-band power in each continuous frame section obtained by dividing the process target section based on the determined number of continuous frame sections; generating data for obtaining the estimation coefficient selected in a frame of each of the continuous frame sections constituting the process target section; generating low frequency encoded data by encoding a low frequency signal of the input signal; generating an output code string by multiplexing the data and the low frequency encoded data, the output code string being representative of the input audio signal; and outputting the output code string.
-
-
12. A decoding device, comprising:
-
processing circuitry configured to perform a process including: receiving an input code string representative of an audio signal; demultiplexing the input code string into data for obtaining an estimation coefficient selected in a frame of each continuous frame section constituting a process target section, which is generated based on a result of calculating an estimated value of a high frequency sub-band power of a high frequency sub-band signal of the audio signal based on a low frequency sub-band signal of the audio signal and a predetermined estimation coefficient, determining the number of continuous frame sections including frames for which the same estimation coefficient is selected in the process target section including a plurality of frames of the audio signal based on a number-of-sections determining feature amount extracted from the audio signal, wherein the number-of-sections determining feature amount is calculated by calculating a sub-band power sum of the power of the sub-band signal of the sub-bands on the high frequency side of the input signal, wherein the sub-band power sum is an estimated bandwidth of a frame to be processed, and selecting the estimation coefficient of a frame constituting the continuous frame section from a plurality of estimation coefficients based on the estimated value and the high frequency sub-band power in each of the continuous frame sections obtained by dividing the process target section based on the determined number of continuous frame sections, and low frequency encoded data obtained by encoding a low frequency signal of the input signal; decoding the low frequency encoded data to generate a low frequency signal; generating a high frequency signal based on the estimation coefficient obtained from the data and the low frequency signal obtained from the decoding; generating the audio signal based on the high frequency signal and the low frequency signal obtained from the decoding; and outputting the audio signal.
-
-
17. A decoding method, comprising:
-
receiving, by processing circuitry, an input code string representative of an audio signal; demultiplexing, by the processing circuitry, the input code string into data for obtaining an estimation coefficient selected in a frame of each continuous frame section constituting a process target section, which is generated based on a result of calculating an estimated value of a high frequency sub-band power of a high frequency sub-band signal of the audio signal based on a low frequency sub-band signal of the audio signal and a predetermined estimation coefficient, determining the number of continuous frame sections including frames for which the same estimation coefficient is selected in the process target section including a plurality of frames of the audio signal based on a number-of-sections determining feature amount extracted from the audio signal, wherein the number-of-sections determining feature amount is calculated by calculating a sub-band power sum of the power of the sub-band signal of the sub-bands on the high frequency side of the input signal, wherein the sub-band power sum is an estimated bandwidth of a frame to be processed, and selecting the estimation coefficient of a frame constituting the continuous frame section from a plurality of estimation coefficients based on the estimated value and the high frequency sub-band power in each of the continuous frame sections obtained by dividing the process target section based on the determined number of continuous frame sections, and low frequency encoded data obtained by encoding a low frequency signal of the input signal; generating, by the processing circuitry, a low frequency signal by decoding the low frequency encoded data; generating, by the processing circuitry, a high frequency signal based on the estimation coefficient obtained from the data and the low frequency signal obtained from the decoding; generating, by the processing circuitry, the audio signal based on the high frequency signal and the low frequency signal obtained from the decoding; and outputting, by the processing circuitry, the audio signal.
-
-
18. A computer-readable storage device encoded with computer-executable instructions that, when executed by processing circuitry, perform an encoding method comprising:
-
receiving an input code string representative of an audio signal; demultiplexing the input code string into data for obtaining an estimation coefficient selected in a frame of each continuous frame section constituting a process target section, which is generated based on a result of calculating an estimated value of a high frequency sub-band power of a high frequency sub-band signal of the audio signal based on a low frequency sub-band signal of the audio signal and a predetermined estimation coefficient, determining the number of continuous frame sections including frames for which the same estimation coefficient is selected in the process target section including a plurality of frames of the audio signal based on a number-of-sections determining feature amount extracted from the audio signal, wherein the number-of-sections determining feature amount is calculated by calculating a sub-band power sum of the power of the sub-band signal of the sub-bands on the high frequency side of the input signal, wherein the sub-band power sum is an estimated bandwidth of a frame to be processed, and selecting the estimation coefficient of a frame constituting the continuous frame section from a plurality of estimation coefficients based on the estimated value and the high frequency sub-band power in each of the continuous frame sections obtained by dividing the process target section based on the determined number of continuous frame sections, and low frequency encoded data obtained by encoding a low frequency signal of the input signal; generating a low frequency signal by decoding the low frequency encoded data; generating a high frequency signal based on the estimation coefficient obtained from the data and the low frequency signal obtained from the decoding; generating the audio signal based on the high frequency signal and the low frequency signal obtained from the decoding; and outputting the audio signal.
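On the decoding side, the demultiplexed data identifies one estimation coefficient per continuous frame section, which then has to be applied to every frame of that section. The sketch below shows that expansion under stated assumptions: the data layout (a list of section length and coefficient index pairs), the linear estimate, and the function names are hypothetical and are not taken from the claims.

```python
def expand_indices(sections):
    """Turn per-section data into one coefficient index per frame.

    sections: list of (length_in_frames, coefficient_index) pairs, an
    assumed layout for the demultiplexed data.
    """
    per_frame = []
    for length, index in sections:
        per_frame.extend([index] * length)
    return per_frame

def estimate_high_powers(low_powers, coeff_set):
    """Estimated high sub-band powers for one frame (assumed linear form)."""
    return [sum(w * p for w, p in zip(weights, low_powers)) + offset
            for weights, offset in coeff_set]

def decode_high_band_powers(sections, coeff_table, low_powers_per_frame):
    """Estimate high sub-band powers frame by frame from decoded low-band powers."""
    indices = expand_indices(sections)
    if len(indices) != len(low_powers_per_frame):
        raise ValueError("section lengths do not cover the process target section")
    return [estimate_high_powers(low, coeff_table[i])
            for i, low in zip(indices, low_powers_per_frame)]
```

The estimated powers would then be used to shape a high frequency signal derived from the decoded low frequency signal; that synthesis step is outside the scope of this sketch.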
-
Specification