Methods for reconstructing an audio signal
First Claim
1. A computer-implemented method, comprising:
receiving input audio data comprising a plurality of audio samples;
detecting distortion in a first portion of the input audio data associated with a first period of time, the distortion caused by at least one of the plurality of audio samples missing from the input audio data or a magnitude value of one or more of the plurality of audio samples being equal to a saturation threshold value;
determining that a second portion of the input audio data following the first portion is not distorted, the second portion corresponding to a second period of time that begins at a first time;
performing, based on a magnitude of signal values of the input audio data, a quantization process to generate first audio data by mapping the signal values of the input audio data to discrete states corresponding to respective quantization intervals;
generating, based on the first audio data, two or more first audio data predictions corresponding to at least part of the first period of time, the two or more first audio data predictions determined using a first generative model that receives the first audio data as input features and predicts a magnitude of signal values for audio samples recursively in a first direction in time;
determining a first audio sample in the first audio data corresponding to the first time;
determining a magnitude value associated with the first audio sample;
selecting, based on at least the magnitude value associated with the first audio sample, a first data prediction of the two or more first audio data predictions;
generating, based on the first data prediction, second audio data corresponding to at least part of the first period of time;
generating, based on at least the first audio data and the second audio data, output audio data, the output audio data including the second audio data followed by a third portion of the first audio data that includes the first audio sample; and
doing at least one of (a) causing audio corresponding to the output audio data to be output by at least one speaker, or (b) causing a function corresponding to a voice command represented by the output audio data to be executed.
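The detection and selection steps of the claim can be illustrated with a minimal sketch. It assumes that missing samples are represented as NaN, that the distorted region is a single contiguous run followed by clean samples, and that "selecting based on the magnitude value" means picking the candidate whose last sample lies closest to the first undistorted sample. None of these details is stated in the claim; the function names `find_distorted`, `select_prediction`, and `reconstruct` are hypothetical, and the quantization step is omitted here.

```python
import numpy as np

def find_distorted(samples, saturation=1.0):
    """Return a boolean mask marking missing (NaN) or saturated samples."""
    samples = np.asarray(samples, dtype=float)
    missing = np.isnan(samples)
    # Replace NaN by 0 before comparing so the comparison never sees NaN.
    clipped = np.abs(np.where(missing, 0.0, samples)) >= saturation
    return missing | clipped

def select_prediction(candidates, boundary_value):
    """Pick the candidate whose final sample is closest to the first
    undistorted sample (an assumed selection criterion)."""
    candidates = np.asarray(candidates, dtype=float)
    errors = np.abs(candidates[:, -1] - boundary_value)
    return candidates[int(np.argmin(errors))]

def reconstruct(samples, candidates, saturation=1.0):
    """Replace the distorted run with the selected candidate prediction."""
    samples = np.asarray(samples, dtype=float)
    mask = find_distorted(samples, saturation)
    if not mask.any():
        return samples.copy()
    idx = np.flatnonzero(mask)
    # Assumes one contiguous distorted run with clean samples after it.
    start, stop = idx[0], idx[-1] + 1
    best = select_prediction(candidates, samples[stop])
    out = samples.copy()
    out[start:stop] = best
    return out

# Two hypothetical candidate predictions for a two-sample gap.
audio = [0.1, 0.2, np.nan, np.nan, 0.5, 0.6]
repaired = reconstruct(audio, candidates=[[0.3, 0.9], [0.3, 0.4]])
```

Here the second candidate is chosen because its last value (0.4) is nearer to the first clean sample (0.5), so the output continues into the undistorted portion with a smaller jump.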
Abstract
A system configured to reconstruct audio signals. The system may identify audio samples that are missing due to packet loss, or detect distortion caused by audio clipping, and may reconstruct the audio data. The system may employ a forward-looking neural network that recursively predicts audio samples based on previous audio samples and/or a backward-looking neural network that recursively predicts audio samples based on subsequent audio samples. The system may generate audio data using only the forward-looking neural network for low-latency applications, or may generate audio data using both neural networks for mid- to high-latency applications. To reduce distortion in output audio data, the system may generate the audio data by cross-fading between outputs of the neural networks and/or may cross-fade between the generated audio data and the input audio data.
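The cross-fading mentioned in the abstract can be sketched as a weighted blend of the two networks' outputs over the reconstructed region. The linear ramp below is an assumption, since the abstract does not specify the fade shape, and `crossfade` is a hypothetical helper name. The idea is to trust the forward prediction near the start of the gap and the backward prediction near its end.

```python
import numpy as np

def crossfade(forward, backward):
    """Blend forward and backward predictions with a linear ramp."""
    forward = np.asarray(forward, dtype=float)
    backward = np.asarray(backward, dtype=float)
    if forward.shape != backward.shape:
        raise ValueError("predictions must cover the same samples")
    # Weight on the backward prediction rises from 0 to 1 across the gap.
    w = np.linspace(0.0, 1.0, num=forward.shape[0])
    return (1.0 - w) * forward + w * backward

blended = crossfade([1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
```

The same helper could blend generated audio into the input audio at the edges of the gap, which is the second kind of cross-fade the abstract describes.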
14 Citations
23 Claims
1. A computer-implemented method, as recited in full under First Claim above.
Dependent claims: 2, 3, 4
5. A computer-implemented method, comprising:
receiving input audio data comprising a plurality of audio samples;
detecting distortion in a first portion of the input audio data associated with a first period of time;
determining that a second portion of the input audio data following the first portion is not distorted, the second portion corresponding to a second period of time that begins at a first time;
performing a quantization process on the input audio data to generate first audio data by mapping signal values of the input audio data to discrete states corresponding to respective quantization intervals;
generating, based on the first audio data, two or more first audio data predictions corresponding to at least part of the first period of time, the two or more first audio data predictions determined using a first generative model that receives the first audio data as input features and predicts audio samples recursively in a first direction in time;
generating, based on the two or more first audio data predictions, second audio data corresponding to at least part of the first period of time;
generating, based on at least the first audio data and the second audio data, output audio data; and
doing at least one of (a) causing audio corresponding to the output audio data to be output by at least one speaker, or (b) causing a function corresponding to a voice command represented by the output audio data to be executed.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13
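The quantization step recited in this claim, mapping signal values to discrete states that correspond to quantization intervals, can be illustrated with a minimal sketch. It assumes uniform intervals over the range [-1, 1]; the claim does not say how the intervals are spaced, and the names `quantize` and `dequantize` and the parameters `n_states`, `lo`, and `hi` are hypothetical.

```python
import numpy as np

def quantize(signal, n_states=256, lo=-1.0, hi=1.0):
    """Map each signal value to the index of its quantization interval."""
    signal = np.clip(np.asarray(signal, dtype=float), lo, hi)
    edges = np.linspace(lo, hi, n_states + 1)
    # Use interior edges only, so indices fall in 0 .. n_states - 1.
    return np.digitize(signal, edges[1:-1])

def dequantize(states, n_states=256, lo=-1.0, hi=1.0):
    """Map each discrete state back to the midpoint of its interval."""
    width = (hi - lo) / n_states
    return lo + (np.asarray(states) + 0.5) * width

states = quantize([-1.0, -0.25, 0.0, 0.9, 1.0], n_states=4)
midpoints = dequantize([0, 3], n_states=4)
```

Working with a finite set of states lets a generative model predict the next sample as a choice among states instead of an unbounded real value.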
14. A system comprising:
at least one processor; and
memory including instructions operable to be executed by the at least one processor to perform a set of actions to configure the system device to:
receive input audio data comprising a plurality of audio samples;
detect distortion in a first portion of the input audio data associated with a first period of time;
determine that a second portion of the input audio data following the first portion is not distorted, the second portion corresponding to a second period of time that begins at a first time;
perform a quantization process on the input audio data to generate first audio data by mapping signal values of the input audio data to discrete states corresponding to respective quantization intervals;
generate, based on the first audio data, two or more first audio data predictions corresponding to at least part of the first period of time, the two or more first audio data predictions determined using a first generative model that receives the first audio data as input features and predicts audio samples recursively in a first direction in time;
generate, based on the two or more first audio data predictions, second audio data corresponding to at least part of the first period of time;
generate, based on at least the first audio data and the second audio data, output audio data; and
do at least one of (a) cause audio corresponding to the output audio data to be output by at least one speaker, or (b) cause a function corresponding to a voice command represented by the output audio data to be executed.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23
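The recursive generation of two or more predictions can be sketched as an autoregressive loop in which each predicted state is fed back as input for the next step. The `toy_model` below is a hypothetical stand-in for a trained generative model, not the network described in the patent; it simply favors states near the most recent one. Drawing several independent samples from it yields the multiple candidate predictions.

```python
import numpy as np

def toy_model(context, n_states):
    """Hypothetical stand-in for a trained model: returns a probability
    distribution over the next state given the states seen so far."""
    probs = np.full(n_states, 1e-3)
    last = context[-1]
    for offset, weight in ((-1, 0.25), (0, 0.5), (1, 0.25)):
        probs[np.clip(last + offset, 0, n_states - 1)] += weight
    return probs / probs.sum()

def predict_recursively(context, n_steps, n_states, rng):
    """Predict n_steps states, feeding each prediction back as input."""
    history = list(context)
    for _ in range(n_steps):
        probs = toy_model(history, n_states)
        history.append(int(rng.choice(n_states, p=probs)))
    return history[len(context):]

def generate_candidates(context, n_steps, n_states=16, n_candidates=3, seed=0):
    """Draw several independent recursive predictions for the same gap."""
    rng = np.random.default_rng(seed)
    return [predict_recursively(context, n_steps, n_states, rng)
            for _ in range(n_candidates)]

candidates = generate_candidates([8, 8, 9], n_steps=5)
```

Running the same loop over a time-reversed context would give predictions in the opposite direction, in the spirit of the backward-looking network mentioned in the abstract.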
Specification