Systems and methods for restoration of speech components

US 9,978,388 B2
Filed: 09/11/2015
Issued: 05/22/2018
Est. Priority Date: 09/12/2014
Status: Active Grant

Abstract

A method for restoring distorted speech components of an audio signal distorted by a noise reduction or a noise cancellation includes determining distorted frequency regions and undistorted frequency regions in the audio signal. The distorted frequency regions include regions of the audio signal in which a speech distortion is present. Iterations are performed using a model to refine predictions of the audio signal at the distorted frequency regions. The model is configured to modify the audio signal and may include a deep neural network trained using spectral envelopes of clean or undamaged audio signals. Before each iteration, the audio signal at the undistorted frequency regions is restored to the values of the audio signal prior to the first iteration, while the audio signal at the distorted frequency regions is refined starting from zero at the first iteration. Iterations are ended when discrepancies of the audio signal at the undistorted frequency regions meet pre-defined criteria.
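
The iteration described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the signal is represented as a list of per-band spectral-envelope values, `smooth` is a toy three-point average standing in for the trained deep neural network, and the names `restore_speech`, `tolerance`, and `max_iterations`, together with the mean-squared discrepancy measure, are hypothetical choices made for the example.

```python
from typing import Callable, List, Sequence


def restore_speech(
    envelope: Sequence[float],
    distorted: Sequence[bool],
    model: Callable[[List[float]], List[float]],
    tolerance: float = 1e-3,   # assumed stopping threshold, not from the patent
    max_iterations: int = 50,  # assumed safety cap, not from the patent
) -> List[float]:
    """Iteratively refine the distorted bands of a spectral envelope."""
    original = list(envelope)
    # Distorted bands start from zero; undistorted bands keep their values.
    estimate = [0.0 if d else x for x, d in zip(original, distorted)]
    for _ in range(max_iterations):
        prediction = model(estimate)
        # Discrepancy: how far the model moved the bands known to be good.
        errors = [(p - x) ** 2
                  for p, x, d in zip(prediction, original, distorted) if not d]
        discrepancy = sum(errors) / len(errors) if errors else 0.0
        # Keep the prediction only in distorted bands; reset the undistorted
        # bands to their original values before the next iteration.
        estimate = [p if d else x
                    for p, x, d in zip(prediction, original, distorted)]
        if discrepancy <= tolerance:
            break
    return estimate


def smooth(values: List[float]) -> List[float]:
    """Toy stand-in for the model: three-point average with clamped edges."""
    last = len(values) - 1
    return [
        (values[max(i - 1, 0)] + values[i] + values[min(i + 1, last)]) / 3.0
        for i in range(len(values))
    ]


envelope = [1.0, 1.0, 0.2, 1.0, 1.0]  # band 2 damaged by noise suppression
distorted = [False, False, True, False, False]
restored = restore_speech(envelope, distorted, smooth)
```

In this toy run the damaged band climbs from zero toward the level of its neighbours over a few iterations, while the undistorted bands are returned unchanged.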

20 Claims

1. A method for restoring speech components of an audio signal, the method comprising:
- receiving an audio signal after it has been processed for noise suppression;
  
  determining distorted frequency regions and undistorted frequency regions in the received audio signal that has been processed for noise suppression, the distorted frequency regions including regions of the audio signal in which speech distortion is present due to the noise suppression processing; and
  
  performing one or more iterations using a model to generate predictions of a restored version of the audio signal, the model being configured to modify the audio signal so as to restore the speech components in the distorted frequency regions.
  - 2. The method of claim 1, wherein the audio signal is obtained by at least one of a noise reduction or a noise cancellation of an acoustic signal including speech.
  - 3. The method of claim 2, wherein the speech components are attenuated or eliminated at the distorted frequency regions by the at least one of the noise reduction or the noise cancellation.
  - 4. The method of claim 1, wherein the model includes a deep neural network trained using spectral envelopes of clean audio signals or undamaged audio signals.
  - 5. The method of claim 1, wherein the iterations are performed so as to further refine the predictions used for restoring speech components in the distorted frequency regions.
  - 6. The method of claim 1, wherein the audio signal at the distorted frequency regions is set to zero before a first of the one or more iterations.
  - 7. The method of claim 1, wherein prior to performing each of the one or more iterations, the restored version of the audio signal at the undistorted frequency regions is reset to values of the audio signal before the first of the one or more iterations.
  - 8. The method of claim 1, further comprising after performing each of the one or more iterations comparing the restored version of the audio signal with the audio signal at the undistorted frequency regions before and after the one or more iterations to determine discrepancies.
  - 9. The method of claim 8, further comprising ending the one or more iterations if the discrepancies meet pre-determined criteria.
  - 10. The method of claim 9, wherein the pre-determined criteria are defined by low and upper bounds of energies of the audio signal.
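
One possible reading of the stopping test in claims 8 to 10 is sketched below. The claims say only that the criteria are defined by lower and upper bounds of energies; comparing the ratio of predicted to original energy in the undistorted bands against those bounds is an assumption made for this example, as are the names `band_energy` and `discrepancies_within_bounds` and the default bounds of 0.9 and 1.1.

```python
from typing import Sequence


def band_energy(values: Sequence[float], mask: Sequence[bool]) -> float:
    """Sum of squared values over the bands selected by mask."""
    return sum(v * v for v, keep in zip(values, mask) if keep)


def discrepancies_within_bounds(
    original: Sequence[float],
    predicted: Sequence[float],
    undistorted: Sequence[bool],
    lower: float = 0.9,  # hypothetical lower energy bound
    upper: float = 1.1,  # hypothetical upper energy bound
) -> bool:
    """Return True when the predicted energy in the undistorted bands stays
    within [lower, upper] times the original energy in the same bands."""
    reference = band_energy(original, undistorted)
    if reference == 0.0:
        # No energy to compare against: accept only a silent prediction.
        return band_energy(predicted, undistorted) == 0.0
    ratio = band_energy(predicted, undistorted) / reference
    return lower <= ratio <= upper


original = [1.0, 1.0, 0.0, 1.0]
undistorted = [True, True, False, True]
close_fit = discrepancies_within_bounds(
    original, [1.0, 0.98, 0.7, 1.01], undistorted)
poor_fit = discrepancies_within_bounds(
    original, [0.5, 0.5, 0.7, 0.5], undistorted)
```

Here `close_fit` is true because the prediction barely changes the energy of the undistorted bands, so the iterations would end; `poor_fit` is false, so they would continue.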

11. A system for restoring speech components of an audio signal, the system comprising:
- at least one processor; and
  
  a memory communicatively coupled with the at least one processor, the memory storing instructions, which when executed by the at least one processor performs a method comprising:
  
  receiving an audio signal after it has been processed for noise suppression;
  
  determining distorted frequency regions and undistorted frequency regions in the received audio signal that has been processed for noise suppression, the distorted frequency regions including regions of the audio signal in which speech distortion is present due to the noise suppression processing; and
  
  performing one or more iterations using a model to generate predictions of a restored version of the audio signal, the model being configured to modify the audio signal so as to restore the speech components in the distorted frequency regions.
  - 12. The system of claim 11, wherein the audio signal is obtained by at least one of a noise reduction or a noise cancellation of an acoustic signal including speech.
  - 13. The system of claim 12, wherein the speech components are attenuated or eliminated at the distorted frequency regions by the at least one of the noise reduction or the noise cancellation.
  - 14. The system of claim 11, wherein the model includes a deep neural network.
  - 15. The system of claim 14, wherein the deep neural network is trained using spectral envelopes of clean audio signals or undamaged audio signals.
  - 16. The system of claim 15, wherein the audio signal at the distorted frequency regions is set to zero before a first of the one or more iterations.
  - 17. The system of claim 11, wherein before performing each of the one or more iterations, the restored version of the audio signal at the undistorted frequency regions is reset to values before the first of the one or more iterations.
  - 18. The system of claim 11, further comprising, after performing each of the one or more iterations, comparing the restored version of the audio signal with the audio signal at the undistorted frequency regions before and after the one or more iterations to determine discrepancies.
  - 19. The system of claim 18, further comprising ending the one or more iterations if the discrepancies meet pre-determined criteria, the pre-determined criteria being defined by low and upper bounds of energies of the audio signal.

20. A non-transitory computer-readable storage medium having embodied thereon instructions, which when executed by at least one processor, perform steps of a method, the method comprising:
- receiving an audio signal after it has been processed for noise suppression;
  
  determining distorted frequency regions and undistorted frequency regions in the received audio signal that has been processed for noise suppression, the distorted frequency regions including regions of the audio signal in which speech distortion is present due to the noise suppression processing; and
  
  performing one or more iterations using a model to refine predictions of the audio signal at the distorted frequency regions, the model being configured to modify the audio signal so as to restore speech components in the distorted frequency regions.
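
The independent claims recite determining distorted and undistorted frequency regions without stating how that determination is made. The sketch below shows one hypothetical rule, offered purely as an assumption: a band is flagged as distorted when the gain applied by the noise suppressor fell below a floor. The per-band `suppression_gains` input, the `gain_floor` value, and the function name `find_distorted_bands` are not taken from the source.

```python
from typing import List, Sequence


def find_distorted_bands(
    suppression_gains: Sequence[float],
    gain_floor: float = 0.25,  # hypothetical threshold, not from the patent
) -> List[bool]:
    """Flag bands that the noise suppressor attenuated below gain_floor.

    Assumes gains lie in [0, 1], where 1 means the band passed unchanged.
    """
    return [gain < gain_floor for gain in suppression_gains]


gains = [0.95, 0.80, 0.10, 0.05, 0.70]
distorted = find_distorted_bands(gains)
undistorted = [not flag for flag in distorted]
```

The resulting boolean masks are the kind of input an iterative restoration step would need to decide which bands to refine and which to hold fixed.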

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Knowles Electronics LLC (Knowles Corporation)
Inventors: Avendano, Carlos; Woodruff, John
Primary Examiner(s): Riley, Marcus T
Application Number: US14/852,446
Publication Number: US 20160078880A1
Time in Patent Office: 984 Days
Field of Search: None

CPC Class Codes

G10L 21/02   Speech enhancement, e.g. no...
G10L 21/0208   Noise filtering
G10L 21/038   using band spreading techni...
G10L 25/30   using neural networks
