Noise suppression for speech processing based on machine-learning mask estimation

US 9,640,194 B1
Filed: 10/04/2013
Issued: 05/02/2017
Est. Priority Date: 10/04/2012
Status: Active Grant

Abstract

Described are noise suppression techniques applicable to various systems including automatic speech processing systems in digital audio pre-processing. The noise suppression techniques utilize a machine-learning framework trained on cues pertaining to reference clean and noisy speech signals, and a corresponding synthetic noisy speech signal combining the clean and noisy speech signals. The machine-learning technique is further used to process audio signals in real time by extracting and analyzing cues pertaining to noisy speech to dynamically generate an appropriate gain mask, which may eliminate the noise components from the input audio signal. The audio signal pre-processed in such a manner may be applied to an automatic speech processing engine for corresponding interpretation or processing. The machine-learning technique may enable extraction of cues associated with clean automatic speech processing features, which may be used by the automatic speech processing engine for various automatic speech processing.
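
As a rough illustration of the gain-mask step described in the abstract, the sketch below scales each frequency bin of one frame of noisy speech by a per-bin gain. The function name `apply_gain_mask`, the clamping of gains to [0, 1], and the four-bin example frame are assumptions made for illustration only; the abstract does not specify how the mask is represented or applied.

```python
from typing import List


def apply_gain_mask(noisy_bins: List[complex], gain_mask: List[float]) -> List[complex]:
    """Scale each frequency bin of one frame by its gain, clamped to [0, 1]."""
    if len(noisy_bins) != len(gain_mask):
        raise ValueError("mask and spectrum must have the same number of bins")
    return [min(1.0, max(0.0, g)) * x for x, g in zip(noisy_bins, gain_mask)]


# Hypothetical one-frame example with four frequency bins.
noisy_frame = [1.0 + 1.0j, 0.5 + 0.0j, 0.0 + 2.0j, 0.2 - 0.2j]
mask = [1.0, 0.0, 0.5, 1.5]  # the out-of-range gain 1.5 is clamped to 1.0
cleaned = apply_gain_mask(noisy_frame, mask)
```

A gain near 1 passes a bin judged to be mostly speech, and a gain near 0 suppresses a bin judged to be mostly noise. In the described system the mask would come from the trained machine-learning mapping rather than being written by hand.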

18 Claims

1. A method for noise suppression, comprising:
- receiving, by a first processor communicatively coupled with a first memory, first noisy speech, the first noisy speech obtained using two or more microphones;
- extracting, by the first processor, one or more first cues from the first noisy speech, the one or more first cues including cues associated with noise suppression and automatic speech processing; and
- creating clean automatic speech processing features using a mapping and the extracted one or more first cues, the clean automatic speech processing features being for use in automatic speech processing and the mapping being provided by a process including:
  - receiving, by a second processor communicatively coupled with a second memory, clean speech and noise;
  - producing, by the second processor, second noisy speech using the clean speech and the noise;
  - extracting, by the second processor, one or more second cues from the second noisy speech, the one or more second cues including cues associated with noise suppression and noisy automatic speech processing;
  - extracting clean automatic speech processing cues from the clean speech; and
  - generating, by the second processor, the mapping from the one or more second cues to the clean automatic speech processing cues, the generating including at least one machine-learning technique.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 18):
  - 2. The method of claim 1, wherein the automatic speech processing comprises automatic speech recognition.
  - 3. The method of claim 1, wherein the automatic speech processing comprises one or more of automatic speech recognition, language recognition, keyword recognition, speech confirmation, emotion detection, voice sensing, and speaker recognition.
  - 4. The method of claim 1, wherein receiving, by the second processor, the clean speech and the noise comprises receiving predetermined reference clean speech and predetermined reference noise from a reference database.
  - 5. The method of claim 1, wherein the clean speech and noise are each obtained using at least two microphones, the one or more first and second cues each including at least one of inter-microphone level difference (ILD) cues and inter-microphone phase difference (IPD) cues.
  - 6. The method of claim 4, wherein the automatic speech processing comprises one or more of automatic speech recognition, language recognition, keyword recognition, speech confirmation, emotion detection, voice sensing, and speaker recognition.
  - 7. The method of claim 1, wherein the one or more first cues and the one or more second cues each further include at least one of energy at channel cues, voice activity detection (VAD) cues, spatial cues, frequency cues, Wiener gain mask estimates, pitch-based cues, periodicity-based cues, noise estimates, and context cues.
  - 8. The method of claim 1, wherein the at least one machine-learning technique includes one or more of a neural network, regression tree, a nonlinear transform, a linear transform, and a Gaussian Mixture Model (GMM).
  - 9. The method of claim 1, wherein the generating applies the at least one machine-learning technique to the clean speech and the second noisy speech.
  - 18. The method of claim 1, wherein the first processor communicatively coupled with the first memory are included in a cloud-based computing environment.
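
The training process recited in claim 1 (combine clean speech and noise, extract cues from both the mixture and the clean speech, then learn a mapping between them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the signals are synthetic, the only cue is a per-frame log energy, and the mapping is a one-dimensional least-squares linear transform (one of the technique families listed in claim 8). `FRAME_LEN`, `frame_log_energy`, and `fit_linear_mapping` are hypothetical names.

```python
import math
import random
from typing import List, Tuple

FRAME_LEN = 160  # hypothetical frame size in samples


def frame_log_energy(signal: List[float], frame_len: int = FRAME_LEN) -> List[float]:
    """One illustrative cue per frame: log of the mean squared amplitude."""
    cues = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        cues.append(math.log(energy + 1e-12))
    return cues


def fit_linear_mapping(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y ~ a * x + b (a simple linear transform)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def mse(u: List[float], v: List[float]) -> float:
    """Mean squared error between two equal-length cue sequences."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v)) / len(u)


rng = random.Random(0)
num_frames = 40
# Synthetic stand-ins for reference data: a tone whose level changes
# from frame to frame, and Gaussian noise.
clean = [(0.1 + 0.9 * ((n // FRAME_LEN) % 5) / 4) * math.sin(0.3 * n)
         for n in range(num_frames * FRAME_LEN)]
noise = [rng.gauss(0.0, 0.05) for _ in clean]

# "Second noisy speech": the clean speech combined with the noise.
noisy = [c + v for c, v in zip(clean, noise)]

# Cues from the noisy mixture and from the clean speech.
noisy_cues = frame_log_energy(noisy)
clean_cues = frame_log_energy(clean)

# Learn the mapping from noisy cues to clean cues.
a, b = fit_linear_mapping(noisy_cues, clean_cues)

# At run time the mapping would be applied to cues from new noisy input;
# here it is applied to the training cues to compare errors.
estimated_clean_cues = [a * c + b for c in noisy_cues]
mse_before = mse(noisy_cues, clean_cues)
mse_after = mse(estimated_clean_cues, clean_cues)
```

Because the identity mapping is itself a linear transform, the fitted mapping cannot do worse than the raw noisy cues on the training data. A practical system would use many cues per frame and a richer model, and would evaluate on held-out data.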

10. A system for noise suppression, comprising:
- a first frequency analysis module, executed by at least one processor, that is configured to receive first noisy speech, the first noisy speech being each obtained using at least two microphones;
- a second frequency analysis module, executed by the at least one processor, that is configured to receive clean speech and noise;
- a combination module, executed by the at least one processor, that is configured to produce second noisy speech using the clean speech and the noise;
- a first cue extraction module, executed by the at least one processor, that is configured to extract one or more first cues from the first noisy speech, the one or more first cues including cues associated with noise suppression and automatic speech processing;
- a second cue extraction module, executed by the at least one processor, that is configured to extract one or more second cues from the second noisy speech, the one or more second cues including cues associated with noise suppression and noisy automatic speech processing;
- a third cue extraction module, executed by the at least one processor, that is configured to extract clean automatic speech processing cues from the clean speech; and
- a learning module, executed by the at least one processor, that is configured to generate a mapping from the one or more second cues associated with the noise suppression cues and the noisy automatic speech processing cues to the clean automatic speech processing cues, the generating including at least one machine-learning technique; and
- a modification module, executed by the at least one processor, that is configured to create clean automatic speech processing features using the mapping and the extracted one or more first cues, the clean automatic speech processing features being for use in automatic speech processing.
- Dependent claims (11, 12, 13, 14, 15, 16, 17):
  - 11. The system of claim 10, wherein the automatic speech processing comprises automatic speech recognition.
  - 12. The system of claim 10, wherein the automatic speech processing comprises one or more of automatic speech recognition, language recognition, keyword recognition, speech confirmation, emotion detection, voice sensing, and speaker recognition.
  - 13. The system of claim 10, wherein the second frequency analysis module is configured to receive the clean speech and the noise from a reference database, the clean speech and noise being predetermined reference clean speech and predetermined reference noise.
  - 14. The system of claim 10, wherein the at least one machine-learning technique includes one or more of a neural network, regression tree, a non-linear transform, a linear transform, and a Gaussian Mixture Model (GMM).
  - 15. The system of claim 10, wherein the one or more first cues and the one or more second cues each include at least one of ILD cues and IPD cues.
  - 16. The system of claim 10, wherein the one or more first cues and the one or more second cues each include at least one of energy at channel cues, VAD cues, spatial cues, frequency cues, Wiener gain mask estimates, pitch-based cues, periodicity-based cues, noise estimates, and context cues.
  - 17. The system of claim 14, wherein the at least one machine-learning techniques each include one or more of a neural network, regression tree, a non-linear transform, a linear transform, and a GMM.
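
Claims 5 and 15 refer to inter-microphone level difference (ILD) and inter-microphone phase difference (IPD) cues. The sketch below shows one common way such cues could be computed for a single frequency bin from two microphones. The per-bin definitions, the decibel scale, the sign convention (first microphone relative to second), and the names `ild_db` and `ipd_rad` are assumptions for illustration; the claims do not define the cues at this level of detail.

```python
import cmath
import math


def ild_db(bin_mic1: complex, bin_mic2: complex, eps: float = 1e-12) -> float:
    """Level difference in dB between the same frequency bin at two microphones."""
    power1 = abs(bin_mic1) ** 2 + eps  # eps avoids division by zero and log(0)
    power2 = abs(bin_mic2) ** 2 + eps
    return 10.0 * math.log10(power1 / power2)


def ipd_rad(bin_mic1: complex, bin_mic2: complex) -> float:
    """Phase difference in radians, in (-pi, pi], of mic 1 relative to mic 2."""
    return cmath.phase(bin_mic1 * bin_mic2.conjugate())


# Hypothetical bin values: mic 1 is twice as strong and lags mic 2 by 90 degrees.
mic1_bin = 2.0 + 0.0j
mic2_bin = 0.0 + 1.0j
ild = ild_db(mic1_bin, mic2_bin)   # about 6.02 dB
ipd = ipd_rad(mic1_bin, mic2_bin)  # about -pi/2
```

A source close to one microphone tends to produce a large ILD, while the IPD varies with the direction of arrival, which is why such cues are useful for separating near speech from ambient noise.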

Current Assignee: Samsung Electronics Co. Ltd.
Original Assignee: Knowles Electronics LLC (Knowles Corporation)
Inventors: Nemala, Sridhar Krishna; Laroche, Jean
Primary Examiner(s): Pham, Thierry L
Application Number: US14/046,551
Time in Patent Office: 1,306 Days
Field of Search: 704226, 704231, 704233, 704246
CPC Class Codes:
- G10L 2021/02165 Two microphones, one receiv...
- G10L 21/0232 Processing in the frequency...
