Noise suppression for speech processing based on machine-learning mask estimation
Abstract
Described are noise suppression techniques applicable to various systems including automatic speech processing systems in digital audio pre-processing. The noise suppression techniques utilize a machine-learning framework trained on cues pertaining to reference clean and noisy speech signals, and a corresponding synthetic noisy speech signal combining the clean and noisy speech signals. The machine-learning technique is further used to process audio signals in real time by extracting and analyzing cues pertaining to noisy speech to dynamically generate an appropriate gain mask, which may eliminate the noise components from the input audio signal. The audio signal pre-processed in such a manner may be applied to an automatic speech processing engine for corresponding interpretation or processing. The machine-learning technique may enable extraction of cues associated with clean automatic speech processing features, which may be used by the automatic speech processing engine for various automatic speech processing.
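The gain-mask step mentioned in the abstract can be pictured with a small sketch. It assumes the audio has already been split into sub-bands and that per-band speech and noise power estimates are available. In the described approach the mask would come from the trained machine-learning model, so the Wiener-style rule below is only a stand-in. The function names, the gain floor, and the sample values are all hypothetical.

```python
def wiener_like_gain(speech_power, noise_power, floor=0.1):
    """Per-band gain in [floor, 1]: speech / (speech + noise).

    Stand-in for the learned mask estimator; the floor is an assumed
    safeguard against zeroing a band completely.
    """
    total = speech_power + noise_power
    if total <= 0.0:
        return floor
    return max(floor, speech_power / total)


def apply_gain_mask(noisy_bands, mask):
    """Scale each noisy sub-band magnitude by its gain."""
    if len(noisy_bands) != len(mask):
        raise ValueError("mask and band count must match")
    return [g * x for g, x in zip(mask, noisy_bands)]


# Hypothetical power estimates for one frame with three sub-bands.
speech_power = [4.0, 1.0, 0.0]
noise_power = [1.0, 1.0, 2.0]
mask = [wiener_like_gain(s, n) for s, n in zip(speech_power, noise_power)]

# Hypothetical noisy sub-band magnitudes for the same frame.
noisy_bands = [2.0, 1.5, 1.0]
enhanced = apply_gain_mask(noisy_bands, mask)
```

Bands dominated by speech pass nearly unchanged, while a band containing only noise is attenuated down to the floor rather than removed outright.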
18 Claims
1. A method for noise suppression, comprising:
receiving, by a first processor communicatively coupled with a first memory, first noisy speech, the first noisy speech obtained using two or more microphones;
extracting, by the first processor, one or more first cues from the first noisy speech, the one or more first cues including cues associated with noise suppression and automatic speech processing; and
creating clean automatic speech processing features using a mapping and the extracted one or more first cues, the clean automatic speech processing features being for use in automatic speech processing and the mapping being provided by a process including:
receiving, by a second processor communicatively coupled with a second memory, clean speech and noise;
producing, by the second processor, second noisy speech using the clean speech and the noise;
extracting, by the second processor, one or more second cues from the second noisy speech, the one or more second cues including cues associated with noise suppression and noisy automatic speech processing;
extracting clean automatic speech processing cues from the clean speech; and
generating, by the second processor, the mapping from the one or more second cues to the clean automatic speech processing cues, the generating including at least one machine-learning technique.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 18
10. A system for noise suppression, comprising:
a first frequency analysis module, executed by at least one processor, that is configured to receive first noisy speech, the first noisy speech being each obtained using at least two microphones;
a second frequency analysis module, executed by the at least one processor, that is configured to receive clean speech and noise;
a combination module, executed by the at least one processor, that is configured to produce second noisy speech using the clean speech and the noise;
a first cue extraction module, executed by the at least one processor, that is configured to extract one or more first cues from the first noisy speech, the one or more first cues including cues associated with noise suppression and automatic speech processing;
a second cue extraction module, executed by the at least one processor, that is configured to extract one or more second cues from the second noisy speech, the one or more second cues including cues associated with noise suppression and noisy automatic speech processing;
a third cue extraction module, executed by the at least one processor, that is configured to extract clean automatic speech processing cues from the clean speech; and
a learning module, executed by the at least one processor, that is configured to generate a mapping from the one or more second cues associated with the noise suppression cues and the noisy automatic speech processing cues to the clean automatic speech processing cues, the generating including at least one machine-learning technique; and
a modification module, executed by the at least one processor, that is configured to create clean automatic speech processing features using the mapping and the extracted one or more first cues, the clean automatic speech processing features being for use in automatic speech processing.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
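The training and runtime flow recited in the independent claims can be illustrated with a toy sketch. It mixes clean speech with noise, extracts a cue from both the mixture and the clean signal, fits a mapping between them, and then applies that mapping to a new noisy cue. Everything specific here is an assumption made for illustration: the single log-energy cue, the linear least-squares fit standing in for "at least one machine-learning technique", the synthetic sine-wave "speech", and all function names. A single channel is used for brevity, whereas the claims recite two or more microphones.

```python
import math
import random


def mix(clean, noise):
    """Produce synthetic noisy speech by adding noise to clean speech."""
    return [c + n for c, n in zip(clean, noise)]


def frame_log_energy(signal, frame_len):
    """One cue per frame: log of the mean squared amplitude (assumed cue)."""
    cues = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        cues.append(math.log(energy + 1e-12))
    return cues


def fit_linear_mapping(inputs, targets):
    """Least-squares fit of targets ~ slope * inputs + intercept."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(targets) / n
    var_x = sum((x - mean_x) ** 2 for x in inputs)
    if var_x == 0.0:
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, targets))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


# Training side: synthetic "clean speech" whose level changes per frame,
# plus Gaussian noise from a seeded generator for repeatability.
rng = random.Random(0)
frame_len = 64
clean = [(1.0 + (i // frame_len) % 4) * math.sin(0.3 * i)
         for i in range(frame_len * 40)]
noise = [rng.gauss(0.0, 0.3) for _ in clean]
noisy = mix(clean, noise)

noisy_cues = frame_log_energy(noisy, frame_len)   # "second cues"
clean_cues = frame_log_energy(clean, frame_len)   # clean target cues
slope, intercept = fit_linear_mapping(noisy_cues, clean_cues)


# Runtime side: map a cue extracted from new noisy speech to a clean estimate.
def estimate_clean_cue(noisy_cue):
    return slope * noisy_cue + intercept
```

The split mirrors the two-processor arrangement in claim 1: the fitting code plays the role of the second processor that generates the mapping, and `estimate_clean_cue` plays the role of the first processor that applies it.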
Specification