Speech enhancement with low-order non-negative matrix factorization
First Claim
1. A method performed by a computing device for enhancing speech, the method comprising:
accessing multiple dictionaries of dictionary atoms, the dictionaries being generated from clean speech samples by performing a non-negative matrix factorization (“NMF”) of frequency-domain (“FD”) clean speech sample representations of the clean speech samples, each NMF having a unique initialization, wherein each of the multiple dictionaries comprises a reduced number of dictionary atoms to conserve processing power;
receiving noisy speech;
generating a FD noisy speech representation of the noisy speech;
for each of the multiple dictionaries, generating a FD clean speech representation corresponding to the FD noisy speech representation by performing a NMF of the FD noisy speech representation based on the dictionary atoms of the dictionaries;
generating an enhanced FD clean speech representation of the noisy speech by combining the FD clean speech representations generated using each dictionary with the reduced number of dictionary atoms, the combining includes averaging the FD clean speech representations; and
converting the enhanced FD clean speech representation into clean speech that represents an enhancement of the noisy speech.
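The steps recited in claim 1 can be pictured with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the names `dft`, `idft`, `activations`, and `enhance` are hypothetical, frames are non-overlapping and unwindowed, the multiplicative update is one common NMF rule that the claim does not name, noise atoms are omitted for brevity, and the noisy phase is reused when converting back to the time domain.

```python
import cmath

def dft(frame):
    """Naive discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n)]

def idft(spec):
    """Inverse DFT; keeps only the real part of each output sample."""
    n = len(spec)
    return [(sum(c * cmath.exp(2j * cmath.pi * k * i / n) for k, c in enumerate(spec)) / n).real
            for i in range(n)]

def activations(mag, w, n_iter=100, eps=1e-9):
    """Non-negative activations h with w @ h close to mag; the atoms w stay fixed.

    Uses the multiplicative update h <- h * (w^T mag) / (w^T w h), an assumed rule.
    """
    n_freq, n_atoms = len(mag), len(w[0])
    h = [1.0] * n_atoms
    for _ in range(n_iter):
        approx = [sum(w[f][k] * h[k] for k in range(n_atoms)) for f in range(n_freq)]
        h = [h[k] * sum(w[f][k] * mag[f] for f in range(n_freq))
             / (sum(w[f][k] * approx[f] for f in range(n_freq)) + eps)
             for k in range(n_atoms)]
    return h

def enhance(noisy, dictionaries, frame_len):
    """Average the per-dictionary magnitude estimates, frame by frame.

    Each dictionary is a frame_len x n_atoms list of non-negative rows.
    Trailing samples that do not fill a whole frame are dropped.
    """
    out = []
    for start in range(0, len(noisy) - frame_len + 1, frame_len):
        spec = dft(noisy[start:start + frame_len])       # FD noisy representation
        mag = [abs(c) for c in spec]
        total = [0.0] * frame_len
        for w in dictionaries:                           # one estimate per dictionary
            h = activations(mag, w)
            for f in range(frame_len):
                total[f] += sum(w[f][k] * h[k] for k in range(len(h)))
        avg = [t / len(dictionaries) for t in total]     # combine by averaging
        # Assumption: pair the averaged magnitude with the noisy phase.
        clean_spec = [m * cmath.exp(1j * cmath.phase(c)) for m, c in zip(avg, spec)]
        out.extend(idft(clean_spec))                     # back to the time domain
    return out
```

Because every dictionary holds only a few atoms, each `activations` call is cheap; the averaging step is what the claim relies on to combine the several low-order estimates.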
Abstract
A system is provided that employs a statistical approach to semi-supervised speech enhancement with a low-order non-negative matrix factorization (“NMF”). The system enhances noisy speech based on multiple dictionaries with dictionary atoms derived from the same clean speech samples and generates an enhanced speech representation of the noisy speech by combining, for each dictionary, a clean speech representation of the noisy speech generated based on a NMF using the dictionary atoms of the dictionary. The system generates frequency-domain (“FD”) clean speech sample representations of the clean speech samples, for example, using a Fourier transform. To generate each dictionary, the system generates a dictionary-unique initialization of the dictionary atoms and the activations and performs a NMF of the FD clean speech samples.
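The dictionary-generation step described in the abstract can be sketched as follows. This is a minimal illustration, not the system's actual training code: the helper names (`matmul`, `transpose`, `nmf`, `train_dictionaries`) are hypothetical, the Euclidean multiplicative-update rule is an assumed choice the abstract does not specify, and a seeded random start stands in for the "dictionary-unique initialization".

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def nmf(v, n_atoms, seed, n_iter=200, eps=1e-9):
    """Factor non-negative v (freq x time) into atoms w and activations h.

    The seed fixes the random start, so each seed gives its own factorization.
    """
    rng = random.Random(seed)
    n_freq, n_time = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_atoms)] for _ in range(n_freq)]
    h = [[rng.random() + eps for _ in range(n_time)] for _ in range(n_atoms)]
    for _ in range(n_iter):
        # Update activations: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][t] * num[k][t] / (den[k][t] + eps) for t in range(n_time)]
             for k in range(n_atoms)]
        # Update atoms: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[f][k] * num[f][k] / (den[f][k] + eps) for k in range(n_atoms)]
             for f in range(n_freq)]
    return w, h

def train_dictionaries(v_clean, n_atoms, seeds):
    """Build one low-order dictionary per seed from the same clean spectrogram."""
    return [nmf(v_clean, n_atoms, seed)[0] for seed in seeds]
```

Calling `train_dictionaries(v_clean, n_atoms=2, seeds=[0, 1, 2])` on a small non-negative spectrogram would yield three dictionaries that differ only because of their starting points, which mirrors the abstract's point that all dictionaries are derived from the same clean speech samples.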
22 Claims
1. A method performed by a computing device for enhancing speech, recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computing system for enhancing speech, the computing system comprising:
one or more computer-readable storage media storing computer-executable instructions that, when executed, cause the computing system to:
access multiple dictionaries of dictionary atoms;
receive a frequency-domain (“FD”) noisy speech representation of noisy speech;
for each of the multiple dictionaries, generate a FD clean speech representation corresponding to the FD noisy speech representation by performing a non-negative matrix factorization (“NMF”) of the FD noisy speech representation based on the dictionary atoms of the dictionary, wherein each of the multiple dictionaries comprises a reduced number of dictionary atoms to conserve processing power; and
generate an enhanced FD clean speech representation by combining the FD clean speech representations generated using each dictionary with the reduced number of dictionary atoms, the combining includes averaging the FD clean speech representations; and
one or more processors for executing the computer-executable instructions stored in the one or more computer-readable storage media.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17
18. A method performed by a computing device for enhancing speech, the method comprising:
receiving noisy speech;
generating a FD noisy speech representation of the noisy speech;
for each of multiple dictionaries, generating a FD clean speech representation corresponding to the FD noisy speech representation by performing a NMF of the FD noisy speech representation based on dictionary atoms of the dictionary, wherein each dictionary represents a different NMF based on the same clean speech samples, and wherein each of the multiple dictionaries comprises a reduced number of dictionary atoms to conserve processing power;
generating an enhanced FD clean speech representation of the noisy speech by combining the generated FD clean speech representations generated using each dictionary with the reduced number of dictionary atoms, the combining includes averaging the FD clean speech representations by iteratively performing the following steps until each dictionary has been selected:
selecting a dictionary of the multiple dictionaries of dictionary atoms;
obtaining a non-negative maximum a posteriori probability estimate of a time-frequency component; and
generating a running total of the FD clean speech representations; and
dividing the running total by a number of the multiple dictionaries to generate the enhanced FD clean speech representation; and
converting the enhanced FD clean speech representation into clean speech that represents an enhancement of the noisy speech.
Dependent claims: 19, 20, 21, 22
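The averaging loop recited in claim 18 (select a dictionary, obtain a non-negative estimate, add it to a running total, then divide by the number of dictionaries) can be sketched as below. The claim does not give the form of the maximum a posteriori estimate, so the sketch takes it as a caller-supplied function; `average_estimates` and `estimate_fn` are hypothetical names, and clipping at zero is an assumed way to keep each time-frequency component non-negative.

```python
def average_estimates(noisy_spec, dictionaries, estimate_fn):
    """Average per-dictionary clean-speech estimates with a running total.

    noisy_spec   -- FD noisy representation as a freq x time list of rows
    dictionaries -- sequence of dictionaries; each is passed to estimate_fn as-is
    estimate_fn  -- hypothetical callable (noisy_spec, dictionary) -> freq x time
                    estimate; stands in for the per-dictionary NMF-based estimate
    """
    n_freq, n_time = len(noisy_spec), len(noisy_spec[0])
    running_total = [[0.0] * n_time for _ in range(n_freq)]
    for dictionary in dictionaries:                  # each dictionary selected once
        estimate = estimate_fn(noisy_spec, dictionary)
        for f in range(n_freq):
            for t in range(n_time):
                # Assumption: clip so every time-frequency component is >= 0.
                running_total[f][t] += max(0.0, estimate[f][t])
    count = len(dictionaries)
    return [[value / count for value in row] for row in running_total]
```

Keeping a single running total means only one per-dictionary estimate has to be held in memory at a time, regardless of how many dictionaries are used.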
Specification