Speech signal separation and synthesis based on auditory scene analysis and speech modeling
First Claim
1. A method for generating clean speech from a mixture of noise and speech, the method comprising:
- deriving speech parameters, based on the mixture of noise and speech and a model of speech, the deriving using at least one hardware processor, wherein the deriving speech parameters comprises:
performing one or more spectral analyses on the mixture of noise and speech to generate one or more spectral representations;
deriving, based on the one or more spectral representations, feature data;
grouping target speech features in the feature data according to the model of speech;
separating the target speech features from the feature data; and
generating, based at least partially on the target speech features, the speech parameters; and
synthesizing, based at least partially on the speech parameters, clean speech.
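The steps recited in the claim (spectral analysis, feature derivation, grouping, separation, parameter generation, synthesis) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the patent: the "model of speech" is reduced to a single harmonic series whose fundamental is itself a spectral peak, the features are spectral peaks of one frame, the interfering noise is a pair of inharmonic tones, and all names and constants (`SAMPLE_RATE`, `FRAME_LEN`, `group_and_separate`, and so on) are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, illustrative value
FRAME_LEN = 256     # samples per analysis frame, illustrative value


def spectral_analysis(frame):
    """Naive DFT; returns amplitude-scaled magnitudes for bins 0..N/2."""
    n = len(frame)
    return [
        2.0 * abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                      for t, x in enumerate(frame))) / n
        for k in range(n // 2 + 1)
    ]


def derive_features(mags, floor=0.05):
    """Feature data: (bin, magnitude) pairs for local maxima above a floor."""
    return [
        (k, mags[k]) for k in range(1, len(mags) - 1)
        if mags[k] > floor and mags[k] > mags[k - 1] and mags[k] > mags[k + 1]
    ]


def group_and_separate(peaks):
    """Group peaks that fit a harmonic series and drop the rest.

    Assumption: the fundamental is one of the peaks. The candidate whose
    integer multiples collect the most energy is taken as the target pitch.
    """
    best_f0, best_group, best_score = None, {}, 0.0
    for f0_bin, _ in peaks:
        group = {k // f0_bin: m for k, m in peaks if k % f0_bin == 0}
        score = sum(group.values())
        if score > best_score:
            best_f0, best_group, best_score = f0_bin, group, score
    return best_f0, best_group


def derive_speech_parameters(mixture):
    """Mixture frame -> toy parameters (pitch in Hz, harmonic amplitudes)."""
    peaks = derive_features(spectral_analysis(mixture))
    f0_bin, harmonics = group_and_separate(peaks)
    return {"f0_hz": f0_bin * SAMPLE_RATE / len(mixture), "harmonics": harmonics}


def synthesize(params, n):
    """Rebuild a frame from the parameters alone (phase is discarded)."""
    return [
        sum(a * math.cos(2 * math.pi * h * params["f0_hz"] * t / SAMPLE_RATE)
            for h, a in params["harmonics"].items())
        for t in range(n)
    ]


def tone(bin_index, amp):
    """Zero-phase cosine centred exactly on a DFT bin."""
    return [amp * math.cos(2 * math.pi * bin_index * t / FRAME_LEN)
            for t in range(FRAME_LEN)]


# Toy "speech": three harmonics of 250 Hz. Toy "noise": two inharmonic tones.
clean = [sum(v) for v in zip(tone(8, 1.0), tone(16, 0.5), tone(24, 0.25))]
noise = [sum(v) for v in zip(tone(11, 0.4), tone(29, 0.3))]
mixture = [c + d for c, d in zip(clean, noise)]

params = derive_speech_parameters(mixture)
clean_est = synthesize(params, FRAME_LEN)
```

The output frame is built only from the derived parameters, never by filtering the mixture directly. The reconstruction matches the toy clean signal here only because the demo uses bin-centred, zero-phase cosines; real speech would need framing, windowing, and a far richer speech model.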
Abstract
Provided are systems and methods for generating clean speech from a speech signal representing a mixture of noise and speech. The clean speech may be generated from synthetic speech parameters. The synthetic speech parameters are derived based on the speech signal components and a model of speech using auditory and speech production principles. The modeling may utilize a source-filter structure of the speech signal. One or more spectral analyses are performed on the speech signal to generate spectral representations. Feature data is derived based on a spectral representation. The features corresponding to the target speech according to a model of speech are grouped and separated from the feature data. The synthetic speech parameters, including spectral envelope, pitch data, and voice classification data, are generated based on the features corresponding to the target speech.
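The abstract names three synthetic speech parameters (spectral envelope, pitch data, voice classification data) and a source-filter structure. The following is a minimal sketch of how such parameters could drive a source-filter synthesizer. It rests on assumptions not stated in the source: the envelope is represented as all-pole filter coefficients, the voiced source is an impulse train, the unvoiced source is uniform white noise, and the function names and numeric values are hypothetical.

```python
import math
import random


def excitation(n, pitch_period, voiced, seed=0):
    """Source signal: impulse train if voiced, white noise otherwise."""
    if voiced:
        return [1.0 if t % pitch_period == 0 else 0.0 for t in range(n)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def all_pole_filter(x, a):
    """Filter: y[t] = x[t] - sum_i a[i] * y[t - 1 - i]."""
    y = []
    for t, xt in enumerate(x):
        acc = xt
        for i, ai in enumerate(a):
            if t - 1 - i >= 0:
                acc -= ai * y[t - 1 - i]
        y.append(acc)
    return y


def synthesize_frame(n, envelope_coeffs, pitch_period, voiced):
    """Combine the three parameters into one synthetic frame."""
    return all_pole_filter(excitation(n, pitch_period, voiced), envelope_coeffs)


# Assumed envelope: one resonance near 500 Hz at an 8 kHz sample rate,
# realized as a stable conjugate pole pair of radius 0.9.
radius, theta = 0.9, 2 * math.pi * 500 / 8000
envelope = [-2 * radius * math.cos(theta), radius ** 2]

voiced_frame = synthesize_frame(160, envelope, pitch_period=40, voiced=True)
unvoiced_frame = synthesize_frame(160, envelope, pitch_period=40, voiced=False)
```

In this sketch the pitch parameter sets the spacing of the excitation pulses, the voicing decision selects the source type, and the envelope shapes the spectrum of whichever source is chosen.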
20 Claims
1. A method for generating clean speech from a mixture of noise and speech, the method comprising:
- deriving speech parameters, based on the mixture of noise and speech and a model of speech, the deriving using at least one hardware processor, wherein the deriving speech parameters comprises:
performing one or more spectral analyses on the mixture of noise and speech to generate one or more spectral representations;
deriving, based on the one or more spectral representations, feature data;
grouping target speech features in the feature data according to the model of speech;
separating the target speech features from the feature data; and
generating, based at least partially on the target speech features, the speech parameters; and
synthesizing, based at least partially on the speech parameters, clean speech.
11. A system for generating clean speech from a mixture of noise and speech, the system comprising:
- one or more processors; and
a memory communicatively coupled with the processor, the memory storing instructions which if executed by the one or more processors perform a method comprising:
deriving speech parameters, based on the mixture of noise and speech and a model of speech, wherein the deriving speech parameters comprises:
performing one or more spectral analyses on the mixture of noise and speech to generate one or more spectral representations;
deriving, based on the one or more spectral representations, feature data;
grouping target speech features in the feature data according to the model of speech;
separating the target speech features from the feature data; and
generating, based at least partially on the target speech features, the speech parameters; and
synthesizing, based at least partially on the speech parameters, clean speech.
20. A non-transitory computer-readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for generating clean speech from a mixture of noise and speech, the method comprising:
- deriving speech parameters, based on the mixture of noise and speech and a model of speech, via instructions stored in the memory and executed by the one or more processors, wherein the deriving speech parameters comprises:
performing one or more spectral analyses on the mixture of noise and speech to generate one or more spectral representations;
deriving, based on the one or more spectral representations, feature data;
grouping target speech features in the feature data according to the model of speech;
separating the target speech features from the feature data; and
generating, based at least partially on the target speech features, the speech parameters; and
synthesizing, based at least partially on the speech parameters, via instructions stored in the memory and executed by the one or more processors, clean speech.
Specification