PHONETIC POSTERIORGRAMS FOR MANY-TO-ONE VOICE CONVERSION
First Claim
1. A computer-implemented method comprising:
obtaining a target speech;
obtaining a source speech;
generating a phonetic posteriorgram (PPG) based on acoustic features of the target speech, the PPG including a set of values corresponding to a range of times and a range of phonetic classes;
generating a mapping between the PPG and the acoustic features of the target speech; and
converting the source speech into a converted speech based on the PPG and the mapping.
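The claim describes the PPG as a set of values indexed by a range of times and a range of phonetic classes. One way to picture that is a matrix with one row per time frame and one column per class, where each row is a probability distribution. The sketch below is illustrative only: the per-frame scores, the three-class inventory, and the helper names `softmax` and `make_ppg` are assumptions, not taken from the patent.

```python
import math

def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def make_ppg(frame_scores):
    """Build a toy PPG: rows are time frames, columns are phonetic classes."""
    return [softmax(row) for row in frame_scores]

# Hypothetical recognizer scores for 4 frames over 3 phonetic classes.
frame_scores = [
    [2.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
]

ppg = make_ppg(frame_scores)
n_frames, n_classes = len(ppg), len(ppg[0])

# Most likely class per frame, read straight off the posteriorgram.
best_class = [max(range(n_classes), key=row.__getitem__) for row in ppg]
print(n_frames, n_classes, best_class)  # -> 4 3 [0, 1, 1, 2]
```

In a real system the scores would come from an acoustic model and the class inventory would be far larger; the shape of the data (time by class, each row summing to one) is the only point made here.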
Abstract
A method for converting speech using phonetic posteriorgrams (PPGs). A target speech is obtained and a PPG is generated based on acoustic features of the target speech. Generating the PPG may include using a speaker-independent automatic speech recognition (SI-ASR) system for equalizing different speakers. The PPG includes a set of values corresponding to a range of times and a range of phonetic classes, the phonetic classes corresponding to senones. A mapping between the PPG and one or more segments of the target speech is generated. A source speech is obtained, and the source speech is converted into a converted speech based on the PPG and the mapping.
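The flow in the abstract (target PPG, then a mapping, then conversion) can be sketched with toy data. Everything concrete here is an assumption made for illustration: frames are pairs `(content, speaker)`, `toy_ppg` is a hypothetical stand-in for an SI-ASR system that looks only at the content value, the mapping is a simple posterior-weighted mean per class rather than any model named in the patent, and the conversion step assumes a second PPG is computed from the source speech, which is one plausible reading rather than something the abstract states.

```python
import math

# Hypothetical phonetic-class centroids on a 1-D "content" axis.
CLASS_CENTROIDS = [0.0, 1.0, 2.0]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def toy_ppg(frames, sharpness=8.0):
    """Stand-in for SI-ASR: one posterior row per frame.
    Only frame[0] (content) is used, so frame[1] (speaker) is ignored."""
    return [
        softmax([-sharpness * (f[0] - c) ** 2 for c in CLASS_CENTROIDS])
        for f in frames
    ]

def fit_mapping(ppg, target_frames):
    """Assumed mapping: posterior-weighted mean target feature per class."""
    n_classes, dim = len(ppg[0]), len(target_frames[0])
    sums = [[0.0] * dim for _ in range(n_classes)]
    weights = [0.0] * n_classes
    for post, frame in zip(ppg, target_frames):
        for k, p in enumerate(post):
            weights[k] += p
            for d in range(dim):
                sums[k][d] += p * frame[d]
    return [[s / max(w, 1e-12) for s in row] for row, w in zip(sums, weights)]

def convert(source_frames, mapping):
    """Compute a source PPG, then blend the per-class target features."""
    dim = len(mapping[0])
    return [
        [sum(p * mapping[k][d] for k, p in enumerate(post)) for d in range(dim)]
        for post in toy_ppg(source_frames)
    ]

# Target speaker: speaker value 5.0. Source speaker: speaker value 9.0.
target = [(0.0, 5.0), (1.0, 5.0), (2.0, 5.0), (1.0, 5.0)]
source = [(2.0, 9.0), (0.0, 9.0)]

target_ppg = toy_ppg(target)
mapping = fit_mapping(target_ppg, target)
converted = convert(source, mapping)
print(converted)
```

In this toy setup the converted frames keep the source content (values near 2.0 and 0.0) while taking on the target speaker value of 5.0, which mirrors the many-to-one idea: any source speaker is routed through a speaker-independent representation into the one target voice.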
20 Claims
1. A computer-implemented method comprising:
obtaining a target speech;
obtaining a source speech;
generating a phonetic posteriorgram (PPG) based on acoustic features of the target speech, the PPG including a set of values corresponding to a range of times and a range of phonetic classes;
generating a mapping between the PPG and the acoustic features of the target speech; and
converting the source speech into a converted speech based on the PPG and the mapping.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A non-transitory computer-readable medium comprising instructions that, when executed by a processor, cause the processor to perform operations including:
obtaining a target speech;
obtaining a source speech;
generating a phonetic posteriorgram (PPG) based on acoustic features of the target speech, the PPG including a set of values corresponding to a range of times and a range of phonetic classes;
generating a mapping between the PPG and the acoustic features of the target speech; and
converting the source speech into a converted speech based on the PPG and the mapping.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A system comprising:
a processor;
a computer-readable medium in data communication with the processor, the computer-readable medium comprising instructions that, when executed by the processor, cause the processor to perform operations including:
obtaining a target speech;
obtaining a source speech;
generating a phonetic posteriorgram (PPG) based on acoustic features of the target speech, the PPG including a set of values corresponding to a range of times and a range of phonetic classes;
generating a mapping between the PPG and the acoustic features of the target speech; and
converting the source speech into a converted speech based on the PPG and the mapping.
Dependent claims: 17, 18, 19, 20
Specification