Method and apparatus for exemplary morphing computer system background
Abstract
Method and apparatus for reducing a size of databases required for recorded speech data.
34 Claims
1. A system for morphing diphones of a source voice of a source speaker into a target voice of a target speaker, the system comprising:
a database storing a plurality of diphones;
an automated speech recognizer (ASR) configured to create a list of phonemes from the source voice of the source speaker;
a pitch extractor configured to extract the pitch from the source speech of the source speaker, wherein the ASR and the pitch extractor are configured to convert the source voice of the source speaker into a sequence of diphones based on the list of phonemes and the pitch; and
a unit selector configured to select, for each of diphones in the sequence of diphones, a best matching diphone from among candidate diphones in the database based on:
a quality of a label match between a phonetic transcription of the diphone to phonetic transcriptions of the candidate diphones determined based on a summation of consonant distances between the diphone and the candidate diphones and vowel distances between the diphone and the candidate diphones,
differences between a pitch contour of the diphone to pitch contours of the candidate diphones,
differences between a duration of the diphone and durations of the candidate diphones,
differences between a plurality of formants of a preceding diphone that precedes the diphone and corresponding pluralities of formants of the candidate diphones, and
differences between a pitch of the diphone and pitches of the candidate diphones.
Dependent claims: 2-17.
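The five selection criteria recited in claim 1 can be read as terms of a single cost that the unit selector minimizes over the candidate diphones. The following is a minimal sketch of that reading, not the patented implementation. The `Diphone` record, the `VOWELS` set, the 0.5/1.0 phone distances, the scale factors (100 Hz, 1000 Hz), and the equal weighting of the terms are all assumptions made for illustration; the claim does not specify any of them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

# Hypothetical vowel inventory; every other label is treated as a consonant.
VOWELS = frozenset({"aa", "ae", "eh", "ih", "iy", "uw"})


@dataclass(frozen=True)
class Diphone:
    phones: Tuple[str, str]           # phonetic transcription, e.g. ("k", "ae")
    pitch_contour: Tuple[float, ...]  # pitch samples in Hz
    duration: float                   # seconds
    formants: Tuple[float, ...]       # first few formant frequencies in Hz
    pitch: float                      # mean pitch in Hz


def phone_distance(a: str, b: str) -> float:
    """Toy distance: 0 if identical, 0.5 within the same class, 1 across classes."""
    if a == b:
        return 0.0
    same_class = (a in VOWELS) == (b in VOWELS)
    return 0.5 if same_class else 1.0


def label_cost(target: Diphone, cand: Diphone) -> float:
    """Sum of the consonant and vowel distances over both halves of the diphone."""
    return sum(phone_distance(a, b) for a, b in zip(target.phones, cand.phones))


def mean_abs_diff(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean absolute difference over the samples the two sequences share."""
    pairs = list(zip(xs, ys))
    return sum(abs(x - y) for x, y in pairs) / len(pairs) if pairs else 0.0


def total_cost(target: Diphone, previous: Optional[Diphone], cand: Diphone) -> float:
    """Unweighted sum of the five terms; the scale factors are assumptions."""
    cost = label_cost(target, cand)
    cost += mean_abs_diff(target.pitch_contour, cand.pitch_contour) / 100.0
    cost += abs(target.duration - cand.duration)
    if previous is not None:
        # Formants of the preceding diphone versus formants of the candidate.
        cost += mean_abs_diff(previous.formants, cand.formants) / 1000.0
    cost += abs(target.pitch - cand.pitch) / 100.0
    return cost


def select_best(target: Diphone, previous: Optional[Diphone],
                candidates: Sequence[Diphone]) -> Diphone:
    """Return the candidate with the lowest total cost."""
    return min(candidates, key=lambda c: total_cost(target, previous, c))
```

In this sketch, `previous` would be the diphone selected for the preceding position, so the formant term acts as a continuity cost across the join; for the first diphone in a sequence it is simply skipped.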
18. A method of morphing diphones of a source voice of a source speaker into a target voice of a target speaker, the method comprising:
storing a plurality of diphones in a database;
creating, by an automated speech recognizer (ASR), a list of phonemes from the source voice of the source speaker;
extracting, by a pitch extractor, the pitch from the source speech of the source speaker;
converting the source voice of the source speaker into a sequence of diphones based on the list of phonemes and the pitch; and
selecting, for each of diphones in the sequence of diphones, a best matching diphone from among candidate diphones in the database based on:
a quality of a label match between a phonetic transcription of the diphone to phonetic transcriptions of the candidate diphones determined based on a summation of consonant distances between the diphone and the candidate diphones and vowel distances between the diphone and the candidate diphones,
differences between a pitch contour of the diphone to pitch contours of the candidate diphones,
differences between a duration of the diphone and durations of the candidate diphones,
differences between a plurality of formants of a preceding diphone that precedes the diphone and corresponding pluralities of formants of the candidate diphones, and
differences between a pitch of the diphone and pitches of the candidate diphones.
Dependent claims: 19-34.
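The converting step of claim 18 takes the phoneme list from the ASR and the extracted pitch and produces a sequence of diphones. One possible reading is sketched below, assuming the ASR supplies a start and end time for each phoneme and the pitch extractor supplies timestamped pitch samples, with 0 Hz marking unvoiced frames. The `Phoneme` and `TargetDiphone` records and the helper names are hypothetical; the claim does not describe the data layout. Each diphone is taken to span from the midpoint of one phoneme to the midpoint of the next, which is the usual definition of a diphone.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Phoneme:
    label: str
    start: float  # seconds
    end: float    # seconds


@dataclass(frozen=True)
class TargetDiphone:
    phones: Tuple[str, str]
    duration: float  # seconds
    pitch: float     # mean voiced pitch in Hz, 0.0 if no voiced samples


def mean_pitch(pitch_track: Sequence[Tuple[float, float]],
               start: float, end: float) -> float:
    """Average the voiced (hz > 0) pitch samples whose time lies in [start, end)."""
    voiced = [hz for t, hz in pitch_track if start <= t < end and hz > 0]
    return sum(voiced) / len(voiced) if voiced else 0.0


def to_diphones(phonemes: Sequence[Phoneme],
                pitch_track: Sequence[Tuple[float, float]]) -> List[TargetDiphone]:
    """Pair adjacent phonemes, cutting each diphone at the phoneme midpoints."""
    result = []
    for left, right in zip(phonemes, phonemes[1:]):
        mid_left = (left.start + left.end) / 2
        mid_right = (right.start + right.end) / 2
        result.append(TargetDiphone(
            phones=(left.label, right.label),
            duration=mid_right - mid_left,
            pitch=mean_pitch(pitch_track, mid_left, mid_right),
        ))
    return result
```

A list of N phonemes yields N - 1 diphones under this scheme; each resulting diphone would then be passed to the selecting step as the target to be matched against the database.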
Specification