Prosody Conversion
First Claim
1. A method comprising:
- (a) receiving data for a plurality of segments of a passage in a source voice, wherein the data for each segment of the plurality models a prosodic component of the source voice for that segment;
(b) identifying a target voice entry in a codebook for each of the source voice passage segments, wherein each of the identified target voice entries models a prosodic component of a target voice for a different segment of codebook training material, and wherein the codebook training material is substantially different from the passage; and
(c) generating a target voice version of the plurality of passage segments by altering the modeled source voice prosodic component for each segment to replicate the target voice prosodic component modeled by the target voice entry identified for that segment in (b).
6 Assignments
0 Petitions
Accused Products
Abstract
A contour for a syllable (or other speech segment) in a voice undergoing conversion is transformed. The transform of that contour is then used to identify one or more source syllable transforms in a codebook. Information regarding the context and/or linguistic features of the contour being converted can also be compared to similar information in the codebook when identifying an appropriate source transform. Once a codebook source transform is selected, an inverse transformation is performed on a corresponding codebook target transform to yield an output contour. The corresponding codebook target transform represents a target voice version of the same syllable represented by the selected codebook source transform. The output contour may be further processed to improve conversion quality.
38 Citations
35 Claims
-
1. A method comprising:
-
(a) receiving data for a plurality of segments of a passage in a source voice, wherein the data for each segment of the plurality models a prosodic component of the source voice for that segment; (b) identifying a target voice entry in a codebook for each of the source voice passage segments, wherein each of the identified target voice entries models a prosodic component of a target voice for a different segment of codebook training material, and wherein the codebook training material is substantially different from the passage; and (c) generating a target voice version of the plurality of passage segments by altering the modeled source voice prosodic component for each segment to replicate the target voice prosodic component modeled by the target voice entry identified for that segment in (b). - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A machine-readable medium having machine-executable instructions for performing a method comprising:
-
(a) receiving data for a plurality of segments of a passage in a source voice, wherein the data for each segment of the plurality models a prosodic component of the source voice for that segment; (b) identifying a target voice entry in a codebook for each of the source voice passage segments, wherein each of the identified target voice entries models a prosodic component of a target voice for a different segment of codebook training material, and wherein the codebook training material is substantially different from the passage; and (c) generating a target voice version of the plurality of passage segments by altering the modeled source voice prosodic component for each segment to replicate the target voice prosodic component modeled by the target voice entry identified for that segment in (b). - View Dependent Claims (8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A device, comprising:
one or more processors configured to perform a method, the method including (a) receiving data for a plurality of segments of a passage in a source voice, wherein the data for each segment of the plurality models a prosodic component of the source voice for that segment, (b) identifying a target voice entry in a codebook for each of the source voice passage segments, wherein each of the identified target voice entries models a prosodic component of a target voice for a different segment of codebook training material, and wherein the codebook training material is substantially different from the passage, and (c) generating a target voice version of the plurality of passage segments by altering the modeled source voice prosodic component for each segment to replicate the target voice prosodic component modeled by the target voice entry identified for that segment in (b). - View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32)
-
33. A device, comprising:
a voice converter, the voice converter including means for receiving data for a plurality of segments of a passage in a source voice, means for identifying target voice data entries in a codebook for segments of the source voice passage, and means for generating a target voice version of the passage segments based on identified target voice data entries. - View Dependent Claims (34, 35)
Specification