Prosody conversion
First Claim
1. A method comprising:
(a) receiving data for a plurality of segments of a passage in a source voice, wherein the data for each segment of the plurality models a prosodic component of the source voice for that segment;
(b) identifying a target voice entry in a codebook for each of the source voice passage segments, wherein each of the identified target voice entries models a prosodic component of a target voice for a different segment of codebook training material; and
(c) generating, in one or more processors, a target voice version of the plurality of passage segments by altering the modeled source voice prosodic component for each segment to replicate the target voice prosodic component modeled by the target voice entry identified for that segment in (b), and wherein the codebook includes multiple source voice entries, each of the multiple source voice entries models a prosodic component of the source voice for a different segment of the codebook training material, each of the multiple source voice entries corresponds to a target voice entry modeling a prosodic component of the target voice for the segment of the codebook training material for which the corresponding source voice entry models the prosodic component of the source voice, operation (b) includes, for each source voice passage segment, identifying a target voice entry by comparing data for the source voice passage segment to one or more of the multiple source voice entries, each of the multiple source voice entries and its corresponding target voice entry includes a plurality of transform coefficients representing a contour for the modeled prosodic component, and operation (b) includes, for each source voice passage segment, identifying a target voice entry by comparing transform coefficients representing a contour for the prosodic component of the source voice passage segment to the transform coefficients for one or more of the multiple source voice entries.
Abstract
A contour for a syllable (or other speech segment) in a voice undergoing conversion is transformed. The transform of that contour is then used to identify one or more source syllable transforms in a codebook. Information regarding the context and/or linguistic features of the contour being converted can also be compared to similar information in the codebook when identifying an appropriate source transform. Once a codebook source transform is selected, an inverse transformation is performed on a corresponding codebook target transform to yield an output contour. The corresponding codebook target transform represents a target voice version of the same syllable represented by the selected codebook source transform. The output contour may be further processed to improve conversion quality.
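The lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a discrete cosine transform (the text says only "transform"), a squared Euclidean distance for the comparison, and a codebook stored as a list of paired coefficient lists. The names dct, idct, build_codebook and convert are hypothetical, and the optional context/linguistic-feature comparison and the post-processing of the output contour are left out.

```python
import math


def _scale(k, n):
    # Orthonormal DCT scaling factor (assumed transform; the source does not name one).
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct(contour, n_coeffs):
    """Orthonormal DCT-II of a contour, keeping the first n_coeffs coefficients."""
    n = len(contour)
    return [
        _scale(k, n)
        * sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(contour))
        for k in range(n_coeffs)
    ]


def idct(coeffs, n):
    """Inverse of dct(): rebuild an n-point contour from the kept coefficients."""
    return [
        sum(
            _scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


def build_codebook(training_pairs, n_coeffs):
    """Each entry pairs source and target coefficients for the same training segment."""
    return [(dct(src, n_coeffs), dct(tgt, n_coeffs)) for src, tgt in training_pairs]


def convert(contour, codebook, n_coeffs):
    """Pick the closest source entry, then inverse-transform its paired target entry."""
    query = dct(contour, n_coeffs)

    def distance(entry):
        source_coeffs, _ = entry
        return sum((a - b) ** 2 for a, b in zip(query, source_coeffs))

    _, target_coeffs = min(codebook, key=distance)
    return idct(target_coeffs, len(contour))


# Hypothetical pitch contours (Hz) for two training syllables: (source, target).
training_pairs = [
    ([100.0, 110.0, 120.0, 130.0], [200.0, 190.0, 180.0, 170.0]),
    ([130.0, 120.0, 110.0, 100.0], [150.0, 160.0, 170.0, 180.0]),
]
codebook = build_codebook(training_pairs, n_coeffs=4)
converted = convert([101.0, 111.0, 119.0, 131.0], codebook, n_coeffs=4)
```

Here the input contour is closest to the first source entry, so converted reproduces the first target contour (to floating-point precision). Keeping fewer coefficients than contour points would give a smoothed approximation instead.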
32 Citations
31 Claims
1. A method comprising:
(a) receiving data for a plurality of segments of a passage in a source voice, wherein the data for each segment of the plurality models a prosodic component of the source voice for that segment; (b) identifying a target voice entry in a codebook for each of the source voice passage segments, wherein each of the identified target voice entries models a prosodic component of a target voice for a different segment of codebook training material; and (c) generating, in one or more processors, a target voice version of the plurality of passage segments by altering the modeled source voice prosodic component for each segment to replicate the target voice prosodic component modeled by the target voice entry identified for that segment in (b), and wherein the codebook includes multiple source voice entries, each of the multiple source voice entries models a prosodic component of the source voice for a different segment of the codebook training material, each of the multiple source voice entries corresponds to a target voice entry modeling a prosodic component of the target voice for the segment of the codebook training material for which the corresponding source voice entry models the prosodic component of the source voice, operation (b) includes, for each source voice passage segment, identifying a target voice entry by comparing data for the source voice passage segment to one or more of the multiple source voice entries, each of the multiple source voice entries and its corresponding target voice entry includes a plurality of transform coefficients representing a contour for the modeled prosodic component, and operation (b) includes, for each source voice passage segment, identifying a target voice entry by comparing transform coefficients representing a contour for the prosodic component of the source voice passage segment to the transform coefficients for one or more of the multiple source voice entries. - View Dependent Claims (2, 3, 4, 5)
6. A non-transitory machine-readable medium storing machine-executable instructions for performing a method comprising:
(a) receiving data for a plurality of segments of a passage in a source voice, wherein the data for each segment of the plurality models a prosodic component of the source voice for that segment; (b) identifying a target voice entry in a codebook for each of the source voice passage segments, wherein each of the identified target voice entries models a prosodic component of a target voice for a different segment of codebook training material; and (c) generating a target voice version of the plurality of passage segments by altering the modeled source voice prosodic component for each segment to replicate the target voice prosodic component modeled by the target voice entry identified for that segment in (b), and wherein the codebook includes multiple source voice entries, each of the multiple source voice entries models a prosodic component of the source voice for a different segment of the codebook training material, each of the multiple source voice entries corresponds to a target voice entry modeling a prosodic component of the target voice for the segment of the codebook training material for which the corresponding source voice entry models the prosodic component of the source voice, operation (b) includes, for each source voice passage segment, identifying a target voice entry by comparing data for the source voice passage segment to one or more of the multiple source voice entries, each of the multiple source voice entries and its corresponding target voice entry includes a plurality of transform coefficients representing a contour for the modeled prosodic component, and operation (b) includes, for each source voice passage segment, identifying a target voice entry by comparing transform coefficients representing a contour for the prosodic component of the source voice passage segment to the transform coefficients for one or more of the multiple source voice entries. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
17. A device, comprising:
at least one processor; and at least one memory storing machine executable instructions, the machine-executable instructions configured to, with the at least one processor, cause the device to (a) receive data for a plurality of segments of a passage in a source voice, wherein the data for each segment of the plurality models a prosodic component of the source voice for that segment, (b) identify a target voice entry in a codebook for each of the source voice passage segments, wherein each of the identified target voice entries models a prosodic component of a target voice for a different segment of codebook training material, and (c) generate a target voice version of the plurality of passage segments by altering the modeled source voice prosodic component for each segment to replicate the target voice prosodic component modeled by the target voice entry identified for that segment in (b), and wherein the codebook includes multiple source voice entries, each of the multiple source voice entries models a prosodic component of the source voice for a different segment of the codebook training material, each of the multiple source voice entries corresponds to a target voice entry modeling a prosodic component of the target voice for the segment of the codebook training material for which the corresponding source voice entry models the prosodic component of the source voice, operation (b) includes, for each source voice passage segment, identifying a target voice entry by comparing data for the source voice passage segment to one or more of the multiple source voice entries, each of the multiple source voice entries and its corresponding target voice entry includes a plurality of transform coefficients representing a contour for the modeled prosodic component, and operation (b) includes, for each source voice passage segment, identifying a target voice entry by comparing transform coefficients representing a contour for the prosodic component of the source voice passage segment to the transform coefficients for one or more of the multiple source voice entries. - View Dependent Claims (18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29)
30. A device, comprising:
a voice converter, the voice converter including means for receiving data for a plurality of segments of a passage in a source voice, means for identifying target voice data entries in a codebook for segments of the source voice passage, and means for generating a target voice version of the passage segments based on identified target voice data entries, and wherein the codebook includes multiple source voice entries, each of the multiple source voice entries models a prosodic component of the source voice for a different segment of the codebook training material, each of the multiple source voice entries corresponds to a target voice entry modeling a prosodic component of the target voice for the segment of the codebook training material for which the corresponding source voice entry models the prosodic component of the source voice, the identification means include means for comparing data for the source voice passage segment to one or more of the multiple source voice entries, each of the multiple source voice entries and its corresponding target voice entry includes a plurality of transform coefficients representing a contour for the modeled prosodic component, and the identification means further include means for comparing transform coefficients representing a contour for the prosodic component of the source voice passage segment to the transform coefficients for one or more of the multiple source voice entries. - View Dependent Claims (31)
Specification