System and method for prosodically modified unit selection databases
Abstract
Systems, methods, and computer-readable storage devices for improving the quality of synthetic speech generation. A system selects speech units from a speech unit database, the speech units corresponding to text to be converted to speech. The system identifies a desired prosodic curve of speech produced from the selected speech units, and also identifies an actual prosodic curve of the speech units. The selected speech units are modified such that a new prosodic curve of the modified speech units matches the desired prosodic curve. The system stores the modified speech units in the speech unit database for use in generating future speech, thereby increasing the prosodic coverage of the database with the expectation of improving output quality.
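The abstract describes a select, compare, modify, and store loop. The following is a minimal Python sketch of that loop under stated assumptions: the prosodic curve is reduced to a per-frame F0 contour, the distance metric (mean absolute F0 difference) and the 5 Hz tolerance are hypothetical choices, and `modify_to_curve` is a placeholder that only replaces the F0 curve rather than changing the audio. All class and function names are illustrative, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpeechUnit:
    label: str                      # hypothetical unit name, e.g. a diphone
    f0: List[float]                 # actual prosodic (pitch) curve, Hz per frame
    samples: List[float] = field(default_factory=list)


def resample(curve: List[float], n: int) -> List[float]:
    """Linearly resample a curve to n points so curves of different lengths compare."""
    if n <= 0 or not curve:
        raise ValueError("need a non-empty curve and n > 0")
    if n == 1 or len(curve) == 1:
        return [curve[0]] * n
    out = []
    for i in range(n):
        pos = i * (len(curve) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(curve) - 1)
        frac = pos - lo
        out.append(curve[lo] * (1.0 - frac) + curve[hi] * frac)
    return out


def prosodic_distance(actual: List[float], desired: List[float]) -> float:
    """Mean absolute F0 difference between actual and desired curves (assumed metric)."""
    target = resample(desired, len(actual))
    return sum(abs(a - d) for a, d in zip(actual, target)) / len(actual)


def modify_to_curve(unit: SpeechUnit, desired: List[float]) -> SpeechUnit:
    """Hypothetical placeholder for signal-level modification: only F0 is replaced."""
    return SpeechUnit(unit.label, resample(desired, len(unit.f0)), list(unit.samples))


class UnitDatabase:
    def __init__(self) -> None:
        self.units: Dict[str, List[SpeechUnit]] = {}

    def add(self, unit: SpeechUnit) -> None:
        self.units.setdefault(unit.label, []).append(unit)

    def select(self, label: str, desired: List[float]) -> SpeechUnit:
        # Pick the stored variant whose actual curve is closest to the desired one.
        return min(self.units[label], key=lambda u: prosodic_distance(u.f0, desired))


def produce_unit(db: UnitDatabase, label: str, desired: List[float],
                 tolerance_hz: float = 5.0) -> SpeechUnit:
    unit = db.select(label, desired)
    if prosodic_distance(unit.f0, desired) <= tolerance_hz:
        return unit                  # existing coverage is close enough
    new_unit = modify_to_curve(unit, desired)
    db.add(new_unit)                 # store it to widen prosodic coverage
    return new_unit


# Usage: one flat 100 Hz unit, then two requests for a 150 Hz target.
db = UnitDatabase()
db.add(SpeechUnit("ae", [100.0, 100.0, 100.0]))
first = produce_unit(db, "ae", [150.0, 150.0])
second = produce_unit(db, "ae", [150.0, 150.0])
```

Because the modified variant is stored, the second request for the same target curve is served directly from the database without another modification, which is the coverage-growing effect the abstract describes.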
19 Citations
15 Claims
1. A method comprising:
selecting, via a processor, speech units from a speech unit database, where the speech units are used to generate speech corresponding to text;
identifying a desired prosodic curve of the speech to be produced from the speech units;
identifying an actual prosodic curve of the speech units;
decomposing, via a residual-excited linear prediction algorithm, the speech units into residual coefficients and linear predictive coder coefficients;
determining a cost of modifying the residual coefficients to yield a determination;
modifying, via a pitch synchronous overlap and add algorithm, the residual coefficients, to yield modified residual coefficients based on the determination;
combining, via the residual-excited linear prediction algorithm, the modified residual coefficients with the linear predictive coder coefficients, to yield new speech units, such that a new prosodic curve corresponding to the new speech units conforms to the desired prosodic curve; and
generating the speech using the new speech units.
6. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
selecting speech units from a speech unit database, where the speech units are used to generate speech corresponding to text;
identifying a desired prosodic curve of the speech to be produced from the speech units;
identifying an actual prosodic curve of the speech units;
decomposing, via a residual-excited linear prediction algorithm, the speech units into residual coefficients and linear predictive coder coefficients;
determining a cost of modifying the residual coefficients to yield a determination;
modifying, via a pitch synchronous overlap and add algorithm, the residual coefficients, to yield modified residual coefficients based on the determination;
combining, via the residual-excited linear prediction algorithm, the modified residual coefficients with the linear predictive coder coefficients, to yield new speech units, such that a new prosodic curve corresponding to the new speech units conforms to the desired prosodic curve; and
generating the speech using the new speech units.
11. A non-transitory computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
selecting speech units from a speech unit database, where the speech units are used to generate speech corresponding to text;
identifying a desired prosodic curve of the speech to be produced from the speech units;
identifying an actual prosodic curve of the speech units;
decomposing, via a residual-excited linear prediction algorithm, the speech units into residual coefficients and linear predictive coder coefficients;
determining a cost of modifying the residual coefficients to yield a determination;
modifying, via a pitch synchronous overlap and add algorithm, the residual coefficients, to yield modified residual coefficients based on the determination;
combining, via the residual-excited linear prediction algorithm, the modified residual coefficients with the linear predictive coder coefficients, to yield new speech units, such that a new prosodic curve corresponding to the new speech units conforms to the desired prosodic curve; and
generating the speech using the new speech units.
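The decomposing and combining steps recited in claims 1, 6, and 11 can be illustrated with a plain linear-prediction analysis/synthesis pair. This minimal sketch assumes one set of LPC coefficients per speech unit (practical residual-excited coders update them frame by frame), uses the autocorrelation method with the Levinson-Durbin recursion, and treats the claimed residual coefficients as the time-domain prediction residual. The function names are hypothetical.

```python
import math
import random
from typing import List


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def lpc(x: List[float], order: int) -> List[float]:
    """Levinson-Durbin recursion; returns predictor coefficients a[1..order]
    such that x_hat[n] = sum_k a[k] * x[n - k]."""
    r = autocorrelation(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    if err == 0.0:
        return a[1:]                 # silent unit: nothing to predict
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k           # prediction error energy at order i
        if err <= 0.0:
            break
    return a[1:]


def residual(x: List[float], a: List[float]) -> List[float]:
    """Analysis (inverse) filter: e[n] = x[n] - sum_k a[k] * x[n - k]."""
    p = len(a)
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(x))]


def synthesize(e: List[float], a: List[float]) -> List[float]:
    """All-pole synthesis filter: y[n] = e[n] + sum_k a[k] * y[n - k]."""
    y: List[float] = []
    for n, en in enumerate(e):
        y.append(en + sum(a[k - 1] * y[n - k]
                          for k in range(1, len(a) + 1) if n - k >= 0))
    return y


# Usage: a toy "speech unit" made of two sinusoids plus a little noise.
rng = random.Random(0)
x = [math.sin(0.3 * n) + 0.5 * math.sin(0.05 * n) + 0.05 * rng.uniform(-1.0, 1.0)
     for n in range(200)]
a = lpc(x, order=10)
e = residual(x, a)       # the part a residual-domain modification would change
y = synthesize(e, a)     # recombination with the unchanged LPC coefficients
```

Because the synthesis filter is the exact inverse of the analysis filter, an unmodified residual reconstructs the original unit, and any change made to the residual is carried back into speech with the unit's original spectral envelope.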
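The claims do not specify how the cost of modifying the residual is computed or how the pitch synchronous overlap and add step is carried out. The sketch below is one hypothetical reading: the cost is a weighted log2 distance between source and target pitch (and optionally duration), the determination is a simple threshold on that cost, and the modification is a textbook time-domain PSOLA applied to the residual samples. Pitch marks are assumed to be supplied by a separate epoch detector; `modification_cost`, `psola_pitch_shift`, `modify_residual`, and the weights and threshold are illustrative names and values, not the patent's.

```python
import math
from typing import List, Tuple


def modification_cost(src_f0: float, tgt_f0: float, src_dur: float = 1.0,
                      tgt_dur: float = 1.0, pitch_weight: float = 1.0,
                      duration_weight: float = 0.5) -> float:
    """Hypothetical cost: weighted log2 (octave) pitch change plus log2 duration change."""
    if min(src_f0, tgt_f0, src_dur, tgt_dur) <= 0:
        raise ValueError("pitch and duration values must be positive")
    return (pitch_weight * abs(math.log2(tgt_f0 / src_f0))
            + duration_weight * abs(math.log2(tgt_dur / src_dur)))


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def psola_pitch_shift(signal: List[float], marks: List[int], factor: float) -> List[float]:
    """Time-domain PSOLA pitch change that keeps the signal length.

    marks: strictly increasing analysis pitch marks (sample indices), assumed to come
    from an epoch detector. factor > 1 raises the pitch, factor < 1 lowers it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if len(marks) < 2:
        return list(signal)
    if any(b <= a for a, b in zip(marks, marks[1:])):
        raise ValueError("pitch marks must be strictly increasing")
    out = [0.0] * len(signal)
    t = float(marks[0])
    while t <= marks[-1]:
        # Analysis mark nearest to the current synthesis time.
        i = min(range(len(marks)), key=lambda j: abs(marks[j] - t))
        period = marks[i] - marks[i - 1] if i > 0 else marks[1] - marks[0]
        window = hann(2 * period + 1)
        shift = int(round(t)) - marks[i]
        # Overlap-add a two-period Hann-windowed segment centred on that mark.
        for k, w in enumerate(window):
            src = marks[i] - period + k
            dst = src + shift
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                out[dst] += w * signal[src]
        t += period / factor  # synthesis marks are spaced by period / factor
    return out


def modify_residual(residual: List[float], marks: List[int], src_f0: float,
                    tgt_f0: float, max_cost: float = 1.0) -> Tuple[List[float], float, bool]:
    """Cost-gated modification: returns (residual, cost, modified?)."""
    cost = modification_cost(src_f0, tgt_f0)
    if cost > max_cost:
        return list(residual), cost, False  # too costly: keep the unit unchanged
    return psola_pitch_shift(residual, marks, tgt_f0 / src_f0), cost, True


# Usage: an idealised impulse-train residual with a 100-sample pitch period.
marks = list(range(50, 1000, 100))
excitation = [0.0] * 1000
for m in marks:
    excitation[m] = 1.0
shifted, cost, modified = modify_residual(excitation, marks, 100.0, 200.0)
```

In this example, doubling the target pitch halves the spacing of the excitation pulses while keeping the residual's length. The modified residual would then be recombined with the unit's unchanged LPC coefficients through the synthesis filter, so the spectral envelope is preserved while the pitch changes.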
Specification