Generating prosodic contours for synthesized speech
Abstract
The subject matter of this specification can be implemented in a computer-implemented method that includes receiving utterances and transcripts thereof. The method includes analyzing the utterances and transcripts to determine certain attributes, such as distances between prosodic contours for pairs of utterances. A model can be generated that can be used to estimate a distance between a determined prosodic contour for a received utterance and an unknown prosodic contour for a synthesized utterance when given a distance between attributes for text associated with the received utterance and the synthesized utterance.
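The training procedure summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The source does not specify any of these choices: text attributes are assumed to be numeric feature vectors, contours are assumed to be already resampled to a common length, both distances are assumed to be Euclidean, and the "model" is assumed to be a simple linear least-squares fit of contour distance against attribute distance. All function names are hypothetical.

```python
import math
from itertools import combinations

# Assumptions (not from the source): attributes are numeric vectors, contours are
# equal-length F0 sequences, distances are Euclidean, and the model is linear.

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise_distances(attributes, contours):
    """Return (attribute_distance, contour_distance) for every pair of utterances."""
    return [
        (euclidean(attributes[i], attributes[j]), euclidean(contours[i], contours[j]))
        for i, j in combinations(range(len(contours)), 2)
    ]

def fit_linear_model(pairs):
    """Least-squares fit of contour distance = intercept + slope * attribute distance."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx if sxx else 0.0  # flat model if all attribute distances are equal
    return mean_y - slope * mean_x, slope

def estimate_contour_distance(model, attribute_distance):
    """Estimate the contour distance for a received/synthesized pair from their text distance."""
    intercept, slope = model
    return intercept + slope * attribute_distance

# Toy data: three utterances with 2-D text attributes and 3-point contours (Hz).
attributes = [[3, 0], [4, 0], [3, 1]]
contours = [[100, 120, 110], [100, 130, 105], [140, 120, 90]]
model = fit_linear_model(pairwise_distances(attributes, contours))
print(estimate_contour_distance(model, 1.2))
```

Given only the text-attribute distance between a recorded utterance and a sentence to be synthesized, `estimate_contour_distance` returns a predicted contour distance. A synthesizer could then, for example, prefer recorded contours whose predicted distance is smallest. A linear fit is the simplest choice here; any regressor from distance to distance would fill the same role.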
21 Claims
1. A method implemented by a system of one or more computers, comprising:
receiving, by the system of one or more computers, speech utterances encoded in audio data and a transcript having text that represents the speech utterances;
extracting, by the system of one or more computers, prosodic contours from the utterances;
extracting, by the system of one or more computers and from the transcript, attributes of text associated with the utterances;
for pairs of utterances from the speech utterances, determining, by the system of one or more computers, distances between attributes of text associated with the pairs of utterances;
for the pairs of utterances from the speech utterances, determining, by the system of one or more computers, distances between prosodic contours for the pairs of utterances;
generating, by the system of one or more computers, a model based on the determined distances for the attributes and the prosodic contours, the model adapted to estimate a distance between a determined prosodic contour for a received utterance and a prosodic contour for a synthesized utterance when given a distance between an attribute of text associated with the received utterance and an attribute of text associated with the synthesized utterance; and
storing, by the system of one or more computers, the model in a computer-readable memory device.
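Claim 1 leaves open how the distance between two prosodic contours is measured. One plausible sketch assumes contours are F0 tracks sampled at a fixed frame rate but of differing lengths. It time-normalizes both contours by linear interpolation onto a common grid and then takes the root-mean-square difference. The linear time warping, the 50-point default grid, and the names `resample` and `contour_distance` are illustrative assumptions, not taken from the source.

```python
import math

def resample(contour, n):
    """Linearly interpolate a contour (list of F0 values) onto n evenly spaced points."""
    if n < 2 or len(contour) < 2:
        raise ValueError("need at least two input points and two output points")
    last = len(contour) - 1
    out = []
    for k in range(n):
        pos = k * last / (n - 1)       # fractional index into the original contour
        lo = min(int(pos), last - 1)   # left neighbour, kept in range at the end point
        frac = pos - lo
        out.append(contour[lo] * (1 - frac) + contour[lo + 1] * frac)
    return out

def contour_distance(a, b, n=50):
    """Root-mean-square difference between two contours after time normalization."""
    ra, rb = resample(a, n), resample(b, n)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)) / n)

# Contours of different lengths can now be compared directly.
print(contour_distance([120, 140, 130, 110], [118, 150, 105]))
```

Linear time normalization is the simplest possible alignment. A dynamic-time-warping distance would tolerate local timing differences, at a higher computational cost.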
15. A computer-implemented system, comprising:
one or more computers having:
an interface to receive speech utterances encoded in audio data and a transcript having text that represents the speech utterances;
a prosodic contour extractor to extract prosodic contours from the utterances;
a transcript analyzer to extract attributes of text associated with the utterances;
an attribute comparer to determine, for pairs of utterances from the speech utterances, distances between attributes of text associated with the pairs of utterances;
a prosodic contour comparer to determine, for the pairs of utterances from the speech utterances, distances between prosodic contours for the pairs of utterances;
a model generator programmed to generate a model based on the determined distances for the attributes and the prosodic contours, the model adapted to estimate a distance between a determined prosodic contour for a received utterance and a prosodic contour for a synthesized utterance when given a distance between an attribute of text associated with the received utterance and an attribute of text associated with the synthesized utterance; and
a computer-readable memory device associated with the one or more computers to store the model.
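The attribute comparer of claim 15 likewise does not fix which text attributes are compared or how. The sketch below assumes three hypothetical attributes: syllable count, stressed-syllable positions, and sentence type. It combines their differences as a weighted sum. The attribute set, the weights, and the names `TextAttributes` and `attribute_distance` are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TextAttributes:
    """Hypothetical attributes a transcript analyzer might extract for one utterance."""
    num_syllables: int
    stressed_positions: tuple  # indices of stressed syllables, e.g. (0, 3)
    sentence_type: str         # e.g. "statement" or "question"

# Hypothetical weights; the source specifies neither attributes nor weights.
WEIGHTS = {"syllables": 1.0, "stress": 2.0, "sentence_type": 5.0}

def attribute_distance(a: TextAttributes, b: TextAttributes) -> float:
    """Weighted sum of per-attribute differences between two utterances' text."""
    d = WEIGHTS["syllables"] * abs(a.num_syllables - b.num_syllables)
    # Stressed positions present in one utterance but not in the other.
    d += WEIGHTS["stress"] * len(set(a.stressed_positions) ^ set(b.stressed_positions))
    # Any mismatch in sentence type (e.g. statement vs. question) costs a fixed penalty.
    d += WEIGHTS["sentence_type"] * (a.sentence_type != b.sentence_type)
    return d

received = TextAttributes(5, (0, 3), "statement")
synthesized = TextAttributes(6, (0, 4), "question")
print(attribute_distance(received, synthesized))  # prints 10.0
```

The claimed model takes a scalar distance of this kind as input and maps it to an estimated prosodic-contour distance.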
Specification