Concatenative speech synthesis using a finite-state transducer
Abstract
A method for concatenative speech synthesis includes a processing stage that selects segments based on their symbolic labeling in an efficient graph-based search, which uses a finite-state transducer formalism. This graph-based search uses a representation of concatenation constraints and costs that does not necessarily grow with the size of the source corpus, thereby limiting the increase in computation required for the search as the size of the source corpus increases. In one application of this method, multiple alternative segment sequences are generated, and a best segment sequence is then selected using characteristics that depend on specific signal characteristics of the segments.
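The two-pass application described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the segment tuple layout, the `JOIN_COST` table, the pitch values, and the function names are all assumptions made for illustration. The sketch assumes join costs depend only on the labels on either side of a join, which is one way a cost table could stay the same size as the corpus grows. For clarity it enumerates candidates exhaustively rather than searching a finite-state transducer.

```python
import heapq
from itertools import product

# Each segment is (utterance index, position, unit label, pitch in Hz).
# All values below are made up for illustration.
CORPUS = [
    (0, 0, "k", 100), (0, 1, "ae", 110),
    (1, 0, "ae", 180), (1, 1, "t", 185),
    (2, 0, "t", 112),
]

# Hypothetical join costs keyed by the labels on either side of a join.
# The table size depends on the label inventory, not on the corpus size.
JOIN_COST = {("k", "ae"): 0.2, ("ae", "t"): 0.5}
DEFAULT_JOIN_COST = 1.0


def symbolic_cost(path):
    """Sum label-based join costs; contiguous source segments join for free."""
    total = 0.0
    for (u1, p1, l1, _), (u2, p2, l2, _) in zip(path, path[1:]):
        if u1 == u2 and p2 == p1 + 1:
            continue  # adjacent in the same source utterance
        total += JOIN_COST.get((l1, l2), DEFAULT_JOIN_COST)
    return total


def signal_cost(path):
    """Hypothetical signal measure: sum of pitch differences between neighbours."""
    return sum(abs(a[3] - b[3]) for a, b in zip(path, path[1:]))


def n_best(corpus, target, n):
    """First pass: the n candidate sequences with the lowest symbolic cost."""
    pools = [[s for s in corpus if s[2] == label] for label in target]
    return heapq.nsmallest(n, product(*pools), key=symbolic_cost)


def select(corpus, target, n=3):
    """Second pass: rescore the n-best list using the signal-level measure."""
    candidates = n_best(corpus, target, n)
    return min(candidates, key=signal_cost) if candidates else None


best = select(CORPUS, ["k", "ae", "t"], n=3)
```

With `n=1` the result is simply the symbolically cheapest sequence. With a larger `n`, a sequence that is slightly more expensive symbolically can be chosen if its segments match better at the signal level.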
18 Claims
1. A method for selecting segments from a corpus of source utterances for synthesizing a target utterance, comprising:
searching a graph in which each path through the graph identifies a sequence of segments of the source utterances and a corresponding sequence of unit labels that characterizes a pronunciation of a concatenation of that sequence of segments, each path being associated with a numerical score that characterizes a quality of the sequence of segments;
wherein searching the graph includes matching a pronunciation of the target utterance to paths through the graph, and selecting segments for synthesizing the target utterance based on numerical scores of matching paths through the graph.
Dependent claims: 2 to 17.
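The search recited in claim 1 can be illustrated with a minimal Python sketch, assuming a dynamic-programming (Viterbi-style) search in place of the finite-state transducer formalism. The `Segment` type, the `build_corpus` helper, and the 0/1 `concat_cost` are hypothetical and chosen only to keep the example small. Each path is a sequence of source segments whose unit labels spell the target pronunciation, and the path score is the sum of join costs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    utt: int    # index of the source utterance
    pos: int    # position of the segment within that utterance
    label: str  # unit label, e.g. a phone symbol


def build_corpus(utterances):
    """Turn lists of unit labels into a flat list of labelled segments."""
    return [Segment(u, p, label)
            for u, labels in enumerate(utterances)
            for p, label in enumerate(labels)]


def concat_cost(prev, nxt):
    """Hypothetical score: free if contiguous in one utterance, else 1."""
    if prev.utt == nxt.utt and nxt.pos == prev.pos + 1:
        return 0.0
    return 1.0


def select_segments(corpus, target):
    """Return (score, segments) for the lowest-cost path matching target,
    or None if no path through the corpus spells the target labels."""
    if not target:
        return (0.0, [])
    by_label = {}
    for seg in corpus:
        by_label.setdefault(seg.label, []).append(seg)
    # best[seg] = (cost, path) of the cheapest partial path ending in seg
    best = {seg: (0.0, [seg]) for seg in by_label.get(target[0], [])}
    for label in target[1:]:
        new_best = {}
        for nxt in by_label.get(label, []):
            candidates = [(cost + concat_cost(prev, nxt), path + [nxt])
                          for prev, (cost, path) in best.items()]
            if candidates:
                new_best[nxt] = min(candidates, key=lambda c: c[0])
        best = new_best
    if not best:
        return None
    return min(best.values(), key=lambda c: c[0])


corpus = build_corpus([["k", "ae", "t"], ["b", "ae", "t", "s"]])
result = select_segments(corpus, ["k", "ae", "t", "s"])
```

In this toy corpus the target "k ae t s" needs at least one join between the two source utterances, so the best score is one join cost. A target that occurs contiguously in a single utterance, such as "b ae t", scores zero.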
18. Software stored on a computer-readable medium for causing a computer to perform functions comprising selecting segments from a corpus of source utterances for synthesizing a target utterance, wherein selecting the segments comprises:
searching a graph in which each path through the graph identifies a sequence of segments of the source utterances and a corresponding sequence of unit labels that characterizes a pronunciation of a concatenation of that sequence of segments, each path being associated with a numerical score that characterizes a quality of the sequence of segments;
wherein searching the graph includes matching a pronunciation of the target utterance to paths through the graph, and selecting segments for synthesizing the target utterance based on numerical scores of matching paths through the graph.
Specification