System and method for selecting training text
Abstract
A system and method are described for determining a near-optimum subset of data, based on a selected model, from a large corpus of data. Sets of feature vectors corresponding to natural or other preselected divisions of the data corpus are mapped into matrices representative of such divisions. The invention operates to find a submatrix of full rank formed as a union of one or more of those division-based matrices. A greedy algorithm utilizing Gram-Schmidt orthonormalization operates on the division matrices to find a near-optimum submatrix within a time bound that represents a substantial improvement over prior-art methods. An important application of the invention is the selection, from a very large corpus of sentences, of a small number of sentences from which the parameters of a duration model for speech synthesis can be estimated.
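The greedy search summarized in the abstract can be illustrated with a minimal sketch. It assumes each division (for example, a sentence) is given as a NumPy matrix whose rows are feature vectors and whose columns are model parameters. The function names, the tolerance, and the selection criterion (pick the division that adds the most new independent directions) are illustrative assumptions and are not taken from the patent text.

```python
import numpy as np

def residual_basis(rows, basis, tol=1e-9):
    """Gram-Schmidt step: orthonormal vectors that `rows` add to `basis`."""
    new = []
    for v in rows:
        w = np.asarray(v, dtype=float).copy()
        # Remove the components already spanned by the basis built so far.
        for b in basis + new:
            w -= np.dot(w, b) * b
        norm = np.linalg.norm(w)
        if norm > tol:  # the row contributes a genuinely new direction
            new.append(w / norm)
    return new

def greedy_select(division_matrices, tol=1e-9):
    """Greedily pick divisions until their union reaches full column rank.

    Assumed criterion: at each step take the division with the largest
    rank gain. Returns (indices of chosen divisions, rank reached).
    """
    n_params = division_matrices[0].shape[1]
    basis, chosen = [], []
    remaining = set(range(len(division_matrices)))
    while len(basis) < n_params and remaining:
        best, best_new = None, []
        for i in sorted(remaining):
            new = residual_basis(division_matrices[i], basis, tol)
            if len(new) > len(best_new):
                best, best_new = i, new
        if best is None:  # no remaining division increases the rank
            break
        chosen.append(best)
        basis.extend(best_new)
        remaining.discard(best)
    return chosen, len(basis)

# Toy corpus: four "sentences" over a three-parameter model.
sentences = [
    np.array([[1, 0, 0]]),
    np.array([[1, 0, 0], [0, 1, 0]]),
    np.array([[0, 0, 1]]),
    np.array([[1, 1, 0]]),
]
chosen, rank = greedy_select(sentences)
```

In this toy corpus the second and third matrices together already span all three parameters, so the remaining two are never selected. A greedy choice of this kind is not guaranteed to yield the smallest possible subset, which is consistent with the abstract's wording of a near-optimum result.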
31 Claims
1. A method for identifying a subset of a corpus of speech data usable for estimating speech parameters in a speech processing application, said corpus being arranged as a plurality of sentences, comprising the steps of:
- constructing feature vectors corresponding to all phonetic segments appearing in said corpus;
- mapping said feature vectors into a plurality of matrices based on a model chosen to fit said corpus, said matrices being arranged to include sets of said feature vectors corresponding to sentences in said corpus; and
- operating on said parameter space matrices with a greedy algorithm to find a submatrix of full rank, said full-rank submatrix being formed by the union of one or more of said model-based matrices and whereby sentences corresponding to said one or more of said model-based matrices included in said full-rank submatrix comprise said subset of said corpus of speech data;
wherein an articulation of one or more of said corresponding sentences provides an input to said speech processing application for estimation of said speech parameters.
Dependent claims: 2, 3, 4, 5
6. A system for identifying a subset of a corpus of speech data usable for estimating speech parameters in a speech processing application, said corpus being arranged as a plurality of sentences, comprising:
- means for constructing feature vectors corresponding to all phonetic segments appearing in said corpus;
- means for mapping said feature vectors into a plurality of matrices based on a model selected to fit said corpus, said matrices being arranged to include sets of said feature vectors corresponding to sentences in said corpus; and
- means for applying a greedy algorithm to said model-based matrices for finding a submatrix of full rank, said full-rank submatrix being formed by the union of one or more of said model-based matrices and whereby sentences corresponding to said one or more of said model-based matrices included in said full-rank submatrix comprise said subset of said corpus of speech data;
wherein an articulation of one or more of said corresponding sentences provides an input to said speech processing application for estimation of said speech parameters.
Dependent claims: 7, 8
9. In a method for synthesizing speech from text comprising the steps of:
- analyzing input text to determine phonetic segments for said input text;
- estimating acoustic parameters associated with each said phonetic segment; and
- generating a speech waveform based on said estimated acoustic parameters to synthesize said input text into speech;
wherein said acoustic parameters determined in said estimating step are derived from a set of training data, and said training data are manifested as a set of sentences selected from a corpus of speech data arranged as a plurality of sentences;
a method for selecting said selected sentences comprising the steps of:
- constructing feature vectors corresponding to all phonetic segments appearing in said corpus;
- mapping said feature vectors into a plurality of matrices based on a model chosen to fit said corpus, said matrices arranged to include sets of said feature vectors corresponding to sentences in said corpus; and
- operating on said model-based matrices with a greedy algorithm to find a submatrix of full rank, said full-rank submatrix being formed as the union of one or more of said model-based matrices, whereby sentences corresponding to said one or more of said model-based matrices included in said full-rank submatrix comprise said selected sentences.
Dependent claims: 10, 11, 12, 13
14. In a system for synthesizing speech from text comprising:
- a text analysis means for analyzing input text to determine phonetic segments for said input text;
- parameter estimation means for estimating acoustic parameters associated with each said phonetic segment; and
- speech generation means for generating a speech waveform based on said estimated speech parameters to thereby synthesize said input text into speech;
wherein said parameter estimation means further includes means for deriving a set of training data, said training data being manifested as a set of sentences selected from a corpus of speech data arranged as a plurality of sentences, and said means for deriving a set of training data further comprises:
- means for constructing feature vectors corresponding to all phonetic segments appearing in a plurality of sentences;
- means for mapping said feature vectors into a plurality of matrices based on a model chosen to fit said plurality of sentences, said matrices being arranged to include sets of said feature vectors corresponding to sentences in said plurality of sentences;
- means for applying a greedy algorithm to said model-based matrices for finding a submatrix of full rank, said full-rank submatrix being formed as the union of one or more of said model-based matrices.
Dependent claims: 15, 16
17. A method for selecting speech parameter estimation sentences to be applied in a speech processing application by analyzing each of a plurality of sentences, said plurality of sentences including said selected speech parameter estimation sentences, according to the following steps:
- constructing feature vectors corresponding to all phonetic segments appearing in said plurality of sentences;
- mapping said feature vectors into a plurality of matrices based on a model chosen to fit said plurality of sentences, said matrices being arranged to include sets of said feature vectors corresponding to sentences in said plurality of sentences; and
- operating on said model-based matrices with a greedy algorithm to find a submatrix of full rank, said full-rank submatrix being formed by the union of one or more of said model-based matrices, the sentences corresponding to said one or more of said model-based matrices comprising said full-rank submatrix being selected as said speech parameter estimation sentences;
wherein an articulation of one or more of said speech parameter estimation sentences provides an input to said speech processing application for estimation of said speech parameters.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25
26. A method for estimating speech parameters in a speech processing application by use of a model populated from data derived from a selected set of speech parameter estimation sentences, said speech parameter estimation sentences having been selected according to the following steps:
- constructing feature vectors corresponding to all phonetic segments appearing in a plurality of sentences, said plurality of sentences including said selected speech parameter estimation sentences;
- mapping said feature vectors into a plurality of matrices based on said model, said matrices being arranged to include sets of said feature vectors corresponding to sentences in said plurality of sentences; and
- operating on said model-based matrices with a greedy algorithm to find a submatrix of full rank, said full-rank submatrix being formed by the union of one or more of said model-based matrices, the sentences corresponding to said one or more of said model-based matrices comprising said full-rank submatrix being selected as said speech parameter estimation sentences;
wherein an articulation of one or more of said speech parameter estimation sentences provides an input to said speech-parameter-estimation model.
Dependent claims: 27, 28, 29, 30
31. A method for identifying a subset of a corpus of speech data usable for estimating speech parameters in a speech processing application, said corpus being arranged as a plurality of ordered word sets, said word ordering being in accordance with a known ordering methodology, said method comprising the steps of:
- constructing feature vectors corresponding to all phonetic segments appearing in said corpus;
- mapping said feature vectors into a plurality of matrices based on a model chosen to fit said corpus, said matrices being arranged to include sets of said feature vectors corresponding to word sets in said corpus; and
- operating on said parameter space matrices with a greedy algorithm to find a submatrix of full rank, said full-rank submatrix being formed by the union of one or more of said model-based matrices and whereby word sets corresponding to said one or more of said model-based matrices included in said full-rank submatrix comprise said subset of said corpus of speech data;
wherein an articulation of one or more of said corresponding word sets provides an input to said speech processing application for estimation of said speech parameters.
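The first two steps recited in the independent claims (constructing feature vectors for phonetic segments and mapping them into per-sentence matrices) can be illustrated with a small sketch. It assumes a simple additive linear model with an intercept and treatment-coded factors. The factor inventory (`PHONES`, `STRESS`), the coding scheme, and the toy corpus are hypothetical and are not drawn from the claims.

```python
import numpy as np

# Hypothetical factor levels; the first level of each factor is the
# reference level and gets no column of its own (treatment coding).
PHONES = ["a", "t", "s"]
STRESS = ["stressed", "unstressed"]
N_PARAMS = 1 + (len(PHONES) - 1) + (len(STRESS) - 1)  # intercept + contrasts

def feature_vector(phone, stress):
    """Encode one phonetic segment as a row of the model's design matrix."""
    vec = np.zeros(N_PARAMS)
    vec[0] = 1.0  # intercept
    p = PHONES.index(phone)
    if p > 0:
        vec[p] = 1.0
    s = STRESS.index(stress)
    if s > 0:
        vec[len(PHONES) - 1 + s] = 1.0
    return vec

def sentence_matrix(segments):
    """Stack the feature vectors of one sentence into a matrix."""
    return np.vstack([feature_vector(p, s) for p, s in segments])

# Toy corpus: each sentence is a list of (phone, stress) segments.
corpus = [
    [("a", "stressed"), ("t", "unstressed")],
    [("s", "unstressed"), ("a", "unstressed")],
    [("t", "stressed")],
]
matrices = [sentence_matrix(sentence) for sentence in corpus]

# The union of sentence matrices is their row-wise concatenation; the
# model's parameters are estimable from it when it has full column rank.
union = np.vstack(matrices)
rank = np.linalg.matrix_rank(union)
```

Here the union of all three sentence matrices has rank equal to the number of model parameters. A selection step would then look for a smaller set of sentences whose union still reaches that rank.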
Specification