System and method for selecting training text

  • US 6,038,533 A
  • Filed: 07/07/1995
  • Issued: 03/14/2000
  • Est. Priority Date: 07/07/1995
  • Status: Expired due to Term
First Claim

1. A method for identifying a subset of a corpus of speech data usable for estimating speech parameters in a speech processing application, said corpus being arranged as a plurality of sentences, comprising the steps of:

    constructing feature vectors corresponding to all phonetic segments appearing in said corpus;

    mapping said feature vectors into a plurality of matrices based on a model chosen to fit said corpus, said matrices being arranged to include sets of said feature vectors corresponding to sentences in said corpus; and

    operating on said parameter space matrices with a greedy algorithm to find a submatrix of full rank, said full-rank submatrix being formed by the union of one or more of said model-based matrices and whereby sentences corresponding to said one or more of said model-based matrices included in said full-rank submatrix comprise said subset of said corpus of speech data;

    wherein an articulation of one or more of said corresponding sentences provides an input to said speech processing application for estimation of said speech parameters.
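
The selection step recited in the claim can be illustrated with a minimal sketch. It rests on assumptions the claim does not state: each sentence is represented by a small matrix whose rows are its feature vectors, "full rank" is taken to mean full column rank (one column per model parameter), and the greedy rule picks, at each step, the sentence whose rows raise the rank of the stacked matrix the most. The names `rank`, `greedy_select`, and the toy `sentences` data are hypothetical and are not taken from the patent.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of equal-length rows) via exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0  # number of pivot rows found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            if m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def greedy_select(sentence_matrices, n_params):
    """Greedily pick sentences until the stacked rows reach rank n_params.

    Assumed greedy criterion (not specified in the claim): take the sentence
    giving the largest rank increase; ties go to the lowest index.
    Returns (selected sentence indices, rank reached).
    """
    selected, stacked, current = [], [], 0
    remaining = set(range(len(sentence_matrices)))
    while current < n_params and remaining:
        best, best_rank = None, current
        for i in sorted(remaining):
            r = rank(stacked + sentence_matrices[i])
            if r > best_rank:
                best, best_rank = i, r
        if best is None:
            break  # no remaining sentence adds information
        selected.append(best)
        stacked = stacked + sentence_matrices[best]
        current = best_rank
        remaining.remove(best)
    return selected, current


# Toy corpus: four sentences, three model parameters (columns).
sentences = [
    [[1, 0, 0], [1, 0, 0]],  # sentence 0: two identical feature vectors
    [[1, 1, 0], [0, 1, 0]],  # sentence 1: covers parameters 0 and 1
    [[0, 0, 1]],             # sentence 2: covers parameter 2
    [[1, 1, 0]],             # sentence 3: redundant with sentence 1
]
subset, reached = greedy_select(sentences, n_params=3)
print(subset, reached)
```

In this toy corpus the sketch selects sentences 1 and 2, whose stacked rows already span all three parameters, so the remaining sentences would not need to be recorded. Exact fractions are used in place of floating point so that the rank test does not depend on a numerical tolerance.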

  • 9 Assignments