System and method for word-sense disambiguation by recursive partitioning

  • US 8,099,281 B2
  • Filed: 06/06/2005
  • Issued: 01/17/2012
  • Est. Priority Date: 06/06/2005
  • Status: Active Grant
First Claim

1. A method of constructing a test for use in electronically disambiguating a homograph during a computer-based text-to-speech event, the method comprising:

  • using at least one processor to construct a decision tree for determining a pronunciation label for the homograph in an input word string, the decision tree comprising at least first and second nodes, the first node being a parent of the second node, wherein the at least one processor is configured to construct the decision tree at least in part by:

    accessing a first set of training samples, each of the training samples comprising a word string that contains the homograph and a pronunciation label indicating a correct pronunciation of the homograph in the word string;

    applying a plurality of decision rules to the first set of training samples, each of the plurality of decision rules partitioning the first set of training samples into at least two subsets of the first set of training samples;

    for each one of the plurality of decision rules, computing a corresponding measure of impurity indicative of an extent to which each of the at least two subsets formed by applying the one of the plurality of decision rules contains training samples associated with different pronunciation labels, wherein the one of the plurality of decision rules, when applied to word strings in the first set of training samples, determines whether at least one selected word indicator is present in the word strings, and wherein at least one training sample in the first set of training samples is retained for computing the measure of impurity corresponding to the one of the plurality of decision rules even if the at least one selected word indicator is absent in the word string of the at least one training sample; and

    selecting, for the first node of the decision tree, a decision rule from the plurality of decision rules based at least in part on the measures of impurity computed for the plurality of decision rules.
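
The node-selection steps recited above can be illustrated with a minimal sketch. It is not the patented implementation: the claim only requires "a measure of impurity", and the weighted Gini index used here is an assumed choice. The function names, the homograph "bass", the labels "fish" and "music", and the candidate word indicators are all hypothetical. Each decision rule tests whether one indicator word is present in the word string, and samples in which the indicator is absent are kept in their own subset so that they still contribute to the impurity score.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of pronunciation labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(samples, indicator):
    """Weighted impurity of the two subsets produced by one decision rule.

    The rule asks whether `indicator` occurs in the word string. Samples
    lacking the indicator are retained in the "absent" subset rather than
    discarded, so they still count toward the score.
    """
    present = [label for text, label in samples if indicator in text.split()]
    absent = [label for text, label in samples if indicator not in text.split()]
    n = len(samples)
    return (len(present) / n) * gini(present) + (len(absent) / n) * gini(absent)


def select_rule(samples, indicators):
    """Pick the indicator whose split has the lowest weighted impurity."""
    return min(indicators, key=lambda word: split_impurity(samples, word))


# Hypothetical training samples: (word string, pronunciation label).
samples = [
    ("the bass swam upstream", "fish"),
    ("he caught a bass in the lake", "fish"),
    ("she plays bass guitar", "music"),
    ("turn up the bass on the guitar amp", "music"),
]
indicators = ["guitar", "the", "lake"]

best = select_rule(samples, indicators)
print(best)  # guitar
```

With these toy samples the rule "contains guitar" separates the two pronunciations perfectly, so it would be selected for the first node. A child node would repeat the same procedure on one of the resulting subsets.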
