System and method for word-sense disambiguation by recursive partitioning
Abstract
A device and related methods for word-sense disambiguation during a text-to-speech conversion are provided. The device, for use with a computer-based system capable of converting text data to synthesized speech, includes an identification module for identifying a homograph contained in the text data. The device also includes an assignment module for assigning a pronunciation to the homograph using a statistical test constructed from a recursive partitioning of training samples, each training sample being a word string containing the homograph. The recursive partitioning is based on determining for each training sample an order and a distance of each word indicator relative to the homograph in the training sample. An absence of one of the word indicators in a training sample is treated as equivalent to the absent word indicator being more than a predefined distance from the homograph.
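The order-and-distance encoding described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the function name `indicator_features`, the `(order, distance)` tuple layout, and the default `max_distance` are assumptions made for illustration only. An indicator that is absent from the word string receives the same value as one lying beyond the predefined distance.

```python
from typing import Dict, List, Tuple


def indicator_features(
    tokens: List[str],
    homograph: str,
    indicators: List[str],
    max_distance: int = 3,  # hypothetical "predefined distance"
) -> Dict[str, Tuple[int, int]]:
    """Return {indicator: (order, distance)} relative to the homograph.

    order is -1 if the indicator precedes the homograph, +1 if it follows,
    and 0 if it is out of range. distance is the number of word positions,
    capped at max_distance + 1 for out-of-range indicators.
    """
    target = tokens.index(homograph)  # first occurrence of the homograph
    features = {}
    for word in indicators:
        offsets = [
            i - target
            for i, tok in enumerate(tokens)
            if tok == word and i != target
        ]
        nearest = min(offsets, key=abs) if offsets else None
        if nearest is None or abs(nearest) > max_distance:
            # Absent and too-far indicators are deliberately indistinguishable.
            features[word] = (0, max_distance + 1)
        else:
            features[word] = (1 if nearest > 0 else -1, abs(nearest))
    return features


example = indicator_features(
    "he played the bass guitar".split(),
    homograph="bass",
    indicators=["guitar", "fish", "played"],
    max_distance=2,
)
```

Here `guitar` follows the homograph at distance 1, `played` precedes it at distance 2, and `fish`, which does not occur, is encoded exactly as if it were more than two words away.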
18 Claims
1. A device for use with a computer-based system capable of converting text data to synthesized speech, the device comprising:
an identification module for identifying a homograph contained in the text data; and
an assignment module for assigning a pronunciation to the homograph using a statistical test constructed from a recursive partitioning of a plurality of training samples, each training sample comprising a word string containing the homograph;
the recursive partitioning being based on determining for each of a plurality of word indicators an order and a distance of each word indicator relative to the homograph in each training sample, wherein an absence of one of the plurality of word indicators in a training sample is treated as an equivalent to the absent word indicator being more than a predefined distance from the homograph.
Dependent claims: 2, 3, 4, 5, 6
7. A method of electronically disambiguating homographs during a computer-based text-to-speech event, the method comprising:
identifying a homograph contained in a text; and
determining a pronunciation for the homograph using a statistical test constructed from a recursive partitioning of a plurality of training samples, each training sample comprising a word string containing the homograph;
the recursive partitioning being based on determining for each of a plurality of word indicators an order and a distance of each word indicator relative to the homograph in each training sample, wherein an absence of one of the plurality of word indicators in a training sample is treated as an equivalent to the absent word indicator being more than a predefined distance from the homograph.
Dependent claims: 8, 9, 10, 11, 12
13. A computer-implemented method of constructing a statistical test for determining a pronunciation of a homograph encountered during an electronic text-to-speech conversion event, the method comprising:
selecting a set of training samples, each training sample comprising a word string containing the homograph; and
recursively partitioning the set of training samples, the recursive partitioning producing a decision tree for determining the pronunciation and being based on determining for each of a plurality of word indicators an order and a distance of each word indicator relative to the homograph in each training sample, wherein an absence of one of the plurality of word indicators in a training sample is treated as an equivalent to the absent word indicator being more than a predefined distance from the homograph.
Dependent claims: 14, 15, 16, 17, 18
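The tree construction recited in claim 13 can be sketched roughly as follows. This is an illustrative assumption, not the method disclosed in the specification: the split criterion (Gini impurity), the question format `(word, side, dist)`, the constant `MAX_DIST`, the pronunciation labels, and all function names are hypothetical. Each question asks whether an indicator lies within `dist` words on a given side of the homograph, and an absent indicator always answers "no", just as a too-distant one would.

```python
from collections import Counter

MAX_DIST = 3  # hypothetical predefined distance


def offset(tokens, homograph, word):
    """Signed offset of the nearest `word` from the homograph, or None if absent."""
    t = tokens.index(homograph)
    hits = [i - t for i, tok in enumerate(tokens) if tok == word and i != t]
    return min(hits, key=abs) if hits else None


def passes(tokens, homograph, question):
    """Is `word` within `dist` words on `side` (-1 before, +1 after)?"""
    word, side, dist = question
    off = offset(tokens, homograph, word)
    if off is None:  # absent indicator: same answer as "farther than dist"
        return False
    return off * side > 0 and abs(off) <= dist


def gini(samples):
    """Gini impurity of the pronunciation labels in `samples`."""
    counts = Counter(label for _, label in samples)
    return 1.0 - sum((c / len(samples)) ** 2 for c in counts.values())


def build(samples, homograph, indicators):
    """Recursively partition (tokens, label) pairs into a decision tree.

    A leaf is a label string; an inner node is (question, yes_tree, no_tree).
    """
    labels = Counter(label for _, label in samples)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1:
        return majority
    best = None
    for word in indicators:
        for side in (-1, 1):
            for dist in range(1, MAX_DIST + 1):
                q = (word, side, dist)
                yes = [s for s in samples if passes(s[0], homograph, q)]
                no = [s for s in samples if not passes(s[0], homograph, q)]
                if not yes or not no:
                    continue  # question does not separate the samples
                score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(samples)
                if best is None or score < best[0]:
                    best = (score, q, yes, no)
    if best is None:
        return majority
    _, q, yes, no = best
    return (q, build(yes, homograph, indicators), build(no, homograph, indicators))


def classify(tree, tokens, homograph):
    """Walk the tree until a leaf (pronunciation label) is reached."""
    while isinstance(tree, tuple):
        q, yes, no = tree
        tree = yes if passes(tokens, homograph, q) else no
    return tree


# Toy training set; labels are made-up pronunciation tags.
train = [
    ("he played the bass guitar".split(), "B-EY-S"),
    ("she played bass in a band".split(), "B-EY-S"),
    ("he caught a large bass".split(), "B-AE-S"),
    ("the bass is a fish".split(), "B-AE-S"),
]
tree = build(train, "bass", ["guitar", "played", "caught", "fish"])
```

On this toy data the tree reduces to a single question about `played` appearing shortly before the homograph. A sentence that lacks `played` and one where `played` is four words away both follow the same "no" branch.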
Specification