Generating large units of graphonemes with mutual information criterion for letter to sound conversion
First Claim
1. A method of segmenting words into component parts, the method comprising:
determining mutual information scores for graphoneme units, each graphoneme unit comprising at least one letter in the spelling of a word;
using the mutual information scores to combine graphoneme units into a larger graphoneme unit; and
segmenting words into component parts to form a sequence of graphonemes.
Abstract
A method and apparatus are provided for segmenting words into component parts. Under the invention, mutual information scores for pairs of graphoneme units found in a set of words are determined. Each graphoneme unit includes at least one letter. The graphoneme units of one pair of graphoneme units are combined based on the mutual information score. This forms a new graphoneme unit. Under one aspect of the invention, a syllable n-gram model is trained based on words that have been segmented into syllables using mutual information. The syllable n-gram model is used to segment a phonetic representation of a new word into syllables. Similarly, an inventory of morphemes is formed using mutual information and a morpheme n-gram is trained that can be used to segment a new word into a sequence of morphemes.
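The pair-scoring and merging step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation. The abstract does not give the scoring formula, so the probability-weighted form `P(a,b) * log(P(a,b) / (P(a) * P(b)))` is an assumption. The unit representation (a tuple of letters and a tuple of phones), the function names `mi_scores`, `merge_pair`, and `merge_best`, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter

# A graphoneme unit is modeled here as (letters, phones), e.g. ("s", ("sh",)).
# This representation is an assumption made for illustration only.

def mi_scores(words):
    """Score every adjacent pair of units found in the word set.

    Assumed formula: P(a,b) * log(P(a,b) / (P(a) * P(b))).
    """
    unit_counts = Counter()
    pair_counts = Counter()
    for units in words:
        unit_counts.update(units)
        pair_counts.update(zip(units, units[1:]))
    n_units = sum(unit_counts.values())
    n_pairs = sum(pair_counts.values())
    scores = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n_pairs
        p_a = unit_counts[a] / n_units
        p_b = unit_counts[b] / n_units
        scores[(a, b)] = p_ab * math.log(p_ab / (p_a * p_b))
    return scores

def merge_pair(units, pair):
    """Replace each left-to-right occurrence of `pair` with one larger unit."""
    a, b = pair
    merged = (a[0] + b[0], a[1] + b[1])
    out, i = [], 0
    while i < len(units):
        if i + 1 < len(units) and (units[i], units[i + 1]) == pair:
            out.append(merged)
            i += 2
        else:
            out.append(units[i])
            i += 1
    return out

def merge_best(words):
    """Combine the highest-scoring pair across the whole word set."""
    scores = mi_scores(words)
    if not scores:
        return words, None
    best = max(scores, key=scores.get)
    return [merge_pair(units, best) for units in words], best

# Toy corpus (hypothetical): "ship", "shop", "hip" as single-letter units.
corpus = [
    [("s", ("sh",)), ("h", ()), ("i", ("ih",)), ("p", ("p",))],
    [("s", ("sh",)), ("h", ()), ("o", ("aa",)), ("p", ("p",))],
    [("h", ("hh",)), ("i", ("ih",)), ("p", ("p",))],
]
merged_corpus, best_pair = merge_best(corpus)
```

Calling `merge_best` repeatedly on its own output would grow progressively larger units. When to stop (a score threshold or a target inventory size, for example) is not stated in the abstract and is left open here.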
17 Claims
1. A method of segmenting words into component parts, the method comprising:
determining mutual information scores for graphoneme units, each graphoneme unit comprising at least one letter in the spelling of a word;
using the mutual information scores to combine graphoneme units into a larger graphoneme unit; and
segmenting words into component parts to form a sequence of graphonemes.
Dependent claims: 2, 3, 4, 5, 6
7. A computer-readable medium having computer-executable instructions for performing steps comprising:
determining mutual information scores for pairs of graphoneme units found in a set of words, each graphoneme unit comprising at least one letter;
combining the graphoneme units of one pair of graphoneme units to form a new graphoneme unit based on the mutual information scores; and
identifying a set of graphoneme units for a word based in part on the new graphoneme unit.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15
16. A method of segmenting a word into syllables, the method comprising:
segmenting a set of words into phonetic syllables using mutual information scores;
using the segmented set of words to train a syllable n-gram model; and
using the syllable n-gram model to segment a phonetic representation of a word into syllables via forced alignment.
17. A method of segmenting a word into morphemes, the method comprising:
segmenting a set of words into morphemes using mutual information scores;
using the segmented set of words to train a morpheme n-gram model; and
using the morpheme n-gram model to segment a word into morphemes via forced alignment.
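The train-then-segment pattern shared by claims 16 and 17 can be sketched as below. This is an illustrative simplification, not the claimed method: it uses a unigram model (n = 1) with no smoothing, and it treats forced alignment as a dynamic-programming search for the highest-probability sequence of known units that exactly reproduces the input. The function names `train_unigram` and `segment`, the `max_len` parameter, and the toy training data are hypothetical.

```python
import math
from collections import Counter

def train_unigram(segmented_words):
    """Estimate log-probabilities of units from already-segmented words."""
    counts = Counter(unit for word in segmented_words for unit in word)
    total = sum(counts.values())
    return {unit: math.log(c / total) for unit, c in counts.items()}

def segment(word, logprob, max_len=6):
    """Return the best-scoring split of `word` into known units, or None.

    best[end] holds the best log-probability of any segmentation of
    word[:end]; back[end] holds where the last unit of that split starts.
    """
    n = len(word)
    best = [-math.inf] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = word[start:end]
            if piece in logprob and best[start] + logprob[piece] > best[end]:
                best[end] = best[start] + logprob[piece]
                back[end] = start
    if best[n] == -math.inf:
        return None  # the inventory cannot cover the whole word
    pieces, end = [], n
    while end > 0:
        pieces.append(word[back[end]:end])
        end = back[end]
    return pieces[::-1]

# Toy training set (hypothetical), standing in for words already
# segmented with mutual information scores.
training = [["un", "do"], ["re", "do"], ["un", "tie"], ["re", "tie"], ["do", "ing"]]
model = train_unigram(training)
print(segment("undoing", model))  # ['un', 'do', 'ing']
```

The same search would apply to syllables if the input were a sequence of phones and the inventory held phone tuples instead of letter strings. A higher-order n-gram model would condition each unit's score on the preceding units, which requires tracking that history in the search state.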
Specification