Generating words and names using N-grams of phonemes
Abstract
Generating words and/or names, comprising: receiving at least one corpus based on a given language; generating a plurality of N-grams of phonemes and a plurality of frequencies of occurrence using the corpus, such that each frequency of occurrence corresponds to a respective pair of phonemes and indicates the frequency of the second phoneme in the pair following the first phoneme in the pair; generating a phoneme tree using the plurality of N-grams of phonemes and the plurality of frequencies of occurrence; performing a random walk on the phoneme tree using the frequencies of occurrence to generate a sequence of phonemes; and mapping the sequence of phonemes into a sequence of graphemes.
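The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes N = 2 (phoneme pairs), assumes the corpus has already been transcribed into phoneme lists, and stores the pair frequencies in a nested table rather than an explicit tree. The names `bigram_frequencies`, `random_walk`, `to_graphemes`, the `START`/`END` markers, and the toy phoneme-to-grapheme table are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers; the source does not name them.
START, END = "<s>", "</s>"

def bigram_frequencies(phoneme_words):
    """Count how often each phoneme follows another across the corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for phonemes in phoneme_words:
        seq = [START, *phonemes, END]
        for first, second in zip(seq, seq[1:]):
            counts[first][second] += 1
    return counts

def random_walk(counts, rng, max_len=12):
    """Pick each next phoneme with probability proportional to its count."""
    out, current = [], START
    while len(out) < max_len:
        successors = counts[current]
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        current = nxt
    return out

def to_graphemes(phonemes, table):
    """Map each phoneme to a grapheme string via a lookup table."""
    return "".join(table[p] for p in phonemes)

# Toy corpus (assumed data): "cat", "car", "tar" as phoneme lists.
corpus = [["K", "AE", "T"], ["K", "AA", "R"], ["T", "AA", "R"]]
table = {"K": "c", "AE": "a", "T": "t", "AA": "a", "R": "r"}

counts = bigram_frequencies(corpus)
name = to_graphemes(random_walk(counts, random.Random(0)), table)
print(name)
```

Because the walk is weighted by the observed counts, frequent phoneme transitions in the corpus appear more often in the generated strings, while the `max_len` cap (an assumption of this sketch) guarantees termination.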
45 Claims
1. A method for generating words and/or names, comprising:
receiving at least one corpus based on a given language;
generating a plurality of N-grams of phonemes and a plurality of frequencies of occurrence using said at least one corpus, such that each frequency of occurrence corresponds to a respective pair of phonemes and indicates the frequency of the second phoneme in the pair following the first phoneme in the pair;
generating a phoneme tree using said plurality of N-grams of phonemes and said plurality of frequencies of occurrence;
performing a random walk on said phoneme tree using said frequencies of occurrence to generate a sequence of phonemes; and
mapping said sequence of phonemes into a sequence of graphemes.
16. An apparatus for generating words and/or names, comprising:
means for receiving at least one corpus based on a given language;
first means for generating a plurality of N-grams of phonemes and a plurality of frequencies of occurrence using said at least one corpus, such that each frequency of occurrence corresponds to a respective pair of phonemes and indicates the frequency of the second phoneme in the pair following the first phoneme in the pair;
second means for generating a phoneme tree using said plurality of N-grams of phonemes and said plurality of frequencies of occurrence;
means for performing a random walk on said phoneme tree using said frequencies of occurrence to generate a sequence of phonemes; and
means for mapping said sequence of phonemes into a sequence of graphemes.
32. An apparatus for generating words and/or names, comprising:
an input/output interface to receive at least one corpus based on a given language;
a processor configured to decompose said at least one corpus into a sequence of words, to generate a plurality of N-grams of phonemes and a plurality of frequencies of occurrence using said sequence of words and a dictionary, such that each frequency of occurrence corresponds to a respective pair of phonemes and indicates the frequency of the second phoneme in the pair following the first phoneme in the pair, and to generate a phoneme tree using said plurality of N-grams of phonemes and said plurality of frequencies of occurrence;
a storage for storing said dictionary, said phoneme tree, and a phoneme-to-grapheme lookup table, wherein said phoneme tree is configured with a root at the top and a plurality of nodes emanating from said root, said plurality of nodes connected in such a way that a pair of connected nodes are connected by a path from the first node in the pair to the second node in the pair, a pair of connected nodes in said phoneme tree corresponds to a pair of phonemes having a corresponding frequency of occurrence, so that the first node in a pair of nodes represents the first phoneme in the corresponding pair of phonemes, the second node in that pair of nodes represents the second phoneme in that pair of phonemes, and the path connecting the nodes of that pair of nodes represents the frequency of occurrence, and said processor retrieving said phoneme tree to perform a random walk on said phoneme tree using said frequencies of occurrence to generate a sequence of phonemes, and maps said sequence of phonemes into a sequence of graphemes using said phoneme-to-grapheme lookup table.
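One way to picture the tree layout recited in claim 32 is sketched below in Python. This is an interpretation under stated assumptions, not the claimed apparatus: the corpus is split on whitespace, the dictionary is a plain word-to-phoneme mapping, the pair graph is unfolded into a tree of bounded depth, and each edge stores the frequency of its phoneme pair. `Node`, `pair_counts`, `build_tree`, `walk`, the `ROOT`/`END` markers, and the depth limit are hypothetical names and choices.

```python
import random
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical markers for the root and for word endings.
ROOT, END = "<root>", "<end>"

@dataclass
class Node:
    phoneme: str
    # Each edge is (frequency of the phoneme pair, child node).
    edges: list = field(default_factory=list)

def pair_counts(corpus, dictionary):
    """Decompose the corpus into words and count adjacent phoneme pairs."""
    counts = Counter()
    for word in corpus.lower().split():
        phonemes = dictionary.get(word)
        if phonemes is None:  # words missing from the dictionary are skipped
            continue
        seq = [ROOT, *phonemes, END]
        counts.update(zip(seq, seq[1:]))
    return counts

def build_tree(counts, phoneme=ROOT, depth=6):
    """Unfold the pair counts into a tree with the root at the top."""
    node = Node(phoneme)
    if depth > 0 and phoneme != END:
        for (first, second), freq in counts.items():
            if first == phoneme:
                node.edges.append((freq, build_tree(counts, second, depth - 1)))
    return node

def walk(node, rng):
    """Descend from the root, choosing edges in proportion to frequency."""
    out = []
    while node.edges:
        weights = [freq for freq, _ in node.edges]
        _, node = rng.choices(node.edges, weights=weights)[0]
        if node.phoneme != END:
            out.append(node.phoneme)
    return out

# Toy dictionary (assumed data).
dictionary = {"cat": ["K", "AE", "T"], "car": ["K", "AA", "R"], "tar": ["T", "AA", "R"]}
tree = build_tree(pair_counts("Cat car tar", dictionary))
print(walk(tree, random.Random(0)))
```

Bounding the depth keeps the unfolded tree finite even when the phoneme pairs form cycles; a real system could instead store the pairs as a graph and walk it lazily.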
33. A computer program, stored in a tangible storage medium, for generating words and/or names, the program comprising executable instructions that cause a computer to:
receive at least one corpus based on a given language;
generate a plurality of N-grams of phonemes and a plurality of frequencies of occurrence using said at least one corpus, such that each frequency of occurrence corresponds to a respective pair of phonemes and indicates the frequency of the second phoneme in the pair following the first phoneme in the pair;
generate a phoneme tree using said plurality of N-grams of phonemes and said plurality of frequencies of occurrence;
perform a random walk on said phoneme tree using said frequencies of occurrence to generate a sequence of phonemes; and
map said sequence of phonemes into a sequence of graphemes.
Specification