MULTI-LINGUAL WORD HYPHENATION USING INDUCTIVE MACHINE LEARNING ON TRAINING DATA
Abstract
Tools and techniques are described for providing multi-lingual word hyphenation using inductive machine learning on training data. Methods provided by these techniques may receive training data that includes hyphenated words, and may inductively generate hyphenation patterns that represent substrings of these words. The hyphenation patterns may include the substrings and hyphenation codes associated with characters occurring in the substrings. The methods may receive induction parameters applicable to generating the hyphenation patterns, and may store the hyphenation patterns into a language-specific lexicon file. These methods may also receive requests to hyphenate input words that occur in a human language, and may evaluate how to process the request based on the language. The methods may search for hyphenation patterns occurring in the input words, with the hyphenation patterns being stored in the lexicon file. Finally, the methods may respond to the request, indicating whether the hyphenation patterns occurred in the input words.
20 Claims
1. A method comprising:
receiving training data that include a plurality of hyphenated words;
inductively generating hyphenation patterns that represent substrings occurring within the words, wherein the hyphenation patterns include at least the substrings and include hyphenation codes associated respectively with characters occurring in the substrings, wherein the hyphenation codes identify hyphenation points within the patterns;
receiving at least one induction parameter applicable to generating the hyphenation patterns; and
storing at least the substrings and the hyphenation codes into a language-specific lexicon file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
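The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the JSON lexicon layout, the `.` word-boundary marker, and the three induction parameters (`min_len`, `max_len`, `min_support`) are all assumptions made for illustration. Each pattern is a substring plus one code per character, where a code of 1 means that every training occurrence of the substring had a hyphen right after that character.

```python
import json
from collections import Counter


def parse_hyphenated(entry):
    """Split 'hy-phen' into ('hyphen', {2}); offsets count letters before a break."""
    letters, breaks = [], set()
    for ch in entry:
        if ch == "-":
            breaks.add(len(letters))
        else:
            letters.append(ch)
    return "".join(letters), breaks


def induce_patterns(training_words, min_len=2, max_len=4, min_support=1):
    """Return {substring: [code per character]} (hypothetical pattern format).

    min_len, max_len and min_support stand in for the induction parameters;
    their names and meanings are assumptions, not taken from the source.
    """
    good, bad = Counter(), Counter()
    for entry in training_words:
        word, breaks = parse_hyphenated(entry.lower())
        padded = "." + word + "."  # '.' marks the word boundaries
        for start in range(len(padded)):
            for length in range(min_len, max_len + 1):
                if start + length > len(padded):
                    break
                sub = padded[start:start + length]
                # Position i is the gap right after character i of the substring.
                for i in range(length - 1):
                    key = (sub, i)
                    if start + i in breaks:
                        good[key] += 1
                    else:
                        bad[key] += 1
    patterns = {}
    for (sub, i), count in good.items():
        # Keep a hyphenation point only if the training data never contradicts it.
        if count >= min_support and bad[(sub, i)] == 0:
            patterns.setdefault(sub, [0] * len(sub))[i] = 1
    return patterns


def write_lexicon(path, language, patterns):
    """Store substrings and codes in a per-language file (JSON is an assumption)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"language": language, "patterns": patterns}, f, ensure_ascii=False)
```

With the toy training set `["ta-ble", "ca-ble"]`, the substring `ab` is kept because both words break after the `a`, while `bl` is dropped because no hyphen ever falls between its characters.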
13. At least one computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a computer, cause the computer to perform a method comprising:
receiving at least one request to hyphenate at least one input word occurring in a human language;
evaluating how to process the request based at least in part on the human language;
searching for at least one hyphenation pattern occurring in the input word, wherein at least the hyphenation pattern is stored in a language-specific lexicon file that is created inductively based on training data; and
responding to the request, wherein the response at least indicates whether the at least one hyphenation pattern occurred in the input word.
Dependent claims: 14, 15, 16, 17, 18
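The request-handling flow of claim 13 might look roughly like the sketch below. Everything named here is hypothetical: the in-memory `LEXICONS` table stands in for the language-specific lexicon files, the response is a plain dictionary, and each pattern carries one code per character, with 1 meaning a hyphen may follow that character.

```python
# Hypothetical in-memory stand-in for per-language lexicon files.
LEXICONS = {
    "en": {"ab": [1, 0]},  # a hyphen may follow the 'a' in 'ab'
}


def hyphenate(word, language, lexicons=LEXICONS):
    """Answer one hyphenation request; the response format is an assumption."""
    # Decide how to process the request from its language: pick that lexicon.
    patterns = lexicons.get(language)
    if patterns is None:
        return {"word": word, "matched": False, "hyphenated": word}

    padded = "." + word.lower() + "."  # '.' marks the word boundaries
    breaks = set()
    matched = False
    for sub, codes in patterns.items():
        start = padded.find(sub)
        while start != -1:
            matched = True
            for i, code in enumerate(codes):
                if code:
                    # Gap after padded[start + i] equals word offset start + i.
                    breaks.add(start + i)
            start = padded.find(sub, start + 1)

    pieces = []
    for idx, ch in enumerate(word):
        if idx > 0 and idx in breaks:
            pieces.append("-")
        pieces.append(ch)
    # The response reports whether any stored pattern occurred in the word.
    return {"word": word, "matched": matched, "hyphenated": "".join(pieces)}
```

A request for a language with no lexicon returns the word unchanged with `matched` set to `False`, which keeps the response shape the same in every case.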
19. A word hyphenation system comprising:
at least one server adapted to:
  receive training data that includes a plurality of hyphenated words;
  based on at least one induction parameter, inductively generate hyphenation patterns that represent substrings occurring within the words, wherein the hyphenation patterns include at least the substrings and hyphenation codes associated respectively with characters occurring in the substrings;
  receive at least one induction parameter applicable to generating the hyphenation patterns; and
  store at least the substrings and the hyphenation codes into a lexicon file specific to a human language;
wherein the server is further adapted to:
  receive at least one request to hyphenate at least one input word occurring in the human language;
  evaluate how to process the request based at least in part on the human language;
  search for at least one hyphenation pattern occurring in the input word, wherein at least the hyphenation pattern is stored in the language-specific lexicon file; and
  respond to the request, wherein the response at least indicates whether the at least one hyphenation pattern occurred in the input word, wherein the server may achieve complete accuracy in hyphenating input words that occur within the training data, and wherein the server may achieve at least a lower bound on accuracy in hyphenating input words that do not occur in the training data; and
at least one client system adapted to send the request and to receive the response thereto.
Dependent claims: 20
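Claim 19 distinguishes accuracy on words that occur in the training data from accuracy on words that do not. One way to check such statements is to score the two sets separately, as in the sketch below. The helper names are hypothetical, and the memorizing wrapper is only an illustrative assumption that makes the training-set score trivially 1.0; the claim does not say how complete accuracy is obtained.

```python
def strip_hyphens(entry):
    """Turn a reference form such as 'ta-ble' back into the raw word 'table'."""
    return entry.replace("-", "")


def exact_match_accuracy(hyphenate_fn, reference_words):
    """Fraction of reference words whose full hyphenation is reproduced exactly."""
    if not reference_words:
        raise ValueError("reference_words must not be empty")
    correct = sum(
        1 for ref in reference_words if hyphenate_fn(strip_hyphens(ref)) == ref
    )
    return correct / len(reference_words)


def make_memorizing_hyphenator(training_words, fallback):
    """Wrap a fallback hyphenator with a lookup table of the training words.

    Assumption for illustration only: memorizing the training data is one
    simple way to be exact on it, not necessarily what the source does.
    """
    table = {strip_hyphens(ref): ref for ref in training_words}

    def hyphenate(word):
        if word in table:
            return table[word]
        return fallback(word)

    return hyphenate
```

Scoring a held-out list with `exact_match_accuracy` gives an empirical estimate for unseen words; whether that estimate meets a given lower bound depends on the fallback hyphenator and on the data.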
Specification