Improved translation system utilizing a morphological stripping process to reduce words to their root configuration to produce reduction of database size
First Claim
1. An improved machine translation system having a natural language source module for accepting externally introduced text in said source language, said module including a lexical database, said system being broadly based upon the concept of Chaos and conducts a divergent search in the source language, a morpheme root database, and further including a morphological word stripping means, said means to be implemented on a data processing device, said system source module includes means implementing a method having the steps whereby each of the words in a subject clause, phrase, or sentence of said externally introduced source language text are individually compared first to data in said lexical database and if said individual words are not found among said data in said lexical database then means are provided whereby said words are subjected to said morphological word stripping means, said stripping means being directed to the affixes of said words and first to the stripping of suffixes, if any, from each said word followed by the step of comparing an individual stripped word, in the absence of that particular word's stripped suffix, with the data in said morpheme root database, which comparison normally proceeds downward through descending length character strings until a morpheme root match is found, further stripping and comparison with said database are repeated as often as required to find a root match.
Abstract
A machine translation system having a natural language source module for accepting externally introduced text in the source language. The system is broadly based upon the concept of Chaos and conducts a divergent search in the source language. It includes a morpheme root database and a morphological word stripping means that is to be implemented on a data processing device. The system source module provides the steps whereby each of the words in a subject clause, phrase, or sentence of the externally introduced source language text is individually compared first to data in a lexical database. If an individual word is not found among the data in the lexical database, the word is subjected to the morphological word stripping means, which is directed to the affixes of the word and first to the stripping of suffixes, if any. The stripped word, in the absence of its stripped suffix, is then compared with the data in the morpheme root database; this comparison normally proceeds downward through descending length character strings until a morpheme root match is found. The stripping and comparison with the database are repeated as often as required to find a root match.
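The lookup-then-strip procedure summarized in the abstract can be illustrated with a minimal Python sketch. This is one possible reading, not the patented implementation: the database contents, the affix lists, the `find_root` name, and the `max_rounds` limit are all hypothetical. The "descending length" comparison is interpreted here as trying longer suffixes first and re-checking the progressively shorter stem after every strip.

```python
from typing import Optional

# Hypothetical toy data; the source does not list any database contents.
LEXICAL_DB = {"walk", "the", "happy"}
ROOT_DB = {"walk", "happi", "kind"}
SUFFIXES = ["ness", "ing", "ly", "ed", "s"]
PREFIXES = ["un", "re"]


def _first_affix(stem: str, affixes: list, at_end: bool) -> Optional[str]:
    """Return the longest affix that can be stripped from stem, if any."""
    for affix in sorted(affixes, key=len, reverse=True):
        matches = stem.endswith(affix) if at_end else stem.startswith(affix)
        if matches and len(stem) > len(affix):
            return affix
    return None


def find_root(word: str, max_rounds: int = 5) -> Optional[dict]:
    """Return the matched root plus stripped affixes, or None if no match."""
    # Step 1: compare the whole word with the lexical database first.
    if word in LEXICAL_DB:
        return {"root": word, "suffixes": [], "prefixes": []}

    stem = word
    suffixes: list = []
    prefixes: list = []
    for _ in range(max_rounds):
        # Step 2: strip a suffix if one is present; otherwise try a prefix.
        suffix = _first_affix(stem, SUFFIXES, at_end=True)
        if suffix is not None:
            stem = stem[: -len(suffix)]
            suffixes.append(suffix)
        else:
            prefix = _first_affix(stem, PREFIXES, at_end=False)
            if prefix is None:
                return None  # nothing left to strip and no match found
            stem = stem[len(prefix):]
            prefixes.append(prefix)
        # Step 3: compare the shortened stem with the morpheme root database.
        if stem in ROOT_DB:
            return {"root": stem, "suffixes": suffixes, "prefixes": prefixes}
    return None
```

For example, `find_root("unkindness")` strips `ness`, fails to match `unkind`, then strips `un` and matches `kind`. Keeping the stripped affixes alongside the root mirrors the idea that only roots need to be stored, which is where the claimed reduction in database size would come from.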
14 Claims
11. A machine translation system for translating text from a first national natural source language to a second national natural target language through a universal machine method adapted to be implemented on a data processing device including a first module having a lexical database identifiable with said source language and said first module including means capable of performing a syntactic and lexical analysis on said text and attaching informational tags on each word of said text, a universal intermediate second module providing an interface having an operating environment for display to a user and a basis for issuing commands and receiving information, said second module also including a lexical database in an intermediate international created language that is capable of accepting said syntactic and lexical analysis of said text from said first module and including means for translating said source language words carrying said informational tags into said international created language while retaining said informational tags, and a third target module having a lexical database identifiable with said second national natural target language, and including means to accept said intermediate created language with its tagged words of said text and proceed to translate the text into the target national natural language, said second module being universally accepted by a multiplicity of differing national languages each of which has one of its own said first source module or one of its own said third target module;
- said first module also including a root word morpheme database, and having means whereby any individual words of said source text which cannot be initially matched with a word in said first module lexical database are then subjected to morphological stripping of endings and prefixes until the root of said words can be matched with said root word morpheme database, appropriate designating tags are attached to each said root word indicating, but not limited to, the root word designator, type of word, tense, gender, pluralism, and particular ending or prefix morpheme stripped therefrom, means are provided so that appropriate morphemes can be added to the translated root word in the target language of said third module, said system further including means for inputting text into file means in said first module, said machine method includes means adapted to read the said input file a character at a time until it reaches some form of punctuation which terminates a statement, including periods, commas, exclamation marks, dashes, ellipsis, question marks;
said last mentioned means is directed to process only one statement at a time and all punctuation falls through as is appropriate;
means are provided wherein each word in the statement is looked up in the lexical database, if no match is found for an individual word the lexical database returns an error code;
said individual words returned with an error code go to a morphology database including means which strips successive affixes, including suffixes or prefixes, off said word and modifies it to determine if the root of said word is in the lexical database, such a determination is made by checking said word against said database each time a morpheme is stripped from said word, and repeated until a match is found, said lexical database returns grammatical information about each of said words, however, said morphology database includes means that has the power to supersede this grammatical information during said stripping operation, however, if said word is of the type that may be many different parts of speech, including a verb, noun, adjective, adverb, article or preposition, and is ambiguous and/or did not pass through morphological stripping;
means are provided for an indeterminate flag to be set and additional means are provided whereby a grammatical analysis is performed by examination of the proximal words, if the said word is the first word followed by a noun the probability of it being adjectival is very high, if, on the other hand, the word before said word is an article said word must be a noun, in either event said word is appropriately flagged as to word type, once said word type has been resolved, in the lexical database, means are provided whereby it is tagged as to type and the proper individual identification for said word, which identification remains the same regardless of what language or what module the text may reside, if said word has multiple possibilities as to its type, as set forth above, namely including verb, noun, adjective, adverb, article, or preposition, then means are provided whereby a heuristic approach is utilized and it will appear as many times as there are possibilities, lookups are repeated a plurality of times until no ambiguities are remaining, said system further including program means whereby verbs are identified next by starting at the end of the sentence and/or statement and working forward until a first main verb is located, said system program means is intelligent since it stops processing when it encounters any additional main verbs or definite clause markers or punctuation, said program means then continues and if a verb is marked as an infinitive said program means moves on to further translation, verbs are tensed and during this process the said program means checks for modals and auxiliary verbs and sets them aside for later treatment. - View Dependent Claims (12, 13, 14)
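Two of the steps recited in claim 11 lend themselves to a short illustration: reading the input a character at a time until statement-terminating punctuation is reached, and resolving an indeterminate word type from its proximal words. The Python sketch below is only an assumption-laden reading of the claim language. The names `read_statements` and `resolve_type`, the tag strings, and the terminator set are hypothetical, and dashes are left out of the terminator set because a plain hyphen also occurs inside words.

```python
from typing import Iterator, List, Optional

# Assumed terminator set; the claim also lists dashes, omitted here.
TERMINATORS = set(".,!?\u2026")


def read_statements(text: str) -> Iterator[str]:
    """Yield one statement at a time, scanning character by character."""
    buffer: List[str] = []
    for ch in text:
        buffer.append(ch)
        if ch in TERMINATORS:
            statement = "".join(buffer).strip()
            if statement:
                yield statement  # punctuation is kept as part of the statement
            buffer = []
    tail = "".join(buffer).strip()
    if tail:
        yield tail  # trailing text without closing punctuation


def resolve_type(tags: List[Optional[str]], i: int) -> Optional[str]:
    """Guess the type of the untyped word at index i from its neighbours."""
    # A word preceded by an article is taken to be a noun.
    if i > 0 and tags[i - 1] == "article":
        return "noun"
    # A first word followed by a noun is taken to be adjectival.
    if i == 0 and len(tags) > 1 and tags[1] == "noun":
        return "adjective"
    return None  # still indeterminate; the flag would stay set
```

For example, `list(read_statements("Hello, world! Is it done?"))` yields three statements, each processed on its own, and `resolve_type(["article", None, "verb"], 1)` returns `"noun"`. A word that neither rule resolves stays indeterminate, which corresponds to the claim's repeated lookups until no ambiguities remain.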
Specification