Improved translation system utilizing a morphological stripping process to reduce words to their root configuration to produce reduction of database size

US 5,490,061 A
Filed: 09/05/1989
Issued: 02/06/1996
Est. Priority Date: 02/05/1987
Status: Expired due to Fees

3 Assignments

0 Petitions

Abstract

A machine translation system having a natural language source module for accepting externally introduced text in the source language. The system is broadly based upon the concept of Chaos and conducts a divergent search in the source language, a morpheme root database, and further includes a morphological word stripping means that is to be implemented on a data processing device. The system source module provides the steps whereby each of the words in a subject clause, phrase, or sentence of the externally introduced source language text are individually compared first to data in a lexical database and if the individual words are not found among the data in the lexical database then the words are subjected to the morphological word stripping means which are directed to the affixes of the words and first to the stripping of suffixes, if any, from each word followed by the step of comparing an individual stripped word, in the absence of that particular word's stripped suffix, with the data in the morpheme root database, which comparison normally proceeds downward through descending length character strings until a morpheme root match is found. The stripping and comparison with the database are repeated as often as required to find a root match.
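The lookup-then-strip procedure described in the abstract can be illustrated with a short Python sketch. The lexicon, morpheme root set, suffix table, and `MIN_ROOT` threshold are hypothetical toy data, not the patent's databases, and reading "descending length character strings" as trying the longest leading substring of the stripped word first is an interpretation, not the patent's stated implementation.

```python
from typing import Optional

# Hypothetical toy data: the patent does not publish its lexical database,
# morpheme root database, or suffix table.
LEXICON = {"walk": "verb", "happy": "adjective"}
MORPHEME_ROOTS = {"walk", "happi", "nation"}
SUFFIXES = sorted(["ness", "ing", "ed", "al", "s"], key=len, reverse=True)
MIN_ROOT = 3  # assumed: never match or leave a root shorter than this


def find_root(word: str) -> Optional[tuple[str, list[str]]]:
    """Return (root, stripped morphemes, outermost first), or None if no root is found."""
    # Step 1: compare the whole word with the lexical database.
    if word in LEXICON:
        return word, []
    current = word
    stripped: list[str] = []
    while True:
        # Step 2: strip one suffix (longest match first), keeping at least MIN_ROOT characters.
        suffix = next(
            (s for s in SUFFIXES
             if current.endswith(s) and len(current) - len(s) >= MIN_ROOT),
            None,
        )
        if suffix is None:
            return None  # nothing left to strip and no root matched
        current = current[: -len(suffix)]
        stripped.append(suffix)
        # Step 3: compare descending-length leading strings with the root database.
        for length in range(len(current), MIN_ROOT - 1, -1):
            candidate = current[:length]
            if candidate in MORPHEME_ROOTS:
                residue = current[length:]
                if residue:
                    stripped.append(residue)  # characters dropped by the descending search
                return candidate, stripped
        # Step 4: no root matched yet, so loop and strip again.
```

In this sketch, `find_root("happiness")` strips `ness` and matches the root `happi`, while `find_root("nationals")` strips `s`, then the descending-length search matches `nation` and records the dropped `al` as a further stripped morpheme.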

14 Claims

1. An improved machine translation system having a natural language source module for accepting externally introduced text in said source language, said module including a lexical database, said system being broadly based upon the concept of Chaos and conducts a divergent search in the source language, a morpheme root database, and further including a morphological word stripping means, said means to be implemented on a data processing device, said system source module includes means implementing a method having the steps whereby each of the words in a subject clause, phrase, or sentence of said externally introduced source language text are individually compared first to data in said lexical database and if said individual words are not found among said data in said lexical database then means are provided whereby said words are subjected to said morphological word stripping means, said stripping means being directed to the affixes of said words and first to the stripping of suffixes, if any, from each said word followed by the step of comparing an individual stripped word, in the absence of that particular word's stripped suffix, with the data in said morpheme root database, which comparison normally proceeds downward through descending length character strings until a morpheme root match is found, further stripping and comparison with said database are repeated as often as required to find a root match.

2. A machine translation system as claimed in claim 1 wherein the method utilizing said word affix stripping means also includes means for stripping prefixes and infixes, if any, from said words in the event that the stripping of suffixes was not adequate for reaching the word root and matching each said affix stripped word to said morpheme root database.

3. A machine translation system of the type claimed in claim 2 wherein such a divergent search can produce a multiplicity of possible solutions.

4. A machine translation system of the type claimed in claim 3 wherein such a divergent search will also include inflected forms of all words.

5. A machine translation system of the type claimed in claim 2 wherein means for attaching appropriate tags are provided and at least one appropriate tag is attached to said root word denoting the affixes, such as prefixes, infixes, as well as suffixes, that have been stripped from said root word, along with syntactic analysis, including but not limited to, word type, tense, gender, pluralism, and location clause or phrase, subject, object, and any other identification thought necessary in order to provide a smooth translation into a target language.

6. An improved machine translation system to be implemented on a data processing device, as claimed in claim 5, wherein said system generally consists of three modules, said modules including said source or first module in a first natural national language adapted to accept said externally introduced text in said first language that is to be translated, said text being subjected to said method contained in said source module, a universal second or intermediate bridge module including means for translating said first national natural language into a universal internationally created second language, said second module including means for carrying out said translation with said at least one tag attached to each said word for identification and classification purposes;

and a third or target module carrying a second natural national language, said target module including means and a database capable of accepting the tagged words from said second module and readily translating them into said target second natural national language;

said universal second or intermediate bridge module being usable universally with all of said first and third or source and target national natural language modules, respectively, regardless of whatever different languages might be resident therein.

7. A machine translation system of the type claimed in claim 6 wherein said third module includes means for utilizing a portion of its database for direct translation from said universal internationally created language into said target second natural national language, and a portion of said module having means for recombinant morphology usable in the rebuilding step, if necessary, of root words in said second national natural target language text by the method of addition of morphemes in said target language in order to bring about a relatively accurate and true translation thereof in relation to any stripped affixes carried out in said first module.

8. A machine translation system of the type claimed in claim 7 wherein said word stripping is the degenerative stage of morphology in the source language, while said recombination or replacement of the stripped suffix/prefix to the root word is the generative stage of morphology in the target language, the generative stage being substantially a mirror image of the degenerative stage.

9. A machine translation system of the type claimed in claim 8 wherein said generative stage is based on substantially the reverse of the degenerative morphology table of said target language when it is used as a source language.

10. A machine translation system of the type claimed in claim 9 wherein said generative morphology is the means for recognizing and being cognizant of spelling shifts, if they exist, in said target second language contained in said third module.
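Claims 8 to 10 describe generation in the target language as a mirror image of stripping, based on the reverse of that language's own degenerative morphology table, including spelling shifts. The Python sketch below illustrates that idea with one hypothetical English table read in both directions. The table rows, the tag names `PLURAL` and `PAST`, and the regular-expression conditions on the root are assumptions for illustration; a real table would need many more rules.

```python
import re
from typing import Optional

# Hypothetical morphology table, read left to right when stripping (degenerative
# stage) and right to left when generating (generative stage):
# (surface ending, root ending, grammatical tag, condition on the root)
MORPHOLOGY_TABLE = [
    ("ies", "y", "PLURAL", r"[^aeiou]y$"),  # spelling shift: city <-> cities
    ("ied", "y", "PAST", r"[^aeiou]y$"),    # spelling shift: carry <-> carried
    ("ed", "", "PAST", r""),
    ("s", "", "PLURAL", r""),
]


def strip_affix(word: str) -> tuple[str, Optional[str]]:
    """Degenerative stage: remove one suffix and return (root, tag)."""
    for surface, root_end, tag, cond in sorted(
        MORPHOLOGY_TABLE, key=lambda r: len(r[0]), reverse=True
    ):
        if word.endswith(surface):
            root = word[: -len(surface)] + root_end  # undo any spelling shift
            if re.search(cond, root):
                return root, tag
    return word, None


def add_affix(root: str, tag: str) -> str:
    """Generative stage: the same table reversed (root + tag -> surface form)."""
    # Rules that reapply a spelling shift (non-empty root ending) are tried first.
    rules = sorted(
        (r for r in MORPHOLOGY_TABLE if r[2] == tag),
        key=lambda r: len(r[1]),
        reverse=True,
    )
    for surface, root_end, _tag, cond in rules:
        if root.endswith(root_end) and re.search(cond, root):
            return root[: len(root) - len(root_end)] + surface
    return root  # no rule for this tag: leave the root unchanged
```

Because both stages read the same rows, `add_affix(*strip_affix(w))` returns `w` for words such as `cities`, `days`, `carried`, and `walked` in this toy table.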

11. A machine translation system for translating text from a first national natural source language to a second national natural target language through a universal machine method adapted to be implemented on a data processing device including a first module having a lexical database identifiable with said source language and said first module including means capable of performing a syntactic and lexical analysis on said text and attaching informational tags on each word of said text, a universal intermediate second module providing an interface having an operating environment for display to a user and a basis for issuing commands and receiving information, said second module also including a lexical database in an intermediate international created language that is capable of accepting said syntactic and lexical analysis of said text from said first module and including means for translating said source language words carrying said informational tags into said international created language while retaining said informational tags, and a third target module having a lexical database identifiable with said second national natural target language, and including means to accept said intermediate created language with its tagged words of said text and proceed to translate the text into the target national natural language, said second module being universally accepted by a multiplicity of differing national languages, each of which has one of its own said first source modules and one of its own said third target modules;

said first module also including a root word morpheme database, and having means whereby any individual words of said source text which cannot be initially matched with a word in said first module lexical database are then subjected to morphological stripping of endings and prefixes until the root of said words can be matched with said root word morpheme database, appropriate designating tags are attached to each said root word indicating, but not limited to, the root word designator, type of word, tense, gender, pluralism, and particular ending or prefix morpheme stripped therefrom, means are provided so that appropriate morphemes can be added to the translated root word in the target language of said third module, said system further including means for inputting text into file means in said first module, said machine method includes means adapted to read said input file a character at a time until it reaches some form of punctuation which terminates a statement, including periods, commas, exclamation marks, dashes, ellipses, and question marks;

said last mentioned means is directed to process only one statement at a time and all punctuation falls through as is appropriate;

means are provided wherein each word in the statement is looked up in the lexical database, and if no match is found for an individual word the lexical database returns an error code;

said individual words returned with an error code go to a morphology database including means which strips successive affixes, including suffixes or prefixes, off said word and modifies it to determine if the root of said word is in the lexical database, such a determination is made by checking said word against said database each time a morpheme is stripped from said word, and is repeated until a match is found, said lexical database returns grammatical information about each of said words, however, said morphology database includes means that has the power to supersede this grammatical information during said stripping operation, however, if said word is of the type that may be many different parts of speech, including a verb, noun, adjective, adverb, article or preposition, and is ambiguous and/or did not pass through morphological stripping;

means are provided for an indeterminate flag to be set and additional means are provided whereby a grammatical analysis is performed by examination of the proximal words: if said word is the first word and is followed by a noun, the probability of it being adjectival is very high; if, on the other hand, the word before said word is an article, said word must be a noun; in either event said word is appropriately flagged as to word type, once said word type has been resolved, in the lexical database, means are provided whereby it is tagged as to type and the proper individual identification for said word, which identification remains the same regardless of the language or module in which the text may reside, if said word has multiple possibilities as to its type, as set forth above, namely including verb, noun, adjective, adverb, article, or preposition, then means are provided whereby a heuristic approach is utilized and it will appear as many times as there are possibilities, lookups are repeated a plurality of times until no ambiguities remain, said system further including program means whereby verbs are identified next by starting at the end of the sentence and/or statement and working forward until a first main verb is located, said system program means is intelligent since it stops processing when it encounters any additional main verbs or definite clause markers or punctuation, said program means then continues and if a verb is marked as an infinitive said program means moves on to further translation, verbs are tensed and during this process said program means checks for modals and auxiliary verbs and sets them aside for later treatment.
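Several steps of claim 11 (reading the input one character at a time into statements, looking words up, narrowing ambiguous word types from the proximal words, and scanning for the main verb from the end of the statement) can be sketched in Python as below. The toy lexicon, the clause-marker list, the `"unknown"` placeholder that stands in for the lexical database's error code, and the choice of Unicode dash and ellipsis characters as terminators are all hypothetical; unknown words are not passed on to morphological stripping here, and still-ambiguous words simply keep all their possible types.

```python
import re
from typing import Iterator, Optional

# Terminating punctuation named in claim 11; the em dash, en dash and single-character
# ellipsis stand in for "dashes" and "ellipsis" (ASCII hyphens inside words are not split).
TERMINATORS = {".", ",", "!", "?", "\u2014", "\u2013", "\u2026"}

# Hypothetical toy lexicon: each word maps to its possible parts of speech.
LEXICON = {
    "the": {"article"}, "a": {"article"},
    "rain": {"noun", "verb"}, "falls": {"verb", "noun"},
    "light": {"noun", "adjective", "verb"},
    "lamps": {"noun"}, "glow": {"verb", "noun"},
}
CLAUSE_MARKERS = {"that", "which", "because"}  # assumed example markers


def statements(text: str) -> Iterator[str]:
    """Read the text one character at a time and yield one statement per terminator."""
    buffer: list[str] = []
    for ch in text:
        buffer.append(ch)
        if ch in TERMINATORS:
            chunk = "".join(buffer).strip()
            buffer = []
            if any(c.isalnum() for c in chunk):
                yield chunk  # punctuation-only fragments, e.g. from "...", fall through
    tail = "".join(buffer).strip()
    if tail:
        yield tail


def words_of(statement: str) -> list[str]:
    """Split a statement into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z']+", statement.lower())


def resolve_types(words: list[str]) -> list[set[str]]:
    """Look up each word, then narrow ambiguous ones by examining the proximal words."""
    # "unknown" is a placeholder for the error code returned on a failed lookup.
    types = [set(LEXICON.get(w, {"unknown"})) for w in words]
    for i in range(len(words)):
        if len(types[i]) < 2:
            continue  # unambiguous: no indeterminate flag needed
        if i > 0 and types[i - 1] == {"article"} and "noun" in types[i]:
            types[i] = {"noun"}  # the word after an article is taken to be a noun
        elif i == 0 and len(words) > 1 and types[1] == {"noun"} and "adjective" in types[i]:
            types[i] = {"adjective"}  # a first word followed by a noun is likely adjectival
    return types


def find_main_verb(words: list[str], types: list[set[str]]) -> Optional[int]:
    """Scan from the last word toward the first and return the index of the first verb."""
    for i in range(len(words) - 1, -1, -1):
        if words[i] in CLAUSE_MARKERS:
            return None  # stop at a definite clause marker
        if "verb" in types[i]:
            return i
    return None
```

With this toy lexicon, `resolve_types(words_of("Light lamps glow!"))` marks `light` as an adjective because it is the first word and is followed by a noun, and `find_main_verb` then returns index 2 for `glow`.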

12. A machine translation system as claimed in claim 11 including means whereby, after verbs are tensed, subjects and objects are located by proximal rule along with clausal analysis, said means then commences with the last verb in the statement and works backwards looking for nouns as well as moving forward from the verb to look for nouns, on each side of said verb the program means looks for clause and direction markers, direction indicates an object when after said verb, and the program looks for nouns before the verb for subjects.

13. A machine translation system as claimed in claim 12 including means wherein phrases are identified idiomatically at the end of each sentence, if a word is part of a phrase it is assembled, using the same mechanism by which verbs are handled, a separate phrase/idiom database is provided and when it is identified an intermediate number is used in place of the phrase, means are provided in said third module database for accepting phrases in their own database and translating them into the target language from the intermediate language bridge.

14. A machine translation system as claimed in claim 13 wherein means are provided whereby the user has the option to use his own lexicon to define a particular word differently than the database has done, once the user has introduced his definition it will supersede that of the program in the lookup and will be processed that way thereafter as if it were part of the original program definition.
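Claims 13 and 14 describe two lookup layers: a separate phrase/idiom database whose entries are replaced by an intermediate number, and a user lexicon whose definitions supersede the program's. A minimal Python sketch of both follows; the phrase entries, the numbers `9001` and `9002`, the longest-phrase-first matching order, and the program lexicon are hypothetical, and using the standard library's `collections.ChainMap` for the override is one possible design, not the patent's.

```python
from collections import ChainMap

# Hypothetical phrase/idiom database: word sequence -> intermediate number.
PHRASES = {("kick", "the", "bucket"): 9001, ("by", "and", "large"): 9002}
MAX_PHRASE = max(len(p) for p in PHRASES)

# Hypothetical program lexicon (word -> definition).
PROGRAM_LEXICON = {"bank": "financial institution", "crane": "bird"}


def substitute_phrases(words: list[str]) -> list[object]:
    """Replace each known phrase, longest first, by its intermediate number."""
    out: list[object] = []
    i = 0
    while i < len(words):
        for n in range(min(MAX_PHRASE, len(words) - i), 1, -1):
            key = tuple(words[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:  # no phrase starts at this word: keep the word itself
            out.append(words[i])
            i += 1
    return out


def make_lexicon(user_lexicon: dict[str, str]) -> ChainMap:
    """Lookups search the user's lexicon first, so its definitions supersede the program's."""
    return ChainMap(user_lexicon, PROGRAM_LEXICON)
```

Because `ChainMap` consults its maps in order, adding a word to the user dictionary changes every later lookup without modifying `PROGRAM_LEXICON`, which mirrors claim 14's "as if it were part of the original program definition".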

Current Assignee: Intek International Food Products, Inc.
Original Assignee: Toltran, Ltd.
Inventors: Brisk, Richard; Kasindorf, Barry M.; Tolin, Bruce G.; Hatch, Mark; Tolin, Stanley
Primary Examiner(s): Hayes, Gail O.
Assistant Examiner(s): Kyle, Charles R.
Application Number: US07/403,683
Time in Patent Office: 2,345 Days
Field of Search: 364/419, 364/900, 364/419.02, 364/419.05, 364/419.04, 364/419.11
US Class Current: 704/2
CPC Class Codes:
G06F 40/268   Morphological analysis
G06F 40/279   Recognition of textual enti...
G06F 40/55   Rule-based translation
