Interlingua, Interlingua Engine, and Interlingua Machine Translation System
Abstract
An embodiment provides (a) a method and system for representing natural languages in a common machine-readable form, including a thorough design of the lexicon and grammar, the resulting representation being called interlingua; (b) a method and system for using a computer to convert a text of a natural language into and out of a coded text in said interlingua representation, including a programming framework that is independent of other languages, said system being called an interlingua engine; and (c) a method and system of machine translation using said interlingua engine, said system being called an interlingua machine translation system. Alternative embodiments are described.
18 Claims
1. A system for representing natural languages in a common machine-readable form, called interlingua, comprising a lexicon and a grammar, where:
a. said lexicon comprises:
  1. a system of specifically designed classification codes for prototypical words of noun, adjective, verb, and adverb respectively,
  2. virtual codes for each kind of derived words,
  3. a system of synonymous feature set for synonymous word and metaphorical feature set for metaphorical sense of word,
b. said grammar comprises:
  1. a system of classification code and composition for prototypical clause,
  2. a system of time feature set, space feature set, and adverb feature set,
  3. a system of variational feature set for variational clause and synonymous feature set for synonymous clause, and
  4. a system of metaphor processing procedure, and
c. said prototypical verb word and said prototypical clause comprising said prototypical verb word share the same classification code,
whereby a sense of a word of a language will be:
a. matched to a unique interlingua classification code if it is prototypical, or
b. if it is synonymous to a prototypical word, then matched to the same classification code of said prototypical word plus a feature of synonymy, or
c. if it is a derived word, then matched to a virtual code plus the interlingua classification code of its stem word and a feature of derivation, or
d. if it is a fixed extended sense, then matched to an interlingua classification code of a word corresponding to said extended sense plus a feature of extension, or
e. if it is a multiple use case, then matched to an interlingua classification code of a word corresponding to said multiple use plus a feature of multiple use, or
f. if it is used in a special clause or idiom, then treated according to the corresponding rule governing said special clause or idiom respectively, and
whereby a clause of a language will be:
a. matched to a unique interlingua classification code of its comprised verb if both said clause and said verb are prototypical, or
b. if said clause is variational and its comprised verb is prototypical, then matched to the same interlingua classification code of its associated prototypical clause plus a feature of variation, or
c. if said clause is prototypical and its comprised verb is not prototypical, then matched to an interlingua classification code of a verb corresponding to said verb according to said verb being derived, fixed extended, or multiple use plus a corresponding feature of said verb, or
d. if said clause is variational and its comprised verb is not prototypical, then matched to an interlingua classification code of a verb corresponding to said verb according to said verb being derived, fixed extended, or multiple use plus a corresponding feature of said verb and a variational feature of said clause, or
e. if said clause is a special clause, then matched to an interlingua classification code for special clause plus a feature of special clause.
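The word-sense half of claim 1 (cases a to f) is effectively a dispatch on the kind of sense. The Python sketch below shows one way to encode it; it is an illustration under stated assumptions, not the patented implementation. The names `SenseKind`, `WordSense`, `InterlinguaWord`, `match_word_sense`, the feature labels, and code strings such as "V0101" are all hypothetical, and deciding which kind a sense belongs to is assumed to have been done already by the lexicon.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class SenseKind(Enum):
    PROTOTYPICAL = "prototypical"
    SYNONYMOUS = "synonymous"
    DERIVED = "derived"
    FIXED_EXTENDED = "fixed extended"
    MULTIPLE_USE = "multiple use"
    SPECIAL = "special clause or idiom"


# Hypothetical feature label attached for each non-prototypical kind.
FEATURE = {
    SenseKind.SYNONYMOUS: "synonymy",
    SenseKind.DERIVED: "derivation",
    SenseKind.FIXED_EXTENDED: "extension",
    SenseKind.MULTIPLE_USE: "multiple use",
}


@dataclass
class WordSense:
    kind: SenseKind
    # Prototypical sense: its own code. Otherwise: the code of the
    # prototypical, stem, or corresponding word it is matched to.
    code: Optional[str] = None
    virtual_code: Optional[str] = None  # only meaningful for derived words


@dataclass
class InterlinguaWord:
    code: str
    features: List[str] = field(default_factory=list)
    virtual_code: Optional[str] = None


def match_word_sense(sense: WordSense) -> Optional[InterlinguaWord]:
    """Apply cases a-f of the word-sense matching rule (sketch)."""
    if sense.kind is SenseKind.SPECIAL:
        # Case f: deferred to the rule for that special clause or idiom.
        return None
    if sense.kind is SenseKind.PROTOTYPICAL:
        # Case a: a unique code with no extra feature.
        return InterlinguaWord(sense.code)
    # Cases b-e: borrowed code plus the feature naming the relation;
    # only derived words (case c) also carry a virtual code.
    return InterlinguaWord(sense.code, [FEATURE[sense.kind]], sense.virtual_code)
```

For example, a derived agent noun would be passed as `WordSense(SenseKind.DERIVED, "V0101", "DV-AGENT")` and come back carrying its stem's code, the feature "derivation", and the (hypothetical) virtual code.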
whereby said input module will convert a text of said language into said interlingua representation by solving the two fundamental problems of ambiguity and matching senses, where the ambiguity problem is solved with the aid of said databases, in particular said databases of clause structure list, semantic rule of structure, and semantic code group, and the problem of matching senses is solved by said interlingua representation and said databases, in particular said database of metaphor processing procedure, and said output module will convert an interlingua coded text into a text of said language by first generating a text using said database of clause structure list and then improving the readability of the text with the aid of said databases, in particular said database of synonymous clause list.

9. An interlingua engine of claim 8, wherein said knowledge databases comprise a common sense database, a cultural database, an encyclopedic database, and a professional database.

10. An interlingua machine translation system for performing translation between and among a plurality of natural languages, comprising a computer equipped with said interlingua engine of claim 8 having incorporated said languages, means for inputting speech or text of said languages into said computer, and means for outputting text of said languages from said computer in the form of text or speech, whereby speech or text in any one of said languages can be translated into speech or text in any other one of said languages by using said translation system.
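The translation system of claim 10 routes every language through the shared interlingua, so each language needs only one input module and one output module (for ten languages, 20 modules instead of 90 direct pairwise translators). The Python sketch below illustrates that pivot arrangement; `InterlinguaMT`, `toy_modules`, and the codes "N1", "V1", "N2" are hypothetical, and the word-for-word toy modules merely stand in for a real input and output module.

```python
from typing import Callable, Dict, List, Tuple

Coded = List[str]  # hypothetical: an interlingua coded text as a list of codes


class InterlinguaMT:
    """Pivot translation: one input and one output module per language."""

    def __init__(self) -> None:
        self.input_modules: Dict[str, Callable[[str], Coded]] = {}
        self.output_modules: Dict[str, Callable[[Coded], str]] = {}

    def add_language(self, lang: str,
                     to_il: Callable[[str], Coded],
                     from_il: Callable[[Coded], str]) -> None:
        self.input_modules[lang] = to_il
        self.output_modules[lang] = from_il

    def translate(self, text: str, src: str, tgt: str) -> str:
        # Source text -> interlingua coded text -> target text.
        return self.output_modules[tgt](self.input_modules[src](text))


def toy_modules(lexicon: Dict[str, str]) -> Tuple[Callable[[str], Coded],
                                                  Callable[[Coded], str]]:
    """Word-for-word stand-ins for real input/output modules (toy only)."""
    inverse = {code: word for word, code in lexicon.items()}

    def to_il(text: str) -> Coded:
        return [lexicon[w] for w in text.split()]

    def from_il(codes: Coded) -> str:
        return " ".join(inverse[c] for c in codes)

    return to_il, from_il


mt = InterlinguaMT()
mt.add_language("en", *toy_modules({"cat": "N1", "eats": "V1", "fish": "N2"}))
mt.add_language("fr", *toy_modules({"chat": "N1", "mange": "V1", "poisson": "N2"}))
mt.add_language("de", *toy_modules({"Katze": "N1", "frisst": "V1", "Fisch": "N2"}))
```

Adding a fourth language would mean one more `add_language` call; no existing module changes.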

11. A method of representing natural languages in a common machine-readable form, the resulting representation called interlingua, comprising providing a lexicon and a grammar, where:
a. said lexicon comprises:
  1. a system of specifically designed classification codes for prototypical words of noun, adjective, verb, and adverb respectively,
  2. virtual codes for each kind of derived words, and
  3. a system of synonymous feature set for synonymous word and metaphorical feature set for metaphorical sense of word,
b. said grammar comprises:
  1. a system of classification code and composition for prototypical clause,
  2. a system of time feature set, space feature set, and adverb feature set,
  3. a system of variational feature set for variational clause and synonymous feature set for synonymous clause, and
  4. a system of metaphor processing procedure, and
c. said prototypical verb word and said prototypical clause comprising said prototypical verb word share the same classification code,
whereby a sense of a word of a language will be:
a. matched to a unique interlingua classification code if it is prototypical, or
b. if it is synonymous to a prototypical word, then matched to the same classification code of said prototypical word plus a feature of synonymy, or
c. if it is a derived word, then matched to a virtual code plus the interlingua classification code of its stem word and a feature of derivation, or
d. if it is a fixed extended sense, then matched to an interlingua classification code of a word corresponding to said extended sense plus a feature of extension, or
e. if it is a multiple use case, then matched to an interlingua classification code of a word corresponding to said multiple use plus a feature of multiple use, or
f. if it is used in a special clause or idiom, then treated according to the corresponding rule governing said special clause or idiom respectively, and
whereby a clause of a language will be:
a. matched to a unique interlingua classification code of its comprised verb if both said clause and said verb are prototypical, or
b. if said clause is variational and its comprised verb is prototypical, then matched to the same interlingua classification code of its associated prototypical clause plus a feature of variation, or
c. if said clause is prototypical and its comprised verb is not prototypical, then matched to an interlingua classification code of a verb corresponding to said verb according to said verb being derived, fixed extended, or multiple use plus a corresponding feature of said verb, or
d. if said clause is variational and its comprised verb is not prototypical, then matched to an interlingua classification code of a verb corresponding to said verb according to said verb being derived, fixed extended, or multiple use plus a corresponding feature of said verb and a variational feature of said clause, or
e. if said clause is a special clause, then matched to an interlingua classification code for special clause plus a feature of special clause.
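Because a prototypical verb and the prototypical clause built on it share one classification code (item c of the lexicon and grammar), the clause cases a to e reduce to: start from the verb's code, then append the verb's feature if the verb is not prototypical and the variational feature if the clause is variational. The Python sketch below encodes that reading; `ClauseInput`, `InterlinguaClause`, `match_clause`, `SPECIAL_CLAUSE_CODE`, and all code and feature strings are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SPECIAL_CLAUSE_CODE = "SPECIAL"  # hypothetical code reserved for special clauses


@dataclass
class ClauseInput:
    verb_code: str          # code of the verb, or of the verb it corresponds to
    verb_prototypical: bool
    verb_feature: Optional[str] = None       # derivation / extension / multiple use
    clause_variational: bool = False
    variation_feature: Optional[str] = None  # which variation of the prototype
    special: bool = False
    special_feature: Optional[str] = None


@dataclass
class InterlinguaClause:
    code: str
    features: List[str] = field(default_factory=list)


def match_clause(c: ClauseInput) -> InterlinguaClause:
    """Apply clause cases a-e (sketch)."""
    if c.special:
        # Case e: special-clause code plus its feature.
        return InterlinguaClause(SPECIAL_CLAUSE_CODE, [c.special_feature])
    features: List[str] = []
    if not c.verb_prototypical:
        # Cases c and d: carry the non-prototypical verb's feature.
        features.append(c.verb_feature)
    if c.clause_variational:
        # Cases b and d: the prototypical clause code equals the verb code,
        # so only the variational feature needs to be added.
        features.append(c.variation_feature)
    # Case a falls through with no features.
    return InterlinguaClause(c.verb_code, features)
```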
whereby said input module will convert a text of said language into said interlingua representation by solving the two fundamental problems of ambiguity and matching senses, where the ambiguity problem is solved with the aid of said databases, in particular said databases of clause structure list, semantic rule of structure, and semantic code group, and the problem of matching senses is solved by said interlingua representation and said databases, in particular said database of metaphor processing procedure, and the output module will convert an interlingua coded text into a text of said language by first generating a text using said database of clause structure list and then improving the readability of the text with the aid of said databases, in particular said database of synonymous clause list.

14. An interlingua engine of claim 13, wherein for said two-argument dynamic clause, when both prototypical arguments S and O are concrete, said semantic rule of structure comprises a three-level rule of prototypical rule, concrete-abstract rule, and collocation rule, where:
a. said prototypical rule comprises a general rule that if both S and O are concrete, and O is collocated, then the verb is prototypical, and further:
  1. if S is agentive, then the clause is prototypical, and
  2. if S is not agentive, then S is a tool or material in a broad sense and the clause is variational,
b. said concrete-abstract rule comprises a general rule that if at least one of S and O is abstract, then the clause and verb are metaphorical, and further:
  1. if S is not agentive, then S is a tool, material, method, manner, or cause in a broad sense, and the clause is variational, and
  2. if both S and O are abstract, then S is a method, manner, or cause in a broad sense, the clause is metaphorical and variational or relational, and the verb is metaphorical, and
c. said collocation rule comprises a general rule that if O is not collocated, then the clause and verb are metaphorical, and further:
  1. if O is concrete, then the verb is extended or metaphorical,
  2. if S is not agentive, then S is a tool, material, method, manner, or cause in a broad sense and the clause is variational, and
  3. if S is not agentive and O is not concrete, then S is a tool, material, method, manner, or cause in a broad sense and the clause is metaphorical and variational or relational.
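The three-level semantic rule of structure in claim 14 can be read as three independent checks on four properties of the S and O actually found in a sentence. The claim does not state a precedence among the levels, so the Python sketch below simply collects the conclusions of every level whose condition holds; that choice, and the names `Arguments`, `semantic_rule_of_structure`, and `BROAD_S`, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

BROAD_S = "tool, material, method, manner, or cause"


@dataclass
class Arguments:
    s_concrete: bool    # S is concrete (otherwise abstract)
    o_concrete: bool    # O is concrete (otherwise abstract)
    o_collocated: bool  # O is a normal collocate of the verb
    s_agentive: bool    # S is agentive


def semantic_rule_of_structure(a: Arguments) -> List[Tuple[str, str]]:
    """Return (level, conclusion) pairs from every applicable level."""
    out: List[Tuple[str, str]] = []

    # Level 1: prototypical rule.
    if a.s_concrete and a.o_concrete and a.o_collocated:
        out.append(("prototypical", "verb prototypical"))
        if a.s_agentive:
            out.append(("prototypical", "clause prototypical"))
        else:
            out.append(("prototypical", "S is tool or material; clause variational"))

    # Level 2: concrete-abstract rule.
    if not (a.s_concrete and a.o_concrete):
        out.append(("concrete-abstract", "clause and verb metaphorical"))
        if not a.s_agentive:
            out.append(("concrete-abstract", f"S is {BROAD_S}; clause variational"))
        if not a.s_concrete and not a.o_concrete:
            out.append(("concrete-abstract",
                        "S is method, manner, or cause; clause metaphorical and "
                        "variational or relational; verb metaphorical"))

    # Level 3: collocation rule.
    if not a.o_collocated:
        out.append(("collocation", "clause and verb metaphorical"))
        if a.o_concrete:
            out.append(("collocation", "verb extended or metaphorical"))
        if not a.s_agentive:
            out.append(("collocation", f"S is {BROAD_S}; clause variational"))
            if not a.o_concrete:
                out.append(("collocation",
                            f"S is {BROAD_S}; clause metaphorical and "
                            "variational or relational"))
    return out
```

A concrete, agentive S with a concrete, collocated O yields only the two prototypical conclusions; a concrete pair whose O is not collocated falls through to the collocation level alone.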

15. An interlingua engine of claim 13, wherein said program of input module has a general programming framework which comprises the stages of:
a. initializing said computer in preparation for the start of processing, including initializing three complementary databases of role, ambiguity, and sequence to record related data produced in the processing,
b. word level processing to perform lexical analysis to resolve word level ambiguities with the aid of said lexicon and grammar of said language, convert unambiguous words into said interlingua representation, delete useless senses of words, and prepare unresolved senses of words for further processing,
c. phrase level processing to mark out clauses, adjective phrases, and noun phrases, further disambiguate unresolved ambiguous senses of words and delete useless senses of words, convert identified phrases and words into said interlingua representation, prepare unresolved senses of words for further processing, and prepare candidate clauses for sentence level syntactic processing,
d. sentence level syntactic processing to check the syntactic correctness of candidate clauses, disambiguate ambiguous clauses with the aid of said database of clause structure list, delete useless candidate clauses, further disambiguate unresolved senses of words and delete useless senses of words, convert identified words into said interlingua representation, prepare unresolved senses of words for further processing, and prepare both identified and unresolved clauses for sentence level semantic processing,
e. sentence level semantic processing to determine the semantic plausibility and meaning of identified clauses, disambiguate still unresolved clauses with the aid of said databases of semantic rule of structure and semantic code group, delete useless candidate clauses, disambiguate still unresolved ambiguous senses of words and delete remaining unresolved senses of words, use said database of metaphor processing procedure to determine the meaning of metaphorically used words and clauses, convert identified clauses and words into said interlingua representation, and prepare still unresolved clauses for sentence level pragmatic processing, and
f. sentence level pragmatic processing to finally resolve clauses made ambiguous due to pragmatic use with the aid of said database of clause structure list and said three built-up databases of role, ambiguity, and sequence, convert all the remaining clauses and words into said interlingua representation, and then save the converted text, together with the gathered feature information including said built-up databases, in a coded interlingua text.
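The input-module framework of claim 15 is a fixed sequence of stages that share one processing state, including the three built-up databases of role, ambiguity, and sequence. The Python skeleton below sketches that shape only. `ProcessingState`, `run_input_module`, the toy `LEXICON`, and its codes are hypothetical; stages c to e are left as placeholders, and the toy pragmatic stage resolving leftover ambiguity by taking the first candidate is an assumption, not the claimed procedure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface word -> candidate interlingua codes.
LEXICON: Dict[str, List[str]] = {
    "bank": ["N0101", "N0102"],  # two candidate senses
    "river": ["N0200"],
}


@dataclass
class ProcessingState:
    text: str
    # Stage a: the three complementary built-up databases.
    role: Dict[int, str] = field(default_factory=dict)
    ambiguity: Dict[int, List[str]] = field(default_factory=dict)
    sequence: List[str] = field(default_factory=list)
    coded: List[Tuple[int, str]] = field(default_factory=list)  # (position, code)
    log: List[str] = field(default_factory=list)


def word_level(state: ProcessingState) -> None:
    """Stage b: convert unambiguous words; park ambiguous ones for later."""
    for pos, word in enumerate(state.text.split()):
        state.sequence.append(word)
        senses = LEXICON.get(word, [])
        if len(senses) == 1:
            state.coded.append((pos, senses[0]))
        elif len(senses) > 1:
            state.ambiguity[pos] = list(senses)


def placeholder(state: ProcessingState) -> None:
    """Stages c-e: phrase, syntactic, and semantic passes would narrow
    state.ambiguity using the clause structure list, semantic rules, etc."""


def pragmatic_level(state: ProcessingState) -> None:
    """Stage f (toy): convert everything still ambiguous, then order by position."""
    for pos, senses in sorted(state.ambiguity.items()):
        state.coded.append((pos, senses[0]))  # assumption: first candidate wins
    state.ambiguity.clear()
    state.coded.sort()


STAGES = [
    ("word", word_level),
    ("phrase", placeholder),
    ("syntactic", placeholder),
    ("semantic", placeholder),
    ("pragmatic", pragmatic_level),
]


def run_input_module(text: str, stages=STAGES) -> ProcessingState:
    state = ProcessingState(text)  # stage a: initialization
    for name, stage in stages:
        stage(state)
        state.log.append(name)
    return state
```

Keeping the stages in a list makes it straightforward to swap in a real implementation of any one stage without touching the others.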

16. An interlingua engine of claim 15, wherein said word level processing further comprises processing a system of high frequency words, and said phrase level processing further comprises using said high frequency words to aid in marking clauses, adjective phrases, and noun phrases.
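Claim 16 uses high frequency words as signposts for where clauses and phrases begin. The claim does not list the words, so the English set below is purely hypothetical, as are `HIGH_FREQUENCY`, `mark_boundaries`, and `split_at_clause_markers`. This Python sketch shows the idea: the word level pass tags each high frequency word, and the phrase level pass opens a new candidate clause at each clause marker.

```python
from typing import Dict, List, Tuple

# Hypothetical high frequency words and the unit each one tends to open.
HIGH_FREQUENCY: Dict[str, str] = {
    "that": "clause",
    "which": "clause",
    "because": "clause",
    "the": "noun phrase",
    "a": "noun phrase",
    "very": "adjective phrase",
}


def mark_boundaries(text: str) -> List[Tuple[int, str, str]]:
    """Word level: locate high frequency words and the unit they signal."""
    return [(i, w, HIGH_FREQUENCY[w.lower()])
            for i, w in enumerate(text.split())
            if w.lower() in HIGH_FREQUENCY]


def split_at_clause_markers(text: str) -> List[List[str]]:
    """Phrase level: start a new candidate clause at each clause marker."""
    segments: List[List[str]] = []
    current: List[str] = []
    for word in text.split():
        if HIGH_FREQUENCY.get(word.lower()) == "clause" and current:
            segments.append(current)
            current = []
        current.append(word)
    if current:
        segments.append(current)
    return segments
```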

17. An interlingua engine of claim 13, wherein said program of output module has a general programming framework which comprises the stages of:
a. generating a text of said language from a stored interlingua coded text with the aid of said lexicon and grammar as well as said database of clause structure list of said language, and
b. rhetoric processing to make said generated text more readable with the aid of said database of synonymous clause list of said language, said database of semantic code group, said three built-up databases of role, ambiguity, and sequence, and said supplementary knowledge databases.
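The two output stages of claim 17 can be pictured as template filling followed by a readability pass. In the Python sketch below, a hypothetical clause structure list maps a clause code to a surface pattern and a hypothetical synonymous clause list supplies alternative patterns; the rhetoric pass simply rotates through those alternatives when the same clause pattern repeats. `LEXICON`, `CLAUSE_STRUCTURES`, `SYNONYMOUS_CLAUSES`, the role name `R` (recipient), and all codes are assumptions for illustration only.

```python
from typing import Dict, List

# Hypothetical target-language lexicon and clause structure list.
LEXICON: Dict[str, str] = {"N0001": "Mary", "N0002": "the book", "N0003": "John"}
CLAUSE_STRUCTURES: Dict[str, str] = {"V0300": "{S} gives {O} to {R}"}
# Hypothetical synonymous clause list: equivalent patterns per clause code.
SYNONYMOUS_CLAUSES: Dict[str, List[str]] = {"V0300": ["{S} gives {R} {O}"]}


def realize(clause: dict, template: str) -> str:
    """Fill a surface pattern with the words for each argument code."""
    words = {role: LEXICON[code] for role, code in clause["args"].items()}
    return template.format(**words)


def generate(clauses: List[dict]) -> List[str]:
    """Stage a: one sentence per clause from the clause structure list."""
    return [realize(c, CLAUSE_STRUCTURES[c["code"]]) for c in clauses]


def rhetoric(clauses: List[dict]) -> List[str]:
    """Stage b (toy): when a clause pattern repeats, rotate through its
    synonymous variants so consecutive sentences do not all look alike."""
    seen: Dict[str, int] = {}
    out: List[str] = []
    for c in clauses:
        options = [CLAUSE_STRUCTURES[c["code"]]] + SYNONYMOUS_CLAUSES.get(c["code"], [])
        k = seen.get(c["code"], 0)
        out.append(realize(c, options[k % len(options)]))
        seen[c["code"]] = k + 1
    return out


coded_text = [
    {"code": "V0300", "args": {"S": "N0001", "O": "N0002", "R": "N0003"}},
    {"code": "V0300", "args": {"S": "N0003", "O": "N0002", "R": "N0001"}},
]
```

Here `generate(coded_text)` gives two sentences with the same shape, while `rhetoric(coded_text)` renders the second one with the alternative pattern "John gives Mary the book".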

18. An interlingua machine translation method for performing translation between and among a plurality of natural languages, comprising:
a. providing a computer equipped with said interlingua engine of claim 13 having incorporated said languages,
b. providing means for inputting speech or text of said languages into said computer, and
c. providing means for outputting text of said languages from said computer in the form of text or speech,
whereby said computer will translate a speech or text of any one of said languages into a speech or text of any other one of said languages.

Specification