Method and system for analyzing various languages and constructing language-independent semantic structures
First Claim
1. A method for a computer to analyze a sentence in a source language, comprising:
performing a lexical analysis of the sentence in the source language;
performing a lexical-morphological analysis of each element of the sentence and building a lexical-morphological structure for the whole sentence;
performing a rough syntactic analysis on the lexical-morphological structure comprising:
generating all possible constituents for each element of the lexical-morphological structure;
generalizing the constituents to form a set of generalized constituents; and
generating a graph of the generalized constituents to describe all possible syntactic structures of the whole sentence, comprising:
for each generalized constituent having a lexical meaning and grammatical value which corresponds to a connection in the lexical-morphological structure;
initializing a surface model for the generalized constituent;
attempting to attach other constituents in surface slots of the syntforms of the surface model right and left neighboring constituents; and
establishing non-tree links on the graph of the generalized constituents;
performing a precise syntactic analysis to produce one or more syntactic trees for the sentence from the graph of the generalized constituents, the precise syntactic analysis including:
generating a graph of precise constituents, the graph of precise constituents being an intermediate representation between the graph of the generalized constituents and the one or more syntactic trees;
generating one or more syntactic structure variants from the graph of the precise constituents;
rating the precise constituents based on a plurality of rating scores independently obtained and calculated, including rating scores of one or more lexical meanings for each element of the sentence, rating scores of one or more syntactic constructions for each element of the sentence, rating scores of a degree of correspondence of the precise constituents to their semantic descriptions, and rating scores of a linear order of constituents in the sentence;
using the rating of the precise constituents to generate hypotheses about the overall syntactic structure of the sentence; and
selecting one or more hypotheses about the overall syntactic structure of the sentence with a highest rating score from the hypotheses generated;
selecting a best syntactic structure for the sentence from the one or more syntactic trees; and
generating a language-independent semantic structure for the sentence in the source language based at least in part upon the best syntactic structure for the sentence.
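The rating and selection steps of the precise syntactic analysis can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the claim does not say how the four independently obtained rating scores are combined, so the sketch simply sums them, and the names `Hypothesis`, `total_rating`, and `select_best` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A candidate syntactic structure with four independently computed scores."""
    label: str
    lexical: float       # rating of the chosen lexical meanings
    syntactic: float     # rating of the syntactic constructions
    semantic_fit: float  # correspondence of constituents to semantic descriptions
    linear_order: float  # rating of the linear order of constituents


def total_rating(h: Hypothesis) -> float:
    # Assumption: a plain sum; the claim does not specify the combination rule.
    return h.lexical + h.syntactic + h.semantic_fit + h.linear_order


def select_best(hypotheses: list[Hypothesis]) -> list[Hypothesis]:
    """Return every hypothesis that shares the highest total rating."""
    if not hypotheses:
        return []
    top = max(total_rating(h) for h in hypotheses)
    return [h for h in hypotheses if total_rating(h) == top]


# Hypothetical scores for two competing attachments of the same phrase.
candidates = [
    Hypothesis("attach-to-verb", 0.5, 0.25, 0.75, 0.5),
    Hypothesis("attach-to-noun", 0.5, 0.5, 0.25, 0.25),
]
best = select_best(candidates)
print([h.label for h in best])
```

Returning a list rather than a single item mirrors the claim wording "one or more hypotheses": ties at the highest rating are all kept for the later selection of a best syntactic structure.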
Abstract
A method and computer system for analyzing sentences of various languages and constructing a language-independent semantic structure are provided. On the basis of comprehensive knowledge about languages and semantics, exhaustive linguistic descriptions are created, and lexical, morphological, syntactic, and semantic analyses for one or more sentences of a natural or artificial language are performed. A computer system is also provided to implement, analyze and store various linguistic structures and to perform lexical, morphological, syntactic, and semantic analyses. As a result, a generalized data structure, such as a semantic structure, is generated and used to describe the meaning of one or more sentences in language-independent form, applicable to automated abstracting, machine translation, control systems, Internet information retrieval, etc.
19 Claims
1. A method for a computer to analyze a sentence in a source language, comprising the steps recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 16
13. A method of analyzing a sentence in a source language, comprising:
performing a lexical analysis of the sentence in the source language;
performing a lexical-morphological analysis on each element of the sentence and building a lexical-morphological structure for the whole sentence;
performing a rough syntactic analysis on the lexical-morphological structure comprising:
generating all possible constituents for each element of the lexical-morphological structure;
generalizing the constituents to form a set of generalized constituents; and
generating a graph of the generalized constituents to describe all possible syntactic structures of the whole sentence, comprising:
for each generalized constituent having a lexical meaning and grammatical value which corresponds to a connection in the lexical-morphological structure;
initializing a surface model for the generalized constituent;
attempting to attach other constituents in surface slots of the syntforms of the surface model right and left neighboring constituents; and
establishing non-tree links on the graph of the generalized constituents;
performing a precise syntactic analysis on the graph of the generalized constituents, wherein the precise syntactic analysis includes:
rating the graph of the precise constituents based on a plurality of rating scores independently obtained and calculated, including a rating score for one or more lexical meanings for each element of the sentence, a rating score for one or more syntactic constructions for each element of the sentence, a rating score for a degree of correspondence of the precise constituents to their semantic descriptions, and a rating score for the linear order of the constituents;
using the rating scores to generate hypotheses about the overall syntactic structure of the sentence; and
selecting one or more best hypotheses about the overall syntactic structure of the sentence with the highest rating score from the hypotheses generated;
generating a graph of precise constituents, the graph of precise constituents being an intermediate representation between the graph of the generalized constituents and one or more syntactic trees;
generating one or more syntactic trees from the graph of the precise constituents;
selecting a syntactic structure for the sentence from the one or more syntactic trees;
performing a semantic analysis on the selected syntactic structure of the sentence; and
generating a language-independent semantic structure for the sentence of the language based at least in part on the semantic analysis of the selected syntactic structure.
Dependent claims: 14, 15, 17
18. A computer readable medium comprising instructions for causing a computing system to carry out steps comprising:
performing a lexical analysis of the sentence in the source language;
performing a lexical-morphological analysis on each element of the sentence and building a lexical-morphological structure for the whole sentence;
performing a rough syntactic analysis on the lexical-morphological structure of the sentence comprising:
generating all possible constituents for each element of the lexical-morphological structure;
generalizing the constituents to form a set of generalized constituents; and
generating a graph of the generalized constituents to describe all possible syntactic structures of the whole sentence, comprising:
for each generalized constituent having a lexical meaning and grammatical value which corresponds to a connection in the lexical-morphological structure;
initializing a surface model for the generalized constituent;
attempting to attach other constituents in surface slots of the syntforms of the surface model right and left neighboring constituents; and
establishing non-tree links on the graph of the generalized constituents;
performing a precise syntactic analysis to produce one or more syntactic structures for the sentence from the graph of the generalized constituents;
performing a semantic analysis on the syntactic structures of the sentence and generating a language-independent semantic structure for the sentence.
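The step of generalizing constituents can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the claim does not define how constituents are merged, so the sketch groups candidates that cover the same token span with the same part of speech, and the names `Constituent`, `generalize`, and the sample lexical meanings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Constituent:
    start: int            # index of the first token covered
    end: int              # index one past the last token covered
    part_of_speech: str
    lexical_meaning: str


def generalize(constituents):
    """Merge constituents that cover the same span with the same part of speech.

    Assumption: span plus part of speech is the grouping key; the claim
    itself does not define how constituents are generalized.
    """
    groups = defaultdict(set)
    for c in constituents:
        groups[(c.start, c.end, c.part_of_speech)].add(c.lexical_meaning)
    # Sort keys and meanings so the result is deterministic.
    return {key: sorted(meanings) for key, meanings in sorted(groups.items())}


# Three hypothetical readings of a single ambiguous token.
candidates = [
    Constituent(0, 1, "Noun", "bank:FINANCIAL_INSTITUTION"),
    Constituent(0, 1, "Noun", "bank:RIVER_SIDE"),
    Constituent(0, 1, "Verb", "bank:TO_DEPOSIT"),
]
generalized = generalize(candidates)
print(generalized)
```

Grouping before graph construction keeps the number of graph nodes small: the two noun readings share one generalized node, and the choice between them can be deferred to the later rating step.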
19. A computer system adapted to analyze a sentence of a language, comprising:
a lexical-morphological analyzer adapted to perform a lexical analysis and a lexical-morphological analysis on each element of the sentence and generate a lexical-morphological structure of the sentence;
a rough syntactic analyzer adapted to perform a rough syntactic analysis on the lexical-morphological structure of the sentence comprising:
generating all possible constituents for each element of the lexical-morphological structure;
generalizing the constituents to form a set of generalized constituents; and
generating a graph of the generalized constituents to describe all possible syntactic structures of the whole sentence, comprising:
for each generalized constituent having a lexical meaning and grammatical value which corresponds to a connection in the lexical-morphological structure;
initializing a surface model for the generalized constituent;
attempting to attach other constituents in surface slots of the syntforms of the surface model right and left neighboring constituents; and
establishing non-tree links on the graph of the generalized constituents;
a precise syntactic analyzer adapted to perform a precise syntactic analysis on the graph of the generalized constituents and generate a syntactic structure of the sentence from the graph of the generalized constituents, wherein the precise syntactic analysis includes:
rating the graph of the precise constituents based on a plurality of rating scores independently obtained and calculated, including a rating score for one or more lexical meanings for each element of the sentence, a rating score for one or more syntactic constructions for each element of the sentence, a rating score for a degree of correspondence of the precise constituents to their semantic descriptions, and a rating score for the linear order of the constituents;
using the rating scores to generate hypotheses about the overall syntactic structure of the sentence; and
selecting one or more hypotheses about the overall syntactic structure of the sentence with the highest rating score from the hypotheses generated; and
a semantic analyzer adapted to perform a semantic analysis on the syntactic structure of the sentence and generate a language-independent semantic structure for the sentence of the language.
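The arrangement of the four analyzers in the system claim can be sketched as a chain of stages, each consuming the structure produced by the previous one. This is a hedged illustration only: the stage bodies are placeholders that tag the data rather than perform real analysis, and `build_pipeline` and the dictionary keys are hypothetical names, not part of the claimed system.

```python
from typing import Callable, Sequence

Stage = Callable[[object], object]


def build_pipeline(stages: Sequence[Stage]) -> Stage:
    """Compose stages so that each one receives the previous stage's output."""
    def run(sentence):
        result = sentence
        for stage in stages:
            result = stage(result)
        return result
    return run


# Placeholder stages: each only records that it ran (assumption, not real analysis).
def lexical_morphological_analyzer(sentence: str) -> dict:
    return {"tokens": sentence.split()}


def rough_syntactic_analyzer(structure: dict) -> dict:
    return {**structure, "graph": "generalized-constituents"}


def precise_syntactic_analyzer(structure: dict) -> dict:
    return {**structure, "tree": "best-syntactic-structure"}


def semantic_analyzer(structure: dict) -> dict:
    return {**structure, "semantic_structure": "language-independent"}


analyze = build_pipeline([
    lexical_morphological_analyzer,
    rough_syntactic_analyzer,
    precise_syntactic_analyzer,
    semantic_analyzer,
])
result = analyze("The boy runs")
print(list(result))
```

Keeping each analyzer behind the same one-argument interface means a stage can be replaced or tested in isolation, which fits the claim's description of separately "adapted" components.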
Specification