
Method and system for analyzing various languages and constructing language-independent semantic structures

  • US 8,078,450 B2
  • Filed: 10/10/2006
  • Issued: 12/13/2011
  • Est. Priority Date: 10/10/2006
  • Status: Active Grant
First Claim

1. A method for a computer to analyze a sentence in a source language, comprising:

    performing a lexical analysis of the sentence in the source language;

    performing a lexical-morphological analysis of each element of the sentence and building a lexical-morphological structure for the whole sentence;

    performing a rough syntactic analysis on the lexical-morphological structure, comprising:

    generating all possible constituents for each element of the lexical-morphological structure;

    generalizing the constituents to form a set of generalized constituents; and

    generating a graph of the generalized constituents to describe all possible syntactic structures of the whole sentence, comprising:

    for each generalized constituent having a lexical meaning and grammatical value which corresponds to a connection in the lexical-morphological structure:

    initializing a surface model for the generalized constituent;

    attempting to attach other constituents in surface slots of the syntforms of the surface model right and left neighboring constituents; and

    establishing non-tree links on the graph of the generalized constituents;

    performing a precise syntactic analysis to produce one or more syntactic trees for the sentence from the graph of the generalized constituents, the precise syntactic analysis including:

    generating a graph of precise constituents, the graph of precise constituents being an intermediate representation between the graph of the generalized constituents and the one or more syntactic trees;

    generating one or more syntactic structure variants from the graph of the precise constituents;

    rating the precise constituents based on a plurality of rating scores independently obtained and calculated, including rating scores of one or more lexical meanings for each element of the sentence, rating scores of one or more syntactic constructions for each element of the sentence, rating scores of a degree of correspondence of the precise constituents to their semantic descriptions, and rating scores of a linear order of constituents in the sentence;

    using the rating of the precise constituents to generate hypotheses about the overall syntactic structure of the sentence; and

    selecting one or more hypotheses about the overall syntactic structure of the sentence with a highest rating score from the hypotheses generated;

    selecting a best syntactic structure for the sentence from the one or more syntactic trees; and

    generating a language-independent semantic structure for the sentence in the source language based at least in part upon the best syntactic structure for the sentence.
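
The rating and selection steps of the claim can be illustrated with a minimal sketch. It is not the patented implementation. The class names, the `select_best` function, the example labels and scores, and the use of a plain sum to combine the four scores are all assumptions made for illustration, since the claim states only that the scores are obtained independently and that the highest-rated hypotheses are selected.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ConstituentRating:
    """Four independently computed scores for one precise constituent (hypothetical type)."""
    lexical: float       # rating of the lexical meaning
    syntactic: float     # rating of the syntactic construction
    semantic_fit: float  # correspondence to the semantic description
    linear_order: float  # rating of the linear order of constituents

    def total(self) -> float:
        # Assumption: a plain sum; the claim does not specify how scores combine.
        return self.lexical + self.syntactic + self.semantic_fit + self.linear_order


@dataclass(frozen=True)
class Hypothesis:
    """A candidate overall syntactic structure built from rated constituents."""
    label: str
    ratings: List[ConstituentRating]

    def score(self) -> float:
        return sum(r.total() for r in self.ratings)


def select_best(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Return every hypothesis that ties for the highest score."""
    if not hypotheses:
        return []
    top = max(h.score() for h in hypotheses)
    return [h for h in hypotheses if h.score() == top]


# Made-up example: two competing attachments for a prepositional phrase.
a = Hypothesis("attach-PP-to-verb", [ConstituentRating(1.0, 0.5, 0.5, 1.0)])
b = Hypothesis("attach-PP-to-noun", [ConstituentRating(1.0, 0.5, 0.25, 1.0)])
best = select_best([a, b])
```

Returning a list rather than a single item mirrors the claim's wording "one or more hypotheses": ties at the top score are all kept for the later step that selects the best syntactic structure.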
