Multiple score language processing system

  • US 5,418,717 A
  • Filed: 12/12/1991
  • Issued: 05/23/1995
  • Est. Priority Date: 08/27/1990
  • Status: Expired due to Fees
First Claim

1. A language processing system for generating the most likely analysis of the type of an annotated syntax tree of a sentence comprising a word sequence, wherein the word sequence is received from digitally encoded text, and outputting the most likely analysis via computer processing means, wherein said most likely analysis includes the most likely sequence of lexical categories for the words, the most likely syntactic structure of the type of a syntax tree for the sentence, and the most likely semantic attribute for each word, the language processing system comprising:

  • means for storing dictionary data records containing possible lexical categories and semantic attributes of words in said computer;

    means for storing grammar rules, indicative of the parent-children node relationship among grammatical constituents, by computer processing means, and assigning an ordered list of numbers (hereinafter, a permutation vector), for each grammar rule indicative of the semantic precedence of each child node relative to the other nodes;

    means for decomposing a syntax tree into a plurality of phrase levels representative of the structure and substructures of said tree, and the context under which a substructure is constructed, by computer processing means;

    annotating means for forming an ordered semantic feature vector for each node of a syntax tree representative of the major semantic features of said each node, and the semantic relationship among the features of the words, by transferring the semantic attributes of the words upward to the tree nodes, according to said permutation vectors, by computer processing means;

    means for deriving data records indicative of the real usage of the words, lexical categories, syntactic structures and semantic feature co-occurrence, in text corpora annotated with lexical categories, syntax trees and semantic attributes, with computer processing means, by using said decomposing means and annotating means;

    means for storing statistical data, derived from said annotated text corpora, indicative of the probability of a word among all words having a common lexical category (hereinafter, lexical category probability), the probability of a lexical category being preceded by at least one neighboring lexical category (hereinafter, lexical context probability), the probability of a phrase level being reduced from a neighboring phrase level, or equivalently, the probability of constructing a nonterminal node under a particular contextual environment defined by neighboring terminal or nonterminal nodes (hereinafter, syntactic score probability), and the probability of a node being annotated with a particular ordered semantic feature vector given the syntactic subtree rooted at said node and at least one adjacent node of said node being annotated (hereinafter, semantic score probability);

    means for receiving a sentence from computer input devices or storage media;

    means, operative on said stored dictionary data, grammar rules and permutation vectors, for determining all possible annotated syntax trees, or equivalently, all possible lexical category sequences for the words, all syntactic structures, of the type of a syntax tree, for said lexical category sequences, and all semantic attribute sequences corresponding to said category sequences, and said syntactic structures, by computer processing means, for said sentence or word sequence;

    means, operative on said stored statistical data by computer processing means, for generating an analysis score, for each possible analysis (or annotated syntax tree), of said sentence or word sequence; and

    means for determining the most likely sequence of lexical categories for the words;

    means for determining the most likely syntactic structure for a sentence;

    means for determining the most likely semantic attribute for a plurality of words in the text word; and

    means for outputting an output annotated syntax tree according to said analysis score thus generated.
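
The scoring and selection elements of the claim can be illustrated with a minimal sketch. The claim names four kinds of probability (lexical category, lexical context, syntactic score, semantic score) but does not say in this text how they are combined into an analysis score. The sketch below assumes they are simply multiplied, computed as a sum of logarithms for numerical stability. The function names, the dictionary layout, and all probability values are hypothetical and are not taken from the patent.

```python
import math

# The four probability families named in the claim.
FACTORS = ("lexical", "context", "syntactic", "semantic")


def analysis_score(candidate):
    """Log-score of one candidate annotated syntax tree.

    ASSUMPTION: the four probability families are combined by
    multiplication, i.e. by summing their logarithms. The claim text
    itself does not specify the combination rule.
    """
    return sum(math.log(p) for name in FACTORS for p in candidate[name])


def most_likely_analysis(candidates):
    """Return the candidate analysis with the highest analysis score."""
    return max(candidates, key=analysis_score)


# Two hypothetical analyses of the same sentence, with made-up probabilities.
candidates = [
    {"name": "A", "lexical": [0.02], "context": [0.4],
     "syntactic": [0.3], "semantic": [0.5]},
    {"name": "B", "lexical": [0.001], "context": [0.05],
     "syntactic": [0.6], "semantic": [0.5]},
]

best = most_likely_analysis(candidates)
print(best["name"])
```

In a full system the candidate list would come from the means that enumerates all possible annotated syntax trees, and each probability would be looked up in the stored statistical data rather than written by hand.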
