Natural language analyzing apparatus and method, and construction of a knowledge base for natural language analysis
Abstract
A system for resolving structural ambiguities in syntactic analysis of natural language, which ambiguities are caused by prepositional phrase attachment, relative clause attachment, and other modifier-modifiee relationships in sentences. The system uses instances of dependency (modification relationship) structures extracted from a terminology dictionary as a knowledge base. Structural ambiguity is represented by indicating that a word in a sentence has several words as candidate modifiees. The system resolves such ambiguity by 1) first searching the knowledge base, which contains dependency information in the form of tree structures, for dependencies between the word and each of its possible modifiees, 2) then assigning an order of preference to these dependencies by means of a path search in the tree structures, and 3) finally selecting the most preferable dependency as the modifiee. The sentences can be analyzed by a parser and transformed into dependency structures by the system. The knowledge base can be constructed automatically, since the source of knowledge exists in the form of texts, and knowledge bootstrapping can be realized by adding the outputs of the system to its knowledge base.
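The representation of ambiguity described in the abstract (a word carrying several candidate modifiees) can be illustrated with a minimal sketch. The dictionary layout, the function names `is_ambiguous` and `candidate_pairs`, and the example words are all assumptions made for illustration; the patent does not prescribe them.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each word maps to the words it may modify.
# A word with more than one candidate modifiee marks a structural ambiguity.
Candidates = Dict[str, List[str]]


def is_ambiguous(candidates: Candidates) -> bool:
    """Return True if any word has more than one candidate modifiee."""
    return any(len(heads) > 1 for heads in candidates.values())


def candidate_pairs(candidates: Candidates) -> List[Tuple[str, str]]:
    """List one (modifier, modifiee) pair per possible dependency of each ambiguous word."""
    return [
        (modifier, modifiee)
        for modifier, heads in candidates.items()
        if len(heads) > 1
        for modifiee in heads
    ]


# Illustrative input: "telescope" may attach to either "saw" or "man".
example = {"man": ["saw"], "telescope": ["saw", "man"]}
pairs = candidate_pairs(example)
```

Each pair produced this way would then be looked up in the knowledge base and ranked, which is the part the path search handles.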
31 Claims
1. A programmed computer system for natural language analysis, comprising:
(a) knowledge base means for storing information representing dependencies among words in sentences and information representing taxonym relationships of words, said dependencies information being in the form of first-type tree structures and said taxonym information being in the form of second-type tree structures;
(b) table means, in said knowledge base means and responsive to the entry of a word thereto, for outputting information indicative of
1) a first-type tree structure in which said word appears,
2) node location information of said word in said first-type tree structure, and
3) information indicative of a second-type tree structure in which said word is contained as a hyponym;
(c) means for judging structural unambiguity of a sentence input to the system;
(d) means for extracting a modifier and modifiee pair of words for each possible dependency in a sentence judged to be structurally ambiguous by said judging means;
(e) means for entering said modifier-modifiee pair of words into said table means and determining, on the basis of the information output by said table means, a path for each said modifier-modifiee pair of words with the path having a different word of said modifier-modifiee pair of words at opposite ends and including at least some words in the first-type tree structure;
(f) means for calculating the path distance for each said modifier-modifiee pair of words; and
(g) means for determining the most preferable dependency on the basis of said path distance calculated for each said modifier-modifiee pair of words.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
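The three outputs of the table means in element (b) can be pictured as a single lookup record. The following is a minimal sketch, not the patented implementation: the `TableEntry` type, the tree identifiers `"D1"` and `"X_device"`, the sample words, and the encoding of a node location as a tuple of child indices from the root are all hypothetical. It also simplifies by keeping one entry per word, whereas a word may appear in several trees.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed encoding: a node location is the sequence of child indices
# followed from the root of the tree; () denotes the root itself.
NodeLocation = Tuple[int, ...]


@dataclass(frozen=True)
class TableEntry:
    dependency_tree: str          # (1) id of a first-type (dependency) tree containing the word
    node_location: NodeLocation   # (2) where the word sits in that tree
    taxonym_tree: Optional[str]   # (3) id of a second-type tree in which the word is a hyponym


# Hand-built illustrative table for one dependency tree: store -> (data, disk).
TABLE: Dict[str, TableEntry] = {
    "store": TableEntry("D1", (), None),
    "data": TableEntry("D1", (0,), None),
    "disk": TableEntry("D1", (1,), "X_device"),
}


def look_up(word: str) -> Optional[TableEntry]:
    """Return the table output for a word, or None if the word is unknown."""
    return TABLE.get(word)
```

Keeping the node location alongside the tree identifier means a path between two words can be derived from the table output alone, without walking the stored tree.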
13. In a computer system including a knowledge base that stores first-type tree structures representing dependencies among words in sentences and second-type tree structures representing taxonym relationships of words and including a table responsive to the entry of a word for 1) outputting identification data of said first-type tree structure in which said word appears, 2) node location data of said word in said first-type tree structure, and 3) identification data of said second-type tree structure in which said word appears as a hyponym, a computer implemented natural language analysis method comprising the steps of:
(a) judging the structural unambiguity of an incoming sentence;
(b) extracting a modifier and modifiee pair of two words for each possible dependency in a sentence judged to be structurally ambiguous;
(c) entering the two words of each said modifier-modifiee pair into said table means and for each said modifier-modifiee pair determining, on the basis of the output data, a path that has said two words of said modifier-modifiee pair at opposite ends and contains some of the words appearing in said first-type tree structure;
(d) calculating the path distance for each pair; and
(e) determining the most preferable dependency relationship on the basis of said path distance calculated for each said pair.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
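Steps (c) to (e) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself: node locations are assumed to be tuples of child indices from the root, path distance is taken to be the number of edges between two nodes of the same dependency tree, a shorter distance is assumed to be more preferable, and substitution of words through taxonym trees is left out. The names `tree_distance`, `path_distance`, and `most_preferable` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

NodeLocation = Tuple[int, ...]
# word -> list of (tree id, node location) occurrences, as output by the table
Table = Dict[str, List[Tuple[str, NodeLocation]]]


def tree_distance(a: NodeLocation, b: NodeLocation) -> int:
    """Count the edges between two nodes of one tree via their deepest common ancestor."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return len(a) + len(b) - 2 * common


def path_distance(table: Table, modifier: str, modifiee: str) -> float:
    """Shortest distance over all trees containing both words; infinity if none does."""
    best = math.inf
    for tree_a, loc_a in table.get(modifier, []):
        for tree_b, loc_b in table.get(modifiee, []):
            if tree_a == tree_b:
                best = min(best, tree_distance(loc_a, loc_b))
    return best


def most_preferable(table: Table, pairs: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Pick the pair with the smallest path distance (assumed preference rule)."""
    return min(pairs, key=lambda pair: path_distance(table, *pair))


# Illustrative table: "telescope" and "saw" share tree T1; "man" occurs only in T2.
table: Table = {
    "saw": [("T1", ())],
    "telescope": [("T1", (0, 0))],
    "man": [("T2", (1,))],
}
choice = most_preferable(table, [("telescope", "man"), ("telescope", "saw")])
```

Because the two node locations share a prefix up to their common ancestor, the distance falls out of simple tuple arithmetic and no tree traversal is needed.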
24. A method for constructing a knowledge base in a computer for natural language analysis comprising the computer implemented steps of:
(a) preparing a knowledge base that stores tree structures representing dependencies among words in sentences;
(b) determining the most preferable of the possible dependencies for an incoming sentence by using data in said knowledge base; and
(c) storing in said knowledge base a tree structure for the incoming sentence that includes said most preferable dependency.
Dependent claims: 25
26. A method of constructing a knowledge base in a computer for natural language analysis comprising the computer implemented steps of:
(a) preparing a database for storing tree structures representing dependencies among words in sentences, and preparing a table responsive to the entry of a word for outputting identification data of at least one of said tree structures containing said word and node location data of said word in said tree structure;
(b) determining the most preferable of the possible dependencies for an ambiguous sentence by using data in said database and said table; and
(c) storing in said database a tree structure for the ambiguous sentence that includes said most preferable dependency and renewing said table in response thereto.
Dependent claims: 27, 28
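Step (c), storing a resolved tree and renewing the table, might look like the sketch below. The `KnowledgeBase` class, its `store` method, the nested `(word, children)` tree encoding, and the integer tree identifiers are assumptions chosen for illustration; the disambiguation of step (b) is taken as already done.

```python
from typing import Dict, List, Tuple

# Assumed tree encoding: (word, [child trees]); locations are child-index tuples.
Tree = Tuple[str, list]
NodeLocation = Tuple[int, ...]


class KnowledgeBase:
    """Hypothetical database of dependency trees plus a word lookup table."""

    def __init__(self) -> None:
        self.trees: Dict[int, Tree] = {}
        self.table: Dict[str, List[Tuple[int, NodeLocation]]] = {}

    def store(self, tree: Tree) -> int:
        """Store a tree and renew the table so its words become searchable."""
        tree_id = len(self.trees)
        self.trees[tree_id] = tree
        self._renew(tree_id, tree, ())
        return tree_id

    def _renew(self, tree_id: int, node: Tree, location: NodeLocation) -> None:
        word, children = node
        self.table.setdefault(word, []).append((tree_id, location))
        for index, child in enumerate(children):
            self._renew(tree_id, child, location + (index,))


kb = KnowledgeBase()
kb.store(("saw", [("man", [])]))                # seed tree
kb.store(("saw", [("man", []), ("telescope", [])]))  # tree of a sentence after resolution
```

After the second call, a later sentence containing "telescope" can find a stored dependency for it, which is the bootstrapping effect the abstract describes.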
29. A programmed computer system for natural language analysis, comprising:
means for receiving respective syntactic structures including attachments indicative of ambiguities, derived from natural language sentences to be analyzed, and for converting said syntactic structures into dependency structures indicative of the dependency between words in said sentences;
knowledge base means for storing information representing dependencies between words in sentences and information representing taxonym relationships of words, said information being in the form of tree structures;
means for extracting dependencies from each dependency structure and producing respective multiple possible candidate dependencies;
means for searching said knowledge base means for stored information representing dependencies related to said candidate dependencies and for selecting the most preferable candidate dependency for each dependency structure based on a comparison of dependency distances derived from a path search in said tree structures of said knowledge base means; and
means for transforming each dependency structure using such dependency structure's most preferable candidate dependency to remove the ambiguities therein.
Dependent claims: 30, 31
Specification