Natural language analyzing apparatus and method, and construction of a knowledge base for natural language analysis

US 5,424,947 A
Filed: 06/12/1991
Issued: 06/13/1995
Est. Priority Date: 06/15/1990
Status: Expired due to Fees

Abstract

A system for resolving structural ambiguities in syntactic analysis of natural language, which ambiguities are caused by prepositional phrase attachment, relative clause attachment, and other modifier-modifiee relationships in sentences. The system uses instances of dependency (modification relationship) structures extracted from a terminology dictionary as a knowledge base. Structural ambiguity is represented by indicating that a word in a sentence has several words as candidate modifiees. The system resolves such ambiguity by 1) first searching the knowledge base, which contains dependency information in the form of tree structures, for dependencies between the word and each of its possible modifiees, 2) then assigning an order of preference to these dependencies by means of a path search in the tree structures, and 3) finally selecting the most preferable dependency as the modifiee. The sentences can be analyzed by a parser and transformed into dependency structures by the system. The knowledge base can be constructed automatically, since the source of knowledge exists in the form of texts, and knowledge bootstrapping can be realized by adding the outputs of the system to its knowledge base.
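The three steps in the abstract can be sketched in a few lines. This is a minimal illustration only, assuming that each dependency tree is stored as a list of (modifier, modifiee) edges and that preference is decided purely by the number of edges on the connecting path; the sample trees, the names `KNOWLEDGE_BASE`, `path_distance`, and `best_modifiee`, and the tie-breaking behaviour are all hypothetical and not taken from the patent.

```python
from collections import deque

# Hypothetical knowledge base: each tree is a list of (modifier, modifiee)
# edges. The words and trees are invented for illustration.
KNOWLEDGE_BASE = [
    [("operating", "system"), ("system", "store"),
     ("file", "store"), ("disk", "store")],
    [("keyboard", "user"), ("user", "type")],
]


def path_distance(tree, a, b):
    """Return the number of dependency edges between a and b, or None."""
    neighbours = {}
    for modifier, modifiee in tree:
        neighbours.setdefault(modifier, set()).add(modifiee)
        neighbours.setdefault(modifiee, set()).add(modifier)
    if a not in neighbours or b not in neighbours:
        return None
    queue, seen = deque([(a, 0)]), {a}
    while queue:  # breadth-first search finds the shortest path
        word, dist = queue.popleft()
        if word == b:
            return dist
        for nxt in neighbours[word] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


def best_modifiee(modifier, candidates, knowledge_base=KNOWLEDGE_BASE):
    """Pick the candidate modifiee with the shortest path to the modifier."""
    scored = []
    for candidate in candidates:
        distances = []
        for tree in knowledge_base:
            d = path_distance(tree, modifier, candidate)
            if d is not None:
                distances.append(d)
        if distances:
            scored.append((min(distances), candidate))
    # Assumption: ties fall back to alphabetical order of the candidate.
    return min(scored)[1] if scored else None
```

With the invented data, `best_modifiee("disk", ["file", "store"])` selects "store", because "disk" and "store" are one edge apart while "disk" and "file" are two edges apart.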

459 Citations

31 Claims

1. A programmed computer system for natural language analysis, comprising:
- (a) knowledge base means for storing information representing dependencies among words in sentences and information representing taxonym relationships of words, said dependencies information being in the form of first-type tree structures and said taxonym information being in the form of second-type tree structures;
  
  (b) table means, in said knowledge base means and responsive to the entry of a word thereto, for outputting information indicative of
  
       1) a first-type tree structure in which said word appears,
  
       2) node location information of said word in said first-type tree structure, and
  
       3) information indicative of a second-type tree structure in which said word is contained as a hyponym;
  
  (c) means for judging structural unambiguity of a sentence input to the system;
  
  (d) means for extracting a modifier and modifiee pair of words for each possible dependency in a sentence judged to be structurally ambiguous by said judging means;
  
  (e) means for entering said modifier-modifiee pair of words into said table means and determining, on the basis of the information output by said table means, a path for each said modifier-modifiee pair of words with the path having a different word of said modifier-modifiee pair of words at opposite ends and including at least some words in the first-type tree structure;
  
  (f) means for calculating the path distance for each said modifier-modifiee pair of words; and
  
  (g) means for determining the most preferable dependency on the basis of said path distance calculated for each said modifier-modifiee pair of words.
- - 2. A system for natural language analysis according to claim 1, further comprising:
    - (h) means for storing in said knowledge base means a first-type tree structure for an input sentence including said most preferable dependency determined by said determining means, and for renewing said table means in response thereto.
  - 3. A system for natural language analysis according to claim 2 wherein said knowledge base means separately stores learned data and context data added by said storing means.
  - 4. A system for natural language analysis according to claim 2 wherein said calculating means calculates said path distance on the basis of the degree of consistency between the path and a first-type tree structure added by said storing means.
  - 5. A system for natural language analysis according to claim 1 wherein said table means is separately prepared for learned data and for context data.
  - 6. A system for natural language analysis according to claim 1 wherein said calculating means calculates said path distance, based on the number of dependencies included in the path.
  - 7. A system for natural language analysis according to claim 1 wherein said first-type tree structure is provided with semantic case data for each dependency.
  - 8. A system for natural language analysis according to claim 7 wherein said calculating means calculates said path distance according to the consistency between the case relationship between a modifier and a candidate modifiee and the case relationship for the path.
  - 9. A system for natural language analysis according to claim 8 wherein said calculating means calculates a dependency distance using the formula:
    - ##EQU2## where n is a real number in the range 0 < n < 1 and is an heuristic parameter that represents the degree of unimportance of the context, and the other parameters have values of 1 or 0.
  - 10. A system for natural language analysis according to claim 1 wherein said calculating means calculates said path distance, on the basis of the consistency of co-occurrence of a word included in an input sentence and a word included in said first-type tree structure for the path.
  - 11. A system for natural language analysis according to claim 1 wherein said second-type tree structure is an isa tree having only two nodes corresponding to a hypernym and a hyponym, and wherein said entering means is responsive to an output of a hypernym of a word forming the pair, to iterate search for an isa tree including said hypernym as a hyponym.
  - 12. A system for natural language analysis according to claim 1 wherein a synonym relationship is represented by two isa trees.
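The table of claim 1(b) and the iterated hypernym search of claim 11 can be pictured as a word index. The following is a sketch under stated assumptions: node location is represented as an edge index plus a role, each isa tree is a two-node (hypernym, hyponym) pair, and a synonym pair is stored as two isa trees pointing in opposite directions (claim 12). The data and the names `build_table`, `hypernym_chain`, and `lookup` are hypothetical.

```python
from collections import defaultdict, deque

# Invented sample data. Dependency trees hold (modifier, modifiee) edges;
# isa trees hold exactly two nodes: (hypernym, hyponym).
DEPENDENCY_TREES = {"T1": [("file", "store"), ("device", "store")]}
ISA_TREES = {
    "I1": ("device", "disk"),
    "I2": ("disk", "floppy"),
    "I3": ("diskette", "floppy"),  # synonym pair stored as two isa trees
    "I4": ("floppy", "diskette"),
}
EMPTY = {"dep": (), "isa": ()}


def build_table(dep_trees, isa_trees):
    """Map each word to its dependency-tree locations and isa-tree ids."""
    table = defaultdict(lambda: {"dep": [], "isa": []})
    for tree_id, edges in dep_trees.items():
        for position, (modifier, modifiee) in enumerate(edges):
            table[modifier]["dep"].append((tree_id, position, "modifier"))
            table[modifiee]["dep"].append((tree_id, position, "modifiee"))
    for tree_id, (_hypernym, hyponym) in isa_trees.items():
        table[hyponym]["isa"].append(tree_id)  # indexed by hyponym only
    return dict(table)


def hypernym_chain(word, table, isa_trees):
    """Iterate the isa lookup, collecting hypernyms breadth-first."""
    chain, seen, frontier = [], {word}, deque([word])
    while frontier:
        current = frontier.popleft()
        for tree_id in table.get(current, EMPTY)["isa"]:
            hypernym = isa_trees[tree_id][0]
            if hypernym not in seen:  # guards against synonym cycles
                seen.add(hypernym)
                chain.append(hypernym)
                frontier.append(hypernym)
    return chain


def lookup(word, table, isa_trees):
    """Return the first word (itself or a hypernym) found in a dependency tree."""
    for candidate in [word] + hypernym_chain(word, table, isa_trees):
        deps = table.get(candidate, EMPTY)["dep"]
        if deps:
            return candidate, list(deps)
    return None, []
```

In this sketch, looking up "floppy" finds no dependency tree directly, so the search climbs through "disk" to "device", which appears in tree T1. The `seen` set keeps the two-tree synonym encoding from looping forever.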

13. In a computer system including a knowledge base that stores first-type tree structures representing dependencies among words in sentences and second-type tree structures representing taxonym relationships of words and including a table responsive to the entry of a word for 1) outputting identification data of said first-type tree structure in which said word appears, 2) node location data of said word in said first-type tree structure, and 3) identification data of said second-type tree structure in which said word appears as a hyponym, a computer implemented natural language analysis method comprising the steps of:
- (a) judging the structural unambiguity of an incoming sentence;
  
  (b) extracting a modifier and modifiee pair of two words for each possible dependency in a sentence judged to be structurally ambiguous;
  
  (c) entering the two words of each said modifier-modifiee pair into said table means and for each said modifier-modifiee pair determining, on the basis of the output data, a path that has said two words of said modifier-modifiee pair at opposite ends and contains some of the words appearing in said first-type tree structure;
  
  (d) calculating the path distance for each pair; and
  
  (e) determining the most preferable dependency relationship, on the basis of said path distance calculated for each said pair.
- - 14. A natural language analysis method according to claim 13 further comprising the step of:
    - (f) storing in said knowledge base a first-type tree structure for an input sentence including said most preferable dependency determined by said step (e) and renewing said table in response thereto.
  - 15. A natural language analysis method according to claim 14 wherein said knowledge base separately stores learned data and context data added by said step (f).
  - 16. A natural language analysis method according to claim 14 wherein said table is separately prepared for learned data and context data.
  - 17. A natural language analysis method according to claim 13 wherein said step (d) calculates said distance, on the basis of the number of dependencies included in the path.
  - 18. A natural language analysis method according to claim 13 wherein said first-type tree structure is provided with semantic case data for each dependency.
  - 19. A natural language analysis method according to claim 18 wherein said step (d) calculates said distance according to the consistency between the case relationship of a modifier and a candidate modifiee and the case relationship for the path.
  - 20. A natural language analysis method according to claim 13 wherein said step (d) calculates said distance according to the co-occurrence consistency of a word included in said input sentence and a word included in said first-type tree structure for the path.
  - 21. A natural language analysis method according to claim 13 wherein said step (d) calculates said distance according to the degree of consistency between the path and a first-type tree structure added by said step (f).
  - 22. A natural language analysis method according to claim 13 wherein said second-type tree structure is an isa tree having only two nodes corresponding to a hypernym and a hyponym, and wherein said step (c) is responsive to an output of a hypernym of a word forming the pair, to iterate search for an isa tree including said hypernym as a hyponym.
  - 23. A natural language analysis method according to claim 13 wherein a synonym relationship is represented by two isa tree structures.

24. A method for constructing a knowledge base in a computer for natural language analysis comprising the computer implemented steps of:
- (a) preparing a knowledge base that stores tree structures representing dependencies among words in sentences;
  
  (b) determining the most preferable of the possible dependencies for an incoming sentence by using data in said knowledge base; and
  
  (c) storing in said knowledge base a tree structure for the incoming sentence that includes said most preferable dependency.
- - 25. A method for constructing a knowledge base according to claim 24 wherein said knowledge base separately stores learned data and context data added by said step (c).

26. A method of constructing a knowledge base in a computer for natural language analysis comprising the computer implemented steps of:
- (a) preparing a database for storing tree structures representing dependencies among words in sentences, and preparing a table responsive to the entry of a word for outputting identification data of at least one of said tree structures containing said word and node location data of said word in said tree structure;
  
  (b) determining the most preferable of the possible dependencies for an ambiguous sentence by using data in said database and said table; and
  
  (c) storing in said database a tree structure for the ambiguous sentence that includes said most preferable dependency and renewing said table in response thereto.
- - 27. A method for constructing a knowledge base according to claim 26 wherein said knowledge base separately stores learned data and context data added by said step (c).
  - 28. A method for constructing a knowledge base according to claim 26 wherein said table is separately prepared for learned data and for context data.
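The knowledge base construction claims (24 to 28) describe storing each resolved tree back into the knowledge base, renewing the table, and keeping learned data apart from context data. A rough sketch of that bookkeeping follows. The class `KnowledgeBase`, its method names, the tree-id scheme, and the `clear_context` helper are all hypothetical; in particular, discarding context data is only one possible use of the separation and is not stated in the claims.

```python
class KnowledgeBase:
    """Toy store that keeps learned and context trees in separate partitions."""

    KINDS = ("learned", "context")

    def __init__(self):
        self.trees = {kind: {} for kind in self.KINDS}
        self.table = {kind: {} for kind in self.KINDS}
        self._next_id = 0

    def store(self, edges, kind="context"):
        """Store a resolved dependency tree and renew the matching table."""
        if kind not in self.KINDS:
            raise ValueError(f"unknown kind: {kind!r}")
        tree_id = f"{kind[0].upper()}{self._next_id}"  # e.g. "L0", "C1"
        self._next_id += 1
        edges = list(edges)
        self.trees[kind][tree_id] = edges
        for position, (modifier, modifiee) in enumerate(edges):
            for word in (modifier, modifiee):
                self.table[kind].setdefault(word, []).append((tree_id, position))
        return tree_id

    def lookup(self, word):
        """Return tree ids and node positions for a word, per partition."""
        return {kind: list(index.get(word, []))
                for kind, index in self.table.items()}

    def clear_context(self):
        """Assumed helper: drop context data while keeping learned data."""
        self.trees["context"].clear()
        self.table["context"].clear()
```

A caller would resolve an ambiguous sentence first, then pass the chosen edges to `store`, so that later sentences can find paths through the newly added tree.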

29. A programmed computer system for natural language analysis, comprising:
- means for receiving respective syntactic structures including attachments indicative of ambiguities, derived from natural language sentences to be analyzed, and for converting said syntactic structures into dependency structures indicative of the dependency between words in said sentences;
  
  knowledge base means for storing information representing dependencies between words in sentences and information representing taxonym relationships of words, said information being in the form of tree structures;
  
  means for extracting dependencies from each dependency structure and producing respective multiple possible candidate dependencies;
  
  means for searching said knowledge base means for stored information representing dependencies related to said candidate dependencies and for selecting the most preferable candidate dependency for each dependency structure based on a comparison of dependency distances derived from a path search in said tree structures of said knowledge base means; and
  
  means for transforming each dependency structure using such dependency structure's most preferable candidate dependency to remove the ambiguities therein.
- - 30. A system for natural language analysis as in claim 29 wherein said knowledge base comprises table means responsive to the entry of a word for outputting identification data of at least one of said tree structures containing said word and node location data of said word in said tree structure.
  - 31. A system for natural language analysis as in claim 29 wherein said dependency distances are determined using the formula:
    - ##EQU3## where n is a real number in the range 0 < n < 1 and is an heuristic parameter that represents the degree of unimportance of the context, and the other parameters have values of 1 or 0.


Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Nagao, Katashi; Nomiyama, Hiroshi
Primary Examiner(s): Weinhardt, Robert A.
Assistant Examiner(s): CHUNG TRANS, XUONG MY
Application Number: US07/714,408
Time in Patent Office: 1,462 Days
Field of Search: 364/419, 364/419.08, 364/419.04
US Class Current: 704/9
CPC Class Codes:
G06F 40/211 Syntactic parsing, e.g. bas...
G06F 40/30 Semantic analysis
