Method and system for generating lexicon of cooccurrence relations in natural language

US 4,942,526 A
Filed: 10/24/1986
Issued: 07/17/1990
Est. Priority Date: 10/25/1985
Status: Expired due to Fees

Abstract

A method and an apparatus for generating/maintaining automatically or interactively a lexicon for storing information of cooccurrence relations utilized for determining whether or not a sequence of words in a given sentence described in a natural language is semantically correct with the aid of a memory, a data processor and a textual sentence file. A hypothesized cooccurrence relation table for storing hypothesized cooccurrence relations each having a high probability of being a valid cooccurrence relation is prepared by consulting the file. A hypothesis for the cooccurrence relation is previously established on the basis of a cooccurrence relation pattern indicating a probably acceptable conjunction by consulting the hypothesized cooccurrence relation table. Subsequently, a corresponding actual cooccurrence relation is derived from the textual file by parsing the relevant textual sentence and is tested to determine whether the cooccurrence relation is valid or not with reference to predetermined threshold conditions. On the basis of the results of the test, the information of the cooccurrence relation is correspondingly modified. The present method and apparatus can be utilized in a natural language parsing system and a machine translation system.

163 Citations

13 Claims

1. A method, using a computer including a processor and a memory, of generating cooccurrence relation information indicating whether a sequence of words in a given sentence described in a natural language is semantically correct or not, said method comprising the steps of:
- (a) defining categories of sentences on the basis of the types of documents in which the sentences appear;
  
  (b) defining fields of sentences on the basis of the subject matters of the sentences;
  
  (c) preparing a text corpus by collecting input textual sentences belonging to the same category or the same field as the given sentence;
  
  (d) preparing a cooccurrence relation table containing grammar or a set of grammatical rules for analyzing the textual sentences of the text corpus to permit determining a cooccurrence relation between words in the textual sentences;
  
  (e) determining a hypothesized cooccurrence relation between words in the sequence of words in the given sentence on the basis of a cooccurrence relation from said cooccurrence relation table, the hypothesized cooccurrence relation indicating a particular possible cooccurrence relation between words in the given sentence;
  
  (f) deriving an actual cooccurrence relation between words in the sequence of words in the given sentence from the determined hypothesized cooccurrence relation;
  
  (g) determining whether the actual cooccurrence relation exceeds a predetermined threshold condition for a valid cooccurrence relation; and
  
  (h) when the actual cooccurrence relation exceeds the predetermined threshold condition, outputting information indicating the actual cooccurrence relation as a valid cooccurrence relation.
- Dependent claims (2, 3, 4, 5):
  - 2. A method of generating cooccurrence relation information according to claim 1, wherein steps (e) and (f) are automatically executed in accordance with a predetermined processing program.
  - 3. A method of generating cooccurrence relation information according to claim 1, wherein steps (e) and (f) are executed in accordance with a processing program which interacts with a display device adapted for displaying the cooccurrence relation information and an information input device.
  - 4. A method of generating cooccurrence relation information according to claim 1, further comprising the step of inputting information of the valid cooccurrence relation into a cooccurrence relation lexicon in accordance with predetermined conditions for inclusion in said lexicon.
  - 5. A method of generating cooccurrence relation information according to claim 1, wherein the outputted information includes first data concerning the valid cooccurrence relation and second data representative of a combination of words for which the cooccurrence relation is valid.
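
The flow of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `Sentence` type, the category and field labels, the sample sentences, and the `support` function are all assumptions made for illustration. In particular, `support` replaces the grammatical analysis of step (d) with a crude test (the head word appears before the dependent word in the same sentence), and the threshold is taken to be a simple count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sentence:
    text: str
    category: str  # document type, e.g. "manual" (hypothetical label)
    field: str     # subject matter, e.g. "computing" (hypothetical label)


def select_corpus(sentences, category, field):
    """Step (c): keep sentences that share the category OR the field."""
    return [s for s in sentences if s.category == category or s.field == field]


def support(pair, corpus):
    """Stand-in for grammatical analysis: count the sentences in which
    the head word occurs somewhere before the dependent word."""
    head, dependent = pair
    hits = 0
    for s in corpus:
        words = s.text.lower().split()
        if head in words and dependent in words[words.index(head) + 1:]:
            hits += 1
    return hits


def valid_relations(hypotheses, corpus, threshold):
    """Steps (e)-(h): keep hypothesized pairs whose count exceeds the threshold."""
    scored = {pair: support(pair, corpus) for pair in hypotheses}
    return {pair: n for pair, n in scored.items() if n > threshold}


sentences = [
    Sentence("install the program", "manual", "computing"),
    Sentence("run the program", "manual", "computing"),
    Sentence("run the program again", "report", "computing"),
    Sentence("drink the program", "novel", "fiction"),
]
corpus = select_corpus(sentences, category="manual", field="computing")
result = valid_relations([("run", "program"), ("drink", "program")], corpus, threshold=1)
print(result)  # {('run', 'program'): 2}
```

Restricting the corpus first means the implausible pair is never supported: the only sentence containing it belongs to a different category and field, so it is excluded before counting.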

6. A method, using a computer including a processor and a memory, of automatically generating and maintaining a cooccurrence relation lexicon storing cooccurrence relation information indicating whether a sequence of words in a given sentence described in a natural language is semantically correct or not, said method comprising the steps of:
- (a) storing in said memory a processing program for generating or maintaining said cooccurrence relation lexicon and a table containing hypothesized cooccurrence relations of high probability;
  
  (b) defining categories of sentences on the basis of the types of documents in which the sentences appear;
  
  (c) defining fields of sentences on the basis of the subject matters of the sentences;
  
  (d) preparing a text corpus file by collecting input textual sentences belonging to the same category or the same field as the given sentence;
  
  (e) determining a hypothesized cooccurrence relation between words in the sequence of words in the given sentence on the basis of a cooccurrence relation from said hypothesized cooccurrence relation table, the hypothesized cooccurrence relation indicating a particular possible cooccurrence relation between words in the given sentence;
  
  (f) deriving from said text corpus file actual textual sentences relevant to terms contained in the most recently determined hypothesized cooccurrence relation, analyzing the derived actual textual sentences, and storing the result of the analysis in said memory;
  
  (g) determining whether the result of the analysis indicates that information having the most recently determined hypothesized cooccurrence relation meets predetermined threshold conditions;
  
  (h) when the result of the analysis indicates that the information having the most recently determined hypothesized cooccurrence relation meets the predetermined threshold conditions, including the most recently determined hypothesized cooccurrence relation in said lexicon unless data of cooccurrence relations corresponding to a super-concept or a subconcept of the most recently determined hypothesized cooccurrence relation are present in said lexicon, and examining the probability of determining another hypothesized cooccurrence relation;
  
  (i) when the result of the analysis indicates that the information having the most recently determined hypothesized cooccurrence relation does not meet the predetermined threshold conditions, examining the probability of determining a further hypothesized cooccurrence relation;
  
  (j) when the result of the most recent analysis indicates that the possible further hypothesized cooccurrence relation does not meet the predetermined threshold conditions, examining the probability of determining a still further hypothesized cooccurrence relation; and
  
  (k) when a probability of establishing a further hypothesized cooccurrence relation is found in step (h), (i), or (j), re-executing the method commencing with step (e).
- Dependent claims (7, 8, 13):
  - 7. A method according to claim 6, wherein step (g) includes reading out data of the cooccurrence relation registered in said lexicon and deleting or modifying the read out data unless the predetermined threshold conditions are met by the readout data.
  - 8. A method according to claim 6, wherein step (d) includes updating said text corpus file periodically.
  - 13. A method of generating cooccurrence relation information according to claim 6 wherein step (a) comprises storing in said memory a processing program for generating and maintaining said cooccurrence relation lexicon.
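
The maintenance loop of claims 6 and 7 can be sketched as follows. This is an illustrative reading, not the patent's algorithm: the `PARENT` concept hierarchy, the (verb, noun) form of a relation, the `counts` table standing in for corpus analysis, and the numeric threshold are all assumptions. The sketch shows two behaviours described in the claims: a hypothesis that meets the threshold is registered unless an entry for a super-concept or sub-concept is already present, and registered entries that no longer meet the threshold are deleted.

```python
from collections import deque

# Hypothetical concept hierarchy: child concept -> parent concept.
PARENT = {"program": "software", "software": "artifact"}


def ancestors(concept):
    """Yield every super-concept of `concept`, nearest first."""
    while concept in PARENT:
        concept = PARENT[concept]
        yield concept


def related(a, b):
    """True if a is a super-concept or a sub-concept of b."""
    return a in ancestors(b) or b in ancestors(a)


def covered(hypothesis, lexicon):
    """Step (h) exclusion: the lexicon already holds the same verb with a
    super-concept or sub-concept of the hypothesized noun."""
    verb, noun = hypothesis
    return any(v == verb and related(n, noun) for v, n in lexicon)


def maintain(lexicon, hypotheses, support, threshold):
    """Prune stale entries (claim 7), then test hypotheses in turn (claim 6)."""
    for entry in list(lexicon):           # copy: the set is modified in the loop
        if support(entry) < threshold:
            lexicon.discard(entry)
    queue = deque(hypotheses)
    while queue:                          # step (k): repeat while hypotheses remain
        hyp = queue.popleft()             # step (e): next hypothesized relation
        if support(hyp) >= threshold and not covered(hyp, lexicon):
            lexicon.add(hyp)              # step (h): register in the lexicon
    return lexicon


# Assumed support counts, standing in for analysis of the text corpus file.
counts = {("run", "software"): 5, ("run", "program"): 4,
          ("compile", "program"): 3, ("print", "artifact"): 1}
lexicon = {("print", "artifact")}
hypotheses = [("run", "software"), ("run", "program"),
              ("eat", "program"), ("compile", "program")]
result = maintain(lexicon, hypotheses, lambda h: counts.get(h, 0), threshold=2)
print(sorted(result))  # [('compile', 'program'), ('run', 'software')]
```

Here `("run", "program")` meets the threshold but is skipped because `("run", "software")` was registered first and "software" is a super-concept of "program" in the assumed hierarchy.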

9. A system for generating cooccurrence relation information indicating whether a sequence of words in a given sentence described in a natural language is semantically correct or not, wherein the given sentence is defined as within a particular one of a plurality of sentence categories on the basis of the type of document in which the given sentence appears and is defined as within a particular one of a plurality of sentence fields on the basis of the subject matter of the given sentence, said system comprising:
- a text corpus file including textual sentences belonging to the same category or the same field as the given sentence;
  
  a cooccurrence relation table containing grammar or a set of grammatical rules for analyzing the textual sentences of said text corpus file to permit determining a cooccurrence relation between words in the textual sentences;
  
  a memory including an area for storing a hypothesized cooccurrence relation table listing hypothesized cooccurrence relations having a high probability of valid cooccurrence relations and an area for storing a processing program for executing algorithms for automatically generating and maintaining a cooccurrence relation lexicon;
  
  means for determining hypothesized cooccurrence relations between words of the sequence of words in the given sentence on the basis of cooccurrence relation patterns, indicative of high probability of a particular cooccurrence relation extracted from said hypothesized cooccurrence relation table in accordance with a processing program stored in said memory; and
  
  testing means for responding to hypothesized cooccurrence relations determined by said determining means to derive textual sentences having relevant actual cooccurrence relation patterns from said text corpus file and for analyzing each of the derived textual sentences with the aid of sentence analysis or generation rules and a sentence analysis or generation lexicon, said testing means including means for examining whether the result of the analysis indicates that the derived textual sentences meet predetermined threshold conditions for a valid cooccurrence relation and means for outputting information indicating the valid cooccurrence relation.
- Dependent claims (10):
  - 10. A system according to claim 9, further comprising registration control means for comparing the valid cooccurrence relation information from said testing means with a predetermined condition, and means responsive to the valid cooccurrence relation information meeting the predetermined condition for modifying the contents of said cooccurrence relation lexicon.

11. A method, using a computer including a processor and a memory, of generating cooccurrence relation information indicating whether a sequence of words in a given sentence described in a natural language is semantically correct or not, said method comprising the steps of:
- (a) defining categories of sentences on the basis of the types of documents in which the sentences appear;
  
  (b) defining fields of sentences on the basis of the subject matters of the sentences;
  
  (c) preparing a text corpus by collecting input textual sentences belonging to the same category or the same field as the given sentence;
  
  (d) determining a hypothesized cooccurrence relation between words in the sequence of words in the given sentence on the basis of a cooccurrence relation pattern set up by an operator and indicating a particular possible cooccurrence relation between words in the given sentence;
  
  (e) deriving an actual cooccurrence relation between words in the sequence of words in the given sentence from said text corpus for the determined hypothesized cooccurrence relation;
  
  (f) determining whether the actual cooccurrence relation exceeds a predetermined threshold condition for a valid cooccurrence relation; and
  
  (g) when the actual cooccurrence relation exceeds the predetermined threshold condition, outputting information indicating the actual cooccurrence relation as a valid cooccurrence relation.
- Dependent claims (12):
  - 12. A method of generating cooccurrence relation information according to claim 11, wherein step (d) determines a hypothesized cooccurrence relation directly by the operator.
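
One way to picture the operator-supplied pattern of claims 11 and 12 is as a template with optional wildcard slots. The sketch below is an assumption throughout: the claims do not specify a pattern syntax, so the (head, relation, dependent) triple format, the use of `None` as a wildcard, the `TRIPLES` sample data, and the count threshold are all hypothetical. The corpus is assumed to be parsed already.

```python
from collections import Counter

# Hypothetical pre-parsed corpus: (head, relation, dependent) triples.
TRIPLES = [
    ("drink", "object", "water"),
    ("drink", "object", "water"),
    ("drink", "object", "tea"),
    ("read", "object", "book"),
]


def matches(pattern, triple):
    """A None slot in the operator's pattern matches any value."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def evaluate_pattern(pattern, triples, threshold):
    """Count corpus triples matching the pattern and keep those whose
    count exceeds the threshold."""
    counts = Counter(t for t in triples if matches(pattern, t))
    return {t: n for t, n in counts.items() if n > threshold}


# The operator asks: which objects of "drink" are well attested?
operator_pattern = ("drink", "object", None)
print(evaluate_pattern(operator_pattern, TRIPLES, threshold=1))
# {('drink', 'object', 'water'): 2}
```

A pattern with no wildcard slots, such as `("read", "object", "book")`, corresponds to the case in claim 12 where the operator states the hypothesized relation directly; with the sample data and `threshold=1` it yields an empty result, since that triple occurs only once.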

Current Assignee: Hitachi, Ltd.
Original Assignee: Hitachi, Ltd.
Inventors: Okajima, Atsushi; Yamano, Fumiyuki; Katagiri, Eri
Primary Examiner(s): James, Andrew J.
Assistant Examiner(s): Nguyen, Viet Q.
Application Number: US06/922,889
Time in Patent Office: 1,362 Days
Field of Search: 364/300, 364/200, 364/419
US Class Current: 704/10
CPC Class Codes:
G06F 16/36   Creation of semantic tools,...
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/289   Phrasal analysis, e.g. fini...
