Parser for natural language text

US 4,887,212 A
Filed: 10/29/1986
Issued: 12/12/1989
Est. Priority Date: 10/29/1986
Status: Expired due to Fees

Abstract

An improved natural language text parser is disclosed which provides syntactic analysis of text using a fast and compact technique. Sequential steps of word isolation, morphological analysis and dictionary look-up, combined with a complement grammar analysis, are applied to an input data stream of words. Word expert rules, verb group analysis and clause analysis are then applied to provide an output data structure where the words in the input data stream are associated with appropriate phrase markings. The principle of operation of the parser is applicable to a variety of Indo-European languages and provides a faster and more compact technique for parsing in a data processor than has been available in the prior art.
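
The first stages named in the abstract (word isolation, morphological analysis and dictionary look-up) can be illustrated with a minimal Python sketch. This is not the patented implementation: the dictionary contents, the affix tables, the tag names and the function names `isolate_words` and `lookup` are all assumptions made for illustration, and the two morphological passes of claim 1 are collapsed here into a single fallback that runs only when the direct look-up fails.

```python
import re

# Hypothetical toy dictionary: word -> possible parts of speech.
DICTIONARY = {
    "the": {"DET"},
    "dog": {"NOUN", "VERB"},
    "walk": {"NOUN", "VERB"},
    "happy": {"ADJ"},
}

# Hypothetical affix tables; a real parser would need far richer rules.
PREFIXES = ("un", "re")
SUFFIXES = (
    ("ing", {"VERB"}),
    ("ed", {"VERB"}),
    ("ly", {"ADV"}),
    ("s", {"NOUN", "VERB"}),
)


def isolate_words(text):
    """Split an input string into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lookup(word):
    """Return the candidate parts of speech for one word."""
    if word in DICTIONARY:
        return set(DICTIONARY[word])
    # Fallback 1: strip a known prefix and reuse the stem's tags.
    for prefix in PREFIXES:
        stem = word[len(prefix):]
        if word.startswith(prefix) and stem in DICTIONARY:
            return set(DICTIONARY[stem])
    # Fallback 2: strip a known suffix and use the tags that suffix implies.
    for suffix, tags in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in DICTIONARY:
            return set(tags)
    return {"UNKNOWN"}


tagged = [(w, lookup(w)) for w in isolate_words("The dog walked.")]
```

Each word ends up paired with a set of candidate tags rather than a single tag, which is what allows later context-based stages to narrow the choice.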

445 Citations

7 Claims

1. A data processing method for parsing natural language text, comprising the steps of:
- inputting a human language word string;
  
  isolating word components of said input strings;
  
  performing a first morphological analysis of the isolated words from said input string, to strip off prefixes and suffixes;
  
  looking up in a dictionary all the possible parts of speech for each word in said input stream;
  
  performing a second morphological analysis on words from said input data stream which are not successfully matched in said dictionary look-up;
  
  applying complement grammar rules to eliminate impossible parts of speech from consideration based upon the context within which the words of the input data stream occur;
  
  resolving ambiguities in those words which require consideration of semantic as well as syntactic characteristics;
  
  selecting those words identified as verbs and grouping them according to valid verb group sequences;
  
  performing a clause analysis including a verb analysis complement analysis, a noun phrase determination, a prepositional phrase structure determination and a grammar violation analysis; and
  
  outputting a data structure where the words from the input data stream are associated with parts of speech and with group markings that indicate phrase structure.
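
The step of eliminating impossible parts of speech from context can be sketched as follows. The claim does not spell out the complement grammar rules, so the two rules in `IMPOSSIBLE_AFTER`, the tag names and the function name `eliminate` are hypothetical; the sketch only shows the general idea of pruning candidate tags that cannot follow an unambiguous left neighbour.

```python
# Hypothetical rules: (tag of previous word, tag that cannot follow it).
IMPOSSIBLE_AFTER = {
    ("DET", "VERB"),   # assumed: a verb cannot directly follow a determiner
    ("PRON", "NOUN"),  # assumed: a bare noun cannot directly follow a pronoun
}


def eliminate(candidates):
    """Prune impossible tags from a list of (word, set_of_tags) pairs.

    Returns a new list; the input is left unmodified.
    """
    pruned = [(word, set(tags)) for word, tags in candidates]
    changed = True
    while changed:  # repeat until no rule removes anything more
        changed = False
        for i in range(1, len(pruned)):
            prev_tags = pruned[i - 1][1]
            tags = pruned[i][1]
            if len(prev_tags) != 1:
                continue  # act only when the left context is unambiguous
            (prev,) = prev_tags
            blocked = {t for t in tags if (prev, t) in IMPOSSIBLE_AFTER}
            # Never remove every candidate: keep at least one tag per word.
            if blocked and blocked != tags:
                tags -= blocked
                changed = True
    return pruned


result = eliminate([("the", {"DET"}), ("walk", {"NOUN", "VERB"})])
```

Because the rules only remove candidates and never add them, the loop is guaranteed to terminate, and a word resolved in one pass can serve as unambiguous context for its right neighbour in the next.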

2. A data processing method for parsing natural language text in a computer having a memory, comprising the steps of:
- inputting a human language word string;
  
  isolating word components of said input word string in a bidirectional list data structure;
  
  storing a list data structure in said memory, said list data structure being a plurality of list nodes stored in said memory, each list node including a first address pointer to a preceding list node and a second address pointer to a succeeding list node in said list data structure;
  
  said list data structure further including string nodes stored in said memory, each string node being pointed to by a corresponding one of said list nodes, said string node storing information relating to a character string stored in said memory representing one of said isolated word components;
  
  said list data structure further including property nodes stored in said memory, each said property node being pointed to by a corresponding one of said string nodes stored in said memory, said property nodes storing information related to the language attributes of said character string representing said one of said isolated word components;
  
  looking up in a dictionary stored in association with said computer, the language attributes for one of said isolated word components associated with one of said string nodes and storing information accessed from said dictionary in response thereto, in association with one of said property nodes pointed to by said one of said string nodes;
  
  performing an analysis of said input word stream by accessing said list nodes in both a forward and a backward direction along said list data structure, accessing said string nodes pointed to by said accessed list nodes, accessing said property nodes pointed to by said accessed string nodes, and processing in context said character strings and their language attributes relating to said accessed string nodes and property nodes in accordance with stored program instructions for carrying out said analysis;
  
  outputting the results of said analysis;
  
  whereby an analysis can be made using the context within which words occur in the input word string.
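
The three-level structure recited in claim 2 (list nodes linked in both directions, each pointing to a string node, which in turn points to a property node) can be sketched in Python as below. Object references stand in for the claim's address pointers, and the class names, field names and helper functions are assumptions for illustration, not names taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PropertyNode:
    """Language attributes (here, candidate parts of speech) of one string."""
    attributes: set = field(default_factory=set)


@dataclass
class StringNode:
    """The character string of one isolated word, plus its property node."""
    text: str
    prop: PropertyNode = field(default_factory=PropertyNode)


@dataclass(eq=False)  # identity comparison avoids recursing through links
class ListNode:
    """One element of the bidirectional list."""
    string: StringNode
    prev: Optional["ListNode"] = None
    next: Optional["ListNode"] = None


def build_list(words, dictionary):
    """Link the words into a doubly linked list; return (head, tail)."""
    head = tail = None
    for word in words:
        attrs = set(dictionary.get(word, ()))
        node = ListNode(StringNode(word, PropertyNode(attrs)))
        if tail is None:
            head = node
        else:
            tail.next = node
            node.prev = tail
        tail = node
    return head, tail


def forward(node):
    """Walk the list from the given node towards the tail."""
    while node is not None:
        yield node
        node = node.next


def backward(node):
    """Walk the list from the given node towards the head."""
    while node is not None:
        yield node
        node = node.prev


head, tail = build_list(["the", "dog"], {"the": {"DET"}})
```

Keeping the attributes in a separate property node means an analysis step can inspect or narrow a word's attributes while looking at either neighbour through `prev` and `next`, which is the context-dependent access the claim describes.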

3. A data processing method for parsing natural language text in a computer having a memory, comprising the steps of:
- inputting a human language word string;
  
  isolating word components of said input word string in a bidirectional list data structure;
  
  storing a list data structure in said memory, said list data structure being a plurality of list nodes stored in said memory, including a first list node having a first backward address pointer to a preceding list node and a first forward address pointer to a second list node in said list data structure, and a first string address pointer;
  
  said second list node further including a second backward address pointer to said first list node and a second forward address pointer to a succeeding list node in said list data structure, and a second string address pointer;
  
  said list data structure further including a first string node stored in said memory pointed to by said first string address pointer for storing information relating to a first character string stored in said memory representing a first one of said isolated word components, and further including a first property address pointer;
  
  said list data structure further including a second string node stored in said memory pointed to by said second string address pointer, for storing information relating to a second character string stored in said memory representing a second one of said isolated word components, and further including a second property address pointer;
  
  said list data structure further including a first property node stored in said memory pointed to by said first property address pointer, for storing information relating to first language attributes of said first character string;
  
  said list data structure further including a second property node stored in said memory pointed to by said second property address pointer, for storing information relating to second language attributes of said second character string;
  
  performing a word context analysis of said input word stream by accessing said first and second list nodes in both a forward and a backward direction along said list data structure, accessing said first and second string nodes pointed to by said accessed first and second list nodes, accessing said first and second property nodes pointed to by said accessed first and second string nodes, and processing in context said first and second character strings and said first and second language attributes in accordance with stored program instructions for carrying out said word context analysis;
  
  outputting the results of said analysis;
  
  whereby an analysis can be made using the context within which words occur in the input word string.
- - 4. The data processing method for parsing natural language text of claim 3, wherein said word context analysis step comprises the steps of:
    - applying clause analysis to identify constituent clauses based upon the context within which said first and second word components of the input data stream occur;
      
      outputting a data structure where the words and phrases from the input data stream are associated with parts of speech.
  - 5. The data processing method for parsing natural language text of claim 3, wherein said word context analysis step comprises the steps of:
    - applying complement grammar rules to eliminate impossible parts of speech from consideration based upon the context within which said first and second word components of the input data stream occur;
      
      outputting a data structure where the words from the input data stream are associated with parts of speech.
  - 6. The data processing method for parsing natural language text of claim 3, wherein said word context analysis step comprises the steps of:
    - applying verb group analysis to identify verb groups based upon the context within which said first and second word components of the input data stream occur;
      
      outputting a data structure where the words from the input data stream are associated with parts of speech.
  - 7. The data processing method for parsing natural language text of claim 6, wherein said verb group analysis includes the step of applying paradigm-based morphological text analysis to identify the grammatical category of said first word component.
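
The verb group analysis of claim 6 can be illustrated with a toy grouping routine. The claims do not define which verb sequences are valid, so this sketch assumes one hypothetical pattern (any number of auxiliaries followed by one main verb); the tag names `AUX` and `VERB` and the function name `verb_groups` are likewise illustrative.

```python
def verb_groups(tagged):
    """Collect verb groups from a list of (word, tag) pairs.

    Assumed pattern: zero or more AUX tokens followed by exactly one VERB.
    Returns a list of word lists, one per group found.
    """
    groups, current = [], []
    for word, tag in tagged:
        if tag == "AUX":
            current.append(word)
        elif tag == "VERB":
            current.append(word)
            groups.append(current)
            current = []
        else:
            current = []  # auxiliaries not followed by a verb are discarded
    return groups


sentence = [
    ("she", "PRON"),
    ("has", "AUX"),
    ("been", "AUX"),
    ("walking", "VERB"),
    ("home", "NOUN"),
]
groups = verb_groups(sentence)
```

The routine expects each word to carry a single tag already, so in a pipeline like the one in claim 1 it would run after the ambiguity-resolution steps.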

Current Assignee
International Business Machines Corporation
Original Assignee
International Business Machines Corporation
Inventors
Zamora, Antonio; Gunther, Michael D.; Zamora, Elena M.
Primary Examiner(s)
MACDONALD, ALLEN R

Application Number

US06/924,670
Time in Patent Office

1,140 Days
Field of Search

364/419, 364/200 MS File, 364/900 MS File
US Class Current

704/8
CPC Class Codes

G06F 16/36   Creation of semantic tools,...

G06F 40/205   Parsing

G06F 40/211   Syntactic parsing, e.g. bas...

G06F 40/253   Grammatical analysis; Style...

G06F 40/268   Morphological analysis
