Parser for natural language text
Abstract
An improved natural language text parser is disclosed which provides syntactic analysis of text using a fast and compact technique. Sequential steps of word isolation, morphological analysis and dictionary look-up, combined with a complement grammar analysis, are applied to an input data stream of words. Word expert rules, verb group analysis and clause analysis are then applied to provide an output data structure in which the words in the input data stream are associated with appropriate phrase markings. The principle of operation of the parser is applicable to a variety of Indo-European languages and provides a faster and more compact technique for parsing in a data processor than has been available in the prior art.
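The front end summarized in the abstract (word isolation, morphological analysis and dictionary look-up) can be illustrated with a short Python sketch. It is not the patented implementation: the toy dictionary, the suffix table, the part-of-speech labels and the function names isolate_words, lookup and analyze are all hypothetical, prefixes are ignored, and the two claimed morphological passes are collapsed into a single suffix-stripping fallback for words the dictionary does not match directly.

```python
import re

# Hypothetical toy dictionary: base form -> possible parts of speech.
DICTIONARY = {
    "the": {"DET"},
    "parser": {"NOUN"},
    "parse": {"NOUN", "VERB"},
    "text": {"NOUN", "VERB"},
    "quick": {"ADJ"},
}

# Hypothetical suffix rules: suffix -> {part of speech of stem: part of speech of result}.
SUFFIX_RULES = [
    ("ing", {"VERB": "VERB"}),
    ("ed", {"VERB": "VERB"}),
    ("ly", {"ADJ": "ADV"}),
    ("s", {"NOUN": "NOUN", "VERB": "VERB"}),
]


def isolate_words(text):
    """Word isolation: split the input string into lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lookup(word):
    """Dictionary look-up, falling back to suffix stripping for unmatched words."""
    if word in DICTIONARY:
        return set(DICTIONARY[word])
    for suffix, mapping in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            stem = word[: -len(suffix)]
            # Try the bare stem, then the stem with a restored silent "e".
            for candidate in (stem, stem + "e"):
                tags = {mapping[t] for t in DICTIONARY.get(candidate, ()) if t in mapping}
                if tags:
                    return tags
    return {"UNKNOWN"}


def analyze(text):
    """Return (word, possible parts of speech) for each isolated word."""
    return [(word, lookup(word)) for word in isolate_words(text)]
```

For the input "The parser parses text quickly." the sketch tags "quickly" as an adverb through its adjective stem but leaves "parses" and "text" ambiguous between noun and verb; removing that kind of ambiguity is the job of the context-based steps that follow the look-up.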
7 Claims
1. A data processing method for parsing natural language text, comprising the steps of:
inputting a human language word string;
isolating word components of said input string;
performing a first morphological analysis of the isolated words from said input string, to strip off prefixes and suffixes;
looking up in a dictionary all the possible parts of speech for each word in said input stream;
performing a second morphological analysis on words from said input data stream which are not successfully matched in said dictionary look-up;
applying complement grammar rules to eliminate impossible parts of speech from consideration based upon the context within which the words of the input data stream occur;
resolving ambiguities in those words which require consideration of semantic as well as syntactic characteristics;
selecting those words identified as verbs and grouping them according to valid verb group sequences;
performing a clause analysis including a verb analysis, a complement analysis, a noun phrase determination, a prepositional phrase structure determination and a grammar violation analysis; and
outputting a data structure where the words from the input data stream are associated with parts of speech and with group markings that indicate phrase structure.
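The complement grammar step of claim 1 can be sketched as constraint pruning over adjacent words. The sketch assumes, as a simplification, that the complement grammar is a table of part-of-speech pairs that cannot occur side by side; the pair table, the tag names and the functions can_follow and apply_complement_grammar are hypothetical. A tag is dropped when no tag of the neighboring word can combine with it, pruning repeats until nothing changes, and every word keeps at least one tag.

```python
# Hypothetical complement grammar: (left, right) tag pairs that may not be adjacent.
IMPOSSIBLE_PAIRS = {
    ("DET", "DET"),
    ("DET", "VERB"),
    ("DET", "ADV"),
    ("ADJ", "VERB"),
}


def can_follow(left_tag, right_tag):
    return (left_tag, right_tag) not in IMPOSSIBLE_PAIRS


def apply_complement_grammar(candidates):
    """Prune each word's candidate tags using its neighbours until nothing changes.

    candidates is a list of sets of tags, one set per word; the input is not modified.
    A word never loses its last remaining tag.
    """
    tags = [set(c) for c in candidates]
    changed = True
    while changed:
        changed = False
        for i in range(len(tags) - 1):
            left, right = tags[i], tags[i + 1]
            # A left tag survives only if at least one right tag may follow it.
            keep_left = {a for a in left if any(can_follow(a, b) for b in right)}
            # A right tag survives only if at least one left tag may precede it.
            keep_right = {b for b in right if any(can_follow(a, b) for a in left)}
            if keep_left and keep_left != left:
                tags[i] = keep_left
                changed = True
            if keep_right and keep_right != right:
                tags[i + 1] = keep_right
                changed = True
    return tags
```

With this toy table, "the text parses" reduces "text" to a noun, while "parses" stays ambiguous because both noun-noun and noun-verb sequences are allowed; in the claimed method such leftovers pass on to the ambiguity resolution and verb grouping steps.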
2. A data processing method for parsing natural language text in a computer having a memory, comprising the steps of:
inputting a human language word string;
isolating word components of said input word string in a bidirectional list data structure;
storing a list data structure in said memory, said list data structure being a plurality of list nodes stored in said memory, each list node including a first address pointer to a preceding list node and a second address pointer to a succeeding list node in said list data structure;
said list data structure further including string nodes stored in said memory, each string node being pointed to by a corresponding one of said list nodes, said string node storing information relating to a character string stored in said memory representing one of said isolated word components;
said list data structure further including property nodes stored in said memory, each said property node being pointed to by a corresponding one of said string nodes stored in said memory, said property nodes storing information related to the language attributes of said character string representing said one of said isolated word components;
looking up in a dictionary stored in association with said computer the language attributes for one of said isolated word components associated with one of said string nodes, and storing information accessed from said dictionary in response thereto in association with one of said property nodes pointed to by said one of said string nodes;
performing an analysis of said input word stream by accessing said list nodes in both a forward and a backward direction along said list data structure, accessing said string nodes pointed to by said accessed list nodes, accessing said property nodes pointed to by said accessed string nodes, and processing in context said character strings and their language attributes relating to said accessed string nodes and property nodes in accordance with stored program instructions for carrying out said analysis;
outputting the results of said analysis;
whereby an analysis can be made using the context within which words occur in the input word string.
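The bidirectional list data structure of claim 2 maps naturally onto three linked record types. In this Python sketch the class names, the field names and the helpers build_list, forward and backward are hypothetical; each property node stores only a set of candidate parts of speech, and the dictionary is an ordinary mapping supplied by the caller, with "UNKNOWN" recorded for words it lacks.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class PropertyNode:
    """Language attributes of one word; here only its possible parts of speech."""
    parts_of_speech: set = field(default_factory=set)


@dataclass
class StringNode:
    """Character string for one isolated word, pointing at its property node."""
    text: str
    properties: PropertyNode


@dataclass
class ListNode:
    """One cell of the bidirectional list: a string pointer plus back/forward pointers."""
    string: StringNode
    # repr/compare disabled so the prev <-> next cycle does not recurse forever.
    prev: Optional[ListNode] = field(default=None, repr=False, compare=False)
    next: Optional[ListNode] = field(default=None, repr=False, compare=False)


def build_list(words, dictionary):
    """Store each word in list/string/property nodes, filling properties from the dictionary."""
    head: Optional[ListNode] = None
    tail: Optional[ListNode] = None
    for word in words:
        props = PropertyNode(set(dictionary.get(word, {"UNKNOWN"})))
        node = ListNode(StringNode(word, props))
        if tail is None:
            head = node
        else:
            tail.next = node
            node.prev = tail
        tail = node
    return head, tail


def forward(node: Optional[ListNode]) -> Iterator[ListNode]:
    """Walk the list along the forward pointers."""
    while node is not None:
        yield node
        node = node.next


def backward(node: Optional[ListNode]) -> Iterator[ListNode]:
    """Walk the list along the backward pointers."""
    while node is not None:
        yield node
        node = node.prev
```

Because every list node carries both pointers, an analysis routine positioned at any word can reach the attributes of its left and right neighbors in constant time, which is what allows the context of each word to be consulted in either direction.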
3. A data processing method for parsing natural language text in a computer having a memory, comprising the steps of:
inputting a human language word string;
isolating word components of said input word string in a bidirectional list data structure;
storing a list data structure in said memory, said list data structure being a plurality of list nodes stored in said memory, including a first list node having a first backward address pointer to a preceding list node, a first forward address pointer to a second list node in said list data structure, and a first string address pointer;
said second list node further including a second backward address pointer to said first list node, a second forward address pointer to a succeeding list node in said list data structure, and a second string address pointer;
said list data structure further including a first string node stored in said memory, pointed to by said first string address pointer, for storing information relating to a first character string stored in said memory representing a first one of said isolated word components, and further including a first property address pointer;
said list data structure further including a second string node stored in said memory, pointed to by said second string address pointer, for storing information relating to a second character string stored in said memory representing a second one of said isolated word components, and further including a second property address pointer;
said list data structure further including a first property node stored in said memory, pointed to by said first property address pointer, for storing information relating to first language attributes of said first character string;
said list data structure further including a second property node stored in said memory, pointed to by said second property address pointer, for storing information relating to second language attributes of said second character string;
performing a word context analysis of said input word stream by accessing said first and second list nodes in both a forward and a backward direction along said list data structure, accessing said first and second string nodes pointed to by said accessed first and second list nodes, accessing said first and second property nodes pointed to by said accessed first and second string nodes, and processing in context said first and second character strings and said first and second language attributes in accordance with stored program instructions for carrying out said word context analysis;
outputting the results of said analysis;
whereby an analysis can be made using the context within which words occur in the input word string.
Dependent claims: 4, 5, 6, 7.
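The word context analysis of claim 3 can be sketched as one pass along the forward pointers followed by one pass along the backward pointers. The two rules used here are hypothetical toy rules, not taken from the patent: after a word tagged only as a determiner, a verb reading is dropped; before a word tagged only as an adverb, a noun reading is dropped. The class, field and function names (link, tags_of, context_analysis) are likewise illustrative, and a tag is only discarded when the word still has another one left.

```python
class PropertyNode:
    """Language attributes of a character string: here, candidate parts of speech."""
    def __init__(self, tags):
        self.tags = set(tags)


class StringNode:
    """Character string of one word plus its property address pointer."""
    def __init__(self, text, prop):
        self.text = text
        self.prop = prop


class ListNode:
    """List node holding a string address pointer and backward/forward pointers."""
    def __init__(self, string):
        self.string = string
        self.back = None
        self.fwd = None


def link(entries):
    """Build the bidirectional list from (word, tags) pairs; return (head, tail)."""
    head = tail = None
    for text, tags in entries:
        node = ListNode(StringNode(text, PropertyNode(tags)))
        if tail is None:
            head = node
        else:
            tail.fwd = node
            node.back = tail
        tail = node
    return head, tail


def tags_of(node):
    return node.string.prop.tags


def context_analysis(head, tail):
    """Prune tags with one forward and one backward pass; return (word, tags) pairs."""
    # Forward pass: inspect each word's successor through the forward pointer.
    node = head
    while node is not None and node.fwd is not None:
        nxt = node.fwd
        if tags_of(node) == {"DET"} and len(tags_of(nxt)) > 1:
            tags_of(nxt).discard("VERB")
        node = nxt
    # Backward pass: inspect each word's predecessor through the backward pointer.
    node = tail
    while node is not None and node.back is not None:
        prv = node.back
        if tags_of(node) == {"ADV"} and len(tags_of(prv)) > 1:
            tags_of(prv).discard("NOUN")
        node = prv
    # Output the results in input order.
    result = []
    node = head
    while node is not None:
        result.append((node.string.text, set(tags_of(node))))
        node = node.fwd
    return result
```

For "the text parses quickly", with "text" and "parses" both starting as noun-or-verb, the forward pass resolves "text" to a noun and the backward pass resolves "parses" to a verb; each rule needs only the neighbor reachable through a single pointer.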
Specification