METHOD FOR PARSING NATURAL LANGUAGE TEXT WITH CONSTITUENT CONSTRUCTION LINKS
First Claim
1. A method for improving a processor in communication with a memory storing a program which uses a parser to parse natural language text, said method comprising:
a) training said parser by accessing a corpus of labeled utterances;
b) using said parser to extract details from said corpus, where said details include at least two simple links, where a simple link consists of a source word in the utterance, a target word in the utterance that is distinct from said source word, and a link action, said link action is chosen from a set of link actions which includes at least 2 of Append, Insert Below, Insert Above, and Insert Above and Below;
c) using said parser to create a language model using said details;
d) using said language model to generate at least two new simple links for at least two source words in at least one additional utterance;
e) using said new simple links to generate a constituent tree structure that represents the sentence parse result for said additional utterance by performing determination steps and repeating the determination steps until the required nodes for each new simple link have been added to said constituent tree, where the determination steps include;
i. if this is the first new simple link for said additional utterance, create a new node for the first word of said additional utterance, also create a node and make it the parent of this new node;
ii. create a new node for the source word;
iii. find the highest node above the target node of the simple link, for which the target node is either the first child of said highest node, or for which the target node is a descendent of the first child of said highest node and is also a descendent of the first child of all intervening nodes between said highest node and the target node, and herein designate said node as the highest right most node;
iv. if the link action is Append, make the node for the source word a child of the highest right most node;
v. if the link action is Insert Below, create a new node, make it a child of the highest right most node and make the node for the source word a child to this new node;
vi. if the link action is Insert Above and the highest right most node DOES NOT have a parent, create a new node, make said new node the parent of the highest right most node, and make the node of the source word a child of said new node;
vii. if the link action is Insert Above and the highest right most node DOES have a parent, separate the highest right most node and its parent, create a new node, make said new node a child of the node that was the parent of the highest right most node, make the highest right most node a child of said new node, and make the node of the source word a child of said new node;
viii. if the link action is Insert Above and Below and the highest right most node DOES NOT have a parent, create a first new node, make said first new node the parent of the highest right most node, create a second new node, make said second new node a child of said first new node, and make the node of the source word a child of said second new node;
ix. if the link action is Insert Above and Below and the highest right most node DOES have a parent, separate the highest right most node and its parent, create a first new node, make said first new node a child of the node that was the parent of the highest right most node, make the highest right most node a child of said first new node, create a second new node, make said second new node a child of said first new node, and make the node of the source word a child of said second new node;
f) outputting the results of said parsing of said additional utterance as an array of simple links and said constituent tree structure with the additional utterance.
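The determination steps i through ix can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. The Node class, the function names, the (source index, target index, action) tuple encoding of a simple link, and the choice to place a spliced-in node at the same sibling position as the node it displaces are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Link action names follow the claim; encoding them as strings is an assumption.
APPEND = "Append"
INSERT_BELOW = "Insert Below"
INSERT_ABOVE = "Insert Above"
INSERT_ABOVE_AND_BELOW = "Insert Above and Below"

# Hypothetical encoding of a simple link: (source word index, target word index, action).
SimpleLink = Tuple[int, int, str]


@dataclass(eq=False)  # nodes are compared by identity, not by content
class Node:
    word: Optional[str] = None  # None marks a constituent (non-word) node
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach child as the last child of this node and return the child."""
        child.parent = self
        self.children.append(child)
        return child


def highest_rightmost(target: Node) -> Node:
    """Step iii: climb from the target while each node is its parent's first child."""
    node = target
    while node.parent is not None and node.parent.children[0] is node:
        node = node.parent
    if node is target:
        raise ValueError("target is not the first child of any node")
    return node


def insert_above(hrm: Node) -> Node:
    """Steps vi-ix: create a node above hrm, splicing it under hrm's old parent if any."""
    new = Node()
    old_parent = hrm.parent
    if old_parent is not None:
        # Assumption: the new node takes over hrm's position among its siblings.
        old_parent.children[old_parent.children.index(hrm)] = new
        new.parent = old_parent
    new.add(hrm)
    return new


def build_tree(words: List[str], links: List[SimpleLink]) -> Node:
    if not links:
        raise ValueError("at least one simple link is required")
    nodes = {}
    for n, (src, tgt, action) in enumerate(links):
        if n == 0:  # step i: first word plus a parent for it
            nodes[0] = Node().add(Node(word=words[0]))
        source = nodes[src] = Node(word=words[src])  # step ii
        hrm = highest_rightmost(nodes[tgt])  # step iii
        if action == APPEND:  # step iv
            hrm.add(source)
        elif action == INSERT_BELOW:  # step v
            hrm.add(Node()).add(source)
        elif action == INSERT_ABOVE:  # steps vi and vii
            insert_above(hrm).add(source)
        elif action == INSERT_ABOVE_AND_BELOW:  # steps viii and ix
            insert_above(hrm).add(Node()).add(source)
        else:
            raise ValueError(f"unknown link action: {action}")
    root = nodes[0]
    while root.parent is not None:
        root = root.parent
    return root


def bracketed(node: Node) -> str:
    """Render the tree as an unlabeled bracketing, for inspection only."""
    if node.word is not None:
        return node.word
    return "(" + " ".join(bracketed(c) for c in node.children) + ")"


words = ["the", "dog", "barked", "loudly"]
links = [(1, 0, APPEND), (2, 0, INSERT_ABOVE_AND_BELOW), (3, 2, APPEND)]
print(bracketed(build_tree(words, links)))  # prints ((the dog) (barked loudly))
```

In this sketch a word can serve as a target only if it is the first child of some node, which holds for the first word and for any word attached by Insert Below or Insert Above and Below. The sketch raises an error otherwise, since step iii would find no qualifying node.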
Abstract
A parser for natural language text is provided. The parser is trained by accessing a corpus of labeled utterances. The parser extracts details of the syntactic tree structures and part of speech tags from the labeled utterances. The details extracted from the tree structures include Simple Links, which are the key to the improved efficiency of this new approach. The parser creates a language model using the details that were extracted from the corpus. The parser then uses the language model to parse utterances.
15 Citations
29 Claims
2. A non-transitory computer-readable storage medium having instructions that develop a parser for use in natural language processing, the instructions comprising:
a) training said parser by accessing a corpus of labeled utterances;
b) using said parser to extract details from said corpus, where said details include at least two simple links, where a simple link consists of a source word in the utterance, a target word in the utterance that is distinct from said source word, and a link action, said link action is chosen from a set of link actions which includes at least 2 of Append, Insert Below, Insert Above, and Insert Above and Below;
c) using said parser to create a language model using said details;
d) using said language model to generate at least two new simple links for at least two source words in at least one additional utterance;
e) using said new simple links to generate a constituent tree structure that represents the sentence parse result for said additional utterance by performing determination steps and repeating the determination steps until the required nodes for each new simple link have been added to said constituent tree, where the determination steps include;
i. if this is the first new simple link for said additional utterance, create a new node for the first word of said additional utterance, also create a node and make it the parent of this new node;
ii. create a new node for the source word;
iii. find the highest node above the target node of the simple link, for which the target node is either the first child of said highest node, or for which the target node is a descendent of the first child of said highest node and is also a descendent of the first child of all intervening nodes between said highest node and the target node, and herein designate said node as the highest right most node;
iv. if the link action is Append, make the node for the source word a child of the highest right most node;
v. if the link action is Insert Below, create a new node, make it a child of the highest right most node and make the node for the source word a child to this new node;
vi. if the link action is Insert Above and the highest right most node DOES NOT have a parent, create a new node, make said new node the parent of the highest right most node, and make the node of the source word a child of said new node;
vii. if the link action is Insert Above and the highest right most node DOES have a parent, separate the highest right most node and its parent, create a new node, make said new node a child of the node that was the parent of the highest right most node, make the highest right most node a child of said new node, and make the node of the source word a child of said new node;
viii. if the link action is Insert Above and Below and the highest right most node DOES NOT have a parent, create a first new node, make said first new node the parent of the highest right most node, create a second new node, make said second new node a child of said first new node, and make the node of the source word a child of said second new node;
ix. if the link action is Insert Above and Below and the highest right most node DOES have a parent, separate the highest right most node and its parent, create a first new node, make said first new node a child of the node that was the parent of the highest right most node, make the highest right most node a child of said first new node, create a second new node, make said second new node a child of said first new node, and make the node of the source word a child of said second new node;
f) outputting the results of said parsing of the additional utterance as an array of simple links and said constituent tree structure.
3. A method for providing an improved natural language parser to a memory unit of a computer system having a system process, the method comprising the steps of:
a) training said parser by accessing a corpus of labeled utterances;
b) using said parser to extract details from said corpus, where said details include at least two simple links, where a simple link consists of a source word in the utterance, a target word in the utterance that is distinct from said source word, and a link action, said link action is chosen from a set of link actions which includes at least 2 of Append, Insert Below, Insert Above, and Insert Above and Below;
c) using said parser to create a language model using said details;
d) using said language model to generate at least two new simple links for at least two source words in at least one additional utterance;
e) using said new simple links to generate a constituent tree structure that represents the sentence parse result for said additional utterance by performing determination steps and repeating the determination steps until the required nodes for each new simple link have been added to said constituent tree, where the determination steps include;
i. if this is the first new simple link for said additional utterance, create a new node for the first word of said additional utterance, also create a node and make it the parent of this new node;
ii. create a new node for the source word;
iii. find the highest node above the target node of the simple link, for which the target node is either the first child of said highest node, or for which the target node is a descendent of the first child of said highest node and is also a descendent of the first child of all intervening nodes between said highest node and the target node, and herein designate said node as the highest right most node;
iv. if the link action is Append, make the node for the source word a child of the highest right most node;
v. if the link action is Insert Below, create a new node, make it a child of the highest right most node and make the node for the source word a child to this new node;
vi. if the link action is Insert Above and the highest right most node DOES NOT have a parent, create a new node, make said new node the parent of the highest right most node, and make the node of the source word a child of said new node;
vii. if the link action is Insert Above and the highest right most node DOES have a parent, separate the highest right most node and its parent, create a new node, make said new node a child of the node that was the parent of the highest right most node, make the highest right most node a child of said new node, and make the node of the source word a child of said new node;
viii. if the link action is Insert Above and Below and the highest right most node DOES NOT have a parent, create a first new node, make said first new node the parent of the highest right most node, create a second new node, make said second new node a child of said first new node, and make the node of the source word a child of said second new node;
ix. if the link action is Insert Above and Below and the highest right most node DOES have a parent, separate the highest right most node and its parent, create a first new node, make said first new node a child of the node that was the parent of the highest right most node, make the highest right most node a child of said first new node, create a second new node, make said second new node a child of said first new node, and make the node of the source word a child of said second new node;
f) make the array of said simple links, said constituent tree structure and the user generated input, available for future requests from a service.
4. A method for accessing a language model in a data storage system of a computer system having means for reading and writing data from the data storage system, relaying information, and accepting input generated by a user;
parsing the user generated input, the method comprising the steps of;
a) training said parser by accessing a corpus of labeled utterances;
b) using said parser to extract details from said corpus, where said details include at least two simple links, where a simple link consists of a source word in the utterance, a target word in the utterance that is distinct from said source word, and a link action, said link action is chosen from a set of link actions which includes at least 2 of Append, Insert Below, Insert Above, and Insert Above and Below;
c) using said parser to create a language model using said details;
d) using said language model to generate at least two new simple links for at least two source words in at least one additional utterance;
e) using said new simple links to generate a constituent tree structure that represents the sentence parse result for said additional utterance by performing determination steps and repeating the determination steps until the required nodes for each new simple link have been added to said constituent tree, where the determination steps include;
i. if this is the first new simple link for said additional utterance, create a new node for the first word of said additional utterance, also create a node and make it the parent of this new node;
ii. create a new node for the source word;
iii. find the highest node above the target node of the simple link, for which the target node is either the first child of said highest node, or for which the target node is a descendent of the first child of said highest node and is also a descendent of the first child of all intervening nodes between said highest node and the target node, and herein designate said node as the highest right most node;
iv. if the link action is Append, make the node for the source word a child of the highest right most node;
v. if the link action is Insert Below, create a new node, make it a child of the highest right most node and make the node for the source word a child to this new node;
vi. if the link action is Insert Above and the highest right most node DOES NOT have a parent, create a new node, make said new node the parent of the highest right most node, and make the node of the source word a child of said new node;
vii. if the link action is Insert Above and the highest right most node DOES have a parent, separate the highest right most node and its parent, create a new node, make said new node a child of the node that was the parent of the highest right most node, make the highest right most node a child of said new node, and make the node of the source word a child of said new node;
viii. if the link action is Insert Above and Below and the highest right most node DOES NOT have a parent, create a first new node, make said first new node the parent of the highest right most node, create a second new node, make said second new node a child of said first new node, and make the node of the source word a child of said second new node;
ix. if the link action is Insert Above and Below and the highest right most node DOES have a parent, separate the highest right most node and its parent, create a first new node, make said first new node a child of the node that was the parent of the highest right most node, make the highest right most node a child of said first new node, create a second new node, make said second new node a child of said first new node, and make the node of the source word a child of said second new node;
f) relaying the resulting array of said simple links, said constituent tree structure and said user generated input, to further modules which perform specific computer operations.
5. A method for improving a processor in communication with a memory storing a program which uses a parser to parse natural language text, the method comprising:
a) training the parser by accessing a corpus of utterances, which utterances are labelled with marks which specify the constituent tree for each of the utterances;
b) using the parser to extract details from the corpus, where the details include at least one constituent construction link, where each constituent construction link consists of a source word in the utterance, a target word in the utterance that is distinct from the source word, and a link action; and
c) finding a common ancestor of the source word, a previous word, and a left-most descendent of the common ancestor which becomes the target word for the constituent construction link, and thereby defining a relationship of the source word to the target word in terms of nodes in the constituent tree and identifying the link action based on this relationship.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14
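One reading of step c) can be sketched in Python as below. The nested-list tree encoding and the function names are assumptions for illustration, as is taking "a previous word" to mean the word immediately before the source word. The claim does not state how the link action is derived from the relationship. The rule used here is a guess chosen to be consistent with the determination steps of claim 1, not the patent's stated method, and it is not expected to cover every treebank shape (for example, chains of single-child nodes).

```python
from typing import List, Tuple, Union

Tree = Union[str, List["Tree"]]  # a word, or a non-empty list of child subtrees
Path = Tuple[int, ...]           # child indices from the root down to a node


def leaf_paths(tree: Tree, prefix: Path = ()) -> List[Path]:
    """Return the path to every word, in left-to-right order."""
    if isinstance(tree, str):
        return [prefix]
    paths: List[Path] = []
    for i, child in enumerate(tree):
        paths.extend(leaf_paths(child, prefix + (i,)))
    return paths


def subtree(tree: Tree, path: Path) -> Tree:
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i]
    return tree


def extract_links(tree: Tree) -> List[Tuple[int, int, str]]:
    """Return (source index, target index, action) for every word after the first."""
    paths = leaf_paths(tree)
    index_of = {p: n for n, p in enumerate(paths)}
    links = []
    for src in range(1, len(paths)):
        prev, cur = paths[src - 1], paths[src]
        # Common ancestor of the source word and the previous word:
        # the node addressed by the longest shared prefix of their paths.
        depth = 0
        while prev[depth] == cur[depth]:
            depth += 1
        ancestor = subtree(tree, cur[:depth])
        # Target word: the left-most descendant of that common ancestor.
        target_path = cur[:depth]
        node = ancestor
        while not isinstance(node, str):
            node = node[0]
            target_path += (0,)
        # ASSUMED rule for the link action (not given in the claim):
        # - "direct" means the source word hangs directly off the ancestor;
        # - the ancestor counts as created together with the source word when
        #   the source branch is its second child and its first child is a
        #   constituent rather than a word.
        direct = len(cur) == depth + 1
        new_parent = cur[depth] == 1 and not isinstance(ancestor[0], str)
        if new_parent:
            action = "Insert Above" if direct else "Insert Above and Below"
        else:
            action = "Append" if direct else "Insert Below"
        links.append((src, index_of[target_path], action))
    return links


tree = [["the", "dog"], ["barked", "loudly"]]
for link in extract_links(tree):
    print(link)
```

For the example tree, "dog" and "loudly" each share a parent with the preceding word, so each targets the first word of that parent. "barked" shares only the root with "dog", so its target is "the", the left-most word under the root.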
15. A method for improving a processor in communication with a memory storing a program which uses a parser to parse natural language text, the method comprising:
a) using the parser to generate at least one constituent construction link for at least one source word in at least one additional utterance;
b) using the new constituent construction links to generate at least one constituent tree structure that represents a sentence parse result for each additional utterance by performing determination steps and repeating the determination steps, where the determination steps include;
i. if this is an initial constituent construction link for the additional utterance, create a first word node for a first word of the additional utterance, and create a new node and make it a parent of the first word node;
ii. create a source word node for the source word;
iii. find a highest node above the target word of the constituent construction link for which the target word is either a first child of the highest node, or for which the target word is a descendent of the first child of the highest node and is also a descendent of the first child of all intervening nodes between the highest node and the target word, and designate the highest node as a highest right most node;
iv. add one or more nodes to the constituent tree structure at locations relative to the highest right most node if so indicated by the type of link action of the Constituent Construction Link;
v. attach the source word node of the source word to the constituent tree structure at a point relative to the highest right most node based on the type of link action of the Constituent Construction Link.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29
Specification