System and method for parsing a natural language input span using a candidate list to generate alternative nodes
First Claim
1. A computer-implemented method of parsing a natural language input span, the method comprising the steps of:
- breaking the input span into individual words;
looking up each individual word in a lexicon to determine at least one part of speech associated with the word;
storing each individual word and an associated part of speech as a node in a candidate list;
assigning a first goodness measure to a first one of the nodes in the candidate list;
determining a leading candidate node;
promoting the leading candidate node to a node chart;
comparing the leading candidate node to a template; and
in response to a match between the leading candidate node and the template, generating an alternative node and storing the alternative node in the candidate list.
2 Assignments
0 Petitions
Accused Products
Abstract
An improved natural language parser uses a directed search template set to identify problematic word sequences, thus reducing processing time while increasing accuracy. The directed search template set is used to identify problematic input spans or portions of input spans. A problematic input span is one that contains at least one word or phrase that can be constructed in alternative ways. Problematic input spans can reduce the efficiency of a natural language parser and can result in the production of an inaccurate parse tree. Once a problematic span has been identified, the improved parser generates alternative parses for the problematic portion of the input span. This on-the-fly alternative parse generation permits the parser to consider the alternatives as early in the parse process as possible, thus reducing the overall time needed to parse a problematic input span.
-
Citations
25 Claims
-
1. A computer-implemented method of parsing a natural language input span, the method comprising the steps of:
-
breaking the input span into individual words;
looking up each individual word in a lexicon to determine at least one part of speech associated with the word;
storing each individual word and an associated part of speech as a node in a candidate list;
assigning a first goodness measure to a first one of the nodes in the candidate list;
determining a leading candidate node;
promoting the leading candidate node to a node chart;
comparing the leading candidate node to a template; and
in response to a match between the leading candidate node and the template, generating an alternative node and storing the alternative node in the candidate list. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
applying a syntax rule to a second node in the node chart to combine the second node with a third node in the node chart, thereby creating a fourth node.
-
-
4. The computer-implemented method of claim 3, further comprising the step of:
moving the fourth node to the candidate list and assigning a second goodness measure to the fourth node.
-
5. The computer-implemented method of claim 1, wherein each of the nodes in the candidate list has an associated goodness measure and wherein the step of determining the leading candidate node comprises the steps of:
-
comparing the goodness measures associated with each of the nodes in the candidate list; and
selecting the node having the highest goodness measure associated thereto as the leading candidate node.
-
-
6. The computer-implemented method of claim 1, wherein:
-
the template comprises a plurality of alternative constructions; and
each alternative construction comprises a sequence of constituents that is equivalent to the sequence of constituents of each other alternative construction in the template.
-
-
7. The computer-implemented method of claim 6, wherein the step of comparing the leading candidate node to the template comprises the step of:
comparing the leading candidate node to each alternative construction in the template.
-
8. The computer-implemented method of claim 7, wherein the step of generating an alternative node comprises the step of:
searching the candidate list for a node having an equivalent construction to a first alternative construction.
-
9. The computer-implemented method of claim 8, further comprising the step of:
in response to a failure to find a node having an equivalent construction to the first alternative construction, searching the candidate list for component nodes, each component node comprising at least one leaf node and having an equivalent construction to a corresponding constituent of the first alternative construction.
-
10. The computer-implemented method of claim 9, further comprising the steps of:
-
in response to finding the component nodes in the candidate list, moving the component nodes to the node chart; and
applying a syntax rule to combine at least two of the component nodes into the alternative node.
-
-
11. A computer readable medium on which is stored computer-executable instructions for performing the steps of:
-
breaking an input span into a plurality of words;
associating each word with a part of speech;
storing each word and each associated part of speech as a leaf node;
following storing each word and each associated part of speech as a leaf node, combining leaf nodes to form intermediate-level nodes to a template; and
in response to a match between one of the intermediate-level nodes and the template, creating an alternative node. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
searching a lexicon for an entry corresponding to the word; and
in response to finding the entry in the lexicon, determining at least one part of speech associated with the entry.
-
-
13. The computer readable medium of claim 11, herein the step of combining leaf nodes to form intermediate-level odes comprises the steps of:
-
finding a first leaf node that is adjacent to a second leaf node;
comparing the first leaf node to a syntax rule;
comparing the second leaf node to the syntax rule, in response to a match between the first leaf node and the syntax rule; and
combining the first node and the second node to form an intermediate-level node, in response to a match between the second leaf node and the syntax rule.
-
-
14. The computer readable medium of claim 11, wherein:
-
the template comprises a plurality of alternative constructions; and
each alternative construction comprises a sequence of constituents that is equivalent to the sequence of constituents of each other alternative construction in the template.
-
-
15. The computer readable medium of claim 14, wherein the step of comparing the intermediate-level nodes to the template comprises the step of:
comparing the intermediate-level nodes to each alternative construction in the template.
-
16. The computer readable medium of claim 15, wherein the step of creating the alternative node comprises the step of:
in response to a match between a first intermediate-level node and a first alternative construction in the template, searching for a second intermediate-level node having an equivalent construction to a second alternative construction in the template.
-
17. The computer readable medium of claim 16, further comprising the step of:
in response to a failure to find the second intermediate-level node having an equivalent construction to the second alternative construction, searching for component nodes, each component node comprising at least one leaf node and having an equivalent construction to a corresponding constituent in the second alternative construction.
-
18. The computer readable medium of claim 17, wherein the component nodes comprise at least one intermediate-level node.
-
19. The computer readable medium of claim 17, further comprising the step of:
in response to finding the component nodes, applying a syntax rule to combine at least two of the component nodes into the alternative node.
-
20. A computer-implemented method of increasing efficiency of a natural language parser for parsing an input span through the generation of an alternative node, the method comprising the steps of:
-
comparing each of a plurality of intermediate-level nodes to a template;
in response to a match between one of the intermediate-level nodes and the template, generating an alternative node. - View Dependent Claims (21, 22, 23, 24, 25)
the template comprises a plurality of alternative constructions; and
each alternative construction comprising a sequence of constituents that is equivalent to the sequence of constituents of each other alternative construction in the template.
-
-
22. The computer-implemented method of claim 21, wherein the step of comparing the intermediate-level nodes to the template comprises the step of:
comparing the intermediate-level nodes to a first alternative construction in the template.
-
23. The computer-implemented method of claim 22, wherein the step of generating an alternative node comprises the step of:
searching for a first intermediate-level node having an equivalent construction to the first alternative construction in the template.
-
24. The computer-implemented method of claim 23, further comprising the step of:
in response to a failure to find the first intermediate-level node having an equivalent construction to the first alternative construction, searching for component nodes, each component node comprising at least one leaf node and having an equivalent construction to a corresponding constituent node in the first alternative construction.
-
25. The computer-implemented method of claim 24, further comprising the step of:
in response to finding the component nodes applying a syntax rule to combine the component nodes into a new intermediate-level node.
Specification