Method and system for natural language parsing using podding
First Claim
1. A method in a computer system for parsing input in a natural language, the computer system having syntax rules indicating how lower-level syntactic constructs in the language are combined to form higher-level syntactic constructs, the method comprising:
- providing one or more heuristic score formulas for each syntax rule, the heuristic score formulas for each syntax rule indicating how to calculate a heuristic score when the syntax rule is to be applied to generate a node that may be part of the parse tree, the heuristic score indicating a likelihood that generate node will be part of the parse tree;
initializing a chart to contain one leaf node for each word of the input wherein the chart upon completion of the parse of the input contains a parse tree representing the parse;
applying syntax rules to the nodes of the chart so that higher-level nodes are generated and added to the chart in an order based on a heuristic score calculated in accordance with the heuristic score formulas for the applied syntax rules; and
when a termination condition is satisfied, selecting from the chart a set of spanning sub-trees that span each word of the input based on the parse of the input.
2 Assignments
0 Petitions
Accused Products
Abstract
A method and system for determining the likelihood that a certain syntax rule when applied to a partial parse of an input will produce a node that will be part of the correct parse for the input. Each syntax rule indicates a higher-level syntactic construct that can be formed from one or more lower-level syntactic constructs. Each syntax rule has a probability which indicates the likelihood that the syntax rule will succeed and produce a node in the resulting parse tree. Each syntax rule also has a heuristic score formula indicating how to calculate a heuristically derived score for the higher-level syntactic construct created when the syntax rule is successfully applied. When a syntax rule is successfully applied while parsing the input sentence, the system calculates a probability for the higher-level syntactic construct produced by the syntax rule. The system then calculates a heuristic score for the higher-level syntactic construct produced by the syntax rule based on the heuristic score formula of the syntax rule and the calculated heuristic scores of the lower-level syntactic constructs to which the syntax rule was successfully applied. The system then combines the calculated probabilities and the calculated heuristic scores to guide the selecting of syntax rules to apply.
172 Citations
46 Claims
-
1. A method in a computer system for parsing input in a natural language, the computer system having syntax rules indicating how lower-level syntactic constructs in the language are combined to form higher-level syntactic constructs, the method comprising:
-
providing one or more heuristic score formulas for each syntax rule, the heuristic score formulas for each syntax rule indicating how to calculate a heuristic score when the syntax rule is to be applied to generate a node that may be part of the parse tree, the heuristic score indicating a likelihood that generate node will be part of the parse tree; initializing a chart to contain one leaf node for each word of the input wherein the chart upon completion of the parse of the input contains a parse tree representing the parse; applying syntax rules to the nodes of the chart so that higher-level nodes are generated and added to the chart in an order based on a heuristic score calculated in accordance with the heuristic score formulas for the applied syntax rules; and when a termination condition is satisfied, selecting from the chart a set of spanning sub-trees that span each word of the input based on the parse of the input. - View Dependent Claims (2, 3, 24)
-
-
4. A method in a computer system for parsing input in a natural language, the computer system having syntax rules indicating how lower-level syntactic constructs in the language are combined to form higher-level syntactic constructs, the method comprising:
-
providing one or more heuristic score formulas for each syntax rule, the heuristic score formulas for each syntax rule indicating how to calculate a heuristic score when the syntax rule is to be applied to generate a node that may be part of the parse tree, the heuristic score indicating a likelihood that the generated node will be part of the parse tree; initializing a chart to contain one leaf node for each word of the input wherein the chart upon completion of the parse of the input contains a parse tree representing the parse; pre-executing all syntax rules by applying them to the nodes of the chart, wherein each higher-level node that is generated is stored in an ordered list, awaiting introduction into the chart in the chart, each generated higher-level node having a heuristic score that is calculated in accordance with the heuristic score formulas for the syntax rule that generated the higher-level node; adding a higher-level node generated from the pre-execution of syntax rules with the highest heuristic score to the chart; and repeating the pre-executing and adding until the parsing is complete. - View Dependent Claims (5, 6, 7, 8)
-
-
9. A method in a computer system for parsing an input sentence in a natural language, the input sentence having words, the computer system having syntax rules indicating how lower-level syntactic constructs in the natural language are combined to form higher-level syntactic constructs, each syntax rule having an associated probability, the parse of the input sentence to be represented by a parse tree with nodes, each node of the parse tree representing a syntactic construct that is part of the parse of the input sentence, the method comprising:
-
providing one or more heuristic score formulas for each syntax rule, the heuristic score formulas for each syntax rule indicating how to calculate a heuristic score when the syntax rule is to be applied to generate a node that may be part of the parse tree, the heuristic score indicating a likelihood that the generated node will be part of the parse tree; initializing a chart to contain one leaf node for each word of the input sentence wherein the chart upon completion of the parse of the input sentence contains the parse tree; and repeating the following until the parse tree of the input sentence is generated, identifying a syntax rule that can be applied to the nodes of the chart to generate a higher-level node representing the higher-level syntactic construct of the identified syntax rule, the identified syntax rule being determined to be the syntax rule with the greatest likelihood that the resulting generated node will be part of the parse tree, the determination being made by combining a heuristic score calculated in accordance with the provided heuristic score formulas of the syntax rule and the probability of the syntax rule; and applying the identified syntax rule to generate the higher-level node and to add the generated higher-level node to the chart. - View Dependent Claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
-
-
25. A method in a computer system for determining a likelihood that a certain syntax rule when applied to a partial parse of an input will be part of the correct complete parse for the input, the computer system having syntax rules indicating a higher-level syntactic construct to be created from one or more lower-level syntactic constructs, each syntax rule having a probability, each syntax rule having a heuristic score formula indicating how to calculate a heuristically derived score for the higher-level syntactic construct when the syntax rule is applied, the method comprising:
for each syntax rule applied when parsing the input, calculating a heuristic score for the higher-level syntactic construct of the syntax rule based on the heuristic score formula of the syntax rule; and combining the probabilities and the calculated heuristic score to generate a combined score indicating the likelihood that the syntax rule when applied to the current partial parse of the input will be part of the correct parse for the input. - View Dependent Claims (26, 27, 28, 29, 30, 31, 32, 33, 34, 35)
-
36. A computer-readable medium containing instructions for causing a computer system to determine a likelihood that a certain syntax rule when applied to a partial parse of an input will be part of the correct parse for the input, the computer system having syntax rules indicating a higher-level syntactic construct to be created from one or more lower-level syntactic constructs, each syntax rule having a probability, each syntax rule having a heuristic score formula indicating how to calculate a heuristically derived score for the higher-level syntactic construct when the syntax rule is applied, by:
for each syntax rule applied when parsing the input, calculating a heuristic score for the higher-level syntactic construct of the syntax rule based on the heuristic score formula of the syntax rule; and combining the probabilities and the calculated heuristic score to generate a combined score indicating the likelihood that the syntax rule when applied to the current partial parse of the input will be part of the correct parse for the input. - View Dependent Claims (37, 38, 39, 40, 41, 42, 43, 44, 45, 46)
Specification