Method and system for natural language parsing using chunking
First Claim
1. A method in a computer system for parsing input in a language, the language having a grammar described by syntax rules, each syntax rule having a probability indicating a likelihood that the syntax rule will lead to a final parse of the input, the method comprising repeating the following until the final parse is generated:
- selecting a syntax rule to apply to a current partial parse of the input, the selected syntax rule having a high probability relative to other syntax rules that can be applied to the current partial parse of the input;
applying the selected syntax rule to the current partial parse of the input to form a new current partial parse of the input;
determining whether syntax rules with low probabilities have been recently applied; and
when it is determined that syntax rules with low probabilities have recently been applied, disabling application of syntax rules to a portion of the current partial parse of the input so that that syntax rule application can be focused on the other portion of the current partial parse of the input.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and system that uses a chunking technique to guide the parsing. A chunk is a portion of the input for which the system has determined that a sufficient number of syntax rules have been applied such that further application of syntax rules to that chunk is unlikely to produce a more accurate sub-parse for that chunk. When using the chunking technique, the system selects a syntax rule to apply to the current partial parse (sub-trees) of the input sentence. The selected syntax rule has a high probability relative to other syntax rules that can be applied to the one or more potential sub-trees of the input sentence. The system then applies the selected syntax rule to the potential sub-trees of the input sentence to form a new potential sub-trees of the input sentence. When the system determines that syntax rules with low probabilities have recently been applied, the system disables application of syntax rules to a portion of parse of the input sentence (i.e., a chunk) so that that syntax rules can be applied to the other portion of the input sentence.
54 Citations
23 Claims
-
1. A method in a computer system for parsing input in a language, the language having a grammar described by syntax rules, each syntax rule having a probability indicating a likelihood that the syntax rule will lead to a final parse of the input, the method comprising repeating the following until the final parse is generated:
-
selecting a syntax rule to apply to a current partial parse of the input, the selected syntax rule having a high probability relative to other syntax rules that can be applied to the current partial parse of the input; applying the selected syntax rule to the current partial parse of the input to form a new current partial parse of the input; determining whether syntax rules with low probabilities have been recently applied; and when it is determined that syntax rules with low probabilities have recently been applied, disabling application of syntax rules to a portion of the current partial parse of the input so that that syntax rule application can be focused on the other portion of the current partial parse of the input. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A method in a computer system for parsing an input segment, the input segment comprising words, the method comprising:
-
identifying syntax rules to be applied to a current partial parse of the input segment; applying the identified syntax rules to the current partial parse of the input segment; determining whether thrashing is occurring in the applying of the identified syntax rules; and when it is determined that thrashing is occurring, selecting a portion of the input segment on which to focus the applying of syntax rules; and adjusting the identifying of syntax rules so that rules that are to be applied to the selected portion are identified. - View Dependent Claims (8, 9, 10, 11, 12, 13)
-
-
14. A computer-readable medium having instructions for causing a computer system to parse an input segment, the input segment comprising words by repeating the following until a complete parse is generated:
-
identifying a syntax rule to be applied to a current partial parse of the input segment; applying the identified syntax rule to the current partial parse of the input segment; determining whether syntax rules with a low probability of leading to a complete parse have recently been applied; and when it is determined that that such syntax rules have recently been applied, establishing a pseudo-end of the input segment so that syntax rules that previously had a low probability now have a higher probability and are thus more likely to be identified. - View Dependent Claims (15, 16, 17, 18, 19)
-
-
20. A parser for generating a parse tree data structure for an input segment, comprising:
-
a component for determining syntax rules that are applicable to records currently in a chart for the input segment and for storing an indication of each determined syntax rule in a list; a rule application component for selecting a syntax rule indicated by the list and applying the syntax rule to the records currently in the chart; and a thrashing detection component for determining whether thrashing is occurring and for adjusting the selecting of syntax rules so that syntax rules are applied to certain portions of the input segment. - View Dependent Claims (21, 22, 23)
-
Specification