Method and system for bootstrapping statistical processing into a rule-based natural language parser
First Claim
1. A method in a computer system for bootstrapping statistical processing into a rule-based natural language parser to efficiently parse a principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combine words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, comprising the steps of:
- for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to a characteristic of at least one of the words or already combined groups of words that may be combined by the rule;
for each sample input string;
exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, andif fewer than a maximum number of parse results were produced by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule; and
efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule.
1 Assignment
0 Petitions
Accused Products
Abstract
A method and system for bootstrapping statistical processing into a rule-based natural language parser is provided. In a preferred embodiment, a statistical bootstrapping software facility optimizes the operation of a robust natural language parser that uses a set of lexicon entries to determine possible parts of speech of words from an input string and a set of rules to combine words from the input string into syntactic structures. The facility first operates the parser in a statistics compilation mode, in which, for each of many sample input strings, the parser attempts to apply all applicable rules and lexicon entries. While the parser is operating in the statistics compilation mode, the facility compiles statistics indicating the likelihood of success of each rule and lexicon entry, based on the success of each rule and lexicon entry when applied in the statistics compilation mode. After a sufficient body of likelihood of success statistics have been compiled, the facility operates the parser in an efficient parsing mode, in which the facility uses the compiled statistics to optimize the operation of the parser. In order to parse an input string in the efficient parsing mode, the facility causes the parser to apply applicable rules and lexicon entries in the descending order of the likelihood of their success as indicated by the statistics compiled in the statistics compilation mode.
-
Citations
16 Claims
-
1. A method in a computer system for bootstrapping statistical processing into a rule-based natural language parser to efficiently parse a principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combine words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, comprising the steps of:
-
for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to a characteristic of at least one of the words or already combined groups of words that may be combined by the rule; for each sample input string; exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, and if fewer than a maximum number of parse results were produced by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule; and efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule.
-
-
2. A computer-readable medium whose contents cause a computer system to bootstrap statistical processing into a rule-based natural language parser to efficiently parse a principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combined words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, by performing:
-
for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to a characteristic of at least one of the words or already combined groups of words that may be combined by the rule; for each sample input string; exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, and if a single parse result was produced by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule; and efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule.
-
-
3. A method in a computer system for bootstrapping statistical processing into a rule-based natural language parser to efficiently parse a principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combine words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, comprising the steps of:
-
for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words; for each sample input string; exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, and if fewer than a maximum number of parse results were produced by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words; and efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words.
-
-
4. A computer-readable medium whose contents cause a computer system to bootstrap statistical processing into a rule-based natural language parser to efficiently parse the principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combined words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, by performing:
-
for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words; for each sample input string; exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, and if a single parse result was formed by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words; and efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words.
-
-
5. A computer-implemented method system for bootstrapping statistical processing into a rule-based parser for parsing input strings of natural language text using a set of conditioned rules, the method comprising the steps of:
-
(a) operating the parser across a plurality of input strings to produce statistics indicating a likelihood of success of each rule of the parser; and (b) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the statistics produced in step (a). - View Dependent Claims (6)
-
-
7. A computer-readable medium whose contents cause a computer system to bootstrap statistical processing into a rule-based parser for parsing input strings of natural language text using a set of conditioned rules by performing:
-
(a) operating the parser across a plurality of input strings to produce statistics indicating a likelihood of success of each rule of the parser; and (b) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the statistics produced in step (a). - View Dependent Claims (8)
-
-
9. A computer-implemented method for efficiently parsing input strings using a parser that utilizes a set of lexicon entries and a set of rules, each lexicon entry of the set of lexicon entries and each rule of the set of rules either succeeding or failing each time it is applied, certain subsets both of the set of rules and of the set of lexicon entries being applicable when parsing particular input strings, the method comprising the steps of:
-
(a) applying the parser to each of a first set of input strings to generate statistics indicating a relative level of success of each lexicon entry in the set of lexicon entries and of each rule in the set of rules; and (b) applying lexicon entries in the set of lexicon entries and rules in the set of rules in the decreasing order of the relative levels of success of the rules and lexicon entries indicated by the statistics generated in step (a) to parse each of a second set of input strings. - View Dependent Claims (10)
-
-
11. A computer-readable medium whose contents cause a computer system to efficiently parse input strings using a parser that utilizes a set of lexicon entries and a set of rules, certain subsets both of the set of rules and of the set of lexicon entries being applicable when parsing particular input strings, by performing:
-
(a) applying the parser to each of a first set of input strings to generate statistics indicating the relative level of success of each lexicon entry in the set of lexicon entries and of each rule in the set of rules; and (b) applying lexicon entries in the set of lexicon entries and rules in the set of rules in the decreasing order of the relative levels of success of the rules and lexicon entries indicated by the statistics generated in step (a) to parse each of a second set of input strings. - View Dependent Claims (12)
-
-
13. A computer memory storing a parsing expedition data structure for expediting the parsing of natural language input text strings by a rule-based natural language parser, the data structure comprising:
for each of a plurality of rules applicable by the parser, frequency statistics indicating the frequency with which the rule contributes to a successful parse of natural language strings when applied by the rule-based natural language parser, such that the frequency statistics stored in the data structure may be used to expedite the parsing of natural language input text strings by ordering the application of rules in accordance with the frequency statistics. - View Dependent Claims (14)
-
15. A computer-implemented method system for continuously enhancing a set of statistics used by a rule-based parser for parsing input strings of natural language text using a set of conditioned rules, the set of statistics indicating a likelihood of success of each rule of the parser, the method comprising the steps of:
-
(a) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the set of statistics; (b) augmenting the set of statistics indicating the likelihood of success of each rule of the parser, based on the success of each rule when applied in step (a); and (c) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the augmented set of statistics.
-
-
16. A computer-readable medium whose contents cause a computer system to continuously enhance a set of statistics used by a rule-based parser for parsing input strings of natural language text using a set of conditioned rules, the set of statistics indicating likelihood of success of each rule of the parser, by performing:
-
(a) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the set of statistics; (b) augmenting the set of statistics indicating the likelihood of success of each rule of the parser, based on the success of each rule when applied in step (a); and (c) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the augmented set of statistics.
-
Specification