Method and system for bootstrapping statistical processing into a rule-based natural language parser

US 5,963,894 A
Filed: 05/20/1997
Issued: 10/05/1999
Est. Priority Date: 06/24/1994
Status: Expired due to Term

First Claim

Patent Images

1. A method in a computer system for bootstrapping statistical processing into a rule-based natural language parser to efficiently parse a principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combine words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, comprising the steps of:

for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to a characteristic of at least one of the words or already combined groups of words that may be combined by the rule;

for each sample input string;

exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, andif fewer than a maximum number of parse results were produced by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule; and

efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method and system for bootstrapping statistical processing into a rule-based natural language parser is provided. In a preferred embodiment, a statistical bootstrapping software facility optimizes the operation of a robust natural language parser that uses a set of lexicon entries to determine possible parts of speech of words from an input string and a set of rules to combine words from the input string into syntactic structures. The facility first operates the parser in a statistics compilation mode, in which, for each of many sample input strings, the parser attempts to apply all applicable rules and lexicon entries. While the parser is operating in the statistics compilation mode, the facility compiles statistics indicating the likelihood of success of each rule and lexicon entry, based on the success of each rule and lexicon entry when applied in the statistics compilation mode. After a sufficient body of likelihood of success statistics have been compiled, the facility operates the parser in an efficient parsing mode, in which the facility uses the compiled statistics to optimize the operation of the parser. In order to parse an input string in the efficient parsing mode, the facility causes the parser to apply applicable rules and lexicon entries in the descending order of the likelihood of their success as indicated by the statistics compiled in the statistics compilation mode.

Citations

16 Claims

1. A method in a computer system for bootstrapping statistical processing into a rule-based natural language parser to efficiently parse a principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combine words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, comprising the steps of:
- for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to a characteristic of at least one of the words or already combined groups of words that may be combined by the rule;
  
  for each sample input string;
  
  exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, andif fewer than a maximum number of parse results were produced by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule; and
  
  efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule.

2. A computer-readable medium whose contents cause a computer system to bootstrap statistical processing into a rule-based natural language parser to efficiently parse a principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combined words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, by performing:
- for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to a characteristic of at least one of the words or already combined groups of words that may be combined by the rule;
  
  for each sample input string;
  
  exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, andif a single parse result was produced by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule; and
  
  efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to a characteristic of at least one of the words or already combined groups of words of the sample input string combined in the parse results by the rule.

3. A method in a computer system for bootstrapping statistical processing into a rule-based natural language parser to efficiently parse a principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combine words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, comprising the steps of:
- for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words;
  
  for each sample input string;
  
  exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, andif fewer than a maximum number of parse results were produced by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words; and
  
  efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words.

4. A computer-readable medium whose contents cause a computer system to bootstrap statistical processing into a rule-based natural language parser to efficiently parse the principal input string using a plurality of sample input strings representative of strings to be parsed by the natural language parser, the natural language parser for producing one or more parse results from an input string comprised of words by applying rules from a set of conditioned rules that each combined words or already combined groups of words, certain subsets of the set of rules being applicable when parsing particular input strings, by performing:
- for each rule, initializing a plurality of indications of the number of times that the rule has succeeded, each of the plurality of indications corresponding to identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words;
  
  for each sample input string;
  
  exhaustively parsing the sample input string by applying each applicable rule of the set of rules to produce one or more parse results, andif a single parse result was formed by exhaustively parsing the sample input string, updating for each rule that combined words or already combined groups of words in the parse result an indication of the number of times that the rule succeeded that corresponds to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words; and
  
  efficiently parsing the principal input string by applying applicable rules to the principal input string from the set of rules in the decreasing order of their likelihood of success as indicated by updated indications of the number of times that each rule succeeded that corresponds to the identities of any subordinate rules that combined a group of words that the rule further combines with other words or groups of words.

5. A computer-implemented method system for bootstrapping statistical processing into a rule-based parser for parsing input strings of natural language text using a set of conditioned rules, the method comprising the steps of:
- (a) operating the parser across a plurality of input strings to produce statistics indicating a likelihood of success of each rule of the parser; and
  
  (b) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the statistics produced in step (a).
- View Dependent Claims (6)
- - 6. The method of claim 5 wherein step (a) produces statistics indicating separate likelihoods of success for each rule of the parser corresponding to different conditions under which the rule may be applied by the parser, and wherein step (b) operates the parser to apply the rules of the parser in descending order of, in the case of each rule, the likelihood of success corresponding to the condition most similar to the condition in which the rule is to be applied.

7. A computer-readable medium whose contents cause a computer system to bootstrap statistical processing into a rule-based parser for parsing input strings of natural language text using a set of conditioned rules by performing:
- (a) operating the parser across a plurality of input strings to produce statistics indicating a likelihood of success of each rule of the parser; and
  
  (b) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the statistics produced in step (a).
- View Dependent Claims (8)
- - 8. The computer-readable medium of claim 7 wherein step (a) produces statistics indicating separate likelihoods of success for each rule of the parser corresponding to different conditions under which the rule may be applied by the parser, and wherein step (b) operates the parser to apply the rules of the parser in descending order of, in the case of each rule, the likelihood of success corresponding to the condition most similar to the condition in which the rule is to be applied.

9. A computer-implemented method for efficiently parsing input strings using a parser that utilizes a set of lexicon entries and a set of rules, each lexicon entry of the set of lexicon entries and each rule of the set of rules either succeeding or failing each time it is applied, certain subsets both of the set of rules and of the set of lexicon entries being applicable when parsing particular input strings, the method comprising the steps of:
- (a) applying the parser to each of a first set of input strings to generate statistics indicating a relative level of success of each lexicon entry in the set of lexicon entries and of each rule in the set of rules; and
  
  (b) applying lexicon entries in the set of lexicon entries and rules in the set of rules in the decreasing order of the relative levels of success of the rules and lexicon entries indicated by the statistics generated in step (a) to parse each of a second set of input strings.
- View Dependent Claims (10)
- - 10. The method of claim 9, further including the step of normalizing the generated statistics indicating the relative level of success of each lexicon entry against the generated statistics indicating the relative level of success of each rule, such that the statistics indicating the relative level of success of each lexicon entry are directly comparable to the statistics indicating the relative level of success of each rule.

11. A computer-readable medium whose contents cause a computer system to efficiently parse input strings using a parser that utilizes a set of lexicon entries and a set of rules, certain subsets both of the set of rules and of the set of lexicon entries being applicable when parsing particular input strings, by performing:
- (a) applying the parser to each of a first set of input strings to generate statistics indicating the relative level of success of each lexicon entry in the set of lexicon entries and of each rule in the set of rules; and
  
  (b) applying lexicon entries in the set of lexicon entries and rules in the set of rules in the decreasing order of the relative levels of success of the rules and lexicon entries indicated by the statistics generated in step (a) to parse each of a second set of input strings.
- View Dependent Claims (12)
- - 12. The computer-readable medium of claim 11 wherein the contents of the computer-readable medium further cause the computer system to perform the step of normalizing the generated statistics indicating the relative level of success of each lexicon entry against the generated statistics indicating the relative level of success of each rule, such that the statistics indicating the relative level of success of each lexicon entry are directly comparable to the statistics indicating the relative level of success of each rule.

13. A computer memory storing a parsing expedition data structure for expediting the parsing of natural language input text strings by a rule-based natural language parser, the data structure comprising:
- for each of a plurality of rules applicable by the parser, frequency statistics indicating the frequency with which the rule contributes to a successful parse of natural language strings when applied by the rule-based natural language parser, such that the frequency statistics stored in the data structure may be used to expedite the parsing of natural language input text strings by ordering the application of rules in accordance with the frequency statistics.
- View Dependent Claims (14)
- - 14. The computer memory of claim 13 wherein, in the data structure, the frequency statistics for each rule are subdivided into statistics indicating the frequency with which the rule contributes to a successful parse of natural language strings for natural language strings having different characteristics.

15. A computer-implemented method system for continuously enhancing a set of statistics used by a rule-based parser for parsing input strings of natural language text using a set of conditioned rules, the set of statistics indicating a likelihood of success of each rule of the parser, the method comprising the steps of:
- (a) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the set of statistics;
  
  (b) augmenting the set of statistics indicating the likelihood of success of each rule of the parser, based on the success of each rule when applied in step (a); and
  
  (c) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the augmented set of statistics.

16. A computer-readable medium whose contents cause a computer system to continuously enhance a set of statistics used by a rule-based parser for parsing input strings of natural language text using a set of conditioned rules, the set of statistics indicating likelihood of success of each rule of the parser, by performing:
- (a) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the set of statistics;
  
  (b) augmenting the set of statistics indicating the likelihood of success of each rule of the parser, based on the success of each rule when applied in step (a); and
  
  (c) operating the parser such that the parser applies at least one of the rules of the parser in descending order of the likelihood of success indicated by the augmented set of statistics.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Richardson, Stephen Darrow, Heidorn, George E.
Primary Examiner(s)
Trammell, James P.
Assistant Examiner(s)
Nguyen, Cuong H.

Application Number

US08/858,959
Time in Patent Office

868 Days
Field of Search

704/4, 704/9, 704/8, 704/2, 704/10, 704/257, 705/1, 706/11, 345/326, 707/532
US Class Current

704/9
CPC Class Codes

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/193   Formal grammars, e.g. finit...

G10L 15/197   Probabilistic grammars, e.g...

Method and system for bootstrapping statistical processing into a rule-based natural language parser

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

Citations

16 Claims

Specification

Solutions

Use Cases

Quick Links

Method and system for bootstrapping statistical processing into a rule-based natural language parser

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

16 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links