Method and system for natural language parsing using chunking

US 6,108,620 A
Filed: 05/17/1999
Issued: 08/22/2000
Est. Priority Date: 07/17/1997
Status: Expired due to Term

Abstract

A method and system that use a chunking technique to guide parsing. A chunk is a portion of the input for which the system has determined that a sufficient number of syntax rules have been applied such that further application of syntax rules to that chunk is unlikely to produce a more accurate sub-parse for that chunk. When using the chunking technique, the system selects a syntax rule to apply to the current partial parse (sub-trees) of the input sentence. The selected syntax rule has a high probability relative to other syntax rules that can be applied to the one or more potential sub-trees of the input sentence. The system then applies the selected syntax rule to the potential sub-trees of the input sentence to form new potential sub-trees of the input sentence. When the system determines that syntax rules with low probabilities have recently been applied, the system disables application of syntax rules to a portion of the parse of the input sentence (i.e., a chunk) so that syntax rules can be applied to the other portion of the input sentence.

54 Citations

23 Claims

1. A method in a computer system for parsing input in a language, the language having a grammar described by syntax rules, each syntax rule having a probability indicating a likelihood that the syntax rule will lead to a final parse of the input, the method comprising repeating the following until the final parse is generated:
- selecting a syntax rule to apply to a current partial parse of the input, the selected syntax rule having a high probability relative to other syntax rules that can be applied to the current partial parse of the input;
  
  applying the selected syntax rule to the current partial parse of the input to form a new current partial parse of the input;
  
  determining whether syntax rules with low probabilities have been recently applied; and
  
  when it is determined that syntax rules with low probabilities have recently been applied, disabling application of syntax rules to a portion of the current partial parse of the input so that that syntax rule application can be focused on the other portion of the current partial parse of the input.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1 wherein syntax rules are applied to the input from the end of the input to the beginning of the input.
  - 3. The method of claim 2 wherein the portion of the current partial parse of the input is a portion from a certain word to the end of the input.
  - 4. The method of claim 1 including calculating a minimum path length for each word of the input, the minimum path length indicating a minimum number of syntactic constructs of the current partial parse of the input that encompass the word to the end of the input.
  - 5. The method of claim 4 wherein the disabling occurs for words to the right of the left-most word with a minimum path length of 1.
  - 6. The method of claim 1 including determining that syntax rules with low probabilities have recently been applied when a ratio of a number of syntax rules applied below a threshold probability to a total number of syntax rules applied that exceeds a threshold ratio.
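The ratio test described in claim 6 can be illustrated with a sliding window over the probabilities of recently applied rules. This is only a sketch: the claim does not say how "recently" is measured, so the window size, both thresholds, and the `ThrashDetector` name are assumptions made for illustration.

```python
from collections import deque


class ThrashDetector:
    """Tracks probabilities of recently applied rules (hypothetical helper)."""

    def __init__(self, window=10, prob_threshold=0.2, ratio_threshold=0.5):
        # All three defaults are illustrative assumptions, not patent values.
        self.recent = deque(maxlen=window)  # oldest entries fall off the left
        self.prob_threshold = prob_threshold
        self.ratio_threshold = ratio_threshold

    def record(self, rule_probability):
        """Note the probability of a rule that was just applied."""
        self.recent.append(rule_probability)

    def is_thrashing(self):
        """True when the share of low-probability rules exceeds the ratio."""
        if not self.recent:
            return False
        low = sum(1 for p in self.recent if p < self.prob_threshold)
        return low / len(self.recent) > self.ratio_threshold
```

A parser loop would call `record()` after each rule application and consult `is_thrashing()` before selecting the next rule.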

7. A method in a computer system for parsing an input segment, the input segment comprising words, the method comprising:
- identifying syntax rules to be applied to a current partial parse of the input segment;
  
  applying the identified syntax rules to the current partial parse of the input segment;
  
  determining whether thrashing is occurring in the applying of the identified syntax rules; and
  
  when it is determined that thrashing is occurring, selecting a portion of the input segment on which to focus the applying of syntax rules; and
  
  adjusting the identifying of syntax rules so that rules that are to be applied to the selected portion are identified.
- Dependent claims (8, 9, 10, 11, 12, 13):
  - 8. The method of claim 7 wherein the identifying of syntax rules identifies syntax rules based on their probability of being part of a complete syntax parse tree for the input segment.
  - 9. The method of claim 7 wherein the selected portion is to the left of the left-most word of the input segment for which a single syntax rule has been applied that encompasses all the words to the right.
  - 10. The method of claim 9 wherein all the words to the right include words to a pseudo-end of the input segment.
  - 11. The method of claim 9 wherein all the words to the right include words to the end of the input segment.
  - 12. The method of claim 7 wherein the selecting of a portion is based on a minimal path length of the words of the input segment.
  - 13. The method of claim 12 wherein the minimal path length of a word is the smallest number of applied syntax rules that encompass each word to the right of the word without overlap.
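The minimal path length defined in claim 13 can be computed with a right-to-left pass over the constituents built so far. The sketch below assumes each applied rule is recorded as a `(start, end)` word span with an exclusive `end`, and that every word has at least its own one-word span; the function names and this representation are assumptions, not taken from the patent.

```python
def min_path_lengths(num_words, spans):
    """Smallest number of non-overlapping spans covering word i to the end.

    spans: iterable of (start, end) word spans, end exclusive (assumed format).
    Returns one value per word; float("inf") if no cover exists.
    """
    ends_by_start = {}
    for start, end in spans:
        ends_by_start.setdefault(start, []).append(end)

    inf = float("inf")
    # best[i] covers words i..num_words-1; best[num_words] is the empty cover.
    best = [inf] * (num_words + 1)
    best[num_words] = 0
    for i in range(num_words - 1, -1, -1):
        for end in ends_by_start.get(i, []):
            best[i] = min(best[i], 1 + best[end])
    return best[:num_words]


def leftmost_single_cover(path_lengths):
    """Index of the left-most word with minimal path length 1, or None."""
    for index, length in enumerate(path_lengths):
        if length == 1:
            return index
    return None
```

For the six words of "the big dog chased the cat", with one span per word plus spans `(1, 3)`, `(4, 6)` and `(3, 6)`, `min_path_lengths` returns `[3, 2, 2, 1, 1, 1]` and `leftmost_single_cover` returns `3`. Under one reading of claims 5 and 9, words from index 3 onward would form the chunk and rule application would focus on the words to its left.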

14. A computer-readable medium having instructions for causing a computer system to parse an input segment, the input segment comprising words by repeating the following until a complete parse is generated:
- identifying a syntax rule to be applied to a current partial parse of the input segment;
  
  applying the identified syntax rule to the current partial parse of the input segment;
  
  determining whether syntax rules with a low probability of leading to a complete parse have recently been applied; and
  
  when it is determined that that such syntax rules have recently been applied, establishing a pseudo-end of the input segment so that syntax rules that previously had a low probability now have a higher probability and are thus more likely to be identified.
- Dependent claims (15, 16, 17, 18, 19):
  - 15. The computer-readable medium of claim 14 wherein the identifying of syntax rules identifies syntax rules is based on their probability of being part of a complete syntax parse tree for the input segment.
  - 16. The computer-readable medium of claim 14 wherein the pseudo-end of the input segment is to the left of the left-most word of the input segment for which a single syntax rule has been applied that encompasses that leftmost word and all the words to the right.
  - 17. The computer-readable medium of claim 16 wherein all the words to the right include words to a pseudo-end of the input segment.
  - 18. The computer-readable medium of claim 14 wherein the establishing of the pseudo-end of the input segment is based on a minimal path length of the words of the input segment.
  - 19. The computer-readable medium of claim 18 wherein the minimal path length of a word is the smallest number of applied syntax rules that encompass each word to the right of the word without overlap.
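One way to picture the effect of a pseudo-end is as a filter on the candidate rule list: candidates that reach past the pseudo-end are set aside, so candidates that previously ranked low become the best available. The claims do not say how probabilities are recomputed, so this sketch only changes relative rank. The tuple layout, the `next_rule` name, and the choice to keep deferred candidates for later are all assumptions.

```python
import heapq


def next_rule(candidates, pseudo_end):
    """Pop the most probable candidate lying wholly left of pseudo_end.

    candidates: heap of (-probability, start, end, rule_name) tuples, where
    end is exclusive (assumed layout). Candidates reaching past pseudo_end
    are pushed back so they stay available if the pseudo-end is later moved.
    Returns the chosen tuple, or None if no candidate is eligible.
    """
    deferred = []
    chosen = None
    while candidates:
        item = heapq.heappop(candidates)
        _, _, end, _ = item
        if end <= pseudo_end:
            chosen = item
            break
        deferred.append(item)
    for item in deferred:
        heapq.heappush(candidates, item)
    return chosen
```

With candidates `(-0.9, 3, 6, "VP")`, `(-0.4, 0, 3, "NP")` and `(-0.2, 1, 3, "AJP")` and a pseudo-end at word 3, the call returns the `"NP"` candidate even though the `"VP"` candidate has the higher probability.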

20. A parser for generating a parse tree data structure for an input segment, comprising:
- a component for determining syntax rules that are applicable to records currently in a chart for the input segment and for storing an indication of each determined syntax rule in a list;
  
  a rule application component for selecting a syntax rule indicated by the list and applying the syntax rule to the records currently in the chart; and
  
  a thrashing detection component for determining whether thrashing is occurring and for adjusting the selecting of syntax rules so that syntax rules are applied to certain portions of the input segment.
- Dependent claims (21, 22, 23):
  - 21. The parser of claim 20 wherein the indications of the rules in the list are sorted based on a probability that each syntax rule will lead to a complete syntax tree.
  - 22. The parser of claim 20 wherein the thrashing detection component determines that thrashing is occurring when syntax rules with low probabilities of leading to a complete syntax tree are being applied.
  - 23. The parser of claim 20 wherein thrashing detection component identifies a portion of the input segment on which to focus further application of syntax rules.
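The components named in claim 20 can be pictured with a toy bottom-up parser: a chart of records, a probability-ordered rule list, and a rule application loop. This is a minimal sketch, not the patented parser. The grammar, its probabilities, the record layout `(label, start, end)`, and the `on_apply` hook (which stands in for the thrashing detection component and only observes each application) are all invented for illustration.

```python
import heapq

# Toy binary grammar: (parent, left child, right child, probability).
# Labels and probabilities are invented for illustration.
RULES = [
    ("NP", "Det", "Noun", 0.9),
    ("VP", "Verb", "NP", 0.8),
    ("S", "NP", "VP", 0.7),
]


def candidates_for(record, chart):
    """List rule applications that join `record` to an adjacent chart record."""
    label, start, end = record
    found = []
    for parent, left, right, prob in RULES:
        for other_label, o_start, o_end in chart:
            if label == left and other_label == right and o_start == end:
                found.append((-prob, parent, start, o_end))
            if label == right and other_label == left and o_end == start:
                found.append((-prob, parent, o_start, end))
    return found


def parse(tags, on_apply=None):
    """Greedy bottom-up parse; returns the chart, or None if no S is built."""
    chart, rule_list = [], []

    def add(record):
        # Queue every rule the new record makes applicable, then store it.
        for cand in candidates_for(record, chart):
            heapq.heappush(rule_list, cand)
        chart.append(record)

    for i, tag in enumerate(tags):
        add((tag, i, i + 1))
    while rule_list:
        # The heap keeps the most probable candidate first.
        neg_prob, parent, start, end = heapq.heappop(rule_list)
        record = (parent, start, end)
        if record in chart:
            continue
        if on_apply is not None:
            on_apply(-neg_prob)  # hook for a thrashing detector
        add(record)
        if record == ("S", 0, len(tags)):
            return chart
    return None
```

Calling `parse(["Det", "Noun", "Verb", "Det", "Noun"])` builds both noun phrases, then the verb phrase, then the sentence record `("S", 0, 5)`.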

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Jensen, Karen; Richardson, Steve
Primary Examiner(s): Thomas, Joseph
Application Number: US09/312,808
Time in Patent Office: 463 Days
Field of Search: 704/1, 704/9, 704/10, 704/251, 704/255, 704/257, 707/1, 707/4, 707/5, 707/6, 707/100, 707/104, 707/530, 707/531, 707/532
US Class Current: 704/9
CPC Class Codes: G06F 40/211 Syntactic parsing, e.g. bas...
