Systems and methods for parsing a natural language sentence

US 6,539,348 B1
Filed: 08/24/1999
Issued: 03/25/2003
Est. Priority Date: 08/24/1998
Status: Expired due to Fees

First Claim

Patent Images

1. A method for parsing a sentence having a series of words and punctuation marks comprising:

(a) identifying for each of the words a token comprised of a list of syntactic identifiers corresponding to the word;

(b) token merging consecutive tokens by matching consecutive tokens against a first list of rules to produce a narrower set of possible syntactic interpretations;

(c) continuing step (b) until no further changes are determined for the syntactic identifiers; and

(d) token merging the narrower set of possible interpretations by matching the narrower set of possible interpretations against a second list of rules to map the narrower set of possible interpretations into a parse for the sentence having a still narrower set of possible interpretations;

wherein the second set of rules comprises substitution and concatenation rules specific to a collection of tokens.

View all claims

1 Assignment

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

A method, computer program product, and apparatus for parsing a sentence which includes tokenizing the words of the sentence and putting them through an iterative inductive processor. The processor has access to at least a first and second set of rules. The rules narrow the possible syntactic interpretations for the words in the sentence. After exhausting application of the first set of rules, the program moves to the second set of rules. The program reiterates back and forth between the sets of rules until no further reductions in the syntactic interpretation can be made. Thereafter, deductive token merging is performed if needed.

171 Citations

23 Claims

1. A method for parsing a sentence having a series of words and punctuation marks comprising:
- (a) identifying for each of the words a token comprised of a list of syntactic identifiers corresponding to the word;
  
  (b) token merging consecutive tokens by matching consecutive tokens against a first list of rules to produce a narrower set of possible syntactic interpretations;
  
  (c) continuing step (b) until no further changes are determined for the syntactic identifiers; and
  
  (d) token merging the narrower set of possible interpretations by matching the narrower set of possible interpretations against a second list of rules to map the narrower set of possible interpretations into a parse for the sentence having a still narrower set of possible interpretations;
  
  wherein the second set of rules comprises substitution and concatenation rules specific to a collection of tokens.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- - 2. The method of claim 1 further comprising:
3. The method of claim 2 further comprising deductive token merging upon completion of said step of reiterating.
4. The method of claim 3 wherein said step of deductive token merging includes reducing the list of syntactic identifiers for a word by selecting a syntactic identifier most commonly used for the word.
5. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and wherein substitution is preferred over concatenation when both may be applied to a series of tokens in the step of token merging consecutive tokens.
6. The method of claim 1 wherein substitution is preferred over concatenation when both may be applied to a collection of tokens in the step of token merging possible interpretations.
7. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and a rule includes a condition comprised of a series of elements, each element being for comparison with at least one token, and wherein when more than one rule resulting in substitution applies in the step of token merging consecutive tokens, an applicable substitution rule having a longer list of elements is applied.
8. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and a rule includes a condition comprised of a series of elements, each element being for comparison with at least one token, and wherein when more than one rule resulting in substitution applies in the step of token merging the narrower set of possible interpretations, an applicable substitution rule having a longer list of elements is applied.
9. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and a rule includes a condition comprised of a series of elements, each element being for comparison with at least one token, and wherein when more than one rule resulting in concatenation applies in the step of token merging consecutive tokens, an applicable concatenation rule having a longer list of elements is applied.
10. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and a rule includes a condition comprised of a series of elements, each element being for comparison with at least one token, and wherein when more than one rule resulting in concatenation applies in the step of token merging the narrower set of possible interpretations, an applicable concatenation rule having a longer list of elements is applied.
11. The method of claim 1 wherein the step of identifying comprises looking up a word in a dictionary, identifying the syntactic identifiers associated with the word and providing a syntactic identifier from a given set of syntactic identifiers for any syntactic identifier that is not in the given set of syntactic identifiers and is in a subclass of the substitute syntactic identifier.
12. The method of claim 1 further comprising matching consecutive words in the sentence with multiple words in a dictionary that contains syntactic identifiers for the multiple words and substituting a token comprised of the syntactic identifiers corresponding to a matched multiple word for the tokens of each word of the consecutive words that matched.

13. A computer program product for use on a computer system for parsing sentences, the computer program product comprising a computer usable medium having computer readable program code thereon, the computer readable program comprising:
- tokenizing program code that provides tokens, each comprised of a list of syntactic identifiers, for the works of a sentence;
  
  first inductive merging program code applying a first set of rules to consecutive tokens in a sentence processed by said tokenizing program code to produce a narrower set of syntactic interpretations; and
  
  second inductive merging program code applying a second set of rules comprising substitution and concatenation rules specific to a collection of tokens to the narrower set of syntactic interpretations.
- View Dependent Claims (14, 15, 16, 17, 18, 19)
- - 14. The computer readable program product of claim 13 further comprising:
15. The computer program product of claim 14 further comprising deductive token merging code for reducing synactic possibilities after completing execution of said reiteration program code.
16. The computer program product of claim 13 wherein said tokenizing program code comprises code that looks up a word in a dictionary, identifies the syntactic identifiers associated with the word and provides a syntactic identifier from a given set of syntactic identifiers for any syntactic identifier that is not in the given set of syntactic identifiers and is in a subclass of the substitute syntactic identifier.
17. The computer program product of claim 13 further comprising multiword matching program code.
18. The computer program product of claim 13 wherein said first inductive merging program code in conjunction with the first set of rules identifies phrase structures in the sentence.
19. The computer program product of claim 13 wherein said second inductive merging program code identifies in conjunction with the second set of rules identifies clause structures in the sentence.

20. A sentence parser comprising:
- a tokenization module that receives a sentence comprised of a string of words and generates syntactic possibilities for the words of the sentence;
  
  a replaceable set of first substitution and concatenation rules;
  
  a replaceable set of second substitution and concatenation rules specific to a collection of tokens; and
  
  an iterative inductive processor for receiving sentences that have been processed by said tokenization module and matching said sentences first against the replaceable set of first substitution and concatenation rules and then against the replaceable set of second substitution and concatenation rules and reiterating said matching to reduce the syntactic possibilities for a sentence.
- View Dependent Claims (21, 22, 23)
- - 21. The sentence parser of claim 20 further comprising a multiword comparator.
  - 22. The sentence parser of claim 20 further comprising a deductive processor arranged to operate on the syntactic possibilities remaining from said iterative inductive processor so as to further reduce the syntactic possibilities for the sentence.
  - 23. The sentence parser of claim 20 wherein said tokenization module generates syntactic possibilities by looking up each word in a dictionary, identifying the syntactic identifiers associated with each word and providing a syntactic identifier from a given set of syntactic identifiers for any syntactic identifier that is not in the given set of syntactic identifiers and is in a subclass of the substitute syntactic identifier.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Virtual Research Associates Incorporated
Original Assignee
Virtual Research Associates Incorporated
Inventors
Oh, Churl, Bond, Douglas G.
Primary Examiner(s)
EDOUARD, PATRICK NESTOR

Application Number

US09/379,668
Time in Patent Office

1,309 Days
Field of Search

704/1,9,10 707/530,531
US Class Current

704/9
CPC Class Codes

G06F 16/30 of unstructured textual dat...

G06F 40/211 Syntactic parsing, e.g. bas...

Systems and methods for parsing a natural language sentence

First Claim

1 Assignment

0 Petitions

Accused Products

Abstract

171 Citations

23 Claims

Specification

Solutions

Use Cases

Quick Links

Systems and methods for parsing a natural language sentence

First Claim

1 Assignment

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

171 Citations

23 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links