Systems and methods for parsing a natural language sentence
First Claim
Patent Images
1. A method for parsing a sentence having a series of words and punctuation marks comprising:
- (a) identifying for each of the words a token comprised of a list of syntactic identifiers corresponding to the word;
(b) token merging consecutive tokens by matching consecutive tokens against a first list of rules to produce a narrower set of possible syntactic interpretations;
(c) continuing step (b) until no further changes are determined for the syntactic identifiers; and
(d) token merging the narrower set of possible interpretations by matching the narrower set of possible interpretations against a second list of rules to map the narrower set of possible interpretations into a parse for the sentence having a still narrower set of possible interpretations;
wherein the second set of rules comprises substitution and concatenation rules specific to a collection of tokens.
1 Assignment
0 Petitions
Accused Products
Abstract
A method, computer program product, and apparatus for parsing a sentence which includes tokenizing the words of the sentence and putting them through an iterative inductive processor. The processor has access to at least a first and second set of rules. The rules narrow the possible syntactic interpretations for the words in the sentence. After exhausting application of the first set of rules, the program moves to the second set of rules. The program reiterates back and forth between the sets of rules until no further reductions in the syntactic interpretation can be made. Thereafter, deductive token merging is performed if needed.
171 Citations
23 Claims
-
1. A method for parsing a sentence having a series of words and punctuation marks comprising:
-
(a) identifying for each of the words a token comprised of a list of syntactic identifiers corresponding to the word;
(b) token merging consecutive tokens by matching consecutive tokens against a first list of rules to produce a narrower set of possible syntactic interpretations;
(c) continuing step (b) until no further changes are determined for the syntactic identifiers; and
(d) token merging the narrower set of possible interpretations by matching the narrower set of possible interpretations against a second list of rules to map the narrower set of possible interpretations into a parse for the sentence having a still narrower set of possible interpretations;
wherein the second set of rules comprises substitution and concatenation rules specific to a collection of tokens. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
(e) reiterating steps b-d until no further token merging is possible.
-
-
3. The method of claim 2 further comprising deductive token merging upon completion of said step of reiterating.
-
4. The method of claim 3 wherein said step of deductive token merging includes reducing the list of syntactic identifiers for a word by selecting a syntactic identifier most commonly used for the word.
-
5. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and wherein substitution is preferred over concatenation when both may be applied to a series of tokens in the step of token merging consecutive tokens.
-
6. The method of claim 1 wherein substitution is preferred over concatenation when both may be applied to a collection of tokens in the step of token merging possible interpretations.
-
7. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and a rule includes a condition comprised of a series of elements, each element being for comparison with at least one token, and wherein when more than one rule resulting in substitution applies in the step of token merging consecutive tokens, an applicable substitution rule having a longer list of elements is applied.
-
8. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and a rule includes a condition comprised of a series of elements, each element being for comparison with at least one token, and wherein when more than one rule resulting in substitution applies in the step of token merging the narrower set of possible interpretations, an applicable substitution rule having a longer list of elements is applied.
-
9. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and a rule includes a condition comprised of a series of elements, each element being for comparison with at least one token, and wherein when more than one rule resulting in concatenation applies in the step of token merging consecutive tokens, an applicable concatenation rule having a longer list of elements is applied.
-
10. The method of claim 1 wherein the first set of rules comprises substitution and concatenation rules and a rule includes a condition comprised of a series of elements, each element being for comparison with at least one token, and wherein when more than one rule resulting in concatenation applies in the step of token merging the narrower set of possible interpretations, an applicable concatenation rule having a longer list of elements is applied.
-
11. The method of claim 1 wherein the step of identifying comprises looking up a word in a dictionary, identifying the syntactic identifiers associated with the word and providing a syntactic identifier from a given set of syntactic identifiers for any syntactic identifier that is not in the given set of syntactic identifiers and is in a subclass of the substitute syntactic identifier.
-
12. The method of claim 1 further comprising matching consecutive words in the sentence with multiple words in a dictionary that contains syntactic identifiers for the multiple words and substituting a token comprised of the syntactic identifiers corresponding to a matched multiple word for the tokens of each word of the consecutive words that matched.
-
13. A computer program product for use on a computer system for parsing sentences, the computer program product comprising a computer usable medium having computer readable program code thereon, the computer readable program comprising:
-
tokenizing program code that provides tokens, each comprised of a list of syntactic identifiers, for the works of a sentence;
first inductive merging program code applying a first set of rules to consecutive tokens in a sentence processed by said tokenizing program code to produce a narrower set of syntactic interpretations; and
second inductive merging program code applying a second set of rules comprising substitution and concatenation rules specific to a collection of tokens to the narrower set of syntactic interpretations. - View Dependent Claims (14, 15, 16, 17, 18, 19)
reiteration program code for returning to said first inductive merging program code and said second inductive merging program code until said first and second inductive merging program code can make no further reductions in the syntactic interpretations.
-
-
15. The computer program product of claim 14 further comprising deductive token merging code for reducing synactic possibilities after completing execution of said reiteration program code.
-
16. The computer program product of claim 13 wherein said tokenizing program code comprises code that looks up a word in a dictionary, identifies the syntactic identifiers associated with the word and provides a syntactic identifier from a given set of syntactic identifiers for any syntactic identifier that is not in the given set of syntactic identifiers and is in a subclass of the substitute syntactic identifier.
-
17. The computer program product of claim 13 further comprising multiword matching program code.
-
18. The computer program product of claim 13 wherein said first inductive merging program code in conjunction with the first set of rules identifies phrase structures in the sentence.
-
19. The computer program product of claim 13 wherein said second inductive merging program code identifies in conjunction with the second set of rules identifies clause structures in the sentence.
-
20. A sentence parser comprising:
-
a tokenization module that receives a sentence comprised of a string of words and generates syntactic possibilities for the words of the sentence;
a replaceable set of first substitution and concatenation rules;
a replaceable set of second substitution and concatenation rules specific to a collection of tokens; and
an iterative inductive processor for receiving sentences that have been processed by said tokenization module and matching said sentences first against the replaceable set of first substitution and concatenation rules and then against the replaceable set of second substitution and concatenation rules and reiterating said matching to reduce the syntactic possibilities for a sentence. - View Dependent Claims (21, 22, 23)
-
Specification