Method and system for identifying and resolving commonly confused words in a natural language parser
Abstract
A method and system for identifying and resolving commonly confused words in a natural language parser is provided. In a preferred embodiment, a computer system parses input text made up of two or more words using a relation that maps from potentially confused words, including one word among the words of the input text, to possibly intended words. The computer system first identifies the possible parts of speech for each word of the input text including the potentially confused word. The computer system then identifies the possible parts of speech for the possibly intended word to which the relation maps the potentially confused word. Finally, the computer system applies syntactic grammar rules to the identified parts of speech such that a complete syntax tree containing a possible part of speech for the possibly intended word is produced and no complete syntax tree containing a possible part of speech for the potentially confused word is produced. According to a further embodiment of the invention, the computer system provides feedback on the input text by outputting an indication that a sentence in the input text is syntactically incorrect and outputting a further indication that the sentence in the input text would be syntactically correct if the potentially confused word in the input text were replaced with the possibly intended word.
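The feedback behavior described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `LEXICON`, the `CONFUSED_TO_INTENDED` relation, and the use of a fixed set of tag sequences (`SENTENCE_PATTERNS`) as a stand-in for syntactic grammar rules are hypothetical and do not come from the patent.

```python
from itertools import product

# Hypothetical toy dictionary: word -> possible parts of speech.
LEXICON = {
    "their": {"DET"},
    "there": {"ADV", "PRON"},
    "dog": {"NOUN"},
    "sleeps": {"VERB"},
}

# Hypothetical relation from potentially confused words to possibly intended words.
CONFUSED_TO_INTENDED = {"their": "there", "there": "their"}

# Stand-in for syntactic grammar rules: tag sequences that form a complete sentence.
SENTENCE_PATTERNS = {
    ("DET", "NOUN", "VERB"),
    ("DET", "NOUN", "VERB", "ADV"),
}


def parses(words):
    """Return True if some part-of-speech assignment matches a sentence pattern."""
    options = [sorted(LEXICON.get(w, ())) for w in words]
    return any(tags in SENTENCE_PATTERNS for tags in product(*options))


def feedback(sentence):
    """Return None for a sentence that parses, otherwise a feedback message."""
    words = sentence.lower().split()
    if parses(words):
        return None
    for i, word in enumerate(words):
        intended = CONFUSED_TO_INTENDED.get(word)
        if intended and parses(words[:i] + [intended] + words[i + 1:]):
            return (f"Syntactically incorrect; would be correct if "
                    f"'{word}' were replaced with '{intended}'.")
    return "Syntactically incorrect."
```

With this toy data, `feedback("there dog sleeps")` reports that replacing 'there' with 'their' would make the sentence parse, while `feedback("their dog sleeps")` returns `None`.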
20 Claims
-
1. A method in a computer system for parsing a segment of natural language input text containing one or more words using grammar rules and a dictionary containing a plurality of entries, each dictionary entry corresponding to a word in the natural language and specifying one or more possible parts of speech for the word, the method comprising the steps of:
(a) creating a chart for containing a parse tree representing the input text segment and parsing results intermediate thereto;
(b) for each word occurring in the input text segment, creating a part-of-speech record in the chart for the word specifying a part of speech specified by the dictionary entry for the word;
(c) identifying a word occurring in the input text segment that is commonly confused with another word;
(d) creating a part-of-speech record in the chart for the identified word specifying a part of speech specified by the dictionary entry for the word commonly confused with the identified word; and
(e) applying the grammar rules to both the part-of-speech records created in step (b) and those created in step (d).
Dependent claims: 2, 3, 4, 5
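Steps (a) through (e) of claim 1 can be sketched as a small bottom-up chart parser. This is an illustrative reading, not the patented implementation: the `Record` class, the toy `DICTIONARY`, the `COMMONLY_CONFUSED` table, and the binary `GRAMMAR` rules are all hypothetical names and data chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical toy data, not taken from the patent.
DICTIONARY = {
    "i": ["PRON"],
    "him": ["PRON"],
    "no": ["DET", "ADV"],
    "know": ["VERB"],
}
COMMONLY_CONFUSED = {"no": "know", "know": "no"}
# Binary grammar rules: (left label, right label) -> parent label.
GRAMMAR = {("VERB", "PRON"): "VP", ("PRON", "VP"): "S"}


@dataclass(frozen=True)
class Record:
    label: str                               # part of speech or phrase label
    start: int                               # index of the first word covered
    end: int                                 # index one past the last word covered
    substitutions: frozenset = frozenset()   # (position, typed word, assumed word)


def parse(words):
    chart = set()                                        # step (a)
    for i, word in enumerate(words):
        for pos in DICTIONARY.get(word, []):             # step (b)
            chart.add(Record(pos, i, i + 1))
        other = COMMONLY_CONFUSED.get(word)              # step (c)
        if other is not None:
            for pos in DICTIONARY.get(other, []):        # step (d)
                chart.add(Record(pos, i, i + 1,
                                 frozenset({(i, word, other)})))
    # Step (e): apply the rules to all records until no new record appears.
    changed = True
    while changed:
        changed = False
        for left in list(chart):
            for right in list(chart):
                parent = GRAMMAR.get((left.label, right.label))
                if parent and left.end == right.start:
                    new = Record(parent, left.start, right.end,
                                 left.substitutions | right.substitutions)
                    if new not in chart:
                        chart.add(new)
                        changed = True
    # Complete parses are "S" records that span the whole input.
    return [r for r in chart
            if r.label == "S" and r.start == 0 and r.end == len(words)]
```

For the input `["i", "no", "him"]`, the only complete parse in this sketch carries the substitution `(1, "no", "know")`, showing that the sentence parses only when "no" is read as "know". Tracking substitutions on each record is a design choice of the sketch so that a complete parse can be traced back to the records created in step (d).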
-
6. A method in a computer system for loading into a parser a list of rules and lexical records for the parser to apply when parsing a segment of natural language input text containing words, each contained word having associated with it lexical information, the method comprising the steps of:
(a) adding to the list lexical records, each lexical record specifying lexical information associated with one of the words contained in the input text segment;
(b) identifying a word contained in the input text that is commonly confused with another word;
(c) adding to the list a lexical record specifying lexical information associated with the word with which the identified word is commonly confused; and
(d) adding to the list rules that can be applied to the lexical records added to the list in steps (a) and (c).
Dependent claims: 7, 8, 9, 10, 11
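The list-loading steps of claim 6 can be sketched as follows. The sketch assumes one possible reading of step (d), namely that a rule "can be applied" when at least one of its constituents matches a part of speech present among the loaded lexical records. The `LexicalRecord` and `Rule` types, the toy `LEXICON`, the `CONFUSED` table, and `ALL_RULES` are hypothetical.

```python
from typing import NamedTuple


class LexicalRecord(NamedTuple):
    position: int          # index of the word in the input text segment
    word: str              # word the record describes
    pos: str               # part of speech (the lexical information here)
    from_confusion: bool   # True if added for a commonly confused counterpart


class Rule(NamedTuple):
    parent: str
    children: tuple


# Hypothetical toy data, not taken from the patent.
LEXICON = {"taller": ["ADJ"], "then": ["ADV"], "than": ["CONJ"], "me": ["PRON"]}
CONFUSED = {"then": "than", "than": "then"}
ALL_RULES = [
    Rule("CompP", ("CONJ", "PRON")),
    Rule("AdjP", ("ADJ", "CompP")),
    Rule("AdvP", ("ADV",)),
    Rule("NP", ("DET", "NOUN")),
]


def load_list(words):
    items = []
    for i, word in enumerate(words):
        for pos in LEXICON.get(word, []):                    # step (a)
            items.append(LexicalRecord(i, word, pos, False))
        other = CONFUSED.get(word)                           # step (b)
        if other is not None:
            for pos in LEXICON.get(other, []):               # step (c)
                items.append(LexicalRecord(i, other, pos, True))
    present = {record.pos for record in items}
    # Step (d), under the assumption stated above: keep only rules that
    # mention at least one part of speech found among the lexical records.
    rules = [r for r in ALL_RULES if any(c in present for c in r.children)]
    return items + rules
```

For `["taller", "then", "me"]`, the returned list contains a record for "than" at position 1 flagged as coming from the confusion table, and it omits the `NP` rule because no determiner or noun record was loaded.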
-
12. A method in a computer system for parsing input text made up of a plurality of words using a relation that maps from each of a plurality of potentially confused words to a possibly intended word, a word of the input text being a potentially confused word mapped by the relation to a possibly intended word not occurring in the input text, each word of the input text and each possibly intended word having one or more possible parts of speech, the method comprising:
identifying the possible parts of speech for each word of the input text including the potentially confused word;
identifying the possible parts of speech for the possibly intended word; and
applying syntactic grammar rules to the identified parts of speech, such that a complete syntax tree containing a possible part of speech for the possibly intended word is produced and no complete syntax tree containing a possible part of speech for the potentially confused word is produced.
-
13. A computer-readable medium whose contents cause a computer system to parse a segment of natural language input text containing one or more words using grammar rules, a dictionary containing a plurality of entries, and a chart for containing a parse tree representing the input text segment and parsing results intermediate thereto, each dictionary entry corresponding to a word in the natural language and specifying one or more possible parts of speech for the word, by performing the steps of:
(a) for each word occurring in the input text segment, creating a part-of-speech record in the chart for the word specifying a part of speech specified by the dictionary entry for the word;
(b) identifying a word occurring in the input text segment that is commonly confused with another word;
(c) creating a part-of-speech record in the chart for the identified word specifying a part of speech specified by the dictionary entry for the word commonly confused with the identified word; and
(d) applying the grammar rules to both the part-of-speech records created in step (a) and those created in step (c).
Dependent claims: 14, 15, 16, 17
-
18. A computer-readable medium whose contents cause a computer system to parse input text made up of a plurality of words using a relation mapping from each of a plurality of potentially confused words to a possibly intended word, a word of the input text being a potentially confused word mapped to a possibly intended word by the relation, each word of the input text and each possibly intended word having one or more possible parts of speech, by:
identifying the possible parts of speech for each word of the input text including the potentially confused word;
identifying the possible parts of speech for the possibly intended word; and
applying syntactic grammar rules to the identified parts of speech, such that a complete syntax tree containing a possible part of speech for the possibly intended word is produced and no complete syntax tree containing a possible part of speech for the potentially confused word is produced.
-
19. An apparatus for parsing a segment of natural language input text containing one or more words using grammar rules and a dictionary containing a plurality of entries, each dictionary entry corresponding to a word in the natural language and specifying one or more possible parts of speech for the word, comprising:
a data structure for containing a parse tree representing the input text segment and parsing results intermediate thereto;
a primary part-of-speech record generator that creates a part-of-speech record in the data structure for each word occurring in the input text segment, each part-of-speech record specifying a part-of-speech record specified by the dictionary entry for the word;
an identifier that identifies a word occurring in the input text segment that is commonly confused with another word;
a secondary part-of-speech record generator that creates a part-of-speech record in the chart memory for the word identified by the identifier, the created part-of-speech record specifying a part of speech specified by the dictionary entry for the word commonly confused with the identified word; and
a grammar rule application subsystem that applies the grammar rules to both the part-of-speech records created by the primary part-of-speech record generator and those created by the secondary part-of-speech record generator.
-
20. A computer-readable medium containing instructions for controlling a computer system to load into a parser a list of rules and lexical records for the parser to apply when parsing a segment of natural language input text containing words, each contained word having associated with it lexical information, by:
(a) adding to the list lexical records, each lexical record specifying lexical information associated with one of the words contained in the input text segment;
(b) identifying a word contained in the input text that is commonly confused with another word;
(c) adding to the list a lexical record specifying lexical information associated with the word with which the identified word is commonly confused; and
(d) adding to the list rules that can be applied to the lexical records added to the list in steps (a) and (c).
-
Specification