Method and system for identifying and resolving commonly confused words in a natural language parser

US 5,999,896 A
Filed: 06/25/1996
Issued: 12/07/1999
Est. Priority Date: 06/25/1996
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

A method and system for identifying and resolving commonly confused words in a natural language parser is provided. In a preferred embodiment, a computer system parses input text made up of two or more words using a relation that maps from potentially confused words, including one word among the words of the input text, to possibly intended words. The computer system first identifies the possible parts of speech for each word of the input text including the potentially confused word. The computer system then identifies the possible parts of speech for the possibly intended word to which the relation maps the potentially confused word. Finally, the computer system applies syntactic grammar rules to the identified parts of speech such that a complete syntax tree containing a possible part of speech for the possibly intended word is produced and no complete syntax tree containing a possible part of speech for the potentially confused word is produced. According to a further embodiment of the invention, the computer system provides feedback on the input text by outputting an indication that a sentence in the input text is syntactically incorrect and outputting a further indication that the sentence in the input text would be syntactically correct if the potentially confused word in the input text was replaced with the possibly intended word.
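
The feedback behavior described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `DICTIONARY`, the `CONFUSION_RELATION` mapping, and the use of a fixed set of part-of-speech sequences (`SENTENCE_PATTERNS`) as a stand-in for real syntactic grammar rules. It is not the parser described in the patent, only a small instance of the same idea: report a sentence as incorrect and name the replacement that would make it parse.

```python
from itertools import product

# Hypothetical toy data; none of these entries come from the patent.
DICTIONARY = {
    "their": {"DET"},
    "there": {"PRON"},
    "is": {"VERB"},
    "a": {"DET"},
    "book": {"NOUN", "VERB"},
}
# Relation from a potentially confused word to a possibly intended word.
CONFUSION_RELATION = {"their": "there", "there": "their"}
# Stand-in for grammar rules: part-of-speech sequences that form a sentence.
SENTENCE_PATTERNS = {("PRON", "VERB", "DET", "NOUN"), ("DET", "NOUN", "VERB")}


def parses(words):
    """Return True if some choice of parts of speech matches a sentence pattern."""
    options = [DICTIONARY.get(word, set()) for word in words]
    return any(tags in SENTENCE_PATTERNS for tags in product(*options))


def feedback(words):
    """Return messages naming replacements that would make the sentence parse."""
    if parses(words):
        return []  # the sentence is acceptable as written
    messages = []
    for i, word in enumerate(words):
        intended = CONFUSION_RELATION.get(word)
        if intended is None:
            continue
        candidate = words[:i] + [intended] + words[i + 1:]
        if parses(candidate):
            messages.append(
                f"Sentence is syntactically incorrect; it would parse "
                f"if '{word}' were replaced with '{intended}'."
            )
    return messages
```

Under these assumptions, `feedback("their is a book".split())` returns a single message suggesting `there`, while `feedback("there is a book".split())` returns an empty list.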

74 Citations

20 Claims

1. A method in a computer system for parsing a segment of natural language input text containing one or more words using grammar rules and a dictionary containing a plurality of entries, each dictionary entry corresponding to a word in the natural language and specifying one or more possible parts of speech for the word, the method comprising the steps of:
- (a) creating a chart for containing a parse tree representing the input text segment and parsing results intermediate thereto;
  
  (b) for each word occurring in the input text segment, creating a part-of-speech record in the chart for the word specifying a part of speech specified by the dictionary entry for the word;
  
  (c) identifying a word occurring in the input text segment that is commonly confused with another word;
  
  (d) creating a part-of-speech record in the chart for the identified word specifying a part of speech specified by the dictionary entry for the word commonly confused with the identified word; and
  
  (e) applying the grammar rules to both the part-of-speech records created in step (b) and those created in step (d).
- - 2. The method of claim 1 wherein the method further uses a list of commonly confused words that contains, for each commonly confused word, a word with which the word is commonly confused, and wherein step (c) includes the step of matching one of the words occurring in the input text segment with one of the words in the list.
  - 3. The method of claim 1 wherein step (b) creates in the chart, for each word occurring in the input text segment, part-of-speech records specifying each of the possible parts of speech specified in the dictionary entry for the word;
    - and wherein the method further includes the step of, for each word in the input text segment, linking together the parts of speech records created in the chart for the word; and
      
      wherein the application of one or more of the grammar rules to a part-of-speech record involves determining other possible parts of speech for the word by examining the other part-of-speech records to which the part-of-speech record is linked; and
      
      wherein the method further comprises the step of linking the part-of-speech record created in step (d) to the part-of-speech records created for the identified word in step (b).
  - 4. The method of claim 1 wherein step (d) is performed after performance of step (e) begins.
  - 5. The method of claim 1 wherein step (d) is performed after the application of grammar rules to part-of-speech records created in step (b) concludes.
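
Steps (a) through (e) of claim 1 can be sketched as a small bottom-up chart parser. This is a minimal illustration under stated assumptions, not the claimed implementation: the `Record` class, the toy `DICTIONARY`, the `CONFUSED_WITH` list, and the four `RULES` are all hypothetical, and the record linking of claim 3 is not modeled. The `use_confusables` flag is likewise an assumption, added so that a caller can defer step (d) in the manner of claims 4 and 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    label: str    # part of speech or phrase label
    start: int    # index of the first covered word
    end: int      # index one past the last covered word
    words: tuple  # words read at the leaves (may include a substituted word)


# Hypothetical toy data; none of it comes from the patent.
DICTIONARY = {"their": ["DET"], "there": ["PRON"], "is": ["VERB"],
              "a": ["DET"], "book": ["NOUN", "VERB"]}
CONFUSED_WITH = {"their": "there", "there": "their"}
RULES = [("S", ("NP", "VP")), ("NP", ("DET", "NOUN")),
         ("NP", ("PRON",)), ("VP", ("VERB", "NP"))]


def build_chart(words, use_confusables=True):
    chart = set()                                    # step (a): create the chart
    for i, word in enumerate(words):                 # step (b): dictionary records
        for pos in DICTIONARY.get(word, []):
            chart.add(Record(pos, i, i + 1, (word,)))
    if use_confusables:
        for i, word in enumerate(words):
            other = CONFUSED_WITH.get(word)          # step (c): identify the word
            for pos in DICTIONARY.get(other, []):    # step (d): record at same position
                chart.add(Record(pos, i, i + 1, (other,)))
    added = True                                     # step (e): apply rules to all records
    while added:
        added = False
        for lhs, rhs in RULES:
            for left in list(chart):
                if left.label != rhs[0]:
                    continue
                if len(rhs) == 1:
                    candidates = [Record(lhs, left.start, left.end, left.words)]
                else:
                    candidates = [
                        Record(lhs, left.start, right.end, left.words + right.words)
                        for right in list(chart)
                        if right.label == rhs[1] and right.start == left.end
                    ]
                for record in candidates:
                    if record not in chart:
                        chart.add(record)
                        added = True
    return chart


def complete_parses(words, chart):
    """Return the sentence records that span the whole input."""
    return [r for r in chart
            if r.label == "S" and r.start == 0 and r.end == len(words)]
```

With this toy data, parsing `"their is a book".split()` yields one complete parse whose leaves read `there is a book`. Calling `build_chart(words, use_confusables=False)` first and retrying with the default only when no complete parse is found approximates the deferred ordering of claim 5.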

6. A method in a computer system for loading into a parser a list of rules and lexical records for the parser to apply when parsing a segment of natural language input text containing words, each contained word having associated with it lexical information, the method comprising the steps of:
- (a) adding to the list lexical records, each lexical record specifying lexical information associated with one of the words contained in the input text segment;
  
  (b) identifying a word contained in the input text that is commonly confused with another word;
  
  (c) adding to the list a lexical record specifying lexical information associated with the word with which the identified word is commonly confused; and
  
  (d) adding to the list rules that can be applied to the lexical records added to the list in steps (a) and (c).
- - 7. The method of claim 6, further comprising the step of:
    - (e) applying the lexical records and rules added to the list in steps (a), (c), and (d) in order to parse the input text segment.
  - 8. The method of claim 7, further comprising the steps of:
    - (f) adding to the list rules that can be applied to the lexical records added to the parser list in step (a); and
      
      (g) applying the lexical records and rules added to the list in steps (a) and (f), and wherein steps (b), (d), and (e) are performed after steps (a), (f), and (g).
  - 9. The method of claim 6 wherein each rule and lexical record has associated with it an application priority value, and wherein step (e) applies rules and lexical records in the list in decreasing order of their application priority value, and wherein the application priority value associated with the lexical record for the word with which the identified word is commonly confused is smaller than the application priority value associated with lexical record for the identified word.
  - 10. The method of claim 9 wherein the application priority value associated with the lexical record for the word with which the identified word is commonly confused is set to be equal to the smallest application priority value associated with a lexical record added to the list in step (a).
  - 11. The method of claim 6 wherein the lexical records added to the list for each word in the input text segment are linked together to facilitate the application of rules that utilize more of the word's lexical information than is contained in a single lexical record, the method further comprising the step of linking the lexical record for the word with which the identified word is commonly confused to any lexical records added to the chart for the identified word.
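
The priority ordering of claims 9 and 10 can be sketched with a heap. This is only an illustration under assumptions: the function names, the numeric priorities (`50 - rank` for dictionary senses, `rule_priority=100` for rules), and the tuple layout of list entries are all hypothetical. What the sketch does take from the claims is that entries are applied in decreasing priority order and that a record for the confused-with word receives the smallest priority given to any record added in step (a).

```python
import heapq
from itertools import count


def load_parser_list(words, dictionary, confused_with, rules, rule_priority=100):
    """Build a heap of (-priority, sequence, kind, payload) entries."""
    heap, seq = [], count()
    lexical_priorities = []
    # Step (a): lexical records for the words actually in the input.
    for position, word in enumerate(words):
        for rank, pos in enumerate(dictionary.get(word, [])):
            priority = 50 - rank  # assumption: earlier dictionary senses rank higher
            lexical_priorities.append(priority)
            heapq.heappush(
                heap, (-priority, next(seq), "lexical", (position, word, pos)))
    # Steps (b) and (c): records for confused-with words, at the lowest
    # priority given to any step (a) record (claim 10).
    floor = min(lexical_priorities, default=0)
    for position, word in enumerate(words):
        other = confused_with.get(word)
        if other is None:
            continue
        for pos in dictionary.get(other, []):
            heapq.heappush(
                heap, (-floor, next(seq), "lexical", (position, other, pos)))
    # Step (d): rules applicable to the records above.
    for rule in rules:
        heapq.heappush(heap, (-rule_priority, next(seq), "rule", rule))
    return heap


def application_order(heap):
    """Return (kind, payload) pairs in decreasing order of priority."""
    pending = list(heap)
    heapq.heapify(pending)
    return [heapq.heappop(pending)[2:] for _ in range(len(pending))]
```

Because the sequence number breaks ties, a confused-with record that shares the floor priority with an ordinary record is still applied after it. For the input `their is a book` with a toy dictionary in which `book` has two senses, the record for `there` is the last entry applied.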

12. A method in a computer system for parsing input text made up of a plurality of words using a relation that maps from each of a plurality of potentially confused words to a possibly intended word, a word of the input text being a potentially confused word mapped by the relation to a possibly intended word not occurring in the input text, each word of the input text and each possibly intended word having one or more possible parts of speech, the method comprising:
- identifying the possible parts of speech for each word of the input text including the potentially confused word;
  
  identifying the possible parts of speech for the possibly intended word; and
  
  applying syntactic grammar rules to the identified parts of speech, such that a complete syntax tree containing a possible part of speech for the possibly intended word is produced and no complete syntax tree containing a possible part of speech for the potentially confused word is produced.

13. A computer-readable medium whose contents cause a computer system to parse a segment of natural language input text containing one or more words using grammar rules, a dictionary containing a plurality of entries, and a chart for containing a parse tree representing the input text segment and parsing results intermediate thereto, each dictionary entry corresponding to a word in the natural language and specifying one or more possible parts of speech for the word, by performing the steps of:
- (a) for each word occurring in the input text segment, creating a part-of-speech record in the chart for the word specifying a part of speech specified by the dictionary entry for the word;
  
  (b) identifying a word occurring in the input text segment that is commonly confused with another word;
  
  (c) creating a part-of-speech record in the chart for the identified word specifying a part of speech specified by the dictionary entry for the word commonly confused with the identified word; and
  
  (d) applying the grammar rules to both the part-of-speech records created in step (a) and those created in step (c).
- - 14. The computer-readable medium of claim 13 wherein the contents of the computer-readable medium further cause the computer system to use a list of commonly confused words that contains, for each commonly confused word, a word with which the word is commonly confused, and wherein step (b) includes the step of matching one of the words occurring in the input text segment with one of the words in the list.
  - 15. The computer-readable medium of claim 13 wherein step (a) creates in the chart, for each word occurring in the input text segment, part-of-speech records specifying each of the possible parts of speech specified in the dictionary entry for the word;
    - and wherein the method further includes the step of, for each word in the input text segment, linking together the parts of speech records created in the chart for the word; and
      
      wherein the application of one or more of the grammar rules to a part-of-speech record involves determining other possible parts of speech for the word by examining the other part-of-speech records to which the part-of-speech record is linked; and
      
      wherein the contents of the computer-readable medium further cause the computer system to perform the step of linking the part-of-speech record created in step (c) to the part-of-speech records created for the identified word in step (a).
  - 16. The computer-readable medium of claim 13 wherein the contents of the computer-readable medium further cause the computer system to perform step (c) after performance of step (d) begins.
  - 17. The computer-readable medium of claim 13 wherein the contents of the computer-readable medium further cause the computer system to perform step (c) after the application of grammar rules to part-of-speech records created in step (b) concludes.

18. A computer-readable medium whose contents cause a computer system to parse input text made up of a plurality of words using a relation mapping from each of a plurality of potentially confused words to a possibly intended word, a word of the input text being a potentially confused word mapped to a possibly intended word by the relation, each word of the input text and each possibly intended word having one or more possible parts of speech, by:
- identifying the possible parts of speech for each word of the input text including the potentially confused word;
  
  identifying the possible parts of speech for the possibly intended word; and
  
  applying syntactic grammar rules to the identified parts of speech, such that a complete syntax tree containing a possible part of speech for the possibly intended word is produced and no complete syntax tree containing a possible part of speech for the potentially confused word is produced.

19. An apparatus for parsing a segment of natural language input text containing one or more words using grammar rules and a dictionary containing a plurality of entries, each dictionary entry corresponding to a word in the natural language and specifying one or more possible parts of speech for the word, comprising:
- a data structure for containing a parse tree representing the input text segment and parsing results intermediate thereto;
  
  a primary part-of-speech record generator that creates a part-of-speech record in the data structure for each word occurring in the input text segment, each part-of-speech record specifying a part-of-speech record specified by the dictionary entry for the word;
  
  an identifier that identifies a word occurring in the input text segment that is commonly confused with another word;
  
  a secondary part-of-speech record generator that creates a part-of-speech record in the chart memory for the word identified by the identifier, the created part-of-speech record specifying a part of speech specified by the dictionary entry for the word commonly confused with the identified word; and
  
  a grammar rule application subsystem that applies the grammar rules to both the part-of-speech records created by the primary part-of-speech record generator and those created by the secondary part-of-speech record generator.

20. A computer-readable medium containing instructions for controlling a computer system to load into a parser a list of rules and lexical records for the parser to apply when parsing a segment of natural language input text containing words, each contained word having associated with it lexical information, by:
- (a) adding to the list lexical records, each lexical record specifying lexical information associated with one of the words contained in the input text segment;
  
  (b) identifying a word contained in the input text that is commonly confused with another word;
  
  (c) adding to the list a lexical record specifying lexical information associated with the word with which the identified word is commonly confused; and
  
  (d) adding to the list rules that can be applied to the lexical records added to the list in steps (a) and (c).

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Heidorn, George E.; Richardson, Stephen Darrow
Primary Examiner(s): Isen, Forester W.
Assistant Examiner(s): Edouard, Patrick N.
Application Number: US08/671,203
Time in Patent Office: 1,260 Days
Field of Search: 704/1, 704/9-10, 707/530-533
US Class Current: 704/9
CPC Class Codes: G06F 40/253 Grammatical analysis; Style...
