Natural language parser with dictionary-based part-of-speech probabilities

US 5,878,386 A
Filed: 06/28/1996
Issued: 03/02/1999
Est. Priority Date: 06/28/1996
Status: Expired due to Term

Abstract

A natural language parser determines part-of-speech probabilities by using a dictionary or other lexicon as a source for the part-of-speech probabilities. A machine-readable dictionary is scanned, word-by-word. For each word, the number of senses listed for the word and associated with a part of speech are counted. A part-of-speech probability is then computed for each part of speech based upon the number of senses counted. The part-of-speech probability is indicative of how likely the word is to assume a particular part of speech in a text. The most probable parts of speech are then used by a parser during the first parse of an input string of text to improve the parser's accuracy and efficiency.
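
The counting step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy entry for "fish", the `(part_of_speech, gloss)` tuple layout, and the choice of a simple ratio (senses for a part of speech divided by total senses) are all invented for the example.

```python
from collections import Counter

# Hypothetical toy entry: each sense is a (part_of_speech, gloss) pair.
ENTRY_FISH = [
    ("noun", "an aquatic vertebrate"),
    ("noun", "the flesh of a fish used as food"),
    ("verb", "to try to catch fish"),
    ("verb", "to search by groping"),
    ("verb", "to seek something indirectly"),
]

def count_senses(entry):
    """Count how many senses in the entry are listed under each part of speech."""
    return Counter(pos for pos, _gloss in entry)

def pos_probabilities(entry):
    """Assumed formula: senses for a part of speech divided by total senses."""
    counts = count_senses(entry)
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}

probs = pos_probabilities(ENTRY_FISH)
```

With this toy entry, two of five senses are nouns and three are verbs, so a parser using these figures would try the verb reading first.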

48 Claims

1. In a parser of a natural language processing system, a method comprising the following steps:
- examining individual dictionary entries for corresponding words in a dictionary;
  
  counting, for an individual dictionary entry, a number of senses listed in the dictionary entry which are associated with a part of speech; and
  
  deriving a part-of-speech probability indicative of how likely a dictionary entry is to be a particular part of speech based upon the number of senses associated with the particular part of speech.
- - 2. A method as recited in claim 1, wherein the examining step comprises the step of reading a computer-readable dictionary using a computational device.
  - 3. A method as recited in claim 1, further comprising the following steps:
    - counting a total number of senses for the dictionary entry; and
      
      computing the part-of-speech probability as a function of the number of senses counted for the part of speech and the total number of senses.
  - 4. A method as recited in claim 1, further comprising the following steps:
    - counting a number of senses listed in the dictionary entry which are associated with every part of speech; and
      
      determining which part of speech is most probable based upon the various numbers of senses associated with the different parts of speech.
  - 5. A method as recited in claim 4, further comprising the step of entering the most probable part of speech into the parser.
  - 6. A method as recited in claim 1, further comprising the following steps:
    - determining whether the dictionary entry is an inflected form of a lexeme accounted for by another dictionary entry;
      
      in an event that the dictionary entry is an inflected form, counting a number of senses for each part of speech attributable to the lexeme dictionary entry and a number of senses for each part of speech attributable to the inflected form dictionary entry; and
      
      adding the number of senses attributable to the lexeme dictionary entry and the inflected form dictionary entry to derive the part-of-speech probability for the inflected form dictionary entry.
  - 7. A method as recited in claim 1, further comprising the following steps:
    - counting a number of senses listed in the dictionary entry which are associated with every part of speech; and
      
      deriving part-of-speech probabilities for all of the parts of speech based upon the number of senses associated with the parts of speech.
  - 8. A method as recited in claim 1, wherein the dictionary entry has first and second parts of speech, further comprising the following steps:
    - counting a first number of senses for the first part of speech and a second number of senses for the second part of speech;
      
      modifying at least one of the first and second numbers to increase a difference between the first and second numbers; and
      
      deriving part-of-speech probabilities for the first and second parts of speech based on the modified first and second numbers.
  - 9. A method as recited in claim 1, further comprising the following steps:
    - deriving part-of-speech probabilities for many dictionary entries in the dictionary; and
      
      saving, as part of the dictionary, the part-of-speech probabilities in correlation with the dictionary entries.
  - 10. A computer-readable dictionary stored in a computer-readable memory which incorporates the part-of-speech probabilities created as a result of the method as recited in claim 9.
  - 11. A computer programmed to perform the steps of the method as recited in claim 1.
  - 12. A computer-implemented rule-based parser stored in a storage medium and executable on a process programmed to perform the steps of the method as recited in claim 1.
  - 13. A computer-readable memory which directs a computer to perform the steps of the method as recited in claim 1.
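
As a rough illustration of claims 4 and 9, the sketch below scans a whole dictionary, derives probabilities for every part of speech of each entry, picks the most probable one, and stores the results alongside the entry. The dictionary contents, the field names (`pos_probabilities`, `most_probable`), and the ratio formula are assumptions for the example; the claims do not prescribe a storage layout.

```python
from collections import Counter

# Hypothetical machine-readable dictionary:
# word -> list of part-of-speech tags, one tag per listed sense.
DICTIONARY = {
    "run": ["verb"] * 6 + ["noun"] * 3,
    "light": ["noun", "noun", "adjective", "adjective", "adjective", "verb"],
    "the": ["article"],
}

def annotate(dictionary):
    """Return a new dictionary that stores part-of-speech probabilities with each entry."""
    annotated = {}
    for word, senses in dictionary.items():
        counts = Counter(senses)
        total = len(senses)
        # Assumed formula: per-part-of-speech count over total sense count.
        probs = {pos: n / total for pos, n in counts.items()}
        annotated[word] = {
            "senses": senses,
            "pos_probabilities": probs,
            "most_probable": max(probs, key=probs.get),
        }
    return annotated

ANNOTATED = annotate(DICTIONARY)
```

Computing the figures once and saving them with the entries means the parser only needs a lookup at parse time.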

14. In a natural language processing system for determining which part of speech a word is likely to be in a natural language text, the word being listed in a dictionary with multiple senses attributed thereto, the senses reflecting multiple different parts of speech that the word can assume in different contexts, a method comprising the following steps:
- counting a number of senses listed in the dictionary for each part of speech that the word can assume; and
  
  deriving a part-of-speech probability indicative of how likely the word is to be a particular part of speech based upon the number of senses counted in conjunction with the particular part of speech.
- - 15. A method as recited in claim 14, further comprising the following steps:
    - counting a total number of senses listed in the dictionary; and
      
      computing the part-of-speech probability as a function of the number of senses counted for the particular part of speech and the total number of senses.
  - 16. A method as recited in claim 14, further comprising the following steps:
    - deriving a part-of-speech probability for each part of speech that the word can assume; and
      
      determining which of the parts of speech is most probable from the part-of-speech probabilities.
  - 17. A method as recited in claim 14, further comprising the following steps:
    - deriving part-of-speech probabilities for multiple words in the dictionary; and
      
      saving, as part of the dictionary, the part-of-speech probabilities in correlation with the words.
  - 18. A computer-readable dictionary stored in a computer-readable memory which incorporates the part-of-speech probabilities created as a result of the method as recited in claim 17.
  - 19. A computer programmed to perform the steps of the method as recited in claim 14.
  - 20. A computer-implemented rule-based parser stored in a storage medium and executable on a process programmed to perform the steps of the method as recited in claim 14.
  - 21. A computer-readable memory which directs a computer to perform the steps of the method as recited in claim 14.

22. In a natural language processing system for determining which part of speech a word is likely to be in a natural language text, the word being listed in a dictionary with multiple senses attributed thereto, the senses reflecting multiple different parts of speech that the word can assume in different contexts, a method comprising the following steps:
- counting a number of senses listed in the dictionary for each part of speech that the word can assume; and
  
  using the number of senses counted for each part of speech as an indication of how likely the word is to be a particular part of speech.
- - 23. A method as recited in claim 22, further comprising the step of initializing the parser to parse beginning with the part of speech having a highest number of senses.
  - 24. A computer programmed to perform the steps of the method as recited in claim 22.
  - 25. A computer-implemented rule-based parser stored in a storage medium and executable on a process programmed to perform the steps of the method as recited in claim 22.
  - 26. A computer-readable memory which directs a computer to perform the steps of the method as recited in claim 22.

27. In a natural language processing system, a method comprising the following steps:
- generating, for lexemes listed as dictionary entries in a dictionary, inflected forms of the lexemes;
  
  for each lexeme, counting a number of senses for each part of speech attributable to the lexeme in the dictionary;
  
  for each inflected form, counting a number of senses for each part of speech attributable to the inflected form and adding, for each part of speech, the number of senses attributable to the inflected form and the number of senses attributable to the lexeme from which the inflected form is generated; and
  
  deriving, for each lexeme and inflected form, a part-of-speech probability indicative of how likely the lexeme or inflected form is to be a particular part of speech based upon the senses counted in said counting steps.
- - 28. A method as recited in claim 27, further comprising the step of reading a computer-readable dictionary using a computational device.
  - 29. A method as recited in claim 27, further comprising the following steps:
    - reading a computer-readable dictionary using a computational device, the computer-readable dictionary having dictionary entries that are substantially lexemes; and
      
      expanding the computer-readable dictionary to include dictionary entries for inflected forms of the lexemes; and
      
      using the expanded dictionary as a source for counting the senses in said counting steps.
  - 30. A method as recited in claim 27, further comprising the following steps:
    - repeating the counting steps for every part of speech attributable to a lexeme or an inflected form;
      
      deriving multiple part-of-speech probabilities for every part of speech.
  - 31. A method as recited in claim 30, further comprising the following steps:
    - determining which part of speech is most probable; and
      
      entering the most probable part of speech into the parser.
  - 32. A method as recited in claim 30, further comprising the step of saving the multiple part-of-speech probabilities in correlation with the lexeme or inflected form.
  - 33. A computer-readable dictionary stored in a computer-readable memory having the part-of-speech probabilities created as a result of the method as recited in claim 32.
  - 34. A computer programmed to perform the steps of the method as recited in claim 27.
  - 35. A computer-implemented rule-based parser stored in a storage medium and executable on a process programmed to perform the steps of the method as recited in claim 27.
  - 36. A computer-readable memory which directs a computer to perform the steps of the method as recited in claim 27.
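
The handling of inflected forms in claim 27 might look like the sketch below, which adds a lexeme's sense counts to the counts of an inflected form generated from it, part of speech by part of speech. The sample counts, the `INFLECTED_FROM` table, and the function names are hypothetical; a real system would generate inflected forms with a morphological component rather than a hand-written table.

```python
from collections import Counter

# Hypothetical per-entry sense counts, keyed by part of speech.
ENTRY_COUNTS = {
    "see": Counter({"verb": 8}),
    "saw": Counter({"noun": 2, "verb": 1}),  # "saw" as its own entry
}

# Hypothetical morphology table: inflected form -> lexeme it is generated from.
INFLECTED_FROM = {"saw": "see"}

def combined_counts(word):
    """Add the lexeme's sense counts to the inflected form's own counts."""
    counts = Counter(ENTRY_COUNTS.get(word, Counter()))  # copy, do not mutate the table
    lexeme = INFLECTED_FROM.get(word)
    if lexeme is not None:
        counts += ENTRY_COUNTS[lexeme]
    return counts

def pos_probabilities(word):
    """Assumed formula: combined count for a part of speech over the combined total."""
    counts = combined_counts(word)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: n / total for pos, n in counts.items()}
```

In this toy data, "saw" on its own leans toward noun, but once the verb senses of "see" are added the verb reading dominates.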

37. A method for parsing a natural language text comprising the following steps:
- counting a number of senses listed in a dictionary that are associated with a part of speech;
  
  deriving a part-of-speech probability as a function of the number of senses associated with the part of speech; and
  
  choosing a part of speech for a word in the text based upon the part-of-speech probability.
- - 38. A method as recited in claim 37 further comprising the step of initially choosing, for the word, a part of speech with a highest part-of-speech probability as determined by the part of speech having a highest number of senses listed in the dictionary.
  - 39. A method as recited in claim 37 further comprising the step of sequentially choosing, for the word, parts of speech in decreasing order of part-of-speech probabilities as determined by the number of senses given for each part of speech attributable to the word in the dictionary.
  - 40. A computer programmed to perform the steps of the method as recited in claim 37.
  - 41. A computer-implemented rule-based parser stored in a storage medium and executable on a process programmed to perform the steps of the method as recited in claim 37.
  - 42. A computer-readable memory which directs a computer to perform the steps of the method as recited in claim 37.
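
Claims 38 and 39 describe trying parts of speech in decreasing order of probability. One way to picture that ordering is sketched below. The `accepts` callback is a hypothetical stand-in for whatever check a parser applies to decide that a part of speech yields a well-formed parse; the alphabetical tie-break is likewise an assumption.

```python
def pos_candidates(sense_counts):
    """Return parts of speech ordered from most to fewest senses.

    Ties are broken alphabetically (an assumption made for determinism).
    """
    ordered = sorted(sense_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pos for pos, _n in ordered]

def choose_pos(sense_counts, accepts):
    """Try each part of speech in decreasing order until `accepts` succeeds."""
    for pos in pos_candidates(sense_counts):
        if accepts(pos):
            return pos
    return None

# Hypothetical sense counts for one word.
counts = {"noun": 4, "verb": 7, "adjective": 1}
first_choice = pos_candidates(counts)[0]
fallback = choose_pos(counts, accepts=lambda pos: pos != "verb")
```

Here the initial parse would start with the verb reading, and the noun reading is tried next only if the verb reading is rejected.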

43. A method for parsing a natural language text to determine which part of speech a word assumes within the text comprising the following steps:
- counting a number of senses listed in a dictionary which are associated with a part of speech for the word;
  
  determining the part of speech with a highest number of senses listed in the dictionary; and
  
  choosing, for an initial parse, the part of speech for the word with the highest number of senses.

44. An apparatus for determining which part of speech a word is likely to be in a natural language text, comprising:
- a sense counter to scan words from a machine-readable dictionary and to count, for each word, a number of senses associated with each part of speech attributable to the word; and
  
  a computational unit to compute, for each word, part-of-speech probabilities indicative of how likely the word is to be particular parts of speech based upon the number of senses counted by the sense counter.
- - 45. An apparatus as recited in claim 44, wherein:
    - the sense counter counts a total number of senses for all parts of speech attributable to the word; and
      
      the computational unit computes the part-of-speech probabilities as a function of the number of senses counted for each associated part of speech and the total number of senses.
  - 46. An apparatus as recited in claim 44, wherein:
    - the machine-readable dictionary contains words in lexeme form and inflected forms of the lexeme form;
      
      in an event that the word is an inflected form, the sense counter counts a number of senses associated with each part of speech attributable to the lexeme form of the word and a number of senses associated with each part of speech attributable to the inflected form of the word; and
      
      the computational unit adding the counts from the sense counter for both the lexeme and inflected forms for use in deriving the part-of-speech probabilities for the inflected form of the word.
  - 47. An apparatus as recited in claim 44, wherein:
    - the word has first and second parts of speech;
      
      the sense counter counts a first number of senses for the first part of speech and a second number of senses for the second part of speech; and
      
      the computational unit modifies at least one of the first and second numbers to increase a difference between the first and second numbers and computes the part-of-speech probabilities based on the modified first and second numbers.
  - 48. A computerized rule-based parser processing system comprising the apparatus recited in claim 44.
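
The two components of claim 44, together with the count modification of claim 47, can be pictured with the sketch below. The class names mirror the claim language, but the interfaces are invented for illustration, and raising each count to a power is only one assumed way of increasing the difference between two counts; the claims do not say how the numbers are modified.

```python
from collections import Counter

class SenseCounter:
    """Counts senses per part of speech for each word in a dictionary."""

    def __init__(self, dictionary):
        # word -> list of part-of-speech tags, one tag per listed sense
        self.dictionary = dictionary

    def count(self, word):
        return Counter(self.dictionary[word])

class ComputationalUnit:
    """Turns sense counts into probabilities, optionally widening the gap first."""

    def __init__(self, exponent=1):
        # Assumed knob: an exponent above 1 exaggerates differences between counts.
        self.exponent = exponent

    def probabilities(self, counts):
        weights = {pos: n ** self.exponent for pos, n in counts.items()}
        total = sum(weights.values())
        return {pos: w / total for pos, w in weights.items()}

# Hypothetical one-word dictionary.
counter = SenseCounter({"plant": ["noun", "noun", "noun", "verb", "verb"]})
plain = ComputationalUnit().probabilities(counter.count("plant"))
sharpened = ComputationalUnit(exponent=2).probabilities(counter.count("plant"))
```

With counts of 3 and 2, squaring gives 9 and 4, so the noun share rises from 3/5 to 9/13 and the preference for the leading part of speech becomes more pronounced.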

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Coughlin, Deborah A.
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Edouard, Patrick Nestor
Application Number: US08/671,940
Time in Patent Office: 977 Days
Field of Search: 704/8, 704/1, 704/9, 704/10, 707/530, 707/531, 707/532, 707/533
US Class Current: 704/10
CPC Class Codes:
G06F 40/205   Parsing
G06F 40/211   Syntactic parsing, e.g. bas...
G06F 40/216   using statistical methods
G06F 40/242   Dictionaries
G06F 40/268   Morphological analysis
G06F 40/284   Lexical analysis, e.g. toke...
