SEMI-SUPERVISED PART-OF-SPEECH TAGGING

US 20090157384A1
Filed: 12/12/2007
Published: 06/18/2009
Est. Priority Date: 12/12/2007
Status: Active Grant

Abstract

A word is selected from a received text and features are identified from the word. The features are applied to a model to identify probabilities for sets of part-of-speech tags. The probabilities for the sets of part-of-speech tags are used to weight scores for possible part-of-speech tags for the selected word to form weighted scores. The weighted scores are used to select a part-of-speech tag for the word, and the selected part-of-speech tag is stored or output. The scores for the possible part-of-speech tags are based on variational approximation parameters trained from a sparse prior over probability distributions describing the probability of a part-of-speech tag given a word.

20 Claims

1. A method comprising:
- receiving a text comprising a sequence of words;
  
  selecting a word from the text;
  
  identifying features of the selected word, the features comprising a suffix of the selected word;
  
  applying the features of the selected word to a model to identify probabilities for sets of part-of-speech tags, at least one set of part-of-speech tags comprising at least two part-of-speech tags, each part-of-speech tag representing a part-of-speech;
  
  using the probabilities for sets of part-of-speech tags to weight scores for possible part-of-speech tags for the selected word to form weighted scores;
  
  using the weighted scores to select a part-of-speech tag for the selected word; and
  
  storing the selected part-of-speech tag for the selected word.
  - 2. The method of claim 1 wherein the features of the selected word further comprise whether the selected word is capitalized in the text, whether the selected word contains a hyphen and whether the selected word contains a digit character.
  - 3. The method of claim 2 wherein using the probabilities for sets of part-of-speech tags to weight scores for possible part-of-speech tags comprises for each set of part-of-speech tags:
    - determining a separate value for each part-of-speech tag in the set of part-of-speech tags;
      
      selecting the part-of-speech tag with the largest value;
      
      computing a score using the selected part-of-speech tag; and
      
      weighting the score by the probability of the set of part-of-speech tags.
  - 4. The method of claim 3 wherein using the weighted scores to select a part-of-speech tag comprises selecting the set of part-of-speech tags that produces the largest weighted score and selecting the part-of-speech tag in the selected set of part-of-speech tags that is associated with the largest value in the set of part-of-speech tags.
  - 5. The method of claim 3 wherein determining a value for a part-of-speech tag comprises determining the value based on a probability distribution defined by a variational parameter trained from a sparse prior distribution of probability distributions that describe a probability of a part-of-speech tag given a word.
  - 6. The method of claim 1 wherein the model is trained based on entries in a dictionary, each entry identifying features of a word and a set of part-of-speech tags for the word, the dictionary lacking an entry for the selected word in the text.
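
The weighting steps recited in claims 3 and 4 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `select_tag`, the dictionary shapes, and the choice to use a tag's value directly as its score are all assumptions made for illustration, and the claims leave the actual score computation open.

```python
from typing import Dict, FrozenSet, Optional, Tuple


def select_tag(
    tag_set_probs: Dict[FrozenSet[str], float],
    tag_values: Dict[str, float],
) -> Tuple[Optional[str], float]:
    """Pick a tag by weighting the best tag of each candidate tag set.

    tag_set_probs: hypothetical P(tag set | word features) from the model.
    tag_values: hypothetical per-tag values for the selected word.
    """
    best_tag, best_weighted = None, float("-inf")
    for tag_set, set_prob in tag_set_probs.items():
        # Determine a value for each tag in the set and keep the largest.
        # Sorting first makes tie-breaking deterministic.
        top_tag = max(sorted(tag_set), key=lambda t: tag_values.get(t, 0.0))
        # Assumption: the score is simply the value of the top tag.
        score = tag_values.get(top_tag, 0.0)
        # Weight the score by the probability of the tag set.
        weighted = set_prob * score
        if weighted > best_weighted:
            best_tag, best_weighted = top_tag, weighted
    return best_tag, best_weighted
```

With two candidate sets, `{NN, VB}` at probability 0.6 and `{JJ}` at probability 0.4, and values NN = 0.5, VB = 0.3, JJ = 0.9, the weighted scores are 0.30 and 0.36, so the sketch returns `JJ`.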

7. The method of claim 6 wherein the model is trained by forming partial counts of part-of-speech tags based on a probability of a part-of-speech tag given a set of features.
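
One way to read "partial counts" in claim 7 is as fractional counts in an expectation-maximization style loop over dictionary entries whose correct tag is ambiguous. The Python sketch below follows that reading only; the patent text shown here does not spell out the procedure. The function name `train_partial_counts`, the use of a single feature key per entry (for example a suffix), and the uniform starting point are assumptions. Each entry is assumed to have a non-empty tag set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def train_partial_counts(
    entries: Iterable[Tuple[str, Set[str]]],
    iterations: int = 5,
) -> Dict[Tuple[str, str], float]:
    """Estimate P(tag | feature key) from (feature key, tag set) entries."""
    entries = list(entries)
    all_tags: List[str] = sorted({t for _, tags in entries for t in tags})
    keys = {key for key, _ in entries}
    # Assumption: start from a uniform distribution over all tags.
    prob = {(k, t): 1.0 / len(all_tags) for k in keys for t in all_tags}
    for _ in range(iterations):
        counts: Dict[Tuple[str, str], float] = defaultdict(float)
        totals: Dict[str, float] = defaultdict(float)
        for key, tag_set in entries:
            # Split one count among the entry's tags in proportion to
            # the current probability of each tag given the feature key.
            mass = sum(prob[(key, t)] for t in tag_set)
            for t in tag_set:
                share = prob[(key, t)] / mass if mass > 0 else 1.0 / len(tag_set)
                counts[(key, t)] += share
                totals[key] += share
        # Renormalize the partial counts into probabilities per key.
        prob = {(k, t): counts[(k, t)] / totals[k] for k in keys for t in all_tags}
    return prob
```

For three entries sharing the key `"ing"` with tag sets `{VBG}`, `{VBG, NN}` and `{NN, VBG}`, one iteration gives VBG a partial count of 2 and NN a partial count of 1, so P(VBG | "ing") is 2/3.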

8. A computer-readable storage medium having encoded thereon computer-executable instructions causing a processor to execute steps comprising:
- receiving a text comprising a sequence of words;
  
  training variational approximation parameters based on a sparse prior distribution of probability distributions that describe the probability of a part-of-speech tag given a word and based on the sequence of words, the variational approximation parameters comprising a separate variational approximation parameter for each occurrence of each word in the sequence of words, each separate variational approximation parameter describing a distribution for a tag given a word;
  
  selecting a part-of-speech tag for a word that maximizes a value computed from the distributions formed by the variational approximation parameters; and
  
  outputting the selected part-of-speech tag as the part-of-speech tag for the word.
  - 9. The computer-readable storage medium of claim 8 wherein training the variational approximation parameters further comprises training the variational approximation parameters based on a prior distribution of probability distributions that describe the probability of a context word given a part-of-speech tag.
  - 10. The computer-readable storage medium of claim 8 wherein the sparse prior distribution is based on a set of part-of-speech tags associated with a word.
  - 11. The computer-readable storage medium of claim 8 wherein selecting a part-of-speech tag that maximizes a value computed from the distributions formed by the variational approximation parameters comprises identifying a separate part-of-speech tag for a word for each set of a collection of sets of part-of-speech tags by identifying the part-of-speech tag that maximizes a value computed from the distributions formed by the variational approximation parameters, computing a score based on the identified part-of-speech tag, weighting the score based on a probability of a set of part-of-speech tags given features of the word to form a weighted score and selecting the part-of-speech tag identified for the set of part-of-speech tags with the largest weighted score.
  - 12. The computer-readable storage medium of claim 11 wherein identifying the part-of-speech tag that maximizes a value computed from the distributions formed by the variational approximation parameters comprises identifying the part-of-speech tag with the highest probability given a probability distribution defined by a variational approximation parameter.
  - 13. The computer-readable storage medium of claim 11 further comprising selecting a part-of-speech tag for a second word that maximizes a value computed from the distributions formed by the variational approximation parameters by identifying a set of part-of-speech tags for the word from a dictionary and identifying the part-of-speech tag in the set of part-of-speech tags that maximizes a value computed from the distributions formed by the variational approximation parameters.
  - 14. The computer-readable storage medium of claim 11 wherein the probability of a set of part-of-speech tags given features of the word is determined from a probability distribution trained on a dictionary using partial counts of part-of-speech tags.
  - 15. The computer-readable storage medium of claim 11 wherein the features of the word comprise whether the word is capitalized, whether the word contains a hyphen, whether the word contains a digit, and the suffix of the word.
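
The four word features named in claim 15 (and in claims 1 and 2) are simple to compute. The following Python sketch shows one possible extraction; the function name `extract_features`, the dictionary keys, the lowercasing of the suffix, and the default suffix length of three characters are assumptions, since the claims do not fix a suffix length.

```python
from typing import Dict, Union


def extract_features(word: str, suffix_len: int = 3) -> Dict[str, Union[bool, str]]:
    """Return the surface features of a word used to predict its tag set."""
    return {
        # True when the first character is an uppercase letter.
        "capitalized": word[:1].isupper(),
        "has_hyphen": "-" in word,
        "has_digit": any(ch.isdigit() for ch in word),
        # Assumption: the suffix is the last `suffix_len` characters,
        # lowercased; shorter words yield the whole word.
        "suffix": word[-suffix_len:].lower(),
    }
```

For example, `extract_features("Re-running")` marks the word as capitalized and hyphenated, with no digit and the suffix `"ing"`.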

16. A method comprising:
- receiving a text;
  
  selecting a first word in the text;
  
  retrieving an entry for the first word from a dictionary stored on a computer-readable storage medium, the entry indicating a set of part-of-speech tags associated with the first word;
  
  using the set of part-of-speech tags from the entry to identify a part-of-speech tag for the first word;
  
  storing the part-of-speech tag for the first word on a computer-readable storage medium;
  
  selecting a second word in the text;
  
  determining that the dictionary does not have an entry for the second word;
  
  selecting a part-of-speech tag for the second word based in part on probabilities of sets of part-of-speech tags given features of the second word; and
  
  storing the part-of-speech tag for the second word on a computer-readable storage medium.
  - 17. The method of claim 16 wherein using the set of part-of-speech tags from the entry to identify a part-of-speech tag for the first word comprises selecting a part-of-speech tag from the set of part-of-speech tags and computing a value for the selected part-of-speech tag using a variational approximation parameter that describes a probability of the part-of-speech tag given the occurrence of the first word.
  - 18. The method of claim 17 wherein the variational approximation parameter is trained based in part on a sparse prior distribution of probability distributions that provide a probability of a part-of-speech tag given a word.
  - 19. The method of claim 16 wherein selecting a part-of-speech tag for the second word based in part on probabilities of sets of part-of-speech tags given features of the second word comprises determining a score for each part-of-speech tag in a set of part-of-speech tags, determining which score is a maximum score, using the part-of-speech tag associated with the maximum score to form a second score, weighting the second score by the probability of the set of part-of-speech tags given the features of the second word to form a set score for the set of part-of-speech tags, selecting a set of part-of-speech tags based on the set score, and selecting the part-of-speech tag associated with the maximum score of the selected set of part-of-speech tags.
  - 20. The method of claim 19 wherein forming a second score comprises integrating over probability distributions that describe the probability of the part-of-speech tag given the word.
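
The control flow of claim 16, a dictionary lookup for known words with a feature-based fallback for unknown words, can be sketched in Python as follows. The names `tag_words`, `tag_value` and `tag_unknown` are hypothetical, and both scoring callbacks are placeholders supplied by the caller rather than the patent's trained models.

```python
from typing import Callable, Dict, List, Set, Tuple


def tag_words(
    words: List[str],
    dictionary: Dict[str, Set[str]],
    tag_value: Callable[[str, str], float],
    tag_unknown: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Tag each word, using the dictionary entry when one exists."""
    tagged = []
    for word in words:
        tag_set = dictionary.get(word)
        if tag_set:
            # Known word: choose among the tags listed in its entry.
            # Sorting first makes tie-breaking deterministic.
            tag = max(sorted(tag_set), key=lambda t: tag_value(word, t))
        else:
            # Unknown word: fall back to a tagger based on the
            # probabilities of tag sets given the word's features.
            tag = tag_unknown(word)
        tagged.append((word, tag))
    return tagged
```

Keeping the two paths behind separate callbacks mirrors the claim's split between the first word, which has a dictionary entry, and the second word, which does not.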

Current Assignee
Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee
Microsoft Corporation
Inventors
Toutanova, Kristina Nikolova; Johnson, Mark Edward

Granted Patent

US 8,275,607 B2
US Class Current

704/9
CPC Class Codes

G06F 40/268 Morphological analysis
