Semi-supervised part-of-speech tagging

  • US 8,275,607 B2
  • Filed: 12/12/2007
  • Issued: 09/25/2012
  • Est. Priority Date: 12/12/2007
  • Status: Active Grant
First Claim

1. A method comprising:

    receiving a text comprising a sequence of words;

    selecting a word from the text;

    identifying features of the selected word, the features comprising a suffix of the selected word;

    applying the features of the selected word to a model to identify probabilities for sets of part-of-speech tags, at least one set of part-of-speech tags comprising at least two part-of-speech tags, each part-of-speech tag representing a part-of-speech;

    with a processor, using the probabilities for sets of part-of-speech tags to weight scores for possible part-of-speech tags for the selected word to form weighted scores by performing steps for each set of part-of-speech tags, the steps comprising:

        selecting a variational approximation parameter that is dependent on the selected word, an occurrence number for the word and the set of part-of-speech tags, wherein the variational parameter is trained from a sparse prior distribution of probability distributions that describe a probability of a part-of-speech tag given a word;

        determining a separate value for each part-of-speech tag in the set of part-of-speech tags by using the selected variational approximation parameter;

        selecting from the set of part-of-speech tags the part-of-speech tag with the largest value;

        computing a score using the selected part-of-speech tag; and

        weighting the score by the probability of the set of part-of-speech tags;

    using the weighted scores to select a part-of-speech tag for the selected word; and

    storing the selected part-of-speech tag for the selected word.
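
The flow of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several pieces are assumptions made only for illustration: the model is a hypothetical lookup table from suffixes to weights over tag sets, each variational approximation parameter is a hypothetical dictionary of per-tag values keyed by (word, occurrence number, tag set), and the score computed from the selected tag is simply that tag's value. How the parameters are trained from a sparse prior is not shown.

```python
from collections import defaultdict


def suffix_features(word, max_len=3):
    """Return the suffixes of `word` up to `max_len` characters (assumed feature set)."""
    return [word[-n:] for n in range(1, max_len + 1) if len(word) >= n]


def tag_set_probabilities(features, model):
    """Turn features into probabilities over sets of tags.

    `model` is a hypothetical table: suffix -> {frozenset(tags): weight}.
    Weights are summed over all matching features and normalised.
    """
    totals = defaultdict(float)
    for feature in features:
        for tag_set, weight in model.get(feature, {}).items():
            totals[tag_set] += weight
    z = sum(totals.values())
    return {s: w / z for s, w in totals.items()} if z else {}


def select_tag(word, occurrence, model, variational_params):
    """Pick one tag for `word` from weighted per-set scores."""
    set_probs = tag_set_probabilities(suffix_features(word), model)
    weighted = defaultdict(float)
    for tag_set, p_set in set_probs.items():
        # Parameter depends on the word, its occurrence number, and the tag set.
        param = variational_params.get((word, occurrence, tag_set), {})
        # A separate value for each tag in the set (0.0 if unknown).
        values = {tag: param.get(tag, 0.0) for tag in tag_set}
        # Tag with the largest value; sorting makes ties deterministic.
        best = max(sorted(values), key=values.get)
        # Assumption: the score is the selected tag's value itself.
        score = values[best]
        # Weight the score by the probability of the tag set.
        weighted[best] += p_set * score
    if not weighted:
        return None
    return max(sorted(weighted), key=weighted.get)


def tag_text(words, model, variational_params):
    """Tag every word in sequence and store (word, tag) pairs."""
    seen = defaultdict(int)  # occurrence number per word, starting at 0
    stored = []
    for word in words:
        stored.append((word, select_tag(word, seen[word], model, variational_params)))
        seen[word] += 1
    return stored
```

Calling `tag_text` with a list of words, a suffix table, and a parameter table returns the stored (word, tag) pairs in order. A word with no matching suffix in the table receives `None` in this sketch.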
