Method and apparatus for providing improved HMM POS tagger for multi-word entries and factoids

  • US 20050234717A1
  • Filed: 06/14/2005
  • Published: 10/20/2005
  • Est. Priority Date: 07/17/2001
  • Status: Active Grant
First Claim

1. A method of calculating trigram path probabilities for an input string of text, the method comprising:

  • tokenizing the input string of text to create a plurality of parse leaf units (PLUs);

    constructing a PosColumn for each word, multi-word-entry (MWE), factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair associated therewith;

    constructing all TrigramColumns corresponding to the input string of text, wherein each TrigramColumn defines a corresponding TrigramNode representing a trigram for three PosColumns in the TrigramColumn, each TrigramNode being identifiable by a unique set of three tokens;

    determining, for each TrigramColumn, all neighboring TrigramColumns to the immediate left and to the immediate right;

    calculating a forward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all forward paths from a TrigramNode in a right neighboring TrigramColumn through the separate TrigramNode;

    calculating a backward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all backward paths from a TrigramNode in a left neighboring TrigramColumn through the separate TrigramNode; and

    calculating sums of all trigram path probabilities through each PLU as a function of the calculated forward and backward trigram path probabilities.
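
The forward and backward sums recited above can be illustrated with a small sketch. This is not the claimed method: for brevity it uses single-span nodes in place of the three-PosColumn TrigramNodes, and a plain forward-backward pass over the resulting lattice. The `Node` type, the `path_sums` function, and all scores are hypothetical and chosen only for illustration. Each node covers a first (ft) and last (lt) token, so a multi-word entry such as "New York" competes with the separate words "New" and "York".

```python
from collections import namedtuple

# Hypothetical node: a span of tokens [ft, lt] with an illustrative local score.
Node = namedtuple("Node", ["name", "ft", "lt", "score"])


def path_sums(nodes, num_tokens):
    """Return {name: sum of the scores of all complete paths through that node}."""
    # Forward pass: alpha[n] sums every partial path that ends at n (n included).
    alpha = {}
    for n in sorted(nodes, key=lambda x: x.ft):
        if n.ft == 0:
            inbound = 1.0  # n starts the input string
        else:
            # Left neighbors end exactly one token before n begins.
            inbound = sum(alpha[p.name] for p in nodes if p.lt + 1 == n.ft)
        alpha[n.name] = n.score * inbound

    # Backward pass: beta[n] sums every partial path that starts after n.
    beta = {}
    for n in sorted(nodes, key=lambda x: x.lt, reverse=True):
        if n.lt == num_tokens - 1:
            beta[n.name] = 1.0  # n ends the input string
        else:
            # Right neighbors begin exactly one token after n ends.
            beta[n.name] = sum(
                s.score * beta[s.name] for s in nodes if s.ft == n.lt + 1
            )

    # Sum over all complete paths through n = forward part * backward part.
    return {n.name: alpha[n.name] * beta[n.name] for n in nodes}


# Input "New York is" tokenized into three tokens (indices 0, 1, 2).
nodes = [
    Node("New", 0, 0, 0.5),
    Node("York", 1, 1, 0.5),
    Node("New York", 0, 1, 0.4),  # multi-word entry spanning tokens 0..1
    Node("is", 2, 2, 1.0),
]
sums = path_sums(nodes, num_tokens=3)
```

In this toy lattice there are two complete paths, "New" + "York" + "is" and "New York" + "is", so the value stored for "is" is the total over both paths, while each of the other nodes receives only the path that passes through it. A trigram version would apply the same two passes to nodes identified by three adjacent spans instead of one.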
