Method and apparatus for providing improved HMM POS tagger for multi-word entries and factoids

  • US 6,985,851 B2
  • Filed: 07/17/2001
  • Issued: 01/10/2006
  • Est. Priority Date: 07/17/2001
  • Status: Active Grant
First Claim

1. A method of calculating trigram path probabilities for an input string of text, the method comprising:

  • tokenizing the input string of text to create a plurality of parse leaf units (PLUs), wherein tokenizing the input string of text further comprises:

    assigning a token number, consecutively from left to right, to each word and character in the input string of text;

    identifying multi-word-entries (MWEs) and factoids in the input string of text; and

    assigning parts of speech to each token, MWE and factoid;

  • constructing a PosColumn for each word, MWE, factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair associated therewith, wherein constructing the PosColumn for each word, MWE, factoid and character in the input string of text further comprises:

    adding dummy tokens for positions immediately prior to the first word, MWE, factoid or character and for positions immediately after the last word, MWE, factoid or character of the input string of text; and

    assigning a Begin part of speech to dummy tokens for positions immediately prior to the first word, MWE, factoid or character of the input string of text, and assigning an End part of speech for positions immediately after the last word, MWE, factoid or character of the text;

  • constructing all TrigramColumns corresponding to the input string of text, wherein each TrigramColumn defines a corresponding TrigramNode representing a trigram for three PosColumns in the TrigramColumn, each TrigramNode being identifiable by a unique set of three tokens;

  • determining, for each TrigramColumn, all neighboring TrigramColumns to the immediate left and to the immediate right;

  • calculating a forward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all forward paths from a TrigramNode in a right neighboring TrigramColumn through the separate TrigramNode;

  • calculating a backward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all backward paths from a TrigramNode in a left neighboring TrigramColumn through the separate TrigramNode; and

  • calculating sums of all trigram path probabilities through each PLU as a function of the calculated forward and backward trigram path probabilities.
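
The steps recited in the claim can be illustrated with a small forward-backward pass over a lattice of token spans. This is a minimal sketch under stated assumptions, not the patented implementation: the function name plu_posteriors, the dictionary layout of the columns, the use of two Begin and two End dummy positions, the caller-supplied trans function, and the normalization by the total path mass are all hypothetical choices made for illustration. It also uses the conventional left-to-right orientation for the forward pass.

```python
from collections import defaultdict
from itertools import product


def plu_posteriors(columns, n_tokens, trans):
    """columns: {(ft, lt): {tag: emission_prob}} over tokens 1..n_tokens.
    trans(t1, t2, t3) returns P(t3 | t1, t2) (hypothetical, caller-supplied).
    Returns {((ft, lt), tag): share of total path probability}."""
    cols = dict(columns)
    # Assumption: two Begin dummy tokens before the text, two End dummies after.
    for pos in (-1, 0):
        cols[(pos, pos)] = {"Begin": 1.0}
    for pos in (n_tokens + 1, n_tokens + 2):
        cols[(pos, pos)] = {"End": 1.0}
    first, last = (-1, -1), (n_tokens + 2, n_tokens + 2)

    # Index spans by first token (Ft) so adjacent spans are easy to find.
    by_ft = defaultdict(list)
    for span in cols:
        by_ft[span[0]].append(span)

    # A trigram column is three spans where each starts right after the
    # previous one ends (Lt + 1 == next Ft). Sort left to right.
    tri_cols = [(a, b, c) for a in cols
                for b in by_ft[a[1] + 1] for c in by_ft[b[1] + 1]]
    tri_cols.sort(key=lambda t: t[2][0])

    def nodes(tri):
        # One trigram node per combination of tags on the three spans.
        a, b, c = tri
        return [(tri, tags) for tags in product(cols[a], cols[b], cols[c])]

    def weight(node):
        (_, _, c), (ta, tb, tg) = node
        return trans(ta, tb, tg) * cols[c][tg]

    # Forward pass: probability mass of all partial paths ending in each node.
    fwd, ends_with = {}, defaultdict(list)
    for tri in tri_cols:
        for node in nodes(tri):
            (a, b, c), (ta, tb, tg) = node
            inflow = 1.0 if a == first else sum(
                fwd[p] for p in ends_with[(a, b, ta, tb)])
            fwd[node] = weight(node) * inflow
            ends_with[(b, c, tb, tg)].append(node)

    # Backward pass: mass of all completions from each node to the final End.
    bwd, starts_with = {}, defaultdict(list)
    for tri in reversed(tri_cols):
        for node in nodes(tri):
            (a, b, c), (ta, tb, tg) = node
            bwd[node] = 1.0 if c == last else sum(
                weight(s) * bwd[s] for s in starts_with[(b, c, tb, tg)])
            starts_with[(a, b, ta, tb)].append(node)

    total = sum(fwd[n] for n in fwd if n[0][2] == last)
    # Each complete path has exactly one trigram whose middle span is a given
    # real span, so summing fwd * bwd by middle span counts each path once.
    post = defaultdict(float)
    for node in fwd:
        (_, b, _), (_, tb, _) = node
        if b in columns:
            post[(b, tb)] += fwd[node] * bwd[node] / total
    return dict(post)


# Hypothetical input: tokens 1 and 2, plus a multi-word entry spanning both.
example = {(1, 1): {"Adj": 0.6, "Noun": 0.4},
           (2, 2): {"Noun": 1.0},
           (1, 2): {"Noun": 1.0}}
post = plu_posteriors(example, 2, lambda t1, t2, t3: 1.0)
```

With the uniform transition function used here, the multi-word span (1, 2) and the two-token segmentation each receive half of the total path mass. A real tagger would replace the uniform trans with trigram probabilities estimated from a tagged corpus.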
