Method and apparatus for providing improved HMM POS tagger for multi-word entries and factoids
First Claim
1. A method of calculating trigram path probabilities for an input string of text, the method comprising:
tokenizing the input string of text to create a plurality of parse leaf units (PLUs);
constructing a PosColumn for each word, multi-word-entry (MWE), factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair associated therewith;
constructing all TrigramColumns corresponding to the input string of text, wherein each TrigramColumn defines a corresponding TrigramNode representing a trigram for three PosColumns in the TrigramColumn, each TrigramNode being identifiable by a unique set of three tokens;
determining, for each TrigramColumn, all neighboring TrigramColumns to the immediate left and to the immediate right;
calculating a forward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all forward paths from a TrigramNode in a right neighboring TrigramColumn through the separate TrigramNode;
calculating a backward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all backward paths from a TrigramNode in a left neighboring TrigramColumn through the separate TrigramNode; and
calculating sums of all trigram path probabilities through each PLU as a function of the calculated forward and backward trigram path probabilities.
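The PosColumn construction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: whitespace splitting stands in for the tokenizer that creates PLUs, MWE and factoid spans are assumed to come from some external recognizer, per-character columns (relevant to scripts such as Chinese) are omitted, and the names `PosColumn`, `tokenize` and `build_pos_columns` are hypothetical. Keying each column by its (Ft, Lt) pair is what keeps the columns unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PosColumn:
    ft: int    # index of the first token (Ft) covered by the unit
    lt: int    # index of the last token (Lt) covered by the unit, inclusive
    text: str  # surface text of the word, MWE or factoid


def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in for PLU creation: plain whitespace splitting.
    return text.split()


def build_pos_columns(tokens: list[str], multi_token_spans: list[tuple[int, int]]) -> list[PosColumn]:
    """Return one PosColumn per distinct (Ft, Lt) pair, ordered by position."""
    columns: dict[tuple[int, int], PosColumn] = {}
    for i, tok in enumerate(tokens):
        columns[(i, i)] = PosColumn(i, i, tok)  # every single token is a candidate unit
    for ft, lt in multi_token_spans:  # spans reported by an assumed MWE/factoid recognizer
        if not 0 <= ft <= lt < len(tokens):
            raise ValueError(f"span ({ft}, {lt}) lies outside the token range")
        # setdefault keeps the (Ft, Lt) pair unique if a span is reported twice
        columns.setdefault((ft, lt), PosColumn(ft, lt, " ".join(tokens[ft:lt + 1])))
    return sorted(columns.values(), key=lambda c: (c.ft, c.lt))


cols = build_pos_columns(tokenize("New York is big"), [(0, 1)])
print([(c.ft, c.lt, c.text) for c in cols])
# [(0, 0, 'New'), (0, 1, 'New York'), (1, 1, 'York'), (2, 2, 'is'), (3, 3, 'big')]
```

Both the split reading ("New", "York") and the MWE reading ("New York") survive as separate columns, which is what later lets the path sums weigh one against the other.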
1 Assignment
0 Petitions
Abstract
A method of calculating trigram path probabilities for an input string of text containing a multi-word-entry (MWE) or a factoid includes tokenizing the input string to create a plurality of parse leaf units (PLUs). A PosColumn is constructed for each word, MWE, factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair. TrigramColumns are constructed which define corresponding TrigramNodes each representing a trigram for three PosColumns. Forward and backward trigram path probabilities are calculated for each separate TrigramNode. The sums of all trigram path probabilities through each PLU are then calculated as a function of the forward and backward trigram path probabilities. Systems and computer-readable media configured to implement the methods are also provided.
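The forward and backward calculation summarized above resembles the standard forward-backward algorithm run over a lattice whose nodes are trigrams of adjacent PosColumns. The sketch below is an illustration under assumptions, not the patented method: a path is taken to be any sequence of PosColumns covering the tokens exactly once, its weight is the product of a caller-supplied trigram score (for example, a trigram HMM's transition times emission probability, with tags folded into the score for brevity), and two begin columns and one end column are padded on. `trigram_path_sums` and its parameters are hypothetical names.

```python
from collections import defaultdict
from functools import lru_cache
from typing import Callable

Span = tuple[int, int]  # (Ft, Lt) token indices of a PosColumn, inclusive


def trigram_path_sums(n_tokens: int, spans: list[Span],
                      score: Callable[[Span, Span, Span], float]) -> tuple[float, dict[Span, float]]:
    """Forward-backward over trigrams of adjacent spans.

    A path is a sequence of spans covering tokens 0..n_tokens-1 exactly once, padded
    with two begin columns and one end column; its weight is the product of
    score(a, b, c) over all consecutive triples. Returns the total weight of all paths
    and, for each span, the total weight of the paths that use it.
    """
    bos2, bos1, eos = (-2, -2), (-1, -1), (n_tokens, n_tokens)
    by_start: dict[int, list[Span]] = defaultdict(list)
    by_end: dict[int, list[Span]] = defaultdict(list)
    for s in set(spans) | {bos2, bos1, eos}:
        by_start[s[0]].append(s)
        by_end[s[1]].append(s)

    def left(s: Span) -> list[Span]:   # spans ending immediately before s
        return by_end.get(s[0] - 1, [])

    def right(s: Span) -> list[Span]:  # spans starting immediately after s
        return by_start.get(s[1] + 1, [])

    @lru_cache(maxsize=None)
    def fwd(a: Span, b: Span, c: Span) -> float:
        # Weight of all path prefixes ending in trigram (a, b, c), its own score included.
        if a == bos2:
            return score(a, b, c)
        return score(a, b, c) * sum(fwd(z, a, b) for z in left(a))

    @lru_cache(maxsize=None)
    def bwd(a: Span, b: Span, c: Span) -> float:
        # Weight of all path suffixes after trigram (a, b, c), its own score excluded.
        if c == eos:
            return 1.0
        return sum(score(b, c, d) * bwd(b, c, d) for d in right(c))

    total = sum(fwd(a, b, eos) for b in left(eos) for a in left(b))
    # Each path through span c contains exactly one trigram ending in c.
    per_span = {c: sum(fwd(a, b, c) * bwd(a, b, c) for b in left(c) for a in left(b))
                for c in set(spans)}
    return total, per_span


spans = [(0, 0), (1, 1), (2, 2), (3, 3), (0, 1)]  # "New York is big" with MWE "New York"
total, per_span = trigram_path_sums(4, spans, lambda a, b, c: 0.5 if c == (0, 1) else 1.0)
print(total, per_span[(0, 1)] / total)
```

With this score the split reading weighs 1.0 and the MWE reading 0.5, so the total is 1.5 and the "New York" column carries 0.5 of it. Dividing each per-span sum by the total gives a posterior for keeping the MWE intact, and memoizing on trigrams keeps the cost proportional to the number of trigram nodes rather than the number of paths.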
9 Citations
20 Claims
1. A method of calculating trigram path probabilities for an input string of text, the method comprising:
tokenizing the input string of text to create a plurality of parse leaf units (PLUs);
constructing a PosColumn for each word, multi-word-entry (MWE), factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair associated therewith;
constructing all TrigramColumns corresponding to the input string of text, wherein each TrigramColumn defines a corresponding TrigramNode representing a trigram for three PosColumns in the TrigramColumn, each TrigramNode being identifiable by a unique set of three tokens;
determining, for each TrigramColumn, all neighboring TrigramColumns to the immediate left and to the immediate right;
calculating a forward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all forward paths from a TrigramNode in a right neighboring TrigramColumn through the separate TrigramNode;
calculating a backward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all backward paths from a TrigramNode in a left neighboring TrigramColumn through the separate TrigramNode; and
calculating sums of all trigram path probabilities through each PLU as a function of the calculated forward and backward trigram path probabilities.
Dependent claims: 2, 3, 4, 5, 6, 7.
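Constructing the TrigramColumns and finding their immediate neighbors reduces to adjacency checks on token spans: two PosColumns are adjacent when the token after the first one's Lt is the second one's Ft, a left neighbor of (a, b, c) is any (z, a, b), and a right neighbor is any (b, c, d). The sketch below is illustrative only: it represents each PosColumn by its (Ft, Lt) span alone, omits sentence-boundary padding, and uses the hypothetical function names `build_trigram_columns` and `neighbors`.

```python
from collections import defaultdict

Span = tuple[int, int]               # (Ft, Lt) token span of a PosColumn
Trigram = tuple[Span, Span, Span]    # three adjacent PosColumns


def build_trigram_columns(spans: list[Span]) -> list[Trigram]:
    """Every triple (a, b, c) in which b starts right after a ends and c right after b."""
    by_start: dict[int, list[Span]] = defaultdict(list)
    for s in sorted(set(spans)):
        by_start[s[0]].append(s)
    trigrams = []
    for a in sorted(set(spans)):
        for b in by_start.get(a[1] + 1, []):
            for c in by_start.get(b[1] + 1, []):
                trigrams.append((a, b, c))
    return trigrams


def neighbors(trigrams: list[Trigram]) -> dict[Trigram, tuple[list[Trigram], list[Trigram]]]:
    """Map each trigram to (left neighbors (z, a, b), right neighbors (b, c, d))."""
    by_prefix: dict[tuple[Span, Span], list[Trigram]] = defaultdict(list)
    by_suffix: dict[tuple[Span, Span], list[Trigram]] = defaultdict(list)
    for t in trigrams:
        by_prefix[t[:2]].append(t)   # indexed by its first two columns
        by_suffix[t[1:]].append(t)   # indexed by its last two columns
    return {t: (list(by_suffix.get(t[:2], [])), list(by_prefix.get(t[1:], [])))
            for t in trigrams}


spans = [(0, 0), (1, 1), (2, 2), (3, 3), (0, 1)]  # "New York is big" with MWE "New York"
tris = build_trigram_columns(spans)
for t, (lefts, rights) in neighbors(tris).items():
    print(t, "left:", lefts, "right:", rights)
```

Indexing trigrams by their first two and last two columns makes each neighbor lookup a single dictionary access instead of a scan over all trigrams.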
8. A computer-readable medium having computer executable instructions for performing the trigram path probability calculating steps comprising:
tokenizing an input string of text to create a plurality of parse leaf units (PLUs);
constructing a PosColumn for each word, multi-word-entry (MWE), factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair associated therewith;
constructing all TrigramColumns corresponding to the input string of text, wherein each TrigramColumn defines a corresponding TrigramNode representing a trigram for three PosColumns in the TrigramColumn, each TrigramNode being identifiable by a unique set of three tokens;
determining, for each TrigramColumn, all neighboring TrigramColumns to the immediate left and to the immediate right;
calculating a forward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all forward paths from a TrigramNode in a right neighboring TrigramColumn through the separate TrigramNode;
calculating a backward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all backward paths from a TrigramNode in a left neighboring TrigramColumn through the separate TrigramNode; and
calculating sums of all trigram path probabilities through each PLU as a function of the calculated forward and backward trigram path probabilities.
Dependent claims: 9, 10, 11.
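The forward and backward quantities can be written as recurrences over trigram nodes. In this sketch, T = (a, b, c) is a trigram of adjacent PosColumns, s(a, b, c) is its trigram weight, L(x) is the set of PosColumns ending immediately before x, R(x) is the set starting immediately after x, and the input is padded with begin columns BOS_2, BOS_1 and an end column EOS. All of this notation is introduced here for illustration, and the forward pass is oriented left to right by assumption rather than following the claim's own left/right wording. Because every complete path through PosColumn c contains exactly one trigram ending in c, summing the product of the forward and backward values over those trigrams gives S(c), the total weight of paths through c, while Z is the weight of all paths.

```latex
\begin{aligned}
\alpha(a,b,c) &=
  \begin{cases}
    s(a,b,c) & \text{if } (a,b) = (\mathrm{BOS}_2, \mathrm{BOS}_1)\\
    s(a,b,c) \sum_{z \in L(a)} \alpha(z,a,b) & \text{otherwise}
  \end{cases}\\
\beta(a,b,c) &=
  \begin{cases}
    1 & \text{if } c = \mathrm{EOS}\\
    \sum_{d \in R(c)} s(b,c,d)\,\beta(b,c,d) & \text{otherwise}
  \end{cases}\\
S(c) &= \sum_{b \in L(c)} \sum_{a \in L(b)} \alpha(a,b,c)\,\beta(a,b,c),
\qquad
Z = \sum_{b \in L(\mathrm{EOS})} \sum_{a \in L(b)} \alpha(a,b,\mathrm{EOS})
\end{aligned}
```

The ratio S(c)/Z is then the share of all path weight that passes through c.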
12. A trigram path probability calculating system for calculating trigram path probabilities for an input string of text, the system comprising:
a tokenizer configured to tokenize the input string of text to create a plurality of parse leaf units (PLUs);
a PosColumn generator configured to construct a PosColumn for each word, MWE, factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair associated therewith;
a TrigramColumn generator configured to construct TrigramColumns corresponding to the input string of text, wherein each TrigramColumn defines a corresponding TrigramNode representing a trigram for three PosColumns in the TrigramColumn;
a trigram path probability calculator configured to calculate a forward trigram path probability and a backward trigram path probability for each separate TrigramNode of each TrigramColumn, the trigram path probability calculator further configured to calculate sums of all trigram path probabilities through each PLU as a function of the calculated forward and backward trigram path probabilities.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20.
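One way to wire the four claimed components together is sketched below. The class and method names are hypothetical, the MWE/factoid recognizer is replaced by a fixed list of spans, the TrigramColumn generator is reduced to listing the trigrams along each enumerated path, and the calculator deliberately swaps the forward and backward recursions for exhaustive path enumeration. It produces the per-PosColumn sums that a forward-backward pass over the same trigram weights should reproduce, but it takes exponential time, so it is only suitable as a reference check on short inputs.

```python
from dataclasses import dataclass
from typing import Callable

Span = tuple[int, int]  # (Ft, Lt) token span of a PosColumn, inclusive


@dataclass
class TrigramPathSystem:
    """Hypothetical wiring of tokenizer, PosColumn generator, trigram listing and calculator."""
    mwe_spans: list[Span]                       # stand-in for an MWE/factoid recognizer's output
    score: Callable[[Span, Span, Span], float]  # trigram weight of three adjacent PosColumns

    def tokenize(self, text: str) -> list[str]:
        return text.split()

    def pos_columns(self, tokens: list[str]) -> list[Span]:
        # One column per distinct (Ft, Lt) pair: single tokens plus recognized spans.
        return sorted({(i, i) for i in range(len(tokens))} | set(self.mwe_spans))

    def path_trigrams(self, path: list[Span], n_tokens: int) -> list[tuple[Span, ...]]:
        # Trigrams along one path, padded with two begin columns and one end column.
        padded = [(-2, -2), (-1, -1)] + path + [(n_tokens, n_tokens)]
        return [tuple(padded[i - 2:i + 1]) for i in range(2, len(padded))]

    def paths(self, n_tokens: int, spans: list[Span]) -> list[list[Span]]:
        # Exhaustively enumerate every covering of tokens 0..n_tokens-1 (exponential time).
        found: list[list[Span]] = []

        def extend(pos: int, acc: list[Span]) -> None:
            if pos == n_tokens:
                found.append(acc)
                return
            for s in spans:
                if s[0] == pos:
                    extend(s[1] + 1, acc + [s])

        extend(0, [])
        return found

    def path_sums(self, text: str) -> tuple[float, dict[Span, float]]:
        tokens = self.tokenize(text)
        spans = self.pos_columns(tokens)
        total = 0.0
        per_span = {s: 0.0 for s in spans}
        for path in self.paths(len(tokens), spans):
            weight = 1.0
            for a, b, c in self.path_trigrams(path, len(tokens)):
                weight *= self.score(a, b, c)
            total += weight
            for s in path:
                per_span[s] += weight  # this path passes through span s
        return total, per_span


system = TrigramPathSystem([(0, 1)], lambda a, b, c: 0.5 if c == (0, 1) else 1.0)
total, per_span = system.path_sums("New York is big")
print(total, per_span)
```

Keeping the components as separate methods mirrors the claimed system structure, so the brute-force calculator can later be swapped for a memoized forward-backward one without touching the tokenizer or the PosColumn generator.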
Specification