Method and apparatus for providing improved HMM POS tagger for multi-word entries and factoids
Abstract
A method of calculating trigram path probabilities for an input string of text containing a multi-word-entry (MWE) or a factoid includes tokenizing the input string to create a plurality of parse leaf units (PLUs). A PosColumn is constructed for each word, MWE, factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair. TrigramColumns are constructed which define corresponding TrigramNodes, each representing a trigram for three PosColumns. Forward and backward trigram path probabilities are calculated for each separate TrigramNode. The sums of all trigram path probabilities through each PLU are then calculated as a function of the forward and backward trigram path probabilities. Systems and computer-readable media configured to implement the methods are also provided.
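The column construction described in the abstract can be illustrated with a minimal Python sketch. Only the names PosColumn, TrigramColumn, Ft and Lt come from the text; the `Entry` record, the function names, the sample sentence and its tags are assumptions made for illustration, and sentence-boundary padding is omitted for brevity.

```python
from collections import namedtuple

# Hypothetical record: a lexical entry spanning tokens ft..lt (inclusive) with one POS tag.
Entry = namedtuple("Entry", "text ft lt pos")

def build_pos_columns(entries):
    """Group entries by their (Ft, Lt) token pair: one PosColumn per unique pair."""
    columns = {}
    for e in entries:
        columns.setdefault((e.ft, e.lt), []).append(e.pos)
    return columns

def build_trigram_columns(columns):
    """Return every triple of PosColumns that are contiguous in the token stream."""
    spans = sorted(columns)
    triples = []
    for a in spans:
        for b in spans:
            if b[0] != a[1] + 1:      # b must start right after a ends
                continue
            for c in spans:
                if c[0] == b[1] + 1:  # c must start right after b ends
                    triples.append((a, b, c))
    return triples

# Tokens: 0="New", 1="York", 2="is", 3="big"; "New York" is also an MWE over tokens 0..1.
entries = [
    Entry("New", 0, 0, "Adj"),
    Entry("York", 1, 1, "Noun"),
    Entry("New York", 0, 1, "Noun"),
    Entry("is", 2, 2, "Verb"),
    Entry("big", 3, 3, "Adj"),
    Entry("big", 3, 3, "Noun"),
]
pos_columns = build_pos_columns(entries)
trigram_columns = build_trigram_columns(pos_columns)
```

In this toy input the MWE gets its own PosColumn keyed by (0, 1), alongside the single-word columns (0, 0) and (1, 1), so the lattice holds two competing segmentations of the same tokens.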
42 Claims
1. A method of calculating trigram path probabilities for an input string of text, the method comprising:
tokenizing the input string of text to create a plurality of parse leaf units (PLUs);
constructing a PosColumn for each word, multi-word-entry (MWE), factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair associated therewith;
constructing all TrigramColumns corresponding to the input string of text, wherein each TrigramColumn defines a corresponding TrigramNode representing a trigram for three PosColumns in the TrigramColumn, each TrigramNode being identifiable by a unique set of three tokens;
determining, for each TrigramColumn, all neighboring TrigramColumns to the immediate left and to the immediate right;
calculating a forward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all forward paths from a TrigramNode in a right neighboring TrigramColumn through the separate TrigramNode;
calculating a backward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all backward paths from a TrigramNode in a left neighboring TrigramColumn through the separate TrigramNode; and
calculating sums of all trigram path probabilities through each PLU as a function of the calculated forward and backward trigram path probabilities.
Dependent claims: 2, 3, 4, 5, 6, 10
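The forward, backward and per-PLU summing steps of claim 1 can be sketched as a forward-backward pass over trigram nodes. This is a toy illustration under stated assumptions, not the patented implementation: the lattice, the lexical weights in `LEX`, the scorer `tri`, and the single begin and end markers `BOS` and `EOS` are all hypothetical, and the forward pass here accumulates from left neighbors, which is the conventional direction.

```python
from collections import defaultdict
from itertools import product

N_TOKENS = 3
BOS, EOS = (-1, -1, "<s>"), (N_TOKENS, N_TOKENS, "</s>")

# Toy lattice: (ft, lt, pos) -> lexical weight. The (0, 1) entry is an MWE over two tokens.
LEX = {
    (0, 0, "Adj"): 0.6, (1, 1, "Noun"): 0.9, (0, 1, "Noun"): 0.8,
    (2, 2, "Verb"): 0.7, (2, 2, "Noun"): 0.3,
}

def tri(t1, t2, t3):
    """Hypothetical trigram score; a real tagger would look this up in a trained model."""
    return 0.5 if (t2, t3) == ("Noun", "Verb") else 0.2

def follows(a, b):
    """True when unit b starts at the token right after unit a ends."""
    return b[0] == a[1] + 1

units = [BOS] + sorted(LEX) + [EOS]
# A trigram node is three contiguous units whose middle unit is a real lexical unit.
nodes = [(a, b, c) for a, b, c in product(units, repeat=3)
         if b in LEX and follows(a, b) and follows(b, c)]

def score(node):
    a, b, c = node
    return tri(a[2], b[2], c[2]) * LEX[b]

nodes.sort(key=lambda n: n[1][0])  # left to right by start of the middle unit
alpha, beta = defaultdict(float), defaultdict(float)
for n in nodes:                    # forward pass: sum over left neighbors (x, a, b)
    a, b, c = n
    left = 1.0 if a == BOS else sum(alpha[m] for m in nodes if m[1:] == (a, b))
    alpha[n] = score(n) * left
for n in reversed(nodes):          # backward pass: sum over right neighbors (b, c, y)
    a, b, c = n
    beta[n] = 1.0 if c == EOS else sum(score(m) * beta[m] for m in nodes if m[:2] == (b, c))

through = defaultdict(float)       # summed path mass through each lexical unit (PLU)
for n in nodes:
    through[n[1]] += alpha[n] * beta[n]
total = sum(alpha[n] for n in nodes if n[2] == EOS)
```

Every complete path places each of its units in the middle of exactly one trigram node, so adding `alpha * beta` over the nodes that share a middle unit counts each path through that unit once. Dividing `through[u]` by `total` would give a normalized score for unit `u`.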
7-9. (canceled)
11-15. (canceled)
16. A computer-readable medium having computer executable instructions for performing the trigram path probability calculating steps comprising:
tokenizing an input string of text to create a plurality of parse leaf units (PLUs);
constructing a PosColumn for each word, multi-word-entry (MWE), factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair associated therewith;
constructing all TrigramColumns corresponding to the input string of text, wherein each TrigramColumn defines a corresponding TrigramNode representing a trigram for three PosColumns in the TrigramColumn, each TrigramNode being identifiable by a unique set of three tokens;
determining, for each TrigramColumn, all neighboring TrigramColumns to the immediate left and to the immediate right;
calculating a forward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all forward paths from a TrigramNode in a right neighboring TrigramColumn through the separate TrigramNode;
calculating a backward trigram path probability, for each separate TrigramNode of each TrigramColumn, of all backward paths from a TrigramNode in a left neighboring TrigramColumn through the separate TrigramNode; and
calculating sums of all trigram path probabilities through each PLU as a function of the calculated forward and backward trigram path probabilities.
Dependent claims: 17, 18, 19
20-29. (canceled)
30. A trigram path probability calculating system for calculating trigram path probabilities for an input string of text, the system comprising:
a tokenizer configured to tokenize the input string of text to create a plurality of parse leaf units (PLUs);
a PosColumn generator configured to construct a PosColumn for each word, MWE, factoid and character in the input string of text which has a unique first (Ft) and last (Lt) token pair associated therewith;
a TrigramColumn generator configured to construct TrigramColumns corresponding to the input string of text, wherein each TrigramColumn defines a corresponding TrigramNode representing a trigram for three PosColumns in the TrigramColumn;
a trigram path probability calculator configured to calculate a forward trigram path probability and a backward trigram path probability for each separate TrigramNode of each TrigramColumn, the trigram path probability calculator further configured to calculate sums of all trigram path probabilities through each PLU as a function of the calculated forward and backward trigram path probabilities.
Dependent claims: 31, 32, 33, 34, 35, 36, 37, 41
38-40. (canceled)
42-44. (canceled)
Specification