Method for improving results in an HMM-based segmentation system by incorporating external knowledge
First Claim
1. A method for improving a Hidden Markov model (HMM) based mark-up system, the method including the steps of:
a. constructing a HMM defining a plurality of states;
b. modifying a Viterbi algorithm, related to the HMM, in order to apply a multiplicative factor if a particular state is re-entered; and
c. executing the modified Viterbi algorithm against at least one information source.
Abstract
A Hidden Markov model is used to segment a data sequence. To reduce the potential for error that may result from the Markov assumption, the Viterbi dynamic programming algorithm is modified to apply a multiplicative factor if a particular set of states is re-entered. As a result, structural domain knowledge is incorporated into the algorithm by expanding the state space in the dynamic programming recurrence. In a specific example of segmenting resumes, the factor is used to reward or penalize (even require or prohibit) a segmentation of the resume that results in the re-entry into a section such as Experience or Contact Information. The method may be used to impose global constraints in the processing of an input sequence or to impose constraints to local sub-sequences.
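The expanded-state-space idea in the abstract can be sketched in code. The following is not the patent's implementation but a minimal illustration under assumptions: the function name `viterbi_reentry`, the per-path visited-set bookkeeping, and the toy HMM are all mine. Each HMM state plays the role of one section type, and the multiplicative factor is applied whenever a transition re-enters a state that was previously exited (factor < 1 penalizes, factor > 1 rewards, factor = 0 prohibits re-entry):

```python
def viterbi_reentry(obs, pi, A, B, reentry_factor=0.0):
    """Viterbi over an expanded state space (state, set of states
    already exited).  A transition that re-enters a previously
    exited state is multiplied by `reentry_factor`."""
    T, N = len(obs), len(pi)
    # trellis[t] maps (state, exited-set) -> (probability, predecessor key)
    trellis = [{(i, frozenset()): (pi[i] * B[i][obs[0]], None)
                for i in range(N)}]
    for t in range(1, T):
        layer = {}
        for (j, exited), (p, _) in trellis[-1].items():
            for i in range(N):
                factor, new_exited = 1.0, exited
                if i != j:
                    new_exited = exited | {j}      # we are leaving state j
                    if i in new_exited:            # i was entered and left before
                        factor = reentry_factor
                q = p * A[j][i] * factor * B[i][obs[t]]
                key = (i, new_exited)
                if q > layer.get(key, (-1.0, None))[0]:
                    layer[key] = (q, (j, exited))
        trellis.append(layer)
    # Backtrack the best expanded-state path.
    key = max(trellis[-1], key=lambda k: trellis[-1][k][0])
    states = []
    for layer in reversed(trellis):
        states.append(key[0])
        key = layer[key][1]
    return states[::-1]
```

With `reentry_factor=0.0` the decoder can never return to a section it has left, which is the "prohibit" case the abstract mentions; intermediate values trade the constraint off against the emission and transition scores.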
18 Claims
1. A method for improving a Hidden Markov model (HMM) based mark-up system, the method including the steps of:
a. constructing a HMM defining a plurality of states;
b. modifying a Viterbi algorithm, related to the HMM, in order to apply a multiplicative factor if a particular state is re-entered; and
c. executing the modified Viterbi algorithm against at least one information source.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
12. A method of improving a Hidden Markov model (HMM) segmentation system comprising the steps of:
a. receiving a data sequence to be segmented;
b. invoking a Viterbi algorithm to label the received data sequence into a plurality of segment types;
c. if a segment type is identified more than once during labeling, verifying which identification is correct;
d. anchoring, within the data sequence, labels verified as being correct; and
e. invoking the Viterbi algorithm to label the data sequence, including the anchored labels, into the plurality of segment types.
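Steps d and e of claim 12, anchoring verified labels and re-running the decoder, can be sketched as a Viterbi variant in which anchored positions are forced to their verified state. This is an illustrative assumption, not the patent's implementation: the `anchors` mapping and the device of zeroing every competing state's score at an anchored position are one simple way to realize anchoring:

```python
def viterbi(obs, pi, A, B, anchors=None):
    """Standard Viterbi, except positions listed in `anchors`
    (a dict position -> state) are forced to their anchored state
    by zeroing every other state's emission score there."""
    anchors = anchors or {}
    T, N = len(obs), len(pi)

    def emit(i, t):
        if t in anchors and anchors[t] != i:
            return 0.0
        return B[i][obs[t]]

    delta = [[pi[i] * emit(i, 0) for i in range(N)]]
    psi = [[0] * N]                    # psi[0] is a placeholder
    for t in range(1, T):
        d, p = [], []
        for i in range(N):
            best_j = max(range(N), key=lambda j: delta[-1][j] * A[j][i])
            d.append(delta[-1][best_j] * A[best_j][i] * emit(i, t))
            p.append(best_j)
        delta.append(d)
        psi.append(p)
    # Backtrack from the best final state.
    state = max(range(N), key=lambda i: delta[-1][i])
    path = [state]
    for t in range(T - 1, 0, -1):
        state = psi[t][state]
        path.append(state)
    return path[::-1]
```

A two-pass run in the spirit of the claim would then be: label once without anchors, verify which occurrence of a repeated segment type is correct by whatever external means are available, and call `viterbi` again with those positions in `anchors`.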
13. A multi-pass method for improving results of a Hidden Markov model (HMM) based segmentation system, comprising the steps of:
a. receiving a data sequence to be segmented;
b. invoking a Viterbi algorithm to label the received data sequence into a plurality of segment types; and
c. if a segment type is identified more than once during labeling, invoking a modified Viterbi algorithm to label the received data sequence into the plurality of segment types;
wherein the modified Viterbi algorithm imposes a constraint regarding re-entry into a particular state of the HMM.
(Dependent claims: 14, 15, 16)
17. A method for improving a Hidden Markov model (HMM) based mark-up system, the method including the steps of:
a. constructing an HMM defining a plurality of hierarchically arranged states;
b. modifying a Viterbi algorithm, related to the HMM, in order to apply a first multiplicative factor if a first state of the HMM is re-entered and to apply a second multiplicative factor if a second state of the HMM is re-entered, wherein the second state is at a second hierarchical level under the first hierarchical level of the first state; and
c. invoking the modified Viterbi algorithm against at least one information source.
18. A method for improving a conventional Viterbi algorithm, the method comprising the step of modifying the determination of δ and φ of the conventional Viterbi algorithm such that:
a. for each state i ∈ {1, . . . , N},
   i. if state i is in re-entry group k,
      a) δ1(i, Gk) = πi × bi(O1); for all G ≠ Gk, δ1(i, G) = 0, and
      b) for all G, φ1(i, G) = 0;
   ii. otherwise, if state i is not in any re-entry group k,
      a) δ1(i, G0) = πi × bi(O1); for all G ≠ G0, δ1(i, G) = 0, and
      b) for all G, φ1(i, G) = 0; and
b. for time t = 2 to T,
   iii. for each state i ∈ {1, . . . , N},
      a) for each re-entry state G,
         1) δt(i, G) = max1≤j≤N {δt−1(j, G′) × aji × d(G′, i, j)} × bi(Ot), and
         2) φt(i, G) = argmax1≤j≤N {δt−1(j, G′) × aji × d(G′, i, j)},
c. wherein Gk, for each re-entry group k, denotes the current number of entries into that particular re-entry group; G denotes a re-entry state comprising a set of values for all Gk; Gk denotes the re-entry state consisting of zeroes for all re-entry groups except k and a one for group k; and G′ denotes the re-entry state that would have led to re-entry state G when moving from state j to state i.
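Claim 18's recurrence can be sketched directly: the trellis entry δt(i, G) is indexed by both the state i and a re-entry record G counting entries into each re-entry group, and the factor d(G′, i, j) multiplies every transition score. The code below is a sketch under assumptions; the names `viterbi_claim18`, `group_of`, and the example factor function `prohibit_reentry` are mine, not the patent's:

```python
def viterbi_claim18(obs, pi, A, B, group_of, K, d):
    """delta/psi recurrence over pairs (state i, re-entry record G).
    group_of[i] is state i's re-entry group (or None); K is the number
    of groups; d(G_prev, i, j) is the multiplicative re-entry factor."""
    T, N = len(obs), len(pi)

    def entry_record(i):
        # Initial G: all zeroes, except a 1 for state i's own group.
        rec = [0] * K
        if group_of[i] is not None:
            rec[group_of[i]] = 1
        return tuple(rec)

    trellis = [{(i, entry_record(i)): (pi[i] * B[i][obs[0]], None)
                for i in range(N)}]
    for t in range(1, T):
        layer = {}
        for (j, G), (p, _) in trellis[-1].items():
            for i in range(N):
                g = group_of[i]
                newG = list(G)
                if g is not None and group_of[j] != g:
                    newG[g] += 1            # entering group g from outside
                q = p * A[j][i] * d(G, i, j) * B[i][obs[t]]
                key = (i, tuple(newG))
                if q > layer.get(key, (-1.0, None))[0]:
                    layer[key] = (q, (j, G))
        trellis.append(layer)
    # Backtrack the highest-scoring (state, record) path.
    key = max(trellis[-1], key=lambda k: trellis[-1][k][0])
    path = []
    for layer in reversed(trellis):
        path.append(key[0])
        key = layer[key][1]
    return path[::-1]

def prohibit_reentry(G, i, j, group_of, factor=0.0):
    """Example d: apply `factor` when entering a group that the
    record G shows has already been entered once."""
    g = group_of[i]
    if g is not None and group_of[j] != g and G[g] >= 1:
        return factor
    return 1.0
```

With `d` identically 1 this reduces to ordinary Viterbi; with `prohibit_reentry` and `factor=0.0` it realizes the "prohibit re-entry" case, and other values of `factor` give the reward/penalty variants of the claim.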
Specification