Header-token driven automatic text segmentation
Abstract
A method and a system to automatically segment text based on header tokens are described. A relevance value and an irrelevance value are determined for each token in a description, assuming no tokens are left out of the computation. The irrelevance value is based on occurrences of the token in a sample set of descriptions. The relevance value is an estimated probability of relevance based on the header of the description being segmented.
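As a rough illustration of the two per-token values described in the abstract, the sketch below makes assumptions that are not taken from the patent: a token's irrelevance probability is its add-one-smoothed document frequency across the sample set (so boilerplate that recurs in many listings scores high), and its relevance probability is a fixed high value when the token also appears in the header and a fixed low value otherwise. The function names and the 0.9/0.1 constants are hypothetical.

```python
from collections import Counter


def document_frequencies(sample_descriptions):
    """Count how many sample descriptions contain each lower-cased token."""
    counts = Counter()
    for desc in sample_descriptions:
        counts.update(set(desc.lower().split()))
    return counts


def p_irrelevant(token, doc_freq, n_docs, alpha=1.0):
    """Hypothetical estimator: add-alpha smoothed share of samples containing the token.

    Tokens that recur across many unrelated descriptions ("free", "shipping")
    are treated as likely boilerplate, i.e. likely irrelevant.
    """
    return (doc_freq.get(token, 0) + alpha) / (n_docs + 2 * alpha)


def p_relevant(token, header_tokens, in_header=0.9, otherwise=0.1):
    """Hypothetical estimator: a token that also occurs in the header is probably relevant."""
    return in_header if token in header_tokens else otherwise


samples = [
    "free shipping on all orders",
    "brand new in box free shipping",
    "vintage leather jacket free returns",
]
df = document_frequencies(samples)
header = set("vintage leather jacket".split())
print(p_irrelevant("free", df, len(samples)))    # 0.8
print(p_irrelevant("jacket", df, len(samples)))  # 0.4
print(p_relevant("jacket", header))              # 0.9
```

Smoothing keeps every estimate strictly between 0 and 1, so a token never seen in the sample set still gets a usable (low) irrelevance probability.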
30 Claims
1. A system comprising:
a first machine configured to transmit a header and a corresponding unstructured description; and
a second machine in at least selective communication with the first machine, the second machine configured to:
receive the header and the unstructured description;
determine relevant state values and irrelevant state values for a set of tokens in the unstructured description based on the header;
indicate as most relevant a sequence of tokens based, at least in part, on the relevant state values of the sequence of tokens and the irrelevant state values of those of the set of tokens outside of the sequence of tokens.
Dependent claims: 2, 3, 4.
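Claim 1 speaks of "relevant state values" and "irrelevant state values". One way to read that, offered here as an assumption rather than the claimed implementation, is a left-to-right model with three states (irrelevant-before, relevant, irrelevant-after) decoded Viterbi-style, so that the single relevant run maximizes the product of relevance probabilities inside it and irrelevance probabilities outside it. The function best_relevant_span and its equal-transition-cost model are hypothetical.

```python
import math


def best_relevant_span(p_rel, p_irr):
    """Return (start, end) of the best relevant run of tokens, end exclusive.

    Hypothetical 3-state left-to-right model: BEFORE (irrelevant), REL
    (relevant), AFTER (irrelevant). All allowed transitions are assumed to
    cost the same, so a path's score is the product of per-token emissions.
    """
    n = len(p_rel)
    if n == 0 or n != len(p_irr):
        raise ValueError("need two equal-length, non-empty probability lists")
    BEFORE, REL, AFTER = 0, 1, 2
    allowed_prev = {BEFORE: (BEFORE,), REL: (BEFORE, REL), AFTER: (REL, AFTER)}
    neg_inf = float("-inf")

    def emit(state, k):
        # Relevant state emits p_rel; both irrelevant states emit p_irr.
        return math.log(p_rel[k] if state == REL else p_irr[k])

    # score[s]: best log score of any path over tokens 0..k that ends in state s.
    score = [emit(BEFORE, 0), emit(REL, 0), neg_inf]  # a path cannot start in AFTER
    back = []  # back[k - 1][s]: best predecessor of state s at token k
    for k in range(1, n):
        new_score, ptr = [neg_inf] * 3, [None] * 3
        for s in (BEFORE, REL, AFTER):
            prev = max(allowed_prev[s], key=lambda p: score[p])
            if score[prev] > neg_inf:
                new_score[s] = score[prev] + emit(s, k)
                ptr[s] = prev
        score = new_score
        back.append(ptr)

    # Ending in REL or AFTER guarantees the relevant run is non-empty.
    state = REL if score[REL] >= score[AFTER] else AFTER
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    rel = [k for k, s in enumerate(path) if s == REL]
    return rel[0], rel[-1] + 1


print(best_relevant_span([0.1, 0.9, 0.9, 0.1], [0.8, 0.4, 0.4, 0.6]))  # (1, 3)
```

Working in log space avoids underflow on long descriptions, and the left-to-right transition table is what forces the relevant tokens to form one contiguous sequence.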
5. A method of automatic text segmentation, the method comprising the acts of:
estimating, for each of a set of tokens in a description, a first probability that the token occurs as an irrelevant token in the description and a second probability that the token occurs as a relevant token in the description based, at least in part, on a header for the description; and
identifying a group of sequential tokens in the description with a maximum probability of relevance based, at least in part, on the computed first probabilities of those of the set of tokens outside of the group of sequential tokens and the computed second probabilities of those of the set of tokens in the group of sequential tokens.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19.
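If the per-token probabilities of claim 5 are treated as independent and multiplied (an assumption; the claim only says the result is based "at least in part" on them), write I_k for the first (irrelevance) probability and R_k for the second (relevance) probability of token t_k, with every I_k > 0. The score of the group t_i, ..., t_j then simplifies as follows:

```latex
\begin{aligned}
S(i,j) &= \prod_{k<i} I_k \,\prod_{k=i}^{j} R_k \,\prod_{k>j} I_k
        = \Bigl(\prod_{k=1}^{n} I_k\Bigr) \prod_{k=i}^{j} \frac{R_k}{I_k}, \\
\arg\max_{1 \le i \le j \le n} S(i,j) &= \arg\max_{1 \le i \le j \le n} \sum_{k=i}^{j} \log\frac{R_k}{I_k}.
\end{aligned}
```

Because the first factor does not depend on the group, finding the most relevant group under this assumption is a maximum-sum contiguous subsequence problem over the log-ratios, which can be solved in linear time (for example with Kadane's algorithm) or in quadratic time by scoring every candidate group.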
20. A method of automatic text segmentation, the method comprising the acts of:
estimating, for each token in a set of tokens in a description, a probability that the token is irrelevant for the description;
associating with each token in the set of tokens in the description one of a first, second, or third value dependent on whether, respectively, the token occurs in a header for the description, a lexical association exists between the token and a token in the header, or the lexical association is absent and the token does not occur in the header;
iterating over a plurality of groups of sequential tokens in the description, in each iteration selecting a group and computing a relevance value for the selected group based, at least in part, on the estimated probability of one or more tokens out of the selected group and values associated with one or more tokens in the selected group; and
indicating the one of the plural groups having a greatest relevance value.
Dependent claims: 21, 22, 23, 24, 25, 26.
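The sketch below walks through the steps of claim 20 under stated assumptions: the three association values are hypothetical constants (0.9 for a header token, 0.6 for a lexically associated token, 0.1 otherwise), lexical association is looked up in a hypothetical hand-built lexicon dictionary, and a group's relevance value is taken to be the sum of log association values inside the group plus log irrelevance probabilities outside it. Prefix sums let each of the O(n^2) candidate groups be scored in constant time.

```python
import math


def association_value(token, header_tokens, lexicon, values=(0.9, 0.6, 0.1)):
    """Pick one of three hypothetical values for a description token.

    values[0]: the token occurs in the header;
    values[1]: the token is lexically associated with a header token (per `lexicon`);
    values[2]: neither.
    """
    if token in header_tokens:
        return values[0]
    if any(h in lexicon.get(token, ()) for h in header_tokens):
        return values[1]
    return values[2]


def most_relevant_group(tokens, header_tokens, lexicon, p_irr):
    """Score every group tokens[i:j]; return (i, j, score) with the greatest score."""
    inside = [math.log(association_value(t, header_tokens, lexicon)) for t in tokens]
    outside = [math.log(p_irr[t]) for t in tokens]
    # Prefix sums: pre_in[j] - pre_in[i] is the group's inside term in O(1).
    pre_in, pre_out = [0.0], [0.0]
    for a, b in zip(inside, outside):
        pre_in.append(pre_in[-1] + a)
        pre_out.append(pre_out[-1] + b)
    n = len(tokens)
    best = None  # stays None for an empty description
    for i in range(n):
        for j in range(i + 1, n + 1):
            score = (pre_in[j] - pre_in[i]) + pre_out[i] + (pre_out[n] - pre_out[j])
            if best is None or score > best[2]:
                best = (i, j, score)
    return best


tokens = "great deal on a vintage leather coat ships today".split()
header = {"vintage", "leather", "jacket"}
lexicon = {"coat": {"jacket"}}  # hypothetical lexical association
p_irr = {"great": 0.7, "deal": 0.8, "on": 0.9, "a": 0.9, "vintage": 0.3,
         "leather": 0.3, "coat": 0.3, "ships": 0.7, "today": 0.8}
i, j, score = most_relevant_group(tokens, header, lexicon, p_irr)
print(tokens[i:j])  # ['vintage', 'leather', 'coat']
```

With this example data the selected group is "vintage leather coat": "coat" is absent from the header but is pulled into the group through its assumed lexical association with "jacket".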
27. A set of instructions encoded in one or more machine-readable media, the set of instructions comprising:
a first sequence of instructions executable to, for each of a plurality of tokens of a description, associate an estimated probability of irrelevance and an estimated probability of relevance based on a set of one or more tokens in a header for the description; and
a second sequence of instructions executable to indicate a group of sequential tokens of the plurality of tokens based, at least in part, on the estimated probabilities of relevance associated by the first sequence of instructions with those of the plurality of tokens in the group and on the estimated probabilities of irrelevance associated by the first sequence of instructions with those of the plurality of tokens outside of the group.
Dependent claim: 28.
29. An apparatus comprising:
memory operable to host a description of a product or service and a header for the description, wherein the description is represented with a plurality of tokens; and
means for automatically identifying a group of sequential tokens of the plurality of tokens as most relevant to the description based on estimated probabilities of relevance of the plurality of tokens based, at least in part, on the header and estimated probabilities of irrelevance of the plurality of tokens.
Dependent claim: 30.
Specification