HEADER-TOKEN DRIVEN AUTOMATIC TEXT SEGMENTATION
First Claim
1. A method comprising:
- assigning a value to a first token in a description, the value indicating either:
  - that the first token also occurs in a header of the description,
  - that a lexical association exists between the first token and a second token in the header, or
  - that the lexical association does not exist and the first token is absent from the header;
- computing a relevance value of a group of tokens that occur in the description and include the first token with the assigned value, the relevance value of the group being computed by a processor of a machine based on the value assigned to the first token; and
- indicating that the group of tokens is a most relevant group of tokens in the description.
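The three steps of the claim can be illustrated with a minimal sketch. The claim does not say how the assigned value is encoded, how lexical associations are stored, or how a group's relevance is derived from the token values, so everything concrete here is an assumption made for illustration: the names `TokenValue`, `assign_value`, `group_relevance`, and `most_relevant_group`, the weights in `ASSUMED_WEIGHTS`, the representation of associations as a set of token pairs, and the choice of the mean as the group score.

```python
from enum import Enum


class TokenValue(Enum):
    """The three cases the claim distinguishes (names are hypothetical)."""
    IN_HEADER = "in_header"      # token also occurs in the header
    ASSOCIATED = "associated"    # lexically associated with a header token
    UNRELATED = "unrelated"      # no association and absent from the header


# Assumed numeric weights; the claim does not specify any.
ASSUMED_WEIGHTS = {
    TokenValue.IN_HEADER: 1.0,
    TokenValue.ASSOCIATED: 0.5,
    TokenValue.UNRELATED: 0.0,
}


def assign_value(token, header_tokens, associations):
    """Assign one of the three values to a description token.

    `associations` is assumed to be a set of (description_token, header_token)
    pairs; how such pairs are obtained is outside this sketch.
    """
    if token in header_tokens:
        return TokenValue.IN_HEADER
    if any((token, h) in associations for h in header_tokens):
        return TokenValue.ASSOCIATED
    return TokenValue.UNRELATED


def group_relevance(group, header_tokens, associations):
    """Score a group of tokens as the mean of its tokens' assumed weights."""
    if not group:
        return 0.0
    total = sum(
        ASSUMED_WEIGHTS[assign_value(t, header_tokens, associations)]
        for t in group
    )
    return total / len(group)


def most_relevant_group(groups, header_tokens, associations):
    """Return the group with the highest relevance value."""
    return max(
        groups,
        key=lambda g: group_relevance(g, header_tokens, associations),
    )
```

With a header of `{"red", "bicycle"}` and a single assumed association `("bike", "bicycle")`, the group `["red", "bike", "frame"]` scores higher than `["ships", "in", "two", "days"]` and is the one indicated as most relevant.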
Abstract
A method and a system to automatically segment text based on header tokens are described. A relevance value and an irrelevance value are determined for each token in a description, assuming no tokens are left out of computations. The irrelevance value is based on occurrences of a token in a sample set of descriptions. The relevance value is an estimated probability of relevance based on the header of the description being segmented.
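As a rough illustration of the two per-token values, the sketch below computes an irrelevance value from a sample set and pairs it with a relevance estimate for every token in a description. The abstract gives no formulas, so the specifics are assumptions: irrelevance is taken to be the fraction of sample descriptions that contain the token, relevance is a fixed assumed probability depending only on whether the token appears in the header, and the names `irrelevance_values`, `score_tokens`, `p_header`, and `p_other` are hypothetical.

```python
from collections import Counter


def irrelevance_values(sample_descriptions):
    """Map each token to the fraction of sample descriptions containing it.

    Assumption: occurrences are counted once per description, so tokens
    that appear across many unrelated descriptions score as more irrelevant.
    """
    doc_freq = Counter()
    for description in sample_descriptions:
        doc_freq.update(set(description))
    n = len(sample_descriptions)
    return {token: count / n for token, count in doc_freq.items()}


def score_tokens(description, header, irrelevance, p_header=0.9, p_other=0.1):
    """Return (token, relevance, irrelevance) for every description token.

    `p_header` and `p_other` are assumed placeholder probabilities, not
    values taken from the patent. No token is skipped; tokens unseen in
    the sample set get an irrelevance value of 0.0.
    """
    header_set = set(header)
    return [
        (
            token,
            p_header if token in header_set else p_other,
            irrelevance.get(token, 0.0),
        )
        for token in description
    ]
```

Under these assumptions, a token such as "free" that appears in most sample descriptions but not in the header receives a low relevance estimate and a high irrelevance value, while a header token that is rare in the sample set receives the opposite.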
20 Claims
1. A method comprising:
- assigning a value to a first token in a description, the value indicating either:
  - that the first token also occurs in a header of the description,
  - that a lexical association exists between the first token and a second token in the header, or
  - that the lexical association does not exist and the first token is absent from the header;
- computing a relevance value of a group of tokens that occur in the description and include the first token with the assigned value, the relevance value of the group being computed by a processor of a machine based on the value assigned to the first token; and
- indicating that the group of tokens is a most relevant group of tokens in the description.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A non-transitory machine-readable medium comprising instructions that, when executed by one or more processors of a machine, cause the machine to perform operations comprising:
- assigning a value to a first token in a description, the value indicating either:
  - that the first token also occurs in a header of the description,
  - that a lexical association exists between the first token and a second token in the header, or
  - that the lexical association does not exist and the first token is absent from the header;
- computing a relevance value of a group of tokens that occur in the description and include the first token with the assigned value, the relevance value of the group being computed based on the value assigned to the first token; and
- indicating that the group of tokens is a most relevant group of tokens in the description.
Dependent claims: 15, 16
17. A system comprising:
- at least one processor; and
- a segmentation module that configures the at least one processor to:
  - assign a value to a first token in a description, the value indicating either:
    - that the first token also occurs in a header of the description,
    - that a lexical association exists between the first token and a second token in the header, or
    - that the lexical association does not exist and the first token is absent from the header;
  - compute a relevance value of a group of tokens that occur in the description and include the first token with the assigned value, the relevance value of the group being computed based on the value assigned to the first token; and
  - indicate that the group of tokens is a most relevant group of tokens in the description.
Dependent claims: 18, 19, 20
Specification