HEADER-TOKEN DRIVEN AUTOMATIC TEXT SEGMENTATION
Abstract
A method and a system to automatically segment text based on header tokens are described. A relevance value and an irrelevance value are determined for each token in a description, assuming no tokens are left out of the computations. The irrelevance value is based on occurrences of a token in a sample set of descriptions. The relevance value is an estimated probability of relevance based on the header of the description being segmented.
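The two per-token values named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the whitespace tokenizer, the use of document frequency across the sample set as the irrelevance value, and the constants 0.9 and 0.1 standing in for a header-based relevance estimate. The function names are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the source names none)."""
    return text.lower().split()


def irrelevance_values(sample_descriptions):
    """Fraction of sample descriptions in which each token occurs.

    Assumption: document frequency is one plausible way to base the
    irrelevance value on occurrences in a sample set.
    """
    doc_counts = Counter()
    for description in sample_descriptions:
        # Count each token at most once per description.
        doc_counts.update(set(tokenize(description)))
    total = len(sample_descriptions)
    return {token: count / total for token, count in doc_counts.items()}


def relevance_value(token, header_tokens, in_header=0.9, not_in_header=0.1):
    """Hypothetical header-based relevance estimate; constants are illustrative."""
    return in_header if token in header_tokens else not_in_header


samples = [
    "free shipping on all orders",
    "red leather wallet with free shipping",
    "vintage camera lens free returns",
    "leather camera strap",
]
irrelevance = irrelevance_values(samples)
header_tokens = set(tokenize("Vintage Camera Lens"))

for token in ("free", "camera"):
    print(token, irrelevance[token], relevance_value(token, header_tokens))
```

In this toy sample, a token such as "free" appears in most descriptions regardless of the product, so it receives a high irrelevance value, while "camera" appears in the header and receives the higher relevance estimate.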
17 Citations
21 Claims
1. (canceled)
2. A system comprising:
a processor-implemented segmentation module configured to:
receive data from a client machine, the data comprising a product title and a product description;
identify a first token in the product description;
receive a token probability value associated with the first token;
assign a value to the first token, the value indicating that one of: the first token also occurs in the product title, a lexical association exists between the first token and a second token in the product title, and the lexical association does not exist and the first token is absent from the product title;
compute a relevance value of a group of tokens that occur in the product description and include the first token with the assigned value, the relevance value of the group computed based on the value assigned to the first token; and
determine and store in memory an indication that the group of tokens is a most relevant group of tokens in the product description;
wherein the assigning of the value to the first token includes:
initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product title; and
overwriting the stored initially assigned default value based on the first token occurring in the product title.
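The value assignment and group selection recited in claim 2 can be sketched as follows. This is a hedged illustration, not the claimed implementation: the `TokenValue` names, the numeric `WEIGHTS`, the `associations` lookup table, the way the token probability scales each score, and the choice of a maximum-sum contiguous run (Kadane's algorithm) as the "most relevant group" are all assumptions introduced here.

```python
from enum import Enum


class TokenValue(Enum):
    ABSENT = 0      # no lexical association and not in the title (the default)
    ASSOCIATED = 1  # lexically associated with some title token
    IN_TITLE = 2    # the token itself occurs in the title


# Hypothetical per-value weights; the claim does not give numbers.
WEIGHTS = {
    TokenValue.ABSENT: -1.0,
    TokenValue.ASSOCIATED: 0.5,
    TokenValue.IN_TITLE: 1.0,
}


def assign_values(description_tokens, title_tokens, associations):
    """Store the default value first, then overwrite it when warranted."""
    title = set(title_tokens)
    values = []
    for token in description_tokens:
        value = TokenValue.ABSENT  # initially assigned default
        if any(other in title for other in associations.get(token, ())):
            value = TokenValue.ASSOCIATED
        if token in title:
            value = TokenValue.IN_TITLE  # overwrite: token occurs in the title
        values.append(value)
    return values


def most_relevant_group(description_tokens, values, token_probability):
    """Return the contiguous run of tokens with the highest summed score.

    Assumption: each score is the value's weight scaled by the token's
    probability (1.0 when none is supplied); the run is found with
    Kadane's maximum-subarray algorithm.
    """
    scores = [
        WEIGHTS[value] * token_probability.get(token, 1.0)
        for token, value in zip(description_tokens, values)
    ]
    best_sum, best_span = float("-inf"), (0, 0)
    current_sum, start = 0.0, 0
    for i, score in enumerate(scores):
        if current_sum <= 0:
            current_sum, start = score, i  # start a new run here
        else:
            current_sum += score
        if current_sum > best_sum:
            best_sum, best_span = current_sum, (start, i + 1)
    return description_tokens[best_span[0]:best_span[1]], best_sum


description = ["free", "shipping", "vintage", "camera", "lenses", "fast", "delivery"]
title = ["vintage", "camera", "lens"]
associations = {"lenses": ["lens"]}  # hypothetical lexical-association table

values = assign_values(description, title, associations)
group, score = most_relevant_group(description, values, token_probability={})
print(group, score)
```

Storing the default before any lookup means every token always carries a value, and the later checks only ever overwrite it, which mirrors the "initially assigning" and "overwriting" steps of the claim.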
10. A method implemented on a processor-implemented segmentation module, the method comprising:
receiving data from a client machine, the data comprising a product title and a product description;
identifying a first token in the product description;
receiving a token probability value associated with the first token;
assigning a value to the first token, the value indicating that one of: the first token also occurs in the product title, a lexical association exists between the first token and a second token in the product title, and the lexical association does not exist and the first token is absent from the product title;
computing a relevance value of a group of tokens that occur in the product description and include the first token with the assigned value, the relevance value of the group computed based on the value assigned to the first token; and
determining and storing in memory an indication that the group of tokens is a most relevant group of tokens in the product description;
wherein the assigning of the value to the first token includes:
initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product title; and
overwriting the stored initially assigned default value based on the first token occurring in the product title.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
18. A system comprising:
a processor-implemented segmentation module configured to:
receive data from a client machine, the data comprising a product header and a product description;
identify a first token in the product description;
receive a token probability value associated with the first token;
assign a value to the first token, the value indicating that one of: the first token also occurs in the product header, a lexical association exists between the first token and a second token in the product header, and the lexical association does not exist and the first token is absent from the product header;
compute a relevance value of a group of tokens that occur in the product description and include the first token with the assigned value, the relevance value of the group computed based on the value assigned to the first token; and
determine and store in memory an indication that the group of tokens is a most relevant group of tokens in the product description;
wherein the assigning of the value to the first token includes:
initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product header; and
overwriting the stored initially assigned default value based on the first token occurring in the product header.
Dependent claims: 19, 20, 21
Specification