Header-token driven automatic text segmentation
First Claim
1. A system comprising:
- a processor-implemented segmentation module configured to:
receive data from a client machine, the data comprising a product title and a product description;
identify a first token in the product title;
receive a token probability value associated with the first token;
assign a value to the first token, the value indicating that one of:
the first token also occurs in the product description,
a lexical association exists between the first token and a second token in the product description, and
the lexical association does not exist and the first token is absent from the product description;
compute a relevance value of a segmented group of tokens that occur in the product description and include the first token with the assigned value without requiring previously defined data tagging of the data beforehand of an unstructured text, the relevance value of the segmented group computed based on the value assigned to the first token; and
determine and store in memory an indication that the segmented group of tokens is a most relevant segmented group of tokens in the product description;
wherein the assigning of the value to the first token includes:
initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product description; and
overwriting the stored initially assigned default value based on the first token occurring in the product description.
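The steps recited in the claim above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Match`, `assign_values`, `segment_relevance`, and `most_relevant_segment`, the `associations` lookup table, and the numeric weights are all hypothetical, and the scoring rule (a weighted sum of token probability values) is an assumption, since the claim only says the relevance value is computed based on the assigned value.

```python
from enum import Enum


class Match(Enum):
    ABSENT = 0      # default: no lexical association, token not in description
    ASSOCIATED = 1  # a lexical association exists with a description token
    EXACT = 2       # the title token itself occurs in the description


# Hypothetical weights; the claim does not specify any numbers.
DEFAULT_WEIGHTS = {Match.ABSENT: 0.0, Match.ASSOCIATED: 0.5, Match.EXACT: 1.0}


def assign_values(title_tokens, description_tokens, associations=None):
    """Assign each title token a value, starting from the ABSENT default."""
    associations = associations or {}
    desc = set(description_tokens)
    values = {}
    for tok in title_tokens:
        values[tok] = Match.ABSENT  # initially assign and store the default
        if associations.get(tok, set()) & desc:
            values[tok] = Match.ASSOCIATED
        if tok in desc:
            values[tok] = Match.EXACT  # overwrite based on occurrence
    return values


def segment_relevance(segment, values, token_prob, associations=None,
                      weights=DEFAULT_WEIGHTS):
    """Score one segmented group of description tokens (assumed formula)."""
    associations = associations or {}
    seg = set(segment)
    score = 0.0
    for tok, value in values.items():
        related = {tok} | associations.get(tok, set())
        if related & seg:
            score += weights[value] * token_prob.get(tok, 0.0)
    return score


def most_relevant_segment(segments, values, token_prob, associations=None):
    """Return the segment with the highest relevance value."""
    return max(
        segments,
        key=lambda s: segment_relevance(s, values, token_prob, associations),
    )


title = ["vintage", "red", "leather", "sofa"]
segments = [
    ["free", "shipping", "today"],
    ["red", "couch", "in", "leather"],
    ["see", "my", "other", "items"],
]
associations = {"sofa": {"couch"}}  # hypothetical lexical association table
token_prob = {"red": 0.5, "leather": 0.8, "sofa": 0.9, "vintage": 0.7}

description = [tok for seg in segments for tok in seg]
values = assign_values(title, description, associations)
best = most_relevant_segment(segments, values, token_prob, associations)
```

In this sketch no pre-tagged training data is consulted: the values come only from comparing the title against the description, which mirrors the claim's requirement that no previously defined data tagging is needed.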
Abstract
A method and a system to automatically segment text based on header tokens are described. A relevance value and an irrelevance value are determined for each token in a description, assuming no tokens are left out of computations. The irrelevance value is based on occurrences of a token in a sample set of descriptions. The relevance value is an estimated probability of relevance based on the header of the description being segmented.
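As a rough illustration of the irrelevance value mentioned in the abstract, the following Python sketch estimates it as the fraction of sample descriptions in which a token occurs. The abstract states only that the value is based on occurrences in a sample set; the document-frequency formula and the function name `irrelevance_values` are assumptions made here for illustration.

```python
from collections import Counter


def irrelevance_values(sample_descriptions):
    """Map each token to the fraction of sample descriptions containing it.

    Assumed formula: tokens that appear in many unrelated descriptions
    (for example "free" or "shipping") receive a high irrelevance value.
    """
    n = len(sample_descriptions)
    doc_freq = Counter()
    for tokens in sample_descriptions:
        doc_freq.update(set(tokens))  # count each token once per description
    return {tok: count / n for tok, count in doc_freq.items()}


samples = [
    ["free", "shipping", "sofa"],
    ["free", "shipping", "lamp"],
    ["free", "returns", "desk"],
    ["paypal", "only", "sofa"],
]
irrelevance = irrelevance_values(samples)
```

A token such as "free" that recurs across the sample set scores higher than a product-specific token such as "lamp", which is the behavior the abstract's description suggests.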
20 Claims
1. A system comprising:
- a processor-implemented segmentation module configured to:
receive data from a client machine, the data comprising a product title and a product description;
identify a first token in the product title;
receive a token probability value associated with the first token;
assign a value to the first token, the value indicating that one of:
the first token also occurs in the product description,
a lexical association exists between the first token and a second token in the product description, and
the lexical association does not exist and the first token is absent from the product description;
compute a relevance value of a segmented group of tokens that occur in the product description and include the first token with the assigned value without requiring previously defined data tagging of the data beforehand of an unstructured text, the relevance value of the segmented group computed based on the value assigned to the first token; and
determine and store in memory an indication that the segmented group of tokens is a most relevant segmented group of tokens in the product description;
wherein the assigning of the value to the first token includes:
initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product description; and
overwriting the stored initially assigned default value based on the first token occurring in the product description.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method implemented on a processor-implemented segmentation module, the method comprising:
receiving data from a client machine, the data comprising a product title and a product description;
identifying a first token in the product title;
receiving a token probability value associated with the first token;
assigning a value to the first token, the value indicating that one of:
the first token also occurs in the product description,
a lexical association exists between the first token and a second token in the product description, and
the lexical association does not exist and the first token is absent from the product description;
computing a relevance value of a segmented group of tokens that occur in the product description and include the first token with the assigned value without requiring previously defined data tagging of the data beforehand of an unstructured text, the relevance value of the segmented group computed based on the value assigned to the first token; and
determining and storing in memory an indication that the segmented group of tokens is a most relevant segmented group of tokens in the product description;
wherein the assigning of the value to the first token includes:
initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product description; and
overwriting the stored initially assigned default value based on the first token occurring in the product description.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A system comprising:
- a processor-implemented segmentation module configured to:
receive data from a client machine, the data comprising a product header and a product description;
identify a first token in the product header;
receive a token probability value associated with the first token;
assign a value to the first token, the value indicating that one of:
the first token also occurs in the product description,
a lexical association exists between the first token and a second token in the product description, and
the lexical association does not exist and the first token is absent from the product description;
compute a relevance value of a segmented group of tokens that occur in the product description and include the first token with the assigned value without requiring previously defined data tagging of the data beforehand of an unstructured text, the relevance value of the segmented group computed based on the value assigned to the first token; and
determine and store in memory an indication that the segmented group of tokens is a most relevant segmented group of tokens in the product description;
wherein the assigning of the value to the first token includes:
initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product description; and
overwriting the stored initially assigned default value based on the first token occurring in the product description.
Dependent claims: 18, 19, 20
Specification