Header-token driven automatic text segmentation

US 9,529,862 B2
Filed: 05/28/2015
Issued: 12/27/2016
Est. Priority Date: 12/28/2006
Status: Active Grant

Abstract

A method and a system to automatically segment text based on header tokens are described. A relevance value and an irrelevance value are determined for each token in a description, assuming no tokens are left out of computations. The irrelevance value is based on occurrences of a token in a sample set of descriptions. The relevance value is an estimated probability of relevance based on the header of the description being segmented.
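
As an illustration of the irrelevance value mentioned in the abstract, the following Python sketch estimates it as the fraction of sample descriptions that contain a token, which is also the proportion recited in claims 3, 11 and 20. The names `tokenize` and `irrelevance_values`, the lowercase whitespace tokenizer, and the sample descriptions are assumptions made for this example; the record itself gives no implementation.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the source names none)."""
    return text.lower().split()


def irrelevance_values(sample_descriptions):
    """Map each token to the fraction of sample descriptions containing it."""
    docs = [set(tokenize(d)) for d in sample_descriptions]
    if not docs:
        return {}
    counts = {}
    for doc in docs:
        for tok in doc:  # each description counts a token at most once
            counts[tok] = counts.get(tok, 0) + 1
    return {tok: n / len(docs) for tok, n in counts.items()}


# Hypothetical sample set, invented for illustration only.
sample = [
    "fast free shipping on this red bicycle",
    "free shipping and returns on this camera",
    "vintage camera with leather case",
]
values = irrelevance_values(sample)
```

Under this reading, tokens that recur across many descriptions (such as `free` or `shipping` in the sample) receive a high irrelevance value, while tokens specific to one listing receive a low one.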

20 Claims

1. A system comprising:
- a processor-implemented segmentation module configured to:
  
  receive data from a client machine, the data comprising a product title and a product description;
  
  identify a first token in the product title;
  
  receive a token probability value associated with the first token;
  
  assign a value to the first token, the value indicating that, one of:
  
  the first token also occurs in the product description, a lexical association exists between the first token and a second token in the product description, and the lexical association does not exist and the first token is absent from the product description;
  
  compute a relevance value of a segmented group of tokens that occur in the product description and include the first token with the assigned value without requiring previously defined data tagging of the data beforehand of an unstructured text, the relevance value of the segmented group computed based on the value assigned to the first token; and
  
  determine and store in memory an indication that the segmented group of tokens is a most relevant segmented group of tokens in the product description;
  
  wherein the assigning of the value to the first token includes:
  
  initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product description; and
  
  overwriting the stored initially assigned default value based on the first token occurring in the product description.
- Dependent claims (2, 3, 4, 5, 6, 7, 8)
  - 2. The system of claim 1, wherein:
    - the assigned value further indicates a relevance probability of the first token; and
      
      the computing of the relevance value of the segmented group of tokens is based on the relevance probability of the first token and further based on an irrelevance probability of the first token.
  - 3. The system of claim 2, the processor-implemented segmentation module further configured to determine the irrelevance probability based on a proportion of descriptions that contain the first token within a set of descriptions.
  - 4. The system of claim 1, the processor-implemented segmentation module further configured to determining that the lexical association exists between the first token and the second token by accessing a table that lexically associates the first token with the second token.
  - 5. The system of claim 1, wherein:
    - the computing of the relevance value of the segmented group of tokens computes the relevance value of a segment of the description; and
      
      the indicating indicates that the segment is a most relevant segment of the description.
  - 6. The system of claim 1, wherein:
    - the computing of the relevance value of the segmented group of tokens computes the relevance value of a sequence of tokens; and
      
      the indicating indicates that the sequence is a most relevant sequence of tokens in the description.
  - 7. The system of claim 1, wherein the indicating that the segmented group of tokens is the most relevant segmented group of tokens in the description includes modifying a font of the most relevant segmented group of tokens in the description.
  - 8. The system of claim 1, wherein the indicating that the segmented group of tokens is the most relevant segmented group of tokens in the description includes presenting the most relevant segmented group of tokens with the title of the description.
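
The value assignment recited in claim 1, together with the table lookup of claim 4, can be pictured with a minimal Python sketch. It stores a default "absent" status first and overwrites it when the title token occurs in the description. The names `TokenStatus` and `assign_status`, the three-valued enum, the precedence of "present" over "associated", and the sample association table are all assumptions introduced for illustration, not the patented implementation.

```python
from enum import Enum


class TokenStatus(Enum):
    ABSENT = "absent"          # no lexical association, not in the description
    ASSOCIATED = "associated"  # lexically associated with a description token
    PRESENT = "present"        # occurs in the description itself


def assign_status(title_token, description_tokens, association_table):
    """Return the status of one title token relative to a description."""
    status = TokenStatus.ABSENT  # default value is assigned and stored first
    related = association_table.get(title_token, set())
    if related & description_tokens:
        status = TokenStatus.ASSOCIATED  # table links it to a description token
    if title_token in description_tokens:
        status = TokenStatus.PRESENT  # overwrite based on direct occurrence
    return status


# Hypothetical data, invented for illustration only.
title = ["red", "bike", "helmet"]
description = {"this", "red", "bicycle", "ships", "free"}
table = {"bike": {"bicycle", "cycle"}}

statuses = {tok: assign_status(tok, description, table) for tok in title}
```

Here `red` ends up as present, `bike` as associated through the table entry for `bicycle`, and `helmet` keeps the default absent status.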

9. A method implemented on a processor-implemented segmentation module, the method comprising:
- receiving data from a client machine, the data comprising a product title and a product description;
  
  identifying a first token in the product title;
  
  receiving a token probability value associated with the first token;
  
  assigning a value to the first token, the value indicating that, one of:
  
  the first token also occurs in the product description, a lexical association exists between the first token and a second token in the product description, and the lexical association does not exist and the first token is absent from the product title;
  
  computing a relevance value of a segmented group of tokens that occur in the product description and include the first token with the assigned value without requiring previously defined data tagging of the data beforehand of an unstructured text, the relevance value of the segmented group computed based on the value assigned to the first token; and
  
  determining and store in memory an indication that the segmented group of tokens is a most relevant segmented group of tokens in the product description;
  
  wherein the assigning of the value to the first token includes:
  
  initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product description; and
  
  overwriting the stored initially assigned default value based on the first token occurring in the product description.
- Dependent claims (10, 11, 12, 13, 14, 15, 16)
  - 10. The method of claim 9, wherein:
    - the assigned value further indicates a relevance probability of the first token; and
      
      the computing of the relevance value of the segmented group of tokens is based on the relevance probability of the first token and further based on an irrelevance probability of the first token.
  - 11. The method of claim 10, the method further comprising determining the irrelevance probability based on a proportion of descriptions that contain the first token within a set of descriptions.
  - 12. The method of claim 9, the method further comprising determining that the lexical association exists between the first token and the second token by accessing a table that lexically associates the first token with the second token.
  - 13. The method of claim 9, wherein:
    - the computing of the relevance value of the segmented group of tokens computes the relevance value of a segment of the description; and
      
      the indicating indicates that the segment is a most relevant segment of the description.
  - 14. The method of claim 9, wherein:
    - the computing of the relevance value of the segmented group of tokens computes the relevance value of a sequence of tokens; and
      
      the indicating indicates that the sequence is a most relevant sequence of tokens in the description.
  - 15. The method of claim 9, wherein the indicating that the segmented group of tokens is the most relevant segmented group of tokens in the description includes modifying a font of the most relevant segmented group of tokens in the description.
  - 16. The method of claim 9, wherein the indicating that the segmented group of tokens is the most relevant segmented group of tokens in the description includes presenting the most relevant segmented group of tokens with the title of the description.
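
Claims 10 and 14 describe computing the relevance value of a sequence of tokens from per-token relevance and irrelevance probabilities, but this record does not state how they are combined. One possible reading is sketched below in Python: each token is scored by the log of its relevance-to-irrelevance ratio, and the contiguous run with the highest summed score is taken as the most relevant segment. The log-ratio score, the maximum-sum scan (Kadane's algorithm), the function name `most_relevant_segment`, and all sample numbers are assumptions chosen for illustration.

```python
import math


def most_relevant_segment(tokens, relevance, irrelevance):
    """Return (start, end, score) of the best contiguous run; end is exclusive.

    relevance and irrelevance are per-token probabilities aligned with tokens
    and must all be greater than zero.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    if not (len(tokens) == len(relevance) == len(irrelevance)):
        raise ValueError("inputs must have the same length")
    if min(relevance) <= 0 or min(irrelevance) <= 0:
        raise ValueError("probabilities must be positive")

    # Assumed per-token score: positive when relevance exceeds irrelevance.
    scores = [math.log(r / q) for r, q in zip(relevance, irrelevance)]

    best_start, best_end, best_score = 0, 1, scores[0]
    cur_start, cur_score = 0, 0.0
    for i, s in enumerate(scores):
        if cur_score <= 0:
            cur_start, cur_score = i, s  # start a fresh run at this token
        else:
            cur_score += s  # extend the current run
        if cur_score > best_score:
            best_start, best_end, best_score = cur_start, i + 1, cur_score
    return best_start, best_end, best_score


# Hypothetical description tokens and probabilities, for illustration only.
tokens = ["free", "shipping", "red", "mountain", "bicycle", "call", "now"]
relevance = [0.1, 0.1, 0.8, 0.6, 0.9, 0.1, 0.1]
irrelevance = [0.6, 0.6, 0.2, 0.2, 0.1, 0.5, 0.5]

start, end, score = most_relevant_segment(tokens, relevance, irrelevance)
segment = tokens[start:end]
```

With these sample numbers the selected segment is `red mountain bicycle`. The returned indices could then drive either presentation step in claims 15 and 16, such as changing the font of the segment or showing it next to the title.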

17. A system comprising:
- a processor-implemented segmentation module configured to:
  
  receive data from a client machine, the data comprising a product header and a product description;
  
  identify a first token in the product header;
  
  receive a token probability value associated with the first token;
  
  assign a value to the first token, the value indicating that, one of:
  
  the first token also occurs in the product description, a lexical association exists between the first token and a second token in the product description, and the lexical association does not exist and the first token is absent from the product header;
  
  compute a relevance value of a segmented group of tokens that occur in the product description and include the first token with the assigned value without requiring previously defined data tagging of the data beforehand of an unstructured text, the relevance value of the segmented group computed based on the value assigned to the first token; and
  
  determining and storing in memory an indication that the segmented group of tokens is a most relevant segmented group of tokens in the product description;
  
  wherein the assigning of the value to the first token includes:
  
  initially assigning and storing a default value that indicates the lexical association does not exist and the first token is absent from the product description, and overwriting the stored initially assigned default value based on the first token occurring in the product description.
- Dependent claims (18, 19, 20)
  - 18. The system of claim 17, wherein:
    - the header of the description is a body of text shorter than the description and provides information about the description.
  - 19. The system of claim 17, wherein:
    - the header of the description includes at least one of a title of the description, an abstract of the description, a summary of description, or a synopsis of the description.
  - 20. The system of claim 17, the processor-implemented segmentation module further configured to determine an irrelevance probability of the first token based on a proportion of descriptions that contain the first token within a set of descriptions, wherein:
    - the assigned value further indicates a relevance probability of the first token; and
      
      the computing of the relevance value of the segmented group of tokens is based on the relevance probability of the first token and further based on the irrelevance probability.

Current Assignee: PayPal, Inc. (PayPal Holdings, Inc.)
Original Assignee: PayPal, Inc. (PayPal Holdings, Inc.)
Inventors: Sarwar, Badrul M.; Mount, John A.
Primary Examiner(s): Ly, Anh
Application Number: US14/724,269
Publication Number: US 20150261761A1
Time in Patent Office: 579 Days
US Class Current: 1/1
CPC Class Codes:
G06F 16/22   Indexing; Data structures t...
G06F 16/24578   using ranking
G06F 16/285   Clustering or classification
G06F 40/284   Lexical analysis, e.g. toke...
G06F 40/289   Phrasal analysis, e.g. fini...
