Identifying product references in user-generated content

US 9,256,593 B2
Filed: 11/28/2012
Issued: 02/09/2016
Est. Priority Date: 11/28/2012
Status: Active Grant

2 Assignments

0 Petitions

Abstract

Systems and methods are disclosed herein for extracting products referenced in a document. A document is analyzed to identify a product type that is referenced in the document. Attributes are extracted from the document. A set of candidate products are identified corresponding to the extracted attributes. A score is calculated for the candidate products and the products are further selected or filtered based on the score, whitelist rules, and blacklist rules in order to identify one or more inferred products referenced by the document. The whitelist and blacklist rules may take as inputs a domain, a user identifier, and keywords included in the document. A set of sufficient attributes may be identified for each product type. Selection of a candidate product may be based at least in part on the document including all of the attributes in the set of sufficient attributes.
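
The candidate-identification step described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the catalog layout, the `SUFFICIENT_ATTRIBUTES` table, and the function name `candidate_products` are all hypothetical, and the sketch makes the simplifying assumption that the set of sufficient attributes doubles as the attribute set used for matching.

```python
from typing import Dict, List, Set

# Hypothetical table: attributes assumed sufficient to identify a product of each type.
SUFFICIENT_ATTRIBUTES: Dict[str, Set[str]] = {
    "camera": {"brand", "model"},
}

# Hypothetical product catalog; a real system would query a database instead.
CATALOG = [
    {"id": "cam-1", "type": "camera",
     "attrs": {"brand": "acme", "model": "x100", "color": "black"}},
    {"id": "cam-2", "type": "camera",
     "attrs": {"brand": "acme", "model": "x200", "color": "black"}},
    {"id": "tv-1", "type": "tv",
     "attrs": {"brand": "acme", "model": "x100"}},
]


def candidate_products(product_type: str, doc_attrs: Dict[str, str]) -> List[str]:
    """Return ids of catalog products matching the document's attribute values."""
    required = SUFFICIENT_ATTRIBUTES.get(product_type, set())
    # The document must mention every sufficient attribute for its product type.
    if not required or not required.issubset(doc_attrs):
        return []
    # Ignore document attributes that are outside the attribute set.
    relevant = {k: v for k, v in doc_attrs.items() if k in required}
    return [
        p["id"]
        for p in CATALOG
        if p["type"] == product_type
        and all(p["attrs"].get(k) == v for k, v in relevant.items())
    ]
```

In this sketch a document that mentions only a brand yields no candidates, because the hypothetical sufficient set for cameras also requires a model.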

21 Citations

14 Claims

1. A method for product extraction, the method comprising:
- receiving, by a computer system, a document;
- identifying, by the computer system, a product type for the document according to content of the document;
- extracting, by the computer system, product attributes and attribute values from the document;
- retrieving, by the computer system, an attribute set corresponding to the product type from a database;
- identifying, by the computer system, a first set of products that have at least the product attributes and the attribute values of the document that are included in the attribute set, the first set of products being nodes in a hierarchical taxonomy;
- filtering, by the computer system, the first set of products by:
  - identifying a common ancestor node in the hierarchical taxonomy having all of the first set of products as descendants;
  - identifying immediate child nodes of the common ancestor node;
  - identifying a majority child node having a major portion of the first set of products as descendants; and
  - identifying a second set of products including a portion of the first set of products that are descendants of the majority child node and excluding those products of the first set of products that are not descendants of the majority child node;
- selecting, by the computer system, an inferred product for the document from the second set of products;
- wherein:
  - identifying the second set of products comprises:
    - calculating a score for each product in the first set of products; and
    - selecting the second set of products based at least in part on the calculated scores for the first set of products;
  - selecting the second set of products comprises:
    - removing products from the first set of products if application of a blacklist rule to the document so indicates; and
  - selecting the inferred product comprises:
    - selecting the inferred product as specified by a whitelist rule if application of the whitelist rule to the document so indicates; and
  - at least one of the blacklist rule and the whitelist rule take as an input a list of keywords from the document.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1, wherein at least one of the blacklist rule and the whitelist rule take as the input an author of the document.
  - 3. The method of claim 1, wherein at least one of the blacklist rule and the whitelist rule take as the input a web domain of the document.
  - 4. The method of claim 1, wherein at least one of the blacklist rule and the whitelist rule take as the input a Uniform Resource Locator (“URL”) for the document.
  - 5. The method of claim 1, wherein calculating the score for each product of the first set of products further comprises calculating the score based on at least one of:
    - a proximity of context of the document to a context associated with the each product;
    - a similarity of attributes of the each product and the product attributes;
    - textual properties of the product attributes; and
    - a number of actual mentions of the each product and any synonyms thereof in the document.
  - 6. The method of claim 5, wherein the textual properties of the product attributes include a user of capitalization and a presence of digits in the product attributes.
  - 7. The method of claim 1, wherein identifying the second set of products based at least in part on the calculated scores for the first set of products further comprises:
    - removing products from the first set of products based on a comparison of the calculated scores thereof to a threshold;
    - wherein selecting the inferred product from the second set of products comprises identify a highest scoring product of the second set of products.
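
The taxonomy filtering recited in claim 1 (common ancestor, immediate children, majority child) can be sketched as follows. This is an illustrative reading, not the patentee's implementation: the `PARENT` map, the node names, and the function names are hypothetical, and "major portion" is assumed here to mean the child node covering the largest number of candidate products.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical taxonomy stored as child -> parent; the root has no parent.
PARENT: Dict[str, Optional[str]] = {
    "electronics": None,
    "cameras": "electronics",
    "tvs": "electronics",
    "cam-1": "cameras",
    "cam-2": "cameras",
    "tv-1": "tvs",
}


def path_from_root(node: str) -> List[str]:
    """Return the list of nodes from the taxonomy root down to `node`."""
    path: List[str] = []
    current: Optional[str] = node
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path[::-1]


def filter_by_majority_child(products: List[str]) -> List[str]:
    """Keep only products under the majority child of their common ancestor."""
    if len(set(products)) < 2:
        return list(products)
    paths = {p: path_from_root(p) for p in products}
    # The common ancestor is the last node of the shared root-first prefix.
    depth = 0
    for level in zip(*paths.values()):
        if len(set(level)) != 1:
            break
        depth += 1
    # Keep the index in range if one product is an ancestor of another.
    depth = min(depth, min(len(path) for path in paths.values()) - 1)
    if depth <= 0:
        return list(products)  # no proper common ancestor; leave the set as is
    # Count candidates under each immediate child of the common ancestor.
    child_counts = Counter(path[depth] for path in paths.values())
    majority_child, _ = child_counts.most_common(1)[0]
    return [p for p in products if paths[p][depth] == majority_child]
```

With the hypothetical taxonomy above, the candidates `cam-1`, `cam-2`, and `tv-1` share the ancestor `electronics`; its child `cameras` covers two of the three, so `tv-1` is excluded.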

8. A system comprising:
- one or more processors, the one or more processors embodied as one or more processing devices; and
- one or more non-transitory storage modules storing executable and operational data effective to cause the one or more processors to:
  - receive a document;
  - identify a product type for the document according to content of the document;
  - extract product attributes and attribute values from the document;
  - retrieve an attribute set corresponding to the product type from a database;
  - identify a first set of products that have at least the product attributes and the attribute values of the document that are included in the attribute set, the first set of products being nodes in a hierarchical taxonomy;
  - filter the first set of products by:
    - identifying a common ancestor node in the hierarchical taxonomy having all of the first set of products as descendants;
    - identifying immediate child nodes of the common ancestor node;
    - identifying a majority child node having a major portion of the first set of products as descendants; and
    - identifying a second set of products including a portion of the first set of products that are descendants of the majority child node and excluding those products of the first set of products that are not descendants of the majority child node;
  - select an inferred product for the document from the second set of products;
- wherein:
  - the executable and operational data are further effective to cause the one or more processors to identify the second set of products by:
    - calculating a score for each product in the first set of products; and
    - selecting the second set of products based at least in part on the calculated scores for the first set of products;
  - the executable and operational data are further effective to cause the one or more processors to select the second set of products by:
    - removing products from the first set of products if application of a blacklist rule to the document so indicates; and
  - wherein selecting the inferred product comprises selecting the inferred product as specified by a whitelist rule if application of the whitelist rule to the document so indicates; and
  - at least one of the blacklist rule and the whitelist rule take as an input a list of keywords from the document.
- Dependent claims (9, 10, 11, 12, 13, 14):
  - 9. The system of claim 8, wherein at least one of the blacklist rule and the whitelist rule take as the input an author of the document.
  - 10. The system of claim 8, wherein at least one of the blacklist rule and the whitelist rule take as the input a web domain associated with the document.
  - 11. The system of claim 8, wherein at least one of the blacklist rule and the whitelist rule take as the input a Uniform Resource Locator (“URL”) for the document.
  - 12. The system of claim 8, wherein the executable and operational data are further effective to cause the one or more processors to calculate the score for each product of the first set of products by calculating the score based on at least one of:
    - a proximity of context of the document to a context associated with the each product;
    - a similarity of attributes of the each product and the product attributes;
    - textual properties of the product attributes; and
    - a number of actual mentions of the each product and any synonyms thereof in the document.
  - 13. The system of claim 12, wherein the textual properties of the product attributes include a user of capitalization and a presence of digits in the product attributes.
  - 14. The system of claim 8, wherein the executable and operational data are further effective to cause the one or more processors to identify the second set of products by:
    - removing products from the first set of products based on a comparison of the calculated scores thereof to a threshold;
    - wherein selecting the inferred product from the second set of products comprises identify a highest scoring product of the second set of products.
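
The score threshold, blacklist rule, and whitelist rule recited in the claims can be combined in a short sketch. It is one possible arrangement under stated assumptions, not the claimed system: both rule functions, their keyword triggers, the product ids, and the default threshold of 0.5 are hypothetical, and the order in which the rules are applied (whitelist first, then blacklist and threshold) is an assumption of this sketch. Scores are taken as given; how they are computed is out of scope here.

```python
from typing import Dict, Optional, Set


def blacklist_rule(keywords: Set[str]) -> Set[str]:
    """Hypothetical rule: repair-related documents never refer to 'case-1'."""
    return {"case-1"} if "repair" in keywords else set()


def whitelist_rule(keywords: Set[str]) -> Optional[str]:
    """Hypothetical rule: 'acme' together with 'x100' always means 'cam-1'."""
    return "cam-1" if {"acme", "x100"} <= keywords else None


def select_inferred_product(
    scores: Dict[str, float], keywords: Set[str], threshold: float = 0.5
) -> Optional[str]:
    """Pick one product from scored candidates using document keywords."""
    # A whitelist rule that fires decides the inferred product directly.
    forced = whitelist_rule(keywords)
    if forced is not None:
        return forced
    # Drop blacklisted products and products scoring below the threshold.
    banned = blacklist_rule(keywords)
    survivors = {
        product: score
        for product, score in scores.items()
        if score >= threshold and product not in banned
    }
    if not survivors:
        return None
    # The highest-scoring remaining product is the inferred product.
    return max(survivors, key=survivors.get)
```

Rules that take an author, a web domain, or a URL as input, as in the dependent claims, would fit the same shape with an extra argument.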

Current Assignee: Walmart Apollo, LLC (WalMart Inc.)
Original Assignee: Wal-Mart Stores Texas LLC (WalMart Inc.)
Inventors: Lamba, Digvijay Singh; Chai, Xiaoyong; Whisler, Nicole
Primary Examiner(s): Liao, Jason
Assistant Examiner(s): HTAY, LIN LIN M
Application Number: US13/688,060
Publication Number: US 20140149105A1
Time in Patent Office: 1,168 Days
Field of Search: 707/704
US Class Current: 1/1
CPC Class Codes:
G06F 40/279 Recognition of textual enti...
G06F 40/284 Lexical analysis, e.g. toke...
