PRODUCT LINE EXTRACTION
Abstract
Methods, systems and computer readable media for extracting product lines from a plurality of product titles are provided. In one embodiment, the plurality of product titles are broken into tokens. Association rules are calculated for individual tokens and pairs of tokens. Brand specific terms and product class specific terms within the product titles are identified. In one embodiment, a token tree is used to identify product lines within the list of product titles using the association rules, the brand specific terms, and the product class specific terms.
20 Claims
1. One or more computer-readable media having computer-executable instructions embodied thereon for performing a method of extracting product lines from a plurality of product titles, the method comprising:
receiving the plurality of product titles;
breaking the plurality of product titles into a plurality of tokens, wherein the plurality of tokens includes unigrams and bigrams;
generating an association rule for each of a plurality of token pairs, wherein a token pair may include two of the bigrams, two of the unigrams, or one bigram and one unigram;
generating a plurality of brand specific tokens that form part of a brand name;
generating a plurality of product class specific tokens using the plurality of brand specific tokens and the association rule for each of the plurality of token pairs;
generating a plurality of model specific tokens that form part of a product model; and
generating a plurality of product lines from the plurality of tokens.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
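The tokenizing step of claim 1 can be illustrated with a short sketch. It is not the patented implementation: the function name `tokenize_title`, the lowercasing, the whitespace splitting, and the sample title are all assumptions made for illustration. It only shows what a token set containing both unigrams and bigrams might look like.

```python
from typing import List


def tokenize_title(title: str) -> List[str]:
    """Return the unigrams and adjacent-word bigrams of one product title.

    Assumption: tokens are lowercase, whitespace-delimited words. The claim
    does not say how titles are normalized or split.
    """
    words = title.lower().split()
    unigrams = list(words)
    # A bigram here is two neighbouring words joined by a single space.
    bigrams = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return unigrams + bigrams


# Hypothetical product title, invented for this example.
tokens = tokenize_title("Acme TurboVac 300")
print(tokens)
```

For a title of n words this yields n unigrams and n - 1 bigrams, so the token pairs named in the claim (unigram with unigram, bigram with bigram, or mixed) can all be drawn from the same list.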
10. A computerized system for generating a list of product lines from a list of product titles, the system including:
an association rule builder that calculates an associative probability that a first token is associated with a second token, wherein the first token and the second token are generated from the list of product titles;
a product class recognizer that identifies product class specific tokens within the list of product titles, wherein the product class specific tokens describe a product category that is recognized across multiple brands; and
a product line extractor that identifies product lines within the list of product titles.
Dependent claims: 11, 12, 13, 14
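One plausible reading of the "associative probability" in claim 10 is a conditional co-occurrence frequency: of the titles that contain the first token, the fraction that also contain the second. The claim does not give a formula, so the sketch below rests on that assumption. The names `associative_probabilities` and `split_words` and the sample titles are hypothetical.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, List, Tuple


def split_words(title: str) -> List[str]:
    # Assumption: simple lowercase whitespace tokens (unigrams only).
    return title.lower().split()


def associative_probabilities(titles: List[str]) -> Dict[Tuple[str, str], float]:
    """Estimate P(second token in title | first token in title).

    This conditional-frequency definition is an assumption, not a formula
    taken from the patent text.
    """
    token_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for title in titles:
        present = set(split_words(title))  # count each token once per title
        token_counts.update(present)
        pair_counts.update(permutations(present, 2))  # ordered pairs
    return {
        (first, second): count / token_counts[first]
        for (first, second), count in pair_counts.items()
    }


# Hypothetical titles, invented for this example.
titles = ["Acme TurboVac 300", "Acme TurboVac 500", "Acme BlendMax"]
probs = associative_probabilities(titles)
print(probs[("acme", "turbovac")])  # 2 of the 3 "acme" titles contain "turbovac"
print(probs[("turbovac", "acme")])  # every "turbovac" title contains "acme"
```

The measure is asymmetric under this definition: a product line token tends to predict its brand more strongly than the brand predicts the product line, which is the kind of signal a product class recognizer or product line extractor could use.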
15. One or more computer-readable media having computer-executable instructions embodied thereon for performing a method of extracting product lines from a group of product titles, the method comprising:
tokenizing the group of product titles to create a plurality of tokens, wherein the plurality of tokens includes unigrams and bigrams;
generating an association rule for token pairs generated from the plurality of tokens, wherein the association rule indicates how frequently an individual pairing of tokens occur together within the group of product titles;
creating a product line token tree that includes a brand token as a root node and suffix tokens as second level nodes;
analyzing token branches on the product line token tree to generate a plurality of product lines, wherein a token branch includes the root node and a suffix token; and
storing the plurality of product lines.
Dependent claims: 16, 17, 18, 19, 20
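The token tree of claim 15 can be pictured as a two-level structure: a brand token at the root and the tokens that follow it as second-level nodes. The sketch below is a simplified illustration under several assumptions: the brand tokens are already known, the suffix is the single word immediately after the brand, and a branch counts as a product line when it appears in at least `min_count` titles. None of these criteria come from the claim text, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_token_tree(titles: List[str], brands: Set[str]) -> Dict[str, Dict[str, int]]:
    """Map each brand (root node) to its suffix tokens (second-level nodes).

    Each suffix carries the number of titles in which it directly follows
    the brand. Assumption: brands are supplied as lowercase single words.
    """
    tree: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for title in titles:
        words = title.lower().split()
        for position, word in enumerate(words[:-1]):
            if word in brands:
                tree[word][words[position + 1]] += 1
    return tree


def extract_product_lines(tree: Dict[str, Dict[str, int]], min_count: int = 2) -> List[str]:
    """Keep branches (brand + suffix) seen in at least min_count titles.

    The frequency threshold is an assumed stand-in for the branch analysis,
    which the claim does not spell out.
    """
    lines = [
        f"{brand} {suffix}"
        for brand, suffixes in tree.items()
        for suffix, count in suffixes.items()
        if count >= min_count
    ]
    return sorted(lines)


# Hypothetical titles and brands, invented for this example.
titles = [
    "Acme TurboVac 300",
    "Acme TurboVac 500 Cordless",
    "Acme BlendMax",
    "Zenix AeroFan 12",
    "Zenix AeroFan 16",
]
tree = build_token_tree(titles, brands={"acme", "zenix"})
product_lines = extract_product_lines(tree)
print(product_lines)
```

A fuller version would presumably weigh each branch with the association rules and exclude product class and model specific tokens, as the abstract suggests; the count threshold only stands in for that analysis.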
Specification