SYSTEMS AND METHODS FOR IDENTIFYING SETS OF SIMILAR PRODUCTS
Abstract
Embodiments of the present invention relate to systems and methods for determining sets of products which are similar to each other in terms of consumers' wants and needs. Queries are performed on a particular product. Documents relating to the query are received and stored. A dictionary is created from the received documents, whereby the documents, which are text files, are scrubbed of certain data to create a scrubbed text file. Topic modeling is then performed on the cleansed text file. Various methods can be used to perform topic modeling, including, but not limited to, latent semantic analysis, nonnegative matrix factorization, and singular value decomposition.
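As an illustration of one of the topic-modeling methods named above, the following is a minimal sketch of nonnegative matrix factorization applied to a document-term count matrix. The multiplicative update rule, the toy counts, the choice of two topics, and the rule of assigning each document to its highest-weight topic are all assumptions made for this example; the source does not specify how the factorization is computed or how clusters are read from it.

```python
import random


def transpose(A):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def squared_error(V, W, H):
    """Squared Frobenius distance between V and the product W H."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative (documents x words) matrix V into
    W (documents x k) and H (k x words) using multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(Wt, matmul(W, H))
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(matmul(W, H), Ht)
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H


# Toy counts (hypothetical): rows are documents about products, columns are
# dictionary words such as "cola", "soda", "fizzy", "detergent", "laundry", "stain".
V = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 2, 0, 0, 0],
    [0, 0, 0, 3, 2, 1],
    [0, 0, 0, 2, 3, 2],
]

W, H = nmf(V, k=2)

# Assumed assignment rule: each document joins the topic with its largest weight.
clusters = [max(range(len(row)), key=row.__getitem__) for row in W]
print(clusters)
```

Each row of `H` can be read as a topic expressed as weights over dictionary words, and each row of `W` as a document's weights over topics. Products whose documents load on the same topic would be candidates for the same set of similar products.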
22 Claims
1. A computer-implemented method comprising:
receiving an identification of a product for clustering into one or more sets of similar products;
transmitting a query to one or more information sources having information related to the product;
receiving from the one or more internet websites information relevant to the product;
merging and storing at least a portion of the received information into one or more databases;
transforming at least a portion of the received information into a text file;
cleansing at least a portion of the text file to create a cleansed text file;
creating a dictionary from at least a portion of the cleansed text file, wherein the dictionary comprises words found in the cleansed text file; and
performing topic modeling on the cleansed text file to determine one or more clusters of one or more substitutes of the product.
14. A non-transitory computer-readable medium having software instructions stored thereon, which, when executed by a client device, causes the client device to perform the operations comprising:
receiving an identification of a product for clustering into one or more sets of similar products;
transmitting a query to one or more internet websites having information related to the product;
receiving from the one or more internet websites information relevant to the product;
merging and storing the received information into one or more databases;
transforming all received information into a text file;
cleansing the text file to create a cleansed text file;
creating a dictionary from the cleansed text file, wherein the dictionary comprises all words found in the cleansed text file; and
performing topic modeling on the cleansed text file to determine a cluster of one or more substitutes of the product.
22. A system for topic modeling comprising:
a client computer comprising a processor configured to execute computer-executable instructions, the instructions comprising instructions for:
  receiving an identification of a product for clustering into one or more sets of similar products;
  transmitting a query to one or more internet websites having information related to the product;
  receiving from the one or more internet websites information relevant to the product;
  merging and storing at least a portion of the received information into one or more databases;
  transforming at least a portion of the received information into a text file;
  cleansing at least a portion of the text file to create a cleansed text file;
  creating a dictionary from at least a portion of the cleansed text file, wherein the dictionary comprises words found in the cleansed text file; and
  performing topic modeling on the cleansed text file to determine one or more substitutes of the product; and
a database for storing:
  the text file;
  the cleansed text file; and
  the dictionary.
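The cleansing and dictionary-creation steps recited in the claims can be sketched as follows. This is only an illustrative reading: the claims do not say what is removed during cleansing, so the rules here (dropping HTML tags, digits, punctuation, one-letter tokens, and a short stopword list) and the function names `cleanse`, `build_dictionary`, and `to_count_vector` are assumptions, as is the sample text.

```python
import re
from collections import Counter

# Hypothetical stopword list; the source does not name one.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "for", "with", "is", "in", "to"})


def cleanse(text):
    """Turn raw retrieved text into a list of cleansed word tokens."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop HTML tags
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits and punctuation
    return [t for t in text.split() if len(t) > 1 and t not in STOPWORDS]


def build_dictionary(token_lists):
    """Map every word found in the cleansed documents to an integer id."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {word: idx for idx, word in enumerate(vocab)}


def to_count_vector(tokens, dictionary):
    """Represent one cleansed document as word counts over the dictionary."""
    counts = Counter(tokens)
    vec = [0] * len(dictionary)
    for word, count in counts.items():
        vec[dictionary[word]] = count
    return vec


# Hypothetical retrieved text for two products.
raw_docs = [
    "<p>The Fizzy Cola is a sweet soda!</p>",
    "Laundry detergent for tough stains, 2 L",
]

tokens = [cleanse(doc) for doc in raw_docs]
dictionary = build_dictionary(tokens)
vectors = [to_count_vector(t, dictionary) for t in tokens]

print(tokens)
print(dictionary)
print(vectors)
```

The resulting count vectors form the rows of a document-term matrix, which is the usual input to topic-modeling methods such as those listed in the abstract.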
Specification