Automated categorization of products in a merchant catalog
First Claim
1. A method implemented on at least one machine having at least one processor, storage, and a communication platform connected to a network, the method comprising:
receiving information about a plurality of products, the information about each of the plurality of products comprising a plurality of text metadata fields;
receiving a set of categories arranged in one or more hierarchical structures;
automatically determining, using the at least one processor, associations between the plurality of products and the set of categories by estimating a probability that each of the products belongs to each category in the set of categories by generating a feature vector for the each product by concatenating the respective plurality of text metadata fields into a paragraph, the associations specifying one or more categories from the set of categories associated with each of the plurality of products based upon at least one of the plurality of text metadata fields in accordance with each of the plurality of products, wherein each of the associations specifies that the respective product belongs to the respective one or more categories, the estimated probability being computed based on a prior probability of each of the products belonging to a particular category;
receiving a search query input by a user to a search engine via the network;
determining, using the search engine executed by the at least one processor, a first product responsive to the search query;
identifying, using the at least one processor, a first category associated with a first hierarchical structure to which the first product belongs based upon the determined associations;
identifying, using the at least one processor, a second category associated with a second hierarchical structure to which the first product belongs based upon the determined associations, wherein the second hierarchical structure is independent of the first hierarchical structure; and
identifying, using the at least one processor, a second product belonging to the second category as a search result provided to the user via the network based at least upon the determined associations.
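The last three steps of the claim (a first category in one hierarchy, a second category in an independent hierarchy, then a second product from that second category) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the product identifiers, the hierarchy names `by_type` and `by_activity`, the category paths, and the `related_products` helper do not come from the patent, and the first product is taken as already returned by the search engine.

```python
# Hypothetical precomputed associations: product -> {hierarchy name -> category}.
# In the claim these associations come from the classification step.
ASSOCIATIONS = {
    "trail-shoe": {"by_type": "Footwear/Running", "by_activity": "Outdoor/Hiking"},
    "trekking-pole": {"by_type": "Accessories/Poles", "by_activity": "Outdoor/Hiking"},
    "dress-shoe": {"by_type": "Footwear/Formal", "by_activity": "Office/Business"},
}


def related_products(first_product, first_hierarchy, second_hierarchy, associations):
    """Look up the first product's category in two independent hierarchies and
    return other products that share its category in the second hierarchy."""
    categories = associations[first_product]
    first_category = categories[first_hierarchy]
    second_category = categories[second_hierarchy]
    second_products = sorted(
        product
        for product, cats in associations.items()
        if product != first_product and cats.get(second_hierarchy) == second_category
    )
    return first_category, second_category, second_products


result = related_products("trail-shoe", "by_type", "by_activity", ASSOCIATIONS)
print(result)
```

In this toy catalog the trekking pole shares no category with the trail shoe in the type hierarchy, yet it surfaces as a related result because both sit under the same category in the activity hierarchy.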
Abstract
A system and method are described for large-scale, automated classification of products. The system and method receive information about products, wherein such information includes one or more text metadata fields associated with each product, receive a set of categories, and automatically select one or more categories from the set of categories to which each product belongs based upon at least one of the one or more text metadata fields associated with each product. A machine learning classifier may be used to automatically select the one or more categories to which each product belongs by operating upon a feature vector for each product derived from text metadata fields of the product description. The machine learning classifier may be trained using a set of pre-categorized product descriptions. The product-category associations generated by the system and method can be used to improve search engine results or product recommendations to consumers.
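The abstract does not name a specific classifier. As a rough sketch of one possibility, the Python below concatenates a product's text metadata fields into a paragraph, builds a bag-of-words feature vector, and estimates per-category probabilities with a multinomial naive Bayes model that starts from a category prior. The choice of naive Bayes, the add-one smoothing, the field names, and all function names are assumptions made for illustration, not the patent's stated implementation.

```python
import math
from collections import Counter, defaultdict


def to_paragraph(product):
    """Concatenate a product's text metadata fields into one paragraph."""
    return " ".join(str(value) for value in product.values())


def featurize(paragraph):
    """Bag-of-words feature vector: lowercase token -> count."""
    return Counter(paragraph.lower().split())


def train(labeled_products):
    """Train on (product, category) pairs of pre-categorized descriptions."""
    category_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for product, category in labeled_products:
        features = featurize(to_paragraph(product))
        category_counts[category] += 1
        word_counts[category].update(features)
        vocabulary.update(features)
    return category_counts, word_counts, vocabulary


def predict_proba(model, product):
    """Estimate P(category | product) from the prior and token likelihoods."""
    category_counts, word_counts, vocabulary = model
    total = sum(category_counts.values())
    features = featurize(to_paragraph(product))
    log_scores = {}
    for category, count in category_counts.items():
        score = math.log(count / total)  # prior probability of the category
        denominator = sum(word_counts[category].values()) + len(vocabulary)
        for token, n in features.items():
            if token in vocabulary:  # tokens never seen in training are skipped
                score += n * math.log((word_counts[category][token] + 1) / denominator)
        log_scores[category] = score
    # Normalize in log space so the probabilities sum to 1.
    peak = max(log_scores.values())
    unnormalized = {c: math.exp(s - peak) for c, s in log_scores.items()}
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}


# Hypothetical pre-categorized training data.
TRAINING = [
    ({"title": "trail running shoes", "description": "lightweight mesh shoes"}, "Footwear"),
    ({"title": "leather boots", "description": "waterproof hiking boots"}, "Footwear"),
    ({"title": "usb charger", "description": "fast phone charger cable"}, "Electronics"),
    ({"title": "wireless headphones", "description": "bluetooth headphones"}, "Electronics"),
]

model = train(TRAINING)
probs = predict_proba(model, {"title": "hiking shoes", "description": "waterproof"})
print(probs)
```

With this toy data the unseen "hiking shoes" product scores higher for Footwear than for Electronics, since all three of its tokens appear only in the Footwear training descriptions.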
20 Claims
Dependent claims of claim 1: 2, 3, 4, 5, 6, 7, 8, 9, 15, 16, 18
10. A system, comprising:
one or more computing machines comprising hardware;
a product classifier that is executed by at least one of the one or more computing machines to receive information about a plurality of products, the information about each of the plurality of products comprising a plurality of text metadata fields, to receive a set of categories arranged in one or more hierarchical structures, and to automatically associate each product with one or more categories from the set of categories by estimating a probability that the each product belongs to each category in the set of categories by generating a feature vector by concatenating the plurality of text metadata fields associated with the each product into a paragraph, wherein each of the associations specifies that the respective product belongs to the respective one or more categories, the estimated probability being computed based on a prior probability of each of the products belonging to a particular category; and
a search engine that is executed by at least one of the one or more computing machines to receive a search query input by a user to the search engine via a computer network, to determine a first product responsive to the search query, to identify a first category associated with a first hierarchical structure to which the first product belongs based upon the associations between categories and products generated by the product classifier, to identify a second category associated with a second hierarchical structure to which the first product belongs based upon the determined associations, and to identify a second product belonging to the second category as a search result provided to the user via the computer network based at least upon the associations between categories and products generated by the product classifier, wherein the second hierarchical structure is independent of the first hierarchical structure.
Dependent claims of claim 10: 11, 12, 13, 14, 17, 19
20. A method implemented on at least one machine having at least one processor, storage, and a communication platform connected to a network, the method comprising:
receiving information about a plurality of products, the information about each of the plurality of products comprising one or more text metadata fields;
receiving a set of categories arranged in a first hierarchical structure associated with a first merchant and in a second hierarchical structure associated with a second merchant different from the first merchant;
estimating, using the at least one processor, a probability that each product of the plurality of products belongs to each category in the set of categories based upon at least one of the corresponding one or more text metadata fields;
automatically determining, using the at least one processor, associations between the plurality of products and the set of categories, the associations specifying one category from the set of categories to be associated with each of the plurality of products based upon the corresponding estimated probability exceeding a certain threshold, wherein the estimated probability is computed based on a prior probability of each of the products belonging to a particular category;
automatically determining, using the at least one processor, an association for each of the plurality of products with at least one additional category in the set of categories, when the estimated probability that the product belongs to the at least one additional category exceeds the certain threshold;
receiving a search query input by a user to a search engine via the network;
determining, using the search engine executed by the at least one processor, a first product responsive to the search query;
identifying, using the at least one processor, at least one category of the set of categories to which the first product belongs based at least upon the associations determined prior to the search query input being received; and
identifying, using the at least one processor, at least one other category of the set of categories to which the first product belongs based at least upon the associations determined prior to the search query input being received, wherein the at least one other category is different from the at least one category; and
identifying, using the at least one processor, a second product belonging to the at least one other category as a search result provided to the user via the computer network based at least upon the associations determined prior to the search query input being received, wherein the first and second categories belong to the first and second hierarchical structures, respectively, and wherein the first hierarchical structure is independent of the second hierarchical structure.
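The threshold test recited in claim 20 can be illustrated with a short Python sketch: a product is associated with every category whose estimated probability exceeds the threshold, so it may end up in one category or several. The function name `assign_categories`, the example category names, the probability values, and the threshold of 0.5 are all assumptions chosen for illustration; the claim itself does not fix a threshold value.

```python
def assign_categories(probabilities, threshold):
    """Return the categories whose estimated probability exceeds the threshold,
    ordered from most to least probable."""
    selected = [(category, p) for category, p in probabilities.items() if p > threshold]
    selected.sort(key=lambda pair: pair[1], reverse=True)
    return [category for category, _ in selected]


# Hypothetical per-category estimates for one product, spanning two merchants' hierarchies.
estimates = {
    "MerchantA/Footwear": 0.9,
    "MerchantB/Outdoor": 0.6,
    "MerchantB/Office": 0.1,
}
assigned = assign_categories(estimates, threshold=0.5)
print(assigned)
```

Because the associations are computed ahead of time, a search engine handling a query only needs to look them up, which matches the claim's wording that the associations are determined before the search query input is received.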
Specification