Attribute fill using text extraction
First Claim
1. A computer-implemented method, comprising:
identifying, for a first item of a plurality of items, at least one item category associated with the first item and at least one second item;
determining, by a computer system, a plurality of attributes common to the identified at least one item category based at least in part on identifying the plurality of attributes inherited from at least one parent item category associated with the first item, the at least one parent item category being a parent node to a child node associated with the at least one item category associated with the first item and the at least one second item in a browse-node hierarchy;
identifying at least one attribute of the plurality of attributes that is not populated for the first item;
extracting, from the at least one second item of the plurality of items, a plurality of existing values assigned to the at least one attribute of the plurality of attributes;
identifying, from text of the first item, a plurality of candidate values, the plurality of candidate values comprising at least one candidate value for the at least one attribute of the plurality of attributes;
associating a set of priority indicators with the plurality of candidate values, the set of priority indicators generated based at least in part on pre-determined rules that utilize a context associated with a candidate value of the plurality of candidate values to represent importance of the candidate value in comparison to other potential candidate values, the context based at least in part on a set of candidate values of the plurality of candidate values identified by the pre-determined rules that are associated with a position of a given candidate value within the text;
implementing a rule engine that implements a rule set specified by a user to iteratively alter the rule set after removing one or more candidate values of the plurality of candidate values to prioritize remaining candidate values of the plurality of candidate values based at least in part on the set of priority indicators;
filtering the prioritized remaining candidate values from the rule engine to determine a potential value based at least in part on the plurality of existing values associated with the at least one attribute of the plurality of attributes and the prioritized remaining candidate values; and
populating, by the computing system, the at least one attribute of the plurality of attributes of the first item with the potential value.
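The category-inheritance and value-mining steps of claim 1 can be illustrated with a minimal Python sketch. It assumes a hypothetical `BrowseNode` class in which each category declares its own attributes and points to its parent, so the attributes a category expects are its own plus those of every ancestor. Items are plain dictionaries, and an absent or empty value counts as unpopulated. The names `unpopulated_attributes` and `existing_values` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class BrowseNode:
    """A node in a hypothetical browse-node hierarchy (category tree)."""
    name: str
    attributes: Set[str] = field(default_factory=set)  # declared at this node only
    parent: Optional["BrowseNode"] = None

    def inherited_attributes(self) -> Set[str]:
        """Attributes declared on this node plus those inherited from every ancestor."""
        attrs: Set[str] = set()
        node: Optional[BrowseNode] = self
        while node is not None:
            attrs |= node.attributes
            node = node.parent
        return attrs


def unpopulated_attributes(item: Dict[str, str], node: BrowseNode) -> Set[str]:
    """Attributes the item's category expects but the item has no value for."""
    return {a for a in node.inherited_attributes() if not item.get(a)}


def existing_values(attribute: str, other_items: List[Dict[str, str]]) -> Set[str]:
    """Values already assigned to `attribute` on other items in the category."""
    return {it[attribute] for it in other_items if it.get(attribute)}


# "Laptops" (child node) inherits "brand" and "color" from "Electronics" (parent node).
electronics = BrowseNode("Electronics", {"brand", "color"})
laptops = BrowseNode("Laptops", {"screen_size"}, parent=electronics)

first_item = {"title": "Acme 15-inch laptop, silver", "brand": "Acme", "screen_size": ""}
second_items = [
    {"title": "Zeta laptop", "brand": "Zeta", "color": "black", "screen_size": "13 inch"},
    {"title": "Acme notebook", "brand": "Acme", "color": "silver", "screen_size": "15 inch"},
]

print(sorted(unpopulated_attributes(first_item, laptops)))  # ['color', 'screen_size']
print(sorted(existing_values("color", second_items)))       # ['black', 'silver']
```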
1 Assignment
0 Petitions
Abstract
Systems and methods involve filling missing attribute values from unstructured text. A computing device may provide a plurality of items, such as an item catalog for an electronic marketplace. When an item is found to have a missing attribute value, a plurality of existing values for that attribute is compiled by mining other items. Text associated with the item is parsed to determine possible values for the attribute. From those possible values, the most likely value is identified and the missing attribute value is populated with that value.
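A toy version of the flow in the abstract is sketched below in Python. It assumes that candidate values are found by matching the existing values mined from other items against the item's text (case-insensitive, on word boundaries), and that the "most likely" value is simply the candidate assigned most often on other items. That frequency heuristic is an assumption chosen for brevity; the claims describe rule-based prioritization and filtering instead. `find_candidates` and `most_likely` are hypothetical names.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple


def find_candidates(text: str, known_values: List[str]) -> List[Tuple[str, int]]:
    """Return (value, offset) for every known value mentioned in the text,
    matched case-insensitively on word boundaries ("red" does not match "bored")."""
    hits = []
    for value in set(known_values):
        pattern = r"\b" + re.escape(value) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((value, m.start()))
    return sorted(hits, key=lambda h: h[1])  # order of appearance in the text


def most_likely(text: str, known_values: List[str]) -> Optional[str]:
    """Choose the candidate assigned most often on other items.
    max() returns the first maximal element, so the earliest mention wins ties."""
    counts = Counter(known_values)
    candidates = find_candidates(text, known_values)
    if not candidates:
        return None
    return max(candidates, key=lambda h: counts[h[0]])[0]


# "color" values mined from other items in the same category.
existing = ["black", "silver", "silver", "blue"]
text = "Sleek black bezel around a silver aluminium body"
print(find_candidates(text, existing))  # [('black', 6), ('silver', 27)]
print(most_likely(text, existing))      # silver
```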
79 Citations
18 Claims
8. A non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by a processor, configure the processor to perform operations comprising:
identifying at least one first item of a plurality of items for which an attribute is unpopulated based at least in part on identifying a plurality of attributes inherited from at least one parent item category associated with the at least one first item and at least one second item, the at least one parent item category being a parent node to a child node associated with at least one first item and the at least one second item in a browse-node hierarchy;
extracting a plurality of existing values for the attribute based at least in part on the at least one second item of the plurality of items, the at least one second item having the attribute populated;
associating a set of priority indicators with the plurality of existing values, the set of priority indicators generated based at least in part on pre-determined rules that utilize a context associated with an existing value of the plurality of existing values to represent importance of the existing value in comparison to other existing values, the context based at least in part on a set of existing values of the plurality of existing values identified by the pre-determined rules that are associated with a position of a given existing value within text of the at least one first item;
implementing a rule engine that implements a rule set specified by a user to iteratively alter the rule set after removing one or more existing values of the plurality of existing values to prioritize remaining existing values of the plurality of existing values based at least in part on the set of priority indicators;
determining a potential value for the attribute from the text associated with the at least one first item based at least in part on filtering the plurality of existing values and the prioritized remaining existing values from the rule engine; and
populating the attribute of the at least one first item of the plurality of items with the determined potential value.
Dependent claims: 9, 10, 11, 12, 13, 14.
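The priority-indicator and rule-engine steps admit many implementations; the Python sketch below is one hypothetical reading, not the patented method. It assumes a candidate's context is the set of other candidates mentioned within a fixed window of its position, so a candidate with many neighbors (likely one option in a list) gets a lower priority. The rule engine retires each user rule once it has removed candidates, which alters the rule set, and repeats until no active rule fires. `Candidate`, `assign_priorities`, `run_rule_engine`, the window size, and the sample rule `drop_low_priority` are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    value: str
    position: int          # character offset of the mention in the item text
    priority: float = 0.0  # priority indicator, set by assign_priorities()


def assign_priorities(cands: List[Candidate], window: int = 15) -> None:
    """Hypothetical pre-determined rule: the context of a candidate is the set of
    other candidates within `window` characters of it. Many neighbors suggest a
    list of options ("available in black, blue, red"), so priority is lowered."""
    for c in cands:
        neighbors = [o for o in cands
                     if o is not c and abs(o.position - c.position) <= window]
        c.priority = 1.0 / (1 + len(neighbors))


# A user rule inspects the remaining candidates and returns those to remove.
Rule = Callable[[List[Candidate]], List[Candidate]]


def run_rule_engine(cands: List[Candidate], rules: List[Rule]) -> List[Candidate]:
    """Apply user rules repeatedly; a rule that removes something is retired
    (the rule set is altered), and the loop ends when no active rule fires."""
    remaining = list(cands)
    active = list(rules)
    changed = True
    while changed and remaining:
        changed = False
        for rule in list(active):
            doomed = {id(c) for c in rule(remaining)}
            if doomed:
                remaining = [c for c in remaining if id(c) not in doomed]
                active.remove(rule)
                changed = True
    # Prioritized remaining candidates, highest priority first.
    return sorted(remaining, key=lambda c: c.priority, reverse=True)


def drop_low_priority(cs: List[Candidate]) -> List[Candidate]:
    """Sample user rule: remove candidates whose priority is below 0.5."""
    return [c for c in cs if c.priority < 0.5]


text = "Acme laptop in silver. Also available in black, blue, red."
cands = [Candidate(v, text.index(v)) for v in ("silver", "black", "blue", "red")]
assign_priorities(cands)
result = run_rule_engine(cands, [drop_low_priority])
print([(c.value, c.priority) for c in result])  # [('silver', 1.0)]
```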
15. A system, comprising:
a processor; and
a memory device including instructions that, when executed by the processor, cause the system to:
provide a first item categorization for a set of items;
determine at least one attribute associated with the first item categorization;
identify at least one first item of the set of items for which the associated at least one attribute is empty based at least in part on identifying a plurality of attributes inherited from a parent item category associated with the at least one first item and at least one second item, the parent item category being a parent node to a child node associated with the at least one first item and the at least one second item in a browse-node hierarchy;
determine, from one or more second items of the set of items, a range of existing values for the attribute;
associate a set of priority indicators with the range of existing values, the set of priority indicators generated based at least in part on pre-determined rules that utilize a context associated with an existing value of the range of existing values to represent importance of the existing value in comparison to other existing values, the context based at least in part on a set of existing values of the plurality of existing values identified by the pre-determined rules that are associated with a position of a given existing value within text of the at least one first item;
implement a rule engine that implements a rule set specified by a user to iteratively alter the rule set after removing one or more values from the range of existing values to prioritize remaining values of the range of existing values based at least in part on the set of priority indicators;
filter the prioritized remaining values from the rule engine to determine at least one potential value for the attribute based at least in part on the range of existing values and the prioritized remaining values;
determine, from the at least one potential value for the attribute, a probable value; and
set the at least one attribute associated with the first item to the probable value.
Dependent claims: 16, 17, 18.
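Claim 15 derives a range of existing values from other items, filters potential values against it, and then picks a probable value. As an assumption (the claim does not say the range is numeric), the Python sketch below treats the range as a numeric interval for a length attribute measured in inches. The regular expression `INCHES`, the functions `value_range` and `probable_value`, and the tie-breaking rule (most frequent mention, earliest first) are hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical pattern for a length in inches, e.g. "15.6 inch", "15.6-inch", "13 inches".
INCHES = re.compile(r"(\d+(?:\.\d+)?)\s*-?\s*inch(?:es)?\b", re.IGNORECASE)


def value_range(existing_values: List[str]) -> Optional[Tuple[float, float]]:
    """(min, max) of the inch values assigned on other items, or None if none parse."""
    nums = []
    for v in existing_values:
        m = INCHES.search(v)
        if m:
            nums.append(float(m.group(1)))
    return (min(nums), max(nums)) if nums else None


def probable_value(text: str, existing_values: List[str]) -> Optional[float]:
    bounds = value_range(existing_values)
    if bounds is None:
        return None
    lo, hi = bounds
    # Potential values: every inch measurement in the text that lies inside the range.
    potential = [float(m.group(1)) for m in INCHES.finditer(text)
                 if lo <= float(m.group(1)) <= hi]
    if not potential:
        return None
    # Probable value: the most frequent potential value. Counter.most_common orders
    # equal counts by first occurrence, so the earliest mention wins ties.
    return Counter(potential).most_common(1)[0][0]


existing = ["13.3 inch", "15.6 inch", "17.3 inches"]
text = "Acme 15.6-inch laptop with slim 0.2 inch bezels. The 15.6 inch display fits a 14 inch sleeve."
print(value_range(existing))           # (13.3, 17.3)
print(probable_value(text, existing))  # 15.6
```

Here "0.2 inch" falls outside the range and is filtered out, while "14 inch" survives the filter but is mentioned less often than 15.6.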
Specification