Method and system for determining sets of variant items
First Claim
1. A computer-implemented method, comprising:
performing, by one or more computers having at least one processor and memory:
for each particular item of a plurality of items:
determining one or more other items of the plurality of items that are each distinct from but similar to the particular item, wherein said determining is based on accessing data that includes, for each item of the plurality of items, a textual description of the item that describes the item but is not itself an item in the plurality of items;
for each given item of the determined one or more other items, identifying an item data pair with one member comprising a sequence of text strings from the textual description of the particular item, and the other member comprising another sequence of text strings from the textual description of the given item;
subsequent to said identifying, aligning each identified item data pair, wherein said aligning the identified item data pair comprises aligning text in the sequence of text strings from the textual description of the particular item with text in the other sequence of text strings from the textual description of the given item; and
for each aligned item data pair, determining one or more misalignments of the aligned item data pair, and assigning a similarity score to the aligned item data pair dependent on the one or more misalignments, wherein the similarity score indicates a degree of confidence that the given item and the particular item are distinct variants of each other; and
based on a plurality of the aligned item data pairs and similarity scores assigned to each of those aligned item data pairs, determining a variant set comprising multiple ones of the plurality of items, wherein each item of the variant set is determined to be a variant of each other item of the variant set;
wherein at least one of the aligned item data pairs comprises multiple misalignments;
for each misalignment of the multiple misalignments, determining a respective subscore based on that misalignment;
wherein said assigning the similarity score to said at least one aligned item data pair comprises assigning a result of a combination of each of said subscores to said at least one aligned item data pair.
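The aligning, misalignment, and subscore steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the token-level alignment via `difflib.SequenceMatcher`, the function names, the subscore values (0.9 and 0.5), and the choice of multiplication as the "combination" are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def find_misalignments(title_a, title_b):
    """Align two titles token by token and return the mismatched spans.

    Each misalignment is a pair (tokens_from_a, tokens_from_b); one side
    is empty when a token was inserted or deleted.
    """
    a = title_a.lower().split()
    b = title_b.lower().split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    misalignments = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            misalignments.append((tuple(a[i1:i2]), tuple(b[j1:j2])))
    return misalignments


def subscore(misalignment):
    """Hypothetical per-misalignment subscore (values are assumptions).

    A substitution (text on both sides) is treated as more variant-like
    than a pure insertion or deletion.
    """
    left, right = misalignment
    return 0.9 if left and right else 0.5


def similarity_score(title_a, title_b):
    """Combine the subscores into one score; the product is an assumed combination."""
    misalignments = find_misalignments(title_a, title_b)
    score = 1.0
    for m in misalignments:
        score *= subscore(m)
    return score, misalignments


score, found = similarity_score(
    "Acme Red Cotton T-Shirt Large",
    "Acme Blue Cotton T-Shirt Small",
)
print(found, score)
```

In this example the two titles differ in two separate places (color and size), so the pair has multiple misalignments and its score is a combination of two subscores.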
Abstract
Various embodiments of a method and system for determining sets of variant items are described. Various embodiments may include a system configured to generate multiple item pairs each corresponding to a particular item and another item determined to be similar to the particular item. For the particular item and the other item, each item pair may include a respective sequence of text strings (e.g., a title). For each item pair, the system may perform a corresponding text alignment and determine one or more misalignments of the item pair. The system may also assign a similarity score to each item pair; the similarity score may be dependent on the misalignment(s) determined for the particular item pair. Based on each aligned item pair and the similarity score assigned to that aligned item pair, the system may generate an indication specifying that each of a set of items are variants of each other.
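The final step described in the abstract, turning scored item pairs into sets of mutual variants, might be sketched as follows. This is an assumption-laden illustration: the threshold value, the function name `variant_sets`, and the use of union-find grouping (which treats the variant relation as transitive) are not taken from the source.

```python
from collections import defaultdict


def variant_sets(scored_pairs, threshold=0.7):
    """Group items into variant sets from (item_a, item_b, score) triples.

    Pairs scoring at or above the assumed threshold are linked, and linked
    items are merged with a small union-find structure.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in scored_pairs:
        root_a, root_b = find(a), find(b)  # also registers both items
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for item in list(parent):
        groups[find(item)].add(item)
    # Only groups with more than one member are variant sets.
    return [group for group in groups.values() if len(group) > 1]


pairs = [("A", "B", 0.81), ("B", "C", 0.90), ("C", "D", 0.20)]
print(variant_sets(pairs))
```

Merging by connected components is the simplest grouping choice; a stricter reading, in which every item in the set must score highly against every other item, would require checking each pair within a group.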
46 Claims
1. A computer-implemented method, comprising:
performing, by one or more computers having at least one processor and memory:
for each particular item of a plurality of items:
determining one or more other items of the plurality of items that are each distinct from but similar to the particular item, wherein said determining is based on accessing data that includes, for each item of the plurality of items, a textual description of the item that describes the item but is not itself an item in the plurality of items;
for each given item of the determined one or more other items, identifying an item data pair with one member comprising a sequence of text strings from the textual description of the particular item, and the other member comprising another sequence of text strings from the textual description of the given item;
subsequent to said identifying, aligning each identified item data pair, wherein said aligning the identified item data pair comprises aligning text in the sequence of text strings from the textual description of the particular item with text in the other sequence of text strings from the textual description of the given item; and
for each aligned item data pair, determining one or more misalignments of the aligned item data pair, and assigning a similarity score to the aligned item data pair dependent on the one or more misalignments, wherein the similarity score indicates a degree of confidence that the given item and the particular item are distinct variants of each other; and
based on a plurality of the aligned item data pairs and similarity scores assigned to each of those aligned item data pairs, determining a variant set comprising multiple ones of the plurality of items, wherein each item of the variant set is determined to be a variant of each other item of the variant set;
wherein at least one of the aligned item data pairs comprises multiple misalignments;
for each misalignment of the multiple misalignments, determining a respective subscore based on that misalignment;
wherein said assigning the similarity score to said at least one aligned item data pair comprises assigning a result of a combination of each of said subscores to said at least one aligned item data pair.
16. A system, comprising:
a memory comprising program instructions; and
one or more processors coupled to said memory, wherein the program instructions are executable by at least one of said one or more processors to:
for each particular item of a plurality of items:
determine one or more other items of the plurality of items that are each distinct from but similar to the particular item, wherein said determining is based on accessing data that includes, for each item of the plurality of items, a textual description of the item that describes the item but is not itself an item in the plurality of items;
for each given item of the determined one or more other items, identify an item data pair with one member comprising a sequence of text strings from the textual description of the particular item, and the other member comprising another sequence of text strings from the textual description of the given item;
subsequent to said identifying, align each identified item data pair, wherein to align the identified item data pair the program instructions are configured to align text in the sequence of text strings from the textual description of the particular item with text in the other sequence of text strings from the textual description of the given item; and
for each aligned item data pair, determine one or more misalignments of the aligned item data pair, and assign a similarity score to the aligned item data pair dependent on the one or more misalignments, wherein the similarity score indicates a degree of confidence that the given item and the particular item are distinct variants of each other; and
based on a plurality of the aligned item data pairs and similarity scores assigned to each of those aligned item data pairs, determine a variant set comprising multiple ones of the plurality of items, wherein each item of the variant set is determined to be a variant of each other item of the variant set;
wherein at least one of the aligned item data pairs comprises multiple misalignments;
for each misalignment of the multiple misalignments, determine a respective subscore based on that misalignment;
wherein to assign the similarity score to said at least one aligned item data pair, the program instructions are configured to assign a result of a combination of each of said subscores to said at least one aligned item data pair.
31. A computer-readable non-transitory storage medium storing program instructions computer-executable to:
for each particular item of a plurality of items:
determine one or more other items of the plurality of items that are each distinct from but similar to the particular item, wherein said determining is based on accessing data that includes, for each item of the plurality of items, a textual description of the item that describes the item but is not itself an item in the plurality of items;
for each given item of the determined one or more other items, identify an item data pair with one member comprising a sequence of text strings from the textual description of the particular item, and the other member comprising another sequence of text strings from the textual description of the given item;
subsequent to said identifying, align each identified item data pair, wherein to align the identified item data pair the program instructions are configured to align text in the sequence of text strings from the textual description of the particular item with text in the other sequence of text strings from the textual description of the given item; and
for each aligned item data pair, determine one or more misalignments of the aligned item data pair, and assign a similarity score to the aligned item data pair dependent on the one or more misalignments, wherein the similarity score indicates a degree of confidence that the given item and the particular item are distinct variants of each other; and
based on a plurality of the aligned item data pairs and similarity scores assigned to each of those aligned item data pairs, determine a variant set comprising multiple ones of the plurality of items, wherein each item of the variant set is determined to be a variant of each other item of the variant set;
wherein at least one of the aligned item data pairs comprises multiple misalignments;
for each misalignment of the multiple misalignments, determine a respective subscore based on that misalignment;
wherein to assign the similarity score to said at least one aligned item data pair, the program instructions are configured to assign a result of a combination of each of said subscores to said at least one aligned item data pair.
46. A computer-readable non-transitory storage medium storing program instructions computer-executable to, for a plurality of items:
identify a multiplicity of item pairs, each comprising two distinct items from the plurality of items that are similar to each other, wherein said determining is based on accessing data that includes, for each item of the plurality of items, a textual description of the item that describes the item but is not itself an item in the plurality of items;
for each identified item pair of the multiplicity of item pairs, receive a corresponding item information pair, wherein one member of the corresponding item information pair contains text from the textual description of one item of the identified item pair, while the other member of the corresponding item information pair contains text from the textual description of the other item of the identified item pair;
for each two members of each item information pair of the received plurality of item information pairs:
align the text in the one member with the text in the other member;
assign a score to the item information pair based on ascertaining the presence, within one or more predetermined variation phrase sets, of text strings determined to be mismatched within the aligned text, wherein the score indicates a degree of probability that the one item of the plurality of items whose textual description comprises the text in the one member and the other item of the plurality of items whose textual description comprises the text in the other member are variants of each other; and
based on the assigned score, determine a likelihood that the one item of the plurality of items whose textual description comprises the text in the one member and the other item of the plurality of items whose textual description comprises the text in the other member are variants of each other;
wherein at least one of the item information pairs comprises multiple mismatches;
for each mismatch of the multiple mismatches, determine a respective subscore based on that misalignment;
wherein to assign the score to said at least one item information pair, the program instructions are configured to assign a result of a combination of each of said subscores to said at least one item information pair.
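Claim 46 bases the score on whether mismatched text strings appear in predetermined variation phrase sets. A minimal sketch of that idea follows. The phrase sets, the subscore values (0.95 and 0.3), the rule that both mismatched strings must fall in the same set, and the product used to combine subscores are all hypothetical choices, not details from the source.

```python
# Hypothetical variation phrase sets; real sets would be curated or learned.
VARIATION_PHRASE_SETS = {
    "color": {"red", "blue", "green", "black"},
    "size": {"small", "medium", "large"},
}


def mismatch_subscore(left, right, phrase_sets=VARIATION_PHRASE_SETS):
    """Return a high subscore when both mismatched strings share a phrase set."""
    for phrases in phrase_sets.values():
        if left in phrases and right in phrases:
            return 0.95  # assumed value: mismatch looks like a known variation
    return 0.3  # assumed value: mismatch is not a recognized variation


def pair_score(mismatches, phrase_sets=VARIATION_PHRASE_SETS):
    """Combine per-mismatch subscores by multiplication (an assumed combination)."""
    score = 1.0
    for left, right in mismatches:
        score *= mismatch_subscore(left, right, phrase_sets)
    return score


# Mismatched strings as they might come out of a title alignment.
print(pair_score([("red", "blue"), ("large", "small")]))
print(pair_score([("red", "blue"), ("cotton", "leather")]))
```

Under these assumed values, a pair whose mismatches are all recognized variations (color, size) keeps a high score, while a pair with an unrecognized mismatch such as material is scored much lower.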
Specification