LARGE SCALE ITEM REPRESENTATION MATCHING
Abstract
A two-phase process quickly and accurately identifies representations of the same items within a collection of item representations. In the first phase, referred to as a “blocking phase,” frequency information indicating the frequency with which terms appear within the collection of item representations is used to quickly identify “candidate pairs” (i.e., pairs of item representations that have a relatively high probability of matching). The blocking phase results in a reduced subset of the data for further analysis during the second phase. In the second phase, referred to as a “matching phase,” the candidate pairs are analyzed using fuzzy matching functions to accurately identify “matching pairs” (i.e., representations of the same items).
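The two phases described above can be illustrated with a minimal sketch. It is not the patented implementation: the whitespace tokenizer, the `max_doc_freq` cutoff for treating a term as rare, the use of Python's `difflib.SequenceMatcher` as the fuzzy matching function, and the `threshold` value are all assumptions chosen for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def tokenize(text):
    """Split an item representation into lowercase terms (assumed tokenization)."""
    return text.lower().split()


def blocking_phase(items, max_doc_freq=2):
    """Return candidate pairs (i, j), i < j, that share at least one rare term.

    A term is treated as rare when it appears in at most max_doc_freq items.
    This cutoff is a hypothetical stand-in for the frequency information
    described in the abstract.
    """
    postings = defaultdict(list)
    for idx, item in enumerate(items):
        for term in set(tokenize(item)):
            postings[term].append(idx)

    candidates = set()
    for ids in postings.values():
        # Frequent terms are skipped, so they never generate pairs.
        if 2 <= len(ids) <= max_doc_freq:
            candidates.update(combinations(ids, 2))
    return candidates


def matching_phase(items, candidates, threshold=0.8):
    """Keep candidate pairs whose fuzzy similarity meets the threshold."""
    matches = []
    for i, j in sorted(candidates):
        score = SequenceMatcher(None, items[i].lower(), items[j].lower()).ratio()
        if score >= threshold:
            matches.append((i, j))
    return matches
```

With a small collection in which only two entries share rare terms such as a model number, the blocking phase yields a single candidate pair, and the comparatively expensive fuzzy comparison runs once rather than once per possible pair.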
20 Claims
1. A computerized method for matching item representations within a collection of item representations, the method comprising:
- determining candidate pairs of item representations based on frequency information indicative of the frequency at which terms appear in the collection of item representations; and
- matching item representations by analyzing the candidate pairs using one or more fuzzy matching functions.
Dependent claims: 2-10.
11. One or more computer-readable media embodying computer-useable instructions for performing a method of matching item representations from a collection of item representations, the method comprising:
- extracting terms from the collection of item representations;
- determining frequency information indicative of the frequency with which the terms appear within the collection of item representations;
- generating an inverted index mapping the terms to the item representations in which the terms appear, wherein the inverted index further includes the frequency information for the terms;
- determining one or more candidate pairs of item representations using the inverted index based on terms shared between item representations and frequency information associated with the terms; and
- identifying one or more matching pairs of item representations by analyzing the candidate pairs using one or more fuzzy matching algorithms.
Dependent claims: 12-17.
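As a hedged illustration of the inverted index recited in claim 11, the sketch below stores each term's document frequency alongside its postings and scores item pairs by the terms they share. The inverse-document-frequency weighting, the `min_score` cutoff, and the function names are assumptions made for this example; the claim itself does not specify how frequency information is combined.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_inverted_index(items):
    """Map each term to (document frequency, sorted list of item ids)."""
    postings = defaultdict(set)
    for item_id, text in enumerate(items):
        for term in text.lower().split():  # assumed whitespace tokenization
            postings[term].add(item_id)
    return {term: (len(ids), sorted(ids)) for term, ids in postings.items()}


def candidate_pairs(index, num_items, min_score=1.0):
    """Return pairs whose shared terms carry enough summed IDF weight.

    The IDF weight log(N / df) is an illustrative choice: rare shared
    terms contribute more, terms found in every item contribute nothing.
    """
    scores = defaultdict(float)
    for doc_freq, ids in index.values():
        weight = math.log(num_items / doc_freq)
        if weight == 0.0:
            continue  # term appears in every item and carries no signal
        for pair in combinations(ids, 2):
            scores[pair] += weight
    return {pair for pair, score in scores.items() if score >= min_score}
```

Pairs that share only a common term fall below the cutoff, while pairs that share several rarer terms survive as candidates for the fuzzy matching step.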
18. A computerized system including one or more computer-readable media embodying software components for matching item representations from a collection of item representations, the software components comprising:
- a blocking component that identifies candidate pairs of item representations based on frequency information associated with terms shared between the candidate pairs; and
- a matching component that identifies matching pairs of item representations by analyzing the candidate pairs using one or more fuzzy matching algorithms.
Dependent claims: 19, 20.
Specification