Data merging techniques
Abstract
Disclosed is a system for performing online data queries. The system for performing online data queries is a distributed computer system with a plurality of server nodes each fully redundant and capable of processing a user query request. Each server node includes a data query cache and other caches that may be used in performing data queries. The data query, as well as request allocation, is performed in accordance with an adaptive partitioning technique with a bias towards an initial partitioning scheme. Generic objects are created and used to represent business listings upon which the user may perform queries. Various data processing and integration techniques are included which enhance data queries. An update technique is used for synchronizing data updates as needed in updating the plurality of server nodes. A multi-media data transfer technique is used to transfer non-text or multi-media data between various components of the online query tool. Optimizations for searching, such as the common term optimization, are included for those commonly performed data queries. Also disclosed is a system for targeting advertisements that are displayed to a user of the system.
345 Citations
18 Claims
1. A method executed in a computer system for determining if an update entry has a matching entry in an existing database comprising:
determining a subset of one or more existing entries in the existing database in which each of said one or more existing entries has a phone number corresponding to a phone number in said update entry;
for each existing entry in said subset, calculating an associated score representing a matching correspondence between the update entry and said each existing entry;
for each existing entry in said subset, updating said associated score if a zip code match between said each existing entry and said update entry is determined;
determining if there is at least one associated score greater than a predetermined threshold; and
if there is only one existing entry in the subset with an associated score greater than the predetermined threshold, determining this existing entry matches the update entry.
Dependent claims: 2, 3, 4, 5
performing a set comparison of name components to determine how many name components of an existing entry in said existing database match those of the update entry.
3. The method of claim 1, wherein said existing database is in a first predetermined format and said data update entry is in a second predetermined format, said first and said second being different predetermined formats.
4. The method of claim 1, wherein said existing database is in a normalized form.
5. The method of claim 1, wherein a phone number and a zip code associated with an entry in said existing database uniquely identifies a corresponding business listing that includes said phone number, said zip code, and a business name.
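The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The `Entry` type, the `base_score` placeholder, and the `THRESHOLD` and `ZIP_BONUS` values are all assumptions made for illustration; the claim only recites a score, a zip code update to that score, and a predetermined threshold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entry:
    name: str
    phone: str
    zip_code: str


# Hypothetical values; the claim only recites "a predetermined threshold".
THRESHOLD = 1.5
ZIP_BONUS = 1.0


def base_score(update: Entry, existing: Entry) -> float:
    """Placeholder score (an assumption): 1.0 if the names are equal ignoring case."""
    return 1.0 if update.name.lower() == existing.name.lower() else 0.0


def find_match(update: Entry, database: Sequence[Entry]) -> Optional[Entry]:
    # Subset of existing entries whose phone number corresponds to the update's.
    subset = [e for e in database if e.phone == update.phone]

    scored = []
    for existing in subset:
        # Score each candidate, then update the score on a zip code match.
        score = base_score(update, existing)
        if existing.zip_code == update.zip_code:
            score += ZIP_BONUS
        scored.append((existing, score))

    # A match is declared only when exactly one score exceeds the threshold.
    above = [e for e, s in scored if s > THRESHOLD]
    return above[0] if len(above) == 1 else None
```

In this sketch an ambiguous result (two or more candidates above the threshold) is treated the same as no result, which mirrors the "only one existing entry" condition in the claim.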
6. A method executed in a computer system for determining if an update entry has a matching entry in an existing database comprising:
determining a subset of one or more existing entries in the existing database in which each of said one or more existing entries has a phone number corresponding to a phone number in said update entry;
for each existing entry in said subset, calculating an associated score representing a matching correspondence between the update entry and said each existing entry;
for each existing entry in said subset, updating said associated score if a zip code match between said each existing entry and said update entry is determined;
determining if there is at least one associated score greater than a predetermined threshold;
if there is only one existing entry in the subset with an associated score greater than the predetermined threshold, determining this existing entry matches the update entry;
if there is not at least one associated score greater than a predetermined threshold;
for each existing entry in said subset, determining if there is a name length difference of more than 3 characters between said update entry and said each existing entry;
if there is no entry in said subset with a difference in name length of less than or equal to three, determining that there is no matching entry in the existing database for the update entry;
if there is at least one entry in said subset with a difference in name length of less than or equal to three, computing a name edit distance for each entry in said subset with a difference in name length of less than or equal to three, said name edit distance being the minimum number of character modifications to convert the name of the update entry to the name of said each entry; and
if there is only one entry in said subset with a distance less than a predetermined value, said one entry is determined to be a matching entry of the update entry.
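The fallback in claim 6 (name length filter, then edit distance) can be sketched as follows. This is an illustrative sketch only: it assumes the "minimum number of character modifications" is the Levenshtein distance (insertions, deletions, substitutions), and `MAX_DISTANCE` is a hypothetical stand-in for the unspecified "predetermined value". Only the length limit of three characters comes from the claim.

```python
from typing import Optional, Sequence

MAX_LENGTH_DIFF = 3  # from the claim: length difference of at most three
MAX_DISTANCE = 3     # hypothetical "predetermined value"


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fallback_match(update_name: str, subset_names: Sequence[str]) -> Optional[str]:
    # Keep only names whose length differs by three characters or fewer.
    close = [n for n in subset_names
             if abs(len(n) - len(update_name)) <= MAX_LENGTH_DIFF]
    if not close:
        return None  # no matching entry for the update entry

    # Among those, a match requires exactly one name below the distance limit.
    near = [n for n in close if edit_distance(update_name, n) < MAX_DISTANCE]
    return near[0] if len(near) == 1 else None
```

The length filter is cheap and prunes candidates before the quadratic-time distance computation runs, which is presumably why the claim orders the steps this way.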
7. A method executed in a computer system for determining if an update entry has a matching entry in an existing database comprising:
determining if the update entry includes a phone number indicating a toll-free telephone call;
if the update entry does not include a phone number which is toll-free;
determining if there are one or more entries in the existing database having an area code and exchange that match the update entry using a table identifying equivalent area codes;
if there are one or more entries in the database having an area code and exchange matching the update entry, forming said subset of said one or more entries;
if no existing entries in the database have an area code and exchange matching the update entry;
determining a filtered name for a name associated with the update record, said filtered name having predetermined insignificant search terms removed;
searching the existing database for entries which match the filtered name concatenated with a zip code of the update entry producing said subset;
determining that there is no matching entry in the existing database corresponding to the update entry if said subset includes more than five existing entries;
if the update entry includes a phone number which is toll-free;
determining a subset of one or more existing entries in the existing database in which each of said one or more existing entries has a phone number corresponding to a phone number in said update entry;
for each existing entry in said subset, calculating an associated score representing a matching correspondence between the update entry and said each existing entry;
for each existing entry in said subset, updating said associated score if a zip code match between said each existing entry and said update entry is determined;
determining if there is at least one associated score greater than a predetermined threshold; and
if there is only one existing entry in the subset with an associated score greater than the predetermined threshold, determining this existing entry matches the update entry.
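The non-toll-free branch of claim 7 can be sketched as below, for an update entry already known not to be toll-free. The sketch assumes 10-digit North American phone numbers stored as digit strings and entries stored as dictionaries with `name`, `phone`, and `zip` keys. The contents of `EQUIVALENT_AREA_CODES` and `INSIGNIFICANT_TERMS` are hypothetical examples; the claim recites such a table and such terms without listing them. The limit of five entries is taken from the claim.

```python
from typing import Dict, List, Optional

# Hypothetical table of equivalent area codes (for example, after an area code split).
EQUIVALENT_AREA_CODES = {"415": {"415", "650"}, "650": {"650", "415"}}
# Hypothetical list of insignificant search terms.
INSIGNIFICANT_TERMS = {"the", "inc", "co", "and", "of"}
MAX_NAME_CANDIDATES = 5  # from the claim: more than five entries means no match

Record = Dict[str, str]


def area_exchange_matches(update_phone: str, existing_phone: str) -> bool:
    """Compare exchange digits exactly and area codes through the equivalence table."""
    area = update_phone[:3]
    equivalents = EQUIVALENT_AREA_CODES.get(area, {area})
    return existing_phone[:3] in equivalents and update_phone[3:6] == existing_phone[3:6]


def filtered_name_key(name: str, zip_code: str) -> str:
    """Drop insignificant terms from the name and concatenate the zip code."""
    words = [w.strip(".,") for w in name.lower().split()]
    kept = [w for w in words if w and w not in INSIGNIFICANT_TERMS]
    return "".join(kept) + zip_code


def candidate_subset(update: Record, database: List[Record]) -> Optional[List[Record]]:
    """Return the candidate subset, or None when the claim's no-match rule applies."""
    by_exchange = [e for e in database
                   if area_exchange_matches(update["phone"], e["phone"])]
    if by_exchange:
        return by_exchange

    # No area code and exchange match: fall back to filtered name plus zip code.
    key = filtered_name_key(update["name"], update["zip"])
    by_name = [e for e in database
               if filtered_name_key(e["name"], e["zip"]) == key]
    if len(by_name) > MAX_NAME_CANDIDATES:
        return None
    return by_name
```

Whatever subset this returns would then be passed to the scoring and threshold steps that close the claim.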
8. A method executed in a computer system for determining if an update entry has a matching entry in an existing database comprising:
determining if the update entry includes a phone number which is toll-free;
if the update entry includes a phone number which is toll-free;
determining if said phone number and an associated zip code of said update entry match one or more existing entries in the existing database;
if said phone number and said associated zip code match one or more existing entries in the existing database, forming said subset of said one or more existing entries, and otherwise, determining that there is no matching entry in said existing database for said update entry;
if the update entry includes a phone number which is not toll-free;
determining a subset of one or more existing entries in the existing database in which each of said one or more existing entries has a phone number corresponding to a phone number in said update entry;
for each existing entry in said subset, calculating an associated score representing a matching correspondence between the update entry and said each existing entry;
for each existing entry in said subset, updating said associated score if a zip code match between said each existing entry and said update entry is determined;
determining if there is at least one associated score greater than a predetermined threshold; and
if there is only one existing entry in the subset with an associated score greater than the predetermined threshold, determining this existing entry matches the update entry.
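The toll-free handling in claim 8 can be sketched as a small dispatch function. The sketch assumes 10-digit North American numbers, where the toll-free area codes are 800, 888, 877, 866, 855, 844, and 833; the claim itself does not say how a toll-free number is recognized. Entries are dictionaries with `phone` and `zip` keys, which is also an assumption.

```python
from typing import Dict, List, Optional

# North American toll-free area codes (how the claim detects toll-free is not stated).
TOLL_FREE_AREA_CODES = frozenset({"800", "888", "877", "866", "855", "844", "833"})

Record = Dict[str, str]


def form_subset(update: Record, database: List[Record]) -> Optional[List[Record]]:
    """Return the candidate subset, or None when there is no matching entry."""
    if update["phone"][:3] in TOLL_FREE_AREA_CODES:
        # Toll-free: both the phone number and the zip code must match.
        subset = [e for e in database
                  if e["phone"] == update["phone"] and e["zip"] == update["zip"]]
        return subset or None
    # Not toll-free: the phone number alone forms the subset.
    return [e for e in database if e["phone"] == update["phone"]]
```

Requiring the zip code for toll-free numbers plausibly reflects that one toll-free number can serve many locations of the same business, although the claim does not state the rationale.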
9. A computer program product for determining if an update entry has a matching entry in an existing database comprising:
means for determining a subset of one or more existing entries in the existing database in which each of said one or more existing entries has a phone number corresponding to a phone number in said update entry;
means for calculating, for each existing entry in said subset, an associated score representing a matching correspondence between the update entry and said each existing entry;
means for updating, for each existing entry in said subset, said associated score if a zip code match between said each existing entry and said update entry is determined;
means for determining if there is at least one associated score greater than a predetermined threshold; and
means for determining that an existing entry matches the update record if said existing entry is the only entry having an associated score greater than the predetermined threshold.
10. A computer program product for determining if a data update entry has a matching entry in an existing database comprising:
machine executable code for determining a subset of one or more existing entries in the existing database in which each of said one or more existing entries has a phone number corresponding to a phone number in said update entry;
machine executable code for calculating, for each existing entry in said subset, an associated score representing a matching correspondence between the update entry and said each existing entry;
machine executable code for updating, for each existing entry in said subset, said associated score if a zip code match between said each existing entry and said update entry is determined;
machine executable code for determining if there is at least one associated score greater than a predetermined threshold; and
machine executable code for determining that an existing entry matches the update record if said existing entry is the only entry having an associated score greater than the predetermined threshold.
Dependent claims: 11, 12, 13, 14
machine executable code for performing a set comparison of name components to determine how many name components of an existing entry in said existing database match those of the update entry.
12. The computer program product of claim 10, wherein said existing database is in a first predetermined format and said update entry is of a second predetermined format, said first and second predetermined formats being different.
13. The computer program product of claim 10, wherein said existing database is in a normalized form.
14. The computer program product of claim 10, wherein a phone number and a zip code associated with an entry in said existing database uniquely identifies a corresponding business listing that includes said phone number, said zip code, and a business name.
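The set comparison of name components recited in the dependent claims above can be sketched in a few lines. How a name is split into components is not specified in the claims; this sketch assumes components are lowercase runs of letters and digits, and both function names are hypothetical.

```python
import re
from typing import Set


def name_components(name: str) -> Set[str]:
    """Split a name into a set of lowercase alphanumeric components (an assumption)."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def matching_components(existing_name: str, update_name: str) -> int:
    """Count the name components the two names have in common."""
    return len(name_components(existing_name) & name_components(update_name))
```

Because sets ignore order and repetition, "Pizza Pasta by Joe" and "Joe's Pizza & Pasta" share three components under this tokenization, even though the strings differ substantially.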
15. A computer program product for determining if a data update entry has a matching entry in an existing database comprising:
machine executable code for determining a subset of one or more existing entries in the existing database in which each of said one or more existing entries has a phone number corresponding to a phone number in said update entry;
machine executable code for calculating, for each existing entry in said subset, an associated score representing a matching correspondence between the update entry and said each existing entry;
machine executable code for updating, for each existing entry in said subset, said associated score if a zip code match between said each existing entry and said update entry is determined;
machine executable code for determining if there is at least one associated score greater than a predetermined threshold;
machine executable code for determining that an existing entry matches the update record if said existing entry is the only entry having an associated score greater than the predetermined threshold;
machine executable code for determining that there is not at least one associated score greater than a predetermined threshold;
machine executable code for determining, for each existing entry in said subset, if there is a name length difference of more than three characters between said update entry and said each existing entry;
machine executable code for determining that there is no matching entry in the existing database for the update entry if there is no entry in said subset with a difference in name length of less than or equal to three;
machine executable code for computing a name edit distance for each entry in said subset with a difference in name length of less than or equal to three; and
machine executable code for determining that one entry in said subset is a matching entry if said one entry is determined to be an only entry in said subset with a distance less than a predetermined value.
16. A computer program product for determining if a data update entry has a matching entry in an existing database comprising:
machine executable code for determining if the update entry includes a phone number which is toll-free;
machine executable code for determining if there are one or more entries in the existing database having an area code and exchange that match the update entry using a table identifying equivalent area codes;
machine executable code for forming said subset of one or more entries if there are one or more entries in the database having an area code and exchange matching the update entry;
machine executable code for determining a subset of one or more existing entries in the existing database in which each of said one or more existing entries has a phone number corresponding to a phone number in said update entry;
machine executable code for calculating, for each existing entry in said subset, an associated score representing a matching correspondence between the update entry and said each existing entry;
machine executable code for updating, for each existing entry in said subset, said associated score if a zip code match between said each existing entry and said update entry is determined;
machine executable code for determining if there is at least one associated score greater than a predetermined threshold; and
machine executable code for determining that an existing entry matches the update record if said existing entry is the only entry having an associated score greater than the predetermined threshold.
Dependent claims: 17
machine executable code for determining a filtered name for a name associated with the update record, said filtered name having predetermined insignificant search terms removed;
machine executable code for searching the existing database for entries which match the filtered name concatenated with a zip code producing said subset; and
machine executable code for determining that there is no matching entry in the existing database corresponding to the update entry if said subset includes more than five existing entries.
18. A computer program product for determining if a data update entry has a matching entry in an existing database comprising:
machine executable code for determining if the update entry includes a phone number which is toll-free;
machine executable code for determining if said matching phone number and an associated zip code of said update entry match one or more existing entries in the existing database;
machine executable code for forming said subset of said one or more entries if said phone number and said associated zip code match one or more existing entries in the existing database, and for otherwise determining that there is no matching entry in said existing database for said update entry;
machine executable code for determining a subset of one or more existing entries in the existing database in which each of said one or more existing entries has a phone number corresponding to a phone number in said update entry;
machine executable code for calculating, for each existing entry in said subset, an associated score representing a matching correspondence between the update entry and said each existing entry;
machine executable code for updating, for each existing entry in said subset, said associated score if a zip code match between said each existing entry and said update entry is determined;
machine executable code for determining if there is at least one associated score greater than a predetermined threshold; and
machine executable code for determining that an existing entry matches the update record if said existing entry is the only entry having an associated score greater than the predetermined threshold.
Specification