Database correlation method
Abstract
A multi-pass algorithm identifies duplicative information and correlates higher-confidence and/or selected primary information in distributed databases. One embodiment determines a bounded area based at least in part on location information and/or location tolerances for a location-dependent attribute and compares the bounded area to previously indexed location information using a multi-pass algorithm to identify duplicative information. The algorithm may also use textual tolerances, confidence levels, and other factors to determine what information is to be correlated, with the option of elevating the correlated information to a higher-level database.
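The bounded-area comparison described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `BoundedArea` structure, the square shape of the area, the two-dimensional coordinates, and the dictionary used as the "previously indexed" information are all hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedArea:
    """Axis-aligned box around a reported location (hypothetical structure)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def bounded_area(x: float, y: float, tolerance: float) -> BoundedArea:
    """Expand a point by its location tolerance in every direction."""
    return BoundedArea(x - tolerance, y - tolerance, x + tolerance, y + tolerance)


def areas_overlap(a: BoundedArea, b: BoundedArea) -> bool:
    """True when the two boxes share at least one point."""
    return (a.x_min <= b.x_max and b.x_min <= a.x_max
            and a.y_min <= b.y_max and b.y_min <= a.y_max)


def find_candidates(new_area: BoundedArea, indexed_areas: dict) -> list:
    """Return the keys of previously indexed areas that overlap the new area."""
    return [key for key, area in indexed_areas.items()
            if areas_overlap(new_area, area)]


# Example: two indexed attributes and one incoming attribute to check.
indexed = {
    "W-1": bounded_area(1000.0, 2000.0, 50.0),
    "W-2": bounded_area(5000.0, 5000.0, 50.0),
}
candidates = find_candidates(bounded_area(1060.0, 2010.0, 50.0), indexed)
```

Overlapping areas are treated here only as candidate duplicates; the abstract indicates that textual tolerances and confidence levels may further narrow the result.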
29 Claims
-
1. A method for correlating a first location-dependant attribute of an underground reservoir in a first database to a second location-dependant attribute in a second database, said method comprising:
a. determining a first location for said first attribute;
b. determining a second location for said second attribute;
c. comparing said locations using an algorithm;
d. determining said location-dependant attributes as at least partial duplications of each other if said locations are at least in part within a location tolerance; and
e. drilling into said underground reservoir.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
f. comparing a first textual name with a second textual name; and
g. considering said attributes as at least partial duplicates if said textual names differ by no more than a first textual tolerance.
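Steps f and g compare textual names against a textual tolerance. A minimal sketch of one way to do that is shown below. It assumes the tolerance is measured as Levenshtein edit distance after normalizing case and whitespace; the claims do not name a specific measure, so `edit_distance` and `names_match` are illustrative helpers only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Upper-case the name and collapse runs of whitespace (assumed cleanup)."""
    return " ".join(name.upper().split())


def names_match(first_name: str, second_name: str, textual_tolerance: int) -> bool:
    """Treat names as at least partial duplicates if they differ by no more
    than textual_tolerance edits."""
    return edit_distance(normalize(first_name), normalize(second_name)) <= textual_tolerance
```

For example, `names_match("Smith 1-A", "SMITH  1A", 2)` evaluates to true under these assumptions, because the normalized names differ by a single character.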
-
-
7. The method of claim 6 wherein at least one of said textual names is a portion of a concatenated term that also includes location information.
-
8. The method of claim 7 wherein a confidence level is assigned to said attributes based on a location tolerance and a textual tolerance.
-
9. The method of claim 8 wherein the step of considering said attributes as at least partial duplicates is accomplished at a first confidence level.
-
10. A method for determining whether a first attribute in a first database is at least a partial duplication of a second attribute in a second database, said method comprising:
a. determining a first identifier for said first attribute;
b. determining a second identifier for said second attribute;
c. comparing at least in part said first and second identifiers using a multi-pass algorithm; and
d. determining that said second attribute is at least a partial duplicate of said first attribute if said second identifier is within a tolerance value of said first identifier.
Dependent claims: 11, 12, 13, 14, 15, 16
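One possible reading of the multi-pass comparison in claim 10 is sketched below: each pass applies a wider tolerance to the identifiers that earlier passes left unmatched, and the pass number is kept as a rough indication of confidence. The numeric identifiers, the ordering of tolerances, and the function name `multi_pass_match` are assumptions made for illustration, not details taken from the claims.

```python
def multi_pass_match(first_ids: dict, second_ids: dict, tolerances: list):
    """Match second-database identifiers to first-database identifiers.

    first_ids / second_ids map a record key to a numeric identifier.
    tolerances lists one tolerance value per pass, tightest first.
    Returns (matches, unmatched): matches maps a second-database key to
    (first-database key, pass number); unmatched holds the leftovers.
    """
    matches = {}
    unmatched = dict(second_ids)
    for pass_number, tolerance in enumerate(tolerances, start=1):
        # Iterate over a snapshot because matched keys are removed as we go.
        for second_key, second_value in list(unmatched.items()):
            for first_key, first_value in first_ids.items():
                if abs(second_value - first_value) <= tolerance:
                    matches[second_key] = (first_key, pass_number)
                    del unmatched[second_key]
                    break
    return matches, unmatched


first = {"A": 100.0, "B": 250.0}
second = {"x": 100.0, "y": 250.4, "z": 900.0}
matches, unmatched = multi_pass_match(first, second, [0.0, 0.5])
```

In this example, `x` matches exactly on the first pass, `y` matches only once the tolerance widens on the second pass, and `z` is left unmatched.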
-
-
17. A method for determining whether a first data point in a first database is not likely to be a duplication of a second data point in a second database, said method comprising:
a. building a first concatenated identifier for said first data point by concatenating at least a first textual identifier with a first location identifier for said data point;
b. comparing said first concatenated identifier to similar concatenated identifiers derived from a second database; and
c. if said first concatenated identifier is not within a tolerance of at least one of said concatenated identifiers derived from a second database, storing said first concatenated identifier.
Dependent claims: 18, 19, 20, 21
d. if said first concatenated identifier is within a tolerance of said identifiers derived from a second database, building a match index; and
e. counting the number of matched data points.
-
-
19. The method of claim 18 which also comprises the steps of:
f. comparing said number of matched data points with a number of data points having similar identifiers in each database; and
g. selecting correlatable data points from either said first or second database depending at least in part on the step of comparing said number of matched data points.
-
-
20. The method of claim 19 wherein said comparing step f finds less than 100 percent matching data points, but at least about 75 percent matching data at locations where data points exist.
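The concatenated-identifier steps of claims 17 through 20 might be sketched as follows. The sketch assumes that "within a tolerance" is approximated by snapping each location to a grid cell before concatenation, which is a simplification chosen for brevity (points near a cell edge can be missed). The separator, the grid size, and the function names are hypothetical.

```python
def concatenated_identifier(name: str, x: float, y: float, grid: float = 100.0) -> str:
    """Concatenate a normalized textual identifier with a gridded location."""
    cell_x = int(x // grid)
    cell_y = int(y // grid)
    return f"{name.strip().upper()}|{cell_x}|{cell_y}"


def match_points(first_points: list, second_points: list, grid: float = 100.0):
    """Split first-database points into a match index and a stored list.

    Each point is a (name, x, y) tuple. Returns (match_index, stored, fraction),
    where fraction is the share of first-database points that matched.
    """
    second_index = {concatenated_identifier(*p, grid) for p in second_points}
    match_index, stored = [], []
    for point in first_points:
        identifier = concatenated_identifier(*point, grid)
        if identifier in second_index:
            match_index.append(identifier)   # step d: build a match index
        else:
            stored.append(identifier)        # step c: store unmatched identifier
    fraction = len(match_index) / len(first_points) if first_points else 0.0
    return match_index, stored, fraction


def mostly_matching(fraction: float) -> bool:
    """Claim 20 range: less than 100 percent but at least about 75 percent."""
    return 0.75 <= fraction < 1.0


first_db = [("Smith 1", 1010, 2020), ("Smith 1", 1110, 2020),
            ("Smith 1", 1210, 2020), ("Smith 1", 9999, 9999)]
second_db = [("smith 1", 1050, 2080), ("SMITH 1", 1150, 2001),
             ("Smith 1 ", 1290, 2099)]
match_index, stored, fraction = match_points(first_db, second_db)
```

Here three of the four first-database points fall in grid cells that also appear in the second database, so the matched fraction is 0.75 and lies inside the range recited in claim 20.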
-
21. The method of claim 20 which also comprises the steps of:
h. building a cross-reference table; and
i. identifying a textual identifier as duplicative.
-
-
22. A method for determining whether a first location-dependant data point at a first location having a first textual identifier is a likely duplication of a test location-dependant data point at a test location having a test textual identifier, said method comprising:
a. determining a location tolerance around said first location;
b. determining a textual tolerance associated with said first location-dependant data point;
c. comparing said test location with said location tolerance;
d. comparing said test textual identifier with said textual tolerance;
e. if said comparisons show location and textual identifiers outside the respective tolerances, storing said test data point; and
f. if said comparisons show location and textual identifiers within the respective tolerances, handling said test data point as duplicative of said first data point.
Dependent claims: 23, 24, 25, 26
g. repeating steps a-f for other location-dependent test data points;
h. counting the number of test data points and the number of stored data points;
i. comparing the number of test data points with the number of stored data points; and
j. if the comparison shows at least 95% of said test data points as likely to be duplicative, handling all data points having a similar textual identifier as duplicative.
-
-
25. The method of claim 24 wherein said location tolerance is no more than 100 feet.
-
26. The method of claim 25 wherein said textual tolerance is a difference of no more than 2 characters.
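The tolerance tests of claim 22, with the 95% rule and the specific limits recited in claims 24 through 26, could look roughly like the sketch below. Several choices are assumptions rather than claim language: locations are planar coordinates in feet, the textual difference is counted with Python's `difflib.SequenceMatcher`, and a test point is stored whenever it fails either tolerance (the claims only address the cases where both tests agree).

```python
import math
from difflib import SequenceMatcher

LOCATION_TOLERANCE_FT = 100.0   # claim 25: no more than 100 feet
TEXTUAL_TOLERANCE_CHARS = 2     # claim 26: no more than 2 characters
DUPLICATE_FRACTION = 0.95       # step j: at least 95% likely duplicative


def character_difference(a: str, b: str) -> int:
    """Characters of the longer string not covered by matching blocks."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return max(len(a), len(b)) - matched


def is_likely_duplicate(first: tuple, test: tuple) -> bool:
    """first and test are (name, x_ft, y_ft) tuples."""
    first_name, first_x, first_y = first
    test_name, test_x, test_y = test
    within_location = math.hypot(test_x - first_x, test_y - first_y) <= LOCATION_TOLERANCE_FT
    within_text = character_difference(first_name.upper(), test_name.upper()) <= TEXTUAL_TOLERANCE_CHARS
    return within_location and within_text


def classify(first: tuple, test_points: list):
    """Return (stored, treat_all_as_duplicative) for a batch of test points."""
    stored = [p for p in test_points if not is_likely_duplicate(first, p)]
    duplicate_count = len(test_points) - len(stored)
    treat_all = bool(test_points) and duplicate_count / len(test_points) >= DUPLICATE_FRACTION
    return stored, treat_all


reference = ("SMITH 1-A", 0.0, 0.0)
outlier = ("JONES 7", 5000.0, 5000.0)
batch = [("Smith 1A", 30.0, 40.0)] * 20 + [outlier]
stored, treat_all = classify(reference, batch)
```

In this example 20 of the 21 test points pass both tolerances, which is above 95%, so the batch would be handled as duplicative while the single outlier is stored.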
-
27. A method for determining whether a first series of location-dependant data values is a likely duplication of a second series of location-dependent data values wherein both of said series of data values are taken at locations separated by similar distance intervals, said method comprising:
a. building a first concatenated identifier of a portion of said first series of data values wherein said concatenated identifier includes a textual identifier and location information;
b. building a second concatenated identifier of less than all of the second series of data values;
c. comparing said concatenated identifiers; and
d. if said comparing shows similar location and textual identifiers, determining that one of said series of data values is duplicative of the other series of data values.
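For series of values taken at regular intervals, the comparison in claim 27 might be sketched as below. The sketch assumes depth-indexed samples, builds the identifier from the name, the starting depth, the sampling interval, and only the first few values, and treats identical identifiers as "similar". Those choices, together with the rounding precision and the function names, are illustrative assumptions.

```python
def series_identifier(name: str, depths: list, values: list, sample_count: int = 3) -> str:
    """Concatenate a textual identifier, location information, and a portion
    of the series (its first sample_count values). Needs at least two depths."""
    interval = depths[1] - depths[0]
    portion = ",".join(f"{v:.2f}" for v in values[:sample_count])
    return f"{name.strip().upper()}|{depths[0]:.1f}|{interval:.1f}|{portion}"


def series_is_duplicate(first: tuple, second: tuple, sample_count: int = 3) -> bool:
    """first and second are (name, depths, values) tuples."""
    return (series_identifier(*first, sample_count)
            == series_identifier(*second, sample_count))


log_a = ("gr log", [1000.0, 1000.5, 1001.0, 1001.5], [45.12, 47.30, 50.01, 52.20])
log_b = ("GR LOG ", [1000.0, 1000.5, 1001.0], [45.12, 47.3, 50.01])
log_c = ("GR LOG", [2000.0, 2000.5, 2001.0], [45.12, 47.3, 50.01])
```

Comparing only a leading portion keeps the identifier short, at the cost of missing series that diverge after the sampled values; a practical system would probably follow up with a fuller comparison.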
-
-
28. A computer-based device for determining whether a first series of location-dependant data values is a likely duplication of a second series of location-dependent data values wherein both of said series of data are taken at locations separated by similar distance intervals, wherein said device is capable of:
a. building a first concatenated identifier of a portion of said first series of data values wherein said concatenated identifier includes a textual identifier and location information;
b. building a second concatenated identifier of less than all of the second series of data values;
c. comparing said concatenated identifiers; and
d. if said comparing shows similar location and textual identifiers, determining that one of said series of data values is duplicative of the other series of data values.
-
-
29. A method for correlating a first location-dependant attribute of an underground reservoir in a first database to a second location-dependant attribute in a second database, said method comprising:
a. determining a first underground location associated with a first value of said first attribute;
b. determining a second underground location associated with a second value of said second attribute;
c. comparing said locations using an algorithm;
d. determining said first and second values as at least a partial data duplication of each other if said locations are within a location tolerance;
e. analyzing at least some of said database values in the absence of at least one of said values determined to be at least a partial data duplication; and
f. drilling into said underground reservoir based at least in part on said analyzing.
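Steps a through e of claim 29 can be illustrated with a short sketch that drops values flagged as duplicates before analyzing the rest. It assumes three-dimensional underground locations given as `(x, y, depth)` tuples, uses a simple mean as a stand-in for the analysis, and keeps the first database's value when two values coincide; none of these specifics are stated in the claim. Step f, the drilling itself, is outside the scope of any code.

```python
import math
from statistics import mean


def drop_duplicates(first_records: list, second_records: list, location_tolerance: float) -> list:
    """Keep second-database records whose location is not within the
    tolerance of any first-database record. Records are (location, value)."""
    kept = []
    for location, value in second_records:
        is_duplicate = any(
            math.dist(location, first_location) <= location_tolerance
            for first_location, _ in first_records
        )
        if not is_duplicate:
            kept.append((location, value))
    return kept


def mean_without_duplicates(first_records: list, second_records: list, location_tolerance: float) -> float:
    """Analyze (here: average) the values with duplicate values left out."""
    unique_second = drop_duplicates(first_records, second_records, location_tolerance)
    return mean([value for _, value in first_records + unique_second])


first_db = [((0.0, 0.0, 5000.0), 0.20), ((1000.0, 0.0, 5000.0), 0.10)]
second_db = [((3.0, 4.0, 5000.0), 0.21), ((0.0, 2000.0, 5200.0), 0.30)]
unique = drop_duplicates(first_db, second_db, 10.0)
average = mean_without_duplicates(first_db, second_db, 10.0)
```

In this example the second database's first record lies 5 units from a first-database location, so it is excluded and the average is taken over the three remaining values.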
-
Specification