Approximate string matching optimization for a database
First Claim
1. A computer program product comprising a computer readable storage medium having stored thereon:
program instructions programmed to receive a query of a database, wherein the query includes a search value, and wherein the database includes a plurality of datasets;
program instructions programmed to identify the search value within the received query;
program instructions programmed to determine at least one reference value based on the identified search value;
program instructions programmed to determine a distance between the search value and the at least one reference value;
program instructions programmed to determine a maximum distance from the search value to be used in searching the database, wherein the maximum distance from the search value defines a search range and is based, at least in part, on the determined distance between the search value and the at least one reference value;
program instructions programmed to determine a subset of datasets from the plurality of datasets that includes datasets for which a data range with respect to each reference value overlaps with the search range; and
program instructions programmed to perform approximate string matching for the search value on the subset of datasets;
wherein:
each dataset of the plurality of datasets is assigned a minimum distance and a maximum distance between values of dataset entries and the at least one reference value; and
the minimum distance and the maximum distance for each dataset define the data range for the respective dataset with respect to the at least one reference value, and wherein the minimum and maximum distance are permanently stored in a respective dataset to which the minimum and maximum distance are assigned and transferred with the respective dataset when the dataset is copied to a new location or to a new database.
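The final two elements of the claim can be illustrated with a small sketch. It is not taken from the patent: the `Dataset` class, its `ranges` mapping, the `assign_range` method, and the choice of Levenshtein distance as the distance measure are all assumptions made for illustration. The point it shows is that the minimum and maximum distances live inside the dataset itself, so a copy of the dataset carries them along.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


@dataclass
class Dataset:
    """Hypothetical dataset that stores its own data range per reference value."""
    entries: list[str]
    # reference value -> (minimum distance, maximum distance)
    ranges: dict[str, tuple[int, int]] = field(default_factory=dict)

    def assign_range(self, reference: str) -> None:
        # Measure every entry against the reference value and keep the extremes.
        distances = [levenshtein(entry, reference) for entry in self.entries]
        self.ranges[reference] = (min(distances), max(distances))


names = Dataset(["smith", "smyth", "smithe"])
names.assign_range("smith")

# Copying the dataset (for example to a new database) copies the range with it.
moved = copy.deepcopy(names)
```

Because the range is part of the dataset object rather than a separate index, `moved.ranges` equals `names.ranges` without any recomputation at the new location.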
Abstract
Software for processing a database query that includes: (i) receiving a query of a database including a search value; (ii) determining a distance between the search value and at least one reference value; (iii) determining a maximum distance from the search value to be used in searching a plurality of datasets of the database, wherein the maximum distance from the search value defines a search range and is based, at least in part, on the determined distance between the search value and the at least one reference value; (iv) determining a subset of datasets from the plurality of datasets that includes datasets for which a data range with respect to each reference value overlaps with the search range; and (v) performing approximate string matching for the search value on the subset of datasets.
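Steps (ii) through (v) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the distance measure is assumed to be Levenshtein distance, the maximum distance is simply passed in as a tolerance `k` (the abstract does not say how it is derived), and the names `build_dataset`, `search_range`, `candidate_datasets`, and `fuzzy_search` are hypothetical. Under these assumptions the triangle inequality guarantees that any entry within `k` of the search value lies between `d - k` and `d + k` from the reference value, where `d` is the distance from the search value to that reference value, so a dataset whose data range misses that interval can be skipped.

```python
def levenshtein(a, b):
    """Edit distance by dynamic programming (assumed distance measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def build_dataset(entries, references):
    """Attach a (min, max) distance range per reference value to the entries."""
    ranges = {}
    for ref in references:
        distances = [levenshtein(entry, ref) for entry in entries]
        ranges[ref] = (min(distances), max(distances))
    return {"entries": entries, "ranges": ranges}


def search_range(query, reference, k):
    """Interval of distances to `reference` that a match within k can have."""
    d = levenshtein(query, reference)
    return max(0, d - k), d + k


def candidate_datasets(datasets, query, references, k):
    """Names of datasets whose data range overlaps the search range for every reference."""
    kept = []
    for name, dataset in datasets.items():
        overlaps = True
        for ref in references:
            low, high = search_range(query, ref, k)
            d_min, d_max = dataset["ranges"][ref]
            if d_max < low or d_min > high:
                overlaps = False  # no entry here can be within k of the query
                break
        if overlaps:
            kept.append(name)
    return kept


def fuzzy_search(datasets, query, references, k):
    """Run approximate matching only on the datasets that survive pruning."""
    hits = []
    for name in candidate_datasets(datasets, query, references, k):
        hits.extend(e for e in datasets[name]["entries"] if levenshtein(e, query) <= k)
    return hits


references = ["smith"]
datasets = {
    "surnames": build_dataset(["smith", "smyth", "smithe"], references),
    "cities": build_dataset(["washington", "wellington"], references),
}
kept = candidate_datasets(datasets, "smiht", references, k=2)
hits = fuzzy_search(datasets, "smiht", references, k=2)
```

In this example the `cities` dataset is skipped without comparing any of its entries to the search value, since every entry in it is at least five edits from the reference value while the search range ends at four. The saving grows with dataset size, because the overlap test costs one distance computation per reference value regardless of how many entries a dataset holds.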
24 Citations
2 Claims
2. A computer system comprising:
a processor(s) set; and
a computer readable storage medium;
wherein:
the processor set is structured, located, connected and/or programmed to run program instructions stored on the computer readable storage medium; and
the program instructions include:
program instructions programmed to receive a query of a database, wherein the query includes a search value, and wherein the database includes a plurality of datasets;
program instructions programmed to identify the search value within the received query;
program instructions programmed to determine at least one reference value based on the identified search value;
program instructions programmed to determine a distance between the search value and the at least one reference value;
program instructions programmed to determine a maximum distance from the search value to be used in searching the database, wherein the maximum distance from the search value defines a search range and is based, at least in part, on the determined distance between the search value and the at least one reference value;
program instructions programmed to determine a subset of datasets from the plurality of datasets that includes datasets for which a data range with respect to each reference value overlaps with the search range; and
program instructions programmed to perform approximate string matching for the search value on the subset of datasets;
wherein:
each dataset of the plurality of datasets is assigned a minimum distance and a maximum distance between values of dataset entries and the at least one reference value; and
the minimum distance and the maximum distance for each dataset define the data range for the respective dataset with respect to the at least one reference value, and wherein the minimum and maximum distance are permanently stored in a respective dataset to which the minimum and maximum distance are assigned and transferred with the respective dataset when the dataset is copied to a new location or to a new database.
Specification