Method for processing a database query
First Claim
1. A computer program product for processing a query in a database, the computer program product comprising:
one or more computer-readable storage devices and program instructions stored on at least one of the one or more tangible storage devices, the program instructions executable by a processor, the program instructions comprising;
program instructions to determine a plurality of reference values for a plurality of datasets with entries associated with the database, wherein the database is stored on a first computer, wherein a number of characters in each reference value within the determined plurality of reference values is equal to or less than a maximum number of characters per entry of the datasets, wherein determining the plurality of reference values comprises determining a frequency of a certain character on a certain digit of the entries of the database and selecting each reference value within the plurality of reference values based on a plurality of characters being found with a highest frequency on a plurality of individual digits per entry of the datasets, and wherein a sequence of the plurality of characters associated with each reference value within the plurality of reference values is adapted to a plurality of sequences of characters of the plurality of values of the entries of the dataset;
program instructions to assign the determined plurality of reference values to the plurality of datasets with entries associated with the database;
program instructions to assign a plurality of distance statistics to the plurality of datasets associated with the database, wherein the assigned plurality of distance statistics describe a minimum and a maximum distance between a plurality of values of the entries of a dataset within the plurality of datasets and an assigned reference value within the assigned plurality of reference values;
program instructions to receive, from a second computer, the query associated with the database, wherein the received query comprises a search value;
program instructions to identify the search value within the received query;
program instructions to determine a search reference value based on the identified search value, wherein a first three characters of the identified search value matches a first three characters of the determined search reference value;
program instructions to determine the distance between the identified search value and the determined search reference value, said determination resulting in a search distance;
program instructions to determine a subset of datasets from the plurality of datasets for which the search distance is within a limit given by the minimum and maximum distances described by the respective distance statistics; and
program instructions to search for the search value in the subset of datasets.
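The reference-value step in the claim above (selecting the character found with the highest frequency at each position of the entries) can be illustrated with a minimal Python sketch. The function name `reference_value` and the sample entries are hypothetical, and the tie-breaking rule (first character encountered wins) is an assumption that the claim does not specify.

```python
from collections import Counter


def reference_value(entries):
    """Build a reference value from the most frequent character at each position.

    Assumption: ties are broken in favour of the character seen first,
    which is how Counter.most_common orders equal counts.
    """
    if not entries:
        return ""
    max_len = max(len(e) for e in entries)
    chars = []
    for pos in range(max_len):
        # Count the characters that occur at this position; shorter entries are skipped.
        counts = Counter(e[pos] for e in entries if len(e) > pos)
        chars.append(counts.most_common(1)[0][0])
    return "".join(chars)


# Hypothetical sample data, not taken from the patent.
sample = ["smith", "smyth", "smite"]
ref = reference_value(sample)
```

By construction the result is never longer than the longest entry, which matches the length constraint stated in the claim.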
Abstract
The invention relates to a computer-implemented method for processing a query in a database, the query comprising a search value. The database comprises a plurality of datasets, the datasets comprising entries, wherein distance statistics are assigned to the datasets. The distance statistics describe the minimum and maximum distance between the values of the entries of a dataset of the plurality of datasets and a reference value. The method comprises determining the distance between the search value and the reference value, said determination resulting in a search distance; determining a subset of datasets from the plurality of datasets for which the search distance is within the limits given by the minimum and maximum distances described by the respective distance statistics; and searching for the search value in the subset of datasets.
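The pruning idea in the abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the text above does not name a distance metric, so Levenshtein edit distance is assumed here, and all function names and sample data are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (an assumed metric, not named in the source)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def distance_stats(entries, ref):
    """Minimum and maximum distance between a dataset's entries and the reference value."""
    distances = [levenshtein(e, ref) for e in entries]
    return min(distances), max(distances)


def candidate_datasets(stats, ref, search_value):
    """Names of datasets whose [min, max] range contains the search distance."""
    sd = levenshtein(search_value, ref)
    return [name for name, (lo, hi) in stats.items() if lo <= sd <= hi]


def search(datasets, stats, ref, search_value):
    """Scan only the candidate datasets for the search value."""
    return [name for name in candidate_datasets(stats, ref, search_value)
            if search_value in datasets[name]]


# Hypothetical sample data.
ref = "smith"
datasets = {"A": ["smith", "smyth", "smite"], "B": ["jones", "johnson"]}
stats = {name: distance_stats(entries, ref) for name, entries in datasets.items()}
```

A dataset that contains the search value always passes the range check, because the distance of that entry to the reference value lies between the recorded minimum and maximum. The converse does not hold: a candidate dataset may still lack the value, which is why the final scan is needed.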
8 Claims
1. A computer program product for processing a query in a database, the computer program product comprising:
one or more computer-readable storage devices and program instructions stored on at least one of the one or more tangible storage devices, the program instructions executable by a processor, the program instructions comprising;
program instructions to determine a plurality of reference values for a plurality of datasets with entries associated with the database, wherein the database is stored on a first computer, wherein a number of characters in each reference value within the determined plurality of reference values is equal to or less than a maximum number of characters per entry of the datasets, wherein determining the plurality of reference values comprises determining a frequency of a certain character on a certain digit of the entries of the database and selecting each reference value within the plurality of reference values based on a plurality of characters being found with a highest frequency on a plurality of individual digits per entry of the datasets, and wherein a sequence of the plurality of characters associated with each reference value within the plurality of reference values is adapted to a plurality of sequences of characters of the plurality of values of the entries of the dataset;
program instructions to assign the determined plurality of reference values to the plurality of datasets with entries associated with the database;
program instructions to assign a plurality of distance statistics to the plurality of datasets associated with the database, wherein the assigned plurality of distance statistics describe a minimum and a maximum distance between a plurality of values of the entries of a dataset within the plurality of datasets and an assigned reference value within the assigned plurality of reference values;
program instructions to receive, from a second computer, the query associated with the database, wherein the received query comprises a search value;
program instructions to identify the search value within the received query;
program instructions to determine a search reference value based on the identified search value, wherein a first three characters of the identified search value matches a first three characters of the determined search reference value;
program instructions to determine the distance between the identified search value and the determined search reference value, said determination resulting in a search distance;
program instructions to determine a subset of datasets from the plurality of datasets for which the search distance is within a limit given by the minimum and maximum distances described by the respective distance statistics; and
program instructions to search for the search value in the subset of datasets.
Dependent claims: 2, 4
3. The computer program product 14, wherein the method further comprises determining from the plurality of search distances a minimum search distance and the respective first reference value, wherein when determining the subset of datasets from the plurality of datasets only the minimum and maximum distances for the first reference value are considered.
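The selection described in claim 3 (pick the reference value with the minimum search distance, then consult only that reference value's statistics) might look like the following Python sketch. The position-wise distance used here is an assumption chosen for brevity, and the names `positional_distance`, `closest_reference` and `prune` are hypothetical.

```python
def positional_distance(a, b):
    """Assumed metric: mismatching positions plus the difference in length."""
    return sum(ca != cb for ca, cb in zip(a, b)) + abs(len(a) - len(b))


def closest_reference(search_value, refs):
    """Return the reference value with the minimum search distance, and that distance."""
    distances = {r: positional_distance(search_value, r) for r in refs}
    best = min(distances, key=distances.get)  # first reference wins on ties
    return best, distances[best]


def prune(stats, search_value, refs):
    """Keep datasets whose range for the closest reference contains the search distance."""
    best, sd = closest_reference(search_value, refs)
    return [name for name, per_ref in stats.items()
            if per_ref[best][0] <= sd <= per_ref[best][1]]


# Hypothetical sample data: each dataset keeps one (min, max) pair per reference value.
refs = ["smith", "jones"]
datasets = {"A": ["smith", "smyth", "smite"], "B": ["jones", "jonas"]}
stats = {
    name: {
        r: (min(positional_distance(e, r) for e in entries),
            max(positional_distance(e, r) for e in entries))
        for r in refs
    }
    for name, entries in datasets.items()
}
```

With this data, a search for "jonas" is closest to the reference value "jones", so only the statistics recorded against "jones" decide which datasets are scanned.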
5. A computer system for processing a query in a database, the computer system comprising:
one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising;
determining a plurality of reference values for a plurality of datasets with entries associated with the database, wherein the database is stored on a first computer, wherein a number of characters in each reference value within the determined plurality of reference values is equal to or less than a maximum number of characters per entry of the datasets, wherein determining the plurality of reference values comprises determining a frequency of a certain character on a certain digit of the entries of the database and selecting each reference value within the plurality of reference values based on a plurality of characters being found with a highest frequency on a plurality of individual digits per entry of the datasets, and wherein a sequence of the plurality of characters associated with each reference value within the plurality of reference values is adapted to a plurality of sequences of characters of the plurality of values of the entries of the dataset;
assigning the determined plurality of reference values to the plurality of datasets with entries associated with the database;
assigning a plurality of distance statistics to the plurality of datasets associated with the database, wherein the assigned plurality of distance statistics describe a minimum and a maximum distance between a plurality of values of the entries of a dataset within the plurality of datasets and an assigned reference value within the assigned plurality of reference values;
receiving, from a second computer, the query associated with the database, wherein the received query comprises a search value;
identifying the search value within the received query;
determining a search reference value based on the identified search value, wherein a first three characters of the identified search value matches a first three characters of the determined search reference value;
determining the distance between the identified search value and the determined search reference value, said determination resulting in a search distance;
determining a subset of datasets from the plurality of datasets for which the search distance is within a limit given by the minimum and maximum distances described by the respective distance statistics; and
searching for the search value in the subset of datasets.
Dependent claims: 6, 7, 8
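The step of determining a search reference value whose first three characters match those of the search value can be sketched in Python as below. The function name `pick_search_reference` is hypothetical, and returning `None` when no reference value matches is an assumption, since the claim does not say what happens in that case.

```python
def pick_search_reference(search_value, reference_values, prefix_len=3):
    """Return the first reference value sharing the search value's leading characters.

    Assumption: None is returned when no reference value matches; the claim
    does not describe a fallback.
    """
    prefix = search_value[:prefix_len]
    for ref in reference_values:
        if ref[:prefix_len] == prefix:
            return ref
    return None


# Hypothetical reference values.
references = ["jones", "smith"]
chosen = pick_search_reference("smile", references)
```

Here "smile" and "smith" share the prefix "smi", so "smith" would be used as the search reference value for the subsequent distance computation.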
Specification