System and method for selecting data to be corrected
Abstract
A pool of data elements is identified, and a set of random data elements is selected from the pool. The data elements in the set are scored. Data elements may be scored based on attributes of the data such as the type, domain, structure, size, and volume of the data. The lowest scoring data elements are removed from the set and replaced by data elements from the pool that are related to the highest scoring data elements in the set. The set is then scored again, and it is determined whether the current set score is within a predetermined desired range of the previous set score. If it is not, the process is repeated.
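The loop described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the names `select_for_correction`, `score`, `related`, `set_size`, `n_swap`, `tolerance`, and `max_rounds` are hypothetical, the set score is assumed to be the sum of the element scores, the "predetermined desired range" is assumed to be an absolute tolerance, and a round cap is added because the text does not guarantee that the score settles.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_for_correction(
    pool: Sequence[T],
    score: Callable[[T], float],       # hypothetical per-element scorer
    related: Callable[[T, T], float],  # hypothetical relatedness (higher = more related)
    set_size: int = 10,
    n_swap: int = 2,                   # assumed "predetermined number" to swap per round
    tolerance: float = 0.5,            # assumed absolute range for the stop test
    max_rounds: int = 50,              # safety cap, not part of the source text
    seed: int = 0,
) -> List[T]:
    if n_swap < 1 or n_swap >= set_size or set_size + n_swap > len(pool):
        raise ValueError("need 1 <= n_swap < set_size and set_size + n_swap <= len(pool)")

    rng = random.Random(seed)
    scores = [score(e) for e in pool]

    # Select a random set from the pool and score it.
    chosen = rng.sample(range(len(pool)), set_size)
    previous = sum(scores[i] for i in chosen)

    for _ in range(max_rounds):
        # Identify the lowest and highest scoring elements in the set.
        ranked = sorted(chosen, key=lambda i: scores[i])
        best = ranked[-n_swap:]
        kept = ranked[n_swap:]  # drop the n_swap lowest scoring elements

        # Replace them with pool elements most related to the best ones.
        members = set(chosen)
        outside = [i for i in range(len(pool)) if i not in members]
        outside.sort(
            key=lambda i: max(related(pool[i], pool[b]) for b in best),
            reverse=True,
        )
        chosen = kept + outside[:n_swap]

        # Re-score the set and stop once the score has settled.
        current = sum(scores[i] for i in chosen)
        if abs(current - previous) <= tolerance:
            break
        previous = current

    return [pool[i] for i in chosen]
```

For example, with a pool of integers one might pass `score=float` and `related=lambda a, b: -abs(a - b)`, so that "related" means numerically close. Real scorers and relatedness measures would depend on the data being corrected.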
14 Citations
20 Claims
1. A method for selecting data to be corrected, comprising:
(a) receiving a plurality of data elements from a first storage device and identifying a pool of data elements from the plurality of data elements;
(b) selecting a set of random data elements from the pool;
(c) scoring the random data elements in the set to generate a first score;
(d) identifying a predetermined number of lowest scoring data elements and a predetermined number of highest data elements;
(e) removing the lowest scoring data elements from the set;
(f) selecting data elements from the pool related to the highest scoring data elements in the set;
(g) scoring the set to determine a current score;
(h) determining if the current set score is within a predetermined range of the first score; and
(i) if the current set score is not within a predetermined desired range of the previous set score, then repeating steps (c) through (i) until the current set score is within a predetermined desired range of the previous set score, otherwise identifying the set as data to be corrected and storing the data in a second storage device;
wherein scoring the data elements in steps (c) and (g) is based on at least one of a type of the data, a domain of the data; a structure of the data; a size of the data, and a volume of the data that comprises the data elements.
Dependent claims: 2, 3, 4, 5, 6
7. A system for selecting data to be corrected, comprising:
a processor operative to execute computer executable instruction; and
memory having stored therein computer executable instructions for performing the following steps;
(a) receiving a plurality of data elements from a first storage device and identifying a pool of data elements from the plurality of data elements;
(b) selecting a set of random data elements from the pool;
(c) scoring the random data elements in the set to generate a first score;
(d) identifying a predetermined number of lowest scoring data elements and a predetermined number of highest data elements;
(e) removing the lowest scoring data elements from the set;
(f) selecting data elements from the pool related to the highest scoring data elements in the set;
(g) scoring the set to determine a current score;
(h) determining if the current set score is within a predetermined range of the first score; and
(i) if the current set score is not within a predetermined desired range of the previous set score, then repeating steps (c) through (i) until the current set score is within a predetermined desired range of the previous set score, otherwise identifying the set as data to be corrected and storing the data in a second storage device;
wherein the computer executable instructions for performing the scoring of the data elements in steps (c) and (g) are based on at least one of a type of the data, a domain of the data; a structure of the data; a size of the data, and a volume of the data that comprises the data elements.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
15. A computer readable medium having stored thereon computer readable instructions for performing the following steps:
(a) receiving a plurality of data elements from a first storage device and identifying a pool of data elements from the plurality of data elements;
(b) selecting a set of random data elements from the pool;
(c) scoring the random data elements in the set to generate a first score;
(d) identifying a predetermined number of lowest scoring data elements and a predetermined number of highest data elements;
(e) removing the lowest scoring data elements from the set;
(f) selecting data elements from the pool related to the highest scoring data elements in the set;
(g) scoring the set to determine a current score;
(h) determining if the current set score is within a predetermined range of the first score; and
(i) if the current set score is not within a predetermined desired range of the previous set score, then repeating steps (c) through (i) until the current set score is within a predetermined desired range of the previous set score, otherwise identifying the set as data to be corrected and storing the data in a second storage device;
wherein the computer readable instructions for performing the scoring of the data elements in steps (c) and (g) are based on at least one of a type of the data, a domain of the data; a structure of the data; a size of the data, and a volume of the data that comprises the data elements.
Dependent claims: 16, 17, 18, 19, 20
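The scoring limitation shared by the independent claims (scoring based on at least one of type, domain, structure, size, and volume) could be modeled as a scorer built from one or more attribute criteria. The sketch below is only an assumption about how such a scorer might be organized: `DataElement`, `make_scorer`, `score_set`, and the example criteria are hypothetical names, and summing criteria is one choice among many.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Sequence


@dataclass(frozen=True)
class DataElement:
    # Hypothetical record of the attributes named in the claims.
    data_type: str   # e.g. "text" or "numeric"
    domain: str      # e.g. "addresses"
    structure: str   # e.g. "free-form" or "tabular"
    size: int        # size of the element, in bytes
    volume: int      # volume of the data that comprises the element


Criterion = Callable[[DataElement], float]


def make_scorer(criteria: Sequence[Criterion]) -> Criterion:
    """Combine one or more attribute criteria into a single element scorer."""
    if not criteria:
        raise ValueError("at least one criterion is required")

    def score(element: DataElement) -> float:
        # Assumption: criteria are simply summed; weights could be folded
        # into the individual criteria.
        return sum(criterion(element) for criterion in criteria)

    return score


def score_set(elements: Iterable[DataElement], score: Criterion) -> float:
    """Assumed set score: the sum of the element scores."""
    return sum(score(e) for e in elements)


# Illustrative criteria only; real ones would depend on the correction task.
def by_type(element: DataElement) -> float:
    return 1.0 if element.data_type == "text" else 0.0


def by_size(element: DataElement) -> float:
    return element.size / 1024
```

A scorer built with `make_scorer([by_type, by_size])` uses two of the five attributes, which satisfies the "at least one of" wording; any other non-empty combination would fit the same shape.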
Specification