Error checking database records
First Claim
1. An apparatus for performing a quality check on database records, the apparatus comprising:
a processor; and
a non-transitory computer-readable medium storing a plurality of instructions which, when executed, cause the one or more processors to:
select, by a database system, a first record comprising a first entity being assigned a first predefined label, and a second entity being assigned a second predefined label, the second predefined label being different than the first predefined label, the first entity including multiple terms;
rearrange, by the database system, the multiple terms of the first entity into a plurality of permutations;
evaluate, by the database system, for each permutation, a likelihood that the permutation corresponds to the first entity of the first record;
determine, by the database system, a first number of times that the permutation of the first entity with a highest likelihood is assigned to the first predefined label in stored database records;
determine, by the database system, a second number of times that the permutation of the first entity with the highest likelihood is assigned to any label in the stored database records;
determine, by the database system, a likelihood that the first predefined label that is assigned to the permutation of the first entity with the highest likelihood is correct, wherein the assigned first predefined label likelihood is determined based on the first number and the second number; and
initiate, by the database system, action to correct the first record when the assigned first predefined label likelihood is less than a first threshold.
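The steps recited in the claim can be illustrated with a minimal Python sketch. The claim does not say how a permutation's likelihood is evaluated or how the two counts are combined, so the sketch makes two assumptions: a permutation is scored by how often that exact term order occurs in the stored records, and the label likelihood is the ratio of the first number to the second number. The function name `check_label`, the `(entity_text, label)` record layout, and the default threshold are hypothetical.

```python
from collections import Counter
from itertools import permutations


def check_label(entity, label, stored, threshold=0.5):
    """Return (best_permutation, label_likelihood, needs_correction).

    `stored` is an iterable of (entity_text, label) pairs standing in
    for the stored database records (an assumed layout).
    """
    stored = list(stored)
    entity_counts = Counter(text for text, _ in stored)  # text under any label
    pair_counts = Counter(stored)                        # text under a given label

    # Rearrange the entity's terms into every possible ordering.
    candidates = {" ".join(p) for p in permutations(entity.split())}

    # Assumed scoring: an ordering is more likely the more often it occurs
    # in the stored records. Sorting first makes ties deterministic.
    best = max(sorted(candidates), key=lambda c: entity_counts[c])

    first_number = pair_counts[(best, label)]  # best ordering with this label
    second_number = entity_counts[best]        # best ordering with any label

    # Assumed combination: a simple ratio, defined as 0 when nothing matches.
    likelihood = first_number / second_number if second_number else 0.0
    return best, likelihood, likelihood < threshold
```

For example, with stored records in which "John Smith" appears twice labeled "name" and once labeled "company", calling `check_label("Smith John", "company", stored)` selects "John Smith" as the best ordering and yields a likelihood of 1/3, which falls below the default threshold and flags the record. Enumerating every ordering grows factorially with the number of terms, so a sketch like this is only practical for short entities.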
Abstract
An error checking technique for database records. A record is selected and its entities are compared with the entities of other records stored in the database to determine a likelihood that the labels associated with the entities of the selected record are correct. The likelihood for each entity of the selected record being correctly labeled can be determined by comparing the number of times that the entity appears in the database records with that label to the number of times that the entity appears in the database records with any other label. If the likelihood does not exceed a threshold, then an error is likely, and action can be taken to correct the record.
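One plausible way to write the comparison described in the abstract is as a ratio. The abstract does not give a formula, so the expression below is an assumption; the symbols $N_{\ell}(e)$ (the number of stored records in which entity $e$ carries label $\ell$) and $N_{\neg\ell}(e)$ (the number in which $e$ carries any other label) are introduced here for illustration only.

```latex
\[
P(\ell \mid e) \;=\; \frac{N_{\ell}(e)}{N_{\ell}(e) + N_{\neg\ell}(e)}
\]
% Worked example: the entity appears 3 times with the label and once with another label.
\[
P(\ell \mid e) \;=\; \frac{3}{3 + 1} \;=\; 0.75
\]
```

Under this assumed ratio, a threshold of 0.8 would mark the record as likely mislabeled, while a threshold of 0.5 would not. The denominator is the number of times the entity appears with any label, which matches the second number recited in the claims.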
14 Claims
6. A computer program product comprising computer-readable program code to be executed by one or more processors when retrieved from a non-transitory computer-readable medium, the program code including instructions to:
select, by a database system, a first record comprising a first entity being assigned a first predefined label and a second entity being assigned a second predefined label, the second predefined label being different than the first predefined label, the first entity including multiple terms;
rearrange, by the database system, the multiple terms of the first entity into a plurality of permutations;
evaluate, by the database system, for each permutation, a likelihood that the permutation corresponds to the first entity of the first record;
determine, by the database system, a first number of times that the permutation of the first entity with a highest likelihood is assigned to the first predefined label in stored database records;
determine, by the database system, a second number of times that the permutation of the first entity with the highest likelihood is assigned to any label in the stored database records;
determine, by the database system, a likelihood that the first predefined label that is assigned to the permutation of the first entity with the highest likelihood is correct, wherein the assigned first predefined label likelihood is determined based on the first number and the second number; and
initiate, by the database system, action to correct the first record when the assigned first predefined label likelihood is less than a first threshold.
10. A method for performing a quality check on database records, the method comprising:
selecting, by a database system, a first record comprising a first entity being assigned a first predefined label and a second entity being assigned a second predefined label, the second predefined label being different than the first predefined label, the first entity including multiple terms;
rearranging, by the database system, the multiple terms of the first entity into a plurality of permutations;
evaluating, by the database system, for each permutation, a likelihood that the permutation corresponds to the first entity of the first record;
determining, by the database system, a first number of times that the permutation of the first entity with a highest likelihood is assigned to the first predefined label in stored database records;
determining, by the database system, a second number of times that the permutation of the first entity with the highest likelihood is assigned to any label in the stored database records;
determining, by the database system, a likelihood that the first predefined label that is assigned to the permutation of the first entity with the highest likelihood is correct, wherein the assigned first predefined label likelihood is determined based on the first number and the second number; and
initiating, by the database system, action to correct the first record when the assigned first predefined label likelihood is less than a first threshold.
Specification