Method and apparatus for analyzing the quality of the content of a database
Abstract
The present invention provides a method for scoring a searchable electronic catalog, such as those used in e-commerce and industrial materiel systems. Such catalogs are typically configured as databases, which the present invention analyzes for a quality such as completeness, consistency, or comprehensibility. The method includes selecting the fields of the database that are to be analyzed, ranking the fields in order of pertinence to the quality being measured, fetching the values of those fields for each record of the database, and comparing the fetched values to a standard. A score is then assigned to each field based on the comparison. The field scores are weighted according to each field's rank, and the weighted scores are combined to obtain a score for the database. A variety of qualities can be evaluated in this way, and the resulting scores can be used to compare databases or to localize deficiencies in a database for improvement.
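The scoring flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the catalog is held as a list of dictionaries (one per record, keyed by field name), that the "standard" is supplied as a predicate on a single value, and that the field ranked r-th receives the weight 1/r. The 1/r rule is a hypothetical choice (the abstract says only that weights are based on rank), and the names `field_points`, `database_score`, and `is_present` are likewise hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A record maps field (column) names to values; None represents a null.
Record = Dict[str, Optional[str]]
Predicate = Callable[[Optional[str]], bool]


def field_points(records: List[Record], field: str, meets_standard: Predicate) -> int:
    """Assign one point for each record whose value in `field` meets the standard."""
    return sum(1 for rec in records if meets_standard(rec.get(field)))


def database_score(records: List[Record], ranked_fields: List[str],
                   meets_standard: Predicate) -> float:
    """Weight each field's points by its rank and sum them into one score.

    Assumption (hypothetical): the field ranked r-th (1 = most pertinent)
    gets weight 1 / r.
    """
    total = 0.0
    for rank, field in enumerate(ranked_fields, start=1):
        weight = 1.0 / rank
        total += weight * field_points(records, field, meets_standard)
    return total


def is_present(value: Optional[str]) -> bool:
    """Example standard for completeness: the value is non-null and non-empty."""
    return value is not None and value.strip() != ""


catalog = [
    {"name": "hex bolt", "size": "10 mm"},
    {"name": "flat washer", "size": None},
]
print(database_score(catalog, ["name", "size"], is_present))  # 2.5
```

With `name` ranked first, its 2 points are multiplied by 1 and the single point for `size` by 0.5, giving 2.5. Swapping in a different predicate (for example, a thesaurus lookup) scores a different quality with the same weighting step.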
22 Claims
1. A computer method comprising:
selecting a first field of a database for analysis, the database comprising a two-dimensional database wherein each row represents a record corresponding to an item in an electronic catalog and each column represents a field corresponding to an attribute of the item;
selecting a second field of the database for analysis;
fetching values for the first field for each record of the database;
comparing the fetched values of the first field to a thesaurus, the thesaurus including a list of synonyms for units of measure and abbreviations;
assigning a first consistency score to the first field based on comparing the fetched values of the first field;
fetching values for the second field for each record of the database;
comparing the fetched values of the second field to the thesaurus;
assigning a second consistency score for the second field based on comparing the fetched values of the second field; and
assigning an overall consistency score for the database by combining the first consistency score and the second consistency score.
2. The method of claim 1 further comprising:
ranking the first field and the second field in order of importance; and
weighting the first consistency score and the second consistency score based on the respective ranks of the first field and the second field.
3. The method of claim 2 wherein weighting the first consistency score and the second consistency score based on the rank of each field comprises assigning a weight to each field based on the rank of the field and multiplying the total points assigned to the field by the weight.
4. The method of claim 3 further comprising classifying fetched values for the first field into types, counting the number of each value type for the first field and assigning a comprehensibility score based on the number of each value type for the first field.
5. The method of claim 4 wherein the value types include one or more of nouns and adjectives.
6. The method of claim 4 wherein assigning a comprehensibility score includes forming a ratio of value types in the first field to other value types in the first field and comparing the ratio to a desired ratio.
7. The method of claim 1 further comprising:
assigning a first completeness score for the first field by comparing the fetched values for the first field by assigning points for each non-null value so that the first completeness score corresponds to the number of non-null values for all records in the first field;
assigning a second completeness score for the second field by assigning points for each non-null value so that the second completeness score corresponds to the number of non-null values for all records in the second field; and
assigning an overall completeness score for the database by combining the first completeness score and the second completeness score.
8. The method of claim 7 wherein the first field corresponds to units of measure, and wherein assigning a first consistency score includes assigning points for each use of an alternate expression for the same unit of measure.
9. The method of claim 1 wherein assigning a first consistency score comprises assigning points for each fetched value that does not match a thesaurus value so that the first consistency score corresponds to the number of non-matching values for all records for the first field.
10. The method of claim 9 wherein the first field contains values that are abbreviations, wherein the thesaurus contains alternative abbreviations with the same meaning and wherein assigning a first consistency score includes assigning points for each use of an alternate abbreviation for the same meaning.
11. The method of claim 1 further comprising assigning a comprehensibility score for the first field by comparing the fetched values for the first field to the thesaurus and assigning points for each fetched value that does not match a thesaurus value so that the comprehensibility score corresponds to the number of non-matching values for all records for the first field.
12. A machine-readable medium having stored thereon data representing sequences of instructions which, when executed by a processor, cause the processor to perform the steps of:
selecting a first field of a database for analysis, the database comprising a two-dimensional database wherein each row represents a record corresponding to an item in an electronic catalog and each column represents a field corresponding to an attribute of the item;
selecting a second field of the database for analysis;
fetching values for the first field for each record of the database;
comparing the fetched values of the first field to a thesaurus, the thesaurus including a list of synonyms for units of measure and abbreviations;
assigning a first consistency score to the first field based on comparing the fetched values of the first field;
fetching values for the second field for each record of the database;
comparing the fetched values of the second field to the thesaurus;
assigning a second consistency score for the second field based on comparing the fetched values of the second field; and
assigning an overall consistency score for the database by combining the first consistency score and the second consistency score.
13. The medium of claim 12 further comprising:
ranking the first field and the second field in order of importance; and
weighting the first consistency score and the second consistency score based on the respective ranks of the first field and the second field.
14. The medium of claim 13 wherein weighting the first consistency score and the second consistency score based on the rank of each field comprises assigning a weight to each field based on the rank of the field and multiplying the total points assigned to the field by the weight.
15. The medium of claim 12 further comprising:
assigning a first completeness score for the first field by comparing the fetched values for the first field by assigning points for each non-null value so that the first completeness score corresponds to the number of non-null values for all records in that field;
assigning a second completeness score for the second field by assigning points for each non-null value so that the second completeness score corresponds to the number of non-null values for all records in the second field; and
assigning an overall completeness score for the database by combining the first completeness score and the second completeness score.
16. The medium of claim 12 wherein assigning a first consistency score comprises assigning points for each fetched value that does not match a thesaurus value so that the first consistency score corresponds to the number of non-matching values for all records for the first field.
17. The medium of claim 16 wherein the first field corresponds to units of measure, and wherein assigning a score includes assigning points for each use of an alternate expression for the same unit of measure.
18. The medium of claim 16 wherein the first field contains values that are abbreviations, wherein the thesaurus contains alternative abbreviations with the same meaning and wherein assigning a first consistency score includes assigning points for each use of an alternate abbreviation for the same meaning.
19. The medium of claim 12 further comprising assigning a comprehensibility score for the first field by comparing the fetched values for the first field to the thesaurus and assigning points for each fetched value that does not match a thesaurus value so that the comprehensibility score for the first field corresponds to the number of non-matching values for all records for the first field.
20. The medium of claim 19 further comprising classifying fetched values for the first field into types, counting the number of each value type for the first field and assigning a comprehensibility score based on the number of each value type for the first field.
21. The medium of claim 20 wherein the value types include one or more of nouns and adjectives.
22. The medium of claim 20 wherein assigning a comprehensibility score based on the number of each value type includes forming a ratio of value types in the first field to other value types in the first field and comparing the ratio to a desired ratio.
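The per-field scoring rules recited in the claims can be illustrated with a short Python sketch. It is a hedged reading, not the claimed implementation. It assumes a thesaurus that maps each preferred term to its alternate expressions and treats a value as "non-matching" when it uses one of those alternates (one interpretation of claims 8 to 10). It counts non-null values for completeness (claims 7 and 15). For comprehensibility (claims 4 to 6), it classifies words with a tiny hard-coded lexicon in place of a real part-of-speech tagger and compares the noun-to-adjective ratio to a desired ratio. The thesaurus entries, the lexicon, the function names, and the use of absolute deviation as the comparison are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical thesaurus: preferred term -> alternate expressions with the same meaning.
THESAURUS: Dict[str, Set[str]] = {
    "mm": {"millimeter", "millimetre", "mm."},
    "in": {"inch", "inches", "in."},
    "ea": {"each", "ea."},
}
# Reverse index from each alternate expression to its preferred term.
ALTERNATE_TO_PREFERRED: Dict[str, str] = {
    alt: pref for pref, alts in THESAURUS.items() for alt in alts
}

# Hypothetical word-type lexicon standing in for a part-of-speech tagger.
WORD_TYPES: Dict[str, str] = {
    "bolt": "noun", "washer": "noun", "steel": "noun",
    "hex": "adjective", "flat": "adjective",
}


def consistency_points(values: List[Optional[str]]) -> int:
    """One point per value that uses an alternate expression instead of the
    preferred term; a higher count indicates a less consistent field."""
    return sum(
        1 for v in values
        if v is not None and v.strip().lower() in ALTERNATE_TO_PREFERRED
    )


def completeness_points(values: List[Optional[str]]) -> int:
    """One point per non-null, non-empty value."""
    return sum(1 for v in values if v is not None and v.strip() != "")


def comprehensibility_deviation(values: List[Optional[str]], desired_ratio: float,
                                numerator: str = "noun",
                                denominator: str = "adjective") -> float:
    """Classify each word by type, form the numerator/denominator type ratio,
    and return its absolute distance from the desired ratio (0.0 is ideal)."""
    counts: Counter = Counter()
    for v in values:
        if v is None:
            continue
        for word in v.lower().split():
            counts[WORD_TYPES.get(word, "unknown")] += 1
    if counts[denominator] == 0:
        return float("inf")  # ratio undefined: no words of the denominator type
    return abs(counts[numerator] / counts[denominator] - desired_ratio)


units = ["mm", "millimeter", None, "inch", "in"]
names = ["hex bolt", "flat washer", "steel bolt"]
print(consistency_points(units))   # 2
print(completeness_points(units))  # 4
print(comprehensibility_deviation(names, desired_ratio=2.0))  # 0.0
```

Per claim 1, an overall consistency score would combine such field-level counts, for example by summing them or by first applying rank-based weights as in claims 2 and 3.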
Specification