DEVICES AND METHOD FOR SCORING DATA TO QUICKLY IDENTIFY RELEVANT ATTRIBUTES FOR INSTANT CLASSIFICATION
1 Assignment
0 Petitions
Abstract
A method for allowing a computer to classify an input containing data. A list of categories is received. A sub-list of categories is selected, wherein the sub-list comprises those categories in the list that have corresponding distinct correlation scores above a predetermined value. Input data that tends to over-correlate to the classification system is received. A truncated snapshot is generated, the truncated snapshot comprising only attributes from the plurality of input attributes that have corresponding input categories that match categories in the sub-list of categories. The data is classified using the truncated snapshot and the classification system.
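The selection and truncation steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the category list is a mapping from category name to correlation score and that each input attribute is an (input category, input value) pair. The function names and the example categories are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical data shapes (not specified by the patent):
# the category list maps each category to its correlation score,
# and each input attribute is an (input category, input value) pair.
Attribute = Tuple[str, object]


def select_sub_list(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the categories whose correlation score is above the threshold."""
    return [cat for cat, score in scores.items() if score > threshold]


def truncated_snapshot(attributes: List[Attribute], sub_list: List[str]) -> List[Attribute]:
    """Keep only the input attributes whose category appears in the sub-list."""
    keep = set(sub_list)
    return [(cat, val) for cat, val in attributes if cat in keep]


# Hypothetical example values, for illustration only.
scores = {"temperature": 0.9, "color": 0.2, "pressure": 0.8, "serial_no": 0.05}
attributes = [("temperature", 71.5), ("color", "red"), ("pressure", 1.2), ("batch", 7)]

sub_list = select_sub_list(scores, threshold=0.5)
print(sub_list)                                   # ['temperature', 'pressure']
print(truncated_snapshot(attributes, sub_list))   # [('temperature', 71.5), ('pressure', 1.2)]
```

In this example, `color` is dropped because its score falls below the predetermined value, and `batch` is dropped because it does not match any category in the list. Only the remaining snapshot is passed to the classification step.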
34 Citations
20 Claims
1. A method for allowing a computer to classify an input containing data, the method comprising:
receiving, by the computer, a list of categories from a classification system stored at the computer, wherein each category in the list is assigned a corresponding distinct correlation score that indicates a relevance of a particular category to a particular classification;
selecting, by the computer, a sub-list of categories, wherein the sub-list comprises those categories in the list that have corresponding distinct correlation scores above a predetermined value, and wherein the sub-list of categories comprises less than all of the categories in the list;
receiving, by the computer, the input containing the data, wherein the data is organized by a plurality of input attributes, wherein each input attribute has a corresponding input category and a corresponding input value, such that the plurality of input attributes has a plurality of input categories and a corresponding plurality of input values, wherein at least some of the plurality of input attributes has corresponding input categories that match at least some of a plurality of categories of the list, and wherein the plurality of input attributes tends to over-correlate with respect to the classification system, the tendency to over-correlate occurring as a result of an amount of the data;
generating, by the computer, a truncated snapshot, the truncated snapshot comprising only attributes from the plurality of input attributes that have the corresponding input categories that match categories in the sub-list of categories; and
classifying the data, by the computer, using the truncated snapshot and the classification system.
Dependent claims (2, 3, 4, 5, 6, 7, 8)
9. A data processing system comprising:
a processor;
a bus connected to the processor;
a non-transitory computer readable storage medium connected to the bus, the non-transitory computer readable storage medium storing a computer program product which, when executed by the processor, performs a computer implemented method for allowing a computer to classify an input containing data, the computer program product comprising:
computer usable program code for receiving a list of categories from a classification system stored at the computer, wherein each category in the list is assigned a corresponding distinct correlation score that indicates a relevance of a particular category to a particular classification;
computer usable program code for selecting a sub-list of categories, wherein the sub-list comprises those categories in the list that have corresponding distinct correlation scores above a predetermined value, and wherein the sub-list of categories comprises less than all of the categories in the list;
computer usable program code for receiving the input containing the data, wherein the data is organized by a plurality of input attributes, wherein each input attribute has a corresponding input category and a corresponding input value, such that the plurality of input attributes has a plurality of input categories and a corresponding plurality of input values, wherein at least some of the plurality of input attributes has corresponding input categories that match at least some of a plurality of categories of the list of categories, and wherein the plurality of input attributes tends to over-correlate with respect to the classification system, the tendency to over-correlate occurring as a result of an amount of the data;
computer usable program code for generating a truncated snapshot, the truncated snapshot comprising only attributes from the plurality of input attributes that have corresponding input categories that match categories in the sub-list of categories; and
computer usable program code for classifying the data using the truncated snapshot and the classification system.
Dependent claims (10, 11, 12, 13)
14. A system configured to classify an input containing data, the system comprising:
a tangible input device configured to receive a list of categories from a classification system stored at a computer, wherein each category in the list is assigned a corresponding distinct correlation score that indicates a relevance of a particular category to a particular classification;
a selection device configured to select a sub-list of categories, wherein the sub-list comprises those categories in the list that have corresponding distinct correlation scores above a predetermined value, and wherein the sub-list of categories comprises less than all of the categories in the list;
a data input device configured to receive the input, the input containing data, wherein the data is organized by a plurality of input attributes, wherein each input attribute has a corresponding input category and a corresponding input value, such that the plurality of input attributes has a plurality of input categories and a corresponding plurality of input values, wherein at least some of the plurality of input attributes has corresponding input categories that match at least some of a plurality of categories of the list, and wherein the plurality of input attributes tends to over-correlate with respect to the classification system, the tendency to over-correlate occurring as a result of an amount of the data;
a snapshot generator configured to generate a truncated snapshot, the truncated snapshot comprising only attributes from the plurality of input attributes that have corresponding input categories that match categories in the sub-list of categories; and
a classifier configured to classify the data using the truncated snapshot and the classification system.
Dependent claims (15, 16, 17, 18, 19, 20)
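The component structure recited in claims 1 and 14 (a selection device, a snapshot generator, and a classifier acting on the truncated snapshot) can be sketched as separate, pluggable parts. This is a hypothetical illustration under stated assumptions. The input devices are modeled as plain method arguments. The classification system for one particular classification is represented as a per-category correlation score plus an expected value. The classification rule, a score-weighted share of snapshot attributes that match the expected value, is a swapped-in example because the claims do not specify how classification is performed. All class names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Attribute = Tuple[str, object]  # (input category, input value)


@dataclass
class SelectionDevice:
    """Selects the sub-list: categories scored strictly above a predetermined value."""
    threshold: float

    def select(self, scores: Dict[str, float]) -> List[str]:
        sub_list = [cat for cat, score in scores.items() if score > self.threshold]
        # The claims require the sub-list to hold fewer than all categories.
        if len(sub_list) == len(scores):
            raise ValueError("predetermined value must exclude at least one category")
        return sub_list


class SnapshotGenerator:
    """Builds the truncated snapshot from attributes whose category is in the sub-list."""

    def generate(self, attributes: List[Attribute], sub_list: List[str]) -> List[Attribute]:
        keep = set(sub_list)
        return [(cat, val) for cat, val in attributes if cat in keep]


@dataclass
class ProfileClassifier:
    """Hypothetical classifier for one classification: score-weighted agreement."""
    scores: Dict[str, float]     # correlation score per category (the category list)
    expected: Dict[str, object]  # hypothetical value expected for the classification
    cutoff: float = 0.5

    def classify(self, snapshot: List[Attribute]) -> bool:
        total = sum(self.scores[cat] for cat, _ in snapshot)
        if total == 0:
            return False  # nothing relevant survived truncation
        agree = sum(self.scores[cat] for cat, val in snapshot
                    if self.expected.get(cat) == val)
        return agree / total >= self.cutoff


@dataclass
class InputClassifierSystem:
    """Wires the components together; input devices are plain method arguments."""
    selector: SelectionDevice
    generator: SnapshotGenerator
    classifier: ProfileClassifier

    def run(self, attributes: List[Attribute]) -> bool:
        sub_list = self.selector.select(self.classifier.scores)
        snapshot = self.generator.generate(attributes, sub_list)
        # Only the truncated snapshot reaches the classifier.
        return self.classifier.classify(snapshot)


# Hypothetical example values, for illustration only.
scores = {"temperature_high": 0.9, "pressure_high": 0.7, "color": 0.1}
expected = {"temperature_high": True, "pressure_high": True, "color": "red"}
system = InputClassifierSystem(SelectionDevice(0.5), SnapshotGenerator(),
                               ProfileClassifier(scores, expected))
print(system.run([("temperature_high", True), ("pressure_high", False),
                  ("color", "red"), ("batch", 7)]))  # True (0.9 / 1.6 >= 0.5)
```

Keeping each device behind its own small interface mirrors the claim structure and lets the hypothetical classification rule be replaced without touching selection or truncation. In the example, the low-scored `color` attribute never reaches the classifier, so it cannot influence the decision.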
Specification