Method and apparatus for automatically determining salient features for object classification
Abstract
A method and apparatus for automatically determining salient features for object classification is provided. In accordance with one embodiment, one or more unique features are extracted from a first content group of objects to form a first feature list, and one or more unique features are extracted from a second anti-content group of objects to form a second feature list. A ranked list of features is then created by applying statistical differentiation between unique features of the first feature list and unique features of the second feature list. A set of salient features is then identified from the resulting ranked list of features.
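The abstract does not name a particular statistic for the "statistical differentiation" step. As a minimal sketch, assuming lowercase whitespace-separated tokens as the features and a difference in document frequency as the score (both illustrative choices, not taken from the patent; the function names are hypothetical), the extract, rank, and select flow might look like this:

```python
from collections import Counter

def extract_features(documents):
    """Map each unique token to the number of documents that contain it."""
    counts = Counter()
    for text in documents:
        # set() so a token is counted once per document
        counts.update(set(text.lower().split()))
    return counts

def rank_features(content_docs, anti_docs):
    """Rank content-group features by document-frequency difference (assumed statistic)."""
    content = extract_features(content_docs)
    anti = extract_features(anti_docs)
    n_content = max(len(content_docs), 1)
    n_anti = max(len(anti_docs), 1)
    # Counter returns 0 for features missing from the anti-content group.
    scores = {f: content[f] / n_content - anti[f] / n_anti for f in content}
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda f: (-scores[f], f))

def salient_features(content_docs, anti_docs, top_k=3):
    """Take the top_k entries of the ranked list as the salient set."""
    return rank_features(content_docs, anti_docs)[:top_k]

content_docs = ["cheap pills online", "cheap watches online", "buy cheap pills"]
anti_docs = ["meeting notes online", "project meeting agenda"]
print(salient_features(content_docs, anti_docs, top_k=2))  # ['cheap', 'pills']
```

A frequency-difference score on its own does not guarantee that features absent from the anti-content group always outrank shared ones, which the claims recite as a constraint on the ranked list; an implementation would need to enforce that ordering explicitly.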
31 Claims
1. A method for classifying one or more electronic documents, said method comprising:
extracting one or more unique features from a first content group of data objects representing a first group of electronic documents to form a first feature list;
extracting one or more unique features from a second anti-content group of data objects representing a second group of electronic documents to form a second feature list;
identifying those unique features of said first feature list that are not present in said second feature list;
identifying those unique features of said first feature list that are also present in said second feature list;
creating a ranked list of features by applying statistical differentiation between unique features of said first feature list and unique features of said second feature list, wherein those unique features of said first feature list that are not present in said second feature list are ranked higher within said ranked list as compared to those unique features of said first feature list that are also present in said second feature list;
identifying a set of salient features from said ranked list of features, wherein the set of salient features distinguishes the first group of electronic documents from the second group of electronic documents; and
classifying the first group of electronic documents and the second group of electronic documents based on the set of salient features.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 15
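The ordering constraint in claim 1 (features absent from the second list rank above shared ones) and the final classification step can be sketched as follows. This is only an illustration under stated assumptions: the feature lists are given as hypothetical count dictionaries, the statistic is a raw count difference, and a document is labeled by counting salient-feature hits against a threshold. None of these specifics come from the claim.

```python
def rank_with_exclusive_first(first_counts, second_counts):
    """Rank first-list features; those absent from the second list always come first.

    first_counts / second_counts: dict of feature -> occurrence count (hypothetical inputs).
    """
    def sort_key(feature):
        shared = feature in second_counts  # False sorts before True
        diff = first_counts[feature] - second_counts.get(feature, 0)
        # Tier first, then larger count difference, then name for determinism.
        return (shared, -diff, feature)
    return sorted(first_counts, key=sort_key)

def classify(document, salient, threshold=1):
    """Label a document 'content' if it contains at least `threshold` salient features."""
    tokens = set(document.lower().split())
    hits = len(tokens & set(salient))
    return "content" if hits >= threshold else "anti-content"

first_counts = {"invoice": 5, "payment": 4, "the": 9, "due": 2}
second_counts = {"the": 3, "payment": 1, "recipe": 6}

ranked = rank_with_exclusive_first(first_counts, second_counts)
salient = ranked[:2]
print(ranked)                                    # ['invoice', 'due', 'the', 'payment']
print(classify("Invoice due Friday", salient))   # content
print(classify("the recipe for soup", salient))  # anti-content
```

Note that "the" has the largest raw count difference in this toy data but still ranks below "invoice" and "due", because it also appears in the second list.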
11. A method for classifying one or more electronic documents, the method comprising:
identifying one or more unique features that are members of a first data class, said first data class comprising a first group of electronic documents;
examining a second data class to identify those of said one or more unique features that are also members of said second data class, and those of said one or more unique features that are not members of said second data class, said second data class comprising a second group of electronic documents;
generating a ranked list of unique features having an order based upon membership of each of said one or more unique features within said second data class, wherein those of said unique features that are not members of said second data class are ranked higher in said ranked list than those of said unique features that are also members of said second data class;
identifying as salient one or more of said ranked list of unique features, wherein said one or more of said ranked list of unique features identified as salient distinguish the first group of electronic documents from the second group of electronic documents; and
classifying the first group of electronic documents from the second group of electronic documents based on said one or more of said ranked list of unique features identified as salient.
Dependent claims: 12, 13, 14
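Claim 11 orders the list by membership in the second data class rather than by a named statistic. A minimal sketch of that ordering, assuming features are plain strings and that the features exclusive to the first class are the ones marked salient (an illustrative selection rule, not stated in the claim; the function name is hypothetical):

```python
def rank_by_membership(features, second_class):
    """Order features so non-members of second_class precede members.

    sorted() is stable, so the original relative order is kept within each tier.
    """
    return sorted(features, key=lambda f: f in second_class)

first_class_features = ["goal", "match", "team", "season", "penalty"]
second_class_features = {"team", "season", "budget"}

ranked = rank_by_membership(first_class_features, second_class_features)
# Assumed selection rule: treat the non-member tier as the salient features.
salient = [f for f in ranked if f not in second_class_features]
print(ranked)   # ['goal', 'match', 'penalty', 'team', 'season']
print(salient)  # ['goal', 'match', 'penalty']
```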
16. An apparatus for classifying one or more electronic documents, said apparatus comprising:
a storage medium having stored therein a plurality of programming instructions designed to implement a plurality of functions of a category name service for providing a category name to a data object, including first one or more functions to
extract one or more unique features from a first content group of data objects representing a first group of electronic documents to form a first feature list,
extract one or more unique features from a second anti-content group of data objects representing a second group of electronic documents to form a second feature list,
identify those unique features of said first feature list that are not present in said second feature list,
identify those unique features of said first feature list that are also present in said second feature list,
create a ranked list of features by applying statistical differentiation between unique features of said first feature list and unique features of said second feature list, wherein those unique features of said first feature list that are not present in said second feature list are ranked higher within said ranked list as compared to those unique features of said first feature list that are also present in said second feature list,
identify a set of salient features from said ranked list of features, wherein said set of salient features distinguishes the first group of electronic documents from the second group of electronic documents, and
classify the first group of electronic documents and the second group of electronic documents based on the set of salient features.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
27. An apparatus comprising:
a storage medium having stored therein a plurality of programming instructions designed to implement a plurality of functions including first one or more functions to
identify one or more unique features that are members of a first data class, said first data class comprising a first group of electronic documents,
examine a second data class to identify those of said one or more unique features that are also members of said second data class, and those of said one or more unique features that are not members of said second data class, said second data class comprising a second group of electronic documents,
generate a ranked list of unique features having an order based upon membership of each of said one or more unique features within said second data class, wherein those of said unique features that are not members of said second data class are ranked higher in said ranked list than those of said unique features that are also members of said second data class, and
identify as salient one or more of said ranked list of unique features, wherein said salient distinguishes the first group of electronic documents from the second group of electronic documents, and
classify the first group of electronic documents and the second group of electronic documents based on the set of salient features.
Dependent claims: 28, 29, 30, 31
Specification