Method for classifying unknown electronic documents based upon at least one classification
Abstract
A classification system includes a signature-based duplicate detector and an inductive classifier that share attribute information. Before duplicate detection and classification can be performed, both components are initialized: a lexicon of attributes is generated for the duplicate detector, and a classification model is generated for the classifier. To develop the classification model, the classifier uses a training set of documents of known class to determine which document attributes are most useful in classifying an unknown document, and the model is developed from these attributes. Attribute information containing the attributes determined by the classifier is then passed to the duplicate detector, which uses it to generate the lexicon of attributes.
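The initialization flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: attributes are taken to be whitespace-separated word tokens, "usefulness" is approximated by the difference in document frequency between two hypothetical classes (`"spam"` and `"legit"`), and the duplicate detector's signature is a SHA-1 hash over the lexicon tokens found in a document. The function names `select_attributes`, `build_lexicon`, and `signature` are invented for this example.

```python
import hashlib
from collections import Counter


def tokenize(text):
    # Assumption: attributes are lowercase whitespace-separated tokens.
    return text.lower().split()


def select_attributes(training_docs, top_k=3):
    """Classifier side: pick the top_k tokens whose document frequency
    differs most between the two hypothetical classes."""
    doc_freq = {"spam": Counter(), "legit": Counter()}
    class_size = Counter()
    for text, label in training_docs:
        class_size[label] += 1
        doc_freq[label].update(set(tokenize(text)))
    vocabulary = set(doc_freq["spam"]) | set(doc_freq["legit"])

    def usefulness(token):
        spam_rate = doc_freq["spam"][token] / max(class_size["spam"], 1)
        legit_rate = doc_freq["legit"][token] / max(class_size["legit"], 1)
        return abs(spam_rate - legit_rate)

    # Sort by descending usefulness, breaking ties alphabetically.
    return sorted(vocabulary, key=lambda t: (-usefulness(t), t))[:top_k]


def build_lexicon(attribute_info):
    """Duplicate-detector side: the lexicon is built from the attribute
    information passed over by the classifier."""
    return frozenset(attribute_info)


def signature(text, lexicon):
    """Hash only the lexicon tokens present in the document, so documents
    that differ only in non-lexicon tokens share a signature."""
    kept = sorted(set(tokenize(text)) & lexicon)
    return hashlib.sha1(" ".join(kept).encode("utf-8")).hexdigest()


training = [
    ("win free prize now", "spam"),
    ("free prize inside", "spam"),
    ("meeting notes attached", "legit"),
    ("lunch meeting today", "legit"),
]
attributes = select_attributes(training)
lexicon = build_lexicon(attributes)
```

In this sketch the shared attribute information is simply the list returned by `select_attributes`; a real system would likely pass richer information, such as weights or per-class statistics.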
21 Claims
1. A method comprising:
receiving an electronic document;
analyzing, using at least one processor, the electronic document to extract a number of first attributes from the electronic document, the number of first attributes selected from a first attribute set;
obtaining a first classification output for the electronic document based upon the number of first attributes extracted from the electronic document;
if the first classification output for the electronic document is above a first threshold, withholding delivery of the electronic document;
if the first classification output for the electronic document is below the first threshold, analyzing the electronic document to extract a number of second attributes from the electronic document, the number of second attributes selected from a second attribute set, and obtaining a second classification output for the electronic document based upon the number of second attributes extracted from the electronic document;
if the second classification output for the electronic document is above a second threshold, withholding delivery of the electronic document; and
if the second classification output for the electronic document is below the second threshold, delivering the electronic document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
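The two-stage flow recited in claim 1 can be illustrated with a minimal sketch. Several details here are assumptions not found in the claim: attributes are treated as word tokens, the "classification output" is simply the count of extracted attributes, and an output exactly equal to a threshold is treated as not above it (the claim does not address that case). The names `extract`, `classification_output`, and `process_document` are hypothetical.

```python
def extract(document, attribute_set):
    # Assumption: attributes are lowercase whitespace-separated tokens.
    return [tok for tok in document.lower().split() if tok in attribute_set]


def classification_output(extracted):
    # Hypothetical scoring: the number of extracted attributes.
    # A trained classifier's score could be substituted here.
    return len(extracted)


def process_document(document, first_set, second_set, first_threshold, second_threshold):
    """Return (decision, stage), where stage is the stage that decided."""
    first_output = classification_output(extract(document, first_set))
    if first_output > first_threshold:
        return ("withheld", 1)

    # The second attribute set is consulted only when stage one does not withhold.
    second_output = classification_output(extract(document, second_set))
    if second_output > second_threshold:
        return ("withheld", 2)
    return ("delivered", 2)


FIRST_SET = {"free", "prize", "winner"}
SECOND_SET = {"click", "unsubscribe", "offer"}

result_a = process_document("free prize for every winner", FIRST_SET, SECOND_SET, 1, 1)
result_b = process_document("click here for offer or unsubscribe", FIRST_SET, SECOND_SET, 1, 1)
result_c = process_document("lunch at noon", FIRST_SET, SECOND_SET, 1, 1)
```

One consequence of this structure is that the second extraction step is skipped entirely for documents the first stage already withholds.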
11. A method comprising:
generating, using at least one processor, a first attribute set and a second attribute set for use in classifying electronic documents;
receiving an electronic document having an unknown query signature;
analyzing the electronic document to determine whether the electronic document contains one or more attributes selected from the first attribute set;
making a first determination as to whether the electronic document contains a number of attributes selected from the first attribute set above a first threshold;
classifying the electronic document based on the first determination;
if the electronic document does not contain a number of attributes selected from the first attribute set above the first threshold, analyzing the electronic document to determine whether the electronic document contains one or more attributes selected from the second attribute set;
making a second determination as to whether the electronic document contains a number of attributes selected from the second attribute set above the second threshold; and
classifying the electronic document based on the second determination.
Dependent claims: 12, 13, 14, 15, 16, 17
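Claim 11 concerns a document whose query signature is unknown. One possible reading, sketched below, is that a signature lookup happens first and the attribute-count determinations run only when the lookup misses. That lookup step is an assumption of this sketch, not something the claim recites. The signature scheme (SHA-256 over sorted unique tokens), the class labels `"positive"` and `"negative"`, and all function names are likewise hypothetical.

```python
import hashlib


def query_signature(text):
    # Hypothetical signature: SHA-256 of the sorted unique lowercase tokens.
    tokens = sorted(set(text.lower().split()))
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


def count_contained(text, attribute_set):
    # Number of distinct attributes from the set that the document contains.
    return len(set(text.lower().split()) & attribute_set)


def classify(text, known_signatures, first_set, second_set, first_threshold, second_threshold):
    sig = query_signature(text)
    if sig in known_signatures:
        # Assumed shortcut: a known signature reuses its stored class.
        return known_signatures[sig]

    # Unknown query signature: first determination.
    if count_contained(text, first_set) > first_threshold:
        return "positive"
    # Second determination, reached only if the first did not exceed its threshold.
    if count_contained(text, second_set) > second_threshold:
        return "positive"
    return "negative"


known = {query_signature("buy cheap pills"): "positive"}
first_set = {"lottery", "winner"}
second_set = {"click", "offer"}

label_known = classify("pills cheap buy", known, first_set, second_set, 1, 0)
label_first = classify("lottery winner announced", known, first_set, second_set, 1, 0)
label_second = classify("special offer today", known, first_set, second_set, 1, 0)
label_none = classify("see you at lunch", known, first_set, second_set, 1, 0)
```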
18. A system comprising:
at least one processor; and
at least one non-transitory computer readable medium storing instructions thereon that, when executed by the at least one processor, cause the system to:
receive an electronic document;
analyze the electronic document to extract a number of first attributes from the electronic document, the number of first attributes selected from a first attribute set;
obtain a first classification output for the electronic document based upon the number of first attributes extracted from the electronic document;
if the first classification output for the electronic document is above a first threshold, withhold delivery of the electronic document;
if the first classification output for the electronic document is below the first threshold, analyze the electronic document to extract a number of second attributes from the electronic document, the number of second attributes selected from a second attribute set, and obtain a second classification output for the electronic document based upon the number of second attributes extracted from the electronic document;
if the second classification output for the electronic document is above a second threshold, withhold delivery of the electronic document; and
if the second classification output for the electronic document is below the second threshold, deliver the electronic document.
Dependent claims: 19, 20, 21
Specification