Apparatus, method and computer-accessible medium for explaining classifications of documents
Abstract
Classification of collections of items such as words, which is called “document classification,” and more specifically explaining a classification of a document, such as a web page or website. This can include an exemplary procedure, system and/or computer-accessible medium to find explanations, as well as a framework to assess the procedure's performance. An explanation is defined as a set of words (or, more generally, terms) such that removing the words within this set from the document changes the predicted class from the class of interest. The exemplary procedure, system and/or computer-accessible medium can include a classification of web pages as containing adult content, e.g., to allow advertising on safe web pages only. The explanations can be concise and document-specific, and provide insight into the reasons for the classification decisions, into the workings of the classification models, and into the business application itself. Other exemplary aspects describe how explaining documents' classifications can assist in improving the data quality and model performance.
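The definition of an explanation given in the abstract can be illustrated with a minimal Python sketch. The keyword weights, the threshold, and the names `WEIGHTS`, `classify`, and `is_explanation` are assumptions made for illustration only; they do not come from the patent, and any classifier could stand in for the toy model.

```python
from typing import Callable, Set

# Hypothetical toy model: a page is "adult" when the summed weights of its
# words exceed 1.0. The weights are invented for this example.
WEIGHTS = {"explicit": 2.0, "xxx": 1.5, "news": -1.0, "weather": -0.5}


def classify(words: Set[str]) -> str:
    """Return "adult" or "safe" for a set of words."""
    total = sum(WEIGHTS.get(word, 0.0) for word in words)
    return "adult" if total > 1.0 else "safe"


def is_explanation(
    words: Set[str],
    candidate: Set[str],
    model: Callable[[Set[str]], str],
) -> bool:
    """True if removing `candidate` from `words` changes the predicted class."""
    return model(words - candidate) != model(words)


page = {"explicit", "xxx", "news", "weather"}
print(classify(page))                                # adult
print(is_explanation(page, {"explicit"}, classify))  # True
print(is_explanation(page, {"news"}, classify))      # False
```

In this toy setting, removing "explicit" alone flips the page from "adult" to "safe", so the one-word set is an explanation, while removing "news" leaves the class unchanged.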
29 Claims
1. A non-transitory computer readable medium including instructions thereon that are accessible by a hardware processing arrangement, wherein, when the processing arrangement executes the instructions, the processing arrangement is configured to generate information associated with at least one first classification of at least one document, comprising:
(a) identifying at least one characteristic of the at least one document, wherein the at least one characteristic includes a plurality of items;
(b) obtaining at least one second classification of the at least one document based on the at least one characteristic of the at least one document;
(c) removing at least one of the items from the at least one document;
(d) obtaining the at least one first classification based on the removal of the at least one of the items; and
(e) generating the information associated with the at least one first classification of the at least one document by repeating procedures (c) and (d) until the at least one first classification is different from the at least one second classification.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
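Procedures (a) through (e) of claim 1 can be read as a remove-and-reclassify loop. The following Python sketch is one possible reading, not the patented implementation. The linear model (`WEIGHTS`, `THRESHOLD`), the helper names, and in particular the greedy rule for choosing which item to remove are assumptions of this sketch; the claim itself does not say how the item is chosen.

```python
from typing import List, Optional

# Hypothetical linear model; the weights and threshold are illustrative only.
WEIGHTS = {"explicit": 0.9, "xxx": 0.8, "video": 0.4, "news": -0.5}
THRESHOLD = 1.0


def score(items: List[str]) -> float:
    return sum(WEIGHTS.get(item, 0.0) for item in items)


def classify(items: List[str]) -> str:
    return "adult" if score(items) > THRESHOLD else "safe"


def margin(items: List[str], label: str) -> float:
    """How firmly `items` sits in class `label`; smaller means closer to flipping."""
    s = score(items)
    return s - THRESHOLD if label == "adult" else THRESHOLD - s


def explain(document: List[str]) -> Optional[List[str]]:
    """Return the removed items that changed the class, or None if none did."""
    remaining = list(dict.fromkeys(document))  # (a) the document's distinct items
    second = classify(remaining)               # (b) class of the unmodified document
    removed: List[str] = []
    while remaining:
        # (c) Assumed heuristic: remove the item whose absence weakens the
        # original class the most.
        target = min(
            remaining,
            key=lambda it: margin([r for r in remaining if r != it], second),
        )
        remaining.remove(target)
        removed.append(target)
        first = classify(remaining)            # (d) class after the removal
        if first != second:                    # (e) stop once the class differs
            return removed
    return None  # every item was removed without changing the class


print(explain(["explicit", "xxx", "video", "news"]))  # ['explicit']
print(explain(["news", "weather"]))                   # None
```

Returning `None` when no removal changes the class is a design choice of the sketch: a document that the toy model already labels "safe" stays "safe" even when empty, so no explanation exists for it.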
10. A non-transitory computer readable medium including instructions thereon that are accessible by a hardware processing arrangement, wherein, when the processing arrangement executes the instructions, the processing arrangement is configured to generate information associated with at least one first classification of a collection, comprising:
(a) identifying at least one characteristic of the collection, wherein the at least one characteristic includes a plurality of items;
(b) obtaining at least one second classification of the collection based on the at least one characteristic of the collection;
(c) removing at least one of the items from the at least one document;
(d) obtaining the at least one first classification based on the removal of the at least one of the items; and
(e) generating the information associated with the at least one first classification of the collection by repeating procedures (c) and (d) until the at least one first classification is different than the at least one second classification.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
24. A method for generating information associated with at least one first classification of a collection, comprising:
(a) identifying at least one characteristic of the collection, wherein the at least one characteristic includes a plurality of items;
(b) obtaining at least one second classification of the collection based on the at least one characteristic of the collection;
(c) removing at least one of the items from the at least one document;
(d) obtaining the at least one first classification based on the removal of the at least one of the items; and
(e) using a computer hardware arrangement, generating the information associated with the at least one first classification of the collection by repeating procedures (c) and (d) until the at least one first classification is different than the at least one second classification.
Dependent claims: 25, 26
27. A system configured to generate information associated with at least one first classification of a collection, comprising:
a processing arrangement configured to:
(a) identify at least one first characteristic of the collection, wherein the at least one first characteristic includes a plurality of items;
(b) obtain at least one second classification of the collection based on the at least one first characteristic of the collection;
(c) remove at least one of the items from the at least one document;
(d) obtain the at least one first classification based on the removal of the at least one of the items; and
(e) generate the information associated with the at least one first classification of the collection by repeating procedures (c) and (d) until the at least one first classification is different than the at least one second classification.
Dependent claims: 28, 29
Specification