UNSTRUCTURED DOCUMENT CLASSIFICATION
First Claim
1. A method comprising:
(i) classifying pages of an input document to generate page classifications;
(ii) aggregating the page classifications to generate an input document representation, the aggregating not being based on ordering of the pages; and
(iii) classifying the input document based on the input document representation;
wherein the operations (i), (ii), and (iii) are performed by a digital processor.
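The three operations of the claim can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the page classes, the keyword-based `classify_page` stand-in, and the "dominant page class" rule in `classify_document`. The point it shows is that a normalized histogram of page classifications is one aggregation that does not depend on page order.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical page classes, chosen only for this illustration.
PAGE_CLASSES = ("invoice", "letter", "form")


def classify_page(page_text: str) -> str:
    """Operation (i): toy keyword rule standing in for a real page classifier."""
    text = page_text.lower()
    if "total due" in text:
        return "invoice"
    if "dear" in text:
        return "letter"
    return "form"


def aggregate(page_labels: Sequence[str]) -> List[float]:
    """Operation (ii): normalized histogram of page classifications.

    Counting discards page positions, so any permutation of the pages
    yields the same representation.
    """
    counts = Counter(page_labels)
    total = len(page_labels) or 1  # avoid division by zero for empty input
    return [counts[c] / total for c in PAGE_CLASSES]


def classify_document(representation: Sequence[float]) -> str:
    """Operation (iii): toy rule that picks the dominant page class."""
    best = max(range(len(PAGE_CLASSES)), key=lambda i: representation[i])
    return PAGE_CLASSES[best] + "_document"


pages = ["Dear customer,", "Total due: 10", "Total due: 20"]
labels = [classify_page(p) for p in pages]
representation = aggregate(labels)
document_class = classify_document(representation)
```

Reversing or shuffling `pages` leaves `representation` unchanged, which is the property the claim requires of the aggregating step.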
Abstract
A document classification method comprises: (i) classifying pages of an input document to generate page classifications; (ii) aggregating the page classifications to generate an input document representation, the aggregating not being based on ordering of the pages; and (iii) classifying the input document based on the input document representation. A page classifier for use in the page classifying operation (i) is trained based on pages of a set of labeled training documents having document classification labels. In some such embodiments, the pages of the set of labeled training documents are not labeled, and the page classifier training comprises: clustering pages of the set of labeled training documents to generate page clusters; and generating the page classifier based on the page clusters.
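The clustering-based training described in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: pages are assumed to be already converted to numeric feature vectors, k-means is swapped in as the clustering technique (the abstract does not name one), and the function names `kmeans` and `make_page_classifier` are hypothetical. Page labels are never used; the resulting page classifier simply assigns a page to its nearest cluster.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: _sq_dist(point, centroids[i]))


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20) -> List[List[float]]:
    """Plain k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iterations):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[_nearest(p, centroids)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def make_page_classifier(centroids: Sequence[Vector]) -> Callable[[Vector], int]:
    """Build a page classifier that returns the nearest cluster index."""
    def classify(page_features: Vector) -> int:
        return _nearest(page_features, centroids)
    return classify


# Unlabeled page feature vectors pooled from the labeled training documents.
training_pages = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
page_classifier = make_page_classifier(kmeans(training_pages, k=2))
```

The cluster indices returned by `page_classifier` play the role of page classifications; a document classifier would then be trained on order-independent aggregates of those indices together with the document classification labels.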
79 Citations
27 Claims
1. A method comprising:
(i) classifying pages of an input document to generate page classifications;
(ii) aggregating the page classifications to generate an input document representation, the aggregating not being based on ordering of the pages; and
(iii) classifying the input document based on the input document representation;
wherein the operations (i), (ii), and (iii) are performed by a digital processor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. An apparatus comprising:
a digital processor configured to perform a method including:
(i) classifying pages of an input document to generate page classifications; and
(ii) aggregating the page classifications to generate an input document representation.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24
25. A storage medium storing instructions that are executable by a digital processor to perform method operations including:
(i) classifying pages of an input document to generate page classifications; and
(ii) aggregating the page classifications to generate an input document representation, the aggregating not based on ordering of the pages in the input document.
Dependent claims: 26, 27
Specification