Efficient document processing system and method
Abstract
A document processing system and method are disclosed. In the method, local scores are incrementally computed for document samples, each based on local features extracted from the respective sample. A global score is estimated for the document based on the local scores currently computed, i.e., on fewer than all of the document samples. A confidence in a decision for the estimated global score is computed. The computed confidence is based on the local scores currently computed and, optionally, on the number of samples used in computing the estimated global score. A classification decision for the document, such as a categorization or retrieval decision, is output based on the estimated score when the computed confidence in the decision reaches a threshold value.
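The abstract leaves the form of the confidence open. As one possibility, and purely as an assumption not taken from the text, the global score can be kept as the running mean of the local scores, and the confidence in its sign can be computed with a normal approximation that uses both the scores and the number of samples. The minimal sketch below uses Welford's update so each new local score costs O(1); the class name `RunningScore` is hypothetical.

```python
import math


class RunningScore:
    """Hypothetical incremental estimator: global score = mean of local scores.

    Welford's algorithm folds in each new local score without storing
    the earlier ones.
    """

    def __init__(self) -> None:
        self.n = 0        # number of samples scored so far
        self.mean = 0.0   # running mean of local scores (estimated global score)
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, local_score: float) -> None:
        self.n += 1
        delta = local_score - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (local_score - self.mean)

    def confidence(self) -> float:
        """Normal-approximation confidence that the sign of the estimate is right.

        Assumption: the mean of n local scores is roughly normal with
        standard error s / sqrt(n). Returns 0.5 (no information) until at
        least two samples have been seen.
        """
        if self.n < 2:
            return 0.5
        variance = self._m2 / (self.n - 1)
        if variance == 0.0:
            return 1.0 if self.mean != 0.0 else 0.5
        z = abs(self.mean) / math.sqrt(variance / self.n)
        # Standard normal CDF at z, via the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Usage: fold in four local scores one at a time.
est = RunningScore()
for s in [0.4, 0.6, 0.5, 0.7]:
    est.update(s)
```

Because the confidence grows with both the size of the mean and the number of samples, a document whose local scores agree strongly can be decided after only a few samples.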
26 Claims
1. A document processing method comprising:
incrementally, for each of a set of samples of a document, extracting local features from the sample;
for each of the samples, computing a local score based on the local features extracted from the sample;
estimating a global score for the document based on the local scores currently computed;
computing a confidence in a decision for the estimated global score, the computed confidence being based on the local scores currently computed; and
outputting a decision for the document based on the estimated score when the computed confidence in the decision reaches a threshold value, the extracting of local features, the computing of the local score, the estimating of the global score, and the computing of the confidence in the decision being repeated with additional samples until the computed confidence in the decision reaches the threshold value.
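The loop recited in claim 1 can be sketched as follows. Everything concrete here is an assumption for illustration: a sample is a plain list of numbers, the hypothetical `extract_features` and `local_score` functions stand in for a real feature extractor and classifier, the global score is the mean of the local scores, and the confidence is a Hoeffding-style bound for local scores clipped to [-1, 1]. None of these choices comes from the claim itself.

```python
import math
from typing import Iterable, List, Optional, Tuple


def extract_features(sample: List[float]) -> List[float]:
    # Hypothetical local features: mean and max of the sample's values.
    return [sum(sample) / len(sample), max(sample)]


def local_score(features: List[float]) -> float:
    # Hypothetical linear scorer, clipped to [-1, 1] so the bound below applies.
    raw = features[0] + features[1] - 1.0
    return max(-1.0, min(1.0, raw))


def sign_confidence(mean: float, n: int) -> float:
    # Hoeffding, scores in [-1, 1] (range 2): if the true mean had the opposite
    # sign, an empirical mean this far from zero would occur with probability
    # at most exp(-n * mean**2 / 2). Confidence is one minus that bound.
    return 1.0 - math.exp(-n * mean * mean / 2.0)


def classify(samples: Iterable[List[float]], threshold: float = 0.95
             ) -> Tuple[Optional[int], int]:
    """Return (decision, samples used); decision is +1/-1, or None if undecided."""
    total, n = 0.0, 0
    for sample in samples:
        total += local_score(extract_features(sample))  # features -> local score
        n += 1
        estimate = total / n                             # global score so far
        if sign_confidence(estimate, n) >= threshold:    # confident: stop early
            return (1 if estimate > 0 else -1), n
    return None, n                                       # ran out of samples
```

For example, `classify([[0.8, 1.0]] * 20)` stops after 8 of the 20 samples, while a document whose local scores average zero is never decided and consumes every sample.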
7. A document processing method comprising:
incrementally, for each of a set of samples of a document, computing a local score based on local features extracted from the sample;
estimating a global score for the document based on the local scores currently computed;
computing a confidence in a decision for the estimated global score, the computed confidence being based on the local scores currently computed;
computing at least one of a lower bound and an upper bound of a distribution of the local scores currently computed, comprising accessing a data structure which stores values of a parameter $B(\alpha, T)$ for different numbers of samples and a preselected value of the confidence in the decision and, if $Y_T > \varepsilon$, computing the lower bound according to:
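Claim 7 keeps a parameter $B(\alpha, T)$ in a data structure indexed by the number of samples for a preselected confidence, but the bound formula itself is not given in the text above. The sketch below fills that role with a swapped-in, assumed choice: a one-sided Hoeffding radius for local scores bounded in [-1, 1], with the lower bound taken as $Y_T - B(\alpha, T)$ and the upper bound as $Y_T + B(\alpha, T)$, where $Y_T$ is read here as the mean of the first $T$ local scores. The names `build_bound_table` and `decide` are hypothetical.

```python
import math
from typing import Dict, List, Optional


def build_bound_table(alpha: float, t_max: int,
                      score_range: float = 2.0) -> Dict[int, float]:
    """Precompute B(alpha, T) for T = 1..t_max at a fixed confidence alpha.

    Assumed form (one-sided Hoeffding): with probability at least alpha the
    true mean lies above mean - B, where
    B = R * sqrt(ln(1 / (1 - alpha)) / (2 T)) and R is the score range.
    """
    log_term = math.log(1.0 / (1.0 - alpha))
    return {t: score_range * math.sqrt(log_term / (2.0 * t))
            for t in range(1, t_max + 1)}


def decide(scores: List[float], epsilon: float,
           table: Dict[int, float]) -> Optional[int]:
    """Return +1/-1 once a bound clears +/-epsilon, else None (need more samples)."""
    total = 0.0
    t_max = max(table)
    for t, s in enumerate(scores, start=1):
        total += s
        y_t = total / t                  # Y_T: mean of the first T local scores
        b = table[min(t, t_max)]         # lookup; beyond t_max, B(t_max) is conservative
        if y_t > epsilon and y_t - b > epsilon:     # lower bound above +epsilon
            return 1
        if y_t < -epsilon and y_t + b < -epsilon:   # upper bound below -epsilon
            return -1
    return None


table = build_bound_table(alpha=0.95, t_max=100)
```

Precomputing the table trades a small amount of memory for avoiding a square root and logarithm at every step; with `alpha=0.95` and `epsilon=0.1`, a run of local scores equal to 0.9 is decided at $T = 10$.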
24. A document processing system comprising:
a descriptor generator which, progressively, for each of a set of samples of a document, generates a local descriptor based on features extracted from the sample of the document;
a scoring component which incrementally estimates a global score for the document based on the local descriptors currently computed for the set of samples of the document;
a confidence computing component which computes a confidence in a decision for the estimated global score, the computed confidence being based on the local descriptors currently computed and, optionally, the number of the samples used in computing the estimated global score;
a decision output component which outputs a decision for the document based on the estimated score when the computed confidence in the decision reaches a threshold value, wherein the generating of the local descriptor based on features extracted from the sample of the document, the estimating of the global score, and the computing of the confidence in the decision are repeated with additional samples until the computed confidence in the decision for the document reaches the threshold value; and
a processor which implements the descriptor generator, scoring component, confidence computing component, and decision output component.
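Claim 24 divides the same procedure into components. One possible arrangement is sketched below; the class names mirror the claim, but the two-element descriptor per sample, the linear weights, and the Hoeffding-style confidence are illustrative assumptions. For a linear scorer, scoring the running mean of the local descriptors gives the same value as averaging per-sample linear scores, which is one way a global score can be estimated directly from the descriptors.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


class DescriptorGenerator:
    """Hypothetical: a sample is a list of values in [0, 1]; descriptor = (mean, max)."""

    def generate(self, sample: Sequence[float]) -> List[float]:
        return [sum(sample) / len(sample), max(sample)]


@dataclass
class ScoringComponent:
    # Assumed linear model; with descriptors in [0, 1], per-sample scores lie in [-1, 1].
    weights: Sequence[float] = (1.0, 1.0)
    bias: float = -1.0
    _sum: List[float] = field(default_factory=lambda: [0.0, 0.0])
    n: int = 0

    def add(self, descriptor: Sequence[float]) -> float:
        """Fold in one descriptor; return the score of the mean descriptor."""
        self.n += 1
        for i, v in enumerate(descriptor):
            self._sum[i] += v
        mean_desc = [s / self.n for s in self._sum]
        return sum(w * d for w, d in zip(self.weights, mean_desc)) + self.bias


class ConfidenceComponent:
    def confidence(self, score: float, n: int) -> float:
        # Hoeffding-style bound, assuming per-sample scores in [-1, 1].
        return 1.0 - math.exp(-n * score * score / 2.0)


class DecisionOutputComponent:
    def __init__(self, threshold: float) -> None:
        self.threshold = threshold

    def decide(self, score: float, conf: float) -> Optional[int]:
        if conf < self.threshold:
            return None                  # not confident yet: keep sampling
        return 1 if score > 0 else -1


def run(samples: Sequence[Sequence[float]], threshold: float = 0.95) -> Optional[int]:
    gen, scorer = DescriptorGenerator(), ScoringComponent()
    conf_c, out = ConfidenceComponent(), DecisionOutputComponent(threshold)
    for sample in samples:
        score = scorer.add(gen.generate(sample))
        decision = out.decide(score, conf_c.confidence(score, scorer.n))
        if decision is not None:
            return decision
    return None
```

Keeping the components separate lets one swap, say, the confidence measure without touching the descriptor or scoring code, which matches the way the claim lists them as distinct elements.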
26. A document processing system including a processor and memory, which takes as input a document and outputs a decision for a class when:
an estimate of the document score for the class is higher than a threshold $\varepsilon$ or lower than a threshold $-\varepsilon$;
the estimated document score for the class being computed as an aggregation of local scores computed from local features;
the local features and scores being computed incrementally and used to compute a confidence in the class decision; and
the extraction of the local features and the local score computation being discontinued as soon as the computed confidence in the class decision exceeds a threshold $\alpha$, the memory including instructions, implemented by the processor, for incrementally extracting the local features, computing the scores, and computing the confidence in the decision.
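Claim 26 stresses that extraction stops as soon as the confidence exceeds $\alpha$. In Python this falls out naturally if local scores are produced lazily by a generator, so samples after the stopping point are never touched. The sketch assumes local scores in [-1, 1], the global score as their mean, and a Hoeffding-style confidence that the true mean lies beyond the margin $\varepsilon$; `lazy_scores`, `decide_class`, and `recording_score` are hypothetical names.

```python
import math
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


def lazy_scores(samples: Iterable[float],
                score_fn: Callable[[float], float]) -> Iterator[float]:
    # Each local score is computed only when the consumer asks for the next one.
    for sample in samples:
        yield score_fn(sample)


def decide_class(scores: Iterator[float], epsilon: float, alpha: float
                 ) -> Tuple[Optional[int], int]:
    """Return (+1 / -1 / None, number of local scores consumed)."""
    total, n = 0.0, 0
    for s in scores:
        total += s
        n += 1
        estimate = total / n
        margin = abs(estimate) - epsilon    # distance beyond +/-epsilon
        if margin <= 0:
            continue                        # estimate still inside [-eps, eps]
        # Hoeffding for scores in [-1, 1]: confidence 1 - exp(-n * margin**2 / 2).
        if 1.0 - math.exp(-n * margin * margin / 2.0) > alpha:
            return (1 if estimate > 0 else -1), n   # stop: no further extraction
    return None, n


# Usage: record which samples actually get scored.
seen: List[float] = []


def recording_score(x: float) -> float:
    seen.append(x)      # every call corresponds to one extracted sample
    return x            # toy scorer: the sample value is its own local score


decision, used = decide_class(lazy_scores([0.9] * 100, recording_score),
                              epsilon=0.1, alpha=0.95)
```

Here only 10 of the 100 samples are ever passed to `recording_score`; the remaining 90 are skipped because the generator is simply not advanced after the decision is made.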
Specification