SYSTEM AND METHODS FOR COMPUTERIZED INFORMATION GOVERNANCE OF ELECTRONIC DOCUMENTS
First Claim
1. An information governance system comprising:
a plurality of classifiers which employ cutoffs for classifying at least a portion of a population of incoming documents as documents to be retained and documents to be discarded in accordance with a retention policy comprising a corresponding plurality of pre-defined retention schedules;
training apparatus for training said classifiers based on relevance inputs provided by a human information governance expert regarding a training set of documents within a universe of documents to be governed; and
retain/discard apparatus operative to automatically cause any classified document to be retained and subsequently discarded in accordance with its pre-defined retention schedule including discarding only documents that (a) have been classified as documents to be discarded and (b) have not been classified as documents to be retained, and to automatically cause any document which could not be classified, to be retained as gray area data until further notice.
3 Assignments
0 Petitions
Abstract
An information governance system comprising a plurality of classifiers which employ cutoffs for classifying at least a portion of a population of incoming documents as documents to be retained and documents to be discarded in accordance with a corresponding plurality of pre-defined retention schedules; training apparatus for training said classifiers based on relevance inputs provided by a human information governance expert regarding a training set of documents within a universe of documents to be governed; and apparatus operative to automatically cause any classified document to be retained and subsequently discarded in accordance with its pre-defined retention schedule including discarding only documents that (a) have been classified as documents to be discarded and (b) have not been classified as documents to be retained, and to automatically cause any document which could not be classified, to be retained as gray area data until further notice.
11 Citations
19 Claims
1. An information governance system comprising:
a plurality of classifiers which employ cutoffs for classifying at least a portion of a population of incoming documents as documents to be retained and documents to be discarded in accordance with a retention policy comprising a corresponding plurality of pre-defined retention schedules;
training apparatus for training said classifiers based on relevance inputs provided by a human information governance expert regarding a training set of documents within a universe of documents to be governed; and
retain/discard apparatus operative to automatically cause any classified document to be retained and subsequently discarded in accordance with its pre-defined retention schedule including discarding only documents that (a) have been classified as documents to be discarded and (b) have not been classified as documents to be retained, and to automatically cause any document which could not be classified, to be retained as gray area data until further notice.
Dependent claims: 2, 3, 4, 5, 6, 7, 12
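The retain/discard rule recited in claim 1 can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not appear in the claims: the `Cutoffs` and `Disposition` types, the `decide` function, the 0-to-1 score scale, and the reading that each classifier has an upper cutoff (retain) and a lower cutoff (discard), with scores in between counting as unclassified.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    RETAIN = "retain"
    DISCARD = "discard"
    GRAY_AREA = "gray_area"  # retained until further notice


@dataclass(frozen=True)
class Cutoffs:
    # Hypothetical thresholds on a 0..1 classifier score.
    retain_at_or_above: float
    discard_at_or_below: float


def decide(scores: dict[str, float], cutoffs: dict[str, Cutoffs]) -> Disposition:
    """Combine per-schedule classifier scores into one disposition.

    `scores` maps a retention-schedule name to that classifier's score.
    A document is discarded only if (a) some classifier marks it for
    discard and (b) no classifier marks it for retention.
    """
    marked_retain = False
    marked_discard = False
    for name, score in scores.items():
        c = cutoffs[name]
        if score >= c.retain_at_or_above:
            marked_retain = True
        elif score <= c.discard_at_or_below:
            marked_discard = True
    if marked_retain:
        return Disposition.RETAIN
    if marked_discard:
        return Disposition.DISCARD
    # No classifier reached a cutoff: keep the document as gray area data.
    return Disposition.GRAY_AREA
```

Under these assumed cutoffs, a retain decision from any one classifier overrides discard decisions from the others, which mirrors conditions (a) and (b) of the claim.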
8. An information governance method comprising:
generating a plurality of classifiers for classifying electronic documents into a corresponding plurality of documentation retention categories;
running training iterations thereby to improve at least one of the plurality of classifiers;
classifying a repository of electronic documents using said plurality of classifiers and running a Logarithmic stratified sampling-based Quality Assurance process to compute precision in cases of low or unknown richness including ordering documents by their ranks then partitioning the ranks into slices [0, p], [p, 2p], [2p, 4p], . . . , and randomly selecting documents to represent each slice, thereby to generate Quality Assurance results;
if the Quality Assurance results are not deemed good enough, improve the classifier and return to one of said running steps;
if the Quality Assurance results are good enough, use last classifier to implement a plurality of document retention settings corresponding to said plurality of documentation retention categories.
Dependent claims: 9, 10, 11, 13, 14, 15, 16, 17, 18
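The logarithmic stratified sampling step of claim 8 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the function names `log_slices` and `estimate_precision`, the half-open index ranges, the `per_slice` sample size, the `is_relevant` reviewer callback, and the size-weighted estimator are all hypothetical choices. The claim itself specifies only the ordering by rank, the slices [0, p], [p, 2p], [2p, 4p], . . . , and the random selection of documents from each slice.

```python
import random
from typing import Callable


def log_slices(n: int, p: int) -> list[tuple[int, int]]:
    """Partition rank positions 0..n into [0,p), [p,2p), [2p,4p), ...

    Each slice after the second doubles in width; the last is cut at n.
    """
    if n < 0 or p <= 0:
        raise ValueError("n must be >= 0 and p must be > 0")
    slices = []
    start, end = 0, p
    while start < n:
        slices.append((start, min(end, n)))
        start, end = end, end * 2
    return slices


def estimate_precision(
    ranked_ids: list[int],
    p: int,
    per_slice: int,
    is_relevant: Callable[[int], bool],
    rng: random.Random,
) -> float:
    """Estimate the fraction of relevant documents in a ranked list.

    `ranked_ids` is already ordered by classifier rank (best first).
    `is_relevant` stands in for a human reviewer's judgment (assumed).
    Each slice's sampled hit rate is weighted by the slice's size.
    """
    if not ranked_ids or per_slice <= 0:
        raise ValueError("need at least one document and per_slice > 0")
    expected_relevant = 0.0
    for start, end in log_slices(len(ranked_ids), p):
        stratum = ranked_ids[start:end]
        sample = rng.sample(stratum, min(per_slice, len(stratum)))
        hit_rate = sum(1 for doc in sample if is_relevant(doc)) / len(sample)
        expected_relevant += hit_rate * len(stratum)
    return expected_relevant / len(ranked_ids)
```

Because the slices widen geometrically, the top-ranked documents are sampled most densely, which is where relevant documents concentrate when richness is low.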
19. A computer program product, comprising a non-transitory tangible computer readable medium having computer readable program code embodied therein, said computer readable program code adapted to be executed to implement an information governance method comprising:
generating a plurality of classifiers for classifying electronic documents into a corresponding plurality of documentation retention categories;
running training iterations thereby to improve at least one of the plurality of classifiers;
classifying a repository of electronic documents using said plurality of classifiers and running a Logarithmic stratified sampling-based Quality Assurance process to compute precision in cases of low or unknown richness including ordering documents by their ranks then partitioning the ranks into slices [0, p], [p, 2p], [2p, 4p], . . . , and randomly selecting documents to represent each slice, thereby to generate Quality Assurance results;
if the Quality Assurance results are not deemed good enough, improve the classifier and return to one of said running steps;
if the Quality Assurance results are good enough, use last classifier to implement a plurality of document retention settings corresponding to said plurality of documentation retention categories.
Specification