Providing collection transparency information to an end user to achieve a guaranteed quality document search and production in electronic data discovery
First Claim
1. A method for providing collection transparency information to an end user to achieve a guaranteed quality document search and production in electronic data discovery, comprising the steps of:
- providing a content management system configured for performing the steps of:
providing a search results page;
extracting indexing state and text index-ability information from a content repository with regard to files identified on said search results page;
classifying said files in said content repository with regard to text index-ability and indexing state as follows:
not indexable: types of files that are known not to be indexable and wherein said content management system does not attempt to perform full-text indexing on said types of files to enhance user experience with time;
indexable, but not indexed yet: indexable files that have not been indexed yet;
failed to index: files that are considered to be indexable for which an indexing attempt failed; and
indexable and indexed: files that were successfully indexed;
wherein text index-ability is any of: not indexable and indexable; and wherein indexing state is any of: indexed, not indexed yet, and failed to index;
collecting statistical information on what file types are not indexable by collecting statistics of indexing failure per file type, said indexing failure per file type based on whether:
a file was of an indexable type but was corrupt;
a file was of an indexable type but indexing failed;
files of different types use the same file extension; and
a file was treated as indexable but it was not an indexable file type;
observing said collected statistics of indexing failure per file type by, over time, collecting the following information per file type:
how many files of each type were indexed successfully;
how many files of each type failed to index; and
how many files of each type have been uploaded;
based on said observance of said collected statistics of indexing failure per file type, calculating a ratio of indexing failure in accordance with the following formula:
ratio of failure of a given type = number of failed files of a given type / number of files of a given type uploaded and attempted to index;
reporting file types having a high indexing failure ratio;
classifying said file types having a high indexing failure ratio as not indexable and adding said file types to a do not index list;
displaying indexing and extraction status of files contained in said content repository and pertaining to a given matter or legal request or a particular search query in the processing status area of the search results page based upon said classifying said files, as well as said extracting indexing state and text index-ability information from said content repository; and
providing a processing status area of said search results page and a file detail information page for displaying a processing status warning next to a file entry to allow a user to see what processing problems occurred with each file;
wherein the steps of the method are performed on one or more computing devices.
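The classification step of claim 1 combines two attributes, text index-ability (not indexable, indexable) and indexing state (indexed, not indexed yet, failed to index), into four reported categories. The Python sketch below shows one way that mapping could be expressed; the enum names, the `classify` function, and the choice to treat a missing state as "not indexed yet" are illustrative assumptions, not taken from the patent.

```python
from enum import Enum
from typing import Optional


class IndexAbility(Enum):
    # Hypothetical names for the two index-ability values in claim 1.
    NOT_INDEXABLE = "not indexable"
    INDEXABLE = "indexable"


class IndexingState(Enum):
    # Hypothetical names for the three indexing-state values in claim 1.
    INDEXED = "indexed"
    NOT_INDEXED_YET = "not indexed yet"
    FAILED_TO_INDEX = "failed to index"


def classify(ability: IndexAbility, state: Optional[IndexingState]) -> str:
    """Map the two attributes onto the four categories listed in claim 1."""
    if ability is IndexAbility.NOT_INDEXABLE:
        # Indexing is never attempted for these types, so the state is ignored.
        return "not indexable"
    if state is IndexingState.INDEXED:
        return "indexable and indexed"
    if state is IndexingState.FAILED_TO_INDEX:
        return "failed to index"
    # Assumption: no recorded state is treated the same as "not indexed yet".
    return "indexable, but not indexed yet"
```

Checking index-ability first mirrors the claim's description of not-indexable types as files on which no full-text indexing is attempted at all.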
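The statistics steps of claim 1 count, per file type, uploads, successes, and failures; compute ratio of failure = failed / uploaded-and-attempted; report high-ratio types; and add them to a do-not-index list. The sketch below assumes "attempted" means succeeded plus failed. The claim gives no numeric cut-off for a "high" ratio, so `threshold` and the `min_attempts` guard are hypothetical parameters. The class and method names and the free-form failure `reason` labels are also assumptions.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TypeStats:
    uploaded: int = 0
    succeeded: int = 0
    failed: int = 0
    # Hypothetical cause labels, e.g. "corrupt", "shared extension", "wrong type".
    reasons: Counter = field(default_factory=Counter)

    @property
    def attempted(self) -> int:
        # Assumption: every attempt ends in either success or failure.
        return self.succeeded + self.failed

    def failure_ratio(self) -> float:
        # Ratio of failure = failed files / files uploaded and attempted to index.
        return self.failed / self.attempted if self.attempted else 0.0


class IndexingStatistics:
    def __init__(self, threshold: float = 0.9, min_attempts: int = 20):
        self.stats = defaultdict(TypeStats)   # file type -> TypeStats
        self.threshold = threshold            # hypothetical "high ratio" cut-off
        self.min_attempts = min_attempts      # avoid flagging on a handful of files
        self.do_not_index: set = set()

    def record_upload(self, file_type: str) -> None:
        self.stats[file_type].uploaded += 1

    def record_result(self, file_type: str, success: bool,
                      reason: Optional[str] = None) -> None:
        s = self.stats[file_type]
        if success:
            s.succeeded += 1
        else:
            s.failed += 1
            if reason:
                s.reasons[reason] += 1

    def high_failure_types(self) -> list:
        """Report file types whose failure ratio is at or above the threshold."""
        return sorted(
            t for t, s in self.stats.items()
            if s.attempted >= self.min_attempts and s.failure_ratio() >= self.threshold
        )

    def update_do_not_index(self) -> list:
        """Add reported high-failure types to the do-not-index list."""
        reported = self.high_failure_types()
        self.do_not_index.update(reported)
        return reported
```

For example, 9 failures out of 10 attempted `.dat` files gives a ratio of 9/10 = 0.9, which meets the assumed threshold. One failure out of 10 `.pdf` files gives 0.1, which does not. The `min_attempts` guard is one possible design choice. It keeps a type from being excluded because of one or two corrupt files.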
Abstract
Techniques are provided that enable the display of full-text index-ability, indexing, and container extraction status of files in a collection repository in connection with content management. Further, techniques are provided that: guarantee that the user knows which files failed to index or to be exploded (extracted from their containers) and which files are not indexable; tell the user which files have not been indexed yet, so that they are not omitted from the analysis; allow users to work on the collected files without waiting for the maximum possible indexing period; allow users to start working immediately on collected content; allow for displaying indexing and extraction status information relevant only to the search query; allow for automatic and manual updating of a list of un-indexable file types; and allow for informing users about the processing status of a collection by sending notifications, displaying alerts, and providing appropriate views.
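The abstract describes showing status information relevant only to a search query and warning the user about files that have not been indexed yet, so that they are not omitted from the analysis. The Python sketch below summarizes the status of only the files in one query's scope and lists a warning for each file whose text was not searchable. The function name, the status strings, and the treatment of files with no recorded status as "not indexed yet" are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical status labels matching the four categories of claim 1.
INDEXED = "indexable and indexed"
PENDING = "indexable, but not indexed yet"


def processing_status_summary(scope_ids: Iterable[str],
                              status_by_id: Mapping[str, str]) -> dict:
    """Summarize processing status for the files within one query's scope.

    Files outside the scope are ignored. Every file in scope that is not
    "indexable and indexed" gets a warning entry.
    """
    counts: Counter = Counter()
    warnings = []
    for file_id in scope_ids:
        # Assumption: a file with no recorded status has not been indexed yet.
        status = status_by_id.get(file_id, PENDING)
        counts[status] += 1
        if status != INDEXED:
            warnings.append((file_id, status))
    return {
        "counts": dict(counts),
        "warnings": warnings,
        # Full-text results can be called complete only if nothing needs attention.
        "complete": not warnings,
    }
```

The `warnings` list could drive the per-file processing status warning that claim 1 places next to each file entry. The `counts` could feed the processing status area of the search results page.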
14 Claims
Specification