Signature Detection in E-Mails
Abstract
In an electronic discovery search tool, non-substantive information, such as signatures in e-mail, can bias a search tool and add processing time. A method and system for identifying recurring non-substantive text in documents has been developed so that non-substantive text may be processed or ignored by the search tool, as needed.
23 Claims
1. A method of identifying non-substantive text in a document, comprising:
capturing, by a processor, one or more blocks of text from each document in a set of documents;
determining, by the processor, the frequency of occurrence for each captured block of text in the set of documents;
identifying, by the processor, one or more blocks of text in the set of documents as non-substantive text when the frequency of occurrence is above a received threshold; and
tagging, by the processor, the non-substantive text in each document in the set of documents.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
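The four steps of claim 1 (capture, count, identify, tag) can be illustrated with a minimal Python sketch. This is not the patented implementation: the claim does not say what a "block of text" is or how frequency is measured, so the sketch assumes a block is a paragraph separated by blank lines and that frequency is the number of documents containing the block. The function names `capture_blocks` and `tag_non_substantive` are hypothetical.

```python
from collections import Counter


def capture_blocks(document):
    """Capture blocks of text from one document.

    Assumption: a block is a paragraph delimited by blank lines.
    """
    return [block.strip() for block in document.split("\n\n") if block.strip()]


def tag_non_substantive(documents, threshold):
    """Tag each block as non-substantive when its frequency exceeds the threshold.

    Assumption: frequency is the number of documents in which the block
    appears, so a block repeated inside one document counts once.
    Returns, per document, a list of (block, is_non_substantive) pairs.
    """
    captured = [capture_blocks(doc) for doc in documents]

    # Determine the frequency of occurrence of each block across the set.
    frequency = Counter(block for blocks in captured for block in set(blocks))

    # Identify blocks whose frequency is above the received threshold.
    non_substantive = {block for block, n in frequency.items() if n > threshold}

    # Tag every block in every document.
    return [[(block, block in non_substantive) for block in blocks]
            for blocks in captured]


if __name__ == "__main__":
    emails = [
        "Hi Bob,\n\nSee attached.\n\nJane Doe\nAcme Corp",
        "Meeting at 3.\n\nJane Doe\nAcme Corp",
        "Budget is approved.\n\nJane Doe\nAcme Corp",
    ]
    for tagged in tag_non_substantive(emails, threshold=2):
        print(tagged)
```

In this usage example the signature block appears in all three e-mails, so with a threshold of 2 it is the only block tagged as non-substantive.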
10. A system for identifying non-substantive text in a document, comprising:
a capturer to capture one or more blocks of text from each document in a set of documents;
a generator that generates a checksum for each captured block of text;
a calculator to calculate a frequency of occurrence of each checksum in the set of documents; and
a tagger for tagging blocks of text as non-substantive when a frequency of occurrence is above a threshold.
Dependent claims: 11, 12, 13, 14
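The checksum-based arrangement of claim 10 can be sketched in Python as four small functions, one per claimed component. This is an illustration under stated assumptions, not the patented system: the claim does not name a checksum algorithm, so SHA-256 from the standard `hashlib` module stands in for it; blocks are assumed to be paragraphs separated by blank lines; and the names `capture`, `generate_checksum`, `calculate_frequencies`, and `tag_blocks` are hypothetical.

```python
import hashlib
from collections import Counter


def capture(document):
    """Capturer. Assumption: a block is a paragraph delimited by blank lines."""
    return [block.strip() for block in document.split("\n\n") if block.strip()]


def generate_checksum(block):
    """Generator. Assumption: SHA-256 stands in for the unspecified checksum."""
    return hashlib.sha256(block.encode("utf-8")).hexdigest()


def calculate_frequencies(documents):
    """Calculator: count, per checksum, the documents that contain it."""
    counts = Counter()
    for doc in documents:
        # A set ensures a checksum is counted once per document.
        counts.update({generate_checksum(block) for block in capture(doc)})
    return counts


def tag_blocks(documents, threshold):
    """Tagger: mark a block non-substantive when its checksum frequency
    is above the threshold. Returns (block, is_non_substantive) pairs."""
    counts = calculate_frequencies(documents)
    return [[(block, counts[generate_checksum(block)] > threshold)
             for block in capture(doc)]
            for doc in documents]


if __name__ == "__main__":
    emails = [
        "Please review the draft.\n\n--\nJane Doe",
        "Meeting at 3.\n\n--\nJane Doe",
    ]
    for tagged in tag_blocks(emails, threshold=1):
        print(tagged)
```

Counting fixed-length checksums rather than the raw blocks keeps the frequency table small even when the repeated blocks are long, which is one plausible reason to hash before counting.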
15. A system, comprising:
a processor; and
a memory, the memory having instructions stored thereon that, when executed by the processor, cause the processor to perform a method of identifying non-substantive text in a document, the method comprising:
capturing one or more blocks of text from each document in a set of documents;
determining the frequency of occurrence for each captured block of text in the set of documents;
identifying one or more blocks of text in the set of documents as non-substantive text when the frequency of occurrence is above a received threshold; and
tagging the non-substantive text in each document in the set of documents.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23
Specification