SYSTEM AND METHOD FOR EFFICIENTLY FINDING EMAIL SIMILARITY IN AN EMAIL REPOSITORY
First Claim
1. A method, comprising:
identifying, for each email document of a plurality of email documents, whether each subset of one or more subsets of character sequences within the email document is a common-type subset of character sequences or an uncommon-type subset of character sequences;
grouping a first set of the plurality of email documents with only common-type subsets of character sequences in a first searchable group;
grouping a second set of the plurality of email documents with one or more uncommon-type subsets of character sequences in a second searchable group;
identifying whether each subset of character sequences in a particular email document to be evaluated is a common-type or an uncommon-type subset of character sequences;
selectively searching either only one of or both of the first and second searchable groups depending upon whether the particular email contains only common-type subsets of character sequences, only uncommon-type subsets of character sequences, or a combination of common-type and uncommon-type subsets of character sequences; and
identifying selected one or more email documents of the plurality of email documents that may contain content that is similar to the particular email document based on the searching.
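The steps recited in claim 1 can be illustrated with a minimal Python sketch. Several details here are assumptions that the claim does not specify: paragraphs stand in for the "subsets of character sequences", a paragraph counts as common-type when it appears in at least a hypothetical `COMMON_THRESHOLD` emails, each searchable group is a simple paragraph-to-email inverted index, and a query searches the first group when it has any common-type paragraph and the second group when it has any uncommon-type paragraph. The function names (`paragraphs`, `build_groups`, `find_similar`) are illustrative only and do not come from the patent.

```python
from collections import Counter, defaultdict

# Hypothetical cutoff: a paragraph seen in at least this many emails is "common-type".
COMMON_THRESHOLD = 2


def paragraphs(body):
    """Split an email body into whitespace-normalized, lowercased paragraphs."""
    parts = (" ".join(p.split()).lower() for p in body.split("\n\n"))
    return [p for p in parts if p]


def build_groups(emails):
    """Classify every paragraph, then place each email in one of two groups.

    `emails` maps an email id to its body text. Returns the set of common-type
    paragraphs plus two inverted indexes (paragraph -> ids of emails).
    """
    doc_freq = Counter()
    for body in emails.values():
        doc_freq.update(set(paragraphs(body)))  # count each paragraph once per email
    common = {p for p, n in doc_freq.items() if n >= COMMON_THRESHOLD}

    first_group = defaultdict(set)   # emails with only common-type paragraphs
    second_group = defaultdict(set)  # emails with one or more uncommon-type paragraphs
    for email_id, body in emails.items():
        paras = paragraphs(body)
        only_common = all(p in common for p in paras)
        target = first_group if only_common else second_group
        for p in paras:
            target[p].add(email_id)
    return common, first_group, second_group


def find_similar(query_body, common, first_group, second_group):
    """Search one or both groups, depending on the query's paragraph types."""
    paras = paragraphs(query_body)
    groups = []
    if any(p in common for p in paras):
        groups.append(first_group)   # assumed mapping: common-type -> first group
    if any(p not in common for p in paras):
        groups.append(second_group)  # assumed mapping: uncommon-type -> second group

    hits = set()
    for group in groups:
        for p in paras:
            hits |= group.get(p, set())
    return hits
```

Under these assumptions, a query made up only of boilerplate paragraphs never touches the second index, and a query made up only of unusual paragraphs never touches the first. That is one plausible reading of how selective searching narrows the work per query.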
Abstract
Systems and methods for efficiently identifying emails with content similarity are disclosed. In one embodiment, a method comprises grouping a first set of a plurality of email documents with only common-type subsets of character sequences in a first searchable group, and grouping a second set of the plurality of email documents with one or more uncommon-type subsets of character sequences in a second searchable group. The method further comprises selectively searching either only one of or both of the first and second searchable groups, and identifying selected one or more email documents of the plurality of email documents that may contain content that is similar to the particular email document based on the searching.
30 Citations
20 Claims
1. A method, comprising:
identifying, for each email document of a plurality of email documents, whether each subset of one or more subsets of character sequences within the email document is a common-type subset of character sequences or an uncommon-type subset of character sequences;
grouping a first set of the plurality of email documents with only common-type subsets of character sequences in a first searchable group;
grouping a second set of the plurality of email documents with one or more uncommon-type subsets of character sequences in a second searchable group;
identifying whether each subset of character sequences in a particular email document to be evaluated is a common-type or an uncommon-type subset of character sequences;
selectively searching either only one of or both of the first and second searchable groups depending upon whether the particular email contains only common-type subsets of character sequences, only uncommon-type subsets of character sequences, or a combination of common-type and uncommon-type subsets of character sequences; and
identifying selected one or more email documents of the plurality of email documents that may contain content that is similar to the particular email document based on the searching.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer readable medium storing program instructions that are computer executable to:
identify, for each email document of a plurality of email documents, whether each subset of one or more subsets of character sequences within the email document is a common-type subset of character sequences or an uncommon-type subset of character sequences;
group a first set of the plurality of email documents with only common-type subsets of character sequences in a first searchable group;
group a second set of the plurality of email documents with one or more uncommon-type subsets of character sequences in a second searchable group;
identify whether each subset of character sequences in a particular email document to be evaluated is a common-type or an uncommon-type subset of character sequences;
selectively search either only one of or both of the first and second searchable groups depending upon whether the particular email contains only common-type subsets of character sequences, only uncommon-type subsets of character sequences, or a combination of common-type and uncommon-type subsets of character sequences; and
identify selected one or more email documents of the plurality of email documents that may contain content that is similar to the particular email document based on the search.
9. The computer readable medium of claim 8, wherein each subset of character sequences is a paragraph.
15. A system, comprising:
one or more processors;
a memory storing program instructions that are computer-executable by the one or more processors to:
identify, for each email document of a plurality of email documents, whether each subset of one or more subsets of character sequences within the email document is a common-type subset of character sequences or an uncommon-type subset of character sequences;
group a first set of the plurality of email documents with only common-type subsets of character sequences in a first searchable group;
group a second set of the plurality of email documents with one or more uncommon-type subsets of character sequences in a second searchable group;
identify whether each subset of character sequences in a particular email document to be evaluated is a common-type or an uncommon-type subset of character sequences;
selectively search either only one of or both of the first and second searchable groups depending upon whether the particular email contains only common-type subsets of character sequences, only uncommon-type subsets of character sequences, or a combination of common-type and uncommon-type subsets of character sequences; and
identify selected one or more email documents of the plurality of email documents that may contain content that is similar to the particular email document based on the search.
Dependent claims: 16, 17, 18, 19, 20
Specification