System and method for efficiently processing messages stored in multiple message stores
Abstract
A system and method for efficiently processing messages stored in multiple message stores is described. Metadata identifying a range of topically identical messages extracted from a plurality of message stores storing a multiplicity of messages to be processed is iteratively copied. The metadata for the extracted range of topically identical messages is categorized. Those messages containing substantially duplicative content within the extracted range are identified as duplicate messages. Those non-duplicate messages within the extracted range are tallied into an ordering of conversation thread length. Those messages whose content is recursively-included content within another of the tallied non-duplicate messages are classified as near-duplicate messages. The remaining messages are designated as unique messages containing substantially non-duplicative content.
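The categorization summarized above can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes each message has already been reduced to an identifier, a normalized topic and a plain-text body (the `Message` class and function names are hypothetical). It treats whitespace-normalized identical bodies as duplicative content, uses body length as a stand-in for conversation thread length, and approximates recursive inclusion with a plain substring test.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Message:
    msg_id: str
    topic: str   # assumed: an already-normalized subject line
    body: str


def norm(text: str) -> str:
    """Collapse whitespace so trivially reformatted copies compare equal."""
    return " ".join(text.split())


def categorize(messages):
    """Return (unique, duplicates, near_duplicates).

    The two dicts map each non-unique msg_id to the msg_id it was matched against.
    """
    unique, duplicates, near_duplicates = [], {}, {}
    by_topic = sorted(messages, key=lambda m: m.topic)
    # Each group is one range of topically identical messages.
    for _topic, group in groupby(by_topic, key=lambda m: m.topic):
        # 1. Identical content: keep the first occurrence, link the rest to it.
        first_seen, non_dups = {}, []
        for m in group:
            key = norm(m.body)
            if key in first_seen:
                duplicates[m.msg_id] = first_seen[key].msg_id
            else:
                first_seen[key] = m
                non_dups.append(m)
        # 2. Order longest first; body length stands in for thread length.
        non_dups.sort(key=lambda m: len(norm(m.body)), reverse=True)
        # 3. A message whose content appears inside a kept, longer message
        #    is a near-duplicate of it (substring test is a simplification).
        kept = []
        for m in non_dups:
            container = next((k for k in kept if norm(m.body) in norm(k.body)), None)
            if container is None:
                kept.append(m)
            else:
                near_duplicates[m.msg_id] = container.msg_id
        unique.extend(k.msg_id for k in kept)
    return unique, duplicates, near_duplicates


msgs = [
    Message("a", "budget", "Q1 numbers attached."),
    Message("b", "budget", "Q1 numbers  attached."),
    Message("c", "budget", "Looks good.\n-----Original Message-----\nQ1 numbers attached."),
    Message("d", "lunch", "Noon?"),
]
print(categorize(msgs))  # (['c', 'd'], {'b': 'a'}, {'a': 'c'})
```

Processing longest first means any message that contains another is already in `kept` when the shorter one is tested. Because substring containment is transitive, a chain of quoted replies collapses onto its longest member.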
30 Claims
1. A system for efficiently identifying unique email messages stored in organized email message stores, comprising:
a duplicate email message selector removing duplicate email messages containing substantially duplicative content from topically identical email messages logically extracted from a plurality of organized email message stores;
a near-duplicate email message selector removing near-duplicate email messages containing content recursively included within another of the remaining email messages;
a unique email message selector storing unique email messages comprising at least one of an email message storing a single occurrence of a given topic and an email message storing non-recursive content relative to each other such logically extracted email message and storing the unique email messages in a location within a store corresponding to a location within the organized email message stores from which each unique email message originated;
a log identifying the relative source location of each unique email message and cross referencing any of the duplicate email messages and near-duplicate email messages relating thereto; and
a cross-reference keyed collection identifying the relative source location of each unique email message and any of the duplicate email messages and near-duplicate email messages relating thereto.
2. A system according to claim 1, further comprising:
a thread length selector sorting the email messages remaining after the duplicate email messages are removed in order of conversation thread length.
3. A system according to claim 1, further comprising:
an email message processor extracting metadata identifying a relative source location for each email message within the organized email message stores; and
the duplicate email message selector and the near-duplicate email message selector processing the metadata during removal of the duplicate email messages and the near-duplicate email messages.
4. A system according to claim 1, further comprising:
the duplicate email message selector and the near-duplicate email message selector storing the duplicate email messages and the near-duplicate email messages for at least one unique email message into the store by identifying each duplicate email message and each near-duplicate email message using the cross-reference keyed collection.
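The log and cross-reference keyed collection recited in claims 1 through 4 can be held in plain dictionaries, as in the minimal sketch below. The function names, the dictionary layout and the store names are hypothetical. A duplicate may have been matched against a message that was itself later classified as a near-duplicate. For that reason, the sketch follows such links until it reaches a unique message before recording an entry.

```python
def resolve(msg_id, links, unique_ids):
    """Follow duplicate/near-duplicate links until a unique message is reached."""
    seen = set()
    while msg_id not in unique_ids:
        if msg_id in seen:
            raise ValueError(f"link cycle at {msg_id!r}")
        seen.add(msg_id)
        msg_id = links[msg_id]
    return msg_id


def build_log_and_xref(unique_locations, duplicates, near_duplicates):
    """Build a per-unique-message log and a cross-reference keyed collection.

    unique_locations: {unique_id: (store, folder)} relative source locations.
    duplicates, near_duplicates: {non_unique_id: id it was matched against}.
    """
    links = {**duplicates, **near_duplicates}
    unique_ids = set(unique_locations)
    log = {uid: {"source": loc, "duplicates": [], "near_duplicates": []}
           for uid, loc in unique_locations.items()}
    xref = {}  # keyed by non-unique id -> owning unique id
    for kind, table in (("duplicates", duplicates), ("near_duplicates", near_duplicates)):
        for msg_id in table:
            owner = resolve(msg_id, links, unique_ids)
            xref[msg_id] = owner
            log[owner][kind].append(msg_id)
    return log, xref


def companions(unique_id, xref):
    """Non-unique messages to copy into the store alongside one unique message."""
    return sorted(m for m, owner in xref.items() if owner == unique_id)


log, xref = build_log_and_xref(
    {"c": ("alice.pst", "Sent Items"), "d": ("bob.pst", "Inbox")},
    duplicates={"b": "a"},
    near_duplicates={"a": "c"},
)
print(xref)                   # {'b': 'c', 'a': 'c'}
print(companions("c", xref))  # ['a', 'b']
```

Keying the cross-reference by the non-unique message makes the lookup in claim 4 a single dictionary access per message. The log keeps the reverse view, with source location, for each unique message.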
5. A method for efficiently identifying unique email messages stored in organized email message stores, comprising:
removing duplicate email messages containing duplicative content from topically identical email messages logically extracted from a plurality of organized email message stores as extracted email messages;
removing near-duplicate email messages containing content recursively included within another of the remaining email messages;
storing unique email messages comprising at least one of an email message storing a single occurrence of a given topic and an email message storing non-recursive content relative to each other such logically extracted email message;
storing the unique email messages in a location within a store corresponding to a location within the organized email message stores from which each unique email message originated;
maintaining a log identifying the relative source location of each unique email message and cross referencing any of the duplicate email messages and near-duplicate email messages relating thereto; and
maintaining a cross-reference keyed collection identifying the relative source location of each unique email message and any of the duplicate email messages and near-duplicate email messages relating thereto.
6. A method according to claim 5, further comprising:
sorting the email messages remaining after the duplicate email messages are removed in order of conversation thread length.
7. A method according to claim 5, further comprising:
extracting metadata identifying a relative source location for each email message within the organized email message stores; and
processing the metadata during removal of the duplicate email messages and the near-duplicate email messages.
8. A method according to claim 5, further comprising:
storing the duplicate email messages and the near-duplicate email messages for at least one unique email message into the store by identifying each duplicate email message and each near-duplicate email message using the cross-reference keyed collection.
9. A computer-readable storage medium holding code for performing the method of claim 5.
10. A system for efficiently processing email messages stored in multiple email message stores, comprising:
an email message processor iteratively copying metadata identifying a range of topically identical email messages extracted from a plurality of email message stores storing a multiplicity of email messages to be processed and categorizing the metadata for the extracted range of topically identical email messages, the email message processor further comprising:
a duplicate email message selector identifying those email messages containing duplicative content within the extracted range as duplicate email messages;
a thread length selector tallying those non-duplicate email messages within the extracted range into an ordering of conversation thread length;
a near-duplicate email message selector classifying those email messages whose content is recursively-included content within another of the tallied non-duplicate email messages as near-duplicate email messages;
a unique email message selector designating the remaining email messages as unique email messages containing substantially non-duplicative content;
a store storing the unique email messages and comprising a plurality of relative stores and folders corresponding to the email message stores from which each unique email message originated;
a log comprising an entry for each of the unique email messages, each log entry storing email message source location information and identification information for any such duplicate email message and near-duplicate email message related thereto; and
a cross-reference keyed collection comprising an entry for each of the duplicate email messages and the near-duplicate email messages keyed to identification information for one such unique email message associated therewith.
11. A system according to claim 10, further comprising:
the email message processor extracting the metadata for the email messages to be processed from the email message stores and sorting the metadata according to topic.
12. A system according to claim 11, further comprising:
the duplicate email message selector sorting the metadata for the extracted range of topically identical email messages according to content prior to identifying the duplicate email messages.
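Claims 11 and 12 sort the metadata first by topic and then, within each topically identical range, by content before duplicates are identified. Sorting by content places identical copies next to each other, so a single pass over neighboring entries finds them. The sketch below assumes a SHA-256 digest of whitespace-normalized body text as the content key. That choice, the tuple layout and the function names are illustrative, not taken from the claims.

```python
import hashlib
from itertools import groupby


def content_key(body: str) -> str:
    """Digest of whitespace-normalized body text (an assumed content measure)."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(records):
    """records: (msg_id, topic, body) tuples. Returns {duplicate_id: first_id}."""
    duplicates = {}
    by_topic = sorted(records, key=lambda r: r[1])  # sort metadata by topic
    for _topic, rng in groupby(by_topic, key=lambda r: r[1]):
        # Sort the topically identical range by content so copies are adjacent.
        keyed = sorted((content_key(body), msg_id) for msg_id, _t, body in rng)
        for (prev_key, prev_id), (key, msg_id) in zip(keyed, keyed[1:]):
            if key == prev_key:
                # Link every copy in a run to the first id of that run.
                duplicates[msg_id] = duplicates.get(prev_id, prev_id)
    return duplicates


records = [
    ("m1", "budget", "Q1 numbers"),
    ("m2", "budget", "Q1  numbers "),
    ("m3", "budget", "Q1 numbers"),
    ("m4", "lunch", "Q1 numbers"),
    ("m5", "budget", "Other"),
]
print(find_duplicates(records))  # {'m2': 'm1', 'm3': 'm1'}
```

`m4` has the same body as `m1` but a different topic, so it falls in a different range and is not compared with it.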
13. A system according to claim 10, further comprising:
the thread length selector sorting the metadata for the non-duplicate email messages by content prior to tallying the non-duplicate email messages.
14. A system according to claim 10, further comprising:
the duplicate email message selector verifying the duplicate email messages by comparing indicia in addition to the content stored therein.
15. A system according to claim 14, wherein the indicia comprises header information, further comprising:
the duplicate email message selector comparing the header information stored with each of the duplicate email messages.
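Claims 14 and 15 confirm a content match by also comparing header information. The sketch below shows one possible check that uses only Python's standard `email` package. The particular header fields compared, the function names and the restriction to single-part messages are assumptions.

```python
from email import message_from_string

# Assumption: the set of header fields compared is illustrative only.
VERIFY_HEADERS = ("From", "To", "Cc", "Subject", "Date")


def header_values(msg, fields):
    """Stripped header values; a missing header compares as an empty string."""
    return [(msg.get(f) or "").strip() for f in fields]


def is_verified_duplicate(raw_a: str, raw_b: str, fields=VERIFY_HEADERS) -> bool:
    """Confirm a content match by also comparing selected header fields."""
    a, b = message_from_string(raw_a), message_from_string(raw_b)
    if a.is_multipart() or b.is_multipart():
        raise NotImplementedError("sketch handles single-part messages only")
    same_body = a.get_payload().strip() == b.get_payload().strip()
    return same_body and header_values(a, fields) == header_values(b, fields)


raw = ("From: ann@example.com\nTo: bob@example.com\nSubject: Budget\n"
       "Date: Mon, 1 Jan 2007 10:00:00 -0800\n\nQ1 numbers attached.\n")
other = raw.replace("ann@example.com", "eve@example.com")
print(is_verified_duplicate(raw, raw))    # True
print(is_verified_duplicate(raw, other))  # False
```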
16. A system according to claim 10, further comprising:
the thread length selector determining each conversation thread length based on thread markers comprising at least one of keywords, delimiter strings, and relative location within each email message.
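Claim 16 derives conversation thread length from thread markers. The sketch below counts two assumed marker patterns: an Outlook-style `-----Original Message-----` delimiter string and an `On ... wrote:` keyword line. Each must begin a line, which is a crude use of relative location. Mail clients produce many other marker formats, so the patterns and the function name are illustrative only.

```python
import re

# Assumed marker patterns: a delimiter string and a keyword line, each of
# which must start a line (its relative location within the message).
THREAD_MARKERS = (
    re.compile(r"^-{2,}\s*Original Message\s*-{2,}\s*$", re.MULTILINE | re.IGNORECASE),
    re.compile(r"^On .+ wrote:\s*$", re.MULTILINE),
)


def thread_length(body: str) -> int:
    """1 for the message itself plus 1 per quoted earlier message found."""
    return 1 + sum(len(p.findall(body)) for p in THREAD_MARKERS)


reply = ("Sounds good.\n\n-----Original Message-----\nFrom: ann\nSee attached.\n\n"
         "On Mon, Jan 1, 2007, Bob wrote:\n> Any update?\n")
print(thread_length(reply))                # 3
print(thread_length("Just one message."))  # 1

# Ordering by conversation thread length, longest thread first.
bodies = ["Just one message.", reply]
print([thread_length(b) for b in sorted(bodies, key=thread_length, reverse=True)])  # [3, 1]
```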
17. A system according to claim 10, further comprising:
the store storing the duplicate email messages and the near-duplicate email messages copied thereto by identifying the associated unique email message with the cross-reference keyed collection.
18. A system according to claim 10, wherein each email message store comprises a MAPI-compliant email message store.
19. A method for efficiently processing email messages stored in multiple email message stores, comprising:
iteratively copying metadata identifying a range of topically identical email messages extracted from a plurality of email message stores storing a multiplicity of email messages to be processed; and
categorizing the metadata for the extracted range of topically identical email messages, comprising:
identifying those email messages containing duplicative content within the extracted range as duplicate email messages;
tallying those non-duplicate email messages within the extracted range into an ordering of conversation thread length;
classifying those email messages whose content is recursively-included content within another of the tallied non-duplicate email messages as near-duplicate email messages;
designating the remaining email messages as unique email messages containing non-duplicative content;
storing the unique email messages in a store comprising a plurality of relative stores and folders corresponding to the email message stores from which each unique email message originated;
maintaining a log comprising an entry for each of the unique email messages, each log entry storing email message source location information and identification information for any such duplicate email message and near-duplicate email message related thereto; and
maintaining a cross-reference keyed collection comprising an entry for each of the duplicate email messages and the near-duplicate email messages keyed to identification information for one such unique email message associated therewith.
20. A method according to claim 19, further comprising:
extracting the metadata for the email messages to be processed from the email message stores; and
sorting the metadata according to topic.
21. A method according to claim 20, further comprising:
sorting the metadata for the extracted range of topically identical email messages according to content prior to identifying the duplicate email messages.
22. A method according to claim 19, further comprising:
sorting the metadata for the non-duplicate email messages by content prior to tallying the non-duplicate email messages.
23. A method according to claim 19, further comprising:
verifying the duplicate email messages by comparing indicia in addition to the content stored therein.
24. A method according to claim 23, wherein the indicia comprises header information, further comprising:
comparing the header information stored with each of the duplicate email messages.
25. A method according to claim 19, further comprising:
determining each conversation thread length based on thread markers comprising at least one of keywords, delimiter strings, and relative location within each email message.
26. A method according to claim 19, further comprising:
storing the duplicate email messages and the near-duplicate email messages copied thereto by identifying the associated unique email message with the cross-reference keyed collection.
27. A method according to claim 19, wherein each email message store comprises a MAPI-compliant email message store.
28. A computer-readable storage medium holding code for performing the method of claim 19.
29. A system for categorizing email messages stored in email message stores into discrete categories, comprising:
a master array storing metadata for each email message to be processed from a plurality of email message stores, the metadata identifying the source email message store and relative storage location for the email message;
means for sorting the metadata according to topic and comparing content of email messages with similar topics to identify those email messages containing duplicative content;
means for sorting the email messages according to content by referencing the metadata and ordering the metadata in order of conversation thread length;
means for comparing the content to identify those email messages whose content is recursively-included content within another of the email messages;
means for identifying, by referencing the metadata, the remaining email messages as unique email messages;
means for storing the unique email messages in a store comprising a plurality of relative stores and folders corresponding to the email message stores from which each unique email message originated;
means for maintaining a log comprising an entry for each of the unique email messages, each log entry storing email message source location information and identification information for any such non-unique email message related thereto; and
means for maintaining a cross-reference keyed collection comprising an entry for any such non-unique email message keyed to identification information for one such unique email message associated therewith.
30. A method for categorizing messages stored in email message stores into discrete categories, comprising:
extracting metadata for each email message to be processed from a plurality of email message stores, the metadata identifying the source email message store and relative storage location for the email message;
sorting the metadata according to topic and comparing content of email messages with similar topics to identify those email messages containing substantially duplicative content;
sorting the email messages according to content by referencing the metadata and ordering the metadata in order of conversation thread length;
comparing the content to identify those email messages whose content is recursively-included content within another of the email messages;
identifying, by referencing the metadata, the remaining email messages as unique email messages;
storing the unique email messages in a store comprising a plurality of relative stores and folders corresponding to the email message stores from which each unique email message originated;
maintaining a log comprising an entry for each of the unique email messages, each log entry storing email message source location information and identification information for any such non-unique email message related thereto; and
maintaining a cross-reference keyed collection comprising an entry for any such non-unique email message keyed to identification information for one such unique email message associated therewith.
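Claims 29 and 30 keep metadata naming each message's source store and relative storage location. They also store unique messages in folders that correspond to those sources. The sketch below shows that mapping in minimal form. It assumes store names and folder paths are plain strings and that the output is a POSIX-style directory tree. The `MessageMeta` fields, the function names and the store names are hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MessageMeta:
    """One master-array entry: where a message came from (illustrative fields)."""
    msg_id: str
    store: str   # source message store name, e.g. "alice.pst" (hypothetical)
    folder: str  # folder path inside that store


def output_folder(root: str, store: str, folder: str) -> PurePosixPath:
    """Mirror the source store and folder hierarchy beneath an output root."""
    parts = [p for p in folder.replace("\\", "/").split("/") if p]
    return PurePosixPath(root, store, *parts)


def placement_plan(root, master, unique_ids):
    """Destination folder for each unique message, keyed by msg_id."""
    return {m.msg_id: output_folder(root, m.store, m.folder)
            for m in master if m.msg_id in unique_ids}


master = [
    MessageMeta("c", "alice.pst", "Sent Items"),
    MessageMeta("d", "bob.pst", "Inbox\\Lunch"),
    MessageMeta("a", "alice.pst", "Inbox"),
]
for msg_id, dest in placement_plan("/review", master, {"c", "d"}).items():
    print(msg_id, dest)
# c /review/alice.pst/Sent Items
# d /review/bob.pst/Inbox/Lunch
```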
Specification