
System and method for identifying and categorizing messages extracted from archived message stores

  • US 7,577,656 B2
  • Filed: 04/24/2006
  • Issued: 08/18/2009
  • Est. Priority Date: 03/19/2001
  • Status: Expired due to Fees
First Claim

1. A computer-implemented system for identifying messages in a message store, comprising:

    a digester module configured to encode at least part of metadata associated with and at least part of content contained in each of a plurality of messages in a message store by generating a metadata sequence and a content sequence for each message; and

    a comparer module configured to group the messages into sets by similar metadata sequences and similar content sequences and to compare the messages in each set, comprising:

    a unique marker module configured to mark each such message not matching any other such message in the set as a unique message;

    an exact duplicate marker module configured to mark each such message matching at least one other such message in the set as an exact duplicate message; and

    a near duplicate marker module configured to mark each such message comprising a subset of at least one other such message in the set as a near duplicate message;

    an attachment digester module configured to encode at least part of at least one attachment associated with one or more of the messages by generating an attachment sequence for each attachment;

    a concatenator module configured to concatenate the metadata sequence and the content sequence for the message and the attachment sequence for the at least one attachment into a compound sequence;

    an attachment comparer module configured to compare the compound sequences for the messages;

    an attachment marker module configured to mark each exact duplicate message and each near duplicate message having a compound sequence not matching any other compound sequence in the set as a unique message; and

    a processor to execute each of the modules, which are stored on a computer-readable storage medium.
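
The flow recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation, and several details are assumptions made only for illustration: the `Message` class and all function names are hypothetical, SHA-256 stands in for the unspecified encoding, "similar metadata" is reduced to an identical normalized subject line, "subset" is reduced to a substring test on the body, and the attachment re-check is applied only to messages that carry attachments.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field

UNIQUE, EXACT, NEAR = "unique", "exact duplicate", "near duplicate"


@dataclass
class Message:
    """Hypothetical message record; the claim does not define one."""
    subject: str
    body: str
    attachments: list = field(default_factory=list)  # raw bytes per attachment


def _digest(data: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever encoding the digester uses.
    return hashlib.sha256(data).hexdigest()


def metadata_sequence(msg: Message) -> str:
    # Assumption: metadata is just the subject, with reply/forward prefixes removed.
    subject = msg.subject.strip().lower()
    while subject.startswith(("re:", "fw:", "fwd:")):
        subject = subject.split(":", 1)[1].strip()
    return _digest(subject.encode("utf-8"))


def content_sequence(msg: Message) -> str:
    return _digest(msg.body.strip().encode("utf-8"))


def compound_sequence(msg: Message) -> str:
    # Concatenate metadata, content, and attachment sequences.
    parts = [metadata_sequence(msg), content_sequence(msg)]
    parts += sorted(_digest(a) for a in msg.attachments)
    return "".join(parts)


def categorize(messages: list) -> list:
    # Group message indices into sets by metadata sequence.
    groups = defaultdict(list)
    for i, msg in enumerate(messages):
        groups[metadata_sequence(msg)].append(i)

    marks = {}
    for members in groups.values():
        # First pass: compare content within each set.
        for i in members:
            others = [j for j in members if j != i]
            mine = content_sequence(messages[i])
            body = messages[i].body.strip()
            if any(content_sequence(messages[j]) == mine for j in others):
                marks[i] = EXACT
            elif any(body in messages[j].body for j in others):
                marks[i] = NEAR  # body is contained in a longer message
            else:
                marks[i] = UNIQUE
        # Second pass: a duplicate whose compound sequence matches no other
        # compound sequence in the set is re-marked as unique.
        for i in members:
            if marks[i] != UNIQUE and messages[i].attachments:
                mine = compound_sequence(messages[i])
                others = [j for j in members if j != i]
                if not any(compound_sequence(messages[j]) == mine for j in others):
                    marks[i] = UNIQUE
    return [marks[i] for i in range(len(messages))]
```

In this sketch, a message quoted in full by a reply is marked as a near duplicate, two identical messages are marked as exact duplicates, and an otherwise identical message that carries an attachment the others lack is re-marked as unique so the attachment is not discarded with it.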
