Computer-implemented system and method for identifying near duplicate messages

  • US 8,626,767 B2
  • Filed: 06/03/2013
  • Issued: 01/07/2014
  • Est. Priority Date: 03/19/2001
  • Status: Expired due to Term
First Claim

1. A computer-implemented system for identifying near duplicate messages, comprising:

  • a processor coupled to a memory to execute the following modules comprising;

    a message grouping module to group by conversation thread, messages each comprising a content body, wherein one or more of the messages also includes an attachment;

    a message sorting module to sort the messages for each conversation thread in order of message length;

    a message selection module to select for one of the threads at least one of the messages and to compare the body of the selected message with the body of one such shorter message in that thread;

    a determination module to determine that the body of the shorter message is included in the body of the selected message;

    a message relationship module to determine a relationship between the selected message and the shorter message by marking the shorter message as a near duplicate of the selected message if the selected message and the shorter message do not have attachments and by comparing hash codes of the attachments for the selected message and the shorter message, if the selected message and the shorter message each have attachments, and marking the shorter message as a near duplicate message of the selected message when the hash codes of the attachments match.
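The modules recited in claim 1 can be read as a short pipeline: group by thread, sort by length, test body containment, then check attachments. The following is a minimal Python sketch of that reading, not the patentee's implementation. The `Message` class, the `find_near_duplicates` function, and the choice of SHA-256 as the hash code are all assumptions made for illustration; the claim does not name a hash function, and it does not say what happens when only one of the two messages has attachments (this sketch leaves such pairs unmarked).

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Message:
    """Hypothetical message record; field names are illustrative only."""
    msg_id: str
    thread_id: str
    body: str
    attachments: list = field(default_factory=list)  # raw bytes per attachment


def attachment_hashes(msg):
    # Assumption: SHA-256 stands in for the unspecified "hash codes".
    return sorted(hashlib.sha256(a).hexdigest() for a in msg.attachments)


def find_near_duplicates(messages):
    """Return {shorter_msg_id: selected_msg_id} for near-duplicate pairs."""
    # Message grouping module: group messages by conversation thread.
    threads = defaultdict(list)
    for m in messages:
        threads[m.thread_id].append(m)

    near_dupes = {}
    for thread in threads.values():
        # Message sorting module: order each thread by body length, longest first.
        ordered = sorted(thread, key=lambda m: len(m.body), reverse=True)

        # Message selection module: compare each selected message with shorter ones.
        for i, selected in enumerate(ordered):
            for shorter in ordered[i + 1:]:
                if shorter.msg_id in near_dupes:
                    continue  # already marked against a longer message
                if len(shorter.body) >= len(selected.body):
                    continue  # the claim only addresses strictly shorter messages

                # Determination module: is the shorter body included in the selected body?
                if shorter.body not in selected.body:
                    continue

                # Message relationship module: attachment handling.
                if not selected.attachments and not shorter.attachments:
                    near_dupes[shorter.msg_id] = selected.msg_id
                elif selected.attachments and shorter.attachments:
                    if attachment_hashes(selected) == attachment_hashes(shorter):
                        near_dupes[shorter.msg_id] = selected.msg_id
                # Assumption: if only one message has attachments, leave it unmarked.
    return near_dupes
```

Under these assumptions, a reply that quotes an earlier message in full, with no attachments on either side, causes the earlier message to be marked as a near duplicate of the reply. Two messages whose bodies nest but whose attachments hash differently stay unmarked.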
