Computer-implemented system and method for identifying related messages
Abstract
A system and method for identifying related messages are provided. A set of messages, each having a header, sender, and transmission time, is obtained. A message is selected from the set. A body of the selected message is compared to a body of a further message in the set. The further message is labeled as a duplicate of the selected message when the bodies match. The duplicate labeling of the further message is verified when the header, sender, and transmission time of the further message match the header, sender, and transmission time of the selected message. The messages whose duplicate labeling has been verified are removed from the set. The remaining messages are sorted in order of message length. A shorter message is compared with a longer message and is marked as a near duplicate of the longer message when the body of the shorter message is included in the body of the longer message.
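The exact-duplicate steps summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Message` dataclass, its field names, and the helper `remove_verified_duplicates` are hypothetical, and "the bodies match" is taken to mean exact string equality. A message whose body matches but whose header, sender, or transmission time differs keeps an unverified duplicate label and is not removed.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical representation; field names are illustrative assumptions.
    header: str
    sender: str
    sent_at: str  # transmission time, kept as a string for simplicity
    body: str


def remove_verified_duplicates(messages):
    """Return (kept, labeled, verified) for a list of Message objects."""
    labeled = set()   # indices whose body matched the body of a selected message
    verified = set()  # labeled indices whose header, sender and time also matched
    for i, selected in enumerate(messages):
        if i in verified:
            continue  # a verified copy is removed, so it is never selected
        for j in range(i + 1, len(messages)):
            further = messages[j]
            if j in verified or further.body != selected.body:
                continue
            labeled.add(j)  # bodies match: label the further message as a duplicate
            # Verify the label only when header, sender and transmission time match too.
            if (further.header, further.sender, further.sent_at) == (
                selected.header, selected.sender, selected.sent_at
            ):
                verified.add(j)
    # Remove only the messages whose duplicate labels were verified.
    kept = [m for k, m in enumerate(messages) if k not in verified]
    return kept, labeled, verified
```

In this sketch a forwarded copy with the same body but a different header would be labeled yet kept, which mirrors the two-stage label-then-verify structure described above.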
14 Claims
1. A computer-implemented system for identifying related messages, comprising:
a set of messages, each comprising a header, sender, and transmission time; and
a processor to execute modules for processing the message set, comprising:
a selection module to select one of the messages from the set;
a content comparison module to compare a body of the selected message to a body of a further message in the set;
a labeling module to label the further message as a duplicate of the selected message when the bodies match;
a verification module to verify the duplicate labeling of the further message when the header, sender, and transmission time of the further message match the header, sender, and transmission time of the selected message;
a removal module to remove the messages with verified duplicate labels;
a sorting module to sort the remaining messages of the set in order of message length;
a length comparison module to compare a shorter message comprising a short text body with a message comprising a longer text body and to determine that the body of the shorter message is included in the body of the longer message;
a marker module to mark the shorter message as a near duplicate of the longer message;
an extractor module to extract metadata from each of the messages in the set;
a compiler to compile the messages of the set into a master array based on the extracted metadata;
a topic module to determine topics of the messages in the master array and to sort the messages in the master array based on the topics;
a unique message comparison module to select another one of the messages and to compare the other selected message with a next message; and
a unique message determination module to determine that the topics of the other selected message and the next message do not match, to determine that the other selected message is a first message of the topic, and to mark the other selected message as a unique message.
Dependent claims: 2, 3, 4, 5, 6, 7
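The sorting, length comparison, and marker modules of claim 1 can be illustrated with the sketch below. It assumes that "included in" means plain substring containment after trimming surrounding whitespace; the function name `mark_near_duplicates` is hypothetical and the input is simply the list of bodies left after exact-duplicate removal.

```python
def mark_near_duplicates(bodies):
    """Return {shorter_index: longer_index} for bodies contained in a longer body."""
    # Sort indices in order of message length, shortest first.
    order = sorted(range(len(bodies)), key=lambda k: len(bodies[k]))
    near_dup_of = {}
    for pos, short_idx in enumerate(order):
        short_body = bodies[short_idx].strip()
        if not short_body:
            continue  # an empty body is trivially contained everywhere; skip it
        # Compare only against messages at or after this position (never shorter).
        for long_idx in order[pos + 1:]:
            longer = bodies[long_idx]
            if len(longer) > len(bodies[short_idx]) and short_body in longer:
                near_dup_of[short_idx] = long_idx  # mark as near duplicate
                break  # the shortest containing message is enough
    return near_dup_of
```

Sorting first means each message only needs to be checked against messages that are at least as long, since a shorter body cannot contain a longer one. For example, a short reply that is quoted in full inside a later, longer reply would be marked as a near duplicate of that longer reply.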
8. A computer-implemented method for identifying related messages, comprising:
obtaining a set of messages, each comprising a header, sender, and transmission time;
selecting one of the messages from the set;
comparing a body of the selected message to a body of a further message in the set;
labeling the further message as a duplicate of the selected message when the bodies match;
verifying the duplicate labeling of the further message when the header, sender, and transmission time of the further message match the header, sender, and transmission time of the selected message;
removing the messages with verified duplicate labels;
sorting the remaining messages of the set in order of message length;
comparing a shorter message comprising a short text body with a message comprising a longer text body and determining that the body of the shorter message is included in the body of the longer message;
marking the shorter message as a near duplicate of the longer message;
extracting metadata from each of the messages in the set;
compiling the messages of the set into a master array based on the extracted metadata;
determining topics of the messages in the master array;
sorting the messages in the master array based on the topics;
selecting another one of the messages and comparing the other selected message with a next message;
determining that the topics of the other selected message and the next message do not match;
determining that the other selected message is a first message of the topic; and
marking the other selected message as a unique message.
Dependent claims: 9, 10, 11, 12, 13, 14
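The final steps of claim 8 (extracting metadata, compiling a master array, sorting by topic, and marking unique messages) might be sketched as follows. Several details are assumptions not stated in the claim: the topic is derived from the header by stripping "Re:"/"Fw:"/"Fwd:" prefixes and lowercasing, the master array is a list of dictionaries, and messages within a topic are ordered newest first. Under that ordering, a message whose next neighbour has a different topic (or which ends the array) is the earliest in its topic, which is one way to read the claim's "first message of the topic". The names `extract_metadata` and `mark_unique` are hypothetical.

```python
import re
from datetime import datetime

# Hypothetical topic rule: drop any run of reply/forward prefixes.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def extract_metadata(header, sender, sent_at):
    """Build one master-array record from a message's header, sender and time."""
    return {
        "topic": _PREFIX.sub("", header).strip().lower(),
        "sender": sender,
        "sent_at": datetime.fromisoformat(sent_at),
    }


def mark_unique(messages):
    """messages: list of (header, sender, sent_at) tuples; returns the master array."""
    master = [extract_metadata(*m) for m in messages]
    # Newest first, then a stable sort by topic keeps newest-first order per topic.
    master.sort(key=lambda r: r["sent_at"], reverse=True)
    master.sort(key=lambda r: r["topic"])
    for i, record in enumerate(master):
        nxt = master[i + 1] if i + 1 < len(master) else None
        # Topics differ (or array ends): no earlier message shares this topic.
        record["unique"] = nxt is None or nxt["topic"] != record["topic"]
    return master
```

Python's `list.sort` is stable, so sorting by time and then by topic yields topic groups that are each ordered newest first without a compound key.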
Specification