COMPARING SIMILARITY BETWEEN DOCUMENTS FOR FILTERING UNWANTED DOCUMENTS
Abstract
A mechanism for efficiently determining similarity between documents. A set of reference data items is generated by processing a reference document. A similarity index representing similarity between a candidate document and the reference document is obtained by counting segments of the candidate document that match the reference data items. The candidate document is a message transmitted in a communication system, where the message is compared against one or more reference documents representing unwanted messages in order to filter unwanted messages and block them from being transmitted or propagated.
30 Claims
1. A computer-implemented method of determining similarity between a reference document and a candidate document, comprising:
segmenting the reference document into a plurality of reference data items;
segmenting the candidate document into a plurality of document data items;
computing a count representing a number of document data items matching the reference data items; and
computing a similarity index representing similarity between the reference document and the candidate document based on the count.
Dependent claims: 2, 3, 4, 5, 6
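The four steps of claim 1 can be illustrated with a minimal sketch. The claim does not say how documents are segmented or how the similarity index is derived from the count, so two assumptions are made here purely for illustration: data items are overlapping word n-grams, and the index is the count divided by the number of document data items. The names `segment` and `similarity_index` are hypothetical.

```python
def segment(text: str, size: int = 3) -> list[str]:
    """Split text into overlapping word n-grams (an assumed segmentation scheme)."""
    words = text.lower().split()
    if not words:
        return []
    if len(words) < size:
        # Short texts yield a single item containing every word.
        return [" ".join(words)]
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def similarity_index(reference: str, candidate: str, size: int = 3) -> float:
    """Return the fraction of candidate data items that match a reference data item."""
    reference_items = set(segment(reference, size))   # reference data items
    document_items = segment(candidate, size)         # document data items
    if not document_items:
        return 0.0
    # Count of document data items matching the reference data items.
    count = sum(1 for item in document_items if item in reference_items)
    # Assumed normalization: the claim only requires the index to be based on the count.
    return count / len(document_items)


if __name__ == "__main__":
    print(similarity_index("a b c d", "a b c x"))
```

Storing the reference data items in a set keeps each match test at roughly constant time, so the cost of scoring a candidate grows with the candidate's length rather than the reference's.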
7. A computer-implemented method of filtering unwanted communications, comprising:
receiving a reference document identified as unwanted;
receiving a communication transmitted through a communication channel;
comparing the communication with a first set of reference data items associated with the reference document to generate a first similarity index, the first similarity index representing similarity between the communication and the first reference document;
determining whether the communication matches the first reference document based on the first similarity index; and
designating the communication to be blocked from further transmission through the communication channel responsive to determining that the communication matches the first reference document.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16
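The filtering flow of claim 7 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: data items are taken to be adjacent word pairs, "matches" is taken to mean that the similarity index meets a threshold, and the threshold value 0.8 and all function names are hypothetical.

```python
def word_pairs(text: str) -> list[str]:
    """Segment text into adjacent word pairs (an assumed form of data item)."""
    words = text.lower().split()
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def build_reference_items(reference_document: str) -> frozenset[str]:
    """Precompute the set of reference data items for an unwanted document."""
    return frozenset(word_pairs(reference_document))


def first_similarity_index(communication: str, reference_items: frozenset[str]) -> float:
    """Fraction of the communication's data items found among the reference items."""
    items = word_pairs(communication)
    if not items:
        return 0.0
    return sum(item in reference_items for item in items) / len(items)


def filter_communication(communication: str,
                         reference_items: frozenset[str],
                         threshold: float = 0.8) -> str:
    """Designate a communication as 'blocked' or 'allowed' (threshold is an assumption)."""
    index = first_similarity_index(communication, reference_items)
    return "blocked" if index >= threshold else "allowed"


if __name__ == "__main__":
    reference = build_reference_items("win a free prize click this link today")
    print(filter_communication("win a free prize click this link now", reference))
    print(filter_communication("see you at lunch today", reference))
```

Because the reference data items are computed once per reference document, each incoming communication only needs to be segmented and looked up, which suits a channel that screens many messages against the same unwanted document.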
17. A communication system for filtering unwanted communications, comprising:
a communication processor configured to compare a communication with a first set of reference data items associated with a first reference document, the first reference document identified as unwanted, to generate a first similarity index, the first similarity index representing similarity between the communication and the first reference document, the communication processor configured to determine whether the communication matches the first reference document based on the first similarity index, the communication processor configured to designate the communication to be blocked from further transmission through the communication channel responsive to determining that the communication matches the first reference document; and
a communication module configured to receive communications to be transmitted and transmit the communication to one or more recipients responsive to the communication not being blocked by the communication processor.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26
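The division of work between the two components of claim 17 can be sketched with two small classes. Everything here is a hypothetical arrangement for illustration: the class and method names, the injected `similarity` and `send` callables, the word-overlap measure, and the 0.75 threshold are assumptions that do not come from the claim.

```python
from typing import Callable, Iterable


class CommunicationProcessor:
    """Decides whether a communication matches an unwanted reference document."""

    def __init__(self, reference_items: frozenset[str],
                 similarity: Callable[[str, frozenset[str]], float],
                 threshold: float) -> None:
        self.reference_items = reference_items
        self.similarity = similarity      # injected similarity measure (assumption)
        self.threshold = threshold        # hypothetical match threshold

    def is_blocked(self, communication: str) -> bool:
        index = self.similarity(communication, self.reference_items)
        return index >= self.threshold


class CommunicationModule:
    """Receives communications and transmits only those the processor does not block."""

    def __init__(self, processor: CommunicationProcessor,
                 send: Callable[[str, str], None]) -> None:
        self.processor = processor
        self.send = send                  # hypothetical delivery callback

    def transmit(self, communication: str, recipients: Iterable[str]) -> bool:
        if self.processor.is_blocked(communication):
            return False                  # blocked: nothing is delivered
        for recipient in recipients:
            self.send(recipient, communication)
        return True


def word_overlap(communication: str, reference_items: frozenset[str]) -> float:
    """Fraction of words in the communication that appear in the reference items."""
    words = communication.lower().split()
    if not words:
        return 0.0
    return sum(word in reference_items for word in words) / len(words)
```

A usage example: build the processor from the words of an unwanted message, wrap it in a module with a delivery callback, and call `transmit` for each outgoing communication. Keeping the blocking decision in the processor means the module never needs to know how similarity is measured.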
27. A computer readable storage medium storing instructions for filtering unwanted communications, the instructions when executed by a processor, cause the processor to:
receive a first reference document identified as unwanted;
receive a communication to be routed or published in a communication channel controlled by the computing device;
compare the communication with a first set of the reference data items associated with the first reference document to generate a first similarity index, the first similarity index representing similarity between the communication and the first reference document;
determine whether the communication matches the first reference document based on the first similarity index; and
designate the communication to be blocked from further transmission through the communication channel responsive to determining that the communication matches the first reference document.
Dependent claims: 28
29. A computer-implemented method of filtering unwanted communications, comprising steps for:
receiving a communication transmitted through a communication channel;
determining whether the communication is similar within a predefined tolerance to a reference document identified as unwanted; and
designating the communication to be blocked from access by one or more intended recipients responsive to determining that the communication is similar to the reference document.
Dependent claims: 30
Specification