Method and System for Detection of Authors
First Claim
1. A method for detection of authors across different types of information sources, comprising:
providing a collection of documents from different types of information sources;
obtaining a compression signature for a document;
determining the similarity between compression signatures of two or more documents; and
determining that, if the similarity is greater than a threshold measure, the two or more documents are by the same author.
Abstract
A method and system are provided for detection of authors across different types of information sources such as across documents on the Web. The method includes obtaining a compression signature (303) for a document, and determining the similarity (304) between compression signatures of two or more documents. If the similarity is greater than a threshold measure (305), the two or more documents are considered to be by the same author. Scored pairs of documents are clustered (308) to provide a group of documents by the same author. The group of documents by the same author can be used for user profiling, noise reduction, contribution sizing, detecting fraudulent contributions, obtaining other search results by the same author, or matching a document with undisclosed authorship to a document by a known author.
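The steps in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent text: the compression signature is approximated by the zlib-compressed size of a document, the similarity is taken as one minus the normalized compression distance, pairs above the threshold are grouped by single-linkage clustering (union-find), and the function names `compressed_size`, `similarity`, and `group_by_author` are hypothetical.

```python
import zlib
from itertools import combinations
from typing import Dict, List, Set


def compressed_size(text: str) -> int:
    """Assumed stand-in for a compression signature: zlib-compressed length."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def similarity(a: str, b: str) -> float:
    """1 - normalized compression distance (an assumed similarity measure)."""
    ca, cb = compressed_size(a), compressed_size(b)
    cab = compressed_size(a + b)
    ncd = (cab - min(ca, cb)) / max(ca, cb)
    return 1.0 - ncd


def group_by_author(docs: Dict[str, str], threshold: float) -> List[Set[str]]:
    """Group document names whose pairwise similarity exceeds the threshold."""
    parent = {name: name for name in docs}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Score every pair; merge the groups of pairs that pass the threshold.
    for a, b in combinations(docs, 2):
        if similarity(docs[a], docs[b]) > threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

Calling `group_by_author` with a mapping from document names to their text and a chosen threshold returns sets of names that the sketch treats as sharing an author. A usable threshold would have to be tuned on documents of known authorship, and zlib's 32 KB window limits how well this size-based measure works on long documents.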
20 Claims
1. A method for detection of authors across different types of information sources, comprising:
providing a collection of documents from different types of information sources;
obtaining a compression signature for a document;
determining the similarity between compression signatures of two or more documents; and
determining that, if the similarity is greater than a threshold measure, the two or more documents are by the same author.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer program product stored on a computer readable storage medium for detection of document authors, comprising computer readable program code means for performing the steps of:
providing a collection of documents from different types of information sources;
obtaining a compression signature for a document;
determining the similarity between compression signatures of two or more documents; and
determining that, if the similarity is greater than a threshold measure, the two or more documents are by the same author.
14. A system for detection of authors across different types of information sources, comprising:
a collection of documents from different types of information sources;
means for obtaining a compression signature for a document;
means for determining the similarity between compression signatures of two or more documents; and
means for determining that, if the similarity is greater than a threshold measure, the two or more documents are by the same author.
Dependent claims: 15, 16, 17, 18, 19, 20
Specification