Document similarity detection

  • US 8,650,199 B1
  • Filed: 06/25/2012
  • Issued: 02/11/2014
  • Est. Priority Date: 06/17/2003
  • Status: Active Grant
First Claim

1. A method performed by one or more server devices, the method comprising:

    receiving, using one or more processors associated with the one or more server devices, a document;

    selecting, using one or more processors associated with the one or more server devices, terms from the received document to form a plurality of term groups for the received document, each term group, of the plurality of term groups, being associated with an indication that a first term, of the term group, occurs before a second term, of the term group, within the received document;

    identifying, using one or more processors associated with the one or more server devices and from an inverted index of term groups, one or more clusters of a plurality of clusters, each cluster, of the one or more identified clusters, comprising a set of term groups for a respective other document, each respective term group, of the set of term groups, being associated with an indication that a first term, of the respective term group, occurs before a second term, of the respective term group, within the respective other document;

    determining, using one or more processors associated with the one or more server devices, measures of similarity between the plurality of term groups for the received document and the set of term groups for each of the one or more identified clusters; and

    determining, using one or more processors associated with the one or more server devices and based on the determined measures of similarity, that the received document is similar to the respective other document.
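
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the claim does not say how terms are selected, how large a term group is, or which similarity measure is used. The sketch assumes two-term groups drawn from a sliding window, one cluster per previously indexed document, Jaccard overlap as the measure of similarity, and a fixed threshold. The names `term_groups`, `ClusterIndex`, `window`, and `threshold` are hypothetical.

```python
from collections import defaultdict


def term_groups(text, window=3):
    """Form ordered term pairs (first, second), where `first` occurs before
    `second` in the text and at most `window` positions earlier.
    Assumption: every whitespace-separated token counts as a term."""
    terms = text.lower().split()
    groups = set()
    for i, first in enumerate(terms):
        for second in terms[i + 1 : i + 1 + window]:
            if first != second:
                groups.add((first, second))
    return groups


class ClusterIndex:
    """Inverted index from term groups to clusters.
    Assumption: each cluster holds the term groups of exactly one document."""

    def __init__(self):
        self.clusters = {}                 # doc_id -> set of term groups
        self.inverted = defaultdict(set)   # term group -> doc_ids containing it

    def add(self, doc_id, text):
        groups = term_groups(text)
        self.clusters[doc_id] = groups
        for group in groups:
            self.inverted[group].add(doc_id)

    def similar(self, text, threshold=0.5):
        """Return {doc_id: score} for indexed documents judged similar."""
        groups = term_groups(text)
        # Identify candidate clusters through the inverted index.
        candidates = set()
        for group in groups:
            candidates |= self.inverted.get(group, set())
        # Measure similarity (Jaccard, an assumed choice) against each candidate.
        results = {}
        for doc_id in candidates:
            other = self.clusters[doc_id]
            score = len(groups & other) / len(groups | other)
            if score >= threshold:
                results[doc_id] = score
        return results


index = ClusterIndex()
index.add("doc-a", "the quick brown fox jumps")
print(index.similar("the quick brown fox jumps"))
print(index.similar("jumps fox brown quick the"))
```

Because each term group records which term comes first, the second query, which contains the same words in reverse order, shares no term groups with the indexed document and is not reported as similar.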
