Classifying content

  • US 8,577,866 B1
  • Filed: 12/07/2006
  • Issued: 11/05/2013
  • Est. Priority Date: 12/07/2006
  • Status: Active Grant
First Claim

1. A method performed by data processing apparatus, the method comprising:

  • fragmenting a document into a plurality of content pieces, each of the content pieces representing a number of consecutive words in the document, the document having a date of creation and an author;

    comparing the document's content pieces to a repository of stored content pieces that occur in a first corpus of documents, each document of the first corpus having a date that is earlier than the date of the document and an author that is not the same as the author of the document;

    determining that one or more of the document's content pieces does not match at least one of the stored content pieces;

    in response to determining that one or more of the document's content pieces does not match at least one of the stored content pieces, determining that the one or more content pieces are original content pieces, wherein an original content piece is a content piece that has not occurred in the first corpus of documents;

    determining a rate of occurrence over time of the one or more content pieces in a second corpus of documents, wherein determining the rate of occurrence comprises determining the rate of occurrence over time of the one or more content pieces for each of a plurality of time intervals, the rate of occurrence over time of the one or more content pieces in an interval based on a count of a number of the other documents that both contain the first content piece and are associated with a time within the interval, each document of the second corpus having a date that is later than the date of the document and an author that is not the same as the author of the document;

    comparing the rate of occurrence over time of the one or more content pieces to predefined copying patterns that are each associated with a different document classification;

    selecting a document classification that is associated with a predefined copying pattern that is most consistent with the rate of occurrence over time of the one or more content pieces; and

    associating the selected document classification with the document.
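
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Every name in it (`Doc`, `fragment`, `original_pieces`, `occurrence_rate`, `classify`) is hypothetical, as are the three-word piece size, the seven-day intervals, the integer day numbers used as dates, the example patterns and their labels, and the squared-distance measure standing in for "most consistent". As a simplification, a later document is counted if it contains any of the original pieces.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    author: str
    date: int  # assumption: a day number; any orderable integer timestamp works
    text: str


def fragment(text, n=3):
    """Split text into content pieces of n consecutive words."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def original_pieces(doc, corpus, n=3):
    """Pieces of doc that do not occur in earlier documents by other authors."""
    stored = set()
    for other in corpus:
        if other.date < doc.date and other.author != doc.author:
            stored |= fragment(other.text, n)
    return fragment(doc.text, n) - stored


def occurrence_rate(doc, pieces, corpus, interval=7, n_intervals=4, n=3):
    """Per interval after doc.date, count later documents by other authors
    that contain at least one of the given pieces (a simplification)."""
    counts = Counter()
    for other in corpus:
        if other.date > doc.date and other.author != doc.author:
            if pieces & fragment(other.text, n):
                counts[(other.date - doc.date - 1) // interval] += 1
    return [counts[i] for i in range(n_intervals)]


def classify(rate, patterns):
    """Select the label whose pattern is closest to rate (squared distance,
    an assumed stand-in for 'most consistent')."""
    def distance(pattern):
        return sum((r - p) ** 2 for r, p in zip(rate, pattern))
    return min(patterns, key=lambda label: distance(patterns[label]))


# Hypothetical data: one document under test and a small mixed corpus.
doc = Doc("alice", 10, "the quick brown fox jumps over the lazy dog")
corpus = [
    Doc("bob", 5, "a quick brown fox appeared"),  # earlier, other author
    Doc("carol", 11, "she wrote that the fox jumps over the lazy dog"),
    Doc("dave", 12, "jumps over the fence"),
    Doc("erin", 20, "over the lazy dog again"),
]
# Hypothetical copying patterns, one per classification label.
patterns = {
    "burst": [3, 1, 0, 0],
    "steady": [1, 1, 1, 1],
    "uncopied": [0, 0, 0, 0],
}

pieces = original_pieces(doc, corpus)
rate = occurrence_rate(doc, pieces, corpus)
label = classify(rate, patterns)
print(rate, label)  # [2, 1, 0, 0] burst
```

Here the earlier document by another author removes "quick brown fox" from the set of original pieces. Two later documents reuse original pieces in the first week and one in the second, so the observed rate is closest to the hypothetical "burst" pattern.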

  • 2 Assignments