Cloud-based plagiarism detection system

US 20150186787A1
Filed: 12/30/2013
Published: 07/02/2015
Est. Priority Date: 12/30/2013
Status: Active Grant

Abstract

Plagiarism may be detected, as disclosed herein, utilizing a database that stores documents for one or more courses. The database may restrict sharing of content between documents. A feature extraction module may receive edits and timestamp the edits to the document. A writing pattern for a particular user or group of users may be discerned from the temporal data and the documents for the particular user or group of users. A feature vector may be generated that represents the writing pattern. A machine learning technique may be applied to the feature vector to determine whether or not a document is plagiarized.

13 Claims

1. A system, comprising:
- a database for storing a plurality of documents, each of the plurality of documents associated with at least one user, wherein the database is configured to:
  
  receive a group of documents related to a course;
  
  receive at least one edit to one of the group of documents by at least one user;
  
  store the at least one edit and at least one time reference corresponding to the time during which the at least one edit was made;
  
  wherein sharing of content between each of the plurality of documents is restricted;
  
  a feature extraction module configured to:
  
  obtain a writing history for the at least one user associated with the one of the group of documents;
  
  determine a writing pattern associated with the one of the group of documents based on the writing history for the at least one user, the at least one edit, and at least one time reference; and
  
  generate a feature vector for the writing pattern.
- - 2. The system of claim 1, the feature extraction module further configured to:
    - compare the feature vector to at least one other feature vector to generate a similarity score, wherein the at least one other feature vector corresponds to a second at least one user who is not present in the first at least one user; and
      
      provide an indication of the similarity score.
  - 3. The system of claim 1, the feature extraction module further configured to:
    - train a machine learning technique on a first set of documents that are known to be plagiarized; and
      
      classify the feature vector using the trained machine learning algorithm.
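
Illustrative sketch (not part of the patent text): claims 1 and 2 recite a feature vector derived from timestamped edits and a similarity score between feature vectors, but this record does not say which features or which similarity measure are used. The Python below is a minimal sketch under stated assumptions: the `Edit` record, the four summary features, and the choice of cosine similarity are all hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Edit:
    """One stored edit plus its time reference (hypothetical schema)."""
    timestamp: float    # seconds since the document was created (assumed unit)
    chars_added: int
    chars_deleted: int


def feature_vector(edits):
    """Summarize an edit history as a fixed-length list of floats.

    The four features are illustrative assumptions, not the patent's.
    """
    if not edits:
        return [0.0, 0.0, 0.0, 0.0]
    ordered = sorted(edits, key=lambda e: e.timestamp)
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    total_added = sum(e.chars_added for e in ordered)
    total_deleted = sum(e.chars_deleted for e in ordered)
    largest_insert = max(e.chars_added for e in ordered)
    return [
        float(len(ordered)),                                    # number of edits
        sum(gaps) / len(gaps) if gaps else 0.0,                 # mean gap between edits
        total_deleted / total_added if total_added else 0.0,    # revision ratio
        largest_insert / total_added if total_added else 0.0,   # share from largest insert
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Usage: an incrementally written document versus a single large paste.
incremental = [Edit(0, 40, 0), Edit(60, 35, 10), Edit(120, 50, 5), Edit(180, 25, 10)]
pasted = [Edit(0, 2000, 0)]
score = cosine_similarity(feature_vector(incremental), feature_vector(pasted))
print(f"similarity score: {score:.3f}")
```

The features here are left unscaled, so the mean gap dominates the cosine; a real system would presumably normalize each feature before comparing vectors.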

4. A system, comprising:
- a database for storing a plurality of documents, wherein sharing of the documents is restricted;
  
  a processor connected to the database and configured to:
  
  receive an edit to a document stored in the database;
  
  associate a time reference with the edit to the document;
  
  store the edit and the time reference to the database as a document history;
  
  generate a feature vector based on the document history; and
  
  determine a probability that the document is plagiarized based on a classification of the feature vector by a machine learning technique.
- - 5. The system of claim 4, wherein the probability is based on at least one pairwise comparison of the feature vector for the document to at least one other feature vector for a second document in the database.
  - 6. The system of claim 4, wherein the probability is based on a comparison of the feature history to an independent signal, wherein the independent signal corresponds to other documents generated by an author of the document stored in the database.
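
Illustrative sketch (not part of the patent text): claim 4 recites a probability of plagiarism obtained by classifying the feature vector with a machine learning technique, and claim 3 recites training on documents known to be plagiarized. The claims do not name a particular technique. The Python below assumes logistic regression trained by stochastic gradient descent, with two hypothetical features (revision ratio and largest-insert share) and made-up training data.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def train_logistic(vectors, labels, lr=0.5, epochs=2000):
    """Fit weights and bias with plain stochastic gradient descent on log loss."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def plagiarism_probability(model, vector):
    """Probability (0..1) that the document behind `vector` is plagiarized."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, vector)) + bias)


# Made-up training set: [revision ratio, largest-insert share]; label 1 = known plagiarized.
training_vectors = [
    [0.20, 0.15], [0.25, 0.30], [0.15, 0.25],   # written incrementally
    [0.02, 0.95], [0.00, 1.00], [0.05, 0.85],   # mostly one large paste
]
training_labels = [0, 0, 0, 1, 1, 1]

model = train_logistic(training_vectors, training_labels)
print(f"probability: {plagiarism_probability(model, [0.01, 0.90]):.3f}")
```

Any classifier that outputs a calibrated score could stand in for the logistic model; it is used here only because it yields a probability directly.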

7. A method, comprising:
- receiving a group of documents related to a course;
  
  receiving at least one edit to one of the group of documents by at least one user;
  
  storing the at least one edit and at least one time reference corresponding to the time during which the at least one edit was made;
  
  obtaining a writing history for the at least one user associated with the one of the group of documents;
  
  determining a writing pattern associated with the one of the group of documents based on the writing history for the at least one user, the at least one edit, and at least one time reference; and
  
  generating a feature vector for the writing pattern.
- - 8. The method of claim 7, wherein sharing of content between each document in the group of documents is restricted.
  - 9. The method of claim 7, further comprising:
    - comparing the feature vector to at least one other feature vector to generate a similarity score, wherein the at least one other feature vector corresponds to a second at least one user who is not present in the first at least one user; and
      
      providing an indication of the similarity score.
  - 10. The method of claim 7, further comprising:
    - training a machine learning technique on a first set of documents that are known to be plagiarized; and
      
      classifying the feature vector using the trained machine learning algorithm.

11. A method, comprising:
- receiving an edit to a document stored in a database;
  
  associating a time reference with the edit to the document;
  
  storing the edit and the time reference to the database as a document history;
  
  generating a feature vector based on the document history; and
  
  determining a probability that the document is plagiarized based on a classification of the feature vector by a machine learning technique.
- - 12. The method of claim 11, wherein the probability is based on at least one pairwise comparison of the feature vector for the document to at least one other feature vector for a second document in the database.
  - 13. The method of claim 11, wherein the probability is based on a comparison of the feature history to an independent signal, wherein the independent signal corresponds to other documents generated by an author of the document stored in the database.
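
Illustrative sketch (not part of the patent text): claims 6 and 13 recite comparing a document against an independent signal drawn from other documents by the same author, without saying how the comparison is made. One possible reading, assumed here, is to measure how far the document's feature vector sits from the author's own baseline. The function name, the z-score measure, and the sample numbers below are all hypothetical.

```python
from statistics import mean, pstdev


def deviation_from_baseline(author_history, candidate, min_sigma=1e-6):
    """Mean absolute z-score of `candidate` against the author's earlier feature vectors.

    `author_history` is a list of feature vectors from the same author's other
    documents; larger return values mean the candidate looks less like them.
    """
    columns = list(zip(*author_history))  # one tuple of values per feature
    scores = []
    for values, x in zip(columns, candidate):
        mu = mean(values)
        sigma = max(pstdev(values), min_sigma)  # floor avoids division by zero
        scores.append(abs(x - mu) / sigma)
    return sum(scores) / len(scores)


# Made-up history: [revision ratio, largest-insert share] for three earlier documents.
history = [[0.20, 0.15], [0.25, 0.30], [0.15, 0.25]]
print(deviation_from_baseline(history, [0.20, 0.25]))  # close to the author's habits
print(deviation_from_baseline(history, [0.00, 1.00]))  # far from the author's habits
```

A deviation score like this could be fed to the classifier as an extra feature or thresholded on its own; the claims leave that choice open.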

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google Inc. (Alphabet Inc.)
Inventors
Kumar, Sanjiv; Kernighan, Brian

Granted Patent

US 9,514,417 B2
Field of Search
US Class Current

1/1
CPC Class Codes

G06F 16/93   Document management systems

G06N 20/00   Machine learning

G06Q 10/107   Computer-aided management o...
