CONTENT-BASED REVISION HISTORY TIMELINES
Abstract
A document management system associates content provided within a managed document with a content-based revision history timeline. Multiple documents may be associated with the timeline, wherein each of the documents contains content that is nearly duplicative with respect to content contained in at least one other associated document. Content items can be considered to be nearly duplicative based on an evaluation of resemblance and containment of a set of shingles derived from each content item. If no nearly duplicative content is detected, a new revision history timeline is created. The resulting revision history timelines can be rendered in response to certain user commands, such as document check-out from the document management system, thereby providing users with a visual understanding of how content contained within a given document relates to content contained in other documents managed by the document management system.
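The abstract refers to resemblance and containment of shingle sets without defining them at this point. The following is a minimal sketch, assuming word-level shingles and the common set-based definitions (resemblance as intersection over union, containment as the fraction of one set found in the other). The function names are illustrative and are not taken from the patent.

```python
def shingles(text: str, w: int) -> frozenset:
    """Return S(D, w): the set of contiguous w-word sequences in text."""
    words = text.lower().split()
    # range() is empty when the text has fewer than w words
    return frozenset(tuple(words[i:i + w]) for i in range(len(words) - w + 1))


def resemblance(a: frozenset, b: frozenset) -> float:
    """Assumed definition: |A & B| / |A | B| (Jaccard ratio)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def containment(a: frozenset, b: frozenset) -> float:
    """Assumed definition: fraction of A's shingles that also occur in B."""
    return len(a & b) / len(a) if a else 0.0


long_doc = shingles("a rose is a rose is a rose", 4)
short_doc = shingles("a rose is a rose", 4)
print(resemblance(long_doc, short_doc), containment(short_doc, long_doc))
```

Under these assumed definitions, containment is asymmetric: a short excerpt can be fully contained in a longer document even when the overall resemblance between the two is modest.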
20 Claims
1. A method for tracking content revision history, the method comprising:
receiving a first document D1;
parsing the first document into a first set of shingles based on a shingle size w, the first set of shingles being represented by S(D1, w);
retrieving a second set of shingles corresponding to a second document D2, wherein the second set of shingles, which is also based on the shingle size w, is represented by S(D2, w);
making a determination with respect to whether the first document is nearly duplicative of the second document, wherein the determination is based on a comparison of S(D1, w) and S(D2, w); and
adding a data pair {S(D1, w), T} to a timeline repository, wherein T represents a revision history timeline that includes the first document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
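The steps recited in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the comparison is assumed to be a Jaccard ratio tested against a hypothetical threshold, the timeline repository is modeled as an in-memory list of pairs, and the fallback of creating a new timeline when nothing matches follows the abstract rather than the claim. The names `shingle_set` and `track` are invented for this example.

```python
from typing import List, Tuple

# Hypothetical in-memory timeline repository: a list of (S(D, w), T) pairs.
Repository = List[Tuple[frozenset, str]]


def shingle_set(text: str, w: int) -> frozenset:
    """Parse a document's text into its set of w-word shingles."""
    words = text.split()
    return frozenset(tuple(words[i:i + w]) for i in range(len(words) - w + 1))


def track(text: str, repo: Repository, w: int = 3, threshold: float = 0.5) -> str:
    """Add the pair (S(D1, w), T) to repo and return the timeline label T."""
    s1 = shingle_set(text, w)                    # parse D1 into S(D1, w)
    timeline = None
    for s2, t in repo:                           # retrieve each stored S(D2, w)
        union = s1 | s2
        score = len(s1 & s2) / len(union) if union else 0.0
        if score >= threshold:                   # assumed near-duplicate test
            timeline = t                         # D1 joins D2's timeline
            break
    if timeline is None:                         # no match: start a new timeline
        timeline = f"T{len({t for _, t in repo}) + 1}"
    repo.append((s1, timeline))
    return timeline


repo: Repository = []
print(track("the quick brown fox jumps over the lazy dog", repo))
print(track("the quick brown fox jumps over the sleepy dog", repo))
```

Storing the shingle set rather than the document text means later comparisons need only the repository, which matches the claim's step of retrieving S(D2, w) instead of re-parsing D2.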
11. A system for content revision history tracking, the system comprising:
a timeline repository stored in a memory device, the timeline repository including a plurality of data pairs {S(D, w), T}, wherein D represents a document, T represents a revision history timeline that includes D, and S(D, w) represents a set of shingles that is derived from D and that is based on a shingle size w;
a document administration module configured to receive a first document D1 and a user command with respect to the first document;
a content comparison module configured to evaluate a similarity measure between S(D1, w) and S(D2, w), wherein D2 represents a second document that is retrieved from a document repository; and
a timeline administration module configured to store a data pair {S(D1, w), T12} in the timeline repository in response to determining that the similarity measure exceeds a predetermined threshold similarity, wherein T12 represents a revision history timeline that includes the first and second documents.
Dependent claims: 12, 13, 14, 15, 16.
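One way to picture how the modules recited in claim 11 might fit together is sketched below. All class and method names, the "check-in" command string, the Jaccard similarity measure, and the threshold value of 0.5 are assumptions made for illustration; the claim specifies only the modules' responsibilities.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def shingle_set(text: str, w: int) -> frozenset:
    """S(D, w): the set of contiguous w-word sequences in a document's text."""
    words = text.split()
    return frozenset(tuple(words[i:i + w]) for i in range(len(words) - w + 1))


@dataclass
class TimelineRepository:
    """Holds data pairs (S(D, w), T) in memory."""
    pairs: List[Tuple[frozenset, str]] = field(default_factory=list)


class ContentComparisonModule:
    """Evaluates a similarity measure; Jaccard is an assumed choice."""
    def similarity(self, s1: frozenset, s2: frozenset) -> float:
        union = s1 | s2
        return len(s1 & s2) / len(union) if union else 0.0


@dataclass
class TimelineAdministrationModule:
    repository: TimelineRepository
    threshold: float = 0.5  # hypothetical predetermined threshold

    def store_if_similar(self, s1: frozenset, score: float, timeline: str) -> bool:
        """Store (S(D1, w), T12) only when the score exceeds the threshold."""
        if score > self.threshold:
            self.repository.pairs.append((s1, timeline))
            return True
        return False


@dataclass
class DocumentAdministrationModule:
    document_repository: Dict[str, str]  # document name -> text
    comparison: ContentComparisonModule
    timelines: TimelineAdministrationModule
    w: int = 3

    def receive(self, d1_text: str, command: str, d2_name: str) -> bool:
        """Handle a hypothetical 'check-in' command for D1 against stored D2."""
        if command != "check-in":
            return False
        s1 = shingle_set(d1_text, self.w)
        s2 = shingle_set(self.document_repository[d2_name], self.w)
        score = self.comparison.similarity(s1, s2)
        return self.timelines.store_if_similar(s1, score, timeline="T12")
```

Keeping the comparison in its own module lets the similarity measure be swapped (for example, for a containment-based measure) without touching the repository or the command handling.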
17. A computer program product encoded with instructions that, when executed by one or more processors, causes a process for tracking content revision history to be carried out, the process comprising:
receiving a first document containing first content;
retrieving a second document containing second content, wherein the second document is not an older version of the first document;
making a determination whether the first document is nearly duplicative of the second document based on a comparison of the first and second content; and
where the determination indicates that the first and second documents are nearly duplicative of each other, adding a representation of the first document to a timeline repository that already contains a representation of the second document.
Dependent claims: 18, 19, 20.
Specification