Methods and systems for uniquely identifying digital content for eDiscovery
First Claim
1. A method of enabling changes in website content to be detected, the method comprising:
- receiving an address;
accessing at a first time period, by a computerized system comprising at least one computing device, a web page corresponding to the address;
identifying, by the system, HTML web page text of the web page accessed at the first time period;
identifying, by the system, items of content linked to by the web page accessed at the first time period;
storing the identified HTML web page text accessed at the first time period;
accessing and storing the items of content linked to by the web page accessed at the first time period;
calculating, by the system, a first hash value corresponding to the identified HTML web page text accessed at the first time period, wherein calculating the first hash value does not include in the calculation of the first hash value the items of content linked to by the web page accessed at the first time period;
calculating a first set of hash values for respective items of content linked to by the web page accessed at the first time period;
calculating a first aggregated hash value based on the first hash value and the first set of hash values for respective items of content linked to by the web page accessed at the first time period;
storing the first hash value, the first set of hash values, and the first aggregated hash value in association with a date and time corresponding to the first time period and in association with a first identifier;
accessing, at a second time period, the web page corresponding to the address;
identifying, by the system, HTML web page text of the web page accessed at the second time period;
identifying, by the system, items of content linked to by the web page accessed at the second time period;
storing the identified HTML web page text accessed at the second time period;
accessing and storing the items of content linked to by the web page accessed at the second time period;
calculating, by the system, a second hash value corresponding to the identified HTML web page text accessed at the second time period;
calculating a second set of hash values for respective items of content linked to by the web page accessed at the second time period;
calculating a second aggregated hash value based on the second hash value and the second set of hash values for respective items of content linked to by the web page accessed at the second time period;
storing the second hash value, the second set of hash values, and the second aggregated hash value in association with a date and time corresponding to the second time period;
using the first hash value, corresponding to the identified HTML web page text accessed at the first time period, and the second hash value, corresponding to the identified HTML web page text accessed at the second time period, detecting whether the webpage HTML text has changed, and if the webpage HTML text has changed, providing a first visual indication indicating changes in the webpage HTML text;
using the first set of hash values for respective items of content linked to by the web page accessed at the first time period, and the second set of hash values for respective items of content linked to by the web page accessed at the second time period, detecting if the respective items of content linked to by the web page have changed and providing a second visual indication indicating changes in the content linked to by the web page and indicating which content linked to by the web page has changed;
using the first aggregated hash value, calculated based on the first hash value and the first set of hash values for respective items of content linked to by the web page accessed at the first time period, and the second aggregated hash value, calculated based on the second hash value and the second set of hash values for respective items of content linked to by the web page accessed at the second time period, to detect whether the webpage or respective items of content linked to by the web page have changed, and providing a corresponding third visual indication.
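The hashing and comparison steps of claim 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the claimed implementation: the use of SHA-256, the `Snapshot` structure, the function names, and the way the aggregated hash is formed (hashing the page digest followed by the per-item digests in URL order) are all hypothetical choices that the claim does not specify. Fetching and persistent storage are left out, so the page text and linked items are passed in as in-memory values.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex digest of a byte string (SHA-256 is an assumed algorithm choice)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Snapshot:
    identifier: str        # identifier stored with the capture
    captured_at: datetime  # date and time of the capture
    page_hash: str         # hash of the HTML text only, linked items excluded
    item_hashes: dict      # linked item URL -> hash of that item
    aggregated_hash: str   # hash derived from page_hash and item_hashes


def take_snapshot(identifier, html_text, linked_items, captured_at=None):
    """Hash the page text and each linked item, then derive an aggregate."""
    page_hash = sha256_hex(html_text.encode("utf-8"))
    item_hashes = {url: sha256_hex(body) for url, body in linked_items.items()}
    # Assumed aggregation: feed the page digest, then item digests in URL order.
    aggregate = hashlib.sha256()
    aggregate.update(page_hash.encode("ascii"))
    for url in sorted(item_hashes):
        aggregate.update(item_hashes[url].encode("ascii"))
    when = captured_at or datetime.now(timezone.utc)
    return Snapshot(identifier, when, page_hash, item_hashes, aggregate.hexdigest())


def compare(first, second):
    """Return the three change signals described in the claim."""
    urls = set(first.item_hashes) | set(second.item_hashes)
    changed_items = sorted(
        url for url in urls
        if first.item_hashes.get(url) != second.item_hashes.get(url)
    )
    return {
        "html_changed": first.page_hash != second.page_hash,
        "changed_items": changed_items,
        "anything_changed": first.aggregated_hash != second.aggregated_hash,
    }


# Two captures of the same page: the HTML is identical, one linked image differs.
html = "<html><body><img src='a.png'></body></html>"
first = take_snapshot("capture-1", html, {"a.png": b"old", "s.css": b"x"})
second = take_snapshot("capture-2", html, {"a.png": b"new", "s.css": b"x"})
result = compare(first, second)
print(result)
```

In this sketch the aggregated comparison gives a quick yes/no answer, while the per-item comparison shows which linked item accounts for the difference.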
Abstract
Systems and methods provide for the collection of content, such as webpage content, and for detection of changes in content. Files composing a document at different time periods may be accessed, and sets of hash values corresponding to the files composing the document at the different time periods may be calculated. A determination is made as to whether a file in the identified files at the different time periods is an HTML file, and if so, an additional hash value corresponding to the HTML file is calculated. Aggregated hash values may be calculated based on hash values in the sets of hash values. A report may be generated reporting hash values for the document as it exists at the different time periods, including the hash values for the files composing the document, the additional hash values for respective HTML files, and the aggregated hash values. Changes in hash values may be indicated.
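As an illustration of the kind of report the abstract mentions, the following minimal sketch lists hash values for two time periods side by side and marks the entries whose hashes differ. The report layout, the status labels, the truncated digests, and the function names are assumptions made for this example; the abstract does not describe a concrete format. The sample file contents are placeholders hashed at run time.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 hex digest (the algorithm is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def short(value):
    """Shorten a digest for display; '-' marks a missing entry."""
    return value[:12] if value else "-"


def build_report(label_a, hashes_a, label_b, hashes_b):
    """Return report lines comparing two {name: hash} mappings."""
    lines = [f"{'entry':<20} {label_a:<14} {label_b:<14} status"]
    for name in sorted(set(hashes_a) | set(hashes_b)):
        a = hashes_a.get(name)
        b = hashes_b.get(name)
        if a == b:
            status = "unchanged"
        elif a is None:
            status = "added"
        elif b is None:
            status = "removed"
        else:
            status = "CHANGED"
        lines.append(f"{name:<20} {short(a):<14} {short(b):<14} {status}")
    return lines


# Placeholder contents for two time periods.
period_1 = {
    "index.html": digest(b"<p>hello</p>"),
    "logo.png": digest(b"png-v1"),
}
period_2 = {
    "index.html": digest(b"<p>hello</p>"),
    "logo.png": digest(b"png-v2"),
    "new.css": digest(b"body{}"),
}
report = build_report("period 1", period_1, "period 2", period_2)
print("\n".join(report))
```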
25 Claims
10. A method comprising:
- receiving an address for a document;
identifying, by a computer system comprising at least one computing device, files composing the document, including files linked to by the document, at a first time period;
calculating, by the computer system, a first set of hash values including respective hash values corresponding to the respective accessed files composing the document at the first time period, the first set of hash values including hash values for respective files linked to by the document accessed at the first time period;
determining, by the computer system, if a file in the identified files composing the document at the first time period is an HTML file;
at least partly in response to determining that a file in the identified files composing the document at the first time period is an HTML file, calculating, by the computer system, a first additional hash value corresponding to the HTML file, wherein calculating the first additional hash value does not include in the calculation of the first additional hash value files linked to by the document accessed at the first time period;
calculating, by the computer system, a first aggregated hash value based on hash values in the first set of hash values and on the first additional hash value;
identifying, by the computer system, files composing the document, including files linked to by the document, at a second time period;
calculating, by the computer system, a second set of hash values including respective hash values corresponding to the respective accessed files composing the document at the second time period, wherein calculating the second additional hash value does not include in the calculation of the second additional hash value files linked to by the document accessed at the second time period;
determining, by the computer system, if a file in the identified files composing the document at the second time period is an HTML file;
at least partly in response to determining that a file in the identified files composing the document at the second time period is an HTML file, calculating, by the computer system, a second additional hash value corresponding to the HTML file, wherein the second additional hash value does not include hash values for respective files linked to by the document accessed at the second time period;
calculating, by the computer system, a second aggregated hash value based on hash values in the second set of hash values and on the second additional hash value;
using the first additional hash value, corresponding to the HTML file composing the document at the first time period, and the second additional hash value, corresponding to the HTML file composing the document at the second time period, detecting whether the HTML file has changed, and if the HTML file has changed, providing a first visual indication indicating that the HTML file has changed;
using the first set of hash values, including respective hash values corresponding to the respective accessed files composing the document, including files linked to by the document, at the first time period, and the second set of hash values, including respective hash values corresponding to the respective accessed files composing the document, including files linked to by the document, at the second time period, detecting if the respective accessed files composing the document have changed, and if the respective accessed files composing the document have changed, providing a corresponding second visual indication indicating which accessed files composing the document have changed; and
using the first aggregated hash value, calculated based on hash values in the first set of hash values, the first set of hash values including hash values for respective files linked to by the document accessed at the first time period, and on the first additional hash value, and the second aggregated hash value, calculated based on hash values in the second set of hash values, the second set of hash values including hash values for respective files linked to by the document accessed at the second time period, and on the second additional hash value, to detect whether the document or files composing the document have changed, and providing a corresponding third visual indication.
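The HTML determination and the additional hash in claim 10 might be sketched as below. Several points are assumptions made for illustration, since the claim does not fix them: the file type is guessed from the file name with Python's `mimetypes` module, the additional hash is taken over the HTML file's decoded text with line endings normalized (so it can differ from the raw byte hash held in the set), and the aggregated hash is formed by hashing all digests in name order. The names `is_html_file` and `hash_document` are hypothetical.

```python
import hashlib
import mimetypes


def sha256_hex(data: bytes) -> str:
    """SHA-256 hex digest (an assumed algorithm choice)."""
    return hashlib.sha256(data).hexdigest()


def is_html_file(name: str) -> bool:
    """Guess from the file name whether a file is HTML (assumed heuristic)."""
    mime_type, _ = mimetypes.guess_type(name)
    return mime_type == "text/html"


def hash_document(files):
    """Return (file_hashes, additional_hashes, aggregated_hash).

    `files` maps file names to raw bytes for every file composing the
    document, including files the document links to.
    """
    file_hashes = {name: sha256_hex(body) for name, body in files.items()}

    additional_hashes = {}
    for name, body in files.items():
        if is_html_file(name):
            # Assumption: hash the decoded text with normalized line endings.
            text = body.decode("utf-8", errors="replace").replace("\r\n", "\n")
            additional_hashes[name] = sha256_hex(text.encode("utf-8"))

    # Assumed aggregation over the set of hashes and the additional hashes.
    aggregate = hashlib.sha256()
    for name in sorted(file_hashes):
        aggregate.update(file_hashes[name].encode("ascii"))
    for name in sorted(additional_hashes):
        aggregate.update(additional_hashes[name].encode("ascii"))
    return file_hashes, additional_hashes, aggregate.hexdigest()


# Same document at two time periods; only the HTML line ending differs.
files_t1 = {"index.html": b"<p>hi</p>\r\n", "logo.png": b"\x89PNG-v1"}
files_t2 = {"index.html": b"<p>hi</p>\n", "logo.png": b"\x89PNG-v1"}
hashes_1, extra_1, aggregated_1 = hash_document(files_t1)
hashes_2, extra_2, aggregated_2 = hash_document(files_t2)

print("HTML text changed:", extra_1 != extra_2)
print("files changed:", sorted(n for n in hashes_1 if hashes_1[n] != hashes_2.get(n)))
print("document changed:", aggregated_1 != aggregated_2)
```

Under these assumptions the three comparisons can disagree: here the raw file hash and the aggregated hash differ between periods while the additional HTML hash does not.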
17. A system, comprising:
- a computing system comprising at least one computing device;
a non-transitory computer storage medium having stored thereon executable instructions that direct the computing system to perform operations comprising:
receiving an address for a document;
identifying files composing the document, including files linked to by the document, at a first time period;
calculating, by the computer system, a first set of hash values including respective hash values corresponding to the respective accessed files composing the document at the first time period, the first set of hash values including hash values for respective files linked to by the document accessed at the first time period;
determining, by the computer system, if a file in the identified files composing the document at the first time period is an HTML file;
at least partly in response to determining that a file in the identified files composing the document at the first time period is an HTML file, calculating, by the computer system, a first additional hash value corresponding to the HTML file, wherein calculating the first additional hash value does not include in the calculation of the first additional hash value files linked to by the document accessed at the first time period;
calculating, by the computer system, a first aggregated hash value based on hash values in the first set of hash values and on the first additional hash value;
identifying, by the computer system, files composing the document, including files linked to by the document, at a second time period;
calculating, by the computer system, a second set of hash values including respective hash values corresponding to the respective accessed files composing the document at the second time period, the second set of hash values including hash values for respective files linked to by the document accessed at the second time period;
determining, by the computer system, if a file in the identified files composing the document at the second time period is an HTML file;
at least partly in response to determining that a file in the identified files composing the document at the second time period is an HTML file, calculating, by the computer system, a second additional hash value corresponding to the HTML file, wherein calculating the second additional hash value does not include in the calculation of the second additional hash value files linked to by the document accessed at the second time period;
calculating, by the computer system, a second aggregated hash value based on hash values in the second set of hash values and on the second additional hash value;
using the first additional hash value, corresponding to the HTML file composing the document at the first time period, and the second additional hash value, corresponding to the HTML file composing the document at the second time period, detecting whether the HTML file has changed, and if the HTML file has changed, providing a first visual indication indicating that a change to the HTML file has occurred;
using the first set of hash values, including respective hash values corresponding to the respective accessed files composing the document, including files linked to by the document, at the first time period, and the second set of hash values, including respective hash values corresponding to the respective accessed files composing the document, including files linked to by the document, at the second time period, detecting if the respective accessed files composing the document have changed, and if the respective accessed files composing the document have changed, providing a corresponding second visual indication indicating which accessed files composing the document have changed; and
using the first aggregated hash value, calculated based on hash values in the first set of hash values, the first set of hash values including hash values for respective files linked to by the document accessed at the first time period, and on the first additional hash value, and the second aggregated hash value, calculated based on hash values in the second set of hash values, the second set of hash values including hash values for respective files linked to by the document accessed at the second time period, and on the second additional hash value, to detect whether the document or files composing the document have changed, and providing a corresponding third visual indication.
24. A method comprising:
- receiving an address for a document;
identifying, by a computer system comprising at least one computing device, files composing the document, including files linked to by the document, at a first time period;
calculating, by the computer system, a first hash value based at least on text present in the document at the first time period, wherein calculating the first hash value does not include in the calculation of the first hash value files linked to by the document accessed at the first time period;
calculating, by the computer system, a first set of hash values including respective hash values corresponding to the respective accessed files composing the document at the first time period, the first set of hash values including hash values for respective files linked to by the document accessed at the first time period;
calculating, by the computer system, a first aggregated hash value based on hash values in the first set of hash values and the first hash value;
calculating, by the computer system, a second hash value based at least on text present in the document at a second time period, wherein calculating the second hash value does not include in the calculation of the second hash value files linked to by the document accessed at the second time period;
calculating, by the computer system, a second set of hash values including respective hash values corresponding to the respective accessed files composing the document at the second time period, the second set of hash values including hash values for respective files linked to by the document accessed at the second time period;
calculating, by the computer system, a second aggregated hash value based on hash values in the second set of hash values and the second hash value;
using the first hash value, based at least on text present in the document at the first time period, and the second hash value, based at least on text present in the document at a second time period, detecting whether the document text has changed, and if the document text has changed, providing a first visual indication indicating that a change to the document text has occurred;
using the first set of hash values, including respective hash values corresponding to the respective accessed files composing the document, including files linked to by the document, at the first time period, and the second set of hash values, including respective hash values corresponding to the respective accessed files composing the document, including files linked to by the document, at the second time period, detecting if the respective accessed files composing the document have changed, and if the respective accessed files composing the document have changed, providing a corresponding second visual indication indicating which accessed files composing the document have changed; and
using the first aggregated hash value, calculated based on hash values in the first set of hash values and the first hash value, the first set of hash values including hash values for respective files linked to by the document accessed at the first time period, and the second aggregated hash value, calculated based on hash values in the second set of hash values, the second set of hash values including hash values for respective files linked to by the document accessed at the second time period, and the second hash value, to detect whether the document or files composing the document have changed, and providing a corresponding third visual indication.
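Claim 24 bases the first and second hash values "at least on text present in the document". One possible reading, sketched here purely as an assumption, is to extract the visible text from the HTML and hash only that text, so that a change to a linked file reference leaves the text hash untouched. The `TextExtractor` class, the `text_hash` function, the decision to skip `script` and `style` content, and the use of SHA-256 are hypothetical choices for this example.

```python
import hashlib
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def text_hash(html: str) -> str:
    """Hash only the text present in the document, not its markup or links."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


before = "<html><body><h1>Terms</h1><img src='logo-v1.png'><p>Fee: $10</p></body></html>"
new_image = "<html><body><h1>Terms</h1><img src='logo-v2.png'><p>Fee: $10</p></body></html>"
new_text = "<html><body><h1>Terms</h1><img src='logo-v1.png'><p>Fee: $12</p></body></html>"

print("image reference changed, text hash equal:", text_hash(before) == text_hash(new_image))
print("text changed, text hash equal:", text_hash(before) == text_hash(new_text))
```

With this reading, a swapped image would surface through the set of file hashes and the aggregated hash rather than through the text hash.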
Specification