Change-detection tool indicating degree and location of change of internet documents by comparison of cyclic-redundancy-check (CRC) signatures
First Claim
1. A change-detection web server comprising:
a network connection for transmitting and receiving packets from a remote client and a remote document server;
a responder, coupled to the network connection, for communicating with the remote client, the responder registering a document for change detection by receiving from the remote client a uniform-resource-locator (URL) identifying the document, the responder fetching the document from the remote document server and generating an original checksum for a checked portion of the document, the checked portion being less than the entire document;
a database, coupled to the responder, for receiving the URL and the original checksum from the responder when the document is registered by the remote client, the database for storing a plurality of records each containing a URL and a checksum for a registered document;
a periodic minder, coupled to the database and the network connection, for periodically re-fetching the document from the remote document server by transmitting the URL from the database to the network connection, the periodic minder receiving a fresh copy of the document from the remote document server, the periodic minder generating a fresh checksum of a portion of the fresh copy of the document and comparing the fresh checksum to the original checksum, the periodic minder signaling a detected change to the remote client when the fresh checksum does not match the original checksum;
whereby a change in the document is detected by comparing a checksum for the checked portion of the document, wherein changes in portions of the document outside the checked portion are not signaled to the remote client.
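The interplay of responder, database, and periodic minder in the claim above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the class name `ChangeDetector`, the injected `fetch` callable (standing in for the network connection), and the use of begin/end marker strings to locate the checked portion are all assumptions introduced here. The claim itself only requires that a URL and a checksum be stored per registered document.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    url: str
    start_marker: str  # assumption: marker strings locate the checked portion
    end_marker: str
    checksum: int      # checksum of the checked portion only


def checked_portion(document: str, start_marker: str, end_marker: str) -> str:
    """Return the text between the markers, or "" if either marker is missing."""
    start = document.find(start_marker)
    if start == -1:
        return ""
    start += len(start_marker)
    end = document.find(end_marker, start)
    if end == -1:
        return ""
    return document[start:end]


def crc(text: str) -> int:
    return zlib.crc32(text.encode("utf-8"))


class ChangeDetector:
    """Hypothetical stand-in for the responder, database, and periodic minder."""

    def __init__(self, fetch: Callable[[str], str]):
        self.fetch = fetch                     # stands in for the network connection
        self.records: Dict[str, Record] = {}   # stands in for the database

    def register(self, url: str, start_marker: str, end_marker: str) -> None:
        # Responder: fetch the document and store a checksum, not the document.
        document = self.fetch(url)
        portion = checked_portion(document, start_marker, end_marker)
        self.records[url] = Record(url, start_marker, end_marker, crc(portion))

    def check_all(self) -> List[str]:
        # Periodic minder: re-fetch each document and compare checksums.
        changed = []
        for record in self.records.values():
            fresh = self.fetch(record.url)
            portion = checked_portion(fresh, record.start_marker, record.end_marker)
            if crc(portion) != record.checksum:
                changed.append(record.url)
        return changed
```

In this sketch an edit outside the markers leaves the stored checksum matching, so `check_all` reports nothing; only an edit between the markers puts the URL in the returned list.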
Abstract
A change-detection web server automatically checks web-page documents for recent changes. The server retrieves and compares documents one or more times a week. The user is notified by electronic mail when a change is detected. The user registers a web-page document by submitting his e-mail address and the uniform-resource locator (URL) of the desired document. The document is fetched and the user can select text on the page of interest. Non-selected text is ignored; only changes in the selected text are reported back to the user. Thus changes to less relevant parts of the document are ignored. The document is divided into sections bounded by hyper-text markup-language (HTML) tags. A checksum is generated and stored for each HTML-bound section. Storage requirements are reduced since only checksums are stored rather than the original documents. During periodic comparisons a fresh copy of the document is retrieved and divided into HTML-bound sections, and checksums are generated for each section. The freshly-generated checksums are compared to the archived checksums. Sections with non-matching checksums are highlighted as changed, and the percentage of changed sections is reported. The user-defined selection is also stored as a checksum and compared to a freshly-generated checksum. Changed checksums outside the user-defined selection do not generate a change notification. Re-ordering of sections does not generate a change notification when the checksums otherwise match. Thus format and layout changes do not generate change notifications, and the frequency of notices to the user is reduced.
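The section-by-section comparison described in the abstract can be illustrated with a short sketch. It assumes a simple regular expression is enough to split a page at its HTML tags and uses `zlib.crc32` as the CRC; the function names are hypothetical, and the abstract does not specify how sections are matched beyond comparing checksums. Treating the checksums as a multiset is one way to make re-ordering harmless.

```python
import re
import zlib
from collections import Counter
from typing import List

TAG = re.compile(r"<[^>]*>")  # assumption: a naive tag pattern is sufficient here


def section_checksums(html: str) -> List[int]:
    """Split the page at HTML tags and return one CRC per non-empty section."""
    sections = (part.strip() for part in TAG.split(html))
    return [zlib.crc32(s.encode("utf-8")) for s in sections if s]


def changed_sections(archived: List[int], fresh: List[int]) -> List[int]:
    """Return archived checksums with no matching fresh checksum.

    Matching ignores position, so re-ordered sections are not reported.
    """
    remaining = Counter(fresh)
    changed = []
    for checksum in archived:
        if remaining[checksum] > 0:
            remaining[checksum] -= 1
        else:
            changed.append(checksum)
    return changed


def percent_changed(archived: List[int], fresh: List[int]) -> float:
    """Percentage of archived sections that no longer have a match."""
    if not archived:
        return 0.0
    return 100.0 * len(changed_sections(archived, fresh)) / len(archived)
```

With a four-section page, moving a paragraph gives 0.0 percent changed, while editing the text of one paragraph gives 25.0 percent.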
506 Citations
19 Claims
11. A computer-implemented method for detecting recent changes in a document and notifying a user of the recent changes, the method comprising the steps of:
registering the document by receiving an address of the user and a locator for the document;
fetching the document from a remote server by transmitting the locator to a network server;
determining when the document is a hyper-text markup-language (HTML) document;
when the document is an HTML document:
dividing the document into sections, each section beginning and ending with an HTML tag, the HTML tag not directly visible to a user viewing the document on a browser;
generating a cyclical-redundancy-checksum (CRC) for each section of the document;
storing the CRC generated for each section of the document in a database together with the locator of the document and the address of the user;
after a period of time:
reading the locator from the database and transmitting the locator to the remote server to fetch a recent copy of the document;
when the document is an HTML document:
dividing the recent copy of the document into sections, each section beginning and ending with an HTML tag;
generating a recent cyclical-redundancy-checksum (CRC) for each section of the recent copy of the document;
reading the CRC's from the database and comparing the CRC's to the recent CRC's to determine which CRC's from the database do not have a matching recent CRC;
signaling that a change is detected when a CRC from the database does not have a matching recent CRC;
whereby the document is not stored in the database which stored CRC's for HTML-bound sections of HTML documents.
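The storage side of the method above, where only CRCs, the locator, and the user's address reach the database, might look roughly like this. The sketch makes several assumptions that are not in the claim: an in-memory SQLite table named `crcs`, a crude `looks_like_html` test, a naive tag-splitting regular expression, and hypothetical function names. Only the HTML branch is sketched; non-HTML documents are simply not registered here.

```python
import re
import sqlite3
import zlib
from typing import List

TAG = re.compile(r"<[^>]*>")  # assumption: naive tag pattern


def looks_like_html(document: str) -> bool:
    # Assumption: a crude test; the claim does not say how HTML is recognized.
    return "<html" in document.lower()


def section_crcs(document: str) -> List[int]:
    """One CRC per non-empty section bounded by HTML tags."""
    parts = (p.strip() for p in TAG.split(document))
    return [zlib.crc32(p.encode("utf-8")) for p in parts if p]


def open_database() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    # The table holds CRCs, locator, and address; the document is never stored.
    db.execute("CREATE TABLE crcs (locator TEXT, address TEXT, crc INTEGER)")
    return db


def register(db: sqlite3.Connection, address: str, locator: str, document: str) -> bool:
    """Store per-section CRCs for an HTML document; return False otherwise."""
    if not looks_like_html(document):
        return False
    rows = [(locator, address, c) for c in section_crcs(document)]
    db.executemany("INSERT INTO crcs VALUES (?, ?, ?)", rows)
    return True


def unmatched_crcs(db: sqlite3.Connection, locator: str, recent_document: str) -> List[int]:
    """Return stored CRCs that have no matching CRC in the recent copy."""
    recent = set(section_crcs(recent_document))
    cursor = db.execute("SELECT crc FROM crcs WHERE locator = ?", (locator,))
    return [crc for (crc,) in cursor if crc not in recent]
```

A non-empty result from `unmatched_crcs` corresponds to the "signaling that a change is detected" step; how the user is then notified is outside this sketch.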
18. A computer-program product comprising:
a computer-usable medium having computer-readable program code means embodied therein for detecting changes in a document, the computer-readable program code means in the computer-program product comprising:
network connection means for transmitting and receiving packets from a remote client and a remote document server;
responder means, coupled to the network connection means, for communicating with the remote client, the responder means registering documents for change detection by receiving from the remote client a uniform-resource-locator (URL) identifying the document, the responder means fetching the document from the remote document server and generating an original checksum for a checked portion of the document, the checked portion being less than an entire document;
database means, coupled to the responder means, for receiving the URL and the original checksum from the responder means when the document is registered by the remote client, the database means for storing a plurality of records each containing a URL and a checksum for a registered document, the database means not storing the document or the registered documents, the database means storing a checksum for the document;
periodic minder means, coupled to the database means and the network connection means, for periodically re-fetching the document from the remote document server by transmitting the URL from the database means to the network connection means, the periodic minder means receiving a fresh copy of the document from the remote document server, the periodic minder means generating a fresh checksum of a portion of the fresh copy of the document and comparing the fresh checksum to the original checksum, the periodic minder means signaling a detected change to the remote client when the fresh checksum does not match the original checksum;
whereby a change in the document is detected by comparing a checksum for the checked portion of the document, wherein changes in portions of the document outside the checked portion are not signaled to the remote client and whereby storage requirements for the database means are reduced by archiving checksums and not entire documents.
Specification