Document capture using client-based delta encoding with server
Abstract
When different client devices request the same document, most of the content of the response from the server (i.e., the response document from a web server) will be the same. Embodiments allow the client devices to use fingerprints, i.e., hashes, sent by a capture system to pinpoint only the changing portions of the document instead of sending the entire document. In various embodiments, the client compares client-generated fingerprints for the document with fingerprints that the capture system generated for the portions of text most likely to appear in the document or in related documents. The client then sends the capture system a compact and efficient representation that still fully represents the client document.
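
The comparison described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: portions are individual lines, fingerprints are SHA-256 digests, and the names `fingerprint` and `encode_delta` and the `("ref", ...)` / `("raw", ...)` encoding are hypothetical.

```python
import hashlib


def fingerprint(portion: str) -> str:
    """Hash one portion of text (SHA-256 is an assumption, not the patent's choice)."""
    return hashlib.sha256(portion.encode("utf-8")).hexdigest()


def encode_delta(document: str, known_fingerprints: set[str]) -> list[tuple[str, str]]:
    """Represent a document as references to known portions plus literal changes.

    Portions are lines here (an assumption). A portion whose fingerprint the
    capture system already holds becomes ("ref", fingerprint); any other
    portion is carried verbatim as ("raw", text).
    """
    encoded = []
    for portion in document.splitlines():
        digest = fingerprint(portion)
        if digest in known_fingerprints:
            encoded.append(("ref", digest))
        else:
            encoded.append(("raw", portion))
    return encoded


# Text the capture system has already seen for this page (hypothetical sample).
template = "<html>\n<body>\n<h1>Account</h1>\n</body>\n</html>"
server_fingerprints = {fingerprint(line) for line in template.splitlines()}

# The client's copy differs only in one user-specific line.
client_doc = "<html>\n<body>\n<h1>Account</h1>\n<p>Balance: 42</p>\n</body>\n</html>"
delta = encode_delta(client_doc, server_fingerprints)
raw_parts = [text for kind, text in delta if kind == "raw"]
print(raw_parts)  # ['<p>Balance: 42</p>']
```

Only the one changed line travels as literal text. With portions this short a full 64-character digest is longer than the text it replaces, so a practical encoding would presumably use larger portions or shorter identifiers; the abstract does not say which.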
19 Claims
1. A computer-implemented method, comprising:
accessing, by a client device, from a first computer system, a first electronic document identified by a uniform resource indicator (URI);
generating, by the client device, a first set of hashed data portions corresponding to the first electronic document, wherein each of the first set of hashed data portions corresponds to a different portion of data in the first electronic document;
sending, by the client device, to a second computer system, a request for one or more hashed data portions corresponding to the first electronic document, the request including information identifying the first electronic document;
receiving, by the client device, from the second computer system, a second set of hashed data portions responsive to the request, wherein each of the second set of hashed data portions is generated based on a different portion of data in a second electronic document, wherein each of the different data portions satisfies a threshold frequency for appearing in the second electronic document, and wherein the second set of hashed data portions are identified as responsive to the request based on determining that the second electronic document matches the first electronic document identified by the information in the request;
comparing the first set of hashed data portions to the second set of hashed data portions;
identifying, based on the comparing, one or more data portions of the first electronic document that are different from the second electronic document; and
sending, to the second computer system, the one or more identified data portions of the first electronic document as updates to the second electronic document, wherein the second computer system associates the one or more identified data portions as updates to the second electronic document to construct the first electronic document.
Dependent claims: 2-15.
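
The sequence of steps in claim 1 can be walked through with a small in-memory sketch. The `CaptureStub` class, its `request_hashes` and `submit_updates` methods, the line-based portions, the SHA-256 hashing, and the example URI are all hypothetical; the claim does not specify a transport, a hash function, or a message format. The stub also ignores the threshold-frequency condition and matches documents by URI alone.

```python
import hashlib


def h(portion: str) -> str:
    """Hash one portion (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(portion.encode("utf-8")).hexdigest()


class CaptureStub:
    """Hypothetical in-memory stand-in for the 'second computer system'."""

    def __init__(self) -> None:
        self.stored: dict[str, list[str]] = {}               # URI -> stored portions
        self.updates: dict[str, list[tuple[int, str]]] = {}  # URI -> updates received

    def request_hashes(self, uri: str) -> set[str]:
        # Documents are matched by URI only; an unknown URI yields no hashes.
        return {h(p) for p in self.stored.get(uri, [])}

    def submit_updates(self, uri: str, changed: list[tuple[int, str]]) -> None:
        self.updates[uri] = changed


def capture(uri: str, document: str, server: CaptureStub) -> list[tuple[int, str]]:
    portions = document.splitlines()
    first_set = [h(p) for p in portions]          # generate the first set of hashes
    second_set = server.request_hashes(uri)       # send request, receive second set
    changed = [(i, p) for i, (p, d) in enumerate(zip(portions, first_set))
               if d not in second_set]            # compare and identify differences
    server.submit_updates(uri, changed)           # send only the differing portions
    return changed


server = CaptureStub()
server.stored["https://example.com/cart"] = ["<ul>", "<li>Item A</li>", "</ul>"]
sent = capture("https://example.com/cart",
               "<ul>\n<li>Item A</li>\n<li>Item B</li>\n</ul>", server)
print(sent)  # [(2, '<li>Item B</li>')]
```

Sending the position alongside each changed portion is a design choice of this sketch so that the receiver could place the update; the claim only requires that the identified portions be sent as updates.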

16. A system, comprising:
one or more processors;
a memory accessible to the one or more processors, the memory comprising instructions that, when executed by the one or more processors, causes the one or more processors to:
access, from a first computer system, a first electronic document identified by a uniform resource indicator (URI);
generate a first set of hashed data portions corresponding to the first electronic document, wherein each of the first set of hashed data portions corresponds to a different portion of data in the first electronic document;
send, to a second computer system, a request for one or more hashed data portions corresponding to the first electronic document, the request including information identifying the first electronic document;
receive, from the second computer system, a second set of hashed data portions responsive to the request, wherein each of the second set of hashed data portions is generated based on a different portion of data in a second electronic document, wherein each of the different data portions satisfies a threshold frequency for appearing in the second electronic document, and wherein the second set of hashed data portions are identified as responsive to the request based on determining that the second electronic document matches the first electronic document identified by the information in the request;
compare the first set of hashed data portions to the second set of hashed data portions;
identify, based on the comparing, one or more data portions of the first electronic document that are different from the second electronic document; and
send, to the second computer system, the one or more identified data portions of the first electronic document as updates to the second electronic document, wherein the second computer system associates the one or more identified data portions as updates to the second electronic document to construct the first electronic document.
Dependent claims: 17.

18. A computer-implemented method, comprising:
determining, by a computer system, identification data for a first electronic document stored in association with a uniform resource indicator (URI);
identifying, by the computer system, a set of data portions in the first electronic document, wherein each of the set of data portions satisfies a threshold frequency for appearing in the first electronic document;
generating, by the computer system, a set of hashed data portions for the identified set of data portions in the first electronic document, wherein each of the set of hashed data portions are generated based on a different portion of the identified set of data portions;
receiving, by the computer system, from a client device, a request for one or more hashed data portions corresponding to a second electronic document, the request including information identifying the second electronic document;
determining that the second electronic document matches the first electronic document based on the information identifying the second electronic document matching the identification data for the first electronic document;
responsive to determining that the second electronic document matches the first electronic document, sending the set of hashed data portions to the client device;
receiving, by the computer system, from the client device, one or more data portions of the second electronic document that are different from the first electronic document, wherein the one or more data portions are identified using the set of hashed data portions sent to the client device; and
constructing, by the computer system, a third electronic document based on the one or more data portions of the second electronic document and the identified set of data portions, wherein the third electronic document is constructed as an update to the first electronic document having as at least a portion of the first electronic document and the one or more data portions of the second electronic document that are received from the client device.
Dependent claims: 19.
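
The server-side steps of claim 18 (selecting frequently appearing portions, hashing them, and constructing a document from what the client returns) might look like the sketch below. It rests on several assumptions that the claim does not state: "threshold frequency" is interpreted as the fraction of previously captured copies that contain a given line, portions are lines, hashes are SHA-256 digests, and the client's reply is a hypothetical list of `("ref", fingerprint)` and `("raw", text)` entries. The names `frequent_portions` and `construct` are likewise illustrative.

```python
import hashlib
from collections import Counter


def h(portion: str) -> str:
    """Hash one portion (SHA-256 chosen only for illustration)."""
    return hashlib.sha256(portion.encode("utf-8")).hexdigest()


def frequent_portions(copies: list[str], threshold: float) -> dict[str, str]:
    """Map fingerprint -> line for lines present in at least `threshold` of the copies."""
    if not copies:
        return {}
    counts: Counter[str] = Counter()
    for copy in copies:
        counts.update(set(copy.splitlines()))  # count each line once per copy
    return {h(line): line for line, n in counts.items()
            if n / len(copies) >= threshold}


def construct(encoded: list[tuple[str, str]], known: dict[str, str]) -> str:
    """Rebuild a document from ("ref", fingerprint) and ("raw", text) entries."""
    return "\n".join(known[value] if kind == "ref" else value
                     for kind, value in encoded)


# Two earlier captures of the same URI (hypothetical sample data).
copies = [
    "<nav>Home</nav>\n<p>Hello Ann</p>\n<footer>(c)</footer>",
    "<nav>Home</nav>\n<p>Hello Bob</p>\n<footer>(c)</footer>",
]
known = frequent_portions(copies, threshold=1.0)

# What a client might send back for a third visitor's copy of the page.
encoded = [
    ("ref", h("<nav>Home</nav>")),
    ("raw", "<p>Hello Cy</p>"),
    ("ref", h("<footer>(c)</footer>")),
]
rebuilt = construct(encoded, known)
print(rebuilt)
```

With a threshold of 1.0 only the navigation and footer lines qualify, because the greeting differs between captures. The rebuilt text combines those stored portions with the one portion received from the client, which corresponds loosely to the "third electronic document" of the final step.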
Specification