Method and system for collecting user profile information over the world-wide web in the presence of dynamic content using document comparators
Abstract
Disclosed is a method and system for collecting profile information about users accessing dynamically generated content from one or more servers. In a specific embodiment, a server dynamically generates a web page in response to a user request. The server customizes the web page content based on the requested universal resource identifier (URI) and one or more of: the user's identity, access permissions, demographic information, and previous behavior at the site. The web server then passes the URI, user identity, and dynamically generated web page to an access information collector. The access information collector generates document comparators from the current web page content and compares them to document comparators associated with previously retrieved web pages. If the current web page is sufficiently similar to some previously retrieved web page, the access information collector logs the URI, user identity, and a document key associated with the matching previously retrieved page. Otherwise, the access information collector generates a new key; stores the new key and the document comparators in a database; and logs the URI, user identity, and the newly generated document key.
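The collector flow described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the word-set comparator, the Jaccard distance used as the difference measure, the threshold value of 0.25, the in-memory dictionaries standing in for the database and log, and the `doc-N` key format are all hypothetical choices.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class AccessInformationCollector:
    """Hypothetical collector following the flow in the abstract."""

    threshold: float = 0.25  # assumed Document Difference Threshold
    # Document key -> stored Document Comparator for previously retrieved pages.
    comparators: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    # Log File Entries: (user identity, URI, document key).
    log: List[Tuple[str, str, str]] = field(default_factory=list)
    _counter: itertools.count = field(default_factory=lambda: itertools.count(1))

    @staticmethod
    def make_comparator(content: str) -> FrozenSet[str]:
        # Assumed comparator: the set of lower-cased whitespace-separated words.
        return frozenset(content.lower().split())

    @staticmethod
    def difference(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        # Jaccard distance: 0.0 for identical word sets, 1.0 for disjoint ones.
        union = a | b
        return 1.0 - len(a & b) / len(union) if union else 0.0

    def collect(self, user: str, uri: str, content: str) -> str:
        comparator = self.make_comparator(content)
        best_key: Optional[str] = None
        best_diff = float("inf")
        # Compare against every previously stored comparator, keeping the closest.
        for key, stored in self.comparators.items():
            diff = self.difference(comparator, stored)
            if diff < best_diff:
                best_key, best_diff = key, diff
        if best_key is None or best_diff > self.threshold:
            # No sufficiently similar page: generate and store a new key.
            best_key = f"doc-{next(self._counter)}"
            self.comparators[best_key] = comparator
        self.log.append((user, uri, best_key))
        return best_key
```

In this sketch, two personalized pages that differ only in the greeting name share one document key, so the log records both users as having seen the same logical page even though the bytes differ.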
27 Claims
1. A method of collecting information about document retrievals over the World-Wide Web, comprising the steps of:
receiving a requesting user identity, requested Universal Resource Identifier (URI), and a content of a retrieved document;
selecting a Candidate Document from a Retrieved Document Database, said Candidate Document associated with a Candidate Document Key;
comparing said retrieved document to said Candidate Document to determine a sufficiency of said Candidate Document;
associating said retrieved document with a newly generated Retrieved Document Key if said Candidate Document is not deemed to be sufficient;
adding said retrieved document to said Retrieved Document Database; and
adding a Log File Entry including said requesting user identity, said requested URI, and said Retrieved Document Key.
computing said first Document Comparator for said retrieved document;
retrieving said second Document Comparator for said Candidate Document;
computing with said Document Comparator Function a numeric measure of a difference between said first Document Comparator and said second Document Comparator; and
comparing said numeric measure against a predefined Document Difference Threshold.
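The four comparison steps above (compute the first comparator, retrieve the second, compute a numeric difference, test it against a threshold) can be sketched as a function with pluggable comparator and difference functions. The function names, the character-level `difflib` measure used in the example, and the choice of `<=` for "within threshold" are illustrative assumptions; the claims do not specify any particular Document Comparator Function.

```python
import difflib
from typing import Callable, Tuple, TypeVar

C = TypeVar("C")  # type of a Document Comparator (hypothetical)


def candidate_is_sufficient(
    retrieved_content: str,
    candidate_comparator: C,
    compute_comparator: Callable[[str], C],
    comparator_function: Callable[[C, C], float],
    threshold: float,
) -> Tuple[bool, float]:
    # Step 1: compute the first Document Comparator for the retrieved document.
    first = compute_comparator(retrieved_content)
    # Step 2: the second Document Comparator (the candidate's) is passed in,
    # standing in for a lookup in a comparator store.
    second = candidate_comparator
    # Step 3: numeric measure of the difference between the two comparators.
    measure = comparator_function(first, second)
    # Step 4: compare the measure against the Document Difference Threshold.
    return measure <= threshold, measure


def char_difference(a: str, b: str) -> float:
    # Assumed Document Comparator Function: 0.0 for identical strings,
    # approaching 1.0 as the strings diverge.
    return 1.0 - difflib.SequenceMatcher(None, a, b).ratio()
```

Here the comparator is simply the page text (`compute_comparator=str`), so "Hello Alice, 3 new messages" and "Hello Bob, 3 new messages" differ by roughly 0.15 and pass a threshold of 0.2.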
4. The method of claim 2, wherein each said Document Comparator comprises content of said each of a plurality of documents associated therewith.
5. The method of claim 4, wherein a URI for said Candidate Document is equal to a URI for said retrieved document.
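Claims 4 and 5 describe a comparator that is the document content itself, with the Candidate Document restricted to documents retrieved under the same URI. A minimal sketch under assumptions: an in-memory dictionary keyed by URI stands in for the Retrieved Document Database, and a line-level `difflib` ratio is one hypothetical choice of difference measure.

```python
import difflib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ContentComparatorStore:
    """Hypothetical store where each comparator is the document's own content
    and candidates come only from documents stored under the same URI."""

    def __init__(self) -> None:
        self._by_uri: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add(self, uri: str, key: str, content: str) -> None:
        self._by_uri[uri].append((key, content))

    def candidates(self, uri: str) -> List[Tuple[str, str]]:
        # Only documents stored under exactly this URI are considered.
        return list(self._by_uri.get(uri, []))

    def best_match(self, uri: str, content: str, threshold: float) -> Optional[str]:
        new_lines = content.splitlines()
        best_key: Optional[str] = None
        best_diff = threshold
        for key, stored in self.candidates(uri):
            # Line-level difference: 0.0 when every line matches.
            matcher = difflib.SequenceMatcher(None, new_lines, stored.splitlines())
            diff = 1.0 - matcher.ratio()
            if diff <= best_diff:
                best_key, best_diff = key, diff
        return best_key
```

Restricting candidates by URI keeps the comparison set small, at the cost of never matching identical content served under a different URI.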
6. The method of claim 2, wherein each said Document Comparator is computed by associating predefined portions of said each of a plurality of documents to a binary token.
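One possible reading of claim 6 is that each predefined portion of a document is mapped to a fixed-width binary token, and the comparator is the sequence of those tokens. The sketch below assumes blank-line-separated blocks as the "predefined portions", an 8-byte BLAKE2b digest as the binary token, and the fraction of mismatched positions as the difference measure; none of these choices comes from the claim itself.

```python
import hashlib
from typing import List, Tuple


def portions(content: str) -> List[str]:
    # Assumed portion boundary: blank-line-separated blocks.
    return [p.strip() for p in content.split("\n\n") if p.strip()]


def binary_token(portion: str) -> bytes:
    # Map a portion to a fixed-width binary token (8-byte BLAKE2b digest).
    return hashlib.blake2b(portion.encode("utf-8"), digest_size=8).digest()


def token_comparator(content: str) -> Tuple[bytes, ...]:
    return tuple(binary_token(p) for p in portions(content))


def token_difference(a: Tuple[bytes, ...], b: Tuple[bytes, ...]) -> float:
    # Fraction of positions whose tokens differ; extra portions count as differences.
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 1.0 - same / longest
```

Storing short digests instead of full text keeps each comparator compact; two pages that differ only in a personalized greeting block differ in one token out of three.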
7. The method of claim 2, wherein each said Document Comparator comprises a list of significant words or phrases in said each of a plurality of documents.
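For claim 7, a list of significant words can be approximated by dropping stop words and short tokens and keeping the most frequent remaining words. The stop-word list, the minimum word length of three characters, the `top_n` cutoff, and the Jaccard distance used to compare two lists are hypothetical choices for illustration.

```python
import re
from collections import Counter
from typing import FrozenSet

# Hypothetical stop-word list; a real deployment would use a fuller one.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "for", "on", "is", "your"})


def significant_words(content: str, top_n: int = 10) -> FrozenSet[str]:
    # Tokenize, drop stop words and very short words, keep the most frequent.
    words = re.findall(r"[a-z0-9']+", content.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return frozenset(w for w, _ in counts.most_common(top_n))


def word_list_difference(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    # Jaccard distance between two significant-word lists.
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0
```

Because only significant words are kept, boilerplate such as articles and prepositions does not inflate the similarity between unrelated pages.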
8. The method of claim 2, wherein each said Document Comparator comprises a Comparator for each of a plurality of predefined sections of said each of a plurality of documents.
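Claim 8 keeps a separate comparator for each predefined section of a document. The sketch below assumes sections are delimited by HTML comments of the form `<!-- section:NAME -->`, uses a SHA-256 digest as each section's comparator, and weights sections so that known dynamic areas (here a hypothetical greeting and ad slot) can be ignored. The marker syntax and weights are illustrative assumptions only.

```python
import hashlib
import re
from typing import Dict, Mapping

# Assumed section marker convention; the claim only says "predefined sections".
SECTION_MARKER = re.compile(r"<!-- section:(\w+) -->")


def section_comparators(html: str) -> Dict[str, str]:
    # re.split with one capture group yields [prefix, name1, body1, name2, body2, ...].
    parts = SECTION_MARKER.split(html)
    comparators = {}
    for name, body in zip(parts[1::2], parts[2::2]):
        comparators[name] = hashlib.sha256(body.strip().encode("utf-8")).hexdigest()
    return comparators


def section_difference(
    a: Mapping[str, str], b: Mapping[str, str], weights: Mapping[str, float]
) -> float:
    # Weighted fraction of sections whose comparators differ (default weight 1.0).
    names = set(a) | set(b)
    total = sum(weights.get(n, 1.0) for n in names)
    if total == 0:
        return 0.0
    changed = sum(weights.get(n, 1.0) for n in names if a.get(n) != b.get(n))
    return changed / total
```

With a zero weight on the greeting and ad sections, two pages that share the same body compare as identical even though their personalized sections differ.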
9. The method of claim 2, wherein said step of selecting a Candidate Document comprises selecting from a Document Comparator Database.
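Claim 9 selects the Candidate Document from a Document Comparator Database rather than from the stored documents themselves. A minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the table name, its columns, the JSON encoding of a word-set comparator, and the Jaccard measure used to rank candidates are all hypothetical.

```python
import json
import sqlite3
from typing import FrozenSet, Iterable, Optional, Tuple


def open_comparator_db() -> sqlite3.Connection:
    # Hypothetical schema: one row per stored Document Comparator.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comparators (doc_key TEXT PRIMARY KEY, words TEXT)")
    return conn


def store(conn: sqlite3.Connection, doc_key: str, words: Iterable[str]) -> None:
    # Comparators are stored as sorted JSON lists of words.
    conn.execute(
        "INSERT INTO comparators VALUES (?, ?)", (doc_key, json.dumps(sorted(words)))
    )


def select_candidate(
    conn: sqlite3.Connection, words: FrozenSet[str]
) -> Optional[Tuple[str, float]]:
    # Scan comparator rows only; the documents themselves are never loaded.
    best: Optional[Tuple[str, float]] = None
    for doc_key, stored_json in conn.execute("SELECT doc_key, words FROM comparators"):
        stored = frozenset(json.loads(stored_json))
        union = words | stored
        diff = 1.0 - len(words & stored) / len(union) if union else 0.0
        if best is None or diff < best[1]:
            best = (doc_key, diff)
    return best
```

The function returns the closest candidate and its difference; the caller would then apply the Document Difference Threshold to decide whether to reuse that key.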
10. A system for collecting information about document retrievals over the World-Wide Web, comprising:
means for receiving a requesting user identity, requested Universal Resource Identifier (URI), and a content of a retrieved document;
means for selecting a Candidate Document from a Retrieved Document Database, said Candidate Document associated with a Candidate Document Key;
means for comparing said retrieved document to said Candidate Document to determine a sufficiency of said Candidate Document;
means for associating said retrieved document with a newly generated Retrieved Document Key if said Candidate Document is not deemed to be sufficient;
means for adding said retrieved document to said Retrieved Document Database; and
means for adding a Log File Entry including said requesting user identity, said requested URI, and said Retrieved Document Key.
means for computing said first Document Comparator for said retrieved document;
means for retrieving said second Document Comparator for said Candidate Document;
means for computing with said Document Comparator Function a numeric measure of a difference between said first Document Comparator and said second Document Comparator; and
means for comparing said numeric measure against a predefined Document Difference Threshold.
13. The system of claim 11, wherein each said Document Comparator comprises content of said each of a plurality of documents associated therewith.
14. The system of claim 13, wherein a URI for said Candidate Document is equal to a URI for said retrieved document.
15. The system of claim 11, wherein each said Document Comparator is computed by associating predefined portions of said each of a plurality of documents to a binary token.
16. The system of claim 11, wherein each said Document Comparator comprises a list of significant words or phrases in said each of a plurality of documents.
17. The system of claim 11, wherein each said Document Comparator comprises a Comparator for each of a plurality of predefined sections of said each of a plurality of documents.
18. The system of claim 11, wherein said means for selecting a Candidate Document comprises selecting from a Document Comparator Database.
19. A computer program product recorded on a computer readable medium for collecting information about document retrievals over the World-Wide Web, comprising:
computer readable means for receiving a requesting user identity, requested Universal Resource Identifier (URI), and a content of a retrieved document;
computer readable means for selecting a Candidate Document from a Retrieved Document Database, said Candidate Document associated with a Candidate Document Key;
computer readable means for comparing said retrieved document to said Candidate Document to determine a sufficiency of said Candidate Document;
computer readable means for associating said retrieved document with a newly generated Retrieved Document Key if said Candidate Document is not deemed to be sufficient;
computer readable means for adding said retrieved document to said Retrieved Document Database; and
computer readable means for adding a Log File Entry including said requesting user identity, said requested URI, and said Retrieved Document Key.
computer readable means for computing said first Document Comparator for said retrieved document;
computer readable means for retrieving said second Document Comparator for said Candidate Document;
computer readable means for computing with said Document Comparator Function a numeric measure of a difference between said first Document Comparator and said second Document Comparator; and
computer readable means for comparing said numeric measure against a predefined Document Difference Threshold.
22. The program product of claim 20, wherein each said Document Comparator comprises content of said each of a plurality of documents associated therewith.
23. The program product of claim 22, wherein a URI for said Candidate Document is equal to a URI for said retrieved document.
24. The program product of claim 20, wherein each said Document Comparator is computed by associating predefined portions of said each of a plurality of documents to a binary token.
25. The program product of claim 20, wherein each said Document Comparator comprises a list of significant words or phrases in said each of a plurality of documents.
26. The program product of claim 20, wherein each said Document Comparator comprises a Comparator for each of a plurality of predefined sections of said each of a plurality of documents.
27. The program product of claim 20, wherein said computer readable means for selecting a Candidate Document comprises selecting from a Document Comparator Database.
Specification