Techniques for web site integration
5 Assignments
0 Petitions
Abstract
Disclosed is a method and device for finding documents, such as Web pages, that belong to a Web site being accessed by a user and that relate to a document, such as a Web page, that the user is accessing in that site. The related documents are presented to the user automatically or in response to a user expression of interest. The method applies information retrieval techniques: it generates the search query used to find the related documents from the text of the document the user is accessing in the Web site, and it uses a weighting function to weight the terms in that search query.
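The abstract states that the search query is generated from the text of the page being viewed. A minimal sketch of that first step follows, assuming lowercase word tokenization and a hypothetical stop list (the abstract specifies neither); the resulting term counts are what a weighting function, such as the one recited in claims 31 and 40, would then weight.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical stop list: the abstract does not say whether any terms are dropped.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to"})


def query_from_page(page_text: str) -> Counter[str]:
    """Turn the text of the page being viewed into query terms with their counts."""
    # Assumption: a "term" is a lowercase run of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", page_text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)
```

For example, `query_from_page("The cat sat on the cat mat")` yields the counts `cat: 2`, `sat: 1`, `mat: 1`.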
176 Citations
48 Claims
1. A system, comprising:
a computer system coupled to one or more storage devices storing instructions that, when executed by the computer system, cause the computer system to perform operations comprising:
identifying an initial document accessed by a user, wherein the initial document is in a Web site, and wherein the initial document includes a plurality of terms;
selecting documents contained in the Web site that are related to the initial document, comprising:
identifying a respective compressed document surrogate for each of a plurality of other documents in the Web site, wherein each of the compressed document surrogates includes data identifying a plurality of terms that occur in the other document and, for each of the plurality of terms, a respective frequency of occurrence of the term in the other document;
assigning a respective score to each of the plurality of other documents in the Web site based at least in part on frequencies of occurrence identified in the compressed document surrogate for the other document for terms from the plurality of terms included in the initial document; and
selecting one or more other documents as related documents based on the scores; and
presenting data identifying the related documents to the user.
Dependent claims: 2 to 15.
16. A system, comprising:
a computer system coupled to one or more storage devices storing instructions that, when executed by the computer system, cause the computer system to perform operations comprising:
identifying an initial document accessed by a user, wherein the initial document is in a Web site, and wherein the initial document includes a plurality of terms;
presenting data identifying other documents contained in the Web site that are related to the initial document to the user; and
a search engine configured to select the other documents that are related to the initial document, comprising:
identifying a respective compressed document surrogate for each of a plurality of other documents in the Web site, wherein each of the compressed document surrogates includes data identifying a plurality of terms that occur in the other document and, for each of the plurality of terms, a respective frequency of occurrence of the term in the other document;
assigning a respective score to each of the plurality of other documents in the Web site based at least in part on frequencies of occurrence identified in the compressed document surrogate for the other document for terms from the plurality of terms included in the initial document; and
selecting one or more other documents as related documents based on the scores.
Dependent claims: 17 to 30.
31. A system, comprising:
a computer system coupled to a storage device, wherein a computer program stored on the storage device is operable, when executed by the computer system, to cause the computer system to perform operations comprising:
receiving an input from a user identifying an initial document in a collection of documents;
identifying other documents in the collection of documents that are related to the initial document, comprising:
assigning scores to a plurality of compressed document surrogates corresponding to the collection of documents, the scores depending on an occurrence in the compressed document surrogates of at least one term in the initial document, wherein a score $S_D$ of a compressed document surrogate $D$ in the plurality of compressed document surrogates is determined by crediting the compressed document surrogate $D$, for each term $T$ in the initial document which occurs in the compressed document surrogate $D$, with an amount proportional to Robertson's term frequency $\mathrm{TF}_{TD}$ and $\mathrm{IDF}_T$, where
$$\mathrm{TF}_{TD} = \frac{N_{TD}}{N_{TD} + K_1 + K_2 \, (L_D / L_0)},$$
$N_{TD}$ is the number of times the term $T$ occurs in compressed document surrogate $D$, $L_D$ is the length of compressed document surrogate $D$, $L_0$ is the average length of a document in the collection of documents, $K_1$ and $K_2$ are constants, and
$$\mathrm{IDF}_T = \frac{\log\left((N + K_3) / N_T\right)}{\log(N + K_4)},$$
$N$ is the number of documents in the collection of documents, $N_T$ is the number of documents containing the term $T$ in the collection of documents, and $K_3$ and $K_4$ are constants;
selecting a set of documents from the collection of documents, the set of documents comprising documents corresponding to those of the plurality of compressed document surrogates assigned the highest scores; and
presenting information identifying the set of documents to the user.
Dependent claims: 32 to 39.
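The scoring in claim 31 can be sketched directly from its two formulas. Several choices here are assumptions, not taken from the claim: the constant values of `K1` to `K4` are hypothetical, the "amount proportional to" $\mathrm{TF}_{TD}$ and $\mathrm{IDF}_T$ is taken as their plain product, $L_D$ is taken as the total term count in the surrogate, $L_0$ is approximated by the mean surrogate length, and $N_T$ is counted over the surrogates rather than the full documents.

```python
from __future__ import annotations

import math
from collections import Counter

# Hypothetical values: the claims only state that K1..K4 are constants.
K1, K2, K3, K4 = 0.5, 1.5, 0.5, 1.0


def robertson_tf(n_td: int, l_d: float, l_0: float) -> float:
    """TF_TD = N_TD / (N_TD + K1 + K2 * (L_D / L_0))."""
    return n_td / (n_td + K1 + K2 * (l_d / l_0))


def idf(n: int, n_t: int) -> float:
    """IDF_T = log((N + K3) / N_T) / log(N + K4)."""
    return math.log((n + K3) / n_t) / math.log(n + K4)


def score_surrogates(initial_terms: set[str], surrogates: dict[str, dict[str, int]]) -> dict[str, float]:
    """S_D = sum, over terms T of the initial document found in D, of TF_TD * IDF_T."""
    n = len(surrogates)
    if n == 0:
        return {}
    # Assumptions: L_D = total term count of the surrogate, L_0 = mean of those
    # lengths, N_T = number of surrogates that contain T.
    lengths = {d: sum(s.values()) for d, s in surrogates.items()}
    l_0 = sum(lengths.values()) / n
    doc_freq = Counter(t for s in surrogates.values() for t in s)
    scores: dict[str, float] = {}
    for d, s in surrogates.items():
        scores[d] = sum(
            robertson_tf(s[t], lengths[d], l_0) * idf(n, doc_freq[t])
            for t in initial_terms
            if t in s
        )
    return scores


def top_documents(scores: dict[str, float], k: int) -> list[str]:
    """Return the ids of the k highest-scoring documents that have a positive score."""
    ranked = sorted((d for d in scores if scores[d] > 0), key=lambda d: (-scores[d], d))
    return ranked[:k]
```

The $\mathrm{TF}_{TD}$ term saturates as $N_{TD}$ grows and is damped for surrogates longer than average, while $\mathrm{IDF}_T$ lowers the credit for terms that appear in many documents; a term found in every document still receives a small positive weight because $K_3 > 0$.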
40. A system, comprising:
a computer system coupled to a storage device, wherein a computer program stored on the storage device is operable, when executed by the computer system, to cause the computer system to perform operations comprising:
receiving a request for an initial document in a plurality of documents from a user;
identifying documents in the plurality of documents that are related to the initial document, comprising:
assigning scores to a plurality of documents, including:
determining, for each other document in the plurality of documents, if the document has an associated compressed document surrogate; and
when the document has an associated compressed document surrogate, assigning a score to the document based on occurrences of at least one term included in the initial document in the associated compressed document surrogate, wherein a score $S_D$ assigned to a document $D$ in the plurality of documents is determined by crediting the document $D$, for each term $T$ in the initial document which occurs in the compressed document surrogate associated with the document $D$, with an amount proportional to Robertson's term frequency $\mathrm{TF}_{TD}$ and to $\mathrm{IDF}_T$, where
$$\mathrm{TF}_{TD} = \frac{N_{TD}}{N_{TD} + K_1 + K_2 \, (L_D / L_0)},$$
$N_{TD}$ is the number of times the term $T$ occurs in compressed document surrogate $D$, $L_D$ is the length of compressed document surrogate $D$, $L_0$ is the average length of a document in the collection of documents, $K_1$ and $K_2$ are constants, and
$$\mathrm{IDF}_T = \frac{\log\left((N + K_3) / N_T\right)}{\log(N + K_4)},$$
$N$ is the number of documents in the collection of documents, $N_T$ is the number of documents containing the term $T$ in the collection of documents, and $K_3$ and $K_4$ are constants;
selecting a set of documents from the plurality of documents based on the assigned scores; and
presenting information identifying the set of documents to the user.
Dependent claims: 41 to 48.
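Claim 40 adds a step that claim 31 does not recite: each other document is first checked for an associated compressed surrogate, and only documents that have one are scored. The sketch below shows just that control flow, with the scoring function passed in; `overlap_score` is a hypothetical stand-in for the Robertson-weighted score above, and the mapping from document ids to surrogates is an assumed data layout.

```python
from __future__ import annotations

from typing import Callable, Mapping

Surrogate = Mapping[str, int]  # term -> frequency in the compressed surrogate


def score_with_optional_surrogates(
    initial_id: str,
    initial_terms: set[str],
    document_ids: list[str],
    surrogates: Mapping[str, Surrogate],
    score_fn: Callable[[set[str], Surrogate], float],
) -> dict[str, float]:
    """Score only the other documents that have an associated surrogate."""
    scores: dict[str, float] = {}
    for doc_id in document_ids:
        if doc_id == initial_id:
            continue  # the requested document itself is not a candidate
        surrogate = surrogates.get(doc_id)
        if surrogate is None:
            continue  # no compressed surrogate: the document is not scored
        scores[doc_id] = score_fn(initial_terms, surrogate)
    return scores


def overlap_score(terms: set[str], surrogate: Surrogate) -> float:
    """Hypothetical stand-in score: summed frequencies of shared terms."""
    return float(sum(f for t, f in surrogate.items() if t in terms))
```

Passing the scorer in keeps the surrogate check separate from the weighting, so the same loop works with a Robertson-style function such as the one recited in claim 40.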
Specification