Techniques for web site integration
Abstract
Disclosed is a method and device for finding documents, such as Web pages, for presentation to a user, automatically or in response to a user expression of interest, which documents are part of a Web site being accessed by the user, and which documents relate to a document, such as a Web page, being accessed in the Web site. The method takes advantage of information retrieval techniques. The method generates the search query to use to find documents by reference to the text of the document in the Web site being accessed by the user. The method further uses a weighting function to weigh the terms used in the search query.
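The query-generation step described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the weighting function, so the raw term count is used here as a stand-in weight, and the names build_query, STOPWORDS, and max_terms are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; the abstract does not say which words are ignored.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"})


def build_query(page_text, max_terms=5):
    """Derive a weighted query from the text of the page being accessed.

    Assumption: the weight of a term is simply how often it occurs in
    the page. The abstract only says that "a weighting function" is used.
    """
    # Lowercase the text and keep alphanumeric runs as terms.
    tokens = re.findall(r"[a-z0-9]+", page_text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Keep the most frequent terms as the query, mapped to their weights.
    return dict(counts.most_common(max_terms))


if __name__ == "__main__":
    print(build_query("The cat and the dog. The cat sleeps.", max_terms=2))
```

The resulting term-to-weight mapping could then be handed to whatever search step finds related pages within the same Web site.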
18 Claims
1. A processor-implemented method, comprising:
  receiving an input from a user identifying an initial document in a collection of documents;
  identifying other documents in the collection of documents that are related to the initial document, comprising:
    assigning scores to a plurality of compressed document surrogates corresponding to the collection of documents, the scores depending on an occurrence in the compressed document surrogates of at least one term in the initial document, wherein a score SD of a compressed document surrogate D in the plurality of compressed document surrogates is determined by crediting the compressed document surrogate D, for each term T in the initial document which occurs in the compressed document surrogate D, with an amount proportional to Robertson's term frequency TFTD and to IDFT, where
      TFTD = NTD / (NTD + K1 + K2 * (LD / L0)), and
      NTD is the number of times the term T occurs in compressed document surrogate D,
      LD is the length of compressed document surrogate D,
      L0 is the average length of a document in the collection of documents,
      K1 and K2 are constants, and
      IDFT = log((N + K3) / NT) / log(N + K4), and
      N is the number of documents in the collection of documents,
      NT is the number of documents containing the term T in the collection of documents, and
      K3 and K4 are constants;
  selecting a set of documents from the collection of documents, the set of documents comprising documents corresponding to those of the plurality of compressed document surrogates assigned the highest scores; and
  presenting information identifying the set of documents to the user.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
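The scoring recited in claim 1 can be illustrated with a short Python sketch. It rests on several assumptions not found in the claim: a compressed document surrogate is modeled as a term-to-count mapping, LD is taken as the sum of those counts, L0 is approximated by the average surrogate length, the proportionality factor is 1, and the values chosen for K1 through K4 are placeholders (the claim only says they are constants). All function names are hypothetical.

```python
import math
from collections import Counter

# Placeholder values; the claim only states that K1..K4 are constants.
K1, K2, K3, K4 = 0.5, 1.5, 0.5, 1.0


def robertson_tf(n_td, l_d, l_0, k1=K1, k2=K2):
    """TFTD = NTD / (NTD + K1 + K2 * (LD / L0))."""
    return n_td / (n_td + k1 + k2 * (l_d / l_0))


def idf(n, n_t, k3=K3, k4=K4):
    """IDFT = log((N + K3) / NT) / log(N + K4)."""
    return math.log((n + k3) / n_t) / math.log(n + k4)


def score_surrogates(initial_terms, surrogates):
    """Return {doc_id: SD} for surrogates of the *other* documents.

    `surrogates` maps doc_id -> {term: count} (assumed representation).
    """
    n = len(surrogates)
    # Assumption: average surrogate length stands in for L0.
    l_0 = sum(sum(s.values()) for s in surrogates.values()) / n
    # NT: number of documents whose surrogate contains term T.
    doc_freq = Counter(t for s in surrogates.values() for t in s)
    scores = {}
    for doc_id, s in surrogates.items():
        l_d = sum(s.values())
        # Credit D once for each distinct term T of the initial document
        # that occurs in D's surrogate.
        scores[doc_id] = sum(
            robertson_tf(s[t], l_d, l_0) * idf(n, doc_freq[t])
            for t in set(initial_terms)
            if t in s
        )
    return scores


def top_related(initial_terms, surrogates, k=2):
    """Select the k documents whose surrogates received the highest scores."""
    scores = score_surrogates(initial_terms, surrogates)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, with surrogates {"a": {"cat": 2, "dog": 1}, "b": {"fish": 3}, "c": {"cat": 1, "bird": 2}} and initial terms ["cat", "dog"], document "a" is credited for both terms, "c" for one, and "b" for none.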
10. A processor-implemented method, comprising:
  receiving a request for an initial document in a plurality of documents from a user;
  identifying documents in the plurality of documents that are related to the initial document, comprising:
    assigning scores to other documents in the plurality of documents, including:
      determining, for each other document in the plurality of documents, if the document has an associated compressed document surrogate, and
      when the document has an associated compressed document surrogate, assigning a score to the document based on occurrences of at least one term included in the initial document in the associated compressed document surrogate, wherein a score SD assigned to a document D in the plurality of documents is determined by crediting the document D, for each term T in the initial document which occurs in the compressed document surrogate associated with document D, with an amount proportional to Robertson's term frequency TFTD and to IDFT, where
        TFTD = NTD / (NTD + K1 + K2 * (LD / L0)), and
        NTD is the number of times the term T occurs in compressed document surrogate D,
        LD is the length of compressed document surrogate D,
        L0 is the average length of a document in the plurality of documents,
        K1 and K2 are constants, and
        IDFT = log((N + K3) / NT) / log(N + K4), and
        N is the number of documents in the plurality of documents,
        NT is the number of documents containing the term T in the plurality of documents, and
        K3 and K4 are constants;
  selecting a set of documents from the plurality of documents based on the assigned scores; and
  presenting information identifying the set of documents to the user.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
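Claim 10 differs from claim 1 mainly in first determining whether each other document has an associated compressed document surrogate. A minimal Python sketch of that control flow follows. The representation is an assumption: a document with no surrogate is modeled as None, and a surrogate as a term-to-count mapping. The scorer is passed in as a parameter, and overlap_score is a deliberately simple, hypothetical stand-in for the TFTD and IDFT credit, used only to keep the example short.

```python
def related_documents(initial_id, initial_terms, surrogates, score_fn, k=3):
    """Score only documents that have a surrogate, then pick the top k.

    `surrogates` maps doc_id -> {term: count}, or None when the document
    has no associated compressed document surrogate (assumed encoding).
    """
    scores = {}
    for doc_id, surrogate in surrogates.items():
        if doc_id == initial_id:
            continue  # only *other* documents are scored
        if surrogate is None:
            continue  # no surrogate, so no score is assigned
        scores[doc_id] = score_fn(initial_terms, surrogate)
    # Select a set of documents based on the assigned scores.
    return sorted(scores, key=scores.get, reverse=True)[:k]


def overlap_score(terms, surrogate):
    """Stand-in scorer: total occurrences of the initial document's terms."""
    return float(sum(surrogate.get(t, 0) for t in set(terms)))


if __name__ == "__main__":
    docs = {
        "home": {"news": 1},
        "p1": {"news": 3, "sports": 1},
        "p2": None,  # no surrogate available
        "p3": {"sports": 2},
    }
    print(related_documents("home", ["news"], docs, overlap_score, k=2))
```

Passing the scorer as an argument keeps the surrogate check separate from the scoring formula, so a scorer implementing the claimed TFTD and IDFT credit could be substituted without changing the selection logic.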
Specification