Techniques for web site integration

US 8,015,173 B2
Filed: 05/26/2005
Issued: 09/06/2011
Est. Priority Date: 05/08/2000
Status: Expired due to Fees


Abstract

Disclosed is a method and device for finding documents, such as Web pages, for presentation to a user, automatically or in response to a user expression of interest, which documents are part of a Web site being accessed by the user, and which documents relate to a document, such as a Web page, being accessed in the Web site. The method takes advantage of information retrieval techniques. The method generates the search query to use to find documents by reference to the text of the document in the Web site being accessed by the user. The method further uses a weighting function to weigh the terms used in the search query.


18 Claims

1. A processor-implemented method, comprising:
- receiving an input from a user identifying an initial document in a collection of documents;
  
  identifying other documents in the collection of documents that are related to the initial document, comprising:
  
  assigning scores to a plurality of compressed document surrogates corresponding to the collection of documents, the scores depending on an occurrence in the compressed document surrogates of at least one term in the initial document, wherein a score S_D of a compressed document surrogate D in the plurality of compressed document surrogates is determined by crediting the compressed document surrogate D, for each term T in the initial document which occurs in the compressed document surrogate D, with an amount proportional to Robertson's term frequency TF_TD and to IDF_T where
  
  TF_TD = N_TD / (N_TD + K₁ + K₂ * (L_D / L₀)), and
  N_TD is the number of times the term T occurs in compressed document surrogate D,
  L_D is the length of compressed document surrogate D,
  L₀ is the average length of a document in the collection of documents,
  K₁ and K₂ are constants, and
  
  IDF_T = log((N + K₃) / N_T) / log(N + K₄), and
  N is the number of documents in the collection of documents,
  N_T is the number of documents containing the term T in the collection of documents, and
  K₃ and K₄ are constants;
  
  selecting a set of documents from the collection of documents, the set of documents comprising documents corresponding to those of the plurality of compressed document surrogates assigned the highest scores; and
  
  presenting information identifying the set of documents to the user.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, wherein assigning the scores, selecting the set of documents, and presenting the information identifying the set of documents to the user are performed automatically when a user accesses the initial document, without user intervention.
  - 3. The method of claim 1, wherein assigning the scores, selecting the set of documents, and presenting the information identifying the set of documents to the user are performed in response to a request from the user to obtain the set of documents.
  - 4. The method of claim 1, wherein the documents are Web pages, and the collection of documents comprises Web pages available from a Web site.
  - 5. The method of claim 1, wherein the set of documents comprises documents corresponding to those of the plurality of compressed document surrogates assigned scores above a threshold.
  - 6. The method of claim 1, wherein the set of documents comprises documents corresponding to those of the plurality of compressed document surrogates assigned the highest N scores, where N is a predetermined value of one or more.
  - 7. The method of claim 1, wherein the score assigned to a particular one of the plurality of compressed document surrogates depends upon how often the at least one term occurs in the initial document compared to how often the at least one term occurs in the plurality of compressed document surrogates.
  - 8. The method of claim 1, wherein the initial document includes at least one index term manually assigned to the initial document, the at least one term includes the at least one index term, and the score assigned to a particular one of the plurality of compressed document surrogates depends upon an occurrence of the at least one index term.
  - 9. The method of claim 8, wherein the score assigned to the particular one of the plurality of compressed document surrogates depends upon how often the at least one term occurs in the initial document compared to how often the at least one term occurs in the plurality of compressed document surrogates, and depends upon a weight assigned to the at least one index term.
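
The scoring and selection steps of claim 1 can be sketched in Python. This is a minimal illustration, not the patented implementation: the constant values, the function names, and the representation of a compressed document surrogate as a plain list of terms are all assumptions. The sketch also uses the average surrogate length as a stand-in for L₀ and counts surrogates containing a term as a stand-in for N_T, since only surrogates are available to it.

```python
import math
from collections import Counter

# Illustrative values only; the claims say just that K1..K4 are constants.
K1, K2, K3, K4 = 0.5, 1.5, 0.5, 1.0


def robertson_tf(n_td, doc_len, avg_len, k1=K1, k2=K2):
    """TF_TD = N_TD / (N_TD + K1 + K2 * (L_D / L0))."""
    return n_td / (n_td + k1 + k2 * (doc_len / avg_len))


def idf(n_docs, n_t, k3=K3, k4=K4):
    """IDF_T = log((N + K3) / N_T) / log(N + K4)."""
    return math.log((n_docs + k3) / n_t) / math.log(n_docs + k4)


def score_surrogates(initial_terms, surrogates):
    """Return {doc_id: S_D} for every surrogate.

    `surrogates` maps a document id to a list of terms (a hypothetical
    surrogate format). Assumes at least one non-empty surrogate.
    """
    n_docs = len(surrogates)
    counts = {doc_id: Counter(terms) for doc_id, terms in surrogates.items()}
    avg_len = sum(len(terms) for terms in surrogates.values()) / n_docs

    # Number of surrogates in which each term occurs (stand-in for N_T).
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    scores = {}
    for doc_id, term_counts in counts.items():
        total = 0.0
        # Credit the surrogate once per distinct term of the initial document.
        for term in set(initial_terms):
            n_td = term_counts[term]
            if n_td > 0:
                tf = robertson_tf(n_td, len(surrogates[doc_id]), avg_len)
                total += tf * idf(n_docs, doc_freq[term])
        scores[doc_id] = total
    return scores


def top_n(scores, n):
    """Ids of the n highest-scoring documents (the selection of claim 6)."""
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Crediting each matching term with the product TF_TD × IDF_T is one way to make the credit proportional to both quantities; a threshold filter, as in claim 5, could replace `top_n` without changing the scoring.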

10. A processor-implemented method, comprising:
- receiving a request for an initial document in a plurality of documents from a user;
  
  identifying documents in the plurality of documents that are related to the initial document, comprising:
  
  assigning scores to other documents in the plurality of documents, including:
  
  determining, for each other document in the plurality of documents, if the document has an associated compressed document surrogate,
  
  when the document has an associated compressed document surrogate, assigning a score to the document based on occurrences of at least one term included in the initial document in the associated compressed document surrogate, wherein a score of S_D assigned to a document D in the plurality of documents is determined by crediting the document D, for each term T in the initial document which occurs in the compressed document surrogate associated with document D, with an amount proportional to Robertson's term frequency TF_TD and to IDF_T where
  
  TF_TD = N_TD / (N_TD + K₁ + K₂ * (L_D / L₀)), and
  N_TD is the number of times the term T occurs in compressed document surrogate D,
  L_D is the length of compressed document surrogate D,
  L₀ is the average length of a document in the plurality of documents,
  K₁ and K₂ are constants, and
  
  IDF_T = log((N + K₃) / N_T) / log(N + K₄), and
  N is the number of documents in the plurality of documents,
  N_T is the number of documents containing the term T in the plurality of documents, and
  K₃ and K₄ are constants;
  
  selecting a set of documents from the plurality of documents based on the assigned scores; and
  
  presenting information identifying the set of documents to the user.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The method of claim 10, wherein assigning the scores, selecting the set of documents, and presenting the information identifying the set of documents to the user are performed automatically in response to receiving the request for the initial document, without user intervention.
  - 12. The method of claim 10, wherein assigning the scores, selecting the set of documents, and presenting the information identifying the set of documents to the user are performed in response to an additional request from the user to obtain the set of documents.
  - 13. The method of claim 10, wherein the documents are Web pages, and the collection of documents comprises Web pages available from a Web site.
  - 14. The method of claim 10, wherein selecting the set of documents comprises selecting those of the plurality of documents assigned scores higher than a predetermined threshold.
  - 15. The method of claim 10, wherein selecting the set of documents comprises selecting those of the plurality of documents assigned the highest N scores, where N is a predetermined value of one or more.
  - 16. The method of claim 10, wherein assigning the scores further includes when the document does not have an associated compressed document surrogate, assigning a score to the document based on occurrences of the at least one term in the document.
  - 17. The method of claim 10, wherein the initial document includes at least one index term manually assigned to the initial document, the at least one term includes the at least one index term, and the score assigned to a particular one of the plurality of documents depends upon an occurrence of the at least one index term.
  - 18. The method of claim 17, wherein the score assigned to the particular one of the plurality of documents depends upon how often the at least one term occurs in the initial document compared to how often the at least one term occurs in the plurality of documents, and depends upon a weight assigned to the at least one index term.
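
Claims 10, 14, and 16 describe a per-document check: score the compressed document surrogate when one exists, otherwise score the document itself, then keep the documents above a threshold. The Python sketch below illustrates only that control flow. All names are hypothetical, documents and surrogates are assumed to be lists of terms, and `overlap` is a placeholder scorer, not the TF_TD and IDF_T formula recited in claim 10.

```python
def overlap(initial_terms, terms):
    """Placeholder scorer: number of distinct shared terms.

    Stands in for the claimed TF/IDF credit so the sketch stays short.
    """
    return float(len(set(initial_terms) & set(terms)))


def score_documents(initial_id, documents, surrogates, score_fn=overlap):
    """Score every document other than the initial one.

    `documents` maps id -> full term list; `surrogates` maps id -> surrogate
    term list and may be missing entries.
    """
    initial_terms = documents[initial_id]
    scores = {}
    for doc_id, full_terms in documents.items():
        if doc_id == initial_id:
            continue  # only the *other* documents are scored
        surrogate = surrogates.get(doc_id)
        # Claim 10: use the surrogate when present.
        # Claim 16: otherwise fall back to the document's own terms.
        text = surrogate if surrogate is not None else full_terms
        scores[doc_id] = score_fn(initial_terms, text)
    return scores


def select_above(scores, threshold):
    """Ids with a score above the threshold (claim 14), best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, score in ranked if score > threshold]
```

Passing the scorer as a parameter keeps the surrogate-or-fallback decision separate from the weighting formula, so a function implementing the claimed formula could be substituted for `overlap`.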

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Getchius, Jeffrey; Ponte, Jay; Chipalkatti, Renu
Primary Examiner(s): PANNALA, SATHYANARAYA R
Application Number: US11/138,028
Publication Number: US 20050216478A1
Time in Patent Office: 2,294 Days
Field of Search: None
US Class Current: 707/708
CPC Class Codes:
G06F 16/334   Query execution G06F16/335 ...
Y10S 707/955   Object-oriented
Y10S 707/99935   Query augmenting and refini...
Y10S 707/99944   Object-oriented database st...
