Use of generalized term frequency scores in information retrieval systems
Abstract
Disclosed are methods and systems for selecting electronic documents, such as Web pages or sites, from among documents in a collection, based upon the occurrence of selected terms in segments of the documents. The method may be applied where index terms have previously been assigned to the documents. The method may be used to select supercategories of banner advertisements from which to choose an advertisement to display for a user.
38 Claims
1. A method for presenting banner advertisements to a user who is seeking information about products and/or services, comprising:
(a) defining a collection C0 of categories of products and/or services,
(b) for a plurality of the categories in the collection C0 of categories, associating with the category a set of terms which describe the product(s) and/or service(s) associated with the category, and a unique category identifier term,
(c) for a plurality of providers of products and/or services to be utilized, assigning the provider to one or more categories based upon the products and/or services offered by the provider,
(d) assigning a plurality of the categories to supercategories,
(e) for a plurality of the supercategories, associating one or more banner advertisements with the supercategory,
(f) for a plurality of the supercategories, associating with the supercategory the sets of terms which describe the product(s) or service(s) associated with categories assigned to the supercategory, and the category identifier terms which are unique to categories assigned to the supercategory,
(g) in response to a user query Q, comprising terms describing products and/or services of interest to the user, selecting categories, in the collection of categories C0, which have associated with them a descriptive term for the products and/or services therein which matches a term in the user query,
(h) preparing a new query Q′ comprising the terms in the user query Q, the descriptive terms for the products and/or services associated with the categories selected, and the unique category identifier terms associated with the categories selected,
(i) applying the query Q′ to the collection of supercategories,
(j) selecting the supercategory with the highest score, and
(k) presenting to the user a banner advertisement associated with the supercategory selected.
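For illustration only, steps (f) through (k) of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the category names, terms, and banner file names are invented, the helper names (`supercategory_terms`, `expand_query`, `select_banner`) are hypothetical, and the scoring function is a plain term-overlap count used as a placeholder, since claim 1 itself does not fix a scoring formula.

```python
from collections import Counter

# Hypothetical toy data: category identifier term -> descriptive terms.
CATEGORIES = {
    "cat_florists": {"flowers", "bouquet", "roses"},
    "cat_garden": {"flowers", "seeds", "soil"},
    "cat_auto_repair": {"brakes", "engine", "tires"},
}

# Hypothetical supercategories, each with member categories and one banner.
SUPERCATEGORIES = {
    "home_and_garden": {
        "members": ["cat_florists", "cat_garden"],
        "banner": "garden_banner.gif",
    },
    "automotive": {
        "members": ["cat_auto_repair"],
        "banner": "auto_banner.gif",
    },
}


def supercategory_terms(name):
    """Step (f): pool descriptive terms and identifier terms of member categories."""
    bag = Counter()
    for cat_id in SUPERCATEGORIES[name]["members"]:
        bag.update(CATEGORIES[cat_id])
        bag[cat_id] += 1
    return bag


def expand_query(query_terms):
    """Steps (g)-(h): select matching categories and build the expanded query Q'."""
    wanted = set(query_terms)
    selected = [c for c, terms in CATEGORIES.items() if terms & wanted]
    expanded = list(query_terms)
    for cat_id in selected:
        expanded.extend(sorted(CATEGORIES[cat_id]))  # descriptive terms
        expanded.append(cat_id)                      # unique identifier term
    return expanded


def select_banner(query_terms):
    """Steps (i)-(k): score each supercategory against Q' and return its banner."""
    expanded = expand_query(query_terms)
    # Placeholder score (an assumption): count of Q' term occurrences in the pool.
    scores = {
        name: sum(supercategory_terms(name)[t] for t in expanded)
        for name in SUPERCATEGORIES
    }
    best = max(scores, key=scores.get)
    return SUPERCATEGORIES[best]["banner"]
```

Calling `select_banner(["roses"])` on this toy data expands the query with the florist category's terms and identifier, so the home-and-garden banner is chosen even though the user typed only one word.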
-
2. The method of claim 1, wherein the search query Q′ is applied to select a supercategory from among the collection of supercategories by calculating for each supercategory a score SC based upon the occurrence in the supercategory of terms in the search query Q′.
-
3. The method of claim 2, wherein there are approximately 20,000 categories of products and/or services in the collection of categories.
-
4. The method of claim 3, wherein there are approximately 50 supercategories.
-
5. The method of claim 2, wherein every category, in the collection of categories C0, which has associated with it a descriptive term for the products and/or services therein which matches a term in the user query, is presented to the user, and the user is permitted to select from among said categories a category of interest for which a list of the merchants, stores or other sources of product(s) and/or service(s) associated with the category will be presented.
-
6. The method of claim 5, wherein in response to the user selecting a category from those presented,
(a) a new query Q″ is prepared, comprising the descriptive terms for the product(s) and/or service(s) associated with the category selected, and the unique category identifier term associated with the category selected,
(b) the query Q″ is applied to the collection of supercategories,
(c) the supercategory with the highest score is selected, and
(d) the user is presented with a banner advertisement associated with the supercategory selected.
-
7. The method of claim 2, wherein the query Q′ is applied to the collection of supercategories by utilizing Robertson's term frequency score, such that the score for a supercategory SC under the query Q′ is determined by:
$$S_C = \sum_{T=1}^{T_0} \mathrm{TF}_{TD} \cdot \mathrm{IDF}_T$$
where T0 is the number of terms which occur in the query Q′,
TFTD is Robertson's term frequency for term T in supercategory SC,
$$\mathrm{TF}_{TD} = \frac{N_{TC}}{N_{TC} + K_1 + K_2 \cdot (L_C / L_0)}$$
where NTC is the number of times the term T occurs in supercategory SC, LC is the length of supercategory SC, L0 is the average length of a supercategory, and K1 and K2 are constants, and
$$\mathrm{IDF}_T = \frac{\log\left((N + K_3) / N_T\right)}{\log(N + K_4)}$$
where N is the number of supercategories in the collection, NT is the number of supercategories containing the term T, and K3 and K4 are constants.
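As a hedged illustration of the scoring in claim 7, the sketch below assumes the term frequency reads TF = NTC / (NTC + K1 + K2·(LC/L0)), the inverse document frequency reads IDF = log((N + K3)/NT) / log(N + K4), and the supercategory score is the sum of TF·IDF over the query terms. The function names are hypothetical, the default constants are the values K1=0.5, K2=1.5, K3=0.5, K4=1.0 recited elsewhere in the claims, and measuring the length LC as the total number of term occurrences is an assumption.

```python
import math


def robertson_tf(n_tc, length, avg_length, k1=0.5, k2=1.5):
    """Term frequency component: NTC / (NTC + K1 + K2 * (LC / L0))."""
    return n_tc / (n_tc + k1 + k2 * (length / avg_length))


def normalized_idf(n_super, n_containing, k3=0.5, k4=1.0):
    """Inverse document frequency: log((N + K3) / NT) / log(N + K4)."""
    return math.log((n_super + k3) / n_containing) / math.log(n_super + k4)


def score_supercategory(query_terms, term_counts, avg_length, doc_freq, n_super):
    """Sum TF * IDF over the query terms for one supercategory.

    term_counts: term -> occurrences in this supercategory (NTC)
    doc_freq:    term -> number of supercategories containing the term (NT)
    """
    # Assumption: the length LC is the total number of term occurrences.
    length = sum(term_counts.values())
    total = 0.0
    for term in query_terms:
        n_tc = term_counts.get(term, 0)
        n_t = doc_freq.get(term, 0)
        if n_tc == 0 or n_t == 0:
            continue  # a term absent from the supercategory contributes nothing
        total += robertson_tf(n_tc, length, avg_length) * normalized_idf(n_super, n_t)
    return total
```

With a term occurring twice in a supercategory of average length, `robertson_tf(2, 10, 10)` evaluates to 2 / (2 + 0.5 + 1.5) = 0.5, which shows how the constants cap the influence of repeated terms.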
-
8. The method of claim 1, wherein K1=0.5, K2=1.5, K3=0.5, and K4=1.0.
-
9. The method of claim 2, wherein:
(a) categories are assigned to supercategories by a plurality of methods,
(b) supercategories are considered to comprise multiple segments,
(c) segments of supercategories comprise terms and term identifiers associated with categories assigned to the supercategory by a single method,
(d) segments of supercategories are assigned weights WSC,
(e) terms in a segment are assigned weights WSTC, and
(f) in applying the query Q′ to the collection of supercategories a generalized term frequency score is used, such that the score SC for a supercategory with respect to the query Q′ is calculated as follows:
$$S_C = \sum_{T=1}^{T_0} \sum_{S=1}^{S_0} \mathrm{TF}_{STC} \cdot \mathrm{IDF}_{ST}$$
where SC is the total score for the supercategory SC,
T0 is the number of terms which occur in the query Q′,
S0 is the number of segments in the supercategory SC,
TFSTC = Robertson's generalized term frequency score for term T in segment Si of supercategory SC,
$$\mathrm{TF}_{STC} = \frac{G_{STC}}{G_{STC} + K_1 + K_2 \cdot W_{SC} \cdot (H_{SC} / H_{SO})}$$
where GSTC = the generalized term count for term T in segment Si of supercategory SC,
$$G_{STC} = W_{SC} \cdot W_{STC} \cdot N_{STC}$$
where WSC is the weight assigned to segment Si of the supercategories, WSTC is the weight assigned to term T in segment Si of supercategory SC, and NSTC is the number of times the term T occurs in segment Si of supercategory SC,
HSC = the generalized length of segment Si of supercategory SC, where LSC is the number of different terms in segment Si of supercategory SC,
HSO = the generalized average length of segment Si of the supercategories, where C0 is the number of supercategories, and K1 and K2 are constants, and
IDFST = the generalized inverted document frequency for term T,
$$\mathrm{IDF}_{ST} = \frac{\log\left((C_0 + K_3) / C_{ST}\right)}{\log(C_0 + K_4)}$$
where C0 is the number of supercategories, CST is the number of supercategories containing the term T in the segment Si, and K3 and K4 are constants.
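The segment-weighted score of claim 9 can be illustrated with a short Python sketch. Several points are assumptions rather than claim language: the generalized length HSC is taken to be the sum of the generalized term counts over the distinct terms of the segment, HSO is taken to be the mean of HSC over all supercategories, the total score is taken to be the sum of TF·IDF over query terms and segments, and segment weights are assumed to be positive. The segment names `manual` and `auto`, the data layout, and all function names are hypothetical.

```python
import math

K1, K2, K3, K4 = 0.5, 1.5, 0.5, 1.0  # constant values recited in claim 10


def generalized_count(w_seg, w_term, n):
    """GSTC = WSC * WSTC * NSTC."""
    return w_seg * w_term * n


def generalized_length(segment, w_seg, term_weights):
    """ASSUMPTION: HSC is the sum of generalized counts over distinct terms."""
    return sum(
        generalized_count(w_seg, term_weights.get(t, 1.0), n)
        for t, n in segment.items()
    )


def generalized_score(query_terms, supercats, target, seg_weights, term_weights=None):
    """Score one supercategory; supercats maps name -> segment -> term -> count."""
    term_weights = term_weights or {}
    c0 = len(supercats)
    total = 0.0
    for seg_name, w_seg in seg_weights.items():
        lengths = {
            name: generalized_length(sc.get(seg_name, {}), w_seg, term_weights)
            for name, sc in supercats.items()
        }
        h_so = sum(lengths.values()) / c0  # ASSUMPTION: mean over supercategories
        h_sc = lengths[target]
        segment = supercats[target].get(seg_name, {})
        for term in query_terms:
            n = segment.get(term, 0)
            if n == 0:
                continue  # term absent from this segment
            g = generalized_count(w_seg, term_weights.get(term, 1.0), n)
            tf = g / (g + K1 + K2 * w_seg * (h_sc / h_so))
            # CST: supercategories whose same segment contains the term.
            c_st = sum(1 for sc in supercats.values() if term in sc.get(seg_name, {}))
            idf = math.log((c0 + K3) / c_st) / math.log(c0 + K4)
            total += tf * idf
    return total
```

A caller would pass segment weights such as `{"manual": 1.0, "auto": 0.4}`, the values recited in claim 19, so that terms contributed by semi-automatically assigned categories count for less than manually assigned ones.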
-
10. The method of claim 9, wherein K1=0.5, K2=1.5, K3=0.5, and K4=1.0.
-
11. The method of claim 9, wherein a subset of the categories are assigned to supercategories manually, while the remainder are assigned utilizing an automatic or semi-automatic index term augmentation technique based upon the co-occurrence of terms between the manually-assigned categories and the categories being automatically or semi-automatically assigned.
-
12. The method of claim 11, further comprising assigning the remainder of the terms not manually assigned to supercategories by
(a) selecting a category Ci from among the categories in the collection not yet assigned to supercategories which has not yet been processed,
(b) selecting a supercategory Sj from among the set of supercategories,
(c) calculating a likelihood function for the category Ci and a category Ck in the collection which has previously been assigned to the supercategory Sj manually, which likelihood function is based upon the likelihood that a term occurring in the category Ci also occurs in the category Ck,
(d) repeating step (c) for a plurality of other categories Ck in the collection which have previously been assigned to the supercategory Sj manually,
(e) calculating a total score for the category Ci for the supercategory Sj which total score is based upon the likelihood functions for the category Ci and the categories Ck in the collection which have previously been assigned to the supercategory Sj manually,
(f) repeating steps (b)-(e) for a plurality of other supercategories Sj,
(g) assigning category Ci to the supercategory for which the total score calculated for the category Ci is the highest, and
(h) repeating steps (a)-(g) for a plurality of other categories in the collection which have not yet been assigned to supercategories and which have not yet been processed.
-
13. The method of claim 12, wherein the likelihood function for the category Ci and a category Ck in the collection which has previously been assigned to the supercategory Sj manually, is the log likelihood ratio L(Ci, Ck) for the category Ci and the category Ck,
$$L(C_i, C_k) = \log\left(\frac{\sum_{m=1}^{M_0} \Pi_m(C_i, C_k)}{\sum_{m=1}^{M_0} \Pi_m(C_i)}\right)$$
where
Πm(Ci, Ck) = 1 if item m is assigned to category Ci and to category Ck, and = 0 otherwise,
Πm(Ci) = 1 if item m is assigned to category Ci, and = 0 otherwise, and
M0 = the number of items which are assigned to the category Ci.
-
14. The method of claim 13, wherein the total score T(Ci, Sj) for the category Ci for the supercategory Sj is
$$T(C_i, S_j) = \frac{\sum_{k=1}^{K_0} L(C_i, C_k)}{K_0}$$
where K0 = the number of categories in the collection assigned to supercategory Sj manually.
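The assignment procedure of claims 12 through 14 can be sketched as follows. This is a minimal illustration under stated assumptions: each category is represented as a Python set of the items assigned to it, the log likelihood ratio is computed as log(|items(Ci) ∩ items(Ck)| / |items(Ci)|), and a pair with no shared items is given a score of negative infinity, a convention the claims do not address. All function names are hypothetical.

```python
import math


def log_likelihood_ratio(items_i, items_k):
    """L(Ci, Ck): log of the fraction of Ci's items that are also in Ck."""
    shared = len(items_i & items_k)
    if shared == 0:
        # ASSUMPTION: no overlap is treated as log(0) = -infinity.
        return float("-inf")
    return math.log(shared / len(items_i))


def total_score(items_i, manual_categories):
    """T(Ci, Sj): mean of L(Ci, Ck) over the K0 manually assigned categories."""
    k0 = len(manual_categories)
    return sum(log_likelihood_ratio(items_i, ck) for ck in manual_categories) / k0


def assign_supercategory(items_i, supercats):
    """Steps (b)-(g): pick the supercategory with the highest total score.

    supercats maps a supercategory name to the list of item sets of the
    categories that were manually assigned to it.
    """
    return max(supercats, key=lambda name: total_score(items_i, supercats[name]))
```

Under the negative-infinity convention, a single manually assigned category with no items in common drags the whole average to negative infinity, so a production system would probably smooth the ratio; that choice is outside what the claims recite.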
-
15. The method of claim 14, wherein there are approximately 20,000 categories.
-
16. The method of claim 15, wherein there are approximately 50 supercategories.
-
17. The method of claim 16, wherein a portion of the approximately 20,000 categories is manually assigned to supercategories.
-
18. The method of claim 17, wherein K1=0.5, K2=1.5, K3=0.5, and K4=1.0.
-
19. The method of claim 18, wherein the weight WSC assigned to the segment of a supercategory which comprises terms and term identifiers associated with the categories assigned to the supercategory manually is 1.0, and the weight WSC assigned to the segment of a supercategory which comprises terms and term identifiers associated with the categories assigned to the supercategory semi-automatically is 0.4.
-
20. A device for presenting banner advertisements to a user who is seeking information about products and/or services, comprising:
(a) means for defining a collection C0 of categories of products and/or services,
(b) means for associating, for a plurality of the categories in the collection C0 of categories, the category with a set of terms which describe the product(s) and/or service(s) associated with the category, and a unique category identifier term,
(c) means for assigning, for a plurality of providers of products and/or services to be utilized, the provider to one or more categories based upon the products and/or services offered by the provider,
(d) means for assigning a plurality of the categories to supercategories,
(e) means for associating, for a plurality of supercategories, one or more banner advertisements with the supercategory,
(f) means for associating, for a plurality of the supercategories, the supercategory with the sets of terms which describe the product(s) or service(s) associated with categories assigned to the supercategory, and the category identifier terms which are unique to categories assigned to the supercategory,
(g) means for selecting, in response to a user query Q′, comprising terms describing products and/or services of interest to the user, categories, in the collection of categories C0, which has associated with them a descriptive term for the products and/or services therein which matches a term in the user query,
(h) means for preparing a new query Q′, comprising the terms in the user query Q, the descriptive terms for the products and/or services associated with the categories selected, and the unique category identifier terms associated with the categories selected,
(i) means for applying the query Q′ to the collection of supercategories,
(j) means for selecting the supercategory with the highest score, and
(k) means for presenting to the user a banner advertisement associated with the supercategory selected.
-
21. The device of claim 20, wherein the search query Q is applied to select a supercategory from among the collection of supercategories by calculating for each supercategory a score SC based upon the occurrence in the supercategory of terms in the search query Q′.
-
22. The device of claim 21, wherein there are approximately 20,000 categories of products and/or services in the collection of categories.
-
23. The device of claim 22, wherein there are approximately 50 supercategories.
-
24. The device of claim 21, wherein every category, in the collection of categories C0, which has associated with it a descriptive term for the products and/or services therein which matches a term in the user query, is presented to the user, and the user is permitted to select from among said categories a category of interest for which a list of the merchants, stores or other sources of product(s) and/or service(s) associated with the category will be presented.
-
25. The device of claim 24, further comprising,
(a) means for preparing, in response to the user selecting a category from those presented, a new query Q″, comprising the descriptive terms for the product(s) and/or service(s) associated with the category selected, and the unique category identifier term associated with the category selected,
(b) means for applying the query Q″ to the collection of supercategories,
(c) means for selecting the supercategory with the highest score, and
(d) means for presenting the user with a banner advertisement associated with the supercategory selected.
-
26. The device of claim 21, wherein the query Q′ is applied to the collection of supercategories by utilizing Robertson's term frequency score, such that the score for a supercategory SC under the query Q′ is determined by:
$$S_C = \sum_{T=1}^{T_0} \mathrm{TF}_{TD} \cdot \mathrm{IDF}_T$$
where T0 is the number of terms which occur in the query Q′,
TFTD is Robertson's term frequency for term T in supercategory SC,
$$\mathrm{TF}_{TD} = \frac{N_{TC}}{N_{TC} + K_1 + K_2 \cdot (L_C / L_0)}$$
where NTC is the number of times the term T occurs in supercategory SC, LC is the length of supercategory SC, L0 is the average length of a supercategory, and K1 and K2 are constants, and
$$\mathrm{IDF}_T = \frac{\log\left((N + K_3) / N_T\right)}{\log(N + K_4)}$$
where N is the number of supercategories in the collection, NT is the number of supercategories containing the term T, and K3 and K4 are constants.
-
27. The device of claim 26, wherein K1=0.5, K2=1.5, K3=0.5, and K4=1.0.
-
28. The device of claim 21, further comprising:
(a) means for assigning categories to supercategories by a plurality of methods,
(b) means for considering supercategories to comprise multiple segments,
(c) means for causing segments of supercategories to comprise terms and term identifiers associated with categories assigned to the supercategory by a single method,
(d) means for assigning segments of supercategories weights WSC,
(e) means for assigning terms in a segment weights WSTC, and
(f) means for using a generalized term frequency score in applying the query Q′ to the collection of supercategories, such that the score SC for a supercategory with respect to the query Q′ is calculated as follows:
$$S_C = \sum_{T=1}^{T_0} \sum_{S=1}^{S_0} \mathrm{TF}_{STC} \cdot \mathrm{IDF}_{ST}$$
where SC is the total score for the supercategory SC,
T0 is the number of terms which occur in the query Q′,
S0 is the number of segments in the supercategory SC,
TFSTC = Robertson's generalized term frequency score for term T in segment Si of supercategory SC,
$$\mathrm{TF}_{STC} = \frac{G_{STC}}{G_{STC} + K_1 + K_2 \cdot W_{SC} \cdot (H_{SC} / H_{SO})}$$
where GSTC = the generalized term count for term T in segment Si of supercategory SC,
$$G_{STC} = W_{SC} \cdot W_{STC} \cdot N_{STC}$$
where WSC is the weight assigned to segment Si of the supercategories, WSTC is the weight assigned to term T in segment Si of supercategory SC, and NSTC is the number of times the term T occurs in segment Si of supercategory SC,
HSC = the generalized length of segment Si of supercategory SC, where LSC is the number of different terms in segment Si of supercategory SC,
HSO = the generalized average length of segment Si of the supercategories, where C0 is the number of supercategories, and K1 and K2 are constants, and
IDFST = the generalized inverted document frequency for term T,
$$\mathrm{IDF}_{ST} = \frac{\log\left((C_0 + K_3) / C_{ST}\right)}{\log(C_0 + K_4)}$$
where C0 is the number of supercategories, CST is the number of supercategories containing the term T in the segment Si, and K3 and K4 are constants.
-
29. The device of claim 28, wherein K1=0.5, K2=1.5, K3=0.5, and K4=1.0.
-
30. The device of claim 28, further comprising means for assigning certain of the categories to supercategories manually, and means for assigning the remainder utilizing an automatic or semi-automatic index term augmentation technique based upon the co-occurrence of terms between the manually-assigned categories and the categories being automatically or semi-automatically assigned.
-
31. The device of claim 30, wherein the means for assigning the remainder of the terms not manually assigned to supercategories further comprises
(a) means for selecting a category Ci from among the categories in the collection not yet assigned to supercategories which has not yet been processed,
(b) means for selecting a supercategory Sj from among the set of supercategories,
(c) means for calculating a likelihood function for the category Ci and a category Ck in the collection which has previously been assigned to the supercategory Sj manually, which likelihood function is based upon the likelihood that a term occurring in the category Ci also occurs in the category Ck,
(d) means for repeating step (c) for a plurality of other categories Ck in the collection which have previously been assigned to the supercategory Sj manually,
(e) means for calculating a total score for the category Ci for the supercategory Sj, which total score is based upon the likelihood functions for the category Ci and the categories Ck in the collection which have previously been assigned to the supercategory Sj manually,
(f) means for repeating steps (b)-(e) for a plurality of other supercategories Sj,
(g) means for assigning category Ci to the supercategory for which the total score calculated for the category Ci is the highest, and
(h) means for repeating steps (a)-(g) for a plurality of other categories in the collection which have not yet been assigned to supercategories and which have not yet been processed.
-
32. The device of claim 31, wherein the likelihood function for the category Ci and a category Ck in the collection which has previously been assigned to the supercategory Sj manually, is the log likelihood ratio L(Ci, Ck) for the category Ci and the category Ck,
$$L(C_i, C_k) = \log\left(\frac{\sum_{m=1}^{M_0} \Pi_m(C_i, C_k)}{\sum_{m=1}^{M_0} \Pi_m(C_i)}\right)$$
where
Πm(Ci, Ck) = 1 if item m is assigned to category Ci and to category Ck, and = 0 otherwise,
Πm(Ci) = 1 if item m is assigned to category Ci, and = 0 otherwise, and
M0 = the number of items which are assigned to the category Ci.
-
33. The device of claim 32, wherein the total score T(Ci, Sj) for the category Ci for the supercategory Sj is
$$T(C_i, S_j) = \frac{\sum_{k=1}^{K_0} L(C_i, C_k)}{K_0}$$
where K0 = the number of categories in the collection assigned to supercategory Sj manually.
-
34. The device of claim 33, wherein there are approximately 20,000 categories.
-
35. The device of claim 34, wherein there are approximately 50 supercategories.
-
36. The device of claim 35, wherein some of the 20,000 categories are manually assigned to supercategories.
-
37. The device of claim 36, wherein K1=0.5, K2=1.5, K3=0.5, and K4=1.0.
-
38. The device of claim 37, wherein the weight WSC assigned to the segment of a supercategory which comprises terms and term identifiers associated with the categories assigned to the supercategory manually is 1.0, and the weight WSC assigned to the segment of a supercategory which comprises terms and term identifiers associated with the categories assigned to the supercategory semi-automatically is 0.4.
Specification