Semi-automatic index term augmentation in document retrieval
Abstract
Disclosed are methods and systems for indexing or retrieving materials accessible through computer networks.
70 Claims

1. A method for assigning index terms to a document Di in a collection of documents, where other documents in the collection have previously had index terms assigned by another method, comprising:
(a) selecting a term Ij from among a set of terms from which the index terms are being assigned, which term Ij has not yet been processed,
(b) calculating a likelihood function for the document Di and a document Dk in the collection to which the term Ij has previously been assigned as an index term by another method, which likelihood function is based upon the likelihood that a term occurring in the document Di also occurs in the document Dk,
(c) repeating step (b) for a plurality of other documents Dk in the collection to which the term Ij has previously been assigned as an index term by another method,
(d) calculating a total score for the Document Di for the Index Term Ij, which total score is based upon the likelihood functions for the document Di and the documents Dk in the collection to which the term Ij has previously been assigned as an index term by another method,
(e) repeating steps (a)-(d) for a plurality of other terms Ij from among the set of terms from which index terms are being assigned, and
(f) choosing index terms to be assigned to Document Di, from among the set of terms Ij from which index terms are being assigned, based upon the total scores calculated for the Document Di for the Index Terms Ij.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 11
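Steps (a)-(f) of claim 1 can be illustrated with a minimal Python sketch. The claim does not fix a particular likelihood function or scoring rule, so everything concrete here is an assumption made for illustration: documents are modelled as sets of terms, `overlap_likelihood` is taken to be the fraction of Di's distinct terms that also occur in Dk, the total score is the sum of those likelihoods, and the hypothetical `choose_index_terms` keeps the `top_n` best-scoring terms.

```python
from typing import Dict, List, Set, Tuple


def overlap_likelihood(doc_i: Set[str], doc_k: Set[str]) -> float:
    """Assumed likelihood: share of Di's distinct terms that also occur in Dk."""
    if not doc_i:
        return 0.0
    return len(doc_i & doc_k) / len(doc_i)


def score_index_terms(
    doc_i: Set[str],
    indexed_docs: List[Tuple[Set[str], Set[str]]],  # (terms of Dk, index terms of Dk)
    candidate_terms: List[str],
) -> Dict[str, float]:
    """Total score of every candidate index term Ij for document Di."""
    scores: Dict[str, float] = {}
    for term in candidate_terms:                      # steps (a) and (e)
        total = 0.0
        for doc_k, assigned in indexed_docs:          # steps (b) and (c)
            if term in assigned:                      # only Dk already indexed with Ij
                total += overlap_likelihood(doc_i, doc_k)
        scores[term] = total                          # step (d): sum is an assumption
    return scores


def choose_index_terms(scores: Dict[str, float], top_n: int = 2) -> List[str]:
    """Step (f): keep the top_n terms with a non-zero score (assumed rule)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked[:top_n] if scores[t] > 0]


if __name__ == "__main__":
    doc_i = {"heart", "surgery", "valve"}
    indexed_docs = [
        ({"heart", "valve", "repair"}, {"cardiology"}),
        ({"tax", "income"}, {"finance"}),
        ({"heart", "attack"}, {"cardiology", "emergency"}),
    ]
    scores = score_index_terms(doc_i, indexed_docs, ["cardiology", "finance", "emergency"])
    print(choose_index_terms(scores))
```

Other aggregation rules (for example a maximum or a threshold on the total score) would fit the claim language equally well; the sum is used only because it is the simplest choice.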

12. A method for assigning index terms to documents in a collection of documents, comprising:
(a) manually pre-assigning index terms to a subset of the documents in the collection,
(b) selecting a document Di from among the documents in the collection to which index terms have not yet been assigned, which document Di has not yet been processed,
(c) selecting a term Ij from among a set of terms from which index terms are being assigned, which term Ij has not yet been processed,
(d) calculating a likelihood function for the document Di and a document Dk in the collection to which the term Ij has previously been assigned as an index term manually, which likelihood function is based upon the likelihood that a term occurring in the document Di also occurs in the document Dk,
(e) repeating step (d) for a plurality of other documents Dk in the collection to which the term Ij has previously been assigned as an index term manually,
(f) calculating a total score for the Document Di for the Index Term Ij, which total score is based upon the likelihood functions for the Document Di and the Documents Dk in the collection to which the term Ij has previously been assigned as an index term manually,
(g) repeating steps (c)-(f) for a plurality of other terms Ij from among the set of terms from which index terms are being assigned,
(h) choosing index terms to be assigned to Document Di, from among the set of terms Ij from which index terms are to be assigned, based upon the total scores calculated for the Document Di for the Index Terms Ij, and
(i) repeating steps (b)-(h) for a plurality of other documents in the collection to which index terms have not yet been assigned which have not yet been processed.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
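Claim 12 wraps the per-document scoring in an outer loop over all documents that have no index terms yet, using only the manually indexed subset as reference documents Dk. The sketch below is one hedged reading of steps (a)-(i). The set-overlap likelihood, the summed total score, the `top_n` cut-off, and the names `augment_index`, `docs`, and `manual_index` are assumptions for illustration and do not come from the patent.

```python
from typing import Dict, List, Set


def overlap_likelihood(doc_i: Set[str], doc_k: Set[str]) -> float:
    """Assumed likelihood: share of Di's distinct terms that also occur in Dk."""
    if not doc_i:
        return 0.0
    return len(doc_i & doc_k) / len(doc_i)


def augment_index(
    docs: Dict[str, Set[str]],          # document id -> terms occurring in it
    manual_index: Dict[str, Set[str]],  # step (a): document id -> manually assigned index terms
    candidate_terms: List[str],
    top_n: int = 1,
) -> Dict[str, List[str]]:
    """Assign index terms to every document that has no manual index terms."""
    assigned: Dict[str, List[str]] = {}
    for doc_id, doc_i in docs.items():               # steps (b) and (i)
        if doc_id in manual_index:
            continue                                 # already indexed by hand
        scores: Dict[str, float] = {}
        for term in candidate_terms:                 # steps (c) and (g)
            scores[term] = sum(                      # steps (d)-(f), sum assumed
                overlap_likelihood(doc_i, docs[k])
                for k, terms in manual_index.items()
                if term in terms
            )
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        assigned[doc_id] = [t for t in ranked[:top_n] if scores[t] > 0]  # step (h)
    return assigned


if __name__ == "__main__":
    docs = {
        "d1": {"heart", "valve"},
        "d2": {"tax", "income"},
        "d3": {"heart", "surgery"},
        "d4": {"income", "audit"},
    }
    manual_index = {"d1": {"cardiology"}, "d2": {"finance"}}
    print(augment_index(docs, manual_index, ["cardiology", "finance"]))
```

In this sketch the automatically assigned terms are not fed back as reference data, which matches the claim's restriction to documents Dk that were indexed manually.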

23. A method for assigning categories of items to supercategories, comprising:
(a) assigning a subset of the categories in the collection to supercategories manually,
(b) selecting a category Ci from among the categories in the collection not yet assigned to supercategories which has not yet been processed,
(c) calculating a likelihood function Lik for the category Ci and a category Ck in the collection which has previously been assigned to a supercategory Sj manually, which likelihood function is based upon the likelihood that a term occurring in the category Ci also occurs in the category Ck,
(d) repeating step (c) for a plurality of other categories Ck in the collection which have previously been assigned to a supercategory Sj manually,
(e) assigning the category Ci to a supercategory Sj based on the likelihood functions Lik that a term occurring in the category Ci also occurs in the category Ck which is assigned to supercategory Sj, and
(f) repeating steps (b)-(e) for a plurality of other categories in the collection which have not yet been assigned to supercategories and which have not yet been processed.
Dependent claims: 24, 25, 26
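Claim 23 says only that the assignment is "based on" the likelihood functions Lik, without naming a decision rule. One possible reading, sketched below purely as an assumption, is a nearest-neighbour rule: Ci receives the supercategory of the manually labelled category Ck with the highest Lik. The set-overlap likelihood and the function name `nearest_supercategory` are likewise illustrative choices, not taken from the patent.

```python
from typing import List, Optional, Set, Tuple


def overlap_likelihood(cat_i: Set[str], cat_k: Set[str]) -> float:
    """Assumed L_ik: share of Ci's distinct terms that also occur in Ck."""
    if not cat_i:
        return 0.0
    return len(cat_i & cat_k) / len(cat_i)


def nearest_supercategory(
    cat_i: Set[str],
    labeled: List[Tuple[Set[str], str]],  # (terms of Ck, supercategory Sj of Ck)
) -> Optional[str]:
    """Return the supercategory of the best-matching labelled category, if any."""
    best_label: Optional[str] = None
    best_likelihood = 0.0
    for cat_k, super_j in labeled:                    # steps (c) and (d)
        l_ik = overlap_likelihood(cat_i, cat_k)
        if l_ik > best_likelihood:                    # assumed rule for step (e)
            best_label, best_likelihood = super_j, l_ik
    return best_label                                 # None when nothing overlaps


if __name__ == "__main__":
    labeled = [
        ({"shoe", "run", "sport"}, "Sports"),
        ({"computer", "laptop", "screen"}, "Electronics"),
    ]
    print(nearest_supercategory({"screen", "display", "computer"}, labeled))
```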

27. A method for assigning categories of items to supercategories, comprising:
(a) assigning a subset of the categories in the collection to supercategories manually,
(b) selecting a category Ci from among the categories in the collection not yet assigned to supercategories which has not yet been processed,
(c) selecting a supercategory Sj from among the set of supercategories,
(d) calculating a likelihood function for the category Ci and a category Ck in the collection which has previously been assigned to the supercategory Sj manually, which likelihood function is based upon the likelihood that a term occurring in the category Ci also occurs in the category Ck,
(e) repeating step (d) for a plurality of other categories Ck in the collection which have previously been assigned to the supercategory Sj manually,
(f) calculating a total score for the category Ci for the supercategory Sj, which total score is based upon the likelihood functions for the category Ci and the categories Ck in the collection which have previously been assigned to the supercategory Sj manually,
(g) repeating steps (c)-(f) for a plurality of other supercategories Sj,
(h) assigning category Ci to the supercategory for which the total score calculated for the category Ci is the highest, and
(i) repeating steps (b)-(h) for a plurality of other categories in the collection which have not yet been assigned to supercategories and which have not yet been processed.
Dependent claims: 10, 28, 29, 30, 31, 32, 33, 34, 35
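Unlike claim 23, claim 27 spells out the decision rule: compute a total score per supercategory and pick the highest. The following sketch illustrates steps (a)-(i) under stated assumptions. Categories are modelled as sets of terms, the likelihood is the share of Ci's terms that also occur in Ck, and the total score is the sum of likelihoods; none of these specifics, nor the names `assign_supercategories`, `categories`, and `manual`, are given in the claim.

```python
from typing import Dict, Set


def overlap_likelihood(cat_i: Set[str], cat_k: Set[str]) -> float:
    """Assumed likelihood: share of Ci's distinct terms that also occur in Ck."""
    if not cat_i:
        return 0.0
    return len(cat_i & cat_k) / len(cat_i)


def assign_supercategories(
    categories: Dict[str, Set[str]],  # category name -> terms occurring in it
    manual: Dict[str, str],           # step (a): category name -> supercategory
) -> Dict[str, str]:
    """Assign each unlabelled category to its highest-scoring supercategory."""
    supercats = sorted(set(manual.values()))          # sorted for deterministic ties
    result: Dict[str, str] = {}
    for name, terms_i in categories.items():          # steps (b) and (i)
        if name in manual:
            continue                                  # already assigned by hand
        totals = {s: 0.0 for s in supercats}          # steps (c) and (g)
        for k, super_j in manual.items():             # steps (d) and (e)
            totals[super_j] += overlap_likelihood(terms_i, categories[k])  # step (f)
        result[name] = max(supercats, key=lambda s: totals[s])            # step (h)
    return result


if __name__ == "__main__":
    categories = {
        "running shoes": {"shoe", "run", "sport"},
        "tennis rackets": {"racket", "tennis", "sport"},
        "laptops": {"computer", "laptop", "screen"},
        "golf clubs": {"golf", "club", "sport"},
        "monitors": {"screen", "display", "computer"},
    }
    manual = {
        "running shoes": "Sports",
        "tennis rackets": "Sports",
        "laptops": "Electronics",
    }
    print(assign_supercategories(categories, manual))
```

A summed score favours supercategories with many labelled members; averaging the likelihoods instead would be an equally plausible choice within the claim language.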

36. A device for assigning index terms to a document Di in a collection of documents, where other documents in the collection have previously had index terms assigned by another method, comprising:
(a) means for selecting a term Ij from among a set of terms from which the index terms are being assigned, which has not yet been processed,
(b) means for calculating a likelihood function for the document Di and a document Dk in the collection to which the term Ij has previously been assigned as an index term by another method, which likelihood function is based upon the likelihood that a term occurring in the document Di also occurs in the document Dk,
(c) means for repeating step (b) for a plurality of other documents Dk in the collection to which the term Ij has previously been assigned as an index term by another method,
(d) means for calculating a total score for the Document Di for the Index Term Ij, which total score is based upon the likelihood functions for the document Di and the documents Dk in the collection to which the term Ij has previously been assigned as an index term by another method,
(e) means for repeating steps (a)-(d) for a plurality of other terms Ij from among the set of terms from which index terms are being assigned, and
(f) means for choosing index terms to be assigned to Document Di, from among the set of terms from which index terms are being assigned, based upon the total scores calculated for the Document Di for the Index Terms Ij.
Dependent claims: 37, 38, 39, 40, 41, 42, 43, 44, 45, 46

47. A device for assigning index terms to documents in a collection of documents, comprising:
(a) means for manually pre-assigning index terms to a subset of the documents in the collection,
(b) means for selecting a document Di from among the documents in the collection to which index terms have not yet been assigned, which document Di has not yet been processed,
(c) means for selecting a term Ij from among a set of terms from which index terms are being assigned, which term Ij has not yet been processed,
(d) means for calculating a likelihood function for the document Di and a document Dk in the collection to which the term Ij has previously been assigned as an index term manually, which likelihood function is based upon the likelihood that a term occurring in the document Di also occurs in the document Dk,
(e) means for repeating step (d) for a plurality of other documents Dk in the collection to which the term Ij has previously been assigned as an index term manually,
(f) means for calculating a total score for the Document Di for the Index Term Ij which total score is based upon the likelihood functions for the Document Di and the Documents Dk in the collection to which the term Ij has previously been assigned as an index term manually,
(g) means for repeating steps (c)-(f) for a plurality of other terms Ij from among the set of terms from which index terms are being assigned,
(h) means for choosing index terms to be assigned to Document Di, from among the set of terms from which index terms are to be assigned, based upon the total scores calculated for the Document Di for the Index Terms Ij, and
(i) means for repeating steps (b)-(h) for a plurality of other documents in the collection to which index terms have not yet been assigned which have not yet been processed.
Dependent claims: 48, 49, 50, 51, 52, 53, 54, 55, 56, 57

58. A device for assigning categories of items to supercategories, comprising:
(a) means for assigning a subset of the categories in the collection to supercategories manually,
(b) means for selecting a category Ci from among the categories in the collection not yet assigned to supercategories which has not yet been processed,
(c) means for calculating a likelihood function Lik for the category Ci and a category Ck in the collection which has previously been assigned to a supercategory Sj manually, which likelihood function is based upon the likelihood that a term occurring in the category Ci also occurs in the category Ck,
(d) means for repeating step (c) for a plurality of other categories Ck in the collection which have previously been assigned to a supercategory Sj manually,
(e) means for assigning the category Ci to a supercategory Sj based on the likelihood functions Lik that a term occurring in the category Ci also occurs in the category Ck which is assigned to supercategory Sj, and
(f) means for repeating steps (b)-(e) for a plurality of other categories in the collection which have not yet been assigned to supercategories and which have not yet been processed.
Dependent claims: 59, 60, 61

62. A device for assigning categories of items to supercategories, comprising:
(a) means for assigning a subset of the categories in the collection to supercategories manually,
(b) means for selecting a category Ci from among the categories in the collection not yet assigned to supercategories which has not yet been processed,
(c) means for selecting a supercategory Sj from among the set of supercategories,
(d) means for calculating a likelihood function for the category Ci and a category Ck in the collection which has previously been assigned to the supercategory Sj manually, which likelihood function is based upon the likelihood that a term occurring in the category Ci also occurs in the category Ck,
(e) means for repeating step (d) for a plurality of other categories Ck in the collection which have previously been assigned to the supercategory Sj manually,
(f) means for calculating a total score for the category Ci for the supercategory Sj, which total score is based upon the likelihood functions for the category Ci and the categories Ck in the collection which have previously been assigned to the supercategory Sj manually,
(g) means for repeating steps (c)-(f) for a plurality of other supercategories Sj,
(h) means for assigning category Ci to the supercategory for which the total score calculated for the category Ci is the highest, and
(i) means for repeating steps (b)-(h) for a plurality of other categories in the collection which have not yet been assigned to supercategories and which have not yet been processed.
Dependent claims: 63, 64, 65, 66, 67, 68, 69, 70

Specification