Computer-implemented method, computer program product and system for creating an index of a subset of data
Abstract
A computer-implemented method for transforming an inverted index of a collection of documents into a smaller inverted index. The smaller index contains links to all, and only, those documents appearing in a subset of the original collection. The method avoids reprocessing the subset to create the smaller inverted index by intersecting each inverted list with the list of document references from the desired subset. If this intersection is empty, the list is omitted from the new, smaller index; otherwise, a list containing only the intersected references is included in it. The method also extends to creating multiple smaller inverted indexes and to propagating updates in the first collection of documents down into the smaller inverted index or indexes.
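The intersection step described in the abstract can be illustrated with a minimal sketch. It assumes the inverted index is held in memory as a mapping from a term to a list of integer document identifiers. The names `InvertedIndex`, `restrict_index`, `full_index` and `small_index` are hypothetical and do not come from the patent, and the patent's own data structures may differ.

```python
from typing import Dict, Iterable, List

# Assumed representation: term -> list of document identifiers (postings).
InvertedIndex = Dict[str, List[int]]


def restrict_index(index: InvertedIndex, subset_ids: Iterable[int]) -> InvertedIndex:
    """Build a smaller inverted index from an existing index and subset identifiers.

    The documents themselves are never read: only the first index and the
    identifiers of the subset are consulted.
    """
    wanted = set(subset_ids)
    restricted: InvertedIndex = {}
    for term, postings in index.items():
        # Intersect the inverted list with the subset's document references.
        kept = [doc_id for doc_id in postings if doc_id in wanted]
        # An empty intersection means the term is dropped from the new index.
        if kept:
            restricted[term] = kept
    return restricted


# Illustrative data only.
full_index = {
    "apple": [1, 2, 4],
    "banana": [2, 3],
    "cherry": [3],
}
small_index = restrict_index(full_index, [1, 2])
print(small_index)  # {'apple': [1, 2], 'banana': [2]}
```

In this sketch the term "cherry" disappears because its only document lies outside the subset, while "apple" and "banana" keep just the references that fall inside it.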
11 Claims
1. A computer-implemented method for creating an inverted index of a subset of data, comprising:
- executing in a computer system processor a software component of a computer program product, wherein said executing of the software component configures the computer system to perform:
  a first step of receiving from computer storage:
    a set of data;
    a first inverted index of said set of data; and
    identifiers of a subset of data of said set; and
  a second step of creating, without reprocessing the data identified by said identifiers, a second inverted index using only the first inverted index and said identifiers, said second inverted index being both an inverted index of said subset of data and a subset of said first inverted index, wherein said second step of creating comprises comparing the first inverted index with said identifiers, and wherein said second step does not make use of data of the set which are not in the subset or identifiers of data of the set which are not in the subset.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A computer program product on a computer readable storage medium, the computer program product including a software component executable by a computer system processor to perform the steps of:
- a first step of receiving from computer storage:
    a set of data;
    a first inverted index of said set of data; and
    identifiers of a subset of data of said set; and
- a second step of creating, without reprocessing the data identified by said identifiers, a second inverted index using only the first inverted index and said identifiers, said second inverted index being both an inverted index of said subset of data and a subset of said first inverted index, wherein said second step of creating comprises comparing the first inverted index with said identifiers, wherein said second step does not make use of data of the set which are not in the subset or identifiers of data of the set which are not in the subset.
11. A computer system comprising:
- computer storage; and
- a computer program product including a software component executable by a computer system processor to perform the steps of:
  a first step of receiving from said computer storage:
    a set of data;
    a first inverted index of said set of data; and
    identifiers of a subset of data of said set; and
  a second step of creating, without reprocessing the data identified by said identifiers, a second inverted index using only the first inverted index and said identifiers, said second inverted index being both an inverted index of said subset of data and a subset of said first inverted index, wherein said second step of creating comprises comparing the first inverted index with said identifiers, and wherein said second step does not make use of data of the set which are not in the subset or identifiers of data of the set which are not in the subset.
Specification