Sort system for merging database entries
Abstract
The present invention is a method for operating a computer system to minimize the number of disk storage access operations used in creating an inverted database. This method divides a database into several smaller subdatabases. The documents of the subdatabases are decomposed into subdocuments. A postings list for each subdatabase is then created in which all the terms for the subdatabase are associated with the identity of each subdocument of the subdatabase in which the terms occur. The resulting postings lists for the subdatabases are then merged. The merge process sorts the postings of the subdatabases and merges common terms. The non-common terms are merged after the common terms. The process of sorting the postings list and then merging the common terms followed by the non-common terms minimizes the number of disk storage access operations required for creating the inverted database from a series of inverted subdatabases.
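The inversion step described in the abstract can be illustrated with a minimal Python sketch. The fixed-size word-window rule for forming subdocuments, the names `split_into_subdocuments` and `invert_subdatabase`, and the `size` parameter are assumptions made for illustration only; the abstract does not state how subdocuments are delimited or how the postings list is laid out on disk.

```python
from collections import defaultdict


def split_into_subdocuments(text, size=5):
    """Split a document into windows of `size` words (an assumed rule)."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def invert_subdatabase(documents, first_id=0, size=5):
    """Build a postings list for one subdatabase.

    Returns (postings, next_id), where postings maps each term to the
    list of subdocument identifiers in which the term occurs.
    """
    postings = defaultdict(list)
    next_id = first_id
    for text in documents:
        for subdoc in split_into_subdocuments(text, size):
            # Record each distinct term once per subdocument.
            for term in set(subdoc):
                postings[term].append(next_id)
            next_id += 1
    return dict(postings), next_id
```

Because identifiers are assigned in increasing order, each term's list comes out already sorted, and passing the returned `next_id` as `first_id` for the next subdatabase keeps identifiers unique across subdatabases.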
8 Claims
1. A method for merging databases, comprising:
identifying common terms that exist in a first and a second inverted subdatabase;
placing subdocument identifiers of said common terms for said second inverted subdatabase into said first inverted subdatabase;
placing subdocument identifiers of non-common terms for said second inverted subdatabase into said first inverted subdatabase after said placement of said common term subdocument identifiers; and
sorting said subdocument identifiers from said second inverted subdatabase in an order corresponding to an order of said subdocument identifiers of said first inverted subdatabase.
2. A method for merging databases, as in claim 1, wherein:
said common terms are sorted in a heap sort process prior to placement in said first inverted subdatabase.
3. A method for merging databases, as in claim 1, wherein:
said merging of said inverted subdatabases comprises selecting terms from a second inverted subdatabase to be merged into a first inverted subdatabase;
identifying a second inverted subdatabase index for each of said selected terms in said second inverted subdatabase;
translating said second inverted subdatabase index into a first inverted subdatabase index when said term in said second subdatabase exists in said first subdatabase;
sorting said second inverted subdatabase by said index; and
placing entries from said second inverted subdatabase into said first inverted subdatabase by said index.
4. A method for merging databases, as in claim 3, wherein:
a heap sort process sorts said second inverted subdatabase.
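The merge recited in claims 1 through 4 can be sketched in Python as follows. This is an illustrative reading, not the patented implementation: the parallel-list layout (a term's position in `*_terms` serves as its index), the function name `merge_subdatabases`, and the use of the standard `heapq` module as a stand-in for the claimed heap sort process are all assumptions. The sketch also assumes that every subdocument identifier in the second subdatabase is larger than every identifier in the first, so appending preserves sorted order.

```python
import heapq


def merge_subdatabases(first_terms, first_postings, second_terms, second_postings):
    """Merge the second inverted subdatabase into the first, in place.

    *_terms is a list of terms; a term's position is its index.
    *_postings[i] is the sorted list of subdocument ids for term i.
    """
    first_index = {term: i for i, term in enumerate(first_terms)}
    common, non_common = [], []
    for second_idx, term in enumerate(second_terms):
        if term in first_index:
            # Translate the second-subdatabase index into a
            # first-subdatabase index and push it onto the heap.
            heapq.heappush(common, (first_index[term], second_idx))
        else:
            non_common.append(second_idx)

    # Common terms are placed first, in first-subdatabase index order.
    while common:
        target_idx, second_idx = heapq.heappop(common)
        first_postings[target_idx].extend(second_postings[second_idx])

    # Non-common terms are placed afterwards as new entries.
    for second_idx in non_common:
        first_terms.append(second_terms[second_idx])
        first_postings.append(list(second_postings[second_idx]))
    return first_terms, first_postings
```

Popping the heap visits the common terms in ascending first-subdatabase index order, so the first subdatabase is updated in a single forward pass; under the assumed layout, that sequential pattern is what would keep disk accesses low.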
5. A system for retrieving documents from a database, comprising:
a computer coupled to a disk storage unit, said disk storage unit stores a database;
said computer divides said database into a plurality of subdatabases stored on said disk storage unit, said subdatabases being formed from a plurality of documents from said database;
said computer inverts each of said subdatabases by dividing each document of said subdatabase into subdocuments wherein each subdocument has an identifier and relating each term of said subdocument with each subdocument in which said term appears by said subdocument identifier;
said computer merges said inverted subdatabases by identifying common terms that exist in a first and a second inverted subdatabase;
said computer merges said inverted database by placing subdocument identifiers of said common terms for said second inverted subdatabase into said first inverted subdatabase;
said computer merges said inverted database by placing subdocument identifiers of non-common terms for said second inverted subdatabase into said first inverted subdatabase after said placement of said common term subdocument identifiers; and
said computer sorts said subdocument identifiers from said second inverted subdatabase in an order corresponding to an order of said subdocument identifiers of said first inverted subdatabase.
6. A system for retrieving documents from a database, as in claim 5, wherein:
said common terms are sorted in a heap sort process prior to placement in said first inverted subdatabase.
7. A system for retrieving documents from a database, as in claim 5, wherein:
said computer merges said inverted subdatabases by selecting terms from a second inverted subdatabase to be merged into a first inverted subdatabase;
said computer merges said inverted database by identifying a second inverted subdatabase index for each of said selected terms in said second inverted subdatabase;
said computer merges said inverted subdatabase by translating said second inverted subdatabase index into a first inverted subdatabase index when said term in said second subdatabase exists in said first subdatabase;
said computer merges said inverted subdatabase by sorting said second inverted subdatabase by said index; and
said computer merges said inverted subdatabase by placing entries from said second inverted subdatabase into said first inverted subdatabase by said index.
8. A system for retrieving documents from a database, as in claim 7, wherein:
a heap sort process sorts said second inverted subdatabase.
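The overall flow of the system claims (divide the database, invert each subdatabase, merge, then retrieve) can be sketched end to end in Python. All names here (`divide`, `invert`, `merge`, `build_index`, `retrieve`) and the batch size `n` are hypothetical, in-memory dictionaries stand in for the disk storage unit, and each document is treated as a single subdocument to keep the example short.

```python
from collections import defaultdict


def divide(documents, n):
    """Split the database into consecutive subdatabases of at most n documents."""
    return [documents[i:i + n] for i in range(0, len(documents), n)]


def invert(documents, first_id):
    """Invert one subdatabase; each document is one subdocument (a simplification)."""
    postings = defaultdict(list)
    for offset, text in enumerate(documents):
        for term in set(text.lower().split()):
            postings[term].append(first_id + offset)
    return dict(postings)


def merge(first, second):
    """Place common terms first, then non-common terms."""
    for term in sorted(t for t in second if t in first):
        first[term].extend(second[term])
    for term in sorted(t for t in second if t not in first):
        first[term] = list(second[term])
    return first


def build_index(documents, n=2):
    """Divide, invert, and merge to produce one inverted database."""
    merged, next_id = {}, 0
    for subdatabase in divide(documents, n):
        merged = merge(merged, invert(subdatabase, next_id))
        next_id += len(subdatabase)
    return merged


def retrieve(index, term):
    """Return the subdocument identifiers in which the term occurs."""
    return index.get(term.lower(), [])
```

For example, `retrieve(build_index(["red fox", "red hen", "blue fox", "green hen"]), "fox")` yields the identifiers of the first and third documents.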
Specification