Sort system for merging database entries
Abstract
The present invention is a method for operating a computer system to minimize the number of disk storage access operations used in creating an inverted database. This method divides a database into several smaller subdatabases. The documents of the subdatabases are decomposed into subdocuments. A postings list for each subdatabase is then created in which all the terms for the subdatabase are associated with the identity of each subdocument of the subdatabase in which the terms occur. The resulting postings lists for the subdatabases are then merged. The merge process sorts the postings of the subdatabases and merges common terms. The non-common terms are merged after the common terms. The process of sorting the postings list and then merging the common terms followed by the non-common terms minimizes the number of disk storage access operations required for creating the inverted database from a series of inverted subdatabases.
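As a rough illustration of the division and inversion steps described in the abstract, the following Python sketch splits a small database into subdatabases, decomposes each document into subdocuments, and builds one postings list per subdatabase. The function names, the fixed-size word windows used as subdocuments, and the in-memory dictionaries are assumptions made for this example; the abstract does not say how subdocuments are delimited or how postings are laid out on disk.

```python
from collections import defaultdict


def split_into_subdocuments(text, size=4):
    """Assumption: a subdocument is a fixed-size window of words."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def invert_subdatabase(documents, first_id=0):
    """Build a postings list: term -> sorted list of subdocument identifiers."""
    postings = defaultdict(set)
    next_id = first_id
    for text in documents:
        for subdoc in split_into_subdocuments(text):
            for term in subdoc:
                postings[term].add(next_id)
            next_id += 1
    inverted = {term: sorted(ids) for term, ids in postings.items()}
    return inverted, next_id


def invert_database(database, docs_per_subdatabase=2):
    """Divide the database into subdatabases and invert each one separately."""
    inverted_subdatabases = []
    next_id = 0
    for start in range(0, len(database), docs_per_subdatabase):
        chunk = database[start:start + docs_per_subdatabase]
        inverted, next_id = invert_subdatabase(chunk, next_id)
        inverted_subdatabases.append(inverted)
    return inverted_subdatabases


# Hypothetical three-document database, split into two subdatabases.
database = [
    "the quick brown fox jumps over the lazy dog",
    "a quick sort of the list",
    "merge the sorted lists",
]
subdbs = invert_database(database)
```

Each element of `subdbs` is an inverted subdatabase that would then be merged with the others; subdocument identifiers are numbered consecutively across subdatabases here so that they stay unique after merging, which is also an assumption of the sketch.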
8 Claims
1. A method for merging databases, comprising:
identifying common terms that exist in a first and a second inverted subdatabase;
placing subdocument identifiers of said common terms for said second inverted subdatabase into said first inverted subdatabase;
placing subdocument identifiers of non-common terms for said second inverted subdatabase into said first inverted subdatabase after said placement of said common terms; and
sorting said subdocument identifiers from said second inverted subdatabase in an order corresponding to an order of said subdocument identifiers of said first inverted subdatabase.
Dependent claims: 2
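The merge order recited in claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each inverted subdatabase is modeled as an in-memory dictionary mapping a term to a list of subdocument identifiers, the "order corresponding to" the first subdatabase is taken to be ascending numeric order, and `merge_inverted` is a hypothetical name, not one used by the patent.

```python
def merge_inverted(first, second):
    """Merge `second` into `first`; both map term -> list of subdocument IDs."""
    # Identify which terms of the second subdatabase already exist in the first.
    common = [term for term in second if term in first]
    non_common = [term for term in second if term not in first]

    # Step 1: common terms. Add the second subdatabase's identifiers to the
    # existing entry, sorted to match the (assumed ascending) order of the first.
    for term in common:
        first[term] = sorted(set(first[term]) | set(second[term]))

    # Step 2: non-common terms, placed only after all common terms are done.
    for term in non_common:
        first[term] = sorted(second[term])

    return first


# Hypothetical inverted subdatabases.
first = {"merge": [0, 2], "sort": [1]}
second = {"sort": [4, 3], "index": [5]}
merged = merge_inverted(first, second)
```

Here `sort` is a common term, so its identifiers are combined into the existing entry first; `index` is a non-common term and is added afterwards as a new entry.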
3. A method for merging databases, comprising:
identifying common terms that exist in a first and a second inverted subdatabase;
placing subdocument identifiers of said common terms for said second inverted subdatabase into said first inverted subdatabase; and
placing subdocument identifiers of non-common terms for said second inverted subdatabase into said first inverted subdatabase after said placement of said common terms,
wherein said merging of said inverted subdatabases comprises
selecting terms from a second inverted subdatabase to be merged into a first inverted subdatabase;
identifying a second inverted subdatabase index for each of said selected terms in said second inverted subdatabase;
translating said second inverted subdatabase index into a first inverted subdatabase index when said term in said second subdatabase exists in said first subdatabase;
sorting said second inverted subdatabase by said index; and
placing entries from said second inverted subdatabase into said first inverted subdatabase by said index.
Dependent claims: 4
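The index translation recited in claim 3 might look roughly like the Python sketch below. It assumes that each subdatabase keeps a term table in which a term's position is its index, with a parallel list of postings, and that the two subdatabases hold disjoint subdocument identifiers. Giving non-common terms new indexes past the end of the first subdatabase's table is also an assumption, chosen so that sorting by index places common terms before non-common ones. `merge_by_index` is a hypothetical name.

```python
def merge_by_index(first_terms, first_postings, second_terms, second_postings):
    """Merge the second subdatabase into the first using term indexes.

    *_terms: list of terms; a term's position in the list is its index.
    *_postings: list of subdocument-ID lists, parallel to the term list.
    """
    first_index = {term: i for i, term in enumerate(first_terms)}

    entries = []
    for second_idx, term in enumerate(second_terms):
        if term in first_index:
            # Translate the second subdatabase's index into the first's index.
            target = first_index[term]
        else:
            # Assumption: a non-common term gets the next free index in the first.
            target = len(first_terms)
            first_terms.append(term)
            first_index[term] = target
            first_postings.append([])
        entries.append((target, second_postings[second_idx]))

    # Sort the second subdatabase's entries by their translated index, then
    # place them into the first subdatabase in ascending index order.
    entries.sort(key=lambda entry: entry[0])
    for target, ids in entries:
        first_postings[target] = sorted(first_postings[target] + ids)

    return first_terms, first_postings


# Hypothetical subdatabases: "sort" is common, "index" is not.
terms, postings = merge_by_index(
    ["merge", "sort"], [[0, 2], [1]],
    ["index", "sort"], [[5], [3, 4]],
)
```

Because the entries are sorted before placement, the first subdatabase is updated in a single ascending pass over its indexes, which is presumably the access pattern the abstract credits with reducing disk storage access operations.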
5. A system for retrieving documents from a database, comprising:
a computer coupled to a disk storage unit, said disk storage unit stores a database, said computer divides said database into a plurality of subdatabases stored on said disk storage unit, said subdatabases being formed from a plurality of documents from said database;
said computer inverts each of said subdatabases by dividing each document of said subdatabase into subdocuments wherein each subdocument has an identifier and relating each term of said subdocument with each subdocument in which said term appears by said subdocument identifier;
said computer merges said inverted subdatabases by identifying common terms that exist in a first and a second inverted subdatabase;
said computer merges said inverted database by placing subdocument identifiers of said common terms for said second inverted subdatabase into said first inverted subdatabase; and
said computer merges said inverted database by placing subdocument identifiers of non-common terms for said second inverted subdatabase into said first inverted subdatabase after said placement of said common terms,
wherein said computer sorts said subdocument identifiers from said second inverted subdatabase in an order corresponding to an order of said subdocument identifiers of said first inverted subdatabase.
Dependent claims: 6, 7, 8
Specification