Sort system for merging database entries

US 6,523,030 B1
Filed: 10/24/2000
Issued: 02/18/2003
Est. Priority Date: 07/25/1997
Status: Expired due to Fees


Abstract

The present invention is a method for operating a computer system to minimize the number of disk storage access operations used in creating an inverted database. This method divides a database into several smaller subdatabases. The documents of the subdatabases are decomposed into subdocuments. A postings list for each subdatabase is then created in which all the terms for the subdatabase are associated with the identity of each subdocument of the subdatabase in which the terms occur. The resulting postings lists for the subdatabases are then merged. The merge process sorts the postings of the subdatabases and merges common terms. The non-common terms are merged after the common terms. The process of sorting the postings list and then merging the common terms followed by the non-common terms minimizes the number of disk storage access operations required for creating the inverted database from a series of inverted subdatabases.
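The two-pass merge described in the abstract can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the `Postings` type, the `merge_postings` function, and the sample data are all hypothetical names introduced here, and plain Python dictionaries do not model the disk storage accesses the abstract is concerned with.

```python
from typing import Dict, List

# Assumed representation: term -> sorted list of subdocument identifiers.
Postings = Dict[str, List[int]]


def merge_postings(first: Postings, second: Postings) -> Postings:
    """Merge `second` into a copy of `first`, common terms before non-common ones."""
    merged = {term: list(ids) for term, ids in first.items()}
    common = sorted(t for t in second if t in merged)
    non_common = sorted(t for t in second if t not in merged)

    # Pass 1: terms present in both subdatabases extend the existing lists.
    for term in common:
        merged[term] = sorted(set(merged[term]) | set(second[term]))

    # Pass 2: terms found only in the second subdatabase are added afterwards.
    for term in non_common:
        merged[term] = sorted(second[term])
    return merged


first = {"merge": [1, 4], "sort": [2]}
second = {"sort": [7, 5], "heap": [6]}
result = merge_postings(first, second)
```

In this sketch the common term "sort" is handled in the first pass and the non-common term "heap" in the second, mirroring the ordering the abstract describes.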


8 Claims

1. A method for merging databases, comprising:
- identifying common terms that exist in a first and a second inverted subdatabase;
- placing subdocument identifiers of said common terms for said second inverted subdatabase into said first inverted subdatabase;
- placing subdocument identifiers of non-common terms for said second inverted subdatabase into said first inverted subdatabase after said placement of said common term subdocument identifiers; and
- sorting said subdocument identifiers from said second inverted subdatabase in an order corresponding to an order of said subdocument identifiers of said first inverted subdatabase.
2. A method for merging databases, as in claim 1, wherein:
3. A method for merging databases, as in claim 1, wherein:
- said merging of said inverted subdatabases comprises selecting terms from a second inverted subdatabase to be merged into a first inverted subdatabase;
- identifying a second inverted subdatabase index for each of said selected terms in said second inverted subdatabase;
- translating said second inverted subdatabase index into a first inverted subdatabase index when said term in said second subdatabase exists in said first subdatabase;
- sorting said second inverted subdatabase by said index; and
- placing entries from said second inverted subdatabase into said first inverted subdatabase by said index.
4. A method for merging databases, as in claim 3, wherein:
- a heap sort process sorts said second inverted subdatabase.
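
The index translation and heap sort recited in claims 3 and 4 might look like the sketch below. It assumes, purely for illustration, that a term's index is its position in a term table, that non-common terms receive new indices past the end of the first table, and that each term appears once in the second table. The function `merge_by_index` and its parameters are hypothetical; the heap ordering uses Python's standard `heapq` module.

```python
import heapq


def merge_by_index(first_terms, first_postings, second_terms, second_postings):
    """Place entries of the second subdatabase into the first, ordered by first index."""
    index_of = {term: i for i, term in enumerate(first_terms)}
    terms = list(first_terms)
    postings = [list(ids) for ids in first_postings]

    heap = []
    for second_index, term in enumerate(second_terms):
        if term in index_of:
            # Common term: translate the second index into the first index.
            first_index = index_of[term]
        else:
            # Non-common term (assumption): give it a new slot after existing ones.
            first_index = len(terms)
            index_of[term] = first_index
            terms.append(term)
            postings.append([])
        heapq.heappush(heap, (first_index, second_index))

    # Popping every entry yields them in ascending first-index order (a heap sort).
    while heap:
        first_index, second_index = heapq.heappop(heap)
        postings[first_index].extend(second_postings[second_index])
        postings[first_index].sort()
    return terms, postings


terms, postings = merge_by_index(
    ["merge", "sort"], [[1, 4], [2]],
    ["heap", "sort"], [[6], [7, 5]],
)
```

Because new indices are always larger than existing ones under this assumption, entries for common terms are placed before entries for non-common terms, and the first subdatabase is visited in a single ascending sweep.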

5. A system for retrieving documents from a database, comprising:
- a computer coupled to a disk storage unit, said disk storage unit stores a database;
- said computer divides said database into a plurality of subdatabases stored on said disk storage unit, said subdatabases being formed from a plurality of documents from said database;
- said computer inverts each of said subdatabases by dividing each document of said subdatabase into subdocuments wherein each subdocument has an identifier and relating each term of said subdocument with each subdocument in which said term appears by said subdocument identifier;
- said computer merges said inverted subdatabases by identifying common terms that exist in a first and a second inverted subdatabase;
- said computer merges said inverted database by placing subdocument identifiers of said common terms for said second inverted subdatabase into said first inverted subdatabase;
- said computer merges said inverted database by placing subdocument identifiers of non-common terms for said second inverted subdatabase into said first inverted subdatabase after said placement of said common term subdocument identifiers; and
- said computer sorts said subdocument identifiers from said second inverted subdatabase in an order corresponding to an order of said subdocument identifiers of said first inverted subdatabase.
6. A system for retrieving documents from a database, as in claim 5, wherein:
7. A system for retrieving documents from a database, as in claim 5, wherein:
- said computer merges said inverted subdatabases by selecting terms from a second inverted subdatabase to be merged into a first inverted subdatabase;
- said computer merges said inverted database by identifying a second inverted subdatabase index for each of said selected terms in said second inverted subdatabase;
- said computer merges said inverted subdatabase by translating said second inverted subdatabase index into a first inverted subdatabase index when said term in said second subdatabase exists in said first subdatabase;
- said computer merges said inverted subdatabase by sorting said second inverted subdatabase by said index; and
- said computer merges said inverted subdatabase by placing entries from said second inverted subdatabase into said first inverted subdatabase by said index.
8. A system for retrieving documents from a database, as in claim 7, wherein:
- a heap sort process sorts said second inverted subdatabase.
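
The division and inversion steps of claim 5 could be sketched as below. The claims do not say how documents are split or tokenized, so several choices here are assumptions made only for illustration: subdocuments are blank-line-separated paragraphs, terms are lowercased whitespace tokens, and subdocument identifiers come from one running counter. The helper names `split_into_subdatabases` and `invert` are hypothetical.

```python
from collections import defaultdict


def split_into_subdatabases(documents, size):
    """Group documents into subdatabases of at most `size` documents each."""
    return [documents[i:i + size] for i in range(0, len(documents), size)]


def invert(subdatabase, first_id):
    """Build term -> subdocument identifiers for one subdatabase.

    Returns the postings and the next unused subdocument identifier.
    """
    postings = defaultdict(list)
    next_id = first_id
    for document in subdatabase:
        # Assumption: each paragraph is one subdocument.
        for subdocument in document.split("\n\n"):
            for term in sorted(set(subdocument.lower().split())):
                postings[term].append(next_id)
            next_id += 1
    return dict(postings), next_id


documents = ["sort merge\n\nheap sort", "merge index"]
subdatabases = split_into_subdatabases(documents, 1)
first_inverted, next_id = invert(subdatabases[0], 0)
second_inverted, next_id = invert(subdatabases[1], next_id)
```

Numbering subdocuments with a single counter keeps identifiers unique across subdatabases, so the inverted subdatabases can later be merged without renumbering.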


Current Assignee: Justsystems Evans Research Incorporated
Original Assignee: Claritech Corporation
Inventors: Horowitz, Michael L.
Primary Examiner(s): Corrielus, Jean M.
Application Number: US09/695,113
Time in Patent Office: 847 Days
Field of Search: 707/5, 707/2, 707/7, 707/102, 707/532, 707/1, 707/100, 707/101, 707/103, 707/104, 707/3, 707/200-206, 705/7, 705/1
US Class Current: 1/1

CPC Class Codes

G06F 16/2272   Management thereof
G06F 16/2456   Join operations
G06F 16/319   Inverted lists
G06F 16/328   Management therefor
G06F 2207/224   External sorting
G06F 7/36   Combined merging and sorting
Y10S 707/99931   Database or file accessing
Y10S 707/99932   Access augmentation or opti...
Y10S 707/99933   Query processing, i.e. sear...
Y10S 707/99935   Query augmenting and refini...
Y10S 707/99937   Sorting
