Database system with improved methods for filtering duplicates from a tuple stream

US 5,937,401 A
Filed: 11/27/1996
Issued: 08/10/1999
Est. Priority Date: 11/27/1996
Status: Expired due to Term

Abstract

A Client/Server Database system is described which includes a Database Server providing methods for eliminating duplicates from an ordered tuple stream (e.g., resulting from a query involving a database "join"), without the need for performing an expensive sort operation. Specifically, the system provides a "filter" which eliminates duplicates without having to perform a sort. The filter, which is implemented as an optimization at the level of the query processor, comprises two basic pieces. The first piece, INIT_FILTER, simply serves to initialize the filter: it sets a flag that forces the filter to pass the first tuple encountered and to construct a first key from it. The second piece, FILTER, serves as the actual filter when the system scans the tuple stream. If the current tuple has the same key as the preceding tuple, then the current tuple is thrown away. Otherwise, the current tuple is passed and a new key is constructed from it. The positions of both INIT_FILTER and FILTER in a given join order are important. INIT_FILTER immediately precedes the scan which initializes the filter; FILTER immediately follows the scan which actually performs the filtering.
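The two pieces described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation; the class name `AdjacentDuplicateFilter`, the helper `distinct_ordered`, and the `key_of` callback are hypothetical names chosen for illustration. The sketch assumes, as the abstract does, that the stream is ordered so that tuples with equal keys arrive next to each other.

```python
from typing import Callable, Hashable, Iterable, Iterator, Tuple

Row = Tuple  # hypothetical row representation: a plain tuple of column values


class AdjacentDuplicateFilter:
    """Illustrative stand-in for the INIT_FILTER / FILTER pair (names are assumptions)."""

    def __init__(self, key_of: Callable[[Row], Hashable]) -> None:
        self._key_of = key_of      # extracts the key columns from a row
        self._pass_next = True     # flag: pass the next tuple unconditionally
        self._last_key = None      # key built from the last tuple that passed

    def init_filter(self) -> None:
        # INIT_FILTER piece: set the flag so the first tuple encountered
        # passes and a first key is constructed from it.
        self._pass_next = True
        self._last_key = None

    def filter(self, row: Row) -> bool:
        # FILTER piece: return True to pass the tuple, False to discard it.
        key = self._key_of(row)
        if self._pass_next or key != self._last_key:
            self._pass_next = False
            self._last_key = key   # construct a new key from the passed tuple
            return True
        return False               # same key as the preceding tuple: discard


def distinct_ordered(rows: Iterable[Row],
                     key_of: Callable[[Row], Hashable]) -> Iterator[Row]:
    """Yield one row per run of equal keys; no sort and no hash table is used."""
    flt = AdjacentDuplicateFilter(key_of)
    flt.init_filter()
    for row in rows:
        if flt.filter(row):
            yield row


stream = [(1, "a"), (1, "b"), (2, "c"), (2, "d"), (3, "e")]
print(list(distinct_ordered(stream, key_of=lambda r: r[0])))
# prints [(1, 'a'), (2, 'c'), (3, 'e')]
```

Because the filter only remembers the previous key, it needs constant memory, but it removes a duplicate only when the matching tuple is adjacent. That is why the stream must already be ordered.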

177 Citations

10 Claims

1. In a database system storing a plurality of database tables, each database table comprising a plurality of tuples storing columns of information, each column representing a particular category of information for which each tuple stores a data value, a method for eliminating duplicate tuples from a tuple stream, the method comprising:
- receiving a query specifying selection criteria for selecting information of interest from the database system, said query specifying that said information of interest is to be selected by a database join operation which joins selected ones of said database tables by one or more columns shared between tables (key columns), said query further specifying that the particular information is to be returned as distinct tuples;
  
  determining a join order specifying a sequence indicating how said selected ones of said database tables are to be preferentially scanned by the system for determining which tuples of each said selected ones of said database tables qualify, said join order indicating innermost and outermost tables of the join and being selected so as to guarantee that tuples will stream in order during execution of the query;
  
  initializing a filter at the outermost table for said one or more key columns, for forcing the method to pass a first tuple encountered and to construct an initial key from it;
  
  attaching the filter to the innermost table, so that the filter is executed for each tuple which qualifies on the innermost table;
  
  executing the query for generating a tuple stream satisfying said selection criteria, said executing step including scanning, according to said determined join order, said selected ones of said database tables; and
  
  as the innermost table is scanned, executing the filter for filtering duplicate tuples from the tuple stream by discarding those tuples having keys already encountered in the tuple stream.
- Dependent claims (2 through 10):
  - 2. The method of claim 1, wherein said query comprises a structured query language (SQL) statement including a DISTINCT command.
  - 3. The method of claim 1, wherein said one or more columns shared between tables comprises a single column shared between tables.
  - 4. The method of claim 1, wherein said one or more columns shared between tables comprises two or more columns shared between tables.
  - 5. The method of claim 1, wherein said determining a join order step includes:
    - determining, by an optimizer, an optimal join order for executing the query.
  - 6. The method of claim 1, wherein said executing the filter step includes:
    - constructing a key for each tuple;
      
      if the key has changed from that of a previous key, passing the tuple into the tuple stream; and
      
      if the key has not changed from that of a previous key, discarding the tuple, so that it is eliminated from the tuple stream.
  - 7. The method of claim 1, wherein said query comprises a structured query language (SQL) statement having a general form of:
    - SELECT DISTINCT ... FROM ... WHERE ...
  - 8. The method of claim 1, wherein duplicate tuples are eliminated from the tuple stream without sorting the tuple stream.
  - 9. The method of claim 1, wherein duplicate tuples are eliminated from the tuple stream without hashing keys for lookup.
  - 10. The method of claim 1, further comprising:
    - determining before application of the filter, whether the tuple stream is already unique; and
      
      if the tuple stream is already known to be unique, passing all tuples of the tuple stream.
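The claims above place the filter inside a join: it is initialized at the outermost table and executed for each qualifying tuple of the innermost table. The following Python sketch shows one way this could look for a two-table nested-loop join. It is an interpretation under stated assumptions, not the claimed method itself. The function `distinct_join`, the table layouts, and the `stream_known_unique` flag (standing in for the check in claim 10) are hypothetical. The sketch also assumes the outer table is scanned in key order and that the filter is initialized once, before the outer scan begins.

```python
from typing import Iterator, List, Tuple


def distinct_join(outer: List[Tuple[int, str]],
                  inner: List[Tuple[int, str]],
                  stream_known_unique: bool = False) -> Iterator[Tuple[int, str]]:
    """Sketch of SELECT DISTINCT over a nested-loop join on a shared key column.

    outer: (key, name) rows, assumed to be scanned in key order.
    inner: (key, payload) rows, in any order.
    """
    # Filter initialization, placed before the outermost scan (an assumption):
    # the first qualifying tuple must pass and seed the key.
    pass_next = True
    last_key = None

    for o_key, o_name in outer:          # outermost table of the join order
        for i_key, _payload in inner:    # innermost table of the join order
            if i_key != o_key:
                continue                 # tuple does not qualify for the join
            if stream_known_unique:
                # Claim 10 style bypass: the stream is already unique,
                # so every tuple is passed without filtering.
                yield (o_key, o_name)
                continue
            # Filter attached to the innermost scan: pass on a key change,
            # discard when the key matches the previously passed tuple.
            if pass_next or o_key != last_key:
                pass_next = False
                last_key = o_key
                yield (o_key, o_name)


customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(2, "x"), (1, "y"), (2, "z"), (1, "w")]
print(list(distinct_join(customers, orders)))
# prints [(1, 'Ann'), (2, 'Bob')]
```

Each customer with several matching orders produces a run of identical output tuples, and the runs arrive in outer-key order, so comparing against the previous key is enough to remove them without sorting or hashing (compare claims 8 and 9).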

Current Assignee
Sybase Incorporated (SAP SE)
Original Assignee
Sybase Incorporated (SAP SE)
Inventors
Hillegas, Richard
Primary Examiner(s)
Kulik, Paul V.
Assistant Examiner(s)
Corrielus, Jean M.

Application Number

US08/757,367
Time in Patent Office

986 Days
Field of Search

707/2, 707/3, 707/201, 707/10, 364/282.1, 364/283.4, 364/283.1, 364/222.81
US Class Current

1/1
CPC Class Codes

G06F 16/24556   Aggregation; Duplicate elim...

G06F 16/24568   Data stream processing; Con...

Y10S 707/99932   Access augmentation or opti...

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99952   Coherency, e.g. same view t...
