Database system with improved methods for filtering duplicates from a tuple stream
First Claim
1. In a database system storing a plurality of database tables, each database table comprising a plurality of tuples storing columns of information, each column representing a particular category of information for which each tuple stores a data value, a method for eliminating duplicate tuples from a tuple stream, the method comprising:
- receiving a query specifying selection criteria for selecting information of interest from the database system, said query specifying that said information of interest is to be selected by a database join operation which joins selected ones of said database tables by one or more columns shared between tables (key columns), said query further specifying that the particular information is to be returned as distinct tuples;
determining a join order specifying a sequence indicating how said selected ones of said database tables are to be preferentially scanned by the system for determining which tuples of each said selected ones of said database tables qualify, said join order indicating innermost and outermost tables of the join and being selected so as to guarantee that tuples will stream in order during execution of the query;
initializing a filter at the outermost table for said one or more key columns, for forcing the method to pass a first tuple encountered and to construct an initial key from it;
attaching the filter to the innermost table, so that the filter is executed for each tuple which qualifies on the innermost table;
executing the query for generating a tuple stream satisfying said selection criteria, said executing step including scanning, according to said determined join order, said selected ones of said database tables; and
as the innermost table is scanned, executing the filter for filtering duplicate tuples from the tuple stream by discarding those tuples having keys already encountered in the tuple stream.
1 Assignment
0 Petitions
Accused Products
Abstract
A Client/Server Database system is described which includes a Database Server providing methods eliminating duplicates from an ordered tuple stream (e.g., resulting from a query involving a database "join"), without the need for performing an expensive sort operation. Specifically, the system provides a "filter" which eliminates duplicates without having to perform a sort. The filter, which is implemented as an optimization at the level of the query processor, comprises two basic pieces. The first piece, INIT-- FILTER, simply serves to initialize the filter--that is, the piece sets a flag that forces the filter to pass the first tuple encountered and to construct a first key from it. The second piece, FILTER, serves as the actual filter, when the system scans the tuple stream. If the current tuple has the same key as the preceding tuple, then the current tuple is thrown away. Otherwise, the current tuple is passed and a new key is constructed from it. The positions of both INIT-- FILTER and FILTER in a given join order are important. INIT-- FILTER immediately preceeds the scan which initializes the filter; FILTER immediately follows the scan which actually performs the filtering.
177 Citations
10 Claims
-
1. In a database system storing a plurality of database tables, each database table comprising a plurality of tuples storing columns of information, each column representing a particular category of information for which each tuple stores a data value, a method for eliminating duplicate tuples from a tuple stream, the method comprising:
-
receiving a query specifying selection criteria for selecting information of interest from the database system, said query specifying that said information of interest is to be selected by a database join operation which joins selected ones of said database tables by one or more columns shared between tables (key columns), said query further specifying that the particular information is to be returned as distinct tuples; determining a join order specifying a sequence indicating how said selected ones of said database tables are to be preferentially scanned by the system for determining which tuples of each said selected ones of said database tables qualify, said join order indicating innermost and outermost tables of the join and being selected so as to guarantee that tuples will stream in order during execution of the query; initializing a filter at the outermost table for said one or more key columns, for forcing the method to pass a first tuple encountered and to construct an initial key from it; attaching the filter to the innermost table, so that the filter is executed for each tuple which qualifies on the innermost table; executing the query for generating a tuple stream satisfying said selection criteria, said executing step including scanning, according to said determined join order, said selected ones of said database tables; and as the innermost table is scanned, executing the filter for filtering duplicate tuples from the tuple stream by discarding those tuples having keys already encountered in the tuple stream. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
Specification