Distributed set-expression cardinality estimation
First Claim
Patent Images
1. In a system adapted to receive update streams from a plurality of update stream sites, a method comprising:
- establishing, for each of said update stream sites, a respective site charge budget, said site charge budget being determined by a coordinator site; and
communicating, by the coordinator site, which comprises at least a processor, said site charge budgets toward said update stream sites;
each site charge budget being adapted to control an initiation of update stream transmission by restricting transmission of said update stream until a sum of element charges at the respective update stream site exceeds the site charge budget, said element charges being attributed to stream element updates;
wherein said update stream elements comprise insertion elements and deletion elements, each of said insertion elements and deletion elements being associated with element charges wherein said insertion element charges and said deletion elements charges are determined in a manner satisfying the following invariants;
For each e∈
E−
Ê
, Σ
jφ
j+(e)≧
1; and
For each e∈
Ê
−
E, Σ
jφ
j−
(e)≧
1.where e is the specific data element whose frequency changes;
E is a set expression;
Ê
is the result of evaluating the expression E;
φ
j (e) is a charge with each element e at every remote site j.
6 Assignments
0 Petitions
Accused Products
Abstract
A method and system for answering set-expression cardinality queries while lowering data communication costs by utilizing a coordinator site to provide global knowledge of the distribution of certain frequently occurring stream elements to significantly reduce the transmission of element state information to the central site and, optionally, capturing the semantics of the input set expression in a Boolean logic formula and using models of the formula to determine whether an element state change at a remote site can affect the set expression result.
-
Citations
19 Claims
-
1. In a system adapted to receive update streams from a plurality of update stream sites, a method comprising:
-
establishing, for each of said update stream sites, a respective site charge budget, said site charge budget being determined by a coordinator site; and communicating, by the coordinator site, which comprises at least a processor, said site charge budgets toward said update stream sites; each site charge budget being adapted to control an initiation of update stream transmission by restricting transmission of said update stream until a sum of element charges at the respective update stream site exceeds the site charge budget, said element charges being attributed to stream element updates; wherein said update stream elements comprise insertion elements and deletion elements, each of said insertion elements and deletion elements being associated with element charges wherein said insertion element charges and said deletion elements charges are determined in a manner satisfying the following invariants;
For each e∈
E−
Ê
, Σ
jφ
j+(e)≧
1; and
For each e∈
Ê
−
E, Σ
jφ
j−
(e)≧
1.where e is the specific data element whose frequency changes; E is a set expression; Ê
is the result of evaluating the expression E;φ
j (e) is a charge with each element e at every remote site j. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A distributed framework for providing a set expression estimation, comprising:
-
allocating, by a coordinator site, error budgets to each of a plurality of remote sites according to a first set of invariants, each of said remote sites providing a respective update stream to the coordinator site, said update streams including at least one of insert elements and delete elements; and allocating update stream insert element and delete element charges according to a second set of invariants comprising;
For each e∈
E−
Ê
, Σ
jφ
j+(e)≧
1; and
For each e∈
Ê
−
E, Σ
jφ
j−
(e)≧
1.where e is the specific data element whose frequency changes; E is a set expression; Ê
is the result of evaluating the expression E;φ
j e) is a charge with each element e at every remote site j.each of said remote sites accumulating element charges in response to the occurrence of respective update and delete elements; each of said remote sites transmitting a respective update stream to said coordinator site when an accumulation of element charges exceeds a respective error budget. - View Dependent Claims (13, 14, 15, 16, 17, 18)
-
-
19. A system for processing set-expression cardinality queries, comprising:
-
a central site for establishing and communicating a site charge budget; and a plurality of update stream sites, wherein; each update stream site being allocated an error budget satisfying the following invariants;
For each e∈
E−
Ê
, Σ
jφ
j+(e)≧
1; and
For each e∈
Ê
−
E, Σ
jφ
j−
(e)≧
1.where e is the specific data element whose frequency changes; E is a set expression; Ê
is the result of evaluating the expression E;φ
j e) is a charge with each element e at every remote site j.said error budgets adapted to determine when the update stream site communicates stream state information to the central site, said site error budget θ
i (e) is reduced by a factor of 2 if Ci (e) becomes less than θ
i (e) and doubled if Ci (e) exceeds 4×
θ
i (e).each update stream site associates a charge with every stream element that is inserted or deleted since stream state was previously transmitted to the central site; and
each update stream site transmits current stream state information when the sum of its respective element charges exceeds its respective error budget.
-
Specification