US 7,596,544 B2
Tracking set-expression cardinalities over continuous update streams
Sumit Ganguly, Bhopal (India); Minos Garofalakis, Morristown, N.J. (US); and Rajeev Rastogi, New Providence, N.J. (US)
Assigned to Alcatel-Lucent USA Inc., Murray Hill, N.J. (US)
Filed on Dec. 29, 2004, as Appl. No. 11/25,355.
Prior Publication US 2006/0143218 A1, Jun. 29, 2006
Int. Cl. G06F 7/00 (2006.01)
U.S. Cl. 707/2
17 Claims
 
1. A method of obtaining an estimate of a set-expression cardinality relating to at least a first and second data-stream, the method comprising the steps of:
using a database management system comprising a computer for:
creating a first hash-sketch synopsis for the first data stream and a second hash-sketch synopsis for the second data stream, each hash-sketch synopsis comprising a random hash-table and a 2-level hash sketch for each hash-bucket of said random hash-tables, said 2-level hash sketch comprising a first-level hash-table and a counter array for each hash-bucket of said first-level hash-table;
pre-hashing said first and second data-streams into said first and second random hash tables, respectively;
hashing individual buckets of said random hashing tables to the corresponding 2-level hash sketch for each of those buckets;
maintaining said first and said second hash-sketch synopsis using one or more data elements from said first and second data-streams respectively;
obtaining a set-expression singleton count over said first and second hash-sketch; and
estimating said set-expression cardinality estimate using said set-expression singleton count.
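
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and everything the claim leaves open is an assumption made here: the names `mix`, `level_of`, `TwoLevelSketch`, `HashSketchSynopsis`, `singleton_counts` and `estimate_difference` are hypothetical; the hash is a splitmix64-style mixer chosen for convenience; each counter array holds a total count plus one count per bit of the element; the set expression is fixed to the difference A - B; and the union cardinality is supplied by the caller rather than estimated. Elements are assumed to be non-negative integers below 2^32 with non-negative net frequencies.

```python
MASK = (1 << 64) - 1


def mix(x, seed):
    """Stand-in hash: splitmix64-style 64-bit mixer (an assumption)."""
    z = (x + (seed + 1) * 0x9E3779B97F4A7C15) & MASK
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK
    return z ^ (z >> 31)


def level_of(h, levels):
    """Index of the lowest set bit of h, capped at levels - 1."""
    level = 0
    while level < levels - 1 and (h >> level) & 1 == 0:
        level += 1
    return level


def is_singleton(row):
    """True if the bucket holds exactly one distinct element.

    row[0] is the total count; row[j + 1] counts elements with bit j set.
    With non-negative net frequencies, one distinct element means every
    bit count is either 0 or equal to the total.
    """
    total = row[0]
    return total > 0 and all(c in (0, total) for c in row[1:])


class TwoLevelSketch:
    """First-level hash table with a counter array per bucket."""

    def __init__(self, seed, levels=32, bits=32):
        self.seed, self.levels, self.bits = seed, levels, bits
        self.counters = [[0] * (bits + 1) for _ in range(levels)]

    def update(self, element, delta):
        row = self.counters[level_of(mix(element, self.seed), self.levels)]
        row[0] += delta
        for j in range(self.bits):
            if (element >> j) & 1:
                row[j + 1] += delta


class HashSketchSynopsis:
    """Random hash table whose buckets each own a 2-level hash sketch."""

    def __init__(self, num_buckets=64, seed=1, bits=32):
        self.num_buckets, self.seed = num_buckets, seed
        self.sketches = [TwoLevelSketch(seed + 1 + i, bits=bits)
                         for i in range(num_buckets)]

    def update(self, element, delta=1):
        # Pre-hash into the random hash table, then update that bucket's
        # 2-level sketch. delta=-1 models a deletion from the stream.
        bucket = mix(element, self.seed) % self.num_buckets
        self.sketches[bucket].update(element, delta)


def singleton_counts(syn_a, syn_b):
    """Count buckets that are singletons for A | B, and for A - B.

    Both synopses must be built with the same num_buckets, seed and bits.
    """
    union_singletons = diff_witnesses = 0
    for sk_a, sk_b in zip(syn_a.sketches, syn_b.sketches):
        for row_a, row_b in zip(sk_a.counters, sk_b.counters):
            union_row = [x + y for x, y in zip(row_a, row_b)]
            if is_singleton(union_row):
                union_singletons += 1
                if row_a[0] > 0 and row_b[0] == 0:
                    diff_witnesses += 1
    return union_singletons, diff_witnesses


def estimate_difference(syn_a, syn_b, union_estimate):
    """Scale a caller-supplied |A | B| estimate by the witness fraction."""
    singles, witnesses = singleton_counts(syn_a, syn_b)
    return union_estimate * witnesses / singles if singles else 0.0
```

Because every counter is updated additively, a deletion exactly cancels the matching insertion, which is what allows the synopses to be maintained over update streams. The accuracy of `estimate_difference` depends on how many singleton buckets are observed, so a larger `num_buckets` gives a steadier estimate at the cost of more space.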