US 7,496,584 B2
Incremental cardinality estimation for a set of data values
Walid Rjaibi, Thornhill (Canada); and Peter Jay Haas, San Jose, Calif. (US)
Assigned to International Business Machines Corporation, Armonk, N.Y. (US)
Filed on Aug. 08, 2006, as Appl. No. 11/463,294.
Application 11/463294 is a continuation of application No. 10/428191, filed on Apr. 30, 2003, granted, now 7,124,146.
Prior Publication US 2006/0288022 A1, Dec. 21, 2006
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 17/30 (2006.01)
U.S. Cl. 707—100  [707/7; 707/203] 13 Claims
1. A method performed on a computer system for estimating a cardinality value for a set of data values, the method comprising:
initializing a data structure for representing an array of counts, wherein a given entry in the array of counts indicates a number of data values that have hashed to a position of the given entry;
obtaining a data value from said set of data values;
transforming said data value into a transformed string;
modifying said data structure with said transformed string;
obtaining a summary statistic value from said modified data structure, wherein the summary statistic value is based on the array of counts; and
generating said estimated cardinality value using said summary statistic value, wherein said generating step is performed for each distinct occurrence of a hashing function within a plurality of hashing functions, and wherein said generating step further comprises:
generating a cardinality estimate value associated with each instance of said hashing function, wherein said cardinality estimate value generating step further comprises computing a result of 2^n/q, wherein n is equal to said summary statistic value and q is less than one;
summing said cardinality estimate values;
averaging said cardinality estimate values to produce a raw cardinality estimate value; and
adjusting said raw cardinality estimate value for data value collisions to produce a final cardinality estimate value.
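
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation. The claim does not fix the following choices, all of which are assumed here: the hashing functions are seeded SHA-256 digests truncated to 32 bits; the "transformed string" is the binary form of that hash; the array position is the index of the lowest set bit; the summary statistic is the index of the lowest entry whose count is zero; q is taken as 0.77351 (the constant used in Flajolet-Martin probabilistic counting); and the collision adjustment is the standard inversion for uniform hashing into M values. The names IncrementalEstimator, make_hash, lowest_set_bit and adjust_for_collisions are hypothetical, and the remove method is an assumption suggested by the use of counts rather than something claim 1 recites.

```python
import hashlib
import math

NUM_BITS = 32            # assumed width of the transformed bit string
M = 2 ** NUM_BITS        # assumed size of the hash range
Q = 0.77351              # assumed value of q; the claim only requires q < 1


def make_hash(seed):
    """Return one hashing function of the plurality (assumed: seeded SHA-256)."""
    def h(value):
        digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
        return int.from_bytes(digest[:4], "big")   # 32-bit transformed value
    return h


def lowest_set_bit(x):
    """Index of the lowest 1 bit; an all-zero string maps to the last position."""
    return NUM_BITS - 1 if x == 0 else (x & -x).bit_length() - 1


def adjust_for_collisions(raw):
    """Assumed correction: invert E[distinct hashes] = M * (1 - (1 - 1/M)**n)."""
    raw = min(raw, M - 1)                          # keep the logarithm defined
    return math.log1p(-raw / M) / math.log1p(-1 / M)


class IncrementalEstimator:
    def __init__(self, num_hashes=64):
        self.hashes = [make_hash(seed) for seed in range(num_hashes)]
        # One array of counts per hashing function, initialized to zero.
        self.counts = [[0] * NUM_BITS for _ in range(num_hashes)]

    def _update(self, value, delta):
        for h, row in zip(self.hashes, self.counts):
            row[lowest_set_bit(h(value))] += delta

    def add(self, value):
        self._update(value, +1)

    def remove(self, value):
        # Assumes the value was added earlier; counts make this reversible.
        self._update(value, -1)

    @staticmethod
    def summary_statistic(row):
        """Index of the lowest entry whose count is zero (assumed statistic)."""
        for position, count in enumerate(row):
            if count == 0:
                return position
        return NUM_BITS

    def estimate(self):
        # One estimate 2^n / q per hashing function, then sum and average.
        estimates = [2 ** self.summary_statistic(row) / Q for row in self.counts]
        raw = sum(estimates) / len(estimates)
        return adjust_for_collisions(raw)


if __name__ == "__main__":
    est = IncrementalEstimator()
    for i in range(1000):
        est.add(f"value-{i}")
    print(round(est.estimate()))
```

Because each entry holds a count rather than a single bit, adding a duplicate value leaves the estimate unchanged, and removing a previously added value restores the earlier state of the arrays. The accuracy of the averaged estimate depends on the number of hashing functions and on the assumed constants.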