| US 7,496,584 B2 | ||
| Incremental cardinality estimation for a set of data values | ||
| Walid Rjaibi, Thornhill (Canada); and Peter Jay Haas, San Jose, Calif. (US) | ||
| Assigned to International Business Machines Corporation, Armonk, N.Y. (US) | ||
| Filed on Aug. 08, 2006, as Appl. No. 11/463,294. | ||
| Application 11/463294 is a continuation of application No. 10/428191, filed on Apr. 30, 2003, granted, now 7,124,146. | ||
| Prior Publication US 2006/0288022 A1, Dec. 21, 2006 | ||
| This patent is subject to a terminal disclaimer. | ||
| Int. Cl. G06F 17/30 (2006.01) | ||
| U.S. Cl. 707—100 [707/7; 707/203] | 13 Claims |

| 1. A method performed on a computer system for estimating a cardinality value for a set of data values, the method comprising:
initializing a data structure for representing an array of counts, wherein a given entry in the array of counts indicates
a number of data values that have hashed to a position of the given entry;
obtaining a data value from said set of data values;
transforming said data value into a transformed string;
modifying said data structure with said transformed string;
obtaining a summary statistic value from said modified data structure, wherein the summary statistic value is based on the
array of counts; and
generating said estimated cardinality value using said summary statistic value, wherein said generating step is performed
for each distinct occurrence of a hashing function within a plurality of hashing functions, and wherein said generating step
further comprises:
generating a cardinality estimate value associated with each instance of said hashing function, wherein said cardinality estimate
value generating step further comprises computing a result of 2^n/q, wherein n is equal to said summary statistic value and
q is less than one;
summing said cardinality estimate values;
averaging said cardinality estimate values to produce a raw cardinality estimate value; and
adjusting said raw cardinality estimate value for data value collisions to produce a final cardinality estimate value.
|
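
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions the claim does not state: the class name `CountingEstimator`, the helpers `transform` and `position`, the 32-bit width of the transformed string, the use of salted BLAKE2b to obtain a plurality of hashing functions, the choice of the lowest zero-count position as the summary statistic, and q = 0.77351 (the Flajolet-Martin constant, chosen here only because it is less than one). The collision adjustment is likewise a stand-in, borrowed from the large-range correction used in HyperLogLog. The `remove` method is hypothetical and only shows why an array of counts, rather than bits, permits decrementing.

```python
import hashlib
import math

Q = 0.77351   # assumed value of q (< 1); the claim only requires q < 1
BITS = 32     # assumed width of the transformed string


def transform(value, seed):
    """Hash a value to a BITS-wide integer; `seed` selects the hash function."""
    digest = hashlib.blake2b(
        repr(value).encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little") & ((1 << BITS) - 1)


def position(bits):
    """Index of the lowest set bit, or BITS - 1 if no bit is set."""
    if bits == 0:
        return BITS - 1
    return (bits & -bits).bit_length() - 1


class CountingEstimator:
    """One array of counts per hash function (hypothetical structure)."""

    def __init__(self, num_hashes=64):
        self.counts = [[0] * BITS for _ in range(num_hashes)]

    def add(self, value):
        # Modify each array with the transformed string of the value.
        for seed, row in enumerate(self.counts):
            row[position(transform(value, seed))] += 1

    def remove(self, value):
        # Hypothetical inverse of add(); assumes the value was added before.
        for seed, row in enumerate(self.counts):
            row[position(transform(value, seed))] -= 1

    @staticmethod
    def summary_statistic(row):
        # Assumed statistic: lowest position whose count is zero.
        for i, count in enumerate(row):
            if count == 0:
                return i
        return BITS

    def estimate(self):
        # One estimate of 2^n / q per hash function, then sum and average.
        per_hash = [2 ** self.summary_statistic(row) / Q for row in self.counts]
        raw = sum(per_hash) / len(per_hash)
        # Assumed collision adjustment over a hash space of size 2^BITS.
        space = 2 ** BITS
        if raw >= space:
            return raw
        return -space * math.log1p(-raw / space)
```

Under these assumptions, adding a duplicate value increments a count that is already nonzero, so the estimate does not change; only the first occurrence of a value can move the summary statistic.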