| US 7,542,954 B1 | ||
| Data classification by kernel density shape interpolation of clusters | ||
| Tanveer Syeda-Mahmood, Cupertino, Calif. (US); Peter J. Haas, San Jose, Calif. (US); John M. Lake, Cary, N.C. (US); and Guy M. Lohman, San Jose, Calif. (US) | ||
| Assigned to International Business Machines Corporation, Armonk, N.Y. (US) | ||
| Filed on Jun. 30, 2008, as Appl. No. 12/164,532. | ||
| Application 12/164532 is a continuation of application No. 12/142949, filed on Jun. 20, 2008. | ||
| Application 12/142949 is a continuation of application No. 11/940739, filed on Nov. 15, 2007, granted, now 7,412,429. | ||
| This patent is subject to a terminal disclaimer. | ||
| Int. Cl. G06F 17/00 (2006.01); G06F 15/00 (2006.01); G06F 15/18 (2006.01); G06N 5/00 (2006.01) | ||
| U.S. Cl. 706—45 [706/62] | 5 Claims |

| 1. A method executed on a computer for representing a dataset for data classification, the method comprising:
clustering the dataset using an unsupervised, non-parametric clustering method to generate a set of clusters each comprising a set of data points in an image of the dataset;
clustering the data points of each cluster of the set of clusters using a supervised, partitional clustering method to partition each cluster into a specified number of sub-clusters each comprising a subset of the set of data points of the cluster;
generating a density estimate value of each grid point of a set of grid points sampled from the image at a specified resolution for each sub-cluster in the image using a kernel density function;
evaluating the density estimate value of each grid point for each sub-cluster to identify a maximum density estimate value for the grid point and a sub-cluster associated with the maximum density estimate value for the grid point;
adding each grid point for which the maximum density estimate value exceeds a specified threshold to the sub-cluster associated with the maximum density estimate value for the grid point; and
for each cluster of the set of the clusters, merging the sub-clusters of the cluster into a cluster region in the image corresponding to the cluster to form a shape interpolated representation of the set of clusters.
|
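
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation, and it rests on several assumptions that do not come from the claim text: the first (unsupervised, non-parametric) clustering step is taken as already done and supplied as non-negative integer labels in `cluster_labels`; plain k-means stands in for the partitional sub-clustering step; the kernel is an isotropic Gaussian with a single `bandwidth`; and the function names (`kmeans`, `kde_on_grid`, `shape_interpolate`) and the `-1` marker for unassigned grid points are hypothetical.

```python
import numpy as np


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (assumed stand-in for the partitional step); returns labels."""
    rng = np.random.default_rng(seed)
    # Fancy indexing copies, so updating centers never modifies `points`.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def kde_on_grid(points, grid, bandwidth):
    """Gaussian kernel density estimate of `points`, evaluated at each grid point."""
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    dim = points.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1) / (len(points) * norm)


def shape_interpolate(points, cluster_labels, n_sub, grid, bandwidth, threshold):
    """Return, for each grid point, the id of the cluster region it joins, or -1."""
    points = np.asarray(points, dtype=float)
    grid = np.asarray(grid, dtype=float)
    cluster_labels = np.asarray(cluster_labels)

    # Partition each cluster into (at most) n_sub sub-clusters.
    owners, subclusters = [], []
    for c in np.unique(cluster_labels):
        members = points[cluster_labels == c]
        sub = kmeans(members, min(n_sub, len(members)))
        for s in np.unique(sub):  # np.unique skips sub-clusters that ended up empty
            owners.append(c)
            subclusters.append(members[sub == s])

    # Density of every sub-cluster at every grid point: shape (n_subclusters, n_grid).
    dens = np.stack([kde_on_grid(p, grid, bandwidth) for p in subclusters])

    # Per grid point: the maximum density and the sub-cluster that attains it.
    best = dens.argmax(axis=0)
    best_val = dens.max(axis=0)

    # Keep grid points above the threshold. Labelling each one with the parent
    # cluster of its winning sub-cluster merges the sub-clusters into one region.
    return np.where(best_val > threshold, np.asarray(owners)[best], -1)
```

Called with the data points, their cluster labels, and an array of grid coordinates, `shape_interpolate` returns one label per grid point. Grid points labelled with the same cluster id together form that cluster's interpolated region, and grid points whose best density does not exceed `threshold` stay unassigned.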