US 7,542,954 B1
Data classification by kernel density shape interpolation of clusters
Tanveer Syeda-Mahmood, Cupertino, Calif. (US); Peter J. Haas, San Jose, Calif. (US); John M. Lake, Cary, N.C. (US); and Guy M. Lohman, San Jose, Calif. (US)
Assigned to International Business Machines Corporation, Armonk, N.Y. (US)
Filed on Jun. 30, 2008, as Appl. No. 12/164,532.
Application 12/164532 is a continuation of application No. 12/142949, filed on Jun. 20, 2008.
Application 12/142949 is a continuation of application No. 11/940739, filed on Nov. 15, 2007, granted, now 7,412,429.
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 17/00 (2006.01); G06F 15/00 (2006.01); G06F 15/18 (2006.01); G06N 5/00 (2006.01)
U.S. Cl. 706—45  [706/62] 5 Claims
 
1. A method executed on a computer for representing a dataset for data classification, the method comprising:
clustering the dataset using an unsupervised, non-parametric clustering method to generate a set of clusters each comprising a set of data points in an image of the dataset;
clustering the data points of each cluster of the set of clusters using a supervised, partitional clustering method to partition each cluster into a specified number of sub-clusters each comprising a subset of the set of data points of the cluster;
generating a density estimate value of each grid point of a set of grid points sampled from the image at a specified resolution for each sub-cluster in the image using a kernel density function;
evaluating the density estimate value of each grid point for each sub-cluster to identify a maximum density estimate value for the grid point and a sub-cluster associated with the maximum density estimate value for the grid point;
adding each grid point for which the maximum density estimate value exceeds a specified threshold to the sub-cluster associated with the maximum density estimate value for the grid point; and
for each cluster of the set of the clusters, merging the sub-clusters of the cluster into a cluster region in the image corresponding to the cluster to form a shape interpolated representation of the set of clusters.
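The claim leaves several choices open: the first-stage clustering method, the kernel, the bandwidth, the grid extent, and how a merged region is represented. The sketch below fills each gap with a stated assumption and then follows the six claimed steps in order. It is not the patentees' implementation.

The assumptions are:
- The first step uses single-linkage connected components with a distance cutoff `eps`. This stands in for "unsupervised, non-parametric" clustering because it does not fix the number of clusters.
- The second step uses plain k-means, taking the specified number of sub-clusters `k`, as an example of a partitional method.
- Densities come from a 2-D Gaussian kernel with bandwidth `h`.
- The grid covers the bounding box of the data at spacing `step`.
- Each region is a set of 2-D points.

All function and parameter names are hypothetical.

```python
import math


def single_linkage_clusters(points, eps):
    """Step 1 stand-in (assumed): connected components of the graph linking
    points no farther apart than eps; the cluster count is not fixed in advance."""
    labels = [-1] * len(points)
    n_clusters = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        labels[i] = n_clusters
        stack = [i]
        while stack:
            j = stack.pop()
            for m in range(len(points)):
                if labels[m] == -1 and math.dist(points[j], points[m]) <= eps:
                    labels[m] = n_clusters
                    stack.append(m)
        n_clusters += 1
    clusters = [[] for _ in range(n_clusters)]
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return clusters


def kmeans(points, k, iters=20):
    """Step 2 (assumed method): split one cluster into at most k sub-clusters."""
    k = min(k, len(points))
    ordered = sorted(points)
    centers = [ordered[i * len(ordered) // k] for i in range(k)]  # deterministic seeds
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def gaussian_kde(q, samples, h):
    """Step 3: 2-D Gaussian kernel density estimate at q (kernel and bandwidth assumed)."""
    norm = 1.0 / (len(samples) * 2.0 * math.pi * h * h)
    return norm * sum(math.exp(-math.dist(q, s) ** 2 / (2.0 * h * h)) for s in samples)


def shape_interpolate(points, eps, k, step, h, threshold):
    """Return one region (a set of 2-D points) per top-level cluster."""
    clusters = single_linkage_clusters(points, eps)
    # Each sub-cluster: (parent cluster index, original samples, growing member set).
    subs = [(ci, sub, set(sub)) for ci, cl in enumerate(clusters) for sub in kmeans(cl, k)]

    # Sample grid points over the bounding box of the data at the given spacing.
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    nx = int((max(xs) - min(xs)) / step) + 1
    ny = int((max(ys) - min(ys)) / step) + 1
    grid = [(min(xs) + i * step, min(ys) + j * step) for i in range(nx) for j in range(ny)]

    for g in grid:
        # Steps 3-4: density of g under every sub-cluster; pick the maximum.
        dens = [gaussian_kde(g, samples, h) for _, samples, _ in subs]
        best = max(range(len(subs)), key=dens.__getitem__)
        # Step 5: add g to the winning sub-cluster only if it clears the threshold.
        if dens[best] > threshold:
            subs[best][2].add(g)

    # Step 6: merge the sub-clusters of each cluster into one region.
    regions = [set() for _ in clusters]
    for ci, _, members in subs:
        regions[ci] |= members
    return regions
```

As an example, take two well-separated 2×2 blobs of points at (0, 0) to (1, 1) and at (10, 10) to (11, 11). Calling `shape_interpolate(pts, eps=1.5, k=2, step=1, h=1.0, threshold=0.02)` on them returns two regions. Each region contains its blob's points plus nearby grid points such as (2, 0). Grid points far from both blobs, such as (5, 5), stay unassigned.

Densities are always computed from each sub-cluster's original samples, so grid points added in step 5 do not feed back into later density evaluations.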