US 11,816,131 B2
Cross-domain clusterability evaluation for cross-guided data clustering based on alignment between data domains
Jeffrey M. Achtermann, Austin, TX (US); Indrajit Bhattacharya, New Delhi (IN); Kevin W. English, Fairfield, CT (US); Shantanu R. Godbole, New Delhi (IN); Sachindra Joshi, New Delhi (IN); Ashwin Srinivasan, New Delhi (IN); and Ashish Verma, New Delhi (IN)
Assigned to KYNDRYL, INC., New York, NY (US)
Filed by Kyndryl, Inc., New York, NY (US)
Filed on Mar. 25, 2019, as Appl. No. 16/362,861.
Application 16/362,861 is a continuation of application No. 15/070,495, filed on Mar. 15, 2016, granted, now 10,311,086.
Application 15/070,495 is a continuation of application No. 14/147,691, filed on Jan. 6, 2014, granted, now 9,336,296, issued on May 10, 2016.
Application 14/147,691 is a continuation of application No. 13/432,425, filed on Mar. 28, 2012, granted, now 8,639,696, issued on Jan. 28, 2014.
Application 13/432,425 is a continuation of application No. 12/683,095, filed on Jan. 6, 2010, granted, now 8,229,929, issued on Jul. 24, 2012.
Prior Publication US 2019/0220470 A1, Jul. 18, 2019
Int. Cl. G06F 16/28 (2019.01); G06F 16/35 (2019.01); G06F 16/951 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 16/355 (2019.01); G06F 16/951 (2019.01)]
18 Claims
 
1. A method, said method comprising:
finding, by a processor of a computer system, an alignment between source centroids of a source domain and target centroids of a target domain over a cross-domain similarity graph, said cross-domain similarity graph comprising a first set of vertices corresponding to the source centroids and a second set of vertices corresponding to the target centroids, wherein a weight is assigned to each edge between one of the vertices in the first set of vertices corresponding to the source centroids and one of the vertices in the second set of vertices corresponding to the target centroids, said finding the alignment comprising determining a subset of the edges having a maximum sum of the weights as compared with a sum of the weights for all other subsets of the edges, subject to each vertex connected by the edges in the subset being spanned by at most one edge in the subset;
said processor calculating target clusterability as an average of a respective clusterability of at least one target data item comprised by the target domain;
said processor calculating target-side matchability as an average of a respective matchability of each target centroid of the target domain to source centroids of the source domain, wherein the source domain comprises at least one source data item;
said processor calculating source-side matchability as an average of a respective matchability of each source centroid of said source centroids to the target centroids;
said processor calculating source-target pair matchability as an average of the target-side matchability and the source-side matchability; and
said processor calculating cross-domain clusterability between the target domain and the source domain as a linear combination of the calculated target clusterability and the calculated source-target pair matchability, said cross-domain clusterability providing an improved clustering over conventional data clustering due to the alignment between the source centroids and the target centroids; and
said processor transferring the calculated cross-domain clusterability over a network to a first device selected from the group consisting of an output device of a computer system, a storage device of the computer system, a remote computer system coupled to the computer system, and a combination thereof.
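
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The alignment is found here by brute-force search over all matchings, which is only practical for a handful of centroids. The claim does not define the per-item clusterability or per-centroid matchability, so they are taken as precomputed inputs. The function names, the parameter names, and the mixing weight `alpha` (with the linear combination assumed to be `alpha * a + (1 - alpha) * b`) are all hypothetical.

```python
from statistics import mean


def max_weight_alignment(weights):
    """Brute-force maximum-weight matching on a bipartite graph.

    weights[i][j] is the edge weight between source centroid i and
    target centroid j. Returns (total_weight, [(i, j), ...]) such that
    every vertex is touched by at most one chosen edge.
    Exponential time: an illustration, not an efficient solver.
    """
    n_src = len(weights)
    n_tgt = len(weights[0]) if weights else 0

    def search(i, used):
        if i == n_src:
            return 0.0, []
        # Option 1: leave source centroid i unmatched.
        best_total, best_edges = search(i + 1, used)
        # Option 2: match it to any target centroid not yet used.
        for j in range(n_tgt):
            if j in used:
                continue
            total, edges = search(i + 1, used | {j})
            total += weights[i][j]
            if total > best_total:
                best_total, best_edges = total, [(i, j)] + edges
        return best_total, best_edges

    return search(0, frozenset())


def cross_domain_clusterability(item_clusterability,
                                target_matchability,
                                source_matchability,
                                alpha=0.5):
    """Combine the averaged scores as recited in the claim.

    All three inputs are assumed to be precomputed lists of numbers;
    alpha is a hypothetical weight for the linear combination.
    """
    target_clusterability = mean(item_clusterability)
    target_side = mean(target_matchability)
    source_side = mean(source_matchability)
    # Source-target pair matchability: average of the two sides.
    pair_matchability = (target_side + source_side) / 2
    return alpha * target_clusterability + (1 - alpha) * pair_matchability
```

For example, `max_weight_alignment([[0.75, 0.25], [0.5, 0.5]])` pairs source centroid 0 with target centroid 0 and source centroid 1 with target centroid 1. For larger graphs, a polynomial-time assignment algorithm would be the usual substitute for the exhaustive search.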