US 11,704,315 B1
Trimming blackhole clusters
Yan Yan, Seattle, WA (US); Aria Haghighi, Seattle, WA (US); and Joseph Christianson, Seattle, WA (US)
Assigned to AMPERITY, INC., Seattle, WA (US)
Filed by Amperity, Inc., Seattle, WA (US)
Filed on Jul. 24, 2020, as Appl. No. 16/938,233.
Int. Cl. G06F 16/2453 (2019.01); G06F 16/28 (2019.01); G06F 16/2457 (2019.01)
CPC G06F 16/24542 (2019.01) [G06F 16/285 (2019.01); G06F 16/24578 (2019.01)]
20 Claims
1. A method comprising:
reading a set of clusters from a database, each cluster in the clusters including a plurality of records stored in a table of the database, wherein the plurality of records is received from a plurality of independent data sources and wherein the plurality of records is associated with a plurality of entities, wherein at least one record in the plurality of records includes incorrect data;
extracting an oversized cluster in the set of clusters;
performing a breadth-first search (BFS) on the oversized cluster, the BFS generating a list of visited records;
terminating the BFS upon determining that a size of the list of visited records exceeds a maximum size, wherein the size of the list of visited records is less than a size of the oversized cluster;
generating a new cluster from the list of visited records and adding the new cluster to the set of clusters, wherein the new cluster comprises a set of records in the visited records associated with a single entity in the plurality of entities; and
writing the new cluster to the database by updating each of the visited records in the table to identify the new cluster.
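
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claim does not say how records are linked, how the starting record is chosen, or how the database is laid out. The sketch assumes the oversized cluster is given as an adjacency mapping between record IDs, that a plain dictionary stands in for the table's cluster column, and that the BFS starts from an arbitrary record. The names `bfs_trim`, `assign_cluster`, `adjacency`, and `cluster_of` are hypothetical. The sketch also does not check that the visited records belong to a single entity, which the claim requires of the new cluster.

```python
from collections import deque


def bfs_trim(adjacency, start, max_size):
    """Breadth-first search from `start`, terminated as soon as the list of
    visited records exceeds `max_size` (hypothetical helper)."""
    visited = [start]          # ordered list of visited records
    seen = {start}             # set for O(1) membership checks
    queue = deque([start])
    while queue:
        record = queue.popleft()
        for neighbor in adjacency.get(record, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            visited.append(neighbor)
            queue.append(neighbor)
            # Terminate once the visited list exceeds the maximum size.
            if len(visited) > max_size:
                return visited
    return visited


def assign_cluster(cluster_of, records, new_cluster_id):
    """Stand-in for the database write: point each visited record at the
    new cluster (a real system would issue table updates instead)."""
    for record in records:
        cluster_of[record] = new_cluster_id


# Assumed toy data: one oversized cluster "c1" holding six linked records.
adjacency = {
    "r1": ["r2", "r3"],
    "r2": ["r1", "r4"],
    "r3": ["r1", "r5"],
    "r4": ["r2", "r6"],
    "r5": ["r3"],
    "r6": ["r4"],
}
cluster_of = {record: "c1" for record in adjacency}

visited = bfs_trim(adjacency, start="r1", max_size=3)
assign_cluster(cluster_of, visited, "c2")
print(visited)
print(cluster_of)
```

In this toy run the search stops before covering the whole cluster, so the visited list is smaller than the oversized cluster, and only the visited records are reassigned to the new cluster.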