US 11,755,625 B2
Aggregation of noisy datasets into master firmographic database
Tai Vo, Orange, CA (US); Nitin Vijayvargiya, San Francisco, CA (US); Daniel Hsiung, San Jose, CA (US); Premal Shah, Union City, CA (US); Viral Bajaria, San Francisco, CA (US); and Akshara Palakodety, Mountain View, CA (US)
Assigned to 6Sense Insights, Inc., San Francisco, CA (US)
Filed by 6SENSE INSIGHTS, INC., San Francisco, CA (US)
Filed on Jun. 29, 2021, as Appl. No. 17/362,843.
Claims priority of provisional application 63/045,707, filed on Jun. 29, 2020.
Prior Publication US 2021/0406285 A1, Dec. 30, 2021
Int. Cl. G06F 16/28 (2019.01); G06F 16/21 (2019.01); G06F 16/215 (2019.01); G06F 16/2455 (2019.01); G06F 16/22 (2019.01)
CPC G06F 16/285 (2019.01) [G06F 16/211 (2019.01); G06F 16/215 (2019.01); G06F 16/2272 (2019.01); G06F 16/24556 (2019.01)]
17 Claims
 
1. A method comprising using at least one hardware processor to:
receive data comprising a plurality of firmographic records from a plurality of sources, wherein each of the plurality of firmographic records comprises a plurality of fields;
normalize the plurality of firmographic records into a common schema;
clean the plurality of firmographic records by replacing a value of each of one or more of the plurality of fields in one or more of the plurality of firmographic records with a value of that field in another one of the plurality of firmographic records, wherein cleaning the plurality of firmographic records comprises
classifying each of the plurality of firmographic records into one of a plurality of categories, wherein the plurality of categories comprises a strong category, a neutral category, and a weak category, and wherein classifying each of the plurality of firmographic records into one of a plurality of categories comprises, for each of the plurality of firmographic records,
calculating a first strength of a first value for a first field in the firmographic record within a first dimension defined by a second value for a second field in the firmographic record and a third value for a third field in the firmographic record,
calculating a second strength of the second value within a second dimension defined by the first value and the third value,
when the first strength and the second strength both satisfy a respective strong criterion, classifying the firmographic record into the strong category,
when the first strength and the second strength both satisfy a respective weak criterion, classifying the firmographic record into the weak category, and,
when the first strength and the second strength do not both satisfy the respective strong criterion and do not both satisfy the respective weak criterion, classifying the firmographic record into the neutral category, and,
for each of one or more of the plurality of firmographic records that are classified into the weak category, replacing the value of each of one or more of the plurality of fields in that firmographic record with the value of that field in one of the plurality of firmographic records that is classified into the strong category,
wherein none of the values of the plurality of fields in the plurality of firmographic records that are classified into the neutral category are replaced during the cleaning;
cluster the plurality of firmographic records into a plurality of clusters, wherein each of the plurality of clusters comprises a subset of the plurality of firmographic records;
for each of the plurality of clusters, collapse the subset of firmographic records in that cluster into a single conflated firmographic record based on a voting process within that cluster;
generate a master identifier for each conflated firmographic record; and
merge the conflated firmographic records into a master firmographic database, comprising a plurality of mastered firmographic records, indexed by the master identifiers.
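
The classification, cleaning, and voting steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation, and nearly every concrete choice in it is an assumption, because the claim does not fix one. The field names `name`, `domain`, and `country` are hypothetical stand-ins for the first, second, and third fields. "Strength" is assumed to be the fraction of records sharing the two dimension values that also share the target value. The thresholds `STRONG_MIN` and `WEAK_MAX` stand in for the strong and weak criteria. The donor rule in `clean` (a strong record with the same second and third values) and the simple majority vote in `collapse` are likewise illustrative. Normalization, clustering, master-identifier generation, and the final merge are omitted.

```python
from collections import Counter

# Hypothetical field roles: the claim only refers to a first, second, and third field.
FIRST, SECOND, THIRD = "name", "domain", "country"
# Assumed thresholds standing in for the "strong criterion" and "weak criterion".
STRONG_MIN, WEAK_MAX = 0.6, 0.3


def strengths(records, target, dim_a, dim_b):
    """Assumed strength: among records sharing this record's (dim_a, dim_b)
    values, the fraction that also share its target value."""
    group = Counter((r[dim_a], r[dim_b]) for r in records)
    joint = Counter((r[target], r[dim_a], r[dim_b]) for r in records)
    return [joint[(r[target], r[dim_a], r[dim_b])] / group[(r[dim_a], r[dim_b])]
            for r in records]


def classify(records):
    """Label each record strong, weak, or neutral from its two strengths."""
    first = strengths(records, FIRST, SECOND, THIRD)
    second = strengths(records, SECOND, FIRST, THIRD)
    labels = []
    for s1, s2 in zip(first, second):
        if s1 >= STRONG_MIN and s2 >= STRONG_MIN:
            labels.append("strong")
        elif s1 <= WEAK_MAX and s2 <= WEAK_MAX:
            labels.append("weak")
        else:
            labels.append("neutral")
    return labels


def clean(records, labels):
    """Overwrite FIRST in each weak record with the value from a strong record.
    Assumed donor rule: first strong record with the same SECOND and THIRD
    values. Strong and neutral records are copied unchanged."""
    donors = {}
    for r, label in zip(records, labels):
        if label == "strong":
            donors.setdefault((r[SECOND], r[THIRD]), r)
    cleaned = []
    for r, label in zip(records, labels):
        donor = donors.get((r[SECOND], r[THIRD])) if label == "weak" else None
        cleaned.append({**r, FIRST: donor[FIRST]} if donor else dict(r))
    return cleaned


def collapse(cluster):
    """Assumed voting process: most common value per field within a cluster."""
    return {f: Counter(r[f] for r in cluster).most_common(1)[0][0]
            for f in cluster[0]}


def rec(name, domain, country="US"):
    return {FIRST: name, SECOND: domain, THIRD: country}


# Made-up sample data: two well-supported companies, one record that mixes
# them, and two records that disagree on the domain.
records = (
    [rec("Acme", "acme.com") for _ in range(4)]
    + [rec("Globex", "globex.com") for _ in range(4)]
    + [rec("Acme", "globex.com"), rec("Initech", "initech.com"), rec("Initech", "initech.io")]
)
labels = classify(records)
cleaned = clean(records, labels)
conflated = collapse(cleaned[4:9])  # treat the Globex-domain records as one cluster
```

In this sample, the record pairing the name "Acme" with the domain "globex.com" is poorly supported in both dimensions, so it is the only one whose value is replaced. The two "Initech" records are strong in one dimension but not the other, so they fall into the neutral category and pass through cleaning unchanged, mirroring the claim's requirement that neutral records are not modified.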