US 11,704,438 B2
Systems and method of contextual data masking for private and secure data linkage
Satyender Goel, Chicago, IL (US); Upwan Chachra, Bothell, WA (US); and James B. Cushman, II, Longboat Key, FL (US)
Assigned to Collibra Belgium BV, Brussels (BE)
Filed by Collibra Belgium BV, Brussels (BE)
Filed on Jun. 21, 2022, as Appl. No. 17/845,848.
Application 17/845,848 is a continuation of application No. 16/776,293, filed on Jan. 29, 2020, granted, now 11,366,928, issued on Jun. 21, 2022.
Prior Publication US 2022/0318428 A1, Oct. 6, 2022
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 21/62 (2013.01); H04L 9/06 (2006.01); H04L 9/32 (2006.01)
CPC G06F 21/6245 (2013.01) [G06F 21/6227 (2013.01); H04L 9/0643 (2013.01); H04L 9/3213 (2013.01)] 17 Claims
 
1. A computer-implemented method for securely classifying and tokenizing data, the method comprising:
ingesting a dataset corresponding to a client;
inspecting the dataset to identify a classifier that is indicative of a characteristic of an attribute included in the dataset;
retrieving client-specific encryption information and client-specific configuration information that includes a listing of anonymized labels that are indicative of types of information included in the dataset;
identifying a label included in the listing of anonymized labels that corresponds to a type of information in the attribute based on the identified classifier;
responsive to determining that the attribute corresponds to the label, processing the attribute of the dataset to generate a modified attribute that is modified into a standardized format according to a set of standardization rules; and
generating a tokenized version of the modified attribute, including:
hashing the modified attribute using a hash salt and encryption key included in the client-specific encryption information to generate a hashed modified attribute;
comparing the label with a tag store including a series of client-specific tags to identify a first tag that corresponds to the label; and
generating a contextualized token of the modified attribute that includes the first tag;
wherein processing the attribute of the dataset to generate the modified attribute further comprises:
retrieving a set of validation rules and the set of standardization rules that correspond to the attribute, the set of validation rules providing rules indicative of whether the attribute corresponds to the label, and the set of standardization rules providing rules to modify the attribute into the standardized format; and
comparing the attribute with the set of validation rules to determine whether the attribute corresponds to the label.
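
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. Every name in it (ClientConfig, VALIDATION_RULES, STANDARDIZATION_RULES, tokenize, the label "LBL_01", the tag "T7") is hypothetical. The claim does not say how the hash salt and encryption key are combined, so the use of HMAC-SHA256 keyed with the encryption key over the salt plus the standardized value is an assumption, as is the "tag:digest" token layout.

```python
import hashlib
import hmac
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical container for client-specific encryption and configuration data."""
    hash_salt: bytes
    encryption_key: bytes
    labels: dict  # classifier -> anonymized label
    tags: dict    # anonymized label -> client-specific tag (the "tag store")


# Hypothetical rule sets, keyed by anonymized label.
# Validation: does the attribute look like the type the label stands for?
VALIDATION_RULES = {"LBL_01": re.compile(r"[A-Za-z][A-Za-z .'-]*")}
# Standardization: upper-case and collapse whitespace.
STANDARDIZATION_RULES = {"LBL_01": lambda v: " ".join(v.upper().split())}


def tokenize(value, classifier, cfg):
    """Return a contextualized token, or None if the attribute is not recognized."""
    # Map the identified classifier to an anonymized label.
    label = cfg.labels.get(classifier)
    if label is None:
        return None

    # Compare the attribute with the validation rules for that label.
    if not VALIDATION_RULES[label].fullmatch(value.strip()):
        return None

    # Modify the attribute into the standardized format.
    modified = STANDARDIZATION_RULES[label](value)

    # Assumption: keyed hash (HMAC-SHA256) over salt + standardized value.
    digest = hmac.new(
        cfg.encryption_key,
        cfg.hash_salt + modified.encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()

    # Look up the client-specific tag and attach it to the hashed value.
    tag = cfg.tags[label]
    return f"{tag}:{digest}"


# Example configuration with placeholder secrets (illustrative values only).
cfg = ClientConfig(
    hash_salt=b"example-salt",
    encryption_key=b"example-key",
    labels={"name": "LBL_01"},
    tags={"LBL_01": "T7"},
)

token_a = tokenize("  john   smith ", "name", cfg)
token_b = tokenize("JOHN SMITH", "name", cfg)
```

In this sketch the two inputs standardize to the same string, so token_a and token_b are identical. That sameness is what allows records to be linked without exposing the underlying value. The tag prefix reveals only the client-specific tag, not the type of information it stands for.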