CPC G06F 16/21 (2019.01) | 20 Claims |
1. A computer-implemented method for creating an optimized data structure, the computer-implemented method comprising:
receiving a data file comprising a dataset, wherein the dataset consists of a plurality of records having key-value pairs;
setting at least one bucketing limit, wherein bucketing limits control a resulting data structure;
calculating a value size for each value in the dataset;
creating buckets based at least in part on the value size and the bucketing limit, wherein the buckets comprise a plurality of slots, wherein each slot in a single bucket is equal in size, and wherein different buckets comprise slots of different sizes;
dividing each value into at least one of the created buckets based on a size range, wherein each value is bucketed using a hash function;
adjusting each bucket by removing duplicative values across all buckets to create a structured data file based at least in part on the bucketing limit; and
storing the structured data file along with metadata.
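The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (`slot_size_for`, `build_structure`, `max_slot_size`) is hypothetical. It assumes that the bucketing limit is a maximum slot size in bytes, that size ranges are powers of two, that SHA-256 serves as the hash function, and that records are plain dictionaries. Writing the result to disk is omitted; the structure and its metadata are simply returned together.

```python
import hashlib


def slot_size_for(size: int) -> int:
    """Round a byte size up to the next power of two (assumed size-range rule)."""
    slot = 1
    while slot < size:
        slot *= 2
    return slot


def build_structure(records, max_slot_size=64):
    """Bucket every value by size, deduplicate by hash, and attach metadata.

    max_slot_size stands in for the claim's "bucketing limit" (an assumption).
    """
    buckets = {}  # slot size -> {value hash: fixed-size slot bytes}
    index = []    # one dict per record: key -> (slot size, value hash, value size)

    for record in records:
        entry = {}
        for key, value in record.items():
            raw = str(value).encode("utf-8")
            size = len(raw)                      # value size in bytes
            slot = slot_size_for(size)           # size range selects the bucket
            if slot > max_slot_size:
                raise ValueError(f"value for {key!r} exceeds the bucketing limit")
            digest = hashlib.sha256(raw).hexdigest()
            # Identical values share a digest, so each is stored only once.
            # Padding makes every slot in a bucket the same length.
            buckets.setdefault(slot, {})[digest] = raw.ljust(slot, b"\0")
            entry[key] = (slot, digest, size)
        index.append(entry)

    metadata = {
        "max_slot_size": max_slot_size,
        "slot_sizes": sorted(buckets),
        "record_count": len(records),
    }
    return {"buckets": buckets, "index": index, "metadata": metadata}


records = [
    {"name": "alice", "city": "Paris"},
    {"name": "bob", "city": "Paris"},
]
structure = build_structure(records)
```

In this sketch identical values always have the same size, so they land in the same bucket under the same hash, which is how the duplicate "Paris" collapses into a single slot. The stored value size in the index lets a reader strip the padding when a value is read back.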
|