US 11,809,451 B2
Caching systems and methods
Benoit Dageville, Foster City, CA (US); Thierry Cruanes, San Mateo, CA (US); and Marcin Zukowski, San Mateo, CA (US)
Assigned to Snowflake Inc., Bozeman, MT (US)
Filed by Snowflake Inc., Bozeman, MT (US)
Filed on Oct. 20, 2014, as Appl. No. 14/518,971.
Claims priority of provisional application 61/941,986, filed on Feb. 19, 2014.
Prior Publication US 2015/0234922 A1, Aug. 20, 2015
Int. Cl. G06F 16/27 (2019.01); G06F 9/48 (2006.01); G06F 9/50 (2006.01); G06F 16/14 (2019.01); G06F 16/182 (2019.01); G06F 16/21 (2019.01); G06F 16/22 (2019.01); G06F 16/23 (2019.01); G06F 16/2453 (2019.01); G06F 16/2455 (2019.01); G06F 16/951 (2019.01); G06F 16/2458 (2019.01); G06F 16/9535 (2019.01); H04L 67/568 (2022.01); G06F 16/28 (2019.01); G06F 16/25 (2019.01); A61F 5/56 (2006.01); G06F 16/9538 (2019.01); H04L 67/1095 (2022.01); H04L 67/1097 (2022.01)
CPC G06F 16/273 (2019.01) [A61F 5/566 (2013.01); G06F 9/4881 (2013.01); G06F 9/5016 (2013.01); G06F 9/5044 (2013.01); G06F 9/5083 (2013.01); G06F 9/5088 (2013.01); G06F 16/148 (2019.01); G06F 16/1827 (2019.01); G06F 16/211 (2019.01); G06F 16/221 (2019.01); G06F 16/2365 (2019.01); G06F 16/2456 (2019.01); G06F 16/2471 (2019.01); G06F 16/24532 (2019.01); G06F 16/24545 (2019.01); G06F 16/24552 (2019.01); G06F 16/254 (2019.01); G06F 16/27 (2019.01); G06F 16/283 (2019.01); G06F 16/951 (2019.01); G06F 16/9535 (2019.01); G06F 16/9538 (2019.01); H04L 67/1095 (2013.01); H04L 67/1097 (2013.01); H04L 67/568 (2022.05)]
21 Claims
 
1. A method comprising:
receiving a query directed to database data stored across a plurality of shared storage devices;
referencing a metadata store to locate a set of files that comprises data that needs to be processed to respond to the query;
referencing the metadata store to determine whether the set of files is cached among execution nodes of an execution platform comprising a plurality of execution nodes, wherein the execution platform is separate from the metadata store and the plurality of shared storage devices;
in response to determining that at least a portion of the set of files is cached among the plurality of execution nodes, assigning, by one or more processors, processing of one or more of the set of files to each of one or more execution nodes that have cached at least a portion of the set of files;
for each of the one or more assigned execution nodes:
determining, by the assigned execution node, whether the assigned one or more files is stored at least in part in a cache of the assigned execution node; and
in response to the assigned execution node determining the assigned one or more files is not entirely stored in the cache of the assigned execution node:
retrieving a missing portion of the assigned one or more files from one or more remote storage devices of the plurality of remote storage devices including the missing portion of the assigned one or more files, wherein the plurality of execution nodes are organized into one or more virtual warehouses having one or more logical mappings between them, and a virtual warehouse including the assigned execution node dynamically establishes a communication link with each of the one or more of the plurality of remote storage devices based at least in part on the query so that the assigned execution node may retrieve the missing portion;
storing, by the assigned execution node, the missing portion of the assigned one or more files in the cache of the assigned execution node so that the entire one or more files is stored in the cache of the assigned execution node, wherein a size and composition of the cache is adjusted to accommodate the missing portion of the assigned one or more files;
processing the query using the assigned one or more files stored in the cache of the assigned execution node; and
updating the metadata store to indicate the entire assigned one or more files is now cached in the cache of the assigned execution node;
wherein any of the set of files stored in the plurality of shared storage devices may be accessed by any of a plurality of execution nodes of the execution platform;
wherein any of the set of files stored in the plurality of shared storage devices may be stored in a cache of any of the plurality of execution nodes of the execution platform;
wherein any of the set of files stored in the plurality of shared storage devices may be stored in a cache of multiple execution nodes of the plurality of execution nodes of the execution platform at one point in time; and
in response to a determination of a change in the number of execution nodes of the execution platform, wherein the change is creating a new execution node, the new execution node comprising a plurality of processors, wherein the cache varies among the plurality of processors, wherein a first subset of the plurality of processors comprises a minimal cache and a second subset of the plurality of processors comprises a cache providing faster input-output operations, reassigning processing, among the changed number of execution nodes of the execution platform, of the set of files comprising data that needs to be processed to respond to the query.
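The caching steps recited in claim 1 can be illustrated with a small in-memory sketch. It is hypothetical and is not Snowflake's implementation. `MetadataStore`, `ExecutionNode`, `assign`, and `run_query` are illustrative names. A file is modeled as a list of blocks, so a "missing portion" is simply the blocks absent from a node's cache. The metadata store records only the nodes that hold a file entirely, and least-recently-used eviction is an assumed stand-in for however the claimed cache adjusts its size and composition.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical metadata store, kept separate from the execution nodes."""
    files_by_table: dict[str, list[str]]  # table -> files holding its data
    cached_on: dict[str, set[str]] = field(default_factory=dict)  # file -> nodes

    def nodes_caching(self, file: str) -> set[str]:
        return set(self.cached_on.get(file, set()))

    def mark_cached(self, file: str, node: str) -> None:
        self.cached_on.setdefault(file, set()).add(node)

    def mark_evicted(self, file: str, node: str) -> None:
        self.cached_on.get(file, set()).discard(node)


class ExecutionNode:
    """Hypothetical execution node with a block-level cache and LRU eviction."""

    def __init__(self, name: str, storage: dict[str, list[bytes]],
                 metadata: MetadataStore, capacity_blocks: int) -> None:
        self.name = name
        self.storage = storage  # shared storage: file -> list of blocks
        self.metadata = metadata
        self.capacity = capacity_blocks
        # file -> {block index: data}; iteration order is least to most recently used
        self.cache: OrderedDict[str, dict[int, bytes]] = OrderedDict()
        self.remote_reads = 0  # blocks fetched from shared storage

    def _ensure_cached(self, file: str) -> list[bytes]:
        blocks = self.cache.setdefault(file, {})
        self.cache.move_to_end(file)
        # Fetch only the missing portion of the file from shared storage.
        for i, data in enumerate(self.storage[file]):
            if i not in blocks:
                blocks[i] = data
                self.remote_reads += 1
        # Adjust cache composition: evict least recently used *other* files.
        while sum(map(len, self.cache.values())) > self.capacity and len(self.cache) > 1:
            victim, _ = self.cache.popitem(last=False)
            self.metadata.mark_evicted(victim, self.name)
        # Record in the metadata store that this node now holds the entire file.
        self.metadata.mark_cached(file, self.name)
        return [blocks[i] for i in range(len(self.storage[file]))]

    def process(self, files: list[str]) -> int:
        # Toy "query": total bytes across the assigned files.
        return sum(len(b) for f in files for b in self._ensure_cached(f))


def assign(files: list[str], nodes: dict[str, ExecutionNode],
           metadata: MetadataStore) -> dict[str, list[str]]:
    """Send each file to a node already caching it, else to the least-loaded node."""
    plan: dict[str, list[str]] = {name: [] for name in nodes}
    for f in files:
        holders = metadata.nodes_caching(f) & set(plan)
        candidates = sorted(holders or plan)  # sorted for deterministic ties
        target = min(candidates, key=lambda n: len(plan[n]))
        plan[target].append(f)
    return plan


def run_query(table: str, nodes: dict[str, ExecutionNode],
              metadata: MetadataStore) -> int:
    files = metadata.files_by_table[table]  # locate the files via metadata
    plan = assign(files, nodes, metadata)
    return sum(nodes[n].process(fs) for n, fs in plan.items() if fs)
```

If the same query runs twice against two such nodes, the second run is routed to the nodes that cached each file during the first run and reads nothing from shared storage. Preferring a cache holder over the least-loaded node is only one simple policy; the claim does not say how load imbalance among cache holders is resolved.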
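Claim 1 does not say how processing is reassigned when a new execution node is created. One plausible technique, assumed here rather than taken from the claim, is rendezvous (highest-random-weight) hashing. Each file goes to the node with the largest hash of the (node, file) pair, so adding a node moves only the files that the new node now wins. The function names `assign_rendezvous` and `reassign_on_change` are hypothetical.

```python
import hashlib


def _weight(node: str, file: str) -> int:
    # Stable pseudo-random weight for a (node, file) pair.
    digest = hashlib.sha256(f"{node}|{file}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def assign_rendezvous(files: list[str], nodes: list[str]) -> dict[str, str]:
    """Map each file to the node with the highest weight for that file."""
    return {f: max(nodes, key=lambda n: _weight(n, f)) for f in files}


def reassign_on_change(files: list[str], old_nodes: list[str],
                       new_nodes: list[str]) -> tuple[dict[str, str], set[str]]:
    """Recompute the assignment after the node set changes; report moved files."""
    before = assign_rendezvous(files, old_nodes)
    after = assign_rendezvous(files, new_nodes)
    moved = {f for f in files if before[f] != after[f]}
    return after, moved
```

When a node is only added, a file can change owners only by moving to the new node, because its previous owner still outweighs every other existing node. Caches on the existing nodes therefore remain useful for every file that did not move, and the moved files are cache misses on the new node until it fetches them from shared storage.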