US 11,816,107 B2
Index generation using lazy reassembling of semi-structured data
Mahmud Allahverdiyev, Berlin (DE); Selcuk Aya, Izmir (TR); Bowei Chen, San Bruno, CA (US); and Ismail Oukid, Berlin (DE)
Assigned to Snowflake Inc., Bozeman, MT (US)
Filed by Snowflake Inc., Bozeman, MT (US)
Filed on Dec. 27, 2022, as Appl. No. 18/146,912.
Application 18/146,912 is a continuation of application No. 17/814,110, filed on Jul. 21, 2022, granted, now 11,567,939.
Application 17/814,110 is a continuation in part of application No. 17/655,124, filed on Mar. 16, 2022, granted, now 11,494,384.
Application 17/655,124 is a continuation of application No. 17/394,149, filed on Aug. 4, 2021, granted, now 11,308,090.
Application 17/394,149 is a continuation in part of application No. 17/358,154, filed on Jun. 25, 2021, granted, now 11,308,089.
Application 17/358,154 is a continuation of application No. 17/161,115, filed on Jan. 28, 2021, granted, now 11,086,875.
Application 17/161,115 is a continuation of application No. 16/932,462, filed on Jul. 17, 2020, granted, now 10,942,925.
Application 16/932,462 is a continuation of application No. 16/727,315, filed on Dec. 26, 2019, granted, now 10,769,150.
Claims priority of provisional application 63/197,750, filed on Jun. 7, 2021.
Prior Publication US 2023/0139194 A1, May 4, 2023
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 16/24 (2019.01); G06F 16/2455 (2019.01); G06F 16/9035 (2019.01); G06F 16/28 (2019.01); G06F 17/18 (2006.01); G06F 16/22 (2019.01)
CPC G06F 16/24557 (2019.01) [G06F 16/2272 (2019.01); G06F 16/283 (2019.01); G06F 16/9035 (2019.01); G06F 17/18 (2013.01)]
30 Claims
1. A system comprising:
at least one hardware processor; and
at least one memory storing instructions that cause the at least one hardware processor to perform operations comprising:
generating an index for a source table comprising a column of semi-structured data, the index indexing distinct values in each column of the source table, the generating of the index comprising:
identifying, based on a reassembly hook object, a first set of values corresponding to a first portion of the semi-structured data that is subcolumnarized, the reassembly hook object comprising a first data structure that represents the first portion of the semi-structured data; and
identifying, based on a residual object, a second set of values corresponding to a second portion of the semi-structured data that is not subcolumnarized, the residual object comprising a second data structure that represents at least a portion of the second portion of the semi-structured data;
storing the index with an association with the source table;
receiving a query directed at the column;
generating a set of search fingerprints based on a value in the query; and
processing the query by scanning a reduced scan set generated using the index and the set of search fingerprints.
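
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The partition layout, the `Partition` class, the SHA-256 based `fingerprint` function, and the per-partition fingerprint sets are all assumptions made for illustration; the claim does not specify any of them. The sketch assumes the source table is split into partitions, that subcolumnarized paths are exposed by the reassembly hook object as a mapping from path to values, and that the residual object holds the remaining nested data row by row. Values are read from both structures directly, without reassembling full documents.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(value) -> int:
    """Hypothetical fingerprint: first 8 bytes of SHA-256 over repr(value)."""
    digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


@dataclass
class Partition:
    # Assumed layout, not taken from the patent:
    # reassembly_hook maps a subcolumnarized path to its column of values;
    # residual holds, per row, the nested data that was not subcolumnarized.
    reassembly_hook: dict
    residual: list


def leaf_values(node):
    """Yield the scalar leaves of a nested dict/list structure."""
    if isinstance(node, dict):
        for child in node.values():
            yield from leaf_values(child)
    elif isinstance(node, list):
        for child in node:
            yield from leaf_values(child)
    else:
        yield node


def build_index(partitions):
    """Map each partition id to the fingerprints of its distinct values."""
    index = {}
    for pid, part in enumerate(partitions):
        fps = set()
        # First set of values: the subcolumnarized portion.
        for values in part.reassembly_hook.values():
            fps.update(fingerprint(v) for v in values)
        # Second set of values: the portion that is not subcolumnarized.
        for row in part.residual:
            fps.update(fingerprint(v) for v in leaf_values(row))
        index[pid] = fps
    return index


def reduced_scan_set(index, query_value):
    """Return ids of partitions that may contain the queried value."""
    search_fps = {fingerprint(query_value)}
    return [pid for pid, fps in index.items() if search_fps & fps]


partitions = [
    Partition({"user.id": [1, 2]}, [{"tags": ["a", "b"]}, {"note": "x"}]),
    Partition({"user.id": [3, 4]}, [{"tags": ["c"]}, {}]),
]
index = build_index(partitions)
print(reduced_scan_set(index, "c"))
```

In this sketch a query only needs to scan the partitions returned by `reduced_scan_set`. Because fingerprints can collide, the returned set may contain false positives but never omits a partition that holds the value, so the scan itself must still verify matches.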