US 11,809,281 B2
Metadata management for scaled and high density backup environments
Aaditya Rakesh, Bangalore (IN); Upanshu Singhal, Bangalore (IN); and Adam Brenner, Mission Viejo, CA (US)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Apr. 20, 2021, as Appl. No. 17/235,663.
Prior Publication US 2022/0334925 A1, Oct. 20, 2022
Int. Cl. G06F 11/14 (2006.01); G06F 16/182 (2019.01); G06F 16/16 (2019.01); G06F 11/20 (2006.01)
CPC G06F 11/1451 (2013.01) [G06F 16/164 (2019.01); G06F 16/1824 (2019.01); G06F 2201/82 (2013.01)]
13 Claims
 
1. A computer-implemented method of managing metadata for backup data in a network attached storage (NAS) fileshare, comprising:
obtaining a change file list (CFL) listing only files that are changed between two different snapshot backups, and obtained by one of an external source providing the CFL or a crawler process crawling the fileshare, to determine changes in files between successive backup operations, wherein the changes in a file comprise at least one of: newly added data, modified data, and deleted data;
creating, by a processor-based backup engine, multiple slices for parallel backup of the files, wherein each slice comprises one or more files of the NAS fileshare;
creating separate backup containers and separate metadata files for each slice;
combining the separate metadata files into a single consolidated metadata file;
defining file details and backup properties in a backup table;
processing, by the processor-based backup engine, a backup query using the consolidated metadata file and backup table;
responding to the backup query through an interface of the backup engine by returning a unique backup ID for each file of the files;
generating the multiple slices by an NAS agent for the backup;
performing an incremental backup using the CFL;
obtaining, by the NAS agent, the CFL for the entire fileshare;
using the consolidated metadata file from a last backup to identify the changes in each file to generate storage buckets therefor;
passing the generated storage buckets to a backup engine; and
using the consolidated metadata file again for identifying unchanged files for synthesis from the last backup to get a location of the unchanged files in the previous backup container for synthesizing into a new container.
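
For illustration only, the flow recited in claim 1 might be sketched as follows. This is a minimal in-memory model under stated assumptions, not the patented implementation: every name (MetaRecord, make_slices, backup_slice, consolidate, full_backup, incremental_backup, query), the container naming scheme, and the backup ID format are hypothetical, the CFL is assumed to be a mapping from file path to change type, and the backup table is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaRecord:
    """One hypothetical metadata entry for a backed-up file."""
    path: str        # file path within the fileshare
    backup_id: str   # unique backup ID returned for this file
    container: str   # backup container that holds the file's data


def make_slices(files, max_files_per_slice):
    """Split the file list into slices that could be backed up in parallel."""
    return [files[i:i + max_files_per_slice]
            for i in range(0, len(files), max_files_per_slice)]


def backup_slice(backup_no, slice_no, files):
    """Model one slice: a separate container and a separate metadata list."""
    container = f"container-{backup_no}-{slice_no}"
    return [MetaRecord(p, f"b{backup_no}:{p}", container) for p in files]


def consolidate(per_slice_metadata):
    """Combine the per-slice metadata into one consolidated mapping by path."""
    merged = {}
    for records in per_slice_metadata:
        for rec in records:
            merged[rec.path] = rec
    return merged


def full_backup(backup_no, files, max_files_per_slice=2):
    """Slice the files, back up each slice, and consolidate the metadata."""
    slices = make_slices(sorted(files), max_files_per_slice)
    return consolidate(backup_slice(backup_no, n, s)
                       for n, s in enumerate(slices))


def incremental_backup(backup_no, last_meta, cfl, max_files_per_slice=2):
    """Back up only changed files; plan synthesis of unchanged ones.

    cfl is assumed to map path -> 'added' | 'modified' | 'deleted'.
    Returns (new consolidated metadata, synthesis plan), where the plan
    lists (path, previous container) pairs for the unchanged files.
    """
    changed = [p for p, kind in cfl.items() if kind != "deleted"]
    new_meta = full_backup(backup_no, changed, max_files_per_slice)

    synth_container = f"container-{backup_no}-synth"
    synth_plan = []
    for path, rec in sorted(last_meta.items()):
        if path not in cfl:  # not listed in the CFL, so unchanged
            synth_plan.append((path, rec.container))
            new_meta[path] = MetaRecord(path, f"b{backup_no}:{path}",
                                        synth_container)
    return new_meta, synth_plan


def query(consolidated_meta, path):
    """Answer a backup query with the unique backup ID for the file."""
    return consolidated_meta[path].backup_id
```

In this sketch, a full backup of three files with two files per slice yields two containers and one consolidated mapping. A later incremental run consults the previous mapping only for files absent from the CFL, which stands in for the claimed use of the last consolidated metadata file to locate unchanged files for synthesis into a new container.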