US 9,811,545 B1
Storage of sparse files using parallel log-structured file system
John M. Bent, Los Alamos, NM (US); Sorin Faibish, Newton, MA (US); Gary Grider, Los Alamos, NM (US); and Aaron Torres, Los Alamos, NM (US)
Assigned to EMC IP Holding Company LLC, Hopkinton, MA (US); and Los Alamos National Security, LLC, Los Alamos, NM (US)
Filed by EMC IP Holding Company LLC, Hopkinton, MA (US)
Filed on Jun. 19, 2013, as Appl. No. 13/921,719.
Int. Cl. G06F 17/00 (2006.01); G06F 17/30 (2006.01)
CPC G06F 17/30321 (2013.01) [G06F 17/30286 (2013.01); G06F 17/30312 (2013.01)]
19 Claims
 
1. A method for storing a sparse file, comprising the steps of:
obtaining, using at least one processing device, at least a portion of said sparse file, wherein said sparse file portion comprises a plurality of data portions and a corresponding plurality of holes, wherein each of said plurality of data portions has been written with data and wherein remainder portions of said sparse file portion associated with each of said holes have not been written with data;
detecting a write pattern for a plurality of said data portions of a plurality of said sparse files;
generating, using at least one processing device, a patterned index entry for each of said sparse files only for said patterned data portions of said plurality of said sparse files, each of said patterned index entries comprising a logical offset, physical offset and length of each of said data portions; and
storing, using at least one processing device, said plurality of data portions of said sparse file in a single file in a storage device of a file system using a parallel log-structured file system without storing said hole associated with each of said data portions, wherein said patterned index entries for said plurality of said sparse files are stored as a file in a directory, wherein each patterned index entry in said file comprises an identifier of a corresponding sparse file.
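
The steps recited in claim 1 can be illustrated with a minimal in-memory sketch. It is not the patented implementation and does not use any actual parallel log-structured file system API. It assumes that a "write pattern" means equally sized data portions at a constant logical stride, that data portions do not overlap, and that holes read back as zeros. The names `PatternedIndexEntry`, `detect_pattern`, `pack` and `read`, and the `stride` and `count` fields, are hypothetical. The index is kept in a list rather than stored as a file in a directory.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Extent = Tuple[int, bytes]  # (logical offset, written data); gaps between extents are holes


@dataclass(frozen=True)
class PatternedIndexEntry:
    file_id: str          # identifier of the sparse file the entry describes
    logical_offset: int   # logical offset of the first data portion
    physical_offset: int  # offset of the first data portion in the shared log
    length: int           # length of each data portion covered by the entry
    stride: int           # assumed field: logical distance between portion starts
    count: int            # assumed field: number of data portions covered


def detect_pattern(extents: List[Extent]) -> Optional[Tuple[int, int]]:
    """Return (length, stride) if the sorted extents are equally sized and evenly spaced."""
    if len(extents) < 2:
        return None
    lengths = {len(data) for _, data in extents}
    strides = {b[0] - a[0] for a, b in zip(extents, extents[1:])}
    if len(lengths) == 1 and len(strides) == 1:
        length, stride = lengths.pop(), strides.pop()
        if stride >= length:  # reject overlapping portions
            return length, stride
    return None


def pack(files: Dict[str, List[Extent]]) -> Tuple[bytes, List[PatternedIndexEntry]]:
    """Append only the written data of every sparse file to one log; holes are never stored."""
    log = bytearray()
    index: List[PatternedIndexEntry] = []
    for file_id, extents in files.items():
        extents = sorted(extents)
        pattern = detect_pattern(extents)
        if pattern is not None:
            # One compact entry describes every data portion of this file.
            length, stride = pattern
            index.append(PatternedIndexEntry(
                file_id, extents[0][0], len(log), length, stride, len(extents)))
        for logical_offset, data in extents:
            if pattern is None:
                # Fallback (outside the claim): one single-portion entry per extent.
                index.append(PatternedIndexEntry(
                    file_id, logical_offset, len(log), len(data), len(data), 1))
            log += data
    return bytes(log), index


def read(log: bytes, index: List[PatternedIndexEntry],
         file_id: str, offset: int, size: int) -> bytes:
    """Reconstruct `size` bytes of a sparse file starting at `offset`; holes become zeros."""
    out = bytearray(size)
    for e in index:
        if e.file_id != file_id:
            continue
        for i in range(e.count):
            lo = e.logical_offset + i * e.stride   # logical start of portion i
            po = e.physical_offset + i * e.length  # where portion i sits in the log
            start, end = max(lo, offset), min(lo + e.length, offset + size)
            if start < end:
                out[start - offset:end - offset] = log[po + start - lo:po + end - lo]
    return bytes(out)
```

As a usage example, packing `{"a": [(0, b"AAAA"), (16, b"BBBB"), (32, b"CCCC")]}` yields a 12-byte log and a single index entry for file `a`, even though the file spans 36 logical bytes. The sketch appends the data portions of a patterned file contiguously, which is what lets one entry recover every physical offset as `physical_offset + i * length`.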