US 9,811,529 B1
Automatically redistributing data of multiple file systems in a distributed storage system
Silvius V. Rus, Orinda, CA (US); and Thileepan Subramaniam, Mountain View, CA (US)
Assigned to Quantcast Corporation, San Francisco, CA (US)
Filed by Quantcast Corporation, San Francisco, CA (US)
Filed on Feb. 6, 2013, as Appl. No. 13/760,933.
This patent is subject to a terminal disclaimer.
Int. Cl. G06F 12/00 (2006.01); G06F 17/30 (2006.01); G06F 3/06 (2006.01)
CPC G06F 17/30194 (2013.01) [G06F 3/067 (2013.01); G06F 17/30584 (2013.01)]
14 Claims
1. A computer-implemented method for redistributing data stored in a distributed storage, the method comprising:
maintaining a plurality of logically independent file systems, wherein each file system includes a data set stored by the distributed storage and metadata including a unique identifier and organizational structure information;
observing accesses to data files in the data set included in each of the file systems to determine access pattern levels;
determining a respective access pattern level for each of the plurality of logically independent file systems;
determining that a first file system from the plurality of logically independent file systems has an access pattern level specifying a higher probability of future access than a probability of future access specified by an access pattern level of a second file system from the plurality of logically independent file systems;
determining storage requirements for a plurality of file systems;
obtaining device characteristic information for the plurality of storage devices of the distributed storage;
redistributing the data set of the first file system having the higher probability of future access before redistributing the data set of the second file system across a plurality of storage devices of the distributed storage; and
redistributing data sets across the plurality of storage devices based on the storage requirements for the plurality of file systems and the obtained device characteristic information for the plurality of storage devices of the distributed storage;
wherein redistributing the data sets across the plurality of storage devices based on the storage requirements for the plurality of file systems and the obtained device characteristic information for the plurality of storage devices of the distributed storage comprises:
determining a respective performance level for each of the plurality of storage devices based on the device characteristic information;
computing an aggregate performance level for the distributed storage by summing the determined respective performance levels for the plurality of storage devices of the distributed storage;
computing a proportional performance level for a particular storage device of the plurality of storage devices by dividing the determined performance level for the particular storage device by the aggregate performance level;
computing a target amount of storage space assigned to the first file system from the plurality of logically independent file systems for the particular storage device by multiplying a determined storage requirement for the first file system and the proportional performance level for the particular storage device; and
redistributing the data set of the first file system based at least in part on the computed target amount of storage space assigned to the first file system for the particular storage device.
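For illustration only, the arithmetic recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patentee's implementation: the names `FileSystem`, `proportional_levels`, `target_allocation`, and `redistribution_plan` are hypothetical, the access pattern level is assumed to be a single number where a larger value means a higher probability of future access, and the performance levels and storage units are arbitrary example values.

```python
from dataclasses import dataclass


@dataclass
class FileSystem:
    fs_id: str               # unique identifier from the file system metadata
    storage_required: float  # storage requirement (unit is an assumption, e.g. GB)
    access_level: float      # assumed: larger value = higher probability of future access


def proportional_levels(device_perf):
    """Divide each device's performance level by the aggregate (summed) level."""
    aggregate = sum(device_perf.values())
    if aggregate <= 0:
        raise ValueError("aggregate performance level must be positive")
    return {dev: perf / aggregate for dev, perf in device_perf.items()}


def target_allocation(fs, device_perf):
    """Target space per device = storage requirement * proportional performance level."""
    shares = proportional_levels(device_perf)
    return {dev: fs.storage_required * share for dev, share in shares.items()}


def redistribution_plan(file_systems, device_perf):
    """Order file systems by descending access level and attach per-device targets."""
    ordered = sorted(file_systems, key=lambda fs: fs.access_level, reverse=True)
    return [(fs.fs_id, target_allocation(fs, device_perf)) for fs in ordered]


if __name__ == "__main__":
    # Hypothetical example values: two devices and two file systems.
    devices = {"disk_a": 100.0, "disk_b": 300.0}
    systems = [FileSystem("cold", 40.0, 0.1), FileSystem("hot", 200.0, 0.9)]
    for fs_id, targets in redistribution_plan(systems, devices):
        print(fs_id, targets)
```

With these example values the aggregate performance level is 400, so disk_a has a proportional level of 0.25 and disk_b of 0.75. The file system "hot" is planned first with targets of 50 and 150 units, followed by "cold" with targets of 10 and 30 units.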