US 7,596,587 B2
Multi-tiered storage
Pavel Berkhin, Sunnyvale, Calif. (US); Usama M. Fayyad, Sunnyvale, Calif. (US); and Shanmugasundaram Ravikumar, Cupertino, Calif. (US)
Assigned to Yahoo! Inc., Sunnyvale, Calif. (US)
Filed on Jul. 19, 2006, as Appl. No. 11/489,885.
Prior Publication US 2008/0021859 A1, Jan. 24, 2008
Int. Cl. G06F 17/30 (2006.01)
U.S. Cl. 707—204  [707/1; 711/117] 18 Claims
1. A computer-implemented method for storing a plurality of objects in a plurality of storage options, the plurality of objects being characterized by an associated obligation of permanent storage, the method comprising:
generating an importance index for each of the plurality of objects with reference to importance data associated with each object, at least a portion of the importance data representing relevance of the associated object relative to a population of users interacting with the plurality of objects, the importance index representing a likelihood that the corresponding object will be retrieved;
storing each of the objects in a selected one of the storage options with reference to the corresponding importance index and a hierarchical model of the storage options, the hierarchical model of storage options at least partially ordering the storage options with reference to economic costs and efficiency of retrieval;
controlling migration of selected ones of the plurality of objects across the storage options with reference to recomputed values of the corresponding importance indices and the hierarchical model while meeting the obligation of permanent storage for the plurality of objects, wherein migrating each object comprises copying the object from a first storage option to a second storage option among the plurality of storage options and removing the object from the first storage option; and
employing a machine learning technique to forecast the importance index for at least some of the plurality of objects with reference to a predictive model, the machine learning technique having been trained using training data corresponding to first ones of the objects identified as having relevance in the user population using a human-controlled editorial process.
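
As an illustration only, the generating, storing, and migrating steps recited above can be sketched in Python. Everything named here is a hypothetical assumption for the sketch and is not taken from the patent: the tier names and thresholds, the reach-based scoring formula, and the in-memory dictionaries standing in for real storage options. The machine-learning forecast of the importance index is not modeled; a recomputed index is simply passed in.

```python
from dataclasses import dataclass, field

# Hypothetical hierarchy, ordered from fastest/most expensive to slowest/cheapest.
# Each entry is (tier name, minimum importance index required for that tier).
# The thresholds are illustrative assumptions, not values from the claim.
TIERS = [("memory", 0.8), ("disk", 0.3), ("tape", 0.0)]


def importance_index(distinct_users: int, population: int) -> float:
    """Toy importance score in [0, 1]: the share of the user population
    that interacted with the object (an assumed stand-in for relevance)."""
    if population <= 0:
        return 0.0
    return min(distinct_users / population, 1.0)


def select_tier(index: float) -> str:
    """Return the first (fastest) tier whose threshold the index meets."""
    for name, threshold in TIERS:
        if index >= threshold:
            return name
    return TIERS[-1][0]  # fall back to the cheapest tier


@dataclass
class TieredStore:
    # One dictionary per tier stands in for a real storage option.
    tiers: dict = field(default_factory=lambda: {name: {} for name, _ in TIERS})
    location: dict = field(default_factory=dict)  # object key -> current tier

    def put(self, key: str, payload: bytes, index: float) -> str:
        """Store an object in the tier selected by its importance index."""
        tier = select_tier(index)
        self.tiers[tier][key] = payload
        self.location[key] = tier
        return tier

    def migrate(self, key: str, new_index: float) -> str:
        """Move an object according to a recomputed index: copy to the
        destination first, then remove from the source, so the object is
        never absent from every tier."""
        src = self.location[key]
        dst = select_tier(new_index)
        if dst != src:
            self.tiers[dst][key] = self.tiers[src][key]  # copy
            del self.tiers[src][key]                     # then remove
            self.location[key] = dst
        return dst


store = TieredStore()
store.put("photo-1", b"...", importance_index(900, 1000))  # index 0.9
store.migrate("photo-1", importance_index(50, 1000))       # index 0.05
```

Copying before removing is the design choice that keeps the sketch consistent with the permanent-storage obligation: at every point during a migration the object exists in at least one tier.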