| US 7,480,817 B2 | ||
| Method for replicating data based on probability of concurrent failure | ||
| Jinliang Fan, Redmond, Wash. (US); Zhen Liu, Tarrytown, N.Y. (US); and Dimitrios Pendarakis, Westport, Conn. (US) | ||
| Assigned to International Business Machines Corporation, Armonk, N.Y. (US) | ||
| Filed on Mar. 31, 2006, as Appl. No. 11/395,018. | ||
| Prior Publication US 2007/0234102 A1, Oct. 04, 2007 | ||
| Int. Cl. G06F 11/00 (2006.01) | ||
| U.S. Cl. 714—4 [714/6; 370/238] | 1 Claim |

| 1. A computer-implemented method for replicating data, the method comprising the steps of:
determining, by a source node having a non-volatile data storage area, that the source node has data in the non-volatile data storage area to be replicated;
surveying, by the source node, all nodes coupled to the source node via a network so as to determine candidate replication nodes, the nodes being geographically distributed data storage entities, and the candidate replication nodes being the nodes that are functional, communicating nodes with memory capacity available to store at least a portion of the data to be replicated;
acquiring, by the source node, coordinates for each of the candidate replication nodes;
using, by the source node, the coordinates to determine a geographic location of each of the candidate replication nodes;
using, by the source node, the coordinates to determine a communication cost for each of the candidate replication nodes, the communication cost being determined based on communication parameters that include a physical distance, an electrical pathway distance, a number of switches in an electrical pathway, a cost of establishing a connection, and an electrical pathway signal carrying capacity;
rating, by the source node, each of the geographic locations based on probability of a concurrent failure of the source node and the candidate replication node, the probability being based on historical data and predictive mathematical models, the historical data including statistical records of previous events, and the predictive mathematical models including a model of a combination of independent and correlated failures;
using, by the source node, a branch-and-bound algorithm to assign values to sets of the candidate replication nodes based on a combination of the communication costs and the ratings of the geographic locations of the candidate replication nodes;
selecting, by the source node, one of the sets of candidate replication nodes based on the values that are assigned, the one set of candidate replication nodes being selected so as to obtain a lowest value of the combination of the communication cost and the probability of a concurrent failure;
replicating the data to be replicated on the nodes of the one set of candidate replication nodes; and
at least periodically monitoring, by the source node, all nodes coupled to the source node via the network to determine availability of new nodes.
|
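
The selection steps of the claim (assigning values to sets of candidate replication nodes with a branch-and-bound algorithm, then choosing the set with the lowest value) can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything in it is an assumption made for illustration: the `Candidate` record, the function name `select_replicas`, the fixed replica count `k`, the `weight` parameter, and in particular the objective, which adds the communication costs of a set to `weight` times the product of the per-node concurrent-failure ratings. The claim does not specify how the two quantities are combined, and multiplying the ratings treats them as independent, which is a simplification of the claimed model of independent and correlated failures.

```python
from dataclasses import dataclass
from math import inf
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one candidate replication node."""
    name: str
    comm_cost: float          # assumed scalar communication cost (>= 0)
    p_concurrent_fail: float  # assumed rating in [0, 1]: fails together with the source


def select_replicas(cands: Sequence[Candidate], k: int,
                    weight: float = 1.0) -> Tuple[Tuple[str, ...], float]:
    """Return the k-node set with the lowest assumed value, found by branch and bound.

    Assumed value of a set: sum(comm_cost) + weight * product(p_concurrent_fail).
    """
    if not 0 < k <= len(cands):
        raise ValueError("k must be between 1 and the number of candidates")
    if weight < 0:
        raise ValueError("weight must be non-negative for the bound to hold")

    best_value = inf
    best_set: Tuple[str, ...] = ()

    def lower_bound(idx: int, n_chosen: int, cost: float, prob: float) -> float:
        # Optimistic completion: take the cheapest remaining costs and the
        # smallest remaining probabilities, even if they belong to different nodes.
        need = k - n_chosen
        rest = cands[idx:]
        if len(rest) < need:
            return inf  # not enough nodes left to finish the set
        cheapest = sorted(c.comm_cost for c in rest)[:need]
        for q in sorted(c.p_concurrent_fail for c in rest)[:need]:
            prob *= q
        return cost + sum(cheapest) + weight * prob

    def branch(idx: int, chosen: List[str], cost: float, prob: float) -> None:
        nonlocal best_value, best_set
        if len(chosen) == k:
            value = cost + weight * prob
            if value < best_value:
                best_value, best_set = value, tuple(chosen)
            return
        # Bound: prune subtrees that cannot beat the best complete set so far.
        if lower_bound(idx, len(chosen), cost, prob) >= best_value:
            return
        node = cands[idx]
        # Branch 1: include the current candidate.
        chosen.append(node.name)
        branch(idx + 1, chosen, cost + node.comm_cost, prob * node.p_concurrent_fail)
        chosen.pop()
        # Branch 2: exclude the current candidate.
        branch(idx + 1, chosen, cost, prob)

    branch(0, [], 0.0, 1.0)
    return best_set, best_value


# Usage with made-up numbers: a nearby node is cheap but likely to fail with
# the source, while a distant node costs more but rarely fails concurrently.
example = [
    Candidate("A", 1.0, 0.5),
    Candidate("B", 2.0, 0.1),
    Candidate("C", 5.0, 0.01),
    Candidate("D", 1.5, 0.4),
]
chosen_names, chosen_value = select_replicas(example, k=2, weight=100.0)
```

With these illustrative inputs the search selects nodes `A` and `C`: pairing one cheap, nearby node with one distant, weakly correlated node gives a lower combined value than choosing the two cheapest or the two most reliable nodes. The bound is valid because any completion of a partial set costs at least the sum of the cheapest remaining costs, and its failure product is at least the product of the smallest remaining probabilities.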