US 9,811,775 B2
Parallelizing neural networks during training
Alexander Krizhevsky, Toronto (CA); Ilya Sutskever, Mountain View, CA (US); and Geoffrey E. Hinton, Toronto (CA)
Assigned to Google Inc., Mountain View, CA (US)
Filed by Google Inc., Mountain View, CA (US)
Filed on Sep. 18, 2013, as Appl. No. 14/30,938.
Claims priority of provisional application 61/745,717, filed on Dec. 24, 2012.
Prior Publication US 2014/0180989 A1, Jun. 26, 2014
Int. Cl. G06F 15/18 (2006.01); G06N 3/04 (2006.01); G06K 9/66 (2006.01); G06K 9/62 (2006.01); G06T 1/20 (2006.01); G06N 3/063 (2006.01); G06K 9/46 (2006.01)
CPC G06N 3/04 (2013.01) [G06K 9/4628 (2013.01); G06K 9/6256 (2013.01); G06K 9/66 (2013.01); G06N 3/0454 (2013.01); G06N 3/063 (2013.01); G06T 1/20 (2013.01)] 24 Claims
OG exemplary drawing
 
9. A method for parallelizing a neural network having a plurality of parameters during training of the neural network on a training set to determine a final parameter setting for the parameters of the neural network that produces correct classifications for the training set, the method comprising:
processing data using each of a plurality of parallel neural networks, wherein each of the plurality of parallel neural networks is implemented on a respective computing node, wherein the plurality of parallel neural networks each receive a same input image from the training set and collectively generate an output that classifies the input image, wherein each of the neural networks comprises a respective plurality of layers, wherein each plurality of layers comprises an interconnected layer and a non-interconnected layer, wherein processing data using each of the plurality of parallel neural networks comprises processing the data through the layers of each of the plurality of parallel neural networks, and wherein processing the data through the layers of each of the plurality of parallel neural networks comprises:
providing output from the interconnected layer to at least one layer in each of the other parallel neural networks in the plurality of parallel neural networks;
providing output from the non-interconnected layer only to a layer of the same parallel neural network; and
after the training, storing the final parameter setting for the parameters of the neural network on one or more non-transitory computer storage media.
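
The following is a minimal, hypothetical sketch of the architecture recited in claim 9, written in PyTorch (a framework the patent does not mention; all class and variable names here are illustrative, not from the patent). It models two parallel "towers" that receive the same input image, a non-interconnected layer whose output feeds only the next layer of its own tower, an interconnected layer whose output is also provided to the other tower, and a head that collectively produces one classification. Unlike the claimed method, which implements each parallel network on a respective computing node, both towers here run in a single process purely for illustration.

import torch
import torch.nn as nn

class TwoTowerNet(nn.Module):
    """Hypothetical sketch of claim 9's two parallel networks."""

    def __init__(self, in_dim=784, hidden=128, n_classes=10):
        super().__init__()
        # Non-interconnected layers: output goes only to a layer of
        # the same parallel network (its own tower).
        self.local_a = nn.Linear(in_dim, hidden)
        self.local_b = nn.Linear(in_dim, hidden)
        # Interconnected layers: output is provided to at least one
        # layer in each of the other parallel networks as well.
        self.cross_a = nn.Linear(hidden, hidden)
        self.cross_b = nn.Linear(hidden, hidden)
        # Head consuming both towers' activations to collectively
        # generate an output that classifies the input image.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # Each parallel network receives the same input image x.
        a = torch.relu(self.local_a(x))  # stays inside tower A
        b = torch.relu(self.local_b(x))  # stays inside tower B
        ca = torch.relu(self.cross_a(a))
        cb = torch.relu(self.cross_b(b))
        # Cross-tower exchange: every tower's interconnected output
        # reaches the layer that follows in the other tower as well.
        joint = torch.cat([ca, cb], dim=1)
        return self.head(joint)

Under this reading, the claim's final step of storing the final parameter setting on non-transitory computer storage media would correspond to something like torch.save(model.state_dict(), "params.pt") after training; in the claimed arrangement the concatenation above would instead be an exchange of activations between the respective computing nodes (e.g., separate GPUs).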