US 11,741,350 B2
Efficient utilization of processing element array
Jeffrey T. Huynh, San Jose, CA (US); Ron Diamant, Santa Clara, CA (US); Hongbin Zheng, San Jose, CA (US); Yizhi Liu, Fremont, CA (US); Animesh Jain, Sunnyvale, CA (US); Yida Wang, Palo Alto, CA (US); Vinod Sharma, Menlo Park, CA (US); Richard John Heaton, San Jose, CA (US); Randy Renfu Huang, Morgan Hill, CA (US); Sundeep Amirineni, Cedar Park, TX (US); and Drazen Borkovic, Los Altos, CA (US)
Assigned to Amazon Technologies, Inc., Seattle, WA (US)
Filed by Amazon Technologies, Inc., Seattle, WA (US)
Filed on Nov. 27, 2019, as Appl. No. 16/698,461.
Prior Publication US 2021/0158132 A1, May 27, 2021
Int. Cl. G06N 3/063 (2023.01); G06N 3/04 (2023.01); G06N 3/08 (2023.01)
CPC G06N 3/063 (2013.01) [G06N 3/04 (2013.01)] 20 Claims
OG exemplary drawing
 
1. A computer-implemented method comprising:
receiving, by a processor of a computer system, a neural network model for implementation using a neural network accelerator that includes a first number of rows of processing elements, the neural network model including a network layer that includes a convolution operation for generating an output feature map using a second number of input feature maps and a set of filters;
determining, by the processor, that the second number is equal to or less than half of the first number;
adding, by the processor, operations to the neural network model, the operations including:
padding the second number of input feature maps with padding data to generate padded input feature maps;
dividing each of the padded input feature maps into partitions; and
dividing the convolution operation into sub-operations based on the partitions,
wherein the operations added to the neural network model increase utilization of the rows of processing elements;
generating, by the processor as part of compiling the neural network model, instructions for execution by the neural network accelerator to implement the convolution operation, wherein the convolution operation involves using the rows of processing elements to perform matrix multiplication between individual filters and portions of the input feature maps, and wherein the portions of the input feature maps correspond to windows formed by sliding the individual filters across the input feature maps;
detecting, from the instructions, a first instruction and a second instruction that both use a first partition in the partitions of a padded input feature map, wherein the first instruction and the second instruction use different elements of a filter in the set of filters; and
generating, by the processor, an instruction for replicating, for use by the second instruction, the first partition read from memory and used by the first instruction, wherein the replicating of the first partition avoids rereading the first partition from the memory in connection with the second instruction.
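
The padding-and-partitioning step is the heart of the claim: when a layer has far fewer input channels than the processing element array has rows (the classic case is a 3-channel RGB input layer on a 128-row array), a naive mapping of one channel per row leaves most rows idle. The Python/NumPy sketch below shows one plausible reading of that step; the 128-row array size, the partitions-per-map choice, and all function and variable names are our assumptions for illustration, not the patented compiler pass itself.

    import numpy as np

    def partition_ifmaps(ifmaps, pe_rows, filter_h):
        """Pad each input feature map and split it into row-wise
        partitions so that (channels * partitions) fills more of the
        PE-array rows. A sketch under assumed conventions, not the
        actual compiler transformation from the patent."""
        c, h, w = ifmaps.shape
        assert c <= pe_rows // 2          # the condition the claim tests
        p = pe_rows // c                  # partitions per feature map (hypothetical choice)
        part_h = -(-h // p)               # ceil(h / p) rows per partition
        # Zero-pad the bottom so the height divides evenly, plus
        # filter_h - 1 overlap rows so sliding windows that straddle
        # a partition boundary remain computable within one partition.
        pad_rows = p * part_h + filter_h - 1 - h
        padded = np.pad(ifmaps, ((0, 0), (0, pad_rows), (0, 0)))
        parts = [padded[:, i * part_h : i * part_h + part_h + filter_h - 1, :]
                 for i in range(p)]
        return parts                      # each partition feeds its own sub-operation

With pe_rows = 128 and c = 3, this yields p = 42 partitions, so 3 x 42 = 126 of the 128 rows can be kept busy where the unpartitioned mapping would occupy only 3.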
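The claim's matrix-multiplication language describes the standard im2col view of convolution: each sliding window of the input feature maps is flattened into a column and each filter into a row, so the whole convolution becomes one matrix product of the kind a systolic PE array executes. A minimal sketch (stride 1, no spatial padding; the names are ours, not the patent's):

    import numpy as np

    def conv_as_matmul(ifmaps, filters):
        """Express convolution as the matrix multiply the PE array
        performs: each output pixel is a dot product between one
        filter and one sliding window of the input."""
        c, h, w = ifmaps.shape
        m, _, fh, fw = filters.shape      # m filters, each c x fh x fw
        oh, ow = h - fh + 1, w - fw + 1
        # One column per window position, one row per filter element.
        cols = np.empty((c * fh * fw, oh * ow))
        for y in range(oh):
            for x in range(ow):
                cols[:, y * ow + x] = ifmaps[:, y:y + fh, x:x + fw].ravel()
        out = filters.reshape(m, -1) @ cols   # m x (oh * ow) matrix product
        return out.reshape(m, oh, ow)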
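The final two limitations describe a peephole-style compiler pass: after partitioning, several sub-operations consume the same partition with different filter elements, so the compiler detects the repeated reads and emits a replicate instruction in place of every read after the first, keeping the data on-chip instead of refetching it. A toy version of such a pass, over an entirely hypothetical instruction format, might look like:

    def insert_replications(instrs):
        """instrs: list of ("load", partition_id) or
        ("matmul", partition_id, filter_elem) tuples. This encoding
        is invented for illustration; the accelerator's real ISA is
        not described here."""
        seen = set()
        out = []
        for op in instrs:
            if op[0] == "load" and op[1] in seen:
                # The partition is already on-chip from an earlier
                # load: replicate it locally rather than reread memory.
                out.append(("replicate", op[1]))
            else:
                if op[0] == "load":
                    seen.add(op[1])
                out.append(op)
        return out

    # Example: two matmuls against partition 0 with different filter
    # elements; the second load becomes a replicate.
    prog = [("load", 0), ("matmul", 0, "w00"),
            ("load", 0), ("matmul", 0, "w01")]
    assert insert_replications(prog)[2] == ("replicate", 0)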