US 11,816,560 B2
Performance estimation-based resource allocation for reconfigurable architectures
Zhuo Chen, Mountain View, CA (US); and Sumti Jairath, Santa Clara, CA (US)
Assigned to SambaNova Systems, Inc., Palo Alto, CA (US)
Filed by SambaNova Systems, Inc., Palo Alto, CA (US)
Filed on Aug. 8, 2022, as Appl. No. 17/883,407.
Application 17/883,407 is a continuation of application No. 16/572,527, filed on Sep. 16, 2019, granted, now Pat. No. 11,410,027.
Prior Publication US 2022/0374695 A1, Nov. 24, 2022
Int. Cl. G06N 3/063 (2023.01); G06F 16/904 (2019.01); G06F 15/78 (2006.01)
CPC G06N 3/063 (2013.01) [G06F 15/7892 (2013.01); G06F 16/904 (2019.01)]
18 Claims
 
1. A computer-implemented method of efficiently executing an operation unit graph on a reconfigurable data processor with a target architecture that includes physical compute units and/or physical memory units, the method including:
reducing a number of the physical compute units and/or physical memory units of the reconfigurable data processor required to execute the operation unit graph by
receiving, from a user, architectural hints that are specific to the target architecture of the reconfigurable data processor,
wherein the architectural hints
call for fusing first operation units when executing a pattern of the first operation units on the physical compute units and/or physical memory units of the reconfigurable data processor,
specify the first operation units in the pattern as first nodes,
specify first dataflows among the first operation units in the pattern as first edges, and
direct fusion among the first operation units in the pattern;
scanning the operation unit graph to detect an instance of the pattern of the first operation units specified by the architectural hints, including
matching second nodes and second edges in the operation unit graph with the first nodes and the first edges in the architectural hints, and detecting a pattern match;
fusing operation units of the second nodes and the second edges in the operation unit graph into a consolidated operation units block, thereby producing a fused operation unit graph;
allocating a set of physical compute units and/or physical memory units of the physical compute units and/or physical memory units of the reconfigurable data processor to the fused operation unit graph; and
executing the fused operation unit graph on the reconfigurable data processor based on the allocation.
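
The scan, match, and fuse steps recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the `Graph` class, the `find_match` and `fuse` helpers, the operation names (`Conv2D`, `BatchNorm`, `ReLU`), and the brute-force matching strategy are all hypothetical and do not come from the patent. The architectural hint is modeled as a small pattern graph whose nodes are operation types (the first nodes) and whose edges are dataflows (the first edges).

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Graph:
    nodes: dict  # node name -> operation type (hypothetical representation)
    edges: set   # set of (source, destination) dataflow pairs


def find_match(graph, pattern):
    """Return a mapping pattern node -> graph node for the first match, or None.

    Brute-force search: fine for small patterns, not meant to scale.
    """
    p_names = list(pattern.nodes)
    for candidate in permutations(graph.nodes, len(p_names)):
        mapping = dict(zip(p_names, candidate))
        # Every matched node must have the operation type the hint names.
        if any(graph.nodes[mapping[p]] != pattern.nodes[p] for p in p_names):
            continue
        # Every dataflow edge in the hint must exist between matched nodes.
        if all((mapping[s], mapping[d]) in graph.edges for s, d in pattern.edges):
            return mapping
    return None


def fuse(graph, mapping, fused_name="fused_block"):
    """Collapse the matched nodes into one consolidated block."""
    matched = set(mapping.values())
    nodes = {n: op for n, op in graph.nodes.items() if n not in matched}
    nodes[fused_name] = "+".join(graph.nodes[n] for n in mapping.values())
    edges = set()
    for s, d in graph.edges:
        s2 = fused_name if s in matched else s
        d2 = fused_name if d in matched else d
        if s2 != d2:  # drop edges that are now internal to the fused block
            edges.add((s2, d2))
    return Graph(nodes, edges)


# Operation unit graph: in -> conv1 -> bn1 -> relu1 -> out
graph = Graph(
    nodes={"in": "Input", "conv1": "Conv2D", "bn1": "BatchNorm",
           "relu1": "ReLU", "out": "Output"},
    edges={("in", "conv1"), ("conv1", "bn1"), ("bn1", "relu1"), ("relu1", "out")},
)

# Architectural hint (assumed form): fuse Conv2D -> BatchNorm -> ReLU.
pattern = Graph(
    nodes={"a": "Conv2D", "b": "BatchNorm", "c": "ReLU"},
    edges={("a", "b"), ("b", "c")},
)

mapping = find_match(graph, pattern)
fused = fuse(graph, mapping) if mapping else graph
print(sorted(fused.nodes.items()))
print(sorted(fused.edges))
```

In this sketch the fused graph has three nodes instead of five, which is the sense in which fusion reduces the number of units an allocator would need to assign. The sketch does not model allocation or execution on physical compute or memory units, and it does not check whether fusing a match with additional external edges would introduce a cycle.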