US 7,484,075 B2
Method and apparatus for providing fast remote register access in a clustered VLIW processor using partitioned register files
Krishnan K. Kailas, Ossining, N.Y. (US)
Assigned to International Business Machines Corporation, Armonk, N.Y. (US)
Filed on Dec. 16, 2002, as Appl. No. 10/320,150.
Prior Publication US 2004/0117597 A1, Jun. 17, 2004
Int. Cl. G06F 9/30 (2006.01)
U.S. Cl. 712/24
20 Claims
 
1. A computer system, comprising:
a plurality of clustered processing cores for processing VLIW (Very Long Instruction Word) operations, wherein each processing core comprises:
a local partitioned register file having a subset of an architected name space;
an instruction decoder to decode a VLIW for execution;
an inter-cluster communication bus enabling communication between the processing cores;
a processor pipeline including a plurality of stages for operating on the VLIW; and
a hardware register pre-fetch unit comprising an instruction pre-fetch buffer to store the VLIW to await decoding by the instruction decoder,
wherein the hardware register pre-fetch unit (i) pre-decodes a name of a register specified in the VLIW in advance of decoding by the instruction decoder to determine if a remote register is needed to execute the VLIW, and (ii) generates a control signal to pre-fetch data, from the specified remote register in a remote processing core or from a remote bypass network, for an instruction along one execution path in a program, in advance of decoding of the VLIW by the instruction decoder for execution, based on a compiler analysis of the program that schedules instructions that are data dependent by taking into account a latency of the inter-cluster communication bus, a size of the instruction pre-fetch buffer, and a depth of the processor pipeline.
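The pre-decode step recited in element (i) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the even, contiguous split of the register name space (`REGS_PER_CLUSTER`), the `Op` record, and the function names `owner_cluster` and `predecode_remote_reads` are all hypothetical and do not appear in the claim. The model scans operations waiting in an instruction pre-fetch buffer, checks each source register name against the local partition, and lists the remote registers for which a pre-fetch control signal would be raised.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: each cluster owns a contiguous block of 16 architected registers.
REGS_PER_CLUSTER = 16


@dataclass(frozen=True)
class Op:
    """Hypothetical operation record: one destination, several sources."""
    dest: int
    srcs: Tuple[int, ...]


def owner_cluster(reg: int) -> int:
    """Map an architected register name to the cluster holding it."""
    return reg // REGS_PER_CLUSTER


def predecode_remote_reads(buffer: List[Op], local_cluster: int) -> List[Tuple[int, int]]:
    """Scan buffered operations ahead of full decode.

    Returns (remote_cluster, register) pairs, in first-use order, for every
    source register that is not in the local partition. Each pair stands in
    for one pre-fetch request on the inter-cluster communication bus.
    """
    requests: List[Tuple[int, int]] = []
    seen = set()
    for op in buffer:
        for reg in op.srcs:
            cluster = owner_cluster(reg)
            if cluster != local_cluster and reg not in seen:
                seen.add(reg)  # request each remote register only once
                requests.append((cluster, reg))
    return requests


if __name__ == "__main__":
    buffer = [Op(dest=3, srcs=(1, 18)), Op(dest=4, srcs=(18, 35))]
    # Cluster 0 holds registers 0..15, so 18 and 35 are remote.
    print(predecode_remote_reads(buffer, local_cluster=0))
```

The model leaves out the timing side of the claim. In the claimed system the compiler schedules data-dependent instructions with the bus latency, buffer size, and pipeline depth in mind, so that a request issued at pre-decode time can complete before the consuming operation reaches the instruction decoder.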