US 9,811,464 B2
Apparatus and method for considering spatial locality in loading data elements for execution
Ruchira Sasanka, Hillsboro, OR (US); and Elmoustapha Ould-Ahmed-Vall, Chandler, AZ (US)
Assigned to INTEL CORPORATION, Santa Clara, CA (US)
Filed by Intel Corporation, Santa Clara, CA (US)
Filed on Dec. 11, 2014, as Appl. No. 14/567,602.
Prior Publication US 2016/0170883 A1, Jun. 16, 2016
Int. Cl. G06F 12/08 (2016.01); G06F 12/0811 (2016.01); G06F 12/0875 (2016.01); G06F 12/0804 (2016.01); G06F 12/0886 (2016.01)
CPC G06F 12/0811 (2013.01) [G06F 12/0804 (2013.01); G06F 12/0875 (2013.01); G06F 12/0886 (2013.01); G06F 2212/283 (2013.01); G06F 2212/452 (2013.01)]
14 Claims
 
1. A processor that loads data elements, comprising:
an upper level cache; and
at least one processor core coupled to the upper level cache, including one or more registers and a plurality of instruction processing stages:
a decoder unit to decode an instruction requiring an input of a plurality of data elements, wherein a size of each of the plurality of data elements is less than a cache line size of the processor, and
responsive to the instruction, an execution unit to load the plurality of data elements to the one or more registers, without loading data elements spatially adjacent to the plurality of data elements, wherein the loading of the plurality of data elements is to:
gather the plurality of data elements in a temporary buffer; and
load the plurality of data elements from the temporary buffer to the one or more registers.
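The two-step load recited in claim 1 can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the patented hardware: the names `gather_without_locality`, `load_with_locality`, `cached_lines`, and the constants `CACHE_LINE_SIZE` and `ELEMENT_SIZE` are hypothetical and do not appear in the patent. Memory is modeled as a byte array and the cache as a set of cache-line indices, so the difference between the two load styles shows up as which lines end up cached.

```python
CACHE_LINE_SIZE = 64  # bytes; assumed value for illustration only
ELEMENT_SIZE = 4      # bytes; smaller than a cache line, as the claim requires


def gather_without_locality(memory, addresses, cached_lines):
    """Model of the claimed load: gather into a temporary buffer, then
    copy to a register, without caching the neighbouring data."""
    temp_buffer = []
    for addr in addresses:
        # Read only the requested element; its spatial neighbours are not loaded.
        temp_buffer.append(bytes(memory[addr:addr + ELEMENT_SIZE]))
    # Move the gathered elements from the temporary buffer to the register.
    register = list(temp_buffer)
    # cached_lines is deliberately left untouched in this model.
    return register


def load_with_locality(memory, addresses, cached_lines):
    """Conventional load, shown for contrast: every access fills the
    whole cache line that contains the element."""
    register = []
    for addr in addresses:
        cached_lines.add(addr // CACHE_LINE_SIZE)
        register.append(bytes(memory[addr:addr + ELEMENT_SIZE]))
    return register


if __name__ == "__main__":
    memory = bytearray(range(256)) * 4   # 1024 bytes of sample data
    addresses = [0, 256, 512, 768]       # widely strided elements
    cache = set()
    print(gather_without_locality(memory, addresses, cache), cache)
```

In this model, four strided elements loaded the conventional way would occupy four separate cache lines, each mostly filled with data the instruction never uses. The gather path returns the same four elements and leaves the cache model empty.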