1. An apparatus comprising:
a display;
a shared local memory;
a data register; and
a processor configured to execute:
a scalar module to define a plurality of execution elements, wherein two or more of the execution elements are to be grouped into an element block; and
a block module to be invoked by the scalar module and to implement a block operation on a data block,
wherein the block operation is to include a data transfer event between system memory and the data register, excluding the shared local memory, the data transfer event to be performed simultaneously by the two or more execution elements of the element block through an access to one memory address in the system memory for the entire data block, the access not explicitly defining a width of the data block,
wherein the width of the data block is to be implicitly defined based on the number of execution elements in the element block, and
wherein the display is to render data based on the block operation.
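
For illustration only, the block operation recited above might be modeled as in the following minimal Python sketch. It is not the claimed implementation. The names `ElementBlock`, `block_read`, and `block_write` are hypothetical, system memory is assumed to be a flat list of integers, the data register is assumed to hold one slot per execution element, and the simultaneous access is simulated with an ordinary loop. The point it shows is that the caller supplies a single memory address and never a width: the width comes from the number of execution elements in the element block, and no shared local memory is involved in the transfer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ElementBlock:
    """Hypothetical element block: a group of execution elements."""
    num_elements: int
    # Models the data register with one slot per execution element (assumption).
    data_register: List[int] = field(init=False)

    def __post_init__(self) -> None:
        if self.num_elements < 2:
            raise ValueError("an element block groups two or more execution elements")
        self.data_register = [0] * self.num_elements


def _check_bounds(address: int, width: int, memory_size: int) -> None:
    """Reject a block access that would fall outside the modeled system memory."""
    if address < 0 or address + width > memory_size:
        raise IndexError("block access falls outside system memory")


def block_read(block: ElementBlock, system_memory: List[int], address: int) -> List[int]:
    """Transfer one data block from system memory into the data register.

    Only one address is passed. The width is implied by the element block.
    """
    width = block.num_elements  # implicit width, never supplied by the caller
    _check_bounds(address, width, len(system_memory))
    # Each execution element (lane) receives one item of the data block.
    # Real hardware would do this in one access; the loop is only a simulation.
    block.data_register = [system_memory[address + lane] for lane in range(width)]
    return block.data_register


def block_write(block: ElementBlock, system_memory: List[int], address: int) -> None:
    """Transfer the data register contents to system memory at one address."""
    width = block.num_elements  # implicit width again
    _check_bounds(address, width, len(system_memory))
    for lane in range(width):
        system_memory[address + lane] = block.data_register[lane]
```

Under these assumptions, a four-element block reading at address 8 fills its four register slots from memory locations 8 through 11, and the same call on an eight-element block would move eight items with no change to the call itself.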