US 7,593,978 B2
Processor reduction unit for accumulation of multiple operands with or without saturation
Michael J. Schulte, Madison, Wis. (US); Pablo I. Balzola, Nanuet, N.Y. (US); and C. John Glossner, Carmel, N.Y. (US)
Assigned to Sandbridge Technologies, Inc., White Plains, N.Y. (US)
Filed on May 07, 2004, as Appl. No. 10/841,261.
Claims priority of provisional application 60/469253, filed on May 09, 2003.
Prior Publication US 2005/0071413 A1, Mar. 31, 2005
Int. Cl. G06F 15/00 (2006.01)
U.S. Cl. 708/603
17 Claims
 
1. A multithreaded processor comprising:
a plurality of arithmetic units;
an accumulator unit; and
a reduction unit coupled between the plurality of arithmetic units and the accumulator unit, the reduction unit being configured to receive input operands from the arithmetic units and a first accumulator value from the accumulator unit;
wherein the reduction unit is pipelined and operative to sum the input operands and the first accumulator value, and to generate a second accumulator value for delivery to the accumulator unit, the reduction unit further comprising m inputs, m adders, and an m stage pipeline, where m is greater than or equal to two, each of the m inputs being coupled to a corresponding adder by means of N−1 pipeline registers where N is a stage number greater than or equal to 1, the m stage pipeline being configured to reduce the worst case delay of the reduction unit;
wherein the reduction unit is controllable to support saturation and wrap-around arithmetic; and
wherein operations for a dot product computed for a given thread are executed concurrently with operations from other threads, the number of cycles between execution of instructions from the given thread being greater than or equal to a number of pipeline stages in the reduction unit plus any additional cycles needed to write to and read from the accumulator unit.
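
The behavior recited in claim 1 can be illustrated with a minimal software sketch. It is not the claimed hardware, and it rests on several assumptions not found in the claim: operands are signed two's-complement integers of a fixed width, the saturation or wrap-around adjustment is applied after each of the m additions, and the function names and the `acc_access_cycles` parameter are hypothetical. The last function expresses the thread-spacing condition of the final wherein clause as a simple sum.

```python
def wrap(value, bits):
    """Wrap-around: keep the low `bits` bits, read as signed two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate(value, bits):
    """Saturation: clamp to the most negative / most positive representable value."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def reduce_accumulate(operands, acc, bits=32, saturating=True):
    """Sum m operands and a first accumulator value into a second accumulator value.

    Each loop iteration stands in for one adder (one pipeline stage).
    Applying the adjustment after every addition is an assumption.
    """
    adjust = saturate if saturating else wrap
    for operand in operands:
        acc = adjust(acc + operand, bits)
    return acc


def min_thread_spacing(pipeline_stages, acc_access_cycles):
    """Minimum cycles between instructions of one thread, per the last clause.

    `acc_access_cycles` (hypothetical name) is the extra cycles needed to
    write to and read from the accumulator unit.
    """
    return pipeline_stages + acc_access_cycles


if __name__ == "__main__":
    products = [30000, 30000, -30000]
    print(reduce_accumulate(products, 0, bits=16, saturating=True))
    print(reduce_accumulate(products, 0, bits=16, saturating=False))
    print(min_thread_spacing(pipeline_stages=4, acc_access_cycles=2))
```

With 16-bit operands, the saturating and wrap-around modes give different results for the same inputs, because an intermediate sum exceeds the representable range. This is why the mode must be selectable.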