| US 7,593,978 B2 | ||
| Processor reduction unit for accumulation of multiple operands with or without saturation | ||
| Michael J. Schulte, Madison, Wis. (US); Pablo I. Balzola, Nanuet, N.Y. (US); and C. John Glossner, Carmel, N.Y. (US) | ||
| Assigned to Sandbridge Technologies, Inc., White Plains, N.Y. (US) | ||
| Filed on May 07, 2004, as Appl. No. 10/841,261. | ||
| Claims priority of provisional application 60/469253, filed on May 09, 2003. | ||
| Prior Publication US 2005/0071413 A1, Mar. 31, 2005 | ||
| Int. Cl. G06F 15/00 (2006.01) | ||
| U.S. Cl. 708—603 | 17 Claims |

| 1. A multithreaded processor comprising:
a plurality of arithmetic units;
an accumulator unit; and
a reduction unit coupled between the plurality of arithmetic units and the accumulator unit, the reduction unit being configured
to receive input operands from the arithmetic units and a first accumulator value from the accumulator unit;
wherein the reduction unit is pipelined and operative to sum the input operands and the first accumulator value, and to generate
a second accumulator value for delivery to the accumulator unit, the reduction unit further comprising m inputs, m adders,
and an m stage pipeline, where m is greater than or equal to two, each of the m inputs being coupled to a corresponding adder
by means of N−1 pipeline registers where N is a stage number greater than or equal to 1, the m stage pipeline being configured
to reduce the worst case delay of the reduction unit;
wherein the reduction unit is controllable to support saturation and wrap-around arithmetic; and
wherein operations for a dot product computed for a given thread are executed concurrently with operations from other threads,
the number of cycles between execution of instructions from the given thread being greater than or equal to a number of pipeline
stages in the reduction unit plus any additional cycles needed to write to and read from the accumulator unit.
|
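
The arithmetic and timing conditions recited in claim 1 can be illustrated with a minimal behavioral sketch. It is not the patented circuit. It assumes a 32-bit two's-complement accumulator, models the m adders as a chain in which each adder adds one input operand to the running sum, and assumes saturation is applied after every individual addition. The names `reduce_accumulate` and `min_thread_spacing`, the `bits` parameter, and the numeric values in the usage lines are hypothetical and do not come from the patent text.

```python
def reduce_accumulate(operands, acc, bits=32, saturate=True):
    """Behavioral model: add each operand to the accumulator value in turn.

    Assumption: one adder per operand, chained, with the overflow policy
    (saturate or wrap around) applied after each individual addition.
    """
    lo = -(1 << (bits - 1))          # most negative representable value
    hi = (1 << (bits - 1)) - 1       # most positive representable value
    mask = (1 << bits) - 1
    for x in operands:
        total = acc + x
        if saturate:
            # Saturation: clamp to the representable range.
            acc = max(lo, min(hi, total))
        else:
            # Wrap-around: keep the low `bits` bits, reinterpret as signed.
            wrapped = total & mask
            acc = wrapped - (1 << bits) if wrapped > hi else wrapped
    return acc


def min_thread_spacing(pipeline_stages, accumulator_access_cycles):
    """Smallest cycle gap between instructions of one thread that satisfies
    the last limitation of claim 1: pipeline stages plus the extra cycles
    needed to write to and read from the accumulator unit."""
    return pipeline_stages + accumulator_access_cycles


# Hypothetical usage: two operands pushed past the 32-bit maximum.
near_max = 2**31 - 2
saturated = reduce_accumulate([1, 1], near_max, saturate=True)   # stays at 2**31 - 1
wrapped = reduce_accumulate([1, 1], near_max, saturate=False)    # wraps to -2**31
spacing = min_thread_spacing(pipeline_stages=4, accumulator_access_cycles=2)
```

With saturation enabled the result clamps at the largest representable value, while the wrap-around mode rolls over to the most negative value. The spacing helper only restates the inequality in the claim; the stage and access-cycle counts used above are illustrative.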