A minimal tensor processing unit (TPU), reinvented from Google's TPU V2 and V1.
Architecture
Processing Element (PE)
Function: Performs a multiply-accumulate operation every clock cycle
Data Flow:
Incoming data is multiplied by a stored weight and added to an incoming partial sum to produce an output sum
Incoming data also passes through to the next element for propagation across the array
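The PE behavior described above can be sketched as a small behavioral model in Python. This is only an illustration, not the project's hardware description: the class name `ProcessingElement`, the `step` method, and the field names are hypothetical, and integer arithmetic is assumed for simplicity.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    """Behavioral model of one PE (hypothetical names, integer math assumed)."""
    weight: int = 0     # stored weight, assumed to be loaded before computation
    data_out: int = 0   # registered copy of the incoming data, passed to the next PE
    psum_out: int = 0   # registered output sum

    def step(self, data_in: int, psum_in: int) -> tuple[int, int]:
        """Model one clock cycle: multiply-accumulate, then forward the data."""
        # Multiply incoming data by the stored weight and add the incoming partial sum.
        self.psum_out = psum_in + data_in * self.weight
        # Pass the incoming data through unchanged for the next element.
        self.data_out = data_in
        return self.data_out, self.psum_out


pe = ProcessingElement(weight=3)
data_out, psum_out = pe.step(data_in=2, psum_in=10)
print(data_out, psum_out)
```

Here a PE holding weight 3 receives data 2 and partial sum 10, so it forwards the data unchanged and produces 10 + 2 * 3 as its output sum.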
Systolic Array
Architecture: A grid of processing elements, starting from 2x2
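To make the grid concrete, here is a minimal cycle-by-cycle sketch of a 2x2 array in Python. Several things in it are assumptions rather than details from this project: data is taken to flow left to right, partial sums top to bottom, weights stay fixed in place, and input row `r` is delayed by `r` cycles so operands meet at the right PE. The function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(X, W):
    """Simulate a weight-stationary systolic array computing X @ W.

    Assumptions (not taken from the project): data moves left to right,
    partial sums move top to bottom, and every PE registers its outputs.
    X is N x R, W is R x C, and the array has R rows and C columns of PEs.
    """
    n = len(X)
    rows, cols = len(W), len(W[0])
    data = [[0] * cols for _ in range(rows)]  # data registers of each PE
    psum = [[0] * cols for _ in range(rows)]  # partial-sum registers of each PE
    Y = [[0] * cols for _ in range(n)]

    for t in range(n + rows + cols - 2):
        new_data = [[0] * cols for _ in range(rows)]
        new_psum = [[0] * cols for _ in range(rows)]
        for r in range(rows):
            for c in range(cols):
                if c == 0:
                    # Left edge: row r receives X[i][r] skewed by r cycles.
                    i = t - r
                    d_in = X[i][r] if 0 <= i < n else 0
                else:
                    d_in = data[r][c - 1]  # data forwarded by the left neighbor
                p_in = psum[r - 1][c] if r > 0 else 0  # sum from the PE above
                new_data[r][c] = d_in
                new_psum[r][c] = p_in + d_in * W[r][c]
        data, psum = new_data, new_psum

        # The bottom row emits finished results, skewed by column index.
        for c in range(cols):
            i = t - (rows - 1) - c
            if 0 <= i < n:
                Y[i][c] = psum[rows - 1][c]
    return Y


X = [[1, 2], [3, 4]]
W = [[5, 6], [7, 8]]
print(systolic_matmul(X, W))  # [[19, 22], [43, 50]]
```

Each PE only ever reads from its left and upper neighbors, which is the property that lets the same loop body model a larger grid by passing a bigger `W`.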