Multiplatform Matrix Multiplication Kernels
Few algorithmic problems are as central to modern computing as matrix multiplication. It is fundamental to AI, forming the basis of fully connected layers used throughout neural networks. In transformer architectures, most of the computation is spent performing matrix multiplication. And since compute largely determines capability, faster matrix multiplication algorithms directly translate into more powerful models [1 ]. NVIDIA probably deserves much of the credit for making matrix multiplicatio