2025-07-20
What do I think the ideal array language should look like?
The fundamental units of computation available to users today are not the same as they were 20 years ago. When users had at most a few cores on a single CPU, it made complete sense that every program was written with the assumption that it would only run on a single core.
Even in a high-performance computing (HPC) context, the default mode of parallelism was (for a long time) the Message Passing Interface (MPI), which is a descriptive model of multi-core and multi-node parallelism. Most code was still basically written with the same assumptions: all units of computation were assumed to be uniform.
Hardware has trended towards heterogeneity in several ways:
More cores per node
More nodes per system
More kinds of subsystems (GPUs, FPGAs, etc.)
More kinds of computational units on a single subsystem: CPUs have lots of vector units and specialized instructions; NVIDIA GPUs have lots of tensor cores specialized for matrix operations
New paradigms at the assembly level: the Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME) on Arm