Energy-efficient AI inference framework & kernels for phones & AI-native hardware. Budget and mid-range phones account for over 70% of the market, but today's frameworks optimise for high-end phones with advanced chips. Cactus is designed bottom-up with no dependencies to run on all mobile devices.
Example (CPU-only):
Model: Qwen3-600m-INT8
File size: 370-420 MB
16-20 t/s on Pixel 6a, Galaxy S21, iPhone 11 Pro
50-70 t/s on Pixel 9, Galaxy S25, iPhone 16
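To put the throughput figures above in context, a quick back-of-the-envelope calculation (plain Python, using the low end of each quoted range) shows how long a typical reply would take to decode on each device tier. The 128-token reply length is an illustrative assumption, not a Cactus default:

```python
# Rough decode-latency estimate from the throughput figures above.
# REPLY_TOKENS is an illustrative assumption, not a Cactus default.

def reply_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to decode `tokens` at a steady `tokens_per_sec` rate."""
    return tokens / tokens_per_sec

REPLY_TOKENS = 128

# Conservative (low end of each quoted range).
budget_tier = reply_seconds(REPLY_TOKENS, 16)    # Pixel 6a / Galaxy S21 / iPhone 11 Pro
flagship_tier = reply_seconds(REPLY_TOKENS, 50)  # Pixel 9 / Galaxy S25 / iPhone 16

print(f"budget:   {budget_tier:.1f} s")    # 128 / 16 = 8.0 s
print(f"flagship: {flagship_tier:.1f} s")  # 128 / 50 ≈ 2.6 s
```

So even on a 2021-era budget device, a full chat reply arrives in seconds on CPU alone.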
Architecture
Cactus exposes 4 levels of abstraction.
┌─────────────────┐
│   Cactus FFI    │ ←── OpenAI-compatible C API for integration
└─────────────────┘
         │
┌─────────────────┐
│  Cactus Engine  │ ←── High-level transformer engine
└─────────────────┘
         │
┌─────────────────┐
│  Cactus Graph   │ ←── Unified zero-copy computation graph
└─────────────────┘
         │
┌─────────────────┐
│ Cactus Kernels  │ ←── Low-level ARM-specific SIMD operations
└─────────────────┘
Cactus Graph is a general-purpose numerical computing framework that runs on Cactus Kernels. It is well suited to implementing custom models and scientific computing: think of it as JAX for phones.
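To illustrate the idea behind a deferred computation graph of this kind, here is a toy Python/NumPy sketch: operations are recorded as graph nodes that hold references to their input buffers (no copies), and nothing is computed until the graph is evaluated. All names and structure here are hypothetical and are NOT the Cactus Graph API:

```python
# Toy sketch of a deferred computation graph -- illustrative only;
# this is NOT the Cactus Graph API, just the general technique.
import numpy as np

class Node:
    """A graph node: either a leaf holding a buffer, or a deferred op over inputs."""
    def __init__(self, op=None, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def __add__(self, other):
        return Node("add", (self, other))

    def matmul(self, other):
        return Node("matmul", (self, other))

def tensor(array):
    """Wrap a concrete buffer as a leaf node (holds a reference, no copy)."""
    return Node(value=np.asarray(array))

def evaluate(node):
    """Post-order evaluation of the recorded graph."""
    if node.op is None:
        return node.value
    a, b = (evaluate(i) for i in node.inputs)
    if node.op == "add":
        return a + b
    if node.op == "matmul":
        return a @ b
    raise ValueError(f"unknown op {node.op!r}")

# Build the graph first (nothing is computed yet), then evaluate once.
x = tensor([[1.0, 2.0], [3.0, 4.0]])
w = tensor([[1.0, 0.0], [0.0, 1.0]])  # identity weights for a checkable result
b = tensor([[0.5, 0.5], [0.5, 0.5]])
y = x.matmul(w) + b
print(evaluate(y))  # [[1.5 2.5] [3.5 4.5]]
```

A real implementation would additionally plan buffer reuse across the graph (the "zero-copy" part) and dispatch each op to hand-written SIMD kernels rather than NumPy.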