Energy-efficient AI inference framework & kernels for phones & AI-native hardware. Budget and mid-range phones account for over 70% of the market, but today's frameworks optimise for high-end phones with advanced chips. Cactus is designed bottom-up with no dependencies to run on all mobile devices.
Example (CPU-only):
Model: Qwen3-600m-INT8
File size: 370-420 MB
16-20 t/s on Pixel 6a, Galaxy S21, iPhone 11 Pro
50-70 t/s on Pixel 9, Galaxy S25, iPhone 16
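To put the throughput figures above in context, a quick back-of-the-envelope calculation (plain Python, using the low end of each quoted range) shows how long a typical reply would take to decode on each device tier. The 128-token reply length is an illustrative assumption, not a Cactus default:

```python
# Rough decode-latency estimate from the throughput figures above.
# REPLY_TOKENS is an illustrative assumption, not a Cactus default.

def reply_seconds(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to decode `tokens` at a steady `tokens_per_sec` rate."""
    return tokens / tokens_per_sec

REPLY_TOKENS = 128

# Conservative (low end of each quoted range).
budget_tier = reply_seconds(REPLY_TOKENS, 16)    # Pixel 6a / Galaxy S21 / iPhone 11 Pro
flagship_tier = reply_seconds(REPLY_TOKENS, 50)  # Pixel 9 / Galaxy S25 / iPhone 16

print(f"budget:   {budget_tier:.1f} s")    # 128 / 16 = 8.0 s
print(f"flagship: {flagship_tier:.1f} s")  # 128 / 50 ≈ 2.6 s
```

So even on a 2021-era budget device, a full chat reply arrives in seconds on CPU alone.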
Architecture
Cactus exposes 4 levels of abstraction.
┌─────────────────┐
│   Cactus FFI    │ ←── OpenAI-compatible C API for integration
└─────────────────┘
         │
┌─────────────────┐
│  Cactus Engine  │ ←── High-level transformer engine
└─────────────────┘
         │
┌─────────────────┐
│  Cactus Graph   │ ←── Unified zero-copy computation graph
└─────────────────┘
         │
┌─────────────────┐
│ Cactus Kernels  │ ←── Low-level ARM-specific SIMD operations
└─────────────────┘
Cactus Graph is a general-purpose numerical computing framework that runs on Cactus Kernels. It is well suited to implementing custom models and scientific computing: think of it as JAX for phones.
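To illustrate the idea behind a deferred computation graph of this kind, here is a toy Python/NumPy sketch: operations are recorded as graph nodes that hold references to their input buffers (no copies), and nothing is computed until the graph is evaluated. All names and structure here are hypothetical and are NOT the Cactus Graph API:

```python
# Toy sketch of a deferred computation graph -- illustrative only;
# this is NOT the Cactus Graph API, just the general technique.
import numpy as np

class Node:
    """A graph node: either a leaf holding a buffer, or a deferred op over inputs."""
    def __init__(self, op=None, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def __add__(self, other):
        return Node("add", (self, other))

    def matmul(self, other):
        return Node("matmul", (self, other))

def tensor(array):
    """Wrap a concrete buffer as a leaf node (holds a reference, no copy)."""
    return Node(value=np.asarray(array))

def evaluate(node):
    """Post-order evaluation of the recorded graph."""
    if node.op is None:
        return node.value
    a, b = (evaluate(i) for i in node.inputs)
    if node.op == "add":
        return a + b
    if node.op == "matmul":
        return a @ b
    raise ValueError(f"unknown op {node.op!r}")

# Build the graph first (nothing is computed yet), then evaluate once.
x = tensor([[1.0, 2.0], [3.0, 4.0]])
w = tensor([[1.0, 0.0], [0.0, 1.0]])  # identity weights for a checkable result
b = tensor([[0.5, 0.5], [0.5, 0.5]])
y = x.matmul(w) + b
print(evaluate(y))  # [[1.5 2.5] [3.5 4.5]]
```

A real implementation would additionally plan buffer reuse across the graph (the "zero-copy" part) and dispatch each op to hand-written SIMD kernels rather than NumPy.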