# uzu

A high-performance inference engine for AI models on Apple Silicon. Key features:

- Simple, high-level API
- Hybrid architecture, where layers can be computed as GPU kernels or via MPSGraph (a low-level API beneath CoreML with ANE access)
- Unified model configurations, making it easy to add support for new models
- Traceable computations to ensure correctness against the source-of-truth implementation
- Utilizes unified memory on Apple devices
## Quick Start

First, add the `uzu` dependency to your `Cargo.toml`:

```toml
[dependencies]
uzu = { git = "https://github.com/trymirai/uzu", branch = "main", package = "uzu" }
```
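If you prefer reproducible builds over tracking `main`, Cargo's standard git-dependency fields let you pin to a specific revision instead (the hash below is a placeholder):

```toml
[dependencies]
# Pin to a fixed commit rather than the moving `main` branch
# (replace the placeholder with a real commit hash).
uzu = { git = "https://github.com/trymirai/uzu", rev = "<COMMIT_SHA>", package = "uzu" }
```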
Then, create an inference `Session` with a specific model and configuration:

```rust
use std::path::PathBuf;

use uzu::{
    backends::metal::sampling_config::SamplingConfig,
    session::{
        session::Session, session_config::SessionConfig,
        session_input::SessionInput, session_output::SessionOutput,
        session_run_config::SessionRunConfig,
    },
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let model_path = PathBuf::from("MODEL_PATH");
    let mut session = Session::new(model_path.clone())?;
    session.load_with_session_config(SessionConfig::default())?;

    let input = SessionInput::Text("Tell about London".to_string());
    let tokens_limit = 128;
    let run_config = SessionRunConfig::new_with_sampling_config(
        tokens_limit,
        Some(SamplingConfig::default()),
    );

    let output = session.run(input, run_config, Some(|_: SessionOutput| true));
    println!("{}", output.text);

    Ok(())
}
```
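The last argument to `session.run` is an optional progress callback invoked as generation proceeds. A minimal streaming sketch, assuming the callback receives the accumulated `SessionOutput` and that returning `false` cancels the run (names mirror the snippet above):

```rust
// Hypothetical streaming variant of the call above: print the text
// produced so far on each callback invocation and keep generating.
let output = session.run(
    input,
    run_config,
    Some(|partial: SessionOutput| {
        println!("{}", partial.text); // text generated so far (assumed)
        true // returning `false` would stop generation early (assumed)
    }),
);
```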
## Overview
For a detailed explanation of the architecture, please refer to the documentation.
uzu uses its own model format. To export a specific model, use `lalamo`. First, get the list of supported models:

```bash
uv run lalamo list-models
```
Then, export the one you need:

```bash
uv run lalamo convert meta-llama/Llama-3.2-1B-Instruct --precision float16
```
Alternatively, you can download a pre-exported model using the sample script:

```bash
./scripts/download_test_model.sh $MODEL_PATH
```
## CLI

You can run uzu in CLI mode:

```bash
cargo run --release -p cli -- help
```

```
Usage: uzu_cli [COMMAND]

Commands:
  run    Run a model with the specified path
  serve  Start a server with the specified model path
  help   Print this message or the help of the given subcommand(s)
```
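For example, assuming the `run` and `serve` subcommands take the model path as a positional argument (as the help text above suggests):

```bash
# Run interactive generation with a local model (path argument assumed)
cargo run --release -p cli -- run $MODEL_PATH

# Start an inference server for the same model (path argument assumed)
cargo run --release -p cli -- serve $MODEL_PATH
```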
## Bindings

- uzu-swift: a prebuilt Swift framework, ready to use with SPM
## License
This project is licensed under the MIT License. See the LICENSE file for details.