


Introducing Archimedes

A Python toolkit for hardware engineering

By Jared Callaham • 6 Oct 2025

A great engineer (controls being no exception) has to be part hacker, part master craftsman.

You have to be a hacker because things rarely “just work” in the real world without a little… creativity. But you can’t only be a hacker; developing complex systems in aerospace, automotive, robotics, and similar industries demands a disciplined, systematic approach. You need tools that let you iterate fast and maintain a methodical workflow where changes are version-controlled, algorithms are tested systematically, and deployment is repeatable.

Modern deep learning frameworks solved this years ago — you can develop in PyTorch or JAX and deploy anywhere. But those tools were built for neural net models, GPUs, and cloud deployments, not dynamics models, MCUs, and HIL testing.

That’s where Archimedes comes in; what PyTorch did for ML deployment, Archimedes aims to do for control systems. The goal is to build an open-source “PyTorch for hardware” that gives you the productivity of Python with the deployability of C.

In short, Archimedes is a Python framework that lets you develop and analyze algorithms in NumPy and automatically generate optimized C code for embedded systems. For instance, you can write a physics model in Python, calibrate it with data, use the model to design and simulate control logic, validate with simple hardware-in-the-loop (HIL) testing, and deploy with confidence.

This is one workflow you might use with Archimedes (specifically, the one from the hardware deployment tutorial), but it’s designed to be flexible, so you’re free to build up whatever workflow suits your style and application best.
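To give a flavor of the "write a physics model in Python" step in that workflow, here's a minimal plain-NumPy sketch. The model, function names, and numbers here are hypothetical illustrations, not Archimedes API:

```python
import numpy as np

def spring_mass_dynamics(t, x, k=4.0, c=0.5, m=1.0):
    """State derivative for a damped spring-mass system.

    State x = [position, velocity]; k, c, m are stiffness,
    damping, and mass (hypothetical example values).
    """
    pos, vel = x
    return np.array([vel, -(k * pos + c * vel) / m])

# Quick fixed-step Euler simulation -- enough for a sketch
dt, n_steps = 0.01, 1000
x = np.array([1.0, 0.0])  # released from rest at position 1
for i in range(n_steps):
    x = x + dt * spring_mass_dynamics(i * dt, x)
# After 10 simulated seconds, damping has nearly killed the oscillation
```

In a real workflow you'd replace the hand-rolled Euler loop with a proper ODE solver, calibrate the parameters against test data, and design the control logic against the same model.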

The Linchpin: Python → C Code Generation

Archimedes started with the question, "What would you need to actually do practical control systems development in Python?" As a high-level language, it's hard to beat Python on design principles like progressive disclosure, flexibility, and scalability. The numerical ecosystem (NumPy, SciPy, Matplotlib, Pandas, PyTorch, etc.) is also excellent. The problem is that none of it can deploy to typical embedded systems.

If you need to deploy to hardware today, you have a few basic options:

- Work in a high-level language like Python or Julia and manually translate algorithms to C code
- Work entirely in a low-level language like C/C++ or Rust
- Adopt an expensive vendor-locked ecosystem that supports automatic code generation

(Side note: while running Python itself on a microcontroller is growing in popularity for educational and hobby applications, there's no real future for pure Python in real-time, mission-critical deployments.)

However, if you could do seamless C code generation from standard NumPy code, you could layer on simulation and optimization tools, building blocks for physics modeling, testing frameworks, and the other features of a comprehensive controls engineering toolchain. Without the code generation, though, there will always be a gulf between the software and the hardware deployment.

To drive the point home, here's a side-by-side of manual vs. automatic coding for a common sensor fusion algorithm.

Kalman Filter Comparison

Below are two implementations of a Kalman filter, an algorithm that combines noisy sensor measurements with a prediction model to estimate system state. This is what's behind GPS navigation, spacecraft guidance, and sensor fusion in millions of devices. First is hand-written C code; after it is a NumPy version that can be used to generate an equivalent function.
Here we'll show an implementation for the common case of a single sensor, which avoids having to use a library for matrix inversion in C (though Archimedes does support operations like Cholesky factorization).

Handwritten C

```c
#include <stddef.h> /* size_t */

#define N_STATES 4

typedef struct {
    float H[N_STATES]; // Measurement matrix (1 x n)
    float R;           // Measurement noise covariance (scalar)
} kf_params_t;

typedef struct {
    float x[N_STATES];           // State estimate
    float P[N_STATES][N_STATES]; // Estimate covariance
} kf_state_t;

typedef struct {
    float K[N_STATES];              // Kalman gain (n x 1)
    float M[N_STATES][N_STATES];    // I - K * H temporary
    float MP[N_STATES][N_STATES];   // M * P temporary
    float MPMT[N_STATES][N_STATES]; // M * P * M^T temporary
    float KRKT[N_STATES][N_STATES]; // K * R * K^T temporary
} kf_work_t;

/**
 * Kalman filter update step (scalar measurement case)
 *
 * Mathematical formulation:
 *   y  = z - H·x                      (innovation)
 *   S  = H·P·H^T + R                  (innovation covariance)
 *   K  = P·H^T·S^(-1)                 (Kalman gain)
 *   x' = x + K·y                      (state update)
 *   P' = (I-KH)·P·(I-KH)^T + K·R·K^T  (Joseph form covariance)
 *
 * @param z: Latest measurement
 * @param kf_state: Pointer to Kalman filter state struct
 * @param kf_params: Pointer to Kalman filter parameters struct
 * @param kf_work: Pointer to Kalman filter work struct (for temporaries)
 * @return: 0 on success, -1 on error
 */
int kalman_update(float z, kf_state_t *kf_state, const kf_params_t *kf_params,
                  kf_work_t *kf_work) {
#ifdef DEBUG
    if (!kf_state || !kf_params || !kf_work) return -1;
#endif
    size_t i, j, k;

    // Innovation: y = z - H * x
    float y = z;
    for (i = 0; i < N_STATES; i++)
        y -= kf_params->H[i] * kf_state->x[i];

    // Innovation covariance: S = H * P * H^T + R
    float S = kf_params->R;

    // Compute P * H^T (using K as temporary storage here)
    for (i = 0; i < N_STATES; i++) {
        kf_work->K[i] = 0.0f;
        for (j = 0; j < N_STATES; j++) {
            kf_work->K[i] += kf_state->P[i][j] * kf_params->H[j];
        }
    }
    for (i = 0; i < N_STATES; i++)
        S += kf_params->H[i] * kf_work->K[i];

    // Kalman gain: K = P * H^T / S
    for (i = 0; i < N_STATES; i++)
        kf_work->K[i] /= S;

    // Update state with feedback from new measurement: x = x + K * y
    for (i = 0; i < N_STATES; i++)
        kf_state->x[i] += kf_work->K[i] * y;

    // Joseph form update: P = (I - K * H) * P * (I - K * H)^T + K * R * K^T
    // First compute M = I - K * H
    for (i = 0; i < N_STATES; i++) {
        for (j = 0; j < N_STATES; j++) {
            if (i == j)
                kf_work->M[i][j] = 1.0f - kf_work->K[i] * kf_params->H[j];
            else
                kf_work->M[i][j] = -kf_work->K[i] * kf_params->H[j];
        }
    }

    // Compute M * P
    for (i = 0; i < N_STATES; i++) {
        for (j = 0; j < N_STATES; j++) {
            kf_work->MP[i][j] = 0.0f;
            for (k = 0; k < N_STATES; k++) {
                kf_work->MP[i][j] += kf_work->M[i][k] * kf_state->P[k][j];
            }
        }
    }

    // Compute (M * P) * M^T
    for (i = 0; i < N_STATES; i++) {
        for (j = 0; j < N_STATES; j++) {
            kf_work->MPMT[i][j] = 0.0f;
            for (k = 0; k < N_STATES; k++) {
                kf_work->MPMT[i][j] += kf_work->MP[i][k] * kf_work->M[j][k];
            }
        }
    }

    // Compute K * R * K^T
    for (i = 0; i < N_STATES; i++) {
        for (j = 0; j < N_STATES; j++) {
            kf_work->KRKT[i][j] = kf_work->K[i] * kf_params->R * kf_work->K[j];
        }
    }

    // Final covariance update: P = MPMT + KRKT
    for (i = 0; i < N_STATES; i++) {
        for (j = 0; j < N_STATES; j++) {
            kf_state->P[i][j] = kf_work->MPMT[i][j] + kf_work->KRKT[i][j];
        }
    }

    return 0;
}
```

Archimedes Codegen

```python
@arc.compile
def kalman_update(x, P, z, H, R):
    """Update state estimate with new measurement"""
    I = np.eye(len(x))
    R = np.atleast_2d(R)  # Ensure R is 2D for matrix operations
    y = np.atleast_1d(z - H @ x)  # Innovation
    S = H @ P @ H.T + R  # Innovation covariance
    K = P @ H.T / S  # Kalman gain (scalar S)

    # Update state with feedback from new measurement
    x_new = x + K * y

    # Joseph form covariance update
    P_new = (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T

    return x_new, P_new

# Generate optimized C code:
return_names = ("x_new", "P_new")
args = (x, P, z, H, R)
arc.codegen(kalman_update, args, return_names=return_names)
```

Neither of these implementations is optimized, but they give a sense of what it looks like to work in each environment. Of course, for production hand-written code you'd likely also use optimized linear algebra libraries like CMSIS-DSP and numerical strategies like Cholesky factorization or a square-root form for stability. But those extra numerical features are only a few extra lines in NumPy, while the hand-written C version becomes more and more complex.

This capability, and most of the other core functionality in Archimedes, is made possible by building on CasADi, a sophisticated open-source library for nonlinear optimization and algorithmic differentiation. This lets Archimedes translate your NumPy code into C++ computational graphs that support code generation, derivative calculation, and more.
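To make the comparison concrete, here's the NumPy version run as plain Python, with the `@arc.compile` decorator removed so it runs on NumPy alone. The two-state example and its numbers are hypothetical; the state is kept as a column vector so the shapes broadcast correctly:

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """Measurement update, same math as the Archimedes version."""
    I = np.eye(len(x))
    R = np.atleast_2d(R)          # Ensure R is 2D
    y = np.atleast_1d(z - H @ x)  # Innovation
    S = H @ P @ H.T + R           # Innovation covariance (1 x 1)
    K = P @ H.T / S               # Kalman gain (n x 1)
    x_new = x + K * y             # State update
    P_new = (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T  # Joseph form
    return x_new, P_new

# Hypothetical 2-state example: [position; velocity], position measured
x = np.array([[0.0], [1.0]])  # column-vector state
P = np.eye(2)                 # initial covariance
H = np.array([[1.0, 0.0]])    # measure position only
R = 0.04                      # measurement noise variance

x, P = kalman_update(x, P, z=0.5, H=H, R=R)
# The position estimate moves toward the measurement and its
# variance drops from 1.0 to R / (1 + R) ~= 0.038
```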
