khadas_yolov8n_multithread - Real-time UAV detection on the RK3588S NPU
Real-time YOLOv8n UAV detection at the sensor's 46 FPS ceiling, in ~140 MB of RAM. A high-throughput, low-footprint computer-vision pipeline for the Rockchip RK3588S SoC: it captures live 1080p MIPI frames, runs YOLOv8n across all 3 NPU cores in parallel (lifting throughput from ~31 to 46 FPS — the camera, not the pipeline, is now the limit), and streams the annotated result to HDMI or RTSP. Capture, color-convert/resize and inference run entirely on fixed-function silicon (ISP, RGA, NPU), so the CPU stays free and memory holds flat at ~140 MB per stream — small enough to run on even the cheapest 2 GB RK3588S boards, not just high-end dev kits. Targets any RK3588S board; built and tested on the Khadas Edge2.
Then it goes a step further: when a tracked UAV leaves the scene, an on-device LLM (Qwen2.5-0.5B, on the same NPU) writes a natural-language assessment of what just happened. The whole thing is a chain of small, independent processes connected by Unix-domain sockets — detections flow downstream into multi-object tracking, temporal-feature extraction, a presence FSM, and the on-demand LLM summary.
Highlights
Saturates the sensor: 3-thread NPU inference lifts throughput from ~31 FPS to the 46 FPS camera ceiling — the pipeline is no longer the bottleneck.
3-thread NPU inference lifts throughput from ~31 FPS to the 46 FPS camera ceiling — the pipeline is no longer the bottleneck. Fully hardware-accelerated: capture (ISP), color-convert/resize (RGA), and inference (NPU) never touch the CPU, giving a flat ~140 MB RSS per stream.
capture (ISP), color-convert/resize (RGA), and inference (NPU) never touch the CPU, giving a flat ~140 MB RSS per stream. Runs on any RK3588S board: because the footprint is so small (~140 MB for one stream, ~290 MB for two), it fits comfortably on the cheapest RK3588S boards on the market — even 2 GB models that sell for as little as ~€90 — not just high-end dev kits.
because the footprint is so small (~140 MB for one stream, ~290 MB for two), it fits comfortably on the cheapest RK3588S boards on the market — even 2 GB models that sell for — not just high-end dev kits. Two cameras at once: independent per-device sockets let two streams run and be controlled side by side.
independent per-device sockets let two streams run and be controlled side by side. Composable pipeline: detection → ByteTrack → temporal features → presence FSM → on-demand LLM summary, each a separate process.
detection → ByteTrack → temporal features → presence FSM → on-demand LLM summary, each a separate process. NPU hand-off for the LLM: a blackout / resume control plane frees the whole NPU so the LLM runs at full speed, then hands it back to the cameras.
... continue reading