Whole-slide edge tomography
As shown in Extended Data Fig. 1, the whole-slide edge tomograph comprises multiple hardware modules optimized for high-speed 3D imaging and edge-side data processing. The illumination system uses a high-power light-emitting diode (XQ-E; Cree) as the light source, paired with a motorized iris (Nihon Seimitsu Sokki) to control the numerical aperture. This illumination passes through the cytology sample and is projected onto a camera board equipped with a CMOS image sensor (IMX531; Sony) and imaging optics. The camera board (e-con Systems) is mounted on a Z stage (Chuo Precision Industrial), which executes precise axial scanning under the control of a real-time controller. The XY stage translates the slide in the lateral plane for complete coverage during image acquisition.
These mechanical components are tightly integrated with the edge computer, which includes several modules: an image sensor FPGA (CertusPro-NX; Lattice Semiconductor), a real-time controller equipped with an additional FPGA (Artix-7; Advanced Micro Devices), an XY stage controller based on a microcontroller (STM32; STMicroelectronics), an illumination controller based on a microcontroller (RL78; Renesas Electronics) and an SOM unit (Jetson Xavier NX; NVIDIA). The SOM features a multicore central processing unit (CPU), a GPU, a hardware encoder and main memory used as an image buffer. An application running on the SOM manages internal communications over USB and SPI protocols to coordinate the XY stage, Z stage and illumination modules. Captured images from the CMOS sensor are first transmitted to the FPGA, where real-time high-speed signal conditioning and protocol conversion are performed.
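The coordination role of the SOM application can be sketched as follows. This is a minimal illustration only: the device stubs, method names and command strings are hypothetical stand-ins for the USB/SPI firmware protocols, which are not described here.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-in for the USB/SPI-attached microcontrollers
# (XY stage, Z stage, illumination); real command formats are assumed.
@dataclass
class DeviceLog:
    events: List[str] = field(default_factory=list)

class ScanCoordinator:
    """Sequences one imaging section: position XY, set light, sweep Z."""

    def __init__(self, log: DeviceLog):
        self.log = log

    def move_xy(self, x_um: float, y_um: float):
        self.log.events.append(f"XY -> ({x_um:.0f}, {y_um:.0f}) um")

    def set_illumination(self, level: float):
        self.log.events.append(f"LED level {level:.2f}")

    def scan_z(self, n_layers: int, step_um: float):
        # The real-time controller steps the Z stage through each focal depth.
        for i in range(n_layers):
            self.log.events.append(f"Z layer {i} at {i * step_um:.1f} um")

log = DeviceLog()
coordinator = ScanCoordinator(log)
coordinator.move_xy(1000, 2000)
coordinator.set_illumination(0.8)
coordinator.scan_z(3, 1.5)
```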
To support high spatial and temporal resolution, the system uses a dual four-lane MIPI, which effectively doubles the data throughput compared with a single MIPI–Camera Serial Interface configuration. This allows continuous transmission of 4,480 × 4,504 resolution images at up to 50 frames per second from the FPGA to the SOM, facilitating reliable real-time handling of large volumetric datasets. Upon receipt by the SOM, image data undergo a three-step processing pipeline: (1) 3D image acquisition; (2) 3D reconstruction through axial alignment using both the GPU and CPU; and (3) real-time compression using the on-board encoder. For compression, the system leverages the NVENC library of NVIDIA to encode 3D image stacks into the HEVC format with hardware acceleration. This process ensures substantial data reduction while maintaining the critical structural features needed for downstream visualization and analysis. The resulting compressed image data are stored locally on a solid-state drive integrated within the edge computer.
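A back-of-envelope check shows why the dual link matters at this frame size and rate. The pixel bit depth and per-lane rate below are illustrative assumptions (they are not stated in the text), but the sensor resolution and frame rate are as given above.

```python
# Bandwidth estimate for 4,480 x 4,504 frames at 50 fps.
# Bit depth (RAW10) and D-PHY lane rate are assumed for illustration.
width, height = 4480, 4504
fps = 50
bits_per_pixel = 10            # assumed RAW10 sensor output

pixels_per_s = width * height * fps
payload_gbps = pixels_per_s * bits_per_pixel / 1e9
print(f"payload: {payload_gbps:.1f} Gbit/s")   # ~10.1 Gbit/s

lane_rate_gbps = 2.5           # assumed per-lane data rate
single_link_gbps = 4 * lane_rate_gbps          # one four-lane MIPI link
dual_link_gbps = 2 * single_link_gbps          # dual four-lane configuration
print(dual_link_gbps > payload_gbps)           # dual link leaves headroom
```

Under these assumed rates, a single four-lane link would be saturated by the raw payload, whereas the dual configuration leaves comfortable headroom for protocol overhead.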
From there, compressed image data are transmitted to a back-end server where they are stitched into full-slide 3D volumes and stored on a network-attached storage (NAS) system. These reconstructed volumes are subsequently used for both interactive visualization and AI-based computational analysis. The back-end server hosts a DZI viewer, which enables smooth and responsive visualization by dynamically decompressing and transmitting only the requested tile regions on the basis of user inputs, such as zooming, panning and focus adjustments. These operations are accelerated by a GPU (RTX 4000 Ada; NVIDIA), which handles stitching, image rendering and hardware-accelerated decoding. In parallel, an AI analysis server retrieves the compressed data from the NAS, decodes them using hardware acceleration and performs diagnostic or morphological inference using a high-performance GPU (RTX 6000 Ada; NVIDIA). The resulting predictions and associated metadata are stored back on the NAS for subsequent review or downstream integration.
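The tile-on-demand behaviour of a Deep Zoom (DZI) viewer rests on simple pyramid addressing: each level halves the previous one down to 1 × 1, and a pixel position maps to a tile column and row. The sketch below illustrates this addressing for the frame size used here; the tile size is an assumed typical value, not one specified in the text.

```python
import math

# Minimal Deep Zoom (DZI) pyramid addressing: the viewer requests only
# the tiles covering the visible region at the current zoom level.
def dzi_levels(width: int, height: int) -> int:
    """Number of pyramid levels; the top level is full resolution,
    level 0 is 1 x 1."""
    return math.ceil(math.log2(max(width, height))) + 1

def tile_index(x: int, y: int, tile_size: int = 254):
    """Tile (column, row) containing pixel (x, y) at a given level.
    tile_size=254 is an assumed value."""
    return x // tile_size, y // tile_size

print(dzi_levels(4480, 4504))   # 14 levels for a 4,480 x 4,504 section
print(tile_index(500, 300))     # (1, 1)
```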
Sectional 3D image construction and compression
The imaging workflow involves tightly coordinated real-time interactions among multiple software and hardware components operating in parallel. The real-time controller adjusts the Z stage to sequentially position the slide at specified focal depths, enabling the acquisition of sectional 2D images across various Z planes. Concurrently, each captured image is transmitted to the image signal processing unit of the FPGA, which forwards the data to the GPU buffer on the edge computer. Upon completion of image acquisition at a given region, the XY stage promptly moves the slide to the next imaging section while image processing and compression begin immediately, achieving a pipelined, non-blocking execution flow. A dedicated 3D image construction module processes the acquired Z-stack by enhancing colour uniformity and dynamic range and selecting optimal focal planes to ensure that all cells appear sharply focused. In parallel, a 3D image compression module uses the hardware encoder integrated in the SOM to compress the processed image stack into an HEVC-format video file. These modules operate simultaneously, enabling high-throughput scanning without computational bottlenecks.
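The pipelined, non-blocking flow described above can be sketched with a classic producer–consumer structure: acquisition, 3D construction and compression run as concurrent stages connected by queues, so the stage hardware never waits on downstream processing. This is a conceptual sketch only; the stage names mirror the text, and the per-stage work is stubbed out.

```python
import queue
import threading

# Pipelined scan flow: while one section is being constructed and
# compressed, the next Z-stack is already being acquired.
def run_pipeline(n_sections: int, results: list):
    q_construct = queue.Queue()
    q_compress = queue.Queue()

    def acquire():
        for s in range(n_sections):
            q_construct.put(f"stack-{s}")   # Z-stack arriving via the FPGA
        q_construct.put(None)               # end-of-scan sentinel

    def construct():
        while (stack := q_construct.get()) is not None:
            q_compress.put(stack + ":3d")   # colour/focus processing stub
        q_compress.put(None)

    def compress():
        while (volume := q_compress.get()) is not None:
            results.append(volume + ":hevc")  # hardware-encoder stand-in

    threads = [threading.Thread(target=f) for f in (acquire, construct, compress)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

results = []
run_pipeline(3, results)
print(results)  # ['stack-0:3d:hevc', 'stack-1:3d:hevc', 'stack-2:3d:hevc']
```

Because each stage pulls from its own queue, a slow compression of section *n* does not block acquisition of section *n + 1*, which is the essence of the non-blocking execution flow.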
To evaluate the timing characteristics of individual imaging tasks, time logs were recorded under three Z-layer configurations: 10, 20 and 40 layers. Representative task sequences for the first eight imaging sections at the beginning of a whole-slide scan are shown in Extended Data Fig. 2d (10 layers), Extended Data Fig. 2e (20 layers) and Extended Data Fig. 2f (40 layers). These logs delineate task execution for XY stage motion, image acquisition, 3D construction and compression. The first imaging section includes an initialization step and thus takes slightly longer than subsequent areas.
To quantify system performance across the full scan, we calculated the average and standard deviation of the latency of each task per imaging section for each Z-layer setting. As summarized in Extended Data Fig. 2i, the XY stage motion time remained constant regardless of Z-stack depth, whereas image acquisition, construction and compression durations increased linearly with the number of Z layers. The larger error bars associated with XY stage motion reflect variations in travel distance between imaging sections. These results confirm the predictable and efficient scaling behaviour of the system across varying imaging depths.
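The per-task summary statistics reported here reduce to a mean and standard deviation of latencies over imaging sections. A minimal sketch, using made-up placeholder timings rather than the measured values:

```python
import statistics

# Per-task latency summary over imaging sections (mean +/- s.d.).
# The millisecond values below are illustrative placeholders only.
logs_ms = {
    "xy_motion":    [120, 95, 140, 110],   # varies with travel distance
    "acquisition":  [200, 202, 199, 201],  # scales with Z-layer count
    "compression":  [150, 152, 149, 151],
}

summary = {
    task: (statistics.mean(v), statistics.stdev(v))
    for task, v in logs_ms.items()
}

for task, (mu, sd) in summary.items():
    print(f"{task}: {mu:.1f} +/- {sd:.1f} ms")
```

As in the reported results, the placeholder XY-motion samples are given the widest spread, reflecting section-to-section differences in travel distance.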
Sectional 3D image compression