Show HN: OSS implementation of Test-Time Diffusion that runs on a 24 GB GPU

TTD-RAG: A Test-Time Diffusion Framework for the MMU-RAG Competition

This repository contains our submission for the MMU-RAG Competition, a deep research agent named TTD-RAG. Our system is a faithful implementation of the framework proposed in the paper "Deep Researcher with Test-Time Diffusion (TTD-DR)". This README was generated by Gemini 2.5.

TTD-RAG treats report generation as an iterative "denoising" process: it starts with a preliminary draft and progressively refines it through cycles of targeted search, synthesis, and revision. This approach is designed for complex, multi-hop reasoning tasks that require coherent, long-form answers.
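The draft, search, and revise cycle described above can be sketched as a short loop. This is a minimal illustration, not the repository's code: `denoise_report` and the four callables it takes are hypothetical names, and the toy functions stand in for the LLM and retrieval calls a real system would make.

```python
from typing import Callable, List


def denoise_report(
    question: str,
    draft_fn: Callable[[str], str],
    query_fn: Callable[[str, str], str],
    search_fn: Callable[[str], List[str]],
    revise_fn: Callable[[str, str, List[str]], str],
    steps: int = 3,
) -> str:
    """Refine a preliminary draft through repeated search-and-revise cycles.

    All four callables are hypothetical placeholders for model or search calls.
    """
    draft = draft_fn(question)  # the initial "noisy" draft
    for _ in range(steps):
        query = query_fn(question, draft)  # the current draft guides the next search
        evidence = search_fn(query)
        if not evidence:  # nothing new was found, so stop early
            break
        draft = revise_fn(question, draft, evidence)
    return draft


# Toy stand-ins so the loop can run without a model or a search engine.
FACTS = {
    "capital": "Paris is the capital of France.",
    "river": "The Seine runs through Paris.",
}


def toy_draft(question: str) -> str:
    return "DRAFT:"


def toy_query(question: str, draft: str) -> str:
    # Ask about the first fact that the draft does not yet contain.
    for key, fact in FACTS.items():
        if fact not in draft:
            return key
    return ""


def toy_search(query: str) -> List[str]:
    return [FACTS[query]] if query in FACTS else []


def toy_revise(question: str, draft: str, evidence: List[str]) -> str:
    return draft + " " + " ".join(evidence)


report = denoise_report("Tell me about Paris", toy_draft, toy_query, toy_search, toy_revise, steps=5)
print(report)
```

With the toy functions, the loop adds one fact per cycle and stops once the query step finds no remaining gap, even though `steps` allows more iterations.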

🎯 Key Features

Test-Time Diffusion Framework: Models research report generation as an iterative process of refining a "noisy" draft with external information, ensuring coherence and reducing information loss.

Report-Level Denoising with Retrieval: Uses an evolving draft to dynamically guide the search process, ensuring each retrieval step is targeted at filling specific knowledge gaps.

Component-wise Self-Evolution: Enhances the quality of each step in the workflow (planning, synthesis) by generating diverse variants, critiquing them, and merging them into a superior output.

High-Performance Serving: Utilizes vLLM to serve both the generative (Qwen/Qwen3-4B-Instruct-2507) and reranking (tomaarsen/Qwen3-Reranker-0.6B-seq-cls) models for high throughput and low latency.

Competition Compliant: Fully supports both dynamic (streaming) and static evaluation endpoints as required by the competition rules, validated with the provided local_test.py script.
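The generate, critique, and merge pattern behind the Component-wise Self-Evolution feature can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: `self_evolve`, its callables, and the `keep` parameter are hypothetical, and keeping only the top-scoring variants before merging is a simplification chosen for this example.

```python
from typing import Callable, List


def self_evolve(
    prompt: str,
    generate_fn: Callable[[str, int], str],
    critique_fn: Callable[[str, str], float],
    merge_fn: Callable[[str, List[str]], str],
    n_variants: int = 3,
    keep: int = 2,
) -> str:
    """Generate several variants, score them, and merge the best ones.

    The callables are hypothetical placeholders for LLM calls.
    """
    variants = [generate_fn(prompt, i) for i in range(n_variants)]
    # Rank variants from highest to lowest critique score.
    ranked = sorted(variants, key=lambda v: critique_fn(prompt, v), reverse=True)
    # Assumption: only the top `keep` variants are passed to the merge step.
    return merge_fn(prompt, ranked[:keep])


# Toy stand-ins: longer plans score higher, and merging joins the survivors.
OPTIONS = [
    "search web",
    "search web then rank",
    "search web then rank then cite",
]


def toy_generate(prompt: str, seed: int) -> str:
    return OPTIONS[seed % len(OPTIONS)]


def toy_critique(prompt: str, variant: str) -> float:
    return float(len(variant.split()))


def toy_merge(prompt: str, variants: List[str]) -> str:
    return " + ".join(variants)


print(self_evolve("plan the research", toy_generate, toy_critique, toy_merge))
```

In a real system each callable would presumably wrap a model request, with different sampling settings per variant to keep the candidates diverse.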

⚙️ System Architecture & Workflow
