# Arc Core
High-performance time-series data warehouse built on DuckDB, Parquet, and MinIO.
> ⚠️ **Alpha Release - Technical Preview**: Arc Core is currently in active development and evolving rapidly. While the system is stable and functional, it is not recommended for production workloads at this time. We are continuously improving performance, adding features, and refining the API. Use in development and testing environments only.
## Features

- **High-Performance Ingestion**: MessagePack binary protocol (recommended), InfluxDB Line Protocol (drop-in replacement), JSON
- **DuckDB Query Engine**: Fast analytical queries with SQL
- **Distributed Storage with MinIO**: S3-compatible object storage for unlimited scale and cost-effective data management (recommended). Also supports local disk, AWS S3, and GCS
- **Data Import**: Import data from InfluxDB, TimescaleDB, and HTTP endpoints
- **Query Caching**: Configurable result caching for improved performance
- **Production Ready**: Docker deployment with health checks and monitoring
## Performance Benchmark 🚀

Arc achieves **1.89M records/sec** with the MessagePack binary protocol!

| Metric | Value | Notes |
|--------|-------|-------|
| Throughput | 1.89M records/sec | MessagePack binary protocol |
| p50 Latency | 21ms | Median response time |
| p95 Latency | 204ms | 95th percentile |
| Success Rate | 99.9998% | Production-grade reliability |
| vs Line Protocol | 7.9x faster | 240K → 1.89M RPS |

*Tested on Apple M3 Max (14 cores), native deployment with MinIO.*
**🎯 Optimal Configuration:**

- **Workers**: 3x CPU cores (e.g., 14 cores = 42 workers; see the sketch below)
- **Deployment**: Native mode (2.4x faster than Docker)
- **Storage**: MinIO native (not containerized)
- **Protocol**: MessagePack binary (`/write/v2/msgpack`)
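As a rough illustration of the worker rule of thumb above, the following Python sketch derives the recommended worker count from the host's CPU count. The 3x multiplier is the guideline from this README; the helper itself is illustrative and not part of Arc:

```python
import os

def recommended_workers(multiplier: int = 3) -> int:
    """Return the suggested Arc worker count: 3x the number of CPU cores."""
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    return cores * multiplier

if __name__ == "__main__":
    # On a 14-core machine this prints ARC_WORKERS=42, matching the example above.
    print(f"ARC_WORKERS={recommended_workers()}")
```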
## Quick Start (Native - Recommended for Maximum Performance)

Native deployment delivers 1.89M RPS vs 570K RPS in Docker (2.4x faster).

```bash
# One-command start (auto-installs MinIO, auto-detects CPU cores)
./start.sh native

# Alternative: Manual setup
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

# Start MinIO natively (auto-configured by start.sh)
brew install minio/stable/minio minio/stable/mc   # macOS
# OR download from https://min.io/download for Linux

# Start Arc (auto-detects optimal worker count: 3x CPU cores)
./start.sh native
```

- Arc API: http://localhost:8000
- MinIO Console: http://localhost:9001 (minioadmin/minioadmin)
## Quick Start (Docker)

```bash
# Start Arc Core with MinIO
docker-compose up -d

# Check status
docker-compose ps

# View logs
docker-compose logs -f arc-api

# Stop
docker-compose down
```

**Note:** Docker mode achieves ~570K RPS. For maximum performance (1.89M RPS), use native deployment.
## Remote Deployment

Deploy Arc Core to a remote server:

```bash
# Docker deployment
./deploy.sh -h your-server.com -u ubuntu -m docker

# Native deployment
./deploy.sh -h your-server.com -u ubuntu -m native
```
## Configuration

Arc Core uses a centralized `arc.conf` configuration file (TOML format). This provides:

- Clean, organized configuration structure
- Environment variable overrides for Docker/production
- Production-ready defaults
- Inline comments and documentation
### Primary Configuration: arc.conf

Edit the `arc.conf` file for all settings:

```toml
# Server Configuration
[server]
host = "0.0.0.0"
port = 8000
workers = 8  # Adjust based on load: 4=light, 8=medium, 16=high

# Authentication
[auth]
enabled = true
default_token = ""  # Leave empty to auto-generate

# Query Cache
[query_cache]
enabled = true
ttl_seconds = 60

# Storage Backend (MinIO recommended)
[storage]
backend = "minio"

[storage.minio]
endpoint = "http://minio:9000"
access_key = "minioadmin"
secret_key = "minioadmin123"
bucket = "arc"
use_ssl = false

# For AWS S3
# [storage]
# backend = "s3"
# [storage.s3]
# bucket = "arc-data"
# region = "us-east-1"

# For Google Cloud Storage
# [storage]
# backend = "gcs"
# [storage.gcs]
# bucket = "arc-data"
# project_id = "my-project"
```
**Configuration Priority** (highest to lowest):

1. Environment variables (e.g., `ARC_WORKERS=16`)
2. The `arc.conf` file
3. Built-in defaults
### Environment Variable Overrides

You can override any setting via environment variables:

```bash
# Server
ARC_HOST=0.0.0.0
ARC_PORT=8000
ARC_WORKERS=8

# Storage
STORAGE_BACKEND=minio
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin123
MINIO_BUCKET=arc

# Cache
QUERY_CACHE_ENABLED=true
QUERY_CACHE_TTL=60

# Logging
LOG_LEVEL=INFO
```

**Legacy Support:** `.env` files are still supported for backward compatibility, but `arc.conf` is recommended.
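To make the override order concrete, here is a minimal, hypothetical sketch of how a single setting could be resolved in the documented priority order (environment variable, then `arc.conf`, then a built-in default). The function below is illustrative only and is not Arc's actual configuration loader:

```python
import os
import tomllib  # Python 3.11+
from pathlib import Path

def resolve_workers(conf_path: str = "arc.conf", default: int = 8) -> int:
    """Resolve the worker count: env var > arc.conf > built-in default."""
    # 1. Environment variable wins (e.g., ARC_WORKERS=16)
    env_value = os.getenv("ARC_WORKERS")
    if env_value is not None:
        return int(env_value)

    # 2. Fall back to the [server] section of arc.conf, if present
    path = Path(conf_path)
    if path.exists():
        with path.open("rb") as f:
            conf = tomllib.load(f)
        workers = conf.get("server", {}).get("workers")
        if workers is not None:
            return int(workers)

    # 3. Built-in default
    return default
```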
## Getting Started

### 1. Get Your Admin Token

After starting Arc Core, create an admin token for API access:

```bash
# Docker deployment
docker exec -it arc-api python3 -c "
from api.auth import AuthManager
auth = AuthManager(db_path='/data/historian.db')
token = auth.create_token('my-admin', description='Admin token')
print(f'Admin Token: {token}')
"

# Native deployment
cd /path/to/arc-core
source venv/bin/activate
python3 -c "
from api.auth import AuthManager
auth = AuthManager()
token = auth.create_token('my-admin', description='Admin token')
print(f'Admin Token: {token}')
"
```

**Save this token** - you'll need it for all API requests.
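If you want to confirm the token works before wiring up clients, you can call the `/auth/verify` endpoint (listed in the API Reference below). A minimal sketch, assuming the token is exported as `ARC_TOKEN`:

```python
import os
import requests

token = os.environ["ARC_TOKEN"]  # the admin token created above

resp = requests.get(
    "http://localhost:8000/auth/verify",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json())  # inspect the response to confirm the token is valid
```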
### 2. API Endpoints

All endpoints require authentication via Bearer token:

```bash
# Set your token
export ARC_TOKEN="your-token-here"
```

#### Health Check

```bash
curl http://localhost:8000/health
```
#### Ingest Data (MessagePack - Recommended)

The MessagePack binary protocol offers 3x faster ingestion with zero-copy PyArrow processing:

```python
import os
import msgpack
import requests
from datetime import datetime

token = os.environ["ARC_TOKEN"]  # admin token from step 1

# Prepare data in MessagePack format
data = {
    "database": "metrics",
    "table": "cpu_usage",
    "records": [
        {
            "timestamp": int(datetime.now().timestamp() * 1e9),  # nanoseconds
            "host": "server01",
            "cpu": 0.64,
            "memory": 0.82
        },
        {
            "timestamp": int(datetime.now().timestamp() * 1e9),
            "host": "server02",
            "cpu": 0.45,
            "memory": 0.71
        }
    ]
}

# Send via MessagePack
response = requests.post(
    "http://localhost:8000/write/v2/msgpack",
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/msgpack"
    },
    data=msgpack.packb(data)
)

print(response.json())
```
**Batch ingestion** (for high throughput):

```python
# Send 10,000 records at once
records = [
    {
        "timestamp": int(datetime.now().timestamp() * 1e9),
        "sensor_id": f"sensor_{i}",
        "temperature": 20 + (i % 10),
        "humidity": 60 + (i % 20)
    }
    for i in range(10000)
]

data = {
    "database": "iot",
    "table": "sensors",
    "records": records
}

response = requests.post(
    "http://localhost:8000/write/v2/msgpack",
    headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/msgpack"
    },
    data=msgpack.packb(data)
)
```
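For very large backfills you may prefer to split the payload into fixed-size batches rather than sending everything in one request. The helper below is a simple illustration built on the same `/write/v2/msgpack` endpoint; the function name and 10,000-record batch size are arbitrary choices, not Arc requirements:

```python
import os
import msgpack
import requests

token = os.environ["ARC_TOKEN"]

def write_in_batches(records: list[dict], batch_size: int = 10_000) -> None:
    """Send records to Arc in fixed-size MessagePack batches."""
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        payload = {"database": "iot", "table": "sensors", "records": batch}
        resp = requests.post(
            "http://localhost:8000/write/v2/msgpack",
            headers={
                "Authorization": f"Bearer {token}",
                "Content-Type": "application/msgpack",
            },
            data=msgpack.packb(payload),
        )
        resp.raise_for_status()
```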
#### Ingest Data (Line Protocol - InfluxDB Compatibility)

For a drop-in InfluxDB replacement, compatible with Telegraf and existing InfluxDB clients:

```bash
# InfluxDB 1.x compatible endpoint
curl -X POST "http://localhost:8000/write/line?db=mydb" \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "Content-Type: text/plain" \
  --data-binary "cpu,host=server01 value=0.64 1633024800000000000"

# Multiple measurements
curl -X POST "http://localhost:8000/write/line?db=metrics" \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "Content-Type: text/plain" \
  --data-binary "cpu,host=server01,region=us-west value=0.64 1633024800000000000
memory,host=server01,region=us-west used=8.2,total=16.0 1633024800000000000
disk,host=server01,region=us-west used=120.5,total=500.0 1633024800000000000"
```
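The same endpoint can also be driven from Python by posting plain-text line protocol. A minimal sketch (the measurement and tag names are arbitrary examples):

```python
import os
import time
import requests

token = os.environ["ARC_TOKEN"]

# Build InfluxDB line protocol: measurement,tags fields timestamp(ns)
timestamp_ns = time.time_ns()
lines = "\n".join([
    f"cpu,host=server01,region=us-west value=0.64 {timestamp_ns}",
    f"memory,host=server01,region=us-west used=8.2,total=16.0 {timestamp_ns}",
])

resp = requests.post(
    "http://localhost:8000/write/line",
    params={"db": "metrics"},
    headers={"Authorization": f"Bearer {token}", "Content-Type": "text/plain"},
    data=lines,
)
resp.raise_for_status()
```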
**Telegraf configuration** (drop-in InfluxDB replacement):

```toml
[[outputs.influxdb]]
  urls = ["http://localhost:8000"]
  database = "telegraf"
  skip_database_creation = true

  # Authentication
  username = ""            # Leave empty
  password = "$ARC_TOKEN"  # Use your Arc token as password

  # Or use HTTP headers
  [outputs.influxdb.headers]
    Authorization = "Bearer $ARC_TOKEN"
```
#### Query Data

```bash
curl -X POST http://localhost:8000/query \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "database": "mydb",
    "query": "SELECT * FROM cpu_usage WHERE host = '\''server01'\'' ORDER BY timestamp DESC LIMIT 100"
  }'
```

Advanced queries with DuckDB SQL:

```bash
# Aggregations
curl -X POST http://localhost:8000/query \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "database": "metrics",
    "query": "SELECT host, AVG(cpu) as avg_cpu, MAX(memory) as max_memory FROM cpu_usage WHERE timestamp > now() - INTERVAL 1 HOUR GROUP BY host"
  }'

# Time-series analysis
curl -X POST http://localhost:8000/query \
  -H "Authorization: Bearer $ARC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "database": "iot",
    "query": "SELECT time_bucket(INTERVAL '\''5 minutes'\'', timestamp) as bucket, AVG(temperature) as avg_temp FROM sensors GROUP BY bucket ORDER BY bucket"
  }'
```
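The query endpoint can also be called from Python by sending the same JSON body; a minimal sketch (the exact shape of the response payload is not specified here, so the print is illustrative):

```python
import os
import requests

token = os.environ["ARC_TOKEN"]

resp = requests.post(
    "http://localhost:8000/query",
    headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    json={
        "database": "metrics",
        "query": "SELECT host, AVG(cpu) AS avg_cpu FROM cpu_usage GROUP BY host",
    },
)
resp.raise_for_status()
print(resp.json())
```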
## Architecture Overview

```
Client Applications
(Telegraf, Python, Go, JavaScript, curl, etc.)
        │  HTTP/HTTPS
        ▼
Arc API Layer (FastAPI)
├── Line Protocol Endpoint
├── MessagePack Binary API
└── Query Engine (DuckDB)
        │  Write Pipeline
        ▼
Buffering & Processing Layer
├── ParquetBuffer (Line Protocol)
│     - Batches records by measurement
│     - Polars DataFrame → Parquet
│     - Snappy compression
└── ArrowParquetBuffer (MessagePack Binary)
      - Zero-copy PyArrow RecordBatch
      - Direct Parquet writes (3x faster)
      - Columnar from start
        │  Parquet Files
        ▼
Storage Backend (Pluggable)
└── MinIO (Recommended - S3-compatible)
      ✓ Unlimited scale      ✓ Distributed
      ✓ Cost-effective       ✓ Self-hosted
      ✓ High availability    ✓ Erasure coding
      ✓ Multi-tenant         ✓ Object versioning
    Alternative backends: Local Disk, AWS S3, Google Cloud
        │  Query Path (direct Parquet reads)
        ▼
Query Engine (DuckDB)
    - Direct Parquet reads from object storage
    - Columnar execution engine
    - Query cache for common queries
    - Full SQL interface (Postgres-compatible)
```
## Why MinIO?

Arc Core is designed with MinIO as the primary storage backend for several key reasons:

- **Unlimited Scale**: Store petabytes of time-series data without hitting storage limits
- **Cost-Effective**: Commodity hardware or cloud storage at a fraction of traditional database costs
- **Distributed Architecture**: Built-in replication and erasure coding for data durability
- **S3 Compatibility**: Works with any S3-compatible storage (AWS S3, GCS, Wasabi, etc.)
- **Performance**: Direct Parquet reads from object storage with DuckDB's efficient execution
- **Separation of Compute & Storage**: Scale storage and compute independently
- **Self-Hosted Option**: Run on your own infrastructure without cloud vendor lock-in

The MinIO + Parquet + DuckDB combination provides a strong balance of cost, performance, and scalability for analytical time-series workloads.
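As a concrete illustration of the "direct Parquet reads from object storage" path, the sketch below points DuckDB's httpfs extension at a local MinIO instance and queries a Parquet object directly. The bucket layout and object path are hypothetical examples, not Arc's actual storage layout:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")

# Point DuckDB at the local MinIO endpoint (credentials from the Quick Start)
con.execute("SET s3_endpoint='localhost:9000';")
con.execute("SET s3_access_key_id='minioadmin';")
con.execute("SET s3_secret_access_key='minioadmin';")
con.execute("SET s3_use_ssl=false;")
con.execute("SET s3_url_style='path';")  # MinIO uses path-style URLs

# Query a Parquet object straight from the bucket (path is illustrative)
rows = con.execute(
    "SELECT host, AVG(cpu) AS avg_cpu "
    "FROM read_parquet('s3://arc/metrics/cpu_usage/*.parquet') "
    "GROUP BY host"
).fetchall()
print(rows)
```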
## Performance

Arc Core has been benchmarked using ClickBench, the industry-standard analytical database benchmark: a 100M-row dataset (14GB) and 43 analytical queries.
### ClickBench Results

**Hardware: AWS c6a.4xlarge** (16 vCPU AMD EPYC 7R13, 32GB RAM, 500GB gp2)

- **Cold Run Total**: 35.18s (sum of 43 queries, first execution)
- **Hot Run Average**: 0.81s (average per query after caching)
- **Aggregate Performance**: ~2.8M rows/sec cold, ~123M rows/sec hot (across all queries)
- **Storage**: MinIO (S3-compatible)
- **Success Rate**: 43/43 queries (100%)

**Hardware: Apple M3 Max** (14 ARM cores, 36GB RAM)

- **Cold Run Total**: 23.86s (sum of 43 queries, first execution)
- **Hot Run Average**: 0.52s (average per query after caching)
- **Aggregate Performance**: ~4.2M rows/sec cold, ~192M rows/sec hot (across all queries)
- **Storage**: Local NVMe SSD
- **Success Rate**: 43/43 queries (100%)
### Key Performance Characteristics

- **Columnar Storage**: Parquet format with Snappy compression
- **Query Engine**: DuckDB with default settings (ClickBench compliant)
- **Result Caching**: 60s TTL for repeated queries (production mode)
- **End-to-End**: All timings include HTTP/JSON API overhead

### Fastest Queries (M3 Max)

| Query | Time (avg) | Description |
|-------|------------|---------------------|
| Q1 | 0.021s | Simple aggregation |
| Q8 | 0.034s | String parsing |
| Q27 | 0.086s | Complex grouping |
| Q41 | 0.048s | URL parsing |
| Q42 | 0.044s | Multi-column filter |

### Most Complex Queries

| Query | Time (avg) | Description |
|-------|------------|--------------------------|
| Q29 | 7.97s | Heavy string operations |
| Q19 | 1.69s | Multiple joins |
| Q33 | 1.86s | Complex aggregations |
**Benchmark Configuration:**

- Dataset: 100M rows, 14GB Parquet (ClickBench hits.parquet)
- Protocol: HTTP REST API with JSON responses
- Caching: Disabled for benchmark compliance
- Tuning: None (default DuckDB settings)
See full results and methodology at ClickBench Results (Arc submission pending).
## Docker Services

The `docker-compose.yml` includes:

- **arc-api**: Main API server (port 8000)
- **minio**: S3-compatible storage (port 9000, console 9001)
- **minio-init**: Initializes MinIO buckets on startup
## Development

```bash
# Run with auto-reload
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000

# Run tests (if available in parent repo)
pytest tests/
```
## Monitoring

Health check endpoint:

```bash
curl http://localhost:8000/health
```

Logs:

```bash
# Docker
docker-compose logs -f arc-api

# Native (systemd)
sudo journalctl -u arc-api -f
```
## API Reference

### Public Endpoints (No Authentication Required)

- `GET /` - API information
- `GET /health` - Service health check
- `GET /ready` - Readiness probe
- `GET /docs` - Swagger UI documentation
- `GET /redoc` - ReDoc documentation
- `GET /openapi.json` - OpenAPI specification

**Note:** All other endpoints require Bearer token authentication.
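For authenticated endpoints, it can be convenient to attach the Bearer token once to a session rather than to every call. A small, optional convenience sketch using `requests`:

```python
import os
import requests

session = requests.Session()
session.headers.update({"Authorization": f"Bearer {os.environ['ARC_TOKEN']}"})

# The session now sends the token with every request
print(session.get("http://localhost:8000/auth/verify").status_code)
print(session.get("http://localhost:8000/measurements").json())
```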
### Data Ingestion Endpoints

**MessagePack Binary Protocol** (Recommended - 3x faster):

- `POST /write/v2/msgpack` - Write data via MessagePack
- `POST /api/v2/msgpack` - Alternative endpoint
- `GET /write/v2/msgpack/stats` - Get ingestion statistics
- `GET /write/v2/msgpack/spec` - Get protocol specification

**Line Protocol** (InfluxDB compatibility):

- `POST /write` - InfluxDB 1.x compatible write
- `POST /api/v1/write` - InfluxDB 1.x API format
- `POST /api/v2/write` - InfluxDB 2.x API format
- `POST /api/v1/query` - InfluxDB 1.x query format
- `GET /write/health` - Write endpoint health check
- `GET /write/stats` - Write statistics
- `POST /write/flush` - Force flush write buffer
### Query Endpoints

- `POST /query` - Execute DuckDB SQL query
- `POST /query/estimate` - Estimate query cost
- `POST /query/stream` - Stream large query results
- `GET /query/{measurement}` - Get measurement data
- `GET /query/{measurement}/csv` - Export measurement as CSV (see the sketch below)
- `GET /measurements` - List all measurements/tables
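As an illustration of the CSV export endpoint above, the sketch below streams a measurement to a local file. Any query parameters for filtering are not documented here, so the call uses only the path:

```python
import os
import requests

token = os.environ["ARC_TOKEN"]

# Stream the CSV export of the 'cpu_usage' measurement to disk
with requests.get(
    "http://localhost:8000/query/cpu_usage/csv",
    headers={"Authorization": f"Bearer {token}"},
    stream=True,
) as resp:
    resp.raise_for_status()
    with open("cpu_usage.csv", "wb") as f:
        for chunk in resp.iter_content(chunk_size=65536):
            f.write(chunk)
```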
### Authentication

- `GET /auth/verify` - Verify token validity
- `GET /auth/tokens` - List all tokens
- `POST /auth/tokens` - Create new token (see the sketch below)
- `GET /auth/tokens/{id}` - Get token details
- `PATCH /auth/tokens/{id}` - Update token
- `DELETE /auth/tokens/{id}` - Delete token
- `POST /auth/tokens/{id}/rotate` - Rotate token (generate new)
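A sketch of creating a token via the API rather than the CLI snippet in Getting Started. The request body fields (`name`, `description`) mirror the arguments passed to `create_token` earlier, but the exact JSON schema is an assumption; check `/docs` for the authoritative shape:

```python
import os
import requests

admin_token = os.environ["ARC_TOKEN"]

resp = requests.post(
    "http://localhost:8000/auth/tokens",
    headers={"Authorization": f"Bearer {admin_token}"},
    json={"name": "ingest-service", "description": "Token for the ingest pipeline"},
)
resp.raise_for_status()
print(resp.json())  # expected to include the newly created token
```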
### Health & Monitoring

- `GET /health` - Service health check
- `GET /ready` - Readiness probe
- `GET /metrics` - Prometheus metrics
- `GET /metrics/timeseries/{type}` - Time-series metrics
- `GET /metrics/endpoints` - Endpoint statistics
- `GET /metrics/query-pool` - Query pool status
- `GET /metrics/memory` - Memory profile
- `GET /logs` - Application logs
### Connection Management

**InfluxDB Connections:**

- `GET /connections/influx` - List InfluxDB connections
- `POST /connections/influx` - Create InfluxDB connection
- `PUT /connections/influx/{id}` - Update connection
- `DELETE /connections/{type}/{id}` - Delete connection
- `POST /connections/{type}/{id}/activate` - Activate connection
- `POST /connections/{type}/test` - Test connection

**Storage Connections:**

- `GET /connections/storage` - List storage backends
- `POST /connections/storage` - Create storage connection
- `PUT /connections/storage/{id}` - Update storage connection
### Export Jobs

- `GET /jobs` - List all export jobs
- `POST /jobs` - Create new export job
- `PUT /jobs/{id}` - Update job configuration
- `DELETE /jobs/{id}` - Delete job
- `GET /jobs/{id}/executions` - Get job execution history
- `POST /jobs/{id}/run` - Run job immediately
- `POST /jobs/{id}/cancel` - Cancel running job
- `GET /monitoring/jobs` - Monitor job status
### HTTP/JSON Export

- `POST /api/http-json/connections` - Create HTTP/JSON connection
- `GET /api/http-json/connections` - List connections
- `GET /api/http-json/connections/{id}` - Get connection details
- `PUT /api/http-json/connections/{id}` - Update connection
- `DELETE /api/http-json/connections/{id}` - Delete connection
- `POST /api/http-json/connections/{id}/test` - Test connection
- `POST /api/http-json/connections/{id}/discover-schema` - Discover schema
- `POST /api/http-json/export` - Export data via HTTP
### Cache Management

- `GET /cache/stats` - Cache statistics
- `GET /cache/health` - Cache health status
- `POST /cache/clear` - Clear query cache
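For example, after a bulk backfill you might clear the query cache and then confirm its state. A minimal sketch against the endpoints above (response shapes are not documented here, so the print is illustrative):

```python
import os
import requests

token = os.environ["ARC_TOKEN"]
headers = {"Authorization": f"Bearer {token}"}

# Clear the query cache, then inspect its statistics
requests.post("http://localhost:8000/cache/clear", headers=headers).raise_for_status()
print(requests.get("http://localhost:8000/cache/stats", headers=headers).json())
```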
### Interactive API Documentation

Arc Core includes auto-generated API documentation:

- **Swagger UI**: http://localhost:8000/docs
- **ReDoc**: http://localhost:8000/redoc
- **OpenAPI JSON**: http://localhost:8000/openapi.json
## Roadmap

Arc Core is under active development. Current focus areas:

- **Performance Optimization**: Further improvements to ingestion and query performance
- **API Stability**: Finalizing core API contracts
- **Enhanced Monitoring**: Additional metrics and observability features
- **Documentation**: Expanded guides and tutorials
- **Production Hardening**: Testing and validation for production use cases
We welcome feedback and feature requests as we work toward a stable 1.0 release.
## License

Arc Core is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).

This means:

- ✅ **Free to use** - Use Arc Core for any purpose
- ✅ **Free to modify** - Modify the source code as needed
- ✅ **Free to distribute** - Share your modifications with others
- ⚠️ **Share modifications** - If you modify Arc and run it as a service, you must share your changes under AGPL-3.0
### Why AGPL?
AGPL-3.0 ensures that improvements to Arc benefit the entire community, even when run as a cloud service. This prevents the "SaaS loophole" where companies could take the code, improve it, and keep changes proprietary.
### Commercial Licensing

For organizations that require:

- Proprietary modifications without disclosure
- Commercial support and SLAs
- Enterprise features and managed services

Please contact us at: enterprise[at]basekick[dot]net
We offer dual licensing and commercial support options.
## Support

- **Community Support**: GitHub Issues
- **Enterprise Support**: enterprise[at]basekick[dot]net
- **General Inquiries**: support[at]basekick[dot]net
## Disclaimer
Arc Core is provided "as-is" in alpha state. While we use it extensively for development and testing, it is not yet production-ready. Features and APIs may change without notice. Always back up your data and test thoroughly in non-production environments before considering any production deployment.