Smallpond – A lightweight data processing framework built on DuckDB and 3FS

Published on: 2025-07-06 06:56:35

smallpond

A lightweight data processing framework built on DuckDB and 3FS.

Features

🚀 High-performance data processing powered by DuckDB
🌍 Scalable to handle PB-scale datasets
🛠️ Easy operations with no long-running services

Installation

Python 3.8 to 3.12 is supported.

    pip install smallpond

Quick Start

    # Download example data
    wget https://duckdb.org/data/prices.parquet

    import smallpond

    # Initialize session
    sp = smallpond.init()

    # Load data
    df = sp.read_parquet("prices.parquet")

    # Process data
    df = df.repartition(3, hash_by="ticker")
    df = sp.partial_sql("SELECT ticker, min(price), max(price) FROM {0} GROUP BY ticker", df)

    # Save results
    df.write_parquet("output/")

    # Show results
    print(df.to_pandas())

Documentation

For detailed guides and API reference:

Performance

We evaluated smallpond using the GraySort benchmark (script) on a cluster comprising 50 compute nodes and 25 storage nodes running 3FS. The benchmark sorted 110.5TiB of data in 3 ...
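The Quick Start repartitions by ticker before running the GROUP BY query. The sketch below illustrates why that ordering matters, under the assumption that partial_sql runs its query on each partition independently. It is plain Python and does not use smallpond or DuckDB; the helper names hash_partition and min_max_by_key, the CRC32 hash, and the sample rows are all made up for illustration and are not smallpond's actual implementation.

```python
import zlib


def hash_partition(rows, key, n):
    """Split rows into n lists; rows sharing a key value land in the same list."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # CRC32 is used only because it is deterministic across runs (assumption:
        # any stable hash would do; smallpond's real hash function is not shown here).
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % n
        parts[idx].append(row)
    return parts


def min_max_by_key(rows, key, value):
    """Plain-Python stand-in for: SELECT key, min(value), max(value) ... GROUP BY key."""
    out = {}
    for row in rows:
        k, v = row[key], row[value]
        lo, hi = out.get(k, (v, v))
        out[k] = (min(lo, v), max(hi, v))
    return out


# Hypothetical sample data, not the contents of prices.parquet.
rows = [
    {"ticker": "APPL", "price": 1.0},
    {"ticker": "MSFT", "price": 2.0},
    {"ticker": "APPL", "price": 3.0},
    {"ticker": "GOOG", "price": 4.0},
    {"ticker": "MSFT", "price": 5.0},
]

partitions = hash_partition(rows, "ticker", 3)

# Aggregate each partition on its own, as a per-partition query would.
partial_results = [min_max_by_key(p, "ticker", "price") for p in partitions]

# No ticker spans two partitions, so the per-partition results can simply be merged.
combined = {}
for result in partial_results:
    combined.update(result)

print(combined)
```

Because every row for a given ticker hashes to the same partition, each per-partition result is already final for the tickers it contains. Without hashing on the grouping key, the same ticker could appear in several partitions and the partial minimums and maximums would need a second merge step.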