
Statistics that live in your SQL

Why This Matters

The latest version of the-stats-duck extension enhances DuckDB's capabilities by enabling advanced statistical analysis directly within SQL, including distributions, tests, and regression, all accessible in-browser. This development simplifies data exploration and modeling, making sophisticated analytics more accessible to developers and data scientists without leaving their SQL environment.


the-stats-duck — our open-source DuckDB extension — just shipped v0.6.0, and in keeping with the whole operation, the release is named i-m-not-dead. It very much isn't!

In case you haven't met it before: the-stats-duck is the statistics engine that Bedevere and KoliLang lean on. It lets DuckDB do real statistics (distributions, tests, regression, even plots) without ever leaving SQL. It's MIT-licensed and runs anywhere DuckDB runs, including in a browser!

Which is quite handy, because every demo below is a live Bedevere instance running the-stats-duck right here in your browser. The dataset is the famous Palmer Penguins — change the SQL and re-run it.

SELECT * FROM 'penguins';

The first thing anyone would usually do with a new dataset is squint at it. meta() does the squinting for you, returning the full profile with one row per column:

SELECT column_name, kind, n_missing, n_distinct, mean, median, stddev, top FROM meta('penguins');

It overlaps with DuckDB's built-in SUMMARIZE, but meta() is a table function, so you can join it, filter it, and compose it in CTEs. Need to know how many columns are numeric and how many values are missing in total? That's just an aggregation over meta().
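A minimal sketch of that idea, run against the same 'penguins' table as the demos. The first query only uses columns shown in the meta() output above. The second assumes that meta() labels numeric columns with kind = 'numeric'; that label is our guess, not documented here, so check the real values with SELECT DISTINCT kind FROM meta('penguins') before relying on it.

```sql
-- Filter: which columns have missing values, worst first.
SELECT column_name, kind, n_missing
FROM meta('penguins')
WHERE n_missing > 0
ORDER BY n_missing DESC;

-- Aggregate: how many columns, how many are numeric, and total missing values.
-- ASSUMPTION: meta() reports numeric columns with kind = 'numeric' (label not confirmed).
WITH profile AS (
    SELECT * FROM meta('penguins')
)
SELECT
    count(*)                                 AS n_columns,
    count(*) FILTER (WHERE kind = 'numeric') AS n_numeric_columns,
    sum(n_missing)                           AS total_missing
FROM profile;
```

Because the profile is just a table, the same pattern extends to joins, for example matching meta() rows against a list of columns you care about.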

Yes, ordinary least squares in a SQL query, using the R-style formula syntax you already know:

SELECT * FROM lm_summary('penguins', formula := 'body_mass_g ~ flipper_length_mm + bill_length_mm');
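For a single predictor, one way to sanity-check lm_summary() is DuckDB's built-in regression aggregates (regr_intercept, regr_slope, regr_r2, regr_count), which fit a simple OLS line of the first argument on the second and skip any row where either value is NULL. This is a cross-check we're suggesting, not how the-stats-duck works internally, and it only covers one predictor, so the two-predictor model above still needs lm_summary(). We're also assuming lm_summary() drops rows with missing values the same way; if it handles them differently, the numbers will not line up exactly.

```sql
-- Cross-check a one-predictor fit with DuckDB's built-in regression aggregates.
-- Argument order is (y, x): the dependent variable comes first.
-- Rows where either column is NULL are ignored by all regr_* aggregates.
SELECT
    regr_intercept(body_mass_g, flipper_length_mm) AS intercept,
    regr_slope(body_mass_g, flipper_length_mm)     AS slope,
    regr_r2(body_mass_g, flipper_length_mm)        AS r_squared,
    regr_count(body_mass_g, flipper_length_mm)     AS n
FROM 'penguins';

-- The lm_summary() equivalent with the same single predictor, for comparison.
SELECT * FROM lm_summary('penguins', formula := 'body_mass_g ~ flipper_length_mm');
```

If the intercept or slope differs between the two queries, differing missing-value handling is the first thing worth checking.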
