TL;DR: Built a Git-like CLI backed by PostgreSQL with automatic delta compression. Import any git repo, query its entire history with SQL. Benchmarked on 20 real repositories (273,703 commits): pgit outcompresses git gc --aggressive on 12 out of 20 repositories, while giving you full SQL access to every commit, file version, and change pattern. Then I gave an AI agent a single prompt and it produced a full codebase health report on Neon's own repo in under 10 minutes.
What is pgit?
pgit is a Git-like version control CLI where everything lives in PostgreSQL instead of the filesystem. You get the familiar workflow (init, add, commit, push, pull, diff, blame), but your repository is a database. And that means your entire commit history is queryable.
    pgit init
    pgit import /path/to/your/repo --branch main
    pgit analyze coupling

    file_a                    file_b                    commits_together
    ────────────────────────  ────────────────────────  ────────────────
    src/parser.rs             src/lexer.rs              127
    src/db/schema.go          src/db/migrations.go      84
    README.md                 CHANGELOG.md              63
No scripts. No parsing git log output. No piping things through awk. Just answers.
The most common analyses are built in as single commands: churn, coupling, hotspots, authors, activity, and bus-factor. All support --json for programmatic consumption and --raw for piping, and all display results in an interactive table with search and clipboard copy.
But everything is PostgreSQL underneath. When the built-in analyses aren't enough, drop down to raw SQL:
    -- The coupling analysis above, as raw SQL
    SELECT pa.path, pb.path, COUNT(*) AS times_together
    FROM pgit_file_refs a
    JOIN pgit_paths pa ON pa.path_id = a.path_id
    JOIN pgit_file_refs b ON a.commit_id = b.commit_id AND a.path_id < b.path_id
    JOIN pgit_paths pb ON pb.path_id = b.path_id
    GROUP BY pa.path, pb.path
    ORDER BY times_together DESC;

This finds every pair of files changed in the same commit, counts co-occurrences, and ranks by frequency. The a.path_id < b.path_id condition avoids counting the same pair twice. pgit analyze coupling optimizes this further: it computes pairs in memory and filters out bulk reformats (commits touching 100+ files) that produce noise, not signal.
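To make the in-memory variant concrete, here is a minimal Python sketch of the idea: count unordered file pairs per commit and skip bulk commits. It is an illustration under stated assumptions, not pgit's actual implementation. The function name count_coupling, the dict-of-lists input shape, and the exact cutoff semantics (100 or more files) are hypothetical; the source only says that commits touching 100+ files are filtered out.

```python
from collections import Counter
from itertools import combinations

# Assumption: "100+ files" is read as "100 or more changed paths".
BULK_COMMIT_THRESHOLD = 100


def count_coupling(commit_files, bulk_threshold=BULK_COMMIT_THRESHOLD):
    """Count how often each unordered pair of paths changes in the same commit.

    commit_files maps a commit id to the paths it changed. This input shape
    is hypothetical; it is not pgit's internal representation.
    Returns a list of ((path_a, path_b), count), most frequent first.
    """
    pair_counts = Counter()
    for paths in commit_files.values():
        unique_paths = sorted(set(paths))
        # Skip bulk commits (mass reformats, renames): they pair every file
        # with every other file and drown out real coupling.
        if len(unique_paths) >= bulk_threshold:
            continue
        # combinations() over a sorted list yields each pair exactly once,
        # which is the role a.path_id < b.path_id plays in the SQL version.
        pair_counts.update(combinations(unique_paths, 2))
    return pair_counts.most_common()


if __name__ == "__main__":
    # Toy history, invented for illustration only.
    history = {
        "c1": ["src/parser.rs", "src/lexer.rs"],
        "c2": ["src/lexer.rs", "src/parser.rs", "README.md"],
        "c3": ["README.md", "CHANGELOG.md"],
        "c4": [f"src/file_{i}.rs" for i in range(150)],  # bulk commit, skipped
    }
    for (file_a, file_b), count in count_coupling(history):
        print(f"{file_a}\t{file_b}\t{count}")
```

Dropping the bulk commit before generating pairs matters for cost as well as signal: a single commit touching 150 files would otherwise contribute 11,175 pairs on its own.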
Want to know your maintenance hotspots? That's pgit analyze churn. Or as SQL: