This article is both an introduction to jsongrep, a tool I have been working on, and a technical explanation of the search engine it uses internally. I also discuss the benchmarking strategy used to compare jsongrep's performance against other JSON path-like query tools and implementations.
In this post I'll first show you the tool, then explain why it's fast (conceptually), then how it's fast (the automata theory), and finally prove it (benchmarks).
Up front, I would like to say that this article is heavily inspired by Andrew Gallant's amazing ripgrep tool and his associated blog post, "ripgrep is faster than {grep, ag, git grep, ucg, pt, sift}".
You can install jsongrep from crates.io:
cargo install jsongrep
Like ripgrep, jsongrep is cross-platform (binaries available here) and written in Rust.
jsongrep (the jg binary) takes a query and a JSON input and prints every value whose path through the document matches the query. Let's build up the query language piece by piece using this sample document:
sample.json:
{
  "name": "Micah",
  "favorite_drinks": ["coffee", "Dr. Pepper", "Monster Energy"],
  "roommates": [
    { "name": "Alice", "favorite_food": "pizza" }
  ]
}
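To make "path through the document" concrete, here is a minimal Python sketch that lists every value in the sample document together with the path that leads to it. It is only an illustration, not jsongrep's implementation: the `walk` helper is hypothetical, and representing a path as a tuple of keys and array indices is an assumption made for this example.

```python
import json

SAMPLE = """
{
  "name": "Micah",
  "favorite_drinks": ["coffee", "Dr. Pepper", "Monster Energy"],
  "roommates": [{"name": "Alice", "favorite_food": "pizza"}]
}
"""

def walk(value, path=()):
    """Yield (path, value) for `value` and for every value nested inside it."""
    yield path, value
    if isinstance(value, dict):
        # Object: each field name extends the path by one step.
        for key, child in value.items():
            yield from walk(child, path + (key,))
    elif isinstance(value, list):
        # Array: each index extends the path by one step.
        for index, child in enumerate(value):
            yield from walk(child, path + (index,))

doc = json.loads(SAMPLE)
paths = dict(walk(doc))  # tuples are hashable, so paths can be dict keys

for path, value in paths.items():
    print(path, "->", json.dumps(value))
```

Seen this way, a query is a description of a set of such paths, and the tool's job is to report the values whose path belongs to that set.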
Dot paths select nested fields by name. A dot ( . ) between field names denotes concatenation: "match this field, then that field."
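The "match this field, then that field" reading can be sketched in Python as a chain of lookups. This shows only the semantics of a plain dot path over nested objects. It is not how jsongrep evaluates queries, and both the `resolve_dot_path` helper and the `config` document are hypothetical. The sketch also naively splits on every dot, so field names that themselves contain a dot are out of scope here.

```python
def resolve_dot_path(doc, query):
    """Follow each field name in `query` in order.

    Returns a one-element list with the selected value, or an empty list
    if any step along the way does not match.
    """
    current = doc
    for field in query.split("."):
        # Each step must land on an object that has the next field.
        if not isinstance(current, dict) or field not in current:
            return []
        current = current[field]
    return [current]

# Hypothetical document with nested objects, used only for this sketch.
config = {"server": {"host": "localhost", "tls": {"enabled": True}}}

print(resolve_dot_path(config, "server.host"))         # ['localhost']
print(resolve_dot_path(config, "server.tls.enabled"))  # [True]
print(resolve_dot_path(config, "server.port"))         # []
```

Returning a list rather than a single value keeps "no match" distinct from a matched `null`, and leaves room for queries that select more than one value.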