TL;DR
Text-to-SQL is not enough. Answering real user questions requires going the extra mile: multi-step plans, external tools (coding), and external context.
Context is the product. A semantic layer (we use Malloy) encodes business meaning and sharply reduces SQL complexity.
Use a multi-agent, research-oriented system. Break problems down using context and domain knowledge, retrieve precisely, write code, interact with the environment, and learn from it.
Retrieval is a recommendation problem. Mix keyword search, embeddings, and a fine-tuned reranker; optimise for precision, recall, and latency.
Benchmarks ≠ production. Users expect human-level answers, drill-downs, and defensible reasoning, not just pass@k.
Latency and quality are a tight bar. Route between fast and reasoning models, cache aggressively, and keep contexts short. Continuous model evaluation is needed to avoid drift as new models are launched.
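To make the "retrieval is a recommendation problem" point concrete, here is a minimal sketch of hybrid retrieval, not Findly's actual pipeline. It makes several assumptions. Token overlap stands in for a real keyword ranker such as BM25. A term-frequency vector stands in for a learned embedding. The two rankings are combined with reciprocal rank fusion (RRF), which is a swapped-in technique the post does not name. The reranker is any callable that scores (query, text) pairs, which is where a fine-tuned cross-encoder would plug in. `DOCS`, `hybrid_search`, `rerank_depth` and the other names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Optional

# Hypothetical toy corpus: short descriptions of semantic-layer fields.
DOCS = {
    "revenue": "total revenue measure sum of order amount",
    "churn": "customer churn rate dimension monthly",
    "orders": "order count measure number of orders per customer",
}


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def keyword_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank doc ids by token overlap with the query (a crude stand-in for BM25)."""
    q = set(tokenize(query))
    scores = {d: len(q & set(tokenize(t))) for d, t in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def embed(text: str) -> Counter:
    """Toy 'embedding': a term-frequency vector. A real system would call an embedding model."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def embedding_rank(query: str, docs: dict[str, str]) -> list[str]:
    """Rank doc ids by cosine similarity between query and document vectors."""
    qv = embed(query)
    scores = {d: cosine(qv, embed(t)) for d, t in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several rankings: each list adds 1 / (k + rank) to a document's score."""
    fused: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def hybrid_search(
    query: str,
    docs: dict[str, str],
    top_k: int = 2,
    rerank_depth: int = 10,
    reranker: Optional[Callable[[str, str], float]] = None,
) -> list[str]:
    """Fuse keyword and embedding rankings, then optionally rerank only the head."""
    candidates = reciprocal_rank_fusion([keyword_rank(query, docs), embedding_rank(query, docs)])
    head, tail = candidates[:rerank_depth], candidates[rerank_depth:]
    if reranker is not None:
        # A fine-tuned cross-encoder would score (query, text) pairs here;
        # in this sketch any callable returning a float works.
        head = sorted(head, key=lambda d: reranker(query, docs[d]), reverse=True)
    return (head + tail)[:top_k]


if __name__ == "__main__":
    print(hybrid_search("revenue per customer", DOCS))
```

The `rerank_depth` parameter is the latency lever in this sketch. The cheap rankers cast a wide net for recall. The expensive reranker only sees the top few candidates, where precision matters most.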
The short story
I spent years on ML for Analytics and Knowledge Discovery at Google and Twitter. For the past 3 years I've been building an AI data analyst at Findly (findly.ai). We entered Y Combinator with a different idea, but quickly realised that the real problem for most teams wasn't "lack of data" but data discovery and use.