Show HN: Use Claude Code to Query 600 GB Indexes over Hacker News, ArXiv, etc.

# ExoPriors Alignment Scry (Public Access) You have **public** access to the ExoPriors alignment research corpus. ## API Key (Public - No Signup Required) ``` exopriors_public_readonly_v1_2025 ``` (This key is intentionally embedded for ergonomics.) ## Capabilities - **Query**: SQL over 60M documents (posts, papers, tweets, comments) - **Embed**: Store named embeddings for semantic search - **Timeout**: adaptive, roughly 20–120s per query depending on load ## Strategy (public access) Start with quick exploratory queries (LIMIT 10–50) to confirm schema and search semantics, then build a small candidate set and join/aggregate. Keep result sets small to avoid flooding context. Let Postgres choose join order when possible; if public timeouts bite, intersect small candidate sets client-side as a fallback. Use `alignment.search()` + LIMIT as your candidate generator. ## Public @handles Public handle names must match `p_<8 hex>_` (e.g., `p_8f3a1c2d_myhandle`). Handles are write-once. Create an account for a private namespace with overwrites. --- ## Quick Reference ### SQL Query ```bash curl -X POST https://api.exopriors.com/v1/alignment/query \ -H "Authorization: Bearer exopriors_public_readonly_v1_2025" \ -H "Content-Type: application/json" \ -d '{{"sql": "SELECT * FROM alignment.search('\''mesa optimization'\'') LIMIT 10"}}' ``` ### Query Estimate (No Execution) ```bash curl -X POST https://api.exopriors.com/v1/alignment/estimate \ -H "Authorization: Bearer exopriors_public_readonly_v1_2025" \ -H "Content-Type: application/json" \ -d '{{"sql": "SELECT id FROM alignment.entities WHERE source = '\''hackernews'\'' AND kind = '\''comment'\'' LIMIT 1000"}}' ``` ### Schema Discovery ```bash curl -X GET https://api.exopriors.com/v1/alignment/schema \ -H "Authorization: Bearer exopriors_public_readonly_v1_2025" ``` **All `source` values (external_system enum):** `manual`, `lesswrong`, `eaforum`, `twitter`, `bluesky`, `arxiv`, `chinarxiv`, `community_archive`, `hackernews`, `datasecretslox`, `ethresearch`, `ethereum_magicians`, `openzeppelin_forum`, `devcon_forum`, `eips`, `ercs`, `sep`, `exo_user`, `coefficientgiving`, `slatestarcodex`, `marginalrevolution`, `rethinkpriorities`, `crawled_url`, `wikipedia`, `other` ### Store Embedding ```bash curl -X POST https://api.exopriors.com/v1/alignment/embed \ -H "Authorization: Bearer exopriors_public_readonly_v1_2025" \ -H "Content-Type: application/json" \ -d '{{"text": "concept description", "name": "p_8f3a1c2d_shared_concept"}}' ``` ### Semantic Search with @handle ```sql SELECT e.id, e.original_author, e.metadata->>'title' FROM alignment.embeddings emb JOIN alignment.entities e ON e.id = emb.entity_id WHERE emb.chunk_index = 0 AND emb.embedding IS NOT NULL ORDER BY emb.embedding <=> @p_8f3a1c2d_shared_concept LIMIT 20; ``` ### Good Starting Views (Materialized) Prefer materialized views for fast, filtered semantic search: - `mv_lesswrong_posts`, `mv_eaforum_posts`, `mv_hackernews_posts` - `mv_af_posts` (Alignment Forum posts) - `mv_high_karma_comments` (high-score comments; filter by `source`) - `mv_lesswrong_comments`, `mv_eaforum_comments` (all comments; embedding may be NULL) - `mv_hackernews_comments` (HN comments) - `mv_arxiv_papers` (papers; filter `WHERE embedding IS NOT NULL`) Note: `mv_*` are exposed under the **alignment** schema. If you qualify with a schema, use `alignment.mv_*`. Canonical MV columns include: `entity_id`, `uri`, `source`, `kind`, `original_author`, `original_timestamp`, `title`, `score`, `comment_count`, `vote_count`, `word_count`, `is_af`, `preview`, `embedding` (nullable). ### Lexical Search (BM25) ```sql SELECT * FROM alignment.search('mesa optimization'); SELECT * FROM alignment.search('"inner alignment"'); -- phrase search SELECT * FROM alignment.search('corrigibility', kinds => ARRAY['post', 'paper']); ``` **Completeness warning**: `alignment.search()` hard-caps `limit_n` at 100. It returns top BM25 results, not all matches. Use `alignment.search_exhaustive()` with pagination if missing results is worse than waiting: ```sql SELECT * FROM alignment.search_exhaustive( 'left brain', kinds => ARRAY['post'], limit_n => 500, offset_n => 0 ); ``` **Search semantics (important):** - Default `mode => 'auto'`: quoted phrases → phrase search, otherwise AND semantics. - Use `mode => 'or'` for any-term matching. - Use `mode => 'phrase'` for exact sequences; `mode => 'fuzzy'` for typos. - Common `kinds`: `post`, `comment`, `paper`, `tweet`, `twitter_thread`, `text` (others exist but are rare). **Return schema**: `alignment.search()` returns `(id, score, snippet, uri, kind, original_author, title, original_timestamp)`. `original_author` may be NULL (especially tweets). It does **not** return `metadata` or `payload`. Join to `alignment.entities` if you need them: ```sql SELECT s.title, s.score, e.metadata->>'baseScore' AS base_score FROM alignment.search('rust programming', kinds => ARRAY['post'], limit_n => 50) s JOIN alignment.entities e ON e.id = s.id ORDER BY s.score DESC LIMIT 20; ``` **Performance pattern (avoid timeouts):** Keep CTEs small by limiting inside the candidate set, then join: ```sql WITH candidates AS ( SELECT id FROM alignment.search('alignment', kinds => ARRAY['post'], limit_n => 200) ) SELECT e.original_author, COUNT(*) AS n FROM candidates c JOIN alignment.entities e ON e.id = c.id GROUP BY e.original_author ORDER BY n DESC LIMIT 25; ``` Prefer the `kinds => ARRAY[...]` parameter over `WHERE kind IN (...)` to keep the BM25 scan focused. **Completeness vs performance (explicit)**: - `alignment.search()` returns only the top 100 BM25 matches. Treat it as a sample. - For completeness-sensitive tasks, use `alignment.search_exhaustive()` + pagination, start with the rarest term, and intersect author sets. **Author topic intersection helper:** ```sql SELECT * FROM alignment.author_topics( NULL, ARRAY['alignment', 'rationality', 'governance'], kinds => ARRAY['post'], limit_n => 200 ); ``` **Performance tips (ballpark, load-dependent):** - Simple searches: ~1–5s - Embedding joins (<500K rows): ~5–20s - Complex aggregations (<2M rows): ~20–60s - Large scans (>5M rows): may timeout under load - `alignment.search()` is capped at 100 rows; use `alignment.search_exhaustive()` + pagination if completeness matters - If a query times out: reduce sample size, use fewer embeddings, or pre-filter with `alignment.search()`. For public keys, intersect small candidate lists client-side as a fallback. --- ## Upgrade Sign up at **exopriors.com/scry** for: - Private @handle namespace - Up to ~10-minute query timeout when load allows; estimates may show lower caps under load - 1.5M embedding token budget