# ck - Semantic Grep by Embedding

**ck (seek)** finds code by meaning, not just keywords. It's a drop-in replacement for `grep` that understands what you're looking for: search for "error handling" and find try/catch blocks, error returns, and exception handling code even when those exact words aren't present.

## Quick start

```bash
cargo install ck-search

# Find error handling patterns (finds try/catch, Result types, etc.)
ck --sem "error handling" src/

# Traditional grep-compatible search still works
ck -n "TODO" *.rs

# Combine both: semantic relevance + keyword filtering
ck --hybrid "connection timeout" src/
```

## Why ck?

**For Developers:** Stop hunting through thousands of regex false positives. Find the code you actually need by describing what it does.

**For AI Agents:** Get structured, semantic search results in JSON format. Perfect for code analysis, documentation generation, and automated refactoring.

**For Teams:** Works exactly like grep with the same flags and behavior, but adds semantic intelligence when you need it.

## Quick Start

```bash
# Build from source
cargo build --release

# Index your project for semantic search
./target/release/ck index src/

# Search by meaning
./target/release/ck --sem "authentication logic" src/
./target/release/ck --sem "database connection pooling" src/
./target/release/ck --sem "retry mechanisms" src/

# Use all the grep features you know
./target/release/ck -n -C 3 "error" src/
./target/release/ck -r "TODO|FIXME" .
```

## Core Features

### 🔍 Semantic Search

Find code by concept, not keywords. Searches understand synonyms, related terms, and conceptual similarity.

```bash
# These find related code even without exact keywords:
ck --sem "retry logic"           # finds backoff, circuit breakers
ck --sem "user authentication"   # finds login, auth, credentials
ck --sem "data validation"       # finds sanitization, type checking

# Get complete functions/classes containing matches (NEW!)
ck --sem --full-section "error handling"   # returns entire functions
ck --full-section "async def" src/         # works with regex too
```

### ⚡ Drop-in grep Compatibility

All your muscle memory works. Same flags, same behavior, same output format.

```bash
ck -i "warning" *.log                # Case-insensitive
ck -n -A 3 -B 1 "error" src/         # Line numbers + context
ck --no-filename "TODO" src/         # Suppress filenames (grep -h equivalent)
ck -l "error" src/                   # List files with matches only (NEW!)
ck -L "TODO" src/                    # List files without matches (NEW!)
ck -r --exclude "*.test.js" "bug"    # Recursive with exclusions
ck "pattern" file1.txt file2.txt     # Multiple files
```

### 🎯 Hybrid Search

Combine keyword precision with semantic understanding using Reciprocal Rank Fusion.

```bash
ck --hybrid "async timeout" src/       # Best of both worlds
ck --hybrid --scores "cache" src/      # Show relevance scores with color highlighting
ck --hybrid --threshold 0.02 query     # Filter by minimum relevance
ck -l --hybrid "database" src/         # List files using hybrid search
```

### 🤖 Agent-Friendly Output

Structured JSON output for LLMs, scripts, and automation.

```bash
ck --json --sem "error handling" src/ | jq '.file'
ck --json --topk 5 "TODO" . | jq -r '.preview'
ck --json --full-section --sem "database" . | jq -r '.preview'   # Complete functions
```

### 📁 Smart File Filtering

Automatically excludes cache directories, build artifacts, and system files.

```bash
# These are excluded by default:
# .git, node_modules, target/, .fastembed_cache, __pycache__

# Override defaults:
ck --no-default-excludes "pattern" .                # Search everything
ck --exclude "dist" --exclude "logs" "pattern" .    # Add custom exclusions
```

## How It Works

### 1. Index Once, Search Many

```bash
# Create semantic index (one-time setup)
ck index /path/to/project

# Now search instantly by meaning
ck --sem "database queries" .
ck --sem "error handling" .
ck --sem "authentication" .
```

### 2. Three Search Modes

- `--regex` (default): Classic grep behavior, no indexing required
- `--sem`: Pure semantic search using embeddings (requires index)
- `--hybrid`: Combines regex and semantic results, ranked with Reciprocal Rank Fusion

### 3. Relevance Scoring

```bash
ck --sem --scores "machine learning" docs/
# [0.847] ./ai_guide.txt: Machine learning introduction...
# [0.732] ./statistics.txt: Statistical learning methods...
# [0.681] ./algorithms.txt: Classification algorithms...
```

## Advanced Usage

### Search Specific Files

```bash
# Glob patterns work
ck --sem "authentication" *.py *.js *.rs

# Multiple files
ck --sem "error handling" src/auth.rs src/db.rs

# Quoted patterns prevent shell expansion
ck --sem "auth" "src/**/*.ts"
```

### Threshold Filtering

```bash
# Only high-confidence semantic matches
ck --sem --threshold 0.7 "query"

# Low-confidence hybrid matches (good for exploration)
ck --hybrid --threshold 0.01 "concept"

# Get complete code sections instead of snippets (NEW!)
ck --sem --full-section "database queries"
ck --full-section "class.*Error" src/   # Complete classes
```

### Top-K Results

```bash
# Limit results for focused analysis
ck --sem --topk 5 "authentication patterns"

# Great for AI agent consumption
ck --json --topk 10 "error handling" | process_results.py
```

### Directory Management

```bash
# Check index status
ck status .

# Clean up and rebuild
ck clean .
ck index .

# Add single file to index
ck add new_file.rs
```

## File Support

| Language   | Indexing | Tree-sitter Parsing | Semantic Chunking              |
|------------|----------|---------------------|--------------------------------|
| Python     | ✅       | ✅                  | ✅ Functions, classes          |
| JavaScript | ✅       | ✅                  | ✅ Functions, classes, methods |
| TypeScript | ✅       | ✅                  | ✅ Functions, classes, methods |
| Haskell    | ✅       | ✅                  | ✅ Functions, types, instances |

**Text Formats:** Markdown, JSON, YAML, TOML, XML, HTML, CSS, shell scripts, SQL, and plain text.
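The "Semantic Chunking" column in the File Support table means code is split at function and class boundaries rather than at fixed line counts. The sketch below illustrates that idea for Python source only. It uses the standard-library `ast` module as a stand-in for Tree-sitter, and `chunk_python_source` is a hypothetical helper name, so treat it as an illustration of the technique rather than ck's actual chunker.

```python
import ast
from typing import List, Tuple

# Hypothetical helper, not part of ck: chunks Python source at top-level
# function/class boundaries using the stdlib `ast` module (not Tree-sitter).
def chunk_python_source(source: str) -> List[Tuple[str, int, int, str]]:
    """Return (name, start_line, end_line, text) for each top-level def/class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno, node.end_lineno  # 1-based, inclusive
            text = "\n".join(lines[start - 1:end])
            chunks.append((node.name, start, end, text))
    return chunks


SAMPLE = '''import time

def retry(fn, attempts=3):
    for _ in range(attempts):
        try:
            return fn()
        except OSError:
            time.sleep(1)

class Cache:
    def get(self, key):
        return None
'''

if __name__ == "__main__":
    for name, start, end, _ in chunk_python_source(SAMPLE):
        print(f"{name}: lines {start}-{end}")
```

Each chunk is a complete unit of meaning, which is what lets a flag like `--full-section` return an entire function instead of a few surrounding lines.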
**Smart Exclusions:** Automatically skips `.git`, `node_modules`, `target/`, `build/`, `dist/`, `__pycache__/`, `.fastembed_cache`, `.venv`, `venv`, and other common build/cache/virtual environment directories.

## Installation

### From Source

```bash
git clone https://github.com/BeaconBay/ck
cd ck
cargo install --path ck-cli
```

### Package Managers (Planned)

```bash
# Coming soon:
brew install ck-search
apt install ck-search
```

## Architecture

ck uses a modular Rust workspace:

- `ck-cli` - Command-line interface and argument parsing
- `ck-core` - Shared types, configuration, and utilities
- `ck-search` - Search engine implementations (regex, BM25, semantic)
- `ck-index` - File indexing, hashing, and sidecar management
- `ck-embed` - Text embedding providers (FastEmbed, API backends)
- `ck-ann` - Approximate nearest neighbor search indices
- `ck-chunk` - Text segmentation and language-aware parsing
- `ck-models` - Model registry and configuration management

### Index Storage

Indexes are stored in `.ck/` directories alongside your code:

```
project/
├── src/
├── docs/
└── .ck/                 # Semantic index (can be safely deleted)
    ├── embeddings.json
    ├── ann_index.bin
    └── tantivy_index/
```

The `.ck/` directory is a cache, safe to delete and rebuild anytime.
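Because the index is a rebuildable cache and `ck-index` is described as handling file hashing, it may help to see how hash-based change detection works in general. The sketch below is an assumption-laden illustration, not ck's actual index format: the manifest location (`.ck_demo/manifest.json`) and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical sketch, not ck's real index format: a JSON manifest maps
# relative file paths to content hashes so changed files can be re-indexed.

def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(root: Path, manifest_path: Path) -> List[str]:
    """Compare current hashes with the stored manifest, then update it.

    Returns the relative paths of new or modified files, in sorted order.
    """
    old: Dict[str, str] = {}
    if manifest_path.exists():
        old = json.loads(manifest_path.read_text())

    new: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the cache directory itself.
        if not path.is_file() or manifest_path.parent in path.parents:
            continue
        new[path.relative_to(root).as_posix()] = hash_file(path)

    manifest_path.parent.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(new, indent=2))
    return [name for name, digest in new.items() if old.get(name) != digest]
```

Deleting the manifest simply makes every file look new on the next run, which mirrors why a cache directory like `.ck/` can be removed without losing anything but indexing time.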
## Examples

### Finding Code Patterns

```bash
# Find authentication/authorization code
ck --sem "user permissions" src/
ck --sem "access control" src/
ck --sem "login validation" src/

# Find error handling strategies
ck --sem "exception handling" src/
ck --sem "error recovery" src/
ck --sem "fallback mechanisms" src/

# Find performance-related code
ck --sem "caching strategies" src/
ck --sem "database optimization" src/
ck --sem "memory management" src/
```

### Integration Examples

```bash
# Git hooks
git diff --name-only | xargs ck --sem "TODO"

# CI/CD pipeline
ck --json --sem "security vulnerability" . | security_scanner.py

# Code review prep
ck --hybrid --scores "performance" src/ > review_notes.txt

# Documentation generation
ck --json --sem "public API" src/ | generate_docs.py
```

### Team Workflows

```bash
# Find related test files
ck --sem "unit tests for authentication" tests/
ck -l --sem "test" tests/                  # List test files by semantic content

# Identify refactoring candidates
ck --sem "duplicate logic" src/
ck --sem "code complexity" src/
ck -L "test" src/                          # Find source files without tests

# Security audit
ck --hybrid "password|credential|secret" src/
ck --sem "input validation" src/
ck -l --hybrid --scores "security" src/    # Files with security-related code
```

## Configuration

### Default Exclusions

```bash
# View current exclusion patterns
ck --help | grep -A 20 exclude

# These directories are excluded by default:
# .git, .svn, .hg                        # Version control
# node_modules, target, build            # Build artifacts
# .cache, __pycache__, .fastembed_cache  # Caches
# .vscode, .idea                         # IDE files
```

### Custom Configuration (Planned)

```toml
# .ck/config.toml
[search]
default_mode = "hybrid"
default_threshold = 0.05

[indexing]
exclude_patterns = ["*.log", "temp/"]
chunk_size = 512
overlap = 64

[models]
embedding_model = "BAAI/bge-small-en-v1.5"
```

## Performance

- **Indexing:** ~1M LOC in under 2 minutes (with smart exclusions and optimized embedding computation)
- **Search:** Sub-500ms queries on typical codebases
- **Index size:** ~2x source code size with compression
- **Memory:** Efficient streaming for large repositories with span-based content extraction
- **File filtering:** Automatic exclusion of virtual environments and build artifacts
- **Output:** Clean stdout/stderr separation for reliable piping and scripting

## Testing

Run the comprehensive test suite:

```bash
# Full test suite (40+ tests)
./test_ck.sh

# Quick smoke test (14 core tests)
./test_ck_simple.sh
```

Tests cover grep compatibility, semantic search, index management, file filtering, and more.

## Contributing

ck is actively developed and welcomes contributions:

- **Issues:** Report bugs, request features
- **Code:** Submit PRs for bug fixes, new features
- **Documentation:** Improve examples, guides, tutorials
- **Testing:** Help test on different codebases and languages

### Development Setup

```bash
git clone https://github.com/BeaconBay/ck
cd ck
cargo build
cargo test
./target/debug/ck index test_files/
./target/debug/ck --sem "test query" test_files/
```

## Roadmap

### Current (v0.3+)

- ✅ grep-compatible CLI with semantic search and file listing flags (`-l`, `-L`)
- ✅ FastEmbed integration with BGE models
- ✅ File exclusion patterns and glob support
- ✅ Threshold filtering and relevance scoring with visual highlighting
- ✅ Tree-sitter parsing and intelligent chunking (Python, TypeScript, JavaScript, Haskell)
- ✅ Complete code section extraction (`--full-section`)
- ✅ Enhanced indexing strategy with v3 semantic search optimization
- ✅ Clean stdout/stderr separation for reliable scripting
- ✅ Incremental index updates with hash-based change detection
- 🚧 Configuration file support
- 🚧 Package manager distributions
- 🔮 Multiple embedding model support
- 🔮 Advanced ranking algorithms
- 🔮 Plugin architecture for custom chunkers
- 🔮 Distributed/remote index support
- 🔮 IDE integrations (VS Code, IntelliJ, etc.)
- 🔮 Git integration (semantic diffs, blame)
- 🔮 Web interface for team usage
- 🔮 Multi-language semantic understanding

## FAQ

**Q: How is this different from grep/ripgrep/silver-searcher?**
A: ck covers the regex search those tools provide and adds semantic understanding. Search for "error handling" and find relevant code even when those exact words aren't used.

**Q: Does it work offline?**
A: Yes, once the embedding model has been downloaded and cached. The model runs locally, and searches make no network calls.

**Q: How big are the indexes?**
A: Typically 1-3x the size of your source code, depending on content. The `.ck/` directory can be safely deleted to reclaim space.

**Q: Is it fast enough for large codebases?**
A: Yes. Indexing is a one-time cost, and searches are sub-second even on large projects. Regex searches require no indexing and are as fast as grep.

**Q: Can I use it in scripts/automation?**
A: Absolutely. The `--json` flag provides structured output for automated processing. Use `--full-section` to get complete functions for AI analysis.

**Q: What about privacy/security?**
A: Everything runs locally. No code or queries are sent to external services. The embedding model is downloaded once and cached locally.

## License

Licensed under either of:

- Apache License, Version 2.0 (LICENSE-APACHE)
- MIT License (LICENSE-MIT)

at your option.

## Credits

Built with:

- Rust - Systems programming language
- FastEmbed - Fast text embeddings
- Tantivy - Full-text search engine
- clap - Command line argument parsing

Inspired by the need for better code search tools in the age of AI-assisted development.

**Start finding code by what it does, not what it says.**
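For the scripting use case mentioned in the FAQ, here is a minimal sketch of a consumer for `ck --json` output. It assumes one JSON object per line, each carrying at least the `file` and `preview` keys used in this README's `jq` examples. Any other field, and the `group_previews` name itself, is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: `ck --json` prints one JSON object per line, each with at least
# the "file" and "preview" keys that appear in the README's jq examples.

def group_previews(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group result previews by file, skipping blank or malformed lines."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON record
        if isinstance(record, dict) and "file" in record:
            grouped[record["file"]].append(record.get("preview", ""))
    return dict(grouped)


if __name__ == "__main__":
    import sys

    for file_name, previews in group_previews(sys.stdin).items():
        print(f"{file_name}: {len(previews)} match(es)")
```

Saved under a name of your choosing (for example `group_results.py`), it could be used as `ck --json --sem "error handling" src/ | python group_results.py`.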