# probe

**Repository Path**: mirrors_buger/probe

## Basic Information

- **Project Name**: probe
- **Description**: Probe is an AI-friendly, fully local, semantic code search engine that works with large codebases. The final missing building block for the next generation of AI coding tools.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-07
- **Last Updated**: 2026-04-19

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

Probe Logo

# Probe

**We read code 10x more than we write it.**

Probe is a code and markdown context engine, with a built-in agent, made to work on enterprise-scale codebases.

Today's AI coding tools use a caveman approach: grep some files, read random lines, hope for the best. It works on toy projects. It falls apart on real codebases.

**Probe is a context engine built for reading and reasoning.** It treats your code as code—not text. AST parsing understands structure. Semantic search finds what matters. You get complete, meaningful context in a single call.

**The Probe Agent** is purpose-built for code understanding. It knows how to wield the Probe engine expertly—searching, extracting, and reasoning across your entire codebase. Perfect for spec-driven development, code reviews, onboarding, and any task where understanding comes before writing.

**One Probe call captures what takes other tools 10+ agentic loops**—deeper, cleaner, and far less noise.

---

## Table of Contents

- [Why Probe?](#why-probe)
- [Quick Start](#quick-start)
- [Features](#features)
- [Usage Modes](#usage-modes)
  - [Probe Agent (MCP)](#probe-agent-mcp)
  - [Raw MCP Tools](#raw-mcp-tools)
  - [CLI Agent](#cli-agent)
  - [Direct CLI Commands](#direct-cli-commands)
  - [Node.js SDK](#nodejs-sdk)
- [LLM Script](#llm-script)
- [Installation](#installation)
- [Supported Languages](#supported-languages)
- [Documentation](#documentation)
- [Environment Variables](#environment-variables)
- [Contributing](#contributing)
- [License](#license)

---

## Why Probe?

Most code search tools fall into two camps: **text-based** (grep, ripgrep) or **embedding-based** (vector search requiring indexing and an embedding model). Probe takes a third path: **AST-aware structural search with zero setup**.
|  | grep/ripgrep | Embedding tools (grepai, Octocode) | Probe |
|---|---|---|---|
| **Setup time** | None | Minutes (indexing + embedding service) | None |
| **Code understanding** | Text only | Text chunks (can split mid-function) | AST-aware (returns complete functions/classes) |
| **Search method** | Regex | Vector similarity | Elasticsearch-style boolean queries + BM25 |
| **Result quality** | Line fragments | ~512-char chunks | Complete semantic code blocks |
| **Ranking** | None (line order) | Cosine similarity | BM25/TF-IDF/Hybrid with SIMD acceleration |
| **External dependencies** | None | Embedding API (Ollama/OpenAI) | None |
| **Token awareness** | No | Partial | Yes (`--max-tokens`, session dedup) |
| **Works offline** | Yes | Only with local model | Always |
| **AI agent integration** | None | MCP server | Full agent loop + MCP + Vercel AI SDK |

### The key insight: AI agents don't need embedding search

Embedding-based tools solve vocabulary mismatch -- finding "authentication" when the code says `verify_credentials`. But when an AI agent is the consumer, **the LLM already handles this**:

```
User: "find the authentication logic"
  -> LLM generates: probe search "verify_credentials OR authenticate OR login OR auth_handler"
  -> Probe returns complete AST blocks in milliseconds
```

The LLM translates intent into precise boolean queries. Probe gives it a powerful query language (`AND`, `OR`, `+required`, `-excluded`, `"exact phrases"`, `ext:rs`, `lang:python`) purpose-built for this. Combined with session dedup, the agent can run 3-4 rapid searches and cover more ground than a single embedding query -- faster, deterministic, and with zero setup cost.

---

## Quick Start

### Option 1: Probe Agent via MCP (Recommended)

Our built-in agent natively integrates with Claude Code, using its authentication—no extra API keys needed.
Add to `~/.claude/claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "probe": {
      "command": "npx",
      "args": ["-y", "@probelabs/probe@latest", "agent", "--mcp"]
    }
  }
}
```

The Probe Agent is purpose-built to read and reason about code. It piggybacks on Claude Code's auth (or Codex auth), or works with any model via your own API key (e.g., `GOOGLE_API_KEY`).

### Option 2: Raw Probe Tools via MCP

If you prefer direct access to search/query/extract tools without the agent layer:

```json
{
  "mcpServers": {
    "probe": {
      "command": "npx",
      "args": ["-y", "@probelabs/probe@latest", "mcp"]
    }
  }
}
```

### Option 3: Direct CLI (No MCP)

Use Probe directly from your terminal—no AI editor required:

```bash
# Semantic search with Elasticsearch syntax
npx -y @probelabs/probe search "authentication AND login" ./src

# Extract code block at line 42
npx -y @probelabs/probe extract src/main.rs:42

# AST pattern matching
npx -y @probelabs/probe query "fn $NAME($$$) -> Result<$RET>" --language rust
```

### Option 4: CLI Agent

Ask questions about any codebase directly from your terminal:

```bash
# One-shot question (works with any LLM provider)
npx -y @probelabs/probe@latest agent "How is authentication implemented?"

# With code editing capabilities
npx -y @probelabs/probe@latest agent "Refactor the login function" --allow-edit
```

---

## Features

- **Code-Aware**: Tree-sitter AST parsing understands your code's actual structure
- **Semantic Search**: Elasticsearch-style queries (`AND`, `OR`, `NOT`, phrases, filters)
- **Complete Context**: Returns entire functions, classes, or structs -- not text chunks that break mid-function
- **Zero Indexing**: Instant results on any codebase. No embedding models, no vector databases, no setup
- **Deterministic**: Same query always returns the same results. No model variance, no stale indexes
- **Fully Local**: Your code never leaves your machine. No API calls for search
- **Blazing Fast**: SIMD-accelerated pattern matching + ripgrep scanning + rayon parallelism
- **Smart Ranking**: BM25, TF-IDF, and hybrid algorithms with optional BERT reranking
- **Token-Aware**: `--max-tokens` budget, session-based dedup to avoid repeating context
- **Built-in Agent**: Multi-provider (Anthropic, OpenAI, Google, Bedrock) with retry, fallback, and context compaction
- **Multi-Language**: Rust, Python, JavaScript, TypeScript, Go, C/C++, Java, Ruby, PHP, Swift, C#, and more

---

## Usage Modes

### Probe Agent (MCP)

The recommended way to use Probe with AI editors. The Probe Agent is a specialized coding assistant that reasons about your code—not just pattern matches.

```json
{
  "mcpServers": {
    "probe": {
      "command": "npx",
      "args": ["-y", "@probelabs/probe@latest", "agent", "--mcp"]
    }
  }
}
```

**Why use the agent?**

- Purpose-built to understand and reason about code
- Piggybacks on Claude Code / Codex authentication (or use your own API key)
- Smarter multi-step reasoning for complex questions
- Built-in code editing, task delegation, and more

**Agent options:**

| Option | Description |
|--------|-------------|
| `--path <dir>` | Search directory (default: current) |
| `--provider <name>` | AI provider: `anthropic`, `openai`, `google` |
| `--model <name>` | Override model name |
| `--prompt <persona>` | Persona: `code-explorer`, `engineer`, `code-review`, `architect` |
| `--allow-edit` | Enable code modification |
| `--enable-delegate` | Enable task delegation to subagents |
| `--enable-bash` | Enable bash command execution |
| `--max-iterations <n>` | Max tool iterations (default: 30) |

---

### Raw MCP Tools

Direct access to Probe's search, query, and extract tools—without the agent layer. Use this when you want your AI editor to call Probe tools directly.
```json
{
  "mcpServers": {
    "probe": {
      "command": "npx",
      "args": ["-y", "@probelabs/probe@latest", "mcp"]
    }
  }
}
```

**Available tools:**

- `search` - Semantic code search with Elasticsearch-style queries
- `query` - AST-based structural pattern matching
- `extract` - Extract code blocks by line number or symbol name
- `symbols` - List all symbols in a file (functions, classes, constants) with line numbers

---

### CLI Agent

Run the Probe Agent directly from your terminal:

```bash
# One-shot question
npx -y @probelabs/probe@latest agent "How does the ranking algorithm work?"

# Specify search path
npx -y @probelabs/probe@latest agent "Find API endpoints" --path ./src

# Enable code editing
npx -y @probelabs/probe@latest agent "Add error handling to login()" --allow-edit

# Use custom persona
npx -y @probelabs/probe@latest agent "Review this code" --prompt code-review
```

---

### Direct CLI Commands

For scripting and direct code analysis.

#### Search Command

```bash
probe search <QUERY> [PATH] [OPTIONS]
```

**Examples:**

```bash
# Basic search
probe search "authentication" ./src

# Boolean operators (Elasticsearch syntax)
probe search "error AND handling" ./
probe search "login OR auth" ./src
probe search "database NOT sqlite" ./

# Search hints (file filters)
probe search "function AND ext:rs" ./        # Only .rs files
probe search "class AND file:src/**/*.py" ./ # Python files in src/
probe search "error AND dir:tests" ./        # Files in tests/

# Limit results for AI context windows
probe search "API" ./ --max-tokens 10000
```

**Key options:**

| Option | Description |
|--------|-------------|
| `--max-tokens <n>` | Limit total tokens returned |
| `--max-results <n>` | Limit number of results |
| `--reranker <name>` | Ranking: `bm25`, `tfidf`, `hybrid`, `hybrid2` |
| `--allow-tests` | Include test files |
| `--format <fmt>` | Output: `markdown`, `json`, `xml` |

#### Extract Command

```bash
probe extract <FILES> [OPTIONS]
```

**Examples:**

```bash
# Extract function at line 42
probe extract src/main.rs:42

# Extract by symbol name
probe extract src/main.rs#authenticate

# Extract line range
probe extract src/main.rs:10-50

# From compiler output
go test | probe extract
```

#### Symbols Command

```bash
probe symbols <FILES> [OPTIONS]
```

**Examples:**

```bash
# List symbols in a file
probe symbols src/main.rs

# JSON output for programmatic use
probe symbols src/main.rs --format json

# Multiple files
probe symbols src/main.rs src/lib.rs
```

#### Query Command (AST Patterns)

```bash
probe query <PATTERN> [PATH] [OPTIONS]
```

**Examples:**

```bash
# Find all async functions in Rust
probe query "async fn $NAME($$$)" --language rust

# Find React components
probe query "function $NAME($$$) { return <$$$> }" --language javascript

# Find Python classes with specific method
probe query "class $CLASS: def __init__($$$)" --language python
```

---

### Node.js SDK

Use Probe programmatically in your applications.

```javascript
import { ProbeAgent } from '@probelabs/probe/agent';

// Create agent
const agent = new ProbeAgent({ path: './src', provider: 'anthropic' });
await agent.initialize();

// Ask questions
const response = await agent.answer('How does authentication work?');
console.log(response);

// Get token usage
console.log(agent.getTokenUsage());
```

**Direct functions:**

```javascript
import { search, extract, query, symbols } from '@probelabs/probe';

// Semantic search
const results = await search({ query: 'authentication', path: './src', maxTokens: 10000 });

// Extract code
const code = await extract({ files: ['src/auth.ts:42'], format: 'markdown' });

// List symbols in a file
const fileSymbols = await symbols({ files: ['src/auth.ts'] });

// AST pattern query
const matches = await query({ pattern: 'async function $NAME($$$)', path: './src', language: 'typescript' });
```

**Vercel AI SDK integration:**

```javascript
import { tools } from '@probelabs/probe';

const { searchTool, queryTool, extractTool } = tools;

// Use with Vercel AI SDK
const result = await generateText({
  model:
    anthropic('claude-sonnet-4-6'),
  tools: {
    search: searchTool({ defaultPath: './src' }),
    query: queryTool({ defaultPath: './src' }),
    extract: extractTool({ defaultPath: './src' })
  },
  prompt: 'Find authentication code'
});
```

---

## LLM Script

Probe Agent can use the `execute_plan` tool to run deterministic, multi-step code analysis tasks. LLM Script is a sandboxed JavaScript DSL where the AI generates executable plans combining search, extraction, and LLM reasoning in a single pipeline.

```javascript
// AI-generated LLM Script example (await is auto-injected, don't write it)
const files = search("authentication login")
const chunks = chunk(files)
const analysis = map(chunks, c => LLM("Summarize auth patterns", c))
return analysis.join("\n")
```

**Key features:**

- **Agent integration** - Probe Agent calls the `execute_plan` tool to run scripts
- **Auto-await** - Async calls are automatically awaited (don't write `await`)
- **All tools available** - `search()`, `query()`, `extract()`, `LLM()`, `map()`, `chunk()`, plus any MCP tools
- **Sandboxed execution** - Safe, isolated JavaScript environment with timeout protection

See the full [LLM Script Documentation](./docs/llm-script.md) for syntax and examples.

---

## Installation

### NPM (Recommended)

```bash
npm install -g @probelabs/probe
```

### curl (macOS/Linux)

```bash
curl -fsSL https://raw.githubusercontent.com/probelabs/probe/main/install.sh | bash
```

### PowerShell (Windows)

```powershell
iwr -useb https://raw.githubusercontent.com/probelabs/probe/main/install.ps1 | iex
```

### From Source

```bash
git clone https://github.com/probelabs/probe.git
cd probe
cargo build --release
cargo install --path .
```

---

## Supported Languages

| Language | Extensions |
|----------|------------|
| Rust | `.rs` |
| JavaScript/JSX | `.js`, `.jsx` |
| TypeScript/TSX | `.ts`, `.tsx` |
| Python | `.py` |
| Go | `.go` |
| C/C++ | `.c`, `.h`, `.cpp`, `.cc`, `.hpp` |
| Java | `.java` |
| Ruby | `.rb` |
| PHP | `.php` |
| Swift | `.swift` |
| C# | `.cs` |
| Markdown | `.md` |

---

## Documentation

Full documentation is available at [probelabs.com/probe](https://probelabs.com/probe), or browse locally in [`docs/`](./docs/).

### Getting Started

- [Quick Start](./docs/quick-start.md) - Get up and running in 5 minutes
- [Installation](./docs/installation.md) - NPM, curl, Docker, and building from source
- [Features Overview](./docs/features.md) - Core capabilities

### Probe CLI

- [Search Command](./docs/probe-cli/search.md) - Elasticsearch-style semantic search
- [Extract Command](./docs/probe-cli/extract.md) - Extract code blocks with full AST context
- [Symbols Command](./docs/probe-cli/symbols.md) - List all symbols in files with line numbers
- [Query Command](./docs/probe-cli/query.md) - AST-based structural pattern matching
- [CLI Reference](./docs/probe-cli/cli-reference.md) - Complete command-line reference

### LSP & Indexing

- [LSP Features](./docs/lsp-features.md) - What `--lsp` adds for semantic code intelligence
- [LSP Quick Reference](./docs/lsp-quick-reference.md) - Day-to-day LSP command cheatsheet
- [Indexing Overview](./docs/indexing-overview.md) - Project indexing concepts and workflow
- [Indexing CLI Reference](./docs/indexing-cli-reference.md) - `probe lsp index*` command reference

LSP capabilities include call hierarchy enrichment (`extract --lsp`), direct symbol operations (`probe lsp call definition|references|hover|...`), daemon diagnostics (`probe lsp logs --analyze`), and workspace indexing (`probe lsp index`, `probe lsp index-status`).
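Putting the LSP capabilities above together, a typical session might look like the following. This is a sketch built only from the commands listed above; exact argument shapes are not verified here, so consult the LSP Quick Reference before relying on them.

```shell
# Hypothetical LSP workflow; argument shapes unverified -- see docs/lsp-quick-reference.md.
probe lsp index                      # start indexing the current workspace
probe lsp index-status               # check indexing progress
probe extract src/main.rs:42 --lsp   # extraction enriched with call hierarchy
probe lsp logs --analyze             # analyze daemon diagnostics if something looks off
```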
### Probe Agent

- [Agent Overview](./docs/probe-agent/overview.md) - What is Probe Agent and when to use it
- [API Reference](./docs/probe-agent/sdk/api-reference.md) - ProbeAgent class documentation
- [Node.js SDK](./docs/probe-agent/sdk/nodejs-sdk.md) - Full Node.js SDK reference
- [MCP Integration](./docs/probe-agent/protocols/mcp-integration.md) - Editor integration guide
- [LLM Script](./docs/llm-script.md) - Programmable orchestration DSL

### Guides & Reference

- [Query Patterns](./docs/guides/query-patterns.md) - Effective search strategies
- [How Probe Compares](./docs/reference/comparison.md) - vs. embedding search, knowledge graphs, LSP tools
- [Architecture](./docs/reference/architecture.md) - System design and internals
- [Environment Variables](./docs/reference/environment-variables.md) - All configuration options
- [FAQ](./docs/reference/faq.md) - Frequently asked questions

---

## Environment Variables

```bash
# AI Provider Keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...

# Provider Selection
FORCE_PROVIDER=anthropic
MODEL_NAME=claude-sonnet-4-6

# Custom Endpoints
ANTHROPIC_API_URL=https://your-proxy.com
OPENAI_API_URL=https://your-proxy.com

# Debug
DEBUG=1
```

---

## Contributing

We welcome contributions! See our [Contributing Guide](https://github.com/probelabs/probe/blob/main/CONTRIBUTING.md).

For questions or support:

- [GitHub Issues](https://github.com/probelabs/probe/issues)
- [Discord Community](https://discord.gg/hBN4UsTZ)

---

## License

Apache-2.0.

---

For questions or contributions, please open an issue on [GitHub](https://github.com/probelabs/probe/issues) or join our [Discord community](https://discord.gg/hBN4UsTZ) for discussions and support.

Happy coding—and searching!