# probe
**Repository Path**: mirrors_buger/probe
## Basic Information
- **Project Name**: probe
- **Description**: Probe is an AI-friendly, fully local, semantic code search engine that works with large codebases. The final missing building block for the next generation of AI coding tools.
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-03-07
- **Last Updated**: 2026-04-19
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# Probe
**We read code 10x more than we write it.** Probe is a code and markdown context engine, with a built-in agent, made to work on enterprise-scale codebases.
Today's AI coding tools use a caveman approach: grep some files, read random lines, hope for the best. It works on toy projects. It falls apart on real codebases.
**Probe is a context engine built for reading and reasoning.** It treats your code as code—not text. AST parsing understands structure. Semantic search finds what matters. You get complete, meaningful context in a single call.
**The Probe Agent** is purpose-built for code understanding. It knows how to wield the Probe engine expertly—searching, extracting, and reasoning across your entire codebase. Perfect for spec-driven development, code reviews, onboarding, and any task where understanding comes before writing.
**One Probe call captures what takes other tools 10+ agentic loops**: deeper, cleaner, and with far less noise.
---
## Table of Contents
- [Why Probe?](#why-probe)
- [Quick Start](#quick-start)
- [Features](#features)
- [Usage Modes](#usage-modes)
- [Probe Agent (MCP)](#probe-agent-mcp)
- [Raw MCP Tools](#raw-mcp-tools)
- [CLI Agent](#cli-agent)
- [Direct CLI Commands](#direct-cli-commands)
- [Node.js SDK](#nodejs-sdk)
- [LLM Script](#llm-script)
- [Installation](#installation)
- [Supported Languages](#supported-languages)
- [Documentation](#documentation)
- [Environment Variables](#environment-variables)
- [Contributing](#contributing)
- [License](#license)
---
## Why Probe?
Most code search tools fall into two camps: **text-based** (grep, ripgrep) or **embedding-based** (vector search requiring indexing and an embedding model). Probe takes a third path: **AST-aware structural search with zero setup**.
| | grep/ripgrep | Embedding tools (grepai, Octocode) | Probe |
|---|---|---|---|
| **Setup time** | None | Minutes (indexing + embedding service) | None |
| **Code understanding** | Text only | Text chunks (can split mid-function) | AST-aware (returns complete functions/classes) |
| **Search method** | Regex | Vector similarity | Elasticsearch-style boolean queries + BM25 |
| **Result quality** | Line fragments | ~512-char chunks | Complete semantic code blocks |
| **Ranking** | None (line order) | Cosine similarity | BM25/TF-IDF/Hybrid with SIMD acceleration |
| **External dependencies** | None | Embedding API (Ollama/OpenAI) | None |
| **Token awareness** | No | Partial | Yes (`--max-tokens`, session dedup) |
| **Works offline** | Yes | Only with local model | Always |
| **AI agent integration** | None | MCP server | Full agent loop + MCP + Vercel AI SDK |
### The key insight: AI agents don't need embedding search
Embedding-based tools solve vocabulary mismatch -- finding "authentication" when the code says `verify_credentials`. But when an AI agent is the consumer, **the LLM already handles this**:
```
User: "find the authentication logic"
-> LLM generates: probe search "verify_credentials OR authenticate OR login OR auth_handler"
-> Probe returns complete AST blocks in milliseconds
```
The LLM translates intent into precise boolean queries. Probe gives it a powerful query language (`AND`, `OR`, `+required`, `-excluded`, `"exact phrases"`, `ext:rs`, `lang:python`) purpose-built for this. Combined with session dedup, the agent can run 3-4 rapid searches and cover more ground than a single embedding query -- faster, deterministic, and with zero setup cost.
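As a rough illustration of the translation step described above, the sketch below joins candidate terms into a single query string. `buildProbeQuery` is a hypothetical helper, not part of Probe, and it uses only the `OR` operator and `"exact phrase"` quoting mentioned above.

```javascript
// Hypothetical helper: turn a list of candidate terms into one Probe query string.
// Only syntax named in this README is used: OR between terms and "exact phrases".
function buildProbeQuery(terms) {
  const cleaned = terms
    .map((t) => t.trim())
    .filter((t) => t.length > 0);
  // Drop duplicates while keeping first-seen order.
  const unique = [...new Set(cleaned)];
  // Quote multi-word terms so they are matched as an exact phrase.
  const parts = unique.map((t) => (/\s/.test(t) ? `"${t}"` : t));
  return parts.join(' OR ');
}

const query = buildProbeQuery([
  'verify_credentials',
  'authenticate',
  'login',
  'auth_handler',
  'login', // duplicate, dropped
]);
// Single quotes keep the shell from touching the query text.
console.log(`probe search '${query}' ./src`);
```

An agent would typically generate several such queries in a row and rely on session dedup to avoid receiving the same block twice.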
---
## Quick Start
### Option 1: Probe Agent via MCP (Recommended)
Our built-in agent natively integrates with Claude Code, using its authentication—no extra API keys needed.
Add to `~/.claude/claude_desktop_config.json`:
```json
{
  "mcpServers": {
    "probe": {
      "command": "npx",
      "args": ["-y", "@probelabs/probe@latest", "agent", "--mcp"]
    }
  }
}
```
The Probe Agent is purpose-built to read and reason about code. It piggybacks on Claude Code's auth (or Codex auth), or works with any model via your own API key (e.g., `GOOGLE_API_KEY`).
### Option 2: Raw Probe Tools via MCP
If you prefer direct access to search/query/extract tools without the agent layer:
```json
{
  "mcpServers": {
    "probe": {
      "command": "npx",
      "args": ["-y", "@probelabs/probe@latest", "mcp"]
    }
  }
}
```
### Option 3: Direct CLI (No MCP)
Use Probe directly from your terminal—no AI editor required:
```bash
# Semantic search with Elasticsearch syntax
npx -y @probelabs/probe search "authentication AND login" ./src
# Extract code block at line 42
npx -y @probelabs/probe extract src/main.rs:42
# AST pattern matching
npx -y @probelabs/probe query 'fn $NAME($$$) -> Result<$RET>' --language rust  # single quotes stop the shell expanding $NAME and $$
```
### Option 4: CLI Agent
Ask questions about any codebase directly from your terminal:
```bash
# One-shot question (works with any LLM provider)
npx -y @probelabs/probe@latest agent "How is authentication implemented?"
# With code editing capabilities
npx -y @probelabs/probe@latest agent "Refactor the login function" --allow-edit
```
---
## Features
- **Code-Aware**: Tree-sitter AST parsing understands your code's actual structure
- **Semantic Search**: Elasticsearch-style queries (`AND`, `OR`, `NOT`, phrases, filters)
- **Complete Context**: Returns entire functions, classes, or structs -- not text chunks that break mid-function
- **Zero Indexing**: Instant results on any codebase. No embedding models, no vector databases, no setup
- **Deterministic**: Same query always returns the same results. No model variance, no stale indexes
- **Fully Local**: Your code never leaves your machine. No API calls for search
- **Blazing Fast**: SIMD-accelerated pattern matching + ripgrep scanning + rayon parallelism
- **Smart Ranking**: BM25, TF-IDF, and hybrid algorithms with optional BERT reranking
- **Token-Aware**: `--max-tokens` budget, session-based dedup to avoid repeating context
- **Built-in Agent**: Multi-provider (Anthropic, OpenAI, Google, Bedrock) with retry, fallback, and context compaction
- **Multi-Language**: Rust, Python, JavaScript, TypeScript, Go, C/C++, Java, Ruby, PHP, Swift, C#, and more
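To make the ranking bullet more concrete, here is a minimal sketch of textbook Okapi BM25 scoring over pre-tokenized code blocks. It is not Probe's implementation: the `bm25Scores` function, the `k1`/`b` defaults, and the sample token lists are all assumptions chosen for illustration.

```javascript
// Textbook Okapi BM25 over tokenized documents. Illustration only: this is not
// Probe's implementation, and k1/b are simply the commonly used defaults.
function bm25Scores(docs, queryTerms, k1 = 1.2, b = 0.75) {
  const N = docs.length;
  const avgLen = docs.reduce((sum, d) => sum + d.length, 0) / N;
  return docs.map((doc) => {
    let score = 0;
    for (const term of queryTerms) {
      const tf = doc.filter((t) => t === term).length; // term frequency in this doc
      if (tf === 0) continue;
      const df = docs.filter((d) => d.includes(term)).length; // docs containing the term
      const idf = Math.log(1 + (N - df + 0.5) / (df + 0.5));
      // Length normalization: long blocks need more hits to score the same.
      const norm = tf + k1 * (1 - b + (b * doc.length) / avgLen);
      score += (idf * tf * (k1 + 1)) / norm;
    }
    return score;
  });
}

// Hypothetical code blocks, already split into tokens.
const blocks = [
  ['fn', 'login', 'user', 'password', 'verify'],
  ['fn', 'render', 'page', 'template'],
  ['fn', 'login', 'login', 'retry', 'login', 'timeout', 'backoff', 'sleep'],
];
const scores = bm25Scores(blocks, ['login', 'verify']);
console.log(scores);
```

The first block matches both query terms, including the rarer `verify`, so it outranks the third block even though that one repeats `login` three times.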
---
## Usage Modes
### Probe Agent (MCP)
The recommended way to use Probe with AI editors. The Probe Agent is a specialized coding assistant that reasons about your code—not just pattern matches.
```json
{
  "mcpServers": {
    "probe": {
      "command": "npx",
      "args": ["-y", "@probelabs/probe@latest", "agent", "--mcp"]
    }
  }
}
```
**Why use the agent?**
- Purpose-built to understand and reason about code
- Piggybacks on Claude Code / Codex authentication (or use your own API key)
- Smarter multi-step reasoning for complex questions
- Built-in code editing, task delegation, and more
**Agent options:**
| Option | Description |
|--------|-------------|
| `--path <dir>` | Search directory (default: current) |
| `--provider <name>` | AI provider: `anthropic`, `openai`, `google` |
| `--model <name>` | Override model name |
| `--prompt <type>` | Persona: `code-explorer`, `engineer`, `code-review`, `architect` |
| `--allow-edit` | Enable code modification |
| `--enable-delegate` | Enable task delegation to subagents |
| `--enable-bash` | Enable bash command execution |
| `--max-iterations <n>` | Max tool iterations (default: 30) |
---
### Raw MCP Tools
Direct access to Probe's search, query, and extract tools—without the agent layer. Use this when you want your AI editor to call Probe tools directly.
```json
{
  "mcpServers": {
    "probe": {
      "command": "npx",
      "args": ["-y", "@probelabs/probe@latest", "mcp"]
    }
  }
}
```
**Available tools:**
- `search` - Semantic code search with Elasticsearch-style queries
- `query` - AST-based structural pattern matching
- `extract` - Extract code blocks by line number or symbol name
- `symbols` - List all symbols in a file (functions, classes, constants) with line numbers
---
### CLI Agent
Run the Probe Agent directly from your terminal:
```bash
# One-shot question
npx -y @probelabs/probe@latest agent "How does the ranking algorithm work?"
# Specify search path
npx -y @probelabs/probe@latest agent "Find API endpoints" --path ./src
# Enable code editing
npx -y @probelabs/probe@latest agent "Add error handling to login()" --allow-edit
# Use custom persona
npx -y @probelabs/probe@latest agent "Review this code" --prompt code-review
```
---
### Direct CLI Commands
For scripting and direct code analysis.
#### Search Command
```bash
probe search <PATTERN> [PATH] [OPTIONS]
```
**Examples:**
```bash
# Basic search
probe search "authentication" ./src
# Boolean operators (Elasticsearch syntax)
probe search "error AND handling" ./
probe search "login OR auth" ./src
probe search "database NOT sqlite" ./
# Search hints (file filters)
probe search "function AND ext:rs" ./ # Only .rs files
probe search "class AND file:src/**/*.py" ./ # Python files in src/
probe search "error AND dir:tests" ./ # Files in tests/
# Limit results for AI context windows
probe search "API" ./ --max-tokens 10000
```
**Key options:**
| Option | Description |
|--------|-------------|
| `--max-tokens <n>` | Limit total tokens returned |
| `--max-results <n>` | Limit number of results |
| `--reranker <algo>` | Ranking: `bm25`, `tfidf`, `hybrid`, `hybrid2` |
| `--allow-tests` | Include test files |
| `--format <type>` | Output: `markdown`, `json`, `xml` |
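The idea behind a token budget combined with session dedup can be sketched as follows. This is an assumption-laden illustration, not Probe's logic: the result shape (`file`, `startLine`, `endLine`, `code`), the four-characters-per-token estimate, and both function names are hypothetical.

```javascript
// Rough token estimate: about 4 characters per token.
// This is an assumption for the sketch, not Probe's tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Keep ranked results until the budget is spent, skipping blocks already
// returned earlier in the same session. `seen` is a Set shared across calls.
function selectWithinBudget(results, maxTokens, seen = new Set()) {
  const selected = [];
  let used = 0;
  for (const r of results) {
    const key = `${r.file}:${r.startLine}-${r.endLine}`;
    if (seen.has(key)) continue; // already sent in this session
    const cost = estimateTokens(r.code);
    if (used + cost > maxTokens) break; // stop here so rank order is preserved
    selected.push(r);
    seen.add(key);
    used += cost;
  }
  return { selected, used };
}

// Hypothetical ranked results (best first).
const ranked = [
  { file: 'src/auth.rs', startLine: 10, endLine: 20, code: 'x'.repeat(40) },
  { file: 'src/auth.rs', startLine: 30, endLine: 60, code: 'x'.repeat(80) },
  { file: 'src/user.rs', startLine: 5, endLine: 12, code: 'x'.repeat(40) },
];
const session = new Set();
const first = selectWithinBudget(ranked, 25, session);
const second = selectWithinBudget(ranked, 100, session);
console.log(first.selected.length, second.selected.length);
```

Stopping at the first block that does not fit keeps the output in rank order; a variant could `continue` instead to pack smaller, lower-ranked blocks into the remaining budget.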
#### Extract Command
```bash
probe extract <FILES> [OPTIONS]
```
**Examples:**
```bash
# Extract function at line 42
probe extract src/main.rs:42
# Extract by symbol name
probe extract src/main.rs#authenticate
# Extract line range
probe extract src/main.rs:10-50
# From compiler output
go test | probe extract
```
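The three target forms shown above (`file:line`, `file:start-end`, `file#symbol`) are easy to tell apart mechanically. The sketch below shows one way to do it; `parseExtractTarget` is a hypothetical helper for illustration, and Probe's own parsing may differ.

```javascript
// Parse the three target forms shown above: file:line, file:start-end, file#symbol.
// Hypothetical helper for illustration; Probe does its own parsing.
function parseExtractTarget(spec) {
  const hash = spec.indexOf('#');
  if (hash !== -1) {
    return { file: spec.slice(0, hash), symbol: spec.slice(hash + 1) };
  }
  // Trailing ":N" or ":N-M"; the greedy prefix keeps any earlier colons in the path.
  const range = spec.match(/^(.*):(\d+)(?:-(\d+))?$/);
  if (range) {
    const start = Number(range[2]);
    const end = range[3] === undefined ? start : Number(range[3]);
    return { file: range[1], startLine: start, endLine: end };
  }
  // No suffix: the whole file is the target.
  return { file: spec };
}

for (const spec of ['src/main.rs:42', 'src/main.rs#authenticate', 'src/main.rs:10-50']) {
  console.log(spec, '=>', JSON.stringify(parseExtractTarget(spec)));
}
```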
#### Symbols Command
```bash
probe symbols <FILES> [OPTIONS]
```
**Examples:**
```bash
# List symbols in a file
probe symbols src/main.rs
# JSON output for programmatic use
probe symbols src/main.rs --format json
# Multiple files
probe symbols src/main.rs src/lib.rs
```
#### Query Command (AST Patterns)
```bash
probe query <PATTERN> [PATH] [OPTIONS]
```
**Examples:**
```bash
# Patterns are single-quoted so the shell does not expand $NAME or $$
# Find all async functions in Rust
probe query 'async fn $NAME($$$)' --language rust
# Find React components
probe query 'function $NAME($$$) { return <$$$> }' --language javascript
# Find Python classes with specific method
probe query 'class $CLASS: def __init__($$$)' --language python
```
---
### Node.js SDK
Use Probe programmatically in your applications.
```javascript
import { ProbeAgent } from '@probelabs/probe/agent';
// Create agent
const agent = new ProbeAgent({
  path: './src',
  provider: 'anthropic'
});
await agent.initialize();
// Ask questions
const response = await agent.answer('How does authentication work?');
console.log(response);
// Get token usage
console.log(agent.getTokenUsage());
```
**Direct functions:**
```javascript
import { search, extract, query, symbols } from '@probelabs/probe';
// Semantic search
const results = await search({
  query: 'authentication',
  path: './src',
  maxTokens: 10000
});
// Extract code
const code = await extract({
  files: ['src/auth.ts:42'],
  format: 'markdown'
});
// List symbols in a file
const fileSymbols = await symbols({
  files: ['src/auth.ts']
});
// AST pattern query
const matches = await query({
  pattern: 'async function $NAME($$$)',
  path: './src',
  language: 'typescript'
});
```
**Vercel AI SDK integration:**
```javascript
// generateText and the anthropic provider come from the Vercel AI SDK packages
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { tools } from '@probelabs/probe';
const { searchTool, queryTool, extractTool } = tools;
// Use with Vercel AI SDK
const result = await generateText({
  model: anthropic('claude-sonnet-4-6'),
  tools: {
    search: searchTool({ defaultPath: './src' }),
    query: queryTool({ defaultPath: './src' }),
    extract: extractTool({ defaultPath: './src' })
  },
  prompt: 'Find authentication code'
});
```
---
## LLM Script
Probe Agent can use the `execute_plan` tool to run deterministic, multi-step code analysis tasks. LLM Script is a sandboxed JavaScript DSL where the AI generates executable plans combining search, extraction, and LLM reasoning in a single pipeline.
```javascript
// AI-generated LLM Script example (await is auto-injected, don't write it)
const files = search("authentication login")
const chunks = chunk(files)
const analysis = map(chunks, c => LLM("Summarize auth patterns", c))
return analysis.join("\n")
```
**Key features:**
- **Agent integration** - Probe Agent calls `execute_plan` tool to run scripts
- **Auto-await** - Async calls are automatically awaited (don't write `await`)
- **All tools available** - `search()`, `query()`, `extract()`, `LLM()`, `map()`, `chunk()`, plus any MCP tools
- **Sandboxed execution** - Safe, isolated JavaScript environment with timeout protection
See the full [LLM Script Documentation](./docs/llm-script.md) for syntax and examples.
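To show what auto-await saves you from writing, the sketch below spells out the same pipeline as ordinary async JavaScript. The `search`, `chunk`, and `LLM` functions here are local stand-ins so the example runs on its own; their return shapes, and the choice to run the `LLM` calls concurrently, are assumptions rather than documented sandbox behavior.

```javascript
// Stand-ins for the sandbox tools so the sketch runs on its own.
// The real search(), chunk(), and LLM() are provided by Probe; these shapes are assumptions.
async function search(query) {
  return `// results for: ${query}\nfn login() {}\nfn verify_credentials() {}`;
}
function chunk(text, maxChars = 40) {
  const out = [];
  for (let i = 0; i < text.length; i += maxChars) {
    out.push(text.slice(i, i + maxChars));
  }
  return out;
}
async function LLM(instruction, context) {
  return `[${instruction}] ${context.length} chars`;
}

// The same pipeline as the LLM Script above, with the awaits written out.
async function runPlan() {
  const files = await search('authentication login');
  const chunks = chunk(files);
  // Assumption: per-chunk calls may run concurrently.
  const analysis = await Promise.all(
    chunks.map((c) => LLM('Summarize auth patterns', c))
  );
  return analysis.join('\n');
}

runPlan().then((report) => console.log(report));
```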
---
## Installation
### NPM (Recommended)
```bash
npm install -g @probelabs/probe
```
### curl (macOS/Linux)
```bash
curl -fsSL https://raw.githubusercontent.com/probelabs/probe/main/install.sh | bash
```
### PowerShell (Windows)
```powershell
iwr -useb https://raw.githubusercontent.com/probelabs/probe/main/install.ps1 | iex
```
### From Source
```bash
git clone https://github.com/probelabs/probe.git
cd probe
cargo build --release
cargo install --path .
```
---
## Supported Languages
| Language | Extensions |
|----------|------------|
| Rust | `.rs` |
| JavaScript/JSX | `.js`, `.jsx` |
| TypeScript/TSX | `.ts`, `.tsx` |
| Python | `.py` |
| Go | `.go` |
| C/C++ | `.c`, `.h`, `.cpp`, `.cc`, `.hpp` |
| Java | `.java` |
| Ruby | `.rb` |
| PHP | `.php` |
| Swift | `.swift` |
| C# | `.cs` |
| Markdown | `.md` |
---
## Documentation
Full documentation available at [probelabs.com/probe](https://probelabs.com/probe) or browse locally in [`docs/`](./docs/).
### Getting Started
- [Quick Start](./docs/quick-start.md) - Get up and running in 5 minutes
- [Installation](./docs/installation.md) - NPM, curl, Docker, and building from source
- [Features Overview](./docs/features.md) - Core capabilities
### Probe CLI
- [Search Command](./docs/probe-cli/search.md) - Elasticsearch-style semantic search
- [Extract Command](./docs/probe-cli/extract.md) - Extract code blocks with full AST context
- [Symbols Command](./docs/probe-cli/symbols.md) - List all symbols in files with line numbers
- [Query Command](./docs/probe-cli/query.md) - AST-based structural pattern matching
- [CLI Reference](./docs/probe-cli/cli-reference.md) - Complete command-line reference
### LSP & Indexing
- [LSP Features](./docs/lsp-features.md) - What `--lsp` adds for semantic code intelligence
- [LSP Quick Reference](./docs/lsp-quick-reference.md) - Day-to-day LSP command cheatsheet
- [Indexing Overview](./docs/indexing-overview.md) - Project indexing concepts and workflow
- [Indexing CLI Reference](./docs/indexing-cli-reference.md) - `probe lsp index*` command reference
LSP capabilities include call hierarchy enrichment (`extract --lsp`), direct symbol operations (`probe lsp call definition|references|hover|...`), daemon diagnostics (`probe lsp logs --analyze`), and workspace indexing (`probe lsp index`, `probe lsp index-status`).
### Probe Agent
- [Agent Overview](./docs/probe-agent/overview.md) - What is Probe Agent and when to use it
- [API Reference](./docs/probe-agent/sdk/api-reference.md) - ProbeAgent class documentation
- [Node.js SDK](./docs/probe-agent/sdk/nodejs-sdk.md) - Full Node.js SDK reference
- [MCP Integration](./docs/probe-agent/protocols/mcp-integration.md) - Editor integration guide
- [LLM Script](./docs/llm-script.md) - Programmable orchestration DSL
### Guides & Reference
- [Query Patterns](./docs/guides/query-patterns.md) - Effective search strategies
- [How Probe Compares](./docs/reference/comparison.md) - vs embedding search, knowledge graphs, LSP tools
- [Architecture](./docs/reference/architecture.md) - System design and internals
- [Environment Variables](./docs/reference/environment-variables.md) - All configuration options
- [FAQ](./docs/reference/faq.md) - Frequently asked questions
---
## Environment Variables
```bash
# AI Provider Keys
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
# Provider Selection
FORCE_PROVIDER=anthropic
MODEL_NAME=claude-sonnet-4-6
# Custom Endpoints
ANTHROPIC_API_URL=https://your-proxy.com
OPENAI_API_URL=https://your-proxy.com
# Debug
DEBUG=1
```
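A small preflight check can catch a missing key before an agent run. The sketch below assumes a resolution order that is not documented here: `FORCE_PROVIDER` wins when set, otherwise the first provider with an API key is used. `resolveProvider` is a hypothetical helper, and it does not account for the Claude Code or Codex auth path mentioned earlier.

```javascript
// Hypothetical resolution order (an assumption, not documented in this README):
// FORCE_PROVIDER wins if set; otherwise the first provider with an API key.
function resolveProvider(env) {
  const keys = {
    anthropic: 'ANTHROPIC_API_KEY',
    openai: 'OPENAI_API_KEY',
    google: 'GOOGLE_API_KEY',
  };
  const model = env.MODEL_NAME || null;
  const forced = env.FORCE_PROVIDER;
  if (forced) {
    if (!Object.hasOwn(keys, forced)) throw new Error(`Unknown provider: ${forced}`);
    if (!env[keys[forced]]) throw new Error(`${keys[forced]} is not set`);
    return { provider: forced, model };
  }
  for (const [provider, key] of Object.entries(keys)) {
    if (env[key]) return { provider, model };
  }
  return null; // no key found; the caller decides how to report it
}

// Sample environment with placeholder values, not real credentials.
const sample = { OPENAI_API_KEY: 'placeholder', GOOGLE_API_KEY: 'placeholder', FORCE_PROVIDER: 'google' };
console.log(resolveProvider(sample));
```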
---
## Contributing
We welcome contributions! See our [Contributing Guide](https://github.com/probelabs/probe/blob/main/CONTRIBUTING.md).
For questions or support:
- [GitHub Issues](https://github.com/probelabs/probe/issues)
- [Discord Community](https://discord.gg/hBN4UsTZ)
---
## License
Probe is released under the Apache-2.0 license.
---
For questions or contributions, please open an issue on [GitHub](https://github.com/probelabs/probe/issues) or join our [Discord community](https://discord.gg/hBN4UsTZ) for discussions and support. Happy coding—and searching!