# agentic-rag-jdemo

**Repository Path**: createmaker/agentic-rag-jdemo

## Basic Information

- **Project Name**: agentic-rag-jdemo
- **Description**: agentic-rag-jdemo
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-05-03
- **Last Updated**: 2026-05-09

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# agentic-rag-jdemo

Minimal **agentic RAG** in Java — port of [`agentic-rag-demo`](../PythonProjects/agentic-rag-demo) (Python). The agent decides when to retrieve, what to query, when to re-query, and which chunks to read in full — instead of "always retrieve top-k, stuff into prompt".

Drives a local **Hermes Gateway** (or any OpenAI-compatible endpoint: Ollama / LM Studio / vLLM). No cloud key, no model downloads beyond what your gateway already serves.

## What's in the box

- **Retriever** — hand-rolled Okapi BM25 over markdown chunks split at H2 headings. ~120 lines, no embedding library.
- **LLM client** — `java.net.http.HttpClient` direct POST to `{base_url}/chat/completions`. Mirrors `task-manager-cloud2/utils/llm.py`. Works with anything OpenAI-compatible.
- **Agent** — JSON-prompted **ReAct** loop (the model emits `{"action": {...}}` or `{"answer": "..."}` each turn). No native function calling required, so smaller open-source models work.
- **Tools** — `search_docs`, `read_doc`, `calculate`, `current_date`. One `ToolSpec` + one `case` branch to add more.
- **CLI** — `arag` with an ANSI-colored REPL that streams thoughts / tool calls / observations / final answer in real time.
- **Sample corpus** — 5 markdown docs about TaskSaaS. Replace with your own to repurpose.
- **Single runtime dep** — Jackson (JSON). HTTP, BM25, .env, ANSI are all stdlib / hand-rolled.

## Why "agentic"?

A vanilla RAG pipeline pre-fetches top-k chunks for *every* query, then asks the LLM to answer.
It can't:

- skip retrieval when it's not needed (math, dates, general knowledge);
- re-query if the first search missed (synonyms, related concepts);
- decide whether a chunk preview is enough or the full body is needed;
- mix retrieval with non-doc tools (calculator, time, your APIs) in one session.

Here all four are decisions the model makes, exposed as tool calls. The agent loop is just `while obj has "action": run tool; loop`.

## Setup

Requires JDK 17+ and Maven.

```powershell
# Windows
cd D:\javacode\agentic-rag-jdemo
mvn package
copy .env.example .env
```

```bash
# Mac / Linux
cd agentic-rag-jdemo
mvn package
cp .env.example .env
```

`.env` defaults match the TaskSaaS Hermes Gateway:

```ini
LLM_BASE_URL=http://localhost:8642/v1
LLM_MODEL=hermes-agent
LLM_API_KEY=unused        # Hermes doesn't validate the key; any non-empty string works
# LLM_TIMEOUT=180
# LLM_TEMPERATURE=0.2
```

For Ollama: `LLM_BASE_URL=http://localhost:11434/v1`, `LLM_MODEL=llama3.1` (or whatever you've pulled).

## Verify the gateway

```powershell
java -jar target/arag.jar --ping
# [OK] reply: 'pong'
```

If you get `无法连接 LLM 网关 ...` ("can't connect to the LLM gateway"), your Hermes / Ollama isn't running on the configured `LLM_BASE_URL`. If you get `LLM 网关返回 4xx/5xx` ("LLM gateway returned 4xx/5xx"), the model name is probably wrong.

## Usage

```powershell
# Interactive REPL
java -jar target/arag.jar

# One-shot
java -jar target/arag.jar "What's the gotcha with x-api-token CORS?"

# List indexed chunks
java -jar target/arag.jar --list

# Different corpus
java -jar target/arag.jar --corpus path\to\your-docs "your question"
```

Sample questions to try against the bundled TaskSaaS corpus:

```text
java -jar target/arag.jar "How do I authenticate from a mobile client?"
java -jar target/arag.jar "What's the envelope format and why isn't it REST-conformant?"
java -jar target/arag.jar "Why does my Web build give 'XFile.path is empty'?"
java -jar target/arag.jar "What is 7 factorial and what's today's UTC date?"
```

You'll see the agent search, sometimes search again with different keywords, occasionally `read_doc` for full content, and finally answer with `[source#index]` citations.

## Project layout

```
agentic-rag-jdemo/
├── src/main/java/com/agentic/rag/
│   ├── App.java            # CLI entry, args, REPL, ANSI rendering
│   ├── Agent.java          # ReAct loop (~140 lines that matter)
│   ├── LLMClient.java      # java.net.http POST + extract_json fallback
│   ├── Tools.java          # ToolSpec registry + executor
│   ├── BM25Retriever.java  # hand-rolled Okapi BM25 + markdown chunk splitter
│   ├── SafeEval.java       # recursive-descent arithmetic evaluator
│   ├── EnvLoader.java      # tiny .env reader
│   └── Document.java       # chunk record
├── src/test/java/com/agentic/rag/
│   └── SafeEvalTest.java   # 10 JUnit tests
├── corpus/                 # 5 sample markdown docs (TaskSaaS)
├── docs/flow.md            # Mermaid diagrams (architecture / ReAct / sequence)
├── pom.xml
├── .env.example
└── README.md
```

## How the ReAct protocol works

The system prompt tells the model to output exactly one JSON object per turn:

```jsonc
// Continue with another tool call:
{"thought": "I need to look up X", "action": {"tool": "search_docs", "args": {"query": "..."}}}

// Or give the final answer:
{"thought": "got enough", "answer": ""}
```

The host parses the JSON (with a fenced-code-block fallback — see `LLMClient.extractJson`), runs the tool, appends the observation as a `user` message, and loops. When the model returns `answer`, we stop.

Why not OpenAI native `tools=...` function calling? Hermes routes to local models that may not handle native tool calls reliably. Prompted JSON works on any model that can follow instructions, including 7B / 13B open-source models.

## Swap the corpus

Drop your own `*.md` files into `corpus/` (or pass `--corpus`). The retriever splits on `^## ` headings — write docs with H2 sections for finer-grained retrieval, or one section per file for coarser.
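The H2 split is simple enough to sketch in a few lines. Class and method names below are illustrative (the real logic lives in `BM25Retriever.java`): each `## ` line starts a new chunk, and everything before the first H2 becomes its own chunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of splitting a markdown document into chunks at H2 headings. */
public class ChunkSplitter {
    public static List<String> splitAtH2(String markdown) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : markdown.split("\n", -1)) {
            // a "## " line closes the current chunk and opens a new one
            if (line.startsWith("## ") && current.length() > 0) {
                chunks.add(current.toString().strip());
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        if (!current.toString().strip().isEmpty()) {
            chunks.add(current.toString().strip());
        }
        return chunks;
    }
}
```

A doc with an intro and two H2 sections yields three chunks, which is why one-section-per-file gives the coarsest retrieval granularity.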
For non-Latin scripts (Chinese, Japanese), replace the tokenizer regex `[A-Za-z0-9_]+` in `BM25Retriever.java` with a real segmenter. BM25 itself is language-agnostic; the tokenizer isn't.

## Add a new tool

In `Tools.java`:

```java
TOOLS.add(new ToolSpec(
    "lookup_user",
    "Look up a user by username. Returns email / role / last_seen as JSON.",
    "{\"username\": \"string, required\"}"
));

// in execute(...):
case "lookup_user" -> {
    String u = String.valueOf(args.getOrDefault("username", "")).trim();
    yield "{\"email\": \"" + u + "@example.com\", \"role\": \"dev\"}";
}
```

The system prompt auto-includes any new `ToolSpec` via `Tools.descriptionsForPrompt()`.

## Tests

```powershell
mvn test
# 10 tests passed (SafeEval grammar + injection guards)
```

## Knobs

- `LLM_BASE_URL` / `LLM_MODEL` / `LLM_API_KEY` / `LLM_TIMEOUT` / `LLM_TEMPERATURE` env vars
- `--max-iters N` caps tool-use roundtrips per question (default 8)
- `--ping` for a connectivity test before going live

## See also

- Python sibling: [`agentic-rag-demo`](../PythonProjects/agentic-rag-demo) — same architecture, ~half the lines
- Flow diagrams: [`docs/flow.md`](docs/flow.md)
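As a starting point for the tokenizer swap described under "Swap the corpus", here's a sketch using the JDK's `java.text.BreakIterator` instead of the `[A-Za-z0-9_]+` regex. The class name is illustrative, not part of the repo; note that many JDKs fall back to per-character tokens for Chinese, which BM25 can still score, but a dictionary-based segmenter will do better.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: Unicode-aware tokenization via locale-sensitive word boundaries. */
public class UnicodeTokenizer {
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String word = text.substring(start, end).strip().toLowerCase(Locale.ROOT);
            // keep only tokens containing at least one letter or digit
            // (drops the punctuation and whitespace "words" BreakIterator also reports)
            if (word.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(word);
            }
        }
        return tokens;
    }
}
```

Wiring this into `BM25Retriever.java` only changes how terms are produced; the BM25 scoring math stays untouched.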