# agentic-rag-jdemo

**Repository Path**: createmaker/agentic-rag-jdemo

## Basic Information

- **Project Name**: agentic-rag-jdemo
- **Description**: agentic-rag-jdemo
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-05-03
- **Last Updated**: 2026-05-09

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# agentic-rag-jdemo

Minimal **agentic RAG** in Java — port of [`agentic-rag-demo`](../PythonProjects/agentic-rag-demo) (Python). The agent decides when to retrieve, what to query, when to re-query, and which chunks to read in full — instead of "always retrieve top-k, stuff into prompt".

Drives a local **Hermes Gateway** (or any OpenAI-compatible endpoint: Ollama / LM Studio / vLLM). No cloud key, no model downloads beyond what your gateway already serves.

## What's in the box

- **Retriever** — hand-rolled Okapi BM25 over markdown chunks split at H2 headings. ~120 lines, no embedding library.
- **LLM client** — `java.net.http.HttpClient` direct POST to `{base_url}/chat/completions`. Mirrors `task-manager-cloud2/utils/llm.py`. Works with anything OpenAI-compatible.
- **Agent** — JSON-prompted **ReAct** loop (the model emits `{"action": {...}}` or `{"answer": "..."}` each turn). No native function calling required, so smaller open-source models work.
- **Tools** — `search_docs`, `read_doc`, `calculate`, `current_date`. One `ToolSpec` + one `case` branch to add more.
- **CLI** — `arag` with an ANSI-colored REPL that streams thoughts / tool calls / observations / final answer in real time.
- **Sample corpus** — 5 markdown docs about TaskSaaS. Replace with your own to repurpose.
- **Single runtime dep** — Jackson (JSON). HTTP, BM25, .env, ANSI are all stdlib / hand-rolled.

## Why "agentic"?

A vanilla RAG pipeline pre-fetches top-k chunks for *every* query, then asks the LLM to answer.
It can't:

- skip retrieval when it's not needed (math, dates, general knowledge);
- re-query if the first search missed (synonyms, related concepts);
- decide whether a chunk preview is enough or the full body is needed;
- mix retrieval with non-doc tools (calculator, time, your APIs) in one session.

Here all four are decisions the model makes, exposed as tool calls. The agent loop is just `while obj has "action": run tool; loop`.

## Setup

Requires JDK 17+ and Maven.

```powershell
# Windows
cd D:\javacode\agentic-rag-jdemo
mvn package
copy .env.example .env
```

```bash
# Mac / Linux
cd agentic-rag-jdemo
mvn package
cp .env.example .env
```

`.env` defaults match the TaskSaaS Hermes Gateway:

```ini
LLM_BASE_URL=http://localhost:8642/v1
LLM_MODEL=hermes-agent
LLM_API_KEY=unused        # Hermes doesn't validate the key; any non-empty string works
# LLM_TIMEOUT=180
# LLM_TEMPERATURE=0.2
```

For Ollama: `LLM_BASE_URL=http://localhost:11434/v1`, `LLM_MODEL=llama3.1` (or whatever you've pulled).

## Verify the gateway

```powershell
java -jar target/arag.jar --ping
# [OK] reply: 'pong'
```

If you get `无法连接 LLM 网关 ...` ("can't connect to the LLM gateway"), your Hermes / Ollama isn't running on the configured `LLM_BASE_URL`. If you get `LLM 网关返回 4xx/5xx` ("LLM gateway returned 4xx/5xx"), the model name is probably wrong.

## Usage

```powershell
# Interactive REPL
java -jar target/arag.jar

# One-shot
java -jar target/arag.jar "What's the gotcha with x-api-token CORS?"

# List indexed chunks
java -jar target/arag.jar --list

# Different corpus
java -jar target/arag.jar --corpus path\to\your-docs "your question"
```

Sample questions to try against the bundled TaskSaaS corpus:

```text
java -jar target/arag.jar "How do I authenticate from a mobile client?"
java -jar target/arag.jar "What's the envelope format and why isn't it REST-conformant?"
java -jar target/arag.jar "Why does my Web build give 'XFile.path is empty'?"
java -jar target/arag.jar "What is 7 factorial and what's today's UTC date?"
```

You'll see the agent search, sometimes search again with different keywords, occasionally `read_doc` for full content, and finally answer with `[source#index]` citations.

## Project layout

```
agentic-rag-jdemo/
├── src/main/java/com/agentic/rag/
│   ├── App.java            # CLI entry, args, REPL, ANSI rendering
│   ├── Agent.java          # ReAct loop (~140 lines that matter)
│   ├── LLMClient.java      # java.net.http POST + extract_json fallback
│   ├── Tools.java          # ToolSpec registry + executor
│   ├── BM25Retriever.java  # hand-rolled Okapi BM25 + markdown chunk splitter
│   ├── SafeEval.java       # recursive-descent arithmetic evaluator
│   ├── EnvLoader.java      # tiny .env reader
│   └── Document.java       # chunk record
├── src/test/java/com/agentic/rag/
│   └── SafeEvalTest.java   # 10 JUnit tests
├── corpus/                 # 5 sample markdown docs (TaskSaaS)
├── docs/flow.md            # Mermaid diagrams (architecture / ReAct / sequence)
├── pom.xml
├── .env.example
└── README.md
```

## How the ReAct protocol works

The system prompt tells the model to output exactly one JSON object per turn:

```jsonc
// Continue with another tool call:
{"thought": "I need to look up X", "action": {"tool": "search_docs", "args": {"query": "..."}}}

// Or give the final answer:
{"thought": "got enough", "answer": ""}
```

The host parses the JSON (with a fenced-code-block fallback — see `LLMClient.extractJson`), runs the tool, appends the observation as a `user` message, and loops. When the model returns `answer`, we stop.

Why not OpenAI native `tools=...` function calling? Hermes routes to local models that may not handle native tool calls reliably. Prompted JSON works on any model that can follow instructions, including 7B / 13B open-source models.

## Swap the corpus

Drop your own `*.md` files into `corpus/` (or pass `--corpus`). The retriever splits on `^## ` headings — write docs with H2 sections for finer-grained retrieval, or one section per file for coarser.
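The H2 split is simple enough to sketch in a few lines. Class and method names below are illustrative (the real logic lives in `BM25Retriever.java`): each `## ` line starts a new chunk, and everything before the first H2 becomes its own chunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of splitting a markdown document into chunks at H2 headings. */
public class ChunkSplitter {
    public static List<String> splitAtH2(String markdown) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : markdown.split("\n", -1)) {
            // a "## " line closes the current chunk and opens a new one
            if (line.startsWith("## ") && current.length() > 0) {
                chunks.add(current.toString().strip());
                current.setLength(0);
            }
            current.append(line).append('\n');
        }
        if (!current.toString().strip().isEmpty()) {
            chunks.add(current.toString().strip());
        }
        return chunks;
    }
}
```

A doc with an intro and two H2 sections yields three chunks, which is why one-section-per-file gives the coarsest retrieval granularity.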
For non-Latin scripts (Chinese, Japanese), replace the tokenizer regex `[A-Za-z0-9_]+` in `BM25Retriever.java` with a real segmenter. BM25 itself is language-agnostic; the tokenizer isn't.

## Add a new tool

In `Tools.java`:

```java
TOOLS.add(new ToolSpec(
    "lookup_user",
    "Look up a user by username. Returns email / role / last_seen as JSON.",
    "{\"username\": \"string, required\"}"
));

// in execute(...):
case "lookup_user" -> {
    String u = String.valueOf(args.getOrDefault("username", "")).trim();
    yield "{\"email\": \"" + u + "@example.com\", \"role\": \"dev\"}";
}
```

The system prompt auto-includes any new `ToolSpec` via `Tools.descriptionsForPrompt()`.

## Tests

```powershell
mvn test
# 10 tests passed (SafeEval grammar + injection guards)
```

## Knobs

- `LLM_BASE_URL` / `LLM_MODEL` / `LLM_API_KEY` / `LLM_TIMEOUT` / `LLM_TEMPERATURE` env vars
- `--max-iters N` caps tool-use roundtrips per question (default 8)
- `--ping` for a connectivity test before going live

## See also

- Python sibling: [`agentic-rag-demo`](../PythonProjects/agentic-rag-demo) — same architecture, ~half the lines
- Flow diagrams: [`docs/flow.md`](docs/flow.md)
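As a starting point for the tokenizer swap described under "Swap the corpus", here's a sketch using the JDK's `java.text.BreakIterator` instead of the `[A-Za-z0-9_]+` regex. The class name is illustrative, not part of the repo; note that many JDKs fall back to per-character tokens for Chinese, which BM25 can still score, but a dictionary-based segmenter will do better.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch: Unicode-aware tokenization via locale-sensitive word boundaries. */
public class UnicodeTokenizer {
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ROOT);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String word = text.substring(start, end).strip().toLowerCase(Locale.ROOT);
            // keep only tokens containing at least one letter or digit
            // (drops the punctuation and whitespace "words" BreakIterator also reports)
            if (word.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(word);
            }
        }
        return tokens;
    }
}
```

Wiring this into `BM25Retriever.java` only changes how terms are produced; the BM25 scoring math stays untouched.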