# AutoResearchClaw

**Repository Path**: Analyst2020/AutoResearchClaw

## Basic Information

- **Project Name**: AutoResearchClaw
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-14
- **Last Updated**: 2026-04-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README


Chat an Idea. Get a Paper. Autonomous, Collaborative & Self-Evolving.

Just chat with OpenClaw: "Research X" → done.


MIT License · Python 3.11+ · 2,699 Tests Passed · GitHub · OpenClaw Compatible · Discord · Ethics Guidelines

🇨🇳 中文 · 🇯🇵 日本語 · 🇰🇷 한국어 · 🇫🇷 Français · 🇩🇪 Deutsch · 🇪🇸 Español · 🇧🇷 Português · 🇷🇺 Русский · 🇸🇦 العربية

🏆 Paper Showcase · 🧑‍✈️ Co-Pilot Guide · 📖 Integration Guide · 💬 Discord Community

---
## 🏆 Generated Paper Showcase

8 papers across 8 domains (math, statistics, biology, computing, NLP, RL, vision, robustness), generated fully autonomously or with Human-in-the-Loop co-pilot guidance.

View Showcase
---

> **🧪 We're looking for testers!** Try the pipeline with your own research idea, from any field, and [tell us what you think](docs/TESTER_GUIDE.md). Your feedback directly shapes the next version.
>
> **[→ Testing Guide](docs/TESTER_GUIDE.md)** | **[→ 中文测试指南](docs/TESTER_GUIDE_CN.md)** | **[→ 日本語テストガイド](docs/TESTER_GUIDE_JA.md)**

---

## 🔥 News

- **[04/08/2026]** **Ethics and Responsible Use Guidelines!** We've added comprehensive [ethics guidelines](#%EF%B8%8F-ethics-and-responsible-use) covering academic integrity, transparency, citation verification, misuse prevention, and dual-use considerations. AI-generated papers are drafts, not finished work, and human review is essential. Please read the guidelines before using AutoResearchClaw for any submission.
- **[04/01/2026]** **v0.4.0: Human-in-the-Loop Co-Pilot System.** AutoResearchClaw is no longer purely autonomous. The new HITL system adds 6 intervention modes (`full-auto`, `gate-only`, `checkpoint`, `step-by-step`, `co-pilot`, `custom`), per-stage policies, and deep human-AI collaboration. It includes:
  - the Idea Workshop for hypothesis co-creation
  - the Baseline Navigator for experiment-design review
  - the Paper Co-Writer for collaborative drafting
  - SmartPause (confidence-driven dynamic intervention)
  - ALHF intervention learning
  - anti-hallucination claim verification
  - cost-budget guardrails
  - pipeline branching for parallel hypothesis exploration
  - CLI commands (`attach`/`status`/`approve`/`reject`/`guide`)
  
  **[→ Full HITL Guide](docs/HITL_GUIDE.md)**
- **[03/30/2026]** **Flexible Skill Loading.** AutoResearchClaw now supports loading open-source and custom skills from any discipline. 19 pre-loaded skills are included as ready-to-use references. They cover scientific writing, experiment design, chemistry, biology, and more, and include an [A-Evolve](https://github.com/A-EVO-Lab/a-evolve) agentic evolution skill contributed by the community. Load your own via `researchclaw skills install` or drop a `SKILL.md` into `.claude/skills/`. See [Skills Library](#-skills-library).
- **[03/22/2026]** [v0.3.2](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.2) **Cross-Platform Support + Major Stability.** AutoResearchClaw now runs on any ACP-compatible agent backend (Claude Code, Codex CLI, Copilot CLI, Gemini CLI, Kimi CLI). It also supports messaging platforms (Discord, Telegram, Lark, WeChat) via the OpenClaw bridge. A new CLI-agent code-generation backend delegates Stages 10 and 13 to external CLI agents, with budget control and timeout management. This release also includes:
  - an anti-fabrication system (VerifiedRegistry plus an experiment diagnosis-and-repair loop)
  - 100+ bug fixes
  - a modular executor refactor
  - `--resume` auto-detection
  - LLM retry hardening
  - community-reported fixes
**Earlier releases**

- **[03/18/2026]** [v0.3.1](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.1) **OpenCode Beast Mode + Community Contributions.** A new "Beast Mode" routes complex code generation to [OpenCode](https://github.com/anomalyco/opencode), with automatic complexity scoring and graceful fallback. This release also adds Novita AI provider support, thread-safety hardening, and more robust LLM output parsing. It includes 20+ bug fixes from community PRs and an internal audit.
- **[03/17/2026]** [v0.3.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.3.0) **MetaClaw Integration.** AutoResearchClaw now supports [MetaClaw](https://github.com/aiming-lab/MetaClaw) cross-run learning. Pipeline failures become structured lessons, which become reusable skills injected into all 23 stages. Controlled experiments showed **+18.3%** robustness. The integration is opt-in (`metaclaw_bridge.enabled: true`) and fully backward-compatible. See [Integration Guide](#-metaclaw-integration).
- **[03/16/2026]** [v0.2.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.2.0) This release adds:
  - three multi-agent subsystems (CodeAgent, BenchmarkAgent, FigureAgent)
  - a hardened Docker sandbox with network-policy-aware execution
  - a 4-round paper quality audit (AI-slop detection, 7-dimension review scoring, NeurIPS checklist)
  - 15+ bug fixes from production runs
- **[03/15/2026]** [v0.1.0](https://github.com/aiming-lab/AutoResearchClaw/releases/tag/v0.1.0) Initial release of AutoResearchClaw: a fully autonomous 23-stage research pipeline that turns a single research idea into a conference-ready paper. No human intervention is required.
---

## ⚡ One Command. One Paper.

```bash
# Fully autonomous: no human intervention
pip install -e . && researchclaw setup && researchclaw init && researchclaw run --topic "Your research idea here" --auto-approve

# Co-Pilot mode: collaborate with the AI at key decision points
researchclaw run --topic "Your research idea here" --mode co-pilot
```

---

## 🤔 What Is This?

**You think it. AutoResearchClaw writes it. You guide the key decisions.**

Drop in a research topic and get back a full academic paper, including:

- real literature from OpenAlex, Semantic Scholar and arXiv
- hardware-aware sandbox experiments (GPU/MPS/CPU auto-detected)
- statistical analysis
- multi-agent peer review
- conference-ready LaTeX targeting NeurIPS/ICML/ICLR

Run it fully autonomously, or use **Co-Pilot mode** to guide the AI at critical decision points: choose research directions, review experiment designs, and co-write the paper. Every reference is checked against real literature sources to weed out hallucinated citations.
| | Output | What you get |
|---|---|---|
| 📄 | `paper_draft.md` | Full academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion) |
| 📐 | `paper.tex` | Conference-ready LaTeX (NeurIPS / ICLR / ICML templates) |
| 📚 | `references.bib` | Real BibTeX references from OpenAlex, Semantic Scholar and arXiv, auto-pruned to match inline citations |
| 🔍 | `verification_report.json` | 4-layer citation integrity + relevance verification (arXiv, CrossRef, DataCite, LLM) |
| 🧪 | `experiment runs/` | Generated code + sandbox results + structured JSON metrics |
| 📊 | `charts/` | Auto-generated condition-comparison charts with error bars and confidence intervals |
| 📝 | `reviews.md` | Multi-agent peer review with methodology-evidence consistency checks |
| 🧬 | `evolution/` | Self-learning lessons extracted from each run |
| 📦 | `deliverables/` | All final outputs in one folder, compile-ready for Overleaf |
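The first layer of the `verification_report.json` check is an arXiv ID lookup. The sketch below illustrates only the format-level part of that layer, using a hypothetical helper `looks_like_arxiv_id`.

- It accepts modern identifiers: `YYMM.NNNN` from April 2007 through 2014, and `YYMM.NNNNN` from 2015 onward, each with an optional `vN` suffix.
- It ignores old-style IDs such as `hep-th/9901001`.
- It never contacts arXiv, so it is not the project's actual verifier.

```python
import re

# Modern arXiv identifiers: YYMM.NNNN (Apr 2007 - Dec 2014) or YYMM.NNNNN (2015 onward),
# optionally followed by a version suffix such as "v2".
ARXIV_NEW_STYLE = re.compile(r"^(\d{2})(\d{2})\.(\d{4,5})(v\d+)?$")


def looks_like_arxiv_id(arxiv_id: str) -> bool:
    """Format check only: does the string look like a well-formed modern arXiv ID?"""
    match = ARXIV_NEW_STYLE.match(arxiv_id.strip())
    if match is None:
        return False
    year, month, seq = int(match.group(1)), int(match.group(2)), match.group(3)
    if not 1 <= month <= 12:
        return False
    if (year, month) < (7, 4):  # the new-style scheme began in April 2007
        return False
    # Four-digit sequence numbers were used through 2014, five digits from 2015.
    return len(seq) == (5 if year >= 15 else 4)


refs = ["1706.03762", "2305.18290v2", "1799.00001", "not-an-id"]
print([r for r in refs if not looks_like_arxiv_id(r)])  # prints ['1799.00001', 'not-an-id']
```

A format check like this is cheap and catches obviously malformed IDs before any network lookup is attempted.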
The pipeline runs **end-to-end**, either fully autonomously or with human-in-the-loop collaboration. When experiments fail, it self-heals. When hypotheses don't hold, it pivots. When citations are fake, it kills them. When you want to steer, it pauses and listens.

🌍 **Run it anywhere.** AutoResearchClaw isn't locked to a single platform. Use it standalone via the CLI, plug it into [OpenClaw](https://github.com/openclaw/openclaw), or wire it up through any ACP-compatible agent: 🤖 Claude Code, 💻 Codex CLI, 🐙 Copilot CLI, ♊ Gemini CLI, 🌙 Kimi CLI, you name it. And because OpenClaw bridges to messaging platforms, you can kick off a full research run from 💬 Discord, ✈️ Telegram, 🐦 Lark (飞书), 💚 WeChat, or wherever your team already hangs out. One topic in, one paper out, no matter where you type it.

---

## 🚀 Quick Start

```bash
# 1. Clone & install
git clone https://github.com/aiming-lab/AutoResearchClaw.git
cd AutoResearchClaw
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# 2. Setup (interactive: installs OpenCode beast mode, checks Docker/LaTeX)
researchclaw setup

# 3. Configure
researchclaw init   # Interactive: choose LLM provider, creates config.arc.yaml
# Or manually: cp config.researchclaw.example.yaml config.arc.yaml

# 4. Run
export OPENAI_API_KEY="sk-..."
researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
```

Output → `artifacts/rc-YYYYMMDD-HHMMSS-/deliverables/`: compile-ready LaTeX, BibTeX, experiment code, and charts.
**📝 Minimum required config**

```yaml
project:
  name: "my-research"

research:
  topic: "Your research topic here"

llm:
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]

experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"
```
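In the minimum config the key itself never appears: `api_key_env` names the environment variable that holds it. The full configuration reference also lists a hardcoded `api_key` field.

The sketch below shows one way the two could be combined. It assumes, without confirmation from this README, that a non-empty `api_key` takes precedence over the environment variable. `resolve_api_key` is a hypothetical name, not part of the `researchclaw` package.

```python
import os
from typing import Any, Mapping, Optional


def resolve_api_key(llm_cfg: Mapping[str, Any], environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Hypothetical helper: return the API key described by an `llm:` config section.

    Assumed precedence: a non-empty hardcoded `api_key` wins; otherwise the key is
    read from the environment variable named by `api_key_env`. Returns None if neither is set.
    """
    hardcoded = llm_cfg.get("api_key") or ""
    if hardcoded:
        return hardcoded
    env_name = llm_cfg.get("api_key_env") or ""
    return environ.get(env_name) if env_name else None


llm = {
    "base_url": "https://api.openai.com/v1",
    "api_key_env": "OPENAI_API_KEY",
    "primary_model": "gpt-4o",
    "fallback_models": ["gpt-4o-mini"],
}
print(resolve_api_key(llm, environ={"OPENAI_API_KEY": "sk-test"}))  # prints sk-test
```

Passing `environ` explicitly keeps the helper testable without touching the real process environment.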
---

## 🧠 What Makes It Different

| Capability | How It Works |
|-----------|-------------|
| **🧑‍✈️ Co-Pilot Mode** | 6 intervention modes, from fully autonomous to step-by-step. Guide the AI at critical decisions (hypotheses, baselines, paper writing) or let it run free. SmartPause auto-detects when human input would help. |
| **🔄 PIVOT / REFINE Loop** | Stage 15 autonomously decides: PROCEED, REFINE (tweak params), or PIVOT (new direction). Artifacts are auto-versioned. |
| **🤖 Multi-Agent Debate** | Hypothesis generation, result analysis, and peer review each use structured multi-perspective debate. |
| **🧬 Self-Learning** | Lessons are extracted per run (decision rationale, runtime warnings, metric anomalies) with a 30-day time-decay. Future runs learn from past mistakes. |
| **📚 Knowledge Base** | Every run builds a structured KB across 6 categories (decisions, experiments, findings, literature, questions, reviews). |
| **🛡️ Sentinel Watchdog** | Background quality monitor: NaN/Inf detection, paper-evidence consistency, citation relevance scoring, anti-fabrication guard. |
| **🔍 Claim Verification** | Inline fact-checking: extracts claims from AI-generated text and cross-references them against collected literature. Flags ungrounded citations and fabricated numbers. |
| **🌿 Branch Exploration** | Fork the pipeline to explore multiple research directions simultaneously, compare results side by side, and merge the best path forward. |

---

## 🦞 OpenClaw Integration

**AutoResearchClaw is an [OpenClaw](https://github.com/openclaw/openclaw)-compatible service.** Install it in OpenClaw and launch autonomous research with a single message. Alternatively, use it standalone via the CLI, Claude Code, or any AI coding assistant.
### 🚀 Use with OpenClaw (Recommended)

If you already use [OpenClaw](https://github.com/openclaw/openclaw) as your AI assistant:

```
1️⃣ Share the GitHub repo URL with OpenClaw
2️⃣ OpenClaw auto-reads RESEARCHCLAW_AGENTS.md → understands the pipeline
3️⃣ Say: "Research [your topic]"
4️⃣ Done: OpenClaw clones, installs, configures, runs, and returns results
```

**That's it.** OpenClaw handles `git clone`, `pip install`, config setup, and pipeline execution automatically. You just chat.
**💡 What happens under the hood**

1. OpenClaw reads `RESEARCHCLAW_AGENTS.md` → learns the research orchestrator role
2. OpenClaw reads `README.md` → understands installation and pipeline structure
3. OpenClaw copies `config.researchclaw.example.yaml` → `config.yaml`
4. Asks for your LLM API key (or uses your environment variable)
5. Runs `pip install -e .` + `researchclaw run --topic "..." --auto-approve`
6. Returns the paper, LaTeX, experiments, and citations
### 🔌 OpenClaw Bridge (Advanced)

For deeper integration, AutoResearchClaw includes a **bridge adapter system** with 6 optional capabilities:

```yaml
# config.arc.yaml
openclaw_bridge:
  use_cron: true              # ⏰ Scheduled research runs
  use_message: true           # 💬 Progress notifications (Discord/Slack/Telegram)
  use_memory: true            # 🧠 Cross-session knowledge persistence
  use_sessions_spawn: true    # 🔀 Spawn parallel sub-sessions for concurrent stages
  use_web_fetch: true         # 🌐 Live web search during literature review
  use_browser: false          # 🖥️ Browser-based paper collection
```

Each flag activates a typed adapter protocol. When OpenClaw provides these capabilities, the adapters consume them without code changes. See [`docs/integration-guide.md`](docs/integration-guide.md) for full details.

### ACP (Agent Client Protocol)

AutoResearchClaw can use **any ACP-compatible coding agent** as its LLM backend, with no API keys required. The agent communicates via [acpx](https://github.com/openclaw/acpx) and maintains a single persistent session across all 23 pipeline stages.

| Agent | Command | Notes |
|-------|---------|-------|
| Claude Code | `claude` | Anthropic |
| Codex CLI | `codex` | OpenAI |
| Copilot CLI | `gh` | GitHub |
| Gemini CLI | `gemini` | Google |
| OpenCode | `opencode` | SST |
| Kimi CLI | `kimi` | Moonshot |

```yaml
# config.yaml: ACP example
llm:
  provider: "acp"
  acp:
    agent: "claude"   # Any ACP-compatible agent CLI command
    cwd: "."          # Working directory for the agent
  # No base_url or api_key needed: the agent handles its own auth.
```

```bash
# Just run: the agent uses its own credentials
researchclaw run --config config.yaml --topic "Your research idea" --auto-approve
```

### 🛠️ Other Ways to Run

| Method | How |
|--------|-----|
| **Standalone CLI** | `researchclaw run --topic "..." --auto-approve` (autonomous) or `--mode co-pilot` (collaborative) |
| **Python API** | `from researchclaw.pipeline import Runner; Runner(config).run()` |
| **Claude Code** | Reads `RESEARCHCLAW_CLAUDE.md`; just say *"Run research on [topic]"* |
| **Copilot CLI** | `researchclaw run --topic "..."` with `llm.acp.agent: "gh"` |
| **OpenCode** | Reads `.claude/skills/`; same natural-language interface |
| **Any AI CLI** | Provide `RESEARCHCLAW_AGENTS.md` as context → agent auto-bootstraps |

---

## 🔬 Pipeline: 23 Stages, 8 Phases

```
Phase A: Research Scoping
   1. TOPIC_INIT
   2. PROBLEM_DECOMPOSE

Phase B: Literature Discovery
   3. SEARCH_STRATEGY
   4. LITERATURE_COLLECT      ← real API
   5. LITERATURE_SCREEN       [gate]
   6. KNOWLEDGE_EXTRACT

Phase C: Knowledge Synthesis
   7. SYNTHESIS
   8. HYPOTHESIS_GEN          ← debate

Phase D: Experiment Design
   9. EXPERIMENT_DESIGN       [gate]
  10. CODE_GENERATION
  11. RESOURCE_PLANNING

Phase E: Experiment Execution
  12. EXPERIMENT_RUN
  13. ITERATIVE_REFINE        ← self-healing

Phase F: Analysis & Decision
  14. RESULT_ANALYSIS         ← multi-agent
  15. RESEARCH_DECISION       ← PIVOT/REFINE

Phase G: Paper Writing
  16. PAPER_OUTLINE
  17. PAPER_DRAFT
  18. PEER_REVIEW             ← evidence check
  19. PAPER_REVISION

Phase H: Finalization
  20. QUALITY_GATE            [gate]
  21. KNOWLEDGE_ARCHIVE
  22. EXPORT_PUBLISH          ← LaTeX
  23. CITATION_VERIFY         ← relevance check
```

> **Gate stages** (5, 9, 20) pause for human approval, or auto-approve with `--auto-approve`. On rejection, the pipeline rolls back.

> **Co-Pilot mode** (`--mode co-pilot`): deep human-AI collaboration at Stages 7-8 (Idea Workshop), Stage 9 (Baseline Navigator), and Stages 16-17 (Paper Co-Writer). Other stages auto-execute under SmartPause monitoring.

> **Decision loops**: Stage 15 can trigger REFINE (→ Stage 13) or PIVOT (→ Stage 8), with automatic artifact versioning.
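The control flow described in the pipeline notes can be summarized as a small state-transition function:

- Gates sit at Stages 5, 9 and 20.
- Stage 15 can proceed to Stage 16, loop back to Stage 13 (REFINE), or loop back to Stage 8 (PIVOT).

This is a hypothetical sketch, not the project's runner. The names `next_stage`, `pauses_for_approval` and `Decision` are invented, and the sketch leaves out rollback on rejection and artifact versioning.

```python
from enum import Enum
from typing import Optional

GATE_STAGES = {5, 9, 20}   # LITERATURE_SCREEN, EXPERIMENT_DESIGN, QUALITY_GATE
DECISION_STAGE = 15        # RESEARCH_DECISION
LAST_STAGE = 23            # CITATION_VERIFY


class Decision(Enum):
    PROCEED = "proceed"
    REFINE = "refine"
    PIVOT = "pivot"


# Where each Stage 15 decision sends the pipeline, per the decision-loop note.
DECISION_TARGETS = {Decision.PROCEED: 16, Decision.REFINE: 13, Decision.PIVOT: 8}


def pauses_for_approval(stage: int, auto_approve: bool) -> bool:
    """Gate stages wait for a human unless --auto-approve was given."""
    return stage in GATE_STAGES and not auto_approve


def next_stage(stage: int, decision: Optional[Decision] = None) -> Optional[int]:
    """Return the stage to run after `stage`, or None once the pipeline is finished."""
    if stage == DECISION_STAGE:
        if decision is None:
            raise ValueError("Stage 15 requires a PROCEED/REFINE/PIVOT decision")
        return DECISION_TARGETS[decision]
    return stage + 1 if stage < LAST_STAGE else None


print(next_stage(15, Decision.PIVOT), pauses_for_approval(9, auto_approve=False))  # prints 8 True
```

Keeping the loop targets in one table makes the two backward edges (15 → 13 and 15 → 8) explicit. This is the part of the flow that turns the otherwise linear 23-stage list into a graph.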
**📋 What Each Phase Does**

| Phase | What Happens |
|-------|-------------|
| **A: Scoping** | The LLM decomposes the topic into a structured problem tree with research questions |
| **A+: Hardware** | Auto-detects GPU (NVIDIA CUDA / Apple MPS / CPU-only), warns if local hardware is limited, and adapts code generation accordingly |
| **B: Literature** | Multi-source search (OpenAlex → Semantic Scholar → arXiv) for real papers; screens by relevance and extracts knowledge cards |
| **C: Synthesis** | Clusters findings, identifies research gaps, and generates testable hypotheses via multi-agent debate |
| **D: Design** | Designs the experiment plan, generates hardware-aware runnable Python (GPU tier → package selection), and estimates resource needs |
| **E: Execution** | Runs experiments in a sandbox, detects NaN/Inf and runtime bugs, and self-heals code via targeted LLM repair |
| **F: Analysis** | Multi-agent analysis of results; autonomous PROCEED / REFINE / PIVOT decision with rationale |
| **G: Writing** | Outline → section-by-section drafting (5,000-6,500 words) → peer review (with methodology-evidence consistency checks) → revision with a length guard |
| **H: Finalization** | Quality gate, knowledge archival, LaTeX export with a conference template, citation integrity + relevance verification |
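The A+ hardware step distinguishes NVIDIA CUDA, Apple MPS and CPU-only machines. Below is a minimal sketch of that kind of probe. It assumes PyTorch is the framework being checked; the project may detect hardware differently. `detect_device` is a hypothetical name, and the sketch falls back to `"cpu"` when PyTorch is not installed.

```python
def detect_device() -> str:
    """Return "cuda", "mps", or "cpu", falling back to "cpu" when PyTorch is not installed."""
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():  # an NVIDIA GPU is visible to PyTorch
        return "cuda"
    # torch.backends.mps was added in PyTorch 1.12; guard for older builds.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():  # Apple silicon GPU
        return "mps"
    return "cpu"


device = detect_device()
print(f"Running experiments on: {device}")
```

Returning a short tier string lets later decisions, such as which packages to import or how large an experiment to generate, be written as simple lookups on that value.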
---

## ✨ Key Features

| Feature | Description |
|---------|------------|
| **📚 Multi-Source Literature** | Real papers from OpenAlex, Semantic Scholar and arXiv, with query expansion, deduplication, and a circuit breaker with graceful degradation |
| **🔍 4-Layer Citation Verification** | arXiv ID check → CrossRef/DataCite DOI → Semantic Scholar title match → LLM relevance scoring. Hallucinated refs are auto-removed. |
| **🖥️ Hardware-Aware Execution** | Auto-detects GPU (NVIDIA CUDA / Apple MPS / CPU-only) and adapts code generation, imports, and experiment scale accordingly |
| **🦾 OpenCode Beast Mode** | Complex experiments are auto-routed to [OpenCode](https://github.com/anomalyco/opencode), which generates multi-file projects with custom architectures, training loops, and ablation studies. Install via `researchclaw setup`. |
| **🧪 Sandbox Experiments** | AST-validated code, immutable harness, NaN/Inf fast-fail, self-healing repair, iterative refinement (up to 10 rounds), partial result capture |
| **📝 Conference-Grade Writing** | NeurIPS/ICML/ICLR templates, section-by-section drafting (5,000-6,500 words), anti-fabrication guard, revision length guard, anti-disclaimer enforcement |
| **📐 Template Switching** | `neurips_2025`, `iclr_2026`, `icml_2026`; Markdown → LaTeX with math, tables, figures, cross-refs, `\cite{}` |
| **🛡️ Anti-Fabrication** | VerifiedRegistry enforces ground-truth experiment data in papers. Failed experiments are auto-diagnosed and repaired before writing. Unverified numbers are sanitized. |
| **🚦 Quality Gates** | 3 human-in-the-loop gates (Stages 5, 9, 20) with rollback. Skip them with `--auto-approve`. |
| **🧑‍✈️ HITL Co-Pilot** | 6 intervention modes with per-stage policies. Idea Workshop, Baseline Navigator, and Paper Co-Writer for deep collaboration. SmartPause, cost guardrails, escalation policies, and intervention learning for production safety. CLI/WebSocket/MCP adapters. |
| **💰 Cost Guardrails** | Budget monitoring with configurable threshold alerts (50%/80%/100%). The pipeline auto-pauses when cost exceeds the budget. |
| **🔁 Reproducibility** | SHA256 checksums for all stage artifacts. Immutable manifests for verification. Multi-level undo with versioned snapshots. |

---

## 🧑‍✈️ Human-in-the-Loop Co-Pilot

**AutoResearchClaw v0.4.0 introduces a complete Human-in-the-Loop (HITL) system.** It turns the pipeline from a purely autonomous one into a human-AI collaborative research engine. Choose your level of involvement:

### Intervention Modes

| Mode | Command | What It Does |
|------|---------|-------------|
| **Full Auto** | `--auto-approve` | Original behavior: no human intervention |
| **Gate Only** | `--mode gate-only` | Pause at the 3 gate stages (5, 9, 20) for approval |
| **Checkpoint** | `--mode checkpoint` | Pause at each phase boundary (8 checkpoints) |
| **Co-Pilot** | `--mode co-pilot` | Deep collaboration at critical stages, auto elsewhere |
| **Step-by-Step** | `--mode step-by-step` | Pause after every stage; useful for learning the pipeline |
| **Express** | `--mode express` | Quick review: only the 3 most critical gates |
| **Custom** | `--mode custom` | Define per-stage policies via `stage_policies` config |

### Co-Pilot Workflow

```
You: researchclaw run --topic "Quantum noise as neural network regularization" --mode co-pilot

Pipeline runs Stages 1-7 automatically...
┌────────────────────────────────────────────────────────┐
│  HITL | Stage 08: HYPOTHESIS_GEN                       │
│  Post-stage review                                     │
│                                                        │
│  Hypotheses mentioned: 3                               │
│  Novelty score: 0.72 (moderate)                        │
│                                                        │
│  [a] Approve  [r] Reject  [e] Edit  [c] Collaborate    │
│  [i] Inject guidance  [v] View output  [q] Abort       │
└────────────────────────────────────────────────────────┘

You: c   (start collaborative chat)
You: Hypothesis 3 is interesting but needs Dropout/Label Smoothing as baselines
AI:  Updated: added Dropout, Label Smoothing, MixUp, CutMix as baselines...
You: approve

Pipeline continues with your refined hypothesis...
```

### CLI Commands

```bash
# Start with HITL mode
researchclaw run --topic "..." --mode co-pilot

# Attach to a paused pipeline (from another terminal)
researchclaw attach artifacts/rc-2026-xxx

# Check pipeline and HITL status
researchclaw status artifacts/rc-2026-xxx

# Approve/reject from another terminal or script
researchclaw approve artifacts/rc-2026-xxx --message "LGTM"
researchclaw reject artifacts/rc-2026-xxx --reason "Missing key baseline"

# Inject guidance for a stage (even before it runs)
researchclaw guide artifacts/rc-2026-xxx --stage 9 --message "Use ResNet-50 as primary baseline"
```

### Key Capabilities

| Feature | Description |
|---------|------------|
| **Idea Workshop** | Brainstorm, evaluate, and refine hypotheses collaboratively (Stages 7-8) |
| **Baseline Navigator** | The AI suggests baselines, the human adds or removes them, plus a reproducibility checklist (Stage 9) |
| **Paper Co-Writer** | Section-by-section drafting with human editing and AI polishing (Stages 16-19) |
| **SmartPause** | Confidence-driven dynamic pausing that auto-detects when human input would help |
| **Claim Verification** | Inline fact-checking against collected literature; flags ungrounded claims |
| **Cost Guardrails** | Budget monitoring with 50%/80%/100% threshold alerts |
| **Intervention Learning** | ALHF: learns from your review patterns to optimize future pause decisions |
| **Branch Exploration** | Fork the pipeline to explore multiple hypotheses, compare them, and merge the best |
| **Escalation Policy** | Tiered notification (terminal → Slack → email → auto-halt) when unattended |
| **3 Adapters** | CLI (terminal), WebSocket (web dashboard), MCP (external agents) |

### Configuration

```yaml
# config.arc.yaml
hitl:
  enabled: true
  mode: co-pilot                # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
  cost_budget_usd: 50.0         # Pause when cost exceeds budget (0 = no limit)
  notifications:
    on_pause: true
    on_quality_drop: true
    channels: ["terminal"]      # terminal | slack | webhook
  timeouts:
    default_human_timeout_sec: 86400   # 24h default wait
    auto_proceed_on_timeout: false
  collaboration:
    max_chat_turns: 50
    save_chat_history: true
  # Per-stage custom policies (optional, for 'custom' mode)
  stage_policies:
    8: { require_approval: true, enable_collaboration: true }
    9: { require_approval: true, allow_edit_output: true }
```

### Backward Compatibility

- **Default: OFF.** Without `hitl.enabled: true` or `--mode`, the pipeline behaves exactly as before.
- **`--auto-approve` still works.** It overrides HITL mode.
- **All 2,699 existing tests pass** with the HITL code present.

---

## 🧠 MetaClaw Integration

**AutoResearchClaw + [MetaClaw](https://github.com/aiming-lab/MetaClaw) = a pipeline that learns from every run.**

MetaClaw adds **cross-run knowledge transfer** to AutoResearchClaw. When it is enabled, the pipeline automatically:

1. captures lessons from failures and warnings,
2. converts them into reusable skills, and
3. injects those skills into all 23 pipeline stages on subsequent runs.

As a result, known mistakes are less likely to recur.
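The lesson-capture step can be illustrated roughly as follows. The sketch filters a run's lessons by severity and caps how many become skills, mirroring the `min_severity` and `max_skills_per_run` keys of `metaclaw_bridge.lesson_to_skill`.

Everything else is an assumption, because MetaClaw's real conversion logic is not described in this README:

- the `Lesson` type and the `select_lessons_for_skills` helper are hypothetical;
- the severity ranking is invented;
- so is the "most severe first" ordering;
- the example lessons are invented for illustration.

```python
from dataclasses import dataclass
from typing import List

# Assumed ordering; the README only says min_severity "warning" converts warnings + errors.
SEVERITY_RANK = {"info": 0, "warning": 1, "error": 2}


@dataclass
class Lesson:
    stage: int      # pipeline stage (1-23) where the issue was observed
    severity: str   # "info", "warning", or "error"
    text: str       # human-readable description of what went wrong


def _rank(lesson: Lesson) -> int:
    return SEVERITY_RANK.get(lesson.severity, 0)


def select_lessons_for_skills(
    lessons: List[Lesson], min_severity: str = "warning", max_skills_per_run: int = 3
) -> List[Lesson]:
    """Keep lessons at or above min_severity, most severe first, capped at max_skills_per_run."""
    threshold = SEVERITY_RANK[min_severity]
    eligible = [lesson for lesson in lessons if _rank(lesson) >= threshold]
    # sorted() is stable even with reverse=True, so ties keep their original order.
    eligible = sorted(eligible, key=_rank, reverse=True)
    return eligible[:max_skills_per_run]


# Invented example lessons, for illustration only.
lessons = [
    Lesson(4, "info", "Search returned few results for a narrow query"),
    Lesson(12, "error", "NaN loss after epoch 3"),
    Lesson(10, "warning", "Hardcoded metric value detected"),
    Lesson(13, "error", "Import of a package missing from the sandbox"),
    Lesson(18, "warning", "Claim lacks supporting evidence"),
]
print([lesson.stage for lesson in select_lessons_for_skills(lessons)])  # prints [12, 13, 10]
```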
### How It Works

```
Run N executes → failures/warnings captured as Lessons
                      ↓
          MetaClaw Lesson → Skill conversion
                      ↓
      arc-* Skill files stored in ~/.metaclaw/skills/
                      ↓
Run N+1 → build_overlay() injects skills into every LLM prompt
                      ↓
    LLM avoids known pitfalls → higher quality, fewer retries
```

### Quick Setup

```bash
# 1. Install MetaClaw (if not already)
pip install metaclaw

# 2. Enable in your config
```

```yaml
# config.arc.yaml
metaclaw_bridge:
  enabled: true
  proxy_url: "http://localhost:30000"          # MetaClaw proxy (optional)
  skills_dir: "~/.metaclaw/skills"             # Where skills are stored
  fallback_url: "https://api.openai.com/v1"    # Direct LLM fallback
  fallback_api_key: ""                         # API key for fallback URL
  lesson_to_skill:
    enabled: true
    min_severity: "warning"                    # Convert warnings + errors
    max_skills_per_run: 3
```

```bash
# 3. Run as usual: MetaClaw works transparently
researchclaw run --config config.arc.yaml --topic "Your idea" --auto-approve
```

After each run, check `~/.metaclaw/skills/arc-*/SKILL.md` to see the skills your pipeline has learned.

### Experiment Results

In controlled A/B experiments (same topic, same LLM, same configuration):

| Metric | Baseline | With MetaClaw | Improvement |
|--------|----------|---------------|-------------|
| Stage retry rate | 10.5% | 7.9% | **-24.8%** |
| Refine cycle count | 2.0 | 1.2 | **-40.0%** |
| Pipeline stage completion | 18/19 | 19/19 | **+5.3%** |
| Overall robustness score (composite) | 0.714 | 0.845 | **+18.3%** |

> The composite robustness score is a weighted average of stage completion rate (40%), retry reduction (30%), and refine cycle efficiency (30%).

### Backward Compatibility

- **Default: OFF.** If `metaclaw_bridge` is absent or `enabled: false`, the pipeline behaves exactly as before.
- **No new dependencies.** MetaClaw is optional; the core pipeline works without it.
- **All 2,699 existing tests pass** with the integration code present.
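The composite robustness score in the Experiment Results note is a plain weighted average (40% / 30% / 30%). The sketch below assumes each component has already been normalized to [0, 1]. The README does not say how retry reduction or refine-cycle efficiency are normalized, so this sketch does not attempt to reproduce the 0.714 and 0.845 figures. `composite_robustness` is a hypothetical name.

```python
# Weights stated in the README's footnote; they sum to 1.0.
WEIGHTS = {"stage_completion": 0.4, "retry_reduction": 0.3, "refine_efficiency": 0.3}


def composite_robustness(stage_completion: float, retry_reduction: float, refine_efficiency: float) -> float:
    """Weighted average of three component scores, each assumed to be normalized to [0, 1]."""
    components = {
        "stage_completion": stage_completion,
        "retry_reduction": retry_reduction,
        "refine_efficiency": refine_efficiency,
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Illustrative inputs only, not the measurements behind the results table.
print(round(composite_robustness(1.0, 0.5, 0.5), 3))  # prints 0.7
```

Because the weights sum to 1, the composite stays in [0, 1] whenever its inputs do.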
---

## 🧩 Skills Library

AutoResearchClaw supports loading **open-source and custom skills** to enhance your research workflow. It also ships with **19 pre-loaded built-in skills** as ready-to-use references, covering scientific writing, literature search, chemistry, biology, and more. Disable any skill by adding `enabled: false` to its frontmatter.

**Sample built-in skills:**

| Category | Skill | Description |
|----------|-------|-------------|
| **Writing** | `scientific-writing` | IMRAD structure, citation formatting, reporting guidelines |
| **Domain** | `chemistry-rdkit` | Molecular analysis, SMILES, fingerprints, drug discovery |
| **Experiment** | `literature-search` | Systematic review, PRISMA methodology |

> See all 19 skills with `researchclaw skills list`.

### Load Your Own Skills

```bash
# Option 1: Install a skill (persists across projects)
researchclaw skills install /path/to/my-skill/

# Option 2: Drop a SKILL.md into the project
mkdir -p .claude/skills/my-custom-skill
# Then create a SKILL.md with YAML frontmatter (name, description, trigger-keywords, applicable-stages)

# Option 3: Configure shared skill directories in config.arc.yaml
# skills:
#   custom_dirs:
#     - /path/to/team-shared-skills
```

### Using Skills

Skills are loaded and injected into LLM prompts automatically, so no manual activation is needed. Use the CLI to inspect them:

```bash
researchclaw skills list                   # Show all loaded skills with sources
researchclaw skills validate ./my-skill    # Check SKILL.md format
```

Browse community skills: [K-Dense-AI/claude-scientific-skills](https://github.com/K-Dense-AI/claude-scientific-skills) (150+ scientific skills across multiple disciplines).

---

## ⚙️ Configuration Reference
Click to expand full configuration reference ```yaml # === Project === project: name: "my-research" # Project identifier mode: "docs-first" # docs-first | semi-auto | full-auto # === Research === research: topic: "..." # Research topic (required) domains: ["ml", "nlp"] # Research domains for literature search daily_paper_count: 8 # Target papers per search query quality_threshold: 4.0 # Minimum quality score for papers # === Runtime === runtime: timezone: "America/New_York" # For timestamps max_parallel_tasks: 3 # Concurrent experiment limit approval_timeout_hours: 12 # Gate stage timeout retry_limit: 2 # Retry count on stage failure # === LLM === llm: provider: "openai-compatible" # openai | openrouter | deepseek | minimax | acp | openai-compatible base_url: "https://..." # API endpoint (required for openai-compatible) api_key_env: "OPENAI_API_KEY" # Env var for API key (required for openai-compatible) api_key: "" # Or hardcode key here primary_model: "gpt-4o" # Primary model fallback_models: ["gpt-4o-mini"] # Fallback chain s2_api_key: "" # Semantic Scholar API key (optional, higher rate limits) acp: # Only used when provider: "acp" agent: "claude" # ACP agent CLI command (claude, codex, gemini, etc.) cwd: "." 
# Working directory for the agent

# === Experiment ===
experiment:
  mode: "sandbox"                  # simulated | sandbox | docker | ssh_remote
  time_budget_sec: 300             # Max execution time per run (default: 300s)
  max_iterations: 10               # Max optimization iterations
  metric_key: "val_loss"           # Primary metric name
  metric_direction: "minimize"     # minimize | maximize
  sandbox:
    python_path: ".venv/bin/python"
    gpu_required: false
    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
    max_memory_mb: 4096
  docker:
    image: "researchclaw/experiment:latest"
    network_policy: "setup_only"   # none | setup_only | pip_only | full
    gpu_enabled: true
    memory_limit_mb: 8192
    auto_install_deps: true        # Auto-detect imports → requirements.txt
  ssh_remote:
    host: ""                       # GPU server hostname
    gpu_ids: []                    # Available GPU IDs
    remote_workdir: "/tmp/researchclaw_experiments"
  opencode:                        # OpenCode Beast Mode (auto-installed via `researchclaw setup`)
    enabled: true                  # Master switch (default: true)
    auto: true                     # Auto-trigger without confirmation (default: true)
    complexity_threshold: 0.2      # 0.0-1.0; higher = only trigger on complex experiments
    model: ""                      # Override model (empty = use llm.primary_model)
    timeout_sec: 600               # Max seconds for OpenCode generation
    max_retries: 1                 # Retry count on failure
    workspace_cleanup: true        # Remove temp workspace after collection
  code_agent:                      # CodeAgent v2: multi-phase code generation
    enabled: true                  # Use CodeAgent instead of legacy single-prompt codegen
    architecture_planning: true    # Generate deep implementation blueprint before coding
    sequential_generation: true    # Generate files one-by-one following dependency DAG
    hard_validation: true          # AST-based validation gates (blocks identical ablations, hardcoded metrics)
    hard_validation_max_repairs: 2 # Max repair attempts when validation fails
    exec_fix_max_iterations: 3     # Execution-in-the-loop fix attempts
    exec_fix_timeout_sec: 60       # Timeout per exec-fix attempt
  benchmark_agent:                 # BenchmarkAgent: automated dataset & baseline selection
    enabled: true                  # Enable 4-agent benchmark pipeline (Surveyor→Selector→Acquirer→Validator)
    enable_hf_search: true         # Search HuggingFace Datasets
    enable_web_search: true        # Search Google Scholar for benchmarks
    tier_limit: 2                  # Dataset tier filtering (1=small/cached, 2=medium, 3=large)
    min_benchmarks: 1              # Minimum datasets required
    min_baselines: 2               # Minimum baseline methods required
  figure_agent:                    # FigureAgent: academic figure generation
    enabled: true                  # Enable 5-agent figure pipeline (Planner→CodeGen→Renderer→Critic→Integrator)
    min_figures: 3                 # Minimum figures to generate
    max_figures: 8                 # Maximum figures
    max_iterations: 3              # Critic-driven refinement iterations
    dpi: 300                       # Output resolution
    strict_mode: false             # Fail pipeline if figure generation fails
  repair:                          # Anti-fabrication experiment repair
    enabled: true                  # Auto-diagnose and repair failed experiments
    max_cycles: 3                  # Repair retry loops
    min_completion_rate: 0.5       # >=50% conditions must complete to proceed
    min_conditions: 2              # At least 2 conditions for valid experiment
    use_opencode: true             # Route repairs through OpenCode Beast Mode

# === Web Search (Optional) ===
web_search:
  enabled: true                    # Enable web-augmented literature search
  tavily_api_key_env: "TAVILY_API_KEY"  # Tavily API key env var (optional)
  enable_scholar: true             # Google Scholar search
  enable_pdf_extraction: true      # Extract text from PDFs
  max_web_results: 10              # Max web results per query

# === Export ===
export:
  target_conference: "neurips_2025"  # neurips_2025 | iclr_2026 | icml_2026
  authors: "Anonymous"
  bib_file: "references"

# === Prompts ===
prompts:
  custom_file: ""                  # Path to custom prompts YAML (empty = defaults)

# === HITL Co-Pilot (NEW in v0.4.0) ===
hitl:
  enabled: false                   # Set to true to enable HITL
  mode: co-pilot                   # full-auto | gate-only | checkpoint | step-by-step | co-pilot | custom
  cost_budget_usd: 0.0             # Cost limit in USD (0 = no limit)
  notifications:
    on_pause: true                 # Notify when pipeline pauses
    on_quality_drop: true          # Notify on quality issues
    channels: ["terminal"]         # terminal | slack | webhook
  timeouts:
    default_human_timeout_sec: 86400  # Wait up to 24h for human input
    auto_proceed_on_timeout: false    # If true, auto-approve on timeout
  collaboration:
    max_chat_turns: 50             # Max turns per collaboration session
    save_chat_history: true        # Persist chat logs
  stage_policies: {}               # Per-stage overrides (for 'custom' mode)

# === Security ===
security:
  hitl_required_stages: [5, 9, 20] # Stages requiring human approval
  allow_publish_without_approval: false
  redact_sensitive_logs: true

# === Knowledge Base ===
knowledge_base:
  backend: "markdown"              # markdown | obsidian
  root: "docs/kb"

# === Notifications ===
notifications:
  channel: "console"               # console | discord | slack
  target: ""

# === MetaClaw Bridge (Optional) ===
metaclaw_bridge:
  enabled: false                   # Set to true to enable cross-run learning
  proxy_url: "http://localhost:30000"  # MetaClaw proxy URL
  skills_dir: "~/.metaclaw/skills" # Where arc-* skills are stored
  fallback_url: ""                 # Direct LLM fallback when proxy is down
  fallback_api_key: ""             # API key for fallback endpoint
  lesson_to_skill:
    enabled: true                  # Auto-convert lessons to skills
    min_severity: "warning"        # Minimum severity to convert
    max_skills_per_run: 3          # Max new skills per pipeline run
  prm:                             # Process Reward Model quality gate (optional)
    enabled: false                 # Use LLM-as-judge to score stage outputs
    model: "gpt-5.4"               # PRM judge model
    votes: 3                       # Majority vote count
    gate_stages: [5, 9, 15, 20]    # Stages to apply PRM gates

# === OpenClaw Bridge ===
openclaw_bridge:
  use_cron: false                  # Scheduled research runs
  use_message: false               # Progress notifications
  use_memory: false                # Cross-session knowledge persistence
  use_sessions_spawn: false        # Spawn parallel sub-sessions
  use_web_fetch: false             # Live web search
  use_browser: false               # Browser-based paper collection
```
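To make a few of these keys more concrete, here is a minimal Python sketch based on our own reading of the config comments. It assumes the YAML has already been parsed into a plain dict (for example with PyYAML's `yaml.safe_load`). The sketch checks allowed values, uses `metric_key`/`metric_direction` to pick the best run, reads `repair.min_completion_rate` and `repair.min_conditions` as a combined gate, and treats `prm.votes` as a strict majority vote. All function names (`check_config`, `best_run`, `repair_gate_passes`, `prm_gate_passes`) are hypothetical and are not part of AutoResearchClaw's API. The real pipeline may interpret these settings differently.

```python
"""Hypothetical helpers illustrating one reading of the config comments.

NOT part of AutoResearchClaw's API; the real pipeline may apply these keys
differently. The config is assumed to be already parsed into a dict.
"""
from typing import Any, Mapping, Sequence

# Allowed values copied from the comments in the reference config.
EXPERIMENT_MODES = {"simulated", "sandbox", "docker", "ssh_remote"}
METRIC_DIRECTIONS = {"minimize", "maximize"}
HITL_MODES = {"full-auto", "gate-only", "checkpoint", "step-by-step", "co-pilot", "custom"}


def check_config(cfg: Mapping[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed config (empty list = OK)."""
    problems: list[str] = []
    exp = cfg.get("experiment", {})
    if exp.get("mode") not in EXPERIMENT_MODES:
        problems.append(f"experiment.mode must be one of {sorted(EXPERIMENT_MODES)}")
    if exp.get("metric_direction") not in METRIC_DIRECTIONS:
        problems.append("experiment.metric_direction must be 'minimize' or 'maximize'")
    fig = exp.get("figure_agent", {})
    if fig.get("min_figures", 0) > fig.get("max_figures", float("inf")):
        problems.append("experiment.figure_agent.min_figures must not exceed max_figures")
    rate = exp.get("repair", {}).get("min_completion_rate", 0.0)
    if not 0.0 <= rate <= 1.0:
        problems.append("experiment.repair.min_completion_rate must lie in [0, 1]")
    hitl = cfg.get("hitl", {})
    # The mode only matters when HITL is switched on.
    if hitl.get("enabled") and hitl.get("mode") not in HITL_MODES:
        problems.append(f"hitl.mode must be one of {sorted(HITL_MODES)}")
    if hitl.get("cost_budget_usd", 0.0) < 0:
        problems.append("hitl.cost_budget_usd must be >= 0 (0 means no limit)")
    return problems


def best_run(runs: Sequence[Mapping[str, float]], metric_key: str, direction: str) -> Mapping[str, float]:
    """Pick the run with the best value of metric_key under the configured direction."""
    if direction not in METRIC_DIRECTIONS:
        raise ValueError(f"unknown metric_direction: {direction!r}")
    pick = min if direction == "minimize" else max
    return pick(runs, key=lambda r: r[metric_key])


def repair_gate_passes(completed: int, total: int, min_completion_rate: float, min_conditions: int) -> bool:
    """Assumed reading: enough conditions finished, both as a fraction and as a count."""
    if total == 0:
        return False
    return completed / total >= min_completion_rate and completed >= min_conditions


def prm_gate_passes(judge_votes: Sequence[bool]) -> bool:
    """A strict majority of LLM-judge votes must approve (e.g. 2 of 3 when votes: 3)."""
    return sum(judge_votes) * 2 > len(judge_votes)


if __name__ == "__main__":
    cfg = {
        "experiment": {
            "mode": "sandbox",
            "metric_key": "val_loss",
            "metric_direction": "minimize",
            "figure_agent": {"min_figures": 3, "max_figures": 8},
            "repair": {"min_completion_rate": 0.5, "min_conditions": 2},
        },
        "hitl": {"enabled": False, "mode": "co-pilot", "cost_budget_usd": 0.0},
    }
    print(check_config(cfg))                                   # -> []
    runs = [{"val_loss": 0.42}, {"val_loss": 0.31}, {"val_loss": 0.57}]
    print(best_run(runs, "val_loss", "minimize"))              # -> {'val_loss': 0.31}
    print(repair_gate_passes(3, 5, 0.5, 2))                    # -> True (60% >= 50%, 3 >= 2)
    print(prm_gate_passes([True, False, True]))                # -> True
```

In this sketch, a strict majority (`2 * yes > n`) means a tied vote fails the gate. That is a conservative choice for a quality gate, and it is our assumption, not documented behavior.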

---

## 🙏 Acknowledgments

Inspired by:

- 🔬 [AI Scientist](https://github.com/SakanaAI/AI-Scientist) (Sakana AI): Automated research pioneer
- 🧠 [AutoResearch](https://github.com/karpathy/autoresearch) (Andrej Karpathy): End-to-end research automation
- 🌐 [FARS](https://analemma.ai/blog/introducing-fars/) (Analemma): Fully Automated Research System

---

## ⚠️ Ethics and Responsible Use

AutoResearchClaw is a research assistance tool, not a replacement for human researchers. We ask all users to observe the following principles:

**Academic integrity.** Treat papers generated by AutoResearchClaw as drafts. They require thorough human review, verification, and revision before any submission. The authors listed on a paper bear full responsibility for its content, claims, and correctness. Using AI-generated text without adequate human oversight or disclosure may violate the academic integrity policies of your institution or target venue.

**Transparency and disclosure.** We strongly encourage users to disclose the use of AutoResearchClaw (or any AI assistance) in their manuscripts, following the policies of the target venue. NeurIPS, ICML, ICLR, and most other major venues now require disclosure of AI writing assistance. The Human-in-the-Loop Co-Pilot exists precisely to keep humans in meaningful control of research decisions.

**Citation and attribution.** AutoResearchClaw verifies citations through a 4-layer pipeline, but no automated system is perfect. Before submission, users must manually verify that every reference is real, relevant, and correctly cited. Fabricated or misattributed citations undermine scientific trust.

**Potential for misuse.** Like any powerful tool, AutoResearchClaw can be misused to produce low-quality or misleading research at scale. We do not condone using this system to feed paper mills, produce fraudulent submissions, or create content designed to game peer review.
We reserve the right to update the license or terms of use if systematic misuse is identified.

**Dual use.** Autonomous research systems raise broader questions about the future of scientific labor, authorship norms, and review processes. We welcome community discussion on these topics and are committed to developing this technology responsibly.

By using AutoResearchClaw, you agree to use it in a manner consistent with these principles and with the ethical guidelines of your institution and research community.

---

## 📄 License

MIT. See [LICENSE](LICENSE) for details.

---

## 📌 Citation

If you find AutoResearchClaw useful, please cite:

```bibtex
@misc{liu2026autoresearchclaw,
  author       = {Liu, Jiaqi and Xia, Peng and Han, Siwei and Qiu, Shi and Zhang, Letian and Chen, Guiming and Tu, Haoqin and Yang, Xinyu and Zhou, Jiawei and Zhu, Hongtu and Li, Yun and Zhang, Jiaheng and Zhou, Yuyin and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
  title        = {AutoResearchClaw: Fully Autonomous Research from Idea to Paper},
  year         = {2026},
  organization = {GitHub},
  url          = {https://github.com/aiming-lab/AutoResearchClaw},
}
```

Built with 🦞 by the AutoResearchClaw team