# agent-rdp

**Repository Path**: iwannay_admin/agent-rdp

## Basic Information

- **Project Name**: agent-rdp
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-05-12
- **Last Updated**: 2026-05-12

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# agent-rdp

A CLI tool for AI agents to control Windows Remote Desktop sessions, built on [IronRDP](https://github.com/Devolutions/IronRDP).

## Demo

Claude Code automating SQLite database and table creation via RDP:

https://github.com/user-attachments/assets/91892b39-4edb-412b-b265-55ccd75d7421

## Features

- **Connect to RDP servers** - Full RDP protocol support with TLS and CredSSP authentication
- **Take screenshots** - Capture the remote desktop as PNG or JPEG
- **Mouse control** - Click, double-click, right-click, drag, scroll
- **Keyboard input** - Type text, press key combinations (Ctrl+C, Alt+Tab, etc.)
- **Clipboard sync** - Copy/paste text between local machine and remote Windows
- **Drive mapping** - Map local directories as network drives on the remote machine
- **UI Automation** - Interact with Windows applications via accessibility API (click, select, toggle, expand)
- **OCR text location** - Find text on screen using OCR when UI Automation isn't available
- **JSON output** - Structured output for AI agent consumption
- **Session management** - Multiple named sessions with automatic daemon lifecycle

## Installation

### From npm

```bash
npm install -g agent-rdp
```

### As a Claude Code skill

```bash
npx add-skill https://github.com/thisnick/agent-rdp
```

### From source

```bash
git clone https://github.com/thisnick/agent-rdp
cd agent-rdp
pnpm install
pnpm build      # Build native binary
pnpm build:ts   # Build TypeScript
```

## Usage

### Connect to an RDP Server

```bash
# Using command line (password visible in process list - not recommended)
agent-rdp connect --host 192.168.1.100 --username Administrator --password 'secret'

# Using environment variables (recommended)
export AGENT_RDP_USERNAME=Administrator
export AGENT_RDP_PASSWORD=secret
agent-rdp connect --host 192.168.1.100

# Using stdin (most secure)
echo 'secret' | agent-rdp connect --host 192.168.1.100 --username Administrator --password-stdin
```

### Take a Screenshot

```bash
# Save to file
agent-rdp screenshot --output desktop.png

# Output as base64 (for AI agents)
agent-rdp screenshot --base64

# With JSON output
agent-rdp --json screenshot --base64
```

### Mouse Operations

```bash
# Click at position
agent-rdp mouse click 500 300

# Right-click
agent-rdp mouse right-click 500 300

# Double-click
agent-rdp mouse double-click 500 300

# Move cursor
agent-rdp mouse move 100 200

# Drag from (100,100) to (500,500)
agent-rdp mouse drag 100 100 500 500
```

### Keyboard Operations

```bash
# Type text (supports Unicode)
agent-rdp keyboard type "Hello, World!"

# Press key combinations
agent-rdp keyboard press "ctrl+c"
agent-rdp keyboard press "alt+tab"
agent-rdp keyboard press "ctrl+shift+esc"

# Press single keys (use press command)
agent-rdp keyboard press enter
agent-rdp keyboard press escape
agent-rdp keyboard press f5
```

### Scroll

```bash
agent-rdp scroll up --amount 3
agent-rdp scroll down --amount 5
agent-rdp scroll left
agent-rdp scroll right
```

### Locate (OCR)

Find text on screen using OCR (powered by [ocrs](https://github.com/robertknight/ocrs)). Useful when UI Automation can't access certain elements (WebView content, some dialogs).

```bash
# Find lines containing text
agent-rdp locate "Cancel"

# Pattern matching (glob-style)
agent-rdp locate "Save*" --pattern

# Get all text on screen
agent-rdp locate --all

# JSON output
agent-rdp locate "OK" --json
```

Returns text lines with coordinates for clicking:

```
Found 1 line(s) containing 'Cancel':
  'Cancel Button' at (650, 420) size 80x14 - center: (690, 427)

To click the first match: agent-rdp mouse click 690 427
```

### Clipboard

```bash
# Set clipboard text (available when you paste on Windows)
agent-rdp clipboard set "Hello from CLI"

# Get clipboard text (after copying on Windows)
agent-rdp clipboard get

# With JSON output
agent-rdp --json clipboard get
```

### Drive Mapping

Map local directories as network drives on the remote Windows machine. Drives must be mapped at connect time. Multiple drives can be specified.

```bash
# Map local directories during connection
agent-rdp connect --host 192.168.1.100 -u Administrator -p secret \
  --drive /home/user/documents:Documents \
  --drive /tmp/shared:Shared

# List mapped drives
agent-rdp drive list
```

On the remote Windows machine, mapped drives appear in File Explorer as network locations.

### UI Automation

Interact with Windows applications programmatically via the Windows UI Automation API using native patterns (InvokePattern, SelectionItemPattern, TogglePattern, etc.).
When enabled, a PowerShell agent is injected into the remote session that captures the accessibility tree and performs actions. Communication between the CLI and the agent uses a Dynamic Virtual Channel (DVC) for fast bidirectional IPC.

For detailed documentation, see [AUTOMATION.md](https://github.com/thisnick/agent-rdp/blob/main/docs/AUTOMATION.md).

```bash
# Connect with automation enabled
agent-rdp connect --host 192.168.1.100 -u Admin -p secret --enable-win-automation

# Take an accessibility tree snapshot (refs are always included)
agent-rdp automate snapshot

# Snapshot filtering options (like agent-browser)
agent-rdp automate snapshot -i                # Interactive elements only
agent-rdp automate snapshot -c                # Compact (remove empty structural elements)
agent-rdp automate snapshot -d 3              # Limit depth to 3 levels
agent-rdp automate snapshot -s "~*Notepad*"   # Scope to a window/element
agent-rdp automate snapshot -i -c -d 5        # Combine options

# Pattern-based element operations (refs use @eN format)
agent-rdp automate click "#SaveButton"        # Click button
agent-rdp automate click "@e5"                # Click by ref number from snapshot
agent-rdp automate click "@e5" -d             # Double-click (for file list items)
agent-rdp automate select "@e10"              # Select item (SelectionItemPattern)
agent-rdp automate toggle "@e7"               # Toggle checkbox (TogglePattern)
agent-rdp automate expand "@e3"               # Expand menu (ExpandCollapsePattern)
agent-rdp automate context-menu "@e5"         # Open context menu (Shift+F10)

# Fill text fields
agent-rdp automate fill ".Edit" "Hello World"

# Window operations
agent-rdp automate window list
agent-rdp automate window focus "~*Notepad*"

# Run PowerShell commands
agent-rdp automate run "Get-Process" --wait
agent-rdp automate run "Get-Process" --wait --process-timeout 5000  # With 5s timeout
```

**Selector Types:**

- `@e5` or `@5` - Reference number from snapshot (e prefix recommended)
- `#SaveButton` - Automation ID
- `.Edit` - Win32 class name
- `~*pattern*` - Wildcard name match
- `File` - Element name (exact match)

**Snapshot Output Format:**

```
- Window "Notepad" [ref=e1, id=Notepad]
  - MenuBar "Application" [ref=e2]
    - MenuItem "File" [ref=e3]
  - Edit "Text Editor" [ref=e5, value="Hello"]
```

### Session Management

```bash
# List active sessions
agent-rdp session list

# Get current session info
agent-rdp session info

# Close a session
agent-rdp session close

# Use a named session
agent-rdp --session work connect --host work-pc.local ...
agent-rdp --session work screenshot
```

### Disconnect

```bash
agent-rdp disconnect
```

### Web Viewer

Open the web-based viewer to see the remote desktop in your browser:

```bash
# Open viewer (connects to default streaming port 9224)
agent-rdp view

# Specify a different port
agent-rdp view --port 9224
```

The viewer requires WebSocket streaming to be enabled. Start a session with streaming:

```bash
agent-rdp --stream-port 9224 connect --host 192.168.1.100 -u Admin -p secret
agent-rdp view
```

## JSON Output

All commands support `--json` for structured output:

```bash
agent-rdp --json screenshot --base64
```

**Success response:**

```json
{
  "success": true,
  "data": {
    "type": "screenshot",
    "width": 1920,
    "height": 1080,
    "format": "png",
    "base64": "iVBORw0KGgo..."
  }
}
```

**Error response:**

```json
{
  "success": false,
  "error": {
    "code": "not_connected",
    "message": "Not connected to an RDP server"
  }
}
```

## Environment Variables

| Variable | Description |
|----------|-------------|
| `AGENT_RDP_HOST` | RDP server hostname or IP |
| `AGENT_RDP_PORT` | RDP server port (default: 3389) |
| `AGENT_RDP_USERNAME` | RDP username |
| `AGENT_RDP_PASSWORD` | RDP password |
| `AGENT_RDP_SESSION` | Session name (default: "default") |
| `AGENT_RDP_STREAM_PORT` | WebSocket streaming port (0 = disabled) |

## Node.js API

Use agent-rdp programmatically from Node.js/TypeScript:

```typescript
import { RdpSession } from 'agent-rdp';

const rdp = new RdpSession({ session: 'default' });

await rdp.connect({
  host: '192.168.1.100',
  username: 'Administrator',
  password: 'secret',
  width: 1280,
  height: 800,
  drives: [{ path: '/tmp/share', name: 'Share' }],
  enableWinAutomation: true, // Enable UI Automation
});

// Screenshot
const { base64, width, height } = await rdp.screenshot({ format: 'png' });

// Mouse
await rdp.mouse.click({ x: 100, y: 200 });
await rdp.mouse.rightClick({ x: 100, y: 200 });
await rdp.mouse.doubleClick({ x: 100, y: 200 });
await rdp.mouse.move({ x: 150, y: 250 });
await rdp.mouse.drag({ from: { x: 100, y: 100 }, to: { x: 500, y: 500 } });

// Keyboard
await rdp.keyboard.type({ text: 'Hello World' });
await rdp.keyboard.press({ keys: 'ctrl+c' });
await rdp.keyboard.press({ keys: 'enter' }); // Single keys use press()

// Scroll
await rdp.scroll.up();                    // Default amount: 3
await rdp.scroll.down({ amount: 5 });     // Custom amount
await rdp.scroll.up({ x: 500, y: 300 });  // Scroll at position

// Clipboard
await rdp.clipboard.set({ text: 'text to copy' });
const text = await rdp.clipboard.get();

// Locate text using OCR
const matches = await rdp.locate({ text: 'Cancel' });
if (matches.length > 0) {
  await rdp.mouse.click({ x: matches[0].center_x, y: matches[0].center_y });
}

// Get all text on screen
const allText = await rdp.locate({ all: true });

// Automation (requires --enable-win-automation at connect)
const snapshot = await rdp.automation.snapshot({ interactive: true });
await rdp.automation.click('@e5');                         // Click button by ref
await rdp.automation.click('@e5', { doubleClick: true });  // Double-click
await rdp.automation.select('@e10');                       // Select item
await rdp.automation.toggle('@e7');                        // Toggle checkbox
await rdp.automation.expand('@e3');                        // Expand menu
await rdp.automation.contextMenu('@e5');                   // Open context menu
await rdp.automation.fill('#input', 'text');               // Fill text field
await rdp.automation.run('notepad.exe');                   // Run command
await rdp.automation.waitFor('#SaveButton', { timeout: 5000 });

// Window management
const windows = await rdp.automation.listWindows();
await rdp.automation.focusWindow('~*Notepad*');
await rdp.automation.maximizeWindow();

// Drives
const drives = await rdp.drives.list();

// Session info
const info = await rdp.getInfo();

// Disconnect
await rdp.disconnect();
```

### WebSocket Streaming

Enable WebSocket streaming for real-time screen capture and bidirectional clipboard support:

```typescript
const rdp = new RdpSession({
  session: 'viewer',
  streamPort: 9224, // Enable streaming
});

await rdp.connect({...});

// Connect your WebSocket client to receive JPEG frames
const streamUrl = rdp.getStreamUrl(); // "ws://localhost:9224"
```

For the complete WebSocket protocol specification (message types, clipboard flow, input handling), see [WEBSOCKET.md](https://github.com/thisnick/agent-rdp/blob/main/docs/WEBSOCKET.md).

## Architecture

agent-rdp uses a daemon-per-session architecture:

1. **CLI** (`agent-rdp`) - Parses commands and communicates with the daemon
2. **Daemon** - Maintains the RDP connection and processes commands
3. **IPC** - Unix sockets (macOS/Linux) or TCP (Windows)

The daemon is automatically started on the first command and persists until explicitly closed or the session times out.
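
Because the CLI is the only component a caller talks to, a script can drive the daemon by running `agent-rdp --json ...` and decoding the envelope shown under JSON Output. The sketch below is one possible way to type that envelope in TypeScript. The names `RdpEnvelope`, `ScreenshotData`, and `unwrap` are hypothetical and not part of the agent-rdp package, and the field list is taken only from the examples in this README, so real responses may carry additional fields.

```typescript
// Envelope shapes copied from the "JSON Output" examples in this README.
// These type names are illustrative assumptions, not exports of agent-rdp.
interface RdpSuccess<T> {
  success: true;
  data: T;
}

interface RdpFailure {
  success: false;
  error: { code: string; message: string };
}

type RdpEnvelope<T> = RdpSuccess<T> | RdpFailure;

// Fields shown in the documented screenshot response.
interface ScreenshotData {
  type: string;
  width: number;
  height: number;
  format: string;
  base64: string;
}

// Parse one line of CLI output and return its payload,
// or throw an Error carrying the reported code and message.
function unwrap<T>(stdout: string): T {
  const parsed = JSON.parse(stdout) as RdpEnvelope<T>;
  if (parsed.success === true) {
    return parsed.data;
  }
  throw new Error(`${parsed.error.code}: ${parsed.error.message}`);
}

// Sample input: the success response from the JSON Output section.
const ok =
  '{"success":true,"data":{"type":"screenshot","width":1920,' +
  '"height":1080,"format":"png","base64":"iVBORw0KGgo..."}}';

const shot = unwrap<ScreenshotData>(ok);
console.log(`${shot.width}x${shot.height} ${shot.format}`); // prints "1920x1080 png"
```

In practice the string would come from the CLI's stdout rather than a literal; it is inlined here so the sketch runs on its own. Throwing on `success: false` keeps the calling code on the happy path and surfaces codes such as `not_connected` as ordinary exceptions.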

## Limitations

### UI Automation

- **WebViews**: UI Automation cannot interact with WebView content (e.g., Windows Start menu search, Edge browser content, Electron apps). Use `Win+R` or `automate run` to launch programs directly instead of clicking through menus.
- **UAC Dialogs**: User Account Control elevation prompts run on a secure desktop and are not accessible via UI Automation. There is no good workaround - the remote user must interact with UAC manually, or UAC must be disabled (not recommended for security reasons).

### OCR Fallback

When UI Automation cannot access certain elements, the `locate` command provides OCR-based text detection:

```bash
agent-rdp locate "Button Text"   # Find text and get coordinates
agent-rdp mouse click <x> <y>    # Click at returned coordinates
```

This is not highly reliable (OCR can misread characters, miss text, or return imprecise coordinates), but may work for simple cases like dialog buttons.

### Screenshot Coordinate Detection

**Claude models** (in non-computer-use mode, such as Claude Code) are poor at estimating pixel coordinates from screenshots. Do not ask Claude to look at a screenshot and guess where to click - it will likely be inaccurate.

**Gemini models** are generally good at pixel coordinate estimation from images.

If you need vision-based coordinate detection with Claude, implement your own harness using Claude's [Computer Use Tool](https://docs.anthropic.com/en/docs/agents-and-tools/computer-use), which is specifically designed for this purpose.

## Requirements

- Rust 1.75 or later
- Target RDP server with Network Level Authentication (NLA) enabled

## License

MIT OR Apache-2.0 (same as IronRDP)