# grid-orch

**Repository Path**: dev-linhu/grid-orch

## Basic Information

- **Project Name**: grid-orch
- **Description**: A real-time distributed compute scheduling system with OpenAI-compatible APIs, built on Ray with persistent WebSocket node channels; the execution side supports Ollama only.
- **Primary Language**: Unknown
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2026-04-11
- **Last Updated**: 2026-04-11

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Grid Compute Orchestrator (MVP)

A production-oriented MVP for distributed model serving with:

- `GridControl`: centralized scheduler + control plane (FastAPI + Ray + SQLite)
- `GridNode`: seller-side execution agent (persistent WebSocket + Ollama runtime)

The system exposes OpenAI-compatible APIs while dispatching inference to registered nodes in real time.

## Highlights

- OpenAI-compatible endpoints:
  - `GET /v1/models`
  - `POST /v1/chat/completions`
- Persistent WebSocket channel (`/ws/node`) from `GridNode` to `GridControl`
- Real-time task push (no polling dispatch)
- Node liveness and runtime-aware routing:
  - immediate offline on disconnect
  - immediate offline on Ollama unavailability (`runtime_up=false`)
- Data sensitivity routing (`P0`/`P1`) with trust-level constraints
- Request/job/usage/audit persistence in SQLite

## Architecture

```text
Client -> GridControl API -> Scheduler -> Ray Dispatch Broker -> GridNode -> Ollama
   ^                                                                 |
   |--------------------- result over WebSocket ---------------------|
```

## Repository Layout

```text
.
├── docker-compose.yml
├── mvp_schema_sqlite.sql
├── gridcontrol/
│   ├── app.py
│   ├── scheduler.py
│   ├── ray_runtime.py
│   ├── db.py
│   ├── config.py
│   ├── requirements.txt
│   ├── Dockerfile
│   └── .env.example
├── gridnode/
│   ├── agent.py
│   ├── ws_client.py
│   ├── runtime.py
│   ├── config.py
│   ├── requirements.txt
│   ├── Dockerfile
│   └── .env.example
└── docs/
    └── quickstart.md
```

## Quick Start (Docker Compose)

1. Prepare env files

   ```bash
   cp gridcontrol/.env.example gridcontrol/.env
   cp gridnode/.env.example gridnode/.env
   ```

2. Build and run

   ```bash
   docker compose up -d --build
   ```

3. Verify

   ```bash
   curl http://127.0.0.1:8080/healthz
   curl http://127.0.0.1:8080/v1/models
   ```

4. Test a completion

   ```bash
   curl -X POST http://127.0.0.1:8080/v1/chat/completions \
     -H 'Content-Type: application/json' \
     -d '{
       "model": "qwen2.5:7b",
       "messages": [{"role": "user", "content": "hello"}],
       "stream": false,
       "metadata": {"data_class": "P0"}
     }'
   ```

For detailed run commands, see [docs/quickstart.md](./docs/quickstart.md).

## Runtime Behavior

- `GridNode` discovers installed models from `OLLAMA_BASE_URL/api/tags`.
- If Ollama is unreachable, the node sends `runtime_up=false` on its heartbeat.
- `GridControl` immediately marks the node offline and fails its in-flight jobs.

## Security Notes

- Node authentication via shared secret (`X-Node-Secret` / WS hello secret)
- `P0`/`P1` routing guardrails with trust-level filtering
- Sensitive payloads are not logged in plain text

## Current MVP Scope

Included:

- single control plane
- SQLite persistence
- Ollama-only node runtime
- non-streaming chat completion

Not included yet:

- billing/settlement
- multi-tenant auth hardening
- HA control plane
- streaming response path

## License

MIT License. See [LICENSE](./LICENSE).
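
## Example: Calling the API from Python

The Quick Start request can also be issued programmatically. The sketch below is illustrative rather than part of the repository: it assumes the Docker Compose defaults above (`http://127.0.0.1:8080`, model `qwen2.5:7b`) and the standard OpenAI-compatible response shape; the `chat` helper and its defaults are hypothetical.

```python
"""Minimal client for GridControl's OpenAI-compatible chat endpoint.

A sketch only: base URL, model, and the metadata.data_class field mirror
the Quick Start example; adjust them to your deployment.
"""
import requests

GRIDCONTROL_URL = "http://127.0.0.1:8080"  # assumes the Docker Compose default


def chat(prompt: str, model: str = "qwen2.5:7b", data_class: str = "P0") -> str:
    """Send a non-streaming chat completion and return the reply text."""
    resp = requests.post(
        f"{GRIDCONTROL_URL}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": False,  # streaming is out of scope for the MVP
            "metadata": {"data_class": data_class},  # P0/P1 sensitivity routing
        },
        timeout=120,
    )
    resp.raise_for_status()
    # OpenAI-compatible response shape: choices[0].message.content
    return resp.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("hello"))
```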
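
## Example: Node Liveness Sketch (Python)

To make the Runtime Behavior section concrete, here is a heavily simplified sketch of a `GridNode`-style liveness loop. Only the `/ws/node` path, the shared secret, the `runtime_up` flag, and the `OLLAMA_BASE_URL/api/tags` probe come from this README; the "hello" and "heartbeat" message shapes, the field names, and the heartbeat interval are assumptions — the real wire format lives in `gridnode/ws_client.py`.

```python
"""Sketch of a GridNode-style liveness loop over the /ws/node channel.

Illustrative only; message shapes below are assumptions, not the
project's actual protocol.
"""
import asyncio
import json

import requests
import websockets  # pip install websockets

CONTROL_WS_URL = "ws://127.0.0.1:8080/ws/node"  # GridControl WebSocket endpoint
OLLAMA_BASE_URL = "http://127.0.0.1:11434"      # local Ollama runtime
NODE_SECRET = "change-me"                       # shared secret from .env


def probe_ollama() -> tuple[bool, list[str]]:
    """Return (runtime_up, installed models) by querying Ollama's tag list."""
    try:
        r = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=5)
        r.raise_for_status()
        return True, [m["name"] for m in r.json().get("models", [])]
    except requests.RequestException:
        return False, []  # GridControl will mark the node offline


async def run_node() -> None:
    async with websockets.connect(CONTROL_WS_URL) as ws:
        # Authenticate the channel (hypothetical "hello" message shape).
        await ws.send(json.dumps({"type": "hello", "secret": NODE_SECRET}))
        while True:
            runtime_up, models = probe_ollama()
            await ws.send(json.dumps(
                {"type": "heartbeat", "runtime_up": runtime_up, "models": models}
            ))
            await asyncio.sleep(10)  # heartbeat interval is an assumption


if __name__ == "__main__":
    asyncio.run(run_node())
```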