Multi-host
Control rupu agents and workflows running on other machines from one central Control Plane — register remote hosts, launch work on the host you choose, and watch it stream back, all in a single browser tab.
What it is
Multi-host is a hub model. One central Control Plane
federates out to a fleet of remote hosts, each running its
own control plane
(rupu cp serve). A Host is a
top-level peer in rupu — it sits alongside your Projects, not underneath
them. The machine the central CP runs on is itself just a host:
host[0], "this host", always present and always first.
The remote is always the source of truth — the central CP keeps no authoritative copy of a host's run state, and tags every run with the host it came from. How it reaches that state depends on the transport: directly-reachable hosts (HTTP, SSH) are live-queried on demand — open a run list and the CP fans out to each host, merges the results, and proxies straight through to a run's detail / event stream / transcript; hosts that can't be reached inbound (tunnel, bucket) mirror their run artifacts back to the central CP as they change. Either way you get one merged fleet view, and no host's state is duplicated as a second source of truth.
Each HTTP remote is guarded by the bearer token set on rupu cp serve,
and that token is what the center authenticates with.
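The live-query path for directly reachable hosts can be pictured as a parallel fan-out followed by a merge. The sketch below is illustrative only: `fan_out_runs` and `fetch_runs` are hypothetical names, and the run dictionaries are an assumed shape rather than rupu's actual API. It shows each run being tagged with its host, and an unreachable host being recorded as offline without failing the whole list.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out_runs(hosts, fetch_runs):
    """Query every host in parallel and merge the results.

    hosts: list of host ids, local first.
    fetch_runs: hypothetical callable(host_id) -> list of run dicts;
    it may raise if the host is unreachable.
    Returns (merged_runs, offline_hosts).
    """
    def query(host_id):
        try:
            runs = fetch_runs(host_id)
        except Exception:
            return host_id, None  # host is offline; keep the rest of the list
        # Tag every run with the host it came from.
        return host_id, [{**run, "host_id": host_id} for run in runs]

    merged, offline = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        # pool.map yields results in the same order as `hosts`.
        for host_id, runs in pool.map(query, hosts):
            if runs is None:
                offline.append(host_id)
            else:
                merged.extend(runs)
    return merged, offline


# In-memory stand-in for the per-host query (assumed data).
FAKE_FLEET = {
    "local": [{"run_id": "run_a"}],
    "prod": [{"run_id": "run_b"}, {"run_id": "run_c"}],
}


def fake_fetch(host_id):
    if host_id not in FAKE_FLEET:
        raise ConnectionError(f"{host_id} unreachable")
    return FAKE_FLEET[host_id]


runs, offline = fan_out_runs(["local", "prod", "nightly"], fake_fetch)
keys = [(r["host_id"], r["run_id"]) for r in runs]
```

Here `keys` holds one (host_id, run_id) pair per run, and `offline` lists the hosts that could not be queried.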
Connection types
Every host is reached through a single transport port
(HostConnector), so the Fleet surface, host
attribution, launcher, and live streams work the same no matter how a host
is wired up. Five transports ship today, covering hosts
you can reach directly, hosts behind a NAT or firewall, and hosts you can't
reach at all:
| Transport | How the host is reached | How to register | Best for |
|---|---|---|---|
| Local | In-process, with no network hop. This is host[0], "this host," always present. | Built in (the local host). Nothing to register. | The machine the central CP runs on. |
| HTTP (federation) | The central CP is a client of the remote's rupu cp serve, over its token-gated HTTP API. | rupu host add <name> --url https://… --token … | A remote that can expose a reachable, TLS-fronted server. |
| SSH | The CP connects out over SSH and runs rupu on the remote; auth is delegated to the system ssh. | rupu host add <name> --ssh user@host | A host you can already ssh to but that runs no server. |
| Tunnel (dial-home) | The remote dials home over an outbound WebSocket; the CP reaches it via its node registry. Works behind NAT / firewalls. | rupu node enroll <name>, then run rupu node on the box. | A NAT'd / firewalled box that can't expose any inbound port. |
| Bucket (pull / dead-drop) | The CP and the worker exchange jobs + results through an object store. Neither side connects to the other. | rupu host add <name> --bucket s3://…, then run rupu node pull. | Air-gapped / fully disconnected / batch hosts. |
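To illustrate what "a single transport port" buys you, here is a minimal sketch of the pattern in Python. Only the name HostConnector comes from this page; the method names, signatures, and the `LocalConnector` stand-in are assumptions made for the example and are not rupu's real interface.

```python
from typing import Protocol


class HostConnector(Protocol):
    """Hypothetical shape of the transport port; method names are assumptions."""

    def launch(self, workflow: str) -> str: ...

    def list_runs(self) -> list[str]: ...


class LocalConnector:
    """In-process stand-in: no network hop."""

    def __init__(self) -> None:
        self._runs: list[str] = []

    def launch(self, workflow: str) -> str:
        run_id = f"run_{len(self._runs) + 1}"
        self._runs.append(run_id)
        return run_id

    def list_runs(self) -> list[str]:
        return list(self._runs)


def launch_on(connector: HostConnector, workflow: str) -> str:
    # Caller code depends only on the port, never on the transport behind it.
    return connector.launch(workflow)


local = LocalConnector()
first = launch_on(local, "review")
```

An HTTP, SSH, tunnel, or bucket connector would implement the same two methods, which is why the surfaces above it do not need to know how a host is wired up.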
Each registered host is stored as a small record at
~/.rupu/hosts/<id>.toml holding only its name
and transport metadata. HTTP bearer tokens live in the
system keychain, referenced by host id; tunnel hosts store
only the SHA-256 hash of the enrollment token; SSH and
bucket hosts store no secret at all (auth is delegated to
the system ssh and the cloud credential chain).
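Storing only a SHA-256 hash of an enrollment token means the record on disk cannot be replayed as a credential. A minimal sketch of that idea follows, assuming a hex-encoded digest and a constant-time comparison; `mint_token` and `verify` are hypothetical helpers, not rupu functions.

```python
import hashlib
import hmac
import secrets


def mint_token():
    """Return (token, stored_hash). Only the hash would be persisted."""
    token = secrets.token_urlsafe(32)
    return token, hashlib.sha256(token.encode()).hexdigest()


def verify(presented: str, stored_hash: str) -> bool:
    """Hash the presented token and compare in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)


token, stored = mint_token()
```

The plaintext token is shown to the operator once and then discarded; later connections are checked by hashing what the node presents.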
Local — host[0]
The machine running the central CP is itself a host: the built-in
local entry, always present and always first.
It needs no registration and can't be removed. Single-host setups never
touch any of the transports below — everything just runs on
local.
HTTP federation
Federation has two sides. On the remote machine, run the
control plane so it's reachable from the center and guarded by a bearer
token. Bind to an address other than loopback, and set
--token so /api/*
requires Authorization: Bearer <token>:
# on the REMOTE machine: serve the control plane, bound + token-guarded
$ rupu cp serve --bind 0.0.0.0:7878 --token "$RUPU_CP_TOKEN" --no-open
Put a TLS-terminating reverse proxy in front for a real
https:// URL — remote base URLs are expected to
be HTTPS so the token is protected in transit. Then, on the
central machine, register that host with a display name,
the remote's base --url, and the
--token:
# on the CENTRAL machine: register the remote host
$ rupu host add prod --url https://host.example.com --token "$REMOTE_TOKEN"
host_01J9Z4W7Q0X8Y6V5K3M2N1P0R8
To keep the token out of your shell history and process list, read it from
stdin instead with --token-stdin:
# pipe the token in on stdin (mutually exclusive with --token)
$ pass show rupu/prod | rupu host add prod --url https://host.example.com --token-stdin
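From the client's side, federation amounts to sending the bearer token on every /api/* request and refusing to send it over plain HTTP. The sketch below only builds the request and does not send it. The `/api/runs` path is a made-up example (this page names only the /api/* prefix), and `build_request` is a hypothetical helper.

```python
import urllib.parse
import urllib.request


def build_request(base_url: str, path: str, token: str) -> urllib.request.Request:
    """Build an authenticated request; refuse non-HTTPS base URLs."""
    if urllib.parse.urlsplit(base_url).scheme != "https":
        raise ValueError("remote base URL must be https so the token is protected")
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# "/api/runs" is an assumed path used only for illustration.
req = build_request("https://host.example.com", "/api/runs", "s3cret")
```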
SSH
The SSH transport targets a host that runs a full rupu install but exposes
no rupu cp serve and runs no dial-home agent —
yet you can already ssh to it. The central CP
connects out over SSH: it dispatches a run with
ssh host rupu workflow run … (detached on the
remote so it survives the SSH session), then keeps a long-lived
ssh … tail -f pump that mirrors the run's
artifact files back into the central run store — so the run shows up in the
same host-aware lists, detail, and live events as any other. Register it
with the --ssh destination:
# register an SSH host: destination is user@host or a ~/.ssh/config alias
$ rupu host add build-box --ssh deploy@build.example.com --port 22 --identity ~/.ssh/id_ed25519
host_01J9ZB3KQ8M2T0V4W6X8Y1A3C5
Authentication is delegated entirely to the system ssh: ssh-agent,
~/.ssh/config, default keys. The host record
keeps only the destination, optional --port, and
optional --identity path. Because the
destination can be a ~/.ssh/config alias,
ProxyJump / ControlMaster / keys all come for free from your SSH config.
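Since the host record holds just a destination plus an optional port and identity path, turning it into an ssh invocation is mostly argument assembly. The following is a sketch under that assumption: `ssh_argv` is a hypothetical helper, and the remote command's arguments are placeholders, because this page elides the real ones. The `-p` and `-i` flags are standard OpenSSH options.

```python
import shlex


def ssh_argv(destination, remote_command, port=None, identity=None):
    """Assemble an ssh argument vector from a host record's fields."""
    argv = ["ssh"]
    if port is not None:
        argv += ["-p", str(port)]
    if identity is not None:
        argv += ["-i", identity]
    argv.append(destination)
    # ssh hands the remote command to the remote shell as one string,
    # so quote each argument before joining.
    argv.append(shlex.join(remote_command))
    return argv


# "my workflow" is a placeholder argument, not a real rupu workflow name.
argv = ssh_argv(
    "deploy@build.example.com",
    ["rupu", "workflow", "run", "my workflow"],
    port=22,
    identity="/home/deploy/.ssh/id_ed25519",
)
```

When the destination is a ~/.ssh/config alias, both optional fields can be left out and ssh picks everything up from the config.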
Tunnel (dial-home)
When a host is behind a NAT or firewall and can't expose any inbound port,
invert the direction: a lightweight rupu node
agent on the box dials out to the CP over a persistent
WebSocket, executes dispatched runs locally, and streams their artifacts
back. The CP reaches the node through its node registry — no inbound ports,
no reachable server. Start on the central machine by
enrolling the node, which mints a one-time token and prints the exact
command to run on the box:
# on the CENTRAL machine: mint a node + one-time token
$ rupu node enroll build-box-01 --cp-url wss://cp.example.com
enrolled: build-box-01 (host_01J9ZC4M0N1P2Q3R4S5T6U7V8W)
⚠ token shown ONCE — copy it to the node now:
rupu node --cp-url wss://cp.example.com --token <token> --node-id host_01J9ZC4M0N1P2Q3R4S5T6U7V8W
Copy the token to the remote box and run the node agent there, reading the token from stdin so it never lands in shell history. The agent authenticates, stays connected, and reconnects with backoff if the link drops:
# on the REMOTE box: dial home and stay connected
$ rupu node --cp-url wss://cp.example.com --token-stdin < token.txt
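"Reconnects with backoff" is not specified further on this page. One common policy is capped exponential backoff, sketched below as an assumption about the general technique rather than a description of what the rupu node agent actually does. The function name and all parameter values are illustrative.

```python
import random


def backoff_delays(base=1.0, cap=60.0, attempts=6, jitter=0.0, rng=random.random):
    """Yield the wait before each reconnect attempt: base * 2**n, capped.

    jitter adds up to `jitter * delay` of random extra wait so that many
    nodes do not all reconnect at the same instant.
    """
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        yield delay + delay * jitter * rng()


# With jitter disabled the schedule is deterministic.
delays = list(backoff_delays(base=1.0, cap=10.0, attempts=5))
```

With these example parameters the waits double from one second until they hit the ten-second cap.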
Bucket (pull / dead-drop)
The most decoupled transport: for a host the CP can't reach at all and which can't reach the CP either. Both sides independently talk to a shared object-store bucket that acts as the gateway. The CP writes dispatched work into the bucket; the worker polls it, atomically claims a job, runs it locally, and writes results back; a CP-side poller reads those results and mirrors them in. Register the host on the central machine with a bucket URL and optional prefix:
# on the CENTRAL machine: register a bucket (dead-drop) host
$ rupu host add nightly --bucket s3://my-bucket --prefix rupu/host-1
host_01J9ZD5N1P2Q3R4S5T6U7V8W9X
On the worker box, run the pull agent against the same
bucket and prefix. It claims jobs, runs them locally, and writes results
back — no inbound or outbound connectivity to the CP required. By default
it loops forever, polling on an interval; pass
--once to drain whatever is queued a single time
and exit (handy for cron / batch hosts):
# on the WORKER box: poll the bucket, claim + run jobs, write results back
$ rupu node pull --bucket s3://my-bucket --prefix rupu/host-1
# or drain once and exit (cron / batch)
$ rupu node pull --bucket s3://my-bucket --prefix rupu/host-1 --once
The bucket URL can be s3://, gs://, or
file:// (a shared filesystem on a local
network). Credentials are resolved from the standard cloud credential chain
/ environment (AWS_*,
GOOGLE_*); rupu stores none. Control messages
(cancel / approve / reject) are queued in the bucket and applied on the
worker's next poll, since a dead-drop is inherently asynchronous.
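The "atomically claims a job" step is what keeps two workers from running the same job. For a file:// style bucket on a single POSIX filesystem, one way to get that property is an atomic rename, sketched below. The queue/claimed directory layout and the `claim_next` helper are assumptions for illustration; rupu's real bucket layout is not described here, and object stores such as S3 or GCS would need a different primitive (for example a conditional write).

```python
import os
import tempfile


def claim_next(queue_dir, claimed_dir, worker_id):
    """Try to claim one queued job by renaming it; return the new path or None."""
    for name in sorted(os.listdir(queue_dir)):
        src = os.path.join(queue_dir, name)
        dst = os.path.join(claimed_dir, f"{worker_id}.{name}")
        try:
            os.rename(src, dst)  # atomic within one POSIX filesystem
        except FileNotFoundError:
            continue  # another worker claimed it first; try the next job
        return dst
    return None


# Demo against a throwaway directory tree.
root = tempfile.mkdtemp()
queue = os.path.join(root, "queue")
claimed = os.path.join(root, "claimed")
os.makedirs(queue)
os.makedirs(claimed)
for name in ("job-1.json", "job-2.json"):
    with open(os.path.join(queue, name), "w") as fh:
        fh.write("{}")

first = claim_next(queue, claimed, "w1")
second = claim_next(queue, claimed, "w2")
third = claim_next(queue, claimed, "w1")  # queue is empty by now
```

Only one rename of a given source file can succeed, so a job ends up owned by exactly one worker even if several poll at once.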
Manage hosts
rupu host list shows every configured host. The
built-in local host is always first and
can't be removed — it's host[0], "this host." Each row is the host
id, name, and a transport label (the base URL for HTTP, or
ssh:… / tunnel:… /
bucket:… for the others):
$ rupu host list
local local local
host_01J9Z4W7Q0X8Y6V5K3M2N1P0R8 prod https://host.example.com
host_01J9ZB3KQ8M2T0V4W6X8Y1A3C5 build-box ssh:deploy@build.example.com:22
host_01J9ZC4M0N1P2Q3R4S5T6U7V8W build-box-01 tunnel:host_01J9ZC4M0N1P2Q3R4S5T6U7V8W
host_01J9ZD5N1P2Q3R4S5T6U7V8W9X nightly bucket:s3://my-bucket
Remove a remote host by its id with rupu host
remove. This deletes the TOML record and any keychain token;
removing local is refused:
$ rupu host remove host_01J9Z4W7Q0X8Y6V5K3M2N1P0R8
removed host_01J9Z4W7Q0X8Y6V5K3M2N1P0R8
The same registry is exposed in the browser — adding and removing hosts in the Control Plane writes the exact same TOML records and keychain entries, so the CLI and the CP are always looking at one shared fleet.
In the Control Plane
With hosts registered, the central Control Plane becomes fleet-aware. Everything you can do on the local host you can now do on a remote one, whichever transport reaches it.
- Fleet → Hosts. A new top-level surface listing every host (name, transport, status, version, active runs, last seen), with local pinned first and an Add Host form that registers HTTP, SSH, and bucket hosts right from the browser (tunnel nodes enroll via the CLI). Each host's status is probed live; a host's detail view scopes the run list to that one host plus its connection and health info.
- Host attribution + filtering. Run lists gain a Host column and filter so you can see at a glance where each run is executing. Runs are addressed by (host_id, run_id), so deep-links stay stable; run detail shows the owning host. The default view stays "this host," so single-host setups are completely unchanged.
- Host selector in the launcher. The launcher gains a host picker (defaulting to local). Launch a workflow run, an agent run, or a session on whichever host you choose, and send session turns to it, all proxied to that host's launch endpoints.
- Observe a host's runs. List, detail, the live event stream, and the transcript of any remote run all surface in the central CP — proxied straight through for HTTP hosts, mirrored into the central run store for tunnel / SSH / bucket hosts. The same live view works whether the run is local or three hosts away.
- Control remote runs. Approve, reject, and cancel a paused or in-flight run on the host that owns it — over whichever transport that host uses (queued in the bucket for dead-drop hosts).
- Graceful degradation. When a host is unreachable, the run-list fan-out tolerates it per-host: that host shows an offline marker and the rest of the page stays intact. Launch or control against an offline host returns a clear error, and live streams auto-reconnect.
Fleet control requires the rupu cp serve runtime: that's what installs the
host registry with its transport connectors and the mirror / pull workers.
The read-only rupu cp can show hosts but cannot
control them.
Distribute work across the fleet
Beyond running a whole workflow on one host, a single workflow run can
spread a for_each step's units across the
fleet. Add a distribute: block naming the
hosts; rupu assigns the units round-robin, dispatches each to
its host, attributes the unit's run to that host, and aggregates the results
back into the step — partial failures honored per
continue_on_error. It works over every transport
(HTTP / SSH / tunnel / bucket).
- id: review_each
  agent: code-reviewer
  for_each: "{{ inputs.files }}"
  distribute:
    hosts: [gpu-box, build-box]
  prompt: "Review {{ item }}."
See Workflows for the step format. Host-placement of individual file-mutating steps (which needs cross-host workspace sync) is still in progress — see below.
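Round-robin assignment simply cycles through the listed hosts in order. A minimal sketch of that assignment follows, using the two host names from the step above and a hypothetical list of files as the for_each units. `assign_round_robin` is an illustrative helper, not rupu's scheduler.

```python
def assign_round_robin(units, hosts):
    """Pair each unit with a host, cycling through hosts in order."""
    if not hosts:
        raise ValueError("distribute needs at least one host")
    return [(unit, hosts[i % len(hosts)]) for i, unit in enumerate(units)]


# The file names are made-up inputs for the example.
plan = assign_round_robin(["a.rs", "b.rs", "c.rs"], ["gpu-box", "build-box"])
```

With three units and two hosts, the first and third units land on gpu-box and the second on build-box.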
Choosing a transport
All five transports satisfy the same port, so pick by how the host can be reached on the network — not by what features you want:
- The CP can reach the host directly. If the host can run a reachable, TLS-fronted server, use HTTP. If it can't run a server but you can already ssh to it, use SSH: no daemon to deploy, and auth comes from your SSH config.
- The host is behind a NAT or firewall. It can't accept an inbound connection but can dial out, so use the Tunnel: run rupu node on the box and it connects home.
- Neither side can reach the other. Air-gapped, locked down, or batch / cron hosts that only share an object store: use a Bucket dead-drop and run rupu node pull on the worker.
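The guidance above reduces to a short decision procedure keyed on reachability. The sketch below encodes it as a plain function. The parameter names are assumptions chosen for the example, and the ordering (HTTP before SSH) simply mirrors the order in which the options are presented here.

```python
def choose_transport(is_central, can_serve_tls, can_ssh, can_dial_out):
    """Pick a transport from how the host can be reached on the network."""
    if is_central:
        return "local"   # host[0]: nothing to register
    if can_serve_tls:
        return "http"    # the CP reaches a TLS-fronted rupu cp serve
    if can_ssh:
        return "ssh"     # the CP connects out; no server on the host
    if can_dial_out:
        return "tunnel"  # the host dials home from behind NAT / firewall
    return "bucket"      # neither side reaches the other


choice = choose_transport(
    is_central=False, can_serve_tls=False, can_ssh=False, can_dial_out=True
)
```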
Still coming
All five transports above are shipped today. The
architecture was built around one transport port
(HostConnector) so the remaining work slots in
without disturbing the surfaces above. These are genuinely
not done yet:
| Coming later | What it adds |
|---|---|
| Per-step host placement & workspace sync | A host: on an individual (non-fan-out) step, plus cross-host workspace / file sync so file-mutating steps can run remotely. (Distributing a for_each step's units across hosts already ships — see Distribute work across the fleet above.) |
| Auto-placement | Capability-based scheduling — picking the host for a run automatically from its advertised backends and capabilities. |
| Remote sessions over the tunnel | Interactive, long-lived sessions dispatched to a tunnel node (a persistent session worker + reconnect-stable mapping). Today's tunnel covers workflow / agent runs and approve / reject / resume; sessions are deferred. |
| mTLS hardening | Mutual-TLS node enrollment (cert CN = node id) on top of today's token-hash auth — the frame envelope already carries an auth block so it slots in with no protocol change. |