official-public/storm-pulse

Protocol Specification

Storm Pulse uses a JSON-over-WebSocket protocol for all communication between agents and the dashboard. Every message shares a common envelope structure. The agent initiates all connections -- the dashboard never reaches out to agents.

Envelope

Every message on the wire is a JSON object with these fields:

Field	Type	Description
`v`	integer	Protocol version. Currently `1`.
`type`	string	One of the message types listed below.
`id`	string	Unique message ID (UUID v4).
`ts`	string	ISO 8601 timestamp with timezone. UTC uses `Z` suffix.
`agent_id`	string	Identifies the sending/receiving agent. Matches the certificate SAN and config.
`payload`	object	Message-specific data. Always a JSON object, even if empty.

Example:

{
  "v": 1,
  "type": "heartbeat",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "ts": "2026-02-21T12:00:00Z",
  "agent_id": "vps-toronto-01",
  "payload": {}
}

Validation rules

v must equal 1. Any other value is rejected immediately. When the protocol evolves, this field gates backward-incompatible changes.
type must be a recognized message type. Unknown types are rejected.
ts must include timezone information. Naive timestamps (no offset, no Z) are rejected.
agent_id must be a non-empty string.
payload must be a JSON object. Arrays, strings, and other types are rejected.
Extra fields on the envelope are silently ignored for forward compatibility. A v1 parser won't break if a future version adds fields.
Missing required fields are rejected immediately. Partial envelopes don't parse.
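
The rules above can be sketched as a single validation function. This is a minimal illustration, not the project's parser: `validate_envelope` and the `KNOWN_TYPES` set are hypothetical names, `ProtocolError` is defined here only so the example runs, and `datetime.fromisoformat` accepts only a subset of ISO 8601 on older Python versions.

```python
from datetime import datetime

# Message types named in this document, in both directions.
KNOWN_TYPES = frozenset({
    "heartbeat", "register", "metrics.push", "command.request",
    "command.sequence", "command.result", "command.progress", "log.batch",
    "register.ok", "heartbeat.ack", "metrics.ack", "command.result.ack",
    "log.batch.ack", "error",
})
REQUIRED_FIELDS = ("v", "type", "id", "ts", "agent_id", "payload")


class ProtocolError(ValueError):
    """Raised when an envelope breaks one of the validation rules."""


def validate_envelope(raw: dict) -> dict:
    """Return only the known envelope fields, or raise ProtocolError."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ProtocolError(f"missing required fields: {missing}")
    # type() rather than ==, so that True and 1.0 are not accepted as version 1.
    if type(raw["v"]) is not int or raw["v"] != 1:
        raise ProtocolError(f"unsupported protocol version: {raw['v']!r}")
    if not isinstance(raw["type"], str) or raw["type"] not in KNOWN_TYPES:
        raise ProtocolError(f"unknown message type: {raw['type']!r}")
    if not isinstance(raw["agent_id"], str) or not raw["agent_id"]:
        raise ProtocolError("agent_id must be a non-empty string")
    if not isinstance(raw["payload"], dict):
        raise ProtocolError("payload must be a JSON object")
    ts = raw["ts"]
    if not isinstance(ts, str):
        raise ProtocolError("ts must be a string")
    # Older fromisoformat() implementations do not accept the "Z" suffix.
    iso = ts[:-1] + "+00:00" if ts.endswith("Z") else ts
    try:
        parsed = datetime.fromisoformat(iso)
    except ValueError as exc:
        raise ProtocolError(f"ts is not ISO 8601: {ts!r}") from exc
    if parsed.utcoffset() is None:
        raise ProtocolError("ts must carry a timezone offset or Z")
    # Extra envelope fields are dropped here rather than rejected.
    return {name: raw[name] for name in REQUIRED_FIELDS}
```

Returning a copy that holds only the six known fields is one way to honour the "silently ignored" rule without letting unknown keys leak into later processing.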

Message types

`heartbeat`

Direction: Agent -> Dashboard
Interval: Every 30 seconds
Purpose: Confirms the WebSocket connection is alive.

Payload is always empty:

{"payload": {}}

If the dashboard receives no heartbeat for 90 seconds (3 missed intervals), it should consider the agent disconnected.
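
As a rough sketch of the dashboard-side liveness rule, assuming the dashboard records a monotonic timestamp for the last heartbeat of each agent (the function and constant names are hypothetical):

```python
HEARTBEAT_INTERVAL_S = 30
MISSED_INTERVALS_BEFORE_DISCONNECT = 3


def is_disconnected(last_heartbeat_at: float, now: float) -> bool:
    """True once 90 seconds have passed without a heartbeat."""
    silence = now - last_heartbeat_at
    return silence >= HEARTBEAT_INTERVAL_S * MISSED_INTERVALS_BEFORE_DISCONNECT
```

Both arguments would typically come from `time.monotonic()`, so that wall-clock adjustments on the dashboard host do not cause false disconnects.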

`register`

Direction: Agent -> Dashboard
When: Sent once on each new WebSocket connection, before any other message.

Field	Type	Description
`version`	string	Agent software version (e.g. `"0.1.0"`).
`pulse_token`	string	UUID from `Server.pulse_token` in the dashboard. Binds the connection to a specific server record.
`commands`	object or null	Command metadata dict. Keys are command names, values are metadata objects. `null` for backward compatibility with older agents.
`garage`	object or null	Initial Garage node state. Present when Garage integration is enabled. `null` for non-Garage nodes. See Garage Integration.
`log_groups`	array of strings or null	Names of enabled log groups this agent will ship (e.g. `["storage", "pulse"]`, or container names like `["web", "db", "caddy"]` when the Docker source type is used). `null` when log shipping is not configured. The dashboard uses this to know which groups to expect `log.batch` messages for.
`signoff_sealed`	boolean or null	Sign-off seal state on the host. `true` means sealed: `run_verify_block` is excluded from `commands` and the agent will refuse dispatch at runtime. `false` means unsealed: `run_verify_block` is present in `commands` and dispatch will execute. `null` for agents that predate the seal (the dashboard treats `null` as sealed-equivalent for safety). See Security Architecture — Sign-off seal.
`unsealed_since`	string (ISO 8601) or null	Wall-clock UTC when the current unseal episode began. Present only when `signoff_sealed` is `false`. `null` when sealed, or when the operator removed the seal marker by hand without going through the CLI. The dashboard renders "unsealed for X" from this field.

Each command metadata object:

Field	Type	Description
`group`	string	Command group for UI sections (e.g. `"deploy"`, `"diagnostics"`).
`description`	string	Human-readable description for tooltips. May be empty.
`template`	array of strings	Display-safe command template. Absolute binary paths are stripped to basenames (e.g. `/usr/bin/docker` → `docker`). Placeholders like `{project_dir}` are preserved.
`timeout`	integer	Maximum execution time in seconds. For `long_running` commands, this is the per-batch / per-step timeout, not the overall job duration.
`requires_confirmation`	boolean	If true, the dashboard should show a confirmation dialog before sending.
`long_running`	boolean	If true, this command emits one or more `command.progress` events between the originating `command.request` and the terminal `command.result`, and may run for significantly longer than `timeout` suggests. The dashboard must hold pending state and render long-running UX accordingly. Omitted by older agents — treat as `false`.
`params`	object	Parameter definitions. Keys are parameter names, values are param metadata objects. Empty `{}` when no parameters.

Each param metadata object:

Field	Type	Description
`default`	string or null	Default value. `null` means no static default — a runtime override is required (or the value comes from config).
`pattern`	string	Regex pattern for validation (matched with `re.fullmatch`).
`description`	string	Human-readable description. May be empty.

{
  "payload": {
    "version": "0.1.0",
    "pulse_token": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "commands": {
      "git_pull": {
        "group": "deploy",
        "description": "Pull latest changes from remote",
        "template": ["git", "-C", "{project_dir}", "pull"],
        "timeout": 60,
        "requires_confirmation": false,
        "long_running": false,
        "params": {}
      },
      "docker_logs": {
        "group": "diagnostics",
        "description": "Show recent service logs",
        "template": ["docker", "compose", "--env-file", "{env_file}", "-f", "{compose_file}", "logs", "--tail", "{tail_lines}", "{docker_service_name}"],
        "timeout": 30,
        "requires_confirmation": false,
        "long_running": false,
        "params": {
          "docker_service_name": {
            "default": null,
            "pattern": "[a-zA-Z0-9_-]+",
            "description": "Docker Compose service name"
          },
          "tail_lines": {
            "default": "100",
            "pattern": "[0-9]{1,5}",
            "description": "Number of log lines to show"
          }
        }
      }
    }
  }
}

The dashboard looks up the server by pulse_token and may reject the connection if the token is unknown or the agent version is too old. When commands is present, the dashboard stores the metadata on the server record so the frontend can render command buttons with groups, descriptions, confirmation prompts, and parameter input fields. The Deploy button is disabled if any sequence step is missing from the command keys.
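
The param metadata rules (a `null` default needs a runtime override, and every value must match `pattern` under `re.fullmatch`) might be applied as in the sketch below. `resolve_params` is a hypothetical helper. Two simplifications are assumptions of this example rather than documented behavior: overrides for undeclared parameters are refused, and values that would come from agent config are not modelled.

```python
import re


def resolve_params(param_defs: dict, overrides: dict) -> dict:
    """Merge runtime overrides with defaults and validate each value."""
    unknown = sorted(set(overrides) - set(param_defs))
    if unknown:
        # Assumption: overrides the command did not declare are refused.
        raise ValueError(f"undeclared parameters: {unknown}")
    resolved = {}
    for name, meta in param_defs.items():
        value = overrides.get(name, meta["default"])
        if value is None:
            # Simplification: config-sourced values are not modelled here.
            raise ValueError(f"parameter {name!r} has no default; override required")
        if re.fullmatch(meta["pattern"], value) is None:
            raise ValueError(f"parameter {name!r} does not match {meta['pattern']!r}")
        resolved[name] = value
    return resolved
```

With the `docker_logs` metadata above, passing only `{"docker_service_name": "web"}` would resolve `tail_lines` to its default of `"100"`.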

`metrics.push`

Direction: Agent -> Dashboard
Interval: Configurable, default 15 seconds.

Field	Type	Description
`cpu_percent`	float	CPU usage percentage (0.0 - 100.0).
`memory_percent`	float	Memory usage percentage.
`memory_used_mb`	float	Used memory in megabytes.
`memory_total_mb`	float	Total memory in megabytes.
`disk_percent`	float	Root filesystem usage percentage.
`disk_used_gb`	float	Used disk space in gigabytes.
`disk_total_gb`	float	Total disk space in gigabytes.
`load_avg_1m`	float	1-minute load average.
`load_avg_5m`	float	5-minute load average.
`uptime_seconds`	float	System uptime in seconds.
`containers`	array	List of container status objects. May be empty.
`garage`	object or null	Garage node state snapshot. Present when Garage integration is enabled. `null` for non-Garage nodes. See Garage Integration.

Each container object:

Field	Type	Description
`name`	string	Container name.
`status`	string	Container status (e.g. `"running"`, `"exited"`).
`image`	string	Image name and tag.

{
  "payload": {
    "cpu_percent": 23.5,
    "memory_percent": 61.2,
    "memory_used_mb": 1245.0,
    "memory_total_mb": 2048.0,
    "disk_percent": 45.0,
    "disk_used_gb": 18.2,
    "disk_total_gb": 40.0,
    "load_avg_1m": 0.52,
    "load_avg_5m": 0.78,
    "uptime_seconds": 864000.0,
    "containers": [
      {"name": "web", "status": "running", "image": "myapp:latest"}
    ]
  }
}

`command.request`

Direction: Dashboard -> Agent
Purpose: Execute a single whitelisted command.

Field	Type	Description
`command`	string	Command name from the registry (e.g. `"git_pull"`).
`params`	object	`dict[str, str]` of runtime parameter overrides. Empty `{}` when no overrides.
`hmac`	string	HMAC-SHA256 hex digest over the canonical request string (see below).
`nonce`	string	Unique random string. Prevents replay attacks.

{
  "payload": {
    "command": "docker_logs",
    "params": {"docker_service_name": "celery", "tail_lines": "50"},
    "hmac": "a1b2c3d4e5f6...",
    "nonce": "8f3a9b7c-unique-random"
  }
}

HMAC canonical format:

v1\n{command}\n{params_canonical}\n{nonce}\n{timestamp}

Where params_canonical is the key=value pairs sorted by key and joined by & (e.g. docker_service_name=celery&tail_lines=50). When params is empty, the canonical string contains an empty component between the separators: v1\ngit_pull\n\nnonce\ntimestamp.
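
A minimal sketch of building and signing the canonical string follows. The function names are hypothetical, and three details are assumptions the text does not settle: the string is encoded as UTF-8 before signing, `timestamp` is the envelope's `ts` value, and keys and values are joined without any escaping.

```python
import hashlib
import hmac


def canonical_request(command: str, params: dict[str, str],
                      nonce: str, timestamp: str) -> str:
    """Build the newline-separated string that the HMAC covers."""
    # Sort by key so both sides produce the same string for the same params.
    params_canonical = "&".join(f"{key}={value}" for key, value in sorted(params.items()))
    return "\n".join(["v1", command, params_canonical, nonce, timestamp])


def sign_request(key: bytes, command: str, params: dict[str, str],
                 nonce: str, timestamp: str) -> str:
    """Return the HMAC-SHA256 hex digest over the canonical string."""
    message = canonical_request(command, params, nonce, timestamp).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()
```

An empty `params` dict yields an empty third component, which matches the `v1\ngit_pull\n\nnonce\ntimestamp` form described above.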

The agent verifies the HMAC, checks the nonce hasn't been seen, confirms the timestamp is within the configured expiry window, validates runtime params against the command's ParamDef declarations, and only then executes the command.
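
The first three checks could be arranged as in the sketch below, which keeps the order given in the text. `RequestGate` is a hypothetical class. The symmetric window around the agent's clock and the unbounded in-memory nonce set are assumptions made for brevity, and param validation (the fourth check) is left out.

```python
import hmac
from datetime import datetime, timedelta


class RequestGate:
    """Runs the pre-execution checks in the order the text lists them."""

    def __init__(self, expiry_window: timedelta):
        self._expiry_window = expiry_window
        self._seen_nonces: set[str] = set()

    def check(self, expected_hmac: str, received_hmac: str,
              nonce: str, ts: datetime, now: datetime) -> None:
        """Raise PermissionError if the request must not be executed."""
        # compare_digest runs in constant time for equal-length inputs.
        if not hmac.compare_digest(expected_hmac.encode(), received_hmac.encode()):
            raise PermissionError("HMAC mismatch")
        if nonce in self._seen_nonces:
            raise PermissionError("nonce already seen")
        # Assumption: the window is symmetric around the agent's clock.
        if abs(now - ts) > self._expiry_window:
            raise PermissionError("timestamp outside expiry window")
        # Record the nonce only once every check has passed.
        self._seen_nonces.add(nonce)
```

A longer-lived agent would likely prune nonces older than the expiry window, since a request that old is refused on its timestamp anyway.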

`command.sequence`

Direction: Dashboard -> Agent
Purpose: Execute an ordered sequence of commands (the "Deploy" button).

Field	Type	Description
`sequence_id`	string	UUID identifying this sequence. All results reference it.
`commands`	array of strings	Ordered command names to execute.
`stop_on_failure`	boolean	If true, halt the sequence on the first failed step.
`hmac`	string	HMAC-SHA256 hex digest over the full payload.
`nonce`	string	Unique random string.

{
  "payload": {
    "sequence_id": "a1b2c3d4-5678-9012-3456-789012345678",
    "commands": ["git_pull", "docker_build", "docker_down", "docker_up"],
    "stop_on_failure": true,
    "hmac": "f6e5d4c3b2a1...",
    "nonce": "seq-nonce-unique"
  }
}

The dashboard sends the command list -- there is no hardcoded default sequence. All command names are validated against the registry before any execution begins. A typo in the last step won't cause earlier steps to run and then fail.
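
The validate-everything-first behavior can be illustrated with a short sketch. `run_sequence` and its `execute` callback are hypothetical; the callback stands in for whatever actually runs a command and reports success.

```python
from typing import Callable


def run_sequence(commands: list[str], registry: dict,
                 execute: Callable[[str], bool],
                 stop_on_failure: bool = True) -> list[tuple[str, bool]]:
    """Validate every name up front, then run the steps in order."""
    unknown = [name for name in commands if name not in registry]
    if unknown:
        # Nothing has executed yet, so a typo in any step aborts cleanly.
        raise ValueError(f"unknown commands: {unknown}")
    results = []
    for name in commands:
        succeeded = execute(name)
        results.append((name, succeeded))
        if stop_on_failure and not succeeded:
            break
    return results
```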

`command.result`

Direction: Agent -> Dashboard
When: After each command execution (individual or within a sequence).

Field	Type	Description
`request_id`	string	UUID of the originating command request, or a per-step UUID for sequences.
`command`	string	The command name that was executed.
`group`	string	Command group (e.g. `"deploy"`).
`success`	boolean	Whether the command exited with code 0.
`exit_code`	integer	Process exit code. `-1` for timeout, binary not found, or OS error.
`stdout`	string	Standard output from the command.
`stderr`	string	Standard error from the command.
`duration_ms`	integer	Execution time in milliseconds.
`sequence_id`	string or null	If this result is part of a sequence, the sequence UUID. Null for individual commands.
`failure_reason`	string or null	Null on success. Categorizes the failure mode when `success` is false.

failure_reason values:

Value	Meaning
`null`	Command succeeded (exit code 0).
`"exit_code"`	Command ran but exited non-zero. Check `exit_code` and `stderr` for details.
`"timeout"`	Command exceeded its timeout. `stdout`/`stderr` may contain partial output.
`"not_found"`	The command binary doesn't exist on this system (e.g. Docker not installed).
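`"os_error"`	OS-level failure (permissions, resource limits, etc.). `stderr` contains the error message.
`"signoff_sealed"`	The agent refused to run `run_verify_block` because the host is sealed. Returned with `exit_code: -1`. See Sealed commands.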

{
  "payload": {
    "request_id": "b2c3d4e5-6789-0123-4567-890123456789",
    "command": "git_pull",
    "group": "deploy",
    "success": true,
    "exit_code": 0,
    "stdout": "Already up to date.\n",
    "stderr": "",
    "duration_ms": 342,
    "sequence_id": "a1b2c3d4-5678-9012-3456-789012345678",
    "failure_reason": null
  }
}

During a deploy sequence, each step produces its own command.result as it completes. The dashboard receives real-time progress -- it doesn't wait for the full sequence to finish.

For commands with long_running: true in their registered metadata, the command.result is the terminal event — preceded by one or more command.progress events — and may arrive significantly later than the originating command.request. Dashboards must hold the pending state for these commands beyond the static timeout and treat agent disconnect mid-job distinctly from generic timeout. See command.progress and Long-running commands below.

`command.progress`

Direction: Agent -> Dashboard
When: During execution of a long_running command. Emitted at least once (the initial stage="starting" event) and as often as is useful while the job runs. Terminated by exactly one command.result.

Field	Type	Description
`request_id`	string	UUID of the originating `command.request`. Correlates the progress event back to the pending command.
`command`	string	The command name (same as in the originating request).
`group`	string	Command group (e.g. `"garage"`).
`stage`	string	One of `"starting"`, `"running"`, `"finalizing"`. See table below.
`current`	integer	Units of work completed so far (e.g. objects deleted). `0` in the initial `starting` event.
`total`	integer or null	Total units of work, when known. `null` when the agent does not yet know the total (e.g. before an initial listing pass completes).
`message`	string	Human-readable status line for the dashboard to display. May be empty.
`rate_bytes_per_sec`	integer or null	Instantaneous transfer rate. Emitted by transfer commands (`rclone_migrate`) from rclone's `core/stats`. `null` on every non-transfer command.
`eta_seconds`	integer or null	Estimated seconds remaining for this command. `null` on non-transfer commands, and also `null` early in a transfer, before rclone has enough samples to estimate.
`objects_current`	integer or null	Objects/files completed so far. Distinct from `current`, which counts bytes on a transfer command. `null` on non-transfer commands.
`objects_total`	integer or null	Total objects/files to transfer. `null` on non-transfer commands.

The four transfer fields are optional and were added after the initial v: 1 protocol. They needed no version bump: command.progress only ever flows agent → dashboard, an agent that predates them simply omits them, and both sides ignore payload keys they do not recognize. They are aggregates only. The agent never places a filename or a per-object record on this channel, even though rclone's own stats object carries one.

stage values:

Value	Meaning
`"starting"`	Sent exactly once at the very beginning, after the agent has accepted the job and before any real work begins. `current` is `0`; `total` may still be `null`.
`"running"`	Sent during the body of the job, typically once per batch or per meaningful chunk of work. `current` advances; `total` is set once known.
`"finalizing"`	Optional. Sent when the job is past the bulk of its work but is still doing cleanup (final reconciliation, summary computation) before the terminal `command.result`.

{
  "payload": {
    "request_id": "b2c3d4e5-6789-0123-4567-890123456789",
    "command": "garage_bucket_clear",
    "group": "garage",
    "stage": "running",
    "current": 2000,
    "total": 5000,
    "message": "Deleted batch 2 of 5"
  }
}

Heartbeats continue normally during a long-running job — command.progress does not substitute for heartbeat. If the dashboard sees no progress for an unusually long stretch but heartbeats continue, the agent is alive and the job is simply slow (typically a Garage performance signal, not an agent fault).

`log.batch`

Direction: Agent -> Dashboard
Interval: Per-group, configured via ship_interval_seconds (minimum 5s, default 10s).
Purpose: Ship a batch of parsed log entries from a tailed source.

Field	Type	Description
`group`	string	Log group name (e.g. `"storage"`, `"pulse"`, `"network"`). Matches a name advertised in the `register` payload's `log_groups`.
`parser`	string	Parser used for these entries: `"garage_s3"`, `"stormpulse"`, `"caddy_json"`, or `"docker_raw"`. Tells the dashboard which schema each line follows.
`batch_id`	string	UUID identifying this batch. Included in the matching `log.batch.ack` so the agent can advance the file position.
`lines`	array of objects	Parsed log entries. Each line's schema depends on `parser`. Empty when only `dropped > 0`. Maximum 200 entries per batch.
`dropped`	integer	Count of lines discarded this interval (parse failures, oversize lines, etc.). Useful for monitoring source health.
`from_position`	integer or string	Position marker at the start of this batch. Byte offset for file sources; ISO 8601 timestamp for docker sources.
`to_position`	integer or string	Position marker after the last line in this batch. Same type as `from_position`. The agent advances its stored position to this value only after receiving `log.batch.ack`.

{
  "payload": {
    "group": "storage",
    "parser": "garage_s3",
    "batch_id": "c7d8e9f0-1234-5678-9012-345678901234",
    "lines": [
      {
        "ts": "2026-04-10T13:23:51.766230Z",
        "client_ip": "71.19.243.102",
        "key_id": "GKc8a2eafe464b4754187172d0",
        "method": "HEAD",
        "bucket": "usr-1-obsidian-vault",
        "object_key": "",
        "truncated": false
      }
    ],
    "dropped": 0,
    "from_position": 12480,
    "to_position": 12612
  }
}

The agent buffers each batch's (group, to_position) keyed by batch_id until the dashboard acknowledges it. If no log.batch.ack arrives within 30 seconds, the entry is discarded — the un-advanced file position means the next interval will re-read the same lines, ensuring at-least-once delivery without retry logic. This also means the agent gracefully no-ops if the dashboard doesn't yet handle log.batch: positions stay at 0, batches keep being sent, and once the dashboard ships an implementation, the backlog drains naturally.
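
The buffering described above might look like the following sketch. `PendingBatches` and its method names are hypothetical; the caller supplies the clock value, for example from `time.monotonic()`.

```python
from typing import Union

Position = Union[int, str]  # byte offset for files, ISO 8601 string for docker


class PendingBatches:
    """Tracks unacknowledged batches and the stored position per group."""

    def __init__(self, ack_timeout_s: float = 30.0):
        self._ack_timeout_s = ack_timeout_s
        self._pending: dict[str, tuple[str, Position, float]] = {}
        self.positions: dict[str, Position] = {}

    def record_sent(self, batch_id: str, group: str,
                    to_position: Position, now: float) -> None:
        self._pending[batch_id] = (group, to_position, now)

    def on_ack(self, batch_id: str) -> bool:
        """Advance the group's position; return False for unknown batches."""
        entry = self._pending.pop(batch_id, None)
        if entry is None:
            return False  # unknown or already timed out: ignore
        group, to_position, _sent_at = entry
        self.positions[group] = to_position
        return True

    def expire(self, now: float) -> list[str]:
        """Drop entries older than the ack timeout without advancing."""
        stale = [batch_id for batch_id, (_, _, sent_at) in self._pending.items()
                 if now - sent_at >= self._ack_timeout_s]
        for batch_id in stale:
            del self._pending[batch_id]
        return stale
```

Because `expire` never touches `positions`, the next shipping interval starts from the old position and re-reads the same lines, which is what gives at-least-once delivery.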

Long-running commands

Some commands perform work that cannot complete inside a normal request-response timeout — bulk object deletes, large data transfers, repair operations. The protocol supports these via the long_running flag on command metadata and the command.progress message type.

The dashboard recognizes a long-running command from the long_running: true field in its registered metadata (sent in the register payload). For these commands, the lifecycle expands from two events to three or more:

Event	Direction	When
`command.request`	Dashboard → Agent	Initial dispatch, same as a normal command.
`command.progress` (stage `"starting"`)	Agent → Dashboard	Sent immediately after the agent accepts the job. Confirms the agent is working on it.
`command.progress` (stage `"running"`)	Agent → Dashboard	Zero or more times during the body of the work. Typically once per batch or per meaningful chunk.
`command.progress` (stage `"finalizing"`)	Agent → Dashboard	Optional. Sent during cleanup before the terminal result.
`command.result`	Agent → Dashboard	Exactly once, terminal. Carries success/failure and the final summary.

All events for a single job share the same request_id (the UUID of the originating command.request's envelope, surfaced as request_id in the result and progress payloads).

Dashboard responsibilities for long-running commands:

Hold the pending state beyond the static timeout. A reasonable upper bound is one hour for v1; revisit if real workloads exceed this.
Render long-running UX (progress bar, stage message, elapsed time) in place of the synchronous "running…" indicator.
Treat agent disconnect mid-job as a distinct failure mode from timeout. The job is dead — the dashboard should surface a clean "agent went offline mid-job" state and allow the user to retry. The protocol does not support resuming a long-running job across agent reconnects in v1.
Tolerate command.progress events interleaved with unrelated messages. Heartbeats, metrics, and log batches continue normally during a long-running job.

Agent responsibilities:

Emit command.progress with stage="starting" immediately on accepting the job, before any real work begins.
Emit progress at meaningful granularity — typically per-batch — without flooding the channel. Per-individual-object progress is too noisy for any realistic workload.
Continue sending heartbeats during the job. command.progress is additional signal, not a substitute.
Always emit exactly one terminal command.result. If the job fails, the result carries success: false and an appropriate failure_reason.
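
One way to picture the agent-side lifecycle is a generator that yields each outbound message in order. This is a sketch under assumptions: `run_batched_job` and `process_batch` are hypothetical, the work is modelled as a list of batch sizes, and only `OSError` is mapped to a failure result (other exception types would need their own mapping to keep the exactly-one-result guarantee).

```python
import time
from typing import Callable, Iterator


def run_batched_job(request_id: str, command: str, group: str,
                    batch_sizes: list[int],
                    process_batch: Callable[[int], None]) -> Iterator[tuple[str, dict]]:
    """Yield (message type, payload) pairs for one long-running job."""

    def progress(stage: str, current: int, total, message: str = "") -> tuple[str, dict]:
        return "command.progress", {
            "request_id": request_id, "command": command, "group": group,
            "stage": stage, "current": current, "total": total, "message": message,
        }

    started = time.monotonic()
    # Sent before any real work; the total is not known yet.
    yield progress("starting", 0, None)
    total, done, error = sum(batch_sizes), 0, None
    try:
        for index, size in enumerate(batch_sizes, start=1):
            process_batch(size)
            done += size
            # One event per batch, not per object, to avoid flooding.
            yield progress("running", done, total,
                           f"Finished batch {index} of {len(batch_sizes)}")
    except OSError as exc:
        error = str(exc)
    # Exactly one terminal result, whether the job succeeded or failed.
    yield "command.result", {
        "request_id": request_id, "command": command, "group": group,
        "success": error is None,
        "exit_code": 0 if error is None else -1,
        "stdout": f"processed {done} of {total}\n",
        "stderr": error or "",
        "duration_ms": int((time.monotonic() - started) * 1000),
        "sequence_id": None,
        "failure_reason": None if error is None else "os_error",
    }
```

Heartbeats are not shown; they would continue on their own timer while this generator is being consumed.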

Sealed commands

One command in the registry, run_verify_block, is gated by the host-side sign-off seal. Its argv carries opaque shell text on the wire (["/bin/bash", "-c", "{verify_command}"], 4 KiB cap, no regex on the payload), which breaks the whitelist-only execution contract of Layer 4. The seal is the layer that bounds when that's allowed. The seal lives on the host filesystem, not on the wire: the dashboard observes it via the signoff_sealed field in the register payload, but cannot toggle it. See Security Architecture — Sign-off seal for the threat model.

`run_verify_block`

Agents that advertise signoff_sealed MUST implement this command, and MUST exclude it from the register payload's commands map when sealed.

Request params (in the command.request envelope):

Field	Type	Required	Description
`verify_command`	string	yes	The shell command to execute. Sourced from a sign-off checklist row on the dashboard. 4 KiB byte cap. No semantic constraint on contents.

Result behavior (in the terminal command.result envelope):

Unsealed at dispatch time (signoff_sealed: false at last register): the agent re-stats the seal marker on receipt, executes verify_command via /bin/bash -c, captures stdout/stderr/exit code, and returns command.result with the standard fields. success is true iff exit was 0.
Sealed at dispatch time (signoff_sealed: true at last register, or seal marker present when the agent re-stats on receipt): the agent does NOT execute the command. It returns command.result with success: false, exit_code: -1, failure_reason: "signoff_sealed", and a stderr line pointing the operator at stormpulse signoff unseal. The dashboard re-syncs Server.signoff_sealed to true from this refusal and resets the corresponding checklist row to pending rather than failed.

Two-layer enforcement, both authoritative: build_registry() excludes the command at registry construction when sealed, and _handle_command_request re-stats the seal marker on receipt so an operator sealing mid-run takes effect immediately. The dashboard's own dispatch gate is cosmetic by design (prevents disabled buttons from sending no-op envelopes); the agent's check is the one that bounds execution.
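
The two layers can be sketched as follows. The function bodies and signatures here are illustrative assumptions, not the agent's actual `build_registry()` or request handler: the seal is modelled as a marker file whose presence means sealed, and `run` is a hypothetical callback that executes the command and returns `(exit_code, stdout, stderr)`.

```python
import os
from typing import Callable

SEALED_COMMAND = "run_verify_block"


def build_registry(all_commands: dict, seal_marker: str) -> dict:
    """Layer 1: leave the sealed command out of the advertised registry."""
    if os.path.exists(seal_marker):
        return {name: meta for name, meta in all_commands.items()
                if name != SEALED_COMMAND}
    return dict(all_commands)


def handle_run_verify_block(seal_marker: str,
                            run: Callable[[], tuple[int, str, str]]) -> dict:
    """Layer 2: re-stat the marker on receipt, then run or refuse."""
    if os.path.exists(seal_marker):
        # Sealed since registration, or sealed all along: never execute.
        return {"success": False, "exit_code": -1,
                "failure_reason": "signoff_sealed", "stdout": "",
                "stderr": "host is sealed; run: stormpulse signoff unseal"}
    exit_code, stdout, stderr = run()
    return {"success": exit_code == 0, "exit_code": exit_code,
            "failure_reason": None if exit_code == 0 else "exit_code",
            "stdout": stdout, "stderr": stderr}
```

Checking the marker again at dispatch is what lets an operator seal the host mid-run without waiting for the agent to re-register.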

Dashboard response types

The dashboard sends acknowledgement and error messages back to the agent. These use the standard envelope structure. With one exception, the agent logs them at debug level and takes no action: they exist so the dashboard can confirm receipt and so operators can trace the full message flow in logs. The exception is log.batch.ack, which the agent uses to advance its stored file position.

Type	Direction	Purpose
`register.ok`	Dashboard → Agent	Confirms registration succeeded.
`heartbeat.ack`	Dashboard → Agent	Confirms heartbeat received.
`metrics.ack`	Dashboard → Agent	Confirms metrics stored.
`command.result.ack`	Dashboard → Agent	Confirms result received.
`log.batch.ack`	Dashboard → Agent	Confirms a `log.batch` was stored. Carries `batch_id` so the agent can advance the matching file position.
`error`	Dashboard → Agent	Something went wrong (invalid token, unknown agent, etc.).

`register.ok`

Sent after a successful register. Payload is empty or may contain dashboard-defined fields (agents ignore the payload).

{"type": "register.ok", "payload": {}}

`heartbeat.ack`

Sent after each heartbeat. Payload is empty.

{"type": "heartbeat.ack", "payload": {}}

`metrics.ack`

Sent after each metrics.push. Payload is empty.

{"type": "metrics.ack", "payload": {}}

`command.result.ack`

Sent after each command.result. Payload is empty.

{"type": "command.result.ack", "payload": {}}

`log.batch.ack`

Sent after each log.batch is successfully stored. Unlike the other acks, the payload is not empty — it carries the batch_id so the agent can match the ack to a pending batch and advance its stored file position.

Field	Type	Description
`batch_id`	string	UUID of the `log.batch` being acknowledged. Must match the `batch_id` from the original batch payload.
`group`	string	Optional. Log group name, included for traceability. The agent doesn't require it.

{
  "type": "log.batch.ack",
  "payload": {
    "batch_id": "c7d8e9f0-1234-5678-9012-345678901234",
    "group": "storage"
  }
}

If the agent receives a log.batch.ack for an unknown batch_id (e.g. one that already timed out), it logs at debug and ignores it.

`error`

Sent when the dashboard encounters a problem with a message from the agent. The payload may include a human-readable message. The agent logs this at debug level -- it cannot take corrective action.

{"type": "error", "payload": {"message": "Unknown pulse_token"}}

Serialization

JSON encoding uses compact separators ("," and ":") with no whitespace padding.
Timestamps use ISO 8601 format with Z suffix for UTC (not +00:00).
Non-UTC offsets are preserved as-is (e.g. +05:30).
dataclasses.asdict() handles payload serialization for outbound agent messages.
Inbound messages are parsed as raw dicts on the envelope. Consuming code parses into typed payload dataclasses after matching on type.
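
A short sketch of the outbound rules, using only the standard library. `HeartbeatPayload`, `format_ts`, and `encode` are hypothetical names chosen for this example.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class HeartbeatPayload:
    """Hypothetical payload dataclass; heartbeat carries no fields."""


def format_ts(moment: datetime) -> str:
    """ISO 8601 with Z for UTC; other offsets are kept as-is."""
    if moment.utcoffset() is None:
        raise ValueError("naive timestamps are not allowed on the wire")
    text = moment.isoformat()
    return text[:-6] + "Z" if text.endswith("+00:00") else text


def encode(envelope: dict) -> str:
    """Serialize with compact separators and no whitespace padding."""
    return json.dumps(envelope, separators=(",", ":"))


wire = encode({
    "v": 1, "type": "heartbeat", "id": "550e8400-e29b-41d4-a716-446655440000",
    "ts": format_ts(datetime(2026, 2, 21, 12, 0, tzinfo=timezone.utc)),
    "agent_id": "vps-toronto-01", "payload": asdict(HeartbeatPayload()),
})
offset_ts = format_ts(
    datetime(2026, 2, 21, 17, 30, tzinfo=timezone(timedelta(hours=5, minutes=30))))
```

Here `wire` contains `"ts":"2026-02-21T12:00:00Z"` with no spaces after the separators, and `offset_ts` keeps its `+05:30` suffix.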

Versioning

The v field exists to support future protocol changes without breaking deployed agents.

Current version: 1

Rules:

Adding new fields to existing payloads is backward-compatible. Parsers ignore unknown fields.
Adding new message types requires a version bump only if old agents must understand them.
Changing the meaning of existing fields, removing fields, or changing the envelope structure requires incrementing v.
An agent that receives v > 1 rejects the message and logs a warning. It does not crash -- the connection stays open for messages it can understand.
The dashboard should track each agent's protocol version (from the register message) and avoid sending messages the agent can't parse.
New message types added within v1 (e.g. command.progress) are additive but not silently ignored. A current parser rejects unknown types with ProtocolError, so a dashboard receiving a type it doesn't know will fail that message. In practice this means: deploy dashboard updates before agents that emit new message types. New types only ride alongside opt-in features (such as long_running commands), so agents won't emit them unless an updated dashboard dispatched the work.

Enrollment Protocol

Enrollment is a separate HTTPS protocol used once to bootstrap an agent's credentials before it can connect over WebSocket. It does not use the envelope structure above.

Flow

Agent generates an EC P-256 keypair locally. The private key never leaves the machine.
Agent builds a CSR (Certificate Signing Request) with CN=<agent_id>, signed with the private key.
Agent POSTs the CSR and a one-time enrollment token to the dashboard.
Dashboard validates the token, signs the CSR with the private CA, and returns the signed cert, CA cert, and HMAC key.
Agent writes credentials to disk.

Request

POST /api/enroll/ over standard HTTPS (no client cert — the agent doesn't have one yet).

{
    "agent_id": "vps-toronto-01",
    "token": "one-time-enrollment-token",
    "csr_pem": "-----BEGIN CERTIFICATE REQUEST-----\n...\n-----END CERTIFICATE REQUEST-----\n"
}

Field	Type	Description
`agent_id`	string	Unique agent identifier. Must match the CSR's CN.
`token`	string	One-time enrollment token from the dashboard. Burned after use.
`csr_pem`	string	PEM-encoded CSR. Must contain `CN=<agent_id>` and a valid signature.

Response (200)

{
    "client_cert_pem": "-----BEGIN CERTIFICATE-----\n...",
    "ca_cert_pem": "-----BEGIN CERTIFICATE-----\n...",
    "hmac_key": "base64-encoded-32-byte-hmac-key"
}

Field	Type	Description
`client_cert_pem`	string	PEM-encoded client certificate (the signed CSR).
`ca_cert_pem`	string	PEM-encoded CA certificate for mTLS trust chain.
`hmac_key`	string	Base64-encoded HMAC-SHA256 shared secret for command authentication.
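
An agent might sanity-check the response before writing anything to disk. The sketch below is illustrative only: `parse_enroll_response` is a hypothetical helper, standard (not URL-safe) Base64 is assumed, and the 32-byte key length is inferred from the placeholder in the example response rather than stated as a requirement.

```python
import base64
import json

PEM_CERT_HEADER = "-----BEGIN CERTIFICATE-----"


def parse_enroll_response(body: str) -> dict:
    """Validate the 200 response and return the decoded credentials."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("response body must be a JSON object")
    for field in ("client_cert_pem", "ca_cert_pem", "hmac_key"):
        if not isinstance(data.get(field), str):
            raise ValueError(f"missing or non-string field: {field}")
    for field in ("client_cert_pem", "ca_cert_pem"):
        if not data[field].startswith(PEM_CERT_HEADER):
            raise ValueError(f"{field} is not a PEM certificate")
    # validate=True rejects characters outside the Base64 alphabet.
    key = base64.b64decode(data["hmac_key"], validate=True)
    if len(key) != 32:
        # Assumption: 32-byte keys, as the example placeholder suggests.
        raise ValueError(f"unexpected HMAC key length: {len(key)}")
    return {"client_cert_pem": data["client_cert_pem"],
            "ca_cert_pem": data["ca_cert_pem"],
            "hmac_key": key}
```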

Errors

Code	Meaning
`400`	Malformed request, invalid CSR PEM, or CSR CN does not match `agent_id`.
`401`	Invalid or expired enrollment token.
`409`	Agent ID already enrolled.

Error body: {"error": "human-readable message"}.