3 Caddy Integration
Mathew Storm edited this page 2026-05-26 13:13:28 -04:00

Caddy Integration

Storm Pulse can manage a Caddy reverse-proxy on the same VPS as a "drop-in" fragment owner — the agent receives a Caddyfile fragment from the dashboard (Storm Cellar), writes it atomically to a path that the main Caddyfile imports, and tells the running Caddy process to reload via its admin HTTP API. This is the mechanism behind per-region custom-domain hosting for customer buckets.

Overview

When Caddy integration is enabled, Storm Pulse:

  • Registers a single long-running command, cellar_custom_domain_caddy_sync, that accepts a Caddyfile fragment from the dashboard and applies it atomically.
  • Talks to Caddy over its admin HTTP API (POST /load with Content-Type: text/caddyfile). No docker exec, no docker-socket permissions, no container_name field — Caddy can be running natively, in Docker, or anywhere else reachable on admin_url.
  • Persists the fragment to a drop-in file (e.g. /etc/caddy/conf.d/cellar-custom-domains.caddy) so a Caddy restart preserves the active configuration.
  • Refuses to start the agent if the main Caddyfile does not actually import the drop-in path — see Boot-time verification.

The agent does not push Caddy state to the dashboard. There is no caddy field in the register or metrics.push payloads. The dashboard owns the manifest of which domains should be live; the agent's role is to apply the fragment it's told to apply.

Requirements

  • A running Caddy 2.x with its admin API enabled (default: http://localhost:2019).
  • A main Caddyfile on the host that contains an import directive for the drop-in path the agent will write to. Exact-path imports and fnmatch globs (import conf.d/*.caddy) both work.
  • Filesystem write access to the parent directory of the drop-in path. The agent runs as the operator's admin user; ensure that user can read and write the drop-in directory (typically by being in a group Caddy can also access).

Setup — stormpulse caddy init

After enrollment, on the VPS:

stormpulse caddy init

The wizard:

  1. Detects the main Caddyfile. Search paths in order: /etc/caddy/Caddyfile, /opt/caddy/Caddyfile, /opt/garage/Caddyfile. Override with --main-caddyfile /custom/path/Caddyfile.
  2. Prompts for the admin URL. Default: http://localhost:2019.
  3. Prompts for the drop-in path. Default: <main_caddyfile_dir>/conf.d/cellar-custom-domains.caddy.
  4. Verifies the import directive. Parses the main Caddyfile for import directives. Warns if none of them resolves to the chosen drop-in path. (The agent will refuse to start if this is still wrong at boot time — see below.)
  5. Writes the [caddy] section to ~/.config/stormpulse/stormpulse.toml.
  6. Offers to restart stormpulse so the new config takes effect.

Use --force to overwrite an existing [caddy] section. Use --config /custom/path/stormpulse.toml if your config lives outside ~/.config/stormpulse/.

Configuration

The [caddy] section added to ~/.config/stormpulse/stormpulse.toml:

[caddy]
enabled = true
admin_url = "http://localhost:2019"
main_caddyfile = "/etc/caddy/Caddyfile"
drop_in_path = "/etc/caddy/conf.d/cellar-custom-domains.caddy"

All four fields are required when the section is present.

Field Type Notes
enabled bool Must be true for the integration to activate.
admin_url string Must start with http:// or https://. Validated at config load.
main_caddyfile absolute path Must exist at config-load time. The agent reads it once at boot to check imports.
drop_in_path absolute path The agent writes here. The parent directory must exist; the file itself need not.

Main Caddyfile prerequisite

The main Caddyfile must import the drop-in path. Either form works:

# Exact path
import /etc/caddy/conf.d/cellar-custom-domains.caddy

# Glob (must match the drop-in filename)
import conf.d/*.caddy

The agent only writes the drop-in file. It does not modify the main Caddyfile. Wiring up the import is an operator step — done once, when Caddy is set up.

Boot-time verification

At every agent startup, before registering with the dashboard, the agent re-runs the import-directive check (verify_drop_in_imported). If no import directive in the main Caddyfile resolves to the configured drop_in_path, the agent refuses to start with a clear error:

ConfigError: Caddy configuration invalid: Main Caddyfile /etc/caddy/Caddyfile
does not import drop-in path /etc/caddy/conf.d/cellar-custom-domains.caddy.
Add an 'import' directive that resolves to that path.

This catches the silent failure mode where the fragment is written and Caddy is reloaded, but nothing in the served config actually references the fragment. Without this check, a misconfiguration looks healthy from the agent's side (writes succeed, reload returns 200) but customer domains stay dead. The check fails loudly at start instead of weeks later when a customer activation hangs.

The command: cellar_custom_domain_caddy_sync

This is the only command Caddy integration registers. It is long-running — it emits command.progress events between the originating command.request and the terminal command.result. See Protocol Specification — Long-running commands for the lifecycle pattern.

Required params

Param Validation Description
region regex [a-z0-9][a-z0-9-]{0,40}[a-z0-9] Region identifier; used for logging and the JobOutcome extras.
fragment max_bytes = 150000 The Caddyfile fragment text. Empty string ("") means "remove the drop-in file" — used when the region has zero active custom domains.

The fragment param uses byte-size validation rather than regex. Caddyfile syntax is multi-line and contains braces, quotes, and arbitrary site blocks; a regex can't sanely cover it. The 150 KB cap is sized for ~1000 active custom domains per region (typical fragment is ~150 bytes per domain).

Lifecycle

command.request                                          (dashboard → agent)
command.progress  stage="starting"                       (agent → dashboard)
command.progress  stage="running"   message="persist"
command.progress  stage="running"   message="POST /load"
command.progress  stage="finalizing"
command.result                                           (terminal)
  1. Persist — the agent atomically writes (or removes, if fragment == "") the drop-in file. Atomic write is write-to-tmp + rename. If the write fails, the result is failure_reason: "persist_failed"; Caddy is never contacted, the running config stays as-is.
  2. POST /load — the agent reads the main Caddyfile, absolutises any relative import paths (Caddy's admin API resolves imports relative to Caddy's working directory, not the source file, so leaving them relative would break composed reloads), and POSTs the full contents to Caddy's admin API as Content-Type: text/caddyfile. Caddy re-adapts the composed configuration from disk — picking up the drop-in we just wrote alongside every other site block and global directive in the main Caddyfile. If Caddy rejects the reload, the result is failure_reason: "reload_failed"; the drop-in is on disk (newer than live), and the next successful sync restores consistency.

Posting only the per-region fragment to /load would replace the entire running config with just those domain blocks — wiping every other site the main Caddyfile declares until an operator-initiated restart. Sending the composed main-Caddyfile body is what makes the drop-in/import pattern actually work at runtime, not just across Caddy restarts.

Terminal payload

The command.result payload includes the standard fields plus these top-level extras:

Field Type Description
region string The region passed in. Echoed so the dashboard can route the result.
fragment_bytes int Size of the fragment that was applied (0 if removed).
removed bool True if fragment == "" and the drop-in file was unlinked.

Failure modes

failure_reason When Recovery
persist_failed Disk write to the drop-in path failed (permissions, disk full, drop-in parent directory removed since boot). Caddy was never contacted; the running config is unchanged. Operator-level: fix filesystem permissions or directory state, then resync from the dashboard.
reload_failed Drop-in was persisted, but the subsequent Caddy reload failed. Three sub-cases: the main Caddyfile was unreadable at reload time, Caddy's admin returned non-2xx (most commonly a parse error in the new composed config), or the admin endpoint was unreachable. Disk is newer than the running config. Dashboard inspects stderr (contains Caddy's actual error), corrects, retries. The next successful sync restores consistency. A Caddy restart also recovers (re-reads from disk-truth).

Idempotence

cellar_custom_domain_caddy_sync is fully idempotent. The dashboard sends the desired-state fragment for the region; the agent applies it. Re-sending the same fragment is a no-op for customers and harmless for Caddy (reload is atomic). Resending after a partial failure is the standard recovery path.

Caddy log shipping

Caddy access logs and cert-lifecycle events ship through the standard log pipeline with parser = "caddy_json". The parser extracts:

  • HTTP requestsclient_ip, method, host, path, status, user_agent, duration_ms, bytes_sent.
  • Cert lifecycle events — log lines from any tls.* logger (tls.obtain, tls.issue, tls.renew, etc.) pass through with logger, msg, identifier, names, and error fields preserved. The dashboard classifies these into cert-issued / cert-failed / cert-renewed events. The agent does not interpret them.

See Log Shipping for [[log_groups]] block syntax and the parser table.

Security

The agent talks to Caddy's admin API at admin_url (default http://localhost:2019). Caddy's admin API has no authentication by default — it relies on being bound to localhost only. The agent inherits that trust model: if the operator configures a non-localhost admin_url, they are responsible for restricting access (firewall, mTLS proxy in front of :2019, etc.).

The fragment param is opaque content sent from the dashboard. The agent does not parse or validate Caddyfile syntax — that's Caddy's job during POST /load. A malicious-looking fragment is rejected by Caddy and returned to the dashboard as reload_failed. The agent's max_bytes = 150000 cap exists to prevent the dashboard from accidentally shipping an oversized payload through the protocol; it is not a security boundary.

A compromised dashboard can rewrite the running Caddy config on every agent it controls. This is in scope for the trust model — the dashboard is the HMAC-trusted authority by design. See Security Architecture for the full layered model.

Troubleshooting

Symptom Check
Agent refuses to start: "Caddy configuration invalid" The main Caddyfile does not import the drop-in path. Add an import directive and restart.
reload_failed on every sync Fetch the latest sync's stderr from the dashboard — Caddy's parse error is there. Common cause: a customer domain block references a backend the agent's Caddy can't resolve.
persist_failed after successful reload The drop-in path's parent directory isn't writable by the agent's user. Check ownership: ls -la /etc/caddy/conf.d/.
Fragments apply but custom domains 404 The Caddy process serving requests is not the one whose admin API the agent talks to. Common in containerized setups — ensure admin_url and the Caddy serving the public ports are the same instance.
Cert-event log lines missing Is the Caddy log group enabled with parser = "caddy_json"? Are TLS events emitted at INFO level or above in Caddy's log.level?