Table of Contents
Caddy Integration
Storm Pulse can manage a Caddy reverse-proxy on the same VPS as a "drop-in" fragment owner — the agent receives a Caddyfile fragment from the dashboard (Storm Cellar), writes it atomically to a path that the main Caddyfile imports, and tells the running Caddy process to reload via its admin HTTP API. This is the mechanism behind per-region custom-domain hosting for customer buckets.
Overview
When Caddy integration is enabled, Storm Pulse:
- Registers a single long-running command,
cellar_custom_domain_caddy_sync, that accepts a Caddyfile fragment from the dashboard and applies it atomically. - Talks to Caddy over its admin HTTP API (
POST /loadwithContent-Type: text/caddyfile). Nodocker exec, no docker-socket permissions, no container_name field — Caddy can be running natively, in Docker, or anywhere else reachable onadmin_url. - Persists the fragment to a drop-in file (e.g.
/etc/caddy/conf.d/cellar-custom-domains.caddy) so a Caddy restart preserves the active configuration. - Refuses to start the agent if the main Caddyfile does not actually import the drop-in path — see Boot-time verification.
The agent does not push Caddy state to the dashboard. There is no caddy field in the register or metrics.push payloads. The dashboard owns the manifest of which domains should be live; the agent's role is to apply the fragment it's told to apply.
Requirements
- A running Caddy 2.x with its admin API enabled (default:
http://localhost:2019). - A main
Caddyfileon the host that contains animportdirective for the drop-in path the agent will write to. Exact-path imports andfnmatchglobs (import conf.d/*.caddy) both work. - Filesystem write access to the parent directory of the drop-in path. The agent runs as the operator's admin user; ensure that user can read and write the drop-in directory (typically by being in a group Caddy can also access).
Setup — stormpulse caddy init
After enrollment, on the VPS:
stormpulse caddy init
The wizard:
- Detects the main Caddyfile. Search paths in order:
/etc/caddy/Caddyfile,/opt/caddy/Caddyfile,/opt/garage/Caddyfile. Override with--main-caddyfile /custom/path/Caddyfile. - Prompts for the admin URL. Default:
http://localhost:2019. - Prompts for the drop-in path. Default:
<main_caddyfile_dir>/conf.d/cellar-custom-domains.caddy. - Verifies the import directive. Parses the main Caddyfile for
importdirectives. Warns if none of them resolves to the chosen drop-in path. (The agent will refuse to start if this is still wrong at boot time — see below.) - Writes the
[caddy]section to~/.config/stormpulse/stormpulse.toml. - Offers to restart
stormpulseso the new config takes effect.
Use --force to overwrite an existing [caddy] section. Use --config /custom/path/stormpulse.toml if your config lives outside ~/.config/stormpulse/.
Configuration
The [caddy] section added to ~/.config/stormpulse/stormpulse.toml:
[caddy]
enabled = true
admin_url = "http://localhost:2019"
main_caddyfile = "/etc/caddy/Caddyfile"
drop_in_path = "/etc/caddy/conf.d/cellar-custom-domains.caddy"
All four fields are required when the section is present.
| Field | Type | Notes |
|---|---|---|
enabled |
bool | Must be true for the integration to activate. |
admin_url |
string | Must start with http:// or https://. Validated at config load. |
main_caddyfile |
absolute path | Must exist at config-load time. The agent reads it once at boot to check imports. |
drop_in_path |
absolute path | The agent writes here. The parent directory must exist; the file itself need not. |
Main Caddyfile prerequisite
The main Caddyfile must import the drop-in path. Either form works:
# Exact path
import /etc/caddy/conf.d/cellar-custom-domains.caddy
# Glob (must match the drop-in filename)
import conf.d/*.caddy
The agent only writes the drop-in file. It does not modify the main Caddyfile. Wiring up the import is an operator step — done once, when Caddy is set up.
Boot-time verification
At every agent startup, before registering with the dashboard, the agent re-runs the import-directive check (verify_drop_in_imported). If no import directive in the main Caddyfile resolves to the configured drop_in_path, the agent refuses to start with a clear error:
ConfigError: Caddy configuration invalid: Main Caddyfile /etc/caddy/Caddyfile
does not import drop-in path /etc/caddy/conf.d/cellar-custom-domains.caddy.
Add an 'import' directive that resolves to that path.
This catches the silent failure mode where the fragment is written and Caddy is reloaded, but nothing in the served config actually references the fragment. Without this check, a misconfiguration looks healthy from the agent's side (writes succeed, reload returns 200) but customer domains stay dead. The check fails loudly at start instead of weeks later when a customer activation hangs.
The command: cellar_custom_domain_caddy_sync
This is the only command Caddy integration registers. It is long-running — it emits command.progress events between the originating command.request and the terminal command.result. See Protocol Specification — Long-running commands for the lifecycle pattern.
Required params
| Param | Validation | Description |
|---|---|---|
region |
regex [a-z0-9][a-z0-9-]{0,40}[a-z0-9] |
Region identifier; used for logging and the JobOutcome extras. |
fragment |
max_bytes = 150000 |
The Caddyfile fragment text. Empty string ("") means "remove the drop-in file" — used when the region has zero active custom domains. |
The fragment param uses byte-size validation rather than regex. Caddyfile syntax is multi-line and contains braces, quotes, and arbitrary site blocks; a regex can't sanely cover it. The 150 KB cap is sized for ~1000 active custom domains per region (typical fragment is ~150 bytes per domain).
Lifecycle
command.request (dashboard → agent)
command.progress stage="starting" (agent → dashboard)
command.progress stage="running" message="persist"
command.progress stage="running" message="POST /load"
command.progress stage="finalizing"
command.result (terminal)
- Persist — the agent atomically writes (or removes, if
fragment == "") the drop-in file. Atomic write iswrite-to-tmp + rename. If the write fails, the result isfailure_reason: "persist_failed"; Caddy is never contacted, the running config stays as-is. - POST
/load— the agent reads the main Caddyfile, absolutises any relativeimportpaths (Caddy's admin API resolves imports relative to Caddy's working directory, not the source file, so leaving them relative would break composed reloads), and POSTs the full contents to Caddy's admin API asContent-Type: text/caddyfile. Caddy re-adapts the composed configuration from disk — picking up the drop-in we just wrote alongside every other site block and global directive in the main Caddyfile. If Caddy rejects the reload, the result isfailure_reason: "reload_failed"; the drop-in is on disk (newer than live), and the next successful sync restores consistency.
Posting only the per-region fragment to /load would replace the entire running config with just those domain blocks — wiping every other site the main Caddyfile declares until an operator-initiated restart. Sending the composed main-Caddyfile body is what makes the drop-in/import pattern actually work at runtime, not just across Caddy restarts.
Terminal payload
The command.result payload includes the standard fields plus these top-level extras:
| Field | Type | Description |
|---|---|---|
region |
string | The region passed in. Echoed so the dashboard can route the result. |
fragment_bytes |
int | Size of the fragment that was applied (0 if removed). |
removed |
bool | True if fragment == "" and the drop-in file was unlinked. |
Failure modes
failure_reason |
When | Recovery |
|---|---|---|
persist_failed |
Disk write to the drop-in path failed (permissions, disk full, drop-in parent directory removed since boot). Caddy was never contacted; the running config is unchanged. | Operator-level: fix filesystem permissions or directory state, then resync from the dashboard. |
reload_failed |
Drop-in was persisted, but the subsequent Caddy reload failed. Three sub-cases: the main Caddyfile was unreadable at reload time, Caddy's admin returned non-2xx (most commonly a parse error in the new composed config), or the admin endpoint was unreachable. Disk is newer than the running config. | Dashboard inspects stderr (contains Caddy's actual error), corrects, retries. The next successful sync restores consistency. A Caddy restart also recovers (re-reads from disk-truth). |
Idempotence
cellar_custom_domain_caddy_sync is fully idempotent. The dashboard sends the desired-state fragment for the region; the agent applies it. Re-sending the same fragment is a no-op for customers and harmless for Caddy (reload is atomic). Resending after a partial failure is the standard recovery path.
Caddy log shipping
Caddy access logs and cert-lifecycle events ship through the standard log pipeline with parser = "caddy_json". The parser extracts:
- HTTP requests —
client_ip,method,host,path,status,user_agent,duration_ms,bytes_sent. - Cert lifecycle events — log lines from any
tls.*logger (tls.obtain,tls.issue,tls.renew, etc.) pass through withlogger,msg,identifier,names, anderrorfields preserved. The dashboard classifies these into cert-issued / cert-failed / cert-renewed events. The agent does not interpret them.
See Log Shipping for [[log_groups]] block syntax and the parser table.
Security
The agent talks to Caddy's admin API at admin_url (default http://localhost:2019). Caddy's admin API has no authentication by default — it relies on being bound to localhost only. The agent inherits that trust model: if the operator configures a non-localhost admin_url, they are responsible for restricting access (firewall, mTLS proxy in front of :2019, etc.).
The fragment param is opaque content sent from the dashboard. The agent does not parse or validate Caddyfile syntax — that's Caddy's job during POST /load. A malicious-looking fragment is rejected by Caddy and returned to the dashboard as reload_failed. The agent's max_bytes = 150000 cap exists to prevent the dashboard from accidentally shipping an oversized payload through the protocol; it is not a security boundary.
A compromised dashboard can rewrite the running Caddy config on every agent it controls. This is in scope for the trust model — the dashboard is the HMAC-trusted authority by design. See Security Architecture for the full layered model.
Troubleshooting
| Symptom | Check |
|---|---|
| Agent refuses to start: "Caddy configuration invalid" | The main Caddyfile does not import the drop-in path. Add an import directive and restart. |
reload_failed on every sync |
Fetch the latest sync's stderr from the dashboard — Caddy's parse error is there. Common cause: a customer domain block references a backend the agent's Caddy can't resolve. |
persist_failed after successful reload |
The drop-in path's parent directory isn't writable by the agent's user. Check ownership: ls -la /etc/caddy/conf.d/. |
| Fragments apply but custom domains 404 | The Caddy process serving requests is not the one whose admin API the agent talks to. Common in containerized setups — ensure admin_url and the Caddy serving the public ports are the same instance. |
| Cert-event log lines missing | Is the Caddy log group enabled with parser = "caddy_json"? Are TLS events emitted at INFO level or above in Caddy's log.level? |