Mathew Storm edited this page 2026-05-26 13:13:28 -04:00

Garage Integration

Storm Pulse supports first-class integration with Garage S3 nodes. When enabled, the agent automatically collects Garage cluster state, reports it to the dashboard alongside system metrics, and exposes whitelisted commands for managing buckets and keys - all without opening a terminal.

Overview

When Garage integration is enabled, Storm Pulse:

  • Collects node status, zone, capacity, and version on every connection
  • Reports bucket names, sizes, object counts, aliases, and key permissions as a manifest
  • Refreshes Garage state every 30 seconds (configurable)
  • Exposes 27 whitelisted commands across info, admin-plane mutations, alias management, customer bucket provisioning, and data-plane operations
  • Protects API key secrets - never logged at any level

Requirements

  • Garage running as a Docker container (official dxflrs/garage image)
  • Container accessible via docker exec from the operator's admin user (the agent runs against rootless dockerd)
  • /opt/garage/garage.toml present (or a custom path — see below)

Setup — new server

If you are setting up Storm Pulse on a fresh Garage node, run stormpulse init from /opt/garage/:

cd /opt/garage
stormpulse init

The wizard detects /opt/garage/garage.toml automatically and prompts:

Checking for Garage installation...
  Found: /opt/garage/garage.toml

Enable Garage integration? [Y/n]:

If you confirm, Garage configuration is written into stormpulse.toml alongside the standard config. No separate step needed.

Setup — existing enrolled agent

If Storm Pulse is already enrolled and running, add Garage integration without re-running the full init wizard:

stormpulse garage init

The wizard auto-detects your Garage installation and prompts for confirmation:

Garage installation detected at /opt/garage/garage.toml

Container name [garaged]:
Garage binary [/garage]:
Docker binary [/usr/bin/docker]:
State push interval seconds [30]:

Enable Garage integration? [Y/n]: y
  [garage] section written to ~/.config/stormpulse/stormpulse.toml

Restart stormpulse now? [Y/n]: y

Press Enter to accept defaults. The container name is auto-detected from your docker-compose.yml.

Use --force to overwrite an existing [garage] section:

stormpulse garage init --force

Use --garage-config if your Garage config is not in a standard location:

stormpulse garage init --garage-config /custom/path/garage.toml

Configuration

The [garage] section added to ~/.config/stormpulse/stormpulse.toml:

[garage]
enabled = true
container_name = "garaged"       # Docker container name
garage_binary = "/garage"        # Path to garage binary inside the container
docker_binary = "/usr/bin/docker" # Absolute path to docker on the host
config_path = "/opt/garage/garage.toml" # Used for detection only
state_push_interval_seconds = 30 # How often Garage state is refreshed (manifest cadence)

Detection scan paths (checked in order if --garage-config not specified):

  • /opt/garage/garage.toml
  • /etc/garage/garage.toml
  • ./garage.toml
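
The scan order above can be sketched in a few lines of Python. This is only an illustration of "first existing path wins": the function name `detect_garage_config` and its signature are hypothetical and are not taken from the Storm Pulse source.

```python
from pathlib import Path
from typing import Iterable, Optional

# Scan order copied from the list above; everything else here is an assumption.
DEFAULT_SCAN_PATHS = (
    "/opt/garage/garage.toml",
    "/etc/garage/garage.toml",
    "./garage.toml",
)


def detect_garage_config(
    explicit: Optional[str] = None,
    scan_paths: Iterable[str] = DEFAULT_SCAN_PATHS,
) -> Optional[Path]:
    """Return the first existing config file, or None if nothing is found."""
    if explicit is not None:
        # An explicit path (the --garage-config case) skips the scan entirely.
        path = Path(explicit)
        return path if path.is_file() else None
    for candidate in scan_paths:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


print(detect_garage_config())
```

On a host without Garage installed this prints `None`; on a standard node it would print the first of the three paths that exists.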

Verifying it works

After restart, check the agent logs:

journalctl --user -u stormpulse -f

You should see:

INFO Garage node detected, collecting initial state
INFO Sent register (v0.1.1)

On the dashboard, the Garage node's server record will include cluster state — zone, capacity, data available, bucket list, and version.

Available commands

Admin-plane commands run via docker exec <container> /garage <subcommand> with absolute paths and shell=False. Data-plane and provisioning commands are handled directly by the agent — see Customer bucket provisioning and Data-plane operations below for details.
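
To make the "absolute paths and shell=False" point concrete, here is a minimal sketch of how such an invocation could be assembled. The helper names `build_garage_argv` and `run_garage` and the 30-second timeout are assumptions for illustration; only the `docker exec <container> /garage <subcommand>` shape comes from this page.

```python
import subprocess
from typing import List, Sequence


def build_garage_argv(
    docker_binary: str,
    container_name: str,
    garage_binary: str,
    subcommand: Sequence[str],
) -> List[str]:
    """Build the argv list for `docker exec <container> /garage <subcommand>`."""
    if not docker_binary.startswith("/") or not garage_binary.startswith("/"):
        raise ValueError("binary paths must be absolute")
    return [docker_binary, "exec", container_name, garage_binary, *subcommand]


def run_garage(argv: List[str], timeout: float = 30.0) -> subprocess.CompletedProcess:
    # A list argv with shell=False means no shell ever parses the arguments,
    # so a bucket name containing ";" or "$(...)" stays one literal argument.
    return subprocess.run(
        argv, shell=False, capture_output=True, text=True, timeout=timeout
    )


argv = build_garage_argv(
    "/usr/bin/docker", "garaged", "/garage", ["bucket", "info", "my-bucket"]
)
print(argv)
```

The defaults used in the example (`/usr/bin/docker`, `garaged`, `/garage`) match the [garage] config section shown earlier.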

All command names are prefixed garage_. The tables below omit the prefix for readability.

Informational

Command | Description | Params
status | Show node status and health | (none)
stats | Show cluster statistics | (none)
bucket_list | List all buckets | (none)
bucket_info | Show bucket details | bucket_name
key_list | List all API keys | (none)

Admin-plane mutations

The Confirm column defaults to No. Commands marked Yes require explicit confirmation in the dashboard before dispatch.

Command | Description | Params | Confirm
bucket_create | Create a new bucket | bucket_name | No
bucket_delete | Delete a bucket | bucket_name | Yes
key_create | Create a new API key (returns secret in stdout) | key_name | No
key_delete | Delete an API key | key_id | Yes
bucket_allow | Grant full access to a bucket for a key | bucket_name, key_id | No
bucket_allow_rw | Grant read+write access | bucket_name, key_id | No
bucket_allow_ro | Grant read-only access | bucket_name, key_id | No
bucket_deny | Revoke all access to a bucket for a key | bucket_name, key_id | Yes
bucket_website_allow | Enable static website hosting | bucket_name, index_document (default index.html), error_document (default 404.html) | No
bucket_website_deny | Disable static website hosting | bucket_name | Yes

Alias management

See Aliases below for what local vs. global aliases mean and how they appear in the manifest.

Command | Description | Params | Confirm
bucket_alias_global_add | Attach a global alias to a bucket | bucket_name (UUID or existing alias), new_alias | No
bucket_alias_global_remove | Detach a global alias | alias_name | Yes
bucket_alias_local_add | Attach a local alias (scoped to a key) | key_id, bucket_name, new_alias | No
bucket_alias_local_remove | Detach a local alias | key_id, alias_name | Yes

Internal

Command | Description | Long-running
refresh | Trigger immediate state collection and metrics push | No

Customer bucket provisioning (long-running)

These commands orchestrate multi-step Garage operations with rollback on failure. See Customer bucket provisioning for the orchestration model.

Command | Description | Params | Confirm | Sensitive output
provision_customer_bucket | Create bucket + admin key + local alias atomically | display_name, key_name_admin | No | Yes
delete_provisioned_bucket | Delete bucket, all local aliases, and orphaned keys | bucket_id | Yes | No
provision_additional_key | Create a tier-specific key for an existing provisioned bucket | new_key_name, bucket_id, local_alias, key_tier (rw or ro) | No | Yes
rotate_customer_key | Create a new key, transfer permissions, delete old key | old_key_id, new_key_name, bucket_id, local_alias, key_tier (all, rw, or ro) | No | Yes

Data-plane operations (long-running)

These commands talk directly to the local Garage S3 endpoint via the agent's built-in SigV4 client (no docker exec, no boto3). They require S3 credentials in the params. See Data-plane operations.

Command | Description | Confirm | Sensitive output
bucket_clear | Bulk-delete every object in a bucket | Yes | Yes
bucket_set_cors | Set CORS rules on a bucket | No | Yes
walk_bucket_stats | Count objects and bytes under a prefix | No | Yes

Aliases

Garage has two kinds of bucket aliases. Both appear in the manifest; both are first-class in the command surface.

Global aliases are cluster-wide names for a bucket. Every global alias is unique across the entire Garage cluster. Customers can reach a bucket by global alias on any key that has permission. Use bucket_alias_global_add / bucket_alias_global_remove to manage them.

Local aliases are per-key names. A local alias documents on key A can coexist with a totally different bucket called documents on key B. Local aliases are what Storm Cellar uses for per-customer naming — the customer's bucket appears as display_name from their own key, without that name being claimed cluster-wide. Use bucket_alias_local_add / bucket_alias_local_remove to manage them.

In the manifest:

  • Every bucket entry carries a bucket_id (the 16-char Garage UUID). This is the join key for dashboard reconciliation — aliases are not unique across tenants and cannot be used for tenant attribution.
  • The per-bucket aliases field lists both global and local aliases. Local aliases include the key_id they're scoped to.

Garage's orphan rule: a bucket must have at least one alias (global or local) at all times. You cannot remove the last alias from a live bucket. delete_provisioned_bucket handles this automatically (attaches a temporary global alias if needed, then deletes the bucket through that reference); plain bucket_delete does not — try to delete a bucket with no aliases via bucket_delete and Garage refuses.

Scheduled state

Garage state is collected once on connection (included in the register payload) and refreshed every state_push_interval_seconds thereafter (default 30s). Each refresh runs garage status, garage stats, garage key list, and garage bucket info for each bucket, and includes the result in the next metrics.push payload.

Bucket state includes size, object count, key permissions, website hosting status (website_access, website_index_document, website_error_document), and quotas.

All keys are included in state — both bucket-linked keys (with permissions) and unlinked keys. The top-level keys list contains every key by ID and name; per-bucket key references include permissions.

On-demand refresh

After a mutation (bucket create, key create, etc.), the dashboard can dispatch garage_refresh to trigger an immediate state collection. The agent collects fresh state, sends a command.result confirming success, then immediately sends a metrics.push with the updated Garage data. This avoids waiting up to 30 seconds for the next scheduled refresh.

Long-running commands in the garage group also trigger this auto-refresh on success — see Manifest contract below.

Manifest contract (Storm-side reconciler)

The dashboard side (Storm Cellar) treats this Garage-state payload as a manifest and the agent-reported view as the source of truth. Storm-side CustomerBucket / CustomerKey rows are projections of what the manifest last reported. The full design lives in the dashboard repo at _architecture/specs/storm-pulse-manifest-foundation.md.

What this means for the Pulse contract:

  • The state collector is load-bearing. Bucket/key/permission/alias data must be complete and accurate per push. A bucket Garage has but the manifest omits will be reconciled away on the dashboard side. A bucket the manifest reports but Garage doesn't will be flagged as a divergence on the dashboard.
  • bucket_id (16-char Garage UUID) is the join key for tenant attribution on the dashboard side. It must be present per bucket entry. Aliases are not unique across tenants and cannot be used as join keys.
  • Per-bucket key list is also load-bearing. The dashboard's per-key reconciler joins on key_id per bucket. A key Storm has on a bucket that the manifest's per-bucket key list omits will be reconciled away (subject to a 30s grace window for in-flight rotations). This catches force-revokes and ops-side key deletes.
  • Cadence default is 30s. state_push_interval_seconds defaults to 30 in prompt_garage_values. Bypass-path operations (internal admin actions, direct Garage CLI) reconcile in ≤30s. Older deployments initialized with the earlier default of 300 should be updated to 30.
  • Auto-refresh after long-running commands is implemented in agent._post_success_hook. Every successful long-running command in the garage group triggers an immediate metrics.push carrying fresh state. Customer-initiated ops reconcile in <1s.
  • No new envelope is required. The existing metrics.push already carries the manifest shape. The Storm-side reconciler consumes it via cellar_relay.relay_customer_metrics.
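
The join-key rule in the list above can be illustrated with a small sketch. It assumes manifest bucket entries are dicts carrying `bucket_id` and an `aliases` list; the inner alias shape (`kind`, `name`, `key_id`) and both function names are hypothetical, and this is not the Storm Cellar reconciler.

```python
from typing import Dict, Iterable, List


def index_by_bucket_id(manifest_buckets: Iterable[dict]) -> Dict[str, dict]:
    """Index manifest entries by bucket_id, rejecting entries that lack one."""
    index: Dict[str, dict] = {}
    for entry in manifest_buckets:
        bucket_id = entry.get("bucket_id")
        if not bucket_id:
            raise ValueError("manifest bucket entry is missing bucket_id")
        index[bucket_id] = entry
    return index


def rows_missing_from_manifest(
    storm_bucket_ids: Iterable[str], manifest_buckets: Iterable[dict]
) -> List[str]:
    """Bucket IDs the dashboard knows about that the manifest did not report."""
    reported = index_by_bucket_id(manifest_buckets)
    return sorted(set(storm_bucket_ids) - set(reported))


# Two tenants both call their bucket "documents" through a local alias, so the
# alias cannot identify the tenant; bucket_id can.
manifest = [
    {"bucket_id": "aaaaaaaaaaaaaaaa",
     "aliases": [{"kind": "local", "name": "documents", "key_id": "GKkeyA"}]},
    {"bucket_id": "bbbbbbbbbbbbbbbb",
     "aliases": [{"kind": "local", "name": "documents", "key_id": "GKkeyB"}]},
]
print(rows_missing_from_manifest(
    ["aaaaaaaaaaaaaaaa", "cccccccccccccccc"], manifest
))
```

Here the dashboard row for `cccccccccccccccc` has no matching manifest entry, which is the case the contract describes as being reconciled away.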

Customer bucket provisioning

Storm Cellar provisions customer buckets through four long-running commands that orchestrate multi-step Garage operations. Each is a single dispatchable unit, but internally runs a sequence of garage CLI calls and rolls back on partial failure. The orchestration model exists because no single Garage primitive does what Cellar needs (e.g. "create a bucket with a tenant-scoped local alias and an admin key in one atomic action"), and because partial state from a half-failed multi-step would diverge from the manifest.

garage_provision_customer_bucket

Creates a bucket, attaches a local alias scoped to a new admin key, and grants the key full access. The output secret rides in the result payload (sensitive output flag is set).

Internal step order:

  1. Create bucket with a throwaway global alias (Garage requires an alias to exist; a throwaway lets us avoid claiming a customer name globally).
  2. Create the admin key with key_name_admin.
  3. Grant the admin key all permissions on the bucket.
  4. Attach the local alias display_name (scoped to the admin key) — this is the customer-facing reference.
  5. Remove the throwaway global alias.

On failure at any step, the orchestrator rolls back already-completed steps in reverse and reports failure_reason naming the failed step (e.g. admin_key_create_failed, unalias_throwaway_failed). If rollback itself fails, failure_reason="rollback_failed" and the dashboard surfaces a manual-cleanup state.
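
The rollback behaviour described above follows a common pattern: remember an undo action for each completed step, and on failure run those undo actions in reverse. The sketch below shows that pattern under assumptions; `run_with_rollback`, the result dict shape, and the `bucket_create_failed` reason are hypothetical, while `admin_key_create_failed` and `rollback_failed` are the reasons named on this page.

```python
from typing import Callable, List, Optional, Tuple

# (failure_reason if this step fails, action, undo action or None)
Step = Tuple[str, Callable[[], None], Optional[Callable[[], None]]]


def run_with_rollback(steps: List[Step]) -> dict:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    undo_stack: List[Callable[[], None]] = []
    for failure_reason, action, undo in steps:
        try:
            action()
        except Exception as exc:
            try:
                for undo_fn in reversed(undo_stack):
                    undo_fn()
            except Exception:
                # The rollback itself failed: partial state is left behind.
                return {"success": False, "failure_reason": "rollback_failed"}
            return {
                "success": False,
                "failure_reason": failure_reason,
                "error": str(exc),
            }
        if undo is not None:
            undo_stack.append(undo)
    return {"success": True}


log: List[str] = []


def failing_key_create() -> None:
    raise RuntimeError("key create refused")


steps: List[Step] = [
    ("bucket_create_failed",
     lambda: log.append("create bucket"),
     lambda: log.append("delete bucket")),
    ("admin_key_create_failed", failing_key_create, None),
]
result = run_with_rollback(steps)
print(result["failure_reason"], log)
```

In the example the second step fails, so the bucket created in the first step is deleted again and the result names the failed step.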

garage_delete_provisioned_bucket

Deletes a provisioned bucket, all its local aliases, and any keys that no longer have access to other buckets.

Internal step order:

  1. bucket info <bucket_id> — enumerate existing aliases and keys with permissions.
  2. If no global alias exists, attach a temporary one (required by Garage's "every bucket must have an alias" rule before delete is permitted).
  3. Detach every local alias via bucket unalias --local <key> <alias>.
  4. Delete the bucket via the temporary or existing global alias reference.
  5. For each key found in step 1, check key info — if the key now has zero buckets, delete it. Shared keys (still attached to other buckets) are left alone.

Failed key deletes in step 5 are logged as manual_cleanup_required and do not fail the overall command. The bucket is gone; the orphaned key is a minor cleanup item, not a divergence.
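
Step 5 and its failure handling can be sketched as two small functions. This is a simplified illustration: `split_orphaned_keys`, `cleanup_orphans`, and the `buckets_per_key` mapping (key ID to the number of buckets the key still has after the delete) are hypothetical stand-ins for what the agent learns from key info.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def split_orphaned_keys(
    bucket_key_ids: Iterable[str], buckets_per_key: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """Separate keys with zero remaining buckets from keys that are still shared."""
    orphaned: List[str] = []
    shared: List[str] = []
    for key_id in bucket_key_ids:
        if buckets_per_key.get(key_id, 0) == 0:
            orphaned.append(key_id)
        else:
            shared.append(key_id)
    return orphaned, shared


def cleanup_orphans(
    orphaned: Iterable[str], delete_key: Callable[[str], None]
) -> List[str]:
    """Delete orphaned keys; collect failures instead of failing the command."""
    manual_cleanup_required: List[str] = []
    for key_id in orphaned:
        try:
            delete_key(key_id)
        except Exception:
            manual_cleanup_required.append(key_id)
    return manual_cleanup_required


orphaned, shared = split_orphaned_keys(
    ["GKadmin", "GKshared"], {"GKadmin": 0, "GKshared": 2}
)
print(orphaned, shared)
```

Keeping the failure list separate from the command outcome mirrors the rule above: the bucket delete has already succeeded, so a stuck key is reported rather than raised.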

garage_provision_additional_key

Creates a tier-specific key (rw or ro) and attaches it to an already-provisioned bucket with the customer's local alias name.

Internal step order:

  1. Create the key with new_key_name.
  2. Grant bucket_allow_rw or bucket_allow_ro permissions on bucket_id (per key_tier).
  3. Attach local_alias scoped to the new key.

Rollback on failure unwinds in reverse. Failure reasons: invalid_key_tier, new_key_create_failed, new_key_permission_grant_failed, new_key_alias_attach_failed, rollback_failed.

garage_rotate_customer_key

Replaces an existing key with a freshly created one, transfers permissions and the local alias, then deletes the old key. Used when a customer regenerates credentials.

Internal step order:

  1. Create the new key with new_key_name.
  2. Grant permissions per key_tier (all mirrors the old key's tier, otherwise rw or ro).
  3. Attach local_alias scoped to the new key.
  4. Delete the old key.

Failure reasons: new_key_create_failed, new_key_permission_grant_failed, new_key_alias_attach_failed, old_key_delete_failed, rollback_failed.

Race window: between step 3 and step 4, both old and new keys briefly have access to the bucket. The dashboard's manifest reconciler tolerates this with a 30s grace window (see Manifest contract).

Data-plane operations

Three long-running commands talk directly to the local Garage S3 endpoint rather than the admin CLI: garage_bucket_clear, garage_bucket_set_cors, and garage_walk_bucket_stats. They share a purpose-built SigV4 S3 client at stormpulse/garage/s3.py — stdlib + cryptography only, no boto3 (a 30MB dependency for what amounts to a handful of HTTP operations). They also share an envelope pattern: customer-controlled S3 credentials ride in the command.request params, are used for the job's lifetime, and never persist. See Security for the secret-handling contract.
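
For readers unfamiliar with SigV4, the core of any such client is the signing-key derivation defined by the AWS Signature Version 4 scheme, which needs only `hmac` and `hashlib`. The sketch below shows that one piece; it is not the code in stormpulse/garage/s3.py, the function names are hypothetical, and the region value `garage` is only an example.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(
    secret_access_key: str, date_stamp: str, region: str, service: str = "s3"
) -> bytes:
    """Derive the SigV4 signing key: a chain of HMACs over date, region, service."""
    k_date = _hmac_sha256(("AWS4" + secret_access_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Hex signature that goes into the Authorization header."""
    return hmac.new(
        signing_key, string_to_sign.encode("utf-8"), hashlib.sha256
    ).hexdigest()


# date_stamp is the request date as YYYYMMDD; the secret here is a placeholder.
key = derive_signing_key("example-secret", "20260526", "garage")
print(len(key))
```

A full request additionally needs the canonical request and string-to-sign, which are omitted here. Because the key depends on the region, a mismatched `region` param is one way a request can fail signature validation.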

All three follow the long-running lifecycle (see Protocol Specification — Long-running commands) and all three accept these five base params:

Param | Description
bucket_name | Bucket to operate on
s3_endpoint | Garage S3 endpoint URL (e.g. http://localhost:3900)
region | S3 region for SigV4 signing
access_key_id | Customer S3 access key ID
secret_access_key | Customer S3 secret access key

garage_bucket_clear

Bulk-deletes every object in a bucket. Garage's CLI does not expose a "clear bucket" primitive, so every clear is a series of batched S3 DeleteObjects calls.

Lifecycle:

  1. command.progress (stage "starting") — credential pre-flight: HeadBucket. Bad credentials produce an immediate terminal command.result with failure_reason="auth_failed" before any delete is attempted.
  2. command.progress (stage "starting") — full paginated list to compute total. Until listing finishes, total is null.
  3. command.progress (stage "running") — once per batch of 1000 deleted objects, with current advancing toward total.
  4. command.progress (stage "finalizing") — summary computation.
  5. command.result — terminal. Carries summary fields at the top of the payload.
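
The running and finalizing stages of the lifecycle above can be sketched as a batching loop that emits one progress event per batch. The function names and the event dict shape are assumptions; only the stage names, the 1000-object batch size, and the `current` / `total` fields come from this page. The pre-flight and listing stages are left out.

```python
from typing import Callable, Iterator, List


def batched(keys: List[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size keys."""
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


def clear_with_progress(
    keys: List[str],
    delete_batch: Callable[[List[str]], None],
    emit: Callable[[dict], None],
    batch_size: int = 1000,
) -> int:
    """Delete keys in batches, emitting one 'running' event per batch."""
    total = len(keys)
    deleted = 0
    for batch in batched(keys, batch_size):
        delete_batch(batch)  # stand-in for one batched S3 delete request
        deleted += len(batch)
        emit({"stage": "running", "current": deleted, "total": total})
    emit({"stage": "finalizing", "current": deleted, "total": total})
    return deleted


events: List[dict] = []
count = clear_with_progress(
    [f"obj-{i}" for i in range(2500)], lambda batch: None, events.append
)
print(count, [event["current"] for event in events])
```

With 2500 keys the sketch emits three running events followed by one finalizing event.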

Terminal payload extras:

Field | Type | Description
deleted_count | int | Objects successfully deleted.
failed_count | int | Objects that failed to delete (zero on full success).
errors | array | Up to 10 per-object errors. Each entry has Key, Code, Message. Truncated for wire-payload sanity.
duration_seconds | float | Wall-clock duration of the job.
error | string | Human-readable failure summary (only present on failure).

Failure modes:

failure_reason | When | Counts behavior
auth_failed | HeadBucket returned 403 / SignatureDoesNotMatch / InvalidAccessKeyId. No delete attempted. | All counts are 0. Dashboard rate-limiter increments.
partial_failure | DeleteObjects reported per-object errors. The bucket was partially cleared. | deleted_count and failed_count reflect what actually happened. Dashboard leaves bucket DB stats untouched; customer retries.
os_error | List or Delete failed at HTTP level (network, server error). | Counts reflect work completed before the error.
agent_disconnected | Set by the dashboard when the agent's WebSocket closes mid-job. The agent itself emits no terminal result on disconnect; cancelled jobs die silently. | Dashboard's responsibility, not the agent's.

A clear that fails partway through is naturally idempotent under retry: re-running on the same bucket continues from whatever objects remain. The agent does not persist intermediate state — there is no resume-from-checkpoint, just retry-from-scratch (which is cheap because list-and-delete is the same code path).

garage_bucket_set_cors

Sets CORS rules on a bucket via S3 PutBucketCors. Required for browser-side uploads from custom domains.

Extra param:

Param | Description
origins | JSON array of origin strings (e.g. ["https://example.com", "https://*.example.com"]). Pattern allows the bracket/quote/wildcard set needed for valid CORS origins.
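
As an illustration of what a PutBucketCors body built from origins could look like, the sketch below turns the JSON array into the standard S3 CORSConfiguration XML. It assumes the param arrives as a JSON string; the function name and the default allowed methods and headers are assumptions, since this page does not say which values the agent sends.

```python
import json
import xml.etree.ElementTree as ET
from typing import Sequence


def build_cors_xml(
    origins_json: str,
    methods: Sequence[str] = ("GET", "PUT", "POST"),  # assumed defaults
    headers: Sequence[str] = ("*",),                  # assumed defaults
) -> bytes:
    """Build a one-rule S3 CORSConfiguration document from a JSON origin list."""
    origins = json.loads(origins_json)
    if not isinstance(origins, list) or not all(
        isinstance(origin, str) and origin for origin in origins
    ):
        raise ValueError("origins must be a JSON array of non-empty strings")
    root = ET.Element("CORSConfiguration")
    rule = ET.SubElement(root, "CORSRule")
    for origin in origins:
        ET.SubElement(rule, "AllowedOrigin").text = origin
    for method in methods:
        ET.SubElement(rule, "AllowedMethod").text = method
    for header in headers:
        ET.SubElement(rule, "AllowedHeader").text = header
    return ET.tostring(root, encoding="utf-8")


body = build_cors_xml('["https://example.com", "https://*.example.com"]')
print(body.decode("utf-8"))
```

Validating the decoded list before building XML keeps a malformed origins value from reaching the S3 endpoint at all.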

Failure modes: auth_failed (HeadBucket rejected creds before the PUT), os_error (PutBucketCors failed at HTTP level).

garage_walk_bucket_stats

Counts objects and sums bytes under a key prefix. Used by Cellar to report per-prefix storage usage to customers without pulling per-object stats out of Garage's metrics surface.

Extra params:

Param | Description
prefix | Key prefix to walk. Empty string ("") walks the whole bucket.
max_objects | Cap on the number of objects to count. Default "100000". If the prefix has more, the walk stops early and truncated=true in the result.

Terminal payload extras:

Field | Type | Description
count | int | Object count under the prefix.
bytes | int | Sum of Size for all objects walked.
truncated | bool | true if count reached max_objects before the prefix was exhausted. Dashboard treats truncated results as lower bounds.
duration_seconds | float | Wall-clock duration of the walk.
error | string | Human-readable failure summary (only present on failure).

Failure modes: auth_failed, os_error.
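
The count, bytes, and truncated semantics can be sketched over any iterable of (key, size) pairs, standing in for paginated list results. The function name `walk_stats` is hypothetical; the result field names match the terminal payload fields above.

```python
from typing import Iterable, Tuple


def walk_stats(objects: Iterable[Tuple[str, int]], max_objects: int = 100000) -> dict:
    """Count objects and sum sizes, stopping early once max_objects is reached."""
    count = 0
    total_bytes = 0
    truncated = False
    for _key, size in objects:
        if count >= max_objects:
            # There is at least one more object than the cap allows.
            truncated = True
            break
        count += 1
        total_bytes += size
    return {"count": count, "bytes": total_bytes, "truncated": truncated}


listing = [("a/1", 10), ("a/2", 20), ("a/3", 30), ("a/4", 40)]
print(walk_stats(listing, max_objects=3))
print(walk_stats(listing, max_objects=4))
```

In this sketch, truncated is only set when another object exists beyond the cap, so a prefix holding exactly max_objects objects is reported as complete.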

Security

garage_key_create returns the new API key's secret in command.result.stdout. This secret:

  • Is never logged at any level (DEBUG, INFO, WARNING, ERROR) by the agent
  • Travels over mTLS — encrypted in transit
  • Is displayed once by the dashboard and never stored

When a key is created from the dashboard, the secret is shown once in the sidebar. Closing the sidebar discards it permanently. Save it immediately.

Customer secrets in data-plane command params

The three data-plane commands (garage_bucket_clear, garage_bucket_set_cors, and garage_walk_bucket_stats) carry a customer-controlled S3 secret in their command.request params. This is required because they hit the data plane, not the admin plane, and the agent does not hold customer S3 credentials at rest.

Storm Pulse handles these secrets as follows:

  • They travel in the HMAC-signed command.request envelope, encrypted in transit by mTLS.
  • The agent constructs a GarageS3Client from the params, uses it for the job's lifetime, and drops the reference when the function returns. Python's GC reclaims the memory.
  • The secret is never written to disk, never logged, and never appears in the terminal command.result payload.
  • Every command in this family sets sensitive_output = true, so any future addition of stdout to the result will be filtered from agent logs.
  • The secret_access_key param's regex pattern (.+) accepts any non-empty string. The agent does not validate secret format — that's the dashboard's responsibility before dispatch.

New long-running commands that need the same pattern should follow this approach (params + sensitive_output = true + per-job client construction) rather than introducing standing credentials in the agent's config.
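
A minimal sketch of that pattern follows, assuming params arrive as a plain dict. `redact_params`, `run_job`, and the `client_factory` argument are hypothetical names; redacting before logging is one way to honor the "never logged" rule, not necessarily how the agent does it.

```python
from typing import Any, Callable, Dict

SENSITIVE_PARAMS = frozenset({"secret_access_key"})


def redact_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params that is safe to log."""
    return {
        name: ("[REDACTED]" if name in SENSITIVE_PARAMS else value)
        for name, value in params.items()
    }


def run_job(
    params: Dict[str, Any],
    client_factory: Callable[..., Any],
    job: Callable[[Any], Any],
) -> Any:
    """Build a client for this job only; nothing is cached or written to disk."""
    client = client_factory(
        endpoint=params["s3_endpoint"],
        region=params["region"],
        access_key_id=params["access_key_id"],
        secret_access_key=params["secret_access_key"],
    )
    # The client is a local variable, so the reference is dropped on return.
    return job(client)


params = {
    "bucket_name": "documents",
    "s3_endpoint": "http://localhost:3900",
    "region": "garage",
    "access_key_id": "GKexample",
    "secret_access_key": "placeholder-secret",
}
print(redact_params(params))
```

In the real agent, `client_factory` would be the GarageS3Client constructor; here it is left as a parameter so the sketch stays self-contained.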

Troubleshooting

Symptom | Check
"Garage node detected" not in logs | Is /opt/garage/garage.toml present? Is enabled = true in [garage]?
Garage state missing from dashboard | Check state_push_interval_seconds: state refreshes on schedule, not immediately
Commands fail with not_found | Is the garaged container running? Is container_name correct in config?
garage_key_create returns empty secret | Secret was already displayed and discarded. Delete the key and create a new one
Garage state shows stale data | Dispatch garage_refresh from the dashboard, or restart stormpulse to force re-collection on register
Data-plane command returns auth_failed | Customer's S3 access key was wrong, was revoked, or doesn't have permission on the bucket. Verify via key info <key_id>.
delete_provisioned_bucket returns bucket_not_empty | Bucket has objects. Dispatch bucket_clear first, then retry delete.
Provisioning command returns rollback_failed | A multi-step orchestration failed and the rollback also failed. Partial state exists in Garage; inspect with bucket info / key list and clean up manually. Dashboard will surface a divergence on the next manifest push.
delete_provisioned_bucket logs manual_cleanup_required for a key | A key delete failed during step 5 (orphan-key cleanup) but the bucket itself is gone. The orphaned key has no bucket access, so it is safe to ignore, or delete it with garage_key_delete.