OPS — Operations Management

Contract-only documentation for stack-wide maintenance windows and data vacuum operations.

Status

  • Implementation: implemented (maintenance lifecycle, vacuum all/org, secret code gate, audit logging)
  • OpenAPI: /ops/openapi.yaml

Scope

  • Maintenance mode: schedule, start, update, and end maintenance windows that block API Gateway traffic across 14 services (the block fails open).
  • Vacuum all: purge all data across all service tables, S3 data buckets, event/changelog/usage buckets, and CloudWatch log groups. Protected resources (ops_main, audit logs, doc/mcp buckets) are never touched.
  • Vacuum org: purge all data for a specific organization across applicable services (skips UAS users, USM sessions, and CloudWatch).
  • All destructive operations require a secret code (stored as a scrypt hash in DynamoDB, never in env vars or CDK code).
  • The maintenance check in other services fails open, so OPS infrastructure issues do not cause a global outage.
  • Each vacuum waits in a 5-minute pending window before execution and can be cancelled safely during that window.
  • Only one vacuum (all or org) can run at a time, enforced by a DynamoDB mutex.
  • Dry-run mode reports what would be deleted without deleting.
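The fail-open rule can be sketched from the consuming service's side. This is a minimal illustration, not the OPS client: `fetch_maintenance_state` is a hypothetical callable that returns a window status string or raises when OPS cannot be reached, and the `"active"` status value is assumed from the lifecycle names in this document.

```python
from typing import Callable, Optional


def should_block(fetch_maintenance_state: Callable[[], Optional[str]]) -> bool:
    """Return True only when OPS positively reports an active window.

    `fetch_maintenance_state` is hypothetical. Any failure to read OPS state
    (network error, timeout, bad response) is treated as "not in maintenance",
    so an OPS outage never blocks traffic for the rest of the stack.
    """
    try:
        status = fetch_maintenance_state()
    except Exception:
        # Fail open: OPS infrastructure problems must not cause a global outage.
        return False
    return status == "active"
```

The design choice is deliberate: a false "open" during an OPS outage lets some traffic through a maintenance window, which is judged less harmful than a false "closed" that takes down every service.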
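Checking a submitted secret code against a stored scrypt hash might look like the sketch below. It uses Python's standard `hashlib.scrypt` and a constant-time comparison; the cost parameters, salt length, and key length are assumptions, and how the salt and digest are laid out in the DynamoDB item is not covered here.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed scrypt cost parameters; the values OPS actually stores are not documented here.
SCRYPT_N = 2**14
SCRYPT_R = 8
SCRYPT_P = 1
KEY_LEN = 32


def hash_secret_code(code: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive an scrypt digest for a secret code and return (salt, digest)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.scrypt(
        code.encode("utf-8"), salt=salt, n=SCRYPT_N, r=SCRYPT_R, p=SCRYPT_P, dklen=KEY_LEN
    )
    return salt, digest


def verify_secret_code(candidate: str, salt: bytes, stored_digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_secret_code(candidate, salt)
    return hmac.compare_digest(digest, stored_digest)
```

Only the salt and digest need to be persisted; the plaintext code never has to appear in environment variables or infrastructure code.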
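The single-vacuum mutex and the 5-minute pending window can be modeled together. In this sketch an in-memory lock stands in for the DynamoDB mutex (in DynamoDB a conditional `PutItem` with a condition such as `attribute_not_exists(pk)` gives "first writer wins" behavior). The class and function names are hypothetical, and taking the mutex at request time, so that a pending vacuum already blocks a second one, is an assumption about the intended semantics.

```python
import threading
from dataclasses import dataclass
from typing import Optional

PENDING_WINDOW_SECONDS = 5 * 60  # 5-minute pending window before execution


class VacuumInProgress(Exception):
    """Raised when another vacuum (all or org) already holds the mutex."""


class VacuumMutex:
    """In-memory stand-in for a DynamoDB lock item (first writer wins)."""

    def __init__(self) -> None:
        self._holder: Optional[str] = None
        self._guard = threading.Lock()

    def acquire(self, vacuum_id: str) -> None:
        with self._guard:
            if self._holder is not None:
                raise VacuumInProgress(self._holder)
            self._holder = vacuum_id

    def release(self, vacuum_id: str) -> None:
        with self._guard:
            if self._holder == vacuum_id:  # only the holder may release
                self._holder = None


@dataclass
class Vacuum:
    vacuum_id: str
    requested_at: float  # seconds since epoch
    status: str = "pending"

    def can_cancel(self, now: float) -> bool:
        return self.status == "pending" and now < self.requested_at + PENDING_WINDOW_SECONDS

    def can_start(self, now: float) -> bool:
        return self.status == "pending" and now >= self.requested_at + PENDING_WINDOW_SECONDS


def request_vacuum(mutex: VacuumMutex, vacuum_id: str, now: float) -> Vacuum:
    """Take the mutex first, so a concurrent second request fails immediately."""
    mutex.acquire(vacuum_id)
    return Vacuum(vacuum_id, requested_at=now)


def cancel_vacuum(mutex: VacuumMutex, vacuum: Vacuum, now: float) -> bool:
    """Cancel during the pending window and free the mutex; refuse afterwards."""
    if not vacuum.can_cancel(now):
        return False
    vacuum.status = "cancelled"
    mutex.release(vacuum.vacuum_id)
    return True
```

Releasing the mutex after a run completes or fails would follow the same pattern and is omitted for brevity.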

Clarifications

  • Maintenance lifecycle: scheduled → active → ended (or cancelled). Updates can be posted while active.
  • Vacuum lifecycle: pending (5-min window) → running → completed/failed (or cancelled during pending).
  • Protected resources: ops_main, /g3nretailstack/ops/maintenance-audit, doc.g3nretailstack.com, and mcp.g3nretailstack.com are never vacuumed.
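The two lifecycles above can be encoded as transition tables. This sketch reads the allowed moves directly off the lifecycle descriptions. Allowing maintenance cancellation only from `scheduled` is an assumption, since the text does not say whether an active window can be cancelled. The helper names are hypothetical.

```python
from typing import Dict, Set

# Assumption: a maintenance window can be cancelled only while it is still scheduled.
MAINTENANCE_TRANSITIONS: Dict[str, Set[str]] = {
    "scheduled": {"active", "cancelled"},
    "active": {"ended"},
    "ended": set(),
    "cancelled": set(),
}

# Vacuums can be cancelled only during the pending window.
VACUUM_TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"running", "cancelled"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


def transition(table: Dict[str, Set[str]], current: str, target: str) -> str:
    """Return the new state, or raise if the move is not allowed."""
    if target not in table.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {target}")
    return target


def can_post_update(maintenance_state: str) -> bool:
    """Updates are posted while a window is active; they do not change its state."""
    return maintenance_state == "active"
```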
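The protected-resource rule and dry-run mode combine naturally in one pass. The sketch below filters the protected names listed above before any deletion and, in dry-run mode, only reports targets. `delete` is a hypothetical callback that would remove a table, bucket, or log group. Matching on exact resource names and defaulting to `dry_run=True` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# Protected resources from the spec; they are filtered out before any deletion.
PROTECTED = frozenset({
    "ops_main",
    "/g3nretailstack/ops/maintenance-audit",
    "doc.g3nretailstack.com",
    "mcp.g3nretailstack.com",
})


@dataclass
class VacuumReport:
    dry_run: bool
    targets: List[str] = field(default_factory=list)  # deleted, or would be deleted in dry run
    skipped_protected: List[str] = field(default_factory=list)


def vacuum_resources(
    resources: Iterable[str], delete: Callable[[str], None], dry_run: bool = True
) -> VacuumReport:
    """Delete every non-protected resource, or only report them when dry_run is set."""
    report = VacuumReport(dry_run=dry_run)
    for name in resources:
        if name in PROTECTED:
            report.skipped_protected.append(name)
            continue
        if not dry_run:
            delete(name)
        report.targets.append(name)
    return report
```

For example, with the hypothetical resource name `orders_table`, a dry run over `["ops_main", "orders_table"]` reports `orders_table` as a target and `ops_main` as skipped, without calling `delete`.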