Operations Management (OPS)
Contract-only documentation for stack-wide maintenance windows and data vacuum operations.
Status
- Implementation: implemented (maintenance lifecycle, vacuum all/org, secret code gate, audit logging)
- OpenAPI: /ops/openapi.yaml
Scope
- Maintenance mode: schedule, start, update, and end maintenance windows that block API Gateway traffic across 14 services. The maintenance check fails open.
- Vacuum all: purge all data across all service tables, S3 data buckets, event/changelog/usage buckets, and CloudWatch log groups. Protected resources (ops_main, audit logs, doc/mcp buckets) are never touched.
- Vacuum org: purge all data for a specific organization across applicable services (skips UAS users, USM sessions, and CloudWatch).
- All destructive operations require a secret code (stored as a scrypt hash in DynamoDB, never in env vars or CDK code).
- Maintenance check in other services fails open (OPS infra issues do not cause global outage).
- Vacuum has a 5-minute pending window before execution (cancel-safe).
- Only one vacuum (all or org) at a time via DynamoDB mutex.
- Dry-run mode reports what would be deleted without deleting.
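
Two of the rules above, the scrypt-hashed secret code gate and the fail-open maintenance check, can be sketched roughly as follows. This is an illustration only, not the OPS implementation: the function names, the scrypt cost parameters, and the `fetch_active_window` callback are all hypothetical, and the DynamoDB lookup that would supply the salt and stored hash is left out.

```python
import hashlib
import hmac

# Hypothetical scrypt cost parameters; the real values are not stated in this document.
SCRYPT_N, SCRYPT_R, SCRYPT_P, KEY_LEN = 2**14, 8, 1, 32


def hash_secret_code(code: str, salt: bytes) -> bytes:
    """Derive a scrypt hash of the secret code (what would be stored, never the code itself)."""
    return hashlib.scrypt(
        code.encode("utf-8"),
        salt=salt,
        n=SCRYPT_N,
        r=SCRYPT_R,
        p=SCRYPT_P,
        dklen=KEY_LEN,
    )


def verify_secret_code(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-derive the hash and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(hash_secret_code(candidate, salt), stored_hash)


def is_blocked_by_maintenance(fetch_active_window) -> bool:
    """Fail open: if the OPS lookup raises, treat the stack as NOT in maintenance.

    `fetch_active_window` is a hypothetical callable that returns the active
    maintenance window, or None when there is none.
    """
    try:
        return fetch_active_window() is not None
    except Exception:
        # An OPS infrastructure problem must not turn into a global outage.
        return False
```

In this sketch a caller would reject a destructive request whenever `verify_secret_code` returns `False`, while any error in the maintenance lookup lets traffic through rather than blocking it.
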
Clarifications
- Maintenance lifecycle: scheduled → active → ended (or cancelled). Updates can be posted while active.
- Vacuum lifecycle: pending (5-min window) → running → completed/failed (or cancelled during pending).
- Protected resources: ops_main, /g3nretailstack/ops/maintenance-audit, doc.g3nretailstack.com, and mcp.g3nretailstack.com are never vacuumed.
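
The lifecycles and the protected-resource rule above can be read as small lookup tables. The sketch below is one possible encoding under stated assumptions, not the service's actual code: all names are hypothetical, and it assumes a maintenance window can only be cancelled while still scheduled, which the text does not spell out.

```python
from datetime import datetime, timedelta, timezone

# Allowed state transitions, transcribed from the lifecycles described above.
# Assumption: "cancelled" is reachable only from "scheduled" for maintenance.
MAINTENANCE_TRANSITIONS = {
    "scheduled": {"active", "cancelled"},
    "active": {"ended"},
    "ended": set(),
    "cancelled": set(),
}

VACUUM_TRANSITIONS = {
    "pending": {"running", "cancelled"},  # cancel is only possible while pending
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}

# The 5-minute cancel-safe window before a vacuum may start running.
PENDING_WINDOW = timedelta(minutes=5)

# Resources listed in this document as never vacuumed.
PROTECTED_RESOURCES = frozenset({
    "ops_main",
    "/g3nretailstack/ops/maintenance-audit",
    "doc.g3nretailstack.com",
    "mcp.g3nretailstack.com",
})


def can_transition(table: dict, current: str, target: str) -> bool:
    """Return True if `target` is an allowed next state from `current`."""
    return target in table.get(current, set())


def may_start_running(created_at: datetime, now: datetime) -> bool:
    """A pending vacuum may start only once the pending window has elapsed."""
    return now - created_at >= PENDING_WINDOW


def is_protected(resource_name: str) -> bool:
    """Return True for resources that must be skipped by any vacuum."""
    return resource_name in PROTECTED_RESOURCES


if __name__ == "__main__":
    created = datetime.now(timezone.utc)
    print(can_transition(VACUUM_TRANSITIONS, "pending", "cancelled"))
    print(may_start_running(created, created + timedelta(minutes=5)))
    print(is_protected("ops_main"))
```

Keeping the transitions as data makes it easy to reject an illegal request (for example, cancelling a vacuum that is already running) with a single lookup.
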