Playbooks
Playbooks are standard operating procedures (SOPs) for OPS. Use calls.md for API Gateway payload shapes and RECAP.md for direct Lambda operations.
Surface availability (explicit)
- API Gateway: Available (ping, stat, maintenance list/get).
- Direct Lambda: Available (maintenance schedule/cancel/start/end/update, vacuum all/org/cancel/status).
- CLI: Available (g3n ops ..., API Gateway + direct Lambdas).
- MCP: Available.
Playbook: Maintenance window lifecycle
Goal: Schedule, execute, and end a maintenance window across all services.
Why this sequence:
- Maintenance mode blocks API Gateway traffic across 14 services (fails open).
- A controlled lifecycle ensures proper communication and recovery.
Preconditions
- Secret code (set at initial deploy, stored as scrypt hash in DynamoDB).
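The precondition above says the secret code is stored as a scrypt hash. A minimal sketch of what verifying a candidate code against such a hash could look like follows. The cost parameters, salt handling, and function names are assumptions for illustration; the playbook does not document how OPS actually derives or stores the hash.

```python
import hashlib
import hmac
import os

# Assumed scrypt cost parameters; the values used by the real deployment are not stated.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_secret_code(code: str, salt: bytes) -> bytes:
    """Derive a scrypt hash of the secret code using the given salt."""
    return hashlib.scrypt(code.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)


def verify_secret_code(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """Re-derive the hash and compare in constant time."""
    return hmac.compare_digest(hash_secret_code(candidate, salt), stored_hash)


# Hypothetical usage: the salt and hash would be read from storage at check time.
salt = os.urandom(16)
stored = hash_secret_code("example-code", salt)
```

Under these assumptions, a mismatch here is the situation reported as secret-code-mismatch: only the hash is stored, so the original code cannot be recovered from it.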
SOP (happy path)
- Schedule maintenance (g3n ops maintenance-schedule --secret-code $CODE --description "..." --duration 3600 --start "2026-03-01T02:00:00Z").
  - Reason: creates a maintenance record in scheduled state.
- Start maintenance (g3n ops maintenance-start --secret-code $CODE --maintenance-id $ID).
  - Reason: activates maintenance mode; all services start returning 503.
- Post updates (g3n ops maintenance-update --secret-code $CODE --text "Phase 1 complete").
  - Reason: provides progress updates visible via ping and maintenance/get.
- End maintenance (g3n ops maintenance-end --secret-code $CODE --end-message "Complete").
  - Reason: deactivates maintenance mode; services recover within 20s cache TTL.
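The lifecycle above can be read as a small state machine. The sketch below models it that way. Only the scheduled state is named in this playbook; the active, ended, and cancelled names, and the rule that updates apply only to an active window, are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    SCHEDULED = "scheduled"  # named in the playbook
    ACTIVE = "active"        # assumed name
    ENDED = "ended"          # assumed name
    CANCELLED = "cancelled"  # assumed name


# Assumed transition table: (command, current state) -> next state.
TRANSITIONS = {
    ("maintenance-start", State.SCHEDULED): State.ACTIVE,
    ("maintenance-cancel", State.SCHEDULED): State.CANCELLED,
    ("maintenance-update", State.ACTIVE): State.ACTIVE,
    ("maintenance-end", State.ACTIVE): State.ENDED,
}


def apply(command: str, state: State) -> State:
    """Return the next state, or raise if the command is not valid in this state."""
    try:
        return TRANSITIONS[(command, state)]
    except KeyError:
        raise ValueError(f"{command} is not valid in state {state.value}") from None
```

Writing the transitions as a table makes it easy to see which step of the SOP has been skipped when a command is rejected.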
Outputs
- Maintenance record with full lifecycle history.
- Audit log entry in /g3nretailstack/ops/maintenance-audit.
Failure modes / remediation
- secret-code-mismatch: verify the correct secret code.
- Services not recovering: maintenance check fails open; if OPS infra is down, services continue operating.
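The fail-open behavior and the 20s cache TTL mentioned above can be sketched as a cached check on the service side. This is an illustration under assumptions, not the actual service code: the class name, the fetch callable, and the choice to cache a failed lookup as "not in maintenance" are all hypothetical.

```python
import time
from typing import Callable, Optional

CACHE_TTL_SECONDS = 20  # TTL stated in the playbook


class MaintenanceCheck:
    """Caches the maintenance flag and fails open when the lookup errors."""

    def __init__(self, fetch: Callable[[], bool],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical lookup against OPS
        self._clock = clock
        self._cached: Optional[bool] = None
        self._expires_at = 0.0

    def in_maintenance(self) -> bool:
        now = self._clock()
        if self._cached is not None and now < self._expires_at:
            return self._cached
        try:
            value = self._fetch()
        except Exception:
            # Fail open: if OPS cannot be reached, keep serving traffic.
            value = False
        self._cached = value
        self._expires_at = now + CACHE_TTL_SECONDS
        return value
```

With a cache like this, a service can keep returning 503 for up to one TTL after maintenance ends, which matches the recovery window described in the SOP.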
Playbook: Vacuum all (dry-run first)
Goal: Purge all data from the stack (development/testing reset).
Why this sequence:
- Vacuum is destructive and irreversible. Always dry-run first.
- 5-minute pending window allows cancellation.
Preconditions
- Secret code.
- No active vacuum (mutex enforced).
SOP (happy path)
- Dry-run (g3n ops vacuum-all --secret-code $CODE --reason "Reset" --confirmation-phrase "VACUUM ALL DATA PERMANENTLY" --dry-run).
  - Reason: reports what would be deleted without deleting.
- Review dry-run results (g3n ops vacuum-status --vacuum-id $VID).
- Execute (g3n ops vacuum-all --secret-code $CODE --reason "Reset" --confirmation-phrase "VACUUM ALL DATA PERMANENTLY").
  - Reason: starts the vacuum with 5-minute pending window.
- Monitor (g3n ops vacuum-status --vacuum-id $VID).
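One way to make "dry-run first" the default in automation is a small helper that builds the argument list and only omits --dry-run when asked explicitly. The sketch below is a hypothetical wrapper, not part of the g3n CLI; it uses only the flags shown in the steps above and builds the arguments without running anything.

```python
from typing import List

CONFIRMATION_PHRASE = "VACUUM ALL DATA PERMANENTLY"  # phrase shown in the SOP


def vacuum_all_argv(secret_code: str, reason: str, dry_run: bool = True) -> List[str]:
    """Build the vacuum-all argument list; dry-run unless explicitly disabled."""
    argv = [
        "g3n", "ops", "vacuum-all",
        "--secret-code", secret_code,
        "--reason", reason,
        "--confirmation-phrase", CONFIRMATION_PHRASE,
    ]
    if dry_run:
        argv.append("--dry-run")
    return argv
```

A caller then has to pass dry_run=False deliberately to produce the destructive form of the command.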
Outputs
- Per-service deletion stats (items, objects, bytes).
- Audit record in DynamoDB.
Failure modes / remediation
- vacuum-mutex-locked: another vacuum is running; wait or cancel it.
- Cancel during pending: g3n ops vacuum-cancel --secret-code $CODE --vacuum-id $VID.
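The mutex and the 5-minute pending window interact as follows in a minimal in-memory sketch. The class and method names are hypothetical. The sketch models cancellation only during the pending window, because this playbook does not say what cancel does once deletion has started.

```python
import time
from typing import Callable, Optional

PENDING_WINDOW_SECONDS = 5 * 60  # pending window stated in the playbook


class VacuumMutexLocked(Exception):
    """Raised when a vacuum is requested while another one holds the mutex."""


class VacuumCoordinator:
    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._active_id: Optional[str] = None
        self._started_at = 0.0

    def start(self, vacuum_id: str) -> None:
        # Only one vacuum may exist at a time.
        if self._active_id is not None:
            raise VacuumMutexLocked("vacuum-mutex-locked")
        self._active_id = vacuum_id
        self._started_at = self._clock()

    def is_pending(self) -> bool:
        if self._active_id is None:
            return False
        return self._clock() - self._started_at < PENDING_WINDOW_SECONDS

    def cancel(self, vacuum_id: str) -> bool:
        # Release the mutex only if this vacuum is still in its pending window.
        if self._active_id == vacuum_id and self.is_pending():
            self._active_id = None
            return True
        return False
```

A successful cancel releases the mutex, so a new vacuum (for example, a corrected one) can be started right away.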
Playbook: Vacuum org
Goal: Purge all data for a specific organization.
Preconditions
- Secret code.
- Valid orgcode.
SOP (happy path)
- Dry-run (g3n ops vacuum-org --secret-code $CODE --orgcode TESTORG --reason "Cleanup" --dry-run).
- Review then execute without --dry-run.
Notes
- Skips UAS users and USM sessions (user accounts are cross-org).
- Skips CloudWatch log groups.
Cross-service relationships
- 20 service tables: OPS coordinates vacuum across the entire stack.