
Eventing & Auditability

Status: INTERIM (AS-BUILT partial; standardization in progress). Current services emit events as documented per service.

Purpose

Ensure every material action is traceable and attributable for audit, reconciliation, and billing.

Core expectations

  • Every material action emits an event with org context and optional cost-centre attribution.
  • Events carry consistent metadata (actor, timestamps, source references, redaction rules).
  • Events embed the resolved request context (org/facility/channel/roles) for auditability.
  • Events include policy snapshot references using canonical *_policy_version keys; policy objects use policy_version.
  • Events must include reason and source_refs; system-generated events should use a system reason and reference the job/run that produced them.
  • When a machine-readable reason_code is present, use the shared taxonomy (see /common/reason-codes.html).
  • No secrets or PII are emitted in event payloads.
  • Events enable traceability from financial impact to originating business record.

Canonical event envelope (baseline)

All core events (UAS/USM/OFM), storage plane events (MRS), stack-module events (ICS/SCM/PCM/PPM/CRM/Influencer/Accounting), and integration plane events (IPM/RBS) extend base-event.schema.json (see /common/schemas.html).

Required baseline fields:

  • event_id (globally unique)
  • schema_version
  • timestamp_utc
  • build (build metadata)
  • service, action
  • orgcode
  • request_context (see /common/request-context.html)
  • reason, source_refs

Recommended correlation fields (required on replayable event streams; see "Replayable event contract"):

  • correlation_id (workflow-level)
  • causation_id (immediate trigger)
  • aggregate_type, aggregate_id, aggregate_revision

All services emit the base-event envelope (see event schemas in /common/schemas). For a browsable list of all events by service, see the Event Catalog.
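
As an illustration of the baseline fields listed above, the following is a minimal sketch of building an envelope and checking it for missing required fields. The helper names (`new_event`, `missing_fields`), the default `schema_version` and `build` values, and the UUID and ISO 8601 formats are assumptions made for this example; the authoritative definition is base-event.schema.json.

```python
import uuid
from datetime import datetime, timezone

# Required baseline fields, as listed in this section.
REQUIRED_FIELDS = (
    "event_id", "schema_version", "timestamp_utc", "build",
    "service", "action", "orgcode", "request_context",
    "reason", "source_refs",
)


def new_event(service, action, orgcode, request_context, reason, source_refs,
              schema_version="1.0", build=None, **correlation):
    """Build an envelope dict. Value formats are illustrative assumptions."""
    event = {
        "event_id": str(uuid.uuid4()),
        "schema_version": schema_version,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "build": build or {"version": "unknown"},
        "service": service,
        "action": action,
        "orgcode": orgcode,
        "request_context": request_context,
        "reason": reason,
        "source_refs": source_refs,
    }
    # Optional correlation fields: correlation_id, causation_id, aggregate_*.
    event.update(correlation)
    return event


def missing_fields(event):
    """Return the required baseline fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS
            if event.get(name) in (None, "", [], {})]
```

A producer could call `missing_fields` before emitting and refuse to publish an event for which it returns a non-empty list.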

Schema publishing and versioning

  • Schemas are published at /common/schemas/*.schema.json.
  • Changes within a schema version are additive. Breaking changes require a version bump and a deprecation window.
  • Event schemas include redaction metadata (x-redaction) for safe handling.
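
One way to enforce the additive rule is to compare two revisions of a schema before publishing. The sketch below is hypothetical: the function name `breaking_changes` is invented for this example, and it assumes "breaking" means a removed property, a changed property type, or a newly required field. It inspects only the top-level `properties` and `required` keywords of a JSON Schema document.

```python
def breaking_changes(old_schema, new_schema):
    """List changes that would break consumers of old_schema.

    Treated as breaking (an assumption of this sketch): a removed property,
    a changed property type, or a newly required field.
    """
    problems = []
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    for name, spec in old_props.items():
        if name not in new_props:
            problems.append(f"removed property: {name}")
        elif spec.get("type") != new_props[name].get("type"):
            problems.append(f"type changed: {name}")
    old_required = set(old_schema.get("required", []))
    for name in new_schema.get("required", []):
        if name not in old_required:
            problems.append(f"newly required: {name}")
    return problems
```

An empty result suggests the change can ship within the current version; anything else points to a version bump and a deprecation window.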

Subscription and delivery

Consumers can receive events via:

  • RBS (Retail Bus Service): subscription → verification → delivery to your queue endpoint.
  • IPM: webhooks, CDC feeds, and bulk exports (where applicable).

Delivery semantics:

  • delivery_reason explains why the delivery was sent (live, retry, replay, verify, test).
  • Dedupe on event_id at the consumer.
  • Replays preserve original event_id and timestamp_utc (where possible).


Delivery metadata (webhooks/subscriptions)

Outbound deliveries include a delivery reason that explains why the delivery was sent, separate from the event’s own reason.

Example (webhook delivery):

json
{
  "delivery_reason": "live",
  "event": {
    "event_id": "evt-123",
    "reason": "status-set",
    "service": "pvm",
    "action": "variant-updated"
  }
}

Consumer guidance:

  • Treat delivery_reason=retry|replay as a repeat delivery of an event you may already hold (dedupe by event_id).
  • Treat delivery_reason=test|verify as connectivity checks (do not persist as business events).
  • When a payload pointer is delivered, use event_summary.reason for quick classification before fetching the full payload.
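
The consumer guidance above can be sketched as a small handler. This is an illustration only: `handle_delivery`, the in-memory `seen_event_ids` set (standing in for a durable dedupe store), and the `persist` callback are assumptions, not part of any published client.

```python
def handle_delivery(delivery, seen_event_ids, persist):
    """Classify one webhook delivery and persist its event at most once.

    seen_event_ids is a set standing in for a durable dedupe store;
    persist is a caller-supplied callback. Both are assumptions.
    """
    reason = delivery.get("delivery_reason")
    if reason in ("test", "verify"):
        # Connectivity checks are never stored as business events.
        return "ignored-connectivity-check"
    event = delivery["event"]
    event_id = event["event_id"]
    if event_id in seen_event_ids:
        # A retry or replay of an event that was already processed.
        return "duplicate"
    persist(event)
    seen_event_ids.add(event_id)
    return "persisted"
```

The event_id is recorded only after `persist` succeeds, so a failed write is attempted again on the next retry delivery.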

Replayable event contract (required)

To make replay deterministic and operationally safe, every event stream must include:

  • correlation_id: workflow-level identifier (same across a saga).
  • causation_id: immediate triggering action/event.
  • aggregate identity: aggregate_type + aggregate_id for the state being changed.
  • aggregate sequence: monotonic aggregate_revision (or sequence) per aggregate.
  • dedupe key: explicit, stable key for de-duplication (for example aggregate_id + action + idempotency_key).
  • cursor expectations: consumers must persist a cursor (per stream or per aggregate) and support replay from a checkpoint.
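
To show how the aggregate identity, aggregate sequence, and cursor fit together, here is a minimal consumer-side sketch. The class `AggregateProjector` is hypothetical, the cursor lives in memory rather than in durable storage, and revisions are assumed to start above zero. It does not detect gaps, because the contract above promises monotonic revisions, not consecutive ones.

```python
class AggregateProjector:
    """Applies events in aggregate_revision order with a per-aggregate cursor."""

    def __init__(self):
        self.cursor = {}   # (aggregate_type, aggregate_id) -> last applied revision
        self.applied = []  # event_ids in the order they were applied

    def apply(self, event):
        key = (event["aggregate_type"], event["aggregate_id"])
        revision = event["aggregate_revision"]
        if revision <= self.cursor.get(key, 0):
            return "skipped"  # already applied: a replay or a duplicate
        self.applied.append(event["event_id"])
        self.cursor[key] = revision
        return "applied"

    def replay(self, events):
        """Replay a batch deterministically, ordered by aggregate and revision."""
        ordered = sorted(
            events,
            key=lambda e: (e["aggregate_type"], e["aggregate_id"],
                           e["aggregate_revision"]),
        )
        return [self.apply(e) for e in ordered]
```

Replaying the same batch twice leaves the projector unchanged on the second pass, which is the property that makes replay from a checkpoint safe.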

Publish guarantees:

  • Events must reflect committed state only.
  • A reliable publish pattern is required (for example, outbox or atomic write+emit).
  • If emission is async, reconciliation jobs must detect and republish missing events.
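
The outbox pattern named above can be illustrated with Python's standard sqlite3 module. Everything specific here is an assumption made for the example: the `variants` and `outbox` tables, the function names, and the `send` callback. The action and reason values are borrowed from the webhook example earlier in this page. The point is that the state change and its event row commit in a single transaction, so an event can only describe committed state.

```python
import json
import sqlite3


def open_store():
    """In-memory database with a business table and an outbox (illustrative schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE variants (id TEXT PRIMARY KEY, status TEXT)")
    db.execute("CREATE TABLE outbox (event_id TEXT PRIMARY KEY, payload TEXT, "
               "published INTEGER DEFAULT 0)")
    return db


def set_status(db, variant_id, status, event_id):
    """Write the state change and its event in one transaction."""
    event = {"event_id": event_id, "action": "variant-updated", "reason": "status-set"}
    with db:  # commits both rows together, or rolls both back on error
        db.execute("INSERT OR REPLACE INTO variants (id, status) VALUES (?, ?)",
                   (variant_id, status))
        db.execute("INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
                   (event_id, json.dumps(event)))


def publish_pending(db, send):
    """Relay step: send unpublished rows, then mark each one published."""
    rows = db.execute("SELECT event_id, payload FROM outbox "
                      "WHERE published = 0 ORDER BY rowid").fetchall()
    for event_id, payload in rows:
        send(json.loads(payload))
        with db:
            db.execute("UPDATE outbox SET published = 1 WHERE event_id = ?",
                       (event_id,))
    return len(rows)
```

If the relay crashes between `send` and the update, the row is sent again on the next run. That yields at-least-once delivery, which is why consumers dedupe on event_id.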

Operational resilience requirements

  • Event emission is idempotent; duplicates are safe and deduplicable by event_id.
  • Events are immutable; corrections are modeled as new events referencing the originals.
  • Out-of-order delivery is tolerated; consumers reconcile using aggregate_revision, timestamp_utc, and source_refs.
  • Replay and backfill are supported for audits and downstream integrations.
  • Schema evolution is additive and backward-compatible to protect long-lived integrations.
  • Event pipelines are non-blocking to operational flows; outages must not prevent core transactions.

Reconciliation and repair (minimum standard)

  • Frequency: reconciliation jobs run continuously or at least every 15 minutes.
  • Drift windows: acceptable drift is ≤ 60 minutes for derived views; alerts fire when drift exceeds the threshold.
  • Alerting: inbox alerts for drift, missing events, or backlog age thresholds.
  • Operator actions: replay window selection, rebuild materialized views, repair missing events, and annotate fixes with reason codes.
  • DLQ/poison handling: poisoned events are quarantined with visibility into cause and retry count; operators can requeue after correction.
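
A reconciliation pass along these lines might look like the sketch below. The record shape (`record_id`, `expected_event_id`), the function names, and the `views` mapping of view name to last refresh time are hypothetical. Only the 60-minute drift window comes from the standard above.

```python
from datetime import datetime, timedelta, timezone

MAX_DRIFT = timedelta(minutes=60)  # drift window stated in the standard above


def find_missing_events(committed, published_event_ids):
    """Return committed records whose expected event was never published.

    committed: list of dicts with 'record_id' and 'expected_event_id'
    (a hypothetical shape chosen for this sketch).
    """
    return [rec for rec in committed
            if rec["expected_event_id"] not in published_event_ids]


def drift_alerts(views, now):
    """Return names of derived views last refreshed more than MAX_DRIFT ago."""
    return [name for name, refreshed_at in views.items()
            if now - refreshed_at > MAX_DRIFT]
```

Records returned by `find_missing_events` are candidates for republishing. View names returned by `drift_alerts` are candidates for an inbox alert and a rebuild.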

Observability minimum checklist

Required metrics (per route and per workflow):

  • Latency (p95/p99) and error rates by error.major.tag and error.minor.tag.
  • Retry counts and idempotency replay counts.
  • Outbox backlog size/age and event publish latency.
  • Reconciliation lag and repair queue depth.
  • Search index lag and reindex progress when applicable.

Required tags for logs/metrics/events:

  • orgcode, logical_guid, channel_guid
  • aggregate_type, action, request_id
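
A simple guard can keep untagged records out of the pipeline. The sketch below assumes a JSON-lines log format and a hypothetical helper named `tagged_record`. The tag names are the required ones listed above.

```python
import json

# Required tags for logs/metrics/events, as listed above.
REQUIRED_TAGS = ("orgcode", "logical_guid", "channel_guid",
                 "aggregate_type", "action", "request_id")


def tagged_record(message, **tags):
    """Return a JSON log line, refusing records that lack a required tag."""
    missing = [tag for tag in REQUIRED_TAGS if tag not in tags]
    if missing:
        raise ValueError(f"missing required tags: {', '.join(missing)}")
    return json.dumps({"message": message, **tags}, sort_keys=True)
```

Failing loudly at the call site is one design choice. A service that must never block on logging could record the missing tags as a metric and emit the line anyway.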

Traceability requirements

  • Inventory, sales, procurement, loyalty, and influencer events must reference the originating transaction.
  • Ownership vs possession and facility context must be preserved in event metadata.

Planned event actions (not yet implemented)

The following event actions are defined in service schemas but do not yet have handler implementations. They are retained in the schema as planned contract items.

| Service | Action | Notes |
|---------|--------|-------|
| SCM | allocation-committed | Inventory allocation commit event; depends on ICS integration. |
| PPM | price-snapshot-created | Periodic price snapshot capture; deferred. |
| PPM | promotion-applied | Runtime promotion application event; deferred. |
| PPM | price-entry-ended | Price entry expiry/end lifecycle event; deferred. |
| CRM | stored-value-adjusted | Stored value balance adjustment; deferred. |
| CRM | stored-value-expired | Stored value expiry processing; deferred. |
| CRM | tier-updated | Customer loyalty tier update; deferred. |
| ICS | stock-card-entry-created | Stock card entry creation event; deferred. |