Observability

Audit trails, operational trails, debug traces, metrics, OpenTelemetry — plus the Explainability peer plane for decision reasoning.

Observability tells you what happened. Vadyl ships three distinct trail kinds (two durable, one best-effort) plus a metrics meter, and pairs them with the Explainability plane, which covers why decisions were made.

The three trail kinds

  • Audit trail (VadylAuditLog): the durable record of every state-changing operation, capturing who did what, when, and with what effect. Tamper-resistant, governed by retention policy, and legally defensible.
  • Operational trail (OperationalTrail): durable operational events for every component lifecycle, including deployments, schema transitions, cache invalidations, scheduled runs, and webhook deliveries. Uses Backpressure-Wait under load: producers wait for buffer space, so events are never dropped.
  • Debug trace (DebugTrace): high-volume, best-effort instrumentation. Uses DropOldest under load, discarding the oldest buffered entries. Useful for live debugging; not for compliance.
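To make the difference between the two overflow policies concrete, here is a minimal in-memory sketch, assuming a single bounded buffer per trail. BoundedTrailBuffer and the "wait" / "dropOldest" policy strings are hypothetical names, not Vadyl's API, and the real relay is durable rather than in-memory.

```typescript
// Hypothetical sketch: BoundedTrailBuffer is NOT Vadyl's API.
// "wait" models Backpressure-Wait (never drops); "dropOldest" models DropOldest.
type BackpressurePolicy = "wait" | "dropOldest";

class BoundedTrailBuffer<T> {
  private readonly items: T[] = [];
  private readonly waiters: Array<() => void> = [];
  private readonly capacity: number;
  private readonly policy: BackpressurePolicy;
  dropped = 0; // how many events were discarded (always 0 under "wait")

  constructor(capacity: number, policy: BackpressurePolicy) {
    this.capacity = capacity;
    this.policy = policy;
  }

  // Producer side. Under "wait" the caller is suspended until a consumer frees
  // a slot; under "dropOldest" the oldest buffered event is evicted instead.
  async enqueue(item: T): Promise<void> {
    while (this.items.length >= this.capacity) {
      if (this.policy === "dropOldest") {
        this.items.shift();
        this.dropped++;
      } else {
        await new Promise<void>((resolve) => this.waiters.push(resolve));
      }
    }
    this.items.push(item);
  }

  // Consumer side: take the oldest event and wake one waiting producer,
  // which re-checks capacity before pushing.
  dequeue(): T | undefined {
    const item = this.items.shift();
    this.waiters.shift()?.();
    return item;
  }

  get size(): number {
    return this.items.length;
  }
}
```

The trade-off mirrors the bullets above: "wait" pushes latency back onto the producer to preserve every event, while "dropOldest" keeps producers fast at the cost of losing history.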

Channels and durability

Audit and operational trails route through the canonical observability relay (IObservabilityRelay), a source-transactional buffer that survives process restarts and transient provider failures. Production code emits trails through an emitter derived from LifecycleEmitterBase, which fans each event out to the platform bus, the operational channel, and the structured logger under a single envelope.
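The single-envelope fan-out can be sketched as follows. FanOutEmitterSketch, TrailEnvelope, TrailSink, and DeploymentEmitter are hypothetical stand-ins for LifecycleEmitterBase and its subclasses, and the envelope fields are assumptions; the point is that the envelope is built once and every sink receives the identical object.

```typescript
// Hypothetical shapes for illustration; not Vadyl's LifecycleEmitterBase API.
interface TrailEnvelope {
  kind: string;          // e.g. "deployment.completed"
  correlationId: string;
  occurredAt: string;    // ISO-8601 timestamp
  payload: Record<string, unknown>;
}

// A sink stands in for the platform bus, operational channel, or structured logger.
type TrailSink = (envelope: TrailEnvelope) => void;

abstract class FanOutEmitterSketch {
  private readonly sinks: TrailSink[];

  protected constructor(sinks: TrailSink[]) {
    this.sinks = sinks;
  }

  // Build the envelope once (shallow-frozen) and hand the same object to every
  // sink, so the bus, channel, and logger never see diverging copies.
  protected emit(kind: string, correlationId: string, payload: Record<string, unknown>): TrailEnvelope {
    const envelope: TrailEnvelope = Object.freeze({
      kind,
      correlationId,
      occurredAt: new Date().toISOString(),
      payload,
    });
    for (const sink of this.sinks) sink(envelope);
    return envelope;
  }
}

// Hypothetical concrete emitter for one component lifecycle.
class DeploymentEmitter extends FanOutEmitterSketch {
  constructor(sinks: TrailSink[]) {
    super(sinks);
  }

  completed(correlationId: string, version: number): TrailEnvelope {
    return this.emit("deployment.completed", correlationId, { version });
  }
}
```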

Metrics

VadylMetrics exposes a typed System.Diagnostics.Metrics.Meter with named instruments for every canonical pipeline: read latency, write latency, cache hit rate, quota checks, scheduled-run outcomes, deployment events, runtime desired/running counts, autoscale decisions, and workload saturation. OpenTelemetry exporters subscribe to the meter and forward its measurements to your APM of choice.
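As an illustration of what an APM typically computes from counter and histogram data, here is a dependency-free sketch that derives a hit rate from hit/miss counters and a nearest-rank percentile from latency samples. CounterSketch and HistogramSketch are hypothetical in-process stand-ins; in Vadyl the instruments live on the .NET Meter, and whether cache hit rate is recorded directly or derived from counters is an assumption here.

```typescript
// Hypothetical stand-ins for a monotonic Counter and a latency Histogram.
class CounterSketch {
  value = 0;
  add(delta: number): void {
    this.value += delta;
  }
}

class HistogramSketch {
  private readonly samples: number[] = [];

  record(value: number): void {
    this.samples.push(value);
  }

  // Nearest-rank percentile: the smallest sample such that at least p% of
  // all samples are less than or equal to it.
  percentile(p: number): number {
    if (this.samples.length === 0) return NaN;
    const sorted = [...this.samples].sort((a, b) => a - b);
    const rank = Math.ceil((p / 100) * sorted.length);
    return sorted[Math.max(rank, 1) - 1];
  }
}

// Hit rate derived from two counters: hits / (hits + misses).
function hitRate(hits: CounterSketch, misses: CounterSketch): number {
  const total = hits.value + misses.value;
  return total === 0 ? 0 : hits.value / total;
}
```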

Runtime scaling measures

Runtime Fabric emits scale-target measures with canonical dimensions: project, environment, surface kind, scale group, connector, and target. Autoscale decisions record sampled values, cooldown and hysteresis gates, stale-measure skips, governance denials, and capability mismatches, so operators can trace why a workload did or did not scale.
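The gating described above can be sketched as a pure decision function that always returns a reason next to the desired count, which is what makes "why didn't it scale?" answerable. This sketch rests on several assumptions: the thresholds, field names, and reason strings are hypothetical; the scaling formula is borrowed from the Kubernetes Horizontal Pod Autoscaler (desired = ceil(running * current / target), with a 10% tolerance); and governance denials and capability mismatches are omitted for brevity.

```typescript
// Hypothetical reason strings; not Vadyl's ReasonCode values.
type ScaleReason = "scaled" | "cooldown" | "withinHysteresis" | "staleMeasure" | "atTarget";

interface ScaleInput {
  sampledValue: number;     // e.g. average queue depth per replica
  measuredAtMs: number;     // when the sample was taken
  nowMs: number;
  lastScaleAtMs: number;    // when the workload last changed size
  running: number;          // current replica count
  targetPerReplica: number; // desired per-replica value
}

interface ScaleDecision {
  desired: number;
  reason: ScaleReason;
  sampledValue: number;
}

const COOLDOWN_MS = 60_000;        // assumed cooldown window
const MAX_MEASURE_AGE_MS = 30_000; // older samples are treated as stale
const HYSTERESIS = 0.1;            // ignore deviations within ±10% of target

function decideScale(i: ScaleInput): ScaleDecision {
  // Every "no change" path still records why and what was sampled.
  const hold = (reason: ScaleReason): ScaleDecision => ({
    desired: i.running,
    reason,
    sampledValue: i.sampledValue,
  });

  if (i.nowMs - i.measuredAtMs > MAX_MEASURE_AGE_MS) return hold("staleMeasure");
  if (i.nowMs - i.lastScaleAtMs < COOLDOWN_MS) return hold("cooldown");

  const ratio = i.sampledValue / i.targetPerReplica;
  if (Math.abs(ratio - 1) <= HYSTERESIS) return hold("withinHysteresis");

  const desired = Math.max(1, Math.ceil(i.running * ratio));
  if (desired === i.running) return hold("atTarget");
  return { desired, reason: "scaled", sampledValue: i.sampledValue };
}
```

Checking staleness before cooldown is a design choice in this sketch: a decision made on stale data should be reported as such even if the cooldown would also have blocked it.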

OpenTelemetry

Vadyl automatically emits OTel spans for every request, entity operation, governed connection call, workflow step, and scheduled run. Spans carry the canonical ObservabilityEnvelope: tenant, project, actor, publication version, correlation ID, and request ID. Point Vadyl at your OTel endpoint and traces connect end to end.
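End-to-end threading across process boundaries usually relies on trace-context propagation; OpenTelemetry's default propagators include the W3C Trace Context traceparent header. The sketch below formats and parses that header, derives a child span that keeps the trace ID, and maps envelope fields onto span attributes. ObservabilityEnvelopeSketch and the vadyl.* attribute keys are hypothetical; Vadyl's actual attribute names are not documented on this page.

```typescript
// Hypothetical mirror of the envelope fields listed above.
interface ObservabilityEnvelopeSketch {
  tenant: string;
  project: string;
  actor: string;
  publicationVersion: string;
  correlationId: string;
  requestId: string;
}

// W3C Trace Context: "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>".
interface TraceParent {
  traceId: string;
  spanId: string;
  sampled: boolean;
}

function formatTraceparent(t: TraceParent): string {
  return `00-${t.traceId}-${t.spanId}-${t.sampled ? "01" : "00"}`;
}

function parseTraceparent(header: string): TraceParent | null {
  const m = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  // All-zero trace or span IDs are invalid per the spec.
  if (!m || /^0+$/.test(m[1]) || /^0+$/.test(m[2])) return null;
  return { traceId: m[1], spanId: m[2], sampled: (parseInt(m[3], 16) & 0x01) === 1 };
}

// Not cryptographically strong; a real tracer uses a proper ID generator.
function randomHex(bytes: number): string {
  let out = "";
  for (let i = 0; i < bytes; i++) {
    out += Math.floor(Math.random() * 256).toString(16).padStart(2, "0");
  }
  return out;
}

// A downstream call keeps the trace ID and gets a fresh span ID,
// which is what lets the backend stitch the spans into one trace.
function childOf(parent: TraceParent): TraceParent {
  return { traceId: parent.traceId, spanId: randomHex(8), sampled: parent.sampled };
}

// Envelope fields become span attributes (keys are illustrative only).
function envelopeAttributes(e: ObservabilityEnvelopeSketch): Record<string, string> {
  return {
    "vadyl.tenant": e.tenant,
    "vadyl.project": e.project,
    "vadyl.actor": e.actor,
    "vadyl.publication_version": e.publicationVersion,
    "vadyl.correlation_id": e.correlationId,
    "vadyl.request_id": e.requestId,
  };
}
```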

// vadyl.config.ts
observability: {
  otel: {
    endpoint: process.env.OTEL_ENDPOINT,
    headers:  { "x-honeycomb-team": secret.ref("HONEYCOMB_KEY") },
  },
},

Reason codes everywhere

Every meaningful decision carries a stable typed ReasonCode: access decisions, cache decisions, plan choices, deployment outcomes, scheduled-run outcomes. Build dashboards on reason codes, not on parsed log strings.
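Here is a sketch of why typed codes beat parsed strings: the compiler knows the full set of codes, so a rollup can pre-seed every code at zero, and an exhaustive switch flags any code a dashboard forgets to handle. The cache reason values below are hypothetical; Vadyl's real ReasonCode catalogue is not listed on this page.

```typescript
// Hypothetical cache reason codes, for illustration only.
const CACHE_REASONS = ["CacheHit", "CacheMissCold", "CacheBypassPolicy"] as const;
type CacheReason = (typeof CACHE_REASONS)[number];

interface CacheDecisionEvent {
  entity: string;
  reason: CacheReason;
}

// Dashboard-style rollup: group on the typed code, never on message text.
// Every known code appears in the result, even with a count of zero.
function countByReason(events: readonly CacheDecisionEvent[]): Record<CacheReason, number> {
  const counts = {} as Record<CacheReason, number>;
  for (const r of CACHE_REASONS) counts[r] = 0;
  for (const e of events) counts[e.reason] += 1;
  return counts;
}

// Exhaustive switch: adding a code to CACHE_REASONS without handling it
// here becomes a compile-time error via the `never` check.
function panelLabel(reason: CacheReason): string {
  switch (reason) {
    case "CacheHit":
      return "Served from cache";
    case "CacheMissCold":
      return "Cold miss";
    case "CacheBypassPolicy":
      return "Bypassed by policy";
    default: {
      const unreachable: never = reason;
      return unreachable;
    }
  }
}
```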

Invariant alarms

Vadyl designates specific metrics as invariant alarms: counters that must stay at zero in production. Examples: EntityReadCacheDecryptionSkips, PlatformEventIntentNonRecoverableFailure, ReservedSubscriptionRejections. Configure your alerting to page on any non-zero rate; a non-zero value means something has broken in a way the platform cannot recover from on its own.
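A minimal sketch of the paging check, assuming your alerting scrapes cumulative counter values at intervals. The three counter names come from the list above; the snapshot shape and the invariantViolations function are hypothetical. A decrease is treated as a counter reset (for example after a process restart), in which case the new value itself is counted as the increase.

```typescript
// Counter names taken from the invariant-alarm examples above.
const INVARIANT_ALARMS = [
  "EntityReadCacheDecryptionSkips",
  "PlatformEventIntentNonRecoverableFailure",
  "ReservedSubscriptionRejections",
] as const;

// Hypothetical scrape result: counter name -> cumulative value.
type CounterSnapshot = Readonly<Record<string, number>>;

interface Violation {
  name: string;
  delta: number;
}

// Returns every invariant counter that increased between two scrapes.
// Any entry should page: these counters must stay at zero.
function invariantViolations(previous: CounterSnapshot, current: CounterSnapshot): Violation[] {
  const violations: Violation[] = [];
  for (const name of INVARIANT_ALARMS) {
    const prev = previous[name] ?? 0;
    const cur = current[name] ?? 0;
    // A drop means the counter was reset; everything since the reset is new.
    const delta = cur >= prev ? cur - prev : cur;
    if (delta > 0) violations.push({ name, delta });
  }
  return violations;
}
```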

Inspecting trails

vadyl audit tail --entity Order --since 1h
vadyl operational tail --component scheduler
vadyl debug tail --workflow fulfillOrder --run <runId>
vadyl explain access --entity Order --as user:abc

What observability is NOT

It is not the source of truth for "why". Decision reasoning lives on the Explainability plane (anti-pattern #77 — never reconstruct from logs). Observability records what happened; Explainability projects why directly from canonical authorities.