Reference

Limits and quotas

Request limits, batch limits, realtime limits, workflow deadlines, runtime budgets, webhook retry ceilings, token budgets, and quota enforcement.

Limits are not scattered middleware settings. They are quota and policy descriptors tied to project scope, operation class, capability surface, publication version, and usage metering. Enterprise projects can raise many limits through governance-approved quota grants; fail-closed limits remain hard boundaries.

Limit catalog

Limit | Default | Scope | Enforcement
REST request body | 10 MiB JSON; larger payloads use Storage or SourceAsset upload. | Request | 413 or VALIDATION_FAILED before business execution.
Entity list page size | 200 rows by default; final-form enterprise policy may raise it by grant. | Project and route | VALIDATION_FAILED when the requested pageSize exceeds policy.
Batch mutation size | 1,000 rows or 10 MiB per operation. | Entity operation | VALIDATION_FAILED before the transaction opens.
Idempotency key retention | 24 hours by default; configurable per operation class. | Project | Duplicate keys return the prior result or an idempotency conflict.
Webhook retries | 10 attempts with exponential backoff plus jitter. | Endpoint | Delivery moves to DeadLettered once the retry policy is exhausted.
Realtime subscriptions | 1,000 per project by default; enterprise scopes use quota grants. | Project and actor | 429 QUOTA_EXCEEDED on handshake or subscribe.
Runtime desired instances | Per-surface and cumulative project ceilings from the governance envelope. | Project, environment, surface, scale group | Compile or mutation fails closed when desired or max instances exceed policy.
Runtime vertical resources | CPU, memory, storage, bandwidth, and accelerators capped by project inheritance policy. | Project, environment, surface | Capability satisfaction and governance checks before realization.
Autoscale decision cadence | 30 seconds by default, with cooldown, hysteresis, stale-sample, rollout, and drain gates. | Autoscale target | Decision skipped with a typed reason until the gate clears.
Workflow run duration | 30 days by default; durable workflow policy can extend it. | Workflow definition | Cancellation and compensation policy fires at the deadline.
Edge handler CPU time | 50 ms CPU slice by default, 5 s wall-clock ceiling. | Execution unit | Invocation times out with a typed BridgeError.
Agent token budget | Per-agent and per-run budget; preflight plus reconciliation. | Agent run | Preflight deny or reconciled overage usage event.
Connector egress timeout | 30 seconds by default; maximum set by connection policy. | Governed connection | UPSTREAM_TIMEOUT with retry classification.
OpenAPI/SDL/proto size | Publication descriptor size quota. | Project publication | Publication compile fails closed with descriptor diagnostics.
CLI descriptor cache | ETag validated on every CLI startup. | Local CLI installation | 304 reuse or descriptor refetch.
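The webhook retry row names the technique but not the schedule. The sketch below shows one common reading, full-jitter exponential backoff, where each delay is drawn uniformly between zero and a capped exponential ceiling. The base delay, the cap, and the "Retrying" state label are assumptions for illustration; only the 10-attempt limit and the DeadLettered state come from the catalog.

```python
import random
from typing import List, Optional

MAX_ATTEMPTS = 10      # from the limit catalog
BASE_DELAY_S = 1.0     # assumption: the base delay is not documented
MAX_DELAY_S = 3600.0   # assumption: cap on any single delay


def backoff_delays(
    attempts: int = MAX_ATTEMPTS,
    base: float = BASE_DELAY_S,
    cap: float = MAX_DELAY_S,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Full-jitter backoff: delay n is uniform in [0, min(cap, base * 2**n)]."""
    rng = rng if rng is not None else random.Random()
    return [rng.uniform(0.0, min(cap, base * 2 ** n)) for n in range(attempts)]


def delivery_state(failed_attempts: int, attempts: int = MAX_ATTEMPTS) -> str:
    """State after consecutive failures; "Retrying" is a hypothetical label."""
    return "DeadLettered" if failed_attempts >= attempts else "Retrying"


if __name__ == "__main__":
    # A seeded generator makes the schedule reproducible for inspection.
    for n, delay in enumerate(backoff_delays(rng=random.Random(7)), start=1):
        print(f"attempt {n}: wait {delay:.2f}s")
```

Jitter spreads retries from many failing deliveries across time, so an endpoint recovering from an outage is not hit by synchronized retry waves.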

Quota headers

HTTP/1.1 429 Too Many Requests
X-Vadyl-Quota-Kind: read.monthly
X-Vadyl-Quota-Limit: 1000000
X-Vadyl-Quota-Used: 1000001
X-Vadyl-Quota-Reset: 2026-06-01T00:00:00Z

{
  "error": {
    "code": "QUOTA_EXCEEDED",
    "reasonCode": "Quota.ReadMonthly.Exhausted",
    "retryable": false,
    "correlationId": "01HXZ0J4YV8AJF2GFG2T1F7Y42"
  }
}
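A client can turn these headers into a structured value and decide what to do next. This minimal Python sketch parses the four X-Vadyl-Quota-* headers shown above; QuotaStatus and parse_quota_headers are hypothetical client-side names, not part of a published SDK. Because the error body marks the monthly quota as non-retryable, a caller would surface the reset time rather than retry in a loop.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping


@dataclass(frozen=True)
class QuotaStatus:
    kind: str
    limit: int
    used: int
    reset: datetime

    @property
    def exhausted(self) -> bool:
        return self.used >= self.limit

    def seconds_until_reset(self, now: datetime) -> float:
        return max(0.0, (self.reset - now).total_seconds())


def parse_quota_headers(headers: Mapping[str, str]) -> QuotaStatus:
    # HTTP header names are case-insensitive, so normalize them first.
    h = {name.lower(): value for name, value in headers.items()}
    # fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    reset = datetime.fromisoformat(h["x-vadyl-quota-reset"].replace("Z", "+00:00"))
    return QuotaStatus(
        kind=h["x-vadyl-quota-kind"],
        limit=int(h["x-vadyl-quota-limit"]),
        used=int(h["x-vadyl-quota-used"]),
        reset=reset,
    )


headers = {
    "X-Vadyl-Quota-Kind": "read.monthly",
    "X-Vadyl-Quota-Limit": "1000000",
    "X-Vadyl-Quota-Used": "1000001",
    "X-Vadyl-Quota-Reset": "2026-06-01T00:00:00Z",
}
status = parse_quota_headers(headers)
now = datetime(2026, 5, 31, 23, 0, tzinfo=timezone.utc)
wait = status.seconds_until_reset(now)  # one hour before the reset
```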

Create a quota

POST /api/Usage/{projectId}/quotas
{
  "kind": "agent.tokens.monthly",
  "limit": 50000000,
  "mode": "hard",
  "window": "calendar-month",
  "dimensions": { "agent": "SupportAgent" }
}

HTTP/1.1 201 Created
{
  "id": "quota_123",
  "kind": "agent.tokens.monthly",
  "mode": "hard",
  "state": "active"
}
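The same request can be assembled with Python's standard library. This sketch only builds the request object and does not send it; the host name is a placeholder, authentication is omitted, and the client-side mode check simply mirrors the four enforcement modes documented on this page. build_create_quota_request is a hypothetical helper name.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

BASE_URL = "https://api.example.com"  # placeholder host, not a documented endpoint
MODES = {"hard", "soft", "monitor", "reservation"}


def build_create_quota_request(
    project_id: str,
    kind: str,
    limit: int,
    mode: str = "hard",
    window: str = "calendar-month",
    dimensions: Optional[Dict[str, str]] = None,
) -> urllib.request.Request:
    """Build (but do not send) POST /api/Usage/{projectId}/quotas."""
    if mode not in MODES:
        raise ValueError(f"unknown enforcement mode: {mode!r}")
    body = {"kind": kind, "limit": limit, "mode": mode, "window": window}
    if dimensions:
        body["dimensions"] = dimensions
    # Percent-encode the project id so it cannot alter the path structure.
    path = f"/api/Usage/{urllib.parse.quote(project_id, safe='')}/quotas"
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # auth headers omitted
        method="POST",
    )


req = build_create_quota_request(
    "proj_1", "agent.tokens.monthly", 50_000_000,
    dimensions={"agent": "SupportAgent"},
)
```

Sending it with urllib.request.urlopen(req) would require a real host and credentials.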

Enforcement modes

Mode | Behavior
hard | Rejects the operation before material consumption.
soft | Allows the operation, emits an overage usage event, and triggers policy notifications.
monitor | Records usage and warnings only.
reservation | Pre-reserves capacity before execution and reconciles actual usage after completion.
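The four modes differ mainly in what happens when a request would cross the limit. The sketch below expresses that as a pure decision function, assuming usage is counted in integer units. Mode, Decision, and evaluate are hypothetical names, and reservation mode is modeled only at reservation time (treated like hard); reconciliation after completion is not shown here.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(str, Enum):
    HARD = "hard"
    SOFT = "soft"
    MONITOR = "monitor"
    RESERVATION = "reservation"


@dataclass(frozen=True)
class Decision:
    allowed: bool
    overage: int = 0    # units beyond the limit attributable to this operation
    warn: bool = False  # emit a warning or policy notification


def evaluate(mode: Mode, used: int, limit: int, requested: int) -> Decision:
    projected = used + requested
    over = projected > limit
    # Count only the overage this operation adds, not overage already recorded.
    overage = max(0, projected - max(used, limit))
    if mode in (Mode.HARD, Mode.RESERVATION):
        # Reject before any consumption happens.
        return Decision(allowed=not over, warn=over)
    if mode is Mode.SOFT:
        # Allow, but report the overage so a usage event can be emitted.
        return Decision(allowed=True, overage=overage, warn=over)
    return Decision(allowed=True, warn=over)  # monitor: record and warn only
```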

Budgeted operations

Agent runs, model invocations, workflow runs, distribution materialization, runtime scaling, vertical resource changes, analytics queries, and storage uploads can reserve budget before execution. Reservation failure returns a typed error without partially starting the operation.
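The reserve-then-reconcile flow can be sketched with an in-memory ledger. This is an illustration under stated assumptions: Budget, BudgetExceeded, and run_budgeted are hypothetical names, usage is counted in integer units, and a failed operation simply releases its hold rather than recording partial usage. The property it demonstrates is that the reservation check raises before the operation callable is ever invoked.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict


class BudgetExceeded(Exception):
    """Typed error raised when a reservation cannot be granted."""


@dataclass
class Budget:
    limit: int
    used: int = 0
    reserved: Dict[int, int] = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=itertools.count, repr=False, compare=False)

    def available(self) -> int:
        return self.limit - self.used - sum(self.reserved.values())

    def reserve(self, amount: int) -> int:
        if amount > self.available():
            raise BudgetExceeded(f"requested {amount}, available {self.available()}")
        rid = next(self._ids)
        self.reserved[rid] = amount
        return rid

    def reconcile(self, rid: int, actual: int) -> int:
        """Release the hold, record actual usage, and return overage vs. the hold."""
        held = self.reserved.pop(rid)
        self.used += actual
        return max(0, actual - held)

    def release(self, rid: int) -> None:
        self.reserved.pop(rid, None)


def run_budgeted(budget: Budget, estimate: int, operation: Callable[[], int]) -> int:
    rid = budget.reserve(estimate)  # raises before the operation starts
    try:
        actual = operation()
    except Exception:
        budget.release(rid)  # assumption: failures consume nothing
        raise
    budget.reconcile(rid, actual)
    return actual
```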

Runtime resource budgets

Runtime Fabric enforces both per-surface and cumulative project ceilings: desired instances, max instances, CPU millicores, memory MiB, ephemeral and persistent storage, IOPS, bandwidth, accelerator count, autoscale strategy, load-balancing mode, protocol, and public ingress. Descendant projects inherit the effective envelope and fail closed when a scaling request exceeds it.
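The per-surface and cumulative checks, plus envelope inheritance, can be sketched as follows. The sketch covers only three of the dimensions listed above (instances, CPU millicores, memory MiB); the Resources and Envelope types and the violation strings are hypothetical. It models inheritance as an element-wise minimum of parent and child ceilings, which is an assumption about how the effective envelope is computed.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class Resources:
    instances: int
    cpu_millicores: int
    memory_mib: int

    def __add__(self, other: "Resources") -> "Resources":
        return Resources(*(getattr(self, f.name) + getattr(other, f.name) for f in fields(self)))

    def exceeded(self, ceiling: "Resources") -> List[str]:
        """Names of dimensions where these resources are above the ceiling."""
        return [f.name for f in fields(self) if getattr(self, f.name) > getattr(ceiling, f.name)]


@dataclass(frozen=True)
class Envelope:
    per_surface: Resources
    project_total: Resources


def _elementwise_min(a: Resources, b: Resources) -> Resources:
    return Resources(*(min(getattr(a, f.name), getattr(b, f.name)) for f in fields(a)))


def effective_envelope(parent: Envelope, child: Envelope) -> Envelope:
    """A descendant never exceeds its parent's ceilings (assumed min semantics)."""
    return Envelope(
        _elementwise_min(parent.per_surface, child.per_surface),
        _elementwise_min(parent.project_total, child.project_total),
    )


def check_scaling(
    env: Envelope, current: Dict[str, Resources], surface: str, requested: Resources
) -> List[str]:
    """Return violations; any non-empty result means the request fails closed."""
    violations = [f"surface.{name}" for name in requested.exceeded(env.per_surface)]
    proposed = {**current, surface: requested}  # replaces the surface's allocation
    total = sum(proposed.values(), Resources(0, 0, 0))
    violations += [f"project.{name}" for name in total.exceeded(env.project_total)]
    return violations
```

Collecting every violation instead of stopping at the first lets a caller report all offending dimensions at once; either way, any violation rejects the request.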