Model selection that declares its needs.
An agent author declares what they need: tier (frontier / mid / fast), required capabilities (vision, tool-use, JSON mode, structured output), context window, and latency budget. ILlmCapabilityRegistry knows what every adapter offers. ILlmModelRouter resolves the request through the project's LlmConnectionBinding chain, governed by a DeprecationFallbackPolicy. It is the same declarative shape as DbCapabilities: model selection is not magic; requirements are matched against declared capabilities.
Anthropic. OpenAI. DeepSeek. Moonshot. Peer adapters.
ModelInferenceAdapter = 8
Hosted-LLM access is modeled canonically as CapabilitySurfaceKind.ModelInferenceAdapter. Built-in adapters live exclusively under Vadyl.Connectors.NativeAdapter/ModelInference/<Provider>/V<N>/. Provider SDKs never leak past the adapter boundary.
Capability declarations
Each adapter declares per-model capabilities: context window, modalities, tool-use shape, structured output, prompt caching, and vision. Routers match requirements against these declared capabilities deterministically.
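A minimal Python sketch of deterministic matching, assuming each model's declared capabilities fit in a flat record and that tier must match exactly. ModelCaps, Requirements, satisfies, and match are hypothetical stand-ins, not Vadyl's ILlmCapabilityRegistry API; the model ids and capability names are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelCaps:
    """Capabilities one adapter declares for one model (hypothetical shape)."""
    model_id: str
    tier: str                    # "frontier" | "mid" | "fast"
    context_window: int          # max tokens the model accepts
    capabilities: frozenset      # e.g. {"vision", "tool_use", "json_mode"}
    typical_latency_ms: int      # declared latency figure used for budget checks


@dataclass(frozen=True)
class Requirements:
    """What an agent author declares (hypothetical shape)."""
    tier: str
    capabilities: frozenset = frozenset()
    min_context_window: int = 0
    latency_budget_ms: Optional[int] = None


def satisfies(caps: ModelCaps, req: Requirements) -> bool:
    """True only if every declared requirement is met by declared capabilities."""
    return (
        caps.tier == req.tier                       # assumption: exact tier match
        and req.capabilities <= caps.capabilities   # required caps are a subset
        and caps.context_window >= req.min_context_window
        and (req.latency_budget_ms is None
             or caps.typical_latency_ms <= req.latency_budget_ms)
    )


def match(registry: list, req: Requirements) -> list:
    """Filter the registry in its given order; same inputs always yield the same list."""
    return [c.model_id for c in registry if satisfies(c, req)]


registry = [
    ModelCaps("fast-1", "fast", 32_000, frozenset({"tool_use"}), 300),
    ModelCaps("frontier-1", "frontier", 200_000,
              frozenset({"vision", "tool_use", "json_mode"}), 2_000),
    ModelCaps("frontier-2", "frontier", 128_000, frozenset({"tool_use"}), 1_500),
]
print(match(registry, Requirements("frontier", frozenset({"vision", "tool_use"}), 100_000)))
# ['frontier-1']
```

Because matching is a pure filter over declared data, the same requirements against the same registry always produce the same candidates, in the same order.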
Typed failure classification
Failures map to ModelInvocationFailureKind: AuthenticationFailed, RateLimited, Overloaded, ProviderTimeout, ContextWindowExceeded, ContentFiltered. Never Message.Contains string matching; anti-pattern #89, codified.
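A Python sketch of classification that reads only structured fields (a transport timeout flag, the provider's error code, the HTTP status) and never the message text. FailureKind mirrors the kinds listed above; ProviderError, the UNCLASSIFIED bucket, and both lookup tables are hypothetical, and the error codes are placeholders rather than any real provider's documented codes.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class FailureKind(Enum):
    AUTHENTICATION_FAILED = auto()
    RATE_LIMITED = auto()
    OVERLOADED = auto()
    PROVIDER_TIMEOUT = auto()
    CONTEXT_WINDOW_EXCEEDED = auto()
    CONTENT_FILTERED = auto()
    UNCLASSIFIED = auto()  # assumption: explicit bucket instead of guessing from text


@dataclass(frozen=True)
class ProviderError:
    http_status: Optional[int]   # None if no HTTP response arrived
    error_code: Optional[str]    # structured code from the provider's error body
    message: str                 # human-readable text; never inspected below
    timed_out: bool = False      # set by the transport layer, not parsed from text


# Placeholder codes; a real adapter would use its provider's documented codes.
_CODE_TABLE = {
    "context_too_long": FailureKind.CONTEXT_WINDOW_EXCEEDED,
    "content_blocked": FailureKind.CONTENT_FILTERED,
}
_STATUS_TABLE = {
    401: FailureKind.AUTHENTICATION_FAILED,
    403: FailureKind.AUTHENTICATION_FAILED,
    429: FailureKind.RATE_LIMITED,
    503: FailureKind.OVERLOADED,
    504: FailureKind.PROVIDER_TIMEOUT,
}


def classify(err: ProviderError) -> FailureKind:
    """Classify from structured fields only: transport flag, then code, then status."""
    if err.timed_out:
        return FailureKind.PROVIDER_TIMEOUT
    if err.error_code in _CODE_TABLE:
        return _CODE_TABLE[err.error_code]
    return _STATUS_TABLE.get(err.http_status, FailureKind.UNCLASSIFIED)
```

An error whose message happens to say "context window exceeded" but carries no recognized code stays UNCLASSIFIED, which is the point: the classification cannot drift when a provider rewords its messages.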
Deprecation fallback policy
When the requested model is deprecated, or returns an error class the policy considers fallback-eligible, the router walks to the next eligible model in the binding chain. Operators see every swap; none is silent.
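A Python sketch of the walk, assuming the binding chain is an ordered list of model ids and that invoking a model yields either success or a failure-kind name. FallbackPolicy, RouteResult, route, the chosen eligible kinds, and the "ChainExhausted" reason are all hypothetical, not Vadyl's DeprecationFallbackPolicy or ILlmModelRouter.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass(frozen=True)
class FallbackPolicy:
    # Hypothetical choice of which failure kinds may trigger a walk to the next model.
    eligible_failures: frozenset = frozenset({"Overloaded", "ProviderTimeout"})


@dataclass
class RouteResult:
    model_id: Optional[str] = None             # model that served the request, if any
    failure: Optional[str] = None              # terminal failure, if no model served it
    swaps: list = field(default_factory=list)  # (model walked past, reason), for operators


def route(chain: list, deprecated: set,
          invoke: Callable[[str], Optional[str]],
          policy: FallbackPolicy) -> RouteResult:
    """Walk the binding chain in order. invoke() returns None on success
    or a failure-kind name on error."""
    result = RouteResult()
    for model in chain:
        if model in deprecated:
            result.swaps.append((model, "Deprecated"))
            continue
        failure = invoke(model)
        if failure is None:
            result.model_id = model
            return result
        if failure not in policy.eligible_failures:
            # Not fallback-eligible: surface it instead of masking it with a swap.
            result.failure = failure
            return result
        result.swaps.append((model, failure))
    result.failure = "ChainExhausted"  # hypothetical terminal reason
    return result


outcomes = {"m2": "Overloaded", "m3": None}
r = route(["m1", "m2", "m3"], {"m1"}, outcomes.get, FallbackPolicy())
print(r.model_id, r.swaps)
```

Returning the swaps alongside the result, rather than only logging them, keeps every walk visible to whatever operator surface the caller wires up.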
Token accounting preflight
ITokenAccountingService.PreflightAsync runs BEFORE dispatch. It intersects the budgets from the definition default, the caller override, and, for sub-agents, the parent's residual budget. The call is refused if the intersection is empty.
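A Python sketch of budget intersection, assuming each budget can be modeled as a closed token interval [min_tokens, max_tokens] and that a parent's residual contributes [0, residual]. TokenBudget, intersect, and preflight are hypothetical; Vadyl's actual budget shape may carry more dimensions than tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TokenBudget:
    min_tokens: int  # assumption: a budget is a closed token interval
    max_tokens: int


def intersect(budgets: list) -> Optional[TokenBudget]:
    """Intersection of closed intervals: tightest floor, tightest ceiling."""
    lo = max(b.min_tokens for b in budgets)
    hi = min(b.max_tokens for b in budgets)
    return TokenBudget(lo, hi) if lo <= hi else None


def preflight(definition_default: TokenBudget,
              caller_override: Optional[TokenBudget] = None,
              parent_residual: Optional[int] = None) -> Optional[TokenBudget]:
    """Return the effective budget, or None meaning the call is refused."""
    budgets = [definition_default]
    if caller_override is not None:
        budgets.append(caller_override)
    if parent_residual is not None:  # only sub-agents carry a parent residual
        budgets.append(TokenBudget(0, parent_residual))
    return intersect(budgets)


print(preflight(TokenBudget(1_000, 8_000), TokenBudget(0, 4_000)))
```

A sub-agent whose parent has only 500 tokens left cannot satisfy a definition that needs at least 1,000, so the intersection is empty and the call is refused before any provider is contacted.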
Reconciled usage
RecordUsageAsync reconciles after dispatch, counting actual tokens from the provider response. That usage drives billing through the canonical UsageEvent ledger and rolls up to ProjectQuota.
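A Python sketch of reconciliation, assuming preflight places a per-project token reservation that is released once actual counts arrive from the provider response. UsageLedger, its fields, and this UsageEvent shape are hypothetical and do not reproduce Vadyl's UsageEvent schema or ProjectQuota logic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    project_id: str
    model_id: str
    input_tokens: int   # actual counts reported by the provider
    output_tokens: int

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


class UsageLedger:
    """Hypothetical append-only ledger with per-project rollups."""

    def __init__(self) -> None:
        self.events: list = []
        self.reserved: dict = defaultdict(int)  # preflight holds per project
        self.used: dict = defaultdict(int)      # reconciled actuals per project

    def reserve(self, project_id: str, tokens: int) -> None:
        self.reserved[project_id] += tokens

    def record_usage(self, project_id: str, model_id: str, reserved_tokens: int,
                     input_tokens: int, output_tokens: int) -> UsageEvent:
        event = UsageEvent(project_id, model_id, input_tokens, output_tokens)
        self.events.append(event)                    # append-only ledger entry
        self.reserved[project_id] -= reserved_tokens  # release the preflight estimate
        self.used[project_id] += event.total          # count what was actually consumed
        return event

    def quota_remaining(self, project_id: str, quota: int) -> int:
        return quota - self.used[project_id] - self.reserved[project_id]


ledger = UsageLedger()
ledger.reserve("p1", 4_000)
ledger.record_usage("p1", "m1", 4_000, 1_200, 300)
print(ledger.quota_remaining("p1", 10_000))
```

Swapping the reservation for the actual count is what keeps quota honest: the estimate gates dispatch, but only the provider-reported usage is billed.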
GovernedConnection layer
From authored code, hosted-LLM access still flows through a GovernedConnection of type Llm, for project-scoped binding ergonomics. The adapter itself remains the canonical UCSA kind.
Self-hosted via Runtime Fabric
vLLM, Ollama, GPU pools, and on-prem inference clusters land canonically through CapabilitySurfaceKind.RuntimeSubstrate, with the same scaling and vertical resource policy. There is no separate kind for self-hosted models: same shape, different substrate.
Multi-language adapter authoring
New model providers can ship as a built-in native adapter, a declarative bundle, or an authored Wasm component. WIT contracts ensure conformance regardless of source language.
Anthropic · OpenAI · DeepSeek · Moonshot
Frontier / mid / fast + per-cap
Never Message.Contains
Budget intersection, real usage
Routing that declares its needs.
Author the agent. Declare the requirements. Vadyl resolves the right model through the binding chain — with budget intersection, with deprecation fallback, with typed failure handling.