Bring your own provider
agent-devtools ships two provider lanes. By default the widget talks to
Claude Code via ACP — the local CLI you already authenticated. The
SDK lane swaps that out for any backend that satisfies the
LLMProvider contract exported from @agent-devtools/harness-core.
The widget never sees which backend you picked. The provider abstraction is a server-side seam, not a client-side feature flag. This page documents that seam as a public extension point and shows three concrete adapters you can copy into your own host.
Two lanes, one contract
| Lane | Identifier | Shape | Today’s implementations |
|---|---|---|---|
| ACP (default) | n/a | local CLI spawn | claude CLI (Claude Code) |
| SDK | LLMProvider | request/response | OpenRouter, Groq, Cerebras, OpenAI |
| Session SDK | SessionProvider | persistent session | Claude Agent SDK |
The ACP lane reuses your existing ~/.claude/ OAuth session — no key,
no extra account. The SDK lanes take an API key from the environment.
The LLMProvider interface
Source: packages/harness-core/src/llm/types.ts.
```typescript
export interface LLMProvider {
  readonly supportsTools: boolean;
  readonly providerName: string;

  chat(messages: ChatMessage[], options?: ChatOptions): Promise<ChatResponse>;
  chatWithVision(
    messages: ChatMessage[],
    imageContent: ImageContent,
    options?: ChatOptions,
  ): Promise<ChatResponse>;
  chatWithTools(
    messages: ChatMessage[],
    tools: ToolDefinition[],
    options?: ChatOptions,
  ): Promise<ToolChatResponse>;
}
```

One-line gloss per member:

- `supportsTools`: when `false`, strategies skip `chatWithTools` and stay on plain text turns.
- `providerName`: stable identifier the harness stamps onto `AgentOutput.metadata.provider` for telemetry.
- `chat`: single text-only turn. Returns `{ content, model, usage? }`.
- `chatWithVision`: same as `chat` but takes one `ImageContent` (base64 or URL) and injects it into the last user message.
- `chatWithTools`: tool-use turn. Returns `{ content, toolCalls, finished, model, usage? }`; the model-driven loop iterates while `finished === false`.

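To make the `finished` flag concrete, here is a minimal sketch of a model-driven loop over `chatWithTools`. It is not the harness's actual loop. The types are trimmed local stand-ins for the ones exported from @agent-devtools/harness-core (their exact shapes are assumed), and `runToolLoop`, `executeTool`, and `maxTurns` are hypothetical names introduced only for illustration.

```typescript
// Local stand-ins for the harness-core types (assumed shapes, trimmed to what the loop needs).
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  tool_call_id?: string;
}
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}
interface ToolChatResponse {
  content: string | null;
  toolCalls: ToolCall[];
  finished: boolean;
  model: string;
}
interface ToolCapableProvider {
  readonly supportsTools: boolean;
  chatWithTools(messages: ChatMessage[], tools: unknown[]): Promise<ToolChatResponse>;
}

// Hypothetical loop: ask the model, run the requested tools, feed results back,
// and stop as soon as the provider reports finished === true.
async function runToolLoop(
  provider: ToolCapableProvider,
  messages: ChatMessage[],
  tools: unknown[],
  executeTool: (call: ToolCall) => Promise<string>,
  maxTurns = 8,
): Promise<string | null> {
  if (!provider.supportsTools) {
    throw new Error('provider does not support tools; use chat() instead');
  }
  const history = [...messages];
  for (let turn = 0; turn < maxTurns; turn++) {
    const res = await provider.chatWithTools(history, tools);
    if (res.finished) return res.content;
    // A real loop would also record the tool calls on the assistant turn;
    // how that is stored is harness-specific and left out here.
    history.push({ role: 'assistant', content: res.content ?? '' });
    for (const call of res.toolCalls) {
      const result = await executeTool(call);
      history.push({ role: 'tool', tool_call_id: call.id, content: result });
    }
  }
  throw new Error(`tool loop did not finish within ${maxTurns} turns`);
}
```

The turn cap is a design choice of this sketch: it keeps a provider that never sets `finished` from looping forever.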
ChatOptions.signal is the only field providers MUST honour. Compose it
with your internal timeout via AbortSignal.any([timeoutCtrl.signal, options.signal])
so the fetch aborts on whichever fires first.
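
The composition can be factored into a small helper. This is a sketch, not part of harness-core: `withTimeout` and `postJson` are hypothetical names, and `AbortSignal.any` is assumed to be available in your runtime (Node 20.3+ or a current browser).

```typescript
// Hypothetical helper: combine an internal timeout with an optional caller signal.
function withTimeout(
  timeoutMs: number,
  callerSignal?: AbortSignal,
): { signal: AbortSignal; cleanup: () => void } {
  const timeoutCtrl = new AbortController();
  const timer = setTimeout(() => timeoutCtrl.abort(), timeoutMs);
  // AbortSignal.any aborts as soon as any of its input signals aborts.
  const signal = callerSignal
    ? AbortSignal.any([timeoutCtrl.signal, callerSignal])
    : timeoutCtrl.signal;
  return { signal, cleanup: () => clearTimeout(timer) };
}

// Usage inside a provider method; the URL is whatever backend you target.
async function postJson(
  url: string,
  body: unknown,
  callerSignal?: AbortSignal,
): Promise<unknown> {
  const { signal, cleanup } = withTimeout(60_000, callerSignal);
  try {
    const res = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal,
    });
    return await res.json();
  } finally {
    // Always clear the timer so it cannot fire after the request settles.
    cleanup();
  }
}
```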
Bundled SDK providers
| Provider name | Module | Env var |
|---|---|---|
| `openrouter` | `harness-core/src/llm/openrouter.ts` | `OPENROUTER_API_KEY` |
| `groq` | `harness-core/src/llm/groq.ts` | `GROQ_API_KEY` |
| `cerebras` | `harness-core/src/llm/cerebras.ts` | `CEREBRAS_API_KEY` |
| `openai` | `harness-core/src/llm/openai.ts` | `OPENAI_API_KEY` |
Plus the session-shaped provider:
| Session provider | Module | Env var |
|---|---|---|
| `claude-agent-sdk` | `harness-core/src/llm/claude-agent-sdk.ts` | `ANTHROPIC_API_KEY` |
The session provider returns a SessionProvider — the harness routes it
to the SDK session loop rather than the request/response loop. ACP and
session SDK are the only paths that preserve conversation state across
turns server-side.
Picking a provider — the no-fallback rule
```typescript
import {
  createProvider,
  createSessionProvider,
} from '@agent-devtools/harness-core';

// Request/response lane
const openai = createProvider('openai', 'gpt-4o-mini');

// Session lane (Claude Code subscription auth or ANTHROPIC_API_KEY)
const claude = createSessionProvider('claude-agent-sdk', 'claude-sonnet-4-5');
```

createProvider and createSessionProvider require an explicit model
name. There is no “default model” — the harness refuses to pick one
on the host’s behalf. The reason is operational: a default model in a
shared library becomes a silent dependency that costs money the wrong
way the day the model is renamed or deprecated.
DEFAULT_LLM_PROVIDER is an opt-in env var the host reads to choose
between bundled providers. It is never consulted automatically.
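
A host that opts in might read the variable along these lines. This is a sketch under assumptions: `resolveProviderChoice` is a hypothetical helper, `LLM_MODEL` is an invented variable name (harness-core does not define it), and the provider list is copied from the bundled table above.

```typescript
// Provider names from the bundled table; extend this list for your own adapters.
const KNOWN_PROVIDERS = ['openrouter', 'groq', 'cerebras', 'openai'] as const;
type ProviderName = (typeof KNOWN_PROVIDERS)[number];

interface ProviderChoice {
  provider: ProviderName;
  model: string;
}

// Hypothetical host-side helper. It fails loudly instead of falling back,
// mirroring the no-fallback rule: no provider or no model means no run.
function resolveProviderChoice(
  env: Record<string, string | undefined>,
): ProviderChoice {
  const provider = env.DEFAULT_LLM_PROVIDER;
  const model = env.LLM_MODEL; // assumed variable name, not part of harness-core
  if (!provider) {
    throw new Error('DEFAULT_LLM_PROVIDER is not set; refusing to guess a provider');
  }
  if (!(KNOWN_PROVIDERS as readonly string[]).includes(provider)) {
    throw new Error(`unknown provider "${provider}"`);
  }
  if (!model) {
    throw new Error(`no model configured for ${provider}; refusing to pick a default`);
  }
  return { provider: provider as ProviderName, model };
}
```

The result would then be handed to `createProvider(choice.provider, choice.model)`, so the model name always comes from the host's configuration rather than from the library.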
Discovery — what is wired up right now
```typescript
import {
  getAvailableProviders,
  getAvailableSessionProviders,
  getProviderModels,
} from '@agent-devtools/harness-core';

getAvailableProviders();        // -> ['openai', 'groq', ...] based on env keys
getAvailableSessionProviders(); // -> ['claude-agent-sdk'] if ANTHROPIC_API_KEY
getProviderModels('openai');    // -> the bundled model whitelist
```

These three are the building blocks for the /info HTTP endpoint a host
mounts to let the widget show a “which model am I talking to” indicator.
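
One way a host could assemble that payload is sketched below. The response shape (`InfoPayload`) and the helper name `buildInfoPayload` are assumptions, as is the `string[]` return type of `getProviderModels`. The discovery functions are injected so the sketch runs without harness-core installed; in a real host you would pass the three imported functions.

```typescript
// The three discovery functions, injected as a dependency.
interface Discovery {
  getAvailableProviders(): string[];
  getAvailableSessionProviders(): string[];
  getProviderModels(provider: string): string[]; // return type is assumed
}

// Hypothetical response body for the /info endpoint.
interface InfoPayload {
  providers: Array<{ name: string; models: string[] }>;
  sessionProviders: string[];
}

// Builds the body from discovery results only: provider names and model
// names, never API keys.
function buildInfoPayload(discovery: Discovery): InfoPayload {
  return {
    providers: discovery.getAvailableProviders().map((name) => ({
      name,
      models: discovery.getProviderModels(name),
    })),
    sessionProviders: discovery.getAvailableSessionProviders(),
  };
}
```

The handler for `/info` would then serialize the result with `JSON.stringify` and return it as `application/json`.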
Adding a custom provider
Implement the LLMProvider interface in your host code, then either
register through the factory (fork-and-edit) or pass your instance
directly into the harness-core run loop. The three examples below are
copy-pastable starting points — replace the model identifier, headers,
and response field names to match the backend you actually target.
For OpenAI-compatible endpoints you can also extend
BaseOpenAICompatibleProvider (the same base class OpenRouterProvider,
GroqProvider, and CerebrasProvider use) and only override apiUrl,
providerName, and buildHeaders. The standalone class form below is
shown so the contract is visible end-to-end.
Example 1: OpenAI-compatible chat completions
A generic adapter for any backend that speaks the OpenAI
/v1/chat/completions shape — your own self-hosted vLLM, Azure OpenAI,
Together AI, Fireworks, DeepInfra, and so on. Drop in your baseUrl and
auth header.
```typescript
import type {
  ChatMessage,
  ChatOptions,
  ChatResponse,
  LLMProvider,
  ToolCall,
  ToolChatResponse,
  ToolDefinition,
} from '@agent-devtools/harness-core';

interface OpenAICompatibleResponse {
  choices?: Array<{
    message?: {
      content?: string | null;
      tool_calls?: Array<{
        id: string;
        type: 'function';
        function: { name: string; arguments: string };
      }>;
    };
    finish_reason?: string;
  }>;
  usage?: {
    prompt_tokens?: number;
    completion_tokens?: number;
    total_tokens?: number;
  };
  error?: { message?: string };
}

export class OpenAICompatibleProvider implements LLMProvider {
  readonly providerName = 'openai-compatible';
  readonly supportsTools = true;

  constructor(
    private readonly baseUrl: string,
    private readonly apiKey: string,
    private readonly defaultModel: string,
  ) {}

  async chat(
    messages: ChatMessage[],
    options?: ChatOptions,
  ): Promise<ChatResponse> {
    const model = options?.model ?? this.defaultModel;
    const data = await this.post(model, messages, undefined, options);
    const content = data.choices?.[0]?.message?.content;
    if (!content) {
      throw new Error(`${this.providerName}: empty response from ${model}`);
    }
    return {
      content,
      model,
      ...(data.usage && {
        usage: {
          inputTokens: data.usage.prompt_tokens,
          outputTokens: data.usage.completion_tokens,
          totalTokens: data.usage.total_tokens,
        },
      }),
    };
  }

  async chatWithVision(
    messages: ChatMessage[],
    imageContent: { base64?: string; url?: string; mimeType?: string },
    options?: ChatOptions,
  ): Promise<ChatResponse> {
    const url =
      imageContent.url ??
      (imageContent.base64
        ? `data:${imageContent.mimeType ?? 'image/png'};base64,${imageContent.base64}`
        : undefined);
    if (!url) {
      throw new Error(`${this.providerName}: image url or base64 required`);
    }
    const augmented = messages.map((m, idx) => {
      const isLastUser = idx === messages.length - 1 && m.role === 'user';
      if (!isLastUser) return m;
      const text = typeof m.content === 'string' ? m.content : '';
      return {
        ...m,
        content: [
          { type: 'text' as const, text },
          { type: 'image_url' as const, image_url: { url } },
        ],
      };
    });
    return this.chat(augmented, options);
  }

  async chatWithTools(
    messages: ChatMessage[],
    tools: ToolDefinition[],
    options?: ChatOptions,
  ): Promise<ToolChatResponse> {
    const model = options?.model ?? this.defaultModel;
    const data = await this.post(model, messages, tools, options);
    const msg = data.choices?.[0]?.message;
    const toolCalls: ToolCall[] = (msg?.tool_calls ?? []).map((tc) => ({
      id: tc.id,
      type: 'function' as const,
      function: { name: tc.function.name, arguments: tc.function.arguments },
    }));
    return {
      content: msg?.content ?? null,
      toolCalls,
      finished: toolCalls.length === 0,
      model,
      ...(data.usage && {
        usage: {
          inputTokens: data.usage.prompt_tokens,
          outputTokens: data.usage.completion_tokens,
          totalTokens: data.usage.total_tokens,
        },
      }),
    };
  }

  private async post(
    model: string,
    messages: ChatMessage[],
    tools: ToolDefinition[] | undefined,
    options?: ChatOptions,
  ): Promise<OpenAICompatibleResponse> {
    const timeout = new AbortController();
    const timer = setTimeout(() => timeout.abort(), 60_000);
    const signal = options?.signal
      ? AbortSignal.any([timeout.signal, options.signal])
      : timeout.signal;
    try {
      const res = await fetch(`${this.baseUrl}/v1/chat/completions`, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${this.apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model,
          messages,
          ...(tools && { tools }),
          max_tokens: options?.maxTokens ?? 4096,
          temperature: options?.temperature ?? 0.1,
        }),
        signal,
      });
      const body = (await res.json()) as OpenAICompatibleResponse;
      if (!res.ok || body.error) {
        const msg = body.error?.message ?? res.statusText;
        throw new Error(
          `${this.providerName} ${res.status} on ${model}: ${msg}`,
        );
      }
      return body;
    } finally {
      clearTimeout(timer);
    }
  }
}
```

Example 2: Anthropic Messages API
The Anthropic /v1/messages endpoint has a different shape than OpenAI:
the system prompt lives at the top level (not inside messages), tool
calls are returned as content blocks of type tool_use, and tool
results go back as content blocks of type tool_result. The adapter
below translates the LLMProvider ChatMessage list into that shape and
flattens the response back.
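
Before the full adapter, the message translation on its own can be sketched as a pure function. The types are local stand-ins, and the `tool_calls` field on assistant messages is an assumption about the `ChatMessage` shape (OpenAI-style); `toAnthropic` is a hypothetical name. The sketch also re-emits earlier assistant tool calls as `tool_use` blocks, which the Messages API requires before it will accept the matching `tool_result`.

```typescript
// Local stand-ins; `tool_calls` on assistant messages is an assumed field.
interface ToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}
interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content: string;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}
type AnthropicBlock =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; id: string; name: string; input: unknown }
  | { type: 'tool_result'; tool_use_id: string; content: string };
interface AnthropicTurn {
  role: 'user' | 'assistant';
  content: AnthropicBlock[];
}

function toAnthropic(
  messages: ChatMessage[],
): { system?: string; messages: AnthropicTurn[] } {
  // System prompts move to the top-level `system` field.
  const system = messages
    .filter((m) => m.role === 'system')
    .map((m) => m.content)
    .join('\n\n');
  const turns: AnthropicTurn[] = [];
  for (const m of messages) {
    if (m.role === 'system') continue;
    if (m.role === 'tool') {
      // Tool results go back as a user turn holding a tool_result block.
      turns.push({
        role: 'user',
        content: [
          { type: 'tool_result', tool_use_id: m.tool_call_id ?? '', content: m.content },
        ],
      });
      continue;
    }
    const blocks: AnthropicBlock[] = [];
    if (m.content) blocks.push({ type: 'text', text: m.content });
    // Earlier tool calls become tool_use blocks with parsed JSON input.
    for (const tc of m.tool_calls ?? []) {
      blocks.push({
        type: 'tool_use',
        id: tc.id,
        name: tc.function.name,
        input: JSON.parse(tc.function.arguments || '{}'),
      });
    }
    turns.push({ role: m.role, content: blocks });
  }
  return system ? { system, messages: turns } : { messages: turns };
}
```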
```typescript
import type {
  ChatMessage,
  ChatOptions,
  ChatResponse,
  ContentPart,
  LLMProvider,
  ToolCall,
  ToolChatResponse,
  ToolDefinition,
} from '@agent-devtools/harness-core';

interface AnthropicContentBlock {
  type: 'text' | 'tool_use' | 'tool_result';
  text?: string;
  id?: string;
  name?: string;
  input?: unknown;
  tool_use_id?: string;
  content?: string;
}

interface AnthropicResponse {
  content?: AnthropicContentBlock[];
  stop_reason?: string;
  usage?: { input_tokens?: number; output_tokens?: number };
  error?: { message?: string; type?: string };
}

export class AnthropicProvider implements LLMProvider {
  readonly providerName = 'Anthropic';
  readonly supportsTools = true;

  constructor(
    private readonly apiKey: string,
    private readonly defaultModel: string,
  ) {}

  async chat(
    messages: ChatMessage[],
    options?: ChatOptions,
  ): Promise<ChatResponse> {
    const model = options?.model ?? this.defaultModel;
    const data = await this.post(model, messages, undefined, options);
    const text = (data.content ?? [])
      .filter((b) => b.type === 'text' && b.text)
      .map((b) => b.text!)
      .join('');
    if (!text) {
      throw new Error(`${this.providerName}: empty response from ${model}`);
    }
    return {
      content: text,
      model,
      ...(data.usage && {
        usage: {
          inputTokens: data.usage.input_tokens,
          outputTokens: data.usage.output_tokens,
          totalTokens:
            (data.usage.input_tokens ?? 0) + (data.usage.output_tokens ?? 0),
        },
      }),
    };
  }

  async chatWithVision(
    messages: ChatMessage[],
    imageContent: { base64?: string; url?: string; mimeType?: string },
    options?: ChatOptions,
  ): Promise<ChatResponse> {
    const url =
      imageContent.url ??
      (imageContent.base64
        ? `data:${imageContent.mimeType ?? 'image/png'};base64,${imageContent.base64}`
        : undefined);
    if (!url) {
      throw new Error(`${this.providerName}: image url or base64 required`);
    }
    // NOTE: the parts below use the OpenAI-style `image_url` shape. The
    // Messages API expects image blocks as { type: 'image', source: { ... } },
    // so translate them in post() before sending vision turns for real.
    const augmented = messages.map((m, idx) => {
      const isLastUser = idx === messages.length - 1 && m.role === 'user';
      if (!isLastUser) return m;
      const text = typeof m.content === 'string' ? m.content : '';
      const parts: ContentPart[] = [
        { type: 'text', text },
        { type: 'image_url', image_url: { url } },
      ];
      return { ...m, content: parts };
    });
    return this.chat(augmented, options);
  }

  async chatWithTools(
    messages: ChatMessage[],
    tools: ToolDefinition[],
    options?: ChatOptions,
  ): Promise<ToolChatResponse> {
    const model = options?.model ?? this.defaultModel;
    const data = await this.post(model, messages, tools, options);
    const textBlocks = (data.content ?? []).filter(
      (b) => b.type === 'text' && b.text,
    );
    const toolBlocks = (data.content ?? []).filter(
      (b) => b.type === 'tool_use' && b.id && b.name,
    );
    // Flatten tool_use blocks into the OpenAI-style ToolCall shape.
    const toolCalls: ToolCall[] = toolBlocks.map((b) => ({
      id: b.id!,
      type: 'function' as const,
      function: {
        name: b.name!,
        arguments: JSON.stringify(b.input ?? {}),
      },
    }));
    const text = textBlocks.map((b) => b.text!).join('');
    return {
      content: text.length > 0 ? text : null,
      toolCalls,
      finished: toolCalls.length === 0,
      model,
      ...(data.usage && {
        usage: {
          inputTokens: data.usage.input_tokens,
          outputTokens: data.usage.output_tokens,
          totalTokens:
            (data.usage.input_tokens ?? 0) + (data.usage.output_tokens ?? 0),
        },
      }),
    };
  }

  private async post(
    model: string,
    messages: ChatMessage[],
    tools: ToolDefinition[] | undefined,
    options?: ChatOptions,
  ): Promise<AnthropicResponse> {
    // System prompts move out of the message list into the top-level field.
    const system = messages
      .filter((m) => m.role === 'system')
      .map((m) => (typeof m.content === 'string' ? m.content : ''))
      .join('\n\n');
    // NOTE: assistant turns that carried tool calls must be re-sent as
    // `tool_use` blocks, otherwise the API rejects the matching `tool_result`.
    // That translation is omitted here.
    const turns = messages
      .filter((m) => m.role !== 'system')
      .map((m) => ({
        role: m.role === 'tool' ? 'user' : m.role,
        content:
          m.role === 'tool'
            ? [
                {
                  type: 'tool_result',
                  tool_use_id: m.tool_call_id,
                  content: typeof m.content === 'string' ? m.content : '',
                },
              ]
            : m.content,
      }));
    const anthropicTools = tools?.map((t) => ({
      name: t.function.name,
      description: t.function.description,
      input_schema: t.function.parameters,
    }));
    const timeout = new AbortController();
    const timer = setTimeout(() => timeout.abort(), 60_000);
    const signal = options?.signal
      ? AbortSignal.any([timeout.signal, options.signal])
      : timeout.signal;
    try {
      const res = await fetch('https://api.anthropic.com/v1/messages', {
        method: 'POST',
        headers: {
          'x-api-key': this.apiKey,
          'anthropic-version': '2023-06-01',
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          model,
          ...(system && { system }),
          messages: turns,
          ...(anthropicTools && { tools: anthropicTools }),
          max_tokens: options?.maxTokens ?? 4096,
          temperature: options?.temperature ?? 0.1,
        }),
        signal,
      });
      const body = (await res.json()) as AnthropicResponse;
      if (!res.ok || body.error) {
        const msg = body.error?.message ?? res.statusText;
        throw new Error(
          `${this.providerName} ${res.status} on ${model}: ${msg}`,
        );
      }
      return body;
    } finally {
      clearTimeout(timer);
    }
  }
}
```

Example 3: Local Ollama
Ollama runs models locally on http://localhost:11434.
The /api/chat endpoint takes OpenAI-style messages but returns its own
shape: { message: { content }, prompt_eval_count, eval_count }. Tool
calling exists for models that advertise it (e.g. llama3.1,
qwen2.5-coder), but most local models do not — this adapter sets
supportsTools = false so the harness only calls chat.
```typescript
import type {
  ChatMessage,
  ChatOptions,
  ChatResponse,
  LLMProvider,
  ToolChatResponse,
  ToolDefinition,
} from '@agent-devtools/harness-core';

interface OllamaResponse {
  message?: { role: string; content: string };
  prompt_eval_count?: number;
  eval_count?: number;
  done?: boolean;
  error?: string;
}

export class OllamaProvider implements LLMProvider {
  readonly providerName = 'ollama';
  readonly supportsTools = false;

  constructor(
    private readonly defaultModel: string,
    private readonly baseUrl: string = 'http://127.0.0.1:11434',
  ) {}

  async chat(
    messages: ChatMessage[],
    options?: ChatOptions,
  ): Promise<ChatResponse> {
    const model = options?.model ?? this.defaultModel;
    const timeout = new AbortController();
    // Local models can be slow to load, hence the longer 120 s timeout.
    const timer = setTimeout(() => timeout.abort(), 120_000);
    const signal = options?.signal
      ? AbortSignal.any([timeout.signal, options.signal])
      : timeout.signal;
    try {
      const res = await fetch(`${this.baseUrl}/api/chat`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          model,
          messages: messages.map((m) => ({
            role: m.role,
            content: typeof m.content === 'string' ? m.content : '',
          })),
          stream: false,
          options: {
            temperature: options?.temperature ?? 0.1,
            num_predict: options?.maxTokens ?? 4096,
          },
        }),
        signal,
      });
      const body = (await res.json()) as OllamaResponse;
      if (!res.ok || body.error) {
        throw new Error(
          `${this.providerName} ${res.status} on ${model}: ${body.error ?? res.statusText}`,
        );
      }
      const content = body.message?.content;
      if (!content) {
        throw new Error(`${this.providerName}: empty response from ${model}`);
      }
      return {
        content,
        model,
        usage: {
          inputTokens: body.prompt_eval_count,
          outputTokens: body.eval_count,
          totalTokens: (body.prompt_eval_count ?? 0) + (body.eval_count ?? 0),
        },
      };
    } finally {
      clearTimeout(timer);
    }
  }

  async chatWithVision(
    messages: ChatMessage[],
    _imageContent: { base64?: string; url?: string },
    options?: ChatOptions,
  ): Promise<ChatResponse> {
    // Ollama's vision support depends on the model (llava, bakllava, ...).
    // On /api/chat, images travel as a base64 `images: string[]` field on the
    // individual message. For brevity this adapter falls back to text-only
    // and ignores the image; extend it if your model needs vision turns.
    return this.chat(messages, options);
  }

  async chatWithTools(
    _messages: ChatMessage[],
    _tools: ToolDefinition[],
    _options?: ChatOptions,
  ): Promise<ToolChatResponse> {
    // `supportsTools = false` means the harness never reaches here. Throw
    // loudly so a strategy that ignores the flag fails fast in dev.
    throw new Error(
      `${this.providerName}: tool calling not enabled — set supportsTools = true and implement /api/chat with tools.`,
    );
  }
}
```

Wiring an adapter into the harness
```typescript
import { OpenAICompatibleProvider } from './openai-compatible-provider.js';
// ...or AnthropicProvider, OllamaProvider — same shape.

const provider = new OpenAICompatibleProvider(
  'https://api.together.xyz',
  process.env.TOGETHER_API_KEY!,
  'meta-llama/Llama-3.1-70B-Instruct-Turbo',
);
```

The harness-core run loop takes an LLMProvider, not a provider name.
You can skip the factory entirely and pass the instance directly. If you
want the bundled discovery surface (getAvailableProviders,
getProviderModels, /info endpoint) to know about your adapter, add a
case branch to packages/harness-core/src/llm/factory.ts — but that
step is optional.
Transport — not pluggable today
The pairing-token + loopback HTTP + SSE transport between widget and local dev server is fixed. There is no third lane. The seam is the provider on the server side, not the wire format on the browser side.
If you need a different wire transport — say, a remote dev container talking to a local browser — file an issue. It is a deliberate non-feature today, not an oversight.
Why this matters
The widget never holds an API key. The host decides which lane to use based on policy (cost, latency, capability) and the widget reads whatever the server hands back. Switching from Claude Code to OpenAI for a single user session is a server-side toggle, not a redeploy and not a widget rebuild.
Cross-references
- How it works — where the provider sits in the dev-server architecture.
- Provider — ACP vs SDK — when to pick which lane.
- Permission modes — the action-aware policy that gates what your provider’s tool calls are allowed to do.
- Security model — pairing token, loopback binding, and the boundary your provider runs inside.