AI Fusion docs
An OpenAI-compatible gateway that aggregates free-tier API access across 5 providers and 57 models.
1. Quick start
Three steps to your first request:
- Sign up for a free tenant.
- Visit Tokens and create a project token (`afp_live_…`). Copy it — the plaintext is shown once.
- Make a request:
```bash
curl https://ai.viktorarsov.com/v1/chat/completions \
  -H "Authorization: Bearer afp_live_…" \
  -H "Content-Type: application/json" \
  -d '{"model":"gateway:fast-free","messages":[{"role":"user","content":"hi"}]}'
```
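The same request can be assembled from any HTTP client. A minimal Python sketch that mirrors the curl call above (the helper name `build_chat_request` is ours, not part of the gateway):

```python
import json

GATEWAY_URL = "https://ai.viktorarsov.com/v1/chat/completions"

def build_chat_request(token: str, model: str, content: str):
    """Return (url, headers, body) for a chat completion, mirroring the curl example."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model,
                       "messages": [{"role": "user", "content": content}]})
    return GATEWAY_URL, headers, body
```

Pass the result to any HTTP library; because the surface is OpenAI-compatible, official OpenAI SDKs pointed at this base URL should also work.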
2. Endpoints
All paths are mounted under the root host. See /docs (FastAPI Swagger) for full request and response schemas.
| Method | Path | Description |
|---|---|---|
| POST | /v1/chat/completions | OpenAI-compatible chat completion (streaming + non-streaming). |
| POST | /v1/chat/completions/progress | Same payload, but emits an SSE feed of routing attempts as the spiral proceeds. |
| POST | /v1/messages | Anthropic-compatible Messages API ingress. Translated to OpenAI internally. |
| POST | /v1beta/models/{model}:generateContent | Gemini-native ingress. Translated to OpenAI internally. |
| POST | /v1/embeddings | Generate vector embeddings. |
| POST | /v1/moderations | OpenAI-compatible moderation pass. |
| POST | /v1/images/generations | Image generation (where provider supports it). |
| GET | /v1/models | List models the calling tenant can route to (including aliases). |
| GET | /v1/usage | Per-tenant usage rollup (tokens, cost, savings). |
| POST | /v1/feedback | Submit thumbs-up / thumbs-down on a prior request. |
| GET | /healthz | Liveness + dependency probe. |
3. Authentication
Every /v1/* call must include a project token in the standard bearer header:
Authorization: Bearer afp_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Tokens are issued from Dashboard → Tokens. Only the SHA-256 hash is stored server-side; the plaintext is shown once on creation.
Channel pinning suffix
Append :<channel> to your model string to pin to a routing channel — for example gateway:fast-free:dev isolates a dev workload's quota windows from your prod traffic. Channels are bookkeeping only; they don't change which provider answers.
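Because the channel is a plain suffix, pinning is just string concatenation; a tiny sketch (helper name is ours):

```python
def pin_channel(model: str, channel: str) -> str:
    """Append a ':<channel>' routing-channel suffix to any model string.

    Channels are bookkeeping only; they don't change which provider answers.
    """
    return f"{model}:{channel}"

# Isolate a dev workload's quota windows from prod traffic:
payload = {"model": pin_channel("gateway:fast-free", "dev"),
           "messages": [{"role": "user", "content": "hi"}]}
```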
4. Models & aliases
You can address models three ways:
- Alias — `gateway:fast-free`. Resolves to a ranked list of (provider, model) pairs.
- Provider-qualified — `groq/llama-3.1-8b-instant`. Pins to one provider.
- Bare — `llama-3.3-70b`. Matches across providers by display name.
Aliases
| Slug | Description | Resolves to |
|---|---|---|
| auto | Spiral rotation: T1→T5 across all 5 sources × all keys. The DEFAULT alias. | nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512 → cerebras/qwen-3-235b-a22b-instruct-2507 → gemini/gemini-2.5-pro → groq/openai/gpt-oss-120b +146 more |
| best-free-chat | Best-quality free chat: try premium-feel models first, fall through to high-volume. | gemini/gemini-2.5-pro → nvidia_nim/mistralai/mistral-large-3-675b-instruct-2512 → openrouter/deepseek/deepseek-chat-v3-0324:free → gemini/gemini-2.5-flash +3 more |
| cheap-paid | Cheapest paid fallback when all free quotas exhausted. | groq/llama-3.1-8b-instant → gemini/gemini-2.5-flash-lite → groq/openai/gpt-oss-120b |
| code-free | Coding-tuned free models (autocomplete, refactor, explain). | nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct → openrouter/qwen/qwen3-coder:free → groq/openai/gpt-oss-120b → cerebras/qwen-3-235b-a22b-instruct-2507 |
| fast-free | Low-latency cheap/free models for high-volume simple tasks. | groq/llama-3.1-8b-instant → cerebras/llama3.1-8b → gemini/gemini-2.5-flash-lite → openrouter/google/gemma-4-31b-it:free +1 more |
| long-context | Models with very large context windows (>200K). | gemini/gemini-2.5-pro → gemini/gemini-2.5-flash → openrouter/meta-llama/llama-4-scout:free → openrouter/qwen/qwen3-coder:free |
| reasoning | Stronger reasoning models. | nvidia_nim/deepseek-ai/deepseek-v3.2 → nvidia_nim/moonshotai/kimi-k2-thinking → nvidia_nim/qwen/qwen3-next-80b-a3b-thinking → openrouter/deepseek/deepseek-r1-zero:free +1 more |
| vision | Multimodal (image input) capable free models. | gemini/gemini-2.5-flash → gemini/gemini-2.5-flash-lite → nvidia_nim/meta/llama-4-maverick-17b-128e-instruct → openrouter/meta-llama/llama-4-scout:free +1 more |
Providers
| Provider | Adapter | Base URL |
|---|---|---|
| OpenRouter openrouter | openrouter | https://openrouter.ai/api/v1 |
| Google Gemini (AI Studio) gemini | gemini_native | https://generativelanguage.googleapis.com/v1beta |
| Groq groq | openai_compat | https://api.groq.com/openai/v1 |
| Cerebras cerebras | openai_compat | https://api.cerebras.ai/v1 |
| NVIDIA NIM (build.nvidia.com) nvidia_nim | openai_compat | https://integrate.api.nvidia.com/v1 |
Models
| Display name | Model id | Context | Tier | Capabilities |
|---|---|---|---|---|
| Whisper Large v3 (Groq) | whisper-large-v3 | 448 | audio | |
| Whisper Large v3 Turbo (Groq) | whisper-large-v3-turbo | 448 | audio | |
| Gemini 3.1 Pro (paid) | gemini-3.1-pro | 1048576 | chat | tools vision json stream |
| GPT-4.1 (direct) | gpt-4.1 | 1048576 | chat | tools vision json stream |
| Claude 3.5 Sonnet (direct) | claude-3-5-sonnet-20241022 | 200000 | chat | tools vision json stream |
| GPT-4o (direct) | gpt-4o | 128000 | chat | tools vision json stream |
| Mistral Large 3 675B (NIM) | mistralai/mistral-large-3-675b-instruct-2512 | 128000 | chat | tools stream |
| Claude 3 Opus (direct) | claude-3-opus-20240229 | 200000 | chat | tools vision json stream |
| Llama 3.1 405B (NIM) | meta/llama-3.1-405b-instruct | 128000 | chat | tools stream |
| Nemotron Ultra 253B (NIM) | nvidia/llama-3.1-nemotron-ultra-253b-v1 | 128000 | chat | tools stream |
| Gemini 2.5 Pro | gemini-2.5-pro | 1048576 | chat | tools vision json stream |
| DeepSeek V3.2 (NIM) | deepseek-ai/deepseek-v3.2 | 128000 | chat | tools stream |
| Hermes 3 Llama 3.1 405B (free) | nousresearch/hermes-3-llama-3.1-405b:free | 131072 | chat | tools stream |
| Llama 4 Maverick (NIM, vision) | meta/llama-4-maverick-17b-128e-instruct | 128000 | chat | tools vision stream |
| Nemotron 3 Super 120B (free) | nvidia/nemotron-3-super-120b-a12b:free | 262144 | chat | tools stream |
| Qwen3 235B A22B (Cerebras) | qwen-3-235b-a22b-instruct-2507 | 65536 | chat | tools stream |
| Arcee Trinity Large 400B (free) | arcee-ai/trinity-large-preview:free | 131072 | chat | tools stream |
| DeepSeek V3.1 Terminus (NIM) | deepseek-ai/deepseek-v3.1-terminus | 128000 | chat | tools stream |
| Gemini 3.1 Flash (paid) | gemini-3.1-flash | 1048576 | chat | tools vision json stream |
| Qwen3-Next 80B (free) | qwen/qwen3-next-80b-a3b-instruct:free | 262144 | chat | tools stream |
| Kimi K2 (NIM) | moonshotai/kimi-k2-instruct | 128000 | chat | tools stream |
| Gemini 2.5 Flash | gemini-2.5-flash | 1048576 | chat | tools vision json stream |
| GPT-OSS 120B (OR, free) | openai/gpt-oss-120b:free | 131072 | chat | tools json stream |
| Llama 3.3 70B | llama-3.3-70b-versatile | 131072 | chat | tools json stream |
| Groq Compound (agentic) | groq/compound | 131072 | chat | tools stream |
| Llama 3.3 70B (NIM) | meta/llama-3.3-70b-instruct | 128000 | chat | tools stream |
| GPT-OSS 120B (Groq) | openai/gpt-oss-120b | 131072 | chat | tools json stream |
| Gemma 4 31B (free) | google/gemma-4-31b-it:free | 262144 | chat | tools stream |
| GPT-OSS 120B (NIM) | openai/gpt-oss-120b | 128000 | chat | tools json stream |
| Nemotron Super 49B v1.5 (NIM) | nvidia/llama-3.3-nemotron-super-49b-v1.5 | 128000 | chat | tools stream |
| Llama 3.3 70B (free) | meta-llama/llama-3.3-70b-instruct:free | 65536 | chat | tools stream |
| Llama 4 Scout 17B (Groq) | meta-llama/llama-4-scout-17b-16e-instruct | 131072 | chat | tools vision stream |
| Z.ai GLM 4.5 Air (free) | z-ai/glm-4.5-air:free | 131072 | chat | tools stream |
| Gemma 4 26B (free) | google/gemma-4-26b-a4b-it:free | 262144 | chat | tools stream |
| Minimax M2.5 (free) | minimax/minimax-m2.5:free | 196608 | chat | tools stream |
| Nemotron 3 Nano 30B (free) | nvidia/nemotron-3-nano-30b-a3b:free | 256000 | chat | tools stream |
| Qwen3 32B (Groq) | qwen/qwen3-32b | 131072 | chat | tools json stream |
| Nemotron Nano 12B VL (free, vision) | nvidia/nemotron-nano-12b-v2-vl:free | 128000 | chat | tools vision stream |
| Gemma 3 27B (free) | google/gemma-3-27b-it:free | 131072 | chat | tools stream |
| Qwen3 Coder 480B (NIM) | qwen/qwen3-coder-480b-a35b-instruct | 128000 | code | tools stream |
| Qwen3 Coder 480B (free) | qwen/qwen3-coder:free | 262144 | code | tools stream |
| Claude 3.5 Haiku (direct) | claude-3-5-haiku-20241022 | 200000 | fast | tools vision json stream |
| GPT-4o mini (direct) | gpt-4o-mini | 128000 | fast | tools vision json stream |
| Groq Compound Mini (agentic-fast) | groq/compound-mini | 131072 | fast | tools stream |
| Gemini 2.0 Flash (deprecates 2026-06-01) | gemini-2.0-flash | 1048576 | fast | tools vision json stream |
| GPT-OSS 20B (OR, free) | openai/gpt-oss-20b:free | 131072 | fast | tools json stream |
| GPT-OSS 20B (NIM) | openai/gpt-oss-20b | 128000 | fast | tools json stream |
| GPT-OSS 20B (Groq) | openai/gpt-oss-20b | 131072 | fast | tools json stream |
| Nemotron Nano 9B (free) | nvidia/nemotron-nano-9b-v2:free | 128000 | fast | tools stream |
| Llama 3.1 8B Instant | llama-3.1-8b-instant | 131072 | fast | tools json stream |
| Gemini 3.1 Flash-Lite (paid) | gemini-3.1-flash-lite | 1048576 | fast | tools vision json stream |
| Llama 3.2 3B (free) | meta-llama/llama-3.2-3b-instruct:free | 131072 | fast | tools stream |
| Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | 1048576 | fast | tools vision json stream |
| Llama 3.1 8B (Cerebras) | llama3.1-8b | 8192 | fast | tools stream |
| o3-mini (direct, reasoning) | o3-mini | 200000 | reason | tools json stream |
| Kimi K2 Thinking (NIM, reasoning) | moonshotai/kimi-k2-thinking | 128000 | reason | tools stream |
| Qwen3-Next 80B Thinking (NIM) | qwen/qwen3-next-80b-a3b-thinking | 128000 | reason | tools stream |
5. Live progress streaming
POST /v1/chat/completions/progress accepts the same JSON body as /v1/chat/completions but always responds with text/event-stream. Each SSE event has a type field describing what just happened in the spiral:
```
event: progress
data: {"type":"attempt","provider":"groq","model":"llama-3.1-8b-instant","key":"k_3"}

event: progress
data: {"type":"failure","provider":"groq","status":429,"reason":"rate_limited"}

event: progress
data: {"type":"attempt","provider":"cerebras","model":"llama-3.3-70b","key":"k_1"}

event: progress
data: {"type":"success","provider":"cerebras","latency_ms":612,"tokens":248}

event: done
data: {"choices":[{"message":{"role":"assistant","content":"…"}}]}
```
Clients should keep the connection open until they see an event: done frame.
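Consuming the feed amounts to splitting frames on blank lines and JSON-decoding each `data:` payload. A minimal parser sketch (no SSE library; field names as shown in the sample above):

```python
import json

def parse_sse(raw: str):
    """Yield (event, data) pairs from a raw text/event-stream buffer."""
    for frame in raw.strip().split("\n\n"):
        event, data = None, None
        for line in frame.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if event is not None:
            yield event, data
```

A real client would read the stream incrementally and stop once it sees the `done` event.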
6. Cross-format ingress
The same model pool can be reached using non-OpenAI request shapes:
- `POST /v1/messages` — Anthropic Messages API surface. Same auth header; the response shape matches Anthropic's.
- `POST /v1beta/models/{model}:generateContent` — Google Gemini REST shape. Pass an `x-goog-api-key` header containing your `afp_live_…` token, or use `?key=…`.
Both surfaces translate to the OpenAI-compatible internal pipeline so the same routing, quotas, and webhooks apply.
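For the Gemini-shaped surface, authentication can ride in a header or the query string. A small sketch of both options (the helper name is ours):

```python
def gemini_request(host: str, model: str, token: str, use_query_key: bool = False):
    """Build (url, headers) for the Gemini-native generateContent ingress."""
    url = f"{host}/v1beta/models/{model}:generateContent"
    headers = {"Content-Type": "application/json"}
    if use_query_key:
        url += f"?key={token}"          # ?key=… form
    else:
        headers["x-goog-api-key"] = token  # header form
    return url, headers
```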
7. Quotas & rate limits
Three layers cooperate:
- Plan caps — monthly request and token ceilings, set per plan.
- RPM cap — sliding-window per project token, enforced in Redis.
- Per-key daily quotas — enforced atomically inside the routing layer, with cooldown until provider midnight on 429/401.
When all keys are cooling down, the gateway parks the request for up to `max_wait_ms` (passed via the `x-gateway-max-wait-ms` header), then either serves it or returns 429 with a `next_slot_eta_s` field in the body.
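Client-side, a 429 carrying `next_slot_eta_s` can be turned into a sleep-or-give-up decision. A hedged sketch (the budget logic is ours, not gateway behavior):

```python
def seconds_to_wait(body: dict, budget_s: float):
    """Given a 429 body with next_slot_eta_s, return how long to sleep
    before retrying, or None if the wait exceeds the caller's budget."""
    eta = body.get("next_slot_eta_s")
    if eta is None or eta > budget_s:
        return None
    return float(eta)
```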
8. Error codes
| Status | Meaning | What to do |
|---|---|---|
| 200 | OK | Read body and headers. |
| 400 | Malformed request body | Validate JSON shape. |
| 401 | Missing or invalid project token | Re-issue token in dashboard. |
| 403 | Token disabled or tenant disabled | Contact admin. |
| 404 | Unknown model or alias | List models via /v1/models. |
| 429 | Plan, RPM, or pool exhausted | Honor Retry-After or next_slot_eta_s. |
| 500 | Internal error | Retry; surface request id from response header. |
| 502 | Upstream provider returned bad payload | Retry with another model. |
| 503 | No healthy keys for any provider in the resolution list | Wait or use a different alias. |
| 504 | Upstream timeout | Retry; consider a smaller model. |
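The table above implies a simple retry policy: transient statuses are worth retrying, client errors are not. A minimal classifier sketch:

```python
# Statuses the table marks as retryable (quota, internal, upstream issues).
RETRYABLE = {429, 500, 502, 503, 504}

def should_retry(status: int) -> bool:
    """Retry transient statuses; fail fast on 4xx client errors."""
    return status in RETRYABLE
```

On 429 and 503, honor `Retry-After` (or `next_slot_eta_s`) before retrying rather than retrying immediately.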
9. Webhooks
Configure target URLs in Dashboard → Webhooks. We POST a JSON body and an HMAC-SHA256 signature in X-AFP-Signature:
```json
{
  "type": "request.completed",
  "tenant_id": "…",
  "request_id": "…",
  "model": "groq/llama-3.1-8b-instant",
  "tokens_in": 41,
  "tokens_out": 207,
  "cost_usd": 0.0000,
  "savings_usd": 0.00031,
  "latency_ms": 612,
  "ts": "2026-04-21T10:14:33Z"
}
```
Event types:
- `request.completed` — every successful chat / embeddings / image call.
- `request.failed` — every terminal failure after the spiral exhausts.
- `quota.threshold` — fired at 80% and 100% of the monthly cap.
- `invoice.created` — emitted at the start of each billing period.
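Receivers should verify the signature over the raw request body before trusting the payload. A sketch assuming `X-AFP-Signature` carries the hex HMAC-SHA256 digest of the body (the exact encoding isn't specified above, so treat the hex assumption as ours):

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, raw_body: bytes, signature_header: str) -> bool:
    """Recompute HMAC-SHA256 over the raw body and compare it, in constant
    time, against the X-AFP-Signature header value."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

Always verify against the raw bytes as received; re-serializing the JSON first can change key order or whitespace and break the comparison.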
10. Response headers we add
Every successful response includes:
| Header | Value |
|---|---|
| x-gateway-provider | Slug of the provider that answered (e.g. groq). |
| x-gateway-model | The concrete model id served (post-alias-resolution). |
| x-gateway-latency-ms | End-to-end latency including all retries. |
| x-gateway-tokens | in/out token counts, comma-separated. |
| x-gateway-attempts | How many (provider, key) attempts were made. |
| x-gateway-cost-usd | Computed cost using the model's per-MTok pricing. |
| x-gateway-request-id | Stable id you can quote in support tickets. |
| Retry-After | Seconds to wait, on 429 / 503 only. |
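The `x-gateway-tokens` header can be split into input/output counts. A tiny sketch assuming the documented comma-separated `in,out` layout:

```python
def parse_gateway_tokens(value: str):
    """Split the x-gateway-tokens header ('in,out') into two ints."""
    tokens_in, tokens_out = value.split(",", 1)
    return int(tokens_in), int(tokens_out)
```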