AI Fusion docs
An OpenAI-compatible gateway aggregating free-tier API access across 5 providers and 42 models. Three steps: sign up, mint a token, POST to /v1/chat/completions.
Quickstart
Point any OpenAI-SDK-compatible client at https://ai.viktorarsov.com/v1 and supply your project token as the API key.
from openai import OpenAI

client = OpenAI(
    base_url="https://ai.viktorarsov.com/v1",
    api_key="afp_live_…",
)
resp = client.chat.completions.create(
    model="gateway:fast-free",
    messages=[{"role": "user", "content": "Say hi."}],
)
print(resp.choices[0].message.content)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai.viktorarsov.com/v1",
  apiKey: "afp_live_…",
});
const resp = await client.chat.completions.create({
  model: "gateway:fast-free",
  messages: [{ role: "user", content: "Say hi." }],
});
console.log(resp.choices[0].message.content);

curl https://ai.viktorarsov.com/v1/chat/completions \
  -H "Authorization: Bearer afp_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gateway:fast-free",
    "messages": [{"role": "user", "content": "Say hi."}]
  }'

Models & aliases
You can address a model three ways:
- Alias: gateway:fast-free, gateway:best-free-chat. Resolves to a ranked list. Recommended.
- Provider-qualified: groq/llama-3.1-8b-instant. Pins the request to one provider.
- Bare: llama-3.3-70b. Matches across providers by display name.
Aliases
| Slug | Kind | Description | Fallbacks |
|---|---|---|---|
| auto | alias | Spiral rotation: T1→T5 across all 5 sources × all keys. The DEFAULT alias. | 150 |
| best-free-chat | alias | Best-quality free chat: try premium-feel models first, fall through to high-volume. | 7 |
| cheap-paid | alias | Cheapest paid fallback when all free quotas exhausted. | 3 |
| code-free | alias | Coding-tuned free models (autocomplete, refactor, explain). | 4 |
| embed-cheap | alias | Cheap free-tier embeddings: Gemini gemini-embedding-001 (768-dim). | 1 |
| fast-free | alias | Low-latency cheap/free models for high-volume simple tasks. | 5 |
| gpt-4o-mini | alias | Compat alias: gpt-4o-mini → openrouter gpt-oss-120b:free. Added 2026-05-12 to absorb stray legacy callers. | 3 |
| long-context | alias | Models with very large context windows (>200K). | 4 |
| openai/gpt-4o-mini | alias | Compat alias: openai/gpt-4o-mini → openrouter gpt-oss-120b:free. Added 2026-05-12. | 3 |
| reasoning | alias | Stronger reasoning models. | 5 |
| vision | alias | Multimodal (image input) capable free models. | 5 |
The gateway automatically rotates through each alias's fallback chain when a model hits its quota or is in cooldown. Chains may include paid models as last-resort fallbacks; routing tries free providers first, ranked by live cost score.
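The rotation described above can be pictured with a small sketch. This is not the gateway's actual router: the `Candidate` fields, the `cost_score` scale (0.0 for free), the `cooling_until` timestamp, and the `paid-example` entry are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    provider: str
    model: str
    cost_score: float            # hypothetical scale: 0.0 = free, higher = pricier
    cooling_until: float = 0.0   # hypothetical: epoch seconds; 0.0 means ready now


def pick_candidate(chain: list[Candidate], now: float) -> Optional[Candidate]:
    """Return the cheapest candidate that is not cooling down, or None."""
    ready = [c for c in chain if c.cooling_until <= now]
    # min() returns the first minimum it sees, so ties keep their chain order.
    return min(ready, key=lambda c: c.cost_score, default=None)


# Illustrative chain: two free entries and one paid last resort.
chain = [
    Candidate("groq", "openai/gpt-oss-20b", 0.0, cooling_until=100.0),
    Candidate("openrouter", "openai/gpt-oss-20b:free", 0.0),
    Candidate("paid-example", "hypothetical-paid-model", 1.0),
]
```

With this chain, a request at `now=50.0` skips the cooling Groq entry and lands on the OpenRouter one; the paid entry is only reached once every free entry is cooling.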
Raw models
42 models. Free models are listed as free in both price columns. The full OpenAPI schema is at /docs.
| Provider | Model id | Display | Context | $/Mtok in | $/Mtok out |
|---|---|---|---|---|---|
| nvidia_nim | mistralai/mistral-large-3-675b-instruct-2512 | Mistral Large 3 675B (NIM) | 128000 | free | free |
| nvidia_nim | nvidia/llama-3.1-nemotron-ultra-253b-v1 | Nemotron Ultra 253B (NIM) | 128000 | free | free |
| nvidia_nim | meta/llama-3.1-405b-instruct | Llama 3.1 405B (NIM) | 128000 | free | free |
| nvidia_nim | deepseek-ai/deepseek-v3.2 | DeepSeek V3.2 (NIM) | 128000 | free | free |
| openrouter | nousresearch/hermes-3-llama-3.1-405b:free | Hermes 3 Llama 3.1 405B (free) | 131072 | free | free |
| nvidia_nim | meta/llama-4-maverick-17b-128e-instruct | Llama 4 Maverick (NIM, vision) | 128000 | free | free |
| openrouter | nvidia/nemotron-3-super-120b-a12b:free | Nemotron 3 Super 120B (free) | 262144 | free | free |
| openrouter | arcee-ai/trinity-large-preview:free | Arcee Trinity Large 400B (free) | 131072 | free | free |
| nvidia_nim | deepseek-ai/deepseek-v3.1-terminus | DeepSeek V3.1 Terminus (NIM) | 128000 | free | free |
| cerebras | zai-glm-4.7 | Z.ai GLM 4.7 (Cerebras) | 32768 | free | free |
| openrouter | qwen/qwen3-next-80b-a3b-instruct:free | Qwen3-Next 80B (free) | 262144 | free | free |
| nvidia_nim | moonshotai/kimi-k2-instruct | Kimi K2 (NIM) | 128000 | free | free |
| openrouter | openai/gpt-oss-120b:free | GPT-OSS 120B (OR, free) | 131072 | free | free |
| groq | groq/compound | Groq Compound (agentic) | 131072 | free | free |
| nvidia_nim | meta/llama-3.3-70b-instruct | Llama 3.3 70B (NIM) | 128000 | free | free |
| openrouter | google/gemma-4-31b-it:free | Gemma 4 31B (free) | 262144 | free | free |
| cerebras | gpt-oss-120b | GPT-OSS 120B (Cerebras) | 131072 | free | free |
| nvidia_nim | openai/gpt-oss-120b | GPT-OSS 120B (NIM) | 128000 | free | free |
| nvidia_nim | nvidia/llama-3.3-nemotron-super-49b-v1.5 | Nemotron Super 49B v1.5 (NIM) | 128000 | free | free |
| openrouter | meta-llama/llama-3.3-70b-instruct:free | Llama 3.3 70B (free) | 65536 | free | free |
| groq | meta-llama/llama-4-scout-17b-16e-instruct | Llama 4 Scout 17B (Groq) | 131072 | free | free |
| openrouter | z-ai/glm-4.5-air:free | Z.ai GLM 4.5 Air (free) | 131072 | free | free |
| openrouter | google/gemma-4-26b-a4b-it:free | Gemma 4 26B (free) | 262144 | free | free |
| openrouter | nvidia/nemotron-3-nano-30b-a3b:free | Nemotron 3 Nano 30B (free) | 256000 | free | free |
| groq | qwen/qwen3-32b | Qwen3 32B (Groq) | 131072 | free | free |
| openrouter | minimax/minimax-m2.5:free | Minimax M2.5 (free) | 196608 | free | free |
| openrouter | google/gemma-3-27b-it:free | Gemma 3 27B (free) | 131072 | free | free |
| openrouter | nvidia/nemotron-nano-12b-v2-vl:free | Nemotron Nano 12B VL (free, vision) | 128000 | free | free |
| freellmapi | auto | Aggregator auto-pick | 8192 | free | free |
| no_cost_ai | auto | Aggregator auto-pick | 8192 | free | free |
| nvidia_nim | qwen/qwen3-coder-480b-a35b-instruct | Qwen3 Coder 480B (NIM) | 128000 | free | free |
| openrouter | qwen/qwen3-coder:free | Qwen3 Coder 480B (free) | 262144 | free | free |
| gemini | text-embedding-004 | Gemini text-embedding-004 (768-dim, free) | 2048 | free | free |
| gemini | gemini-embedding-001 | Gemini gemini-embedding-001 (768-dim via outputDimensionality, free) | 2048 | free | free |
| groq | groq/compound-mini | Groq Compound Mini (agentic-fast) | 131072 | free | free |
| openrouter | openai/gpt-oss-20b:free | GPT-OSS 20B (OR, free) | 131072 | free | free |
| openrouter | nvidia/nemotron-nano-9b-v2:free | Nemotron Nano 9B (free) | 128000 | free | free |
| nvidia_nim | openai/gpt-oss-20b | GPT-OSS 20B (NIM) | 128000 | free | free |
| groq | openai/gpt-oss-20b | GPT-OSS 20B (Groq) | 131072 | free | free |
| openrouter | meta-llama/llama-3.2-3b-instruct:free | Llama 3.2 3B (free) | 131072 | free | free |
| nvidia_nim | moonshotai/kimi-k2-thinking | Kimi K2 Thinking (NIM, reasoning) | 128000 | free | free |
| nvidia_nim | qwen/qwen3-next-80b-a3b-thinking | Qwen3-Next 80B Thinking (NIM) | 128000 | free | free |
Error shapes
All errors return a uniform JSON envelope and an HTTP status matching the failure class.
{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Provider quota exhausted; try again in 30s.",
    "request_id": "req_01H…",
    "retry_after_s": 30
  }
}

Error types
| Type | Status | Meaning |
|---|---|---|
| authentication_failed | 401 | Missing or invalid project token. |
| rate_limit_exceeded | 429 | RPM, daily or monthly cap hit. See retry_after_s. |
| pool_exhausted | 429 | Every upstream key is cooling. See next_slot_eta_s. |
| model_unavailable | 404 | Unknown model or alias resolution is empty. |
| upstream_error | 502 | All attempts failed with a 5xx at the provider. |
| bad_request | 400 | Malformed OpenAI-shape body or guardrail rejection. |
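A client can turn the envelope and the table above into a single retry decision. The sketch below is one possible shape, not an official SDK: `GatewayError` and `parse_gateway_error` are hypothetical names, treating `upstream_error` as retryable is a judgment call, and it assumes `next_slot_eta_s` sits inside the `error` object next to `retry_after_s`, which the docs do not spell out.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Assumption: these three classes are worth retrying; the others are caller bugs.
RETRYABLE_TYPES = {"rate_limit_exceeded", "pool_exhausted", "upstream_error"}


@dataclass
class GatewayError:
    type: str
    message: str
    status: int
    request_id: Optional[str]
    wait_s: Optional[float]  # suggested delay before retrying, if the gateway gave one

    @property
    def retryable(self) -> bool:
        return self.type in RETRYABLE_TYPES


def parse_gateway_error(status: int, body: str) -> GatewayError:
    """Parse the uniform JSON error envelope into a GatewayError."""
    err = json.loads(body).get("error", {})
    wait = err.get("retry_after_s")
    if wait is None:
        # Assumed location of next_slot_eta_s (see lead-in).
        wait = err.get("next_slot_eta_s")
    return GatewayError(
        type=err.get("type", "unknown"),
        message=err.get("message", ""),
        status=status,
        request_id=err.get("request_id"),
        wait_s=wait,
    )
```

Logging `request_id` alongside the error type makes it easier to match a failure against the gateway's own records.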
Headers & rate limits
Every successful response carries x-gateway-* metadata, so you can see which provider and model answered, and at what cost, without enabling extra logging.
Response headers
| Header | Meaning |
|---|---|
| x-gateway-provider | Slug of the provider that answered. |
| x-gateway-model | Final upstream model id used. |
| x-gateway-attempts | How many (key, model) pairs were tried. |
| x-gateway-latency-ms | Wall-clock time from ingress to first byte. |
| x-gateway-tokens | Total tokens counted for this request. |
| x-gateway-cost-usd | USD cost in our accounting. |
| x-gateway-request-id | UUID for cross-referencing in your logs. |
| Retry-After | Seconds until next attempt, on 429/503. |
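The headers in the table can be collected into one typed record for logging or cost tracking. The following is a minimal sketch: `GatewayMeta` and `parse_gateway_headers` are hypothetical names, and it assumes the numeric headers arrive as plain decimal strings, which the docs do not confirm.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class GatewayMeta:
    provider: Optional[str]
    model: Optional[str]
    attempts: Optional[int]
    latency_ms: Optional[float]
    tokens: Optional[int]
    cost_usd: Optional[float]
    request_id: Optional[str]


def _num(value: Optional[str], cast: Callable):
    """Convert a header value with `cast`; return None if missing or malformed."""
    if value is None:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_gateway_headers(headers: Mapping[str, str]) -> GatewayMeta:
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    return GatewayMeta(
        provider=h.get("x-gateway-provider"),
        model=h.get("x-gateway-model"),
        attempts=_num(h.get("x-gateway-attempts"), int),
        latency_ms=_num(h.get("x-gateway-latency-ms"), float),
        tokens=_num(h.get("x-gateway-tokens"), int),
        cost_usd=_num(h.get("x-gateway-cost-usd"), float),
        request_id=h.get("x-gateway-request-id"),
    )
```

With the OpenAI Python SDK, the response headers should be reachable through `client.chat.completions.with_raw_response.create(...)`, whose result exposes `.headers` and a `.parse()` method that returns the usual completion object.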
Rate limits
Per-tenant limits come from your plan: requests per minute (RPM), monthly requests, and monthly tokens. When the RPM limit is reached, the gateway either waits up to max_wait_ms and retries, or returns 429 with next_slot_eta_s.
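The wait-or-reject choice can be illustrated with a few lines of Python. This is a guess at the decision rule rather than the gateway's code: `admit_or_reject` is a hypothetical name, and it assumes the gateway waits only when the next free slot falls inside `max_wait_ms`.

```python
def admit_or_reject(next_slot_eta_s: float, max_wait_ms: int) -> tuple[str, float]:
    """Decide whether to hold a request or reject it with a 429.

    Returns ("wait", seconds_to_wait) when the next slot opens within
    max_wait_ms, otherwise ("reject_429", next_slot_eta_s) so the caller
    can report the ETA back to the client.
    """
    wait_ms = next_slot_eta_s * 1000
    if wait_ms <= max_wait_ms:
        return ("wait", next_slot_eta_s)
    return ("reject_429", next_slot_eta_s)
```

Under this rule, a slot that opens in 0.5 s is waited for when `max_wait_ms` is 1000, while a 30 s ETA is rejected immediately.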
Idempotency & retries
The gateway retries transient upstream failures automatically (up to 2 retries per attempt, across up to N providers). When you retry on your side, send an Idempotency-Key header so the gateway can deduplicate the request if the SSE stream disconnects mid-response.
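A client-side retry loop that reuses one Idempotency-Key across attempts might look like the sketch below. The `send` callable is a hypothetical transport hook standing in for your HTTP client, and retrying on 429 and 502 with a `retry_after_s` hint or exponential backoff is an assumption, not documented gateway guidance.

```python
import time
import uuid
from typing import Callable, Mapping

# Hypothetical transport: receives extra headers, returns (status, parsed JSON body).
Send = Callable[[Mapping[str, str]], tuple[int, dict]]

RETRY_STATUSES = {429, 502}  # assumption: only these are worth retrying


def post_with_retries(
    send: Send,
    max_tries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, dict]:
    """Call `send` up to max_tries times, reusing one Idempotency-Key."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    # One key per logical request, so every retry can be deduplicated.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    status, body = 0, {}
    for attempt in range(max_tries):
        status, body = send(headers)
        if status not in RETRY_STATUSES:
            break
        if attempt < max_tries - 1:
            # Prefer the gateway's hint; otherwise back off 1s, 2s, 4s, ...
            delay = body.get("error", {}).get("retry_after_s", 2 ** attempt)
            sleep(delay)
    return status, body
```

With the OpenAI Python SDK, the same header can be attached per call through the `extra_headers` argument of `client.chat.completions.create(...)`.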