AI Fusion

AI Fusion docs

An OpenAI-compatible gateway aggregating free-tier API access across 5 providers and 42 models. Three steps: sign up, mint a token, POST to /v1/chat/completions.

Quickstart

Point any OpenAI-SDK-compatible client at https://ai.viktorarsov.com/v1 and supply your project token as the API key.

Python:

from openai import OpenAI

client = OpenAI(
    base_url="https://ai.viktorarsov.com/v1",
    api_key="afp_live_…",
)
resp = client.chat.completions.create(
    model="gateway:fast-free",
    messages=[{"role": "user", "content": "Say hi."}],
)
print(resp.choices[0].message.content)

Node.js:

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai.viktorarsov.com/v1",
  apiKey: "afp_live_…",
});
const resp = await client.chat.completions.create({
  model: "gateway:fast-free",
  messages: [{ role: "user", content: "Say hi." }],
});
console.log(resp.choices[0].message.content);

curl:

curl https://ai.viktorarsov.com/v1/chat/completions \
  -H "Authorization: Bearer afp_live_…" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gateway:fast-free",
    "messages": [{"role": "user", "content": "Say hi."}]
  }'

Models & aliases

You can address a model three ways:

  • Alias: gateway:fast-free, gateway:best-free-chat. Resolves to a ranked list of models. Recommended.
  • Provider-qualified: groq/llama-3.1-8b-instant. Pins the request to one provider.
  • Bare: llama-3.3-70b. Matched across providers by display name.
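To make the three forms concrete, here is a hypothetical classifier. It assumes an alias always carries the gateway: prefix and that a provider-qualified id starts with one of the provider slugs from the Raw models table below; the gateway's real resolver may apply different rules (for example, the gpt-4o-mini compat aliases are listed without the prefix, and this sketch would treat them as bare names).

```python
# Hypothetical sketch: classify a model string into the three addressing
# forms. The gateway's actual resolution logic may differ.
KNOWN_PROVIDERS = {
    "nvidia_nim", "openrouter", "cerebras", "groq",
    "gemini", "freellmapi", "no_cost_ai",
}


def addressing_form(model: str) -> str:
    """Return 'alias', 'provider-qualified', or 'bare' for a model string."""
    if model.startswith("gateway:"):
        return "alias"
    # Only the first path segment can be a provider slug; model ids such as
    # meta-llama/llama-3.3-70b-instruct:free contain slashes of their own.
    prefix, sep, _rest = model.partition("/")
    if sep and prefix in KNOWN_PROVIDERS:
        return "provider-qualified"
    return "bare"


for m in ["gateway:fast-free", "groq/llama-3.1-8b-instant", "llama-3.3-70b"]:
    print(m, "->", addressing_form(m))
```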

Aliases

Slug | Kind | Description | Fallbacks
auto | alias | Spiral rotation: T1→T5 across all 5 sources × all keys. The default alias. | 150
best-free-chat | alias | Best-quality free chat: try premium-feel models first, fall through to high-volume. | 7
cheap-paid | alias | Cheapest paid fallback when all free quotas are exhausted. | 3
code-free | alias | Coding-tuned free models (autocomplete, refactor, explain). | 4
embed-cheap | alias | Cheap free-tier embeddings: Gemini gemini-embedding-001 (768-dim). | 1
fast-free | alias | Low-latency cheap/free models for high-volume simple tasks. | 5
gpt-4o-mini | alias | Compat alias: gpt-4o-mini → openrouter gpt-oss-120b:free. Added 2026-05-12 to absorb stray legacy callers. | 3
long-context | alias | Models with very large context windows (>200K tokens). | 4
openai/gpt-4o-mini | alias | Compat alias: openai/gpt-4o-mini → openrouter gpt-oss-120b:free. Added 2026-05-12. | 3
reasoning | alias | Stronger reasoning models. | 5
vision | alias | Multimodal (image input) capable free models. | 5

When a model hits its quota or is on cooldown, the gateway automatically moves to the next entry in the alias's fallback chain. Chains may include paid models as last-resort fallbacks; routing tries free providers first, ordered by a live cost score.
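The rotation can be modelled in a few lines. This is a minimal client-side sketch, not the gateway's implementation: Candidate, cost_score, QuotaError, and the 30-second cooldown are hypothetical names and values chosen for illustration. It skips candidates that are cooling down, prefers free over paid, and breaks ties by the lower cost score.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


class QuotaError(Exception):
    """Hypothetical: raised by a send function when a model's quota is exhausted."""


@dataclass
class Candidate:
    model: str
    free: bool
    cost_score: float            # hypothetical live score; lower is cheaper
    cooldown_until: float = 0.0  # epoch seconds; 0 means not cooling


def route_order(chain: List[Candidate], now: float) -> List[Candidate]:
    """Drop cooling candidates, then order free before paid, then by cost score."""
    ready = [c for c in chain if c.cooldown_until <= now]
    return sorted(ready, key=lambda c: (not c.free, c.cost_score))


def call_alias(chain: List[Candidate], send: Callable[[str], str],
               cooldown_s: float = 30.0, now: Optional[float] = None) -> str:
    """Walk the chain in route order until one candidate answers."""
    now = time.time() if now is None else now
    for cand in route_order(chain, now):
        try:
            return send(cand.model)
        except QuotaError:
            # Put this model on cooldown and fall through to the next one.
            cand.cooldown_until = now + cooldown_s
    raise RuntimeError("every candidate in the chain is exhausted or cooling")
```

Sorting on the tuple (not c.free, c.cost_score) works because False sorts before True, so free candidates always precede paid ones regardless of score.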

Raw models

42 models. Free models show "free" in both price columns. The full OpenAPI schema is at /docs.

Provider | Model id | Display name | Context (tokens) | $/Mtok in | $/Mtok out
nvidia_nim | mistralai/mistral-large-3-675b-instruct-2512 | Mistral Large 3 675B (NIM) | 128000 | free | free
nvidia_nim | nvidia/llama-3.1-nemotron-ultra-253b-v1 | Nemotron Ultra 253B (NIM) | 128000 | free | free
nvidia_nim | meta/llama-3.1-405b-instruct | Llama 3.1 405B (NIM) | 128000 | free | free
nvidia_nim | deepseek-ai/deepseek-v3.2 | DeepSeek V3.2 (NIM) | 128000 | free | free
openrouter | nousresearch/hermes-3-llama-3.1-405b:free | Hermes 3 Llama 3.1 405B (free) | 131072 | free | free
nvidia_nim | meta/llama-4-maverick-17b-128e-instruct | Llama 4 Maverick (NIM, vision) | 128000 | free | free
openrouter | nvidia/nemotron-3-super-120b-a12b:free | Nemotron 3 Super 120B (free) | 262144 | free | free
openrouter | arcee-ai/trinity-large-preview:free | Arcee Trinity Large 400B (free) | 131072 | free | free
nvidia_nim | deepseek-ai/deepseek-v3.1-terminus | DeepSeek V3.1 Terminus (NIM) | 128000 | free | free
cerebras | zai-glm-4.7 | Z.ai GLM 4.7 (Cerebras) | 32768 | free | free
openrouter | qwen/qwen3-next-80b-a3b-instruct:free | Qwen3-Next 80B (free) | 262144 | free | free
nvidia_nim | moonshotai/kimi-k2-instruct | Kimi K2 (NIM) | 128000 | free | free
openrouter | openai/gpt-oss-120b:free | GPT-OSS 120B (OR, free) | 131072 | free | free
groq | groq/compound | Groq Compound (agentic) | 131072 | free | free
nvidia_nim | meta/llama-3.3-70b-instruct | Llama 3.3 70B (NIM) | 128000 | free | free
openrouter | google/gemma-4-31b-it:free | Gemma 4 31B (free) | 262144 | free | free
cerebras | gpt-oss-120b | GPT-OSS 120B (Cerebras) | 131072 | free | free
nvidia_nim | openai/gpt-oss-120b | GPT-OSS 120B (NIM) | 128000 | free | free
nvidia_nim | nvidia/llama-3.3-nemotron-super-49b-v1.5 | Nemotron Super 49B v1.5 (NIM) | 128000 | free | free
openrouter | meta-llama/llama-3.3-70b-instruct:free | Llama 3.3 70B (free) | 65536 | free | free
groq | meta-llama/llama-4-scout-17b-16e-instruct | Llama 4 Scout 17B (Groq) | 131072 | free | free
openrouter | z-ai/glm-4.5-air:free | Z.ai GLM 4.5 Air (free) | 131072 | free | free
openrouter | google/gemma-4-26b-a4b-it:free | Gemma 4 26B (free) | 262144 | free | free
openrouter | nvidia/nemotron-3-nano-30b-a3b:free | Nemotron 3 Nano 30B (free) | 256000 | free | free
groq | qwen/qwen3-32b | Qwen3 32B (Groq) | 131072 | free | free
openrouter | minimax/minimax-m2.5:free | Minimax M2.5 (free) | 196608 | free | free
openrouter | google/gemma-3-27b-it:free | Gemma 3 27B (free) | 131072 | free | free
openrouter | nvidia/nemotron-nano-12b-v2-vl:free | Nemotron Nano 12B VL (free, vision) | 128000 | free | free
freellmapi | auto | Aggregator auto-pick | 8192 | free | free
no_cost_ai | auto | Aggregator auto-pick | 8192 | free | free
nvidia_nim | qwen/qwen3-coder-480b-a35b-instruct | Qwen3 Coder 480B (NIM) | 128000 | free | free
openrouter | qwen/qwen3-coder:free | Qwen3 Coder 480B (free) | 262144 | free | free
gemini | text-embedding-004 | Gemini text-embedding-004 (768-dim, free) | 2048 | free | free
gemini | gemini-embedding-001 | Gemini gemini-embedding-001 (768-dim via outputDimensionality, free) | 2048 | free | free
groq | groq/compound-mini | Groq Compound Mini (agentic-fast) | 131072 | free | free
openrouter | openai/gpt-oss-20b:free | GPT-OSS 20B (OR, free) | 131072 | free | free
openrouter | nvidia/nemotron-nano-9b-v2:free | Nemotron Nano 9B (free) | 128000 | free | free
nvidia_nim | openai/gpt-oss-20b | GPT-OSS 20B (NIM) | 128000 | free | free
groq | openai/gpt-oss-20b | GPT-OSS 20B (Groq) | 131072 | free | free
openrouter | meta-llama/llama-3.2-3b-instruct:free | Llama 3.2 3B (free) | 131072 | free | free
nvidia_nim | moonshotai/kimi-k2-thinking | Kimi K2 Thinking (NIM, reasoning) | 128000 | free | free
nvidia_nim | qwen/qwen3-next-80b-a3b-thinking | Qwen3-Next 80B Thinking (NIM) | 128000 | free | free

Error shapes

All errors return a uniform JSON envelope and an HTTP status matching the failure class.

{
  "error": {
    "type": "rate_limit_exceeded",
    "message": "Provider quota exhausted; try again in 30s.",
    "request_id": "req_01H…",
    "retry_after_s": 30
  }
}

Error types

Type | Status | Meaning
authentication_failed | 401 | Missing or invalid project token.
rate_limit_exceeded | 429 | RPM, daily, or monthly cap hit. See retry_after_s.
pool_exhausted | 429 | Every upstream key is cooling down. See next_slot_eta_s.
model_unavailable | 404 | Unknown model, or the alias resolved to an empty list.
upstream_error | 502 | Every attempt failed with a 5xx from the provider.
bad_request | 400 | Malformed OpenAI-format body, or rejected by a guardrail.
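A client can use the envelope to decide whether, and how long, to wait before retrying. The sketch below is an assumption-laden helper, not part of any SDK: retry_delay_s, RETRYABLE_TYPES, and the 1-second default are hypothetical, and it assumes next_slot_eta_s appears inside the error object alongside retry_after_s. It prefers the envelope's own hints, then the Retry-After header, then a fixed default.

```python
import json
from typing import Mapping, Optional

# Assumption: the error types worth retrying after a delay.
RETRYABLE_TYPES = {"rate_limit_exceeded", "pool_exhausted", "upstream_error"}


def retry_delay_s(body: str, headers: Mapping[str, str],
                  default_s: float = 1.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error is not retryable."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None  # not the documented JSON envelope
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict) or err.get("type") not in RETRYABLE_TYPES:
        return None
    # Prefer the hints named in the error table.
    for field in ("retry_after_s", "next_slot_eta_s"):
        if field in err:
            return float(err[field])
    # Fall back to Retry-After (header names are case-insensitive).
    lowered = {k.lower(): v for k, v in headers.items()}
    if "retry-after" in lowered:
        return float(lowered["retry-after"])
    return default_s  # hypothetical default when no hint is present
```

Treating authentication_failed, model_unavailable, and bad_request as non-retryable follows from their meanings: resending the same request cannot fix a bad token, an unknown model, or a malformed body.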

Headers & rate limits

Every successful response carries x-gateway-* headers, so you can see which provider and model answered, and at what cost, without enabling extra logging.

Response headers

Header | Meaning
x-gateway-provider | Slug of the provider that answered.
x-gateway-model | Final upstream model id used.
x-gateway-attempts | How many (key, model) pairs were tried.
x-gateway-latency-ms | Wall-clock time from ingress to first byte.
x-gateway-tokens | Total tokens counted for this request.
x-gateway-cost-usd | USD cost in our accounting.
x-gateway-request-id | UUID for cross-referencing in your logs.
Retry-After | Seconds until the next attempt is allowed (on 429/503).
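To log this metadata, a small parser can turn the headers into typed fields. This is a sketch: GatewayMeta and parse_gateway_headers are hypothetical names, and it assumes the numeric headers hold plain decimal strings. Missing headers come back as None rather than raising.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


@dataclass
class GatewayMeta:
    provider: Optional[str]
    model: Optional[str]
    attempts: Optional[int]
    latency_ms: Optional[float]
    tokens: Optional[int]
    cost_usd: Optional[float]
    request_id: Optional[str]


def parse_gateway_headers(headers: Mapping[str, str]) -> GatewayMeta:
    """Collect x-gateway-* headers; any missing header becomes None."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    def num(name: str, cast: Callable[[str], T]) -> Optional[T]:
        raw = h.get(name)
        return cast(raw) if raw is not None else None

    return GatewayMeta(
        provider=h.get("x-gateway-provider"),
        model=h.get("x-gateway-model"),
        attempts=num("x-gateway-attempts", int),
        latency_ms=num("x-gateway-latency-ms", float),
        tokens=num("x-gateway-tokens", int),
        cost_usd=num("x-gateway-cost-usd", float),
        request_id=h.get("x-gateway-request-id"),
    )
```

With the openai Python SDK (v1+), response headers are available through client.chat.completions.with_raw_response.create(...); the returned object exposes .headers, which you can pass to this parser, and .parse() for the usual completion object.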

Rate limits

Per-tenant limits come from your plan: requests per minute (RPM), monthly requests, and monthly tokens. When the RPM bucket is full, the gateway either waits up to max_wait_ms for a slot and then retries, or returns 429 with next_slot_eta_s.
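To avoid hitting the RPM limit at all, a client can pace itself. The following is a hypothetical client-side token bucket, not the gateway's own limiter: RpmBucket, try_acquire, and wait_s are illustrative names, and the clock is injectable so the behaviour can be tested. It holds up to rpm tokens and refills at rpm / 60 tokens per second.

```python
import time
from typing import Callable


class RpmBucket:
    """Client-side token bucket: capacity `rpm`, refilled at rpm/60 tokens per second."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic):
        self.capacity = float(rpm)
        self.tokens = float(rpm)   # start full
        self.rate = rpm / 60.0     # tokens per second
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return whether the request may proceed."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_s(self) -> float:
        """Seconds until one token is available (0.0 if one is available now)."""
        self._refill()
        return 0.0 if self.tokens >= 1 else (1 - self.tokens) / self.rate
```

Before each request, call try_acquire(); if it returns False, sleep for wait_s() and try again. Set rpm to your plan's limit, or slightly below it to leave headroom for other clients sharing the same token.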

Idempotency & retries

The gateway automatically retries transient upstream failures (up to 2 per attempt, across up to N providers). When you retry on your side, send an Idempotency-Key header so we can deduplicate the request if our SSE stream disconnects mid-response.
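The key point for client-side retries is to generate the Idempotency-Key once per logical request and reuse it on every retry. The sketch below assumes a hypothetical send callable that you wire to your HTTP client; it returns (status, retry_after_s or None, body). The retryable status set and the exponential backoff are illustrative choices, not gateway requirements.

```python
import time
import uuid
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical transport: takes request headers and returns
# (status, retry_after_s or None, body).
Send = Callable[[Mapping[str, str]], Tuple[int, Optional[float], str]]

RETRY_STATUSES = {429, 502, 503}  # assumption: statuses worth retrying


def post_with_retries(send: Send, max_attempts: int = 3, base_delay_s: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Send once, retrying retryable statuses with the same Idempotency-Key."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    # One key for the whole logical request, reused on every retry so the
    # gateway can recognise the retries as duplicates.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    status, body = 0, ""
    for attempt in range(max_attempts):
        status, retry_after, body = send(headers)
        if status not in RETRY_STATUSES or attempt == max_attempts - 1:
            break
        # Prefer the server's hint; otherwise back off exponentially.
        sleep(retry_after if retry_after is not None else base_delay_s * 2 ** attempt)
    return status, body
```

With the openai Python SDK you can attach the header per call via extra_headers={"Idempotency-Key": key}. Note that the SDK also retries some failures on its own (see its max_retries option), so you may want to disable one layer to avoid compounding retries.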