A curated list of permanently free LLM APIs, with rate limits, OpenAI SDK compatibility, available SDKs, speed tiers, and free model lists. No trial credits. No time-limited promos. No credit card required.
Keywords: free LLM API · free AI API · OpenAI compatible API · free GPT API · free Llama API · free inference API · LLM API key · no credit card AI API · free tier AI · open source LLM hosting
Most "free LLM API" lists give you a name and a link. This one gives you everything you need to decide before you sign up, so you're not hunting across 12 different docs pages to compare rate limits, SDK support, and OpenAI compatibility:
| Column | What it means |
|---|---|
| Free Models | Models available on the permanent free tier |
| Rate Limits | RPM (requests/min) and RPD (requests/day) |
| OpenAI Compat | Can you use the OpenAI Python/JS SDK by just swapping base_url? |
| SDKs | Official client libraries available |
| Speed Tier | 🟢 Fast / 🟡 Medium / 🔴 Slow (see legend below) |
- Provider APIs: companies that train or fine-tune their own models
- Inference Providers: third-party platforms hosting open-weight models
  - includes Groq, Cerebras, OpenRouter, GitHub Models, NVIDIA NIM, Hugging Face, Cloudflare, Kluster AI, LLM7.io, Pollinations AI
- Speed Tier Legend
- Quick Comparison Table
- Code Snippets
- Contributing
APIs run by the companies that train or fine-tune the models themselves. These are official free AI APIs directly from the model creators.
Google Gemini 🇺🇸
Google's flagship model family. The free tier via AI Studio is among the most generous of any first-party provider.

| Detail | Info |
|---|---|
| Free Models | Gemini 2.5 Pro, Gemini 2.0 Flash, Gemini 1.5 Flash, Gemini 1.5 Flash-8B, Gemini 1.0 Pro |
| Rate Limits | 15 RPM / 1,500 RPD (Flash) · 2 RPM / 50 RPD (2.5 Pro) |
| OpenAI Compat | ✅ Yes, base URL `https://generativelanguage.googleapis.com/v1beta/openai/` |
| SDKs | Python (`google-generativeai`), JS/TS, REST, Go, Swift, Dart |
| Speed Tier | 🟢 Fast (Flash) · 🟡 Medium (Pro) |

⚠️ Free tier not available in the EU, UK, or Switzerland. Check the provider's list of available regions.
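A requests-per-minute cap such as the 15 RPM Flash figure listed above translates into a minimum gap between calls. The following is a minimal client-side sketch of that idea; `min_interval_seconds` and `Throttle` are hypothetical names introduced here, not part of any provider SDK, and real limits may be enforced differently on the server.

```python
import time


def min_interval_seconds(rpm: int) -> float:
    """Smallest gap between requests that keeps a client at or under `rpm`."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    return 60.0 / rpm


class Throttle:
    """Blocks just long enough to keep calls at least `interval` seconds apart."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval_seconds(rpm)
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self) -> float:
        """Sleep if needed and return the number of seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.interval - (now - self._last))
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay  # moment the caller is released
        return delay


flash = Throttle(rpm=15)  # 15 RPM is the Flash figure from the entry above
print(flash.interval)     # 4.0
```

Calling `flash.wait()` before each request spaces calls four seconds apart. The same helper with `rpm=2` gives a 30-second gap for the 2.5 Pro limit.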
Mistral AI 🇪🇺
European flagship. Apache 2.0 licensed models that are free to use and self-host. One of the best free tiers for token volume.

| Detail | Info |
|---|---|
| Free Models | Mistral Small 3.1, Mistral Large 3, Ministral 8B, Codestral Mamba, Mistral Embed |
| Rate Limits | 1 req/sec · 1B tokens/month |
| OpenAI Compat | ✅ Yes, base URL `https://api.mistral.ai/v1` |
| SDKs | Python (`mistralai`), JS/TS (`@mistralai/mistralai`), REST |
| Speed Tier | 🟡 Medium |
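With two caps in play (1 req/sec and 1B tokens/month), it helps to know which one binds first. The arithmetic below is a rough sketch that assumes a 30-day month and that the token cap counts all tokens in a request; both are assumptions, and `break_even_tokens_per_request` is a hypothetical helper.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def break_even_tokens_per_request(monthly_tokens: int, req_per_sec: float, days: int = 30) -> float:
    """Average tokens per request at which the monthly token cap and the
    request-rate cap would be exhausted at the same moment."""
    max_requests = req_per_sec * days * SECONDS_PER_DAY
    return monthly_tokens / max_requests


# Figures from the entry above: 1 req/sec and 1B tokens/month.
per_request = break_even_tokens_per_request(1_000_000_000, 1.0)
print(round(per_request))  # 386
```

Under these assumptions, a workload averaging more than about 386 tokens per request would hit the monthly token cap before it could saturate the request rate.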
Cohere 🇺🇸
Specializes in enterprise NLP. Strong free tier for RAG and embedding use cases.

| Detail | Info |
|---|---|
| Free Models | Command A, Command R+, Command R, Aya Expanse 32B, Aya Expanse 8B + 5 more |
| Rate Limits | 20 RPM · 1,000 req/month |
| OpenAI Compat | ⚠️ Partial, native SDK preferred (`cohere-python`) |
| SDKs | Python (`cohere`), JS/TS (`cohere-ai`), Go, Java, REST |
| Speed Tier | 🟡 Medium |
Zhipu AI 🇨🇳
Chinese AI lab. Flash models are genuinely free with no published cap, which makes them good for experimentation.

| Detail | Info |
|---|---|
| Free Models | GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash (vision) |
| Rate Limits | Undocumented |
| OpenAI Compat | ✅ Yes, base URL `https://open.bigmodel.cn/api/paas/v4/` |
| SDKs | Python (`zhipuai`), REST |
| Speed Tier | 🟡 Medium |
Third-party platforms hosting open-weight models from various sources (Meta, Mistral, DeepSeek, etc.). These are free AI inference APIs, so there is no need to self-host.
Groq 🇺🇸
Fastest free inference available. Runs on custom LPU (Language Processing Unit) hardware. Drop-in OpenAI replacement.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Llama 4 Scout, Llama 4 Maverick, Gemma 2 9B, Kimi K2, Qwen QwQ 32B + 17 more |
| Rate Limits | 30 RPM · 14,400 RPD · 6,000 TPM (varies by model) |
| OpenAI Compat | ✅ Yes, base URL `https://api.groq.com/openai/v1` |
| SDKs | Python (`groq`), JS/TS (`groq`), REST |
| Speed Tier | 🟢 Fast: LPU hardware, consistently 300-500 tok/sec |
Cerebras 🇺🇸
Wafer-scale chip inference. Competes with Groq on raw speed and has a strong free tier.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Qwen3 235B, Llama 4 Scout, GPT-OSS-120B + 3 more |
| Rate Limits | 30 RPM · 60,000 TPM · 14,400 RPD |
| OpenAI Compat | ✅ Yes, base URL `https://api.cerebras.ai/v1` |
| SDKs | Python (`cerebras-cloud-sdk`), REST |
| Speed Tier | 🟢 Fast: wafer-scale chip, comparable to Groq |
OpenRouter 🇺🇸
One API key for 30+ free models across multiple providers. Great as a fallback layer or for model switching.

| Detail | Info |
|---|---|
| Free Models | DeepSeek R1, Llama 3.3 70B, GPT-OSS-120B, Qwen3 Coder 480B + 27 more (models ending in `:free`) |
| Rate Limits | 20 RPM · 200 RPD |
| OpenAI Compat | ✅ Yes, base URL `https://openrouter.ai/api/v1` |
| SDKs | Python (via `openai`), JS/TS (via `openai`), REST |
| Speed Tier | 🟡 Medium: routes to various backends, latency varies |
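The "fallback layer" idea mentioned in this entry can be sketched as a loop that tries providers in order and returns the first reply. This is an illustrative pattern only: `first_success` and `fake_call` are hypothetical names, and `fake_call` stands in for a real chat-completion request so the sketch runs offline. The base URLs and model ids are the ones listed elsewhere in this README.

```python
from typing import Callable, Iterable, Tuple

Provider = Tuple[str, str, str]  # (name, base_url, model)


def first_success(providers: Iterable[Provider], call: Callable[[str, str], str]) -> Tuple[str, str]:
    """Try each provider in order; return (name, reply) from the first that works."""
    errors = []
    for name, base_url, model in providers:
        try:
            return name, call(base_url, model)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in for a real request so the sketch runs offline (hypothetical helper).
def fake_call(base_url: str, model: str) -> str:
    if "groq" in base_url:
        raise TimeoutError("simulated rate limit")
    return f"reply from {model}"


chain = [
    ("groq", "https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
    ("openrouter", "https://openrouter.ai/api/v1", "meta-llama/llama-3.3-70b-instruct:free"),
]
print(first_success(chain, fake_call))
```

In real use, `call` would build an `openai` client with the given `base_url` and the matching API key, then send the request. Catching every exception keeps the sketch short; a production version would likely fall back only on rate-limit and availability errors.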
GitHub Models 🇺🇸
Free inference via GitHub account. Good access to frontier models like GPT-4o alongside open-weight models.

| Detail | Info |
|---|---|
| Free Models | GPT-4o, Llama 3.3 70B, DeepSeek-R1, Phi-4, Mistral Large + more |
| Rate Limits | 10-15 RPM · 50-150 RPD (varies by model tier) |
| OpenAI Compat | ✅ Yes, base URL `https://models.inference.ai.azure.com` |
| SDKs | Python (`azure-ai-inference` or `openai`), JS/TS, REST |
| Speed Tier | 🟡 Medium |
NVIDIA NIM 🇺🇸
NVIDIA's hosted inference. Access to large parameter models including Qwen3 235B on GPU clusters.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Mistral Large, Qwen3 235B, DeepSeek-R1 + more |
| Rate Limits | 40 RPM (credit-based, replenishes) |
| OpenAI Compat | ✅ Yes, base URL `https://integrate.api.nvidia.com/v1` |
| SDKs | Python (via `openai`), REST |
| Speed Tier | 🟡 Medium |
Hugging Face Serverless 🇺🇸
$0.10/month in free credits, auto-replenished. Access to thousands of community and flagship models. Limited to models under 10GB unless featured.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Qwen2.5 72B, Mistral 7B, Zephyr, Phi + many more |
| Rate Limits | $0.10 free credits/month (auto-refresh) |
| OpenAI Compat | ✅ Yes, base URL `https://api-inference.huggingface.co/v1` |
| SDKs | Python (`huggingface_hub`, `openai`), JS/TS, REST |
| Speed Tier | 🔴 Slow: shared queues, cold starts common |
Cloudflare Workers AI 🇺🇸
Edge inference baked into Cloudflare Workers. Globally low-latency via the Cloudflare network. 10K "neurons"/day free.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Qwen QwQ 32B, Phi-2, Gemma 7B + 47 more |
| Rate Limits | 10,000 neurons/day (1 neuron ≈ 1 output token) |
| OpenAI Compat | ⚠️ Partial, own REST API format, not drop-in |
| SDKs | JS/TS (Workers SDK), Python (via REST), REST |
| Speed Tier | 🟡 Medium: edge inference, latency varies by region |
Kluster AI 🇺🇸
Newer inference provider with access to flagship open-weight models. Rate limits not publicly documented.

| Detail | Info |
|---|---|
| Free Models | DeepSeek-R1, Llama 4 Maverick, Qwen3-235B + 2 more |
| Rate Limits | Undocumented |
| OpenAI Compat | ✅ Yes |
| SDKs | Python (via `openai`), REST |
| Speed Tier | 🟡 Medium |
LLM7.io 🇬🇧
UK-based inference provider. Token-based rate limiting: requesting a free token raises the limit from 15 to 30 RPM.

| Detail | Info |
|---|---|
| Free Models | DeepSeek R1, Gemini Flash-Lite, Qwen2.5 Coder + 27 more |
| Rate Limits | 15 RPM (30 RPM with free token) |
| OpenAI Compat | ✅ Yes |
| SDKs | Python (via `openai`), REST |
| Speed Tier | 🟡 Medium |
Pollinations AI 🇩🇪
Berlin-based open-source platform. Unique in covering text, image, video, and audio generation all under one free API. No sign-up required for basic use: just hit the endpoint. An API key unlocks higher limits and model access.

| Detail | Info |
|---|---|
| Free Models | `openai`, `openai-large`, `openai-reasoning`, `gemini`, `gemini-large`, `mistral`, `llama` (text) · `flux`, `gpt-image`, `seedream`, `kontext` (image) · `wan-fast` (video) · `tts-1`, 30+ ElevenLabs voices (audio) |
| Rate Limits | Per-IP, resets hourly. Exact cap undocumented; authenticated requests get priority limits |
| OpenAI Compat | ✅ Yes, base URL `https://gen.pollinations.ai/v1` (text & audio endpoints) |
| SDKs | Python (via `openai`), JS/TS (via `openai`), REST, MCP server |
| Speed Tier | 🟡 Medium |

💡 Standout feature: The only free API on this list with image, video, and audio generation alongside text, all from one key. Also has an MCP server for use directly inside Claude and other AI assistants.
| Tier | Typical Output Speed | Hardware |
|---|---|---|
| 🟢 Fast | 300-600 tok/sec | Custom silicon (LPU/Wafer-scale) |
| 🟡 Medium | 50-150 tok/sec | Standard cloud GPUs (A100/H100) |
| 🔴 Slow | < 50 tok/sec or variable | Shared queues, CPU offload, cold starts |

Speed tiers are approximate. Real-world performance varies based on model size, prompt length, and time of day.
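To make the tiers concrete, the tok/sec ranges in the legend can be turned into rough wall-clock times for a response. This is back-of-the-envelope arithmetic that assumes a steady output rate and ignores queueing and time to first token; `generation_seconds` is a hypothetical helper.

```python
# Lower and upper bounds in tokens/sec, copied from the legend above.
TIERS = {
    "fast": (300, 600),
    "medium": (50, 150),
}


def generation_seconds(output_tokens: int, tok_per_sec: float) -> float:
    """Time to stream `output_tokens` at a steady rate (no queueing, no cold start)."""
    return output_tokens / tok_per_sec


for tier, (low, high) in TIERS.items():
    best = generation_seconds(1000, high)
    worst = generation_seconds(1000, low)
    print(f"{tier}: {best:.1f}s to {worst:.1f}s for 1,000 tokens")
```

A 1,000-token reply therefore takes roughly 1.7 to 3.3 seconds on the Fast tier and 6.7 to 20 seconds on the Medium tier. The Slow tier, at under 50 tok/sec, takes at least 20 seconds.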
Twelve of the free LLM API providers side by side (Kluster AI and LLM7.io are not in the table). Sorted by category (provider APIs first, then inference providers). Use this to pick the right free AI API for your use case before diving into the full entries above.
| Provider | Best Free Model | RPM | RPD | OpenAI Compat | Speed |
|---|---|---|---|---|---|
| Google Gemini | Gemini 2.5 Pro | 2-15 | 50-1,500 | ✅ | 🟢🟡 |
| Mistral AI | Mistral Large 3 | 60 | Unlimited* | ✅ | 🟡 |
| Cohere | Command A | 20 | ~33/day | ⚠️ | 🟡 |
| Zhipu AI | GLM-4.7-Flash | - | - | ✅ | 🟡 |
| Groq | Llama 3.3 70B | 30 | 14,400 | ✅ | 🟢 |
| Cerebras | Qwen3 235B | 30 | 14,400 | ✅ | 🟢 |
| OpenRouter | Qwen3 Coder 480B | 20 | 200 | ✅ | 🟡 |
| GitHub Models | GPT-4o | 10-15 | 50-150 | ✅ | 🟡 |
| NVIDIA NIM | Qwen3 235B | 40 | - | ✅ | 🟡 |
| Hugging Face | Llama 3.3 70B | - | credit-based | ✅ | 🔴 |
| Cloudflare Workers AI | Llama 3.3 70B | - | 10K neurons | ⚠️ | 🟡 |
| Pollinations AI | openai-large + image/video/audio | - | hourly reset | ✅ | 🟡 |

\* Mistral free tier is token-volume capped (1B tokens/month), not RPD capped.
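The comparison can also be filtered programmatically. The sketch below transcribes a few rows by hand and keeps only providers whose documented limits meet a threshold. `Row` and `shortlist` are hypothetical names, ranges are reduced to their lower bound, and "Partial" compatibility is treated as not compatible, which is a simplification.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    provider: str
    rpm: Optional[int]   # None where the table shows no request-count figure
    rpd: Optional[int]
    openai_compat: bool


# A few rows transcribed from the table; ranges use their lower bound.
ROWS = [
    Row("Google Gemini", 2, 50, True),
    Row("Groq", 30, 14_400, True),
    Row("Cerebras", 30, 14_400, True),
    Row("OpenRouter", 20, 200, True),
    Row("GitHub Models", 10, 50, True),
    Row("NVIDIA NIM", 40, None, True),
    Row("Cloudflare Workers AI", None, None, False),  # limit is in neurons, not requests
]


def shortlist(rows, min_rpm=0, min_rpd=0, need_openai=True):
    """Keep providers whose documented limits meet the thresholds.
    Rows with an undocumented limit are dropped rather than assumed to pass."""
    picked = []
    for r in rows:
        if need_openai and not r.openai_compat:
            continue
        if r.rpm is None or r.rpd is None:
            continue
        if r.rpm >= min_rpm and r.rpd >= min_rpd:
            picked.append(r.provider)
    return picked


print(shortlist(ROWS, min_rpm=20, min_rpd=1000))  # ['Groq', 'Cerebras']
```

Dropping rows with missing figures is a conservative choice: a provider with an unpublished cap may still be suitable, but that cannot be checked from the table alone.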
All OpenAI-compatible free LLM APIs work with the same pattern. Just swap `base_url` and `api_key`; there is no new SDK to learn:
```python
from openai import OpenAI

# Swap these two lines to switch provider
BASE_URL = "https://api.groq.com/openai/v1"  # Groq
API_KEY = "your-groq-key"

client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # change model per provider
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Provider base URLs:
```python
# provider name -> (base_url, example model id)
PROVIDERS = {
    "groq": ("https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
    "cerebras": ("https://api.cerebras.ai/v1", "llama-3.3-70b"),
    "openrouter": ("https://openrouter.ai/api/v1", "meta-llama/llama-3.3-70b-instruct:free"),
    "mistral": ("https://api.mistral.ai/v1", "mistral-small-latest"),
    "gemini": ("https://generativelanguage.googleapis.com/v1beta/openai/", "gemini-2.0-flash"),
    "github": ("https://models.inference.ai.azure.com", "gpt-4o"),
    "nvidia": ("https://integrate.api.nvidia.com/v1", "meta/llama-3.3-70b-instruct"),
    "huggingface": ("https://api-inference.huggingface.co/v1", "meta-llama/Llama-3.3-70B-Instruct"),
    "zhipu": ("https://open.bigmodel.cn/api/paas/v4/", "glm-4-flash"),
    "pollinations": ("https://gen.pollinations.ai/v1", "openai-large"),  # no key needed for basic use
}
```

- RPM = requests per minute · RPD = requests per day · TPM = tokens per minute
- "Undocumented" means the provider does not publish its rate limits; expect throttling.
- All providers marked ✅ OpenAI Compat work with the `openai` Python/JS SDK by changing `base_url`.
- Providers marked ⚠️ Partial have their own SDK or require minor request format changes.
- Trial credits and time-limited promos are excluded; only permanent free tiers are listed.
- Entries verified as of March 2026. Rate limits change frequently, so always check provider docs.
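Since the notes above say to expect throttling and shifting limits, a retry wrapper is a common companion to any of these APIs. The following is a minimal sketch of exponential backoff with a little jitter. `with_backoff` and `flaky` are hypothetical names, and the demonstration uses a simulated failure so it runs offline.

```python
import random
import time


def with_backoff(fn, retry_on=(Exception,), max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


# Offline demonstration with a function that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated 429")
    return "ok"


print(with_backoff(flaky, retry_on=(ConnectionError,), sleep=lambda s: None))  # ok
```

With the `openai` SDK, passing `retry_on=(openai.RateLimitError,)` would restrict retries to rate-limit responses. The SDK client also accepts a `max_retries` argument, which may be enough on its own for light use.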
Looking for something specific? These searches might help:
- Free OpenAI-compatible APIs: filter the table above by ✅ OpenAI Compat
- Fastest free LLM API: see Groq and Cerebras
- Free API with no sign-up: see Pollinations AI
- Free LLM API for images: see Pollinations AI
- Free LLM API for Europe: see Mistral AI (EU-based, no region block)
- Free Llama API: Groq, Cerebras, OpenRouter, and GitHub Models all offer free Llama 3.3 70B
- Free DeepSeek API: OpenRouter, Kluster AI, LLM7.io, GitHub Models
See contributing.md for the full guide. The short version:
- Fork this repo
- Add your entry following the existing format (table + all fields)
- Include a link to the provider's official rate limit documentation
- Open a pull request and add the current month/year you verified it
Rules: No trial credits. No invite-only access. No entries missing rate limits without noting "undocumented". One entry per provider.
CC0 1.0 (public domain). Use freely, no attribution required.