
amardeeplakshkar/awesome-free-llm-apis

Latest commit: 5f321a7 · Mar 26, 2026

awesome-free-llm-apis 🧠

A curated list of permanently free LLM APIs, with rate limits, OpenAI SDK compatibility, available SDKs, speed tiers, and free model lists. No trial credits. No time-limited promos. No credit card required.


Keywords: free LLM API · free AI API · OpenAI compatible API · free GPT API · free Llama API · free inference API · LLM API key · no credit card AI API · free tier AI · open source LLM hosting


Why This List: The Most Detailed Free LLM API Directory

Most "free LLM API" lists give you a name and a link. This one gives you everything you need to decide before you sign up, so you're not hunting across 12 different docs pages to compare rate limits, SDK support, and OpenAI compatibility:

| Column | What it means |
|---|---|
| Free Models | Models available on the permanent free tier |
| Rate Limits | RPM (requests/min) and RPD (requests/day) |
| OpenAI Compat | Can you use the OpenAI Python/JS SDK by just swapping base_url? |
| SDKs | Official client libraries available |
| Speed Tier | 🟢 Fast / 🟡 Medium / 🔴 Slow (see legend below) |

Contents

  • Provider APIs: companies that train or fine-tune their own models
  • Inference Providers: third-party platforms hosting open-weight models


Provider APIs: First-Party Free LLM APIs

APIs run by the companies that train or fine-tune the models themselves. These are official free AI APIs directly from the model creators.


Google Gemini 🇺🇸

Google's flagship model family. The free tier via AI Studio is among the most generous of any first-party provider.

| Detail | Info |
|---|---|
| Free Models | Gemini 2.5 Pro, Gemini 2.0 Flash, Gemini 1.5 Flash, Gemini 1.5 Flash-8B, Gemini 1.0 Pro |
| Rate Limits | 15 RPM / 1,500 RPD (Flash) · 2 RPM / 50 RPD (2.5 Pro) |
| OpenAI Compat | ✅ Yes: https://generativelanguage.googleapis.com/v1beta/openai/ |
| SDKs | Python (google-generativeai), JS/TS, REST, Go, Swift, Dart |
| Speed Tier | 🟢 Fast (Flash) · 🟡 Medium (Pro) |

⚠️ Free tier not available in the EU, UK, or Switzerland. Check the available regions.


Mistral AI 🇪🇺

European flagship. Apache 2.0 licensed models that are free to use and self-host. One of the best free tiers for token volume.

| Detail | Info |
|---|---|
| Free Models | Mistral Small 3.1, Mistral Large 3, Ministral 8B, Codestral Mamba, Mistral Embed |
| Rate Limits | 1 req/sec · 1B tokens/month |
| OpenAI Compat | ✅ Yes: https://api.mistral.ai/v1 |
| SDKs | Python (mistralai), JS/TS (@mistralai/mistralai), REST |
| Speed Tier | 🟡 Medium |

Cohere 🇺🇸

Specializes in enterprise NLP. Strong free tier for RAG and embedding use cases.

| Detail | Info |
|---|---|
| Free Models | Command A, Command R+, Command R, Aya Expanse 32B, Aya Expanse 8B + 5 more |
| Rate Limits | 20 RPM · 1,000 req/month |
| OpenAI Compat | ⚠️ Partial: native SDK preferred (cohere-python) |
| SDKs | Python (cohere), JS/TS (cohere-ai), Go, Java, REST |
| Speed Tier | 🟡 Medium |

Zhipu AI 🇨🇳

Chinese AI lab. Flash models are genuinely free with no published cap, which makes them good for experimentation.

| Detail | Info |
|---|---|
| Free Models | GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash (vision) |
| Rate Limits | Undocumented |
| OpenAI Compat | ✅ Yes: https://open.bigmodel.cn/api/paas/v4/ |
| SDKs | Python (zhipuai), REST |
| Speed Tier | 🟡 Medium |

Inference Providers: Free Third-Party LLM Hosting

Third-party platforms hosting open-weight models from various sources (Meta, Mistral, DeepSeek, etc.). These are free AI inference APIs, so there is no need to self-host.


Groq 🇺🇸

Fastest free inference available. Runs on custom LPU (Language Processing Unit) hardware. Drop-in OpenAI replacement.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Llama 4 Scout, Llama 4 Maverick, Gemma 2 9B, Kimi K2, Qwen QwQ 32B + 17 more |
| Rate Limits | 30 RPM · 14,400 RPD · 6,000 TPM (varies by model) |
| OpenAI Compat | ✅ Yes: https://api.groq.com/openai/v1 |
| SDKs | Python (groq), JS/TS (groq), REST |
| Speed Tier | 🟢 Fast (LPU hardware, consistently 300–500 tok/sec) |
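With three limits in play (RPM, RPD, TPM), it helps to see which one binds first. The sketch below is plain arithmetic on the Groq figures listed above; the function name `budget` is illustrative, and since TPM varies by model the numbers are only indicative.

```python
def budget(rpm: int, rpd: int, tpm: int) -> dict:
    """Derive two rough planning figures from published rate limits."""
    return {
        # Average tokens each request may use if you send at the full RPM.
        "tokens_per_request_at_full_rpm": tpm // rpm,
        # Minutes of sending at the full RPM before the daily cap is hit.
        "minutes_until_rpd_exhausted": rpd / rpm,
    }


# Groq free-tier figures from the table above (TPM varies by model).
groq = budget(rpm=30, rpd=14_400, tpm=6_000)
print(groq)
```

At these figures, sending at the full 30 RPM leaves about 200 tokens per request under the TPM cap, and the daily quota lasts 480 minutes (8 hours) of continuous use.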

Cerebras 🇺🇸

Wafer-scale chip inference. Competes with Groq on raw speed and has a strong free tier.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Qwen3 235B, Llama 4 Scout, GPT-OSS-120B + 3 more |
| Rate Limits | 30 RPM · 60,000 TPM · 14,400 RPD |
| OpenAI Compat | ✅ Yes: https://api.cerebras.ai/v1 |
| SDKs | Python (cerebras-cloud-sdk), REST |
| Speed Tier | 🟢 Fast (wafer-scale chip, comparable to Groq) |

OpenRouter 🇺🇸

One API key for 30+ free models across multiple providers. Great as a fallback layer or for model switching.

| Detail | Info |
|---|---|
| Free Models | DeepSeek R1, Llama 3.3 70B, GPT-OSS-120B, Qwen3 Coder 480B + 27 more (models ending in :free) |
| Rate Limits | 20 RPM · 200 RPD |
| OpenAI Compat | ✅ Yes: https://openrouter.ai/api/v1 |
| SDKs | Python (via openai), JS/TS (via openai), REST |
| Speed Tier | 🟡 Medium (routes to various backends, latency varies) |
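Since OpenRouter's free models are the ones whose IDs end in `:free`, picking them out of a list of model IDs is a one-line filter. A minimal sketch: the first ID comes from the snippets section of this README, the other two are placeholders for illustration only, and `free_models` is a hypothetical helper name.

```python
def free_models(model_ids):
    """Keep only model IDs that carry the ':free' suffix."""
    return [model_id for model_id in model_ids if model_id.endswith(":free")]


sample_ids = [
    "meta-llama/llama-3.3-70b-instruct:free",  # listed in this README
    "example-org/paid-model",                  # placeholder, not a real ID
    "example-org/another-model:free",          # placeholder, not a real ID
]

print(free_models(sample_ids))
```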

GitHub Models 🇺🇸

Free inference via GitHub account. Good access to frontier models like GPT-4o alongside open-weight models.

| Detail | Info |
|---|---|
| Free Models | GPT-4o, Llama 3.3 70B, DeepSeek-R1, Phi-4, Mistral Large + more |
| Rate Limits | 10–15 RPM · 50–150 RPD (varies by model tier) |
| OpenAI Compat | ✅ Yes: https://models.inference.ai.azure.com |
| SDKs | Python (azure-ai-inference or openai), JS/TS, REST |
| Speed Tier | 🟡 Medium |

NVIDIA NIM 🇺🇸

NVIDIA's hosted inference. Access to large parameter models including Qwen3 235B on GPU clusters.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Mistral Large, Qwen3 235B, DeepSeek-R1 + more |
| Rate Limits | 40 RPM (credit-based, replenishes) |
| OpenAI Compat | ✅ Yes: https://integrate.api.nvidia.com/v1 |
| SDKs | Python (via openai), REST |
| Speed Tier | 🟡 Medium |

Hugging Face Serverless 🇺🇸

$0.10/month in free credits, auto-replenished. Access to thousands of community and flagship models. Limited to models under 10GB unless featured.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Qwen2.5 72B, Mistral 7B, Zephyr, Phi + many more |
| Rate Limits | $0.10 free credits/month (auto-refresh) |
| OpenAI Compat | ✅ Yes: https://api-inference.huggingface.co/v1 |
| SDKs | Python (huggingface_hub, openai), JS/TS, REST |
| Speed Tier | 🔴 Slow (shared queues, cold starts common) |

Cloudflare Workers AI 🇺🇸

Edge inference baked into Cloudflare Workers. Globally low-latency via the Cloudflare network. 10K "neurons"/day free.

| Detail | Info |
|---|---|
| Free Models | Llama 3.3 70B, Qwen QwQ 32B, Phi-2, Gemma 7B + 47 more |
| Rate Limits | 10,000 neurons/day (1 neuron ≈ 1 output token) |
| OpenAI Compat | ⚠️ Partial: own REST API format, not drop-in |
| SDKs | JS/TS (Workers SDK), Python (via REST), REST |
| Speed Tier | 🟡 Medium (edge inference, latency varies by region) |
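To get a feel for what 10,000 neurons/day buys, here is a rough estimate that leans entirely on this list's approximation of 1 neuron ≈ 1 output token. It ignores input tokens and per-model differences, so treat the result as an upper bound rather than a quota calculation; `responses_per_day` is an illustrative name.

```python
DAILY_NEURONS = 10_000  # free allowance quoted above


def responses_per_day(avg_output_tokens: int, daily_neurons: int = DAILY_NEURONS) -> int:
    """Estimate how many responses fit in the daily allowance.

    Assumes 1 neuron is roughly 1 output token, as stated in the table above.
    """
    if avg_output_tokens <= 0:
        raise ValueError("avg_output_tokens must be positive")
    return daily_neurons // avg_output_tokens


print(responses_per_day(250))    # short answers
print(responses_per_day(1_000))  # long answers
```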

Kluster AI 🇺🇸

Newer inference provider with access to flagship open-weight models. Rate limits not publicly documented.

| Detail | Info |
|---|---|
| Free Models | DeepSeek-R1, Llama 4 Maverick, Qwen3-235B + 2 more |
| Rate Limits | Undocumented |
| OpenAI Compat | ✅ Yes |
| SDKs | Python (via openai), REST |
| Speed Tier | 🟡 Medium |

LLM7.io 🇬🇧

UK-based inference provider. Token-based rate limiting: a free token increases RPM from 15 to 30.

| Detail | Info |
|---|---|
| Free Models | DeepSeek R1, Gemini Flash-Lite, Qwen2.5 Coder + 27 more |
| Rate Limits | 15 RPM (30 RPM with free token) |
| OpenAI Compat | ✅ Yes |
| SDKs | Python (via openai), REST |
| Speed Tier | 🟡 Medium |

Pollinations AI 🇩🇪

Berlin-based open-source platform. Unique in covering text, image, video, and audio generation all under one free API. No sign-up required for basic use: just hit the endpoint. An API key unlocks higher limits and model access.

| Detail | Info |
|---|---|
| Free Models | openai, openai-large, openai-reasoning, gemini, gemini-large, mistral, llama (text) · flux, gpt-image, seedream, kontext (image) · wan-fast (video) · tts-1, 30+ ElevenLabs voices (audio) |
| Rate Limits | Per-IP, resets hourly. Exact cap undocumented; authenticated requests get priority limits |
| OpenAI Compat | ✅ Yes: https://gen.pollinations.ai/v1 (text & audio endpoints) |
| SDKs | Python (via openai), JS/TS (via openai), REST, MCP server |
| Speed Tier | 🟡 Medium |

💡 Standout feature: The only free API on this list with image, video, and audio generation alongside text, all from one key. Also has an MCP server for use directly inside Claude and other AI assistants.


Speed Tier Legend

| Tier | Typical Output Speed | Hardware |
|---|---|---|
| 🟢 Fast | 300–600 tok/sec | Custom silicon (LPU/Wafer-scale) |
| 🟡 Medium | 50–150 tok/sec | Standard cloud GPUs (A100/H100) |
| 🔴 Slow | < 50 tok/sec or variable | Shared queues, CPU offload, cold starts |

Speed tiers are approximate. Real-world performance varies based on model size, prompt length, and time of day.
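If you want to place your own measurements on this scale, the sketch below converts a token count and elapsed time into tok/sec and maps it to a tier. The legend leaves gaps (150–300 and above 600 tok/sec), so the cut-offs used here (300 and 50) are an assumption made for illustration, not part of the legend, and both function names are hypothetical.

```python
def tokens_per_second(output_tokens: int, elapsed_seconds: float) -> float:
    """Output throughput for one completed response."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return output_tokens / elapsed_seconds


def speed_tier(tok_per_sec: float) -> str:
    """Map throughput to a tier; thresholds are assumed lower bounds."""
    if tok_per_sec >= 300:
        return "fast"
    if tok_per_sec >= 50:
        return "medium"
    return "slow"


# Example: 900 output tokens generated in 2 seconds is 450 tok/sec.
print(speed_tier(tokens_per_second(900, 2.0)))
```

To measure a real provider, time the request with `time.perf_counter()` and take the output token count from the response's usage data.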


Quick Comparison: Free LLM APIs at a Glance

Twelve of the 14 providers above, side by side (Kluster AI and LLM7.io are not included). Sorted by category: first-party provider APIs first, then inference providers. Use this to pick the right free AI API for your use case before diving into the full entry above.

| Provider | Best Free Model | RPM | RPD | OpenAI Compat | Speed |
|---|---|---|---|---|---|
| Google Gemini | Gemini 2.5 Pro | 2–15 | 50–1,500 | ✅ | 🟢🟡 |
| Mistral AI | Mistral Large 3 | 60 | Unlimited* | ✅ | 🟡 |
| Cohere | Command A | 20 | ~33/day | ⚠️ | 🟡 |
| Zhipu AI | GLM-4.7-Flash | Undocumented | Undocumented | ✅ | 🟡 |
| Groq | Llama 3.3 70B | 30 | 14,400 | ✅ | 🟢 |
| Cerebras | Qwen3 235B | 30 | 14,400 | ✅ | 🟢 |
| OpenRouter | Qwen3 Coder 480B | 20 | 200 | ✅ | 🟡 |
| GitHub Models | GPT-4o | 10–15 | 50–150 | ✅ | 🟡 |
| NVIDIA NIM | Qwen3 235B | 40 | n/a | ✅ | 🟡 |
| Hugging Face | Llama 3.3 70B | n/a | credit-based | ✅ | 🔴 |
| Cloudflare Workers AI | Llama 3.3 70B | n/a | 10K neurons | ⚠️ | 🟡 |
| Pollinations AI | openai-large + image/video/audio | n/a | hourly reset | ✅ | 🟡 |

* Mistral free tier is token-volume capped (1B tokens/month), not RPD capped.
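Two figures in the comparison are derived rather than quoted: Mistral's 60 RPM comes from its 1 req/sec limit, and Cohere's ~33/day comes from its 1,000 req/month cap. The conversion is sketched below, assuming a 30-day month; the helper names are illustrative.

```python
def rps_to_rpm(requests_per_second: float) -> float:
    """Convert a per-second limit to requests per minute."""
    return requests_per_second * 60


def monthly_to_daily(per_month: int, days_in_month: int = 30) -> int:
    """Spread a monthly cap evenly over the month (30 days assumed)."""
    return per_month // days_in_month


mistral_rpm = rps_to_rpm(1)                               # 1 req/sec
cohere_requests_per_day = monthly_to_daily(1_000)         # 1,000 req/month
mistral_tokens_per_day = monthly_to_daily(1_000_000_000)  # 1B tokens/month

print(mistral_rpm, cohere_requests_per_day, mistral_tokens_per_day)
```

The same helper shows that Mistral's 1B tokens/month works out to roughly 33 million tokens per day if usage is spread evenly.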


Code Snippets: How to Use Free LLM APIs with the OpenAI SDK

All OpenAI-compatible free LLM APIs work with the same pattern. Just swap base_url and api_key; there is no new SDK to learn:

from openai import OpenAI

# Swap these two lines to switch provider
BASE_URL = "https://api.groq.com/openai/v1"   # Groq
API_KEY  = "your-groq-key"

client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",           # change model per provider
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

Provider base URLs:

PROVIDERS = {
    "groq":        ("https://api.groq.com/openai/v1",                     "llama-3.3-70b-versatile"),
    "cerebras":    ("https://api.cerebras.ai/v1",                         "llama-3.3-70b"),
    "openrouter":  ("https://openrouter.ai/api/v1",                       "meta-llama/llama-3.3-70b-instruct:free"),
    "mistral":     ("https://api.mistral.ai/v1",                          "mistral-small-latest"),
    "gemini":      ("https://generativelanguage.googleapis.com/v1beta/openai/", "gemini-2.0-flash"),
    "github":      ("https://models.inference.ai.azure.com",              "gpt-4o"),
    "nvidia":      ("https://integrate.api.nvidia.com/v1",                "meta/llama-3.3-70b-instruct"),
    "huggingface": ("https://api-inference.huggingface.co/v1",            "meta-llama/Llama-3.3-70B-Instruct"),
    "zhipu":       ("https://open.bigmodel.cn/api/paas/v4/",              "glm-4-flash"),
    "pollinations": ("https://gen.pollinations.ai/v1",                    "openai-large"),  # no key needed for basic use
}
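Because every entry in this table is just a (base URL, model) pair, it can drive a simple fallback loop: try one provider, and move to the next if the call fails. The sketch below is one possible shape, not a prescribed pattern. `first_successful` is a hypothetical helper, and the actual request is passed in as a `call` function so the loop itself stays independent of any SDK.

```python
from typing import Callable, Dict, Iterable, Tuple

# Subset of the PROVIDERS table above: name -> (base_url, default model).
PROVIDERS: Dict[str, Tuple[str, str]] = {
    "groq": ("https://api.groq.com/openai/v1", "llama-3.3-70b-versatile"),
    "cerebras": ("https://api.cerebras.ai/v1", "llama-3.3-70b"),
}


def first_successful(
    order: Iterable[str],
    call: Callable[[str, str], str],
    providers: Dict[str, Tuple[str, str]] = PROVIDERS,
) -> Tuple[str, str]:
    """Try providers in order; return (name, result) from the first success."""
    errors = {}
    for name in order:
        base_url, model = providers[name]
        try:
            return name, call(base_url, model)
        except Exception as exc:  # broad on purpose: rate limits, timeouts, auth
            errors[name] = exc
    raise RuntimeError(f"all providers failed: {errors}")


# In real use, `call` would wrap the OpenAI client from the snippet above:
#   client = OpenAI(api_key=key_for(base_url), base_url=base_url)
#   client.chat.completions.create(model=model, messages=[...])
# where key_for is a hypothetical lookup of the API key for that provider.
```

Passing the request in as a function also makes the loop easy to test with a stub instead of live API calls.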

Notes & Definitions

  • RPM = requests per minute · RPD = requests per day · TPM = tokens per minute
  • "Undocumented" means the provider does not publicly publish rate limits. Expect throttling.
  • All providers marked ✅ OpenAI Compat work with the openai Python/JS SDK by changing base_url.
  • Providers marked ⚠️ Partial have their own SDK or require minor request format changes.
  • Trial credits and time-limited promos are excluded. Only permanent free tiers are listed.
  • Entries verified as of March 2026. Rate limits change frequently, so always check provider docs.
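Since most limits here are expressed in RPM, a small client-side throttle can keep a script under the cap instead of relying on the provider to reject requests. This is a minimal sliding-window sketch using only the standard library. `RpmLimiter` is a hypothetical name, and the injectable `clock` and `sleep` arguments exist only to make the class testable.

```python
import time
from collections import deque


class RpmLimiter:
    """Block until sending one more request stays within `rpm` per 60 seconds."""

    def __init__(self, rpm: int, clock=time.monotonic, sleep=time.sleep):
        self.rpm = rpm
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of requests in the last 60 seconds

    def wait(self) -> None:
        while True:
            now = self._clock()
            # Forget requests that have left the 60-second window.
            while self._sent and now - self._sent[0] >= 60.0:
                self._sent.popleft()
            if len(self._sent) < self.rpm:
                self._sent.append(now)
                return
            # Window is full: sleep until the oldest request expires.
            self._sleep(60.0 - (now - self._sent[0]))
```

Call `limiter.wait()` immediately before each API request, for example with `RpmLimiter(30)` for a 30 RPM tier. This only covers RPM; daily and token-based caps need separate tracking.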

Related Resources

Looking for something specific? These searches might help:

  • Free OpenAI-compatible APIs → filter the table above by ✅ OpenAI Compat
  • Fastest free LLM API → see Groq and Cerebras
  • Free API with no sign-up → see Pollinations AI
  • Free LLM API for images → see Pollinations AI
  • Free LLM API for Europe → see Mistral AI (EU-based, no region block)
  • Free Llama API → Groq, Cerebras, OpenRouter, GitHub Models all offer free Llama 3.3 70B
  • Free DeepSeek API → OpenRouter, Kluster AI, LLM7.io, GitHub Models

Contributing to awesome-free-llm-apis

See contributing.md for the full guide. The short version:

  1. Fork this repo
  2. Add your entry following the existing format (table + all fields)
  3. Include a link to the provider's official rate limit documentation
  4. Open a pull request and include the month and year you verified the entry

Rules: No trial credits. No invite-only access. No entries missing rate limits without noting "undocumented". One entry per provider.
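Before opening a pull request, it can help to check that a new entry fills in every field the existing entries use. The sketch below is a hypothetical pre-submission check, not a tool shipped with this repo. The field names mirror the entry tables, and the sample values are copied from the Zhipu AI entry.

```python
REQUIRED_FIELDS = ("Free Models", "Rate Limits", "OpenAI Compat", "SDKs", "Speed Tier")


def entry_problems(entry: dict) -> list:
    """List the required fields that are missing or blank in an entry."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(entry.get(field, "")).strip():
            problems.append(f"missing field: {field}")
    return problems


zhipu_entry = {
    "Free Models": "GLM-4.7-Flash, GLM-4.5-Flash, GLM-4.6V-Flash (vision)",
    "Rate Limits": "Undocumented",  # allowed: the gap is stated explicitly
    "OpenAI Compat": "Yes",
    "SDKs": "Python (zhipuai), REST",
    "Speed Tier": "Medium",
}

print(entry_problems(zhipu_entry))
```

An entry whose rate limits are unknown passes only if the field says so explicitly, which matches the rule above about noting "undocumented".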


License

CC0 1.0 (public domain). Use freely, no attribution required.
