
Cache TTL silently regressed from 1h to 5m around early March 2026, causing quota and cost inflation #46829

@seanGSISG

Description

Summary

Analysis of raw Claude Code session JSONL files spanning Jan 11 – Apr 11, 2026 shows that Anthropic appears to have silently changed the prompt cache TTL default from 1 hour to 5 minutes sometime in early March 2026. Prior to this change, Claude Code was receiving 1-hour TTL cache writes — which we believe was the intended default. The reversion to 5-minute TTL has caused a 20–32% increase in cache creation costs and a measurable spike in quota consumption for subscription users who have never previously hit their limits.

This appears directly related to the behavior described in #45756.


Data

Session data extracted from ~/.claude/projects/ JSONL files across two machines (Linux workstation + Windows laptop, different accounts/sessions), totaling 119,866 API calls from Jan 11 – Apr 11, 2026. Each assistant message includes a usage.cache_creation.ephemeral_5m_input_tokens / ephemeral_1h_input_tokens breakdown that makes the TTL tier per-call observable. Having two independent machines strengthens the signal — both show the same behavioral shift at the same dates.
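For reference, a minimal sketch of reading the per-call tier out of the logs. The field names follow the description above; treat the exact schema as an assumption and verify against your own files.

# Minimal sketch: surface the per-call cache-write TTL tier from the session
# logs. Field names follow the usage objects described above; the exact
# schema is an assumption -- verify against your own JSONL files.
import json
from pathlib import Path

for path in Path.home().glob(".claude/projects/**/*.jsonl"):
    with path.open() as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if entry.get("type") != "assistant":
                continue
            usage = entry.get("message", {}).get("usage", {})
            cc = usage.get("cache_creation") or {}
            m5 = cc.get("ephemeral_5m_input_tokens", 0)
            h1 = cc.get("ephemeral_1h_input_tokens", 0)
            if m5 or h1:
                print(f"{path.name}: 5m-create={m5} 1h-create={h1}")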

Phase breakdown

Phase | Dates           | TTL behavior | Evidence
------|-----------------|--------------|----------
1     | Jan 11 – Jan 31 | 5m ONLY      | ephemeral_1h absent/zero — likely predates 1h tier availability in the API
2     | Feb 1 – Mar 5   | 1h ONLY      | ephemeral_5m = 0, ephemeral_1h > 0 across 33+ consecutive days on both machines — near-zero exceptions
3     | Mar 6–7         | Transition   | first 5m tokens reappear in small volumes; 1h still present
4     | Mar 8 – Apr 11  | 5m dominant  | 5m tokens surge to a majority; 1h becomes a minority or disappears entirely

We believe Phase 2 represents Anthropic's intended default behavior — 1h TTL was rolled out as the Claude Code standard around Feb 1 and held consistently for over a month across two independent machines on two different accounts. January's all-5m data most likely predates the 1h TTL tier being available in the API. The regression began around March 6–8, 2026.

No client-side changes were made between phases. The same Claude Code version and usage patterns were in place throughout. The TTL tier is set server-side by Anthropic.

Day-by-day TTL data showing the regression (combined, both machines)

Date        | 5m-create  | 1h-create  | Behavior
------------|------------|------------|----------
2026-02-01  |      0.00M |      1.70M | 1h ONLY   ← 1h default begins
2026-02-09  |      0.00M |      7.95M | 1h ONLY
2026-02-15  |      0.00M |     13.61M | 1h ONLY   ← heaviest day, 100% 1h
2026-02-28  |      0.00M |     16.15M | 1h ONLY   ← 16M tokens, still 100% 1h
2026-03-01  |      0.00M |      0.12M | 1h ONLY
2026-03-04  |      0.00M |      8.12M | 1h ONLY
2026-03-05  |      0.00M |      6.55M | 1h ONLY   ← last clean 1h-only day
            |            |            |
2026-03-06  |      0.29M |      0.22M | MIXED     ← first 5m tokens reappear
2026-03-07  |      4.56M |      0.50M | MIXED     ← 5m surging
2026-03-08  |     16.86M |      3.44M | MIXED     ← 5m now dominant (83%)
2026-03-10  |     10.55M |      0.51M | MIXED
2026-03-15  |     19.47M |      1.84M | MIXED
2026-03-21  |     21.37M |      1.70M | MIXED     ← 93% 5m
2026-03-22  |     13.48M |      2.85M | MIXED

The transition is visible to the day: March 6 is when 5m tokens first reappear after 33 days of clean 1h-only behavior. By March 8, 5m tokens outnumber 1h tokens by 5:1. This is consistent with a server-side configuration change that was rolled out gradually and completed around March 8.
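For anyone who wants to reproduce this table, a sketch of the day-level aggregation. It assumes each JSONL entry carries an ISO-8601 "timestamp" field, which you should verify against your own logs.

# Sketch of the day-level aggregation behind the table above. Assumes each
# JSONL entry carries an ISO-8601 "timestamp" field; verify on your own logs.
import json
from collections import defaultdict
from pathlib import Path

totals = defaultdict(lambda: [0, 0])  # date -> [5m-create tokens, 1h-create tokens]

for path in Path.home().glob(".claude/projects/**/*.jsonl"):
    with path.open() as f:
        for line in f:
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue
            if entry.get("type") != "assistant":
                continue
            cc = entry.get("message", {}).get("usage", {}).get("cache_creation") or {}
            day = entry.get("timestamp", "")[:10]  # e.g. "2026-03-06"
            totals[day][0] += cc.get("ephemeral_5m_input_tokens", 0)
            totals[day][1] += cc.get("ephemeral_1h_input_tokens", 0)

for day in sorted(totals):
    m5, h1 = totals[day]
    print(f"{day} | {m5 / 1e6:9.2f}M | {h1 / 1e6:9.2f}M")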


Cost impact

Applying official Anthropic pricing (rates.json, updated 2026-04-09):

Combined dataset (119,866 API calls, two machines):

claude-sonnet-4-6 (cache_write_5m = $3.75/MTok, cache_write_1h = $6.00/MTok, cache_read = $0.30/MTok):

Month    | Calls   | Actual cost | Cost with 1h TTL | Overpaid | % waste
---------|---------|-------------|------------------|----------|--------
Jan 2026 |   2,639 |      $78.99 |           $37.54 |   $41.45 | 52.5%
Feb 2026 |  27,220 |   $1,120.43 |        $1,108.11 |   $12.32 |  1.1%  ← nearly 0 on 1h
Mar 2026 |  68,264 |   $2,776.11 |        $2,057.01 |  $719.09 | 25.9%
Apr 2026 |  21,743 |   $1,193.01 |        $1,016.78 |  $176.23 | 14.8%
Total    | 119,866 |   $5,561.17 |        $4,612.09 |  $949.08 | 17.1%

claude-opus-4-6 (cache_write_5m = $6.25/MTok, cache_write_1h = $10.00/MTok, cache_read = $0.50/MTok):

Month    | Calls   | Actual cost | Cost with 1h TTL | Overpaid  | % waste
---------|---------|-------------|------------------|-----------|--------
Jan 2026 |   2,639 |     $131.65 |           $62.57 |    $69.08 | 52.5%
Feb 2026 |  27,220 |   $1,867.38 |        $1,846.85 |    $20.53 |  1.1%  ← nearly 0 on 1h
Mar 2026 |  68,264 |   $4,626.84 |        $3,428.36 | $1,198.49 | 25.9%
Apr 2026 |  21,743 |   $1,988.35 |        $1,694.64 |   $293.71 | 14.8%
Total    | 119,866 |   $9,268.97 |        $7,687.17 | $1,581.80 | 17.1%

February — the month Anthropic was defaulting to 1h TTL — shows only 1.1% waste (trace 5m activity from one machine on one day). Every other month shows 15–53% overpayment from 5m cache re-creations. The cost difference is explained entirely by TTL tier, not by usage volume. The percentage waste is identical across model tiers (17.1%) because it is driven purely by the 5m/1h token split, not by per-token price.
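As a sanity check, the "actual cost" arithmetic is just the posted rates applied to the token split. A minimal sketch at the Sonnet rates (the "cost with 1h TTL" counterfactual additionally depends on a re-pricing model and is not reproduced here):

# Back-of-envelope check of the Sonnet rows above: cache-related spend at the
# posted rates. The "cost with 1h TTL" counterfactual depends on a re-pricing
# model (how many 5m writes would have become reads) and is not reproduced.
WRITE_5M, WRITE_1H, READ = 3.75, 6.00, 0.30  # $/MTok, claude-sonnet-4-6

def cache_cost(mtok_5m_writes, mtok_1h_writes, mtok_reads):
    """Cache-related dollars for token counts given in millions of tokens."""
    return mtok_5m_writes * WRITE_5M + mtok_1h_writes * WRITE_1H + mtok_reads * READ

# Example: the 16.15 MTok all-1h day (2026-02-28), write cost only:
print(f"${cache_cost(0, 16.15, 0):.2f}")  # $96.90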

Why 5m TTL is so expensive in practice

With 5m TTL, any pause in a session longer than 5 minutes causes the entire cached context to expire. On the next turn, Claude Code must re-upload that context as a fresh cache_creation at the write rate, rather than a cache_read at the read rate. The write rate is 12.5× more expensive than the read rate for Sonnet, and the same ratio holds for Opus.

For long coding sessions — which are the primary Claude Code use case — this creates a compounding penalty: the longer and more complex your session, the more context you have cached, and the more expensive each cache expiry becomes.
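To make the penalty concrete, here is the arithmetic for a hypothetical 200K-token cached context at the Sonnet rates above; the context size is illustrative only.

# Worked example of the expiry penalty just described, for a hypothetical
# 200K-token cached context at the Sonnet rates above (size is illustrative).
CONTEXT_MTOK = 0.2            # 200K tokens held in cache
WRITE_5M, READ = 3.75, 0.30   # $/MTok

hit = CONTEXT_MTOK * READ       # next turn within 5 min: cache read
miss = CONTEXT_MTOK * WRITE_5M  # pause > 5 min: full re-creation at write rate

print(f"cache hit:  ${hit:.3f}")   # $0.060
print(f"cache miss: ${miss:.3f}")  # $0.750 -- 12.5x the read cost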

Over the 3-month period analyzed:

  • 220M tokens were written to the 5m tier
  • Those same tokens generated 5.7B cache reads — meaning they were actively being used
  • Had those 220M tokens been on the 1h tier, re-accesses within the same hour would be reads ($0.30–0.50/MTok) instead of re-creations ($3.75–6.25/MTok)

Quota impact

Users on Pro/subscription plans are quota-limited, not just cost-limited. Cache creation tokens count toward quota at full rate; cache reads are significantly cheaper (the exact coefficient is under investigation in #45756). The silent reversion to 5m TTL in March is the most likely explanation for why subscription users began hitting their 5-hour quota limits for the first time — including the author of this issue, who had never hit quota limits before March 2026.


Hypothesis

The data strongly suggests that 1h TTL was the intended default for Claude Code and was in place as of early February 2026. Sometime between March 5 and March 8, 2026, Anthropic silently changed the default back to 5m TTL — either intentionally as a cost-saving measure, or accidentally as an infrastructure regression.

Evidence supporting "1h was the intended default":

  • Phase 2 (1h ONLY) shows zero 5m tokens across 14 separate active days spanning 3+ weeks — this is not noise or a partial rollout; it is consistent, deliberate behavior
  • The February cost profile is the only month with 0% overpayment — it represents what users should have been paying all along
  • The March reversion immediately produced the largest 5m-tier days in the entire dataset (21.4M 5m tokens on Mar 21 alone), suggesting a sudden configuration flip rather than gradual drift
  • Subscription users began hitting 5-hour quota limits for the first time in March — directly coinciding with the reversion

The most likely sequence of events:

  1. ~Feb 1: Anthropic begins defaulting Claude Code subscription users to 1h TTL
  2. ~Mar 6: 5m tokens begin reappearing — gradual rollout of the change or partial infrastructure flip
  3. ~Mar 8: 5m TTL becomes dominant — the regression is fully in effect across both tested machines and accounts
  4. Mar 8+: Mixed behavior continues, suggesting either incomplete rollout, A/B testing, or regional infrastructure variance

The 33-day window of clean 1h-only behavior (Feb 1 – Mar 5) across two independent machines and two separate accounts makes this one of the strongest available signals that 1h TTL was Anthropic's deliberate default, not a fluke.


Request

  1. Confirm or deny whether Anthropic made a server-side TTL default change in early February 2026 and reverted it in early March 2026
  2. Clarify the intended TTL behavior for claude-code sessions — is 5m the intended default, or was 1h intended to be permanent?
  3. Consider restoring 1h TTL as the default for Claude Code sessions, or exposing it as a user-configurable option. The 5m TTL is disproportionately punishing for the long-session, high-context use case that defines Claude Code usage
  4. Disclose quota counting behavior for cache_read tokens (ref [BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage #45756) so users can make informed decisions about their usage patterns

Methodology

  • Source: raw ~/.claude/projects/**/*.jsonl session files (Claude Code stores per-message API responses including full usage objects)
  • Extraction: filtered for type: "assistant" entries with message.usage.cache_creation field
  • No external tools or proxies involved — this data comes directly from Claude Code's own session logs
  • Analysis tool: cnighswonger/claude-code-cache-fix quota-analysis --source mode (added to support this investigation)
  • Pricing: official Anthropic rates from rates.json (updated 2026-04-09)

Activity

EthanFrostpro commented on Apr 12, 2026

This explains a lot about the quota burn rate increase people have been reporting. A 1h → 5min cache TTL change means cache_create operations happen 12x more frequently for the same session, and cache_create tokens cost significantly more than cache_read.

For anyone trying to work around this in the meantime: keeping sessions shorter and more focused (one task per session) reduces the impact since you hit cache invalidation less often. Also, structuring your CLAUDE.md to front-load the most critical context means the cache_create tokens are at least spent on high-value content.

Would be great to get official transparency on pricing-related infrastructure changes like this — silent downgrades erode trust, especially for teams budgeting based on observed costs.

self-assigned this on Apr 12, 2026

Jarred-Sumner commented on Apr 12, 2026

Thanks for the writeup — the JSONL analysis and date pinpointing are good detective work. Let me walk through what's going on.

The March 6 change makes Claude Code cheaper, not more expensive. 1h TTL on every request would cost more, not less.

The cost tables assume every 5m-tier write would have become a cheap cache read under 1h TTL. That's only true when the cached content is re-accessed within the hour. A meaningful share of Claude Code's requests are one-shot calls where the cached context is used once and not revisited — on 1h TTL those would just be more expensive writes with no follow-up read to amortize them, because 1h writes cost more than 5m writes (roughly 2× base input vs. 1.25× — see the prompt caching docs). So "1h everywhere" isn't the cheaper baseline the tables frame it as; for the requests that belong on 5m, it would be more expensive.
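To illustrate the trade-off at the Sonnet rates quoted earlier in this issue (a sketch; the real request mix determines which tier wins per request):

# Sketch of the trade-off described above, at the Sonnet rates quoted in this
# issue ($/MTok). Which tier is cheaper depends on whether the cached content
# is re-accessed after the 5-minute window but within the hour.
WRITE_5M, WRITE_1H, READ = 3.75, 6.00, 0.30

one_shot_5m = WRITE_5M            # content never revisited: 5m wins
one_shot_1h = WRITE_1H            # the pricier 1h write is pure overhead here

revisit_5m = WRITE_5M + WRITE_5M  # cache expired, full re-write: $7.50/MTok
revisit_1h = WRITE_1H + READ      # cache still warm, cheap read:  $6.30/MTok

print(f"one-shot:  5m ${one_shot_5m:.2f} vs 1h ${one_shot_1h:.2f} per MTok")
print(f"revisited: 5m ${revisit_5m:.2f} vs 1h ${revisit_1h:.2f} per MTok")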

Prompt cache optimization is something the Claude Code team invests heavily in on an ongoing basis. Different request types benefit from different TTL tiers, and the client selects per request. The March 6 change you spotted is part of that ongoing optimization work — it wasn't a regression; on balance it lowers total cost for users across the request mix. The pre-March-6 behavior (what your Phase 2 captures) wasn't the intended steady state.

A bug fixed in v2.1.90

A client-side bug could cause sessions that had already exhausted their subscription quota at application start, and had moved onto overages, to stay on 5m TTL until the session exited. This was fixed in v2.1.90.

Responses to your specific asks

  1. Was there a change? Yes — March 6, intentional, part of ongoing cache optimization. You pinpointed the date correctly.
  2. Intended TTL behavior? The client picks per request based on the expected cache-reuse pattern; there is no single global default, by design.
  3. Restore 1h as the default / expose as configurable? 1h everywhere would increase total cost given the request mix, so we're not planning a global toggle.
  4. Cache-read quota weighting (ref [BUG] Pro Max 5x Quota Exhausted in 1.5 Hours Despite Moderate Usage #45756): we'll follow up there.
Decme commented on Apr 12, 2026

Jarred, what I'm getting from your convoluted post is that the change reduces costs for Anthropic at the users' expense.

[7 remaining items]

nukeop commented on Apr 13, 2026

A lot of time typing "claude pls open an issue about this, add lots of detail"

hattima-tim commented on Apr 13, 2026

Classic scammer tactics: first, lure users in by promising a huge deal, then scam the hell out of them.

raghuvv commented on Apr 13, 2026

> The March 6 change you spotted is part of that ongoing optimization work — it wasn't a regression; on balance it lowers total cost for users across the request mix.

How does, say, a Claude Pro user benefit from this "lower total cost", exactly?

User has already paid the monthly/annual subscription cost. The money is already in Anthropic's bank acocunt.

What we are seeing instead is we exhaust the session quota under an hour and have to wait 4 hours to resume our work. This effectively wipes out our day.

This means even if I stay up 24 hours to try and max my utilization, I get about 4 hours of real use out of my subscription.

> Responses to your specific asks
>
> 1. Was there a change? Yes — March 6, intentional, part of ongoing cache optimization. You pinpointed the date correctly.
> 2. Intended TTL behavior? The client picks per request based on the expected cache-reuse pattern; there is no single global default, by design.
> 3. Restore 1h as the default / expose as configurable? 1h everywhere would increase total cost given the request mix, so we're not planning a global toggle.

If the caching TTL is being changed so drastically -- from 1 hour to 5 minutes is a humongous change -- and that change has such an oversized impact on user experience that Claude is effectively unusable for the majority of the day...

... you ought to give your paying customers the control to toggle this the way they want. Let us experiment with setting it at 5 minutes, 15 minutes, or 1 hour. With some experience we will figure out what works best for our way of working -- and will apply the right settings for each session. Just like selecting a model, enabling extended thinking, or turning on plan mode. And I am sure the community will find optimal ways of utilizing cache TTLs that lower the cost for Anthropic as well in the long run.

PS: The drastic drop in user experience in mid-March, which I experienced first-hand, is what compelled me to add my voice here. Otherwise, I am deeply respectful of the technical work being done by the Anthropic team and wish you the very best -- I want to see you succeed and make greater things. Cheers!

RockyMM commented on Apr 13, 2026

Jarred’s response was very informative and very transparent, but it came at the wrong time. Instead of arriving as a post-mortem, this answer should have been a pre-mortem.

As amazing as your product is, I sense a lack of transparency and strategic thinking from the product team.

You need to be clear about this: every change that touches consumption limits, or could change how customers are billed, must be announced well in advance. Folks are getting tired of new features while their regular workflows are starting to misbehave and their limits keep shrinking.

The only proper move now is to make your roadmap public. This is the best thing you can do for yourself, not so much for the customers. They will learn to live with the TTLs and the bugs, and in the end they will vote with their wallets. But the optics are getting really bad, really quickly, for Anthropic.

phillip-haydon commented on Apr 13, 2026

My usage is API usage and not a subscription, this is a good change, works out cheaper.

hi-fox commented on Apr 13, 2026

> My usage is API usage and not a subscription, this is a good change, works out cheaper.

API was always 5 minutes, don't spread misinformation. This is a bad overall change.

@Jarred-Sumner How does this work out cheaper? People not paying for caching writes? How about you stop charging people for caching writes, then? I am pretty sure OpenAI does not charge extra for caching; it's done automatically.

Not to mention, you are expecting people to be able to read, process, formulate, and then type a reply to every conversation turn within 5 minutes, or risk a 10x token use?

This is AFTER changing limits to burn them faster during peak hours?

How is any of this customer-friendly? We're risking our tokens burning TWENTY TIMES FASTER and you are out here saying it's cheaper? No bro, it's to make YOUR costs cheaper. Be honest about it at least.

phillip-haydon commented on Apr 13, 2026

> > My usage is API usage and not a subscription, this is a good change, works out cheaper.
>
> API was always 5 minutes, don't spread misinformation. This is a bad overall change.

Via CC? Since when is the caching different between subscription and API via CC? If you're suffering from the lack of 1h caching, then it sounds like you're not even using CC enough to hit any subscription limits.

hi-fox commented on Apr 13, 2026

> Via CC? Since when is the caching different between subscription and API via CC?

Literally check the documentation and the results of this exact thread? Are you being dense?

> If you're suffering from the lack of 1h caching, then it sounds like you're not even using CC enough to hit any subscription limits.

What? It's the exact opposite. 1h caching prevents you from re-sending every message as cache writes every 6 minutes.

ghostintheprompt commented on Apr 13, 2026

Yeah it makes sense to me. I use the Claude dope daily and notice it slacking and switched over to Codex which can't write, but can code. Claude's ability to keep context definitely declined or feels 'throttled' or nerfed. Thanks for writing this. As someone that's been around the block I think they found it more profitable to sell the sizzle, than the steak. You can see the turn of events from the political pressure to the inflated claims on mythos, the cyber use case restrictions. They're not what they were and it's clear.

phillip-haydon commented on Apr 13, 2026

> > Via CC? Since when is the caching different between subscription and API via CC?
>
> Literally check the documentation and the results of this exact thread? Are you being dense?

CC doesn't go to different places if you're using a subscription vs an API key. 😆

cstrahan commented on Apr 13, 2026

@phillip-haydon Where "CC goes" is irrelevant. API keys are distinct from subscription keys. If these are distinct entities, then Anthropic's servers obviously can and do treat the requests differently.

In pseudo code:

if (isAPIKey(key)) {
  // ... apply API key TTL, etc ...
} else {
  // ... apply subscription TTL, etc ...
}

Here's the API docs on prompt caching from January: https://web.archive.org/web/20260124153111/https://platform.claude.com/docs/en/build-with-claude/prompt-caching

> By default, the cache has a 5-minute lifetime. The cache is refreshed for no additional cost each time the cached content is used.
>
> If you find that 5 minutes is too short, Anthropic also offers a 1-hour cache duration at additional cost.
>
> For more information, see 1-hour cache duration.

So I feel the need to reiterate what @hi-fox said:

> Literally check the documentation and the results of this exact thread? Are you being dense?

If you can't wrap your head around this, I'd recommend spending more time sharpening your programming skills and spending less time in Claude Code.
