Claude's Journey to Foolishness in Diagrams: The Cost of Thriftiness, or How API Bill Increased 100-Fold

By: blockbeats|2026/04/13 19:04:34

A few days ago, Stella Laurenzo, Head of AI at AMD, posted an issue titled "Claude Code Unusable for Complex Engineering Tasks" in the Claude Code official repository. This was not a user's emotional complaint but a quantitative analysis based on 6,800 sessions. It brought to the forefront the AI community's most unwilling-to-face issue, with one set of numbers particularly standing out: a cost-saving configuration tweak by Anthropic skyrocketed this team's API monthly bill from $345 to $42,121.

Laurenzo's team tracked 235,000 tool invocations, 18,000 prompts, and documented the systemic performance degradation of Claude Code since February 2026. This report was later covered by The Register, sparking a two-week-long storm of public opinion in the developer community.

Boris Cherny, Head of the Anthropic Claude Code team, provided an explanation on Hacker News. On February 9, with the release of Opus 4.6, a "self-thinking" mechanism was enabled by default, where the model autonomously decides the thought duration. On March 3, Anthropic then lowered the default thinking effort to 85. The official explanation was "the optimal balance point between intelligence, latency, and cost." The actual impact of these two adjustments is evident from the data.

Thought Depth Plummets by Three Quarters

According to Stella Laurenzo's GitHub Issue data, Claude Code's average thought depth experienced a three-stage collapse over two months: from a high of 2,200 characters at the end of January to 720 characters by the end of February, a 67% drop. By March, it further shrunk to 560 characters, a 75% decrease from the peak.

Claude's Journey to Foolishness in Diagrams: The Cost of Thriftiness, or How API Bill Increased 100-Fold

Thought depth here is a proxy metric reflecting how much "internal deliberation" the model is willing to engage in before providing an answer. The difference between 2,200 and 560 characters is roughly equivalent to degrading from "drafting before responding" to "thinking for two seconds in your head before speaking."

Laurenzo also pointed out that the "Thought Content Redaction" feature (redact-thinking-2026-02-12) launched in early March coincidentally masked the model's thought process during this period, making the shrinkage less perceptible to users. Boris Cherny insists this was merely a UI change and did not affect the underlying reasoning. Both claims are technically valid, but from a user's perspective, the effect is indistinguishable.

Boris Cherny later acknowledged that even manually setting the effort back to maximum, the self-thought mechanism may still allocate insufficient reasoning in some rounds, leading to hallucinatory content. "Restoring maximum effort" is not a complete solution; it merely turns the knob back closer to its original position rather than restoring it to its original determinism.

From "Research-Oriented Programmer" to "Blind Edit Programmer"

A detail in Stella Laurenzo's report is more explicit than thinking depth: how many relevant files the model actively reads before making changes to the code.

According to GitHub Issue data, during the prime period, the average read-to-edit ratio is 6.6. Before making a code change, the model, on average, reads 6.6 files to understand the context. During the decay period, this number drops to 2.0, a 70% decrease. More critically, about one-third of code edits occur without the model reading the target file, diving straight in.

Laurenzo refers to this as "blind edits." In engineering terms, this is akin to a programmer writing code without looking at function signatures or knowing variable types. "Every senior engineer on my team has had similar first-hand experiences," she wrote in her report. "Claude can no longer be trusted to carry out complex engineering tasks."

The drop from a 6.6 read-to-edit ratio to 2.0 is not merely a behavioral metric shift; it signifies a collapse in task success rates. The complexity of modern code repositories dictates that any modification involves dependencies across multiple files. Skipping context exploration and directly making changes doesn't lead to merely "incorrect answers" but rather to "seemingly correct changes that trigger new errors downstream. The cost of debugging such errors far exceeds that of a single failed explicit answer.

The Paradox of "Saving Money"

One of the most counterintuitive sets of numbers in the entire incident comes from the same GitHub Issue data: Stella Laurenzo's team saw the monthly invocation costs of Claude Code API plummet from $345 in February 2026 to a whopping $42,121 in March, a 122-fold increase.

The logic behind Anthropics' effort reduction was to lower the token consumption per call, thus reducing costs. However, the outcome was the opposite. The reason behind this was the emergence of numerous "reasoning loops" after the model's decay, leading to repeated self-negation within a single reply, constant restarts, and a token consumption far exceeding the saved amount. According to Stella Laurenzo's data, the rate of users voluntarily aborting tasks increased by 12 times during the same period, requiring developers' continuous intervention, correction, and resubmission.

The underlying logic is a systemic error. Slashing computational power on a complex task does not simply proportionally reduce costs. Once below a certain threshold of thought, the model starts to veer off track, and the overall cost ends up escalating. Lowering effort saved money on simple queries, but on coding tasks, it blew up the bill.

The "Dumbing Down" Thing, GPT-4 Did It Three Years Ago

In July 2023, a research team from Stanford University and the University of California, Berkeley, published a paper on arXiv titled "How is ChatGPT's behavior changing over time?", documenting the same phenomenon happening on GPT-4.

According to the research data, in March 2023, GPT-4 had generated code where over 50% was directly runnable. By June, this proportion had dropped to 10%, an 80% decrease over three months. During the same period, the prime number identification accuracy plummeted from 97.6% to 2.4%. OpenAI's response was highly similar to Anthropic's: there had been optimizations in the background, part of normal iteration.

The structure of the two stories is almost identical: an AI company quietly adjusted parameters affecting the model's capabilities in the background, users noticed, the company acknowledged the adjustment, but explained it as "more reasonable resource allocation." GPT-4's degradation occurred in 2023, Claude's degradation happened in 2026, three years apart, but the script is the same.

This is not a specific company's peculiar mistake. The economic logic of AI subscription models determines that when reasoning costs exceed the pricing that can be covered, manufacturers face the same pressure. Lowering the default thought intensity is currently the easiest knob to turn between cost and performance. What users perceive is the model "getting dumber." What the manufacturer saves on the books is the marginal token cost per call.

Boris Cherny has provided a technical solution where users can manually restore the thought intensity to the highest level through the /effort high command or by modifying the configuration file. This solution is technically feasible, but it also means that "maximum performance" is no longer the default setting.

From $345 to $42,121, what was spent was not just the budget but also an assumption: the default configuration changes made by the manufacturer were intended to improve user experience.

Solana did not fall behind during the bear market. Trading enthusiasm has waned, but the network is more stable, RWA and stablecoins are expanding, and the capital foundation is much thicker than in the previous cycle. The real question is: when the speculative tide recedes, can perpetuals, predicti...

Young people in South Korea make a "final effort" in the epic bull market

The South Koreans' average of two accounts for wildly gambling in the chip bull market reflects the survival anxiety and harsh reality of countless young people trying to break through class barriers behind the nationwide stock trading frenzy for wealth.

Dialogue with OmenX Founder: Why does the prediction market need an evolution from "spot" to "derivatives"?

How to reconstruct the prediction market using leverage?

When the P2P illicit funds from ten years ago turned into 60,000 bitcoins

The largest Bitcoin money laundering case in the UK has new developments: 16,000 Chinese victims are pursuing 61,000 seized Bitcoins across borders, and the dispute over the applicability of UK and Chinese laws will directly determine whether the victims can share in the soaring profits.

Morning News | CME Group launches Nasdaq Cryptocurrency Index futures; Asset management giant Janus Henderson strategically invests in Ethena

Overview of Important Market Events on June 10

Why did Oracle deliver the strongest financial report in history, yet its stock price fell?

Oracle's revenue for fiscal year 2026 set a record, with AI cloud orders soaring to $638 billion, but massive capital expenditures on computing power led to negative free cash flow, causing a 5% drop in after-hours stock prices.

Bitcoin Layer 2 Network Botanix: Why Did We Choose to Dissolve?

The Bitcoin L2 star project Botanix announced a gradual shutdown, with the team admitting to facing severe challenges from the failure of its business model and the prevailing trends. Users are urged to withdraw all assets before July 9, 2026.

Morning Report | OpenAI has submitted an S-1 registration statement draft to the U.S. SEC; Morpho completes $175 million financing

Overview of Important Market Events on June 9th

Galaxy Deep Research Report: How Hyperliquid's HIP-4 Upgrade Changes the Landscape of Prediction Markets?

The platform that wins this competition will be the one whose execution layer is the hardest to replicate, whose builder ecosystem delivers the fastest, and whose regulatory path is the most open.

Latest research from 13 top universities including Cornell University: The current state, challenges, and misconceptions of the fusion of Crypto and AI

The combination of AI and crypto is still in its early stages, with both serving as complementary "middleware": AI translates human intentions into executable programs, while cryptographic technology provides verifiable and tamper-proof guarantees for computational processes and results. In the dire...

Deconstructing Anthropic: The Best AI Company, Possibly Also a Type of Organizational Invention

Instead of competing with ambition, focusing on restraint, how does Anthropic leverage extreme strategic focus and an "counterintuitive" geek culture to counterattack OpenAI on the AI battlefield?

Every exchange is a "Universal Exchange."

You initially build infrastructure for something, then realize it can also be used for many other things, and then you continuously expand the business to accommodate everything that the infrastructure can support.

The counterattack of traditional finance: Alliance chains are quietly reviving

Whether public chains win or consortium chains win has never been the focus.

Pantera Capital Partner: How Tokenization is Restructuring the Private Equity and Early Investment Ecosystem?

Top tech companies are going public later and later, leaving retail investors shut out during the high growth period. Can tokenization give ordinary people back this entry ticket?

Mastercard Launches Agent Pay for AI, Plans to Record AI Agent Payment Authorizations on Polygon

Mastercard launched Agent Pay for AI, a new payment protocol designed to help AI agents make small payments such as pay-per-use access to data and APIs. The system plans to record human-granted AI agent permissions on Polygon, focusing on verifiable authorization, identity, and payment controls.

Curve Deploys Llamalend v2 on Optimism With 250,000 OP Incentives

Curve launched Llamalend v2 on Optimism with 250,000 OP incentives from the Optimism Foundation. The upgrade expands Llamalend beyond its earlier crvUSD-focused model, adding broader collateral support, LlamaRisk market reviews, and the ability to use Curve LP tokens as collateral.

Raydium Old Liquidity Pool Reportedly Exploited, With $1.34 Million Moved to Ethereum and Tornado Cash

An old Raydium liquidity pool was reportedly exploited for around $1.34 million in USDC, RAY, and wSOL, with the stolen funds bridged to Ethereum and deposited into Tornado Cash. The incident highlights the tail risks of legacy DeFi pools, old contracts, and cross-chain fund laundering paths.

Kalshi Executive Challenges “SBF Backed AI Unicorns” Narrative, Says Leopold Aschenbrenner Was Key Figure

Kalshi executive John Wang questioned the “SBF backed AI unicorns” narrative, saying Leopold Aschenbrenner was the key figure behind major AI investment decisions.

Morning News | CME Group launches Nasdaq Cryptocurrency Index futures; Asset management giant Janus Henderson strategically invests in Ethena

Overview of Important Market Events on June 10

Why did Oracle deliver the strongest financial report in history, yet its stock price fell?