Anthropic's new prompt caching saves developers a fortune

Anthropic has introduced prompt caching on its API, which remembers context between API calls and allows developers to avoid resending repeated prompts.

The prompt caching feature is available in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for the largest Claude model, Opus, coming soon.

Prompt caching, described in this 2023 article, lets users retain frequently used context across their sessions. Because the model remembers these prompts, users can add extra background information without increasing costs. This is useful when someone wants to send a large amount of context in a prompt and then refer back to it across multiple conversations with the model. It also gives developers and other users more control over tuning model responses.
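In practice, a developer marks a large context block as cacheable in an API request. Below is a minimal sketch of what such a request might look like in Python, assuming the anthropic-beta opt-in header and the cache_control field documented at the feature's launch; the API key, file name, and question are placeholders.

    import requests

    API_KEY = "YOUR_ANTHROPIC_API_KEY"  # placeholder

    # A large, frequently reused context (hypothetical file).
    with open("knowledge_base.md") as f:
        big_context = f.read()

    payload = {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "system": [
            {"type": "text",
             "text": "Answer questions using the attached knowledge base."},
            {"type": "text",
             "text": big_context,
             # Mark this block as cacheable; later calls that reuse the same
             # prefix are billed at the cheaper cache-read rate.
             "cache_control": {"type": "ephemeral"}},
        ],
        "messages": [
            {"role": "user", "content": "Summarize the onboarding section."}
        ],
    }

    response = requests.post(
        "https://api.anthropic.com/v1/messages",
        headers={
            "x-api-key": API_KEY,
            "anthropic-version": "2023-06-01",
            # Opt-in header while prompt caching is in public beta.
            "anthropic-beta": "prompt-caching-2024-07-31",
            "content-type": "application/json",
        },
        json=payload,
    )
    print(response.json())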

According to Anthropic, early adopters have seen “substantial speed and cost improvements with prompt caching for a variety of use cases — from including a full knowledge base to 100-shot examples to including each turn of a conversation in their prompt.”

According to the company, potential use cases include reducing cost and latency for conversational agents with long instructions and uploaded documents, faster code autocompletion, providing multi-step instructions to agentic search tools, and embedding entire documents in a prompt.

Cached prompt pricing

One advantage of prompt caching is a lower price per token. According to Anthropic, using cached prompts is “significantly cheaper” than the base input token price.

For Claude 3.5 Sonnet, writing a prompt to the cache costs $3.75 per million tokens (MTok), while reading a cached prompt costs $0.30 per MTok. The base input price for Claude 3.5 Sonnet is $3/MTok, so by paying a little more up front, you pay a tenth of the base rate every subsequent time you use the cached prompt.

Claude 3 Haiku users pay $0.30/MTok to write to the cache and $0.03/MTok when using saved prompts.

Although prompt caching is not yet available for Claude 3 Opus, Anthropic has already published its prices: writing to the cache will cost $18.75/MTok, and reading a cached prompt will cost $1.50/MTok.
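To make the arithmetic concrete, here is a short, illustrative Python sketch comparing the cost of resending a large context on every call with the cost of caching it, using the per-MTok prices above. The Haiku and Opus base input rates ($0.25 and $15 per MTok) come from Anthropic's published pricing rather than this article, and the token counts are hypothetical.

    # $ per million tokens: (base input, cache write, cache read)
    PRICES = {
        "claude-3.5-sonnet": (3.00, 3.75, 0.30),
        "claude-3-haiku": (0.25, 0.30, 0.03),
        "claude-3-opus": (15.00, 18.75, 1.50),
    }

    context_tokens = 50_000  # hypothetical shared context
    reuse_calls = 20         # hypothetical number of follow-up calls

    for model, (base, write, read) in PRICES.items():
        mtok = context_tokens / 1_000_000
        uncached = base * mtok * (1 + reuse_calls)         # resend the context every call
        cached = write * mtok + read * mtok * reuse_calls  # write once, then cheap reads
        print(f"{model}: ${uncached:.2f} without caching vs ${cached:.2f} with caching")

Under this hypothetical scenario, the Claude 3.5 Sonnet figures work out to roughly $3.15 without caching versus about $0.49 with it.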

However, as AI influencer Simon Willison noted on X, Anthropic's cache has a lifespan of only 5 minutes and is refreshed with every use.

Of course, this isn't the first time Anthropic has tried to compete with other AI platforms on pricing. Before the release of the Claude 3 family of models, Anthropic lowered its token prices.

It is now engaged in something of a 'race to the bottom' against rivals, including Google and OpenAI, when it comes to offering low-cost options for third-party developers building on their platforms.

Highly requested feature

Other platforms offer versions of prompt caching. Lamina, an LLM inference system, uses KV caching to reduce GPU costs. A cursory glance at the OpenAI developer forums or GitHub turns up plenty of questions about how to cache prompts.

Prompt caching is not the same as large language model memory. OpenAI's GPT-4o, for example, offers a memory feature in which the model remembers preferences or details, but it does not store the actual prompts and responses the way prompt caching does.