- Every message you send in Claude resends the entire conversation from scratch. Long contexts are the single biggest token drain.
- Vague prompts that cause rework cost more tokens than one precise prompt that gets it right the first time.
- Reading entire files wastes tokens. Always specify what you need from a file before Claude reads it.
- The /clear and /compact commands reset or shrink the context and can save 40% to 70% of tokens on long sessions.
- Verbose tool outputs, bash commands, and search results add thousands of tokens. Pipe them through grep or head to return only what you need.
- CLAUDE.md is loaded on every session. Keep it under 500 tokens and move project-specific context to sub-directory files loaded only when needed.
- Switching from Opus to Claude 3.5 Sonnet or Claude 3.5 Haiku for simple tasks reduces cost by roughly 75% to 96% per token.
If you have been using Claude Code and watching your usage hit 60% after two or three prompts, you are not imagining things. Claude does not have persistent memory between turns. Every single time you send a message, the entire conversation history is resent from scratch. Your prompts, Claude’s responses, every tool result, every file it read. All of it, every time.
That follows from how the model is called: each request is stateless, so the model only sees what is sent along with it. The token count is not what you just typed. It is the running total of everything that has been said or read in the session, resent on every turn.
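To make the growth concrete, here is a minimal sketch of the arithmetic. It assumes every turn adds the same number of new tokens (prompt, response, and tool output combined) and ignores any caching discounts; the function name and the 500-token figure are illustrative, not measured values.

```python
def cumulative_input_tokens(tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent over a session when every turn resends the full history."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn  # this turn's new content joins the history
        total += history            # the whole history goes out as input
    return total


print(cumulative_input_tokens(500, 10))  # 27500
print(cumulative_input_tokens(500, 40))  # 410000
```

Four times as many turns costs roughly fifteen times as many input tokens under these assumptions, which is why long sessions drain usage so quickly.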
Once you understand that, the optimisation strategies become obvious. Here are nine that make the largest practical difference.
Why token usage grows faster than you expect
| What wastes tokens | Why it happens | What to do instead |
|---|---|---|
| Long conversation history | Every turn resends the full context from scratch | Start new conversations for new tasks. Use /clear or /compact. |
| Reading entire large files | You ask Claude to read a file without specifying what to look for | Ask for specific sections, functions, or line ranges only |
| Vague prompts that cause rework | Claude guesses intent incorrectly and you re-prompt to correct | Specify intent, constraints, and output format before Claude starts |
| Verbose tool outputs | Shell commands or search results return more than needed | Pipe outputs through grep or head to return only relevant lines |
| Redundant CLAUDE.md context | CLAUDE.md is loaded on every session whether relevant or not | Keep CLAUDE.md under 500 tokens. Move project-specific context to sub-directories. |
9 ways to cut token usage in Claude
1 Start fresh conversations for new tasks
This is the most impactful change you can make. When you start a new task in a conversation that already has 50 exchanges in it, Claude is re-reading all 50 exchanges on every single turn. A conversation that was reasonable to run 30 messages ago is now a token sink that weighs down every subsequent message.
Separate tasks into separate conversations. Each new conversation starts with zero accumulated context. For Claude Code specifically, use the /clear command to reset the working context without losing your environment setup.
- When to start fresh: Any time the current task is conceptually unrelated to what came before. Any time you have been going back and forth for more than 15 to 20 exchanges on a single problem.
2 Use compact or summary mode before long sessions
Before starting a long working session on an existing conversation, ask Claude to summarise the key context from previous exchanges into a compact handoff. Then start a new conversation with only that summary as context. You preserve the essential state without carrying thousands of tokens of conversation history.
Claude Code’s built-in compact feature does this automatically when conversations get long. Triggering it manually with /compact before you hit the limit, rather than waiting for automatic compaction, keeps you in control of what context is preserved.
- Savings: Compact summaries typically represent 5% to 10% of the original conversation token count. A 20,000-token conversation compresses to 1,000 to 2,000 tokens of summary.
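The saving repeats on every later turn, because the summary replaces the history that would otherwise be resent. A rough sketch of that calculation, using the 5% to 10% range above and a hypothetical ten remaining turns:

```python
def summary_tokens(history_tokens: int, summary_percent: int) -> int:
    """Size of a compact summary as a percentage of the original history."""
    return history_tokens * summary_percent // 100


history = 20_000
best_case = summary_tokens(history, 5)    # smallest summary
worst_case = summary_tokens(history, 10)  # largest summary
print(best_case, worst_case)  # 1000 2000

# Assumption: 10 more turns follow, each of which would have resent the old history.
remaining_turns = 10
saved = (history - worst_case) * remaining_turns
print(saved)  # 180000
```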
3 Specify exactly what to read before Claude reads it
When you ask Claude to read a file without further guidance, it reads the whole file. A 1,000-line Python module costs 1,000 lines of tokens on every turn it stays in context. If you only needed lines 200 to 250, you paid for roughly 950 lines you did not need.
Before instructing Claude to read any file, tell it what you are looking for. Reference specific function names, line ranges, or class names. Use bash commands that return only the relevant sections rather than full file reads.
- Instead of: ‘Read auth.py and tell me about the login function’
- Try: ‘Show me lines 45 to 90 of auth.py, specifically the login() function and its error handling’
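If you would rather hand Claude an excerpt yourself, a small helper can pull out just the range you need. This is a sketch, not part of Claude Code: `read_line_range` is a hypothetical name, and the demo builds a throwaway 1,000-line file to stand in for auth.py.

```python
from itertools import islice
import os
import tempfile


def read_line_range(path: str, start: int, end: int) -> str:
    """Return lines start..end (1-indexed, inclusive), each prefixed with its line number."""
    with open(path, encoding="utf-8") as fh:
        selected = islice(fh, start - 1, end)
        return "".join(f"{n}: {line}" for n, line in enumerate(selected, start=start))


# Demo on a temporary file standing in for a real module.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "auth.py")
    with open(demo_path, "w", encoding="utf-8") as fh:
        fh.writelines(f"# line {i}\n" for i in range(1, 1001))
    excerpt = read_line_range(demo_path, 45, 90)

print(len(excerpt.splitlines()))  # 46 lines instead of 1,000
```

Keeping the line numbers in the excerpt lets you refer back to exact locations in later prompts without re-reading the file.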
4 Write precise prompts that do not require rework
A vague prompt that leads Claude to produce the wrong output, which you then correct over three follow-up exchanges, costs more tokens than a precise initial prompt that gets it right on the first attempt. The benchmark from MindStudio’s 12-session test showed that structured prompting with upfront clarification used 14% fewer tokens and cost 9% less than unstructured prompting, entirely because it eliminated rework cycles.
Before Claude writes a single line, specify: what you want it to do, what constraints apply, what the output format should look like, and what it should not do. Five extra seconds of prompt writing can save five rounds of correction.
- Add to your prompts: Output format, file to edit (not just ‘the current file’), what to leave unchanged, any constraints on approach.
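One way to make this a habit is a small template that refuses to produce a prompt until each element is filled in. The sketch below is illustrative only: `build_prompt`, its field labels, and the example values are assumptions, not a format Claude requires.

```python
def build_prompt(task: str, target_file: str, output_format: str,
                 constraints: list[str], leave_unchanged: list[str]) -> str:
    """Assemble a prompt that states intent, scope, format, and limits up front."""
    lines = [
        f"Task: {task}",
        f"File to edit: {target_file}",
        f"Output format: {output_format}",
        "Constraints:",
        *[f"- {c}" for c in constraints],
        "Do not change:",
        *[f"- {item}" for item in leave_unchanged],
    ]
    return "\n".join(lines)


prompt = build_prompt(
    task="Add rate limiting to the login() function",
    target_file="auth.py",
    output_format="A unified diff only, no explanation",
    constraints=["Use only the standard library"],
    leave_unchanged=["The function signature", "Existing error messages"],
)
print(prompt)
```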
5 Pipe tool outputs through filters
When Claude runs a bash command and the output is long, the entire output enters the conversation context. A find command that returns 200 file paths adds 200 lines of tokens. A grep against a log file that returns 500 matches adds 500 lines.
Pipe outputs to filter what Claude sees. Use grep to return only matching lines. Use head -20 to return only the first 20 lines. Use awk to extract specific columns. The goal is to return the minimum output that answers the question.
- `find . -name '*.py' | head -20` instead of `find . -name '*.py'`
- `grep -n 'def ' auth.py` instead of `cat auth.py`
- `tail -50 error.log` instead of `cat error.log`
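The same idea applies if you wrap commands in your own scripts before their output reaches Claude. Here is a minimal sketch of a filter that behaves like grep followed by head; `filter_output` is a hypothetical helper and the 200 paths are generated sample data.

```python
import re
from typing import Optional


def filter_output(text: str, pattern: Optional[str] = None, max_lines: int = 20) -> str:
    """Keep only lines matching pattern (if given), then cap the result at max_lines."""
    lines = text.splitlines()
    if pattern is not None:
        rx = re.compile(pattern)
        lines = [line for line in lines if rx.search(line)]
    return "\n".join(lines[:max_lines])


# Sample data: 200 file paths, as a broad find command might return.
paths = "\n".join(f"./src/module_{i}.py" for i in range(200))

capped = filter_output(paths)
matched = filter_output(paths, pattern=r"module_19\d\.py")
print(len(capped.splitlines()))   # 20
print(len(matched.splitlines()))  # 10
```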
6 Optimise your CLAUDE.md file
CLAUDE.md is loaded at the start of every Claude Code session, regardless of what you are working on. A 2,000-token CLAUDE.md adds 2,000 tokens to the baseline of every message in every session. Across a day of work with 50 conversation turns, that is 100,000 extra tokens from one file.
Keep your root CLAUDE.md under 500 tokens with only the context that applies to every project and every task. Move project-specific context to CLAUDE.md files in sub-directories, where they are only loaded when Claude is working in that directory.
- Audit your CLAUDE.md: Remove anything that is not actively used by Claude in current tasks. Move project-specific setup instructions to project-level files. Remove outdated context that no longer reflects the codebase.
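To check where your own file stands, a rough estimate is usually enough. The sketch below assumes about four characters per token, which is only a rule of thumb for English text and not Claude's actual tokenizer; both function names are hypothetical.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate. Assumption: about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


def daily_overhead(file_tokens: int, turns_per_day: int) -> int:
    """Tokens a file adds per day if it rides along on every turn."""
    return file_tokens * turns_per_day


print(estimate_tokens("x" * 8000))  # 2000
print(daily_overhead(2000, 50))     # 100000
print(daily_overhead(500, 50))      # 25000
```

Trimming the file from 2,000 to 500 tokens cuts the daily overhead in this example from 100,000 tokens to 25,000.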
7 Use the right model for the task
Not every task requires the most capable model. Opus 4 is the highest-capability model and the most expensive. Claude 3.5 Sonnet handles most coding and writing tasks at roughly 75% lower cost per token. Claude 3.5 Haiku handles simple tasks at roughly 96% lower cost per token than Opus.
The cost-to-quality tradeoff depends on the task. Simple refactoring, code formatting, docstring generation, and test writing rarely benefit from Opus. Complex architectural reasoning, multi-step problem solving, and nuanced code review justify the higher cost.
- Default approach: Start sessions on Sonnet. Escalate to Opus only when Sonnet’s output quality is insufficient for the specific task at hand.
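The percentages above translate into a simple relative-cost comparison. This sketch uses only the article's rough figures (Sonnet about 75% cheaper than Opus, Haiku about 96% cheaper) rather than real prices, so treat the output as Opus-equivalent units, not currency.

```python
# Assumption: relative per-token cost as a percentage of Opus, taken from the
# approximate savings quoted above, not from a price list.
COST_PERCENT_OF_OPUS = {"opus": 100, "sonnet": 25, "haiku": 4}


def opus_equivalent_tokens(tokens: int, model: str) -> int:
    """Cost of a workload expressed as the number of Opus tokens costing the same."""
    return tokens * COST_PERCENT_OF_OPUS[model] // 100


workload = 1_000_000
for model in ("opus", "sonnet", "haiku"):
    print(model, opus_equivalent_tokens(workload, model))
# opus 1000000
# sonnet 250000
# haiku 40000
```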
8 Use the web search tool selectively
Each web search in Claude returns a set of results that enter the conversation context. If those results are long and you only needed one specific fact, you have added thousands of tokens of context you will carry through the rest of the session.
Be specific with search queries to get targeted results. Fetch specific URLs rather than running broad searches when you know what source you need. If you do run a broad search, ask Claude to extract only the relevant facts rather than summarising the full results.
9 Batch related tasks into single prompts
Each turn in a conversation adds overhead: the full history is resent, tool calls are made, outputs are returned. If you have three related questions about the same function, asking them in one prompt costs fewer tokens than asking them across three separate turns.
Group related questions and tasks. Ask Claude to make multiple related changes in a single instruction rather than one change at a time. Use numbered lists in your prompts to give Claude a clear multi-part task that it can complete in one response.
- Instead of: Three separate prompts asking about function signature, error handling, and test coverage separately
- Try: One prompt asking: ‘For the login() function: (1) review the signature, (2) identify error handling gaps, (3) suggest what unit tests are missing’
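The difference is easy to estimate. The sketch below compares input tokens for three separate turns against one batched prompt, assuming a fixed existing history and fixed question and answer sizes; all of the numbers are illustrative.

```python
def input_tokens_separate(history: int, question: int, answer: int, n: int) -> int:
    """Input tokens when n questions are asked one per turn."""
    total = 0
    context = history
    for _ in range(n):
        context += question  # the new question joins the context
        total += context     # the full context is sent as input
        context += answer    # the reply is carried into the next turn
    return total


def input_tokens_batched(history: int, question: int, n: int) -> int:
    """Input tokens when n questions are asked in a single prompt."""
    return history + question * n


# Assumed sizes: 10,000 tokens of history, 50-token questions, 400-token answers.
print(input_tokens_separate(10_000, 50, 400, 3))  # 31500
print(input_tokens_batched(10_000, 50, 3))        # 10150
```

With a large history, almost all of the saving comes from sending that history once instead of three times.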
The compounding effect
None of these changes is dramatic on its own. A concise CLAUDE.md saves 1,500 tokens on every turn. Precise prompting eliminates two correction rounds. Starting fresh conversations saves 30,000 tokens of accumulated history. Using Sonnet instead of Opus cuts per-token cost by roughly 75%.
Applied together across a day of active Claude Code use, the reduction is significant. The MindStudio benchmarks showed 9% to 14% reduction from structured prompting alone. The savings compound when you add context management, model selection, and output filtering on top.
FAQ
Why do tokens in Claude run out so fast?
Claude resends the entire conversation history on every turn. A conversation with 40 exchanges does not cost 40 prompts’ worth of tokens. Each new turn costs the sum of every message, every response, and every tool output accumulated so far, so the total grows much faster than the number of exchanges. Long conversations are the primary driver of rapid token consumption.
Does starting a new conversation reset token usage?
Yes. A new conversation starts with zero accumulated context. The only tokens consumed are your current prompt, any files or context you explicitly provide, and Claude’s response. Starting fresh conversations for new tasks is the single most effective token reduction strategy available.
How much can CLAUDE.md affect token usage?
CLAUDE.md is loaded on every session. A 2,000-token CLAUDE.md adds 2,000 tokens to every message in every session across every project. Over 50 turns in a day, that is 100,000 tokens from one file. Keep root CLAUDE.md under 500 tokens and move project-specific context to sub-directory files.
Is it worth switching from Claude Opus to Sonnet for cost savings?
For most tasks, yes. Claude 3.5 Sonnet costs roughly 75% less per token than Opus 4. For standard coding tasks, refactoring, documentation, and analysis, Sonnet’s output quality is close enough to Opus in most cases that the difference rarely matters. Reserve Opus for genuinely complex architectural reasoning where the quality difference justifies the cost.
Does using /clear help with token usage?
Yes. The /clear command resets the conversation context, removing accumulated history. It preserves your project setup and CLAUDE.md context but eliminates the conversation history that grows with every turn. Using /clear proactively before starting a new task, rather than waiting for automatic compaction, keeps you in control of when context resets.


