Reduce token usage with AI coding agents

Reduce the tokens and credits your coding agents consume in Warp using model choice, focused context, conversation management, and Rules.

Warp measures agent usage in credits, and credits scale directly with the number of tokens each task processes. The fewer tokens your agents use, the fewer credits you spend.

This guide covers practical ways to lower token usage in Warp: choosing the right model, keeping context tight, managing conversations, and configuring your agents to work efficiently.

Credit usage is non-deterministic, so treat the techniques in this guide as habits that bring your usage down over time rather than as exact, guaranteed savings. For a full breakdown of what drives usage, see Warp credits and billing.

You can't reduce what you can't see. Before optimizing, build an intuition for which prompts, models, and workflows cost the most.

  • Per-turn breakdown - Hover over the credit count chip at the bottom of any agent response to see the credits, tool calls, context window, and diffs that turn used.
  • In-conversation details - Run /cost to toggle credit usage details directly in the conversation.
  • Total usage and reset date - In the Warp app, open Settings > Billing and usage, or run /usage, to track your overall consumption and when your credits reset.

Once you know where your tokens go, the techniques below help you bring them down.

Larger reasoning models process more tokens per turn than lighter ones, so the model you choose has one of the biggest effects on usage.

  • Use a cost-efficient model for routine work - Switch to Auto (Cost-efficient) (auto-efficient), which optimizes for lower credit consumption while keeping output quality high. Lightweight models like Claude Haiku also use fewer credits for simple edits, lookups, and quick questions.
  • Reserve high-reasoning models for hard problems - Save heavier models like Claude Opus for deep debugging, architecture decisions, and planning, where the extra reasoning is worth the cost.
  • Pick a model and stay with it - Switching models mid-conversation can reset prompt caching, which means your context gets reprocessed. Choose a model at the start of a task and keep it for the duration when you can.

Change models with the model picker in the input, or run /model. See Agent model choice for the full model list.

Every turn re-sends the conversation so far, so long, unfocused threads pay for the same context again and again. Tight, well-scoped conversations keep token usage low.

  • Scope tasks and work incrementally - Break large changes into smaller, contained steps instead of one sprawling request. Well-scoped tasks need less back-and-forth and fewer correction cycles.
  • Start a new conversation for a new task - Run /new when you switch topics so unrelated history doesn't ride along in every turn.
  • Compact long conversations - When a useful thread grows long, run /compact to summarize the history and free up the context window. Use /fork-and-compact to branch into a fresh, summarized copy that keeps the relevant context and trims the rest.

See Conversation forking and the full Slash Commands reference for more.
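
To see why long threads get expensive, here is a minimal sketch of the re-sending effect described above. It assumes a deliberately simplified model in which every turn processes the entire history so far, and it ignores prompt caching, tool calls, model output, and however Warp actually converts tokens into credits. The function name and the token counts are hypothetical and chosen only for illustration.

```python
def cumulative_input_tokens(turn_sizes):
    """Total input tokens processed if every turn re-sends all earlier turns.

    Simplified model (an assumption, not Warp's billing logic):
    no prompt caching, no tool calls, no output tokens.
    """
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the new turn joins the conversation history
        total += history  # the whole history is processed again this turn
    return total


# Hypothetical numbers: ten turns of 1,000 tokens each in one long thread.
one_long_thread = cumulative_input_tokens([1000] * 10)

# The same ten turns split into two unrelated five-turn conversations.
two_short_threads = 2 * cumulative_input_tokens([1000] * 5)

print(one_long_thread)    # 55000
print(two_short_threads)  # 30000
```

Under these assumptions the processed total grows roughly with the square of the thread length, which is why starting a new conversation or compacting a long one tends to help more the longer the thread has become.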

Context you attach becomes tokens the model has to process. Adding only what's relevant keeps each turn lean.

  • Attach focused snippets, not full dumps - When sharing logs, code, or command output, include only the relevant portion instead of an entire file or output.
  • Add context deliberately - Attach the specific blocks, files, or images the agent needs for the task, rather than broad, just-in-case context.
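
One way to prepare a focused snippet before attaching it is to trim a log down to the lines around the failure. The sketch below is a standalone helper, not a Warp feature: the function name, the "ERROR" keyword, and the sample log are all hypothetical, and you would adjust the keyword and context size to match your own output.

```python
def relevant_excerpt(lines, keyword="ERROR", context=2):
    """Return only the lines within `context` lines of a line containing `keyword`."""
    keep = set()
    for i, line in enumerate(lines):
        if keyword in line:
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            keep.update(range(start, end))
    # Preserve the original order of the kept lines.
    return [lines[i] for i in sorted(keep)]


# Hypothetical 100-line log with a single failure.
log = [f"INFO step {n} ok" for n in range(100)]
log[57] = "ERROR step 57 failed: connection refused"

excerpt = relevant_excerpt(log)
print(len(log), "->", len(excerpt), "lines")  # 100 -> 5 lines
print("\n".join(excerpt))
```

Attaching the five-line excerpt instead of the full log gives the agent the failure and its immediate surroundings while leaving out the lines that add tokens without adding information.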

Let Codebase Context retrieve code for you

When an agent explores your repository by reading files one by one, each read is a tool call that consumes tokens. Indexing your codebase lets Warp find the right code with semantic search instead.

  • Index your repository - Run /index so Warp can locate relevant code by meaning, reducing the number of exploratory tool calls and the amount of code you paste in manually.
  • Let the agent search instead of pasting - With an indexed codebase, ask about a feature or file directly rather than copying large sections into the prompt.

Learn more in Codebase Context.

Without persistent guidance, agents re-derive your preferences every session and sometimes drift off course, which wastes tokens on corrections and rework. Rules encode that guidance once.

  • Capture preferences as Rules - Store your tools, conventions, and standards as Rules so you don't re-explain them in every conversation. Add one with /add-rule.
  • Add a project AGENTS.md - Run /init to generate a project AGENTS.md that gives agents the context they need up front, reducing exploration and missteps.

For examples, see Set coding best practices with Rules.

For big or ambiguous tasks, jumping straight to implementation often leads to wrong turns and expensive rework. A short planning pass keeps execution on track.

  • Create a plan before executing - Run /plan to have the agent research and outline the work in phases before it changes code. A clear plan reduces wasted exploratory work and redo cycles on large tasks.

See Planning for details.

Together, these habits lower your token and credit usage over time: match the model to the task, keep conversations and context tight, and configure Rules and Codebase Context. Pair them with the right Agent Profile, and keep an eye on your usage as you go.