Long context windows are now a core capability of modern AI coding tools. Even so, a context window of 128K tokens or more cannot fully prevent token overflow during a long development session.

Claude Code addresses this problem with the built-in /compact command. /compact is not a simple truncation tool: it does not just delete old messages or inject a short prompt into the conversation. Instead, it runs local compression logic, calls the model to generate a structured summary, and replaces the long conversation history with that condensed context.

This article explains how /compact works in Claude Code. It covers command attributes, source code modules, compression workflow, threshold rules, prompt design, auxiliary mechanisms, configuration options, and practical usage advice. It also summarizes reusable engineering patterns for developers building long-context AI systems.

For teams running multiple AI coding tools, a unified API gateway can also help standardize context management rules across clients and reduce maintenance work.

1. What Is the /compact Command?

The /compact command is a local command in Claude Code. This is the key difference between /compact and normal prompt-based commands such as /init.

A prompt-based command sends instructions into the conversation and lets the model respond. /compact works differently. It executes the compression process locally, calls the model API independently to generate a summary, and then replaces the original long conversation history with the compressed result.

This design keeps compression separate from normal coding interaction. It also prevents the compression task from interfering with the current development dialogue.
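To make the local-versus-prompt distinction concrete, here is a minimal TypeScript sketch written as a discriminated union. The type names PromptCommand and LocalCommand, the runCommand dispatcher, and the toy command bodies are hypothetical and are not taken from Claude Code's source. The only point is that a prompt command adds a turn to the conversation, while a local command takes over the history itself.

```typescript
// Hypothetical command shapes for illustration; not Claude Code's real types.
type PromptCommand = {
  type: "prompt";
  name: string;
  // Returns text that is sent into the conversation for the model to answer.
  getPrompt: (args: string) => string;
};

type LocalCommand = {
  type: "local";
  name: string;
  // Runs inside the client and returns the new history directly.
  // A real compaction step would be async because it calls the model API.
  call: (args: string, history: string[]) => string[];
};

type Command = PromptCommand | LocalCommand;

function runCommand(cmd: Command, args: string, history: string[]): string[] {
  if (cmd.type === "prompt") {
    // Prompt command: add a new turn and let the normal chat loop respond.
    return [...history, cmd.getPrompt(args)];
  }
  // Local command: the client owns the whole operation, including the history.
  return cmd.call(args, history);
}

// Toy versions of the two command kinds (placeholder behavior only).
const init: PromptCommand = {
  type: "prompt",
  name: "init",
  getPrompt: () => "Analyze this codebase and write a project guide.",
};

const compact: LocalCommand = {
  type: "local",
  name: "compact",
  // Stand-in for "summarize with the model, then replace the history".
  call: (_args, history) => [`[summary of ${history.length} messages]`],
};
```

With these definitions, running init on a three-message history yields four entries, while running compact replaces the history with a single summary entry.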

In the source code, /compact is implemented across several modules:

  1. src/commands/compact/index.ts: the command registration entry. It contains only 15 lines of code and is responsible for declaring the command's attributes and loading its execution logic.

  2. src/commands/compact/compact.ts: the main execution module. It contains 287 lines of code and controls the overall compression flow.

  3. src/services/compact/compact.ts: the core compression service. It has more than 1,700 lines of code (about 60 KB) and handles most of the complex compression logic.

  4. src/services/compact/prompt.ts: the prompt design module. It contains 374 lines of code and defines how summaries should be generated.

  5. src/services/compact/autoCompact.ts: the automatic compression module. It manages trigger conditions and threshold checks.

  6. src/services/compact/microCompact.ts: the preprocessing module. It performs lightweight cleanup before full compression.

The purpose of this system is clear: prevent context overflow, reduce token pressure, and preserve the key information needed for continuous coding work.

2. Three-Level Compression Workflow

Whether triggered manually or automatically, /compact follows a three-level fallback workflow.

The system always tries the lighter and cheaper method first. If that method is unavailable or fails, it moves to a heavier fallback path. This design reduces average API cost and avoids unnecessary full-context summarization.

The execution order is:

Session Memory Compaction → Reactive Compaction → Traditional Compaction

2.1 Session Memory Compaction

Session Memory Compaction is the first-priority option. It is an incremental compression strategy.

Instead of summarizing the entire conversation again, it compresses only the newly added messages based on the existing summary. This makes it faster and cheaper than full compression.
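A minimal sketch of the incremental idea, assuming each message carries an increasing numeric id and the previous summary records the newest id it already covers. The Message and SessionSummary shapes, the incrementalCompact function, and the injected summarize callback (which stands in for the model call) are all hypothetical.

```typescript
// Hypothetical shapes: ids are assumed to increase with each new message.
interface Message {
  id: number;
  text: string;
}

interface SessionSummary {
  text: string;
  lastSummarizedId: number; // id of the newest message already in the summary
}

// Fold only the messages newer than the existing summary into a new summary.
// `summarize` stands in for the model call and receives the prior summary text.
function incrementalCompact(
  history: Message[],
  previous: SessionSummary,
  summarize: (priorSummary: string, fresh: Message[]) => string,
): SessionSummary {
  const fresh = history.filter((m) => m.id > previous.lastSummarizedId);
  if (fresh.length === 0) return previous; // nothing new since the last summary

  return {
    text: summarize(previous.text, fresh),
    lastSummarizedId: Math.max(...fresh.map((m) => m.id)),
  };
}
```

The cost saving comes from the filter: the model only sees the previous summary plus the new messages, never the full history again.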

However, this mode has a limitation. If the user adds custom compression instructions, Session Memory Compaction will be skipped. In that case, Claude Code needs a more flexible compression path.

2.2 Reactive Compaction

Reactive Compaction is the second-level option. It is mainly used in Claude Code’s internal reactive mode.

This mode is not fully exposed to ordinary users. It is activated only when the tool enters the corresponding internal state.

2.3 Traditional Compaction

Traditional Compaction is the final fallback. It is the most complete compression path.

This mode includes two steps:

  1. MicroCompact preprocessing
  2. Full conversation summarization

Traditional Compaction processes the full conversation history. It is more expensive than incremental compression, but it has the highest reliability. When the lighter methods cannot be used, this path ensures the compression task can still finish.

2.4 Code-Level Execution Logic

The source code implements this fallback strategy step by step.

First, it filters out messages that have already been compressed. This avoids repeated processing.

Then it tries Session Memory Compaction. If this succeeds, the system marks the compression state and stops the rest of the process.

If incremental compression fails, or if custom instructions exist, Claude Code checks whether Reactive Compaction can be used. If not, it moves to Traditional Compaction.
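The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the code in compact.ts: each strategy is modeled as a hypothetical function that returns a summary string, or null when it is unavailable or fails, and the names Msg, Strategies, and compactWithFallback are invented for this example.

```typescript
// Hypothetical message shape: `compacted` marks messages already folded into a summary.
interface Msg {
  text: string;
  compacted: boolean;
}

// Each strategy returns a summary, or null when it is unavailable or fails.
interface Strategies {
  sessionMemory: (pending: Msg[]) => string | null;
  reactive: ((pending: Msg[]) => string | null) | null; // null when reactive mode is off
  traditional: (pending: Msg[], instructions?: string) => string; // last resort
}

function compactWithFallback(
  history: Msg[],
  s: Strategies,
  instructions?: string,
): { summary: string; path: string } {
  // 1. Skip messages that have already been compressed.
  const pending = history.filter((m) => !m.compacted);

  // 2. Cheapest path first, but only when there are no custom instructions.
  if (!instructions) {
    const result = s.sessionMemory(pending);
    if (result !== null) return { summary: result, path: "session-memory" };
  }

  // 3. Reactive compaction, when that internal mode is active.
  if (s.reactive) {
    const result = s.reactive(pending);
    if (result !== null) return { summary: result, path: "reactive" };
  }

  // 4. Full summarization (MicroCompact + summary) as the final fallback.
  return { summary: s.traditional(pending, instructions), path: "traditional" };
}
```

Because every lighter strategy is allowed to return null, the caller never has to know which path ran; only the traditional path is expected to always produce a summary.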

This design is useful for many AI systems. It balances cost, speed, and reliability through layered fallback logic.

3. Automatic Trigger Rules and Thresholds

Claude Code does not rely only on manual compression. It also includes an automatic compression mechanism.

The system monitors token usage in real time. When usage reaches a preset threshold, it starts compression automatically. This prevents the context window from being filled suddenly.

The key constants are defined in the source code:

  • AUTOCOMPACT_BUFFER_TOKENS = 13000: reserved buffer for automatic compression.

  • WARNING_THRESHOLD_BUFFER_TOKENS = 20000: buffer used for the context overflow warning.

  • MANUAL_COMPACT_BUFFER_TOKENS = 3000: reserved buffer for manual /compact.

  • MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3: maximum number of consecutive automatic compression failures.

3.1 How the Trigger Threshold Is Calculated

The automatic compression threshold is calculated in two steps.

First, Claude Code calculates the effective context window. It subtracts the maximum reserved summary space from the model’s total context window. The reserved summary space is capped at 20,000 tokens.

Second, it subtracts the 13,000-token automatic compression buffer from the effective context window.

For a Claude Sonnet model with a 200K-token context window:

Effective context window = 200000 - 20000 = 180000 tokens
Auto-compact trigger threshold = 180000 - 13000 = 167000 tokens

This means automatic compression starts when the conversation reaches about 167,000 tokens. That is roughly 83% of the total context window.

This leaves enough space for summary generation and new user interactions.

Manual compression reserves a much smaller buffer of 3,000 tokens. Because a manual /compact needs less headroom, it can still run even when the conversation is much closer to the context limit than the automatic threshold.
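The two-step calculation can be written as a few small functions. The 20,000 and 13,000 constants come from this article; treating the summary reserve as the smaller of a model's maximum output tokens and 20,000, and firing when usage is greater than or equal to the threshold, are assumptions made for this sketch, and all function names are hypothetical.

```typescript
// Values quoted in this article.
const MAX_SUMMARY_RESERVE_TOKENS = 20_000;
const AUTOCOMPACT_BUFFER_TOKENS = 13_000;

// Assumption: the summary reserve is the model's summary output limit,
// capped at 20,000 tokens. `maxOutputTokens` is a hypothetical input.
function effectiveContextWindow(contextWindow: number, maxOutputTokens: number): number {
  return contextWindow - Math.min(maxOutputTokens, MAX_SUMMARY_RESERVE_TOKENS);
}

// Step 2: subtract the automatic compression buffer.
function autoCompactThreshold(contextWindow: number, maxOutputTokens: number): number {
  return effectiveContextWindow(contextWindow, maxOutputTokens) - AUTOCOMPACT_BUFFER_TOKENS;
}

// Assumption: compaction fires once usage reaches the threshold (>=).
function shouldAutoCompact(tokensUsed: number, contextWindow: number, maxOutputTokens: number): boolean {
  return tokensUsed >= autoCompactThreshold(contextWindow, maxOutputTokens);
}
```

For the 200K-token example, autoCompactThreshold(200_000, 32_000) returns 167,000, matching the calculation above. A model with a smaller output limit, say 8,000 tokens, would reserve less and trigger later, at 179,000 tokens.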

3.2 Circuit Breaker Protection

Claude Code also includes a circuit breaker mechanism.

If automatic compression fails three times in a row, the system stops retrying automatically. This prevents infinite retries and avoids wasting API calls.

This design comes from traditional service architecture. It is especially important for AI systems, because repeated failed compression can create large and unnecessary costs.

According to official operational data, before this protection was introduced there were 1,279 sessions per day with more than 50 consecutive compression failures. These failures caused about 250,000 wasted API calls per day.

The circuit breaker solves this problem by setting a hard retry limit.
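A consecutive-failure circuit breaker needs only a counter that resets on success. The sketch below uses the MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES value of 3 from this article. The AutoCompactBreaker class and its boolean compact callback are hypothetical, and the sketch does not model if or when a tripped breaker is reset (for example, in a new session).

```typescript
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

// Counts consecutive failures; once the limit is hit, automatic attempts stop.
class AutoCompactBreaker {
  private consecutiveFailures = 0;

  get isOpen(): boolean {
    return this.consecutiveFailures >= MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES;
  }

  // Runs one automatic compaction attempt unless the breaker is open.
  // `compact` returns true on success and false on failure.
  attempt(compact: () => boolean): "ok" | "failed" | "skipped" {
    if (this.isOpen) return "skipped"; // no more API calls after the limit
    if (compact()) {
      this.consecutiveFailures = 0; // any success resets the streak
      return "ok";
    }
    this.consecutiveFailures += 1;
    return "failed";
  }
}
```

Resetting on success matters: only an unbroken run of failures trips the breaker, so an occasional transient error does not disable automatic compression.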

4. MicroCompact and File Restoration

Traditional Compaction includes two important auxiliary mechanisms: MicroCompact before summarization and file restoration after summarization.

These two steps solve different problems.

MicroCompact reduces noisy content before compression. File restoration brings key code context back after compression.

4.1 MicroCompact: Removing Noise Before Compression

Images and embedded documents can consume many tokens. However, they often provide limited value for text summaries.

The MicroCompact module handles this problem before full compression. It replaces image resources with [image] and embedded documents with [document].

It also processes image resources inside tool results.

This step reduces the total token size before the summary API call. It also lowers the risk of a “prompt too long” error during compression.
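The placeholder substitution can be sketched as a recursive pass over message content. The Block union below is a simplified, hypothetical shape loosely inspired by chat-API content blocks, not the real message type, and microCompact here is an illustrative function rather than the module's actual export. The sketch replaces images and documents at any depth, including inside tool results.

```typescript
// Hypothetical content-block shapes for illustration only.
type Block =
  | { type: "text"; text: string }
  | { type: "image"; data: string }
  | { type: "document"; data: string }
  | { type: "tool_result"; content: Block[] };

// Replace heavy media with short text placeholders, recursing into tool results.
function microCompact(blocks: Block[]): Block[] {
  return blocks.map((block): Block => {
    switch (block.type) {
      case "image":
        return { type: "text", text: "[image]" };
      case "document":
        return { type: "text", text: "[document]" };
      case "tool_result":
        // Tools (for example, screenshot tools) can return images inside their results.
        return { type: "tool_result", content: microCompact(block.content) };
      default:
        return block; // plain text is kept as-is
    }
  });
}
```

Because the pass runs before the summary request, the base64 payloads never reach the summarization call at all.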

4.2 Restoring Key Files After Compression

After full conversation history is compressed into a summary, the model may lose detailed knowledge of local files. This can cause “memory loss” during ongoing coding work.

To reduce this problem, Claude Code restores selected key files after compression.

The restoration limits are strict:

  • Maximum restorable files: 5
  • Total token budget: 50,000 tokens
  • Maximum tokens per file: 5,000 tokens

After the summary is generated, the system can re-inject up to five important files within the token budget.

This is a practical balance. It keeps the compressed context small while preserving enough code information for continued development.
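One way to apply the three limits is a greedy pass over candidate files in priority order. Ordering the candidates (for example, most recently read first) is assumed to be done by the caller, and skipping files over the per-file limit, rather than truncating them, is a choice made for this sketch. selectFilesToRestore and Candidate are hypothetical names.

```typescript
// Limits quoted in this article.
const MAX_FILES_TO_RESTORE = 5;
const TOTAL_RESTORE_BUDGET = 50_000;
const MAX_TOKENS_PER_FILE = 5_000;

interface Candidate {
  path: string;
  tokens: number; // estimated token count of the file content
}

// Greedily pick files in the given priority order while respecting all three limits.
function selectFilesToRestore(candidates: Candidate[]): Candidate[] {
  const chosen: Candidate[] = [];
  let used = 0;
  for (const file of candidates) {
    if (chosen.length >= MAX_FILES_TO_RESTORE) break; // file-count limit
    if (file.tokens > MAX_TOKENS_PER_FILE) continue; // too large for one slot
    if (used + file.tokens > TOTAL_RESTORE_BUDGET) continue; // would exceed the total
    chosen.push(file);
    used += file.tokens;
  }
  return chosen;
}
```

With these three numbers alone, five files of at most 5,000 tokens each add up to at most 25,000 tokens, so in this sketch the file-count and per-file limits are reached before the 50,000-token total.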

5. Structured Prompt Design for Better Summaries

The quality of /compact depends heavily on prompt design.

The prompt in prompt.ts uses a structured format. It guides the model to generate summaries that are complete, stable, and useful for later coding tasks.

5.1 Two-Stage Output: Analysis and Summary

The model is asked to return two blocks:

<analysis>
...
</analysis>

<summary>
...
</summary>

The <analysis> block works as a draft area. It helps the model organize the conversation, identify important details, and reason through what should be preserved.

The <summary> block is the final compressed content. After the model returns the result, Claude Code removes the <analysis> part and keeps only the <summary> content.

This design improves summary quality without keeping extra reasoning text in the final context.
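Extracting the final summary is a small string operation. The sketch below keeps the body of the first <summary> block and discards <analysis>. Falling back to stripping the analysis and keeping the rest when no summary tag is present is an assumption, and extractSummary is a hypothetical name.

```typescript
// Keep only the <summary> body; the <analysis> draft is discarded.
function extractSummary(modelOutput: string): string {
  const match = modelOutput.match(/<summary>([\s\S]*?)<\/summary>/);
  if (match) return match[1].trim();
  // Assumed fallback: no <summary> tag, so drop any analysis blocks and keep the rest.
  return modelOutput.replace(/<analysis>[\s\S]*?<\/analysis>/g, "").trim();
}
```

The non-greedy [\s\S]*? lets the pattern span multiple lines while stopping at the first closing tag.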

The prompt also includes a clear instruction that tool calls are not allowed. This instruction is placed at the beginning of the prompt.

Official tests show that Claude Sonnet 4.6 has a 2.79% chance of ignoring tool prohibition instructions when they appear too late. Placing the rule early improves compliance.

5.2 Nine-Dimensional Summary Template

To avoid vague or incomplete summaries, the prompt uses nine fixed dimensions:

  1. Primary Request and Intent: the user’s original goal and main requirements.

  2. Key Technical Concepts: frameworks, languages, tools, and technical topics involved.

  3. Files and Code Sections: related files, code snippets, and important code locations.

  4. Errors and Fixes: errors encountered during the session and how they were solved.

  5. Problem Solving: the reasoning path and progress made during the task.

  6. All User Messages: user feedback and instructions that should not be lost.

  7. Pending Tasks: work that has not been completed.

  8. Current Work: the most important ongoing task at the time of compression.

  9. Optional Next Step: the most reasonable next action after compression.

The ninth dimension has an extra rule. The next step must follow the user’s latest explicit request. It should not revive completed tasks or shift to unrelated work.

This rule helps prevent task drift after compression.
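A prompt builder that follows these rules (the no-tools instruction first, the two-block output, then the nine fixed sections) might look like this. The wording is illustrative and is not the text of prompt.ts; buildCompactPrompt and SUMMARY_SECTIONS are hypothetical names.

```typescript
const SUMMARY_SECTIONS = [
  "Primary Request and Intent",
  "Key Technical Concepts",
  "Files and Code Sections",
  "Errors and Fixes",
  "Problem Solving",
  "All User Messages",
  "Pending Tasks",
  "Current Work",
  "Optional Next Step",
] as const;

// Illustrative wording only; the structure mirrors the rules described above.
function buildCompactPrompt(): string {
  const lines = [
    "Respond with text only. Do not call any tools.", // rule placed at the very top
    "",
    "First write your reasoning inside <analysis></analysis>.",
    "Then write the final result inside <summary></summary> using these sections:",
    ...SUMMARY_SECTIONS.map((title, i) => `${i + 1}. ${title}`),
    "",
    "The next step must follow the user's most recent explicit request.",
  ];
  return lines.join("\n");
}
```

Keeping the section titles in a single constant makes the template easy to reuse and makes it simple to check that a returned summary contains every section.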

5.3 Why This Prompt Design Works

The prompt design solves several common problems in long-session summarization.

The two-stage format improves reasoning quality. The structured template keeps output consistent. The tool prohibition avoids invalid tool calls. The requirement to preserve original user intent reduces information loss.

These techniques are not limited to Claude Code. They can also be reused in report generation, data extraction, long chat memory, and AI agent workflow design.

6. Custom Configuration and Usage Advice

Claude Code allows users to adjust compression behavior through custom instructions and environment variables.

This makes /compact more flexible for different projects and context sizes.

6.1 Custom Manual Compression Instructions

Users can add instructions after the /compact command to control what the summary should focus on.

Examples:

/compact focus on typescript code changes and test output

This tells the system to preserve TypeScript changes and test results.

/compact remember all error messages and stack traces verbatim

This tells the system to keep error logs and stack traces as accurately as possible.

Custom instructions are added to the standard compression prompt. They help preserve task-specific information.
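Handling the optional instruction text can be sketched as a parse step plus an append step. The regular expression, the "Additional instructions:" label, and the function names are hypothetical; the only behavior taken from this article is that the user's text is added to the standard compression prompt.

```typescript
// Split "/compact <instructions>" into the command and the optional focus text.
// Returns null when the input is not a /compact command.
function parseCompactCommand(input: string): { instructions?: string } | null {
  const match = input.trim().match(/^\/compact(?:\s+([\s\S]+))?$/);
  if (!match) return null;
  return match[1] ? { instructions: match[1].trim() } : {};
}

// Append user instructions after the standard prompt so the base rules stay first.
function withCustomInstructions(basePrompt: string, instructions?: string): string {
  return instructions ? `${basePrompt}\n\nAdditional instructions:\n${instructions}` : basePrompt;
}
```

Appending rather than prepending keeps rules such as the tool prohibition at the top of the prompt.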

6.2 Environment Variable Tuning

Users can adjust automatic compression behavior through environment variables.

Lower the automatic compression trigger ratio:

export CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=60

This triggers compression earlier, at 60% of the context window. It is useful for long and complex tasks.

Customize the auto-compact context window:

export CLAUDE_CODE_AUTO_COMPACT_WINDOW=100000

This adapts compression behavior to models with different context sizes.

Disable automatic compression:

export DISABLE_AUTO_COMPACT=1

This turns off automatic compression. Users can then decide when to run /compact manually.
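The sketch below shows one plausible way the three variables could feed into the trigger point. How they interact (for example, whether the percentage is applied to the overridden window) is an assumption, not documented behavior, and resolveAutoCompactThreshold is a hypothetical function. The environment is passed in as a plain record so the example stays self-contained.

```typescript
type Env = Record<string, string | undefined>;

// Returns the token count at which auto-compaction would start, or null when disabled.
// How the three variables interact is an assumption made for this sketch.
function resolveAutoCompactThreshold(env: Env, modelWindow: number): number | null {
  // DISABLE_AUTO_COMPACT=1: no automatic trigger; /compact remains available manually.
  if (env.DISABLE_AUTO_COMPACT === "1") return null;

  // CLAUDE_CODE_AUTO_COMPACT_WINDOW replaces the model's window when it is a positive number.
  const windowOverride = Number(env.CLAUDE_CODE_AUTO_COMPACT_WINDOW);
  const contextWindow =
    Number.isFinite(windowOverride) && windowOverride > 0 ? windowOverride : modelWindow;

  // CLAUDE_AUTOCOMPACT_PCT_OVERRIDE: trigger at a fixed percentage of the window.
  const pct = Number(env.CLAUDE_AUTOCOMPACT_PCT_OVERRIDE);
  if (Number.isFinite(pct) && pct > 0 && pct <= 100) {
    return Math.floor((contextWindow * pct) / 100);
  }

  // Default rule from section 3.1, assuming the full 20,000-token summary reserve.
  return contextWindow - 20_000 - 13_000;
}
```

Under these assumptions, a 200K-token window triggers at 167,000 tokens by default and at 120,000 tokens with the override set to 60.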

6.3 When to Run /compact Manually

Automatic compression is useful, but manual compression still has value.

Manual /compact is recommended in these situations:

  1. The task direction changes significantly.
  2. The model starts referencing outdated file content.
  3. The “Context left until auto-compact” warning appears.
  4. A development milestone is complete.
  5. A new module or new feature is about to begin.

Running /compact at these points can reduce irrelevant history and improve the quality of the next development phase.

7. Reusable Engineering Patterns

The /compact command is more than a Claude Code feature. It also demonstrates several useful engineering patterns for AI systems.

7.1 Hierarchical Fallback Strategy

Claude Code tries lightweight compression first. If that fails, it falls back to heavier methods.

This pattern is useful when a system needs both low latency and high reliability. It can be applied to tool calling, retrieval, summarization, and AI workflow orchestration.

7.2 Draft-and-Final Output Pattern

The model first generates an analysis draft, then produces a final summary.

This pattern improves output quality while keeping the final context clean. It is useful for structured reports, extraction tasks, and long-session memory compression.

7.3 Circuit Breaker Pattern

The circuit breaker stops repeated failed attempts.

This pattern is essential for production AI systems. Without it, automated retries can quickly waste tokens, increase cost, and overload services.

8. Summary

As long-context models become mainstream, token management is no longer optional. It is now a core capability of AI coding tools.

Claude Code’s /compact command provides a practical solution for long development sessions. It combines local execution, hierarchical compression, automatic thresholds, structured prompts, MicroCompact preprocessing, and post-compression file restoration.

This design is more reliable than simple truncation. It reduces token usage while preserving the information needed for ongoing coding work.

For daily use, developers should combine automatic compression with manual /compact. Automatic compression prevents overflow. Manual compression helps reset context at important project boundaries.

For AI tool builders, /compact offers valuable design references. Its fallback strategy, prompt structure, threshold rules, and circuit breaker mechanism can be reused in many long-context AI systems.

When multiple AI development tools are used in a team, standardized context compression rules can also reduce repeated configuration work. Understanding /compact helps developers maintain stable, efficient, and long-running Claude Code sessions.