Claude Code Context Architecture: API Call Internals

Claude Code, the AI coding agent developed by Anthropic, relies on a layered context assembly system to keep operation stable, token usage efficient and cache performance reliable across continuous tool calls and multi-round task execution. Every time Claude Code invokes the model API, the request payload is composed of three core components: the System Prompt, the Tools schema list and the Messages dialogue array. This article dissects the structure, layered design, caching strategies, dynamic update rules and end-to-end workflow of these three modules. Drawing on source code logic, classification tables and practical engineering cases, it explains how Claude Code separates stable cached content from dynamic real-time data, manages tool pools, processes attachment contexts and operates the core agent loop. It is intended as a reference for developers who build custom AI agents, optimize prompt engineering or refine context management mechanisms.

1. Overall Framework and Core Design Principles

Before diving into each module, it is essential to clarify the overall composition of the API payload and the core design constraints that run through the entire context architecture. When Claude Code triggers an agent loop to call the LLM API, the complete request payload consists of three inseparable parts:

System Prompt: Defines the agent’s identity, behavioral norms, operational rules and basic environmental information.
Tools: A collection of tool schemas that informs the model of all available functional capabilities.
Messages: The full dialogue stream, including user instructions, project configurations, tool execution results and various additional attachment contexts.

From an operational perspective, System Prompt and Tools are mostly prepared before the agent loop starts and remain relatively stable during iterative calls. In contrast, the Messages array is the dynamic layer that continuously appends new content after each tool execution. The core operating logic of the Claude Code agent loop is straightforward: the model returns tool invocation requests → the system executes corresponding tools → tool results are appended to Messages → a new API call is initiated, and this cycle repeats until the entire task is completed.

The most critical engineering constraint governing the whole system is cache affinity classification: System Prompt and Tools belong to the cache-sensitive prefix layer, while Messages act as the dynamically growing layer. Any modification to the content of the stable prefix will invalidate the pre-configured prompt cache. Guided by this principle, Claude Code adopts a core architectural idea: place all cache-friendly fixed content in the front prefix area, and migrate all runtime-changing content to Messages, attachment additional contexts or delayed tool loading modules. This design maximizes the hit rate of API caching, reduces repeated calculation and token consumption, and significantly lowers overall operating costs.

The simplified source code snippet for model invocation in the agent loop is as follows, which intuitively presents the delivery of the three major components:

// Simplified code for model calling in src/query.ts
for await (const message of deps.callModel({
  systemPrompt: fullSystemPrompt,  // Complete system prompt
  messages: ...,                    // Dialogue message array
  tools: ...,                       // Tool schema list
})) {
  // Handle each streamed message (body elided in this simplified snippet)
}
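To make the cache-affinity split concrete, here is a minimal sketch of why appending to Messages leaves the prefix untouched while changing Tools does not. The `Payload` shape and the `prefixKey` and `appendMessage` helpers are hypothetical names invented for this illustration; they are not taken from the Claude Code source, and a real cache key is computed by the API service, not by string comparison.

```typescript
// Hypothetical payload shape for illustration; the real request type may differ.
interface ToolSchema { name: string; description: string }
interface Message { role: "user" | "assistant"; content: string }
interface Payload { systemPrompt: string[]; tools: ToolSchema[]; messages: Message[] }

// Serialize only the cache-sensitive prefix (System Prompt + Tools).
function prefixKey(p: Payload): string {
  return JSON.stringify({ systemPrompt: p.systemPrompt, tools: p.tools });
}

// Append-only update: returns a new payload whose prefix is unchanged.
function appendMessage(p: Payload, m: Message): Payload {
  return { ...p, messages: [...p.messages, m] };
}

const base: Payload = {
  systemPrompt: ["You are a coding agent."],
  tools: [{ name: "Read", description: "Read a file" }],
  messages: [{ role: "user", content: "Fix the failing test" }],
};

// A new turn only grows Messages.
const nextTurn = appendMessage(base, { role: "assistant", content: "Reading the test file." });

// Adding a tool modifies the prefix itself.
const withNewTool: Payload = {
  ...base,
  tools: [...base.tools, { name: "Grep", description: "Search files" }],
};

const prefixStableAcrossTurns = prefixKey(base) === prefixKey(nextTurn);
const prefixStableAfterToolChange = prefixKey(base) === prefixKey(withNewTool);
```

In this sketch `prefixStableAcrossTurns` is true and `prefixStableAfterToolChange` is false, which mirrors the rule that runtime changes belong in Messages rather than in the prefix.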

2. Layered Assembly of System Prompt

Claude Code does not use a single large string for the System Prompt. Instead, it organizes it as a string array composed of multiple independent sections, which are concatenated into a complete text before being sent to the API. This segmented design brings three prominent advantages: each section has independent responsibilities for easy maintenance and testing; static and dynamic content can be physically separated to facilitate cache utilization; dynamic sections support conditional loading to flexibly control injected content.

2.1 Static Sections and Dynamic Sections Division

The System Prompt is split into static sections and dynamic sections, separated by a dedicated marker named SYSTEM_PROMPT_DYNAMIC_BOUNDARY. The content before the marker is a fully stable cache prefix, and the content after is the dynamically changing part. Static sections account for more than 60% of the total System Prompt content.

| Category | Core Features | Content Details | Operational Characteristics |
| ---- | ---- | ---- | ---- |
| Static Sections | Cross-session stable, high cache hit rate | Agent identity declaration, system rules, task execution guidelines, operation safety specifications, tool usage preferences, output style and efficiency requirements | Basically unchanged within a single session and across different user sessions, suitable for global caching |
| Dynamic Sections | Varies with users, projects and environments | Session guidance, automatic memory rules, simple environmental information, MCP connection instructions, user language preferences | Most content is memoized (calculated once and reused throughout the session); real-time changes are migrated to other modules |

2.1.1 Static Sections Details

Static sections define the fundamental behavioral norms of Claude Code and are the core part of the cache prefix. The main contents include:

Identity introduction: Clarify that the agent is a professional engineering assistant responsible for software development tasks.
Security rules: Restrict network requests and URL generation, and abide by network security specifications.
Task guidelines: Follow the principle of reading files before modification, avoid redundant functions and redundant comments, and comply with OWASP Top 10 security standards.
Operation norms: Emphasize the reversibility of operations, and require user confirmation for destructive behaviors.
Tool preferences: Prioritize dedicated file operation tools such as Read, Edit and Grep over general Bash commands.
Output specifications: Adopt a concise style without emojis, and mark file paths and line numbers when quoting code.

2.1.2 Dynamic Sections Details

Dynamic sections are associated with the current session, operating environment and user configuration. Typical content includes session guidance rules, memory system instructions, simplified environmental information (operating system, shell type, working directory), MCP server usage rules and user-defined output styles. Most dynamic sections are memoized: they are computed once at the start of the session and reused in every subsequent API call. Real-time changes such as MCP connection status do not modify the System Prompt directly; they are handled through delta updates or delayed tool loading instead.
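The memoization behavior described above can be sketched as a small per-session cache keyed by section name. `SectionCache` and `SectionCompute` are hypothetical names for this illustration, and the environment string is made up; the article does not show how the real memoization is implemented.

```typescript
// Hypothetical compute function type: a section may resolve to text or to null.
type SectionCompute = () => string | null;

// Hypothetical per-session cache: each named section is computed at most once.
class SectionCache {
  private readonly cache = new Map<string, string | null>();

  // Return the cached value if present; otherwise compute and store it.
  resolve(name: string, compute: SectionCompute): string | null {
    if (!this.cache.has(name)) {
      this.cache.set(name, compute());
    }
    return this.cache.get(name) ?? null;
  }
}

// Count invocations to show that the compute function runs only once.
let computeCalls = 0;
const envInfo: SectionCompute = () => {
  computeCalls += 1;
  return "OS: linux, shell: bash, cwd: /repo"; // made-up sample value
};

const sessionCache = new SectionCache();
const firstCall = sessionCache.resolve("env_info_simple", envInfo);
const secondCall = sessionCache.resolve("env_info_simple", envInfo);
```

Because the second lookup is served from the cache, the section text is byte-identical across API calls in the same session, which is what keeps the dynamic suffix cacheable.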

2.2 Cache Control Rules

The SYSTEM_PROMPT_DYNAMIC_BOUNDARY marker acts as a clear dividing line for prompt caching. Combined with different API types, Claude Code formulates differentiated cache strategies:

When calling the official Anthropic API: The content before the boundary uses global-level caching, and the dynamic suffix after the boundary uses organization-level (org) caching.
When using third-party APIs: The boundary marker is not applied, and the entire System Prompt uses organization-level caching.

This fine-grained cache management mechanism ensures that the stable prefix can be reused efficiently while adapting to the access characteristics of different service ends. The complete function for assembling the System Prompt is designed as follows:

export async function getSystemPrompt(tools, model, ...): Promise<string[]> {
  const dynamicSections = [
    systemPromptSection('session_guidance', () => getSessionSpecificGuidanceSection(...)),
    systemPromptSection('memory', () => loadMemoryPrompt()),
    systemPromptSection('env_info_simple', () => computeSimpleEnvInfo(model, ...)),
  ];
  return [
    // Static sections (cache prefix)
    getSimpleIntroSection(),
    getSimpleSystemSection(),
    getSimpleDoingTasksSection(),
    getActionsSection(),
    getUsingYourToolsSection(),
    getSimpleToneAndStyleSection(),
    getOutputEfficiencySection(),
    // Cache boundary marker
    ...(shouldUseGlobalCache() ? [SYSTEM_PROMPT_DYNAMIC_BOUNDARY] : []),
    // Dynamic sections
    ...dynamicSections,
  ].filter(s => s !== null);
}
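One way to picture how the boundary marker could be consumed downstream is a helper that splits the assembled section array into a global-scoped prefix and an org-scoped suffix. This is a sketch under stated assumptions: the marker value, the `PromptBlock` type and the `splitAtBoundary` function are hypothetical, and the article does not show how Claude Code actually maps sections onto cache scopes.

```typescript
// Placeholder marker value; the real string used by Claude Code is not shown in the article.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = "__DYNAMIC_BOUNDARY__";

type CacheScope = "global" | "org";
interface PromptBlock { text: string; cacheScope: CacheScope }

// Split the section array at the marker and tag each half with a cache scope.
function splitAtBoundary(sections: string[]): PromptBlock[] {
  const idx = sections.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY);
  if (idx === -1) {
    // No marker present: the whole prompt is treated as a single org-scoped block.
    return [{ text: sections.join("\n\n"), cacheScope: "org" }];
  }
  const blocks: PromptBlock[] = [
    { text: sections.slice(0, idx).join("\n\n"), cacheScope: "global" },
    { text: sections.slice(idx + 1).join("\n\n"), cacheScope: "org" },
  ];
  // Drop a half that turned out empty (for example, no dynamic sections).
  return blocks.filter(b => b.text.length > 0);
}

const withMarker = splitAtBoundary(["intro", "rules", SYSTEM_PROMPT_DYNAMIC_BOUNDARY, "env"]);
const withoutMarker = splitAtBoundary(["intro", "rules", "env"]);
```

With the marker present the result is two blocks (static text tagged global, dynamic text tagged org); without it, a single org-scoped block, matching the two strategies listed in section 2.2.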

3. Tiered Management and Delayed Loading of Tools Module

The Tools module is not a fixed list loaded once at the start of the session. Claude Code adopts a three-tier tool pool architecture and a delayed loading strategy to prevent excessive tool schemas from breaking the stable prefix cache.

3.1 Three-Tier Tool Architecture

Candidate Tool Pool: The full set of all available tools in the current session, including built-in tools, MCP external tools and user-defined Skill tools. The assembly function assembleToolPool filters tools according to permissions, then deduplicates and sorts them. The pool includes more than 40 built-in tools, such as Read, Write, Bash and Grep.
Directly Loaded Tools: High-frequency and basic tools that are directly added to the Tools array of each API request. They belong to the cache-sensitive prefix and ensure that the model can call core capabilities at any time.
Deferred Tools: Long-tail and infrequently used tools. They will not be loaded into the prefix by default. When the model has an invocation intention, the system uses Tool Search to dynamically supplement the corresponding schemas, which effectively avoids cache invalidation caused by frequent changes of the Tools list.
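The split between directly loaded and deferred tools can be sketched as follows. The `ToolDef` type, the `deferred` flag, the keyword-based `searchDeferred` function and the two `mcp__...` tool names are assumptions made for illustration; the article does not describe how Tool Search matches a query to a schema.

```typescript
// Hypothetical tool record; `deferred` marks long-tail tools kept out of the prefix.
interface ToolDef { name: string; description: string; deferred: boolean }

// Tools sent with every request: only those not marked as deferred.
function directlyLoaded(pool: ToolDef[]): ToolDef[] {
  return pool.filter(t => !t.deferred);
}

// Hypothetical keyword search restricted to deferred tools.
function searchDeferred(pool: ToolDef[], query: string): ToolDef[] {
  const q = query.toLowerCase();
  return pool.filter(
    t => t.deferred &&
      (t.name.toLowerCase().includes(q) || t.description.toLowerCase().includes(q)),
  );
}

const pool: ToolDef[] = [
  { name: "Read", description: "Read a file", deferred: false },
  { name: "Bash", description: "Run a shell command", deferred: false },
  { name: "mcp__db__query", description: "Run a SQL query against the database", deferred: true },
  { name: "mcp__cloud__deploy", description: "Deploy a service", deferred: true },
];

const initialTools = directlyLoaded(pool).map(t => t.name);
const found = searchDeferred(pool, "sql").map(t => t.name);
```

Here `initialTools` contains only Read and Bash, so the Tools array sent with each request stays the same no matter how many long-tail tools are registered; the database tool surfaces only when a search asks for it.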

3.2 Tool Source and Filtering Logic

The tools in the candidate pool come from three channels:

Built-in native tools: Core tools for file reading and writing, command execution and content retrieval.
MCP (Model Context Protocol) tools: External tools dynamically registered through the MCP server, covering databases, web services and cloud platforms.
Skill tools: Converted from user-defined slash commands such as /review into model-callable tools.

During assembly, the system will filter disabled tools and remove duplicate tools with the same name to ensure the cleanliness and efficiency of the tool pool.
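A minimal sketch of this filtering step is shown below. It is not the real assembleToolPool: the `PoolTool` type, the `enabled` flag, the first-occurrence-wins deduplication rule and the sort by name are all assumptions chosen for the example, since the article does not specify which duplicate is kept or what the sort key is.

```typescript
// Hypothetical candidate record covering the three sources named in the article.
interface PoolTool { name: string; source: "builtin" | "mcp" | "skill"; enabled: boolean }

// Filter disabled tools, keep the first tool seen for each name, then sort by name.
function assemblePool(candidates: PoolTool[]): PoolTool[] {
  const seen = new Set<string>();
  const result: PoolTool[] = [];
  for (const tool of candidates) {
    if (!tool.enabled || seen.has(tool.name)) continue;
    seen.add(tool.name);
    result.push(tool);
  }
  // A deterministic order keeps the serialized tool list identical across calls.
  return result.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}

const assembled = assemblePool([
  { name: "Read", source: "builtin", enabled: true },
  { name: "review", source: "skill", enabled: true },
  { name: "Bash", source: "builtin", enabled: true },
  { name: "Read", source: "mcp", enabled: true },         // duplicate name, dropped
  { name: "WebFetch", source: "builtin", enabled: false }, // disabled, dropped
]);
const assembledNames = assembled.map(t => t.name);
```

Sorting matters for caching as much as for tidiness: if the same set of tools were emitted in a different order on each call, the serialized Tools prefix would differ even though nothing meaningful changed.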

4. Multi-Dimensional Assembly of Messages Array

The Messages array is the main dynamic carrier of the entire agent loop, which continues to grow with tool execution and user interaction. Its initial composition is complex, including CLAUDE.md configuration files, user input and various attachment contexts.

4.1 Loading Rules for CLAUDE.md

CLAUDE.md is a core configuration file used to define project specifications, code styles and team conventions. It is injected into the head of the Messages array in the form of a user message before each API call. Claude Code follows a multi-level priority loading mechanism for CLAUDE.md files, with priorities from low to high as follows:

Managed level: /etc/claude-code/CLAUDE.md, global administrator policies.
User level: ~/.claude/CLAUDE.md, personal global preferences.
Project level: CLAUDE.md in the project directory and rule files under the .claude folder.
Local level: CLAUDE.local.md, local private overrides.

Since the model reads Messages from top to bottom, files loaded later (those with higher priority) take precedence. It is worth noting that CLAUDE.md is placed in Messages instead of the System Prompt. The core reason is to separate permission levels: the System Prompt carries the highest-level security and operation rules, while CLAUDE.md carries project-specific and user-specific configuration, which can be committed to the code repository and does not touch the core security boundary. The content is wrapped in <system-reminder> tags when injected.
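The loading order and the wrapping step can be sketched like this. The `MemoryFile` type, the `buildClaudeMdMessage` function and the "Contents of ..." line format are hypothetical; only the four levels, their low-to-high order, the user-message role and the <system-reminder> wrapper come from the description above.

```typescript
type Level = "managed" | "user" | "project" | "local";
interface MemoryFile { level: Level; path: string; content: string }

// Priority from low to high, as listed in the article.
const LEVEL_ORDER: Level[] = ["managed", "user", "project", "local"];

// Order files so higher-priority content appears later, then wrap it as one user message.
function buildClaudeMdMessage(files: MemoryFile[]): { role: "user"; content: string } {
  const ordered = [...files].sort(
    (a, b) => LEVEL_ORDER.indexOf(a.level) - LEVEL_ORDER.indexOf(b.level),
  );
  // The "Contents of <path>:" header is an assumed format for this sketch.
  const body = ordered.map(f => `Contents of ${f.path}:\n${f.content}`).join("\n\n");
  return { role: "user", content: `<system-reminder>\n${body}\n</system-reminder>` };
}

// Input order is deliberately scrambled to show that the level decides placement.
const claudeMdMessage = buildClaudeMdMessage([
  { level: "local", path: "CLAUDE.local.md", content: "Use tabs locally." },
  { level: "managed", path: "/etc/claude-code/CLAUDE.md", content: "Never commit secrets." },
  { level: "project", path: "CLAUDE.md", content: "Use 2-space indentation." },
]);
```

In the resulting message the managed policy comes first and the local override last, so a model reading top to bottom encounters the most specific instruction most recently.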

4.2 Classification and Application of Attachment Context

Attachment is a flexible additional context mechanism. User input, file references, IDE status and asynchronous events will be converted into independent AttachmentMessage objects and appended to Messages. According to usage scenarios, attachments are divided into eight categories:

User explicit references: Files, directories, PDFs and MCP resources cited via the @ syntax.
File change records: Read files and edited content, avoiding repeated injection of full text.
IDE diagnosis content: Selected code segments, opened files and LSP error prompts.
Skill and Agent discovery: New skills and sub-Agent information.
Hook and asynchronous events: Background task results and queue messages.
Operating mode reminders: Plan mode, automatic mode switches and task to-do items.
Budget monitoring: Token consumption and cost statistics.
Special functional content: Team collaboration information and nested memory data.

4.3 Dynamic Update Rules in the Loop

In each round of the agent loop, the Messages array will not be fully refreshed. Only incremental content is appended: newly generated assistant replies, tool execution results and newly generated attachments. For repeated content such as read files and historical skills, the deduplication mechanism is enabled to prevent the Messages array from being excessively bloated.
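An append-only log with deduplication might look like the sketch below. The `Attachment` shape, the `MessageLog` class and the rule "skip when the same kind and key were last injected with identical content" are assumptions for illustration; the article does not specify the actual deduplication criteria.

```typescript
// Hypothetical attachment record: `key` identifies the resource, e.g. a file path.
interface Attachment { kind: string; key: string; content: string }

// Append-only log that skips an attachment whose content was already injected unchanged.
class MessageLog {
  private readonly entries: Attachment[] = [];
  private readonly lastContent = new Map<string, string>();

  // Returns true if the attachment was appended, false if it was a duplicate.
  append(a: Attachment): boolean {
    const id = `${a.kind}:${a.key}`;
    if (this.lastContent.get(id) === a.content) return false;
    this.lastContent.set(id, a.content);
    this.entries.push(a);
    return true;
  }

  size(): number {
    return this.entries.length;
  }
}

const log = new MessageLog();
const firstRead = log.append({ kind: "file", key: "src/app.ts", content: "v1" });
const repeatedRead = log.append({ kind: "file", key: "src/app.ts", content: "v1" });
const changedRead = log.append({ kind: "file", key: "src/app.ts", content: "v2" });
```

Earlier entries are never rewritten: a changed file is appended as a new entry, so the part of Messages already sent stays byte-identical between calls.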

5. Core Agent Loop Full Workflow

The main loop of Claude Code is implemented in the query function, forming a closed loop of "call model → execute tools → update context". The complete execution steps are as follows:

Organize the Messages content for the current round, and perform truncation processing when the token exceeds the limit.
Splice the fixed System Prompt, Tools and processed Messages to initiate an API call.
The model returns a response. If there is no tool invocation requirement, the task ends directly.
If a tool needs to be called, execute the corresponding tool asynchronously and collect all execution results.
Inject incremental attachment content generated during the execution process.
Append the model response and tool results to the Messages array to form a new context, and enter the next loop.

This append-only update mode ensures the stability of the prefix layer and maximizes the cache utilization rate during long-running tasks.
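The steps above can be sketched as a small loop with a stubbed model and a stubbed tool. Everything here is hypothetical: the types, the `runAgentLoop` function, the `maxTurns` guard and the separate "tool" role are simplifications for illustration. The sketch is synchronous, whereas the snippet from src/query.ts earlier in the article streams responses asynchronously, and it omits truncation and attachment injection.

```typescript
interface ToolCall { name: string; input: string }
interface ModelReply { text: string; toolCalls: ToolCall[] }
interface LoopMessage { role: "user" | "assistant" | "tool"; content: string }

type CallModel = (messages: readonly LoopMessage[]) => ModelReply;
type RunTool = (call: ToolCall) => string;

// Call model -> execute tools -> append results -> repeat until no tool is requested.
function runAgentLoop(
  initial: LoopMessage[],
  callModel: CallModel,
  runTool: RunTool,
  maxTurns = 10, // safety bound added for this sketch
): LoopMessage[] {
  const messages = [...initial];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = callModel(messages);
    messages.push({ role: "assistant", content: reply.text });
    if (reply.toolCalls.length === 0) break; // no tool requested: task is complete
    for (const call of reply.toolCalls) {
      messages.push({ role: "tool", content: runTool(call) });
    }
  }
  return messages;
}

// Stub model: requests one tool call, then finishes once a tool result is present.
const stubModel: CallModel = (messages) =>
  messages.some(m => m.role === "tool")
    ? { text: "Done: the file has 3 lines.", toolCalls: [] }
    : { text: "I will read the file.", toolCalls: [{ name: "Read", input: "README.md" }] };

const stubTool: RunTool = (call) => `${call.name}(${call.input}) -> 3 lines`;

const transcript = runAgentLoop(
  [{ role: "user", content: "How long is README.md?" }],
  stubModel,
  stubTool,
);
```

The loop only ever pushes onto `messages`; the system prompt and tool list would be passed unchanged on every call, which is the property that keeps the prefix cacheable.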

6. Practical Value and Summary

Claude Code’s context assembly architecture is a classic case of combining cache optimization, layered design and dynamic management in production-level AI agents. Its core advantages are reflected in three aspects: first, the separation of static prefix and dynamic content greatly improves the prompt cache hit rate and reduces token cost and inference delay; second, the three-tier tool pool and delayed loading mechanism balance tool richness and cache stability; third, the diversified attachment system realizes fine-grained management of various scene contexts.

For developers who develop custom AI agents, this set of architectures provides clear reference ideas: split content according to cache attributes, separate global rules from personalized configurations, and use incremental updates to control the growth of context volume.

For development teams that need to access multiple large models and AI agent services for a long time, using a unified API relay service can simplify interface management and reduce comprehensive costs. As a professional API gateway, Treerouter supports one-stop access to mainstream models and agent services, with pricing more favorable than official direct access. It is compatible with mainstream development frameworks, allowing developers to seamlessly switch models when testing context architectures without modifying business code.

To sum up, the context assembly of Claude Code is not a simple splicing of text, but a systematic engineering design integrating caching, permissions, dynamic updates and scene adaptation. Understanding this set of mechanisms can help developers optimize existing agent systems, reduce operating costs and improve operational stability.
