Executive Summary
Released on Volcano Engine Ark platform, ByteDance Doubao API adopts multi-dimensional tiered charging rules covering token metering, context length segmentation, pre-paid resource packages, and consumer app subscriptions, forming a complete cost system for individual developers, small startups and large enterprise clients. This paper systematically sorts out all official pricing data, flow control thresholds, discount mechanisms and cost optimization strategies released in mid-2026, with cross-industry cost comparison against mainstream domestic and international LLM APIs. Engineering teams running multi-model mixed calling scenarios can rely on Treerouter to aggregate Doubao and third-party model traffic for unified expenditure statistics without rebuilding backend billing logic.
1 Two Separated Charging Systems: Consumer Doubao App vs Developer Doubao API
A core confusion point for many users is the separation between end-user subscription fees and developer API token billing; the two sets of charging rules operate independently with non-interchangeable quota pools.
1.1 Three-Tier Monthly Subscription for Doubao Consumer App
Launched in May 2026, Doubao’s paid professional version closes the long-term full-free consumer period, targeting individual office, design and content creation users. All advanced capabilities including long-document analysis, AI PPT generation, video production and multi-agent task automation share a unified quota pool under each tier:
| Subscription Tier | Monthly Continuous Auto-Renew Price | Annual Continuous Auto-Renew Price | Quota Multiple vs Standard Tier | Target User Group |
|---|---|---|---|---|
| Standard Package | ¥68 / month | ¥688 / year | Baseline (1x) | Occasional users who hit free quota limits |
| Enhanced Package | ¥200 / month | ¥2,048 / year | 4x baseline quota | Daily office workers, medium-volume content creators |
| Premium Enterprise Package | ¥500 / month | ¥5,088 / year | 10x baseline quota | Heavy users, small studio teams |
One-time annual purchase options with slightly higher one-off costs are also available for users rejecting auto-renewal. Basic text chat functions remain permanently free for all logged-in accounts, only advanced multimodal and agent features consume subscription quotas. The subscription quota cannot be migrated to developer API calls, and enterprise developers cannot offset API token costs via consumer subscriptions.
1.2 Core Billing Logic of Doubao Developer API
Doubao API serves B-end developers and commercial software integration scenarios, adopting a hybrid billing mode combining post-pay token metering and pre-paid TPM resource packages. Charging standards differ significantly based on model variant and input context length, with separate settlement for cached input, non-cached input and output tokens. All model series (Doubao Seed 1.6 / Seed 2.0 Pro / Lite / Code) follow segmented pricing based on the total token length of a single request input window:
- Input window ≤ 32,000 tokens: Lowest unit cost for cached and non-cached input
- 32,000 < input window ≤ 128,000 tokens: Double the unit price of the 32K tier
- 128,000 < input window ≤ 256,000 tokens: Quadruple the unit price of the 32K tier
Prompt caching is the core cost-saving mechanism officially recommended by Volcano Engine. When the same long context is reused in multiple consecutive requests, cached input tokens enjoy a discount of up to 80% compared to fresh uncached input. Take Doubao Seed 2.0 Mini as an example for unit price per million tokens (RMB):
| Input Context Tier | Uncached Input | Cached Input | Output Token |
|---|---|---|---|
| ≤32K | 0.2 | 0.04 | 2.0 |
| 32K ~ 128K | 0.4 | 0.08 | 4.0 |
| 128K ~ 256K | 0.8 | 0.16 | 8.0 |
High-end variants such as Seed 2.0 Pro and Doubao Code raise baseline prices proportionally, with Pro’s output token price reaching ¥6 per million tokens under the 32K window tier, while Code for programming tasks is priced slightly higher than Pro for stronger code reasoning capabilities.
2 Pre-Paid TPM Resource Package: Major Discount for High-Concurrency Enterprises
For developers with predictable stable daily traffic, Volcano Engine launches TPM (Tokens Per Minute) locked pre-paid packages, delivering an average 85% discount compared to pure post-pay metering. The core logic of resource package billing is to pre-purchase fixed minute-level token throughput, with all token consumption within the TPM limit deducted from the package without extra per-token fees.
2.1 Package Rules and Restrictions
- Representative specification: 10,000 TPM monthly package priced at ¥2,000, equivalent to ¥0.0046 per thousand tokens after discount
- Validity period: All resource packages expire exactly 12 months after purchase; unused quota cannot be refunded or extended
- Deduction priority: When both resource packages and post-pay balance exist in one account, token consumption first deducts pre-paid TPM quota, excess traffic beyond the TPM threshold switches to post-pay token billing automatically
2.2 Dual Flow Control Thresholds (RPM + TPM)
Regardless of package purchase status, all Doubao API accounts are restricted by two hard throughput limits simultaneously, with traffic interception triggered once either threshold is exceeded:
- Default baseline quota: 10,000 RPM (Requests Per Minute) + 800,000 TPM
- Enterprise customized quota: Large commercial clients can submit official applications to raise RPM/TPM caps after qualification review
Most competing domestic LLM APIs only offer 100K–300K baseline TPM limits, making Doubao’s native high throughput advantage obvious for mass batch processing scenarios such as content generation and document parsing.
3 Additional Service Charges: Search, Multimodal and Knowledge Base Add-Ons
Apart from core text model token fees, Doubao’s extended functional modules implement independent tiered billing separated from general LLM charges, which developers often overlook during cost budgeting:
3.1 Doubao Search API Two-Tier Monthly Package
Search capability is widely embedded in agent workflows, with two fixed monthly call packages:
- Light Package: ¥5.9/month, total 1,000 calls, daily cap of 50 requests (30% of post-pay unit price)
- Explore Package: ¥9.9/month, total 2,000 calls, daily cap of 100 requests (25% of post-pay unit price) Users can upgrade mid-term by paying the price difference calculated based on consumed call volume; packages cannot be stacked simultaneously.
3.2 Multimodal & Embedding Auxiliary Pricing
- Image/video input tokens: Separate audio/image token metering standards higher than plain text input
- Vector embedding models: Fixed ¥0.0005 per thousand tokens for general text embedding, multi-modal text-image embedding slightly higher at ¥0.0007 per thousand tokens
- Knowledge base storage: Hourly storage fees calculated based on vector data volume, independent of model inference costs
4 Horizontal Cost Comparison: Doubao vs Global Mainstream LLM APIs
To quantify Doubao’s cost competitiveness, we compare blended average token costs under identical mixed text/code workloads with GPT-5.5, Claude Sonnet and GLM-5.2:
| Model Variant | Blended Average Cost Per Million Tokens (RMB) | Core Cost Advantage Scenario |
|---|---|---|
| Doubao Seed 1.6 Flash | ~0.16 | Mass lightweight chat, customer service robot |
| Doubao Seed 2.0 Lite | ~0.28 | General daily content generation |
| Doubao Seed 2.0 Pro | ~0.86 | Complex reasoning, multi-agent automation |
| GPT-5.5 Mini | ~0.70 | Global English business scenarios |
| Claude Sonnet 5 | ~6.00 | Ultra-long legal document analysis |
| GLM-5.2 Coding Model | ~0.90 | Enterprise batch code generation |
Doubao’s lightweight Flash and Lite variants maintain an overwhelming cost advantage for high-throughput low-complexity tasks, while its flagship Pro model holds a 23% cost premium over GPT-5.5 Mini but delivers superior Chinese language comprehension and native multimodal support without extra plugin fees. For cross-model A/B testing projects, centralized traffic management via Treerouter unifies cost calculation of Doubao and other open API models on one dashboard to avoid scattered billing statistics across multiple developer consoles.
5 Four Practical Cost Optimization Strategies Based on Tiered Billing Rules
Combining segmented context pricing and cache discount mechanisms, enterprises can cut overall API expenditure by 40%–70% through standardized workflow adjustments:
5.1 Split Long Context to Avoid High-Tier Window Charges
Doubao’s price doubles every time the input window crosses 32K and 128K thresholds. Developers can split complete long documents into multiple segmented requests under 32K tokens, and reuse cached segmented context repeatedly to reduce non-cached high-price input consumption.
5.2 Standardize Prompt Cache Persistence Logic
For product manuals, system prompts and fixed background context reused in thousands of requests, enable persistent cache storage on the gateway side. Cached input tokens reduce unit cost by 80%, the single largest discount channel provided by official rules.
5.3 Match Model Variants to Task Complexity
Route lightweight simple Q&A tasks to Seed 1.6 Flash instead of Pro flagship models; reserve Seed 2.0 Pro only for complex mathematical reasoning, multi-step agent tasks and multimodal generation. Mixed model routing balances cost and response quality.
5.4 Purchase TPM Pre-Paid Packages for Stable High Traffic
Teams with daily token consumption exceeding 50 million should calculate annual usage and subscribe to monthly TPM resource packages, eliminating up to 85% of pure post-pay token fees for long-term cost control.
6 Common Billing Misunderstandings & Troubleshooting
Misunderstanding 1: Consumer subscription quota can offset API token costs
Correction: Two independent account systems with isolated quota pools; professional version subscription only unlocks end-user advanced features, with zero linkage to developer Ark API token settlement.
Misunderstanding 2: TPM resource packages cover all extended services
Correction: Pre-paid TPM only deducts text model inference tokens; search, embedding and video generation add-ons are settled separately without package discounts.
Misunderstanding 3 Unlimited traffic after buying large TPM packages
Correction RPM and TPM dual flow limits remain effective regardless of package specifications; traffic over the threshold triggers request 429 throttling errors even with sufficient package balance.
Misunderstanding 4 Long context window models are universally cost-effective
Correction Ultra-256K context carries quadrupled input/output unit prices. Short single-turn dialogue tasks waste budget on oversized window pricing.
7 Conclusion
Doubao API’s tiered billing system forms a multi-layer cost framework covering end-user subscriptions, post-pay token metering, pre-paid throughput packages and independent auxiliary service fees, with clear cost gradients differentiated by context length and model capability. Its core market competitiveness lies in ultra-low pricing for lightweight high-concurrency workloads and industry-leading native RPM/TPM throughput limits, suitable for domestic Chinese enterprise content creation, customer service robots and document analysis pipelines.
Enterprises with fluctuating mixed LLM calling traffic can deploy unified API orchestration tools such as Treerouter to centralize Doubao and competing model request logs and expenditure data, simplifying monthly financial settlement and workload cost allocation. When planning long-term API budgets, development teams need to match model variants and context window sizes to actual task complexity, leverage prompt caching discounts and pre-paid TPM packages rationally to control overall inference expenditure. Meanwhile, developers must strictly distinguish consumer app subscription quotas from developer API token billing pools to avoid budget miscalculation caused by misaligned quota rules. As ByteDance continues to iterate model variants and adjust tiered pricing tiers, regular cross-checks with Volcano Engine’s official pricing console are recommended for accurate long-term cost forecasting.




