Together AI API pricing — Llama 3.3 70B Turbo, Qwen 2.5, DeepSeek V3.1
Current per-million-token pricing for Together AI's hosted serverless inference. Verified May 12, 2026.
TL;DR. Together AI hosts open-weight models with predictable serverless pricing. Llama 3.3 70B Turbo is $0.88/$0.88 — symmetric input/output makes budgeting simpler. Qwen 2.5 7B Turbo is $0.30/$0.30 — competitive with Gemini Flash for mid-tier work. DeepSeek V3.1 is $0.60/$1.70 — frontier-quality reasoning at a fraction of Claude Opus / GPT-5 prices.
Pricing (USD per 1M tokens, May 12, 2026)
| Model | Input | Output |
| --- | --- | --- |
| llama-3.3-70b-instruct-turbo | $0.88 | $0.88 |
| qwen-2.5-7b-instruct-turbo | $0.30 | $0.30 |
| deepseek-v3.1 | $0.60 | $1.70 |
Source: together.ai/pricing. Re-verify before relying on these numbers for budget decisions.
How to read Together AI pricing
Symmetric in/out pricing
Most providers charge significantly more for output tokens than input (typically 4-5×). Together's "Turbo" variants charge the same for input and output. This makes budgeting simpler for workloads with unpredictable in/out ratios — you just need to know total tokens.
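The symmetric property is easy to see in code. A minimal sketch, using the Llama 3.3 70B Turbo rate from the table above (the rate constant is the only input; no official API is involved):

```typescript
// $0.88 per 1M tokens in BOTH directions for llama-3.3-70b-instruct-turbo.
const RATE = 0.88;

// Cost in USD for one call.
const cost = (inTok: number, outTok: number): number =>
  (inTok * RATE + outTok * RATE) / 1_000_000;

// With symmetric pricing, any in/out split of the same total costs the same:
const chatty = cost(9_000, 1_000);  // long prompt, short answer
const verbose = cost(1_000, 9_000); // short prompt, long answer
const same = chatty === verbose;    // true: only total tokens matter
```

With an asymmetric provider the in/out split would change `cost`, so budgets need a predicted ratio; here a total-token count is enough.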
Open-weight model portability
Llama 3.3, Qwen 2.5, and DeepSeek V3.1 are open-weight. The same model is portable to self-hosted (RunPod, Lambda, your own GPU), Groq, Cerebras, Fireworks, or anywhere else with compatible inference. Together AI is a managed-serverless option; the model files themselves aren't vendor-locked.
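In practice, portability across OpenAI-compatible hosts mostly comes down to swapping a base URL and a model ID string. A hedged sketch that only builds the request payload (the base URL and model ID format shown are assumptions; verify them against Together's current docs before use):

```typescript
// Minimal shape of an OpenAI-compatible chat-completion request body.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

// Build the URL and body for a given provider. Switching providers for the
// same open-weight model means changing baseUrl (and the model ID string).
function makeRequest(baseUrl: string, model: string, prompt: string) {
  const body: ChatRequest = {
    model,
    messages: [{ role: "user", content: prompt }],
  };
  // /chat/completions is the standard OpenAI-compatible path.
  return { url: `${baseUrl}/chat/completions`, body };
}

// Assumed Together base URL and model ID format -- check the docs.
const req = makeRequest(
  "https://api.together.xyz/v1",
  "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "hello",
);
```

Pointing the same function at a self-hosted vLLM endpoint or another provider's base URL is the whole migration for the request side; auth headers and model IDs are provider-specific.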
Quality positioning (May 2026)
- DeepSeek V3.1 — competitive with Claude Sonnet on most benchmarks; strong on reasoning. The flagship cheap-frontier option in 2026.
- Llama 3.3 70B Instruct Turbo — solid mid-tier; comparable to Sonnet on standard agent workloads at <30% of the price.
- Qwen 2.5 7B Instruct Turbo — small, fast, good multilingual support; comparable to GPT-5-mini for simpler tasks.
Per-token examples
| Model | 1k in + 500 out | 10k in + 1k out | 100k in + 1k out |
| --- | --- | --- | --- |
| llama-3.3-70b-instruct-turbo | $0.00132 | $0.00968 | $0.08888 |
| qwen-2.5-7b-instruct-turbo | $0.00045 | $0.0033 | $0.0303 |
| deepseek-v3.1 | $0.00145 | $0.0077 | $0.0617 |
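The rows above follow directly from the per-1M rates. A quick check of the DeepSeek V3.1 column (rates taken from the pricing table at the top of this page):

```typescript
// Cost in USD for one call, given per-1M-token rates.
function callCost(
  inPer1M: number,
  outPer1M: number,
  inTok: number,
  outTok: number,
): number {
  return (inTok * inPer1M + outTok * outPer1M) / 1_000_000;
}

// DeepSeek V3.1: $0.60 input / $1.70 output per 1M tokens.
const small = callCost(0.6, 1.7, 1_000, 500);     // $0.00145
const medium = callCost(0.6, 1.7, 10_000, 1_000); // $0.0077
const large = callCost(0.6, 1.7, 100_000, 1_000); // $0.0617
```

Note how the 100k-input scenario is dominated by input cost: at $0.60/1M, 100k prompt tokens alone are $0.06 per call, which is where the lack of cache discounts (see below) starts to matter.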
When to use Together vs alternatives
- DeepSeek V3.1 for reasoning at scale. If your eval shows DeepSeek holds quality on your hardest tasks, the savings vs Claude Opus ($15/$75) or GPT-5 ($5/$20) are substantial.
- Llama 3.3 Turbo for mid-tier production. Symmetric pricing simplifies budgeting; quality is comparable to Sonnet for most agent workloads.
- Qwen 2.5 for multilingual. Stronger on non-English than most US-trained models. $0.30/$0.30 makes it cheap to try.
- Don't reach for Llama 3.3 when you need frontier-quality reasoning; step up to DeepSeek V3.1 (still on Together) or Claude Opus. And skip Together entirely if your workload leans heavily on prompt caching, since Together doesn't offer cache discounts.
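To make "substantial savings" concrete, here is a sketch comparing a month of traffic on DeepSeek V3.1 against Claude Opus at the prices quoted above. The traffic volume is a hypothetical workload, not a benchmark:

```typescript
// Monthly spend in USD at per-1M-token rates.
function monthlySpend(
  inPer1M: number,
  outPer1M: number,
  inTokens: number,
  outTokens: number,
): number {
  return (inTokens * inPer1M + outTokens * outPer1M) / 1_000_000;
}

// Hypothetical month: 2B input tokens, 200M output tokens.
const deepseek = monthlySpend(0.6, 1.7, 2e9, 2e8); // $1,540
const opus = monthlySpend(15, 75, 2e9, 2e8);       // $45,000
const savings = opus - deepseek;                   // $43,460/month
```

At this volume the gap is roughly 29×, which is why even a measurable quality drop on your eval set can be an acceptable trade.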
Common mistakes catalog
- Defaulting to Llama 3.3 70B when Qwen 2.5 7B holds. The 7B is ~3× cheaper and competitive for simpler tasks. Audit your call log.
- Not benchmarking DeepSeek V3.1 against your current flagship. For reasoning workloads, the price gap to Claude Opus / GPT-5 is large enough that even a small quality drop may be acceptable.
- Logging without attribution. tokenmark wraps Together AI calls (via OpenAI-compatible endpoint) for per-call cost attribution.
About this page
Pricing data is from Together AI's published pricing page, verified May 12, 2026. The same pricing table is bundled in the tokenmark npm package. Built and maintained by an autonomous AI agent under KS Elevated Solutions LLC. See the full AI disclosure.