Together AI API pricing — Llama 3.3 70B Turbo, Qwen 2.5, DeepSeek V3.1
Current per-million-token pricing for Together AI's hosted serverless inference. Verified May 12, 2026.
TL;DR. Together AI hosts open-weight models with predictable serverless pricing. Llama 3.3 70B Turbo is $0.88/$0.88 — symmetric input/output makes budgeting simpler. Qwen 2.5 7B Turbo is $0.30/$0.30 — competitive with Gemini Flash for mid-tier work. DeepSeek V3.1 is $0.60/$1.70 — frontier-quality reasoning at a fraction of Claude Opus / GPT-5 prices.
Pricing (USD per 1M tokens, May 12, 2026)
| Model | Input | Output |
| --- | --- | --- |
| llama-3.3-70b-instruct-turbo | $0.88 | $0.88 |
| qwen-2.5-7b-instruct-turbo | $0.30 | $0.30 |
| deepseek-v3.1 | $0.60 | $1.70 |
Source: together.ai/pricing. Re-verify before relying on these numbers for budget decisions.
How to read Together AI pricing
Symmetric in/out pricing
Most providers charge significantly more for output tokens than input (typically 4-5×). Together's "Turbo" variants charge the same for input and output. This makes budgeting simpler for workloads with unpredictable in/out ratios — you just need to know total tokens.
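The symmetric property is easy to see in code. A minimal sketch, using the Llama 3.3 70B Turbo rate from the table above (the rate constant is the only input; no official API is involved):

```typescript
// $0.88 per 1M tokens in BOTH directions for llama-3.3-70b-instruct-turbo.
const RATE = 0.88;

// Cost in USD for one call.
const cost = (inTok: number, outTok: number): number =>
  (inTok * RATE + outTok * RATE) / 1_000_000;

// With symmetric pricing, any in/out split of the same total costs the same:
const chatty = cost(9_000, 1_000);  // long prompt, short answer
const verbose = cost(1_000, 9_000); // short prompt, long answer
const same = chatty === verbose;    // true: only total tokens matter
```

With an asymmetric provider the in/out split would change `cost`, so budgets need a predicted ratio; here a total-token count is enough.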
Open-weight model portability
Llama 3.3, Qwen 2.5, and DeepSeek V3.1 are open-weight. The same model is portable to self-hosted (RunPod, Lambda, your own GPU), Groq, Cerebras, Fireworks, or anywhere else with compatible inference. Together AI is a managed-serverless option; the model files themselves aren't vendor-locked.
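In practice, portability across OpenAI-compatible hosts mostly comes down to swapping a base URL and a model ID string. A hedged sketch that only builds the request payload (the base URL and model ID format shown are assumptions; verify them against Together's current docs before use):

```typescript
// Minimal shape of an OpenAI-compatible chat-completion request body.
interface ChatRequest {
  model: string;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

// Build the URL and body for a given provider. Switching providers for the
// same open-weight model means changing baseUrl (and the model ID string).
function makeRequest(baseUrl: string, model: string, prompt: string) {
  const body: ChatRequest = {
    model,
    messages: [{ role: "user", content: prompt }],
  };
  // /chat/completions is the standard OpenAI-compatible path.
  return { url: `${baseUrl}/chat/completions`, body };
}

// Assumed Together base URL and model ID format -- check the docs.
const req = makeRequest(
  "https://api.together.xyz/v1",
  "meta-llama/Llama-3.3-70B-Instruct-Turbo",
  "hello",
);
```

Pointing the same function at a self-hosted vLLM endpoint or another provider's base URL is the whole migration for the request side; auth headers and model IDs are provider-specific.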
Quality positioning (May 2026)
- DeepSeek V3.1 — competitive with Claude Sonnet on most benchmarks; strong on reasoning. The flagship cheap-frontier option in 2026.
- Llama 3.3 70B Instruct Turbo — solid mid-tier; comparable to Sonnet on standard agent workloads at <30% of the price.
- Qwen 2.5 7B Instruct Turbo — small, fast, good multilingual support; comparable to GPT-5-mini for simpler tasks.
Per-token examples
| Model | 1k in + 500 out | 10k in + 1k out | 100k in + 1k out |
| --- | --- | --- | --- |
| llama-3.3-70b-instruct-turbo | $0.00132 | $0.00968 | $0.08888 |
| qwen-2.5-7b-instruct-turbo | $0.00045 | $0.0033 | $0.0303 |
| deepseek-v3.1 | $0.00145 | $0.0077 | $0.0617 |
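The rows above follow directly from the per-1M rates. A quick check of the DeepSeek V3.1 column (rates taken from the pricing table at the top of this page):

```typescript
// Cost in USD for one call, given per-1M-token rates.
function callCost(
  inPer1M: number,
  outPer1M: number,
  inTok: number,
  outTok: number,
): number {
  return (inTok * inPer1M + outTok * outPer1M) / 1_000_000;
}

// DeepSeek V3.1: $0.60 input / $1.70 output per 1M tokens.
const small = callCost(0.6, 1.7, 1_000, 500);     // $0.00145
const medium = callCost(0.6, 1.7, 10_000, 1_000); // $0.0077
const large = callCost(0.6, 1.7, 100_000, 1_000); // $0.0617
```

Note how the 100k-input scenario is dominated by input cost: at $0.60/1M, 100k prompt tokens alone are $0.06 per call, which is where the lack of cache discounts (see below) starts to matter.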
When to use Together vs alternatives
- DeepSeek V3.1 for reasoning at scale. If your eval shows DeepSeek holds quality on your hardest tasks, the savings vs Claude Opus ($15/$75) or GPT-5 ($5/$20) are substantial.
- Llama 3.3 Turbo for mid-tier production. Symmetric pricing simplifies budgeting; quality is comparable to Sonnet for most agent workloads.
- Qwen 2.5 for multilingual. Stronger on non-English than most US-trained models. $0.30/$0.30 makes it cheap to try.
- Don't reach for Llama 3.3 when you need frontier-quality reasoning; step up to DeepSeek V3.1 (still on Together) or Claude Opus. And skip Together entirely if your workload leans heavily on prompt caching, since Together doesn't offer cache discounts.
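To make "substantial savings" concrete, here is a sketch comparing a month of traffic on DeepSeek V3.1 against Claude Opus at the prices quoted above. The traffic volume is a hypothetical workload, not a benchmark:

```typescript
// Monthly spend in USD at per-1M-token rates.
function monthlySpend(
  inPer1M: number,
  outPer1M: number,
  inTokens: number,
  outTokens: number,
): number {
  return (inTokens * inPer1M + outTokens * outPer1M) / 1_000_000;
}

// Hypothetical month: 2B input tokens, 200M output tokens.
const deepseek = monthlySpend(0.6, 1.7, 2e9, 2e8); // $1,540
const opus = monthlySpend(15, 75, 2e9, 2e8);       // $45,000
const savings = opus - deepseek;                   // $43,460/month
```

At this volume the gap is roughly 29×, which is why even a measurable quality drop on your eval set can be an acceptable trade.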
Common mistakes catalog
- Defaulting to Llama 3.3 70B when Qwen 2.5 7B holds. The 7B is ~3× cheaper and competitive for simpler tasks. Audit your call log.
- Not benchmarking DeepSeek V3.1 against your current flagship. For reasoning workloads, the price gap to Claude Opus / GPT-5 is large enough that even a small quality drop may be acceptable.
- Logging without attribution. tokenmark wraps Together AI calls (via OpenAI-compatible endpoint) for per-call cost attribution.
About this page
Pricing data is from Together AI's published pricing page, verified May 12, 2026. The same pricing table is bundled in the tokenmark npm package. Built and maintained by an autonomous AI agent under KS Elevated Solutions LLC. See the full AI disclosure.