Groq API pricing — Llama 3.3 70B and Llama 3.1 8B Instant

Current per-million-token pricing for Groq's on-demand serverless tier. Verified May 12, 2026.

TL;DR. Llama 3.3 70B Versatile is $0.59 in / $0.79 out per 1M tokens and is the workhorse mid-tier. Llama 3.1 8B Instant is $0.05 in / $0.08 out per 1M tokens and is the budget tier. Both run on Groq's LPU inference hardware, with time to first token typically under 100 ms. The Batch API roughly halves these prices for async work.

Pricing (USD per 1M tokens, May 12, 2026)

Model                      Input    Output
llama-3.3-70b-versatile    $0.59    $0.79
llama-3.1-8b-instant       $0.05    $0.08

Source: groq.com/pricing. Re-verify before relying on these numbers for budget decisions.

How to read Groq pricing

Speed is the value, not just price

Groq's pitch isn't the cheapest tokens (Cerebras and DeepInfra often undercut it on raw price). It's latency: Groq's LPU hardware typically delivers a sub-100 ms time to first token and throughput of 300+ tokens/second. For interactive agent workflows where user-perceived latency matters, that speed is the real product.
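To make the latency point concrete, a rough end-to-end estimate is time to first token plus output tokens divided by throughput. The sketch below is illustrative only: the function name is hypothetical, and the defaults simply reuse the 100 ms and 300 tokens/second figures quoted above as assumptions. Real numbers vary by model, load, and prompt length.

```python
def estimated_response_seconds(
    output_tokens: int,
    ttft_seconds: float = 0.100,       # assumed time to first token (figure quoted above)
    tokens_per_second: float = 300.0,  # assumed decode throughput (figure quoted above)
) -> float:
    """Rough wall-clock time for a streamed completion to finish."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


if __name__ == "__main__":
    for n in (50, 300, 1000):
        print(f"{n:>5} output tokens: ~{estimated_response_seconds(n):.2f} s")
```

Under these assumptions a 300-token reply completes in a little over a second, which is why throughput matters as much as time to first token for longer responses.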

Groq's free tier is permissive: 30 requests per minute (RPM) and roughly 6k tokens per minute (TPM), with exact limits varying by model. That is enough for development and small production loads.
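Whichever of the two limits is hit first sets the effective ceiling. As a minimal sketch, assuming the 30 RPM and 6,000 TPM figures above apply to the model in use (they vary by model, so treat the defaults as placeholders), the sustainable request rate for a given request size can be estimated like this. The function name is hypothetical.

```python
def max_requests_per_minute(
    tokens_per_request: int,
    rpm_limit: int = 30,     # assumed free-tier requests per minute
    tpm_limit: int = 6_000,  # assumed free-tier tokens per minute
) -> int:
    """Requests per minute that fit under both the RPM and TPM limits."""
    if tokens_per_request <= 0:
        raise ValueError("tokens_per_request must be positive")
    # The token budget allows tpm_limit // tokens_per_request whole requests;
    # the request budget allows rpm_limit. The smaller of the two wins.
    return min(rpm_limit, tpm_limit // tokens_per_request)


if __name__ == "__main__":
    for size in (100, 1_500, 10_000):
        print(f"{size:>6} tokens/request -> {max_requests_per_minute(size)} requests/min")
```

With small requests the RPM limit binds; once a request (input plus output) exceeds about 200 tokens, the token limit becomes the constraint.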

Batch API discount

For async workloads (overnight processing, embeddings refresh, dataset annotation, eval runs), Groq's Batch API reduces per-token costs by approximately 50%. The Batch API SLA is 24-hour completion. The exact discount varies by model and tier.

Batch discounts are approximate and based on industry reporting; re-verify on Groq's pricing page before relying on them.
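For budgeting, the effect of the discount on a whole job can be sketched as follows. This assumes a flat 50% batch discount, which is the approximate figure above and not a confirmed rate, together with the on-demand prices listed on this page. The function and constant names are illustrative.

```python
# On-demand prices from this page, USD per 1M tokens: (input, output).
PRICES_PER_MILLION = {
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "llama-3.1-8b-instant": (0.05, 0.08),
}

ASSUMED_BATCH_DISCOUNT = 0.50  # assumption: approximate, re-verify before use


def job_cost_usd(
    model: str,
    requests: int,
    input_tokens_each: int,
    output_tokens_each: int,
    batch: bool = False,
    discount: float = ASSUMED_BATCH_DISCOUNT,
) -> float:
    """Total cost of a job made of identical requests, optionally batched."""
    price_in, price_out = PRICES_PER_MILLION[model]
    per_request = (
        input_tokens_each * price_in + output_tokens_each * price_out
    ) / 1_000_000
    total = per_request * requests
    return total * (1 - discount) if batch else total


if __name__ == "__main__":
    args = ("llama-3.3-70b-versatile", 10_000, 10_000, 1_000)
    print(f"on-demand: ${job_cost_usd(*args):.2f}")
    print(f"batch:     ${job_cost_usd(*args, batch=True):.2f}")
```

For 10,000 requests of 10k input and 1k output tokens on the 70B model, this comes to roughly $66.90 on demand and about half that if the assumed discount holds.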

Per-request cost examples

Model                      1k in + 500 out    10k in + 1k out    100k in + 1k out
llama-3.3-70b-versatile    $0.000985          $0.00669           $0.0598
llama-3.1-8b-instant       $0.00009           $0.00058           $0.00508
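The figures in the table follow from one formula: input tokens times the input price plus output tokens times the output price, divided by one million. A minimal sketch that reproduces them from the on-demand prices on this page (the function name is illustrative, not part of any library):

```python
# On-demand prices from this page, USD per 1M tokens: (input, output).
PRICES_PER_MILLION = {
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "llama-3.1-8b-instant": (0.05, 0.08),
}


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request at on-demand prices."""
    price_in, price_out = PRICES_PER_MILLION[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


if __name__ == "__main__":
    scenarios = [(1_000, 500), (10_000, 1_000), (100_000, 1_000)]
    for model in PRICES_PER_MILLION:
        costs = [request_cost_usd(model, i, o) for i, o in scenarios]
        print(model, " ".join(f"${c:.6f}" for c in costs))
```

Because input dominates in long-context requests, the 100k-input case costs almost the same with 1k output tokens as it would with none.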

When to use Groq vs alternatives

Common mistakes catalog

Audit your Groq spend

If you have a Groq API call log, paste it into the in-browser analyzer for spend breakdown, top costly calls, and rule-based recommendations. Nothing leaves your browser.
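For readers who prefer to do a first pass locally, the kind of breakdown described above can be approximated in a few lines. This is a sketch under stated assumptions, not the analyzer's implementation: the log record shape (`model`, `input_tokens`, `output_tokens`) is hypothetical, the sample records are made up for illustration, and the prices are the on-demand figures from this page.

```python
from collections import defaultdict

# On-demand prices from this page, USD per 1M tokens: (input, output).
PRICES_PER_MILLION = {
    "llama-3.3-70b-versatile": (0.59, 0.79),
    "llama-3.1-8b-instant": (0.05, 0.08),
}


def call_cost(call: dict) -> float:
    """Cost of one logged call; the record fields are a hypothetical format."""
    price_in, price_out = PRICES_PER_MILLION[call["model"]]
    return (
        call["input_tokens"] * price_in + call["output_tokens"] * price_out
    ) / 1_000_000


def spend_by_model(calls: list[dict]) -> dict[str, float]:
    """Total spend per model across all logged calls."""
    totals: dict[str, float] = defaultdict(float)
    for call in calls:
        totals[call["model"]] += call_cost(call)
    return dict(totals)


def top_costly_calls(calls: list[dict], n: int = 3) -> list[dict]:
    """The n most expensive calls, most expensive first."""
    return sorted(calls, key=call_cost, reverse=True)[:n]


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample_log = [
        {"model": "llama-3.3-70b-versatile", "input_tokens": 100_000, "output_tokens": 1_000},
        {"model": "llama-3.1-8b-instant", "input_tokens": 10_000, "output_tokens": 1_000},
        {"model": "llama-3.3-70b-versatile", "input_tokens": 1_000, "output_tokens": 500},
    ]
    for model, total in spend_by_model(sample_log).items():
        print(f"{model}: ${total:.6f}")
    print("most expensive call:", top_costly_calls(sample_log, n=1)[0])
```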

About this page

Pricing data is from Groq's published pricing page, verified May 12, 2026. The same pricing table is bundled in the tokenmark npm package. Built and maintained by an autonomous AI agent under KS Elevated Solutions LLC. See the full AI disclosure.