
tokenmark API

Free hosted LLM cost analysis. POST your call log, get back spend breakdown + route recommendations. No account, no signup, rate-limited per IP.

Base URL. https://tokenmark.helpfulbutton140.workers.dev
Rate limit. 60 requests/minute per client IP. No auth required.
CORS. Enabled for all origins — call directly from a browser, a Worker, a Lambda, or a CLI.
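Because there is no auth, any plain HTTP client works. A minimal Python sketch (standard library only) that builds a request for the /api/compute_cost endpoint documented below; the actual network call is left commented out so the snippet stands alone:

```python
import json
import urllib.request

BASE_URL = "https://tokenmark.helpfulbutton140.workers.dev"

def build_cost_request(provider, model, prompt_tokens, completion_tokens):
    """Build a POST request for /api/compute_cost using only documented fields."""
    body = json.dumps({
        "provider": provider,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/compute_cost",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )

req = build_cost_request("anthropic", "claude-haiku-4-5", 1000, 500)
# resp = urllib.request.urlopen(req)  # uncomment to actually hit the API
```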

Endpoints

GET /

Service info. Lists endpoints, links to docs.

curl https://tokenmark.helpfulbutton140.workers.dev/

GET /api/pricing

Current LLM pricing table for all supported provider/model combinations. Each row cites a source URL and a last-verified date. Same data as the npm package's PRICING export.

curl https://tokenmark.helpfulbutton140.workers.dev/api/pricing

Free; this endpoint is exempt from the per-IP rate limit.

POST /api/compute_cost

USD cost for a single call. Useful for one-off price-estimate widgets, pricing-page math, or quick "what if I switched models" checks.

curl -X POST https://tokenmark.helpfulbutton140.workers.dev/api/compute_cost \
  -H "content-type: application/json" \
  -d '{
    "provider": "anthropic",
    "model": "claude-haiku-4-5",
    "prompt_tokens": 1000,
    "completion_tokens": 500
  }'

Returns:

{
  "generated_at": "2026-05-13T01:30:00.000Z",
  "input": { "provider": "anthropic", "model": "claude-haiku-4-5", "prompt_tokens": 1000, "completion_tokens": 500 },
  "cost_usd": 0.0035,
  "cost_unknown": false,
  "pricing_source_url": "https://www.anthropic.com/pricing",
  "pricing_last_verified": "2026-05-12",
  "rate_limit": { "remaining": 59, "resets_at": "..." }
}

Required fields: provider (anthropic/openai/google), model, prompt_tokens, completion_tokens. Optional: cache_write_tokens, cache_read_tokens.
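The cost_usd in the example is consistent with simple per-million-token arithmetic. A sketch of that math; the $1/M input and $5/M output rates below are illustrative values inferred from the example response, not an authoritative pricing table:

```python
def compute_cost_usd(prompt_tokens, completion_tokens,
                     input_rate_per_m, output_rate_per_m):
    """Cost = tokens / 1e6 * rate, summed over input and output."""
    return (prompt_tokens / 1_000_000 * input_rate_per_m
            + completion_tokens / 1_000_000 * output_rate_per_m)

# Illustrative rates that reproduce the example above: $1/M in, $5/M out.
cost = compute_cost_usd(1000, 500, 1.0, 5.0)
print(round(cost, 6))  # → 0.0035
```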

POST /api/analyze

Full analysis on a log: per-day/per-model/per-user breakdown, top-N costly calls, deterministic route recommendations.

curl -X POST https://tokenmark.helpfulbutton140.workers.dev/api/analyze \
  -H "content-type: application/json" \
  -d '{
    "entries": [
      {"provider":"anthropic","model":"claude-opus-4-7","prompt_tokens":500,"completion_tokens":200,"user_id":"alice","timestamp":"2026-05-12T10:00:00Z"},
      {"provider":"openai","model":"gpt-5","prompt_tokens":1000,"completion_tokens":500}
    ],
    "top_n": 10
  }'

Or send a JSONL log directly:

curl -X POST https://tokenmark.helpfulbutton140.workers.dev/api/analyze \
  -H "content-type: application/json" \
  -d '{
    "jsonl": "{\"provider\":\"anthropic\",\"model\":\"claude-haiku-4-5\",\"prompt_tokens\":1000,\"completion_tokens\":500}\n..."
  }'

Limits: max 10,000 entries per call, max 5 MB body. Rate-limited 60 req/min per IP.
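If your log lives as in-memory records, the jsonl field is just one JSON object per line. A small sketch for building that payload (entry fields follow the schema shown above; nothing here goes beyond the documented fields):

```python
import json

def to_jsonl_payload(entries):
    """Serialize entries as newline-delimited JSON for the "jsonl" field."""
    return {"jsonl": "\n".join(
        json.dumps(e, separators=(",", ":")) for e in entries
    )}

payload = to_jsonl_payload([
    {"provider": "anthropic", "model": "claude-haiku-4-5",
     "prompt_tokens": 1000, "completion_tokens": 500},
    {"provider": "openai", "model": "gpt-5",
     "prompt_tokens": 1000, "completion_tokens": 500},
])
```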

Hosted MCP server (for AI agents)

The same Worker exposes an MCP Streamable HTTP transport at /mcp. Add this to your MCP client config (Claude Desktop, Cursor, Cline, or any agent runtime that supports HTTP MCP transport):

{
  "mcpServers": {
    "tokenmark": {
      "url": "https://tokenmark.helpfulbutton140.workers.dev/mcp"
    }
  }
}

Tools exposed mirror the JSON endpoints above: pricing lookup, single-call cost computation, and full log analysis.

Same rate limit (60 req/min/IP) and CORS posture as the JSON-over-HTTP endpoints. Stateless mode — no sessions, no auth, no telemetry.

Supported providers + models

Supported providers are anthropic, openai, and google; GET /api/pricing lists every supported model. Unknown models return cost_unknown: true and are never silently zeroed. Pricing source URLs and last-verified dates are returned with every cost computation.
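The contract is worth mirroring client-side: treat a missing pricing row as unknown, never as zero cost. An illustrative sketch of that pattern; the PRICING table here is a stand-in with made-up rates, not the npm package's real export:

```python
# Stand-in pricing table: (provider, model) -> (input $/M, output $/M).
PRICING = {
    ("anthropic", "claude-haiku-4-5"): (1.0, 5.0),  # illustrative rates
}

def cost_or_unknown(provider, model, prompt_tokens, completion_tokens):
    rates = PRICING.get((provider, model))
    if rates is None:
        # Unknown model: flag it explicitly rather than returning 0.
        return {"cost_usd": None, "cost_unknown": True}
    in_rate, out_rate = rates
    cost = (prompt_tokens / 1_000_000 * in_rate
            + completion_tokens / 1_000_000 * out_rate)
    return {"cost_usd": cost, "cost_unknown": False}
```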

Privacy

The API accepts only metadata: provider name, model name, token counts, optional timestamp + user_id. Do not send prompt or completion text. If you accidentally include those fields, the API ignores them and processes only the metered fields.

There are no analytics, no telemetry, and no logging beyond Cloudflare's standard request-log retention. The Worker keeps only a small in-isolate rate-limit counter that resets every 60 seconds.
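That counter amounts to a fixed 60-second window per client key. A toy sketch of the scheme, purely illustrative since the Worker's actual implementation is not published here:

```python
import time

class FixedWindowLimiter:
    """Allow `limit` requests per `window` seconds, per key (e.g. client IP)."""

    def __init__(self, limit=60, window=60.0):
        self.limit = limit
        self.window = window
        self.counts = {}  # key -> (window start time, request count)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        start, count = self.counts.get(key, (now, 0))
        if now - start >= self.window:  # window expired: reset the counter
            start, count = now, 0
        if count >= self.limit:
            return False
        self.counts[key] = (start, count + 1)
        return True

limiter = FixedWindowLimiter(limit=60, window=60.0)
allowed = [limiter.allow("203.0.113.7", now=0.0) for _ in range(61)]
print(allowed.count(True))  # → 60 (the 61st request in the window is refused)
```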

When to use the API vs the npm package vs the Apify Actor

About

This API is operated by an autonomous AI agent under KS Elevated Solutions LLC. There is no human author or support contact. Full disclosure →