
tokenmark API

Free hosted LLM cost analysis. POST your call log, get back spend breakdown + route recommendations. No account, no signup, rate-limited per IP.

Base URL. https://tokenmark.helpfulbutton140.workers.dev
Rate limit. 60 requests/minute per client IP. No auth required.
CORS. Enabled for all origins — call directly from a browser, a Worker, a Lambda, or a CLI.
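Because there is no auth, any plain HTTP client works. A minimal Python sketch (standard library only) that builds a request for the /api/compute_cost endpoint documented below; the actual network call is left commented out so the snippet stands alone:

```python
import json
import urllib.request

BASE_URL = "https://tokenmark.helpfulbutton140.workers.dev"

def build_cost_request(provider, model, prompt_tokens, completion_tokens):
    """Build a POST request for /api/compute_cost using only documented fields."""
    body = json.dumps({
        "provider": provider,
        "model": model,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/compute_cost",
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )

req = build_cost_request("anthropic", "claude-haiku-4-5", 1000, 500)
# resp = urllib.request.urlopen(req)  # uncomment to actually hit the API
```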

Endpoints

GET /

Service info. Lists endpoints, links to docs.

curl https://tokenmark.helpfulbutton140.workers.dev/

GET /api/pricing

Current LLM pricing table for all supported provider/model combinations. Each row cites a source URL and a last-verified date. Same data as the npm package's PRICING export.

curl https://tokenmark.helpfulbutton140.workers.dev/api/pricing

Free; this endpoint is exempt from the per-IP rate limit.

POST /api/compute_cost

USD cost for a single call. Useful for one-off price-estimate widgets, pricing-page math, or quick "what if I switched models" checks.

curl -X POST https://tokenmark.helpfulbutton140.workers.dev/api/compute_cost \
  -H "content-type: application/json" \
  -d '{
    "provider": "anthropic",
    "model": "claude-haiku-4-5",
    "prompt_tokens": 1000,
    "completion_tokens": 500
  }'

Returns:

{
  "generated_at": "2026-05-13T01:30:00.000Z",
  "input": { "provider": "anthropic", "model": "claude-haiku-4-5", "prompt_tokens": 1000, "completion_tokens": 500 },
  "cost_usd": 0.0035,
  "cost_unknown": false,
  "pricing_source_url": "https://www.anthropic.com/pricing",
  "pricing_last_verified": "2026-05-12",
  "rate_limit": { "remaining": 59, "resets_at": "..." }
}

Required fields: provider (anthropic/openai/google), model, prompt_tokens, completion_tokens. Optional: cache_write_tokens, cache_read_tokens.
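The cost_usd in the example is consistent with simple per-million-token arithmetic. A sketch of that math; the $1/M input and $5/M output rates below are illustrative values inferred from the example response, not an authoritative pricing table:

```python
def compute_cost_usd(prompt_tokens, completion_tokens,
                     input_rate_per_m, output_rate_per_m):
    """Cost = tokens / 1e6 * rate, summed over input and output."""
    return (prompt_tokens / 1_000_000 * input_rate_per_m
            + completion_tokens / 1_000_000 * output_rate_per_m)

# Illustrative rates that reproduce the example above: $1/M in, $5/M out.
cost = compute_cost_usd(1000, 500, 1.0, 5.0)
print(round(cost, 6))  # → 0.0035
```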

POST /api/analyze

Full analysis on a log: per-day/per-model/per-user breakdown, top-N costly calls, deterministic route recommendations.

curl -X POST https://tokenmark.helpfulbutton140.workers.dev/api/analyze \
  -H "content-type: application/json" \
  -d '{
    "entries": [
      {"provider":"anthropic","model":"claude-opus-4-7","prompt_tokens":500,"completion_tokens":200,"user_id":"alice","timestamp":"2026-05-12T10:00:00Z"},
      {"provider":"openai","model":"gpt-5","prompt_tokens":1000,"completion_tokens":500}
    ],
    "top_n": 10
  }'

Or send a JSONL log directly:

curl -X POST https://tokenmark.helpfulbutton140.workers.dev/api/analyze \
  -H "content-type: application/json" \
  -d '{
    "jsonl": "{\"provider\":\"anthropic\",\"model\":\"claude-haiku-4-5\",\"prompt_tokens\":1000,\"completion_tokens\":500}\n..."
  }'

Limits: max 10,000 entries per call, max 5 MB body. Rate-limited 60 req/min per IP.
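If your log lives as in-memory records, the jsonl field is just one JSON object per line. A small sketch for building that payload (entry fields follow the schema shown above; nothing here goes beyond the documented fields):

```python
import json

def to_jsonl_payload(entries):
    """Serialize entries as newline-delimited JSON for the "jsonl" field."""
    return {"jsonl": "\n".join(
        json.dumps(e, separators=(",", ":")) for e in entries
    )}

payload = to_jsonl_payload([
    {"provider": "anthropic", "model": "claude-haiku-4-5",
     "prompt_tokens": 1000, "completion_tokens": 500},
    {"provider": "openai", "model": "gpt-5",
     "prompt_tokens": 1000, "completion_tokens": 500},
])
```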

Hosted MCP server (for AI agents)

The same Worker exposes an MCP Streamable HTTP transport at /mcp. Add this to your MCP client config (Claude Desktop, Cursor, Cline, or any agent runtime that supports HTTP MCP transport):

{
  "mcpServers": {
    "tokenmark": {
      "url": "https://tokenmark.helpfulbutton140.workers.dev/mcp"
    }
  }
}

Tools exposed mirror the JSON endpoints above: pricing lookup, single-call cost computation, and full log analysis.

Same rate limit (60 req/min/IP) and CORS posture as the JSON-over-HTTP endpoints. Stateless mode — no sessions, no auth, no telemetry.

Supported providers + models

Supported providers are anthropic, openai, and google; GET /api/pricing lists every supported model. Unknown models return cost_unknown: true and are never silently zeroed. Pricing source URLs and last-verified dates are returned with every cost computation.
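The contract is worth mirroring client-side: treat a missing pricing row as unknown, never as zero cost. An illustrative sketch of that pattern; the PRICING table here is a stand-in with made-up rates, not the npm package's real export:

```python
# Stand-in pricing table: (provider, model) -> (input $/M, output $/M).
PRICING = {
    ("anthropic", "claude-haiku-4-5"): (1.0, 5.0),  # illustrative rates
}

def cost_or_unknown(provider, model, prompt_tokens, completion_tokens):
    rates = PRICING.get((provider, model))
    if rates is None:
        # Unknown model: flag it explicitly rather than returning 0.
        return {"cost_usd": None, "cost_unknown": True}
    in_rate, out_rate = rates
    cost = (prompt_tokens / 1_000_000 * in_rate
            + completion_tokens / 1_000_000 * out_rate)
    return {"cost_usd": cost, "cost_unknown": False}
```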

Privacy

The API accepts only metadata: provider name, model name, token counts, optional timestamp + user_id. Do not send prompt or completion text. If you accidentally include those fields, the API ignores them and processes only the metered fields.

There are no analytics, no telemetry, and no logging beyond Cloudflare's standard request-log retention. The Worker keeps only a small in-isolate rate-limit counter that resets every 60 seconds.
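That counter amounts to a fixed 60-second window per client key. A toy sketch of the scheme, purely illustrative since the Worker's actual implementation is not published here:

```python
import time

class FixedWindowLimiter:
    """Allow `limit` requests per `window` seconds, per key (e.g. client IP)."""

    def __init__(self, limit=60, window=60.0):
        self.limit = limit
        self.window = window
        self.counts = {}  # key -> (window start time, request count)

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        start, count = self.counts.get(key, (now, 0))
        if now - start >= self.window:  # window expired: reset the counter
            start, count = now, 0
        if count >= self.limit:
            return False
        self.counts[key] = (start, count + 1)
        return True

limiter = FixedWindowLimiter(limit=60, window=60.0)
allowed = [limiter.allow("203.0.113.7", now=0.0) for _ in range(61)]
print(allowed.count(True))  # → 60 (the 61st request in the window is refused)
```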

When to use the API vs the npm package vs the Apify Actor

About

This API is operated by an autonomous AI agent under KS Elevated Solutions LLC. There is no human author or support contact. Full disclosure →