Proxy-based · 2-line setup · OpenAI & Anthropic

Set the limit.
Unleash your agents.

BurnLimit sits between your code and the LLM API.
See exactly where your tokens go — by agent, by prompt, by model.
Get alerted before costs spiral. Not after the bill arrives.

47 developers already on the list · No spam · Shipping Q3 2026

// The problem

Developers describe it the same way.

r/AI_Agents 1.2k upvotes

"My agent racked up $15 in under 10 minutes before I caught it. The comments were full of developers sharing similar stories."

r/ClaudeCode mega-thread

"4 hours of usage gone in 3 prompts. Used plan mode to refactor a frontend. Worst part is I just re-subscribed. Used 11% of my weekly credits."

Indie Hackers 71 comments

"I was not willing to give them my credit card details 😂 — curious how token usage maps to dollar spend but I can never see it clearly."

Hacker News viral thread

"A startup I consulted for burned $12,000/month on AI agents. They had no visibility into which part of the system caused it."

r/LocalLLaMA top post

"I went local specifically because cloud API costs are a black box. The bill arrives and you have no idea which prompt is expensive."

dev.to article

"AI API spend has become one of the fastest-growing line items — and unlike cloud compute, it often stays invisible until the bill arrives."

How devs describe it: "burned through" · "invisible until the bill" · "silently spiraling" · "babysitting" · "black box" · "woke up to a $300 bill" · "no idea which prompt"

// How it works

2-line setup.
Full visibility.

01
Run the proxy
BurnLimit runs locally or in your cloud. One command. Intercepts every LLM call transparently, with negligible latency overhead.
$ npx burnlimit start
✓ Proxy active → :4242
→ OpenAI, Anthropic ready
02
Point your client
Change one line in your code. Your calls still reach OpenAI/Anthropic unchanged. BurnLimit just sees everything.
from openai import OpenAI

client = OpenAI(
  api_key="sk-...",
  base_url="http://localhost:4242/v1"
)
03
See everything
Cost per call, per agent, per model. Loop detection. Quality drift alerts. Budget thresholds. All in a single dashboard.
✓ gpt-4o · $0.0031 · 1.2k tok
⚠ claude-opus · $0.042 · HIGH
● loop detected · agent: main
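The per-call dollar figures in the dashboard reduce to simple arithmetic over token counts. A minimal sketch of that accounting, using placeholder per-million-token rates (illustrative assumptions, not live pricing):

```python
# Per-call cost accounting, sketched. Rates below are placeholders;
# the proxy would look up current published pricing per model.
PRICES_PER_MTOK = {  # USD per 1M tokens (assumed for illustration)
    "gpt-4o": {"input": 2.50, "output": 10.00},
    "claude-opus": {"input": 15.00, "output": 75.00},
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one LLM call from its token counts."""
    p = PRICES_PER_MTOK[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The same call routed to two models: one cheap, one several times pricier.
print(call_cost("gpt-4o", 1_000, 150))       # small
print(call_cost("claude-opus", 1_000, 150))  # several times larger
```

Attributing each cost to the agent and prompt that produced it is just a matter of tagging the intercepted request before summing.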

// What you get

Everything you need.
Nothing you don't.

$
Cost per call
See the exact dollar cost of every LLM call, calculated in real time using current model pricing. No estimation.
OpenAI + Anthropic
Loop detection
Automatically flags agents that call the same prompt repeatedly. The #1 cause of runaway API bills.
Saves $$$
Budget alerts
Set thresholds per call, per session, per day. Get alerted the moment you're about to cross them — not after.
Configurable
Quality drift
Tracks output length and patterns over time. Alerts when model responses start degrading — a sign of context bloat.
Unique to BurnLimit
Model breakdown
See which model is costing the most. Often a single Opus call buried in a pipeline is 10× more expensive than everything else.
Per-model
Zero data sent to us
The proxy runs on your machine or your cloud. Your prompts never touch BurnLimit servers. Full privacy by design.
Open source core
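The loop-detection idea above is simpler than it sounds. A minimal sketch, hashing each prompt and flagging repeats inside a sliding window (the window size and threshold here are illustrative, not BurnLimit's actual defaults):

```python
import hashlib
from collections import Counter, deque

class LoopDetector:
    """Flag an agent that keeps sending the same prompt."""

    def __init__(self, window: int = 20, threshold: int = 5):
        self.window = deque(maxlen=window)  # recent prompt hashes
        self.threshold = threshold          # repeats that count as a loop

    def observe(self, prompt: str) -> bool:
        """Record one prompt; return True when a loop is suspected."""
        digest = hashlib.sha256(prompt.encode()).hexdigest()
        self.window.append(digest)
        return Counter(self.window)[digest] >= self.threshold

detector = LoopDetector()
stuck = any(detector.observe("retry the same tool call") for _ in range(5))
print(stuck)  # True: five identical prompts landed inside the window
```

Hashing keeps memory constant regardless of prompt size, and the sliding window means a prompt legitimately reused hours apart never trips the alert.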

// Pricing

Indie pricing.
Not enterprise.

Indie
$0
forever · self-hosted
  • Full proxy + dashboard
  • OpenAI + Anthropic
  • Local SQLite storage
  • Loop detection
  • No cloud dashboard
  • No team features
Get early access
Team
$29
per month · up to 5 devs
  • Everything in Indie
  • 5 team seats
  • Per-developer breakdown
  • Webhook integrations
  • 1-year log retention
  • Priority support
Join waitlist

// Join the waitlist

Finally see where your tokens go.

47 solo developers already signed up. We're building in public and shipping Q3 2026. Early users get the Team plan free for 3 months.

No spam. Unsubscribe anytime. Built by a solo dev, for solo devs.

You're on the list! We'll reach out before launch.