Rate limits & quotas

Plan-tiered request limits by endpoint cost tier, the 429 / Retry-After contract, credit rates, and a recommended retry strategy.

Request limits

API requests are rate-limited per account (every API secret on the same account shares the same buckets) by endpoint cost tier, and the limits scale with your plan. Each cell below is requests per minute, implemented as a token bucket — the per-minute number is also the burst capacity, and it refills continuously at that rate.

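| Cost tier | Free | Creator | Pro | Business | Enterprise* |
| --- | --- | --- | --- | --- | --- |
| Generate | 4 | 10 | 30 | 60 | 120 |
| Write | 30 | 60 | 180 | 360 | 720 |
| Read | 120 | 240 | 720 | 1440 | 2880 |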

* Enterprise defaults shown — custom limits are available; talk to sales.
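
The token-bucket behavior described above can be mirrored on the client to stay under a limit before the server rejects anything. The following is a minimal sketch, not the service's implementation: `TokenBucket`, its methods, and the assumption that the bucket starts full are all hypothetical; only the rule that the per-minute number is both the burst capacity and the refill rate comes from this page.

```python
import time


class TokenBucket:
    """Client-side mirror of a per-minute token bucket (illustrative only)."""

    def __init__(self, per_minute, clock=time.monotonic):
        self.capacity = float(per_minute)        # burst size equals the per-minute limit
        self.refill_per_sec = per_minute / 60.0  # continuous refill at the same rate
        self.tokens = self.capacity              # assumption: the bucket starts full
        self.clock = clock
        self.updated = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now

    def try_acquire(self):
        """Take one token if available; return True on success."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def seconds_until_token(self):
        """Seconds to wait until one whole token is available (0 if ready)."""
        self._refill()
        missing = max(0.0, 1.0 - self.tokens)
        return missing / self.refill_per_sec
```

With the Free-plan Generate limit of 4 requests per minute, such a bucket allows a burst of four calls and then roughly one more every 15 seconds.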

What each cost tier covers:

| Cost tier | Covers | Examples |
| --- | --- | --- |
| Generate | Heavy generation jobs | POST /v1/agent/generate, POST /v1/dynamics/generate, video and book generation |
| Write | Every other POST / PUT / PATCH / DELETE, including TTS synthesis | prompt / context / speak / file uploads, POST /v1/tts |
| Read | GET requests | GET /v1/agent/status/*, voice lists, GET /v2/credit-summaries |

Your column is determined by your subscription; accounts without one get the Free limits. Plan changes reach the limiter within about a minute — no key rotation needed. Check your plan and keys at Developer → API Keys.
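
To budget requests on the client, it can help to map each call to its cost tier and look up the matching limit. The sketch below is an illustration under stated assumptions: `cost_tier`, `limit_for`, and the lowercase plan keys are hypothetical names, and only the two generation routes named on this page are listed (video and book generation paths are not given here, so they are left out). Exempt surfaces such as webhooks and runtime-token routes are not modeled.

```python
# Requests-per-minute limits copied from the request-limit matrix on this page.
LIMITS = {
    "generate": {"free": 4, "creator": 10, "pro": 30, "business": 60, "enterprise": 120},
    "write": {"free": 30, "creator": 60, "pro": 180, "business": 360, "enterprise": 720},
    "read": {"free": 120, "creator": 240, "pro": 720, "business": 1440, "enterprise": 2880},
}

# Only the generation routes named on this page; extend as needed.
GENERATE_ROUTES = {
    ("POST", "/v1/agent/generate"),
    ("POST", "/v1/dynamics/generate"),
}


def cost_tier(method, path):
    """Return 'generate', 'write', or 'read' for an HTTP method and path."""
    method = method.upper()
    if (method, path) in GENERATE_ROUTES:
        return "generate"
    if method == "GET":
        return "read"
    if method in {"POST", "PUT", "PATCH", "DELETE"}:
        return "write"
    raise ValueError(f"no cost tier documented for method {method}")


def limit_for(plan, method, path):
    """Requests per minute for a plan; accounts without a plan get Free limits."""
    return LIMITS[cost_tier(method, path)][(plan or "free").lower()]
```

For example, `limit_for("pro", "GET", "/v1/agent/status/abc")` looks up the Pro column of the Read row.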

Exceeding a bucket returns 429 with a Retry-After header (see Response headers) and the standard error envelope:

{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Too many generate requests for this api-secret. Retry in ~6s.",
    "httpStatus": 429
  },
  "status": "error",
  "status_code": 429
}

Never rate-limited

Two surfaces are deliberately exempt from the request limiter:

  • Webhooks — webhook traffic is never rate-limited, so signed event deliveries and their retries always go through.
  • Live-session heartbeats — the runtime-token routes (/v1/runtime-tokens*, /v1/runtime/*) that keep a live avatar session authenticated and billing. An active session is never cut off with a 429; live usage is bounded by your credit balance and spend caps instead.

Failed authentication

Repeated failed authentication on key-authenticated endpoints is throttled per client IP at 30 failures per minute — once exceeded, further attempts return 429 until the window clears. Requests with a valid secret are never affected by this throttle. Anonymous, self-authenticating endpoints (token mints, /v1/me, CLI login) carry an additional per-IP limit of 120 requests/minute.

Session concurrency

Concurrency is governed by credits and spend caps, not the request limiter — there is no per-plan cap on simultaneous sessions.

| Resource | Limit | Notes |
| --- | --- | --- |
| Cloud avatar sessions | Bounded by credits | Each active session bills per minute; run as many as your balance supports. |
| Agent generation | Queued | Heavy jobs queue and run as capacity frees up. |
| Dynamics generation | Queued | Heavy jobs queue and run as capacity frees up. |

Self-hosted deployments are bounded only by your own hardware.

Credit rates

Live sessions bill per minute by model and host; some operations are one-time.

| Feature | Credits/min |
| --- | --- |
| Voice chat (managed agent, no avatar) | 10 |
| Camera chat (managed agent, camera on) | 30 |
| Essence (cloud) | 2 |
| Essence (self-hosted) | 1 |
| Expression (cloud) | 4 |
| Expression (self-hosted) | 2 |

| One-time operation | Credits |
| --- | --- |
| Agent generation | 250 |
| Dynamics generation | 250 |

Check your balance with GET /v2/credit-summaries — see Billing.
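
The rates above lend themselves to quick budget arithmetic. The sketch below is illustrative only: the dictionary keys and function names are hypothetical, the top-up rate of 100 credits per dollar is the one quoted under "Need more capacity?", and how partial minutes are rounded is not stated on this page, so minutes are multiplied as given.

```python
# Per-minute rates from the credit-rate table (keys are hypothetical labels).
CREDITS_PER_MIN = {
    "voice_chat": 10,
    "camera_chat": 30,
    "essence_cloud": 2,
    "essence_self_hosted": 1,
    "expression_cloud": 4,
    "expression_self_hosted": 2,
}

# One-time operations from the same section.
ONE_TIME_CREDITS = {"agent_generation": 250, "dynamics_generation": 250}

CREDITS_PER_DOLLAR = 100  # top-up rate: $1 = 100 credits


def session_credits(feature, minutes):
    """Credits used by a live session of the given length in minutes."""
    return CREDITS_PER_MIN[feature] * minutes


def minutes_affordable(feature, balance):
    """Minutes of a feature that a credit balance covers."""
    return balance / CREDITS_PER_MIN[feature]


def credits_to_dollars(credits):
    """Dollar value of a number of credits at the top-up rate."""
    return credits / CREDITS_PER_DOLLAR
```

Under these assumptions, a 15-minute Expression cloud session comes to 60 credits ($0.60 at the top-up rate), and one agent generation comes to 250 credits ($2.50).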

Endpoint guidelines

| Endpoint | Guidance |
| --- | --- |
| POST /v1/validate | Lightweight; use for health checks. |
| POST /v1/agent/generate | Heavy; a 2–5 min async operation. |
| GET /v1/agent/status/* | Poll at 5 s intervals; avoid sub-second polling. |
| POST /v1/agent/*/speak | Per active session; the agent must be in a room. |
| POST /v1/files/upload | 10 MB image, 100 MB video; size limits enforced. |
| POST /v1/dynamics/generate | Heavy; triggers video generation. |
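
Where polling is unavoidable, the 5-second guidance for the status endpoint can be wrapped in a small helper. This is a sketch under assumptions: `poll_until` is a hypothetical name, the status response schema is not described on this page (so the caller supplies both the fetch function and the completion check), and the 600-second default timeout is simply a margin over the 2–5 minute generation time quoted above.

```python
import time


def poll_until(fetch_status, is_done, interval=5.0, timeout=600.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() every `interval` seconds until is_done(status) is true.

    fetch_status and is_done are supplied by the caller because the status
    payload format is not specified here. Raises TimeoutError if the status
    has not settled before `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if clock() + interval > deadline:
            raise TimeoutError("status did not settle before the timeout")
        sleep(interval)  # 5 s by default; avoid sub-second polling
```

Each poll counts against the Read bucket, so a longer interval is always safe.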

Handling limits

If you exceed a rate limit or run out of credits, the API returns an error in the standard envelope. For example, an exhausted credit balance returns 402:

{
  "error": {
    "code": "INSUFFICIENT_BALANCE",
    "message": "Insufficient credits",
    "httpStatus": 402
  },
  "status": "error",
  "status_code": 402
}

Common status codes: 402 (no credits), 429 (rate limited), 503 (workers busy). See the full error reference.
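
Because every error shares the same envelope, a client can reduce it to a small decision record. The sketch below assumes the envelope shape shown in the examples on this page; `parse_error` and the action labels are hypothetical names chosen for illustration.

```python
# Suggested client reaction per status code listed above.
# The action labels are hypothetical, not part of the API.
ACTIONS = {
    402: "top_up",             # no credits: retrying will not help
    429: "wait_and_retry",     # rate limited: honor Retry-After
    503: "backoff_and_retry",  # workers busy
}


def parse_error(status_code, body):
    """Summarize an error response; tolerates a missing or malformed envelope."""
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        error = {}
    return {
        "code": error.get("code"),
        "message": error.get("message"),
        "retryable": status_code in (429, 503),
        "action": ACTIONS.get(status_code, "fail"),
    }
```

Treating 402 as non-retryable keeps a retry loop from spinning on an empty balance.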

Response headers

Metered endpoints carry your current rate-limit state, so you can throttle proactively instead of waiting for a 429:

| Header | Meaning |
| --- | --- |
| X-RateLimit-Limit | Your plan’s limit for the cost tier this request uses. |
| X-RateLimit-Remaining | Whole tokens left right now. |
| X-RateLimit-Reset | Unix time when the bucket is fully refilled. |
| Retry-After | (On 429 only) seconds to wait before retrying. |
| X-Request-Id | Correlation ID for the request; include it in support reports. |

Note: The X-RateLimit-* headers appear on metered endpoints. Proxied or streaming endpoints (for example, raw TTS audio from POST /v1/tts) may omit them, since the response body is a passthrough audio stream. Don’t assume every /v1 response includes them; read them defensively.
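
Reading the headers defensively might look like the sketch below. It assumes the values arrive as plain numeric strings, which this page implies but does not state outright; `rate_limit_state` and `proactive_delay` are hypothetical helper names. A plain dict is case-sensitive, whereas the headers object of most HTTP clients is not.

```python
import time


def rate_limit_state(headers, now=None):
    """Parse X-RateLimit-* headers; return None if any is absent or malformed."""
    try:
        limit = int(headers["X-RateLimit-Limit"])
        remaining = int(headers["X-RateLimit-Remaining"])
        reset = float(headers["X-RateLimit-Reset"])
    except (KeyError, TypeError, ValueError):
        return None  # proxied or streaming responses may omit the headers
    now = time.time() if now is None else now
    return {
        "limit": limit,
        "remaining": remaining,
        "seconds_to_full": max(0.0, reset - now),
    }


def proactive_delay(state, floor=1):
    """Seconds to pause before the next call when fewer than `floor` tokens remain."""
    if state is None or state["limit"] <= 0 or state["remaining"] >= floor:
        return 0.0
    # The bucket refills continuously at `limit` per minute,
    # so one token takes 60 / limit seconds to come back.
    return 60.0 / state["limit"]
```

Pausing for one token's worth of refill is a conservative choice; a client that sends bursts may prefer to wait closer to `seconds_to_full`.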

Use exponential backoff with jitter for 429 and 503, honoring Retry-After when present:

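import random
import time

import requests

def api_request_with_retry(url, headers, max_retries=3):
    """POST with retries on 429/503; max_retries is the total number of attempts."""
    attempts = max(1, max_retries)
    for attempt in range(attempts):
        resp = requests.post(url, headers=headers, timeout=30)
        if resp.status_code not in (429, 503) or attempt == attempts - 1:
            return resp  # success, non-retryable error, or attempts exhausted
        try:
            # Honor Retry-After when it is a number of seconds.
            wait = float(resp.headers["Retry-After"])
        except (KeyError, ValueError):
            # Header absent or an HTTP-date: fall back to exponential backoff.
            wait = 2 ** attempt
        time.sleep(wait + random.uniform(0, 1))  # jitter de-synchronizes clients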

Best practices

Use webhooks instead of polling

Rather than polling /v1/agent/status/{id} in a loop, register a webhook and get a signed agent.ready / agent.failed event the moment generation finishes.

Cache agent details

Agent data rarely changes. Cache GET /v1/agent/{code} responses locally and refresh only when needed.
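
One way to cache agent details is a small time-to-live cache wrapped around whatever function performs the GET. This is a sketch, not a prescribed client: `TTLCache` is a hypothetical helper, the 300-second default is an arbitrary assumption, and the fetch function is supplied by the caller.

```python
import time


class TTLCache:
    """Cache fetch(key) results for ttl_seconds (illustrative helper)."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self.fetch = fetch      # e.g. a function that calls GET /v1/agent/{code}
        self.ttl = ttl_seconds  # assumption: tune to how often agents change
        self.clock = clock
        self._entries = {}      # key -> (stored_at, value)

    def get(self, key):
        """Return the cached value, refetching when missing or expired."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = self.fetch(key)
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after updating the agent."""
        self._entries.pop(key, None)
```

Each cache hit is one fewer request against the Read bucket.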

Reuse sessions

Keep avatar sessions alive between conversations instead of creating new ones — session creation is the most expensive operation.

Check credits before heavy operations

Call GET /v2/credit-summaries before agent generation (250 credits) or dynamics generation (250 credits) to avoid requests that fail with 402.

Need more capacity?

Higher plans raise your request limits (see the matrix above) and come with more credits — upgrade (Creator → Pro → Business → Enterprise) on the pricing page, or top up at $1 = 100 credits from the dashboard. For volume, on-prem / air-gapped, or bespoke SLAs beyond Enterprise, talk to sales or reach us via Discord or hello@bithuman.ai.