
The complete guide to monetising your AI app in 2026

Building an AI feature into your app is the easy part. You pick a provider, write a few API calls, and ship it. The hard part comes a few weeks later when you realise that one power user ran an automated script overnight and burned through your entire monthly OpenAI budget before you woke up.

Most developers solve this the wrong way. They add a hard limit, block users who hit it, and move on. The result is predictable: blocked users churn. They do not email you asking how to pay more. They just leave and find something else.

The better approach is to treat a limit as a conversion opportunity rather than a dead end. This post covers what the full monetisation stack for an AI app actually looks like, why it is more complex than it first appears, and how to implement it without spending weeks on billing infrastructure.

The problem with per-user AI costs

When you build a consumer AI app, you pay the API bills centrally. Every user in your app draws from the same OpenAI or Anthropic account. This is fine when you have ten users. It becomes a serious problem when you have a thousand, because usage patterns across users are wildly uneven.

The top 5% of users in a typical AI app account for 40 to 60% of total API spend. These are not bad actors most of the time. They are your most engaged users. They are also the ones most likely to pay if you give them a way to.

The mistake most developers make is treating this as a cost control problem. It is actually a revenue problem. The question is not how to stop heavy users from costing you money. It is how to convert them into paying customers at the exact moment they are most engaged with your product.

What the full monetisation stack looks like

A complete monetisation layer for an AI app has several distinct components that need to work together.

Per-user cost tracking

Before you can limit or monetise anything, you need to know what each user is actually costing you. This means tracking AI spend per user in real time, not just total spend on your provider dashboard. You need to know that user A cost you $0.40 this month and user B cost you $4.20.

The natural approach is to log this to your database after each API call. The problem is latency. A synchronous database write before or after every AI call can add 100 to 400ms in a serverless environment, and users notice.

The right architecture uses Redis counters. Redis reads typically complete in about a millisecond, and the counter update can run off the request path. You store a running spend total per user per month and check it before each AI call. For streaming responses, you log after the stream completes so you never block the user experience.
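As a rough sketch of the counter pattern, the snippet below keeps one running total per user per calendar month. The InMemoryCounters class is a hypothetical stand-in that only mimics Redis INCRBY and GET in memory, and the key format and function names are assumptions made for illustration. With a real Redis client the same calls would be awaited, and INCRBY being atomic is what keeps concurrent requests from overwriting each other's totals.

```javascript
// Hypothetical stand-in for a Redis client. INCRBY and GET are real Redis
// commands; this class only mimics their behaviour in memory.
class InMemoryCounters {
  constructor() {
    this.values = new Map()
  }
  incrBy(key, amount) {
    const next = (this.values.get(key) ?? 0) + amount
    this.values.set(key, next)
    return next
  }
  get(key) {
    return this.values.get(key) ?? 0
  }
}

// One counter per user per calendar month (UTC), e.g. "spend:user_a:2026-03".
// The key format is an assumption, not a required convention.
function monthKey(userId, now) {
  return `spend:${userId}:${now.toISOString().slice(0, 7)}`
}

// Store integer micro-dollars so repeated additions do not accumulate
// floating-point error.
function toMicros(usd) {
  return Math.round(usd * 1_000_000)
}

// Call after the AI response (or stream) has completed.
function recordSpend(store, userId, costUsd, now = new Date()) {
  return store.incrBy(monthKey(userId, now), toMicros(costUsd))
}

// Call before the AI request.
function isOverLimit(store, userId, limitUsd, now = new Date()) {
  return store.get(monthKey(userId, now)) >= toMicros(limitUsd)
}
```

Because the month is part of the key, the monthly reset needs no cleanup job: a new month simply starts from a fresh counter.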

Configurable limits per tier

Once you are tracking spend per user, you need to enforce limits. This sounds simple but there are several dimensions to consider.

Monthly limits are the most common. A free user gets $0.50 of AI spend per month. A paid user gets $10. When they hit the limit, they are blocked until the next month or until they buy more.

Daily limits are useful for preventing single-session abuse. A user who runs an automated script for six hours should hit a daily limit long before they exhaust their monthly allowance.

Weekly limits sit between the two and are useful for smoothing out usage patterns.

The limits need to be configurable per tier without code changes. You should be able to update a free tier from $0.50 to $1.00 and have it take effect immediately for all free users without a deployment.
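One way to picture this is a plain configuration object that the limit check reads at request time. The sketch below is illustrative only: the monthly figures come from the examples above, while the daily and weekly figures, the tierLimits shape, and the exceededPeriod function are assumptions. In a real app the object would be loaded from a database or config service rather than hard-coded, which is what lets a change apply without a deployment.

```javascript
// Hypothetical tier configuration, in USD of AI spend per period.
// Daily and weekly values are made up for illustration.
const tierLimits = {
  free: { daily: 0.1, weekly: 0.25, monthly: 0.5 },
  paid: { daily: 2, weekly: 5, monthly: 10 },
}

// spend holds one user's running totals: { daily, weekly, monthly }.
// Returns the first period whose limit has been reached, or null if the
// user is still within every limit.
function exceededPeriod(tier, spend, limits = tierLimits) {
  const tierConfig = limits[tier]
  if (!tierConfig) throw new Error(`Unknown tier: ${tier}`)
  for (const period of ['daily', 'weekly', 'monthly']) {
    if (spend[period] >= tierConfig[period]) return period
  }
  return null
}
```

Passing the limits in as an argument keeps the check itself free of pricing decisions, so raising the free tier is a data change rather than a code change.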

Credits

Blocking users when they hit their limit and making them wait until next month is the worst possible outcome. Most of those users would pay to continue right now. Credits solve this.

The model is straightforward. A user buys a credit pack, say a $5 pack that unlocks $3 of additional AI spend. When they hit their tier limit, credits kick in automatically. When credits run out, they are blocked.

The complexity comes in how you price and display credits. You never want end users to see raw AI spend figures. If a user paid $5 for a credit pack, they should see $5 depleting as they use the app, not $3 of AI spend. The display balance needs to be proportional to what they paid, not to what it costs you.
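The arithmetic is easier to see in code. The sketch below assumes each pack is stored with what the user paid and the AI spend it unlocks, both in integer cents, and that packs are consumed oldest first. The field names and the oldest-first rule are assumptions for illustration, not a description of any particular product.

```javascript
// A pack looks like { paidCents: 500, grantedCents: 300, usedCents: 0 }.
// grantedCents is assumed to be greater than zero.

// Consume AI spend from the oldest pack first. Returns the amount that the
// credits could not cover (0 when they were sufficient).
function consumeCredits(packs, spendCents) {
  let remaining = spendCents
  for (const pack of packs) {
    const available = pack.grantedCents - pack.usedCents
    const take = Math.min(available, remaining)
    pack.usedCents += take
    remaining -= take
    if (remaining === 0) break
  }
  return remaining
}

// Balance shown to the user: each pack's unused share, scaled to the price
// the user paid for that pack.
function displayBalanceCents(packs) {
  let total = 0
  for (const pack of packs) {
    const unusedFraction = (pack.grantedCents - pack.usedCents) / pack.grantedCents
    total += pack.paidCents * unusedFraction
  }
  return Math.round(total)
}
```

With the $5 pack that unlocks $3 of spend, using $1.50 of AI spend leaves half the pack, so the user sees $2.50. Tracking each pack separately keeps the display correct when top-ups were bought at different price ratios.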

You also need to decide whether credits bypass daily and weekly limits or only monthly limits. The answer depends on your use case. If daily limits are about abuse prevention, they should be hard caps. If they are about managing your own costs, credits should bypass them.
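That choice can be expressed as a small policy value rather than scattered conditionals. In the sketch below, the policy shape, the creditsBypass field, and the decide function are all hypothetical names. The exceeded argument is the limit period that was hit ('daily', 'weekly', or 'monthly'), or null when no limit was reached.

```javascript
// policy.creditsBypass lists the limit periods that credits may override.
function decide(exceeded, creditBalanceCents, policy) {
  if (exceeded === null) return { allowed: true, source: 'tier' }
  const canUseCredits =
    policy.creditsBypass.includes(exceeded) && creditBalanceCents > 0
  if (canUseCredits) return { allowed: true, source: 'credits' }
  return { allowed: false, blockedBy: exceeded }
}

// Daily limits as a hard cap for abuse prevention.
const abusePrevention = { creditsBypass: ['weekly', 'monthly'] }

// Limits as pure cost management: credits override everything.
const costManagement = { creditsBypass: ['daily', 'weekly', 'monthly'] }
```

Under the first policy a user with credits is still stopped by the daily limit, while under the second the same request goes through and is charged to credits.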

Upgrade prompts

This is the part most developers skip entirely, and it is where most of the revenue opportunity lives.

When a user hits their limit, you have a narrow window where they are maximally motivated to pay. They just tried to do something and were told they cannot. If you show them a clear path to continuing right now, a meaningful percentage will take it.

A good upgrade prompt shows the user exactly where they are, what their options are (credit top-up, subscription upgrade, or both), and makes it trivially easy to complete the transaction. A bad upgrade prompt is a 402 error with a link to your pricing page.

The prompt should appear in context, not navigate the user away from what they were doing. An iframe overlay that loads over the current page is the cleanest pattern. The user sees their usage, picks an option, pays through Stripe, and continues without losing their place in your app.
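On the parent page, the overlay needs a way to hear from the iframe when the user closes the prompt or finishes paying. The browser mechanism for this is postMessage. The sketch below is one possible handler: the message types 'upgrade:dismissed' and 'upgrade:completed' and the function name are assumptions, while event.origin and event.data are standard properties of the browser's message event.

```javascript
// Builds a listener for messages posted by the upgrade prompt iframe.
// Assumed message shape: { type: 'upgrade:dismissed' } or
// { type: 'upgrade:completed' }.
function createUpgradeMessageHandler({ promptOrigin, onDismiss, onComplete }) {
  return function handleMessage(event) {
    // Ignore messages from any origin other than the hosted prompt.
    if (event.origin !== promptOrigin) return false
    const type = event.data && event.data.type
    if (type === 'upgrade:dismissed') {
      onDismiss()
      return true
    }
    if (type === 'upgrade:completed') {
      onComplete()
      return true
    }
    return false
  }
}

// In the browser, register it once when the overlay opens:
// window.addEventListener('message', createUpgradeMessageHandler({ ... }))
```

The origin check matters because any window can post a message to your page, so the handler should only act on messages from the origin that serves the prompt.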

Conversion analytics

Once you have all of the above working, you need to know what is working and what is not. That means tracking the full funnel from limit hit to conversion.

How many times was the upgrade prompt shown this month? How many users clicked through to checkout? How many completed the purchase? What was the average order value? Which users are most profitable after accounting for their AI spend?

This data tells you whether your credit pricing makes sense, whether your upgrade prompts are converting well, and which users you should be trying to move to higher tiers.
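As an illustration, the funnel questions above reduce to a few counts and ratios over an event log. The event names and fields in this sketch are hypothetical. It assumes you record one event each time a prompt is shown, a checkout is opened, and a purchase completes.

```javascript
// events: [{ type, userId, amountCents? }] where type is one of
// 'prompt_shown', 'checkout_clicked', 'purchase_completed' (assumed names).
function summariseFunnel(events) {
  const count = (type) => events.filter((e) => e.type === type).length
  const shown = count('prompt_shown')
  const clicked = count('checkout_clicked')
  const purchases = events.filter((e) => e.type === 'purchase_completed')
  const revenueCents = purchases.reduce((sum, e) => sum + e.amountCents, 0)
  return {
    shown,
    clicked,
    purchased: purchases.length,
    clickRate: shown ? clicked / shown : 0,
    conversionRate: shown ? purchases.length / shown : 0,
    averageOrderCents: purchases.length ? revenueCents / purchases.length : 0,
  }
}

// Profit for one user: what they paid minus what their AI usage cost you.
function profitCents(revenueCents, aiSpendCents) {
  return revenueCents - aiSpendCents
}
```

For example, four prompts shown, two checkout clicks, and one $5 purchase give a 50% click rate, a 25% conversion rate, and a $5 average order value.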

Why building this yourself takes longer than you think

Most developers underestimate how much work the full stack involves. The individual pieces are not complicated but they have to work together correctly and handle edge cases that only become obvious in production.

The Redis architecture for fast spend tracking needs to handle concurrent requests without race conditions. The streaming case for OpenAI and Anthropic requires different approaches for extracting token counts from the response. The Stripe webhook for credit purchases needs to be idempotent. The proportional credit display calculation needs to handle multiple top-ups correctly. The upgrade prompt iframe needs to communicate back to the parent page via postMessage for dismissal.
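To make one of these edge cases concrete, here is a minimal sketch of an idempotent handler for credit purchases. Stripe can deliver the same event more than once, so the handler records each event id and skips repeats. The event fields used (id, type, data.object, amount_total, metadata) are part of Stripe's Checkout Session events. The userId metadata key, the in-memory Set and Map, and the function name are assumptions, and signature verification is left out for brevity.

```javascript
// processedEventIds stands in for a database table with a unique constraint
// on the Stripe event id. creditLedger maps userId to cents paid for credits.
function handleStripeEvent(event, processedEventIds, creditLedger) {
  if (event.type !== 'checkout.session.completed') return 'ignored'

  // A redelivered event must not grant credits a second time.
  if (processedEventIds.has(event.id)) return 'duplicate'

  const session = event.data.object
  // Assumed: userId was attached as metadata when the session was created.
  const userId = session.metadata.userId
  const current = creditLedger.get(userId) ?? 0
  creditLedger.set(userId, current + session.amount_total)

  processedEventIds.add(event.id)
  return 'credited'
}
```

In a real database, recording the event id and granting the credits would belong in a single transaction, so that a crash between the two steps cannot leave them out of sync.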

None of this is rocket science. It is just a lot of careful plumbing that takes several weeks to get right, and it has nothing to do with the core product you are actually trying to build.

What Nasca handles

Nasca is an SDK that implements the full stack described above. You wrap your AI function once, define your tiers and limits in a dashboard, connect your Stripe account, and Nasca handles the tracking, blocking, upgrade prompts, and checkout.

const nasca = new Nasca({
  accountId: process.env.NASCA_ACCOUNT_ID,
  workerUrl: process.env.NASCA_WORKER_URL,
  apiKey: process.env.NASCA_API_KEY,
  getUserId: (ctx) => ctx.user.id,
  getUserTier: (ctx) => ctx.user.plan,
  successUrl: 'https://yourapp.com/billing/success',
  cancelUrl: 'https://yourapp.com/billing',
})

const callAI = nasca.wrap(
  openai.chat.completions.create.bind(openai.chat.completions)
)

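try {
  const result = await callAI({ model: 'gpt-4o', messages: [...] }, ctx)
} catch (e) {
  if (e instanceof NascaBlockedError) {
    // The user hit a limit: send back the upgrade details
    return res.status(402).json({
      message: e.upgrade_message,
      upgradeUrl: e.checkout_url,
    })
  }
  // Any other error is a genuine failure, so let it propagate
  throw e
}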

When a user hits their limit, Nasca creates a hosted upgrade prompt at a URL you redirect them to. The prompt shows their current usage, their available credit packs and subscription options, and a Stripe checkout. Payments go directly to your Stripe account. Nasca takes 2% of each conversion and charges nothing while you are still growing.

The intercept check runs in a Cloudflare Worker and reads from Upstash Redis, adding under 50ms to each AI call. Usage logging happens after the response is returned so it never blocks the user.

You can display usage to your end users using the getUsage() SDK method, which returns their current spend as a percentage of their limit and their credit balance in terms of what they paid rather than what it costs you.

const usage = await nasca.getUsage(ctx)
// usage.monthly_percent — how much of their limit they have used
// usage.credit_balance_display — their credit balance in dollars paid
// usage.resets_at — when their monthly limit resets

Getting started

Nasca is free for your first 100 users, with no credit card required. The full integration takes about ten minutes. nasca.dev has the docs and a step-by-step setup guide.