If you’re running experiments under pressure, the hardest part isn’t ideas. It’s bet sizing: deciding how big a bet to place based on expected value, and how much traffic to risk.
I size most of my bets with revenue per session (RPS) because it forces a clean link between an on-site change and dollars. For bet sizing, conversion rate alone can lie to you: it can move up while revenue stays flat, or worse, drops.
This is my practical way to do experiment bet sizing when time, traffic, and patience are all limited.
Start with revenue per session, not “conversion rate vibes”

RPS is simple: RPS = total revenue ÷ total sessions. It’s not perfect, but it’s harder to fool. In CRO work, I like it because it naturally includes both conversion and order value.
That matters when your experiment changes mix. For example, a “Free shipping” message can raise conversion but attract lower-intent buyers, dragging down average order value. RPS catches that trade-off.
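A quick worked example, with made-up numbers, shows how that trade nets out:

```python
# Hypothetical figures: conversion rises, AOV falls, and RPS nets the two out.
baseline_rps = 0.020 * 100.00  # 2.0% conversion x $100 AOV = $2.00 per session
variant_rps  = 0.024 * 80.00   # 2.4% conversion x $80 AOV  = $1.92 per session

print(f"{variant_rps - baseline_rps:+.2f}")  # -0.08: a conversion "win" that loses money
```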
Before I commit traffic, I anchor on three baselines:
- Sitewide RPS (directional, good for exec context)
- Page or funnel-step RPS (where the change happens)
- Segment RPS (new vs returning, paid vs organic, geo, device)
This is where analytics hygiene pays for itself. If your revenue is delayed (subscriptions, trials, invoices), you can still use a proxy RPS (like expected LTV per session), but you must keep the proxy stable for the test window.
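Here is a minimal sketch of pulling those three baselines from session-level data. The column names (revenue, page, channel) are placeholders for whatever your analytics export actually emits:

```python
import pandas as pd

# Illustrative session-level export; map the columns to your own schema.
sessions = pd.DataFrame({
    "revenue": [0.0, 49.0, 0.0, 0.0, 120.0, 0.0],
    "page":    ["pricing", "checkout", "pricing", "home", "checkout", "home"],
    "channel": ["paid", "organic", "paid", "organic", "paid", "organic"],
})

# Sitewide RPS: total revenue / total sessions (the mean of per-session revenue)
print(f"Sitewide RPS: ${sessions['revenue'].mean():.2f}")

# Page-level and segment-level RPS: mean revenue per session within each group
print(sessions.groupby("page")["revenue"].mean())
print(sessions.groupby("channel")["revenue"].mean())
```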
Two common failure modes show up here:
First, attribution noise. If paid spend shifts mid-test, RPS moves even if your variant did nothing. I try to hold acquisition steady, or at least report RPS by channel.
Second, “local wins” that lose globally. A checkout tweak might lift checkout RPS but increase refunds or support costs later. If that’s your world, don’t ignore it. Add a guardrail metric.
If you can’t explain what drives RPS on your core flow, you’re not ready to run high-stakes tests. You’ll be guessing with numbers.
If you’re building a repeatable testing engine, I also log RPS outcomes the same way every time. It sounds boring, but it improves decision-making fast. A searchable history keeps you from re-learning the same lesson twice (I like tools that help organize A/B test library work so the context doesn’t disappear).
The bet sizing math I actually use (and why it works)

Here’s the core idea: I don’t “bet” on uplift. I bet on expected incremental revenue, capped by downside.
I size an experiment like this (a minimal code sketch follows the list):
- Pick the exposure: how many sessions will see the variant (sessions_exposed).
- Estimate ΔRPS: your expected change in RPS if the variant is better.
- Compute expected value: expected $ = sessions_exposed × ΔRPS.
- Apply a confidence factor (0 to 1): how likely is the lift, given evidence quality?
- Cap by downside risk: the worst-case loss if you’re wrong (including opportunity cost).
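Here is that formula as a sketch. The cap rule, taking the smaller of the confidence-weighted bet and the tolerable worst-case loss, is my assumption about how to operationalize “capped by downside”; the function and parameter names are illustrative:

```python
def bet_size(sessions_exposed: int, delta_rps: float,
             confidence: float, worst_case_loss: float) -> float:
    """Expected incremental dollars, scaled by confidence, capped by downside."""
    expected_dollars = sessions_exposed * delta_rps
    bet = expected_dollars * confidence
    # Never stake more than the worst-case loss you can tolerate.
    return min(bet, worst_case_loss)

# The checkout scenario from the table below, with an assumed $10k downside cap:
print(bet_size(sessions_exposed=120_000, delta_rps=0.12,
               confidence=0.5, worst_case_loss=10_000))  # 7200.0
```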
The confidence factor is where honest teams separate from performative teams. A high-confidence bet usually means you have one or more of these: prior test history, strong behavioral science rationale, clean instrumentation, and a change that’s easy to reverse.
To make the tradeoffs concrete, within whatever traffic and testing budget you actually have, I’ll lay out three common scenarios. Assume baseline RPS is $2.50.
| Scenario | Sessions exposed | Expected ΔRPS | Expected incremental $ | Confidence factor | “Bet” (expected $ × confidence) |
|---|---|---|---|---|---|
| Low-risk copy tweak on pricing page | 80,000 | $0.05 | $4,000 | 0.7 | $2,800 |
| Checkout friction removal (bigger surface area) | 120,000 | $0.12 | $14,400 | 0.5 | $7,200 |
| New paywall design (high variance) | 200,000 | $0.20 | $40,000 | 0.25 | $10,000 |
Takeaway: I’ll often allocate more traffic to the checkout test than to the flashier paywall test, even though the paywall’s raw expected dollars are larger. At 0.25 confidence, the paywall risks the most baseline revenue for the least certainty, and protecting the baseline matters.
Also, don’t skip feasibility. If you can’t run long enough to resolve a meaningful ΔRPS, your bet sizing is fantasy. Use a real sample size check (I keep a calculator handy, like this A/B test sample size calculator, because underpowered tests waste time and create arguments).
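If you want to sanity-check the arithmetic yourself, here is a rough sketch, assuming a two-sided z-test on per-session revenue. The $15 standard deviation is an assumption; RPS distributions are mostly zeros, so sigma usually dwarfs the mean:

```python
from scipy.stats import norm

def sessions_per_arm(delta_rps: float, sigma: float,
                     alpha: float = 0.05, power: float = 0.80) -> int:
    """Rough per-arm sample size to detect delta_rps with a two-sided z-test."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return int(2 * ((z_alpha + z_beta) * sigma / delta_rps) ** 2)

# Detecting a $0.12 ΔRPS with an assumed $15 per-session standard deviation:
print(sessions_per_arm(delta_rps=0.12, sigma=15.0))  # ~245,000 sessions per arm
```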
Where RPS bet sizing breaks, and how I handle it with CRO, behavioral science, and AI
RPS is a blunt instrument, so I use it with guardrails.
When you should ignore RPS (or at least distrust it)
I don’t trust short-window RPS when conditions shift dramatically:
- Revenue is delayed (trial to paid, sales-assisted, invoiced later).
- Refunds and chargebacks are meaningful.
- The experiment shifts buyer mix (for example, a promo that attracts bargain hunters).
- Seasonality or campaigns create big week-to-week swings.
In those cases, I still start with RPS, but I add a second view: contribution margin per session, qualified pipeline per session, or activated users per session (for product-led growth). For startup growth, the right metric is the one you can defend in a board room and a post-mortem.
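Contribution margin per session is a small extension of the RPS computation. The cost columns below are assumptions about what your finance data exposes:

```python
import pandas as pd

# Hypothetical session-level data with cost columns; adapt to your own schema.
sessions = pd.DataFrame({
    "revenue": [0.0, 49.0, 120.0, 0.0],
    "cogs":    [0.0, 21.0, 55.0, 0.0],
    "refunds": [0.0, 0.0, 120.0, 0.0],  # a full refund wipes out the big order
})

margin_per_session = (
    (sessions["revenue"] - sessions["cogs"] - sessions["refunds"]).sum()
    / len(sessions)
)
# Same traffic, very different stories: positive RPS, negative margin.
print(f"RPS: ${sessions['revenue'].mean():.2f}, "
      f"margin/session: ${margin_per_session:.2f}")
```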
How behavioral science changes my “confidence factor”
Most CRO wins come from basic behavioral economics. People avoid losses, follow defaults, and procrastinate. So when I see a hypothesis tied to a known behavioral mechanism, I raise confidence.
Examples that often deserve a higher factor:
- Reducing hidden costs (loss aversion).
- Making the default path safe (default bias).
- Removing steps and uncertainty (friction and ambiguity).
On the other hand, “make it more modern” gets a low factor, even if everyone likes the mock; there’s no mechanism behind it.
Applied AI helps, but it doesn’t get a vote
I’ll use AI to speed up analysis, not to bless a risky change. Practically, that means:
- auto-clustering session replays to surface “stuck points”
- mining support tickets to spot the top objections
- forecasting RPS variance so I don’t fool myself with early noise (a minimal sketch follows this list)
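You don’t strictly need AI for the variance part; a plain bootstrap does the job. This sketch uses simulated per-session revenues, so the numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(42)
# Simulated per-session revenue: mostly zeros, occasional orders.
revenues = rng.choice([0.0, 49.0, 120.0], size=5_000, p=[0.96, 0.03, 0.01])

# Bootstrap the sampling distribution of RPS to see how wide early reads are.
boot = np.array([
    rng.choice(revenues, size=len(revenues), replace=True).mean()
    for _ in range(2_000)
])
low, high = np.percentile(boot, [2.5, 97.5])
print(f"RPS ${revenues.mean():.2f}, 95% CI [${low:.2f}, ${high:.2f}]")
```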
AI can also suggest follow-up experiments after a win, which matters because compounding small wins is a real growth strategy. Still, I treat recommendations as inputs, not answers, and blend them with human judgment (tools that provide AI test iteration recommendations can save planning time, but I keep ownership of the bet).
A/B testing is a decision tool, not a truth machine. Your job is to control risk while buying information.
Short actionable takeaway (use this tomorrow)
Pick one experiment in your backlog and write this on a single line:
Bet = sessions_exposed × expected ΔRPS × confidence factor, capped by worst-case downside.
If you can’t fill in the numbers without hand-waving, the test isn’t ready; don’t oversize a speculative bet.
Conclusion
Experimentation only scales when you can price risk in plain dollars. RPS gives you that common language, even when attribution is messy.
Use bet sizing for experiments to match traffic allocation to expected value, not internal excitement. Keep your confidence factor honest, and cap every bet with a downside you can live with.
If you’re staring at three “important” tests this week, size each one with an RPS-adjusted bet, pick the largest, then run it clean.