How to Write an Experiment Pre-Registration Doc That Stops P-Hacking in Growth Teams

Ever had an A/B test that “won” on Friday and “lost” by Tuesday? That swing can be real variance, but it’s often also a sign the team is touching the dials mid-flight. When goals are aggressive and dashboards update in real time, it’s easy to chase a green number.

An experiment pre-registration doc fixes that by doing one simple thing: it forces you to write down your intent before you see the outcome. Think of it like sealing your analysis plan in an envelope before you open the results.

What p-hacking looks like in growth teams (and why it happens)

Common p-hacking traps versus a locked pre-registration plan, created with AI.

P-hacking in growth work rarely looks like fraud. It looks like “being agile.” Common patterns:

  • Metric switching: You planned to judge on activation rate, but retention moved, so retention becomes the headline.
  • Optional stopping: The test is called early when it looks good, or extended when it doesn’t.
  • Repeated peeks: You check results daily and stop the moment p < 0.05.
  • Post-hoc segments: “It didn’t work overall, but it worked for mobile users in Canada.”
  • Removing ‘bad’ data: Excluding outliers, refunds, or “weird days” after seeing they hurt the result.

These behaviors are so common that many teams barely notice them anymore. If you want a practical, growth-focused breakdown, Jason Cohen’s write-up on p-hacking your A/B tests is a good mirror to hold up to your process.

What an experiment pre-registration doc is (for A/B tests)

Pre-registration is popular in academic research, but it maps cleanly to product, marketing, and lifecycle tests. You write down:

  • what you’re changing
  • what “success” means
  • how long you’ll run
  • what analysis you’ll use
  • what you will not change after launch

If you want a canonical reference, Open Science Framework’s overview of registrations and preregistrations is a solid starting point.

This is also aligned with the American Statistical Association’s guidance on not treating p-values like a magic pass or fail button. The ASA statement is short and worth bookmarking: ASA statement on p-values (PDF).

The doc sections that block the usual p-hacking moves

An example of a structured pre-registration document layout, created with AI.

A good pre-reg doc is short, but it’s opinionated. These fields do most of the work.

1) Primary metric + decision rule (stops metric switching)

Write one primary metric, one definition, one decision rule.

Example: “Primary metric = activation within 24 hours. Ship only if effect is positive and statistically significant at alpha 0.05, and guardrails pass.”

Also list secondary metrics, but label them as supporting evidence, not the thing you will use to declare victory.
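
If your tooling doesn’t enforce the rule, it’s easy to encode. Here’s a minimal Python sketch, assuming a standard pooled two-proportion z-test; the function name, counts, and guardrail flag are illustrative, not from any particular platform:

```python
# Minimal sketch of a pre-registered decision rule: ship only if the
# effect is positive, significant at alpha, and guardrails pass.
# Illustrative names and numbers; assumes a pooled two-proportion z-test.
from math import sqrt
from scipy.stats import norm

def decide(ctrl_conv, ctrl_n, treat_conv, treat_n, guardrails_pass, alpha=0.05):
    p1, p2 = ctrl_conv / ctrl_n, treat_conv / treat_n
    pooled = (ctrl_conv + treat_conv) / (ctrl_n + treat_n)
    se = sqrt(pooled * (1 - pooled) * (1 / ctrl_n + 1 / treat_n))
    z = (p2 - p1) / se
    p_value = 2 * norm.sf(abs(z))  # two-sided p-value
    ship = (p2 > p1) and (p_value < alpha) and guardrails_pass
    return {"uplift": p2 - p1, "p_value": round(p_value, 4), "ship": ship}

# Positive uplift, but p ≈ 0.09: the pre-registered rule says don't ship.
print(decide(180, 1000, 210, 1000, guardrails_pass=True))
```

The point isn’t the stats. It’s that the rule is written down, and executable, before anyone sees a dashboard.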

2) Fixed run length + stopping rule (stops optional stopping and peeking)

Pre-commit to either:

  • Fixed horizon: "Run for 14 full days, evaluate once at the end."
  • Sequential testing (allowed peeks): "Evaluate at day 7 and day 14 with alpha spending." You don’t need heavy math in the doc; just state the method. Two readable intros are Understanding Group Sequential Testing and Error Spending in Sequential Testing Explained.

Key point: if peeking is allowed, it must be structured. If it’s not structured, it’s p-hacking with better charts.
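
To make “structured” concrete, here’s a minimal sketch of an alpha-spending schedule for the two looks above. It uses an O’Brien-Fleming-type spending function (one common choice, not the only one) and only shows how much alpha each look may spend; turning spent alpha into exact critical values needs the joint distribution of the test statistics, which dedicated tools like the R package gsDesign handle numerically.

```python
# Minimal sketch of structured peeking: an O'Brien-Fleming-type alpha
# spending schedule for two looks (day 7 ≈ half the data, day 14 = all).
# This only shows how much alpha each look may spend; real tools derive
# the matching critical values numerically.
from math import sqrt
from scipy.stats import norm

ALPHA = 0.05

def obf_spent(t, alpha=ALPHA):
    """Cumulative two-sided alpha spent at information fraction t."""
    return 2 * norm.sf(norm.ppf(1 - alpha / 2) / sqrt(t))

for t in (0.5, 1.0):
    print(f"info fraction {t:.1f}: cumulative alpha spent ≈ {obf_spent(t):.4f}")
# info fraction 0.5: cumulative alpha spent ≈ 0.0056
# info fraction 1.0: cumulative alpha spent ≈ 0.0500
```

Notice how little alpha the early look gets to spend. That’s the price of peeking, paid up front instead of hidden in the readout.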

3) Population, unit, and bucketing (stops “we changed who counts”)

Lock:

  • Unit of randomization (user, account, session)
  • Eligibility window (new signups only, last 30 days)
  • Exposure definition (what counts as “saw treatment”)
  • One user, one bucket rule (no cross-device reassignment, if possible)

This prevents redefining the denominator after the fact.
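
Deterministic hashing is the usual way to honor “one user, one bucket.” A minimal sketch, assuming a per-experiment salt (names are illustrative):

```python
# Minimal sketch of "one user, one bucket": salt the hash with the
# experiment name so assignments are stable within a test and
# independent across tests. Names are illustrative.
import hashlib

def assign_bucket(user_id: str, experiment: str, n_buckets: int = 2) -> int:
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_buckets  # 0 = control, 1 = treatment

# Same user, same experiment -> same bucket, every time it's called.
print(assign_bucket("user_42", "onboarding_email_1"))
```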

4) Data exclusions and quality rules (stops removing ‘bad’ data)

Write exclusions before launch. Keep them narrow and operational.

Good: bot-traffic filters, internal users, known tracking outages with timestamps, a duplicate-accounts rule.

Risky: “Remove extreme spenders,” “remove angry users,” or “remove days where conversion was weird.”

If you must exclude anything subjective, require an amendment and a separate “exploratory” result.
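
One way to make exclusions operational is to write them as a closed list of predicates that ships with the pre-reg doc. A minimal sketch, with illustrative field names and rules:

```python
# Minimal sketch of exclusions as a closed, reviewable list of
# predicates committed before launch. Field names are illustrative.
EXCLUSION_RULES = {
    "internal_user": lambda r: r["email"].endswith("@yourcompany.com"),
    "bot_signup":    lambda r: r["is_bot"],
    # Outage windows get appended via the amendment log, with timestamps.
}

def include_in_analysis(row: dict) -> bool:
    """A row is analyzed unless a pre-registered rule excludes it."""
    return not any(rule(row) for rule in EXCLUSION_RULES.values())

print(include_in_analysis({"email": "jane@example.com", "is_bot": False}))  # True
```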

5) Segmentation plan (stops post-hoc segments)

Pre-specify the only segments you’ll treat as confirmatory.

Example: “Confirmatory segments: device (mobile vs desktop) and plan (free vs trial). All other slices are exploratory.”

This doesn’t ban exploration. It just stops you from presenting a lucky slice as if you planned it.

6) Multiple comparisons controls (stops false wins when you test many things)

Growth teams often test:

  • many metrics
  • many variants
  • many segments
  • many experiments per month

That’s a multiple comparisons problem. Your pre-reg doc should pick one approach:

  • Pre-specified hierarchy: one primary metric, then only test secondary metrics if primary passes.
  • Bonferroni or Holm: more conservative, simple to explain for a small set of metrics.
  • False Discovery Rate (FDR) control: useful when you’re screening many hypotheses.

You don’t need to teach stats in the doc. You just need to state what rule you’ll follow.
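
For reference, the Holm procedure is small enough to read in one sitting. A minimal sketch with made-up p-values:

```python
# Minimal sketch of Holm's step-down procedure: compare the k-th
# smallest p-value against alpha / (m - k) and stop at the first miss.
# The p-values below are made-up illustration data.
def holm(p_values, alpha=0.05):
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    rejected = [False] * len(p_values)
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (len(p_values) - rank):
            rejected[i] = True
        else:
            break  # step-down: once one fails, all larger p-values fail
    return rejected

print(holm([0.004, 0.03, 0.04]))  # [True, False, False]
```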

Governance: what must be locked before launch vs what can change

In 2025, experimentation is faster than ever, but governance still matters. The easiest policy is “lock the parts that can create a false win.”

| Item | Must be locked before launch | Can change with amendment log |
| --- | --- | --- |
| Hypothesis and primary metric | Yes | No (start a new experiment) |
| Eligibility, unit, bucketing | Yes | Rarely (only for bugs) |
| Stopping rule and peek schedule | Yes | No (start a new experiment) |
| Exclusions and data quality rules | Yes | Yes (with timestamps and reason) |
| Secondary metrics and segments | Yes | Yes (but marked exploratory) |
| Instrumentation details | No | Yes |
| Run dates (if incident occurs) | No | Yes (with documented incident) |

Amendment log rule: if you change anything that would make the result easier to “win,” you either restart the test or treat outcomes as exploratory.

Copy/paste experiment pre-registration template (Markdown)

Experiment pre-registration (v1.0)

  • Experiment name:
  • Owner:
  • Reviewer (data/analytics):
  • Decision maker:
  • Created on (date):
  • Planned launch (date):

1) Goal and hypothesis

  • Change description:
  • Hypothesis (directional):
  • Primary decision: ship, iterate, or stop

2) Primary metric (confirmatory)

  • Primary metric name:
  • Metric definition (numerator/denominator, window):
  • Decision rule (include alpha and direction):

3) Guardrails

  • Guardrail metrics (and fail thresholds):

4) Population and assignment

  • Eligibility:
  • Unit of randomization:
  • Variants (control, treatment):
  • Bucketing method:
  • Exposure definition:

5) Sample size and duration

  • Planned duration:
  • Target sample size (or MDE assumptions):
  • Seasonality risks (if any):

6) Stopping and peeking

  • Stopping rule (fixed horizon or sequential):
  • Peek schedule (if any):
  • Early stop criteria (efficacy, futility, safety):

7) Analysis plan

  • Primary test method:
  • Handling repeated users/sessions:
  • Multiple comparisons control (hierarchy, Holm, FDR):
  • Segment plan (confirmatory segments only):
  • Missing data and tracking checks:

8) Exclusions (pre-committed)

  • Exclude:
  • Do not exclude:

9) Reporting plan

  • Where results will be posted:
  • Template for final readout:

Amendment log

  • Date:
  • Change:
  • Reason:
  • Impact on confirmatory vs exploratory:
  • Approved by:

Filled example: onboarding email subject line test (growth team)

Experiment name: Onboarding Email 1 Subject Line
Owner: Lifecycle PM
Reviewer: Analytics Lead
Planned launch: Jan 6, 2026

Goal and hypothesis
Change: Subject line “Welcome to Acme” (control) vs “Your first win in 5 minutes” (treatment).
Hypothesis: Treatment increases activation within 24 hours.

Primary metric (confirmatory)
Primary metric: Activation rate within 24 hours of signup.
Definition: Activated users / delivered-email recipients, 24-hour window from signup.
Decision rule: Ship if uplift > 0 and significant at 0.05, and guardrails pass.

Guardrails
Unsubscribe rate: do not increase by more than 0.15 percentage points.
Spam complaint rate: do not increase by more than 0.02 percentage points.

Population and assignment
Eligibility: New signups, excluding internal domains and known bots.
Unit: User.
Exposure: Email delivered within 30 minutes of signup.
Bucketing: 50/50 split by user_id hash.

Sample size and duration
Duration: 14 days to cover weekday cycles.
Sample size: Run until 20,000 delivered emails total (based on prior baseline variance).

Stopping and peeking
Sequential plan: Two looks (day 7 and day 14) using alpha spending (pre-set). No other peeks.

Analysis plan
Primary method: Two-proportion test on activation rate, report effect size and confidence interval.
Multiple comparisons: Hierarchy (primary metric first; then guardrails; then secondary metrics).
Segments: Confirmatory segments are device (mobile/desktop) only. Any other segments are exploratory.
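
For what the readout might look like in code, here is a minimal sketch of the uplift plus a 95% Wald confidence interval. The counts are illustrative, not results:

```python
# Minimal sketch of the confirmatory readout: uplift in activation rate
# with a 95% Wald confidence interval. Counts are illustrative.
from math import sqrt
from scipy.stats import norm

def uplift_ci(x_ctrl, n_ctrl, x_treat, n_treat, alpha=0.05):
    p1, p2 = x_ctrl / n_ctrl, x_treat / n_treat
    se = sqrt(p1 * (1 - p1) / n_ctrl + p2 * (1 - p2) / n_treat)
    z = norm.ppf(1 - alpha / 2)
    diff = p2 - p1
    return diff, (diff - z * se, diff + z * se)

diff, (lo, hi) = uplift_ci(1800, 10000, 1950, 10000)
print(f"uplift = {diff:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
# uplift = 0.015, 95% CI = (0.004, 0.026)
```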

Exclusions
Exclude: internal users, bot signups, known tracking outage window (if it occurs, logged).
Do not exclude: low-engagement users, refunds, “weird days” without incident ticket.
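
If you’re wondering where a target like “20,000 delivered emails” comes from, it’s the standard two-proportion sample-size arithmetic. A minimal sketch with illustrative baseline and MDE assumptions (not the actual numbers behind this example):

```python
# Minimal sketch of two-proportion sample sizing: n per arm to detect
# a given lift at two-sided alpha with power 1 - beta. The baseline
# and MDE below are illustrative assumptions.
from math import ceil, sqrt
from scipy.stats import norm

def n_per_arm(p_base, mde, alpha=0.05, power=0.8):
    p_treat = p_base + mde
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p_base + p_treat) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p_base * (1 - p_base) + p_treat * (1 - p_treat))) ** 2
    return ceil(num / mde ** 2)

print(n_per_arm(0.18, 0.03))  # ≈ 2737 per arm for an 18% baseline, +3 pts
```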

Conclusion

A strong experiment pre-registration doc doesn’t slow growth teams down; it stops you from arguing with your past self. It makes wins more believable, losses more useful, and post-test decisions less political. Start with one template, enforce the locked fields, and keep an amendment log that’s painful to abuse. If your next “win” can’t survive that process, it wasn’t a win you could trust.
