What Is Incrementality Testing? Holdout Experiments for Ecommerce

Attribution reports tell you which ads were present when a sale happened. Incrementality testing tells you which ads actually caused the sale. That distinction — correlation vs causation — is the difference between spending your budget confidently and spending it blindly. Here’s how holdout experiments work, what the research shows, and how Shopify merchants can run them without an enterprise budget.

The core problem: attribution is not causation

Standard attribution models assign credit to ads based on proximity to the purchase. Last-click gives all credit to the final touchpoint. Data-driven attribution splits credit across all exposures. But neither model answers the fundamental question: would this customer have bought anyway, even without seeing the ad?

Consider a customer who sees your retargeting ad for a product they added to their cart three days ago. They were going to complete the purchase tonight regardless. Your retargeting ad claims credit for the conversion. Your reported ROAS for that campaign looks strong. But the incremental value of that ad was zero — you paid for a conversion you already had. Incrementality testing is the only method that can quantify this systematically.

How the holdout methodology works

Step 1: Define the test and control groups

Randomly split your target audience. The test group (80–90% of the audience) receives your ads normally. The holdout group (10–20%) is excluded from seeing any ads in the campaign being tested. Random assignment is critical — any non-random split introduces selection bias that corrupts the results.
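To make the split both random and stable, a common approach is to hash a persistent customer ID into a bucket, so the same person always lands in the same group across sessions and devices. Here is a minimal sketch of that idea; the function name, salt, and 10% holdout share are illustrative choices, not part of any Meta or Shopify API:

```python
# Deterministic random assignment: hash a stable user ID into test/holdout.
# Hypothetical helper for illustration -- not a platform API.
import hashlib

HOLDOUT_SHARE = 0.10  # 10% holdout, 90% test

def assign_group(user_id: str, salt: str = "lift-test-1") -> str:
    """Hash the user ID so assignment is pseudo-random but stable."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform value in [0, 1]
    return "holdout" if bucket < HOLDOUT_SHARE else "test"

# The same customer always lands in the same group:
assert assign_group("cust_1001") == assign_group("cust_1001")
```

Changing the salt re-randomizes the entire population, which is useful when you want a fresh split for a new experiment without carrying over the old assignments.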

Step 2: Run the experiment for 2–4 weeks

Minimum duration is 14 days to account for weekly purchase cycle variation. Longer is better — 4 weeks captures two full purchase cycles. During the test, do not change any other variables: don’t adjust creative, budget, or targeting in the test campaign. External changes (promotions, seasonality) that affect both groups equally won’t bias the results; changes that affect only one group will.

Step 3: Compare conversion rates between groups

At the end of the test, calculate the conversion rate for both groups. The difference is your incremental lift. If 3.2% of the exposed group converted and 2.1% of the holdout group converted, your incremental lift is 1.1 percentage points — meaning 34% of conversions in the exposed group were truly incremental. The remaining 66% would have converted anyway.
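The arithmetic above can be packaged in a few lines, along with a standard two-proportion z-test to check that the observed lift is unlikely to be noise. The counts below are invented to reproduce the article's 3.2% vs 2.1% example (90,000 exposed users, 10,000 held out); the function name is illustrative:

```python
# Step 3 as code: conversion rates, absolute lift, incrementality share,
# and a pooled two-proportion z-test for statistical significance.
from math import sqrt, erf

def incremental_lift(conv_test, n_test, conv_hold, n_hold):
    p_test = conv_test / n_test
    p_hold = conv_hold / n_hold
    lift_pp = p_test - p_hold            # absolute lift in percentage points
    incrementality = lift_pp / p_test    # share of exposed conversions caused by ads
    # Pooled two-proportion z-test (two-sided p-value)
    p_pool = (conv_test + conv_hold) / (n_test + n_hold)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_hold))
    z = lift_pp / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return lift_pp, incrementality, p_value

lift, inc, p = incremental_lift(2880, 90000, 210, 10000)  # 3.2% vs 2.1%
print(f"lift={lift:.2%}, incrementality={inc:.1%}, p={p:.4f}")
```

With these sample sizes the 1.1-point lift is highly significant; with a much smaller audience the same rates could easily be indistinguishable from zero, which is why the audience-size requirements below matter.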

Step 4: Translate lift to true ROAS

Multiply your reported ROAS by your incrementality percentage. If your reported ROAS is 4.0x and only 34% of conversions are incremental, your true ROAS is 4.0 × 0.34 = 1.36x. That is likely below your profitability threshold, and it is the number you should use for budget allocation decisions. See the true ROAS explainer for the full correction model.
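This correction is a single multiplication, assuming the simple proportional model described above (incrementality applies uniformly across the campaign's conversions):

```python
# True ROAS under the article's simple multiplicative correction:
# true ROAS = reported ROAS x incrementality share.
def true_roas(reported_roas: float, incrementality: float) -> float:
    return reported_roas * incrementality

print(true_roas(4.0, 0.34))  # the article's example: 1.36
```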

What the research found: P&G and eBay

The most cited evidence for incrementality testing comes from two landmark studies that fundamentally changed how large advertisers think about attribution.

eBay (2014, Quarterly Journal of Economics): Researchers ran a large-scale holdout experiment on eBay’s branded paid search campaigns. The finding was stark: for users who frequently searched for eBay, the incremental effect of paid search was near zero. These users were going to find eBay and buy regardless of whether an ad appeared. eBay was spending tens of millions of dollars annually on clicks it already owned organically. The study was peer-reviewed and published — not a vendor whitepaper.

P&G (2017): Procter & Gamble cut $200 million from its digital advertising budget after running incrementality analysis across channels. Their analysis found that a significant portion of digital ad spend was reaching people who had already decided to buy their products or who were entirely unresponsive to digital ads. Revenue didn’t drop. Incrementality testing identified the waste and P&G redirected the budget to higher-lift channels. Read more about these findings in our guide to running holdout experiments as a Shopify merchant.

Running incrementality tests as a Shopify merchant

You don’t need P&G’s budget to run meaningful incrementality tests. Meta’s Conversion Lift feature is available in Ads Manager and supports holdout group creation at the campaign level. For a minimum test, you need an audience of at least 100,000 people and a budget large enough to reach the test group with meaningful frequency during the test window.
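To see why platforms set audience minimums like this, you can run a back-of-envelope power calculation with the standard two-proportion sample-size formula. The sketch below (hypothetical helper, 80% power, α = 0.05) gives the statistical minimum per group; real requirements are much larger because only a fraction of a targeted audience is actually reached with meaningful frequency during the test window:

```python
# Rough sample-size check: users needed PER GROUP to detect a given
# absolute lift on a baseline conversion rate (80% power, alpha = 0.05).
from math import sqrt, ceil

def n_per_group(p_base, lift_pp, z_alpha=1.96, z_power=0.84):
    p_test = p_base + lift_pp
    p_bar = (p_base + p_test) / 2
    num = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
           + z_power * sqrt(p_base * (1 - p_base) + p_test * (1 - p_test))) ** 2
    return ceil(num / lift_pp ** 2)

# Detecting a 1.1 pp lift on a 2.1% baseline (the article's example):
print(n_per_group(0.021, 0.011))
```

Note that detecting a smaller lift (say 0.3 points instead of 1.1) drives the required sample size up sharply, since it scales with the inverse square of the lift.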

An alternative for smaller stores is a geographic holdout: identify two comparable market areas (similar demographics, purchase rates, and seasonality), run ads in one and withhold in the other, then compare Shopify order rates from each geography over 3–4 weeks. It’s less statistically clean than a user-level split, but it produces directional data that is far better than no measurement at all.
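A geo readout can be as simple as comparing per-capita order rates between the two regions. The sketch below uses invented weekly numbers (orders per 1,000 residents) purely to illustrate the comparison; as the article notes, the result is directional, not a clean causal estimate:

```python
# Toy geographic-holdout readout: weekly Shopify order rates
# (orders per 1,000 residents). All numbers are invented.
test_geo = [4.1, 4.4, 4.6, 4.5]      # ads running
control_geo = [3.6, 3.5, 3.7, 3.6]   # ads withheld

def avg(xs):
    return sum(xs) / len(xs)

# Relative lift of the test geography over the matched control geography
geo_lift = (avg(test_geo) - avg(control_geo)) / avg(control_geo)
print(f"directional geo lift: {geo_lift:.1%}")
```

Normalizing by population (or by a pre-test baseline period) is what makes the two geographies comparable; raw order counts from differently sized regions would not be.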

The Ripplux methodology uses a combination of holdout-derived industry benchmarks and your actual Shopify order data to estimate incrementality rates without requiring you to run a full holdout experiment — though running one remains the gold standard and is always recommended for high-spend channels.

Incrementality vs attribution: which to trust

Attribution is useful for understanding the customer journey and which touchpoints are involved in purchases. Incrementality is the only tool for answering whether those touchpoints are causing purchases. Use attribution for creative and placement optimization. Use incrementality for budget allocation decisions. The two tools answer different questions and neither replaces the other. That’s the argument the ROAS manifesto makes in detail — attribution models weren’t designed to tell you where to spend; they were designed to describe what happened.

Estimate your incremental waste without running a holdout

The free calculator uses published research benchmarks to estimate how much of your reported ROAS is truly incremental versus merely claimed by attribution.
