
Regression to the Mean: Why That Big Win Wasn't What You Think

Your top-performing store manager just increased sales 35% quarter-over-quarter. You promote them to regional manager and roll out their "best practices" company-wide.

Next quarter, their replacement store performs 20% worse. The company-wide rollout shows minimal impact. You conclude the new regional manager isn't as good at scale, or that their success was luck.

Actually, you just discovered regression to the mean the expensive way.

What Is Regression to the Mean?

Regression to the mean is the statistical phenomenon where extreme observations tend to be followed by more moderate ones—not because anything changed, but simply because extreme values are, by definition, unlikely.

If you flip a coin 100 times and get 65 heads, your next 100 flips will probably be closer to 50. The coin didn't change. You didn't get worse at flipping. Extreme results just naturally drift back toward the average.
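You can watch this happen in a few lines of simulation. The sketch below (Python with numpy; the seed and sample sizes are arbitrary) keeps only the "hot" flippers from a first run and checks how they do on a second run:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Many "players" each flip a fair coin 100 times, twice.
n_players = 100_000
first_run = rng.binomial(n=100, p=0.5, size=n_players)
second_run = rng.binomial(n=100, p=0.5, size=n_players)

# Look only at players whose first run was extreme (65+ heads).
extreme = first_run >= 65
print(f"Players with 65+ heads in run 1: {extreme.sum()}")
print(f"Their average in run 1: {first_run[extreme].mean():.1f} heads")
print(f"Their average in run 2: {second_run[extreme].mean():.1f} heads")
# Run 2 comes out near 50: nothing changed, the extremes were just luck.
```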

The same principle applies to store performance, employee productivity, A/B test results, medical treatments, and almost any measurement that includes randomness.

But here's the problem: we're hardwired to see causation where there's only regression. When performance drops after an extreme high, we blame the manager, the training, the market. When it improves after an extreme low, we credit the intervention, the new strategy, the motivational speech.

Most of the time, we're watching math, not management.

The Sports Illustrated Cover Jinx

Athletes who appear on the cover of Sports Illustrated often have worse performance immediately afterward. For decades, this was attributed to pressure, distraction, even a supernatural curse.

The real explanation: athletes get on the cover because they just had an extreme performance. A career year, a record-breaking season, an unlikely winning streak. The cover doesn't cause the decline—the extreme performance that earned the cover was already at the tail of the distribution.

Regression to the mean was inevitable. The magazine cover was just a way to document who was at their statistical peak.

The same pattern shows up everywhere:

  • The "sophomore slump" in music and sports

  • Companies named a "best place to work" showing higher turnover the next year

  • Rookie of the year winners having mediocre second seasons

  • Students who ace one test performing more normally on the next

Extreme performance in one period predicts more moderate performance in the next, even if nothing about the underlying system changed.

How This Destroys Business Decisions

Regression to the mean isn't just a statistical curiosity. It's a machine for generating false conclusions and expensive mistakes.

The Promotion Problem

You promote your best performers. The best salesperson becomes sales manager. The highest-performing store manager becomes regional manager. The developer who shipped the most features becomes tech lead.

Six months later, you're disappointed. Their team isn't performing at their level. Their old role is now underperforming. Did you promote them too soon? Are they bad at delegation?

Maybe. But also maybe: you promoted someone during their outlier period, and now both they and their replacement are regressing to their respective means.

Study after study shows that "high potential" employees identified during hot streaks perform no better than randomly selected peers over longer timeframes. But we keep promoting based on recent extreme performance, then expressing surprise when they look more average.

The Intervention Illusion

Your worst-performing sales region gets a new training program, motivational speakers, and increased manager attention. Next quarter, performance improves 15%. Success!

Except: you intervened on an outlier. Worst-performing regions are, by definition, having a bad quarter. Even doing nothing, you'd expect some improvement toward the mean.

How much of the 15% improvement was the intervention, and how much was inevitable regression? You don't know, because you don't have a control group that was also underperforming but got no intervention.
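Here's a minimal simulation of that trap, assuming (purely for illustration) that every region has identical underlying performance and only the quarter-to-quarter noise differs. Intervening on nothing at all still "fixes" the worst performers:

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# 40 sales regions, all with the same true quarterly mean and noisy results.
n_regions = 40
true_mean = 1_000_000          # identical underlying performance
noise_sd = 100_000             # quarter-to-quarter randomness

q1 = rng.normal(true_mean, noise_sd, n_regions)
q2 = rng.normal(true_mean, noise_sd, n_regions)   # no intervention anywhere

# "Intervene" on the 5 worst Q1 regions -- except we do nothing at all.
worst = np.argsort(q1)[:5]
q1_worst, q2_worst = q1[worst].mean(), q2[worst].mean()

print(f"Worst 5 regions, Q1 average: {q1_worst:,.0f}")
print(f"Worst 5 regions, Q2 average: {q2_worst:,.0f}")
print(f"Apparent 'improvement' with zero intervention: {(q2_worst / q1_worst - 1):.1%}")
```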

The medical field learned this the hard way. Patients tend to seek treatment when symptoms are at their worst. Any treatment—effective or not—will be followed by improvement on average, because symptoms naturally fluctuate. This is why placebo-controlled trials are essential.

In business, we rarely have that discipline. We intervene on underperformers and take credit for the improvement. We intervene on overperformers ("let's not mess with success") and wonder why they decline.

The Reward and Punishment Trap

Israeli Air Force flight instructors noticed that when they praised cadets after excellent landings, the next landing was usually worse. When they criticized cadets after terrible landings, the next landing was usually better.

They concluded that punishment works and praise doesn't.

The psychologist Daniel Kahneman (who would later win a Nobel Prize) pointed out the real explanation: regression to the mean. Excellent landings are followed by more average ones. Terrible landings are followed by more average ones. The instructors' feedback had nothing to do with it.

This same pattern plays out in performance reviews, sales management, and parenting. We punish extreme lows and see improvement. We reward extreme highs and see decline. We conclude that carrots don't work and sticks do.

We're learning the wrong lesson from noise.

The A/B Testing Disaster

Regression to the mean is particularly insidious in A/B testing and before/after comparisons.

Declaring Winners Too Early

You run an A/B test. After three days, variant B is winning by 12%. You declare victory and ship it.

Three weeks later, the effect has vanished. What happened?

Early random variation created an apparent winner. With small sample sizes and short time windows, random noise can create large apparent effects. As more data comes in, those effects regress to the true (smaller or nonexistent) difference.
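A rough sketch of the mechanism, with made-up traffic and conversion numbers: both variants below are identical, yet a meaningful share of simulated tests look like double-digit winners at day three, and those same "winners" drift back toward zero by day 21:

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# A/B test where both variants have the SAME true conversion rate (a null test).
p_true = 0.05
visitors_per_day = 500            # per variant, per day
n_days = 21
n_sims = 20_000

# Daily conversions for each simulated test, accumulated over time.
conv_a = rng.binomial(visitors_per_day, p_true, (n_sims, n_days)).cumsum(axis=1)
conv_b = rng.binomial(visitors_per_day, p_true, (n_sims, n_days)).cumsum(axis=1)
lift = (conv_b - conv_a) / np.maximum(conv_a, 1)

# Tests that looked like a 12%+ win for B after only three days...
early_winners = lift[:, 2] >= 0.12
print(f"Tests showing a 12%+ lift at day 3: {early_winners.mean():.1%}")
print(f"Their average lift at day 3:  {lift[early_winners, 2].mean():.1%}")
print(f"Their average lift at day 21: {lift[early_winners, 20].mean():.1%}")
# ...regress toward the true lift of zero once the full three weeks of data arrive.
```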

This is why proper A/B testing requires:

  • Pre-specified sample sizes and duration

  • No peeking at results mid-test

  • Statistical significance thresholds

But most companies don't have that discipline. They check results daily. They stop tests when they see a winner. They maximize their exposure to regression to the mean, then wonder why their "winning" variants stop working.

The Winner's Curse

Here's a more subtle version: You run 20 A/B tests in parallel. You ship the 3 with the biggest lifts.

Those 3 will probably underperform their test results in production. Why?

You selected them precisely because they were statistical outliers. Even if none of your variants have true effects, random variation means some will show positive results. The biggest apparent winners are the biggest statistical flukes.

This is called the "winner's curse": winning an auction means you probably overbid, and winning an A/B test means you probably overestimated the effect.
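A quick illustration, with arbitrary noise levels: even when every variant's true effect is roughly nothing, the three biggest observed lifts look impressive in the test and ordinary afterward:

```python
import numpy as np

rng = np.random.default_rng(seed=11)

# 20 parallel A/B tests. True effects are small; measurements are noisy.
n_tests = 20
true_lift = rng.normal(0.00, 0.02, n_tests)        # most variants do roughly nothing
measurement_noise = rng.normal(0.00, 0.05, n_tests)
observed_lift = true_lift + measurement_noise

# Ship the three biggest apparent winners.
winners = np.argsort(observed_lift)[-3:]

print(f"Shipped variants' observed lift: {observed_lift[winners].mean():.1%}")
print(f"Shipped variants' true lift:     {true_lift[winners].mean():.1%}")
# The gap is the winner's curse: selecting on the noisiest highs guarantees
# the shipped variants look better in the test than they will in production.
```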

Before/After Studies

A retailer implements a new inventory management system. They measure sales 1 month before and 1 month after implementation. Sales are up 8%. The system is declared a success.

But was the "before" period representative? What if it was:

  • An unusually slow month (bad weather, fewer shopping days)

  • A post-holiday slump

  • An inventory stockout period

If you intervene during a low point, even a completely ineffective intervention will be followed by apparent improvement, because low points naturally revert toward the average.

This is why longitudinal data matters. One before/after comparison tells you almost nothing. You need to know: was this change outside the normal range of variation, or just the cycle returning to trend?
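One way to ask that question, sketched below with simulated history standing in for the retailer's real numbers: compare the observed jump to the ordinary month-over-month swings that happen with no intervention at all:

```python
import numpy as np

rng = np.random.default_rng(seed=5)

# Stand-in for three years of monthly sales with no trend, just normal noise.
# (In practice you would load the retailer's actual history here.)
history = rng.normal(1.00, 0.06, 36)              # sales indexed to 1.0
mom_changes = history[1:] / history[:-1] - 1      # month-over-month changes

observed_jump = 0.08                              # the "8% improvement"

print(f"Typical month-over-month swing (std): {mom_changes.std():.1%}")
print(f"Ordinary months with a jump of 8% or more: {(mom_changes >= observed_jump).mean():.1%}")
# If jumps this size happen routinely with no intervention at all,
# one before/after comparison is very weak evidence for the new system.
```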

Traditional Approaches (And Their Limits)

Statisticians have developed several techniques to account for regression to the mean:

Randomized Controlled Trials

The idea: Randomly assign interventions. Both groups will experience regression to the mean, so any difference between them is attributable to the intervention.

The problem: This requires discipline, sample size, and time that most businesses don't have. You can't always randomize. You can't always wait for statistical significance.

Longer Time Horizons

The idea: Don't judge performance on short windows. Average over longer periods to smooth out noise.

The problem: Business moves fast. You need to make decisions now, not after two years of data collection. Also, longer windows introduce other confounds—seasonality, market trends, competitive changes.

Control Groups and Diff-in-Diff

The idea: Compare your treated group to a similar untreated group. Look at the difference in differences.

The problem: Finding truly comparable control groups is hard. If you intervened on your worst-performing stores, you can't just compare them to your average stores—they're different populations to begin with.
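For reference, the arithmetic itself is trivial; the hard part is the comparability assumption behind it. The numbers below are invented purely to show the calculation:

```python
# Difference-in-differences with illustrative numbers (all values hypothetical).
treated_before, treated_after = 100.0, 112.0   # stores that got the intervention
control_before, control_after = 100.0, 107.0   # comparable stores that did not

treated_change = treated_after - treated_before   # +12, includes regression and trend
control_change = control_after - control_before   # +7, regression and trend alone
did_estimate = treated_change - control_change    # +5 attributed to the intervention

print(f"Naive before/after effect: {treated_change:+.1f}")
print(f"Diff-in-diff estimate:     {did_estimate:+.1f}")
```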

The Fundamental Challenge

All traditional approaches require either:

  • More time than you have

  • More experimental control than your environment allows

  • Assumptions about comparability that may not hold

Most business decisions get made with small samples, short time windows, and no clean control groups. Which means most business decisions are vulnerable to regression to the mean.

Modeling True Performance

Regression to the mean happens because we observe noisy signals about underlying stable characteristics.

Your store manager doesn't have a "true" quarterly sales number—they have:

  • An underlying skill level

  • Market conditions that fluctuate

  • Random events (construction on the road, a competitor opening nearby, weather)

  • Measurement noise

Any single quarter's performance is their skill plus a bunch of random variation. Extreme performances are likely to be skill plus unusually lucky (or unlucky) random draws.

The solution isn't to ignore single-period performance. It's to model the underlying distribution.

This is where synthetic research becomes powerful. Instead of waiting for multiple periods of data or trying to find perfect control groups, you can:

1. Model the Variability

Build a model of expected performance that accounts for:

  • Store characteristics (size, location, demographics)

  • Manager experience and track record

  • Seasonal patterns

  • Market conditions

Now when a store posts an outlier quarter, you can estimate: how much of this is signal, and how much is noise?

A 35% increase in a stable, high-traffic urban store with a veteran manager is more signal. A 35% increase in a new suburban store with a rookie manager in December is more noise.
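One simple way to make that intuition concrete is the standard reliability-weighted (shrinkage) estimate: the noisier the context, the harder an observed result gets pulled back toward the average. The variance figures below are illustrative, not calibrated to any real store:

```python
def shrunk_estimate(observed_lift, skill_sd, noise_sd, prior_mean=0.0):
    """Shrink an observed result toward the prior mean based on how noisy it is.
    Reliability = share of observed variance that reflects real differences."""
    reliability = skill_sd**2 / (skill_sd**2 + noise_sd**2)
    return prior_mean + reliability * (observed_lift - prior_mean), reliability

# The same 35% observed jump in two very different noise environments.
stable_store = shrunk_estimate(0.35, skill_sd=0.10, noise_sd=0.08)
noisy_store = shrunk_estimate(0.35, skill_sd=0.10, noise_sd=0.30)

print(f"Stable urban store:  best guess of real lift {stable_store[0]:.1%} "
      f"(reliability {stable_store[1]:.0%})")
print(f"New suburban store:  best guess of real lift {noisy_store[0]:.1%} "
      f"(reliability {noisy_store[1]:.0%})")
```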

2. Run Counterfactual Scenarios

With a calibrated population model, you can simulate: "What would this store's performance distribution look like without the intervention?"

This gives you a baseline that accounts for natural variability and regression to the mean. Now you can see if the actual performance is outside that distribution—which would suggest a real effect.
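In its simplest form, that comparison looks like the sketch below; the baseline parameters are hypothetical stand-ins for what a calibrated model would produce:

```python
import numpy as np

rng = np.random.default_rng(seed=9)

# Hypothetical calibrated baseline for one store: expected quarterly sales
# and quarter-to-quarter variability, estimated from history and store traits.
baseline_mean, baseline_sd = 1_200_000, 90_000
actual_sales = 1_340_000                      # the quarter after the intervention

# Simulate what this quarter could have looked like with no intervention at all.
no_intervention = rng.normal(baseline_mean, baseline_sd, 100_000)
prob_at_least_this_good = (no_intervention >= actual_sales).mean()

print(f"Chance the baseline alone produces a quarter this strong: {prob_at_least_this_good:.1%}")
# A small probability suggests a real effect; a large one suggests ordinary
# variation plus regression to the mean.
```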

3. Quantify Uncertainty

Instead of point estimates ("sales increased 15%"), you get distributions: "15% increase observed; the 95% credible interval for the true effect runs from -2% to +25%; probability that the true effect is positive: 82%."

This forces better decisions. An 82% chance of a real positive effect is different from "we saw 15% improvement." The first acknowledges uncertainty; the second invites overconfidence.
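For the mechanics, here is one minimal way to turn an observed lift and its standard error into that kind of statement, assuming the estimate is approximately normal. The inputs are illustrative and won't reproduce the exact figures quoted above:

```python
from statistics import NormalDist

# Illustrative inputs: an observed lift and the uncertainty around it.
observed_lift = 0.15
standard_error = 0.10

posterior = NormalDist(mu=observed_lift, sigma=standard_error)
prob_positive = 1 - posterior.cdf(0.0)
low, high = posterior.inv_cdf(0.025), posterior.inv_cdf(0.975)

print(f"95% interval for the true effect: {low:+.1%} to {high:+.1%}")
print(f"Probability the true effect is positive: {prob_positive:.0%}")
```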

4. Separate Signal From Noise Over Time

As more data comes in, Bayesian models can update estimates of true underlying performance while discounting outliers. This gives you better predictions faster than waiting for years of data.

A manager who posts two strong quarters in a row is more likely to have genuine skill than a manager with one massive quarter followed by regression. The model can quantify that probability difference.
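One toy version of that idea: give the model a heavy-tailed view of quarterly noise, so a single wild quarter reads as a probable fluke rather than proof of brilliance. This is a grid-approximation sketch with made-up priors and noise scales, not a description of any production model:

```python
import numpy as np

# Grid approximation of a robust Bayesian estimate of a manager's "true" lift.
# A Student-t likelihood (heavier tails than normal) lets one extreme quarter
# be explained as noise instead of dragging the whole estimate with it.

theta = np.linspace(-0.5, 0.8, 2001)              # candidate true quarterly lifts
prior = np.exp(-0.5 * (theta / 0.10) ** 2)        # true lifts assumed ~ Normal(0, 10%)

def t_likelihood(obs, scale=0.10, df=3):
    z = (obs - theta) / scale
    return (1 + z**2 / df) ** (-(df + 1) / 2)     # unnormalized Student-t density

def posterior_mean(observations):
    post = prior.copy()
    for obs in observations:
        post *= t_likelihood(obs)
    post /= post.sum()
    return (theta * post).sum()

print(f"Two strong quarters (15%, 18%):              {posterior_mean([0.15, 0.18]):.1%}")
print(f"One huge quarter, then regression (35%, 2%): {posterior_mean([0.35, 0.02]):.1%}")
# Consistent strength earns a higher estimate of true skill than one outlier quarter.
```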

A Real Example

A retail chain has 50 stores. They want to test a new layout. Traditional approach:

  • Pick 5 stores for the pilot

  • Measure before/after sales

  • If positive, roll out to all stores

Problems:

  • Which 5 stores? Random selection? Worst performers (hoping for improvement)? Best performers (reduce risk)?

  • What time period for "before"? Last quarter? Last year?

  • How do you account for seasonality, local market conditions, regression to the mean?

Synthetic approach:

  • Model expected sales for all 50 stores based on historical data, characteristics, and seasonal patterns

  • Pilot the layout in 5 stores

  • Compare actual performance to modeled baseline (not just to their own history)

  • Simulate: "If we saw this effect in 5 stores, what's the distribution of expected outcomes for all 50?"

  • Account for selection effects (did we pilot in our best stores?) and regression to the mean

This gives you:

  • Better estimates of true effect size

  • Realistic projections of company-wide impact

  • Risk quantification ("10% chance this actually hurts sales in low-traffic stores")

You're not eliminating uncertainty. You're quantifying it realistically instead of pretending your noisy pilot is a clean signal.
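A compressed sketch of that last step, with entirely hypothetical pilot results: propagate the uncertainty in the estimated effect through all 50 stores and read off the risk, instead of multiplying a point estimate by 50:

```python
import numpy as np

rng = np.random.default_rng(seed=21)

# All numbers here are hypothetical -- the point is the shape of the calculation.
n_stores = 50
n_sims = 50_000

# Belief about the average effect of the new layout, from the 5-store pilot.
effect_mean, effect_sd = 0.04, 0.03          # +4% average lift, still quite uncertain
store_spread = 0.05                          # the layout helps some stores more than others

avg_effect = rng.normal(effect_mean, effect_sd, n_sims)
store_effects = rng.normal(avg_effect[:, None], store_spread, (n_sims, n_stores))

chain_lift = store_effects.mean(axis=1)
share_of_stores_hurt = (store_effects < 0).mean(axis=1)

print(f"Median projected chain-wide lift: {np.median(chain_lift):.1%}")
print(f"Chance the rollout hurts the chain overall: {(chain_lift < 0).mean():.1%}")
print(f"Expected share of stores that end up worse off: {share_of_stores_hurt.mean():.0%}")
```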

When This Matters Most

Regression to the mean is particularly dangerous when:

You're intervening on outliers: Whether you're helping your worst performers or trying to replicate your best ones, you're starting from an extreme position that will naturally revert.

Sample sizes are small: With 3 stores or 50 users, random variation dominates signal. Everything looks like an effect.

Time windows are short: Daily or weekly data has enormous noise. Monthly is better. Quarterly is better still. But business often can't wait.

You're running many tests: The more tests you run, the more likely you are to find spurious "winners" that are just statistical noise.

What You Can Actually Do

  1. Expect regression.

    When you see extreme performance—good or bad—assume it will revert toward the mean. Adjust your expectations accordingly.

  2. Don't intervene on outliers alone.

    If you're testing a new training program, don't test it only on your worst performers. You need a proper control group, or you're just watching regression happen.

  3. Look at distributions, not point estimates.

    One great quarter doesn't make a great manager. One terrible month doesn't make a failed strategy. You need to see performance over time.

  4. Model the baseline.

    Before you attribute improvement to your intervention, model what improvement you'd expect from regression to the mean alone. That's your null hypothesis.

  5. Be suspicious of lucky timing.

    "We hired a new VP and sales jumped 20%" tells you nothing if you hired them right after a terrible quarter. Of course sales improved.

Regression to the mean is one of the most powerful forces in statistics, and one of the least intuitive for human decision-makers.

We see patterns where there's noise. We celebrate interventions that did nothing. We fire managers who were unlucky. We promote managers who were lucky.

The real world is noisier than we think, and our interventions are weaker than we hope.

That doesn't mean we can't learn from data or improve performance. It means we need better models of what "normal variation" looks like before we start claiming credit for changes that would have happened anyway.

Your best performer last quarter probably won't be your best performer next quarter. That's not a failure of talent or effort. It's just math.

The question is whether you'll build systems that understand that—or keep learning expensive lessons from statistical noise.

About the author

Andreas Duess

Andreas Duess builds AI tools that turn consumer behavior into fast, usable signal. He started his career in London, UK, working with Cisco Systems, Sony, and Autonomy before co-founding and scaling Canada’s largest independent communications agency focused on food and drink.

After exiting the agency, Andreas co-founded ditto, born from a clear gap he saw firsthand: teams needed faster, more accurate ways to map consumer behavior and pressure-test decisions before committing time and capital.
