A sporting goods brand running about 2,200 active SKUs wanted to understand elasticity on its mid-range outdoor accessories, a category where the team suspected they had pricing headroom but didn't have the data to act on that hunch with confidence. Their first instinct was to run a sitewide sale at 15% off and measure the lift. That would have told them whether demand was elastic in the aggregate. It would have told them nothing about which specific SKUs had headroom. It would also have burned significant margin across the entire category and made it impossible to isolate the demand response signal from the promotional effect. Sitewide events are not elasticity tests. They're promotional events that happen to produce demand data.
Running price elasticity tests on a live catalog requires a different discipline. The goal is to produce actionable elasticity estimates on specific SKUs without creating margin exposure that isn't recoverable, without contaminating the signal with promotional noise, and without requiring a statistician to interpret the results before a pricing analyst can act on them. That's a more constrained problem than it sounds.
Why Most Elasticity Tests Go Wrong
The most common failure mode is testing on SKUs that shouldn't be tested. Not every SKU is a candidate for active price elasticity testing. SKUs with very low baseline velocity don't generate enough units sold per week to produce a readable demand response signal within a reasonable test window — a SKU moving eight units per week needs many weeks at a changed price point before you have enough observations to separate signal from noise. SKUs that are heavily driven by organic search ranking or algorithm placement on Amazon will produce demand response data that reflects algorithm effects as much as price elasticity. Seasonal SKUs tested outside their season will produce elasticity estimates that don't apply to in-season behavior.
A second failure mode is testing too many SKUs simultaneously, particularly if they share a customer base or are frequently bundled or cross-purchased. If a buyer typically purchases SKU A and SKU B together, and you raise the price on SKU A during a test, you'll see demand response on both — but only SKU A is under test. The SKU B data is contaminated by the SKU A test effect. This matters less for a catalog with 50 SKUs and more for a catalog with 1,500 where cross-purchase overlap is high in certain categories.
A third failure mode is not defining a holdout correctly. Without a control — a comparable SKU or set of SKUs at unchanged price during the test window — you can't separate the price effect from concurrent external factors: a seasonal shift, a competitor promotion, a platform algorithm change. A proper test cell has a treatment (the SKUs at changed price) and a holdout (comparable SKUs at control price) running simultaneously.
SKU Selection: Where Active Testing Is Worth Running
Before designing a test, we screen the catalog for SKUs that meet four criteria. First, baseline velocity: a minimum of 25-30 units per week at the current price point, sustained over at least six weeks with no major promotional events in that window. Below that threshold, test window duration becomes prohibitive. Second, competitive stability: no active competitor repricing pressure on the SKU in the preceding 30 days. A competitor running aggressive repricing on your ASIN during your elasticity test produces confounded data. Third, no recent algorithm or placement changes that would explain recent demand variance. Fourth, the SKU is not a bundle component or frequent cross-sell for other SKUs under test simultaneously.
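The four screening criteria can be sketched as a simple filter. This is a minimal illustration, not a description of any particular tool: the SkuSnapshot fields and the is_test_candidate function are hypothetical names, the boolean flags are assumed to come from your own promotional, competitive, and placement monitoring, and "sustained" velocity is interpreted here as the six-week average meeting the lower end of the 25-30 unit threshold.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

MIN_WEEKLY_UNITS = 25  # lower end of the 25-30 units/week threshold
MIN_WEEKS = 6          # weeks of sustained baseline required


@dataclass
class SkuSnapshot:
    """Hypothetical per-SKU record; field names are illustrative only."""
    sku: str
    weekly_units: List[float]       # trailing weekly unit sales, most recent last
    promo_in_window: bool           # major promotional event in the baseline window
    competitor_repricing_30d: bool  # active competitor repricing in the last 30 days
    recent_placement_change: bool   # algorithm or placement change behind recent variance
    cross_sold_with_tested: bool    # bundle component or cross-sell of a SKU under test


def is_test_candidate(s: SkuSnapshot) -> bool:
    """Return True only if the SKU passes all four screening criteria."""
    recent = s.weekly_units[-MIN_WEEKS:]
    # Criterion 1: enough clean history at sufficient average velocity.
    velocity_ok = (
        len(recent) == MIN_WEEKS
        and mean(recent) >= MIN_WEEKLY_UNITS
        and not s.promo_in_window
    )
    # Criteria 2-4: competitive stability, placement stability, no overlap.
    return (
        velocity_ok
        and not s.competitor_repricing_30d
        and not s.recent_placement_change
        and not s.cross_sold_with_tested
    )
```

Running this over the full catalog gives the candidate pool; a stricter reading of "sustained" would require every week, not just the average, to clear the threshold.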
SKUs that meet all four criteria are active test candidates. For a catalog in the 1,000-2,500 SKU range, this typically filters to somewhere between 8% and 20% of the catalog at any given time. That is a useful subset because it is actually testable; running nominal "tests" on the remaining SKUs would only produce uninterpretable data.
Test Window Design: Duration, Magnitude, and Direction
Test window length depends on baseline velocity and the magnitude of the price change being tested. A price change of 5% on a SKU moving 40 units per week needs a longer window than an 8% change on a 90-unit-per-week SKU to produce a statistically meaningful demand response. For practical purposes, we recommend a minimum of three weeks for moderate-velocity SKUs with changes in the 5-10% range, and five to six weeks for lower-velocity SKUs or smaller price changes. Testing both directions — up and down — sequentially on the same SKU introduces ordering effects and is usually not worth the added complexity at the catalog management level. Pick a direction, hold it for the full window, interpret the result.
Price change magnitude deserves careful thought. Changes below 3-4% on most consumer products will not produce a detectable demand response within a reasonable test window — buyer sensitivity at that level is often below the noise floor of normal demand variance. Changes above 15% risk damaging the SKU's positioning or triggering channel policy issues on Amazon. The useful range for most test scenarios is 5-12%, with the specific magnitude chosen based on where you believe you have headroom relative to your competitive set and margin floor.
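The magnitude and window guidelines above can be written down as two small helpers. This is a rough sketch of the rules of thumb only. The function names are hypothetical, the text does not define "moderate velocity", so the cutoff is a required argument rather than an invented default, and the handling of the 4-5% and 12-15% bands (and of changes above 10% for window length) is an assumption.

```python
def classify_magnitude(pct_change: float) -> str:
    """Classify a proposed price change (in percent, either direction)."""
    size = abs(pct_change)
    if size < 4.0:
        # Below roughly 3-4%, response is often under the noise floor.
        return "likely below noise floor"
    if size > 15.0:
        # Above 15%, positioning and channel policy risks appear.
        return "risky: positioning or channel policy"
    if 5.0 <= size <= 12.0:
        return "useful range"
    # 4-5% and 12-15% are not classified in the text; assumed borderline.
    return "edge of useful range"


def minimum_window_weeks(weekly_units: float, pct_change: float,
                         moderate_velocity_cutoff: float) -> int:
    """Suggest a minimum test window in weeks.

    moderate_velocity_cutoff is a hypothetical parameter: the units/week
    level you treat as moderate velocity for your own catalog.
    """
    size = abs(pct_change)
    if weekly_units >= moderate_velocity_cutoff and size >= 5.0:
        return 3  # moderate velocity, change of 5% or more
    return 5      # lower velocity or smaller change: plan for five to six weeks
```

For example, with a cutoff of 60 units per week, an 8% change on a 90-unit SKU suggests a three-week minimum, while a 5% change on a 40-unit SKU suggests at least five.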
On direction: if you are choosing which direction to test, test a price increase first. If the demand response to a 7% price increase is negligible, you've found margin at no cost. If it's significant, you've learned something about elasticity at a recoverable cost, because you can return to the prior price. Testing decreases first means you may burn margin confirming what you already suspected or could have estimated from observational data.
Reading the Demand Response Signal
The metric to track during an active test is not absolute sales volume — it's the ratio of actual unit velocity during the test window to the projected velocity at control price. Projected velocity comes from your holdout SKUs and from the SKU's own trailing velocity trend adjusted for any seasonal component. The question is: does observed velocity during the test differ from projected velocity by more than what's attributable to normal demand variance?
Demand variance on individual SKUs is higher than most analysts expect. A SKU with a stable 40-unit-per-week average will routinely have weeks at 30 and weeks at 52, with no price change and no competitive event. Interpreting a drop from 40 to 34 units in week two of a price increase test as "significant demand response" is probably reading noise. Looking at the full three-to-five-week window against holdout is what separates the signal from the week-to-week volatility.
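A minimal sketch of that comparison follows, under simplifying assumptions: projected velocity is the SKU's pre-test average scaled by the holdout's trend over the same period, no seasonal adjustment is applied, and the noise check is a rough z-score built from the SKU's own pre-test weekly variance. The function name and the sample numbers are illustrative, not real data.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def velocity_ratio(pre_units: Sequence[float], test_units: Sequence[float],
                   holdout_pre: Sequence[float],
                   holdout_test: Sequence[float]) -> Tuple[float, float]:
    """Return (ratio, z) for a treated SKU against its holdout.

    ratio: observed test-window velocity / projected velocity at control price.
    z:     gap between observed and projected velocity, in units of the
           standard error implied by pre-test weekly variance.
    """
    # How much the holdout moved between the pre-test and test windows.
    holdout_trend = mean(holdout_test) / mean(holdout_pre)
    # Where the treated SKU would be expected to sit at the control price.
    projected = mean(pre_units) * holdout_trend
    observed = mean(test_units)
    ratio = observed / projected
    # Standard error of a test-window mean, using pre-test weekly spread.
    std_err = stdev(pre_units) / sqrt(len(test_units))
    z = (observed - projected) / std_err
    return ratio, z


# Illustrative numbers: a 40-unit/week SKU with typical week-to-week swings.
ratio, z = velocity_ratio(
    pre_units=[40, 30, 52, 38, 44, 36],
    test_units=[34, 36, 33, 35],
    holdout_pre=[100, 96, 104, 100],
    holdout_test=[98, 102, 100, 100],
)
```

In this made-up case the ratio is about 0.86 and z is about -1.5: velocity is down roughly 14% against projection, yet still inside a conventional two-standard-error band. Whether to treat that as signal is a judgment call; the threshold is not something the data decides for you.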
When the signal is clear (velocity down meaningfully against holdout for a price increase, or no meaningful change because the increase sits below the point where buyers start to react), the output is an elasticity coefficient estimate for that SKU: the percentage change in velocity divided by the percentage change in price. That estimate is not precise in the way an econometric model would be, but it's directionally reliable for pricing decisions: this SKU appears to be inelastic in the 7-12% price increase range, or this SKU appears sensitive to price increases above 5%.
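As a sketch, the coefficient can be computed from the ratio of observed to projected velocity and the size of the price change. This uses the standard point-elasticity definition (percentage change in quantity over percentage change in price); the function names are hypothetical, and the cutoff of 1.0 in absolute value is the textbook boundary between inelastic and elastic demand, not a threshold taken from any specific test.

```python
def point_elasticity(velocity_ratio: float, price_change: float) -> float:
    """Estimate elasticity from a test result.

    velocity_ratio: observed / projected velocity (e.g. 0.93 for a 7% drop).
    price_change:   fractional price change (e.g. 0.07 for a 7% increase).
    """
    if price_change == 0:
        raise ValueError("price_change must be non-zero")
    pct_quantity_change = velocity_ratio - 1.0
    return pct_quantity_change / price_change


def describe(elasticity: float) -> str:
    """Label the estimate using the conventional |e| = 1 boundary."""
    return "inelastic" if abs(elasticity) < 1.0 else "elastic"
```

For example, a 7% increase that leaves velocity at 98% of projection gives an estimate near -0.29, which reads as inelastic over that price range. The estimate only describes the range that was actually tested.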
When Observational Data Beats Active Testing
Active testing has a real cost: test duration, analyst time to monitor and interpret, and the margin exposure during downward tests. For a significant portion of a large catalog, observational data from historical price changes produces elasticity estimates that are adequate for pricing decisions without running a structured test at all.
If a SKU has experienced three or four organic price changes over the past 18 months — not tests, just operational repricing decisions — and you have the velocity data at each price point with reasonable competitive stability, you already have a demand response history. Fitting even a simple linear estimate to that historical data gives you an elasticity picture that's often as actionable as a structured test for SKUs in the mid-to-lower tier of your catalog by margin contribution.
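One way to fit that simple linear estimate is ordinary least squares on the historical (price, weekly velocity) pairs, with elasticity evaluated at the average price and velocity. The sketch below is an assumption about how you might do it, not a prescribed method; the function name is hypothetical, the sample data is made up, and it presumes each velocity figure was observed under reasonably stable competitive conditions.

```python
from statistics import mean
from typing import Sequence


def linear_elasticity(prices: Sequence[float],
                      weekly_units: Sequence[float]) -> float:
    """Fit units = a + b * price by least squares; return elasticity at the means."""
    if len(prices) != len(weekly_units) or len(prices) < 2:
        raise ValueError("need at least two matching (price, units) observations")
    p_bar, q_bar = mean(prices), mean(weekly_units)
    # Sum of squared price deviations; zero means price never changed.
    sxx = sum((p - p_bar) ** 2 for p in prices)
    if sxx == 0:
        raise ValueError("prices must vary to estimate a slope")
    sxy = sum((p - p_bar) * (q - q_bar) for p, q in zip(prices, weekly_units))
    slope = sxy / sxx  # change in weekly units per unit of price
    # Convert the slope into a unit-free elasticity at the average point.
    return slope * p_bar / q_bar


# Made-up history: four operational price points and average velocity at each.
estimate = linear_elasticity([20.0, 22.0, 24.0, 26.0], [50.0, 46.0, 42.0, 38.0])
```

With three or four observations, the fit is a rough guide at best. It is most defensible for the mid-to-lower margin tier, where a directional answer is enough.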
The SKUs where active testing pays off over observational estimation are the ones that haven't moved much in price historically, that sit in your high-margin high-velocity tier, and where you have a specific hypothesis about headroom (your competitive set is priced meaningfully higher and you want to test whether buyers are price-sensitive enough to matter). Those are the situations where the precision of a designed test is worth the overhead.
We're not saying active elasticity testing is essential for every growing catalog. For many DTC brands with fewer than 500 high-velocity SKUs, observational estimation with structured competitive monitoring produces enough signal to make good pricing decisions without a formal testing program. The case for active testing strengthens as catalog size grows, as competitive pressure intensifies, and as the marginal value of knowing individual SKU elasticity precisely outweighs the cost of running clean tests. At 1,500+ SKUs with a category structure where headroom varies significantly across segments, knowing which SKUs you can push up 8% without meaningful volume impact starts to add up to real margin.
The discipline is the same in either approach: separate the demand response signal from the noise, use holdouts and competitive controls to do it, and don't optimize the metric you can measure most easily (unit sales this week) at the expense of the outcome that matters (margin over the full product lifecycle).