Demand Forecasting Methods: 10 Techniques Compared
10 demand forecasting methods compared by accuracy, data needs, and fit — from moving averages to ML — for manufacturers picking what actually works.
Most guides to demand forecasting methods list every technique ever invented and tell you nothing about which one to use on a Tuesday. That's useless when you're a VP of Supply Chain staring at 8,000 SKUs and a planning team of four. Having built the demand planning function at a $250M manufacturer, I'll tell you the truth up front: you don't pick one method. You pick a method per demand profile, and the skill is matching the technique to the SKU, not falling in love with a model. Here are the 10 demand forecasting methods that matter, compared on accuracy, data appetite, and where they actually fit.
The two families, and why it matters
Every demand forecasting method falls into one of two camps:
- Quantitative — driven by data. Statistical time-series and machine learning. Good when you have history and the future rhymes with the past.
- Qualitative — driven by judgment. Sales input, expert panels, market intelligence. Necessary for new products, step-changes, and anything with no usable history.
The two mistakes I see most: teams running pure qualitative forecasts (sales gut-feel) on mature, high-volume SKUs that statistics would forecast better and cheaper, and teams running pure statistical models on new-product launches where there's no history to learn from. Match the family to the situation.
The 10 methods, compared
| Method | Family | Data needed | Best fit | Typical accuracy |
|---|---|---|---|---|
| 1. Naive / last-period | Quant | Minimal | Baseline to beat, very stable items | Low–Medium |
| 2. Moving average | Quant | 3–12 periods | Smooth, slow-moving items | Medium |
| 3. Exponential smoothing (SES) | Quant | 1–2 yrs | Smooth demand, no trend/season | Medium |
| 4. Holt-Winters (triple exp.) | Quant | 2–3 yrs | Trend + seasonality | Medium–High |
| 5. ARIMA / SARIMA | Quant | 2–3 yrs | Strong autocorrelation, seasonality | Medium–High |
| 6. Croston's / TSB | Quant | Sparse history | Intermittent, spare parts | Medium (for lumpy) |
| 7. Causal / regression | Quant | History + drivers | Price, promo, weather-driven | High (if drivers known) |
| 8. Machine learning (GBM, etc.) | Quant | Large, clean data | Many SKUs, rich features | High (at scale) |
| 9. Sales-force composite | Qual | Rep input | B2B, project demand, new accounts | Variable |
| 10. Delphi / expert panel | Qual | Expert time | New products, no history | Variable |
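To make rows 2, 3, and 6 of the table concrete, here is a minimal Python sketch of a moving average, simple exponential smoothing, and Croston's method. The function names, the default smoothing constants, and the way Croston's first interval is initialized are illustrative assumptions, not a reference implementation; a production setup would normally lean on a tested forecasting library.

```python
from statistics import mean


def moving_average(history, window=3):
    """Forecast next period as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    return mean(history[-window:])


def ses(history, alpha=0.2):
    """Simple exponential smoothing: level = alpha * y + (1 - alpha) * level.

    Assumption: the level is initialized with the first observation.
    """
    level = float(history[0])
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def croston(history, alpha=0.1):
    """Croston's method for intermittent demand.

    Smooths the non-zero demand sizes and the intervals between them
    separately, then forecasts demand per period as size / interval.
    Assumption: size and interval are initialized from the first non-zero
    demand, counting periods from the start of the history.
    """
    size = None
    interval = None
    periods_since = 1  # periods elapsed since the last non-zero demand
    for y in history:
        if y > 0:
            if size is None:
                size, interval = float(y), float(periods_since)
            else:
                size = alpha * y + (1 - alpha) * size
                interval = alpha * periods_since + (1 - alpha) * interval
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return 0.0  # no demand observed at all
    return size / interval


if __name__ == "__main__":
    # Toy spare-part history: 6 units every third period (true rate: 2 per period).
    history = [0, 0, 6, 0, 0, 6, 0, 0, 6]
    print("moving average:", moving_average(history))
    print("SES:", round(ses(history), 3))
    print("Croston:", round(croston(history), 3))
```

On this toy series Croston's estimate sits at about 2 units per period, the true average rate, while SES evaluated right after an order lands somewhat higher. That is only one hand-picked series, so treat it as an illustration of the mechanics rather than evidence of accuracy.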
How to choose: a decision rule that works
Forget model worship. Segment your SKUs first, then assign:
- Smooth, high-volume A-items (CV < 0.5, where CV is the coefficient of variation: standard deviation of demand divided by mean demand): Holt-Winters or exponential smoothing is plenty. If price and promotion swing demand, layer causal regression on top. Don't reach for ML here: the lift over a well-tuned statistical model is usually small and the maintenance cost is real.
- Intermittent / spare parts (CV > 1.0): Croston's method or TSB. Standard exponential smoothing tends to over-forecast lumpy demand, because its estimate is at its highest right after each order, which is usually when replenishment gets triggered, and it quietly builds dead stock. This is the single most common error I see on aftermarket portfolios.
- Promo- and price-driven items: causal regression or ML with the drivers fed in. The residual error on these is almost always the promo lift, so model the lift directly.
- New products: qualitative. Use analog (like-item) modeling off a comparable SKU's launch curve, plus sales-force or expert input. No statistical model can forecast what has no history.
- Lumpy B2B / project demand: sales-force composite, but discipline it with forecast value added so you can see whether the reps' input beats a naive baseline. Often it doesn't.
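The decision rule above can be sketched as a small Python function. The CV thresholds (0.5 and 1.0) come from the list; everything else is an assumption made for illustration: the function and flag names, the order in which the checks run, the use of population standard deviation, and the "review manually" bucket for SKUs with CV between 0.5 and 1.0, a band the rule above does not address.

```python
from statistics import mean, pstdev


def coefficient_of_variation(history):
    """CV = standard deviation of demand / mean demand."""
    avg = mean(history)
    if avg == 0:
        return float("inf")  # all-zero history: treat as maximally erratic
    return pstdev(history) / avg


def assign_method(history, is_new=False, promo_driven=False, project_based=False):
    """Map one SKU to a forecasting approach using the segments in the list.

    The flags are hypothetical inputs a planner would maintain per SKU.
    """
    if is_new or len(history) == 0:
        return "qualitative: analog modeling + sales/expert input"
    if project_based:
        return "sales-force composite, checked with FVA"
    cv = coefficient_of_variation(history)
    if cv > 1.0:
        return "Croston / TSB"
    if promo_driven:
        return "causal regression or ML with drivers"
    if cv < 0.5:
        return "exponential smoothing / Holt-Winters"
    return "review manually"  # 0.5 <= CV <= 1.0 is not covered by the rule


if __name__ == "__main__":
    portfolio = {
        "A-100 (steady seller)": dict(history=[100, 110, 95, 105, 98, 102]),
        "SP-77 (spare part)": dict(history=[0, 0, 6, 0, 0, 6, 0, 0, 6]),
        "N-001 (launch)": dict(history=[], is_new=True),
    }
    for sku, kwargs in portfolio.items():
        print(sku, "->", assign_method(**kwargs))
```

Running the segmentation as code rather than in a planner's head has one practical benefit: the assignment is repeatable, so when a SKU drifts from smooth to lumpy the method changes with it.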
Where machine learning actually earns its keep
ML demand forecasting is oversold for mid-market manufacturers. It earns its keep only when three conditions hold together:
- Scale. Thousands of SKUs where hand-tuning statistical models per item isn't feasible, so a single model that learns across the portfolio wins on labor alone.
- Rich features. You actually have price, promo, weather, web traffic, or macro signals to feed it. ML with no features is just a slower moving average.
- Clean, deep data. Garbage in, confident garbage out — and ML hides its garbage better than a transparent statistical model does.
If you can't check all three, a disciplined statistical-plus-causal approach beats a half-baked ML project, and it's explainable when the CFO asks why the number moved. Where ML does shine in practice is platforms that combine it with planner-friendly scenario modeling — running an AI baseline across the whole portfolio, then letting planners apply judgment and measure whether that judgment adds value. That's the model PlanForge implements on Pigment.
The method nobody lists: ensemble + FVA
The highest-accuracy approach isn't a method on the list — it's running several, picking the best per SKU automatically (a champion-challenger setup), and then measuring forecast value added so human overrides are only kept when they beat the machine. In the teams I've run, this two-step discipline moved accuracy more than swapping any single algorithm. The model matters less than the process around it.
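A minimal Python sketch of that two-step discipline follows. It is an illustration under stated assumptions, not the setup used in any particular team or platform: the three candidate methods, the four-period rolling holdout, mean absolute error as the scoring metric, and the definition of FVA as baseline MAE minus adjusted MAE are all choices made for the example.

```python
from statistics import mean


def naive(history):
    """Last observed value."""
    return history[-1]


def moving_average(history, window=3):
    """Mean of the last `window` observations."""
    return mean(history[-window:])


def ses(history, alpha=0.3):
    """Simple exponential smoothing, level initialized at the first value."""
    level = float(history[0])
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


CANDIDATES = {"naive": naive, "moving_average": moving_average, "ses": ses}


def backtest_mae(method, series, holdout=4):
    """One-step-ahead rolling backtest over the last `holdout` periods.

    Assumes len(series) is comfortably larger than `holdout`, so every
    candidate has enough history at each step.
    """
    errors = []
    for t in range(len(series) - holdout, len(series)):
        forecast = method(series[:t])  # only data before period t is visible
        errors.append(abs(series[t] - forecast))
    return mean(errors)


def pick_champion(series, candidates=CANDIDATES, holdout=4):
    """Champion-challenger: return the lowest-MAE method name and all scores."""
    scores = {name: backtest_mae(fn, series, holdout) for name, fn in candidates.items()}
    return min(scores, key=scores.get), scores


def forecast_value_added(actuals, baseline, adjusted):
    """FVA = MAE(baseline) - MAE(adjusted); positive means the override helped."""
    mae_baseline = mean(abs(a - b) for a, b in zip(actuals, baseline))
    mae_adjusted = mean(abs(a - f) for a, f in zip(actuals, adjusted))
    return mae_baseline - mae_adjusted


if __name__ == "__main__":
    series = [10, 12, 14, 16, 18, 20, 22, 24]
    champion, scores = pick_champion(series)
    print("champion:", champion, scores)
    print("FVA:", forecast_value_added([100, 120], [90, 110], [100, 115]))
```

The same `forecast_value_added` check applies to planner overrides and to sales-force input alike: keep the adjustment where the number is positive over a meaningful run of periods, and drop it where it is not.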
The bottom line
Demand forecasting methods aren't a menu where one wins. Smooth items want exponential smoothing or Holt-Winters; lumpy items want Croston's; promo-driven items want causal models; new products want judgment; and a large, feature-rich portfolio is where ML pays off. The real edge is matching method to demand profile, then policing it with forecast value added so you keep only the human touches that actually help.
Not sure which methods fit your portfolio? PlanForge runs a free planning-maturity and stranded-inventory teardown — we profile your SKUs by demand variability, tell you which method belongs on each segment, and show where your current approach is building dead stock. Book a 30-minute call and we'll map your portfolio together.