Assumed background
This page is an implementation recipe for walk-forward optimization (WFO) on crypto strategies. It does not re-explain the concept — for that, see the canonical primer at /learn/walk-forward-optimization, which covers the Pardo formulation and the general mechanic. The job here is the crypto-specific implementation: window sizing for regime shifts, anchored vs rolling on volatile assets, OOS interpretation thresholds, and the gap between what’s shipped today and what’s on the roadmap.
IS/OOS window sizing for crypto
The dominant constraint is regime duration. Crypto regimes — funding bias, realized vol level, cross-sectional dispersion — typically last 2-6 months and then shift. Two implications follow:
- IS window must span ≥2 regimes, which puts a practical floor around 6-12 months. Shorter and you fit a single regime that may not generalize.
- OOS window of 1-3 months per fold gives the strategy a chance to see one regime change without diluting the fold count too much. Shorter than ~30 days (crypto trades every calendar day, so trading days and calendar days coincide) and the OOS Sharpe estimate is dominated by noise.
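The noise floor is easier to see with numbers. The sketch below uses the standard large-sample approximation for the standard error of a Sharpe ratio estimated from T i.i.d. returns, SE ≈ sqrt((1 + SR²/2) / T) in per-period units, then scales it to annual units. Daily bars and 365 periods per year are assumptions. Crypto returns are fat-tailed and autocorrelated, so treat these figures as optimistic lower bounds on the noise.

```python
import math

def annualized_sharpe_se(annual_sharpe: float, oos_days: int,
                         periods_per_year: int = 365) -> float:
    """Approximate standard error of an annualized Sharpe ratio measured on
    `oos_days` daily returns, assuming i.i.d. normal returns."""
    # Convert the annualized Sharpe to a per-period (daily) Sharpe.
    sr_period = annual_sharpe / math.sqrt(periods_per_year)
    # Large-sample i.i.d. approximation: SE = sqrt((1 + SR^2 / 2) / T).
    se_period = math.sqrt((1.0 + 0.5 * sr_period ** 2) / oos_days)
    # Scale the per-period standard error back to annual units.
    return se_period * math.sqrt(periods_per_year)

for days in (14, 30, 90):
    print(days, round(annualized_sharpe_se(1.0, days), 2))
```

Under these assumptions, a 30-day fold with a true annualized Sharpe of 1.0 has a standard error of roughly 3.5, and a 90-day fold still sits around 2.0. No single fold is conclusive, which is why the per-fold distribution matters more than any one OOS number.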
The second constraint is history available per asset. Hyperliquid perps that listed in mid-2023 have ~36 months by now; perps listed in 2025 have 6-12 months. A rolling WFO schedule with 12 months of IS, 3 months of OOS, and 6 folds needs 12 + (6 × 3) = 30 months of history per name, which excludes ~half the listings. The honest move is to run two universes: a long-history universe for the rigorous WFO run, and a shorter schedule (e.g. 6mo IS / 1mo OOS / 4 folds, which needs 6 + 4 = 10 months) for the broader universe.
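The history arithmetic and the two-universe split can be sketched as follows. The sketch assumes a rolling schedule that steps forward one OOS window per fold; `Schedule`, `assign_universe`, and the `ASSET_*` names are hypothetical placeholders, not real tickers or data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Schedule:
    is_months: int
    oos_months: int
    folds: int

    def months_required(self) -> int:
        # One IS window up front, then `folds` back-to-back OOS windows.
        return self.is_months + self.folds * self.oos_months

RIGOROUS = Schedule(is_months=12, oos_months=3, folds=6)  # 12 + 18 = 30 months
BROAD = Schedule(is_months=6, oos_months=1, folds=4)      # 6 + 4 = 10 months

def assign_universe(history_months: dict[str, int]) -> dict[str, str]:
    """Route each asset to the most rigorous schedule its history supports."""
    out = {}
    for asset, months in history_months.items():
        if months >= RIGOROUS.months_required():
            out[asset] = "rigorous"
        elif months >= BROAD.months_required():
            out[asset] = "broad"
        else:
            out[asset] = "excluded"  # not enough history for either schedule
    return out

print(assign_universe({"ASSET_A": 36, "ASSET_B": 11, "ASSET_C": 4}))
```

Keeping the schedules as data makes it cheap to re-check eligibility whenever the window sizes change.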
Anchored vs rolling for crypto specifically
The two schemes solve different problems.
| Scheme | IS window | Right for crypto when |
|---|---|---|
| Anchored | Fixed start date; grows each fold | Limited history; want every fold to use all data; strategy is regime-agnostic |
| Rolling | Fixed width; slides each fold | 18+ months of history; want adaptation to regime shifts; willing to discard old data |
| Hybrid | Anchored until threshold, then rolling | Mid-history asset; want stability early, adaptation later |
The crypto-specific bias: rolling tends to win on majors (BTC, ETH, SOL, HYPE) because regime shifts are frequent enough that older data is genuinely stale, and anchored tends to win on newer listings because there isn’t enough history for rolling to discard anything. Hybrid splits the difference for mid-history names.
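The three schemes differ only in where each fold's IS window starts. A minimal sketch in whole-month offsets from the start of an asset's history, assuming each fold steps forward by one OOS window; `make_folds` and `max_is_months` (the hybrid switchover cap) are hypothetical names.

```python
from typing import Literal, NamedTuple, Optional

Mode = Literal["anchored", "rolling", "hybrid"]

class Fold(NamedTuple):
    is_start: int  # inclusive, months from start of history
    is_end: int    # exclusive; also the OOS start
    oos_end: int   # exclusive

def make_folds(total_months: int, is_months: int, oos_months: int,
               mode: Mode = "rolling",
               max_is_months: Optional[int] = None) -> list[Fold]:
    """Build IS/OOS fold boundaries, stepping forward one OOS window per fold."""
    if oos_months <= 0:
        raise ValueError("oos_months must be positive")
    if mode == "hybrid" and max_is_months is None:
        raise ValueError("hybrid mode needs max_is_months")
    folds = []
    is_end = is_months
    while is_end + oos_months <= total_months:
        if mode == "anchored":
            is_start = 0                               # fixed start; IS grows
        elif mode == "rolling":
            is_start = is_end - is_months              # fixed width; IS slides
        else:
            is_start = max(0, is_end - max_is_months)  # grow to the cap, then slide
        folds.append(Fold(is_start, is_end, is_end + oos_months))
        is_end += oos_months
    return folds

for fold in make_folds(30, 12, 3, mode="hybrid", max_is_months=18):
    print(fold)
```

With 30 months of history, 12-month IS, and 3-month OOS, every mode yields the same 6 OOS windows; only the IS start differs. That makes it easy to run anchored, rolling, and hybrid side by side on identical OOS slices.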
A second axis worth deciding up front: the parameter selection rule. Vanilla WFO picks the single best IS parameter set and applies it to the next OOS window. That is fragile: the “winner” on the IS slice is often a thin margin over the next-best set and degrades to noise out-of-sample. Two more robust selection rules: take the median of the top-decile parameter sets by IS metric, or take the parameter set whose IS rank varies least across folds with similar realized volatility. Either dampens fold-to-fold parameter jumps and improves carry-over.
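Both selection rules fit in a few lines. In this sketch, choosing which folds count as "similar realized volatility" is left to the caller. One addition beyond the text: the rank-variance rule first keeps only sets with a good mean rank (`top_frac`, a hypothetical knob), because a set that always ranks last also has zero rank variance. Function names and input shapes are illustrative assumptions.

```python
import math
import statistics

def median_of_top_decile(results: list[tuple[dict[str, float], float]]) -> dict[str, float]:
    """results: (params, IS metric) pairs from a grid search; higher is better."""
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    k = max(1, math.ceil(len(ranked) * 0.10))  # top 10%, at least one set
    top = [params for params, _ in ranked[:k]]
    # Per-parameter median; may fall between grid points, so snap if needed.
    return {name: statistics.median(p[name] for p in top) for name in top[0]}

def lowest_rank_variance(scores_by_fold: list[dict[str, float]],
                         top_frac: float = 0.25) -> str:
    """scores_by_fold: one {param_set_id: IS metric} dict per comparable fold.
    Returns the id whose IS rank is most stable among well-ranked sets."""
    ranks: dict[str, list[int]] = {}
    for fold_scores in scores_by_fold:
        order = sorted(fold_scores, key=fold_scores.get, reverse=True)
        for rank, pid in enumerate(order, start=1):
            ranks.setdefault(pid, []).append(rank)
    mean_rank = {pid: statistics.mean(r) for pid, r in ranks.items()}
    # Keep only sets whose mean rank falls in the best `top_frac`.
    idx = min(len(mean_rank) - 1, max(0, math.ceil(len(mean_rank) * top_frac) - 1))
    cutoff = sorted(mean_rank.values())[idx]
    candidates = [pid for pid in ranks if mean_rank[pid] <= cutoff]
    # Lowest rank variance wins; better mean rank breaks ties.
    return min(candidates,
               key=lambda pid: (statistics.pvariance(ranks[pid]), mean_rank[pid]))
```

The median rule suits continuous parameters. The rank-variance rule also works for discrete or categorical settings, where a median is not meaningful.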
How many windows
More folds give tighter OOS Sharpe estimates and a clearer degradation pattern, but at the cost of compute (every fold is a full optimization run). Practical ranges:
- 4-6 folds: the minimum useful count; enough to see whether OOS Sharpe degradation is consistent or one-off.
- 8-12 folds: the sweet spot for crypto with 2+ years of history; at ~1 month of OOS per fold, that gives ~8-12 months of aggregate OOS history to score on.
- 20+ folds: useful only for fast-rebalance strategies (intraday on 15m bars) where each OOS window has hundreds of trades; otherwise the per-fold Sharpe estimates are too noisy to compare.
Interpreting OOS results
Three diagnostics that matter more than headline OOS Sharpe:
- Degradation ratio (OOS Sharpe / IS Sharpe). Healthy range: 0.5-0.7 across most folds. Below 0.5 means the optimization is over-fitting; above 1.0 across folds is suspicious (usually the OOS happened to be a friendly regime).
- OOS Sharpe consistency. A strategy with OOS Sharpes of [0.4, 0.6, 0.5, 0.5, 0.6] is far more trustworthy than one with [2.1, -0.3, 1.4, -0.5, 0.8] even if the second has higher mean. Look at the per-fold distribution, not just the mean.
- Parameter stability across folds. If the optimal lookback window jumps from 5 days to 60 days to 12 days across consecutive folds, the underlying strategy doesn’t have a stable edge — you’re fitting regime-conditional noise.
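All three diagnostics can be computed from a per-fold record of IS Sharpe, OOS Sharpe, and the winning parameters. The record keys here are hypothetical. The sketch skips the degradation ratio for folds with non-positive IS Sharpe, where the ratio is meaningless. It measures stability as the largest fold-to-fold move divided by the parameter's median value, which is one reasonable choice among several.

```python
import statistics

def wfo_diagnostics(folds: list[dict]) -> dict:
    """folds: dicts with 'is_sharpe', 'oos_sharpe', and 'params' (winning values)."""
    # 1. Degradation ratio (OOS / IS), only where IS Sharpe is positive.
    ratios = [f["oos_sharpe"] / f["is_sharpe"] for f in folds if f["is_sharpe"] > 0]
    healthy = sum(0.5 <= r <= 0.7 for r in ratios) / len(ratios) if ratios else 0.0
    # 2. OOS consistency: report the spread alongside the mean.
    oos = [f["oos_sharpe"] for f in folds]
    oos_stdev = statistics.stdev(oos) if len(oos) > 1 else 0.0
    # 3. Parameter stability: largest consecutive jump relative to the median.
    jumps = {}
    for name in folds[0]["params"]:
        values = [f["params"][name] for f in folds]
        biggest = max((abs(b - a) for a, b in zip(values, values[1:])), default=0.0)
        med = statistics.median(values)
        if med:
            jumps[name] = biggest / abs(med)
        else:
            jumps[name] = 0.0 if biggest == 0 else float("inf")
    return {
        "degradation_ratios": ratios,
        "share_in_healthy_band": healthy,
        "oos_mean": statistics.mean(oos),
        "oos_stdev": oos_stdev,
        "max_relative_param_jump": jumps,
    }
```

As a check against the examples in the text: the lookback sequence 5, 60, 12 scores a maximum relative jump of 55 / 12 ≈ 4.6. The OOS series [2.1, -0.3, 1.4, -0.5, 0.8] shows a far larger standard deviation than [0.4, 0.6, 0.5, 0.5, 0.6].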
The browser-side walk-forward visualizer takes a CSV of returns plus window/step parameters and renders all three diagnostics — IS vs OOS bars per fold, degradation series, and a parameter-stability strip. Useful for quickly triaging a candidate strategy without building a full optimization harness.
Doing this with Keel today
Single-window backtests are the primitive. In the Keel web app, set start and end dates on a strategy in the builder and click Run Backtest — the run lands on the same detail page with the equity curve, metrics, and fills. You can fork a shared backtest, change the date range, and re-run to compare IS vs OOS slices side by side.
Native WFO has not shipped yet. Until it does, you can approximate walk-forward by running a sequence of single-window backtests with manually rolled IS/OOS dates. For each fold: run an IS backtest, eyeball or grid the parameters, update the strategy, run a second backtest on the OOS window with those parameters, and save the OOS metrics. Then aggregate the per-fold OOS metrics for the degradation plot.
This works but is fragile — you maintain the schedule yourself, the parameter handoff is hand-rolled, and the aggregate report is whatever you build. For a one-off rigor check on a candidate strategy, the effort is justified. For ongoing research it is not.
Scripting the manual sequence? Running pipx install keel-trade gives you the keel backtest run and keel backtest results commands for IS/OOS automation from a terminal or AI agent; see the CLI reference.
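A minimal sketch of the manual sequence as a script. `run_backtest(params, start, end)` is a hypothetical callable returning a dict with a `"sharpe"` key. Nothing here calls Keel; wiring it to the CLI is left to the CLI reference, whose flags are not reproduced here. `toy_backtest` is a stand-in scoring rule, included only so the loop runs end to end.

```python
from itertools import product
from typing import Callable

# Hypothetical signature: run_backtest(params, start, end) -> {"sharpe": float, ...}
Backtest = Callable[[dict, str, str], dict]

def manual_wfo(run_backtest: Backtest, grid: dict[str, list],
               folds: list[tuple[str, str, str]]) -> list[dict]:
    """folds: (is_start, is_end, oos_end) date strings; OOS starts at is_end."""
    names = list(grid)
    combos = [dict(zip(names, values)) for values in product(*grid.values())]
    report = []
    for is_start, is_end, oos_end in folds:
        # 1. Grid every parameter set on the IS window.
        scored = [(p, run_backtest(p, is_start, is_end)["sharpe"]) for p in combos]
        best_params, is_sharpe = max(scored, key=lambda s: s[1])
        # 2. Freeze the winner and score it once on the untouched OOS window.
        oos_sharpe = run_backtest(best_params, is_end, oos_end)["sharpe"]
        report.append({"is_window": (is_start, is_end), "oos_window": (is_end, oos_end),
                       "params": best_params, "is_sharpe": is_sharpe,
                       "oos_sharpe": oos_sharpe})
    return report

def toy_backtest(params: dict, start: str, end: str) -> dict:
    # Demonstration only: peaks at lookback=20 regardless of dates.
    return {"sharpe": 1.0 - abs(params["lookback"] - 20) / 20}

folds = [("2024-01-01", "2025-01-01", "2025-04-01"),
         ("2024-04-01", "2025-04-01", "2025-07-01")]
for row in manual_wfo(toy_backtest, {"lookback": [10, 20, 40]}, folds):
    print(row["oos_window"], row["params"], row["is_sharpe"], row["oos_sharpe"])
```

Keeping the schedule, parameter handoff, and report in one loop removes the three fragile manual steps. It does not change the number of backtests you pay for: grid size × folds IS runs, plus one OOS run per fold.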
When native WFO ships
On the roadmap, no committed date. The intent is a one-command walk-forward run that takes IS/OOS window parameters, anchored/rolling mode, and a parameter grid, runs the full schedule across all folds, and emits a per-fold report (IS metrics, winning params, OOS metrics) plus aggregate degradation and parameter-stability plots. Sign up with ?notify=wfo to be tagged for the launch announcement.
Until then, the visualizer plus the manual sequence above are the practical answer. WFO is the diagnostic that separates a real edge from a fit one; the discipline is worth the friction.
Crypto-specific WFO pitfalls
- Optimizing the regime gate inside the IS window. If the WFO fits both the strategy parameters and the regime threshold on the same IS window, the “regime gate” degenerates into an extra fit parameter. Treat regime thresholds as priors, set them once globally on the full dataset (or hand-pick them), and only optimize the strategy parameters per fold.
- Survivorship across folds. Universe selection (e.g. “top 30 by volume”) computed once on the full history and applied to every fold leaks information across folds: assets that graduated to the top 30 by fold N were not top 30 in fold 1. Recompute the universe at the start of each OOS window using only data available at that point.
- Funding cadence misalignment in OOS scoring. The OOS window has to start on a funding-cadence boundary that matches the IS one. Starting OOS mid-funding-interval creates a partial-period bias in the first scored bar that accumulates across folds.
- Treating high OOS Sharpe as good news. If OOS Sharpe consistently exceeds IS Sharpe, the most likely explanation is that the OOS window happens to be a friendly regime — not that the strategy generalizes well. Healthy WFO shows OOS at 50-70% of IS; consistently higher is a yellow flag, not a green one.
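Two of these pitfalls are mechanical enough to guard in code. This is a sketch under stated assumptions. The universe is ranked by trailing daily volume, using only days strictly before the cutoff; the 30-day lookback and top-30 size are illustrative. Funding is assumed to settle on UTC multiples of `interval_hours`, which must divide 24. The 1-hour default is an assumption, so set it to your venue's cadence. Both function names are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

def point_in_time_universe(daily_volume: dict[str, dict[date, float]], as_of: date,
                           lookback_days: int = 30, top_n: int = 30) -> list[str]:
    """Top-N assets by trailing volume, using only days strictly before `as_of`."""
    window_start = as_of - timedelta(days=lookback_days)
    totals = {}
    for asset, series in daily_volume.items():
        vol = sum(v for d, v in series.items() if window_start <= d < as_of)
        if vol > 0:  # assets with no trading yet cannot be in the universe
            totals[asset] = vol
    return sorted(totals, key=totals.get, reverse=True)[:top_n]

def next_funding_boundary(ts: datetime, interval_hours: int = 1) -> datetime:
    """Round `ts` up to the next funding timestamp (assumed UTC multiples of
    `interval_hours` since midnight); returns `ts` if it is already on one."""
    if ts.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    ts = ts.astimezone(timezone.utc)
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(hours=interval_hours)
    n = -((day_start - ts) // step)  # ceiling division on timedeltas
    return day_start + n * step
```

Call `point_in_time_universe` with each fold's OOS start as `as_of`. Pass both the IS and OOS start timestamps through `next_funding_boundary` with the same `interval_hours`, so the two windows share a cadence.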
Try it
Use the browser-side WFO visualizer to inspect a candidate strategy’s OOS degradation in seconds, or sign up to be notified when the Keel-native walk-forward scheduler ships.