Reserve a chunk of history the strategy never touches in optimization or selection. Train/test/holdout splits are the baseline; k-fold needs time-aware variants; walk-forward generalizes to many OOS windows. Short HL histories on newer perps constrain how much data you can hold out — plan accordingly.
Out-of-sample (OOS) testing is the cheapest defense against the most expensive kind of mistake: deploying a strategy whose backtest performance was an artifact of the optimization window. The basic idea is simple: set aside a piece of historical data that the strategy never sees during any parameter selection or signal construction step. Performance on that data is your honest estimate of what live trading would have looked like during that window.
The difficulty in crypto specifically is data-history length. Mature pairs (BTC, ETH) have years of high-quality data. Newer HL listings often have months. The amount of data you can reserve for OOS is constrained, and the OOS window itself may be dominated by a single regime. Both shape how to design the validation.
OOS is defined by what it is not: data the strategy has not touched in any of the following steps:
Any contact between the strategy-building process and the OOS data leaks information from OOS into the strategy, and the verification stops being honest. This is harder to maintain than it sounds — the temptation to re-tune after a disappointing OOS is exactly what breaks the validation.
Simple holdout. Split history into two contiguous pieces, typically 70–80% in-sample (IS) and 20–30% out-of-sample. Build and tune the strategy on IS. Run the frozen strategy once on OOS. Compare performance. This is the simplest form, with the lowest computational cost. Limitation: it yields only one realization of OOS performance. If the OOS window happens to be regime-mismatched (a chop window for a trend strategy), the result is unfairly bad; if it happens to be regime-matched, the result is unfairly good.
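A minimal sketch of the chronological split described above, assuming the bars are already sorted oldest to newest. The names `holdout_split` and `oos_fraction` are illustrative and not part of any Keel API.

```python
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(bars: Sequence[T], oos_fraction: float = 0.25) -> Tuple[Sequence[T], Sequence[T]]:
    """Split chronologically ordered bars into contiguous IS and OOS pieces.

    No shuffling: the OOS piece is always the most recent part of history.
    """
    if not 0.0 < oos_fraction < 1.0:
        raise ValueError("oos_fraction must be strictly between 0 and 1")
    split = int(len(bars) * (1.0 - oos_fraction))
    if split == 0 or split == len(bars):
        raise ValueError("not enough bars to form both an IS and an OOS piece")
    return bars[:split], bars[split:]


# Hypothetical data: 1,000 bar indices standing in for real bars.
bars = list(range(1000))
in_sample, out_of_sample = holdout_split(bars, oos_fraction=0.25)
```

Tune only on `in_sample`, then run the frozen strategy once on `out_of_sample`.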
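K-fold cross-validation (time-aware). Standard k-fold shuffles rows into folds, which leaks future information into training. For time-series data, you need time-aware variants: blocked k-fold, where folds are contiguous time windows, and purged-and-embargoed k-fold, which purges training samples whose feature lookback overlaps a test window and embargoes a buffer after each test fold.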
Walk-forward. A constrained, chronological form of blocked k-fold. IS and OOS windows slide forward through history, and each OOS window is unseen at the time its parameters are selected. The aggregate OOS performance approximates what continuous re-optimization would have looked like live. Walk-forward is the gold standard for parameter-tuned strategies; see the WFO explainer for the full procedure.
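The sliding IS/OOS pairs can be sketched as a small window generator. This is an illustration under simple assumptions (fixed window lengths measured in bars, the step equal to the OOS length), not the platform's procedure; `walk_forward_windows` and its parameters are hypothetical names.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_bars: int, is_len: int, oos_len: int, anchored: bool = False
) -> Iterator[Tuple[range, range]]:
    """Yield (IS, OOS) index ranges that advance chronologically.

    Rolling mode keeps the IS length fixed; anchored mode always starts
    the IS window at bar 0, so it grows with each step.
    """
    start = 0
    while start + is_len + oos_len <= n_bars:
        is_start = 0 if anchored else start
        is_end = start + is_len
        # The OOS window starts exactly where the IS window ends.
        yield range(is_start, is_end), range(is_end, is_end + oos_len)
        start += oos_len  # step by one OOS window so OOS windows never overlap


windows = list(walk_forward_windows(n_bars=1000, is_len=400, oos_len=100))
anchored_windows = list(walk_forward_windows(1000, 400, 100, anchored=True))
```

Parameters are selected on each IS range and applied unchanged to the paired OOS range. Concatenating the OOS results gives the aggregate OOS performance.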
Crypto data on Hyperliquid imposes constraints that don’t exist for equities. The validation design has to bend around them.
The expected degradation from IS to OOS Sharpe is real and predictable:
| OOS / IS ratio | Interpretation |
|---|---|
| ≥ 0.7 | unusually robust; suspect insufficient IS optimization |
| 0.5–0.7 | healthy; the expected outcome for a well-validated strategy |
| 0.3–0.5 | degraded; edge survives but smaller than IS suggested |
| 0.0–0.3 | heavy overfit; the IS number was mostly noise |
| < 0 | broken; the strategy has no real edge |

A 50–70% retention is the realistic baseline. Practitioners new to OOS often expect retention near 100% and reject strategies that show 60% retention, but 60% is what success looks like. The strategies you should worry about are the ones showing 95% retention from a 10,000-trial grid search; that pattern usually means the OOS window leaked into the IS optimization somehow.
Equally important: a strategy that fails on a single OOS window may still be viable on others. Use walk-forward’s multiple OOS windows to estimate the distribution of OOS Sharpe, not just one realization.
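As a rough illustration, the retention bands can be applied to each walk-forward window and then summarized across windows. The function name, the sample Sharpe pairs, and the choice to assign each boundary value to the higher band are assumptions made for this sketch.

```python
from statistics import median
from typing import List, Tuple


def classify_retention(is_sharpe: float, oos_sharpe: float) -> str:
    """Map an OOS/IS Sharpe ratio onto the interpretation bands."""
    if is_sharpe <= 0:
        raise ValueError("IS Sharpe must be positive for the ratio to be meaningful")
    ratio = oos_sharpe / is_sharpe
    if ratio < 0:
        return "broken"
    if ratio < 0.3:
        return "heavy overfit"
    if ratio < 0.5:
        return "degraded"
    if ratio < 0.7:
        return "healthy"
    return "unusually robust"


# Hypothetical (IS Sharpe, OOS Sharpe) pairs from four walk-forward windows.
pairs: List[Tuple[float, float]] = [(2.0, 1.3), (1.8, 0.5), (2.2, 1.4), (1.5, -0.2)]
labels = [classify_retention(is_s, oos_s) for is_s, oos_s in pairs]
median_ratio = median(oos_s / is_s for is_s, oos_s in pairs)
```

Looking at `labels` and `median_ratio` together shows the spread of outcomes rather than a single realization.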
A practical workflow for an HL strategy with ~18 months of data:
Keel today ships single-window optimization plus the walk-forward visualizer; rolling/anchored WFO and holdout enforcement are on the roadmap. The discipline of manually rolling windows and reserving an untouched holdout is on you, not on the platform.
The Walk-Forward Visualizer takes a returns series and lets you inspect IS-vs-OOS Sharpe across a sequence of walk-forward windows. Useful for getting an intuition for how much of an in-sample Sharpe number typically survives out-of-sample, and how much the result swings between adjacent windows when underlying conditions shift.
Further reading: Pardo (2008), The Evaluation and Optimization of Trading Strategies (2nd ed.) is the canonical reference on walk-forward methodology. Bailey, Borwein, López de Prado & Zhu (2014), Pseudo-Mathematics and Financial Charlatanism formalizes the selection-bias problem that OOS testing exists to address.
Conventional rule of thumb: 20–30% of total history reserved as a final holdout that the strategy never touches during parameter selection or any optimization step. For short HL histories — say a perp with 18 months of data — that means roughly 4–5 months untouched at the end. On longer-history pairs (BTC, ETH back several years), you can afford a 30% holdout while still leaving enough in-sample for meaningful parameter estimates. The key constraint is trade count: in-sample needs at least 200 trades for parameter estimates to be statistically meaningful; OOS needs at least 50 for the verification to be credible.
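The sizing rule and the trade-count floors can be checked with a few lines of arithmetic. This sketch takes the 200 and 50 trade minimums from the text as defaults; the function name and the example trade counts are made up for illustration.

```python
from typing import Dict, Union


def check_split(
    total_months: float,
    holdout_fraction: float,
    is_trades: int,
    oos_trades: int,
    min_is_trades: int = 200,
    min_oos_trades: int = 50,
) -> Dict[str, Union[float, bool]]:
    """Report holdout length and whether both pieces have enough trades."""
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    return {
        "holdout_months": total_months * holdout_fraction,
        "is_ok": is_trades >= min_is_trades,
        "oos_ok": oos_trades >= min_oos_trades,
    }


# Hypothetical perp with 18 months of history and a 25% holdout.
report = check_split(total_months=18, holdout_fraction=0.25, is_trades=240, oos_trades=60)
thin = check_split(total_months=18, holdout_fraction=0.25, is_trades=240, oos_trades=30)
```

If `oos_ok` is false, the holdout is too thin to be credible, even when its length in months looks reasonable.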
Standard k-fold cross-validation is not safe for time-series backtests. It randomly shuffles rows into folds, which leaks information from future bars into the training folds: a strategy fitted on shuffled k-fold can use information from after the test bar during training. For time-series data you have to use time-aware variants: blocked k-fold (folds are contiguous time windows), or purged-and-embargoed k-fold (purges training samples that overlap test windows in feature lookback, embargoes a buffer after each test fold). Walk-forward is functionally a constrained form of blocked k-fold where folds advance chronologically.
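A simplified sketch of blocked k-fold with purging and an embargo, working on bar indices only. It assumes purging means dropping training bars within `lookback` bars of either edge of the test fold, with an extra `embargo` buffer after the fold. Real implementations often purge on label overlap instead, so treat the function and its parameters as illustrative.

```python
from typing import Iterator, List, Tuple


def purged_blocked_kfold(
    n_bars: int, k: int, lookback: int = 0, embargo: int = 0
) -> Iterator[Tuple[List[int], range]]:
    """Yield (train_indices, test_range) with contiguous, unshuffled test folds."""
    if k < 2 or k > n_bars:
        raise ValueError("k must be between 2 and n_bars")
    edges = [n_bars * i // k for i in range(k + 1)]
    for i in range(k):
        test_start, test_end = edges[i], edges[i + 1]
        train = [
            t
            for t in range(n_bars)
            # Purge bars whose lookback touches the test fold on either side,
            # then embargo a further buffer after the fold.
            if t < test_start - lookback or t >= test_end + lookback + embargo
        ]
        yield train, range(test_start, test_end)


folds = list(purged_blocked_kfold(n_bars=100, k=5, lookback=3, embargo=2))
train_1, test_1 = folds[1]
```

In this example the second fold tests on bars 20 to 39 and trains only on bars 0 to 16 and 45 to 99.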
The honest expectation is OOS Sharpe of 50–70% of IS Sharpe for a well-validated strategy. So an IS Sharpe of 2.5 that produces 1.25–1.75 OOS is doing what you should expect. If OOS Sharpe is within 30% of IS (a ratio of 0.7 or higher), you may have an unusually robust edge, or your IS optimization was unusually well-constrained. If OOS is below 30% of IS, the in-sample number was mostly fitted noise. If OOS goes negative, you overfit hard and the strategy has no real edge. Retaining 50–70% is the baseline: it is what good validation looks like, not a failure mode.
Keel ships single-window parameter optimization today; native walk-forward and strategy-level holdout splits are on the roadmap. The shipped tools let you define a backtest window and run the strategy on a different window for verification, but they do not automatically partition data or block you from re-tuning on the holdout. The discipline of carving an untouched final window — and of approximating walk-forward by running a series of rolling single-window optimizations — is on you, the operator. The `/lab/walk-forward-visualizer` widget renders per-fold IS/OOS results once you have produced them.
Walk-forward generates many OOS windows by sliding IS/OOS pairs through history; each OOS window is unseen at the time its parameters were selected. The aggregate OOS performance across all walk-forward windows is your strategy's expected behavior under continuous re-optimization. A single train/test split is the simplest form of OOS — one IS window, one OOS window, no walking. Walk-forward generalizes this to multiple windows for stronger validation. Final holdout sits above both: a piece of data the strategy never touches in any optimization or walk-forward step, used once as a final sanity check.