A backtest is a simulation of a trading strategy on historical data. It reports total return, Sharpe ratio, max drawdown, and win rate as they would have looked had the strategy been live. Done honestly, it filters out the worst ideas before they cost real money. Done poorly, it gives false confidence in strategies that will fail out-of-sample.
Mechanically, you run the strategy across past prices, record what it would have done, and compute performance metrics. The output is a hypothesis: "if the future resembles the past, this strategy should produce roughly these returns at roughly this risk level."
The word hypothesis is doing work in that sentence. Backtests aren't predictions; they're filters. A strategy that fails in backtest will almost certainly fail live. A strategy that succeeds in backtest might succeed live — but only if the backtest was constructed honestly and the future actually resembles the past. Both are non-trivial conditions.
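The run, record, compute loop described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not how Keel implements backtests: one position per bar, no fees, slippage, or funding, and a hypothetical `momentum_signal` rule standing in for real entry/exit logic. The key constraint is that the signal at bar t only sees prices up to t.

```python
from typing import Callable, List, Sequence


def run_backtest(
    prices: Sequence[float],
    signal: Callable[[Sequence[float]], int],
    start_equity: float = 1.0,
) -> List[float]:
    """Step through prices bar by bar and return the equity curve.

    At bar t, `signal` sees only prices[: t + 1] and returns a position
    (+1 long, 0 flat, -1 short) that is held from prices[t] to prices[t + 1].
    """
    equity = [start_equity]
    for t in range(len(prices) - 1):
        position = signal(prices[: t + 1])            # decide with data up to t
        bar_return = prices[t + 1] / prices[t] - 1.0  # what happens next bar
        equity.append(equity[-1] * (1.0 + position * bar_return))  # record
    return equity


def momentum_signal(history: Sequence[float]) -> int:
    """Hypothetical rule: long after an up bar, flat otherwise."""
    if len(history) < 2:
        return 0
    return 1 if history[-1] > history[-2] else 0


prices = [100.0, 101.0, 103.0, 102.0, 104.0]
curve = run_backtest(prices, momentum_signal)
print([round(x, 4) for x in curve])  # [1.0, 1.0, 1.0198, 1.0099, 1.0099]
```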
The metrics that matter, in priority order:
Beyond the headline metrics, the equity curve itself matters. A smooth equity curve with occasional small drawdowns is psychologically very different from a curve with two big drawdowns separated by long flat periods — even if both reach the same total return.
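The headline metrics can be computed directly from an equity curve and a list of per-trade returns. A minimal sketch follows; it assumes a zero risk-free rate, annualizes Sharpe with a hypothetical `periods_per_year` of 365 (daily bars on a market that trades every day), and reports max drawdown as a negative fraction of the running peak. Definitions vary between platforms, so treat these as illustrative.

```python
import math
from statistics import mean, stdev


def total_return(equity):
    """Final equity relative to starting equity, minus 1."""
    return equity[-1] / equity[0] - 1.0


def sharpe_ratio(equity, periods_per_year=365):
    """Annualized Sharpe of per-period returns, assuming a zero risk-free rate."""
    rets = [b / a - 1.0 for a, b in zip(equity, equity[1:])]
    sd = stdev(rets)  # sample standard deviation; needs at least 2 returns
    return 0.0 if sd == 0 else mean(rets) / sd * math.sqrt(periods_per_year)


def max_drawdown(equity):
    """Worst peak-to-trough decline, as a negative fraction (e.g. -0.1 = 10%)."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def win_rate(trade_returns):
    """Fraction of closed trades with a positive return."""
    return sum(1 for r in trade_returns if r > 0) / len(trade_returns)


equity = [100, 110, 99, 105, 121, 115]
print(round(total_return(equity), 4))          # 0.15
print(round(max_drawdown(equity), 4))          # -0.1
print(round(sharpe_ratio(equity), 2))
print(win_rate([0.02, -0.01, 0.03, -0.02]))    # 0.5
```

Two curves with the same `total_return` can have very different `max_drawdown` values, which is why the curve's shape is worth inspecting alongside the summary numbers.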
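The single most important defense against overfitting is splitting your data: develop the strategy on an in-sample (IS) period, then validate it on an out-of-sample (OOS) period it never saw during development.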
If OOS performance is comparable to IS, that is evidence the strategy generalizes. If OOS performance degrades substantially, the strategy is probably overfit. A common split is 70% IS and 30% OOS, taken in time order, though the right ratio depends on sample size.
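A sketch of a chronological IS/OOS split. The function name `split_in_out_of_sample` and the 70% default are illustrative; the important detail is that the data is cut in time order rather than shuffled, so nothing from the OOS period leaks into development.

```python
from datetime import date, timedelta


def split_in_out_of_sample(rows, is_fraction=0.7):
    """Chronological IS/OOS split of time-sorted rows.

    The first `is_fraction` of rows is in-sample (for development); the
    remainder is out-of-sample (for validation only). No shuffling.
    """
    if not 0.0 < is_fraction < 1.0:
        raise ValueError("is_fraction must be strictly between 0 and 1")
    cut = round(len(rows) * is_fraction)
    return rows[:cut], rows[cut:]


# Hypothetical example: 100 daily bars starting 2023-01-01.
days = [date(2023, 1, 1) + timedelta(days=i) for i in range(100)]
in_sample, out_of_sample = split_in_out_of_sample(days)
print(len(in_sample), len(out_of_sample))  # 70 30
print(in_sample[-1] < out_of_sample[0])    # True
```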
Walk-forward optimization extends this: instead of one fixed split, walk through time in chunks (e.g. 6-month optimization → 3-month OOS validation → roll forward → repeat). Strategies that survive walk-forward have demonstrated robustness across multiple regime shifts. See /learn/walk-forward-optimization.
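The rolling windows in the example above (6 months of optimization, then 3 months of validation) can be generated as follows. This is a hypothetical helper, assuming monthly boundaries on the first of each month and that each pair rolls forward by the validation length so the OOS windows tile the period without overlapping. The optimize and validate steps themselves are left out.

```python
from datetime import date


def add_months(d, n):
    """Shift a first-of-month date forward by n months."""
    years, month_index = divmod(d.month - 1 + n, 12)
    return date(d.year + years, month_index + 1, 1)


def walk_forward_windows(start, end, train_months=6, test_months=3):
    """Yield (train_start, train_end, test_start, test_end) half-open windows.

    Each validation window begins where its optimization window ends, and the
    pair rolls forward by test_months, so validation windows never overlap.
    """
    train_start = start
    while True:
        train_end = add_months(train_start, train_months)
        test_end = add_months(train_end, test_months)
        if test_end > end:
            break
        yield train_start, train_end, train_end, test_end
        train_start = add_months(train_start, test_months)


for window in walk_forward_windows(date(2023, 1, 1), date(2024, 7, 1)):
    print(*(d.isoformat() for d in window))
```

For an 18-month history starting in January 2023 it yields four train/validate pairs, each validating on a different 3-month slice.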
Keel handles the structural mistakes automatically:
To get started: open the lab to find candidate assets via a screen, click "Backtest in Keel" to take that state into a workspace, then configure entry/exit logic. Or fork one of the documented templates in /strategies as a starting point — the funding-carry template has a real 20-month backtest with Sharpe 2.17 you can inspect and modify.
Keel is a Strategy OS for AI-assisted systematic trading on Hyperliquid. Backtest, optimize, and run live strategies across single-stock perps, indices, and crypto majors — realistic fees, slippage, and funding modeled.
A backtest is a simulation of a trading strategy on historical data. It reports metrics such as total return, Sharpe ratio, max drawdown, and win rate: the performance the strategy would have had if it had been live across the test period. Backtests are not predictions; they're sanity checks. A strategy that performs poorly in backtest will almost certainly perform poorly live; the converse is not guaranteed.
Four mistakes kill most backtests. (1) Look-ahead bias: using information that wasn't available at the time of the trade, e.g. using a bar's closing price to decide an entry made earlier in that same bar. (2) Survivorship bias: backtesting only on assets that exist today, missing delisted or failed ones. (3) Overfitting: tuning parameters until the historical sample looks great, after which the strategy fails out-of-sample. (4) Ignoring costs: leaving out fees, slippage, or funding inflates returns that then vanish live.
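Look-ahead bias (1) and ignored costs (4) are easy to demonstrate on a toy price series. In the sketch below, a hypothetical "go long after an up bar" rule is evaluated twice: once honestly, trading on the bar after the signal, and once with look-ahead, applying the signal to the same bar whose close produced it. A flat fee per position change stands in for costs; slippage and funding are not modeled here. Returns are summed rather than compounded to keep the arithmetic simple.

```python
def strategy_returns(closes, lag_signal=True, fee_rate=0.0):
    """Per-bar returns of a hypothetical 'long after an up bar' rule.

    lag_signal=True: the signal formed at bar t's close is traded on bar t+1.
    lag_signal=False: the signal is applied to bar t's own return, using
    information that was not available when the trade would be placed
    (look-ahead bias).
    fee_rate: cost charged on each change in position, as a fraction of notional.
    """
    bar_rets = [0.0] + [b / a - 1.0 for a, b in zip(closes, closes[1:])]
    signal = [1 if r > 0 else 0 for r in bar_rets]  # known only at each bar's close
    position = ([0] + signal[:-1]) if lag_signal else signal
    out, prev = [], 0
    for pos, r in zip(position, bar_rets):
        out.append(pos * r - fee_rate * abs(pos - prev))
        prev = pos
    return out


closes = [100, 102, 101, 103, 100, 104]
honest = sum(strategy_returns(closes))
peeking = sum(strategy_returns(closes, lag_signal=False))
with_fees = sum(strategy_returns(closes, fee_rate=0.001))
print(f"honest {honest:+.4f}  look-ahead {peeking:+.4f}  honest+fees {with_fees:+.4f}")
```

The look-ahead version can never lose a bar, because it only holds bars it already knows went up; a curve that looks that clean is a reason to check for leaked information.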
A backtest should be long enough to include multiple market regimes. For crypto, the minimum is 1 year covering at least one trend, one range, and one drawdown cycle; 2 to 3 years is better. Lower-frequency strategies (multi-day holds) need longer samples, because roughly 100 trades is the minimum for parameter estimates to be reliable.
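One way to see why roughly 100 trades is a floor: the standard error of an observed win rate p over n trades is sqrt(p(1 - p) / n). The sketch below uses that binomial formula with a normal-approximation 95% range; the 55% win rate is an arbitrary example value.

```python
import math


def win_rate_standard_error(win_rate, n_trades):
    """Binomial standard error of an observed win rate: sqrt(p * (1 - p) / n)."""
    return math.sqrt(win_rate * (1.0 - win_rate) / n_trades)


observed = 0.55  # arbitrary example win rate
for n in (20, 100, 400):
    se = win_rate_standard_error(observed, n)
    low, high = observed - 1.96 * se, observed + 1.96 * se  # normal approximation
    print(f"{n:>4} trades: SE {se:.3f}, ~95% range {low:.2f} to {high:.2f}")
```

At 100 trades, a 55% observed win rate has a standard error of about 5 percentage points, so the true rate could plausibly sit anywhere from roughly 45% to 65%. Quadrupling the trade count only halves that uncertainty.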
In-sample is the data you used to develop the strategy (chose parameters, picked signals). Out-of-sample is fresh data the strategy never saw. A strategy that works in-sample but fails out-of-sample is overfit. Good practice: split your data 70/30 in time order, develop on the first 70%, and validate on the remaining 30%. Walk-forward optimization extends this by re-validating as you walk through time.
Three tests expose overfitting. (1) Run on multiple in-sample sub-periods; if performance varies wildly, the strategy is regime-dependent. (2) Try the strategy on out-of-sample data (or paper-trade it forward); substantial degradation is the overfit signature. (3) Check parameter sensitivity by varying each parameter slightly. If performance collapses, you're sitting on a fragile peak in the parameter surface. Robust strategies have wide profitable plateaus, not narrow spikes.
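Test (3) can be automated. The sketch below nudges each parameter up and down by a relative step while holding the others fixed, then flags parameters whose worse neighbor loses more than half of the base score. The `score` callable stands in for "run the backtest and return Sharpe"; the two toy score functions, the 10% step, and the 50% threshold are all hypothetical choices, and the flag logic assumes a positive base score.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def sensitivity(score: Callable[[Params], float], params: Params,
                rel_step: float = 0.1) -> Dict[str, Tuple[float, float, float]]:
    """Rescore with each parameter moved down and up by rel_step (others fixed).

    Returns {name: (score_down, score_base, score_up)}.
    """
    base = score(params)
    report = {}
    for name, value in params.items():
        down = score({**params, name: value * (1 - rel_step)})
        up = score({**params, name: value * (1 + rel_step)})
        report[name] = (down, base, up)
    return report


def fragile_params(report, max_drop=0.5):
    """Names whose worse neighbor loses more than max_drop of a positive base score."""
    return [name for name, (down, base, up) in report.items()
            if min(down, up) < base * (1 - max_drop)]


# Hypothetical score surfaces standing in for "backtest and return Sharpe".
def plateau(p):
    return 2.0 - 0.001 * (p["lookback"] - 20) ** 2


def spike(p):
    return 2.0 if abs(p["lookback"] - 20) < 0.5 else 0.1


params = {"lookback": 20.0}
print(fragile_params(sensitivity(plateau, params)))  # []
print(fragile_params(sensitivity(spike, params)))    # ['lookback']
```

A fuller check would sweep a grid around each parameter, or move several parameters jointly, rather than take a single step; this one-at-a-time version is the simplest form of the test.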
Keel is built to address these pitfalls. Its backtests model fees, slippage, and funding by default. The component system makes look-ahead errors hard to write, since components can't access future data. Walk-forward optimization is available for the parameter-tuning workflow. The strategy registry includes win-rate standard error and per-regime sub-period breakdowns. Every backtest run gets a permanent share URL for reproducibility.