Question 1

What does the Deflated Sharpe Ratio actually deflate?

Accepted Answer

It deflates the observed Sharpe ratio for four things at once: (1) the number of strategy variants you tested — more trials, higher expected max Sharpe under the null, even with zero edge; (2) sample size T — short samples produce noisy Sharpes; (3) skewness — negative skew (rare crashes) makes a given Sharpe less reliable; (4) excess kurtosis — fat tails make a given Sharpe less reliable. The output is the probability that the true Sharpe is greater than zero given everything you tried and the shape of your returns.
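As a rough illustration of how those four inputs interact, the sketch below uses the probabilistic Sharpe ratio form commonly given in the literature, evaluated at a benchmark `sr0`. It is an assumption that this calculator uses exactly this formula. The function name and arguments are hypothetical, Sharpe values are per period (not annualized), and `kurt` is raw kurtosis (3.0 for a normal distribution), not excess kurtosis.

```python
from math import sqrt
from statistics import NormalDist


def deflated_sharpe(sr, sr0, n_obs, skew=0.0, kurt=3.0):
    """Probability that the true per-period Sharpe exceeds the benchmark sr0.

    sr    : observed per-period Sharpe ratio (not annualized)
    sr0   : benchmark Sharpe, e.g. the expected best Sharpe under zero edge
    n_obs : number of return observations T
    skew  : sample skewness of the returns
    kurt  : sample kurtosis (3.0 for normal returns; NOT excess kurtosis)
    """
    # Variance inflation of the Sharpe estimate for non-normal returns.
    denom = sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr**2)
    z = (sr - sr0) * sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)


# Hypothetical inputs: annualized Sharpe 1.5 converted to a daily Sharpe.
daily_sr = 1.5 / sqrt(252)
print(deflated_sharpe(daily_sr, 0.0, 252))                        # normal returns
print(deflated_sharpe(daily_sr, 0.0, 252, skew=-1.0, kurt=8.0))   # crash-prone, fat tails
```

With a positive Sharpe, negative skew and higher kurtosis both enlarge the denominator, so the second probability comes out lower than the first. A larger `sr0` or a smaller `n_obs` lowers it as well.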

Question 2

Why do I need to enter the number of trials?

Accepted Answer

Because the expected maximum Sharpe across N independent random strategies grows roughly like sqrt(2 ln N) times the standard deviation of the trial Sharpes, even when no strategy has any real edge. If you tested 100 parameter variants and reported the best Sharpe, your number is contaminated by selection bias. DSR corrects for that by comparing your observed Sharpe to SR_0, the expected best Sharpe under the null of zero true edge. If you only ran one strategy, set N=1 and the trial-bias correction drops out (SR_0 = 0).
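To get a feel for how that hurdle grows with N, here is a minimal sketch using a widely used extreme-value approximation for the expected maximum of N independent normal draws. Whether this calculator uses the same approximation is an assumption. The function name is hypothetical, and `sr_std` stands for the standard deviation of the Sharpe estimates across your trials.

```python
from math import e, log, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sr_std):
    """Approximate expected best Sharpe among n_trials zero-edge strategies.

    sr_std is the standard deviation of Sharpe estimates across trials;
    the result is in the same units as sr_std.
    """
    if n_trials <= 1:
        return 0.0  # convention: a single trial involves no selection
    inv = NormalDist().inv_cdf
    return sr_std * (
        (1.0 - EULER_GAMMA) * inv(1.0 - 1.0 / n_trials)
        + EULER_GAMMA * inv(1.0 - 1.0 / (n_trials * e))
    )


# Hurdle in units of sr_std, next to the cruder sqrt(2 ln N) growth rate.
for n in (1, 10, 100, 1000):
    rough = sqrt(2.0 * log(n))
    print(n, round(expected_max_sharpe(n, 1.0), 2), round(rough, 2))
```

The sqrt(2 ln N) figure describes the asymptotic growth rate. For the trial counts typical of a parameter sweep, the approximation above sits somewhat below it.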

Question 3

How do I count trials honestly?

Accepted Answer

Count every parameter combination you evaluated on the same data, including informal ones. If you swept lookback ∈ {10, 20, 50, 100} × threshold ∈ {0.01, 0.02, 0.05} × asset ∈ {BTC, ETH, SOL}, that is 4 × 3 × 3 = 36 trials, not one. Add manual variants ("I also tried with a 7-day filter"); those count too. Undercounting is the bigger risk: most practitioners underestimate N by 5-10×. When unsure, sensitivity-test by entering both your best-guess N and 10×N.
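The grid count above can be checked mechanically. This is a small sketch with hypothetical variable names, and the single manual variant is only an illustration of how informal tries get added to the total.

```python
from itertools import product

lookbacks = [10, 20, 50, 100]
thresholds = [0.01, 0.02, 0.05]
assets = ["BTC", "ETH", "SOL"]

# Every combination that was evaluated on the same data is one trial.
grid = list(product(lookbacks, thresholds, assets))

manual_variants = 1  # hypothetical: the informal "7-day filter" attempt
n_trials = len(grid) + manual_variants

# Values to enter for a sensitivity check: best guess and ten times that.
sensitivity = [n_trials, 10 * n_trials]
print(len(grid), n_trials, sensitivity)
```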

Question 4

How is DSR different from a regular Sharpe t-test?

Accepted Answer

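A regular Sharpe significance test asks "given a single sample, is the Sharpe distinguishable from zero?" DSR asks "given that I selected this Sharpe from N alternatives, is it distinguishable from what selection alone would have produced?" The two converge at N=1 (no selection); they diverge sharply as N grows. Under i.i.d. returns the t-statistic is roughly the annualized Sharpe times sqrt(years), so an annualized Sharpe of 1.5 is only about 1.5-sigma over 252 days and needs about four years of daily data to reach 3-sigma. Even a 3-sigma result can fall short of a 95% DSR once you account for testing 30 variants, because the best of 30 zero-edge strategies is expected to land near 2-sigma on its own.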
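One way to see "what selection alone would have produced" is to simulate it. The sketch below is a plain Monte Carlo illustration, not the DSR formula itself. It draws 30 zero-edge return series, assumed to be i.i.d. normal with an arbitrary 1% daily volatility, keeps the best Sharpe t-statistic, and repeats. All names and parameter values are hypothetical.

```python
import random
from math import sqrt
from statistics import fmean


def best_t_stat(n_trials, n_obs, rng):
    """Largest Sharpe t-statistic among n_trials zero-edge return series."""
    best = float("-inf")
    for _ in range(n_trials):
        r = [rng.gauss(0.0, 0.01) for _ in range(n_obs)]  # true mean is zero
        m = fmean(r)
        s = sqrt(sum((x - m) ** 2 for x in r) / (n_obs - 1))
        best = max(best, m / s * sqrt(n_obs))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
sims = [best_t_stat(30, 252, rng) for _ in range(200)]

avg_best = fmean(sims)
share_above = sum(t > 1.5 for t in sims) / len(sims)
print(f"average best t-stat among 30 zero-edge variants: {avg_best:.2f}")
print(f"share of runs where the best t-stat exceeds 1.5: {share_above:.0%}")
```

A single-sample test would compare the winner against zero. The simulation shows that the winner of a 30-way search should instead be compared against the distribution of winners.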

Question 5

Does Keel compute DSR natively on backtest results?

Accepted Answer

Not yet. Today Keel reports observed Sharpe, Sortino, max drawdown, and other point estimates from the backtest engine — DSR is not built in. This calculator is the bridge. Paste your observed Sharpe and an honest trial count from the optimizer history, and read off the deflated probability. Native DSR alongside the other metrics is on the roadmap; no committed ship date.
