LLMs × Systematic Trading

Use Claude where it actually helps. Use the engine for the rest.

Keel pairs Claude or Cursor with a deterministic Hyperliquid backtest engine. LLMs propose, compose, summarize, refactor — where they're sharp. The engine runs the backtest, models funding and fees, and deploys the compiled artifact live — where LLMs hallucinate.

Split of labor

Three frames for LLMs in systematic work. The third is the only one that survives.

Most pitches for LLMs in crypto fall into two camps: an autonomous wallet bot, or a one-off Python script. Neither matches what an LLM is actually good at. Here, LLMs compose; the engine computes; the compiled artifact trades.

Where LLMs help

Proposing a thesis from priors ("carry plus a funding-regime gate"). Searching a 182-component library for matching signals. Composing them into a typed DSL graph that compiles. Summarizing a backtest tearsheet in one paragraph. Refactoring a strategy when you want to swap the regime detector. Explaining a vol-of-vol filter to a teammate who has not seen it before.

With Keel

These are composition and summarization tasks. They scale with model capability, they tolerate one-shot errors (the engine catches them), and they save real time. This is where Claude is sharp.

Where LLMs hallucinate

Running deterministic math in their head. Predicting a funding-payment trajectory across 200 perps over 20 months. Sizing a position against realised volatility. Telling you whether a Sharpe of 2.17 is statistically distinguishable from luck given the sample size. Computing drawdown from a price series. The LLM will produce a plausible number and it will be wrong in ways you can't easily audit.

With Keel

Don't ask Claude to compute drawdown; ask it to compose a strategy whose drawdown the engine computes. Runtime math stays in the engine, where it's deterministic and auditable.
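To make "the engine computes drawdown" concrete, here is a minimal sketch of a deterministic max-drawdown calculation over an equity series. It is illustrative only and is not Keel's implementation; the function name `max_drawdown` and the sample equity values are assumptions made up for this example.

```python
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Return the largest peak-to-trough decline as a negative fraction.

    Returns 0.0 when the series never falls below a prior peak.
    """
    if not equity:
        raise ValueError("equity series is empty")
    if equity[0] <= 0:
        raise ValueError("equity must start above zero")

    peak = equity[0]
    worst = 0.0
    for value in equity:
        # Track the running high-water mark.
        peak = max(peak, value)
        # Decline from the high-water mark, as a fraction of that mark.
        worst = min(worst, value / peak - 1.0)
    return worst


# Hypothetical equity curve: peaks at 120, falls to 90, later recovers.
sample_equity = [100.0, 120.0, 90.0, 110.0, 130.0, 117.0]
print(max_drawdown(sample_equity))  # -0.25 (the fall from 120 to 90)
```

The same inputs always give the same answer, and every step can be checked by hand. That is the property a language model doing arithmetic "in its head" does not offer.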

The split of labor

The LLM authors the strategy graph through the MCP. The engine runs deterministic math against real Hyperliquid funding and price data. The compiled artifact deploys live with bit-for-bit parity to the backtest. The agent is offline at execution time; the deterministic compiled strategy is what trades.

With Keel

The agent helps you build the strategy; the engine runs it. The architecture treats LLMs as useful but unreliable, isolating that unreliability to the composition step where you can review every output before the engine commits.

Capabilities

Under the hood

Structured strategy engine

Strategies are composable pipelines of typed components. The system validates every connection at edit time — errors caught before you backtest, not after you deploy.
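As a rough illustration of edit-time validation, the sketch below checks that each component's input type matches the output type of the component before it. Everything here is hypothetical: the `Component` class, the type labels, and the component names are invented for this example and do not describe Keel's actual DSL or library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Component:
    name: str
    input_type: Optional[str]  # None marks a data source with no upstream input
    output_type: str


def validate_pipeline(components: List[Component]) -> List[str]:
    """Return a list of connection errors; an empty list means the chain is valid."""
    errors = []
    if components and components[0].input_type is not None:
        errors.append(f"{components[0].name}: first component must be a source")
    for upstream, downstream in zip(components, components[1:]):
        if downstream.input_type != upstream.output_type:
            errors.append(
                f"{upstream.name} -> {downstream.name}: "
                f"produces {upstream.output_type}, expects {downstream.input_type}"
            )
    return errors


# Hypothetical components for a funding-carry pipeline.
funding = Component("funding_rates", None, "FundingSeries")
carry = Component("carry_signal", "FundingSeries", "Signal")
sizer = Component("vol_target_sizer", "Signal", "Weights")
rebalancer = Component("buffered_rebalancer", "Weights", "Orders")

valid_chain = [funding, carry, sizer, rebalancer]
broken_chain = [funding, carry, rebalancer]  # skips the sizer

print(validate_pipeline(valid_chain))
print(validate_pipeline(broken_chain))
```

The broken chain is rejected before any backtest runs, because the mismatch is a property of the graph rather than of the data.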

AI built on the same system

AI doesn’t generate code — it composes from the same components you use. It understands valid connections, constraints, and trade-offs. Every strategy it builds is structurally valid.

Detailed backtest reports

Sharpe, Sortino, max drawdown, win rate, trade-by-trade logs. Compare runs side by side. Real fee and slippage modeling.

Version control for strategies

Every edit creates a new version. Compare any two versions side by side. Tag releases, restore previous versions, fork strategies. Your full history, always recoverable.

Auditable execution logs

Every live run is logged — what the strategy calculated, what orders executed, what filled. Full transparency.

Non-custodial by design

Your keys never touch our servers. Keel uses Hyperliquid’s native delegation. Sign once, revoke anytime.

Verified backtest

One run, one share URL, no hand-waving.

A deterministic backtest against real Hyperliquid history, with bit-for-bit parity to live. The share URL holds the full tearsheet — equity curve, decomposed P&L, every trade.

Featured · Single-window verified backtest

Funding-carry on Hyperliquid perps

A deterministic single-window backtest with bit-for-bit parity to live. Click through for the full tearsheet — equity curve, decomposed P&L, trade-by-trade log. Period: 2024-08-15 → 2026-04-30 (20 mo).

Total return: +79.6%
Sharpe: 2.17
Max drawdown: -9.7%
Win rate: 48.7%

Verified Keel backtest. Past performance is not indicative of future returns.

FAQ

Common questions

Why use an LLM for systematic work at all?

Because composition and summarization are LLM-shaped problems. Searching a 182-component typed library for "vol-of-vol regime detectors", wiring them into a valid DSL graph, and writing a one-paragraph summary of a backtest tearsheet are tasks where Claude is genuinely strong. None of that asks the LLM to compute drawdown, simulate funding accrual, or decide whether a Sharpe is statistically significant — those stay with the engine and with the human reading the numbers. The LLM is the composition + summarization layer over a typed component graph, not a runtime decision engine.

What's the engine actually doing during a backtest?

Keel ingests Hyperliquid perpetual-market data as 15-minute bars and 1-hour funding rates from the same cache the live execution path reads. The simulator applies the live Hyperliquid fee schedule per fill, models per-asset slippage in basis points that you set explicitly, accrues funding into the equity curve on every 1-hour boundary, and decomposes P&L into price return, funding return, and combined return. A buffered rebalancer respects exposure caps and volatility targeting at the portfolio level. Output is a deterministic share URL with the full tearsheet: same inputs, same outputs, every time.
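The accounting described above can be sketched for the simplest case: one constant position held across a run of 15-minute bars, with funding accrued every fourth bar. This is a simplified illustration, not Keel's simulator. The function and field names, the fee and slippage figures, and the use of the bar price as the funding notional are all assumptions. The sign convention (a positive funding rate means longs pay shorts) is the standard one for perpetuals.

```python
from dataclasses import dataclass
from typing import Sequence

BARS_PER_FUNDING = 4  # four 15-minute bars per 1-hour funding interval


@dataclass
class PnL:
    price: float = 0.0    # P&L from price moves
    funding: float = 0.0  # P&L from funding payments
    costs: float = 0.0    # fees plus slippage (tracked separately here by assumption)

    @property
    def net(self) -> float:
        return self.price + self.funding - self.costs


def simulate_hold(prices: Sequence[float], hourly_funding: Sequence[float],
                  units: float, fee_bps: float, slippage_bps: float) -> PnL:
    """Hold `units` (negative = short) from the first bar to the last."""
    cost_rate = (fee_bps + slippage_bps) / 10_000
    pnl = PnL()
    pnl.costs += abs(units) * prices[0] * cost_rate  # entry fill
    for i in range(1, len(prices)):
        pnl.price += units * (prices[i] - prices[i - 1])
        if i % BARS_PER_FUNDING == 0:
            rate = hourly_funding[i // BARS_PER_FUNDING - 1]
            # Positive rate: longs pay, shorts receive.
            pnl.funding -= units * prices[i] * rate
    pnl.costs += abs(units) * prices[-1] * cost_rate  # exit fill
    return pnl


# Hypothetical inputs: one hour of bars, short 2 units, 5 bps fee, 5 bps slippage.
result = simulate_hold(
    prices=[100.0, 101.0, 99.0, 100.0, 102.0],
    hourly_funding=[0.0001],
    units=-2.0,
    fee_bps=5.0,
    slippage_bps=5.0,
)
print(result.price, result.funding, result.costs, result.net)
```

Keeping the price, funding, and cost legs in separate fields is what makes the tearsheet decomposition possible: a carry strategy can lose on price and still be judged on whether funding covered it.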

How is backtest-to-live parity verified?

The compiled pipeline artifact (the serialized DSL graph plus its component parameters) is the same object that runs in backtest and in live execution, and the same data cache feeds both paths. Live results can still diverge from the backtest because of real-world market impact, realized slippage, and regime drift after the cutoff date, but not because of implementation drift. Execution logs let you compare expected versus actual results fill by fill. "Parity" here means there is one engine, not two.
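One way a reader could check for themselves that two serialized artifacts are the same object is to compare content hashes of a canonical encoding. The sketch below shows that general technique. It is not a description of how Keel verifies parity, and the artifact layout (`graph`, `params`) is a made-up stand-in.

```python
import hashlib
import json


def artifact_digest(artifact: dict) -> str:
    """SHA-256 of a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(artifact, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical artifact as used in a backtest.
backtest_artifact = {
    "graph": ["funding_rates", "carry_signal", "buffered_rebalancer"],
    "params": {"lookback_hours": 24, "max_gross_exposure": 1.0},
}

# Same content with keys in a different order, as it might be reloaded for live.
live_artifact = {
    "params": {"max_gross_exposure": 1.0, "lookback_hours": 24},
    "graph": ["funding_rates", "carry_signal", "buffered_rebalancer"],
}

# A copy with one parameter changed.
edited_artifact = {
    "graph": ["funding_rates", "carry_signal", "buffered_rebalancer"],
    "params": {"lookback_hours": 48, "max_gross_exposure": 1.0},
}

print(artifact_digest(backtest_artifact) == artifact_digest(live_artifact))    # True
print(artifact_digest(backtest_artifact) == artifact_digest(edited_artifact))  # False
```

Sorting keys before hashing means key order does not affect the digest, while any change to a parameter or to the graph does.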

How does this compare to QuantConnect or quantpylib for Hyperliquid?

Neither runs on Hyperliquid natively. QuantConnect (QC) is an equities-and-crypto research platform with strong walk-forward optimization (WFO) and rigor diagnostics but no Hyperliquid (HL) execution path. quantpylib is a Python library for systematic crypto research, again without HL-native execution. Keel's differentiation is venue-native data and execution, agent-driven composition, and bit-for-bit backtest-to-live parity. It is not the rigor-diagnostic surface, where QC is ahead today. If your workflow is "research in QC, deploy somewhere else", you keep the rigor surface and lose parity. If your workflow is "compose in Claude, deploy on HL", Keel is the path.

What's the data depth?

Hyperliquid 15-minute bars and 1-hour funding rates, with open interest where the venue publishes it. History depth varies per asset: BTC, ETH, and SOL go back to 2024-08-15; newer listings like HYPE start at their listing date. The cache is stored as Parquet files on local disk, and the live execution path reads the same files. Sub-15-minute backtesting is not supported; latency-sensitive strategies belong in a different tool.

Is this for crypto only?

Yes. Keel is Hyperliquid-native by design: funding accrual, native delegated signing, per-asset slippage tuning, and the full execution path are all built around Hyperliquid perps.

What will you build?

Live on Hyperliquid in minutes.

Get started
Non-custodial
Your keys never leave your wallet. Your strategies run on your account — Keel never holds funds.
Same code, backtest to live
The strategy that passed your backtest is the strategy that trades. Same pipeline, no surprises.
Full visibility
See every position, trade, and decision in real time. Pause anytime. Your account, your control.