How We Use AI to Develop and Test Trading Systems in Trading Blox

Two things shape how we build systematic strategies, and only one of them is new. In the last eighteen months, large language models (Claude, ChatGPT, and their peers) have become genuinely useful for quant work: not magic, not autonomous, but a real lever. The other hasn’t changed: Trading Blox is still the platform we trust to backtest, position-size, and stress-test portfolios of futures strategies before any of it sees live capital.

The combination of the two is where the workflow actually shifts. In this post we walk through how we use AI alongside Trading Blox at Wisdom Trading — what it’s good for, what it isn’t, and the discipline that keeps the whole thing from becoming an overfitting machine.

Figure: How a strategy idea moves from hypothesis to live capital in our workflow.

Why Trading Blox, Specifically

If you’ve never used it: Trading Blox is built for portfolio-level systematic futures research. Most platforms backtest one symbol at a time. Trading Blox runs a whole portfolio (dozens of markets, multiple systems, shared volatility-targeting and risk overlays) through a single backtest. It handles roll logic, contract specs, dynamic position sizing, and correlations between sub-systems. It has also been around long enough that most of the mistakes typical of early-2000s backtesting software, which newer platforms still make, have been engineered out of it.

For anyone running a real systematic CTA-style book, that portfolio-first architecture isn’t a nice-to-have. It’s the difference between knowing what you have and guessing. Single-market backtests look pretty until you find out the entire equity curve came from one runaway position in coffee in 2012.

Where AI Fits Into the Process

We use AI in three concrete places. None of them involve handing it the steering wheel.

Strategy ideation. Most strategy ideas are recombinations of existing structure: a breakout filter, a regime classifier, a different exit logic. AI is very good at producing a wide range of variants on a theme on demand. A prompt like “give me five Donchian breakout variants that account for volatility regime, each with a different rationale” produces five testable hypotheses in about thirty seconds. Maybe one survives a backtest. That’s a much better hit rate than scrolling through old trading books for inspiration.
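To make that concrete, here is a minimal Python sketch (not BloxBasic, and not one of our production systems) of what one such variant might look like: a Donchian-style breakout that only fires when recent volatility is calm. The function name, the close-only channel, and the defaults for `channel`, `vol_window`, and `vol_cap` are all assumptions chosen for illustration.

```python
from statistics import pstdev


def donchian_signals(closes, channel=20, vol_window=20, vol_cap=0.02):
    """Return one signal per bar: +1 long breakout, -1 short breakout, 0 none.

    Simplification (assumption): the channel is built from closes only,
    whereas a classic Donchian channel uses highs and lows.
    """
    signals = [0] * len(closes)
    start = max(channel, vol_window + 1)
    for t in range(start, len(closes)):
        # Channel and volatility use only bars BEFORE bar t.
        prior = closes[t - channel:t]
        window = closes[t - vol_window - 1:t]
        returns = [b / a - 1.0 for a, b in zip(window, window[1:])]

        # Regime filter: skip breakouts when recent returns are too noisy.
        if pstdev(returns) > vol_cap:
            continue

        if closes[t] > max(prior):
            signals[t] = 1
        elif closes[t] < min(prior):
            signals[t] = -1
    return signals
```

Each variant an LLM proposes would change one piece of this (the regime measure, the channel length, the exit), which is what makes the variants cheap to compare in a backtest.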

Translating ideas into BloxBasic. Trading Blox uses its own scripting language. It’s not hard, but the syntax has quirks, and the documentation assumes you already know what you’re looking for. A clear plain-English description of a strategy can be turned into a draft BloxBasic block by an LLM in a single pass. We almost always still rewrite parts of it — but we’re rewriting, not writing from scratch.

Code review. This is the one we underrated at first. Look-ahead bias, off-by-one indexing on bar references, accidental position-size compounding: these are the bugs that make a 0.6 Sharpe strategy backtest like a 2.4 Sharpe strategy and ruin a year of work. Pasting a strategy block into an LLM and asking “find any lookahead, sizing, or boundary issues in this code” catches things we’d miss in our own re-read. Not every time, but often enough to be worth the thirty seconds.
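An LLM review pairs well with a mechanical check. One generic technique is a truncation test: a signal computed at bar t should not change when all data after bar t is removed. The Python sketch below illustrates the idea under stated assumptions. The two signal functions are hypothetical toy examples, and none of this is Trading Blox functionality.

```python
def buggy_breakout(closes, channel=3):
    """Toy signal with a deliberate bug: the channel peeks at bar t + 1."""
    out = []
    for t in range(len(closes)):
        lo = max(0, t - channel)
        out.append(closes[t] >= max(closes[lo:t + 2]))  # bug: includes t + 1
    return out


def clean_breakout(closes, channel=3):
    """Toy signal: bar t closes above the highest close of the prior bars."""
    out = []
    for t in range(len(closes)):
        prior = closes[max(0, t - channel):t]
        out.append(bool(prior) and closes[t] > max(prior))
    return out


def lookahead_bars(signal_fn, closes):
    """Return the bars whose signal changes once future data is removed."""
    full = signal_fn(closes)
    suspects = []
    for t in range(len(closes)):
        truncated = signal_fn(closes[:t + 1])  # only data known at bar t
        if truncated[t] != full[t]:
            suspects.append(t)
    return suspects
```

An empty result does not prove the code is clean, since a look-ahead that happens not to alter any signal on this particular data goes unnoticed, but a non-empty result is a definite bug.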

Our Actual Workflow

The practical sequence we follow:

1. Hypothesis. A market observation, a paper, an idea from a portfolio principal: anything testable in rules.
2. Distillation with AI. We describe the hypothesis in plain English and ask the model to formalize it into a precise rule set, then ask it to identify ambiguities. The model is usually better than we are at noticing where “buy when momentum is strong” needs five more specifications before it’s code.
3. Translate to BloxBasic. AI produces a first-draft script. We review and rewrite.
4. Run the first backtest. Standard portfolio universe, fixed position sizing, no fitting yet. We just want to see if the idea has any pulse.
5. Examine attribution. Trading Blox makes it straightforward to see which markets and which periods contributed to the equity curve. If the entire result comes from three months in 2008 or one market, we know to be skeptical.
6. Iterate carefully. This is where it gets dangerous. AI will happily generate parameter sweeps that overfit. We constrain the loop: a fixed number of variants per session, a strict out-of-sample period reserved before any tuning starts.
7. Walk-forward and stress. Trading Blox handles walk-forward validation natively. We use it. If the strategy doesn’t survive walk-forward on an honest sample, it doesn’t see capital.
8. Paper or small-size live. Even surviving strategies start small. Slippage, fills, and real-time market microstructure are not in any backtest.
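For readers who have not seen walk-forward validation laid out, the splitting logic behind the iterate and walk-forward steps can be sketched in a few lines of Python. This is a generic illustration, not a description of how Trading Blox implements it. The function name and the `holdout` parameter (bars reserved as the untouched out-of-sample period) are assumptions.

```python
def walk_forward_windows(n_bars, train_len, test_len, holdout=0):
    """Return a list of (train, test) index ranges.

    Each test range starts where its train range ends, and the final
    `holdout` bars are never included in any window.
    """
    usable = n_bars - holdout  # bars available for tuning and validation
    windows = []
    start = 0
    while start + train_len + test_len <= usable:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        windows.append((train, test))
        start += test_len  # roll forward by one test period
    return windows
```

For example, `walk_forward_windows(12, 4, 2, holdout=2)` produces three windows over the first ten bars and leaves the last two bars unseen until the very end.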

The result is a workflow where AI compresses the time spent on the mechanical parts — restating ideas, writing boilerplate, scanning code — and we spend more time on the judgment-heavy parts: choosing what’s worth testing in the first place, deciding when a result is real, and allocating capital.

What AI Can’t Do

A short list, because pretending otherwise leads to bad strategies and worse losses.

It doesn’t know the market. It can write code that reflects a strategy idea, but it has no intuition for slippage in illiquid contract months, holiday roll calendars, or which markets routinely have outsized gap risk overnight.
It overfits enthusiastically. Ask an LLM to “improve” a strategy’s Sharpe and you’ll get suggestions that work brilliantly on the in-sample data and collapse out of sample. It needs guardrails.
It can’t assess regime change. A walk-forward test on 2014–2019 data doesn’t anticipate a 2020 vol shock or a 2022 rate regime. Knowing whether the regime you backtested in still exists is a human call.
It doesn’t size portfolios. Position sizing across correlated futures markets is a portfolio-engineering problem. Trading Blox does it. The LLM doesn’t.
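The overfitting point is easy to demonstrate without any market data. The Python sketch below, a toy simulation under stated assumptions, generates many "strategies" whose daily returns are pure noise with zero true edge, picks the one with the best in-sample mean return, and reports how that same pick did on fresh noise. The function name, the 1% daily volatility, and the variant and day counts are all assumptions.

```python
import random
from statistics import mean


def best_of_noise(n_variants=200, n_days=250, seed=7):
    """Select the best in-sample variant among zero-edge strategies.

    Returns (in_sample_mean, out_of_sample_mean) for the selected variant.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    variants = []
    for _ in range(n_variants):
        # Daily returns with zero true edge: pure Gaussian noise.
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        variants.append((mean(in_sample), mean(out_sample)))
    # "Optimization": keep whichever variant looked best in-sample.
    return max(variants, key=lambda v: v[0])
```

The selected variant has a positive in-sample mean purely because it was selected, while its out-of-sample mean is just another draw around zero. More variants per session make the in-sample number look better and mean nothing more, which is why the number of variants has to be capped.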

The mental model that works: AI is a sharp junior analyst with no market memory and infinite patience. It generates more, drafts faster, catches obvious bugs. Everything else still requires the senior person in the room.

The Discipline Part

What’s kept this from becoming a strategy-overfitting factory:

A fixed out-of-sample cutoff declared before any AI iteration begins.
A cap on iteration cycles per strategy — if it takes more than five rounds of tweaking to look attractive, it’s almost certainly not real.
Honest attribution. If a strategy works only in one market or one decade, it goes in the rejection pile, even if the headline Sharpe looks great.
Live paper trading before any capital. The transition from clean backtest to messy live execution still has surprises every time.
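The attribution rule can be turned into a number. As a minimal Python sketch, assuming P&L has already been exported per market (or per year) into a dictionary, the function below reports which bucket contributed the largest share of gross profit. The function name is hypothetical, and any rejection threshold applied to the result is a judgment call rather than a standard.

```python
def concentration(pnl_by_bucket):
    """Return (top_bucket, share), where share is the largest single
    bucket's fraction of total gross profit (losing buckets are ignored).

    Returns (None, 0.0) when no bucket is profitable.
    """
    gains = {k: v for k, v in pnl_by_bucket.items() if v > 0}
    total = sum(gains.values())
    if total == 0:
        return None, 0.0
    top = max(gains, key=gains.get)
    return top, gains[top] / total
```

Called on `{"coffee": 900.0, "gold": 50.0, "corn": 50.0, "bonds": -100.0}`, it identifies coffee as the source of most of the profit, which is exactly the kind of result that belongs in the rejection pile regardless of the headline Sharpe.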

If You’re Building in Trading Blox

If you’re developing your own strategies in Trading Blox and need a broker who can take them all the way to live execution — not just give you an account and a price sheet — that’s what we do. We’ve been running systematic capital since 2003, clearing through StoneX, Phillip Capital, and TradeStation. If you want to talk about how an AI-assisted workflow looks when it actually has to clear and fill, start a conversation or explore our trading-systems services.