Testing on Testing Periods – The Most Subtle Trap


#61 - Behind The Cloud: Fundamentals in Quant Investing (5/12)


November 2025

Fundamentals of Quantitative Investments

In this series, the Omphalos AI Research Team wants to discuss the key and fundamental aspects of quantitative investing in detail and depth. In particular, our series will not be a beautiful story of how to build the holy grail of investing, but rather a long list of pitfalls that can be encountered when building such systems. It will not be a complete and exhaustive list of pitfalls, but we will try to describe those we discuss in great depth so that their significance is clear to everyone. And importantly, this will not be a purely theoretical discussion. We will provide a practical view on all of these aspects — shaped by real-world lessons and, in many cases, by our own painful and sometimes even traumatic experiences in building and testing systematic investment strategies. These hard-earned lessons are precisely why Omphalos Fund has been designed as a resilient, modular, and diversified platform — built to avoid the traps that have undone so many before.

At Omphalos Fund, we have always been clear about one thing: artificial intelligence is not magic. It is a powerful tool, but its value depends entirely on the system it operates in and the rules that guide it. When applied to asset management, this means that even the most advanced AI can only be effective if it is built on a deep understanding of how markets work — with all their complexities, inefficiencies, and risks.

That is why our latest Behind the Cloud white paper takes a step back from the technology itself. Instead, it examines the foundations of quantitative investing — the real-world mechanics, pitfalls, and paradoxes that shape investment strategies. The aim is not to present a flawless “holy grail” of investing, but to show the challenges and traps that every systematic investor must navigate.

We believe this is essential for anyone working with AI in finance. Without appreciating the underlying business of investing, AI models risk becoming black boxes that look impressive in theory but fail in practice. By shedding light on the subtle but critical issues in quantitative investment design — from overfitting to diversification, from the illusion of normal distributions to the reality of risk of ruin — we provide context for why our platform is built the way it is: modular, transparent, and resilient.

The goal of this white paper is simple:
To help readers understand that using AI in asset management is not only about smarter algorithms — it’s about building systems that are grounded in strong investment fundamentals and designed to survive the real world of markets.

Chapter 5

Testing on Testing Periods – The Most Subtle Trap

Even when quants avoid the obvious sins of overfitting, there is a quieter danger lurking in their workflow – one that is harder to see, harder to admit, and just as destructive. It is the corruption of the test period: using data meant for validation to make design choices.

On paper, this seems innocent. You run a backtest, it looks good. You test it on an unseen period, it looks slightly worse – so you tweak a parameter. Now it looks better again. One more small change, one more test, another improvement. The process feels scientific, disciplined, and iterative. But in truth, you’ve just transformed your test period into part of the training set.

This subtle contamination is known as data leakage or testing on the test set – and it quietly destroys the credibility of countless quantitative strategies.
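To see how little it takes, consider a deliberately simple simulation (an illustrative Python sketch, not Omphalos code; the toy strategy, its parameters, and the data are all made up). The returns are pure noise, so no rule has a real edge. Yet by keeping whichever of two hundred "small tweaks" looks best on the supposedly held-out test window, we manufacture an impressive test-period Sharpe ratio that collapses on data nobody ever peeked at:

    import numpy as np

    rng = np.random.default_rng(42)
    returns = rng.normal(0.0, 0.01, 1250)              # five years of pure noise: no real edge exists
    test, live = returns[:250], returns[250:]          # "held-out" test window and a genuinely untouched period

    def sharpe(r):
        return 0.0 if r.std() == 0 else np.sqrt(252) * r.mean() / r.std()

    def strategy(r, lookback, threshold):
        """Toy momentum rule: long whenever the trailing mean return exceeds a threshold."""
        pos = np.zeros(len(r))
        for t in range(lookback, len(r)):
            pos[t] = 1.0 if r[t - lookback:t].mean() > threshold else 0.0
        return r * pos

    best_params, best_test_sharpe = None, -np.inf
    for _ in range(200):                                # two hundred "small, justified" tweaks
        params = (int(rng.integers(5, 60)), rng.normal(0.0, 0.002))
        candidate = sharpe(strategy(test, *params))     # peeking at the test period
        if candidate > best_test_sharpe:
            best_test_sharpe, best_params = candidate, params

    print(f"Test-period Sharpe of the chosen tweak: {best_test_sharpe:.2f}")              # looks great
    print(f"Same rule on untouched data:            {sharpe(strategy(live, *best_params)):.2f}")  # collapses

The selection step is the leak: once the test window has chosen the parameters, it can no longer tell you anything about the future.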

 

The Mirage of Robustness

A test period is supposed to represent the unknown future – untouched, unseen, unoptimized. It’s the proving ground for robustness. But the moment you adjust a parameter based on its results, it stops being independent. You are no longer testing; you are re-optimizing.

The danger is that every iteration feels justified. After all, you’re just “fine-tuning.” But after ten, twenty, or fifty such cycles, you’ve effectively peeked into the future so many times that you’ve taught your model to perform on that specific data, not on unseen data.

This is why many strategies look stable across multiple test periods – until they go live. The illusion of robustness has been created through inadvertent data snooping: repeatedly peeking at the very data that was supposed to stand in for the future. You didn’t cheat intentionally; you simply optimized too long.

Bailey and López de Prado (2014) demonstrated that the more a model is iteratively tested and refined on overlapping data, the higher the false discovery rate becomes. Their Probability of Backtest Overfitting (PBO) framework quantifies this risk mathematically: with each new test or “improvement,” the probability that your results are merely statistical noise rises exponentially.
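The mechanics behind PBO can be sketched in a few lines. The snippet below is a simplified illustration of the combinatorial cross-validation idea behind their framework, not a faithful reimplementation: it splits the history of N candidate configurations into blocks, repeatedly picks the configuration that looks best in-sample, and counts how often that winner falls below the median out-of-sample. The trial_returns matrix, the block count, and the Sharpe-based ranking are all assumptions of this sketch.

    from itertools import combinations
    import numpy as np

    def sharpe(r):
        return np.sqrt(252) * r.mean(axis=0) / r.std(axis=0)

    def probability_of_backtest_overfitting(trial_returns, n_blocks=8):
        """Rough CSCV-style estimate: how often does the best in-sample trial
        end up below the median out-of-sample?"""
        blocks = np.array_split(trial_returns, n_blocks)        # contiguous time blocks
        n_oos_losses, n_splits = 0, 0
        for train_idx in combinations(range(n_blocks), n_blocks // 2):
            test_idx = [i for i in range(n_blocks) if i not in train_idx]
            is_perf = sharpe(np.vstack([blocks[i] for i in train_idx]))
            oos_perf = sharpe(np.vstack([blocks[i] for i in test_idx]))
            best = np.argmax(is_perf)                           # the trial a researcher would pick
            oos_rank = (oos_perf < oos_perf[best]).mean()       # its relative OOS rank in [0, 1]
            n_oos_losses += oos_rank <= 0.5                     # below the OOS median?
            n_splits += 1
        return n_oos_losses / n_splits

    # 100 pure-noise "strategies": the in-sample winner is no better than a coin flip out of sample.
    rng = np.random.default_rng(0)
    noise_trials = rng.normal(0, 0.01, size=(1000, 100))
    print(f"PBO on pure noise: {probability_of_backtest_overfitting(noise_trials):.2f}")

On pure noise the estimate comes out near 0.5: picking the configuration with the best backtest tells you essentially nothing about how it will rank out of sample.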

Goodfellow et al. (2016), in the machine-learning world, called this problem test-set overuse. The principle is universal: every time you look at test results and act upon them, the test loses its value as an unbiased sample of the future.

In finance, this effect is particularly dangerous because data is scarce and non-stationary. Markets change – so the few “clean” test periods you have are precious. Once contaminated, they cannot be replaced.

 

Research Spotlight

    • White (2000): One of the earliest warnings about data-snooping bias in empirical finance, showing how repeated testing inflates apparent success rates.
    • Bailey & López de Prado (2014): Introduced the Deflated Sharpe Ratio and the Probability of Backtest Overfitting (PBO) – a framework proving that the chance of false discovery increases exponentially with each re-test or parameter tweak.
    • Goodfellow et al. (2016): In ML research, highlighted how even small amounts of test-set overuse can produce over-optimistic performance metrics.
    • Bailey, Borwein, López de Prado & Zhu (2014): Showed mathematically that repeated model selection on the same dataset virtually guarantees spurious results.

 

Omphalos Perspective

At Omphalos Fund, we assume that even well-intentioned research processes tend toward contamination. That’s why we’ve built data hygiene and role separation into our architecture from day one.

Each trading agent is developed and validated under strict procedural rules:

    • Developers never see the final test datasets used for validation.
    • Testing is performed by an independent evaluation layer that cannot modify model parameters.
    • Once an agent has been tested, its parameters are locked – no retesting, no retroactive optimization.

We don’t believe in perfect isolation – markets are too dynamic for that – but we do believe in systematic discipline. A clean test set is sacred. Once you’ve peeked, it’s gone.
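As a purely conceptual illustration of that discipline (a toy sketch, not our production architecture), think of the evaluation layer as a sealed object that grants each agent exactly one look at the hidden test data and refuses every request after that. The class, the agent name, and the scoring below are hypothetical:

    import numpy as np

    class SealedValidation:
        def __init__(self, test_returns):
            self._test_returns = np.asarray(test_returns)   # never handed back to callers
            self._evaluated = set()                         # agents that have spent their one shot

        def evaluate_once(self, agent_id, strategy_fn):
            """Run `strategy_fn` on the hidden test data exactly once per agent."""
            if agent_id in self._evaluated:
                raise PermissionError(f"{agent_id} has already used its test evaluation; "
                                      "parameters are locked, no retesting.")
            self._evaluated.add(agent_id)
            pnl = strategy_fn(self._test_returns)
            return np.sqrt(252) * pnl.mean() / pnl.std()    # report a score, never the data

    validator = SealedValidation(np.random.default_rng(1).normal(0, 0.01, 500))
    print(validator.evaluate_once("agent_007", lambda r: 0.5 * r))
    # validator.evaluate_once("agent_007", lambda r: 0.6 * r)   # the "quick re-test" -> PermissionError

The point is not the few lines of Python but the asymmetry they encode: the developer receives a score, never the data, and never a second chance.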

 

A Case in Point: The “Iterative Improvement” Illusion

A mid-sized quant fund once ran an internal contest to improve one of its equity models. Dozens of analysts submitted tweaks – new factors, better filters, adjusted time windows – each claiming incremental improvement. After six months, the model’s out-of-sample Sharpe ratio had doubled.

Excited, the team deployed it live. Within three months, it lost 15%. The reason was clear only afterward: every “improvement” had been guided by test results. The supposedly independent validation set had been used a hundred times. The model was no longer robust – it was surgically optimized to the past.

As López de Prado often reminds his students: “Every time you touch the test set, you lose a bit of the future.”

 

Closing Thought

Testing is not the enemy of robustness – overtesting is. The credibility of a model depends less on how clever its algorithms are and more on how cleanly it was tested.

At Omphalos Fund, we treat the test set not as an instrument of perfection, but as a mirror of humility. It shows us where we might fail – not where we can look good.

👉 In the next chapter, we’ll turn to diversification – the most misunderstood word in finance. It can be either your greatest defense or your most dangerous illusion, depending on what truly lies beneath the correlations.

Stay tuned for Behind The Cloud, where we’ll continue to explore the frontiers of AI in finance and investing.

Funds Europe nominated Omphalos Fund for the “Funds Europe Awards 2025” in the category “European Thought Leader of the Year”.

If you missed our former editions of “Behind The Cloud”, please check out our BLOG.

© The Omphalos AI Research Team November 2025

If you would like to use our content please contact press@omphalosfund.com 
