Anything Can Happen – Why Backtests Always Look Amazing (Until They Don’t)

#57 - Behind The Cloud: Fundamentals in Quant Investing (1/12)

October 2025

Fundamentals of Quantitative Investments

In this series, the Omphalos AI Research Team wants to discuss the fundamental aspects of quantitative investing in detail and depth. In particular, our series will not be a beautiful story about how to build the holy grail of investing, but rather a long list of pitfalls that can be encountered when building such systems. It will not be a complete and exhaustive list, but we will try to describe the pitfalls we cover in enough depth that their significance is clear to everyone. And importantly, this will not be a purely theoretical discussion. We will provide a practical view of all of these aspects — shaped by real-world lessons and, in many cases, by our own painful and sometimes even traumatic experiences in building and testing systematic investment strategies. These hard-earned lessons are precisely why Omphalos Fund has been designed as a resilient, modular, and diversified platform — built to avoid the traps that have undone so many before.

At Omphalos Fund, we have always been clear about one thing: artificial intelligence is not magic. It is a powerful tool, but its value depends entirely on the system it operates in and the rules that guide it. When applied to asset management, this means that even the most advanced AI can only be effective if it is built on a deep understanding of how markets work — with all their complexities, inefficiencies, and risks.

That is why our latest Behind the Cloud white paper takes a step back from the technology itself. Instead, it examines the foundations of quantitative investing — the real-world mechanics, pitfalls, and paradoxes that shape investment strategies. The aim is not to present a flawless “holy grail” of investing, but to show the challenges and traps that every systematic investor must navigate.

We believe this is essential for anyone working with AI in finance. Without appreciating the underlying business of investing, AI models risk becoming black boxes that look impressive in theory but fail in practice. By shedding light on the subtle but critical issues in quantitative investment design — from overfitting to diversification, from the illusion of normal distributions to the reality of risk of ruin — we provide context for why our platform is built the way it is: modular, transparent, and resilient.

The goal of this white paper is simple:
To help readers understand that using AI in asset management is not only about smarter algorithms — it’s about building systems that are grounded in strong investment fundamentals and designed to survive the real world of markets.

Chapter 1

Anything Can Happen – Why Backtests Always Look Amazing (Until They Don’t)

Quantitative investing often begins with a chart. A clean line moving steadily upward, showing what would have happened if only the strategy had been traded in the past. These simulated histories — backtests — are the first step for any systematic investor. They are meant to answer a simple question: does this idea work?

But here lies the first trap: almost every backtest looks like it works. And the reasons why they look so good reveal one of the biggest pitfalls in quantitative finance.

Not all backtests are created equal. The most common — and most misleading — are those built on the optimization period. Here the parameters of a strategy are tuned directly on the same data being evaluated, ensuring that the model fits history almost perfectly. The result is not insight, but illusion: a curve designed to look flawless because it was sculpted to the past. In this sense, the optimization-period backtest is not a test at all — it is a lie.

A more honest approach is to separate a test period — data unseen during optimization — to evaluate the strategy’s robustness. This is indeed better, and it often exposes fragilities hidden in the optimized version. Yet even test-period results are not a guarantee. They are still based on yesterday’s conditions, which may not reflect tomorrow’s. As we’ll explore in the next chapter, even strong performance on a clean test set can mask structural risks that lead to ruin.
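To make the difference concrete, here is a minimal sketch of the two kinds of backtest. It is not our production tooling; the synthetic data, the parameter range, and the split point are all illustrative assumptions. The script tunes the lookback of a simple moving-average rule on an optimization period and then evaluates the same parameter on a held-out test period.

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic random-walk prices: any "edge" found here is noise by construction.
prices = 100 * np.exp(np.cumsum(rng.normal(0.0002, 0.01, 2000)))

def sma_strategy_returns(prices, lookback):
    """Long/flat rule: hold the asset while price is above its moving average."""
    returns = np.diff(prices) / prices[:-1]
    sma = np.convolve(prices, np.ones(lookback) / lookback, mode="valid")
    # Position decided at the close of day t earns the return from t to t+1.
    position = (prices[lookback - 1:-1] > sma[:-1]).astype(float)
    return position * returns[lookback - 1:]

def annualized_sharpe(r):
    return np.sqrt(252) * r.mean() / r.std() if r.std() > 0 else 0.0

split = 1400
train, test = prices[:split], prices[split:]

# Optimization-period "backtest": pick the lookback that maximizes in-sample Sharpe.
best = max(range(5, 200), key=lambda lb: annualized_sharpe(sma_strategy_returns(train, lb)))
print(f"lookback={best}  in-sample Sharpe={annualized_sharpe(sma_strategy_returns(train, best)):.2f}")

# The honest number: the same parameter on data it has never seen.
print(f"out-of-sample Sharpe={annualized_sharpe(sma_strategy_returns(test, best)):.2f}")
```

In our experience, the in-sample line from such a search almost always looks tradeable, while the out-of-sample line usually does not; that gap is the whole point of a separate test period.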

Backtests are seductive because they compress years of hypothetical trading into one elegant line. They suggest repeatability, objectivity, inevitability. But what they actually reflect are layers of assumptions: about how data is cleaned, how trades are executed, how costs are modeled, and how risks unfold. The chart does not tell you whether the system will survive tomorrow’s chaos. It only shows how it would have fared in yesterday’s order.

The Hidden Fragility of “Perfect” Performance

The deeper danger of backtests is that they almost always benefit from hindsight — even when the designer tries to be careful. Parameters are adjusted to fit the historical data. Rare events that never happened in the dataset are ignored. Structural changes in market regimes are invisible because they haven’t yet occurred.

This creates a paradox: the more time you spend “improving” a backtest, the more impressive it looks, but the less realistic it becomes. A strategy that perfectly fits the past often performs poorly in the future, because it has learned yesterday’s noise, not tomorrow’s signal. As Nassim Nicholas Taleb points out in Antifragile, systems built to look smooth and efficient in stable conditions are often the least capable of withstanding disorder when it inevitably strikes. His proposed antidote — the “barbell strategy” — is not to seek perfection, but to build robustness by combining extreme safety on one side with calculated risk-taking on the other.

For beginners, this trap is obvious: curve-fitting a simple moving average to match the S&P 500. For professionals, it is more subtle: over-optimizing complex machine learning models until their predictions are indistinguishable from historical quirks. In both cases, the backtest shines on paper and fails in practice.

Research Spotlight

Academic studies reinforce this sobering truth.

    • Sullivan, Timmermann & White (1999): Their landmark paper showed that most technical trading rules that looked profitable historically lost their edge once adjusted for data-snooping bias.
    • Bailey et al. (2014): Introduced the concept of “probabilistic Sharpe ratios,” quantifying how likely a strategy’s performance is due to skill versus luck (a minimal sketch of this calculation follows after this list).
    • Lo & MacKinlay (1990): Demonstrated the dangers of data mining, showing how seemingly strong results can collapse when tested on out-of-sample data.
    • Bailey, Borwein, López de Prado & Zhu (2014), The Probability of Backtest Overfitting: Provided a rigorous statistical framework to measure how likely an investment strategy’s backtest results are the product of overfitting rather than genuine predictive power.
    • MIT Sloan (2023): Reviewed over 400 published backtested strategies in finance; fewer than 15% maintained statistically significant performance when re-tested on out-of-sample data.
    • CRSP, University of Chicago: Found that transaction costs and slippage — often simplified or ignored in backtests — wiped out more than half of the reported “alpha” across dozens of quant strategies.
    • Industry evidence: Even hedge funds with strong research pipelines admit that only a small fraction of tested strategies survive. One major multi-strategy fund reported that fewer than 1 in 100 internally tested ideas make it to live trading.
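For readers who want to try the skill-versus-luck estimate from the list above, here is a minimal sketch of the probabilistic Sharpe ratio as published by Bailey and López de Prado. The function name, the synthetic track record, and the benchmark Sharpe of zero are our own illustrative choices, not part of the original paper.

```python
import numpy as np
from statistics import NormalDist

def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0):
    """Probability that the true Sharpe ratio exceeds sr_benchmark,
    given T observed returns (Bailey & Lopez de Prado)."""
    r = np.asarray(returns, dtype=float)
    t = len(r)
    sr = r.mean() / r.std(ddof=1)             # observed per-period Sharpe
    z = (r - r.mean()) / r.std(ddof=0)
    skew = (z**3).mean()                      # sample skewness
    kurt = (z**4).mean()                      # sample kurtosis (normal = 3)
    stat = (sr - sr_benchmark) * np.sqrt(t - 1)
    stat /= np.sqrt(1 - skew * sr + (kurt - 1) / 4 * sr**2)
    return NormalDist().cdf(stat)

# A short, smooth-looking track record is weak statistical evidence.
rng = np.random.default_rng(0)
short_track = rng.normal(0.001, 0.01, 120)    # ~6 months of daily returns
print(f"PSR vs SR*=0: {probabilistic_sharpe_ratio(short_track):.2f}")
```

In this toy example the PSR typically comes out well short of certainty: even a clean six-month backtest leaves a meaningful probability that the true Sharpe ratio is zero or below.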

The message is clear: the backtest is not proof. At best, it is a hypothesis. Always ask about the test period (results on data unseen during optimization), because everything looks good on the very data it was optimized on; that is not evidence of robustness, it is just mathematics fitting itself to history.

Omphalos Perspective

At Omphalos Fund, we view backtests as stress tests, not success stories. A backtest does not confirm that a strategy “works”; it reveals the boundaries of where it might fail.

This is why our research team deliberately avoids chasing “perfect” historical fits. Instead, we ask:

    • How does the strategy behave in extreme negative periods?
    • Is performance uncorrelated with other strategies already in the system?
    • Does it remain robust when transaction costs or assumptions are varied? (See the cost-sensitivity sketch below.)
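As an illustration of that last question, the sketch below charges a hypothetical fixed cost in basis points per unit of turnover and recomputes the Sharpe ratio as that assumption varies. The signal, the returns, and the cost levels are all invented for the example; this is not our internal cost model.

```python
import numpy as np

def net_sharpe(signal, returns, cost_bps):
    """Annualized Sharpe after charging cost_bps of notional per unit of turnover."""
    gross = signal[:-1] * returns[1:]           # position held into the next bar
    turnover = np.abs(np.diff(signal))          # position changes trigger costs
    net = gross - turnover * cost_bps / 10_000.0
    return np.sqrt(252) * net.mean() / net.std()

rng = np.random.default_rng(7)
returns = rng.normal(0.0003, 0.01, 2500)        # synthetic daily returns with a small drift
signal = np.sign(rng.normal(size=2500)) * 0.5 + 0.5   # hypothetical noisy long/flat signal

for bps in (0, 2, 5, 10, 20):
    print(f"{bps:>3} bps per trade -> Sharpe {net_sharpe(signal, returns, bps):+.2f}")
```

A few basis points of realistic friction are often enough to turn a marginally positive gross Sharpe negative, which is consistent with the evidence on costs and slippage cited in the Research Spotlight above.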

By treating backtests as tools for falsification rather than validation, we avoid falling in love with models that look good on paper but cannot survive reality. This philosophy allows us to scale a portfolio of agents that complement one another, rather than chase the illusion of one flawless backtest.

A Case in Point: The XIV ETN Collapse (“Volmageddon,” 2018)

On February 5, 2018, the volatility-linked exchange-traded note XIV lost over 90% of its value in a single day. For years, products like XIV had backtested beautifully: selling volatility seemed like a consistent way to generate steady returns. Investors saw smooth equity curves and assumed the strategy was safe.

But the backtests missed a crucial fact: volatility is not normally distributed. When stress events hit, volatility spikes can be extreme and sudden. On that day in February, years of profits evaporated in hours. The backtests, which had shown steady gains, were no preparation for the reality of an outlier event.
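The mechanics are easy to reproduce in a deliberately stylized simulation. Everything below is invented for illustration (the carry, the volatility, and the single stress day); it is not a model of XIV itself, only of the payoff shape: many small gains followed by one extreme loss that the backtest window never contained.

```python
import numpy as np

rng = np.random.default_rng(2018)
days = 1000
# Stylized short-volatility payoff: a small positive carry almost every day...
daily = rng.normal(0.0008, 0.003, days)
# ...and one hypothetical stress day that the historical window never contained.
daily[-1] = -0.92                            # a single-day, XIV-style collapse
equity = np.cumprod(1 + daily)
print(f"peak equity:  {equity.max():.2f}x")  # years of smooth gains...
print(f"final equity: {equity[-1]:.2f}x")    # ...erased in one session
```

Every statistic computed on the first 999 days of this curve (Sharpe ratio, drawdown, win rate) would have looked superb, and none of them contained any information about day 1,000.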

This collapse is a vivid reminder that anything can happen. A backtest can tell a comforting story, but markets often write their own endings.

👉 In the next chapter, we’ll explore another paradox: How you can have a strategy that wins 90% of its trades and still ends up losing money — a pitfall that reveals the importance of risk management and payoff asymmetry in systematic investing.

Stay tuned for Behind The Cloud, where we’ll continue to explore the frontiers of AI in finance and investing.

Funds Europe nominated Omphalos Fund for the “Funds Europe Awards 2025” in the category “European Thought Leader of the Year”.

If you missed our previous editions of “Behind The Cloud”, please check out our BLOG.

© The Omphalos AI Research Team October 2025

If you would like to use our content please contact press@omphalosfund.com 
