This is the 2nd chapter of the 11th 'Behind The Cloud' series:
The Data Engine - How AI Funds Sense Markets
Omphalos’ long-term development has reinforced one lesson: in live markets, robustness beats cleverness.
Data is where robustness begins.
This series continues the Behind The Cloud mission: to share research-based insights into what truly drives AI investing, beyond buzzwords, beyond demos, and always grounded in real-world constraints.
Trust good (!) data, not just AI.
Point-in-Time Reality - What the System Could Actually Have Known
This chapter explains why point in time integrity is the foundation of any credible AI investment process. It covers revisions, publication lags, survivorship bias, corporate action adjustments, and timestamp mismatches, all of which can accidentally leak the future into the past.
Point in time discipline is not a technical detail. It is a prerequisite for trust, for robustness, and for honest performance evaluation.
The Core Idea, A Dataset Is Not a Time Machine
When people say they have historical data, they usually mean they have a table of values labeled by date. In finance, that is not enough. The question is not what the dataset contains today. The question is what the market would have known at the time. Point in time data means every value is stored exactly as it was available then, including delays, missing observations, and revisions that were not yet published.
Without that, research becomes a subtle form of time travel. Models learn from information that did not exist. Strategies appear to anticipate events they could never have seen. The backtest becomes a mirror of hindsight, not a simulation of reality.
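One way to picture this is a store that keeps every version of a value together with the day it became available. The sketch below is a minimal illustration, not a description of any particular production system: the `Observation` record, the `known_from` field, the `value_as_of` helper, and the numbers are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    series: str
    period: date       # the period the value describes
    known_from: date   # the day this version became publicly available
    value: float


# Made-up numbers for illustration only, not real releases.
HISTORY: List[Observation] = [
    Observation("gdp_growth", date(2020, 3, 31), date(2020, 4, 29), 2.1),  # first print
    Observation("gdp_growth", date(2020, 3, 31), date(2020, 5, 28), 1.6),  # later revision
]


def value_as_of(history: List[Observation], series: str,
                period: date, as_of: date) -> Optional[float]:
    """Return the latest version of a value that was already public on `as_of`."""
    visible = [
        o for o in history
        if o.series == series and o.period == period and o.known_from <= as_of
    ]
    if not visible:
        return None  # nothing had been published yet
    return max(visible, key=lambda o: o.known_from).value
```

Asking for the same period on three different dates gives three different answers: nothing before the first release, the first print after it, and the revised figure only once the revision is out. A table keyed by period alone can only ever return the last of these.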
Revisions, When the Past Keeps Changing
Many financial and economic datasets are revised. Inflation readings are updated. GDP figures are restated. Company fundamentals change through accounting adjustments. Index compositions are updated after events occur.
If you build models using the latest revised values, you implicitly assume that the past was as clear as the present. In real markets, it was not. This matters because revisions often happen around stress periods, which are precisely when models are most fragile. A strategy that looks stable on revised history can fail in live trading because it was trained on a cleaner world than the one it operates in.
Point in time discipline forces humility. It removes the illusion that historical data is truth. It makes clear that historical data is a sequence of imperfect observations.
Publication Lags, Information Has a Clock
Even when data is not revised, it arrives with a delay.
Macroeconomic releases are published at specific times. Earnings are reported on known schedules. News is timestamped, but market digestion is not instantaneous. If you align data by calendar date rather than by availability time, you introduce leakage.
A classic example is using an end-of-day close to generate a signal that is assumed to have been available at that day's open. The shift looks small, only a few hours, but the effect on backtested performance can be enormous.
A robust data engine aligns everything to the moment it could have been known, not the day it refers to. That is why point in time is not only about values. It is about clocks.
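The close-versus-open example can be made concrete with a small sketch. It assumes each data point carries the time it became available. The prices and the helper names `latest_available` and `leaky_by_date` are hypothetical and serve only to contrast the two alignment rules.

```python
from bisect import bisect_right
from datetime import datetime

# (available_at, close) pairs, sorted by availability time.
# Illustrative prices; the close is assumed to be known at 16:00.
CLOSES = [
    (datetime(2024, 1, 2, 16, 0), 100.0),
    (datetime(2024, 1, 3, 16, 0), 103.0),
    (datetime(2024, 1, 4, 16, 0), 101.0),
]


def latest_available(points, decision_time):
    """Availability-time alignment: the last value published at or before the decision."""
    times = [t for t, _ in points]
    i = bisect_right(times, decision_time)
    return points[i - 1][1] if i > 0 else None


def leaky_by_date(points, decision_time):
    """Calendar-date alignment (the bug): picks the value labeled with the same day."""
    for t, value in points:
        if t.date() == decision_time.date():
            return value
    return None


open_on_jan_3 = datetime(2024, 1, 3, 9, 30)
print(latest_available(CLOSES, open_on_jan_3))  # 100.0, the previous day's close
print(leaky_by_date(CLOSES, open_on_jan_3))     # 103.0, a close not yet known at 09:30
```

Both functions return a price for the 3 January open. Only the first returns one that existed at the time.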
Survivorship Bias, The Missing Dead Matter
Another invisible distortion is survivorship bias. Many datasets, especially those assembled from public sources, silently drop instruments that disappeared. Delisted stocks vanish. Failed funds vanish. Merged entities are replaced by survivors. Index histories are presented as if today’s constituents were always present.
This creates a false world. It makes strategies appear safer and more profitable than they were. It hides drawdowns and volatility that come from failure, default, and disappearance, which are real features of markets.
Survivorship bias is especially dangerous for AI systems because it removes the negative examples the model needs to learn robustness. A system trained on survivors learns a world with less ruin. It becomes overconfident.
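A minimal sketch of the difference between a survivor-only universe and a point in time universe follows. The tickers, dates, and the `Listing` record are invented for illustration; the only assumption is that listing and delisting dates are kept for every instrument, including the ones that no longer exist.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Listing:
    ticker: str
    listed: date
    delisted: Optional[date]  # None while the instrument is still trading


# Hypothetical instruments, not real securities.
LISTINGS = [
    Listing("AAA", date(2000, 1, 3), None),
    Listing("BBB", date(2005, 6, 1), date(2012, 9, 14)),  # failed and delisted
    Listing("CCC", date(2015, 3, 2), None),
]


def universe_as_of(listings: List[Listing], as_of: date) -> List[str]:
    """Instruments that were actually tradable on `as_of`, dead or alive today."""
    return sorted(
        l.ticker for l in listings
        if l.listed <= as_of and (l.delisted is None or as_of < l.delisted)
    )


def survivors_only(listings: List[Listing]) -> List[str]:
    """The biased view: whatever still exists today, projected onto all of history."""
    return sorted(l.ticker for l in listings if l.delisted is None)


print(universe_as_of(LISTINGS, date(2010, 1, 4)))  # ['AAA', 'BBB']
print(survivors_only(LISTINGS))                    # ['AAA', 'CCC']
```

A backtest run on the survivor list for 2010 would miss the instrument that later failed and would include one that had not yet been listed.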
Corporate Actions and Adjustments, Clean Series Hide Messy Reality
Financial time series are often adjusted for splits, dividends, and corporate actions. This is necessary for comparability. But it can also hide reality if the adjustment is misapplied or not point in time.
The moment you adjust history using information known only later, you contaminate the training set. The model learns from a series that is internally consistent but historically impossible.
This is why a production-grade pipeline treats corporate actions as first-class events, not as background housekeeping. It keeps raw and adjusted series distinct, timestamps adjustment events, and ensures that what the model sees reflects what the market could have seen.
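As a rough sketch of what keeping raw prices and adjustment events separate can look like, the example below stores splits as dated events and applies only those already in effect on the as-of date. The `Split` record and `adjusted_price` helper are hypothetical, and using the effective date as the moment of knowledge is a simplifying assumption; a fuller treatment would also track announcement dates and dividends.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Split:
    effective: date
    ratio: float  # new shares per old share, e.g. 2.0 for a 2-for-1 split


def adjusted_price(raw_price: float, price_date: date,
                   splits: List[Split], as_of: date) -> float:
    """Adjust a raw price using only splits effective after the price date
    and on or before `as_of`. Later splits are ignored because they were
    not yet part of the record."""
    factor = 1.0
    for s in splits:
        if price_date < s.effective <= as_of:
            factor *= s.ratio
    return raw_price / factor


# Illustrative event: a 2-for-1 split effective 1 June 2020.
SPLITS = [Split(date(2020, 6, 1), 2.0)]

# Raw close of 200.0 on 10 January 2020, viewed from two different dates.
print(adjusted_price(200.0, date(2020, 1, 10), SPLITS, as_of=date(2020, 3, 2)))  # 200.0
print(adjusted_price(200.0, date(2020, 1, 10), SPLITS, as_of=date(2020, 7, 1)))  # 100.0
```

The raw price never changes. What changes is the adjusted view, and that view depends on the date from which history is observed.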
The Hidden Trap, Timestamp and Calendar Mismatches
Most leakage is not caused by malicious intent. It is caused by alignment errors.
Different markets operate in different time zones. Some instruments trade nearly continuously. Others trade in sessions. News arrives at one time, but data is indexed by another. If you merge sources without consistent timestamps, you can easily shift information across the boundary between past and future.
The danger is that this can improve backtests without being obvious. A few hours of leakage is enough to inflate Sharpe ratios, reduce apparent drawdowns, and make a strategy look robust. In other words, small timestamp errors can create large illusions.
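A small sketch shows how a time zone mismatch alone can move information across the past/future boundary. The scenario is invented, and the fixed UTC offsets are an assumption made to keep the example self-contained; they ignore daylight saving time, which a real pipeline would handle with a proper time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets for illustration only (no daylight saving handling).
NEW_YORK_WINTER = timezone(timedelta(hours=-5))
TOKYO = timezone(timedelta(hours=9))


def to_utc(naive_local: datetime, tz: timezone) -> datetime:
    """Attach the source's time zone to a local timestamp and convert to UTC."""
    return naive_local.replace(tzinfo=tz).astimezone(timezone.utc)


def naive_was_known(event_local: datetime, decision_local: datetime) -> bool:
    """The bug: compares wall-clock times from two different time zones."""
    return event_local <= decision_local


def was_known(event_local: datetime, event_tz: timezone,
              decision_local: datetime, decision_tz: timezone) -> bool:
    """Compare on a single clock (UTC) before deciding what was knowable."""
    return to_utc(event_local, event_tz) <= to_utc(decision_local, decision_tz)


# News released in New York at 18:00 on 9 January (23:00 UTC).
news_ny = datetime(2024, 1, 9, 18, 0)
# Trading decision in Tokyo at 07:00 on 10 January (22:00 UTC on 9 January).
decision_tokyo = datetime(2024, 1, 10, 7, 0)

print(naive_was_known(news_ny, decision_tokyo))                        # True, leaked
print(was_known(news_ny, NEW_YORK_WINTER, decision_tokyo, TOKYO))      # False
```

On the local calendars the news appears to be from the previous day. On a shared clock it arrives one hour after the decision.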
LLMs and the New Form of Leakage
Large language models add a new challenge.
LLMs can summarize, contextualize, and structure information at scale. That is powerful. But they can also amplify point in time mistakes because they rationalize what they are given. If the retrieval layer surfaces revised documents, mixed time periods, or content without point in time context, the model will produce coherent answers that feel correct while being historically impossible.
There is another problem that is just as important: methodology changes. Many datasets do not only change because values are revised. They change because the way values are calculated changes. Economic indicators are rebased. Inflation baskets are adjusted. Accounting standards evolve. Index rules are modified. Vendor definitions, classifications, and estimation methods are updated. A time series can therefore look continuous while measuring slightly different things across time.
For an LLM, this is particularly dangerous. If the retrieval system does not surface methodology changes, the model may compare values across periods as if they were fully equivalent. It may explain a trend that partly reflects a calculation change. It may interpret a structural break as an economic signal, when part of the break comes from the measurement system itself.
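One simple safeguard is to keep methodology changes as dated records alongside the series and to check for them before comparing two periods. The sketch below is a minimal illustration under that assumption; the `MethodologyChange` record, the series name, and the change itself are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class MethodologyChange:
    series: str
    effective: date
    note: str


# Hypothetical change log; in practice this would come from vendor or agency notices.
CHANGES = [
    MethodologyChange("cpi_index", date(2021, 1, 1), "basket reweighted"),
]


def breaks_between(changes: List[MethodologyChange], series: str,
                   start: date, end: date) -> List[MethodologyChange]:
    """Methodology changes that took effect after `start` and on or before `end`."""
    return [
        c for c in changes
        if c.series == series and start < c.effective <= end
    ]


def comparable(changes: List[MethodologyChange], series: str,
               start: date, end: date) -> bool:
    """True only if both dates fall under the same calculation method."""
    return not breaks_between(changes, series, start, end)


print(comparable(CHANGES, "cpi_index", date(2019, 6, 30), date(2020, 6, 30)))  # True
print(comparable(CHANGES, "cpi_index", date(2020, 6, 30), date(2021, 6, 30)))  # False
```

The same list returned by `breaks_between` could be passed to a model as context, so that a comparison across a break is explained as such rather than read as a pure economic signal.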
This does not mean LLMs are unusable. It means they must be bounded. In LLM driven workflows, point in time governance must extend beyond numeric data feeds. It must include documents, text sources, and retrieval pipelines, ensuring the model reasons over verified and time consistent context.
Without that, language becomes a tool for creating plausible hindsight.
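For document retrieval, bounding the model can start with a filter that runs before anything reaches the prompt. The sketch below assumes every document version carries its own publication timestamp; the `Document` record, the `context_as_of` helper, and the sample documents are hypothetical and do not describe any specific retrieval framework.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Document:
    doc_id: str
    published_at: datetime  # when this version became available
    text: str


# Illustrative corpus: one report with a later revision, plus a later note.
DOCS = [
    Document("report-1", datetime(2024, 1, 5, 8, 0), "original figures"),
    Document("report-1", datetime(2024, 2, 10, 8, 0), "revised figures"),
    Document("note-2", datetime(2024, 3, 1, 12, 0), "follow-up commentary"),
]


def context_as_of(docs: List[Document], as_of: datetime) -> List[Document]:
    """Return, per document, the latest version published on or before `as_of`.
    Versions and documents published later are excluded entirely."""
    latest: Dict[str, Document] = {}
    for d in docs:
        if d.published_at > as_of:
            continue  # not yet public at the simulated moment
        current = latest.get(d.doc_id)
        if current is None or d.published_at > current.published_at:
            latest[d.doc_id] = d
    return sorted(latest.values(), key=lambda d: d.doc_id)


for d in context_as_of(DOCS, datetime(2024, 1, 20)):
    print(d.doc_id, "->", d.text)  # report-1 -> original figures
```

A model asked to reason as of 20 January then sees the original report only, not the revision and not the later commentary.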
The Hard Part, These Problems Rarely Come Alone
In practice, point-in-time failures rarely appear as one clean error. They usually come in combinations. A macro series may be revised, rebased, and published with a delay. A company dataset may include survivorship bias, corporate action adjustments, and changing reporting standards. A vendor feed may change its methodology at one point in time, its coverage at another, and its timestamp logic later. Each issue may look small on its own, but together they can distort the historical record in ways that are very difficult to detect.
This is what makes point-in-time discipline so demanding. The problem is not only that the future can leak into the past. The problem is that different forms of distortion can appear at different moments, across different datasets, and in different combinations. A model may still produce clean backtests and coherent explanations, while learning from a history that never truly existed in that form.
What Point in Time Discipline Looks Like in Practice
Point in time discipline is not a slogan. It is an operating model.
This discipline is tedious. It is expensive. It does not show up in marketing. But it is the difference between an AI system that learns reality and one that learns a fantasy.
Omphalos Perspective
At Omphalos, point in time truth is treated as a core constraint. We do not assume that datasets are reliable by default. We assume they are contaminated until proven otherwise.
This is why point in time integrity is embedded into the data engine, from timestamp alignment to revision handling and from instrument survival to document context. It also means continuously monitoring and verifying data sources, especially when vendors, index providers, statistical agencies, or reporting frameworks change the methodology behind how values are calculated. A number may look comparable across time, but if the calculation logic has changed, the meaning of the data has changed as well.
In live markets, this discipline is not optional. Without it, every layer of intelligence becomes suspect.
Trust good (!) data, not just AI.
In the next chapter, we move from the timing of information to the quality of information itself. Noise is not random. It is structured by market microstructure, liquidity regimes, and execution mechanics, and it can mislead models if it is treated as statistical clutter instead of market behavior.
Next week we will publish the third chapter of this series: "Noise Is Not Random, Microstructure, Regimes, and False Signals"
If you missed earlier editions of "Behind The Cloud", please check out our blog.
Omphalos Fund won the "Funds Europe Awards 2025" in the category "European Thought Leader of the Year".
Omphalos Fund won the "EuroHedge Awards 2025"
© The Omphalos AI Research Team - July 2026
If you would like to use our content please contact press@omphalosfund.com