#23 - Behind The Cloud: Demystifying AI in Asset Management: Is It Really a Black Box? (4/6)
Avoiding AI Overfitting and Bias in Investing – Safeguards for a Transparent Future
November 2024
As AI continues to shape the landscape of asset management, two critical challenges remain front and center: overfitting and bias.
Overfitting occurs when an AI model performs exceptionally well on historical data but struggles to generalize to new, unseen data. It is a common issue in the investment industry, where models often shine in backtests only to fall short in live markets.
Bias, on the other hand, refers to unintended distortions that can arise from the data or the model itself, leading to unfair or misleading predictions. In this chapter, we will explore how AI systems can be safeguarded against overfitting and bias to ensure they deliver reliable and transparent investment outcomes.
What Is Overfitting?
Overfitting happens when an AI model becomes too “fit” to the specific dataset it has been trained on. Instead of learning the underlying patterns that apply generally, the model starts to memorize specific details that are only relevant to the historical data. As a result, the AI model may generate strong results in backtesting but struggle to perform in real-world situations where new, unseen data is introduced.
Key Characteristics of Overfitting:
- High accuracy on historical data: The model may seem incredibly accurate when tested on past data.
- Poor performance on new data: When applied to new market conditions or data, the model’s predictions may become unreliable.
- Overly complex models: The more complex the model, the higher the risk of overfitting. It may start learning noise and insignificant details rather than the real patterns.
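To make this concrete, here is a minimal, illustrative sketch in Python with scikit-learn (our addition, not a trading model): a degree-15 polynomial fits 30 noisy training points almost perfectly, yet typically predicts fresh samples worse than a simpler degree-3 fit.

```python
# Toy overfitting demo (illustrative only): an over-complex model memorizes noise.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 30)        # signal + noise

X_new = rng.uniform(0, 1, 30).reshape(-1, 1)                      # unseen data
y_new = np.sin(2 * np.pi * X_new).ravel() + rng.normal(0, 0.3, 30)

for degree in (3, 15):  # modest vs. overly complex model
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    train_err = mean_squared_error(y, model.predict(X))            # degree 15: near zero
    test_err = mean_squared_error(y_new, model.predict(X_new))     # degree 15: typically much larger
    print(f"degree {degree}: train MSE {train_err:.3f}, unseen MSE {test_err:.3f}")
```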
How Overfitting Affects AI-Driven Investing
In the context of AI-driven investing, overfitting can have serious consequences. An AI model that has overfit on historical market data may produce misleading investment signals when conditions change. For example, the model might excel at identifying past market trends that no longer apply, leading to poor investment decisions.
Example in Finance: Imagine an AI model trained on stock market data from the past five years, a period of steady economic growth. If that model overfits to this data, it might fail to recognize warning signs of an impending market downturn, as it has “learned” to expect continuous growth. When the real market shifts, the model might suggest staying long on stocks, leading to significant losses.
Guarding Against Overfitting
To avoid overfitting, asset managers must ensure that their AI models are rigorously tested and validated. Several techniques can help reduce the risk of overfitting and improve the model’s performance on unseen data:
Proper Selection and Construction of the Training Set: The most crucial step to avoid overfitting is selecting a representative and well-constructed training set. Ensuring that the training data covers a broad range of market scenarios allows the model to generalize more effectively to real-world conditions.
Cross-Validation: This technique involves splitting the data into different subsets and training the model on one subset while testing it on another. By repeating this across multiple iterations, cross-validation helps assess how well the model generalizes to new data.
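As a minimal sketch of the idea (our illustration, using scikit-learn and placeholder data standing in for real features and returns), a time-aware splitter such as TimeSeriesSplit keeps every validation fold strictly after its training window. That ordering matters for financial data, where randomly shuffled folds would leak future information into training.

```python
# Time-ordered cross-validation sketch (illustrative, not a production pipeline).
import numpy as np
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.linear_model import Ridge

X = np.random.randn(500, 10)   # placeholder features (e.g., daily factors)
y = np.random.randn(500)       # placeholder target (e.g., next-day return)

scores = cross_val_score(Ridge(), X, y,
                         cv=TimeSeriesSplit(n_splits=5),
                         scoring="neg_mean_squared_error")
# Stable scores across folds are one sign the model generalizes;
# wildly varying scores suggest it is fitting period-specific noise.
print(scores.mean(), scores.std())
```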
Regularization: Regularization techniques impose penalties on overly complex models, encouraging them to simplify their predictions. This reduces the risk of the model fitting noise or insignificant details in the training data.
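One common form is an L1 (Lasso) penalty, sketched below on synthetic data where only two of twenty inputs carry signal; the penalty strength alpha is an assumption here and would in practice be tuned on a validation set.

```python
# Regularization sketch (illustrative): an L1 penalty shrinks the weights of
# uninformative inputs toward zero, so the model cannot lean on noise features.
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 20))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.standard_normal(200)  # only 2 real signals

plain = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)         # alpha = penalty strength (assumed, normally tuned)
print((np.abs(plain.coef_) > 0.01).sum())  # unpenalized fit spreads weight over noise features
print((np.abs(lasso.coef_) > 0.01).sum())  # Lasso keeps roughly the two true ones
```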
Model Size vs. Training Set Size: Ensuring that the model complexity is balanced with the size of the training data is essential. If a model has more parameters than there are samples in the training set, it risks memorizing the entire training data rather than learning meaningful patterns. This simple ratio check can help prevent overfitting by ensuring the model has the right level of complexity.
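As a hypothetical illustration, the check can be a one-line rule of thumb. The 10x samples-per-parameter threshold below is our assumption, not a universal constant; the right margin depends on the model class and the data.

```python
# Parameter-to-sample ratio check (hypothetical rule of thumb).
def check_capacity(n_parameters: int, n_samples: int, min_ratio: float = 10.0) -> bool:
    """Return True if there are at least `min_ratio` training samples per parameter."""
    return n_samples >= min_ratio * n_parameters

print(check_capacity(n_parameters=1_000, n_samples=5_000))   # False: risk of memorization
print(check_capacity(n_parameters=1_000, n_samples=50_000))  # True: healthier balance
```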
Pruning Unnecessary Features: In AI models, not all features (or input variables) are equally important. By removing features that don’t add value or introduce noise, the model becomes less likely to overfit to irrelevant details.
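A minimal sketch of one pruning approach, univariate feature scoring, is shown below; real pipelines might instead rely on model-based importances or domain knowledge.

```python
# Feature-pruning sketch (illustrative): keep only the inputs whose univariate
# relationship with the target is strongest, discarding likely noise.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 50))                       # 50 candidate features
y = X[:, 0] + 0.5 * X[:, 1] + rng.standard_normal(300)  # only 2 actually matter

selector = SelectKBest(score_func=f_regression, k=10).fit(X, y)
X_pruned = selector.transform(X)   # keep the 10 strongest candidates
print(X_pruned.shape)              # (300, 10)
```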
Use of Test and Validation Sets: A validation set is used during training to select the best model configuration by evaluating performance on data it hasn’t seen during training. This helps fine-tune the model’s parameters without using the test set, ensuring that the final model is optimized without overfitting to training data. The test set is then used only once, at the end, to evaluate the model’s performance objectively, giving an unbiased assessment of how well it generalizes to truly unseen data.
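In sketch form, with made-up 70/15/15 proportions and placeholder data, a chronological three-way split looks like the following; the essential discipline is that the test slice is touched exactly once, after all tuning is finished.

```python
# Chronological train/validation/test split sketch (proportions are assumptions).
import numpy as np

X = np.random.randn(1000, 10)   # placeholder time-ordered features
y = np.random.randn(1000)

n = len(X)
train_end, val_end = int(0.70 * n), int(0.85 * n)
X_train, y_train = X[:train_end], y[:train_end]            # fit candidate models
X_val, y_val = X[train_end:val_end], y[train_end:val_end]  # choose hyperparameters
X_test, y_test = X[val_end:], y[val_end:]                  # final, one-shot evaluation
```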
Human Oversight: Human experts at Omphalos play a critical role in monitoring the model’s performance. They ensure that the AI is not generating signals that look suspiciously good on historical data but fail under new market conditions. This oversight helps catch potential overfitting issues early on.
Understanding Bias in AI Models
Bias in AI refers to unintended distortions or patterns that can lead the model to produce unfair or misleading predictions. In asset management, bias can stem from data quality, model assumptions, or the way the model is trained. If left unchecked, bias can lead to suboptimal investment decisions and even reputational damage for a fund.
Sources of Bias
- Data Bias: If the data used to train the AI contains biases, the model will inevitably pick them up. For example, if the training data overrepresents certain market conditions or asset classes, the model might favor those types of investments, even when they’re not the best choice.
- Model Bias: Some models have inherent biases based on the algorithms they use. For instance, a model might be more likely to pick safe, low-volatility investments because it prioritizes stability over growth. This could bias the portfolio toward lower returns in the long run.
- Confirmation Bias: When humans select the data for training, they may unconsciously choose data that supports their assumptions or existing strategies, introducing bias before the model is even trained.
Safeguards Against Bias in AI Models
To minimize the risk of bias in AI, firms must take a proactive approach, implementing safeguards throughout the data selection and model-building processes.
These include:
- Diverse and Balanced Datasets: One of the most effective ways to combat bias (and also overfitting) is by ensuring the data used to train the model is diverse and representative of different market conditions. By including data from various economic cycles, sectors, and geographies, firms can reduce the risk of the model becoming skewed toward specific biases.
- Regular Audits of Model Performance: Human experts regularly audit the AI models at Omphalos to check for signs of bias. This might involve running the model on different datasets to see whether it produces consistent results across diverse conditions; a simplified version of such a check is sketched after this list.
- Bias Detection Algorithms: Some AI models are built with bias detection algorithms that flag potential biases in the decision-making process. This allows firms to catch biases before they influence investment decisions.
- Transparent Model Development: Having a clear, transparent process for how models are built and trained helps identify potential biases early. Omphalos emphasizes this transparency, ensuring that the strategies driving the AI are open to scrutiny.
- Human Oversight and Data Selection: Human intervention is critical in ensuring that the data fed into the model is unbiased. Experts at Omphalos carefully choose reliable and diverse data sources, ensuring the AI system works with high-quality inputs. Moreover, human oversight during the model’s use helps ensure that biases don’t affect the final investment decisions.
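The regime-consistency check mentioned above could, in simplified form, look like the sketch below. This is our illustration, not Omphalos's actual audit process; the regime labels, the fitted model, and the tolerance threshold are all assumptions for the example.

```python
# Simplified bias audit (illustrative): score a fitted model separately on
# labelled market regimes and flag regimes where its error is far worse,
# which can signal that training data over-represented one environment.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

def audit_by_regime(model, X, y, regimes, tolerance=2.0):
    """Flag regimes whose error exceeds `tolerance` times the best regime's error."""
    errors = {regime: mean_squared_error(y[regimes == regime],
                                         model.predict(X[regimes == regime]))
              for regime in np.unique(regimes)}
    best = min(errors.values())
    flagged = [r for r, e in errors.items() if e > tolerance * best]
    return errors, flagged

# Demo on random placeholder data with hypothetical regime labels:
rng = np.random.default_rng(3)
X = rng.standard_normal((300, 5))
y = rng.standard_normal(300)
regimes = rng.choice(np.array(["bull", "bear", "sideways"]), size=300)
print(audit_by_regime(Ridge().fit(X, y), X, y, regimes))
```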
AI Overfitting vs. Human Bias
It’s important to note that both AI models and human fund managers are susceptible to bias, though the causes may differ. Human fund managers may rely on past experience or gut feeling, which can introduce emotional or cognitive biases into investment decisions. AI models, on the other hand, are primarily influenced by data biases or the structure of the model itself.
Example of Human Bias: A fund manager might hold onto a poorly performing asset because they’ve personally invested time and resources into researching it, even when all evidence suggests it’s time to sell. This emotional attachment introduces bias into the decision-making process.
In contrast, AI systems operate without emotion, relying purely on data. While this reduces the risk of emotional bias, data biases or overfitting still need to be addressed to ensure the AI’s decisions remain objective and reliable.
Omphalos Fund’s Approach to AI Transparency
At Omphalos Fund, our approach to AI-led investing combines the power of AI with meticulous human oversight to safeguard against overfitting and bias. While AI handles the core tasks of data analysis and signal generation, human experts ensure that data inputs are reliable, diverse, and unbiased. An essential component of our methodology is assessing the model’s ability to generalize to unseen data—a key measure of a model’s reliability in real-world scenarios.
To evaluate this generalization ability accurately, we train models using cross-validation across multiple folds and test them on several distinct datasets. This approach allows us to estimate the probability of the model's success on genuinely unseen data, taking into account variations in trend, seasonality, and unique market patterns across different time periods. Rather than relying solely on historical performance, our process emphasizes the ability to perform well under future market conditions.
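Conceptually, and only as a simplified illustration of the general idea rather than the fund's proprietary methodology, such a probability can be estimated as the fraction of chronologically ordered out-of-sample folds in which a model beats a naive baseline.

```python
# Generalization scored as a success probability (simplified illustration):
# the share of time-ordered out-of-sample folds where the model beats a
# naive mean-prediction baseline. Data and model are placeholders.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

X = np.random.randn(600, 8)    # placeholder features over time
y = np.random.randn(600)       # placeholder target

splits = list(TimeSeriesSplit(n_splits=6).split(X))
wins = 0
for train_idx, test_idx in splits:
    model = Ridge().fit(X[train_idx], y[train_idx])
    model_err = mean_squared_error(y[test_idx], model.predict(X[test_idx]))
    naive_err = mean_squared_error(y[test_idx],
                                   np.full(len(test_idx), y[train_idx].mean()))
    wins += model_err < naive_err

print(f"out-of-sample success rate: {wins / len(splits):.0%}")
```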
Transparency is central to our philosophy: While the detailed internal mechanisms of an AI model may appear opaque, the strategies guiding these models at Omphalos Fund are fully transparent, ensuring that every investment decision is grounded in a clear and explainable rationale. This hybrid approach—combining rigorous AI signal generation with robust human oversight—ensures that AI functions as a transparent and effective tool, rather than a “black box” of uncertainty.
Conclusion: Safeguarding the Future of AI Investing
Overfitting and bias are significant challenges in AI-led investing, yet they are manageable with the right strategies. Through rigorous safeguards—such as careful data selection, regular audits, and human oversight—asset management firms can ensure their AI models are not only reliable but also generalizable to unseen data. At Omphalos Fund, we prioritize transparency and performance, assessing each model’s ability to generalize through advanced testing methodologies that go beyond single-period evaluations. By calculating the probability of a model’s success across various market scenarios, we aim to create a predictive system that adapts reliably to changing conditions.
Our approach to AI transparency ensures that this technology remains a trusted, effective tool in the investment process.
In the coming weeks, we will explore the differences between AI decision-making and human “gut feeling” and how AI can become more transparent in the future. The goal is to demystify AI in asset management and show that the “black box” perception is more myth than reality.
Next week, we’ll explore whether AI-led investing really functions as a “black box,” and how systematic strategies and transparency combine to dispel this common misconception.
Thank you for following us. We will continue to address relevant topics around AI in Asset Management.
If you missed previous editions of “Behind The Cloud”, please check out our BLOG.
© The Omphalos AI Research Team – November 2024
If you would like to use our content please contact press@omphalosfund.com