Marcos Lopez de Prado

ADVANCES IN FINANCIAL MACHINE LEARNING

Academic materials for Cornell University's ORIE 5256 course.

AUTHORS	YEAR	TITLE	ABSTRACT
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 1/10	The Pitfalls of Econometric Analysis.
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 2/10	Financial Applications of Machine Learning.
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 3/10	Data Analysis.
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 4/10	Modelling.
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 5/10	Backtesting I.
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 6/10	Backtesting II.
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 7/10	Machine Learning Portfolio Construction.
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 8/10	Useful Financial Features.
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 9/10	High-Performance Computing.
Lopez de Prado, Marcos	2018	Advances in Financial Machine Learning: Lecture 10/10	The 7 Reasons Most Machine Learning Funds Fail.
Lopez de Prado, Marcos	2019	Advances in Financial Machine Learning: Numerai's Tournament	Preparation for Numerai's Tournament.

RECENT SEMINARS AND ACADEMIC LECTURES

The best part of giving a seminar is the opportunity to meet people who have also thought deeply about that topic, and may have reached different conclusions. I have found these encounters very productive in advancing my own research.

AUTHORS	YEAR	TITLE	ABSTRACT
Lopez de Prado, Marcos	2024	Causality as a Necessary Condition for Investment Efficiency	In this seminar, I explain why investment efficiency cannot be achieved through associational or observational studies, emphasizing that the estimation of optimal portfolios requires causal analysis. The transition from associational to causal inference will necessitate a redesign of the quantitative protocols used in investing.
Lopez de Prado, Marcos	2024	The Role of Causal Inference in the Scientific Method	Every student of statistics learns that correlation does not imply causation. Students are rarely exposed to the reasons behind this statement. This seminar discusses the central role that causality plays in the scientific method, and how the standard statistical toolkit has led to numerous false discoveries. Finally, the seminar proposes several solutions to the replication crisis that currently afflicts scientific research.
Lopez de Prado, Marcos; Zoonekynd, Vincent	2024	Why Has Factor Investing Failed?	We show that: (1) factor strategies that over-control for colliders can yield systematic losses, even if all correlations remain constant and the risk premia are estimated with the correct sign; and (2) specification errors explain the erratic performance of factor investing strategies.
Lopez de Prado, Marcos	2023	Can Factor Investing Become Scientific?	I differentiate between type-A and type-B spurious claims, and explain how both types prevent factor investing from advancing beyond its current pre-scientific stage. This seminar analyzes the current state of causal confusion in the factor investing literature, and proposes solutions with the potential to transform factor investing into a truly scientific discipline.
Lopez de Prado, Marcos	2021	Escaping The Sisyphean Trap: How Quants Can Achieve Their Full Potential	While investment firms have attracted scientific talent, they have done a poor job at developing it. Firms hire specialists, but entice them to become generalists (e.g., portfolio managers). Under the ubiquitous silo/platform structure, quants succumb to the Sisyphean trap, and do not achieve their full potential.
Lopez de Prado, Marcos	2021	Detection of False Investment Strategies through FWER and FDR	This seminar explains how to detect false investment strategies by controlling for the familywise error rate (FWER) and the false discovery rate (FDR) of an organization.
Lopez de Prado, Marcos	2020	Interpretable Machine Learning: Shapley Values	This seminar demonstrates the use of Shapley values to interpret the outputs of ML models. With the help of interpretability methods, ML is becoming the primary tool of scientific discovery, through induction as well as abduction.
Lopez de Prado, Marcos	2020	Three Machine Learning Solutions to the Bias-Variance Dilemma	This seminar explores why machine learning algorithms are generally more appropriate for financial datasets, how they outperform classical estimators, and how they solve the bias-variance dilemma.
Lipton, Alexander; Lopez de Prado, Marcos	2020	Exit Strategies for COVID-19: An Application of the K-SEIR Model	We introduce a new mathematical model (called K-SEIR) to simulate the propagation of epidemics, and evaluate the outcomes of various government interventions. Unlike the standard SEIR model, K-SEIR computes the dynamics of K population groups with different mortality rates, thus allowing the implementation of targeted lockdowns and flexible exit strategies.
Lopez de Prado, Marcos	2020	Three Quant Lessons from COVID-19	Many quantitative firms have suffered substantial losses as a result of the COVID-19 selloff. In this note we highlight three lessons that quantitative researchers could learn.
Lopez de Prado, Marcos	2020	Overfitting: Causes and Solutions	When used incorrectly, the risk of machine learning (ML) overfitting is extremely high. However, ML comes with sophisticated methods to prevent: (a) train set overfitting, and (b) test set overfitting. Thus, the popular belief that ML overfits is false. A more accurate statement would be that: (1) in the wrong hands, ML overfits, and (2) in the right hands, ML is more robust to overfitting than classical methods.
Lopez de Prado, Marcos	2020	Clustered Feature Importance	In classical statistics, p-values are routinely used to determine the variables involved in a phenomenon. However, p-values suffer from various limitations that often lead to false positives and false negatives. Machine learning offers powerful feature importance methods that overcome many of the limitations of p-values.
Lopez de Prado, Marcos	2020	Statistical Association	Despite its popularity among economists, correlation has many known limitations in the context of financial studies. In this seminar we will explore more modern measures of codependence, based on Information Theory, which overcome some of the limitations of correlations.
Lopez de Prado, Marcos	2020	Clustering	Many problems in finance require the clustering of variables or observations. Despite its usefulness, clustering is almost never taught in Econometrics courses. In this seminar we review two general clustering approaches: partitional and hierarchical.
Lopez de Prado, Marcos	2019	Machine Learning Asset Allocation	We introduce the nested clustered optimization algorithm (NCO), a method that tackles both sources of the efficient frontier's instability. Monte Carlo experiments demonstrate that NCO can reduce the estimation error by up to 90%, relative to traditional portfolio optimization methods (e.g., Black-Litterman).
Lopez de Prado, Marcos	2019	Quantitative Research Through Investment Tournaments	This presentation explores how data and experience barriers impact the quality of quantitative research, and how investment tournaments can help deliver better investment outcomes by overcoming those two barriers.
Lopez de Prado, Marcos	2019	The 7 Reasons Most Econometric Investments Fail	This presentation reviews the main reasons why investment strategies discovered through econometric methods fail. As a solution, it proposes the modernization of the statistical methods used by financial firms and academic authors.
Lopez de Prado, Marcos	2018	Type I and Type II Errors in Finance	Most papers in the financial literature control for Type I errors (false positive rate), while ignoring Type II errors (false negative rate). This is a mistake, because a low Type I error can only be achieved at the cost of a high Type II error. In this presentation we derive analytical expressions for both, after correcting for Non-Normality, Sample Length and Multiple Testing.
Lopez de Prado, Marcos	2018	Ten Financial Applications of Machine Learning	In this presentation, we review a few practical cases where machine learning solves financial tasks better than traditional methods.
Lopez de Prado, Marcos	2018	Market Microstructure in the Age of Machine Learning	In this presentation, we analyze the explanatory (in-sample) and predictive (out-of-sample) importance of some of the best known market microstructural features. Our conclusions are drawn over the entire universe of the 87 most liquid futures worldwide, covering all asset classes, going back through 10 years of tick-data history.
Lopez de Prado, Marcos	2018	A Practical Solution to the Multiple-Testing Crisis in Financial Research	Most discoveries in empirical finance are false, as a consequence of selection bias under multiple testing. This may explain why so many hedge funds fail to perform as advertised or as expected, particularly in the quantitative space. These false discoveries might have been prevented if academic journals and investors demanded that any reported investment performance incorporate the false positive probability, adjusted for selection bias under multiple testing.
Lopez de Prado, Marcos	2018	Financial Machine Learning in 10 Minutes	Most publications in Financial ML seem concerned with forecasting prices. While these are worthy endeavors, Financial ML can offer so much more. In this presentation, we review a few important applications that go beyond price forecasting.
Lopez de Prado, Marcos	2018	How the Sharpe Ratio Died, But Came Back to Life	Selection bias under multiple backtesting makes it impossible to assess the probability that a strategy is false. As a consequence, most quantitative firms invest in false positives. The goal of this presentation is to explain a practical method to prevent selection bias from leading to false positives.
Lopez de Prado, Marcos	2018	The Myth and Reality of Financial Machine Learning	In recent years, Machine Learning (ML) has been able to master tasks that until now only a few human experts could perform. Some of the most successful hedge funds in history apply ML every day. However, myths about Financial ML have proliferated. In this presentation we will review the rationale behind those claims.
Lopez de Prado, Marcos	2017	The 7 Reasons Most Machine Learning Funds Fail	The rate of failure in quantitative finance is high, and particularly so in financial machine learning. The few managers who succeed amass a large amount of assets, and deliver consistently exceptional performance to their investors. However, that is a rare outcome, for reasons that will become apparent in this presentation. Over the past two decades, I have seen many faces come and go, firms started and shut down. In my experience, there are 7 critical mistakes underlying most of those failures.
Lopez de Prado, Marcos	2017	Supercomputing for Finance: A gentle introduction	This presentation introduces key concepts needed to operate a high-performance computing cluster.
Lopez de Prado, Marcos	2016	Mathematics & Economics: A Reality Check	Economics (and by extension finance) is arguably one of the most mathematical fields of research. However, economists' choice of math may be inadequate to model the complexity of social institutions.
Lopez de Prado, Marcos	2016	Financial Quantum Computing	Quantum computers can be used to solve some of the hardest problems in Finance. In this presentation we discuss some applications.
Lopez de Prado, Marcos	2016	Building Diversified Portfolios that Outperform Out-Of-Sample	Mean-Variance portfolios are optimal in-sample; however, they tend to perform poorly out-of-sample (even worse than the 1/N naïve portfolio!). We introduce a new portfolio construction method that substantially improves the Out-Of-Sample performance of diversified portfolios.
Lopez de Prado, Marcos	2015	Quantum Computing (in 5 minutes or less)	The purpose of our work is to show that, in the near future, Quantum Computing algorithms may solve many currently intractable financial problems, and render obsolete many existing mathematical approaches.
Lopez de Prado, Marcos	2015	Multi-Period Integer Portfolio Optimization Using a Quantum Annealer	Computing a trading trajectory in general terms is an NP-Complete problem. This note illustrates how quantum computers can solve this problem in the most general terms.
Lopez de Prado, Marcos	2015	Backtesting	Empirical Finance is in crisis: Our most important "discovery" tool is historical simulation, and yet, most backtests published in the top Financial journals are wrong. We present practical solutions to this problem.
Lopez de Prado, Marcos	2015	Illegitimate Science: Why Most Empirical Discoveries in Finance Are Likely Wrong, and What Can Be Done About It	The proliferation of false discoveries is a pressing issue in Financial research. For a large enough number of trials on a given dataset, it is guaranteed that a model specification will be found to deliver sufficiently low p-values, even if the dataset is random. Most academic papers and investment proposals do not report the number of trials involved in a discovery. The implication is that most published empirical discoveries in Finance are likely to be false. This has severe implications, especially with regard to the peer-review process and the Backtesting of investment proposals. We make several proposals on how to address these problems.
Lopez de Prado, Marcos	2014	Optimal Trading Rules Without Backtesting	Calibrating a trading rule using a historical simulation (also called backtest) contributes to backtest overfitting, which in turn leads to underperformance. In this paper we propose a procedure for determining the optimal trading rule (OTR) without running alternative model configurations through a backtest engine.
Lopez de Prado, Marcos	2014	Deflating the Sharpe Ratio	The Deflated Sharpe Ratio (DSR) corrects for two leading sources of performance inflation: Non-Normally distributed returns, and selection bias under multiple testing.
Lopez de Prado, Marcos	2014	Stochastic Flow Diagrams add Topology to the Econometric Toolkit	Just as Geometry could not help Euler solve the "Seven Bridges of Königsberg" problem, Econometric analysis or Linear Algebra alone are not able to answer many key questions about how financial markets coordinate. Statistical tables are detailed in terms of reporting estimated values; however, that level of detail also obfuscates the logical relationships between variables. Stochastic Flow Diagrams (SFDs) add Topology to the Statistical and Econometric toolkit. SFDs are more insightful than the standard collection of statistical tables because SFDs shift the focus from the algebraic solution of the system to its logical structure, its topology.
Lopez de Prado, Marcos	2013	What to look for in a Backtest	A large number of quantitative hedge funds have historically sustained losses. In this study we argue that the back-testing methodology at the core of their strategy selection process may have played a role. Most firms and portfolio managers rely on back-tests (or historical simulations of performance) to allocate capital to investment strategies. If a researcher tries a large enough number of strategy configurations, a back-test can always be fit to any desired performance for a fixed sample length. Thus, there is a minimum back-test length (MinBTL) that should be required for a given number of trials. Standard statistical techniques designed to prevent regression over-fitting, such as hold-out, are inaccurate in the context of back-test evaluation. Practically all published back-tests omit the number of trials involved, and thus we must assume those results may be overfit.
Lopez de Prado, Marcos	2013	How long does it take to recover from a Drawdown?	Investment management firms routinely hire and fire employees based on the performance of their portfolios. Such performance is evaluated through popular metrics that assume IID Normal returns, like Sharpe ratio, Sortino ratio, Treynor ratio, Information ratio, etc. However, investment returns are far from IID Normal. We find that firms evaluating performance through Sharpe ratio are firing up to three times more skillful managers than originally targeted. This is very costly to firms and investors, and is a direct consequence of wrongly assuming that returns are IID Normal. An implication is that an accurate performance evaluation methodology is worth a substantial portion of the fees paid to hedge funds.
Lopez de Prado, Marcos	2013	A Journey through the "Mathematical Underworld" of Portfolio Optimization	It has been estimated that the current size of the asset management industry is approximately US$58 trillion. Portfolio optimization is one of the problems most frequently encountered by financial practitioners. It appears in various forms in the context of Trading, Risk Management and Capital Allocation. The Critical Line Algorithm (CLA) is the only algorithm specifically designed for inequality-constrained portfolio optimization problems, which guarantees that the exact solution is found after a predefined number of iterations. Surprisingly, open-source implementations of CLA in a scientific language appear to be nonexistent or unavailable. The lack of publicly available CLA software, commercially or open-source, means that trillions of dollars are likely to be suboptimally allocated as a result of practitioners using general-purpose quadratic optimizers. A video of this presentation is available.
Lopez de Prado, Marcos	2012	Low-Frequency Traders in a High-Frequency World: A Survival Guide	Multiple empirical studies have shown that Order Flow Imbalance has predictive power over the trading range. The PIN Theory (Easley et al. [1996]) reveals the Microstructure mechanism that explains this observed phenomenon. VPIN is a High Frequency estimate of PIN, which can be used to detect the presence of Informed Traders.
Lopez de Prado, Marcos	2012	Managing Risks in a Risk-On/Risk-Off Environment	Every structure has natural frequencies. Minor shocks in these frequencies can bring down any structure, e.g. a bridge. An Investment Universe also has natural frequencies, characterized by its eigenvectors. A concentration of risks in the direction of any such eigenvector exposes a portfolio to the possibility of greater than expected losses (indeed, maximum risk for that portfolio size), even if that portfolio is below the risk limits. This is particularly dangerous in a risk-on/risk-off regime. Managing Risk is not only about limiting its amount, but also controlling how this amount is concentrated around the natural frequencies of the investment universe.
Lopez de Prado, Marcos	2012	The Sharp Razor: Performance Evaluation with Non-Normal Returns	Because the Sharpe ratio only takes into account the first two moments, it wrongly "translates" skewness and excess kurtosis into standard deviation. As a result: (a) It deflates the skill measured on "well-behaved" investments (positive skewness, negative excess kurtosis). (b) It inflates the skill measured on "badly-behaved" investments (negative skewness, positive excess kurtosis). Sharpe ratio estimates need to account for higher moments, even if investors only care about two moments (Markowitz framework).
Lopez de Prado, Marcos	2012	Concealing the Trading Footprint: Optimal Execution Horizon	Market Makers adjust their trading range to avoid being adversely selected by Informed Traders; Informed Traders reveal their future trading intentions when they alter the Order Flow; consequently, Market Makers' trading range is a function of the Order Flow imbalance. The Optimal Execution Horizon (OEH) algorithm presented here takes into account order imbalance to determine the optimal participation rate.
Lopez de Prado, Marcos	2012	Portfolio Oversight: An Evolutionary Approach	An analogy can be drawn between: (a) the slow pace at which species adapt to an environment, which often results in the emergence of a new distinct species out of a once homogeneous genetic pool, and (b) the slow changes that take place over time within a fund, with several co-existing investment styles which mutate over time. A fund's track record provides a sort of genetic marker, which we can use to identify mutations. The biometric procedure presented here can detect the emergence of a new investment style within a fund's track record. In doing so, we answer the question: "What is the probability that a particular PM's performance is departing from the reference distribution used to allocate her capital?"