GAM Systematic’s Chris Longworth and Silvia Stanescu discuss the prevalence of small data problems in finance, how to identify them and some of the approaches that can be applied to address them.
Over the last decade, we have seen significant advances in machine learning across a wide range of fields. In many cases, this has come from applying very complex models – often containing tens of thousands of parameters – to extremely large datasets, commonly containing millions of examples. These applications are often described as ‘big data’ problems.
However, there is a related category of problems where the amount of data available to train a machine learning model is fundamentally limited, which we refer to as ‘small data problems’. Small data problems are very common in finance and need to be approached in a very specific way because, in most cases, techniques designed to solve big data problems simply do not work well when applied to small datasets.
One example of a small data problem is the study of large earthquakes. High-quality historical records of earthquakes start around 1900. However, since then there have been only around 100 earthquakes of magnitude 8.0 or greater worldwide, as shown in Figure 1. Importantly, the issue is not that we did not look hard enough for data. We already have the complete dataset, but it is small.
Figure 1: Locations of the largest earthquakes since 1900
The tell-tale signs of small data
There are a number of signs that one might be working with small data:
- Time series: If each data point is tied to a specific date or point in time, there is a high chance of a small data problem. This is especially likely when dealing with data that is only periodically available, which is common for economic data.
- Rarity: Does the data represent real world events, and do those events occur rarely in nature? This is the earthquake situation outlined above.
- Aggregate: Is the data aggregate data? If the data represents whole countries or already represents a global aggregate, there is likely to be a small data problem. With the exception of astronomical data, we normally only have data from one planet to work with.
- Correlated: If the data contains a high degree of internal structure or correlation, it is likely that there are fewer independent data samples, particularly if the dataset is noisy.
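To make the last point concrete, the following is a minimal sketch, not taken from the white paper, of how correlation shrinks the number of effectively independent samples. It assumes the data behave like a first-order autoregressive, AR(1), series. It uses the standard approximation for the sample mean, n_eff ≈ n(1 − ρ)/(1 + ρ), where ρ is the lag-1 autocorrelation. The function names and the 240-observation example are illustrative assumptions.

```python
import numpy as np


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.dot(d[:-1], d[1:]) / np.dot(d, d))


def effective_sample_size(n, rho):
    """Approximate count of independent samples among n AR(1) observations.

    Assumption: AR(1) dependence, with the sample mean as the quantity
    being estimated. Other dependence structures need other formulas.
    """
    return n * (1.0 - rho) / (1.0 + rho)


# Hypothetical example: 20 years of monthly observations (240 points).
n_obs = 240
for rho in (0.0, 0.5, 0.9):
    n_eff = effective_sample_size(n_obs, rho)
    print(f"rho={rho:.1f}: about {n_eff:.1f} effectively independent samples")
```

Under these assumptions, 240 observations with a lag-1 autocorrelation of 0.9 carry roughly as much information about the mean as 13 independent observations. This is why a dataset that looks reasonably sized can still be a small data problem.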
It turns out that many problems in finance satisfy all of these criteria. Finance consists of both big data and small data problems and the challenge is to be able to differentiate one from the other. In our latest white paper, we discuss in greater depth some examples of small data problems in finance and outline some of the approaches that can be applied to address these challenges.