# Introduction

In previous posts we explored ideas on how to construct fundamental models for forecasting the price of corn and soybeans. These models used as input parameters the stock-to-usage numbers calculated from the monthly WASDE reports, together with the Dollar index, the mean value of crude oil in the previous month and the Ruble vs Dollar exchange rate. The aim of this report is to extend these results to a spread between two related commodities, in this case corn and European wheat. We use the shorthand C vs CA for this pair.

When we calculate the price difference between two commodities in the strategy Commodity 1 vs Commodity 2 we follow the convention

$S^A = P^A_2 - P^A_1$, where $$P^A_i$$ is the price of the contract with symbol $$A$$ for commodity $$i$$. After all the spreads have been calculated, we associate the spreads between two consecutive WASDE reports with the timestamp of the first report. In this way we assume that the WASDE reports reflect reality with respect to the underlying fundamentals, and we study the subsequent spread behaviour. Similar to the spread convention, we use the difference between the two fundamentals, defined by

$\delta = F_2 - F_1.$ Here $$F_i$$ is the value of the particular fundamental of commodity $$i$$. In this example we use stock-to-usage values. We collect all the different $$\delta$$ values and group them into decile buckets, i.e. we divide the $$\delta$$'s into ten buckets that each contain roughly the same number of entries. Within these decile buckets we then calculate statistics on the spreads $$S^A$$.
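The bucketing procedure above can be sketched in pandas. This is a minimal illustration rather than the code behind the plots: the column names (`date`, `p1`, `p2`, `report_date`, `f1`, `f2`) and the helpers `tag_spreads`, `bucket_stats` and `latest_point` are hypothetical, and we assume the deciles are formed over one delta value per WASDE report. A backward as-of join attaches every spread observation to the most recent report on or before its date, which is one way of associating the spreads between two consecutive reports with the first report.

```python
import pandas as pd

QUANTILES = [0.05, 0.25, 0.50, 0.75, 0.95]  # 5th, 25th, median, 75th and 95th percentiles


def tag_spreads(prices: pd.DataFrame, reports: pd.DataFrame) -> pd.DataFrame:
    """Compute S = P2 - P1 and attach the preceding WASDE report's delta bucket.

    prices:  hypothetical columns date, p1, p2 (same contract code for commodities 1 and 2)
    reports: hypothetical columns report_date, f1, f2 (e.g. stock-to-usage values)
    """
    reps = reports.sort_values("report_date").copy()
    reps["delta"] = reps["f2"] - reps["f1"]  # delta = F_2 - F_1
    # Decile buckets over the report-level deltas (roughly equal counts per bucket).
    reps["bucket"] = pd.qcut(reps["delta"], q=10, labels=False, duplicates="drop")

    px = prices.sort_values("date").copy()
    px["spread"] = px["p2"] - px["p1"]  # S^A = P^A_2 - P^A_1
    # Backward as-of join: each price date gets the latest report on or before it.
    return pd.merge_asof(px, reps, left_on="date", right_on="report_date",
                         direction="backward")


def bucket_stats(tagged: pd.DataFrame) -> pd.DataFrame:
    """Percentiles of the spread within each delta decile bucket."""
    tagged = tagged.dropna(subset=["bucket"])  # drop prices before the first report
    return tagged.groupby("bucket")["spread"].quantile(QUANTILES).unstack()


def latest_point(tagged: pd.DataFrame) -> tuple:
    """Bucket and value of the most recent spread (the red point on each facet)."""
    last = tagged.dropna(subset=["bucket"]).iloc[-1]
    return last["bucket"], last["spread"]
```

The same functions carry over to the ratio analysis by replacing the differences with `p2 / p1` and `f2 / f1`.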

The plots below show the spread statistics for each commodity contract code, calculated within the fundamental buckets shown on the x-axis. The darker and lighter shaded regions show the 25th to 75th and 5th to 95th percentiles respectively. The solid black line shows the median spread. The latest value of the spread and its associated decile bucket are indicated by the red point on each facet. From the plots below we can see that the C vs CA spread is in the upper range of the spread values given the current fundamentals.

In anticipation of the feature importance results we show the spread in the different contract months as a function of comdty2 which in this case refers to European wheat stock-to-usage.

Another way to look at the prices of two related commodities is to calculate their price ratio. Similar to the spreads, we use the convention

$R^A = P^A_2/P^A_1$ to calculate the ratio of prices between commodities 1 and 2. Continuing with this convention we define the fraction of fundamentals as

$\phi = F_2 / F_1.$

We then proceed exactly as in the case of the spreads, except that we calculate statistics of the price ratios $$R^A$$ within each of the decile buckets of the fundamental ratios $$\phi$$. The plots below are interpreted in the same way as those of the spread case. Notice that when we express the relative value in terms of a price ratio, it is trading at an even greater discount.

The plots above give us a quick idea of whether the current values of the spreads or ratios are trading at a premium or discount with respect to the prevailing stock-to-usage numbers reported by the USDA in the monthly WASDE report.

In the following we include more fundamental features in an attempt to capture interaction terms that might be present. Below is a list of the new features we include:

• daysdiff - the number of days left until the contract expires
• product - the product of the two stock-to-usage numbers
• comdty1 - stock-to-usage of commodity 1, C in this case
• comdty2 - stock-to-usage of commodity 2, CA in this case
• lagged features - denoted by lag 1, where we include the results from the previous report
• delta - absolute changes between current and lagged values
• per - percentage changes between current and lagged values
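As an illustration of how the features listed above could be derived for a single contract code, here is a short pandas sketch. The input layout (one row per WASDE report with hypothetical columns `report_date`, `expiry`, `comdty1` and `comdty2`), the helper name `build_features` and the output column names are assumptions; only the feature definitions follow the list.

```python
import pandas as pd


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add daysdiff, product, lag 1, delta and per features (hypothetical column names)."""
    out = df.sort_values("report_date").copy()
    # Days left until the contract expires, measured from the report date.
    out["daysdiff"] = (out["expiry"] - out["report_date"]).dt.days
    # Interaction term: product of the two stock-to-usage numbers.
    out["product"] = out["comdty1"] * out["comdty2"]
    for col in ["comdty1", "comdty2", "product"]:
        lag = out[col].shift(1)                    # value from the previous report
        out[f"{col}_lag1"] = lag
        out[f"{col}_delta"] = out[col] - lag       # absolute change
        out[f"{col}_per"] = out[col] / lag - 1.0   # percentage change (as a fraction)
    return out
```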

# A quick note on the data

The European wheat data have undergone a couple of changes over the years, with certain contracts discontinued and new ones introduced. The table below shows a reduced version of the contracts associated with CA. Notice that the last F, Q and X contracts in the table are for 2015, 2012 and 2014 respectively, while the U and Z contracts started in 2015. In order to create a longer history for the calendar spreads, we associated the X and Z contracts with each other. Similarly, we associated Q and U with each other.

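| year | F | H | K | N | Q | U | X | Z |
|------|---|---|---|---|---|---|---|---|
| 2009 | 1 | 1 | 1 | NA | 1 | NA | 1 | NA |
| 2010 | 1 | 1 | 1 | NA | 1 | NA | 1 | NA |
| 2011 | 1 | 1 | 1 | NA | 1 | NA | 1 | NA |
| 2012 | 1 | 1 | 1 | NA | 1 | NA | 1 | NA |
| 2013 | 1 | 1 | 1 | NA | NA | NA | 1 | NA |
| 2014 | 1 | 1 | 1 | NA | NA | NA | 1 | NA |
| 2015 | 1 | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2016 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2017 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2018 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2019 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2020 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2021 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2022 | NA | 1 | 1 | NA | NA | 1 | NA | NA |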

When determining the spreads we use the convention outlined in the table below. Notice that the CA wheat contract currently does not have an N contract, so for the strategy code N we use the C N and the CA U contracts.

| code | C | CA |
|------|---|----|
| H | H | H |
| K | K | K |
| N | N | U |
| U | U | U |
| Z | Z | Z |
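The two lookups used when building the CA history and the spreads can be written as small dictionaries. This is a hypothetical encoding for illustration (the names `SPREAD_LEGS`, `CA_SPLICE`, `legs_for` and `splice_ca_month` are ours): the legacy X and Q months are mapped onto Z and U, and each strategy code maps to a (C, CA) pair of contract months.

```python
# Strategy code -> (C contract month, CA contract month), as in the table above.
SPREAD_LEGS = {
    "H": ("H", "H"),
    "K": ("K", "K"),
    "N": ("N", "U"),  # CA has no N contract, so the CA U contract is used
    "U": ("U", "U"),
    "Z": ("Z", "Z"),
}

# Discontinued CA months spliced onto their successors to extend the history.
CA_SPLICE = {"X": "Z", "Q": "U"}


def splice_ca_month(month_code: str) -> str:
    """Map a historical CA month code onto the month code used today."""
    return CA_SPLICE.get(month_code, month_code)


def legs_for(code: str) -> tuple[str, str]:
    """Return the (C, CA) contract months used for a strategy code."""
    return SPREAD_LEGS[code]
```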

In a previous post we showed some examples of how we use techniques from machine learning in our investment process. One technique we find particularly interesting is the study of feature importance when using random forests in the modelling process. Similar to what we have done in previous posts (here, here and here), we explore the importance of each of the features used in the modelling process.

The barplots below show the feature importance of each input feature for the different contract codes. The feature with the greatest importance is highlighted in orange. The dashed red line shows the importance each feature would have if all features were equally important. It is interesting to note that the feature giving the best results varies considerably across the contract codes.

The plot below shows the aggregated feature importance values. From this plot it is evident that the stock-to-use value of CA is the main contributing feature, followed by the ratio of the stock-to-use numbers. However, the other top predictive features are not far behind.
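A minimal sketch of the feature importance step, assuming scikit-learn's `RandomForestRegressor`, is shown below. We assume impurity-based importances (`feature_importances_`), which are normalised to sum to one, so the equal-importance reference line sits at 1/n for n features; with a different importance measure (for example permutation importance) that reference level would differ. Aggregating across contract codes by a simple average is also an assumption, as are the helper names.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def ranked_importances(X, y, feature_names, n_trees=200, seed=0):
    """Fit a random forest; return (name, importance) pairs, most important first,
    plus the 'equal importance' level drawn as the dashed red line."""
    model = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    model.fit(X, y)
    imp = model.feature_importances_          # impurity-based, sums to 1
    order = np.argsort(imp)[::-1]             # descending importance
    ranked = [(feature_names[i], float(imp[i])) for i in order]
    equal_level = 1.0 / len(feature_names)    # level if every feature mattered equally
    return ranked, equal_level


def aggregate_importances(per_code):
    """Average each feature's importance over contract codes (e.g. H, K, N, U)."""
    names = sorted({name for ranked in per_code.values() for name, _ in ranked})
    totals = {name: 0.0 for name in names}
    for ranked in per_code.values():
        for name, value in ranked:
            totals[name] += value / len(per_code)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```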

The table below shows the model fit statistics for each contract code.

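| code | R squared | mean cv score | std cv test score |
|------|-----------|---------------|-------------------|
| H | 0.4943776 | 0.1418684 | 0.1830277 |
| K | 0.5613687 | 0.2263561 | 0.0992956 |
| N | -0.5102992 | 0.3809324 | 0.2204969 |
| U | 0.2534031 | 0.0118944 | 0.1366802 |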
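The fit statistics in the table could be produced along the following lines. This is a hedged scikit-learn sketch: we assume the R squared column is the forest's out-of-bag R² (which, unlike in-sample R², can be negative, as it is for N) and that the cross-validation scores are R² values on 5 shuffled `KFold` splits. The function name `fit_statistics` and the fold scheme are our choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def fit_statistics(X, y, n_splits=5, n_trees=200, seed=0):
    """Out-of-bag R^2 plus mean/std of cross-validated R^2 (assumed definitions)."""
    # "R squared": out-of-bag R^2 of a forest fitted on all the data (assumption).
    model = RandomForestRegressor(n_estimators=n_trees, oob_score=True, random_state=seed)
    model.fit(X, y)
    # "mean cv score" / "std cv test score": R^2 on each held-out fold (assumption).
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(
        RandomForestRegressor(n_estimators=n_trees, random_state=seed),
        X, y, cv=folds, scoring="r2",
    )
    return {
        "r_squared": float(model.oob_score_),
        "mean_cv_score": float(np.mean(scores)),
        "std_cv_test_score": float(np.std(scores)),
    }
```

For time-ordered data like these, scikit-learn's `TimeSeriesSplit` is a reasonable alternative to shuffled folds, since it never trains on reports that come after the test period.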

The model predictions together with the latest values of the spreads are shown in the plot below. The shaded regions show the 25th to 75th percentile of the model predictions. Median predictions are represented by the solid black line. Current C vs CA spread values are shown by the red dots. Notice that all the spreads, except that of U, are contained within the model error brackets, and they all sit at the lower end of the forecast range.

# Modeling the Ratio

This section follows the same outline as the previous section, except that we replace the spread with the ratio. The barplots below show the feature importance of each input feature for the different contract codes. The feature with the greatest importance is highlighted in orange. The dashed red line shows the importance each feature would have if all features were equally important. It is interesting to note that the fraction feature gives the best results for the majority of contract codes.

The plot below shows the aggregated feature importance values. This plot makes it even clearer that the fraction feature dominates the predictive power of the model.

The table below shows the model fit statistics for each contract code.

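| code | R squared | mean cv score | std cv test score |
|------|-----------|---------------|-------------------|
| H | 0.4132055 | 0.0832211 | 0.1977795 |
| K | 0.4954519 | 0.1680103 | 0.1729898 |
| N | 0.2439843 | 0.2552635 | 0.2753243 |
| U | 0.3862745 | -0.0667841 | 0.2270038 |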

The model predictions together with the latest values of the ratios are shown in the plot below. The shaded regions show the 25th to 75th percentile of the model predictions. Median predictions are represented by the solid black line. Current C vs CA ratio values are shown by the red dots.

# Roll Structure

In this section we explore how this structure rolls. Throughout, we assume a long position in CA and a short position in C. We also assume a 2.72:1 ratio when trading this relative value, i.e. long 2.72 units of CA and short one unit of C, which makes the two legs equivalent on a metric-tonne basis. We only consider data from 1 Jan 2009 onward. The table below shows the roll structure we follow. Consider the first row, which represents January: here we hold positions in the H contract for both C and CA. During February we roll each of those positions forward to the K contracts.

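| month | C | CA |
|-------|---|----|
| 1 | H | H |
| 2 | K | K |
| 3 | K | K |
| 4 | N | U |
| 5 | N | U |
| 6 | U | U |
| 7 | U | Z |
| 8 | Z | Z |
| 9 | Z | Z |
| 10 | Z | H |
| 11 | H | H |
| 12 | H | H |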
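The roll schedule can also be read as a list of roll events: a leg rolls in any month whose contract differs from the previous month's. The sketch below (hypothetical names `ROLL_SCHEDULE` and `roll_events`) encodes the table as a dictionary and extracts those events.

```python
# Month -> (C contract, CA contract) held during that month, per the table above.
ROLL_SCHEDULE = {
    1: ("H", "H"), 2: ("K", "K"), 3: ("K", "K"), 4: ("N", "U"),
    5: ("N", "U"), 6: ("U", "U"), 7: ("U", "Z"), 8: ("Z", "Z"),
    9: ("Z", "Z"), 10: ("Z", "H"), 11: ("H", "H"), 12: ("H", "H"),
}


def roll_events(schedule):
    """Return (month, leg, old_contract, new_contract) for every month in which a leg rolls."""
    events = []
    for month in range(1, 13):
        prev = schedule[12 if month == 1 else month - 1]  # December wraps into January
        cur = schedule[month]
        for leg, old, new in zip(("C", "CA"), prev, cur):
            if old != new:
                events.append((month, leg, old, new))
    return events
```

Run on this schedule, the corn leg rolls H→K, K→N, N→U, U→Z and Z→H, while the CA leg rolls H→K, K→U, U→Z and Z→H. The CA rolls match the four CA calendars (HK, KU, UZ, ZH) modelled further on, and the first four corn rolls match the corn calendars HK, KN, NU and UZ.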

In the facet plot below we show the ETF price and the Cumulative Roll Gaps on the left and right respectively. The ETF price shows the value of $1 invested in the spread at the beginning of the time series. The Cumulative Roll Gaps show the cumulative difference in price (USD/mt) when the structure is rolled forward. Note that the shape of the Cumulative Roll Gaps largely determines the shape of the ETF price evolution. This is the case for most relative value pairs, where the term structures of the different commodities play a major role in the overall return profile. From the data it is clear that sustained periods of positive return coincide with periods where the Cumulative Roll Gaps have a zero or positive slope.

The plot below takes the ETF price data and stacks the normalised data for each year from the 1st of January to the 31st of December. Within each month we calculate statistics. The 25th to 75th and 5th to 95th percentiles are given by the darker and lighter shaded regions respectively. The median yearly return is represented by the solid black line. Superimposed on the universe ribbon plot are the data for 2018 and 2019. From the plot it is clear that under normal circumstances we can expect to pay away a roll yield of 16% of the capital allocated to this strategy per year.

Below we show the futures curves of C and CA. In order to have a flat or positive roll yield, we require the curve of the long commodity (CA) to have a slope that is less than that of the short commodity (C). In the plot below we convert both commodities to USD/mt. By eye it is not clear which curve is steeper.

The table below summarises the results. The columns are largely self-explanatory, save for C change and CA change, which show the difference between consecutive prices, i.e. the value of the calendar spread. The Roll Yield is the change in C minus the change in CA.
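The roll yield arithmetic can be checked with a few lines of pandas. The sketch below copies the 2020 USD/mt prices from the table that follows; the helper name `roll_yield` is hypothetical. The sign convention follows the text: the short C leg earns the corn calendar spread and the long CA leg pays the wheat calendar spread, so Roll Yield = C change - CA change.

```python
import pandas as pd

# USD/mt prices for the 2020 strip, copied from the roll-structure table.
curve = pd.DataFrame(
    {"C": [148.32, 150.58, 152.36, 152.06, 153.73],
     "CA": [203.20, 203.75, 198.76, 198.76, 202.64]},
    index=["H 2020", "K 2020", "N 2020", "U 2020", "Z 2020"],
)


def roll_yield(curve: pd.DataFrame) -> pd.DataFrame:
    out = curve.copy()
    out["C change"] = out["C"].diff()    # corn calendar spread: next minus previous contract
    out["CA change"] = out["CA"].diff()  # European wheat calendar spread
    # Short C earns the corn calendar; long CA pays the wheat calendar.
    out["Roll Yield"] = out["C change"] - out["CA change"]
    return out


table = roll_yield(curve)
print(table["Roll Yield"].round(2).tolist())
print(round(table["Roll Yield"].sum(), 2))  # 5.97, cumulative roll from H20 to Z20
```

Because the changes telescope, the summed roll yield from H20 to Z20 is simply (153.73 - 148.32) - (202.64 - 203.20) = 5.41 + 0.56 = 5.97 USD/mt, the figure quoted in the remarks.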
If the roll yield is positive, it indicates a favourable roll period. Note that the Roll Yield is currently favourable for the majority of the curve.

| identifier | daysdiff | C price (USD/mt) | CA price (USD/mt) | C change | CA change | Roll Yield |
|------------|----------|------------------|-------------------|----------|-----------|------------|
| H 2020 | 85 | 148.32 | 203.20 | NA | NA | NA |
| K 2020 | 147 | 150.58 | 203.75 | 2.26 | 0.55 | 1.71 |
| N 2020 | 208 | 152.36 | 198.76 | 1.78 | -4.99 | 6.77 |
| U 2020 | 270 | 152.06 | 198.76 | -0.30 | 0.00 | -0.30 |
| Z 2020 | 361 | 153.73 | 202.64 | 1.67 | 3.88 | -2.21 |
| H 2021 | 449 | 157.77 | 205.97 | 4.04 | 3.33 | 0.71 |
| K 2021 | 512 | 159.74 | 207.92 | 1.97 | 1.95 | 0.02 |
| N 2021 | 573 | 160.82 | 203.75 | 1.08 | -4.17 | 5.25 |
| U 2021 | 635 | 158.56 | 203.75 | -2.26 | 0.00 | -2.26 |
| Z 2021 | 726 | 159.64 | 206.53 | 1.08 | 2.78 | -1.70 |
| N 2022 | 938 | 166.33 | 210.42 | 6.69 | 3.89 | 2.80 |

Note that the roll yield is the difference between two calendar spreads. In the following we use corn and European wheat stock-to-usage to model the different calendar spreads involved in the roll process. In this way, if we have a fundamental point of view on the underlying stock-to-usage values, we can get an idea of how the spread as well as the roll will be affected.

## Corn Calendars

Below we show the spread statistics of the corn calendar spreads making up the roll process. As before, the current values of the spreads are indicated by the red dots.

Below we show the feature importances for each of the calendar spreads. We can see that the stock-to-usage features play the biggest role in the predictive models.

| calRef | R squared | mean cv score | std cv test score |
|--------|-----------|---------------|-------------------|
| HK | -0.5132399 | -0.5145170 | 0.4259837 |
| KN | -0.2905441 | 0.0683175 | 0.0352093 |
| NU | 0.6387132 | 0.4719729 | 0.1340568 |
| UZ | 0.4542380 | 0.5747438 | 0.1257030 |

The model predictions together with the latest values of the spreads are shown in the plot below. The shaded regions show the 25th to 75th percentile of the model predictions. Median predictions are represented by the solid black line. Current C calendar spread values are shown by the red dots. The calendar spreads show some potential to move further into contango if the current stock conditions remain unchanged.
Below we show the calendar spread model sensitivities. These plots measure how a change in stock-to-usage affects the model prediction of the spread. As before, the shaded region represents the 25th to 75th percentile of the model predictions, with the median represented by the solid black line. The mean is given by the dashed red line. Here we set the daysdiff feature to 10 days prior to expiry.

## European Wheat Calendars

Below we show the spread statistics of the European wheat calendar spreads making up the roll process. As before, the current values of the spreads are indicated by the red dots. Notice that the CA calendars do not have a great amount of structure in them, i.e. there are no clear trends with respect to the stock-to-usage values. These calendars seem to be very range bound.

Below we show the feature importances for each of the calendar spreads. We can see that the stock-to-usage features play the biggest role in the predictive models.

| calRef | R squared | mean cv score | std cv test score |
|--------|-----------|---------------|-------------------|
| HK | -0.3969071 | -0.1170219 | 0.1461436 |
| KU | 0.1598760 | -0.0066988 | 0.3529586 |
| UZ | -0.2191712 | -0.3233438 | 0.5515606 |
| ZH | 0.0930855 | 0.0155175 | 0.1475874 |

The model predictions together with the latest values of the spreads are shown in the plot below. The shaded regions show the 25th to 75th percentile of the model predictions. Median predictions are represented by the solid black line. Current CA calendar spread values are shown by the red dots.

Below we show the calendar spread model sensitivities. These plots measure how a change in stock-to-usage affects the model prediction of the spread. As before, the shaded region represents the 25th to 75th percentile of the model predictions, with the median represented by the solid black line. The mean is given by the dashed red line. Here we set the daysdiff feature to 10 days prior to expiry.
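The sensitivity plots can be approximated by sweeping one stock-to-usage feature over a grid while holding the other inputs fixed and setting daysdiff to 10. In this sketch we assume that the percentile bands are taken over the predictions of the individual trees in a scikit-learn random forest (`model.estimators_`) and that the other features are held at a reference row such as the latest observation; both are assumptions, as is the helper name `sensitivity_bands`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def sensitivity_bands(model: RandomForestRegressor, x_ref, feature_idx, grid,
                      daysdiff_idx, daysdiff=10.0):
    """Per-tree prediction bands as one feature is swept over `grid`.

    model: a fitted RandomForestRegressor; x_ref: reference feature row (assumed
    to be the latest observation); daysdiff is pinned to `daysdiff` days before expiry.
    """
    rows = np.tile(np.asarray(x_ref, dtype=float), (len(grid), 1))
    rows[:, feature_idx] = grid        # sweep the stock-to-usage feature
    rows[:, daysdiff_idx] = daysdiff   # e.g. 10 days prior to expiry
    per_tree = np.stack([tree.predict(rows) for tree in model.estimators_])  # (n_trees, n_grid)
    q25, median, q75 = np.percentile(per_tree, [25, 50, 75], axis=0)
    return {"grid": np.asarray(grid), "q25": q25, "median": median,
            "q75": q75, "mean": per_tree.mean(axis=0)}
```

The mean over trees equals the forest's own prediction, which is why the dashed mean line can sit away from the median when the trees disagree.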
# Remarks

• Currently this spread is trading on par given the current stock-to-use numbers
• This trade can be thought of as a possible roll return trade, given high stocks in C and low stocks in CA
• Historically, assuming a long CA position, we pay away a median roll of 16% to hold this position
• Currently, from H20 to Z20 we are earning 5.97 USD/mt; this can increase to 20.97 USD/mt if C stocks remain high
• CA calendars have a tendency to trade range bound
##### Mauritz van den Worm
###### Portfolio Manager and Quantitative Researcher

My research interests include the use of artificial intelligence in managing commodity portfolios