# Introduction

In previous posts we explored ideas on how to construct fundamental models for forecasting the price of corn and soybeans. These models used as inputs the stock-to-usage numbers calculated from the monthly WASDE reports, together with the Dollar index, the mean crude oil price over the previous month and the Ruble vs Dollar exchange rate. The aim of this report is to extend these results to a spread between two related commodities, in this case Kansas and European wheat. We use the shorthand KW vs CA for this pair.

When we calculate the price difference between two commodities in the strategy Commodity 1 vs Commodity 2 we follow the convention

$S^A = P^A_2 - P^A_1$, where $$P^A_i$$ is the price of the contract with symbol $$A$$ for commodity $$i$$. After all the spreads have been calculated, we associate the spreads observed between two consecutive WASDE reports with the timestamp of the first report. In this way we assume that each WASDE report reflects reality with respect to the underlying fundamentals, and we study the subsequent spread behaviour. Analogous to the spread convention, we use the difference between the two fundamentals, defined by

$\delta = F_2 - F_1.$ Here $$F_i$$ is the value of the particular fundamental of commodity $$i$$; in this example we use stock-to-usage values. We collect all the $$\delta$$ values and group them into decile buckets, i.e. we divide the $$\delta$$’s into ten buckets, each containing roughly the same number of entries. Within these decile buckets we then calculate statistics on the spreads $$S^A$$.
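The bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's pipeline. The helper names (`tag_with_report`, `spread`, `fundamental_delta`) and the dates are hypothetical. We assume that report dates are sorted and that observations made before the first report are left untagged.

```python
from bisect import bisect_right
from datetime import date


def tag_with_report(obs_dates, report_dates):
    """Map each observation date to the most recent WASDE report date on or before it.

    report_dates must be sorted ascending; observations before the first report map to None.
    """
    tags = []
    for d in obs_dates:
        i = bisect_right(report_dates, d)  # number of reports published on or before d
        tags.append(report_dates[i - 1] if i > 0 else None)
    return tags


def spread(p1, p2):
    """Spread convention S = P2 - P1 (commodity 2 minus commodity 1)."""
    return p2 - p1


def fundamental_delta(f1, f2):
    """Fundamental difference delta = F2 - F1, e.g. stock-to-usage of commodity 2 minus commodity 1."""
    return f2 - f1


# Illustrative dates only
reports = [date(2020, 1, 10), date(2020, 2, 11), date(2020, 3, 10)]
obs = [date(2020, 1, 9), date(2020, 1, 10), date(2020, 2, 3), date(2020, 3, 12)]
print(tag_with_report(obs, reports))
```

Because `bisect_right` is used, an observation on the publication day itself is attributed to that day's report.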

The different plots below show the spread statistics of each commodity contract code, calculated within the fundamental buckets shown on the x-axis. The darker and lighter shaded regions show the 25th to 75th and 5th to 95th percentiles respectively. The solid black line shows the median spread. The latest value of the spread and its associated decile bucket are indicated by the red point on each facet. From the plots below we can see that the KW vs CA spread is in the upper range of the spread values given the current fundamentals.
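Here is a minimal sketch of the decile bucketing and the within-bucket statistics. It assumes the spreads and $$\delta$$ values are already aligned pairwise. The helpers are hypothetical. Percentiles use linear interpolation, which is the same rule as NumPy's default. The decile edges are computed from the historical $$\delta$$ values, so the latest $$\delta$$ can be placed in the same buckets, as with the red point in the plots. When many $$\delta$$ values are tied, the buckets will only be roughly equal in size.

```python
from bisect import bisect_right
from collections import defaultdict


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty, pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def decile_edges(values):
    """Nine cut points splitting the values into ten roughly equal-sized buckets."""
    s = sorted(values)
    return [percentile(s, 10 * k) for k in range(1, 10)]


def bucket_of(value, edges):
    """Decile bucket index 0..9 for a value, given the decile edges."""
    return bisect_right(edges, value)


def bucket_stats(deltas, spreads, qs=(5, 25, 50, 75, 95)):
    """Percentiles of the spreads within each decile bucket of the deltas."""
    edges = decile_edges(deltas)
    grouped = defaultdict(list)
    for d, s in zip(deltas, spreads):
        grouped[bucket_of(d, edges)].append(s)
    stats = {b: {q: percentile(sorted(v), q) for q in qs} for b, v in sorted(grouped.items())}
    return edges, stats


# Toy data: spread is exactly twice delta
deltas = [float(i) for i in range(100)]
spreads = [2 * d for d in deltas]
edges, stats = bucket_stats(deltas, spreads)
latest_bucket = bucket_of(42.5, edges)  # bucket of a hypothetical latest delta
print(latest_bucket, stats[latest_bucket][50])
```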

In anticipation of the feature importance results, we show the spread for the different contract months as a function of the fraction, which here refers to the ratio of European to Kansas stock-to-usage.

Another way to look at the prices of two related commodities is to calculate their price ratio. As with the spreads, we use the convention

$R^A = P^A_2/P^A_1$ to calculate the ratio of prices between commodities 1 and 2. Continuing with this convention we define the fraction of fundamentals as

$\phi = F_2 / F_1.$

We then proceed exactly as in the case of the spreads, except that we calculate statistics of the price ratios $$R^A$$ within each decile bucket of the fundamental ratios $$\phi$$. The plots below are interpreted in the same way as in the spread case. Notice that when we express the relative value as a price ratio, it is trading at an even greater discount.

The plots above give a quick idea of whether the current values of the spreads or ratios are trading at a premium or discount with respect to the prevailing stock-to-usage numbers reported by the USDA in the monthly WASDE report.

In the following we include more fundamental features in an attempt to incorporate some interaction terms that might be present. Below is a list of new features we include:

• daysdiff - the number of days left until the contract expires
• product - the product of the two stock-to-usage numbers
• comdty1 - stock-to-usage of commodity 1, KW in this case
• comdty2 - stock-to-usage of commodity 2, CA in this case
• lagged features - denoted by lag 1 where we include the results from the previous report
• delta - absolute changes between current and lagged values
• per - percentage changes between current and lagged values
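As an illustration, the features listed above could be assembled per WASDE report roughly as follows. Several details are assumptions made for this sketch: the record layout, the `_lag1`/`_delta`/`_per` column names, and expressing `per` in percent. `fraction` is the ratio $$\phi = F_2/F_1$$ defined earlier. The first report has no lagged values, so those columns are absent for it.

```python
from datetime import date


def build_features(reports, expiry):
    """Build one feature dict per WASDE report.

    reports: list of (report_date, s2u_comdty1, s2u_comdty2), sorted by date (hypothetical layout).
    expiry:  expiry date of the contract being modelled.
    """
    rows = []
    prev = None
    for report_date, s2u_1, s2u_2 in reports:
        row = {
            "daysdiff": (expiry - report_date).days,  # days left until expiry
            "comdty1": s2u_1,
            "comdty2": s2u_2,
            "product": s2u_1 * s2u_2,
            "fraction": s2u_2 / s2u_1,  # phi = F2 / F1
        }
        if prev is not None:
            for name in ("comdty1", "comdty2", "product", "fraction"):
                lagged = prev[name]
                row[f"{name}_lag1"] = lagged                              # previous report
                row[f"{name}_delta"] = row[name] - lagged                 # absolute change
                row[f"{name}_per"] = (row[name] - lagged) / lagged * 100  # percentage change
        rows.append(row)
        prev = row
    return rows


# Illustrative stock-to-usage values, not real WASDE numbers
reports = [(date(2020, 1, 10), 0.40, 0.20), (date(2020, 2, 11), 0.50, 0.25)]
features = build_features(reports, expiry=date(2020, 3, 13))
print(features[1])
```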

# A quick note on the data

The European wheat data have undergone a couple of changes over the years, with certain contracts discontinued and new ones introduced. The table below shows a reduced view of the contracts associated with CA, where 1 indicates that the contract traded in that year and NA that it did not. Notice that the F contract stopped in 2016, while the Q and X contracts stopped in 2013 and 2014 respectively. The U and Z contracts then started in 2015. To create a larger historical database for the calendar spreads, we associated the X and Z contracts with each other. Similarly, we associated Q and U with each other.

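| year | F | H | K | N | Q | U | X | Z |
|------|----|---|---|----|----|----|----|----|
| 2009 | 1 | 1 | 1 | NA | 1 | NA | 1 | NA |
| 2010 | 1 | 1 | 1 | NA | 1 | NA | 1 | NA |
| 2011 | 1 | 1 | 1 | NA | 1 | NA | 1 | NA |
| 2012 | 1 | 1 | 1 | NA | 1 | NA | 1 | NA |
| 2013 | 1 | 1 | 1 | NA | NA | NA | 1 | NA |
| 2014 | 1 | 1 | 1 | NA | NA | NA | 1 | NA |
| 2015 | 1 | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2016 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2017 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2018 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2019 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2020 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2021 | NA | 1 | 1 | NA | NA | 1 | NA | 1 |
| 2022 | NA | 1 | 1 | NA | NA | 1 | NA | NA |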

When determining the spreads, we use the convention outlined in the table below. Note that the CA wheat market currently does not list an N contract, so for strategy code N we use the KW N and the CA U contracts.

| code | KW | CA |
|------|----|----|
| H | H | H |
| K | K | K |
| N | N | U |
| U | U | U |
| Z | Z | Z |
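The contract bookkeeping from this section can be captured with two small lookup tables. The sketch below is hypothetical, including the names and the label format. Legacy CA months are folded into their successors (Q into U, X into Z). Each strategy code maps to a (KW, CA) contract pair, with N pairing KW N against CA U.

```python
from typing import Tuple

# CA months that were discontinued, folded into their successors to extend the history
CA_LEGACY_MONTHS = {"Q": "U", "X": "Z"}

# Strategy code -> (KW contract month, CA contract month); CA has no N contract, so N uses CA U
STRATEGY_LEGS = {
    "H": ("H", "H"),
    "K": ("K", "K"),
    "N": ("N", "U"),
    "U": ("U", "U"),
    "Z": ("Z", "Z"),
}


def normalise_ca_month(month: str) -> str:
    """Map a legacy CA month code to the contract month it is associated with."""
    return CA_LEGACY_MONTHS.get(month, month)


def strategy_legs(code: str, year: int) -> Tuple[str, str]:
    """Return hypothetical (KW, CA) contract labels for a strategy code and year."""
    kw_month, ca_month = STRATEGY_LEGS[code]
    return f"KW {kw_month} {year}", f"CA {ca_month} {year}"


print(strategy_legs("N", 2020))  # ('KW N 2020', 'CA U 2020')
print(normalise_ca_month("X"))   # Z
```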

# Modeling the Spread

In a previous post we showed some examples of how we use techniques from machine learning in our investment process. One technique we find particularly interesting is the study of feature importance when using random forests in the modelling process. As in previous posts (here, here and here), we explore the importance of each of the features used in the modelling process.

The barplots below show the feature importance of each input feature for the different contract codes. The feature with the greatest importance is highlighted in orange. The dashed red line shows the importance each feature would have if all features were equally important. Interestingly, the most important feature differs considerably across the contract codes.
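For readers unfamiliar with the technique, the sketch below reads impurity-based feature importances off a scikit-learn `RandomForestRegressor`. It then compares them against the equal-importance level $$1/n$$, shown as the dashed red line. The data are synthetic and the feature names are only labels. This is not the model or data behind the plots, and the post does not state which random forest implementation or hyperparameters were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

feature_names = ["fraction", "daysdiff", "comdty1", "comdty2"]  # labels only, for illustration
rng = np.random.default_rng(0)
n_obs = 500
X = rng.normal(size=(n_obs, len(feature_names)))
# Synthetic target: mostly driven by "fraction", weakly by "daysdiff", plus noise
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_obs)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importances = model.feature_importances_  # impurity-based, normalised to sum to 1
equal_share = 1.0 / len(feature_names)    # level if all features were equally important
top_feature = feature_names[int(np.argmax(importances))]

for name, imp in sorted(zip(feature_names, importances), key=lambda t: -t[1]):
    marker = "*" if name == top_feature else " "
    print(f"{marker} {name:<10} {imp:.3f}  (equal share {equal_share:.3f})")
```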

The plot below shows the aggregated feature importance values.

The table below shows the model fit statistics for each contract code.

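| code | R squared | mean cv score | std cv test score |
|------|-----------|---------------|-------------------|
| H | 0.3989448 | 0.4821566 | 0.1012933 |
| K | 0.6523919 | 0.3113692 | 0.2592643 |
| N | 0.5484700 | 0.3557384 | 0.1390042 |
| U | 0.5677014 | 0.4165228 | 0.1728039 |
| Z | 0.5750890 | 0.5391855 | 0.1560953 |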
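One plausible way to produce the three columns of such a table with scikit-learn is sketched below. The post does not say how they were computed, so we assume the following: R squared is measured on a held-out test split, and the cross-validation statistics are the mean and standard deviation of the $$R^2$$ scores from 5-fold cross-validation on the training split. The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score, train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = 2.0 * X[:, 0] - X[:, 2] + rng.normal(scale=0.5, size=400)  # synthetic target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)

# 5-fold CV R^2 on the training split -> "mean cv score" and "std cv test score"
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Refit on the full training split and score on held-out data -> "R squared"
model.fit(X_train, y_train)
r_squared = model.score(X_test, y_test)

print(f"R squared {r_squared:.4f}  mean cv score {cv_scores.mean():.4f}  "
      f"std cv test score {cv_scores.std():.4f}")
```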

The model predictions, together with the latest values of the spreads, are shown in the plot below. The shaded regions show the 25th to 75th percentile of the model predictions. Median predictions are represented by the solid black line. Current KW vs CA spread values are shown by the red dots. Notice that all the spreads except U lie within the model error bands. They all sit at the lower end of the forecast range.
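The post does not say how the 25th to 75th percentile band of the model predictions is obtained. One common option with random forests is to take percentiles across the predictions of the individual trees (`model.estimators_`). The sketch below assumes that approach and then checks whether the current market value falls inside the band. The helper name, features and observed value are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def tree_prediction_band(model, X_new, q_low=25, q_high=75):
    """Low percentile, median and high percentile of the per-tree predictions for each row."""
    per_tree = np.stack([tree.predict(X_new) for tree in model.estimators_])  # (n_trees, n_rows)
    low, median, high = np.percentile(per_tree, [q_low, 50, q_high], axis=0)
    return low, median, high


rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
y = X[:, 0] + rng.normal(scale=0.3, size=300)  # synthetic target
model = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

x_today = np.array([[0.5, 0.0, 0.0]])  # hypothetical current feature vector
observed = 0.4                          # hypothetical current spread
low, median, high = tree_prediction_band(model, x_today)
inside = bool(low[0] <= observed <= high[0])
print(f"band [{low[0]:.2f}, {high[0]:.2f}], median {median[0]:.2f}, observed inside: {inside}")
```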

# Modeling the Ratio

This section follows the same outline as the previous section with the exception that we replace the spread with the ratio. The barplots below show the feature importance of each of the input features for the different contract codes. The feature with greatest importance is highlighted in orange. The dashed red line shows the value of importance if all the features were equally important. It is interesting to note that the fraction feature gives the best results for the majority of cases.

The plot below shows the aggregated feature importance values. This plot makes it even clearer that the fraction feature dominates the predictive power of the model.

The table below shows the model fit statistics for each contract code.

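| code | R squared | mean cv score | std cv test score |
|------|-----------|---------------|-------------------|
| H | 0.5124372 | 0.1628780 | 0.2476488 |
| K | 0.5606533 | 0.1376243 | 0.2390249 |
| N | 0.2950367 | 0.0058866 | 0.2761410 |
| U | 0.4436313 | 0.0079530 | 0.2857539 |
| Z | 0.5909700 | 0.1115113 | 0.3326622 |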

The model predictions, together with the latest values of the ratios, are shown in the plot below. The shaded regions show the 25th to 75th percentile of the model predictions. Median predictions are represented by the solid black line. Current KW vs CA ratio values are shown by the red dots.

# Roll Structure

In this section we explore how this structure rolls. Throughout, we assume a long position in CA and a short position in KW. We also assume a 2.72:1 ratio when trading this relative value pair, i.e. long 2.72 units of CA and short one unit of KW. This makes the two legs equivalent on a metric-tonne basis. We only consider data from 1 January 2009 onward. The table below shows the roll structure we follow. Consider the first row, which represents January: in this case we want to hold positions in the H contract for both KW and CA. During February we roll each of those positions forward to the K contracts.

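| month | CA | KW |
|-------|----|----|
| 1 | H | H |
| 2 | K | K |
| 3 | K | K |
| 4 | U | U |
| 5 | U | U |
| 6 | U | U |
| 7 | Z | Z |
| 8 | Z | Z |
| 9 | Z | Z |
| 10 | H | H |
| 11 | H | H |
| 12 | H | H |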
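The roll table and the 2.72:1 sizing can be sketched as follows, assuming CA refers to the Euronext milling wheat contract. The 2.72 figure is consistent with the contract sizes. One KW contract is 5,000 bushels of 60 lb each (about 136.08 metric tonnes), and one Euronext milling wheat contract is 50 tonnes, so 136.08 / 50 ≈ 2.72. The helper names are hypothetical. The assumption that the H contract held from October to December belongs to the following year is ours.

```python
# Calendar month -> contract month held in both legs (from the roll table above)
ROLL_SCHEDULE = {
    1: "H", 2: "K", 3: "K", 4: "U", 5: "U", 6: "U",
    7: "Z", 8: "Z", 9: "Z", 10: "H", 11: "H", 12: "H",
}

KW_BUSHELS_PER_CONTRACT = 5_000
KG_PER_BUSHEL_WHEAT = 27.2155  # 60 lb bushel
CA_TONNES_PER_CONTRACT = 50


def ca_units_per_kw_unit() -> float:
    """Number of CA contracts that match one KW contract in metric tonnes."""
    kw_tonnes = KW_BUSHELS_PER_CONTRACT * KG_PER_BUSHEL_WHEAT / 1000
    return kw_tonnes / CA_TONNES_PER_CONTRACT


def contract_held(month: int, year: int):
    """(contract month, contract year) held in both legs during a calendar month.

    Assumption: the H contract held in October to December is next year's contract.
    """
    code = ROLL_SCHEDULE[month]
    contract_year = year + 1 if month >= 10 else year
    return code, contract_year


print(round(ca_units_per_kw_unit(), 2))  # 2.72
print(contract_held(11, 2020))           # ('H', 2021)
```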

In the facet plot below we show the ETF price and the Cumulative Roll Gaps on the left and right respectively. The ETF price shows the value of \$1 invested in the spread at the beginning of the time series. The Cumulative Roll Gaps show the cumulative difference in price (USD/mt) incurred each time the structure is rolled forward. Note that the shape of the Cumulative Roll Gaps largely determines the evolution of the ETF price. This is the case for most relative value pairs, where the term structures of the two commodities play a major role in the overall return profile. The data show that sustained periods of positive return coincide with periods where the Cumulative Roll Gaps have a zero or positive slope.

The plot below takes the ETF price data, normalises each year and stacks the years together from 1 January to 31 December. Within each month we calculate statistics. The 25th to 75th and 5th to 95th percentiles are given by the darker and lighter shaded regions respectively. The median yearly return is represented by the solid black line. The data for 2018 and 2019 are superimposed on the ribbon plot. From the plot it is clear that, under normal circumstances, we can expect to pay away a roll yield of 8% of the capital allocated to this strategy each year.
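A sketch of the seasonal stacking, under stated assumptions. Each calendar year's ETF series is normalised by its first observation in that year. The last normalised value in each month is taken as that month's value, and percentiles are then computed across years. The prices in the example are made up purely to exercise the code, and the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty, pre-sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def seasonal_stats(prices, qs=(5, 25, 50, 75, 95)):
    """prices: {date: ETF price}. Returns {month: {q: percentile}} of the month-end value
    of each year's series, normalised to that year's first observation."""
    by_year = defaultdict(list)
    for d in sorted(prices):
        by_year[d.year].append((d, prices[d]))
    month_values = defaultdict(list)
    for series in by_year.values():
        base = series[0][1]
        month_end = {}
        for d, p in series:  # dates are sorted, so the last write is the month-end value
            month_end[d.month] = p / base
        for month, value in month_end.items():
            month_values[month].append(value)
    return {m: {q: percentile(sorted(v), q) for q in qs} for m, v in sorted(month_values.items())}


# Made-up prices, only to exercise the code
prices = {
    date(2018, 1, 2): 1.00, date(2018, 1, 31): 0.99, date(2018, 12, 31): 0.92,
    date(2019, 1, 2): 2.00, date(2019, 1, 31): 1.96, date(2019, 12, 31): 1.84,
}
stats = seasonal_stats(prices)
print(stats[12][50])
```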

Below we show the futures curves of KW and CA. For a flat or positive roll yield, we require the curve of the long commodity (CA) to have a slope no greater than that of the short commodity (KW). In the plot below we convert both commodities to USD/mt. By eye it is clear that KW is steeper than CA.
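The conversion to USD/mt can be sketched as below. We assume KW is quoted in US cents per bushel and CA in EUR per tonne, with an EUR/USD rate (USD per EUR) supplied by the caller. A wheat bushel is 60 lb, about 27.2155 kg, so one metric tonne is roughly 36.74 bushels. The function names and example inputs are hypothetical.

```python
BUSHELS_PER_TONNE_WHEAT = 1000 / 27.2155  # 60 lb bushel, about 36.74 bushels per tonne


def kw_usd_per_mt(cents_per_bushel: float) -> float:
    """KW price in US cents per bushel -> USD per metric tonne."""
    return cents_per_bushel / 100 * BUSHELS_PER_TONNE_WHEAT


def ca_usd_per_mt(eur_per_mt: float, eurusd: float) -> float:
    """CA price in EUR per tonne -> USD per tonne, with eurusd quoted as USD per EUR."""
    return eur_per_mt * eurusd


print(round(kw_usd_per_mt(500), 2))        # 183.72
print(round(ca_usd_per_mt(190, 1.10), 2))  # 209.0
```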

The table below summarises the results. The columns are pretty self-explanatory, except for KW change and CA change. These show the difference between consecutive prices, i.e. the value of the calendar spread. The Roll Yield is the KW change minus the CA change. A positive roll yield indicates a favourable roll period. Note that the Roll Yield is currently favourable for the majority of the curve.

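| identifier | daysdiff | KW price | CA price | KW change | CA change | Roll Yield |
|------------|----------|----------|----------|-----------|-----------|------------|
| H 2020 | 46 | 181.05 | 214.90 | NA | NA | NA |
| K 2020 | 108 | 183.90 | 213.23 | 2.85 | -1.67 | 4.52 |
| N 2020 | 169 | 186.75 | 208.50 | 2.85 | -4.73 | 7.58 |
| U 2020 | 231 | 189.78 | 208.50 | 3.03 | 0.00 | 3.03 |
| Z 2020 | 322 | 193.91 | 211.00 | 4.13 | 2.50 | 1.63 |
| H 2021 | 410 | 197.96 | 212.11 | 4.05 | 1.11 | 2.94 |
| K 2021 | 473 | 198.97 | 213.23 | 1.01 | 1.12 | -0.11 |
| N 2021 | 534 | 196.12 | 209.05 | -2.85 | -4.18 | 1.33 |
| U 2021 | 596 | 197.59 | 209.05 | 1.47 | 0.00 | 1.47 |
| Z 2021 | 687 | 202.27 | 211.56 | 4.68 | 2.51 | 2.17 |
| H 2022 | 777 | 203.84 | 213.51 | 1.57 | 1.95 | -0.38 |
| K 2022 | 837 | 203.84 | 214.62 | 0.00 | 1.11 | -1.11 |
| N 2022 | 899 | 200.25 | 209.05 | -3.59 | -5.57 | 1.98 |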
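The Roll Yield column follows directly from the prices. For each contract, the KW change and CA change are the differences from the previous contract on the curve, and the Roll Yield is KW change minus CA change. The short sketch below reproduces the 2020 rows of the table from the KW and CA prices shown there. The function name is ours.

```python
# (identifier, KW price, CA price) for the 2020 contracts, taken from the table above
curve_2020 = [
    ("H 2020", 181.05, 214.90),
    ("K 2020", 183.90, 213.23),
    ("N 2020", 186.75, 208.50),
    ("U 2020", 189.78, 208.50),
    ("Z 2020", 193.91, 211.00),
]


def roll_yields(curve):
    """For each contract after the first: (identifier, KW change, CA change, Roll Yield)."""
    out = []
    for (_, kw_prev, ca_prev), (ident, kw, ca) in zip(curve, curve[1:]):
        kw_change = kw - kw_prev  # KW calendar spread
        ca_change = ca - ca_prev  # CA calendar spread
        out.append((ident, round(kw_change, 2), round(ca_change, 2),
                    round(kw_change - ca_change, 2)))
    return out


for row in roll_yields(curve_2020):
    print(row)
```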

Note that the roll yield is the difference between two calendar spreads. In the following, we use Kansas and European wheat stock-to-usage to model the different calendar spreads involved in the roll process. If we have a fundamental view on the underlying stock-to-usage values, this gives us an idea of how both the spread and the roll will be affected.

## Kansas Calendars

Below we show the spread statistics of the KW calendar spreads making up the roll process. As before, the current values of the spreads are indicated by the red dots.

Below we show the feature importances for each of the calendar spreads. We can see that the stock-to-usage features play the biggest role in the predictive models.

| calRef | R squared | mean cv score | std cv test score |
|--------|-----------|---------------|-------------------|
| HK | 0.5740849 | 0.1527344 | 0.3840133 |
| HN | 0.7768749 | 0.6004284 | 0.0586316 |
| KN | 0.7946587 | 0.5112871 | 0.1456354 |
| NU | 0.5009831 | 0.4752546 | 0.0555437 |
| NZ | 0.4038643 | 0.5158401 | 0.0639208 |
| UH | 0.5984948 | 0.2509739 | 0.0407539 |
| UZ | 0.4813549 | 0.2454571 | 0.0875155 |
| ZH | 0.3845530 | 0.1909481 | 0.1789459 |
| ZN | 0.7395113 | 0.4894691 | 0.1861844 |

The model predictions, together with the latest values of the spreads, are shown in the plot below. The shaded regions show the 25th to 75th percentile of the model predictions. Median predictions are represented by the solid black line. Current KW calendar spread values are shown by the red dots. The calendar spreads show some potential to move further into contango if current stock conditions remain unchanged.

Below we show the calendar spread model sensitivities. These plots measure how a change in stock-to-usage affects the model prediction of the spread. As before, the shaded region represents the 25th to 75th percentile of the model predictions, with the median represented by the solid black line. The mean is given by the dashed red line. Here we set the daysdiff feature to 10, i.e. 10 days prior to expiry.
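The post does not spell out how the sensitivities are computed. One plausible reading is in the spirit of individual conditional expectation plots, and the sketch below assumes it. For each value on a stock-to-usage grid, overwrite that feature in every historical row and pin `daysdiff` to 10. Then predict with the fitted model and summarise the predictions by their percentiles and mean. The data and column layout are synthetic. In a real feature set, derived features such as `product` or `fraction` would also need to be recomputed for each grid value; this sketch ignores that.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def s2u_sensitivity(model, X_hist, s2u_col, daysdiff_col, grid, daysdiff=10):
    """For each grid value: overwrite the stock-to-usage column in every historical row,
    pin daysdiff, predict, and summarise the predictions (25th, 50th, 75th percentile, mean)."""
    rows = []
    for value in grid:
        X = X_hist.copy()
        X[:, s2u_col] = value
        X[:, daysdiff_col] = daysdiff
        preds = model.predict(X)
        p25, p50, p75 = np.percentile(preds, [25, 50, 75])
        rows.append((value, p25, p50, p75, preds.mean()))
    return rows


# Synthetic history: column 0 = stock-to-usage, column 1 = daysdiff, column 2 = noise feature
rng = np.random.default_rng(3)
n_obs = 400
X_hist = np.column_stack([
    rng.uniform(0.1, 0.7, n_obs),
    rng.uniform(0, 300, n_obs),
    rng.normal(size=n_obs),
])
y = 10 * X_hist[:, 0] - 0.005 * X_hist[:, 1] + rng.normal(scale=0.2, size=n_obs)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_hist, y)

results = s2u_sensitivity(model, X_hist, s2u_col=0, daysdiff_col=1, grid=[0.2, 0.4, 0.6])
for value, p25, p50, p75, mean in results:
    print(f"s2u={value:.2f}  25%={p25:.2f}  median={p50:.2f}  75%={p75:.2f}  mean={mean:.2f}")
```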

## European Wheat Calendars

Below we show the spread statistics of the CA calendar spreads making up the roll process. As before, the current values of the spreads are indicated by the red dots. Notice that the CA calendars don’t have a great amount of structure in them, i.e. there are no clear trends with respect to the stock-to-usage values. These calendars seem to be very range-bound.

Below we show the feature importances for each of the calendar spreads. We can see that the stock-to-usage feature plays the biggest role in the predictive models.

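| calRef | R squared | mean cv score | std cv test score |
|--------|-----------|---------------|-------------------|
| HK | -0.3969071 | -0.1170219 | 0.1461436 |
| KU | 0.1598760 | -0.0066988 | 0.3529586 |
| UZ | -0.2191712 | -0.3233438 | 0.5515606 |
| ZH | 0.0930855 | 0.0155175 | 0.1475874 |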

The model predictions, together with the latest values of the spreads, are shown in the plot below. The shaded regions show the 25th to 75th percentile of the model predictions. Median predictions are represented by the solid black line. Current CA calendar spread values are shown by the red dots.

Below we show the calendar spread model sensitivities. These plots measure how a change in stock-to-usage affects the model prediction of the spread. As before, the shaded region represents the 25th to 75th percentile of the model predictions, with the median represented by the solid black line. The mean is given by the dashed red line. Here we set the daysdiff feature to 10, i.e. 10 days prior to expiry.

# Remarks

• Currently this spread does not stand out given the prevailing stock conditions.
• The current KW calendars seem to be pricing in stock-to-usage in the 45% range.
• CA calendars tend to trade range-bound.

For more research on how best to express this trade, the interested reader is referred to

##### Mauritz van den Worm
###### Portfolio Manager and Quantitative Researcher

My research interests include the use of artificial intelligence in managing commodity portfolios