1 Introduction

Here we explore the viability of modelling the price of soybeans as a function of stock-to-usage. The market receives new information about the state of global stocks once a month, when the WASDE reports are published. As the global balance sheets change during the course of the season, so does the expectation of the stock levels left over at the end of the season. We aim to model the soybean price along the futures curve as a function of the stock-to-usage percentages of the major producing and consuming nations. We add a proxy for energy by looking at the average WTI crude price during the prior month. We also consider dollar strength as measured by the dollar index.

The plot below shows the evolution of the soybean stock-to-usage numbers for the United States and World levels.

We want to connect these stock-to-usage numbers with the price of the corresponding soybean futures contracts. To do this we associate the price data between two successive WASDE reports with the first report and aggregate the results. As an example consider two reports dated 2018-05-11 and 2018-06-12 respectively. All price data between those two dates are associated with the first date.
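As a minimal sketch of this association step, the snippet below maps each price date to the most recent report date on or before it. The price values are made up for illustration, and the treatment of a price that falls exactly on a report date (assigned to that report) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right
from datetime import date


def assign_report_date(price_date, report_dates):
    """Return the most recent report date on or before price_date, or None."""
    idx = bisect_right(report_dates, price_date)
    return report_dates[idx - 1] if idx > 0 else None


# Report dates from the example in the text (must be sorted ascending).
reports = [date(2018, 5, 11), date(2018, 6, 12)]

# Hypothetical price observations as (date, price); values are illustrative only.
prices = [
    (date(2018, 5, 11), 1003.25),
    (date(2018, 5, 29), 1030.50),
    (date(2018, 6, 11), 953.75),
    (date(2018, 6, 12), 954.00),
]

# Group the prices by the report they are associated with.
grouped = {}
for day, price in prices:
    report = assign_report_date(day, reports)
    grouped.setdefault(report, []).append(price)
```

Each group can then be aggregated (for example averaged) and paired with the stock-to-usage numbers of its report.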

The images below give a graphical representation of the data. The x- and y-axes represent the Stock-to-Usage and Price of the July contract respectively.
From the images we can distinguish between two different regimes, roughly corresponding to before and after 2007. This can be seen in the clear separation between the blue and orange points in the plot below. I am not sure whether there is some fundamental reason for the separation. It might have to do with the periods before and after GMO.

2 Deterministic Model

From the bubble chart above it looks like a linear model should be sufficient to model the July soybean price as a function of stock-to-usage. Here we look at a couple of slight modifications to improve upon the simple linear model. It looks like the data clumps together around $4 for stock-to-usages greater than 15. We see that the prices decrease at a slower rate as stock-to-usage increases. Linear models, on the other hand, assume a constant rate of decrease. Here we look at two alternative models, a power-law and an exponential model, both of which have decreasing rates of change.

Finding the best model amounts to looking at the three different graphs below and deciding which has the best linear fit to the data. The equations describing the models are given below.

Linear: \[ y = x \times m + c \]

Power-Law: \[ y = x^{m} \times \exp\left(c\right) \]

Exponential: \[ y = \exp\left( x \times m + c \right) \]

In all three equations above $x$ and $y$ represent stock-to-usage and price respectively.
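The power-law and exponential models become straight lines after a log transform: $\log y = m \log x + c$ and $\log y = m x + c$ respectively. The sketch below fits all three models with ordinary least squares in the corresponding coordinates. It is an illustration only: the function names are hypothetical, and measuring R-squared in the transformed space is an assumption about how the fits were scored.

```python
import math


def ols(xs, ys):
    """Ordinary least squares for ys = m * xs + c; returns (m, c, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    m = sxy / sxx
    c = mean_y - m * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return m, c, r_squared


def fit_models(x, y):
    """Fit the three models; x and y must be strictly positive for the logs."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    return {
        "linear": ols(x, y),               # y = x * m + c
        "power law": ols(log_x, log_y),    # log y = m * log x + c
        "exponential": ols(x, log_y),      # log y = m * x + c
    }


# Synthetic data generated from an exact power law, for illustration only.
x = [5.0, 8.0, 11.0, 14.0, 17.0, 20.0, 23.0]
y = [v ** -0.3 * math.exp(7.6) for v in x]
fits = fit_models(x, y)
```

Because the synthetic data follow a power law exactly, the power-law fit recovers the generating coefficients while the other two models fit less well.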

By eye the results look fairly similar. The table below summarises the results and provides the coefficients to plug into the models. The best-fit model seems to be the power-law model. For best results we recommend averaging the results of the three models.

The table below summarises the results of the model fitting. Each cell shows the R-squared value of the fit. The variables with the greatest R-squared values are shown at the top. From this naive in-sample point of view we can see that United States soybean stock-to-usage is the best predictor under the power-law model, followed by the prior-month mean crude price and the dollar index. The next most important features are the world-without-China and world stock-to-usages. In the following we have a closer look at the relationship between price and the main predictive features according to the table below.

variable	exponential	linear	power law
unitedstates_Oilseed, Soybean_s2u	0.4209837	0.4127324	0.5584211
crude	0.5774028	0.5669530	0.5481382
dollarindex	0.4116129	0.4001928	0.4159651
worldnochina_Oilseed, Soybean_s2u	0.2626859	0.2694793	0.2779572
world_Oilseed, Soybean_s2u	0.2121701	0.2168694	0.2126130
argentina_Oilseed, Soybean_s2u	0.1818140	0.1872721	0.2004916
brazil_Oilseed, Soybean_s2u	0.0014916	0.0008604	0.0057191
china_Oilseed, Soybean_s2u	0.0136528	0.0137879	0.0022092

The model with the best fit is a power-law using the United States stock-to-usage percentage as input variable. The model coefficients are given in the table below.

model	Rsq	m	c	code	variable
linear	0.4624023	6.3574635	614.254303	F	crude
power law	0.4474239	0.3966680	5.283570	F	crude
exponential	0.4773726	0.0055661	6.561821	F	crude
linear	0.3959766	-29.2414559	1360.489521	F	unitedstates_Oilseed, Soybean_s2u
power law	0.5679050	-0.3046050	7.625807	F	unitedstates_Oilseed, Soybean_s2u
exponential	0.4139891	-0.0257595	7.216716	F	unitedstates_Oilseed, Soybean_s2u
linear	0.5424828	6.6700579	603.025219	H	crude
power law	0.5219749	0.4065045	5.254004	H	crude
exponential	0.5545073	0.0058559	6.551261	H	crude
linear	0.3943966	-28.0914807	1356.736954	H	unitedstates_Oilseed, Soybean_s2u
power law	0.5484322	-0.2878310	7.594478	H	unitedstates_Oilseed, Soybean_s2u
exponential	0.4042912	-0.0246940	7.213343	H	unitedstates_Oilseed, Soybean_s2u
linear	0.5754952	6.6631429	610.495889	K	crude
power law	0.5451711	0.4036271	5.273894	K	crude
exponential	0.5825902	0.0058176	6.561012	K	crude
linear	0.4056673	-27.7992613	1361.595383	K	unitedstates_Oilseed, Soybean_s2u
power law	0.5543910	-0.2793798	7.582172	K	unitedstates_Oilseed, Soybean_s2u
exponential	0.4162098	-0.0244309	7.218297	K	unitedstates_Oilseed, Soybean_s2u
linear	0.5669530	6.5402082	625.851025	N	crude
power law	0.5481382	0.3958488	5.313132	N	crude
exponential	0.5774028	0.0057320	6.573448	N	crude
linear	0.4127324	-27.0962096	1360.036438	N	unitedstates_Oilseed, Soybean_s2u
power law	0.5584211	-0.2708220	7.567570	N	unitedstates_Oilseed, Soybean_s2u
exponential	0.4209837	-0.0237661	7.217073	N	unitedstates_Oilseed, Soybean_s2u
linear	0.5871743	6.4221131	629.478807	Q	crude
power law	0.5631550	0.3848068	5.356353	Q	crude
exponential	0.5968094	0.0056842	6.573504	Q	crude
linear	0.3962969	-24.3933370	1327.733312	Q	unitedstates_Oilseed, Soybean_s2u
power law	0.5360235	-0.2494218	7.515780	Q	unitedstates_Oilseed, Soybean_s2u
exponential	0.4038128	-0.0216175	7.191769	Q	unitedstates_Oilseed, Soybean_s2u
linear	0.6045228	5.9495116	645.685701	U	crude
power law	0.5761461	0.3591547	5.449976	U	crude
exponential	0.6124659	0.0053781	6.581074	U	crude
linear	0.3663650	-20.6215783	1271.465254	U	unitedstates_Oilseed, Soybean_s2u
power law	0.4980346	-0.2180352	7.431469	U	unitedstates_Oilseed, Soybean_s2u
exponential	0.3710724	-0.0186385	7.146733	U	unitedstates_Oilseed, Soybean_s2u
linear	0.7080435	5.9742500	641.017516	X	crude
power law	0.6742210	0.3633579	5.431150	X	crude
exponential	0.7172283	0.0054647	6.572906	X	crude
linear	0.3206284	-17.5898818	1238.626902	X	unitedstates_Oilseed, Soybean_s2u
power law	0.4324261	-0.1894400	7.365899	X	unitedstates_Oilseed, Soybean_s2u
exponential	0.3201690	-0.0159748	7.118519	X	unitedstates_Oilseed, Soybean_s2u

Taking the values from the table above we plot the model predictions in blue. The latest USDA United States stock-to-usage is given by the vertical orange line. The horizontal orange line gives the latest S N9 price. The results can be interpreted in two ways. If we take the USDA numbers as the truth, we need to see a downward adjustment in price. On the other hand, we can infer an implied stock-to-usage from the latest price. Currently this number is much lower than that reported by the USDA.
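To make the two readings concrete, the sketch below evaluates the three models for the N contract using the coefficients from the table above, averages them, and inverts the power-law model to obtain an implied stock-to-usage from a price. The stock-to-usage input is a hypothetical value chosen for illustration, and the function names are not from the original analysis.

```python
import math

# Coefficients (m, c) for the N contract, copied from the coefficient table,
# with United States stock-to-usage as the input variable.
COEFFS = {
    "linear": (-27.0962096, 1360.036438),
    "power law": (-0.2708220, 7.567570),
    "exponential": (-0.0237661, 7.217073),
}


def predict(model, x):
    """Evaluate one of the three model equations at stock-to-usage x."""
    m, c = COEFFS[model]
    if model == "linear":
        return x * m + c
    if model == "power law":
        return x ** m * math.exp(c)
    if model == "exponential":
        return math.exp(x * m + c)
    raise ValueError(f"unknown model: {model}")


def implied_s2u(price):
    """Invert the power-law model: x = (y / exp(c)) ** (1 / m)."""
    m, c = COEFFS["power law"]
    return (price / math.exp(c)) ** (1.0 / m)


s2u = 20.0  # hypothetical stock-to-usage percentage, not a USDA figure
predictions = {name: predict(name, s2u) for name in COEFFS}
average = sum(predictions.values()) / len(predictions)
```

Calling `implied_s2u` with an observed price gives the stock-to-usage that the power-law model would need in order to reproduce that price.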

The plot below takes the data from above and constructs a futures curve for each input feature and model.

3 Probabilistic Model

If we discretise the stock-to-usage percentages we are able to do some statistics on the values of the prices given stock-to-usage (or any other feature) in the discretised bucket. In this way we can perform Bayesian statistics on the prices, i.e. given a forecast of the stock-to-usage we can determine the probability that the price is contained within some interval.
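A minimal sketch of this idea estimates the conditional probability as a simple frequency: among the historical samples whose stock-to-usage falls in a bucket, count the share whose price lies in the interval of interest. The sample values and the function name are hypothetical.

```python
def conditional_probability(samples, s2u_range, price_range):
    """Estimate P(price in price_range | s2u in s2u_range) from (s2u, price) pairs.

    The stock-to-usage bucket is treated as left-open, right-closed.
    Returns None when no sample falls in the bucket.
    """
    s_lo, s_hi = s2u_range
    p_lo, p_hi = price_range
    in_bucket = [price for s2u, price in samples if s_lo < s2u <= s_hi]
    if not in_bucket:
        return None
    hits = sum(1 for price in in_bucket if p_lo <= price <= p_hi)
    return hits / len(in_bucket)


# Hypothetical (stock-to-usage, price) observations, for illustration only.
samples = [(5.0, 1300.0), (5.5, 1400.0), (6.0, 1250.0), (7.0, 950.0)]
prob = conditional_probability(samples, (4.03, 6.14), (1281.25, 1427.5))
```

With few samples in a bucket the estimate is coarse, which is why sparsely populated buckets deserve caution.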

In the subsections below we show plots of the price statistics when the value of the underlying feature falls within the bucket specified on the x-axis. The solid black line shows the median price. The light and dark shaded regions show the 10th to 90th and 25th to 75th percentiles respectively. The bulk of each distribution lies within the dark shaded region. For reference we also show the USDA and Polar Star fundamental forecasts together with the latest price data. These are represented by the vertical and horizontal lines respectively. The same data used to create the images is also given in tabular form below the plots.
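The tables in the subsections can be reproduced in spirit with the sketch below, which splits a feature into equal-width, right-closed buckets and computes price percentiles per bucket. Both the equal-width bucketing and the linearly interpolated percentile are assumptions; the original bucketing and quantile method are not stated.

```python
import math


def percentile(sorted_values, q):
    """Linearly interpolated percentile, q in [0, 100], on pre-sorted data."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    weight = pos - lower
    return sorted_values[lower] * (1 - weight) + sorted_values[upper] * weight


def bucket_percentiles(samples, n_buckets=10, qs=(10, 25, 50, 75, 90)):
    """Group (feature, price) pairs into equal-width buckets and summarise prices.

    Assumes the feature takes at least two distinct values.
    """
    features = [f for f, _ in samples]
    lo, hi = min(features), max(features)
    width = (hi - lo) / n_buckets
    buckets = {}
    for feature, price in samples:
        # Right-closed buckets (a, b]; the minimum value joins the first bucket.
        idx = max(math.ceil((feature - lo) / width) - 1, 0)
        idx = min(idx, n_buckets - 1)
        buckets.setdefault(idx, []).append(price)
    table = {}
    for idx in sorted(buckets):
        prices = sorted(buckets[idx])
        label = (round(lo + idx * width, 2), round(lo + (idx + 1) * width, 2))
        table[label] = {f"p{q}": percentile(prices, q) for q in qs}
    return table


# Hypothetical (feature, price) pairs, for illustration only.
samples = [(0, 10.0), (2, 20.0), (4, 30.0), (6, 40.0), (8, 50.0), (10, 60.0)]
table = bucket_percentiles(samples, n_buckets=2)
```

Buckets that receive no samples are simply absent from the result.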

3.1 United States Stock-to-Usage

	p10	p25	p50	p75	p90
(4.03,6.14]	1229.950	1281.2500	1372.250	1427.500	1464.550
(6.14,8.22]	899.150	940.6250	980.625	1061.688	1212.850
(8.22,10.3]	995.300	1021.1250	1065.500	1302.875	1421.100
(10.3,12.4]	897.825	957.0625	994.500	1030.000	1062.675
(12.4,14.5]	878.000	892.6875	996.875	1043.000	1059.125
(16.6,18.7]	880.400	887.3750	899.750	913.250	920.800
(18.7,20.7]	891.250	900.8125	912.000	922.375	934.800
(20.7,22.8]	852.200	896.0000	912.500	924.875	934.600
(22.8,24.9]	908.500	920.1250	932.250	945.000	949.000

3.2 World Stock-to-Usage

	p10	p25	p50	p75	p90
(23.3,25.3]	1011.35	1052.3750	1291.000	1429.875	1504.60
(25.3,27.3]	917.00	977.9375	1309.625	1403.812	1452.00
(27.3,29.3]	941.20	996.0000	1255.750	1376.250	1419.50
(29.3,31.2]	956.15	977.5000	1034.000	1166.750	1249.95
(31.2,33.2]	879.70	904.0625	1046.125	1262.250	1315.15
(33.2,35.2]	897.40	957.0000	1040.250	1241.625	1376.35
(35.2,37.2]	945.50	982.5000	1003.500	1019.000	1035.25
(37.2,39.2]	877.15	892.6875	913.750	937.750	1092.95
(39.2,41.2]	931.05	970.2500	997.000	1048.875	1062.60
(41.2,43.2]	920.35	931.7500	947.500	990.875	1023.20

3.3 World Stock-to-Usage without China

	p10	p25	p50	p75	p90
(15.8,16.9]	976.050	1318.6250	1396.00	1444.8750	1507.850
(16.9,18]	996.700	1268.7500	1369.25	1425.0000	1454.850
(18,19.1]	942.850	1018.7500	1223.00	1347.6250	1396.550
(19.1,20.3]	899.200	947.2500	1016.50	1150.0000	1240.350
(20.3,21.4]	881.875	913.7500	984.50	1058.2500	1285.500
(21.4,22.5]	911.100	989.4375	1022.00	1246.1875	1307.675
(22.5,23.6]	896.750	914.5000	968.25	977.5000	996.000
(23.6,24.7]	867.875	890.8125	909.25	919.4375	932.125
(24.7,25.8]	891.250	915.5000	942.75	1052.7500	1106.750
(25.8,27]	941.550	961.5000	985.50	1023.1250	1056.350

3.4 Mean Crude

	p10	p25	p50	p75	p90
(33.3,43.8]	872.700	883.0000	898.750	956.250	1001.100
(43.8,54.3]	905.000	963.6250	995.000	1022.312	1049.750
(54.3,64.7]	902.800	923.0000	983.250	1022.000	1053.250
(64.7,75.2]	890.400	910.1875	933.000	987.500	1056.025
(75.2,85.6]	959.350	991.6250	1040.000	1225.500	1295.050
(85.6,96]	1108.225	1284.5625	1382.375	1433.250	1474.675
(96,106]	1104.500	1235.0000	1299.750	1389.750	1448.050
(106,117]	1148.475	1230.4375	1253.375	1322.688	1367.750
(117,127]	1226.050	1268.8125	1333.125	1370.500	1390.950
(127,138]	1273.600	1336.7500	1428.250	1450.625	1488.900

3.5 Dollar Index

	p10	p25	p50	p75	p90
(72.1,75.2]	1254.600	1297.500	1363.500	1396.250	1442.750
(75.2,78.2]	986.500	1057.750	1214.500	1317.125	1382.750
(78.2,81.3]	950.000	1009.500	1262.250	1402.875	1449.500
(81.3,84.3]	948.500	1025.000	1252.750	1417.500	1500.300
(84.3,87.3]	866.875	921.750	1002.125	1039.438	1062.875
(87.3,90.3]	950.900	994.500	1039.500	1056.000	1063.350
(90.3,93.4]	993.100	1000.812	1007.000	1016.375	1023.550
(93.4,96.4]	891.500	912.500	956.250	991.500	1015.750
(96.4,99.4]	873.700	887.500	913.375	950.250	1014.125
(99.4,103]	956.800	974.125	1027.750	1053.875	1070.700

4 Ensemble Model

We have created ensemble machine learning models that predict the soybean price along the futures curve. These models take as inputs the stock-to-usage percentages of the top soybean producing and consuming nations together with the dollar index and month prior average crude price as proxies for the US Dollar and energy respectively.

The ensemble models we create are all random forest regression models. We create a train and test split and perform hyperparameter tuning on the training set using 3-fold cross-validation. Ensemble models are a natural extension of the single-variable deterministic models in that they are able to gain from possible interactions between the different input features.
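The workflow described above can be sketched with scikit-learn as follows. This is not the original implementation: the library choice, the synthetic data, the split ratio and the hyperparameter grid are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: two features (think stock-to-usage and crude) and a price.
X = np.column_stack([rng.uniform(4, 25, 300), rng.uniform(30, 140, 300)])
y = 1500 * X[:, 0] ** -0.3 + 3 * X[:, 1] + rng.normal(0, 20, 300)

# Hold out a test set; the 25% share is an assumption.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid; the text does not say which hyperparameters were tuned.
param_grid = {
    "n_estimators": [50, 100],
    "max_features": [1, 2],
    "min_samples_leaf": [1, 5],
}

# 3-fold cross-validation on the training set only.
search = GridSearchCV(
    RandomForestRegressor(random_state=0), param_grid, cv=3, scoring="r2"
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
test_r2 = best_model.score(X_test, y_test)       # out-of-sample R-squared
importances = best_model.feature_importances_    # one value per input feature
```

The held-out R-squared and the feature importances of the best model correspond to the quantities reported for each contract code.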

From the best models we determine the variable importance of all the input features. The results are summarised in the plot below. The greater the importance, the larger the effect of that feature on the predicted values. The most important feature for each contract code is highlighted in orange.

Notice that the features with the greatest importance are crude and unitedstates_Oilseed, Soybean_s2u. In all the cases shown above these two features make up more than 70% of the variable importance of the ensemble models. The table below gives the R-squared values of the ensemble models fitted to the data. Notice the significant improvement over the deterministic models.

In the plot below we aggregate all the feature importances along the curve into a single representation.

	R squared
K	0.89
N	0.85
Q	0.82
U	0.88
X	0.79
F	0.83
H	0.64

4.1 United States Stock-to-Usage Sensitivity

As the United States Stock-to-Usage percentages increase we expect the price of soybeans to decrease. This intuition is confirmed in the plots below. The y- and x-axes show the prediction and the value of United States Stock-to-Usage respectively. Here we fix all parameters to the latest WASDE numbers, but allow the value of United States Stock-to-Usage to change. In the plots below we see the quasi-monotonic decreasing relationship between the two variables. We can also see a transition that resembles a phase transition for values of United States Stock-to-Usage around 6.
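The sensitivity procedure can be sketched generically: hold every feature at a baseline and sweep a single feature through a range of values. The prediction function and baseline numbers below are hypothetical stand-ins for the fitted ensemble model and the latest WASDE numbers.

```python
def sensitivity_curve(predict, baseline, feature, values):
    """Vary one feature over `values`, holding the others at `baseline`."""
    curve = []
    for value in values:
        scenario = dict(baseline)  # copy so the baseline is left untouched
        scenario[feature] = value
        curve.append((value, predict(scenario)))
    return curve


# Hypothetical stand-in for a fitted model's prediction function.
def toy_predict(features):
    return 1900.0 * features["us_s2u"] ** -0.27 + 3.0 * features["crude"]


# Hypothetical baseline; in the text this would be the latest WASDE numbers.
baseline = {"us_s2u": 20.0, "crude": 60.0}
curve = sensitivity_curve(
    toy_predict, baseline, "us_s2u", [4.0, 8.0, 12.0, 16.0, 20.0, 24.0]
)
```

The same function produces the crude sensitivity by passing `"crude"` as the feature and a range of crude prices as the values.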

4.2 Crude Sensitivity

As the cost of energy increases we expect the price of soybeans to increase. This intuition is confirmed in the plots below. The y- and x-axes show the prediction and the value of crude respectively. Here we fix all parameters to the latest WASDE numbers, but allow the value of the prior-month crude price to change. In the plots below we see the monotonic increasing relationship between the two variables. We can also see an elbow forming at crude prices greater than 50. We are currently at crude prices greater than this, which might signify a possible increase in the price of soybeans.

5 Remove Crude

Here we create models without crude to compare with the previous models. Below we show the feature importances of these new models. As we might have expected, the most dominant features are the ones that were next in line in the original models above.

6 Only Crude and United States Stock-to-Usage

Here we create models using only crude and United States Stock-to-usage to compare with the previous models. Below we show the feature importances of these new models.

The table below shows the R-squared values of the models with and without crude. Notice that the models that contain crude as a feature perform slightly better on some contracts (most clearly N and U) than those without crude. Overall the results without crude are still good.

code	all features	only crude and USA	without crude
F	0.83	0.84	0.83
H	0.64	0.69	0.65
K	0.89	0.85	0.88
N	0.85	0.84	0.79
Q	0.82	0.87	0.87
U	0.88	0.95	0.69
X	0.79	0.88	0.87

7 Predictions

The plot below shows the ensemble model predictions for the USDA forecasted fundamentals. It is difficult to pin down the value of crude, so we consider a range of values from 55 to 65. Furthermore, we consider the predictions from each of the decision trees in the model to determine prediction statistics. The normal output of a collection of regression trees is the mean of all the predictions. In the plot below we show the 25th to 75th percentiles of the predicted prices, which corresponds to the area between the two gray curves. The latest price data is represented by the black curve. The median model prediction is shown in blue. Here we use the median as it is less likely to be skewed by possible outliers.
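The prediction statistics can be sketched as follows: collect the prediction of every individual tree for every crude scenario, then take quartiles of the pooled values. The trees here are hypothetical stand-in functions, and pooling across the crude range (rather than summarising each crude level separately) is an assumption. In scikit-learn the individual trees of a fitted forest are available through its `estimators_` attribute.

```python
import statistics


def prediction_band(trees, scenarios):
    """Pool every tree's prediction over all scenarios; return (p25, median, p75)."""
    pooled = [tree(scenario) for scenario in scenarios for tree in trees]
    p25, p50, p75 = statistics.quantiles(pooled, n=4, method="inclusive")
    return p25, p50, p75


# Hypothetical stand-ins for the individual regression trees of a fitted forest.
# The last tree is a deliberate outlier.
trees = [
    lambda s, b=b: 800.0 + b + 2.0 * s["crude"]
    for b in (-30.0, -10.0, 0.0, 10.0, 200.0)
]

# Fundamentals held fixed (hypothetical value); crude swept from 55 to 65.
scenarios = [{"us_s2u": 20.0, "crude": float(c)} for c in range(55, 66)]

p25, median, p75 = prediction_band(trees, scenarios)
mean = statistics.mean(tree(s) for s in scenarios for tree in trees)
```

In this toy setup the outlier tree pulls the mean above the median, which illustrates why the median is the more robust summary.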

8 Comments

Data-driven models are only as good as the data used in the modelling process. In the current case we have the highest stock-to-usage numbers for United States soybeans in two decades. For this reason the number of samples we have at similar levels is small and the results are difficult to confirm. The closest numbers to these are those of 2006/7. During this period we also had a regime change in the soybean market. Taking this into account, we cannot make a clear call on the soybean price given the current stock-to-usage scenario.

Soybean price vs Stock-to-Usage