Soybean Futures Prices Predictions

hmnshu
November 08, 2019

Our project aims to help build better price-forecasting models by understanding patterns in time-series data.

Our model detects patterns and forecasts end-of-day (EOD) prices for the next five days for three different Soybean futures contracts (March, May, and July).


Transcript

  1. Introduction

    • Our project aims to help build better price-forecasting models by understanding patterns in time-series data.
    • Stock market and news sentiment play a large role in determining commodity futures prices.
    • Predicting future prices is difficult, which makes it challenging to build traditional rule-based algorithms that extract time-driven features. Deep learning algorithms, however, are good at extracting features.
    • Accordingly, our model detects patterns and forecasts end-of-day (EOD) prices for the next five days for three different Soybean futures contracts (March, May, and July).
    Icon source: flaticon.com/authors/freepik
  2. Business Understanding – Soybean Futures

    Producers and consumers: All soybean futures contracts require traders to put up an initial margin and a maintenance margin, and each contract comes with an expiration month.
    Uses: Soybeans are widely used as meal and as a source of oil.
    Major exporters: Brazil, U.S., Argentina (90% of soybean production)
    Major importer: China
    Competitive crops: Corn
    Source: harvestpublicmedia
  3. Data Preparation: Soybean Futures Prices

    General causal factors
    • Trade war and tariffs (Trump tweets)
    • Seasonality – planting, blooming, harvest (by year)
    • Supply and demand (exports)
    • Weather – forest fires in Brazil
    • Competitors – Brazil, Argentina
    • Widespread disease (African swine fever, Asian rust)
    • News – relevance, sentiment, and magnitude

    Market causal factors
    • Previous day's price (EOD price)
    • Volume Weighted Average Price (VWAP) – High, Low, Last, Volume
    • Percentage change per day (Change)
    • Trade Weighted U.S. Dollar Index
    • USD-Yuan exchange rate
    • Competitive crop prices – corn, wheat
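The two derived market features on slide 3, percentage change per day and VWAP, could be computed roughly as follows. This is a minimal sketch, not the authors' pipeline: the function names are hypothetical, and the VWAP formula (volume-weighted "typical price" of High, Low and Last over a rolling window) is an assumption, since the slide only lists the inputs.

```python
def pct_change(prices):
    """Day-over-day percentage change; the first day has no predecessor."""
    changes = [None]
    for prev, cur in zip(prices, prices[1:]):
        changes.append(100.0 * (cur - prev) / prev)
    return changes


def rolling_vwap(highs, lows, lasts, volumes, window):
    """Volume-weighted average of the typical price over the last `window` days.

    Assumption: typical price = (High + Low + Last) / 3, a common
    approximation when only daily bars are available.
    """
    typical = [(h + l + c) / 3.0 for h, l, c in zip(highs, lows, lasts)]
    vwap = []
    for i in range(len(typical)):
        start = max(0, i - window + 1)
        total_volume = sum(volumes[start:i + 1])
        weighted = sum(t * v for t, v in zip(typical[start:i + 1], volumes[start:i + 1]))
        # No trades in the window means VWAP is undefined for that day.
        vwap.append(weighted / total_volume if total_volume else None)
    return vwap
```

Both functions return one value per input day, so the results can be appended as extra columns next to the EOD price.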
  4. Correlation with EOD Price

    • The Yuan/USD rate is inversely related to the EOD price.
    • News and tweet sentiment is related to the EOD price.
    • Scale: 1 is a perfect positive correlation, 0 is no correlation, -1 is a perfect negative correlation.
    • Recursive Feature Elimination ranks all the variables, with 1 being the most important.
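The correlation scale on slide 4 (1, 0, -1) is that of the Pearson correlation coefficient. As a minimal sketch, assuming Pearson correlation is the measure used (the slide does not name it), it can be computed without any libraries; the function name and toy numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Toy data only: a feature that moves against the price gives a value near -1.
price = [1.0, 2.0, 3.0, 4.0]
inverse_feature = [8.0, 6.0, 4.0, 2.0]
r = pearson(price, inverse_feature)
```

A value of `r` close to -1 corresponds to the "inversely related" reading given for Yuan/USD.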
  5. Modelling: Sequence Forecasting - Training Process (Step X)

    • Raw daily time-series data/target (number of rows)
    • Feature engineering: Prev-Day EOD Price, Change %, VWAP (incorporates 7 features)
    • Training set: beginning of the time series – 70% of rows
    • Validation set: middle of the time series – 10% of rows
    • Test set: end of the time series – 20% of rows
    • Machine learning: LSTM, XGBoost, LightGBM…
    • Model training, hyperparameter tuning, performance evaluation
    • Ensemble model (trained on whole data)
    • Training Step X prediction: last sequence from dataset [949.69, 943.44, 943.44…] → predict → predicted Step X sequence [952.69]
    • Recursive Feature Elimination ranks all the features, with 1 being the most important.
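The chronological 70/10/20 split and the "last sequence → predicted value" pairing on slide 5 can be sketched as below. The function names, the window length, and the toy series are assumptions for illustration; the slides do not state the sequence length the authors used.

```python
def chronological_split(rows, train_pct=70, val_pct=10):
    """Split rows in time order: train first, validation in the middle, test last."""
    n = len(rows)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


def make_windows(series, seq_len, step=1):
    """Pair each run of `seq_len` values with the value `step` days after its end."""
    samples = []
    for start in range(len(series) - seq_len - step + 1):
        window = series[start:start + seq_len]
        target = series[start + seq_len + step - 1]
        samples.append((window, target))
    return samples


# Toy series only (not real prices).
series = list(range(100))
train, val, test = chronological_split(series)
train_samples = make_windows(train, seq_len=3, step=1)
```

Splitting by position rather than at random keeps every validation and test row later in time than the training rows, which avoids leaking future prices into training.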
  6. Model Complexity: Model for Each Day (Step X) and Each Contract

    • Day 1 prediction – Predictors: previous days' EOD, VWAP, CHANGE, DTWEXB; Target: 1-step EOD
    • Day 2 prediction – Predictors: previous days' EOD, VWAP, CHANGE, DTWEXB; Target: 2-step EOD
    • Day 3 prediction – Predictors: previous days' EOD, VWAP, CHANGE, DTWEXB; Target: 3-step EOD
    • Day 4 prediction – Predictors: previous days' EOD, VWAP, CHANGE, DTWEXB; Target: 4-step EOD
    • Day 5 prediction – Predictors: previous days' EOD, VWAP, CHANGE, DTWEXB; Target: 5-step EOD
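Slide 6 describes one model per forecast day and per contract, which for three contracts and five days means fifteen models. A minimal sketch of that bookkeeping follows. The per-step "model" here is a deliberately trivial stand-in (the average price change over that many days), not the LSTM from the slides, and all names and numbers are hypothetical.

```python
HORIZONS = (1, 2, 3, 4, 5)


def fit_step_model(series, step):
    """Stand-in model: the average change over `step` days in the history."""
    diffs = [series[i + step] - series[i] for i in range(len(series) - step)]
    return sum(diffs) / len(diffs)


def fit_all(series_by_contract):
    """Fit one model per (contract, step) pair."""
    return {
        (contract, step): fit_step_model(series, step)
        for contract, series in series_by_contract.items()
        for step in HORIZONS
    }


def forecast(models, contract, last_price):
    """Five-day forecast: each day comes from its own dedicated model."""
    return [last_price + models[(contract, step)] for step in HORIZONS]


# Toy histories only (not real prices).
history = {
    "March": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0],
    "May": [20.0, 22.0, 24.0, 26.0, 28.0, 30.0, 32.0],
    "July": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}
models = fit_all(history)
march_forecast = forecast(models, "March", last_price=16.0)
```

Because each step has its own model, errors from day 1 are not fed into day 2, at the cost of training and maintaining every model separately.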
  7. Evaluation

    • Sequence forecasting using LSTM - next-day EOD price - test set - 92/520
    • Best fit: low test loss and low validation loss
    • 100+ different experiments and model predictions
    • Verification on unseen predicted days
    • XGBoost - dropped
  8. Model Interpretability

    • Global interpretability: the collective SHAP values can show how much each predictor contributes, either positively or negatively, to the target variable.
    • Local interpretability: each observation gets its own set of SHAP values.
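The local/global distinction on slide 8 can be illustrated with a linear model, for which SHAP values have a closed form when features are treated as independent: each feature contributes its weight times its deviation from the background mean. This is a sketch under that assumption only; the slides apply SHAP to the authors' own models, and the function names, weights, and data here are hypothetical.

```python
def linear_shap(weights, background, x):
    """Local SHAP values of a linear model for one observation `x`.

    Assumes independent features: phi_i = w_i * (x_i - mean of feature i).
    """
    means = [sum(column) / len(column) for column in zip(*background)]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


def global_importance(weights, background):
    """Global view: mean absolute SHAP value of each feature over all rows."""
    per_row = [linear_shap(weights, background, row) for row in background]
    return [sum(abs(v) for v in column) / len(column) for column in zip(*per_row)]


# Toy model and data only.
weights = [2.0, -1.0]
background = [[0.0, 0.0], [2.0, 2.0]]
local = linear_shap(weights, background, [2.0, 2.0])
overall = global_importance(weights, background)
```

The sign of each local value shows whether that feature pushed this one prediction up or down, while the global list ranks features by average magnitude across all observations.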
  9. What We Learned

    Advantages:
    • LSTM is essentially a nonlinear time-series model, where the nonlinearity is learned from the data.
    • Deployment performed well: the sum of absolute errors for the 3 contracts during November was low.
    • Prototype in development: 5-day prediction webpage (compare predicted vs. real)

    Disadvantages:
    • Multi-step prediction gets worse with each step (i.e. day).
    • Requires a model for each day and each contract; retraining and maintenance are necessary.
    • Unable to handle sudden drops or rises.

    Future work:
    • Fine-tuning more complex models, anomaly detection
    • Re-evaluating model performance (including new causal factors as they arise)
    • Experimenting with new methods: LSTM (seq-to-seq prediction), reinforcement learning
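Slide 9 mentions two measurements: the sum of absolute errors and the way error grows with each forecast step. A minimal sketch of both follows, with hypothetical function names; the numbers in the usage lines are invented for illustration and are not the authors' results.

```python
def sum_abs_error(predicted, actual):
    """Sum of absolute differences between predicted and actual prices."""
    return sum(abs(p - a) for p, a in zip(predicted, actual))


def error_by_step(forecasts, actuals):
    """Mean absolute error per forecast step.

    `forecasts` and `actuals` hold one list per forecast date, with one
    value per step (day 1, day 2, ...).
    """
    steps = len(forecasts[0])
    return [
        sum(abs(f[s] - a[s]) for f, a in zip(forecasts, actuals)) / len(forecasts)
        for s in range(steps)
    ]


# Invented numbers only: two forecast dates, two steps each.
total = sum_abs_error([10.0, 12.0], [11.0, 10.0])
per_step = error_by_step([[1.0, 2.0], [3.0, 4.0]], [[1.0, 3.0], [3.0, 6.0]])
```

Comparing the entries of `per_step` from left to right is one way to check whether later days are indeed harder to predict.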
  10. Results

    • MinneMUDAC 2019 Student Data Science Challenge – Top Performer https://minneanalytics.org/minnemudac/
    • FASTCON Prediction for Soybean Future prices – 2nd http://www.mudac.org/minnemudac/ShowPredictions.php