Kickstart your Probabilistic Forecasting with Level Set & Quantile Regression Forests

Inge van den Ende, Forecasting Tech Lead
26 September 2025

Understanding uncertainty is essential in decision making

Point forecasts don't give any information about this uncertainty
[Figure: energy production over time; actuals up to t0, then a point forecast whose uncertainty is unknown]

A probabilistic forecast provides information about the uncertainty
[Figure: energy production over time; actuals up to t0, then a point forecast surrounded by quantile bands q20, q40, q60 and q80. A slice at time t1 shows the range of y between q20 and q80.]

A probability density function describes the uncertainty
[Figure: probability density of the forecast over energy (MWh), with the actual value marked]


Core trade-off: calibration vs. sharpness
[Figure: a probability density over energy (MWh) with the actual value marked, and a plot of sample quantile against theoretical quantile (0 to q99); labels: Calibration, Sharpness, q20, q30]
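
To make the trade-off concrete, here is a minimal sketch that is not taken from the slides. It measures both properties for sample-based forecasts on synthetic data. The function names `empirical_coverage` and `mean_interval_width` and all numbers are assumptions made for illustration.

```python
import numpy as np


def empirical_coverage(samples, actuals, quantile):
    """Share of actuals at or below the forecast quantile (calibration check)."""
    forecast_quantile = np.quantile(samples, quantile, axis=1)
    return float(np.mean(actuals <= forecast_quantile))


def mean_interval_width(samples, lower=0.2, upper=0.8):
    """Average width of the central interval (sharpness: smaller is sharper)."""
    low, high = np.quantile(samples, [lower, upper], axis=1)
    return float(np.mean(high - low))


rng = np.random.default_rng(0)
n_obs, n_samples = 2000, 500
actuals = rng.normal(loc=10.0, scale=2.0, size=n_obs)

# One forecast matches the true spread, the other is far too wide.
matched = rng.normal(loc=10.0, scale=2.0, size=(n_obs, n_samples))
too_wide = rng.normal(loc=10.0, scale=6.0, size=(n_obs, n_samples))

coverage_matched = empirical_coverage(matched, actuals, 0.8)
coverage_wide = empirical_coverage(too_wide, actuals, 0.8)
width_matched = mean_interval_width(matched)
width_wide = mean_interval_width(too_wide)
```

For a calibrated forecast the coverage of q80 should be close to 0.8. The overly wide forecast covers far more than 80% of the actuals and pays for it with a wider interval, so it is both miscalibrated and less sharp.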


Where to start?

Practical probabilistic models to the rescue: Level Set Forecaster and Quantile Regression Forest

Train → predict: the standard ML workflow
    Train:   LGBMRegressor().fit(X_train, y_train)
    Predict: model.predict(X_test)

The Level Set Forecaster follows the same workflow (Hasson, 2021)
[Diagram: Train → Predict]

The first step is training the model as usual (Hasson, 2021)
    from lightgbm import LGBMRegressor
    ## Training
    # 1. Train a point forecast model
    model = LGBMRegressor().fit(X_train, y_train)  # This can be any point forecast model

Predict in-sample to get a prediction for every training observation (Hasson, 2021)
    # 2. Predict for the train set
    point_predictions_train_set = model.predict(X_train)

Level sets group observations of the same prediction level (Hasson, 2021)
[Figure: in-sample predictions plotted against actual values; the actuals inside one level set give quantiles such as q20 and q80]

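Level sets group observations of the same prediction level (Hasson, 2021)
[Figure: actual values grouped by in-sample prediction (example predictions 1.3, 2.1, 2.1); observations with the same prediction fall in the same level set, from which q20 and q80 are read]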

Level sets group observations of the same prediction level (Hasson, 2021)
    import numpy as np
    # 3. Group the training samples based on the prediction
    sorting_index = np.argsort(point_predictions_train_set)
    point_predictions_sorted = point_predictions_train_set[sorting_index]
    target_sorted_by_prediction = np.asarray(y_train)[sorting_index]
    bin_size = 500  # This is a hyperparameter of LSF
    num_bins = int(np.ceil(len(y_train) / bin_size))
    # Left edge of every bin; the first bin is open to the left
    left_thresholds = np.array([-np.inf] + list(point_predictions_sorted[bin_size::bin_size]))
    target_per_bin = [
        target_sorted_by_prediction[i * bin_size : (i + 1) * bin_size]
        for i in range(num_bins)
    ]

The usual point prediction enables the next step (Hasson, 2021)
    ## Predicting
    # 1. Predict with the point forecast model
    point_predictions = model.predict(X_test)

The samples in the level set are our probabilistic prediction (Hasson, 2021)
    # 2. Select the bin: the last one whose left threshold is <= the prediction
    probabilistic_predictions = []
    for prediction in point_predictions:
        bin_index = np.searchsorted(left_thresholds, prediction, side="right") - 1
        probabilistic_predictions.append(target_per_bin[bin_index])
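
The bin lookup is easy to check by hand on a tiny example. This sketch assumes the thresholds are the sorted left edges of the level sets, with the first edge at minus infinity; the three edges and four predictions are made up for illustration.

```python
import numpy as np

# Hypothetical left edges of three level sets: (-inf, 2), [2, 4), [4, inf).
left_thresholds = np.array([-np.inf, 2.0, 4.0])
point_predictions = np.array([1.0, 2.0, 3.5, 9.0])

# side="right" counts the edges that are <= the prediction; minus one gives the bin.
bin_indices = np.searchsorted(left_thresholds, point_predictions, side="right") - 1
print(bin_indices)  # [0 1 1 2]
```

A prediction above every threshold lands in the last level set, and a prediction exactly on an edge belongs to the bin that starts there.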

Your first probabilistic forecast!
[Figure: forecast with q20 and q80 read from the retrieved level set]
Note: you can find the LSF in gluonts
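
The whole Level Set Forecaster can be tried end to end in a few lines. This is a minimal sketch under stated assumptions, not the gluonts implementation: the data is synthetic, and a straight-line fit with `np.polyfit` stands in for the point forecast model (the slides use `LGBMRegressor`, but any point forecast model works). The names `point_forecast`, `level_sets` and `interval_widths` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: the noise grows with x, so the uncertainty depends on the level.
x_train = rng.uniform(0.0, 10.0, size=5000)
y_train = 2.0 * x_train + rng.normal(scale=0.2 + 0.3 * x_train)
x_test = np.array([1.0, 9.0])

# Training step 1: fit a point forecast model (a straight line, for illustration).
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def point_forecast(x):
    return slope * x + intercept


# Training steps 2 and 3: predict in-sample and bin the actuals by prediction level.
in_sample = point_forecast(x_train)
order = np.argsort(in_sample)
bin_size = 500
left_thresholds = np.concatenate(([-np.inf], in_sample[order][bin_size::bin_size]))
target_per_bin = [
    y_train[order][start : start + bin_size]
    for start in range(0, len(y_train), bin_size)
]

# Predicting: find the level set of each new point prediction.
bin_indices = np.searchsorted(left_thresholds, point_forecast(x_test), side="right") - 1
level_sets = [target_per_bin[i] for i in bin_indices]

# The level set is a sample of actuals; any quantile can be read from it.
q20_q80 = np.array([np.quantile(s, [0.2, 0.8]) for s in level_sets])
interval_widths = q20_q80[:, 1] - q20_q80[:, 0]
```

Because the noise in this toy data grows with the level, the q20 to q80 interval at x = 9 comes out wider than at x = 1, even though the point forecast model itself knows nothing about uncertainty.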

The Level Set Forecaster workflow
    Train:   Train → In-sample predict → Bin to level sets
    Predict: Predict → Retrieve level set

Downside of the Level Set Forecaster (Hasson, 2021)
[Figure: example of a time series forecast made with the Level Set Forecaster]

Quantile Regression Forest

The Quantile Regression Forest follows the same workflow
[Diagram: Train → Predict]

The first step is training the Random Forest as usual (Meinshausen, 2006)
    from sklearn.ensemble import RandomForestRegressor
    ## Training
    # 1. Train a point forecast model: Random Forest
    model = RandomForestRegressor().fit(X_train, y_train)

Random Forest averages the predictions of every tree
[Figure: three trees predict 2.3, 3.5 and 1.7; the forest predicts their average, 2.5]
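
The averaging can be verified directly on a fitted scikit-learn forest. This is a small sketch on synthetic data; the data and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=3, random_state=0).fit(X, y)

# One column per tree, one row per observation.
predictions_per_tree = np.column_stack(
    [tree.predict(X[:5]) for tree in forest.estimators_]
)
forest_prediction = forest.predict(X[:5])

# The forest's point forecast is the plain average over its trees.
matches = np.allclose(predictions_per_tree.mean(axis=1), forest_prediction)
```

Averaging throws away the spread between and within the trees, which is exactly the information the Quantile Regression Forest keeps.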

Quantile Regression Forest groups the samples per leaf (Meinshausen, 2006)

Quantile Regression Forest groups the samples per leaf (Meinshausen, 2006)
    import numpy as np
    # 2. Predict for the train set with every tree of the Random Forest
    listed_predictions_per_tree = [tree.predict(X_train) for tree in model.estimators_]
    predictions_per_tree = np.column_stack(listed_predictions_per_tree)
    # Every column is a tree and every row is an observation

Quantile Regression Forest groups the samples per leaf (Meinshausen, 2006)
    from typing import Dict, List
    import pandas as pd
    # 3. Group the true values based on prediction (leaf value) for every tree
    leaf_values_per_tree: List[Dict[float, List[float]]] = [
        (
            pd.DataFrame(
                {"predictions": predictions_per_tree[:, i], "true_values": y_train.copy()}
            )
            .sort_values("predictions")
            .groupby("predictions")["true_values"]
            .apply(list)
            .to_dict()
        )
        for i in range(model.n_estimators)
    ]
    # Every dictionary in the list is a tree:
    # every key is the predicted value of a leaf, the items are the true values in that leaf

Quantile Regression Forest groups the samples per leaf (Meinshausen, 2006)
    # Assumption: resample is sklearn.utils.resample, whose size argument is n_samples
    from sklearn.utils import resample
    # 4. Resample the true values to have a constant leaf weight
    leaf_size = 100  # the number of values in every leaf after resampling
    balanced_values_per_leaf = [
        {
            leaf: resample(true_values, n_samples=leaf_size)
            for leaf, true_values in tree.items()
        }
        for tree in leaf_values_per_tree
    ]
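
Why resample at all? Leaves hold different numbers of training actuals, and without resampling a large leaf would contribute more values to the pooled forecast than a small one. Here is a minimal sketch of the idea using only NumPy; `resample_to_size` is a hypothetical helper, not a library function, and the two leaves are made up.

```python
import numpy as np


def resample_to_size(values, sample_size, rng):
    """Hypothetical helper: draw `sample_size` values with replacement."""
    return rng.choice(np.asarray(values), size=sample_size, replace=True)


rng = np.random.default_rng(0)
small_leaf = [1.0, 2.0]             # a leaf with two training actuals
large_leaf = list(np.arange(50.0))  # a leaf with fifty training actuals

balanced = {
    "small": resample_to_size(small_leaf, 100, rng),
    "large": resample_to_size(large_leaf, 100, rng),
}
```

After resampling, every leaf holds exactly 100 values, so each tree carries the same weight and the pooled samples fit in a fixed-size array.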

The usual point prediction enables the next step (Meinshausen, 2006)
    ## Predicting
    # 1. Predict with every tree of the random forest
    listed_predictions_per_tree = [tree.predict(X_test) for tree in model.estimators_]
    predictions_per_tree = np.column_stack(listed_predictions_per_tree)
    # Every column is a tree and every row is a sample

Selecting the samples per leaf gives us a prediction (Meinshausen, 2006)
    # 2. Collect all true values of the relevant leaf per tree
    probabilistic_predictions_QRF = np.zeros(
        [predictions_per_tree.shape[0], predictions_per_tree.shape[1] * leaf_size]
    )
    for sample in range(predictions_per_tree.shape[0]):  # Loop over all samples
        for tree in range(predictions_per_tree.shape[1]):  # Loop over all trees
            start_idx = tree * leaf_size
            end_idx = start_idx + leaf_size
            probabilistic_predictions_QRF[sample, start_idx:end_idx] = (
                balanced_values_per_leaf[tree][predictions_per_tree[sample, tree]]
            )

Your probabilistic forecast!
[Figure: forecast with q20 and q80 read from the retrieved leaves]
Note: you can find the QRF in gluonts
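
The steps can also be run end to end in compact form. This sketch is a variant, not the code from the slides and not the gluonts implementation: instead of grouping on the predicted leaf value, it uses the leaf indices returned by scikit-learn's `apply`, which avoids floating-point dictionary keys. The synthetic data, `min_samples_leaf=20` and the variable names are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic data: the noise grows with the feature value.
X_train = rng.uniform(0.0, 10.0, size=(2000, 1))
y_train = 2.0 * X_train[:, 0] + rng.normal(scale=0.2 + 0.3 * X_train[:, 0])
X_test = np.array([[1.0], [9.0]])

model = RandomForestRegressor(
    n_estimators=50, min_samples_leaf=20, random_state=0
).fit(X_train, y_train)

# Leaf index of every observation in every tree: shape (n_observations, n_trees).
train_leaves = model.apply(X_train)
test_leaves = model.apply(X_test)

leaf_size = 100  # values kept per leaf after resampling
n_trees = train_leaves.shape[1]
samples = np.empty((len(X_test), n_trees * leaf_size))
for tree in range(n_trees):
    for row, leaf in enumerate(test_leaves[:, tree]):
        # Training actuals that share the leaf of this test observation.
        actuals_in_leaf = y_train[train_leaves[:, tree] == leaf]
        samples[row, tree * leaf_size : (tree + 1) * leaf_size] = rng.choice(
            actuals_in_leaf, size=leaf_size, replace=True
        )

q20, q80 = np.quantile(samples, [0.2, 0.8], axis=1)
interval_widths = q80 - q20
```

Each test observation gets its own set of leaves, so the forecast distribution differs per sample. In this toy data the interval at x = 9 comes out wider than at x = 1.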

The Quantile Regression Forest has six steps
    Train:   Train → In-sample predict → Bin per leaf → Resample
    Predict: Predict → Retrieve leaves

How do the methods compare?
                                  Level Set Forecaster    Quantile Regression Forest
    Grouping                      Based on predictions    Based on features
    Full distribution             🟢                      🟢
    No assumptions on distribution 🟢                     🟢
    Good performance              🟢                      🟢
    Model agnostic                🟢                      🔴
    Distinct forecast per sample  🔴                      🟢
    Fast                          🟢                      🟠

How do the methods compare?
[Figure: example of a time series forecast, Level Set Forecaster next to Quantile Regression Forest]
How valuable for your project?
Sufficient information in features?
Performance of Random Forest in your project?
Which do you choose?

Key takeaways about probabilistic forecasting
With these two models you can start today
Tailoring evaluation to probabilistic metrics is crucial to boost performance
Probabilistic forecasts provide valuable information for decision making
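
The slides do not name a specific probabilistic metric. One common choice for evaluating a single quantile forecast is the pinball (quantile) loss, sketched here as an assumption about what such an evaluation could look like; the function name and the four data points are made up.

```python
import numpy as np


def pinball_loss(actuals, quantile_forecast, quantile):
    """Average pinball (quantile) loss; lower is better."""
    error = np.asarray(actuals) - np.asarray(quantile_forecast)
    return float(np.mean(np.maximum(quantile * error, (quantile - 1.0) * error)))


actuals = np.array([10.0, 12.0, 9.0, 11.0])
q80_forecast = np.array([11.0, 11.0, 11.0, 11.0])

loss = pinball_loss(actuals, q80_forecast, quantile=0.8)
print(round(loss, 2))  # 0.35
```

For q80, an actual above the forecast costs four times as much per unit as an actual below it, which pushes a good q80 forecast to sit above roughly 80% of the actuals.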

Thank you! Read my blog about probabilistic forecasts
