Description
Hi, I'm not sure if this is the right place to ask, so if it's not, I apologize for the inconvenience.
I was working on sktime using the following code:
from sktime.datasets import load_longley
from sktime.forecasting.ardl import ARDL
from sktime.forecasting.base import ForecastingHorizon
from sktime.split import temporal_train_test_split
import numpy as np
# Load data with exogenous variables
y, X = load_longley()
# Split into train and test using temporal_train_test_split
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=5)
print(y_test)
# Select specific exogenous variables
X_train = X_train[["GNPDEFL", "GNP"]]
X_test = X_test[["GNPDEFL", "GNP"]]
print(X_test)
# Create ARDL model with lags and order for exogenous variables
ardl = ARDL(lags=2, order={"GNPDEFL": 1, "GNP": 2}, trend="c")
# Fit the model
ardl.fit(y=y_train, X=X_train)
fh = ForecastingHorizon(np.array([3], dtype="int64"))
y_pred = ardl.predict(fh=fh, X=X_test)
print(y_pred)
For some reason, it gave me this error:
Traceback (most recent call last):
File "D:\Self Study\Open Source\Sktime\sktime\sktime\forecasting\tests\testing_my_changes.py", line 25, in <module>
y_pred = ardl.predict(fh=fh, X=X_test)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Self Study\Open Source\Sktime\sktime\sktime\forecasting\base\_base.py", line 459, in predict
y_pred = self._predict(fh=fh, X=X_inner)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\Self Study\Open Source\Sktime\sktime\sktime\forecasting\ardl.py", line 437, in _predict
y_pred = self._fitted_forecaster.predict(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\emana\anaconda3\envs\sktime-dev\Lib\site-packages\statsmodels\base\wrapper.py", line 113, in wrapper
obj = data.wrap_output(func(results, *args, **kwargs), how)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\emana\anaconda3\envs\sktime-dev\Lib\site-packages\statsmodels\tsa\ardl\model.py", line 1047, in predict
return self.model.predict(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\emana\anaconda3\envs\sktime-dev\Lib\site-packages\statsmodels\tsa\ardl\model.py", line 852, in predict
x[i, offset + j] = val
~^^^^^^^^^^^^^^^
IndexError: index -2 is out of bounds for axis 0 with size 1
So I investigated a little and tried to understand the predict function in statsmodels\tsa\ardl\model.py. I had a hard time following its logic, especially the following loop:
for i in range(dynamic_start, fcasts.shape[0]):
    for j, lag in enumerate(self._lags):
        loc = i - lag
        if loc >= dynamic_start:
            val = fcasts[loc]
        else:
            # Actual data
            val = self.endog[start + loc]
        # Debug print I added just before the line that raises the error
        print(f"i={i}, offset={offset}, j={j}, x.shape={x.shape}, loc={loc}")
        x[i, offset + j] = val
    fcasts[i] = x[i] @ params
I am not sure I understand everything correctly, but I think the loop can somehow use negative indices to compute all the out-of-sample (oos) predictions up to the end, and I don't fully follow the index arithmetic behind it.
For my code example above, the dataset has 16 observations in total: 11 for training (indices 0-10) and 5 for testing (indices 11-15).
So, with fh=[3], the start and end parameters are both set to 13 (when sktime gets a one-element list or array, it passes that single element as both start and end).
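The index arithmetic above can be sketched in plain Python. This is only an illustration of my understanding, not sktime code: the helper name to_absolute_positions is hypothetical, and it assumes that a relative horizon step h maps to the 0-based position cutoff + h, where cutoff is the position of the last training observation.

```python
def to_absolute_positions(n_train, fh_steps):
    """Map relative horizon steps to 0-based positions (hypothetical helper)."""
    cutoff = n_train - 1  # position of the last training observation
    return [cutoff + h for h in fh_steps]


n_total, test_size = 16, 5
n_train = n_total - test_size  # 11 training observations, positions 0-10

positions = to_absolute_positions(n_train, [3])
# A one-element horizon gives the same value for start and end.
start, end = positions[0], positions[-1]
print(n_train, start, end)
```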
I also found that the following call resets end to the index of the last training element:
params, exog, exog_oos, start, end, num_oos = self._prepare_prediction(
    params, exog, exog_oos, start, end
)
So after the _prepare_prediction call, the parameters look like this:
start = 13, end = 10. This mismatch (start greater than end) breaks the index arithmetic in the loop above and results in the IndexError.
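Here is a toy reproduction of the failure mode, under a loud assumption: I have not confirmed how statsmodels derives dynamic_start, so the formula end + 1 - start below is only my guess, chosen because it yields -2 for start = 13 and end = 10, which matches the index in the error message. The one-row buffer mirrors the "size 1" in the traceback; the remaining names and values are made up for illustration.

```python
start, end = 13, 10  # values after _prepare_prediction, as described above

# ASSUMPTION: guessed formula, not verified against the statsmodels source.
dynamic_start = end + 1 - start

# Stand-in for the design matrix x, with a single row ("size 1" on axis 0).
x = [[0.0, 0.0]]

error = None
try:
    for i in range(dynamic_start, len(x)):
        # The first value of i is -2, which a one-row buffer cannot address.
        x[i][0] = 1.0
except IndexError as exc:
    error = exc

print(dynamic_start, repr(error))
```

If this guess is right, the loop fails on its very first iteration, before any forecast is computed.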
My question is: Should sktime's wrapper ensure that start never exceeds the index of the first out-of-sample element? For example, with 11 training observations (0-10), should start always be capped at 11 when doing out-of-sample prediction?
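One way the capping idea could look, as a sketch only. cap_prediction_range is a hypothetical helper, not part of sktime or statsmodels: it assumes the wrapper would predict every step from the first out-of-sample position up to end, and then keep only the steps that were actually requested.

```python
def cap_prediction_range(n_train, requested):
    """Return (start, end, keep) for a list of requested 0-based positions.

    Hypothetical helper: start is never larger than n_train, the first
    out-of-sample position, and keep lists the offsets of the requested
    positions within the predicted range start..end.
    """
    first_oos = n_train
    start = min(min(requested), first_oos)
    end = max(requested)
    keep = [pos - start for pos in requested]
    return start, end, keep


start, end, keep = cap_prediction_range(n_train=11, requested=[13])
full_range = list(range(start, end + 1))  # positions the model would predict
selected = [full_range[k] for k in keep]  # positions returned to the caller
print(start, end, selected)
```

With 11 training observations and fh=[3], this predicts positions 11 through 13 and returns only position 13, so the underlying model never sees a start beyond the first out-of-sample element.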