Question: ARDL model #9699

@EmanAbdelhaleem

Hi, I'm not sure if this is the right place to ask, so if it's not, I apologize for the inconvenience.

I was working on sktime using the following code:

```python
from sktime.datasets import load_longley
from sktime.forecasting.ardl import ARDL
from sktime.forecasting.base import ForecastingHorizon
from sktime.split import temporal_train_test_split
import numpy as np

# Load data with exogenous variables
y, X = load_longley()

# Split into train and test using temporal_train_test_split
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=5)
print(y_test)

# Select specific exogenous variables
X_train = X_train[["GNPDEFL", "GNP"]]
X_test = X_test[["GNPDEFL", "GNP"]]
print(X_test)

# Create ARDL model with lags and order for exogenous variables
ardl = ARDL(lags=2, order={"GNPDEFL": 1, "GNP": 2}, trend="c")

# Fit the model
ardl.fit(y=y_train, X=X_train)

# Forecast only the 3rd step after the end of the training data
fh = ForecastingHorizon(np.array([3], dtype="int64"))
y_pred = ardl.predict(fh=fh, X=X_test)

print(y_pred)
```

and it raised this error:

```
Traceback (most recent call last):
  File "D:\Self Study\Open Source\Sktime\sktime\sktime\forecasting\tests\testing_my_changes.py", line 25, in <module>
    y_pred = ardl.predict(fh=fh, X=X_test)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Self Study\Open Source\Sktime\sktime\sktime\forecasting\base\_base.py", line 459, in predict
    y_pred = self._predict(fh=fh, X=X_inner)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Self Study\Open Source\Sktime\sktime\sktime\forecasting\ardl.py", line 437, in _predict
    y_pred = self._fitted_forecaster.predict(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\emana\anaconda3\envs\sktime-dev\Lib\site-packages\statsmodels\base\wrapper.py", line 113, in wrapper
    obj = data.wrap_output(func(results, *args, **kwargs), how)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\emana\anaconda3\envs\sktime-dev\Lib\site-packages\statsmodels\tsa\ardl\model.py", line 1047, in predict
    return self.model.predict(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\emana\anaconda3\envs\sktime-dev\Lib\site-packages\statsmodels\tsa\ardl\model.py", line 852, in predict
    x[i, offset + j] = val
    ~^^^^^^^^^^^^^^^
IndexError: index -2 is out of bounds for axis 0 with size 1
```

So I investigated a little and tried to understand the predict function in statsmodels\tsa\ardl\model.py. I had a hard time following its logic, especially this loop:

```python
for i in range(dynamic_start, fcasts.shape[0]):
    for j, lag in enumerate(self._lags):
        loc = i - lag
        if loc >= dynamic_start:
            val = fcasts[loc]
        else:
            # Actual data
            val = self.endog[start + loc]
        # Debug print I added just before the failing line
        print(f"i={i}, offset={offset}, j={j}, x.shape={x.shape}, loc={loc}")
        x[i, offset + j] = val
    fcasts[i] = x[i] @ params
```

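I'm not sure I understand everything, but I think the loop relies on negative offsets: loc = i - lag, and whenever loc falls before dynamic_start, the value is read from the observed data as self.endog[start + loc]. A negative loc therefore points back from start into the training data, which lets the recursion build every out-of-sample (OOS) forecast up to the end.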
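A simplified sketch of that indexing, to make the negative-offset idea concrete. It is not statsmodels' code: it drops the trend and exogenous columns (so offset is effectively 0), and recursive_forecast and its parameters are hypothetical names. Row i of the buffer stands for absolute position start + i.

```python
import numpy as np


def recursive_forecast(endog, params, lags, start, n_steps, dynamic_start):
    """Toy AR recursion mirroring the loop's indexing (no trend, no exog)."""
    fcasts = np.zeros(n_steps)
    x = np.zeros((n_steps, len(lags)))
    for i in range(dynamic_start, n_steps):
        for j, lag in enumerate(lags):
            loc = i - lag  # row of the needed lag, relative to start
            if loc >= dynamic_start:
                val = fcasts[loc]  # reuse a value already forecast
            else:
                val = endog[start + loc]  # observed data; loc may be negative
            x[i, j] = val
        fcasts[i] = x[i] @ params
    return fcasts


forecasts = recursive_forecast(
    endog=np.arange(1.0, 12.0),  # 11 observations at positions 0..10
    params=np.array([0.5, 0.5]),
    lags=[1, 2],
    start=11,  # first out-of-sample position
    n_steps=3,
    dynamic_start=0,
)
print(forecasts.tolist())  # [10.5, 10.75, 10.625]
```

With start = 11, loc = -1 reads endog[10] and loc = -2 reads endog[9], i.e. the last two observations; once loc reaches dynamic_start, earlier forecasts are reused. This only works while start + loc stays inside the observed data.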

In my example above, the dataset has 16 observations: 11 for training (indices 0-10) and 5 for testing (indices 11-15).

So with fh=[3], both start and end are set to 13, the third step after the last training index 10. (When fh has a single element, sktime passes that element as both start and end.)
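The index arithmetic can be checked in a few lines. This is a minimal sketch assuming zero-based integer positions for the training data; it is plain NumPy, not sktime's internal conversion code.

```python
import numpy as np

n_train = 11                   # training rows occupy positions 0..10
cutoff = n_train - 1           # last training position
fh_relative = np.array([3])    # steps ahead, as in ForecastingHorizon([3])
fh_absolute = cutoff + fh_relative

start, end = int(fh_absolute.min()), int(fh_absolute.max())
print(start, end)  # 13 13
```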

I also found that the following call resets end to the index of the last training observation:

```python
params, exog, exog_oos, start, end, num_oos = self._prepare_prediction(
    params, exog, exog_oos, start, end
)
```

So after _prepare_prediction, start = 13 and end = 10. Because start now lies past end, the index arithmetic in the loop above breaks and raises the IndexError.
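The numbers in the traceback can be reproduced with plain NumPy under an assumption about how predict sizes its buffers. The formulas in oos_layout are a guess (a hypothetical helper, not statsmodels code), chosen because they yield exactly the index -2 and size 1 from the error; the real logic lives in statsmodels/tsa/ardl/model.py.

```python
import numpy as np


def oos_layout(start, end, nobs):
    """Hypothetical reconstruction of how the prediction buffer is sized."""
    num_oos = max(end - (nobs - 1), 0)         # steps past the last observation
    end_clamped = min(end, nobs - 1)           # end reset to last training row
    n_rows = end_clamped - start + 1 + num_oos  # rows in fcasts and x
    dynamic_start = end_clamped + 1 - start    # first row filled recursively
    return end_clamped, num_oos, n_rows, dynamic_start


end_clamped, num_oos, n_rows, dynamic_start = oos_layout(start=13, end=13, nobs=11)
print(end_clamped, num_oos, n_rows, dynamic_start)  # 10 3 1 -2

x = np.zeros((n_rows, 3))  # column count is arbitrary here
message = None
try:
    for i in range(dynamic_start, n_rows):  # i starts at -2
        x[i, 0] = 1.0
except IndexError as err:
    message = str(err)
print(message)  # index -2 is out of bounds for axis 0 with size 1
```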

My question is: should sktime's wrapper ensure that start never exceeds the index of the first out-of-sample observation? For example, with 11 training observations (indices 0-10), should start always be capped at 11 for out-of-sample prediction?
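A minimal sketch of the capping idea, assuming absolute integer positions: request the contiguous out-of-sample range from the first OOS position (nobs) through the largest fh value, then keep only the rows for the requested steps. capped_request is a hypothetical helper, not part of sktime or statsmodels, and the prediction values below are placeholders.

```python
import numpy as np


def capped_request(fh_absolute, nobs):
    """Hypothetical helper: contiguous OOS request plus the rows to keep."""
    fh_absolute = np.asarray(fh_absolute)
    start = nobs                    # first out-of-sample position
    end = int(fh_absolute.max())    # furthest requested position
    keep = fh_absolute - start      # row offsets within the contiguous result
    return start, end, keep


start, end, keep = capped_request([13], nobs=11)
print(start, end, keep.tolist())  # 11 13 [2]

# Placeholder predictions for positions 11, 12, 13
contiguous = np.array([0.1, 0.2, 0.3])
print(contiguous[keep].tolist())  # [0.3]
```

One consequence of this design: the wrapped model would then need exogenous rows for every step in the contiguous range (positions 11 through 13 here), not only for the requested step; X_test in the example has 5 rows, which covers that.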
