Question: ARDL model


Hi, I'm not sure if this is the right place to ask, so if it's not, I apologize for the inconvenience.

I was trying out sktime's `ARDL` forecaster with the following code:
```python
from sktime.datasets import load_longley
from sktime.forecasting.ardl import ARDL
from sktime.forecasting.base import ForecastingHorizon
from sktime.split import temporal_train_test_split
import numpy as np

# Load data with exogenous variables
y, X = load_longley()

# Split into train and test using temporal_train_test_split
y_train, y_test, X_train, X_test = temporal_train_test_split(y, X, test_size=5)
print(y_test)

# Select specific exogenous variables
X_train = X_train[["GNPDEFL", "GNP"]]
X_test = X_test[["GNPDEFL", "GNP"]]
print(X_test)

# Create ARDL model with lags and order for exogenous variables
ardl = ARDL(lags=2, order={"GNPDEFL": 1, "GNP": 2}, trend="c")

# Fit the model
ardl.fit(y=y_train, X=X_train)

# Forecast only the 3rd step after the end of the training data
fh = ForecastingHorizon(np.array([3], dtype="int64"))
y_pred = ardl.predict(fh=fh, X=X_test)

print(y_pred)
```
and it raised the following error:
```
Traceback (most recent call last):
  File "D:\Self Study\Open Source\Sktime\sktime\sktime\forecasting\tests\testing_my_changes.py", line 25, in <module>
    y_pred = ardl.predict(fh=fh, X=X_test)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Self Study\Open Source\Sktime\sktime\sktime\forecasting\base\_base.py", line 459, in predict
    y_pred = self._predict(fh=fh, X=X_inner)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Self Study\Open Source\Sktime\sktime\sktime\forecasting\ardl.py", line 437, in _predict
    y_pred = self._fitted_forecaster.predict(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\emana\anaconda3\envs\sktime-dev\Lib\site-packages\statsmodels\base\wrapper.py", line 113, in wrapper
    obj = data.wrap_output(func(results, *args, **kwargs), how)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\emana\anaconda3\envs\sktime-dev\Lib\site-packages\statsmodels\tsa\ardl\model.py", line 1047, in predict
    return self.model.predict(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\emana\anaconda3\envs\sktime-dev\Lib\site-packages\statsmodels\tsa\ardl\model.py", line 852, in predict
    x[i, offset + j] = val
    ~^^^^^^^^^^^^^^^
IndexError: index -2 is out of bounds for axis 0 with size 1
```
So I investigated a little and tried to understand the `predict` method in `statsmodels\tsa\ardl\model.py`. I had a hard time following its logic, especially the following loop:
```python
for i in range(dynamic_start, fcasts.shape[0]):
    for j, lag in enumerate(self._lags):
        loc = i - lag
        if loc >= dynamic_start:
            # An earlier forecast
            val = fcasts[loc]
        else:
            # Actual data
            val = self.endog[start + loc]
        # Debug print I added just before the failing line
        print(f"i={i}, offset={offset}, j={j}, x.shape={x.shape}, loc={loc}")
        x[i, offset + j] = val
    fcasts[i] = x[i] @ params
```

I am not sure I understand everything correctly, but I think the loop relies on negative indices (`loc = i - lag` can be negative) to compute all the out-of-sample (OOS) predictions up to the end of the horizon. There seems to be some clever index arithmetic here that I don't fully follow.
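If I read it correctly, a negative `loc` is not a problem in the normal case: `loc` is an offset relative to `start`, so `self.endog[start + loc]` picks the last actual observations before the forecast period. Below is a rough, standalone re-creation of that lookup for a pure out-of-sample request on 11 training observations with `lags=[1, 2]`. `lag_sources` is a hypothetical helper, and the formulas for `dynamic_start` and the number of rows are my assumptions, not code copied from statsmodels.

```python
# Simplified re-creation of the lag lookup in the statsmodels loop above.
# Assumptions (mine, not copied from statsmodels): dynamic_start = end + 1 - start
# and the number of rows is (end + 1 - start) + num_oos.


def lag_sources(start, end, num_oos, lags):
    """Return (i, lag, source) tuples showing where each lagged value is read from."""
    dynamic_start = end + 1 - start       # 0 for a purely out-of-sample request
    n_rows = (end + 1 - start) + num_oos  # in-sample rows + out-of-sample rows
    sources = []
    for i in range(dynamic_start, n_rows):
        for lag in lags:
            loc = i - lag  # may be negative: it is measured relative to `start`
            if loc >= dynamic_start:
                sources.append((i, lag, f"fcasts[{loc}]"))  # an earlier forecast
            else:
                sources.append((i, lag, f"endog[{start + loc}]"))  # actual data
    return sources


# Three steps right after 11 training observations (start=11, end capped to 10).
for row in lag_sources(start=11, end=10, num_oos=3, lags=[1, 2]):
    print(row)
# (0, 1, 'endog[10]')
# (0, 2, 'endog[9]')
# (1, 1, 'fcasts[0]')
# (1, 2, 'endog[10]')
# (2, 1, 'fcasts[1]')
# (2, 2, 'fcasts[0]')
```

In this sketch the first forecast uses only actual data (positions 10 and 9), and later rows gradually switch to earlier forecasts.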

In my code example above, the dataset has 16 observations in total: 11 for training (positions 0-10) and 5 for testing (positions 11-15).

So with `fh=[3]`, both `start` and `end` are set to `13`, i.e. 3 steps after the last training position (10). When the horizon has a single element, sktime passes that same absolute position as both `start` and `end`.
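The conversion from the relative horizon to positions can be sketched as follows. `fh_to_start_end` is a hypothetical helper, not sktime's API, and it assumes the wrapper uses the smallest and largest absolute positions as `start` and `end`. It shows why `fh=[3]` starts the prediction two positions past the first out-of-sample position, while `fh=[1, 2, 3]` starts right after the training data.

```python
def fh_to_start_end(fh, n_train):
    """Hypothetical helper: relative steps -> zero-based (start, end) positions.

    Assumes the cutoff is the position of the last training observation and
    that start/end are the smallest and largest absolute positions requested.
    """
    cutoff = n_train - 1  # last training position (10 here)
    positions = sorted(cutoff + h for h in fh)
    return positions[0], positions[-1]


print(fh_to_start_end([3], n_train=11))        # (13, 13)
print(fh_to_start_end([1, 2, 3], n_train=11))  # (11, 13)
```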

I also found that the following call moves `end` back to the index of the last training observation:
```python
params, exog, exog_oos, start, end, num_oos = self._prepare_prediction(
    params, exog, exog_oos, start, end
)
```
So after `_prepare_prediction`, we have `start = 13` and `end = 10`. Now `start` lies beyond `end`, and this mismatch breaks the index arithmetic in the loop above, raising the `IndexError`.
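Putting the numbers together, here is a rough standalone simulation of what I think happens. `cap_end` and `simulate_loop` are hypothetical stand-ins, not statsmodels functions: `cap_end` mimics my reading of how `_prepare_prediction` moves `end` back to the last training position and counts the remaining out-of-sample steps, and `simulate_loop` assumes `dynamic_start = end + 1 - start` and `(end + 1 - start) + num_oos` rows. I chose that row count because it reproduces the exact message from the traceback; the real statsmodels code may compute it differently.

```python
import numpy as np


def cap_end(start, end, nobs):
    """Hypothetical stand-in: move `end` back into the sample, count OOS steps."""
    num_oos = max(end - (nobs - 1), 0)  # steps beyond the last observation
    end -= num_oos                      # end now points at the last in-sample row
    return start, end, num_oos


def simulate_loop(start, end, num_oos, n_cols=3):
    """Hypothetical stand-in for the prediction loop, using assumed bounds."""
    dynamic_start = end + 1 - start
    n_rows = (end + 1 - start) + num_oos
    x = np.zeros((max(n_rows, 0), n_cols))
    for i in range(dynamic_start, n_rows):
        x[i, 0] = 1.0  # stands in for x[i, offset + j] = val
    return x


print(cap_end(11, 13, nobs=11))  # fh=[1, 2, 3] -> (11, 10, 3)
print(cap_end(13, 13, nobs=11))  # fh=[3]       -> (13, 10, 3)

simulate_loop(*cap_end(11, 13, nobs=11))  # dynamic_start = 0: runs fine
try:
    simulate_loop(*cap_end(13, 13, nobs=11))  # dynamic_start = -2, only 1 row
except IndexError as err:
    print(err)  # index -2 is out of bounds for axis 0 with size 1
```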

My question is: should sktime's wrapper ensure that `start` never exceeds the index of the first out-of-sample observation? For example, with 11 training observations (positions 0-10), should `start` always be capped at 11 for out-of-sample prediction?
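To make the question concrete, here is a sketch of the capping I have in mind, under my own assumptions. `capped_request` is a hypothetical helper (not existing sktime or statsmodels code): it moves `start` back to the first out-of-sample position, so the underlying model would forecast the contiguous block 11 to 13, and it records which entries of that block correspond to the requested steps. In-sample requests are left unchanged.

```python
import numpy as np


def capped_request(fh_positions, nobs):
    """Hypothetical helper: start out-of-sample requests at the first OOS position.

    Returns (start, end, keep), where `keep` holds the offsets of the requested
    positions inside the contiguous block start..end that would be predicted.
    """
    start, end = min(fh_positions), max(fh_positions)
    if start > nobs:  # gap between the training data and the first requested step
        start = nobs  # first out-of-sample position
    keep = [p - start for p in fh_positions]
    return start, end, keep


start, end, keep = capped_request([13], nobs=11)
print(start, end, keep)  # 11 13 [2]

# Predict the whole block 11..13, then keep only the requested step.
block = np.array([0.1, 0.2, 0.3])  # dummy forecasts for positions 11, 12, 13
print(block[keep])                 # forecast for position 13 only
```

Forecasting the intermediate steps costs little here, since the recursive loop has to compute them anyway to produce the third step.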


Question: ARDL model #9699
