boiwsa
is an R package for seasonal adjustment and forecasting of
weekly data. It provides a simple, easy-to-use interface for calculating
the seasonally adjusted estimates, as well as a number of diagnostic
tools for evaluating the quality of the adjustments. In this tutorial we
illustrate its functionality in seasonal adjustment and forecasting of
the weekly data.
The seasonal adjustment procedure approach aligns closely with the locally-weighted least squares procedure introduced by Cleveland et al. (2014) albeit with several adjustments. Firstly, instead of relying on differencing to detrend the data, we opt for a more explicit approach by directly estimating the trend through smoothing techniques. Secondly, we incorporate a variation of Discount weighted regression (Harrison and Johnston, 1984) to enable the seasonal component to evolve dynamically over time.
We consider the following decomposition model of the observed series
where
where
Like the X-11 method (Ladiray and Quenneville, 2001), the boiwsa
procedure uses an iterative principle to estimate the various
components. The seasonal adjustment algorithm comprises eight steps,
which are documented below:
-
Step 1: Estimation of trend (
$T_{t}^{(1)}$ ) usingstats::supsmu()
. -
Step 2: Estimation of the Seasonal-Irregular component:
-
Step 2*: Searching for additive outliers using the method proposed by Findley et al. (1998)
-
Step 2**: Identifying the optimal number of trigonometric variables
-
Step 3: Calculation of seasonal factors, along with other potential factors such as
$H_{t}$ or$O_{t}$ , is done through DWR on the seasonal-irregular component extracted in Step 2. In this application, the discounting rate decays over the years. For each year$t$ and the observed year$\tau$ , a geometrically decaying weight function is represented as:$w_{t}=r^{|t-\tau|}$ , where$r \in (0,1]$ . Several important points are worth mentioning. First, when$r=1$ , the method simplifies to ordinary least squares regression with constant seasonality. On the contrary, smaller values of$r$ permit a more rapid rate of change in the seasonal component. However, it is advised against setting it below 0.5 to prevent overfitting. In addition, the choice of$r$ affects the strength of revisions in the seasonally adjusted data, with higher values of$r$ leading to potentially stronger revisions. Second, our methodology differs from the conventional one-way discounting, enabling the inclusion of future observations in the computation of seasonal factors. This approach circumvents the limitations of the forecasting methods discussed in Bandara et al. (2024). Finally, the choice of year-based discounting is driven by the fact that in traditional discount-weighted regression, even with a conservative choice of$r=0.95$ , in weekly data, observations separated by more than 2 years would carry nearly negligible weight. Therefore, the use of year-based discounting prevents an overly rapid decay which may potentially lead to unstable estimates of the seasonal component. -
Step 4: Estimation of trend (
$T_{t}^{(2)}$ ) from seasonally and outlier adjusted series usingstats::supsmu()
function (R Core Team, 2013) -
Step 5: Estimation of the Seasonal-Irregular component:
$$y_{t}-T_{t}^{(2)}=S_{t}+H_{t}+O_{t}+I_{t}$$ -
Step 6: Computing the final seasonal factors (and possibly other factors such as
$H_{t}$ or$O_{t}$ ) using discount weighted regression, as in step 3. -
Step 7: Estimation of the final seasonally adjusted series:
$$y_{t}-S_{t}-H_{t}$$ -
Step 8: Computing final trend (
$T_{t}^{(3)}$ ) estimate from seasonally and outlier adjusted series usingstats::supsmu()
.
To install boiwsa, you can use devtools:
# install.packages("devtools")
devtools::install_github("timginker/boiwsa")
Alternatively, you can clone the repository and install the package from source:
git clone https://github.com/timginker/boiwsa.git
cd boiwsa
R CMD INSTALL .
Using boiwsa
is simple. First, load the boiwsa
package:
library(boiwsa)
Next, load your time series data into a data frame object. Here is an
example that is based on the gasoline
data from the US Energy
Information Administration that we copied from the from the fpp2
package:
data("gasoline.data")
plot(gasoline.data$date,
gasoline.data$y,
type="l"
,xlab="Year",
ylab=" ",
main="Weekly US gasoline production")
Once you have your data loaded, you can use the boiwsa
function to
perform weekly seasonal adjustment:
res <- boiwsa(
x = gasoline.data$y,
dates = gasoline.data$date
)
In general, the procedure can be applied with minimum interventions and
requires only the series to be adjusted (x
argument) and the
associated dates (dates
argument) provided in a date format. Unless
specified otherwise (i.e., my.k_l = NULL
), the procedure automatically
identifies the best number of trigonometric variables to model the
intra-yearly (ic
option. Like
other software, there are three options: “aic”, “aicc”, and “bic”. The
weighting decay rate is specified by r
. By default
In addition, the procedure automatically searches for additive outliers
(AO) using the method described in Appendix C of Findley et al. (1998).
To disable the automatic AO search, set auto.ao.search = F
. To add
user-defined AOs, use the ao.list
option. As suggested by Findley et
al. (1998), the out.threshold
argument.
The boiwsa
function returns an S3 class object containing the results.
The seasonally adjusted series is stored in a vector called sa
. The
estimated seasonal factors are stored as sf
. In addition, the user can
see the number of trigonometric terms chosen in automatic search
(my.k_l
) and the position of additive outliers (ao.list
) found by
the automatic routine.
After the seasonal adjustment, we can plot the adjusted data to visualize the seasonal pattern:
plot(res)
To assess the quality of the adjustment, we can plot the autoregressive spectrum of the original and seasonally adjusted data, as illustrated in the code below:
plot_spec(res)
It is evident that the series originally had a single intra-yearly seasonal cycle, but this component was completely removed by the procedure.
We can also inspect the output to check if the number of trigonometric terms chosen by the automatic procedure matches our visual findings:
print(res)
#>
#> number of yearly cycle variables: 12
#> number of monthly cycle variables: 0
#> list of additive outliers: 1998-03-28
As can be seen, the number of yearly terms,
Here, we consider the weekly number of initial registrations at the Israeli Employment Service. Registration and reporting at the Employment Service are mandatory prerequisites for individuals seeking to receive an unemployment benefit. Therefore, applicants are expected to register promptly after their employment has been terminated. Given that most employment contracts conclude toward the end of the month, an increased number of applications is anticipated at the beginning of the month, leading to an intra-monthly seasonal pattern. Additionally, as can be seen in the Figure below, on an annual basis, three distinct peaks are observed, with the final one occurring in August. This peak is closely tied to seasonal workers, leading to the creation of an intra-yearly cycle.
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.4
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ggplot() +
geom_line(aes(
x = lbm$date,
y = lbm$IES_IN_W_ADJ
), color = "royalblue") +
theme_bw() +
ylab("Number of claims per week") +
xlab("Year")
Furthermore, each year, there are two weeks in which the activity
plunges to nearly zero due to the existence of two moving holidays
associated with Rosh Hashanah and Pesach. Moreover, a working day effect
is expected, which leads to a reduced number of applications in weeks
with fewer working days. These effects are captured and modeled through
additional variables generated by the dedicated functions in boiwsa
.
To generate a working day variable, we use the boiwsa::simple_td
function, designed to aggregate the count of full working days within a
week and normalize it. This function requires two parameters: the data
dates and a data.frame
object containing information about working
days. The data.frame
should be in a daily frequency and contain two
columns: “date” and “WORKING_DAY_PART”. For a complete working day, the
“WORKING_DAY_PART” column should be assigned a value of 1, for a half
working day 0.5, and for a holiday, the value should be set to 0.
Moving holiday variables can be created using the boiwsa::genhol
function. These variables are computed using the Easter formula in Table
2 of Findley et al. (1998), with the calendar centering to avoid bias,
as indicated in the documentation. In the present example, the impact of
each holiday is concentrated within a single week, resulting in a
noticeable drop and subsequent increase in the number of registrations
during the following week. To account for this effect, we employ dummy
variables that are globally centered. These dummy variables are created
using a custom function - boiwsa::my_rosh
, which is created for this
illustrative scenario.
The code below illustrates the entire process based on the
boiwsa::lbm dataset
: creation of working day adjustment variables
using the boiwsa::simple_td
function; creation of moving holiday
variables using the dedicated functions, and adding the combined input
into the boiwsa::boiwsa
function.
# creating an input for simple_td
dates_il %>%
select(DATE_VALUE, ISR_WORKING_DAY_PART) %>%
`colnames<-`(c("date", "WORKING_DAY_PART")) %>%
mutate(date = as.Date(date)) -> df.td
# creating a matrix with a working day variable
td <- simple_td(dates = lbm$date, df.td = df.td)
# generating the Rosh Hashanah and Pesach moving holiday variables
rosh <- my_rosh(
dates = lbm$date,
holiday.dates = holiday_dates_il$rosh
)
# renaming (make sure that all the variables in H have distinct names)
colnames(rosh) <- paste0("rosh", colnames(rosh))
pesach <- my_rosh(
dates = lbm$date,
holiday.dates = holiday_dates_il$pesah,
start = 3, end = -1
)
colnames(pesach) <- paste0("pesach", colnames(pesach))
# combining the prior adjustment variables in a single matrix
H <- as.matrix(cbind(rosh[, -1], pesach[, -1], td[, -1]))
# running seasonal adjustment routine
res <- boiwsa(
x = lbm$IES_IN_W_ADJ,
dates = lbm$date,
H = H,
out.threshold = 5
)
Subsequently, we can visually examine the results of the procedure:
plot(res)
As we can see in the plot, the procedure has successfully eliminated the annual and monthly seasonal cycles, along with the influences of moving holidays.
Following a thorough visual examination of the seasonally adjusted data, we can now move forward with the spectrum diagnostics. As illustrated in the Figure below, corroborating our initial analysis of potential underlying seasonal patterns, it becomes evident that the data has two distinct seasonal cycles. Additionally, it is noteworthy that our procedure successfully removed the corresponding peaks, thereby highlighting its effectiveness.
plot_spec(res)
It is also possible to forecast weekly data using the predict method for
boiwsa
. The prediction is based on the forecast::auto.arima
model
fitted to the seasonally and outlier-adjusted data, which is then
combined with the seasonal factor estimates from boiwsa
.
The code below illustrates the entire process:
x <- boiwsa::lbm$IES_IN_W_ADJ
dates <- boiwsa::lbm$date
# train-test split (using 90% of observations for train)
n_train <- round(length(x) * 0.9)
n_test <- (length(x) - n_train)
h_train <- H[1:n_train, ]
h_test <- tail(H, n_test)
x_train <- x[1:n_train]
x_test <- tail(x, n_test)
dates_train <- dates[1:n_train]
dates_test <- tail(dates, n_test)
fit <- boiwsa(
x = x_train,
dates = dates_train,
H = h_train
)
fct <- predict(fit,
n.ahead = n_test,
new_H = h_test
)
# Visualizing predictions
plot(dates_test,
x_test,type="l",
ylab="Number of claims per week",
xlab="Date")
lines(dates_test,
fct$forecast$mean,
col="red")
legend(
"topright",
legend = c("Original", "Predicted"),
lwd = c(2, 1),
col = c("black", "red"),
bty = "n"
)
Bandara, K., Hyndman, R. J. and C. Bergmeir (2024). MSTL: A seasonal-trend decomposition algorithm for time series with multiple seasonal patterns. International Journal of Operational Research, 2024. In press.
Cleveland, W.P., Evans, T.D. and S. Scott (2014). Weekly Seasonal Adjustment-A Locally-weighted Regression Approach (No. 473). Bureau of Labor Statistics.
Findley, D.F., Monsell, B.C., Bell, W.R., Otto, M.C. and B.C Chen (1998). New capabilities and methods of the X-12-ARIMA seasonal-adjustment program. Journal of Business & Economic Statistics, 16(2), pp.127-152.
Harrison, P. J. and F. R. Johnston (1984). Discount weighted regression. Journal of the Operational Research Society 35(10), 923–932.
Hyndman, R. (2023). fpp2: Data for “Forecasting: Principles and Practice” (2nd Edition). R package version 2.5.
Harvey, A. C. (1990). Forecasting, structural time series models and the Kalman filter. Cambridge University Press.
Ladiray, D. and B. Quenneville (2001). Seasonal adjustment with the X-11 method.
Proietti, T. and D. J. Pedregal (2023). Seasonality in High Frequency Time Series. Econometrics and Statistics, 27: 62–82, 2023.
R Core Team (2013). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
The views expressed here are solely of the author and do not necessarily represent the views of the Bank of Israel.
Please note that boiwsa
is still under development and may contain
bugs or other issues that have not yet been resolved. While we have made
every effort to ensure that the package is functional and reliable, we
cannot guarantee its performance in all situations.
We strongly advise that you regularly check for updates and install any new versions that become available, as these may contain important bug fixes and other improvements. By using this package, you acknowledge and accept that it is provided on an “as is” basis, and that we make no warranties or representations regarding its suitability for your specific needs or purposes.