
Journal of Econometrics 53 (1992) 165-188.

North-Holland

Maximum likelihood estimation of stationary univariate fractionally integrated time series models*

Fallaw Sowell
Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA

Received August 1988, final version received April 1991

To estimate the parameters of a stationary univariate fractionally integrated time series, the
unconditional exact likelihood function is derived. This allows the simultaneous estimation of all
the parameters of the model by exact maximum likelihood. Issues involved in obtaining
maximum likelihood estimates are discussed. Particular attention is given to efficient procedures
to evaluate the likelihood function, obtaining starting values, and the small sample properties of
the estimators. Limitations of previous estimation procedures are also discussed.

1. Introduction

A time series implication of several economic and financial models is the absence of strong dependence between distant observations. Because nonzero values of the fractional differencing parameter imply this strong dependence, several researchers have tested these models by estimating fractional differencing parameters. The series that have been studied include: real gross national product [Diebold and Rudebusch (1989), Haubrich and Lo (1989)], interest rates [Shea (1990)], consumer and wholesale price indices [Geweke and Porter-Hudak (1983)], stock market returns [Aydogan and Booth (1988), Greene and Fielitz (1977)], stock market prices [Lo (1989)], futures prices for soybeans, soybean oil, and soybean meal [Helms et al. (1984)], and exchange rates [Cheung (1989), Booth et al. (1982)]. The wide use of the fractional

*This is a generalization of Chapter 3 of the author’s Ph.D. thesis, which was supported by the
Alfred P. Sloan Foundation through a Doctoral Dissertation Fellowship. The additional work
was supported by a Faculty Development Grant from Carnegie Mellon University. The author
thanks John Geweke and two anonymous referees for helpful suggestions and Ken Murphy for
computing assistance.

0304-4076/92/$05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved


166 F. Sowell, Maximum likelihood estimation

differencing model in empirical work highlights the importance of efficient


estimation procedures for this model.
The importance of efficient estimation procedures has also been highlighted by recent developments in both econometric and economic theory. In econometric theory, both Gourieroux, Maurel, and Monfort (1987) and Sowell (1990a) have shown that the asymptotic distributions of regression estimators are very different for series that follow fractionally differenced models. In economic theory, Haubrich and Lo (1989), using a technique developed in Granger (1980), present a theoretical macroeconomic model which implies that aggregate time series should be fractionally integrated. Both these lines of research highlight the need for efficient procedures to estimate fractionally integrated models.
The most efficient estimation procedure for fractionally differenced models is presumably maximum likelihood. Unfortunately, the perceived complexities associated with deriving the likelihood function and evaluating it repeatedly [see McLeod and Hipel (1978), Hosking (1984), and Brockwell and Davis (1987)] have meant that none of the empirical work cited above has used maximum likelihood. In this paper, this perception is shown to be unfounded.
The unconditional exact likelihood function for a normally distributed sta-
tionary fractionally integrated time series is derived. Recursive procedures
are noted which allow efficient evaluation of the likelihood function.
The next section is an introduction to fractionally integrated models. In
section 3 previous estimation procedures are reviewed and their limitations
are noted. The likelihood function is derived in section 4. The fifth section
presents issues involved in obtaining maximum likelihood estimates. Particu-
lar attention is given to: efficient procedures to evaluate the likelihood
function, obtaining starting values for the parameters, and the small sample
properties of the estimators. The final section summarizes the paper and
indicates directions for future research.

2. An introduction to fractional models

For $d \in (-\tfrac{1}{2}, \tfrac{1}{2}]$, the fractional difference operator $(1-L)^d$ is defined by the Binomial Theorem as the series

$$(1-L)^d = \sum_{j=0}^{\infty} \binom{d}{j} (-1)^j L^j, \qquad (1)$$

where the sequence of coefficients

$$\binom{d}{j} (-1)^j = \frac{\Gamma(d+1)(-1)^j}{\Gamma(d-j+1)\,\Gamma(j+1)} = \frac{\Gamma(-d+j)}{\Gamma(-d)\,\Gamma(j+1)}$$

is square summable. For values of $d > \tfrac{1}{2}$, $(1-L)^d$ can be defined by combining integer differencing and eq. (1).
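The coefficient sequence lends itself to a simple recursion, since successive Gamma ratios telescope: $\pi_j = \pi_{j-1}(j-1-d)/j$. A minimal sketch in Python (the function names are illustrative, not from the paper):

```python
from math import gamma

def frac_diff_coeffs(d, n):
    """First n coefficients pi_j of (1 - L)^d = sum_j pi_j L^j, via the
    recursion pi_0 = 1, pi_j = pi_{j-1} * (j - 1 - d) / j, equivalent to
    the Gamma-function ratio Gamma(-d + j) / (Gamma(-d) Gamma(j + 1))."""
    pi = [1.0]
    for j in range(1, n):
        pi.append(pi[-1] * (j - 1 - d) / j)
    return pi

def frac_diff_coeff_closed(d, j):
    """Same coefficient from the closed form in the text (d non-integer)."""
    return gamma(-d + j) / (gamma(-d) * gamma(j + 1))
```

The recursion avoids evaluating large Gamma values directly and is how such coefficients are usually generated in practice.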
A time series $z_t$ is said to follow a fractionally differenced model if, after applying the operator $(1-L)^d$, it follows an ARMA(p, q) model, where p and q are finite nonnegative integers. The time series $z_t$ is said to be integrated of order d, which is denoted $z_t \sim I(d)$. Extending the standard notation, $z_t \sim I(d)$ can also be written $z_t \sim \text{ARIMA}(p, d, q)$. The general fractionally integrated time series model will be written as

$$\Phi(L)(1-L)^d z_t = \Theta(L)\varepsilon_t, \qquad (2)$$

where

$$\Phi(L) = 1 + \phi_1 L + \phi_2 L^2 + \cdots + \phi_p L^p$$

and

$$\Theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$$

are polynomials in the lag operator L. Attention will be restricted to the class of models which satisfy the following assumptions:

Assumption 1. $\Phi(L)$ is of order less than or equal to p, $\Theta(L)$ is of order less than or equal to q, the roots of $\Theta(x)$ and $\Phi(x)$ are outside the unit circle, and $\varepsilon_t \sim \text{IID } N(0, \sigma^2)$.

Assumption 2. $d < \tfrac{1}{2}$.

Assumption 3. The roots of $\Phi(x)$ are simple.

Assumption 1 is standard and requires no explanation. As noted below, Assumption 2 is required to restrict attention to stationary models. Assumption 3 is required for the procedure used to derive the likelihood function. The practical restrictions implied by Assumption 3 are investigated in section 5.4.
If Assumption 1 holds, the distinguishing characteristics of the fractional $z_t$ time series are:

(1) $z_t$ is stationary if $d < \tfrac{1}{2}$.

(2) $z_t$ possesses an invertible moving average representation if $d > -\tfrac{1}{2}$.

(3) If the spectral density is denoted by $f_z(\lambda)$, then as $\lambda \to 0$, $f_z(\lambda) \sim K\lambda^{-2d}$, where $K$ is a constant independent of $d$.

(4) If the autocovariances are denoted by $\gamma(s) = E z_t z_{t-s}$, then as $s \to \infty$, $\gamma(s) \sim k s^{2d-1}$, where $k$ is a function of $d$.

These results are proven in Granger and Joyeux (1980) or Hosking (1981), and the explicit forms of $f_z(\lambda)$ and $\gamma(s)$ are derived below.
The most useful feature of fractionally integrated time series is their long-range dependence. The dependence between observations produced by the ARMA structure of the model decays at a geometric rate, while the dependence produced by the fractional differencing parameter decays at the much slower hyperbolic rate. Hence the long-range dependence between observations is eventually determined only by the fractional differencing parameter. These characteristics can be seen in the shapes of the spectral density and the autocovariance function. If $0 < d < \tfrac{1}{2}$, the process eventually exhibits strong positive dependence between distant observations. This is noted in the frequency domain by the spectral density increasing to an infinite value as $\lambda$ approaches zero. In the time domain, this long-range positive dependence is indicated by the fact that the autocovariances are eventually positive and decline slowly. If $-\tfrac{1}{2} < d < 0$, the process eventually exhibits negative dependence between distant observations. In the frequency domain, this is indicated by the decline of the spectral density to zero as $\lambda$ approaches zero. In the time domain, this long-range negative dependence is indicated by the fact that the autocovariances are eventually negative and decline slowly.
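The geometric-versus-hyperbolic contrast can be made concrete numerically. The sketch below (Python; the function name is hypothetical) builds normalized autocovariances of pure fractional noise via the lag recursion $\gamma_k = \gamma_{k-1}(k-1+d)/(k-d)$, which follows from the Gamma-function form derived in section 4, next to the geometric sequence $\phi^k$ of an AR(1):

```python
def autocov_sequences(d, phi, nlags):
    """Compare decay rates, both sequences normalized so gamma_0 = 1:
    fractional noise decays hyperbolically (~ k^{2d-1}), via the recursion
    gamma_k = gamma_{k-1} (k - 1 + d) / (k - d); an AR(1) decays
    geometrically, gamma_k = phi^k."""
    frac, ar = [1.0], [1.0]
    for k in range(1, nlags + 1):
        frac.append(frac[-1] * (k - 1 + d) / (k - d))
        ar.append(ar[-1] * phi)
    return frac, ar
```

At long lags the fractional sequence dominates any geometric one: its adjacent-lag ratio tends to 1, while the AR(1) ratio stays fixed at $\phi$.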

3. Previous estimation techniques

Several estimation procedures for fractionally integrated time series have been proposed. Unfortunately, the AR and MA parameters are typically not accurately estimated. This problem is avoided in this paper by estimating all the parameters by maximum likelihood. Before deriving the likelihood function, the previous estimation techniques will be reviewed and their inability to estimate the general model is noted.
There are two types of estimation procedures: two-step procedures and one-step procedures. The earliest were the two-step procedures. In the first step, the fractional differencing parameter is estimated, and in the second step the other parameters of the model are estimated. The two-step procedures differ only in their first step. In the second step, the observed series is transformed and the remaining parameters of the model are estimated by standard time series procedures applied to the transformed series.

There are four ways of performing the first step of estimating the fractional differencing parameter. The first is the estimated rescaled range exponent, H [see McLeod and Hipel (1978) for a survey of this statistic]. The differencing parameter can be identified as $\hat d = \hat H - \tfrac{1}{2}$. A generalization, which is designed to be robust to 'short-term dependence', is given in Lo (1989). Two alternative estimation procedures exploit the special shape of the spectral density of a fractionally integrated time series. In Janacek (1982) the differencing parameter estimate is obtained by numerical integration of the log periodogram. Geweke and Porter-Hudak (1983) exploit the behavior of the spectral density around zero. Using frequencies near zero, a univariate regression of the log periodogram on the log of the frequencies is performed. The slope estimate is an estimate of the differencing parameter. The fourth procedure, presented in Shea (1990), is maximum likelihood estimation of the regression model used in Geweke and Porter-Hudak (1983).
These procedures only estimate the differencing parameter. In the second step, the estimated differencing parameter is used to transform the observed series into a series that presumably follows an ARMA(p, q) model. If the model of the observed series is $\Phi(L)(1-L)^d z_t = \Theta(L)\varepsilon_t$, the estimated differencing parameter $\hat d$ is used to try and obtain the series $u_t = (1-L)^d z_t$, which would follow the model $\Phi(L)u_t = \Theta(L)\varepsilon_t$.
A problem is that $(1-L)^d$ is defined as an infinite lag polynomial, hence the series $u_t = (1-L)^d z_t$ can only be obtained when there exists an infinite realization of $z_t$. Given a finite sample, two different procedures have been proposed to approximate $u_t$; one in the time domain and the other in the frequency domain. In the time domain, the Binomial Theorem representation of $(1-L)^d$ is used to transform the observed series. If $u_t = (1-L)^d z_t$, then

$$u_t = \sum_{j=0}^{\infty} \frac{\Gamma(-d+j)}{\Gamma(-d)\,\Gamma(j+1)}\, z_{t-j}.$$

This suggests approximating $u_t$ by using $\hat d$ in the truncated polynomial, i.e., use $\hat d$ for $d$ in the above expression and set $z_{t-j} = 0$ for $t-j$ outside of the sample.
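A sketch of this truncated transformation (Python; illustrative, not the paper's FORTRAN code):

```python
def truncated_frac_diff(z, d):
    """Approximate u_t = (1 - L)^d z_t from a finite sample by the truncated
    binomial expansion, treating pre-sample z's as zero:
        u_t = sum_{j=0}^{t-1} pi_j z_{t-j},
    with pi_j = Gamma(-d + j) / (Gamma(-d) Gamma(j + 1)) built recursively."""
    T = len(z)
    pi = [1.0]
    for j in range(1, T):
        pi.append(pi[-1] * (j - 1 - d) / j)
    # z is 0-indexed here; index t covers the whole available sample
    return [sum(pi[j] * z[t - j] for j in range(t + 1)) for t in range(T)]
```

A quick sanity check: with d = 1 (outside the stationary range, but valid for the operator) the coefficients collapse to 1, -1, 0, 0, ..., so the routine reduces to ordinary first differencing, and d = 0 leaves the series unchanged.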
The alternative transformation is the frequency domain approach presented in Geweke and Porter-Hudak (1983). The approach is to calculate the Fourier transform of the observed series, multiply by the Fourier transform of the fractional differencing operator based on $\hat d$, and then calculate the inverse Fourier transform. It was hoped that the effect of this transformation is a series that follows the ARMA(p, q) model of the $u_t$ process. The actual effect is identical to the time domain truncated polynomial using $\hat d$.
170 F. Sowell, Maximum likelihood estimation

The equivalence of the frequency transform and the truncated time transformation applies to general linear transformations. To see this, consider an arbitrary linear operator $\sum_{j=0}^{\infty}\delta_j L^j$. The frequency domain transformed series (denoted $\tilde u_s$) is calculated by taking the inverse Fourier transform of the product of the Fourier transforms of the linear operator $\sum_{j=0}^{\infty}\delta_j L^j$ and the sample $z_1, z_2, \ldots, z_T$:

$$\tilde u_s = \frac{1}{2\pi}\int_0^{2\pi} e^{is\lambda} \left(\sum_{j=0}^{\infty}\delta_j e^{-ij\lambda}\right) \left(\sum_{t=1}^{T} z_t e^{-it\lambda}\right) \mathrm{d}\lambda
= \begin{cases} 0 & \text{if } s \le 0, \\[4pt] \displaystyle\sum_{t=1}^{\min[T,\,s]} \delta_{s-t}\, z_t & \text{if } s \ge 1. \end{cases}$$

For $s = 1, 2, \ldots, T$, this is the truncated polynomial:

$$\tilde u_s = \sum_{t=1}^{s} \delta_{s-t}\, z_t = \sum_{j=0}^{s-1} \delta_j\, z_{s-j}.$$

This is a well-known result in Fourier analysis that has apparently been


overlooked by researchers working with fractional ARIMA models.
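In other words, the frequency-domain transform of a finite sample is nothing more than the leading T terms of a linear convolution. A short numerical check (Python with NumPy; the function names are hypothetical):

```python
import numpy as np

def filter_truncated(delta, z):
    """For s = 1..T the frequency-domain transform reduces to the truncated
    polynomial u~_s = sum_{j=0}^{s-1} delta_j z_{s-j}: the first T terms of
    the linear convolution of the operator coefficients with the sample."""
    T = len(z)
    return np.convolve(delta, z)[:T]

def filter_truncated_direct(delta, z):
    """The same quantity written as the double sum from the text."""
    delta = list(delta) + [0.0] * len(z)   # pad so delta_j exists for all j used
    return [sum(delta[j] * z[s - j] for j in range(s + 1))
            for s in range(len(z))]
```

Both routines agree term by term, which is exactly the equivalence claimed above.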
Consider the truncated time domain implementation of the fractional differencing transformation. For any finite realization (ignoring any bias in $\hat d$, which only further complicates the estimation of the ARMA parameters),

$$\tilde u_t = u_t - \sum_{k=0}^{\infty} \frac{\Gamma(-\hat d + t + k)}{\Gamma(-\hat d)\,\Gamma(t + k + 1)}\, z_{-k}.$$

After applying either transformation, the resulting series is the sum of a series which follows an ARMA model and a linear combination of an infinite number of unobserved terms. The transformed series does not have the ARMA model of the $u_t$ series, hence the ARMA parameters will not be correctly estimated in the second step. This problem can be avoided by estimating all the parameters of the model in one step.

Two one-step procedures have previously been proposed, one in Li and McLeod (1986) and the other in Fox and Taqqu (1986). In Li and McLeod (1986) the procedure suggested is to truncate the infinite sum that defines $(1-L)^d$ and use standard time series estimation procedures. Truncating the infinite series is equivalent to the two-step procedure, which leads to misleading estimates. The procedure proposed in Fox and Taqqu (1986) is an approximation to the maximization of the normal likelihood function. The estimates are consistent and asymptotically normal. However, as noted in Dahlhaus (1988), the small sample behavior of this type of estimator is poor if the spectrum of the series contains a peak near zero, which is known to occur for positive values of the fractional differencing parameter. Dahlhaus (1988) concludes that exact maximum likelihood is optimal.

4. The normal likelihood function

Consider a stationary normally distributed fractionally integrated time series $z_t$ generated by the model given by eq. (2) which satisfies Assumptions 1, 2, and 3. Let $Z_T$ be a sample of T observations such that $Z_T = [z_1\ z_2\ \cdots\ z_T]'$ and $Z_T \sim N_T(0, \Sigma)$, with probability density function

$$f(Z_T) = (2\pi)^{-T/2}\,|\Sigma|^{-1/2}\exp\!\left\{-\tfrac{1}{2} Z_T' \Sigma^{-1} Z_T\right\}. \qquad (3)$$

Stationarity implies that the covariance matrix is a Toeplitz form,

$$\Sigma = [\gamma(i-j)] \quad \text{for} \quad i, j = 1, 2, \ldots, T.$$

To estimate the parameters of the model by maximum likelihood requires evaluating the likelihood function for a given set of parameter values. This requires writing the covariance matrix or, equivalently, the autocovariance function in terms of the parameters of the model. This parameterization of the autocovariance function is derived by writing the spectral density of $z_t$ in terms of the parameters of the model and then calculating the autocovariance function by

$$\gamma(s) = \frac{1}{2\pi}\int_0^{2\pi} f_z(\lambda)\, e^{i\lambda s}\, \mathrm{d}\lambda. \qquad (4)$$

The spectral density of $z_t$ will be calculated in two steps: first the spectral density of $u_t = (1-L)^d z_t$ is calculated, and then the spectral density of $z_t$. Assumptions 1 and 2 imply $u_t$ is generated by the ARMA(p, q) process
$\Phi(L)u_t = \Theta(L)\varepsilon_t$. The Wold representation of $u_t$ is

$$u_t = \frac{\Theta(L)}{\Phi(L)}\,\varepsilon_t.$$

The roots of $\Phi(L)$ are assumed to be outside of the unit circle, hence $\Phi(x)$ can be written

$$\Phi(x) = \prod_{j=1}^{p}(1-\rho_j x),$$

where $|\rho_n| < 1$ for $n = 1, 2, \ldots, p$. The spectral density of $u_t$ is

$$f_u(\lambda) = \frac{\Theta(\omega)\Theta(\omega^{-1})}{\Phi(\omega)\Phi(\omega^{-1})}\,\sigma^2 = |\Theta(\omega)|^2\sigma^2 \prod_{j=1}^{p}(1-\rho_j\omega)^{-1}(1-\rho_j\omega^{-1})^{-1},$$

where $\omega = e^{-i\lambda}$. By Assumption 3 it is possible to use the partial fraction decomposition to write

$$f_u(\lambda) = |\Theta(\omega)|^2\sigma^2\, \omega^{p} \sum_{j=1}^{p} \zeta_j \left[\frac{\rho_j^{2p}}{(1-\rho_j\omega)} - \frac{1}{(1-\rho_j^{-1}\omega)}\right],$$

where

$$\zeta_j = \left[\rho_j \prod_{i=1}^{p}(1-\rho_i\rho_j) \prod_{m \neq j}(\rho_j-\rho_m)\right]^{-1}.$$

Because $\Theta(\omega)$ is of finite order, $|\Theta(\omega)|^2$ can be written as a polynomial in $\omega$ with both positive and negative values of the exponents, and the spectral density of $u_t$ can be written

$$f_u(\lambda) = \sigma^2 \sum_{l=-q}^{q} \psi(l) \sum_{j=1}^{p} \zeta_j \left[\frac{\rho_j^{2p}}{(1-\rho_j\omega)} - \frac{1}{(1-\rho_j^{-1}\omega)}\right]\omega^{p+l}, \qquad (5)$$

where

$$\psi(l) = \sum_{k=\max[0,\,-l]}^{\min[q,\,q-l]} \theta_k\,\theta_{k+l}.$$
Eq. (5) is one representation of the spectral density of an ARMA(p, q) model where the AR polynomial has simple roots.
Given the definition of $u_t$, the spectral density of $z_t$ can be written

$$f_z(\lambda) = (1-\omega)^{-d}(1-\omega^{-1})^{-d} f_u(\lambda).$$

Upon substituting in for $f_u(\lambda)$,

$$f_z(\lambda) = \sigma^2 \sum_{l=-q}^{q} \psi(l) \sum_{j=1}^{p} \zeta_j \left[\frac{\rho_j^{2p}}{(1-\rho_j\omega)} - \frac{1}{(1-\rho_j^{-1}\omega)}\right] (1-\omega)^{-d}(1-\omega^{-1})^{-d}\,\omega^{p+l}. \qquad (6)$$

This is the general form of the spectral density for a time series which is generated by a stationary fractional ARIMA(p, d, q) model where the roots of the AR polynomial are simple.
$\gamma(s)$ can be calculated by substituting eq. (6) into eq. (4). From the structure of the spectral density, this reduces to the evaluation of the integral¹ (recall $\omega = e^{-i\lambda}$)

$$C(d, h, \rho) = \frac{1}{2\pi}\int_0^{2\pi} \left[\frac{\rho^{2p}}{(1-\rho\omega)} - \frac{1}{(1-\rho^{-1}\omega)}\right] (1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}\, e^{-i\lambda h}\, \mathrm{d}\lambda. \qquad (7)$$

The autocovariance function can be written

$$\gamma(s) = \sigma^2 \sum_{l=-q}^{q} \sum_{j=1}^{p} \psi(l)\, \zeta_j\, C(d,\, p+l-s,\, \rho_j). \qquad (8)$$

If $p = 0$, the partial fraction decomposition need not be considered and the autocovariances have the form

$$\gamma(s) = \sigma^2 \sum_{l=-q}^{q} \psi(l)\, \frac{\Gamma(1-2d)\,\Gamma(d+s-l)}{\Gamma(d)\,\Gamma(1-d)\,\Gamma(1-d+s-l)}.$$

¹If $p = 0$, the term in brackets equals one; this reduces to the integral evaluated in appendix 2. Also, if $p = 0$, the $j$-indexed sum and the terms $\zeta_j$ will not appear in the expression of the spectral density.
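For the $p = 0$ case the expression can be coded directly from Gamma functions. A sketch (Python; assumes $d \in (-\tfrac{1}{2}, \tfrac{1}{2})$ with $d \neq 0$, and $\theta_0 = 1$; the function name is illustrative):

```python
from math import gamma

def arfima_0dq_autocov(d, theta, s, sigma2=1.0):
    """gamma(s) for (1 - L)^d z_t = Theta(L) eps_t with
    Theta(L) = 1 + theta_1 L + ... + theta_q L^q, implementing
    gamma(s) = sigma2 * sum_l psi(l) *
        Gamma(1-2d) Gamma(d+s-l) / (Gamma(d) Gamma(1-d) Gamma(1-d+s-l)).
    Requires non-integer d (Gamma poles at non-positive integers)."""
    th = [1.0] + list(theta)                  # theta_0 = 1
    q = len(th) - 1
    def psi(l):                               # coefficient of w^l in |Theta(w)|^2
        return sum(th[k] * th[k + abs(l)] for k in range(q - abs(l) + 1))
    c = gamma(1 - 2 * d) / (gamma(d) * gamma(1 - d))
    return sigma2 * sum(psi(l) * c * gamma(d + s - l) / gamma(1 - d + s - l)
                        for l in range(-q, q + 1))
```

With $q = 0$ this reproduces the familiar fractional-noise values $\gamma(0) = \sigma^2\Gamma(1-2d)/\Gamma(1-d)^2$ and $\gamma(1)/\gamma(0) = d/(1-d)$, and the function is symmetric in $s$, as an autocovariance must be.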

The autocovariances for $p \neq 0$ have been written in terms of $C(d, h, \rho)$, and this function has been defined as an integral. To efficiently calculate the autocovariances, an alternative form will be derived. Note that the variable $h$ can take on any integer value, $\rho$ can be any complex number in the unit circle, and $d$ is restricted to real values less than $\tfrac{1}{2}$. Using geometric series expansions, $C(d, h, \rho)$ can be written

$$C(d, h, \rho) = \rho^{2p}\sum_{m=0}^{\infty}\rho^m\, \frac{1}{2\pi}\int_0^{2\pi}(1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}\, e^{-i\lambda(h+m)}\,\mathrm{d}\lambda + \sum_{n=1}^{\infty}\rho^n\, \frac{1}{2\pi}\int_0^{2\pi}(1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}\, e^{-i\lambda(h-n)}\,\mathrm{d}\lambda.$$

The interchange of the sum and integral is justified in appendix 1.


The problem is reduced to the following evaluation, which is derived in appendix 2:

$$\frac{1}{2\pi}\int_0^{2\pi}(1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}\, e^{-i\lambda h}\,\mathrm{d}\lambda = \frac{\Gamma(1-2d)\,(-1)^h}{\Gamma(1-d-h)\,\Gamma(1-d+h)}.$$

With this it is possible to write

$$C(d, h, \rho) = \Gamma(1-2d)\left[\rho^{2p}\sum_{m=0}^{\infty}\frac{\rho^m(-1)^{h+m}}{\Gamma(1-d+h+m)\,\Gamma(1-d-h-m)} + \sum_{n=1}^{\infty}\frac{\rho^n(-1)^{h-n}}{\Gamma(1-d+h-n)\,\Gamma(1-d-h+n)}\right]$$

$$= \frac{\Gamma(1-2d)\,\Gamma(d+h)}{\Gamma(1-d+h)\,\Gamma(1-d)\,\Gamma(d)}\Big[\rho^{2p}F(d+h,\,1;\,1-d+h;\,\rho) + F(d-h,\,1;\,1-d-h;\,\rho) - 1\Big], \qquad (9)$$



where $F(a, b; c; x)$ is the hypergeometric function. The last equality is derived in appendix 3. The advantage of this final form is that the needed hypergeometric functions can be quickly and accurately approximated on a computer (see appendix 3).
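Because the second parameter is 1, the series for $F(a, 1; c; x)$ has term ratio $(a+n)x/(c+n)$ and can be accumulated by a one-line recursion; this is the kind of recursive evaluation appendix 3 relies on. A sketch (Python; function names are illustrative), together with eq. (9) assembled from it:

```python
from math import gamma

def hyp_f_a1c(a, c, x, tol=1e-15, maxiter=100000):
    """F(a, 1; c; x) = sum_n [(a)_n / (c)_n] x^n, accumulated with the term
    recursion t_n = t_{n-1} * (a + n - 1) * x / (c + n - 1); the series
    converges for |x| < 1, which holds here since |rho_j| < 1."""
    term, total = 1.0, 1.0
    for n in range(1, maxiter):
        term *= (a + n - 1) * x / (c + n - 1)
        total += term
        if abs(term) < tol:
            break
    return total

def C(d, h, rho, p):
    """Eq. (9): C(d, h, rho) for an AR polynomial of order p, assuming
    non-integer d in (-1/2, 1/2), |rho| < 1, and integer h."""
    pre = (gamma(1 - 2 * d) * gamma(d + h)
           / (gamma(1 - d + h) * gamma(1 - d) * gamma(d)))
    return pre * (rho ** (2 * p) * hyp_f_a1c(d + h, 1 - d + h, rho)
                  + hyp_f_a1c(d - h, 1 - d - h, rho) - 1.0)
```

A useful identity for checking the series code is $F(1, 1; 2; x) = -\ln(1-x)/x$; at $\rho = 0$ with $p = 0$ the bracket collapses to one, so $C$ reduces to the appendix-2 integral above.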

5. Maximum likelihood estimates

5.1. Evaluating the log-likelihood function

Using the formulas presented above, the log-likelihood function is easily evaluated on a computer.² The main parts of the program are:

(1) Factor the autoregressive polynomial.
(2) Calculate the $\zeta_j$'s.
(3) Calculate the different $C(d, h, \rho)$ values.
(4) Evaluate the covariance matrix.
(5) Calculate the Cholesky decomposition and determinant of the inverse of the covariance matrix.
(6) Calculate the log-likelihood function value.

Important savings in time are made by separating steps 3 and 4. Because the same $C(d, h, \rho)$ term appears in several autocovariances, all the $C(d, h, \rho)$ terms are calculated, saved in memory, and then recalled in step 4. Step 3 is relatively quick because, as noted in appendix 3, the $C(d, h, \rho)$ terms can be calculated by a recursive formula. In step 5, Levinson's algorithm [see Sowell (1989)] is used to recursively calculate the terms of the Cholesky decomposition and determinant of the inverse of the covariance matrix; hence as T grows the number of calculations grows at the rate $T^2$ instead of $T^3$.
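Step 5 is the standard Durbin-Levinson (Levinson) recursion. A compact sketch of the resulting O(T²) likelihood evaluation (Python with NumPy; this stands in for the paper's FORTRAN implementation, and the function name is hypothetical):

```python
import numpy as np

def gaussian_loglik_toeplitz(gamma, z):
    """Exact Gaussian log-likelihood of a zero-mean stationary sample z with
    autocovariances gamma[0..T-1], via the Durbin-Levinson recursion: the
    work grows as T^2 rather than the T^3 of a dense Cholesky factorization."""
    gamma = np.asarray(gamma, dtype=float)
    z = np.asarray(z, dtype=float)
    T = len(z)
    v = gamma[0]                        # one-step prediction error variance v_0
    phi = np.zeros(T)                   # phi[:t] holds coefficients phi_{t,1..t}
    ll = -0.5 * (np.log(2 * np.pi * v) + z[0] ** 2 / v)
    for t in range(1, T):
        # reflection coefficient phi_{t,t}
        k = (gamma[t] - np.dot(phi[:t - 1], gamma[t - 1:0:-1])) / v
        if t > 1:
            phi[:t - 1] = phi[:t - 1] - k * phi[t - 2::-1]
        phi[t - 1] = k
        v *= 1.0 - k * k                # v_t = v_{t-1} (1 - phi_{t,t}^2)
        pred = np.dot(phi[:t], z[t - 1::-1])
        ll -= 0.5 * (np.log(2 * np.pi * v) + (z[t] - pred) ** 2 / v)
    return ll
```

Once steps 1-4 have produced $\gamma(0), \ldots, \gamma(T-1)$, this routine replaces the dense factorization of the Toeplitz covariance matrix.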

5.2. Starting values

When using a numerical optimization algorithm to maximize the log-likelihood function, concern must be given to the selection of starting values for the model's parameters.³ The log-likelihood function is not globally concave; hence, the results of a search algorithm depend on the choice of starting values. No single procedure to obtain starting values will be optimal for all models; therefore different procedures are presented.
The procedures given below to obtain starting values for fractional ARIMA(p, d, q) models use a procedure to obtain starting values for a

²A FORTRAN program that calculates the maximum likelihood estimates is available from the author.
³These comments apply not only to the maximum likelihood procedure but also to the procedure presented in Fox and Taqqu (1986).

nonfractional ARMA(p, q) model. Several alternative procedures are available for the nonfractional ARMA(p, q) models [see ch. 6 of Hannan and Deistler (1988)] and any of these could be used. The specific procedure used below is described in Box and Jenkins (1976) and is widely accessible through the subroutine NSPE in the International Mathematical and Statistical Libraries (IMSL).

Procedure 1. First estimate d using the GPH procedure (or the R/S procedure, etc.). Start d at the estimated value, and start the initial ARMA parameters and the innovation variance at the values estimated by NSPE for the series $(1-L)^{\hat d} z_t$, where the transformation $(1-L)^{\hat d}$ is applied by the truncated polynomial.

Procedure 2. Choose a grid of d values. For each value of d, calculate the series $(1-L)^{d} z_t$ and estimate the ARMA parameters and the innovation variance by NSPE. Start the parameters at the values associated with the smallest estimate of the innovation variance.

For small samples, Procedure 1 suffers from the fact that the individual estimates of d are extremely imprecise, while Procedure 2 suffers from the imprecise estimate of the innovation variance. Experience has shown that in sample sizes as large as 200, Procedure 1 and Procedure 2 (using NSPE) yield comparable final parameter estimates. In practice, it is important to choose several different starting values to be confident that the global maximum has been achieved.
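Procedure 2 can be sketched as follows (Python with NumPy). NSPE is a proprietary IMSL routine and is not reproduced here; as a stand-in, the ARMA fit is replaced by a Yule-Walker AR(1) fit, so this illustrates only the shape of the grid search, not the exact procedure:

```python
import numpy as np

def procedure2_start(z, grid=None):
    """Sketch of Procedure 2: for each d on a grid, apply the truncated
    fractional difference, fit an AR(1) by Yule-Walker (a stand-in for the
    IMSL NSPE routine), and keep the d with the smallest innovation
    variance estimate."""
    if grid is None:
        grid = np.arange(-0.45, 0.50, 0.05)
    T = len(z)
    best = None
    for d in grid:
        # truncated (1 - L)^d applied to the sample
        pi = [1.0]
        for j in range(1, T):
            pi.append(pi[-1] * (j - 1 - d) / j)
        u = np.array([sum(pi[j] * z[t - j] for j in range(t + 1))
                      for t in range(T)])
        # Yule-Walker AR(1) fit on the transformed series
        g0 = np.dot(u, u) / T
        g1 = np.dot(u[1:], u[:-1]) / T
        phi = g1 / g0
        s2 = g0 * (1 - phi ** 2)          # innovation variance estimate
        if best is None or s2 < best[0]:
            best = (s2, d, phi)
    return {"d": best[1], "phi": best[2], "sigma2": best[0]}
```

The returned values would then seed the numerical search over the full likelihood.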

5.3. Small sample properties

Sufficient conditions for the consistency and asymptotic normality of the exact maximum likelihood estimates are presented in Dahlhaus (1989).⁴ However, because the fractional differencing parameter captures long-cycle characteristics of a series, asymptotic properties of maximum likelihood estimators may be of questionable use in small samples. To discover the small sample properties, Monte Carlo simulation was used to compare different estimation procedures. The parameters of simulated samples were estimated by the maximum likelihood procedure (MAX), the Fox and Taqqu (1986) procedure (F&T), and the Geweke and Porter-Hudak (1983) procedure⁵ (GPH). The GPH only estimates the d parameter, while both MAX and F&T estimate all the parameters of the model. The log-likelihood function and the negative of the objective function in Fox and Taqqu were maximized

⁴'All conditions are fulfilled for fractional Gaussian noise and fractional ARMA-processes' [Dahlhaus (1989, p. 1751)].
⁵For the GPH procedure α was set equal to ½.

Table 1
Estimated parameter bias and square root of the mean squared error for the ARIMA(0, d, 0) model (T = 100, 100 replications).

              BIAS                        √MSE
   d      MAX     F&T     GPH        MAX     F&T     GPH
 -0.4   -0.017  -0.043   0.032     0.101   0.104   0.301
 -0.3   -0.013  -0.044   0.020     0.089   0.100   0.327
 -0.2   -0.007  -0.040   0.013     0.077   0.088   0.311
 -0.1   -0.009  -0.055   0.016     0.079   0.096   0.274
  0.0   -0.009  -0.048   0.002     0.079   0.094   0.316
  0.1   -0.017  -0.057   0.010     0.067   0.086   0.308
  0.2   -0.012  -0.046   0.013     0.092   0.122   0.299
  0.3   -0.015  -0.056  -0.032     0.077   0.097   0.326
  0.4   -0.018  -0.034   0.085     0.060   0.097   0.254

using the Davidon-Fletcher-Powell algorithm [Powell (1971)] in the DFP subroutine of GQOPT.⁶,⁷
The first model considered is $(1-L)^d z_t = \varepsilon_t$. For each d value, samples with T = 100 were generated. Table 1 presents the results for this model. All three procedures perform equally well in terms of bias. For MAX and F&T the level of the bias does not depend on the parameter value, while for GPH the bias appears to increase as the parameter value moves away from zero. For all the models considered, the √MSE was smallest for the maximum likelihood estimators. A striking feature is that the √MSE of the GPH estimates is three to four times larger than for the maximum likelihood estimates.
The results for the ARIMA(1, d, 0) model $(1+\phi L)(1-L)^d z_t = \varepsilon_t$ are presented in table 2, and the results for the ARIMA(0, d, 1) model $(1-L)^d z_t = (1+\theta L)\varepsilon_t$ are presented in table 3. On average, all three procedures appear to have bias of comparable orders of magnitude. For all three procedures the bias in d was greater for negative values of $\phi$ and $\theta$ than for positive values. This occurred for all the values of d. The source of the problem is that as $\phi$ or $\theta$ approach negative one, the operators $(1+\phi L)$ and $(1+\theta L)$ tend to either cancel out or reinforce the $(1-L)^d$ operator.
For the ARIMA(1, d, 0) model, the smallest bias was achieved by GPH for seven of the models, by MAX for five of the models, and by F&T for one

⁶GQOPT is a FORTRAN library of numerical optimization algorithms distributed by Richard Quandt at the Department of Economics, Princeton University.
⁷The initial parameter values were chosen by fixing d = 0 and starting the ARMA parameters and the innovation variance at the values selected by NSPE. The final results reported are not sensitive to the starting values. These starting values were chosen to save time for the thousands of different starting values required for the Monte Carlo. In practice researchers should use either Procedure 1 or Procedure 2.
Table 2
Estimated parameter bias and square root of the mean squared error for the ARIMA(1, d, 0) model (T = 100, 100 replications).

                              BIAS                                      √MSE
                MAX             F&T           GPH          MAX             F&T           GPH
   d     φ     d̂       φ̂      d̂       φ̂      d̂          d̂      φ̂       d̂      φ̂       d̂
 -0.3  -0.7   0.102   0.138  -0.035   0.042   0.311      0.225  0.233   0.258  0.232   0.345
 -0.3  -0.2  -0.118  -0.094  -0.408  -0.337   0.046      0.242  0.264   0.352  0.342   0.296
 -0.3   0.3  -0.037  -0.047  -0.130  -0.128   0.040      0.185  0.206   0.296  0.297   0.334
 -0.3   0.8  -0.021  -0.017  -0.047  -0.034   0.013      0.095  0.062   0.114  0.060   0.318
  0.0  -0.7   0.002   0.022  -0.108  -0.050   0.283      0.172  0.157   0.142  0.093   0.308
  0.0  -0.2  -0.100  -0.064  -0.325  -0.253   0.055      0.235  0.248   0.352  0.325   0.292
  0.0   0.3  -0.019  -0.006  -0.121  -0.088  -0.017      0.102  0.114   0.229  0.235   0.301
  0.0   0.8   0.000   0.005  -0.046   0.024  -0.009      0.088  0.064   0.102  0.070   0.314
  0.3  -0.7  -0.137  -0.085  -0.145  -0.059   0.283      0.155  0.124   0.147  0.103   0.276
  0.3  -0.2  -0.096  -0.081  -0.521  -0.452   0.069      0.197  0.232   0.303  0.289   0.275
  0.3   0.3  -0.026  -0.017  -0.109  -0.088  -0.010      0.096  0.125   0.240  0.248   0.274
  0.3   0.8  -0.033  -0.018  -0.073  -0.035  -0.020      0.089  0.067   0.103  0.072   0.305
Table 3
Estimated parameter bias and square root of the mean squared error for the ARIMA(0, d, 1) model (T = 100, 100 replications).

                              BIAS                                      √MSE
                MAX             F&T           GPH          MAX             F&T           GPH
   d     θ     d̂       θ̂      d̂       θ̂      d̂          d̂      θ̂       d̂      θ̂       d̂
 -0.3  -0.8  -0.029   0.019  -0.094   0.142  -0.242      0.215  0.193   0.260  0.262   0.311
 -0.3  -0.3   0.016  -0.019  -0.135   0.125  -0.004      0.228  0.248   0.210  0.212   0.300
 -0.3   0.5  -0.016   0.012  -0.070   0.032   0.044      0.096  0.111   0.105  0.109   0.278
 -0.3   0.9  -0.010   0.001  -0.048  -0.030  -0.013      0.097  0.048   0.103  0.068   0.259
  0.0  -0.8  -0.077   0.067  -0.328   0.295  -0.461      0.223  0.206   0.353  0.352   0.282
  0.0  -0.3  -0.019  -0.004  -0.190   0.156  -0.086      0.198  0.237   0.192  0.182   0.307
  0.0   0.5  -0.013   0.012  -0.064   0.027  -0.015      0.100  0.118   0.119  0.117   0.275
  0.0   0.9   0.012  -0.011  -0.037  -0.052   0.017      0.080  0.049   0.100  0.079   0.295
  0.3  -0.8  -0.180   0.173  -0.515   0.481  -0.486      0.245  0.259   0.277  0.305   0.281
  0.3  -0.3  -0.064   0.059  -0.178   0.166  -0.091      0.142  0.183   0.185  0.197   0.294
  0.3   0.5  -0.012   0.023  -0.050   0.030   0.044      0.088  0.109   0.117  0.118   0.320
  0.3   0.9  -0.007  -0.005  -0.043   0.064   0.039      0.065  0.048   0.096  0.075   0.303

model. For the ARIMA(0, d, 1) model, the smallest bias was achieved by MAX for eleven of the models and by GPH for the other model. The √MSE of the d parameter tended to be smallest for the maximum likelihood estimator. The √MSE for the maximum likelihood estimates was always smaller than for the GPH estimates, and for 21 of the 24 models considered the √MSE for MAX was smaller than for F&T. Generally, the maximum likelihood estimator was superior to the procedures presented in Fox and Taqqu (1986) and in Geweke and Porter-Hudak (1983).

5.4. Roots of the autoregressive polynomial

One possible limitation of the maximum likelihood estimation procedure is that at a theoretical level it is limited to models which do not have repeated roots in the autoregressive polynomial; see Assumption 3. This is because the partial fraction decomposition used to derive the spectral density of the $u_t$ process is restricted to simple roots in the AR polynomial. If the population parameters of the model actually contain repeated roots, there is the possibility that the procedure used to evaluate the likelihood function may break down. This problem is unique to the procedure outlined in this paper and is not a problem for some alternative estimation procedures such as F&T or GPH.
This problem was investigated by simulation. Series were generated from models which contained repeated roots in the AR polynomial and their parameters estimated with the maximum likelihood procedure. The results of the simulation are reported in table 4. The MAX procedure did not break down for any of the simulated samples. This was not a result of imprecise estimation; of twelve parameters estimated, the smallest bias was achieved by MAX for eleven of the parameters and the smallest √MSE was achieved by MAX for nine of the parameters. As with the ARIMA(1, d, 0) model, the bias and the √MSE increased as the roots of the AR polynomial approached negative one.
The restriction that the roots of the autoregressive polynomial be simple does not appear to be a binding restriction at an empirical level. This is because in the space of polynomials of a given order, the subset which has repeated roots is a set with zero Lebesgue measure. Hence, numerical search routines choose AR parameter values with repeated roots with probability zero. The maximum likelihood procedure is well defined over a large enough subset of the parameter space to accurately estimate models with repeated roots.

5.5. Computation time

The addition of the fractional differencing parameter to the ARMA model
significantly alters the covariance matrix so that efficient procedures to
F. Sowell, Maximum likelihood estimation 181
Table 5

Seconds of CPU time required to evaluate the likelihood function for the fractional
ARIMA(p, d, q) model and the nonfractional ARIMA(p, 0, q) model.

                                    Number of MA parameters (q)

Number of AR       Fractional ARIMA(p, d, q)               ARIMA(p, 0, q)
parameters (p)     0     1     2     3     4     5     0     1     2     3     4     5

T = 100
      0          0.08  0.08  0.10  0.23  0.24  0.35  0.01  0.03  0.07  0.13  0.22  0.34
      1          0.11  0.11  0.13  0.25  0.27  0.38  0.01  0.03  0.07  0.13  0.22  0.34
      2          0.14  0.15  0.18  0.30  0.33  0.47  0.03  0.03  0.07  0.12  0.21  0.34
      3          0.29  0.30  0.33  0.47  0.50  0.63  0.06  0.06  0.06  0.12  0.22  0.34
      4          0.32  0.34  0.37  0.51  0.55  0.69  0.12  0.12  0.12  0.12  0.21  0.34
      5          0.49  0.50  0.54  0.70  0.74  0.89  0.21  0.22  0.21  0.21  0.22  0.34

T = 150
      0          0.17  0.17  0.20  0.32  0.33  0.46  0.01        0.10  0.19  0.33  0.52
      1          0.20  0.21  0.23  0.36  0.90  0.49  0.01  0.04  0.10  0.19  0.33  0.52
      2          0.25  0.26  0.28  0.42  0.45  0.59  0.04  0.04  0.10  0.19  0.33  0.52
      3          0.40  0.42  0.44  0.59  0.64  0.78  0.09  0.09  0.10  0.19  0.32  0.51
      4          0.45  0.47  0.50  0.66  0.72  0.87  0.18  0.18  0.19  0.19  0.32  0.51
      5          0.63  0.66  0.70  0.87  0.94  1.10  0.32  0.32  0.32  0.32  0.32  0.52

T = 200
      0          0.31  0.31  0.33  0.46  0.47  0.59  0.01  0.05  0.14  0.25  0.43  0.68
      1          0.34  0.34  0.37  0.50  0.53  0.65  0.01  0.05  0.14  0.25  0.43  0.68
      2          0.40  0.41  0.43  0.59  0.64  0.77  0.05  0.05  0.13  0.25  0.43  0.68
      3          0.57  0.58  0.61  0.78  0.83  0.99  0.12  0.12  0.13  0.25  0.43  0.68
      4          0.62  0.64  0.69  0.86  0.93  1.11  0.24  0.24  0.25  0.25  0.43  0.68
      5          0.81  0.84  0.92  1.08  1.16  1.37  0.42  0.42  0.42  0.43  0.43  0.68

evaluate the autocovariance function of ARMA models [e.g., Harvey and
Phillips (1979)] cannot be used. The loss of these procedures raises the
question of the practicality of repeated evaluation of the fractional likelihood
function as is required for numerical optimization algorithms. The concern is
the computation time required for one evaluation of the fractional
ARIMA(p, d, q) likelihood function.
Table 5 presents the seconds of CPU time¹ required for one evaluation of
the fractional ARIMA likelihood function for samples of size T = 100,
T = 150, and T = 200. Also presented is the time required to evaluate the
nonfractional ARMA models using the Kalman filter based procedure pre-
sented in Harvey and Phillips (1979). Except for the simplest models, the
time required to evaluate the fractional likelihood function is of the same
order of magnitude as for the nonfractional models. The times required for
the slowest fractional models are only between two and three times as long as
for the comparable nonfractional ARMA model.
The time required to obtain a single set of maximum likelihood parameter
estimates using a numerical optimizing algorithm is a function of more than
simply the time required for one evaluation of the likelihood function. It also
depends on the computer used, the number of parameters being estimated,
the particular numerical optimization algorithm used, the starting values for
the algorithm, and the population parameter for the model. In extreme
situations (e.g., large number of parameters, large number of observations,
roots near the unit circle) the time can be quite lengthy. However, for most
models currently encountered in practice, the time required to obtain the
exact maximum likelihood estimates for a fractional model is clearly of the
same order of magnitude as the time required to obtain the exact maximum
likelihood estimates for a nonfractional model. The general superiority of
exact maximum likelihood estimates for time series models has been noted in
Hillmer and Tiao (1979), Ansley and Newbold (1980), and Dahlhaus (1988).

6. Summary and extensions

This paper derived the unconditional normal likelihood function for a
stationary fractionally integrated time series model. Recursive procedures,
which allow quick evaluation of the likelihood function, were presented.
These procedures make it possible to obtain maximum likelihood estimates. The
small sample properties of the maximum likelihood estimates were compared
to previous estimation techniques. The maximum likelihood estimates gener-
ally had smaller bias and MSE.

¹The computations reported in this paper were performed in double precision VAX Fortran
on a VAX workstation 3200.

The work in this paper can be extended in several directions. Currently,
the approach is being generalized to the multivariate model. Also, the Monte
Carlo work reported in this paper is only suggestive. More extensive simula-
tions need to be performed to better understand the small sample properties
of the maximum likelihood estimator. For empirical work, much of the early
research can now be extended by using the more efficient estimation proce-
dure and estimating all the parameters of the model.
One additional application of the maximum likelihood procedure is as a
unit root test. The null hypothesis of a unit root in the AR process of a
variable can be tested by first differencing the series and estimating the best
fractional ARIMA(p, d, q) model. The null hypothesis can be rejected if d̂ is
significantly different from zero. This approach to unit root testing has an
advantage over previous tests because it explicitly accounts for nuisance
parameters. The problems caused by nuisance parameters in unit root tests
have been noted in Schwert (1987a, b). For an application of this approach to
real United States GNP, see Sowell (1990b).
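The mechanics of such a test can be sketched with a simpler estimator of d than the full maximum likelihood procedure of this paper. The sketch below uses a Geweke and Porter-Hudak style log-periodogram regression purely as a stand-in; the function name, bandwidth choice m = √T, and simulated data are all illustrative assumptions, not from the paper.

```python
import numpy as np

def gph_estimate(x, m=None):
    """Log-periodogram (Geweke/Porter-Hudak style) estimate of d,
    with its OLS standard error."""
    T = len(x)
    if m is None:
        m = int(np.sqrt(T))                      # a common bandwidth choice
    lam = 2.0 * np.pi * np.arange(1, m + 1) / T  # Fourier frequencies
    dft = np.fft.fft(x - np.mean(x))
    periodogram = np.abs(dft[1:m + 1]) ** 2 / (2.0 * np.pi * T)
    # Regress log I(lam_j) on a constant and -log(4 sin^2(lam_j / 2));
    # the slope on the second regressor estimates d.
    reg = -np.log(4.0 * np.sin(lam / 2.0) ** 2)
    X = np.column_stack([np.ones(m), reg])
    beta, *_ = np.linalg.lstsq(X, np.log(periodogram), rcond=None)
    d_hat = beta[1]
    se = np.sqrt((np.pi ** 2 / 6.0) / np.sum((reg - reg.mean()) ** 2))
    return d_hat, se

# Unit-root check in the spirit of the text: difference the series first,
# then ask whether d of the differenced series is significantly nonzero.
rng = np.random.default_rng(0)
y = np.cumsum(rng.standard_normal(2000))  # a pure random walk, so d = 0
d_hat, se = gph_estimate(np.diff(y))
print(d_hat, se)
```

For a true random walk the differenced series is white noise, so d̂ should lie within sampling error of zero; a d̂ significantly below zero would instead point to over-differencing of a trend-stationary series.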

Appendix 1

Justification of interchange of integrals

This is a demonstration of the integrability of

$$\int_0^{2\pi}\left|\sum_{n=0}^{\infty}\rho^n(1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}e^{-i\lambda h}\right|d\lambda$$

to allow interchanging the summation and the integral. The parameters
satisfy the restrictions: d is in the open interval (-1/2, 1/2), h is an integer, and
|ρ| < 1:

$$\int_0^{2\pi}\left|\sum_{n=0}^{\infty}\rho^n(1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}e^{-i\lambda h}\right|d\lambda
\le \int_0^{2\pi}\sum_{n=0}^{\infty}|\rho^n|\left|(1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}\right|d\lambda$$

$$= \frac{1}{1-|\rho|}\int_0^{2\pi}\left|(1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}\right|d\lambda.$$

The integrand is either bounded or has poles at 0 and 2π. At the poles the
integrand behaves like λ^{-2d} near zero, which is integrable for d < 1/2.

Appendix 2

Evaluation of the integral

This is the evaluation of the integral

$$\frac{1}{2\pi}\int_0^{2\pi}(1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}e^{i\lambda h}\,d\lambda.$$

Because

$$(1-e^{i\zeta}) = 2\sin(\zeta/2)\,e^{i(\zeta+3\pi)/2} = 2\sin(\zeta/2)\,e^{i(\zeta-\pi)/2},$$

the integral equals

$$\frac{2^{-2d}}{\pi}\int_0^{\pi}[\sin(z)]^{-2d}e^{i2hz}\,dz$$

[see Erdelyi et al. (1953, p. 12, (1.5.29))]

$$= \frac{\Gamma(1-2d)(-1)^h}{\Gamma(1-d-h)\Gamma(1-d+h)}.$$
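As a quick numerical sanity check on this closed form (not part of the paper), one can compare direct quadrature against the gamma-function expression, using the fact that (1 − e^{−iλ})^{−d}(1 − e^{iλ})^{−d} = [2 sin(λ/2)]^{−2d} on (0, 2π) and that the imaginary part integrates to zero by symmetry. The parameter values below are arbitrary:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

d, h = 0.2, 1  # arbitrary values with d in (-1/2, 1/2) and h an integer

# Direct quadrature of (1/2pi) * integral of [2 sin(lam/2)]^(-2d) cos(h lam).
numeric, _ = quad(
    lambda lam: (2.0 * np.sin(lam / 2.0)) ** (-2.0 * d) * np.cos(h * lam),
    0.0, 2.0 * np.pi, limit=200)
numeric /= 2.0 * np.pi

# Closed form from Appendix 2.
closed = gamma(1 - 2 * d) * (-1.0) ** h / (gamma(1 - d - h) * gamma(1 - d + h))
print(numeric, closed)  # the two values should agree to quadrature accuracy
```

The integrand has integrable singularities at the endpoints (it behaves like λ^{−2d}), which adaptive quadrature handles without special treatment.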

Appendix 3

An alternative form of C(d, h, p)

Some of the autocovariances are written in terms of the function C(d, h, ρ).
The variable h can take on any integer value, ρ can take any complex value in the
unit circle, while d is restricted to real values on the open interval
(-1/2, 1/2). An alternative form of the function, which is easier to evaluate, will be
derived using the following definition and equation. The hypergeometric
function is defined by

$$F(a,b;c;x) = \sum_{n=0}^{\infty}\frac{\Gamma(a+n)\Gamma(b+n)\Gamma(c)}{\Gamma(a)\Gamma(b)\Gamma(c+n)\Gamma(n+1)}\,x^n.$$

For all real values of y and integer values of n, as long as the arguments of

the gamma function are never nonpositive integers, then

$$\frac{\Gamma(y+n)}{\Gamma(y)} = \frac{\Gamma(1-y)(-1)^n}{\Gamma(1-y-n)}. \qquad (10)$$

This follows from eq. (8.334.3) of Gradshteyn and Ryzhik (1980) and because
sin(π(n + y)) = (-1)^n sin(πy) for real y and integer n.
The terms in the first sum of eq. (9) can be rewritten using eq. (10),

$$\frac{(-1)^{h+m}}{\Gamma(1-d+h+m)\Gamma(1-d-h-m)}
= \frac{(-1)^h}{\Gamma(1-d+h)\Gamma(1-d-h)}\left[\frac{\Gamma(d+h+m)\Gamma(1-d+h)}{\Gamma(d+h)\Gamma(1-d+h+m)}\right].$$

Similarly, the terms in the second sum can be rewritten

$$\frac{(-1)^{h-n}}{\Gamma(1-d+h-n)\Gamma(1-d-h+n)}
= \frac{(-1)^h}{\Gamma(1-d+h)\Gamma(1-d-h)}\left[\frac{\Gamma(d-h+n)\Gamma(1-d-h)}{\Gamma(d-h)\Gamma(1-d-h+n)}\right].$$
Using these relations, eq. (9) can be written

$$C(d,h,\rho) = \frac{\Gamma(1-2d)(-1)^h}{\Gamma(1-d-h)\Gamma(1-d+h)}
\left[\rho^{2p}\sum_{m=0}^{\infty}\frac{\rho^m\,\Gamma(d+h+m)\Gamma(1-d+h)}{\Gamma(1-d+h+m)\Gamma(d+h)}
+ \sum_{n=1}^{\infty}\frac{\rho^n\,\Gamma(d-h+n)\Gamma(1-d-h)}{\Gamma(d-h)\Gamma(1-d-h+n)}\right]$$

$$= \frac{\Gamma(1-2d)\Gamma(d+h)}{\Gamma(1-d+h)\Gamma(1-d)\Gamma(d)}
\left[\rho^{2p}F(d+h,1;1-d+h;\rho) + F(d-h,1;1-d-h;\rho) - 1\right].$$

When calculating the likelihood function for a fractional model many
different values of C(d, h, ρ) must be evaluated. Fortunately, this does not
imply multiple evaluation of the hypergeometric function. The d and ρ
parameters remain fixed for all the autocovariances. Only h varies, and it only
varies by integer values. Using the recursive relationship yΓ(y) = Γ(y + 1)
and the definition of the hypergeometric function, it is straightforward to
show

$$F(a,1;c;\rho) = \frac{c-1}{\rho(a-1)}\left[F(a-1,1;c-1;\rho) - 1\right].$$

For each evaluation of the likelihood function only p different hypergeometric
functions need to be evaluated. This recursive calculation of the hypergeometric
functions allows quick evaluations of the C(d, h, ρ) functions.
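The recursion is easy to verify numerically. The sketch below (the values of a, c, and ρ are arbitrary illustrations, chosen as a = d + h and c = 1 − d + h for d = 0.3, h = 2) sums the series for F(a, 1; c; x), whose successive terms satisfy t_{n+1} = t_n (a + n) x / (c + n), and checks the recursive relationship:

```python
def hyp_f(a, c, x, tol=1e-15, max_terms=10000):
    """F(a, 1; c; x) by direct summation of the hypergeometric series;
    with b = 1 the term ratio is (a + n) x / (c + n)."""
    total, term, n = 0.0, 1.0, 0
    while abs(term) > tol and n < max_terms:
        total += term
        term *= (a + n) * x / (c + n)
        n += 1
    return total

# Arbitrary illustrative values: d = 0.3, h = 2, rho = 0.5,
# so a = d + h = 2.3 and c = 1 - d + h = 2.7.
a, c, rho = 2.3, 2.7, 0.5
lhs = hyp_f(a, c, rho)
rhs = (c - 1) / (rho * (a - 1)) * (hyp_f(a - 1, c - 1, rho) - 1.0)
print(abs(lhs - rhs))  # agreement up to the summation tolerance
```

In practice this means only the lowest-order hypergeometric functions need the series summation; all the others follow from the cheap recursion as h steps through integer values.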

References
Ansley, Craig F. and Paul Newbold, 1980, Finite sample properties of estimators for autoregres-
sive moving average models, Journal of Econometrics 13, 159-183.
Aydogan, Kursat and G. Geoffrey Booth, 1988, Are there long cycles in common stock returns?,
Southern Economic Journal, 141-149.
Booth, G. Geoffrey, Fred R. Kaen, and Peter E. Koveos, 1982, R/S analysis of foreign exchange
rates under two international monetary regimes, Journal of Monetary Economics 10, 407-415.
Box, G.E.P. and G.M. Jenkins, 1976, Time series analysis forecasting and control (Holden-Day,
San Francisco, CA).
Brockwell, Peter J. and Richard A. Davis, 1987, Time series: Theory and methods (Springer-
Verlag, New York, NY).
Cheung, Yin-Wong, 1989, Long-term memory in foreign exchange rates, Unpublished manuscript
(Department of Economics, University of Pennsylvania, Philadelphia, PA).
Dahlhaus, Rainer, 1988, Small sample effects in time series analysis: A new asymptotic theory
and a new estimate, Annals of Statistics 16, 808-841.
Dahlhaus, Rainer, 1989, Efficient parameter estimation for self-similar processes, Annals of
Statistics 17, 1749-1766.
Diebold, Francis X. and Glenn D. Rudebusch, 1989, Long memory and persistence in aggregate
output, Journal of Monetary Economics 24, 189-209.
Erdelyi, Arthur et al., 1953, Higher transcendental functions, Vol. I (McGraw-Hill, New York,
NY).
Fox, Robert and Murad S. Taqqu, 1986, Large-sample properties of parameter estimates for
strongly dependent stationary Gaussian time series, Annals of Statistics 14, 517-532.
Geweke, John F. and Susan Porter-Hudak, 1982, The estimation and application of long memory
time series models, Journal of Time Series Analysis 4, 221-238.
Gourieroux, Christian, Francoise Maurel, and Alain Monfort, 1987, Regression and non station-
arity, Working paper no. 8708 (Institut National de la Statistique et des Etudes Economiques,
France).
Gradshteyn, I.S. and I.M. Ryzhik, 1980, Tables of integrals, series and products (Academic
Press, New York, NY).
Granger, C.W.J., 1980, Long memory relationships and the aggregation of dynamic models,
Journal of Econometrics 14, 227-238.
Granger, C.W.J. and Roselyne Joyeux, 1980, An introduction to long-memory time series models
and fractional differencing, Journal of Time Series Analysis 1, 15-29.
Greene, Myron T. and Bruce D. Fielitz, 1977, Long-term dependence in common stock returns,
Journal of Financial Economics 4, 339-349.

Hannan, E.J. and Manfred Deistler, 1988, The statistical theory of linear systems (Wiley, New
York, NY).
Harvey, A.C. and G.D.A. Phillips, 1979, Maximum likelihood estimation of regression models
with autoregressive-moving average disturbances, Biometrika 66, 49-58.
Haubrich, Joseph G. and Andrew W. Lo, 1989, The sources and nature of long-term memory in
the business cycle, Working paper (Department of Finance, Wharton School, University of
Pennsylvania, Philadelphia, PA).
Helms, Billy P., Fred R. Kaen, and Robert E. Rosenman, 1984, Memory in commodity futures
contracts, Journal of Futures Markets 4, 559-567.
Hillmer, Steven C. and George C. Tiao, 1979, Likelihood function of stationary multiple
autoregressive moving average models, Journal of the American Statistical Association 74,
652-660.
Hosking, J.R.M., 1981, Fractional differencing, Biometrika 68, 165-176.
Hosking, J.R.M., 1984, Modeling persistence in hydrological time series using fractional differ-
encing, Water Resources Research 20, 1898-1908.
Janacek, G.J., 1982, Determining the degree of differencing for time series via the long
spectrum, Journal of Time Series Analysis 3, 177-183.
Lo, Andrew, 1989, Long-term memory in stock market prices, Working paper 3014-89-EFA
(Alfred P. Sloan School of Management, Massachusetts Institute of Technology, Cambridge,
MA).
McLeod, Ian, 1975, Derivation of the theoretical autocovariance function of auto-
regressive-moving average time series, Applied Statistics 24, 255-256 (Correction 26, 194).
McLeod, Angus Ian and Keith William Hipel, 1978, Preservation of the rescaled adjusted range
1: A reassessment of the Hurst phenomenon, Water Resources Research 14, 491-508.
Powell, M.J.D., 1971, Recent advances in constrained optimization, Mathematical Programming
1, 26-57.
Schwert, G. William, 1987a, Tests for unit roots: A Monte Carlo investigation, Working paper
GPB 87-01 (Center for Research in Government Policy and Business, University of Rochester,
Rochester, NY).
Schwert, G. William, 1987b, Effects of model specification on tests for unit roots in macroeco-
nomic data, Journal of Monetary Economics 20, 73-103.
Shea, Gary S., 1990, Uncertainty and implied variance bounds in long-memory model of the
interest rate term structure, Empirical Economics, forthcoming.
Sowell, Fallaw, 1989, A decomposition of block Toeplitz matrices with applications to vector
time series, Mimeo. (Graduate School of Industrial Administration, Carnegie Mellon Univer-
sity, Pittsburgh, PA).
Sowell, Fallaw, 1990a, The fractional unit root distribution, Econometrica 58, 495-505.
Sowell, Fallaw, 1990b, Modeling long run behavior with the fractional ARIMA model, Working
paper (Graduate School of Industrial Administration, Carnegie Mellon University, Pitts-
burgh, PA).
