Maximum likelihood estimation of stationary univariate fractionally integrated time series models*
North-Holland
Fallaw Sowell
Carnegie Mellon University, Pittsburgh, PA 15213-3890, USA
To estimate the parameters of a stationary univariate fractionally integrated time series, the
unconditional exact likelihood function is derived. This allows the simultaneous estimation of all
the parameters of the model by exact maximum likelihood. Issues involved in obtaining
maximum likelihood estimates are discussed. Particular attention is given to efficient procedures
for evaluating the likelihood function, to obtaining starting values, and to the small-sample properties of
the estimators. Limitations of previous estimation procedures are also discussed.
1. Introduction
*This is a generalization of Chapter 3 of the author’s Ph.D. thesis, which was supported by the
Alfred P. Sloan Foundation through a Doctoral Dissertation Fellowship. The additional work
was supported by a Faculty Development Grant from Carnegie Mellon University. The author
thanks John Geweke and two anonymous referees for helpful suggestions and Ken Murphy for
computing assistance.
$$\frac{\Gamma(d+1)\,(-1)^j}{\Gamma(d-j+1)\,\Gamma(j+1)} = \frac{\Gamma(-d+j)}{\Gamma(-d)\,\Gamma(j+1)},$$
where
$$\Phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p$$
and
$$\Theta(L) = 1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q$$
are polynomials in the lag operator L. Attention will be restricted to the class
of models which satisfy the following assumptions:
Assumption 2. $d < \frac{1}{2}$.
These results are proven in Granger and Joyeux (1980) or Hosking (1981),
and the explicit forms of $f_z(\lambda)$ and $\gamma(s)$ are derived below.
The most useful feature of fractionally integrated time series is their
long-range dependence. The dependence between observations produced by
the ARMA structure of the model decays at a geometric rate, while the
dependence produced by the fractional differencing parameter decays at the
much slower hyperbolic rate. Hence the long-range dependence between
observations is eventually determined only by the fractional differencing
parameter. These characteristics can be seen in the shapes of the spectral
density and the autocovariance function. If $0 < d < \frac{1}{2}$, the process eventually
exhibits strong positive dependence between distant observations. This is
noted in the frequency domain by the spectral density increasing to an
infinite value as $\lambda$ approaches zero. In the time domain, this long-range
positive dependence is indicated by the fact that the autocovariances are
eventually positive and decline slowly. If $-\frac{1}{2} < d < 0$, the process eventually
exhibits negative dependence between distant observations. In the frequency
domain, this is indicated by the decline of the spectral density to zero as $\lambda$
approaches zero. In the time domain, this long-range negative dependence is
indicated by the fact that the autocovariances are eventually negative and
decline slowly.
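The contrast between geometric and hyperbolic decay is easy to see numerically. Below is a minimal sketch (in Python, not the paper's FORTRAN program; the function names and parameter values are illustrative) that evaluates the ARIMA(0,d,0) autocovariances derived later in the paper alongside those of an AR(1) with coefficient 0.9, assuming $0 < d < \frac{1}{2}$ so all gamma arguments are positive:

```python
# Sketch: hyperbolic decay of ARIMA(0,d,0) autocovariances versus the
# geometric decay of an AR(1).  Assumes 0 < d < 1/2 and sigma^2 = 1.
import numpy as np
from scipy.special import gammaln

def arfima_acov(d, s):
    # gamma(s) = Gamma(1-2d)Gamma(d+s) / (Gamma(d)Gamma(1-d)Gamma(1-d+s)),
    # evaluated in logs so large s does not overflow.
    return np.exp(gammaln(1 - 2 * d) + gammaln(d + s)
                  - gammaln(d) - gammaln(1 - d) - gammaln(1 - d + s))

def ar1_acov(phi, s):
    return phi ** s / (1 - phi ** 2)

for s in [1, 10, 100, 1000]:
    print(f"s={s:5d}  fractional: {arfima_acov(0.3, s):.2e}"
          f"  AR(1): {ar1_acov(0.9, s):.2e}")
```

At $s = 1000$ the AR(1) autocovariance is numerically zero, while the fractional autocovariance, which decays like $s^{2d-1}$, is still of the order $10^{-2}$.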
There are four procedures for performing the first step, estimating the fractional
differencing parameter. The first is the estimated rescaled range exponent, $H$
[see McLeod and Hipel (1978) for a survey of this statistic]. The differencing
parameter can be identified as $\hat d = \hat H - \frac{1}{2}$. A generalization, which is
designed to be robust to 'short-term dependence', is given in Lo (1989). Two
alternative estimation procedures exploit the special shape of the spectral
density of a fractionally integrated time series. In Janacek (1982) the differencing
parameter estimate is obtained by numerical integration of the log
periodogram. Geweke and Porter-Hudak (1983) exploit the behavior of the
spectral density around zero. Using frequencies near zero, a univariate
regression of the log periodogram on the log of the frequencies is performed.
The slope estimate is an estimate of the differencing parameter. The fourth
procedure, presented in Shea (1990), is maximum likelihood estimation of the
regression model used in Geweke and Porter-Hudak (1983).
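As a concrete illustration of the log-periodogram regression, here is a hedged Python sketch; the bandwidth $T^{1/2}$ and the regressor $\log(4\sin^2(\lambda/2))$ are the conventional choices from Geweke and Porter-Hudak (1983), and nothing here should be read as the exact specification used in this paper's simulations:

```python
# Sketch of the GPH log-periodogram estimator of d.
import numpy as np

def gph(z, alpha=0.5):
    T = len(z)
    m = int(T ** alpha)                       # number of low frequencies used
    lam = 2 * np.pi * np.arange(1, m + 1) / T
    # Periodogram at the first m Fourier frequencies.
    I = np.abs(np.fft.fft(z - z.mean())[1:m + 1]) ** 2 / (2 * np.pi * T)
    x = np.log(4 * np.sin(lam / 2) ** 2)      # log |1 - e^{-i lam}|^2
    slope = np.polyfit(x, np.log(I), 1)[0]
    return -slope                             # d enters as -d times x

rng = np.random.default_rng(0)
print(gph(rng.standard_normal(500)))          # white noise: estimate near 0
```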
These procedures only estimate the differencing parameter. In the second
step, the estimated differencing parameter is used to transform the observed
series into a series that presumably follows an ARMA(p, q) model. If the
model of the observed series is $\Phi(L)(1-L)^d z_t = \Theta(L)\varepsilon_t$, the estimated
differencing parameter $\hat d$ is used to try to obtain the series $u_t = (1-L)^{\hat d} z_t$,
which would follow the model $\Phi(L)u_t = \Theta(L)\varepsilon_t$.
A problem is that $(1-L)^d$ is defined as an infinite lag polynomial, hence
the series $u_t = (1-L)^{\hat d} z_t$ can only be obtained when there exists an infinite
realization of $z_t$. Given a finite sample, two different procedures have been
proposed to approximate $u_t$; one in the time domain and the other in the
frequency domain. In the time domain the Binomial Theorem representation
of $(1-L)^d$ is used to transform the observed series. If $u_t = (1-L)^{\hat d} z_t$, then
$$u_t = \sum_{j=0}^{\infty} \frac{\Gamma(-\hat d + j)}{\Gamma(-\hat d)\,\Gamma(j+1)}\, z_{t-j}.$$
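The gamma-function weights can be generated without any gamma evaluations through the recursion $w_0 = 1$, $w_j = w_{j-1}(j-1-\hat d)/j$, which follows from $\Gamma(-\hat d + j) = (-\hat d + j - 1)\,\Gamma(-\hat d + j - 1)$. A small sketch of the truncated transform (illustrative, not the author's FORTRAN code):

```python
# Sketch: truncated time-domain fractional differencing u_t = (1-L)^d z_t,
# using only the observed z_1, ..., z_t in each sum.
import numpy as np

def frac_diff(z, d):
    T = len(z)
    w = np.empty(T)
    w[0] = 1.0
    for j in range(1, T):
        # w_j = Gamma(-d+j) / (Gamma(-d) Gamma(j+1)) via a stable recursion.
        w[j] = w[j - 1] * (j - 1 - d) / j
    return np.array([w[:t + 1] @ z[t::-1] for t in range(T)])

rng = np.random.default_rng(1)
print(frac_diff(rng.standard_normal(5), 0.3))
```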
The equivalence of the frequency transform and the truncated time transformation
applies to general linear transformations. To see this, consider an
arbitrary linear operator $\sum_{j=0}^{\infty} \delta_j L^j$. The frequency domain transformed
series (denoted $\tilde u_s$) is calculated by taking the inverse Fourier transform of
the product of the Fourier transforms of the linear operator $\sum_{j=0}^{\infty} \delta_j L^j$ and the
sample $z_1, z_2, \ldots, z_T$:
$$\tilde u_s = \begin{cases} 0 & \text{if } s \leq 0, \\[4pt] \displaystyle\sum_{t=1}^{\min[T,\,s]} \delta_{s-t}\, z_t & \text{if } s \geq 1. \end{cases}$$
For the fractional differencing operator, the truncated transform differs from the exact series by the weights attached to the unobserved presample values:
$$\tilde u_t = u_t - \sum_{k=0}^{\infty} \frac{\Gamma(-\hat d + t + k)}{\Gamma(-\hat d)\,\Gamma(t+k+1)}\, z_{-k}.$$
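The equivalence is easy to confirm numerically. The sketch below (with illustrative weights; the padding length $2T$ is an assumption chosen so the circular convolution computed by the FFT agrees with the truncated linear convolution) applies an arbitrary operator both ways:

```python
# Check: zero-padded FFT convolution reproduces the truncated time-domain sum.
import numpy as np

rng = np.random.default_rng(2)
T = 64
z = rng.standard_normal(T)
delta = 0.7 ** np.arange(T)                    # arbitrary operator weights

u_freq = np.fft.ifft(np.fft.fft(delta, 2 * T) * np.fft.fft(z, 2 * T)).real[:T]
u_time = np.array([delta[:t + 1] @ z[t::-1] for t in range(T)])
print(np.max(np.abs(u_freq - u_time)))         # agreement to machine precision
```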
The spectral density of $z_t$ will be calculated in two steps: first the spectral
density of $u_t = (1-L)^d z_t$ is calculated, and then the spectral density of $z_t$.
Assumptions 1 and 2 imply $u_t$ is generated by the ARMA(p, q) process
$$u_t = \frac{\Theta(L)}{\Phi(L)}\,\varepsilon_t.$$
The roots of $\Phi(L)$ are assumed to be outside the unit circle, hence $\Phi(x)$
can be written
$$\Phi(x) = \prod_{j=1}^{p} (1 - \rho_j x), \qquad |\rho_j| < 1.$$
The spectral density of $u_t$ is then (with $\omega = e^{-i\lambda}$)
$$f_u(\lambda) = \frac{|\Theta(\omega)|^2\,\sigma^2}{2\pi\,|\Phi(\omega)|^2} = \frac{|\Theta(\omega)|^2\,\sigma^2}{2\pi} \prod_{j=1}^{p} (1-\rho_j\omega)^{-1}(1-\rho_j\omega^{-1})^{-1}.$$
Expanding by partial fractions, this can be written
$$f_u(\lambda) = \frac{|\Theta(\omega)|^2\,\sigma^2}{2\pi} \sum_{j=1}^{p} \zeta_j\,\omega^p \left[ \frac{\rho_j^{2p}}{1-\rho_j\omega} - \frac{1}{1-\rho_j^{-1}\omega} \right],$$
where
$$\zeta_j = \left[ \rho_j \prod_{i=1}^{p} (1-\rho_i\rho_j) \prod_{m \neq j} (\rho_j - \rho_m) \right]^{-1}.$$
The spectral density of $z_t$ is then
$$f_z(\lambda) = \frac{\sigma^2}{2\pi} \sum_{l=-q}^{q} \psi(l)\,\omega^l \sum_{j=1}^{p} \zeta_j\,\omega^p \left[ \frac{\rho_j^{2p}}{1-\rho_j\omega} - \frac{1}{1-\rho_j^{-1}\omega} \right] (1-\omega)^{-d}(1-\omega^{-1})^{-d}, \tag{5}$$
where
$$\psi(l) = \sum_{s=\max[0,\,l]}^{\min[q,\,q+l]} \theta_s\,\theta_{s-l}.$$
Combining terms, the spectral density can be written
$$f_z(\lambda) = \frac{\sigma^2}{2\pi} \sum_{l=-q}^{q} \sum_{j=1}^{p} \psi(l)\,\zeta_j \left[ \frac{\rho_j^{2p}}{1-\rho_j\omega} - \frac{1}{1-\rho_j^{-1}\omega} \right] (1-\omega)^{-d}(1-\omega^{-1})^{-d}\,\omega^{p+l}. \tag{6}$$
This is the general form of the spectral density for a time series which is
generated by a stationary fractional ARMA(p, d, q) model where the roots of
the AR polynomial are simple.
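For checking purposes, the spectral density can also be evaluated in its unexpanded form $f_z(\lambda) = (\sigma^2/2\pi)\,|\Theta(\omega)|^2\,|\Phi(\omega)|^{-2}\,|1-\omega|^{-2d}$, of which eq. (6) is an algebraic rearrangement. A Python sketch (the sign conventions for $\Phi$ and $\Theta$ below are assumptions on my part):

```python
# Sketch: ARFIMA(p,d,q) spectral density in unexpanded form.
# Conventions assumed: Phi(L) = 1 - phi_1 L - ..., Theta(L) = 1 + theta_1 L + ...
import numpy as np

def arfima_spectrum(lam, d, phi=(), theta=(), sigma2=1.0):
    w = np.exp(-1j * lam)
    Phi = 1 - sum(p * w ** (k + 1) for k, p in enumerate(phi))
    Theta = 1 + sum(t * w ** (k + 1) for k, t in enumerate(theta))
    return (sigma2 / (2 * np.pi)) * (np.abs(Theta) / np.abs(Phi)) ** 2 \
           * np.abs(1 - w) ** (-2 * d)

lam = np.linspace(0.05, np.pi, 4)
print(arfima_spectrum(lam, d=0.3, phi=(0.5,), theta=(0.2,)))
```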
$\gamma(s)$ can be calculated by substituting eq. (6) into eq. (4). From the
structure of the spectral density, this reduces to the evaluation of the integral¹
(recall $\omega = e^{-i\lambda}$)
$$C(d, h, \rho_j) = \frac{1}{2\pi} \int_0^{2\pi} \left[ \frac{\rho_j^{2p}}{1-\rho_j\omega} - \frac{1}{1-\rho_j^{-1}\omega} \right] (1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}\, e^{-i\lambda h}\, d\lambda. \tag{7}$$
The autocovariance function can be written
$$\gamma(s) = \sigma^2 \sum_{l=-q}^{q} \sum_{j=1}^{p} \psi(l)\,\zeta_j\, C(d,\, p+l-s,\, \rho_j). \tag{8}$$
When $p = 0$, this reduces to
$$\gamma(s) = \sigma^2 \sum_{l=-q}^{q} \psi(l)\, \frac{\Gamma(1-2d)\,\Gamma(d+s-l)}{\Gamma(d)\,\Gamma(1-d)\,\Gamma(1-d+s-l)}.$$
¹If $p = 0$, the term in brackets equals one; this reduces to the integral evaluated in appendix 2.
Also, if $p = 0$, the $j$-indexed sum and the terms $\zeta_j$ will not appear in the expression of the spectral
density.
The integral evaluated in appendix 2 is
$$\frac{1}{2\pi} \int_0^{2\pi} (1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}\, e^{-i\lambda h}\, d\lambda = \frac{\Gamma(1-2d)\,(-1)^h}{\Gamma(1-d-h)\,\Gamma(1-d+h)}.$$
Evaluating the integral in eq. (7) term by term gives
$$C(d,h,\rho) = \Gamma(1-2d) \left[ \rho^{2p} \sum_{m=0}^{\infty} \frac{\rho^m\,(-1)^{h+m}}{\Gamma(1-d+h+m)\,\Gamma(1-d-h-m)} + \sum_{n=1}^{\infty} \frac{\rho^n\,(-1)^{h-n}}{\Gamma(1-d+h-n)\,\Gamma(1-d-h+n)} \right] \tag{9}$$
$$= \frac{\Gamma(1-2d)\,\Gamma(d+h)}{\Gamma(1-d+h)\,\Gamma(1-d)\,\Gamma(d)} \left[ \rho^{2p} F(d+h, 1; 1-d+h; \rho) + F(d-h, 1; 1-d-h; \rho) - 1 \right],$$
where $F$ is the hypergeometric function (see appendix 3).
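The closed form is straightforward to evaluate with a hypergeometric routine. The following Python sketch is a transcription of the formula above (the author's implementation is in FORTRAN; this version is illustrative only and ignores edge cases such as gamma arguments at non-positive integers):

```python
# Sketch: the function C(d, h, rho) in its hypergeometric closed form.
import numpy as np
from scipy.special import gamma, hyp2f1

def C(d, h, rho, p):
    pre = gamma(1 - 2 * d) * gamma(d + h) \
          / (gamma(1 - d + h) * gamma(1 - d) * gamma(d))
    return pre * (rho ** (2 * p) * hyp2f1(d + h, 1, 1 - d + h, rho)
                  + hyp2f1(d - h, 1, 1 - d - h, rho) - 1)

print(C(0.3, 2, 0.5, 1))
```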
²A FORTRAN program that calculates the maximum likelihood estimates is available from
the author.
³These comments apply not only to the maximum likelihood procedure but also to the
procedure presented in Fox and Taqqu (1986).
Procedure 1. First estimate $d$ using the GPH procedure (or the R/S procedure,
etc.). Start $d$ at the estimated value, and start the initial ARMA parameters
and the innovation variance at the parameters estimated by NSPE for the series
$(1-L)^{\hat d} z_t$, where the transformation $(1-L)^{\hat d}$ is applied by the truncated
polynomial.
For small samples Procedure 1 suffers from the fact that the individual
estimates of $d$ are extremely imprecise, while Procedure 2 suffers from the
imprecise estimate of the innovation variance. Experience has shown that in
sample sizes as large as 200, Procedure 1 and Procedure 2 (using NSPE)
yield comparable final parameter estimates. In practice, it is important to
choose several different starting values to be confident that the global
maximum has been achieved.
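Once the autocovariances $\gamma(0), \ldots, \gamma(T-1)$ are available, one likelihood evaluation amounts to a quadratic form and a determinant of the Toeplitz covariance matrix. The sketch below is the textbook $O(T^3)$ Cholesky version, not the more efficient scheme the paper discusses:

```python
# Sketch: exact Gaussian log-likelihood of z given autocovariances acov[0..T-1].
import numpy as np
from scipy.linalg import toeplitz, cho_factor, cho_solve

def exact_loglik(z, acov):
    T = len(z)
    Sigma = toeplitz(acov[:T])                 # covariance of (z_1, ..., z_T)
    c, low = cho_factor(Sigma, lower=True)
    logdet = 2 * np.sum(np.log(np.diag(c)))
    quad = z @ cho_solve((c, low), z)
    return -0.5 * (T * np.log(2 * np.pi) + logdet + quad)
```

In a maximum likelihood search, this function would be re-evaluated at each trial value of $(d, \phi, \theta, \sigma^2)$, with the autocovariances recomputed from eq. (8).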
⁴'All conditions are fulfilled for fractional Gaussian noise and fractional ARMA-processes'
[Dahlhaus (1989, p. 1751)].
⁵For the GPH procedure $\alpha$ was set equal to $\frac{1}{2}$.
Table 1
Estimated parameter bias and square root of the mean squared error for the ARIMA(0,d,0)
model (T = 100, 100 replications).

               BIAS                        √MSE
  d      MAX      F&T      GPH       MAX      F&T      GPH
 -0.4   -0.017   -0.043    0.032     0.101    0.104    0.301
 -0.3   -0.013   -0.044    0.020     0.089    0.100    0.327
 -0.2   -0.007   -0.040    0.013     0.077    0.088    0.311
 -0.1   -0.009   -0.055    0.016     0.079    0.096    0.274
  0.0   -0.009   -0.048    0.002     0.079    0.094    0.316
  0.1   -0.017   -0.057    0.010     0.067    0.086    0.308
  0.2   -0.012   -0.046    0.013     0.092    0.122    0.299
  0.3   -0.015   -0.056   -0.032     0.077    0.097    0.326
  0.4   -0.018   -0.034    0.085     0.060    0.097    0.254
Table 2
Estimated parameter bias and square root of the mean squared error for the ARIMA(1,d,0) model (T = 100, 100 replications).

                           BIAS                                     √MSE
               MAX          F&T         GPH          MAX          F&T         GPH
  d     φ     d̂      φ̂      d̂      φ̂      d̂         d̂      φ̂      d̂      φ̂      d̂
 -0.3  -0.7   0.102  0.138  -0.035  0.042  0.311     0.225  0.233  0.258  0.232  0.345
 -0.3  -0.2  -0.118 -0.094  -0.408 -0.337  0.046     0.242  0.264  0.352  0.342  0.296
 -0.3   0.3  -0.037 -0.047  -0.130 -0.128  0.040     0.185  0.206  0.296  0.297  0.334
 -0.3   0.8  -0.021 -0.017  -0.047 -0.034  0.013     0.095  0.062  0.114  0.060  0.318
  0.0  -0.7   0.002  0.022  -0.108 -0.050  0.283     0.172  0.157  0.142  0.093  0.308
  0.0  -0.2  -0.100 -0.064  -0.325 -0.253  0.055     0.235  0.248  0.352  0.325  0.292
  0.0   0.3  -0.019 -0.006  -0.121 -0.088 -0.017     0.102  0.114  0.229  0.235  0.301
  0.0   0.8   0.000  0.005  -0.046  0.024 -0.009     0.088  0.064  0.102  0.070  0.314
  0.3  -0.7  -0.137 -0.085  -0.145 -0.059  0.283     0.155  0.124  0.147  0.103  0.276
  0.3  -0.2  -0.096 -0.081  -0.521 -0.452  0.069     0.197  0.232  0.303  0.289  0.275
  0.3   0.3  -0.026 -0.017  -0.109 -0.088 -0.010     0.096  0.125  0.240  0.248  0.274
  0.3   0.8  -0.033 -0.018  -0.073 -0.035 -0.020     0.089  0.067  0.103  0.072  0.305
Table 3
Estimated parameter bias and square root of the mean squared error for the ARIMA(0,d,1) model (T = 100, 100 replications).

                           BIAS                                     √MSE
               MAX          F&T         GPH          MAX          F&T         GPH
  d     θ     d̂      θ̂      d̂      θ̂      d̂         d̂      θ̂      d̂      θ̂      d̂
 -0.3  -0.8  -0.029  0.019  -0.094  0.142 -0.242     0.215  0.193  0.260  0.262  0.311
 -0.3  -0.3   0.016 -0.019  -0.135  0.125 -0.004     0.228  0.248  0.210  0.212  0.300
 -0.3   0.5  -0.016  0.012  -0.070  0.032  0.044     0.096  0.111  0.105  0.109  0.278
 -0.3   0.9  -0.010  0.001  -0.048 -0.030 -0.013     0.097  0.048  0.103  0.068  0.259
  0.0  -0.8  -0.077  0.067  -0.328  0.295 -0.461     0.223  0.206  0.353  0.352  0.282
  0.0  -0.3  -0.019 -0.004  -0.190  0.156 -0.086     0.198  0.237  0.192  0.182  0.307
  0.0   0.5  -0.013  0.012  -0.064  0.027 -0.015     0.100  0.118  0.119  0.117  0.275
  0.0   0.9   0.012 -0.011  -0.037 -0.052  0.017     0.080  0.049  0.100  0.079  0.295
  0.3  -0.8  -0.180  0.173  -0.515  0.481 -0.486     0.245  0.259  0.277  0.305  0.281
  0.3  -0.3  -0.064  0.059  -0.178  0.166 -0.091     0.142  0.183  0.185  0.197  0.294
  0.3   0.5  -0.012  0.023  -0.050  0.030  0.044     0.088  0.109  0.117  0.118  0.320
  0.3   0.9  -0.007 -0.005  -0.043  0.064  0.039     0.065  0.048  0.096  0.075  0.303
model. For the ARIMA(0,d,1) model the smallest bias was achieved by
MAX for eleven of the models and by GPH for the other model. The √MSE
of the d parameter tended to be smallest for the maximum likelihood
estimator. The √MSE for the maximum likelihood estimates was always
smaller than for the GPH estimates, and for 21 of the 24 models considered
the √MSE for MAX was smaller than for F&T. Generally, the maximum
likelihood estimator was superior to the procedures presented in Fox and
Taqqu (1986) and in Geweke and Porter-Hudak (1983).
Table 4

T = 100
0   0.08  0.08  0.10  0.23  0.24  0.35     0.01  0.03  0.07  0.13  0.22  0.34
1   0.11  0.11  0.13  0.25  0.27  0.38     0.01  0.03  0.07  0.13  0.22  0.34
2   0.14  0.15  0.18  0.30  0.33  0.47     0.03  0.03  0.07  0.12  0.21  0.34
3   0.29  0.30  0.33  0.47  0.50  0.63     0.06  0.06  0.06  0.12  0.22  0.34
4   0.32  0.34  0.37  0.51  0.55  0.69     0.12  0.12  0.12  0.12  0.21  0.34
5   0.49  0.50  0.54  0.70  0.74  0.89     0.21  0.22  0.21  0.21  0.22  0.34

T = 150
0   0.17  0.17  0.20  0.32  0.33  0.46     0.01        0.10  0.19  0.33  0.52
1   0.20  0.21  0.23  0.36  0.90  0.49     0.01  0.04  0.10  0.19  0.33  0.52
2   0.25  0.26  0.28  0.42  0.45  0.59     0.04  0.04  0.10  0.19  0.33  0.52
3   0.40  0.42  0.44  0.59  0.64  0.78     0.09  0.09  0.10  0.19  0.32  0.51
4   0.45  0.47  0.50  0.66  0.72  0.87     0.18  0.18  0.19  0.19  0.32  0.51
5   0.63  0.66  0.70  0.87  0.94  1.10     0.32  0.32  0.32  0.32  0.32  0.52

T = 200
0   0.31  0.31  0.33  0.46  0.47  0.59     0.01  0.05  0.14  0.25  0.43  0.68
1   0.34  0.34  0.37  0.50  0.53  0.65     0.01  0.05  0.14  0.25  0.43  0.68
2   0.40  0.41  0.43  0.59  0.64  0.77     0.05  0.05  0.13  0.25  0.43  0.68
3   0.57  0.58  0.61  0.78  0.83  0.99     0.12  0.12  0.13  0.25  0.43  0.68
4   0.62  0.64  0.69  0.86  0.93  1.11     0.24  0.24  0.25  0.25  0.43  0.68
5   0.81  0.84  0.92  1.08  1.16  1.37     0.42  0.42  0.42  0.43  0.43  0.68
⁶The computations reported in this paper were performed in double precision VAX FORTRAN
on a VAX workstation 3200.
Appendix 1
$$\sum_{n=0}^{\infty} |\rho|^n\, \frac{1}{2\pi} \int_0^{2\pi} \left| (1-e^{-i\lambda})^{-d} (1-e^{i\lambda})^{-d} \right| d\lambda = \frac{1}{1-|\rho|}\, \frac{1}{2\pi} \int_0^{2\pi} \left| (1-e^{-i\lambda})^{-d} (1-e^{i\lambda})^{-d} \right| d\lambda.$$
The integrand is either bounded or has poles at $0$ and $2\pi$. At the poles the
integrand behaves like $\lambda^{-2d}$ near zero, which is integrable for $d < \frac{1}{2}$.
Appendix 2
$$\frac{1}{2\pi} \int_0^{2\pi} (1-e^{-i\lambda})^{-d}(1-e^{i\lambda})^{-d}\, e^{i\lambda h}\, d\lambda = \frac{\Gamma(1-2d)\,(-1)^h}{\Gamma(1-d-h)\,\Gamma(1-d+h)}.$$
Appendix 3
Some of the autocovariances are written in terms of the function $C(d, h, \rho)$.
The variable $h$ can take on any integer value, $\rho$ can be any complex number in the
unit circle, while $d$ is restricted to real values on the open interval
$(-\frac{1}{2}, \frac{1}{2})$. An alternative form of the function, which is easier to evaluate, will be
derived using the following definition and equation. The hypergeometric
function is defined by
$$F(a, b; c; x) = \sum_{n=0}^{\infty} \frac{(a)_n\,(b)_n}{(c)_n}\, \frac{x^n}{n!}, \qquad (a)_n = \frac{\Gamma(a+n)}{\Gamma(a)}.$$
For all real values of $y$ and integer values of $n$, as long as the arguments of
the gamma functions are not negative integers,
$$\frac{\Gamma(y+n)}{\Gamma(y)} = \frac{\Gamma(1-y)\,(-1)^n}{\Gamma(1-y-n)}. \tag{10}$$
This follows from eq. (8.334.3) of Gradshteyn and Ryzhik (1980) and because
$\sin(\pi(n+y)) = (-1)^n \sin(\pi y)$ for real $y$ and integer $n$.
The terms in the first sum of eq. (9) can be rewritten using eq. (10),
$$\frac{(-1)^{h+m}}{\Gamma(1-d+h+m)\,\Gamma(1-d-h-m)} = \frac{(-1)^h}{\Gamma(1-d-h)\,\Gamma(1-d+h)} \left[ \frac{\Gamma(d+h+m)\,\Gamma(1-d+h)}{\Gamma(d+h)\,\Gamma(1-d+h+m)} \right],$$
and the terms in the second sum as
$$\frac{(-1)^{h-n}}{\Gamma(1-d+h-n)\,\Gamma(1-d-h+n)} = \frac{(-1)^h}{\Gamma(1-d-h)\,\Gamma(1-d+h)} \left[ \frac{\Gamma(d-h+n)\,\Gamma(1-d-h)}{\Gamma(d-h)\,\Gamma(1-d-h+n)} \right].$$
Using these relations, eq. (9) can be written
$$C(d,h,\rho) = \frac{\Gamma(1-2d)\,(-1)^h}{\Gamma(1-d-h)\,\Gamma(1-d+h)} \left[ \rho^{2p} \sum_{m=0}^{\infty} \frac{\rho^m\,\Gamma(d+h+m)\,\Gamma(1-d+h)}{\Gamma(1-d+h+m)\,\Gamma(d+h)} + \sum_{n=1}^{\infty} \frac{\rho^n\,\Gamma(d-h+n)\,\Gamma(1-d-h)}{\Gamma(d-h)\,\Gamma(1-d-h+n)} \right]$$
$$= \frac{\Gamma(1-2d)\,\Gamma(d+h)}{\Gamma(1-d+h)\,\Gamma(1-d)\,\Gamma(d)} \left[ \rho^{2p} F(d+h, 1; 1-d+h; \rho) + F(d-h, 1; 1-d-h; \rho) - 1 \right].$$
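A quick numerical cross-check of the two forms (illustrative parameter values; the series are truncated at 100 terms, which is ample for $|\rho| = 0.5$):

```python
# Check: the truncated series of eq. (9) against the hypergeometric closed form.
import numpy as np
from scipy.special import gamma, hyp2f1

d, h, rho, p = 0.3, 2, 0.5, 1

pre = gamma(1 - 2 * d) * (-1) ** h / (gamma(1 - d - h) * gamma(1 - d + h))
m_sum = sum(rho ** m * gamma(d + h + m) * gamma(1 - d + h)
            / (gamma(1 - d + h + m) * gamma(d + h)) for m in range(100))
n_sum = sum(rho ** n * gamma(d - h + n) * gamma(1 - d - h)
            / (gamma(d - h) * gamma(1 - d - h + n)) for n in range(1, 100))
series = pre * (rho ** (2 * p) * m_sum + n_sum)

closed = (gamma(1 - 2 * d) * gamma(d + h)
          / (gamma(1 - d + h) * gamma(1 - d) * gamma(d))) \
         * (rho ** (2 * p) * hyp2f1(d + h, 1, 1 - d + h, rho)
            + hyp2f1(d - h, 1, 1 - d - h, rho) - 1)

print(series, closed)                          # the two values agree
```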
References
Ansley, Craig F. and Paul Newbold, 1980, Finite sample properties of estimators for autoregres-
sive moving average models, Journal of Econometrics 13, 159-183.
Aydogan, Kursat and G. Geoffrey Booth, 1988, Are there long cycles in common stock returns?,
Southern Economic Journal, 141-149.
Booth, G. Geoffrey, Fred R. Kaen, and Peter E. Koveos, 1982, R/S analysis of foreign exchange
rates under two international monetary regimes, Journal of Monetary Economics 10, 407-415.
Box, G.E.P. and G.M. Jenkins, 1976, Time series analysis: Forecasting and control (Holden-Day,
San Francisco, CA).
Brockwell, Peter J. and Richard A. Davis, 1987, Time series: Theory and methods (Springer-
Verlag, New York, NY).
Cheung, Yin-Wong, 1989, Long-term memory in foreign exchange rates, Unpublished manuscript
(Department of Economics, University of Pennsylvania, Philadelphia, PA).
Dahlhaus, Rainer, 1988, Small sample effects in time series analysis: A new asymptotic theory
and a new estimate, Annals of Statistics 16, 808-841.
Dahlhaus, Rainer, 1989, Efficient parameter estimation for self-similar processes, Annals of
Statistics 17, 1749-1766.
Diebold, Francis X. and Glenn D. Rudebusch, 1989, Long memory and persistence in aggregate
output, Journal of Monetary Economics 24, 189-209.
Erdelyi, Arthur et al., 1953, Higher transcendental functions, Vol. I (McGraw-Hill, New York,
NY).
Fox, Robert and Murad S. Taqqu, 1986, Large-sample properties of parameter estimates for
strongly dependent stationary Gaussian time series, Annals of Statistics 14, 517-532.
Geweke, John F. and Susan Porter-Hudak, 1983, The estimation and application of long memory
time series models, Journal of Time Series Analysis 4, 221-238.
Gourieroux, Christian, Francoise Maurel, and Alain Monfort, 1987, Regression and non station-
arity, Working paper no. 8708 (Institut National de la Statistique et des Etudes Economiques,
France).
Gradshteyn, I.S. and I.M. Ryzhik, 1980, Tables of integrals, series and products (Academic
Press, New York, NY).
Granger, C.W.J., 1980, Long memory relationships and the aggregation of dynamic models,
Journal of Econometrics 14, 227-238.
Granger, C.W.J. and Roselyne Joyeux, 1980, An introduction to long-memory time series models
and fractional differencing, Journal of Time Series Analysis 1, 15-29.
Greene, Myron T. and Bruce D. Fielitz, 1977, Long-term dependence in common stock returns,
Journal of Financial Economics 4, 339-349.
Hannan, E.J. and Manfred Deistler, 1988, The statistical theory of linear systems (Wiley, New
York, NY).
Harvey, A.C. and G.D.A. Phillips, 1979, Maximum likelihood estimation of regression models
with autoregressive-moving average disturbances, Biometrika 66, 49-58.
Haubrich, Joseph G. and Andrew W. Lo, 1989, The sources and nature of long-term memory in
the business cycle, Working paper (Department of Finance, Wharton School, University of
Pennsylvania, Philadelphia, PA).
Helms, Billy P., Fred R. Kaen, and Robert E. Rosenman, 1984, Memory in commodity futures
contracts, Journal of Futures Markets 4, 559-567.
Hillmer, Steven C. and George C. Tiao, 1979, Likelihood function of stationary multiple
autoregressive moving average models, Journal of the American Statistical Association 74,
652-660.
Hosking, J.R.M., 1981, Fractional differencing, Biometrika 68, 165-176.
Hosking, J.R.M., 1984, Modeling persistence in hydrological time series using fractional
differencing, Water Resources Research 20, 1898-1908.
Janacek, G.J., 1982, Determining the degree of differencing for time series via the log
spectrum, Journal of Time Series Analysis 3, 177-183.
Lo, Andrew, 1989, Long-term memory in stock market prices, Working paper 3014-89-EFA
(Alfred P. Sloan School of Management, Massachusetts Institute of Technology, Cambridge,
MA).