Kathrin Glau
Matthias Scherer
Rudi Zagst Editors
Innovations in
Quantitative
Risk
Management
Springer Proceedings in Mathematics & Statistics
Volume 99
Springer Proceedings in Mathematics & Statistics
This book series features volumes composed of select contributions from work-
shops and conferences in all areas of current research in mathematics and statistics,
including OR and optimization. In addition to an overall evaluation of the interest,
scientific quality, and timeliness of each proposal at the hands of the publisher,
individual contributions are all refereed to the high quality standards of leading
journals in the field. Thus, this series provides the research community with well-
edited, authoritative reports on developments in the most exciting areas of math-
ematical and statistical research today.
Kathrin Glau
Matthias Scherer
Rudi Zagst
Editors
Innovations in Quantitative
Risk Management
TU München, September 2013
Editors

Kathrin Glau
Chair of Mathematical Finance
Technische Universität München
Garching
Germany

Matthias Scherer
Chair of Mathematical Finance
Technische Universität München
Garching
Germany

Rudi Zagst
Chair of Mathematical Finance
Technische Universität München
Garching
Germany
Dear Reader,
We would like to thank you very much for studying the proceedings volume
of the conference “Risk Management Reloaded”, which took place in Garching-
Hochbrück, during September 9–13, 2013. This conference was organized by the
KPMG Center of Excellence in Risk Management and the Chair of Mathematical
Finance at Technische Universität München. The scientific committee consisted
of Prof. Claudia Klüppelberg, Prof. Matthias Scherer, Prof. Wim Schoutens, and
Prof. Rudi Zagst. Selected speakers were approached to contribute a manuscript to this proceedings volume. We are grateful for the large number of high-
quality submissions and would like to especially thank the many referees that
helped to control and even improve the quality of the presented papers.
The objective of the conference was to bring together leading researchers and
practitioners from all areas of quantitative risk management to take advantage of the
presented methodologies and practical applications. With more than 200 registered
participants (about 40 % practitioners) and 80 presentations, we exceeded our
own expectations for this inaugural event. The broad variety of topics is also
reflected in the long list of keynote speakers and their presentations: Prof. Hansjörg
Albrecher (risk management in insurance), Dr. Christian Bluhm (credit-risk mod-
eling in risk management), Prof. Fabrizio Durante (dependence modeling in risk
management), Dr. Michael Kemmer (regulatory developments in risk manage-
ment), Prof. Rüdiger Kiesel (model risk for energy markets), Prof. Ralf Korn (new
mathematical developments in risk management), Prof. Alfred Müller (new risk
measures), Prof. Wim Schoutens (model, calibration, and parameter risk), and Prof.
Josef Zechner (risk management in asset management). Besides many invited and
contributed talks, the conference participants especially enjoyed a vivid panel
discussion titled “Quo vadis quantitative risk management?” with Dr. Christopher
Lotz, Dr. Matthias Mayer, Vassilios Pappas, Prof. Luis Seco, and Dr. Daniel
Sommer as participants and Markus Zydra serving as anchorman. Moreover,
we had a special workshop on copulas (organized by Prof. Fabrizio Durante and
Prof. Matthias Scherer), a DGVFM workshop on “Alternative interest guarantees in
life insurance” (organized by Prof. Ralf Korn and Prof. Matthias Scherer),
Kathrin Glau
Matthias Scherer
Rudi Zagst
A Random Holding Period Approach for Liquidity-Inclusive Risk Management

D. Brigo, Department of Mathematics, Imperial College London, 180 Queen's Gate, London SW7 2AZ, UK, e-mail: [email protected]
C. Nordio, Risk Management, Banco Popolare, Milan, Italy, e-mail: [email protected]

1 Introduction
According to the Interaction between Market and Credit Risk (IMCR) research group
of the Basel Committee on Banking Supervision (BCBS) [5], liquidity conditions
interact with market risk and credit risk through the horizon over which assets can
be liquidated. To face the impact of market liquidity risk, risk managers agree on adopting a longer holding period to calculate the market VaR, for instance 10 business days instead of 1; recently, the BCBS has prudentially stretched this liquidity horizon
to 3 months [6]. However, even the IMCR group pointed out that the liquidity of
traded products can vary substantially over time and in unpredictable ways, and
moreover, IMCR studies suggest that banks’ exposures to market risk and credit
risk vary with liquidity conditions in the market. The former statement suggests a
stochastic description of the time horizon over which a portfolio can be liquidated,
and the latter highlights a dependence issue.
We can start by saying that probably the holding period of a risky portfolio is
neither 10 business days nor 3 months; it could, for instance, be 10 business days
with probability 99 % and 3 months with probability 1 %. This is a very simple
assumption, but it may already have interesting consequences. Indeed, given the FSA
(now Bank of England) requirement to justify liquidity horizon assumptions for the
Incremental Risk Charge modeling, a simple example with the two-point liquidity
horizon distribution that we develop below could be interpreted as a mixture of
the distribution under normal conditions and of the distribution under stressed and
rare conditions. In the following we will assume no transaction costs, in order to
fully represent the liquidity risk through the holding period variability. Indeed, if
we introduce a process describing the dynamics of such liquidity conditions, for
instance,
• the process of time horizons over which the risky portfolio can be fully bought or
liquidated,
then the P&L is better defined by the returns calculated over such stochastic time
horizons instead of over a fixed horizon (say a daily, weekly, or monthly basis). We will
use the “stochastic holding period” (SHP) acronym for that process, which belongs
to the class of positive processes largely used in mathematical finance. We define
the liquidity-adjusted VaR or Expected Shortfall (ES) of a risky portfolio as the VaR
or ES of portfolio returns calculated over the horizon defined by the SHP process,
which is the ‘operational time’ along which the portfolio manager must operate, in
contrast to the ‘calendar time’ over which the risk manager usually measures VaR.
risk and propose a liquidity adjusted VaR measure built using the distribution of the
bid-ask spreads. The other mentioned studies model and account for endogenous risk
in the calculation of liquidity adjusted risk measures. In the context of the coherent
risk measures literature, the general axioms a liquidity measure should satisfy are
discussed in [1]. In that work coherent risk measures are defined on the vector space
of portfolios (rather than on portfolio values). A key observation is that the portfolio
value can be a nonlinear map on the space of portfolios, motivating the introduction
of a nonlinear value function depending on a notion of liquidity policy based on a
general description of the microstructure of illiquid markets.
As mentioned earlier, bid-ask spreads have been used to assess liquidity risk.
While bid-ask spreads are certainly an important measure of liquidity, they are not
the only one. In the Credit Default Swap (CDS) space, for example, Predescu et al.
[22] have built a statistical model that associates an ordinal liquidity score with
each CDS reference entity. The liquidity score is built using well-known liquidity
indicators such as the already mentioned bid-ask spreads but also using other less
accessible predictors of market liquidity such as number of active dealers quoting
a reference entity, staleness of quotes of individual dealers, and dispersion in mid-
quotes across market dealers. The bid-ask spread is used essentially as an indicator of
market breadth; the presence of orders on both sides of the trading book corresponds
to tighter bid-ask spreads. Dispersion of mid-quotes across dealers is a measure
of price uncertainty about the actual CDS price. Less liquid names are generally
associated with more price uncertainty and thus large dispersion. The third liquidity
measure that is used in Predescu et al. [22] aggregates the number of active dealers and
the individual dealers’ quote staleness into an (in)activity measure, which is meant
to be a proxy for CDS market depth. Illiquidity increases if any of the liquidity
predictors increases, keeping everything else constant. Therefore, liquid (less liquid)
names are associated with smaller (larger) liquidity scores. CDS liquidity scores are
now offered commercially by Fitch Solutions and as of 2009 provided a comparison
of relative liquidity of over 2,400 reference entities in the CDS market globally,
mainly concentrated in North America, Europe, and Asia. The model estimation and
the model generated liquidity scores are based upon the Fitch CDS Pricing Service
database, which includes single-name CDS quotes on over 3,000 entities, corporates,
and sovereigns across about two dozen broker-dealers back to 2000. This approach
and the related results, further highlighting the connection between liquidity and
credit quality/rating, are summarized in [14], who further review previous research
on liquidity components in the pricing space for CDS.
Given the above indicators of liquidity risk, the SHP process seems to be naturally
associated with the staleness/inactivity measure. However, one may argue that the
random holding period also embeds market impact and bid-ask spreads. Indeed,
traders will consider closing a position or a portfolio also in terms of cost. If bid-
ask spreads cause the immediate closure of a position to be too expensive, market
operators might wait for bid-asks to move. This will impact the holding period for
the relevant position. If we take for granted that the risk manager will not try to
model the detailed behavior of traders, then the stochastic holding period becomes a
reduced-form process for the risk manager, which will possibly encapsulate a number
The Basel Committee came out with a recommendation on multiple holding periods
for different risk factors in 2012 in [7]. This document states that
The Committee is proposing that varying liquidity horizons be incorporated in the market
risk metric under the assumption that banks are able to shed their risk at the end of the
liquidity horizon.[...]. This proposed liquidation approach recognises the dynamic nature of
banks trading portfolios but, at the same time, it also recognises that not all risks can be
unwound over a short time period, which was a major flaw of the 1996 framework.
Further on, in Annex 4, the document details a sketch of a possible solution: assign
a different liquidity horizon to risk factors of different types. While this is a step
forward, it can be insufficient. How is one to decide the horizon for each risk factor,
and especially how is one to combine the different estimates for different horizons
for assets in the same portfolio into a consistent and logically sound way? Our
random holding period approach allows one to answer the second question, but more
generally none of the above works focuses specifically on our setup with random
holding period, which represents a simple but powerful idea to include liquidity in
traditional risk measures such as Value at Risk or Expected Shortfall. Our idea was
first proposed in 2010 in [13].
When analyzing multiple positions, holding periods can be taken to be strongly
dependent, in line with the first classification (a) of Bangia et al. [4] above, or
independent, so as to fit the second category (b). We will discuss whether adding
dependent holding periods to different positions can actually add dependence to the
position returns.
The paper is organized as follows. In order to illustrate the SHP model, first in
a univariate case (Sect. 2) and then in a bivariate one (Sect. 3), it is considerably
easier to focus on examples based on (log)normal processes. A brief colloquial hint at positive processes is presented in Sect. 2, to deepen the intuition of the impact on risk measures of introducing an SHP process. Across Sects. 3 and 4, where we try
to address the issue of calibration, we outline a possible multivariate model which
could be adopted, in principle, in a top-down approach to risk integration in
order to include the liquidity risk and its dependence on other risks.
Finally, we point out that this paper is meant as a proposal to open a research
effort in stochastic holding period models for risk measures. This paper contains
several suggestions on future developments, depending on an increased availability
of market data. The core ideas on the SHP framework, however, are presented in this
opening paper.
Let us suppose that we have to calculate the VaR of a market portfolio whose value
at time t is Vt . We call X t = ln Vt , so that the log return on the portfolio value at
time t over a period h is
$$X_{t+h} - X_t = \ln\left(V_{t+h}/V_t\right) \approx \frac{V_{t+h} - V_t}{V_t}.$$
In order to include liquidity risk, the risk manager decides that a realistic, simplified statistic of the holding period in the future will be the one given in Table 1. To
estimate liquidity-adjusted VaR say at time 0, the risk manager will perform a number
of simulations of V0+H0 − V0 with H0 randomly chosen by the statistics above, and
finally will calculate the desired risk measure from the resulting distribution. If
the log-return $X_T - X_0$ is normally distributed with zero mean and variance $T$ for deterministic $T$ (e.g., a Brownian motion, i.e., a random walk), then the risk manager could simplify the simulation using
$$X_{0+H_0} - X_0 \mid H_0 \;\overset{d}{\sim}\; \sqrt{H_0}\,\left(X_1 - X_0\right),$$
where $\mid H_0$ denotes “conditional on $H_0$”. With this practical exercise in mind, let us generalize
this example to a generic t.
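To make this recipe concrete, the following sketch simulates a liquidity-adjusted VaR and ES for a two-point holding-period distribution of the kind discussed above (10 business days with probability 99 % and 75 business days with probability 1 %). The daily volatility, the zero drift and the confidence level are illustrative assumptions of ours and are not taken from Table 1 or Table 2.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative assumptions (not the calibration used in the text):
sigma_daily = 0.01                    # daily log-return volatility
horizons = np.array([10.0, 75.0])     # holding periods in business days
probs = np.array([0.99, 0.01])        # two-point SHP distribution
conf = 0.9996
n_sims = 1_000_000

# Draw a random holding period H0 per scenario, then use the sqrt(H0)
# scaling of a zero-mean one-day Gaussian log-return, as described above.
H0 = rng.choice(horizons, size=n_sims, p=probs)
returns = np.sqrt(H0) * sigma_daily * rng.standard_normal(n_sims)

var = -np.quantile(returns, 1.0 - conf)     # liquidity-adjusted VaR
es = -returns[returns <= -var].mean()       # liquidity-adjusted ES
print(f"VaR({conf:.2%}) = {var:.4f}, ES({conf:.2%}) = {es:.4f}")
```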
A process for the risk horizon at time t, i.e., t → Ht , is a positive stochastic process
modeling the risk horizon over time. We have that the risk measure at time t will be
taken on the change in value of the portfolio over this random horizon. If X t is the
log-value of the portfolio at time t, we have that the risk measure at time t is to be
taken on the log-return
X t+Ht − X t .
For example, if one uses a 99 % Value at Risk (VaR) measure, this will be the 1st
percentile of X t+Ht − X t . The request that Ht be just positive means that the horizon
at future times can both increase and decrease, meaning that liquidity can vary in
both directions.
There are a large number of choices for positive processes: one can take lognormal
processes with or without mean reversion, mean reverting square root processes,
squared Gaussian processes, all with or without jumps. This allows one to model the
holding period dynamics as mean reverting or not, continuous or with jumps, and
with thinner or fatter tails. Other examples are possible, such as Variance Gamma or
mixture processes, or Lévy processes. See, for example, [11, 12].
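As a small illustration of one such choice, the sketch below simulates a mean-reverting lognormal holding-period process, i.e. the exponential of an Ornstein-Uhlenbeck process; the parameters are our own illustrative assumptions and no calibration is implied.

```python
import numpy as np

rng = np.random.default_rng(1)

# H_t = exp(Y_t), with Y_t a mean-reverting Ornstein-Uhlenbeck process;
# this is one of the positive (lognormal, mean-reverting) choices above.
kappa, theta, eta = 2.0, np.log(10.0), 0.5   # illustrative speed, level, vol
dt, n_steps = 1.0 / 250.0, 250               # daily steps over one year

y = np.empty(n_steps + 1)
y[0] = np.log(10.0)                          # start from a 10-day horizon
for i in range(n_steps):
    y[i + 1] = y[i] + kappa * (theta - y[i]) * dt \
               + eta * np.sqrt(dt) * rng.standard_normal()

H = np.exp(y)                                # positive holding-period path (days)
print(f"min {H.min():.1f}, mean {H.mean():.1f}, max {H.max():.1f} days")
```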
Going back to our notation, VaRt,h,c and ESt,h,c are the value at risk and expected
shortfall, respectively, for a horizon h at confidence level c at time t, namely
$$P\left(X_{t+h} - X_t > -\mathrm{VaR}_{t,h,c}\right) = c, \qquad \mathrm{ES}_{t,h,c} = -E\left[X_{t+h} - X_t \mid X_{t+h} - X_t \le -\mathrm{VaR}_{t,h,c}\right].$$
We now recall the standard result on VaR and ES under Gaussian returns in
deterministic calendar time.
Assuming that
$$X_{t+h} - X_t \ \text{is normally distributed with mean}\ \mu_{t,h}\ \text{and standard deviation}\ \sigma_{t,h}, \qquad (1)$$
we obtain
$$\mathrm{VaR}_{t,h,c} = -\mu_{t,h} + \Phi^{-1}(c)\,\sigma_{t,h}, \qquad \mathrm{ES}_{t,h,c} = -\mu_{t,h} + \sigma_{t,h}\,\frac{p\left(\Phi^{-1}(c)\right)}{1-c},$$
where p is the standard normal probability density function and Φ is the standard
normal cumulative distribution function.
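As a quick numerical check of these closed-form expressions, one can evaluate them directly; the mean, volatility and confidence level below are illustrative values of ours, not figures from the text.

```python
from scipy.stats import norm

def gaussian_var_es(mu, sigma, c):
    """VaR and ES of a Gaussian P&L with mean mu and standard deviation sigma."""
    var = -mu + norm.ppf(c) * sigma
    es = -mu + sigma * norm.pdf(norm.ppf(c)) / (1.0 - c)
    return var, es

# Illustrative assumption: zero mean, 10-day sigma of 3 %, 99 % confidence.
print(gaussian_var_es(mu=0.0, sigma=0.03, c=0.99))
```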
In the following we will calculate VaR and Expected Shortfall at a confidence level of 99.96 %, over the fixed time horizons of 10 and 75 business days, and under the SHP process with statistics given by Table 1, using Monte Carlo
simulations. Each year has 250 (working) days. The results are presented in Table 2.
More generally, we may derive the VaR and ES formulas for the case where $H_t$ is distributed according to a general distribution,
$$P(H_t \le h) = F_{H,t}(h), \qquad \text{and} \qquad P(X_{t+h} - X_t \le x) = F_{X,t,h}(x).$$
Definition 1 (VaR and ES under Stochastic Holding Period) We define VaR and ES
under a random horizon Ht at time t and for a confidence level c as
$$P\left(X_{t+H_t} - X_t > -\mathrm{VaR}_{H,t,c}\right) = c, \qquad \mathrm{ES}_{H,t,c} = -E\left[X_{t+H_t} - X_t \mid X_{t+H_t} - X_t \le -\mathrm{VaR}_{H,t,c}\right].$$
We point out that the order of time/confidence/horizon arguments in the VaR and
ES definitions is different in the Stochastic Holding Period case. This is to stress the
different setting with respect to the fixed holding period case.
We have immediately the following:
$$\int_0^\infty P\left(X_{t+h} - X_t > -\mathrm{VaR}_{H,t,c}\right) dF_{H,t}(h) = c,$$
$$\mathrm{ES}_{H,t,c} = \frac{1}{1-c}\int_0^\infty -E\left[X_{t+h} - X_t \,\middle|\, X_{t+h} - X_t \le -\mathrm{VaR}_{H,t,c}\right] P\left(X_{t+h} - X_t \le -\mathrm{VaR}_{H,t,c}\right) dF_{H,t}(h),$$
so that, under the Gaussian assumption (1),
$$\int_0^\infty \Phi\left(\frac{\mu_{t,h} + \mathrm{VaR}_{H,t,c}}{\sigma_{t,h}}\right) dF_{H,t}(h) = c,$$
$$\mathrm{ES}_{H,t,c} = \frac{1}{1-c}\int_0^\infty \left[-\mu_{t,h}\,\Phi\left(\frac{-\mu_{t,h} - \mathrm{VaR}_{H,t,c}}{\sigma_{t,h}}\right) + \sigma_{t,h}\, p\left(\frac{-\mu_{t,h} - \mathrm{VaR}_{H,t,c}}{\sigma_{t,h}}\right)\right] dF_{H,t}(h).$$
Notice that in general one can try and obtain the quantile VaR H,t,c for the random
horizon case by using a root search, and subsequently compute also the expected
shortfall. Careful numerical integration is needed to apply these formulas for general
distributions of Ht . The case of Table 2 is somewhat trivial, since in the case where
H0 is as in Table 1 integrals reduce to summations of two terms.
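The sketch below is one possible implementation of that root-search idea for a discrete holding-period distribution, for which the integrals above collapse into finite sums; the zero means and the square-root-of-time scaling of a daily volatility are illustrative assumptions of ours, not the authors' code.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Illustrative assumptions: zero means, sqrt-of-time scaling of a daily vol.
sigma_daily = 0.01
horizons = np.array([10.0, 75.0])    # two-point SHP in business days
probs = np.array([0.99, 0.01])
c = 0.9996

mu_h = np.zeros_like(horizons)
sigma_h = sigma_daily * np.sqrt(horizons)

def coverage(var):
    # sum over h of P(X_{t+h} - X_t > -VaR) weighted by the SHP probabilities
    return np.sum(probs * norm.cdf((mu_h + var) / sigma_h))

# Root search: find VaR_{H,t,c} such that the mixture coverage equals c.
var_H = brentq(lambda v: coverage(v) - c, 0.0, 10.0)

# Expected Shortfall from the mixture formula above.
z = (-mu_h - var_H) / sigma_h
es_H = np.sum(probs * (-mu_h * norm.cdf(z) + sigma_h * norm.pdf(z))) / (1.0 - c)
print(f"VaR_H = {var_H:.4f}, ES_H = {es_H:.4f}")
```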
We note also that the maximum difference, both in relative and absolute terms,
between ES and VaR is reached by the model under random holding period H0 .
Under this model the change in portfolio value shows heavier tails than under a single
deterministic holding period. In order to explore the impact of SHP’s distribution tails
on the liquidity-adjusted risk, in the following we will simulate SHP models with
H0 distributed as an Exponential, an Inverse Gamma,1 and a Generalized Pareto distribution.
Within multivariate modeling, using a common SHP for many normally distributed
risks leads to dynamical versions of the so-called normal mixtures and normal mean-
variance mixtures [19].
Assumption 2 In this section we assume that different assets have the same random
holding period, thus testing an extreme liquidity dependence scenario. We will briefly
discuss relaxing this assumption at the end of this section. We further assume that
the stochastic holding period process is independent of the log returns of assets in
deterministic calendar time.
Let the log returns (recall $X^i_t = \ln V^i_t$, with $V^i_t$ the value at time $t$ of the $i$th asset)
$$\left(X^1_{t+h} - X^1_t, \ldots, X^m_{t+h} - X^m_t\right)$$
be jointly distributed in deterministic calendar time. Under the common stochastic holding period $H_t$ we then have
$$P\left(X^1_{t+H_t} - X^1_t < x_1, \ldots, X^m_{t+H_t} - X^m_t < x_m\right) = \int_0^\infty P\left(X^1_{t+h} - X^1_t < x_1, \ldots, X^m_{t+h} - X^m_t < x_m\right) dF_{H,t}(h).$$
1 For the Inverse Gamma distribution with shape α and scale k: the smaller the α, the fatter the tails. The mean is, if α > 1, E[H0] = k/(α − 1).
For a portfolio with weights $w_1, \ldots, w_m$, the portfolio log-return under the common SHP then satisfies
$$P\left(X_{t+H_t} - X_t < z\right) = \int_0^\infty P\left(w_1\left(X^1_{t+h} - X^1_t\right) + \cdots + w_m\left(X^m_{t+h} - X^m_t\right) < z\right) dF_{H,t}(h).$$
0
In particular, in analogy with the unidimensional case, the mixture may potentially
generate skewed and fat-tailed distributions, but when working with more than one
asset this has the further implication that VaR is not guaranteed to be subadditive on
the portfolio. Then the risk manager who wants to take into account SHP in such a
setting should adopt a coherent measure like Expected Shortfall.
A natural question at this stage is whether the adoption of a common SHP can add
dependence to returns that are jointly Gaussian under deterministic calendar time,
perhaps to the point of making extreme scenarios on the joint values of the random
variables possible.
Before answering this question, one needs to distinguish extreme behavior in the
single variables and in their joint action in a multivariate setting. Extreme behavior
on the single variables is modeled, for example, by heavy tails in the marginal dis-
tributions of the single variables. Extreme behavior in the dependence structure of,
say, two random variables is achieved when the two random variables tend to take
extreme values in the same direction together. This is called tail dependence, and one
can have both upper tail dependence and lower tail dependence. More precisely, but
still loosely speaking, tail dependence expresses the limiting proportion according
to which the first variable exceeds a certain level given that the second variable has
already exceeded that level. Tail dependence is technically defined through a limit,
so that it is an asymptotic notion of dependence. For a formal definition we refer,
for example, to [19]. “Finite” dependence, as opposed to tail, between two random
variables is best expressed by rank correlation measures such as Kendall’s tau or
Spearman’s rho.
We discuss tail dependence first. In case the returns of the portfolio assets are
jointly Gaussian with correlations smaller than one, the adoption of a common ran-
dom holding period for all assets does not add tail dependence, unless the commonly
adopted random holding period has a distribution with power tails. Hence, if we
want to rely on one of the random holding period distributions in our examples
above to introduce upper and lower tail dependence in a multivariate distribution for
the assets returns, we need to adopt a common random holding period for all assets
that is Pareto or Inverse Gamma distributed. Exponentials, Lognormals, or discrete
Bernoulli distributions would not work. This can be seen to follow from properties of
the normal variance-mixture model, see for example [19], p. 212 and also Sect. 7.3.3.
A more specific theorem that fits our setup is Theorem 5.3.8 in [23]. We can write
it as follows with our notation.
Proposition 3 (A common random holding period with less than power tails does
not add tail dependence to jointly Gaussian returns) Assume the log returns to be
$W^i_t = \ln V^i_t$, with $V^i_t$ the value at time $t$ of the $i$th asset, $i = 1, 2$, where
$$\left(W^1_{t+h} - W^1_t,\; W^2_{t+h} - W^2_t\right)$$
are two correlated Brownian motions, i.e., normals with zero means, variances $h$, and instantaneous correlation less than 1 in absolute value:
$$d\langle W^1, W^2\rangle_t = dW^1_t\, dW^2_t = \rho_{1,2}\, dt, \qquad |\rho_{1,2}| < 1.$$
Then, for a common nonnegative random holding period $H_0$ independent of $(W^1, W^2)$, the pair
$$\left(W^1_{H_0},\; W^2_{H_0}\right)$$
has tail dependence if and only if $\sqrt{H_0}$ is regularly varying at $\infty$ with index $\alpha > 0$.
Theorem 5.3.8 in [23] also reports an expression for the tail dependence coeffi-
cients as functions of α and of the survival function of the student t distribution with
α + 1 degrees of freedom.
Summarizing, if we work with power tails, the heavier the tails of the common
holding period process H , the more one may expect tail dependence to emerge for the
multivariate distribution: by adopting a common SHP for all risks, dependence could
potentially appear in the whole dynamics, in agreement with the fact that liquidity
risk is a systemic risk.
We now turn to finite dependence, as opposed to tail dependence. First, we note the
well-known elementary but important fact that one can have two random variables
with very high dependence but without tail dependence. Or one can have two random
variables with tail dependence but small finite dependence. For example, if we take
two jointly Gaussian random variables with correlation 0.999999, they are clearly
quite dependent on each other but they will not have tail dependence, even if a
rank correlation measure such as Kendall’s τ would be 0.999, still very close to 1,
characteristic of the co-monotonic case. This is a case with zero tail dependence but
very high finite dependence. On the other hand, take a bivariate student t distribution
with few degrees of freedom and correlation parameter ρ = 0.1. In this case the two
random variables have positive tail dependence and it is known that Kendall’s tau
for the two random variables is
$$\tau = \frac{2}{\pi}\arcsin(\rho) \approx 0.06,$$
which is the same tau one would get for two standard jointly Gaussian random
variables with correlation ρ. This tau is quite low, showing that one can have positive
tail dependence while having very small finite dependence.
The above examples point out that one has to be careful in distinguishing large
finite dependence and tail dependence.
A further point of interest in the above examples comes from the fact that the
multivariate student t distribution can be obtained from the multivariate Gaussian distribution when adopting a random holding period given by an inverse gamma dis-
tribution (power tails). We deduce the important fact that in this case a common
random holding period with power tails adds positive tail dependence but not finite
dependence.
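The simulation sketch below, with illustrative parameters of our own choosing, contrasts a common Inverse Gamma holding period (power tails) with a common Lognormal one for jointly Gaussian daily returns: in both cases the estimated Kendall's tau stays close to the Gaussian benchmark (2/π) arcsin(ρ), while joint extreme co-movements tend to be more frequent in the Inverse Gamma case, in line with the tail-dependence discussion above.

```python
import numpy as np
from scipy.stats import kendalltau, invgamma, lognorm

rng = np.random.default_rng(0)
n, rho = 200_000, 0.5
chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))

def subordinated_returns(h):
    """Jointly Gaussian unit-variance daily returns scaled by a common horizon h."""
    z = rng.standard_normal((n, 2)) @ chol.T
    return np.sqrt(h)[:, None] * z

def joint_tail_freq(x, q=0.995):
    """Empirical proportion of joint exceedances of the q-quantiles, scaled by 1-q."""
    u = np.quantile(x, q, axis=0)
    return np.mean((x[:, 0] > u[0]) & (x[:, 1] > u[1])) / (1.0 - q)

samples = {
    "Inverse Gamma (power tails)": invgamma.rvs(a=2.0, scale=10.0, size=n, random_state=rng),
    "Lognormal (no power tails)": lognorm.rvs(s=0.5, scale=10.0, size=n, random_state=rng),
}
for name, h in samples.items():
    x = subordinated_returns(h)
    tau, _ = kendalltau(x[:, 0], x[:, 1])
    print(f"{name}: tau = {tau:.3f} "
          f"(Gaussian benchmark {2 / np.pi * np.arcsin(rho):.3f}), "
          f"joint tail frequency = {joint_tail_freq(x):.3f}")
```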
In fact, one can prove a more general result easily by resorting to the tower property
of conditional expectation and from the definition of tau based on independent copies
of the bivariate random vector whose dependence is being measured. One has the
following “no go” theorem for increasing Kendall’s tau of jointly Gaussian returns
through common random holding periods, regardless of the tail’s power.
Proposition 4 (A common random holding period does not alter Kendall’s tau for
jointly Gaussian returns) Assumptions as in Proposition 3 above. Then adding a
common nonnegative random holding period H0 independent of W ’s leads to the
same Kendall’s tau for
W H1 0 , W H2 0
a consistent effort on theoretical and statistical studies. This will possibly result in
available synthetic indices of liquidity risk grouped by region, market, instrument
type, etc. For instance, Fitch already calculates market liquidity indices on CDS
markets worldwide, on the basis of a proprietary scoring model [14].
5 Conclusions
Within the context of risk integration, in order to include liquidity risk in the whole
portfolio risk measures, a stochastic holding period (SHP) model can be useful,
being versatile, easy to simulate, and easy to understand in its inputs and outputs.
In a single-portfolio framework, as a consequence of introducing an SHP model, the
statistical distribution of the P&L moves to possibly heavier-tailed and skewed mixture distributions. In a multivariate setting, the dependence among the SHP processes to which the marginal P&Ls are subordinated may lead to dependence on the latter under drastic choices of the SHP distribution, and in general to heavier tails on the total P&L distribution.
3 A similar approach is adopted in [21] within the context of operational risk modeling.
Acknowledgments This paper reflects the authors’ opinions and not necessarily those of their
current and past employers. The authors are grateful to Dirk Tasche for helpful suggestions and
correspondence on a first version, to Giacomo Scandolo for further comments and correspondence,
and to an anonymous referee and the Editors for important suggestions that helped in improving
the paper.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Acerbi, C., Scandolo, G.: Liquidity risk theory and coherent measures of risk. Quant. Financ.
8(7), 681–692 (2008)
2. Alexander, C.: Normal mixture diffusion with uncertain volatility: modelling short and long-
term smile effects. J. Bank. Financ. 28(12), 2957–2980 (2004)
3. Angelidis, T., Benos, A.: Liquidity adjusted value-at-risk based on the components of the
bid-ask spread. Working paper, available at http://ssrn.com/abstract=661281 (2005)
4. Bangia, A., Diebold, F.X., Schuermann, T., Stroughair, J.D.: Modeling liquidity risk with impli-
cations for traditional market risk measurement and management. Working paper, Financial
Institutions Center at The Wharton School (1999)
5. Basel Committee on Banking Supervision. Findings on the interaction of market and credit
risk, BCBS working paper No 16 May 2009, available at http://www.bis.org (2009)
6. Basel Committee on Banking Supervision. Guidelines for computing capital for incremental
risk in the trading book, BCBS guidelines, July 2009, available at http://www.bis.org (2009)
7. Basel Committee on Banking Supervision. Consultative document: fundamental review of the
trading book, BIS, May 2012, available at http://www.bis.org/publ/bcbs219.pdf (2012)
8. Bhupinder, B.: Implied risk-neutral probability density functions from option prices: a cen-
tral bank perspective. In: Knight, J., Satchell, S. (eds.) Forecasting Volatility in the Financial
Markets, pp. 137–167. Butterworth Heinemann, Oxford (1998)
9. Brigo, D., Mercurio, F.: Displaced and mixture diffusions for analytically-tractable smile mod-
els. In: Geman, H., Madan, D.B., Pliska, S.R., Vorst, A.C.F. (eds.) Mathematical Finance-
Bachelier Congress 2000. Springer, Berlin (2001)
10. Brigo, D., Mercurio, F., Rapisarda, F.: Smile at uncertainty, risk, May issue (2004)
11. Brigo, D., Dalessandro, A., Neugebauer, M., Triki, F.: A stochastic processes toolkit for risk
management: geometric Brownian motion, jumps, GARCH and variance gamma models. J.
Risk Manag. Financ. Inst. 2(4), 365–393 (2009)
12. Brigo, D., Dalessandro, A., Neugebauer, M., Triki, F.: A stochastic processes toolkit for risk
management: mean reverting processes and jumps. J. Risk Manag. Financ. Inst. 3, 1 (2009)
13. Brigo, D., Nordio, C.: Liquidity-adjusted market risk measures with stochastic holding period.
Available at http://arxiv.org/abs/1009.3760 and http://ssrn.com/abstract=1679698 (2010)
14. Brigo, D., Predescu, M., Capponi, A.: Liquidity modeling for credit default swaps: an overview.
In: Bielecki, Brigo, Patras (eds.) Credit Risk Frontiers: Sub prime crisis, Pricing and Hedging,
CVA, MBS, Ratings and Liquidity, pp. 587–617. Wiley/Bloomberg Press, See also http://ssrn.
com/abstract=1564327 (2010)
15. Ernst, C., Stange, S., Kaserer, C.: Accounting for non-normality in liquidity risk. Available at
http://ssrn.com/abstract=1316769 (2009)
16. Guo, C.: Option pricing with heterogeneous expectations. Financ. Rev. 33, 81–92 (1998)
17. Jarrow, R., Subramanian, A.: Mopping up liquidity. RISK 10(10), 170–173 (1997)
18. Jarrow, R., Protter, P.: Liquidity Risk and Risk Measure Computation. Working paper, Cornell
University (2005)
19. McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management. Princeton University
Press, New Jersey (2005)
20. Melick, W.R., Thomas, C.P.: Recovering an asset’s implied PDF from option prices: an appli-
cation to crude oil during the Gulf crisis. J. Financ. Quant. Anal. 32, 91–115 (1997)
21. Moscadelli, M.: The modelling of operational risk: experience with the analysis of the data
collected by the Basel Committee, Banca d'Italia, Temi di discussione del Servizio Studi,
Number 517 (July 2004), available at http://www.bancaditalia.it/pubblicazioni/econo/temidi
(2004)
22. Predescu, M., Thanawalla, R., Gupton, G., Liu, W., Kocagil, A., Reyngold, A.: Measuring CDS
liquidity. Fitch Solutions Presentation at the Bowles Symposium, Georgia State University,
12(2009)
23. Prestele, C.: Credit portfolio modelling with elliptically contoured distributions. Approxima-
tion, Pricing, Dynamisation. Doctoral dissertation, Ulm University. Available at http://vts.uni-
ulm.de/docs/2007/6093/vts_6093_8209.pdf (2007)
24. Ritchey, R.J.: Call option valuation for discrete normal mixtures. J. Financ. Res. 13, 285–296
(1990)
25. Stange, S., Kaserer, C.: Why and how to integrate liquidity risk into a VaR-framework. CEFS
working paper 2008 No. 10, available at http://ssrn.com/abstract=1292289 (2008)
26. Tasche, D.: Measuring sectoral diversification in an asymptotic multi-factor framework. J.
Credit Risk 2(3), 33–55 (2006)
27. Tasche, D.: Capital allocation to business units and sub-portfolios: the Euler principle. In: Resti,
A., (ed.) Pillar II in the New Basel Accord: The Challenge of Economic Capital, Risk Books,
pp. 423–453 (2008)
28. Tasche, D.: Capital allocation for credit portfolios with kernel estimators. Quant. Financ. 9(5),
581–595 (2009)
Regulatory Developments in Risk
Management: Restoring Confidence
in Internal Models
U. Gaumert and M. Kemmer

Abstract The paper deals with the question of how to restore lost confidence in
the results of internal models (especially market risk models). This is an impor-
tant prerequisite for continuing to use these models as a basis for calculating risk-
sensitive prudential capital requirements. The authors argue that restoring confidence
is feasible. Contributions to this end will be made both by the reform of regulatory
requirements under Basel 2.5 and the Trading Book Review and by refinements of
these models by the banks themselves. By contrast, capital requirements calculated
on the basis of a leverage ratio and prudential standardised approaches will not be
sufficient, even from a regulatory perspective, owing to their substantial weaknesses.
Specific proposals include standardising models with a view to reducing complexity
and enhancing comparability, significantly improving model validation and increas-
ing transparency as to how model results are determined, also over time. The article
reflects the personal views of the authors.
1 Introduction
Since 1997 (“Basel 1.5”), banks in Germany have been allowed to calculate their
capital requirements for the trading book using internal value-at-risk (VaR) models
that have passed a comprehensive and stringent supervisory vetting and approval
process. Basel II and Basel III saw the introduction of further internal models com-
plementing the standardised approaches already available—take, for example, the
internal ratings-based (IRB) approach for credit risk under Basel II and the advanced
credit valuation adjustment (CVA) approach for counterparty risk under Basel III.
During the financial crisis, particular criticism was directed at internal market risk
models, the design of which supervisors largely left to the banks themselves. This
article therefore confines itself to examining these models, which are a good starting
point for explaining and commenting on the current debate. Much of the following
applies to other types of internal models as well.
Banks and supervisors learned many lessons from the sometimes unsatisfactory
performance of VaR models in the crisis—one of the root causes of the loss of confi-
dence by investors in model results. This led, at bank level, to a range of improvements
in methodology, and also to the realisation that not all products and portfolios lend
themselves to internal modelling. At supervisory level, Basel 2.5 ushered in an initial
reform with rules that were much better at capturing extreme risks (tail risks) and
that increased capital requirements at least threefold. Work on a fundamental trading
book review (Basel 3.5), which will bring further methodological improvements to
regulatory requirements, is also underway.
Nevertheless, models are still criticised as being
• too error-prone,
• suitable only for use in “fair-weather” conditions,
• too variable in their results when analysing identical risks,
• insufficiently transparent for investors and
• manipulated by banks, with the tacit acceptance of supervisors, with the aim of
reducing their capital requirements.
As a result, the credibility of model results and thus their suitability for use as a
basis for calculating capital requirements have been challenged. This culminated
in, for example, the following statement by the academic advisory board at the
German Ministry for Economic Affairs: “Behind these flaws (in risk modelling)1
lie fundamental problems that call into question the system of model-based capital
regulation as a whole.”2 It therefore makes good sense to explore the suitability of
possible alternatives. The authors nevertheless conclude that model-based capital
charges should be retained. But extensive efforts are needed to restore confidence in
model results.
The market disruption which accompanied the start of the financial crisis in the second half of 2007 took the form, in banks' trading units, of sharply falling prices with a corresponding impact on their daily P&Ls, after a prolonged phase of low volatility.
Uncertainty grew rapidly about the accuracy of estimated probabilities of default,
default correlations of the underlying loans and the scale of loss in the event of default,
and thus also about the probabilities of default and recovery rates of the securitisation
instruments. This in turn caused spreads to widen, volatility to increase and market
liquidity for securitisation products to dry up. A major exacerbating factor was that
many market participants responded in the same way (“flight to simplicity”, “flight
to quality”). Later on, there were also jump events such as downgrades. Calibrating
the above parameters proved especially problematic since there was often a lack of
historical default or market data. Unlike in the period before the crisis, even AAA-
rated senior or super senior tranches of securitisation instruments, which only start
to absorb loss much later than their riskier counterparts, suffered considerably in
value as the protective cushion of more junior tranches melted away, necessitating
substantial write-downs.3
The performance of internal market-risk models was not always satisfactory, espe-
cially in the second half of 2007 and in the “Lehman year” of 2008. In this period,
a number of banks found that the daily loss limits forecast by their models were
sometimes significantly exceeded (backtesting outliers).4 The performance results
of Deutsche Bank, for instance, show that losses on some sub-portfolios were evi-
dently serious enough to have an impact on the overall performance of the bank’s
trading unit. This demonstrates the extremely strong market disruption which can
follow an external shock. When backtesting a model’s performance, the current
clean P&L—P&Lt —is compared with the previous day’s VaR forecast VaRt−1 .5 At
a confidence level of 99 %, an average of two to three outliers a year may be antic-
ipated over the long term (representing 1 % of 250–260 trading days a year). In the
years between 2007 and 2013, Deutsche Bank had 12, 35, 1, 2, 3, 2 and 2 outliers.6
Although the models’ performance for 2007 and 2008 looks bad at first sight, the
question nevertheless arises as to whether or not these outliers are really the models’
“fault”, so to speak. By their very nature, models can only do what they have been
designed to do: “If you’re in trouble, don’t blame your model.” To function properly,
the models needed liquid markets, adequate historical market data and total coverage
of all market risks, particularly migration and default risk. These prerequisites were
not always met by markets and banks. Anyone using a model has to be aware of its
limitations and exercise caution when working with its results.
Even Germany’s Federal Financial Supervisory Authority BaFin pointed out that,
given the extreme combination of circumstances on the market in connection with
the financial crisis, the figures do not automatically lead to the conclusion that the
predictive quality of the models is inadequate.7 The example could indicate that,
outliers are based on “clean” P&L data. This inconsistency was eliminated in 2010. Dirty and clean
P&L figures may differ. This is because clean P&L simply shows end-of-day positions revalued
using prices at the end of the following trading day, whereas dirty P&L also includes income from
intraday trading, fees and commissions and interest accrued.
6 [11], Management Report, 2007, p. 88, 2008, p. 98, 2009, p. 85, 2010, p. 95, 2011, p. 104, 2012,
since 2009, the bank has been successful in eliminating its model weaknesses, at
least at the highest portfolio level. It should nevertheless be borne in mind that market
phases analysed after 2008 were sometimes quieter and that there has also been some
reduction in risk. The increasing shift in the nature of the financial crisis from 2010
towards a crisis concerning the creditworthiness of peripheral European countries,
which created new market disruption, is most certainly reflected at the highest level of
the backtesting time series. Particularly large losses were incurred in March and May
2010, though only those in May 2010 led to the two outliers realised that year. These outliers
may be explained by the fears brewing at the time about the situation of the PIIGS
states. Possibly, the scale of the corresponding trading activities was such that any
problems with the models for these sub-portfolios made themselves felt at the highest
portfolio level. The weaknesses outlined below were, by the banks' own testimony,
identified and rapidly addressed.8 As mentioned above, two to three outliers per year
represent the number to be expected and are not sufficient, in themselves, to call the
quality of modelling into question.
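One way to make this expectation precise is a simple binomial test of the observed number of backtesting outliers against the 1 % rate implied by a correct 99 % VaR model. The sketch below is our illustration rather than part of the original article; it assumes a 250-day year and uses the outlier counts quoted above.

```python
from scipy.stats import binom

def outlier_p_value(exceptions, days=250, p=0.01):
    """P(observing at least this many outliers) if the 99 % VaR model is correct."""
    return binom.sf(exceptions - 1, days, p)

print("expected outliers per year:", 250 * 0.01)

# Counts quoted in the text for 2007-2013:
for year, k in zip(range(2007, 2014), [12, 35, 1, 2, 3, 2, 2]):
    print(year, k, f"p-value = {outlier_p_value(k):.2e}")
```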
The flaws banks identified in their models following the outbreak of the crisis
revealed that a variety of areas needed work and improvement. These improvements
have since been carried out. Some examples of model weaknesses which banks have
now resolved are9 :
1. No coverage of default-risk driven “jump events”, such as rating changes and
issuer defaults. At the outbreak of the crisis, models often failed to cover the
growing amount of default risk in the trading book. The introduction of IRC
models10 to cover migration and default risk helped to overcome this.
2. Insufficient coverage of market liquidity risk. It was often not possible to liquidate
or hedge positions within the ten-day holding period assumed under Basel 1.5.
This led to risks being underestimated. Basel 2.5 takes account of market liquidity
risk explicitly and in a differentiated way, at least for IRC models. Full coverage
will be achieved under Basel 3.5.
3. Slow response to external shocks (outlier clustering). The introduction of stress
VaR under Basel 2.5 went a long way towards eliminating the problem of under-
estimating risks in benign market conditions. Historical market data for “normal
VaR” are now adjusted daily, while monthly or quarterly adjustments were the
norm before the crisis.
4. Insufficient consideration of the risk factors involved in securitisation. As a result,
models designed for securitisation portfolios may no longer be used to calculate
capital charges (with the exception of the correlation trading portfolio). Even
before the rule change, some banks had already decided themselves to stop using
these models.
5. Flawed proxy approaches. Prior to the crisis, it was often possible to assign a
newly introduced product to an existing one and assume the market risk would
behave in the same way. During the crisis, this assumption proved to be flawed.11
The supervisory treatment of such approaches is now much more restrictive.
6. The approximation of changes in the price of financial instruments cannot accom-
modate large price movements (delta-gamma approximations; see the sketch after this list). Full revaluation
of instruments is now standard practice.
7. Missing or flawed scaling to longer time horizons. Scaling practices of this kind,
such as square-root-of-time scaling, are now subject to prudential requirements
to ensure their suitability.
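As flagged in item 6 above, the sketch below compares a delta-gamma approximation of a European call's price change with a full Black-Scholes revaluation for small and large moves in the underlying; the option parameters are illustrative assumptions of ours, not taken from the article.

```python
import numpy as np
from scipy.stats import norm

def bs_call(S, K, T, r, vol):
    """Black-Scholes price, delta and gamma of a European call."""
    d1 = (np.log(S / K) + (r + 0.5 * vol**2) * T) / (vol * np.sqrt(T))
    d2 = d1 - vol * np.sqrt(T)
    price = S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
    return price, norm.cdf(d1), norm.pdf(d1) / (S * vol * np.sqrt(T))

S0, K, T, r, vol = 100.0, 100.0, 0.25, 0.01, 0.30   # illustrative parameters
p0, delta, gamma = bs_call(S0, K, T, r, vol)

for shock in (-0.30, -0.10, 0.10, 0.30):            # +/-10 % and +/-30 % moves
    dS = S0 * shock
    full = bs_call(S0 + dS, K, T, r, vol)[0] - p0    # full revaluation P&L
    approx = delta * dS + 0.5 * gamma * dS**2        # delta-gamma P&L
    print(f"move {shock:+.0%}: full {full:+.2f}, delta-gamma {approx:+.2f}")
```

For the ±10 % moves the two numbers are close, while for the ±30 % moves the delta-gamma figure deviates substantially, which is the weakness that the move to full revaluation addresses.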
These problems were the basis of the review of market risk rules under Basel 2.5
and, as described above, were able to be eliminated both by banks themselves and by
new supervisory requirements.12 Despite this large-scale and appropriate response,
distrust of internal model results and their use for prudential purposes persisted,
leading to further fundamental discussions.13
This continuing distrust at the most senior level of the Basel Committee14 led to the
commissioning of the Standards Implementation Group for Market Risk (SIG-TB) to
compare the results generated by the internal models of various banks when applied
to the same hypothetical trading portfolios (hypothetical portfolio exercise). A major
point of criticism has always been that internal model results are too variable even if
the risks involved are the same. In January 2013, the SIG-TB published its analysis.15
The following factors were identified as the key drivers of variation:
• The legal framework: some of the banks in the sample did not have to apply Basel
2.5. This means the US banks, for instance, supplied data from models that had
neither been implemented nor approved. Analysis showed that some of these banks
had significantly overestimated risk, though this did not, in practice, translate into
higher capital requirements.
• National supervisory rules for calculating capital requirements: differences were
noted, for example, in the multipliers set by supervisors for converting model
results into capital requirements. In addition, some supervisors had already
imposed restrictions on the type of model that could be used and/or set specific
capital add-ons.
• Legitimate modelling decisions taken by the banks: among the most important
of these was the choice of model (spread-based, transition matrix-based) in the
absence of a market standard for modelling rating migration and default risk (IRC models).
11 [18], p. 133.
12 [21], pp. 59 ff., [25], p. 39.
13 Cf. Sect. 3.
14 The precise reasons for this distrust at senior level are not known.
15 Cf. [6].
3.1 Overview
Given the difficulties associated with modelling and the variation in results, it is
legitimate to ask whether model-based, risk-sensitive capital charges should be
dropped altogether. Such a step would, moreover, significantly simplify regulation.
But it could also be asked whether it would not make more sense to address the
undoubted weaknesses of internal models by means of the reforms already in place
or in the pipeline without “throwing the baby out with the bath water”, i.e., should we not try to learn from past mistakes instead of just giving up? These questions can best
be answered systematically by examining to what extent the existing regulatory pro-
posals could, together or on their own, replace model-based capital charges. There
are essentially two alternatives under discussion:
• dropping risk-sensitive capital charges and introducing a leverage ratio as the sole
“risk metric”;
• regulatory standardised approaches: applying risk-sensitive capital charges while
abandoning model-based ones.
Standardised approaches, i.e. approaches which spell out in detail how to calculate
capital requirements on the basis of prudential algorithms (“supervisory models”),
will always be needed for smaller banks which cannot or do not wish to opt for internal
models. But larger banks need standardised approaches too—as a fallback solution
if their internal models are or become unsuitable for all or for certain portfolios.
Having said that, a standardised approach alone is by no means sufficient for larger
banks; the reasons are as follows30 :
• It is invariably true of a standardised approach that “one size does not fit all banks”.
Since a standardised approach is not tailored to an individual bank’s portfolio
structure, it cannot measure certain risks (such as certain basis risks) or can only
30 Cf. [19], p. 37.
31 One example: when supervisors set risk factors in the standardised approach model, basis risk is
often ignored because different risk factors are (and must be) mapped to the same regulatory risk
factor. This is part of the model simplification process. It is often easy to design a trade to exploit
the “difference”.
32 The outlined shortcomings of standardised approaches also mean they have only limited suitability
as a floor for model-based capital requirements. Contrary to what is sometimes claimed, model risk
would therefore not be reduced by a floor.
that, together or separately, a leverage ratio and standardised approaches are inappro-
priate and insufficient from a supervisory perspective. Internal models must remain
the first choice. Nevertheless, confidence in internal models needs to be significantly
strengthened.
4.1 Overview
33 For example: the range of multipliers (“3 + x” multiplier), which convert model results into
capital requirements, and the reasons for their application differ widely from one jurisdiction to
another.
34 The Basel Committee is already trying to find a balance between the objectives of “risk sensitiv-
ity”, “complexity” and “comparability”. Standardisation has the potential to reduce the complexity
of internal models and increase their comparability. Against that, increasing the complexity of
standardised approaches often improves comparability. See [5, 22] on the balancing debate.
• Standardised models can pose a threat to financial stability because they encourage
all banks to react in the same way (herd behaviour). Model diversity is a desirable
phenomenon from a prudential point of view since it generates less procyclicality.
• Standardised models would frequently be unsuitable for internal use at larger
banks, which would consequently need to develop alternative models for internal
risk management purposes. As a result, the regulatory model would be maintained
purely for prudential purposes (in violation of the use test; see below). This would
encourage strategies aimed at reducing capital requirements since the results of
this model would not have to, and could not, be used internally.
• It is therefore in the nature of models that a certain amount of variation will
inevitably exist.
Nonetheless, it is most certainly possible to standardise models in a way which will
reduce their complexity and improve the comparability of their results but will not
compromise their suitability for internal use. Here are a few suggestions35 :
• Develop a market standard for IRC models to avoid variation as a result of differ-
ences in the choice of model (proposed standard established by supervisors: see
Trading Book Review).
• Reduce the amount of flexibility in how historical data are used. For the standard
VaR, one year should be not just the minimum but both the minimum and max-
imum period. This may well affect different banks in different ways, sometimes
increasing capital requirements and sometimes reducing them.
• Standardise the stress period for stressed VaR. The period should be set by super-
visors instead of being selected by banks. True, this means the stress period would
no longer be optimally suited to the individual portfolio in question. But as the
study by the Basel Committee's SIG-TB has shown, similar periods may, as a
result of the financial crisis, be considered relevant at the highest portfolio level—
namely the second half of 2008 (including Lehman insolvency) and the first half
of 2009.36
Much could also be done to improve transparency. Banks could disclose their
modelling methodologies in greater detail, and explain—for example—why changes
made to their models have resulted in reduced capital charges. Transparency of this
kind will significantly benefit informed experts and analysts. These experts will then
be faced with the difficult challenge of preparing their analyses in such a way as to
be accessible to the general public. The public at large cannot be expected to be the
primary addressees of a bank’s disclosures. Someone without specialist knowledge is
unlikely to be able to understand a risk report, for instance. Nor is it the task of banks
35 Cf. [23].
36 Cf. [6], p. 50.
to write their reports in a manner that makes such specialist knowledge unnecessary.
This is, however, by no means an argument against improving transparency.
The work of the Enhanced Disclosure Task Force (EDTF) is also a welcome
contribution37 and some banks have already implemented its recommendations in
their trading units voluntarily. The slide from Deutsche Bank's presentation for
analysts on 31 January 2013 is just one illustration.38 This explains, in particular, the
changes in market-risk-related RWAs (mRWA flow), i.e. it is made clear what brought
about the reduction in capital requirements in the trading area. The reasons include
reduced multipliers (for converting model results into capital requirements) on the
back of significantly better review results, approval of models (IRB approach, IMM)
for some additional products and the consideration of additional netting agreements
and collateral in calculations of capital requirements.
Another possible means of improving transparency would be to disclose the his-
tory of individual positions with a certain time lag. Serious discussion is nevertheless
called for to determine at what point the additional cost of transparency incurred by
banks would exceed the additional benefit for stakeholders. From an economic per-
spective, this may be regarded as a transparency ceiling.
37 Cf. [13]. Recommendations for market risk (nos 22–25), cf. pp. 12, 51–55.
38 Cf. [12], p. 23.
39 Cf. [2, 3].
40 Cf. [1], p. 203.
• Model validation will take place at desk level and become even more stringent
through backtesting and a new P&L attribution process. This will significantly
improve the validation process. At the same time, it will have the effect of raising
the barriers to obtaining supervisory approval of internal models.
• All banks using models will also have to calculate requirements using the stan-
dardised approach. Supervisors take the view that the standardised approach can
serve as a floor, or even a benchmark, for internal models (the level of the floor has
not yet been announced). This may provide a further safety mechanism to avoid
underestimating risk, even if the standardised approach does not always produce
sound results (see above).
Up to now, approval of internal models has been dependent, among other things,
on supervisors being convinced that the model is really used for internal risk man-
agement purposes. Banks consequently have to demonstrate that the model they
have submitted for supervisory approval is their main internal risk management tool.
Basically, they have to prove that the internal model used to manage risk is largely
identical to the model used to calculate capital charges (use test). The rationale behind
this sensible supervisory requirement is that the quality of these risk measurement
systems can best be ensured over time if the internal use of the model results is an
absolute prerequisite of supervisory approval. As a result of the use test, the bank’s
own interests are linked to the quality of the model. The design of the model should
on no account be driven purely by prudential requirements. Moreover, the reply to
the question of how model results are used for internal risk management purposes
shows what shape the bank’s “risk culture” is in.
The use test concept has been undermined, however, by a development towards
more prudentially driven models which began under Basel 2.5 and is even more
pronounced under Basel 3.5. This trend should be reversed. At a minimum, the core
of the model should be usable internally—that is to say be consistent with the bank’s
strategies for measuring risk. Conservative adjustments can then be made outside the
core.
and IRC models. Though validation standards already exist for IRC models, they
can by no means be described as comprehensive.
For normal market risk models, a comprehensive approach going beyond purely
quantitative backtesting and the P&L attribution process could be supported by banks
themselves. Proposals to this effect are already on the table at the Federal Financial
Supervisory Authority (BaFin).42 It would be worth examining whether the minimum
requirements for the IRB approach could make an additional contribution. These
minimum requirements already pursue a comprehensive quantitative and qualitative
approach to validation, though a number of problems would first need to be resolved
before these requirements could be applied to the area of market risk.43
A further approach might be to quantify and capitalise model risk either in the form
of a capital surcharge on model results under pillar 1 or as an additional risk category
under pillar 2.
It would be worthwhile discussing the idea of using the range of diverging results
from the hypothetical portfolio exercise (see Sect. 2.2) as a quantitative basis for
individual capital surcharges. This may be regarded as prudential benchmarking.44
The portfolios tested in this exercise do not, however, correspond to banks’ real
individual portfolios, which makes them a questionable basis for individual capital
surcharges. As explained above in Sect. 2.2, moreover, it cannot be concluded that
the differences are largely due to model weaknesses. The question of how to derive
the differences actually due to model risk from the observed “gross” differences is
yet to be clarified and will probably be fraught with difficulties. What is more, model
risk is not reflected solely in differences between model results (see below on the nature
of model risk, which also covers, for example, the inappropriate use of model results
and the flawed management decisions this can produce).
This raises the question as to whether it may be better to address model risk under
pillar 2. If model risk is assumed to arise, first, when statistical models are not used
properly and, second, from an inevitable uncertainty surrounding key features of
models, then it is likely to be encountered above all in the areas of
• design (model assumptions concerning the distribution of market risk parameters
or portfolio losses, for example),
• implementation (e.g. the approximation assumptions necessary for IT purposes),
• internal processes (e.g. complete and accurate coverage of positions, capture of
market data, valuation models at instrument level [see below]) and IT systems
used by banks to estimate risk, and
• model use.45
The authors take the view that solving the question of how to quantify model risk for
the purpose of calculating capital charges is a process very much in its infancy and
that it is consequently too soon for regulatory action in this field. As in other areas,
risk-sensitive capital requirements should be sought; one-size-fits-all approaches,
like that called for by the Liikanen Group, should not be pursued because they
usually end up setting perverse incentives.
This point notwithstanding, there are already rigid capital requirements for trad-
ing activities under pillar 1 which address model risk, namely in the area of prudent
valuation. These require valuation adjustments to be calculated on accounting mea-
surements of fair value instruments (additional valuation adjustments, AVAs) and
deducted from CET1 capital. This creates a capital buffer to cover model risk associ-
ated with valuation models at instrument level (see above).46 Valuation risk arising
from the existence of competing valuation models and from model calibration is
addressed by the EBA standard. Deductions for market price uncertainty (Article 8
of the EBA RTS) can also be interpreted as charges for model risk, even if the EBA
does not itself use the term.
5 Conclusion
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Artzner, P., Delbaen, F., Eber, J.M., Heath, D.: Coherent measures of risk. Math. Financ. 9(3),
203–228 (1999)
2. Basel Committee on Banking Supervision: Consultative document—Fundamental review of
the trading book (2012)
3. Basel Committee on Banking Supervision: Consultative document—Fundamental review of
the trading book: A revised market risk framework (2013)
4. Basel Committee on Banking Supervision: Consultative document—Revised Basel III leverage
ratio framework and disclosure requirements (2013)
5. Basel Committee on Banking Supervision: Discussion paper—The regulatory framework: bal-
ancing risk sensitivity, simplicity and comparability (2013)
6. Basel Committee on Banking Supervision: Regulatory consistency assessment programme
(RCAP)—Analysis of risk-weighted assets for market risk (2013)
7. Basel Committee on Banking Supervision: Regulatory consistency assessment programme
(RCAP)—Second report on risk-weighted assets for market risk in the trading book (2013)
8. Basel Committee on Banking Supervision: Basel III leverage ratio framework and disclosure
requirements (2014)
9. Bongers, O.: Mindestanforderungen an die Validierung von Risikomodellen. In: Martin, R.W.,
Quell, P., Wehn, C.: Modellrisiko und Validierung von Risikomodellen, pp. 33–64. Cologne
(2013)
10. Bundesverband deutscher Banken (Association of German Banks): Discussion paper:
Finanzmarktturbulenzen—Gibt es Weiterentwicklungsmöglichkeiten von einzelnen Methoden
im Risikomanagement? (2008)
11. Deutsche Bank AG: 2007 to 2013 Annual Reports
12. Deutsche Bank AG: Investor Relations, presentation for Deutsche Bank analysts’ conference
call on 31 January 2013, https://www.deutsche-bank.de/ir/de/images/Jain_Krause_4Q2012_
Analyst_call_31_Jan_2013_final.pdf
13. Enhanced Disclosure Task Force: Enhancing the Risk Disclosures of Banks—Report of the
Enhanced Disclosure Task Force (2013)
14. European Banking Authority (EBA): EBA Guidelines on the Incremental Default and Migration
Risk Charge (IRC), EBA/GL/2012/3 (2012)
15. European Banking Authority (EBA): EBA FINAL draft Regulatory Technical Standards on
prudent valuation under Article 105(14) of Regulation (EU) No 575/2013 (Capital Require-
ments Regulation—CRR) (2014)
16. Federal Financial Supervisory Authority (Bundesanstalt für Finanzdienstleistungsaufsicht—
BaFin): 2007 Annual Report (2008)
17. Frenkel, M., Rudolf, M.: Die Auswirkungen der Einführung einer Leverage Ratio als zusät-
zliche aufsichtsrechtliche Beschränkung der Geschäftstätigkeit von Banken, expert opinion for
the Association of German Banks (2010)
18. Gaumert, U.: Finanzmarktkrise—Höhere Kapitalanforderungen im Handelsbuch interna-
tionaler Großbanken? In: Nagel, R., Serfling, K. (eds.) Banken, Performance und Finanzmärkte,
festschrift for Karl Scheidl’s 80th birthday, pp. 117–150 (2009)
19. Gaumert, U.: Plädoyer für eine modellbasierte Kapitalunterlegung. In: Die Bank, 5/2013, pp.
35–39 (2013)
20. Gaumert, U., Götz, S., Ortgies, J.: Basel III—eine kritische Würdigung. In: Die Bank, 5/2011,
pp. 54–60 (2011)
21. Gaumert, U., Schulte-Mattler, H.: Höhere Kapitalanforderungen im Handelsbuch. In: Die Bank,
12/2009, pp. 58–64 (2009)
22. German Banking Industry Committee: Comments on the BCBS Discussion Paper “The regu-
latory framework: balancing risk sensitivity, simplicity and comparability” (2013)
23. German Banking Industry Committee: Position paper “Standardisierungsmöglichkeiten bei
internen Marktrisikomodellen” (2013)
24. Haldane, A., Madouros, V.: The Dog and the Frisbee. Bank of England, London (2012)
25. Hartmann-Wendels, T.: Umsetzung von Basel III in europäisches Recht. In: Die Bank, 7/2012,
pp. 38–44 (2012)
26. Hellwig, M., Admati, A.: The Bankers’ New Clothes. Princeton University Press, Princeton
(2013)
27. Kreditwesengesetz: Gesetz über das Kreditwesen—KWG, non-official reading version of
Deutschen Bundesbank, Frankfurt am Main, as at 2 January (2014)
28. Quell, P.: Grundsätzliche Aspekte des Modellrisikos. In: Martin, R.W., Quell, P., Wehn, C.:
Modellrisiko und Validierung von Risikomodellen, pp. 15–32. Cologne (2013)
29. Regulation (EU) No 575/2013 of the European Parliament and of the Council of 26 June 2013 on
prudential requirements for credit institutions and investment firms and amending Regulation
(EU) No 648/2012 (CRR—Capital Requirements Regulation)
30. Senior Supervisors Group: Observations on Risk Management Practices during Recent Market
Turbulence (2008)
31. Wissenschaftlicher Beirat beim BMWi (Academic advisory board at the Federal Ministry for
Economic Affairs and Energy): Reform von Bankenregulierung und Bankenaufsicht nach der
Finanzkrise, Berlin, report 03/2010 (2010)
32. Zimmermann, G., Weber, M.: Die Leverage Ratio—Beginn eines Paradigmenwechsels in der
Bankenregulierung? In: Risiko Manager 25/26, pp. 26–28 (2012)
Model Risk in Incomplete Markets with Jumps
Abstract We are concerned with determining the model risk of contingent claims
when markets are incomplete. Contrary to existing measures of model risk, typically
based on price discrepancies between models, we develop value-at-risk and expected
shortfall measures based on realized P&L from model risk, resp. model risk and
some residual market risk. This is motivated, e.g., by financial regulators’ plans to
introduce extra capital charges for model risk. In an incomplete market setting, we
also investigate the question of hedge quality when using hedging strategies from a
(deliberately) misspecified model, for example, because the misspecified model is
a simplified model where hedges are easily determined. An application to energy
markets demonstrates the degree of model error.
1 Introduction
We are concerned with determining model risk of contingent claims when mar-
ket models are incomplete. Contrary to existing measures of model risk, based on
price discrepancies between models, e.g., [8, 26], we develop measures based on
the realized P&L from model risk. This is motivated by financial regulators’ plans
to introduce extra capital charges for model risk, e.g., [5, 13, 17]. In a complete
and frictionless market model, the “residual” P&L observed on a perfectly hedged
position is due to pricing and hedging in a misspecified model. The distribution of
this P&L can therefore be taken as an input for specifying measures of model risk,
This work was financially supported by the Frankfurt Institute for Risk Management and Reg-
ulation (FIRM) and by the Europlace Institute of Finance.
N. Detering (B)
Department of Mathematics, University of Munich, Theresienstraße 39,
80333 Munich, Germany
e-mail: [email protected]
N. Packham
Frankfurt School of Finance & Management, Sonnemannstr. 9–11,
60314 Frankfurt am Main, Germany
e-mail: [email protected]
© The Author(s) 2015
K. Glau et al. (eds.), Innovations in Quantitative Risk Management,
Springer Proceedings in Mathematics & Statistics 99,
DOI 10.1007/978-3-319-09114-3_3
Several simulation studies investigate the risk from hedging in a simplified model,
e.g., [11, 24, 25]. However, to the best of our knowledge, this is never compared to
the residual risk in the alternative model when following a risk-minimizing strategy.
Yet, this comparison is important for selecting an appropriate model for pricing and
hedging.
In a case study, we examine the respective loss distributions and measures when
applied to options on energy futures. Empirical returns in the energy spot and future
markets behave in a spiky way and thus need to be modeled with jump processes.
However, to reduce the computational cost and to attain a parsimonious model, often
simplified continuous asset price processes are assumed. Based on the measures of
model risk, we assess the quality and robustness of hedging in a continuous asset
price model when the underlying price process has jumps relative to determining
hedges in the jump model itself. As asset price models, we employ continuous and
pure-jump versions of the Schwartz model [27], calibrated to the spot market at the
Nordic energy exchange Nord Pool.
The paper is structured as follows: In Sect. 2, we construct the loss variable and loss
distribution relevant for model risk. Section 3 defines measures on the distribution
of losses from model risk and relates them to the axioms for measures of model
uncertainty introduced by [8]. In Sect. 4, we introduce a way of measuring the relative
losses from hedging in a misspecified model as opposed to hedging in the appropriate
model. Finally, Sect. 5 contains a case study from the energy market to illustrate the
relative loss measure and draw conclusions about the quality of hedging strategies
determined in a complete model with continuous asset price processes, when the
underlying market is in fact subject to jumps.
In this section, we formalize the market setup and the loss process expressing the
residual losses from a hedged position. In the case of a complete and frictionless
market, these losses correspond to model risk, whereas in the case of an incomplete
market, these losses comprise in addition the market risk that is not hedged away.
We begin with a standard market setup under model certainty, as in, e.g., [22]. On a
probability space $(\Omega, \mathcal{F}, Q)$ endowed with a filtration $(\mathcal{F}_t)_{t \ge 0}$ satisfying the "usual
hypotheses" are defined adapted asset price processes $(S_t^j)_{t \ge 0}$, $j = 0, \ldots, d$. The
asset with price process $S^0$ represents the money market account, whereas $S^1, \ldots, S^d$
are risky assets. All prices are discounted, that is, expressed in units of the money
considered is
$$\mathcal{S} = \Bigg\{ \Phi \;\Big|\; \Phi \text{ admissible, predictable, self-financing},\ \Phi_t \in \sigma(S_t)\ \forall\, t \ge 0,\ \text{and } E\bigg[\int_0^T (\phi^j)^2\, d[S^j, S^j]\bigg] < \infty,\ j = 0, \ldots, d \Bigg\}.$$
The condition Φt ∈ σ (St ) implies that the hedging strategy is a Markov process.
Working on a set of measures requires further conditions, in particular, as the
measures in Q need not be absolutely continuous with respect to Q. More specifi-
cally, the asset price processes must be consistent under all measures and specifying
trading strategies requires the notion of a stochastic integral with respect to the set
of measures.
In case the models in Q are diffusion processes, [28] develop the necessary tools
from stochastic analysis, such as existence of a stochastic integral, martingale rep-
resentation, etc. Although this restricts the joint occurrence of certain probability
measures, it does not exclude any particular measure. For our purposes, this limi-
tation does not play a role, as we are primarily interested in choosing a rich set of
possible models to cover the model uncertainty. For details, we refer to [10].
In the general case, we pose the following condition on the set of measures Q,
which ensures that all objects are well defined when working with uncountably many
measures.
Assumption 1 There exists a universal version of the stochastic integral $\int_0^t \phi\, dS$,
$\phi \in \mathcal{S}$. In addition, for all $Q \in \mathcal{Q}$, the integral coincides $Q$-a.s. with the usual
probabilistic construction and $\int_0^t \phi\, dS$ is $\mathcal{F}_t$-measurable.
The goal will be to extend this variable to a loss process L t (X, Φ), t < T , with
Φ a hedging, resp. replicating strategy under Q. As both the time-t price, E[Y |Ft ]
and the strategy φ are defined only Q–a.s., one must be explicit in specifying the
version to be used when dealing with models that are not absolutely continuous with
respect to Q. In our setup, we have E[Y |Ft ] = E[Y |St ] = f (St ) for some Borel-
measurable function f , and likewise for the trading strategy. Since Q expresses the
model uncertainty when employing Q for pricing and hedging, it must not be involved
in the choice of the respective versions of the pricing and hedging strategies.
Assumption 2 The versions of E[Y |St ], t ≤ T , and φ are chosen irrespective of the
measures contained in Q.
We further impose linearity conditions on the versions of E[Y |Ft ] and φ, which
are in general only fulfilled Q–a.s. but for all practically relevant models and claims
hold for all ω ∈ Ω. This will be important for the axiomatic setup in Sect. 3.2.
Assumption 3 Let $X_1, X_2 \in \mathcal{C}$, $\Phi_1 = (\phi_1, u_1^1, \ldots, u_I^1)$, $\Phi_2 = (\phi_2, u_1^2, \ldots, u_I^2) \in \mathcal{S}$
and define $Y_j := X_j - \sum_{i=1}^I u_i^j H_i$, $j = 1, 2$. For all $t \le T$, it holds that
$$E[aY_1 + bY_2 \mid \mathcal{F}_t](\omega) = a E[Y_1 \mid \mathcal{F}_t](\omega) + b E[Y_2 \mid \mathcal{F}_t](\omega), \qquad a, b \in \mathbb{R},\ \omega \in \Omega,$$
and
$$V_t(a\phi_1(\omega) + b\phi_2(\omega)) = a V_t(\phi_1(\omega)) + b V_t(\phi_2(\omega)), \qquad a, b \in \mathbb{R},\ \omega \in \Omega.$$
with $Y = X - \sum_{i=1}^I u_i H_i$ and $V_0 = E[Y]$.
If Φ is a replicating strategy under Q, then L t = 0 Q–a.s., but possibly for some
Q ∈ Q, Q(L t = 0) < 1, which expresses that Φ fails to replicate X under Q. A
model-free hedging strategy is defined as follows:
Definition 2 The trading strategy $\Phi = ((\phi_t)_{0 \le t \le T}, u_1, \ldots, u_I)$ is a model-free or
model-independent replicating strategy for claim $X$ with respect to $\mathcal{Q}$, if $L_t = 0$,
$t \ge 0$, $Q$-a.s., for all $Q \in \mathcal{Q}$.
Note that our definition of the hedge error based on a continuous time integral sep-
arates model risk from a discretization error. When actually calculating the hedge
error, it is necessary to use a time grid small enough such that the discretization error
is negligible.
The following proposition shows that the overall expected loss at time T from
replicating in Q when the market evolves according to Q M instead of Q depends
only on the price difference.
Proposition 1
1. The total expected loss from replicating claim $X$ under $Q$, that is, $E[L_T]$ plus the
initial transaction cost $E[\sum_{i=1}^I u_i (H_i^0 - C_i)]$, when the market evolves according
to $Q_M$, is just the price difference in the two models, $-(E[X] - E^{Q_M}[X])$.
2. The price range measure, defined by $\sup_{Q \in \mathcal{Q}} E^Q[X] - \inf_{Q \in \mathcal{Q}} E^Q[X]$, can be
expressed as $\sup_{Q, \bar Q \in \mathcal{Q}} E^{Q}[L_T^{\bar Q}]$, where $L_T^{\bar Q}$ denotes the loss variable from hedging
under $\bar Q$.
Proof See [10].
If a claim cannot be replicated, then, given the static hedging component
$\sum_{i=1}^I u_i H_i$, a hedging strategy can be defined as a solution $(\hat V_0, \hat\Phi) \in \mathbb{R} \times \mathcal{S}$
of the optimization problem
$$\inf_{(V_0 \in \mathbb{R},\, \Phi \in \mathcal{S})} E[U(L_T(X, \Phi))] = \inf_{(V_0 \in \mathbb{R},\, \Phi \in \mathcal{S})} E\Bigg[U\Bigg(-\Bigg(V_0 + \sum_{j=1}^d \int_0^T \phi^j\, dS^j - Y\Bigg)\Bigg)\Bigg], \qquad (3)$$
where U : R → R+ weighs the magnitude of the hedge error. The most common
choice is U (x) = x 2 , which minimizes the quadratic hedge error. This so-called
quadratic hedging has the advantage that the resulting pricing and hedging rules
become linear and it is also the analytically most tractable rule. Under this choice of
U (x), if S is a martingale, then a solution exists and V̂0 = E[Y ], [20].
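To make the quadratic hedging rule concrete, here is a minimal one-period sketch (not taken from the chapter): with $U(x) = x^2$ and a martingale asset, the optimal holding is the regression coefficient of the claim on the asset increment, $\phi = \mathrm{Cov}(Y, \Delta S)/\mathrm{Var}(\Delta S)$, and $V_0 = E[Y]$. All numerical inputs below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-period market: S moves from S0 to S1 under a lognormal
# scenario occasionally hit by a downward jump; E[S1] is rescaled to S0
# so that S is a martingale.
n, S0, K = 100_000, 100.0, 100.0
z = rng.standard_normal(n)
jump = rng.binomial(1, 0.05, n) * rng.normal(-0.15, 0.05, n)
S1 = S0 * np.exp(-0.5 * 0.04 + 0.2 * z + jump)
S1 *= S0 / S1.mean()                         # enforce the martingale property E[S1] = S0
Y = np.maximum(S1 - K, 0.0)                  # claim to be hedged (call payoff)

dS = S1 - S0
phi = np.cov(Y, dS)[0, 1] / dS.var(ddof=1)   # quadratic risk-minimizing holding
V0 = Y.mean()                                # initial capital E[Y] (martingale case)
L_T = -(V0 + phi * dS - Y)                   # residual loss, as in the text
print(f"phi = {phi:.3f}, V0 = {V0:.3f}, E[L_T^2] = {np.mean(L_T**2):.3f}")
```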
Of course, in an incomplete market, L T (X, Φ) entails not only losses due to model
misspecification, but some losses due to market risk as well, since Q(L T (X, Φ) =
0) < 1, that is, P&L is incurred even when there is no model uncertainty.
For the explicit determination of L t (X, Φ) in some examples, we refer to [10]. It
is worth noting that in a complete market setup, the loss process corresponds to the
tracking error of [15].
The next step is to associate a distribution with the loss variable L t , t ≤ T , based
on which risk measures such as value-at-risk and expected shortfall can be defined.
$$P(L_t \le x) = \int_\Theta Q_a(L_t \le x)\, \mu(da), \qquad 0 \le t \le T.$$
The loss distribution aggregated across the measures in Q from Sect. 2.3 is the key
input to define measures of model risk. For the time being, we continue to work
in a setting where a particular model $Q$ is used for pricing and hedging, as this
appropriately quantifies the model risk from a bank's internal perspective.
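As a sketch of the aggregation step, assume the model set is represented by a finite list of Monte Carlo loss samples (one per measure $Q_a$) together with weights from the prior $\mu$; the aggregated distribution $P$ is then the $\mu$-mixture of the model-specific empirical distributions. The function and variable names below are hypothetical.

```python
import numpy as np

def aggregated_loss_cdf(loss_samples, weights, x_grid):
    """Mixture CDF P(L_t <= x) = sum_a mu_a * Q_a(L_t <= x), with each Q_a
    represented by a Monte Carlo sample of the loss L_t."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    cdf = np.zeros_like(x_grid, dtype=float)
    for w, sample in zip(weights, loss_samples):
        sample = np.sort(np.asarray(sample, dtype=float))
        # empirical CDF of this model's losses, evaluated on x_grid
        cdf += w * np.searchsorted(sample, x_grid, side="right") / len(sample)
    return cdf

# hypothetical usage: two candidate models, the second one weighted more heavily
rng = np.random.default_rng(1)
losses_q1 = rng.normal(0.0, 1.0, 50_000)        # losses simulated under Q_1
losses_q2 = rng.standard_t(3, 50_000) * 1.5     # losses simulated under Q_2 (heavier tails)
grid = np.linspace(-10.0, 10.0, 5)
print(aggregated_loss_cdf([losses_q1, losses_q2], [0.3, 0.7], grid))
```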
If a claim cannot be replicated, and the trading strategy Φ is merely a hedg-
ing strategy in some risk-minimizing sense, then the loss variable L t (X, Φ) from
Definition 1 features not only model risk, but also the unhedged market risk. To dis-
entangle model risk from the market risk, one could first determine the market risk
from the unhedged part of the claim under Q and set this into relation to the overall
residual risk. This requires taking into account potential diversification effects, since
risks are not additive. We shall continue to work under the setup of measuring resid-
ual risk, and use the terminology “model risk,” although some market risk is also
present.
Market incompleteness can also be seen to be a form of model risk, as—in addition
to the uncertainty on the objective measure—it causes uncertainty on the equivalent
martingale measure. However, hedging strategies would typically be chosen that are
risk minimizing not under the martingale measure, but risk minimizing under the
objective measure. In the case of continuous asset prices, this implies that hedging
is done under the minimal-martingale measure, which is uniquely determined. In
practice, it is more common to choose an equivalent measure that calibrates suf-
ficiently well, and in this case one could argue that incompleteness also increases
model uncertainty. In our setup, this would be reflected by a larger set Q.
The usual value-at-risk and expected shortfall measures are defined as follows:
Definition 3 Let $L_t(X, \Phi)$ be the time-$t$ loss from the strategy $\Phi$ that hedges claim
$X$ under $Q$. Given a confidence level $\alpha \in (0, 1)$,
1. Value-at-risk (VaR) is given by
$$\mathrm{VaR}_\alpha(L_t(X, \Phi)) = \inf\{\, l \in \mathbb{R} : P(L_t(X, \Phi) \le l) \ge \alpha \,\};$$
2. Expected shortfall (ES) is given by
$$\mathrm{ES}_\alpha(L_t(X, \Phi)) = \frac{1}{1 - \alpha} \int_\alpha^1 \mathrm{VaR}_u(L_t(X, \Phi))\, du.$$
reasonable in the sense that it is not of interest whether a position is indeed hedged
or not. Rather, the hedging argument serves only to eliminate (or reduce, in case the
claim cannot be replicated) P&L from market risk. Choosing the minimal degree
makes it possible to appropriately capture claims that can be replicated in a model-free way.
Definition 4 Concrete measures capturing the model uncertainty when pricing and
hedging claim $X$ according to model $Q$ are given by
1. $\mu^Q_{SQE,t}(X) = \inf_{\Phi \in \Pi} E[L_t(X, \Phi)^2]$,
2. $\mu^Q_{\mathrm{VaR},\alpha,t}(X) = \inf_{\Phi \in \Pi} \mathrm{VaR}_\alpha(|L_t(X, \Phi)|)$,
3. $\mu^Q_{\mathrm{ES},\alpha,t}(X) = \inf_{\Phi \in \Pi} \mathrm{ES}_\alpha(|L_t(X, \Phi)|)$,
4. $\rho^Q_{\mathrm{VaR},\alpha,t}(X) = \inf_{\Phi \in \Pi} \max(\mathrm{VaR}_\alpha(L_t(X, \Phi)), 0)$,
5. $\rho^Q_{\mathrm{ES},\alpha,t}(X) = \inf_{\Phi \in \Pi} \max(\mathrm{ES}_\alpha(L_t(X, \Phi)), 0)$.
The measures $\mu^Q_{\mathrm{VaR},\alpha,t}$ and $\mu^Q_{\mathrm{ES},\alpha,t}$ capture model uncertainty in an absolute sense,
and are thus measures of the magnitude or degree of model uncertainty. The measures
$\rho^Q_{\mathrm{VaR},\alpha,t}$ and $\rho^Q_{\mathrm{ES},\alpha,t}$ consider losses only. As such, they are suitable for defining a
capital charge against losses from model risk.
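For a single candidate strategy $\Phi$, the measures of Definitions 3 and 4 can be estimated from a Monte Carlo sample of the loss under the aggregated distribution $P$; the infimum over strategies in $\Pi$ would be taken by an outer search. A minimal sketch with hypothetical helper names:

```python
import numpy as np

def empirical_var(losses, alpha):
    """Empirical value-at-risk of the loss sample at level alpha."""
    return np.quantile(losses, alpha)

def empirical_es(losses, alpha):
    """Empirical expected shortfall: mean loss beyond the alpha-quantile."""
    q = np.quantile(losses, alpha)
    return losses[losses >= q].mean()

def model_risk_measures(losses, alpha=0.99):
    """Measures of Definition 4 evaluated for one candidate strategy Phi."""
    losses = np.asarray(losses, dtype=float)
    return {
        "mu_SQE": np.mean(losses ** 2),
        "mu_VaR": empirical_var(np.abs(losses), alpha),
        "mu_ES": empirical_es(np.abs(losses), alpha),
        "rho_VaR": max(empirical_var(losses, alpha), 0.0),
        "rho_ES": max(empirical_es(losses, alpha), 0.0),
    }

# hypothetical usage with a simulated loss sample
rng = np.random.default_rng(2)
sample = rng.normal(0.1, 2.0, 100_000)
print(model_risk_measures(sample, alpha=0.99))
```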
Contrary to the case of bank internal risk measurement, a regulator may wish to
measure model risk independently of a particular pricing or hedging measure, taking
a more prudent approach. To abstract from the pricing measure, one would first
define the set Q H ⊆ Q of potential pricing and hedging measures (e.g., measures
that calibrate sufficiently well) and then define the risk measure in a worst-case sense
as follows:
Definition 5 Let $\mu_t^{Q_H}(X)$ be a measure of model uncertainty when pricing and
hedging $X$ according to model $Q_H \in \mathcal{Q}_H$. The model uncertainty of claim $X$ is
given by
$$\mu_t(X) = \sup_{Q_H \in \mathcal{Q}_H} \mu_t^{Q_H}(X). \qquad (4)$$
Capital charges can then be determined from either $\mu^Q_{\mathrm{VaR},\alpha,t}(X)$, resp. $\mu^Q_{\mathrm{ES},\alpha,t}(X)$,
or from $\mu_{\mathrm{VaR},\alpha,t}(X)$, resp. $\mu_{\mathrm{ES},\alpha,t}(X)$.
Cont [8] introduces a set of axioms for measures of model risk. A measure satisfying
these axioms is called a convex measure of model risk. The axioms follow the general
notion of convex risk measures, [18, 21], but are adapted to the special case of model
risk. In particular, these axioms take into account the possibility of static hedging
with liquidly traded options and of hedging in a model-free way. More specifically,
the axioms postulate that an option that can be statically hedged with liquidly traded
options is assigned a model risk bounded by the cost of replication, which can be
expressed in terms of the bid-ask spread. Consequently, partial static hedging for
a claim reduces model risk. Further, the possibility of model-free hedging with the
underlying asset reduces model risk to zero. Finally, to express that model risk can
be reduced through diversification, convexity is required.
Here we only state the following result, which ensures that our measures fulfill
the axioms proposed in Cont [8]. The proof is given in [10] for complete markets
and can be easily generalized to an incomplete market.
Proposition 3 The measures $\mu^Q_{SQE,t}(X)$, $\mu^Q_{\mathrm{ES},\alpha,t}(X)$ and $\rho^Q_{\mathrm{ES},\alpha,t}(X)$ satisfy the
axioms of model uncertainty. The measures $\mu^Q_{\mathrm{VaR},\alpha,t}(X)$ and $\rho^Q_{\mathrm{VaR},\alpha,t}(X)$ satisfy
Axioms 1, 2, and 4.
4 Hedge Differences
Instead of considering the P&L arising from model misspecification as in Sect. 2.2,
one might be interested in a direct comparison of hedging strategies implied by
different models. For example, one might wish to assess the quality of hedging
strategies determined from a deliberately misspecified, but simpler model, in a more
appropriate, but more involved model.
We first explain the idea with respect to one alternative model Q M ∈ Q and outline
then how measures with respect to the entire model set can be built. As before, Q
is the model for pricing and hedging and, fixing a claim X ∈ C, Π is the set of
quadratic risk-minimizing (QRM) hedging strategies for X under Q (containing
various hedging strategies, depending on how static hedges with the benchmark
instruments are chosen).
We seek an answer to the following question: If the market turns out to follow
$Q_M$, what is the loss incurred by hedging in $Q$ instead of hedging in $Q_M$? Let
$\Phi = (\phi, u_1, \ldots, u_I) \in \Pi$ be the QRM strategy for $Y = X - \sum_{i=1}^I u_i H_i$, and let
$\Phi_M$ be the respective QRM strategy for $Y$ derived under $Q_M$. The relative difference
of the hedge portfolio compared to the hedge portfolio when using the strategy of
$Q_M$ is given by
$$L^\Delta_t(X, \Phi, \Phi_M) = E^{Q_M}[Y] - \bar E[Y] + \sum_{j=1}^d \int_0^t (\phi_M^j - \phi^j)\, dS^j. \qquad (5)$$
This variable differs from L t (X, Φ), cf. Eq. (2), in that it expresses the difference
between the hedging strategies Φ and Φ M , whereas L t (X, Φ) describes the difference
between the hedging strategy Φ and the claim X .1
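On a discretized path, (5) can be evaluated by accumulating the difference of the two hedge ratios against the asset increments and adding the initial price difference. The sketch below assumes a single risky asset and hypothetical callables phi and phi_M that return the respective hedge ratios as functions of time and the current asset price; the prices passed in are placeholders for $E^{Q_M}[Y]$ and $\bar E[Y]$.

```python
import numpy as np
from scipy.stats import norm

def hedge_difference_loss(S, t_grid, phi, phi_M, price_QM, price_Q):
    """Discretized version of Eq. (5) for one asset:
    L^Delta_t = E^{Q_M}[Y] - Ebar[Y] + int_0^t (phi_M - phi) dS."""
    dS = np.diff(S)
    # hedge ratios evaluated at the left endpoint of each interval (predictability)
    diff = np.array([phi_M(u, s) - phi(u, s) for u, s in zip(t_grid[:-1], S[:-1])])
    return price_QM - price_Q + np.concatenate([[0.0], np.cumsum(diff * dS)])

def bs_delta(sigma, K, T):
    """Black-Scholes call delta, used here only as an illustrative hedge ratio."""
    def delta(t, s):
        tau = max(T - t, 1e-8)
        d1 = (np.log(s / K) + 0.5 * sigma**2 * tau) / (sigma * np.sqrt(tau))
        return norm.cdf(d1)
    return delta

# hypothetical usage: compare deltas computed at two different volatilities
rng = np.random.default_rng(3)
increments = -0.5 * 0.04 / 252 + 0.2 * np.sqrt(1 / 252) * rng.standard_normal(252)
S = 100.0 * np.exp(np.cumsum(np.concatenate([[0.0], increments])))
t = np.linspace(0.0, 1.0, 253)
L_delta = hedge_difference_loss(S, t, bs_delta(0.2, 100.0, 1.0),
                                bs_delta(0.3, 100.0, 1.0), 11.0, 8.0)
print(L_delta[-1])
```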
The next proposition provides some insight on the different nature of the two
variables.
Proposition 4 The following properties hold for the processes $L^\Delta(X, \Phi, \Phi_M)$ and
$L(X, \Phi)$:
1. $L^\Delta(X, \Phi, \Phi_M)$ is a $Q_M$-martingale with $L^\Delta_0(X, \Phi, \Phi_M) = E^{Q_M}[Y] - \bar E[Y]$,
2. $E^{Q_M}[L_T(X, \Phi)] = E^{Q_M}[Y] - \bar E[Y]$,
3. $L^\Delta_T(X, \Phi, \Phi_M) = L_T(X, \Phi)$ $Q_M$-a.s. if $Y$ can be replicated under $Q_M$,
4. $L^\Delta_t(X, \Phi, \Phi_M) - L_t(X, \Phi) = E^{Q_M}[Y \mid \mathcal{F}_t] - E[Y \mid \mathcal{F}_t]$ $Q_M$-a.s. if $Y$ can be
replicated under $Q_M$.
Proof 1. This follows directly from the definition of $L^\Delta(X, \Phi, \Phi_M)$ and the fact
that $\Phi_M$ and $\Phi$ are in $\mathcal{S}$.
2. See Proposition 1.
3. If $Y$ can be replicated, then $Y = E^{Q_M}[Y] + \sum_{j=1}^d \int_0^T \phi_M^j\, dS^j$ $Q_M$-a.s., and
consequently $L^\Delta_T(X, \Phi, \Phi_M) = Y - \big(E[Y] + \sum_{j=1}^d \int_0^T \phi^j\, dS^j\big)$ $Q_M$-a.s. The
claim follows by observing that $L_T(X, \Phi) = -\big(E[Y] + \sum_{j=1}^d \int_0^T \phi^j\, dS^j - Y\big)$.
4. Using that $L^\Delta_t(X, \Phi, \Phi_M) = E^{Q_M}[Y \mid \mathcal{F}_t] - \big(E[Y] + \sum_{j=1}^d \int_0^t \phi^j\, dS^j\big)$ $Q_M$-a.s.,
since $Y$ can be replicated under $Q_M$, the claim follows with the definition of
$L_t(X, \Phi)$.
1 There is no need to pose specific conditions on the version of the hedging strategy $\Phi_M$ chosen,
since in the following only properties of $L^\Delta_t(X, \Phi, \Phi_M)$ under $Q_M$ are analyzed.
Fig. 1 Loss at $t = \frac{1}{2}T$ from dynamically hedging an at-the-money call option with a maturity $T$ of 3 months, based on 10,000 simulations and 1,000 time steps. Left: distribution of $L_t(X, \Phi)$, $E[L_t(X, \Phi)] = 0.0053$. Right: distribution of $L^\Delta_t(X, \Phi, \Phi_M)$, $E[L^\Delta_t(X, \Phi, \Phi_M)] = 0.0099$, which equals the initial price difference. (Axes: loss from hedging vs. relative frequency.)
As a real-world example, we study the loss variables and risks from hedging options
on futures in energy markets. The spot and future prices in energy markets are
extremely volatile and show large spikes, and a realistic model for the price dynamics
should therefore involve jumps. However, continuous models based on Brownian
motions are not only computationally more tractable, but prevalent in practice. Our
analysis sheds light on the risks of hedging in a simplified continuous model instead
of a model involving jumps.
Assume given a probability space $(\Omega, (\mathcal{F}_t)_{0 \le t \le T})$ with a measure $\tilde P$ on which a
two-dimensional Lévy process $(L_t) = (L_{1,t}, L_{2,t})_{t>0}$ with independent components
is defined. A popular two-factor model for the energy spot price is developed by
Schwartz and Smith [27]. The spot is driven by a short-term mean-reverting factor, to
account for short-term energy supply and demand, and a long-term factor for
changes in the equilibrium price level. In its extended form [4, Sect. 5], the logarithm
of the spot price is
$$\log S_t = \Lambda_t + X_t + Y_t \qquad (6)$$
with $(\Lambda_t)_{t>0}$ a deterministic seasonality function, $(X_t)_{t>0}$ a Lévy-driven Ornstein-Uhlenbeck
process with dynamics $dX_t = -\lambda X_t\, dt + dL_{1,t}$, and $(Y_t)_{t>0}$ defined by
$dY_t = dL_{2,t}$. We further assume that the cumulant function $\Psi(z) := \log(E[e^{\langle z, L_1 \rangle}])$
is well defined for $z = (z_1, z_2) \in \mathbb{R}^2$, $|z| \le C$, for some $C \in \mathbb{R}$. Due to the independence
of $L_1$ and $L_2$, the cumulant transforms of both processes add up and we have $\Psi(z) =
\Psi_1(z_1) + \Psi_2(z_2)$, where $\Psi_1$ and $\Psi_2$ are the cumulants of $L_1$ and $L_2$, respectively.
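The sketch below simulates one path of the spot dynamics (6) on a daily grid, using an Euler scheme for the Ornstein-Uhlenbeck factor and NIG-distributed daily Lévy increments drawn with scipy. The parameter tuples, the mean-reversion speed lam, and the flat seasonality function are illustrative placeholders (only loosely patterned on the estimates reported later), and scipy's norminvgauss(a, b, loc, scale) is assumed to correspond to NIG($\alpha$, $\beta$, location $\nu$, scale $\delta$) via a = $\alpha\delta$, b = $\beta\delta$.

```python
import numpy as np
from scipy.stats import norminvgauss

def nig_rvs(alpha, beta, nu, delta, size, rng):
    """NIG(alpha, beta, location nu, scale delta) variates via scipy;
    assumes a = alpha*delta, b = beta*delta in scipy's parameterization."""
    return norminvgauss.rvs(alpha * delta, beta * delta, loc=nu, scale=delta,
                            size=size, random_state=rng)

def simulate_log_spot(n_days=252, lam=0.1, seasonality=lambda t: 3.5,
                      p_short=(33.3, -1.1, -0.001, 0.007),   # illustrative daily NIG parameters
                      p_long=(1.9, -0.9, 0.018, 0.062), seed=0):
    """One path of log S_t = Lambda_t + X_t + Y_t on a daily grid."""
    rng = np.random.default_rng(seed)
    dL1 = nig_rvs(*p_short, n_days, rng)    # increments driving the mean-reverting factor X
    dL2 = nig_rvs(*p_long, n_days, rng)     # increments driving the long-term factor Y
    X = np.zeros(n_days + 1)
    for k in range(n_days):                 # Euler step for dX = -lam * X dt + dL1 (dt = 1 day)
        X[k + 1] = X[k] - lam * X[k] + dL1[k]
    Y = np.concatenate([[0.0], np.cumsum(dL2)])
    t = np.arange(n_days + 1)
    Lam = np.array([seasonality(u) for u in t])
    return t, Lam + X + Y

t, log_spot = simulate_log_spot()
print(np.exp(log_spot[-5:]))                # last few simulated spot prices
```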
We consider the pricing and hedging of options on the future contract. In contrast
to, for example, equity markets, the future contract in energy delivers over a period
of time [T1 , T2 ] instead of a fixed time point by defining a payout
$$\frac{1}{T_2 - T_1} \int_{T_1}^{T_2} S_r\, dr \qquad (7)$$
in return for the agreed future price. While the spot is not tradable due to lack of
storage opportunities, the future is tradable and used for hedging both options on the
future itself and options directly on the spot price. Assuming that the future price Ft
equals its expected payout
$$F_t = E^Q\Bigg[\frac{1}{T_2 - T_1} \int_{T_1}^{T_2} S_r\, dr \;\Bigg|\; \mathcal{F}_t\Bigg] \qquad (8)$$
under a pricing measure $Q \sim \tilde P$, the value $F_t$ is derived in analytic form in [4]. Under
the assumption that $L_1$ and $L_2$ are normal inverse Gaussian (NIG) distributed Lévy
processes, an approximate process $(\tilde F^L_t)_{t<T_2}$ is determined in [4] by matching first
and second moments such that $(\tilde F^L_t)_{t<T_2}$ is of exponential additive type. We assume
$$\tilde F^L_t = F_0 \exp\left(-\int_0^t \left[\Psi_1(\Sigma_1^L(s)) + \Psi_2(\Sigma_2^L(s))\right] ds + \int_0^t \Sigma_1^L(s)\, dL_{1,s} + \int_0^t \Sigma_2^L(s)\, dL_{2,s}\right) \qquad (9)$$
with time-dependent, deterministic functions $\Sigma_1^L(t)$ and $\Sigma_2^L(t)$. The process $\tilde F^L$
depends on the interval $[T_1, T_2]$, but in order to avoid overloading the notation and
since we shall only consider a single delivery period in our example, we simply
write $\tilde F^L$, $\Sigma_1^L$ and $\Sigma_2^L$. The market under this model is incomplete and claims can in
general only be hedged with risk-minimizing strategies. Integral representations for
prices and quadratic risk-minimizing hedge positions of call and put payoffs can be
derived, and we refer the reader to [4, Prop.3.9.] for further details and the explicit
formulas.
As a pricing and hedging model, we consider a simplified version of (6), which
is driven by two (nonstandard) independent Q-Brownian motions (B1,t )t>0 and
(B2,t )t>0 defined on (Ω, (Ft )0≤t≤T ) and we derive, again by moment matching,
an analog approximate future price process $\tilde F^B$ of the form
$$\tilde F^B_t = F_0 \exp\left(-\int_0^t \left[\Psi_1^B(\Sigma_1^B(s)) + \Psi_2^B(\Sigma_2^B(s))\right] ds + \int_0^t \Sigma_1^B(s)\, dB_{1,s} + \int_0^t \Sigma_2^B(s)\, dB_{2,s}\right) \qquad (10)$$
with time-dependent, deterministic functions $\Sigma_1^B(t)$ and $\Sigma_2^B(t)$ and with $\Psi_1^B(z)$ and
$\Psi_2^B(z)$ being the cumulant transforms of $B_{1,1}$ and $B_{2,1}$.
Although the model has two sources of randomness, it is a complete model under
the filtration generated by the future price itself as the next proposition shows, which
means that all practically relevant claims can be replicated.
Proposition 5 Let $(\mathcal{G}_t)_{t<T}$ be the filtration generated by $\tilde F^B$ up to time $t$, i.e., $\mathcal{G}_t :=
\sigma\{\tilde F^B_s, s \le t\}$. Then the market consisting of $\tilde F^B$ and a constant riskless bank account
is complete with respect to $(\mathcal{G}_t)_{t<T}$.
We estimate the parameters for both models based on future and spot data from
Nord Pool energy exchange. We use average daily system peak load electricity
spot prices for the period from January 2011 until May 2013 (prices as shown
on Bloomberg page “ENOSOSPK”) and weekday prices for front month and sec-
ond month future contracts. For details on the estimation procedure, we refer to
[4, Sect. 5.2.]. In Table 1, we collect the parameter estimates for the two factors of
both models, the simplified model with two nonstandard Brownian motions and the
model with two independent NIG-Lévy processes. The estimates for the Brownian
factor are only the drift term μ and the volatility term σ . The NIG distribution is a
four-parameter distribution with scale parameter δ, tail heaviness α, skew parameter
β, and the location parameter ν, see [3].
Figure 2 shows the empirical return distributions of both factors together with the
density function of the estimated distribution. It is obvious that the NIG distribution
provides a significantly better fit to the empirical returns than the normal distribution.
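A sketch of this comparison, assuming the daily factor returns are available as a NumPy array: both distributions are fitted by maximum likelihood with scipy and compared via their log-likelihood (here summarized as AIC). The synthetic input is purely illustrative.

```python
import numpy as np
from scipy.stats import norm, norminvgauss

def compare_fits(returns):
    """Maximum-likelihood fit of NIG and normal distributions to a return sample."""
    returns = np.asarray(returns, dtype=float)
    nig_params = norminvgauss.fit(returns)     # (a, b, loc, scale)
    norm_params = norm.fit(returns)            # (loc, scale)
    ll_nig = norminvgauss.logpdf(returns, *nig_params).sum()
    ll_norm = norm.logpdf(returns, *norm_params).sum()
    return {"AIC_NIG": 2 * 4 - 2 * ll_nig, "AIC_normal": 2 * 2 - 2 * ll_norm}

# hypothetical usage with synthetic heavy-tailed "returns"
rng = np.random.default_rng(4)
fake_returns = rng.standard_t(3, 600) * 0.05
print(compare_fits(fake_returns))
```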
The claim to be hedged is an option on a future with a one-week delivery period
trading one month prior to expiry, so that T1 = 23 and T2 = 30. Based on the
parameter estimates, we determine scaling terms $\Sigma_1^L(t)$ and $\Sigma_2^L(t)$ for the dynamics
of $\tilde F^L_t$ and scaling terms $\Sigma_1^B(t)$ and $\Sigma_2^B(t)$ for the dynamics of $\tilde F^B$, respectively.
Assuming that the measures $Q$ and $\bar Q$ are orthogonal, we define an aggregating
process $\tilde F$ such that $\tilde F = \tilde F^B$ $Q$-a.s. and $\tilde F = \tilde F^L$ $\bar Q$-a.s. Pricing and hedging is
performed under $Q$, and there is only one alternative measure, denoted by $\bar Q$. Our
model set is thus $\mathcal{Q} = \{Q, \bar Q\}$. Applying the Akaike Information Criterion (AIC),
we assign a probability distribution to the model set $\mathcal{Q}$. It turns out that model $\bar Q$ gets
assigned a probability of basically 1 due to its much better fit of the returns, and we
simulate according to this model.
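The AIC-based assignment of model probabilities can be read as computing Akaike weights (cf. [6, 7]); a minimal sketch with hypothetical AIC values:

```python
import numpy as np

def akaike_weights(aics):
    """Akaike weights w_i = exp(-Delta_i/2) / sum_j exp(-Delta_j/2),
    where Delta_i = AIC_i - min_j AIC_j."""
    aics = np.asarray(aics, dtype=float)
    delta = aics - aics.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# hypothetical AIC values for the Brownian and the NIG-based model:
# the model with the (much) lower AIC receives a weight of essentially 1
print(akaike_weights([-3100.0, -3450.0]))
```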
We consider an at-the-money call option $X := (\tilde F_{T_2} - F_0)^+$ and calculate the
hedge positions implied by $Q$. For the simulation of the process under $\bar Q$, we use 600
time steps in order to reduce the discretization error. We investigate the distributions
of $L^\Delta_T(X, \Phi, \Phi_{\bar Q})$ and $L_T(X, \Phi)$, with $\Phi$ and $\Phi_{\bar Q}$ dynamic hedging strategies, as there
are no benchmark instruments. As implied by Proposition 5, the hedging strategy $\Phi$ is
actually a perfect hedge under the model $Q$.
Figure 3 shows on the left-hand side the distributions under $\bar Q$ of $L_T(X, \Phi)$ and
$L^\Delta_T(X, \Phi, \Phi_{\bar Q})$. For comparison, Fig. 3 also shows the distribution under $\bar Q$ of the hedge error
$L_T(X, \Phi_{\bar Q})$ when hedging under $\bar Q$ (top right). Here, the hedge error is introduced
by market incompleteness.
Table 1 Estimated parameters for the NIG distributions of $L_{1,t}$ and $L_{2,t}$ and parameters for the
normal distributions of $B_{1,t}$ and $B_{2,t}$

            $\hat\alpha$    $\hat\beta$    $\hat\nu$    $\hat\delta$
$L_{2,t}$   1.9240          −0.8860        0.0176       0.0622
$L_{1,t}$   33.3008         −1.0988        −0.0009      0.0071

            $\hat\sigma$    $\hat\mu$
$B_{2,t}$   0.2328          −0.0004
$B_{1,t}$   0.0133          0.0002
Fig. 2 Empirical distributions of the long-term factor (left) and the short-term factor (right) together with the fitted NIG distribution (solid line) and normal distribution (dashed line). (Axes: return vs. relative frequency.)
It turns out that the loss due to the misspecified model $Q$ is minor compared to the
loss due to the incompleteness. The loss due to model misspecification as measured by
$L^\Delta_T(X, \Phi, \Phi_{\bar Q})$ has a mean-squared value of $\mu^{Q,\Delta}_{SQE,t}(X) = E^{\bar Q}[(L^\Delta_T(X, \Phi, \Phi_{\bar Q}))^2] =
9.50$. The mean-squared hedge error from hedging under the misspecified model is
greater, with $\mu^{Q}_{SQE,t}(X) = E^{\bar Q}[(L_T(X, \Phi))^2] = 34.61$. Although the magnitude
appears high, it is put into perspective by the fact that even under correct model specification
the mean-squared hedge error $E^{\bar Q}[(L_T(X, \Phi_{\bar Q}))^2]$ is 25.54. The initial prices
under the two models are $E^{Q}[X] = 10.954$ and $E^{\bar Q}[X] = 8.068$, respectively.
If we consider the variance of the loss variables, which corrects for the mean, it
turns out that the impact of the misspecified hedge is rather low. For the variable
$L^\Delta_T(X, \Phi, \Phi_{\bar Q})$, we get $\mathrm{Var}(L^\Delta_T(X, \Phi, \Phi_{\bar Q})) = 1.07$. We find that $\mathrm{Var}(L_T(X, \Phi))$
and $E^{\bar Q}[(L_T(X, \Phi_{\bar Q}))^2] = \mathrm{Var}(L_T(X, \Phi_{\bar Q}))$ are similar, with 25.71 and 25.56,
respectively. The lower right of Fig. 3 shows a scatter plot of $L_T(X, \Phi)$ and
$L_T(X, \Phi_{\bar Q})$. The two variables show a correlation of 97.91 %, implying a strong
linear dependence between the hedge error under model $\bar Q$ (market risk) and the
hedge error due to using the misspecified model $Q$.
The fact that the impact due to hedging in the wrong model is relatively low in
this case study should not be misinterpreted. It confirms a stylized fact that is well
known for diffusion processes (see [15]), namely that hedging is robust as long as
the overall variance of the underlying is described sufficiently well by the model.
Fig. 3 Upper left: $\bar Q(L_T(X, \Phi) < \cdot)$. Upper right: $\bar Q(L_T(X, \Phi_{\bar Q}) < \cdot)$. Lower left: $\bar Q(L^\Delta_T(X, \Phi, \Phi_{\bar Q}) < \cdot)$. Lower right: scatter plot of $L_T(X, \Phi)$ and $L_T(X, \Phi_{\bar Q})$. (Axes: hedge error vs. relative frequency.)
The overall volatility in our setup is the same for both models due to the moment
matching procedure and uncertainty in this volatility is likely to result in greater
model risk. The study also makes clear that the hedging error due to incompleteness
cannot be neglected.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Artzner, P., Delbaen, F., Eber, J., Heath, D.: Coherent measures of risk. Math. Financ. 9(3),
203–228 (1999)
2. Bannör, K., Scherer, M.: Capturing parameter uncertainty with convex risk measures. Eur.
Actuar. J. 3, 97–132 (2013)
3. Barndorff-Nielsen, O.E.: Processes of normal inverse Gaussian type. Financ. Stochast. 2, 41–68
(1998)
4. Benth, F.E., Detering, N.: Pricing and hedging Asian-style options in energy. Financ. Stochast.
(2013)
5. BIS: Revisions to the Basel II market risk framework. Basel Committee on Banking Supervision,
Bank for International Settlements, February (2011)
6. Burnham, K., Anderson, D.: Model Selection and Multimodel Inference: A Practical
Information-Theoretic Approach, 2nd edn. Springer, New York (2002)
7. Burnham, K., Anderson, D.: Multimodel inference—understanding AIC and BIC in model
selection. Sociol. Methods Res. 33(2), 261–304 (2004)
8. Cont, R.: Model uncertainty and its impact on the pricing of derivative instruments. Math.
Financ. 16(3), 519–547 (2006)
9. Denis, L., Martini, C.: A theoretical framework for the pricing of contingent claims in the
presence of model uncertainty. Ann. Appl. Probab. 16(2), 827–852 (2006)
10. Detering, N., Packham, N.: Measuring the model risk of contingent claims. Working Paper,
Frankfurt School of Finance & Management (submitted) (2013)
11. Detering, N., Weber, A., Wystup, U.: Return distributions of equity-linked retirement plans
under jump and interest rate risk. Eur. Actuar. J. 3(1), 203–228 (2013)
12. Detering, N.: Measuring the model risk of quadratic risk minimizing hedging strategies with
an application to energy markets. Working paper, February (2014)
13. EBA: Discussion paper on draft regulatory technical standards on prudent valuation, under
Article 100 of the draft Capital Requirements Regulation (CRR). Discussion Paper, European
Banking Authority, November (2012)
14. El Karoui, N., Quenez, M.: Dynamic programming and pricing of contingent claims in an
incomplete market. SIAM J. Control Optim. 33(1), 29–66 (1995)
15. El Karoui, N., Jeanblanc-Picqué, M., Shreve, S.: Robustness of the Black and Scholes formula.
Math. Financ. 8(2), 93–126 (1998)
16. Epstein, L.: A definition of uncertainty aversion. Rev. Econ. Stud. 66(3), 579–608 (1999)
17. Federal Reserve: Supervisory guidance on model risk management. Board of Governors of the
Federal Reserve System, Office of the Comptroller of the Currency, SR Letter 11–7 Attachment,
April (2011)
18. Föllmer, H., Schied, A.: Convex measures of risk and trading constraints. Financ. Stochast.
6(4), 429–447 (2002)
19. Föllmer, H., Schweizer, M.: Hedging of contingent claims under incomplete information. In:
Davis, M., Elliott, R. (eds.) Applied Stochastic Analysis, Stochastics Monographs, vol. 5, pp.
389–414. Gordon and Breach, London (1991)
20. Föllmer, H., Sondermann, D.: Hedging of non-redundant contingent claims. In: Hildenbrand,
W., Mas-Colell, A. (eds.) Contributions to Mathematical Economics, pp. 205–223. North-
Holland, Amsterdam (1986)
21. Frittelli, M., Gianin, E.R.: Putting order in risk measures. J. Bank. Financ. 26(7), 1473–1486
(2002)
22. Jeanblanc, M., Yor, M., Chesney, M.: Mathematical Methods for Financial Markets. Springer,
Berlin (2009)
23. Knight, F.H.: Risk, Uncertainty and Profit. Houghton Mifflin, Boston (1921)
24. Melino, A., Turnbull, S.M.: Misspecification and the pricing and hedging of long-term foreign
currency options. J. Int. Money Financ. 14(3), 373–393 (1995)
25. Nalholm, M., Poulsen, R.: Static hedging and model risk for barrier options. J. Futures Mark.
26(5), 449–463 (2006)
26. Schoutens, W., Simons, E., Tistaert, J.: A perfect calibration! now what? Wilmott Mag. 2,
66–78 (2004)
27. Schwartz, E.S., Smith, J.E.: Short-term variations and long-term dynamics in commodity prices.
Manag. Sci. 46(7), 893–911 (2000)
28. Soner, M., Touzi, N., Zhang, J.: Quasi-sure stochastic analysis through aggregation. Electron.
J. Probab. 16, 1844–1879 (2011). Article number 67
Part II
Financial Engineering
Bid-Ask Spread for Exotic Options
under Conic Finance
Abstract This paper puts the concepts of model and calibration risks into the
perspective of bid and ask pricing and marketed cash-flows which originate from
the conic finance theory. Different asset pricing models calibrated to liquidly traded
derivatives by making use of various plausible calibration methodologies lead to
different risk-neutral measures which can be seen as the test measures used to assess
the (un)acceptability of risks.
Keywords Calibration risk · Model risk · Exotic bid-ask spread · Conic finance ·
Metric-free calibration risk measure
1 Introduction
The publication of the pioneering work of Black and Scholes in 1973 sparked off
an unprecedented boom in the derivative market, paving the way for the use of
financial models for pricing financial instruments and hedging financial positions.
Since the late 1970s, incited by the emergence of a liquid market for plain-vanilla
options, a multitude of option pricing models has seen the day, in an attempt to
mimic the stylized facts of empirical returns and implied volatility surfaces. The
need for such advanced pricing models, ranging from stochastic volatility models to
models with jumps and many more, has even been intensified after Black Monday,
which evidenced the inability of the classical Black–Scholes model to explain the
intrinsic smiling nature of implied volatility. The following wide panoply of models
has inescapably given rise to what is commonly referred to as model uncertainty or, by
malapropism, model risk. The ambiguity in question is the Knightian uncertainty as
defined by Knight [17], i.e., the uncertainty about the true process generating the data,
F. Guillaume (B)
University of Antwerp, Middelheimlaan 1, 2020 Antwerpen, Belgium
e-mail: [email protected]
W. Schoutens
K.U.Leuven, Celestijnenlaan 200 B, 3001 Leuven, Belgium
e-mail: [email protected]
as opposed to the notion of risk dealing with the uncertainty on the future scenario of a
given stochastic process. This relatively new kind of “risk” has significantly increased
this last decade due to the rapid growth of the derivative market and has led in some
instances to colossal losses caused by the misvaluation of derivative instruments.
Recently, the financial community has shown an accrued interest in the assessment
of model and parameter uncertainty (see, for instance, Morini [19]). In particular,
the Basel Committee on Banking Supervision [2] has issued a directive to compel
financial institutions to take into account the uncertainty of the model valuation in
the mark-to-model valuation of exotic products. Cont [6] set up the theoretical basis
of a quantitative framework built upon coherent or convex risk measures and aimed
at assessing model uncertainty by a worst-case approach.1 Addressing the question
from a more practical angle, Schoutens et al. [22] illustrated on real market data
how models fitting the option surface equally well can lead to significantly different
results once used to price exotic instruments or to hedge a financial position.
Another source of risk for the price of exotics originates from the choice of the
procedure used to calibrate a specific model on the market reality. Indeed, although
the standard approach consists of solving the so-called inverse problem, i.e., quoting
Cont [7], of finding the parameters for which the value of benchmark instruments,
computed in the model, corresponds to their market prices, alternative procedures
have seen the day. The ability of the model to replicate the current market situation
could rather be specified in terms of the distribution goodness of fit or in terms of
moments of the asset log-returns as proposed by Eriksson et al. [9] and Guillaume
and Schoutens [12]. In practice, even solving the inverse problem requires making
a choice among several equally suitable alternatives. Indeed, matching perfectly the
whole set of liquidly traded instruments is typically not plausible such that one
looks for an “optimal” match, i.e., for the parameter set which replicates as well as
possible the market price of a set of benchmark instruments. Put another way, we
minimize the distance between the model and the market prices of those standard
instruments. Hence, the calibration exercise first requires not only the definition of
the concept of a distance and its metric but also the specification of the benchmark
instruments. Benchmark instruments usually refer to liquidly traded instruments. In
equity markets, it is a common practice to select liquid European vanilla options.
But even with such a precise specification, several equally plausible selections can
arise. We could for instance select out-of-the-money options with a positive bid price,
following the methodology used by the Chicago Board Options Exchange (CBOE
[4]) to compute the VIX volatility index, or select out-of-the-money options with a
positive trading volume, or ... Besides, practitioners sometimes resort to time series
or market quotes to fix some of the parameters beforehand, allowing for a greater
stability of the calibrated parameters over time. In particular, the recent emergence
of a liquid market for volatility derivatives has made this methodology possible to
calibrate stochastic volatility models. Such an alternative has been investigated in
Guillaume and Schoutens [11] under the Heston stochastic volatility model, where
1Another framework for risk management under Knightian uncertainty is based on the concept of
g-expectations (see, for instance, Peng [20] and references therein).
the spot variance and the long-run variance are inferred from the spot value of the VIX
volatility index and from the VIX option price surface, respectively. Another example
is Brockhaus and Long [3] (see also Guillaume and Schoutens [13]) who propose to
choose the spot variance, the long-run variance, and the mean reverting rate of the
Heston stochastic volatility model in order to replicate as well as possible the term
structure of model-free variance swap prices, i.e., of the return expected future total
variance. Regarding the specification of the distance metric, several alternatives can
be found in the literature. The discrepancy could be defined as relative, absolute, or in
the least-square sense differences and expressed in terms of price or implied volatility.
Detlefsen and Härdle [8] introduced the concept of calibration risk (or should we say
calibration uncertainty) arising from the different (plausible) specifications of the
objective function we want to minimize. Later, Guillaume and Schoutens [10] and
Guillaume and Schoutens [11] extended the concept of calibration risk to include not
only the choice of the functional but also the calibration methodology and illustrated
it under the Heston stochastic volatility model.
In order to measure the impact of model or parameter ambiguity on the price of
structured products, several alternatives have been proposed in the financial litera-
ture. Cont [6] proposed the so-called worst-case approach where the impact of model
uncertainty on the value of a claim is measured by the difference between the supre-
mum and infimum of the expected claim price over all pricing models consistent with
the market quote of a set of benchmark instruments (see also Hamida and Cont [16]).
Gupta and Reisinger [14] adopted a Bayesian approach allowing for a distribution
of exotic prices resulting directly from the posterior distribution of the parameter set
obtained by updating a plausible prior distribution using a set of liquidly traded instru-
ments (see also Gupta et al. [15]). Another methodology allowing for a distribution
of exotic prices, but based on risk-capturing functionals has recently been proposed
by Bannör and Scherer [1]. This method differs from the Bayesian approach since the
distribution of the parameter set is constructed explicitly by allocating a higher proba-
bility to parameter sets leading to a lower discrepancy between the model and market
prices of a set of benchmark instruments. Whereas the Bayesian approach requires
a parametric family of models and is consequently appropriate to assess parameter
uncertainty, the two alternative proxies (i.e., the worst-case and the risk-capturing
functionals approaches) can be considered to quantify the ambiguity resulting from a
broader set of models with different intrinsic characteristics. These three approaches
share the characteristic that the plausibility of any pricing measure Q is assessed
by considering the average distance between the model and market prices, either
by allocating a probability weight to each measure Q which is proportional to this
distance or by selecting the measures Q for which the distance falls within the aver-
age bid-ask spread. Hence, the resulting measure of uncertainty implicitly depends
on the metric chosen to express this average distance. We will adopt a somewhat
different methodology, although similar to the ones above-mentioned. We start from
a set of plausible calibration procedures and we consider the resulting risk-neutral
probability measures (i.e., the optimal parameter sets) as the test measures used to
assess the (un)acceptability of any zero cost cash-flow X . In other words, these pric-
ing measures can be seen as the ones defining the cone of acceptable cash-flows;
$$X \in \mathcal{A} \iff E^Q[X] \ge 0 \quad \forall\, Q \in \mathcal{M}.$$
provided that $\mathcal{M} \neq \emptyset$, where $EP^Q$ denotes the exotic price under the pricing measure
$Q$. The model uncertainty can thus be quantified by the bid-ask spread of illiquid
products. Indeed, the cash-flow of selling a claim with payoff $X$ at time $T$ at its ask
price is acceptable for the market if $E^Q[a - \exp(-rT)X] \ge 0$, $\forall\, Q \in \mathcal{M}$, i.e., if
$a \ge \exp(-rT)\max_{Q \in \mathcal{M}}\{E^Q[X]\}$. For the sake of competitiveness, the ask price is set
at the minimum value, i.e.,
$$a = \exp(-rT)\max_{Q \in \mathcal{M}}\{E^Q[X]\}.$$
Similarly, the cash-flow of buying a claim with payoff $X$ at time $T$ at its bid price
is acceptable for the market if $E^Q[-b + \exp(-rT)X] \ge 0$, $\forall\, Q \in \mathcal{M}$, i.e., taking
the maximum possible value for competitiveness reasons,
$$b = \exp(-rT)\min_{Q \in \mathcal{M}}\{E^Q[X]\}.$$
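Given an estimate of the expected payoff $E^Q[X]$ under each calibrated test measure in $\mathcal{M}$ (e.g., from Monte Carlo), the bid and ask follow directly from the two formulas above; a minimal sketch with hypothetical inputs:

```python
import numpy as np

def conic_bid_ask(expected_payoffs, r, T):
    """Bid/ask from the set M of test measures, where expected_payoffs[q]
    is E_Q[X] for the q-th calibrated measure in M."""
    e = np.asarray(expected_payoffs, dtype=float)
    disc = np.exp(-r * T)
    return disc * e.min(), disc * e.max()     # (bid, ask)

# hypothetical usage: exotic payoff expectations under five calibrated parameter sets
bid, ask = conic_bid_ask([4.91, 5.03, 4.87, 5.10, 4.96], r=0.01, T=0.25)
print(f"bid = {bid:.4f}, ask = {ask:.4f}")
```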
The impact of model uncertainty can be expressed as a function of the severity of the
percentage threshold p. We note that decreasing the threshold ultimately boils down
to considering a thinner set of benchmark instruments since the model price has to
fall within the market bid-ask spread for a smaller number of calibration instruments
in order for a pricing measure to be selected. In particular, such a relaxation typically
results in the “elimination” of the most illiquid calibration instruments, i.e., deep
out-of-the-money options in the case of equity markets (see Fig. 2).
For the numerical study, we consider the Variance Gamma (VG) model of Madan
et al. [18] only, although the methodology can be equivalently used to assess cali-
bration or/and model uncertainty. The calibration instrument set consists of liquid
out-of-the-money options: moving away from the forward price, we select put and
call options with a positive bid price and with a strike lower and higher than the
forward price, respectively, and this until we encounter two successive options with
zero bid. Denoting by $P_i = \frac{a_i + b_i}{2}$ the mid-price of option $i$ and by $\sigma_i$ its implied
volatility, the set of measures $\mathcal{M}$ results from the following specifications for the
objective function we minimize (i.e., for the distance and its metric):
$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^M \omega_i \big(P_i - \widehat P_i\big)^2} \qquad\qquad \mathrm{RMSE}^\sigma = \sqrt{\sum_{i=1}^M \omega_i \big(\sigma_i - \widehat\sigma_i\big)^2}$$
$$\mathrm{ARPE} = \sum_{i=1}^M \omega_i \frac{\big|P_i - \widehat P_i\big|}{P_i} \qquad\qquad \mathrm{ARPE}^\sigma = \sum_{i=1}^M \omega_i \frac{\big|\sigma_i - \widehat\sigma_i\big|}{\sigma_i}$$
$$\mathrm{APE} = \frac{1}{\bar P} \sum_{i=1}^M \omega_i \big|P_i - \widehat P_i\big| \qquad\qquad \mathrm{APE}^\sigma = \frac{1}{\bar\sigma} \sum_{i=1}^M \omega_i \big|\sigma_i - \widehat\sigma_i\big|,$$
where $\bar P$ and $\bar\sigma$ denote the average option price and the average implied volatility, respectively,
and $\widehat P_i$ and $\widehat\sigma_i$ denote the model price and model implied volatility of option $i$.
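A sketch of the six distance functionals, assuming arrays of market mid-prices, model prices and the corresponding implied volatilities are available (weights default to $1/M$); the function name and inputs are hypothetical:

```python
import numpy as np

def calibration_objectives(P, P_hat, vol, vol_hat, w=None):
    """Price- and implied-volatility-based calibration distances."""
    P, P_hat, vol, vol_hat = (np.asarray(x, dtype=float) for x in (P, P_hat, vol, vol_hat))
    w = np.full(len(P), 1.0 / len(P)) if w is None else np.asarray(w, dtype=float)
    return {
        "RMSE": np.sqrt(np.sum(w * (P - P_hat) ** 2)),
        "RMSE_vol": np.sqrt(np.sum(w * (vol - vol_hat) ** 2)),
        "ARPE": np.sum(w * np.abs(P - P_hat) / P),
        "ARPE_vol": np.sum(w * np.abs(vol - vol_hat) / vol),
        "APE": np.sum(w * np.abs(P - P_hat)) / P.mean(),
        "APE_vol": np.sum(w * np.abs(vol - vol_hat)) / vol.mean(),
    }

# hypothetical usage with two options
print(calibration_objectives([10.0, 5.2], [9.8, 5.5], [0.22, 0.25], [0.23, 0.24]))
```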
Each of these six objective functions can again be subdivided into an unweighted
functional for which the weight $\omega_i = \omega = \frac{1}{M}$ for all $i$ and a weighted functional for which
the weight ωi is proportional to the trading volume of option i. We furthermore con-
sider the possibility of adding an extra penalty term to the objective function in order
to force the model prices to lie within their market bid-ask spread. Besides these stan-
dard specifications (in terms of the price or the implied volatility of the calibration
instruments), we consider the so-called moment matching market implied calibra-
tion proposed by Guillaume and Schoutens [12] and which consists in matching the
moments of the asset log-return which are inferred from the implied volatility sur-
face. As the VG model is fully characterized by three parameters, we consider three
standardized moments, namely the variance, the skewness, and the kurtosis. Since
as shown by Guillaume and Schoutens [12], the variance can always be perfectly
matched, we either allocate the same weight to the matching of the skewness and the
kurtosis or we match uppermost the lower moment, i.e., the skewness. This leads to
a total of 26 plausible calibration procedures, each of them leading to a test measure
Q ∈ M provided that the proportion of model prices falling within their market
bid-ask spread is at least equal to the threshold p.
2The data are taken from the KU Leuven data collection which is a private collection of historical
daily spot and option prices of major US equity stocks and indices.
Fig. 1 Maximum proportion π of option prices replicated within their bid-ask spread (upper) and option bid-ask spreads (below), over trading days from 01/10/08 to 30/10/09.
$$\pi = \frac{1}{M}\, \max_{Q}\, \#\big\{ P_i^Q \in [b_i, a_i],\ i = 1, \ldots, M \big\}.$$
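A sketch of how $\pi$ can be computed, assuming the model prices are stored per calibration procedure as rows of a 2-D array and the quoted bids and asks as 1-D arrays (hypothetical inputs):

```python
import numpy as np

def max_in_spread_proportion(model_prices, bid, ask):
    """pi = (1/M) * max over calibrations of #{ P_i^Q in [b_i, a_i] }."""
    model_prices = np.atleast_2d(model_prices)      # shape (n_calibrations, M)
    in_spread = (model_prices >= bid) & (model_prices <= ask)
    return in_spread.sum(axis=1).max() / model_prices.shape[1]

# hypothetical usage: 2 calibration procedures, 4 benchmark options
bid = np.array([1.00, 2.00, 0.50, 3.00])
ask = np.array([1.20, 2.30, 0.70, 3.40])
prices = np.array([[1.10, 2.50, 0.60, 3.20],
                   [1.30, 2.20, 0.65, 3.50]])
print(max_in_spread_proportion(prices, bid, ask))   # 0.75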
If $\pi < p$, then $\mathcal{M}$ is empty and there exists no exotic spread for that particular
threshold $p$ as defined by (1). Hence, when selecting the proportion threshold $p$,
we should keep in mind the trade-off between the in-spread precision and the number
66 F. Guillaume and W. Schoutens
Number of options replicated within bid−ask spread Number of options replicated within bid−ask spread
(RMSE price unweighted specification) (RMSE price weighted specification)
1500 900
in spread in spread
out spread 800 out spread
700
number of options
number of options
1000 600
500
400
500 300
200
100
0 0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Number of options replicated within bid−ask spread Number of options replicated within bid−ask spread
(RMSE volatility unweighted specification) (RMSE volatility weighted specification)
1500 900
out spread out spread
in spread 800 in spread
700
number of options
number of options
1000 600
500
400
500 300
200
100
0 0
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Fig. 2 Number of options for which the model price falls within the quoted bid-ask spread
Indeed, the higher the proportion, the higher the precision, but the fewer the measures selected as test measures, which can in turn lead to an underestimation of the calibration uncertainty measured by the exotic bid-ask spreads. From Fig. 1, we observe that π is significantly higher during the heart of the recent credit crunch, i.e., from the beginning of the sample period until mid-2009. This can easily be explained by the typically wider bid-ask spreads observed during periods of market distress. Indeed, as shown in the lower panel of Fig. 1, the quoted spread for at-the-money, in-the-money (K = 0.75 S_0), and out-of-the-money (K = 1.25 S_0) options has shrunk significantly after the troubled period of October 2008 to July 2009.
Figure 2 shows the number of vanilla options whose model price falls within the quoted bid-ask spread as a function of the option moneyness for four of the calibration procedures under investigation, namely the weighted and unweighted RMSE price and implied volatility specifications without penalty term. To assess the impact of moneyness on the ability of the model to replicate option prices within their bid-ask spread, we split the strike range into 21 classes: K/S_0 < 0.5, 0.5 ≤ K/S_0 < 0.55, 0.55 ≤ K/S_0 < 0.6, ..., 1.45 ≤ K/S_0 < 1.5, and K/S_0 ≥ 1.5. We clearly see that, at least for the price specifications, option prices falling outside their quoted bid-ask spread are mainly observed for deep out-of-the-money calls and puts. This trend is even more marked,
and is present in the implied volatility specifications as well, when we add a penalty term to the objective function to constrain the model prices to lie within the market spread. Hence, increasing the proportion threshold p mainly boils down to limiting the set of calibration instruments to close-to-the-money vanilla options.
In order to illustrate the impact of parameter uncertainty on the bid-ask spread of
exotics, we consider the following path dependent options (with a maturity of T = 3
months):
1. Asian option
The payoff of an Asian option depends on the arithmetic average of the stock price from the issue date to the maturity date of the option. The fair prices of the Asian call and put options with maturity T are given by
\[
\mathrm{AC} = e^{-rT}\, E_Q\Big[\Big(\operatorname*{mean}_{0 \le t \le T} S_t - K\Big)^{+}\Big], \qquad
\mathrm{AP} = e^{-rT}\, E_Q\Big[\Big(K - \operatorname*{mean}_{0 \le t \le T} S_t\Big)^{+}\Big].
\]
4. Cliquet option
The payoff of a cliquet option depends on the sum of the stock returns over a
series of consecutive time periods; each local performance being first floored
and/or capped. Moreover, the final sum is usually further floored and/or capped
to guarantee a minimum and/or maximum overall payoff, such that cliquet options protect investors against downside risk while still allowing them to participate in significant upside potential.
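For illustration, the payoffs of the Asian and cliquet contracts just described can be evaluated on a discretely monitored simulated path as in the following sketch (the monitoring grid and reset-date handling are simplifying assumptions, not taken from the paper).

import numpy as np

def asian_call_payoff(path, K):
    # arithmetic average of the (discretely monitored) stock price up to maturity
    return max(path.mean() - K, 0.0)

def asian_put_payoff(path, K):
    return max(K - path.mean(), 0.0)

def cliquet_payoff(path, reset_idx, loc_floor=-0.08, loc_cap=0.08,
                   glob_floor=0.0, glob_cap=np.inf):
    """reset_idx: indices of the reset dates in the path (including the start).
    Sum of locally floored/capped period returns, then globally floored/capped."""
    S = path[reset_idx]
    local = np.clip(S[1:] / S[:-1] - 1.0, loc_floor, loc_cap)
    return np.clip(local.sum(), glob_floor, glob_cap)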
For the sake of comparison, we also price a 3-month at-the-money call option. Note that this option does not generally belong to the set of benchmark instruments since, most of the time, we cannot observe a market quote for an option with the exact same maturity and moneyness.
The path dependent nature of exotic options requires the use of the Monte Carlo
procedure to simulate sample paths of the underlying index. The stock price process
\[
S_t = \frac{S_0 \exp\big((r - q)t + X_t\big)}{E_Q[\exp(X_t)]}, \qquad X \sim VG(\sigma, \nu, \theta),
\]
is discretized by using a first order Euler scheme (for more details on the simula-
tion, see Schoutens [21]). The (standard) Monte Carlo simulation is performed by
considering one million scenarios and 252 trading days a year.
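As a rough sketch of this pricing step (not the authors' implementation), the following code simulates VG paths via a gamma time change, applies the mean correction E_Q[exp(X_t)] = (1 − θν − σ²ν/2)^{−t/ν}, prices an Asian call under a few parameter sets standing in for calibrated test measures, and reads off bid and ask as the extreme model prices over that set, in the spirit of the conic-finance construction; all parameter values are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)

def vg_paths(S0, r, q, sigma, nu, theta, T, n_steps, n_paths):
    """Discretely monitored paths of the mean-corrected exponential VG model
    S_t = S0 * exp((r - q) t + X_t) / E[exp(X_t)], X ~ VG(sigma, nu, theta)."""
    dt = T / n_steps
    dG = rng.gamma(shape=dt / nu, scale=nu, size=(n_paths, n_steps))   # gamma clock
    dX = theta * dG + sigma * np.sqrt(dG) * rng.standard_normal((n_paths, n_steps))
    X = np.cumsum(dX, axis=1)
    t = dt * np.arange(1, n_steps + 1)
    mart = (1.0 - theta * nu - 0.5 * sigma**2 * nu) ** (-t / nu)       # E[exp(X_t)]
    return S0 * np.exp((r - q) * t + X) / mart

def asian_call_price(params, S0, K, r, q, T, n_steps=63, n_paths=100_000):
    S = vg_paths(S0, r, q, *params, T, n_steps, n_paths)
    return np.exp(-r * T) * np.mean(np.maximum(S.mean(axis=1) - K, 0.0))

# hypothetical (sigma, nu, theta) triples standing in for the calibrated test measures
test_measures = [(0.18, 0.20, -0.14), (0.20, 0.25, -0.12)]
prices = [asian_call_price(p, S0=100.0, K=100.0, r=0.02, q=0.0, T=0.25) for p in test_measures]
bid, ask = min(prices), max(prices)
rel_spread = (ask - bid) / (0.5 * (ask + bid))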
The bid and ask prices and the relative bid-ask spread (the dollar bid-ask spread expressed as a proportion of the mid-price) of the different exotic options are shown in Figs. 3 and 4, respectively, for proportion thresholds p equal to 0.5, 0.75, and 0.9. For the sake of comparison, Fig. 5 shows the same results for the 3-month at-the-money call option. The figures clearly indicate that the impact of parameter uncertainty is much more marked for path-dependent derivatives than for (non-quoted) vanilla options. Indeed, the relative bid-ask spread is at least an order of magnitude higher for the Asian call, lookback call, barrier call, and cliquet than for the vanilla call option. Besides, we observe that a far above average relative spread for the call does not necessarily imply a far above average relative spread for the path-dependent options. In order to assess the consistency of our findings, we have repeated the Monte Carlo simulation 400 times for one fixed quoting day (namely October 1, 2008) with different sets of sample paths and computed the option relative spreads for each simulation. Figure 6 shows the resulting histogram of each relative spread and clearly brings out the consistency of the results: the relative spread is far more significant for the exotic options than for the vanilla options, whatever the set of sample paths considered. The consistency of the Monte Carlo study is moreover supported by the fact that we used the same set of sample paths to price each option. Table 1, which shows the average price, standard deviation, and relative spread (across the 400 Monte Carlo simulations) for the price-weighted RMSE functional, confirms that the exotic bid-ask spreads are due to the nature of the exotic options rather than to the intrinsic uncertainty of the Monte Carlo simulations. Indeed, the Monte Carlo relative spread given in Table 1 is significantly smaller than the option spread depicted in Fig. 6, for each exotic option. Table 2 shows the average of the relative spread over the whole period under investigation for the different options under consideration.
(Fig. 3: bid and ask prices through time for p = 0.5, 0.75, and 0.9; panels: Asian call option (K = S_0, T = 1/4), Lookback call option (T = 1/4), and Cliquet option (local floor = −0.08, global floor = 0, local cap = 0.08, global cap = +∞, N = 3, T = 1/4, t_i = i/12).)
Fig. 4 Evolution of exotic relative bid-ask spreads (in absolute value) through time, for p = 0.5, 0.75, and 0.9; panels: Lookback call option (T = 1/4), UI Barrier call option (K = S_0, H = 1.25 S_0, T = 1/4), and Cliquet option (local floor = −0.08, global floor = 0, local cap = 0.08, global cap = +∞, N = 3, T = 1/4, t_i = i/12)
Fig. 5 Evolution of vanilla bid and ask prices and relative bid-ask spread (in absolute value) through
time
We clearly observe that the threshold p impacts the spread of the path-dependent options more severely. Indeed, decreasing p leads to a sharper increase of the relative bid-ask spread for the exotic options than for the European call and put options. Besides, the calibration risk is predominant for the up-and-in barrier call option and, to a smaller extent, for the Asian options. Table 3 shows the 95 % quantile of the relative bid-ask spreads. We clearly see that, in terms of extreme events, the riskiest options are the up-and-in barrier call option and the lookback options. By way of conclusion, our findings clearly illustrate the impact of the calibration methodology on the price of exotic options, suggesting that risk managers should take calibration uncertainty into account when assessing the safety margin.
Fig. 6 Relative bid-ask spreads (in absolute value) for different Monte Carlo simulations
3 Conclusion
This paper sets the theoretical foundation of a new framework aimed at assessing the
impact of calibration uncertainty. The main advantage of the proposed methodology
resides in its metric-free nature since the selection of test measures does not depend
on any specified distance. Besides, the paper links the concept of uncertainty to the recently developed conic finance theory by defining the test measures used to construct the cone of acceptable cash-flows as the pricing measures resulting from any plausible calibration methodology, so that model and parameter uncertainties are naturally measured as bid-ask spreads. The numerical study has highlighted
the significant impact of parameter uncertainty for a wide range of path-dependent
options under the popular VG model.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Bannör, K.F., Scherer, M.: Capturing parameter risk with convex risk measures. Eur. Actuar.
J. 3, 97–132 (2013)
2. Basel Committee on Banking Supervision: Revisions to the Basel II market risk framework.
Technical report, Bank for International Settlements (2009)
3. Brockhaus, O., Long, D.: Volatility swaps made simple. Risk 13(1), 92–95 (2000)
4. CBOE. VIX: CBOE volatility index. Technical report, Chicago (2003)
5. Cherny, A., Madan, D.B.: Markets as a counterparty: an introduction to conic finance. Int. J.
Theor. Appl. Financ. 13, 1149–1177 (2010)
6. Cont, R.: Model uncertainty and its impact on the pricing of derivative instruments. Math.
Financ. 16, 519–547 (2006)
7. Cont, R.: Model calibration. Encyclopedia of Quantitative Finance, pp. 1210–1219. Wiley,
Chichester (2010)
8. Detlefsen, K., Härdle, W.K.: Calibration risk for exotic options. J. Deriv. 14, 47–63 (2007)
9. Eriksson, A., Ghysels, E., Wang, F.: The normal inverse Gaussian distribution and the pricing of derivatives. J. Deriv. 16, 23–37 (2009)
10. Guillaume, F., Schoutens, W.: Use a reduced Heston or reduce the use of Heston? Wilmott J. 2, 171–192 (2010)
11. Guillaume, F., Schoutens, W.: Calibration risk: illustrating the impact of calibration risk under the Heston model. Rev. Deriv. Res. 15, 57–79 (2012)
12. Guillaume, F., Schoutens, W.: A moment matching market implied calibration. Quant. Financ.
13, 1359–1373 (2013)
13. Guillaume, F., Schoutens, W.: Heston model: the variance swap calibration. J. Optim. Theory
Appl. (2013) (to appear)
14. Gupta, A., Reisinger, C.: Robust calibration of financial models using Bayesian estimators. J.
Comput. Financ. (2013) (to appear)
15. Gupta, A., Reisinger, C., Whitley, A.: Model uncertainty and its impact on derivative pricing.
In: Böcker, K. (ed.) Rethinking Risk Measurement and Reporting, pp. 625–663. Nick Carver
(2010)
16. Hamida, S.B., Cont, R.: Recovering volatility from option prices by evolutionary optimization.
J. Comput. Financ. 8, 43–76 (2005)
17. Knight, F.: Risk, Uncertainty and Profit. Houghton Mifflin Co., Boston (1920)
18. Madan, D.B., Carr, P., Chang, E.: The variance gamma process and option pricing. Eur. Financ.
Rev. 2, 79–105 (1998)
19. Morini, M.: Understanding and Managing Model Risk. A Practical Guide for Quants, Traders
and Validators. Wiley, New York (2011)
20. Peng, S.: Nonlinear expectation theory and stochastic calculus under Knightian uncertainty. In:
Bensoussan, A., Peng, S., Sung, J. (eds.) Real Options, Ambiguity, Risk and Insurance, pp.
144–184. IOS Press BV, Amsterdam (2013)
21. Schoutens, W.: Lévy Processes in Finance: Pricing Financial Derivatives. Wiley, New York
(2003)
22. Schoutens, W., Simons, E., Tistaert, J.: A perfect calibration! Now what? Wilmott Mag. 3,
66–78 (2004)
Derivative Pricing under the Possibility
of Long Memory in the supOU Stochastic
Volatility Model
Abstract We consider the supOU stochastic volatility model which is able to exhibit
long-range dependence. For this model, we give conditions for the discounted stock
price to be a martingale, calculate the characteristic function, give a strip where it
is analytic, and discuss the use of Fourier pricing techniques. Finally, we present a
concrete specification with polynomially decaying autocorrelations and calibrate it
to observed market prices of plain vanilla options.
1 Introduction
The views expressed herein are those of the authors and do not necessarily reflect the views of
the institutions mentioned below.
R. Stelzer (B)
Institute of Mathematical Finance, Ulm University, Helmholtzstr. 18,
89081 Ulm, Germany
e-mail: [email protected]
J. Zavišin
risklab GmbH/Allianz Global Investors, Seidlstr. 24-24a,
80335 Munich, Germany
e-mail: [email protected]
In this paper, we consider a variant of the model which additionally can cover
the stylized fact of long-range dependence (or slower than exponentially decaying
autocorrelations), the supOU stochastic volatility model. In this model, we specify
the volatility as a superposition of Ornstein–Uhlenbeck (thus “supOU”) processes,
which have been introduced in [2]. Various features of this volatility model (in a
multidimensional setting) have been considered in [4, 5, 18, 26].
Typically long-range dependence is obtained by using fractional Brownian motion
or fractional Lévy processes as the driving noises, see, e.g., [6, 7] for a critical
discussion of such models for financial markets. In such models one cannot have
jumps, as fractional Lévy processes (cf. [16]) have continuous paths, and one is
bound to have long memory. In our supOU model, one has a natural extension of the
OU-type model that exhibits jumps and, depending on the parameters, can exhibit
short or long memory. However, our model shares one disadvantage with fractional
process based models, viz. that it is no longer Markovian. In this context, one should
bear in mind that most Markov processes one employs to model volatilities are geometrically ergodic and thus cannot exhibit long memory, although there also exist Markov processes with polynomially decaying mixing coefficients and even long memory (see, e.g., [27]).
The focus of the present paper is on derivative pricing in and calibration of the
univariate supOU SV model similar to the papers [19, 20] in the (multivariate) OU-
type SV model. To this end, we first briefly review the model in Sect. 2. In Sect. 3,
we give conditions on the parameters such that the discounted stock price process
is a martingale which implies that under these conditions the model can be used to
describe the risk neutral dynamics of a financial asset. Thereafter, we start Sect. 4
with a review of Fourier pricing. Then, we give the characteristic function of the log
asset price in the supOU SV model and show conditions for the moment generating
function to be sufficiently regular so that Fourier pricing is applicable. Finally, we present a concrete specification, the Γ-supOU SV model, in Sect. 5 and discuss its calibration to market data, which we illustrate with a small example using options on the DAX. We conclude by discussing a subtle issue regarding how to employ the calibrated model to calculate prices of European options with a general maturity.
We briefly review the definition and the most important known facts of the supOU
stochastic volatility model introduced in [5]. More background on supOU processes
can be found in [2, 4, 13, 26].
In the following, R− denotes the set of negative real numbers and Bb (R− × R)
denotes the bounded Borel sets of R− × R.
\[
\Sigma_t = \int_{\mathbb{R}_-} \int_{-\infty}^{t} e^{A(t-s)}\, \Lambda(dA, ds), \qquad t \in \mathbb{R}_+,
\]
\[
X_t = X_0 + \int_0^t a_s\, ds + \int_0^t \Sigma_s^{1/2}\, dW_s + \rho\,(L_t - \gamma_0 t),
\]
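To give an impression of the superposition structure, the sketch below simulates a volatility trajectory for a finite-activity specification in which the Lévy basis places compound Poisson jumps in time, each jump carrying its own mean-reversion rate A drawn from π (a "negative" Gamma law, as in the Γ-supOU specification of Sect. 5); the truncation of the infinite past and all parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

def gamma_supou_vol_path(t_grid, a=0.2, b=30.0, B=-0.05, alpha=2.0, t_start=-50.0):
    """Finite-activity supOU volatility: jumps of size Exp(b) arrive at Poisson(a) rate
    on (t_start, T]; each jump keeps its own decay rate A = B * R with R ~ Gamma(alpha, 1).
    Sigma_t is the superposition sum_i exp(A_i (t - s_i)) x_i over past jumps;
    truncating the past at t_start approximates the integral over (-inf, t]."""
    T = t_grid[-1]
    n_jumps = rng.poisson(a * (T - t_start))
    s = rng.uniform(t_start, T, n_jumps)          # jump times (including the "past")
    x = rng.exponential(1.0 / b, n_jumps)         # jump sizes
    A = B * rng.gamma(alpha, 1.0, n_jumps)        # individual mean-reversion rates (< 0)
    Sigma = np.zeros_like(t_grid, dtype=float)
    for ti, t in enumerate(t_grid):
        past = s <= t
        Sigma[ti] = np.sum(np.exp(A[past] * (t - s[past])) * x[past])
    return Sigma

Sigma = gamma_supou_vol_path(np.linspace(0.0, 5.0, 501))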
3 Martingale Conditions
We now consider a market with a deterministic numéraire (or bond) with price process e^{rt} for some r ≥ 0 and a risky asset with price process S_t.
We want to model the market by a supOU stochastic volatility model under the risk
neutral dynamics. Thus, we need to understand when Ŝt = e−r t e X t is a martingale
for the filtration G = (Gt )t∈R+ generated by the Wiener process and the Lévy basis,
i.e., Gt = σ ({Λ(A), Ws : s ∈ [0, t] and A ∈ Bb (R− × (−∞, t])}) for t ∈ R+ .
Implicitly, we understand that the filtration is modified such that the usual hypotheses
(see, e.g., [22]) are satisfied.
Theorem 3.1 (Martingale condition) Consider a market as described above. Sup-
pose that
\[
\int_{x > 1} \big(e^{\rho x} - 1\big)\, \nu(dx) < \infty. \qquad (1)
\]
Proof The arguments are straightforward adaptations of the ones in [19, Proposition
2.10] or [20, Sect. 3].
Our aim now is to use the Fourier pricing approach in the supOU stochastic volatility
model for calculating prices of European derivatives.
We start with a brief review of the well-known Fourier pricing techniques introduced
in [9, 23].
Let the price process of a financial asset be modeled as an exponential semi-
martingale S = (St )0≤t≤T , i.e., St = S0 e X t , 0 ≤ t ≤ T where X = (X t )0≤t≤T is a
semimartingale.
Let r be the risk-free interest rate and let us assume that we are directly work-
ing under an equivalent martingale measure, i.e., the discounted price process
Ŝ = ( Ŝt )0≤t≤T given by Ŝt = S0 e X t −r t is a martingale.
We call the process X the underlying process and without loss of generality we
can assume that X 0 = 0. We denote by s minus the logarithm of the initial value of
S, i.e., s = − log(S0 ).
Let f̂ denote the Fourier transform of the function f, i.e., \hat f(u) = \int_{\mathbb{R}} e^{iux} f(x)\, dx.
(i) g_R ∈ L^1(\mathbb{R}) ∩ L^∞(\mathbb{R}), (ii) Φ_{X_T|G_0}(R) < ∞, (iii) Φ_{X_T|G_0}(R + i·) ∈ L^1(\mathbb{R}), then
\[
V_f(X_T; s) = \frac{e^{-rT - Rs}}{2\pi} \int_{\mathbb{R}} e^{-ius}\, \Phi_{X_T|\mathcal{G}_0}(R + iu)\, \hat f(iR - u)\, du.
\]
It is well known that for a European call option with maturity T and strike K > 0, condition (i) is satisfied for R > 1 and that for the payoff function f(x) = \max(e^x - K, 0) =: (e^x - K)^+ the Fourier transform is \hat f(u) = \frac{K^{1+iu}}{iu(1+iu)} for u ∈ \mathbb{C} with \operatorname{Im}(u) ∈ (1, ∞).
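A direct numerical transcription of the valuation formula above for the European call (with f̂ as just given) could look as follows; the conditional moment generating function Φ = Φ_{X_T|G_0} is assumed to be supplied as a callable, and the truncation and grid of the u-integral are illustrative choices. As a plausibility check, the snippet evaluates the formula with a Black–Scholes moment generating function.

import numpy as np

def call_price_fourier(Phi, S0, K, r, T, R=1.5, u_max=200.0, n=4001):
    """Evaluate V = e^{-rT - R s}/(2 pi) * int e^{-ius} Phi(R+iu) fhat(iR-u) du
    with s = -log(S0) and fhat(z) = K^{1+iz} / (iz (1+iz)), as in the formula above."""
    s = -np.log(S0)
    u = np.linspace(-u_max, u_max, n)
    z = 1j * R - u
    fhat = K ** (1.0 + 1j * z) / (1j * z * (1.0 + 1j * z))
    integrand = np.exp(-1j * u * s) * np.array([Phi(R + 1j * ui) for ui in u]) * fhat
    integral = np.sum(integrand) * (u[1] - u[0])          # simple quadrature
    return float(np.real(np.exp(-r * T - R * s) / (2.0 * np.pi) * integral))

# sanity check with a Black-Scholes mgf: X_T ~ N((r - 0.5*vol^2) T, vol^2 T), X_0 = 0
vol, r, T, S0, K = 0.2, 0.02, 1.0, 100.0, 100.0
Phi_bs = lambda w: np.exp(w * (r - 0.5 * vol**2) * T + 0.5 * w**2 * vol**2 * T)
price = call_price_fourier(Phi_bs, S0, K, r, T)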
In the following, we calculate the characteristic/moment generating function for
the supOU SV model and show conditions when the above Fourier pricing techniques
are applicable.
Consider the general supOU SV model with drift of the form a_t = μ + γ_0 + βΣ_t. Note that then the discounted stock price is a martingale if and only if β = −1/2 and μ + γ_0 = r − \int_{\mathbb{R}_+} (e^{\rho x} - 1)\, \nu(dx).
Standard calculations as in [19, Theorem 2.5] or [20] give the following result
which is the univariate special case of a formula reported in [4, Sect. 5.2].
Theorem 4.2 Let X 0 ∈ R and let the log-price process X follow a supOU SV model
of the above form. Then, for every t ∈ R+ and for all u ∈ R the characteristic
function of X t given G0 is given by
\[
\Phi_{X_t|\mathcal{G}_0}(iu) = \exp\!\Bigg( iu X_0 + iu(\mu + \gamma_0 - \rho\gamma_0)\,t
+ \Big(iu\beta - \frac{u^2}{2}\Big) z_t
+ \int_{\mathbb{R}_-}\!\int_0^t \varphi\Big( \frac{e^{A(t-s)} - 1}{A}\Big(iu\beta - \frac{u^2}{2}\Big) + iu\rho \Big)\, ds\, \pi(dA) \Bigg),
\]
where ϕ = θ_L is the cumulant transform of the underlying Lévy process introduced below and z_t is the G_0-measurable random variable given below.
Note that in contrast to the case of the OU-type stochastic volatility model, where
(X, Σ) is a strong Markov process, in the supOU stochastic volatility model Σ is not
Markovian. Thus, conditioning on X 0 and Σ0 is not equivalent to conditioning upon
G0 . Therefore, Φ X t |G0 (iu) is not simply a function of X 0 , Σ0 . Instead, the whole past
of the Lévy basis enters via the G0 -measurable
\[
z_t = \int_{\mathbb{R}_-} \int_{-\infty}^{0} \frac{1}{A}\big(e^{A(t-s)} - e^{-As}\big)\, \Lambda(dA, ds),
\]
which has a similar role to the initial volatility Σ_0 in the OU-type stochastic volatility model. Like Σ_0 in the OU-type models, z_t can be treated as an additional parameter to be determined when calibrating the model to market option prices. We immediately see that the number of parameters to be estimated thus increases with each additional maturity. As will become clear later, the following observation is important.
In order to apply Fourier pricing, we now show where the moment generating function
Φ X T |G0 is analytic.
Let θ_L(u) = γ_0 u + \int_{\mathbb{R}_+} (e^{ux} - 1)\, \nu(dx) be the cumulant transform of the Lévy basis (or rather of its underlying subordinator). If \int_{x \ge 1} e^{rx}\, \nu(dx) < ∞ for all r ∈ \mathbb{R} such that r < ε for some ε > 0, then the function θ_L is analytic in the open set S_L := {z ∈ \mathbb{C} : Re(z) < ε}, as can be seen, e.g., from the arguments at the start of the proof of [19, Lemma 2.7].
for some ε > 0. Then the function Θ(u) = \int_{\mathbb{R}_-} \int_0^t \theta_L\big(u f_u(A,s)\big)\, ds\, \pi(dA) is analytic on the open strip
\[
S := \{ u \in \mathbb{C} : |\operatorname{Re}(u)| < \delta \} \quad \text{with} \quad \delta := -|\beta| - \frac{|\rho|}{t} + \sqrt{\Delta}, \qquad (5)
\]
where \Delta := \Big( |\beta| + \frac{|\rho|}{t} \Big)^2 + \frac{2\varepsilon}{t}.
The rough idea of the proof is similar to [19, Theorem 2.8], but the fact that we
now integrate over the mean reversion parameter adds significant difficulty, as now
bounds independent of the mean reversion parameter need to be obtained and a very
general holomorphicity result for integrals has to be employed.
Proof Define
\[
f_u(A,s) := \frac{e^{A(t-s)} - 1}{A}\Big( \beta + \frac{u}{2} \Big) + \rho. \qquad (6)
\]
We first determine δ > 0 such that for all u ∈ R with |u| < δ it holds that
|u f u (A, s)| < ε. We have
\[
|u f_u(A,s)| \le \frac{e^{A(t-s)} - 1}{A}\Big( |\beta||u| + \frac{u^2}{2} \Big) + |\rho||u| \qquad (7)
\]
by the triangle inequality. In order to find the upper bound for the latter term, we first
note that elementary analysis shows
\[
\frac{e^{A(t-s)} - 1}{A} \le t \qquad (8)
\]
for all A < 0 and s ∈ [0, t]. Thus, we have to find δ > 0 such that |u f_u(A,s)| \le t\big(|\beta||u| + \frac{u^2}{2}\big) + |\rho||u| < ε for all u ∈ \mathbb{R} with |u| < δ, i.e., to find the solutions of the quadratic equation
\[
\frac{t}{2}\, u^2 + \big(t|\beta| + |\rho|\big)\, |u| - \varepsilon = 0. \qquad (9)
\]
Since for u = 0 the sign of (9) is negative, i.e., (9) is equal to −ε, we know that there
exist one positive and one negative solution. The positive one is δ as given in (5).
Now let u ∈ S, i.e., u = v + iw with v, w ∈ \mathbb{R}, |v| < δ. Observe that
\[
\operatorname{Re}\big(u f_u(A,s)\big) = v f_v(A,s) - \frac{w^2}{2}\,\frac{e^{A(t-s)} - 1}{A}
\quad \text{and} \quad \frac{e^{A(t-s)} - 1}{A} \ge 0
\]
for all s ∈ [0, t] and A < 0. Hence, Re(u f_u(A,s)) ≤ v f_v(A,s). This implies that
\[
\int_{x \ge 1} e^{\operatorname{Re}(u f_u(A,s))x}\, \nu(dx) \le \int_{x \ge 1} e^{v f_v(A,s)x}\, \nu(dx) < \infty
\]
we obtain that
\[
\int_{x \le 1} \big| e^{u f_u(A,s)x} - 1 \big|\, \nu(dx) \le K(u) \int_{x \le 1} x\, \nu(dx) + O\big(K(u)^2\big) \int_{x \le 1} |x|^2\, \nu(dx) =: B_2(u),
\]
Let S_n := \{ u = v + iw \in \mathbb{C} : |v| \le \delta - 1/n \} \subseteq S. Since the function v f_v(A,s) is continuous on the compact set V_n = \{ v \in \mathbb{R} : |v| \le \delta - 1/n \}, it attains its minimum and maximum on that set, i.e., there exists v^* \in V_n such that v f_v(A,s) \le v^* f_{v^*}(A,s) \le |v^* f_{v^*}(A,s)| =: K_n(u) for all v \in V_n. Note that v^* \in V_n implies that K_n(u) < \varepsilon. Since \operatorname{Re}(u f_u(A,s)) \le v f_v(A,s) and |e^{u f_u(A,s)x}| = e^{\operatorname{Re}(u f_u(A,s))x} \le e^{K_n(u)x}, it follows that
\[
\int_{x > 1} \big| e^{u f_u(A,s)x} - 1 \big|\, \nu(dx) \le \int_{x > 1} e^{K_n(u)x}\, \nu(dx) + \int_{x > 1} \nu(dx) =: B_{3,n}(u),
\]
so the function t\,\big(B_1(u) + B_2(u) + B_{3,n}(u)\big) is integrable with respect to π. Since ϕ(u, A) is analytic and thus a continuous function on S_n for all A < 0, it also holds that |ϕ(u, A)| is continuous on S_n for all A < 0. By the dominated convergence theorem, it follows that \int_{\mathbb{R}_-} |\varphi(u, A)|\, \pi(dA) is continuous and thus a locally bounded function on S_n. Since n ∈ \mathbb{N} was arbitrary, it follows that the function is continuous and locally bounded on S, which completes the proof.
Now, we can easily give conditions ensuring that (ii) in Theorem 4.1 is satisfied.
Corollary 4.5 Let \int_{x \ge 1} e^{rx}\, \nu(dx) < ∞ for all r ∈ \mathbb{R} such that r < ε for some ε > 0. Then the moment generating function Φ_{X_T|G_0} is analytic on the open strip
\[
S := \{ u \in \mathbb{C} : |\operatorname{Re}(u)| < \delta \} \quad \text{with} \quad \delta := -|\beta| - \frac{|\rho|}{T} + \sqrt{\Delta}, \qquad \Delta := \Big( |\beta| + \frac{|\rho|}{T} \Big)^2 + \frac{2\varepsilon}{T}.
\]
Furthermore, the representation of Theorem 4.2 extends analytically to all u ∈ S.
Proof Follows from Theorems 4.2 and 4.4 noting that an analytic function is uniquely
identified by its values on a line and [19, Lemma A.1].
Very similar to [19, Theorem 6.11], we can now prove that condition (iii) in Theorem 4.1 is also satisfied for the supOU SV model, i.e., that the function
w ↦ Φ_{X_T|G_0}(v + iw)
is absolutely integrable.
5 Examples
If we want to price a derivative by Fourier inversion, then in the supOU SV model this means that we have to calculate the inverse Fourier transform by numerical integration and, inside it, the double integral Θ(u) = \int_{\mathbb{R}_-} \int_0^t \theta_L\big(u f_u(A,s)\big)\, ds\, \pi(dA). If we want to calibrate our model to market data, the optimizer will repeat this procedure very often, and so it is important to consider specifications where at least some of the integrals can be calculated analytically.
Actually, it is not hard to see that one can use the standard specifications for ν of
the OU-type stochastic volatility model (see [3, 11, 20, 25]) which are named after
the resulting stationary distribution of the OU-type processes.
As in the case of a Γ -OU process we can choose the underlying Lévy process to be
a compound Poisson process with the characteristic triplet (γ0 , 0, abe−bx 1{x>0} dx)
with a, b > 0 where abusing notation we specified the Lévy measure by its density.
Furthermore, we assume that A follows a “negative” Γ -distribution, i.e., that π is
the distribution of B R, where B ∈ R− and R ∼ Γ (α, 1) with α > 1 which is
the specification typically used to obtain long memory/a polynomial decay of the
autocorrelation function. We refer to this specification as the Γ -supOU SV model.
Using (6) we have
\[
u\gamma_0 \int_{\mathbb{R}_-}\int_0^t f_u(A,s)\, ds\, \pi(dA)
= \gamma_0 \Bigg( \underbrace{\Big(u\beta + \frac{u^2}{2}\Big) \int_{\mathbb{R}_-}\int_0^t \frac{e^{A(t-s)}}{A}\, ds\, \pi(dA)}_{I_1}
- \underbrace{\Big(u\beta + \frac{u^2}{2}\Big) \int_{\mathbb{R}_-}\int_0^t \frac{1}{A}\, ds\, \pi(dA)}_{I_2}
+ \underbrace{\int_{\mathbb{R}_-}\int_0^t \rho u\, ds\, \pi(dA)}_{I_3} \Bigg).
\]
\[
I_1 = -\frac{u\beta + \frac{u^2}{2}}{B^2}\, \ln(1 - Bt) \quad \text{if } \alpha = 2, \qquad
I_2 = \frac{t\,\big(u\beta + \frac{u^2}{2}\big)}{B(\alpha - 1)}, \qquad
I_3 = \int_{\mathbb{R}_-}\int_0^t \rho u\, ds\, \pi(dA) = \rho u t.
\]
Furthermore, setting C(A) := \frac{1}{A}\big(u\beta + \frac{u^2}{2}\big) - \rho u, one obtains for the second summand in Θ
\[
\int_{\mathbb{R}_-} \Bigg[ -a t + \frac{a b t}{b + C(A)} - \frac{a b}{A\,\big(b + C(A)\big)}\,
\ln\!\Bigg( \frac{b + C(A) - \frac{e^{A t}}{A}\big(u\beta + \frac{u^2}{2}\big)}{b - \rho u} \Bigg) \Bigg] \pi(dA).
\]
Unfortunately, we have been unable to obtain a more explicit formula for this integral,
and so it has to be calculated numerically. In our example later on we have used the
standard Matlab command “integral” for this. Note that the well-behavedness of
this numerical integration depends on the choice of π . For our choice, π being a
negative Gamma distribution implies roughly (i.e., up to a power) an exponentially
fast decaying integrand for A → ∞, whereas the behavior at zero appears to be hard
to determine.
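To illustrate the numerical evaluation just described (here with Python/scipy instead of Matlab's integral), the following sketch computes Θ(u) for a real u inside the strip, using θ_L(w) = γ_0 w + a w/(b − w) for the compound Poisson specification and the function f_u(A, s) of (6); all parameter values are illustrative, and the extension to the complex arguments needed for Fourier pricing proceeds along the same lines.

import numpy as np
from scipy.integrate import dblquad
from scipy.stats import gamma as gamma_dist

def theta_L(w, a, b, gamma0):
    # cumulant transform of the compound Poisson BDLP with Levy density a*b*exp(-b*x), valid for w < b
    return gamma0 * w + a * w / (b - w)

def Theta(u, t, beta, rho, a, b, gamma0, B, alpha):
    """Theta(u) = int_{R_-} int_0^t theta_L(u f_u(A,s)) ds pi(dA), with pi the law of
    B*R, R ~ Gamma(alpha, 1), B < 0, and
    u f_u(A,s) = ((exp(A(t-s)) - 1)/A)(u*beta + u^2/2) + rho*u (cf. (6))."""
    def integrand(s, r):                                  # dblquad passes (inner, outer)
        A = B * r
        ufu = (np.expm1(A * (t - s)) / A) * (u * beta + 0.5 * u**2) + rho * u
        return theta_L(ufu, a, b, gamma0) * gamma_dist.pdf(r, alpha)
    val, _ = dblquad(integrand, 0.0, np.inf, lambda r: 0.0, lambda r: t)
    return val

# illustrative parameters, roughly of the order of magnitude of Table 1
val = Theta(u=0.5, t=0.25, beta=-0.5, rho=-10.0, a=0.2, b=30.0,
            gamma0=0.0, B=-0.0004, alpha=4.0)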
We can also choose the underlying Lévy process as in an IG-OU model with
parameters δ and γ , while keeping the choice of the measure π the same. In this
case, we have \nu(dx) = \frac{\delta}{2\sqrt{2\pi}}\,\big(x^{-1} + \gamma^2\big)\,x^{-1/2} \exp\big(-\tfrac{1}{2}\gamma^2 x\big)\, 1_{\{x>0\}}\, dx and the only
difference compared to the previous case is in the calculation of the triple integral
which also can be partially calculated analytically so that only a one-dimensional
numerical integration is necessary.
Table 1 Calibrated supOU SV model parameters for DAX data of August 19, 2013
ρ a b B α γ0
−10.8797 0.2225 29.4025 −0.0004 4.3632 0.0000
z_{t_1}   z_{t_2}   z_{t_3}   z_{t_4}   z_{t_5}   z_{t_6}   z_{t_7}   z_{t_8}
0.0012 0.0026 0.0038 0.0054 0.0093 0.0136 0.0225 0.0328
Fig. 1 Calibration of the supOU model to call options on DAX: The Black–Scholes implied
volatilities. The implied volatilities from market prices are depicted by a dot, the implied volatilities
from model prices by a solid line
exhibit long memory. One should be very careful not to overinterpret these findings,
as no confidence intervals/hypothesis tests are available in connection with such a
standard calibration.
The leverage parameter ρ is negative, which implies a negative correlation be-
tween jumps in the volatility and returns. Hence, the typical leverage effect is present.
The drift parameter of the underlying Lévy basis γ0 is estimated to be practically
zero. So our calibration suggests that a driftless pure jump Lévy basis may be quite
adequate to use.
Let us briefly turn to a comparison with the OU-type stochastic volatility model
(cf. [19] or [20]) noting that a detailed comparison with various other models is
certainly called for, but beyond the scope of the present paper. For some β < 0, looking at a sequence of Γ-supOU models with α_n = n, B_n = β/n, and all other parameters fixed shows that the mean reversion probability measures π_n converge weakly to the delta distribution at β. So the OU model is in some sense a limiting
case of the supOU model. However, the limiting model is very different from all
approximating models, as it is Markovian, has the same decay rate for all jumps,
whereas the approximating supOU models have all negative real numbers as possible
decay rates for individual jumps. This implies that in connection with real data the
behavior of the OU and the supOU model can well be rather different. Calibrating a
Γ -OU model to our DAX data set (so the only parameter now different is π , which is
a Dirac measure) actually returns a globally better fit (the RMSE is 0.0037). Looking at the plots of market against model implied volatilities, they all look quite similar to the ones in Fig. 1 (Fig. 2 shows only the last four maturities), although on closer inspection the fit for the early maturities is definitely better. Yet, there is one
big exception, the last maturity, where the supOU model fits much better. Whereas
the rate of the underlying compound Poisson process is a = 0.2225 in the supOU
model, it is 1.2671 in the OU model. The mean of the decay rates is −0.0017 in the
supOU model and the decay rate of the OU case is −1.3906. Noting that the standard
deviation of the decay rates is 0.0008 in the supOU model, the two calibrated models
are indeed in many respects rather different.
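For completeness, the weak convergence used in this comparison can be verified by a short moment computation (our own check, not taken from the paper):
\[
\mathbb{E}[B_n R_n] = B_n\,\alpha_n = \frac{\beta}{n}\, n = \beta, \qquad
\operatorname{Var}(B_n R_n) = B_n^2\,\alpha_n = \frac{\beta^2}{n} \longrightarrow 0 \quad (n \to \infty),
\]
so the laws π_n of B_n R_n, with R_n ∼ Γ(n, 1), converge weakly to δ_β by Chebyshev's inequality.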
Remark 5.1 (How to price options with general maturities?) After having calibrated
a model to observed liquid market prices one often wants to use it to price other
(exotic) derivatives. Looking at a European derivative with payoff f (ST ) for some
measurable function f and maturity T > 0, one soon realizes that we can only
obtain its price directly if T ∈ {t1 , t2 , . . . , t M }, as only then we know z T , thus the
characteristic function Φ X T |G0 and therefore the distribution of the price process
at time T conditional on our current information G_0. This is not desirable; the problem is that in theory we assume knowledge of G_0, whereas the market prices contain only limited information, from which we can recover only part of the information in G_0.
It seems that to get z t for all t ∈ R+ one needs to really know the whole past
of Λ, i.e., all jumps before time 0 and the associated times and decay rates. This is
clearly not feasible. A detailed analysis on the dependence of z t on t is beyond the
scope of this paper. But we briefly want to comment on possible ad hoc solutions
Fig. 2 Calibration of the OU model to call options on DAX: The implied volatilities from market
prices are depicted by a dot, the implied volatilities from model prices by a solid line. Last four
maturities only
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Bannör, K.F., Scherer, M.: A BNS-type stochastic volatility model with two-sided jumps with
applications to FX options pricing. Wilmott 2013, 58–69 (2013)
2. Barndorff-Nielsen, O.: Superposition of Ornstein–Uhlenbeck type processes. Theory Probab.
Appl. 45, 175–194 (2001)
3. Barndorff-Nielsen, O., Shephard, N.: Non-Gaussian Ornstein-Uhlenbeck-based models and
some of their uses in financial economics (with discussion). J. R. Stat. Soc. B Stat. Methodol.
63, 167–241 (2001)
4. Barndorff-Nielsen, O., Stelzer, R.: Multivariate supOU processes. Ann. Appl. Probab. 21(1),
140–182 (2011)
5. Barndorff-Nielsen, O., Stelzer, R.: The multivariate supOU stochastic volatility model. Math.
Finance 23, 275–296 (2013)
6. Bender, C., Sottinen, T., Valkeila, E.: Arbitrage with fractional Brownian motion? Theory
Stoch. Process. 13(1–2), 23–34 (2007)
7. Björk, T., Hult, H.: A note on Wick products and the fractional Black-Scholes model. Finance
Stoch. 9(2), 197–209 (2005)
8. Brockwell, P.J., Davis, R.A.: Time Series: Theory and Methods, vol. 2. Springer, New York
(1991)
9. Carr, P., Madan, D.B.: Option valuation using the fast Fourier transform. J. Comput. Finance
2, 61–73 (1999)
10. Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues. Quant.
Finance 1, 223–236 (2001)
11. Cont, R., Tankov, P.: Financial Modelling with Jump Processes. CRC Financial Mathematical
Series. Chapman & Hall, London (2004)
12. Eberlein, E., Glau, K., Papapantoleon, A.: Analysis of Fourier transform valuation formulas
and applications. Appl. Math. Finance 17, 211–240 (2010)
13. Fasen, V., Klüppelberg, C.: Extremes of supOU processes. In: Benth, F.E., Di Nunno, G.,
Lindstrom, T., Øksendal, B., Zhang, T. (eds.) Stochastic Analysis and Applications: The Abel
Symposium 2005. Abel Symposia, vol. 2, pp. 340–359. Springer, Berlin (2007)
14. Guillaume, D.M., Dacorogna, M.M., Davé, R.D., Müller, U.A., Olsen, R.B., Pictet, O.V.:
From the bird’s eye to the microscope: a survey of new stylized facts of the intra-daily foreign
exchange markets. Finance Stoch. 1, 95–129 (1997)
15. Königsberger, K.: Analysis 2. Springer, Heidelberg (2004)
16. Marquardt, T.: Fractional Lévy processes with an application to long memory moving average
processes. Bernoulli 12, 1099–1126 (2006)
17. Mattner, L.: Complex differentiation under the integral. Nieuw Archief voor Wiskunde 5/2(2),
32–35 (2001)
18. Moser, M., Stelzer, R.: Tail behavior of multivariate Lévy driven mixed moving average
processes and related stochastic volatility models. Adv. Appl. Probab. 43, 1109–1135 (2011)
19. Muhle-Karbe, J., Pfaffel, O., Stelzer, R.: Option pricing in multivariate stochastic volatility
models of OU type. SIAM J. Finance Math. 3, 66–94 (2011)
20. Nicolato, E., Venardos, E.: Option pricing in stochastic volatility models of the Ornstein-
Uhlenbeck type. Math. Finance 13, 445–466 (2003)
21. Pigorsch, C., Stelzer, R.: A multivariate Ornstein–Uhlenbeck type stochastic volatility model.
Working paper (2009) http://www.uni-ulm.de/mawi/finmath/people/stelzer/publications.html
22. Protter, P.: Stochastic Integration and Differential Equations. Stochastic Modelling and Applied
Probability, vol. 21, 2nd edn. Springer, New York (2004)
23. Raible, S.: Lévy Processes in Finance: Theory, Numerics and Empirical Facts. Disserta-
tion, Mathematische Fakultät, Albert-Ludwigs-Universität Freiburg i. Br., Freiburg, Germany,
(2000)
24. Sato, K.: Lévy Processes and Infinitely Divisible Distributions, volume 68 of Cambridge Studies
in Advanced Mathematics. Cambridge University Press, Cambridge (1999)
25. Schoutens, W.: Lévy Processes in Finance—Pricing Financial Derivatives. Wiley, Chicester
(2003)
26. Stelzer, R., Tosstorff, T., Wittlinger, M.: Moment based estimation of supOU processes and a related stochastic volatility model. Submitted for publication (2013). http://arxiv.org/abs/1305.1470v1
27. Veretennikov, A.Y.: On lower bounds for mixing coefficients of Markov diffusions. In: Kabanov,
Y., Lipster, R., Stoyanov, J. (eds.) From Stochastic Calculus to Mathematical Finance—The
Shiryaev Festschrift, pp. 33–68. Springer, Berlin (2006)
A Two-Sided BNS Model for Multicurrency
FX Markets
1 Introduction
For derivatives valuation, the Black–Scholes model, presented in the seminal paper
[4], generated a wave of stochastic models for the description of stock-prices. Since
the assumptions of the Black–Scholes model (normally distributed log-returns, inde-
pendent returns) cannot be observed in neither time series of stock-prices nor option
markets (implicitly expressed in terms of the volatility surface), several alterna-
tive models have been developed trying to overcome these assumptions. Some
K.F. Bannör
Deloitte & Touche GmbH Wirtschaftsprüfungsgesellschaft, Rosenheimer Platz 4,
81669 München, Germany
e-mail: [email protected]
M. Scherer · T. Schulz(B)
Technische Universität München, Parkring 11, 85748 Garching-Hochbrück, Germany
e-mail: [email protected]
T. Schulz
e-mail: [email protected]
models, as, e.g., [9, 23], account for stochastic volatility, while others, as, e.g., [12, 16], enrich the original Black–Scholes model with jumps. Both approaches have been combined in the models of, e.g., [3, 6]. Another approach, which combines stochastic volatility with negative jumps in the asset-price process accompanied by upward jumps in the volatility and employs Lévy subordinator-driven Ornstein–Uhlenbeck processes, is available with the Barndorff–Nielsen–Shephard (BNS) model class, presented in [2] and extended in
several papers (e.g. [18]). A multivariate extension of the BNS model class employ-
ing matrix subordinators is designed in [20] and pricing in this model is scrutinized in
[17]. In the special case of a Γ -OU-BNS model, a tractable variant of a multivariate
BNS model based on subordination of compound Poisson processes was developed
by [15]. This model allows for a separate calibration of the single assets (following
a univariate Γ -OU-BNS model) and the dependence structure.
Besides options on stocks, these models have also been used to price derivatives
on other underlyings. When modeling foreign exchange (FX) rates instead of stock-
prices, one has to cope with the introduction of two different interest rates as well as
identifying the actual tradeable assets. The Black–Scholes model was adapted to FX
markets by [8]. Many of the models mentioned above have been employed for FX
rates modeling as, e.g., [3, 9]. Since the original BNS model assumes only downward
jumps in the asset-price process, [1] extend the BNS model class to additionally
incorporate positive jumps, which is needed for the realistic modeling of FX rates
and calibrates much better to FX option surfaces.
In this paper, we unify the extensions of the BNS model from [1, 15] and intro-
duce a multivariate Γ -OU-BNS model with time-changed compound Poisson drivers
incorporating dependent jumps in both directions, both generalizing the univariate
two-sided Γ -OU-BNS model and the multivariate “classical” Γ -OU-BNS model.
Since the two-sided Γ -OU-BNS model seems to be particularly suitable for the mod-
eling of FX rates, we consider a multivariate two-sided Γ -OU-BNS model a sensible
choice for the valuation of multivariate FX derivatives such as best-of-two options.
Since the multivariate two-sided model accounts for joint and single jumps in the FX
rates, the jump behavior of modeled FX rates resembles reality better than models
only employing joint or single jumps, as illustrated in Fig. 1. Furthermore, a mul-
tivariate two-sided BNS model for FX rates with a common currency also implies
a jump-diffusion model for an FX rate via quotient or product processes. A crucial
feature of our multivariate approach is the separability of the univariate models from
the dependence structure, i.e. one has two sets of parameters that can be determined
in consecutive steps: parameters determining each univariate model and parameters
determining the dependence. This feature provides tractability for practical applica-
tions like simulation or calibration on the one side, but also simplifies interpretability
of the model parameters on the other side.
Instead of modeling the FX spot rates only, one could model FX forward rates
to get a model setup suited for pricing cross-currency derivatives depending on FX
forward rates, as for example cross-currency swaps. Multicurrency models built
upon FX forward rates (see e.g. [7]) on the one hand support flexibility to price
such derivatives, on the other hand, however, these models do not provide the crucial
Fig. 1 The logarithmic returns of EUR-SEK and USD-SEK FX rates over time. Assuming that
every logarithmic return exceeding three standard deviations (dashed lines) from the mean can be
interpreted as a jump (obviously, smaller jumps occur as well, but may be indistinguishable from
movement originating in the Brownian noise), one can see that joint as well as separate jumps in the
EUR-SEK and the USD-SEK logarithmic returns occur. Clearly, this 3-standard-deviation criterion is just a rule of thumb; however, [10] investigated the necessity of both common and individual jumps in a statistically thorough manner. Hence, a multivariate FX model capturing the stylized
facts of both joint and separate jumps can be valuable. The data was provided by Thomson Reuters
property of separating the dependence structure from the univariate models, which
makes it extremely difficult to calibrate such a multivariate model in a sound manner.
The remaining paper is organized as follows: In Sect. 2, we recall the two-sided
Barndorff–Nielsen–Shephard model constructed in [1] and outline stylized facts of
its trajectories. In Sect. 3, we introduce a multivariate version of the two-sided Γ -OU-
BNS model, using the time change construction from [15] to incorporate dependence
between the jump drivers. Section 4 focuses on the specific obstacles occurring when
modeling FX rates in a multivariate two-sided Γ -OU-BNS model, particularly the
dependence structure of joint jumps and the implied model for a third FX rate which
may be induced. In Sect. 5, we describe a calibration of the model to implied volatility
surfaces and show how the model can be used to price multivariate derivatives. We
then evaluate the model in a numerical case study. Finally, Sect. 6 concludes.
We briefly motivate the construction and main features of the two-sided BNS model
class. The classical BNS model accounts for the leverage effect, a feature of stock
returns, by incorporating negative jumps in the asset-price process, accompanied by
96 K.F. Bannör et al.
upward jumps in the stochastic variance. While downward jumps might be sufficient when modeling stock-price dynamics, they are not suitable when modeling FX rates, where one-sided jumps contradict economic intuition. Hence, [1] develop an
extension of the BNS model which allows for two-sided jumps and is able to capture
the symmetric nature of FX rates.
We say that a stochastic process {St }t≥0 follows a two-sided BNS model (abbrevi-
ated BNS2 model), if the log-price X t := log St follows the dynamics of the SDEs
dX t = (μ + βσt2 ) dt + σt dWt + ρ+ dZ t+ + ρ− dZ t− ,
dσt2 = −λσt2 dt + dZ t+ + dZ t− ,
1Compared to the original formulation of the model in [1] and the original BNS model from [18],
we do not change the clock of the subordinators to t → λt. This formulation is equivalent and more
handy in the upcoming multivariate construction.
Fig. 2 Sample path of a two-sided BNS model, generated from calibrated parameters. The FX rate
process exhibits positive and negative jumps
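A path such as the one in Fig. 2 can be reproduced qualitatively with a simple Euler scheme for the SDEs above; the following sketch uses compound Poisson drivers with exponential jumps, as in the Γ-OU specification, and parameter values that are merely of the order of magnitude reported later in the calibration section.

import numpy as np

rng = np.random.default_rng(3)

def bns2_path(S0=8.2, mu=0.0, beta=-0.5, sigma2_0=0.006, lam=3.0,
              rho_plus=1.5, rho_minus=-1.5, c=0.7, eta=60.0, T=1.0, n_steps=2520):
    """Euler scheme for the two-sided Gamma-OU-BNS dynamics
       dX       = (mu + beta*sigma_t^2) dt + sigma_t dW_t + rho_+ dZ_t^+ + rho_- dZ_t^-
       dsigma^2 = -lam*sigma_t^2 dt + dZ_t^+ + dZ_t^-
    with Z^+, Z^- independent compound Poisson processes (intensity c, Exp(eta) jumps).
    All parameter values are illustrative."""
    dt = T / n_steps
    X, sig2 = np.log(S0), sigma2_0
    S = np.empty(n_steps + 1)
    S[0] = S0
    for i in range(1, n_steps + 1):
        dZp = rng.exponential(1.0 / eta, rng.poisson(c * dt)).sum()
        dZm = rng.exponential(1.0 / eta, rng.poisson(c * dt)).sum()
        X += (mu + beta * sig2) * dt + np.sqrt(sig2 * dt) * rng.standard_normal() \
             + rho_plus * dZp + rho_minus * dZm
        sig2 = max(sig2 - lam * sig2 * dt + dZp + dZm, 0.0)
        S[i] = np.exp(X)
    return S

path = bns2_path()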
rates with common currency. To establish dependence between the compound Pois-
son drivers, we employ the time-change methodology presented in [15], yielding an
analytically tractable and easy-to-simulate setup.
Definition 1 (Time-changed CPPs with exponential jump sizes) Let c0 , η1 , . . . ,
ηd > 0 and c1 , . . . , cd ∈ (0, c0 ). Furthermore, let d ∈ N and Y (1) , . . . , Y (d)
be d independent compound Poisson processes with intensities c1 /(c0 − c1 ), . . . ,
cd /(c0 − cd ) and Exp(c0 η1 /(c0 − c1 )), . . . , Exp(c0 ηd /(c0 − cd ))-distributed jump
sizes. To these compound Poisson processes, we apply a time change with another
independent compound Poisson process T = {Tt }t≥0 with Exp(1)-distributed jump
sizes and intensity c_0. Define the T-subordinated compound Poisson processes Z^{(1)}, ..., Z^{(d)} by \{Z_t^{(j)}\}_{t \ge 0} := \{Y_{T_t}^{(j)}\}_{t \ge 0}. We call the d-tuple (Z^{(1)}, ..., Z^{(d)}) a time-change-dependent multivariate compound Poisson process with parameters (c_0, c_1, ..., c_d, η_1, ..., η_d).
At first sight, the subordination of a compound Poisson process by another compound Poisson process may look strange, particularly in light of interpreting the time change as "business time", following the idea of [14]. In this case, however, we primarily use the joint subordination to introduce dependence via joint jumps between the compound Poisson processes, without the interpretation as "business time": the time-change construction is of a technical nature and provides a convenient simulation scheme.
Remark 1 (Properties of time-changed CPPs, cf. [15])
(i) Each coordinate Z^{(j)} of the T-subordinated compound Poisson process is again a compound Poisson process with intensity c_j and jump size distribution Exp(η_j), for all j = 1, ..., d.
(ii) For c_max := max_{1 ≤ j ≤ d} c_j, the correlation coefficient of (Z^{(j)}, Z^{(k)}), 1 ≤ j ≤ d, 1 ≤ k ≤ d, j ≠ k, is given by
\[
\operatorname{Corr}\big( Z_t^{(j)}, Z_t^{(k)} \big) = \frac{\sqrt{c_j c_k}}{c_0} = \kappa\, \frac{\sqrt{c_j c_k}}{c_{\max}},
\]
with κ := c_max/c_0 ∈ (0, 1). We call κ the time-change correlation parameter. In particular, correlation coefficients ranging from zero to \sqrt{c_j c_k}/c_{\max} are possible, and the correlation does not depend on the point in time t.
(iii) Due to the common time change, the compound Poisson processes Z (1) , . . . , Z (d)
are stochastically dependent. Moreover, it can be shown that the dependence
structure of the d-dimensional process (Z (1) , . . . , Z (d) ) is driven solely by the
time-change correlation parameter κ.
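A minimal simulation sketch of Definition 1 at a fixed horizon t (with hypothetical parameter values) also illustrates the correlation formula of Remark 1(ii): first draw the compound Poisson time change T_t and then, conditionally on T_t, draw the coordinates as independent compound Poisson sums evaluated at the transformed clock.

import numpy as np

rng = np.random.default_rng(2)

def time_changed_cpp(t, c0, c, eta, n_sims=200_000):
    """Simulate (Z_t^{(1)}, ..., Z_t^{(d)}) from Definition 1 at a fixed time t.
    c, eta are length-d arrays with 0 < c_j < c0 and eta_j > 0."""
    c, eta = np.asarray(c, float), np.asarray(eta, float)
    d = len(c)
    # time change: compound Poisson with intensity c0 and Exp(1) jump sizes
    N0 = rng.poisson(c0 * t, n_sims)
    Tt = rng.gamma(shape=np.maximum(N0, 1e-12), scale=1.0) * (N0 > 0)
    Z = np.empty((n_sims, d))
    for j in range(d):
        lam = c[j] / (c0 - c[j])                 # intensity of Y^(j)
        rate = c0 * eta[j] / (c0 - c[j])         # Exp rate of Y^(j) jump sizes
        Nj = rng.poisson(lam * Tt)
        Z[:, j] = rng.gamma(shape=np.maximum(Nj, 1e-12), scale=1.0 / rate) * (Nj > 0)
    return Z

Z = time_changed_cpp(t=1.0, c0=2.0, c=[1.0, 1.5], eta=[20.0, 25.0])
emp_corr = np.corrcoef(Z.T)[0, 1]   # should be close to sqrt(c1*c2)/c0, cf. Remark 1(ii)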
A striking advantage of introducing dependence among the jumps in this manner
is that the time-changed processes Z (1) , . . . , Z (d) remain in the class of compound
Poisson processes with exponential jump heights, which ensures that the marginal
processes maintain a tractable structure. In particular, the characteristic functions
of the univariate log-price processes in a two-sided Γ -OU-BNS model are still at
A Two-Sided BNS Model for Multicurrency FX Markets 99
hand. Moreover, the univariate processes Z^{(1)}, ..., Z^{(d)} can be simulated as ordinary compound Poisson processes with exponentially distributed jump heights, and their Laplace transforms are available in closed form. Hence, we can now define a multidimensional two-sided Γ-OU-BNS model with dependent jumps.
with (W^{(1)}, ..., W^{(d)}) being correlated Brownian motions with correlation matrix Σ and, for all 1 ≤ j ≤ d, μ_j, β_j ∈ \mathbb{R}, ρ_+^{(j)} > 0, ρ_-^{(j)} < 0, λ_j > 0, and (Z^{+(1)}, Z^{-(1)}), ..., (Z^{+(d)}, Z^{-(d)}) are pairs of independent compound Poisson processes with exponential jumps. Furthermore, the 2d-dimensional Lévy process (Z^{+(1)}, Z^{-(1)}, ..., Z^{+(d)}, Z^{-(d)}) splits up into two time-change-dependent d-tuples of compound Poisson processes (cf. Definition 1).
The dependence parameters, which are the correlation matrix Σ of the Brownian
motions and the time-change correlation parameters κ̃ and κ̂ that determine the
dependence structure of the time-change-dependent multivariate compound Poisson
processes, can be calibrated separately afterwards without altering the already fixed
marginal distributions. This simplifies the model calibration and is a convenient fea-
ture for practical purposes, because it automatically ensures that univariate derivative
prices are fitted to the multivariate model.
In this section, we discuss the modeling of FX rates with a bivariate two-sided Γ -OU-
BNS model. Particularly, we discuss how to soundly introduce dependence between
the Lévy drivers and investigate a possible “built-in” model induced by the model for
the two FX rates. We concentrate on the case of two currency pairs, which best illustrates the issues involved in choosing the jump dependence structure.
To ensure familiarity with the FX markets wording, we recall that an FX rate
is the exchange rate between two currencies, expressed as a fraction. The currency
in the numerator of the fraction is called (by definition) domestic currency, while
the currency in the denominator of the fraction is called foreign currency.2 The role
each currency plays in an FX rate is defined by market conventions and is often due
to historic reasons, so economic interpretations are not necessarily helpful. A more
detailed discussion of market conventions of FX rates and derivatives is provided in
[22], a standard textbook on FX rates modeling is [13].
2 The wording “foreign” and “domestic” currency does not necessarily reflect whether the currency
is foreign or domestic from the point of view of a market participant. The currency EUR, e.g., is
always foreign currency by market convention. Sometimes, the foreign currency is called underlying
currency, while the domestic currency is called accounting or base currency.
When two FX rates are modeled and among the two rates there is a common cur-
rency, this bivariate model always implicitly defines a model for the missing currency
pair which is not modeled directly, e.g. when modeling EUR-USD and EUR-CHF
exchange rates simultaneously, the quotient process automatically implies a model
for the USD-CHF exchange rate. Similar to the bivariate Garman–Kohlhagen model,
modeling two FX rates directly by a bivariate two-sided BNS model does not nec-
essarily imply a similar model for the quotient or product process from the same
family, but the main structure of a jump-diffusion-type model is maintained.
Lemma 1 (Quotient and product process of a two-sided BNS model) Given two asset-price processes \{S_t^{(1)}\}_{t \ge 0} and \{S_t^{(2)}\}_{t \ge 0} modeled by a multivariate two-sided Γ-OU-BNS model, the product and quotient processes \{S_t^{(1)} S_t^{(2)}\}_{t \ge 0} resp. \{S_t^{(1)}/S_t^{(2)}\}_{t \ge 0} are both of jump-diffusion type.
Proof Follows directly from \log(S_t^{(1)} S_t^{(2)}) = X_t^{(1)} + X_t^{(2)} and \log(S_t^{(1)}/S_t^{(2)}) = X_t^{(1)} - X_t^{(2)}.
Due to symmetry in FX rates, the implied model for the third missing FX rate can
be used to calibrate the parameters steering the dependence, namely, the correlation
between the Brownian motions as well as the time-change correlation parameters, or
equivalently the intensities of the time-change processes. Additionally, the calibration
performance of the implied model to plain vanilla options yields a plausibility check
whether the bivariate model may be useful for the evaluation of true bivariate options,
e.g. best-of-two options or spread options.
5.1 Data
As input data for our calibration exercise we use option data on exchange rates
concerning the three currencies EUR, USD, and SEK. Since the EUR-USD exchange
rate can be regarded as an implied exchange rate, i.e.
\[
\frac{\text{USD}}{\text{EUR}} = \frac{\text{SEK}/\text{EUR}}{\text{SEK}/\text{USD}},
\]
we model the two exchange rates EUR-SEK and USD-SEK directly with two-sided
Γ -OU-BNS models as suggested in [1]. For each currency pair EUR-SEK, USD-
SEK, and EUR-USD, we have the implied volatilities of 204 different plain vanilla
options (different maturities, different moneyness) available as input data. The option
data is as of August 13, 2012, and was provided by Thomson Reuters.
We consider a market with two traded assets, namely {exp(rUSD t)StUSDSEK }t≥0 and
{exp(rEUR t)StEURSEK }t≥0 , where StUSDSEK , StEURSEK denote the exchange rates at
time t and rUSD , rEUR , rSEK denote the risk free interest rates in the correspond-
ing monetary areas. These assets can be seen as the future value of a unit of
the respective foreign currency (in this case USD or EUR), valued in the domes-
tic currency (which is SEK). Assume a risk-neutral measure QSEK to be given
with numéraire process {exp(rSEK t)}t≥0 , i.e. {exp((rUSD − rSEK )t)StUSDSEK }t≥0 and
{exp((rEUR − rSEK )t)StEURSEK }t≥0 are martingales with respect to QSEK , governed
by the SDEs
\[
\begin{aligned}
dX_t^{\ell\mathrm{SEK}} &= \Big( r_{\mathrm{SEK}} - r_{\ell} - \frac{(\sigma_t^{\ell\mathrm{SEK}})^2}{2}
- \frac{c_+^{\ell\mathrm{SEK}}\, \rho_+^{\ell\mathrm{SEK}}}{\eta_+^{\ell\mathrm{SEK}} - \rho_+^{\ell\mathrm{SEK}}}
+ \frac{c_-^{\ell\mathrm{SEK}}\, \rho_-^{\ell\mathrm{SEK}}}{\eta_-^{\ell\mathrm{SEK}} + \rho_-^{\ell\mathrm{SEK}}} \Big) dt
+ \sigma_t^{\ell\mathrm{SEK}}\, dW_t^{\ell\mathrm{SEK}}
+ \rho_+^{\ell\mathrm{SEK}}\, dZ_t^{+,\ell\mathrm{SEK}}
- \rho_-^{\ell\mathrm{SEK}}\, dZ_t^{-,\ell\mathrm{SEK}}, \\
d(\sigma_t^{\ell\mathrm{SEK}})^2 &= -\lambda^{\ell\mathrm{SEK}} (\sigma_t^{\ell\mathrm{SEK}})^2\, dt
+ dZ_t^{+,\ell\mathrm{SEK}} + dZ_t^{-,\ell\mathrm{SEK}},
\end{aligned}
\]
for λ^{ℓSEK}, ρ_+^{ℓSEK}, ρ_-^{ℓSEK} > 0, ℓ ∈ {EUR, USD}, with \{W_t^{EURSEK}, W_t^{USDSEK}\}_{t ≥ 0} being a two-dimensional Brownian motion with correlation r ∈ [−1, 1], and \{Z_t^{+,EURSEK}, Z_t^{+,USDSEK}\}_{t ≥ 0} and \{Z_t^{-,EURSEK}, Z_t^{-,USDSEK}\}_{t ≥ 0} being (independent) two-dimensional time-change-dependent compound Poisson processes with parameters
\[
\big( \max(c_+^{\mathrm{EURSEK}}, c_+^{\mathrm{USDSEK}})/\kappa^+,\ c_+^{\mathrm{EURSEK}},\ c_+^{\mathrm{USDSEK}},\ \eta_+^{\mathrm{EURSEK}},\ \eta_+^{\mathrm{USDSEK}} \big)
\]
and
\[
\big( \max(c_-^{\mathrm{EURSEK}}, c_-^{\mathrm{USDSEK}})/\kappa^-,\ c_-^{\mathrm{EURSEK}},\ c_-^{\mathrm{USDSEK}},\ \eta_-^{\mathrm{EURSEK}},\ \eta_-^{\mathrm{USDSEK}} \big),
\]
where κ^+ and κ^- are the time-change correlation parameters (following the framework in Sect. 4.1). Hence, the EUR-SEK and USD-SEK exchange rates follow a bivariate two-sided Γ-OU-BNS model. The implied exchange rate process S^{EURUSD} is given by
\[
\big\{ S_t^{\mathrm{EURUSD}} \big\}_{t \ge 0} = \bigg\{ \frac{S_t^{\mathrm{EURSEK}}}{S_t^{\mathrm{USDSEK}}} \bigg\}_{t \ge 0}.
\]
Due to the change-of-numéraire formula for exchange rates (cf. [19]), the process
{exp((rEUR − rUSD )t)StEURUSD }t≥0 is a martingale with respect to QUSD , where
QUSD is determined by the Radon–Nikodým derivative
\[
\frac{dQ^{\mathrm{USD}}}{dQ^{\mathrm{SEK}}}\bigg|_t = \frac{S_t^{\mathrm{USDSEK}} \exp(r_{\mathrm{USD}}\, t)}{S_0^{\mathrm{USDSEK}} \exp(r_{\mathrm{SEK}}\, t)}.
\]
5.3 Calibration
For calibration purposes, we use the volatility surfaces of the EUR-SEK and USD-
SEK exchange rates to fit the univariate parameters. Due to the consistency relation-
ships which have to hold between the exchange rates, we can calibrate the dependence
parameters by fitting them to the volatility surface of EUR-USD. Even in the presence of other "bivariate options" (e.g. best-of-two options), we argue that European options on the quotient exchange rate currently provide the most liquid and reliable data for a calibration.
The calibration of the presented multivariate model is done in two steps. Due to the
fact that the marginal distributions can be separated from the dependence structure
within our models, it is possible to keep the parameters governing the dependence
ℓ      S_0^{ℓSEK}   σ_0^{ℓSEK}   c^{ℓSEK}   η^{ℓSEK}   λ^{ℓSEK}   ρ^{ℓSEK}   #options   Error (%)
EUR    8.229        0.074        0.71       62.13      3.25       1.66       204        1.08
USD    6.664        0.078        1.15       40.81      2.19       1.22       204        3.17
Here, we used 100,000 simulations to calibrate the dependence parameters. The exe-
cution of the overall optimization procedure takes around four hours. The calibration
error of the dependence parameters in terms of average relative error is roughly nine percent, which is still a good result considering that we try to fit 204 market prices by means of just two parameters in an implicitly specified model. A more complex model, obtained by relaxing the condition that κ^+ and κ^-
Fig. 3 The best matching correlation between the two Brownian motions is 0.52 and the optimal
time-change dependence parameter is κ = 0.96. This corresponds to a calibration error of around
nine percent for the 204 options on this currency pair
coincide, leads to even smaller calibration errors. However, we keep the model as
simple as possible to maintain tractability. Figure 3 illustrates the calibration error
of this second step depending on different choices of the dependence parameters.
Eventually, the whole model is fixed.
Now, we are able to price European multi-currency options, for instance a best-of-two call option with payoff at maturity T given by
\[
\max\!\left( \max\!\left( \frac{e^{X_T^{\mathrm{USDSEK}}}}{S_0^{\mathrm{USDSEK}}} - K,\, 0 \right),\
\max\!\left( \frac{e^{X_T^{\mathrm{EURSEK}}}}{S_0^{\mathrm{EURSEK}}} - K,\, 0 \right) \right),
\]
i.e. we consider the maximum of two call options with strike K > 0 on two exchange
rates. This option can be used as an insurance against a weakening SEK, because one
gets a payoff if the relative performance of one exchange rate, USD-SEK or EUR-
SEK, is greater than K −1. Pricing is done by a Monte-Carlo simulation that estimates
the expected value in Eq. (1). We used 100,000 scenarios to price this option, which
takes about four minutes. Figure 4 shows option prices of the best-of-two call option
dependent on various choices of the dependence parameters.
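Schematically, the Monte Carlo valuation of this payoff reduces to the following sketch; the simulated terminal log-rates are represented here by placeholder Gaussian samples only (in the model of Sect. 5.2 they would come from simulating the bivariate two-sided dynamics under Q^SEK), and all numerical inputs apart from the spot rates of the calibration table are illustrative.

import numpy as np

rng = np.random.default_rng(4)

def best_of_two_call_mc(XT_usdsek, XT_eursek, S0_usdsek, S0_eursek, K, r_sek, T):
    """Discounted Monte Carlo estimate of
       max( (exp(X_T^USDSEK)/S0^USDSEK - K)^+ , (exp(X_T^EURSEK)/S0^EURSEK - K)^+ )."""
    perf_usd = np.maximum(np.exp(XT_usdsek) / S0_usdsek - K, 0.0)
    perf_eur = np.maximum(np.exp(XT_eursek) / S0_eursek - K, 0.0)
    return np.exp(-r_sek * T) * np.mean(np.maximum(perf_usd, perf_eur))

# placeholder terminal log-rates (NOT the calibrated model), just to exercise the function
n = 100_000
XT_usd = np.log(6.664) + 0.10 * rng.standard_normal(n)
XT_eur = np.log(8.229) + 0.08 * rng.standard_normal(n)
price = best_of_two_call_mc(XT_usd, XT_eur, 6.664, 8.229, K=1.1, r_sek=0.0, T=1.0)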
Fig. 4 Prices of a best-of-two call option where K = 1.1 and T = 1. One observes that both
dependence parameters play an important role for the price of this option. For the optimal parameter
setting (Brownian motion correlation is 0.52, κ = 0.96), the fair price of this option is 261 bp
Acknowledgments We thank the KPMG Center of Excellence in Risk Management for making
this work possible. Furthermore, we thank the TUM Graduate School for supporting these studies
and two anonymous referees for valuable comments.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Bannör, K., Scherer, M.: A BNS-type stochastic volatility model with two-sided jumps, with
applications to FX options pricing. Wilmott 2013(65), 58–69 (2013)
2. Barndorff-Nielsen, O.E., Shephard, N.: Non-Gaussian Ornstein-Uhlenbeck-based models and
some of their uses in financial economics. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 63, 167–241
(2001)
3. Bates, D.: Jumps and stochastic volatility: exchange rate processes implicit in Deutsche Mark
options. Rev. Financ. Stud. 9(1), 69–107 (1996)
4. Black, F., Scholes, M.: The pricing of options and corporate liabilities. J. Polit. Econ. 81(3),
637–654 (1973)
5. Carr, P., Madan, D.: Option valuation using the fast Fourier transform. J. Comput. Financ. 2,
61–73 (1999)
6. Duffie, D., Pan, J., Singleton, K.: Transform analysis and asset pricing for affine jump-
diffusions. Econometrica 68, 1343–1376 (2000)
7. Eberlein, E., Koval, N.: A cross-currency Lévy market model. Quant. Financ. 6(6), 465–480
(2006)
8. Garman, M., Kohlhagen, S.: Foreign currency option values. J. Int. Money Financ. 2, 231–237
(1983)
9. Heston, S.: A closed-form solution for options with stochastic volatility with applications to
bond and currency options. Rev. Financ. Stud. 6(2), 327–343 (1993)
10. Jacod, J., Todorov, V.: Testing for common arrivals of jumps for discretely observed multidi-
mensional processes. Ann. Stat. 37(4), 1792–1838 (2009)
11. Kallsen, J., Tankov, P.: Characterization of dependence of multidimensional Lévy processes
using Lévy copulas. J. Multivar. Anal. 97(7), 1551–1572 (2006)
12. Kou, S.G.: A jump-diffusion model for option pricing. Manag. Sci. 48(8), 1086–1101 (2002)
13. Lipton, A.: Mathematical Methods For Foreign Exchange: A Financial Engineer’s Approach.
World Scientific Publishing Company, River Edge (2001)
14. Luciano, E., Schoutens, W.: A multivariate jump-driven financial asset model. Quant. Financ.
6(5), 385–402 (2006)
15. Mai, J.-F., Scherer, M., Schulz, T.: Sequential modeling of dependent jump processes. Wilmott
2014(70), 54–63 (2014)
16. Merton, R.: Option pricing when underlying stock returns are discontinuous. J. Financ. Econ.
3(1), 125–144 (1976)
17. Muhle-Karbe, J., Pfaffel, O., Stelzer, R.: Option pricing in multivariate stochastic volatility
models of OU type. SIAM J. Financ. Math. 3, 66–94 (2012)
18. Nicolato, E., Venardos, E.: Option pricing in stochastic volatility models of the Ornstein-
Uhlenbeck type. Math. Financ. 13(4), 445–466 (2003)
19. Pelsser, A.: Mathematical foundation of convexity correction. Quant. Financ. 3(1), 59–65
(2003)
20. Pigorsch, C., Stelzer, R.: A multivariate generalization of the Ornstein-Uhlenbeck stochastic
volatility model, working paper (2008)
21. Raible, S.: Lévy Processes in finance: theory, numerics, and empirical facts, PhD thesis,
Albert-Ludwigs-Universität Freiburg i. Br. (2000)
22. Reiswich, D., Wystup, U.: FX volatility smile construction. Wilmott 60, 58–69 (2012)
23. Stein, E., Stein, J.: Stock price distributions with stochastic volatility: an analytic approach.
Rev. Financ. Stud. 4(4), 727–752 (1991)
Modeling the Price of Natural Gas
with Temperature and Oil Price
as Exogenous Factors
Abstract The literature on stochastic models for the spot market of gas is domi-
nated by purely stochastic approaches. In contrast to these models, Stoll and Wiebauer
[14] propose a fundamental model with temperature as an exogenous factor. A model
containing only deterministic, temperature-dependent, and purely stochastic components,
however, still does not seem able to capture economic influences on the price. In
order to improve the model of Stoll and Wiebauer [14], we include the oil price as
another exogenous factor. There are at least two fundamental reasons why this should
improve the model. First, the oil price can be considered as a proxy for the general
state of the world economy. Furthermore, pricing formulas in oil price indexed gas
import contracts in Central Europe are covered by the oil price component. It is shown
that the new model can explain price movements of the last few years much better
than previous models. The inclusion of oil price and temperature in the regression of
a least squares Monte Carlo method leads to more realistic valuation results for gas
storages and swing options.
Keywords Gas spot price · Oil price model · Temperature · Gas storage valuation ·
Least squares Monte Carlo · Seasonal time series model
1 Introduction
In recent years, trading in natural gas has become increasingly important. The traded
quantities over-the-counter and on energy exchanges have strongly increased and new
products have been developed. For example, swing options increase the flexibility of
suppliers and they are used as an instrument for risk management purposes. Important
facilities for the security of supply are gas storages.
These are two examples of complex American-style real options that illustrate the
need for reliable pricing methods. Both options rely on nontrivial trading strategies
where exercise decisions are taken under uncertainty. Therefore, analytic pricing
formulas cannot be expected. The identification of an optimal trading strategy under
uncertainty is a typical problem of stochastic dynamic programming, but even then
numerical solutions are difficult to obtain due to the curse of dimensionality. There-
fore, simulation-based approximation algorithms have been successfully applied in
this area. Longstaff and Schwartz [9] introduced the least squares Monte Carlo method
for the valuation of American options. Meinshausen and Hambly [10] extended the
idea to Swing options, and Boogert and de Jong [5] applied it to the valuation of
gas storages. Their least squares Monte Carlo algorithm requires a stochastic price
model for daily spot prices generating adequate gas price scenarios. We prefer this
approach to methods using scenario trees or finite differences as it is independent of
the underlying price process.
The financial literature on stochastic gas price models is dominated by purely sto-
chastic approaches. The one- and two-factor models by Schwartz [12] and Schwartz
and Smith [13] are general approaches applicable to many commodities, such as oil
and gas. The various factors represent short- and long-term influences on the price.
An important application of gas price models is the valuation of gas storage facili-
ties. Within this context, Chen and Forsyth [7] and Boogert and de Jong [6] propose
gas price models. Chen and Forsyth [7] analyze regime-switching approaches incor-
porating mean-reverting processes and random walks. The class of factor models is
extended by Boogert and de Jong [6]. The three factors in their model represent short-
and long-term fluctuations as well as the behavior of the winter–summer spread. In
contrast to these models, Stoll and Wiebauer [14] propose a fundamental model
with temperature as an exogenous factor. They use the temperature component as an
approximation of the filling level of gas storages, which have a remarkable influence
on the price.
There is a fundamental difference between the model of Stoll and Wiebauer [14]
and the other models mentioned before as far as their stochastic behavior is concerned.
Incorporating cumulated heating degree days over a winter as an explanatory variable
leads to a seasonal effect in the variance of the prices. In this model the variance of the
gas prices increases over the winter depending on the actual weather conditions and
has a maximum at the end of winter. This is much more in line with the observations
than the behavior of the model of Boogert and de Jong [6] where the variance of the
gas price has a minimum at the end of winter as there is no effect of the winter–summer
spread used there. Another major difference is the use of observable exogenous variables:
the optimal exercise decision for American-style options depends on these variables, and
therefore the price of these real options will also be different.
In this paper we extend the model of Stoll and Wiebauer [14] by introducing
another exogenous factor to their model: the oil price. There are at least two reasons
why we believe that this is useful. The main reason is that an oil price component can
be considered as a proxy for the state of the world economy in the future. In contrast
to other indicators, such as the gross domestic product (GDP), futures prices for oil
are available on a daily basis. Furthermore, the import prices for gas in countries
such as Germany are known to be oil price indexed.
Apart from the GDP or oil price there might be more candidates as an explanatory
variable in the model. The most natural choice would be the forward gas price. We
prefer the oil price as it gives us the chance to value gas derivatives that are oil
price indexed, as is often the case for gas swing contracts. For the valuation of such
swing contracts gas price scenarios are needed as well as corresponding oil price
scenarios. This application is hardly possible with explanatory variables other than
the oil price.
The rest of the paper is organized as follows. In Sect. 2 we introduce the model
by Stoll and Wiebauer [14] including a short description of their model for the
temperature component. In Sect. 3 we discuss the need for an oil price component in
the model. The choice of the component in our model is explained. Then we fit the
model to data in Sect. 4. The new model is used within a least squares Monte Carlo
algorithm for valuation of gas storages and swing options in Sect. 5. The exogenous
factors are included in the regression to approximate the continuation value. We
finish with a short conclusion in Sect. 6.
Modeling the price of natural gas in Central Europe requires knowledge about the
structure of supply and demand. On the supply side there are only a few sources in
Central Europe, while most of the natural gas is imported from Norway and Russia.
On the demand side there are mainly three classes of gas consumers: Households,
industrial companies, and gas fired power plants. While households only use gas for
heating purposes at low temperatures, industrial companies use gas as heating and
process gas. Households and industrial companies are responsible for the major part
of the total gas demand.
These two groups of consumers cause seasonalities in the gas price:
• Weekly seasonality: Many industrial companies need less gas on weekends as their
operation is restricted to working days.
• Yearly seasonality: Heating gas is needed mainly in winter at low temperatures.
An adequate gas price model has to incorporate these seasonalities as well as stochastic
deviations from them.
Stoll and Wiebauer [14] propose a model meeting these requirements and incor-
porating another major influence factor: the temperature. To a certain extent the
temperature dependency is already covered by the deterministic yearly seasonality.
This component describes the direct influence of temperature: The lower the tem-
perature, the higher the price. But the temperature influence is more complex than
this. A day with average temperature of 0 ◦ C at the end of a long cold winter has
a different impact on the price than a daily average of zero at the end of a “warm”
winter. Similarly, a cold day at the end of a winter has a different impact on the price
than a cold day at the beginning of the winter.
The different impacts are due to gas storages that are essential to cover the demand
in winter. The total demand for gas is higher than the capacities of the gas pipelines
from Norway and Russia. Therefore, gas providers use gas storages. These storages
are filled during summer (at low prices) and emptied in winter months. At the end of
a long and cold winter most gas storages will be rather empty. Therefore, additional
cold days will lead to comparatively higher prices than in a normal winter.
The filling level of all gas storages in the market would be the adequate variable
to model the gas price. However, these data are not available as they are private
information. Therefore, we need a proxy variable for it. As the filling levels of gas
storages are strongly related to the demand for gas which in turn depends on the
temperature, an adequate variable can be derived from the temperature.
Stoll and Wiebauer [14] use normalized cumulated heating degree days to cover
the influence of temperature on the gas price. They define a temperature of 15 ◦ C
as the limit of heating. Any temperature below 15 ◦ C makes households as well
as companies switch on their heating systems. Heating degree days are measured
by HDD_t = max(15 − T_t, 0), where T_t is the average temperature of day t. As
mentioned above the impact on the price depends on the number of cold days observed
so far in the winter. In this context, we refer to winter as 1 October and the 181
following days till end of March. We will write HDD_{d,w} for HDD_t, if t is day
number d of winter w. Cumulation of heating degree days over a winter leads to
a number indicating how cold the winter has been so far. Then we can define the
cumulated heating degree days on the day d in winter w as

CHDD_{d,w} = Σ_{k=1}^{d} HDD_{k,w}   for 1 ≤ d ≤ 182.          (1)
The impact of cumulated heating degree days on the price depends on the comparison
with a normal winter. This information is included in normalized cumulated heating
degree days
Λ_{d,w} = CHDD_{d,w} − (1/(w−1)) Σ_{ℓ=1}^{w−1} CHDD_{d,ℓ}   for 1 ≤ d ≤ 182.          (2)
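A small sketch of Eqs. (1) and (2), assuming the daily average temperatures of each winter are arranged in an array with one row per winter (1 October plus the 181 following days); the array layout is an assumption for illustration.

```python
import numpy as np

def normalized_chdd(temps):
    """temps: array of shape (n_winters, 182) of daily average temperatures.
    Returns Lambda_{d,w} for every winter except the first."""
    hdd = np.maximum(15.0 - temps, 0.0)          # heating degree days
    chdd = np.cumsum(hdd, axis=1)                # CHDD_{d,w}, Eq. (1)
    n_winters = chdd.shape[0]
    lam = np.full_like(chdd, np.nan)
    for w in range(1, n_winters):                # winter index w (0-based)
        mean_past = chdd[:w].mean(axis=0)        # average over the preceding winters
        lam[w] = chdd[w] - mean_past             # Lambda_{d,w}, Eq. (2)
    return lam
```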
As the deterministic components and the normalized cumulated heating degree days are linear with respect to the parameters, we can use ordinary
least squares regression for parameter estimation. The complete model can be written
as

G_t = m_t + α · Λ_t + X_t^{(G)} + Y_t^{(G)}          (3)

with the day-ahead price of gas G_t, the deterministic seasonality m_t, the normalized
cumulated heating degree days Λ_t, an ARMA process X_t^{(G)}, and a geometric
Brownian motion Y_t^{(G)}. For model calibration day-ahead gas prices from TTF market
(Source: ICE) are used. The Dutch gas trading hub TTF offers the highest trading vol-
umes in Central Europe. As corresponding temperature data we choose daily average
temperatures from Eindhoven, Netherlands (Source: Royal Netherlands Meteorolog-
ical Institute). The fit to historical prices before the crisis can be seen in Fig. 2. Outliers
have been removed (see Sect. 4 for details on treatment of outliers).
The model described in Eq. (3) is capable of covering all influences on the gas price
related to changes in temperature. But changes in the economic situation are not
covered by that model. This was clearly observable in the economic crisis 2008/2009
(see Fig. 5). During that crisis the demand for gas by industrial companies in Central
Europe fell by more than 10 %. As a consequence, the gas price rapidly
decreased by more than 10 Euro per MWh.
The oil price showed a similar behavior in that period. Economic changes are
the main drivers for remarkable changes in the oil price level. Short-term price
movements caused by speculators or other effects cause deviations from the price
level that represents the state of the world economy. Therefore, gas price changes
often correspond to long-term changes in the oil price level. Such an influence can be
modeled by means of a moving average of past oil prices. The averaging procedure
removes short-term price movements if the averaging period is chosen sufficiently
long. The result is a time series containing only the long-term trends of the oil
price. Using such an oil price component in a gas price model explains the gas price
movements due to changes in the economic situation. This consideration is in line
with He et al. [8]. They identify cointegration between crude oil prices and a certain
indicator of global economic activity.

Fig. 2 m_t + α · Λ_t from Eq. (3) (black) fitted to TTF prices from 2004–2009 (grey)

Fig. 3 In a 3-1-3 formula the price is determined by the average price of 3 months (March to May).
This price is valid for July–September. The next day of price fixing is 1 October
Another important argument for the use of this oil price component is based on
Central European gas markets. Countries such as Germany import gas via long-
term contracts that are oil price indexed. This indexation can be described by three
parameters:
1. The number of averaging months. The gas price is the average of past oil prices
within a certain number of months.
2. The time lag. Possibly, there is a time lag between the months the average is
taken of and the months the price is valid for.
3. The number of validity months. The price is valid for a certain number of months.
An example of a 3-1-3 formula is given in Fig. 3.
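As an illustration of how such an indexation formula with parameters a (averaging months), b (time lag) and c (validity months) can be evaluated on a monthly oil price series, consider the following sketch; the pandas data layout and the alignment of the fixing months to the calendar are simplifying assumptions.

```python
import pandas as pd

def oil_indexed_price(oil_monthly: pd.Series, a: int, b: int, c: int) -> pd.Series:
    """Step function of import prices implied by an a-b-c formula (sketch)."""
    avg = oil_monthly.rolling(a).mean()            # average over the past a months
    fixed = avg.shift(b + 1)                       # lag of b months before the validity period
    out = pd.Series(index=oil_monthly.index, dtype=float)
    out.iloc[::c] = fixed.iloc[::c].to_numpy()     # price fixings every c months (alignment assumed)
    return out.ffill()                             # hold the fixed price during its validity months
```

For formulas with a single validity month (c = 1), the resulting step function is simply the shifted average itself, which is consistent with replacing the step function by a moving average, as discussed below.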
The formulas used in the gas import contracts are not known to all market partici-
pants. Theoretically, any choice of three natural numbers is possible. But from other
products, like oil price indexed gas swing options, we know that some formulas are
more popular than others. Examples of common formulas are 3-1-1, 3-1-3, 6-1-1,
6-1-3, and 6-3-3. Therefore, we assume that these formulas are relevant for import
contracts as well.

Fig. 4 The oil price (grey), the 6-0-1 formula (black step function) and the moving average of
180 days (black)
As there are many different import contracts with possibly different price formulas
we cannot be sure that one of the mentioned formulas is responsible for the price
behavior on the market. The mixture of different formulas might affect the price in
the same way as a common formula or a similar one.
Evaluation of the formula leads to price jumps every time the price is fixed. The
impact on the gas price will be smoother, however. The new gas price determined
on a fixing day is the result of averaging a number of past oil prices. The closer
to the fixing day the more prices for the averaging are known. Therefore, market
participants have reliable estimates of the new import price. If the new price is going to
be higher, it is cheaper to buy gas in advance and store it. This increases the
day-ahead price prior to the fixing day and leads to a smooth transition from the old
to the new price level on the day-ahead market.
This behavior of market participants leads to some smoothness of the price. In
order to include this fact in a model a smoothed price formula can be used. A sophis-
ticated smoothing approach for forward price curves is introduced by Benth et al.
[3]. They assume some smoothness conditions in the knots between different price
intervals. It is shown that splines of order four meet all these requirements and make
sure that the result is a smooth curve. As our price formulas are step functions like
forward price curves, this approach is applicable to our situation.
If the number of validity months is equal to one it is possible to use a moving
average instead of a (smoothed) step function to simplify matters (see Fig. 4). This
alternative is much less complex than the approach with smoothing by splines, and
delivers comparable results. Therefore, the simpler method is applied in case of
formulas with one validity month.
In the next section we compare various formulas regarding their ability to explain
the price behavior on the gas market.
We now compare different formulas of oil prices in the regression model in order to
find the one explaining the gas price best (see Fig. 5).
For the choice of the best formula we use the coefficient of determination R²
as the measure of goodness-of-fit. We choose the reasonable formula leading to the
highest value of R². Reasonable, in this context, means that we restrict our analysis
to formulas that are equal or similar to the ones known from other oil price indexed
products (compare Sect. 3). The result of this comparison is a 6-0-1 formula (see
Fig. 6). Although this is not a common formula there is an explanation for it: The gas
price decreased approximately six months later than the oil price in the crisis. This
major price movement needs to be covered by the oil price component. As explained
above we replace the step function by a moving average. Taking the moving average
of 180 days is a good approximation of the 6-0-1 formula. All in all, the oil price
component increases the R² as our measure of goodness-of-fit from 0.35 to 0.83
(see Fig. 5). Even if the new model is applied to data before the crisis the oil price
component is significant. In that period the increase of R² is smaller but still improves
the model.
These comparisons give evidence that both considerations in the previous section
are valid. The included oil price component can be seen as the smoothed version
of a certain formula. At the same time it can be considered as a variable describing
economic influences indicated by the trends and level of the oil price.
Therefore, we model the gas price by the new model

G_t = m_t + α_1 Λ_t + α_2 Ψ_t + X_t^{(G)}          (4)

with Ψ_t being the oil price component. This means that the unobservable factor Y_t^{(G)}
in Stoll and Wiebauer [14] is replaced by the observable factor Ψ_t.
Parameter estimation of our model is based on the same data sources as the model
by Stoll and Wiebauer [14]. However, we extend the period to 2011. Additionally, we
need historical data for the estimation of the oil price component. Therefore, we use
prices of the front month contracts of Brent crude oil traded on the Intercontinental
Exchange (ICE) from 2002–2011. Using these data we can estimate all parameters
applying ordinary least squares regression after removing outliers from the gas price
data, G t .
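A rough sketch of this regression step is given below; the exact design matrix for the deterministic seasonality m_t is not spelled out in the text, so the `seasonal_design` argument is a placeholder for whatever weekly and yearly terms are used.

```python
import numpy as np

def fit_gas_model(G, Lambda, Psi, seasonal_design):
    """OLS estimation of Eq. (4); seasonal_design holds the deterministic regressors m_t."""
    X = np.column_stack([seasonal_design, Lambda, Psi])
    coef, *_ = np.linalg.lstsq(X, G, rcond=None)
    residuals = G - X @ coef                   # these residuals feed the ARMA fit in Sect. 4.3
    return coef, residuals
```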
Outliers can be due to technical problems or a fire at a major gas storage. We
exclude the prices on these occasions by an outlier treatment proposed by Weron [15],
where values outside a range around a running median are declared to be outliers.
The range is defined as three times the standard deviation. The identified outliers
are excluded in the regression. We do not remove them from our model, however, as
they are still included in the estimation of the parameters of the remaining stochastic
process.

Fig. 5 The model of Stoll and Wiebauer [14] (bold black) and our model (thin black line) fitted to
historical gas prices (grey)

Fig. 6 Comparison of different oil price components in the model: 6-0-1 formula (bold black),
6-1-1 formula (grey) and 3-0-1 formula (thin black) fitted to the historical prices (dark grey)
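As a minimal sketch of the running-median outlier rule described above (a band of three standard deviations around a running median), one could proceed as follows; the window length of the running median and the use of the whole-series standard deviation are assumptions, since neither is specified in the text.

```python
import pandas as pd

def flag_outliers(prices: pd.Series, window: int = 30) -> pd.Series:
    """Mark prices lying outside a 3-standard-deviation band around a running median."""
    running_median = prices.rolling(window, center=True, min_periods=1).median()
    band = 3.0 * prices.std()                  # band width (assumed: std of the whole series)
    return (prices - running_median).abs() > band
```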
Altogether, these model components give fundamental explanations for the his-
torical day-ahead price behavior. Short-term deviations are included by a stochastic
process (see Sect. 4.3). Long-term uncertainty due to the uncertain development of
the oil price is included by the oil price process. Therefore, our model is able to
generate reasonable scenarios for the future (see Fig. 7). We specify the stochastic
models for the exogenous factors Ψ_t and Λ_t as well as the stochastic process X_t^{(G)}
in the following.
Fig. 7 The historical gas price (2008–2012) and its extensions by two realizations of the gas price
process for 2012–2013
Oil prices show a different behavior than gas prices, which influences the choice
of an adequate model. The most obvious fact is the absence of any seasonalities or
deterministic components. Therefore, we model the oil price without a deterministic
function or fundamental component. Another major difference affects the stochastic
process. While the oil price and also logarithmic oil prices are not stationary, the gas
price is stationary after removal of seasonalities and fundamental components.
A very common model for nonstationary time series is the Brownian motion
with drift applied to logarithmic prices. Drift and volatility of this process can be
determined using historical data or by any estimation of the future volatility. For a sta-
tionary process, the use of an Ornstein-Uhlenbeck process or its discrete equivalent,
an AR(1) process, is an appropriate simple model.
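For illustration, these two simple building blocks can be simulated as follows; parameter names are generic placeholders and the daily time step is an assumption.

```python
import numpy as np

def simulate_log_gbm(s0, mu, sigma, n_days, dt=1/252, rng=None):
    """Brownian motion with drift on log prices, i.e. a geometric Brownian motion path."""
    rng = np.random.default_rng() if rng is None else rng
    increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_days)
    return s0 * np.exp(np.cumsum(increments))

def simulate_ar1(phi, sigma_eps, n_days, x0=0.0, rng=None):
    """Discrete-time Ornstein-Uhlenbeck analogue: an AR(1) process."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(n_days)
    prev = x0
    for t in range(n_days):
        prev = phi * prev + sigma_eps * rng.standard_normal()
        x[t] = prev
    return x
```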
A combination of these two simple modeling approaches is given by the two-
factor model by Schwartz and Smith [13]. They divide the log price into two factors:
one for short-term variations and one for long-term dynamics.
Fig. 8 Top: Fit of deterministic function (black) to the historical daily average temperature (grey) in
Eindhoven, Netherlands. Bottom: Autocorrelation function (left) and partial autocorrelation function
(right) of residual time series (black) and innovations of AR(3) process (grey)
When modeling daily average temperature we can make use of a long history of
temperature data. Here, a yearly seasonality and a linear trend can be identified.
Therefore, we use a temperature model closely related to the one proposed by Benth
and Benth [2].
T_t = a_1 + a_2 t + a_3 sin(2πt/365.25) + a_4 cos(2πt/365.25) + X_t^{(T)}          (6)

with X_t^{(T)} being an AR(3) process. The model fit with respect to the deterministic
part (ordinary least squares regression) and the AR(3) process is shown in Fig. 8. The
process (T_t) (see Fig. 9) is then used to define the derived process (Λ_t) of normalized
cumulated heating degree days as described in Sect. 2.
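A compact sketch of this two-stage temperature fit (OLS for the deterministic part of Eq. (6), AR(3) for the residuals); statsmodels is used here merely as a convenient tool, not necessarily the one used by the authors.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def fit_temperature(temp):
    """OLS fit of the deterministic part of Eq. (6) plus an AR(3) model for the residuals."""
    t = np.arange(len(temp), dtype=float)
    X = np.column_stack([np.ones_like(t), t,
                         np.sin(2 * np.pi * t / 365.25),
                         np.cos(2 * np.pi * t / 365.25)])
    a, *_ = np.linalg.lstsq(X, temp, rcond=None)   # coefficients a_1, ..., a_4
    resid = temp - X @ a
    ar3 = AutoReg(resid, lags=3).fit()             # AR(3) for X_t^(T)
    return a, ar3
```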
The fit of normalized cumulated heating degree days, oil price component, and deter-
ministic components to the gas price via ordinary least squares regression (see Fig. 10)
results in a residual time series. These residuals contain all unexplained, “random”
deviations from the usual price behavior.
Fig. 9 Historical temperatures and their extension by two realizations of the temperature model
Fig. 10 Fit of deterministic function and exogenous components (black) to the historical gas price
(grey)
The residuals exhibit a strong autocorrelation at the first lag. Further analysis
of the partial autocorrelation function reveals that an ARMA(1,2) process provides a
good fit (see Fig. 11). The empirical innovations of the process show heavier tails
than a normal distribution (compare Stoll and Wiebauer [14]). Therefore, we apply
a distribution with heavy tails. The class of generalized hyperbolic distributions
including the NIG distribution was introduced by Barndorff-Nielsen [1]. The normal-
inverse Gaussian (NIG) distribution leads to a remarkably good fit (see Fig. 11).
Both the distribution of the innovations and the parameters of autoregressive
processes are estimated using maximum likelihood estimation.
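A sketch of this last estimation step, fitting the ARMA(1,2) process by maximum likelihood and a NIG distribution to its innovations; scipy and statsmodels serve as stand-ins for whatever software the authors used.

```python
from scipy.stats import norminvgauss
from statsmodels.tsa.arima.model import ARIMA

def fit_residual_model(residuals):
    """ML fit of an ARMA(1,2) process and a NIG distribution for its innovations."""
    arma = ARIMA(residuals, order=(1, 0, 2)).fit()    # ARMA(1,2) via statsmodels
    innovations = arma.resid
    nig_params = norminvgauss.fit(innovations)        # returns (a, b, loc, scale)
    return arma, nig_params
```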
Fig. 11 Top: ACF (left) and PACF (right) of residual time series (black) and innovations of
ARMA(1,2) process (grey). Bottom: Fit of NIG distribution (grey) to kernel density of empiri-
cal innovations (black)
We assume that the storage is available from time t = 0 till time t = T + 1 and
the holder is allowed to take an action at any discrete date t = 1, . . . , T after the spot
price S(t) is known. Let v(t) denote the volume in storage at the start of day t and
Δv the volume change during day t. In case of an injection Δv > 0, while Δv < 0
means withdrawal from the storage. The payoff on day t is
h(G_t, Δv) = { (−G_t − c_{WD,t}) · Δv,   Δv ≥ 0
             { (−G_t − c_{IN,t}) · Δv,   Δv < 0.          (10)
Here c_{WD,t} denotes the withdrawal costs and c_{IN,t} the injection costs on day t, which
can be different and may include a bid-ask spread.
Let U (t, G t , v(t)) be the value of the flexibility starting at volume level v(t)
at time t. By C(t, G t , v(t), Δv) we denote the continuation value after taking an
allowed action Δv from D(t, v(t)) (the set of all admissible actions at time t if the
filling level is v(t)). If r (t) is the interest rate at time t then
C(t, G_t, v(t), Δv) = E[ e^{−r(t+1)} U(t + 1, G_{t+1}, v(t) + Δv) ].          (11)
The continuation value only depends on v(t + 1) := v(t) + Δv. Therefore, we will
from now on also write C(t, G t , v(t + 1)) for short. With this notation the flexibility
value U(t, G_t, v(t)) satisfies the following dynamic program:

U(T + 1, G_{T+1}, v(T + 1)) = q(G_{T+1}, v(T + 1)),
U(t, G_t, v(t)) = max_{Δv ∈ D(t, v(t))} [ h(G_t, Δv) + C(t, G_t, v(t) + Δv) ]          (12)

for all times t. In the first equation q is a possible penalty depending on the volume
level at time T + 1 and the spot price at this time G_{T+1}.
As the continuation value cannot be determined analytically, we use the least
squares Monte Carlo method to approximate the continuation value
C(t, G_t, v(t + 1)) ≈ Σ_{l=0}^{m} β_{l,t}(v(t + 1)) · φ_l(G_t)          (13)
using basis functions φ_l. If N price scenarios are given, estimates β̂_{l,t}(v(t + 1))
for the coefficients β_{l,t}(v(t + 1)) are obtained by regression. With these coefficients an
approximation Ĉ(t, G t , v(t + 1)) of the continuation value is obtained that is used
to determine an approximately optimal action Δv(t) for all volumes v(t).
Moreno and Navas [11] have shown that the concrete choice of the basis functions
does not have much influence on the results. For this reason we have chosen the easy
to handle polynomial basis functions φ_l(G_t) = G_t^l. Calculations have shown that
m = 3 is enough to get good results. A higher number of basis functions leads to
similar results.
Boogert and de Jong [6] use a multi-factor price process and include the factors
of the price process into the basis used for the regression in the least squares Monte
Carlo method. While their factors are unobservable, our price process (see Eq. (4))
includes two exogenous factors, which can easily be observed. We include the oil
price component Ψt (see Sect. 3) and the temperature component Λt (see Sect. 2)
into the regression by using
C(t, G_t, Λ_t, Ψ_t, v(t), Δv) ≈ Σ_{l=0}^{m} β_{l,t} · φ_l(G_t)
                               + β_{m+1,t} Ψ_t + β_{m+2,t} Ψ_t² + β_{m+3,t} Ψ_t · G_t
                               + β_{m+4,t} Λ_t + β_{m+5,t} Λ_t · G_t.          (14)
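For one time step and one volume level, the regression in Eq. (14) amounts to an ordinary least squares fit across the simulated scenarios, as in the following sketch (array names are illustrative):

```python
import numpy as np

def regress_continuation(G_t, Psi_t, Lambda_t, discounted_value_next, m=3):
    """OLS approximation of the continuation value per Eq. (14); arrays hold one
    entry per simulated scenario, for a fixed time step and volume level."""
    basis = [G_t**l for l in range(m + 1)]            # polynomial terms phi_l(G_t) = G_t^l
    basis += [Psi_t, Psi_t**2, Psi_t * G_t,           # oil price component terms
              Lambda_t, Lambda_t * G_t]               # temperature component terms
    X = np.column_stack(basis)
    beta, *_ = np.linalg.lstsq(X, discounted_value_next, rcond=None)
    return X @ beta                                   # fitted continuation values per scenario
```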
Gas storages and swing options are not only virtual products but are real options. This
means that traders need to take exercise decisions on a daily basis. These decisions
depend on all observable market information. In order to reflect this behavior in the
pricing algorithm for such options we will use the least squares Monte Carlo method
described above in combination with the spot price model in Sect. 4. The examples
given in this section are artificial gas storages and swing options valuated at two
different dates, 4 July 2012 and 2 April 2013. These dates are characterized by a
very different implicit volatility observed at the markets—for example for TTF the
long-term volatility has significantly decreased in the 8-month period from 25 to 12 %
(Source: ICE). At the same time the summer–winter spread between winter 13/14
and summer 13 has decreased from 2.40 EUR/MWh to 1.20 EUR/MWh, whereas
the price level has increased from 26.15 EUR/MWh to 27.70 EUR/MWh.
The TTF market prices have been used for the valuation of a slow and a fast
storage that are identical to the ones valued by Boogert and de Jong [6]. Moreover,
we have also valued a flexible and an inflexible swing contract. The parameters for
these storages and swings are given in Table 1. All valuations have been done using
5,000 price scenarios, which results in sufficiently convergent results.
We denote by daily intrinsic the value obtained if a daily price forward curve is
taken and an optimal exercise is calculated (using a deterministic dynamic program).
This value could be locked in immediately if each single future day could be traded
as an individual forward contract. The fair value denotes the value resulting from the
least squares Monte Carlo method, and the extrinsic value is the difference between
fair value and daily intrinsic value. Therefore, the extrinsic value is a measure for
the value of the flexibility included in the considered real option.
As can clearly be seen by comparing Tables 2 and 3 the decrease of the summer–
winter spread results in a lower daily intrinsic value for the storages. In contrast to
this behavior the intrinsic value of the flexible swing increases because of the higher
price level in 2013 compared to 2012. Furthermore, the decrease of volatility does
not change the extrinsic value of the two storages—very much in contrast to the
swings.

Table 1 Parameters for gas storages and swing options from 1.4.2013–1.4.2014
Parameter          Slow storage   Fast storage   Inflexible swing   Flexible swing
Min volume         0 MWh          0 MWh          0 MWh              0 MWh
Max volume         100 MWh        100 MWh        438 MWh            438 MWh
Min injection      0 MWh/day      0 MWh/day      –                  –
Max injection      1 MWh/day      2 MWh/day      –                  –
Min withdrawal     0 MWh/day      0 MWh/day      0.6 MWh/day        0 MWh/day
Max withdrawal     1 MWh/day      5 MWh/day      1.2 MWh/day        1.2 MWh/day
Injection costs    0 EUR/MWh      0 EUR/MWh      –                  –
Withdrawal costs   0 EUR/MWh      0 EUR/MWh      27 EUR/MWh         27 EUR/MWh
Start volume       0 MWh          0 MWh          438 MWh            438 MWh
Max end volume     0 MWh          0 MWh          146 MWh            146 MWh
For storages these findings correspond very well to the observations by Boogert
and de Jong [6]. They also found that a change of volatility in the long-term compo-
nent does not influence the value of gas storages—it may even decrease the value.
An explanation for this behavior is that it becomes more difficult for traders to decide
correctly if today’s price is high or low and therefore withdrawal, injection, or no
action makes most sense. Since decisions are taken under uncertainty about the future
price development, an increased volatility leads to more wrong decisions, and this may
decrease the value, at least in case of fast storages.
The situation is completely different for swing options. With an increasing volatil-
ity their value also increases. This is not surprising as can easily be seen from looking
at a special case. If the yearly restriction is not binding the swing is equivalent to a
strip of European options. In this case it is well known that an increase of volatility
implies an increase of the extrinsic option value under quite general assumptions on
the underlying stochastic process, see e.g. Bergenthum and Rüschendorf [4].
Another important difference between swings and storages is their behavior if
the exogenous components of the spot price process are included in the regression
of the algorithm. For the value of storages the oil price component is much more
important—in contrast to swings. For the inflexible swing both components are
irrelevant, while for the flexible swing the temperature component is more important
than the oil price component. For storages the oil price component is a measure for
normal long-term levels. As prices revert back to this long-term level mainly defined
by the oil price component, a price higher than this level is good for withdrawal
while a price lower than this level is good for injection. Therefore, an inclusion in
the regression is very important for the exercise decision and increases the value.
Another interesting observation is the influence of the two exogenous components
on the less flexible products. While an inclusion of the oil price component increases
the fair value, a further inclusion of the temperature component decreases the value
slightly for valuation date 2 April 2013, but not for 4 July 2012. One important
reason is that in April 2013 a long winter, which was quite normal in terms of heating
degree days, had just ended and the linear return of the temperature component to zero
was about to start, while the winter 2011/12 had been very warm, so that in July the
linear return, with a slight gradient, was already half completed.
To sum up, these results indicate that it is very important to include both exogenous
components into the exercise decision for storages as well as swings, as this can
significantly increase the extrinsic value.
6 Conclusion
The spot price model by Stoll and Wiebauer [14] with only temperature as an exoge-
nous factor is not able to explain the gas price behavior during the last years. We
have shown that adding an oil price component as another exogenous factor remark-
ably improves the model fit. It is not only a good proxy for economic influences on
the price but also approximates the oil price indexation in gas import contracts on
Central European gas markets. These fundamental reasons and the improvement of
model fit give justification for the inclusion of the model component. The resulting
simulation paths from the model are reliable. The inclusion of both exogenous factors
in algorithms for the valuation of options by least squares Monte Carlo remarkably
affects the valuation results.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
13. Schwartz, E.S., Smith, J.E.: Short-term variations and long-term dynamics in commodity prices.
Manage. Sci. 46, 893–911 (2000)
14. Stoll, S.O., Wiebauer, K.: A spot price model for natural gas considering temperature as an
exogenous factor and applications. J. Energy Mark. 3, 113–128 (2010)
15. Weron, R.: Modeling and Forecasting Electricity Loads and Prices: A Statistical Approach.
Wiley, Chichester (2006)
Copula-Specific Credit Portfolio Modeling
How the Sector Copula Affects the Tail of the Portfolio
Loss Distribution
1 Introduction
After the market crash of October 1987, Value-at-Risk (VaR) became a popular
management tool in financial firms. Practitioners and policy makers have invested
individually in implementing and exploring a variety of new models. However, as a
consequence of the financial markets turmoil around 2007/2008, the concept of VaR
was exposed to fierce debates. But just a few years after the crisis, VaR is still being
used albeit with greater awareness of its limitations (model risk) or in combination
with scenario analysis or stress testing. In particular, banks are required to critically
analyze and validate their employed VaR models which form the basis for their
internal capital allocation process (ICAAP, see BaFin [1, AT.4.1]). In this context,
M. Fischer (B)
Department of Statistics and Econometrics, University of Erlangen-Nuremberg,
Lange Gasse 20, 90403 Nuremberg, Germany
e-mail: [email protected]
K. Jakob
Department of Economics, University of Augsburg,
Universitätsviertel. 16, 86159 Augsburg, Germany
e-mail: [email protected]
the term “model validation” should be associated with the activity of assessing whether
the assumptions of the model are valid. Model assumptions, not computational errors,
were the focus of the most common criticisms of quantitative models in the crisis. In
particular, banks should be aware of the errors that can be made in the assumptions
underlying their models which form one of the crucial parts of model risk, probably
underestimated in the past practice of model risk management. With respect to the
current regulatory requirements (see, e.g., BaFin [1] or Board of Governors of the
Federal Reserve System [2]), banks are also required to quantify how sensitive their
models and the resulting risk figures are if fundamental assumptions are modified.
The focus of this contribution is solely on credit risk as one of the most important
risk types in the classical banking industry. Typically, the amount of economic capital
which has to be reserved for credit risk is determined with a credit portfolio model.
Two of the most widespread models are CreditMetrics, launched by JP Morgan
(see Gupton et al. [3]) and CreditRisk+ , an actuarial approach proposed by Credit
Suisse Financial Products (CSFP, see Wilde [4]). Shortly after their publication,
Koylouglu and Hickman [5], Crouhy [6] or Gordy [7] offered a comparative anatomy
of both models and described quite precisely where the models differ in functional
form, distributional assumptions, and reliance on approximation formulae. Sector
dependence, however, was not in the focus of these studies.
A crucial issue with credit portfolio models consists in the realistic modeling
of dependencies between counterparties. Typically, all counterparties are assigned
to one or more (industry/country) sectors. Consequently, high-dimensional counter-
party dependence can be reduced to low(er)-dimensional sector dependence, which
describes the way how sector variables are coupled together. Against this background,
our focus is on the impact of different dependence structures represented in terms of
copulas within credit portfolio models. Relating to Jakob and Fischer [8], we extend
the analysis of the CreditRisk+ model to CreditMetrics and provide comparisons
between both frameworks. For this purpose, we work out the implicit and explicit
sector copula of both classes in a first step and quantify the effect of exchanging the
copula model on the risk figures for a hypothetical loan portfolio and a variety of
recent flexible parametric copulas in a second step.
Therefore, the outline is as follows. In Sect. 2, we review the classical copula con-
cept and briefly introduce those copulas which are used during the analysis. Section 3
summarizes and compares the underlying credit portfolio models with special empha-
sis on the underlying sector dependence. Finally, we empirically demonstrate the
influence of different copula models on the upper tail of the loss distribution and,
hence, on the risk figures for a hypothetical but realistic loan portfolio. Section 5
concludes.
1 In general, we assume that the reader is already familiar with the concept of copulas as well as
the most popular classes. For details, we refer to Joe [10] and Nelson [11].
2 A vine is a directed acyclic graph, representing the decomposition sequence of a multivariate
density function.
3Migration risk includes the financial risk due to a change of the portfolio value caused by rating
migrations (i.e., down- and upgrade).
are replaced by

ẼAD_i := max( (EAD_i · LGD_i) / U, 1 )   and   P̃D_i := (EAD_i · LGD_i · PD_i) / (ẼAD_i · U),

respectively, where U denotes the loss unit. The adjustment of the PDs ensures that the
expected loss of the portfolio is not affected by the discretization, i.e., it holds

Σ_{i=1}^{N} ẼAD_i · U · P̃D_i = Σ_{i=1}^{N} EAD_i · LGD_i · PD_i.
To simplify notation, we will omit the tilde for the discretized exposure and the PD
in the following and denote them also with EADi and PDi , respectively. Since the
CreditMetrics model is a simulative one, such an adjustment is not necessary.
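A small sketch of this adjustment; the rounding of the exposure bands to integers is an assumption on our part, the text above only states the max(·, 1) floor.

```python
import numpy as np

def discretize(ead, lgd, pd, U):
    """Discretize exposures into bands of the loss unit U and adjust the PDs."""
    ead_tilde = np.maximum(np.rint(ead * lgd / U), 1.0)        # exposure bands (rounding assumed)
    pd_tilde = ead * lgd * pd / (ead_tilde * U)                # adjusted PDs
    # expected loss is preserved: sum(ead_tilde * U * pd_tilde) == sum(ead * lgd * pd)
    return ead_tilde, pd_tilde
```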
S_k = S̄_k + Σ_{ℓ=1}^{L} γ_{k,ℓ} Ŝ_ℓ,   for k = 1, . . . , K          (1)
independent of each other, one can reduce the CBV extension to the basic CreditRisk+
model. Hence, the CBV model can also be solved analytically. For further details
on the estimation of the Gamma parameters, we refer to Fischer and Dietz [23].
In Eq. (1), the marginal distributions of S = (S1 , . . . , S K )T are (in general) not
Gamma anymore. An analysis of the resulting univariate distribution was provided
by Moschopoulos [24]. The copula of S is called a multi-factor copula, which is
discussed in a very general way by Oh and Patton [25] as well as by Mai and Scherer [26].
falls below φ^{−1}(PD_i), where φ^{−1} denotes the quantile function of the standard
normal distribution and Y_i ∼ N(0, 1) is independent from X and from Y_j for i ≠ j. The
vector R_i ∈ [−1, 1]^K, with the restriction that R_i^T Σ R_i ≤ 1, contains the so-called
factor loadings, describing the correlation between a counterparty’s asset value A_i
and the systemic factors X_k. Given a sector realization x of X, the conditional PD
derived from the asset process (2) reads as

PD_i^{CM}(X = x) = φ( ( φ^{−1}(PD_i) − R_i^T x ) / √(1 − R_i^T Σ R_i) ).          (3)
In the CreditRisk+ model, the sector variables S_k are assumed to influence the conditional
PD according to

PD_i^{CR+}(S = s) = PD_i ( W_i^T s + W_{i,0} )          (4)

with W_i ∈ [0, 1]^K, Σ_{k=1}^{K} W_{i,k} ≤ 1, and W_{i,0} := 1 − Σ_{k=1}^{K} W_{i,k}. Equations (3) and (4) establish a
connection between sector variables and counterparty PDs. In CreditRisk+, PD_i^{CR+}
serves as intensity parameter of a Poisson distribution from which defaults are drawn
independently for every counterparty. The Poisson distribution is used instead of a
Bernoulli one in order to obtain a closed form expression of the loss distribution.
Therefore, multiple defaults of a single counterparty (especially one with bad creditworthiness)
are possible. This is a major drawback of the model, leading to an overestimation
of the risk figures. In Sect. 4 we analyze the changes of risk figures with respect to
the underlying copula. But since our focus is on relative changes, this overestimation
does not influence the comparison.
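To illustrate Eqs. (3) and (4), the conditional PDs for a given realization of the sector variables can be computed as follows; the array shapes and the weight normalization W_{i,0} = 1 − Σ_k W_{i,k} reflect the reconstruction above and are assumptions for this sketch.

```python
import numpy as np
from scipy.stats import norm

def pd_creditmetrics(pd_i, R, Sigma, x):
    """Eq. (3): pd_i (N,), R (N, K) factor loadings, Sigma (K, K), x (K,) sector realization."""
    num = norm.ppf(pd_i) - R @ x
    den = np.sqrt(1.0 - np.einsum('ik,kl,il->i', R, Sigma, R))   # sqrt(1 - R_i^T Sigma R_i)
    return norm.cdf(num / den)

def pd_creditriskplus(pd_i, W, s):
    """Eq. (4): W (N, K) sector weights, s (K,) sector realization; W_{i,0} = 1 - sum_k W_{i,k}."""
    w0 = 1.0 - W.sum(axis=1)
    return pd_i * (w0 + W @ s)
```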
4 One should note that A_i again has a standard Gaussian law. The dependence structure is described
by a multi-factor copula as in the case of the CreditRisk+ CBV model, but with a different
parametrization.
In this section the estimation results for the sector copulas are presented as well as
the effect on economic capital.
5 Assigning an obligor to more than one sector would cause serious problems in the CreditMetrics
framework, since, in general, the distribution of the asset value (2) is unknown if the copula of X
is not Gaussian.
6 For readers who are interested in the effect of stochastic LGDs, we refer to Gundlach and Lehrbass
σ_k² represent the uncertainty about possible PD changes within the sector. Therefore,
the risk related to a particular sector increases with σ_k².
The basis for the parameter estimation is a data pool containing monthly obser-
vations (PD estimations) from 2003 to 2012 for more than 30,000 exchange traded
corporates from all over the world. The individual PD time series, derived from mar-
ket data (equity prices and liabilities) via a Merton model (see Merton [28]), are
aggregated on sector level via averaging. In order to take time dependencies into
account, we fitted a univariate autoregressive process to every sector time series.
In order to fully determine the marginal distributions, we have to specify the sector
variances σ_k² for the CreditRisk+ model and the asset correlations R_k² for the CreditMetrics
model.7 The sector variances are estimated based on the autocovariance function of
the aggregated sector time series mentioned above, which are normalized such that
E(S_k) = 1 holds, in order to ensure that the mean of the conditional PD (Eq. (4))
equals the unconditional PD. In case of the CreditMetrics model, the asset correlation
parameters R_k² are estimated via a moment matching approach, such that the first
two moments of the conditional PD in both models coincide.8 Note that the PD
variance Var(PD_i^{CM}(X)) induced by Eq. (3) for counterparty i in sector k is given by

Var(PD_i^{CM}(X)) = Φ_2( φ^{−1}(PD_k), φ^{−1}(PD_k); R_k² ) − PD_k²,

where PD_k denotes the mean of the time series for sector k and Φ_2 is the distribution
function of the bivariate normal distribution with correlation parameter R_k².
First note that the estimations are based on the residuals of the autoregressive
processes, fitted on every sector PD time series. For a more detailed discussion
on this topic, we refer to Jakob and Fischer [8], for instance.
7 In practice, the parametrizations of both models are very different. The parameters of the
CreditRisk+ model are typically estimated based on default data or insolvency rates, whereas
in case of the CreditMetrics model market data are used. Using PD time series based on market
data might serve as a compromise in order to compare the results across both models.
The parameters of the GC and the TC (as representatives of the elliptical copula class)
are estimated via maximum likelihood using the R-package “copula” from Hofert
et al. [29]. For the TC, we estimated 3.786 degrees of freedom, indicating that a joint
exceedance of high quantiles is more likely compared to the GC. Generalizing the
TC, we also considered symmetric and asymmetric9 GHCs. For parameter estimation
the R-package “ghyp” from Luethi and Breymann [30] was used. Please note that,
compared to the TC, the symmetric GHC has two more parameters due to the generalized
inverse Gaussian distribution, which is used as mixing distribution for the family of
generalized hyperbolic distributions, and the asymmetric GHC has another ten parameters
because of the skewness vector. The corresponding log-likelihood values are summarized
in Table 2. A standard likelihood ratio test indicates that the TC fits the data significantly
better than the Gaussian one on every typical significance level. Also, the increase of the
log-likelihood of the asymmetric GHC relative to its symmetric counterpart is significant.
Hence, the stronger dependence between higher PDs, captured by the asymmetric GHC,
is significant again on every common level.
Please note that the application of the GHC in practice has several drawbacks.
The estimation procedure, the MCECM (multi-cycle, expectation, conditional estimation)
algorithm, is much more difficult to implement and more time consuming compared
to the estimation of a GC or a TC. Furthermore, the simulation of random numbers
is much more computationally intensive due to the quantile functions, which contain
the modified Bessel function of the third kind, requiring methods for numerical
integration.
Out of the Archimedean class, we estimated parameters for the Gumbel, Clayton,
Joe, and Frank copula but only the copulas of Gumbel and Joe provided a reasonable
fit to our data. Since our data represent default probabilities, the economic intuition
would be that the dependence increases for higher values, i.e., in times of recession,
as can be seen from the empirical data (see Fig. 2). The Gumbel and Joe copulas
exhibit a positive upper tail dependence,10 while their lower tail dependence is zero.
Therefore, they are suitable to model this kind of asymmetric dependence. The Frank copula is
9 For the symmetric GHC, we force the skewness parameter γ ∈ R K to be zero for all components
(notation according to Luethi and Breymann [30]).
10 The coefficients of upper (lower) tail dependence are defined by
λ_U := lim_{u↑1} P( X_2 ≥ F_2^{−1}(u) | X_1 ≥ F_1^{−1}(u) ) and
λ_L := lim_{u↓0} P( X_2 ≤ F_2^{−1}(u) | X_1 ≤ F_1^{−1}(u) ), respectively.
Fig. 1 First level of R-vine (with parameters of Gumbel and Joe copulas) and Hierarchical
Archimedean copula (Gumbel) estimated from default data
tail independent, whereas the Clayton copula possesses only a lower tail dependence.
Applying goodness-of-fit tests (see Genest et al. [31]), we have to reject both copulas
(Frank and Clayton) on a significance level considerably below 1 %. In addition, we
also considered hierarchical Archimedean constructions. With the help of the “HAC”
package from Okhrin and Ristig [20], a stepwise ML estimation procedure was used
to estimate the tree of the Gumbel HAC, depicted in Fig. 1. The figure shows that the
dependence parameters range from 4.35 at the bottom of the tree, indicating the strongest
dependence, to 1.21 at the top. For the ordinary Gumbel copula, we estimate a parameter
value of 1.836, which lies within the range of the HAC parameters. Since the variate
selection on each level of the HAC tree is based on empirical values of Kendall’s τ,
the structures of the two HACs (Gumbel and Joe) coincide.
Finally, the copula parameters are estimated via ML. The corresponding steps
(I)–(III) are implemented in the R-package “VineCopula” (see Schepsmeier et al.
[21]), which has been used to determine a PCC for our data set. In order to allow
maximum flexibility, we decided to use an R-vine, which generalizes both C- and D-
vines. The candidate set for the pair copulas comprises the GC, TC, Gumbel, Clayton,
Frank, and Joe copulas.
Analogously to the HAC, the estimation algorithm of the PCC identifies sectors 3
and 8 as those with the strongest dependence. Therefore, these sectors are coupled
together on the first level of the R-vine, which means that their pairwise dependence
is explicitly selected to follow a Gumbel copula with θ̂ = 4.35. All bivariate copulas
on the first level except one are estimated to be Gumbel copulas with parameter
values in [1.56, 4.35], which is close to the HAC parameter range, see Fig. 1. Only
in case of sectors 5 and 9, a Joe copula with parameter 1.87 is preferred. Again,
the weakest dependence (measured by the implied value of Kendall’s τ) on the first
level is related to sector 5. On higher levels, all copulas from the candidate set are
selected to model conditional bivariate dependencies.
For the CBV model, the likelihood function is rather complex and an ML estimation is
numerically not feasible. Hence, the parameters of the CBV factor copula are chosen
such that the Euclidean distance between the empirical and the theoretical covariance
matrix is minimal (see, e.g., Fischer and Dietz [23]).
As an example, Fig. 2 illustrates the contour plot of the estimated copula density
between sectors 3 (cycl. consumer goods) and 8 (industry) for different competitors
as well as the (transformed) empirical observations. Notice that darker areas
indicate a higher concentration of the probability mass. In the first row, the elliptical
and GHCs are displayed. Looking at the center of the unit squares, one observes that,
in case of the TC and the asymmetric GHC, more probability mass is concentrated
around the main diagonal than for the GC or the symmetric GHC. Since the asymmetric
GHC provides a significantly better fit compared to the TC, the issue of asymmetri-
cally distributed data seems to be more important than the absence of a positive tail
dependence, at least for our data. This might be caused by the limited sample size
of only 120 observations. Although the asymmetric GHC has a significantly better
fit compared to the symmetric one and the skewness parameters are strictly positive,
its density still looks very symmetric.
In contrast, the copula of the CBV model11 is extremely concentrated around
the main diagonal. Here, observations aside from the diagonal have a very low
11 In case of the CBV copula, the density is estimated via a two dimensional kernel density estimator.
Fig. 2 Contour of the estimated copulas between sector 3 (cycl. consumer goods) and 8 (industry)
together with empirical observations
probability. Please note again that the estimation procedure for this copula is different,
which might explain this issue to some extent. For the ordinary Gumbel and
Joe copulas, one has to choose one single parameter for all bivariate (and higher
dimensional) dependencies. Therefore, the estimation is always a trade-off between
stronger and weaker dependencies. This leads to the effect that, in our example, the
dependence in both cases seems to be rather underestimated by these copulas compared
to their competitors. The HAC overcomes this drawback by using different parameters,
which leads to a significantly better fit.
4.4 Effect of the Copula on the Risk Figures and the Tail
of the Loss Distribution
Finally, we analyze the impact of the sector copula on the right tail and therefore
on the economic capital. Since, in practice, the underlying data sets used for para-
metrizations of both model types are rather different and not comparable, we do
not draw any comparisons between the absolute values of the risk figures across the
two models. Instead, we measure the impact with the help of factors, where the risk
figures of the models with the GC are normalized to one. In case of the CreditRisk+ -
CBV model, the marginal distributions of the sectors, which follow a weighted sum
of Gamma distributions (see Eq. (1)), are replaced by Gamma distributed variates
with the same mean and variance, for reasons of simplicity. Since this is a monotone
transformation, the dependence structure is not affected. Please note that by drawing
the sector realizations12 for the CreditMetrics model, we use the survival copula,13
because in this case higher values of the sector variates correspond to an increase
rather than a decrease of the obligors' creditworthiness.
Table 3 summarizes all risk figures. The copulas are ordered according to the
impact on the economic capital on a 99.9 % level in case of the CreditRisk+ model.
First of all, one observes that in the CreditRisk+ framework, the risk decreases
if we switch from the original model (CBV) to another one. In both models, the
GC implies the lowest risk, followed by the sym. GHC. Although both copulas
are elliptically symmetric and tail independent, the risk figures differ by up to 4 %.
Applying a TC, the risk rises in both models because of the positive tail dependence of
λ̂U = 0.69. For the CreditMetrics model the markup is around 6 %. The highest risk
arises if we use an asymmetric dependence structure, i.e., a (hierarchical) Gumbel
or Joe copula, an asym. GHC, a PCC or, in case of CreditRisk+ , the factor copula
induced by the CBV model. Therefore, at least for our data set and portfolio, there
is an indication that the risk arising from an asymmetric dependence structure, i.e.,
where dependencies are higher during times of a recession, is higher compared to
the risk caused by a positive tail dependence. In the CreditRisk+ model even the
economic capital in case of the HAC (Joe) copula is around 8.1 % above the amount
of the model with a GC and 2 % below the basic model. In both models, the risk
implied by a Joe copula is higher compared to a Gumbel copula. Since both copulas
exhibit no positive lower tail dependence, whereas the upper tail dependence14 is
higher in case of the Joe copula, this observation is plausible.
As to be expected, all portfolio loss distributions exhibit a significant amount of
skewness (skew) and kurtosis (kurt), measured by the third and fourth standardized
moments, respectively. In addition, we calculated the right-quantile weight (RQW)
for β = 0.875 which was recommended by Brys et al. [34] as a robust measure of
tail weight and is defined as follows:
$$ \mathrm{RQW}(\beta) := \frac{F_L^{-1}\!\left(\frac{1+\beta}{2}\right) + F_L^{-1}\!\left(1-\frac{\beta}{2}\right) - 2\,F_L^{-1}(0.75)}{F_L^{-1}\!\left(\frac{1+\beta}{2}\right) - F_L^{-1}\!\left(1-\frac{\beta}{2}\right)}, $$
where, in our case, FL−1 denotes the quantile function of the portfolio loss distribution.
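For concreteness, the right-quantile weight can be computed directly from a simulated loss sample via empirical quantiles (the lognormal losses below are purely illustrative):

```python
import numpy as np

def rqw(losses, beta=0.875):
    # right-quantile weight of Brys et al. [34] from empirical quantiles of the loss sample
    q = lambda p: np.quantile(losses, p)
    num = q((1 + beta) / 2) + q(1 - beta / 2) - 2 * q(0.75)
    den = q((1 + beta) / 2) - q(1 - beta / 2)
    return num / den

losses = np.random.default_rng(1).lognormal(mean=0.0, sigma=1.0, size=100_000)
print(rqw(losses))
```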
First of all, it becomes obvious that the rank order observed for EC99.9 with respect
to the copula model is highly correlated to the rank order of the higher moments and
of the tail weight. Secondly, all of the latter statistics derived from the CreditMet-
rics framework are (significantly) higher than those derived from the CreditRisk+
framework.
12 For details on the simulation of copulas in general, please refer to Mai and Scherer [33].
13 For a random vector u = (u 1 , . . . , u K )T with copula C, the survival copula is defined as the
copula of the vector (1 − u 1 , . . . , 1 − u K )T .
14 The coefficients of upper tail dependence implied by the estimated parameters are 0.54 in case
Table 3 Risk figures (SD: standard deviation; ECα, ESα: economic capital and expected shortfall at level α ∈ [0, 1], normalized to the GC) and tail measures (skewness, kurtosis, and right-quantile weight in absolute values) for different copulas in the CreditRisk+ model (left part) and the CreditMetrics model (right part)

Copula | CreditRisk+: SD EC90 EC99.9 ES90 ES99.9 skew kurt RQW | CreditMetrics: SD EC90 EC99.9 ES90 ES99.9 skew kurt RQW
CBV    | 1.060 1.058 1.101 1.047 1.074 1.53 6.49 0.39 | – – – – – – – –
GC     | 1.000 1.000 1.000 1.000 1.000 1.40 6.01 0.37 | 1.000 1.000 1.000 1.000 1.000 1.64 7.71 0.39
sGHC   | 1.000 0.991 1.027 1.000 1.024 1.45 6.32 0.38 | 1.003 0.988 1.042 1.002 1.038 1.75 8.67 0.39
TC     | 1.005 0.998 1.036 1.005 1.031 1.44 6.27 0.38 | 1.009 0.995 1.060 1.005 1.055 1.73 8.48 0.40
HAC(G) | 1.001 0.988 1.053 1.008 1.039 1.50 6.57 0.39 | 1.006 0.978 1.095 1.009 1.073 1.83 9.18 0.41
GHC    | 1.010 0.997 1.057 1.011 1.047 1.56 6.86 0.38 | 1.007 0.992 1.090 1.009 1.068 1.87 9.40 0.41
Gumbel | 0.995 0.978 1.059 1.004 1.048 1.53 6.67 0.39 | 1.018 0.979 1.096 1.015 1.082 1.85 9.27 0.41
PCC    | 1.023 1.013 1.072 1.022 1.056 1.52 6.60 0.39 | 1.031 1.008 1.106 1.026 1.084 1.85 9.15 0.41
Joe    | 1.001 0.986 1.078 1.015 1.061 1.63 7.14 0.41 | 1.013 0.980 1.107 1.020 1.083 1.95 9.76 0.42
HAC(J) | 1.000 0.984 1.081 1.013 1.066 1.62 7.15 0.40 | 1.010 0.980 1.091 1.017 1.074 1.92 9.55 0.42
indep. | 0.795 0.782 0.757 0.858 0.820 1.19 5.52 0.36 | 0.786 0.790 0.701 0.846 0.782 1.28 6.49 0.35
Finally, Fig. 3 exhibits the estimated densities of the portfolio loss for both models
and different copulas. On the horizontal axis, the percentiles of the loss distribution of
the particular standard models are displayed. The ordering of the densities confirms
our results, derived from the corresponding risk figures.
5 Summary
Credit portfolio models are commonly used to estimate the future loss distribution
of credit portfolios in order to derive the amount of economic capital which has to
be allocated to cover unexpected losses. Therefore, capturing the (unknown) depen-
dence between the counterparties of the portfolios or between the economic sectors
to which counterparties have been assigned is a crucial issue. For this purpose, copula
functions provide a flexible toolbox to specify different dependence structures.
Against this background, we analyzed the effect of different parametric copulas on
the tail of the loss distribution and the risk figures for a hypothetical portfolio and for
both CreditMetrics and CreditRisk+ , two of the most popular credit portfolio mod-
els in the financial industry. Our results indicate that the specific CreditRisk+ −CBV
model uses a rather conservative copula. However, referring to Jakob and Fischer
[8], one might come across certain artifacts for this (implicit) copula family. In the
CreditMetrics setting, the canonical assumption of a Gaussian copula allows an easy
and fast implementation, but it also gives rise to certain drawbacks, such as the absence
of tail dependence (“extreme events occur together”) and the inability to model asym-
metric dependence structures, for which we found evidence in the underlying data
set. Replacing the Gaussian copula by alternative competitors (Student-t, General-
ized hyperbolic, PCC or generalized Archimedean copulas), we could significantly
improve the goodness-of-fit to the underlying PD series. As a consequence, using the
Acknowledgments We would like to thank Matthias Scherer and an anonymous referee for their
helpful comments on early versions of this article. Especially the recommendation to consider a Joe
copula, which proved to be a good alternative (within the class of AC) to the Gumbel copula, was
very valuable.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. BaFin. Erläuterung zu den MaRisk in der Fassung vom 14.12.2012, Dec 2012
2. Board of Governors of the Federal Reserve System: Supervisory guidance on model risk man-
agement, 2011. URL http://www.federalreserve.gov/bankinforeg/srletters/sr1107.htm. Letter
11–7
3. Gupton, G.M., Finger, C.C., Bhatia, M.: CreditMetrics—technical document, 1997. URL http://
www.defaultrisk.com/_pdf6j4/creditmetrics_techdoc.pdf
4. Wilde, T.: CreditRisk+ a credit risk management framework, 1997. URL http://www.csfb.com/
institutional/research/assets/creditrisk.pdf
5. Koyluoglu, H.U., Hickman, A.: Reconcilable differences. RISK 11, 56–62 (1998)
6. Crouhy, M., Galai, D., Mark, R.: A comparative analysis of current credit risk models. J. Bank.
Financ. 24, 59–117 (2000)
7. Gordy, M.B.: A comparative anatomy of credit risk models. J. Bank. Financ. 24, 119–149
(2000)
8. Jakob, K., Fischer, M.: Quantifying the impact of different copulas in a generalized CreditRisk+
framework. Depend. Model. 2, 1–21 (2014)
9. Sklar, A.: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris
8, 229–231 (1959)
10. Joe, H.: Multivariate Models and Dependence Concepts. Chapman & Hall/CRC, London (1997)
11. Nelsen, R.B.: An Introduction to Copulas. Springer Science+Business Media Inc., New York
(2006)
12. Li, D.X.: On default correlation: a copula function approach. J. Fixed Income 9, 43–54 (2000)
13. Ebmeyer, D., Klaas, R., Quell, P.: The role of copulas in the CreditRisk+ framework. In:
Copulas. Risk Books (2006)
14. McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management. Princeton University
Press, Princeton (2005)
15. Barndorff-Nielsen, O.E.: Exponentially decreasing distributions for the logarithm of particle
size. Proc. Roy. Soc. Lond. Ser. A, Math. Phys. Sci. 353, 401–419 (1977)
16. Savu, C., Trede, M.: Hierarchical Archimedean copulas. Quant. Financ. 10, 295–304 (2010)
17. McNeil, A.J.: Sampling nested archimedean copulas. J. Stat. Comput. Simul. 78, 567–581
(2008)
18. Hofert, M., Scherer, M.: CDO pricing with nested archimedean copulas. Quant. Financ. 11(5),
775–787 (2011)
19. Aas, K., Czado, C., Frigessi, A., Bakken, H.: Pair-copula constructions of multiple dependence.
Insur. Math. Econ. 44, 182–198 (2009)
20. Okhrin, O., Ristig, A.: Hierarchical Archimedean copulae: the HAC package. Humboldt-
Universität zu Berlin, June 2012. URL http://cran.r-project.org/web/packages/HAC/index.html
21. Schepsmeier, U., Stoeber, J., Brechmann, E.C.: VineCopula: statistical inference of vine cop-
ulas, 2013. URL http://CRAN.R-project.org/package=VineCopula. R package version 1.1-1
22. Gundlach, M., Lehrbass, F.: CreditRisk+ in the Banking Industry. Springer, Berlin (2003)
23. Fischer, M., Dietz, C.: Modeling sector correlations with CreditRisk+ : the common background
vector model. J. Credit Risk 7, 23–43 (2011/12)
24. Moschopoulos, P.G.: The distribution of the sum of independent gamma random variables.
Ann. Inst. Stat. Math. 37, 541–544 (1985)
25. Oh, D.H., Patton, A.J.: Modelling dependence in high dimension with factor copulas. working
paper, 2013. URL http://public.econ.duke.edu/ap172/Oh_Patton_MV_factor_copula_6dec12.
pdf
26. Mai, J.F., Scherer, M.: H-extendible copulas. J. Multivar. Anal. 110, 151–160 (2012b). URL
http://www.sciencedirect.com/science/article/pii/S0047259X12000802
27. Altman, E., Resti, A., Sironi A.: The link between default and recovery rates: effects on the
procyclicality of regulatory capital ratios. BIS working paper, July 2002. URL http://www.bis.
org/publ/work113.htm
28. Merton, R.C.: On the pricing of corporate debt: the risk structure of interest rates. J. Financ.
29, 449–470 (1973)
29. Hofert, M., Kojadinovic, I., Mächler, M., Yan J.: Copula: multivariate dependence with copulas,
r package version 0.999-5 edition, 2012. URL http://CRAN.R-project.org/package=copula
30. Luethi, D., Breymann W.: ghyp: a package on the generalized hyperbolic distribution and its
special cases, 2011. URL http://CRAN.R-project.org/package=ghyp. R package version 1.5.5
31. Genest, C., Remillard, B., Beaudoin, D.: Goodness-of-fit tests for copulas: a review and a power
study. Insur. Math. Econ. 44, 199–213 (2009)
32. Brechmann, E.C., Schepsmeier, U.: Modeling dependence with C- and D-vine copulas: The R-
package CDVine. Technical report, Technische Universität München, 2012. URL http://www.
jstatsoft.org/v52/i03/paper
33. Mai, J.F., Scherer, M.: Simulating Copulas: Stochastic Models, Sampling Algorithms, and
Applications, vol. 4. World Scientific, Singapore (2012)
34. Brys, G., Hubert, M., Struyf, A.: Robust measures of tail weight. Comput. Stat. Data Anal.
50(3), 733–759 (2006)
Implied Recovery Rates—Auctions
and Models
1 Introduction
Corporate credit spreads contain the market’s perception about (at least) two sources
of risk: the time of default and the subsequent loss given default, respectively, the
recovery rate. Default probabilities and recovery rates are unknown parameters—
comparable to the volatility in the Black–Scholes model. We address the question of
whether it is possible to reverse-engineer observed credit spreads and disentangle them
into these ingredients. Such a reverse-engineering approach translates market values
into model parameters, comparable to the extraction of market implied volatilities
in the Black–Scholes framework. There is a growing literature in the field of implied
default probabilities, whereas scientific studies on implied recoveries are sparse.
Inferring implied default probabilities from market quotes of credit instruments often
relies on the assumption of a fixed recovery rate of, say, Φ = 40 %. Subsequently,
S. Höcht (B)
Assenagon GmbH, Prannerstrasse 8, 80333 Munich, Germany
e-mail: [email protected]
M. Kunze
Assenagon Asset Management S.A., Zweigniederlassung München, Prannerstrasse 8,
80333 Munich, Germany
e-mail: [email protected]
M. Scherer
Lehrstuhl für Finanzmathematik, Technische Universität München, Parkring 11,
85748 Garching-Hochbrück, Germany
e-mail: [email protected]
default probabilities are chosen such that model implied credit spreads match quoted
credit spreads. The assumption of fixing Φ = 40 % is close to the market-wide
empirical mean (compare Altman et al. [1]), but disregards recovery risk. In many
papers, the same recovery rate is assumed for all considered companies, although
empirical studies suggest that recoveries are time varying (compare Altman et al.
[2], Bruche and González-Aguado [3]), depend on the specific debt instrument, and
vary across industry sectors (compare Altman et al. [1]). Obviously, the resulting
implied default probability distribution strongly depends on the assumptions on the
recovery rate. Since default probabilities and recoveries both enter theoretical spread
formulas, we face a so-called identification problem. To make this more concrete, the
widely known approximation via the “credit triangle” (see, e.g., De Spiegeleer et al.
[4, pp. 256]) suggests

$$ s \approx \lambda\,(1 - \Phi), \qquad (1) $$

where Φ is the recovery rate and λ denotes the default intensity. Obviously, for any
given market spread s, the implied recovery is a function of (the assumption on) λ
and vice versa. Using this simplified spread formula alone, it is clearly impossible to
reverse-engineer Φ and λ simultaneously from s. As we will see, this identification
problem also appears in more sophisticated credit models.
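A small numerical illustration of this identification problem (the quoted spread is hypothetical): several combinations of default intensity and recovery are consistent with one and the same spread under the credit triangle.

```python
# Credit triangle s ≈ λ(1 − Φ): one quoted spread of 200 bp is consistent with
# many (λ, Φ) pairs, which is exactly the identification problem described above.
s = 0.02
for lam in (0.025, 0.04, 0.05, 0.10):            # hypothetical default intensities
    phi = 1 - s / lam                            # implied recovery for each intensity
    print(f"lambda = {lam:.3f}  ->  implied recovery = {phi:.0%}")
```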
We raise and (at least partially) answer the following questions:
• Is it possible to simultaneously extract implied recovery rates and implied default
probabilities (under the risk-neutral measure Q)?
• How do implied recoveries compare to realized recoveries?1
We address the first question using two types of credit models, where neither the
recovery rate nor the default probability distribution is fixed beforehand. As opposed
to most existing approaches for the calculation of implied recoveries, both procedures
only take into account prices from simultaneously traded assets. Instead of analyzing
the spread of one credit instrument for different points in time, implied recoveries
and default probabilities are extracted from the term structure of credit spreads.
Analogously to the aforementioned implied volatility calculation, this restriction allows
for an implied recovery calibration under the risk-neutral measure Q. Analyzing the
second question, both models are exemplarily calibrated to market data of Allied
Irish Banks (AIB), which experienced a credit event in June 2011. Subsequently, real
recovery rates were revealed and can thus be compared to their implied counterparts.
In order to clarify how real recoveries are settled in today’s credit markets, we start
by introducing the mechanism of credit auctions.
1 Here, the term realized recovery does not refer to workout recoveries but to a credit auction result.
The question whether the auction procedure appropriately anticipates workout recoveries is left for
future research.
CDS are the most common and liquidly traded single-name credit derivatives—their
liquidity usually even exceeds the one of the underlying bond market. In case of a
credit event, the protection buyer receives a default payment, which approximates
the percentage loss of a bond holder subject to this default2 (see Schönbucher [5,
preface]). This payment is referred to as loss given default (LGD). The corresponding
recovery is defined as one minus the LGD. Recoveries are often quoted as rates, e.g.,
referring to the fraction of par the protection buyer receives, after the CDS is settled.
There are mainly three types of credit events that can be distinguished:
• Bankruptcy A bankruptcy event occurs if the company in question faces insol-
vency or bankruptcy proceedings, is dissolved (other than merger, etc.), liquidated,
or wound up.
• Failure-to-pay This occurs if the company is unable to pay back outstanding
obligations in an amount at least as large as a prespecified payment requirement.
• Restructuring A restructuring event takes place if any clause in the company’s
outstanding debt is negatively altered or violated, such that it is legally binding
for all debt holders. Not all types of CDS provide protection against restructuring
events.
These credit events are standardized by the International Swaps and Derivatives
Association (ISDA). The legally binding answer to the question, whether or not a
specific credit event occurred, is given by the so-called Determinations Commit-
tees (DC).3 CDS ISDA standard contracts as well as the responsible DCs differ
among geopolitical regions. As opposed to standard European contracts, the stan-
dard North American contract does not provide protection against restructuring credit
events. The differences originate from regulatory requirements and the absence
of a Chapter 11 equivalent: in order to provide capital relief from a balance sheet
perspective, European contracts have to incorporate restructuring events. Our focus
will be on the case of nonrestructuring credit events in what follows.
Prior to 2005, CDS were settled physically, i.e., the protection buyer received the
contractually agreed notional in exchange for defaulted bonds with the same notional.
Accordingly, the corresponding CDS recovery rate was the ratio of the bond’s market
value to its par. This procedure exhibited several shortcomings (see Haworth [6, p. 24]
or Creditex and Markit [7]):
• For a protection buyer, it was necessary to own the defaulted asset. Often, this
entailed an unnatural inflation of bond prices after default and became a substantial
2 We will use “credit event” and “default” as synonyms. Note, however, that the terms default
and credit event are sometimes distinguished in the sense that default is associated with the final
liquidation procedure.
3 More information on DCs and ISDA can be found on www.isda.org.
4 Sometimes the phenomenon that some bonds were used several times for the settlement of CDS
is referred to as “recycling.”
5 Restructuring events differ, since they allow for maturity specific cheapest-to-deliver bonds.
participate in the auction. Moreover, the notional of the physical settlement request
is not allowed to exceed the notional of the outstanding position.
In the next step, the so-called inside market midpoint (IMM) is calculated according
to the following method (a short code sketch follows the three steps):
1. Crossing quotes are canceled, i.e., in case an offer quote is smaller than or equal to
another bid quote, the specific bid and offer are both eliminated.6
2. The so-called best halves of the remaining quotes are constructed. The best bid
half refers to the (rounded up) upper half of the remaining bid quotes. Accord-
ingly, the best offer half contains the same number of lowest non-canceled offer
quotes.
3. The IMM is defined as the average of all quotes in those best halves.
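A minimal sketch of these three steps in code, applied to the dealer quotes of the senior AIB auction reported in Table 1 below (the rounding to the nearest one-eighth follows the description in the text):

```python
import numpy as np

# Dealer quotes from the senior AIB auction (Table 1 below), in % of par.
bids   = [72.00, 71.00, 70.50, 70.50, 70.50, 70.50, 70.50,
          70.00, 70.00, 69.75, 69.50, 69.50, 69.00, 68.00]
offers = [70.50, 71.50, 72.00, 72.00, 72.00, 72.25, 72.50,
          73.00, 73.00, 73.00, 73.00, 73.00, 73.50, 74.50]

def inside_market_midpoint(bids, offers):
    # Step 1: cancel crossing pairs (highest remaining bid against lowest remaining offer).
    bids, offers = sorted(bids, reverse=True), sorted(offers)
    while bids and offers and bids[0] >= offers[0]:
        bids.pop(0)
        offers.pop(0)
    # Step 2: best halves (rounded up) of the remaining bid and offer quotes.
    half = int(np.ceil(len(bids) / 2))
    best_halves = bids[:half] + offers[:half]
    # Step 3: average the best halves and round to the nearest one-eighth of a point.
    return round(np.mean(best_halves) * 8) / 8

print(inside_market_midpoint(bids, offers))   # 71.375, the IMM reported below
```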
Any participant whose bid and offer prices both violate the IMM has to pay
an adjustment amount.7 This penalty is supposed to assure that the IMM reflects
the underlying bond market in an appropriate way.8 The initial bidding period is
concluded by calculating the net open interest, i.e., the netted notional of physical
settlement requests, which is simply carried out by aggregation. In case this amount
is zero, the IMM is fixed as the auction result and consequently as the recovery for
all CDS, which were supposed to be settled via the auction. Otherwise, the IMM
serves as a benchmark for the second part of the auction procedure.
To illustrate this first step, we consider the failure-to-pay event of AIB on June
21, 2011. Two auctions were held, one for senior and one for subordinated CDS
referring to AIB. We only consider the senior auction. Table 1 displays the submitted
two-way quotes from all 14 participants. For the calculation of the IMM, the reported
bid quotes are arranged in descending order, whereas the offers start from the lowest
quote.
The first quotes from Nomura (bid) and Citigroup (offer) are canceled out, since
the corresponding bid exceeds the offer. Note that this cancelation does not entail
a settlement, both quotes are merely neglected with regard to the IMM calculation.
Therefore, 13 bid and offer quotes remain and the best halves are the seven highest bid
and lowest offer quotes, which are emphasized in Table 1. The IMM is calculated via
averaging over these quotes and rounding to one eighth, yielding an IMM of 71.375.
The maximum bid-offer spread was 2.50 %-points and the quotation amount was
EUR 2 MM. In Table 2, the corresponding physical settlement requests are reported.
As the aggregated notional from bid quotes exceeds the aggregated notional from
offer quotes, the auction type is “to buy”. Since there is netted demand for the
cheapest-to-deliver senior bond, initial offers falling below the IMM are considered
6 Note that they are not settled, but only not taken into account for the calculation of the IMM.
7 The term “violating” refers to both quotes falling below the IMM (auction is “to buy”) or exceeding
it (auction is “to sell”). If, for instance, the open interest is “to sell” and a participant
submits a bid exceeding the IMM, he or she is considered off-market, since prices are supposed to go
down and not up. Then the corresponding participant has to pay the prefixed quotation amount times
the difference between the IMM and his or her bid. The penalty works vice versa for off-market
offers if the open interest is “to buy”.
Table 1 Dealer inside market quotes for the first stage of the auction of senior AIB CDS (see
Creditex and Markit [8]). Published with the kind permission of © Creditex Group Inc. and Markit
Group Limited 2013. All rights reserved
Dealer Bid Offer Dealer
Nomura Int. PLC 72.00 70.50 Citigroup Global Markets Ltd.
Goldman Sachs Int. 71.00 71.50 Société Générale
Bank of America N.A. 70.50 72.00 Credit Suisse Int.
Barclays Bank PLC 70.50 72.00 Deutsche Bank AG
BNP Paribas 70.50 72.00 JPMorgan Chase Bank N.A.
HSBC Bank PLC 70.50 72.25 Morgan Stanley &Co. Int. PLC
The Royal Bank of Scotland PLC 70.50 72.50 UBS AG
Deutsche Bank AG 70.00 73.00 Bank of America N.A.
UBS AG 70.00 73.00 Barclays Bank PLC
Morgan Stanley &Co. Int. PLC 69.75 73.00 BNP Paribas
Credit Suisse Int. 69.50 73.00 HSBC Bank PLC
JPMorgan Chase Bank N.A. 69.50 73.00 The Royal Bank of Scotland PLC
Société Générale 69.00 73.50 Goldman Sachs Int.
Citigroup Global Markets Ltd. 68.00 74.50 Nomura Int. PLC
Resulting IMM 71.375
All quotes are reported in %
Table 2 Physical settlement requests for the first stage of the auction of AIB (see Creditex and
Markit [8]). Published with the kind permission of © Creditex Group Inc. and Markit Group Limited
2013. All rights reserved
Dealer Type Size in EUR MM
BNP Paribas Offer 48.00
Credit Suisse Int. Offer 43.90
Morgan Stanley &Co. Int. PLC Offer 11.80
Barclays Bank PLC Bid 30.00
JPMorgan Chase Bank N.A. Bid 52.00
Nomura Int. PLC Bid 7.75
UBS AG Bid 16.00
Total (net) “To buy” 2.05
This second step is designed as a one-sided Dutch auction, i.e., only quotes in the
opposite direction of the net open interest are allowed. In case the net open interest is
“to sell”, dealers are only allowed to submit bid limit orders and vice versa. For the
senior CDS auction of AIB, the net physical settlement request is “to buy” and thus
only offer limit orders are allowed. As opposed to the first stage of the auction, there
is no restriction with respect to the size of the submitted orders, regardless of the
initial settlement request. In order to prevent manipulations, particularly in case of a
low net open interest, the prefixed cap amount, which is usually half of the maximum
bid-offer spread, imposes a further restriction on the possible limit orders. In case the
auction is “to sell”, orders are bounded from above by the IMM plus the cap amount
and vice versa if the net open interest is “to buy”.
In addition to these new limit orders, the appropriate side from the initial two-way
quotes from the first stage of the auction are carried over to the second stage—as
long as the order does not violate the IMM. All quotes, which are carried over, are
determined to have the same size, i.e., the prespecified quotation amount, which was
already used to assess the adjustment amount.
Now, all submitted and carried over limit orders are filled, until the net open
interest is matched. In case the auction is “to sell”, i.e., there is a surplus of bond
offerings, the bid limit orders are processed in descending order, starting from the
highest quote. Analogously, if the auction is “to buy”, offer quotes are filled, starting
from the lowest quote. The unique auction price corresponds to the last quote which
was at least partially filled. Furthermore, the result may not exceed 100 %.9
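A simplified sketch of this filling rule for an auction that is “to buy” (it ignores the pro-rata allocation among orders at the same price), using the offer limit orders later reported in Table 3:

```python
def second_stage_price(offer_orders, open_interest):
    """Fill offer limit orders (price, size) from the lowest quote upwards until the
    net open interest 'to buy' is matched; the auction price is the last quote that is
    at least partially filled, capped at 100 % of par."""
    remaining = open_interest
    for price, size in sorted(offer_orders):
        remaining -= size
        if remaining <= 0:
            return min(price, 100.0)
    return None   # net open interest could not be fully matched

# offers carried over to / submitted in the second stage of the senior AIB auction (EUR MM)
orders = [(70.125, 2.05), (70.125, 2.05), (70.25, 2.05), (70.25, 1.00), (70.375, 1.05)]
print(second_stage_price(orders, open_interest=2.05))   # 70.125, the final auction result
```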
Reconsider the credit event auction for outstanding senior AIB CDS. Both the carried-over
offer quotes (first) and the offers from the second stage (second) of the auction
are reported in Table 3.
Recalling that the net physical settlement request was EUR 2.05 MM, we observe
that the first two orders were partially filled. The associated limit orders were
70.125 %, which is consequently fixed as the final auction result, i.e., all outstanding
senior CDS for AIB were settled subject to a recovery rate of 70.125 %. Following
an auction, all protection buyers, who decided to settle their contracts physically
beforehand, are obliged to deliver one of the deliverable obligations in exchange for
par. Naturally, they are interested in choosing the cheapest among all possible deliv-
erables. Thus, in case of a default, protection buyers are long a cheapest-to-deliver
option (compare, e.g., Schönbucher [5, p. 36]), enhancing the position of a protection
buyer. Details about the value of that option can be found in Haworth [6, pp.30–32]
and Jankowitsch et al. [9].
9 For Northern Rock Asset Management, the European DC resolved that a restructuring credit event
occurred on December, 15, 2011. Two auctions took place on February, 2, 2012 and the first one
theoretically would have led to an auction result of 104.25 %. Consequently, the recovery was fixed
at 100 % (compare Creditex and Markit [8]).
Table 3 Limit orders for the senior auction of AIB (see Creditex and Markit [8]). Published with
the kind permission of © Creditex Group Inc. and Markit Group Limited 2013. All rights reserved
Dealer Type Quote (%) Size (EUR MM) Aggregated size (EUR MM)
JPMorgan Chase Bank N.A. Second 70.125 2.05 2.05
Barclays Bank PLC Second 70.125 2.05 4.10
Credit Suisse Int. Second 70.25 2.05 6.15
BNP Paribas Second 70.25 1.00 7.15
BNP Paribas Second 70.375 1.05 8.20
Citigroup Global Markets Ltd. First 71.375 2.00 10.20
...
Nomura Int. PLC Second 75 2.00 42.25
As explained above, the recovery of a CDS, Φτ ∈ [0, 1], refers to the result of an
auction which is held after a credit event at time τ and is designed to approximate
the relative “left-over” for a bond holder. Before a default event and the following
auction takes place this recovery is unknown. One way to assess this quantity for
nondefaulted securities is to reverse-engineer implied recoveries from market CDS
quotes. Any basic pricing approach for the “fair” spread sT of a CDS with maturity
T > 0 is of the form

$$ s_T = \mathbb{E}_{\mathbb{Q}}\!\left[ f(\tau, \Phi_\tau) \right]. \qquad (2) $$
I.e., the spread is the risk-neutral expectation of a function of the default time (or
default probability, respectively) and the recovery rate in case of default. Specifying τ
and Φτ , two models are revisited and calibrated by minimizing the root mean squared
error (RMSE) between EQ [ f (τ, Φτ )] and market spreads over a term structure of
CDS spreads.
This reduced-form model resembles the one presented in Jaskowski and McAleer
[11], although applied in a different context. All reduced-form models are based
on the same principle. The time of a credit event τ is the first jump of a stochastic
counting process Z = {Z t }t≥0 ∈ N0 , i.e., τ = inf {t ≥ 0 : Z t > 0}. In this case Z
will be a Cox-Process governed by a Cox–Ingersoll–Ross type intensity process λ,
i.e.,
$$ d\lambda_t = \kappa(\theta - \lambda_t)\,dt + \sigma\sqrt{\lambda_t}\,dW_t, \qquad \lambda_0 > 0. $$
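As an illustration of how default times arise from this specification, the following sketch simulates the CIR intensity with a full-truncation Euler scheme and draws the default time as the first jump of the Cox process (all parameter values are hypothetical):

```python
import numpy as np

def simulate_default_time(kappa, theta, sigma, lam0, T=5.0, dt=1/252, rng=None):
    """One default time from a Cox process with CIR intensity (full-truncation Euler);
    returns np.inf if no default occurs before the horizon T."""
    rng = rng or np.random.default_rng()
    threshold = rng.exponential(1.0)             # unit-exponential trigger level
    lam, integral, t = lam0, 0.0, 0.0
    while t < T:
        integral += lam * dt                     # integrated intensity
        if integral >= threshold:
            return t
        dw = rng.normal(0.0, np.sqrt(dt))
        lam = max(lam + kappa * (theta - lam) * dt + sigma * np.sqrt(max(lam, 0.0)) * dw, 0.0)
        t += dt
    return np.inf

rng = np.random.default_rng(0)
taus = [simulate_default_time(0.5, 0.05, 0.2, 0.03, rng=rng) for _ in range(2000)]
print(np.mean([t < 5.0 for t in taus]))          # Monte Carlo 5-year default probability
```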
Fig. 1 Weekly average spreads for AIB senior and subordinated CDS with 1 and 5 years maturity.
The spreads represent two whole term structures, which are used to calibrate the presented implied
recovery approaches in every displayed week independently
and vice versa. Comparable choices for modeling recoveries can be found, e.g., in
Madan et al. [12], Das and Hanouna [13], Höcht and Zagst [14], or Jaskowski and
McAleer [11]. Since the model will be calibrated to one CDS spread curve, one has
to be restrictive concerning the amount of free model parameters in the recovery
model. Using this model, the risk-neutral spread sT (κ, θ, σ, λ0 , a) has an integral-
free representation. The resulting risk-neutral parameters and subsequently the risk-
neutral implied recovery and probability of default are determined by minimizing
the RMSE:
$$ (\kappa^*, \theta^*, \sigma^*, \lambda_0^*, a^*) := \operatorname{argmin} \sqrt{\frac{1}{|I|}\sum_{T \in I}\left(s_T^M - s_T(\kappa, \theta, \sigma, \lambda_0, a)\right)^2}, \qquad (3) $$
where I is the set of maturities with observable market quotes for CDS spreads sTM .
In case senior as well as subordinated CDS are available for a certain defaultable
entity, two different recovery parameters asen and asub are used, while the intensity
parameters are the same for both seniorities. This reflects the fact that in case of a
credit event both CDS types are settled, although usually in different auctions.10 In
this case, the optimization in Eq. (3) is simply carried out by matching senior and
subordinated spreads simultaneously. For the calibration, we reconsider the exam-
ple of AIB. Figure 1 exemplarily shows weekly average quotes for AIB senior and
subordinated CDS spreads with maturities 1 and 5 years.
Approaching the time of default, a spread widening and inversion of both senior
and subordinated term structures can be observed. Calibrating the introduced Cox-
Ingersoll-Ross model to AIB CDS quotes for each week independently for several
maturities leads to the resulting implied recoveries and 5-year default probabilities
shown in Fig. 2.
10 In the current version of the upcoming ISDA supplement, subordinated CDS may also settle
without effecting senior CDS. However, so far either both or none settles.
Fig. 2 Weekly calibration results for the CIR model applied to CDS spreads of AIB before its
default in June 2011
Implied senior and subordinated recoveries and implied default probabilities vary
substantially over time. One reason is that term structure shapes and general spread
regimes also vary unusually strongly from week to week, since AIB was in distress.
Furthermore, there are co-movements of the 5-year implied default probability and
the implied recoveries. This is caused by the fact that a (recovery) and θ (long-term
default intensity) have a similar effect on long term CDS spreads. Assuming λt ≡ θ
for all t > t ∗ > 0, the fair long term spread can be approximated via
where c0 ∈ R is constant. Hence, using the above approximation for a given spread
sT , the optimal recovery parameter a ∗ can be seen as a function of the long term
default intensity, denoted as a ∗ (θ ). This entails the existence of a continuum of
parameter values (κ ∗ , θ, σ ∗ , λ∗0 , a ∗ (θ )), θ > 0, which all generate a comparable
long term spread and thus a similar RMSE. Consequently, a minor variation in the
quoted spreads might cause a substantial change in the resulting optimal parameters
and thus in the implied recovery and implied probability of default. This is referred
to as identification problem.
The following section contains a framework to circumvent this identification
problem.
Two CDS contracts with the same reference entity and maturity, but differently ranked
reference obligations, face the same default probabilities, but different recoveries.
The general idea of the “pure recovery model” goes back to Unal et al. [15] and
Schläfer and Uhrig-Homburg [16]. The approach makes use of this fact by con-
sidering the fraction of two differently ranked CDS spreads, which is then free of
default probabilities. Hence, spread ratios are considered and modeled and default
probabilities can be neglected. A comparable approach is outlined in Doshi [17]. Let
s sen and s sub denote the fair spreads of two CDS contracts referring to senior and
subordinated debt. The basic idea can be illustrated using the credit triangle formula
from Eq. (1), i.e.,

$$ \frac{s^{sen}}{s^{sub}} \approx \frac{\lambda\,(1-\Phi^{sen})}{\lambda\,(1-\Phi^{sub})} = \frac{1-\Phi^{sen}}{1-\Phi^{sub}}. $$
Under simplified assumptions the ratio of two different types of CDS spreads is a
function of the recoveries Φ sen and Φ sub . In case of the credit triangle formula, for
instance, the underlying assumptions include independence of λ and Φ. The crucial
point is to find a suitable and sophisticated model, such that this fraction again only
contains recovery information. Implied recoveries are then extracted by calibrating
fractions of senior and subordinated spreads. We propose a model that allows for
time variation in Φ but no dependence on the default time τ .
In a first step, a company-wide recovery rate X T is defined, i.e., a recovery for
the whole company in case of a default until T, where Tmax is the maximum of all
instruments’ maturities which should be captured by the model. Suppose μ0 ∈ (0, 1),
μ1 ∈ (−1, 1), and μ0 + μ1 ∈ (0, 1). Furthermore, let v ∈ (0, 1). For a certain
maturity Tmax > T > 0, X T is assumed to be Beta-distributed with the following
expectation and variance:
$$ \mathbb{E}_{\mathbb{Q}}[X_T] = \mu(T) := \mu_0 + \mu_1\sqrt{T/T_{max}}, \qquad (6) $$
$$ \mathrm{Var}_{\mathbb{Q}}[X_T] = \sigma^2(T) := v\left[\mu(T) - \mu(T)^2\right]. \qquad (7) $$
The Beta distribution is a popular choice for modeling stochastic recovery rates,
since it allows for a U-shaped density on [0, 1], which is empirically confirmed for
recovery rates. The above parameter restrictions assure that a Beta distribution with
EQ [X T ] and VarQ [X T ] as above actually exists. The square-root specification allows
for a higher differentiation between maturity specific recoveries near T = 0, a
phenomenon which is also widely reflected in CDS market term structures. Overall,
this company-wide recovery distribution varies in time without depending on τ .
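The moment conditions in Eqs. (6) and (7) pin down the two Beta parameters by standard moment matching; a small sketch (the parameter values are purely illustrative, and the square-root maturity scaling follows Eq. (6) as given above):

```python
from scipy.stats import beta

def company_recovery_dist(T, Tmax, mu0, mu1, v):
    # Beta distribution for the firm-wide recovery X_T obtained by matching the mean
    # and variance of Eqs. (6)-(7); requires mu0, mu0 + mu1 in (0, 1) and v in (0, 1).
    m = mu0 + mu1 * (T / Tmax) ** 0.5            # Eq. (6)
    var = v * (m - m ** 2)                       # Eq. (7)
    s = m * (1 - m) / var - 1                    # equals 1/v - 1
    p, q = m * s, (1 - m) * s
    return beta(p, q)

dist = company_recovery_dist(T=3.0, Tmax=10.0, mu0=0.4, mu1=0.2, v=0.3)
print(dist.mean(), dist.var())                   # reproduces mu(T) and sigma^2(T)
```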
In a second step, the seniority specific recoveries ΦTsen and ΦTsub are defined as
functions of X T . In legal terms, such a relation is established via a pecking order,
defined by the Absolute Priority Rule (APR): In case of a default event, any class
of debt with a lower priority than another will only be repaid if all higher ranked
debt is repaid in full. Furthermore, all claimants of the same seniority will recover
simultaneously, i.e., they receive the same proportion of their par value. Let dsec ,
dsen , and dsub denote the proportions of secured, senior unsecured, and subordinated
unsecured debt, respectively, on the balance sheet of a company at default, such that
dsub + dsen + dsec = 1. Figure 3 illustrates the APR.
Fig. 3 Absolute priority rule: seniority specific recoveries depend on the stochastic firm-wide
recovery and the debt structure of the company
The parameters dsub , dsen , and dsec determine, which proportion of X T is assigned
to senior and subordinated debt holders if a default occurs. Motivated by the linkage
of bonds and CDS in the auction mechanism, ΦTsen and ΦTsub are also assumed to
be the appropriate CDS recoveries. Note, however, that in practice, APR violations
often occur and are widely examined (see, e.g., Betker [18] and Eberhart and Weiss
[19]). Using the APR rule, a general spread representation as in Eq. (2) as well as
independence of Φ and τ , the recoveries are deterministic functions of the company-
wide recovery X T and the fraction of senior to subordinated CDS spreads is given
by
$$ \frac{s_T^{sen}}{s_T^{sub}} = \frac{1 - \int_{d_{sec}}^{d_{sec}+d_{sen}} \frac{x - d_{sec}}{d_{sen}}\, f_{p_T,q_T}(x)\,dx - \int_{d_{sec}+d_{sen}}^{1} f_{p_T,q_T}(x)\,dx}{1 - \int_{d_{sec}+d_{sen}}^{1} \frac{x - (d_{sec}+d_{sen})}{d_{sub}}\, f_{p_T,q_T}(x)\,dx}, \qquad (8) $$
where f pT ,qT (x) denotes the density of a Beta( pT , qT )-distributed random variable.
The variables pT and qT are linked to the parameters μ0 , μ1 , and v via Eqs. (6)
and (7) and the first two moments of the Beta distribution. They are calibrated using
the above formula, whereas the balance sheet parameters dsec , dsen , and dsub are
directly taken from quarterly reports. Instead of calibrating a single-spread curve,
the calibration is carried out by matching theoretical fractions sTsen /sTsub (μ0 , μ1 , v)
in Eq. (8) for a set of several maturities to their market counterparts sTM,sen /sTM,sub ,
i.e.
$$ (\mu_0^*, \mu_1^*, v^*) := \operatorname{argmin} \sqrt{\frac{1}{|I|}\sum_{T \in I}\left(s_T^{M,sen}/s_T^{M,sub} - s_T^{sen}/s_T^{sub}(\mu_0, \mu_1, v)\right)^2}. $$
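The theoretical ratio on the right-hand side of Eq. (8) can be evaluated by simple numerical quadrature; a sketch for an illustrative (hypothetical) debt structure and Beta recovery distribution:

```python
from scipy.integrate import quad
from scipy.stats import beta

def spread_ratio(p, q, d_sec, d_sen, d_sub):
    # Senior/subordinated CDS spread ratio of Eq. (8) for a Beta(p, q) firm-wide recovery.
    f = beta(p, q).pdf
    a, b = d_sec, d_sec + d_sen
    numerator = (1.0
                 - quad(lambda x: (x - a) / d_sen * f(x), a, b)[0]
                 - quad(f, b, 1.0)[0])
    denominator = 1.0 - quad(lambda x: (x - b) / d_sub * f(x), b, 1.0)[0]
    return numerator / denominator

# hypothetical balance sheet proportions (d_sec + d_sen + d_sub = 1) and Beta parameters
print(spread_ratio(p=1.2, q=1.1, d_sec=0.3, d_sen=0.5, d_sub=0.2))
```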
Fig. 4 Weekly calibration results for the pure recovery model applied to CDS spreads of AIB
before its default in June 2011
seniorities, such as senior and subordinated CDS.11 Furthermore, the extracted risk-
neutral recoveries are more in line with the observed final auction results. Generally,
further instruments, e.g., loans or the recently more popular contingent convertibles
could be used in a similar way.
Acknowledgments The authors would like to thank Michael Hünseler, Rudi Zagst, and an anony-
mous referee for their valuable remarks on earlier versions of the manuscript.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Altman, E.I., Kishore, V.M.: Almost everything you wanted to know about recoveries
on defaulted bonds. Financ. Anal. J. 52(6), 57–64 (1996)
2. Altman, E.I., Brady, B., Resti, A., Sironi, A.: The link between default and recovery rates:
theory, empirical evidence and implications. J. Bus. 78(6), 2203–2227 (2005)
3. Bruche, M., González-Aguado, C.: Recovery rates, default probabilities and the credit cycle.
J. Bank. Financ. 34(4), 754–764 (2010)
4. De Spiegeleer, J., Van Hulle, C., Schoutens, W.: The Handbook of Hybrid Securities. Wiley
Finance Series. Wiley (2014)
5. Schönbucher, P.J.: Credit Derivatives Pricing Models. Wiley Finance Series. Wiley, Canada
(2003)
6. Haworth, H.: A guide to credit events and auctions (2011). http://www.credit-suisse.com/
researchandanalytics
7. Creditex and Markit. Credit event auction primer (2010). http://www.creditfixings.com
8. Creditex and Markit. Credit event fixings (2013). http://www.creditfixings.com
9. Jankowitsch, R., Pullirsch, R., Veža, T.: The delivery option in credit default swaps. J. Bank.
Financ. 32(7), 1269–1285 (2008)
10. Du, S., Zhu, H.: Are CDS auctions biased? Working Paper, Stanford University (2011)
11. Jaskowski, M., McAleer, M.: Estimating implied recovery rates from the term structure of CDS
spreads. Tinbergen Institute Discussion Papers 13–005/III, Tinbergen Institute (2012)
12. Madan, D.B., Bakshi, G.S., Zhang F.X.: Understanding the role of recovery in default risk
models: empirical comparisons and implied recovery rates. FDIC CFR Working Paper No. 06;
EFA 2004 Maastricht Meetings Paper No. 3584; FEDS Working Paper; AFA 2004 Meetings,
2006
13. Das, S., Hanouna, P.: Implied recovery. J. Econ. Dyn. Control 33(11), 1837–1857 (2009)
14. Höcht, S., Zagst, R.: Pricing credit derivatives under stochastic recovery in a hybrid model.
Appl. Stoch. Models Bus. Ind. 26(3), 254–276 (2010)
15. Unal, H., Madan, D., Güntay, L.: Pricing the risk of recovery in default with absolute priority
rule violation. J. Bank. Financ. 27(6), 1001–1218 (2003)
16. Schläfer T.S., Uhrig-Homburg, M.: Estimating market-implied recovery rates from credit
default swap premia. University of Karlsruhe Working Paper, 2009
17. Doshi, H.: The term structure of recovery rates. McGill University Working Paper (2011)
11 Note that there is an ongoing discussion regarding the admission of credit events, which are
only binding for subordinated CDS. Such a possibility is explicitly excluded in the proposed pure
recovery model.
18. Betker, B.L.: Management’s incentives, equity’s bargaining power and deviations from absolute
priority in Chapter 11 bankruptcies. J. Bus. 68(2), 161–183 (1995)
19. Eberhart, A.C., Weiss, L.A.: The importance of deviations from the absolute priority rule in
Chapter 11 bankruptcy proceedings. Financ. Manag. 27(4), 106–110 (1998)
Upside and Downside Risk Exposures
of Currency Carry Trades via Tail
Dependence
Abstract Currency carry trade is the investment strategy that involves selling low
interest rate currencies in order to purchase higher interest rate currencies, thus
profiting from the interest rate differentials. This is a well-known financial puzzle,
since, if exposure to foreign exchange risk is uninhibited and the markets consist of
rational risk-neutral investors, one would not expect profits from such
strategies. That is, according to uncovered interest rate parity (UIP), changes in
the related exchange rates should offset the potential to profit from such interest
rate differentials. However, it has been shown empirically, that investors can earn
profits on average by borrowing in a country with a lower interest rate, exchanging
for foreign currency, and investing in a foreign country with a higher interest rate,
whilst allowing for any losses from exchanging back to their domestic currency at
maturity.
This paper explores the financial risk that trading strategies seeking to exploit
a violation of the UIP condition are exposed to with respect to multivariate tail
dependence present in both the funding and investment currency baskets. It will
outline in what contexts these portfolio risk exposures will benefit accumulated
portfolio returns and under what conditions such tail exposures will reduce portfolio
returns.
M. Ames (B)
Department of Statistical Science, University College London, London, UK
e-mail: [email protected]
G.W. Peters
Department of Statistical Science, University College London, London, UK
G.W. Peters
Commonwealth Scientific and Industrial Research Organisation, Canberra, Australia
G.W. Peters
Oxford-Man Institute, Oxford University, Oxford, UK
G. Bagnarosa
Department of Computer Science, ESC Rennes School of Business, University College London,
London, UK
I. Kosmidis
Department of Statistical Science, University College London, London, UK
One of the most robust puzzles in finance still to be satisfactorily explained is the
uncovered interest rate parity puzzle and the associated excess average returns of
currency carry trade strategies. Such trading strategies are popular approaches which
involve constructing portfolios by selling low interest rate currencies in order to buy
higher interest rate currencies, thus profiting from the interest rate differentials. The
presence of such profit opportunities, pointed out by [2, 10, 15] and more recently
by [5–7, 20, 21, 23], violates the fundamental relationship of uncovered interest rate
parity (UIP). The UIP refers to the parity condition in which exposure to foreign
exchange risk, with unanticipated changes in exchange rates, is uninhibited and
therefore if one assumes rational risk-neutral investors, then changes in the exchange
rates should offset the potential to profit from the interest rate differentials between
high interest rate (investment) currencies and low interest rate (funding) currencies.
We can more formally write this relation by assuming that the forward price, FtT , is
a martingale under the risk-neutral probability Q ([24]):
$$ \mathbb{E}^{\mathbb{Q}}\!\left[\left.\frac{S_T}{S_t}\,\right|\,\mathcal{F}_t\right] = \frac{F_t^T}{S_t} = e^{(r_t - r_t^f)(T-t)}. \qquad (1) $$
The UIP Eq. (1) thus states that under the risk-neutral probability, the expected vari-
ation of the exchange rate St should equal the differential between the interest rate
of the two associated countries, denoted by $r_t$ and $r_t^f$, respectively. The currency
carry trade strategy investigated in this paper aims at exploiting violations of the UIP
relation by investing a certain amount in a basket of high interest rate currencies (the
long basket), while funding it through a basket of low interest rate currencies (the
short basket). When the UIP holds, then given foreign exchange market equilibrium,
no profit should arise on average from this strategy, however, such opportunities are
routinely observed and exploited by large volume trading strategies.
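A quick numerical check of the forward/interest-rate relation behind Eq. (1) (the spot quote and both rates are hypothetical, with USD taken as the base currency):

```python
import numpy as np

S_t, r, r_f, tau = 1.30, 0.01, 0.04, 1 / 12      # spot (USD per foreign unit), domestic and foreign rates, 1 month
F_t = S_t * np.exp((r - r_f) * tau)              # forward price consistent with Eq. (1)
print(F_t)                                       # approx 1.2968: the high-yield currency trades at a forward discount
```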
In this paper, we build on the existing literature by studying a stochastic feature
of the joint tail behaviours of the currencies within each of the long and the short
baskets, which form the carry trade. We aim to explore to what extent one can attribute
the excess average returns with regard to compensation for exposure to tail risk, for
example either dramatic depreciations in the value of the high interest rate currencies
or dramatic appreciations in the value of the low interest rate currencies in times of
high market volatility.
We postulate that such analyses should also benefit from consideration not only
of the marginal behaviours of the processes under study, in this case the exchange
rates of currencies in a portfolio, but also a rigorous analysis of the joint dependence
In order to fully understand the tail risks of joint exchange rate movements present
when one invests in a carry trade strategy, we can look at both the downside extremal
tail exposure and the upside extremal tail exposure within the funding and investment
baskets that comprise the carry portfolio. The downside tail exposure can be seen
as the crash risk of the basket, i.e. the risk that one will suffer large joint losses
from each of the currencies in the basket. These losses would be the result of joint
appreciations of the currencies that one is short in the low interest rate basket and/or
joint depreciations of the currencies that one is long in the high interest rate basket.
Definition 1 (Downside Tail Risk Exposure in Carry Trade Portfolios) Consider the
investment currency (long) basket with d exchange rates relative to the base currency, on
day t, with currency log returns $(X_t^{(1)}, X_t^{(2)}, \ldots, X_t^{(d)})$. Then, the downside tail expo-
sure risk for the carry trade will be defined as the conditional probability of adverse
currency movements in the long basket, corresponding to its upper tail dependence
(a loss for a long position results from a forward exchange rate increase), given by,
$$ \lambda_U^{(i)}(u) := \Pr\!\left(X_t^{(i)} > F_i^{-1}(u)\,\middle|\,X_t^{(1)} > F_1^{-1}(u),\ldots,X_t^{(i-1)} > F_{i-1}^{-1}(u),\,X_t^{(i+1)} > F_{i+1}^{-1}(u),\ldots,X_t^{(d)} > F_d^{-1}(u)\right), \qquad (2) $$
$$ \lambda_L^{(i)}(u) := \Pr\!\left(X_t^{(i)} < F_i^{-1}(u)\,\middle|\,X_t^{(1)} < F_1^{-1}(u),\ldots,X_t^{(i-1)} < F_{i-1}^{-1}(u),\,X_t^{(i+1)} < F_{i+1}^{-1}(u),\ldots,X_t^{(d)} < F_d^{-1}(u)\right). \qquad (3) $$
We can formalise the notion of the dependence behaviour in the extremes of the
multivariate distribution through the concept of tail dependence, limiting behaviour
of Eqs. (2) and (3), as u ↑ 1 and u ↓ 0 asymptotically. The interpretation of such
quantities is then directly relevant to assessing the chance of large adverse move-
ments in multiple currencies which could potentially increase the risk associated
with currency carry trade strategies significantly, compared to risk measures which
only consider the marginal behaviour in each individual currency. Under certain sta-
tistical dependence models, these extreme upside and downside tail exposures can
be obtained analytically. We develop a flexible copula mixture example that has such
properties below.
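Before turning to the parametric mixtures, the conditional exceedance probability in Eq. (2) can also be estimated directly from pseudo-observations with uniform margins; a model-free sketch on simulated data:

```python
import numpy as np

def empirical_upper_exposure(U, i, u=0.8):
    # Empirical counterpart of Eq. (2): probability that currency i exceeds the level u
    # given that all other currencies in the basket exceed u (U holds uniform pseudo-data).
    others = np.delete(np.arange(U.shape[1]), i)
    cond = np.all(U[:, others] > u, axis=1)
    return np.nan if cond.sum() == 0 else np.mean(U[cond, i] > u)

U = np.random.default_rng(2).uniform(size=(5000, 4))   # illustrative 4-currency basket, independent case
print(empirical_upper_exposure(U, i=0, u=0.8))
```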
In order to study the joint tail dependence in the investment or funding basket,
we consider an overall tail dependence analysis which is parametric model based,
obtained by using flexible mixtures of Archimedean copula components. Such a
model approach is reasonable since typically the number of currencies in each of
the long basket (investment currencies) and the short basket (funding currencies)
is 4 or 5.
In addition, these models have the advantage that they produce asymmetric depen-
dence relationships in the upper tails and the lower tails in the multivariate model.
We consider three models; two Archimedean mixture models and one outer power
transformed Clayton copula. The mixture models considered are the Clayton-Gumbel
mixture and the Clayton-Frank-Gumbel mixture, where the Frank component allows
for periods of no tail dependence within the basket as well as negative dependence.
We fit these copula models to each of the long and short baskets separately.
$$ C_{mix}(u_1, \ldots, u_d) = \sum_{i=1}^{N} \lambda_i\, C_i(u_1, \ldots, u_d; \theta_i), $$
where $0 \le \lambda_i \le 1$ for all $i \in \{1, \ldots, N\}$ and $\sum_{i=1}^{N} \lambda_i = 1$.
In the following section, we consider two stages to estimate the multivariate basket
returns, first the estimation of suitable heavy tailed marginal models for the currency
exchange rates (relative to USD), followed by the estimation of the dependence
structure of the multivariate model composed of multiple exchange rates in currency
baskets for long and short positions.
Once the parametric Archimedean mixture copula model has been fitted to a basket
of currencies, it is possible to obtain the upper and lower tail dependence coefficients,
via closed form expressions for the class of mixture copula models and outer power
transform models we consider. The tail dependence expressions for many common
bivariate copulae can be found in [25]. This concept was recently extended to the
multivariate setting by [9].
Definition 4 (Generalised Archimedean Tail Dependence Coefficient) Let X =
$(X_1, \ldots, X_d)^T$ be a d-dimensional random vector with distribution $C(F_1(X_1),$
$\ldots, F_d(X_d))$, where C is an Archimedean copula and $F_1, \ldots, F_d$ are the marginal
distributions. The coefficients of upper and lower tail dependence are defined respec-
tively as:

$$ \lambda_U^{1,\ldots,h|h+1,\ldots,d} = \lim_{u \to 1^-} P\!\left(X_1 > F_1^{-1}(u), \ldots, X_h > F_h^{-1}(u) \,\middle|\, X_{h+1} > F_{h+1}^{-1}(u), \ldots, X_d > F_d^{-1}(u)\right) $$
$$ = \lim_{t \to 0^+} \frac{\sum_{i=1}^{d} \binom{d}{i}\, i\, (-1)^{d-i}\, \psi'(it)}{\sum_{i=1}^{d-h} \binom{d-h}{i}\, i\, (-1)^{d-i}\, \psi'(it)}, \qquad (6) $$

$$ \lambda_L^{1,\ldots,h|h+1,\ldots,d} = \lim_{u \to 0^+} P\!\left(X_1 < F_1^{-1}(u), \ldots, X_h < F_h^{-1}(u) \,\middle|\, X_{h+1} < F_{h+1}^{-1}(u), \ldots, X_d < F_d^{-1}(u)\right) $$
$$ = \lim_{t \to \infty} \frac{d\, \psi'(dt)}{(d-h)\, \psi'((d-h)t)} \qquad (7) $$
for the model dependence function ‘generator’ ψ(·) and its inverse function.
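For intuition, Eq. (7) can be checked numerically for a single Archimedean family; the sketch below uses the Clayton generator in its Laplace-transform form, $\psi(t) = (1+t)^{-1/\theta}$, and evaluates the limit at a large value of t (in the bivariate case d = 2, h = 1 this reproduces the familiar $2^{-1/\theta}$):

```python
def psi_prime_clayton(t, theta):
    # derivative of the Clayton generator psi(t) = (1 + t)^(-1/theta) (Laplace-transform form)
    return -(1.0 / theta) * (1.0 + t) ** (-1.0 / theta - 1.0)

def lower_tail_dependence(d, h, theta, t=1e8):
    # numerical evaluation of the limit in Eq. (7) at a large t
    return d * psi_prime_clayton(d * t, theta) / ((d - h) * psi_prime_clayton((d - h) * t, theta))

theta = 2.0
print(lower_tail_dependence(d=2, h=1, theta=theta))   # approx 2**(-1/theta) = 0.7071
print(lower_tail_dependence(d=4, h=2, theta=theta))   # joint coefficient for a 4-currency basket
```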
In [9], the analogous form of the generalised multivariate upper and lower tail
dependence coefficients for outer power transformed Clayton copula models is pro-
vided. The derivation of Eqs. (6) and (7) for the outer power case follows from [12],
i.e. the composition of a completely monotone function with a non-negative func-
tion that has a completely monotone derivative is again completely monotone. The
densities for the outer power Clayton copula can be found in [1].
In the above definitions of model-based parametric upper and lower tail depen-
dence, one gets the estimates of joint extreme deviations in the whole currency basket.
It will often be useful in practice to understand which pairs of currencies within a
given currency basket contribute significantly to the downside or upside risks of the
overall currency basket. In the class of Archimedean-based mixtures we consider,
the feature of exchangeability precludes decompositions of the total basket down-
side and upside risks into individual currency-specific components. To be precise,
when we aim to perform a decomposition of, say, the downside risk of the funding basket
into contributions from each pair of currencies in the basket, we will do this via a
simple linear projection onto particular subsets of currencies in the portfolio that are
$$ \mathbb{E}\!\left[\hat{\lambda}_U^{\,i|1,2,\ldots,i-1,i+1,\ldots,d} \,\middle|\, \hat{\lambda}_U^{\,2|1}, \hat{\lambda}_U^{\,3|1}, \hat{\lambda}_U^{\,3|2}, \ldots, \hat{\lambda}_U^{\,d|d-1}\right] = \alpha_0 + \sum_{i \neq j}^{d} \alpha_{ij}\, \hat{\lambda}_U^{\,i|j}, \qquad (8) $$

where $\hat{\lambda}_U^{\,i|1,2,\ldots,i-1,i+1,\ldots,d}$ is a random variable since it is based on parameters of
the mixture copula model which are themselves functions of the data and therefore
random variables. Such a simple linear projection will then allow one to interpret
directly the marginal linear contributions to the upside or downside risk exposure
of the basket obtained from the model, according to particular pairs of currencies in
the basket by considering the coefficients αi j , i.e. the projection weights. To perform
this analysis, we need estimates of the pairwise tail dependence in the upside and
downside risk exposures $\hat{\lambda}_U^{\,i|j}$ and $\hat{\lambda}_L^{\,i|j}$ for each pair of currencies $i, j \in \{1, 2, \ldots, d\}$.
We obtain this through non-parametric (model-free) estimators, see [8].
Definition 5 Non-Parametric Pairwise Estimator of Upper Tail Dependence
(Extreme Exposure)
$$ \hat{\lambda}_U = 2 - \min\!\left(2,\; \frac{\log \hat{C}_n\!\left(\frac{n-k}{n}, \frac{n-k}{n}\right)}{\log\!\left(\frac{n-k}{n}\right)}\right), \quad k = 1, 2, \ldots, n-1, \qquad (9) $$

where $\hat{C}_n(u_1, u_2) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\!\left(\frac{R_{1i}}{n} \le u_1, \frac{R_{2i}}{n} \le u_2\right)$ and $R_{ji}$ is the rank of the variable
in its marginal dimension that makes up the pseudo data.
In order to form a robust estimator of the upper tail dependence, a median of
the estimates obtained from setting k as the 1st, 2nd, . . . , 20th percentile values was
used. Similarly, k was set to the 80th, 81st, . . . , 99th percentiles for the lower tail
dependence.
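A direct implementation of the estimator in Eq. (9), together with the median-over-percentiles robustification described above (the Student-t sample is purely illustrative):

```python
import numpy as np
from scipy.stats import rankdata

def upper_tail_dep_hat(x, y, k):
    # non-parametric estimator of Eq. (9) based on ranks (pseudo-data)
    n = len(x)
    U1, U2 = rankdata(x) / n, rankdata(y) / n
    u = (n - k) / n
    C = np.mean((U1 <= u) & (U2 <= u))           # empirical copula at (u, u)
    return 2 - min(2, np.log(C) / np.log(u))

def robust_upper_tail_dep(x, y):
    # median over k set to the 1st, ..., 20th percentile values, as described above
    n = len(x)
    ks = [max(1, int(np.floor(p / 100 * n))) for p in range(1, 21)]
    return np.median([upper_tail_dep_hat(x, y, k) for k in ks])

x, y = np.random.default_rng(3).standard_t(df=3, size=(2, 2000))
print(robust_upper_tail_dep(x, y))
```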
The inference function for margins (IFM) technique introduced in [17] provides a
computationally faster method for estimating parameters than Full Maximum Like-
lihood, i.e. simultaneously maximising all model parameters and produces in many
cases a more stable likelihood estimation procedure. This two-stage estimation pro-
cedure was studied with regard to the asymptotic relative efficiency compared with
maximum likelihood estimation in [16] and in [14]. It can be shown that the IFM
estimator is consistent under weak regularity conditions.
In modelling parametrically the marginal features of the log return forward
exchange rates, we wanted flexibility to capture a broad range of skew-kurtosis rela-
$$ \sigma_t^2 = \alpha_0 + \sum_{i=1}^{p} \alpha_i\, \varepsilon_{t-i}^2 + \sum_{i=1}^{q} \beta_i\, \sigma_{t-i}^2. \qquad (11) $$
The estimation for the three model parameters in the l.g.g.d. can be challenging due to
the fact that a wide range of model parameters, especially for k, can produce similar
resulting density shapes (see discussions in [19]). To overcome this complication
and to make the estimation efficient, it is proposed to utilise a combination of profile
likelihood methods over a grid of values for k and perform profile likelihood based
MLE estimation for each value of k, over the other two parameters b and u. The
differentiation of the profile likelihood for a given value of k produces the system of
two equations:
$$ \exp(\tilde{\mu}) = \left(\frac{1}{n}\sum_{i=1}^{n} \exp\!\left(\frac{y_i}{\tilde{\sigma}\sqrt{k}}\right)\right)^{\tilde{\sigma}\sqrt{k}}; \qquad \frac{\sum_{i=1}^{n} y_i \exp\!\left(\frac{y_i}{\tilde{\sigma}\sqrt{k}}\right)}{\sum_{i=1}^{n} \exp\!\left(\frac{y_i}{\tilde{\sigma}\sqrt{k}}\right)} - \bar{y} - \frac{\tilde{\sigma}}{\sqrt{k}} = 0, \qquad (12) $$

where n is the number of observations, $y_i = \log x_i$, $\tilde{\sigma} = b/\sqrt{k}$ and $\tilde{\mu} = u + b \log k$.
The second equation is solved directly via a simple root search to give an estimation
for σ̃ and then substitution into the first equation results in an estimate for μ̃. Note,
for each value of k we select in the grid, we get the pair of parameter estimates μ̃
and σ̃ , which can then be plugged back into the profile likelihood to make it purely
a function of k, with the estimator for k then selected as the one with the maximum
likelihood score. As a comparison, we also fit the GARCH(1,1) model using the
MATLAB MFEtoolbox using the default settings.
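A sketch of the inner step of this grid procedure, solving the two profile equations in Eq. (12) (as reconstructed above) for a fixed k via a root search; the Gamma sample is purely illustrative:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import logsumexp

def profile_sigma_mu(y, k):
    # For fixed shape k: root search in sigma_tilde (second equation of (12)),
    # then direct substitution for mu_tilde (first equation of (12)).
    n, ybar, sqk = len(y), y.mean(), np.sqrt(k)

    def g(sig):
        w = np.exp((y - y.max()) / (sig * sqk))          # shifted for numerical stability
        return (y * w).sum() / w.sum() - ybar - sig / sqk

    sig = brentq(g, 1e-3, 50.0)
    mu = sig * sqk * (logsumexp(y / (sig * sqk)) - np.log(n))
    return mu, sig

# illustrative data: y = log X with X ~ Gamma(shape=2, scale=1), for which the profiled
# estimates at k = 2 should be close to mu_tilde = log 2 and sigma_tilde = 1/sqrt(2)
y = np.log(np.random.default_rng(4).gamma(shape=2.0, scale=1.0, size=5000))
print(profile_sigma_mu(y, k=2.0))
```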
In order to fit the copula model, the parameters are estimated using maximum like-
lihood on the data after conditioning on the selected marginal distribution models
and their corresponding estimated parameters obtained in Stage 1. These models are
utilised to transform the data using the CDF function with the l.g.g.d. MLE parame-
ters (k̂, û and b̂) or using the conditional variances to obtain standardised residuals
for the GARCH model. Therefore, in this second stage of MLE estimation, we aim
to estimate either the mixture of one-parameter Clayton, Frank and Gumbel (CFG)
components with parameters $\theta = (\rho_{\text{clayton}}, \rho_{\text{frank}}, \rho_{\text{gumbel}}, \lambda_{\text{clayton}}, \lambda_{\text{frank}}, \lambda_{\text{gumbel}})$, the
mixture of one-parameter Clayton and Gumbel (CG) components with parameters
$\theta = (\rho_{\text{clayton}}, \rho_{\text{gumbel}}, \lambda_{\text{clayton}}, \lambda_{\text{gumbel}})$, or the two-parameter outer power transformed
Clayton (OpC) with parameters $\theta = (\rho_{\text{clayton}}, \beta_{\text{clayton}})$.
The log-likelihood for the mixture copula models is given generically by

$$l(\theta) = \sum_{i=1}^{n} \log c\big(F_1(X_{i1}; \hat{\mu}_1, \hat{\sigma}_1), \ldots, F_d(X_{id}; \hat{\mu}_d, \hat{\sigma}_d)\big) + \sum_{i=1}^{n}\sum_{j=1}^{d} \log f_j(X_{ij}; \hat{\mu}_j, \hat{\sigma}_j). \qquad (13)$$
This optimization is achieved via a gradient descent iterative algorithm which was
found to be quite robust given the likelihood surfaces considered in these models with
the real data. Alternative estimation procedures such as expectation-maximisation
were not found to be required.
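For illustration only, the sketch below shows the structure of this second-stage estimation in a simplified bivariate setting, replacing the full CFG mixture with a single Clayton component and the gradient-based optimiser with a bounded scalar search; the pseudo-data u, v are assumed to have been produced by the stage-1 marginal CDFs.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def clayton_log_density(u, v, theta):
    """Bivariate Clayton copula log-density for theta > 0."""
    return (np.log(1.0 + theta)
            - (theta + 1.0) * (np.log(u) + np.log(v))
            - (2.0 + 1.0 / theta) * np.log(u ** (-theta) + v ** (-theta) - 1.0))

def stage2_negloglik(theta, u, v):
    # Copula part of (13); the marginal term sum_j log f_j is fixed in stage 2.
    return -np.sum(clayton_log_density(u, v, theta))

def fit_stage2(u, v):
    """Maximise the copula log-likelihood on pseudo-data u, v in (0, 1)."""
    res = minimize_scalar(stage2_negloglik, bounds=(1e-3, 20.0),
                          method="bounded", args=(u, v))
    theta_hat = res.x
    lambda_lower = 2.0 ** (-1.0 / theta_hat)     # Clayton lower tail dependence
    return theta_hat, lambda_lower
```

In the full model, the mixture density (a weighted sum of the Clayton, Frank and Gumbel densities) replaces the single Clayton density, and the weights and dependence parameters are optimised jointly.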
In our study, we fit copula models to the high interest rate basket and the low interest
rate basket, updated for each trading day in the period 02/01/1989 to 29/01/2014,
using log return forward exchange rates at one-month maturities. The fits are
performed as a sliding window analysis on each trading day, using data covering
both the previous 6 months and the previous year.
Our empirical analysis consists of daily exchange rate data for a set of 34 currency
exchange rates relative to the USD, as in [23]. The currencies analysed included:
Australia (AUD), Brazil (BRL), Canada (CAD), Croatia (HRK), Cyprus (CYP),
Czech Republic (CZK), Egypt (EGP), Euro area (EUR), Greece (GRD), Hungary
(HUF), Iceland (ISK), India (INR), Indonesia (IDR), Israel (ILS), Japan (JPY),
Malaysia (MYR), Mexico (MXN), New Zealand (NZD), Norway (NOK), Philippines
(PHP), Poland (PLN), Russia (RUB), Singapore (SGD), Slovakia (SKK), Slove-
nia (SIT), South Africa (ZAR), South Korea (KRW), Sweden (SEK), Switzerland
(CHF), Taiwan (TWD), Thailand (THB), Turkey (TRY), Ukraine (UAH) and the
United Kingdom (GBP).
We have considered daily settlement prices for each currency exchange rate as
well as the daily settlement price for the associated 1 month forward contract. We
utilise the same dataset (albeit starting in 1989 rather than 1983 and running up
until January 2014) as studied in [20, 23] in order to replicate their portfolio returns
without tail dependence risk adjustments. Due to differing market closing days, e.g.
national holidays, there was missing data for a couple of currencies and for a small
number of days. For missing prices, the previous day’s closing prices were retained.
As was demonstrated in Eq. (1), the differential of interest rates between two
countries can be estimated through the ratio of the forward contract price and the
spot price, see [18] who show this holds empirically on a daily basis. Accordingly,
instead of considering the differential of risk-free rates between the reference and
the foreign countries, we build our respective baskets of currencies with respect to
the ratio of the forward and the spot prices for each currency. On a daily basis,
we compute this ratio for each of the d currencies (available in the dataset on that
day) and then build five baskets. The first basket gathers the d/5 currencies with the
highest positive interest rate differential with the US dollar. These are the
'investment' currencies, in which we invest money to benefit from the currency carry
trade. The last basket gathers the d/5 currencies with the most negative (or at least the
lowest) interest rate differential. These are the 'financing' currencies, in which
we borrow money to fund the currency carry trade.
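A sketch of this daily basket construction follows; it assumes that forward and spot settlement prices are held in pandas Series keyed by currency code and quoted as foreign currency units per USD, so that under covered interest parity a higher forward/spot ratio indicates a higher foreign interest rate (both conventions are our assumptions about the data layout).

```python
import numpy as np
import pandas as pd

def build_baskets(forward: pd.Series, spot: pd.Series, n_baskets: int = 5):
    """Rank the currencies available on a given day by their forward/spot ratio
    (a proxy for the interest rate differential with the USD) and return the
    top and bottom baskets of roughly d / n_baskets currencies each."""
    ratio = (forward / spot).dropna()
    ranked = ratio.sort_values(ascending=False)      # highest differential first
    size = int(np.ceil(len(ranked) / n_baskets))
    investment = ranked.index[:size].tolist()        # 'investment' currencies
    funding = ranked.index[-size:].tolist()          # 'financing' currencies
    return investment, funding
```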
Given this classification, we investigate then the joint distribution of each group
of currencies to understand the impact of the currency carry trade, embodied by the
differential of interest rates, on currencies returns. In our analysis, we concentrate on
the high interest rate basket (investment currencies) and the low interest rate basket
(funding currencies), since typically when implementing a carry trade strategy one
would go short the low interest rate basket and go long the high interest rate basket.
In order to model the marginal exchange rate log returns, we considered two
approaches. First, we fit Log Generalised Gamma models to each of the 34 cur-
rencies considered in the analysis, updating the fits for every trading day based on
a 6 month sliding window. A time series approach was also considered to fit the
marginals, as is popular in much of the recent copula literature, see for example [4],
using GARCH(1,1) models for the 6-month sliding data windows. In each case we
are assuming approximate local stationarity over these short 6 month time frames.
Table 1 Average AIC for the Generalised Gamma (GG) and the GARCH(1,1) for the four most
frequent currencies in the high interest rate and the low interest rate baskets, over the 2001–2014
data period split into two chunks (2001–07 and 2007–14)

                        2001–07                      2007–14
            Currency    GG            GARCH          GG            GARCH
Investment  TRY         356.9 (3.5)   341.1 (21.7)   358.7 (3.0)   349.1 (16.8)
            MXN         360.0 (1.2)   357.04 (3.8)   358.6 (4.0)   344.5 (28.1)
            ZAR         358.7 (3.0)   353.5 (11.4)   358.0 (6.1)   352.8 (12.2)
            BRL         359.0 (2.8)   341.6 (19.4)   360.0 (2.1)   341.6 (23.2)
Funding     JPY         361.2 (0.9)   356.5 (7.2)    356.9 (6.8)   355.0 (7.0)
            CHF         360.8 (1.4)   359.1 (2.9)    358.6 (7.4)   355.4 (8.8)
            SGD         360.0 (2.7)   356.8 (5.7)    360.0 (2.6)   353.7 (7.5)
            TWD         358.7 (6.2)   347.0 (16.4)   359.1 (5.8)   348.5 (13.2)

Standard deviations are shown in parentheses. Similar performance was seen between 1989 and
2001
A summary of the marginal model selection can be seen in Table 1, which shows
the average AIC scores for the four most frequent currencies in the high interest
rate and the low interest rate baskets over the data period. Whilst the AIC for the
GARCH(1,1) model is consistently lower than the respective AIC for the Generalised
Gamma, the standard errors are sufficiently large for there to be no clear favourite
between the two models.
However, when we consider the model selection of the copula in combination
with the marginal model, we observe lower AIC scores for copula models fitted
on the pseudo-data resulting from using Generalised Gamma margins than using
GARCH(1,1) margins. This is the case for all three copula models under consid-
eration in the paper. Figure 1 shows the AIC differences when using the Clayton-
Frank-Gumbel copula in combination with the two choices of marginal for the high
interest rate and the low interest rate basket, respectively. Over the entire data period,
the mean difference between the AIC scores for the CFG model with Generalised
Gamma versus GARCH(1,1) marginals for the high interest rate basket is 12.3 and
for the low interest rate basket is 3.6 in favour of the Generalised Gamma.
Thus, it is clear that the Generalised Gamma model is the best model in our copula
modelling context and so is used in the remainder of the analysis. We now consider
the goodness-of-fit of the three copula models applied to the high interest rate basket
and low interest rate basket pseudo data. We used a scoring via the AIC between the
three component mixture CFG model versus the two component mixture CG model
versus the two parameter OpC model. One could also use the Copula-Information-
Criterion (CIC), see [13] for details.
The results are presented for this comparison in Fig. 2, which shows the dif-
ferentials between AIC for CFG versus CG and CFG versus OpC for each of the
high interest rate and the low interest rate currency baskets. We can see it is not
unreasonable to consider the CFG model for this analysis, since over the entire data
period, the mean difference between the AIC scores for the CFG and the CG models
[Fig. 1 panels: (AIC of CFG with GenGamma margins) − (AIC of CFG with GARCH(1,1) margins) over time, for the high basket (upper) and the low basket (lower); negative values mean the GenGamma margins fit better.]
Fig. 1 Comparison of AIC for Clayton-Frank-Gumbel model fit on the pseudo-data resulting from
generalised gamma versus GARCH(1,1) margins. The high interest rate basket is shown in the
upper panel and the low interest rate basket is shown in the lower panel
for the high interest rate basket is 1.33 and for the low interest rate basket is 1.62 in
favour of the CFG.
However, from Fig. 2 we can see that during the 2008 credit crisis period the CFG
model performs much better than the CG model. Compared to the OpC model, the
CFG copula model provides a much better fit overall, as shown by mean differences
between the AIC scores of 9.58 for the high interest rate basket and 9.53 for the low
interest rate basket. Again, the CFG model performs markedly better than the OpC
model during the 2008 credit crisis period.
Below, we will examine the time-varying parameters of the maximum likelihood fits
of this mixture CFG copula model. Here, we shall focus on the strength of dependence
present in the currency baskets, given the particular copula structures in the mixture,
which is considered as tail upside/downside exposure of a carry trade over time.
Figure 3 shows the time-varying upper and lower tail dependence, i.e. the extreme
upside and downside risk exposures for the carry trade basket, present in the high
interest rate basket under the CFG copula fit and the OpC copula fit. Similarly, Fig. 4
shows this for the low interest rate basket.
Remark 2 (Model Risk and its Influence on Upside and Downside Risk Exposure) In
fitting the OpC model, we note that independent of the strength of true tail dependence
[Fig. 2 panels: (AIC of CFG with GenGamma margins) − (AIC of the OpC or CG comparison model) over time, for the high basket (upper) and the low basket (lower); negative values mean the CFG model fits better.]
Fig. 2 Comparison of AIC for Clayton-Frank-Gumbel model with Clayton-Gumbel and outer
power clayton models on high and low interest rate baskets with generalised gamma margins. The
high interest rate basket is shown in the upper panel and the low interest rate basket is shown in the
lower panel
in the multivariate distribution, the upper tail dependence coefficient λU for this
model strictly increases with dimension very rapidly. Therefore, when fitting the OpC
model, if the basket size becomes greater than bivariate, i.e. from 1999 onwards, the
upper tail dependence estimates become very large (even for outer power parameter
values very close to β = 1). This lack of flexibility in the OpC model only becomes
[Fig. 3 panels: upper and lower tail dependence under the CFG and OpC copulas for the high interest rate basket, plotted against the VIX; labelled events include GBP leaving the ERM, the Asian crisis, the Russian default, US interest rate hikes, the dotcom crash, the BNP hedge fund bust and the Lehman collapse.]
Fig. 3 Comparison of Volatility Index (VIX) with upper and lower tail dependence of the high
interest rate basket in the CFG copula and OpC copula. US NBER recession periods are represented
by the shaded grey zones. Some key crisis dates across the time period are labelled
Fig. 4 Comparison of Volatility Index (VIX) with upper and lower tail dependence of the low
interest rate basket in the CFG copula and OpC copula. US NBER recession periods are represented
by the shaded grey zones. Some key crisis dates across the time period are labelled
apparent in baskets of dimension greater than 2, but is also evident in the AIC scores
in Fig. 2. Here, we see an interesting interplay between the model risk associated to
the dependence structure being fit and the resulting interpreted upside or downside
financial risk exposures for the currency baskets.
Focusing on the tail dependence estimate produced from the CFG copula fits, we
can see that there are indeed periods of heightened upper and lower tail dependence in
the high interest rate and the low interest rate baskets. There is a noticeable increase
in upper tail dependence in the high interest rate basket at times of global market
volatility. Specifically, during late 2007, i.e. the global financial crisis, there is a
sharp peak in upper tail dependence. Preceding this, there is an extended period of
heightened lower tail dependence from 2004 to 2007, which could tie in with the
building of the leveraged carry trade portfolio positions. This period of carry trade
construction is also very noticeable in the low interest rate basket through the very
high levels of upper tail dependence.
We compare in Figs. 3 and 4 the tail dependence plotted against the VIX volatility
index for the high interest rate basket and the low interest rate basket, respectively,
for the period under investigation. The VIX is a popular measure of the implied
volatility of S&P 500 index options—often referred to as the fear index. As such,
it is one measure of the market’s expectations of stock market volatility over the
next 30 days. We can clearly see here that in the high interest rate basket, there
are upper tail dependence peaks at times when there is an elevated VIX index,
particularly post-crisis. However, we would not expect the two to match exactly
since the VIX is not a direct measure of global FX volatility. We can thus conclude
that investors’ risk aversion clearly plays an important role in the tail behaviour. This
conclusion corroborates recent literature regarding the skewness and the kurtosis
features characterising the currency carry trade portfolios [5, 11, 23].
Fig. 5 Heat map showing the strength of non-parametric tail dependence between each pair of
currencies averaged over the 2008 credit crisis period. Lower tail dependence is shown in the lower
triangle and upper tail dependence is shown in the upper triangle. The 3 currencies most frequently
in the high interest rate and the low interest rate baskets are labelled
Fig. 6 Heat map showing the strength of non-parametric tail dependence between each pair of
currencies averaged over the last 12 months (01/02/2013–29/01/2014). Lower tail dependence is
shown in the lower triangle and upper tail dependence is shown in the upper triangle. The 3
currencies most frequently in the high interest rate and the low interest rate baskets are labelled
We interpret this as the relative contribution of each of the 3 currency pairs to the overall
basket tail dependence. We note that for the low interest rate lower tail dependence
and for the high interest rate upper tail dependence, there is a significant degree of
cointegration between the currency pair covariates, and hence we might be able to
use a single covariate due to the presence of a common stochastic trend.
Table 2 Pairwise non-parametric tail dependence, during the period 01/02/2013 to 29/01/2014,
regressed on the respective basket tail dependence (standard errors are shown in parentheses)

Low IR Basket    Constant       CHF–JPY        CZK–CHF        CZK–JPY        R²
Upper TD         0.22 (0.01)    0.02 (0.03)    0.18 (0.02)    0.38 (0.05)    0.57
Lower TD         0.71 (0.17)    −0.62 (0.25)   −0.38 (0.26)   0.23 (0.32)    0.28

High IR Basket   Constant       EGP–INR        UAH–EGP        UAH–INR        R²
Upper TD         0.07 (0.01)    −0.06 (0.33)   0.59 (0.08)    2.37 (0.42)    0.4
Lower TD         0.1 (0.02)     0.56 (0.05)    0.44 (0.08)    −0.4 (0.07)    0.44

The 3 currencies most frequently in the respective baskets are used as independent variables
As was discussed in Sect. 2, the tail exposures associated with a currency carry trade
strategy can be broken down into the upside and downside tail exposures within each
of the long and short carry trade baskets. The downside relative exposure adjusted
returns are obtained by multiplying the monthly portfolio returns by one minus the
upper and the lower tail dependence present, respectively, in the high interest rate
basket and the low interest rate basket at the corresponding dates. The upside relative
exposure adjusted returns are obtained by multiplying the monthly portfolio returns
by one plus the lower and upper tail dependence present, respectively, in the high
interest rate basket and the low interest rate basket at the corresponding dates. Note
that we refer to these as relative exposure adjustments only for the tail exposures
since we do not quantify a market price per unit of tail risk. However, this is still
informative as it shows a decomposition of the relative exposures from the long and
short baskets with regard to extreme events.
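A minimal sketch of these relative exposure adjustments, assuming the monthly HML log returns and the matching tail dependence estimates (sampled at the corresponding dates) are already available; names and layout are illustrative.

```python
import numpy as np

def exposure_adjusted_cumlog(returns, tail_dep, direction="downside"):
    """Multiply each monthly HML log return by (1 - tail dependence) for the
    downside adjustment, or by (1 + tail dependence) for the upside adjustment,
    and return the cumulative adjusted log returns."""
    r = np.asarray(returns, dtype=float)
    lam = np.asarray(tail_dep, dtype=float)
    factor = 1.0 - lam if direction == "downside" else 1.0 + lam
    return np.cumsum(r * factor)

# e.g. downside adjustment from the high-IR basket's upper tail dependence:
# adj = exposure_adjusted_cumlog(hml_monthly_logret, lambda_u_high, "downside")
```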
[Fig. 7 panel: Downside Risk Adjusted Returns for HML basket (penalising tail dependence), shown against the unadjusted HML returns.]
Fig. 7 Cumulative log returns of the carry trade portfolio (HML = High interest rate basket minus
low interest rate basket). Downside exposure adjusted cumulative log returns using upper/lower
tail dependence in the high/low interest rate basket for the CFG copula and the OpC copula are
shown for comparison
As can be seen in Fig. 7, the relative adjustment to the absolute cumulative returns
for each type of downside exposure is greatest for the low interest rate basket, except
under the OpC model; this exception is due to the very poor fit of the OpC model to
baskets containing more than 2 currencies, which, as we have seen, carries over into
the implied financial risk exposures. This is interesting because intuitively one would
expect the high interest rate basket to be the largest source of tail exposure. However,
one should be careful when
[Fig. 8 panel: Upside Risk Adjusted Returns for HML basket (rewarding tail dependence), shown against the unadjusted HML returns.]
Fig. 8 Cumulative log returns of the carry trade portfolio (HML = High interest rate basket minus
low interest rate basket). Upside exposure adjusted cumulative log returns using lower/upper tail
dependence in the high/low interest rate basket for the CFG copula and the OpC copula are shown
for comparison
interpreting this plot, since we are looking at the extremal tail exposure. The analysis
may change if one considered the intermediate tail risk exposure, where the marginal
effects become significant. Similarly, Fig. 8 shows the relative adjustment to the
absolute cumulative returns for each type of upside exposure is greatest for the low
interest rate basket. The same interpretation as for the downside relative exposure
adjustments can be made here for upside relative exposure adjustments.
7 Conclusion
In this paper, we have shown that the positive and negative multivariate tail risk
exposures present in currency carry trade baskets are additional factors needing
careful consideration when one constructs a carry portfolio. Ignoring these exposures
leads to a perceived risk return profile that is not reflective of the true nature of such
a strategy. In terms of marginal model selection, it was shown that one is indifferent
between the log Generalised Gamma model and the frequently used GARCH(1,1)
model. However, in combination with the three different Archimedean copula models
considered in this paper, the log Generalised Gamma marginals provided a better
overall model fit.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Ames, M., Bagnarosa, G., Peters, G.W.: Reinvestigating the Uncovered Interest Rate Parity Puz-
zle via Analysis of Multivariate Tail Dependence in Currency Carry Trades. arXiv:1303.4314
(2013)
2. Backus, D.K., Foresi, S., Telmer, C.I.: Affine term structure models and the forward premium
anomaly. J. Financ. 56(1), 279–304 (2001)
3. Bollerslev, T.: Generalized autoregressive conditional heteroskedasticity. J. Econom. 31(3),
307–327 (1986)
4. Brechmann, E.C., Czado, C.: Risk management with high-dimensional vine copulas: an analy-
sis of the Euro Stoxx 50. Stat. Risk Model. 30(4), 307–342 (2012)
5. Brunnermeier, M.K., Nagel, S., Pedersen, L.H.: Carry trades and currency crashes. Working
paper 14473, National Bureau of Economic Research, November 2008
6. Burnside, C., Eichenbaum, M., Kleshchelski, I., Rebelo, S.: Do peso problems explain the
returns to the carry trade? Rev. Financ. Stud. 24(3), 853–891 (2011)
7. Christiansen, C., Ranaldo, A., Söderlind, P.: The time-varying systematic risk of carry trade
strategies. J. Financ. Quant. Anal. 46(04), 1107–1125 (2011)
8. Cruz, M., Peters, G., Shevchenko, P.: Handbook on Operational Risk. Wiley, New York (2013)
9. De Luca, G., Rivieccio, G.: Multivariate tail dependence coefficients for Archimedean copulae.
Advanced Statistical Methods for the Analysis of Large Data-Sets, p. 287 (2012)
10. Fama, E.F.: Forward and spot exchange rates. J. Monet. Econ. 14(3), 319–338 (1984)
11. Farhi, E., Gabaix, X.: Rare disasters and exchange rates. Working paper 13805, National Bureau
of Economic Research, February 2008
12. Feller, W.: An Introduction to Probability Theory and Its Applications, vol. 2. Wiley, New York
(1971)
13. Grønneberg, S.: The copula information criterion and its implications for the maximum pseudo-
likelihood estimator. Dependence Modelling: The Vine Copula Handbook, World Scientific
Books, pp. 113–138 (2010)
14. Hafner, C.M., Manner, H.: Dynamic stochastic copula models: estimation, inference and appli-
cations. J. Appl. Econom. 27(2), 269–295 (2010)
15. Hansen, L.P., Hodrick, R.J.: Forward exchange rates as optimal predictors of future spot rates:
an econometric analysis. J. Polit. Econ. 829–853 (1980)
16. Joe, H.: Asymptotic efficiency of the two-stage estimation method for copula-based models.
J. Multivar. Anal. 94(2), 401–419 (2005)
17. Joe, H., Xu, J.J.: The Estimation Method of Inference Functions for Margins for Multivariate
Models. Technical report, Technical Report 166, Department of Statistics, University of British
Columbia (1996)
18. Juhl, T., Miles, W., Weidenmier, M.D.: Covered interest arbitrage: then versus now. Economica
73(290), 341–352 (2006)
19. Lawless, J.F.: Inference in the generalized gamma and log gamma distributions. Technometrics
22(3), 409–419 (1980)
20. Lustig, H., Roussanov, N., Verdelhan, A.: Common risk factors in currency markets. Rev.
Financ. Stud. 24(11), 3731–3777 (2011)
21. Lustig, H., Verdelhan, A.: The cross section of foreign currency risk premia and consumption
growth risk. Am. Econ. Rev. 97(1), 89–117 (2007)
22. McNeil, A.J., Nešlehová, J.: Multivariate Archimedean copulas, d-monotone functions and
L1-norm symmetric distributions. Ann. Stat. 37(5B), 3059–3097 (2009)
23. Menkhoff, L., Sarno, L., Schmeling, M., Schrimpf, A.: Carry trades and global foreign exchange
volatility. J. Financ. 67(2), 681–718 (2012)
24. Musiela, M., Rutkowski, M.: Martingale Methods in Financial Modelling. Springer, Berlin
(2011)
25. Nelsen, R.B.: An Introduction to Copulas. Springer, New York (2006)
Part III
Insurance Risk
and Asset Management
Participating Life Insurance Contracts under
Risk Based Solvency Frameworks: How
to Increase Capital Efficiency by Product Design
A. Reuß
Institut für Finanz- und Aktuarwissenschaften, Lise-Meitner-Straße 14, 89081 Ulm, Germany
e-mail: [email protected]

J. Ruß · J. Wieland (B)
Institut für Finanz- und Aktuarwissenschaften, Ulm University, Lise-Meitner-Straße 14,
89081 Ulm, Germany
e-mail: [email protected]

J. Wieland
e-mail: [email protected]

1 Introduction

Traditional participating life insurance products play a major role in old-age provision
in Continental Europe and in many other countries. These products typically come
with a guaranteed benefit at maturity, which is calculated using some guaranteed
minimum interest rate. Furthermore, the policyholders receive an annual surplus
participation that depends on the performance of the insurer's assets. With the
so-called cliquet-style guarantees, once such surplus has been assigned to the policy at
the end of the year, it increases the guaranteed benefit based on the same guaranteed
minimum interest rate. This product design can create significant financial risk.
Briys and de Varenne [8] were among the first to analyze the impact of interest rate
guarantees on the insurer’s risk exposure. However, they considered a simple point-
to-point guarantee where surplus (if any) is credited at maturity only. The financial
risks of cliquet-style guarantee products have later been investigated, e.g., by Grosen
and Jorgensen [17]. They introduce the “average interest principle”, where the insurer
aims to smooth future bonus distributions by using a bonus reserve as an additional
buffer besides the policy reserve (the client’s account). Besides valuing the contract
they also calculate default probabilities (however, under the risk-neutral probability
measure Q). Grosen et al. [19] extend the model of Grosen and Jorgensen [17], and
introduce mortality risk. Grosen and Jorgensen [18] modify the model used by Briys
and de Varenne [8] by incorporating a regulatory constraint for the insurer’s assets
and analyzing the consequences for the insurer's risk policy. Miltersen and Persson
[23] analyze a different cliquet-style guarantee framework with the so-called terminal
bonuses, whereas Bauer et al. [4] specifically investigate the valuation of participating
contracts under the German regulatory framework.
While all this work focuses on the risk-neutral valuation of life insurance contracts
(sometimes referred to as “financial approach”), Kling et al. [20, 21] concentrate
on the risk a contract imposes on the insurer (sometimes referred to as “actuar-
ial approach”) by means of shortfall probabilities under the real-world probability
measure P.
Barbarin and Devolder [3] introduce a methodology that allows for combining
the financial and actuarial approach. They consider a contract similar to Briys and
de Varenne [8] with a point-to-point guarantee and terminal surplus participation.
To integrate both approaches, they use a two-step method of pricing life insurance
contracts: First, they determine a guaranteed interest rate such that certain regulatory
requirements are satisfied, using value at risk and expected shortfall risk measures.
Second, to obtain fair contracts, they use risk-neutral valuation and adjust the par-
ticipation in terminal surplus accordingly. Based on this methodology, Gatzert and
Kling [14] investigate parameter combinations that yield fair contracts and analyze
the risk implied by fair contracts for various contract designs. Gatzert [13] extends
this approach by introducing the concept of “risk pricing” using the “fair value of
default” to determine contracts with the same risk exposure. Graf et al. [16] (also
building on Barbarin and Devolder [3]) derive the risk minimizing asset allocation
for fair contracts using different risk measures like the shortfall probability or the
relative expected shortfall.
Under risk-based solvency frameworks such as Solvency II or the Swiss Solvency
Test (SST), the risk analysis of interest rate guarantees becomes even more impor-
tant. Under these frameworks, capital requirement is derived from a market consistent
valuation considering the insurer’s risk. This risk is particularly high for long term
contracts with a year-to-year guarantee based on a fixed (i.e., not path dependent)
guaranteed interest rate. Measuring and analyzing the financial risk in relation to the
required capital, and analyzing new risk figures such as the Time Value of Options
Participating Life Insurance Contracts under Risk Based Solvency Frameworks … 187
and Guarantees (TVOG) is a relatively new aspect, which gains importance with
new solvency frameworks. For example, the largest German insurance company (Allianz)
announced in a press conference on June 25, 2013 the introduction of a new participating
life insurance product that (among other features) fundamentally modifies the
type of interest rate guarantee (similar to what we propose in the remainder of this
paper). It was stressed that the TVOG is significantly reduced for the new product.
Also, it was mentioned that the increase of the TVOG resulting from an interest rate
shock (i.e., the solvency capital requirement for interest rate risk) is reduced by 80 %
when compared to the previous product. This is consistent with the findings of this
paper.
The aim of this paper is a comprehensive risk analysis of different contract designs
for participating life insurance products. Currently, there is an ongoing discussion,
whether and how models assessing the insurer’s risk should be modified to reduce the
capital requirements (e.g., by applying an “ultimate forward rate” set by the regula-
tor). We will in contrast analyze how (for a given model) the insurer’s risk, and hence
capital requirement can be influenced by product design. Since traditional cliquet-
style participating life insurance products lead to very high capital requirements, we
will introduce alternative contract designs with modified types of guarantees, which
reduce the insurer’s risk and profit volatility, and therefore also the capital require-
ments under risk-based solvency frameworks. In order to compare different product
designs from an insurer’s perspective, we develop and discuss the concept of Capital
Efficiency, which relates profit to capital requirements.2 We identify the key drivers
of Capital Efficiency, which are then used in our analyses to assess different product
designs.
The remainder of this paper is structured as follows:
In Sect. 2, we present three considered contract designs that all come with the
same level of guaranteed maturity benefit but with different types of guarantee:
• Traditional product: a traditional contract with a cliquet-style guarantee based on
a guaranteed interest rate > 0.
• Alternative product 1: a contract with the same guaranteed maturity benefit, which
is, however, valid only at maturity; additionally, there is a 0 % year-to-year guar-
antee on the account value meaning that the account value cannot decrease from
one year to the next.
• Alternative product 2: a contract with the same guaranteed maturity benefit that is,
however, valid only at maturity; there is no year-to-year guarantee on the account
value meaning that the account value may decrease in some years.
On top of the different types of guarantees, all three products include a surplus
participation depending on the insurer’s return on assets. Our model is based on
the surplus participation requirements given by German regulation. That means in
particular that each year at least 90 % of the (book value) investment return has to be
distributed to the policyholders.
To illustrate the mechanics, we will first analyze the different products under
different deterministic scenarios. This shows the differences in product design and
how they affect the insurer’s risk.
In Sect. 3, we introduce our stochastic model, which is based on a standard fi-
nancial market model: The stock return and short rate processes are modeled using
a correlated Black-Scholes and Vasicek model.3 We then describe how the evolu-
tion of the insurance portfolio and the insurer’s balance sheet are simulated in our
asset-liability-model. The considered asset allocation consists of bonds with differ-
ent maturities and stocks. The model also incorporates management rules as well as
typical intertemporal risk sharing mechanisms (e.g., building and dissolving unreal-
ized gains and losses), which are an integral part of participating contracts in many
countries and should therefore not be neglected.
Furthermore, we introduce a measure for Capital Efficiency based on currently
discussed solvency regulations such as the Solvency II framework. We also propose
a more tractable measure for an assessment of the key drivers of Capital Efficiency.
In Sect. 4, we present the numerical results. We show that the alternative products
are significantly more capital efficient: financial risk, and therefore also capital re-
quirement is significantly reduced, although in most scenarios all products provide
the same maturity benefit to the policyholder.4 We observe that the typical “asymme-
try”, i.e., particularly the heavy left tail of the insurer’s profit distribution is reduced
by the modified products. This leads to a significant reduction of both, the TVOG
and the solvency capital requirement for interest rate risk.
Section 5 concludes and provides an outlook for further research.
2 Considered Products
In this section, we describe the three different considered contract designs. Note that
for the sake of simplicity, we assume that in case of death in year t, always only the
current account value AVt (defined below) is paid at the end of year t. This allows
us to ignore mortality for the calculation of premiums and actuarial reserves.
3 The correlated Black-Scholes and Vasicek model is applied in Zaglauer and Bauer [29] and Bauer
et al. [5] in a similar way.
4 Note: In scenarios where the products’ maturity benefits do differ, the difference is limited since
the guaranteed maturity benefit (which is the same for all three products) is a lower bound for the
maturity benefit.
$$\sum_{t=0}^{T-1} (P - c_t)\cdot(1 + i)^{T-t} = G. \qquad (1)$$
During the lifetime of the contract, the insurer has to build up sufficient (prospective)
actuarial reserves $AR_t$ for the guaranteed benefit based on the same constant interest
rate $i$:

$$AR_t = G\cdot\left(\frac{1}{1+i}\right)^{T-t} - \sum_{k=t}^{T-1} (P - c_k)\cdot\left(\frac{1}{1+i}\right)^{k-t}. \qquad (2)$$
The client’s account value AVt consists of the sum of the actuarial reserve A Rt and
the bonus reserve B Rt ; the maturity benefit is equal to AVT .
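For illustration, a small sketch of Formulas (1) and (2); the guaranteed benefit, interest rate, term and (zero) charges used below are hypothetical and only serve to show the mechanics.

```python
def premium(G, i, T, charges):
    """Annual premium P solving (1): sum_{t=0}^{T-1} (P - c_t)(1 + i)^(T - t) = G."""
    acc = sum((1 + i) ** (T - t) for t in range(T))
    chg = sum(charges[t] * (1 + i) ** (T - t) for t in range(T))
    return (G + chg) / acc

def actuarial_reserve(G, i, T, charges, P, t):
    """Prospective actuarial reserve AR_t from (2)."""
    disc_benefit = G / (1 + i) ** (T - t)
    disc_premiums = sum((P - charges[k]) / (1 + i) ** (k - t) for k in range(t, T))
    return disc_benefit - disc_premiums

# Hypothetical contract: G = 10,000, i = 1.75 %, T = 20 years, no charges.
G, i, T = 10_000.0, 0.0175, 20
c = [0.0] * T
P = premium(G, i, T, c)
print(round(P, 2), round(actuarial_reserve(G, i, T, c, P, t=10), 2))
```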
As a consequence, each year at least the rate i has to be credited to the contracts.
The resulting optionality is often referred to as asymmetry: If the asset return is above
i, a large part (e.g., p = 90 %) of the return is credited to the client as a surplus and
the shareholders receive only a small portion (e.g., 1 − p = 10 %) of the return.
If, on the other hand, the asset returns are below i, then 100 % of the shortfall has
to be compensated by the shareholder. Additionally, if the insurer distributes a high
surplus, this increases the insurer’s future risk since the rate i has to be credited also
to this surplus amount in subsequent years. Such products constitute a significant
5 For the equivalence principle, see e.g., Saxer [25], Wolthuis [28].
Fig. 1 Two illustrative deterministic scenarios for the traditional product: asset returns and yield
distribution
6 This was also a key result of the QIS5 final report preparing for Solvency II, cf. [2, 11].
We will now introduce two alternative product designs, which are based on the idea
to allow different values for the pricing rate, the reserving rate and the year-to-
year minimum guaranteed interest rate on the account value. So Formulas 1 and 2
translate to the following formulae for the relation between the annual premium, the
guaranteed benefit and the actuarial reserves:
$$\sum_{t=0}^{T-1} (P - c_t)\cdot\big(1 + i_p\big)^{T-t} = G$$

$$AR_t = G\cdot\left(\frac{1}{1+i_r}\right)^{T-t} - \sum_{k=t}^{T-1} (P - c_k)\cdot\left(\frac{1}{1+i_r}\right)^{k-t}.$$

Note that in the first years of the contract, negative values for $AR_t$ are possible in
case of $i_p < i_r$, which implies a "financial buffer" at the beginning of the contract.
The year-to-year minimum guaranteed interest rate $i_g$ is not relevant for the formulae
above, but it is simply a restriction on the development of the client's account, i.e.,

$$AV_t \ge (AV_{t-1} + P - c_{t-1})\cdot\big(1 + i_g\big),$$
The left part of (3) assures that the account value is nonnegative and never lower
than the actuarial reserve. The required yield decreases if the bonus reserve (which
is included in AVt−1 ) increases.
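A short sketch illustrating this financial buffer, with purely hypothetical rates $i_p < i_r$ (they are not the calibration used later in the paper):

```python
# Hypothetical figures: pricing rate below the reserving rate.
G, T, i_p, i_r = 10_000.0, 20, 0.0175, 0.025
c = [0.0] * T                                     # no charges, for simplicity

# Premium from the pricing equation with i_p (first formula above).
P = (G + sum(c[t] * (1 + i_p) ** (T - t) for t in range(T))) \
    / sum((1 + i_p) ** (T - t) for t in range(T))

# Prospective reserves from the second formula, discounted with i_r.
AR = [G / (1 + i_r) ** (T - t)
      - sum((P - c[k]) / (1 + i_r) ** (k - t) for k in range(t, T))
      for t in range(T + 1)]

print(round(AR[0], 2), AR[0] < 0)   # negative early reserves act as a buffer
```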
Fig. 2 Two illustrative deterministic scenarios for alternative 1 product: asset returns and yield
distribution
Fig. 3 Two illustrative deterministic scenarios for alternative 2 product: asset returns and yield
distribution
Since surplus participation is typically based on local GAAP book values (in particu-
lar in Continental Europe), we use a stochastic balance sheet and cash flow projection
model for the analysis of the product designs presented in the previous section. The
model includes management rules concerning asset allocation, reinvestment strat-
egy, handling of unrealized gains and losses and surplus distribution. Since the focus
of the paper is on the valuation of future profits and capital requirements we will
introduce the model under a risk-neutral measure. Similar models have been used
(also in a real-world framework) in Kling et al. [20, 21] and Graf et al. [16].
We assume that the insurer’s assets are invested in coupon bonds and stocks. We
treat both assets as risky assets in a risk-neutral, frictionless and continuous financial
market. Additionally, cash flows during the year are invested in a riskless bank
account (until assets are reallocated). We let the short rate process rt follow a Vasicek7
model, and the stock price St follow a geometric Brownian motion:
$$dr_t = \kappa\,(\theta - r_t)\,dt + \sigma_r\,dW_t^{(1)} \qquad\text{and}\qquad \frac{dS_t}{S_t} = r_t\,dt + \rho\,\sigma_S\,dW_t^{(1)} + \sqrt{1-\rho^2}\,\sigma_S\,dW_t^{(2)},$$

where $W_t^{(1)}$ and $W_t^{(2)}$ each denote a Wiener process on some probability space
$(\Omega, \mathcal{F}, \mathbb{F}, Q)$ with a risk-neutral measure $Q$ and the natural filtration $\mathbb{F} = \big(\mathcal{F}_t\big)$ with
$\mathcal{F}_t = \sigma\big(W_s^{(1)}, W_s^{(2)},\, s < t\big)$. The parameters $\kappa, \theta, \sigma_r, \sigma_S$ and $\rho$ are deterministic and
constant. For the purpose of performing Monte Carlo simulations, the stochastic
differential equations can be solved to

$$S_t = S_{t-1}\cdot\exp\!\left(\int_{t-1}^{t} r_u\,du - \frac{\sigma_S^2}{2} + \int_{t-1}^{t}\rho\,\sigma_S\,dW_u^{(1)} + \int_{t-1}^{t}\sqrt{1-\rho^2}\,\sigma_S\,dW_u^{(2)}\right) \quad\text{and}$$

$$r_t = e^{-\kappa}\cdot r_{t-1} + \theta\,\big(1 - e^{-\kappa}\big) + \sigma_r\cdot\int_{t-1}^{t} e^{-\kappa(t-u)}\,dW_u^{(1)}.$$

The resulting yield curve is then given by

$$r_t(s) = \exp\!\left\{\frac{1}{s}\left[\frac{1-e^{-\kappa s}}{\kappa}\,r_t + \left(s - \frac{1-e^{-\kappa s}}{\kappa}\right)\cdot\left(\theta - \frac{\sigma_r^2}{2\kappa^2}\right) + \frac{\sigma_r^2}{4\kappa}\left(\frac{1-e^{-\kappa s}}{\kappa}\right)^{2}\right]\right\} - 1$$
for any time t and term s > 0. Based on the yield curves, we calculate par yields that
determine the coupon rates of the considered coupon bonds.
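The following sketch simulates the short rate with its exact one-year transition and evaluates the yield curve and par yield along each path; the parameter values are illustrative placeholders rather than the calibration of Table 4, and the stock process is omitted for brevity.

```python
import numpy as np

# Illustrative parameters only (not the calibration of Table 4).
kappa, theta, sigma_r = 0.2, 0.03, 0.01
r0, n_scen, tau = 0.025, 5000, 19
rng = np.random.default_rng(1)

def simulate_short_rate():
    """Exact annual-step transition of the Vasicek short rate."""
    r = np.empty((n_scen, tau + 1))
    r[:, 0] = r0
    var = sigma_r ** 2 * (1 - np.exp(-2 * kappa)) / (2 * kappa)   # one-year variance
    for t in range(1, tau + 1):
        z = rng.standard_normal(n_scen)
        r[:, t] = np.exp(-kappa) * r[:, t - 1] + theta * (1 - np.exp(-kappa)) \
                  + np.sqrt(var) * z
    return r

def spot_rate(r_t, s):
    """Annually compounded Vasicek spot rate r_t(s) from the formula above."""
    B = (1 - np.exp(-kappa * s)) / kappa
    y = (B * r_t + (s - B) * (theta - sigma_r ** 2 / (2 * kappa ** 2))
         + sigma_r ** 2 * B ** 2 / (4 * kappa)) / s
    return np.exp(y) - 1

def par_yield(r_t, m):
    """Coupon rate for which an m-year annual coupon bond prices at par."""
    disc = np.array([(1 + spot_rate(r_t, s)) ** (-s) for s in range(1, m + 1)])
    return (1 - disc[-1]) / disc.sum()

paths = simulate_short_rate()
print(par_yield(paths[0, 1], m=10))
```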
7 Cf. [27].
8 Cf. Zaglauer and Bauer [29]. A comprehensive explanation of this property is included in
Bergmann [6].
9 See Seyboth [26] as well as Branger and Schlag [7].
The insurer’s simplified balance sheet at time t is given by Table 1. Since our analysis
is performed for a specific portfolio of insurance contracts on a stand-alone basis,
there is no explicit allowance for shareholders’ equity or other reserves on the liability
side. Rather, X t denotes the shareholders’ profit or loss in year t, with corresponding
cash flow at the beginning of the next year. Together with AVt as defined in Sect. 2,
this constitutes the liability side of our balance sheet.
In our projection of the assets and insurance contracts, incoming cash flows (pre-
mium payments at the beginning of the year, coupon payments and repayment of
nominal at the end of the year) and outgoing cash flows (expenses at the beginning of
the year and benefit payments at the end of the year) occur. In each simulation path,
cash flows occurring at the beginning of the year are invested in a bank account. At
the end of the year, the market values of the stocks and coupon bonds are derived and
the asset allocation is readjusted according to a rebalancing strategy with a constant
stock ratio q based on market values. Conversely, (1 − q) is invested in bonds and
any money on the bank account is withdrawn and invested in the portfolio consisting
of stocks and bonds.
If additional bonds need to be bought in the process of rebalancing, the corre-
sponding amount is invested in coupon bonds yielding at par with term M. However,
toward the end of the projection, when the insurance contracts’ remaining term is
less than M years, we invest in bonds with a term that coincides with the longest
remaining term of the insurance contracts. If bonds need to be sold, they are sold
proportionally to the market values of the different bonds in the existing portfolio.
With respect to accounting, we use book-value accounting rules following German
GAAP, which may result in unrealized gains or losses (UGL): Coupon bonds are
considered as held to maturity and their book value $BV_t^B$ is always given by their
nominal amounts (irrespective of whether the market value is higher or lower). In contrast,
for the book value of the stocks $BV_t^S$, the insurer has some discretion.
Of course, interest rate movements as well as the rebalancing will cause fluc-
tuations with respect to the UGL of bonds. Also, the rebalancing may lead to the
realization of UGL of stocks. In addition, we assume an additional management rule
with respect to UGL of stocks: We assume that the insurer wants to create rather
stable book value returns (and hence surplus distributions) in order to signal stability
to the market. We, therefore, assume that a ratio dpos of the UGL of stocks is realized
annually if unrealized gains exist and a ratio dneg of the UGL is realized annually
if unrealized losses exist. In particular, dneg = 100 % has to be chosen in a legal
framework where unrealized losses on stocks are not possible.
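A minimal sketch of this management rule for the stock portfolio; the ratios used below are placeholders, not the values of Table 5.

```python
def realise_stock_ugl(market_value, book_value, d_pos=0.2, d_neg=1.0):
    """Realise a fraction d_pos of unrealised gains on stocks, or a fraction
    d_neg of unrealised losses (d_neg = 1.0 where unrealised losses on stocks
    are not permitted). Returns the realised P&L and the new book value."""
    ugl = market_value - book_value
    share = d_pos if ugl >= 0 else d_neg
    realised = share * ugl
    return realised, book_value + realised
```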
Based on this model, the total asset return on a book value basis can be calculated
in each simulation path each year as the sum of coupon payments from bonds,
interest payments on the bank account, and the realization of UGL. The split between
policyholders and shareholders is driven by the minimum participation parameter p
explained in Sect. 2. If the cumulative required yield on the account values of all
policyholders is larger than this share, there is no surplus for the policyholders,
and exactly the respective required yield z t is credited to every account. Otherwise,
surplus is credited, which amounts to the difference between the policyholders’ share
of the asset return and the cumulative required yield. Following the typical practice,
e.g., in Germany, we assume that this surplus is distributed among the policyholders
such that all policyholders receive the same client’s yield (defined by the required
yield plus surplus rate), if possible. To achieve that, we apply an algorithm that
sorts the accounts by required yield, i.e., $z_t^{(1)}, \ldots, z_t^{(k)}$, $k \in \mathbb{N}$, in ascending order.
First, all contracts receive their respective required yield. Then, the available surplus
is distributed: Starting with the contract(s) with the lowest required yield $z_t^{(1)}$, the
algorithm distributes the available surplus to all these contracts until the gap to the
next required yield $z_t^{(2)}$ is filled. Then, all the contracts with a required yield lower
than or equal to $z_t^{(2)}$ receive an equal amount of (relative) surplus until the gap to $z_t^{(3)}$ is
filled, etc. This is continued until the entire surplus is distributed. The result is that
all contracts receive the same client's yield if this unique client's yield exceeds the
required yield of all contracts. Otherwise, there exists a threshold $z^*$ such that all
contracts with a required yield above $z^*$ receive exactly their required yield (and no
surplus) and all contracts with a required yield below $z^*$ receive $z^*$ (i.e., they receive
some surplus).
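A simplified sketch of this waterfall algorithm; it credits yields on given base amounts and should be read as an illustration of the sorting-and-gap-filling logic, not as the full projection model.

```python
import numpy as np

def distribute_surplus(base_amounts, required_yields, surplus):
    """Give every account its required yield, then lift the lowest credited
    yields first until either all accounts share one client's yield or the
    available surplus is exhausted at some threshold z*."""
    base = np.asarray(base_amounts, dtype=float)
    z = np.asarray(required_yields, dtype=float)
    order = np.argsort(z)                        # accounts sorted by required yield
    credited = z.copy()
    remaining = float(surplus)
    for j, idx in enumerate(order):
        group = order[: j + 1]                   # accounts at the current level
        nxt = z[order[j + 1]] if j + 1 < len(order) else np.inf
        cost = base[group].sum() * (nxt - credited[idx])   # cost of the next gap
        if cost >= remaining:                    # surplus runs out within this gap
            credited[group] += remaining / base[group].sum()
            break
        credited[group] = nxt                    # lift the group to the next level
        remaining -= cost
    return credited
```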
From this, the insurer’s profit X t results as the difference between the total asset
return and the amount credited to all policyholder accounts. If the profit is negative,
a shortfall has occurred, which we assume to be compensated by a corresponding
capital inflow (e.g., from the insurer’s shareholders) at the beginning of the next
year.10 Balance sheet and cash flows are projected over τ years until all policies that
are in force at time zero have matured.
10We do not consider the shareholders’ default put option resulting from their limited liability,
which is in line with both, Solvency II valuation standards and the Market Consistent Embedded
Value framework (MCEV), cf. e.g., [5] or [10], Sect. 5.3.4.
product portfolio from an insurer’s perspective. Rather, capital requirement and the
resulting cost of capital should be considered in relation to profitability.
Therefore, a suitable measure of Capital Efficiency could be some ratio of prof-
itability and capital requirement, e.g., based on the distribution of the random variable
$$\frac{\displaystyle\sum_{t=1}^{\tau} \frac{X_t}{B_t}}{\displaystyle\sum_{t=1}^{\tau} \frac{RC_{t-1}\cdot CoC_t}{B_t}}\,. \qquad (4)$$
The numerator represents the present value of the insurer’s future profits, whereas the
denominator is equal to the present value of future cost of capital: RCt denotes the
required capital at time t under some risk-based solvency framework, i.e., the amount
of shareholders’ equity needed to support the business in force. The cost of capital
is derived by applying the cost of capital rate CoCt for year t on the required capital
at the beginning of this year.11 In practical applications, however, the distribution of
this ratio might not be easy to calculate. Therefore, moments of this distribution, a
separate analysis of (moments of) the numerator and the denominator or even just
an analysis of key drivers for that ratio could create some insight.
In this spirit, we will use a Monte Carlo framework to calculate the following key
figures using the model described above:
A typical market consistent measure for the insurer’s profitability is the expected
present value of future profits (PVFP),12 which corresponds to the expected value of
the numerator in (4). The PVFP is estimated based on Monte Carlo simulations:
$$PVFP = \frac{1}{N}\sum_{n=1}^{N}\sum_{t=1}^{\tau}\frac{X_t^{(n)}}{B_t^{(n)}} = \frac{1}{N}\sum_{n=1}^{N} PVFP^{(n)},$$

where $N$ is the number of scenarios, $X_t^{(n)}$ denotes the insurer's profit/loss in year $t$
in scenario $n$, $B_t^{(n)}$ is the value of the bank account after $t$ years in scenario $n$, and
hence $PVFP^{(n)}$ is the present value of future profits in scenario $n$.
In addition, the degree of asymmetry of the shareholder's cash flows can be characterized
by the distribution of $PVFP^{(n)}$ over all scenarios13 and by the time value of
options and guarantees (TVOG). Under the MCEV framework,14 the latter is defined
by

$$TVOG = PVFP^{CE} - PVFP,$$
11 This approach is similar to the calculation of the cost of residual nonhedgeable risk as introduced in
the MCEV Principles in [9], although RCt reflects the total capital requirement including hedgeable
risks.
12 The concept of PVFP is introduced as part of the MCEV Principles in [9].
13 Note that this is a distribution under the risk-neutral measure and has to be interpreted carefully.
However, it can be useful for explaining differences between products regarding PVFP and TVOG.
14 Cf. [9].
where $PVFP^{CE} = \sum_{t=1}^{\tau} \frac{X_t^{(CE)}}{B_t^{(CE)}}$ is the present value of future profits in the so-called
"certainty equivalent" (CE) scenario. This deterministic scenario reflects the expected
development of the capital market under the risk-neutral measure. It can be derived
from the initial yield curve $r_0(s)$ based on the assumption that all assets earn the
forward rate implied by the initial yield curve.15 The TVOG is also used as an
indicator for capital requirement under risk-based solvency frameworks.
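A sketch of these Monte Carlo estimators, assuming the simulated profits and bank account values are stored as arrays of shape (N, τ); the layout and names are our own.

```python
import numpy as np

def pvfp_and_tvog(X, B, X_ce, B_ce):
    """PVFP, TVOG and the per-scenario PVFP^(n) from the formulas above.

    X, B       : arrays of shape (N, tau) with X_t^(n) and B_t^(n).
    X_ce, B_ce : the same quantities in the certainty equivalent scenario.
    """
    pvfp_n = (X / B).sum(axis=1)                 # PVFP^(n) per scenario
    pvfp = pvfp_n.mean()                         # expected PV of future profits
    pvfp_ce = (np.asarray(X_ce) / np.asarray(B_ce)).sum()
    tvog = pvfp_ce - pvfp                        # time value of options/guarantees
    return pvfp, tvog, pvfp_n

# The SCR for interest rate risk is then driven by the drop under stress:
# scr_int = pvfp_basic - pvfp_stress
```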
Comparing the PVFP for two different interest rate levels (one that we call basic
level and a significantly lower one that we call stress level) provides another
important key figure for interest rate risk and capital requirements. In the standard
formula16 of the Solvency II framework, the difference

$$\Delta PVFP = PVFP(\text{basic}) - PVFP(\text{stress})$$

determines the contribution of the respective product to the solvency capital requirement
for interest rate risk ($SCR_{int}$). Therefore, we also focus on this figure, which
primarily drives the denominator in (4).
4 Results
4.1 Assumptions
portfolio at the beginning of the projections with remaining time to maturity between
1 year and 19 years (i.e., τ = 19 years).17 For each contract, the account value at
t = 0 is derived from a projection in a deterministic scenario. In this deterministic
scenario, we use a flat yield curve of 3.0 % (consistent with the mean reversion
parameter θ of the stochastic model after t = 0), and parameters for management
rules described below. In line with the valuation approach under Solvency II and
MCEV, we do not consider new business.
The book value of the asset portfolio at t = 0 coincides with the book value of
liabilities. We assume a stock ratio of q = 5 % with unrealized gains on stocks at
t = 0 equal to 10 % of the book value of stocks. The coupon bond portfolio consists
of bonds with a uniform coupon of 3.0 % where the time to maturity is equally split
between 1 year and M = 10 years.
Capital market parameters for the basic and stress projections are shown in Table 4.
The parameters κ, σr , σ S and ρ are directly adopted from Graf et al. [16]. The pa-
rameters θ and r0 are chosen such that they are more in line with the current low
interest rate level. The capital market stress corresponds to an immediate drop of
interest rates by 100 basis points.
The parameters for the management rules are given in Table 5 and are consistent
with current regulation and practice in the German insurance market.
For all projections, the number of scenarios is N = 5,000. Further analyses
showed that this allows for a sufficiently precise estimation of the relevant figures.18
17 Note that due to mortality before t = 0, the number of contracts for the different remaining times
to maturity is not the same.
18 In order to reduce variance in the sample, an antithetic path selection of the random numbers is applied.
In Table 6, the PVFP and the TVOG for the base case are compared for the three
products. All results are displayed as a percentage of the present value of future
premium income from the portfolio. For alternative 1, the PVFP increases from
3.63 to 4.24 %, i.e., by 0.61 percentage points (pp), compared to the traditional
contract design (which corresponds to a 17 % increase of profitability). This means
that this product with a “maturity only” guarantee and an additional guarantee that
the account value will not decrease is, as expected, more profitable than the product
with a traditional year-to-year (cliquet-style) guarantee. This difference is mainly
caused by the different degree of asymmetry of the shareholders’ cash flows which is
characterized by the TVOG. Since $PVFP^{CE}$ amounts to 4.26 % for all products in the
base case, the difference of TVOG between the traditional product and alternative
1 is also 0.61 pp. This corresponds to a TVOG reduction of more than 90 % for
alternative 1, which shows that the risk resulting from the interest rate guarantee is
much lower for the modified product.
Compared to this, the differences between alternative 1 and alternative 2 are
almost negligible. The additional increase of the PVFP is only 0.01 pp, which is due
to a slightly lower TVOG compared to alternative 1. This shows that the fact that
the account value may decrease in some years in alternative 2 does not provide a
material additional risk reduction.
Additional insights can be obtained by analyzing the distribution of PVFP(n) (see
Fig. 4)19 : For the traditional contract design, the distribution is highly asymmetric
with a strong left tail and a significant risk of negative shareholder cash flows (on a
present value basis). In contrast, both alternative contract designs exhibit an almost
symmetric distribution of shareholder cash flows which explains the low TVOG.
Hence, the new products result in a significantly more stable profit perspective for
the shareholders, while for the traditional product the shareholder is exposed to
significantly higher shortfall risk.
Ultimately, the results described above can be traced back to differences in the
required yield. While for the traditional product, by definition, the required yield
always amounts to 1.75 %, it is equal to 0 % in most scenarios for the alternative 1
product. Only in the most adverse scenarios, the required yield rises toward 1.75 %.20
For the alternative 2 product, it is even frequently negative.
Table 6 PVFP and TVOG for base case (as percentage of the present value of premium income)
Traditional product (%) Alternative 1 (%) Alternative 2 (%)
PVFP 3.63 4.24 4.25
TVOG 0.63 0.02 0.01
Apart from the higher profitability, the alternative contract designs also result
in a lower capital requirement for interest rate risk. This is illustrated in Table 7,
which displays the PVFP under the interest rate stress and the difference to the basic
level. Compared to the basic level, the PVFP for the traditional product decreases
by 75 %, which corresponds to an SCRint of 2.73 % of the present value of future
premium income. In contrast, the PVFP decreases by only around 40 % for the
alternative contract designs and thus the capital requirement is only 1.66 and 1.65 %,
respectively.
We have seen that a change in the type of guarantee results in a significant increase
of the PVFP. Further analyses show that a traditional product with guaranteed interest
rate i = 0.9 % instead of 1.75 % would have the same PVFP (i.e., 4.25 %) as the
alternative contract designs with i p = 1.75 %. Hence, although changing only the
type of guarantee and leaving the level of guarantee intact might be perceived as a
rather small product modification by the policyholder, it has the same effect on the
insurer’s profitability as reducing the level of guarantee by a significant amount.
Furthermore, our results indicate that even in an adverse capital market situation
the alternative product designs may still provide an acceptable level of profitability:
The profitability of the modified products if interest rates were 50 basis points lower
roughly coincides with the profitability of the traditional product in the base case.
Table 7 PVFP for stress level and PVFP difference between basic and stress level
Traditional product (%) Alternative 1 (%) Alternative 2 (%)
PVFP(basic) 3.63 4.24 4.25
PVFP(stress) 0.90 2.58 2.60
ΔPVFP 2.73 1.66 1.65
In order to assess the robustness of the results presented in the previous section, we
investigate three different sensitivities:
1. Interest rate sensitivity: The long-term average θ and initial rate r0 in Table 4 are
replaced by θ = 2.0 %, r0 = 1.5 % for the basic level, and θ = 1.0 %, r0 = 0.5 %
for the stress level.
2. Stock ratio sensitivity: The stock ratio is set to q = 10 % instead of 5 %.
3. Initial buffer sensitivity: The initial bonus reserve $BR_t = AV_t - AR_t$ is doubled
for all contracts.21
The results are given in Table 8.
Interest rate sensitivity If the assumed basic interest rate level is lowered by
100 basis points, the PVFP decreases and the TVOG increases significantly for all
products. In particular, the alternative contract designs now also exhibit a significant
TVOG. This shows that in an adverse capital market situation, the guarantees embedded in the alternative contract designs can also lead to a significant risk for the shareholder and an asymmetric distribution of profits, as illustrated in Fig. 5.
Nevertheless, the alternative contract designs are still much more profitable and less
volatile than the traditional contract design and the changes in PVFP/TVOG are
much less pronounced than for the traditional product: while the TVOG rises from
0.63 to 2.13 %, i.e., by 1.50 pp for the traditional product, it rises by only 0.76 pp
(from 0.02 to 0.78 %) for alternative 1.
As expected, an additional interest rate stress now results in a larger SCRint . For
all product designs, the PVFP after stress is negative and the capital requirement
increases significantly. However, as in the base case (cf. Table 7), the SCRint for
the traditional product is more than one percentage point larger than for the new
products.
Stock ratio sensitivity The stock ratio sensitivity also leads to a decrease of PVFP
and an increase of TVOG for all products. Again, the effect on the PVFP of the
traditional product is much stronger: The profit is roughly cut in half (from 3.63 to
1.80 %), while for the alternative 1 product the reduction is much smaller (from 4.24
to 3.83 %), and even smaller for alternative 2 (from 4.25 to 3.99 %). It is noteworthy
that with a larger stock ratio of q = 10 % the difference between the two alternative
21 The initial book and market values of the assets are increased proportionally to cover this addi-
tional reserve.
Table 8 PVFP, TVOG, PVFP under interest rate stress and ΔPVFP for base case and all sensitivities

                             Traditional product (%)   Alternative 1 (%)   Alternative 2 (%)
Base case
  PVFP                       3.63                      4.24                4.25
  TVOG                       0.63                      0.02                0.01
  PVFP(stress)               0.90                      2.58                2.60
  ΔPVFP                      2.73                      1.66                1.65
Interest rate sensitivity
  PVFP                       0.90                      2.58                2.60
  TVOG                       2.13                      0.78                0.76
  PVFP(stress)              −4.66                     −1.81               −1.76
  ΔPVFP                      5.56                      4.39                4.36
Stock ratio sensitivity
  PVFP                       1.80                      3.83                3.99
  TVOG                       2.45                      0.43                0.26
  PVFP(stress)              −1.43                      1.65                1.92
  ΔPVFP                      3.23                      2.18                2.07
Initial buffer sensitivity
  PVFP                       3.74                      4.39                4.39
  TVOG                       0.64                     <0.01               <0.01
  PVFP(stress)               1.02                      2.87                2.91
  ΔPVFP                      2.72                      1.52                1.48
Fig. 5 Histogram of PVFP(n) for interest rate sensitivity (−100 basis points)
motivation in Sect. 2: For the alternative products, larger surpluses from previous
years reduce risk in future years.22 Furthermore, the stressed PVFPs imply that the
decrease of capital requirement is significantly larger for the alternative products:
0.14 % reduction (from 1.66 to 1.52 %) for alternative 1 and 0.17 % reduction (from
1.65 to 1.48 %) for alternative 2, compared to just 0.01 % reduction for the traditional
product.
So far we have only considered contracts with a different type of guarantee. We will
now analyze contracts with a lower level of guarantee, i.e., products where i p < ir .
If we apply a pricing rate of i p = 1.25 % instead of 1.75 %, the annual premium
required to achieve the same guaranteed maturity benefit rises by approx. 5.4 %,
which results in an additional initial buffer for this contract design. For the sake of
comparison, we also calculate the results for the traditional product with a lower
guaranteed interest rate i = 1.25 %. The respective portfolios at t = 0 are derived
using the assumptions described in Sect. 4.1.
The results are presented in Table 9. We can see that the PVFP is further increased
and the TVOG is very close to 0 for the modified alternative products, which implies
an almost symmetric distribution of the PVFP. The TVOG can even become slightly
negative due to the additional buffer in all scenarios. Although the risk situation for
the traditional product is also improved significantly due to the lower guarantee, the
22 From this, we can conclude that if such alternative products had been sold in the past, the risk
situation of the life insurance industry would be significantly better today in spite of the rather high
nominal maturity guarantees for products sold in the past.
Table 9 PVFP, TVOG, PVFP under interest rate stress and ΔPVFP for the alternative products with lower pricing rate

               Traditional   Alternative 1   Alternative 2   Traditional     Alternative 1      Alternative 2
               product (%)   (%)             (%)             i = 1.25 (%)    i_p = 1.25 (%)     i_p = 1.25 (%)
PVFP           3.63          4.24            4.25            4.12            4.31               4.31
TVOG           0.63          0.02            0.01            0.14            −0.05              −0.05
PVFP(stress)   0.90          2.58            2.60            2.43            3.28               3.32
ΔPVFP          2.73          1.66            1.65            1.69            1.03               0.99
alternative products can still preserve their advantages. A more remarkable effect
can be seen for the SCRint , which amounts to 1.03 and 0.99 % for the alternative
products 1 and 2, respectively, compared to 1.69 % for the traditional product. Hence,
the buffer leads to a significant additional reduction of solvency capital requirements
for the alternative products meaning that these are less affected by interest rate risk.
In this paper, we have analyzed different product designs for traditional participating
life insurance contracts with a guaranteed maturity benefit. A particular focus of our
analysis was on the impact of product design on capital requirements under risk-based
solvency frameworks such as Solvency II and on the insurer’s profitability.
We have performed a market consistent valuation of the different products and
have analyzed the key drivers of Capital Efficiency, particularly the value of the
embedded options and guarantees and the insurer’s profitability.
As expected, our results confirm that products with a typical year-to-year guaran-
tee are rather risky for the insurer, and hence result in a rather high capital requirement.
Our proposed product modifications significantly enhance Capital Efficiency, reduce
the insurer’s risk, and increase profitability. Although the design of the modified prod-
ucts makes sure that the policyholder receives less than with the traditional product
only in extreme scenarios, these products still provide a massive relief for the insurer
since extreme scenarios drive the capital requirements under Solvency II and SST.
It is particularly noteworthy that starting from a standard product where the guar-
anteed maturity benefit is based on an interest rate of 1.75 %, changing the type of
the guarantee to our modified products (but leaving the level of guarantee intact) has
the same impact on profitability as reducing the level of guarantee to an interest rate
of 0.9 % and not modifying the type of guarantee. Furthermore, it is remarkable that
the reduction of SCRint from the traditional to the alternative contract design is very
robust throughout our base case as well as all sensitivities and always amounts to
slightly above one percentage point.
We would like to stress that the product design approach presented in this paper
is not model arbitrage (hiding risks in “places the model cannot see”), but a real
reduction of economic risks. In our opinion, such concepts can be highly relevant in
practice if modified products keep the product features that are perceived and desired
by the policyholder, preserve the benefits of intertemporal risk sharing, and remove only those options and guarantees of whose existence policyholders are often not even aware. Similar modifications are also possible for many other old age provision
products like dynamic hybrid products23 or annuity payout products. Therefore, we
expect that the importance of “risk management by product design” will increase.
This is particularly the case since—whenever the same pool of assets is used to back
new and old products—new capital efficient products might even help reduce the
risk resulting from an “old” book of business by reducing the required yield of the
pool of assets.
We, therefore, feel that there is room for additional research: It would be interesting
to analyze similar product modifications for the annuity payout phase. Also—since
many insurers have sold the traditional product in the past—an analysis of a change
in new business strategy might be worthwhile: How would an insurer’s risk and
profitability change and how would the modified products interact with the existing
business if the insurer has an existing (traditional) book of business in place and
starts selling modified products today?
Another interesting question is how the insurer’s optimal strategic asset allocation
changes if modified products are being sold: If typical criteria for determining an
optimal asset allocation are given (e.g., maximizing profitability under the restriction
that some shortfall probability or expected shortfall is not exceeded), then the c.p.
lower risk of the modified products might allow for a more risky asset allocation, and
hence also higher expected profitability for the insurer and higher expected surplus
for the policyholder. So, if this dimension is also considered, the policyholder would
be compensated for the fact that he receives a weaker type of guarantee.
Finally, our analysis so far has disregarded the demand side. If some insurers
keep selling the traditional product type, there should be little demand for the alter-
native product designs with reduced guarantees unless they provide some additional
benefits. Therefore, the insurer might share the reduced cost of capital with the poli-
cyholder, also resulting in higher expected benefits in the alternative product designs.
Since traditional participating life insurance products play a major role in old-age
provision in many countries and since these products have come under strong pressure
in the current interest environment and under risk-based solvency frameworks, the
concept of Capital Efficiency and the analysis of different product designs should be
of high significance for insurers, researchers, and regulators to identify sustainable
life insurance products. In particular, we would hope that legislators and regulators
would embrace sustainable product designs where the insurer’s risk is significantly
reduced, but key product features as perceived and requested by policyholders are
still present.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
22. Kochanski, M., Karnarski, B.: Solvency capital requirement for hybrid products. Eur. Actuar.
J. 1(2), 173–198 (2011)
23. Miltersen, K.R., Persson, S.-A.: Guaranteed investment contracts: distributed and undistributed
excess return. Scand. Actuar. J. 103(4), 257–279 (2003)
24. Oechslin, J., Aubry, O., Aellig, M.: Replicating embedded options. Life Pensions pp. 47–52
(2007)
25. Saxer, W.: Versicherungsmathematik. Springer, Berlin (1955)
26. Seyboth, M.: Der Market Consistent Appraisal Value und seine Anwendung im Rahmen der
wertorientierten Steuerung von Lebensversicherungsunternehmen. PhD thesis, University of
Ulm (2011)
27. Vasicek, O.: An equilibrium characterization of the term structure. J. Financ. Econ. 5(2), 177–
188 (1977)
28. Wolthuis, H.: Life Insurance Mathematics. CAIRE, Brussels (1994)
29. Zaglauer, K., Bauer, D.: Risk-neutral valuation of participating life insurance contracts in a
stochastic interest rate environment. Insur.: Math. Econ. 43(1), 29–40 (2008)
Reducing Surrender Incentives Through Fee
Structure in Variable Annuities
Abstract In this chapter, we study the effect of the fee structure of a variable annuity
on the embedded surrender option. We compare the standard fee structure offered
in the industry (fees set as a fixed percentage of the variable annuity account) with
periodic fees set as a fixed, deterministic amount. Surrender charges are also taken
into account. Under fairly general conditions on the premium payments, surrender
charges and fee schedules, we identify the situation when it is never optimal for the
policyholder to surrender. Solving partial differential equations using finite difference
methods, we present numerical examples that highlight the effect of a combination
of surrender charges and deterministic fees in reducing the value of the surrender
option and raising the optimal surrender boundary.
1 Introduction
amount accumulated in the account. Such surrenders represent an important risk for
VA issuers as the expenses linked to the sale of the policy are typically reimbursed
through the fees collected throughout the duration of the contract. As exposed by
Kling et al. [11], unexpected surrenders also compromise the efficiency of dynamic
hedging strategies.
There are various ways to reduce the incentive to surrender a VA contract with
guarantees. For example, insurance companies usually impose surrender charges,
which reduce the amount available at surrender. Milevsky and Salisbury [13] argue
that these charges are necessary for VA contracts to be both hedgeable and marketable.
The design of VA benefits can also discourage policyholders from surrendering. Kling
et al. [11] discuss for example the impact of ratchet options (possibility to reset the
maturity guarantee as the fund value increases) to convince policyholders to keep
the VA alive. Yet another way to reduce the incentive to surrender can be to modify
the way fees are paid from the VA account. As explained above, the typical constant
percentage fee structure leads to a mismatch between the fee paid and the value of
the financial guarantee, which can discourage the policyholder from staying in the
contract.1 By reducing the fee paid when the value of the financial guarantee is low, it
is possible to reduce the value of the real option to surrender embedded in a VA. The
new fee structure can take different forms. For example, Bernard et al. [2] suggest setting a threshold account value above which no fee is paid. This is shown to modify
the rational policyholder’s surrender incentive. In this paper, we explore another fee
structure so that part of (or all) the fee is paid as a deterministic periodic amount. The
intuition behind this fee structure is that the amount will represent a lower percentage
of the account value as the value of the financial guarantee decreases. This will affect
the surrender incentive, and reduce the additional value created by the possibility to
surrender the contract.
To explore the effect of the deterministic fee amount on the surrender incentive, we
consider a VA with a simple GMAB. We assume that the total fee withdrawn from the
VA account throughout the term of the contract is set as the sum of a fixed percentage
c of the account value, and a deterministic, pre-determined amount pt at time t (in
other words, the deterministic amount does not need to be constant).2 Our paper
constitutes a significant extension of the results obtained on the optimal surrender
strategy for a fee set as a fixed percentage of the fund [4], since the deterministic fee
structure increases the complexity of the dynamics of the VA account value. For this
reason, we need to resort to PDE methods to obtain the optimal surrender strategy
1 Specifically, the policyholder has the option to surrender the contract and to receive a “surrender
benefit”, which can be more valuable than the contract itself. This additional value, as well as the
optimal surrender strategy, is explored and quantified by Bernard, MacKay, and Muehlbeyer in [4]
in the case when the fees are paid as a percentage of the underlying fund.
2 Note that the deterministic amount component of the fee can be interpreted as a variable percentage
of the account value Ft . In fact, let ρ denote the percentage of the fund value that yields the same
fee amount as the deterministic amount pt . Then, ρ is a function of time and of the fund value Ft ,
and can be computed as ρ(t, Ft ) = pt /Ft . Then, ρ(t, Ft )Ft = pt is the fee paid at time t.
when a portion of the fee is set as a deterministic amount. This paper also extends the
work done on state-dependent fee structures, since Bernard et al. [2] do not quantify
the reduction in the surrender incentive resulting from the new fee structure.
Throughout the paper, our main goal is to investigate the impact of the deter-
ministic fee amount on the value of the surrender option. In Sect. 2, we describe the
model and the VA contract. Section 3 introduces a theoretical result and discusses
the valuation of the surrender option. Numerical examples are presented in Sect. 4, and Sect. 5 concludes.
Consider a market with a bank account yielding a constant risk-free rate r and an
index evolving as in the Black-Scholes model so that
$$\frac{dS_t}{S_t} = r\,dt + \sigma\,dW_t,$$
under the risk-neutral measure Q, where σ > 0 is the constant instantaneous volatility of the index. Let $\mathcal{F}_t$ be the natural filtration associated with the Brownian motion $W_t$.
In this paper, we use a Black-Scholes setting since its simplicity allows us to
compute prices explicitly, and thus to study the surrender incentive precisely. More
realistic market models could be considered, but resorting to Monte Carlo methods or
more advanced numerical methods would be required. Since the focus of this paper is
on the surrender incentive, we believe that the Black-Scholes model’s approximation
of market dynamics is sufficient to provide insight on the effect of the deterministic
amount fee structure.
common in variable annuities but they are regularly neglected in the literature and
most academic research focuses on the single premium case as it is simpler. When
additional contributions can be made to the account throughout time, VAs are called
Flexible Premium Variable Annuities (FPVAs). Chi and Lin [7] provide examples
of such VAs where the policyholder is given the choice between a single premium
and a periodic monthly payment in addition to some initial lump sum. Analytical
formulae for the value of such contracts can be found in [8, 10]. In the first part of
this chapter, we show how flexible premium payments influence the surrender value.
We assume that all premiums paid at 0 and at later times t are invested in the fund.
All fees (percentage or fixed fees) are taken from the fund. We need to model the
dynamics of the fund. Our approach is inspired by Chi and Lin [7]. For the sake of
simplicity, we assume that all cash flows happen in continuous time, so that a fixed
payment of A at time 1 (say, end of the year) is similar to a payment made continuously
over the interval [0, 1]. Due to the presence of a risk-free rate r, an amount paid at time 1 equal to A is equivalent to an instantaneous contribution of $a_t\,dt$ at any time $t \in (0, 1]$, so that the annual amount paid per year is
$$A = \int_0^1 a_t\, e^{r(1-t)}\,dt.$$
By abuse of notation, if $a_t$ is constant over the year, we will write that $a_t$ is the annual rate of contribution per year (although there is no compounding effect).
Specifically, the dynamics of the fund can be written as
$$dF_t = F_t\big((r - c)\,dt + \sigma\,dW_t\big) + (a_t - p_t)\,dt,$$
with $F_0 = P_0$, and where $F_t$ denotes the value of the fund at time $t$, $a_t$ is the annual rate of contributions, $c$ is the annual rate of fees, and $p_t$ is the annual amount of fee to pay for the options. Similarly as in [7], it is straightforward to show that
$$F_t = F_0\, e^{(r-c-\frac{\sigma^2}{2})t + \sigma W_t} + \int_0^t (a_s - p_s)\, e^{(r-c-\frac{\sigma^2}{2})(t-s) + \sigma(W_t - W_s)}\,ds, \qquad t \ge 0,$$
that is,
$$F_t = S_t\, e^{-ct} + \int_0^t (a_s - p_s)\, e^{-c(t-s)}\, \frac{S_t}{S_s}\,ds, \qquad (1)$$
or, writing $b_s := a_s - p_s$,
$$F_t = S_t\, e^{-ct} + \int_0^t b_s\, e^{-c(t-s)}\, \frac{S_t}{S_s}\,ds. \qquad (2)$$
Note that $b_s$ can be negative, for example in the single premium case, or if the regular premiums are very low. We will split $b_s$ into contributions $a_s$ and deterministic fees $p_s$ when it is needed for the interpretation of the results.
This formulation can be seen as an extension of the case studied in [7], where it
is assumed that a constant contribution parameter at = a for all t and there is no
periodic fees, so that pt = 0. It is clear from (2) that the fund value becomes path-
dependent and involves a continuous arithmetic average. Without loss of generality,
let F0 = S0 .
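The representation (2) corresponds to the dynamics $dF_t = F_t((r-c)\,dt + \sigma\,dW_t) + (a_t - p_t)\,dt$. As a quick illustration, the following sketch simulates one Euler path of the fund value; it is our own illustrative code, not part of the chapter, and the parameter values (e.g., p = 0.75) are placeholders rather than the fair fees computed later.

```python
import numpy as np

def simulate_fund(T=10.0, n_steps=2000, r=0.03, sigma=0.2,
                  c=0.01, p=0.75, a=0.0, F0=100.0, seed=1):
    """One Euler path of dF = F((r - c)dt + sigma dW) + (a - p)dt,
    i.e., the fund of Eq. (2) with percentage fee c, deterministic
    fee rate p and contribution rate a (all placeholder values)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    F = np.empty(n_steps + 1)
    F[0] = F0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        F[k + 1] = F[k] * (1 + (r - c) * dt + sigma * dW) + (a - p) * dt
    return F

if __name__ == "__main__":
    print("terminal fund value:", round(simulate_fund()[-1], 2))
```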
2.2 Benefits
We assume that there is a guaranteed minimum accumulation rate g < r on all the contributions of the policyholder until time t, so that the accumulated guaranteed benefit $G_t$ at time t has dynamics
$$dG_t = g\,G_t\,dt + a_t\,dt.$$
When the annual rate of contribution is constant ($a_t = a$), the guaranteed value can be simplified to
$$G_t = P_0\, e^{gt} + a\left(\frac{e^{gt} - 1}{g}\,\mathbf{1}_{\{g>0\}} + t\,\mathbf{1}_{\{g=0\}}\right).$$
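For completeness, a small helper (our own, purely illustrative, with placeholder parameter values) that evaluates this accumulated guarantee for a constant contribution rate:

```python
import numpy as np

def guaranteed_benefit(t, P0=100.0, a=0.0, g=0.0):
    """Accumulated guarantee G_t for a constant contribution rate a
    (illustrative helper; parameter values are placeholders)."""
    if g > 0:
        return P0 * np.exp(g * t) + a * (np.exp(g * t) - 1.0) / g
    return P0 + a * t   # g = 0: no roll-up, just premium plus contributions

print(guaranteed_benefit(10.0))   # 100.0 with g = 0 and a = 0
```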
Chi and Lin [7] develop techniques to price and hedge the guarantee at time t. Using
their numerical approach it is possible to estimate the fair fee for the European VA
(Proposition 3 in their paper).
As in [4, 13], we assume that the policyholder has the option to surrender the
policy at any time t and to receive a surrender benefit at surrender time equal to
(1 − κt )Ft
on surrender. In another example, “the surrender charge is 7 % during the first Con-
tract Year and decreases by 1 % each subsequent Contract Year. No surrender charge
is deducted for surrenders occurring in Contract Years 8 and later” [17].
In this section, we discuss the valuation of the variable annuity contract with maturity
benefit and surrender option.3 We first present a sufficient condition to eliminate the
possibility of optimal surrender. We then explain how we evaluate the value of the
surrender option using partial differential equations (PDEs). We consider a variable
annuity contract with maturity benefit only, which can be surrendered. We choose to
ignore the death benefits that are typically added to that type of contract since our
goal is to analyze the effect of the fee structure on the value of the surrender option.
We denote by υ(t, Ft ) and V (t, Ft ) the value of the contract without and with sur-
render option, respectively. In this paper, we ignore death benefits and assume that
the policyholder survives to maturity.4 Thus, the value of the contract without the
surrender option is simply the risk-neutral expectation of the payoff at maturity,
conditional on the filtration up to time t.
We assume that the difference between the value of the maturity benefit and the
full contract is only attributable to the surrender option, which we denote by e(t, Ft ).
Then, we have the following decomposition:
$$V(t, F_t) = \upsilon(t, F_t) + e(t, F_t).$$
The value of the contract with surrender option is calculated assuming that the
policyholder surrenders optimally. This means that the contract is surrendered as
soon as its value drops below the value of the surrender benefit. To express the total
value of the variable annuity contract, we must introduce further notation. We denote
by Tt the set of all stopping times τ greater than t and bounded by T . Then, we can
express the continuation value of the VA contract as
3 In this paper, we quantify the value added by the possibility for the policyholder to surrender
his policy. We call it the surrender option, as in [13]. It is not a guarantee that can be added to the
variable annuity, but rather a real option created by the fact that the contract can be surrendered.
4 See [2] for instance for a treatment on how to incorporate mortality benefits.
where
$$\psi(t, x) = \begin{cases} (1-\kappa_t)\,x, & \text{if } t \in (0, T), \\ \max(G_T, x), & \text{if } t = T \end{cases}$$
is the payoff of the contract at surrender or maturity. Finally, we let St be the optimal
surrender region at time t ∈ [0, T ]. The optimal surrender region is given by the
fund values for which the surrender benefit is worth more than the VA contract if the
policyholder continues to hold it for at least a small amount of time. Mathematically
speaking, it is defined by
The complement of the optimal surrender region St will be referred to as the con-
tinuation region. We also define Bt , the optimal surrender boundary at time t, by
$$B_t = \inf_{F_t \in [0,\infty)} \{F_t \in \mathcal{S}_t\}.$$
$$F_t = e^{-ct}\, S_t + \int_0^t b_s\, e^{-c(t-s)}\, \frac{S_t}{S_s}\,ds, \qquad t \ge 0,$$
and
$$F_{t+dt} = e^{-c(t+dt)}\, S_{t+dt} + \int_0^{t+dt} b_s\, e^{-c(t+dt-s)}\, \frac{S_{t+dt}}{S_s}\,ds.$$
Proposition 3.1 (Sufficient condition for no surrender) For a fixed time $t \in [0, T]$, a sufficient condition to eliminate the surrender incentive at time $t$ is given by
$$\big(\kappa_t' + (1-\kappa_t)\,c\big)\,F_t < b_t\,(1-\kappa_t), \qquad (5)$$
where $\kappa_t' = \partial \kappa_t / \partial t$. Here are some special cases of interest:
• When $a_t = p_t = 0$ (no periodic investment, no periodic fee) and $\kappa_t = 1 - e^{-\kappa(T-t)}$ (situation considered by [4]), then $b_t = 0$ and (5) becomes
$$\kappa > c.$$
• When $a_t = 0$ (no periodic investment, i.e., a single lump sum paid at time 0), then $b_t = -p_t \le 0$. Assume that $p_t > 0$ so that $b_t < 0$; thus
– If $\kappa_t' + (1-\kappa_t)c > 0$ (for example if $\kappa_t$ is constant), then the condition can never be satisfied and no conclusion can be drawn.
– If $\kappa_t' + (1-\kappa_t)c < 0$, then it is not optimal to surrender when
$$F_t > \frac{-p_t\,(1-\kappa_t)}{\kappa_t' + (1-\kappa_t)\,c}.$$
When $\kappa_t = \kappa$ and $b_t = b$ are constant over time, condition (5) can be rewritten as
$$F_t < \frac{b\,(1-\kappa)}{c\,(1-\kappa)} = \frac{b}{c}.$$
Remark 3.1 Proposition 3.1 shows that in the absence of periodic fees and investment, an insurer can easily ensure that it is never optimal to surrender by choosing a surrender charge equal to $1 - e^{-\kappa(T-t)}$ at time $t$, with a penalty parameter $\kappa$ higher than the percentage fee $c$. Proposition 3.1 shows that it is also possible to eliminate the surrender incentive when there are periodic fees and investment opportunities, but the conditions are more complicated.
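To make the proposition concrete, the following sketch (our own, purely illustrative) evaluates the sufficient no-surrender condition of Proposition 3.1 for the exponential surrender charge and the single-premium case with a constant deterministic fee p; with p = 0 it reduces to the comparison κ > c.

```python
import math

def no_surrender_sufficient(F, t, T, c, p, kappa):
    """Check the sufficient no-surrender condition of Proposition 3.1,
    (kappa_t' + (1 - kappa_t) c) F_t < b_t (1 - kappa_t),
    for the exponential surrender charge kappa_t = 1 - exp(-kappa (T - t)),
    a single initial premium (a_t = 0) and a constant deterministic fee p,
    so that b_t = -p.  Illustrative sketch only."""
    kappa_t = 1.0 - math.exp(-kappa * (T - t))
    dkappa_t = -kappa * math.exp(-kappa * (T - t))   # derivative of kappa_t in t
    b_t = -p
    return (dkappa_t + (1.0 - kappa_t) * c) * F < b_t * (1.0 - kappa_t)

# With p = 0 the condition holds for every F_t exactly when kappa > c:
print(no_surrender_sufficient(F=120.0, t=2.0, T=10.0, c=0.01, p=0.0, kappa=0.02))  # True
print(no_surrender_sufficient(F=120.0, t=2.0, T=10.0, c=0.03, p=0.0, kappa=0.02))  # False
```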
Proof Consider a time t at which it is optimal to surrender. This implies that for any
time interval of length dt > 0, it is better to surrender at time t than to wait until
time t + dt. In other words, the surrender benefit at time t must be at least equal to
the expected discounted value of the contract at time t + dt, and in particular larger
than the surrender benefit at time t + dt. Thus
$$(1-\kappa_t)\,F_t \;\ge\; E\big[e^{-r\,dt}\,(1-\kappa_{t+dt})\,F_{t+dt}\,\big|\,\mathcal{F}_t\big].$$
Using the martingale property for the discounted stock price $S_t$ and the independence of increments for the Brownian motion, we know that $E[S_{t+dt}\, e^{-r\,dt}] = S_t$ and
$$E\!\left[\frac{S_{t+dt}}{S_t}\,\Big|\,\mathcal{F}_t\right] = E\!\left[\frac{S_{t+dt}}{S_t}\right] = e^{r\,dt},$$
thus
$$E\big[e^{-r\,dt} F_{t+dt}\,\big|\,\mathcal{F}_t\big] = e^{-c(t+dt)} S_t + \int_0^t b_s\, e^{-c(t+dt-s)}\, \frac{S_t}{S_s}\,ds + \int_t^{t+dt} b_s\, e^{-c(t+dt-s)}\, e^{-r\,dt}\, E\!\left[\frac{S_{t+dt}}{S_s}\right] ds$$
$$\phantom{E\big[e^{-r\,dt} F_{t+dt}\,\big|\,\mathcal{F}_t\big]} = e^{-c(t+dt)} S_t + \int_0^t b_s\, e^{-c(t+dt-s)}\, \frac{S_t}{S_s}\,ds + \int_t^{t+dt} b_s\, e^{-c(t+dt-s)}\,ds,$$
Thus
$$(1-\kappa_t)\,F_t \;\ge\; (1-\kappa_{t+dt})\left(e^{-c(t+dt)} S_t + \int_0^t b_s\, e^{-c(t+dt-s)}\,\frac{S_t}{S_s}\,ds + \int_t^{t+dt} b_s\, e^{-c(t+dt-s)}\,ds\right).$$
We then use $\kappa_{t+dt} = \kappa_t + \kappa_t'\,dt + o(dt)$, $e^{-c\,dt} = 1 - c\,dt + o(dt)$ and $\int_t^{t+dt} b_s\, e^{-c(t-s)}\,ds = b_t\,dt + o(dt)$ to obtain
$$0 \;\ge\; \big(b_t\,(1-\kappa_t) - (\kappa_t' + (1-\kappa_t)\,c)\,F_t\big)\,dt + j(dt), \qquad (7)$$
where the function j (dt) is o(dt). Since this holds for any dt > 0, we can divide (7)
by dt and take the limit as dt → 0. Then, we get that if it is optimal to surrender the
contract at time t, then
$$(\kappa_t' + (1-\kappa_t)\,c)\,F_t \;\ge\; b_t\,(1-\kappa_t).$$
To evaluate the surrender option e(t, F_t), we subtract the value of the maturity benefit from the value of the VA contract. These two values can be compared to a European and an American option, respectively: the maturity benefit is only triggered when the contract expires, while the contract with the surrender option can be exercised at any time before maturity.
From now on, we assume that the deterministic fee pt is constant over time, so that
pt = p for any time t. We also assume that the policyholder makes no contribution
after the initial premium (so that at = 0 for any t).
It is well-known5 that the value of a European contingent claim on the fund value $F_t$ satisfies the following PDE:
$$\frac{\partial \upsilon}{\partial t} + \frac{1}{2}\, F_t^2\, \sigma^2\, \frac{\partial^2 \upsilon}{\partial F_t^2} + \big(F_t(r - c) - p\big)\, \frac{\partial \upsilon}{\partial F_t} - r\,\upsilon = 0. \qquad (8)$$
Note that Eq. (8) is very similar to the Black-Scholes equation for a contingent claim on a stock that pays dividends (here, the constant fee c represents the dividends), with the addition of the term $p\,\frac{\partial \upsilon}{\partial F_t}$ resulting from the presence of a deterministic fee.
Since it represents the contract described in Sect. 2, Eq. (8) is subject to the following
conditions:
$$\upsilon(T, F_T) = \max(G_T, F_T), \qquad \lim_{F_t \to 0} \upsilon(t, F_t) = G_T\, e^{-r(T-t)}.$$
The last condition results from the fact that when the fund value is very low, the
guarantee is certain to be triggered. When Ft → ∞, the problem is unbounded.
However, we have the following asymptotic behavior:
$$\upsilon(t, F_t) \sim E\big[e^{-r(T-t)}\, F_T\,\big|\, \mathcal{F}_t\big] \quad \text{as } F_t \to \infty, \qquad (9)$$
which stems from the value of the guarantee approaching 0 for very high fund values.
We will use this asymptotic result to solve the PDE numerically, when truncating the
grid of values for Ft . The expectation in (9) is easily calculated and is given in the
proof of Proposition 3.1.
As is the case for the American put option,6 the VA contract with surrender option
gives rise to a free boundary problem. In the continuation region, V ∗ (t, Ft ) follows
Eq. (8), the same equation as for the contract without surrender option. However, in
the optimal surrender region, the value of the contract with surrender is the value of
the surrender benefit:
For the contract with surrender, the PDE to solve is thus subject to the following
conditions:
$$V^*(T, F_T) = \max(G_T, F_T), \qquad \lim_{F_t \to 0} V^*(t, F_t) = G_T\, e^{-r(T-t)},$$
$$\lim_{F_t \to B_t} V^*(t, F_t) = \psi(t, B_t), \qquad \lim_{F_t \to B_t} \frac{\partial}{\partial F_t} V^*(t, F_t) = 1 - \kappa_t.$$
For any time t ∈ [0, T ], the value of the VA with surrender is given by
4 Numerical Example
To price the VA using a PDE approach, we modify Eq. (8) to express it in terms of $x_t = \log F_t$. We discretize the resulting equation over a rectangular grid with time steps dt = 0.0001 (dt = 0.0002 for T = 15) and $dx = \sigma\sqrt{3\,dt}$ (following suggestions by Racicot and Théoret [16]), from 0 to T in t and from 0 to log 450 in x. We use an explicit scheme with central differences in x and in $x^2$.
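The sketch below illustrates such a scheme for the maturity-benefit value υ of PDE (8), rewritten in x = log F_t. It is our own illustrative implementation rather than the authors' code; the grid sizes, the upper-boundary treatment and the parameter values (e.g., p, G_T) are simplifying assumptions.

```python
import numpy as np

def maturity_benefit_value(T=10.0, r=0.03, sigma=0.2, c=0.01, p=0.744,
                           G_T=100.0, n_t=20000, n_x=400):
    """Explicit finite-difference sketch for the maturity-benefit value of
    PDE (8), rewritten in x = log F_t:
        u_t + 0.5*sigma^2*u_xx + (r - c - 0.5*sigma^2 - p*e^{-x})*u_x - r*u = 0,
    with terminal condition u(T, x) = max(G_T, e^x).
    Grid sizes, upper-boundary treatment and parameters are illustrative
    assumptions, not the authors' implementation."""
    dt = T / n_t
    x = np.linspace(0.0, np.log(450.0), n_x)   # grid from F = 1 to F = 450
    dx = x[1] - x[0]
    F = np.exp(x)
    u = np.maximum(G_T, F)                     # payoff at maturity
    drift = r - c - 0.5 * sigma**2 - p / F     # x-drift including the fixed fee
    for k in range(n_t):
        t = T - (k + 1) * dt
        u_x = (u[2:] - u[:-2]) / (2.0 * dx)               # central first difference
        u_xx = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2   # central second difference
        interior = u[1:-1] + dt * (0.5 * sigma**2 * u_xx
                                   + drift[1:-1] * u_x - r * u[1:-1])
        lower = G_T * np.exp(-r * (T - t))                # guarantee certain for low F
        upper = 2.0 * interior[-1] - interior[-2]         # crude linear extrapolation
        u = np.concatenate(([lower], interior, [upper]))
    return x, u

if __name__ == "__main__":
    x, u = maturity_benefit_value()
    print("value at F_0 = 100:", round(float(np.interp(np.log(100.0), x, u)), 2))
```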
Throughout this section, we assume that the contract is priced so that only the
maturity benefit is covered. In other words, we set c and p such that
P0 = υ(t, Ft ), (11)
where P0 denotes the initial premium paid by the policyholder. In this section, when
the fee is set in this manner, we call it the fair fee, even if it does not cover the full
value of the contract. We set the fee in this manner to calculate the value added by
the possibility to surrender.
We now consider variable annuities with the maturity benefit described in Sect. 2.
We assume that the initial premium P0 = 100, that there are no periodic premium
(as = a = 0), that the deterministic fee is constant ( pt = p) and that the guaranteed
roll-up rate is g = 0. We further assume that the surrender charge, if any, is of the
form $\kappa_t = 1 - e^{-\kappa(T-t)}$, and that r = 0.03 and σ = 0.2.
For contracts with and without surrender charge and with maturity 5, 10 and
15 years, the results are presented in Table 1. In each case, the fee levels c and p are
chosen such that P0 = υ(t, Ft ). As a percentage of the initial premium, the fair fee
Table 1 Value of the surrender option in 5-, 10- and 15-year variable annuity contracts for various fee structures and surrender charges (columns: fee rates c and p; value of the surrender option for surrender charge parameter κ)

T = 5      c (%)   p       κ = 0 %   κ = 0.5 %
           0.00    4.150   3.09      2.09
           1.00    2.971   3.32      2.33
           2.00    1.796   3.56      2.57
           3.53    0.000   3.92      2.94

T = 10     c (%)   p       κ = 0 %   κ = 0.5 %
           0.00    2.032   3.07      1.02
           0.50    1.387   3.50      1.46
           1.00    0.744   3.92      1.89
           1.58    0.000   4.43      2.39

T = 15     c (%)   p       κ = 0 %   κ = 0.4 %
           0.00    1.259   2.76      0.23
           0.30    0.842   3.30      0.77
           0.60    0.427   3.84      0.84
           0.91    0.000   4.40      1.86
For the 15-year contract, we lowered the surrender charge parameter to κ = 0.4 % to ensure that
the optimal surrender boundary is always finite
when it is paid as a deterministic amount is higher than the fair constant percentage
fee. In fact, for high fund values, the deterministic fee is lower than the amount paid
when the fee is set as a constant percentage. But when the fund value is low, the
deterministic fee represents a larger proportion of the fund compared to the constant
percentage fee. This higher proportion drags the fund value down and increases the
option value. The effect of each fee structure on the amount collected by the insurer
can explain the difference between the fair fixed percentage and deterministic fees.
The results in Table 1 show that when the fee is set as a fixed amount, the value of
the surrender option is always lower than when the fee is expressed as a percentage
of the fund. When a mix of both types of fees is applied, the value of the surrender
option decreases as the fee set as a percentage of the fund decreases. When the fee
is deterministic, a lower percentage of the fund is paid out when the fund value is
high. Consequently, the fee paid by the policyholder is lower when the guarantee
is worth less, reducing the surrender incentive. This explains why the value of the
surrender option is lower for deterministic fees. This result can be observed both with
and without surrender charges. However, surrender charges decrease the value of the
surrender option, as expected. The effect of using a deterministic amount fee, instead
of a fixed percentage, is even more noticeable when a surrender charge is added. A
lower surrender option value means that the possibility to surrender adds less value
to the contract. In other words, if the contract is priced assuming that policyholders
do not surrender, unexpected surrenders will result in a smaller loss, on average.
Figure 1 shows the optimal surrender boundaries for the fee structures presented
in Table 1 for 10-year contracts. As expected, the optimal boundaries are higher
when there is a surrender charge. Those charges are put in place in part to discourage
policyholders from surrendering early. The boundaries are also less sensitive to the
fee structure when there is a surrender charge. In fact, when there is a surrender
charge, setting the fee as a fixed amount leads to a higher optimal boundary during
most of the contract. This highlights the advantage of the fixed amount fee structure
combined with surrender charges. Without those charges, the fixed fee amount could
lead to more surrenders. We also note that the limiting case p = 0 corresponds to
Fig. 1 Optimal surrender boundaries (fund value against time in years) for the 10-year contracts of Table 1. Left panel: κ = 0; right panel: κ = 0.005. Curves shown for (c = 0, p = 2.0321), (c = 0.0050, p = 1.3875), (c = 0.0100, p = 0.7443) and (c = 0.0158, p = 0)
the situation when fees are paid as a percentage of the fund. The optimal boundary
obtained using the PDE approach in this paper coincides with the optimal boundary
derived in [4] by solving an integral equation numerically.
Table 1 also shows the effect of the maturity combined with the fee structure on
the surrender option. For all maturities, setting the fee as a fixed amount instead of
a fixed percentage has a significant effect on the value of the surrender option. This
effect is amplified for longer maturities. As for the 10-year contract, combining the
fixed amount fee with a surrender charge further reduces the value of the surrender
option, especially when T = 15. The optimal surrender boundaries for different fee
Fig. 2 Optimal surrender boundaries (fund value against time in years) for the 15-year contracts. Left panel: κ = 0; right panel: κ = 0.004
structures when T = 15 are presented in Fig. 2. For longer maturities such as this
one, the combination of surrender charges and deterministic fee raises the surrender
boundary more significantly.
In all cases, the decrease in the value of the surrender option caused by the com-
bination of a deterministic amount fee and a surrender charge is significant. In our
example with a 15-year contract, moving from a fee entirely set as a fixed percentage
to a fee set as a deterministic amount reduces the value of the surrender option by
over 85 %. This is surprising since the shift in the optimal surrender boundary is
not as significant (as can be observed in Figs. 1 and 2). A possible explanation for
the sharp decrease in the surrender option value is that the fee income lost when a policyholder surrenders at a high account value is less important, relative to the value of the guarantee, than in the constant percentage fee case.
5 Concluding Remarks
In this chapter, the maturity guarantee fees are paid during the term of the contract as
a series of deterministic amounts instead of a percentage of the fund, which is more
common in the industry. We give a sufficient condition that allows the elimination
of optimal surrender incentives for variable annuity contracts with fairly general fee
structures. We also show how deterministic fees and surrender charges affect the
value of the surrender option and the optimal surrender boundary. In particular, we
highlight the efficiency of combining deterministic fees and exponential surrender
charges in decreasing the value of the surrender option. In fact, although the optimal
surrender boundary remains at a similar level, a fee set as a deterministic amount
reduces the value of the surrender option, which makes the contract less risky for the
insurer. This result also suggests that the state-dependent fee suggested in [2] could
also be efficient in reducing the optimal surrender incentive. Future work could focus
on more general payouts (see for example [12] for ratchet and lookback options and [4] for Asian benefits) in more general market models, and include death benefits.
Acknowledgments Both authors gratefully acknowledge support from the Natural Sciences and
Engineering Research Council of Canada and from the Society of Actuaries Center of Actuarial
Excellence Research Grant. C. Bernard thanks the Humboldt Research Foundation and the hospital-
ity of the chair of mathematical statistics of Technische Universität München where the paper was
completed. A. MacKay also acknowledges the support of the Hickman scholarship of the Society of
Actuaries. We would like to thank Mikhail Krayzler for inspiring this chapter by raising a question
at the Risk Management Reloaded conference in Munich in September 2013 about the impact of
fixed deterministic fees on the surrender boundary.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Bauer, D.K., Kling, A., Russ, J.: A universal pricing framework for guaranteed minimum
benefits in variable annuities. ASTIN Bull. 38(2), 621 (2008)
2. Bernard, C., Hardy, M., MacKay, A.: State-dependent fees for variable annuity guarantees.
ASTIN Bull. (2014) forthcoming
3. Bernard, C., Lemieux, C.: Fast simulation of equity-linked life insurance contracts with a
surrender option. In: Proceedings of the 40th Conference on Winter Simulation, pp. 444–452.
Winter Simulation Conference (2008)
4. Bernard, C., MacKay, A., Muehlbeyer, M.: Optimal surrender policy for variable annuity
guarantees. Insur. Math. Econ. 55(C), 116–128 (2014)
5. Björk, T.: Arbitrage Theory in Continuous Time. Oxford University Press, Oxford (2004)
6. Carr, P., Jarrow, R., Myneni, R.: Alternative characterizations of American put options. Math.
Financ. 2(2), 87–106 (1992)
7. Chi, Y., Lin, S.X.: Are flexible premium variable annuities underpriced? ASTIN Bull. (2013)
forthcoming
8. Costabile, M.: Analytical valuation of periodical premiums for equity-linked policies with
minimum guarantee. Insur. Math. Econ. 53(3), 597–600 (2013)
9. Hardy, M.R.: Investment Guarantees: Modelling and Risk Management for Equity-Linked Life
Insurance. Wiley, New York (2003)
10. Hürlimann, W.: Analytical pricing of the unit-linked endowment with guarantees and periodic
premiums. ASTIN Bull. 40(2), 631 (2010)
11. Kling, A., Ruez, F., Ruß, J.: The impact of policyholder behavior on pricing, hedging, and
hedge efficiency of withdrawal benefit guarantees in variable annuities. Eur. Actuar. J. (2014)
forthcoming
12. Krayzler, M., Zagst, R., Brunner, B.: Closed-form solutions for guaranteed minimum accumu-
lation benefits. Working Paper available at SSRN: http://ssrn.com/abstract=2425801(2013)
13. Milevsky, M.A., Salisbury, T.S.: The real option to lapse a variable annuity: can surrender
charges complete the market. In: Conference Proceedings of the 11th Annual International
AFIR Colloquium (2001)
14. New York Life: “Premier Variable Annuity”, fact sheet, http://www.newyorklife.com/nyl-
internet/file-types/NYL-Premier-VA-Fact-Sheet-No-CA-NY.pdf (2014). Accessed 10 May
2014
15. Palmer, B.: Equity-indexed annuities: fundamental concepts and issues, working Paper (2006)
16. Racicot, F.-E., Théoret, R.: Finance computationnelle et gestion des risques. Presses de
l’Université du Quebec (2006)
17. Thrivent financial: flexible premium deferred variable annuity, prospectus, https://www.
thrivent.com/insurance/autodeploy/Thrivent_VA_Prospectus.pdf (2014). Accessed 12 May
2014
A Variational Approach
for Mean-Variance-Optimal Deterministic
Consumption and Investment
Marcus C. Christiansen
1 Introduction
solutions are given. That means that the general existence of solutions remains
unclear. We fill that gap, allowing for a slightly more general model with non-constant
Black-Scholes market parameters. By applying a Pontryagin maximum principle, we
additionally verify that the sufficient conditions of Christiansen and Steffensen [9]
for optimal controls are actually necessary. Furthermore, we present an alternative
numerical algorithm for the calculation of optimal controls. Therefore, we make use
of the variational idea behind the Pontryagin maximum principle. In a first step,
we define generalized gradients for our objective function, which, in a second step,
allows us to construct a gradient ascent method.
Mean-variance investment is a true classic since the seminal work by Markowitz
[16]. Since then various authors have improved and extended the results, see for exam-
ple Korn and Trautmann [12], Korn [13], Zhou and Li [18], Basak and Chabakauri
[3], Kryger and Steffensen [15], Kronborg and Steffensen [14], Alp and Korn [1],
Björk, Murgoci and Zhou [5] and others.
Deterministic optimal control is fundamental in Herzog et al. [11] and Geering
et al. [10]. But apart from other differences, they disregard income and consumption
and focus on the pure portfolio problem without cash flows. Bäuerle and Rieder
[2] study optimal investment for both adapted stochastic strategies and determin-
istic strategies. They discuss various objectives including mean-variance objectives
under constraints. In the present paper, we discuss an unconstrained mean-variance-
objective and we also control for consumption.
The paper is structured as follows. In Sect. 2, we set up a basic model framework
and specify the optimal consumption and investment problem that we discuss here.
In Sect. 3, we present an existence result for the optimal control. Section 4 derives
necessary conditions for optimal controls by applying a Pontryagin maximum prin-
ciple. Section 5 defines and calculates generalized gradients for the objective, which
helps to set up a numerical optimization algorithm in Sect. 6. In Sect. 7 we illustrate
the numerical algorithm.
where α(t) > r (t) ≥ 0, σ (t) > 0 and α, σ ∈ C([0, T ]). We write π(t) for the
proportion of the total value invested in stocks and call it the investment strategy.
The wealth process X (t) is assumed to be self-financing. Thus, it satisfies
$$dX(t) = X(t)\big(r(t) + (\alpha(t) - r(t))\,\pi(t)\big)\,dt + \big(a(t) - c(t)\big)\,dt + X(t)\,\sigma(t)\,\pi(t)\,dW(t) \qquad (1)$$
with initial value $X(0) = x_0$ and has the explicit representation
$$X(t) = x_0\, e^{\int_0^t dU} + \int_0^t \big(a(s) - c(s)\big)\, e^{\int_s^t dU}\,ds, \qquad (2)$$
where
$$dU(t) = \Big(r(t) + (\alpha(t) - r(t))\,\pi(t) - \tfrac{1}{2}\,\sigma(t)^2\,\pi(t)^2\Big)\,dt + \sigma(t)\,\pi(t)\,dW(t), \qquad U(0) = 0. \qquad (3)$$
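For illustration, the wealth dynamics (1) can be simulated directly for a given deterministic pair (π, c). The sketch below is our own Euler-Maruyama discretization with placeholder inputs; it is not part of the original paper.

```python
import numpy as np

def simulate_wealth(pi, c, a, r, alpha, sigma, x0=200.0, T=20.0,
                    n_steps=2000, n_paths=20000, seed=0):
    """Euler-Maruyama simulation of the wealth SDE (1) for deterministic
    strategies pi(t) and consumption rates c(t).  All inputs are functions
    of time; illustrative sketch only."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0)
    for k in range(n_steps):
        t = k * dt
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        drift = X * (r(t) + (alpha(t) - r(t)) * pi(t)) + (a(t) - c(t))
        X = X + drift * dt + X * sigma(t) * pi(t) * dW
    return X

if __name__ == "__main__":
    # Constant parameters as in the numerical example of Sect. 7 (otherwise placeholders)
    XT = simulate_wealth(pi=lambda t: 0.5, c=lambda t: 80.0, a=lambda t: 100.0,
                         r=lambda t: 0.04, alpha=lambda t: 0.06, sigma=lambda t: 0.2)
    gamma = 0.003
    print("MV_gamma[X(T)] ~", round(XT.mean() - gamma * XT.var(), 2))
```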
It is important to note that the process (X (t))t≥0 depends on the choice of the invest-
ment strategy (π(t))t≥0 and the consumption rate (c(t))t≥0 . In order to make that
dependence more visible, we will also write X = X (π,c) . For some arbitrary but
fixed risk aversion parameter γ > 0 of the investor, we define the risk measure
$$MV_\gamma[\,\cdot\,] := E[\,\cdot\,] - \gamma\, \mathrm{Var}[\,\cdot\,].$$
The aim is to maximize the objective functional
$$G(\pi, c) = \int_0^T e^{-\rho s}\, c(s)\,ds + e^{-\rho T}\, MV_{\gamma e^{-\rho T}}\big[X^{(\pi,c)}(T)\big] \qquad (4)$$
with respect to the investment strategy π and the consumption rate c. The parameter ρ ≥ 0 describes the preference for consuming today instead of tomorrow.
with lower and upper consumption bounds $\underline{c}, \overline{c} \in B([0, T])$. Then, the functional G
is continuous and has a finite upper bound.
The right hand side of (8) is maximal with respect to π(t) for
Recall that we assumed γ > 0 and σ (t) > 0, so the first and second denominator are
never zero. If the third denominator E[X (t)2 ] is zero, we implicitly get E[X (t)] = 0,
and (10) is still true by defining 0/0 := 0. The first line on the right hand side of
(10) has an upper bound of
and the inequalities $(E[X(t)])^2 \le E[X(t)^2]$ and $\mathrm{Var}[X(t)] \le E[X(t)^2]$, we can show that the second line on the right hand side of (10) has an upper bound of
$$\frac{1}{4\gamma}\,\frac{(\alpha(t) - r(t))^2}{\sigma(t)^2}\,\big(1 + 4\gamma\,|E[Y(t)]|\big)\,dt.$$
for some finite positive constants C1 and C2 , since the functions r (t), a(t), α(t) are
uniformly bounded on [0, T], since $-c(t) \le -\underline{c}(t)$ for a uniformly bounded function $\underline{c}$, and since the positive and continuous function σ(t) has a uniform lower bound
greater than zero. Thus, we have E[Y (t)] ≤ g(t) for g(t) defined by the differential
equation
$$dg(t) = \big(C_1\,|g(t)| + C_2\big)\,dt, \qquad g(0) = Y(0) = x_0 > 0. \qquad (12)$$
This differential equation for g(t) has a unique solution, which is bounded on [0, T ]
and does not depend on the choice of (π, c). Hence, also M Vγ [X (T )] = E[Y (T )]
has a finite upper bound that does not depend on the choice of (π, c). The same is
true for the functional (4), since
$$G(\pi, c) \le \int_0^T e^{-\rho s}\, \overline{c}(s)\,ds + e^{-\rho T}\, MV_{\gamma e^{-\rho T}}[X(T)].$$
Now we show the continuity of the functional G. Suppose that (πn , cn )n≥1 is
an arbitrary but fixed sequence in D that converges to (π0 , c0 ) with respect to the
supremum norm. Since D is a Banach space, the limit (π0 , c0 ) is also an element of
D. Let X n (t) := X (πn ,cn ) (t) for all t. As the sequence (πn , cn )n≥1 is convergent and
within D, the absolutes |πn (t)| and |cn (t)| have finite upper bounds, uniformly in n
and uniformly in t. Therefore, analogously to inequality (11), from Eq. (6) we get
that
$$dE[X_n(t)] \le \big(C_3\,|E[X_n(t)]| + C_4\big)\,dt, \qquad t \in [0, T],\; n = 0, 1, 2, \ldots$$
for some positive finite constants C3 and C4 . Arguing analogously to (12), we obtain
that E[X n (t)] ≤ f (t) for some bounded function f (t). Using similar arguments for
−E[X_n(t)], we get that also the absolute value |E[X_n(t)]| is uniformly bounded in n and in t.
Using the uniform boundedness of |E[X n (t)]|, |πn (t)| and |cn (t)|, we can conclude
that
$$dE[X_n(t)^2] \le \big(C_5\, E[X_n(t)^2] + C_6\big)\,dt, \qquad t \in [0, T],\; n = 0, 1, 2, \ldots$$
for some positive finite constants C5 and C6 . Hence, arguing analogously to above,
the value E[X n (t)2 ] is uniformly bounded in n and in t. Let Yn (t) be the process
according to definition (5) but with X n instead of X . Using (8) and the uniform
boundedness of |E[X n (t)]|, E[X n (t)2 ], |πn (t)| and |cn (t)|, we can show that
$$dE[Y_0(t) - Y_n(t)] \le C_7\Big(\sup_{t\in[0,T]} |\pi_0(t) - \pi_n(t)| + \sup_{t\in[0,T]} |c_0(t) - c_n(t)|\Big)\,dt, \qquad t \in [0, T],$$
where we used that Y0 (0) − Yn (0) = x0 − x0 = 0. Arguing similarly for −E[Y0 (t) −
Yn (t)], we can conclude that
$$\begin{aligned} |G(\pi_0, c_0) - G(\pi_n, c_n)| &= \Big|\int_0^T e^{-\rho s}\big(c_0(s) - c_n(s)\big)\,ds + e^{-\rho T}\, MV_{\gamma e^{-\rho T}}[X_0(T)] - e^{-\rho T}\, MV_{\gamma e^{-\rho T}}[X_n(T)]\Big| \\ &\le T \sup_{t\in[0,T]} |c_0(t) - c_n(t)| + e^{-\rho T}\,\big|E[\widetilde{Y}_0(T) - \widetilde{Y}_n(T)]\big| \\ &\le T\, C_8 \sup_{t\in[0,T]} |\pi_0(t) - \pi_n(t)| + 2\,T \sup_{t\in[0,T]} |c_0(t) - c_n(t)| \end{aligned}$$
for some finite constant $C_8$, where the processes $\widetilde{Y}_0(t)$ and $\widetilde{Y}_n(t)$ are defined as above but with γ replaced by $\gamma e^{-\rho T}$. Since we assumed that $(\pi_n, c_n)_{n\ge 1}$ converges in supremum norm, we obtain that $G(\pi_n, c_n)$ converges to $G(\pi_0, c_0)$, i.e., the functional G is continuous.
$$\sup_{(\pi,c)\in D} G(\pi, c)$$
indeed exists. Since G is continuous and D is a Banach space, we can conclude that on each compact subset K of D there exists a pair $(\pi^*, c^*)$ for which
$$G(\pi^*, c^*) = \sup_{(\pi,c)\in K} G(\pi, c).$$
Christiansen and Steffensen [9] identify characterizing equations for optimal invest-
ment and consumption rate by using a Hamilton-Jacobi-Bellman approach. Here, we
show that those characterizing equations are indeed necessary by using a Pontryagin
maximum principle (cf. Bertsekas [4]).
Defining the moment functions
as in Christiansen and Steffensen [9], we can represent the objective function G(π, c)
by
$$G(\pi, c) = \int_0^T e^{-\rho s}\, c(s)\,ds + e^{-\rho T}\big(m_1(t)\,n_1(t) + p_1(t)\big) - \gamma e^{-2\rho T}\Big(m_2(t)\,n_2(t) + 2\,m_1(t)\,k(t) + p_2(t) - \big(m_1(t)\,n_1(t) + p_1(t)\big)^2\Big). \qquad (15)$$
The moment functions $m_1$ and $m_2$ satisfy the ordinary differential equations
$$\frac{d}{dt}\, m_1(t) = \big(r(t) + (\alpha(t) - r(t))\,\pi(t)\big)\, m_1(t) + \big(a(t) - c(t)\big),$$
$$\frac{d}{dt}\, m_2(t) = \big(2\,r(t) + 2\,(\alpha(t) - r(t))\,\pi(t) + \pi(t)^2\,\sigma(t)^2\big)\, m_2(t) + 2\,\big(a(t) - c(t)\big)\, m_1(t). \qquad (16)$$
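The moment ODEs (16) make the objective easy to evaluate numerically for a fixed deterministic control. The following sketch (our own, and assuming the objective takes the form in (4)/(15)) integrates m1 and m2 with an Euler scheme and returns G(π, c); the defaults mirror the numerical example of Sect. 7.

```python
import numpy as np

def moments(pi, c, a, r, alpha, sigma, x0=200.0, T=20.0, n_steps=4000):
    """Euler integration of the moment ODEs (16) for deterministic controls,
    returning m1(T) = E[X(T)] and m2(T) = E[X(T)^2].  Illustrative sketch."""
    dt = T / n_steps
    m1, m2 = x0, x0**2
    for k in range(n_steps):
        t = k * dt
        ex = r(t) + (alpha(t) - r(t)) * pi(t)            # drift rate of X
        dm1 = ex * m1 + (a(t) - c(t))
        dm2 = (2 * ex + pi(t)**2 * sigma(t)**2) * m2 + 2 * (a(t) - c(t)) * m1
        m1, m2 = m1 + dm1 * dt, m2 + dm2 * dt
    return m1, m2

def objective(pi, c, a, r, alpha, sigma, x0=200.0, T=20.0,
              rho=0.1, gamma=0.003, n_steps=4000):
    """Objective G(pi, c) as in (4)/(15): discounted consumption plus the
    discounted mean-variance value of terminal wealth."""
    dt = T / n_steps
    consumption = sum(np.exp(-rho * k * dt) * c(k * dt) * dt for k in range(n_steps))
    m1, m2 = moments(pi, c, a, r, alpha, sigma, x0, T, n_steps)
    var = m2 - m1**2
    return consumption + np.exp(-rho * T) * (m1 - gamma * np.exp(-rho * T) * var)

if __name__ == "__main__":
    G = objective(pi=lambda t: 0.5, c=lambda t: 80.0, a=lambda t: 100.0,
                  r=lambda t: 0.04, alpha=lambda t: 0.06, sigma=lambda t: 0.2)
    print("G(pi, c) =", round(G, 2))
```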
Theorem 2 Let (π ∗ , c∗ ) be an optimal control in the sense of (13), and let m i∗ (t),
pi∗ (t), n i∗ (t), i = 1, 2, and k ∗ (t) be the corresponding moment functions according
to (14). Then, we have necessarily
$$\pi^*(t) = \frac{\alpha(t) - r(t)}{\sigma(t)^2}\left(\frac{e^{\rho T}\, m_1^*(t)\, n_1^*(t) - 2\gamma\, m_1^*(t)\big(k^*(t) - n_1^*(t)\, m_1^*(T)\big)}{2\gamma\, m_2^*(t)\, n_2^*(t)} - 1\right), \qquad (17)$$
$$c^*(t) = \begin{cases} \overline{c}(t) & \text{if } e^{\rho(T-t)} - n_1^*(t) + 2\gamma e^{-\rho T}\big(m_1^*(t)\, n_2^*(t) + k^*(t) - n_1^*(t)\, m_1^*(T)\big) > 0, \\ \underline{c}(t) & \text{else.} \end{cases} \qquad (18)$$
$$\begin{aligned}
G(\pi^*, c^*) - G(\pi^\varepsilon, c^\varepsilon)
= {} & -\int_{t_0-\varepsilon}^{t_0} e^{-\rho s}\, l(s)\,ds
      + e^{-\rho T}\big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)\, n_1^*(t_0) \\
 & - \gamma e^{-2\rho T}\Big[\big(m_2^*(t_0) - m_2^\varepsilon(t_0)\big)\, n_2^*(t_0)
      + 2\big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)\, k^*(t_0) \\
 & \qquad - \big(m_1^*(t_0)^2 - m_1^\varepsilon(t_0)^2\big)\, n_1^*(t_0)^2
      - 2\big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)\, n_1^*(t_0)\, p_1^*(t_0)\Big] \\
= {} & -\int_{t_0-\varepsilon}^{t_0} e^{-\rho s}\, l(s)\,ds
      + \big(m_1^*(t_0) - m_1^\varepsilon(t_0)\big)
        \Big(e^{-\rho T} n_1^*(t_0) - 2\gamma e^{-2\rho T}\big(k^*(t_0) - n_1^*(t_0)\, p_1^*(t_0)\big)\Big) \\
 & + \big(m_1^*(t_0)^2 - m_1^\varepsilon(t_0)^2\big)\, \gamma e^{-2\rho T}\, n_1^*(t_0)^2
   - \big(m_2^*(t_0) - m_2^\varepsilon(t_0)\big)\, \gamma e^{-2\rho T}\, n_2^*(t_0)
\end{aligned} \qquad (19)$$
$$\begin{aligned}
m_1^*(t_0) - m_1^\varepsilon(t_0)
&= \int_{t_0-\varepsilon}^{t_0} \Big(\frac{d}{ds}\, m_1^*(s) - \frac{d}{ds}\, m_1^\varepsilon(s)\Big)\,ds \\
&= \int_{t_0-\varepsilon}^{t_0} \Big[\big(r(s) + (\alpha(s) - r(s))\,\pi^*(s)\big)\, m_1^*(s)
   - \big(r(s) + (\alpha(s) - r(s))\,\pi^\varepsilon(s)\big)\, m_1^\varepsilon(s) + l(s)\Big]\,ds,
\end{aligned}$$
$$m_1^*(t_0) - m_1^\varepsilon(t_0) = -(\alpha(t_0) - r(t_0))\, m_1^*(t_0) \int_{t_0-\varepsilon}^{t_0} h(s)\,ds + \int_{t_0-\varepsilon}^{t_0} l(s)\,ds + o(\varepsilon). \qquad (20)$$
$$m_1^*(t_0)^2 - m_1^\varepsilon(t_0)^2 = -2(\alpha(t_0) - r(t_0))\, m_1^*(t_0)^2 \int_{t_0-\varepsilon}^{t_0} h(s)\,ds + 2\, m_1^*(t_0) \int_{t_0-\varepsilon}^{t_0} l(s)\,ds + o(\varepsilon).$$
$$\begin{aligned}
o(\varepsilon) \le {} & \int_{t_0-\varepsilon}^{t_0} l(s)\,ds\,
  \Big(-e^{-\rho t_0} + e^{-\rho T} n_1^*(t_0)
   - 2\gamma e^{-2\rho T}\big(m_1^*(t_0)\, n_2^*(t_0) + k^*(t_0)
   - n_1^*(t_0)\big(n_1^*(t_0)\, m_1^*(t_0) + p_1^*(t_0)\big)\big)\Big) \\
 & + \int_{t_0-\varepsilon}^{t_0} h(s)\,ds\, \big(\alpha(t_0) - r(t_0)\big)
  \Big(-m_1^*(t_0)\, n_1^*(t_0)\, e^{-\rho T}
   - 2\gamma e^{-2\rho T} m_1^*(t_0)\big(-k^*(t_0) + n_1^*(t_0)\big(n_1^*(t_0)\, m_1^*(t_0) + p_1^*(t_0)\big)\big) \\
 & \qquad - 2\gamma e^{-2\rho T} m_2^*(t_0)\, n_2^*(t_0)\Big(-1 - \pi^*(t_0)\,\frac{\sigma(t_0)^2}{\alpha(t_0) - r(t_0)}\Big)\Big)
\end{aligned}$$
for all continuous functions l and h. Note that n ∗1 (t0 )m ∗1 (t0 ) + p1∗ (t0 ) = m ∗1 (T ).
Consequently, we must have that the sign of l(t0 ) equals the sign of
$$-e^{-\rho t_0} + e^{-\rho T}\, n_1^*(t_0) - 2\gamma e^{-2\rho T}\big(m_1^*(t_0)\, n_2^*(t_0) + k^*(t_0) - n_1^*(t_0)\, m_1^*(T)\big),$$
For differentiable functions on the Euclidean space, a popular method to find maxima
is to use the gradient ascent method. We want to follow that variational concept,
however our objective is a mapping on a functional space. Therefore, we first need
to discuss the definition and calculation of proper gradient functions.
Theorem 3 Let (π, c) ∈ D for D as defined in Theorem 1. For each pair of contin-
uous functions (h, l) on [0, T ], we have
$$\lim_{\delta \to 0} \frac{G(\pi + \delta h, c + \delta l) - G(\pi, c)}{\delta} = \int_0^T h(s)\,\big(\nabla_\pi G(\pi, c)\big)(s)\,ds + \int_0^T l(s)\,\big(\nabla_c G(\pi, c)\big)(s)\,ds$$
with
$$(\nabla_\pi G(\pi, c))(s) = (\alpha(s) - r(s))\Big(m_1(s)\, n_1(s)\, e^{-\rho T} - 2\gamma e^{-2\rho T}\, m_2(s)\, n_2(s)\Big(1 + \pi(s)\,\frac{\sigma(s)^2}{\alpha(s) - r(s)}\Big) - 2\gamma e^{-2\rho T}\, m_1(s)\big(k(s) - n_1(s)\, m_1(T)\big)\Big)$$
and
$$(\nabla_c G(\pi, c))(s) = e^{-\rho s} - e^{-\rho T}\, n_1(s) + 2\gamma e^{-2\rho T}\big(m_1(s)\, n_2(s) + k(s) - n_1(s)\, m_1(T)\big).$$
The limit
$$\lim_{\delta \to 0} \frac{G(\pi + \delta h, c + \delta l) - G(\pi, c)}{\delta} = \frac{d}{d\delta}\, G(\pi + \delta h, c + \delta l)\,\Big|_{\delta = 0}$$
$$G(\pi, c) - G\big(\pi + h\,\mathbf{1}_{(t_0-\varepsilon, t_0]},\, c + l\,\mathbf{1}_{(t_0-\varepsilon, t_0]}\big) = -\big(\nabla_\pi G(\pi, c)\big)(t_0) \int_{t_0-\varepsilon}^{t_0} h(s)\,ds - \big(\nabla_c G(\pi, c)\big)(t_0) \int_{t_0-\varepsilon}^{t_0} l(s)\,ds + o(\varepsilon)$$
for all t0 ∈ [0, T ], (π, c) ∈ D, and h, l ∈ C([0, T ]). Defining an equidistant decom-
position of the interval [0, T ] by
$$\tau_i := \frac{i}{n}\, T, \qquad i = 0, \ldots, n,$$
$$\begin{aligned} G(\pi + \delta h, c + \delta l) - G(\pi, c) = {} & \delta \sum_{i=1}^{n} \big(\nabla_\pi G(\pi + \delta h\,\mathbf{1}_{[0,\tau_{i-1}]},\, c + \delta l\,\mathbf{1}_{[0,\tau_{i-1}]})\big)(\tau_i) \int_{\tau_{i-1}}^{\tau_i} h(s)\,ds \\ & + \delta \sum_{i=1}^{n} \big(\nabla_c G(\pi + \delta h\,\mathbf{1}_{[0,\tau_{i-1}]},\, c + \delta l\,\mathbf{1}_{[0,\tau_{i-1}]})\big)(\tau_i) \int_{\tau_{i-1}}^{\tau_i} l(s)\,ds + \sum_{i=1}^{n} o(T/n), \end{aligned}$$
and thus
$$\frac{G(\pi + \delta h, c + \delta l) - G(\pi, c)}{\delta} = \int_0^T \big(\nabla_\pi G(\pi + \delta h\,\mathbf{1}_{[0,s]},\, c + \delta l\,\mathbf{1}_{[0,s]})\big)(s)\, h(s)\,ds + \int_0^T \big(\nabla_c G(\pi + \delta h\,\mathbf{1}_{[0,s]},\, c + \delta l\,\mathbf{1}_{[0,s]})\big)(s)\, l(s)\,ds.$$
With the help of the gradient function $(\nabla_\pi G(\pi, c), \nabla_c G(\pi, c))$ of the objective G(π, c), we can construct a gradient ascent method. A similar approach is also used in Christiansen [8].
Algorithm
1. Choose a starting control (π (0) , c(0) ).
2. Calculate a new scenario by using the iteration
Fig. 1 Sequence of investment rates $\pi^{(i)}$, i = 0, . . . , 40, calculated by the gradient ascent method. The higher the number i, the darker the color of the corresponding graph
$$(\pi^{(i+1)}, c^{(i+1)}) := (\pi^{(i)}, c^{(i)}) + K\,\big(\nabla_\pi G(\pi^{(i)}, c^{(i)}),\, \nabla_c G(\pi^{(i)}, c^{(i)})\big),$$
where K > 0 is some step size that has to be chosen. If $c^{(i+1)}$ is above or below the bounds $\underline{c}$ and $\overline{c}$, we cut it off at the bounds.
3. Repeat step 2 until G(π (i+1) , c(i+1) ) − G(π (i) , c(i) ) is below some error tol-
erance.
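A minimal sketch of this iteration is given below. The routine grad_G is a hypothetical placeholder for an implementation of the gradient functions of Theorem 3 (e.g., obtained from the moment functions), and G evaluates the objective; both must be supplied by the user, so this is a structural illustration rather than the author's code.

```python
import numpy as np

def gradient_ascent(grad_G, G, pi0, c0, c_low, c_up, K=0.2, tol=1e-8, max_iter=200):
    """Generic gradient ascent on a time grid for the deterministic controls
    (pi, c).  grad_G(pi, c) is assumed to return the arrays (grad_pi, grad_c)
    of Theorem 3 evaluated on the grid; G(pi, c) evaluates the objective.
    Both are placeholders supplied by the caller."""
    pi, c = pi0.copy(), c0.copy()
    value = G(pi, c)
    for i in range(max_iter):
        g_pi, g_c = grad_G(pi, c)
        pi = pi + K * g_pi
        c = np.clip(c + K * g_c, c_low, c_up)   # cut consumption off at the bounds
        new_value = G(pi, c)
        if new_value - value < tol:             # step 3: stop at small improvement
            break
        value = new_value
    return pi, c, value
```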
7 Numerical Example
Here, we demonstrate the gradient ascent method of the previous section with a
numerical example. For simplicity, we fix the consumption rate c and only control
the investment rate π . We take the same parameters as in Christiansen and Steffensen
[9] in order to have comparable results: For the Black-Scholes market we assume
that r = 0.04, α = 0.06 and σ = 0.2. The time horizon is set to T = 20, the initial
wealth is x0 = 200, and the savings rate is a(t) − c(t) = 100 − 80 = 20. The
preference parameter of consuming today instead of tomorrow is set to ρ = 0.1, and
the risk aversion parameter is set to γ = 0.003.
Starting from $\pi^{(0)} = 0.5$, Fig. 1 shows the converging sequence of investment rates $\pi^{(i)}$, i = 0, . . . , 40, for K = 0.2. The last iteration step $\pi^{(40)}$ perfectly fits the
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
We thank Victor DeMiguel, Matthias Scherer, Neal Stoughton, Raman Uppal, Arne West-
erkamp, Rudi Zagst and an anonymous referee for helpful comments.
T. Dangl
Vienna University of Technology, Vienna, Austria
O. Randl · J. Zechner
WU Vienna University of Economics and Business, Vienna, Austria
1 Introduction
In traditional portfolio theory the scope for risk management is limited. Wilson [63]
showed that in the absence of frictions the consumption allocation of each agent in an
efficient equilibrium satisfies a linear sharing rule as long as agents have equi-cautious
HARA utilities. This implies that investors are indifferent between the universe of
securities and having access to only two appropriately defined portfolio positions, a
result that is usually referred to as the Two-Fund Separation Theorem. If a riskless
asset exists, then these two portfolios can be identified as the riskless asset and the
tangency portfolio. Risk management in this traditional portfolio theory is, therefore,
trivial: the portfolio manager only needs to choose the optimal location on the line
that combines the riskless asset with the tangency portfolio, i.e., on the capital market
line. Risk management is thus equivalent to choosing the relative weights that should
be given to the tangency portfolio and to the riskless asset, respectively.
In a more realistic model that allows for frictions, risk management becomes a much more central and complex component of asset management. First, a world with costly information acquisition will feature informational
asymmetries regarding the return moments, as analyzed in the seminal paper by
Grossman and Stiglitz [29]. In this setup, investors generally do not hold the same
portfolio of risky assets and the two fund separation theorem breaks down (see,
e.g., Admati [1]). We will refer to such portfolios as active portfolios. In such a
setup, risk management differs from the simple structure described above for the
traditional portfolio theory. Second, frictions such as costly information acquisition
frequently require delegated portfolio management, whereby an investor transfers
decision power to a portfolio manager. This gives rise to principal-agent conflicts
that may be mitigated by risk monitoring and portfolio risk control. Third, investors
may have nonstandard objective functions. For example, the investor may exhibit
large costs if the end-of-period portfolio value falls below a critical level. This may
be the case, for example, because investors are subject to their own principal-agent
conflicts. Alternatively, investors may be faced with model risk, and thus be unable
to derive probability distributions over possible portfolio outcomes. In such a setting
investors may have nonstandard preferences, such as ambiguity aversion. We will
now discuss each of these deviations from the classical frictionless paradigm and
analyze how it affects portfolio risk management.
If the optimal portfolio differs from the market portfolio, portfolio risk management
becomes a much more complicated and important task for the portfolio manager. For
active portfolios individual positions’ risk contributions are no longer fully deter-
mined by their exposures to systematic risk factors that affect the overall market
portfolio. A position’s contribution to overall portfolio risk must not be measured
Risk Control in Asset Management: Motives and Concepts 241
by the sensitivity to the systematic risk factors, but instead by the sensitivity to the
investor’s portfolio return. For active portfolios the manager must, therefore, cor-
rectly measure each asset’s risk-contribution to the overall portfolio risk and ensure
that it corresponds to the expected return contribution of the asset. We will now
derive a simple framework that a portfolio manager may use to achieve this.
We consider an investor who wishes to maximize his expected utility, E[ũ]. In
this section, we consider the case where the investor exhibits constant absolute risk
aversion with the coefficient of absolute risk aversion denoted by Γ . In the following
derivations, we borrow ideas from Sharpe [61] and assume for convenience that
investment returns and their dispersions are small relative to initial wealth, V0 . Thus,
we can approximate Γ ≈ γ/V0, with γ denoting the investor's relative risk aversion.
This allows for easy translation of the results into the context of later sections, where
we focus on relative risk aversion.1 An expected-utility maximizer with constant
absolute risk aversion solves
\max_{w} E[\tilde u] = \max_{w} E\bigl[-\exp\bigl(-\Gamma V_0 (1 + w'\tilde r)\bigr)\bigr] = \max_{w} E\bigl[-\exp\bigl(-\gamma (1 + w'\tilde r)\bigr)\bigr], \qquad (1)
where w represents the (N × 1) vector of portfolio weights and r̃ is the (N × 1) vector of securities returns. We make the standard assumptions of mean-variance analysis and denote by μ^e the (N × 1) vector of securities' expected returns in excess of the risk-free rate r_f, by σ_p²(w) the portfolio's return variance given weights w, and by Σ the covariance matrix of excess returns. MR = 2Σw constitutes the vector of marginal risk contributions resulting from a marginal increase in the portfolio weight of the respective asset, i.e., MR = ∂σ_p²/∂w, financed against the riskless asset. For each asset i in the portfolio we must, therefore, have

\mu_i^e = \gamma\, e_i' \Sigma w = \tfrac{1}{2}\gamma\, \mathrm{MR}_i, \qquad (2)

which implies

\frac{\mu_i^e}{\mathrm{MR}_i} = \frac{\mu_j^e}{\mathrm{MR}_j} = \tfrac{1}{2}\gamma \quad \forall\, i, j. \qquad (3)
These results show the fundamental difference between risk management for
active and passive portfolios. While in the traditional world of portfolio theory, each
asset’s risk contribution was easily measured by a constant (vector of) beta coef-
ficient(s) to the systematic risk factor(s), the active investor must measure a secu-
rity’s risk contribution by the sensitivity of the asset to the specific portfolio return,
expressed by 2 e_i' Σ w. This expression makes clear that each position's marginal risk contribution depends not only on the covariance matrix Σ, but also on the portfolio
weights, i.e., the chosen vector w. It actually converges to the portfolio variance,
σ 2p , as the security’s weight approaches one. In the case of active portfolios, these
weights are likely to change over time, and so will each position’s marginal risk
contribution. The portfolio manager can no longer observe a position’s relevant risk
characteristics from readily available data providers such as the stock’s beta reported
by Bloomberg, but must calculate the marginal risk contributions based on the port-
folio characteristics. As shown in Eq. (3), a major responsibility of the portfolio risk
manager now is to ensure that the ratios of securities’ expected excess returns over
their marginal risk contribution are equated.
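To make these first-order conditions concrete, the following sketch (Python, with hypothetical numbers for μ^e, Σ, and γ; none of these figures come from the chapter) computes the marginal risk contributions MR = 2Σw for an arbitrary active portfolio and verifies that the mean-variance weights w = (1/γ)Σ^{-1}μ^e equalize the ratios μ_i^e/MR_i at γ/2, as required by Eqs. (2) and (3).

```python
import numpy as np

# Hypothetical inputs: three risky assets, annualized figures
mu_e = np.array([0.04, 0.06, 0.03])          # expected excess returns
Sigma = np.array([[0.040, 0.012, 0.006],
                  [0.012, 0.090, 0.010],
                  [0.006, 0.010, 0.025]])    # covariance matrix of excess returns
gamma = 4.0                                  # relative risk aversion

def marginal_risk(Sigma, w):
    """MR = d(sigma_p^2)/dw = 2 * Sigma * w (financed against the riskless asset)."""
    return 2.0 * Sigma @ w

# For an arbitrary active portfolio the ratios mu_e_i / MR_i generally differ ...
w_arbitrary = np.array([0.5, 0.3, 0.2])
print("ratios (arbitrary w):", mu_e / marginal_risk(Sigma, w_arbitrary))

# ... whereas the mean-variance optimal weights w = (1/gamma) Sigma^{-1} mu_e
# equalize all ratios at gamma/2, as required by Eqs. (2)-(3).
w_opt = np.linalg.solve(Sigma, mu_e) / gamma
print("optimal weights:", w_opt)
print("ratios (optimal w):", mu_e / marginal_risk(Sigma, w_opt))   # all equal gamma/2 = 2.0
```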
Assume that the excess returns of the available funds or managers follow the linear factor model r^e = α + B f^e + ε̃, where
• r^e is the (N × 1) vector of fund or manager returns in excess of the risk-free return,
• B is an (N × K) matrix that denotes the exposure of each of the N assets to the K return factors,
• f^e is a (K × 1) vector of factor excess returns, and
• ε̃ is the error term (independent of f^e).
Let Σ_f denote the covariance matrix of factor excess returns and Σ_ε the covariance matrix of the residuals ε̃. Then, the covariance matrix of managers' excess returns is given by

\Sigma = E\bigl(r^e r^{e\prime}\bigr)
= E\bigl([B f^e + \tilde\varepsilon][B f^e + \tilde\varepsilon]'\bigr)
= E\bigl(B f^e f^{e\prime} B' + \tilde\varepsilon\tilde\varepsilon'\bigr)
= B\,E\bigl(f^e f^{e\prime}\bigr)B' + E\bigl(\tilde\varepsilon\tilde\varepsilon'\bigr)
= B\,\Sigma_f B' + \Sigma_\varepsilon.
Let w denote the N × 1 vector of weights assigned to managers by the CIO, then
the portfolio excess return r ep is given by
r_p^e = w' r^e.
The beta of manager i’s return with respect to the portfolio is then
ei B f B w + ei w
β̃i = .
w (B f B + )w
Thus, we have an orthogonal decomposition of the vector of betas, β̃, into a part
that is due to factor exposure, β̃ S , and a part that is due to the residuals of active
managers (tracking error), β̃ I
B f B w + w B f B w w
β̃ =
= + .
w (B f B + )w w (B f B + )w w (B f B + )w
β̃ S β̃ I
We can now determine the beta of a pure factor excess return f_k^e with respect to the portfolio. With e_k^F denoting the kth column of the (K × K) identity matrix, the covariance between the factor excess return and the portfolio excess return is

\mathrm{Cov}(f_k^e, r_p^e) = \mathrm{Cov}\bigl(e_k^{F\prime} f^e,\; w' B f^e + w'\tilde\varepsilon\bigr) = e_k^{F\prime}\,\Sigma_f B' w,

so that the vector of pure factor betas with respect to the portfolio is

\tilde\beta^F = \frac{\Sigma_f B' w}{w'\bigl(B \Sigma_f B' + \Sigma_\varepsilon\bigr)w}

and β̃^S = B β̃^F, i.e., we can decompose the position's beta into the exposure-weighted betas of the pure factor returns plus the beta of the position's residual return.
Next we can derive the vector of marginal risk contributions of the portfolio positions. Given the factor structure above, the effect of a small change in the portfolio weights w on portfolio risk σ_p² is given by MR:

\tfrac{1}{2}\,\mathrm{MR} = \tfrac{1}{2}\,\frac{\partial}{\partial w}\,w'\Sigma w = \Sigma w = \sigma_p^2\,\tilde\beta = \sigma_p^2\bigl(B\tilde\beta^F + \tilde\beta^I\bigr).

Thus, an individual portfolio position i's marginal risk contribution, MR_i, is given by

\tfrac{1}{2}\,\mathrm{MR}_i = \tfrac{1}{2}\,\frac{\partial}{\partial w_i}\,w'\Sigma w = e_i'\Sigma w = \sigma_p^2\,\tilde\beta_i = \sigma_p^2\bigl((B\tilde\beta^F)_i + \tilde\beta_i^I\bigr). \qquad (5)
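A minimal numerical sketch of this decomposition, assuming hypothetical values for B, Σ_f, Σ_ε, and the weights w assigned by the CIO: it computes β̃, β̃^F, and β̃^I and checks the identities β̃ = Bβ̃^F + β̃^I and ½MR_i = σ_p²β̃_i from Eq. (5).

```python
import numpy as np

# Hypothetical inputs: N = 3 managers, K = 2 return factors
B = np.array([[1.0, 0.0],
              [0.2, 1.0],
              [0.8, 0.5]])                    # (N x K) factor exposures
Sigma_f = np.array([[0.04, 0.01],
                    [0.01, 0.09]])            # (K x K) factor covariance
Sigma_eps = np.diag([0.02, 0.03, 0.01])       # (N x N) residual covariance
w = np.array([0.4, 0.35, 0.25])               # manager weights chosen by the CIO

Sigma = B @ Sigma_f @ B.T + Sigma_eps         # covariance of manager excess returns
var_p = w @ Sigma @ w                         # portfolio variance sigma_p^2

beta_tilde = Sigma @ w / var_p                # betas of managers w.r.t. the portfolio
beta_F = Sigma_f @ B.T @ w / var_p            # betas of the pure factor returns
beta_I = Sigma_eps @ w / var_p                # residual ("tracking error") betas

# Decomposition check: beta_tilde = B beta_F + beta_I
assert np.allclose(beta_tilde, B @ beta_F + beta_I)

# Marginal risk contributions, Eq. (5): 0.5 * MR_i = sigma_p^2 * beta_tilde_i
MR = 2.0 * Sigma @ w
assert np.allclose(0.5 * MR, var_p * beta_tilde)
print("beta_F:", beta_F, "\nbeta_I:", beta_I, "\nMR:", MR)
```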
One important objective of risk control in a world with active investment strategies is to ensure that an active portfolio manager's contribution to the portfolio return justifies his idiosyncratic risk or "tracking error". If this is not the case, then it is better to replace the active manager with a passive position that only provides pure factor exposure but no idiosyncratic risk. To analyze this question we define ν^e as the vector of expected excess returns of the factor portfolios and assume without loss of generality ν^e > 0. Then, the vector of expected portfolio excess returns can be written as

\mu^e = E(\alpha + B f^e + \tilde\varepsilon) = \alpha + B\nu^e. \qquad (6)

The first-order optimality condition (3) states that the portfolio weight assigned to manager i should not be reduced as long as it holds that

\frac{\mu_i^e}{\mathrm{MR}_i} \ge \frac{\mu_j^e}{\mathrm{MR}_j} \quad \forall\, j.

Substituting the marginal risk contribution from (5) and the expected return from (6) into the above relation, we conclude that a manager i with MR_i > 0 justifies her portfolio weight relative to a pure factor investment in factor k iff

\frac{\alpha_i + (B\nu^e)_i}{(B\tilde\beta^F)_i + \tilde\beta_i^I} \;\ge\; \frac{\nu_k^e}{\tilde\beta_k^F}.

Consider the case where asset manager i has exposure only to factor k, denoted by B_{i,k}. Then, this manager justifies her capital allocation iff

\alpha_i \;\ge\; \frac{\tilde\beta_i^I}{\tilde\beta_k^F}\,\nu_k^e.
Note that in general this condition depends on the portfolio weight. For sufficiently
small weights wi , manager i’s tracking error risk will be “non-systematic” in the
portfolio context, i.e., β̃iI = 0. However, as manager i’s weight in the portfolio
increases, his tracking error becomes “systematic” in the portfolio context. Therefore,
the manager’s hurdle rate increases with the portfolio weight. This is illustrated in
Example 1.
Example 1 Consider the special case where there is only one single factor and a
portfolio, which consists of a passive factor-investment and a single active fund. The
portfolio weight of the passive investment is denoted by w1 and that of the active
fund by w2 . The active fund is assumed to have a beta with respect to the factor
denoted by β and idiosyncratic volatility of σ_I.2
The covariance of factor returns is then a simple scalar equal to the factor return variance, the matrix of factor exposures B has dimension (2 × 1), and the idiosyncratic covariance matrix is (2 × 2):

w = \begin{pmatrix} 1 - w_2 \\ w_2 \end{pmatrix}, \quad \Sigma_f = \sigma_\nu^2, \quad B = \begin{pmatrix} 1 \\ \beta \end{pmatrix}, \quad \Sigma_\varepsilon = \begin{pmatrix} 0 & 0 \\ 0 & \sigma_I^2 \end{pmatrix}.

The usual assumption ν^e > 0, σ_I² > 0 applies. The hurdle to be met by the alpha of the active fund is accordingly given by

\alpha \;\ge\; H(w_2) = \frac{\tilde\beta_2^I}{\tilde\beta^F}\,\nu^e = \frac{\sigma_I^2\,w_2}{\sigma_\nu^2\bigl(1 - (1-\beta)w_2\bigr)}\,\nu^e.

The derivative of this hurdle with respect to the weight of the active fund w_2 is

\frac{dH}{dw_2} = \frac{\sigma_I^2}{\sigma_\nu^2}\,\frac{1}{\bigl(1 - (1-\beta)w_2\bigr)^2}\,\nu^e > 0,
i.e., the hurdle H (w2 ) has a strictly positive slope, thus, the higher the portfolio weight
w2 of an active fund, the higher is the required α it must deliver. This is so because with
low portfolio weight, the active fund’s idiosyncratic volatility is almost orthogonal to
the portfolio return, and so its contribution to the overall portfolio risk is low. When in contrast the active fund has a high portfolio weight, its idiosyncratic volatility already co-determines the portfolio return and is—in the portfolio's context—a systematic component. The marginal risk contribution of the fund is then larger and consequently demands a higher compensation, translating into an upward-sloping α-hurdle.
2 Note that β is the linear exposure of the fund to the factor. It is a constant and independent of portfolio weights. In contrast, betas of portfolio constituents relative to the portfolio, β̃^F and β̃^I, depend on weights.
[Fig. 1: the alpha hurdle H(w_2) and the estimated α̂ plotted against the fund portfolio weight in percent; the optimal weight w_2^* is marked on the horizontal axis.]
Take as an example JPMorgan Funds—Highbridge US STEEP, an open-end fund
incorporated in Luxembourg that has exposure primarily to U.S. companies, through
the use of derivatives. Using monthly data from 12/2008 to 12/2013 (data source:
Bloomberg), we estimate
\hat\Sigma_f = \hat\sigma_\nu^2 = 0.002069, \qquad \hat B = \begin{pmatrix} 1 \\ 0.9821 \end{pmatrix}, \qquad \hat\Sigma_\varepsilon = \begin{pmatrix} 0 & 0 \\ 0 & 0.000303 \end{pmatrix}.
Furthermore, we use the historical average of the market risk premium ν̄ = 0.013127,
and the fund’s estimated alpha α̂ = 0.001751. The optimal allocation is the vector
of weights w ∗ such that the marginal excess return divided by the marginal risk
contribution is equal for both assets in the portfolio. The increasing relationship
between alpha and optimal fund weight is illustrated in Fig. 1. At the estimated alpha
of 17.51 basis points, the optimal weights are given by
w^* = \begin{pmatrix} 0.1029 \\ 0.8971 \end{pmatrix}.
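The following sketch plugs the estimates reported above into the Example 1 hurdle H(w_2) and solves α̂ = H(w_2) for the fund weight; the closed-form inversion is a rearrangement of the hurdle condition made for this sketch, not a formula from the chapter, and it yields a weight of roughly 0.90, close to the w^* reported above (small differences stem from rounding in the published estimates and from the full first-order condition used there).

```python
# Estimates reported above (monthly figures)
sigma_nu2 = 0.002069    # factor (market) variance, hat{sigma}_nu^2
sigma_I2  = 0.000303    # fund's idiosyncratic variance
beta      = 0.9821      # fund's factor exposure
nu_e      = 0.013127    # average market risk premium per month
alpha_hat = 0.001751    # fund's estimated alpha (17.51 bp/month)

def hurdle(w2):
    """Alpha hurdle H(w2) from Example 1."""
    return sigma_I2 * w2 / (sigma_nu2 * (1.0 - (1.0 - beta) * w2)) * nu_e

# Setting alpha = H(w2) and solving for w2 gives the weight at which the fund's
# alpha exactly compensates its (portfolio-context) tracking-error risk.
w2_star = alpha_hat * sigma_nu2 / (sigma_I2 * nu_e + alpha_hat * sigma_nu2 * (1.0 - beta))
print("optimal fund weight w2*: %.4f" % w2_star)      # roughly 0.90, cf. w* above
print("hurdle at w2*: %.6f vs alpha: %.6f" % (hurdle(w2_star), alpha_hat))
```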
Brennan and Solanki [14] contrast this analysis and derive a formal condition for
optimality of an option like payoff that is typical for portfolio insurance. It can be
shown that a payoff function where the investor receives the maximum of the refer-
ence portfolio’s value and a guaranteed amount is optimal only under the stringent
conditions of a zero risk premium and linear utility for wealth levels in excess of the
guaranteed amount. Similarly, Benninga and Blume [9] argue that in complete mar-
kets utility functions consistent with optimality of portfolio insurance would have
to exhibit unrealistic features, like unbounded risk aversion at some wealth level.
However, they make the point that portfolio insurance can be optimal if markets are
not complete. An extreme example of market incompleteness in this context, which
makes portfolio insurance attractive, is the impossibility for an investor to allocate
funds into the risk-free asset. Grossman and Vila [30] discuss portfolio insurance
in complete markets, noting that the solution of an investor’s constrained portfolio
optimization problem (subject to a minimum wealth constraint VT > K ) can be
characterized by the solution of the unconstrained problem plus a put option with
exercise price K . More recently, Dichtl and Drobetz [19] provide empirical evidence
that portfolio insurance is consistent with prospect theory, introduced by Kahneman
and Tversky [41]. Loss-averse investors seem to use a reference point to evaluate
portfolio gains and losses. They experience an asymmetric response to increasing
versus decreasing wealth, in being more sensitive to losses than to gains. In addition,
risk aversion also depends on the current wealth level relative to the reference point.
The model by Gomes [27] shows that the optimal dynamic strategy followed by
loss-averse investors can be consistent with portfolio insurance.3
3 It is interesting to study the potential effects of portfolio insurance on the aggregate market. As
our focus is the perspective of a risk-manager who does not take into account such market-wide
effects of his actions, we do not cover this literature. We refer the interested reader to Leland and
Rubinstein [46], Brennan and Schwartz [13], Grossman and Zhou [32] and Basak [6] as a starting
point.
The main portfolio insurance strategies used in practice are stop-loss strategies,
option-based portfolio insurance (OBPI), constant proportion portfolio insurance (CPPI), ratcheting
strategies with adjustments to the minimum wealth target, and value-at-risk based
portfolio insurance.
The simplest dynamic strategy for an investor to limit downside risk is to protect
his investment using a stop-loss strategy. In this case, the investor sets a minimum
wealth target or floor FT , that must be exceeded by the portfolio value VT at the
investment horizon T . He then monitors if the current value of the portfolio Vt
exceeds the present value of the floor exp(−r f (T − t))FT , where r f is the riskless
rate of interest. When the portfolio value reaches the present value of the floor, the
investor sells the risky and buys the riskfree asset. While this strategy has the benefit
of simplicity, there are several disadvantages. First, due to discreteness of trading or
illiquidity of assets, the transaction price might be undesirably far below the price
triggering portfolio reallocation. Second, once the allocation has switched into the
riskfree asset the portfolio will grow deterministically at the riskfree rate, making
it impossible to even partially participate in a possible recovery in the price of the
risky asset.
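A minimal simulation sketch of the stop-loss rule just described, under assumed parameters (a floor of 95 on an initial value of 100, a 2 % riskless rate, daily monitoring) and a hypothetical price path; it only illustrates the trigger logic, not the empirical study in Example 2 below.

```python
import numpy as np

def stop_loss_path(prices, V0=100.0, floor_T=95.0, rf=0.02, dt=1/252):
    """Apply the stop-loss rule to one path of risky-asset prices (daily closes).

    Start fully invested; switch the whole portfolio into the riskless asset the
    first time the portfolio value falls to the discounted floor."""
    T = len(prices) * dt
    V, units = V0, V0 / prices[0]          # fully invested at t = 0
    switched = False
    for i, p in enumerate(prices):
        t = i * dt
        if switched:
            V *= np.exp(rf * dt)           # riskless growth after the switch
        else:
            V = units * p
            if V <= floor_T * np.exp(-rf * (T - t)):
                switched = True            # sell risky asset at (or below) the trigger
    return V

# Hypothetical lognormal path for illustration
rng = np.random.default_rng(0)
rets = rng.normal(0.0002, 0.012, size=252)
path = 100.0 * np.exp(np.cumsum(rets))
print("year-end value with stop-loss:", round(stop_loss_path(path), 2))
```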
Brennan and Schwartz [12] and Leland [47] describe that portfolio insurance can
be implemented in two equivalent ways: (1) holding the reference portfolio plus a
put option, or (2) holding the riskfree asset plus a call option. When splitting his
portfolio into a position S0 in the risky asset and P0 in a protective put option at
time t = 0, the investor has to take into account the purchase price of the option
when setting the exercise price K , solving (S0 + P0 (K )) · (FT /V0 ) = K for K . The
ratio FT /V0 is the minimum wealth target expressed as a fraction of initial wealth. If
such an option is available on the market it can be purchased and no further action is
needed over the investment horizon; alternatively such an option can be synthetically
replicated as popularized by Rubinstein and Leland [58]. Again, the risky asset will
be bought on price increases and sold on falling prices, but in contrast to the stop-
loss strategy, changes in the portfolio allocation will now be implemented smoothly.
Even after a fall in the risky asset’s price there is scope to partially participate in an
eventual recovery as long as Delta is strictly positive. Toward the end of the investment
horizon, Delta will generally be very close to either zero or one, potentially leading
to undesired portfolio switching if the risky asset fluctuates around the present value
of the exercise price.
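A sketch of the strike determination for option-based portfolio insurance as described above, solving (S_0 + P_0(K)) · (F_T/V_0) = K numerically. Black-Scholes pricing, and all parameter values, are assumptions of the sketch; the chapter itself does not prescribe a pricing model.

```python
from math import log, sqrt, exp
from scipy.stats import norm
from scipy.optimize import brentq

def bs_put(S, K, r, sigma, T):
    """Black-Scholes price of a European put (assumed pricing model for this sketch)."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return K * exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)

S0, V0, F_T = 100.0, 100.0, 95.0        # risky asset, initial wealth, floor at horizon
r, sigma, T = 0.02, 0.20, 1.0           # riskless rate, volatility, horizon (assumptions)

# Strike such that (S0 + P0(K)) * (F_T / V0) = K: the protected position, scaled
# to the available budget, is then worth at least the floor at the horizon.
K = brentq(lambda k: (S0 + bs_put(S0, k, r, sigma, T)) * F_T / V0 - k, 50.0, 150.0)

d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
delta_protected = norm.cdf(d1)           # delta of stock + put = N(d1); governs the equity share
print("strike K = %.2f, initial equity delta = %.2f" % (K, delta_protected))
```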
The portfolio insurance strategies discussed so far limit the potential shortfall from
the start of the investment period to its end, frequently a calendar year. But investors
may also be concerned with losing unrealized profits that have been earned within
the year. Estep and Kritzman [23] propose a technique called TIPP (time invariant
portfolio insurance) as a simple way of achieving (partial) protection of interim gains
in addition to the protection offered by CPPI. Their methodology adjusts the floor Ft
used to calculate the cushion Ct over time. The TIPP floor is set as the maximum of last
period’s floor and a fraction k of the current portfolio value: Ft = max(Ft−1 , kVt ).
This method of ratcheting the floor up is time invariant in the sense that the notion of
a target date T is lost. However, if the percentage protection is required with respect
to a specific target date, the method can be easily adjusted by setting a target date floor
FT proportional to current portfolio value Vt , which is then discounted. Grossman
and Zhou [31] provide a formal analysis of portfolio insurance with a rolling floor,
while Brennan and Schwartz [13] characterize a complete class of time-invariant
portfolio insurance strategies, where asset allocation is allowed to depend on current
portfolio value, but is independent of time.
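A minimal sketch of the TIPP floor ratchet F_t = max(F_{t−1}, kV_t) combined with a CPPI-style risky exposure of m times the cushion; the multiplier, protection fraction, and return path are hypothetical.

```python
import numpy as np

def tipp_path(risky_returns, V0=100.0, k=0.95, m=4.0, rf=0.02, dt=1/252):
    """TIPP: CPPI-style exposure m*(V - F) with a ratcheted floor F_t = max(F_{t-1}, k*V_t)."""
    V, F = V0, k * V0
    for ret in risky_returns:
        cushion = max(V - F, 0.0)
        exposure = min(m * cushion, V)        # risky exposure; cap at total wealth (sketch assumption)
        V = exposure * (1.0 + ret) + (V - exposure) * (1.0 + rf * dt)
        F = max(F, k * V)                     # ratchet the floor as stated in the text
    return V, F

rng = np.random.default_rng(1)
rets = rng.normal(0.0003, 0.012, size=252)    # hypothetical daily risky returns
V_end, F_end = tipp_path(rets)
print("terminal value %.2f, final floor %.2f" % (V_end, F_end))
```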
Benninga [8] uses Monte Carlo simulation techniques to compare stop-loss, OBPI,
and CPPI. Surprisingly, he finds that stop-loss dominates with respect to terminal
wealth and Sharpe ratio. Dybvig [21] considers asset allocation and portfolio payouts
in the context of endowment management. If payouts are not allowed to decrease,
CPPI exhibits more desirable properties than constant mix strategies. Balder et al. [4]
analyze risks associated with implementation of CPPI under discrete-time trading
and transaction costs. Zagst and Kraus [64] compare OBPI and CPPI with respect
to stochastic dominance. Taking into account that implied volatility—which is rel-
evant for OBPI—is usually higher than realized volatility—relevant for CPPI—they
find that under specific parametrizations CPPI dominates. Recently, Dockner [20]
compares buy-and-hold, OBPI and CPPI concluding that there does not exist a clear
ranking of the alternatives. Dichtl and Drobetz [19] consider prospect theory (Kah-
neman and Tversky [41]) as framework to evaluate portfolio insurance strategies.
They use a twofold methodological approach: Monte Carlo simulation and historical
simulation with data for the German stock market. Within the behavioral finance
context chosen, their findings provide clear support for the justification of downside
protection strategies. Interestingly, in their study stop-loss, OBPI and CPPI turn out
attractive while the high protection level of TIPP associated with opportunity costs
in terms of reduced upside potential turns out to be suboptimal. Finally, they recommend implementing CPPI aggressively by using the highest multiplier m consistent
with tolerance for overnight or gap risk.
Example 2 In 4 out of the 18 calendar years from 1995 to 2013, the S&P 500 total
return index lost more than 5 %. For investors with limited risk capacity it was not
helpful that these losses happened three times in a row (2000, 2001, and 2002),
or were severe (2008). The following example illustrates how simple versions of
common techniques to control downside risk have performed over these 18 years.
We assume investment opportunities in the S&P 500 index and a risk-free asset, an
investment horizon equal to the calendar year, and a frictionless market (no trans-
action costs). Each calendar year the investment starts with a January 1st portfolio
value of 100. Rebalancing is possible with daily frequency. For the portfolio insur-
ance strategies investigated, the desired minimum wealth is set at 95, and free
parameters are set in a way to make the strategies comparable, by ensuring equal
equity allocations at portfolio start. This is achieved by resetting the multiples m
for CPPI and TIPP each January 1st according to the Delta of the OBPI strategy.
Similarly, the VaR confidence level is set to achieve this same equity proportion at
the start of the calendar year. OBPI Delta also governs the initial equity portion of
the buy-and-hold portfolio. Table 1 reports the main results, and Fig. 2 summarizes
the distribution of year-end portfolio values in a box plot.
The achieved minimum wealth levels show that for CPPI, TIPP, OBPI, and VaR-
based portfolio insurance even in the worst year the desired minimum wealth has been
missed just slightly, while in the case of the stop-loss strategy there is a considerable gap. This can be partly explained by the simple setup of the example (e.g., rebalancing using daily closing prices only, while in practice intraday decision-making and trading will happen). But a possibly large gap between desired and achieved minimum wealth is also a systematic feature of stop-loss strategies because of the mechanics of stop-loss
orders. The moment the stop limit is reached, a market order to sell the entire port-
folio is executed. The trading price, therefore, can and frequently will be lower than
the limit. This can pose considerable problems in highly volatile and illiquid market
environments. Option replication comes next in missing desired wealth protection.
Fig. 2 Comparison of portfolio insurance strategies, annual horizon, S&P 500, 1995–2013. For
each strategy, the shaded area indicates the observations from the 25th to the 75th percentile,
the median is shown as the line across the box and the mean as a diamond within the box. The
whiskers denote the lowest datum still within 1.5 interquartile range of the lower quartile, and the
highest datum still within 1.5 interquartile range of the upper quartile. If there are more extreme
observations they are shown separately by a circle. The semitransparent horizontal line indicates
the desired minimum wealth level
In the example, this might be due to the simplified setup, where the exercise price of
the option to be replicated is determined only once per year (at year start), and then
daily Delta is calculated for this option and used for allocation into the risky and the
riskless asset. In practice, new information on volatility and the level of interest rates
will also lead to a reset of the strike used for calculation of the Delta. Another obser-
vation is that the standard deviation of annual returns is lowest for TIPP, which comes
at the price of the lowest average return. If the cross-sectional standard deviation is
computed only for the years with below-average S&P 500 returns, it is lowest for
VaR-based risk control. For all methods shown, practical implementation will typi-
cally use higher levels of sophistication. For example, trading filters will be applied
In the previous discussion, shortfall risk was seen from the perspective of an investor
holding assets only. However, many institutional investors simultaneously optimize
a portfolio of assets A and liabilities L. Sharpe and Tint [62] describe a flexible
approach to systematically incorporate liabilities into pension fund asset allocation,
by optimizing over a surplus measure S = A − k L, where k ∈ [0, 1] is a factor
denoting the relative weight attached to liabilities. In the context of asset liability
management, Ang et al. [2] analyze the effect of downside risk aversion, and offer
an explanation why risk aversion tends to be high when the value of the assets
approaches the value of the liabilities. Ang et al. [2] specify the objective function
of the fund as mean-variance over asset returns plus a downside risk penalty on the
liability shortfall that is proportional to the value of an option to exchange the optimal
portfolio for the random value of the liabilities. An investor following their advice
tends to be more risk averse than a portfolio manager implementing the Sharpe and
Tint [62] model. For very high funding ratios, the impact of downside risk on risk
taking, and therefore the asset allocation of the pension fund manager is small. For
deeply underfunded plans, the value of the option is also relatively insensitive to
changes in volatility, again leading to a small impact on asset allocation. The effect
of liabilities on asset allocation is strongest when the portfolio value is close to the
value of liabilities. In this case, lower volatility reduces the value of the exchange
option, leading to a smaller penalty.
Another hedging motive arises if investors wish to bear only specific risks. This
might be due to specialization of the investor in a certain asset class, making it
desirable to hedge against risks not primarily driving the returns of this asset class.
A popular example is currency risk, which has been recently analyzed by Campbell
et al. [15] who find full currency hedging to be optimal for a variance-minimizing
bond investor, but discuss the potential for overall risk reduction from keeping foreign
exchange exposure partly unhedged in the case of equity portfolios.
4 We assume in general, that the model has a structure, which ensures that parameters are identifiable.
For example, it is assumed that log-returns are normally distributed, but mean and variance must
be estimated from observed data.
in the covariance structure, which the estimate is able to capture only if one restricts
the used history.5
Let r denote the (T × N) matrix containing weekly returns; then the sample covariance matrix Σ̂_S is determined by

\hat\Sigma_S = \frac{1}{T-1}\, r' M r, \qquad (7)

where the symmetric and idempotent matrix M is the residual maker with respect to a regression onto a constant,

M = I - \mathbf{1}\,(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}',

with I the (T × T) identity matrix and 1 a column vector containing T times the constant 1.
In the assumed setup, the sample covariance matrix is singular by construction. This is so because from (7) it follows that the rank of Σ̂_S is bounded from above by min{N, T − 1}.6 And even in the case where the number of return observations per asset exceeds the number of assets (T > N + 1), the sample covariance matrix is weakly determined and hence subject to large estimation errors, since one has to estimate N(N + 1)/2 elements of Σ̂_S from T · N observations.
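The following sketch illustrates the singularity argument with simulated data of the same dimensions as the S&P 500 example in footnote 6 (T = 104 weekly observations on N = 500 assets): it builds the residual maker M, forms Σ̂_S as in Eq. (7), and confirms that its rank cannot exceed T − 1 = 103.

```python
import numpy as np

# Hypothetical dimensions: T weekly return observations on N assets with T - 1 < N,
# so the sample covariance matrix cannot have full rank.
T, N = 104, 500
rng = np.random.default_rng(42)
r = rng.normal(0.0, 0.02, size=(T, N))            # (T x N) return matrix

ones = np.ones((T, 1))
M = np.eye(T) - ones @ np.linalg.inv(ones.T @ ones) @ ones.T   # residual maker
Sigma_S = r.T @ M @ r / (T - 1)                   # sample covariance, Eq. (7)

print("rank:", np.linalg.matrix_rank(Sigma_S))    # at most min(N, T - 1) = 103
print("smallest eigenvalue:", np.linalg.eigvalsh(Sigma_S)[0])  # ~0: singular, not invertible
```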
Since a simple Markowitz optimization, see Markowitz [49], needs to invert the
covariance matrix, matrix singularity prohibits any attempt of advanced portfolio
optimization, and is thus the most evident estimation problem in portfolio manage-
ment. Elton and Gruber [22] is an early contribution, which proposes the use of
structural estimators of the covariance matrix. Jobson and Korkie [38] provide a rig-
orous analysis of the small sample properties of estimates of the covariance structure
of returns.
Less evident are the problems caused by errors in the estimates of return expectations, even though these turn out to be economically much more critical. Jorion
[39] shows in the context of international equity portfolio selection that the errors
in the estimates of return expectations have a severe impact on the out-of-sample
5 Such an approach is typical for dealing with inadequate model specification. The formal estimate
is based on the assumption that the covariance structure is stable. Since data show that the covariance
structure is not stable, an ad-hoc adaptation—the limitation of the data history—is used to capture
the recent covariance structure. The optimal amount of historical data that should be used cannot
be derived within the model, but must be roughly calibrated to some measure of goodness-of-fit,
which balances estimation error against timely response to time variations.
6 The residual maker M has at most rank T − 1 because it generates residuals from a projection onto a constant. For example, the sample covariance matrix estimated from two years of weekly returns of the 500 constituents of the S&P 500 (104 observations per stock) has at most rank 103. Hence, it is not positive definite and not invertible, because at least 397 of its 500 eigenvalues are exactly equal to 0.
θ̂ = λθ̂ S + (1 − λ)θ̂struct .
While practitioners often use ad hoc weighting schemes, the literature provides
a powerful Bayesian interpretation of shrinkage, which allows for the computation
of optimal weights. In this Bayesian view, the structural estimator serves as the
prior, which anchors the location of model parameters θ and the sample estimate
acts as the conditioning signal. Bayes' rule then gives stringent advice on how to
combine prior and signal in order to compute the updated posterior that is used as
an input for the portfolio optimization. The abovementioned Bayes-Stein shrinkage
used in Jorion [39, 40] focuses on estimates of the expected returns. In the context
of covariance estimation, an early contribution is Frost and Savarino [25]. More
recently, Ledoit and Wolf [43] determine a more general Bayesian framework to
optimize the shrinkage intensity, in which the authors explicitly correct for the fact
7 See, e.g., Dangl and Kashofer [18] for an overview of structural estimates of the covariance
structure of large equity portfolios—including shrinkage estimates.
8 Shrinkage is usually a multivariate concept, i.e., λ is in general not a fixed scalar, but it depends
that the prior (i.e., the structural estimate of the covariance structure) as well as the
updating information (i.e., the sample covariance matrix) are determined from the
same data. Consequently, errors in these two inputs are not independent and the
Bayesian estimate must control for the interdependence.9
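As an illustration of the linear shrinkage θ̂ = λθ̂_S + (1 − λ)θ̂_struct applied to the covariance matrix, the sketch below shrinks the sample covariance toward a constant-correlation target with a fixed, ad hoc λ. The data-driven choice of the shrinkage intensity in Ledoit and Wolf [43] is deliberately not reproduced here; both the target and λ = 0.5 are assumptions of the sketch.

```python
import numpy as np

def constant_correlation_target(S):
    """Structured estimator: keep sample variances, replace all correlations by their average."""
    sd = np.sqrt(np.diag(S))
    corr = S / np.outer(sd, sd)
    n = S.shape[0]
    avg_rho = (corr.sum() - n) / (n * (n - 1))      # mean off-diagonal correlation
    target = avg_rho * np.outer(sd, sd)
    np.fill_diagonal(target, sd**2)
    return target

def shrink_covariance(returns, lam=0.5):
    """Linear shrinkage: lam * sample covariance + (1 - lam) * structured target.

    lam is fixed ad hoc here; Ledoit-Wolf style estimators choose it from the data."""
    S = np.cov(returns, rowvar=False)
    return lam * S + (1.0 - lam) * constant_correlation_target(S)

# Hypothetical data: T = 104 weekly observations on N = 200 assets
rng = np.random.default_rng(7)
rets = rng.normal(0.001, 0.02, size=(104, 200))
Sigma_shrunk = shrink_covariance(rets, lam=0.5)
print("condition number, sample vs shrunk: %.1e vs %.1e"
      % (np.linalg.cond(np.cov(rets, rowvar=False)), np.linalg.cond(Sigma_shrunk)))
```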
Weight Restrictions A commonly observed reaction to parameter uncertainty in
portfolio management is imposing ad hoc restrictions on portfolio weights. That is,
the discretion of a portfolio optimizer is limited by maximum as well as minimum
constraints on the weights of portfolio constituents.10 In sample, weight restrictions
clearly reduce portfolio performance (as measured by the objective function used in
the optimization approach).11 Nevertheless, out-of-sample studies show that in many cases weight restrictions improve the risk-return trade-off of portfolios. Jagannathan
and Ma [37] provide evidence why weight restrictions might be an efficient response
to estimation errors in the covariance structure. Analyzing minimum-variance portfo-
lios they show that binding long only constraints are equivalent to shrinking extreme
covariance estimates toward more moderate levels.
Robust Optimization A more systematic approach to parameter uncertainty than
weight restrictions is robust optimization. After determining the uncertainty set S for
the relevant parameter vector p, robust portfolio optimization is usually formulated
as a max-min problem where the vector w of portfolio weights solves

w^* = \arg\max_{w}\, \min_{p \in S} f(w; p),

with f(w; p) being the planner's objective function that she seeks to maximize.
This is a conservative or worst-case approach, which in many real-world applica-
tions shows favorable out-of-sample properties (see Fabozzi et al. [24], or for more
details on robust and convex optimization problems and its applications in finance
see Lobo et al. [48]). Provided a distribution of the parameters is available, the rather
extreme max-min approach could be relaxed by applying convex risk measures. In
the context of derivatives pricing, Bannör and Scherer [5] develop the concept of
risk-capturing functionals and exemplify risk averse pricing using an average Value-
at-Risk measure.
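A minimal sketch of the max-min formulation above, with the uncertainty set S taken to be a small, discrete set of candidate expected-return vectors and f(w; p) a mean-variance objective; the numbers and the long-only, fully invested constraints are assumptions of the sketch.

```python
import numpy as np
from scipy.optimize import minimize

Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.02]])
# Uncertainty set S: a few candidate vectors of expected excess returns
mu_candidates = [np.array([0.05, 0.07, 0.03]),
                 np.array([0.02, 0.04, 0.03]),
                 np.array([0.04, 0.01, 0.02])]
gamma = 5.0

def objective(w, mu):
    """Mean-variance objective f(w; p) for a given parameter vector p = mu."""
    return w @ mu - 0.5 * gamma * w @ Sigma @ w

def worst_case(w):
    """Inner minimization over the uncertainty set."""
    return min(objective(w, mu) for mu in mu_candidates)

# Outer maximization over w of the worst-case objective (long-only, fully invested)
cons = ({'type': 'eq', 'fun': lambda w: w.sum() - 1.0},)
res = minimize(lambda w: -worst_case(w), x0=np.ones(3) / 3,
               bounds=[(0.0, 1.0)] * 3, constraints=cons)
print("robust weights:", np.round(res.x, 3))
```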
Resampling A different approach to deal with parameter uncertainty in asset man-
agement is resampling. This technique does not attempt to produce more robust parameter estimates or to build a portfolio-optimization model that directly accounts for parameter uncertainty. Resampling is a simulation-based
approach that was first described in the portfolio-optimization context by Michaud
[52] and exists in different specifications. It takes the sample estimates of mean
9 See also Ledoit and Wolf [44, 45] for more on shrinkage estimates of the covariance structure.
10 Weight restrictions are frequently part of regulatory measures targeting the fund industry aimed
to control the risk characteristics of investment funds.
11 Green and Hollifield [28] argue that in the apparent presence of a strong factor structure in the
cross section of equity returns, mean-variance optimal portfolios should take large short positions
in selected assets. Hence, a restriction to a long-only portfolio is expected to negatively influence
portfolio performance.
Example 4 This simple example builds on Example 1 which discusses the optimal
weight of an active fund relative to a passive factor investment. An index-investment
in the S&P 500 serves as the passive factor investment and an active fund with
the constituents of the S&P 500 as its investment universe is the delegated active
investment strategy. In Example 1 we take a history of five years of monthly log-
returns (60 observations) to estimate mean returns as well as the covariance structure
and the alpha, which the fund generates relative to the passive investment. We use
these estimates to conclude that the optimal portfolio weight of the fund should be
roughly 90 % and only 10 % of wealth should be held as a passive investment.
Being concerned about the quality of our parameter estimation that feeds into the
optimization, we first examine the regression, which was performed to come up with
these estimates. Assuming that log-returns are normally distributed, we conclude
from the regression in Example 1 that our best estimates of the parameters α, β and
ν are
α̂ = 17.51 bp/month, β̂ = 0.9821, ν̂ = 131.27 bp/month,
and that the estimation errors are t-distributed with a standard deviation12
σ58 (α̂) = 23.40 bp/month, σ58 (β̂) = 0.0498, σ59 (ν̂) = 454.91 bp/month.
Fig. 3 Distribution of optimal portfolio weight in the interval [−100 %, 200 %] of the active invest-
ment over 100,000 resampled histories. Approximately 29 % of weights lie outside the stated interval
13 This is the simplest version of resampling, mostly used in portfolio optimization. Given the null
hypothesis that returns are normally distributed, we know that the empirical estimates of distribution
moments are t-distributed around the true parameters, see Jobson and Korkie [38] for a detailed
derivation of the small sample properties of these estimates. Thus, a more advanced approach samples
for each of the histories, first the model parameters from their joint distribution, and then—given
the selected moments—the history of normally distributed returns. Harvey et al. [33] is an example
that uses advanced resampling to compare Bayesian inference with simple resampling.
14 Some authors do propose schemes for generating portfolio decisions from the cross section of the simulation results, see, e.g., Michaud and Michaud [53]. These schemes are, however, criticized
by other authors for not being well-founded in decision theory, e.g., Markowitz and Usmen [50]
and others mentioned in the text above.
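A minimal sketch of the simple resampling scheme described in Example 4 and footnote 13: simulate many 60-month histories from the point estimates, re-estimate α and β on each history, and recompute the fund weight. The mapping from the re-estimated parameters to a weight reuses the Example 1 hurdle inversion and is an assumption of the sketch rather than the authors' exact procedure; it merely illustrates how widely the resampled weights are dispersed.

```python
import numpy as np

# Point estimates from Examples 1 and 4 (monthly, in decimals)
alpha, beta, nu = 0.001751, 0.9821, 0.013127
sigma_nu, sigma_I = np.sqrt(0.002069), np.sqrt(0.000303)
T, n_sim = 60, 10_000
rng = np.random.default_rng(0)

def optimal_weight(a, b, s_I2, s_nu2, nu_e):
    """Weight at which the fund's alpha meets the Example 1 hurdle (sketch assumption)."""
    return a * s_nu2 / (s_I2 * nu_e + a * s_nu2 * (1.0 - b))

weights = np.empty(n_sim)
for s in range(n_sim):
    f = rng.normal(nu, sigma_nu, T)                      # resampled factor excess returns
    r = alpha + beta * f + rng.normal(0.0, sigma_I, T)   # resampled fund excess returns
    b_hat, a_hat = np.polyfit(f, r, 1)                   # re-estimate beta and alpha by OLS
    resid = r - a_hat - b_hat * f
    weights[s] = optimal_weight(a_hat, b_hat, np.var(resid, ddof=2),
                                np.var(f, ddof=1), f.mean())

print("median weight: %.2f, share outside [-1, 2]: %.1f%%"
      % (np.median(weights), 100 * np.mean((weights < -1) | (weights > 2))))
```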
Qualitatively different from dealing with parameter uncertainty is the issue of model uncertainty. Since the exact characteristics of the data-generating process underlying asset returns are not at all clear, it is not obvious which attributes a model must feature in order to capture all economically relevant effects of the portfolio selection process. Hence, every model of optimal portfolio choice bears the risk of being misspecified. In Sect. 4.1 we already mention the fact that traditional portfolio models assume that mean returns and the covariance structure of returns are constant over time. This is in contrast to empirical evidence that the moments of the return distribution are time varying. Limiting the history used to estimate distribution parameters is a frequently used procedure to obtain a more up-to-date estimate. The appropriate length of the historical data window is, however, only rarely determined in a systematic manner.
Bayesian Model Averaging A systematic approach to estimation under model
uncertainty is Bayesian model averaging. It builds on the concept of a Bayesian
decision-maker that has a prior about the probability weights of competing models
that are constructed to predict relevant variables (e.g., asset returns) one period ahead.
Observed returns are then used to determine posterior probability weights for each of the models considered, applying Bayes' rule.15 Each of the competing models
generates a predictive density for the next period’s return. After observing the return,
models which have assigned a high likelihood to the observed value (compared to
others) experience an upward revision of their probability weight. In contrast, models
that have assigned a low likelihood to the observed value experience a downward
revision of their weight. Finally, the overall predictive density is calculated as a probability-weighted sum of all models' predictive densities. Bayesian model averaging is thus an elegant way to transform a problem of model uncertainty into a standard portfolio problem: finding the optimal risk-return trade-off under the derived predictive return distribution. This approach can, however, only be applied
15 The posterior probability that a certain model is the correct model is proportional to the product
of the model’s prior probability weight and the realized likelihood of the observed return.
under the assumption that the decision-maker has a single prior and that she shows
no aversion against the ambiguity inherent in the model uncertainty.16
Raftery et al. [57] provide the technical details of Bayesian model averaging and
Avramov [3], Cremers [16], and Dangl and Halling [17] are applications to return
prediction. Bayesian model averaging treats model uncertainty just as an additional
source of variation. The predictive density for next period’s returns becomes more
disperse the higher the uncertainty about models, which differ in their prediction. The
optimal portfolio selection is then unchanged, but takes the additional contribution to uncertainty into account.
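A minimal sketch of the Bayesian model averaging mechanics described above, with two hypothetical one-period-ahead return models that differ only in their predicted mean: posterior model weights are proportional to prior times realized likelihood, and the predictive density is the probability-weighted mixture.

```python
import numpy as np
from scipy.stats import norm

# Two competing one-period-ahead return models (hypothetical); both assume
# normally distributed returns and differ only in the predicted mean.
models = [
    {"name": "low mean",  "mu": 0.002, "sigma": 0.04, "prior": 0.5},
    {"name": "high mean", "mu": 0.010, "sigma": 0.04, "prior": 0.5},
]

def update_weights(models, observed_return):
    """Posterior weight of each model is proportional to prior * likelihood of the observation."""
    likes = np.array([m["prior"] * norm.pdf(observed_return, m["mu"], m["sigma"])
                      for m in models])
    return likes / likes.sum()

def predictive_mean_var(models, weights):
    """Moments of the probability-weighted mixture of the models' predictive densities."""
    mus = np.array([m["mu"] for m in models])
    sig2 = np.array([m["sigma"]**2 for m in models])
    mean = weights @ mus
    var = weights @ (sig2 + mus**2) - mean**2   # mixture variance: extra dispersion from model disagreement
    return mean, var

w = update_weights(models, observed_return=0.012)   # an observation favoring the "high mean" model
print("posterior model weights:", np.round(w, 3))
print("predictive mean / variance:", predictive_mean_var(models, w))
```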
Ambiguity Aversion If it is not possible to explicitly assess the probability that a cer-
tain model correctly mirrors the portfolio selection problem and investors are averse
to this form of ambiguity, alternative portfolio selection approaches are needed.
Garlappi et al. [26] develop a portfolio selection approach for investors who have
multiple priors over return expectations and show ambiguity aversion. The authors
prove that the portfolio selection problem of such an ambiguity-averse investor can
be formulated by imposing two modifications to the standard mean-variance model,
(i) an additional constraint that guarantees that the expected return lies in a specified confidence region (this is how the multiple priors are modeled) and (ii) an additional minimization over all expected returns that conform to the priors (mirroring ambiguity aversion). This model gives an intuitive illustration of the fact that ambiguity-averse investors have an explicit desire for robustness.
5 Conclusion
The asset management industry has substantial influence on financial markets and
on the welfare of many citizens. Increasingly, citizens are saving for retirement via
delegated portfolio managers such as pension funds or mutual funds. In many cases
there are multiple layers of delegation. It is, therefore, crucial for the welfare of
modern societies that portfolio managers manage and control their portfolio risks.
This article provides an eagle’s perspective on risk management in asset management.
In traditional portfolio theory, the scope for risk control in portfolio management
is limited. Risk management is essentially equivalent to determining the fraction
of capital that the manager invests in a broadly and well diversified basket of risky
securities. Thus, the “risk manager” only needs to find the optimal location on the
capital market line. By contrast, in a more realistic model of the world that accounts
for frictions, risk management becomes a central and important module in asset
management that is frequently separate from other divisions of an asset manager.
We identify several major frictions that require risk management that goes beyond
choosing the weight of the riskless asset in the portfolio. First, in a world with costly
information acquisition, investors do not hold the same mix of risky assets. This
16 As explained in the introduction to this section, ambiguity aversion refers to preferences that
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Admati, A.: A noisy rational expectations equilibrium for multi-asset securities markets. Econo-
metrica 53(3), 629–657 (1985)
2. Ang, A., Chen, B., Sundaresan, S.: Liability-driven investment with downside risk. J. Portf.
Manag. 40(1), 71–87 (2013)
3. Avramov, D.: Stock return predictability and model uncertainty. J. Financ. Econ. 64, 423–458
(2002)
4. Balder, S., Brandl, M., Mahayni, A.: Effectiveness of CPPI strategies under discrete-time
trading. J. Econ. Dyn. Control 33(1), 204–220 (2009)
5. Bannör, K.F., Scherer, M.: Capturing parameter risk with convex risk measures. Eur. Actuar.
J. 3(1), 97–132 (2013)
6. Basak, S.: A comparative study of portfolio insurance. J. Econ. Dyn. Control 26, 1217–1241
(2002)
7. Basak, S., Shapiro, A.: Value-at-risk-based risk management: optimal policies and asset prices.
Rev. Financ. Stud. 14(2), 371–405 (2001)
8. Benninga, S.: Comparing portfolio insurance strategies. Finanzmarkt Portf. Manag. 4(1), 20–30
(1990)
9. Benninga, S., Blume, M.E.: On the optimality of portfolio insurance. J. Financ. 40(5), 1341–
1352 (1985)
10. Black, F., Jones, R.: Simplifying portfolio insurance. J. Portf. Manag. 14(1), 48–51 (1987)
11. Black, F., Perold, A.F.: Theory of constant proportion portfolio insurance. J. Econ. Dyn. Control
16(3–4), 403–426 (1992)
12. Brennan, M.J., Schwartz, E.S.: The pricing of equity-linked life insurance policies with an
asset value guarantee. J. Financ. Econ. 3, 195–213 (1976)
13. Brennan, M.J., Schwartz, E.S.: Time-invariant portfolio insurance strategies. J. Financ. 43(2),
283–299 (1988)
14. Brennan, M.J., Solanki, R.: Optimal portfolio insurance. J. Financ. Quant. Anal. 16(3), 279–300
(1981)
15. Campbell, J.Y., Medeiros, K.S.-D., Viceira, L.M.: Global currency hedging. J. Financ. 65(1),
87–121 (2010)
16. Cremers, M.K.J.: Stock return predictability: a Bayesian model selection perspective. Rev.
Financ. Stud. 15, 1223–1249 (2002)
17. Dangl, T., Halling, M.: Predictive regressions with time-varying coefficients. J. Financ. Econ.
106, 157–181 (2012)
18. Dangl, T., Kashofer, M.: Minimum-variance stock picking—a shift in preferences for
minimum-variance portfolio constituents. Working paper (2013)
19. Dichtl, H., Drobetz, W.: Portfolio insurance and prospect theory investors: popularity and
optimal design of capital protected financial products. J. Bank. Financ. 35(7), 1683–1697
(2011)
20. Dockner, E.: Sind Finanzprodukte mit Kapitalgarantie eine attraktive Anlageform? In: Frick,
R., Gantenbein, P., Reichling, P. (eds.) Asset Management, pp. 271–284. Haupt, Bern Stuttgart
Wien (2012)
21. Dybvig, P.H.: Using asset allocation to protect spending. Financ. Anal. J. 55(1), 49–62 (1999)
22. Elton, E.J., Gruber, M.J.: Estimating the dependence structure of share prices-implications for
portfolio selection. J. Financ. 28(5), 1203–1232 (1973)
23. Estep, T., Kritzman, M.: TIPP: insurance without complexity. J. Portf. Manag. 14(4), 38–42
(1988)
24. Fabozzi, F.J., Kolm, P.N., Pachamanova, D., Focardi, S.M.: Robust Portfolio Optimization and
Management. Wiley, Hoboken (2007)
25. Frost, P.A., Savarino, J.E.: An empirical Bayes approach to efficient portfolio selection. J.
Financ. Quant. Anal. 21(3), 293–305 (1986)
26. Garlappi, L., Uppal, R., Wang, T.: Portfolio selection with parameter and model uncertainty: a
multi-prior approach. Rev. Financ. Stud. 20(1), 41–81 (2007)
27. Gomes, F.J.: Portfolio choice and trading volume with loss-averse investors. J. Bus. 78(2),
675–706 (2005)
28. Green, R.C., Hollifield, B.: When will mean-variance efficient portfolios be well diversified?
J. Financ. 47(5), 1785–1809 (1992)
29. Grossman, S.J., Stiglitz, J.E.: On the impossibility of informationally efficient markets. Am.
Econ. Rev. 70(3), 393–408 (1980)
30. Grossman, S.J., Vila, J.-L.: Portfolio insurance in complete markets: a note. J. Bus. 62(4),
473–476 (1989)
31. Grossman, S.J., Zhou, Z.: Optimal investment strategies for controlling drawdowns. Math.
Financ. 3(3), 241–276 (1993)
32. Grossman, S.J., Zhou, Z.: Equilibrium analysis of portfolio insurance. J. Financ. 51(4), 1379–
1403 (1996)
33. Harvey, C.R., Liechty, J.C., Liechty, M.W.: Bayes vs. resampling: a rematch. J. Invest. Manag.
6(1), 1–17 (2008)
34. Harvey, C.R., Liechty, J.C., Liechty, M.W., Müller, P.: Portfolio selection with higher moments.
Quant. Financ. 10(5), 469–485 (2010)
35. Herold, U., Maurer, R., Purschaker, N.: Total return fixed-income portfolio management. A
risk-based dynamic strategy. J. Portf. Manag. Spring 31, 32–43 (2005)
36. Herold, U., Maurer, R., Stamos, M., Vo, H.T.: Total return strategies for multi-asset portfolios:
dynamically managing portfolio risk. J. Portf. Manag. 33(2), 60–76 (2007)
37. Jagannathan, R., Ma, T.: Risk reduction in large portfolios: why imposing the wrong constraints
helps. J. Financ. 58(4), 1651–1684 (2003)
38. Jobson, J., Korkie, B.: Estimation for Markowitz efficient portfolios. J. Am. Stat. Assoc.
75(371), 544–554 (1980)
39. Jorion, P.: International portfolio diversification with estimation risk. J. Bus. 58(3), 259–278
(1985)
40. Jorion, P.: Bayes-Stein estimation for portfolio analysis. J. Financ. Quant. Anal. 21(3), 279–292
(1986)
41. Kahneman, D., Tversky, A.: Prospect theory: an analysis of decision under risk. Econometrica
47(2), 263–291 (1979)
42. Knight, F.H.: Risk, uncertainty and profit. Sentry press, Reprinted 1956 (1921)
43. Ledoit, O., Wolf, M.: Improved estimation of the covariance matrix of stock returns with an
application to portfolio selection. J. Empir. Financ. 10, 603–621 (2003)
44. Ledoit, O., Wolf, M.: Honey, I shrunk the sample covariance matrix. J. Portf. Manag. 30(4),
110–119 (2004a)
45. Ledoit, O., Wolf, M.: A well-conditioned estimator for large-dimensional covariance matrices.
J. Multivar. Anal. 88, 365–411 (2004b)
46. Leland, H., Rubinstein, M.: Comments on the market crash: six months after. J. Econ. Perspect.
2(3), 45–50 (1988)
47. Leland, H.E.: Who should buy portfolio insurance? J. Financ. 35(2), 581–594 (1980)
48. Lobo, M.S., Vandenberghe, L., Boyd, S., Lebret, H.: Applications of second-order cone pro-
gramming. Linear Algebra Appl. 284(1–2), 193–228 (1998)
49. Markowitz, H.: Portfolio selection. J. Financ. 7, 77–91 (1952)
50. Markowitz, H., Usmen, N.: Resampled frontiers versus diffuse Bayes: an experiment. J. Invest.
Manag. 4(1), 9–25 (2003)
51. Merton, R.: Optimum consumption and portfolio rules in a continuous-time model. J. Econ.
Theory 3(4), 373–413 (1971)
52. Michaud, R.O.: Efficient Asset Management. Oxford University Press, Oxford (1998)
53. Michaud, R.O., Michaud, R.O.: Efficient Asset Management, 2nd edn. Oxford University Press,
Oxford (2008)
54. Pastor, L., Stambaugh, R.F.: Are stocks really less volatile in the long run? J. Financ. 67(2),
431–477 (2012)
55. Pennacchi, G.: Theory of Asset Pricing. The Addison-Wesley series in finance. Addison-
Wesley, Boston (2008)
56. Perold, A.F., Sharpe, W.F.: Dynamic strategies for asset allocation. Financ. Anal. J. 44(1),
16–27 (1988)
57. Raftery, A.E., Madigan, D., Hoeting, J.A.: Bayesian model averaging for linear regression
models. J. Am. Stat. Assoc. 92, 179–191 (1997)
58. Rubinstein, M., Leland, H.E.: Replicating options with positions in stock and cash. Financ.
Anal. J. 37(4), 63–72 (1981)
59. Scherer, B.: Portfolio resampling: review and critique. Financ. Anal. J. 58(6), 98–109 (2002)
60. Scherer, B.: A note on the out-of-sample performance of resampled efficiency. J. Asset Manag.
7(3/4), 170–178 (2006)
61. Sharpe, W.F.: Decentralized investment management. J. Financ. 36(2), 217–234 (1981)
62. Sharpe, W.F., Tint, L.G.: Liabilities—a new approach. J. Portf. Manag. 16(2), 5–10 (1990)
63. Wilson, R.: The theory of syndicates. Econometrica 36(1), 119–132 (1968)
64. Zagst, R., Kraus, J.: Stochastic dominance of portfolio insurance strategies—OBPI versus
CPPI. Ann. Op. Res. 185(1), 75–103 (2011)
Worst-Case Scenario Portfolio Optimization
Given the Probability of a Crash
Olaf Menkens
Abstract Korn and Wilmott [9] introduced the worst-case scenario portfolio prob-
lem. Although Korn and Wilmott assume that the probability of a crash occurring
is unknown, this paper analyzes how the worst-case scenario portfolio problem is
affected if the probability of a crash occurring is known. The result is that the addi-
tional information of the known probability is not used in the worst-case scenario.
This leads to a q-quantile approach (instead of a worst case), which is a value at
risk-style approach in the optimal portfolio problem with respect to the potential
crash. Finally, it will be shown that—under suitable conditions—every stochastic
portfolio strategy has at least one superior deterministic portfolio strategy within this
approach.
1 Introduction
Portfolio optimization in continuous time goes back to Merton [17]. Merton assumes
that the investor has two investment opportunities; one risk-free asset (bond) and one
risky asset (stock) with dynamics given by
O. Menkens (B)
School of Mathematical Sciences, Dublin City University, Dublin 9,
Glasnevin, Ireland
e-mail: [email protected]
© The Author(s) 2015
K. Glau et al. (eds.), Innovations in Quantitative Risk Management,
Springer Proceedings in Mathematics & Statistics 99,
DOI 10.1007/978-3-319-09114-3_15
Assuming that the utility function U (x) of the investor is given by U (x) = ln(x),
one can define the performance function for an arbitrary admissible portfolio strategy
π(t) by
J_0(t, x, \pi) := E\left[\ln X_0^{\pi,t,x}(T)\right] = \ln(x) + E\left[\int_t^T \left(\Psi_0 - \frac{\sigma_0^2}{2}\bigl(\pi(s) - \pi_0^*\bigr)^2\right) ds\right]. \qquad (1)
Here,

\Psi_0 := r_0 + \frac{1}{2}\left(\frac{\mu_0 - r_0}{\sigma_0}\right)^2 = r_0 + \frac{\sigma_0^2}{2}\,(\pi_0^*)^2 \quad \text{and} \quad \pi_0^* := \frac{\mu_0 - r_0}{\sigma_0^2}
will be called the utility growth potential or earning potential and the optimal port-
folio strategy or Merton fraction, respectively. Using this, the portfolio optimization
problem in the Merton case (that is without taking possible jumps into account) is
given by

\nu_0(t, x) := \sup_{\pi(\cdot)\,\in\,\mathcal{A}(x)} J_0(t, x, \pi), \qquad (2)

where ν_0 is known as the value function in the Merton case. From Eq. (1), it is clear
that π0∗ maximizes J0 . Hence, it is the optimal portfolio strategy for Eq. (2).
Merton’s model has the disadvantage that it cannot model jumps in the price of
the risky asset. Therefore, Aase [1] extended Merton’s model to allow for jumps in
the risky asset. In the simplest case, the dynamics of the risky asset changes to
where N is a Poisson process with intensity λ > 0 on (Ω, F , P) and k > 0 is the
crash or jump size. In this setting, the performance function is given by
J_J(t, x, \pi) = \ln(x) + E\left[\int_t^T \left(\Psi_0 - \left(\frac{\sigma_0^2}{2}\bigl(\pi(s) - \pi_0^*\bigr)^2 - \lambda \ln\bigl(1 - \pi(s)k\bigr)\right)\right) ds\right].
Pointwise maximization of the integrand yields the optimal portfolio strategy in Aase's model,

\pi_J^* = \frac{1}{2}\left(\pi_0^* + \frac{1}{k}\right) - \sqrt{\frac{1}{4}\left(\pi_0^* - \frac{1}{k}\right)^2 + \frac{\lambda}{\sigma_0^2}}.
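A small sketch evaluating this formula with the parameters from the caption of Fig. 1 (π_0^* = 1.25, σ_0 = 0.25, k = 0.25); it reproduces the pattern discussed below, with π_J^* only slightly below the Merton fraction for 1/λ = 50 and negative for 1/λ = 2.5.

```python
import numpy as np

# Parameters from the caption of Fig. 1
pi0, sigma0, k = 1.25, 0.25, 0.25

def pi_jump(lam):
    """Optimal fraction in the risky asset in Aase's jump model (formula above)."""
    return 0.5 * (pi0 + 1.0 / k) - np.sqrt(0.25 * (pi0 - 1.0 / k)**2 + lam / sigma0**2)

for lam in [0.0, 0.02, 0.04, 0.08, 0.1, 0.2, 0.3, 0.4]:
    print("lambda = %.2f (1/lambda = %s): pi_J* = %+.3f"
          % (lam, "inf" if lam == 0 else round(1 / lam, 2), pi_jump(lam)))
# lambda = 0 recovers the Merton fraction pi_0* = 1.25; lambda = 0.4 (one crash
# expected every 2.5 years on average) yields a negative fraction, i.e., a short position.
```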
Figure 1 shows the fraction invested in the risky asset in Merton’s (solid line) and
Aase’s model for various λ (all the other lines). The dashed line below the solid
line is the case where 1/λ = 50, that is, the investor expects on average one crash
within 50 years. By comparison, the lowest line (the dash–dotted line) is the case
where 1/λ = 2.5, that is, the investor expects on average one crash within 2.5 years.
[Fig. 1 plots the fraction invested in the risky asset against the investment horizon T − t for the Merton fraction π_0^* (λ = 0) and for Aase's strategies with λ = 0.02, 0.04, 0.08, 0.1, 0.2, 0.3, and 0.4 (i.e., 1/λ = 50, 25, 12.5, 10, 5, 3.33, and 2.5).]
Fig. 1 Examples of Merton's optimal portfolio strategies. This figure is plotted with π_0^* = 1.25, σ_0 = 0.25, r = 0.05, k = 0.25, and T = 50. This implies that λ_0 = σ_0^2 π_0^*/k = 0.3125, Ψ_0 ≈ 0.098828, and 1/k^* = 4
Note, however, that the fraction invested in the risky asset is negative in this case,
meaning that the optimal strategy is that the investor goes short in the risky asset.
This strategy is very risky because the probability that the investor will go bankrupt
is strictly positive. This can also be observed in practice where several hedge funds
went bankrupt which were betting on a crash in the way described above.
Therefore, let us consider an ansatz which overcomes this problem.
The ansatz made by Korn and Wilmott [9] is to distinguish between normal times
and crash time. In normal times, the same set up as in Merton’s model is used. At
the crash time, the price of the risky asset falls by a factor of k ∈ [k_*, k^*] (with 0 ≤ k_* ≤ k^* < 1). This implies that the wealth process X_0^π(t) just before the crash time τ− satisfies

X_0^\pi(\tau-) = \underbrace{[1 - \pi(\tau)]\,X_0^\pi(\tau-)}_{\text{bond investment}} + \underbrace{\pi(\tau)\,X_0^\pi(\tau-)}_{\text{stock investment}}.
At the crash time, the price of the risky asset drops by a factor of k, implying

X^\pi(\tau) = \bigl[1 - \pi(\tau)\,k\bigr]\,X_0^\pi(\tau-).

Therefore, one has a straightforward relationship of the wealth right before a crash with the wealth right after a crash.
The main disadvantage of this ansatz is that one needs to know the maximal
possible number of crashes M that can happen at most—in the following, we assume
for simplicity that M = 1 if not stated otherwise—and one needs to know the worst
crash size k ∗ that can happen. On the other hand, no probabilistic assumptions are
made on the crash time or crash size. Therefore, Merton’s approach, to maximize the
expected utility of terminal wealth, cannot be used in this context. Instead the aim is
to find the best uniform worst-case bound, e.g. solve
\sup_{\pi(\cdot)\,\in\,\mathcal{A}(x)} \;\inf_{0 \le \tau \le T,\; k \in K} E\left[\ln X^\pi(T)\right], \qquad (3)
[Figure: expected log-utility E[ln X^{t,x,π,τ,k}(T)] of strategies π_1, π_2, π_3 and of the Merton approach.]
will be called a crash indifference strategy. This is because the investor gets the same expected utility of terminal wealth if either no crash happens (left-hand side)
or a crash of the worst-case size k ∗ happens (right-hand side). It is straightforward
to verify (see Korn and Menkens [10]) that there exists a unique crash indifference
strategy π̂, which is given by the solution of the differential equation
\hat\pi'(t) = \frac{\sigma_0^2}{2}\left(\hat\pi(t) - \frac{1}{k^*}\right)\bigl(\hat\pi(t) - \pi_0^*\bigr)^2, \qquad (6)

with \hat\pi(T) = 0. \qquad (7)
The optimal portfolio strategy for an investor who wants to solve the worst-case scenario portfolio problem (3) is given by

\bar\pi(t) := \min\bigl\{\hat\pi(t),\, \pi_0^*\bigr\} \quad \text{for all } t \in [0, T]. \qquad (8)
π̄ will be named the optimal crash hedging strategy or optimal worst-case scenario
strategy.
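A minimal numerical sketch, assuming the reconstruction of Eqs. (6)-(8) above and the parameters from Fig. 1 (π_0^* = 1.25, σ_0 = 0.25, k^* = 0.25): it integrates the indifference ODE backward from π̂(T) = 0 and forms the optimal worst-case strategy π̄.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Parameters as in Fig. 1: Merton fraction, volatility, worst-case crash size
pi0, sigma0, k_star, T = 1.25, 0.25, 0.25, 50.0

def ode_in_time_to_maturity(s, pi_hat):
    """d pi_hat / ds with s = T - t; the sign is flipped relative to Eq. (6)."""
    return -0.5 * sigma0**2 * (pi_hat - 1.0 / k_star) * (pi_hat - pi0)**2

# Terminal condition pi_hat(T) = 0 becomes an initial condition at s = 0
sol = solve_ivp(ode_in_time_to_maturity, (0.0, T), [0.0], dense_output=True, max_step=0.1)

horizons = np.array([1.0, 5.0, 10.0, 25.0, 50.0])       # remaining horizon T - t
pi_hat = sol.sol(horizons)[0]
pi_bar = np.minimum(pi_hat, pi0)                        # Eq. (8)
for h, p in zip(horizons, pi_bar):
    print("T - t = %4.1f years: pi_bar = %.3f" % (h, p))
# pi_bar increases with the remaining horizon and approaches the Merton fraction from below.
```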
Figure 3 shows the optimal worst-case scenario strategies of Korn/Wilmott if at
most one (solid line), two (dashed line), or three (dash–dotted line) crashes can
happen. Assuming that the investor has an initial investment horizon of T = 50 and
expects to see at most three crashes, an optimal worst-case scenario investor would use
the portfolio strategy π̂3 (t) until she observes a first crash, say at time τ1 . After having
observed a crash, the investor would switch to the strategy π̂2 (t), since the investor
expects to see at most two further crashes in the remaining investment horizon T −τ1 ;
and so on. Finally, if the investor expects to observe no further crash, she will switch
to the Merton fraction π0∗ .
The worst case scenario strategies are now compared to the optimal portfolio
strategy in Aase’s model, where λ(t) = T 1−t (see dotted line in Fig. 3), that is the
investor expects to see on average one crash over his remaining investment horizon
T − t. Clearly, setting λ in this way is somewhat unrealistic. Nevertheless, this
extreme example is used to point out several disadvantages of the expected utility
approach of Aase.
[Fig. 3 plots the fraction invested in the risky asset against the investment horizon T − t for the worst-case strategies π̂_1(t), π̂_2(t), π̂_3(t) and for Aase's strategy π_P^* with λ(t) = 1/(T − t).]
First, considering a λ which changes over time and depends on
the investment horizon of the investor, leads not only to a time-changing optimal
strategy π P (t), but also to a price dynamics of the risky asset which depends on the
investment horizon of the investor. Hence, any two investors with different investment
horizons would work with different price dynamics of the risky assets. Second, as
the investor approaches the investment horizon, λ(t) → ∞ (that is a crash happens
almost surely), thus, π P (t) → −∞, which would lead to big losses on short-term
investment horizons if no crash happens. Of course, these losses would average out
with the gains made if a crash happens remembering that the assumption is that—on
average—every second scenario would observe at least one crash. This is the effect
of averaging the crash out in an expected utility sense (compared to the worst-case
approach of Korn/Wilmott).
Basically, it would be possible to cut off π_P(t) at zero, that is, one would not allow for short-selling. This would imply cutting off λ(t) at (μ_0 − r_0)/k, and there is no economic interpretation why this should be done (except that short-selling might not
be allowed). Finally, note that it is also possible to set λ such that one expects to see
at least one crash with probability q (e.g., q = 5 %), however this would not remedy
the two disadvantages mentioned above.
Why is the worst-case scenario approach more suitable than the standard expected
utility approach in the presence of jumps? The standard expected utility approach
averages out the impact of the jumps over all possible scenarios. In other
words, the corresponding optimal strategy offers protection only on average over
all possible scenarios, which is good as long as either no jump or only a small
jump happens. However, if a large jump happens, the protection is negligible. By
comparison, the worst-case scenario approach offers full protection against any jump
up to the assumed worst-case jump size.
The situation can be compared to the case of buying liability insurance. The
standard utility approach would look at the average of all possible claim sizes (say,
100,000 EUR), and its optimal strategy would be to buy liability insurance
with a cover of 100,000 EUR. However, the usual advice is to buy liability insurance
with a cover that is as high as possible; this solution corresponds to the worst-case
scenario approach. In other words, the aim is to insure against the rare large jumps.
This observation is supported by the fact that many insurance contracts include a retention
(deductible), which excludes small claims from the insurance.
To the best of our knowledge, Hua and Wilmott [8] were the first to consider the
worst-case scenario approach in a binomial model to price derivatives. Korn and
Wilmott [9] were the first to apply this approach to portfolio optimization, and Korn
and Menkens [10] developed a stochastic control framework for this approach, while
Korn and Steffensen [12] considered this approach as a stochastic differential game.
Korn and Menkens [10] and Menkens [16] looked at changing market coefficients
after a crash. Seifried [22] developed a martingale approach for the worst-case sce-
nario. Moreover, the worst-case scenario approach has been applied to the optimal
investment problem of an insurance company (see Korn [11]) and to optimize rein-
surance for an insurance company (see Korn et al. [13]). Korn et al. [13] show also
in their setting that the worst-case scenario approach has a negative diversification
effect. Furthermore, both portfolio optimization under proportional transaction costs
(see Belak et al. [4]) and the infinite time consumption problem (see Desmettre
et al. [7]) have been studied in a worst-case scenario optimization setting. Mönnig
[18] applies the worst-case scenario approach in a stochastic target setting to com-
pute option prices. Finally, Belak et al. [2, 3] allow for a random number of crashes,
while Menkens [15] analyzes the costs and benefits of using the worst-case scenario
approach.
Notice that there is a different worst-case scenario optimization problem which
is also known as Wald’s Maximin approach (see Wald [23, 24]). The following
quotation is taken from Wald [23, p. 279]:
A problem of statistical inference may be interpreted as a zero sum two person game as
follows: Player 1 is Nature and player 2 is the statistician. […] The outcome K [θ, w(E)] of
the game is the risk r [θ|w(E)] of the statistician. Clearly, the statistician wishes to minimize
r [θ|w(E)]. Of course, we cannot say that Nature wants to maximize r [θ|w(E)]. However, if
the statistician is in complete ignorance as to Nature’s choice, it is perhaps not unreasonable to
base the theory of a proper choice of w(E) on the assumption that Nature wants to maximize
r [θ|w(E)].
This is a well-known concept in decision theory and is also known as robust opti-
mization (see e.g., Bertsimas et al. [5] or Rustem and Howe [21] and the references
therein). However, while the ansatz is the same, it is usually assumed that the parame-
ters (in our case r0 , μ0 , and σ0 ) are unknown within certain boundaries. Therefore,
this is a parameter uncertainty problem which is solved using a worst-case scenario
approach—instead of using perturbation analysis. Observe that this usually involves
optimization procedures done by a computer. Finally, note that the optimal strate-
gies can be computed directly only in the special case that only μ0 is uncertain (see
Mataramvura and Øksendal [14], Øksendal and Sulem [19], or Pelsser [20] for a
recent application in an insurance setting).
By comparison, the worst-case scenario approach considered in this paper is
taking (possibly external) shocks/jumps/crashes into account—and not parameter
uncertainty. While the original idea and the wording are similar or even the same, it
is clear that the worst-case scenario approach of Korn and Wilmott [9] is different
from the robust optimization approach in decision theory.
The remainder of this paper is organized as follows. Section 2 introduces the setup
of the model under consideration, and Sect. 3 solves the optimization problem
if the probability of a potential crash is known. As a consequence, the q-quantile
crash hedging strategy will be developed in Sect. 4. Section 5 gives examples of
the q-quantile crash hedging strategy, while Sect. 6 shows that stochastic portfolio
strategies are always inferior to their corresponding deterministic portfolio strategies.
Finally, Sect. 7 concludes.
Let us work with the model introduced above and make the following refinements.
First, it has been tacitly assumed that the investor is able to realize that the
crash has happened. Thus, let us model its occurrence via an {F_t}-stopping time τ. To
model the fact that the investor is able to realize that a jump of the stock price has
happened, it is supposed that the investor's decisions are adapted to the P-augmentation
{F_t} of the filtration generated by the Brownian motion W(t). The difficulty of this
approach is to determine the optimal strategy after a crash, because the starting point
is random; however, Seifried [22] solved this problem.
Let us further suppose that the market conditions may change after a possible crash.
Let therefore k (with k ∈ [k_*, k^*]) denote the arbitrary size of a crash at time τ. The prices
of the bond and the risky asset after a crash of size k has happened at time τ are assumed
to follow the same dynamics as before the crash, but with constant market coefficients
r_1, μ_1, and σ_1 > 0. That is, this is the same market model as before the crash, except that the
market parameters are allowed to change after a crash has happened.
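A minimal sketch of such post-crash dynamics, under the assumption that the crash scales the risky asset price by the factor 1 − k at time τ (the symbols P_0 and P_1 for the bond and the risky asset are our own shorthand):
\[
dP_0(t) = P_0(t)\, r_1\, dt, \qquad dP_1(t) = P_1(t)\left[\mu_1\, dt + \sigma_1\, dW(t)\right], \qquad t \in (\tau, T],
\]
\[
P_1(\tau) = (1 - k)\, P_1(\tau-).
\]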
It is important to keep in mind that the investor does not know for certain that a
crash will occur—the investor only thinks that it is possible. An investor who knows
that a crash will happen within the time horizon [0, T ] has additional information
and is therefore an insider. The set of possible crash heights of the insider is indeed
K I := [k∗ , k ∗ ], while the set of possible crash heights of the investor who thinks that
a crash is possible is K := {0} ∪ [k∗ , k ∗ ]. In this paper, only the portfolio problem
of the investor, who thinks a crash is possible, is considered.
For simplicity, the initial market will also be called market 0, while the market
after a crash will be called market 1. In order to set up the model, the following
definitions are needed.
Definition 1 (i) For i = 0, 1, let Ai (s, x) be the set of admissible portfolio
processes π(t) corresponding to an initial capital of x > 0 at time s, i.e.,
{Ft , s ≤ t ≤ T }–progressively measurable processes such that
(a) the wealth equation in market i in the usual crash-free setting
\[
dX_i^{\pi,s,x}(t) = X_i^{\pi,s,x}(t)\left[\left(r_i + \pi(t)\left[\mu_i - r_i\right]\right) dt + \pi(t)\,\sigma_i\, dW_i(t)\right], \qquad (11)
\]
\[
X_i^{\pi,s,x}(s) = x, \qquad (12)
\]
\[
\int_s^T \left(\pi(t)\, X_i^{\pi,s,x}(t)\right)^2 dt < \infty \quad P\text{-a.s.}, \qquad (13)
\]
i.e. X iπ,s,x (t) is the wealth process in market i in the crash-free world, which
uses the portfolio strategy π and starts at time s with initial wealth x.
In this section, let us suppose that the investor knows the probability of a crash
occurring. Let p, with p ∈ [0, 1], be the probability that a crash can happen (but
need not necessarily happen)1. Note that the following argument also holds for time-
dependent p (that is, p(t)); however, to simplify the notation, it is assumed that p is
constant. In this situation, the optimization problem can be split up into two problems
(crash can occur, no crash happens) which have to be solved simultaneously. To that
end define for p ∈ [0, 1]
1 Observe that the important information is that no crash will happen with a probability of at least
1 − p. If one said instead that a crash will happen with probability p, the investor would become an
insider with an adjusted optimization problem as described in Sect. 2. However, this insider
approach would make the discussion considerably more difficult. Therefore, to simplify the discussion, the
approach of "no crash happens/a crash can happen" is taken here.
\[
E_p\!\left[\ln X^{\pi,t,x}(T)\right] := p\,\underbrace{E\!\left[\ln X^{\pi,t,x}(T)\right]}_{\text{a crash can occur}} \;+\; (1-p)\,\underbrace{E\!\left[\ln X_0^{\pi,t,x}(T)\right]}_{\text{no crash happens}}.
\]
(A) p = 1: In this case, only the crash term remains. Thus, this is the original worst-case scenario portfolio problem. The solution is
already known.
(B) p = 0:
\[
\sup_{\pi(\cdot)\in A(t,x)} \;\inf_{\substack{t\le\tau\le T,\\ k\in K}} E_0\!\left[\ln X^{\pi,t,x}(T)\right] \;=\; \sup_{\pi(\cdot)\in A_0(t,x)} E\!\left[\ln X_0^{\pi,t,x}(T)\right],
\]
which is the classical optimal portfolio problem of Merton. The solution is well
known and is given in our notation (compare with Eq. (2)) by π0∗ .
Let us now consider the case p ∈ (0, 1). Denoting the optimal crash hedging
strategy in this situation by π̂ p , Eq. (15) can be rewritten as
\[
J_0\!\left(t, x, \hat\pi_p\right) = p\,\nu_1\!\left(t, x\!\left[1 - \hat\pi_p(t)k^*\right]\right) + (1-p)\,J_0\!\left(t, x, \hat\pi_p\right)
\;\Longleftrightarrow\;
J_0\!\left(t, x, \hat\pi_p\right) = \nu_1\!\left(t, x\!\left[1 - \hat\pi_p(t)k^*\right]\right),
\]
where the second equation is obtained from the first by solving for J_0. Since the latter
equation is the indifference Eq. (5) in this setting, which leads
to the same ODE and boundary condition as in Korn and Wilmott [9], it follows that
π̂_p ≡ π̂ (see the paragraph between Eqs. (5) and (6) for details). This result shows
that the crash hedging strategy remains the same even if the probability of a crash
is known. Thus, this result justifies the wording worst-case scenario for the concept
developed above: the worst-case scenario should be independent of the probability of
the worst case occurring, which is exactly what has been shown here.
Let us summarize this result in a proposition.
Proposition 1 Given that the probability of a crash is positive, the worst-case sce-
nario portfolio problem as it has been defined in Eq. (3) is independent of the prob-
ability of the worst-case occurring.
If the probability of a crash is zero, the worst-case scenario portfolio problem
reduces to the classical crash-free portfolio problem.
Obviously, the worst-case scenario concept has the disadvantage that additional
information (namely, the given probability of a crash and the probability distribution
of the crash sizes) is not used. However, if the probability of a crash and the distribution
of the crash size are known, it is possible to construct the (lower) q-quantile crash
hedging strategy.
Assume that p(t) ∈ [0, 1] is the given probability of a crash at time t ∈ [0, T ] and
assume that f (k, t) ∈ [0, 1] is the given density of the distribution function for a crash
of size k ∈ [k_*, k^*] at time t. Moreover, suppose that a function q : [0, T] → [0, 1]
is given. With this, define
\[
k_q(t;\pi) :=
\begin{cases}
0, & \text{if } 1 - p(t) \ge q(t),\\[4pt]
\displaystyle \inf\Big\{\, k_q : 1 - p(t) + p(t)\int_{k_*}^{k_q} f(k,t)\,dk \;\ge\; q(t) \Big\}, & \text{if } 1 - p(t) < q(t) \text{ and } \pi \ge 0,\\[4pt]
\displaystyle \sup\Big\{\, k_q : 1 - p(t) + p(t)\int_{k_q}^{k^*} f(k,t)\,dk \;\ge\; q(t) \Big\}, & \text{if } 1 - p(t) < q(t) \text{ and } \pi < 0,
\end{cases}
\]
for any given portfolio strategy π. This has the following interpretation: the probability
that a crash of size at most k_q(t) happens at time t is q(t). Equivalently, the
probability that a crash larger than k_q(t) happens at time t is less than 1 − q(t).
Obviously, this is a value at risk approach which relaxes the worst-case scenario
approach.
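As an illustration, k_q can be computed by numerically inverting the conditional crash-size distribution. The following sketch does this for the case π ≥ 0; the uniform density and the values of k_*, k^*, p, q merely anticipate the examples of Sect. 5:

    # Sketch: compute k_q for pi >= 0 by inverting 1 - p + p * F(k_q) >= q numerically.
    import numpy as np

    k_lower, k_upper = 0.1, 0.5          # k_* and k^* as in the examples of Sect. 5
    p, q = 0.10, 0.95                    # crash probability and quantile level

    def f_uniform(k):
        # conditional density of the crash size, here uniform on [k_*, k^*]
        return 1.0 / (k_upper - k_lower)

    def k_q(p, q, density, n_grid=10_000):
        if 1.0 - p >= q:                 # first case of the definition
            return 0.0
        ks = np.linspace(k_lower, k_upper, n_grid)
        # cumulative conditional distribution F(k) via the trapezoidal rule
        increments = 0.5 * (density(ks[1:]) + density(ks[:-1])) * np.diff(ks)
        F = np.concatenate(([0.0], np.cumsum(increments)))
        total = 1.0 - p + p * F
        return ks[np.searchsorted(total, q)]   # smallest k with 1 - p + p*F(k) >= q

    print(k_q(p, q, f_uniform))          # close to k_* + (q + p - 1)/p * (k^* - k_*) = 0.3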
Notice that the worst case of a non-negative portfolio strategy is either a crash
of size k ∗ or no crash. On the other hand, the worst case of a negative portfolio
strategy is either a crash of size k∗ or no crash. Correspondingly, the q-quantile
calculates differently for negative portfolio strategies (see the third row) than for the
non-negative portfolio strategies (see the second row). Furthermore, denote by
\[
K_q(t) :=
\begin{cases}
\{0\}, & \text{if } k_q(t) = 0,\\
\{0\} \cup \left[k_*,\, k_q(t)\right], & \text{if } k_q(t) \neq 0 \text{ and } \pi \ge 0,\\
\{0\} \cup \left[k_q(t),\, k^*\right], & \text{if } k_q(t) \neq 0 \text{ and } \pi < 0.
\end{cases}
\]
with X_1^{π,τ,X_0^π(τ)}(t) as above, is called the (lower) q-quantile scenario portfolio
problem.
(ii) The value function of the above problem is defined via
\[
w_q(t,x) := \sup_{\pi(\cdot)\in A(t,x)}\; \inf_{\substack{t\le\tau\le T,\\ k\in K_q(\tau)}} E\!\left[\ln X^{\pi,t,x}(T)\right]. \qquad (18)
\]
\[
\hat\pi_q'(t) = \left[\hat\pi_q(t) - \frac{1}{k_q(t)}\right]\left\{\frac{\sigma_0^2}{2}\left[\hat\pi_q(t) - \pi_0^*\right]^2 + \Psi_1 - \Psi_0\right\} - \hat\pi_q(t)\,\frac{\dot k_q(t)}{k_q(t)}, \qquad (19)
\]
\[
\hat\pi_q(T) = 0. \qquad (20)
\]
For t ∈ [0, T] \ supp k_q, set π̂_q(t) := π_0^*. The strategy satisfies
\[
0 \le \hat\pi_q(t) < \frac{1}{k_q(t)} \le \frac{1}{k_*} \quad \text{for } t \in \operatorname{supp} k_q.
\]
On the other hand, if Ψ_1 < r_0, the q-quantile crash hedging strategy is bounded by
\[
\pi_0^* - \sqrt{\tfrac{2}{\sigma_0^2}\,(\Psi_0 - \Psi_1)} \;<\; \hat\pi_q(t) \;<\; 0 \quad \text{for } t \in [0, T).
\]
(ii) If Ψ_1 < Ψ_0 and π_0^* < 0, there exists a partial q-quantile crash hedging strategy
π̃_q at time t (which is different from π̂_q), if
\[
S_q(t) := T - \frac{\ln\!\left(1 - \pi_0^*\, k_q(t)\right)}{\Psi_0 - \Psi_1} > 0 \quad \text{for } t \in \operatorname{supp} k_q. \qquad (21)
\]
With this, π̃_q(t) is given by the unique solution of the differential equation
\[
\tilde\pi_q'(t) = \left[\tilde\pi_q(t) - \frac{1}{k_q(t)}\right]\left\{\frac{\sigma_0^2}{2}\left[\tilde\pi_q(t) - \pi_0^*\right]^2 + \Psi_1 - \Psi_0\right\} - \tilde\pi_q(t)\,\frac{\dot k_q(t)}{k_q(t)},
\]
\[
\tilde\pi_q\!\left(S_q(t)\right) = \pi_0^*.
\]
For S_q(t) ≤ 0, set π̃_q(t) := π_0^*. This partial crash hedging strategy is bounded by
\[
\pi_0^* - \sqrt{\tfrac{2}{\sigma_0^2}\,(\Psi_0 - \Psi_1)} \;<\; \tilde\pi_q(t) \;\le\; \pi_0^* \;<\; 0.
\]
If k_q is independent of the time t, the optimal portfolio strategy for an investor who
wants to maximize her q-quantile scenario portfolio problem is given by
\[
\bar\pi_q(t) := \min\{\hat\pi_q(t),\, \tilde\pi_q(t),\, \pi_0^*\} \quad \text{for all } t \in [0, T], \qquad (22)
\]
where π̃_q is taken into account only if it exists. π̄_q will also be called the optimal
q-quantile crash hedging strategy.
Remark 2 Let us write π̂k (t) (instead of π̂q (t)) to emphasize the dependence on k,
whenever needed. It follows from Eqs. (19) and (20) that
\[
\hat\pi_k'(T) = -\frac{1}{k}\left[\Psi_1 - r_0\right]
\;\xrightarrow{\;k\downarrow 0,\; k\neq 0\;}\;
\begin{cases}
\infty & \text{if } \Psi_1 < r_0,\\
0 & \text{if } \Psi_1 = r_0,\\
-\infty & \text{if } \Psi_1 > r_0.
\end{cases} \qquad (23)
\]
(a) First, observe that this implies that π̂_q(t) ≡ 0 if Ψ_1 = r_0; that is, this is the only
case where both the optimal q-quantile crash hedging strategy and the optimal
crash hedging strategy are constant. In other words, everything is invested in the risk-free
asset if Ψ_1 = r_0.
(b) Second, notice that π̂_{k_1}'(T) < π̂_{k_2}'(T) for k_1 < k_2. Hence, π̂_{k_1} ≥ π̂_{k_2}, with strict
inequality on [0, T). Thus, in particular, π̂_q(t) > π̂(t) for t ∈ [0, T)
for any q which satisfies q(t) < 1 for t ∈ [0, T).
(c) Third, for the remainder of this remark, let us consider only the case that Ψ_1 ≤ Ψ_0
and π_0^* ≥ 0 (the other cases follow similarly). In this situation, one has that
\[
\hat\pi_k(t) \;\le\; \pi_0^* - \sqrt{\tfrac{2}{\sigma_0^2}\,(\Psi_0 - \Psi_1)}.
\]
Thus, because of the convergence (23), it is clear that π̂_k converges pointwise (as k ↓ 0, k ≠ 0) to
\[
\psi(t) := \begin{cases} 0 & \text{for } t = T,\\[2pt] \pi_0^* - \sqrt{\tfrac{2}{\sigma_0^2}\,(\Psi_0 - \Psi_1)} & \text{else}. \end{cases}
\]
Finally, keep in mind that the case k = 0 yields
π_0^* as the optimal portfolio, with π_0^* ≢ ψ. An example is given in Fig. 4.
Proof (of Theorem 1) If k_q(t) is constant in t, this theorem follows from Theorem
4.1 in Korn and Wilmott [9, p. 181] (for generalizations of this theorem see either
Theorem 4.2 in Korn and Menkens [10, p. 135] or Theorem 3.1 in Menkens [16,
p. 603]) by replacing k^* with k_q. To verify the differential equation in the general
case, keep in mind that, when differentiating the (modified) Equation (A.5) in Korn and
Wilmott [9, p. 183] (or Eq. (3.1) in Menkens [16, p. 602]) with respect to t, k_q(t)
also has to be differentiated with respect to t. This leads to the differential equation (19).
[Fig. 4: example illustrating Remark 2(c), plotted over time in years (0-10).]
5 Examples
If the crash size is uniformly distributed on [k_*, k^*] (see Sect. 5.1) and π ≥ 0, the defining condition for k_q reads
\[
1 - p + p \int_{k_*}^{k_q} \frac{1}{k^* - k_*}\, dk = q,
\]
which gives
\[
k_q = k_* + \frac{q + p - 1}{p}\left(k^* - k_*\right).
\]
Again, kq = k ∗ for q = 1.
Suppose that not only the crash height has a conditional exponential distribution,
but also the crash time has a conditional distribution, independent of the crash size,
that is
\[
p(t) = q + (p - q)\,\frac{1 - e^{-\theta t}}{1 - e^{-\theta T}}. \qquad (24)
\]
This gives
\[
k_q(t) = -\frac{1}{\lambda}\,\ln\!\left(e^{-\lambda k^*} + \frac{[1-q]\left(1 - e^{-\theta T}\right)\left(e^{-\lambda k_*} - e^{-\lambda k^*}\right)}{q\left(1 - e^{-\theta T}\right) + (p - q)\left(1 - e^{-\theta t}\right)}\right).
\]
Clearly, this is an example where kq depends on the time t. Its derivative calculates
to
Fig. 5 The range of (optimal) q-quantile crash hedging strategies for Ψ1 = Ψ0 and π0∗ ≥ 0. This
graphic shows π̂k ∗ (solid line), π̂k∗ (dotted line), the range of possible optimal q–quantile crash
hedging strategies (grey area) if kq is constant, and π0∗ (solid straight line). The dash–dotted line is
a uniform distributed example (see Sect. 5.1), the dashed line is an exponential distributed example
(see Sect. 5.2), and the dotted line is a time–varying example (see Sect. 5.3).
\[
\frac{dk_q}{dt}(t) = -\frac{1}{\lambda}\;
\frac{\dfrac{-(p-q)[1-q]\,\theta e^{-\theta t}\left(1-e^{-\theta T}\right)\left(e^{-\lambda k_*}-e^{-\lambda k^*}\right)}{\left[q\left(1-e^{-\theta T}\right)+(p-q)\left(1-e^{-\theta t}\right)\right]^2}}
{e^{-\lambda k^*} + \dfrac{[1-q]\left(1-e^{-\theta T}\right)\left(e^{-\lambda k_*}-e^{-\lambda k^*}\right)}{q\left(1-e^{-\theta T}\right)+(p-q)\left(1-e^{-\theta t}\right)}}
\]
\[
= \frac{1}{\lambda\left[q\left(1-e^{-\theta T}\right)+(p-q)\left(1-e^{-\theta t}\right)\right]}\times
\frac{(p-q)[1-q]\,\theta e^{-\theta t}\left(1-e^{-\theta T}\right)\left(e^{-\lambda k_*}-e^{-\lambda k^*}\right)}
{(p-q)\left(1-e^{-\theta t}\right)e^{-\lambda k^*} + \left(1-e^{-\theta T}\right)\left(e^{-\lambda k_*}[1-q] - e^{-\lambda k^*}[1-2q]\right)}.
\]
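The closed form for k_q(t) and its derivative can be checked numerically, for instance with the following sketch (parameter values k_* = 0.1, k^* = 0.5, p = 0.1, q = 0.95, λ = 10, θ = 0.1, and T = 50 as in the example discussed below):

    # Sketch: evaluate k_q(t) from the closed form above and check dk_q/dt by finite differences.
    import numpy as np

    k_lo, k_hi = 0.1, 0.5                      # k_* and k^*
    p, q, lam, theta, T = 0.10, 0.95, 10.0, 0.1, 50.0

    def k_q(t):
        num = (1 - q) * (1 - np.exp(-theta * T)) * (np.exp(-lam * k_lo) - np.exp(-lam * k_hi))
        den = q * (1 - np.exp(-theta * T)) + (p - q) * (1 - np.exp(-theta * t))
        return -np.log(np.exp(-lam * k_hi) + num / den) / lam

    for t in (0.0, 10.0, 25.0, 50.0):
        h = 1e-5
        dk_dt = (k_q(t + h) - k_q(t - h)) / (2 * h)      # central difference approximation
        print(f"t = {t:5.1f}:  k_q = {k_q(t):.4f},  dk_q/dt ~= {dk_dt:.6f}")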
Figure 5 shows the potential range of the optimal q-quantile crash hedging strategy
(the gray shaded area) if k_q(t) ≠ 0 is constant. Obviously, in the case of k_q(t) = 0,
one has that π̂_q(t) = π_0^* (that is, the optimal strategy is to invest according to the Merton
fraction). Moreover, if k_q(t) is not constant, it can happen that the corresponding
q-quantile crash hedging strategy moves outside the given range. However, this usually
happens only if the derivative dk_q/dt is sufficiently large, which is not the case
in most situations. The parameters used in these figures are k^* = 0.5, k_* = 0.1,
π0∗ = 0.75, and σ0 = 0.25. Additionally, the examples discussed above are plotted
for the choices of p = 0.1 and q = 0.95. The dashed line is the example of a uniform
distribution with p = 10 % and q = 5 %. The dash–dotted line is the example of
an exponential distribution with the additional parameter λ = 10 (where the other
parameters are as above) and the dotted line is the example of a time varying crash
probability given in Eq. (24) with the additional parameter θ = 0.1. Notice that the
first two examples lead to similar strategies as in Korn and Wilmott [9], just that
k ∗ is replaced by kq , which is constant in those two examples. The third example
is clearly different from that. Starting with an investment horizon of T = 50 years,
the optimal strategy is to increase the fraction invested in the risky asset up to an
investment horizon of about 30 years. This is due to the fact that the probability of a
crash happening is 95 % at T = 50 and it is exponentially decreasing to 10 % as the
investment horizon is reached.
Proof (of Lemma 1) Using the Theorem of Fubini, one has for any admissible port-
folio strategy π
\[
\begin{aligned}
J_0(t, x, \pi) &= \ln(x) + E\!\left[\int_t^T \left(\Psi_0 - \frac{\sigma_0^2}{2}\left(\pi(s) - \pi_0^*\right)^2\right) ds\right]\\
&= \ln(x) + \int_t^T \left(\Psi_0 - \frac{\sigma_0^2}{2}\,E\!\left[\left(\pi(s) - \pi_0^*\right)^2\right]\right) ds\\
&= \ln(x) + \int_t^T \left(\Psi_0 - \frac{\sigma_0^2}{2}\left(E[\pi(s)] - \pi_0^*\right)^2 - \frac{\sigma_0^2}{2}\operatorname{Var}\!\left(\pi(s)\right)\right) ds\\
&= \ln(x) + \int_t^T \left(\Psi_0 - \frac{\sigma_0^2}{2}\left(\pi_d(s) - \pi_0^*\right)^2 - \frac{\sigma_0^2}{2}\operatorname{Var}\!\left(\pi(s)\right)\right) ds\\
&= J_0(t, x, \pi_d) - \frac{\sigma_0^2}{2}\int_t^T \operatorname{Var}\!\left(\pi(s)\right) ds\\
&\le J_0(t, x, \pi_d).
\end{aligned}
\]
This is the case if no crash happens. In the case that a crash has happened, one obtains,
with the definition
\[
A_\pi(t) := \ln\!\left(1 - E[\pi(t)]\,k_{\pi_d}(t)\right) - E\!\left[\ln\!\left(1 - \pi(t)\,k_\pi(t)\right)\right],
\]
an analogous estimate, where it has been used for the last inequality that A_π(t) ≥ 0. This is Jensen's
inequality, which holds if 1 − π(t)k_π(t) ≥ 0. The latter holds for π(t) < 1/k^*, which
is the assumption. This proves the assertion.
Remark 4 The condition π(t) < 1/k^* is natural if a crash of size k^* can happen,
because it prevents the investor from going bankrupt. Since k^* ≤ 1, the condition
means that the investor is not allowed to be too highly leveraged.
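The statement of this section, namely that a stochastic portfolio strategy is dominated by the deterministic strategy given by its expectation, can be illustrated by a small Monte Carlo experiment in the crash-free market. A minimal sketch (the market parameters and the uniformly randomized strategy are illustrative assumptions):

    # Sketch: expected log-utility of a randomized strategy vs. the deterministic strategy
    # given by its expectation (crash-free Black-Scholes market, log utility).
    import numpy as np

    rng = np.random.default_rng(0)
    r0, mu0, sigma0, T, n_steps, n_paths = 0.03, 0.07, 0.25, 10.0, 200, 200_000
    dt = T / n_steps

    def expected_log_wealth(pi_fn):
        log_x = np.zeros(n_paths)
        for _ in range(n_steps):
            pi = pi_fn()                               # fraction invested in the risky asset
            dW = rng.normal(0.0, np.sqrt(dt), n_paths)
            log_x += (r0 + pi * (mu0 - r0) - 0.5 * (pi * sigma0) ** 2) * dt + pi * sigma0 * dW
        return log_x.mean()

    pi_mean = 0.4
    det = expected_log_wealth(lambda: pi_mean)                                 # deterministic pi_d = E[pi]
    sto = expected_log_wealth(lambda: rng.uniform(0.0, 2 * pi_mean, n_paths))  # randomized, same mean
    print(f"deterministic: {det:.4f}   randomized: {sto:.4f}")   # deterministic should be larger

The gap between the two estimates corresponds to the variance penalty (σ_0^2/2) ∫ Var(π(s)) ds appearing in the proof above.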
7 Conclusion
It has been shown that the worst-case scenario approach of Korn and Wilmott [9] does
not make use of additional probabilistic information about a crash happening. This is
overcome by introducing a q-quantile approach, which is a Value at Risk ansatz applied to the
worst-case scenario method. Examples are given; in particular, one extreme example
shows that the q-quantile approach can yield optimal portfolio
strategies which are first increasing and then decreasing. Finally, it is shown that any
stochastic portfolio strategy gives a lower expected utility of terminal wealth (or
a lower worst-case scenario bound) than the corresponding deterministic portfolio
strategy (defined by taking the expectation of the stochastic portfolio strategy).
Acknowledgments I would like to thank Prof. Ralf Korn for many fruitful discussions, for generating
a stimulating working atmosphere, not only when I was his PhD student but also when I visited
him. I also benefited from discussions with Christian Ewald, Frank Thomas Seifried, and Mogens
Steffensen. Moreover, the feedback from Rudi Zagst and an anonymous referee improved this paper
considerably.
Support from DFG through the SPP 1033 Interagierende stochastische Systeme von hoher Kom-
plexität and partial support by the Science Foundation Ireland via the Edgeworth Centre (Grant No.
07/MI/008) and FMC2 (Grant No. 08/SRC/FMC1389) is gratefully acknowledged.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
10. Korn, R., Menkens, O.: Worst-case scenario portfolio optimization: a new stochastic control
approach. Math. Methods Oper. Res. 62(1), 123–140 (2005)
11. Korn, R.: Worst-case scenario investment for insurers. Insur.: Math. Econ. 36(1), 1–11 (2005)
12. Korn, R., Steffensen, M.: On worst-case portfolio optimization. SIAM J. Control Optim. 46(6),
2013–2030 (2007)
13. Korn, R., Menkens, O., Steffensen, M.: Worst-case-optimal dynamic reinsurance for large
claims. Eur. Actuar. J. 2(1), 21–48 (2012)
14. Mataramvura, S., Øksendal, B.: Risk minimizing portfolios and HJBI equations for stochastic
differential games. Stochastics 80(4), 317–337 (2008)
15. Menkens, O.: Costs and benefits of crash hedging. Preprint, available at http://ssrn.com/
abstract=2397233 February 2014
16. Menkens, O.: Crash hedging strategies and worst-case scenario portfolio optimization. Int. J.
Theor. Appl. Financ. 9(4), 597–618 (2006)
17. Merton, R.C.: Lifetime portfolio selection under uncertainty: The continuous-time case. Rev.
Econ. Stat. 51, 247–257 (1969)
18. Mönnig, L.: A worst-case optimization approach to impulse perturbed stochastic control with
application to financial risk management. PhD thesis, Technische Universität Dortmund (2012)
19. Øksendal, B., Sulem, A.: Portfolio optimization under model uncertainty and BSDE games.
Quant. Financ. 11(11), 1665–1674 (2011)
20. Pelsser, A.: Pricing in incomplete markets. Preprint, available at http://ssrn.com/abstract=
1855565 May 2011
21. Rustem, B., Howe, M.: Algorithms for Worst-case Design and Applications to Risk Manage-
ment. Princeton University Press, Princeton (2002)
22. Seifried, F.T.: Optimal investment for worst-case crash scenarios: a martingale approach. Math.
Oper. Res. 35(3), 559–579 (2010)
23. Wald, A.: Statistical decision functions which minimize the maximum risk. Ann. Math. 46(2),
265–280 (1945)
24. Wald, A.: Statistical Decision Functions. Wiley, New York (1950)
Improving Optimal Terminal Value
Replicating Portfolios
Abstract Currently, several large life insurance companies apply the replicating
portfolio technique for valuation and risk management of their liabilities. In [7], the
two most common approaches, cash-flow matching and terminal value matching,
have been investigated from a theoretical perspective and it has been shown that
optimal terminal value replicating portfolios are not suitable to replicate liability
cash-flows by construction. Thus, their usage for asset liability management is rather
restricted, especially for out-of-sample cash profiles of liabilities. In this paper, we
therefore enhance the terminal value approach by an additional linear regression of
the corresponding optimal dynamic numéraire strategy to overcome this drawback.
We show that terminal value matching together with an approximated dynamic strat-
egy has in-sample and out-of-sample performance very close to the optimal cash-
flow matching portfolio and, due to computational advantages, can thus be used as an
alternative for cash-flow matching, especially in risk and asset liability management.
1 Introduction
In recent years, market-consistent valuation has become the standard approach
to risk management of life insurance policies, see for example [3]. Due to the
complexity of life insurance contracts, most academics and practitioners resort to
Monte Carlo methods for valuation purposes. However, the difficulty is to find a
computationally efficient yet sufficiently accurate algorithm. For instance, contracts
may include surrender options, which allow the policy holder every year to cancel
the contract and withdraw the value of her account. In this context, [1, 2] and several
other authors therefore resort to the well-known least squares Monte Carlo approach,
which was originally introduced by [6] to price American options. In contrast, [9] first
suggested valuation of with-profits guaranteed annuity options, which are typical life
insurance products, via static replicating portfolios. To hedge against interest rate risk,
a portfolio is built of vanilla swaptions and a remarkably good fit of the market value
of annuity options is obtained. The purpose of constructing a replicating portfolio
is to approximate the liability cash-flows of an insurance company by a portfolio
formed by a finite number of selected financial instruments. If the approximation
is accurate, one obtains a good estimate of the market value of liabilities from the
fair value of the replicating portfolio. In current literature, two portfolio construction
approaches stand out. The first one aims to match liability cash-flows and cash-flows
of the replicating portfolio at each time point. The second one is less restrictive as
it only demands that accrued terminal values of the cash-flows match well at some
final time horizon T .
For risk purposes, insurance companies want to compute the fair value of their
assets and liabilities, i.e., the market consistent embedded value (MCEV) under
shifted market conditions now or one year in the future. More precisely, having
found a replicating portfolio which matches the fair value of liabilities under current
market conditions, one performs instantaneous shocks on known parameters (such
as volatility, forward rate curve, etc.) and checks if fair values are still matched. This
is commonly referred to as a comparison of sensitivities between the fair value of the
replicating portfolio and the fair value of liabilities. If sensitivities are similar, it is
usually assumed that fair values will be roughly matched one year in the future even
if rare events in the 99.5 % quantile take place. For instance, this is the motivation
for [4] to put additional constraints in the optimization problem to guarantee that
fair values are close to one another under various stress scenarios. Figure 1 illustrates
the dependence between initial asset prices and the fair value of liabilities and a
replicating portfolio. It can be observed that fair values are close to each other and
behave quite similarly, but are not fully identical.
For the purpose of improving terminal value matching, we start with the setup
as given in [7], that is, we consider the cash-flow matching problem and the termi-
nal value matching problem as proposed in [9] and [8], respectively, and relax the
requirement of static replication by allowing for dynamic investment strategies in
the numéraire asset. We briefly review the theoretical results derived therein, before
we investigate in more detail the benefit of our approach based on market scenarios
generated by an insurance company: First, we compare the in-sample and out-of-
sample performance of the two replicating portfolios. Then, in the main contribution
of this article, we take a closer look at the optimal dynamic investment strategy and
approximate it by a time-dependent linear combination of the replicating assets. In
our particular example, the approximation turns out to be remarkably accurate as
in-sample and out-of-sample tests will show.
Fig. 1 Fair value of liabilities and of the replicating portfolio depending on initial asset prices
1 Similarly to [7] we assume that all technical requirements are fulfilled (square integrability,
completeness of filtration, …).
• (N_t)_{t∈T} denotes the numéraire (with initial value N_0 = 1, paying no intermediate
cash-flows) which is used in the dynamic investment strategy. We assume that
N_T is paid as a cash-flow at the final time horizon. For convenience, let us write
C_0^F(t, D_t^F) for the cash payment of the numéraire at time t, that is,
\[
C_0^F\!\left(t, D_t^F\right) = \begin{cases} 0, & t = 1, \dots, T-1,\\ N_T, & t = T. \end{cases}
\]
Next, we review the two most commonly used approaches for the construction of a
replicating portfolio.
The objective function penalizes the difference between two cash payments at each
time t. The role of the discounting factor 1/Nt is to assign equal weight to mismatches
of equal size in terms of their discounted value. An alternative approach is discounted
terminal value matching.
The terminal value of a cash-flow is obtained by summing all cash payments accrued
to the terminal time T with the risk-free interest rate. By discounted terminal value, we
mean the accrued terminal value discounted to the present. In mathematical notation,
the discounted accrued liability cash-flow and the discounted accrued cash-flow of
a replicating portfolio α = (α 0 , . . . , α m ) ∈ Rm+1 are given by
\[
\tilde A^L := \sum_{t=1}^{T} \frac{C^L\!\left(t, D_t^F, D_t^L\right)}{N_t}, \qquad
\tilde A^F(\alpha) := \sum_{t=1}^{T} \sum_{i=0}^{m} \alpha^i\, \frac{C_i^F\!\left(t, D_t^F\right)}{N_t}.
\]
The observation that, although two cash-flows may have entirely different cash payment
profiles, they can still have the same fair value, leads to the alternative optimization
problem
\[
\min_{\alpha\in\mathbb{R}^{m+1}}\; E^Q\!\left[\left(\tilde A^L - \tilde A^F(\alpha)\right)^2\right]^{\frac12}. \qquad (RP_{\widetilde{TV}})
\]
Originally, this problem was introduced by [8] with the difference that they con-
sidered non-discounted terminal values.
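Over a finite set of equally weighted scenarios, the discounted terminal value matching problem is an ordinary least-squares problem: stack the discounted accrued asset cash-flows into a matrix and regress the discounted accrued liability cash-flow on it. A minimal sketch with randomly generated placeholder data (the arrays asset_cf, liab_cf, and numeraire are assumptions standing in for the insurer's scenario set, and probability weights are ignored):

    # Sketch: discounted terminal value matching as a least-squares problem over scenarios.
    import numpy as np

    rng = np.random.default_rng(1)
    n_scen, n_assets, n_years = 500, 3, 15

    # placeholder scenario data: cash-flows per scenario, asset and year, and the numeraire
    asset_cf = rng.lognormal(0.0, 0.2, size=(n_scen, n_assets, n_years))
    liab_cf = rng.lognormal(0.0, 0.2, size=(n_scen, n_years))
    numeraire = np.cumprod(1.0 + rng.normal(0.01, 0.002, size=(n_scen, n_years)), axis=1)

    # discounted accrued terminal values A~^F_i and A~^L per scenario
    A_F = (asset_cf / numeraire[:, None, :]).sum(axis=2)    # shape (n_scen, n_assets)
    A_L = (liab_cf / numeraire).sum(axis=1)                 # shape (n_scen,)

    # alpha minimizing the empirical root mean squared terminal value mismatch
    alpha, *_ = np.linalg.lstsq(A_F, A_L, rcond=None)
    residual = np.sqrt(np.mean((A_L - A_F @ alpha) ** 2))
    print("alpha:", np.round(alpha, 4), " in-sample RMS mismatch:", round(residual, 4))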
Next, we recall the connection between (RPCF ) and (RPTV ˜ ) as established in [7]. If
the numéraire asset can be bought or sold at any time, problems (RPCF ) and (RPTV ˜ )
are practically the same. The brief explanation is that cash-flow mismatches can be
offset by an appropriate strategy in this asset. These mismatches then sum up to
the discounted terminal value mismatch and thus problems (RPCF ) and (RPTV ˜ ) are
intimately linked.
In more detail, suppose that the insurance company is allowed to invest and finance
cash-flows from trading the numéraire asset at all times t = 1, . . . , T . Define the
following linear space of processes
\[
A = \left\{\, (\delta_t)_{t=1,\dots,T} \;:\; \delta_t \in L^2(F_t)\ \ \forall\, t = 1, \dots, T-1, \quad \sum_{t=1}^{T} \delta_t = 0 \,\right\}.
\]
The introduction of such strategies turns out to be the key link between problems
(RP_CF) and (RP_{\widetilde{TV}}): the discounted terminal value Ã^F(α, δ) corresponding to an
investment strategy (α, δ) with α ∈ R^{m+1} and δ ∈ A satisfies
\[
\tilde A^F(\alpha, \delta) = \tilde A^F(\alpha),
\]
where
\[
\tilde A^F(\alpha, \delta) := \sum_{t=1}^{T} \sum_{i=0}^{m} \alpha^i\, \frac{C_i^F\!\left(t, D_t^F\right)}{N_t} \;-\; \sum_{t=1}^{T} \delta_t.
\]
In other words, the discounted terminal value only depends on the initial portfolio
α 0 , α 1 , . . . , α m in the assets. Thus, we write à F (α) instead of à F (α, δ). We say
that two investment strategies (α, δ) and (β, δ̂) with α, β ∈ Rm+1 , δ, δ̂ ∈ A are
FV-equivalent iff
α = β.
Note that due to the above, initial portfolios of two FV-equivalent investment strate-
gies have equal fair value, as they produce identical discounted terminal values.
Based on the extension from static portfolios to partially dynamic strategies, we
define corresponding optimization problems,
\[
\inf_{\alpha\in\mathbb{R}^{m+1},\,\delta\in A}\; E^Q\!\left[\sum_{t=1}^{T}\left(\frac{C^L\!\left(t, D_t^F, D_t^L\right)}{N_t} - \sum_{i=0}^{m}\alpha^i\,\frac{C_i^F\!\left(t, D_t^F\right)}{N_t} - \delta_t\right)^2\right]^{\frac12}, \qquad (GRP_{CF})
\]
the generalized cash-flow matching problem, and its terminal value analogue (GRP_{\widetilde{TV}}),
the generalized discounted terminal value matching problem. Based on the following
two additional weak assumptions, the main results follow.
Assumption 1 The matrix E^Q[Q^F] is positive definite, where
\[
Q^F := \left(\tilde A_i^F\, \tilde A_j^F\right)_{i,j=0,\dots,m},
\]
with the discounted terminal value Ã_i^F of the cash-flow generated by asset i given as
\[
\tilde A_i^F := \sum_{t=1}^{T} \frac{C_i^F\!\left(t, D_t^F\right)}{N_t}.
\]
Assumption 2 Let α_opt = (α_opt^0, α_opt^1, \dots, α_opt^m) be the solution to (RP_{\widetilde{TV}}). The
cash-flow mismatch
\[
C^L\!\left(T, D_T^F, D_T^L\right) - \sum_{i=0}^{m} \alpha_{\mathrm{opt}}^i\, C_i^F\!\left(T, D_T^F\right)
\]
is not F_{T-1}-measurable.
The following properties of the two optimization problems and their connections
were derived in [7].
1. Properties of (GRPTV
˜ ) and the relationship to (RPTV
˜ ):
a. The fair value of the solutions to (GRPTV˜ ) and the fair value of the solution
to (GRPCF ) are equal to the fair value of the liability cash-flow.
The main drawback of the generalized terminal value approach lies in the intro-
duction of the dynamic strategy in the numéraire asset: as the optimal δt depend on
the liability cash-flow (see Property 2.c above), this strategy is not available out-
of-sample to reproduce (unknown!) liability cash-flows. Although the main purpose
of replicating portfolios in risk management is fair value replication, asset liability
management usually requires cash-flow replication as well.
Therefore, the optimal numéraire strategy has to be estimated based on available
information up to time t, which then in turn allows a reproduction of liability cash-
flows, even in a terminal value approach. The simplest approach toward this end
is a standard linear regression of the optimal δ_t against the information available at
time t. Besides the obvious usage of prices of financial instruments as explaining
variables, any further available information (e.g., non-traded risk factors like interest
rate, etc.) could in theory be used for this purpose.
Starting with the portfolio solving (RPTV ˜ ), we compute (δt )t=1,...,T such that
cash-flows are perfectly matched in-sample except for T . The idea is to approximate
δt , t = 1, . . . T − 1 by an ordinary linear regression, that is
\[
\hat\delta_t(a) := a_t^1\,\frac{C_{CA}^F\!\left(t, D_t^F\right)}{N_t} + a_t^2\,\frac{C_{SP}^F\!\left(t, D_t^F\right)}{N_t} + a_t^3\,\frac{C_{NK}^F\!\left(t, D_t^F\right)}{N_t}, \qquad t = 1, \dots, T-1,
\]
\[
\hat\delta_T(a) := -\sum_{t=1}^{T-1} \hat\delta_t.
\]
In other words, we solve (GRP_CF) with α_opt^0, \dots, α_opt^m fixed and optimal for (RP_{\widetilde{TV}}),
and with δ_t restricted to the form above. Note that the parameters (a_t^1, a_t^2, a_t^3)_{t=1,\dots,T-1}
are known to the insurer at present. The hope is that the portfolio obtained from matching
discounted terminal values, together with the dynamic investment strategy (δ̂_t)_{t=1,\dots,T},
will produce an out-of-sample objective value at least similar to that of the static portfolio
obtained from solving (RP_CF).
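The yearly regressions themselves are standard least squares. A minimal sketch (the arrays disc_asset_cf and delta_opt are placeholders for the discounted asset cash-flows and the in-sample optimal numéraire strategy):

    # Sketch: approximate the optimal numeraire strategy delta_t by yearly linear regressions
    # on the discounted asset cash-flows, as described above.
    import numpy as np

    rng = np.random.default_rng(2)
    n_scen, n_years = 500, 15

    disc_asset_cf = rng.normal(1.0, 0.3, size=(n_scen, 3, n_years))  # C^F_i(t)/N_t, placeholder
    delta_opt = rng.normal(0.0, 1.0, size=(n_scen, n_years))         # optimal delta_t, placeholder

    coeffs, delta_hat = [], np.empty_like(delta_opt)
    for t in range(n_years - 1):                      # regress for t = 1, ..., T-1
        X = disc_asset_cf[:, :, t]                    # regressors: the three discounted cash-flows
        a, *_ = np.linalg.lstsq(X, delta_opt[:, t], rcond=None)
        coeffs.append(a)
        delta_hat[:, t] = X @ a
    delta_hat[:, -1] = -delta_hat[:, :-1].sum(axis=1)  # delta_hat_T closes the strategy (sums to zero)

    ss_res = ((delta_opt[:, 0] - delta_hat[:, 0]) ** 2).sum()
    ss_tot = ((delta_opt[:, 0] - delta_opt[:, 0].mean()) ** 2).sum()
    print("a_1 coefficients:", np.round(coeffs[0], 3), " R^2 in year 1:", round(1 - ss_res / ss_tot, 3))

Note that, as in the displayed definition of δ̂_t(a), no intercept is used in the regressions.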
5 Example
Based on financial market scenarios provided by a life insurer, we carry out some
numerical analysis to compare the performance of the portfolios solving (RPCF ) and
(RPTV˜ ). The results above imply that in an in-sample test the terminal value technique
will outperform the cash-flow matching technique. On the other hand, it is not clear
what happens in an out-of-sample test. This will also depend on the robustness of
both methods.
Since scenarios for liability cash-flows were unavailable, we implemented the
model proposed by [5]. A policy holder pays an initial premium P0 , which is invested
by the insurer in a corresponding portfolio of assets with value process (At )t∈N .
The value of the contract (L t )t∈N now evolves according to the following recursive
formula.
\[
L_{t+1} = L_t\left(1 + \max\!\left\{r_G,\; \rho\left(\frac{A_t - L_t}{L_t} - \gamma\right)\right\}\right), \qquad t = 0, 1, \dots,
\]
where r G is the interest guaranteed to the policy holder, ρ is the level of participation
in market value earnings and γ is a target buffer ratio.
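A minimal sketch of this recursion (the asset-value path A_t is a random placeholder; the parameter values r_G = 2 %, ρ = 0.75, γ = 0.05 and the initial premium of 10,000 Euros are those used later in this section):

    # Sketch: evolution of the contract value L_t according to the recursion above.
    import numpy as np

    r_G, rho, gamma = 0.02, 0.75, 0.05      # guaranteed rate, participation, target buffer ratio
    P0, years = 10_000.0, 15

    rng = np.random.default_rng(3)
    A = P0 * np.cumprod(1.0 + rng.normal(0.04, 0.08, years))   # placeholder asset-value path A_t

    L = np.empty(years + 1)
    L[0] = P0
    for t in range(years):
        bonus = rho * ((A[t] - L[t]) / L[t] - gamma)            # participation in the buffer
        L[t + 1] = L[t] * (1.0 + max(r_G, bonus))               # credited rate is at least r_G
    print(np.round(L, 2))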
To generate liability cash-flows, we assumed that starting in January 1998 the
insurance company receives one client every year up to 2012. Each client pays an
initial nominal premium of 10.000 Euros. All contracts run 15 years. At maturity,
the value of the contract is paid to the policy holder generating a liability cash-flow.
The portfolio in which the premia are invested consists of the Standard & Poor's
500 index, the Nikkei 225 index, and the cash account. We normalized the values of the
cash account, the S&P 500, and the Nikkei 225 so that all three have value 1 Euro
in year 2012. Every year the portfolio is adjusted such that 80 % of the value is
invested in the cash account and 10 % is invested in each index.
For the construction of the replicating portfolio, we chose the same three assets.
Cash-flows are generated by selling or buying assets every year. Since we are con-
structing a static portfolio, the decision how many assets will be bought or sold each
year in the future has to be made in the present. Hence, one may regard the replicating
assets as 3 × 15 call options with strike 0, one option for each index/cash account
and each year. We chose the cash account as the numéraire asset in the market.
In order to make the evolution of contract values more sensitive to changes of
financial asset prices, we assumed a low guaranteed interest rate of r G = 2.0 %, a
high participation ratio ρ = 0.75 and a low target buffer ratio γ = 0.05. From 1,000
scenarios,4 we chose to use 500 for the construction of the replicating portfolios and
the remaining 500 for an out-of-sample performance test. The portfolio is constructed
Table 1 Optimal replicating portfolios (in thousand Euros) for problems (RPCF ) and (RPTV˜ ) and
their fair value
Cash-flow match Terminal value match
Year Cash account S&P Nikkei Cash account S&P Nikkei
Total initial position 173.010 2.688 0.886 178.091 11.174 −12.047
Fair value 176.6 177.2
2012 14.089 0 0 0 3.062 −5.530
2013 13.518 0.042 −0.005 0 2.739 −2.969
2014 13.141 0.052 −0.125 0 4.511 −8.693
2015 12.757 0.171 −0.091 0 −1.790 3.402
2016 12.719 0.237 −0.137 0 −0.745 −3.480
2017 12.728 0.261 −0.038 0 −0.093 −0.459
2018 11.685 0.250 0.046 0 −1.600 4.009
2019 11.300 0.304 0.081 0 1.269 −8.352
2020 10.964 0.212 0.065 0 0.123 7.723
2021 10.606 0.196 0.131 0 1.013 2.730
2022 10.308 0.208 0.169 0 2.139 −1.091
2023 10.155 0.269 0.287 0 2.179 −1.842
2024 9.847 0.184 0.191 0 −1.522 4.333
2025 9.645 0.152 0.143 0 0.828 −0.190
2026 9.549 0.150 0.168 178.091 −0.939 −1.637
The sample fair value of liabilities for the first 500 scenarios is 1.76 × 10^5 Euros
4 As scenarios were provided by a life insurance company, only this restricted number of scenarios
was available. Scenario paths for the Nikkei and the S&P indices as well as the cash account
were generated with standard models from the Barrie and Hibbert Economic Scenario Generator
(see www.barrhibb.com/economic_scenario_generator).
Table 2 Values of the objective function in (RPCF ) for optimal portfolios to (RPCF ) and (RPTV
˜ )
relative to the fair value of liabilities
In-sample (%) Out-of-sample (%)
Cash-flow 8.72 9.23
Terminal value 193.2 192.8
in year 2012. Tables 1 and 2 show optimal portfolios and the magnitude of in-sample
and out-of-sample mismatches. The numbers in Table 1 show which quantity (in
thousands) of each asset should be bought or sold at the end of each particular
year and the total initial position in year 2012. For the mismatches in Table 2, we
computed the objective value of the cash-flow matching problem for both portfolios
in-sample and out-of-sample, and divided it by the fair value of liabilities. Therefore,
these numbers can be viewed as a relative error.
It needs to be noted that in the terminal value matching problem, all strategies
concerning purchases and sales of the cash account lead to the same objective value.
Hence, the terminal position of 178.091 could have been spread in all possible man-
ners over the years 2012–2026 without any difference.
As one may have expected, the replicating portfolio obtained from discounted
terminal value matching matches cash payments in particular years very badly, since
these mismatches are not penalized by the objective function of the discounted
terminal value matching problem. Consequently, a replicating portfolio obtained from
terminal value matching is of little use to the insurer if cash payments are supposed
to match well at each point in time. As already explained, the remedy is
to employ an approximation of the appropriate dynamic investment strategy in the
numéraire asset.
We implemented the linear approximation of the optimal dynamic investment
strategy as outlined at the end of Sect. 4 for the same scenarios that
were used for the portfolio optimizations (see Fig. 2). Table 3 shows the optimal
parameters (at1 , at2 , at3 )t=1,...,T −1 and the coefficients of determination R 2 .
At first sight, it is striking how large the coefficients of determination (R^2) are (on
average above 80 %). However, since the optimal δ_t is a linear combination of the
discounted financial cash-flows C_CA^F(t, D_t^F)/N_t, C_SP^F(t, D_t^F)/N_t, and C_NK^F(t, D_t^F)/N_t
and the discounted liability cash-flow C^L(t, D_t^F, D_t^L)/N_t, this is
not too surprising. Actually, if liability cash-flows were known, i.e., available for the
regression, a perfect fit (i.e., R^2 = 100 %) would be obtainable. In all other cases,
the liability cash-flow is approximated by the asset cash-flows rather well.
Analogous to Table 2, Table 4 shows the in-sample and out-of-sample objective
function values for the portfolio solving (RPCF ) and the portfolio solving (RPTV ˜ )
Fig. 2 The bar chart shows cash-flows of liabilities and the optimal terminal value replicating
portfolio with and without a dynamic correction in the first ten years
together with dynamic investment strategy (δ̂t )t=1,...,T relative to the fair value of
the liabilities.
Clearly, the dynamic strategy in the replicating assets significantly improves the
quality of the cash-flow match. Yet, the optimal portfolio for cash-flow matching still
slightly outperforms this dynamic variant due to the reasoning given above.
We also regressed with additional in-the-money call options on the cash-flows,
but there was only a negligible improvement in-sample and out-of-sample. Possibly,
one may achieve better results with a more sophisticated choice of regressors, but that
seems unlikely or at least challenging given the high coefficients of determination.
Further, all results obtained above have been tested to be quite stable when changing
the number of scenarios or changing the specific choice of liabilities. Of course, a
more detailed analysis based on a real-world example could provide further valuable
insights.
Table 3 Parameters (in thousands) obtained from linear regression and the coefficients of deter-
mination
Year a1 a2 a3 R2
2012 1.4089 −2.3976 0.6460 1
2013 1.3634 −1.9468 0.3073 0.96
2014 1.3361 −3.2633 0.9379 0.99
2015 1.2956 1.6990 −0.4615 0.87
2016 1.2954 0.8510 0.3424 0.91
2017 1.2903 0.3433 0.0117 0.18
2018 1.1768 1.4854 −0.4825 0.83
2019 1.1267 −0.6920 0.9790 0.96
2020 1.0790 0.1306 −0.8758 0.94
2021 1.0380 −0.5618 −0.2773 0.69
2022 1.0143 −1.4692 0.1697 0.61
2023 1.0078 −1.4899 0.2622 0.78
2024 0.9846 1.3168 −0.4793 0.93
2025 0.9706 −0.5706 0.0376 0.43
Table 4 Values of the objective function in (RP_CF) for the optimal portfolio to (RP_CF) and the
optimal portfolio to (RP_{\widetilde{TV}}) with strategy (δ̂_t)_{t=1,\dots,T}, relative to the fair value of liabilities
Cash-flow (%) T.V. w. correction (%)
In-sample 8.72 10.16
Out-of-sample 9.23 11.05
6 Conclusion
Acknowledgments The authors would like to thank Pierre Joos, Christoph Winter, and Axel See-
mann for very helpful discussions and feedback. We are also grateful for the generous constant
financial support by Allianz Deutschland AG. Finally, many thanks go to two anonymous referees
of this paper for valuable comments which helped to improve presentation.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Andreatta, G., Corradin, S.: Fair value of life liabilities with embedded options: an application
to a portfolio of Italian insurance policies. Working Paper, RAS Spa, Pianificazione Redditività
di Gruppo
2. Baione, F., De Angelis, P., Fortunati, A.: On a fair value model for participating life insurance
policies. Invest. Manag. Financ. Innov. 3(2), 105–115 (2006)
3. Bauer, D., Bergmann, D., Kiesel, R.: On the risk-neutral valuation of life insurance contracts
with numerical methods in view. Astin Bull. 40, 65–95 (2010)
4. Dubrana, L.: A formalized hybrid portfolio replication technique applied to participating life
insurance portfolios. Available at http://www.ludovicdubrana.com// (2013) Accessed 30 Dec
2013
5. Grosen, A., Jørgensen, P.L.: Fair valuation of life insurance liabilities: the impact of interest
rate guarantees, surrender options, and bonus policies. Insur.: Math. Econ. 26(1), 37–57 (2000)
6. Longstaff, F., Schwartz, E.: Valuing american options by simulation: a simple least-squares
approach. Rev. Financ. Stud. 14(1), 113–147 (2001)
7. Natolski, J., Werner, R.: Mathematical analysis of different approaches for replicating portfo-
lios. Euro. Actuar. J. (2014). doi:10.1007/s13385-014-0094-z
8. Oechslin, J., Aubry, O., Aellig, M., Kappeli, A., Bronnimann, D., Tandonnet A., Valois, G.:
Replicating embedded options. Life and Pensions Risk, 47–52 (2007)
9. Pelsser, A.: Pricing and hedging guaranteed annuity options via static option replication. Insur.:
Math. Econ. 33(2), 283–296 (2003)
Part IV
Computational Methods
for Risk Management
Risk and Computation
Rüdiger U. Seydel
1 Computational Risk
Early computer codes concentrated on the evaluation of special functions. The empha-
sis was to deliver full accuracy (say, seven correct decimal digits on a 32-bit machine)
in minimal time. Many of these algorithms are based on formulas of [1, 6]. Later
the interest shifted to more complex algorithms such as solving differential equa-
tions, where discretizations are required. Typically, the errors are of the type CΔ p ,
where Δ represents a discretization parameter, p denotes the convergence order of the
method, and C is a hardly assessable error coefficient. A control of the error is highly
complicated, costly, and frequently somewhat vague, and is a source of computational
risk.
This first part of the paper discusses how to assess the risk from erroneous results
of algorithms. Accuracy properties of algorithms will have to be reconsidered.
[Figure: schematic of the trade-off between error (small to large) and computational costs (low to large) for numerical algorithms; the research aim is a small error at low cost.]
1 An example of such a diagram for the task of pricing American-style options is, for example,
Fig. 4.19 in [14], however, for the root mean square error of a set of 60 problems.
Notice that this error in the final result does not explicitly consider intermediate
errors or inconsistencies in the algorithm. For example, errors from solving linear
equations, instability caused by propagation of rounding errors, or discretization
errors do not enter explicitly. The final lumped error is seen through the eyes of the user.
Then these “ultimate” versions of algorithms are investigated for their accuracy.
We suggest to gather accuracy or error information into a file separate from the
algorithm. This “file” can be a look-up table, or a set of inequalities for parameters.
Typically, the accuracy results will be determined empirically. As an illustration,
the accuracy information for a certain task (say, pricing an American-style vanilla
put option) and a specific set of parameters (strike, volatility σ , interest rate r , time
to maturity T) might look as in Table 1. In an application, one then chooses the algorithm
according to the information file.
1.4 Effort
Certainly, the above suggestion amounts to a big endeavor. In general, original papers
do not contain the required accuracy information. Instead, usually, convergence
behavior, stability, and intermediate errors are analyzed. Accuracy is mostly tested
on a small selection of numerical examples. It will be a challenge for researchers to
provide the additional accuracy information for “any” set of parameters. The best way
to organize this is left open. Strong results will establish inequalities for the parame-
ters that guarantee certain accuracy. Weaker results will establish multidimensional
tables of discrete values of the parameters, and the application will interpolate the
accuracy.
To encourage the work, let us repeat the advantages: Accuracy information and
conditions under which algorithms fail will be included in external files. The algo-
rithms will be slimmed down, the production runs will be faster, and the costs on a
particular computer are fixed and known in advance. The computational risk will be
eliminated.
1.5 Example
As an example, consider the pricing of a vanilla American put at the money, with one
year to maturity. We choose an algorithm that implements the analytic interpolation
[Fig. 2: relative error of the computed option price over the (σ, r) parameter plane, with σ ranging over roughly [0.2, 0.9] and r over roughly [0.02, 0.1].]
method by Johnson [7].3 For the specific option problem, the remaining parame-
ters are r and σ . Figure 2 shows the relative error in the calculated price of the
option depending on r and σ , and implicitly a map of accuracies. For the underlying
rectangle of realistic r, σ -values, and the assumed type of option, a result can be
summarized as follows:
In case σ > 3r holds, the absolute value of the relative error is smaller than 0.005
(two and a half correct digits).
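To illustrate how such an accuracy map can be generated empirically, the following sketch tabulates the relative error of a simple approximation against a binomial-tree benchmark over a grid of (r, σ). The European Black-Scholes put is used here only as a stand-in approximation, not the Johnson method discussed in the text, and the parameter grid is illustrative:

    # Sketch: build an empirical accuracy map over (r, sigma) for an American-put approximation,
    # using a binomial tree as the reference. The "approximation" here is deliberately simple
    # (the European Black-Scholes put), NOT the Johnson method of the text.
    import numpy as np
    from math import exp, sqrt, log, erf

    def bs_put(S, K, r, sigma, T):
        d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
        d2 = d1 - sigma * sqrt(T)
        N = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
        return K * exp(-r * T) * N(-d2) - S * N(-d1)

    def american_put_binomial(S, K, r, sigma, T, n=800):
        dt = T / n
        u, d = exp(sigma * sqrt(dt)), exp(-sigma * sqrt(dt))
        p = (exp(r * dt) - d) / (u - d)
        disc = exp(-r * dt)
        j = np.arange(n + 1)
        values = np.maximum(K - S * u**j * d**(n - j), 0.0)       # payoff at maturity
        for m in range(n - 1, -1, -1):
            j = np.arange(m + 1)
            cont = disc * (p * values[1:] + (1 - p) * values[:-1])
            values = np.maximum(K - S * u**j * d**(m - j), cont)  # early exercise check
        return values[0]

    S = K = 1.0; T = 1.0                   # at-the-money, one year to maturity, as above
    for r in (0.03, 0.05, 0.07):
        for sigma in (0.2, 0.4, 0.6):
            ref = american_put_binomial(S, K, r, sigma, T)
            rel_err = abs(bs_put(S, K, r, sigma, T) - ref) / ref
            print(f"r={r:.2f}  sigma={sigma:.2f}  relative error={rel_err:.4f}")

The resulting table of empirical relative errors is exactly the kind of information that could be stored in the external accuracy file suggested above.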
We now turn to the second topic of the paper, on how to assess structural changes in a
model computationally. This is based on dynamical systems, in which the dynamical
behavior depends on a certain model parameter. Critical threshold values of this
parameter will be decisive. Below we shall understand “structural risk” as given by
the distance to the next threshold value of the critical parameter. An early paper
stressing the role threshold values (bifurcations) can play for a risk analysis is [11].
The approach has been applied successfully in electrical engineering for assessing
voltage collapse, see [3]. We begin with recalling some basic facts from dynamical
systems.
3 For analytic methods, strong results may be easier to obtain because implementation issues are
less relevant.
An ODE stability analysis of its deterministic kernel does not reveal an attractor.
Rather, the equilibrium is degenerate: the Jacobian matrix is singular. Simulating the
tandem system shows two trajectories dancing around each other, but drifting erratically
across the phase space. What is needed is some anchoring, which can be provided
by an additional nonlinear term.
We digress for a moment to emphasize that the above tandem is a mean-field model.
In canonical variables x1 , x2 , it is of the type
\[
\begin{aligned}
dx_1 &= \alpha_1^*\left[\tfrac12(x_1 + x_2) - x_1\right] dt + \gamma_1\, x_1\, dW_t^{(1)},\\
dx_2 &= \alpha_2^*\left[\tfrac12(x_1 + x_2) - x_2\right] dt + \gamma_2\, x_2\, dW_t^{(2)}.
\end{aligned}
\]
The arithmetic mean
\[
\bar x := \frac{1}{n}\sum_{i=1}^{n} x_i
\]
is the driving term of such models and a key element for modeling interaction among agents [4, 5]. More general mean-field models include an additional nonlinear term, and are of the type
Notice that the dimension n is a parameter, and the solution structure thus depends
on the number of variables. The parameters α measure the size of cooperation, and
γ the strength of external random forces. The nonlinearity f (x) and the balance of
the parameters β, α, γ , n control the dynamics.
As noted above, a suitable nonlinear term can induce a dynamic control that prevents
the trajectories from drifting around erratically. Here we choose a cubic nonlinearity
of the Duffing-type f (x) = x − x 3 , since it represents a classical bistability [13].
For slightly more flexibility, we shift the location of equilibria by a constant s;
otherwise, we choose constants artificially. For the purpose of demonstration, our
artificial example is the system
\[
\begin{aligned}
dx_1 &= 0.1\,(x_1 - s)\left[1 - (x_1 - s)^2\right] dt + 0.5\,[x_2 - x_1]\, dt + 0.1\, x_1\, dW_t,\\
dx_2 &= 0.5\,[x_1 - x_2]\, dt.
\end{aligned}
\]
Fig. 3 Artificial example of Sect. 2.3: x1 and x2 over time t, for s = 2, starting at 0.1
Fig. 4 x1 , x2 -phase plane, with the trajectory of Fig. 3, and 11 trajectories of the unforced system
Clearly, there are three ODE equilibria, namely two stable nodes at x1 = x2 = s ± 1
and a saddle at x1 = x2 = s. For graphical illustrations of the response, see Figs. 3
and 4.
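A minimal sketch of an Euler-Maruyama simulation of this artificial example (s = 2 and the starting value 0.1 as in Fig. 3; the time grid and seed are arbitrary choices):

    # Sketch: Euler-Maruyama simulation of the artificial example of Sect. 2.3 (s = 2).
    import numpy as np

    rng = np.random.default_rng(4)
    s, gamma = 2.0, 0.1
    T, n = 100.0, 20_000
    dt = T / n

    x1, x2 = 0.1, 0.1                          # starting at 0.1, as in Fig. 3
    path = np.empty((n + 1, 2)); path[0] = (x1, x2)
    for i in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))
        dx1 = 0.1 * (x1 - s) * (1.0 - (x1 - s) ** 2) * dt + 0.5 * (x2 - x1) * dt + gamma * x1 * dW
        dx2 = 0.5 * (x1 - x2) * dt
        x1, x2 = x1 + dx1, x2 + dx2
        path[i + 1] = (x1, x2)

    print("final state:", np.round(path[-1], 3))   # trajectories settle near the node at s - 1 = 1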
Figure 3 depicts the quick attraction of the trajectories (starting at 0.1) toward
the smaller node at s − 1 = 1. This dynamical response is shown again in Fig. 4 in
the x1 , x2 -phase portrait. As a background, this figure shows 11 trajectories of the
deterministic kernel, where the random perturbation is switched off. Starting from 11
initial points in the plane, the trajectories approach one of the two stable nodes. This
part of the plane consists of two basins of attraction, separated by a separatrix that
includes the saddle. The phase portrait of the deterministic kernel serves as skeleton
of the dynamics possible for the randomly perturbed system.
Now imagine increasing the strength of the random force (enlarging γ). For suf-
ficiently large γ, the trajectories may jump across the wall of the separatrix. Then
the dynamics is attracted by the other node. Obviously, these transitions between
the two regimes may happen repeatedly. In this way, one of the stylized facts can
be modeled, namely the volatility clustering [2]5 . This experiment underlines the
modeling power of such nonlinear systems.
The above gentle reminder on dynamical systems has exhibited the three items node,
saddle, and separatrix. There are many more “beasts” in the phase space. The fol-
lowing is an incomplete list:
5 Phases with high and low volatility are separated from each other.
• stationary state,
• periodic behavior,
• chaotic behavior,
• jumps, discontinuities,
• loss or gain of stability.
These qualitative labels stand for the structure of dynamical responses. The structure
may change when a parameter is varied. Although a “parameter” is a constant, it may
undergo slow variations, or may be manipulated by some external (political) force.
Such changes in the “constant” parameter are called quasi-stationary. Typically, our
parameter is in the role of a control parameter. Some variations in the parameter may
have little consequences on the response of the system. But there are critical thresh-
old values of the parameter, where the changes in the structure can have dramatic
consequences. At these thresholds, small changes in the parameter can trigger essen-
tial changes in the state of the system. The mathematical mechanism that explains
such qualitative changes is bifurcation.6
When a system drifts toward a bifurcation, this must be considered a risk!
Bifurcation is at the heart of systemic risk. Hence there is a need for a tool that signals
bifurcations in advance.
\[
R(\lambda) := \frac{\lambda}{|\lambda - \lambda_0| - \varepsilon}\,,
\]
\[
F_c := \{\, \lambda \mid R(\lambda) < c \,\},
\]
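A sketch of how the index could be monitored along a quasi-stationary parameter sweep, using the risk index in the form displayed above (the threshold value λ_0, the offset ε, the level c, and the parameter range are illustrative assumptions):

    # Sketch: monitor the risk index R(lambda) along a parameter sweep
    # and flag parameter values outside the acceptable set F_c.
    import numpy as np

    lambda_0, eps, c = 0.85, 0.01, 10.0       # illustrative threshold, offset, and level

    def risk_index(lam):
        return lam / (abs(lam - lambda_0) - eps)

    for lam in np.arange(0.40, 0.81, 0.05):
        R = risk_index(lam)
        status = "in F_c" if R < c else "WARNING: close to the threshold"
        print(f"lambda = {lam:.2f}   R = {R:6.2f}   {status}")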
6 For an introduction into bifurcation and related numerical methods, see [13].
7 Essentially, this is a deterministic approach. One may think of incorporating a volatility into R.
2.6 Example
Sometimes, stock prices behave cyclically, and one may ask whether there is an
underlying deterministic kernel with periodic structure. In this context, behavioral
trading models are of interest. Lux [8] in his model splits traders into chartists and
fundamentalists, and models their impact on the price of an asset. The variables are
• p(t) market price of an asset, with fundamental value p ∗ ;
• z proportion of chartists, and
• x(t) their sentiment index, between −1 for pessimistic and +1 for optimistic.
The second equation models the sentiment x, with incentive functions $U_{+-}$, $U_{+\mathrm{f}}$, $U_{-\mathrm{f}}$:
\[
U_{+-} := \alpha_1 x + \alpha_2 \frac{\dot p}{\nu_1},
\]
\[
U_{+\mathrm{f}} := \alpha_3 \left( \frac{1}{p}\Bigl( r p^* + \frac{\dot p}{\nu_2} \Bigr) - r - s\, \frac{p - p^*}{p^*} \right),
\]
\[
U_{-\mathrm{f}} := \alpha_3 \left( r - \frac{1}{p}\Bigl( r p^* + \frac{\dot p}{\nu_2} \Bigr) - s\, \frac{p - p^*}{p^*} \right).
\]
Fig. 5 Risk index R over parameter z. Left wing: index along the stationary states. Right wing: index along the periodic states
2.7 Summary
We summarize the second part of the paper. Provided a good model exists,8 we suggest beginning by calculating the bifurcations, i.e., the threshold values of the parameters. They are the pivot points of possible trend switching. The distance between the current operating point of the real financial system and the bifurcation point must be monitored. Large values of the risk index can serve as an indicator, signaling how close the risk is. This can be used as a tool for a stress test.

8 Admittedly, quite an assumption! But a lack of a good model is equivalent to a lack of understanding. Of course, models in economics are not perfect. The challenge is to find a model that captures all relevant nonlinearities.
Acknowledgments The paper has benefited from discussions with Roland C. Seydel.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Abramowitz, M., Stegun, I.A. (eds.): Handbook of Mathematical Functions. Dover, New York
(1968)
2. Cont, R.: Empirical properties of asset returns: stylized facts and statistical issues. Quant.
Finance 1, 223–236 (2001)
3. Eidiani, M.: A reliable and efficient method for assessing voltage stability in transmission and
distribution networks. Int. J. Electr. Power Energy Syst. 33, 453–456 (2011)
4. Garnier, J., Papanicolaou, G., Yang, T.-W.: Large deviations for a mean field model of systemic
risk. SIAM J. Financial Math. 4, 151–184 (2013)
5. Haldane, A.G.: Rethinking the financial network. Talk in Amsterdam (2009). http://www.bis.org/review/r090505e.pdf?frames=0
6. Hart, J.F.: Computer Approximations. John Wiley, New York (1968)
7. Johnson, H.E.: An analytic approximation for the American put price. J. Financial Quant.
Anal. 18, 141–148 (1983)
8. Lux, T.: The socio-economic dynamics of speculative markets: interacting agents, chaos, and
the fat tails of return distributions. J. Econ. Behav. Organ. 33, 143–165 (1998)
9. Moore, R.E.: Methods and Applications of Interval Analysis. SIAM, Philadelphia (1979)
10. Quecke, S.: Dynamische Systeme am Aktienmarkt. Diploma thesis, University of Cologne
(2003)
11. Seydel, R.: Risk and bifurcation. Towards a deterministic risk analysis. In: Risk Analysis
and Management in a Global Economy. Volume I: Risk Management in Europe. New Chal-
lenges for the Industrial World, pp. 318–339. Monograph Series Institut für Technikfolgenab-
schätzung (1997)
12. Seydel, R.: A new risk index. ZAMM 84, 850–855 (2004)
13. Seydel, R.: Practical Bifurcation and Stability Analysis (First Edition 1988), 3rd edn. Springer,
New York (2010)
14. Seydel, R.: Tools for Computational Finance, 5th edn. Springer, London (2012)
Extreme Value Importance Sampling for Rare Event Risk Measurement

D.L. McLeish and Z. Men
Abstract We suggest practical and simple methods for Monte Carlo estimation of
the (small) probabilities of large losses using importance sampling. We argue that
a simple optimal choice of importance sampling distribution is a member of the
generalized extreme value distribution and, unlike the common alternatives such as
Esscher transform, this family achieves bounded relative error in the tail. Examples
of simulating rare event probabilities and conditional tail expectations are given and
very large efficiency gains are achieved.
Keywords Rare event simulation · Risk measurement · Relative error · Monte Carlo
methods · Importance sampling
1 Introduction
$E_\theta\bigl[I(L(Y) > t)\, f(Y)/f_{\mathrm{IS}}(Y)\bigr]$, where $I(\cdot)$ denotes the indicator function. We use the estimator
\[
\hat p_t = \frac{1}{n} \sum_{i=1}^{n} I(L(Y_i) > t)\, \frac{f(Y_i)}{f_{\mathrm{IS}}(Y_i)}, \qquad Y_i \sim f_{\mathrm{IS}}(y), \tag{1}
\]
whose expectation is $E_\theta[\hat p_t] = p_t$, confirming that this is an unbiased estimator. There is a great deal of literature on such problems when the event of interest is “rare”, i.e., when $p_t$ is very small, and many different approaches exist, depending on the underlying loss function and distribution. We do not attempt a review of the literature in the limited space available.
Excellent reviews of the methods and applications are given in Chap. 6 of [1] and
Chap. 10 of [9]. Highly efficient methods have been developed for tail estimation in
very simple problems, such as when the loss function consists of a sum of indepen-
dent identically distributed increments. In this paper, we will provide practical tools
for simulation of such problems in many examples of common interest. For rare
events, the variance or standard error is less suitable as a performance measure than
a version scaled by the mean because in estimating very small probabilities such as
0.0001, it is not the absolute size of the error that matters but its size relative to the
true value.
Definition 1 The relative error (RE) of the importance sample estimator is the ratio
of the estimator’s standard deviation to its mean.
Simulation is made more difficult for rare events because crude Monte Carlo
fails. As a simple illustration, suppose we wish to estimate a very small probability
pt . To this end, we generate n values of L(Yi ) and estimate this probability with
p̂ = X/n where X is the number of times that L(Yi ) > t and X has a Binomial(n, pt )
distribution. In this case, the relative error is
\[
\mathrm{RE} = \frac{\sqrt{\operatorname{Var}\!\bigl(\frac{X}{n}\bigr)}}{\operatorname{E}\!\bigl(\frac{X}{n}\bigr)}
= \frac{\sqrt{n\,p_t(1-p_t)}/n}{p_t}
= n^{-1/2}\sqrt{\frac{1-p_t}{p_t}}.
\]
For rare events, pt is small and the relative error is very large. If we wish a normal-
based confidence interval for pt of the form ( p̂t − 0.1 p̂t , p̂t + 0.1 p̂t ) for example,
we are essentially stipulating a certain relative error (RE = 0.05102) whatever the
value of pt . In order to achieve a reasonable bound on the relative error, we would
need to use sample sizes that were of the order of pt−1 , i.e., larger and larger sample
sizes for rarer events.
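To illustrate this scaling concretely, here is a minimal sketch (in Python, not from the paper; the normal loss, sample size, and seed are arbitrary choices) of crude Monte Carlo estimation of a normal tail probability and its growing relative error:

```python
import numpy as np

rng = np.random.default_rng(0)

def crude_mc_tail(t, n):
    """Crude Monte Carlo estimate of p_t = P(Y > t) for Y ~ N(0, 1), with estimated relative error."""
    hits = rng.standard_normal(n) > t
    p_hat = hits.mean()
    # RE of the estimator: standard deviation of the mean divided by the mean,
    # which behaves like n^{-1/2} * sqrt((1 - p_t) / p_t)
    re_hat = np.sqrt(p_hat * (1.0 - p_hat) / n) / p_hat if p_hat > 0 else np.inf
    return p_hat, re_hat

for t in (2.0, 3.0, 4.0):
    p_hat, re_hat = crude_mc_tail(t, n=10**6)
    print(f"t = {t}: p_hat = {p_hat:.2e}, estimated RE = {re_hat:.3f}")
```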
and then tune the parameter θ so that the IS estimator is as efficient as possible (see, for
example [1, 7, 15]). Chapter 10 of [9] provides a detailed discussion of methods and
applications as well as a discussion of the boundedness of relative error. McLeish [11]
demonstrates that the IS distribution (2) is suboptimal and unlike the alternatives we
explore there, does not typically achieve bounded relative error. We argue for the use
of the generalized extreme value (GEV) family of distributions for such problems.
A loose paraphrase of the theme of the current paper is “all you really need1 is
GEV”. Indeed in Appendix A, we prove a result (Proposition 1) which shows that,
under some conditions, there is always an importance sampling estimator whose
relative error is bounded in the tail obtained by generating the distance along one
principal axis from an extreme value distribution, while leaving the other coordinates
unchanged in distribution. We now consider some one-dimensional problems.
Consider estimating P (L(Y ) > t) where the value of t is large, the random variable
Y is one-dimensional and L(y) is monotonically increasing. We would like to use an
importance sample distribution for which, by adjusting the values of the parameters,
1 For importance sampling estimates of rare events, at least, with apologies to the Beatles.
we can have small relative error for any large t. We seek a parametric family { f θ ; θ ∈
Θ} of importance sample estimators which have bounded relative error as follows:
A parametric family has bounded relative error for estimating functions in a class
H if, for each t > T, there exists a parameter value θ which provides bounded
relative error. Indeed, a bound on the relative error of approximately $0.738\, n^{-1/2}$ can be achieved by importance sampling if we know the tail behavior of the distribution. There are very few circumstances under which the exponential tilt, i.e., families of continuous densities of the form $f_\theta(y) \propto e^{\theta y} f(y)$, provides bounded relative error. The literature recommending the exponential tilt
usually rests on demonstrating logarithmic efficiency (see [1], p. 159 or Sect. 10.1 of
[9]), a substantially weaker condition that does not guarantee a bound in the relative
error. Although we may design a simulation optimally for a specific criterion such as
achieving small relative error in the estimation of P(L(Y ) > t), we are more often
interested in the nature of the wholetail beyond t. For example, we may be interested
∞
in E [(L(Y ) − t) I (L(Y ) > t)] = t P(L(Y ) > s)ds and this would require that
a single simulation be efficient for estimating all parameters P(L(Y ) > s), s > t.
The property of bounded relative error provides some assurance that the family used
adapts to the whole tail, rather than a single quantile.
For simplicity, we assume for the present that Y is univariate, has a continuous
distribution, and L(Y ) is a strictly increasing function of Y. Then
\[
P(L(Y) > t) = \int_{L^{-1}(t)}^{\infty} f(y)\, dy,
\]
and we can achieve bounded relative error if we use an importance sample distribution drawn from the family
\[
f_\theta(y) \propto e^{\theta T(y)} f(y), \tag{3}
\]
where T(y) behaves, for large values of y, roughly like a linear function of $\bar F(y) = 1 - F(y)$. If $T(y) \sim -\bar F(y)$ as $y \to \infty$, the optimal parameter is $\theta_t = k_2/p_t \approx 1.5936/p_t$, and the limit of the relative error of the IS estimator is approximately $0.738\, n^{-1/2}$ (see Appendix A). The simplest and most tractable family of distributions with appropriate tail behavior is the GEV distribution associated with the density f(y).
We now provide an intuitive argument in favor of the use of the GEV family of
IS distributions. For a more rigorous justification, see Appendix A.
The choice T (y) = − F̄(y) provides asymptotic bounded relative error [11].
Consider a family of cumulative distribution functions $F_\theta$ whose corresponding probability density function is of the form (3). As $\theta \to \infty$ and $y \to \infty$ in such a way that $\theta \bar F(y)$ converges to a nonzero constant, then
\[
F_\theta(y) \sim (F(y))^{\theta}. \tag{5}
\]
Thus, when we use the corresponding extreme value importance sample distribution,
about 20.3 % of the observations will fall below t and the other 79.7 % will fall above,
and this can be used to identify one of the parameters of the IS distribution. Of course,
only the observations greater than t are directly relevant to estimating quantities like
P(L > t). This leads to the option of conditioning on the event L > t and using the
generalized Pareto family (see Appendix B).
The three distinct classes of extreme value distributions and some of their basic
properties are outlined in Appendix B. All have simple closed forms for their pdf, cdf,
and inverse cdf and can be easily and efficiently generated. In addition to a shape
parameter ξ, they have location and scale parameters d, c, so the cdf is $H_\xi\bigl(\frac{x-d}{c}\bigr)$, where $H_\xi(x)$ is the cdf for unit scale and location parameter 0. We say that a given
continuous cdf F falls in the maximum domain of attraction of an extreme value cdf
Hξ (x) if there exist sequences cn , dn such that
F n (dn + cn x) → Hξ (x) as n → ∞.
We will choose an extreme value distribution with parameters that approximate the
distribution of the maximum of θt = k2 / pt random variables from the original
density f(y). Further properties of the extreme value distributions, and detail on the parameters, are given in Appendix B.
sample estimator based on a sample size of 5 × 10^6 is quite feasible on a small laptop computer, but is roughly equivalent to a crude Monte Carlo estimator of sample size 9.2 × 10^10, possible only on the largest computers.
3 Examples
When we wish to simulate an expected value over the region [L(Y) > t], this is rarely the only quantity of interest. More commonly, we are interested
in various functions sensitive to the tail of the distribution. This argues for using
an IS estimator with bounded relative error rather than the more common practice
of simply conditioning on the region of interest. For a simple example, suppose Y
follows a N (0, 1) distribution and we wish to estimate a property of the tail defined
by Y > t, where t is large. Suppose we simulate from the conditional distribution
given Y > t, that is, from the pdf
\[
\frac{1}{1 - \Phi(t)}\, \varphi(y)\, I(y > t), \tag{6}
\]
where φ and Φ are the standard normal pdf and cdf, respectively. If we wish also to estimate $P(Y > t+s \mid Y > t) \approx e^{-st - s^2/2}$ for fixed s > 0, sampling from this pdf is highly inefficient, since for n simulations from pdf (6), the RE for estimating $P(Y > t+s \mid Y > t)$ is approximately $n^{-1/2}\sqrt{e^{st + s^2/2} - 1}$, and this grows extremely rapidly in both t and s. We would need a sample size of around $n = 10^4\, e^{st+s^2/2}$ (or about 60 trillion if s = 3 and t = 6) from the IS density (6) to achieve an RE of 1 %. Crude Monte Carlo fails here, and use of IS with the usual standard exponential tilt or Esscher transform with T(y) = y, though very much better, still fails to deliver bounded relative error. In [11], it is shown that the relative error is $\sim (\pi/2)^{1/4}\sqrt{t/n} \to \infty$ as $p_t \to 0$. While the IS distribution obtained by the exponential tilt is a very large improvement over crude Monte Carlo and logarithmically efficient, it still results in an unbounded relative error as $p_t \to 0$.
The normal distribution is in the maximum domain of attraction of the Gumbel distribution (ξ = 0), so our arguments suggest that we should use as IS distribution
\[
H_0\!\left(\frac{y-d}{c}\right) = \exp\!\left(-e^{-(y-d)/c}\right) \tag{7}
\]
with the corresponding IS weight
\[
w(Y) = c_\theta\, \varphi(Y)\, \exp\!\left( e^{-\frac{Y-d_\theta}{c_\theta}} + \frac{Y-d_\theta}{c_\theta} \right). \tag{8}
\]
For example, with t = 4.7534, $p_t = 10^{-6}$ and Gumbel parameters c = 0.20 and d = 4.85, the relative error in $10^6$ simulations was $0.729\, n^{-1/2}$. We can compare this with the exponential tilt, equivalent to using the normal(t, 1) distribution as an IS distribution, whose relative error is $2.32\, n^{-1/2}$, or with crude Monte Carlo, with relative error around $10^3\, n^{-1/2}$.
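A minimal Python sketch of this comparison (not from the paper; the Gumbel parameters c = 0.20, d = 4.85 and the threshold t = 4.7534 are taken from the example above, while the sample size and seed are arbitrary) could look as follows:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
t, c, d, n = 4.7534, 0.20, 4.85, 10**6

# Extreme value IS: sample Y from Gumbel(scale c, location d) and attach weights (8)
Y = d - c * np.log(-np.log(rng.uniform(size=n)))      # Gumbel draws via inversion
w = c * norm.pdf(Y) * np.exp(np.exp(-(Y - d) / c) + (Y - d) / c)
est_evis = w * (Y > t)
p_evis = est_evis.mean()
re_evis = est_evis.std(ddof=1) / np.sqrt(n) / p_evis

# Exponential tilt: sample Y from N(t, 1) and use the likelihood ratio phi(Y) / phi(Y - t)
Z = rng.standard_normal(n) + t
w_tilt = norm.pdf(Z) / norm.pdf(Z - t)
est_tilt = w_tilt * (Z > t)
p_tilt = est_tilt.mean()
re_tilt = est_tilt.std(ddof=1) / np.sqrt(n) / p_tilt

print(f"true p_t      = {1 - norm.cdf(t):.3e}")
print(f"EVIS estimate = {p_evis:.3e}, sqrt(n) * RE = {np.sqrt(n) * re_evis:.3f}")
print(f"tilt estimate = {p_tilt:.3e}, sqrt(n) * RE = {np.sqrt(n) * re_tilt:.3f}")
```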
Suppose that our interest is in estimating the conditional tail expectation or TVaR_α based on simulations. The TVaR_α is defined as $E(Y \mid Y > t) = E[Y\, I(Y>t)]/P(Y>t)$. We designed the GEV parameters for simulating the numerator, $E[Y\, I(Y > t)]$. If we are interested in estimating TVaR_{0.0001} by simulation, with $t = \mathrm{VaR}_{0.0001} = 3.719$, the true value is
\[
E(Y \mid Y > t) = \frac{1}{p_t\sqrt{2\pi}} \int_t^\infty z\, e^{-z^2/2}\, dz = \frac{\varphi(t)}{p_t} \approx 3.96.
\]
We will generate random variables Yi using the Gumbel (0.282, 3.228 ) distribution
and then attach weights (8) to these observations. The estimate of TVaRα is then the
average of the values w(Yi ) × Yi averaged only over those values that are greater
than t.
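One reasonable reading of this recipe, sketched in Python (not the authors' code; the parameters t = 3.719 and Gumbel(0.282, 3.228) are those quoted above), estimates the numerator and denominator separately and takes the ratio:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
t, c, d, n = 3.719, 0.282, 3.228, 10**6

Y = d - c * np.log(-np.log(rng.uniform(size=n)))                  # Gumbel(c, d) draws
w = c * norm.pdf(Y) * np.exp(np.exp(-(Y - d) / c) + (Y - d) / c)  # weights (8)
ind = Y > t

num = np.mean(w * Y * ind)      # estimates E[Y I(Y > t)]
den = np.mean(w * ind)          # estimates p_t = P(Y > t)
print("TVaR estimate:", num / den, "(true value approx. 3.96)")
```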
The Gumbel distribution is supported on the whole real line, while the region of
interest is only that portion of the space greater than t so one might generate Yi from
the Gumbel distribution conditional on the event Yi > t rather than unconditionally.
The probability P(Y_i > t), where Y_i is distributed according to the Gumbel(c_θ, d_θ) distribution, is $1 - \exp(-e^{-(t-d_\theta)/c_\theta})$, and this is typically around 0.80, indicating that about 20 % of the time the Gumbel random variables fall in the “irrelevant” portion of the sample space, {Y_i < t}. Since it is easy to generate from the conditional Gumbel distribution Y | Y > t, this was also done for a further improvement in efficiency. This
conditional distribution converges to the generalized Pareto family of distributions
(see Theorem 2 of Appendix B). In this case, since ξ = 0, P(Y − u ≤ z|Y >
u) → 1 − e−z as u → ∞. Therefore, in order to approximately generate from the
conditional distribution of the tail of the Gumbel, we generate the excess from an
exponential distribution.
Table 1 provides a summary of the results of these simulations. Several simulation
methods for estimating TVaR_α = E(Y | Y > t) = E[Y I(Y > t)]/p_t with p_t = P(Y > t), as
well as estimates of pt are compared. Since TVaR is a ratio, we consider estimates of
the denominator and numerator, i.e., pt and E[Y I (Y > t)] separately. The underlying
distribution of Y is normal in all cases. The methods investigated are:
1. Crude Simulation (Crude): Generate independently n values $Y_i$, $i = 1, \ldots, n$, from the original (normal) distribution. Estimate $p_t$ using $\frac{1}{n}\sum_{i=1}^{n} I(Y_i > t)$ and estimate $E[Y\,I(Y > t)]$ using $\frac{1}{n}\sum_{i=1}^{n} Y_i\, I(Y_i > t)$.
2. Exponential Tilt or Shifted Normal IS (SN): Generate independently $Y_i$, $i = 1, \ldots, n$, from the N(t, 1) distribution. Estimate $p_t$ using $\frac{1}{n}\sum_{i=1}^{n} w_i\, I(Y_i > t)$ and estimate $E[Y\,I(Y > t)]$ using $\frac{1}{n}\sum_{i=1}^{n} w_i\, Y_i\, I(Y_i > t)$, where the weights $w_i$ are obtained as the likelihood ratio
\[
w_i = w(Y_i) = \frac{\varphi(Y_i)}{\varphi(Y_i - t)}.
\]
Since the exponential tilt, applied to a normal distribution, results in another normal distribution with a shifted mean and the same variance, this is an application of the exponential tilt.
3. Extreme Value IS (EVIS): Generate independently n values $Y_i$, $i = 1, \ldots, n$, from the Gumbel(c, d) distribution. Estimate $p_t$ using $\frac{1}{n}\sum_{i=1}^{n} w_i\, I(Y_i > t)$ and estimate $E[Y\,I(Y > t)]$ using $\frac{1}{n}\sum_{i=1}^{n} w_i\, Y_i\, I(Y_i > t)$, where the weights $w_i$ are obtained as the likelihood ratio
\[
w_i = w(Y_i) = \frac{\varphi(Y_i)}{\frac{1}{c}\, h_0\!\left(\frac{Y_i - d}{c}\right)},
\]
and where
\[
g(s) = \frac{\frac{1}{c}\, h_0\!\left(\frac{s-d}{c}\right)}{1 - H_0\!\left(\frac{t-d}{c}\right)}, \qquad s > t,
\]
tilt.
Table 2 Relative error of estimators: Crude, shifted normal (G & L) and EVIS
t pt n n 1/2 RE(Crude) n 1/2 RE(EVIS) n 1/2 RE(G&L)
1,500 0.0075 30,000 12.01 0.70 2.03
2,000 0.0041 30,000 15.34 0.69 2.33
3,000 0.0022 30,000 22.12 0.70 2.74
At t = 2,000, for example, this amounts to a much more substantial decrease over crude by a factor of around (15.34/0.69)^2, or about 494.
The entries of the loading vectors $a_k$ (which enter the model through the idiosyncratic scale $\sqrt{1 - a_k a_k^T}$) were generated as independent U(0, $1/\sqrt{m}$) random variables, where m is the number of factors. The simulation described in [4] is a two-stage Monte Carlo IS method. The first stage simulates the latent factors Z by IS, where the importance distributions are independent univariate normal distributions with means obtained by matching modes and with variances unchanged, equal to 1. Specifically, they choose the normal IS distribution having the same mode as
\[
P(L > t \mid Z = z)\, e^{-z^T z/2} \approx \left(1 - \Phi\!\left(\frac{t - \mu(z)}{\sigma(z)}\right)\right) e^{-z^T z/2}, \tag{9}
\]
because this is approximately proportional to P(Z = z | L > t), the ideal IS distribution. In other words, the IS distribution for $Z_i$ is N(μ_i, 1), $i = 1, \ldots, m$, where the vector of values $\mu_i$ is given by Eq. (20) of [4].
Conditional on the values of the latent factors Z, the second stage of the algorithm
in [4] is to twist the Bernoulli random variables Yk using modified Bernoulli distrib-
utions, i.e., with a suitable change in the values of the probabilities P(Yk = 1), k =
1, . . . , ν. Our comparisons below are with this two-stage form of the IS algorithm.
Our simulation for this portfolio credit risk problem is a one-stage IS simulation
algorithm. If there are m factors in the portfolio credit risk model we simulate m − 1
of them Z̃ i from univariate normal N (μi , 1), i = 1, . . . , m −1 with a different mean,
as in [4], but then we simulate an approximation to the total loss, L̃, from a Gumbel
distribution, and finally set Z̃ m equal to the value implied by Z̃ 1 , . . . , Z̃ m−1 and L̃.
This requires solving the equation
\[
L(\tilde Z_1, \ldots, \tilde Z_{m-1}, \tilde Z_m) = \tilde L \tag{12}
\]
for $\tilde Z_m$. The parameters $\mu = (\mu_1, \mu_2, \ldots, \mu_{m-1})$ are obtained from the crude simulation. Having solved (12), we attach to this IS point $(\tilde Z_1, \ldots, \tilde Z_m)$ the weight
\[
\omega = \frac{c \prod_{i=1}^{m} \varphi(\tilde Z_i)}{\frac{\partial L}{\partial Z_m} \prod_{i=1}^{m-1} \varphi(\tilde Z_i - \mu_i)}\; e^{(\tilde L - d)/c}\, \exp\!\left(e^{-(\tilde L - d)/c}\right), \tag{13}
\]
where
\[
\frac{\partial L}{\partial Z_m} = \sum_{k=1}^{\nu} \frac{a_{k,m}}{\sqrt{1 - a_k a_k^T}}\; \varphi\!\left(\frac{a_k \tilde Z + \Phi^{-1}(\rho_k)}{\sqrt{1 - a_k a_k^T}}\right),
\]
with these quantities based on the preliminary simulation, and with the parameters c, d of the Gumbel obtained from (24).
We summarize our algorithm for the portfolio credit risk problem as follows:
4. Estimate $p_t$ by
\[
\frac{1}{n}\sum_{j=1}^{n} \omega_j\, I(L_j > t).
\]
5. Estimate the variance of this estimator using n −1 times the sample variance of
the values ω j I (L j > t), j = 1, . . . , n.
Simulation Results The results in Table 3 were obtained by using crude Monte Carlo,
importance sampling using the GEV distribution as the IS distribution, and the IS
approach proposed in [4]. In the crude simulations, the sample size is 50,000, while
in the latter two methods, the sample size is 10,000.
Notice that for a modest number of factors there is a very large reduction in
variance over the crude (for example the ratio of relative error corresponding to
2 factors, t = 2,500 corresponds to an efficiency gain or variance ratio of nearly
2,400) and a significant improvement over the Glasserman and Li [4] simulation
with a variance ratio of approximately 4. This improvement erodes as the number of
factors increases, and in fact the method of Glasserman and Li has smaller variance in
this case when m = 10. In general, ratios of multivariate densities of large dimension
tend to be quite “noisy”; although the weights have expected value 1, they often have
large variance. A subsequent paper will deal with the large dimensional case.
Table 3 Comparison between crude simulation, EVIS and Glasserman and Li (2005) for the credit
risk model
t pt n n 1/2 RE (crude) n 1/2 RE (EVIS) n 1/2 RE (G&L)
2 factors
1,500 0.0034 50,000 17.1 0.99 1.73
2,000 0.0015 10,000 26.2 0.96 1.82
2,500 0.00038 10,000 51.3 1.05 1.93
3 factors
1,500 0.00305 50,000 18.94 1.24 1.72
2,000 0.00111 10,000 31.61 1.15 1.82
2,500 0.00042 10,000 49.99 1.35 1.99
5 factors
1,500 0.00289 50,000 18.87 1.39 1.71
2,000 0.00099 10,000 39.52 1.55 1.81
2,500 0.00035 10,000 55.89 1.57 1.88
10 factors
1,500 0.00246 50,000 20.83 1.84 1.79
2,000 0.00081 10,000 33.70 2.15 1.89
2,500 0.00029 10,000 57.73 3.06 1.98
4 Conclusion
The family of extreme value distributions is ideally suited to rare event simulation.
They provide a very tractable family of distributions and have tails which provide
bounded relative error regardless of how rare the event is. Examples of simulating
values of risk measures demonstrate a very substantial improvement over crude
Monte Carlo and a smaller improvement over competitors such as the exponential
tilt. This advantage is considerable for relatively low-dimensional problems, but there
may be little or no advantage over an exponential tilt when the dimensionality of the
problem increases.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
We suppose without loss of generality that the argument to the loss function is a
multivariate normal MVN(0, I_m) random vector Z, since any (possibly dependent)
random vector Y can be generated from such a Z. We begin by assuming that “large”
Assumption 1 There exists a direction vector $v \in \mathbb{R}^m$ such that, for all fixed vectors $w \in \mathbb{R}^m$,
\[
\frac{P(L(Z_0 v) > t)}{P(L(Z_0 v + w) > t)} \to 1 \quad \text{as } t \to \infty, \tag{15}
\]
where Y has the extreme value distribution $H_0\bigl(\frac{y-d}{c}\bigr)$. If we replace the distribution
of Y by the standard normal, it is easy to see that (16) gives Z ∼ MVN(0, Im ) so the
IS weight function in this case is simply the ratio of the two univariate distributions
for Y.
Assumption 2 Suppose that for any fixed $w \in \mathbb{R}^m$, there exists $y_0$ such that $L(yv + w)$ is an increasing function of y for $y > y_0$.
In order to prove this result, we will use the following lemma, a special case of
Corollary 1 of [11]:
Lemma 1 Suppose the random variable Y has a continuous distribution with cdf $F_Y$. Suppose that T(y) is nondecreasing and for some real number a we have $a + T(y) \sim -\bar F_Y(y)$ as $y \to y_F^-$, with $y_F = \sup\{y;\ F_Y(y) < 1\} \le \infty$. Then the IS estimator for sample size n obtained from density (3) with $\theta = \theta_t = k_2/p_t$ has bounded RE asymptotic to $c\,n^{-1/2}$ as $p_t \to 0$, where $c = \frac{1}{k_2}\sqrt{e^{k_2} - 1 - k_2^2} \approx 0.738$.4

4 $k_2 \approx 1.5936$ and $c \approx 0.738$ are the unique positive solutions to the equations $e^{k} = \frac{1}{1 - k/2}$ and $e^{k} = 1 + k^2(1 + c^2)$.
\[
n\, F_Y^{n-1}(d_n + c_n x)\, f(d_n + c_n x) \;\sim\; \frac{1}{c_n}\, H_0'(x) = \frac{1}{c_n}\exp\!\left(-x - e^{-x}\right) \quad \text{as } n \to \infty.
\]
Therefore, the extreme value distribution provides a bounded relative error importance sampling distribution, equivalent to (17).
for some nondegenerate cdf H (x), then we say that F is in the maximum domain
of attraction (MDA) of the cdf H and write F ∈ MDA(H ). The Fisher–Tippett
theorem (see Theorem 7.3 of [13]) characterizes the possible limiting distributions
H as members of the generalized extreme value distribution (GEV). A cdf is a
member of this family if it has cumulative distribution function of the form $H_\xi\bigl(\frac{x-d}{c}\bigr)$, where c > 0 and
\[
H_0(x) = \exp(-e^{-x}), \qquad H_\xi(x) = e^{-(1+\xi x)^{-1/\xi}} \quad \text{for } \xi \ne 0 \text{ and } \xi x > -1. \tag{21}
\]
\[
(F(y))^{\theta_t} \approx H_\xi\!\left(\frac{y - d_{\theta_t}}{c_{\theta_t}}\right), \tag{22}
\]
and this leads to matching t with the quantile corresponding to $e^{-k_2} \approx 0.203$. In other words, one parameter is determined by the equation
\[
\frac{t - d_{\theta_t}}{c_{\theta_t}} = H_\xi^{-1}\bigl(e^{-k_2}\bigr) =
\begin{cases}
-\ln(k_2), & \xi = 0,\\[4pt]
\dfrac{k_2^{-\xi} - 1}{\xi}, & \xi \ne 0.
\end{cases} \tag{23}
\]
Another parameter can be determined using the crude simulation and the values of
Y for which L(Y ) > t. We can match another quantile, for example, the median, the
mode, or the sample mean which estimates E[Y | L(Y ) > t]. In the case of standard
normally distributed inputs and the Gumbel distribution, matching the conditional
expected value E(L|L > t) and (23) results approximately in:
\[
c = \frac{E(\tilde L) - t}{1.0438}, \qquad d = t + 0.46659\, c. \tag{24}
\]
Then the conditional excess distribution can be approximated by the so-called generalized Pareto distribution for large values of u (see [13], Theorem 7.20):
\[
P(Y - u \le z \mid Y > u) \approx G_{\xi, \beta(u)}(z)
\]
for some positive measurable function β(u), where $G_{\xi,\beta}$ is the c.d.f. of the Generalized Pareto (GP) distribution:
\[
G_{\xi,\beta}(y) =
\begin{cases}
1 - \left(1 + \dfrac{\xi y}{\beta}\right)^{-1/\xi}, & \xi > 0,\ y > 0, \ \text{ or } \ \xi < 0,\ 0 < y < -\beta/\xi,\\[6pt]
1 - e^{-y/\beta}, & \xi = 0,\ y > 0.
\end{cases} \tag{25}
\]
References
1. Asmussen, S., Glynn, P.W.: Stochastic Simulation: Algorithms and Analysis. Springer, New
York (2007)
2. De Haan, L., Resnick, S.I.: Local limit theorems for sample extremes. Ann. Probab. 10, 396–413
(1982)
3. Glasserman, P.: Monte Carlo Methods in Financial Engineering. Springer, New York (2004)
4. Glasserman, P., Li, J.: Importance sampling for portfolio credit risk. Manag. Sci. 51(11), 1643–
1656 (2005)
5. Gupton, G., Finger, C.C., Bhatia, M.: CreditMetrics, Technical Document of the Morgan Guar-
anty Trust Company http://www.defaultrisk.com/pp_model_20.htm (1997)
6. Hall, P.: On the rate of convergence of normal extremes. J. Appl. Probab. 16, 433–439 (1979)
7. Homem-de-Mello, T., Rubinstein, R.Y.: Rare event probability estimation using cross-entropy.
In: Proceedings of the 2002 Winter Simulation Conference, pp. 310–319 (2002)
8. Kroese, D., Rubinstein, R.Y.: Simulation and the Monte Carlo Method, 2nd edn. Wiley, New
York (2008)
9. Kroese, D., Taimre, T., Botev, Z.I.: Handbook of Monte Carlo Methods. Wiley, New York
(2011)
10. Kotz, S., Nadarajah, S.: Extreme Value Distributions: Theory and Applications. Imperial Col-
lege Press, London (2000)
11. McLeish, D.L.: Bounded relative error importance sampling and rare event simulation. ASTIN
Bullet. 40, 377–398 (2010)
12. McLeish, D.L.: Monte Carlo Simulation and Finance. Wiley, New York (2005)
13. McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management. Princeton University
Press, Princeton (2005)
14. Pickands, J.: Sample sequences of maxima. Ann. Math. Stat. 38, 1570–1574 (1967)
15. Ridder, A., Rubinstein, R.: Minimum cross entropy method for rare event simulation. Simula-
tion 83, 769–784 (2007)
A Note on the Numerical Evaluation
of the Hartman–Watson Density
and Distribution Function
1 Introduction
In the process of studying the probability distribution of the integral over a geometric
Brownian motion, [14] introduced the function
\[
\theta(r, x) := \frac{r\, e^{\pi^2/(2x)}}{\sqrt{2\pi^3 x}} \int_0^{\infty} e^{-\frac{y^2}{2x} - r\cosh(y)}\, \sinh(y)\, \sin\!\left(\frac{\pi y}{x}\right) dy, \tag{1}
\]
G. Bernhart (B)
Technische Universität München, Parkring 11, 85748 Garching-Hochbrück, Germany
e-mail: [email protected]
J.-F. Mai
XAIA Investment GmbH, Sonnenstraße 19, 80331 München, Germany
Fig. 1 Left: The function θ(r, x) for three different parameters r ∈ {0.5, 1, 1.5} and values x ∈ (0.15, 4). Right: The distribution function F_r(x) for the same parameters r and values x ∈ (0.15, 10)
With
\[
I_\nu(z) := \sum_{m=0}^{\infty} \frac{1}{m!\, \Gamma(m + \nu + 1)} \left(\frac{z}{2}\right)^{2m+\nu} \tag{2}
\]
the modified Bessel function of the first kind, the function fr (x) := θ(r, x)/I0 (r ),
x > 0, r > 0, is the density of a one-parametric probability law, say μr ,
on the positive half-axis, called the Hartman–Watson law. The Hartman–Watson
law arises as the first hitting time of certain diffusion processes, see [10], and
is of paramount interest in mathematical finance in the context of Asian option
pricing, see [2, 6, 14]. It was shown in [7] that this law is infinitely divisible with Laplace transform given by $\varphi_r(u) := I_{\sqrt{2u}}(r)/I_0(r)$, $u \ge 0$. Moreover, it follows from a result in [10] that $\mu_r$ is not only infinitely divisible, but even within the so-called Bondesson class, which is a large subfamily of infinitely divisible laws that is introduced in and named after [5]. Notice in particular that it follows from this fact together with [13, Theorem 6.2, p. 49] that the function $\Psi_r(u) := -\log\bigl(I_{\sqrt{2u}}(r)/I_0(r)\bigr)$, $u \ge 0$, is a so-called complete Bernstein function, which allows for a holomorphic extension to the sliced complex plane $\mathbb{C}\setminus(-\infty, 0)$. We will make use of this observation in Sect. 5.
It is well-known that the numerical evaluation of the density of the Hartman–
Watson law near zero is a challenging task because the integrand in the formula for
θ(r, x) is highly oscillating. The following sections discuss several methods to eval-
uate the function θ(r, x) accurately. Figure 1 visualizes the function θ(r, x) for three
different parameters r and values x ∈ (0.15, 4), where all numerical computation
routines discussed in the present note yield exactly the same result.
When looking at Fig. 1, mathematical intuition suggests that the approximation
θ(0.5, x) ≈ 0 for x < 0.15 might be a pragmatic—and numerically efficient—
implementation close to zero. Nevertheless, [9] considers the numerical evaluation
close to zero and obtains significant errors, see Sect. 3. Moreover, [6] studies the
asymptotic behavior of fr (x) as x ↓ 0 and [2] study the behavior of the distribution
function Fr of μr as the argument tends to zero. We like to mention that the right
tail of the Hartman–Watson distribution μr becomes extremely heavy as r ↓ 0. For
instance, the distribution function $F_r(x) = \int_0^x f_r(t)\,dt$ is still significantly smaller than 1 for x = 10 and different r, see Fig. 1.
The remaining article is organized as follows. Section 2 illustrates the occurrence
of the Hartman–Watson distribution, in particular, in mathematical finance. Section 3
discusses the direct implementation of Formula (1). Section 4 proposes the use of the
Gaver–Stehfest Laplace inversion technique. Section 5 proposes a complex Laplace
inversion algorithm to numerically evaluate fr and Fr . Finally, Sect. 6 concludes.
\[
f_X(x) = \frac{e^{r\cos(x)}}{2\pi I_0(r)}, \qquad -\pi < x < \pi.
\]
The von Mises distribution is the most prominent law for an angle in the field of
directional statistics, because it constitutes a tractable approximation to the “wrapped
normal distribution” (i.e., the law of Y mod 2 π when Y is normal), which is difficult
to work with.
The importance of the Hartman–Watson distribution in the context of mathemat-
ical finance originates from the fact that
\[
\frac{e^{-\frac{x^2}{2t}}}{\sqrt{2\pi t}}\; P(A_t \in du \mid W_t = x) = e^{-\frac{1+e^{2x}}{2u}}\; \theta\!\left(\frac{e^x}{u},\, t\right) \frac{du}{u},
\]
t
where Wt denotes standard Brownian motion and At = 0 e2 Ws ds an associated
integrated geometric Brownian motion, see [14]. The process At , and hence the
Hartman–Watson distribution, naturally enters the scene when Asian stock deriva-
tives, i.e., derivatives with “averaging periods,” are considered in the Black–Scholes
world, see, e.g., [2, 9]. Another example, which is mathematically based on the
exactly same reasoning, has recently been given in [3]: when the Black–Scholes
model is enhanced by the introduction of stochastic repo margins, this leads to a
convexity adjustment for all kinds of stock derivatives which involves the density of
the Hartman–Watson distribution.
Let us furthermore briefly sketch a potential third application, which uses a sto-
chastic representation for the Hartman–Watson law. Consider a diffusion process
{X t }t≥0 satisfying the SDE
Fig. 2 Evaluation of Formula (1) for r = 0.5 and x ∈ [0.125, 0.15] in MATLAB applying the
built-in adaptive quadrature routine quadgk, which can handle infinite integration domains
\[
dX_t = \left(\frac{1}{2}\, X_t + X_t\, \frac{I_1(X_t)}{I_0(X_t)}\right) dt + dW_t, \qquad X_0 = r > 0.
\]
This explodes with probability one, as can be seen from Feller's test for explosion
(the drift increases rapidly), i.e., there exists a stopping time τ ∈ (0, ∞) such that
paths of {X t } are well defined on [0, τ ) and limt↑τ X t = ∞ almost surely. Such
explosive diffusions are used to model fatigue failures in solid materials. X t describes
the evolution of the length of the longest crack and τ is the time point of ultimate
damage. Kent [10] shows that τ ∼ μr . We may rewrite τ as the first hitting time
of zero of the stochastic process Yt := 1/ X t , starting at Y0 = 1/r > 0. Observing
the stock price S0 > 0 of a highly distressed company facing bankruptcy, it might
now make sense to model the evolution of this company's stock price until default as $S_t := Y_t$, setting $r := 1/S_0$. The time of bankruptcy is defined as the first time the
stock price hits zero, which has a Hartman–Watson law. A similar model, assuming
St to follow a CEV process that is allowed to diffuse to zero, is applied in [1].
Regarding the exact numerical evaluation of the Hartman–Watson density, the article
[9] shows that a straightforward numerical implementation of the Formula (1) for
r = 0.5 and x ∈ [0.125, 0.15] yields significant numerical errors. In particular, Fig. 2
in [9] shows that one ends up with negative density values. We come to the same
conclusion, see Fig. 2.
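For readers without MATLAB, a rough Python/SciPy analog of this experiment (not the authors' code; the truncation of the integration domain and the quadrature settings are ad hoc choices) reproduces the same problem:

```python
import numpy as np
from scipy.integrate import quad

def theta_direct(r, x):
    """Direct numerical evaluation of Yor's integral representation (1) of theta(r, x)."""
    def integrand(y):
        return np.exp(-y**2 / (2 * x) - r * np.cosh(y)) * np.sinh(y) * np.sin(np.pi * y / x)
    # the prefactor exp(pi^2/(2x)) is huge for small x while the oscillatory integral is tiny,
    # which is the source of the cancellation problems discussed in the text
    val, _ = quad(integrand, 0.0, 30.0, limit=1000)
    return r * np.exp(np.pi**2 / (2 * x)) / np.sqrt(2 * np.pi**3 * x) * val

for x in (0.125, 0.135, 0.15):
    print(x, theta_direct(0.5, x))   # may return small negative values near x ~ 0.13
```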
We apply the Gaver–Stehfest algorithm in order to obtain θ(r, ·) from its Laplace
transform I√2 · (r ) via Laplace inversion for fixed values of r . For a rigorous proof
and a good explanation of this method, see [11]. In particular, it is not difficult to
observe from Yor's expression (1) that ([11] Theorem 1(iii)) applies, which justifies
the approximation
\[
\theta(r, x) \approx \frac{\log(2)}{x} \sum_{k=1}^{2n} a_k(n)\, I_{\sqrt{2k\log(2)/x}}(r), \tag{3}
\]
where
\[
a_k(n) = \frac{(-1)^{k+n}}{n!} \sum_{j=\lfloor (k+1)/2 \rfloor}^{\min\{k,n\}} j^{\,n+1} \binom{n}{j} \binom{2j}{j} \binom{j}{k-j}.
\]
The Gaver–Stehfest algorithm has the nice feature that only evaluations of the Laplace
transform on the positive half-axis are required. In particular, the required modified
Bessel function I√2 u (r ) is efficient and easy to compute for u > 0. In MATLAB, it
is available as the built-in function besseli. The drawback of the Gaver–Stehfest
algorithm is that it requires high-precision arithmetic because the involved constants
ak (n) are alternating and become huge and difficult to evaluate. For practical imple-
mentations, this prevents the use of large n, which would theoretically be desirable
due to the convergence result of [11]. Nevertheless, our empirical investigation shows
that n = 10 is still feasible on a standard PC without further precision arithmetic
considerations and yields reasonable results for the considered parameterization.
However, for larger values of r , the algorithm is less stable as can be seen at the end
of Sect. 5.
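A compact Python sketch of the approximation (3) (an illustration only; the paper's experiments use MATLAB's besseli, which is replaced here by scipy.special.iv, and the evaluation point is arbitrary) reads:

```python
import numpy as np
from math import comb, factorial, log
from scipy.special import iv  # modified Bessel function I_nu of real order

def gaver_stehfest_coefficients(n):
    """Coefficients a_k(n), k = 1, ..., 2n, of the Gaver-Stehfest formula (3)."""
    a = np.empty(2 * n)
    for k in range(1, 2 * n + 1):
        s = sum(j ** (n + 1) * comb(n, j) * comb(2 * j, j) * comb(j, k - j)
                for j in range((k + 1) // 2, min(k, n) + 1))
        a[k - 1] = (-1) ** (n + k) * s / factorial(n)
    return a

def theta_gaver_stehfest(r, x, n=10):
    """Approximation (3) of theta(r, x) via Laplace inversion of u -> I_{sqrt(2u)}(r)."""
    a = gaver_stehfest_coefficients(n)
    k = np.arange(1, 2 * n + 1)
    return log(2) / x * np.sum(a * iv(np.sqrt(2 * k * log(2) / x), r))

# the Hartman-Watson density is then f_r(x) = theta(r, x) / I_0(r)
print(theta_gaver_stehfest(0.5, 0.14))
```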
The obtained values of θ(r, x) are visualized in Fig. 3. Comparing them to the
brute force implementation in Fig. 2, the error for small x becomes significantly
smaller.
Fig. 3 Evaluation of θ(r, x) for r = 0.5 and x ∈ [0.125, 0.15] in MATLAB applying the Gaver–Stehfest approximation (3) with n = 10. Left: The y-axis is precisely the same as in Fig. 2 for comparability. Right: The y-axis is made finer to visualize smaller errors (scale 10^{-6})
\[
\theta(r, x) = \frac{M\, e^{x a}}{\pi}\; \mathrm{Im}\!\left[\int_0^1 e^{x M \log(v)\,(b\,\mathrm{i}-a)}\; I_{\sqrt{2\,(a - M\log(v)\,(b\,\mathrm{i}-a))}}(r)\; (b\,\mathrm{i} - a)\, \frac{dv}{v}\right] \tag{4}
\]
with arbitrary parameters a, b > 0 and M > 2/(a x), and this integral is a proper
Riemann integral, since the integrand vanishes for v ↓ 0, see [4]. Regarding the
choice of the parameters, [4] have shown that a = 1/x, M = 3 is usually a good
choice and we will use these parameters. Concerning the remaining parameter b, we
choose b = a. For the evaluation of the distribution function Fr , it is also shown in
[4] that
\[
F_r(x) = \frac{M\, e^{x a}}{\pi}\; \mathrm{Im}\!\left[\int_0^1 e^{x M \log(v)\,(b\,\mathrm{i}-a)}\; \frac{\varphi_r\bigl(a - M\log(v)\,(b\,\mathrm{i}-a)\bigr)}{a - M\log(v)\,(b\,\mathrm{i}-a)}\; (b\,\mathrm{i} - a)\, \frac{dv}{v}\right]
\]
with the same parameter restrictions as above. One particular challenge with this
method is that the modified Bessel function Iν needs to be evaluated for complex ν.
A straightforward implementation sufficient for our needs is achieved by using the
partial sums related to the representation in Eq. (2).
It has the advantage that error bounds can be computed: for r > 0 and $S_n^\nu(r) := \sum_{m=0}^{n} \frac{1}{m!\,\Gamma(m+\nu+1)} \left(\frac{r}{2}\right)^{2m+\nu}$, one can compute
\[
\left| S_n^\nu(r) - I_\nu(r) \right| \le \left(\frac{r}{2}\right)^{\mathrm{Re}(\nu)} \sum_{m=n+1}^{\infty} \frac{1}{m!\,\left|\Gamma(m+\nu+1)\right|} \left(\frac{r}{2}\right)^{2m}.
\]
Using the Gamma functional equation $\Gamma(z+1) = \Gamma(z)\,z$, it is easy to see that $|\Gamma(z+1)| \ge |\Gamma(z)|$ for $|z| \ge 1$. Thus, for $n \ge -\mathrm{Re}(\nu) - 1$, the sequence $\{|\Gamma(m+\nu+1)|\}_{m=n+1, n+2, \ldots}$ is increasing, yielding
Fig. 4 Evaluation of θ(r, x) for r = 0.5 and x ∈ [0.125, 0.15] in MATLAB applying the Laplace inversion formula (4) with a = b = 1/x and M = 3. The modified Bessel function is implemented with accuracy 10^{-6}. Left: The y-axis is precisely the same as in Fig. 2 for comparability. Right: The y-axis is made finer to visualize smaller errors (scale 10^{-8})
\[
\left| S_n^\nu(r) - I_\nu(r) \right| \le \frac{\left(\frac{r}{2}\right)^{\mathrm{Re}(\nu)}}{\left|\Gamma(n+\nu+2)\right|} \sum_{m=n+1}^{\infty} \frac{1}{m!} \left(\frac{r^2}{4}\right)^{m},
\]
where the series term is the residual of the Taylor expansion of exp(−r 2 /4), which
allows for a closed-form estimate. Consequently, one is able to choose n such that the
modified Bessel function is approximated up to a given accuracy. Using the Gamma
functional equation, one has to compute the complex Gamma function only once
which further increases efficiency. The complex Gamma function is computed using
the Lanczos approximation, see [12].1
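A possible Python sketch of this partial-sum evaluation (an illustration, not the authors' MATLAB implementation; the simple relative-tolerance stopping rule below stands in for the rigorous error bound above) is:

```python
import numpy as np
from scipy.special import gamma  # the Gamma function, defined for complex arguments

def besseli_complex_order(nu, r, tol=1e-6, max_terms=200):
    """Partial-sum approximation S_n^nu(r) of I_nu(r) for complex order nu and r > 0, cf. Eq. (2)."""
    g = gamma(nu + 1.0)              # Gamma(m + nu + 1) for m = 0
    total = 0.0 + 0.0j
    fact = 1.0                       # m!
    for m in range(max_terms):
        if m > 0:
            fact *= m
            g *= (m + nu)            # functional equation: Gamma(m + nu + 1) = (m + nu) * Gamma(m + nu)
        term = (0.5 * r) ** (2 * m + nu) / (fact * g)
        total += term
        if abs(term) < tol * max(abs(total), 1.0):
            break
    return total

print(besseli_complex_order(1.0 + 2.0j, 0.5))
```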
Figure 4 shows the resulting values of θ(r, x), where the modified Bessel function
is approximated with accuracy 10−6 . Formula (4) is evaluated in MATLAB apply-
ing the built-in adaptive quadrature routine quadgk. Comparing the results to the
Gaver–Stehfest inversion, the error for small x is again significantly reduced and
the results can be even improved by further increasing the accuracy of the modified
Bessel function.
fileexchange/3572-gamma.
Fig. 5 Evaluation of θ(r, x) for r = 3 and x ∈ [0.125, 0.15] using the three presented methods (direct evaluation of Formula (1), Gaver–Stehfest, and the Bondesson-class Laplace inversion) with the same specifications as before, i.e., the Gaver–Stehfest approximation (3) with n = 10 and the Laplace inversion formula (4) with a = b = 1/x and M = 3
6 Conclusion
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Atlan, M., Leblanc, B.: Hybrid equity-credit modelling. Risk Magazine August: 61–66 (2005)
2. Barrieu, P., Rouault, A., Yor, M.: A study of the Hartman-Watson distribution motivated by
numerical problems related to the pricing of Asian options. J. Appl. Probab. 41, 1049–1058
(2004)
3. Bernhart, G., Mai, J.-F.: On convexity adjustments for stock derivatives due to stochastic repo
margins. Working paper (2014)
4. Bernhart, G., Mai, J.-F., Schenk, S., Scherer M.: The density for distributions of the Bondesson
class. J. Comput. Financ. (2014, to appear)
5. Bondesson, L.: Classes of infinitely divisible distributions and densities: Zeitschrift für
Wahrscheinlichkeitstheorie und verwandte Gebiete 57, 39–71 (1981)
6. Gerhold, S.: The Hartman–Watson distribution revisited: asymptotics for pricing Asian options.
J. Appl. Probab. 48(3), 892–899 (2011)
A Note on the Numerical Evaluation of the Hartman–Watson Density … 345
7. Hartman, P.: Completely monotone families of solutions of nth order linear differential equa-
tions and infinitely divisible distributions. Ann. Scuola Norm. Sup. Pisa 4(3), 267–287 (1976)
8. Hartman, P., Watson, G.: “Normal” distribution functions on spheres and the modified Bessel
functions. Ann. Probab. 5, 582–585 (1974)
9. Ishiyama, K.: Methods for evaluating density functions of exponential functionals represented
as integrals of geometric Brownian motion. Methodol. Comput. Appl. Probab. 7, 271–283
(2005)
10. Kent, J.T.: The spectral decomposition of a diffusion hitting time. Ann. Probab. 10(1), 207–219
(1982)
11. Kuznetsov, A.: On the convergence of the Gaver–Stehfest algorithm. SIAM J. Numer. Anal.
(2013, forthcoming)
12. Lanczos, C.: A precision approximation of the gamma function. J. Soc. Ind. Appl. Math. Ser.
B Numer. Anal. 1, 86–96 (1964)
13. Schilling, R.L., Song, R., Vondraček, Z.: Bernstein Functions. Studies in Mathematics, vol. 37. de Gruyter, Berlin (2010)
14. Yor, M.: On some exponential functionals of Brownian motion. Adv. Appl. Probab. 24(3),
509–531 (1992)
Computation of Copulas by Fourier Methods
Antonis Papapantoleon
1 Introduction
A. Papapantoleon (B)
Institute of Mathematics, TU Berlin, Straße des 17. Juni 136, 10623 Berlin, Germany
e-mail: [email protected]
the results are proved for random variables for simplicity, while stochastic processes
are considered as a corollary. In Sect. 3, we provide two examples to showcase how
this method can be applied, for example, in performing sensitivity analysis of the
copula with respect to the parameters of the model. Finally, Sect. 4 concludes with
some remarks.
Let $\mathbb{R}^n$ denote the n-dimensional Euclidean space, $\langle \cdot, \cdot \rangle$ the Euclidean scalar product and $\mathbb{R}^n_-$ the negative orthant, i.e., $\mathbb{R}^n_- = \{x \in \mathbb{R}^n : x_i < 0\ \forall i\}$. We consider a random variable $X = (X_1, \ldots, X_n) \in \mathbb{R}^n$ defined on a probability space $(\Omega, \mathcal{F}, P)$. We denote by F the cumulative distribution function (cdf) of X and by f its probability density function (pdf). Let C denote the copula of X and c its copula density function. Analogously, let $F_i$ and $f_i$ denote the cdf and pdf, respectively, of the marginal $X_i$, for all $i \in \{1, \ldots, n\}$. In addition, we denote by $F_i^{-1}$ the generalized inverse of $F_i$, i.e., $F_i^{-1}(u) = \inf\{v \in \mathbb{R} : F_i(v) \ge u\}$.
We denote by $M_X$ the (extended) moment generating function of X:
\[
M_X(u) = E\bigl[e^{\langle u, X\rangle}\bigr], \tag{1}
\]
for all $u \in \mathbb{C}^n$ such that $M_X(u)$ exists. Let us also define the set
\[
I = \bigl\{ R \in \mathbb{R}^n : M_X(R) < \infty \text{ and } M_X(R + \mathrm{i}\,\cdot) \in L^1(\mathbb{R}^n) \bigr\}.
\]
Theorem 1 Let X be a random variable that satisfies Assumption (D). The copula of X is provided by
\[
C(u) = \frac{1}{(-2\pi)^n} \int_{\mathbb{R}^n} M_X(R + \mathrm{i}v)\, \frac{e^{-\langle R + \mathrm{i}v,\, x\rangle}}{\prod_{i=1}^n (R_i + \mathrm{i}v_i)}\, dv \;\Bigg|_{x_i = F_i^{-1}(u_i)}, \tag{2}
\]
see, e.g., [14, Theorem 5.3] for a proof in this setting and [16] for an elegant proof
in the general case.
We will evaluate the joint cdf F using the methodology of Fourier methods for
option pricing. That is, we will think of the cdf as the “price” of a digital option on
several fictitious assets. Let us define the function $g(y) := 1_{\{y_1 \le x_1, \ldots, y_n \le x_n\}}$ and denote by $\widehat g$ its Fourier transform. Then we have that
\[
F(x) = P(X_1 \le x_1, \ldots, X_n \le x_n) = E\bigl[1_{\{X_1 \le x_1, \ldots, X_n \le x_n\}}\bigr] = E[g(X)]
= \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} M_X(R + \mathrm{i}v)\, \widehat g(\mathrm{i}R - v)\, dv, \tag{5}
\]
where we have applied Theorem 3.2 in [6]. The prerequisites of this theorem are satisfied due to Assumption (D) and because $g_R \in L^1(\mathbb{R}^n)$, where $g_R(x) := e^{-\langle R, x\rangle} g(x)$ for $R \in \mathbb{R}^n_-$.
Finally, the statement follows from (3) and (5) once we have computed the Fourier
transform of g. We have, for $R_i < 0$, $i \in \{1, \ldots, n\}$,
\[
\widehat g(\mathrm{i}R - v) = \int_{\mathbb{R}^n} e^{\mathrm{i}\langle \mathrm{i}R - v,\, y\rangle}\, g(y)\, dy
= \int_{\mathbb{R}^n} e^{\mathrm{i}\langle \mathrm{i}R - v,\, y\rangle}\, 1_{\{y_1 \le x_1, \ldots, y_n \le x_n\}}\, dy
= \prod_{i=1}^{n} \int_{-\infty}^{x_i} e^{(-R_i - \mathrm{i}v_i) y_i}\, dy_i
= (-1)^n \prod_{i=1}^{n} \frac{e^{-(R_i + \mathrm{i}v_i) x_i}}{R_i + \mathrm{i}v_i}, \tag{6}
\]
where the expectation can be computed using (5) again, while a root finding algorithm
provides the infimum (using the continuity of Fi ).
We can also compute the copula density function using Fourier methods, which
resembles the computation of Greeks in option pricing.
Lemma 1 Let X be a random variable that satisfies Assumption (D) and assume further that the marginal distribution functions $F_1, \ldots, F_n$ are strictly increasing and continuously differentiable. Then, the copula density function c of X is provided by
\[
c(u) = \frac{1}{(2\pi)^n \prod_{i=1}^n f_i(x_i)} \int_{\mathbb{R}^n} M_X(R + \mathrm{i}v)\, e^{-\langle R + \mathrm{i}v,\, x\rangle}\, dv \;\Bigg|_{x_i = F_i^{-1}(u_i)}, \tag{7}
\]
\[
c(u) = \frac{\partial^n}{\partial u_1 \cdots \partial u_n}\, C(u_1, \ldots, u_n)
= \frac{\partial^n}{\partial u_1 \cdots \partial u_n} \left[ \frac{1}{(-2\pi)^n} \int_{\mathbb{R}^n} M_X(R + \mathrm{i}v)\, \frac{e^{-\langle R + \mathrm{i}v,\, x\rangle}}{\prod_{i=1}^n (R_i + \mathrm{i}v_i)}\, dv \;\Bigg|_{x_i = F_i^{-1}(u_i)} \right]
= \frac{1}{(-2\pi)^n} \int_{\mathbb{R}^n} \frac{M_X(R + \mathrm{i}v)}{\prod_{i=1}^n (R_i + \mathrm{i}v_i)}\; \frac{\partial^n}{\partial u_1 \cdots \partial u_n}\, e^{-\langle R + \mathrm{i}v,\, x\rangle} \Big|_{x_i = F_i^{-1}(u_i)}\, dv. \tag{8}
\]
Now, since the marginal distribution functions are continuously differentiable, using
the chain rule and the inverse function theorem we get that
\[
\frac{\partial^n}{\partial u_1 \cdots \partial u_n}\, e^{-\langle R + \mathrm{i}v,\, x\rangle} \Big|_{x_i = F_i^{-1}(u_i)}
= (-1)^n \prod_{i=1}^{n} (R_i + \mathrm{i}v_i)\; e^{-\langle R + \mathrm{i}v,\, x\rangle}\; \frac{1}{\prod_{i=1}^n f_i(x_i)} \;\Bigg|_{x_i = F_i^{-1}(u_i)}, \tag{9}
\]
(X t )t≥0 . There are many examples of stochastic processes where the corresponding
characteristic functions are known explicitly. Prominent examples are Lévy processes,
self-similar additive (“Sato”) processes and affine processes.
Corollary 1 Let $X = (X_t)_{t \ge 0}$ be an $\mathbb{R}^n$-valued stochastic process on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, P)$. Assume that the random variable $X_t$, $t \ge 0$, satisfies Assumption (D). Then, the copula of $X_t$ is provided by
\[
C_t(u) = \frac{1}{(-2\pi)^n} \int_{\mathbb{R}^n} M_{X_t}(R + \mathrm{i}v)\, \frac{e^{-\langle R + \mathrm{i}v,\, x\rangle}}{\prod_{i=1}^n (R_i + \mathrm{i}v_i)}\, dv \;\Bigg|_{x_i = F_{X_t^i}^{-1}(u_i)}, \tag{10}
\]
where $u \in [0, 1]^n$ and $R \in \mathbb{R}^n_-$. An analogous statement holds for the copula density function $c_t$ of $X_t$.
3 Examples
We will demonstrate the applicability and flexibility of Fourier methods for the
computation of copulas using two examples. First, we consider a 2D normal random
variable and next a 2D normal inverse Gaussian (NIG) Lévy process. Although the
copula of the normal random variable is the well-known Gaussian copula, little was
known about the copula of the NIG distribution until recently; see Theorem 5.13
in [18] for a special case. Hammerstein [19, Chap.2] has now provided a general
characterization of the (implied) copula of the multidimensional NIG distribution
using properties of normal mean-variance mixtures.
Example 1 The first example is simply a “sanity check” for the proposed method. We
consider the two-dimensional Gaussian distribution and compute the corresponding
copula for correlation values equal to ρ = {−1, 0, 1}; see Fig. 1 for the resulting
contour plots. Of course, the copula of this example is the Gaussian copula, which for
correlation coefficients equal to {−1, 0, 1} corresponds to the countermonotonicity
copula, the independence copula and the comonotonicity copula respectively. This
is also evident from Fig. 1.
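To make this sanity check concrete, the following Python sketch (not from the paper; the damping vector R = (−1, −1), the truncation of the integration domain at ±20, and the use of SciPy's quadrature and multivariate normal cdf are ad hoc choices) evaluates formula (2) for a bivariate normal X with correlation ρ and compares it with the Gaussian copula computed directly:

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.integrate import dblquad

rho = 0.5
Sigma = np.array([[1.0, rho], [rho, 1.0]])
R = np.array([-1.0, -1.0])                      # a point in the negative orthant

def copula_fourier(u1, u2, L=20.0):
    """Evaluate C(u1, u2) via formula (2) for X ~ N(0, Sigma), truncating R^2 to [-L, L]^2."""
    x = np.array([norm.ppf(u1), norm.ppf(u2)])  # F_i^{-1}(u_i) for standard normal marginals
    def integrand(v2, v1):
        z = R + 1j * np.array([v1, v2])
        M = np.exp(0.5 * z @ Sigma @ z)         # moment generating function of N(0, Sigma)
        val = M * np.exp(-z @ x) / (z[0] * z[1])
        return val.real                         # the imaginary part integrates to zero
    val, _ = dblquad(integrand, -L, L, lambda v: -L, lambda v: L)
    return val / (-2.0 * np.pi) ** 2

u1, u2 = 0.3, 0.7
print("Fourier formula (2):", copula_fourier(u1, u2))
print("Gaussian copula    :", multivariate_normal(mean=[0.0, 0.0], cov=Sigma).cdf(
    [norm.ppf(u1), norm.ppf(u2)]))
```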
cf. [1]. The marginals are also NIG distributed and we have that $X_t^i \sim \mathrm{NIG}(\hat\alpha^i, \hat\beta^i, \hat\delta^i t, \hat\mu^i t)$, where
\[
\hat\alpha^i = \frac{\sqrt{\alpha^2 - \beta_j^2\,\bigl(\delta_{jj} - \delta_{ij}^2 \delta_{ii}^{-1}\bigr)}}{\sqrt{\delta_{ii}}}, \qquad
\hat\beta^i = \beta_i + \beta_j\, \delta_{ij}^2\, \delta_{ii}^{-1}, \qquad
\hat\delta^i = \delta \sqrt{\delta_{ii}}, \qquad
\hat\mu^i = \mu_i,
\]
for $i \in \{1, 2\}$ and $j \in \{2, 1\}$; cf. e.g., [2, Theorem 1]. Assumption (D) is satisfied for $R \in \mathbb{R}^2_-$ such that $\alpha^2 - \langle \beta + R,\, \Delta(\beta + R)\rangle \ge 0$; see Appendix B in [6]. Hence $I \cap \mathbb{R}^2_- \ne \emptyset$. Therefore, we can apply Theorem 1 to compute the copula of the NIG distribution. The parameters used in the numerical example are similar to [6, pp. 233–234].
4 Final Remarks
We will not elaborate on the speed of Fourier methods compared with Monte Carlo
methods in the multidimensional case; the interested reader is referred to [7] for a
careful analysis. Moreover, [20] provides recommendations on the efficient imple-
mentation of Fourier integrals using sparse grids in order to deal with the “curse of
dimensionality.” Let us point out though that the computation of the copula function
will be much quicker than the computation of the copula density, since the integrand
in (2) decays much faster than the one in (7). One should think of the analogy to
option prices and option Greeks again. Finally, it seems tempting to use these formu-
las for the computation of tail dependence coefficients. However, due to numerical
instabilities at the limits, they did not yield any meaningful results.
Acknowledgments The author thanks E. A. von Hammerstein for helpful comments and sugges-
tions.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Barndorff-Nielsen, O.E.: Processes of normal inverse Gaussian type. Financ. Stoch. 2, 41–68
(1998)
2. Blæsild, P.: The two-dimensional hyperbolic distribution and related distributions, with an
application to Johannsen’s bean data. Biometrika 68, 251–263 (1981)
3. Cuchiero, C., Filipović, D., Mayerhofer, E., Teichmann, J.: Affine processes on positive semi-
definite matrices. Ann. Appl. Probab. 21, 397–463 (2011)
4. Duffie, D., Filipović, D., Schachermayer, W.: Affine processes and applications in finance.
Ann. Appl. Probab. 13, 984–1053 (2003)
5. Eberlein, E., Madan, D.: On correlating Lévy processes. J. Risk 13, 3–16 (2010)
6. Eberlein, E., Glau, K., Papapantoleon, A.: Analysis of Fourier transform valuation formulas
and applications. Appl. Math. Financ. 17, 211–240 (2010)
7. Hurd, T.R., Zhou, Z.: A Fourier transform method for spread option pricing. SIAM J. Financ.
Math. 1, 142–157 (2010)
8. Jacod, J., Protter, P.: Probability Essentials, 2nd edn. Springer, Heidelberg (2003)
9. Kallsen, J., Tankov, P.: Characterization of dependence of multidimensional Lévy processes
using Lévy copulas. J. Multivar. Anal. 97, 1551–1572 (2006)
10. Kawai, R.: A multivariate Lévy process model with linear correlation. Quant. Financ. 9, 597–
606 (2009)
11. Khanna, A., Madan D.: Non Gaussian models of dependence in returns. Preprint,
SSRN/1540875, (2009)
12. Luciano, E., Schoutens, W.: A multivariate jump-driven financial asset model. Quant. Financ.
6, 385–402 (2006)
13. Luciano, E., Semeraro, P.: A generalized normal mean-variance mixture for return processes
in finance. Int. J. Theor. Appl. Financ. 13, 415–440 (2010)
14. McNeil, A., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques,
Tools. Princeton University Press, (2005)
15. Muhle-Karbe, J., Pfaffel, O., Stelzer, R.: Option pricing in multivariate stochastic volatility
models of OU type. SIAM J. Financ. Math. 3, 66–94 (2012)
16. Rüschendorf, L.: On the distributional transform, Sklar’s theorem, and the empirical copula
process. J. Stat. Plan. Infer. 139, 3921–3927 (2009)
17. Sato, K.: Lévy Processes and Infinitely Divisible Distributions. Cambridge University, Cam-
bridge (1999)
18. Schmidt, V.: Dependencies of extreme events in finance. Ph.D. thesis, University of Ulm (2003)
19. Hammerstein, E.A.v.: Generalized hyperbolic distributions: theory and applications to CDO
pricing. Ph.D. thesis, University of Freiburg (2011)
20. Villiger S.: Basket option pricing on sparse grids using fast Fourier transforms. Master’s thesis,
ETH Zürich (2007)
Part V
Dependence Modelling
Goodness-of-fit Tests for Archimedean
Copulas in High Dimensions
C. Hering (B)
Institute of Number Theory and Probability Theory, Ulm University,
Helmholtzstraße 18, 89081 Ulm, Germany
e-mail: [email protected]
M. Hofert
Department of Mathematics, Technische Universität München, 85748 Garching, Germany
e-mail: [email protected]
1 Introduction
sampling algorithm, the conditional distribution method, see, e.g., [10]. In other
words, for a bijective transformation which converts d independent and identically
distributed (“i.i.d.”) standard uniform random variables to a d-dimensional random
vector distributed according to some copula C, the corresponding inverse transfor-
mation may be applied to obtain d i.i.d. standard uniform random variables from a
d-dimensional random vector following the copula C. In this work, we suggest this
idea for goodness-of-fit testing based on a transformation originally proposed by [29]
for sampling Archimedean copulas. With the recent work of [22] we obtain a more
elegant proof of the correctness of this transformation under weaker assumptions.
We then apply the first d − 1 components to build a general goodness-of-fit test for
d-dimensional Archimedean copulas. This complements goodness-of-fit tests based
on the dth component, the Kendall distribution function, see, e.g., [13, 26], or [14].
Our proposed test can be interpreted as an Archimedean analogon to goodness-of-fit
tests based on Rosenblatt’s transformation for copulas in general as it establishes a
link between a sampling algorithm and a goodness-of-fit test. The appealing property
of tests based on the inverse of the transformation of [29] for Archimedean copulas
is that they are easily applied in any dimension, whereas tests based on Rosenblatt’s
transformation, as well as tests based on the Kendall distribution function are typi-
cally numerically challenging. The transformation can also be conveniently used for
graphical goodness-of-fit testing as recently advocated by [16].
This paper is organized as follows. In Sect. 2, commonly used goodness-of-fit
tests for copulas in general are recalled. In Sect. 3, the new goodness-of-fit test for
Archimedean copulas is presented. Section 4 contains details about the conducted
simulation study. The results are presented in Sect. 5 and the graphical goodness-of-fit
test is detailed in Sect. 6. Finally, Sect. 7 concludes.
H0 : C ∈ C0 (2)
where $\hat F_{nj}(x) = \frac{1}{n}\sum_{k=1}^{n} 1_{\{X_{kj} \le x\}}$ denotes the empirical distribution function of
the jth data column (the data matrix consisting of the entries X i j , i ∈ {1, . . . , n},
j ∈ {1, . . . , d}), see [14]. Following the latter approach one ends up with rank-
based pseudo-observations which are interpreted as observations of C (besides the
known issues of this interpretation, see Remark 1 below) and are, therefore, used for
estimating θ and testing H0 .
In order to conduct a goodness-of-fit test, the pseudo-observations U i , i ∈
{1, . . . , n}, are usually first transformed to some variables U i , i ∈ {1, . . . , n}, so
that the distribution of the latter is known and sufficiently simple to test under the
null hypothesis. For Rosenblatt’s transformation (see Sect. 2.1), U i , i ∈ {1, . . . , n},
is also d-dimensional, for tests based on the Kendall distribution function (described
in Sect. 2.2), it is one-dimensional, and for the goodness-of-fit approach we propose
in Sect. 3, it is (d − 1)-dimensional. If not already one-dimensional, after such a
transformation, U i , i ∈ {1, . . . , n}, is usually mapped to one-dimensional quantities
Yi , i ∈ {1, . . . , n}, such that the corresponding distribution FY is again known under
the null hypothesis. So indeed, instead of (2), one usually considers some adjusted
hypothesis H0∗ : FY ∈ F0 under which a goodness-of-fit test can easily be carried
out in a one-dimensional setting. For mapping the variates to a one-dimensional
setting, different approaches exist, see Sect. 2.2. Note that if H0∗ is rejected, so is H0 .
Remark 1 As, e.g., [8] describe, there are two problems with the approach described
above. First, the pseudo-observations $U_i$, $i \in \{1, \ldots, n\}$, are neither realizations of perfectly independent random vectors nor are the components perfectly following
univariate standard uniform distributions. This affects the null distribution of the test
statistic under consideration. All copula goodness-of-fit approaches suffer from these
effects since observations from the underlying copula are never directly observed
in practice. A solution may be a bootstrap to access the exact null distribution; particularly in high dimensions, however, this is often time-consuming, especially for the goodness-of-fit tests suggested in the copula literature so far. Second, using estimated copula
parameters additionally affects the null distribution.
U′_1 = U_1,
U′_2 = C_2(U_2 | U_1),
...
U′_d = C_d(U_d | U_1, . . . , U_{d−1}),
where
C_j(u_j | u_1, . . . , u_{j−1}) = D_{j−1,...,1} C^{(1,..., j)}(u_1, . . . , u_j) / D_{j−1,...,1} C^{(1,..., j−1)}(u_1, . . . , u_{j−1}),  j ∈ {2, . . . , d}.   (4)
The problem when applying (4) or (5) in high dimensions is that it is usually quite
difficult to access the derivatives involved, the price which one has to pay for such a
general transformation. Furthermore, numerically evaluating the derivatives is often
time-consuming and prone to errors.
Genest et al. [14] propose a test statistic based on the empirical distribution function of the random vectors U′_i, i ∈ {1, . . . , n}. As an overall result, the authors recommend using a distance between the distribution under H_0, assumed to be standard uniform on [0, 1]^d, and the empirical distribution, namely
S^B_{n,d} = n ∫_{[0,1]^d} (D_n(u) − Π(u))² du,
where Π(u) = ∏_{j=1}^{d} u_j denotes the independence copula and D_n(u) = (1/n) Σ_{i=1}^{n} 1{U′_i ≤ u} the empirical distribution function based on the random vectors U′_i, i ∈ {1, . . . , n}. We refer to this transformation as "S^B_{n,d}" in what follows.
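As an aside, the integral defining S^B_{n,d} can be evaluated without numerical integration over [0,1]^d by expanding the square. The following minimal Python sketch (not part of the authors' C/C++ implementation) computes the statistic from an (n × d) matrix U of transformed pseudo-observations, which is assumed to be given.

```python
import numpy as np

def S_B(U):
    # U: (n, d) array of transformed pseudo-observations, entries in (0, 1).
    # Expanding n * int (D_n(u) - Pi(u))^2 du over [0,1]^d gives three terms:
    n, d = U.shape
    term1 = n / 3.0**d                                        # n * int Pi(u)^2 du
    term2 = np.prod(1.0 - U**2, axis=1).sum() / 2.0**(d - 1)  # 2n * int D_n(u) Pi(u) du
    pairwise = 1.0 - np.maximum(U[:, None, :], U[None, :, :]) # (n, n, d)
    term3 = np.prod(pairwise, axis=2).sum() / n               # n * int D_n(u)^2 du
    return term1 - term2 + term3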
generator inverse. Furthermore, for large d, evaluation of K C often gets more and
more complicated from a numerical point of view (see [18] for the derivatives
involved), except for specific cases such as Clayton's family where all involved derivatives of ψ are directly accessible, see, e.g., [29], and therefore K_C can be computed directly via¹ K_C(t) = Σ_{k=0}^{d−1} (−ψ^{−1}(t))^k ψ^{(k)}(ψ^{−1}(t))/k!, see, e.g., [3] or [22]. Moreover, note that applying T_d for obtaining the transformed data U′_i,
i ∈ {1, . . . , n}, requires n-times the evaluation of the Kendall distribution function
K C , which can be computationally intensive, especially in simulation studies involv-
ing bootstrap procedures. With the informational loss inherent in the goodness-of-fit
tests following the approaches addressed in Sect. 2.2 in mind, one may therefore
suggest omitting the last component T_d of T and considering only T_1, . . . , T_{d−1}, i.e., using the data (U′_{i1}, . . . , U′_{i,d−1}), i ∈ {1, . . . , n}, for testing purposes if d is large.
This leads to fast goodness-of-fit tests for Archimedean copulas in high dimensions.
A goodness-of-fit test based on omitting the last component of the transformation T
is referred to as approach “Td−1 ” in what follows.
In our experimental design, focus is put on two features, the error probability of
the first kind, i.e., if a test maintains its nominal level, and the power under several
alternatives. To distinguish between the different approaches we use either pairs or
triples, e.g., the approach “(Td−1 , Nd−1 , AD)” denotes a goodness-of-fit test based
on first applying our proposed transformation T without the last component, then
using the approach based on the χ²_{d−1} distribution to transform the data to a one-dimensional setup, and then applying the Anderson-Darling statistic to test H_0^*; similarly, "(T_{d−1}, S^B_{n,d−1})" denotes a goodness-of-fit test which uses the approach S^B_{n,d−1} for reducing the dimension and testing H_0^*.
In the conducted Monte Carlo simulation,2 the following ten different goodness-
of-fit approaches are tested:
1 It also follows from this formula that K C converges pointwise to the unit jump at zero for d → ∞.
2 All computations were conducted on a compute node (part of the bwGRiD Cluster Ulm) which
consists of eight cores (two four-core Intel Xeon E5440 Harpertown CPUs with 2.83 GHz and 6 MB
second level cache) and 16 GB memory. The algorithms are implemented in C/C++ and compiled
using GCC 4.2.4 with option O2 for code optimization. Moreover, we use the algorithms of the
Numerical Algorithms Group, the GNU Scientific Library 1.12, and the OpenMaple interface of
Maple 12. For generating uniform random variates an implementation of the Mersenne Twister by
[28] is used. For the Anderson-Darling test, the procedures suggested in [21] are used.
Similar to [14], we investigate samples of size n = 150 and parameters of the copulas
such that Kendall’s tau equals τ = 0.25. We work in d = 5 and d = 20 dimensions for
comparing the goodness-of-fit tests given in (7). For every scenario, we simulate the
corresponding Archimedean copulas of Ali-Mikhail-Haq (“A”), Clayton (“C”), Frank
(“F”), Gumbel (“G”), and Joe (“J”), see, e.g., [15], as well as the Gaussian (“Ga”)
and t copula with four degrees of freedom (“t4 ”); note that we use one-parameter
copulas ( p = 1) in our study only for simplicity. Whenever computationally feasible,
N = 1,000 replications are used for computing the empirical level and power. In
some cases, see Sect. 5, less than 1,000 replications had to be used. For all tests, the
significance level is fixed at α = 5 %. For the univariate χ²-tests, ten cells were used.
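For reference, two of the families used in the study admit simple closed-form relations between Kendall's tau and the copula parameter, so the parameters corresponding to τ = 0.25 are immediate; the remaining families (Ali-Mikhail-Haq, Frank, Joe) require numerical inversion of τ(θ) and are not shown in this short sketch.

```python
# Standard relations: Clayton tau = theta/(theta + 2), Gumbel tau = 1 - 1/theta.
def clayton_theta(tau):
    return 2.0 * tau / (1.0 - tau)

def gumbel_theta(tau):
    return 1.0 / (1.0 - tau)

print(clayton_theta(0.25), gumbel_theta(0.25))   # 2/3 and 4/3
```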
Concerning the use of Maple, we proceed as follows. For computing the first
d − 1 components T1 , . . . , Td−1 of the transformation T involved in the first three
and the sixth to eighth approach listed in (7), Maple is only used if working under
double precision in C/C++ leads to errors. By errors we mean non-float values, including nan, -inf, and inf, as well as float values less than zero or greater than one. For computing the component T_d, Maple is used to generate C/C++ code.
To decrease runtime, the function is then hard coded in C/C++, except for Clayton’s
family where an explicit form of all derivatives and hence K C is known, see [29]. The
same holds for computing K C for the approaches (K C , χ 2 ) and (K C , AD). For the
approaches involving Rosenblatt’s transform, a computation in C/C++ is possible
for Clayton’s family in a direct manner, whereas again Maple’s code generator is
used for all other copula families to obtain the derivatives of the generator. If there
are numerical errors from this approach we use Maple with a high precision for the
computation. If Rosenblatt’s transformation produces errors even after computations
in Maple, we disregard the corresponding goodness-of-fit test and use the remaining
test results of the simulation for computing the empirical level and power.
Due to its well-known properties, we use the maximum likelihood estimator
(“MLE”) to estimate the copula parameters, based on the pseudo-observations of
the simulated random vectors U i ∼ C, i ∈ {1, . . . , n}. Besides building the pseudo-
observations, note that parameter estimation may also affect the null distribution. This
is generally addressed by using a bootstrap procedure for accessing the correct null
distribution, see Sect. 4.2 below. Note that a bootstrap can be quite time-consuming in high dimensions; even parameter estimation alone turns out to be computationally demanding. For the bootstrap versions of the goodness-of-fit approaches involving
the generator derivatives, we were required to hard code the derivatives in order
to decrease runtime. Note that such effort is not needed for applying our proposed
goodness-of-fit test (Td−1 , Nd−1 , AD), since it is not required to access the generator
derivatives.
For our proposed approach (Td−1 , Nd−1 , AD) it is not clear whether the bootstrap
procedure is valid from a theoretical point of view; see, e.g., [8] and [14]. However,
empirical results, presented in Sect. 5, indicate the validity of this approach, described
as follows.
1. Given the data X_i, i ∈ {1, . . . , n}, build the pseudo-observations U_i, i ∈ {1, . . . , n}, as given in (3) and estimate the unknown copula parameter vector θ by its MLE θ̂_n.
2. Based on U_i, i ∈ {1, . . . , n}, the given Archimedean family, and the parameter estimate θ̂_n, compute the first d − 1 components U′_{ij}, i ∈ {1, . . . , n}, j ∈ {1, . . . , d − 1}, of the transformation T as in Eq. (6) and the one-dimensional quantities Y_i = Σ_{j=1}^{d−1} (Φ^{−1}(U′_{ij}))², i ∈ {1, . . . , n}. Compute the Anderson-Darling test statistic A_n = −n − (1/n) Σ_{i=1}^{n} (2i − 1)[log(F_{χ²_{d−1}}(Y_{(i)})) + log(1 − F_{χ²_{d−1}}(Y_{(n−i+1)}))].
3. Choose the number M of bootstrap replications. For each k ∈ {1, . . . , M} do:
   a. Generate a random sample of size n from the given Archimedean copula with parameter θ̂_n and compute the corresponding vectors of componentwise scaled ranks (i.e., the pseudo-observations) U*_{i,k}, i ∈ {1, . . . , n}. Then, estimate the unknown parameter vector θ by θ̂*_{n,k}.
   b. Based on U*_{i,k}, i ∈ {1, . . . , n}, the given Archimedean family, and the parameter estimate θ̂*_{n,k}, compute the first d − 1 components U′*_{ij,k}, i ∈ {1, . . . , n}, j ∈ {1, . . . , d − 1}, of the transformation T as in Eq. (6) and Y*_{i,k} = Σ_{j=1}^{d−1} (Φ^{−1}(U′*_{ij,k}))², i ∈ {1, . . . , n}. Compute the Anderson-Darling test statistic A*_{n,k} = −n − (1/n) Σ_{i=1}^{n} (2i − 1)[log(F_{χ²_{d−1}}(Y*_{(i),k})) + log(1 − F_{χ²_{d−1}}(Y*_{(n−i+1),k}))].
4. An approximate p-value for (T_{d−1}, N_{d−1}, AD) is given by (1/M) Σ_{k=1}^{M} 1{A*_{n,k} > A_n}.
The bootstrap procedures for the other approaches can be obtained similarly. For
the bootstrap procedure using Rosenblatt’s transformation see, e.g., [14]. For our
simulation studies, we used M = 1,000 bootstrap replications. Note that, together
with the number N = 1,000 of test replications, simulation studies are quite time-
consuming, especially if parameters need to be estimated and especially if high
dimensions are involved.
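A minimal Python sketch of steps 1–4 above is given below; it is not the authors' C/C++/Maple implementation. The callables sample_copula, fit_theta, and transform_T_first_dm1 are hypothetical placeholders for the family-specific routines (sampling, maximum likelihood estimation, and the first d − 1 components of the transformation T in Eq. (6), which is not reproduced in this excerpt).

```python
import numpy as np
from scipy.stats import chi2, norm, rankdata

def pseudo_obs(X):
    # Componentwise scaled ranks U_ij = R_ij / (n + 1), cf. step 3a.
    n = X.shape[0]
    return np.apply_along_axis(rankdata, 0, X) / (n + 1.0)

def anderson_darling(Y, df):
    # A_n as in step 2, computed against the chi^2_df distribution function.
    n = len(Y)
    F = chi2.cdf(np.sort(Y), df)
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(F) + np.log(1.0 - F[::-1])))

def gof_pvalue(X, sample_copula, fit_theta, transform_T_first_dm1, M=1000, seed=None):
    # Parametric bootstrap p-value for (T_{d-1}, N_{d-1}, AD).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = pseudo_obs(X)
    theta = fit_theta(U)                                   # step 1
    V = transform_T_first_dm1(U, theta)                    # (n, d-1), step 2
    A_n = anderson_darling((norm.ppf(V) ** 2).sum(axis=1), d - 1)
    A_star = np.empty(M)
    for k in range(M):                                     # step 3
        Uk = pseudo_obs(sample_copula(n, theta, rng))
        theta_k = fit_theta(Uk)
        Vk = transform_T_first_dm1(Uk, theta_k)
        A_star[k] = anderson_darling((norm.ppf(Vk) ** 2).sum(axis=1), d - 1)
    return np.mean(A_star > A_n)                           # step 4
```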
Applying the MLE in high dimensions is numerically challenging and time-
consuming; see also [19]. Although our proposed goodness-of-fit test can be applied
in the case d = 100, it is not easy to use the bootstrap described above in such high
dimensions. We therefore, for d = 100, investigate only the error probability of the
first kind similar to the case A addressed in [8]. For this, we generate N = 1,000
100-dimensional samples of size n = 150 with parameter chosen such that Kendall’s
tau equals τ = 0.25 and compute for each generated data set the p-value of the test
(Td−1 , Nd−1 , AD) as before, however, this time with the known copula parameter.
Finally, the number of rejections among the 1,000 conducted goodness-of-fit tests
according to the five percent level is reported. The results are given at the end of
Sect. 5.
5 Results
We first present selected results obtained from the large-scale simulation study con-
ducted for the 10 different goodness-of-fit approaches listed in (7). These results sum-
marize the main characteristics found in the simulation study. As an overall result,
we found that the empirical power against all investigated alternatives increases if
the dimension gets large. As expected, so does runtime.
We start by discussing the methods that show a comparably weak performance in the conducted simulation study, beginning with the results that are based on the test statistics S^B_{n,d−1} or S^B_{n,d} to reduce the dimension. Although keeping the error probability of the first kind, the goodness-of-fit tests (T_{d−1}, S^B_{n,d−1}), (T_d, S^B_{n,d}), and (R, S^B_{n,d}) show a comparably weak performance against the investigated alternatives, at least in our test setup as described in Sect. 4.1. For example, for n = 150, d = 5, and τ = 0.25, the method (T_d, S^B_{n,d}) leads to an empirical power of 5.2 % for testing Clayton's copula when the simulated copula is Ali-Mikhail-Haq's, 11.5 % for testing the Gaussian copula on Frank copula data, 7.7 % for testing Ali-Mikhail-Haq's copula on data from Frank's copula, and 6.4 % for testing Gumbel's copula on data from Joe's copula. Similar results hold for the methods (T_{d−1}, S^B_{n,d−1}) and (R, S^B_{n,d}). We therefore do not further report on the methods involving S^B_{n,d−1} or S^B_{n,d} in what follows. The method (T_d, K_Π, AD) also shows a rather weak performance for both investigated dimensions and is therefore omitted. Since the cases of (K_C, χ²) and (K_C, AD) as well as the approaches (T_{d−1}, N_{d−1}, AD) and (T_{d−1}, N_{d−1}, χ²) do not differ significantly, we only report the results based on the Anderson-Darling tests.
Now consider the goodness-of-fit testing approaches (Td−1 , Nd−1 , AD),
(K C , AD), and (Td , Nd , AD). Recall that (Td−1 , Nd−1 , AD) is based on the first
d − 1 components of the transformation T addressed in Eq. (6), (K C , AD) applies
only the last component of T , and (Td , Nd , AD) applies the whole transformation T
in d dimensions, where all three approaches use the Anderson-Darling test for testing
H0∗ . The test results for the three goodness-of-fit tests with n = 150, τ = 0.25, and
d ∈ {5, 20} are reported in Tables 1, 2, and 3, respectively. As mentioned above, we
use a bootstrap procedure to obtain approximate p-values and test the hypothesis
based on those p-values. We use N = 1,000 repetitions wherever possible. In all
cases involving Joe’s copula as H0 copula only about 650 repetitions could be fin-
ished. As Tables 1 and 2 reveal, in many cases, (Td−1 , Nd−1 , AD) shows a larger
empirical power than (K C , AD) (for both d), but the differences in either direction
can be large (consider the case of the t4 copula when the true one is Clayton (both d) and the case of the Frank copula when the true one is Clayton (both d)). Overall,
Table 1 Empirical power in % for (Td−1 , Nd−1 , AD) based on N = 1,000 replications with
n = 150, τ = 0.25, and d = 5 (left), respectively d = 20 (right)
True copula, d = 5 True copula, d = 20
H0 A C F G J Ga t4 A C F G J Ga t4
A 4.8 10.5 68.5 97.8 100.0 34.2 94.0 5.2 4.8 98.1 97.8 100.0 47.2 100.0
C 35.4 4.7 92.8 99.6 100.0 84.2 100.0 95.3 6.1 100.0 100.0 100.0 100.0 100.0
F 2.9 10.5 5.3 58.5 94.8 15.8 99.4 0.3 12.8 5.4 63.5 100.0 77.6 100.0
G 24.5 56.6 8.9 5.2 10.3 17.0 99.3 99.4 100.0 24.9 5.2 77.0 100.0 100.0
J 71.7 92.9 41.1 13.7 4.9 76.4 100.0 98.6 98.4 84.4 6.9 5.2 100.0 100.0
Table 2 Empirical power in % for (K C , AD) based on N = 1,000 replications with n = 150,
τ = 0.25, and d = 5 (left), respectively d = 20 (right)
True copula, d = 5 True copula, d = 20
H0 A C F G J Ga t4 A C F G J Ga t4
A 6.1 33.7 13.5 38.3 83.6 11.5 44.4 4.2 16.8 0.0 1.7 8.9 59.5 82.4
C 30.6 5.1 95.5 86.9 99.3 28.8 7.7 65.9 5.6 100.0 99.8 100.0 45.5 4.1
F 41.4 97.6 4.0 63.7 59.5 48.1 88.9 90.0 100.0 5.2 99.9 100.0 98.5 100.0
G 12.0 24.3 41.1 4.9 5.4 6.9 16.3 9.5 56.8 93.0 6.5 60.7 1.3 8.3
J 70.1 50.5 70.5 3.0 5.5 29.0 12.8 100.0 100.0 99.8 1.8 6.7 100.0 100.0
Table 3 Empirical power in % for (Td , Nd , AD) based on N = 1,000 replications with n = 150,
τ = 0.25, and d = 5 (left), respectively d = 20 (right)
True copula, d = 5 True copula, d = 20
H0 A C F G J Ga t4 A C F G J Ga t4
A 4.2 8.4 36.4 83.1 99.7 21.6 98.4 5.3 16.2 98.0 96.6 100.0 68.8 100.0
C 6.9 4.7 16.9 65.9 90.2 25.3 100.0 86.3 5.3 99.7 99.9 100.0 100.0 100.0
F 4.4 3.1 4.9 16.7 46.1 9.1 99.2 0.4 5.6 5.0 30.8 100.0 25.9 100.0
G 3.8 5.8 1.8 5.0 15.8 3.7 98.7 94.7 100.0 8.2 7.1 85.3 98.6 100.0
J 11.1 17.5 6.4 4.8 4.8 10.8 99.7 100.0 100.0 74.8 3.5 5.3 98.7 100.0
when the true copula is the t4 copula, (Td−1 , Nd−1 , AD) performs well. Given the
comparably numerically simple form of (Td−1 , Nd−1 , AD), this method can be quite
useful. Interestingly, by comparing Table 1 with Table 3, we see that if the transfor-
mation T with all d components is applied, there is actually a loss in power for the
majority of families tested (the cause of this behavior remains an open question).
Note that in Table 2 for the case where the Ali-Mikhail-Haq copula is tested, the
power decreases in comparison to the five-dimensional case. This might be due to
numerical difficulties occurring when K C is evaluated in this case, since the same
behavior is visible for the method (K C , χ 2 ).
Table 4 shows the empirical power of the method (R, Nd , AD). In compar-
ison to our proposed goodness-of-fit approach (Td−1 , Nd−1 , AD), the approach
(R, Nd , AD) overall performs worse. For d = 5, there are only two cases where
Table 4 Empirical power in % for (R, Nd , AD) based on N = 1,000 replications with n = 150,
τ = 0.25, and d = 5 (left), respectively d = 20 (right)
True copula, d = 5 True copula, d = 20
H0 A C F G J Ga t4 A C F G J Ga t4
A 4.5 8.9 46.9 79.1 98.8 11.0 94.2 ∗ ∗ ∗ ∗ ∗ ∗ ∗
C 11.7 5.0 17.7 53.5 68.8 10.4 99.7 93.4 5.3 100.0 100.0 100.0 100.0 100.0
F 3.4 2.6 5.5 15.8 61.6 5.7 99.5 – – – – – – –
G 4.9 4.0 1.2 3.0 14.5 1.2 97.9 – – – – – – –
J 21.1 21.8 9.5 4.3 3.6 7.2 99.7 – – – – – – –
(R, Nd , AD) performs better than (Td−1 , Nd−1 , AD) which are testing the Ali-
Mikhail-Haq copula when the true copula is t4 and testing Joe’s copula when the
true one is Gumbel. In the high-dimensional case d = 20, only results for the
Clayton copula are obtained. In this case the actual number of repetitions for cal-
culating the empirical power is approximately 500. For the cases when testing the
Ali-Mikhail-Haq, Gumbel, Frank, or Joe copula, no reliable results were obtained
since only about 20 repetitions could be run in the runtime provided by the grid. This
is due to the high-order derivatives involved in this transformation, which slow down
computations considerably; see [19] for more details.
Another aspect, especially in a high-dimensional setup, is numerical precision. In
going from the low- to the high-dimensional case we faced several problems dur-
ing our computations. For example, the approach (R, Nd , AD) shows difficulties in
testing the H0 copula of Ali-Mikhail-Haq for d = 20. Even after applying Maple
(with Digits set to 15; default is 10), the goodness-of-fit tests indicated numeri-
cal problems. The numerical issues appearing in the testing approaches (K C , AD)
and (Td , Nd , AD) when evaluating the Kendall distribution function were already
mentioned earlier, e.g., in Sect. 4.1. In principle, one could be tempted to choose a (much) higher precision than standard double in order to obtain more reliable
testing results. However, note that this significantly increases runtime. Under such a
setup, applying a bootstrap procedure would not be possible anymore. In high dimen-
sions, only the approaches (Td−1 , Nd−1 , AD) and (Td−1 , Nd−1 , χ 2 ) can be applied
without facing computational difficulties according to precision and runtime.
Concerning the case d = 100, we checked whether the error probability of the first kind is maintained at the 5 % level. As results of the procedure described at the end of Sect. 4.2, we obtained 4.6, 4.2, 5.0, 5.5, and 4.9 % for the families of
Ali-Mikhail-Haq, Clayton, Frank, Gumbel, and Joe, respectively.
A plot often provides more information than a single p-value, e.g., it can be used
to determine where deviations from uniformity are located; see [16] who advocate
graphical goodness-of-fit tests in higher dimensions. We now briefly apply the transformation for graphical goodness-of-fit testing; see Figs. 1, 2 and 3.
Fig. 1 Data from a Gaussian (left) and t4 (right) copula with parameter chosen such that Kendall’s
tau equals 0.5, transformed with a Gumbel copula with parameter such that Kendall’s tau equals
0.5. The deviations from uniformity are small but visible, especially in the corners of the different
panels
Fig. 2 Data from a Clayton (left) and Gumbel (right) copula with parameter chosen such that
Kendall’s tau equals 0.5, transformed with a Gumbel copula with parameter such that Kendall’s
tau equals 0.5. The deviation from uniformity for the Clayton data is clearly visible. Since the
Gumbel data is transformed with the correct family and parameter, the resulting variates are indeed
uniformly distributed in the unit hypercube
Fig. 3 Data from a Gumbel copula with parameter chosen such that Kendall’s tau equals 0.5,
transformed with a Gumbel copula with parameter such that Kendall’s tau equals 0.2 (left) and 0.8
(right), respectively. Deviations from uniformity are easily visible
7 Conclusion
Goodness-of-fit tests for Archimedean copulas, also suited to high dimensions, were
presented. The proposed tests are based on a transformation T whose inverse is
known for generating random variates. The tests can, therefore, be viewed as analogs
to tests based on Rosenblatt’s transformation, whose inverse is also used for sampling
(known as the conditional distribution method). The suggested goodness-of-fit tests
proceed in two steps. In the first step, the first d − 1 components of T are applied.
They provide a fast and simple transformation from d to d − 1 dimensions. This
complements known goodness-of-fit tests using only the dth component of T , the
Kendall distribution function, but which require the knowledge of the generator
derivatives. In a second step, the d − 1 components are mapped to one-dimensional
quantities, which simplifies testing. This second step is common to many goodness-
of-fit tests and hence any such test can be applied.
The power of the proposed testing approach was compared to other known
goodness-of-fit tests in a large-scale simulation study. In this study, goodness-of-
fit tests in comparably high dimensions were investigated. The computational effort
(precision, runtime) involved in applying commonly known testing procedures turned
out to be tremendous. The results obtained from these tests in higher dimensions have
to be handled with care: Numerical issues for the methods for which not all repetitions
could be run without problems might have introduced a bias. Applying commonly known goodness-of-fit tests in higher dimensions requires (much) more work in the future, especially on the numerical side. Computational tools which systematically check for numerical inaccuracies and which are implemented following the paradigm of defensive programming might provide a solution here; see [17] for a first work in
this direction.
In contrast, our proposed approach is easily applied in any dimension and its evaluation requires only modest numerical precision. Due to the short runtimes, it
could also be investigated with a bootstrap procedure, showing good performance
in high dimensions. Furthermore, it easily extends to the multiparameter case. To
reduce the effect of non-robustness with respect to the permutation of the arguments,
one could randomize the data dimensions as is done for Rosenblatt’s transformation,
see [4].
Finally, a graphical goodness-of-fit test is outlined. This is a rather promising
field of research for high-dimensional data, since, especially in high dimensions,
none of the existing models fits perfectly, and so a graphical assessment of the parts
(or dimensions) of the model which fit well and those which do not is in general
preferable to a single p-value.
Acknowledgments The authors would like to thank Christian Genest (McGill University) for
valuable feedback on this paper. The authors would also like to thank Christian Mosch and the
Communication and Information Center of Ulm University for providing computational power via
the bwGRiD Cluster Ulm and assistance in using it.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Anderson, T.W., Darling, D.A.: Asymptotic theory of certain goodness-of-fit criteria based on
stochastic processes. Ann. Math. Stat. 23(2), 193–212 (1952)
2. Anderson, T.W., Darling, D.A.: A test of goodness of fit. J. Am. Stat. Assoc. 49, 765–769
(1954)
3. Barbe, P., Genest, C., Ghoudi, K., Rémillard, B.: On Kendall’s process. J. Multivar. Anal. 58,
197–229 (1996)
4. Berg, D.: Copula goodness-of-fit testing: an overview and power comparison. Eur. J. Financ.
15(7–8), 675–701 (2009). http://www.informaworld.com/10.1080/13518470802697428
5. Berg, D., Bakken, H.: A copula goodness-of-fit approach based on the conditional probability
integral transformation (2007). http://www.danielberg.no/publications/Btest.pdf
6. Breymann, W., Dias, A., Embrechts, P.: Dependence structures for multivariate high-frequency
data in finance. Quant. Financ. 3, 1–14 (2003)
7. Devroye, L.: Non-Uniform Random Variate Generation. Springer, Heidelberg (1986)
8. Dobrić, J., Schmid, F.: A goodness of fit test for copulas based on Rosenblatt’s transformation.
Comput. Stat. Data Anal. 51, 4633–4642 (2007)
9. Embrechts, P., Hofert, M.: Comments on: inference in multivariate archimedean copula models.
TEST 20(2), 263–270 (2011). doi:http://dx.doi.org/10.1007/s11749-011-0252-4
10. Embrechts, P., Lindskog, F., McNeil, A.J.: Modelling dependence with copulas and applications
to risk management. In: Rachev, S. (ed.) Handbook of Heavy Tailed Distributions in Finance,
pp. 329–384. Elsevier, Amsterdam (2003)
11. Fermanian, J.D.: Goodness of fit tests for copulas. J. Multivar. Anal. 95(1), 119–152 (2005)
12. Genest, C., Rivest, L.P.: Statistical inference procedures for bivariate Archimedean copulas. J.
Am. Stat. Assoc. 88(423), 1034–1043 (1993)
13. Genest, C., Quessy, J.F., Rémillard, B.: Goodness-of-fit procedures for copula models based
on the probability integral transformation. Scand. J. Stat. 33, 337–366 (2006)
14. Genest, C., Rémillard, B., Beaudoin, D.: Goodness-of-fit tests for copulas: a review and a power
study. Insur. Math. Econ. 44, 199–213 (2009)
15. Hofert, M.: Efficiently sampling nested Archimedean copulas. Comput. Stat. Data Anal. 55,
57–70 (2011). doi: http://dx.doi.org/10.1016/j.csda.2010.04.025
16. Hofert, M., Mächler, M.: A graphical goodness-of-fit test for dependence models in higher
dimensions. J. Comput. Graph. Stat. (2013). doi: http://dx.doi.org/10.1080/10618600.2013.
812518
17. Hofert, M., Mächler, M.: Parallel and other simulations in R made easy: An end-to-end study
(2014)
18. Hofert, M., Mächler, M., McNeil, A.J.: Likelihood inference for Archimedean copulas in high
dimensions under known margins. J. Multivar. Anal. 110, 133–150 (2012). doi: http://dx.doi.
org/10.1016/j.jmva.2012.02.019
19. Hofert, M., Mächler, M., McNeil, A.J.: Archimedean copulas in high dimensions: estimators
and numerical challenges motivated by financial applications. J. de la Société Française de
Statistique 154(1), 25–63 (2013)
20. Malov, S.V.: On finite-dimensional Archimedean copulas. In: Balakrishnan, N., Ibragimov, I.,
Nevzorov, V. (eds.) Asymptotic Methods in Probability and Statistics with Applications, pp.
19–35. Birkhäuser, Boston (2001)
21. Marsaglia, G., Marsaglia, J.C.W.: Evaluating the Anderson Darling distribution. J. Stat. Softw.
9(2), 1–5 (2004)
22. McNeil, A.J., Nešlehová, J.: Multivariate Archimedean copulas, d-monotone functions and
l1 -norm symmetric distributions. Ann. Stat. 37(5b), 3059–3097 (2009)
23. McNeil, A.J., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques,
Tools. Princeton University, Princeton (2005)
24. Rao, C.R.: Linear Statistical Inference and its Applications. Wiley-Interscience, Hoboken
(2001)
25. Rosenblatt, M.: Remarks on a multivariate transformation. Ann. Math. Stat. 23(3), 470–472
(1952)
26. Savu, C., Trede, M.: Goodness-of-fit tests for parametric families of Archimedean copulas.
Quant. Financ. 8(2), 109–116 (2008)
27. Schmitz, V.: Copulas and stochastic processes. Ph.D. thesis, Rheinisch-Westfälische Technis-
che Hochschule Aachen
28. Wagner, R.: Mersenne twister random number generator (2003). http://www-personal.umich.
edu/wagnerr/MersenneTwister.html
29. Wu, F., Valdez, E.A., Sherris, M.: Simulating exchangeable multivariate Archimedean copulas
and its applications. Commun. Stat. Simul. Comput. 36(5), 1019–1034 (2007)
Duality in Risk Aggregation
1 Introduction
random vector that takes values in Rn and whose distribution is only partially known.
For example, one may only have information about the marginals of X and possibly
partial information about some of the moments.
To solve such problems, duality is often exploited, as the dual may be easier to
approach numerically or analytically [2–5, 14]. Being able to formulate a dual is also
important in cases where the primal is approachable algorithmically, as solving the
primal and dual problems jointly provides an approximation guarantee throughout
the run of a solver: if the duality gap (the difference between the primal and dual
objective values) falls below a chosen threshold relative to the primal objective, the
algorithm can be stopped with a guarantee of approximating the optimum to a fixed
precision that depends on the chosen threshold. This is a well-known technique in
convex optimization, see, e.g., [1].
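As a schematic illustration of this stopping rule, consider the following sketch; primal_step and dual_step are hypothetical callables standing for one iteration of whatever algorithm produces feasible primal and dual objective values of a maximization problem.

```python
def solve_with_gap_stopping(primal_step, dual_step, rel_tol=1e-6, max_iter=10_000):
    # Stop once the duality gap is small relative to the primal objective; by weak
    # duality the dual value is an upper bound, so the returned primal value is
    # guaranteed to be within the chosen relative tolerance of the optimum.
    p = d = None
    for _ in range(max_iter):
        p = primal_step()   # current feasible primal value (lower bound)
        d = dual_step()     # current feasible dual value (upper bound)
        if d - p <= rel_tol * max(abs(p), 1.0):
            break
    return p, d
```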
Although for some special cases of the marginal problem analytic solutions and
powerful numerical heuristics exist [6, 12, 13, 18, 19], these techniques do not apply
when additional constraints are imposed to force the probability measures over which
we maximize the risk to conform with empirical observations: In a typical case, the
bulk of the empirical data may be contained in a region D that can be approximated
by an ellipsoid or the union of several (disjoint or overlapping) polyhedra. For a
probability measure μ to be considered a reasonable explanation of the true distribu-
tion of (multidimensional) losses, one would require the probability mass contained
in D to lie in an empirically estimated confidence region, that is, ℓ ≤ μ(D) ≤ u for some estimated bounds ℓ < u. In such a situation, the derivation of robust risk
aggregation bounds via dual problems remains a powerful and interesting approach.
In this chapter, we formulate a general optimization problem, which can be seen as
a doubly infinite linear programming problem, and we show that the associated dual
generalizes several well known special cases. We then apply this duality framework
to a new class of risk management models we propose in Sect. 4.
Let (Φ, F), (Γ, G) and (Σ, S) be complete measure spaces, and let A : Γ ×Φ → R,
a : Γ → R, B : Σ ×Φ → R, b : Σ → R, and c : Φ → R be bounded measurable
functions on these spaces and the corresponding product spaces. Let MF, MG, and MS be the sets of signed measures with finite variation on (Φ, F), (Γ, G), and (Σ, S), respectively. We now consider the following pair of optimization problems over MF
and MG × MS, respectively,
(P)  sup_{F ∈ MF}  ∫_Φ c(x) dF(x)
     s.t.  ∫_Φ A(y, x) dF(x) ≤ a(y),  (y ∈ Γ),
           ∫_Φ B(z, x) dF(x) = b(z),  (z ∈ Σ),
           F ≥ 0,
and
(D)  inf_{(G,S) ∈ MG×MS}  ∫_Γ a(y) dG(y) + ∫_Σ b(z) dS(z),
     s.t.  ∫_Γ A(y, x) dG(y) + ∫_Σ B(z, x) dS(z) ≥ c(x),  (x ∈ Φ),
           G ≥ 0.
We claim that the infinite-programming problems (P) and (D) are duals of each other.
Theorem 1 (Weak Duality) For every (P)-feasible measure F and every (D)-
feasible pair (G , S ) we have
∫_Φ c(x) dF(x) ≤ ∫_Γ a(y) dG(y) + ∫_Σ b(z) dS(z).
In various special cases, such as those discussed in Sect. 3, strong duality is known
to hold subject to regularity assumptions, that is, the optimal values of (P) and (D)
coincide. Another special case under which strong duality applies is when the mea-
sures F , G , and S have densities in appropriate Hilbert spaces, see the forthcoming
DPhil thesis of the second author [17].
We remark that the quantifiers in the constraints can be weakened if the set of
allowable measures is restricted. For example, if G is restricted to lie in a set of
measures that are absolutely continuous with respect to a fixed measure G0 ∈ MG,
then the quantifier (y ∈ Γ ) can be weakened to (G0 -almost all y ∈ Γ ).
3 Classical Examples
Our general duality relation of Theorem 1 generalizes many classical duality results,
of which we now point out a few examples. Let p(x1 , . . . , xk ) be a function of k
arguments. Then we write
1_{x: p(x)≥0} := 1_{{y: p(y)≥0}}(x) = { 1 if p(x) ≥ 0, 0 otherwise }.
In other words, we write the argument x of the indicator function directly into the set
{y : p(y) ≥ 0} that defines the function, rather than using a separate set of variables
y. This abuse of notation will make it easier to identify which inequality is satisfied
by the arguments where the function 1{y: p(y)≥0} (x) takes the value 1.
We start with the Moment Problem studied by Bertsimas and Popescu [2], who
considered generalized Chebychev inequalities of the form
and c(x) = 1{x: r (x)≥0} , where we made use of the abuse of notation discussed above,
problem (P’) becomes a special case of the primal problem considered in Sect. 2,
(P)  sup_F  ∫_{R^n} 1_{x: r(x)≥0} dF(x)
     s.t.  ∫_{R^n} x_1^{k_1} · · · x_n^{k_n} dF(x) = b_k,  (k ∈ J),
           ∫_{R^n} 1 dF(x) = 1,
           F ≥ 0.
Our dual
(D)  inf_{(z,z_0) ∈ R^{|J|+1}}  Σ_{k∈J} z_k b_k + z_0
     s.t.  Σ_{k∈J} z_k x_1^{k_1} · · · x_n^{k_n} + z_0 ≥ 1_{x: r(x)≥0},  (x ∈ R^n)
is easily seen to be identical with the dual (D') identified by Bertsimas and Popescu,
(D')  inf_{(z,z_0) ∈ R^{|J|+1}}  Σ_{k∈J} z_k b_k + z_0
      s.t.  ∀ x ∈ R^n, r(x) ≥ 0  ⇒  Σ_{k∈J} z_k x_1^{k_1} · · · x_n^{k_n} + z_0 − 1 ≥ 0,
            ∀ x ∈ R^n,  Σ_{k∈J} z_k x_1^{k_1} · · · x_n^{k_n} + z_0 ≥ 0.
Note that since Γ, Σ are finite, the constraints of (D’) are polynomial copositivity
constraints. The numerical solution of semi-infinite programming problems of this
type can be approached via a nested hierarchy of semidefinite programming relax-
ations that yield better and better approximations to (D’). The highest level problem
within this hierarchy is guaranteed to solve (D’) exactly, although the corresponding
SDP is of exponential size in the dimension n, in the degree of the polynomial r, and in max_{k∈J} (Σ_i k_i). For further details see [2, 7, 10], and Sect. 4.6 below.
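To make the moment problem concrete, the sketch below solves a discretized instance with scipy; the grid, moments, and threshold are illustrative choices and not taken from the text. It maximizes P(X ≥ t) over measures on a grid subject to fixed mass, mean, and second moment, and compares the value with the classical one-sided Chebyshev (Cantelli) bound σ²/(σ² + (t − μ)²), which this instance should approximately reproduce.

```python
import numpy as np
from scipy.optimize import linprog

mu, sigma, t = 0.0, 1.0, 2.0
x = np.linspace(-10.0, 10.0, 2001)               # support grid (illustrative)
c = -(x >= t).astype(float)                      # linprog minimizes, so negate
A_eq = np.vstack([np.ones_like(x), x, x**2])     # total mass, first and second moment
b_eq = np.array([1.0, mu, sigma**2 + mu**2])
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print("discretized primal bound:", -res.fun)
print("Cantelli bound          :", sigma**2 / (sigma**2 + (t - mu)**2))
```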
Next, we consider the Marginal Problem studied by Rüschendorf [15, 16] and
Ramachandran and Rüschendorf [14],
(P’) sup h(x) d F (x),
F ∈M F1 ,...,Fn
Rn
where M F1 ,...,Fn is the set of probability measures on Rn whose marginals have the
cdfs Fi (i = 1, . . . , n). Problem (P’) can easily be seen as a special case of the
framework of Sect. 2 by setting c(x) = h(x), Φ = Rn , Γ = ∅, Σ = Nn × R,
B(i, z, x) = 1{y: yi ≤z} (using the abuse of notation discussed earlier), and bi (z) =
Fi (z) (i ∈ Nn , z ∈ R),
(P)  sup_F  ∫_{R^n} h(x) dF(x)
     s.t.  ∫_{R^n} 1_{x_i ≤ z} dF(x) = F_i(z),  (z ∈ R, i ∈ N_n),
           F ≥ 0.
The signed measures S_i being of finite variation, the functions S_i(z) = S_i((−∞, z]) and the limits s_i = lim_{z→∞} S_i(z) = S_i((−∞, +∞)) are well defined and finite. Furthermore, using lim_{z→−∞} F_i(z) = 0 and lim_{z→+∞} F_i(z) = 1, we have
Σ_{i=1}^{n} ∫_R F_i(z) dS_i(z) = Σ_{i=1}^{n} ( F_i(z)S_i(z)|_{−∞}^{+∞} − ∫_R S_i(z) dF_i(z) )
                               = Σ_{i=1}^{n} s_i − Σ_{i=1}^{n} ∫_R S_i(z) dF_i(z)
                               = Σ_{i=1}^{n} ∫_R (s_i − S_i(z)) dF_i(z),
and likewise,
Σ_{i=1}^{n} ∫_R 1_{x_i ≤ z} dS_i(z) = Σ_{i=1}^{n} ∫_{x_i}^{+∞} 1 dS_i(z) = Σ_{i=1}^{n} (s_i − S_i(x_i)).
s.t.  Σ_{i=0}^{n} h_i(x_i) ≥ h(x),  (x ∈ R^n).
This is the dual identified by Ramachandran and Rüschendorf [14]. Due to the general
form of the functions h i , the infinite programming problem (D’) is not directly
usable in numerical computations. However, for specific h(x), (D’)-feasible functions
(h 1 , . . . , h n ) can sometimes be constructed explicitly, yielding an upper bound on
the optimal objective function value of (P’) by virtue of Theorem 1. Embrechts and
Puccetti [3–5] used this approach to derive quantile bounds on X 1 + · · · + X n , where
X is a random vector with known marginals but unknown joint distribution. In this
case, the relevant primal objective function is defined by h(x) = 1{x: eT x≥t} , where
t ∈ R is a fixed level. More generally, h(x) = 1{x: Ψ (x)≥t} can be chosen, where Ψ is
a relevant risk aggregation function, or h(x) can model any risk measure of choice.
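A discretized toy version of this quantile-bound problem can also be solved directly as a linear program; in the sketch below (grid, marginals, and threshold are illustrative choices), the joint probability mass function is the decision variable and the marginal constraints play the role of the constraints in (P).

```python
import numpy as np
from scipy.optimize import linprog

m = 40
grid = np.linspace(0.0, 1.0, m)                  # support of each X_i
p1 = np.full(m, 1.0 / m)                         # discretized uniform marginals
p2 = np.full(m, 1.0 / m)
t = 1.5

# Objective: maximize P(X1 + X2 >= t), hence minimize its negative.
obj = -(grid[:, None] + grid[None, :] >= t).astype(float).ravel()

# Equality constraints: row sums equal p1, column sums equal p2.
A_eq = np.zeros((2 * m, m * m))
for i in range(m):
    A_eq[i, i * m:(i + 1) * m] = 1.0             # sum_j F_ij = p1[i]
    A_eq[m + i, i::m] = 1.0                      # sum_i F_ij = p2[i]
b_eq = np.concatenate([p1, p2])

res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print("discretized upper bound on P(X1 + X2 >= t):", -res.fun)
```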
Our next example is the Marginal Problem with Copula Bounds, an extension
to the marginal problem mentioned in [3]. The copula defined by the probability
measure F with marginals F_i is the function C_F(u_1, . . . , u_n) = F(F_1^{−1}(u_1), . . . , F_n^{−1}(u_n)).
A copula is any function C : [0, 1]n → [0, 1] that satisfies C = CF for some
probability measure F on Rn . Equivalently, a copula is the multivariate cdf of any
probability measure on the unit cube [0, 1]n with uniform marginals. In quantitative
risk management, using the model
sup_{F ∈ M_{F_1,...,F_n}}  ∫_{R^n} h(x) dF(x)
to bound the worst-case risk for a random vector X with marginal distributions F_i can be overly conservative, as no dependence structure between the coordinates of X is assumed at all. Since the structure that determines this dependence is the copula C_F, where F is the multivariate distribution of X, Embrechts and Puccetti [3] suggest problems of the form
(P’) sup h(x) d μ(x),
μ∈M F1 ,...,Fn
Rn
s.t. Clo ≤ CF ≤ Cup ,
Once again, (P’) is a special case of the general framework studied in Sect. 2, as it is
equivalent to write
(P)  sup_F  ∫_{R^n} h(x) dF(x)
     s.t.  ∫_{R^n} 1_{x ≤ (F_1^{−1}(u_1),...,F_n^{−1}(u_n))}(u, x) dF(x) ≤ C_up(u),  (u ∈ [0, 1]^n),
           −∫_{R^n} 1_{x ≤ (F_1^{−1}(u_1),...,F_n^{−1}(u_n))}(u, x) dF(x) ≤ −C_lo(u),  (u ∈ [0, 1]^n),
           ∫_{R^n} 1_{x_i ≤ z}(z, x) dF(x) = F_i(z),  (i ∈ N_n, z ∈ R),
           F ≥ 0.
s.t.  G_up(B(x)) − G_lo(B(x)) + Σ_{i=1}^{n} (s_i − S_i(x_i)) ≥ h(x),  (x ∈ R^n),
      G_up, G_lo ≥ 0,
where B(x) = {u ∈ [0, 1]n : u ≥ (F1 (x1 ), . . . , Fn (xn ))}. To the best of our
knowledge, this dual has not been identified before.
Due to the high dimensionality of the space of variables and constraints both in
the primal and dual, the marginal problem with copula bounds is difficult to solve
numerically, even for very coarse discrete approximations.
where φ(x) is a suitable test function. A suitable way of estimating upper and lower
bounds on such integrals from sample data xi (i ∈ Nk ) is to estimate confidence
bounds via bootstrapping.
4.1 Motivation
To motivate the use of constraints in the form of bounds on integrals (1), we offer the
following explanations: First of all, discretized marginal constraints are of this form
with piecewise constant test functions, as the requirement that F_i(ξ_k) − F_i(ξ_{k−1}) = b_k (k = 1, . . . , ℓ) for a fixed set of discretization points ξ_0 < · · · < ξ_ℓ can be expressed as
∫_Φ 1_{ξ_{k−1} ≤ x_i ≤ ξ_k} dF(x) = b_k,  (k = 1, . . . , ℓ).   (2)
It is, furthermore, quite natural to relax each of these equality constraints to two
inequality constraints
b^ℓ_{k,i} ≤ ∫_Φ 1_{ξ_{k−1} ≤ x_i ≤ ξ_k} dF(x) ≤ b^u_{k,i}
where the weights w_j > 0 satisfy Σ_j w_j = 1 and express the relative importance of each constituent constraint. Nonnegative test functions thus have a natural interpretation as importance densities in sums-of-constraints relaxations. This allows one to put higher focus on getting the probability mass right in regions where it particularly matters (e.g., values of X that account for the bulk of the profits of a financial institution), while maximizing the risk in the tails without having to resort to too fine a discretization.
While this suggests using a piecewise approximation of a prior estimate of the density of X as a test function, the results are robust under mis-specification of this
prior, for as long as φ(x) is nonconstant, constraints that involve the integral (1)
tend to force the probability weight of X into the regions where the sample points
are denser. To illustrate this, consider a univariate random variable with density
f(x) = (2/3)(1 + x) on x ∈ [0, 1] and test function φ(x) = 1 + ax with a ∈ [−1, 1]. Then ∫_0^1 φ(x) f(x) dx = 1 + 5a/9. The most dispersed probability measure on [0, 1] that satisfies
∫_0^1 φ(x) dF(x) = 1 + 5a/9   (3)
∫_0^1 φ(x) dF(x) = 1 + 3a/4
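The value ∫_0^1 φ(x) f(x) dx = 1 + 5a/9 used above is easy to verify symbolically, for instance:

```python
import sympy as sp

a, x = sp.symbols('a x')
f = sp.Rational(2, 3) * (1 + x)                      # the density above
phi = 1 + a * x                                      # the test function above
print(sp.expand(sp.integrate(phi * f, (x, 0, 1))))   # 5*a/9 + 1
```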
is the polyhedral cone with recession directions r_m^i ∈ R^n. Each polyhedron also has a dual description in terms of linear inequalities,
Ξ_i = ∩_{j=1}^{k_i} { x ∈ R^n : ⟨f_j^i, x⟩ ≥ ℓ_j^i },
for some vectors f_j^i ∈ R^n and bounds ℓ_j^i ∈ R. The main case of interest is where Ξ_i
is either a finite or infinite box in Rn with faces parallel to the coordinate axes, or an
intersection of such a box with a linear half-space, in which case it is easy to pass
between the primal and dual descriptions. Note however that the dual description is preferable, as the description of a box in R^n requires only 2n linear inequalities, while the primal description requires 2^n extreme vertices.
Let us now consider the problem
(P)  sup_{F ∈ MF}  ∫_Φ h(x) dF(x)
     s.t.  ∫_Φ φ_s(x) dF(x) ≤ a_s,  (s = 1, . . . , M),
           ∫_Φ ψ_t(x) dF(x) = b_t,  (t = 1, . . . , N),
           ∫_Φ 1 dF(x) = 1,
           F ≥ 0,
where the test functions ψ_t are piecewise linear on the partition Φ = ∪_{i=1}^{k} Ξ_i, and
where −h(x) and the test functions φs are piecewise linear on the infinite polyhedra
of the partition, and either jointly linear, concave, or convex on the finite polyhedra
(i.e., polytopes) of the partition. The dual of (P) is
(D)  inf_{(y,z) ∈ R^{M+N+1}}  Σ_{s=1}^{M} a_s y_s + Σ_{t=1}^{N} b_t z_t + z_0,
     s.t.  Σ_{s=1}^{M} y_s φ_s(x) + Σ_{t=1}^{N} z_t ψ_t(x) + z_0 1_Φ(x) − h(x) ≥ 0,  (x ∈ Φ),   (4)
           y ≥ 0.
Σ_{s=1}^{M} y_s φ_s(x) + Σ_{t=1}^{N} z_t ψ_t(x) + z_0 1_Φ(x) − h(x) ≥ 0,  (x ∈ Ξ_i),  (i = 1, . . . , k).
Next we will see how these copositivity constraints can be handled numerically, often
by relaxing all but finitely many constraints. Nesterov’s first-order method can be
adapted to solve the resulting problems, see [8, 9, 17].
In what follows, we will use the notation
ϕ_{y,z}(x) = Σ_{s=1}^{M} y_s φ_s(x) + Σ_{t=1}^{N} z_t ψ_t(x) + z_0 − h(x).
The first case we discuss is when φ_s|_{Ξ_i} and h|_{Ξ_i} are jointly linear. Since we furthermore assumed that the functions ψ_t|_{Ξ_i} are linear, there exist vectors v_s^i ∈ R^n, w_t^i ∈ R^n, g^i ∈ R^n and constants c_s^i ∈ R, d_t^i ∈ R and e^i ∈ R such that
Σ_{s=1}^{M} y_s φ_s(x) + Σ_{t=1}^{N} z_t ψ_t(x) + z_0 1_Φ(x) − h(x) ≥ 0,  (x ∈ Ξ_i)
is equivalent to the implication
⟨f_j^i, x⟩ ≥ ℓ_j^i, (j = 1, . . . , k_i)   ⟹   ⟨ Σ_{s=1}^{M} y_s v_s^i + Σ_{t=1}^{N} z_t w_t^i − g^i, x ⟩ ≥ e^i − Σ_{s=1}^{M} y_s c_s^i − Σ_{t=1}^{N} z_t d_t^i − z_0,
which in turn holds if and only if there exist multipliers λ_j^i such that
Σ_{s=1}^{M} y_s v_s^i + Σ_{t=1}^{N} z_t w_t^i − g^i = Σ_{j=1}^{k_i} λ_j^i f_j^i,   (5)
e^i − Σ_{s=1}^{M} y_s c_s^i − Σ_{t=1}^{N} z_t d_t^i − z_0 ≤ Σ_{j=1}^{k_i} λ_j^i ℓ_j^i,   (6)
λ_j^i ≥ 0,  (j = 1, . . . , k_i),   (7)
Then choose φ_{ι,j}(x) = 1_{S_{ι,j}}(x), the indicator function of the slab S_{ι,j}. We remark that
this approach corresponds to discretizing the constraints of the Marginal Problem
described in Sect. 3, but not to discretizing the probability measures over which we
maximize the aggregated risk.
While the number of test functions is nm and thus linear in the problem dimension, the number of polyhedra to consider is exponentially large, as all intersections of the form
Ξ_j = ∩_{ι=1}^{n} S_{ι,j_ι}
for the m^n possible choices of j ∈ N_m^n have to be treated separately. In addition, in VaR applications h(x) is taken as the indicator function of an affine half-space {x : Σ_ι x_ι ≥ τ} for a suitably chosen threshold τ, and for CVaR applications h(x) is chosen as the piecewise linear function h(x) = max(0, Σ_ι x_ι − τ). Thus, polyhedra Ξ_j that meet the affine hyperplane {x : Σ_ι x_ι = τ} are further sliced into two separate polyhedra. A straightforward application of the above described LP framework would thus lead to an LP with exponentially many constraints and variables. Note however that the
constraints (5)–(7) now read
−g^i = Σ_{j=1}^{k_i} λ_j^i f_j^i,   (8)
e^i − Σ_{s=1}^{M} y_s c_s^i − z_0 ≤ Σ_{j=1}^{k_i} λ_j^i ℓ_j^i,   (9)
λ_j^i ≥ 0,  (j = 1, . . . , k_i),   (10)
v_s^i = 0 and no test functions ψ_t(x) were used, with g^i = [1 . . . 1]^T when Ξ_i ⊆ {x : Σ_ι x_ι ≥ τ} and g^i = 0 otherwise. That is, the vector that appears on the left-hand side of Constraint (8) is fixed by the polyhedron Ξ_i alone and does not depend on the decision variables y, z_0. Since z_0 is to be chosen as small as possible in an optimal solution of (D), the constraint (9) has to be made as slack as possible. Therefore, the optimal values of λ_j^i are also fixed by the polyhedron Ξ_i alone and are identifiable by solving the small-scale LP
by solving the small-scale LP
ki
(λij )∗ = arg max λij ij
λ
j=1
ki
s.t. − g i = λij f ji ,
j=1
λij ≥ 0, ( j = 1, . . . , ki ).
In other words, when the polyhedron Ξi is considered for the first time, the variables
(λij )∗ can be determined once and for all, after which the constraints (8)–(10) can be
replaced by
e^i − Σ_s y_s c_s^i − z_0 ≤ C_i,
where C_i = Σ_{j=1}^{k_i} (λ_j^i)* ℓ_j^i, and where the sum on the left-hand side only extends
over the n indices s that correspond to test functions that are nonzero on Ξi . Thus,
only the nm + 1 decision variables (y, z 0 ) are needed to solve (D). Furthermore, the
exponentially many constraints correspond to an extremely sparse constraint matrix,
making the dual of (D) an ideal candidate to apply the simplex algorithm with delayed
column generation. A similar approach is possible for the situation where φs is of
the form
for all s = (ι, j). The advantage of using test functions of this form is that fewer
breakpoints ξι, j are needed to constrain the distribution appropriately.
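The small-scale LP for (λ_j^i)* is readily solved with an off-the-shelf LP solver. The sketch below uses scipy on a toy box-shaped polyhedron in R² with g^i = (1, 1)^T, in the spirit of the discussion above; all numbers are illustrative and not taken from the text.

```python
import numpy as np
from scipy.optimize import linprog

# Box [a1, u1] x [a2, u2] described by <f_j, x> >= l_j (columns of F are the f_j).
a1, u1, a2, u2 = 0.0, 2.0, 0.0, 3.0
F = np.array([[1.0, -1.0, 0.0,  0.0],
              [0.0,  0.0, 1.0, -1.0]])
ell = np.array([a1, -u1, a2, -u2])          # corresponding bounds l_j
g = np.array([1.0, 1.0])

# maximize ell^T lambda  s.t.  F lambda = -g, lambda >= 0
res = linprog(-ell, A_eq=F, b_eq=-g, bounds=(0, None), method="highs")
lam_star = res.x
C_i = ell @ lam_star
print("lambda*:", lam_star, " C_i =", C_i)  # for this box, C_i = -(u1 + u2)
```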
When φ_s|_{Ξ_i} and −h|_{Ξ_i} are jointly convex, then ϕ_{y,z}(x) is convex. The copositivity constraint
Σ_{s=1}^{M} y_s φ_s(x) + Σ_{t=1}^{N} z_t ψ_t(x) + z_0 1_Φ(x) − h(x) ≥ 0,  (x ∈ Ξ_i)
can then be reformulated as
ϕ_{y,z}(x) + Σ_{j=1}^{k_i} λ_j^i (ℓ_j^i − ⟨f_j^i, x⟩) ≥ 0,  (x ∈ R^n),   (11)
λ_j^i ≥ 0,  (j = 1, . . . , k_i),
where λ_j^i are once again auxiliary decision variables. While (11) does not reduce to
finitely many constraints, the validity of this condition can be checked numerically
by globally minimizing the convex function ϕ_{y,z}(x) + Σ_{j=1}^{k_i} λ_j^i (ℓ_j^i − ⟨f_j^i, x⟩). The
constraint (11) can then be enforced explicitly if a line-search method is used to solve
the dual (D).
When φ_s|_{Ξ_i} and −h|_{Ξ_i} are jointly concave but not linear, then ϕ_{y,z}(x) is concave and Ξ_i = conv(q_1^i, . . . , q_{n_i}^i) is a polytope. The copositivity constraint
Σ_{s=1}^{M} y_s φ_s(x) + Σ_{t=1}^{N} z_t ψ_t(x) + z_0 1_Φ(x) − h(x) ≥ 0,  (x ∈ Ξ_i)   (12)
then holds if and only if
ϕ_{y,z}(q_j^i) ≥ 0,  (j = 1, . . . , n_i).
Thus, (12) can be replaced by n_i linear inequality constraints on the decision variables y_s and z_t.
Another case that can be treated via finitely many constraints is when φ_s|_{Ξ_i}, ψ_t|_{Ξ_i}, and h|_{Ξ_i} are jointly polynomial. The approach of Lasserre [7] and Parrilo [10] can
be applied to turn the copositivity constraint
into finitely many linear matrix inequalities. However, this approach is generally
limited to low-dimensional applications.
5 Conclusions
Our analysis shows that a wide range of duality relations in use in quantitative
risk management can be understood from the single perspective of a generalized
duality relation discussed in Sect. 2. An interesting class of special cases is provided
by formulating a finite number of constraints in the form of bounds on integrals.
The duals of such models are semi-infinite optimization problems that can often be
reformulated as finite optimization problems, by making use of standard results on
copositivity.
Acknowledgments Part of this research was conducted while the first author was visiting the FIM
at ETH Zurich during sabbatical leave from Oxford. He thanks for the support and the wonderful
research environment he encountered there. The research of the first two authors was supported
through grant EP/H02686X/1 from the Engineering and Physical Sciences Research Council of the
UK.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Borwein, J., Lewis, A.: Convex Analysis and Nonlinear Optimization: Theory and Examples. CMS Books in Mathematics
2. Bertsimas, D., Popescu, I.: Optimal inequalities in probability theory: a convex optimization
approach. SIAM J. Optim. 15(3), 780–804 (2005)
3. Embrechts, P., Puccetti, G.: Bounds for functions of dependent risks. Financ. Stoch. 10, 341–
352 (2006)
4. Embrechts, P., Puccetti, G.: Bounds for functions of multivariate risks. J. Multivar. Analy. 97,
526–547 (2006)
5. Embrechts, P., Puccetti, G.: Aggregating risk capital, with an application to operational risk.
Geneva Risk Insur. Rev. 31, 71–90 (2006)
6. Embrechts, P., Puccetti, G., Rüschendorf, L.: Model uncertainty and VaR aggregation. J. Bank.
Financ. 37(8), 2750–2764 (2013)
7. Lasserre, J.B.: Global optimization with polynomials and the problem of moments. SIAM J.
Optim. 11(3), 796–817 (2001)
8. Nesterov, Y.: A method of solving a convex programming problem with convergence rate
O(1/k 2 ). Soviet Math. Dokl. 27(2), 372–376 (1983)
9. Nesterov, Y.: Smooth minimization of non-smooth functions. Math. Program., Ser. A 103,
127–152 (2005)
10. Parrilo, P.A.: Semidefinite programming relaxations for semialgebraic problems. Math. Pro-
gram., Ser. B 96, 293–320 (2003)
11. Pólik, I., Terlaky, T.: A survey of the S-Lemma. SIAM Rev. 49, 371–418 (2007)
12. Puccetti, G., Rüschendorf, L.: Computation of sharp bounds on the distribution of a function
of dependent risks. J. Comp. Appl. Math. 236(7), 1833–1840 (2012)
13. Puccetti, G., Rüschendorf, L.: Sharp bounds for sums of dependent risks. J. Appl. Probab.
50(1), 42–53 (2013)
14. Ramachandran, D., Rüschendorf, L.: A general duality theorem for marginal problems. Probab.
Theory Relat. Fields 101(3), 311–319 (1995)
15. Rüschendorf, L.: Random variables with maximum sums. Adv. Appl. Probab. 14(3), 623–632
(1982)
16. Rüschendorf, L.: Construction of multivariate distributions with given marginals. Ann. Inst.
Stat. Math. 37(2), 225–233 (1985)
17. Shahverdyan, S.: Optimisation Methods in Risk Management. DPhil thesis, Oxford Mathe-
matical Institute (2014)
18. Wang, R., Peng, L., Yang, J.: Bounds for the sum of dependent risks and worst value-at-risk
with monotone marginal densities. Financ. Stoch. 17(2), 395–417 (2013)
19. Wang, B., Wang, R.: The complete mixability and convex minimization problems with
monotone marginal densities. J. Multivar. Anal. 102(10), 1344–1360 (2011)
Some Consequences of the Markov Kernel
Perspective of Copulas
Abstract The objective of this paper is twofold: After recalling the one-to-one
correspondence between two-dimensional copulas and Markov kernels having the
Lebesgue measure λ on [0, 1] as fixed point, we first give a quick survey over some
consequences of this interrelation. In particular, we sketch how Markov kernels can
be used for the construction of strong metrics that strictly distinguish extreme kinds of
statistical dependence, and show how the translation of various well-known copula-
related concepts to the Markov kernel setting opens the door to some surprising
mathematical aspects of copulas. Secondly, we concentrate on the fact that iterates
of the star product of a copula A with itself are Cesàro convergent to an idempotent
copula  with respect to any of the strong metrics mentioned before and prove that
 must have a very simple form if the Markov operator T A associated with A is
quasi-constrictive in the sense of Lasota.
1 Introduction
In 1996, Olsen et al. (see [23]) proved the existence of an isomorphism between
the family C of two-dimensional copulas (endowed with the so-called star prod-
uct) and the family M of all Markov operators (with the standard composition
as binary operation). Using disintegration (see [29]) allows to express the afore-
mentioned Markov operators in terms of Markov kernels, resulting in a one-to-one
correspondence of C with the family K of all Markov kernels having the Lebesgue
W. Trutschnig (B)
Department for Mathematics, University of Salzburg, Hellbrunnerstrasse 34,
Salzburg 5020, Austria
e-mail: [email protected]
J. Fernández Sánchez
Grupo de Investigación de Análisis Matemático, Universidad de Almería,
La Cañada de San Urbano, Almería, Spain
e-mail: [email protected]
measure λ on [0, 1] as fixed point. Identifying every copula with its Markov kernel
allows to define new metrics D1 , D2 , D∞ which, contrary to the uniform one, strictly
separate independence from complete dependence (full predictability). Additionally,
the ‘translation’ of various copula-related concepts from C to M and K has proved
useful in so far that it allowed both, for alternative simple proofs of already known
properties as well as for new and interesting results. Section 3 of this paper is a quick
incomplete survey over some useful consequences of this translation. In particular,
we mention the fact that for each copula A ∈ C, the iterates of the star product of A with itself are Cesàro convergent to an idempotent copula Â w.r.t. each of the three metrics mentioned before, i.e., we have
lim_{n→∞} D_1(s_n^*(A), Â) = 0,
whereby s_n^*(A) = (1/n) Σ_{i=1}^{n} A^{*i} for every n ∈ N. Section 4 contains some new
unpublished results and proves that the idempotent limit copula  must have a very
simple (ordinal-sum-like) form if the Markov operator T A corresponding to A is
quasi-constrictive in the sense of Lasota ([1, 15, 18]).
holds P-a.s. It is well known that for each pair (X, Y ) of real-valued random variables
a regular conditional distribution K (·, ·) of Y given X exists, that K (·, ·) is unique
P X -a.s. (i.e., unique for P X -almost all x ∈ R) and that K (·, ·) only depends on
P X ⊗Y . Hence, given A ∈ C we will denote (a version of) the regular conditional
distribution of Y given X by K A (·, ·) and refer to K A (·, ·) simply as regular condi-
tional distribution of A or as the Markov kernel of A. Note that for every A ∈ C ,
Some Consequences of the Markov Kernel Perspective of Copulas 395
its conditional regular distribution K A (·, ·), and every Borel set G ∈ B([0, 1]2 ) we
have
∫_{[0,1]} K_A(x, G_x) dλ(x) = μ_A(G),   (2)
∫_{[0,1]} K_A(x, F) dλ(x) = λ(F),   (3)
for every F ∈ B([0, 1]). On the other hand, every Markov kernel K : [0, 1] ×
B([0, 1]) → [0, 1] fulfilling (3) induces a unique element μ ∈ PC ([0, 1]2 ) via
(2). For more details and properties of conditional expectation, regular conditional
distributions, and disintegration see [13, 14].
T will denote the family of all λ-preserving transformations h : [0, 1] → [0, 1]
(see [34]), T p the subset of all bijective h ∈ T . A copula A ∈ C will be called
completely dependent if and only if there exists h ∈ T such that K (x, E) := 1 E (hx)
is a regular conditional distribution of A (see [17, 29] for equivalent definitions and
main properties). For every h ∈ T , the corresponding completely dependent copula
will be denoted by C h , the class of all completely dependent copulas by Cd .
A linear operator T on L^1([0, 1]) := L^1([0, 1], B([0, 1]), λ) is called Markov operator ([3, 23]) if it fulfills the following three properties:
1. T is positive, i.e., T(f) ≥ 0 whenever f ≥ 0
2. T(1_{[0,1]}) = 1_{[0,1]}
3. ∫_{[0,1]} (Tf)(x) dλ(x) = ∫_{[0,1]} f(x) dλ(x)
As mentioned in the introduction M will denote the class of all Markov operators
on L^1([0, 1]). It is straightforward to see that the operator norm of T is one, i.e., ‖T‖ := sup{‖Tf‖_1 : ‖f‖_1 ≤ 1} = 1 holds. According to [23] there is a one-to-
one correspondence between C and M —in fact, the mappings Φ : C → M and
Ψ : M → C , defined by
Φ(A)(f)(x) := (T_A f)(x) := (d/dx) ∫_{[0,1]} A_{,2}(x, t) f(t) dλ(t),
Ψ(T)(x, y) := A_T(x, y) := ∫_{[0,x]} (T 1_{[0,y]})(t) dλ(t),   (4)
for every f ∈ L 1 ([0, 1]) and (x, y) ∈ [0, 1]2 (A,2 denoting the partial derivative of
A w.r.t. y), fulfill Ψ ◦ Φ = idC and Φ ◦ Ψ = idM . Note that in case of f := 1[0,y]
we have (T A 1[0,y] )(x) = A,1 (x, y) λ-a.s. According to [29] the first equality in (4)
can be simplified to
(T_A f)(x) = E(f ◦ Y | X = x) = ∫_{[0,1]} f(y) K_A(x, dy)   λ-a.s.   (5)
It is not difficult to show that the uniform metric d∞ is a metrization of the weak
operator topology on M (see [23]).
In this section, we give a quick survey showing the usefulness of the Markov kernel
perspective of two-dimensional copulas.
D_2²(A, B) := ∫_{[0,1]} ∫_{[0,1]} (K_A(x, [0, y]) − K_B(x, [0, y]))² dλ(x) dλ(y)   (7)
D_∞(A, B) := sup_{y∈[0,1]} ∫_{[0,1]} (K_A(x, [0, y]) − K_B(x, [0, y]))² dλ(x)   (8)
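The kernels K_A(x, [0, y]) are explicit for many copulas, so distances such as D_1(A, Π) can be approximated directly on a grid. The sketch below does this for the comonotone copula M (a completely dependent copula with h = id, so K_M(x, [0, y]) = 1_{x≤y}) and for a Farlie-Gumbel-Morgenstern copula G_θ as in Example 5 below, whose kernel is the partial derivative K_{G_θ}(x, [0, y]) = y + θ y(1 − y)(1 − 2x); grid size and θ are illustrative choices.

```python
import numpy as np

N = 1000
x = (np.arange(N) + 0.5) / N                 # midpoint grid on [0, 1]
X, Y = np.meshgrid(x, x, indexing="ij")

# D_1(M, Pi): kernel of M is 1_{x <= y}, kernel of Pi is y.
D1_M = np.mean(np.abs((X <= Y).astype(float) - Y))           # approx. 1/3

# D_1(G_theta, Pi): kernel difference is theta * y * (1 - y) * (1 - 2x).
theta = 0.5
D1_FGM = np.mean(np.abs(theta * Y * (1 - Y) * (1 - 2 * X)))  # approx. |theta|/12

print(D1_M, D1_FGM)
```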
The following two theorems state the most important properties of the metrics D1 , D2
and D∞ .
Theorem 1 ([29]) Suppose that A, A1 , A2 , . . . are copulas and let T, T1 , T2 , . . .
denote the corresponding Markov operators. Then the following four conditions are
equivalent:
(a) limn→∞ D1 (An , A) = 0
(b) limn→∞ D∞ (An , A) = 0
(c) lim_{n→∞} ‖T_n f − T f‖_1 = 0 for every f ∈ L^1([0, 1])
(d) limn→∞ D2 (An , A) = 0
As a consequence, each of the three metrics D1 , D2 and D∞ is a metrization of the
strong operator topology on M .
Theorem 2 ([29]) The metric space (C , D1 ) is complete and separable. The same
holds for (C , D2 ) and (C , D∞ ). The topology induced on C by D1 is strictly finer
than the one induced by d∞ .
Remark 3 The idea of constructing metrics via conditioning to the first coordinate
can be easily extended to the family C m of all m-dimensional copulas for arbitrary
m ≥ 3. For instance, the multivariate version of D1 on C m can be defined by
$$D_1(A, B) = \int_{[0,1]^{m-1}}\int_{[0,1]} \big|K_A(x, [0, \mathbf{y}]) - K_B(x, [0, \mathbf{y}])\big|\,d\lambda(x)\,d\lambda^{m-1}(\mathbf{y}),$$
whereby $[0, \mathbf{y}] = \times_{i=1}^{m-1}[0, y_i]$ and $K_A$ ($K_B$) denotes the Markov kernel (regular
conditional distribution) of $\mathbf{Y}$ given $X$ for $(X, \mathbf{Y}) \sim A$ ($B$). As shown in [11],
the resulting metric spaces (C m , D1 ), (C m , D2 ), (C m , D∞ ) are again complete and
separable.
This dependence measure τ1 exhibits the seemingly natural properties that (i) exactly
members of the family Cd (describing complete dependence) are assigned maximum
dependence (equal to one) and (ii) Π is the only copula with minimum dependence
(equal to zero). Note that (i) means that τ1 (A) is maximal if and only if A describes
the situation of full predictability, i.e., asset Y is a deterministic function of asset X .
In particular, all shuffles of M have maximum dependence. Dependence measures
based on the metric D2 may be constructed analogously.
Example 5 For the Farlie–Gumbel–Morgenstern family $(G_\theta)_{\theta\in[-1,1]}$ of copulas (see [22]), given by $G_\theta(x, y) = xy + \theta\,xy(1-x)(1-y)$, it is straightforward to show that $\tau_1(G_\theta) = \frac{|\theta|}{4}$ holds for every $\theta \in [-1, 1]$ (for details see [29]).
Example 6 For the Marshall–Olkin family $(M_{\alpha,\beta})_{(\alpha,\beta)\in[0,1]^2}$ of copulas (see [22]), given by
$$M_{\alpha,\beta}(x, y) = \begin{cases} x^{1-\alpha}\,y & \text{if } x^{\alpha} \ge y^{\beta}\\ x\,y^{1-\beta} & \text{if } x^{\alpha} \le y^{\beta}, \end{cases} \qquad (11)$$
$$\zeta_1(M_{\alpha,\beta}) = 3\alpha(1-\alpha)^z + \frac{6}{\beta}\,\frac{1-(1-\alpha)^z}{z} - \frac{6}{\beta}\,\frac{1-(1-\alpha)^{z+1}}{z+1} \qquad (12)$$
holds, whereby $z = \frac{1}{\alpha} + \frac{2}{\beta} - 1$ (for details again see [29]).
Remark 7 The dependence measure τ1 is nonmutual, i.e., we do not necessarily have
τ1 (A) = τ1 (At ), whereby At denotes the transpose of A (i.e., At (x, y) = A(y, x)).
This reflects the fact that the dependence structure of random variables might be
strongly asymmetric, see [29] for examples as well as [27] for a measure of mutual
dependence.
Remark 8 Since most properties of D1 in dimension two also hold in the general
m-dimensional setting it might seem natural to simply consider τ1 (A) := a D1 (A, Π )
as dependence measure on C m (a being a normalizing constant). It is, however,
straightforward to see that this yields no reasonable notion of a dependence quantification insofar as we would also have $\tau_1(A) > 0$ for copulas A describing independence of $X$ and $\mathbf{Y} = (Y_1, \dots, Y_{m-1})$. For a possible way to overcome this problem and to assign maximum dependence to copulas describing the situation in which each component of a portfolio $(Y_1, \dots, Y_{m-1})$ is a deterministic function of another asset $X$, we refer to [11].
Remark 9 It is straightforward to verify that for samples (X 1 , Y1 ), . . . , (X n , Yn )
from A ∈ C the empirical copula Ê n (see [22, 28]) cannot converge to A w.r.t. D1
unless we have A ∈ Cd . Using Bernstein or checkerboard aggregations (smoothing
the empirical copula) might make it possible to construct D1 -consistent estimators
of τ1 (A). Convergence rates of these aggregations and other related questions are
future work.
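As a rough illustration of such a checkerboard-type aggregation (the binning and normalization below are ad hoc choices, not the authors' estimator), the pseudo-observations can be binned into an n × n grid and each row turned into an empirical conditional distribution function, which can then be plugged into the $D_1$-approximation sketched above.

```python
import numpy as np

def checkerboard_kernel(u, v, n=20):
    """Empirical 'checkerboard' kernel: bin pseudo-observations (u_i, v_i) in (0,1)^2
    into an n x n grid and turn each row into a conditional distribution function."""
    iu = np.minimum((u * n).astype(int), n - 1)
    iv = np.minimum((v * n).astype(int), n - 1)
    counts = np.zeros((n, n))
    np.add.at(counts, (iu, iv), 1.0)
    rows = counts.sum(axis=1, keepdims=True)
    rows[rows == 0] = 1.0                       # avoid division by zero for empty rows
    cond_density = counts / rows                # approximate conditional densities per cell
    return np.cumsum(cond_density, axis=1)      # K[i, j] ~ K_A(x in cell i, [0, (j+1)/n])

# Example: for the independence copula the kernel K(x, [0, y]) = y is recovered approximately
rng = np.random.default_rng(0)
K_hat = checkerboard_kernel(rng.random(5000), rng.random(5000))
```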
Using Iterated Function Systems, one can construct copulas exhibiting surprisingly
irregular analytic behavior. The aim of this section is to sketch the construction and
then state two main results. For general background on Iterated Function Systems
with Probabilities (IFSP, for short), we refer to [16]. The IFSP construction of two-
dimensional copulas with fractal support goes back to [12] (also see [2]), for the
generalization to the multivariate setting we refer to [30].
$$a_j = \sum_{j_0 \le j}\sum_{i=1}^{n} t_{i j_0}, \quad j \in \{1, \dots, m\}, \qquad b_i = \sum_{i_0 \le i}\sum_{j=1}^{m} t_{i_0 j}, \quad i \in \{1, \dots, n\}. \qquad (13)$$
Since τ is a transformation matrix, both $(a_j)_{j=0}^{m}$ and $(b_i)_{i=0}^{n}$ are strictly increasing. $Z_\tau \in \mathcal{K}([0,1]^2)$ will denote the attractor of the IFSP (see [16]). The induced operator $V_\tau$ on $\mathcal{P}([0,1]^2)$ is defined by
$$V_\tau(\mu) := \sum_{j=1}^{m}\sum_{i=1}^{n} t_{ij}\,\mu^{f_{ji}} = \sum_{(i,j)\in I} t_{ij}\,\mu^{f_{ji}}. \qquad (15)$$
Example 12 Figure 1 depicts the density of $V_\tau^n(\Pi)$ for n ∈ {1, 2, 3, 5}, whereby τ is given by
$$\tau = \begin{pmatrix} \frac{1}{6} & 0 & \frac{1}{6}\\ 0 & \frac{1}{3} & 0\\ \frac{1}{6} & 0 & \frac{1}{6} \end{pmatrix}.$$
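The action of $V_\tau$ on densities that are piecewise constant on the induced grid can be mimicked with a Kronecker product. The Python sketch below assumes the uniform subdivision of Example 12 and serves only to visualize the self-similar refinement of the density; it is not the IFSP construction itself.

```python
import numpy as np

tau = np.array([[1/6, 0, 1/6],
                [0, 1/3, 0],
                [1/6, 0, 1/6]])

def apply_V(tau, dens):
    """One step of the induced operator V_tau on a piecewise constant density:
    cell (i, j) of the refined grid carries the mass t_ij spread over a rescaled
    copy of the previous density (uniform subdivision assumed)."""
    m, n = tau.shape
    # the factor m*n rescales the mass t_ij to a density on a cell of area 1/(m*n)
    return np.kron(tau * m * n, dens)

dens = np.ones((1, 1))          # density of the product copula Pi
for _ in range(3):              # piecewise constant density of V_tau^3(Pi) on a 27 x 27 grid
    dens = apply_V(tau, dens)
```

The entries of `dens` approximate the values of the density of $V_\tau^n(\Pi)$ on the corresponding cells, comparable to the image plots in Fig. 1.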
Fig. 1 Image plot of the density of $V_\tau^n(\Pi)$ for n ∈ {1, 2, 3, 5} and τ according to Example 12
still have $\mu_{A_\tau} \perp \lambda^2$ although in this case $\mu_{A_\tau}$ has full support $[0,1]^2$. In fact, an even stronger and quite surprising singularity result holds: letting T̂ denote the family of all transformation matrices τ (i) containing no zeros, (ii) fulfilling that the row sums and column sums through every $t_{ij}$ are identical, and (iii) $\mu_{A_\tau} \ne \lambda^2$, we have the following striking result:
Theorem 13 ([33]) Suppose that τ ∈ T̂. Then the corresponding invariant copula $A_\tau$ is singular w.r.t. $\lambda^2$ and has full support $[0,1]^2$. Moreover, for λ-almost every $x \in [0,1]$ the conditional distribution function $y \mapsto F_x^{A_\tau}(y) = K_{A_\tau}(x, [0,y])$ is continuous, strictly increasing and has derivative zero λ-almost everywhere.
Example 18 For the transformation matrix τ from Example 12 the invariant copula $A_\tau$ is idempotent and its support has Hausdorff dimension $\ln 5/\ln 3$. Hence, setting $A := A_\tau$ and considering the Markov process outlined in Remark 15 we have $(X_i, X_{i+n}) \sim A$ for all $i, n \in \mathbb{N}$. The same holds if we take $A := V_\tau^{\,j}(\Pi)$ for arbitrary $j \in \mathbb{N}$ since this A is idempotent too.
We conclude this section with a general result that will be used later on and which, essentially, follows from von Neumann's mean ergodic theorem for Hilbert spaces (see [24]) since Markov operators have operator norm one. For every copula $A \in \mathcal{C}$ and every $n \in \mathbb{N}$ we set, as in the Introduction,
$$s_n^*(A) = \frac{1}{n}\sum_{i=1}^{n} A^{*i}. \qquad (19)$$
Theorem 19 ([32]) For every copula A there exists a copula $\hat A$ such that $\lim_{n\to\infty} D_1\big(s_n^*(A), \hat A\big) = 0$; the limit copula $\hat A$ is idempotent.
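On a grid, the Markov operator of a copula can be represented approximately by a doubly stochastic transition matrix, and the star product then corresponds to matrix multiplication, so the Cesàro averages $s_n^*(A)$ of Theorem 19 can be mimicked by averaging matrix powers. The Python sketch below is such a discretized illustration; the matrix P is an arbitrary toy example.

```python
import numpy as np

def cesaro_average(P, n):
    """Approximate s*_n(A) on a grid: average the first n powers of the
    row-stochastic matrix P representing the Markov operator of A."""
    acc = np.zeros_like(P)
    Pk = np.eye(P.shape[0])
    for _ in range(n):
        Pk = Pk @ P                 # P^k plays the role of the k-fold star product A^{*k}
        acc += Pk
    return acc / n

# toy example: a doubly stochastic 'checkerboard' transition matrix
P = np.array([[0.7, 0.3, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.3, 0.7]])
S = cesaro_average(P, 500)          # the averages stabilize as n grows (cf. Theorem 19)
```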
Komornik and Lasota (see [15]) have shown in 1987 that quasi-constrictivity is
equivalent to asymptotic periodicity—in particular they proved the following spec-
tral decomposition theorem: For every quasi-constrictive Markov operator T there
holds. Furthermore (see again [1, 15, 18]), in case of μ(Ω) = 1 there exists a
measurable partition (E i )ri=1 of Ω in sets with positive measure such that g j and σ
in (22) fulfill
$$g_j = \frac{1}{\mu(E_j)}\,\mathbf{1}_{E_j} \quad\text{and}\quad \mu(E_j) = \mu(E_{\sigma^n(j)}) \qquad (23)$$
holds for every f ∈ D([0, 1]) := D([0, 1], B([0, 1]), λ).
Example 22 There are absolutely continuous copulas A whose corresponding
Markov operator is not quasi-constrictive—one example is the idempotent ordinal-
sum-like copula O with unbounded density k O defined by
$$k_O(x, y) := \sum_{n=1}^{\infty} 2^n\,\mathbf{1}_{[1-2^{1-n},\,1-2^{-n})^2}(x, y)$$
$$T^n f(x) = \sum_{i=1}^{r}\Big(\int_\Omega f\,h_i\,d\mu\Big)\,\mathbf{1}_{E_{\sigma^n(i)}}(x) + R_n f(x) \quad\text{with}\quad \lim_{n\to\infty}\|R_n f\|_1 = 0 \qquad (24)$$
$$R_n\mathbf{1}_{\Omega}(x) = \mathbf{1}_{\Omega}(x) - T^n\mathbf{1}_{\Omega}(x) = \sum_{i=1}^{r}\mathbf{1}_{E_{\sigma^n(i)}}(x) - \sum_{i=1}^{r}\frac{\|h_i\|_1}{\mu(E_i)}\,\mathbf{1}_{E_{\sigma^n(i)}}(x) = \sum_{i=1}^{r}\Big(1 - \frac{\|h_i\|_1}{\mu(E_i)}\Big)\mathbf{1}_{E_{\sigma^n(i)}}(x)$$
Since $\mu(E_i) > 0$ for every $i \in \{1, \dots, r\}$ this shows that $\tilde h_i := \frac{h_i}{\mu(E_i)} \in L^{\infty}(\Omega, \mathcal{A}, \mu) \cap D(\Omega, \mathcal{A}, \mu)$ for every $i \in \{1, \dots, r\}$. Furthermore we have $\lim_{n\to\infty}\|R_n \tilde h_i\|_1 = 0$ for every fixed $i$, from which
$$1 = \int_\Omega T^n \tilde h_i(x)\,d\mu(x) = \lim_{n\to\infty}\sum_{j=1}^{r}\Big(\int_\Omega \tilde h_i(z)\,\tilde h_j(z)\,d\mu(z)\Big)\int_\Omega \mathbf{1}_{E_{\sigma^n(j)}}(x)\,d\mu(x) = \sum_{j=1}^{r}\Big(\int_\Omega \tilde h_i(z)\,\tilde h_j(z)\,d\mu(z)\Big)\mu(E_j)$$
follows. Multiplying both sides with $\mu(E_i)$, summing over $i \in \{1, \dots, r\}$ yields
$$1 = \int_\Omega \underbrace{\sum_{i=1}^{r}\tilde h_i(z)\,\mu(E_i)}_{=:g(z)}\ \sum_{j=1}^{r}\tilde h_j(z)\,\mu(E_j)\,d\mu(z),$$
so $g \in D(\Omega, \mathcal{A}, \mu)$ and at the same time $g^2 \in D(\Omega, \mathcal{A}, \mu)$. Using the Cauchy–Schwarz inequality it follows that $g(x) = 1$ for $\mu$-almost every $x \in \Omega$.
Lemma 24 Suppose that A is a copula whose corresponding Markov operator T A
is quasi-constrictive. Then there exists r ≥ 1, a measurable partition (E i )ri=1 of
[0, 1] in sets with positive measure, and pairwise different densities h 1 , . . . , h r ∈
L ∞ ([0, 1]) ∩ D([0, 1]) such that the limit copula  of s∗n (A) is absolutely contin-
uous with density $k_{\hat A}$, defined by
$$k_{\hat A}(x, y) = \sum_{i=1}^{r} h_i(y)\,\mathbf{1}_{E_i}(x). \qquad (25)$$
Proof Fix an arbitrary $f \in L^1([0,1])$. Then, using Lemma 23, we have
$$\frac{1}{n}\sum_{j=1}^{n} T_A^{\,j} f(x) = \frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{r}\Big(\int_{[0,1]} f\,h_{\sigma^{-j}(i)}\,d\lambda\Big)\mathbf{1}_{E_i}(x) + \frac{1}{n}\sum_{j=1}^{n} R_j f(x)$$
$$= \sum_{i=1}^{r}\mathbf{1}_{E_i}(x)\int_{[0,1]} f(z)\,\underbrace{\frac{1}{n}\sum_{j=1}^{n} h_{\sigma^{-j}(i)}(z)}_{=:g_n^i(z)}\,d\lambda(z) + \frac{1}{n}\sum_{j=1}^{n} R_j f(x)$$
and
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=1}^{n} h_{\sigma^{-j}(i)}(z) = h_i(z)$$
for every $z \in [0, 1]$ and every $i \in \{1, \dots, r\}$. Obviously $h_i \in L^{\infty}([0,1])$ and, using Lebesgue's theorem on dominated convergence, $h_i$ is also a density, so we have $h_1, \dots, h_r \in L^{\infty}([0,1]) \cap D([0,1])$. Finally, using Theorem 19 and the fact that $\lim_{n\to\infty}\|R_n f\|_1 = 0$ for every $f \in L^1([0,1])$, it follows immediately that
$$T_{\hat A} f(x) = \int_{[0,1]} f(y)\sum_{i=1}^{r} h_i(y)\,\mathbf{1}_{E_i}(x)\,d\lambda(y).$$
This completes the proof since mutually different densities can easily be achieved
by building unions from elements in the partition (E i )ri=1 if necessary.
Using the fact that  is idempotent we get the following stronger result:
Lemma 25 The density $k_{\hat A}$ of $\hat A$ in Lemma 24 has the form
$$k_{\hat A}(x, y) = \sum_{i, j=1}^{r} m_{i,j}\,\mathbf{1}_{E_i \times E_j}(x, y),$$
$$\sum_{i=1}^{r} h_i(y)\,\mathbf{1}_{E_i}(x) = \sum_{i=1}^{r} h_i(x)\,\mathbf{1}_{E_i}(y)$$
for every (x, y) ∈ Δ. Fix arbitrary i, j ∈ {1, . . . , r }. Then we can find x ∈ E i such
that λ(Δx ) = 1 holds, whereby Δx = {y ∈ [0, 1] : (x, y) ∈ Δ}. For such x we have
h i (y) = h j (x) for λ-almost every y ∈ E j , which firstly implies that h j is, up to a set
of measure zero, constant on E j and, secondly, that k  is constant on E i × E j outside
a set of λ2 -measure zero. Since we may modify the density on a set of λ2 -measure
zero we can assume that k  is of the desired form
$$k_{\hat A}(x, y) = \sum_{i, j=1}^{r} m_{i,j}\,\mathbf{1}_{E_i \times E_j}(x, y).$$
Before proceeding with the final result it is convenient to take a look at the matrix
H = (Hi, j )ri, j=1 defined by
$$H_{i,j} := m_{i,j}\,\lambda(E_j) = \int_{E_j} h_i(z)\,d\lambda(z) \qquad (26)$$
$$\sum_{i=1}^{r} h_i(y)\,\mathbf{1}_{E_i}(x) = k_{\hat A}(x, y) = (k_{\hat A} * k_{\hat A})(x, y) = \int_{[0,1]}\sum_{i=1}^{r} h_i(z)\,\mathbf{1}_{E_i}(x)\,\sum_{j=1}^{r} h_j(y)\,\mathbf{1}_{E_j}(z)\,d\lambda(z)$$
$$= \sum_{i,j=1}^{r}\mathbf{1}_{E_i}(x)\,h_j(y)\int_{E_j} h_i(z)\,d\lambda(z) = \sum_{i,j=1}^{r}\mathbf{1}_{E_i}(x)\,h_j(y)\,H_{i,j}.$$
From this it follows immediately that $h_i(y) = \sum_{j=1}^{r} H_{i,j}\,h_j(y)$ is fulfilled for every $y \in [0, 1]$ and $i \in \{1, \dots, r\}$, so, integrating both sides over $E_l$, we have $H_{i,l} =$
$\sum_{j=1}^{r} H_{i,j}\,H_{j,l}$, which shows that H is idempotent. Having this, the proof of the
following main result of this section will be straightforward.
Theorem 26 Suppose that A is a copula whose corresponding Markov operator T A
is quasi-constrictive. Then there exist r ≥ 1 and a measurable partition (E i )ri=1
of [0, 1] in sets with positive measure, such that the limit copula  of s∗n (A) is
absolutely continuous with density k  given by
$$k_{\hat A}(x, y) = \sum_{i=1}^{r}\frac{1}{\lambda(E_i)}\,\mathbf{1}_{E_i \times E_i}(x, y) \qquad (27)$$
for all x, y ∈ [0, 1]. In other words, the limit copula  has an ordinal-sum-of-Π -like
structure.
Proof Since H is an idempotent stochastic matrix and since H cannot have any column consisting purely of zeros, up to a permutation, H must have the form (see [5, 21])
$$\begin{pmatrix} Q_1 & 0 & \dots & 0\\ 0 & Q_2 & \dots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & Q_s \end{pmatrix}, \qquad (28)$$
$$H_{i_1, i_v} = m_{i_1, i_v}\,\lambda(E_{i_v}) = H_{i_2, i_v} = m_{i_2, i_v}\,\lambda(E_{i_v}) = \cdots = H_{i_{r_l}, i_v} = m_{i_{r_l}, i_v}\,\lambda(E_{i_v}),$$
$$\sum_{j\in I_l} |m_{j, i_1} - m_{j, i_2}| = \sum_{j=1}^{r} |m_{j, i_1} - m_{j, i_2}| > 0$$
Remark 27 Consider again the transformation matrix τ from Example 12. Then $V_\tau^1(\Pi), V_\tau^2(\Pi), \dots$ are examples of the ordinal-sum-of-Π-like copulas mentioned in the last theorem.
Acknowledgments The second author acknowledges the support of the Ministerio de Ciencia e
Innovación (Spain) under research project MTM2011-22394.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Bartoszek, W.: The work of professor Andrzej Lasota on asymptotic stability and recent
progress. Opusc. Math. 28(4), 395–413 (2008)
2. Cuculescu, I., Theodorescu, R.: Copulas: diagonals, tracks. Rev. Roum. Math Pure A. 46,
731–742 (2001)
3. Darsow, W.F., Nguyen, B., Olsen, E.T.: Copulas and Markov processes. Ill. J. Math. 36(4),
600–642 (1992)
4. Darsow, W.F., Olsen, E.T.: Characterization of idempotent 2-copulas. Note Mat. 30(1), 147–
177 (2010)
5. Doob, J.: Topics in the theory of Markov chains. Trans. Am. Math. Soc. 52, 37–64 (1942)
6. Durante, F., Klement, E.P., Quesada-Molina, J., Sarkoci, J.: Remarks on two product-like
constructions for copulas. Kybernetika 43(2), 235–244 (2007)
7. Durante, F., Sarkoci, P., Sempi, C.: Shuffles of copulas. J. Math. Anal. Appl. 352, 914–921
(2009)
8. Durante, F., Sempi, C.: Copula theory: an introduction. In: Jaworski, P., Durante, F., Härdle, W.,
Rychlik, T. (eds.) Copula Theory and Its Applications. Lecture Notes in Statistics-Proceedings,
vol. 198, pp. 1–31. Springer, Berlin (2010)
9. Durante, F., Fernández-Sánchez, J.: Multivariate shuffles and approximation of copulas. Stat.
Probab. Lett. 80, 1827–1834 (2010)
10. Farahat, H.K.: The semigroup of doubly-stochastic matrices. Proc. Glasg. Math. Ass. 7, 178–
183 (1966)
11. Fernández-Sánchez, J., Trutschnig, W.: Conditioning based metrics on the space of multivariate
copulas, their interrelation with uniform and levelwise convergence and Iterated Function
Systems. to appear in J. Theor. Probab. doi:10.1007/s10959-014-0541-4
12. Fredricks, G.A., Nelsen, R.B., Rodríguez-Lallena, J.A.: Copulas with fractal supports. Insur.
Math. Econ. 37, 42–48 (2005)
13. Kallenberg, O.: Foundations of Modern Probability. Springer, New York (1997)
14. Klenke, A.: Probability Theory—A Comprehensive Course. Springer, Berlin (2007)
15. Komornik, J., Lasota, A.: Asymptotic decomposition of Markov operators. Bull. Pol. Acad.
Sci. Math. 35, 321–327 (1987)
16. Kunze, H., La Torre, D., Mendivil, F., Vrscay, E.R.: Fractal Based Methods in Analysis.
Springer, New York (2012)
17. Lancaster, H.O.: Correlation and complete dependence of random variables. Ann. Math. Stat.
34, 1315–1321 (1963)
18. Lasota, A., Mackey, M.C.: Chaos, Fractals and Noise—Stochastic Aspects of Dynamics.
Springer, New York (1994)
19. Li, X., Mikusinski, P., Taylor, M.D.: Strong approximation of copulas. J. Math. Anal. Appl.
255, 608–623 (1998)
20. Mikusínski, P., Sherwood, H., Taylor, M.D.: Shuffles of min. Stochastica 13, 61–74 (1992)
21. Mukherjea, A.: Completely simple semigroups of matrices. Semigroup Forum 33, 405–429
(1986)
22. Nelsen, R.B.: An Introduction to Copulas. Springer, New York (2006)
23. Olsen, E.T., Darsow, W.F., Nguyen, B.: Copulas and Markov operators. In: Proceedings of the
Conference on Distributions with Fixed Marginals and Related Topics. IMS Lecture Notes.
Monograph Series, vol. 28, pp. 244–259 (1996)
24. Parry, W.: Topics in Ergodic Theory. Cambridge University Press, Cambridge (1981)
25. Schwarz, S.: A note on the structure of the semigroup of doubly-stochastic matrices. Math.
Slovaca 17(4), 308–316 (1967)
26. Sempi, C.: Conditional expectations and idempotent copulae. In: Cuadras, C.M., et al. (eds.)
Distributions with given Marginals and Statistical Modelling, pp. 223–228. Kluwer, Nether-
lands (2002)
27. Siburg, K.F., Stoimenov, P.A.: A measure of mutual complete dependence. Metrika 71, 239–
251 (2010)
28. Swanepoel, J., Allison, J.: Some new results on the empirical copula estimator with applica-
tions. Stat. Probab. Lett. 83, 1731–1739 (2013)
29. Trutschnig, W.: On a strong metric on the space of copulas and its induced dependence measure.
J. Math. Anal. Appl. 384, 690–705 (2011)
30. Trutschnig, W., Fernández-Sánchez, J.: Idempotent and multivariate copulas with fractal sup-
port. J. Stat. Plan. Inference 142, 3086–3096 (2012)
31. Trutschnig, W., Fernández-Sánchez, J.: Some results on shuffles of two-dimensional copulas.
J. Stat. Plan. Inference 143, 251–260 (2013)
32. Trutschnig, W.: On Cesáro convergence of iterates of the star product of copulas. Stat. Probab.
Lett. 83, 357–365 (2013)
33. Trutschnig, W., Fernández-Sánchez, J.: Copulas with continuous, strictly increasing singular
conditional distribution functions. J. Math. Anal. Appl. 410, 1014–1027 (2014)
34. Walters, P.: An Introduction to Ergodic Theory. Springer, New York (2000)
Copula Representations for Invariant
Dependence Functions
Abstract Our main goal is to characterize in terms of copulas the linear Sibuya
bivariate lack of memory property recently introduced in [12]. As a particular case,
one can obtain nonaging copulas considered in the literature.
In the simplest case, when $B(x_1, x_2; t) = 1$ in (2), one gets the functional equation
$$S(x_1 + t, x_2 + t) = S(x_1, x_2)\,S(t, t) \qquad (3)$$
for all $x_1, x_2 \ge 0$ and $t > 0$. Bivariate continuous distributions satisfying (3) possess the classical bivariate lack of memory property (BLMP).
The only solution of (3) with exponential marginals is the Marshall–Olkin bivari-
ate exponential distribution introduced in [9]. However, there do exist distributions
having BLMP with nonexponential marginals. Various solutions of functional equa-
tion (3) are presented in [7] where the marginals may have any kind of failure rates:
increasing, decreasing, bathtub, etc. It is well-known that BLMP preserves the dis-
tribution of (X 1 , X 2 ) and its residual lifetime vector
independent of $t \ge 0$, i.e., $(X_1, X_2) \overset{d}{=} \mathbf{X}_t$, implying $X_i \overset{d}{=} X_{it}$, $i = 1, 2$, for all
t ≥ 0.
Remark 1 The vectors (X 1 , X 2 ) and Xt should necessarily have the same survival
copula, which is unique under continuity of X i , i = 1, 2. Therefore, BLMP implies
that the corresponding survival copulas are time invariant (nonaging).
The joint survival function of Xt is given by SXt (x1 , x2 ) = S(x1 +t, x2 +t)/S(t, t).
Its marginal survival functions are S X 1t (x1 ) = S(x1 + t, t)/S(t, t) and S X 2t (x2 ) =
S(t, x2 + t)/S(t, t). Applying the Sibuya form representation (1) with respect to the
residual lifetime vector Xt we have
Observe that BLMP distributions satisfy (5). This means that the class of bivariate
continuous distributions with LS-BLMP includes those possessing BLMP.
Let us assume that the partial derivatives of S(x1 , x2 ) exist and are continuous.
Denote by ri (x1 , x2 ) = −∂ ln S(x1 , x2 )/∂ xi the conditional failure rates, i = 1, 2. In
[12], a class $\mathcal{L}(\mathbf{x}; \mathbf{a})$ of nonnegative bivariate continuous distributions is introduced that satisfies the relation
Remark 2 The joint survival function S(x1 , x2 ) in the previous expression is proper
only for certain marginals S X 1 (x1 ) and S X 2 (x2 ). Their choice will determine the range
of possible values for the non-negative parameters a0 , a1 and a2 , see Theorem 5.2.14
and Proposition 5.2.17 in [12]. The nonnegative parameter a0 plays an important role
in the class L (x; a). If a0 = f X 1 (0) + f X 2 (0), the joint survival function S(x1 , x2 )
is absolutely continuous and if a0 < f X 1 (0) + f X 2 (0), the distribution exhibits a
singular component.
It happens that the class L (x; a) specified by (7) can be characterized by the
LS-BLMP defined by (5) and (6). The class L (x; a) contains continuous bivari-
ate distributions that are symmetric or asymmetric, positive quadrant dependent or
negative quadrant dependent, absolutely continuous or exhibit a singular compo-
nent. In addition, L (x; a) can be equivalently represented by relation (2) when
B(x1 , x2 ; t) = exp{−a1 x1 t − a2 x2 t}, i.e., by
$$\frac{S(x_1 + t,\, x_2 + t)}{S(t, t)} = S(x_1, x_2)\,\exp\{-a_1x_1t - a_2x_2t\}. \qquad (8)$$
Let the vector (X 1 , X 2 ) be a member of the class L (x; a). Hence, the survival
function of the corresponding residual lifetime vector Xt is given by (8). Denote
by C and Ct , the survival copulas of (X 1 , X 2 ) and Xt , respectively. First, we will
find a relation between the survival copulas C and Ct . As a second step, we will
obtain a characterizing functional equation for the survival copula Ct that joins the
corresponding marginals in both sides of (8).
Theorem 1 Let (X 1 , X 2 ) belong to the class L (x; a). The survival copulas of Xt
and (X 1 , X 2 ) are connected by
$$C_t(u, v) = C\Big(\exp\big\{-H_1\big(G_{1t}^{-1}(-\ln u)\big)\big\},\ \exp\big\{-H_2\big(G_{2t}^{-1}(-\ln v)\big)\big\}\Big) \times \exp\big\{-a_1t\,G_{1t}^{-1}(-\ln u) - a_2t\,G_{2t}^{-1}(-\ln v)\big\}, \qquad (9)$$
Proof The marginals of Xt have survival functions specified by (6). Using Sklar’s
theorem, relation (8) can be rewritten in terms of the survival copulas Ct and C as
follows
$$C_t\big(S_{X_1}(x_1)\exp\{-a_1x_1t\},\ S_{X_2}(x_2)\exp\{-a_2x_2t\}\big) = C\big(S_{X_1}(x_1),\ S_{X_2}(x_2)\big)\,\exp\{-a_1x_1t - a_2x_2t\}. \qquad (10)$$
Let $u = S_{X_1}(x_1)\exp\{-a_1x_1t\}$ and $v = S_{X_2}(x_2)\exp\{-a_2x_2t\}$. From the relations $S_{X_i}(x_i) = \exp\{-H_i(x_i)\}$ and $G_{it}(x_i) = H_i(x_i) + a_ix_it$, $i = 1, 2$, we get $x_1 = G_{1t}^{-1}(-\ln u)$ and $x_2 = G_{2t}^{-1}(-\ln v)$. Using these equations in (10) we obtain (9).
Relation (9) shows that the survival copulas of (X 1 , X 2 ) and Xt do not coincide in
general. The time invariance (nonaging) in the class L (x; a) (being equivalent to LS-
BLMP) is related to the memoryless dependence function Ωt of the residual lifetime
vector Xt , see relation (5). For comparison only, recall that the time invariance for
BLMP distributions is concerned with the joint distribution of Xt .
Substituting a1 = a2 = 0 in (9), we get Ct (u, v) = C(u, v) for all t ≥ 0, i.e., the
survival copula Ct is time invariant, see Remark 1. The conclusion is the same if X1 and
X 2 are independent, i.e., C(u, v) = uv. Thus, we have the following result.
Corollary 1 Under conditions of Theorem 1 if
(i) a1 = a2 = 0 or
(ii) X 1 is independent of X 2 ,
then Ct (u, v) = C(u, v) for all u, v ∈ (0, 1] and t ≥ 0.
The next example illustrates the relations established.
Example 1 Let the vector (X 1 , X 2 ) belong to L (x; a). Suppose that the marginals
are exponentially distributed, i.e., $S_{X_i}(x) = \exp\{-\lambda_i x\}$, $\lambda_i > 0$, $i = 1, 2$. Therefore, $G_{it}(x) = \lambda_i x + a_i x t$ and $G_{it}^{-1}(u) = u/(\lambda_i + a_i t)$, $i = 1, 2$. From (9) we obtain
$$C_t(u, v) = C\Big(\exp\Big\{\frac{\lambda_1\ln u}{\lambda_1 + a_1 t}\Big\},\ \exp\Big\{\frac{\lambda_2\ln v}{\lambda_2 + a_2 t}\Big\}\Big)\exp\Big\{\frac{a_1 t\ln u}{\lambda_1 + a_1 t} + \frac{a_2 t\ln v}{\lambda_2 + a_2 t}\Big\},$$
i.e.,
$$C_t(u, v) = C\Big(u^{\frac{\lambda_1}{\lambda_1 + a_1 t}},\ v^{\frac{\lambda_2}{\lambda_2 + a_2 t}}\Big)\, u^{\frac{a_1 t}{\lambda_1 + a_1 t}}\, v^{\frac{a_2 t}{\lambda_2 + a_2 t}}. \qquad (11)$$
Relation (11) gives a general expression for the survival copula Ct (u, v) corre-
sponding to Xt for all members of the class L (x; a) with exponential marginals.
Assume further that $(X_1, X_2)$ follows Gumbel's type I exponential distribution with survival function $S(x_1, x_2) = \exp\{-\lambda_1 x_1 - \lambda_2 x_2 - \theta\lambda_1\lambda_2 x_1 x_2\}$, see [5]. This distribution is a member of the class $\mathcal{L}(\mathbf{x}; \mathbf{a})$ and the constants in (7)
are specified by a0 = λ1 + λ2 and a1 = a2 = θ λ1 λ2 . The corresponding survival
copula is C(u, v) = uv exp{−θ ln u ln v}. Substituting C(u, v) in (11) we obtain
Ct (u, v) = uv exp{−θ ln u ln v/[(1 + θ λ2 t)(1 + θ λ1 t)]}. Therefore, the survival
copula Ct (u, v) depends on t as well.
When t = 0 in (11) we recover the survival copula C(u, v) of (X 1 , X 2 ) and
letting t → ∞, we obtain the independence copula C∞ (u, v) = uv. Notice that the
independence of X 1 and X 2 is equivalent to the condition a1 = a2 = 0.
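Relation (11) is straightforward to evaluate numerically. The Python sketch below (with arbitrary parameter values) plugs in the Gumbel type I survival copula $C(u, v) = uv\exp\{-\theta\ln u\ln v\}$ and reproduces the behavior noted above: $C_t$ coincides with C for t = 0 and approaches the independence copula as t grows.

```python
import numpy as np

def C_gumbel(u, v, theta):
    # survival copula of Gumbel's type I bivariate exponential distribution
    return u * v * np.exp(-theta * np.log(u) * np.log(v))

def C_t(u, v, t, lam1, lam2, theta):
    """Survival copula of the residual lifetime vector X_t via relation (11),
    with a1 = a2 = theta*lam1*lam2 (exponential marginals)."""
    a1 = a2 = theta * lam1 * lam2
    p1, p2 = lam1 / (lam1 + a1 * t), lam2 / (lam2 + a2 * t)
    q1, q2 = a1 * t / (lam1 + a1 * t), a2 * t / (lam2 + a2 * t)
    return C_gumbel(u**p1, v**p2, theta) * u**q1 * v**q2

# C_t depends on t: it moves from C (t = 0) towards the independence copula uv (t -> infinity)
u, v = 0.3, 0.6
print([round(C_t(u, v, t, lam1=1.0, lam2=2.0, theta=0.5), 4) for t in (0.0, 1.0, 10.0, 100.0)])
```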
Now, our interest is to find a characterizing functional equation involving the
survival copula Ct of Xt for the absolutely continuous members of the class L (x; a).
Theorem 2 Let the survival copula Ct of Xt be differentiable in its arguments. The
absolutely continuous random vector (X 1 , X 2 ) belongs to the class L (x; a), if and
only if there exist non-negative constants a1 and a2 , such that
$$C_t\Big(\frac{S(x_1 + t,\, t)}{S(t,t)},\ \frac{S(t,\, x_2 + t)}{S(t,t)}\Big) = C_t\big(S_{X_1}(x_1)\exp\{-a_1 x_1 t\},\ S_{X_2}(x_2)\exp\{-a_2 x_2 t\}\big), \qquad (12)$$
for all x1 , x2 , t ≥ 0.
Proof Let us assume that the functional equation (12) is satisfied. We will show
that (7) is fulfilled. Taking the derivative in both sides of (12) with respect to t we
obtain
$$C_t^1\Big(\tfrac{S(x_1+t,\,t)}{S(t,t)},\ \tfrac{S(t,\,x_2+t)}{S(t,t)}\Big)\,\frac{[S^1(x_1+t,t) + S^2(x_1+t,t)]\,S(t,t) - S(x_1+t,t)\,[S^1(t,t) + S^2(t,t)]}{[S(t,t)]^2}$$
$$+\ C_t^2\Big(\tfrac{S(x_1+t,\,t)}{S(t,t)},\ \tfrac{S(t,\,x_2+t)}{S(t,t)}\Big)\,\frac{[S^1(t,x_2+t) + S^2(t,x_2+t)]\,S(t,t) - S(t,x_2+t)\,[S^1(t,t) + S^2(t,t)]}{[S(t,t)]^2}$$
$$=\ C_t^1\big(S_{X_1}(x_1)\exp\{-a_1x_1t\},\ S_{X_2}(x_2)\exp\{-a_2x_2t\}\big)\,\big(-a_1x_1S_{X_1}(x_1)\exp\{-a_1x_1t\}\big)$$
$$+\ C_t^2\big(S_{X_1}(x_1)\exp\{-a_1x_1t\},\ S_{X_2}(x_2)\exp\{-a_2x_2t\}\big)\,\big(-a_2x_2S_{X_2}(x_2)\exp\{-a_2x_2t\}\big),$$
where the superscripts 1 and 2 denote the partial derivatives with respect to the first
and second arguments of the corresponding functions. Letting x1 = 0 in the last
equation we have
$$C_t^2\Big(1,\ \tfrac{S(t,\,x_2+t)}{S(t,t)}\Big)\,\frac{[S^1(t,x_2+t) + S^2(t,x_2+t)]\,S(t,t) - S(t,x_2+t)\,[S^1(t,t) + S^2(t,t)]}{[S(t,t)]^2}$$
$$=\ C_t^2\big(1,\ S_{X_2}(x_2)\exp\{-a_2x_2t\}\big)\,\big(-a_2x_2S_{X_2}(x_2)\exp\{-a_2x_2t\}\big).$$
$$-\,\frac{S(t,\,x_2+t)}{S(t,t)}\,\big[r(t,\,x_2+t) - r(t,t)\big] = -a_2x_2\,S_{X_2}(x_2)\exp\{-a_2x_2t\},$$
which is equivalent to
r (t, x2 + t) = r (t, t) + a2 x2 . (13)
$$\frac{S^1(x_1+t,\,t)}{S(t,t)} = -f_{X_1}(x_1)\exp\{-a_1x_1t\} - a_1t\,S_{X_1}(x_1)\exp\{-a_1x_1t\}$$
$$r(t,t) = a_0 + a_1t + a_2t.$$
Since the dependence function Ωt satisfies the Sibuya form (4), we refer to the
survival copula Ct characterized by functional equation (12) as Sibuya-type copula.
where $\theta_i \in (0, 1]$ and $\lambda_i > 0$, $i = 1, 2$. This distribution was obtained in [12] and may be called the generalized Gumbel bivariate exponential distribution with parameters $\lambda_i$ and $\theta_i$, $i = 1, 2$. If $\theta_1 = \theta_2 = \theta$, we get the Gumbel distribution
considered in Example 1. The marginal survival functions are S X i (xi ) = exp{−λi xi },
i = 1, 2.
The survival function of the residual lifetime vector Xt is given by (8). After some
algebra, we get the corresponding survival copula
$$C_t(u, v) = \begin{cases} uv\,\exp\big\{-\theta_1\,\gamma_1(t)\gamma_2(t)\,\ln u\ln v\big\}\exp\Big\{-\dfrac{\lambda_1(\theta_2 - \theta_1)}{2\lambda_2\,\gamma_1^2(t)}\,(\ln v)^2\Big\}, & \text{if } u^{-\lambda_2\gamma_1(t)} \ge v^{-\lambda_1\gamma_2(t)};\\[3mm] uv\,\exp\big\{-\theta_2\,\gamma_1(t)\gamma_2(t)\,\ln u\ln v\big\}\exp\Big\{-\dfrac{\lambda_2(\theta_1 - \theta_2)}{2\lambda_1\,\gamma_2^2(t)}\,(\ln u)^2\Big\}, & \text{if } u^{-\lambda_2\gamma_1(t)} < v^{-\lambda_1\gamma_2(t)}, \end{cases}$$
$$C_t\big(\exp\{-\lambda_1x_1 - \lambda_1\lambda_2\theta_1x_1t\},\ \exp\{-\lambda_2x_2 - \lambda_1\lambda_2\theta_2x_2t\}\big) = \frac{S(x_1+t,\,x_2+t)}{S(t,t)},$$
$$\frac{C\big(S_{X_1}(x_1+t),\,S_{X_2}(x_2+t)\big)}{C\big(S_{X_1}(t),\,S_{X_2}(t)\big)} = C\Bigg(\frac{C\big(S_{X_1}(x_1+t),\,S_{X_2}(t)\big)}{C\big(S_{X_1}(t),\,S_{X_2}(t)\big)},\ \frac{C\big(S_{X_1}(t),\,S_{X_2}(x_2+t)\big)}{C\big(S_{X_1}(t),\,S_{X_2}(t)\big)}\Bigg) \qquad (15)$$
has to be satisfied for all x1 , x2 ≥ 0 and t ≥ 0. We will assume further that the
survival copula C is time invariant (or nonaging) if it corresponds to a member of
the class A .
Taking into account the conclusion in Remark 1, all bivariate survival functions
possessing BLMP belong to A . It happens that this time invariance property is not
restricted to BLMP survival functions. For instance, it is well-known that the Clayton
bivariate survival function given by
$$S(x_1, x_2) = \Big(S_{X_1}^{-\theta}(x_1) + S_{X_2}^{-\theta}(x_2) - 1\Big)^{-1/\theta}, \qquad \theta \in (0, \infty),$$
has time invariant survival copula. One can find other members of the class A in
Examples 3 and 4.
is invariant on the main diagonal of the unit square, see [2]. Let us initially consider
equally distributed marginals S X 1 (x) = S X 2 (x) = S X (x). If S X (x) is exponen-
tially distributed, then $S(x_1, x_2) = C_\alpha(S_{X_1}(x_1), S_{X_2}(x_2))$ is a particular case of the Marshall–Olkin bivariate exponential distribution, see [9], possessing BLMP and, consequently, belonging to the class A. Now, let X be a gamma distributed random variable. In this case, BLMP does not hold true but the corresponding joint survival function still belongs to A.
In a third scenario, where X 1 and X 2 do not share the same distribution but are
joined by the Cuadras-Augé survival copula, S(x1 , x2 ) neither possesses BLMP nor
belongs to A .
is invariant on the curve {(u, v) = (t α , t β ), t ∈ (0, 1)}, see [2]. Notice that when
α = β we obtain the Cuadras-Augé survival copula from Example 3.
Let us consider a baseline survival function S X (x) and substitute S X 1 (x) =
[S X (x)]α and S X 2 (x) = [S X (x)]β . Then, the corresponding joint survival function
S(x1 , x2 ) = Cα,β (S X 1 (x1 ), S X 2 (x2 )) belongs to A . In particular, if the marginals are
exponentially distributed, not necessarily sharing the same parameter, then S(x1 , x2 )
possesses BLMP. But choosing X 1 exponentially distributed and X 2 beta distributed,
say, the corresponding joint survival function is not a member of the class A.
The cases considered in the last two examples depend on the choice of the marginal
survival functions. A general invariance property can be obtained when we consider
the Clayton survival copula. In such a case, for any marginals we have time invariant
survival copulas. We refer the reader to Sect. 4 in [2] for more details on time invariant
copulas.
In fact, the Clayton survival copula is the only absolutely continuous copula that
is preserved even under bivariate truncation, see [11]. The absolutely continuous
assumption is relaxed in Theorem 4.1 in [3]. In [10], a characterization is given of the survival functions which simultaneously have the Clayton survival copula and
possess BLMP, see their Theorem 3.2.
In the next statement, we establish a necessary condition for an absolutely continuous bivariate survival function to be a member of the class A.
The same equation is obtained in Proposition 3 (ii) in [1] under the condition that
X 1 and X 2 are uniformly distributed on the unit square, i.e., f X 1 (0) = f X 2 (0) = 1.
Further, assume that $C(u, v)$ is exchangeable. Thus, $C^2(u, 1) = C^1(1, u)$, $C^2(u, v) = C^1(v, u)$ and the last equation transforms into
$$C(u, v) = \Big(u - \frac{C^1(1, u)}{2}\Big)\,C^1(u, v) + \Big(v - \frac{C^1(1, v)}{2}\Big)\,C^1(v, u),$$
4 Conclusions
The time invariance of the residual lifetime vector Xt of (X 1 , X 2 ) is characterized
by BLMP in [9]. It tells us that the joint distributions of Xt and (X 1 , X 2 ) coincide
independently of t, i.e., the BLMP holds. In this paper, we consider a more general
concept, namely time invariance of the dependence functions of Xt and (X 1 , X 2 ),
given by (4) and (1), respectively.
We offer copula representations for the time invariance property related to bivariate
survival functions of the residual lifetime vector $\mathbf{X}_t$. While in Sect. 2 the nonaging phenomenon is associated with the dependence function $\Omega_t(x_1, x_2)$, in Sect. 3 our interest is in the survival copula $C_t(u, v)$ of $\mathbf{X}_t$.
We are thankful to the referee and editor for their comments.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Charpentier, A.: Tail distribution and dependence measures. Working paper (2003)
2. Charpentier, A., Juri, A.: Limiting dependence structures for tail events, with applications to
credit derivatives. J. Appl. Probab. 43, 563–586 (2006)
3. Durante, F., Jaworski, P.: Invariant dependence structure under univariate truncation. Stat: J.
Theoret. Appl. Stat. 46, 263–277 (2012)
4. Gourieroux, C., Monfort, A.: Age and term structure in duration models. Working paper (2003)
5. Gumbel, E.: Bivariate exponential distributions. J. Am. Stat. Assoc. 55, 698–707 (1960)
6. Hofert, M., Vrins, F.: Sibuya copulas. J. Multivar. Anal. 114, 318–337 (2013)
7. Kulkarni, H.: Characterizations and modelling of multivariate lack of memory property. Metrika
64, 167–180 (2006)
8. Mai, J.-F., Scherer, M.: Simulating Copulas. Imperial College Press, London (2012)
9. Marshall, A., Olkin, I.: A multivariate exponential distribution. J. Am. Stat. Assoc. 62, 30–41
(1967)
10. Mulero, J., Pellerey, F.: Bivariate aging properties under Archimedean dependence structures.
Commun. Stat: Theor. Methods. 39, 3108–3121 (2010)
11. Oakes, D.: On the preservation of copula structure under truncation. The Canadian J. Stat. 33,
465–468 (2005)
12. Pinto, J.: Deepening the notions of dependence and aging in bivariate probability distributions.
PhD Thesis, University of Sao Paulo (2014)
13. Sibuya, M.: Bivariate extreme statistics I. Ann. Inst. Stat. Math. 11, 195–210 (1960)
Nonparametric Copula Density Estimation
Using a Petrov–Galerkin Projection
$$\int_0^{u_1}\!\!\cdots\!\int_0^{u_d} c(s_1, \dots, s_d)\,ds_1\cdots ds_d = C(u_1, \dots, u_d)$$
($j = 1, \dots, d$) can be expressed as
$$c(u_1, \dots, u_d) = \frac{\partial^d C}{\partial u_1\cdots\partial u_d} \qquad (1)$$
exists and then the density gives us the dependence structure in a more convenient
way, because usually the graphs of the copulas look very similar and there are only
small differences in the slope. For this reason the reconstruction of the copula density
is a vibrant field of research in finance and many other scientific fields. In practical tasks, the dependence structure of more than two random variables, i.e., large dimension d, is of special interest. In nonparametric statistical estimation, kernel estimators are usually used, but they often have problems with boundary bias. There are also spline- or wavelet-based approximation methods, but most of them are only discussed in the two-dimensional case. Likewise, in [12], the authors discuss a penalized nonparametric maximum likelihood method in the two-dimensional case. A detailed survey of the literature on nonparametric copula density estimation can be found in [6]. However, most nonparametric methods face the curse of dimensionality, so that the numerical computations are possible only for sufficiently low dimensions. Indeed, many authors discuss only the two-dimensional case in nonparametric copula density estimation.
In this paper we develop an alternative approach based on the theory of inverse
problems. The copula density (1) exists only for absolutely continuous copulas.
Obviously, the copula is not observable for a sample X1 , X2 , . . . , XT in the statistical
framework, but we can approximate it with the empirical copula
$$\hat C(\mathbf{u}) = \frac{1}{T}\sum_{j=1}^{T}\mathbf{1}_{\{\hat{\mathbf{U}}_j \le \mathbf{u}\}} = \frac{1}{T}\sum_{j=1}^{T}\prod_{k=1}^{d}\mathbf{1}_{\{\hat U_{kj} \le u_k\}} \qquad (2)$$
of the margin transformed pseudo samples Û1 , Û2 , . . . , ÛT with Ûk j = F̂k (X k j )
where
1
T
F̂k (x) = 1 {X k j ≤x}
T
j=1
denotes the empirical margins. It is well-known that the empirical copula uniformly
converges to the copula (see [2])
$$\max_{\mathbf{u}\in[0,1]^d}\big|C(\mathbf{u}) - \hat C(\mathbf{u})\big| = O\bigg(\frac{(\log\log T)^{\frac12}}{T^{\frac12}}\bigg) \quad\text{a.s. for } T \to \infty \qquad (3)$$
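A direct implementation of the empirical copula (2) is immediate. In the Python sketch below the pseudo-observations are obtained as ranks divided by T, which is one common realization of $\hat U_{kj} = \hat F_k(X_{kj})$; the sample itself is arbitrary.

```python
import numpy as np

def pseudo_observations(X):
    """X: T x d sample matrix. Returns U_hat with U_hat[j, k] = F_hat_k(X[j, k])."""
    T, d = X.shape
    # rank / T equals the empirical margin evaluated at the own observation (no ties assumed)
    return np.argsort(np.argsort(X, axis=0), axis=0) / T + 1.0 / T

def empirical_copula(U_hat, u):
    """Empirical copula (2) evaluated at a point u in [0,1]^d."""
    return np.mean(np.all(U_hat <= u, axis=1))

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 2))
U = pseudo_observations(X)
print(empirical_copula(U, np.array([0.5, 0.5])))   # approximates C(0.5, 0.5)
```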
$$\int_0^{u_1}\!\!\cdots\!\int_0^{u_d} c(s_1, \dots, s_d)\,ds_1\cdots ds_d = C(u_1, \dots, u_d) \quad \forall\,\mathbf{u} = (u_1, \dots, u_d)^T \in \Omega = [0,1]^d \qquad (4)$$
which can be seen as a weak formulation of Eq. (1). In the following,
we therefore consider the linear Volterra integral operator $A \in \mathcal{L}\big(L^1(\Omega), L^2(\Omega)\big)$ and solve the linear operator equation
$$Ac = C \qquad (5)$$
to find the copula density c. In the following, we assume attainability, which means C ∈ R(A); hence we only consider copulas C ∈ L²(Ω) which have a solution c ∈ L¹(Ω).
The injective Volterra integral operator is well-studied in the inverse problem
literature. Even in the one-dimensional case, this is an ill-posed operator, resulting from the discontinuity of the inverse $A^{-1}$, which is the differential operator. Hence,
solving Eq. (1) leads to numerical instabilities if the right-hand side of (5) has only a
small data error. Because the solution is sensitive to small data errors, regularization
methods to overcome the instability are discussed in the inverse problem literature.
For a detailed introduction to regularization see, for example, [4, 13].
In Sect. 2 we discuss a discretization of the integral equation (4) and in Sect. 3,
we illustrate the numerical instability if we use the empirical copula instead of the
exact one and discuss regularization methods for the discretized problem.
The basics of the numerical implementation of the problem, and especially the details of the Kronecker multiplication, are presented in the authors' working paper [14]; a discussion showing that the Petrov–Galerkin projection is not a simple counting algorithm is given in [15]. This paper gives a summary of the proposed method for the effective computation of the right-hand side for larger dimensions and discusses in more detail the analytical aspects of the inverse problem and the reasons for the existence of the Kronecker structure.
2 Numerical Approximation
In the following, we write
$$\int_0^{\mathbf{u}} c(\mathbf{s})\,d\mathbf{s} = C(\mathbf{u}) \qquad \forall\,\mathbf{u} = (u_1, \dots, u_d)^T \in \Omega = [0,1]^d$$
for Eq. (4) as a short form. We propose applying a Petrov–Galerkin projection (see [5]) for some discretization size h and consider the finite dimensional approximation
$$c_h(\mathbf{s}) = \sum_{j=1}^{N} c_j\,\varphi_j(\mathbf{s}), \qquad (6)$$
$$\int_\Omega\int_0^{\mathbf{u}} c_h(\mathbf{s})\,d\mathbf{s}\;\psi(\mathbf{u})\,d\mathbf{u} = \int_\Omega C(\mathbf{u})\,\psi(\mathbf{u})\,d\mathbf{u} \qquad \forall\,\psi\in\tilde V_h. \qquad (7)$$
It is sufficient to fulfill Eq. (7) for N linearly independent test functions $\psi_i \in \tilde V_h$. This yields the system of linear equations
$$Kc = C \qquad (8)$$
with
$$K_{ij} = \int_\Omega\int_0^{\mathbf{u}} \varphi_j(\mathbf{s})\,d\mathbf{s}\;\psi_i(\mathbf{u})\,d\mathbf{u}$$
and right-hand side
$$C_i = \int_\Omega C(\mathbf{u})\,\psi_i(\mathbf{u})\,d\mathbf{u}, \qquad i = 1, \dots, N. \qquad (9)$$
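To make the entries $K_{ij}$ concrete, the Python sketch below assembles the one-dimensional system matrix $^{(1)}K$ by quadrature, taking the ansatz functions simply as indicators of the elements; this is an illustrative choice, and the exact scaling of the ansatz functions in (11) of this chapter is not reproduced here.

```python
import numpy as np

def assemble_K1(n, quad_points=2000):
    """One-dimensional system matrix (1)K for piecewise constant ansatz functions
    phi_j = 1 on the j-th element (illustrative choice) and integrated test functions psi_i."""
    h = 1.0 / n
    u = (np.arange(quad_points) + 0.5) / quad_points            # midpoint rule on [0,1]
    # psi_i(u) = int_0^u phi_i(s) ds = clamp(u - (i-1)h, 0, h)
    psi = np.clip(u[None, :] - np.arange(n)[:, None] * h, 0.0, h)
    # K_ij = int_0^1 psi_j(u) psi_i(u) du, since int_0^u phi_j(s) ds = psi_j(u)
    return psi @ psi.T / quad_points

K1 = assemble_K1(4)     # compare with the matrixplot of the one-dimensional case in Fig. 2
```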
If the exact copula is replaced by the empirical copula, we obtain a noisy repre-
sentation Cδ with
$$C_i^{\delta} = \int_\Omega \hat C(\mathbf{u})\,\psi_i(\mathbf{u})\,d\mathbf{u}, \qquad i = 1, \dots, N \qquad (10)$$
For the approximated copula
$$C_h(\mathbf{u}) = \int_0^{\mathbf{u}} c_h(\mathbf{s})\,d\mathbf{s}$$
we choose the test functions as integrated ansatz functions, such that the approximated copula
$$C_h(\mathbf{u}) = \sum_{j=1}^{N} c_j\,\psi_j(\mathbf{u})$$
$$\psi_i(\mathbf{u}) = \int_0^{\mathbf{u}} \varphi_i(\mathbf{s})\,d\mathbf{s}. \qquad (12)$$
In contrast to finite element discretizations, the system matrix K is not sparse and the
system size N = n d grows exponentially with the dimension d. A straightforward
assembling and solving of the linear system (8) becomes impossible for usual dis-
cretizations n. Even in the three-dimensional case, the matrix storage of the system
matrix for n = 80 needs approximately one terabyte, even when exploiting sym-
metry, and computing times for assembling and solving such systems will become
enormous.
The choices (11) and (12) yield a structure of the N × N system matrix K ,
illustrated in Fig. 2, allowing us to solve (8) also for d > 2. The matrixplot shows
that the n × n system matrix of the one-dimensional case is equivalent to the upper
left n × n corners of the two- and three-dimensional matrices. Moreover, the other
parts of the system matrices are scaled replications of the one-dimensional n × n matrix.
Fig. 2 Matrixplots of the system matrix K for n = 4 and different dimensions d. a System matrix
for d = 1. b System matrix for d = 2. c System matrix for d = 3
This yields
$$\varphi_i(\mathbf{u}) = \prod_{k=1}^{d} \varphi_{i_k}(u_k) \qquad (13)$$
as well as
$$\psi_i(\mathbf{u}) = \prod_{k=1}^{d} \psi_{i_k}(u_k). \qquad (14)$$
We only formulate the main result allowing us to compute solutions of (8) also
for higher dimensions d. Details and proofs can be found in the working paper [14].
Theorem 1 The system matrix for the (d + 1)-dimensional case can be extracted from the one- and d-dimensional system matrices:
$$^{(d+1)}K = {}^{(1)}K \otimes {}^{(d)}K$$
Corollary 1 The system matrix $^{(d)}K$ is the d-fold Kronecker product of the n × n matrix $^{(1)}K$,
$$^{(d)}K = {}^{(1)}K \otimes {}^{(1)}K \otimes\cdots\otimes {}^{(1)}K \qquad (15)$$
and the inverse system matrix of the d-dimensional problem is the d-fold Kronecker product of the one-dimensional inverse system matrix
$$^{(d)}K^{-1} = {}^{(1)}K^{-1} \otimes {}^{(1)}K^{-1} \otimes\cdots\otimes {}^{(1)}K^{-1}.$$
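The Kronecker structure of Corollary 1 is easy to exploit in code: the d-dimensional system can be solved using only the n × n factor. The Python sketch below (illustrative only) builds $^{(d)}K$ with repeated Kronecker products and solves the system by applying $^{(1)}K^{-1}$ along each tensor axis.

```python
import numpy as np
from functools import reduce

def kron_power(K1, d):
    """(d)K as the d-fold Kronecker product of the one-dimensional matrix (1)K."""
    return reduce(np.kron, [K1] * d)

def solve_kron(K1, rhs, d):
    """Solve (d)K c = rhs using only the n x n factor: the inverse of a Kronecker
    product is the Kronecker product of the inverses (Corollary 1)."""
    n = K1.shape[0]
    K1inv = np.linalg.inv(K1)
    c = rhs.reshape([n] * d)
    for axis in range(d):
        c = np.tensordot(K1inv, c, axes=([1], [axis]))   # apply K1^{-1} along one tensor axis
        c = np.moveaxis(c, 0, axis)                      # restore the axis ordering
    return c.reshape(-1)
```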
which also reduces the numerical effort. In higher dimensions, the number of ele-
ments ei with zero values grows, such that using Eq. (16) instead of (9) improves the
running times.
In the practically most relevant case, where the components of the right-hand side (10) are evaluated for the empirical copula (2), the numerical effort can be radically reduced, because the d-dimensional integral factorizes by Eqs. (13) and (14):
$$C_i^{\delta} = \int_\Omega \hat C(\mathbf{u})\,\psi_i(\mathbf{u})\,d\mathbf{u} = \frac{1}{T}\sum_{j=1}^{T}\int_\Omega\prod_{k=1}^{d}\mathbf{1}_{\{\hat U_{kj} \le u_k\}}\,\psi_{i_k}(u_k)\,d\mathbf{u} = \frac{1}{T}\sum_{j=1}^{T}\prod_{k=1}^{d} I^{k}_{ij}. \qquad (17)$$
In this case, the numerical effort is of order $O(NTd)$, which is an extreme improvement over $O\big(N\,3^d\,T + \frac{N^2+N}{2}\,3^d\big)$ if the d-dimensional integrals (10) are numerically computed by a usual $3^d$-point Gauss formula. We want
to point out that the computation of the right-hand side (10) for the empirical copula
based on formula (17) is still possible for d = 9, whereas the computational effort for
computing (16) for an arbitrary given copula C is exorbitant, even if the discretization
size n is moderately chosen. The numerical effort is illustrated in Table 1.
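A two-dimensional illustration of the factorization (17), again with the indicator-type ansatz functions of the previous sketch as a stand-in for the paper's exact choice:

```python
import numpy as np

def one_dim_integrals(U_row, n, quad_points=2000):
    """I[i, j] = int_0^1 1_{U_row[j] <= u} * psi_i(u) du for the illustrative
    piecewise constant ansatz (psi_i(u) = clamp(u - (i-1)h, 0, h))."""
    h = 1.0 / n
    u = (np.arange(quad_points) + 0.5) / quad_points
    psi = np.clip(u[None, :] - np.arange(n)[:, None] * h, 0.0, h)     # n x Q
    ind = (U_row[None, :] <= u[:, None]).astype(float)                # Q x T
    return psi @ ind / quad_points                                    # n x T

def rhs_empirical_2d(U_hat, n):
    """Right-hand side (10) via the factorization (17) for d = 2."""
    T = U_hat.shape[0]
    I1 = one_dim_integrals(U_hat[:, 0], n)
    I2 = one_dim_integrals(U_hat[:, 1], n)
    # C_delta[i1, i2] = (1/T) sum_j I1[i1, j] * I2[i2, j]
    return (I1 @ I2.T / T).reshape(-1)
```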
Note that, contrary to what might be expected, the vector $c = (c_1, \dots, c_N)^T$ does not count the number of samples in the elements, even though the approximated solution $c_h$ is a piecewise constant function on the elements; the Petrov–Galerkin projection is not simple counting (for more details see [15]).
2.1 Examples
In order to illustrate the computing times and approximation quality, we use the independence copula
$$C(\mathbf{u}) = \prod_{k=1}^{d} u_k,$$
which has the exact solution $c(\mathbf{u}) = 1$. Please note that for this example we used the exact copula as the right-hand side, without generating samples. So there is no data noise, hence δ = 0, which allows us to separate the approximation error from the ill-posedness resulting from the discontinuity of the inverse operator $A^{-1}$.
Many authors (see, for example, [11]) look at the integrated square error, which is
the squared L 2 -norm of the difference between the copula density and its approxima-
tion. For the independent copula, the integrated square error can easily be computed
$$\mathrm{ISE}(c, c_h) = \|c - c_h\|^2_{L^2(\Omega)} = \frac{1}{N}\,\big\|\mathbf{c} - (1, 1, \dots, 1)^T\big\|^2_{l^2}.$$
Actually, this error measure is unsuitable, because the natural space for densities is L¹ instead of L² (see [3]), and so we measure the difference in the L¹-norm, which can also be easily computed for the independence copula:
$$\|c - c_h\|_{L^1(\Omega)} = \frac{1}{N}\,\big\|\mathbf{c} - (1, 1, \dots, 1)^T\big\|_{l^1}.$$
The condition number of the system matrix K is the condition number of the one-dimensional system matrix $^{(1)}K$ to the power of d.
Naturally, our proposed method works not only for the rather simple independence
copula, it also works quite well for all typical copula families. The approximation
error for noise-free right-hand sides can be neglected. Figures 3 and 4 show the reconstructed densities for the Student and Frank copula, using exact data for the right-hand side. In [14], numerical results for other copula families, like the Gaussian, Gumbel, or Clayton copula, can also be found. However, ill-posedness is expected when empirical copulas are used and we are faced with data noise, which we discuss in
the next section.
Note that in real problems, the copula C is not known and we only have noisy data
(10) instead of (9). In order to illustrate the expected numerical instabilities, we have
simulated T samples for each two-dimensional copula and present the nonparametric
reconstructed densities using the Petrov–Galerkin projection with grid size n = 50.
A typical problem of ill-posed inverse problems is that the numerical instability decreases if the grid size n decreases, which can also be seen in Table 1. Therefore, we fix the grid size n = 50 and look at the influence of the sample size T.
Because of (3), the data noise δ increases if T decreases. Figures 5 and 6 show the expected ill-posedness appearing for decreasing sample size T. Of course, these instabilities also occur for the other copula families, but we restrict our illustration
here to these two examples. More examples can be found in [14].
To overcome the ill-posedness, an appropriate regularization for the discretized
problem (8) is required. Figures 7 and 8 show the reconstructed copula densities for
T = 1,000 and T = 10,000 samples using the well-known Tikhonov regularization.
There is no regularization if the regularization parameter α = 0 is chosen. The left-hand side of the figures shows the unregularized solutions. The choice of the regularization parameter α = 10⁻⁸ is very naive and arbitrary and serves only as a demonstration of how the instability can be handled. A better parameter choice should improve the reconstructed densities. It is future work to discuss an appropriate parameter choice rule for Tikhonov regularization as well as other regularization methods.
In order to avoid the complete assembling of the system matrix K leading to
high-dimensional systems for d > 2, we are interested in regularization methods
using the special structure (15). In particular, all regularization methods based on the singular value or eigenvalue decomposition of K can be easily handled, because the eigenvalue decomposition of the one-dimensional matrix $^{(1)}K = V\Lambda V^T$ leads to the eigenvalue decomposition of the system matrix
$$K = (V\otimes\cdots\otimes V)\,(\Lambda\otimes\cdots\otimes\Lambda)\,\big(V^T\otimes\cdots\otimes V^T\big).$$
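Tikhonov regularization can therefore be applied without ever assembling K, using only the eigendecomposition of the one-dimensional factor. The Python sketch below (symmetric $^{(1)}K$ assumed) is one possible way to organize this computation, not the authors' implementation.

```python
import numpy as np

def tikhonov_kron(K1, rhs, d, alpha):
    """Tikhonov-regularized solution of (d)K c = rhs using only the eigenvalue
    decomposition (1)K = V Lambda V^T (illustrative sketch, symmetric K1 assumed)."""
    lam, V = np.linalg.eigh(K1)
    n = K1.shape[0]
    R = rhs.reshape([n] * d)
    # transform the right-hand side into the eigenbasis (apply V^T along every axis)
    for axis in range(d):
        R = np.moveaxis(np.tensordot(V.T, R, axes=([1], [axis])), 0, axis)
    # eigenvalues of (d)K are products of the one-dimensional eigenvalues
    D = lam.copy()
    for _ in range(d - 1):
        D = np.multiply.outer(D, lam)
    R = R * D / (D**2 + alpha)          # Tikhonov filter factors lambda/(lambda^2 + alpha)
    # transform back into the original basis
    for axis in range(d):
        R = np.moveaxis(np.tensordot(V, R, axes=([1], [axis])), 0, axis)
    return R.reshape(-1)
```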
A typical property of Tikhonov regularization is that true peaks in the density will be
smoothed. This effect appears in particular for the Student copula density. Hence, the
reconstruction quality should be improved if other regularization methods are used. In inverse problem theory, it is well known that Tikhonov regularization is accompanied by an L²-norm penalization of the regularized solutions. Therefore, L¹ penalties or total variation penalties (see [7]) seem more suitable.
Furthermore, the approximated copula
$$C_h(\mathbf{u}) = \int_0^{\mathbf{u}} c_h(\mathbf{s})\,d\mathbf{s} = \sum_{j=1}^{N} c_j\,\psi_j(\mathbf{u})$$
should yield the typical properties of copulas. For example, the requirement
$$C_h(1, \dots, 1) \overset{!}{=} 1$$
yields the condition $\sum_{j=1}^{N} c_j = 1$, and the requirements
$$C_h(1, \dots, 1, u_k, 1, \dots, 1) \overset{!}{=} u_k, \qquad k = 1, \dots, d,$$
lead to additional conditions on the vector c, which altogether can be used to build problem-specific regularization methods.
Open Access This chapter is distributed under the terms of the Creative Commons Attribution
Noncommercial License, which permits any noncommercial use, distribution, and reproduction in
any medium, provided the original author(s) and source are credited.
References
1. Anderssen, R.S., Hegland, M.: For numerical differentiation, dimensionality can be a blessing!.
Math. Comput. 68(227), 1121–1141 (1999)
2. Deheuvels, P.: Non parametric tests of independence. In: Raoult J.P. (ed.) Statistique non
Paramétrique Asymptotique. Lecture Notes in Mathematics, vol. 821, pp. 95–107. Springer,
Berlin Heidelberg (1980). doi:10.1007/BFb0097426
3. Devroye, L., Györfi, L.: Nonparametric Density Estimation: the L1 View. Wiley Series in
Probability and Mathematical Statistics. Wiley, New York (1985)
4. Engl, H., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Mathematics and Its
Applications. Springer, New York (1996)
5. Grossmann, C., Roos, H., Stynes, M.: Numerical Treatment of Partial Differential Equations.
Universitext. Springer, Berlin (2007)
6. Kauermann, G., Schellhase, C., Ruppert, D.: Flexible copula density estimation with penalized
hierarchical b-splines. Scand. J. Stat. 40(4), 685–705 (2013)
7. Koenker, R., Mizera, I.: Density estimation by total variation regularization. Adv. Stat. Model.
Inference pp. 613–634 (2006)
8. Mai, J., Scherer, M.: Simulating Copulas: Stochastic Models, Sampling Algorithms and Applications. Series in Quantitative Finance. Imperial College Press, London (2012)
9. McNeil, A., Frey, R., Embrechts, P.: Quantitative Risk Management: Concepts, Techniques and Tools. Princeton Series in Finance. Princeton University Press, USA (2010)
10. Nelsen, R.B.: An Introduction to Copulas (Springer Series in Statistics). Springer, New York
(2006)
11. Qu, L., Qian, Y., Xie, H.: Copula density estimation by total variation penalized likelihood.
Commun. Stat.—Simul. Comput. 38(9), 1891–1908 (2009). doi:10.1080/03610910903168587
12. Qu, L., Yin, W.: Copula density estimation by total variation penalized likelihood with linear
equality constraints. Computat. Stat. Data Anal. 56(2), 384–398 (2012). doi: http://dx.doi.org/
10.1016/j.csda.2011.07.016
13. Schuster, T., Kaltenbacher, B., Hofmann, B., Kazimierski, K.: Regularization Methods in Banach Spaces. Radon Series on Computational and Applied Mathematics. Walter de Gruyter, Berlin (2012)
14. Uhlig, D., Unger, R.: A Petrov-Galerkin projection for copula density estimation. Techni-
cal report, TU Chemnitz, Department of Mathematics (2013). http://www.tu-chemnitz.de/
mathematik/preprint/2013/PREPRINT.php?year=2013&num=07
15. Uhlig, D., Unger, R.: The Petrov-Galerkin projection for copula density estimation isn’t
counting. Technical report, TU Chemnitz, Department of Mathematics (2014). http://www.
tu-chemnitz.de/mathematik/preprint/2014/PREPRINT.php?year=2014&num=03