Readings in
Formal
Epistemology
Sourcebook
Springer Graduate Texts in Philosophy
Volume 1
The Springer Graduate Texts in Philosophy series offers self-contained textbooks aimed at the graduate level, covering all areas of philosophy from classical philosophy to contemporary topics in the field. The texts will, in general, include teaching aids (such as exercises and summaries) and cover the range from graduate-level introductions to advanced topics in the field. The publications in this series offer volumes with a broad overview of theory in core topics in the field and volumes with comprehensive approaches to a single key topic in the field. Thus, the series offers publications for both general introductory courses and courses focused on a sub-discipline within philosophy.
The series publishes:
• Volumes from all of the philosophical traditions
• Sourcebooks, lecture notes for advanced-level courses, and textbooks covering specialized topics
• Interdisciplinary introductions – where philosophy overlaps with other scientific or practical areas
We aim to make a first decision within 1 month of submission. In case of a
positive first decision the work will be provisionally contracted: the final decision
about publication will depend upon the result of the anonymous peer review of
the complete manuscript. We aim to have the complete work peer-reviewed within
3 months of submission. Proposals should include:
• A short synopsis of the work or the introduction chapter
• The proposed Table of Contents
• CV of the lead author(s)
• List of courses for possible course adoption.
The series discourages the submission of manuscripts that are below 65,000
words in length.
For inquiries and submissions of proposals, authors can contact Ties.
[email protected]
Readings in Formal
Epistemology
Sourcebook
Assistant Editors
Henrik Boensvang
Rasmus K. Rendsvig
Editors

Horacio Arló-Costa (deceased)

Johan van Benthem
University of Amsterdam, Amsterdam, The Netherlands
Stanford University, Stanford, United States

Vincent F. Hendricks
Center for Information and Bubble Studies, University of Copenhagen, Copenhagen, Denmark

Assistant Editors

Henrik Boensvang
University of Copenhagen, Copenhagen, Denmark

Rasmus K. Rendsvig
Lund University, Lund, Sweden
Center for Information and Bubble Studies, University of Copenhagen, Copenhagen, Denmark
“Formal epistemology” is a term coined in the late 1990s for a new constellation
of interests in philosophy, merging traditional epistemological concerns with new
influences from surrounding disciplines like linguistics, game theory, and computer
science. Of course, this movement did not spring to life just then. Formal epistemo-
logical studies may be found in the classic works of Carnap, Hintikka, Levi, Lewis,
Kripke, Putnam, Quine, and many others.
Formal epistemology addresses a growing agenda of problems concerning
knowledge, belief, certainty, rationality, deliberation, decision, strategy, action, and
agent interaction – and it does so using methods from logic, probability theory,
computability theory, decision theory, game theory, and elsewhere. The use of these
formal tools is to rigorously formulate, analyze, and sometimes solve important
issues of interest to philosophers but also to researchers in other disciplines, from the
natural sciences and humanities to the social and cognitive sciences and sometimes
even the realm of technology. This makes formal epistemology an interdisciplinary
endeavor practiced by philosophers, logicians, mathematicians, computer scientists,
theoretical economists, social scientists, cognitive psychologists, etc.
Although a relative newcomer, formal epistemology is already establishing
itself in research environments and university curricula. There are conferences,
workshops, centers, and jobs in formal epistemology, and several institutions offer
courses or seminars in the field.
Yet no volume is in existence comprising canonical texts that define the field
by exemplars. Lecturers and students are forced to collect influential classics
and seminal contemporary papers from uneven sources, some of them hard to
obtain even for university libraries. There are excellent anthologies in mainstream
epistemology, but these are not tuned to new fruitful interactions between the
mainstream and a wider spectrum of formal approaches.
Readings in Formal Epistemology is intended to remedy this situation by
presenting some three dozen key texts, divided into five subsections: Bayesian
Epistemology, Belief Change, Decision Theory, Logics of Knowledge and Belief,
and Interactive Epistemology. The selection made is by no means complete but
On the way to compiling this volume, we have been assisted by many people. Jeffrey
Helzner and Gregory Wheeler gave us valuable suggestions for texts to include. We
are grateful for their assistance in the selection process. Many authors represented
in this volume provided us with essential copies of their papers while also giving
important input on the organization of this collection. We thank them for their kind
help. We would have liked to have included even more seminal papers, but due
to limitations of space, and the fact that some copyrights were either impossible to trace or too expensive to obtain, we ended up with the current selection. We are
furthermore indebted to Springer Science and Business Media for taking on this
project, especially Ties Nijssen, Christi Lue, and Werner Hermens. The editors
also acknowledge the generous funding provided by the Elite Research Prize from
the Danish Ministry of Science, Technology, and Innovation awarded to Vincent F.
Hendricks in 2008.
Finally, this volume would not have seen the light of day without the constant
efforts of Henrik Boensvang and Rasmus K. Rendsvig in communicating with
relevant parties, collecting the required permissions, and compiling all the papers
patiently and efficiently while paying painstaking attention to detail. In the process,
they have more than earned the right to the title of assistant editors of Readings in
Formal Epistemology.
Copyright Acknowledgments
Helzner, J. and Hendricks, V.F. (2010). “Agency and Interaction: What we are and what we do in
formal epistemology,” Journal for the Indian Council of Philosophical Research, vol. XXVII:2,
2010: 44–71, special issue on Logic and Philosophy Today, guest edited by Amitabha Gupta
and Johan van Benthem.
Ramsey, F.P. (1926) “Truth and Probability,” in Ramsey, F.P. (1931), The Foundations of Math-
ematics and other Logical Essays, Ch. VII, p.156–198, edited by R.B. Braithwaite, London:
Kegan, Paul, Trench, Trubner & Co., New York: Harcourt, Brace and Company.
Jeffrey, R.C. (1968), “Probable Knowledge,” in The Problem of Inductive Logic, ed. I. Lakatos,
166–180, Amsterdam: North-Holland. Courtesy of Edith Jeffrey.
Van Fraassen, B.C. (1995) “Fine-Grained Opinion, Probability, and the Logic of Full Belief.”
Journal of Philosophical Logic 24 (4).
Gaifman, H. (1986) “A theory of higher order probabilities,” in Theoretical Aspects of Reasoning
About Knowledge: Proceedings of the 1986 conference on Theoretical aspects of reasoning
about knowledge, pp. 275—292, Morgan Kaufmann Publishers Inc. (Monterey, California).
Levi, I. (1974), “On Indeterminate Probabilities,” The Journal of Philosophy, 71, 391–418.
Glymour, C. (1981) “Why I’m not a Bayesian,” excerpt from Glymour, C. (1981) Theory and
Evidence, Chicago University Press, 63–93.
Skyrms, B. (1993) “A Mistake in Dynamic Coherence Arguments – Discussion.” Philosophy of
Science, 60(2):320–328.
Arntzenius, F. (2003), “Some problems for conditionalization and reflection,” The Journal of
Philosophy, Vol. C, No. 7, 356–371.
M. J. Schervish, T. Seidenfeld and J. B. Kadane (2004) “Stopping to Reflect,” The Journal of
Philosophy, Vol. 101, No. 6, 315–322.
Alchourron, C.E., Gardenfors, P., and Makinson, D. (1985) “On the Logic of Theory Change:
Partial Meet Contraction and Revision Functions.” Journal of Symbolic Logic, 50(2): 510–530.
Reprinted with the permission of the copyright holders, the Association of Symbolic Logic.
Hansson, S.O. (1993) “Theory Contraction and Base Contraction Unified.” Journal of Symbolic
Logic, 58(2): 602–625. Reprinted with the permission of the copyright holders, the Association
of Symbolic Logic.
Levi, I. How Infallible but Corrigible Full Belief is Possible, hitherto unpublished.
Rott, H. (1993) “Belief Contraction in the Context of the General Theory of Rational Choice.”
Journal of Symbolic Logic, 58(4): 1426–1450. Reprinted with the permission of the copyright
holders, the Association of Symbolic Logic.
Spohn, W. (2009) “A Survey of Ranking Theory.” In Franz Huber and Christoph Schmidt-Petri
(eds.). Degrees of Belief. Dordrecht: Springer.
Savage, L. (1972) “Allais’s Paradox” The Foundations of Statistics, Dover Publications, Inc., New
York, 101–103.
Seidenfeld, T. (1988), “Decision Theory without ‘Independence’ or without ‘Ordering’: What is
the Difference,” Economics and Philosophy, 4: 267–290.
Gilboa, I. and M. Marinacci (forthcoming) “Ambiguity and the Bayesian Paradigm,” Advances
in Economics and Econometrics: Theory and Applications, Tenth World Congress of the
Econometric Society.
Schervish, M.J., Seidenfeld, T. and Kadane, J.B. (1990) “State-Dependent Utilities,” Journal of
the American Statistical Association, Vol. 85, No. 411, 840–847.
Gibbard, A. and Joyce, J.M. (1998) “Causal Decision Theory.” In Salvador Barberà, Peter J.
Hammond, and Christian Seidl, eds., Handbook of Utility Theory, Vol. 1: Principles, pp. 701–
740. Dordrecht & Boston: Kluwer.
Tversky, A. and Kahneman, D. (1992). “Advances in prospect theory: Cumulative representation
of uncertainty.” Journal of Risk and Uncertainty 5: 297–323.
Hintikka, J. (2007) “Epistemology without Knowledge and without Belief” in Socratic Epistemol-
ogy: Explorations of Knowledge-Seeking by Questioning, Cambridge University Press.
Dretske, F.I. (1970) “Epistemic Operators.” The Journal of Philosophy, Vol. 67, No. 24, 1007–
1023.
Lewis, D. (1996) “Elusive Knowledge,” Australasian Journal of Philosophy, Vol. 74(4), 549–567.
Courtesy of Stephanie Lewis.
Nozick, R. (1981) “Knowledge and Skepticism” In Philosophical Explanations, Harvard Univer-
sity Press, 167–169, 172–179, 197–211, 679–690. Reprinted by permission of the publisher
from “Knowledge and Skepticism,” in PHILOSOPHICAL EXPLANATIONS by Robert
Nozick, pp. 167–169, 172–179, 197–211, 679–690, Cambridge, Mass.: The Belknap Press of
Harvard University Press, Copyright © 1981 by Robert Nozick.
Stalnaker, R. (2006) “On Logics of Knowledge and Belief,” Philosophical Studies 120, 169–199.
Parikh, R. (2008) “Sentences, Belief and Logical Omniscience, or What Does Deduction Tell Us?.”
The Review of Symbolic Logic, 1(4). Reprinted with the permission of the copyright holders,
the Association of Symbolic Logic.
Artemov, S.N. (2008) “The logic of justification.” The Review of Symbolic Logic, 1(4):477–513.
Reprinted with the permission of the copyright holders, the Association of Symbolic Logic.
Kelly, K. (2004) “Learning Theory and Epistemology” in Handbook of Epistemology, I. Niiniluoto,
M. Sintonen, and J. Smolenski (eds.), Dordrecht: Kluwer.
Williamson, T. (2004) “Some Computational Constraints in Epistemic Logic,” in Logic, Epistemol-
ogy and the Unity of Science, S. Rahman et al (eds). Dordrecht: Kluwer Academic Publishers:
437–456.
Lewis, D. (1969) Convention: A Philosophical Study, Harvard University Press, 24–42 (excerpt).
Courtesy of Stephanie Lewis.
Barwise, J. (1988) “Three Views of Common Knowledge.” In Proc. TARK’88: 365–379, Morgan
Kaufmann Publishers.
Baltag, A. and Smets, S. (2008) “A Qualitative Theory of Dynamic Interactive Belief Revision,”
in G. Bonanno, W. van der Hoek, M. Wooldridge (eds.), Logic and the Foundations of Game
and Decision Theory, Texts in Logic and Games, Vol 3, 9–58, Amsterdam University Press.
Aumann, R. (1976) “Agreeing to Disagree,” Annals of Statistics 4, 1236–1239.
Aumann, R. and Brandenburger, A. (1995) “Epistemic Conditions for Nash Equilibrium,” Econo-
metrica, Vol. 63, No. 5,1161–1180.
Stalnaker, R. (1996), “Knowledge, belief and counterfactual reasoning in games,” Economics and
Philosophy 12: 133–163
Halpern, J.Y., (2001) “Substantive Rationality and Backward Induction,” Games and Economic
Behavior, Elsevier, vol. 37(2), 425–435.
Chapter 1
Agency and Interaction: What We Are and What We Do in Formal Epistemology

J. Helzner
AIG, New York, NY, USA
e-mail: [email protected]

V.F. Hendricks
Center for Information and Bubble Studies, University of Copenhagen, Copenhagen, Denmark
e-mail: [email protected]

Introduction
Probability
Probabilities are very useful in formal epistemology. They are used to measure key epistemic components like belief and degrees thereof, as well as the strength of expectations and predictions; they may be used to describe actual occurrent frequencies in nature or in the agent's environment; and of course probabilities play a paramount role in accounting for notions of (Bayesian) confirmation (Earman 1992). Current cognitive models apply probabilities to represent aggregated experience in tasks involving language acquisition, problem solving and inductive learning, conditionalization, and the updating of beliefs and scientific hypotheses.
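For orientation (this gloss is ours, not part of the original chapter), the rule of conditionalization mentioned here is standardly written as follows, where P is the agent's probability measure, H a hypothesis and E the total evidence learned:

\[
P_{\mathrm{new}}(H) \;=\; P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)}, \qquad P(E) > 0 .
\]

The second equality is Bayes' theorem; it is what makes conditionalization the natural update rule for Bayesian confirmation, since the posterior weighs a hypothesis by its prior and by how well it predicts the evidence.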
What sorts of internal states are essential to the agent’s representation of its
environment? Various doxastic notions e.g., according to which the agent simply
believes or is certain of propositions, in contrast to believing the proposition to
some degree, are a traditional interest within mainstream epistemology. Some
philosophers, e.g. Jeffrey (1992), have argued in favor of a doctrine known as radical
probabilism. A central tenet of this doctrine is that various propositional attitudes
of epistemic interest, especially full belief, are reducible to credal judgments. There
are several ways that one might attempt such a reduction. Perhaps the most obvious
is to identify full belief with maximal partial belief. For example, if it is assumed
that the agent’s credal state can be represented by a probability measure, then such
a reduction would identify those propositions that are fully believed by the agent
with those propositions that have maximal probability according to this representing
measure. Following this proposal, it would seem that a proposition counts as a
serious possibility for the agent just in case its negation is not assigned maximal
probability according to the probability measure representing the agent’s credal
judgments. Hence, by the probability axioms, a proposition counts as seriously
possible for the agent just in case it has nonzero probability under the representing
measure. This leads to certain difficulties. For example, if the agent is concerned
to estimate the height of an object that is sufficiently distant, then the agent might
regard a continuum of values as possible – e.g., the height of the object is judged
to be between three and four feet. According to the suggested reduction, such a
continuum of possible values for the height of the object could not serve as a set of
serious possibilities, since it is a mathematical fact that no probability measure can
distribute positive probability to each element of such a continuum. The interested
reader is urged to consult van Fraassen (1995) and Arlo-Costa (2001) for more
sophisticated versions of probabilism.
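The mathematical fact appealed to in the last sentence can be verified in a few lines; the following standard argument is added here only for the reader's convenience. Write $C$ for the continuum of candidate values and suppose every point of $C$ received positive probability. Setting

\[
C_n \;=\; \{\, x \in C : P(\{x\}) > \tfrac{1}{n} \,\}, \qquad C \;=\; \bigcup_{n \ge 1} C_n ,
\]

some $C_n$ would have to be infinite (a countable union of finite sets is countable, while $C$ is not); but any $n+1$ distinct points of $C_n$ already carry total probability greater than $1$, contradicting $P(C) \le 1$. Hence at most countably many points can receive positive probability.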
Following Levi (1980), one may assume that the agent is, at each point in time,
committed to full belief in some set of propositions concerning its environment.
Where the agent is not committed to full belief in a given proposition, the negation
of that proposition is a serious possibility for the agent. The agent may judge some
serious possibilities to be more probable than others. What can be said about these
judgments? The received view, following a tradition that goes back to the work of
Ramsey (1931), maintains that such credal judgments ought to be representable by
a probability measure. This view has been criticized as being too weak and as being
too strong. As for being too weak, the simple requirement that such judgments be
representable by a probability measure says little about the extent to which these
subjective probabilities should approximate objective probabilities, e.g., limiting
frequencies in the sense of von Mises (1957) or perhaps even propensities in the
sense of Popper (1959). Various principles have been offered in order to require
that the subjective probabilities of a rational agent are informed by that agent’s
knowledge of objective probabilities (Kyburg 1974; Levi 1978; Lewis 1980). As
for being too strong, requiring credal judgments to be representable by a probability
measure implies, among other things, that such credal judgments are complete. A
consequence of such a requirement is that, for any given pair of serious possibilities,
the agent either judges one of the possibilities to be more probable than the other or
the agent regards the possibilities as being equally probable. Thus, the requirement
bars situations in which the agent, because of a lack of information, is unable to
supply such a judgment. Such considerations, which to some extent echo earlier,
related concerns of Keynes (1921) and Knight (1921), have motivated some –
e.g., Kyburg (1968), Levi (1974) and Walley (1990) – to consider indeterminate
probabilities, either in the form of interval-valued measures or sets of traditional
measures, in representing rational credences.
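As a schematic illustration of how such indeterminacy is modeled (our gloss, not a summary of any one of the cited works), let the credal state be a set $\mathcal{P}$ of probability measures rather than a single measure, with lower and upper probabilities

\[
\underline{P}(A) \;=\; \inf_{P \in \mathcal{P}} P(A), \qquad \overline{P}(A) \;=\; \sup_{P \in \mathcal{P}} P(A).
\]

If, for instance, every $P$ with $0.2 \le P(A) \le 0.7$ is permissible, the agent neither judges $A$ more probable than its negation, nor the reverse, nor judges them equally probable: different permissible measures rank them differently, which is precisely the incompleteness a single measure would rule out.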
Belief Change
As already hinted, some probability theorists tend to think that belief, as opposed
to knowledge, may be good enough for action, deliberation and decision. Thus
beliefs may suffice as they can serve important epistemic purposes while holding the
information, expectations and conjectures that agents act on. Beliefs may be used
for making creative leaps beyond what is logically implied by available information,
evidence or knowledge, and are crucial in agent interaction models representing what agents think about moves, strategies, payoffs and beliefs of other agents, and what agent rationality amounts to. Finally, beliefs and belief revision are prime vehicles
for understanding the mechanism of learning by trial-and-error, one of the main
motors of scientific inquiry in general.
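One influential way of making such revision precise is the AGM framework of Alchourrón, Gärdenfors and Makinson (1985), reprinted in this volume's part on belief change. In that framework a belief state is a logically closed set of sentences $K$, and revision can be defined from contraction by the Levi identity (a standard formulation, stated here only as a pointer):

\[
K \ast \varphi \;=\; (K \div \neg\varphi) + \varphi ,
\]

where $K \div \neg\varphi$ contracts $K$ so that $\neg\varphi$ is no longer believed, and $K + \varphi = \mathrm{Cn}(K \cup \{\varphi\})$ is expansion by $\varphi$ followed by logical closure. The AGM postulates then constrain which contraction and revision operators count as rational.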
Initially, an agent has beliefs about the environment with which it interacts.
Sometimes these interactions are such that the agent, on pain of irrationality, must
revise its beliefs. The classic example is that of a scientific agent who has beliefs
about the world that might need to be revised in light of new data. The study
of this sort of example has a long history in the philosophy of science, where
it is often discussed at a relatively informal level in connection with topics such
Decision Theory
An agent interacts with its environment through the choices it makes. Choice
presupposes alternatives, and a theory of rational choice should, at least, distinguish
some of the available alternatives as admissible. As an example, consider those
accounts of rational choice that are built on the concept of preference. One such
account assumes that the agent has complete and transitive preferences over the
set of available alternatives. Those alternatives that are optimal with respect to the
given preference ranking are taken as admissible. This abstract preference-based
account says nothing about the way in which preferences are informed by the agent’s
beliefs about its environment. Subjective expected utility theory [SEU], which is at
the center of modern-day decision theory, provides significantly more detail than
the abstract theory of preference optimization. SEU assumes that alternatives are
acts, which, following Savage’s classic formulation of SEU in Savage (1972), are
functions from states to consequences. Drawing upon the earlier work of Ramsey
(1931) on subjective probability and the work of von Neumann and Morgenstern
(1947) on utility, Savage provides conditions on the agent’s preferences over acts
that guarantee the existence of a probability measure p and a utility function u such
that the agent’s preferences can be regarded as if they were the result of maximizing
utility u with respect to probability p. According to the intended interpretation,
the probability measure p represents the agent’s degrees of belief concerning the
possible states and the utility function u represents the extent to which the agent
values the possible consequences.
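In symbols (a standard gloss rather than Savage's own notation), with a finite state space $S$ the representation reads

\[
f \succsim g \quad\Longleftrightarrow\quad \sum_{s \in S} p(s)\, u\bigl(f(s)\bigr) \;\ge\; \sum_{s \in S} p(s)\, u\bigl(g(s)\bigr),
\]

where $f, g : S \to C$ are acts mapping states to consequences, $p$ the subjective probability over states and $u$ the utility over consequences. (Savage's theorem itself works with an infinite state space and a finitely additive $p$; the finite sum is used here only for readability.)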
The assumptions of SEU may be questioned in various ways. We focus on
two ways that have generated significant interest among philosophers. First, why
should it be that the rational agent’s degrees of belief can be represented by a
probability distribution p? As already noted, it is not clear why such an assumption
should obtain in cases where the agent has very little information concerning the
possible states. Second, in SEU it is assumed that the agent’s subjective probability
concerning the states is independent of the act that is chosen. Some question this
assumption and offer examples in which a modification of SEU that provides for
such dependencies, through the use of conditional probabilities, is supposed to give
an irrational recommendation. The first line of questioning has led some – e.g.,
Ellsberg (1961), Levi (1974, 1977), and Gardenfors and Sahlin (1982) – to use
indeterminate probabilities in their normative accounts of decision making under
uncertainty. The second line of questioning has led some – e.g., Gibbard and Harper
(1978), Lewis (1981), and Joyce (1999) – to investigate causal decision theory.
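The contrast between these responses can be indicated schematically; the formulas below are textbook glosses, not quotations from the papers collected here. Where Savage evaluates an act $a$ by $\sum_s p(s)\,u(a(s))$, the act-dependent (evidential) modification evaluates it by

\[
\sum_{s} p(s \mid a)\,u\bigl(a(s)\bigr),
\]

conditioning the state probabilities on the act chosen. Causal decision theorists replace $p(s \mid a)$ by a quantity tracking the causal influence of the act on the state, for instance Gibbard and Harper's probability of the counterfactual "if I were to do $a$, then $s$ would obtain", so that merely evidential correlations between acts and states do not drive the recommendation.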
What is now known as epistemic logic started with the study of proper axioma-
tizations for knowledge, belief, certainty and other epistemic attitudes. Hintikka
inaugurated the field with his seminal book Hintikka (1962) which focuses on
axiomatizing knowledge and belief in mainly mono-agent systems. Agents are syn-
tactically represented as indices on epistemic operators in a formal logical language.
From the semantic perspective, to be an agent is to be an index on an accessibility
relation between possible worlds representing the epistemic alternatives over which
the agent has to succeed in order to know some proposition (interesting alternative
semantics to Kripke semantics have been developed by Arlo-Costa and Pacuit 2006,
Baltag and Moss 2004 and others). Like many other philosophical logics in their
infancy, interesting axiomatizations governing the logics of knowledge and belief
took center stage in the beginning together with nailing down important logical
properties for these new logics. The field was living a somewhat isolated life remote
from the general concerns of mainstream epistemology. Hintikka himself (and a
few others like Lenzen 1978) was a notable exception and insisted on telling a
better story, not about what agents are in the logical language, but about what
they do and the meaning of epistemic axioms for epistemology (Stalnaker 2006).
Accordingly, Hintikka took axioms of epistemic logic to describe a certain kind
of strong rationality much in sync with the autoepistemological tradition of G.E.
Moore and especially Norman Malcolm. Axioms of epistemic logic are really
prescriptions of rationality in mono-agent systems. Epistemic logic has since been
used to address a number of important philosophical problems, including, for instance,
the Fitch Paradox (Brogaard and Salerno 2009), the problem of logical omniscience
(Duc 1997; Parikh 2005), and various conceptual characterizations of knowledge
and other epistemic attitudes (Kraus and Lehmann 1988).1
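For readers new to the formal apparatus, the possible-worlds clause just sketched is usually stated as follows (a standard textbook formulation):

\[
M, w \models K_i \varphi \quad\Longleftrightarrow\quad M, v \models \varphi \ \text{ for all } v \text{ such that } w R_i v ,
\]

so agent $i$ knows $\varphi$ at world $w$ just in case $\varphi$ holds throughout all of $i$'s epistemic alternatives to $w$. Axioms then correspond to conditions on $R_i$: for example, $K_i\varphi \rightarrow \varphi$ (truth) to reflexivity and $K_i\varphi \rightarrow K_iK_i\varphi$ (positive introspection) to transitivity.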
But rationality considerations are not only central to the singular agent acting
in some environment, call it nature, but likewise, and perhaps especially, central
to agents in the presence of other agents and interacting with them. Thus mono-
agent systems had to be extended to multi-modal systems in order to get both agency
and interaction off the epistemological ground for real. A sea-change took place in
epistemic logic in the late 1980s and the beginning of the 1990s especially due to the
work of Joseph Halpern and his collaborators (Fagin et al. 1995) and others (Meyer
and Hoek 1995). Multiple agents were introduced into the logical language which,
along with multiple epistemic accessibility relations on the semantic level, gave rise
to a precise and adequate representation of the flow of information through an agent
system, together with the nature of various protocols governing such systems. In
this setting, possible worlds are to be understood as the states of the system taken as
a whole, or sometimes the possible histories or consecutive runs of the system as a
whole, that are compatible with the state transition directives which rule the system.
Stalnaker has recently summarized the consequences of this sea-change precisely:

The general lesson I drew from this work was that it was useful for epistemology to think of communities of knowers, exchanging information and interacting with the world, as (analogous to) distributed computer systems. (Hendricks and Roy 2010: 78)

1 For solid overviews refer to De Bruin (2008) and Gochet and Gribomont (2006).
Interactive Epistemology
possible, agents have to agree to disagree (Aumann 1976). That is, agency in terms of players, interaction in terms of games.
On the way to this result Aumann made a host of assumptions about the nature of knowledge, much in tune with what is to be found in epistemic logic, such as the axiomatic strength of knowledge needed to infer the backward induction equilibrium and assumptions about what is common knowledge among the players (Halpern 2001). In 1999, Aumann coined a term for this kind of study in
theoretical economics: “Interactive epistemology” (Aumann 1999). It denotes an
epistemic program studying shared knowledge and belief given more than one
agent or player in an environment and has, as already suggested, strong ties to
game theoretic reasoning and questions of common knowledge and belief, backward
induction, various forms of game equilibria and strategies in games, (im)perfect
information games, (bounded) rationality, etc. (Aumann and Brandenburger 1995;
Stalnaker 1996, 2006).
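Aumann's agreement result itself can be stated compactly (the statement below is the standard one, given only as a signpost to the paper reprinted in this part): if two agents share a common prior $P$ and, at some state, it is common knowledge that agent 1's posterior probability for an event $E$ is $q_1$ and agent 2's is $q_2$, where the posteriors are computed by conditioning on the agents' respective information partitions, then

\[
q_1 = q_2 .
\]

Common knowledge of the posteriors forces the posteriors to coincide; the agents cannot agree to disagree.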
Given its inauguration with Aumann, the program was in the beginning dominated by scholars drawn from theoretical economics and computer science rather than philosophy and logic, but recently philosophers and logicians have begun to pay close attention to what is going on in this thriving program of formal epistemology. And for good reason too; social epistemology focuses on knowledge acquisition and justification in groups or institutions (Goldman 1999) and the extent to which institutions may be viewed as genuine agents (List and Pettit 2011), while the interactive epistemological approach to agency and interaction also has close ties to the major new focal points in dynamic epistemic logic (Benthem 2011), and much of the technical machinery is a common toolbox for both paradigms (Brandenburger 2007).
Formal Epistemology
devoted to the empirical reality of agency and interaction on the other. In formal
epistemology we are walking the fine line between theory and reality. This is as it
should be: The hallmark of a progressive research program.
This is an edited and reorganized version of the paper “Agency and Interaction: What
We Are and What We Do in Formal Epistemology”, Journal for the Indian Council
of Philosophical Research, nr. 2, vol. XXVII, 2010: 44–71, special issue on Logic and
Philosophy Today, guest edited by Amitabha Gupta and Johan van Benthem.
References
Alchourron, C., Gardenfors, P., & Makinson, D. (1985). On the logic of theory change: Partial
meet functions for contraction and revision. Journal of Symbolic Logic, 50, 510–530.
Arlo-Costa, H. (2001). Bayesian epistemology and epistemic conditionals: On the status of the
export-import laws. The Journal of Philosophy, 98(11), 555–593.
Arlo-Costa, H., & Levi, I. (2006). Contraction: On the decision-theoretical origins of minimal
change and entrenchment. Synthese, 152(1), 129–154.
Arlo-Costa, H., & Liu, H. (2011). Value-based contraction: A representation result. In Proceedings
of TARK’11. New York: ACM Digital Library (ACM DL).
Arlo-Costa, H., & Pacuit, E. (2006). First order classical modal logic. Studia Logica, 84(2), 171–
210.
Artemov, S., & Nogina, E. (2005). Introducing justification into epistemic logic. Journal of Logic
and Computation, 15, 1059–1073.
Aumann, R. (1976). Agreeing to disagree. The Annals of Statistics, 4(6), 1236–1239.
Aumann, R. (1999). Interactive epistemology I. International Journal of Game Theory, 28(3), 263–
300.
Aumann, R., & Brandenburger, A. (1995). Epistemic conditions for nash equilibrium. Economet-
rica, 63(5), 1161–1180.
Baltag, A., & Moss, L. (2004). Logics for epistemic programs. Synthese/Knowledge, Rationality
and Action, 139, 165–224.
Baltag, A., & Smets, S. (2008). A qualitative theory of dynamic interactive belief revision. In Logic
and the foundation of game and decision theory (LOFT7 2008) (Vol. 3, pp. 13–60). Amsterdam:
Amsterdam University Press.
Baltag, A., Moss, L. S., & Solecki, S. (2002). The logic of public announcements, common
knowledge, and private suspicion. In Proceedings of TARK 1998 (pp. 43–56). Los Altos:
Morgan Kaufmann Publishers.
Barwise, J. (1988). Three views of common knowledge. In Proceedings of the 2nd conference
on theoretical aspects of reasoning about knowledge (pp. 365–379). Pacific Grove: California.
Benthem, J. v. (2001). Games in dynamic epistemic logic. Bulletin of Economic Research, 53(4),
219–224.
Benthem, J. v. (2007). Logic games, from tools to models of interaction. In A. Gupta, R. Parikh, &
J. van Benthem (Eds.), Logic at the crossroads (pp. 283–317). Mumbai: Allied Publishers.
Benthem, J. v. (2011). Logical dynamics of information and interaction. Cambridge: Cambridge
University Press.
Brandenburger, A. (2007). The power of paradox: Some recent developments in interactive
epistemology. International Journal of Game Theory, 35(4), 465–492.
Brogaard, B., & Salerno, J. (2009). Fitch’s paradox of knowability. In E. Zalta (Ed.), The stanford
encyclopedia of philosophy. Palo Alto: Stanford University.
De Bruin, B. (2008). Epistemic logic and epistemology. In V. F. Hendricks & D. Prichard (Eds.),
New waves in epistemology. London: Palgrave Macmillan.
Duc, H. N. (1997). Reasoning about rational, but not logically omniscient agents. Journal of Logic
and Computation, 7, 633–648.
Earman, J. (1992). Bayes or bust? Cambridge: MIT.
Ellsberg, D. (1961). Risk, ambiguity, and the savage axioms. The Quarterly Journal of Economics,
75, 643–669.
Fagin, R., Halpern, J. Y., Moses, Y., & Vardi, M. Y. (1995). Reasoning about knowledge.
Cambridge: MIT.
Gardenfors, P., & Sahlin, N. E. (1982). Unreliable probabilities, risk taking, and decision making.
Synthese, 53, 361–386.
Gibbard, A., & Harper, W. (1978). Counterfactuals and two kinds of expected utility”. In C.
A. Hooker, J. J. Leach, & E. F. McClennen (Eds.), Foundations and applications of decision
theory (pp. 125–162). Dordrecht: Reidel.
Gierasimczuk, N. (2009). Learning by erasing in dynamic epistemic logic. In A. H. Dediu, A. M.
Ionescu, & C. Martín-Vide (Eds.), Language and automata theory and applications (Lecture notes in
computer science, Vol. 5457, pp. 362–373). Berlin/Heidelberg: Springer.
Gochet, P., & Gribomont, P. (2006). Epistemic logic. In Logic and the modalities in the twentieth
century (Handbook of the history of logic, pp. 99–185). Amsterdam: Elsevier.
Goldman, A. (1999). Knowledge in a social world. New York: Oxford University Press.
Halpern, J. Y. (2001). Substantive rationality and backwards induction. Games and Economic
Behavior, 37(2), 425–435.
Halpern, J. Y. (2003). Reasoning about uncertainty. Cambridge: MIT.
Hansson, S. O. (1993). Theory contraction and base contraction unified. Journal of Symbolic Logic,
58(2), 602–625.
Helzner, J., & Hendricks, V. F. (2011, forthcoming). Formal epistemology. In Oxford bib-
liography online. http://www.oxfordbibliographies.com/view/document/obo-9780195396577/
obo-9780195396577-0140.xml?rskey=a6uyIx&result=81
Hendricks, V. F. (2001). The convergence of scientific knowledge – A view from the limit.
Dordrecht: Springer.
Hendricks, V. F. (2006). Mainstream and formal epistemology. New York: Cambridge University
Press.
Hendricks, V. F., & Pritchard, D. (Eds.) (2007). Epistemology: 5 questions. New York: Automatic
Press/London: VIP.
Hendricks, V. F., & Roy, O. (Eds.) (2010). Epistemic logic: 5 questions. New York: Automatic
Press/London: VIP.
Hintikka, J. (1962). Knowledge and belief: An introduction to the logic of the two notions. Ithaca:
Cornell University Press.
Jeffrey, R. (1992). Probability and the art of judgment. New York: Cambridge University Press.
Joyce, J. (1999). The foundations of causal decision theory. Cambridge: Cambridge University
Press.
Kelly, K. T. (1996). The logic of reliable inquiry. New York: Oxford University Press.
Kelly, K. T. (2004). Learning theory and epistemology. In I. Niiniluoto, M. Sintonen, & J.
Smolenski (Eds.), Handbook of epistemology. Dordrecht: Kluwer.
Keynes, J. M. (1921). A treatise on probability. London: MacMillan.
Knight, F. H. (1921). Risk, uncertainty and profit. Boston/New York: Houghton-Mifflin.
Kraus, S., & Lehmann, D. (1988). Knowledge, belief and time. Theoretical Computer Science, 58,
155–174.
Kyburg, H. E. (1968). Bets and beliefs. American Philosophical Quarterly, 5(1), 54–63.
Kyburg, H. E. (1974). The logical foundations of statistical inference. Dordrecht: Reidel.
Lenzen, W. (1978). Recent work in epistemic logic (Acta philosophica fennica, Vol. 30, pp. 1–219).
Amsterdam: North-Holland
Levi, I. (1974). On indeterminate probabilities. The Journal of Philosophy, 71, 391–418.
Levi, I. (1977). Direct inference. The Journal of Philosophy, 74, 5–29.
Levi, I. (1978). In A. Hooker, J. J. Leach, & E. F. McClennen (Eds.), Foundations and applications
of decision theory (pp. 263–273). Dordrecht/Boston: D. Reidel.
Chapter 2
Introduction
There are various possible ways of articulating what Bayesian epistemology is and
how it relates to other branches of formal and mainstream epistemology. Following
the steps of Ramsey, Richard Jeffrey outlines in his article “Probable Knowledge” a
possible way of constructing an epistemology grounded on Bayesian theory. While
knowledge is a central notion in traditional epistemology (and in various branches of
formal epistemology) Jeffrey suggests an epistemology where knowledge does not
have the importance generally attributed to it. The idea is "[…] to try to make the concept of belief do the work that philosophers have generally assigned to the grander concept" (knowledge). Moreover, the notion of belief is pragmatically analyzed along the lines proposed by Ramsey: "the kind of measurement of belief with which probability is concerned is … a measurement of belief qua basis of action". The result of this move is to conceive the logic of partial belief as a branch
of decision theory. So, the first two essays in this section are also quite relevant for
the section of decision theory presented below (Ramsey’s essay contains the first
axiomatic presentation of decision theory). Both Jeffrey and Ramsey present the
foundations of an epistemology which is deeply intertwined with a theory of action.
This move has a behaviorist pedigree but perhaps the behavioral inspiration is not
an essential ingredient of an interpretation of the formal theory that thus arises.
The notion of certainty or full belief does not play a central role in Jeffrey’s
epistemology either. According to him we are only certain of mathematical and
logical truths and the truths related immediately to experience. The rest is the
domain of probable knowledge. To be coherent with this view Jeffrey has to propose
a modification of the classical notion of learning by conditioning on data (which
occupies a central role in various versions of Bayesian Epistemology as used in the
social sciences like economics or psychology). In fact, according to conditioning, when one learns a new piece of evidence this information acquires measure one, in spite of being based on perfectly fallible evidence. A modification of conditioning that permits updating on uncertain evidence is presented in "Probable Knowledge".
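The modified rule, now usually called Jeffrey conditionalization, can be stated as follows (a standard formulation, added here for reference): if experience shifts the probabilities of a partition $\{E_1, \dots, E_n\}$ to new values $P_{\mathrm{new}}(E_i)$ without making any cell certain, then

\[
P_{\mathrm{new}}(B) \;=\; \sum_{i=1}^{n} P_{\mathrm{old}}(B \mid E_i)\, P_{\mathrm{new}}(E_i) .
\]

Ordinary conditioning is the special case in which some $E_i$ receives probability 1.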
The version of Jeffrey’s article reprinted here contains as well comments by L.
Hurwicz and P. Suppes and responses by Jeffrey. Some of Suppes’ comments point
in the direction of constructing a theory of rationality that is sensitive to our cognitive
limitations. The possibility of constructing a “bounded” theory of rationality only
started with the seminal work of Herb Simon (Rational choice and the structure of
the environment, Psychological Review, Vol. 63 No. 2, 129–138. 1956) and is today
an active area of investigation in economics, psychology and philosophy.
The uses of Bayesianism in epistemology are usually dismissed by realist
philosophers for delivering a subjective picture of rationality that is not sufficiently sensitive to the way in which the behavior of rational agents is connected with the structure of the environment. Simon's work, nevertheless, was certainly sensitive to ecological considerations. And Ramsey's essay ends with programmatic ideas
differentiating what he called “the logic of consistency” from the “logic of truth”.
Even Bruno de Finetti who is usually presented as a precursor of Jeffrey’s radical
probabilism, had philosophical ideas about certainty that clashed with this view
(he thought that certainty has to be assumed as a primitive alongside probability,
and that we can be certain of more than mere tautologies). Moreover his mature
philosophical work veered towards a more objective point of view. For example he
dismissed the use of Dutch Book arguments and embraced the use of foundational
arguments in terms of scoring rules, a methodological move favored today by
many “objective” Bayesians (a presentation of de Finetti’s mature views appear in:
Philosophical Lectures on Probability: collected, edited, and annotated by Alberto
Mura, Synthese Library, Springer, 2008).
Van Fraassen introduces in his essay a version of radical probabilism (the term
was coined by Jeffrey) where the only epistemological primitive is a notion of
conditional probability. Van Fraassen sees this notion as encoding a notion of
supposition from which he derives a non-trivial notion of full belief. According
to this view it is perfectly possible to be sure of the contingent propositions of
science and everyday knowledge. One can see van Fraassen’s theory as introducing
paradox-free acceptance rules that link probability and belief (some of the usual
acceptance rules of this type like high probability rules are known to be the victim
of various forms of paradoxes, like the paradox of the lottery first proposed by
Henry Kyburg (Probability and the Logic of Rational Belief, Wesleyan University
Press, 1961)). Jeffrey renounced the use of any form of acceptance rules of this type and therefore proposed to eliminate any notion of qualitative belief without a probabilistic origin. Van Fraassen has exactly the opposite intention: namely, to build bridges between traditional and Bayesian epistemology via the use of novel
acceptance rules.
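A minimal numerical instance of the lottery paradox mentioned above (the numbers are illustrative only): in a fair 1,000-ticket lottery with exactly one winner,

\[
P(\text{ticket } i \text{ loses}) = 0.999 \quad \text{for each } i = 1, \dots, 1000 ,
\]

so an acceptance rule that accepts everything with probability at least, say, $0.99$ licenses accepting "ticket $i$ loses" for every $i$; yet these accepted propositions jointly contradict the certainly accepted proposition that some ticket wins. High-probability rules thus fail to preserve consistency under conjunction, which is the pressure that Jeffrey and van Fraassen respond to in opposite ways.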
Most of the models of probability update considered above deal with synchronic
or suppositional change. Is it possible to extend these models to cover cases of
genuine changes of probabilistic belief? David Lewis, Bas van Fraassen and Paul
Teller provided in the 1970s various dynamic coherence arguments showing that
one should update diachronically via conditioning on pain of incoherence (see
references in the section on "Further reading" below). If we denote by $P_t(B \mid A)$ the conditional probability of B given A at time t and by $P_t(\cdot)$ the monadic probability P at time t, we can denote by $P_{t'}$ the monadic probability at time t', where the total evidence gathered between these two times is A. Then the idea that one should update diachronically via conditioning can be expressed formally by: $P_{t'}(B) = P_t(B \mid A)$. These arguments in favor of this diachronic principle are dynamic versions of the static Dutch Book arguments first proposed in Ramsey's essay.
Unfortunately these arguments are considerably more controversial than the well-known static Dutch Book argument. The article by Brian Skyrms summarizes almost 30 years of arguments pro and con dynamic Dutch Book arguments and offers a temperate
and more modest version of these arguments that he thinks is valid. This debate is
nevertheless still open.
Levi's essay is embedded in his own version of Bayesian epistemology, where
the notion of full belief is taken as a primitive alongside probability. But the
central goal of the article reprinted here is to study the case where probabilities
are indeterminate, imprecise or vague. Levi also thinks that a theory of partial
belief (precise or imprecise) should be conceived as a branch of decision theory and
therefore proposes rules for deciding in conditions of uncertainty. Currently there
is a fair amount of work in this area not only in philosophy but also in statistics,
computer science and economics.
Gaifman's article focuses on characterizing the structure of higher order probability. In particular he investigates a form of the so-called Miller's principle, by which a rational agent adjusts his or her probabilities in accordance with the probabilities of an expert. So, we have a principle of this form:

(Miller's Principle)  $P_{\text{you}}(A \mid P_{\text{expert}}(A) = r) = r$.¹

van Fraassen proposed a diachronic version of this principle for a single agent:

(Reflection)  $P_{\text{now}}(A \mid P_{\text{later}}(A) = r) = r$.

¹ Actually Gaifman's formulation of the principle is formally cleaner. He defines $PR(A, r) = \{x \in W \mid p_x(A) = r\}$, where $p_x$ is an expert function at world $x$, i.e. the distribution chosen by the expert at that world. Then he formulates the principle as follows: $P(A \mid PR(A, r)) = r$.
Arntzenius’ article presents five puzzles showing that rational people can update
their degrees of belief in manners that violate Bayesian conditioning and Reflection.
But the article by M.J. Schervish, T. Seidenfeld and J. Kadane disputes that
Arntzenius’ examples impose any new restrictions or challenges to conditioning
or Reflection beyond what is already familiar about these principles.
• An excellent introduction to Ramsey’s philosophy in general and to the essay reprinted here
in particular can be found in the corresponding chapters of: The Philosophy of F.P. Ramsey,
by Nils-Eric Sahlin, Cambridge University Press, 2008. The classical introduction to Richard
Jeffrey’s decision theory is his: The Logic of Decision, University Of Chicago Press: 2nd edition
(July 15, 1990). A detailed articulation of radical probabilism can be found in Probability and
the Art of Judgment, Cambridge Studies in Probability, Induction and Decision Theory (Mar.
27, 1992). The theory of probability cores presented in van Fraassen’s article has been slightly
modified and extended in a paper by Horacio Arlo-Costa and Rohit Parikh: “Conditional
Probability and Defeasible Inference,” Journal of Philosophical Logic 34, 97-119, 2005. The
best axiomatic presentation of primitive conditional probability is given by Lester E. Dubins in
his article Finitely Additive Conditional Probabilities, Conglomerability and Disintegrations,
The Annals of Probability, 3(1):89–99, 1975. Teddy Seidenfeld wrote an accessible note
presenting recent results in this area in: Remarks on the theory of conditional probability: Some
issues of finite versus countable additivity, Probability Theory, V.F. Hendricks et al. (eds.) 2001,
pp. 167-178. Alan Hájek articulated a philosophical defense of the use of primitive conditional
probability in: What Conditional Probability Could Not Be, Synthese, Vol. 137, No. 3, Dec.,
2003. Finally there is an interesting article by David Makinson linking conditional probability
and central issues in belief change: Conditional probability in the light of qualitative belief
change, to appear in a 2011 issue of the Journal of Philosophical Logic marking 25 years of
AGM. References to other classical articles in this area by Karl Popper, Alfred Renyi and Bruno
de Finetti appear in the aforementioned articles.
• Brian Skyrms has also contributed to the theory of higher order probability. One accessible
article is: “Higher Order Degrees of Belief,” in D. H. Mellor (ed.), Prospects for Pragmatism.
Cambridge: Cambridge University Press, 109–13. Isaac Levi has articulated his theory of
indeterminate probabilities in various books and articles. One of the classical sources is: The
Enterprise of Knowledge, MIT Press, Cambridge, 1983. More information about Levi’s version
of decision theory under uncertainty appears in section 7 on Decision Theory below.
• There are two classical sources for the formulation of dynamic Dutch books. One is: Teller,
P. (1973), “Conditionalization and Observation”, Synthese 26: 218-258. The other is: van
Fraassen, Bas (1984), “Belief and the Will,” Journal of Philosophy 81: 235–256. The second
piece introduces also a theory of second order probability that complements the writings of
Skyrms and Gaifman. Van Fraassen introduces there the Reflection Principle. The original
formulation of some of the puzzles discussed by Arntzenius and Seidenfeld is a brief piece
by Adam Elga: “Self-Locating Belief and the Sleeping Beauty problem, Analysis, 60(2): 143-
147, 2000. More detailed reference to the work by Carnap on induction and confirmation can
be found in the bibliography of Maher’s paper. The so-called Raven’s Paradox appeared for
the first time in a seminal article by Carl Hempel: “Studies in the Logic of Confirmation
(I.),” Mind, New Series, Vol. 54, No. 213 (Jan., 1945), pp. 1-26. Branden Fitelson and James
Hawthorne offer an alternative and interesting Bayesian account of the paradox in: “How
Bayesian Confirmation Theory Handles the Paradox of the Ravens,” in E. Eels and J. Fetzer
(eds.), The Place of Probability in Science, Chicago: Open Court. Further information about
confirmation theory can be found in a classical book by John Earman: Bayes or Bust? A Critical
Examination of Bayesian Confirmation Theory, MIT Press, 1992. Another classical source is:
Scientific Reasoning: The Bayesian Approach, by Colin Howson and Peter Urbach, Open Court;
3rd edition, 2005. An interesting book touching on a cluster of issues recently discussed in this area, like coherence and the use of Bayesian networks in epistemology, is: Bayesian Epistemology by Luc Bovens and Stephan Hartmann, Oxford University Press, 2004.
• Another important formal epistemological issue is investigated by Timothy Williamson in his
paper, “Conditionalizing on Knowledge”, British Journal for the Philosophy of Science 49 (1),
1998: 89-121, which intends to integrate the theory of probability and probability kinematics,
with other epistemological notions like the notion of knowledge. The theory of evidential
probability that thus arises is based on two central ideas: (1) the evidential probability of
a proposition is its probability conditional on the total evidence (or conditional on evidence
propositions); (2) one’s total evidence is one’s total knowledge. The tools of epistemic logic
are used in order to represent the relevant notion of knowledge.
• Jeffrey does not adopt (1) but according to his modified notion of updating once a proposition
has evidential probability 1, it keeps it thereafter (monotony). This is a feature shared by
Jeffrey’s updating and the classical notion of updating. Williamson does embrace (1) but
develops a model of updating that abandons monotony. This seems a very promising strategy
given the limited applicability of a cumulative model of growth of knowledge. Similarly
motivated models (that are nevertheless formally quite different) have been proposed by Isaac
Levi, Peter Gärdenfors. Gärdenfors’ model appears in his book Knowledge in Flux (see the
corresponding reference in the bibliographical references of chapter 6). Levi presents his
account in The Enterprise of Knowledge (the reference appears in the bibliographical section
below). Both models appeal directly not only to qualitative belief but also to models of belief
change (contraction and revision - see chapter 6).
• Philosophers of science have traditionally appealed to Bayesian theory in order to provide
a Carnapian explication of the notoriously vague, elusive and paradox-prone notion of
confirmation or partial justification in science. Patrick Maher revives in his article, “Probability
Captures the Logic of Scientific Confirmation,” in Contemporary Debates in the Philosophy of
Science, ed. Christopher Hitchcock, Blackwell, 69–93, the Carnapian program of inductive
inference in order to provide one of these explications. In contrast Clark Glymour and Kevin
Kelly argue in their article, “Why Probability Does Not Capture the Logic of Scientific
Justification”, in Christopher Hitchcock, ed., Contemporary Debates in the Philosophy of
Science, London: Blackwell, 2004, that Bayesian confirmation cannot deliver the right kind
of account of the logic of scientific confirmation. One of the reasons for this skepticism is that
they think that scientific justification should reflect how intrinsically difficult it is to find the truth
and how efficient one’s methods are at finding it. So, their skepticism arises because they think
that Bayesian confirmation captures neither aspect of scientific justification. While deploying
their arguments the two articles discuss the well-known paradox of confirmation first proposed
by Hempel, Carnap’s research program on the philosophy of probability and induction and the
possible application of learning theory in order to offer a non-Bayesian account of scientific
justification. The article by Glymour and Kelly continues Glymour’s earlier critique of the
applications of Bayesianism in philosophy of science (also reprinted here). This earlier piece
contains the original versions of some influential and much-discussed conundra engendered by
Bayesian confirmation (like the problem of Old Evidence).
Chapter 3
Truth and Probability
Frank P. Ramsey
To say of what is that it is not, or of what is not that it is, is false, while to say of what is
that it is and of what is not that it is not is true.
– Aristotle.
When several hypotheses are presented to our mind which we believe to be mutually
exclusive and exhaustive, but about which we know nothing further, we distribute our belief
equally among them … This being admitted as an account of the way in which we actually
do distribute our belief in simple cases, the whole of the subsequent theory follows as
a deduction of the way in which we must distribute it in complex cases if we would be
consistent.
– W. F. Donkin.
The object of reasoning is to find out, from the consideration of what we already know,
something else which we do not know. Consequently, reasoning is good if it be such as to
give a true conclusion from true premises, and not otherwise.
– C. S. Peirce.
Truth can never be told so as to be understood, and not be believed.
– W. Blake.
Foreword
In this essay the Theory of Probability is taken as a branch of logic, the logic
of partial belief and inconclusive argument; but there is no intention of implying
that this is the only or even the most important aspect of the subject. Probability
is of fundamental importance not only in logic but also in statistical and physical
science, and we cannot be sure beforehand that the most useful interpretation of it
in logic will be appropriate in physics also. Indeed the general difference of opinion
between statisticians who for the most part adopt the frequency theory of probability
and logicians who mostly reject it renders it likely that the two schools are really
discussing different things, and that the word ‘probability’ is used by logicians in
one sense and by statisticians in another. The conclusions we shall come to as to
the meaning of probability in logic must not, therefore, be taken as prejudging its
meaning in physics.
sort of frequency, or because it can only be the subject of logical treatment when it
is grounded on experienced frequencies. Whether these contentions are valid can,
however, only be decided as a result of our investigation into partial belief, so that
I propose to ignore the frequency theory for the present and begin an inquiry into
the logic of partial belief. In this, I think, it will be most convenient if, instead of
straight away developing my own theory, I begin by examining the views of Mr
Keynes, which are so well known and in essentials so widely accepted that readers
probably feel that there is no ground for re-opening the subject de novo until they
have been disposed of.
Mr Keynes1 starts from the supposition that we make probable inferences for which
we claim objective validity; we proceed from full belief in one proposition to partial
belief in another, and we claim that this procedure is objectively right, so that if
another man in similar circumstances entertained a different degree of belief, he
would be wrong in doing so. Mr Keynes accounts for this by supposing that between
any two propositions, taken as premiss and conclusion, there holds one and only one
relation of a certain sort called probability relations; and that if, in any given case,
the relation is that of degree α, from full belief in the premiss, we should, if we were rational, proceed to a belief of degree α in the conclusion.
Before criticising this view, I may perhaps be allowed to point out an obvious and
easily corrected defect in the statement of it. When it is said that the degree of the
probability relation is the same as the degree of belief which it justifies, it seems
to be presupposed that both probability relations, on the one hand, and degrees
of belief on the other can be naturally expressed in terms of numbers, and then
that the number expressing or measuring the probability relation is the same as that
expressing the appropriate degree of belief. But if, as Mr. Keynes holds, these things
are not always expressible by numbers, then we cannot give his statement that the
degree of the one is the same as the degree of the other such a simple interpretation,
but must suppose him to mean only that there is a one-one correspondence
between probability relations and the degrees of belief which they justify. This
correspondence must clearly preserve the relations of greater and less, and so make
the manifold of probability relations and that of degrees of belief similar in Mr
Russell’s sense. I think it is a pity that Mr Keynes did not see this clearly, because
the exactitude of this correspondence would have provided quite as worthy material for scepticism as did the numerical measurement of probability relations. Indeed some
of his arguments against their numerical measurement appear to apply quite equally
well against their exact correspondence with degrees of belief; for instance, he
argues that if rates of insurance correspond to subjective, i.e. actual, degrees of
belief, these are not rationally determined, and we cannot infer that probability relations can be similarly measured. It might be argued that the true conclusion in such a case was not that, as Mr Keynes thinks, to the non-numerical probability relation corresponds a non-numerical degree of rational belief, but that degrees of belief, which were always numerical, did not correspond one to one with the probability relations justifying them. For it is, I suppose, conceivable that degrees of belief could be measured by a psychogalvanometer or some such instrument, and Mr Keynes would hardly wish it to follow that probability relations could all be derivatively measured with the measures of the beliefs which they justify.

1 J.M. Keynes, A Treatise on Probability (1921).
But let us now return to a more fundamental criticism of Mr Keynes’ views,
which is the obvious one that there really do not seem to be any such things as the
probability relations he describes. He supposes that, at any rate in certain cases, they
can be perceived; but speaking for myself I feel confident that this is not true. I do
not perceive them, and if I am to be persuaded that they exist it must be by argument;
moreover I shrewdly suspect that others do not perceive them either, because they
are able to come to so very little agreement as to which of them relates any two
given propositions.
All we appear to know about them are certain general propositions, the laws of
addition and multiplication; it is as if everyone knew the laws of geometry but no
one could tell whether any given object were round or square; and I find it hard
to imagine how so large a body of general knowledge can be combined with so
slender a stock of particular facts. It is true that about some particular cases there
is agreement, but these somehow paradoxically are always immensely complicated;
we all agree that the probability of a coin coming down heads is 1/2, but we can none
of us say exactly what is the evidence which forms the other term for the probability
relation about which we are then judging. If, on the other hand, we take the simplest
possible pairs of propositions such as ‘This is red’ and ‘That is blue’ or ‘This is
red’ and ‘That is red’, whose logical relations should surely be easiest to see, no
one, I think, pretends to be sure what is the probability relation which connects
them. Or, perhaps, they may claim to see the relation but they will not be able to
say anything about it with certainty, to state if it is more or less than 1/3, or so on.
They may, of course, say that it is incomparable with any numerical relation, but a
relation about which so little can be truly said will be of little scientific use and it
will be hard to convince a sceptic of its existence. Besides this view is really rather
paradoxical; for any believer in induction must admit that between ‘This is red’ as
conclusion and ‘This is round’, together with a billion propositions of the form ‘a
is round and red’ as evidence, there is a finite probability relation; and it is hard
to suppose that as we accumulate instances there is suddenly a point, say after 233
instances, at which the probability relation becomes finite and so comparable with
some numerical relations.
It seems to me that if we take the two propositions ‘a is red’, ‘b is red’, we
cannot really discern more than four simple logical relations between them; namely
identity of form, identity of predicate, diversity of subject, and logical independence
of import. If anyone were to ask me what probability one gave to the other, I should
not try to answer by contemplating the propositions and trying to discern a logical
relation between them, I should, rather, try to imagine that one of them was all that
I knew, and to guess what degree of confidence I should then have in the other. If
I were able to do this, I might no doubt still not be content with it, but might say
‘This is what I should think, but, of course, I am only a fool’ and proceed to consider
what a wise man would think and call that the degree of probability. This kind of
self-criticism I shall discuss later when developing my own theory; all that I want to
remark here is that no one estimating a degree of probability simply contemplates
the two propositions supposed to be related by it; he always considers inter alia his
own actual or hypothetical degree of belief. This remark seems to me to be borne out
by observation of my own behaviour; and to be the only way of accounting for the
fact that we can all give estimates of probability in cases taken from actual life, but
are quite unable to do so in the logically simplest cases in which, were probability a
logical relation, it would be easiest to discern.
Another argument against Mr Keynes’ theory can, I think, be drawn from his
inability to adhere to it consistently even in discussing first principles. There is a
passage in his chapter on the measurement of probabilities which reads as follows: –
Probability is, vide Chapter II (§ 12), relative in a sense to the principles of human reason.
The degree of probability, which it is rational for us to entertain, does not presume perfect
logical insight, and is relative in part to the secondary propositions which we in fact know;
and it is not dependent upon whether more perfect logical insight is or is not conceivable.
It is the degree of probability to which those logical processes lead, of which our minds
are capable; or, in the language of Chapter II, which those secondary propositions justify,
which we in fact know. If we do not take this view of probability, if we do not limit it
in this way and make it, to this extent, relative to human powers, we are altogether adrift
in the unknown; for we cannot ever know what degree of probability would be justified
by the perception of logical relations which we are, and must always be, incapable of
comprehending.2
This passage seems to me quite unreconcilable with the view which Mr Keynes
adopts everywhere except in this and another similar passage. For he generally holds
that the degree of belief which we are justified in placing in the conclusion of an
argument is determined by what relation of probability unites that conclusion to
our premisses. There is only one such relation and consequently only one relevant
true secondary proposition, which, of course, we may or may not know, but which
is necessarily independent of the human mind. If we do not know it, we do not
know it and cannot tell how far we ought to believe the conclusion. But often, he
supposes, we do know it; probability relations are not ones which we are incapable
of comprehending. But on this view of the matter the passage quoted above has no
meaning: the relations which justify probable beliefs are probability relations, and
it is nonsense to speak of them being justified by logical relations which we are, and
must always be, incapable of comprehending. The significance of the passage for
our present purpose lies in the fact that it seems to presuppose a different view of
probability, in which indefinable probability relations play no part, but in which the
degree of rational belief depends on a variety of logical relations. For instance, there
2. p. 32, his italics.
might be between the premiss and conclusion the relation that the premiss was the
logical product of a thousand instances of a generalization of which the conclusion
was one other instance, and this relation, which is not an indefinable probability
relation but definable in terms of ordinary logic and so easily recognizable, might
justify a certain degree of belief in the conclusion on the part of one who believed
the premiss. We should thus have a variety of ordinary logical relations justifying
the same or different degrees of belief. To say that the probability of a given h was
such-and-such would mean that between a and h was some relation justifying such-
and-such a degree of belief. And on this view it would be a real point that the relation
in question must not be one which the human mind is incapable of comprehending.
This second view of probability as depending on logical relations but not itself
a new logical relation seems to me more plausible than Mr Keynes’ usual theory;
but this does not mean that I feel at all inclined to agree with it. It requires the
somewhat obscure idea of a logical relation justifying a degree of belief, which I
should not like to accept as indefinable because it does not seem to be at all a clear
or simple notion. Also it is hard to say what logical relations justify what degrees of
belief, and why; any decision as to this would be arbitrary, and would lead to a logic
of probability consisting of a host of so-called ‘necessary’ facts, like formal logic
on Mr Chadwick’s view of logical constants.3 Whereas I think it far better to seek
an explanation of this ‘necessity’ after the model of the work of Mr Wittgenstein,
which enables us to see clearly in what precise sense and why logical propositions
are necessary, and in a general way why the system of formal logic consists of
the propositions it does consist of, and what is their common characteristic. Just as
natural science tries to explain and account for the facts of nature, so philosophy
should try, in a sense, to explain and account for the facts of logic; a task ignored
by the philosophy which dismisses these facts as being unaccountably and in an
indefinable sense ‘necessary’.
Here I propose to conclude this criticism of Mr Keynes’ theory, not because there
are not other respects in which it seems open to objection, but because I hope that
what I have already said is enough to show that it is not so completely satisfactory as
to render futile any attempt to treat the subject from a rather different point of view.
Degrees of Belief
The subject of our inquiry is the logic of partial belief, and I do not think we can
carry it far unless we have at least an approximate notion of what partial belief is,
and how, if at all, it can be measured. It will not be very enlightening to be told
that in such circumstances it would be rational to believe a proposition to the extent
of 2/3, unless we know what sort of a belief in it that means. We must therefore try
to develop a purely psychological method of measuring belief. It is not enough to
3. "Logical Constants", Mind, 1927.
But to construct such an ordered series of degrees is not the whole of our task; we
have also to assign numbers to these degrees in some intelligible manner. We can of
course easily explain that we denote full belief by 1, full belief in the contradictory
by 0, and equal beliefs in the proposition and its contradictory by 1/2. But it is not so
easy to say what is meant by a belief 2/3 of certainty, or a belief in the proposition
being twice as strong as that in its contradictory. This is the harder part of the task,
but it is absolutely necessary; for we do calculate numerical probabilities, and if
they are to correspond to degrees of belief we must discover some definite way
of attaching numbers to degrees of belief. In physics we often attach numbers by
discovering a physical process of addition4: the measure-numbers of lengths are not
assigned arbitrarily subject only to the proviso that the greater length shall have
the greater measure; we determine them further by deciding on a physical meaning
for addition; the length got by putting together two given lengths must have for its
measure the sum of their measures. A system of measurement in which there is
nothing corresponding to this is immediately recognized as arbitrary, for instance
Mohs’ scale of hardness5 in which 10 is arbitrarily assigned to diamond, the hardest
known material, 9 to the next hardest, and so on. We have therefore to find a process
of addition for degrees of belief, or some substitute for this which will be equally
adequate to determine a numerical scale.
Such is our problem; how are we to solve it? There are, I think, two ways in
which we can begin. We can, in the first place, suppose that the degree of a belief is
something perceptible by its owner; for instance that beliefs differ in the intensity of
a feeling by which they are accompanied, which might be called a belief-feeling or
feeling of conviction, and that by the degree of belief we mean the intensity of this
feeling. This view would be very inconvenient, for it is not easy to ascribe numbers
to the intensities of feelings; but apart from this it seems to me observably false, for
the beliefs which we hold most strongly are often accompanied by practically no
feeling at all; no one feels strongly about things he takes for granted.
We are driven therefore to the second supposition that the degree of a belief is
a causal property of it, which we can express vaguely as the extent to which we
are prepared to act on it. This is a generalization of the well-known view, that the
differentia of belief lies in its causal efficacy, which is discussed by Mr Russell
in his Analysis of Mind. He there dismisses it for two reasons, one of which seems
entirely to miss the point. He argues that in the course of trains of thought we believe
many things which do not lead to action. This objection is however beside the mark,
because it is not asserted that a belief is an idea which does actually lead to action,
but one which would lead to action in suitable circumstances; just as a lump of
arsenic is called poisonous not because it actually has killed or will kill anyone, but
because it would kill anyone if he ate it. Mr Russell’s second argument is, however,
more formidable. He points out that it is not possible to suppose that beliefs differ
from other ideas only in their effects, for if they were otherwise identical their effects
4. See N. Campbell, Physics: The Elements (1920), p. 277.
5. Ibid., p. 271.
would be identical also. This is perfectly true, but it may still remain the case that
the nature of the difference between the causes is entirely unknown or very vaguely
known, and that what we want to talk about is the difference between the effects,
which is readily observable and important.
As soon as we regard belief quantitatively, this seems to me the only view we
can take of it. It could well be held that the difference between believing and not
believing lies in the presence or absence of introspectible feelings. But when we
seek to know what is the difference between believing more firmly and believing
less firmly, we can no longer regard it as consisting in having more or less of certain
observable feelings; at least I personally cannot recognize any such feelings. The
difference seems to me to lie in how far we should act on these beliefs: this may
depend on the degree of some feeling or feelings, but I do not know exactly what
feelings and I do not see that it is indispensable that we should know. Just the same
thing is found in physics; men found that a wire connecting plates of zinc and copper
standing in acid deflected a magnetic needle in its neighbourhood. According as
the needle was more or less deflected the wire was said to carry a larger or a smaller
current. The nature of this ‘current’ could only be conjectured: what were observed
and measured were simply its effects. It will no doubt be objected that we know
how strongly we believe things, and that we can only know this if we can measure
our belief by introspection. This does not seem to me necessarily true; in many
cases, I think, our judgment about the strength of our belief is really about how
we should act in hypothetical circumstances. It will be answered that we can only
tell how we should act by observing the present belief-feeling which determines
how we should act; but again I doubt the cogency of the argument. It is possible
that what determines how we should act determines us also directly or indirectly
to have a correct opinion as to how we should act, without its ever coming into
consciousness.
Suppose, however, I am wrong about this and that we can decide by introspection
the nature of belief, and measure its degree; still, I shall argue, the kind of measure-
ment of belief with which probability is concerned is not this kind but is a measure-
ment of belief qua basis of action. This can I think be shown in two ways. First, by
considering the scale of probabilities between 0 and 1, and the sort of way we use
it, we shall find that it is very appropriate to the measurement of belief as a basis of
action, but in no way related to the measurement of an introspected feeling. For the
units in terms of which such feelings or sensations are measured are always, I think,
differences which are just perceptible: there is no other way of obtaining units. But
I see no ground for supposing that the interval between a belief of degree 1/3 and one
of degree 1/2 consists of as many just perceptible changes as does that between one of
2/3 and one of 5/6, or that a scale based on just perceptible differences would have any
simple relation to the theory of probability. On the other hand the probability of 1/3
is clearly related to the kind of belief which would lead to a bet of 2 to 1, and it will
be shown below how to generalize this relation so as to apply to action in general.
Secondly, the quantitative aspects of beliefs as the basis of action are evidently more
important than the intensities of belief-feelings. The latter are no doubt interesting,
but may be very variable from individual to individual, and their practical interest is
entirely due to their position as the hypothetical causes of beliefs qua bases of action.
It is possible that some one will say that the extent to which we should act on a
belief in suitable circumstances is a hypothetical thing, and therefore not capable of
measurement. But to say this is merely to reveal ignorance of the physical sciences
which constantly deal with and measure hypothetical quantities; for instance, the
electric intensity at a given point is the force which would act on a unit charge if it
were placed at the point.
Let us now try to find a method of measuring beliefs as bases of possible actions.
It is clear that we are concerned with dispositional rather than with actualized
beliefs; that is to say, not with beliefs at the moment when we are thinking of them,
but with beliefs like my belief that the earth is round, which I rarely think of, but
which would guide my action in any case to which it was relevant.
The old-established way of measuring a person’s belief is to propose a bet,
and see what are the lowest odds which he will accept. This method I regard as
fundamentally sound; but it suffers from being insufficiently general, and from being
necessarily inexact. It is inexact partly because of the diminishing marginal utility
of money, partly because the person may have a special eagerness or reluctance to
bet, because he either enjoys or dislikes excitement or for any other reason, e.g.
to make a book. The difficulty is like that of separating two different co-operating
forces. Besides, the proposal of a bet may inevitably alter his state of opinion; just as
we could not always measure electric intensity by actually introducing a charge and
seeing what force it was subject to, because the introduction of the charge would
change the distribution to be measured.
In order therefore to construct a theory of quantities of belief which shall be
both general and more exact, I propose to take as a basis a general psychological
theory, which is now universally discarded, but nevertheless comes, I think, fairly
close to the truth in the sort of cases with which we are most concerned. I mean
the theory that we act in the way we think most likely to realize the objects of our
desires, so that a person’s actions are completely determined by his desires and
opinions. This theory cannot be made adequate to all the facts, but it seems to me
a useful approximation to the truth particularly in the case of our self-conscious
or professional life, and it is presupposed in a great deal of our thought. It is a
simple theory and one which many psychologists would obviously like to preserve
by introducing unconscious desires and unconscious opinions in order to bring it
more into harmony with the facts. How far such fictions can achieve the required
result I do not attempt to judge: I only claim for what follows approximate truth,
or truth in relation to this artificial system of psychology, which like Newtonian
mechanics can, I think, still be profitably used even though it is known to be false.
It must be observed that this theory is not to be identified with the psychology of
the Utilitarians, in which pleasure had a dominating position. The theory I propose
to adopt is that we seek things which we want, which may be our own or other
people’s pleasure, or anything else whatever, and our actions are such as we think
most likely to realize these goods. But this is not a precise statement, for a precise
statement of the theory can only be made after we have introduced the notion of
quantity of belief.
Let us call the things a person ultimately desires ‘goods’, and let us at first
assume that they are numerically measurable and additive. That is to say that if
he prefers for its own sake an hour’s swimming to an hour’s reading, he will prefer
two hours’ swimming to one hour’s swimming and one hour’s reading. This is of
course absurd in the given case but this may only be because swimming and reading
are not ultimate goods, and because we cannot imagine a second hour’s swimming
precisely similar to the first, owing to fatigue, etc.
Let us begin by supposing that our subject has no doubts about anything, but
certain opinions about all propositions. Then we can say that he will always choose
the course of action which will lead in his opinion to the greatest sum of good.
It should be emphasized that in this essay good and bad are never to be
understood in any ethical sense but simply as denoting that to which a given person
feels desire and aversion.
The question then arises how we are to modify this simple system to take account
of varying degrees of certainty in his beliefs. I suggest that we introduce as a law
of psychology that his behaviour is governed by what is called the mathematical
expectation; that is to say that, if p is a proposition about which he is doubtful,
any goods or bads for whose realization p is in his view a necessary and sufficient
condition enter into his calculations multiplied by the same fraction, which is called
the ‘degree of his belief in p’. We thus define degree of belief in a way which
presupposes the use of the mathematical expectation.
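As an editorial aside (not part of Ramsey's text), the law just proposed is easy to state computationally. The minimal Python sketch below, with invented action names and figures, scores each available action by the mathematical expectation of good and picks the best; the degree of belief in p enters only as the weighting fraction.

def expected_good(good_if_p, good_if_not_p, belief_in_p):
    # Goods contingent on p enter the calculation multiplied by the degree of belief in p.
    return belief_in_p * good_if_p + (1 - belief_in_p) * good_if_not_p

# Invented choice: the value of each action if p (say, rain) is true / false.
actions = {"take umbrella": (2.0, 1.0), "leave umbrella": (0.0, 3.0)}
belief_in_p = 0.6
best = max(actions, key=lambda name: expected_good(*actions[name], belief_in_p))
# best == "take umbrella": 0.6*2 + 0.4*1 = 1.6 beats 0.6*0 + 0.4*3 = 1.2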
We can put this in a different way. Suppose his degree of belief in p is m/n; then his
action is such as he would choose it to be if he had to repeat it exactly n times, in m
of which p was true, and in the others false. [Here it may be necessary to suppose
that in each of the n times he had no memory of the previous ones.]
This can also be taken as a definition of the degree of belief, and can easily be
seen to be equivalent to the previous definition. Let us give an instance of the sort
of case which might occur. I am at a cross-roads and do not know the way; but I
rather think one of the two ways is right. I propose therefore to go that way but keep
my eyes open for someone to ask; if now I see someone half a mile away over the
fields, whether I turn aside to ask him will depend on the relative inconvenience of
going out of my way to cross the fields or of continuing on the wrong road if it is
the wrong road. But it will also depend on how confident I am that I am right; and
clearly the more confident I am of this the less distance I should be willing to go
from the road to check my opinion. I propose therefore to use the distance I would
be prepared to go to ask, as a measure of the confidence of my opinion; and what I
have said above explains how this is to be done. We can set it out as follows: suppose
the disadvantage of going x yards to ask is f (x), the advantage of arriving at the right
destination is r, that of arriving at the wrong one w. Then if I should just be willing to
go a distance d to ask, the degree of my belief that I am on the right road is given by
p = 1 - f(d)/(r - w).
For such an action is one it would just pay me to take, if I had to act in the same
way n times, in np of which I was on the right way but in the others not.
For the total good resulting from not asking each time
= npr + n(1 - p)w
= nw + np(r - w);
that resulting from asking at distance x each time
= nr - nf(x) [I now always go the right way].
The latter exceeds the former provided
f(x) < (r - w)(1 - p),
∴ the critical distance d is connected with p, the degree of belief, by the relation
f(d) = (r - w)(1 - p)
or p = 1 - f(d)/(r - w), as asserted above.
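By way of an editorial illustration of the cross-roads measure (the cost function f and all figures below are invented), the degree of belief is read off directly from the greatest distance d the traveller would just be willing to walk:

def degree_of_belief(f, d, r, w):
    # Ramsey's cross-roads measure: p = 1 - f(d) / (r - w).
    # The shorter the distance he would just walk, the more confident he is.
    return 1 - f(d) / (r - w)

# Invented figures: walking costs 0.001 units of good per yard,
# the right destination is worth 10 units, the wrong one 2.
f = lambda x: 0.001 * x
p = degree_of_belief(f, d=400, r=10, w=2)   # 1 - 0.4/8 = 0.95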
It is easy to see that this way of measuring belief gives results agreeing with ordinary
ideas; at any rate to the extent that full belief is denoted by 1, full belief in the contra-
dictory by 0, and equal belief in the two by 1/2. Further, it allows validity to betting as
means of measuring beliefs. By proposing a bet on p we give the subject a possible
course of action from which so much extra good will result to him if p is true and so
much extra bad if p is false. Supposing the bet to be in goods and bads instead of in
money, he will take a bet at any better odds than those corresponding to his state of
belief; in fact his state of belief is measured by the odds he will just take; but this is
vitiated, as already explained, by love or hatred of excitement, and by the fact that
the bet is in money and not in goods and bads. Since it is universally agreed that
money has a diminishing marginal utility, if money bets are to be used, it is evident
that they should be for as small stakes as possible. But then again the measurement
is spoiled by introducing the new factor of reluctance to bother about trifles.
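An editorial sketch of the betting connection mentioned earlier (the function names and test values are invented): a degree of belief p corresponds to odds of (1 - p) to p against, so a belief of 1/3 answers to a bet at 2 to 1.

def fair_odds_against(p):
    # A belief of degree p makes a bet at odds (1 - p) to p against exactly fair.
    return (1 - p, p)

def belief_from_odds(a, b):
    # Degree of belief at which a bet at odds of a to b against is just acceptable.
    return b / (a + b)

assert abs(belief_from_odds(2, 1) - 1/3) < 1e-12          # 2 to 1 against ~ belief 1/3
assert fair_odds_against(0.25) == (0.75, 0.25)            # belief 1/4 ~ 3 to 1 against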
Let us now discard the assumption that goods are additive and immediately
measurable, and try to work out a system with as few assumptions as possible. To
begin with we shall suppose, as before, that our subject has certain beliefs about
everything; then he will act so that what he believes to be the total consequences
of his action will be the best possible. If then we had the power of the Almighty,
and could persuade our subject of our power, we could, by offering him options,
discover how he placed in order of merit all possible courses of the world. In this
way all possible worlds would be put in an order of value, but we should have
no definite way of representing them by numbers. There would be no meaning
in the assertion that the difference in value between α and β was equal to that
between γ and δ. [Here and elsewhere we use Greek letters to represent the different
possible totalities of events between which our subject chooses – the ultimate
organic unities.]
Suppose next that the subject is capable of doubt; then we could test his degree of
belief in different propositions by making him offers of the following kind. Would
you rather have world α in any event; or world β if p is true, and world γ if p is false?
If, then, he were certain that p was true, he would simply compare α and β and choose between
them as if no conditions were attached; but if he were doubtful his choice would not
be decided so simply. I propose to lay down axioms and definitions concerning
the principles governing choices of this kind. This is, of course, a very schematic
version of the situation in real life, but it is, I think, easier to consider it in this
form.
There is first a difficulty which must be dealt with; the propositions like p in
the above case which are used as conditions in the options offered may be such
that their truth or falsity is an object of desire to the subject. This will be found
to complicate the problem, and we have to assume that there are propositions for
which this is not the case, which we shall call ethically neutral. More precisely
an atomic proposition p is called ethically neutral if two possible worlds differing
only in regard to the truth of p are always of equal value; and a non-atomic
proposition p is called ethically neutral if all its atomic truth-arguments6 are
ethically neutral.
We begin by defining belief of degree 1/2 in an ethically neutral proposition. The
subject is said to have belief of degree 1/2 in such a proposition p if he has no
preference between the options (1) α if p is true, β if p is false, and (2) α if p
is false, β if p is true, but has a preference between α and β simply. We suppose
by an axiom that if this is true of any one pair α, β, it is true of all such pairs.7
This comes roughly to defining belief of degree 1/2 as such a degree of belief as
leads to indifference between betting one way and betting the other for the same
stakes.
Belief of degree 1/2 as thus defined can be used to measure values numerically
in the following way. We have to explain what is meant by the difference in value
between α and β being equal to that between γ and δ; and we define this to mean
that, if p is an ethically neutral proposition believed to degree 1/2, the subject has no
preference between the options (1) α if p is true, δ if p is false, and (2) β if p is true,
γ if p is false.
This definition can form the basis of a system of measuring values in the
following way:–
Let us call any set of all worlds equally preferable to a given world a value: we
suppose that if world α is preferable to β any world with the same value as α is
preferable to any world with the same value as β, and shall say that the value of α is
greater than that of β. This relation 'greater than' orders values in a series. We shall
use α henceforth both for the world and its value.
6. I assume here Wittgenstein's theory of propositions; it would probably be possible to give an
equivalent definition in terms of any other theory.
7. α and β must be supposed so far undefined as to be compatible with both p and not-p.
Axioms
(1) There is an ethically neutral proposition p believed to degree 1/2.
(2) If p, q are such propositions and the option
    α if p, δ if not-p is equivalent to β if p, γ if not-p,
    then α if q, δ if not-q is equivalent to β if q, γ if not-q.
Def. In the above case we say αβ = γδ.
Theorems. If αβ = γδ, then βα = δγ, αγ = βδ, γα = δβ.
(2a) If αβ = γδ, then α > β is equivalent to γ > δ, and α = β is equivalent to γ = δ.
(3) If option A is equivalent to option B and B to C, then A to C.
Theorem. If αβ = γδ and βη = ζγ, then αη = ζδ.
(4) If αβ = γδ, γδ = ηζ, then αβ = ηζ.
(5) (α, β, γ). E!(ιx)(αx = βγ).
(6) (α, β). E!(ιx)(αx = xβ).
(7) Axiom of continuity: any progression has a limit (ordinal).
(8) Axiom of Archimedes.
These axioms enable the values to be correlated one-one with real numbers so
that if α₁ corresponds to α, etc.,
    αβ = γδ if and only if α₁ - β₁ = γ₁ - δ₁.
8. Here β must include the truth of p, γ its falsity; p need no longer be ethically neutral. But we have
to assume that there is a world with any assigned value in which p is true, and one in which p is
false.
It roughly expresses the odds at which he would now bet on p, the bet only to be
valid if q is true. Such conditional bets were often made in the eighteenth century.
The degree of belief in p given q is measured thus. Suppose the subject indifferent
between the options (1) α if q true, β if q false, (2) γ if p true and q true, δ if p false
and q true, β if q false. Then the degree of his belief in p given q is the ratio of the
difference between α and δ to that between γ and δ, which we must suppose the
same for any α, β, γ, δ which satisfy the given conditions. This is not the same as
the degree to which he would believe p, if he believed q for certain; for knowledge
of q might for psychological reasons profoundly alter his whole system of beliefs.
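As an editorial illustration of this definition (the values assigned to the worlds are invented), the degree of belief in p given q is simply the ratio of the two value differences:

def belief_in_p_given_q(alpha, gamma, delta):
    # Degree of belief in p given q = (alpha - delta) / (gamma - delta).
    return (alpha - delta) / (gamma - delta)

# Invented values on the subject's value scale satisfying the stated indifference:
alpha, gamma, delta = 6.0, 8.0, 2.0
p_given_q = belief_in_p_given_q(alpha, gamma, delta)   # (6 - 2) / (8 - 2) = 2/3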
Each of our definitions has been accompanied by an axiom of consistency, and
in so far as this is false, the notion of the corresponding degree of belief becomes
invalid. This bears some analogy to the situation in regard to simultaneity discussed
above.
I have not worked out the mathematical logic of this in detail, because this would,
I think, be rather like working out to seven places of decimals a result only valid to
two. My logic cannot be regarded as giving more than the sort of way it might work.
From these definitions and axioms it is possible to prove the fundamental laws
of probable belief (degrees of belief lie between 0 and 1):
(1) Degree of belief in p + degree of belief in not-p = 1.
(2) Degree of belief in p given q + degree of belief in not-p given q = 1.
(3) Degree of belief in (p and q) = degree of belief in p × degree of belief in q given p.
(4) Degree of belief in (p and q) + degree of belief in (p and not-q) = degree of belief in p.
The first two are immediate. (3) is proved as follows.
Let degree of belief in p = x, that in q given p = y, and let ξ be any value. Then, for any t,
    ξ for certain is equivalent to ξ + (1 - x)t if p true, ξ - xt if p false;
and, for any u,
    ξ + (1 - x)t if p true is equivalent to ξ + (1 - x)t + (1 - y)u if 'p and q' true,
    ξ + (1 - x)t - yu if p true, q false.
Choosing u = t/y, so that ξ + (1 - x)t - yu = ξ - xt,
∴ degree of belief in 'p and q' = xt / (t + (1 - y)t/y) = xy. (t ≠ 0)
If y = 0, take t = 0;
∴ degree of belief in pq = 0.
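Since these are exactly the laws a probability measure obeys, they can be checked mechanically. The following editorial sketch assigns invented degrees of belief to the four truth-combinations of p and q and verifies laws (1)-(4):

# Invented degrees of belief over the four cases p&q, p&not-q, not-p&q, not-p&not-q.
w = {("p", "q"): 0.30, ("p", "not-q"): 0.20, ("not-p", "q"): 0.10, ("not-p", "not-q"): 0.40}

bel_p = w[("p", "q")] + w[("p", "not-q")]
bel_not_p = w[("not-p", "q")] + w[("not-p", "not-q")]
bel_q = w[("p", "q")] + w[("not-p", "q")]
bel_pq = w[("p", "q")]
bel_p_not_q = w[("p", "not-q")]
bel_p_given_q = bel_pq / bel_q
bel_not_p_given_q = w[("not-p", "q")] / bel_q
bel_q_given_p = bel_pq / bel_p

assert abs(bel_p + bel_not_p - 1) < 1e-12                    # law (1)
assert abs(bel_p_given_q + bel_not_p_given_q - 1) < 1e-12    # law (2)
assert abs(bel_pq - bel_p * bel_q_given_p) < 1e-12           # law (3)
assert abs(bel_pq + bel_p_not_q - bel_p) < 1e-12             # law (4)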
Thirdly, nothing has been said about degrees of belief when the number of
alternatives is infinite. About this I have nothing useful to say, except that I doubt
if the mind is capable of contemplating more than a finite number of alternatives.
It can consider questions to which an infinite number of answers are possible, but
in order to consider the answers it must lump them into a finite number of groups.
The difficulty becomes practically relevant when discussing induction, but even then
there seems to me no need to introduce it. We can discuss whether past experience
gives a high probability to the sun’s rising to-morrow without bothering about what
probability it gives to the sun’s rising each morning for evermore. For this reason I
cannot but feel that Mr Ritchie’s discussion of the problem9 is unsatisfactory; it is
true that we can agree that inductive generalizations need have no finite probability,
but particular expectations entertained on inductive grounds undoubtedly do have
a high numerical probability in the minds of all of us. We all are more certain that
the sun will rise to-morrow than that I shall not throw 12 with two dice first time,
i.e. we have a belief of higher degree than 35/36 in it. If induction ever needs a logical
justification it is in connection with the probability of an event like this.
We may agree that in some sense it is the business of logic to tell us what we ought
to think; but the interpretation of this statement raises considerable difficulties. It
may be said that we ought to think what is true, but in that sense we are told what
to think by the whole of science and not merely by logic. Nor, in this sense, can
any justification be found for partial belief; the ideally best thing is that we should
have beliefs of degree 1 in all true propositions and beliefs of degree 0 in all false
propositions. But this is too high a standard to expect of mortal men, and we must
agree that some degree of doubt or even of error may be humanly speaking justified.
Many logicians, I suppose, would accept as an account of their science the
opening words of Mr Keynes’ Treatise on Probability: “Part of our knowledge we
obtain direct; and part by argument. The Theory of Probability is concerned with
that part which we obtain by argument, and it treats of the different degrees in which
the results so obtained are conclusive or inconclusive.” Where Mr Keynes says ‘the
Theory of Probability’, others would say Logic. It is held, that is to say, that our
opinions can be divided into those we hold immediately as a result of perception or
9. A. D. Ritchie, "Induction and Probability", Mind, 1926, p. 318. 'The conclusion of the foregoing
discussion may be simply put. If the problem of induction be stated to be "How can inductive
generalizations acquire a large numerical probability?" then this is a pseudo-problem, because the
answer is "They cannot". This answer is not, however, a denial of the validity of induction but
is a direct consequence of the nature of probability. It still leaves untouched the real problem of
induction which is "How can the probability of an induction be increased?" and it leaves standing
the whole of Keynes' discussion on this point.'
memory, and those which we derive from the former by argument. It is the business
of Logic to accept the former class and criticize merely the derivation of the second
class from them.
Logic as the science of argument and inference is traditionally and rightly divided
into deductive and inductive; but the difference and relation between these two
divisions of the subject can be conceived in extremely different ways. According
to Mr Keynes valid deductive and inductive arguments are fundamentally alike;
both are justified by logical relations between premiss and conclusion which differ
only in degree. This position, as I have already explained, I cannot accept. I do not
see what these inconclusive logical relations can be or how they can justify partial
beliefs. In the case of conclusive logical arguments I can accept the account of their
validity which has been given by many authorities, and can be found substantially
the same in Kant, De Morgan, Peirce and Wittgenstein. All these authors agree that
the conclusion of a formally valid argument is contained in its premisses; that to
deny the conclusion while accepting the premisses would be self-contradictory; that
a formal deduction does not increase our knowledge, but only brings out clearly
what we already know in another form; and that we are bound to accept its validity
on pain of being inconsistent with ourselves. The logical relation which justifies the
inference is that the sense or import of the conclusion is contained in that of the
premisses.
But in the case of an inductive argument this does not happen in the least;
it is impossible to represent it as resembling a deductive argument and merely
weaker in degree; it is absurd to say that the sense of the conclusion is partially
contained in that of the premisses. We could accept the premisses and utterly reject
the conclusion without any sort of inconsistency or contradiction.
It seems to me, therefore, that we can divide arguments into two radically
different kinds, which we can distinguish in the words of Peirce as (1) ‘explicative,
analytic, or deductive’ and (2) ‘amplifiative, synthetic, or (loosely speaking) induc-
tive’.10 Arguments of the second type are from an important point of view much
closer to memories and perceptions than to deductive arguments. We can regard
perception, memory and induction as the three fundamental ways of acquiring
knowledge; deduction on the other hand is merely a method of arranging our
knowledge and eliminating inconsistencies or contradictions.
Logic must then fall very definitely into two parts: (excluding analytic logic, the
theory of terms and propositions) we have the lesser logic, which is the logic of
consistency, or formal logic; and the larger logic, which is the logic of discovery, or
inductive logic.
What we have now to observe is that this distinction in no way coincides with
the distinction between certain and partial beliefs; we have seen that there is a
theory of consistency in partial beliefs just as much as of consistency in certain
beliefs, although for various reasons the former is not so important as the latter. The
theory of probability is in fact a generalization of formal logic; but in the process
10. C.S. Peirce, Chance Love and Logic, p. 92.
gives us a clear justification for the axioms of the calculus, which on such a system
as Mr Keynes’ is entirely wanting. For now it is easily seen that if partial beliefs
are consistent they will obey these axioms, but it is utterly obscure why Mr Keynes’
mysterious logical relations should obey them.11 We should be so curiously ignorant
of the instances of these relations, and so curiously knowledgeable about their
general laws.
Secondly, the Principle of Indifference can now be altogether dispensed with;
we do not regard it as belonging to formal logic to say what should be a man’s
expectation of drawing a white or a black ball from an urn; his original expectations
may within the limits of consistency be any he likes; all we have to point out is
that if he has certain expectations he is bound in consistency to have certain others.
This is simply bringing probability into line with ordinary formal logic, which does
not criticize premisses but merely declares that certain conclusions are the only
ones consistent with them. To be able to turn the Principle of Indifference out of
formal logic is a great advantage; for it is fairly clearly impossible to lay down
purely logical conditions for its validity, as is attempted by Mr Keynes. I do not
want to discuss this question in detail, because it leads to hair-splitting and arbitrary
distinctions which could be discussed for ever. But anyone who tries to decide by
Mr Keynes’ methods what are the proper alternatives to regard as equally probable
in molecular mechanics, e.g. in Gibbs’ phase-space, will soon be convinced that it is
a matter of physics rather than pure logic. By using the multiplication formula, as it
is used in inverse probability, we can on Mr Keynes’ theory reduce all probabilities
to quotients of a priori probabilities; it is therefore in regard to these latter that the
Principle of Indifference is of primary importance; but here the question is obviously
not one of formal logic. How can we on merely logical grounds divide the spectrum
into equally probable bands?
A third difficulty which is removed by our theory is the one which is presented to
Mr Keynes’ theory by the following case. I think I perceive or remember something
but am not sure; this would seem to give me some ground for believing it, contrary to
Mr Keynes' theory, by which the degree of belief in it which it would be rational for me
to have is that given by the probability relation between the proposition in question
and the things I know for certain. He cannot justify a probable belief founded not on
argument but on direct inspection. In our view there would be nothing contrary to
formal logic in such a belief; whether it would be reasonable would depend on what
I have called the larger logic which will be the subject of the next section; we shall
there see that there is no objection to such a possibility, with which Mr Keynes’
method of justifying probable belief solely by relation to certain knowledge is quite
unable to cope.
11. It appears in Mr Keynes' system as if the principal axioms – the laws of addition and
multiplication – were nothing but definitions. This is merely a logical mistake; his definitions are
formally invalid unless corresponding axioms are presupposed. Thus his definition of multiplica-
tion presupposes the law that if the probability of a given bh is equal to that of c given dh, and the
probability of b given h is equal to that of d given h, then will the probabilities of ab given h and
of cd given h be equal.
The validity of the distinction between the logic of consistency and the logic of
truth has been often disputed; it has been contended on the one hand that logical
consistency is only a kind of factual consistency; that if a belief in p is inconsistent
with one in q, that simply means that p and q are not both true, and that this
is a necessary or logical fact. I believe myself that this difficulty can be met by
Wittgenstein’s theory of tautology, according to which if a belief in p is inconsistent
with one in q, that p and q are not both true is not a fact but a tautology. But I do not
propose to discuss this question further here.
From the other side it is contended that formal logic or the logic of consistency is
the whole of logic, and inductive logic either nonsense or part of natural science.
This contention, which would I suppose be made by Wittgenstein, I feel more
difficulty in meeting. But I think it would be a pity, out of deference to authority, to
give up trying to say anything useful about induction.
Let us therefore go back to the general conception of logic as the science of
rational thought. We found that the most generally accepted parts of logic, namely,
formal logic, mathematics and the calculus of probabilities, are all concerned simply
to ensure that our beliefs are not self-contradictory. We put before ourselves the
standard of consistency and construct these elaborate rules to ensure its observance.
But this is obviously not enough; we want our beliefs to be consistent not merely
with one another but also with the facts12 : nor is it even clear that consistency is
always advantageous; it may well be better to be sometimes right than never right.
Nor when we wish to be consistent are we always able to be: there are mathematical
propositions whose truth or falsity cannot as yet be decided. Yet it may humanly
speaking be right to entertain a certain degree of belief in them on inductive or other
grounds: a logic which proposes to justify such a degree of belief must be prepared
actually to go against formal logic; for to a formal truth formal logic can only assign
a belief of degree 1. We could prove in Mr Keynes’ system that its probability is
1 on any evidence. This point seems to me to show particularly clearly that human
logic or the logic of truth, which tells men how they should think, is not merely
independent of but sometimes actually incompatible with formal logic.
In spite of this nearly all philosophical thought about human logic and especially
induction has tried to reduce it in some way to formal logic. Not that it is supposed,
except by a very few, that consistency will of itself lead to truth; but consistency
combined with observation and memory is frequently credited with this power.
Since an observation changes (in degree at least) my opinion about the fact
observed, some of my degrees of belief after the observation are necessarily
12. Cf. Kant: 'Denn obgleich eine Erkenntnis der logischen Form völlig gemäss sein möchte, d.i.
sich selbst nicht widerspräche, so kann sie doch noch immer dem Gegenstande widersprechen.'
Kritik der reinen Vernunft, First Edition, p. 59.
inconsistent with those I had before. We have therefore to explain how exactly the
observation should modify my degrees of belief; obviously if p is the fact observed,
my degree of belief in q after the observation should be equal to my degree of belief
in q given p before, or by the multiplication law to the quotient of my degree of
belief in pq by my degree of belief in p. When my degrees of belief change in this
way we can say that they have been changed consistently by my observation.
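The rule of consistent change described here is what is now usually called conditionalization; as an editorial illustration (with invented numbers), the new degree of belief in q is obtained by dividing the prior belief in pq by the prior belief in p:

def conditionalize(bel_pq, bel_p):
    # Belief in q after observing p = prior belief in (p and q) / prior belief in p.
    return bel_pq / bel_p

# Invented prior degrees of belief:
bel_p, bel_pq = 0.5, 0.2
bel_q_after_observing_p = conditionalize(bel_pq, bel_p)   # 0.4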
By using this definition, or on Mr Keynes’ system simply by using the multi-
plication law, we can take my present degrees of belief, and by considering the
totality of my observations, discover from what initial degrees of belief my present
ones would have arisen by this process of consistent change. My present degrees of
belief can then be considered logically justified if the corresponding initial degrees
of belief are logically justified. But to ask what initial degrees of belief are justified,
or in Mr Keynes’ system what are the absolutely a priori probabilities, seems to me
a meaningless question; and even if it had a meaning I do not see how it could be
answered.
If we actually applied this process to a human being, found out, that is to say, on
what a priori probabilities his present opinions could be based, we should obviously
find them to be ones determined by natural selection, with a general tendency to
give a higher probability to the simpler alternatives. But, as I say, I cannot see
what could be meant by asking whether these degrees of belief were logically
justified. Obviously the best thing would be to know for certain in advance what
was true and what false, and therefore if any one system of initial beliefs is to
receive the philosopher’s approbation it should be this one. But clearly this would
not be accepted by thinkers of the school I am criticising. Another alternative
is to apportion initial probabilities on the purely formal system expounded by
Wittgenstein, but as this gives no justification for induction it cannot give us the
human logic which we are looking for.
Let us therefore try to get an idea of a human logic which shall not attempt to
be reducible to formal logic. Logic, we may agree, is concerned not with what men
actually believe, but what they ought to believe, or what it would be reasonable to
believe. What then, we must ask, is meant by saying that it is reasonable for a man
to have such and such a degree of belief in a proposition? Let us consider possible
alternatives.
First, it sometimes means something explicable in terms of formal logic: this
possibility for reasons already explained we may dismiss. Secondly, it sometimes
means simply that were I in his place (and not e.g. drunk) I should have such a
degree of belief. Thirdly, it sometimes means that if his mind worked according to
certain rules, which we may roughly call ‘scientific method’, he would have such
a degree of belief. But fourthly it need mean none of these things; for men have not
not always believed in scientific method, and just as we ask ‘But am I necessarily
reasonable,’ we can also ask ‘But is the scientist necessarily reasonable?’ In this
ultimate meaning it seems to me that we can identify reasonable opinion with the
opinion of an ideal person in similar circumstances. What, however, would this ideal
person’s opinion be? As has previously been remarked, the highest ideal would be
always to have a true opinion and be certain of it; but this ideal is more suited to
God than to man.13
We have therefore to consider the human mind and what is the most we can
ask of it.14 The human mind works essentially according to general rules or habits;
a process of thought not proceeding according to some rule would simply be a
random sequence of ideas; whenever we infer A from B we do so in virtue of some
relation between them. We can therefore state the problem of the ideal as “What
habits in a general sense would it be best for the human mind to have?” This is a
large and vague question which could hardly be answered unless the possibilities
were first limited by a fairly definite conception of human nature. We could imagine
some very useful habits unlike those possessed by any men. [It must be explained
that I use habit in the most general possible sense to mean simply rule or law of
behaviour, including instinct: I do not wish to distinguish acquired rules or habits in
the narrow sense from innate rules or instincts, but propose to call them all habits
alike.] A completely general criticism of the human mind is therefore bound to
be vague and futile, but something useful can be said if we limit the subject in
the following way.
Let us take a habit of forming opinion in a certain way; e.g. the habit of
proceeding from the opinion that a toadstool is yellow to the opinion that it is
unwholesome. Then we can accept the fact that the person has a habit of this sort,
and ask merely what degree of opinion that the toadstool is unwholesome it would
be best for him to entertain when he sees it; i.e. granting that he is going to think
always in the same way about all yellow toadstools, we can ask what degree of
confidence it would be best for him to have that they are unwholesome. And the
answer is that it will in general be best for his degree of belief that a yellow toadstool
is unwholesome to be equal to the proportion of yellow toadstools which are in fact
13. [Earlier draft of matter of preceding paragraph in some ways better. – F.P.R.
What is meant by saying that a degree of belief is reasonable? First and often that it is what I
should entertain if I had the opinions of the person in question at the time but was otherwise as I
am now, e.g. not drunk. But sometimes we go beyond this and ask: ‘Am I reasonable?’ This may
mean, do I conform to certain enumerable standards which we call scientific method, and which
we value on account of those who practise them and the success they achieve. In this sense to be
reasonable means to think like a scientist, or to be guided only by ratiocination and induction or
something of the sort (i.e. reasonable means reflective). Thirdly, we may go to the root of why we
admire the scientist and criticize not primarily an individual opinion but a mental habit as being
conducive or otherwise to the discovery of truth or to entertaining such degrees of belief as will
be most useful. (To include habits of doubt or partial belief.) Then we can criticize an opinion
according to the habit which produced it. This is clearly right because it all depends on this habit;
it would not be reasonable to get the right conclusion to a syllogism by remembering vaguely that
you leave out a term which is common to both premisses.
We use reasonable in sense 1 when we say of an argument of a scientist this does not seem to
me reasonable; in sense 2 when we contrast reason and superstition or instinct; in sense 3 when
we estimate the value of new methods of thought such as soothsaying.].
14. What follows to the end of the section is almost entirely based on the writings of C. S. Peirce.
[Especially his "Illustrations of the Logic of Science", Popular Science Monthly, 1877 and 1878,
reprinted in Chance Love and Logic (1923).].
unwholesome. (This follows from the meaning of degree of belief.) This conclusion
is necessarily vague in regard to the spatio-temporal range of toadstools which it
includes, but hardly vaguer than the question which it answers. (Cf. density at a
point of gas composed of molecules.)
Let us put it in another way: whenever I make an inference, I do so according
to some rule or habit. An inference is not completely given when we are given the
premiss and conclusion; we require also to be given the relation between them in
virtue of which the inference is made. The mind works by general laws; therefore
if it infers q from p, this will generally be because q is an instance of a function φx
and p the corresponding instance of a function ψx such that the mind would always
infer φx from ψx. When therefore we criticize not opinions but the processes by
which they are formed, the rule of the inference determines for us a range to which
the frequency theory can be applied. The rule of the inference may be narrow, as
when seeing lightning I expect thunder, or wide, as when considering 99 instances
of a generalization which I have observed to be true I conclude that the 100th is
true also. In the first case the habit which determines the process is ‘After lightning
expect thunder’; the degree of expectation which it would be best for this habit to
produce is equal to the proportion of cases of lightning which are actually followed
by thunder. In the second case the habit is the more general one of inferring from 99
observed instances of a certain sort of generalization that the 100th instance is true
also; the degree of belief it would be best for this habit to produce is equal to the
proportion of all cases of 99 instances of a generalization being true, in which the
100th is true also.
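Ramsey's standard for criticizing a habit reduces to a frequency calculation; the following editorial sketch (with an invented record of cases) computes the best degree of belief for the 'after lightning expect thunder' habit as the proportion of cases in which thunder in fact followed:

def best_degree_for_habit(outcomes):
    # The best degree of belief for the habit to produce is the proportion of
    # cases in which the inference it licenses turns out true.
    return sum(outcomes) / len(outcomes)

# Invented record: 1 if thunder did follow the lightning, 0 if it did not.
lightning_then_thunder = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
best = best_degree_for_habit(lightning_then_thunder)   # 0.8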
Thus given a single opinion, we can only praise or blame it on the ground of truth
or falsity: given a habit of a certain form, we can praise or blame it accordingly as
the degree of belief it produces is near or far from the actual proportion in which
the habit leads to truth. We can then praise or blame opinions derivatively from our
praise or blame of the habits that produce them.
This account can be applied not only to habits of inference but also to habits
of observation and memory; when we have a certain feeling in connection with an
image we think the image represents something which actually happened to us, but
we may not be sure about it; the degree of direct confidence in our memory varies.
If we ask what is the best degree of confidence to place in a certain specific memory
feeling, the answer must depend on how often when that feeling occurs the event
whose image it attaches to has actually taken place.
Among the habits of the human mind a position of peculiar importance is
occupied by induction. Since the time of Hume a great deal has been written
about the justification for inductive inference. Hume showed that it could not be
reduced to deductive inference or justified by formal logic. So far as it goes his
demonstration seems to me final; and the suggestion of Mr Keynes that it can be got
round by regarding induction as a form of probable inference cannot in my view be
maintained. But to suppose that the situation which results from this is a scandal to
philosophy is, I think, a mistake.
We are all convinced by inductive arguments, and our conviction is reasonable
because the world is so constituted that inductive arguments lead on the whole to
true opinions. We are not, therefore, able to help trusting induction, nor if we could
15. 'Likely' here simply means that I am not sure of this, but only have a certain degree of belief in it.
16. Cf. also the account of 'general rules' in the Chapter 'Of Unphilosophical Probability' in Hume's
Treatise.
Chapter 4
Probable Knowledge
Richard C. Jeffrey
To begin, we must get clear about the relevant sense of ‘belief’. Here I follow
Ramsey: ‘the kind of measurement of belief with which probability is concerned
is ... a measurement of belief qua basis of action'.1
1 Frank P. Ramsey, 'Truth and probability', in The Foundations of Mathematics and Other Logical Essays, R. B. Braithwaite, ed., London and New York, 1931, p. 171.
R.C. Jeffrey (deceased)
Princeton University, Princeton, NJ, USA
[Fig. 4.1: a desirability scale running from des L = 0 to des W = 1, with des G marked at 3/4]

prob A = (des G − des L)/(des W − des L).    (4.1)
Thus, if the desirabilities of losing and of winning happen to be 0 and 1, we have
prob A = des G, as illustrated in Fig. 4.1, for the case in which the probability of
winning is thought to be 3/4.
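A quick arithmetic illustration of (4.1), in Python (a minimal sketch; the function name and the sample desirabilities are chosen here for illustration). On the usual reading of Ramsey's construction, the gambler is indifferent between G for certain and the gamble that yields W if A is true and L if A is false, so des G = prob A · des W + (1 − prob A) · des L, which rearranges to (4.1).

def ramsey_prob(des_g, des_w, des_l):
    # Probability of A revealed by indifference between G for certain and
    # the gamble (W if A, L if not-A), as in formula (4.1).
    return (des_g - des_l) / (des_w - des_l)

# With des L = 0 and des W = 1, prob A is simply des G, e.g. 3/4:
print(ramsey_prob(0.75, 1.0, 0.0))  # prints 0.75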
On this basis, Ramsey2 is able to give rules for deriving the gambler’s subjective
probability and desirability functions from his preference ranking of gambles,
provided the preference ranking satisfies certain conditions of consistency. The
probability function obtained in this way is a probability measure in the technical
sense that, given any finite set of pairwise incompatible propositions which together
exhaust all possibilities, their probabilities are non-negative real numbers that add up
to 1. And in an obvious sense, probability so construed is a measure of the subject’s
willingness to act on his beliefs in propositions: it is a measure of degree of belief.
I propose to use what I take to be an improvement of Ramsey’s scheme, in
which the work that Ramsey does with the operation of forming gambles is done
with the usual truth-functional operations on propositions.3 The basic move is to
restrict attention to certain ‘natural’ gambles, in which the prize for winning is the
truth of the proposition gambled upon, and the penalty for losing is the falsity of
that proposition. In general, the situation in which the gambler takes himself to be
gambling on A with prize W and loss L is one in which he believes the proposition
2 'Truth and probability', F. P. Ramsey, op. cit.
3 See Richard C. Jeffrey, The Logic of Decision, McGraw-Hill, 1965, the mathematical basis for which can be found in Ethan Bolker, Functions Resembling Quotients of Measures, Ph. D. Dissertation, Harvard University, 1965, and Trans. Am. Math. Soc., 124, 1966, pp. 293–312.
G = AW ∨ ĀL.
G = AA ∨ ĀĀ = T.
Now if A is a proposition which the subject thinks good (or bad) in the sense that he
places it above T (or below T) in his preference ranking, we have
prob A = (des T − des Ā)/(des A − des Ā),    (4.2)
corresponding to Ramsey’s formula (4.1).
Here the basic idea is that if A1, A2, ..., An are an exhaustive set of incompatible
ways in which the proposition A can come true, the desirability of A must be a
weighted average of the desirabilities of the ways in which it can come true:

des A = [prob A1 des A1 + ··· + prob An des An] / [prob A1 + ··· + prob An].    (4.3)
Let us call a function des which attributes real numbers to propositions a Bayesian
desirability function if there is a probability measure prob relative to which (4.3)
holds for all suitable A, A1, A2, ..., An. And let us call a preference ranking of
propositions coherent if there is a Bayesian desirability function which ranks those
propositions in order of magnitude exactly as they are ranked in order of preference.
One can show4 that if certain weak conditions are met by a coherent preference
ranking, the underlying desirability function is determined up to a fractional linear
transformation, i.e., if des and DES both rank propositions in order of magnitude
exactly as they are ranked in order of preference, there must be real numbers a, b, c,
d such that for any proposition A in the ranking we have
DES A = (a des A + b)/(c des A + d).    (4.5)
4 Jeffrey, op. cit., chs. 6, 8.
Under further plausible conditions, (4.5) and (4.6) are given either exactly (as in
Ramsey’s theory) or approximately by
DES A = a des A + b,    (4.7)
PROB A = prob A.    (4.8)
I take the principal advantage of the present theory over Ramsey’s to be that here
we work with the subject’s actual beliefs, whereas Ramsey needs to know what
the subject’s preference ranking of relevant propositions would be if his views of
what the world is were to be changed by virtue of his having come to believe that
various arbitrary and sometimes bizarre causal relationships had been established
via gambles.5
To see more directly how preferences may reflect beliefs in the present system,
observe that by (4.2) we must have prob A > prob B if the relevant portion of the
preference ranking is
A, B
T
B̄
Ā
In particular, suppose that A and B are the propositions that the subject will get job 1
and that he will get job 2, respectively. Pay, working conditions, etc., are the same,
so that he ranks A and B together. Now if he thinks himself more likely to get job
1 than job 2, he will prefer a guarantee of (B̄) not getting job 2 to a guarantee of
(Ā) not getting job 1; for he thinks that an assurance of not getting job 2 leaves him
more likely to get one or the other of the equally liked jobs than would an assurance
of not getting job 1.
5 Jeffrey, op. cit., pp. 145–150.
where prob is the observer’s belief function before the observation. And conversely,
if the observer’s belief function after the observation is probE and probE is not
identical with prob, then the direct effect of the observation will be to change the
observer’s degree of belief in E to 1. This completes a definition of direct.
But from a certain strict point of view, it is rarely or never that there is a
proposition for which the direct effect of an observation is to change the observer’s
degree of belief in that proposition to 1; and from that point of view, the classes of
propositions that count as observational or actual in the senses defined above are
either empty or as good as empty for practical purposes. For if we care seriously to
distinguish between 0.999 999 and 1.000 000 as degrees of belief, we may find that,
after looking out the window, the observer’s degree of belief in the proposition that
the sun is shining is not quite 1, perhaps because he thinks there is one chance in a
million that he is deluded or deceived in some way; and similarly for acts where we
can generally take ourselves to be at best trying (perhaps with very high probability
of success) to make a certain proposition true.
One way in which philosophers have tried to resolve this difficulty is to postulate
a phenomenalistic language in which an appropriate proposition E can always be
expressed, as a report on the immediate content of experience; but for excellent
reasons, this move is now in low repute.7 The crucial point is not that 0.999 999 is
so close to 1.000 000 as to make no odds, practically speaking, for situations abound
in which the gap is more like one half than one millionth. Thus, in examining a
piece of cloth by candlelight one might come to attribute probabilities 0.6 and 0.4
6 G. E. M. Anscombe, Intention, § 8, Oxford, 1957; 2nd ed., Ithaca, N.Y., 1963.
7 See, e.g., J. L. Austin, Sense and Sensibilia, Oxford, 1962.
to the propositions G that the cloth is green and B that it is blue, without there
being any proposition E for which the direct effect of the observation is anything
near changing the observer’s degree of belief in E to 1. One might think of some
such proposition as that (E) the cloth looks green
or possibly blue, but this is far
too vague to yield prob (G/E) = 0.6 and prob (B/E) = 0.4. Certainly, there is something
about what the observer sees that leads him to have the indicated degrees of belief
in G and in B, but there is no reason to think the observer can express this something
by a statement in his language. And physicalistically, there is some perfectly definite
pattern of stimulation of the rods and cones of the observer’s retina which prompts
his belief, but there is no reason to expect him to be able to describe that pattern or
to recognize a true description of it, should it be suggested.
As Austin8 points out, the crucial mistake is to speak seriously of the evidence of
the senses. Indeed the relevant experiences have perfectly definite characteristics by
virtue of which the observer comes to believe as he does, and by virtue of which in
our example he comes to have degree of belief 0.6 in G. But it does not follow that
there is a proposition E of which the observer is certain after the observation and for
which we have prob (G/E) = 0.6, prob (B/E) = 0.4, etc.
In part, the quest for such phenomenological certainty seems to have been
prompted by an inability to see how uncertain evidence can be used. Thus C. I.
Lewis:
If anything is to be probable, then something must be certain. The data which themselves
support a genuine probability, must themselves be certainties. We do have such absolute
certainties, in the sense data initiating belief and in those passages of experience which later
may confirm it. But neither such initial data nor such later verifying passages of experience
can be phrased in the language of objective statement – because what can be so phrased is
never more than probable. Our sense certainties can only be formulated by the expressive
use of language, in which what is signified is a content of experience and what is asserted
is the givenness of this content.9
But this motive for the quest is easily disposed of.10 Thus, in the example of
observation by candlelight, we may take the direct result of the observation (in a
modified sense of 'direct') to be, that the observer's degrees of belief in G and
B change to 0.6 and 0.4. Then his degree of belief in any proposition A in his
preference ranking will change from prob A to

PROB A = 0.6 prob (A/G) + 0.4 prob (A/B).
8 Austin, op. cit., ch. 10.
9 C. I. Lewis, An Analysis of Knowledge and Valuation, La Salle, Illinois, 1946, p. 186.
10 Jeffrey, op. cit., ch. 11.
neither 0 nor 1; and where for each proposition A in the preference ranking and for
each i the conditional probability of A on Ei is unaffected by the observation:

PROB (A/Ei) = prob (A/Ei).

Then the belief function after the observation may be taken to be PROB, where

PROB A = PROB E1 prob (A/E1) + PROB E2 prob (A/E2) + ··· + PROB En prob (A/En),

if the observer's preference rankings before and after the observation are both
coherent. Where these conditions are met, the propositions E1, E2, ..., En may
be said to form a basis for the observation; and the notion of a basis will play the
role vacated by the notion of directness.
The situation is similar in the case of acts. A marksman may have a fairly definite
idea of his chances of hitting a distant target, e.g. he may have degree of belief 0.3 in
the proposition H that he will hit it. The basis for this belief may be his impressions
of wind conditions, quality of the rifle, etc.; but there need be no reason to suppose
that the marksman can express the relevant data; nor need there be any proposition
E in his preference ranking in which the marksman’s degree of belief changes to 1
upon deciding to fire at the target, and for which we have prob (H/E) = 0.3. But the
pair H, H̄ may constitute a basis for the act, in the sense that for any proposition A
in the marksman's preference ranking, his degree of belief after his decision is

PROB A = 0.3 prob (A/H) + 0.7 prob (A/H̄).
It is correct to describe the marksman as trying to hit the target; but the proposition
that he is trying to hit the target can not play the role of E above. Similarly, it was
correct to describe the cloth as looking green or possibly blue; but the proposition
that the cloth looks green or possibly blue does not satisfy the conditions for
directness.
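Both the candlelight and the marksman cases follow the same rule: reassign new probabilities to the basis propositions and leave the conditional probabilities on them untouched. A minimal Python sketch of that rule (the worlds, prior, and basis below are illustrative assumptions, not Jeffrey's):

def jeffrey_update(prior, basis, new_basis_probs):
    # prior: dict world -> probability; basis: list of disjoint sets of worlds
    # (a partition); new_basis_probs: new probability for each basis cell.
    # Returns the belief function PROB with PROB(A) = sum_i PROB(E_i) * prob(A | E_i).
    posterior = {}
    for cell, q in zip(basis, new_basis_probs):
        p_cell = sum(prior[w] for w in cell)
        for w in cell:
            posterior[w] = q * prior[w] / p_cell  # prob(w | cell), rescaled to q
    return posterior

# Candlelight example: the direct result is that prob(G), prob(B) become 0.6, 0.4.
prior = {'g1': 0.25, 'g2': 0.25, 'b1': 0.25, 'b2': 0.25}
basis = [{'g1', 'g2'}, {'b1', 'b2'}]              # the basis G, B
print(jeffrey_update(prior, basis, [0.6, 0.4]))   # each g-world 0.3, each b-world 0.2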
The notion of directness is useful as well for the resolution of unphilosophical
posers about probabilities, in which the puzzling element sometimes consists in
failure to think of an appropriate proposition E such that the direct effect of an
observation is to change degree of belief in E to 1, e.g. in the following problem
reported by Mosteller.11
Three prisoners, a, b, and c, with apparently equally good records have applied for
parole. The parole board has decided to release two of the three, and the prisoners know this
but not which two. A warder friend of prisoner a knows who are to be released. Prisoner
a realizes that it would be unethical to ask the warder if he, a, is to be released, but thinks
of asking for the name of one prisoner other than himself who is to be released. He thinks
that before he asks, his chances of release are 2/3. He thinks that if the warder says 'b will
be released,' his own chances have now gone down to 1/2, because either a and b or b and
c are to be released. And so a decides not to reduce his chances by asking. However, a is
mistaken in his calculations. Explain.
11 Problem 13 of Frederick Mosteller, Fifty Challenging Problems in Probability, Reading, Mass., Palo Alto, and London, 1965.
is 1/2. (b) It is compulsory. In the case of the coin as in the case of the sun, I cannot
decide to have a different degree of belief in the proposition, any more than I can
decide to walk on air.
In my scientific and practical undertakings I must make use of such compulsory
beliefs. In attempting to understand or to affect the world, I cannot escape the fact
that I am part of it: I must rather make use of that fact as best I can. Now where
epistemologists have spoken of observation as a source of knowledge, I want to
speak of observation as a source of compulsory belief to one or another degree. I
do not propose to identify a very high degree of belief with knowledge, any more
than I propose to identify the property of being near 1 with the property of being
compulsory.
Nor do I postulate any general positive or negative connection between the char-
acteristic of being compulsory and the characteristic of being sound or appropriate in
the light of the believer’s experience. Nor, finally, do I take a compulsory belief to be
necessarily a permanent one: new experience or new reflection (perhaps, prompted
by the arguments of others) may loosen the bonds of compulsion, and may then
establish new bonds; and the effect may be that the new state of belief is sounder
than the old, or less sound.
Then why should we trust our beliefs? According to K. R. Popper,
... the decision to accept a basic statement, and to be satisfied with it, is causally connected
with our experiences – especially with our perceptual experiences. But we do not attempt
to justify basic statements by these experiences. Experiences can motivate a decision, and
hence an acceptance or a rejection of a statement, but a basic statement cannot be justified
by them – no more than by thumping the table.12
But in the absence of a positive account of the nature of acceptance and rejection,
parallel to the account of partial belief given in section 1, it is impossible to evaluate
this view. Acceptance and rejection are apparently acts undertaken as results of
decisions; but somehow the decisions are conventional – perhaps only in the sense
that they may be motivated by experience, but not adequately motivated, if adequacy
entails justification.
12 K. R. Popper, The Logic of Scientific Discovery, London, 1959, p. 105.
13 Popper, op. cit., p. 106.
To return to the question, ‘Why should we trust our beliefs?’ one must ask what
would be involved in not trusting one’s beliefs, if belief is analyzed as in section 1
in terms of one’s preference structure. One way of mistrusting a belief is declining
to act on it, but this appears to consist merely in lowering the degree of that belief:
to mistrust a partial belief is then to alter its degree to a new, more suitable value.
A more hopeful analysis of such mistrust might introduce the notion of sensitivity
to further evidence or experience. Thus, agents 1 and 2 might have the same degree
of belief – 1/2 – in the proposition H1 that the first toss of a certain coin will yield
a head, but agent 1 might have this degree of belief because he is convinced that
the coin is normal, while agent 2 is convinced that it is either two-headed or two-
tailed, he knows not which.14 There is no question here of agent 2's expressing his
mistrust of the figure 1/2 by lowering or raising it, but he can express that mistrust
quite handily by aspects of his belief function. Thus, if Hi is the proposition that
the coin lands head up the ith time it is tossed, agent 2's beliefs about the coin are
accurately expressed by the function prob2 where

prob2 Hi = 1/2,   prob2 (Hi/Hj) = 1,

while agent 1's beliefs are equally accurately expressed by the function prob1 where

prob1 (Hi1, Hi2, ..., Hin) = 2⁻ⁿ,

if i1 < i2 < ... < in. In an obvious sense, agent 1's beliefs are firm in the sense that
he will not change them in the light of further evidence, since we have

prob1 (Hn+1/H1, H2, ..., Hn) = prob1 Hn+1 = 1/2,

while agent 2's beliefs are quite tentative and in that sense, mistrusted by their
holder. Still, prob1 Hi = prob2 Hi = 1/2.
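The contrast can be made vivid with a small computation. The following Python sketch restricts the two belief functions to three tosses (a truncation introduced here for finiteness): both assign 1/2 to H1, but they respond to a first observed head quite differently.

from itertools import product

N = 3                                   # finite truncation, for illustration only
worlds = list(product('HT', repeat=N))

prob1 = {w: 0.5 ** N for w in worlds}   # agent 1: independent fair coin
prob2 = {w: (0.5 if len(set(w)) == 1 else 0.0) for w in worlds}
                                        # agent 2: two-headed or two-tailed, 50/50

def p(prob, event):                     # probability of a set of worlds
    return sum(prob[w] for w in worlds if event(w))

def heads(i):                           # H_i: the (i+1)-th toss lands heads
    return lambda w: w[i] == 'H'

for prob in (prob1, prob2):
    p_h1 = p(prob, heads(0))
    p_h2_given_h1 = p(prob, lambda w: heads(0)(w) and heads(1)(w)) / p_h1
    print(p_h1, p_h2_given_h1)
# prob1: 0.5 and 0.5 (the evidence changes nothing)
# prob2: 0.5 and 1.0 (one observed head settles the matter)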
After these defensive remarks, let me say how and why I take compulsive belief
to be sound, under appropriate circumstances. Bemused with syntax, the early
logical positivists were chary of the notion of truth; and then, bemused with Tarski’s
account of truth, analytic philosophers neglected to inquire how we come to believe
or disbelieve simple propositions. Quite simply put, the point is: coming to have
suitable degrees of belief in response to experience is a matter of training – a skill
which we begin acquiring in early childhood, and are never quite done polishing.
The skill consists not only in coming to have appropriate degrees of belief in
appropriate propositions under paradigmatically good conditions of observation, but
also in coming to have appropriate degrees of belief between zero and one when
conditions are less than ideal.
Thus, in learning to use English color words correctly, a child not only learns to
acquire degree of belief 1 in the proposition that the cloth is blue, when in bright
sunlight he observes a piece of cloth of uniform hue, the hue being squarely in
14 This is a simplified version of 'the paradox of ideal evidence', Popper, op. cit., pp. 407–409.
the middle of the blue interval of the color spectrum: he also learns to acquire
appropriate degrees of belief between 0 and 1 in response to observation under bad
lighting conditions, and when the hue is near one or the other end of the blue region.
Furthermore, his understanding of the English color words will not be complete
until he understands, in effect, that blue is between green and violet in the color
spectrum: his understanding of this point or his lack of it will be evinced in the sorts
of mistakes he does and does not make, e.g. in mistaking green for violet he may be
evincing confusion between the meanings of ‘blue’ and of ‘violet’, in the sense that
his mistake is linguistic, not perceptual.
Clearly, the borderline between factual and linguistic error becomes cloudy, here:
but cloudy in a perfectly realistic way, corresponding to the intimate connection
between the ways in which we experience the world and the ways in which we
speak. It is for this sort of reason that having the right language can be as important
as (and can be in part identical with) having the right theory.
Then learning to use a language properly is in large part like learning such skills
as riding bicycles and flying aeroplanes. One must train oneself to have the right
sorts of responses to various sorts of experiences, where the responses are degrees
of belief in propositions. This may, but need not, show itself in willingness to utter
or assent to corresponding sentences. Need not, because e.g. my cat is quite capable
of showing that it thinks it is about to be fed, just as it is capable of showing what
its preference ranking is, for hamburger, tuna fish, and oat meal, without saying or
understanding a word. With people as with cats, evidence for belief and preference
is behavioral; and speech is far from exhausting behavior.15
Our degrees of belief in various propositions are determined jointly by our
training and our experience, in complicated ways that I cannot hope to describe.
And similarly for conditional subjective probabilities, which are certain ratios of
degrees of belief: to some extent, these are what they are because of our training –
because we speak the languages we speak. And to this extent, conditional subjective
probabilities reflect meanings. And in this sense, there can be a theory of degree of
confirmation which is based on analysis of meanings of sentences. Confirmation
theory is therefore semantical and, if you like, logical.16
Discussion
15 Jeffrey, op. cit., pp. 57–59.
16 Support of U.S. Air Force Office of Scientific Research is acknowledged, under Grant AF–AFOSR–529–65.
more explicit than it was earlier, although I shall not contradict anything that has
been said: I think this will help us to see to what extent, if any, there is anything
surprising or paradoxical in the situation.
First of all let me say this: there were three possible decisions by the warden –
AB, BC and AC; then, as against that, there was also the question of what the warden
would say to a who asked the question who else was being freed, and clearly the
warden could only answer ‘b’ or ‘c’. What I’m going to put down here is simply
the bivariate or two-way probability distribution, and it doesn’t matter at all at this
stage whether we interpret it as a frequency or as a subjective probability, because
it’s just a matter of applying the mechanics of the Bayes theorem.
One other remark I’d like to make is this: the case that was considered by
Professor Jeffrey was one where the a priori probabilities of AB, BC and AC were
each one-third. This actually does not at all affect the reasoning, and I will stick with
it just because it is close to my limitations in arithmetic.
So the marginal frequencies or probabilities are all equal to one-third. If the
decision had been AB, then of course the warden could only answer ‘b’, and
similarly if the decision had been AC, he could only answer ‘c’. So the joint
frequency or probability of the following event is one-third: the people chosen for
freeing are a and b, and when the warden is asked, ‘Who is the person other than a
who is about to be freed?’, his answer is ‘b’. The joint probability is also one-third
that the choice was AC and that the warden answered ‘c’.
We now come to the only case where the warden has a choice of what he will
say, namely, the case where the decision was BC. The question was raised, quite
properly, of how he goes about making this choice.
Let me here say the following. In a sense what I’m doing here is a sally into
enemy territory, because I personally am not particularly Bayesian in my approach
to decision theory, so I would not myself assert that the only method is to describe
the warden’s decision, the warden’s principle of choice, as a probabilistic one.
However, if it is not probabilistic, then of course the prisoner, our a, would have
to be using some other principle of choice on his part in order to decide what to
do. Being an unrepentant conservative on this, I might choose, or A might choose,
the minimax principle. However, in order to follow the spirit of the discussion
here, I will assume that the whole thing is being done in a completely Bayesian
or probabilistic way; in this case, to compute the remaining joint distribution we
must make some probabilistic assumption about how the warden will behave when
asked the question.
So let the principle be this, that he has a certain random device such that if the
people to be freed are b and c, his answer to the question will be 'b' with probability
β and 'c' with of course probability 1 − β. All I will assume for the moment about β
is that it is between zero and one, and that's probably one of the few uncontroversial
points so far.
It is clear that the sum of the two joint probabilities (BC and 'b', and BC and 'c')
will be one-third; so the first will be β/3, and the second (1 − β)/3. The marginal (or
absolute) probabilities of 'b' and 'c' will be (1 + β)/3 and (2 − β)/3 respectively.
Dec. \ Inf.    'b'            'c'            Marginal
AB             1/3            0              1/3
BC             β/3            (1 − β)/3      1/3
AC             0              1/3            1/3
Marginal       (1 + β)/3      (2 − β)/3      1
Now what are the probabilities after the warden has given his answer? Suppose that
the answer that the warden gave is ‘b’: the problem now is, what is the probability
that a is to be freed, given that the warden said that b is to be freed? This probability,
which I will denote by ‘ b ’, I obtain in the following way using what I hope is a
self-explanatory notation:
b D p .A=‘b’/
p .A ‘b’/
D
p .‘b’/
p .AB‘b’/ C p .AC‘b’/
D
p .AB‘b’/ C p .AC‘b’/ C p .BC‘b’/
1
3 C0
D 1
3 C 0 C ˇ=3
D 1= .1 C ˇ/ :
Similarly I get π_c (the probability that a is to be freed, given that the warden said
that c is to be freed) as follows:

π_c = p(A/'c')
    = p(A·'c') / p('c')
    = [p(AB·'c') + p(AC·'c')] / [p(AB·'c') + p(AC·'c') + p(BC·'c')]
    = (0 + 1/3) / (0 + 1/3 + (1 − β)/3)
    = 1/(2 − β).
Now the question we have to ask is this: are these conditional probabili-
ties, π_b and π_c, different from the marginal (absolute) probability that a is to be freed, p(a)?
And the answer is that they are except when β happens to be equal to one-half, in
which case the probability remains at its marginal value of two-thirds. But except in
this special case the probabilities π_b and π_c can vary from one-half to one.17
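A short numerical check of the table and the posteriors above, in Python (the sample values of β are chosen here for illustration):

def posteriors(beta):
    # Joint distribution from the table: rows AB, BC, AC; columns 'b', 'c'.
    joint = {('AB', 'b'): 1/3, ('AB', 'c'): 0,
             ('BC', 'b'): beta/3, ('BC', 'c'): (1 - beta)/3,
             ('AC', 'b'): 0, ('AC', 'c'): 1/3}
    p_b = sum(v for (dec, ans), v in joint.items() if ans == 'b')
    p_c = sum(v for (dec, ans), v in joint.items() if ans == 'c')
    # a is released exactly if the decision was AB or AC.
    pi_b = (joint[('AB', 'b')] + joint[('AC', 'b')]) / p_b
    pi_c = (joint[('AB', 'c')] + joint[('AC', 'c')]) / p_c
    return pi_b, pi_c

for beta in (0.0, 0.5, 1.0):
    print(beta, posteriors(beta))
# beta = 0.5 gives (2/3, 2/3); beta = 0 gives (1, 1/2); beta = 1 gives (1/2, 1)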
As I indicated before, there is no quarrel between us, but I do want to explore just
one step further, and that is this. You remember when we were told this anecdote
there was a wave of laughter and I now want to see what it was that was so funny.
It is that this prisoner became doubtful about asking for this extra information,
because he thought his probability of being released would go down after getting
it. So it seemed that having this extra information would make him less happy,
even though he didn’t have to pay for it. That really was the paradox, not the fact
that the probabilities changed. Clearly, the change in probabilities is itself not at all
surprising; for example, if the warden had told a the names of two people other than
himself who would be freed, his optimism would have gone down very drastically.18
What is surprising is that a thought he would be less happy with the prospect
of having the extra piece of information than without this prospect. What I want to
show now is that a was just wrong to think this; in other words, if this information
was free, he should have been prepared to hear it.
Suppose for instance β is different from one-half: I think it is implicit in this little
anecdote that the probability of a’s being released either before or after getting the
information, in some sense corresponds to his level of satisfaction. If his chances are
good he is happy; if his chances are bad he is miserable. So these π's, though they
happen to have been obtained as probabilities, may at the same time be interpreted
as utilities or what Professor Jeffrey called desirabilities. Good. Now if a proceeds
in the Bayesian way he has to do the following: he has to look at all these numbers,
because before he asks for the information he does not know whether the answer
will be ‘b’ or ‘c’. Then he must ask himself the following: How happy will I be if
he says ‘b’? How happy will I be if he says ‘c’? And then in the Bayesian (or de
Finetti or Ramsey) spirit he multiplies the utilities, say u(‘b’) and u(‘c’) associated
with hearing the warden say ‘b’ or ‘c’ by the respective probabilities, say p(‘b’) and
p(‘c’), of hearing these answers. He thus obtains an expression for his expected19
utility associated with getting the extra information, say
Eu* = p('b') u('b') + p('c') u('c').
17 In the problem as reported by Mosteller, it might be reasonable to take β = 1/2. In that case, let us note, π_b = 1/(1 + 1/2) = 2/3 (not 1/2 as suggested in the statement of the problem!) and also π_c = 1/(2 − β) = 1/(2 − 1/2) = 2/3. Hence (for β = 1/2) a was wrong to expect the probabilities to change. But, on the other hand, the warden's reply would give him no additional information.
18 Or suppose that β = 1 (and a knows this). Then if a hears the warden tell him that c is one of the persons to be released, he will have good reason to feel happy. For when β = 1, the warden will tell a about having selected c only if the selected pair was AC. On the other hand, still with β = 1, if the warden says that b is one of the persons to be released, this means (with equal probabilities) that either AB or BC has been chosen, but not AC. Hence, with the latter piece of information, a will be justifiably less optimistic about his chances of release. (With β close to one, a similar situation prevails.)
19 In the sense of the mathematical expectation of a random variable.
Now the required probabilities are the marginal probabilities at the bottom of the
table, i.e.,

p('b') = (1 + β)/3,   p('c') = (2 − β)/3.

As for utilities, it is implicit in the argument that they are linear20 functions of the
probabilities that a will be released given the warden's answer. So

u('b') = π_b = 1/(1 + β),   u('c') = π_c = 1/(2 − β).
Hence the (expected) utility associated with getting the extra information from the
warden is

Eu* = [(1 + β)/3] · [1/(1 + β)] + [(2 − β)/3] · [1/(2 − β)] = 2/3.

On the other hand, the expected utility Eu°, associated with not asking the warden
for extra information, is simply equal to the original probability p(a) that a will be
released,

Eu° = p(a) = 2/3.
Hence it so happens that (for a utility function linear in probabilities of release)

Eu* = Eu°,

i.e., the expected utility with extra information (Eu*) is the same as without
extra information (Eu°). Thus a should be willing (but not eager) to ask for extra
information (if it is free of charge). 'On the average'21, it won't do him any harm;
nor will it help him.22
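The cancellation can be checked numerically; a minimal Python sketch (the grid of β values is an illustrative choice):

# Eu* = p('b') u('b') + p('c') u('c'), with u('b') = 1/(1 + beta) and u('c') = 1/(2 - beta),
# against Eu° = p(a) = 2/3 for not asking.
for beta in (0.0, 0.25, 0.5, 0.75, 1.0):
    p_b, p_c = (1 + beta) / 3, (2 - beta) / 3
    eu_star = p_b * (1 / (1 + beta)) + p_c * (1 / (2 - beta))
    print(beta, eu_star)                 # 2/3 every time, matching Eu°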
P. SUPPES: Rational Changes of Belief.
I am generally very much in agreement with Professor Jeffrey’s viewpoint on
belief and knowledge as expressed in his paper. The focus of my brief remarks
is to point out how central and difficult are the problems concerning changes of
belief. Jeffrey remarks that the familiar method of changing probable beliefs by
explicitly conditionalizing the relevant probability measure is not adequate in many
20 See footnote 22 on the next page.
21 'On the average' expresses the fact that the decision is made on the basis of mathematical expectation. It need not imply a frequency interpretation of probabilities.
22 When utilities are non-linear with respect to probabilities of release, the prospect of additional information may be helpful or harmful.
situations – in fact, in all those situations that involve a change in the probability
assigned to evidence, but a change that does not make the probability of possible
evidence 0 or 1.
My point is that once we acknowledge this fact about the probable character of
evidence we open Pandora’s box of puzzles for any theory of rational behavior. I
would like to mention three problems. These problems are not dealt with explicitly
by Jeffrey, but the focus I place on them is certainly consistent with his own
expressed views.
Finite Memory
The second problem concerns the rational use of our inevitably restricted memory
capacities. A full-blown theory of rationality should furnish guidelines for the kinds
of things that should be remembered and the kind that should not. Again a solution,
but certainly not a solution whose optimality can be seriously defended, is at least
partly given by the choice of a language or a sample space for dealing with a given
set of phenomena. But the amount of information impinging on any man in a day
or a week or a month is phenomenal and what is accessible if he chooses to make it
so is even more so. What tiny fraction of this vast potential should be absorbed and
stored for ready access and use? Within the highly limited context of mathematical
statistics, certain answers have been proposed. For example, information about the
outcome of an experiment can be stored efficiently and with little loss of information
in the form of the likelihood function or some other sufficient statistic, but this
approach is not of much use in most situations, although elements of the approach
can perhaps be generalized to less restricted circumstances. Perhaps even more
importantly, it is not clear what logical structure is the most rational to impose
on memory. The attempts at constructing associative computer memories, as they
are often called, show how little we are able as yet to characterize explicitly a
memory with the power and flexibility of a normal man’s, not to speak of the
memory of a normal man who is using his powers with utmost efficiency. Perhaps
one of the most important weaknesses of confirmation theory and the Ramsey-sort
of theory developed by Jeffrey and others is that little is said about imposing a
logical structure on evidence. Part of the reason for this is that the treatment of
evidence is fundamentally static rather than dynamic and temporal. In real life,
evidence accumulates over time and we tend to pay more attention to later than
earlier data, but the appropriate logical mechanisms for storing, organizing and
compressing temporally ordered data are as yet far from being understood.
Concept Formation
The most fundamental and the most far-reaching cognitive changes in belief
undoubtedly take place when a new concept is introduced. The history of science
and technology is replete with examples ranging from the wheel to the computer,
and from arithmetic to quantum mechanics. Perhaps the deepest problem of rational
behavior, at least from a cognitive or epistemological standpoint, is to characterize
when a man should turn from using the concepts he has available to solve a
given problem to the search not just for new evidence but for new concepts with
which to analyze the evidence. Perhaps the best current example of the difficulty
of characterizing the kinds of concepts we apply to the solution of problems is
the floundering and vain searching as yet typical of the literature on artificial
intelligence. We cannot program a computer to think conceptually because we do
not understand how men think conceptually, and the problem seems too difficult
to conceive of highly nonhuman approaches. For those of us interested in rational
behavior the lesson to be learned from the tantalizing yet unsatisfactory literature on
artificial intelligence is that we are a long way from being able to say what a rational
set of concepts for dealing with a given body of experience should be like, for we
do not have a clear idea of what conceptual apparatus we actually use in any real
sense.
To the problems about rationality I have raised in these remarks there is the pat
answer that these are not problems of the theory of rational behavior but only of
the theory of actual behavior. This I deny. A theory of rationality that does not
take account of the specific human powers and limitations of attention, memory and
conceptualization may have interesting things to say but not about human rationality.
R. C. JEFFREY: Reply.
Suppes’ and Hurwicz’ comments are interesting and instructive, and I find I have
little to add to them. But perhaps a brief postscript is in order, in response to Suppes’
closing remark:
A theory of rationality that does not take account of the specific human powers and
limitations of attention may have interesting things to say, but not about human rationality.
It may be that there is no real issue between us here, but the emphasis makes me
uncomfortable. In my view, the logic of partial belief is a branch of decision theory,
and I take decision theory to have the same sort of relevance to human rationality
that (say) quantification theory has: the relevance is there, even though neither
theory is directly about human rationality, and neither theory takes any account of
the specific powers and limitations of human beings.
For definiteness, consider the following preference ranking of four sentences s,
s′, t, t′, where s and s′ are logically inconsistent, as are t and t′.

s
s′
t
t′
This ranking is incoherent: it violates at least one of the following two requirements,
(a) Logically equivalent sentences are ranked together. (b) The disjunction of two
logically incompatible sentences is ranked somewhere in the interval between them,
endpoints included. Requirements (a) and (b) are part of (or anyway, implied by) a
definition of 'incoherent'. To see that the given ranking is incoherent, notice that (a)
implies that the disjunction of the sentences s, s′ is ranked with the disjunction of
the sentences t, t′, while (b) implies that in the given ranking, the first disjunction
is higher than the second. In my view, the point of classifying this ranking as
incoherent is much like the point of classifying the pair s, s′ as logically inconsistent:
the two classifications have the same sort of relevance to human rationality. In
the two cases, a rational man who made the classification would therefore decline
to own the incoherent preference ranking or to believe both of the inconsistent
sentences. (For simplicity I speak of belief here as an all-or-none affair.)
True enough: since there is no effective decision procedure for quantificational
consistency there is no routine procedure a man can use – be he ever so rational – to
correctly classify arbitrary rankings of sentences as incoherent or arbitrary sets of
sentences as inconsistent. The relevance of incoherence and inconsistency to human
rationality is rather that a rational man, once he comes to see that his preferences
are incoherent or that his beliefs are inconsistent, will proceed to revise them. In
carrying out the revision he may use decision theory or quantification theory as an
aid; but neither theory fully determines how the revision shall go.
Chapter 5
Fine-Grained Opinion, Probability, and the Logic of Full Belief
B.C. van Fraassen
The most forceful answer I can give if asked for my opinion, is to say what I
fully believe. The point of having beliefs is to construct a single (though in general
incomplete) picture of what things are like. One obvious model of this part of my
opinion is a set of propositions.1 Their intersection is the proposition which captures
1 This was essentially the model provided in Hintikka's book Knowledge and Belief. By propositions I mean the semantic content of statements; the same proposition can be expressed by many statements. I am not addressing how opinion is stored or communicated.
B.C. van Fraassen
San Francisco State University, San Francisco, CA, USA
exactly that single picture of the world which has my full assent. Clearly a person’s
full beliefs leave open many alternatives. Alternatives left open by belief are then
also represented by (sets of) propositions, namely ones that imply my beliefs. But
these alternatives do not all have the same status for me, though they are all “possible
for all I [know or] believe.” Some seem more or less likely than others: enter
personal (subjective) probability, as a grading of the possibilities left open by one’s
beliefs.
I will take for granted that the probability of a proposition is a real number in the
interval [0, 1], with the empty proposition Λ (self-contradiction) receiving 0 and the
universal proposition U (tautology) receiving 1. The assignment is a measure, that
is, it is additive and continuous (equivalently, countably additive). It follows from
this that the assignment of probabilities respects the ordering by logical implication:
2 This has been denied, e.g. by Henry Kyburg, and doubted, e.g. by Richard Foley (1993, Ch. 4).
measure then my probability is zero that the number equals x, for a ≤ x ≤ b. Hence
my probability equals 100 % that the number lies in the set [a, b] − {x}, for each
such number x. Yet no real number belongs to all these sets – their intersection
is empty. Probability measures of this sort (deriving from continuous probability
densities) are ubiquitous in science, and informed opinion must be allowed to let
itself be guided by them. We have here a transfinite lottery paradox, and we can’t
get out of it in the way that worked for the finite case (see Maher 1990).
There is a third aspect of opinion, besides belief and subjective grading, namely
supposition. Much of our opinion can be elicited only by asking us to suppose
something, which we may or may not believe. The respondent imaginatively puts
himself in the position of someone for whom the supposition has some privileged
epistemic status. But if his answer is to express his present opinion – which is surely
what is requested – then this “momentary” shift in status must be guided by what
his present opinion is. How does this guidance work?
One suggestion is that the respondent moves to a state of opinion derived from
his own in two steps: (1) discarding beliefs so that the supposition receives more
than minimal likelihood; (2) then (without further change in beliefs) regrading the
alternatives left open so as to give the supposition maximal likelihood. This makes
sense only if both steps are unambiguous. We can imagine a simple case. Suppose
Peter has as “primary” beliefs A and B, and believes exactly what they jointly entail;
he is asked to entertain the supposition C − A. In response he imaginatively moves
into the epistemic position in which (1) B is the only primary belief, and (2) he
assigns 0 to all alternatives left open by B which conflict with (C − A) and then
regrades the others in the same proportions as they had but with the maximum
assigned to (B ∩ (C − A)).
This simple case already hinges on a certain hierarchical structure in Peter’s
opinion. Moreover it presupposes that those alternatives which were left open by
B, but which conflict with his initial equally primary belief that A, had been graded
proportionately as well. Even more structure must be present to guide the two steps
in less simple cases. What if the beliefs had been, say, A, B, and D, and their
joint consequences, and the supposition was compatible with each but not with the
conjunction of any two? The discarding process can then be guided only if some
hierarchy among the beliefs determines the selection.
Let us consider conditional personal probability as a possible means for describ-
ing structure of this sort. The intuitive Example 5.1 above about the mass of the
moon is the sort often given to argue for the irreducibility of conditional probability.
I could continue the example with: the mass of the moon seems to me equally
likely to be x as (x + b)/2, on the supposition that it is one of those two numbers.
The two possibilities at issue here are represented by the degenerate intervals [x],
[(x + b)/2], so both they and the supposition that one or other is the case (represented
by the set {x, (x + b)/2}, their union) receive probability 0. The usual calculation of
conditional probability, which would set P(B | A) equal to P(B ∩ A) divided
by P(A), can therefore not be carried out. The suggestion that conditional
probability is irreducible means that two-place probability P(– | –) – probability of one
thing given (on supposition of) another – is autonomous and cannot be defined in
terms of the usual one-place ("absolute") probability. Rather the reverse: we should
define P(–) = P(– | U), probability conditional on the tautology.
There is a good deal of literature on two-place (“irreducible conditional”)
probability (see Appendix). Despite many individual differences, general agreement
concerning two-place probability extends to:
I. If P is a 2-place probability function then P(– | A) is "normally" a (1-place)
probability function with P(A | A) = 1.
II. These derivative 1-place probability functions [described in I.] are related at least
by the Multiplication Axiom: P(A ∩ B | C) = P(A | B ∩ C) P(B | C).
3 De Finetti (1936). I want to thank John M. Vickers for bringing this to my attention; De Finetti's idea is developed considerably further, with special reference to zero relative frequency, in Vickers (1988), Sections 3.6 and 5.4. The relation here defined is slightly different from the so-named one in my (1979) – to which the name was somewhat better suited – for convenience in some of the proofs.
Example 5.3 Let U be the set of natural numbers {0, 1, 2, ...}. For index n = 0, 1, 2,
... let pn be the probability measure defined on all subsets of U by the condition that
it assigns 0.1 to {x} if x is in the set {10n, ..., 10n + 9}, and 0 otherwise. Define:

P(A | B) = pn(A ∩ B)/pn(B), where n is the first index such that pn(B) > 0;
P(A | B) = 1 otherwise (i.e. if there is no such index).

To verify the Multiplication Axiom note for instance that if A ∩ C is not empty, and
P(A | C) > 0, then the first index n for which pn(A ∩ C) > 0 is the same as the first
index m such that pm(C) > 0. The "otherwise" clause will apply here only if B = Λ.
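A small Python sketch of Example 5.3 (the truncation of U to finitely many decades is a simplification introduced here; the two-place function P follows the lexicographic rule stated in the example):

MAX_N = 10                               # finite truncation of the decades

def p_n(n, a):
    # p_n spreads mass 0.1 over each element of the decade {10n, ..., 10n + 9}.
    return 0.1 * len(a & set(range(10 * n, 10 * n + 10)))

def P(a, b):
    # P(A | B) = p_n(A & B)/p_n(B) for the first n with p_n(B) > 0; 1 otherwise.
    for n in range(MAX_N):
        if p_n(n, b) > 0:
            return p_n(n, a & b) / p_n(n, b)
    return 1.0                           # 'otherwise' clause: B is abnormal

U = set(range(10 * MAX_N))
print(P({0, 1, 2}, U))                          # 0.3: the first decade decides
print(P(set(range(10, 20)), U))                 # 0.0: probability zero, yet...
print(P({12}, set(range(10, 20))))              # 0.1: conditioning on it still works
print(P(set(range(10)), set(range(10, 20))))    # 0.0, as noted later in the text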
These examples are instances of the “lexicographic” probability models which
I will discuss at some length below. We make the ideas of one- and two-place
probability precise as follows.
A space is a couple S = ⟨U, F⟩ with U a non-empty set (the worlds) and F (the
family of propositions) a sigma-field on U, that is:
(a) U ∈ F
(b) if A, B ∈ F then A − B ∈ F
(c) if {Ai : i = 1, 2, ...} ⊆ F then ∪{Ai} ∈ F
A (1-place) probability measure P on space S = ⟨U, F⟩ is a function mapping F
into the real numbers, subject to
1. 0 = P(Λ) ≤ P(A) ≤ P(U) = 1
2. P(A ∪ B) + P(A ∩ B) = P(A) + P(B) (finite additivity)
3. If E1 ⊆ E2 ⊆ ... ⊆ En ⊆ ... has union E, then P(E) = sup{P(En): n = 1, 2, ...}
(continuity)
Property 3 is in this context equivalent to countable additivity:
4. If {En : n = 1, 2, ...} are disjoint, with union E, then P(E) = Σ{P(En): n = 1, 2, ...}
and also to the dual continuity condition for countable intersection. The general
class of two-place probability measures to be defined now will below be seen to
contain a rich variety of non-trivial examples.
A 2-place probability measure P(– | –) on space S = ⟨U, F⟩ is a map of F × F
into the real numbers such that
I. (Reduction Axiom) The function P(– | A) is either a probability measure on S
or else has constant value 1.
II. (Multiplication Axiom) P(A ∩ B | C) = P(A | B ∩ C) P(B | C)
for all A, B, C in F.
If P(– | A) is a (1-place) probability measure, I shall call A normal (for P), and
otherwise abnormal. ("Absurd" might have been a better name; it is clearly a notion
allied to self-contradiction.) The definition of 2-place probability allows for the
totally abnormal state of opinion (P(A | B) = 1 for all A and B). It should not be
excluded formally, but I shall tacitly exclude it during informal discussion. Here
are some initial consequences of the definition. The variables range of course over
propositions (members of family F in space S = ⟨U, F⟩).
In any conception of our epistemic state there will be propositions which are not
epistemically distinguishable from the tautology U – let us say these are a priori for
the person. This notion is the opposite of the idea of abnormality:
What is a priori for a person is therefore exactly what is certain for him or her
on any supposition whatsoever. This notion generalizes unconditional certainty,
i.e. P(A) = 1. The strongest unconditional probability equivalence relation between
A and B is that their symmetric difference (A + B) has measure zero. We can
4 See B. De Finetti (1972), Section 5.22.
The abnormal propositions are the ones a priori equivalent to the empty set (the
self-contradiction) and the a prioris are the ones a priori equivalent to the tautology.
(Of course these are subjective notions: we are speaking of what is a priori for the
person with this state of opinion.)
Note now that A ⟨P⟩ B iff P(A | A + B) = 1 and P(B | A + B) = 1, since additivity
would not allow that if A + B were normal. We can divide this equivalence relation
into its two conjuncts:
Definition A >_P B iff P(A | A + B) = 1.
This is the relationship of "superiority" mentioned above.
The beliefs I hold so strongly that they are a priori for me are those whose contraries
are all abnormal. There is a weaker condition a proposition K can satisfy: namely
that any normal proposition which implies K is superior to any that are contrary to
K. Consider the following conditions and definitions:
5 From this point on I shall drop the ubiquitous "for P" unless confusion threatens, and just write "a priori", "abnormal", etc., leaving the context to specify the relevant 2-place probability measure.
Superiority: the alternatives K leaves open are all superior to any alternative that K
excludes.
We can deduce from these conditions something reminiscent of Carnap’s
“regularity” (or Shimony’s “strict coherence”):
By the definition I gave of full belief, beliefs are clustered: each belongs to a family
{A: A >_P K} for some belief core K (if there are any at all), which by [T4.2] sums
up that cluster exactly:
We can now prove that these clusters form a chain, linearly ordered by set inclusion
(implication).
For proof, assume there is at least one belief core; call it K. Assume also that
the belief cores are countable and form a chain (the latter by [T5.1]), and call the
intersection K*. Countable additivity of the ordinary probability measure P(–) = P(– | U)
is equivalent to just the right continuity condition needed here: the probabilities
of the members of a countable chain of sets converge to the probability of its
intersection. Since in our case all those numbers equal 1, so does P(K*). Therefore
also K* is not empty, and thus normal because it is a subset of at least one belief core.
Moreover, so are its non-empty subsets, so that they are normal too. Its complement
U − K* includes U − K, and is therefore also normal.
We have now seen that K* satisfies conditions (A1), (A3), and (A4), and need
still to establish (A2). If A is a normal subset of K*, and hence of all belief cores,
and B is disjoint of K*, we have P(A | A + (B − K′)) = 1 for all belief cores K′. But
the sets B − K′ form an increasing chain whose union is B − K* = B. Hence also the
sets A + (B − K′) here form such a chain with union A + B. To conclude now that
P(A | A + B) = 1, we appeal to [T2.5], the principle of Condition Continuity. This
ends the proof.
The significance of this result may be challenged by noting that the intersection
of countably many sets of measure 1 also has measure 1. So how have we made
progress with the transfinite lottery paradox? In four ways. The first is that in
the representation of opinion we may have a “small” family of belief cores even
if probability is continuous and there are uncountably many propositions with
probability 1. The second is that no matter how large a chain is, its intersection
is one of its members if it has a first (= "smallest") element. The third is that the
following is a condition typically met in spaces on which probabilities are defined
even in the most scientifically sophisticated applications:
(*) Any chain of propositions, linearly ordered by set inclusion, has a
countable subchain with the same intersection.
[T5.3] If (*) holds and there is at least one belief core, then the intersection of
all belief cores is also a belief core.
This is a corollary to [T5.2].
Fourthly, farther below I will also describe an especially nice class of models
of fine-grained opinion for which we can prove that the intersection of the belief
cores, if any, is always also a belief core (“lexicographic probability”). There are no
countability restrictions there.
R0 = K0
Rj+1 = Kj+1 − Kj   (j = 0, 1, 2, ...).
6 Could a person's opinion be devoid of belief cores? Our definitions allow this, and it seems to me this case is related to the idea of a "Zen mind" which I have explored elsewhere (van Fraassen 1988).
This says quite clearly that (in this case) belief guides opinion, for probabilities
conditional on belief remnants are, so to speak, all the conditional probabilities
there are.
In the basic theory of two-place probability, the Multiplication Axiom places the
only constraint on how the one-place functions P(– | A), P(– | B), ... are related to each
other. It entails that the proposition A − B is irrelevant to the value of P(B | A) – that
this value is the same as P(A ∩ B | A) – and that the usual ratio-formula calculates
the conditional probability when applicable. Indeed, the ratio formula applies in the
generalized form summarized in the following:
If X is a subset of A which is a subset of B, then:
[T7.1] if P(A | B) > 0 then P(X | A) = P(X | B)/P(A | B)
[T7.2] if X is normal, then P(X | B) ≤ P(X | A).
There is another way to sum up how the Multiplication Axiom constrains the
relation between P(– | A) and P(– | B) in general. When we consider the two conditional
probabilities thus assigned to any proposition that implies both A and B, we find a
proportionality factor, which is constant when defined.
[T7.3] If P(A | B) > 0 then there is a constant k ≥ 0 such that for all subsets X
of A ∩ B, P(X | A) = k P(X | B). The constant k = k(A, B) = P(B | A)/P(A | B),
defined provided P(A | B) > 0.
7 In earlier treatments of two-place probability this relationship has appeared as a special axiom: If P(A | B) = P(B | A) = 1 then P(– | A) = P(– | B).
8 As I have argued elsewhere (van Fraassen 1981a) this construction provides us with the "right" clue to the treatment of quantification and of intuitionistic implication in so-called probabilistic (or generally, subjective) semantics.
9 In my (1981a), P//A was designated as PA and called "P conditioned on A." I now think this terminology likely to result in confusion, and prefer "P relativized to A."
The converse does not hold, as can be seen from our Example 5.3 in Section 2.1,
where U is the set of natural numbers, and is surface equivalent, but not a priori
equivalent, to {0, 1, ..., 9}. For P({0, 1, ..., 9} | {10, ..., 19}) = 0 there.
1 = (A − B − C),  2 = (AB − C),  3 = B − A − C,  4 = AC − B,
5 = BC − A,  6 = C − B − A
E = (A + B) ∪ (B + C) = (A + B) ∪ (B + C) ∪ (A + C)
I will define a class of models such that P satisfies principles I–II iff P can be
represented by one of these models, in the way to be explained. The class will
be chosen large; a special subclass (“lexicographic models”) will yield nontrivial,
easily constructed examples to be used in illustrations and refutations. (The term
“lexicographic” is used similarly in decision theory literature; see Blume et al.
1991a, b.)
A model begins with a sample space S = ⟨U, F⟩, where U is a non-empty set
(the universe of possibilities) and F a sigma-field of sets on U (the propositions).
We define the subfields:

if A is in F then FA = {E ∩ A : E in F};

thus FA is a field on A. For each such field designate as PA the set of probability
measures defined on FA. (When A is empty, FA = {A} and PA is empty.) The
restriction of a member p of PA to a subfield FB, with B a subset of A, will be
designated p | FB. Finally let PS be the union of all the sets PA, A in F.
µA(B) = P(B | A).

That (M1) and (M2) are satisfied by M = ⟨S, µ⟩ follows at once from the facts
about normal sets. Suppose now, equivalently to the antecedent of (M3), that A and
B are normal sets with µA(B) > 0. To prove that (M3) holds, suppose that µB(A) is
defined and positive, so that B and A are normal sets, P(A | B) > 0. Then according to
[T10.3], for each subset X of A ∩ B we have P(X | A) = [P(B | A)/P(A | B)] P(X | B).
Therefore here µA(X) = [P(B | A)/P(A | B)] µB(X). In conclusion:
I will first show that in a lexicographic model, the intersection of all belief cores,
if any, is always a belief core too. Since this does not depend on cardinality or
the character of the sample space, the result adds significantly to the previous
theorems. Then I will construct a lexicographic model to show that in general not all
propositions with probability 1 are full beliefs. This model will be a reconstruction
of Example 5.4 (Petra and Noah’s Arc).
where XX is a subset of F
The sequence SEQ is now pieced together from some other well ordered
sequences as follows:
[T11.6] In PETRA some propositions not among its full beliefs have probability 1.
In view of the preceding, it suffices to reflect that N ∩ W − GR has proper subsets
with probability 1; for example N ∩ W − GR − EQ.
Appendix
concentrated his research on those which derive trivially from one-place probability
functions (“m-functions”). Reichenbach’s probability was also irreducibly two-
place. I have mentioned De Finetti’s paper (1936) which introduced the idea
of local comparisons (like my “superior”; Vickers’ “thinner”); see also Section
4.18 in his Theory of Probability, vol. 1. The most extensive work on two-place
probability theory is by Renyi (1955, 1970a,b). The theory of two-place probability
here presented is essentially as explored in my (1979), but with considerable
improvement in the characterization of the described classes of models. Finally,
the discussion of supposition in section “Supposition and two-place probability” is
related to work on belief revision, much of it indebted to ideas of Isaac Levi; see
Gardenfors 1988 for a qualitative version.
The ordering P(A) ≤ P(B) extends the partial ordering of logical implication: if
A ⊆ B then P(A) ≤ P(B). Unfortunately, the ordering P(A) < P(B) does not in
general extend the partial ordering of proper implication: P(A) = P(B) is possible
even when A ≠ B. Indeed, this is inevitable if there are more than countably
many disjoint propositions. As a corollary, the intersection of all propositions of
maximal probability may itself even be empty. Kolmogoroff himself reacted to
this problem by suggesting that we focus on probability algebras: algebras of
propositions reduced by the relation of equivalence modulo differences of measure
zero, P(A + B) = 0 (where + is symmetric difference). (See Birkhoff (1967), XI, 5 and Kappos (1969), II, 4 and III, 3.)
The difficulty with this approach is that a probability algebra does not have the
structure usually demanded of an algebra of propositions. For the latter, the notion of
truth is relevant, so it should be possible to map the algebra homomorphically into
{0, 1}. As an example take the unit interval with Lebesgue measure, reduced by the
above equivalence relation. This is a probability (sigma-)algebra. Let T be the class
of elements designated as true, i.e. mapped into 1, and let A with measure x be in T.
Then A is the join of two disjoint elements of measure x/2 each. Since the mapping
is a homomorphism, one of these is in T. Iterating the halving, we conclude that T
contains a countable downward chain A1, A2, ... with the measures converging to
zero. Therefore its meet is the zero element of the algebra. The meet should be in T
because it is the countable meet of a family of "true" propositions; but it can't be in
T, since the zero element is mapped into 0.
This "transfinite inconsistency" of the propositions which have probability one
was forcefully advanced by Patrick Maher (1990) as a difficulty for the integration
of subjective probability and belief. My conclusion, contrary to Maher's, is that the
role of subjective probability is to grade the alternatives left open by full belief. That
automatically bestows maximal probability on the full beliefs, but allows other
propositions to be maximally probable as well. The question then became: how are the
two classes of maximally probable propositions to be distinguished?
A4. Infinitesimals?10
There is another solution on offer for most problems which two-place probability
solves. That is to stick with one-place probability, but introduce infinitesimals. Any
non-self-contradictory proposition can then receive a non-zero probability, though
often it is infinitesimal (greater than zero but smaller than any rational fraction).
The infinitesimals solution is to say that all the non-self-contradictory propo-
sitions (that are not contrary to my full beliefs) receive not zero probability but
an infinitesimal number as probability [in a non-standard model]. There is an
important result, due to Vann McGee (1994), which shows that every finitely additive
2-place probability function P(A | B) is the standard part of p(A ∩ B)/p(B) for
some non-standard 1-place probability function p (and conversely). Despite this I
see advantages to the present approach to conditional probability which eschews
infinitesimals. First of all, there is really no such thing as “the” infinitesimals
10
For related critiques of the ‘infinitesimals’ gambit, see Skyrms (1983), Hajek (1992).
It is easy to construct a small lexicographic model in which this is the case. Let C
be a subset of B and B a subset of A; let p1 give 1 to A but 0 to B and to C; p2 give
1 to B but 0 to C; and p3 give 1 to C. If these are all the measures in the sequence,
then subsets of C which receive probability 0 conditional on C are all abnormal.
Intuitively it would seem that in the infinitesimal approach this would require a
construction in which there are exactly two layers L and M of infinitesimals: x in
L is infinitesimal in comparison to standard numbers, y in M is infinitesimal in
comparison (even) to any number in L, and no numbers at all are infinitesimal in
comparison to numbers in M. I leave this as an exercise for the reader.
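To make the comparison concrete, here is a minimal computational sketch of the small lexicographic model just described (my own illustration, not van Fraassen's; it assumes the lexicographic rule that P(X | Y) is read off the first measure in the sequence giving Y positive probability, and that Y is abnormal when no measure does):

from fractions import Fraction as F

U = {1, 2, 3, 4}     # hypothetical universe
A = {1, 2, 3, 4}     # C is a subset of B, and B a subset of A
B = {2, 3, 4}
C = {3, 4}

def point_mass(w):
    return {x: (F(1) if x == w else F(0)) for x in U}

p1 = point_mass(1)   # gives 1 to A but 0 to B and to C
p2 = point_mass(2)   # gives 1 to B but 0 to C
p3 = point_mass(3)   # gives 1 to C
SEQ = [p1, p2, p3]

def prob(p, X):
    return sum(p[x] for x in X)

def cond(X, Y):
    """Two-place probability P(X | Y); None signals that Y is abnormal."""
    for p in SEQ:
        if prob(p, Y) > 0:
            return prob(p, X & Y) / prob(p, Y)
    return None

print(cond({4}, C))    # 0 -- a subset of C with probability 0 conditional on C ...
print(cond({4}, {4}))  # None -- ... and indeed {4} is abnormal
print(cond(B, A), cond(C, B))  # 0 0 -- each layer is null relative to the one above

On the infinitesimal picture, the three measures here play the role of the two layers L and M of infinitesimals mentioned above.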
As to the problem of belief, I wonder if the nonstandard reconstruction would
have the desirable features for which we naturally turn to infinitesimals. Suppose for
example that I choose a model in which each non-empty set has a positive (possibly
infinitesimal) probability. Then my full beliefs are not just those which have
probability 1, since that includes the tautology only. On the other hand, I can’t make
it a requirement that my full beliefs have probability > 1 − d, for any infinitesimal
d one could choose. For the intersection of the sets with probability 1 − d will
generally have a lower probability. Hence the lottery paradox comes back to haunt
us. We would again face the trilemma of either restricting full beliefs to the
tautology, or specifying them in terms of some factor foreign to the degrees-of-
belief framework, or banishing them from epistemology altogether.
References
Birkhoff, G. (1967). Lattice theory (3rd ed.). Providence: American Mathematical Society.
Blume, L., Brandenburger, A., & Dekel, E. (1991a). Lexicographic probabilities and choice under
uncertainty. Econometrica, 59, 61–79.
Blume, L., Brandenburger, A., & Dekel, E. (1991b). Lexicographic probabilities and equilibrium
refinements. Econometrica, 59, 81–98.
De Finetti, B. (1936). Les probabilités nulles. Bulletin des sciences mathématiques, 60, 275–288.
De Finetti, B. (1972). Theory of probability (2 vols.). New York: Wiley.
Field, H. (1977). Logic, meaning and conceptual role. Journal of Philosophy, 74, 374–409.
Foley, R. (1993). Working without a net. Oxford: Oxford University Press.
Gardenfors, P. (1988). Knowledge in flux: Modeling the dynamics of epistemic states. Cambridge:
MIT Press.
Hajek, A. (1992). The conditional construal of conditional probability. Ph.D. dissertation,
Princeton University.
Harper, W. L. (1976). Rational belief change, Popper functions, and counter-factuals. In C. Hooker,
& W. Harper (Eds.), Foundations of probability theory (Vol. 1, pp. 73–112). Dordrecht: Reidel
Publishing Company.
Kappos, D. A. (1969). Probability algebras and stochastic spaces. New York: Academic Press.
Maher, P. (1990). Acceptance without belief. PSA, 1, 381–392.
McGee, V. (1994). Learning the impossible. In E. Eells & B. Skyrms (Eds.), Probabilities and
conditionals: Belief revision and rational decision (pp. 179–199). Cambridge: Cambridge
University Press.
Popper, K. (1959). The logic of scientific discovery. New York: Basic Books.
Renyi, A. (1955). On a new axiomatic theory of probability. Acta Mathematica Hungarica, 6,
285–333.
Renyi, A. (1970a). Foundations of probability. San Francisco: Holden-Day.
Renyi, A. (1970b). Probability theory. Amsterdam: North-Holland.
Skyrms, B. (1983). Three ways to give a probability function a memory. In J. Earman (Ed.), Testing
scientific theories (Minnesota studies in the philosophy of science, Vol. X, pp. 157–161).
Minneapolis: University of Minnesota Press.
van Fraassen, B. C. (1979). Foundations of probability: A modal frequency interpretation.
In G. Toraldo di Francia (Ed.), Problems in the foundations of physics (pp. 344–387).
Amsterdam/New York: North-Holland Publishing Company.
van Fraassen, B. C. (1981a). Probabilistic semantics objectified: I. Postulates and logics. Journal
of Philosophical Logic, 10, 371–394.
van Fraassen, B. C. (1981b). Probabilistic semantics objectified: II. Implications in probabilistic
model sets. Journal of Philosophical Logic, 10, 495–510.
van Fraassen, B. C. (1988). Identity in intensional logic: Subjective semantics. In U. Eco et al.
(Eds.), Meaning and mental representation (pp. 201–219). Bloomington: Indiana University
Press.
Vickers, J. M. (1988). Chance and structure. Oxford: Oxford University Press.
Chapter 6
A Theory of Higher Order Probabilities
Haim Gaifman
Introduction

A part of this paper has been included in a talk given in an NSF symposium on foundations of
probability and causality, organized by W. Harper and B. Skyrms at UC Irvine, July 1985. I wish
to thank the organizers for the opportunity to discuss and clarify some of these ideas.
1
My Salzburg paper (1983) has been devoted to these questions. The upshot of the analysis there
has been that even a "purely subjective" probability implies a kind of factual claim, for one can
assess its success in the actual world. Rather than two different kinds, subjective and objective
probabilities are better regarded as two extremes of a spectrum.
H. Gaifman
Columbia University, New York, NY, USA
represents his state of knowledge later in the evening, then his announcement in
"The New Yorker" cartoon can be summed up as follows, where A = 'tomorrow it
will rain':
Due to limitations of space and deadline I have not entered into details of various
proofs. Some of the material has been abridged; I have included some illustrative
examples of simple HOPs, but not the more interesting ones of general HOPs (which
arise naturally in distributed systems). Also the bibliography is far from complete.
Simple HOPs
As explained in the introduction, PR(A, Δ) is the event that the true (or the eventual,
or the expert-assigned) probability of A lies in Δ. P is the agent's current subjective
probability.
Among the closed intervals, we include also the empty interval, ∅. The minimal
and maximal elements of F are, respectively, 0 and 1; that is: 0 = empty subset of
W = False, 1 = W = True.
In the explanations I shall use “probability” both for the agent’s current
subjective probability as well as for the true, or eventual one; the contexts indicate
the intended reading.
as well as by:
Vice versa, if, for every x ∈ W, px is a probability over F such that {x : px(A) ∈ Δ}
is in F for all A ∈ F and all real closed Δ, and if we use (6.1) as a definition of PR,
then Axioms (I)–(V) are satisfied.
We call p the kernel of the HOP.
The proof of Theorem 6.1 is nothing more than a straightforward derivation of
all the required details from the axioms, using (6.2) as the definition of px. (The
"vice versa" part is even more immediate than the first part.)
We can now extend PR and define PR(A, Δ), for arbitrary subsets Δ of reals, as
{x : px(A) ∈ Δ}. If Δ is a Borel set then PR(A, Δ) is in F.
The meaning of px is obvious: it is the probability which corresponds to
the maximal state of knowledge in world x – the distribution chosen by the expert of
that world.
Notation For α ∈ [0,1], PR(A, α) =df PR(A, [α, α]).
Let P(A | B) be the conditional probability of A, given B. It is defined in the case that
P(B) ≠ 0 as P(A ∩ B)/P(B). It is what the agent's probability for A should be had he
known B.
Axiom (VIw) If P(PR(A, [α, β])) ≠ 0 then α ≤ P(A | PR(A, [α, β])) ≤ β.
Axiom (VIw) (the weak form of the forthcoming Axiom (VI)) is a generalization
of Miller's Principle to the case of interval-based events. Rewritten in our notation,
Miller's Principle is: P(A | PR(A, α)) = α. Axiom (VIw) amounts to the following
rule: My probability for A should be no less than α and no more than β, were I to
know that in a more informed state my probability for A will be within these bounds.
Plausible as it sounds, the use of the hypothetical "were I to know that ..." needs
in this context some clarification. Now a well-known way of explicating conditional
probabilities is through conditional bets. Using such bets van Fraassen (1984) gives
a Dutch-book argument for the Principle: Its violation makes possible a system of
bets (with odds in accordance with the agent's probabilities) in which the agent will
incur a net loss in all circumstances. In this argument PR(A, α) is interpreted as the
event that the agent's probability for A at a certain future time will be α, in which
case he should accept at that time bets with odds α. The same kind of Dutch-book
can be constructed if Axiom (VIw) is violated. (Here it is crucial that we use an
interval; the argument fails if we replace [α, β] by a non-convex Borel set.)
The same intuition which prescribes (VIw) prescribes (VI); also here the violation of
the axiom makes possible a Dutch-book against the agent. What is essential is that
events of the form PR(B, Δ) be, in principle, knowable to the agent, i.e., be known
(if true) in the maximal states of knowledge as defined by our structure.2
In what follows, integrating a function f(t) with respect to a probability m is
written as ∫ f(t) m(dt).
Lemma 6.1 Axiom (VIw) implies that the following holds for all A ∈ F:

P(A) = ∫ p(x, A) P(dx)    (6.3)

The proof consists in applying the formula P(A) = Σi P(A | Bi) P(Bi), where the
Bi's form a partition, passing to the limit, and using the definition of an integral.
The implication (6.3) ⇒ (VIw) is not true in general. Note that in the discrete
case (6.3) becomes:

P(x) = Σy p(y, x) P(y)    (6.3d)
Definition Call two worlds x, y ∈ W epistemically equivalent (or, for short,
equivalent), and denote it by x ≈ y, if px = py. For S – a class of events – define
K[S] to be the field generated by all events of the form PR(A, Δ), A ∈ S, Δ – a real
closed interval.
Epistemic equivalence means having the same maximal knowledge. Evidently
x ≈ y iff, for all A and all Δ, x ∈ PR(A, Δ) ⇔ y ∈ PR(A, Δ). This is equivalent to: for
all C ∈ K[F], x ∈ C ⇔ y ∈ C. If K[F] is generated by the countably many generators
Xn, n = 0, 1, ..., then the equivalence classes are exactly all non-empty intersections
∩n Xn′ where each Xn′ is either Xn or its complement. Hence the equivalence classes
are themselves in K[F]; they are exactly the atoms of this field. The next lemma
2
It is important to restrict C in Axiom (VI) to an intersection of such events. The removal of this
restriction will cause the px ’s to be two-valued functions, meaning that all facts are known in the
maximal knowledge states.
shows that the condition that K[F] be countably generated is rather mild, for it
holds whenever F itself is countably generated (which is the common state of
affairs):
Lemma 6.2 If S is either countable or a countably generated field, then K[S] is
countably generated.
(As generators for K[S] one can take all PR(A, Δ), A ∈ S, Δ – a rational closed
interval; the second claim is proved by showing that if S′ is a Boolean algebra that
generates the field S then K[S′] = K[S].)
Terminology A 0-set is a set of probability 0. Something is said to hold for
almost all x if it holds for all x except for a 0-set. The probability in question is
P, unless specified otherwise.
Theorem 6.2 If F is countably generated then axiom (VI) is equivalent to each of
the following conditions:
(A) (6.3) holds (for all A) and the following is true: Let Cx be the epistemic
equivalence class to which x belongs, then
The proof that Axiom (VI) is equivalent to (A) and implies (B) uses only basic
measure theory. The present proof of (B) ⇒ (A) relies on advanced ergodic theory3
and I do not know if this can be avoided. Fortunately the rest of this paper does
not rely on this implication (except the corresponding implication in Theorem 6.3).
Note that in the discrete case (6.4) is equivalent to:

p(x, z) = Σy p(x, y) p(y, z)    (6.4d)
m(V) = P(∪u∈V Eu)
3
I am thankful to my colleagues at the Hebrew University H. Furstenberg, I. Katznelson, and B.
Weiss for their help in this item. Needless to say, errors, if any, are my sole responsibility.
pu(Eu) = 1
The set of ontological worlds gets the value 1 under P and under each px where
x is coherent. It is referred to as the ontological part of the HOP. Together with the
structure induced by the original HOP it forms by itself a simple HOP. Similarly we
define the coherent part of the HOP as the set of all coherent worlds (together with
the induced structure). As far as calculating probabilities goes, only the ontological
part matters. Coherent non-ontological worlds are useful as representatives of
transitory states of knowledge.
Examples
Example 6.1 W = {w1, w2, w3}, P = (1/3, 1/3, 1/3), and the kernel matrix is:

.5 .5 0
0 .5 .5
.5 0 .5
The agent’s current probability assigns each world the value 1/3. Eventually, in
world w1 he will know that he is not in w3 and he will assign each of the worlds w1,
w2 the value 0.5. This is the meaning of the first row. The other rows are similarly
interpreted.
By direct checking one can verify that (VIw) is satisfied. (The checking of
all cases in this example is easy because PR(A, α) ≠ ∅ only for α = 0.5, 1.)
However the matrix is not equal to its square, hence Axiom (VI) is violated, as
indeed the following case shows: Put A = {w1}, C = PR({w2}, 0.5). Then C =
{x : p(x, w2) = 0.5} = {w1, w2} and similarly PR(A, 0.5) = {w1, w3}. Hence
A = PR(A, 0.5) ∩ C, implying P(A | PR(A, 0.5) ∩ C) = 1 ≠ 0.5. This can be used
to construct a Dutch book against the agent.
Note also that the epistemic equivalence classes are {w1}, {w2} and {w3} and that
none is ontological; hence there are also no coherent worlds here.
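The claims of Example 6.1 can be checked mechanically. The following is a hedged sketch of such a check (my own code, not Gaifman's); it verifies (VIw) on a grid of interval endpoints, shows that the kernel is not equal to its square, and reproduces the violation of Axiom (VI) just described:

from fractions import Fraction as F
import itertools

W = [0, 1, 2]                        # worlds w1, w2, w3
P = [F(1, 3)] * 3                    # the agent's current probability
p = [[F(1, 2), F(1, 2), F(0)],       # kernel: p[x][y] is the probability of world y in world x
     [F(0),    F(1, 2), F(1, 2)],
     [F(1, 2), F(0),    F(1, 2)]]

def prob(mu, A):                     # probability of a set A of worlds under the measure mu
    return sum(mu[y] for y in A)

def PR(A, lo, hi):                   # event that the eventual probability of A lies in [lo, hi]
    return [x for x in W if lo <= prob(p[x], A) <= hi]

def cond(A, B):                      # P(A | B), assuming P(B) > 0
    return prob(P, [x for x in A if x in B]) / prob(P, B)

# Axiom (VIw) on a grid of endpoints: whenever P(PR(A,[lo,hi])) > 0,
# the conditional probability of A given PR(A,[lo,hi]) lies in [lo,hi].
events = [list(s) for r in range(4) for s in itertools.combinations(W, r)]
grid = [F(k, 10) for k in range(11)]
print(all(lo <= cond(A, PR(A, lo, hi)) <= hi
          for A in events for lo in grid for hi in grid
          if lo <= hi and prob(P, PR(A, lo, hi)) > 0))          # True

# Axiom (VI) fails: the kernel is not equal to its square, and conditioning on
# PR(A, 0.5) together with C = PR({w2}, 0.5) drives the probability of A to 1.
square = [[sum(p[x][y] * p[y][z] for y in W) for z in W] for x in W]
print(square == p)                                              # False
A, C = [0], PR([1], F(1, 2), F(1, 2))
print(cond(A, [x for x in PR(A, F(1, 2), F(1, 2)) if x in C]))  # 1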
Example 6.2 W = {w1, w2, ..., w8}, P is (.1, .2, .2, .1, .4, 0, 0, 0) and the kernel
matrix is:

.2   .4   .4   0    0    0    0    0
.2   .4   .4   0    0    0    0    0
.2   .4   .4   0    0    0    0    0
0    0    0    .2   .8   0    0    0
0    0    0    .2   .8   0    0    0
0    0    0    0    0    1    0    0
.05  .1   .1   .05  .2   .5   0    0
.2   .1   .1   .2   .1   .1   .1   .1

The sets {w1, w2, w3}, {w4, w5} and {w6} are
equivalence classes which are ontological. P is a mixture of these 3 types of rows,
with weights 0.5, 0.5, 0, respectively. Hence condition (C) is satisfied, therefore
also Axiom (VI). w7 is a coherent non-ontological world, because the 7th row is a
mixture of the first three types (with weights .25, .25, .5). w8 is not coherent. The
ontological part consists of the upper left 6 × 6 matrix and the coherent part of the
7 × 7 one.
The example can be made more concrete by the following scenario. A number is
to be chosen from {1, 2, 3}. For i = 1, 2, 3, the number chosen in wi is i, but in each of
these 3 worlds the maximal knowledge consists in assigning probabilities 0.2, 0.4,
0.4 to the 3 possibilities. In w4 the number chosen is 1 and in w5 it is 2; in either of
these worlds the maximal knowledge consists in assigning the probabilities 0.2, 0.8.
In w6 the number is 2 and it is also assigned probability 1. In the agent’s current state
he assigns probability 0 to finding himself eventually in the third state of maximal
knowledge, and equal probabilities to the first and second states. World w7 represents
a similar situation but with different weights. We can imagine 3 lotteries for choosing
the number; in each equivalence class the maximal knowledge is knowledge of the
chosen lottery.
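A similar mechanical check can be run for Example 6.2. The sketch below (again my own, with the kernel entered as above) verifies that P is the stated mixture of the ontological row types, that w7's row is the stated mixture as well, that the coherent 7 × 7 part satisfies the discrete condition (6.4d), and that P satisfies (6.3d):

from fractions import Fraction as F

def row(*xs):
    return [F(str(x)) for x in xs]

P = row(.1, .2, .2, .1, .4, 0, 0, 0)
p = [row(.2, .4, .4, 0, 0, 0, 0, 0),
     row(.2, .4, .4, 0, 0, 0, 0, 0),
     row(.2, .4, .4, 0, 0, 0, 0, 0),
     row(0, 0, 0, .2, .8, 0, 0, 0),
     row(0, 0, 0, .2, .8, 0, 0, 0),
     row(0, 0, 0, 0, 0, 1, 0, 0),
     row(.05, .1, .1, .05, .2, .5, 0, 0),   # w7: coherent but not ontological
     row(.2, .1, .1, .2, .1, .1, .1, .1)]   # w8: not coherent

types = [p[0], p[3], p[5]]                  # the three ontological row types

def mix(weights, rows):
    return [sum(w * r[j] for w, r in zip(weights, rows)) for j in range(8)]

print(P == mix([F(1, 2), F(1, 2), F(0)], types))        # True: P has weights .5, .5, 0
print(p[6] == mix([F(1, 4), F(1, 4), F(1, 2)], types))  # True: w7 is a mixture of the types

# Discrete condition (6.4d), p(x,z) = sum_y p(x,y) p(y,z), on the coherent 7 x 7 part:
coherent = range(7)
print(all(p[x][z] == sum(p[x][y] * p[y][z] for y in coherent)
          for x in coherent for z in coherent))          # True

# Discrete form (6.3d) of (6.3): P(x) = sum_y p(y,x) P(y)
print(all(P[x] == sum(p[y][x] * P[y] for y in range(8)) for x in range(8)))  # True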
Example 6.3 Let H be the probability of “heads” of some given coin of unknown
bias. Treat H as a random variable. The agent’s knowledge is represented by a
probability distribution for H. Say it is the uniform distribution over [0,1]. The
expert does not know the value of H but he has some additional information. Say his
additional information is the value of N – the number of “heads” in 50 independent
tosses. Then our worlds can be regarded as pairs (h, n), such that in (h, n) the event
H = h ∩ N = n is true; here h is a real number in [0,1] and n an integer between 0
and 50. The field F is generated by the sets [α, β] × {n}, 0 ≤ α ≤ β ≤ 1, n =
0, ..., 50.
Given H = h, we get the binomial distribution bh,50 for N. This fact, together with
the agent's uniform distribution for H, determines his probability P over F. The
expert's probability in world (h, n) is obtained by conditioning on his information;
it is P(· | N = n). There are 51 equivalence classes which correspond to the 51 possible
values of N and all worlds are ontological.
As is well known, different values of N give rise to different conditional
distributions of H. Therefore the events N = n are in the field generated by the
events4 PR(H ∈ [α, β], Δ). The whole field F is therefore generated by events
which are either of the form H ∈ [α, β] or obtained from these by applying the
operator PR. Consequently we can give an abstract description of this HOP which
does not mention the fifty tosses. The only function of the tosses is to affect
the distribution of H; in our framework such changes in distribution constitute
themselves events which can be treated directly, without having to bring in their
causes.
4
Actually there are 51 real numbers αn such that the event N = n is the same as PR(H ≤ 1/2, αn).
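For readers who want to see the expert's probabilities of Example 6.3 numerically, here is a hedged sketch (my own; the Beta form of the posterior and the brute-force grid are my additions, not part of the text). Conditioning the uniform prior on N = n yields the Beta(n + 1, 51 − n) distribution for H, and the 51 values of P(H ≤ 1/2 | N = n) are pairwise distinct, which is the fact behind footnote 4:

from math import comb

TOSSES = 50
GRID = [i / 2000 for i in range(1, 2000)]        # crude grid over (0, 1) for H

def binom(h, n):
    return comb(TOSSES, n) * h**n * (1 - h)**(TOSSES - n)

def expert_prob(event, n):
    """P(H in event | N = n), where event is a predicate on h."""
    num = sum(binom(h, n) for h in GRID if event(h))
    den = sum(binom(h, n) for h in GRID)
    return num / den

alphas = [expert_prob(lambda h: h <= 0.5, n) for n in range(TOSSES + 1)]
print(len(set(alphas)))                          # 51: each value of N fixes a different alpha_n
print(round(alphas[25], 2), round(alphas[30], 2))  # roughly 0.5 for n = 25, far less for n = 30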
The restriction that F be countably generated is a mild one. The probability spaces
which commonly appear in theory, or in applications, are essentially of this nature5 .
Usually we are interested in properties that involve only countably many generators.
We will first show that for studying such properties we can always restrict ourselves
to the case where the underlying field is countably generated.
Definition Given a simple HOP (W, F, P, PR) and given S ⊆ F, define H[S] as the
smallest field containing S and closed under PR (i.e., A ∈ H[S] ⇒ PR(A, Δ) ∈
H[S] for every real closed interval Δ).
H[S], together with the restrictions of P and PR to it, forms a subHOP, where this
notion is defined in the obvious way.
Lemma 6.3 If S is a Boolean algebra and, for every A in S and every rational
closed interval Δ, PR(A, Δ) is in S, then H[S] is the field generated by S.
This means that, once we have a Boolean algebra closed under PR(·, Δ) for all Δ
with rational endpoints, we get all the rest by countable Boolean operations without
using PR.
Corollary If S is either countable, or a countably generated field, then H[S] is
countably generated.
Using this we can derive from Theorem 6.2 an analogous result for general fields:
Theorem 6.3 Axiom (VI) is equivalent to each of the following conditions:
(A′) (6.3) holds and for every C in K[F], for almost all x: px(C) = 1 if x ∈ C, px(C) = 0
otherwise.
(B′) (6.3) holds and for every A in F, (6.4) is true for almost all x.
(B′) differs from the analogous (B) of Theorem 6.2 in that the exceptional 0-set for
(6.4) can depend on A.
Say that A is equal a.e. to B if A-B and B-A are 0-sets. Say that two classes of sets
are equal modulo 0-sets if every member of one is equal a.e. to some member of the
other.
Assuming Axiom (VI) we get:
Corollary If S ⊆ F, then: (i) The fields K[S], K[K[S]] and K[H[S]] are equal
modulo 0-sets. (ii) If S is a Boolean algebra then H[S] is equal modulo 0-sets to
the field generated by S ∪ K[S].
(To show, for example, that K[S] = K[K[S]] modulo 0-sets, consider C ∈ K[S]; by
Theorem 6.3, {x : px(C) ∈ Δ} is equal a.e. to C if Δ = [1,1], is equal a.e. to W − C if
Δ = [0,0], and is a 0-set if 0, 1 ∉ Δ. Hence, for all Δ, PR(C, Δ) is equal a.e. to one
of: C, W − C, W, ∅. Since K[K[S]] is generated by such sets, the claim follows.)
5
They are separable, i.e., for some countably generated field every event in the space differs from
a set in the field by a 0-set.
Roughly speaking, (ii) means that, modulo 0-sets, nested applications of PR
reduce to non-nested applications. A stronger, syntactical version of this is given
in the next section.
Probability Logic
Let Ξ be a set of reals such that 0, 1 ∈ Ξ. Call an interval with end-points in Ξ a
Ξ-interval.
Let PRLΞ be the calculus obtained by adjoining sentential operants, PR(·, Δ), to
the propositional calculus, where Δ ranges over all closed Ξ-intervals. Here, for
the sake of convenience, I use 'PR' for the syntactical operant, as well as for the
operation in HOPs. Given some class {Xi : i ∈ I} of sentential variables, the class of
all wffs (well formed formulas) of PRLΞ is the smallest such that:
• Every sentential variable is a wff.
• If φ and ψ are wffs, so are ¬φ and φ * ψ, where * is any of the standard binary
connectives.
• If φ is a wff and Δ is a closed Ξ-interval then PR(φ, Δ) is a wff.
Let H = (W, F, P, PR) be a simple HOP and let r be a mapping which maps each
sentential variable to a member of F. Then the value |φ|H,r of the wff φ is defined
by interpreting the sentential connectives as the corresponding Boolean operations
and each syntactic operant PR(·, Δ) as the operation PR(·, Δ) of the HOP.
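As an informal illustration of this semantics, the following sketch (my own; the tuple encoding of wffs and all names are assumptions, not Gaifman's notation) evaluates |φ|H,r in the finite HOP of Example 6.1:

from fractions import Fraction as F

W = range(3)
P = [F(1, 3)] * 3
p = [[F(1, 2), F(1, 2), F(0)],     # the kernel of Example 6.1
     [F(0), F(1, 2), F(1, 2)],
     [F(1, 2), F(0), F(1, 2)]]

def value(phi, r):
    """|phi|_{H,r}: the event denoted by the wff phi under the assignment r."""
    op = phi[0]
    if op == "var":
        return r[phi[1]]
    if op == "not":
        return frozenset(W) - value(phi[1], r)
    if op == "and":
        return value(phi[1], r) & value(phi[2], r)
    if op == "or":
        return value(phi[1], r) | value(phi[2], r)
    if op == "PR":
        A, lo, hi = value(phi[1], r), phi[2], phi[3]
        return frozenset(x for x in W if lo <= sum(p[x][y] for y in A) <= hi)
    raise ValueError(op)

def prob(event):
    return sum(P[x] for x in event)

r = {"X": frozenset({0})}                         # X denotes {w1}
phi = ("PR", ("var", "X"), F(1, 2), F(1, 2))      # PR(X, [.5, .5])
print(sorted(value(phi, r)))                      # [0, 2]
print(prob(value(("or", ("var", "X"), ("not", ("var", "X"))), r)))   # 1, as for any p-valid wff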
Definition A wff φ is p-valid, to be denoted ⊨p φ, if, for every simple HOP H
which satisfies Axiom (VI) and every r, the probability of |φ|H,r is 1. Two wffs φ, ψ
are p-equivalent if φ ↔ ψ is p-valid.
Call φ a PC-formula if it is a wff of the propositional calculus, i.e., does not
contain any PR.
Theorem 6.4 Every wff of PRLΞ is p-equivalent to a Boolean combination of PC-
formulas and formulas of the form PR(φ, Δ) in which φ ranges over PC-formulas.
This means that as far as probabilities are concerned (i.e., if we disregard 0-sets)
we need not use nested PR's.
Theorem 6.5 Translate into PRLΞ the wffs of propositional modal logic with
the necessity operant N, by replacing each N(φ) by PR(φ, [1,1]). Let φ* be the
translation of φ. Then

S5 ⊢ φ iff ⊨p φ*
It can be shown that for Ξ = the set of rationals the set of p-valid wffs is recursive.
Also PRLΞ can be provided with a natural set of formal axioms so that, with modus
ponens as derivation rule, p-validity coincides with provability.
Some Questions
Other validity notions can be considered (e.g., that |φ|H,r always contains all
coherent worlds in the HOP), as well as other interpretations of the necessity operant
(e.g., as PR(φ, [1,1])). What modal logics are thereby obtained?
General HOPs
(W, F, P, T, PR)
PR(A, t, Δ) is the event that the probability of A at stage t lies in Δ. If the stages
coincide with time points then the partial ordering of T is total. As before, P is
the current subjective probability; here "current" is earlier (i.e., less than or equally
informative) than the stages in T. Put:
The first five axioms (I*)–(V*) in this setting are the obvious generalizations of our
previous axioms (I)–(V). Namely, we replace ‘PR’ by ‘PRt ’ and require that the
condition hold for all t in T.
Theorem 6.1 generalizes in the obvious way and we get, for each t ∈
T and each x ∈ W, a probability pt,x which determines PRt; it represents
the maximal state of knowledge at stage t in world x.
The “correct” generalization of Axiom (VI) is not as obvious, but is not difficult
to find:
Axiom (VI*) For each t ∈ T the following holds: If C is a finite intersection of
events of the form PRs(B, Δ) where every s is ≤ t, and P(C ∩ PRt(A, [α, β])) ≠ 0,
then

α ≤ P(A | C ∩ PRt(A, [α, β])) ≤ β

The argument for this axiom is the same as the argument for Axiom (VI). The
essential point is that if s ≤ t then true events of the form PRs(B, Δ) are known at
stage t. The same Dutch book argument works for Axiom (VI*).
Fix a partially ordered set T = (T, <). The logic PRLΞ,T (which corresponds to
HOPs with set of stages T) is defined in the same way as PRLΞ, except that PR
has an additional argument ranging over T. As before we employ a systematically
ambiguous notation. Define φ to be p-valid if it gets probability 1 in all HOPs in
which the set of stages is T.
Now consider a propositional modal language, MT, in which we have, instead
of a single necessity operant, an indexed family Nt, t ∈ T. Nt φ states that φ
is necessary at stage t, i.e., necessary by virtue of the maximal knowledge available
at that stage.
For φ ∈ MT, let φ* be the wff obtained by replacing each Nt ψ by PRt(ψ, [1,1]). It
can be shown that the set of all φ in MT such that φ* is p-valid is exactly the set of
wffs derivable, by modus ponens and the rule: if ⊢ φ then ⊢ Nt φ, from the following
axioms:
(i) All tautologies. (ii) For each t ∈ T, the axiom schemas of S5, with N replaced
by Nt, and
(iii) Ns φ → Nt φ, for each s ≤ t.
Note that (iii) accords well with the intended meaning of the Nt ’s: If something is
necessary at stage s it is also necessary at later stages. On the other hand, something
not necessary at stage s can be necessary later.
References6
6
The list is far from being complete. Some important papers not mentioned in the abstract are to
be found in Ifs, W. Harper (Ed.), Boston: Reidel, 1981. Material related in a less direct way: non-
probabilistic measures of certainty (e.g., the Dempster-Shafer measure), expert systems involving
reasoning with uncertainties, probabilistic protocols and distributed systems, has not been included,
but should be included in a fuller exposition.
Chapter 7
On Indeterminate Probabilities
Isaac Levi
SOME men disclaim certainty about anything. I am certain that they deceive
themselves. Be that as it may, only the arrogant and foolish maintain that they are
certain about everything. It is appropriate, therefore, to consider how judgments of
uncertainty discriminate between hypotheses with respect to grades of uncertainty,
probability, belief, or credence. Discriminations of this sort are relevant to the
conduct of deliberations aimed at making choices between rival policies not only
in the context of games of chance, but in moral, political, economic, or scientific
decision making. If agent X wishes to promote some aim or system of values, he
will (ceteris paribus) favor a policy that guarantees him against failure over a policy
that does not. Where no guarantee is to be obtained, he will (or should) favor a policy
that reduces the probability of failure to the greatest degree feasible. At any rate, this
is so when X is engaged in deliberate decision making (as opposed to habitual or
routine choice).
Two problems suggest themselves, therefore, for philosophical consideration:
The Problem of Rational Credence: Suppose that an ideally rational agent X is
committed at time t to adopting as certain a given system of sentences KX,t (in
a suitably regimented L) and to assigning to sentences in L that are not in KX,t
various degrees of (personal) probability, belief, or credence. The problem is to
specify conditions that X's "corpus of knowledge" KX,t and his "credal state" BX,t
(i.e., his system of judgments of probability or credence) should satisfy in order
to be reasonable.
The Problem of Rational Choice: Given a corpus KX,t and a credal state BX,t at t,
how should X make decisions between alternative policies from which he must
choose one at t?
I. Levi
Columbia University, New York, NY, USA
e-mail: [email protected]
utility and that, appearances to the contrary notwithstanding, men are quite capable
of meeting these requirements and often do so.
I am not concerned to speculate on our capacities for meeting strict bayesian
requirements for credal (and value) rationality. But even if men have, at least to a
good degree of approximation, the abilities bayesians attribute to them, there are
many situations where, in my opinion, rational men ought not to have precise utility
functions and precise probability judgments. That is to say, on some occasions,
we should avoid satisfying the conditions for applying the principle of maximizing
expected utility even if we have the ability to satisfy them.
In this essay, reference to the question of utility will be made from time to
time. I shall not, however, attempt to explain why I think it is sometimes (indeed,
often) irrational to evaluate consequences by means of a utility function unique up
to a linear transformation. My chief concern is to argue that rational men should
sometimes avoid adopting numerically precise probability judgments.
The bayesian answer to the problem of rational choice presupposes at least part of
an answer to the problem of rational credence. For a strict bayesian, a rational agent
has a credal state representable by a numerically precise function on sentences (or
pairs of sentences when conditional probability is considered) obeying the dictates
of the calculus of probabilities.
There are, to be sure, serious disputes among bayesians concerning credal
rationality. In his early writings, Carnap believed that principles of “inductive
logic” could be formulated so that, given X’s corpus KX,t , X’s credal state at
t would be required by the principles of inductive logic to be represented by
a specific Q-function that would be the same for anyone having that corpus.1
Others (including the later Carnap2 ) despair of identifying such strong principles.
Nonetheless, bayesian critics of the early Carnap’s program for inductive logic
continue to insist that ideally rational agents should assign precise probabilities to
hypotheses.
II
X’s corpus of knowledge KX,t shall be construed to be the set of sentences (in L)
to whose certain truth X is committed at t. I am not suggesting that X is explicitly
or consciously certain of the truth of every sentence in KX,t , but only that he is
committed to being certain. X might be certain at t of the truth of h and, hence, be
committed to being certain of h ∨ g, without actually being certain. Should it be
brought to X's attention, however, that h ∨ g is a deductive consequence of h, he
would be obliged as a rational agent either to cease being certain of h or to take h
∨ g to be certain. The latter alternative amounts to retaining his commitment; the
former to abandoning it.
1
Logical Foundations of Probability (Chicago: University Press, 2nd ed., 1962), pp. 219–241.
2
“Inductive Logic and Rational Decisions,” in Carnap and R. C. Jeffrey, eds., Studies in Inductive
Logic and Probability (Berkeley: University of California Press, 1971), p. 27.
III
How does all this relate to bayesian views about the revision of credal states?
Consider X’s corpus of knowledge KX,t at t. X’s credal state BX,t at t is, according
to strict bayesians, determined by KX,t . Strict bayesians disagree among themselves
concerning the appropriate way in which to formulate this determination. The
following characterization captures the orthodox view in all its essentials.
Let K be any potential corpus (i.e., let it be UK or an expansion thereof).
Let CX,t (K) be X’s judgment at t as to what his credal state should be were he
to adopt K as his corpus of knowledge. I shall suppose that X is committed to
judgments of this sort for every feasible K in L. The resulting function from potential
corpora of knowledge to potential credal states shall be called X’s “confirmational
commitment” at t.
According to strict bayesians, no matter what corpus K is (provided it is
consistent), CX,t (K) is representable by a probability function where all sentences
in K receive probability 1. In particular, CX,t (UK) is representable by a function
P(x;y)—which I shall call a P-function, to contrast it with a Q-function representing
CX,t (K) where K is an expansion of UK.
Strict bayesians adopt the following principle, which imposes restrictions upon
confirmational commitments:
Confirmational Conditionalization: If K is obtained from UK by adding e (con-
sistent with UK) to UK and forming the deductive closure, P(x;y) represents
CX,t(UK) and Q(x;y) represents CX,t(K), then Q(h;f) = P(h;f&e).
In virtue of this principle, X's confirmational commitment is defined by specify-
ing CX,t(UK) = CX,t and employing confirmational conditionalization.3 X's credal
state at t, BX,t, is then determined by KX,t and CX,t according to the following
principle:

Total Knowledge: CX,t(KX,t) = BX,t
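A toy rendering of these two principles may help fix ideas. In the sketch below (my own, not Levi's; the worlds, prior, and evidence are hypothetical, with sentences modelled crudely as sets of worlds and the P-function as arising from a prior), confirmational conditionalization yields the Q-function for the expanded corpus, and total knowledge identifies the credal state with the commitment evaluated at the current corpus:

from fractions import Fraction as F

worlds = ["w1", "w2", "w3", "w4"]
prior = {"w1": F(1, 2), "w2": F(1, 4), "w3": F(1, 8), "w4": F(1, 8)}

def P(h, f):
    """Two-place P-function: probability of h given f, obtained from the prior."""
    return sum(prior[w] for w in h & f) / sum(prior[w] for w in f)

def commitment(K):
    """C_{X,t}(K): the Q-function got by conditionalizing the P-function on the corpus K."""
    return lambda h, f: P(h, f & K)

UK = frozenset(worlds)                # the minimal corpus leaves every world open
e = frozenset({"w1", "w2", "w3"})     # evidence added to form the expanded corpus
K = UK & e

Q = commitment(K)
h, f = frozenset({"w1"}), UK
print(Q(h, f) == P(h, f & e))         # True: confirmational conditionalization
B = commitment(K)                     # total knowledge: B_{X,t} = C_{X,t}(K_{X,t})
print(B(h, UK))                       # 4/7: the credal probability of h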
Notice that the principle of confirmational conditionalization, even when taken
together with the principle of total knowledge, does not prescribe how X should
modify his credal state given a change in his corpus of knowledge.
To see this, suppose that at t1 X’s corpus is K1 and that at t2 his corpus K2
is obtained from K1 by adding e (consistent with K1 ) and forming the deductive
3
Confirmational commitments built on the principle of confirmational conditionalization are called
“credibilities” by Carnap (ibid., pp. 17–19). The analogy is not quite perfect. According to Carnap,
a credibility function represents a permanent disposition of X to modify his credal states in the
light of changes in his corpus of knowledge. When credibility is rational, it can be represented by
a “confirmation function.” Since I wish to allow for modifications of confirmational commitments
as well as bodies of knowledge and credal states, I assign dates to confirmational commitments.
Throughout I gloss over Carnap’s distinction between credibility functions and confirmation
functions (ibid., pp. 24–27).
IV
4
“A Basic System of Inductive Logic,” in Carnap and Jeffrey, op. cit., pp. 51–52.
“Coherentists” like de Finetti and Savage claim that the principle of coherence
constitutes a complete inductive logic. On their view, CIL(UK) is the set of all P-
functions obeying the calculus of probabilities defined over M.
Some authors are prepared to add a further principle to the principle of coherence.
This principle determines permissible Q-values for hypotheses about the outcome
of a specific experiment on a chance device, given suitable knowledge about the
experiment to be performed and the chances of possible outcomes of experiments
of that type.
There is considerable controversy concerning the formulation of such a principle
of “direct inference.” In large measure, the controversy reflects disagreements
over the interpretation of “chance” or “statistical probability,” concerning the so-
called “problem of the reference class” and random sampling. Indeed, the reason
coherentists do not endorse a principle linking objective chance with credence is
that they either deny the intelligibility of the notion of objective chance or argue in
favor of dispensing with that notion.
Setting these controversies to one side, I shall call anyone who holds that a
complete inductive logic consists of the coherence principle and an additional
principle of direct inference from knowledge of chance to outcomes of random
experiments an “objectivist.”
There are many authors who are neither coherentists nor objectivists because they
wish to supplement the principles of coherence and direct inference with additional
principles. Some follow J. M. Keynes, Jeffreys, and Carnap in adding principles
of symmetry of various kinds. Others, like I. Hacking,5 introduce principles of
irrelevance or other criteria which attempt to utilize knowledge about chances in
a manner different from that employed in direct inference. Approaches of this
sort stem by and large from the work of R. A. Fisher. I lack a good tag for this
somewhat heterogeneous group of viewpoints. They all agree, however, in denying
that objectivist inductive logic is a complete inductive logic.
Attempting to classify the views of historically given authors concerning induc-
tive logic is fraught with risk. I shall not undertake a tedious and thankless task
of textual analysis in the vain hope of convincing the reader that many eminent
authors have been committed to an inductive logic whether they have said so or not.
Yet much critical insight into controversies concerning probability, induction, and
statistical inference can be obtained by reading the parties to the discussion as if they
were committed to some form of inductive logic. If I am right, far from being a dead
issue, inductive logic remains very much alive and debated (at least implicitly) not
only by bayesians of the Keynes-Jeffreys-Carnap persuasion but by objectivists (to
whose number I think J. Neyman, H. Reichenbach, and, with some qualifications,
H. Kyburg belong) and the many authors, like Hacking, who are associated with the
tradition of Fisher in various ways.
Assuming, for the sake of the argument, that the debate concerning what
constitutes a complete set of principles of inductive logic is settled (I, for one, would
5
Logic of Statistical Inference (New York: Cambridge, 1965), p. 135.
weak for use in practical decision making or statistical inference. (Many objectivist
necessitarians seem to deny this; but the matter is much too complicated to discuss
here.)
Personalists, like de Finetti and Savage, abandon necessitarianism but continue
to endorse confirmational tenacity—at least during normal periods free from
revolutionary stress. It is this position that I contended earlier leads to dogmatism or
capriciousness with respect to confirmational commitment.
The view I favor is revisionism. This view agrees with the personalist position
in allowing rational men to adopt confirmational commitments stronger than CIL.
It insists, however, that such commitments are open to revision. It sees as a
fundamental epistemological problem the task of providing an account of the
conditions under which such revision is appropriate and criteria for evaluating
proposed changes in confirmational commitment on those occasions when such
shifts are needed.
I shall not offer an account of the revision of confirmational commitments.
The point I wish to emphasize here is that, once one abandons the strict bayesian
approach to credal rationality and allows credal states to contain more than one
permissible Q-function in the manner I am suggesting, the revisionist position can
be seriously entertained. The strict bayesian view precludes it and leaves us with
the dubious alternatives of necessitarianism and personalism. By relaxing the strict
bayesian requirements on credal rationality, we can at least ask a question about
revision which could not be asked before.
VI
6
“Subjective Probability as the Measure of a Non-measurable Set,” in P. Suppes, E. Nagel, and A.
Tarski, Logic, Methodology, and the Philosophy of Science (Stanford: University Press, 1962), pp.
319–329.
7
“Consistency in Statistical Inference and Decision” (with discussion), Journal of the Royal
Statistical Society, series B, XXIII (1961): 1–25.
8
"Upper and Lower Probabilities Induced by a Multivalued Mapping," Annals of Mathematical
Statistics, XXXVIII (1967): 325–339.
9
“The Bases of Probability,” Bulletin of the American Mathematical Society, XLVI (1940): 763–
774.
10
Dempster, op. cit.; Good, op. cit.; Kyburg, Probability and the Logic of Rational Belief
(Middletown, Conn.: Wesleyan Univ. Press, 1961); Smith, op. cit.; Schick, Explication and
Inductive Logic, doctoral dissertation, Columbia University, 1958.
even though they generate the identical interval-valued function—provided they are
different convex sets of Q-functions.11
Thus, the chief difference between my proposal and other efforts to come to
grips with “indeterminate” probability judgments is that my proposal recognizes
significant differences between credal states (confirmational commitments) where
other proposals recognize none. Is this a virtue, or are the fine distinctions allowed
by my proposal so much excess conceptual baggage?
I think that the distinctions between credal states recognized by the proposals
introduced here are significant. Agents X and Y, who confront the same set of
feasible options and evaluate the possible consequences in the same way may,
nonetheless, be obliged as rational agents to choose different options if their
credal states are different, even though their credal states define the same interval-
valued credence function. That is to say, according to the decision theory that
supplements the account of rational credence just introduced, differences in credal
states recognized by my theory but not by Dempster’s or Smith’s, do warrant
different choices in otherwise similar contexts of choice.
To explain this claim, we must turn to a consideration of rational choice. We
would have to do so anyhow. One of the demands that can fairly be made of those
who propose theories rival to bayesianism is that they furnish answers not only to the
problems of rational credence and revision but to the questions about rational choice.
Furthermore, the motivation for requiring credal states to be non-empty, convex
sets of probability measures and the explanation of the notion of a permissible Q-
function are best understood within the context of an account of rational choice. For
all these reasons, therefore, it is time to discuss rational choice.
VII
Consider, once more, a situation where X faces a decision problem of the type
described in section “I”. No longer, however, will it be supposed that X’s credal
state for the "states of nature" h1, h2, ..., hn and for the possible consequences oi1,
oi2, ..., oim conditional on X choosing Ai are representable by a single Q-function.
Instead, the credal state will be required only to be a nonempty convex set of Q-
functions.12
11
The difference between my approach and Smith’s was drawn to my attention by Howard Stein. To
all intents and purposes, both Dempster and Smith represent credal states by the largest convex sets
that generate the interval-valued functions characterizing those credal states. Dempster (332/3) is
actually more restrictive than Smith. Dempster, by the way, wrongly attributes to Smith the position
I adopt. To my knowledge, Dempster is the first to consider this position in print—even if only to
misattribute it to Smith.
12
As in section "I", I am supposing that "states of nature" are "independent" of options in the
sense that, for every permissible Q-function, Q(hj) = Q(oij;Ai). I have done this to facilitate the
exposition. No question of fundamental importance is, in my opinion, thereby seriously altered.
Although I have not focused attention here on the dubiety of requiring X’s
evaluations of the oij s to be representable by a utility function unique up to a linear
transformation, I do believe that rational men can have indeterminate preferences
and will, for the sake of generality, relax the bayesian requirement as follows: X’s
system of evaluations of the possible consequences of the feasible options is to be
represented by a set G of “permissible” u-functions defined over the oij s which is
(a) nonempty, (b) convex, and such that all linear transformations of u-functions in
G are also in G. A bayesian G is, in effect, such that all u-functions in it are linear
transformations of one another. It is this latter requirement that I am abandoning.
In those situations where X satisfies strict bayesian conditions so that his credal
state contains only a single Q-function and G contains all and only those u-functions
which are linear transformations of some specific u-function u1 , an admissible
option Ai is, according to the principle of maximizing expected utility, an option
that bears maximum expected utility E(Ai) = Σj Q(hj) u1(oij). Notice that, if
any linear transformation of u1 is substituted for u1 in the computation of expected
utility, the ranking of options with respect to expected utility remains unaltered.
Hence we can say that, according to strict bayesians, an option is admissible if it
bears maximum expected utility relative to the uniquely permissible Q-function and
to any of the permissible u-functions in G (all of which are linear transformations
of u1 ).
There is an obvious generalization of this idea applicable to situations where BX,t
contains more than one permissible Q-function and G contains u-functions that are
not linear transformations of one another. I shall say that Ai is E-admissible if and
only if there is at least one Q-function in BX,t and one u-function in G such that
E(Ai ) defined relative to that Q-function and u-function is a maximum among all
the feasible options. The generalization I propose is the following:
E-admissibility: All admissible options are E-admissible.
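As an illustration of how E-admissibility widens the set of admissible options, here is a hedged computational sketch (my own example; the particular payoffs, the single u-function, and the grid approximation of the convex credal set are assumptions, not Levi's):

from fractions import Fraction as F

utilities = {            # u(o_ij): utility of option A_i's consequence in state h_j
    "A1": (F(1), F(-1)),
    "A2": (F(-1), F(1)),
    "A3": (F(0), F(0)),
}

# A fine grid through the credal state: Q(h1) runs over [.4, .6].
credal_grid = [(F(k, 100), 1 - F(k, 100)) for k in range(40, 61)]

def expected(A, Q):
    return sum(q * u for q, u in zip(Q, utilities[A]))

def e_admissible(options, credal):
    winners = set()
    for Q in credal:
        best = max(expected(A, Q) for A in options)
        winners |= {A for A in options if expected(A, Q) == best}
    return winners

print(sorted(e_admissible(utilities, credal_grid)))   # ['A1', 'A2', 'A3']
# A1 is best when Q(h1) > .5, A2 when Q(h1) < .5, and A3 ties with both at Q(h1) = .5,
# so all three are E-admissible; a strict bayesian with a single Q would keep fewer.

Because A3 is optimal only at the interior point Q(h1) = .5, checking merely the extreme points of the credal set would miss it; this is one reason the convexity of the credal state matters in the definition.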
The principle of E-admissibility is by no means novel. I. J. Good, for example,
endorsed it at one time. Indeed, Good went further than this. He endorsed the
converse principle that all E-admissible options are admissible as well.13
I disagree with Good’s view on this. When X’s credal state and goals select
more than one option as E-admissible, there may be and sometimes are other
considerations than E-admissibility which X, as a rational agent, should employ
in choosing between them.
There are occasions where X identifies two or more options as E-admissible
and where, in addition, he has the opportunity to defer decision between them.
If that opportunity is itself E-admissible, he should as a rational agent “keep his
options open.” Notice that in making this claim I am not saying that the option
of deferring choice between the other E-admissible options is “better” than the
13
“Rational Decisions,” Journal of the Royal Statistical Society, Ser. B, XIV (1952): 114.
other E-admissible options relative to X’s credence and values and the assessments
of expected utility based thereon. In general, E-admissible options will not be
comparable with respect to expected utility (although sometimes they will be). The
injunction to keep one’s options open is a criterion of choice that is based not on
appraisals of expected utility but on the “option-preserving” features of options.
Deferring choice is better than the other E-admissible options in this respect, but
not with respect to expected utility.
Thus, a P-admissible option is an option that is (a) E-admissible and (b) “best”
with respect to E-admissible option preservation among all E-admissible options.
I shall not attempt to provide an adequate explication of clause (b) here. In the
subsequent discussion, I shall consider situations where there are no opportunities to
defer choice. Nonetheless, it is important to notice that, given a suitably formulated
surrogate for (b), the following principle holds:
P-admissibility: All admissible options are P-admissible.
My disagreement with Good goes still further than this; for I reject not only the
converse of E-admissibility but that of P-admissibility as well.
To illustrate, consider a situation that satisfies strict bayesian requirements. X
knows that a coin with a .5 chance of landing heads is to be tossed once. g is the
hypothesis that the coin will land heads. Under the circumstances, we might say that
X's credal state is such that all permissible Q-functions assign g the value Q(g) = .5.
Suppose that X is offered a gamble on g where X gains a dollar if g is true and loses
one if g is false. (I shall assume that X has neither a taste for nor an aversion to
gambling and that, for such small sums, money is linear with utility). He has two
options: to accept the gamble and to reject it. If he rejects it, he neither gains nor
loses.
Under the circumstances described, the principle of maximizing expected utility
may be invoked. It indicates that both options are optimal and, hence, in my terms
E-admissible. Since there are no opportunities for delaying choice, both options (on
a suitably formulated version of P-admissibility) become P-admissible.
Bayesians—and Good would agree with this—tend to hold that rational X is
free to choose either way. Not only are both options E-admissible. They are both
admissible. Yet, in my opinion, rational X should refuse the gamble. The reason is
not that refusal is better in the sense that it has higher expected utility than accepting
the gamble. The options come out equal on this kind of appraisal. Refusing the
gamble is “better,” however, with respect to the security against loss it furnishes
X. If X refuses the gamble, he loses nothing. If he accepts the gamble, he might
lose something. This appeal to security does not carry weight, in my opinion, when
accepting the gamble bears higher expected utility than refusing it. However, in that
absurdly hypothetical situation where they bear precisely the same expected utility,
the question of security does become critical.
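The arithmetic behind this verdict is trivial but worth displaying. The sketch below (my own rendering of the example, with utilities in dollars as the text licenses) shows the two options tied in expected utility while differing in their worst-case payoff:

from fractions import Fraction as F

Q_g = F(1, 2)                                    # the chance of heads, hence of g
payoffs = {"accept": (F(1), F(-1)),              # (payoff if g, payoff if not-g)
           "reject": (F(0), F(0))}

def expected(option):
    win, lose = payoffs[option]
    return Q_g * win + (1 - Q_g) * lose

def security(option):
    return min(payoffs[option])

for option in payoffs:
    print(option, expected(option), security(option))
# accept 0 -1
# reject 0 0   -> equal expected utility, but rejecting has the better security level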
These considerations can be brought to bear on the more general situation where
two or more options are E-admissible (even though they are not equal with respect
to expected utility) and where the principle of P-admissibility does not weed out
any options.
14
The possible consequences of a “mixed act” constructed by choosing between “pure options”
Ai and Aj with the aid of a chance device with known chance probability of selecting one or the
other option is the set of possible consequences of either Ai or Aj . Consequently, the security level
of such a mixed option for a given u-function is the lowest of the security levels belonging to Ai
and Aj . Thus, my conception of security levels for mixed acts differs from that employed by von
Neumann and Morgenstern and by Wald in formulating maximin (or minimax) principles. For this
reason, starting with a set of P-admissible pure options, one cannot increase the security level by
forming mixtures of them. In any case, mixtures of E-admissible options are not always themselves
E-admissible. I shall leave mixed options out of account in the subsequent discussion. See D. Luce
and H. Raiffa, Games and Decisions (New York: Wiley, 1958), pp. 68–71, 74–76, 278–280.
VIII
on equality as bayesians do. Given Smith's definitions of upper and lower pignic
probabilities, it should be fairly clear that, in case 1 and case 2 where CrX,t(g) = [.4,
.6], Smith's analysis and mine coincide.15
Before leaving cases 1 and 2, it should be noted that, if X’s credal state were
empty, no option in case 1 would be admissible and no option in case 2 would be
admissible either. If X is confronted with a case 1 predicament and an empty credal
state, he would be constrained to act and yet as a rational agent enjoined not to
act. The untenability of this result is to be blamed on adopting an empty credal
state. Only when X’s corpus is inconsistent, should a rational agent have an empty
credal state. But, of course, if X finds his corpus inconsistent, he should contract to
a consistent one.
Case 3: A1 is accepting both the case 1 and the case 2 gamble jointly with a net
payoff if g is true or false of S − 2P.
This is an example of decision making under certainty. Everyone agrees that if 2P
is greater than S the gamble should be rejected; for it leads to certain loss. If 2P is
less than S, X should accept the gamble; for it leads to a certain gain. These results,
by the way, are implied by the criteria proposed here as well as by the strict bayesian
view.
Strict bayesians often defend requiring that Q-functions conform to the require-
ments of the calculus of probabilities by an appeal to the fact that, when credal states
contain but a single Q-function, a necessary and sufficient condition for having
credal states that do not license sure losses (Dutch books) is having a Q-function
obeying the calculus of probabilities. The arguments also support the conclusion
that, even when more than one Q-function is permissible according to a credal
state, if all permissible Q-functions obey the coherence principle, no Dutch book
can become E-admissible and, hence, admissible.
Case 4: B1 is accepting the case 1 gamble, B2 is accepting the case 2 gamble, and
B3 is rejecting both gambles.
15
Smith, op. cit, pp. 3–5, 6–7. The agreement applies only to pairwise choices where one option is a
gamble in which there are two possible payoffs and the other is refusing to gamble with 0 gain and 0
loss. In this kind of situation, it is clear that Smith endorses the principle of E-admissibility, but not
its converse. However, in the later sections of his paper where Smith considers decision problems
with three or more options or where the possible consequences of an option to be considered
are greater than 2, Smith seems (but I am not clear about this) to endorse the converse of the
principle of E-admissibility—counter to the analysis on the basis of which he defines lower and
upper pignic probabilities. Thus, it seems to me that either Smith has contradicted himself or (as
is more likely) he simply does not have a general theory of rational choice. The latter sections
of the paper may then be read as interesting explorations of technical matters pertaining to the
construction of such a theory, but not as actually advocating the converse of E-admissibility. At
any rate, since it is the theory Smith propounds in the first part of his seminal essay which interests
me, I shall interpret him in the subsequent discussion as having no general theory of rational choice
beyond that governing the simple gambling situations just described.
Let the credal state be such that all values between 0 and 1 are permissible Q-
values for h1 and, hence, all values between .4 and .6 are permissible for g.
If P/S is greater than .6, B3 is uniquely E-admissible and, hence, admissible. If
P/S is less than .4, B3 is E-inadmissible. The other two options are E-admissible and
admissible.
If P/S is greater than or equal to .4 and less than .5, B3 remains inadmissible and
the other two admissible.
If P/S is greater than or equal to .5 and less than .6, all three options are E-
admissible; but B3 is uniquely S-admissible. Hence, B3 should be chosen when P/S
is greater than or equal to .5.
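These verdicts can be checked mechanically. The following minimal sketch (my own illustration, not Levi's notation; it assumes the natural reading of the case 1 and case 2 gambles, on which accepting costs P and pays S if the relevant outcome obtains, since the pages describing those gambles are not reproduced above) scans the permissible Q-values and reports which options are E-admissible for a few values of P/S.

```python
# Sketch (illustrative, with an assumed payoff structure): E-admissibility in Case 4.
# The case 1 gamble pays S if g is true, the case 2 gamble pays S if g is false,
# each at price P. Payoffs are in units of S, with p = P/S and q = Q(g).
def e_admissible(p, steps=2000):
    qs = [0.4 + 0.2 * k / steps for k in range(steps + 1)]  # permissible Q-values for g
    options = {
        "B1 (case 1 gamble)": lambda q: q - p,
        "B2 (case 2 gamble)": lambda q: (1 - q) - p,
        "B3 (reject both)":   lambda q: 0.0,
    }
    admissible = set()
    for q in qs:
        utils = {name: f(q) for name, f in options.items()}
        best = max(utils.values())
        # E-admissible: optimal relative to at least one permissible Q-function
        admissible |= {name for name, u in utils.items() if abs(u - best) < 1e-9}
    return sorted(admissible)

for p in (0.35, 0.45, 0.55, 0.65):
    print(f"P/S = {p}: {e_admissible(p)}")
# P/S = 0.65: only B3.  P/S = 0.35 or 0.45: only B1 and B2.
# P/S = 0.55: all three (B3 is then uniquely S-admissible, since only it
# guarantees a payoff of 0).
```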
Three comments are worth making about these results.
(i) I am not sure what analysis Smith would propose of situations like case 4. At
any rate, his theory does not seem to cover it (but see footnote 15).
(ii) When P/S is between .4 and .5, my theory recommends rejecting the gamble
in case 1, rejecting the gamble in case 2, and yet recommends accepting one or
the other of these gambles in case 4. This violates the so-called “principle of
independence of irrelevant alternatives.”16
(iii) If the convexity requirement for credal states were violated by removing as
permissible values for g all values from (S – P)/S to P/S, where P/S is greater
than .5 and less than .6, but leaving all other values from .4 to .6, then, counter
to the analysis given previously, B3 would not be E-admissible in case 4. The
peculiarity of that result is that B1 is E-admissible because, for permissible
Q-values from .6 down to P/S, it bears maximum expected utility, with B3 a
close second. B2 is E-admissible because, for Q-values from .4 to (S – P)/S,
B2 is optimal, with B3 again a close second. If the values between (S − P)/S
and P/S are also permissible, B3 is E-admissible because it is optimal for those
values. To eliminate such intermediate values and allow the surrounding values
to retain their permissibility seems objectionable. Convexity guarantees against
this.
Case 5: X is offered a gamble on a take-it-or-leave-it basis in which he wins 15 cents
if f1 is true, loses 30 cents if f2 is true, and wins 40 cents if f3 is true.
16 See Luce and Raiffa, op. cit., pp. 288/9. Because the analysis offered by Smith and me for cases
1 and 2 seems perfectly appropriate and the analysis for case 4 also appears impeccable, I conclude
that there is something wrong with the principle of independence of irrelevant alternatives.
A hint as to the source of the trouble can be obtained by noting that if ‘E-admissible’ is
substituted for ‘optimal’ in the various formulations of the principle cited by Luce and Raiffa,
p. 289, the principle of independence of irrelevant alternatives stands. The principle fails because
S-admissibility is used to supplement E-admissibility in weeding out options from the admissible
set.
Mention should be made in passing that even when ‘E-admissible’ is substituted for ‘optimal’
in Axiom 9 of Luce and Raiffa, p. 292, the axiom is falsified. Thus, when .5 ≤ P/S < .6 in case 4,
all three options are E-admissible, yet some mixtures of B1 and B2 will not be.
17 I mention this because I. J. Good, whose seminal ideas have been an important influence on the
proposals made in this essay, confuses permissible with possible probabilities. As a consequence,
he introduces a hierarchy of types of probability (Good, op. cit., p. 327). For criticism of such
views, see Savage, The Foundations of Statistics (New York: Wiley, 1954), p. 58. In fairness to
Good, it should be mentioned that his possible credal probabilities are interpreted not as possibly
true statistical hypotheses but as hypotheses entertained by X about his own unknown strictly
bayesian credal state. Good is concerned with situations where strict bayesian agents having precise
probability judgments cannot identify their credal states before decision and must make choices
on the basis of partial information about themselves. [In Decision and Value (New York: Wiley,
1964), P. G. Fishburn devotes himself to the same question.] My proposals do not deal with this
problem. I reject Good's and Fishburn's view that every rational agent is at bottom a strict bayesian
limited only by his lack of self-knowledge, computational facility, and memory. To the contrary, I
claim that, even without such limitations, rational agents should not have precise bayesian credal
states. The difference in problem under consideration and presuppositions about rational agents
has substantial technical ramifications which cannot be developed here.
I have scratched the surface of some of the questions raised by the proposals
made in this essay. Much more needs to be done. I do believe, however, that these
proposals offer fertile soil for cultivation not only by statisticians and decision
theorists but by philosophers interested in what, in my opinion, ought to be the
main problem for epistemology—to wit, the improvement (and, hence, revision) of
human knowledge and belief.
Acknowledgements Work on this essay was partially supported by N.S.F. grant GS 28992.
Research was carried out while I was a Visiting Scholar at Leckhampton, Corpus Christi, Cambridge.
I wish to thank the Fellows of Corpus Christi College and the Departments of Philosophy
and History and Philosophy of Science, Cambridge University, for their kind hospitality. I am
indebted to Howard Stein for his help in formulating and establishing some of the results reported
here. Sidney Morgenbesser, Ernest Nagel, Teddy Seidenfeld, and Frederic Schick as well as Stein
have made helpful suggestions.
Chapter 8
Why I am not a Bayesian
Clark Glymour
1 A third view, that probabilities are to be understood exclusively as frequencies, has been most
ably defended by Wesley Salmon (1969).
theories, or probabilistic analyses of data, have been great rarities in the history of
science. In the physical sciences at any rate, probabilistic arguments have rarely
occurred. Copernicus, Newton, Kepler, none of them give probabilistic arguments
for their theories; nor does Maxwell or Kelvin or Lavoisier or Dalton or Einstein or
Schrödinger or ... There are exceptions. Jon Dorling has discussed a seventeenth-
century Ptolemaic astronomer who apparently made an extended comparison of
Ptolemaic and Copernican theories in probabilistic terms; Laplace, of course, gave
Bayesian arguments for astronomical theories. And there are people—Maxwell, for
example—who scarcely give a probabilistic argument when making a case for or
against scientific hypotheses but who discuss methodology in probabilistic terms.
This is not to deny that there are many areas of contemporary physical science where
probability figures large in confirmation; regression analysis is not uncommon in
discussions of the origins of cosmic rays, correlation and analysis of variance in
experimental searches for gravitational waves, and so on. It is to say that, explicitly,
probability is a distinctly minor note in the history of scientific argument.
The rarity of probability considerations in the history of science is more an
embarrassment for some accounts of probability than for others. Logical theories,
whether Carnap’s or those developed by Hintikka and his students, seem to lie at
a great distance from the history of science. Still, some of the people working in
this tradition have made interesting steps towards accounting for methodological
truisms. My own inclination is to believe that the interest such investigations
have stems more from the insights they obtain into syntactic versions of structural
connections among evidence and hypotheses than to the probability measures they
mesh with these insights. Frequency interpretations suppose that for each hypothesis
to be assessed there is an appropriate reference class of hypotheses to which
to assign it, and the prior probability of the hypothesis is the frequency of true
hypotheses in this reference class. The same is true for statements of evidence,
whether they be singular or general. The matter of how such reference classes are to
be determined, and determined so that the frequencies involved do not come out to
be zero, is a question that has only been touched upon by frequentist writers. More to
the point, for many of the suggested features that might determine reference classes,
we have no statistics, and cannot plausibly imagine those who figure in the history of
our sciences to have had them. So conceived, the history of scientific argument must
turn out to be largely a history of fanciful guesses. Further, some of the properties
that seem natural candidates for determining reference classes for hypotheses—
simplicity, for example—seem likely to give perverse results. We prefer hypotheses
that posit simple relations among observed quantities, and so on a frequentist view
should give them high prior probabilities. Yet simple hypotheses, although often
very useful approximations, have most often turned out to be literally false.
At present, perhaps the most philosophically influential view of probability
understands it to be degree of belief. The subjectivist Bayesian (hereafter, for
brevity, simply Bayesian) view of probability has a growing number of advocates
who understand it to provide a general framework for understanding scientific
reasoning. They are singularly unembarrassed by the rarity of explicit probabilistic
arguments in the history of science, for scientific reasoning need not be explicitly
probabilistic in order to be probabilistic in the Bayesian sense. Indeed, a number
of Bayesians have discussed historical cases within their framework. Because of
its influence and its apparent applicability, in what follows it is to the subjective
Bayesian account that I shall give my full attention.
My thesis is several-fold. First, there are a number of attempts to demonstrate
a priori the rationality of the restrictions on belief and inference that Bayesians
advocate. These arguments are altogether admirable, but ought, I shall maintain,
to be unconvincing. My thesis in this instance is not a new one, and I think many
Bayesians do regard these a priori arguments as insufficient. Second, there are a
variety of methodological notions that an account of confirmation ought to explicate
and methodological truisms involving these notions that a confirmation theory ought
to explain: for example, variety of evidence and why we desire it, ad hoc hypotheses
and why we eschew them, what separates a hypothesis integral to a theory from
one ‘tacked on’ to the theory, simplicity and why it is so often admired, why
‘de-Occamized’ theories are so often disdained, what determines when a piece of
evidence is relevant to a hypothesis, and what, if anything, makes the confirmation
of one bit of theory by one bit of evidence stronger than the confirmation of
another bit of theory (or possibly the same bit) by another (or possibly the same)
bit of evidence. Although there are plausible Bayesian explications of some of
these notions, there are not plausible Bayesian explications of others. Bayesian
accounts of methodological truisms and of particular historical cases are of one
of two kinds: either they depend on general principles restricting prior probabilities,
or they don’t. My claim is that many of the principles proposed by the first kind of
Bayesian are either implausible or incoherent, and that, for want of such principles,
the explanations the second kind of Bayesians provide for particular historical cases
and for truisms of method are chimeras. Finally, I claim that there are elementary but
perfectly common features of the relation of theory and evidence that the Bayesian
scheme cannot capture at all without serious—and perhaps not very plausible—
revision.
It is not that I think the Bayesian scheme or related probabilistic accounts capture
nothing. On the contrary, they are clearly pertinent where the reasoning involved
is explicitly statistical. Further, the accounts developed by Carnap, his predeces-
sors, and his successors are impressive systematizations and generalizations, in
a probabilistic framework, of certain principles of ordinary reasoning. But so far
as understanding scientific reasoning goes, I think it is very wrong to consider
our situation to be analogous to that of post-Fregean logicians, our subject-matter
transformed from a hotchpotch of principles by a powerful theory whose outlines
are clear. We flatter ourselves that we possess even the hotchpotch. My opinions are
outlandish, I know; few of the arguments I shall present in their favour are new, and
perhaps none of them is decisive. Even so, they seem sufficient to warrant taking
seriously entirely different approaches to the analysis of scientific reasoning.
The theories I shall consider share the following framework, more or less. There
is a class of sentences that express all hypotheses and all actual or possible evidence
of interest; the class is closed under Boolean operations. For each ideally rational
agent, there is a function defined on all sentences such that, under the relation
of logical equivalence, the function is a probability measure on the collection
of equivalence classes. The probability of any proposition represents the agent’s
degree of belief in that proposition. As new evidence accumulates, the probability
of a proposition changes according to Bayes’s rule: the posterior probability of
a hypothesis on the new evidence is equal to the prior conditional probability of
the hypothesis on the evidence. This is a scheme shared by diverse accounts of
confirmation. I call such theories ‘Bayesian’, or sometimes ‘personalist’.
We certainly have grades of belief. Some claims I more or less believe, some
I find plausible and tend to believe, others I am agnostic about, some I find
implausible and far-fetched, still others I regard as positively absurd. I think
everyone admits some such gradations, although descriptions of them might be
finer or cruder. The personalist school of probability theorists claim that we also
have degrees of belief, degrees that can have any value between 0 and 1 and that
ought, if we are rational, to be representable by a probability function. Presumably,
the degrees of belief are to co-vary with everyday gradations of belief, so that one
regards a proposition as preposterous and absurd just if his degree of belief in it is
somewhere near zero, and he is agnostic just if his degree of belief is somewhere
near a half, and so on. According to personalists, then, an ideally rational agent
always has his degrees of belief distributed so as to satisfy the axioms of probability,
and when he comes to accept a new belief, he also forms new degrees of belief by
conditionalizing on the newly accepted belief. There are any number of refinements,
of course; but that is the basic view.
Why should we think that we really do have degrees of belief? Personalists have
an ingenious answer: people have them because we can measure the degrees of
belief that people have. Assume that no one (rational) will accept a wager on which
he expects a loss, but anyone (rational) will accept any wager on which he expects a
gain. Then we can measure a person’s degree of belief in proposition P by finding,
for fixed amount v, the highest amount u such that the person will pay u in order
to receive u + v if P is true, but receive nothing if P is not true. If u is the greatest
amount the agent is willing to pay for the wager, his expected gain on paying u must
be zero. The agent's gain if P is the case is v; his gain if P is not the case is −u.
Thus

v · prob(P) + (−u) · prob(not-P) = 0,

and hence

prob(P) = u / (u + v).
The reasoning is clear: any sensible person will act so as to maximize his expected
gain; thus, presented with a decision whether or not to purchase a bet, he will make
the purchase just if his expected gain is greater than zero. So the betting odds he
will accept determine his degree of belief.2
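A small numerical sketch (with hypothetical figures, merely illustrating the formula just derived) shows how a posted maximum price translates into a degree of belief:

```python
# Sketch (illustrative numbers): recovering a degree of belief from the maximum
# price the agent will pay for the wager described above.
def degree_of_belief(u, v):
    """u: highest price the agent will pay; the wager returns u + v if P is true,
    nothing otherwise. Setting expected gain v*prob(P) - u*prob(not-P) = 0 gives
    prob(P) = u / (u + v)."""
    return u / (u + v)

p = degree_of_belief(30, 70)
print(p)                      # 0.3
# Check that the expected gain at this degree of belief is indeed zero:
print(70 * p - 30 * (1 - p))  # 0.0 (up to floating-point rounding)
```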
I think that this device really does provide evidence that we have, or can produce,
degrees of belief, in at least some propositions, but at the same time it is evident that
betting odds are not an unobjectionable device for the measurement of degrees of
belief. Betting odds could fail to measure degrees of belief for a variety of reasons:
the subject may not believe that the bet will be paid off if he wins, or he may doubt
that it is clear what constitutes winning, even though it is clear what constitutes
losing. Things he values other than monetary gain (or whatever) may enter into his
determination of the expected utility of purchasing the bet: for example, he may
place either a positive or a negative value on risk itself. And the very fact that he is
offered a wager on P may somehow change his degree of belief in P.
Let us suppose, then, that we do have degrees of belief in at least some
propositions, and that in some cases they can be at least approximately measured
on an interval from 0 to 1. There are two questions: why should we think that, for
rationality, one’s degrees of belief must satisfy the axioms of probability, and why
should we think that, again for rationality, changes in degrees of belief ought to
proceed by conditionalization? One question at a time. In using betting quotients
to measure degrees of belief, it was assumed that the subject would act so as to
maximize expected gain. The betting quotient determined the degree of belief by
determining the coefficient by which the gain is multiplied in case that P is true in
the expression for the expected gain. So the betting quotient determines a degree of
belief, as it were, in the role of a probability. But why should the things, degrees
of belief, that play this role be probabilities? Supposing that we do choose those
actions that maximize the sum of the product of our degrees of belief in each
possible outcome of the action and the gain (or loss) to us of that outcome. Why
must the degrees of belief that enter into this sum be probabilities? Again, there is an
ingenious argument: if one acts so as to maximize his expected gain using a degree-
of-belief function that is not a probability function, and if for every proposition
there were a possible wager (which, if it is offered, one believes will be paid off
if it is accepted and won), then there is a circumstance, a combination of wagers,
that one would enter into if they were offered, and in which one would suffer a net
loss whatever the outcome. That is what the Dutch-book argument shows; what it
counsels is prudence.
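A concrete instance may help fix what the Dutch-book argument trades on. In the following sketch (the numbers are illustrative, not from the text), an agent whose degrees of belief in A and in not-A sum to more than 1 regards a price equal to her degree of belief as fair for a $1 bet on the corresponding proposition; buying both bets at those prices yields a sure loss.

```python
# Sketch (illustrative numbers): a Dutch book against an agent whose degrees of
# belief in A and in not-A violate additivity. She treats a price equal to her
# degree of belief as fair for a bet paying $1 on that proposition.
belief_A, belief_not_A = 0.4, 0.7        # incoherent: 0.4 + 0.7 > 1

price_paid = belief_A + belief_not_A     # she buys both bets at her "fair" prices
for A_is_true in (True, False):
    payoff = 1.0                         # exactly one of the two bets pays $1
    print(f"A is {A_is_true}: net gain = {payoff - price_paid:+.2f}")
# Both lines print -0.10: whatever happens, she loses $0.10.
```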
Some of the reasons why it is not clear that betting quotients are accurate
measures of degrees of belief are also reasons why the Dutch-book argument is
not conclusive: there are many cases of propositions in which we may have degrees
of belief, but on which, we may be sure, no acceptable wager will be offered us;
2 More detailed accounts of means for determining degrees of belief may be found in Jeffrey (1965).
It is a curious fact that the procedures that Bayesians use for determining subjective degrees of
belief empirically are an instance of the general strategy described in Glymour 1981, ch. 5. Indeed,
the strategy typically used to determine whether or not actual people behave as rational Bayesians
involves the bootstrap strategy described in that chapter.
again, we may have values other than the value we place on the stakes, and these
other values may enter into our determination whether or not to gamble; and we
may not have adopted the policy of acting so as to maximize our expected gain or
our expected utility: that is, we may save ourselves from having book made against
us by refusing to make certain wagers, or combinations of wagers, even though we
judge the odds to be in our favour.
The Dutch-book argument does not succeed in showing that in order to avoid
absurd commitments, or even the possibility of such commitments, one must have
degrees of belief that are probabilities. But it does provide a kind of justification
for the personalist viewpoint, for it shows that if one’s degrees of belief are
probabilities, then a certain kind of absurdity is avoided. There are other ways of
avoiding that kind of absurdity, but at least the personalist way is one such.3
One of the common objections to Bayesian theory is that it fails to provide any
connection between what is inferred and what is the case. The Bayesian reply is
that the method guarantees that, in the long run, everyone will agree on the truth.
Suppose that Bi are a set of mutually exclusive, jointly exhaustive hypotheses, each
with probability B(i). Let xr be a sequence of random variables with a finite set
of values and conditional distribution given by P(xr = xr | Bi) = ε(xr | Bi); then
we can think of the values xr as the outcomes of experiments, each hypothesis
determining a likelihood for each outcome. Suppose that no two hypotheses have
the same likelihood distribution; that is, for i ≠ j it is not the case that, for all values
of xr, ε(xr | Bi) = ε(xr | Bj), where the ε's are defined as above. Let x denote the first
n of these variables, where x is a value of x. Now imagine an observation of these n
random variables. In Savage's words:

Before the observation, the probability that the probability given x of whichever
element of the partition actually obtains will be greater than α is

Σi B(i) · P(P(Bi | x) > α | Bi),

where summation is confined to those i's for which B(i) ≠ 0. (1972: 49)
In the limit as n approaches infinity, the probability that the probability given x
of whichever element of the partition actually obtains is greater than α is 1. That is
the theorem. What is its significance? According to Savage, ‘With the observation
of an abundance of relevant data, the person is almost certain to become highly
convinced of the truth, and it has also been shown that he himself knows this to
be the case’ (p. 50). That is a little misleading. The result involves second-order
probabilities, but these too, according to personalists, are degrees of belief. So what
has been shown seems to be this: in the limit as n approaches infinity, an ideally
rational Bayesian has degree of belief 1 that an ideally rational Bayesian (with
degrees of belief as in the theorem) has degree of belief, given x, greater than α
3 For further criticisms of the Dutch-book argument see Kyburg 1978.
in whichever element of the partition actually obtains. The theorem does not tell us
that in the limit any rational Bayesian will assign probability 1 to the true hypothesis
and probability 0 to the rest; it only tells us that rational Bayesians are certain that
he will. It may reassure those who are already Bayesians, but it is hardly grounds
for conversion. Even the reassurance is slim. Mary Hesse points out (1974: 117–19),
entirely correctly I believe, that the assumptions of the theorem do not seem to apply
even approximately in actual scientific contexts. Finally, some of the assumptions of
stable estimation theorems can be dispensed with if one assumes instead that all of
the initial distributions considered must agree regarding which evidence is relevant
to which hypotheses. But there is no evident a priori reason why there should be
such agreement.
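The washing-out behaviour that Savage's theorem formalizes can be seen in a toy simulation (a sketch of the phenomenon only, with two simple coin-bias hypotheses standing in for Savage's general partition; it is not a proof and does not touch Hesse's worry about the theorem's assumptions):

```python
# Sketch (illustrative only): two hypotheses about a coin's bias; data are generated
# from the true one, and the posterior of the true hypothesis tends toward 1.
import random

random.seed(0)
bias = {"B1": 0.5, "B2": 0.7}     # likelihood of heads under each hypothesis
true_hypothesis = "B2"

posterior = {"B1": 0.5, "B2": 0.5}
for n in range(1, 1001):
    heads = random.random() < bias[true_hypothesis]
    # Bayes's rule: multiply by the likelihood of the observed outcome, renormalize
    for h in posterior:
        posterior[h] *= bias[h] if heads else (1 - bias[h])
    total = sum(posterior.values())
    for h in posterior:
        posterior[h] /= total
    if n in (10, 100, 1000):
        print(n, round(posterior["B2"], 4))
# Typically the posterior of B2 is close to 1 after a few hundred tosses.
```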
I think relatively few Bayesians are actually persuaded of the correctness of
Bayesian doctrine by Dutch-book arguments, stable estimation theorems, or other a
priori arguments. Their frailty is too palpable. I think that the appeal of Bayesian
doctrine derives from two other features. First, with only very weak or very
natural assumptions about prior probabilities, or none at all, the Bayesian scheme
generates principles that seem to accord well with common sense. Thus, with minor
restrictions, one obtains the principle that hypotheses are confirmed by positive
instances of them; and, again, one obtains the result that if an event that actually
occurs is, on some hypothesis, very unlikely to occur, then that occurrence renders
the hypothesis less likely than it would otherwise have been. These principles, and
others, can claim something like the authority of common sense, and Bayesian
doctrine provides a systematic explication of them. Second, the restrictions placed
a priori on rational degrees of belief are so mild, and the device of probability
theory at once so precise and so flexible, that Bayesian philosophers of science may
reasonably hope to explain the subtleties and vagaries of scientific reasoning and
inference by applying their scheme together with plausible assumptions about the
distribution of degrees of belief. This seems, for instance, to be Professor Hesse’s
line of argument. After admitting the insufficiency of the standard arguments for
Bayesianism, she sets out to show that the view can account for a host of alleged
features of scientific reasoning and inference. My own view is different: particular
inferences can almost always be brought into accord with the Bayesian scheme
by assigning degrees of belief more or less ad hoc, but we learn nothing from
this agreement. What we want is an explanation of scientific argument; what the
Bayesians give us is a theory of learning—indeed, a theory of personal learning.
But arguments are more or less impersonal; I make an argument to persuade
anyone informed of the premisses, and in doing so I am not reporting any bit of
autobiography. To ascribe to me degrees of belief that make my slide from my
premisses to my conclusion a plausible one fails to explain anything, not only
because the ascription may be arbitrary, but also because, even if it is a correct
assignment of my degrees of belief, it does not explain why what I am doing is
arguing—why, that is, what I say should have the least influence on others, or why
I might hope that it should. Now, Bayesians might bridge the gap between personal
inference and argument in either of two ways. In the first place, one might give
arguments in order to change others’ beliefs because of the respect they have for his
opinion. This is not very plausible; if that were the point of giving arguments, one
would not bother with them, but would simply state one’s opinion. Alternatively,
and more hopefully, Bayesians may suggest that we give arguments exactly because
there are general principles restricting belief, principles that are widely subscribed
to, and in giving arguments we are attempting to show that, supposing our audience
has certain beliefs, they must in view of these principles have other beliefs, those
we are trying to establish. There is nothing controversial about this suggestion, and I
endorse it. What is controversial is that the general principles required for argument
can best be understood as conditions restricting prior probabilities in a Bayesian
framework. Sometimes they can, perhaps; but I think that when arguments turn on
relating evidence to theory, it is very difficult to explicate them in a plausible way
within the Bayesian framework. At any rate, it is worth seeing in more detail what
the difficulties may be.
There is very little Bayesian literature about the hotchpotch of claims and notions
that are usually canonized as scientific method; very little seems to have been
written, from a Bayesian point of view, about what makes a hypothesis ad hoc, about
what makes one body of evidence more various than another body of evidence, and
why we should prefer a variety of evidence, about why, in some circumstances,
we should prefer simpler theories, and what it is that we are preferring when we
do. And so on. There is little to nothing of this in Carnap, and more recent, and
more personalist, statements of the Bayesian position are almost as disappointing.
In a lengthy discussion of what he calls ‘tempered personalism’, Abner Shimony
(1970) discusses only how his version of Bayesianism generalizes and qualifies
hypothetico-deductive arguments. (Shimony does discuss simplicity, but only to
argue that it is overvalued.) Mary Hesse devotes the later chapters of her book to
an attempt to show that certain features of scientific method do result when the
Bayesian scheme is supplemented with a postulate that restricts assignments of prior
probabilities. Unfortunately, as we shall see, her restrictive principle is incoherent.4
One aspect of the demand for a variety of evidence arises when there is some
definite set of alternative hypotheses between which we are trying to decide. In
such cases we naturally prefer the body of evidence that will be most helpful in
eliminating false competitors. This aspect of variety is an easy and natural one
for Bayesians to take account of, and within an account such as Shimony’s it is
taken care of so directly as hardly to require comment. But there is more to variety.
In some situations we have some reason to suspect that if a theory is false, its
falsity will show up when evidence of certain kinds is obtained and compared. For
example, given the tradition of Aristotelian distinctions, there was some reason to
demand both terrestrial and celestial evidence for seventeenth-century theories of
motion that subjected all matter to the same dynamical laws. Once again, I see no
special reason why this kind of demand for a variety of evidence cannot be fitted
into the Bayesian scheme. But there is still more. A complex theory may contain
4 Moreover, I believe that much of her discussion of methodological principles has only the loosest
relation to Bayesian principles.
in the equation relating the measured quantities. Roughly, if one’s equation fits
the data too well, then the equation has too many terms and too many arbitrary
parameters; and if the equation does not fit the data well enough, then one has not
included enough terms and parameters in the equation. The whole business depends,
of course, entirely on the ordering of prior probabilities. In his Theory of Probability
Jeffreys (1967) proposed that the prior probability of a hypothesis decreases as the
number of arbitrary parameters increases, but hypotheses having the same number
of arbitrary parameters have the same prior probability. This leads immediately to
the conclusion that the prior probability of every hypothesis is zero. Earlier, Jeffreys
proposed a slightly more complex assignment of priors that did not suffer from this
difficulty. The problem is not really one of finding a way to assign finite probabilities
to an infinite number of incompatible hypotheses, for there are plenty of ways to do
that. The trouble is that it is just very implausible that scientists typically have their
prior degrees of belief distributed according to any plausible simplicity ordering,
and still less plausible that they would be rational to do so. I can think of very few
simple relations between experimentally determined quantities that have withstood
continued investigation, and often simple relations are replaced by relations that
are infinitely complex: consider the fate of Kepler’s laws. Surely it would be naïve
for anyone to suppose that a set of newly measured quantities will truly stand in a
simple relation, especially in the absence of a well-confirmed theory of the matter.
Jeffreys’ strategy requires that we proceed in ignorance of our scientific experience,
and that can hardly be a rational requirement (Jeffreys 1973).
Consider another Bayesian attempt, this one due to Mary Hesse. Hesse puts a
‘clustering’ constraint on prior probabilities: for any positive r, the conjunction
of r + 1 positive instances of a hypothesis is more probable than a conjunction
of r positive instances with one negative instance. This postulate, she claims, will
lead us to choose, ceteris paribus, the most economical, the simplest, hypotheses
compatible with the evidence. Here is the argument:
Consider first evidence consisting of individuals a1, a2, ..., an, all of which
have properties P and Q. Now consider an individual an+1 with property P. Does
an+1 have Q or not? If nothing else is known, the clustering postulate will direct
us to predict Qan+1 since, ceteris paribus, the universe is to be postulated to be
as homogeneous as possible consistently with the data ... But this is also the
prediction that would be made by taking the most economical general law which
is both confirmed by the data and of sufficient content to make a prediction about
the application of Q to an+1. For h = 'All P are Q' is certainly more economical
than the 'gruified' conflicting hypothesis of equal content h′: 'All x up to an that are
P are Q, and all other x that are P are not Q.'
It follows in the [case] considered that if a rule is adopted to choose the prediction
resulting from the most probable hypothesis on grounds of content, or, in case of a
tie in content, the most economical hypothesis on those of equal content, this rule
will yield the same predictions as the clustering postulate.
Here is the argument applied to curve-fitting:
Let f be the assertion that two data points (x1, y1), (x2, y2) are obtained from
experiments ... The two points are consistent with the hypothesis y = a + bx,
and also of course with an indefinite number of other hypotheses of the form
y = a0 + a1x + ... + anx^n, where the values of a0, ..., an are not determined
by (x1, y1), (x2, y2). What is the most economical prediction of the y-value of a
further point g, where the x-value of g is x3? Clearly it is the prediction which uses
only the information already contained in f, that is, the calculable values of a, b,
rather than a prediction which assigns arbitrary values to the parameters of a higher-
order hypothesis. Hence the most economical prediction is about the point g = (x3,
a + bx3), which is also the prediction given by the 'simplest' hypothesis on almost
all accounts of the simplicity of curves. Translated into probabilistic language, this
is to say that to conform to intuitions about economy we should assign higher initial
probability to the assertion that points (x1, a + bx1), (x2, a + bx2), (x3, a + bx3) are
satisfied by the experiment than to that in which the third point is inexpressible in
terms of a and b alone. In this formulation economy is a function of finite descriptive
lists of points rather than general hypotheses, and the relevant initial probability is
that of a universe containing these particular points rather than that of a universe in
which the corresponding general law is true ... Description in terms of a minimum
number of parameters may therefore be regarded as another aspect of homogeneity
or clustering of the universe. (Hesse 1974: 230–2)
Hesse’s clustering postulate applies directly to the curve-fitting case, for her
clustering postulate then requires that if two paired values of x and y satisfy the
predicate y = ax + b, then it is more probable than not that a third pair of values will
satisfy the predicate. So the preference for the linear hypothesis in the next instance
results from Hesse's clustering postulate and the probability axioms. Unfortunately,
with trivial additional assumptions, everything results. For, surely, if y = a + bx is a
legitimate predicate, then so is y = a1 + b1x^2, for any definite values of a1 and b1.
Now Hesse's first two data points can be equally well described by (x1, a1 + b1x1^2)
and (x2, a1 + b1x2^2), where

b1 = (y1 − y2) / (x1^2 − x2^2),   a1 = y1 − x1^2 · (y1 − y2) / (x1^2 − x2^2).

Hence her first two data points satisfy both the predicate y = a + bx and the
predicate y = a1 + b1x^2. So, by the clustering postulate, the probability that the
third point satisfies the quadratic expression must be greater than one-half, and the
probability that the third point satisfies the linear expression must also be greater
than one-half, which is impossible.
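The arithmetic of this counterexample is easily verified (a sketch using the coefficients just given; the particular data points are arbitrary): both predicates pass exactly through the two given points yet yield conflicting predictions for a third.

```python
# Sketch: fit both y = a + b*x and y = a1 + b1*x**2 through the same two data
# points (arbitrary illustrative values) and compare predictions at a third x.
x1, y1 = 1.0, 2.0
x2, y2 = 2.0, 5.0

# linear fit through the two points
b = (y1 - y2) / (x1 - x2)
a = y1 - b * x1

# quadratic-in-x**2 fit, using the coefficients given in the text
b1 = (y1 - y2) / (x1**2 - x2**2)
a1 = y1 - b1 * x1**2

for (x, y) in [(x1, y1), (x2, y2)]:
    assert abs((a + b * x) - y) < 1e-9       # the linear predicate fits exactly
    assert abs((a1 + b1 * x**2) - y) < 1e-9  # and so does the quadratic one

x3 = 3.0
print("linear prediction:   ", a + b * x3)       # 8.0
print("quadratic prediction:", a1 + b1 * x3**2)  # 10.0 -- a conflicting prediction
```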
Another Bayesian account of our preference for simple theories has recently
been offered by Roger Rosencrantz (1976). Suppose that we have some criterion
for ‘goodness of fit’ of a hypothesis to data—for example, confidence regions
based on the χ² distribution for categorical data, or in curve-fitting perhaps that the
average sum of squared deviations is less than some figure. Where the number of
possible outcomes is finite, we can compare the number of such possible outcomes
that meet the goodness-of-fit criterion with the number that do not. This ratio
Rosencrantz calls the ‘observed sample coverage’ of the hypothesis. Where the
possible outcomes are infinite, if the region of possible outcomes meeting the
The first difficulty is a familiar one. Let us suppose that we can divide the
consequences of a theory into sentences consisting of reports of actual or possible
observations, and simple generalizations of such observations, on the one hand;
and on the other hand, sentences that are theoretical. Then the collection of
‘observational’ consequences of the theory will always be at least as probable as
the theory itself; generally, the theory will be less probable than its observational
consequences. A theory is never any better established than is the collection of
its observational consequences. Why, then, should we entertain theories at all?
On the probabilist view, it seems, they are a gratuitous risk. The natural answer
is that theories have some special function that their collection of observational
consequences cannot serve; the function most frequently suggested is explanation—
theories explain; their collection of observational consequences do not. But however
sage this suggestion may be, it only makes more vivid the difficulty of the Bayesian
way of seeing things. For whatever explanatory power may be, we should certainly
expect that goodness of explanation will go hand in hand with warrant for belief; yet,
if theories explain, and their observational consequences do not, the Bayesian must
deny the linkage. The difficulty has to do both with the assumption that rational
degrees of belief are generated by probability measures and with the Bayesian
account of evidential relevance. Making degrees of belief probability measures in
the Bayesian way already guarantees that a theory can be no more credible than
any collection of its consequences. The Bayesian account of confirmation makes it
impossible for a piece of evidence to give us more total credence in a theory than in
its observational consequences. The Bayesian way of setting things up is a natural
one, but it is not inevitable, and wherever a distinction between theory and evidence
is plausible, it leads to trouble.
A second difficulty has to do with how praise and blame are distributed among
the hypotheses of a theory. Recall the case of Kepler’s laws (discussed in Glymour
1981, ch. 2). It seems that observations of a single planet (and, of course, the
sun) might provide evidence for or against Kepler’s first law (all planets move on
ellipses) and for or against Kepler’s second law (all planets move according to the
area rule), but no observations of a single planet would constitute evidence for or
against Kepler’s third law (for any two planets, the ratio of their periods equals the
3 power of the ratio of their distances). Earlier [in Ch. 2 of Glymour’s Theory
2
and Evidence] we saw that hypothetico-deductive accounts of confirmation have
great difficulty explaining this elementary judgement. Can the Bayesians do any
better? One thing that Bayesians can say (and some have said) is that our degrees of
belief are distributed—and historically were distributed—so that conditionalizing
on evidence about one planet may change our degrees of belief in the first and
second laws, but not our degree of belief in the third law.5 I don’t see that this is
an explanation for our intuition at all; on the contrary, it seems merely to restate
(with some additional claims) what it is that we want to be explained. Are there any
reasons why people had their degrees of belief so distributed? If their beliefs had
5 This is the account suggested by Horwich (1978).
been different, would it have been equally rational for them to view observations
of Mars as a test of the third law, but not of the first? It seems to me that we never
succeed in explaining a widely shared judgement about the relevance or irrelevance
of some piece of evidence merely by asserting that degrees of belief happened to be
so distributed as to generate those judgements according to the Bayesian scheme.
Bayesians may instead try to explain the case by appeal to some structural difference
among the hypotheses; the only gadget that appears to be available is the likelihood
of the evidence about a single planet on various combinations of hypotheses. If it
is supposed that the observations are such that Kepler’s first and second laws entail
their description, but Kepler’s third law does not, then it follows that the likelihood
of the evidence on the first and second laws—that is, the conditional probability of
the evidence given those hypotheses—is unity, but the likelihood of the evidence on
the third law may be less than unity. But any attempt to found an account of the case
on these facts alone is simply an attempt at a hypothetico-deductive account. The
problem is reduced to one already unsolved. What is needed to provide a genuine
Bayesian explanation of the case in question (as well as of many others that could
be adduced) is a general principle restricting conditional probabilities and having
the effect that the distinctions about the bearing of evidence that have been noted
here do result. Presumably, any such principles will have to make use of relations
of content or structure between evidence and hypothesis. The case does nothing to
establish that no such principles exist; it does, I believe, make it plain that without
them the Bayesian scheme does not explain even very elementary features of the
bearing of evidence on theory.
A third difficulty has to do with Bayesian kinematics. Scientists commonly argue
for their theories from evidence known long before the theories were introduced.
Copernicus argued for his theory using observations made over the course of
millennia, not on the basis of any startling new predictions derived from the theory,
and presumably it was on the basis of such arguments that he won the adherence of
his early disciples. Newton argued for universal gravitation using Kepler’s second
and third laws, established before the Principia was published. The argument that
Einstein gave in 1915 for his gravitational field equations was that they explained
the anomalous advance of the perihelion of Mercury, established more than half a
century earlier. Other physicists found the argument enormously forceful, and it is a
fair conjecture that without it the British would not have mounted the famous eclipse
expedition of 1919. Old evidence can in fact confirm new theory, but according to
Bayesian kinematics, it cannot. For let us suppose that evidence e is known before
theory T is introduced at time t. Because e is known at t, probt (e) D 1. Further,
because probt (e) D 1, the likelihood of e given T, probt (e, T), is also 1. We then
have
6 All of the defences sketched below were suggested to me by one or another philosopher
sympathetic to the Bayesian view; I have not attributed the arguments to anyone for fear of
misrepresenting them. None the less, I thank Jon Dorling, Paul Teller, Daniel Garber, Ian
Hacking, Patrick Suppes, Richard Jeffrey, and Roger Rosencrantz for valuable discussions and
correspondence on the point at issue.
and we may try to use this formula to evaluate the counterfactual degree of belief
in e. The problem is with the last term. Of course, one could suggest that this term
just be ignored when evaluating P(e), but it is difficult to see within a Bayesian
framework any rationale at all for doing so. For if one does ignore this term, then the
collection of prior probabilities used to evaluate the posterior probability of T will
not be coherent unless either the likelihood of e on T is zero or the prior probability
of T is zero. One could remedy this objection by replacing the last term by
P(T) · P(e, T); but if P(T1 ∨ T2 ∨ ⋯ ∨ Tk ∨ T)
7 The actual history is still more complicated. Newcomb and Doolittle obtained values for the
anomaly differing by about 2 seconds of arc per century. Early in the 1920s, Grossmann discovered
that Newcomb had made an error in calculation of about that magnitude.
is not unity, then the set of prior degrees of belief will still be incoherent. Moreover,
not only will it be the case that if the actual degree of belief in e is replaced by
a counterfactual degree of belief in e according to either of these proposals, then
the resulting set of priors will be incoherent, it will further be the case that if we
conditionalize on e the resulting conditional probabilities will be incoherent. For
example, if we simply delete the last term, one readily calculates that

P(T1 ∨ ⋯ ∨ Tk, e) = P(T1 ∨ ⋯ ∨ Tk) · P(e, T1 ∨ ⋯ ∨ Tk) / [P(e, T1 ∨ ⋯ ∨ Tk) · P(T1 ∨ ⋯ ∨ Tk)] = 1,

and

P(T, e) = P(T) · P(e, T) / [P(e, T1 ∨ ⋯ ∨ Tk) · P(T1 ∨ ⋯ ∨ Tk)].
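A toy computation (illustrative numbers only, assuming, as the discussion presupposes, that the new theory T is incompatible with each of T1, ..., Tk) makes the incoherence explicit: with the last term deleted, the old disjunction receives conditional probability 1 on e while T also receives positive conditional probability on e, so the conditional probabilities of incompatible hypotheses sum to more than 1.

```python
# Sketch (illustrative numbers): deleting the catch-all term and conditionalizing on e.
# T is assumed incompatible with T1 v ... v Tk.
p_disj, lik_disj = 0.6, 0.5   # prior and likelihood of the old disjunction T1 v ... v Tk
p_T, lik_T = 0.2, 1.0         # prior and likelihood of the new theory T

p_e = p_disj * lik_disj       # "P(e)" computed with the last term deleted

post_disj = p_disj * lik_disj / p_e
post_T = p_T * lik_T / p_e

print(post_disj)              # 1.0
print(post_T)                 # about 0.67; 1.0 + 0.67 > 1, so the conditional
                              # probabilities of incompatible hypotheses are incoherent
```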
and let P′′ be your new actual degree of belief function. (Alternatively, P′′ can be
formed by using Jeffrey's rule a second time.)
There remain a number of objections to the historical proposal. It is not obvious
that there are, for each of us, degrees of belief we personally would have had in
some historical period. It is not at all clear which historical period is the relevant
one. Suppose, for example, that the gravitational deflection of sunlight had been
determined experimentally around 1900, well before the introduction of general
relativity.8 In trying to assess the confirmation of general relativity, how far back
in time should a twentieth-century physicist go under this supposition? If only to
the nineteenth, then if he would have shared the theoretical prejudices of the period,
gravitational deflection of light would have seemed quite probable. Where ought
he to stop, and why? But laying aside these difficulties, it is implausible indeed that
such a historical Bayesianism, however intriguing a proposal, is an accurate account
of the principles by which scientific judgements of confirmation are made. For if it
were, then we should have to condemn a great mass of scientific judgements on the
grounds that those making them had not studied the history of science with sufficient
closeness to make a judgement as to what their degrees of belief would have been
in relevant historical periods. Combined with the delicacy that is required to make
counterfactual degrees of belief fit coherently with actual ones, these considerations
make me doubt that we should look to counterfactual degrees of belief for a plausible
Bayesian account of how old evidence bears on new theory.
Finally, consider a quite different Bayesian response to the old evidence/new
theory problem. Whereas the ideal Bayesian agent is a perfect logician, none of us
are, and there are always consequences of our hypotheses that we do not know to
be consequences. In the situation in which old evidence is taken to confirm a new
theory, it may be argued that there is something new that is learned, and typically,
what is learned is that the old evidence is entailed by the new theory. Some old
anomalous result is lying about, and it is not this old result that confirms a new
theory, but rather the new discovery that the new theory entails (and thus explains)
the old anomaly. If we suppose that semi-rational agents have degrees of belief about
the entailment relations among sentences in their language, and that
P(h ⊢ e) = 1 implies P(e, h) = 1,
8 Around 1900 is fanciful, before general relativity is not. In 1914 E. Freundlich mounted an
expedition to Russia to photograph the eclipse of that year in order to determine the gravitational
deflection of starlight. At that time, Einstein had predicted an angular deflection for light passing
near the limb of the sun that was equal in value to that derived from Newtonian principles by
Soldner in 1801. Einstein did not obtain the field equations that imply a value for the deflection
equal to twice the Newtonian value until late in 1915. Freundlich was caught in Russia by the
outbreak of World War I, and was interned there. Measurement of the deflection had to wait until
1919.
then it can be argued that

P(h, b & e & (h ⊢ e)) > P(h, b & e).
Now, in a sense, I believe this solution to the old evidence/new theory problem to
be the correct one; what matters is the discovery of a certain logical or structural
connection between a piece of evidence and a piece of theory, and it is in virtue of
that connection that the evidence, if believed to be true, is thought to be evidence for
the bit of theory. What I do not believe is that the relation that matters is simply the
entailment relation between the theory, on the one hand, and the evidence, on the
other. The reasons that the relation cannot be simply that of entailment are exactly
the reasons why the hypothetico-deductive account (see Glymour 1981, ch. 2)
is inaccurate; but the suggestion is at least correct in sensing that our judgement
of the relevance of evidence to theory depends on the perception of a structural
connection between the two, and that degree of belief is, at best, epiphenomenal. In
the determination of the bearing of evidence on theory, there seem to be mechanisms
and stratagems that have no apparent connection with degrees of belief, which are
shared alike by people advocating different theories. Save for the most radical
innovations, scientists seem to be in close agreement regarding what would or
would not be evidence relevant to a novel theory; claims as to the relevance to
some hypothesis of some observation or experiment are frequently buttressed by
detailed calculations and arguments. All of these features of the determination
of evidential relevance suggest that that relation depends somehow on structural,
objective features connecting statements of evidence and statements of theory. But if
that is correct, what is really important and really interesting is what these structural
features may be. The condition of positive relevance, even if it were correct, would
simply be the least interesting part of what makes evidence relevant to theory.
None of these arguments is decisive against the Bayesian scheme of things, nor
should they be; for in important respects that scheme is undoubtedly correct. But
taken together, I think they do at least strongly suggest that there must be relations
between evidence and hypotheses that are important to scientific argument and to
confirmation but to which the Bayesian scheme has not yet penetrated.
References
Carnap, R. (1950). The logical foundations of probability. Chicago: University of Chicago Press.
Glymour, C. (1981). Theory and evidence. Chicago: University of Chicago Press.
Hesse, M. (1974). The structure of scientific inference. Berkeley: University of California Press.
Horwich, P. (1978). An appraisal of Glymour’s confirmation theory. Journal of Philosophy, 75,
98–113.
Jeffrey, R. (1965). The logic of decision. New York: McGraw-Hill.
Jeffreys, H. (1967). Theory of probability. Oxford: Clarendon.
Jeffreys, H. (1973). Scientific inference. Cambridge: Cambridge University Press.
Kyburg, H. (1978). Subjective probability: Criticisms, reflections and problems. Journal of
Philosophical Logic, 7, 157–180.
Chapter 9
Discussion: A Mistake in Dynamic Coherence Arguments?
Brian Skyrms
The person whose degrees of belief are being tested for coherence acts as a bookie.
She posts her fair prices for wagers corresponding to her degrees of belief. Her
degrees of belief are incoherent if a cunning bettor can make a Dutch book against
her with a finite system of wagers—that is, there is a finite set of wagers individually
perceived as fair, whose net payoff is a loss in every possible future. Otherwise
her degrees of belief are coherent. De Finetti ([1937] 1980) proved the following
theorem: Degrees of belief are coherent if and only if they are finitely additive
probabilities.
Obviously, if a Dutch book can be made with a finite number of fair transactions,
it can be made with a finite number of uniformly favorable transactions. The bettor
pays some small transaction premium ε to the bookie for each of the n transactions,
where nε is less than the guaranteed profit that the bettor gets under the Dutch book
based on fair prices. Let us bear in mind that this point applies equally well in what
follows.
Dynamic Coherence for Updating Rules

The epistemologist acts as bookie. Her updating rule is public knowledge. Today she
posts her fair prices, and does business. Tomorrow she makes an observation (with
a finite number of possible outcomes each of which has positive prior probability)
and updates her fair prices according to her updating rule. The updating rule is thus
a function from possible observations to revised fair prices. The day after tomorrow
she posts prices again, and does business. The pair consisting of (1) her fair
prices for today and (2) her updating function will be called the bookie’s epistemic
strategy.
The bookie’s epistemic strategy is coherent if there is no possible bettor’s
strategy which makes a Dutch book against her (the bettor's strategy being
a pair consisting of (1) a finite number of transactions today at the bookie’s
posted prices and (2) a function taking possible observations into a finite number
of transactions the day after tomorrow at the prices that the bookie will post
according to her epistemic strategy). Lewis (reported in Teller 1973) proves that the
epistemologist’s strategy is coherent only if her degrees of belief today are finitely
additive probabilities and her updating rule is Bayes’s rule of conditioning. The
“only if” can be strengthened to “if and only if” (see section “The converse”). (For
generalizations of this theorem see van Fraassen 1984 and Skyrms 1987, 1990.)
Notice that the relevant notions of coherence and incoherence here apply not just
to the pair of degrees of belief for today and the day after tomorrow, but rather to an
epistemic strategy, which is a more complicated object. A focus on the former notion
leads understandably to skepticism regarding dynamic coherence, as in Hacking
(1967), Kyburg (1978), and Christensen (1991).
The Dynamic Dutch Book

Coherence of degrees of belief today is the static case. It remains to show that for
any non-Bayes updating rule, there is a bettor’s strategy which makes a Dutch book.
Let the conditional probability of A on e, that is Pr(A & e)/Pr(e), be symbolized as
usual, as Pr(A|e), and let the probability that the updating rule gives A if e is observed
be Pre(A). If the predictor's rule disagrees with conditioning, then for some possible
evidential result e and some A, Pre(A) is not Pr(A|e). Suppose that Pr(A|e) > Pre(A).
(The other case is similar.) Let the discrepancy be δ = Pr(A|e) − Pre(A). Here is a
bettor’s strategy which makes a Dutch book:
TODAY: Offer to sell the bookie at her fair price:
1: [$1 if A & e, 0 otherwise]
2: [$Pr(A|e) if not-e, 0 otherwise]
3: [$δ if e, 0 otherwise]
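The guaranteed loss can be checked numerically. The extract above breaks off before the day-after-tomorrow leg of the bettor's strategy, so the sketch below fills that leg in from the standard construction that the Fig. 9.2 discussion later presupposes (if e is observed, the bettor buys [$1 if A] from the bookie at her revised fair price Pre(A) = Pr(A|e) − δ); the numbers are illustrative.

```python
# Sketch (illustrative numbers; the day-after-tomorrow transaction is filled in from
# the standard construction): the bookie's net payoff is -delta * Pr(e) whatever
# happens, for any rule with Pre(A) = Pr(A|e) - delta.
pr_e, pr_A_given_e, delta = 0.5, 0.8, 0.1
pr_A_and_e = pr_A_given_e * pr_e
rule_value = pr_A_given_e - delta              # Pre(A), the rule's revised price

def bookie_net(e_occurs, A_occurs):
    # Today the bookie buys the three bets from the bettor at her fair prices.
    paid_today = pr_A_and_e + pr_A_given_e * (1 - pr_e) + delta * pr_e
    received = (1.0 if (A_occurs and e_occurs) else 0.0)       # [$1 if A & e]
    received += (pr_A_given_e if not e_occurs else 0.0)        # [$Pr(A|e) if not-e]
    received += (delta if e_occurs else 0.0)                   # [$delta if e]
    if e_occurs:
        # Day after tomorrow: bettor buys [$1 if A] from the bookie at Pre(A).
        received += rule_value
        received -= 1.0 if A_occurs else 0.0
    return received - paid_today

for e_occurs, A_occurs in [(True, True), (True, False), (False, True), (False, False)]:
    print(e_occurs, A_occurs, round(bookie_net(e_occurs, A_occurs), 10))
# Every line prints -0.05, i.e. -delta * Pr(e): a sure loss for the bookie.
```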
The Converse
If the bookie has the strategy of updating by Bayes’s rule of conditioning, then every
payoff that a bettor’s strategy can achieve can be achieved by betting only today (see
Skyrms 1987). This reduces our case to the static case. Thus, by de Finetti’s result,
if the epistemologist’s prior degrees of belief are finitely additive probabilities and
her updating rule is Bayes's rule of conditioning, then she is dynamically coherent.
Maher’s (1992b) objection is that the bookie will see it coming and refuse to bet.
This is made precise by modeling the bookie’s situation as a sequential choice
problem, as shown in Fig. 9.1. The bookie sees that if she bets today and e occurs,
then at decision node 2, she will find the cunning bettor’s offer fair according to
her revised probability, Pre (A). Thus she sees that betting today leads to a sure loss.
Since she prefers net gain of zero to a sure loss, she refuses to bet today—frustrating
the cunning bettor who goes home unable to execute his plan.
The first thing that must be said about “Maher’s objection” is that it is misleading
to represent it as showing a “mistake” in the dynamic coherence theorem. Under the
conditions of the theorem the bookie posts her fair prices for today and honors them.
There is no provision for changing one’s mind when approached by a cunning bettor
who discloses his strategy, nor indeed any mention of a requirement that the cunning
bettor disclose his strategy prior to the initial transaction. But Maher might be read
as suggesting a different conception of dynamic coherence in this setting:
The epistemologist acts as bookie. Her updating rule is public knowledge. Today she posts
her tentative fair prices, but in fact does business only with bettors who disclose their
strategies in advance, and does so on the basis of sequential decision analysis. Tomorrow
she makes an observation (with a finite number of possible outcomes each of which has
positive prior probability) and updates her probabilities according to her updating rule. The
day after tomorrow she posts prices again, and does business according to those prices.
She is coherent if there is no possible bettor’s strategy which makes a Dutch book
against her.
A natural reaction to Maher’s line might be to say that the redefinition unfairly
prejudices the case against dynamic coherence arguments. It is therefore of some
interest to see that the dynamic Dutch book still goes through under the revised
scenario.
There is a gratuitous assumption in the analysis presented in Fig. 9.1. Why is it
assumed that the cunning bettor will just go home if the bookie refuses to bet today?
The bettor’s strategy which I presented says otherwise. The bettor will make an offer
the day after tomorrow if e was observed. So the branch of the decision tree where
the bookie refuses transactions today cannot simply be assumed to have payoff of
zero, but requires further analysis. This is done in Fig. 9.2.
Note that the bookie knows that if e is observed, she will accept the offer the day
after tomorrow for the same reason on the lower path as on the upper. Deciding now
not to bet ever is not an option. If the offer the day after tomorrow is accepted but
the offer today was not and e and A both happen, then the net payoff is the price the
cunning bettor paid, $Pr(A|e) - δ, less the lost bet, $1, as shown. If e occurs but
A does not, the net payoff is just $Pr(A|e) - δ. For the bookie's current analysis of
this decision tree, to get the relevant expectation over A occurring or not we average
using as weights her current conditional probabilities, Pr(A|e) and Pr(not-A|e). Thus
the value at the node where the bookie refused to bet today and where e is observed
tomorrow is

Pr(A|e) · {[$Pr(A|e) - δ] - $1} + [1 - Pr(A|e)] · [$Pr(A|e) - δ] = -$δ.

Then the value at the node where the bookie refused to bet today is not 0 but rather
-$δ · Pr(e). This is just the same as the value at the node where the bookie agrees to
bet today.
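The arithmetic can be checked mechanically. Below is a minimal sketch with illustrative numbers (Pr(e) = 0.5, Pr(A|e) = 0.7, δ = 0.1); the day-after-tomorrow transaction, which falls partly outside the text quoted here, is assumed to be the bettor buying the claim [$1 if A] at the bookie's revised fair price Pre(A) = Pr(A|e) - δ.

# Illustrative check of the two branches of the bookie's decision tree.
# Assumed numbers; only the transaction list is taken from the surrounding text.
p_e = 0.5            # bookie's Pr(e) today
p_A_given_e = 0.7    # bookie's Pr(A|e) today
delta = 0.1          # discrepancy, so the rule gives Pr_e(A) = Pr(A|e) - delta = 0.6
pr_e_A = p_A_given_e - delta

def upper_branch(e):
    """Bookie's net payoff if she buys bets 1-3 today and, given e, sells [$1 if A] back at Pr_e(A)."""
    paid = p_A_given_e * p_e + p_A_given_e * (1 - p_e) + delta * p_e   # fair prices of bets 1-3
    if e:
        received = delta + pr_e_A     # bet 3 pays delta; the A-claim is sold back for Pr_e(A)
    else:
        received = p_A_given_e        # bet 2 pays Pr(A|e); bets 1 and 3 pay nothing
    return received - paid            # independent of A either way

def lower_branch_value():
    """Bookie's current expected value of refusing today but, if e, selling [$1 if A] at Pr_e(A)."""
    value_if_e = p_A_given_e * (pr_e_A - 1.0) + (1 - p_A_given_e) * pr_e_A   # = -delta
    return p_e * value_if_e                                                  # = -delta * Pr(e)

print(round(upper_branch(True), 6), round(upper_branch(False), 6), round(lower_branch_value(), 6))
# All three come to -0.05 = -delta * Pr(e): the same sure loss on either branch.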
In fact, if we consider the version of the Dutch-book strategy where the bettor
adds an ε premium for each transaction, the upper branch involves four transactions
and the lower branch involves only one, so the upper branch has a higher payoff
than the lower branch. Even though the bookie sees it coming, she will prefer the
sure loss of the upper branch because doing so looks strictly better to her than the
alternative.
Why did the cunning bettor adopt a strategy of staying around if the bookie decided
not to bet today? The official answer in sections “Dynamic coherence for updating
rules” and “The dynamic Dutch book” is “Don’t ask”. Any bettor’s strategy which
makes a Dutch book will prove incoherence. But, as Levi (1991) points out, that sort
of analysis proceeds in strategic normal form rather than in extensive form. Might
it be that the cunning bettor’s strategy described would have to be sequentially
irrational? That is to say, might it not be that staying around and betting the day
after tomorrow if the bookie decided not to bet today would not maximize expected
utility for the cunning bettor in the belief state he would have in that case the day
after tomorrow? If this could be shown, then the cunning bettor’s strategy that I
have described would have to rest on a noncredible threat, and the significance of
the analysis of the previous section would be called into question. (For discussion
of such noncredible threats in extensive form games and of sequential rationality,
see Selten 1975 and Kreps and Wilson 1982.)
But such is not the case. Suppose that the bettor is a Bayesian; that he starts
out with exactly the same degrees of belief as the bookie; and that he updates by
conditioning. If e is observed tomorrow—whether or not the bookie accepted the
bet today—he conditions on e and the day after tomorrow his fair price for b4 is
$Pr(A|e). But his strategy only commits him to offering to pay the bookie's fair price,
$Pr(A|e) - δ, to buy back b4, for what he perceives as a net gain in expected utility
of $δ. This bettor's threat to stick around and bet the day after tomorrow if e, even if
the bookie declines to bet today, is perfectly credible and consistent with sequential
rationality. If he is called upon to carry out the threat, he maximizes expected utility
by doing so.
Strategic Rationality
Let us explicitly model the bookie’s choice of an updating strategy. The bookie and
the bettor start out with identical priors. The bettor updates by conditioning. First
the bookie chooses an updating strategy. Then the bettor bets, the evidence comes
in, the bookie updates according to her updating rule, and the bettor bets again. The
bookie’s initial strategy is either to choose updating by conditioning or not.
If the bookie chooses the strategy of updating by conditioning, then the fair
prices of the bookie and bettor agree at all times. Thus either no transactions are
made, or any transactions have net change in expected utility of 0 for both players.
The bookie’s expected utility of choosing the strategy of updating by conditioning
is zero. If, however, the bookie chooses an updating strategy at variance with
conditioning then, for the bettor, the expected utility of betting is greater than that of
not betting (section “Sequential analysis 3: what makes the cunning bettor tick?”)
and the net expected utility for the bookie is negative (section “Sequential analysis
2: a mistake in the mistake?”). At the first choice point the bookie is strictly better
off by choosing the rule of updating by conditioning.
Thus the strategy combination in which the bookie updates by conditioning and
the bettor does not bet at all is an equilibrium in the sense that no player will
perceive it in his or her interest at any decision node to deviate from that strategy.
But no strategy combination in which the bookie chooses a strategy at variance with
conditioning is such an equilibrium.
Two ways of strengthening the requirements for a dynamic Dutch book were
suggested by the discussions of Levi (1987) and Maher: (1) We require the cunning
bettor to disclose his strategy, and allow the bookie to use knowledge of that
strategy in a sequential analysis when deciding whether to bet today or not, and
(2) we require that the cunning bettor’s strategy itself be sequentially rational. The
somewhat surprising result is that the additional restrictions made no difference.
The bookie whose epistemic strategy is at odds with conditioning is also subject
to a Dutch book in this stronger sense. “Seeing it coming” does not help. It is at
the very least a noteworthy property of the rule of conditioning that in this sort of
epistemic situation, it alone is immune from a Dutch book under either the original
or strengthened requirements.
Many of the concerns of Levi and Maher have not been addressed in the foregoing.
Levi is concerned to resist the doctrine of “confirmational tenacity”, according to
which the only legitimate way in which to update is by conditioning. Maher wishes
to resist the doctrine that rationality requires dynamic coherence at all costs. Does
the foregoing show that conditioning is the only coherent way to ever update one’s
probabilities? Does it show that rationality requires coherence at all costs?
I agree with Levi and Maher in answering “no” to both questions. With regard to
the first, let me emphasize that the Lewis proof takes place within the structure of a
very special epistemic model. In that context it shows that the rule of conditioning
is the unique dynamically coherent updating rule. It does not show that one must
have an updating rule. It does not apply to other epistemic situations which should
be modeled differently. The modeling of a variety of epistemic situations and the
investigation of varieties of dynamic coherence in such situations is an ongoing
enterprise (in which I take it that both Levi and I are engaged; see Skyrms 1990 for
further discussion).
Maher is concerned that an uncritical doctrine of “dynamic coherence at all
costs” could lead one to crazy belief changes and disastrous actions. Should Ulysses
have changed to 1 his prior probability of safe sailing conditional on hearing
the Sirens’ song so that subsequently his belief change would be in accordance
with the rule of conditioning? Nothing in the foregoing implies that he should.
In the first place, there is something a little odd in thinking that one achieves
dynamic coherence by changing the original prior pr1 to the revised prior pr2 so
that the change to pr3 will agree with conditioning. What about the change from
pr1 to pr2 ? But, more fundamentally, I would agree with Maher that rationality
definitely does not require coherence at all costs. Where costs occur they need to
be weighed against benefits. There are lucid discussions of this matter in Maher
(1992a, b). These things said, it remains that in the Lewis epistemic model under
the “sequentialized” notion of dynamic coherence, the unique coherent updating
rule is the rule of conditioning.
Acknowledgement I would like to thank Brad Armendt, Ellery Eells, Isaac Levi, Patrick Maher
and an anonymous referee for helpful comments on an earlier draft of this note. I believe that
Maher, Levi and I are now in substantial agreement on the issues discussed here, although
differences in emphasis and terminology may remain.
References
Christensen, D. (1991). Clever bookies and coherent beliefs. The Philosophical Review, 100, 229–
247.
de Finetti, B. ([1937] 1980). Foresight: Its logical laws, its subjective sources, translated in H. E.
Kyburg, Jr., & H. Smokler (Eds.), Studies in subjective probability (pp. 93–158). (Originally
published as “La Prevision: ses lois logiques, ses sources subjectives”, Annales de l’Institut
Henri Poincaré, 7, 1–68.) Huntington: Krieger.
Hacking, I. (1967). Slightly more realistic personal probability. Philosophy of Science, 34, 311–
325.
Kreps, D., & Wilson, R. (1982). Sequential equilibria. Econometrica, 50, 863–894.
Kyburg, H. (1978). Subjective probability: Criticisms, reflections and problems. The Journal of
Philosophical Logic, 7, 157–180.
Levi, I. (1987). The demons of decision. The Monist, 70, 193–211.
Levi, I. (1991). Consequentialism and sequential choice. In M. Bacharach & S. Hurley (Eds.),
Foundations of decision theory (pp. 92–146). Oxford: Basil Blackwell.
Maher, P. (1992a). Betting on theories. Cambridge: Cambridge University Press.
Maher, P. (1992b). Diachronic rationality. Philosophy of Science, 59, 120–141.
Selten, R. (1975). Reexamination of the perfectness concept of equilibrium in extensive form
games. International Journal of Game Theory, 4, 25–55.
Skyrms, B. (1987). Dynamic coherence and probability kinematics. Philosophy of Science, 54,
1–20.
Skyrms, B. (1990). The dynamics of rational deliberation. Cambridge: Harvard University Press.
Teller, P. (1973). Conditionalization and observation. Synthese, 26, 218–258.
van Fraassen, B. (1984). Belief and the will. Journal of Philosophy, 81, 235–256.
Chapter 10
Some Problems for Conditionalization
and Reflection
Frank Arntzenius
I will present five puzzles which show that rational people can update their degrees
of belief in manners that violate Bayesian Conditionalization and van Fraassen’s
Reflection Principle. I will then argue that these violations of Conditionalization
and Reflection are due to the fact that there are two, as yet unrecognized, ways in
which the degrees of belief of rational people can develop.
Every now and then the guardians to Shangri La will allow a mere mortal to enter
that hallowed ground. You have been chosen because you are a fan of the Los
Angeles Clippers. But there is an ancient law about entry into Shangri La: you are
only allowed to enter, if, once you have entered, you no longer know by what path
you entered. Together with the guardians you have devised a plan that satisfies this
law. There are two paths to Shangri La, the Path by the Mountains, and the Path
by the Sea. A fair coin will be tossed by the guardians to determine which path
you will take: if heads you go by the Mountains, if tails you go by the Sea. If you
go by the Mountains, nothing strange will happen: while traveling you will see the
glorious Mountains, and even after you enter Shangri La you will forever retain your
memories of that Magnificent Journey. If you go by the Sea, you will revel in the
Beauty of the Misty Ocean. But, just as you enter Shangri La, your memory of this
Beauteous Journey will be erased and be replaced by a memory of the Journey by
the Mountains.
F. Arntzenius ()
University of Oxford, Oxford, UK
e-mail: [email protected]
Suppose that in fact you travel by the Mountains. How will your degrees of belief
develop? Before you set out your degree of belief in heads will be ½. Then, as you
travel along the Mountains and you gaze upon them, your degree of belief in heads
will be 1. But then, once you have arrived, you will revert to having degree of belief
½ in heads. For you will know that you would have had the memories that you have
either way, and hence you know that the only relevant information that you have is
that the coin was fair.
This seems a bizarre development of degrees of belief. For as you are traveling
along the Mountains, you know that your degree of belief in heads is going to go
down from 1 to ½. You do not have the least inclination to trust those future degrees
of belief. Those future degrees of belief will not arise because you will acquire
any evidence, at least not in any straightforward sense of “acquiring evidence”.
Nonetheless you think you will behave in a fully rational manner when you acquire
those future degrees of belief. Moreover, you know that the development of your
memories will be completely normal. It is only because something strange would
have happened to your memories had the coin landed tails, that you are compelled
to change your degrees of belief to ½ when that counterfactual possibility would
have occurred.
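The arrival verdict can be put as a one-step Bayes calculation. The only inputs are the likelihoods the story itself fixes: memories of the Mountain journey are guaranteed under heads (they are real) and guaranteed under tails (they are implanted), so the memories carry no information about the coin. The snippet below is merely that calculation written out.

# Minimal Bayes check of the traveler's situation on arrival in Shangri La.
# The likelihoods are the ones the story fixes; nothing else is assumed.
p_heads = 0.5
like = {"heads": 1.0, "tails": 1.0}   # P(mountain memories on arrival | outcome)
posterior = (p_heads * like["heads"]
             / (p_heads * like["heads"] + (1 - p_heads) * like["tails"]))
print(posterior)   # 0.5: on arrival, the only relevant information is that the coin was fair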
The Prisoner
You have just been returned to your cell on death row, after your last supper. You are
to be executed tomorrow. You have made a last minute appeal to President George
W. Bush for clemency. Since Dick Cheney is in hospital and can not be consulted,
George W. will decide by flipping a coin: heads you die, tails you live. His decision
will be made known to the prison staff before midnight. You are friends with the
prison officer that will take over the guard of your cell at midnight. He is not allowed
to talk to you, but he will tell you of Bush’s decision by switching the light in your
cell off at the stroke of midnight if it was heads. He will leave it on if it was tails.
Unfortunately you don’t have a clock or a watch. All you know is that it is now
6 pm since that is when prisoners are returned to their cells after supper. You start
to reminisce and think fondly of your previous career as a Bayesian. You suddenly
get excited when you notice that there is going to be something funny about the
development of your degrees of belief. Like anybody else, you don’t have a perfect
internal clock. At the moment you are certain that it is 6 pm, but as time passes your
degrees of belief are going to be spread out over a range of times. What rules should
such developments satisfy?
Let us start on this problem by focusing on one particularly puzzling feature of
such developments. When in fact it is just before midnight, say 11.59 pm, you are
going to have a certain, non-zero, degree of belief that it is now later than midnight.
Of course, at 11.59 pm the light in your cell is still going to be on. Given that at
this time you will have a non-zero degree of belief that it is after midnight, and
given that in fact you will see that the light is still on, you will presumably take it
that the light provides some evidence that the outcome was tails. Indeed, it seems
clear that as it gets closer to midnight, you will monotonically increase your degree
of belief in tails. Moreover you know in advance that this will happen. This seems
puzzling. Of course, after midnight, your degree of belief in tails will either keep
on increasing, or it will flip to 0 at midnight and stay there after midnight. But that
does not diminish the puzzlement about the predictable and inevitable increase in
your degree of belief in tails prior to midnight. In fact, it seems that this increase is
not merely puzzling, it seems patently irrational. For since this increase is entirely
predictable, surely you could be made to lose money in a sequence of bets. At 6 pm
you will be willing to accept a bet on heads at even odds, and at 11.59 pm you will,
almost certainly, be willing to accept a bet on tails at worse than even odds. And
that adds up to a sure loss. And surely that means you are irrational.
Now, one might think that this last argument shows that your degree of belief
in tails in fact should not go up prior to midnight. One might indeed claim that
since your degree of belief in heads should remain ½ until midnight, you should
adjust your idea of what time it is when you see that the light is still on, rather than
adjust your degree of belief in tails as time passes. But of course, this suggestion is
impossible to carry out. Armed with an imperfect internal clock, you simply can not
make sure that your degree of belief in heads stays ½ until midnight, while allowing
it to go down after midnight. So how should your degrees of belief develop?
Let us start with a much simpler case. Let us suppose that there is no coin toss
and no light switching (and that you know this). You go into your cell at 6 pm. As
time goes by there will be some development of your degrees of belief as to what
time it is. Let us suppose that your degrees of belief in possible times develop as
pictured in the top half of Fig. 10.1.
Next, let us ask how your degrees of belief should develop were you to know
with certainty that the guard will switch the light off at 12 pm. It should be clear
that then at 11.59 pm your degree of belief distribution should be entirely confined
to the left of 12 pm, as depicted in the bottom half of Fig. 10.1.
[Fig. 10.1 Development of your degrees of belief over possible times. Top panel: when there is no evidence regarding the time, shown at 7.30 pm, 9 pm, and 11.59 pm. Bottom panel: when you know the light will be turned off at 12 pm, shown at 7.30 pm, 9 pm, and 11.59 pm.]
For at 11.59 pm
the light will still be on, so that you know that it must be before 12 pm. But other
than that it should be entirely confined to the left of 12 pm, it is not immediately
clear exactly what your degree of belief distribution should be at 11.59 pm. It is not
even obvious that there should be a unique answer to this question. However, a very
simple consideration leads to a unique answer.
Suppose that, even though the guard is going to switch the light off at 12 pm,
you were not told that the guard is going to switch the light off at 12 pm. Then
the development of your degrees of belief would be as pictured in the top half of
Fig. 10.1. Next, suppose that at 11.59 pm you are told that the guard will switch
the light off at 12 pm, but you are not told that it is now 11.59 pm. Obviously,
since the light is still on you can infer that it is prior to 12 pm. Surely you should
update your degrees of belief by conditionalization: you should erase that part of
your degree of belief distribution that is to the right of 12 pm, and re-normalize
the remaining part (increase the remaining part proportionally). Now it is clear that
this is also the degree of belief distribution that you should have arrived at had
you known all along that the guard would turn the light off at 12 pm. For either
way you have accumulated exactly the same relevant information and experience by
11.59 pm. This uniquely determines how your degree of belief distribution should
develop when you know all along that the guard will turn the light off at 12 pm. At
any time this (constrained) distribution should be the distribution that you arrive at
by conditionalizing the distribution that you have if you have no evidence regarding
the time, on the fact that it is now before 12 pm. One can picture this development
in the following way. One takes the development of the top part of Fig. 10.1. As this
distribution starts to pass through the 12 pm boundary, the part that passes through
this boundary gets erased, and, in a continuous manner, it gets proportionally added
to the part that is to the left of the 12 pm boundary.
Now we are ready to solve the original puzzle. Your degrees of belief in that case
can be pictured as being distributed over possible times in two possible worlds: see
Fig. 10.2. The development is now such that when the bottom part of the degree
of belief distribution hits midnight, it gets snuffed out to the right of midnight,
and the rest of the degree of belief distribution is continuously re-normalized, i.e.
the top part of the degree of belief distribution and the remaining bottom part
are continuously proportionally increased as time passes. Note that Fig. 10.2 is
essentially different from Fig. 10.1. In Fig. 10.2 the top distribution starts to increase
its absolute size once the leading edge of the bottom distribution hits midnight. This
does not happen in Fig. 10.1, since there the degree of belief distributions each
were total degree of belief distributions in separate scenarios. Also, in Fig. 10.2 the
bottom distribution starts to increase in size once its leading edge hits midnight, but
it only increases half as much as it does in Fig. 10.1, since half of the “gains” is
being diverted to the top degree of belief distribution.
Thus, at the very least until it actually is midnight, the top and the bottom
degree of belief distribution will always be identical to each other, in terms of
shape and size, to the left of midnight. Prior to midnight, your degrees of belief
will be such that conditional upon it being prior to midnight, it is equally likely to
be heads as tails. Your unconditional degree of belief in tails, however, will increase monotonically as you approach midnight.
[Fig. 10.2 Your degrees of belief distributed over possible times (6 pm to 12 pm) in the heads world and in the tails world, shown at 8 pm, 10 pm, and 11.59 pm.]
After midnight there are two possible ways in which your degree of belief
distribution can develop. If the light is switched off your degree of belief distribution
collapses completely onto midnight and onto the heads world. If in fact it is not
switched off your degree of belief distribution continues to move to the right in both
worlds, and it continues to be snuffed out in the heads world to the right of midnight,
and the remaining degrees of belief keep being proportionally increased.1
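The truncate-and-renormalize dynamics just described can be made concrete with a small numerical sketch. The internal-clock model below (a Gaussian over clock times whose spread grows with elapsed time, on a 6 pm to 1 am grid) is an illustrative assumption and not taken from the text; only the step of snuffing out the heads-world mass to the right of midnight and renormalizing follows the description above.

# Illustrative model of the prisoner's degrees of belief, spread over possible
# times in a heads world and a tails world, conditioned on the light still being on.
import math

GRID = list(range(18 * 60, 25 * 60 + 1))   # possible clock times in minutes, 6 pm .. 1 am
MIDNIGHT = 24 * 60

def clock_density(elapsed_minutes):
    """Belief over possible times given only the internal clock (no light evidence).
    Assumed form: Gaussian centred at the true time, spreading as time passes."""
    mean = 18 * 60 + elapsed_minutes
    sd = 5 + 0.2 * elapsed_minutes
    w = [math.exp(-0.5 * ((t - mean) / sd) ** 2) for t in GRID]
    z = sum(w)
    return [x / z for x in w]

def belief_in_tails(elapsed_minutes):
    """P(tails) given the internal clock plus the evidence that the light is still on."""
    d = clock_density(elapsed_minutes)
    heads = [0.5 * p if t < MIDNIGHT else 0.0 for t, p in zip(GRID, d)]  # heads & light on => before midnight
    tails = [0.5 * p for p in d]                                         # the light stays on under tails
    z = sum(heads) + sum(tails)
    return sum(tails) / z

for label, mins in [("7.30 pm", 90), ("9 pm", 180), ("11.59 pm", 359)]:
    print(label, round(belief_in_tails(mins), 3))
# P(tails) sits at about 1/2 early in the evening and has risen well above 1/2 by
# 11.59 pm, before midnight has arrived, just as argued above.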
Now I can answer the questions that I started with. It is true, as I surmised, that
your degree of belief in tails will have increased by 11.59 pm. You will take your
internal sense of the passing of time, and combine it with the fact that the light is
still on, and you will take this as providing some evidence that the outcome is tails.
It is also true, as I surmised, that the light still being on will be taken by you as
providing some evidence that it is not yet midnight. For at 11.59 pm your degree
of belief distribution over possible times (averaged over the heads and tails worlds)
will be further to the left than it would have been had you believed that the light
would stay on no matter what. More generally, we have found a unique solution
1 Thus, for instance, if the light is not switched off, there must be a moment (which could be before
or after midnight) such that you have equal degree of belief in each of the 3 possibilities: heads and
it is before midnight, tails and it is before midnight, tails and it is after midnight.
to the puzzle of how a rational person’s sense of time must interact with evidence,
given how that person’s sense of time works in the absence of evidence.
Rather surprisingly, this interaction can be such, as it is in my example, that
you know in advance that at some specified later time you will, almost certainly,
have increased your degree of belief in tails, and that you could not possibly have
decreased your degree of belief in tails.2 It is also interesting to note that nothing
essential changes in this example if one assumes that the coin toss will take place
exactly at midnight. Thus it can be the case that one knows in advance that one will
increase one’s degrees of belief that a coin toss, which is yet to occur, will land tails.
Of course, at the time that one has this increased degree of belief one does not know
that this coin toss is yet to occur. Nonetheless, such predictable increases in degrees
of belief seem very strange.
John Collins has come up with the following variation of the case of the prisoner
that was described in the previous section. In Collins’s variation the prisoner has 2
clocks in his cell, both of which run perfectly accurately. However, clock A initially
reads 6 pm, clock B initially reads 7 pm. The prisoner knows that one of the clocks
is set accurately, the other one is one hour off. The prisoner has no idea which one
is set accurately; indeed he initially has degree of belief ½ that A is set accurately,
and degree of belief ½ that B is set accurately. As in the original case, if the coin
lands heads the light in his cell will be turned off at midnight, and it will stay on if it
lands tails. So initially the prisoner has degree of belief 1/4 in each of the following
4 possible worlds:
W1 : Heads and clock A is correct
W2 : Heads and clock B is correct
W3 : Tails and clock A is correct
W4 : Tails and clock B is correct.
2
One might wonder why I inserted the phrase “almost certainly” in this sentence. The reason for
this is that there is a subtlety as to whether you know at 6 pm that you will have an increased degree
of belief in tails at 11.59 pm. There is an incoherence in assuming that at 6 pm you know with
certainty what your degree of belief distribution over possible times will be at 11.59 pm. For if you
knew that, you could simply wait until your degree of belief distribution is exactly like that. (You
can presumably establish by introspection what your degree of belief distribution is.) And when
you reach that distribution, you would know that it has to be 11.59 pm. So when that happens you
should then collapse your degree of belief distribution completely on it being 11.59 pm. But this is
incoherent. Thus, the fact that you do not have a perfect internal clock also implies that you can not
know in advance what your degree of belief distribution is going to look like after it has developed
(guided only by your internal clock). Thus you can not in advance be certain how your degree of
belief distribution over possible times will develop. Nonetheless you can be certain at 6 pm that
your degree of belief in tails will not decrease prior to midnight, and that it is extremely likely to
have increased by 11.59 pm. At 6 pm your expectation for your degree of belief in tails at 11.59 pm
will be substantially greater than 0.5.
When in fact it is 11.30 pm the light, for sure, will still be on. What will the
prisoner’s degrees of belief then be? Well, if the actual world is W1 , then, when it
actually is 11.30 pm clock A will read 11.30 pm and clock B will read 12.30 am.
In that case, since the prisoner sees that the light is still on, he will know that it can
not be that the coin landed heads and clock B is correct. I.e. his degree of belief in
W2 will be 0, and his degrees of belief in the three remaining options will be 1/3
each. Similarly if the actual world is W3 then at 11.30 pm the prisoner will have degree
of belief 0 in W2 and degree of belief 1/3 in each of the remaining options. On the
other hand if the actual world is W2 or W4 , then when it is actually 11.30 pm, the
clock readings will be 10.30 pm and 11.30 pm, and the prisoner will still have the
degrees of belief that he started with, namely 1/4 in each of the 4 possibilities. The
prisoner, moreover, knows all of this in advance.
This is rather bizarre, to say the least. For, in the first place, at 6 pm the prisoner
knows that at 11.30 pm his degrees of belief in heads will be less than or equal to what
they now are, and can not be greater. So his current expectation of what his degrees
of belief in heads will be at 11.30 pm, is less than his current degree of belief in
heads. Secondly, there is a clear sense in which he does not trust his future degrees
of belief, even though he does not think that he is, or will be, irrational, and even
though he can acquire new evidence (the light being on or off). Let Dt denote the
prisoner’s degrees of belief at time t. Then, e.g., D6.00 (clock B is correct/D11.30(clock
B is correct) D 1/3) D 0. For D11.30 (clock B is correct) D 1/3 only occurs in worlds
W1 and W3 , and in each of those worlds clock B is not correct, and the prisoner
knows this. Thus his current degrees of belief conditional upon his future degrees of
belief do not equal those future degrees of belief. So he systematically distrusts his
future degrees of belief. Strange indeed.
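Since the update in Collins's case is purely mechanical, it is easy to check with a few lines of code. The sketch below encodes the four worlds, the two clock offsets, and the light rule, and reproduces the 11.30 pm degrees of belief described above; treating the clock readings together with the light as the total evidence, and the 24-hour time scale, are modelling choices of mine rather than anything stated in the text.

# Check of the prisoner's 11.30 pm degrees of belief in Collins's two-clock variant.
from fractions import Fraction

worlds = [("heads", "A"), ("heads", "B"), ("tails", "A"), ("tails", "B")]   # (coin, correct clock)
prior = {w: Fraction(1, 4) for w in worlds}

def actual_time(world, elapsed):
    # Clock A starts reading 6 pm (18.0), clock B 7 pm (19.0); the correct clock shows the true time.
    start = 18.0 if world[1] == "A" else 19.0
    return start + elapsed

def light_on(world, elapsed):
    # The light is switched off at midnight if and only if the coin landed heads.
    return not (world[0] == "heads" and actual_time(world, elapsed) >= 24.0)

def posterior_when_actually_1130(actual_world):
    # True time 11.30 pm = 23.5; elapsed time since the clocks started running.
    elapsed = 23.5 - (18.0 if actual_world[1] == "A" else 19.0)
    seen_light = light_on(actual_world, elapsed)   # what the prisoner actually observes
    consistent = [w for w in worlds if light_on(w, elapsed) == seen_light]
    z = sum(prior[w] for w in consistent)
    return {w: (prior[w] / z if w in consistent else Fraction(0)) for w in worlds}

expected_heads = Fraction(0)
for w in worlds:
    post = posterior_when_actually_1130(w)
    p_heads = post[("heads", "A")] + post[("heads", "B")]
    expected_heads += prior[w] * p_heads
    print(w, "-> P(heads) =", p_heads)
print("expectation of the 11.30 pm credence in heads:", expected_heads)   # 5/12 < 1/2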
Sleeping Beauty
Some researchers are going to put Sleeping Beauty, SB, to sleep on Sunday night.
During the two days that her sleep will last the researchers will wake her up either
once, on Monday morning, or twice, on Monday morning and Tuesday morning.
They will toss a fair coin Sunday night in order to determine whether she will be
woken up once or twice: if it lands heads she will be woken up on Monday only, if
it lands tails she will be woken up on Monday and Tuesday. After each waking, she
will be asked what her degree of belief is that the outcome of the coin toss is heads.
After she has given her answer she will be given a drug that erases her memory of
the waking up, indeed it resets her mental state to the state that it was in on Sunday
just before she was put to sleep. Then she is put to sleep again. The question now is:
when she wakes up what should her degree of belief be that the outcome was heads?
Answer 1: Her degree of belief in heads should be 1/2. It was a fair coin and she
learned nothing relevant by waking up.
Answer 2: Her degree of belief in heads should be 1/3. If this experiment is repeated
many times, approximately 1/3 of the awakenings will be heads-awakenings, i.e.
awakenings that happen on trials in which the coin landed heads.
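The frequency claim in Answer 2 is easy to confirm by simulation; the sketch below is illustrative only (the number of trials and the seed are arbitrary).

# Fraction of awakenings that are heads-awakenings over many repetitions of the experiment.
import random

random.seed(0)
heads_awakenings = 0
total_awakenings = 0
for _ in range(100000):
    heads = random.random() < 0.5
    awakenings = 1 if heads else 2      # woken once on heads, twice on tails
    total_awakenings += awakenings
    if heads:
        heads_awakenings += awakenings
print(heads_awakenings / total_awakenings)   # approximately 1/3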
Adam Elga3 has argued for the second answer. I agree with him, and I agree with
his argument. But let me amplify this view by giving a different argument for the
same conclusion. Suppose that SB is a frequent and rational dreamer. Suppose in
fact that every morning if SB is not woken up at 9 am, she dreams at 9 am that
she is woken up at 9 am. Suppose that the dream and reality are indistinguishable in
terms of her experience, except that if SB pinches herself while she is dreaming,
it does not hurt (and she doesn't wake up), while if she does this while she is
awake it does hurt. And let us suppose that SB always remembers to pinch herself
a few minutes after she experiences waking up (whether for real, or in a dream.)
What should her degrees of belief be when she experiences waking up? It seems
obvious she should consider the 4 possibilities equally likely (the 4 possibilities
being: Monday&Tails&Awake, Monday&Heads&Awake, Tuesday&Tails&Awake,
Tuesday&Heads&Dreaming). If SB then pinches herself and finds herself to be
awake, she should conditionalize and then have degree of belief 1/3 in each of
the remaining 3 possibilities (Monday&Tails&Awake, Monday&Heads&Awake,
Tuesday&Tails&Awake). Suppose now that at some point in her life SB loses the
habit of dreaming. She no longer needs to pinch herself; directly upon waking she
knows that she is not asleep. However, it seems clear that this lack of dreaming
should make no difference as to her degrees of belief upon realizing that she is
awake. The process now occurs immediately, without the need for a pinch, but the
end result ought to be the same.
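The pinch argument amounts to a single conditionalization step, which the following sketch spells out using the uniform assignment over the four possibilities proposed above.

# Conditionalizing the four equally likely possibilities on the pinch revealing wakefulness.
from fractions import Fraction as F

cells = {("Monday", "Tails", "awake"): F(1, 4),
         ("Monday", "Heads", "awake"): F(1, 4),
         ("Tuesday", "Tails", "awake"): F(1, 4),
         ("Tuesday", "Heads", "dreaming"): F(1, 4)}

awake = {c: p for c, p in cells.items() if c[2] == "awake"}   # the pinch rules out dreaming
z = sum(awake.values())
posterior = {c: p / z for c, p in awake.items()}
p_heads = sum(p for c, p in posterior.items() if c[1] == "Heads")
print(p_heads)   # 1/3, Elga's answer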
Here again the crucial assumption is commutativity: if the relevant evidence and
experience collected is the same, then the order of collection should not matter for
the final degrees of belief.4 But there is clearly something very puzzling about such
foreseeable changes in degrees of belief.
Duplication
Scenario 1 While you are at the beach, Vishnu tells you that, contrary to appear-
ances, you have existed only for one month: Brahma created you one month ago,
complete with all your memories, habits, bad back, and everything. What’s more,
says Vishnu, one month ago Brahma in fact created two human beings like you
(you are one of them), in exactly the same environment, at two different ends of
3
Elga, A. (2000): “Self-locating belief and the Sleeping Beauty problem”, Analysis 60, pp 143–
147.
4
Cian Dorr has independently arrived at the idea of using commutativity in order to argue for the
degrees of belief that Elga advocates in the Sleeping Beauty case. See Dorr, C.: “Sleeping Beauty:
in defence of Elga”, forthcoming, Analysis.
the universe: one on earth, one on twin earth. Unfortunately, Vishnu has a further
surprise for you: one month ago Shiva tossed a coin. If it landed heads Shiva will
destroy the human being that is on twin earth one month from now. If it landed tails
Shiva will do nothing. Vishnu does not tell you whether you are to be destroyed,
but recommends that if you want to know, you should go check your mail at
home. If there is a letter from president Bush for you, then you will be destroyed.
Before running home, what degree of belief should you have in the 4 possibilities:
Earth&Heads, Earth&Tails, Twin Earth&Heads, Twin Earth&Tails? It seems clear
that you should have degree of belief 1/4 in each, or at the very least, that it is
not irrational to have degree of belief 1/4 in each. You run home, and find no letter
from Bush. What should your degrees of belief now be? Well, by conditionalization,
they should now be 1/3 in each of the remaining possibilities (Earth&Heads, Earth&Tails,
Twin Earth&Tails). Consequently you should now have degree of
belief 1/3 that the toss landed heads and 2/3 that it landed tails.
Scenario 2 same as scenario 1, except that Vishnu tells you that if the toss came
heads, your identical twin was destroyed by Shiva one week ago. Since you were
obviously not destroyed, you do not need to rush home to look for a letter from Bush.
In essence you have learned the same as you learned in the previous scenario when
you found you had no letter from Bush, and hence you should now have degree of
belief 1/3 that the toss landed heads.
Scenario 3 same as scenario 2, except that Vishnu tells you that rather than that 2
beings were created one month ago by Brahma, one of them already existed and had
exactly the life you remember having had. This makes no relevant difference and
you should now have degree of belief 1/3 that the coin landed heads.
Scenario 4 same as scenario 3, except that Vishnu tells you that if the coin landed
heads one month ago Shiva immediately prevented Brahma from creating the
additional human being one month ago. The upshot is that only if the coin landed
tails Brahma will have created the additional human being. Since the timing of the
destruction/prevention makes no relevant difference you should again have degree
of belief 1/3 that the coin landed heads.
Scenario 5 You are on earth, and you know it. Vishnu tells you that one month
from now Brahma will toss a coin. If it lands tails Brahma will create, at the other
end of the universe, another human being identical to you, in the same state as you
will then be, and in an identical environment as you will then be. What do you now
think that your degrees of belief should be in one month time? The answer is that
they should be the same as they are in scenario 4, since in one month time you will
be in exactly the epistemic situation that is described in scenario 4. Of course, it
is plausible to claim that your future self will actually be on earth, since it is only
your future continuation on earth that can plausibly be called “your future self”.
However, that does not mean that your future self can be sure that he is on earth. For
your future self will know that he will have the same experiences and memories,
5
This scenario is similar to the “Dr Evil scenario” in Elga, A. (manuscript): “Defeating Dr. Evil
with self-locating belief”.
whether he is on earth or on twin earth, and thus he will not know whether he
can trust his memories. Thus you now have degree of belief ½ in heads, and yet you
know that in one month time, you will have degree of belief 1/3. This is bizarre, to
say the least.
Yet again, the crucial assumption in this reasoning is commutativity: your final
degrees of belief should not depend on the order in which you receive all the relevant
experience and evidence. You should end up with the same degrees of belief, namely
degree of belief ½ in heads, whether you all along knew you were on Earth, or
whether you only later found out that you were on Earth. But that can only be so if
you had degree of belief 1/3 in heads prior to discovering that you were on Earth.
Diagnosis
Bas van Fraassen’s Reflection Principle6 says that one should trust one’s future
degrees belief in the sense that one’s current degree of belief D0 in any proposition
X, given that one’s future degree of belief Dt in X equals p, should be p:
D0 (X/Dt (X) D p) D p. Given that one is sure that one will have precise degrees of
belief at time t, the Reflection Principle entails that one’s current
Pdegrees of belief
equal the expectations of one’s future degrees of belief: D0 (X) D pD0 (Dt (X) D p).
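The identity is easy to see at work in a finite case. The sketch below checks it for an agent who will update by strict conditionalization on a partition; the four-state model and the partition are assumptions chosen only for illustration.

# A conditionalizer's current credence equals the expectation of her future credence.
from fractions import Fraction as F

states = ["s1", "s2", "s3", "s4"]
D0 = {s: F(1, 4) for s in states}                 # current degrees of belief
X = {"s1", "s2"}                                  # the proposition of interest
partition = [{"s1", "s2", "s3"}, {"s4"}]          # what will be learned by time t

def conditionalize(prior, cell):
    z = sum(prior[s] for s in cell)
    return {s: (prior[s] / z if s in cell else F(0)) for s in prior}

expectation = F(0)
for cell in partition:
    p_cell = sum(D0[s] for s in cell)                     # D0(this cell will be learned)
    Dt_X = sum(conditionalize(D0, cell)[s] for s in X)    # the resulting Dt(X)
    expectation += p_cell * Dt_X                          # contributes p * D0(Dt(X) = p)

print(expectation, "=", sum(D0[s] for s in X))            # both equal 1/2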
The Reflection Principle is violated in each of the 5 puzzles that I have presented,
for in each case there is a time at which one’s expectation of one’s future degree
of belief in Heads differs from one’s current degree of belief in Heads. This is
presumably why we find these cases, prima facie, so worrying and strange.
The source of the problem, I claim, is that the degrees of belief of perfectly
rational people, people who are not subject to memory loss or any other cognitive
defect, can develop in ways that are as yet unrecognized, and indeed are not
allowed according to standard Bayesian lore. Standard Bayesian lore has it that
rational people satisfy the Principle of Conditionalization: rational people alter
their degrees of belief only by strict conditionalization on the evidence that they
acquire.7 Strict conditionalization of one’s degrees of belief upon proposition X
can be pictured in the following manner. One’s degrees of belief are a function on
the set of possibilities that one entertains. Since this function satisfies the axioms
of probability theory it is normalized: it integrates (over all possibilities) to 1.
6
See van Fraassen (1995): “Belief and the problem of Ulysses and the sirens”, Philosophical
Studies 77: 7–37.
7
Strict conditionalization: when one learns proposition X at t, one’s new degrees of belief Dt equals
one’s old degrees of belief D0 conditional upon X: Dt (Y) D D0 (Y/X). One might also allow Jeffrey
conditionalization. It matters not for our purposes.
8
Bas van Fraassen has, in conversation with me, suggested that in such situations Condition-
alization indeed should be violated, but Reflection should not. In particular he suggested that the
degrees of belief of the traveler should become completely vague, upon arrival in Shangri La. This
does not strike me as plausible. Surely upon arrival in Shangri La our traveler is effectively in
the same epistemic situation as someone who simply knows that a fair coin has been tossed. One
can make this vivid by considering two travelers, A and B. Traveler A never looks out of the
window of the car, and hence maintains degree of belief ½ in heads all the way. (The memory
replacement device does not operate on travelers who never look out of the window.) Traveler A,
even by van Fraassen’s lights, upon arrival in Shangri La, should still have degree of belief ½ in
Heads. However, traveler B, does look out of the window during the trip. Upon arrival, by van
Fraassen’s lights, B’s degrees of belief should become completely vague. But it seems odd to me
that traveler B is epistemically penalized, i.e. is forced to acquire completely vague degrees of
belief, just because he looked out of the window during the trip, when it seems clear that he ends
up in exactly the same epistemic position as his companion, who did not look out of the window.
experiential paths that end up in the same experiential state. That is to say the
traveler’s experiences earlier on determine whether possibility A is the case (Path by
the Mountain), or whether possibility B is the case (Path by the Ocean). But because
of the memory replacement that occurs if possibility B is the case, those different
experiential paths merge into the same experience, so that that experience is not
sufficient to tell which path was taken. Our traveler therefore has an unfortunate
loss of information, due to the loss of the discriminating power of his experience.
What is somewhat surprising is that this loss of discriminating power is not due to
any loss of memory or any cognitive defect on his part: it is due to the fact that
something strange would have happened to him had he taken the other path! This
loss of discriminatory power of experience, and consequent spreading out of degrees
of belief here does not involve self-locating degrees of belief. Suppose, e.g., that our
traveler is the only person ever to travel along either path. Then our traveler initially
is unsure whether he is in a world in which path A is never taken or whether he is in
a world in which path B is never taken. He then becomes sure that he is in a world
in which path B is never taken. Even later, upon arrival, he again becomes unsure as
to which world he is in. None of this has anything to do with self-locating beliefs.9
The source of the Sleeping Beauty and Duplication problems is exactly the
same. In the case of Sleeping Beauty the possibility of memory erasure ensures
that the self-locating degrees of belief of Sleeping Beauty, even on Monday when
she has suffered no memory erasure, become spread out over two days. In the
Duplication case, yet again, the possible duplication of experiences forces one to
become uncertain as to where (or who) one is. The cause of the spreading of degrees
of belief in both cases is “experience duplication”, and has nothing to do with the
self-locating nature of these beliefs.10
It is not very surprising that the spreading of degrees of belief can bring about
a violation of Reflection. For instance, in the non-self locating case a predictable
reduction from degree of belief 1 in some proposition X to anything less than 1 will
9
It is obvious how to generalize this case to a case in which there are memory replacement devices
at the end of both roads, where these memory replacement devices are indeterministic, i.e. when
it is the case that for each possible path there are certain objective chances for certain memories
upon arrival in Shangri La. For, given such chances (and the Principal Principle), one can easily
calculate the degrees of belief that one should have (in heads and tails) given the memory state that
one ends up with. And, generically, one will still violate Conditionalization and Reflection.
10
Some people will balk at some of the degrees of belief that I have argued for in this paper, in
particular in the self-locating cases. For instance, some people will insist that tomorrow one should
still be certain that one is on Earth, even when one now knows (for sure) that a perfect duplicate of
oneself will be created on Mars at midnight tonight. I beg to differ. However, even if in this case,
and other cases, one disagrees with me as to which degrees of belief are rationally mandated, the
main claim of this paper still stands. The main claim is that in such cases of possible experience
duplication, it is at the very least rationally permissible that one’s degrees of belief become
more spread out as time progresses, and hence rational people can violate Conditionalization and
Reflection.
immediately violate Reflection: now you know it, now you don’t. The argument
is slightly less straightforward in the self-locating case. Consider, e.g., a case in
which one is on Earth and one knows that at midnight a duplicate of oneself will
be created on Mars. One might claim that since one now is certain that one is on
Earth, and at midnight one will be uncertain as to whether one is on Earth, that
one has a clear violation of Reflection. However, this is too quick. To have a clear
violation of Reflection it has to be the very same “content of belief” such that one’s
current degree of belief differs from one’s expectation of one’s future degree of
belief. Depending on what one takes to be the contents of belief when it concerns
self-locating beliefs (propositions? maps from locations to propositions? .....?), one
might argue that the contents of belief are not the same at the two different times,
and hence there is no violation of Reflection. However the arguments of sections IV
and V show that one can in any case parlay such spreading of self-locating degrees
of belief into violations of Reflection concerning such ordinary beliefs as to whether
a coin lands Heads or Tails. So Reflection is suckered anyhow.
Finally, the original case of the prisoner involves both a spreading of degrees of
belief and a shifting of degrees of belief. The shifting is due simply to the passage
of time and the self-locating nature of the beliefs. The spreading is due to the fact
that our prisoner does not have experiences that are discriminating enough to pick
out a unique location in time.11 The analysis of section II shows, yet again, that
such a spreading and shifting of self-locating degrees of belief can be parlayed into
a violation of Reflection concerning such ordinary beliefs as to whether a coin lands
Heads or Tails.
Conclusions
The degrees of belief of rational people can undergo two as yet unrecognized types
of development. Such degrees of belief can become more spread out due to the
duplication of experiences, or more generally, due to the loss of discriminating
power of experiences, and thereby violate Conditionalization. In addition self-
locating degrees of belief will generically be shifted over the space of possible
11
One might model the prisoner here as having unique distinct experiences at each distinct, external
clock, time, and as initially having precise degrees of belief over the possible ways in which those
experiences could correlate to the actual, external clock, time. If one were to do so then the prisoner
would merely be initially uncertain as to which world he was in (where worlds are distinguished
by how his experiences line up with the actual, external clock, time), but for each such possible
world would be always certain as to where he was located in it. And, if one were to do so, then the
original prisoner case would be essentially the same case as Collins’s prisoner case: no uncertainty
of location in any given world, merely an initial uncertainty as to which world one is in, and
a subsequent shifting of the locally concentrated degrees of belief within each of the possible
worlds. However, there is no need to represent the original prisoner case that way. Indeed it seems
psychologically somewhat implausible to do so. More importantly, the arguments and conclusions
of this paper do not depend on how one models this case.
locations, due to the passage of time, and thereby violate Conditionalization. Such
violations of Conditionalization can be parlayed into violations of Reflection, and
lead to a distrust of one’s future degrees of belief. Strange, but not irrational.
Acknowledgements I would like to thank John Collins, Adam Elga, John Hawthorne, Isaac Levi,
Barry Loewer, and Tim Maudlin for extensive and crucial comments and discussions on earlier
versions of this paper.
Chapter 11
Stopping to Reflect
Our note is prompted by a recent article by Frank Arntzenius, “Some Problems for
Conditionalization and Reflection”.1 Through a sequence of examples, that article
purports to show limitations for a combination of two inductive principles that
relate current and future rational degrees of belief: Temporal Conditionalization and
Reflection:
(i) Temporal Conditionalization is the rule that, when a rational agent’s corpus
of knowledge changes between now and later solely by learning the (new)
evidence, B, then coherent degrees of belief are updated using conditional
probability according to the formula, for each event A, Plater(A) = Pnow(A | B).
(ii) Reflection2 between now and later is the rule that current conditional degrees
of belief defer to future ones according to the formula that, for each event A,3 Pnow(A | Plater(A) = r) = r.
1
The Journal of Philosophy Vol C, Number 7 (2003), 356–370.
2
See B.van Frasseen’s “Belief and the Will,” this Journal, 81 (1984), 235–256. van Fraassen’s
Reflection has an antecedent in M.Goldstein’s “Prevision of a Prevision,” JASA 78 (1983): 817–
819.
3
Here and through the rest of this note ‘r’ is a standard designator for a real number – this in order
to avoid Miller-styled problems. See, D.Miller’s “A Paradox of Information,” Brit. J. Phil. Sci. 17
(1966):144–147.
M.J. Schervish () • T. Seidenfeld • J.B. Kadane
Carnegie Mellon University, Pittsburgh, PA, USA
We will use the expression “Reflection holds with respect to the event A.” to apply
to this equality for a specific event A.
It is our view that neither of these principles is mandatory for a rational agent.4
However, we do not agree with Arntzenius that, in the examples in his article, either
of these two is subject to new restrictions or limitations beyond what is already
assumed as familiar in problems of stochastic prediction.
To the extent that a rational person does not know now exactly what she or he
will know in the future, anticipating one’s future beliefs involves predicting the
outcome of a stochastic process. The literature on stochastic prediction relies on two
additional assumptions regarding states of information and the temporal variables
that index them5 :
(iii) When t2 > t1 are two fixed times, then the information the agent has at t2
includes all the information that she or he had at time t1 .6 This is expressed
mathematically by requiring that the collection of information sets at all times
through the future form what is called a filtration.
Second, since the agent may not know now the precise time at which some
specific information may become known in the future, then future times are treated
as stopping times. That is:
(iv) For each time T (random or otherwise) when a prediction is to be made, the
truth or falsity of the event {T ≤ t} is known at time t, for all fixed t. Such
(random) times T are called stopping times.
In this note, we apply the following three results7 to the examples in Arntzenius’
article. These results, we believe, help to explain why the examples at first appear
puzzling and why they do not challenge either Temporal Conditionalization or
Reflection. Result 11.1 covers the ordinary case, where Reflection holds. Results
11.2 and 11.3 establish that Reflection will fail when one or the other of the
4
We have argued, for example, that when (subjective) probability is finitely but not countably
additive, then there are simple problems where (i) is reasonable, but where (i) precludes (ii).
See our “Reasoning to a Foregone Conclusion,” JASA 91 (1996): 1228–1236. Also, Levi argues
successfully, we think, that (i) is not an unconditional requirement for a rational agent. See his
“The Demons of Decision,” The Monist 70 (1987): 193–211.
5
See, for example, section 35 of P.Billingsley, Probability and Measure 3rd edition, J.Wiley, 1995.
6
Here and through the rest of this note ‘t’ is a standard designator for a real number for time.
More precisely, we use the subscripted variable, e.g. ‘t1 ’ to denote a specific time as the agent of
the problem is able to measure it. We presume that the agent has some real-valued “clock” that
quantifies a transitive relation of “is later than.” Subtleties about the differences between how time
is so indexed for different observers are relevant to one of Arntzenius' puzzles, to wit, the Prisoner's
Problem.
7
Proofs for these three are given in the “Appendix”. In this note, we assume that all probability is
countably additive.
two additional assumptions, (iii) and (iv) fail. It is not hard to locate where these
assumptions are violated in the examples that Arntzenius presents.
Result 11.1 When “later” is a stopping time, when the information sets of future
times form a filtration, and assuming that degrees of belief are updated by Temporal
Conditionalization, then Reflection between now and later follows.
Result 11.2 When the information known to the agent over time fails to form
a filtration, not only is Temporal Conditionalization vacuously satisfied (as its
antecedent fails), but then Reflection fails unless what is forgotten in the failure
of filtration becomes practically certain (its probability becomes 0 or 1) in time for
future predictions, later.
Result 11.3 However, if the information known to the agent over time forms
a filtration and Temporal Conditionalization holds, but “later” is not a stopping
time, then Reflection between now and later holds for the specific event A, i.e.,
Pnow(A | Plater(A) = r) = r, subject to the necessary and sufficient condition, (11.1),
below.
Let Ht be the event "t = later." When later is not a stopping time, the event Ht
is news to the agent making the forecasts. The question at hand is whether this
news is relevant to the forecasts expressed by Reflection. To answer that question,
concerning such forecasts about the event A, define the quantity yt (A) by
Thus, Reflection is satisfied between now and later if and only if (11.1) holds for
each A.
Next, we illustrate the second and third results with examples that show how
Reflection may fail.
… Notice that Plater(A) ≤ ½, no matter when T occurs, and Plater(A) < ½ for T > 0,
since if T > 0, the initial sequence of tosses that the agent observes all land tails
up. However, from the value of Plater(A) and knowing it is this quantity, one may
calculate T exactly and thus know the outcome of the n+1st toss, which is a heads.
But when the agent computes Plater(A) at the time later, he does not then know that
later has arrived. Thus, later, he is not in a position to use the extra information that
he would get from knowing when T occurs to learn the outcome of the n+1st toss.
To repeat the central point, T is not a stopping variable.
It is evident that Reflection fails, Pnow(A | Plater(A) = r) ≠ Plater(A). The extra
information, namely that Plater(A) = r rather than merely that Pt(A) = r where t is
the time on the agent's clock, is information that is relevant to his current probability
of A, since it reveals the outcome of the next toss. Even now, prior to any coin tosses,
when he computes Pnow(A | Plater(A) = r), the conditioning event reveals to him the
value of T, since n is a function of r. In this case, the conditioning event entails the
information of n and when the first heads occurs, namely, on the n+1st toss. Then
Reflection fails as

Pnow(A | Plater(A) = [1 + (3/2)^n]^(-1)) = [1 + 3^n/2^(n+1)]^(-1).
It remains only to see that (11.1) fails as well. Consider the quantity yt (A) used in
.Ht jPt .A/Dr&A/
condition (11.1). yt .A/ D Pnow
Pnow .Ht jPt .A/Dr/
. Given Pt (A) D r, the added information
that A obtains is relevant to the agent’s current probability when later occurs.
Specifically, as Pt(A) = [1 + (3/2)^n]^(-1) entails that t = n,
Pnow(Ht | Pt(A) = [1 + (3/2)^n]^(-1)) = Pnow(Xt+1 = 1 | Pt(A) = [1 + (3/2)^n]^(-1))
= (1/2)[1 + (3/2)^n]^(-1) + (1/4)(3/2)^n[1 + (3/2)^n]^(-1)
< 2^(-1)
= Pnow(Xt+1 = 1 | Pt(A) = [1 + (3/2)^n]^(-1) & A)
= Pnow(Ht | Pt(A) = [1 + (3/2)^n]^(-1) & A).
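The same toy setup from the earlier sketch also shows, in the spirit of this example and of Result 11.3, how Reflection can break when "later" is not a stopping time. The sketch below reuses the worlds, prior, and event A defined above and replaces the paper's random variable T with a simpler invented one: "later" is taken to be the time of the last tail, something the agent cannot recognize when it arrives.

```python
# Same two-toss setup as before, but "later" is now the time of the last tail
# (time 1 if both tosses land heads).  Whether time 1 is "later" depends on the
# as-yet-unseen second toss, so this "later" is not a stopping time.
def last_tail_time(w):
    return 2 if w[1] == "T" else 1

def p_later_bad(w):
    t = last_tail_time(w)
    info = {v for v in worlds if v[:t] == w[:t]}       # tosses seen by time t
    return sum(p[v] for v in info & A) / sum(p[v] for v in info)

for r in {p_later_bad(w) for w in worlds}:
    E = {w for w in worlds if p_later_bad(w) == r}
    lhs = sum(p[w] for w in E & A) / sum(p[w] for w in E)
    print(f"P_now(A | P_later(A) = {r}) = {lhs}")      # differs from r when r = 1/2
```

Conditioning on "Plater(A) = 1/2" now picks out exactly the worlds in A, so Pnow(A | Plater(A) = 1/2) = 1 ≠ 1/2.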
Acknowledgements Our research was carried out under NSF Grant DMS 0139911. We thank
Joseph Halpern for alerting one of us (T.S.) to the Sleeping Beauty problem, independent of
Arntzenius’ article.
8
See also J.Y.Halpern’s “Sleeping Beauty Reconsidered: Conditioning and Reflection in Asyn-
chronous Systems,” Dept. of Computer Science, Cornell University. September, 2003. We agree
with Halpern that, in our words, coherence of a sequence of previsions does not require that
they will be well calibrated – in a frequency sense of “well calibrated.” That is, we think it is
reasonable for Sleeping Beauty to give a prevision of ½ to the event that the known fair coin
landed heads on the flip in question, each time she is woken up. What complicates the analysis
is that the repeated trials in Sleeping Beauty’s game do not form an independent sequence, and
her mandated forgetfulness precludes any “feedback” about the outcome of past previsions. When
repeated trials are dependent and there is no learning about past previsions, coherent previsions
may be very badly calibrated in the frequency sense. For other examples and related discussion of
this point see, e.g., Seidenfeld, T. (1985) “Calibration, Coherence, and Scoring Rules,” Philosophy
of Science 52: 274–294.
Appendix
Proof of Result 11.1⁹ Assume that when X is a random variable and C is an event,
the agent's expected value EP(X) and conditional expected value EP(X | C) exist
with respect to the probability P. Let A be an event and let X = P(A | Y) be a random
variable, a function of the random variable Y. Then, as a consequence of the law of
total probability, with C also a function of Y,
Assume that the agent's degrees of belief now include his later degrees of belief as
objects of uncertainty. That is, future events such as "Plater(A) = r" and "Plater(A |
C) = q" are proper subjects, now, of the agent's current degrees of belief. Suppose
that, now, the agent anticipates using (i) Temporal Conditionalization in responding
to the new evidence Y = y that he knows he will learn at the stopping time, later.
For example, Y might be the result of a meter reading made at the later time,
with a sample space of m many possible values Y = {y1, ..., ym}. Thus, by (i),
for whichever value y of Y that results,
Then, by (i) and (11.2), for C also a function of Y, the agent now believes that
Let C be the event "Plater(A) = r," which we presume is a possible value for Plater(A)
from the agent's current point of view. (This C is a function of Y.) Then, because later
is a stopping time,
As
9
van Fraassen (1995) “Belief and the Problem of Ulysses and the Sirens,” Phil. Studies 77: 7–37,
argues (pp. 17–19) that Temporal Conditionalization implies Reflection. His argument (pp. 18–19)
has an additional, tacit assumption that the time t at which conditioning applies for Reflection is a
stopping time.
therefore
for a set of r-values of probability 1 under Pt1. But, since it is known at t1 that E
will be forgotten at t2, Pt1(0 < Pt2(E) < 1) = 1. Hence Reflection fails as 0 < r < 1
in (11.8).
Proof of Result 11.3 Assume that the agent’s information sets form a filtration over
time and that Temporal Conditionalization holds between now and later but that
later is not a stopping time for the agent. Let Ht be the event "later = t" for the
specific time t. That is, assume that 0 < Plater(Ht) < 1, when later occurs at t.
Later is the future time we will focus on in calculating whether Reflection holds,
i.e. we will inquire whether for each event A, Pnow(A | Plater(A) = r) = r, or not. We
calculate as follows.
Pnow(A | Plater(A) = r) = Σt Pnow(A & Ht | Plater(A) = r)
It is well known that the usual versions of probability kinematics have serious
limitations. According to the classical notion of conditioning, when one learns
a piece of information A its probability rises to its maximum (one). Moreover,
no further instance of learning will be capable of defeating A. Once a piece
of information is learned, one should be maximally confident about it, and this
confidence should remain unaltered forever. It is clear that there are many instances
of learning that cannot be accommodated in this Procrustean bed. There are various
ways of amending this limited picture by enriching the Bayesian machinery. For
example, one can appeal to a notion of primitive conditional probability capable of
making sense of conditioning on zero measure events. But the detailed consideration
of this alternative leads to similar limitations: the picture of learning that thus
arises continues to be cumulative. There are many ways of overcoming these
important limitations. Williamson considers one possible way of doing so in his
essay reprinted in the section on Bayesian epistemology. One of the lessons that
have been learned in recent years is that there is no apparent way of circumventing
this rigidity of Bayesianism without introducing in some way a qualitative doxastic
or epistemic notion as a primitive alongside probability. Here are two examples:
called maxi-choice sets, do satisfy this requirement. But then, if they are optimal
from this point of view, why not use a selection function that picks singletons
from K?A? AGM showed that this type of contraction is badly behaved. So is the
opposite idea of directly taking the intersection of the entire K?A. So partial meet
appears as an Aristotelian middle ground that happens to satisfy a set of intuitive
postulates. Or so argued AGM. Nevertheless, the subsequent discussion focused on
some controversial AGM postulates like recovery (requiring that if one contracts K
with A and then adds A to the result of this contraction one returns to K). There
are many putative counterexamples to recovery and this generated the interest in
defining notions of contraction that fail to satisfy recovery. Isaac Levi is a well-
known defender of this line of thought and in his article he characterizes a notion
of contractions that does fail to satisfy recovery. The central idea he proposes is that
what is minimized in contraction is not information loss but loss of informational
value. The notion of information value is some sort of epistemic utility obeying
basic structural postulates like:
(Weak Monotony) If X ⊆ Y, then V(X) ≤ V(Y).
This is an intuitive principle that makes it permissible for two sets to carry equal
informational value even when one of the sets carries more information than the other.
The additional information might not be valuable at all, and therefore the level of
informational value of the larger set might remain equal to the informational value
of the smaller set. What other constraints should one impose on informational value?
In the article reprinted here Levi presents a very specific form of information value
that he uses to characterize a particular notion of withdrawal (some rational notion of
contraction where recovery fails) that he calls mild contraction. Rott and Pagnucco
offered an alternative model of the same notion that they call severe withdrawal.
It is clear that when epistemic utility satisfies the constraints proposed by Levi
this particular form of contraction obtains. What seems to be missing is a pre-
systematic explanation of why epistemic utility should satisfy these constraints or a
justification of some controversial properties of severe withdrawal (like the postulate
of antitony). It is a true thought that the introduction of epistemic utility in models
of belief change opens up an insightful research strategy that at the moment remains
relatively unexplored.
Sven Ove Hansson offers another account of contraction that fails to obey
recovery. Nevertheless he arrives at this conclusion in a completely different way.
In fact, Hansson is one of the most prominent defenders of finite models of belief
in terms of belief bases (finite sets of sentences that are one of the possible
axiomatic bases of a given theory). It is easy to characterize a version of partial meet
contraction for bases by using the obvious variant of the definition used for theories.
Then one can proceed as follows: an operation of contraction on a belief set K is
generated by a partial meet base contraction if and only if there is a belief base B
for K and an operator of partial meet contraction for B such that the contraction
of K with A yields the logical consequences of B − A for all sentences A in the
underlying language. Hansson shows that if an operation on a belief set is generated
by some partial meet base contraction, then it satisfies the classical AGM postulates
for contraction except recovery. In addition the operation satisfies other postulates
encoding a specific notion of conservativity.
The article by Spohn articulates an important epistemological idea, namely that
one should focus on changes of entire epistemic states endowed with more structure
than mere belief. This approach, in a more general setting, is also independently
pursued by Adnan Darwiche and Judea Pearl in Darwiche and Pearl (1996). Spohn
focuses on a particular type of epistemic state that now is usually called a ranking
function. Roughly a ranking function is a function from the set of propositions
(= sets of possible worlds) to the set of natural, real, or ordinal numbers, similar to
a probability measure. Epistemologically one can see such functions as numerical
(but non-probabilistic) representations of a notion of plausibility. In the presence
of a new input the current ranking is mapped to a new ranking incorporating the
incoming information (in revision). This is an ideal setting to study the structure of
iterated changes of view and as a matter of fact both articles offer the best existing
articulation of principles regulating iterated change. This is an important area of
research in this field that still remains relatively open.
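As a small concrete sketch of the idea (the four worlds, their ranks, and the firmness parameter n below are invented for the illustration, and the update shown is the standard A,n-conditionalization usually associated with ranking functions):

```python
# A toy Spohn-style ranking function over four worlds, with A,n-conditionalization.
def rank(kappa, prop):
    """Rank of a proposition = minimum rank of its worlds (infinity if empty)."""
    return min((kappa[w] for w in prop), default=float("inf"))

def believes(kappa, prop, worlds):
    """A proposition is believed iff its complement has positive rank."""
    return rank(kappa, worlds - prop) > 0

def conditionalize(kappa, prop, n, worlds):
    """A,n-conditionalization: shift ranks so that prop gets rank 0 and its
    complement gets rank n."""
    rA, rnotA = rank(kappa, prop), rank(kappa, worlds - prop)
    return {w: (kappa[w] - rA) if w in prop else (kappa[w] - rnotA + n)
            for w in worlds}

worlds = {0, 1, 2, 3}
kappa = {0: 0, 1: 1, 2: 2, 3: 3}         # world 0 is the most plausible
A = {2, 3}                                # incoming information
new = conditionalize(kappa, A, 2, worlds)
print(new, believes(new, A, worlds))      # A is now believed with firmness 2
```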
There are a number of recent surveys and books that complement the reprinted papers appearing
here. Regarding surveys the two most recent surveys are: Logic of Belief Revision, in Stanford
Encyclopedia of Philosophy, 2006, by Sven Ove Hansson; and: Belief Revision in The Continuum
Companion to Philosophical Logic, (eds.) L. Hornsten and R. Pettigrew, by Horacio Arlo-Costa
and Paul Pedersen. These surveys contain references to previous surveys in the field. A classic
book in this area that continues to be useful is Peter Gärdenfors’s monograph: Knowledge in Flux:
Modeling the Dynamics of Epistemic States, College Publications (June 2, 2008). A very useful
textbook presentation of some of the main results in the theory of belief change is: A Textbook of
Belief Dynamics: Theory Change and Database Updating, Springer 2010, by Sven Ove Hansson.
The book focuses mainly on syntactic presentations of belief change and it contains a very detailed
presentation of belief base updating. Some more recent topics like iterated belief change are not
treated in detail though.
Decision theoretic foundations for belief change are provided in various books by Hans
Rott and Isaac Levi (independently). A book-length argument articulating Rott’s account (and
extending the content of the article reprinted here) appears in: Change, Choice and Inference:
A Study of Belief Revision and Non-monotonic Reasoning, Oxford Logic Guides, 2001. Some
challenges to this type of foundational strategy are considered by Arlo-Costa and Pedersen in:
“Social Norms, Rational Choice and Belief Change,” in Belief Revision Meets Philosophy of
Science, (eds.) E.J. Olsson and S. Enqvist, Springer, 2011. Isaac Levi has also published various
essays where he presents decision theoretic foundations for belief change (but his account is rather
different than Rott’s). The most recent book presenting Levi’s current views about belief change
is: Mild Contraction: Evaluating Loss of Information Due to Loss of Belief, Oxford, 2004. Further
references to his work can be found in this book.
The previous accounts tried to justify principles of belief change in the broader context of
Bayesian or neo-Bayesian theory. An almost orthogonal view consists in deriving principles of
belief change by taking some form of formal learning theory as an epistemological primitive. While
all the previous accounts focused on justifying the next step of inquiry (or a finite and proximate
sequence of steps) this second strategy focuses on selecting belief change methods capable of
learning the truth in the long run. One important paper in this tradition is Kevin Kelly’s: Iterated
Belief Revision, Reliability, and Inductive Amnesia, Erkenntnis, 50, 1998 pp. 11–58. Daniel
Osherson and Eric Martin present a similarly motivated account that nevertheless is formally quite
different from Kelly’s theory in: Elements of Scientific Inquiry, MIT, 1998.
There are various attempts to extend the theory of belief revision to the multi-agent case
and to present a theory of belief change as some form of dynamic epistemic logic. The idea in
this case is to use traditional formal tools in epistemic logic to represent the process of belief
change. Hans van Ditmarsch, Wiebe van der Hoek, and Barteld Kooi have recently published a
textbook with some basic results in this area: Dynamic Epistemic Logic, Springer, 2011. Krister
Segerberg has developed his own brand of dynamic doxastic logic in a series of articles since at
least the mid 1990’s. One recent paper including rather comprehensive results in this area is: “Some
Completeness Theorems in the Dynamic Doxastic Logic of Iterated Belief Revision,” Review of
Symbolic Logic, 3(2):228–246, 2010.
The notion of relevance is quite central for a representation of belief and belief change. In a
Bayesian setting there are standard ways of articulating relevance. But there is recent work that has
used proof theoretic techniques to deal with relevance rather than probability theory. Rohit Parikh
initiated this area of research with an article published in 1999: Beliefs, belief revision, and splitting
languages, Logic, language, and computation (Stanford, California) (Lawrence Moss, Jonathan
Ginzburg, and Maarten de Rijke, editors), vol. 2, CSLI Publications, pp. 266–278. Recently David
Makinson, in collaboration with George Kourousias, has also contributed an important article:
Parallel interpolation, splitting, and relevance in belief change, Journal of Symbolic Logic, 72,
September 2007, 994–1002. This article contains a detailed bibliography of recent work in this
area.
For more on iterated belief revision, see Darwiche, A., & Pearl, J. (1996). On the logic of iterated
belief revision. Artificial Intelligence, 89, 1–29. More is also to be found in Pagnucco, M., & Rott,
H. (1999). Severe withdrawal – and recovery. Journal of Philosophical Logic, 28, 501–547 (see
publisher's "Erratum" (2000), Journal of Philosophical Logic, 29, 121), and in Lindström, S. (1991).
A semantic approach to nonmonotonic reasoning: Inference operations and choice. Uppsala Prints
and Preprints in Philosophy, no. 6/1991, University of Uppsala.
Chapter 13
On the Logic of Theory Change: Partial Meet
Contraction and Revision Functions
1 Background
The simplest and best known form of theory change is expansion, where a new
proposition (axiom), hopefully consistent with a given theory A, is set-theoretically
added to A, and this expanded set is then closed under logical consequence.
There are, however, other kinds of theory change, the logic of which is less well
understood. One form is theory contraction, where a proposition x, which was
earlier in a theory A, is rejected. When A is a code of norms, this process is
known among legal theorists as the derogation of x from A. The central problem
is to determine which propositions should be rejected along with x so that the
contracted theory will be closed under logical consequence. Another kind of change
is revision, where a proposition x, inconsistent with a given theory A, is added to
A under the requirement that the revised theory be consistent and closed under
logical consequence. In normative contexts this kind of change is also known as
amendment.
mean, as is customary, a set A of propositions that is closed under Cn; that is, such
that A = Cn(A), or, equivalently, such that A = Cn(B) for some set B of propositions.
As in (Alchourron and Makinson 1982b), we assume that Cn includes classical
tautological implication, is compact (that is, y ∈ Cn(X′) for some finite subset X′
of X whenever y ∈ Cn(X)), and satisfies the rule of "introduction of disjunctions in
the premises" (that is, y ∈ Cn(X ∪ {x1 ∨ x2}) whenever y ∈ Cn(X ∪ {x1}) and y ∈
Cn(X ∪ {x2})). We say that a set X of propositions is consistent (modulo Cn) iff for
no proposition y do we have y & ¬y ∈ Cn(X).
and then the contraction A P x contains the propositions which are common to the
selected elements of A ? x. Partial meet revision is defined via the Levi identity as
A u x = Cn((A P ¬x) ∪ {x}). Note that the identity of A P x and A u x depends on
the choice function γ, as well, of course, as on the underlying consequence operation
Cn. Note also that the concept of partial meet contraction includes, as special cases,
those of maxichoice contraction and (full) meet contraction. The former is partial
meet contraction with γ(A ? x) a singleton; the latter is partial meet contraction with
γ(A ? x) the entire set A ? x. We use the same symbols P and u here as for the
maxichoice operations in (Alchourron and Makinson 1982b); this should not cause
any confusion.
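For readers who want to experiment, here is a minimal brute-force sketch of these definitions. It is only an illustration under simplifying assumptions: propositions are modelled as sets of four possible worlds, Cn is the induced semantic consequence operation, remainder sets are enumerated exhaustively, and the particular selection function gamma, based on an invented plausibility ordering of worlds, is just one admissible choice among many.

```python
# Brute-force partial meet contraction and revision over a tiny semantic logic.
from itertools import combinations

WORLDS = frozenset(range(4))                      # four possible worlds

def all_props():
    ws = sorted(WORLDS)
    return [frozenset(c) for r in range(len(ws) + 1) for c in combinations(ws, r)]

def models(X):
    """Worlds satisfying every proposition in X (all worlds if X is empty)."""
    m = set(WORLDS)
    for prop in X:
        m &= prop
    return frozenset(m)

def Cn(X):
    """All propositions entailed by X."""
    m = models(X)
    return frozenset(prop for prop in all_props() if m <= prop)

def entails(X, x):
    return models(X) <= x

def remainders(A, x):
    """The family A ? x: maximal subsets of A that do not entail x."""
    if entails(frozenset(), x):                   # x a tautology: A ? x is empty
        return []
    cands = [frozenset(c) for r in range(len(A) + 1)
             for c in combinations(sorted(A, key=sorted), r)
             if not entails(c, x)]
    return [B for B in cands if not any(B < C for C in cands)]

def gamma(rem, A):
    """Illustrative selection function: keep the remainders whose models contain
    the most plausible world on offer (lower index = more plausible)."""
    if not rem:                                   # limiting case: gamma = {A}
        return [A]
    best = min(min(models(B)) for B in rem)
    return [B for B in rem if best in models(B)]

def contract(A, x):                               # partial meet contraction
    return frozenset.intersection(*gamma(remainders(A, x), A))

def revise(A, x):                                 # Levi identity
    return Cn(set(contract(A, WORLDS - x)) | {x})
```

Because this gamma is induced by a fixed ordering of worlds, it is in effect a relational selection function in the sense studied later in the paper, but nothing in the sketch depends on that particular choice.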
Our first task is to show that all partial meet contraction and revision func-
tions satisfy Gärdenfors’ postulates for contraction and revision. We recall (cf.
(Alchourron and Makinson 1982b) and (Makinson 1985)) that these postulates may
conveniently be formulated as follows:
( P 1) A P x is a theory whenever A is a theory (closure).
( P 2) A P x ⊆ A (inclusion).
( P 3) If x ∉ Cn(A), then A P x = A (vacuity).
( P 4) If x ∉ Cn(∅), then x ∉ Cn(A P x) (success).
( P 5) If Cn(x) = Cn(y), then A P x = A P y (preservation).
( P 6) A ⊆ Cn((A P x) ∪ {x}) whenever A is a theory (recovery).
The Gärdenfors postulates for revision may likewise be conveniently formulated
as follows:
(u1) A u x is always a theory.
(u2) x ∈ A u x.
(u3) If ¬x ∉ Cn(A), then A u x = Cn(A ∪ {x}).
(u4) If ¬x ∉ Cn(∅), then A u x is consistent under Cn.
(u5) If Cn(x) = Cn(y), then A u x = A u y.
(u6) (A u x) ∩ A = A P ¬x, whenever A is a theory.
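Continuing the sketch above, the contraction postulates can be spot-checked numerically on one small instance; a passing check on one example proves nothing, of course, and Observation 2.3 below supplies the general argument.

```python
# A quick numeric spot-check of closure, inclusion, success, and recovery,
# using the toy partial meet contraction defined in the earlier sketch.
q = frozenset({0, 1})                       # a sample (non-tautological) proposition
A = Cn({q})                                 # the theory Cn({q})
C = contract(A, q)

assert C == Cn(C)                           # closure
assert C <= A                               # inclusion
assert not entails(C, q)                    # success
assert A <= Cn(set(C) | {q})                # recovery
```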
Our first lemma tells us that even the very weak operation of (full) meet
contraction satisfies recovery.
Lemma 2.1 Let A be any theory. Then A ⊆ Cn((A ∼ x) ∪ {x}).
Proof In the limiting case that x ∉ A we have A ∼ x = A and we are done. Suppose
x ∈ A. Then, by Observation 2.1 of (Alchourron and Makinson 1982b), we have
A ∼ x = A ∩ Cn(¬x), so it will suffice to show A ⊆ Cn((A ∩ Cn(¬x)) ∪ {x}). Let a
∈ A. Then since A is a theory, ¬x ∨ a ∈ A. Also ¬x ∨ a ∈ Cn(¬x), so ¬x ∨ a ∈ A
∩ Cn(¬x), so since Cn includes tautological implication, a ∈ Cn((A ∩ Cn(¬x)) ∪
{x}).
Corollary 2.2 Let P be any function on pairs A, x. Let A be any theory. If P is
bounded over A, then P satisfies recovery over A.
Observation 2.3 Every partial meet contraction function P satisfies the Gärden-
fors postulates for contraction, and its associated partial meet revision function
satisfies the Gärdenfors postulates for revision.
Proof It is easy to show (cf. (Gärdenfors 1978) and (Gärdenfors 1982)) that the
postulates for revision can all be derived from those for contraction via the Levi
identity. So we need only verify the postulates for contraction. Closure holds,
because when A is a theory, so too is each B ∈ A ? x, and the intersection of
theories is a theory; inclusion is immediate; vacuity holds because when x ∉ Cn(A)
then A ? x = {A} so γ(A ? x) = {A}; success holds because when x ∉ Cn(∅) then
by compactness, as noted in Observation 2.2 of (Alchourron and Makinson 1982a),
A ? x is nonempty and so A P x = ∩γ(A ? x) ⊬ x; and preservation holds because
the choice function is defined on families A ? x rather than simply on pairs A, x,
so that when Cn(x) = Cn(y) we have A ? x = A ? y, so that γ(A ? x) = γ(A ?
y). Finally, partial meet contraction is clearly bounded over any set A, and so by
Corollary 2.2 satisfies recovery.
In fact, we can also prove a converse to Observation 2.3, and show that for
theories, the Gärdenfors postulates for contraction fully characterize the class of
partial meet contraction functions. To do this we first establish a useful general
lemma related to 7.2 of (Alchourron and Makinson 1982b).
Lemma 2.4 Let A be a theory and x a proposition. If B ∈ A ? x, then B ∈ A ? y for
all y ∈ A such that B ⊬ y.
Proof Suppose B ∈ A ? x and B ⊬ y, y ∈ A. To show that B ∈ A ? y it will suffice
to show that whenever B ⊂ B′ ⊆ A, then B′ ⊢ y. Let B ⊂ B′ ⊆ A. Since B ∈ A ? x we
have B′ ⊢ x. But also, since B ∈ A ? x, A ? x is nonempty, so A ∼ x = ∩(A ? x) ⊆ B;
so, using Lemma 2.1, A ⊆ Cn(B ∪ {x}) ⊆ Cn(B′ ∪ {x}) = Cn(B′), so since y ∈ A we
have B′ ⊢ y.
Observation 2.5 Let P be a function defined for sets A of propositions and
propositions x. For every theory A, P is a partial meet contraction operation over A
iff P satisfies the Gärdenfors postulates ( P 1)–( P 6) for contraction over A.
Proof We have left to right by Observation 2.3. For the converse, suppose that
P satisfies the Gärdenfors postulates over A. To show that P is a partial meet
contraction operation, it will suffice to find a function γ such that:
(i) γ(A ? x) = {A} in the limiting case that A ? x is empty,
(ii) γ(A ? x) is a nonempty subset of A ? x when A ? x is nonempty, and
(iii) A P x = ∩γ(A ? x).
Put γ(A ? x) to be {A} when A ? x is empty, and to be {B ∈ A ? x : A P x ⊆ B}
otherwise. Then (i) holds immediately. When A ? x is nonempty, then x ∉ Cn(∅) so
by the postulate of success A P x ⊬ x, so, using compactness, γ(A ? x) is nonempty,
and clearly γ(A ? x) ⊆ A ? x, so (ii) also holds. For (iii) we have the inclusion
A P x ⊆ ∩γ(A ? x) immediately from the definition of γ. So it remains only to
show that ∩γ(A ? x) ⊆ A P x.
In the case that x ∉ A we have by the postulate of vacuity that A P x = A, so the
desired conclusion holds trivially. Suppose then that x ∈ A, and suppose a ∉ A P x;
we want to show that a ∉ ∩γ(A ? x). In the case a ∉ A, this holds trivially, so we
Gärdenfors (1984) has suggested that revision should also satisfy two further
"supplementary postulates", namely:
(u7) A u (x & y) ⊆ Cn((A u x) ∪ {y}) for any theory A, and its conditional converse:
(u8) Cn((A u x) ∪ {y}) ⊆ A u (x & y) for any theory A, provided that ¬y ∉ A u x.
Given the presence of the postulates ( P 1)–( P 6) and (u1)–(u6), these two
supplementary postulates for u can be shown to be equivalent to various conditions
on P. Some such conditions are given in (Gärdenfors 1984); these can however
be simplified, and one particularly simple pair, equivalent respectively to (u7) and
(u8), are:
( P 7) (A P x) ∩ (A P y) ⊆ A P (x & y) for any theory A.
( P 8) A P (x & y) ⊆ A P x whenever x ∉ A P (x & y), for any theory A.
Observation 3.1 Let P be any partial meet contraction operation over a theory A.
Then it satisfies ( P 7) iff it satisfies (u7).
Proof We recall that u is defined by the Levi identity A u x = Cn((A P ¬x) ∪ {x}).
Let A be any theory and suppose that ( P 7) holds for all x and y. We want to show
that (u7) holds for all x and y. Let
But the former is given by hypothesis, so we need only verify the latter. Now by the
former, we have w ∈ Cn(A ∪ {x & y}), so it will suffice to show that
we have
But by recovery we also have a ∈ Cn((A P (x & y)) ∪ {x & y}), so, again using
disjunction of premises,
a ∈ Cn(A P (x & y)) = A P (x & y).
This inclusion justifies the inclusion in the following chain, whose other steps are
trivial:
Cn((A u x) ∪ {y}) = Cn(Cn((A P ¬x) ∪ {x}) ∪ {y})
= Cn((A P ¬x) ∪ {x & y}) ⊆ Cn((A P ¬(x & y)) ∪ {x & y})
= A u (x & y).
For the converse, suppose (u8) holds for all x and y, and suppose x ∉ A P (x & y).
Then clearly
Thus, since A P (x & y) is included in the leftmost term of this series, we have
separately, are included in A P (x & y). But it goes close to it, for it does yield
the following “partial antitony” property.
Observation 3.3 Let P be any partial meet contraction function over a theory A.
Then P satisfies ( P 7) iff it satisfies the condition
( P P) (A P x) ∩ Cn(x) ⊆ A P (x & y) for all x and y.
Proof Suppose ( P 7) is satisfied. Suppose w ∈ A P x and x ⊢ w; we want to show
that w ∈ A P (x & y). If x ∉ A or y ∉ A, then trivially A P (x & y) = A, so w ∈ A P
(x & y). So suppose that x ∈ A and y ∈ A. Now
also be shown by an example (briefly described at the end of next section) that the
converse implication likewise fails. The question nevertheless remains whether there
are relational constraints on the partial meet operations that correspond, perfectly or
in part, to the supplementary postulates ( P 7) and ( P 8). That is the principal theme
of the next section.
γ(A ? x) = {B ∈ A ? x : B′ ≤ B for all B′ ∈ A ? x}.
Roughly speaking, γ is relational over A iff there is some relation ≤ that marks off
the elements of γ(A ? x) as the best elements of A ? x, whenever the latter is
nonempty. Note that in this definition, ≤ is required to be fixed for all choices of x;
otherwise all partial meet contraction functions would be trivially relational. Note
also that the definition does not require any special properties of ≤ apart from being
a relation; if there is a transitive relation ≤ such that for all x ∉ Cn(∅) the marking
off identity holds, then γ is said to be transitively relational over A. Finally, we say
that a partial meet contraction function P is relational (transitively relational) over
A iff it is determined by some selection function that is so. "Some", because a single
partial meet contraction function may, in the infinite case, be determined by two
distinct selection functions. In the finite case, however, this cannot happen, as we
shall show in Observation 4.6.
Relationality is linked with supplementary postulate ( P 7), and transitive relation-
ality even more closely linked with the conjunction of ( P 7) and ( P 8). Indeed, we
shall show, in the first group of results of this section, that a partial meet contraction
function P is transitively relational iff ( P 7) and ( P 8) are both satisfied. In the later
part of this section we shall describe the rather more complex relationship between
relationality and ( P 7) considered alone. It will be useful to consider various further
conditions, and two that are of immediate assistance are:
(γ7) γ(A ? x & y) ⊆ γ(A ? x) ∪ γ(A ? y) for all x and y.
(γ8) γ(A ? x) ⊆ γ(A ? x & y) whenever A ? x ∩ γ(A ? x & y) ≠ ∅.
As with ( P 8), it is easy to show that when A is a theory and γ is a selection
function over A, then (γ8) can equivalently be formulated as
γ(A ? x) ⊆ γ(A ? x & y) whenever A ? x ∩ γ(A ? y) ≠ ∅.
= A P (x & y).
Suppose now that (γ8) holds, and suppose x ∉ A P (x & y); that is, x ∉ ∩γ(A ? x
& y). We need to show that A P (x & y) ⊆ A P x. In the case x ∉ A we have A P (x
& y) = A = A P x. So suppose x ∈ A. Since x ∉ ∩γ(A ? x & y) there is a B ∈ γ(A
? x & y) with B ⊬ x, so, by Lemma 2.4, B ∈ A ? x and thus B ∈ A ? x ∩ γ(A ? x
& y). Applying (γ8) we have γ(A ? x) ⊆ γ(A ? x & y), so A P (x & y) = ∩γ(A ?
x & y) ⊆ ∩γ(A ? x) = A P x as desired.
Observation 4.3 Let A be any theory and γ a selection function for A. If γ is
relational over A then γ satisfies the condition (γ7), and if γ is transitively relational
over A, then γ satisfies the condition (γ8).
Proof In the cases that x ∈ Cn(∅), y ∈ Cn(∅), x ∉ A and y ∉ A, both (γ7) and (γ8)
hold trivially, so we may suppose that x ∉ Cn(∅), y ∉ Cn(∅), x ∈ A and y ∈ A.
Suppose γ is relational over A, and suppose B ∈ γ(A ? x & y). Now γ(A ? x &
y) ⊆ A ? x & y = A ? x ∪ A ? y, so B ∈ A ? x or B ∈ A ? y; consider the former
case, as the latter is similar. Let B′ ∈ A ? x. Then B′ ∈ A ? x ∪ A ? y = A ? x &
y, and so B′ ≤ B since B ∈ γ(A ? x & y) and γ is relational over A; and thus, by
relationality again, B ∈ γ(A ? x) ⊆ γ(A ? x) ∪ γ(A ? y) as desired.
Suppose now that γ is transitively relational over A, and suppose A ? x ∩ γ(A ?
x & y) ≠ ∅. Suppose for reductio ad absurdum that there is a B ∈ γ(A ? x) with B
∉ γ(A ? x & y). Since B ∈ γ(A ? x) ⊆ A ? x ⊆ A ? x & y by Lemma 4.1, whilst B
∉ γ(A ? x & y), we have by relationality that there is a B′ ∈ A ? x & y with B′ ≰ B.
Now by the hypothesis A ? x ∩ γ(A ? x & y) ≠ ∅, there is a B″ ∈ A ? x with B″
∈ γ(A ? x & y). Hence by relationality B′ ≤ B″ and also B″ ≤ B. Transitivity gives
us B′ ≤ B and thus a contradiction.
When A is a theory and γ is a selection function for A, we define γ*, the
completion of γ, by putting γ*(A ? x) = {B ∈ A ? x : ∩γ(A ? x) ⊆ B} for all x
∉ Cn(∅), and γ*(A ? x) = γ(A ? x) = {A} in the limiting case that x ∈ Cn(∅).
It is easily verified that γ* is also a selection function for A, and determines the
same partial meet contraction function as γ does. Moreover, we clearly have γ(A
? x) ⊆ γ*(A ? x) = γ**(A ? x) for all x. This notion is useful in the formulation of
the following statement:
Observation 4.4 Let A be any theory, and P a partial meet contraction function
over A, determined by a selection function γ. If P satisfies the conditions ( P 7) and
( P 8) then γ* is transitively relational over A.
Proof Define the relation ≤ over 2^A as follows: for all B, B′ ∈ 2^A, B′ ≤ B iff either
B′ = B = A, or the following three all hold:
(i) B′ ∈ A ? x for some x ∈ A.
(ii) B ∈ A ? x and A P x ⊆ B for some x ∈ A.
(iii) For all x, if B′, B ∈ A ? x and A P x ⊆ B′, then A P x ⊆ B.
We need to show that the relation ≤ is transitive, and that it satisfies the marking
off identity γ*(A ? x) = {B ∈ A ? x : B′ ≤ B for all B′ ∈ A ? x} for all x ∉ Cn(∅).
For the identity, suppose first that B ∈ γ*(A ? x) ⊆ A ? x since x ∉ Cn(∅). Let B′
∈ A ? x; we need to show that B′ ≤ B. If x ∉ A then B′ = B = A so B′ ≤ B. Suppose
that x ∈ A. Then clearly conditions (i) and (ii) are satisfied. Let y be any proposition,
and suppose B′, B ∈ A ? y and A P y ⊆ B′; we need to show that A P y ⊆ B. Now by
covering, which we have seen to follow from ( P 8), either A P (x & y) ⊆ A P x or A
P (x & y) ⊆ A P y. And in the latter case A P (x & y) ⊆ A P y ⊆ B′ ∈ A ? x so x ∉
A P (x & y); so by ( P 8) again A P (x & y) ⊆ A P x. Thus in either case A P (x &
y) ⊆ A P x. Now suppose for reductio ad absurdum that there is a w ∈ A P y with w
∉ B. Then y ∨ w ∈ A P y and so since y ⊢ y ∨ w we have by ( P 7) using Observation
3.3 that y ∨ w ∈ A P (x & y) ⊆ A P x = ∩γ*(A ? x) ⊆ B; so y ∨ w ∈ B. But also
since B ∈ A ? y and w ∉ B and w ∈ A, we have B ∪ {w} ⊢ y, so ¬w ∨ y ∈ B. Putting
these together gives us (y ∨ w) & (y ∨ ¬w) ∈ B, so y ∈ B, contradicting B ∈ A ? y.
For the converse, suppose B ∉ γ*(A ? x) and B ∈ A ? x; we need to find a B′ ∈ A
? x with B′ ≰ B. Clearly the supposition implies that x ∈ A, so B ≠ A. Since B ∈ A
? x, the latter is nonempty, so γ*(A ? x) is nonempty; let B′ be one of its elements.
Noting that B′, B ∈ A ? x, B′ ∈ γ*(A ? x), but B ∉ γ*(A ? x), we see that condition
(iii) fails, so that B′ ≰ B, as desired.
Finally, we check out transitivity. Suppose B″ ≤ B′ and B′ ≤ B; we want to show
that B″ ≤ B. In the case that B = A then clearly since B′ ≤ B we have B′ = B = A,
and thus since B″ ≤ B′ we have B″ = B′ = A, so B″ = B = A and B″ ≤ B. Suppose
for the principal case that B ≠ A. Then since B′ ≤ B, clearly B′ ≠ A. Since B′ ≤ B we
have B ∈ A ? w and A P w ⊆ B for some w ∈ A, so (ii) is satisfied. Since B″ ≤ B′
we have B″ ∈ A ? w for some w ∈ A, so (i) is satisfied. It remains to verify (iii).
Suppose B″, B ∈ A ? y and A P y ⊆ B″; we need to show that A P y ⊆ B. First, note
that since B ≠ A by the condition of the case, we have y ∈ A. Also, since B″ ≤ B′
and B′ ≠ A, there is an x ∈ A with B′ ∈ A ? x and A P x ⊆ B′. Since x, y ∈ A we have
by Lemma 4.1 that A ? x & y = A ? x ∪ A ? y, so B″, B′, B ∈ A ? x & y. Now
by covering, either A P (x & y) ⊆ A P y or A P (x & y) ⊆ A P x. The former case
gives us A P (x & y) ⊆ B″, so since B″ ≤ B′ and B′ ≠ A we have A P (x & y) ⊆ B′,
(γ7), then P satisfies ( P 7), and it is not difficult to show, by an argument similar to
that of 4.6, that:
Observation 4.8 If A is a theory finite modulo Cn, and P a partial meet contraction
function over A determined by a selection function γ, then P satisfies ( P 7) iff γ
satisfies (γ7). Also, P satisfies ( P 8) iff γ satisfies (γ8).
But on the other hand, even in the finite case, (γ7) does not imply the relationality
of γ or of P:
Observation 4.9 There is a theory A, finite modulo Cn, with a partial meet
contraction function P over A, determined by a selection function γ, such that P
satisfies (γ7), but P is not relational over A.
Sketch of Proof Take the sixteen-element Boolean algebra, take an atom a0 of this
algebra, and put A to be the principal filter determined by a0. This will be an eight-
element structure, lattice-isomorphic to the Boolean algebra of eight elements. We
take Cn in the natural way, putting Cn(X) = {x : ∧X ≤ x}. We label the eight elements
of A as a0, ..., a7, where a0 is already defined, a1, a2, a3 are the three atoms
of A (not of the entire Boolean algebra), a4, a5, a6 are the three dual atoms of A,
and a7 is the greatest element of A (i.e. the unit of the Boolean algebra). For each
i ≤ 7, we write !ai for {aj ∈ A : ai ≤ aj}. We define γ by putting γ(A ? a7) = γ(A
? Cn(∅)) = {A} = {!a0} as required in this limiting case, γ(A ? aj) = A ? aj for all
j with 1 ≤ j < 7, and γ(A ? a0) = {!a1}. Then it is easy to verify that for all ai ∉
Cn(∅), γ(A ? ai) is a nonempty subset of A ? ai, so γ is a selection function for
A. By considering cases we easily verify (γ7) (and thus also by 4.2 ( P 7)); and by
considering the role of !a2 it is easy to verify that γ (and hence by 4.6, P itself) is
not relational over A.
The question thus arises whether there is a condition on P or on γ that is
equivalent to the relationality of P or of γ respectively. We do not know of any
such condition for P, but there is one for γ, of an infinitistic nature. It is convenient,
in this connection, to consider a descending series of conditions, as follows:
(γ7.I) A ? x ∩ ⋂i∈I γ(A ? yi) ⊆ γ(A ? x), whenever A ? x ⊆ ⋃i∈I A ? yi.
(γ7.n) A ? x ∩ γ(A ? y1) ∩ ... ∩ γ(A ? yn) ⊆ γ(A ? x), whenever A ? x ⊆ A ? y1
∪ ... ∪ A ? yn, for all n ≥ 1.
(γ7.2) A ? x ∩ γ(A ? y1) ∩ γ(A ? y2) ⊆ γ(A ? x), whenever A ? x ⊆ A ? y1 ∪ A
? y2.
(γ7.1) A ? x ∩ γ(A ? y) ⊆ γ(A ? x), whenever A ? x ⊆ A ? y.
Observation 4.10 Let A be any theory and γ a selection function over A. Then γ is
relational over A iff (γ7.I) is satisfied. Moreover, we have (γ7.I) → (γ7.n) ↔
(γ7.2) → (γ7.1) ↔ (γ7). On the other hand, (γ7.1) does not imply (γ7.2), even in
the finite case; although in the finite case, (γ7.n) is equivalent to (γ7.I).
Sketch of Proof Writing (γR) for "γ is relational over A", we show first that
(γR) → (γ7.I). Suppose (γR), and suppose A ? x ⊆ ⋃i∈I A ? yi. Suppose B ∈ A
? x ∩ ⋂i∈I γ(A ? yi). We need to show that B ∈ γ(A ? x). Since B ∈ γ(A ? yi)
for all i ∈ I, we have by relationality that B′ ≤ B for all B′ ∈ A ? yi, for all i ∈ I; so,
by the supposition, B′ ≤ B for all B′ ∈ A ? x. Hence, since B ∈ A ? x, so that also
x ∉ Cn(∅), we have by relationality that B ∈ γ(A ? x). To show the converse (γ7.I)
→ (γR), suppose (γ7.I) holds, and define ≤ over 2^A by putting B′ ≤ B iff there
is an x with B ∈ γ(A ? x) and B′ ∈ A ? x; we need to verify the marking off identity.
The left to right inclusion of the marking off identity is immediate. For the right to
left, suppose B ∈ A ? x and for all B′ ∈ A ? x, B′ ≤ B. Then by the definition of ≤,
for all Bi ∈ {Bi}i∈I = A ? x there is a yi with B ∈ γ(A ? yi) and Bi ∈ A ? yi. Since Bi
∈ A ? yi, for all Bi ∈ A ? x, we have A ? x ⊆ ⋃i∈I A ? yi, so we may apply (γ7.I).
But clearly B ∈ A ? x ∩ ⋂i∈I γ(A ? yi). Hence by (γ7.I) we have B ∈ γ(A
? x), as desired.
The implications (γ7.I) → (γ7.n) → (γ7.2) → (γ7.1) are trivial, as is the
equivalence of (γ7.I) to (γ7.n) in the finite case. To show that (γ7.2) implies
the more general (γ7.n), it suffices to show that for all n ≥ 2, (γ7.n) → (γ7.n+1):
this can be done using the fact that when yn, yn+1 ∈ A, A ? yn ∪ A ? yn+1 = A ?
(yn & yn+1) by Lemma 4.1.
To show that (γ7.1) → (γ7), recall from 4.1 that when x, y ∈ A, then A ? x &
y = A ? x ∪ A ? y; so A ? x ⊆ A ? x & y, and so, by (γ7.1), (A ? x) ∩ γ(A ? x
& y) ⊆ γ(A ? x). Similarly (A ? y) ∩ γ(A ? x & y) ⊆ γ(A ? y). Forming unions on
left and right, distributing on the left, and applying 4.1 gives us γ(A ? x & y) ⊆ γ(A
? x) ∪ γ(A ? y) as desired.
To show conversely that (γ7) → (γ7.1), suppose (γ7) is satisfied, suppose A ?
x ⊆ A ? y and consider the principal case that x, y ∈ A. Then using compactness we
have y ⊢ x, so Cn(y) = Cn(x & (¬x ∨ y)), so by (γ7)
γ(A ? y) ⊆ γ(A ? x) ∪ γ(A ? ¬x ∨ y);
5 Remarks on Connectivity
all (up to equivalence modulo Cn) the elements a ∈ A such that B ⊬ a. Now A ? b′ &
b = A ? b′ ∪ A ? b = {B′, B} by Lemma 4.1, and so since γ is a selection function,
γ(A ? b′ & b) is a nonempty subset of {B′, B}, which implies that either B′ or B is
in γ(A ? b′ & b). In the former case we have B ≤ B′, and in the latter case we have
the converse.
Corollary 5.3 Let A be a theory finite modulo Cn, and let P be a partial meet
contraction function over A. Then P is relational iff it is connectively relational.
Proof Immediate from 5.2.
The first topic of this section will be a brief investigation of the consequences of the
following rather strong fullness condition:
( P F) If y ∈ A and y ∉ A P x, then ¬y ∨ x ∈ A P x, for any theory A.
From the results in Gärdenfors (1982), it follows that if P is a partial meet
contraction function, then this condition (there called ( 6)) is equivalent with the
following condition (called (21) in Gärdenfors (1982)) on partial meet revision
functions:
(uF) If y ∈ A and y ∉ A u x, then ¬y ∈ A u x, for any theory A.
The strength of the condition ( P F) is shown by the following simple representa-
tion theorem:
Observation 6.1 Let P be any partial meet contraction function over a theory A.
Then P satisfies ( P F) iff P is a maxichoice contraction function.
Proof Suppose P satisfies ( P F). Suppose B, B′ ∈ γ(A ? x) and assume for
contradiction that B ≠ B′. There is then some y ∈ B′ such that y ∉ B. Hence y ∉
A P x and since y ∈ A it follows from ( P F) that ¬y ∨ x ∈ A P x. Hence ¬y ∨ x
∈ B′, but since y ∈ B′ it follows that x ∈ B′, which contradicts the assumption that
B′ ∈ A ? x. We conclude that B = B′ and hence that P is a maxichoice contraction
function.
For the converse, suppose that P is a maxichoice contraction function and
suppose that y ∈ A and y ∉ A P x. Since A P x = B for some B ∈ A ? x, it follows
that y ∉ B. So by the definition of A ? x, x ∈ Cn(B ∪ {y}). By the properties of
the consequence operation we conclude that ¬y ∨ x ∈ B = A P x, and thus ( P F) is
satisfied.
In addition to this representation theorem for maxichoice contraction functions,
we can also prove another one based on the following primeness condition.
= A P x ∩ A P y;
The converse can be proven via the representation theorem (Observation 4.4),
but it can also be given a direct verification as follows. Suppose that P satisfies ( P 7)
and ( P 8), and suppose that A P (x & y) ≠ A P x and A P (x & y) ≠ A P y; we want
to show that A P (x & y) = A P x ∩ A P y. By ( P 7) it suffices to show that A P
(x & y) ⊆ A P x ∩ A P y, so it suffices to show that A P (x & y) ⊆ A P x and A P (x
& y) ⊆ A P y. By ( P C), which we know by 3.4 to be an immediate consequence of
( P 8), we have at least one of these inclusions. So it remains to show that under our
hypotheses either inclusion implies the other. We prove one; the other is similar.
Suppose for reductio ad absurdum that A P (x & y) ⊆ A P x but A P (x & y) ⊈ A
P y. Since by hypothesis A P (x & y) ≠ A P x, we have A P x ⊈ A P (x & y), so
there is an a ∈ A P x with a ∉ A P (x & y). Since A P (x & y) ⊈ A P y, we have
by ( P 8) that y ∈ A P (x & y). Hence since a ∉ A P (x & y) we have ¬y ∨ a ∉ A P
(x & y). Hence by ( P 7), ¬y ∨ a ∉ A P x or ¬y ∨ a ∉ A P y. But since a ∈ A P x
the former alternative is impossible. And the second alternative is also impossible,
since by recovery A P y ∪ {y} ⊢ a, so that ¬y ∨ a ∈ A P y.
which it follows immediately that (γ7) does not in the finite case imply ( P TR). The
other nonequivalences need other examples, which we briefly sketch.
For the second example, take A to be the eight-element theory of Observation
4.9, but define γ as follows: In the limiting case of a7, we put γ(A ? a7) = {!a0} as
required by the fact that a7 ∈ Cn(∅); put γ(A ? aj) = A ? aj for all j with 2 ≤ j < 7;
put γ(A ? a1) = {!a3}; and put γ(A ? a0) = {!a1, !a3}. Then it can be verified that
the partial meet contraction function P determined by γ satisfies ( P C), and so by
finiteness also (γC), but not ( P 8) and so a fortiori not ( P TR).
For the third example, take A as before, and put γ(A ? a7) = {!a0} as always;
put γ(A ? a1) = {!a2}; and put γ(A ? ai) = A ? ai for all other ai. It is then easy to
check that this example satisfies ( P 8) but not ( P 7), and so a fortiori not ( P R) and
not ( P TR).
For the fourth and last example, take A as before, and put ≤ to be the least
reflexive relation over 2^A such that !a1 ≤ !a2, !a2 ≤ !a3, !a3 ≤ !a2 and !a3 ≤ !a1.
Define γ from ≤ via the marking off identity, and put A P x = ∩γ(A ? x). Then it
is easy to check that γ is a selection function for A, so that ( P R) holds. But ( P C)
fails; in particular when x = a1 and y = a2 we can easily verify that A P (x & y)
⊈ A P x and A P (x & y) ⊈ A P y. Hence, a fortiori, ( P 8) and ( P TR) also fail.
8 Added in Proof
The authors have obtained two refinements: the arrow ( P D) → ( P TR) of the
diagram on page 528 can be strengthened to ( P D) → (γTR); the implication
(γ7.I) → (γ7.n) of Observation 4.10 can be strengthened to an equivalence. The
former refinement is easily verified using the fact that any maxichoice contraction
function over a theory is determined by a unique selection function over that theory.
The latter refinement can be established by persistent use of the compactness of Cn.
Observation 4.10 so refined implies that for a theory A and selection function γ
over A, γ is relational over A iff (γ7.2) holds. This raises an interesting open
question, a positive answer to which would give a representation theorem for
relational partial meet contraction, complementing Corollary 4.5: Can condition
(γ7.2) be expressed as a condition on the contraction operation P determined by
γ?
We note that a rather different approach to contraction has been developed by
Alchourrón and Makinson in On the logic of theory change: safe contraction, to
appear in Studia Logica, vol. 44 (1985), the issue dedicated to Alfred Tarski; the
relationship between the two approaches is studied by the same authors in Maps
between some different kinds of contraction function: the finite case, also to appear
in Studia Logica, vol. 44 (1985).
References
Alchourron, C. E., & Makinson, D. (1982a). Hierarchies of regulations and their logic, new studies
in deontic logic (pp. 125–148), R. Hilpinen (Ed.), Dordrecht: Reidel.
Alchourron, C. E., & Makinson, D. (1982b). On the logic of theory change: Contraction functions
and their associated revision functions. Theoria, 48, 14–37.
Gärdenfors, P. (1978). Conditionals and changes of belief The logic and epistemology of scientific
change. In I. Niiniluoto & R. Tuomela (Eds.), Acta Philosophica Fennica (Vol. 30, pp. 381–
404).
Gärdenfors, P. (1982). Rules for rational changes of belief 320311: Philosophical essays dedicated
to Lennart Åqvist on his fiftieth birthday (pp. 88–101), (T. Pauli, Ed., Philosophical studies no.
34). Uppsala: Department of Philosophy, University of Uppsala.
Gärdenfors, P. (1984). Epistemic importance and minimal changes of belief. Australasian Journal
of Philosophy, 62, 136–157.
Makinson, D. (1985). How to give it up: A survey of some formal aspects of the logic of theory
change. Synthese, 62, 347–363.
Chapter 14
Theory Contraction and Base Contraction
Unified
General Introduction
“Partial meet contraction” contains introductory material. In the section “The new
postulates”, the postulates that will be used for the characterizations are introduced,
and in the section “Axiomatic characterizations” axiomatic characterizations of
various types of base-generated theory contractions are given. The section “Proofs”
provides proofs of the results reported in the section “Axiomatic characterizations”.
If this condition holds for a transitive relation «, then the operator is transitively
relational.
1
The maximizing property may be interpreted as saying that all elements of the belief base have
positive epistemic value. This property might at first hand seem superfluous. If K′ ⊂ K″, then K′
∈ K?α and K″ ∈ K?α cannot both be true, so that K′ and K″ cannot be candidates between
which the selection function has to choose. However, when more than one contraction is taken
into account, the property need not be superfluous. If K1 ⊂ K2, and K3 is neither a subset nor a
superset of either K1 or K2, then γ({K1, K3}) = {K1} and γ({K2, K3}) = {K3} may both hold for a
transitively relational selection function γ, but not if it is in addition required to be maximizing.
2
See (Hansson 1993) for some results, including an axiomatic characterization of the partial meet
contractions on a belief base.
3
In Makinson’s (1987) terminology, an operation that satisfies (G1)-(G5) but not necessarily
(G6) is a “withdrawal”. However, I will use “contraction” in the wide sense indicated above, thus
including Makinson’s withdrawals.
Recovery is the most controversial of the six postulates, and has been seriously ques-
tioned by several authors. See (Hansson 1991); (Makinson 1987); (Niederée 1991).
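A small computational sketch can make the recovery failure vivid. The example below is an added illustration under simple assumptions: two atoms p and q, propositions modelled as the sets of truth-value assignments at which they hold, the belief base B = {p}, and contraction of K = Cn(B) by p ∨ q performed by intersecting the maximal subsets of the base that fail to entail p ∨ q. Adding p ∨ q back does not recover p.

```python
# Base-generated contraction can violate recovery: contract Cn({p}) by p-or-q.
from itertools import combinations, product

WORLDS = frozenset(product((True, False), repeat=2))   # assignments to (p, q)

def models(X):
    m = set(WORLDS)
    for prop in X:
        m &= prop
    return frozenset(m)

def Cn(X):
    m = models(X)
    return frozenset(frozenset(c) for r in range(len(WORLDS) + 1)
                     for c in combinations(sorted(WORLDS), r) if m <= frozenset(c))

def remainders(B, x):
    """Maximal subsets of the base B that do not entail x."""
    subs = [frozenset(c) for r in range(len(B) + 1)
            for c in combinations(sorted(B, key=sorted), r)
            if not models(frozenset(c)) <= x]
    return [S for S in subs if not any(S < T for T in subs)]

p = frozenset(w for w in WORLDS if w[0])                # "p is true"
q = frozenset(w for w in WORLDS if w[1])                # "q is true"
p_or_q = p | q

B = {p}                                                 # a belief base for K
K = Cn(B)

rem = remainders(B, p_or_q)                             # only the empty subset remains
K_contracted = Cn(frozenset.intersection(*rem))         # the contracted belief set

recovered = Cn(set(K_contracted) | {p_or_q})            # add p-or-q back
print(p in K, p in recovered)                           # True False: recovery fails
```

With the base {p} there is only one remainder, so every selection function, maxichoice or partial meet, yields the same contraction in this instance.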
It has often been remarked that the only realistic belief sets are those that have a
finite representation. (Gärdenfors and Makinson 1988) We are going to assume that
both the original belief set and the belief sets that are obtained through contraction
have finite representations (belief bases); thus for every α, there is some finite set
A such that K − α = Cn(A). Furthermore, we will assume that although there
may be infinitely many sentences by which the belief set can be contracted, there
is only a finite number of belief sets that can be obtained through contraction, i.e.
{K′ | (∃α)(K′ = K − α)} is finite. These two finiteness properties can be combined
into the following:
There is a finite set A such that for every α, there is some A′ ⊆ A such that K − α =
Cn(A′). (finitude)
In the presence of vacuity (G3), finitude implies that there is some finite set A such
that K = Cn(A).
If − is a contraction operator for K, and α is not a logical theorem, then K − α
does not contain α. However, to contract K by α is not the only way to exclude α
from the belief set. Typically, there are several β distinct from α such that α ∉ K − β.
This must be the case if α logically implies β, and it can be true in other cases as
well. A contraction K − β such that α ∉ K − β will be called an α-removal.
Two different beliefs may have exactly the same justification(s). As an example, I
believe that either Paris or Oslo is the capital of France (α). I also believe that either
Paris or Stockholm is the capital of France (β). Both these beliefs are entirely based
on my belief that Paris is the capital of France. Therefore, a contraction by some
sentence δ removes α if and only if it removes β (namely if and only if it removes
the common justification of these two beliefs). There is no contraction by which I
can retract α without retracting β or vice versa. It is not unreasonable to require that
if two beliefs in a belief set stand or fall together in this way, then their contractions
are identical. In other words, if all α-removals are β-removals and vice versa, then
K − α = K − β. In the formal language:
If K − δ ⊢ α iff K − δ ⊢ β for all δ, then K − α = K − β. (symmetry)
An α-removal K − β will be called a preservative α-removal if and only if K − α ⊆
K − β, and a strictly preservative α-removal if and only if K − α ⊂ K − β. A
strictly preservative α-removal is an operation that removes α, and does this in a
more economical way than what is done by the contraction by α.
Often, a belief can be removed more economically if more specified information
is obtained. As an example, I believe that Albert Schweitzer was a German
Missionary (α). Let α1 denote that he was a German and α2 that he was a Missionary,
4
It is here assumed that α1 & α2 is less entrenched than α1 and α2.
weak criterion, and it can be strengthened in various ways.5 However, it turns out to
be sufficient for our present purposes, and it will therefore be used in our postulate
of conservativity:
If K − β ⊈ K − α, then there is some δ such that K − α ⊆ K − δ ⊬ α and
(K − β) ∪ (K − δ) ⊢ α. (conservativity)
An obvious way to strengthen conservativity is to require that if a unit of belief is
lost in contraction by α, then α will be implied if it is added to K − α (and not merely
if it is added to some preservative α-removal):
If K − β ⊈ K − α, then K − α ⊬ α and (K − β) ∪ (K − α) ⊢ α. (strong
conservativity)
Strong conservativity is much less plausible than conservativity. This can be seen
from our Albert Schweitzer example. In that example, it may reasonably be assumed
that α1 ∉ K − (α1&α2), α2 ∉ K − (α1&α2), α1 ∈ K − α2, α2 ∈ K − α1, K −
(α1&α2) ⊆ K − α1, and K − (α1&α2) ⊆ K − α2. However, this is incompatible
with strong conservativity. Since K − α1 ⊈ K − (α1&α2), this postulate requires
that K − α1 ∪ K − (α1&α2) ⊢ (α1&α2), contrary to our assumptions for this case.
More generally, strong conservativity is implausible since it precludes the removal
of two or more sentences (in this case α1 and α2), when it would have been logically
sufficient to remove only one of them. Such epistemic behaviour is rational enough
when the beliefs in question are equally entrenched, or have equal epistemic utility.
The concepts of epistemic entrenchment and epistemic utility refer to extra-
logical reasons that one may have for preferring one way to remove a sentence
α rather than another. It is conceivable for an epistemic agent to make no use of
such extra-logical information. Such an agent is indecisive in the sense of not being
able to make a choice among different ways to remove a belief, if these are on an
equal footing from a logical point of view. Formally, this is close to a reversal of
conservativity: If a (self-sustained) unit of belief conflicts with some way to remove
α from K, then it is not a part of K − α:
If there is some δ such that K − δ ⊬ α and (K − β) ∪ (K − δ) ⊢ α, then K − β ⊈
K − α. (indecisiveness)
Non-logical considerations play an important role in actual human epistemic
behaviour. Arguably, it would in many cases be irrational not to let them do
so. Therefore, indecisiveness is not a plausible general property of rational belief
change.
5
Two such strengthened versions should be mentioned: (1) − can be extended to contraction by
sets (multiple contraction), such that if A ∩ Cn(∅) = ∅, then K − A is a logically closed subset of
K that does not imply any element of A. If a unit of belief cannot stand on its own, then it should
not be equal to K − A for any set A. (2) Iterated contraction can be used for the same purpose: If
a unit of belief cannot stand on its own, then it should not be equal to K − β1 − β2 ... − βn for any
series β1 ... βn of sentences.
6
Regularity 1 is closely related to Amartya Sen's β property for rational choice behaviour, and
regularity 2 to his α property. (Sen 1970).
also the removed unit of belief when α1 & α2 is contracted. In general, her epistemic
behaviour should satisfy the following postulate:
If ⊢ α → β and K − α ⊬ β, then K − β = K − α. (hyperregularity)
Hyperregularity implies that for all α and β, K − (α&β) = K − α or K − (α&β) =
K − β. This condition has also been called "decomposition" (Alchourrón et al. 1985,
p. 525). As was noted by Gärdenfors (1988, p. 66), it is too strong a principle.
Thus, hyperregularity is a limiting case of some interest, but not a plausible
criterion of rational belief change. The same applies to strong conservativity and
indecisiveness, whereas symmetry, conservativity, and regularity are proposed as
reasonable postulates for rational belief change.
Axiomatic Characterizations
7
The completeness of γ is used in the proof. I do not know if it can be dispensed with.
For maxichoice operations that are transitively, maximizingly relational, the follow-
ing axiomatic characterization has been obtained:
Theorem 14.5 An operation − on a belief set K is generated by a transitively,
maximizingly relational maxichoice contraction of a finite base for K iff − satisfies
(G1), (G2), (G3), (G4), (G5), finitude, symmetry, strong conservativity,
and hyperregularity.
Some of the postulates used in Theorems 14.2, 14.3, and 14.5 were shown in
the section “The new postulates” to be quite implausible. Indeed, maxichoice and
full meet contraction are of interest only as limiting cases. In contrast, Theorems
14.1 and 14.4 only employ fairly plausible postulates of rational belief change. It
is proposed that the classes of base-generated contractions of belief sets that are
characterized in these theorems represent reasonable types of belief contraction.
Two further important properties have been obtained for the class of operations
that were referred to in Theorem 14.4, namely contractions of belief sets that are
generated by transitively, maximizingly relational partial meet base contractions:
Theorem 14.6 Let the operation − on the belief set K be generated by some
transitively, maximizingly relational partial meet contraction of a finite base for K.
Then:
(1) If K − δ ⊆ (K − α) ∩ (K − β), then K − δ ⊆ K − (α&β). (weak intersection)
(2) If α ∉ K − (α&β), then K − (α&β) ⊆ K − α. (conjunction)
Weak intersection is a weaker form of Gärdenfors's (G7) postulate, namely that
"the beliefs that are both in K − α and K − β are also in K − (α&β)" (Gärdenfors
1988, p. 64).8 It differs from Gärdenfors's original postulate in being restricted to
beliefs that are self-sustained in the sense that was accounted for in section "The
new postulates". To see that this is a reasonable restriction, let α and β be self-
sustained beliefs that have the same degree of entrenchment, and such that α ∨ β is
believed only as a logical consequence of α and β. For a plausible practical example,
let α denote that Algiers is a capital and β that Bern is a capital. I have both these
beliefs, and they are equally entrenched. If I contract my belief set by α, then β is
unperturbed, so that β ∈ K − α, and as a consequence of that, α ∨ β ∈ K − α. For
symmetrical reasons, α ∨ β ∈ K − β. However, if I contract my belief set by α & β,
then since α and β are equally entrenched I cannot choose between them, so that
they must both go. Since neither α nor β is in K − (α&β), and α ∨ β was believed
only as a consequence of (each of) these two beliefs, α ∨ β will be lost as well.
Thus, α ∨ β ∈ (K − α) ∩ (K − β) but α ∨ β ∉ K − (α&β), contrary to (G7) but
in accordance with weak intersection.
Conjunction is Gärdenfors's (G8) postulate. To motivate it, the Algiers and
Bern example may again be used. In that case, to remove α is a way to remove a
specified part of α&β. In general, the removal of a part of a certain belief is at least
8
The formulas of the quotation have been adapted to the notational convention used here.
Proofs
A set A is finite-based iff there is some finite set A′ such that Cn(A′) = Cn(A). For
any non-empty, finite-based set A, &A denotes the conjunction of all elements of
some finite base of A. For any finite, non-empty set A, ∨(A) denotes the disjunction
of the elements of A.
The following lemmas will be needed for the proofs:
Lemma 14.1 B ⊥ (α&β) ⊆ B ⊥ α ∪ B ⊥ β.
Proof of Lemma 14.1 Let W ∈ B ⊥ (α&β). Then either W ⊬ α or W ⊬ β. It follows from W ⊬ α and W ∈ B ⊥ (α&β) that W ∈ B ⊥ α, and in the same way from W ⊬ β and W ∈ B ⊥ (α&β) that W ∈ B ⊥ β.
Lemma 14.2 If X ⊈ Y ⊈ X for all X ∈ B ⊥ α and Y ∈ B ⊥ β, then B ⊥ (α&β) = B ⊥ α ∪ B ⊥ β.
We will mostly use a special case of the lemma, namely: if {X} = B ⊥ α and {Y} = B ⊥ β, and X ⊈ Y ⊈ X, then {X, Y} = B ⊥ (α&β).
Proof of Lemma 14.2 One direction follows from lemma 14.1. For the other direction, let X ∈ B ⊥ α. Then X ⊬ (α&β). In order to prove that X ∈ B ⊥ (α&β), suppose to the contrary that there is some W such that X ⊂ W ⊆ B and W ⊬ (α&β). From X ⊂ W it follows that W ⊢ α. With W ⊬ (α&β) this yields W ⊬ β, from which it follows that W ⊆ Y for some Y ∈ B ⊥ β. We therefore have X ⊂ W ⊆ Y, contradicting the assumption that X ⊈ Y ⊈ X. We may conclude that X ∈ B ⊥ (α&β).
In the same way it follows that if Y ∈ B ⊥ β then Y ∈ B ⊥ (α&β).
Lemma 14.3 If X ∈ B ⊥ α and B is finite, then there is some β such that ⊢ α → β and {X} = B ⊥ β.
Proof of Lemma 14.3 If X = B, then let β = α. Otherwise, let B∖X = {ξ1, ..., ξn}, and let β be α ∨ ξ1 ∨ ... ∨ ξn. First suppose that X ⊢ α ∨ ξ1 ∨ ... ∨ ξn. It follows from X ∈ B ⊥ α that X ⊢ ξk → α for all ξk ∈ B∖X. We therefore have X ⊢
9 I do not know if weak intersection and conjunction can replace regularity in theorem 14.4.
Part 1: In order to prove that γ″ is a selection function for B″ we have to show that if B″ ⊥ α ≠ ∅, then γ″(B″ ⊥ α) ≠ ∅. Let B″ ⊥ α ≠ ∅. Then B ⊥ α ≠ ∅, from which it follows that γ(B ⊥ α) is nonempty. Let X ∈ γ(B ⊥ α). It follows from Cn(f(X)) = Cn(X) that f(X) ⊬ α.
Suppose that there is some Y ⊆ B″ such that f(X) ⊂ Y ⊬ α. There is then a set Y with this property that is closed under conjunction. It follows that X ⊂ f⁻¹(Y) ⊬ α and f⁻¹(Y) ⊆ B, contrary to X ∈ B ⊥ α. We may conclude from this contradiction that there is no Y ⊆ B″ such that f(X) ⊂ Y ⊬ α. Since f(X) ⊬ α it follows that f(X) ∈ B″ ⊥ α. Since f⁻¹(f(X)) = X ∈ γ(B ⊥ α), it follows from the construction of γ″ that γ″(B″ ⊥ α) ≠ ∅.
Part 2: Since, by the assumptions, (Cn(B)) ÷ α = Cn(∩γ(B ⊥ α)), it is sufficient to prove that Cn(∩γ″(B″ ⊥ α)) = Cn(∩γ(B ⊥ α)). Since B″ ⊥ α ≠ ∅ if and only if B ⊥ α ≠ ∅, and Cn(B″) = Cn(B), only the case when B″ ⊥ α ≠ ∅ requires further consideration.
For one direction, let δ ∈ ∩γ(B ⊥ α). Then δ ∈ Z for every Z ∈ γ(B ⊥ α). By the construction of γ″, δ ∈ Z″ for every Z″ ∈ γ″(B″ ⊥ α). It follows that δ ∈ ∩γ″(B″ ⊥ α). Thus, ∩γ(B ⊥ α) ⊆ ∩γ″(B″ ⊥ α), from which Cn(∩γ(B ⊥ α)) ⊆ Cn(∩γ″(B″ ⊥ α)) can be concluded.
For the other direction, suppose that ε ∈ ∩γ″(B″ ⊥ α). It follows from ε ∈ B″ that there are elements ε1, ..., εn of B such that ε ↔ ε1 & ... & εn. Let W ∈ γ(B ⊥ α). By the construction of γ″, f(W) ∈ γ″(B″ ⊥ α). It follows from f(W) ∈ B″ ⊥ α that f(W) is B″-closed. We may conclude from {ε1, ..., εn} ⊆ B″, ε ∈ f(W) and the B″-closure of f(W) that {ε1, ..., εn} ⊆ f(W).
It follows from W ∈ B ⊥ α that W is B-closed. Since Cn(W) = Cn(f(W)) and {ε1, ..., εn} ⊆ B, we may conclude from {ε1, ..., εn} ⊆ f(W) that {ε1, ..., εn} ⊆ W. Since this holds for all W ∈ γ(B ⊥ α), we have {ε1, ..., εn} ⊆ ∩γ(B ⊥ α). Thus, ε ∈ Cn(∩γ(B ⊥ α)). We have proved that ∩γ″(B″ ⊥ α) ⊆ Cn(∩γ(B ⊥ α)), from which Cn(∩γ″(B″ ⊥ α)) ⊆ Cn(∩γ(B ⊥ α)) follows as desired.
Lemma 14.6 Let the operation ÷ on the belief set K be generated by the partial meet contraction ∼γ on the finite base B for K. Then K ÷ β is a maximally preservative α-removal iff B ∩ (K ÷ α) ⊆ B ∩ (K ÷ β) ∈ B ⊥ α.
Proof of Lemma 14.6 For the non-trivial direction, suppose that K ÷ β is a maximally preservative α-removal and that B ∩ (K ÷ β) is not an element of B ⊥ α. Then there must be some X ⊆ B such that B ∩ (K ÷ β) ⊂ X ∈ B ⊥ α and that there is no δ for which Cn(X) = K ÷ δ. However, this is impossible since by lemma 14.3 there is some δ such that {X} = B ⊥ δ, and by the definition of partial meet contraction B ∼γ δ = X, so that K ÷ δ = Cn(X).
Lemma 14.7 Let ÷ be an operator on K that satisfies closure (G1) and finitude, and let B = {&X | (∃α)(X = K ÷ α)}. Then Cn(B ∩ (K ÷ α)) = K ÷ α.
Proof of Lemma 14.7 By the construction, &(K ÷ α) ∈ B. By closure (G1), &(K ÷ α) ∈ K ÷ α. It follows that &(K ÷ α) ∈ (B ∩ (K ÷ α)), so that
Lemma 14.8 Let ÷ be an operator on the belief set K that satisfies closure (G1), success (G4), finitude and conservativity, and let
Then:
If {X} = B ⊥ δ, then X = B ∩ (K ÷ δ).
K ÷ β ⊆ Cn(X).
Then B ∩ (K ÷ β) ⊆ B ∩ Cn(X) = X.
It follows from X ∈ B ⊥ β, by lemma 14.3, that {X} = B ⊥ δ for some δ such that ⊢ β → δ. By lemma 14.8, X = B ∩ (K ÷ δ), so that by closure (G1) and lemma 14.7, K ÷ δ = Cn(X). Similarly, there are ε and ζ1, ..., ζn such that {Y} = B ⊥ ε and K ÷ ε = Cn(Y), and that {Zk} = B ⊥ ζk and K ÷ ζk = Cn(Zk) for all k.
By repeated applications of lemma 14.2, B ⊥ β = B ⊥ (δ&ε&ζ1& ... &ζn). Thus, for all φ, B ∩ (K ÷ φ) ⊢ β iff B ∩ (K ÷ φ) ⊢ δ&ε&ζ1& ... &ζn. By lemma 14.7, K ÷ φ ⊢ β iff K ÷ φ ⊢ δ&ε&ζ1& ... &ζn. By symmetry, K ÷ β = K ÷ (δ&ε&ζ1& ... &ζn).
We have assumed that K ÷ β ⊆ Cn(X), i.e., K ÷ (δ&ε&ζ1& ... &ζn) ⊆ K ÷ δ. Suppose that K ÷ δ is not a maximally preservative δ&ε&ζ1& ... &ζn-removal. Then there is some φ such that K ÷ δ ⊂ K ÷ φ ⊬ δ&ε&ζ1& ... &ζn. By lemma 14.7, Cn(B ∩ (K ÷ δ)) ⊂ Cn(B ∩ (K ÷ φ)), so that
{X ∈ B ⊥ α | (K ÷ α) ⊆ Cn(X)} ≠ {X ∈ B ⊥ α | (K ÷ β) ⊆ Cn(X)}.
implies conservativity, the proof of part 4 of Theorem 14.1 is also a proof of part 4
of the present proof.
Part 5: Let δ ∈ B∖(B ∼γ α). By the construction of B, δ = &(K ÷ β) for some β. Since &(K ÷ β) ∈ B and B ∼γ α is B-closed (cf. the definition for lemma 14.5), it follows from &(K ÷ β) ∉ ∩γ(B ⊥ α) that &(K ÷ β) ∉ Cn(∩γ(B ⊥ α)), thus by the result of part 4, &(K ÷ β) ∉ K ÷ α. By closure (G1), K ÷ β ⊈ K ÷ α. By strong conservativity, K ÷ β ∪ K ÷ α ⊢ α. Thus, B ∼γ α ∪ {&(K ÷ β)} ⊢ α, i.e., B ∼γ α ∪ {δ} ⊢ α. Since this holds for all δ ∈ B∖(B ∼γ α), we can conclude that B ∼γ α ∈ B ⊥ α.
Proof of Theorem 14.3, Left-to-Right Let ÷ be the operation on K that is generated by the operator ∼ of full meet contraction on a finite belief base B for K. Making use of the corresponding part of the proof of Theorem 14.1, it only remains for us to prove that indecisiveness holds. Just as in Theorem 14.1 we may, due to lemma 14.5, assume that B is closed under conjunction.
Suppose that K ÷ δ ⊬ α and (K ÷ β) ∪ (K ÷ δ) ⊢ α. Then B ∼ δ ⊬ α and (B ∼ β) ∪ (B ∼ δ) ⊢ α. Since B ∼ β and B ∼ δ are both subsets of B there is some subset X of B such that B ∼ δ ⊆ X ∈ B ⊥ α and B ∼ β ⊈ X. By the definition of full meet contraction, B ∼ α ⊆ X. Suppose that B ∼ β ⊆ B ∼ α. It would then follow from (B ∼ β) ∪ (B ∼ δ) ⊢ α, B ∼ δ ⊆ X and B ∼ α ⊆ X that X ⊢ α, contrary to X ∈ B ⊥ α. We may conclude that B ∼ β ⊈ B ∼ α.
Since all elements of B ⊥ α are B-closed (cf. the definition for lemma 14.5), their intersection B ∼ α is also B-closed. Similarly, B ∼ β is B-closed. It therefore follows from B ∼ β ⊈ B ∼ α that Cn(B ∼ β) ⊈ Cn(B ∼ α), i.e. K ÷ β ⊈ K ÷ α.
Proof of Theorem 14.3, Right-to-Left Let B = {&X | (∃α)(X = K ÷ α)}. We need to show that (1) B is a finite base for K, (2) for all α, Cn(∩(B ⊥ α)) ⊆ K ÷ α, and (3) for all α, K ÷ α ⊆ Cn(∩(B ⊥ α)). The proof of part 1 coincides with that of part 1 of Theorem 14.1.
Part 2: We are going to prove that ∩(B ⊥ α) ⊆ K ÷ α. Let ζ ∉ K ÷ α. If there is no β such that ζ = &(K ÷ β), then it follows by the construction of B that ζ ∉ ∩(B ⊥ α). In the principal case, ζ = &(K ÷ β).
It follows from &(K ÷ β) ∉ K ÷ α and closure (G1) that K ÷ β ⊈ K ÷ α. By conservativity, there is some δ such that K ÷ α ⊆ K ÷ δ ⊬ α and (K ÷ β) ∪ (K ÷ δ) ⊢ α. By lemma 14.7, (B ∩ (K ÷ β)) ∪ (B ∩ (K ÷ δ)) ⊢ α. Thus, there is some X such that &(B ∩ (K ÷ β)) ∉ X ∈ B ⊥ α, i.e. &(K ÷ β) ∉ X ∈ B ⊥ α. It follows that ζ = &(K ÷ β) ∉ ∩(B ⊥ α).
Thus, if ζ ∉ K ÷ α, then ζ ∉ ∩(B ⊥ α), i.e. ∩(B ⊥ α) ⊆ K ÷ α. By closure (G1), we can conclude that Cn(∩(B ⊥ α)) ⊆ K ÷ α.
Part 3: We are going to show that B ∩ (K ÷ α) ⊆ ∩(B ⊥ α). Let δ ∉ ∩(B ⊥ α). If there is no β such that δ = &(K ÷ β), then clearly δ ∉ B ∩ (K ÷ α). In the principal case, let δ = &(K ÷ β). It follows from &(K ÷ β) ∉ ∩(B ⊥ α) that there is some X such that &(K ÷ β) ∉ X ∈ B ⊥ α.
By lemma 14.3, {X} = B ⊥ φ for some φ. By lemma 14.8, X = B ∩ (K ÷ φ). By lemma 14.7, Cn(X) = K ÷ φ. Therefore, it follows from &(K ÷ β) ∉ B ∩ (K ÷ φ) ∈ B ⊥ α that K ÷ φ ⊬ α and (K ÷ β) ∪ (K ÷ φ) ⊢ α. By indecisiveness,
B ∩ (K ÷ β) ⊆ B ∩ (K ÷ δ) ∈ B ⊥ β.
B″ ≼ B′.
Since this holds for all B″ ∈ B ⊥ α, we may conclude by the construction of γ that B′ ∈ γ(B ⊥ α), so that γ(B ⊥ α) is non-empty.
contraction, ∩(B ⊥ α) = B. Using the result of part 1 of the present proof, we obtain Cn(∩(B ⊥ α)) = Cn(B) = K = K ÷ α.
It remains to prove the principal case, in which α is not a logical theorem.
Part 3a: We are going to show that B ∩ (K ÷ α) ⊆ ∩(B ⊥ α), i.e., that if X ∈ B ⊥ α, then B ∩ (K ÷ α) ⊆ X.
Let X ∈ B ⊥ α. By lemma 14.3, {X} = B ⊥ δ for some δ such that ⊢ α → δ. By lemma 14.8, X = B ∩ (K ÷ δ) and by lemma 14.7, Cn(X) = K ÷ δ.
Let Y1, ..., Yn be the elements of B ⊥ α apart from X. By lemma 14.3, for each Yk there is some εk such that ⊢ α → εk and {Yk} = B ⊥ εk. By lemma 14.8, Yk = B ∩ (K ÷ εk). By lemma 14.7, Cn(Yk) = K ÷ εk.
By repeated uses of lemma 14.2, B ⊥ α = B ⊥ (δ&ε1& ... &εn). Thus, for all φ, B ∩ (K ÷ φ) ⊢ α iff B ∩ (K ÷ φ) ⊢ δ&ε1& ... &εn. By lemma 14.7, K ÷ φ ⊢ α iff K ÷ φ ⊢ δ&ε1& ... &εn. By symmetry, K ÷ α = K ÷ (δ&ε1& ... &εn).
If K ÷ (δ&ε1& ... &εn) ⊬ δ, then it follows from {X} = B ⊥ δ that B ∩ (K ÷ α) = B ∩ (K ÷ (δ&ε1& ... &εn)) ⊆ X.
In the remaining case, when K ÷ (δ&ε1& ... &εn) ⊢ δ, there is by success (G4) some εk such that K ÷ (δ&ε1& ... &εn) ⊬ εk. Since {Yk} = B ⊥ εk,
K ÷ (a&b&c) ⊆ K ÷ b.
This contradicts strong conservativity, and we may conclude that (for all α) B ∩ (K ÷ α) ∈ B ⊥ α.
By lemma 14.3 there is some φ such that ⊢ α → φ and {B ∩ (K ÷ α)} = B ⊥ φ. By lemma 14.7, K ÷ α ⊬ φ. Let Y ∈ B ⊥ α. By lemma 14.3 there is some ψ such that ⊢ α → ψ and {Y} = B ⊥ ψ. By lemma 14.2, {B ∩ (K ÷ α), Y} = B ⊥ (φ&ψ). Since ⊢ α → (φ&ψ) and K ÷ α ⊬ (φ&ψ) it follows by hyperregularity that K ÷ α = K ÷ (φ&ψ). By lemma 14.7 and our definition of ≼, it follows that Y ≼ B ∩ (K ÷ α). Since this holds for all Y ∈ B ⊥ α, it follows by the definition of γ that B ∩ (K ÷ α) ∈ γ(B ⊥ α). Thus γ(B ⊥ α) is non-empty whenever B ⊥ α is non-empty.
Part 3: In order to prove that γ is maxichoice, suppose that it is not. Then there is some α such that there are distinct X and Y with X, Y ∈ γ(B ⊥ α). It follows, by the definition of γ, that there are φ and ψ such that {X, Y} = B ⊥ φ = B ⊥ ψ, Cn(X) = K ÷ φ and Cn(Y) = K ÷ ψ.
It follows from {X, Y} = B ⊥ φ = B ⊥ ψ that {X, Y} = B ⊥ (φ&ψ). By what was shown in part 2 of the present proof, B ∩ (K ÷ (φ&ψ)) ∈ B ⊥ (φ&ψ). It follows from this and B ⊥ φ = B ⊥ (φ&ψ) that B ∩ (K ÷ (φ&ψ)) ⊬ φ. By lemma 14.7, K ÷ (φ&ψ) ⊬ φ. From this it follows by hyperregularity that K ÷ (φ&ψ) = K ÷ φ. Similarly, K ÷ (φ&ψ) = K ÷ ψ, so that K ÷ φ = K ÷ ψ. Thus X = Y, contrary to our conditions. We may conclude that γ is maxichoice.
Part 4: It was shown in part 2 of the present proof that for all α, B ∩ (K ÷ α) ∈ γ(B ⊥ α). By part 3, {B ∩ (K ÷ α)} = γ(B ⊥ α). It follows from this, by lemma 14.7, that Cn(B ∼γ α) = K ÷ α.
Part 5: It follows directly by the construction that γ is relational by ≼ and that the maximizing property holds. In the proof of transitivity, we will use the symbol ◁ as follows:
Y ◁ X iff there is some β such that {X, Y} = B ⊥ β and K ÷ β = Cn(X).
Thus, X ≼ Y iff either X ⊆ Y or X ◁ Y. The proof of transitivity can therefore be divided into the following four cases:
(1) If X ⊆ Y and Y ⊆ Z, then either X ⊆ Z or X ◁ Z.
(2) If X ⊆ Y and Y ◁ Z, then either X ⊆ Z or X ◁ Z.
(3) If X ◁ Y and Y ⊆ Z, then either X ⊆ Z or X ◁ Z.
(4) If X ◁ Y and Y ◁ Z, then either X ⊆ Z or X ◁ Z.
If X = B, then X ⋪ Y, and X ⊆ Y implies X = Y, so that Y ≼ Z implies X ≼ Z. If Y = B, then Y ⋪ Z, and Y ⊆ Z implies Y = Z, so that X ≼ Y implies X ≼ Z. If Z = B, then either X ⊂ Z or X = Z, in both cases yielding X ≼ Z. Thus, in the proofs of the four cases we can assume that X ≠ B, Y ≠ B, and Z ≠ B.
Case 1: Trivial.
Case 2: Suppose that X ⊆ Y and Y ◁ Z. By lemma 14.4, there are sentences a, b, and c such that {X} = B ⊥ a, {Y} = B ⊥ b, and {Z} = B ⊥ c. By lemma 14.2 we have {X, Y} = B ⊥ (a&b) and {Y, Z} = B ⊥ (b&c)
Acknowledgement I would like to thank Peter Gärdenfors, Hans Rott, Wlodek Rabinowicz, and
an anonymous referee for valuable comments on an earlier version.
References
Alchourrón, C. E., Gärdenfors, P., & Makinson, D. (1985). On the logic of theory change: Partial
meet contraction and revision functions. Journal of Symbolic Logic, 50, 510–530.
Chapter 15
How Infallible but Corrigible Full Belief Is Possible
Isaac Levi
Inquirers ought to change beliefs for good reason (Levi 1980, ch. 1, 1991, ch. 1, 2004). What those good reasons are depends on the proximate goals of their inquiries.
William James urged us to seek Truth and avoid Error in forming beliefs. He
ought to have said: Seek Information and avoid Error. The common features of the
proximate goals of scientific inquiries ought to be to answer questions of interest
without error and in a manner that yields valuable information.
The beliefs inquirers seek to change are full beliefs. Agent X fully believes that
h if and only if X is certain that h is true. That is to say, X rules out the logical
possibility that h is false as a serious possibility, takes for granted that h is true and
uses this information as evidence in efforts to increase the information available to
X. Such evidence constitutes the basis for making assessments of credal probability
used to evaluate risky choices.
Justifiably changing judgments of credal probability is important to inquiry only
insofar as it contributes to the promotion of the goals of inquiry. One does not engage
in inquiry in order to justify changes in credal probabilities. In inquiry new error free
information is sought. Credal, belief or subjective probabilities are neither true nor
false.1 Changing credal probabilities neither succeeds in nor fails to avoid error.
And changing credal probabilities fails to rule out logical possibilities as serious
possibilities. So they do not add to information.
1 As F. P. Ramsey (1990), B. de Finetti (1964), and L. J. Savage (1954) rightly observed.
I. Levi ()
Columbia University, New York, NY, USA
e-mail: [email protected]
that implement such improvements are not changes in doxastic commitment. Even
though they play an important role in inquiry, they are not the changes that are the
target of inquiry.
The belief changes that are central to inquiry are changes in doxastic commit-
ment. I have suggested for a long time that changes in doxastic commitment that are
subject to justification are expansions and contractions.
Expansions are changes from weaker commitments to stronger ones. KN is
a consequence in the algebra of KI . The information contained in the initial
commitment KI is included in the information contained in the new commitment
KN . Contractions are changes from stronger commitments to weaker ones where the
inquirer gives up information rather than acquiring it.
Expansions may be described in terms of the information added to an initial state K. The expansion of K by adding the potential state of full belief H is K ∧ H or K⁺H. To the extent that potential states are represented by deductively closed sets of sentences in L, the expansion is represented by Cn(K ∪ H); PCn(K∪H) ⊆ PK. When H is the set of consequences of a single sentence h, Cn(K ∪ {h}) = K⁺h, or expansion by adding h.
In contraction from K, some of the possibilities ruled out by K are added to the
serious possibilities according to K. If K has as a consequence Hc , contraction that
removes this consequence has K ∨ H as a consequence. H is no longer impossible. But the contraction could be stronger. It could have K ∨ X as a consequence where X is stronger than H. We shall explore this some more later on.
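As a miniature illustration of this algebraic picture, potential states can be modeled as sets of possible worlds (the serious possibilities left open). The representation, the two atoms, and the helper names below are assumptions made for illustration, not Levi's own formalism.

```python
from itertools import product

# Toy rendering of belief-state change over a finite set of possible worlds.
# A state is the set of worlds the agent treats as serious possibilities.

ATOMS = ("h", "g")                       # two atomic claims (illustrative)
WORLDS = frozenset(product([True, False], repeat=len(ATOMS)))

def proposition(pred):
    """The set of worlds in which a predicate on (h, g) holds."""
    return frozenset(w for w in WORLDS if pred(*w))

H = proposition(lambda h, g: h)          # the potential state of full belief H

def expand(K, H):
    """Expansion K ∧ H: rule out all worlds incompatible with H."""
    return K & H

def contract_removing_not_H(K, H, readmitted):
    """A contraction that removes the consequence ¬H from K by readmitting
    a nonempty set of H-worlds as serious possibilities."""
    assert readmitted and readmitted <= H - K
    return K | readmitted

K_I = proposition(lambda h, g: (not h) and g)        # K_I entails ¬H
K_N = contract_removing_not_H(K_I, H, frozenset({(True, True)}))

print(K_I <= WORLDS - H)          # True: initially ¬H is a consequence of K_I
print(not (K_N <= WORLDS - H))    # True: after contraction ¬H has been given up
print(expand(K_N, H) <= H)        # True: expanding K_N by H rules ¬H back out
```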
Many authors take another kind of change in doxastic commitment to be
more central than expansion or contraction. Revisions are changes in doxastic
commitment where the inquirer adds information to KI whether or not the new
information is consistent with the information in KI .2
Notice, however, that if H is a consequence of KI and KN has Hc as a conse-
quence the inquirer starts with full belief that H is true and from the perspective
to which he is then committed deliberately replaces this conviction with Hc . From
the inquirer’s initial state of full belief KI , making the change from KI to KN is
replacing true by false belief. Anyone seeking to avoid importing false belief in
making a change in full belief should not deliberately engage in such replacement.
This consideration seems decisive as an objection against recognizing deliberate
revision of KI in the sense of Alchourrón et al. (1985) as a justifiable form of belief
change. The objection is decisive provided the proximate goal of inquiry is to seek
new, error free and valuable information at the stage where a change in belief is
made. But alternative views of the goals of belief change are available.
(a) One can hold that avoidance of false belief is a desideratum in inquiry where
truth and falsity are assessed according to an “external” standard severed from
2 When the information added is implied by K, the revision is degenerate and identical with K. When the information added is consistent with but not implied by K, the revision is a nondegenerate expansion. When the information is inconsistent with K, the revision is a replacement in the sense of Levi 1980.
any inquirer’s point of view or, as T. Nagel infamously put it, we consider truth
according to the point of view from nowhere.3 In that case, the inquirer should
recognize that H in KI might be false as fallibilism requires even though KI is
the inquirer’s standard for serious possibility.
(b) One can follow John Dewey and Peter Gärdenfors in refusing to consider
avoidance of false belief as a desideratum in inquiry.
(c) One can embrace the Messianic Realism of Charles Peirce and Karl Popper
and regard convergence on the true complete story of the world as an ultimate
goal of inquiry. This Messianic Realism may be supplemented by the thesis
that inquirers ought to get closer to the truth as a proximate aim of inquiry as
Niiniluoto continues to advocate.
(d) One can reject avoidance of error as a proximate aim of the next change in
inquiry but insist that avoidance of error in some finite number of changes after
the next change is such a goal.
(e) One can embrace Secular Realism as I and W.V. Quine do and maintain that
inquirers ought to avoid error as judged according to the evolving doctrine (i.e.,
according to KI ) as a proximate aim at the next change in inquiry. (Niiniluoto
1984).
The secular realist response (e) that I favor argues against the justifiability of
revising KI by adding information inconsistent with it in a single step. However,
sometimes the net effect of such revision may be justified by justifying each step in
a sequence of contractions and expansions.
The disagreement between secular realism and the alternatives is neither a
metaphysical nor a semantic one. It is a question of values. Peirce and Popper
thought that inquiry should be promoting progress toward the truth at the End of
Days. Others might think that progress towards true answers is desirable but avoid
the excesses of Messianic Realism. In any case, these approaches can spell out their
conceptions of aiming at truth and allow truth and falsity to be judged from some
agent’s point of view. Cognitive values that should be pursued in inquiry are at issue.
Truth and falsehood should be judged by the inquirer from the inquirer’s initial
state KI of full belief when assessing the options available for change in full belief.
Retrospective assessment is from the inquirer’s view in state KN . And others may
judge the truth or falsity of the inquirer’s beliefs before and after change from their
own belief states. Only assessment of the truth or falsity of the inquirer’s beliefs
from the point of view from nowhere is incoherent.4
In any case, according to Secular Realism (the position I endorsed under the
epithet “myopic realism” in Levi 1980) the direct justifiability of replacements
is decisively rejectable. Of course, changes of belief state by revision can be
3 Donald Davidson criticized this view. He argued that inquirers cannot coherently aim at truth (1998). Aiming at truth is, indeed, incoherent if it is judged from the point of view from nowhere rather than, as Quine put it, from the “evolving doctrine”.
4 If someone insists on the coherence of that point of view, the relevance of seeking to avoid error as judged from that point of view remains obscure.
5 When I first advanced this view, I used ‘knows’ rather than ‘fully believes’. I continue to think that according to X at time t, X fully believes that h if and only if X knows that h. That is because I define ‘knows’ as ‘truly believes’. This definition is an expression of the epistemic ideals I have borrowed from the pragmatists, who do not require that X justify X’s current beliefs but only changes in X’s beliefs, and the view that truth is judged relative to the evolving doctrine so that according to X everything X fully believes is true. Notice that agent Y can agree that X knows that h when Y also fully believes that h and will disagree otherwise. X and Y will agree that X knows that h if and only if X truly believes that h.
6 To fully believe, to be certain, or to know that H is not equivalent to judging that the probability that H is 1. Setting aside the issue of indeterminacy in probability judgment, subjective or credal probability judgment assigns numerical values in the closed unit interval to potential states of full belief (or to propositions) and their complements when both are judged seriously possible according to K. If H is fully believed and, hence, a consequence of K, it carries probability 1 and is seriously possible while its complement is assigned probability 0 and is ruled out as impossible.
7 In a well-known example of the conflation of infallibility and incorrigibility, R. C. Jeffrey (1965) argued that reasonable agents should not assign credal probability 1 to propositions because one cannot coherently shift down from probability 1 in conformity with modifying credal probability by Conditionalization. This sort of change in credal state is derivable from the inquirer’s state of full belief and the credal probability determined for that state of full belief by the inquirer’s credibility function (Carnap 1960) or confirmational commitment, provided that the confirmational commitment remains unchanged while the state of full belief is expanded by adding the proposition e and the confirmational commitment satisfies the principle of confirmational conditionalization (Levi 1980, 4.3). Confirmational Conditionalization is a synchronic constraint on confirmational commitments. Temporal credal conditionalization is a procedure for changing credal probabilities if the state of full belief is expanded by adding a new item e of information. That temporal principle does not forbid giving up e. Indeed, as long as the confirmational commitment remains fixed, a change from K⁺e to K can be derived from confirmational conditionalization. Jeffrey took for granted that contraction of a state of full belief could not be justified. He did not offer a compelling case for this conclusion.
The puzzle is not premised on a contradiction between saying that X was certain
that H but no longer is. There is no such contradiction. The issue is whether X can
justifiably become uncertain (change from full belief that H to doubt as to the truth
of H) if X is concerned to avoid false belief and maximize the value of information.
The justification should show how an inquirer X concerned to avoid importing
false belief while seeking to increase the value of the information available
could change views. The two desiderata of avoiding error and acquiring valuable
information tend to be in conflict. The more probable X judges H to be, the lower
the risk of error in coming to believe it and the less the value of the information carried
by h tends to be.
Suppose X initially fully believes that H. X may leave the belief state unchanged, may replace full belief that H with full belief that Hc, or may move to a position of suspense between H and Hc by contracting from K by removing H.
From X’s initial point of view, remaining with the status quo incurs no risk of
false belief.
From the same point of view, replacing full belief that H with full belief that Hc
imports false belief deliberately. Replacements are indefensible given the goal of
avoiding false belief while increasing the value of information.
Finally moving to a position of suspense between H and Hc by contraction incurs
no risk of importing false belief. But the inquirer X will be deliberately giving up
information in doing this. And given the goals of inquiry, gratuitous surrender of
information looks indefensible.
If not only replacement but contraction also is indefensible given the goals of
inquiry, the only kind of change that does not deliberately import error or give up
valuable information appears to be expansion. Expansion does, to be sure, incur a
risk of importing false belief. But incurring the risk may be justifiable in a manner
compatible with the concern to avoid error as long as the value of the information
promised compensates for the risk incurred.
To the extent that expansion can be justified along these lines,8 the inquirer
can overcome the obstacles to giving up information in contraction. Contraction
incurs a loss of valuable information. But subsequent inquiry stands some chance of
enhancing the informational value of the state of full belief.
Changing from an initial state of full belief KI to a contraction KN implied by KI
is entertainably justifiable in at least two contexts:
A. KI is inconsistent and contraction amounts to retreat from inconsistency.
B. KI is consistent but has a consequence Hc where H is a conjecture that would
contribute valuable information to X’s store of information were it (counter to
the verdict of X’s current state of full belief KN ) true. Contraction removing Hc
from KN is contemplated in order to give a hearing to H.
8 In Levi 1983, 1980, and 1991 I provide an account of the justification of expansion along these lines based on ideas developed in Levi 1967a and 1967b.
The first puzzle to consider regarding context A is that in order to retreat from
inconsistency, the inquirer needs to be in a state of inconsistency: KI = K⊥.
An inquirer’s state of full belief can be inconsistent in one of two ways. X
may be committed to a consistent state of full belief but his doxastic performances
may be inconsistent. In that case, removing inconsistency involves either efforts
at self-therapy or receiving help from others and their technologies. Indeed, not
only must the agent extricate him or herself from inconsistency but must also identify
his or her consistent state of doxastic commitment. No one can humanly succeed
in this endeavor completely. All flesh and blood inquirers are inconsistent in their
performances. I have already indicated that I shall not be taking up the question of
the considerable therapeutic and engineering tasks involved in realizing the local
pockets of fulfillment of doxastic commitment that flesh and blood achieve. In this
discussion, the kind of retreat from inconsistency involved in this achievement shall
not be considered.
The sort of inconsistent state of full belief of concern here is an inconsistent state
of doxastic commitment. The puzzle we need to address is how an inquirer who is
rationally fulfilling his or her commitments could end up in such an inconsistent
state without involvement in a performance failure. To deliberately expand into
inconsistency is never justifiable if the common feature of the proximate goals of
all efforts to change doxastic commitments is to obtain valuable new information
without importing false beliefs. As long as X’s state of full belief is consistent, X is
committed to judging the inconsistent state K⊥ to be false. To deliberately expand
into inconsistency is to deliberately import false belief into one’s evolving doctrine.
The inquirer, however, may deliberately incur a risk of importing false belief,
and do so rationally, if some appropriate benefit compensates for the risk. In
inquiry where the proximate aim is the acquisition of new and valuable error-free
information, making well designed observations is undertaken on the assumption
that the beliefs formed in response to the observations made have a good chance of
being true and informative. And the inquirer may sometimes be convinced that the
testimony of witnesses and experts is reliable and substantive. Consulting external
sources of information of both varieties incurs some risk of error. But often such
consultation is the only available way of acquiring information relevant to a certain
investigation. The value of the information acquired may compensate for the risk of
error incurred.
The acquisition of new beliefs via observation and the testimony of experts and
witnesses is “direct” in the familiar sense that it is not inference from premises. The
consultation involves implementing a program for letting inputs (such as sensory
excitations or verbal testimony) determine what information to add to a state of
full belief. The sensory inputs and verbal testimonies, however, are not premises
from which the inquirer infers a new belief. The addition of a new belief is an
outcome of a process initiated by the input. Whatever the psychological details
of the process might be, the occurrence of the inputs (the sensory stimulations
9 Regardless of whether the implementation is the product of habitual or customary practice or is deliberate, the inquirer is committed to implementing the program prior to its implementation. The inquirer is precommitted. The inquirer is (pre)committed to the results of implementing the routine regardless of what they might be. When the result is expansion into inconsistency, the importation of false belief is inadvertent.
10 The equivalence of a replacement to an expansion of a contraction is the Commensuration Thesis (Levi 1991, p. 65). The Commensuration Thesis is trivial as long as the domain of potential states of full belief is partially ordered in a manner satisfying the requirements of a Boolean algebra.
On the other hand, if options (i) and (ii) result in belief states carrying equal
informational value or if they are noncomparable with respect to informational
value, the inquirer should follow option (iii). This recommendation is based on
the assumption that the informational value of the join of two potential states of
full belief is the minimum informational value carried by the pair of states. This
is the assumption I have used in evaluating loss of damped informational value in
contraction in Levi 2004 and in Levi 1991 and 1997.
On this view, neither information acquired via observation nor from expert
witnesses is categorically authoritative. If it were, option (ii) would be mandatory
in all cases. All but the most rabid empiricists agree that the adoption of (ii) can be
trumped when the information carried by the background beliefs to be given up is
too valuable to surrender. Depending upon how valuable that information is, option
(i) or (ii) is favored.
The expansion step used to define replacement of Hc by H in KI is uniquely
determined by the contraction of KI by removing Hc. So both options (ii) and (iii)
are determined once we can identify the contraction of KI by removing Hc . But such
specification requires a choice from a roster of contraction strategies each of which
removes Hc from KI .
Keep in mind that such a “choice” is not a deliberate decision to implement
such a contraction. It is not a deliberate choice of a contraction from K? . Nor is
it a deliberate choice of a contraction from KI . If there is any deliberate decision
involved it is a deliberate choice of a precommitment at KPI to a clause in
the program for routine expansion precommitting the inquirer to an approach to
contracting from inconsistency should routine expansion lead to inconsistency.
Even so, the decision concerning a contingency plan for contracting from
inconsistency depends on answering the question: What should be a best or an
admissible contraction removing H from KI on the belief contravening supposition
that a deliberate choice of a contraction removing Hc from KI is required? The
supposition of this question contravenes or should contravene the full beliefs of
the inquiring agent. Even so, the question could be addressed as a problem of
rational choice. Given a specification of the options and the payoffs, how should
a decision be taken? In Levi 1991, I contended that in such contexts of
“coerced” contraction where the inquirer has no coherent option but to retreat from
inconsistency, the inquirer still needs to decide “how to contract”.
Deliberate Contraction
11 In Levi 1991, ch. 4, I distinguished between coerced and uncoerced contraction and took note of
the fact that both types of contraction require consideration of the problem of “How to Contract”.
This is the question of determining what should constitute contraction removing h from K. I
subsequently discussed this matter in Levi 1997. My final word on this topic (I think) is to be
found in Levi 2004. I do not discuss this topic here but place emphasis on what I take to be the two
important types of contraction.
In any decision problem, the rationality of the choice made depends on the set of
options available to the decision maker – that is to say available to the decision
maker according to the decision maker’s point of view. In order for a proposition to
represent an available option, the decision maker must be convinced of his or her
ability to implement the option if he or she chooses to do so. The decision maker
must also judge it a serious possibility that he or she will implement the option.
If the decision maker fully believes that he or she will not make the choice, from
the decision maker’s point of view, deliberation directed at deciding whether to
implement the option is pointless.
Deciding how to contract by removing H is, so I assume, a problem for choice
where the inquirer seeks to implement the best option available that if implemented
would remove H from KI .
Given the nature of the problem, the domain of options from which choice should be
made should be some subset of the contractions removing H from KI . Should the
set of options include all contractions removing H from KI ?
This question is not well formed. A contraction removing H from KI is a potential
state of full belief. We need to identify the Boolean algebra of potential states of
full belief and then ascertain the subset of it consisting of contractions from KI
removing H. A customary view is to think of this algebra as the powerset of a set W
of atoms conceptually accessible to the inquirer.
12 The contraction is also characterized as the intersection of the set of consequences of K and the set of consequences of T.
If the extension is the finitely additive probability mentioned in (a), the assessment of informational value is given by 1 − M(x) for all x in the algebra. This is the undamped assessment of informational value.
There is an alternative assessment – the damped assessment of informational value. Given any finite subset S of the powerset of ULK, the damped informational value of the join of S is the minimum value assigned to an element of S.13
In discussing contraction, we are interested in subsets of the dual ultimate partition U*K given K. The dual ultimate partition is, of course, a subset of the basic partition. The informational value of a subset T of U*K is the minimum of the informational values assigned to elements of T and hence corresponds to the maximum value of the informational-value-determining probability M. This maximum exists as long as T is finite. We shall consider cases where U*K and, hence, T is finite.
In Levi 2004, I proposed that contraction from K removing H should be assessed
by evaluating the damped informational value of every contraction from K removing
H recognized by K and U*K and restricting choice to contractions that minimize
loss of damped informational value.
I also proposed a rule for Ties that recommended choosing the weakest contraction minimizing loss of informational value.
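The recipe just described can be made concrete with a small numerical sketch. The cells, the probability M, and the helper names below are invented for illustration, and the reading of the tie-breaking rule as "readmit every cell at least as valuable as the best H-incompatible cell" is an assumption consistent with the surrounding text rather than Levi's own example.

```python
# Cells of a (dual) ultimate partition, each tagged with whether it is
# compatible with the conjecture H, plus an informational-value-determining
# probability M over the cells.  All values here are made up.
CELLS = {
    "w1": {"consistent_with_H": True,  "M": 0.10},
    "w2": {"consistent_with_H": False, "M": 0.15},
    "w3": {"consistent_with_H": False, "M": 0.30},
    "w4": {"consistent_with_H": True,  "M": 0.45},
}

def value(cell):
    """Undamped informational value of a single cell: 1 - M."""
    return 1.0 - CELLS[cell]["M"]

def damped_value(cells):
    """Damped informational value of the join of a set of cells:
    the minimum of the values of its elements."""
    return min(value(c) for c in cells)

# To remove H, the contraction must readmit at least one cell incompatible
# with H, so the best achievable damped value is set by the best such cell.
not_H_cells = [c for c in CELLS if not CELLS[c]["consistent_with_H"]]
best = max(damped_value({c}) for c in not_H_cells)

# Tie-breaking / weakest admissible option: readmit every cell whose own
# value is at least as good, whether it is compatible with H or not.
mild = {c for c in CELLS if value(c) >= best}

print(best)                 # 0.85: minimal loss corresponds to cell w2
print(sorted(mild))         # ['w1', 'w2']: the weakest contraction with that value
print(damped_value(mild))   # 0.85: weakening to 'mild' incurs no further loss
```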
This proposal is intended for cases where the inquirer is committed to evalu-
ating damped informational value based on a numerically determinate probability
distribution over ULK .
In general, inquirers will not be committed to such numerically determinate
assessments but rather to a set of permissible assessments. I contend that the set
should be convex. The recommendation is to choose the weakest of the V-admissible
contraction.
The result is that the appropriate contraction removing H from K should be
what Rott and Pagnucco call “severe withdrawal” and I call “mild contraction”
for situations where the evaluation of loss of informational value is numerically
determinate or is representable the restriction of the set of probability distributions
over ULK to elements of U*K . that weakly order the elements of U*K and, hence,
maxichoice contractions from K whether they remove H or not.
Moreover, the recommendation is based on appeal to an account of rational
choice appropriate both for cases where the options are weakly ordered and where
weal ordering fails.
If the domain of options were to be restricted to the domain of partial meet
contractions removing H from K, and the assessment of informational value remains
numerically determinate or at least free from ordinal conflict, the recommendation
would coincide with the recommendation of those who subscribe to the AGM
account of contraction.
13 In Levi 1991 and 1997, I proposed a variant on the damped informational value assessment. I abandoned it in Levi 2004. I called the earlier variant “version 1 damped informational value” and the later one “version 2 damped informational value”. In Rott (2006), Hans Rott showed that my characterization of version 1 was seriously defective. In Levi (2006), I showed how to repair the defect. But there were other reasons, including comments by Hansson and Olsson (1995), that argue for abandoning version 1 damped informational value. I rehearse them in Levi (2004).
But the AGM account provides no decision theoretic basis for restricting the
domain of options to just the partial meet contractions.
I do not mean to suggest that arguments have not been given in support of partial
meet contraction. David Makinson has offered at least two. The first points out that
every contraction removing H from K that is not the join of a subset T of U*K all
of whose members entail Hc is the join of a subset T’ that is the union of just such
a subset with another whose members entail H. This “withdrawal” incurs a greater
loss of information than is incurred by using the partial meet contraction.
If we think of this argument decision theoretically, it presupposes that the goal
of contraction should be to minimize loss of information rather than informational
value. This is not a goal to which many, including Makinson, would subscribe; for
if endorsed it favors restricting choice to maxichoice contractions removing H from
K.14
14 H. Rott (2006) declares that my recent proposal for assessing loss of informational value (damped informational value version 2) is strongly counterintuitive. The basis for his charge is that the recommended contraction could be the disjunction of a great number of maxichoice contractions each of which on its own incurs a great loss of informational value. But this basis presupposes what I deny – to wit, that the aim of contraction is to minimize loss of undamped informational value. Rott also insists that my proposal is sheer definition or stipulation. The only sense in which this is so is that damped informational value version 2 represents one among many utility functions that might be candidates for representing the value of information. I do not deny that rational agents could have different preferences. But the alternatives do not appear appetizing. Consider the interesting results reported in Rott (1993, 2001). Rott has explored choice functions over a domain of maxichoice contractions and the preference relations thus generated. In cases where the choice function yields a set of two or more maxichoice contractions as optimal, he stipulates without appeal to decision theoretic considerations implementation of a corresponding partial meet contraction. Having done this he establishes a connection between choice consistency conditions for choice functions defined over the domain of maxichoice contractions and important axioms for contraction. Had he adopted a version 2 damped informational value utility function over the power set of the maxichoice contractions removing H, the correspondence Rott identifies could have been rationalized decision theoretically rather than by stipulation. Even with this improvement, the approach still restricts the options to the set of partial meet contractions without any decision theoretic rationale for doing so whatsoever. I do not deny that Rott has established a correspondence between choice consistency over maxichoice contractions and axioms for contraction. Such a correspondence may satisfy philosophical logicians. But I do not see how it could satisfy anyone interested in a decision theoretic rationalization of contraction.
Pagnucco and Rott (1999) do consider the full range of contractions removing H as options. But, as Rott says, they think of the goals of contraction in such cases as being informational value and fairness (expressed by the rule for Ties), which are conflicting primary desiderata.
I cannot prove anyone irrational who takes this position but I think it is obviously untenable. When two potential contractions removing H minimize loss of informational value, one may break ties by suspending judgment – which involves moving to a weaker contraction than either of the two – provided that this contraction does not incur a greater loss in informational value. If it does, invoking the rule for Ties is untenable. If it does not, then even if the rule for Ties recommends favoring a contraction weaker than the two options that carry the same informational value that it does, it recommends minimizing loss of informational value. Why should Pagnucco and Rott deny this? I suspect that they think that suspending judgment between two options that minimize loss of information or informational value cannot coherently minimize loss of information or of informational value. This is so for information. But why do they insist that it is so for informational value? If it were true, minimizing loss of informational value would lead straightforwardly to choosing maxichoice contractions, a consequence he acknowledges is untenable.
The second argument alludes to the fact that every withdrawal is revision
equivalent to a partial meet contraction. That is to say, expanding the partial
meet contraction and expanding the revision equivalent withdrawal yield the same
revision of K. Here the revision is an AGM revision.
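Revision equivalence can be pictured with a small worlds-as-states toy model: two contraction outcomes are revision equivalent when expanding each by H gives the same result. The worlds, the two candidate outcomes, and the helper name below are illustrative assumptions, not AGM's or Levi's constructions.

```python
# States are modeled as sets of possible worlds (an illustrative convention).

def expand(state, H):
    """Expansion by H: strike out the worlds incompatible with H."""
    return state & H

WORLDS = frozenset({1, 2, 3, 4})
H = frozenset({1, 2})                          # worlds in which H holds
K = frozenset({3})                             # K rules H out

partial_meet_outcome = frozenset({1, 3})       # readmits one H-world
withdrawal_outcome = frozenset({1, 3, 4})      # also readmits a further ¬H-world

assert K <= partial_meet_outcome <= withdrawal_outcome   # both weaken K

# Expanding either outcome by H yields the same revision of K, so the two
# contraction outcomes are revision equivalent.
print(expand(partial_meet_outcome, H) == expand(withdrawal_outcome, H))  # True
```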
Consider the following transformation of K by adding H. The Ramsey revision
of K by adding H is the same as AGM revision in two cases: (i) K is inconsistent
with H, (ii) K is consistent with both H and Hc . In case (iii) where K is inconsistent
with Hc , AGM revision by adding H is an identity transformation. Ramsey revision
requires contraction by removing H and then expanding by adding H.
If the Recovery postulate for contraction is applicable, AGM revision and
Ramsey revision are equivalent. But Recovery should fail in many cases – most
notably statistical examples. If X knows that a coin has been tossed and landed heads
and then contracts by removing the claim that the coin has been tossed, X should
give up the claim that the coin landed heads (or, indeed, that it landed at all).
Restoring the claim that the coin has been tossed will restore the claim that the
coin landed on the surface but not that it landed heads. In that case recovery fails
and AGM revision is no longer equivalent to Ramsey revision.
Although every withdrawal is AGM revision equivalent to a partial meet
contraction, it is not Ramsey revision equivalent to a partial meet contraction. If the
view of legitimate belief change I have been sketching is along the right lines, neither
form of revision plays a central role in justifying belief change. Revision comes into
its own when an analysis of modal judgment on a supposition is on offer. I contend
that such Ramsey revision is more adequate to this task than AGM revision.
Hans Rott (1993, 2001) has offered a representation of the preference among
maxichoice contractions in terms of choice functions. Given a choice function over
the domain of maxichoice contractions removing H, the value of the function is the
set of optimal maxichoice contractions removing H. The contraction determined by
the choice function is the join of the set of optimal maxichoice contractions and is
the partial meet contraction. This join is not, in general, a maxichoice contraction
and, hence, is not a member of the value of the choice function examined by Rott.
But it is the join of the maxichoice contractions in the value of the choice function.
One could examine choice functions that take as arguments sets of contractions
removing H whether they are maxichoice or not. Rott does not develop an account of
preference over all contractions that shows that the meet contraction is best among
all the options available to the decision maker. He does show that the most preferred
maxichoice contractions removing H from K can be used to define a partial meet
contraction. But given his restricted account of the domain of the choice functions,
Rott cannot show that the recommended partial meet contraction is optimal.
It would have been easy for Rott to have obtained this definition of partial meet
contraction in terms of choice functions (at least in the finite case) had he extended
his choice functions from the domain of maxichoice contractions removing H from
K to the powerset of this domain (that is to say, the set of partial meet contractions).
Given any set of maxichoice contractions, its value would be the best of the values
of the maxichoice contractions.
But even if Rott had taken this step, he would not have provided a satisfactory
decision theoretic account of contraction because he would not have explained
decision theoretically why “withdrawals” are left out of account. Using damped
informational value (that is to say the type 2 or second version), it is possible to
account for withdrawals.
My aim here has been to sketch the rationalization I have offered for a decision
theoretic approach to how to contract that completes the account of coerced
contraction and deliberate contraction outlined in previous sections. Given that
account, the corrigibility of the inquirer’s point of view, from the inquirer’s point
of view, is justifiable even though the inquirer, to be coherent, is committed to ruling
out the serious possibility that his or her current point of view is in error.
As I have done throughout most of my career, I have been maintaining that the
belief states that are targets for justifiable change are states of full belief. Such states
serve as standards for serious possibility. From the point of view of the inquirer
who makes the judgment of serious possibility, there is no serious possibility that
what the inquirer fully believes is false. The inquirer is committed to epistemological
infallibilism.
The main philosophical tradition maintains that such a view is untenable because
it implies epistemological incorrigibilism. I deny this. The inquiring agent is
sometimes warranted in contracting his or her state of full belief.
My argument for this view is based on the legitimacy of both routine expansion
and deliberate or inductive expansion.
Routine expansion is potentially conflict injecting. If we are to acknowledge the
legitimacy of such expansion whether by appealing to the testimony of the senses or
of competent witnesses and experts, programs for routine expansion must provide
contingency plans for contraction in case inconsistency inadvertently arises.
Deliberate contraction to give informationally valuable propositions a hearing
can be rationalized only on the assumption that subsequent to such contraction,
expansion will be legitimate that affords a promise of removing the doubts that
are raised. Sometimes routine expansion may be all that is required. But when the
conjectures to be given a hearing are highly theoretical (as is the case with General
Relativity Theory), deliberate or inductive expansion may be needed.
Expansion, so I claim, is legitimate as long as the quest for valuable information
may be seen to compensate for the risk of importing error incurred in expansion. If
we are to avoid the dogma that equates infallibilism with incorrigibilism, we need
to follow William James in rejecting W.K. Clifford’s emphasis on the avoidance of
error as the sole desideratum of inquiry.
References
Alchourrón, C., Gärdenfors, P., & Makinson, D. (1985). On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic, 50, 510–530.
Carnap, R. (1960). The aim of inductive logic. In E. Nagel, P. Suppes, & A. Tarski (Eds.), Logic,
methodology and philosophy of science (pp. 302–318). Stanford: Stanford University Press.
Cohen, L. J. (1970). The implications of induction. London: Methuen.
Cohen, L. J. (1977). The probable and the provable. Oxford: Oxford University Press.
Davidson, D. (1998). Truth rehabilitated, Unpublished Manuscript.
De Finetti, B. (1964). Foresight: Its logical laws, its subjective sources. In H. E. Kyburg & H.
Smokler (Eds.), Studies in subjective probability (pp. 93–158). New York: Wiley.
Hansson, S. O., & Olsson, E. J. (1995). Levi contractions and AGM contractions: a comparison.
Notre Dame Journal of Formal Logic, 36, 103–119.
Jeffrey, R. C. (1965). The logic of decision. New York: McGraw Hill.
Levi, I. (1967a). Gambling with truth. Cambridge: MIT Press, Paper, 1973.
Levi, I. (1967b). Information and inference. Synthese, 17, 369–391.
Levi, I. (1974). On indeterminate probabilities. The Journal of Philosophy, 71, 391–418.
Levi, I. (1980). The enterprise of knowledge. Cambridge: MIT Press, Paper, 1983.
Levi, I. (1983). Truth, fallibility and the growth of knowledge. In R. S. Cohen & M. W. Wartofsky
(Eds.), Language, logic and method (pp. 153–174). Dordrecht: Reidel.
Levi, I. (1986). Hard choices: Decision making under unresolved conflict. New York: Cambridge
University Press.
Levi, I. (1991). The fixation of belief and its undoing. Cambridge: Cambridge University Press,
Paper, 2009.
Levi, I. (1997). For the sake of the argument. Cambridge: Cambridge University Press, Paper,
2007.
Levi, I. (2002). Maximizing and satisficing measures of evidential support. In M. David (Ed.),
Reading natural philosophy: Essays in the history and philosophy of science and mathematics
(pp. 315–333). Chicago: Open Court.
Levi, I. (2003). Contraction from epistemic hell is routine. Synthese, 135, 141–164.
Levi, I. (2004). Mild contraction. Oxford: Oxford University Press.
Levi, I. (2006). Informational value should be relevant and damped! Reply to Rott (2006).
Niiniluoto, I. (1984). Is science progressive? Dordrecht: D. Reidel.
Olsson, E. J. (2003). Avoiding epistemic hell, Levi on testimony and consistency. Synthese, 135,
119–140.
Pagnucco, M., & Rott, H. (1999). Severe withdrawal and recovery. Journal of Philosophical Logic,
28, 501–547. See ‘Erratum’ Journal of Philosophical Logic 29 (2000).
Ramsey, F. P. (1990). Philosophical papers (D. H. Mellor, Ed.). Cambridge: MIT Press.
Rott, H. (1993). Belief contraction in the context of the general theory of rational choice. Journal
of Symbolic Logic, 58, 1426–1450.
Rott, H. (2001). Change, choice and inference: A study of belief revision and nonmonotonic
reasoning. Oxford: Oxford University Press.
Rott, H. (2006). The value of truth and the value of information: On Isaac Levi’s epistemology.
In E. Olsson (Ed.), Knowledge and inquiry (pp. 179–200). Cambridge: Cambridge University
Press.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Shackle, G. L. S. (1949, 1952). Expectation in economics, Cambridge: Cambridge University
Press.
Shackle, G. L. S. (1961, 1969). Decision, order and time. Cambridge: Cambridge University Press.
Spohn, W. (1988). Ordinal conditional functions: A dynamic theory of epistemic states. In W.
Harper & B. Skyrms (Eds.), Causation in decision, belief change and statistics (pp. 105–134).
Dordrecht: Kluwer Academic Publishers.
Chapter 16
Belief Contraction in the Context of the General
Theory of Rational Choice
Hans Rott
Introduction
The theory of partial meet contraction and revision was developed by Alchourrón,
Gärdenfors and Makinson (henceforth, “AGM”) in a paper published in the Journal
of Symbolic Logic in 1985. That paper is the by now classic reference of the
flourishing research programme of theory change, or as it is alternatively called,
belief revision (Fuhrmann and Morreau 1991; Gärdenfors 1988, 1992; Katsuno and
Mendelzon 1991). In particular, it has been shown that partial meet contraction is
a powerful tool for the reconstruction of other kinds of theory change such as safe
contraction, epistemic entrenchment contraction, and base contraction (Alchourrón
and Makinson 1986; Nebel 1989, 1992; Rott 1991, 1992a).
The basic idea of partial meet contraction is as follows. In order to eliminate a
proposition x from a theory A while obeying the constraint of deductive closure and
minimizing the loss of information, it is plausible to look at the maximal subsets
B of A that fail to imply x. In an earlier paper, Alchourrón and Makinson had
proved that when A = Cn(A) then taking one such B leaves an agent with too
many propositions, while taking the intersection of all such B’s leaves him with too
few. In AGM (1985), AGM investigate the idea of taking the intersection of a select
set of such B’s. The choice which B’s to take is made with the help of a selection
function. A natural question is whether all these selections can be represented as the
selections of preferred B’s, where the preferences between maximally nonimplying
subsets of A are independent of the proposition x to be deleted.
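As a compact illustration of this mechanism, the sketch below computes remainders of a tiny finite base by truth tables and lets one fixed preference ranking over candidate remainders, independent of the sentence x being removed, play the role of the selection function. The encoding and the particular ranking are assumptions made for illustration, not the constructions of AGM (1985) or of this paper.

```python
from itertools import combinations, product

# Minimal propositional machinery over two atoms (illustrative only).
ATOMS = ("p", "q")
VALS = [dict(zip(ATOMS, bits)) for bits in product([True, False], repeat=2)]

def entails(premises, phi):
    """Classical consequence checked by truth tables."""
    return all(phi(v) for v in VALS if all(f(v) for f in premises))

def remainders(base, phi):
    """A ⊥ phi: inclusion-maximal subsets of the base that do not imply phi."""
    found = []
    for size in range(len(base), -1, -1):
        for s in combinations(list(base), size):
            if not entails(s, phi) and not any(set(s) < big for big in found):
                found.append(set(s))
    return found

def p(v):
    return v["p"]

def q(v):
    return v["q"]

base = {p, q}

def rank(remainder):
    """One fixed preference over candidate remainders (here: prefer keeping q),
    used for every sentence to be removed."""
    return (q in remainder, p in remainder)

def selection(rems):
    """The selection function gamma: the best remainders under the ranking."""
    best = max(rank(r) for r in rems)
    return [r for r in rems if rank(r) == best]

def partial_meet(base, phi):
    """Partial meet contraction: intersect the selected remainders."""
    rems = remainders(base, phi)
    return set(base) if not rems else set.intersection(*selection(rems))

contracted = partial_meet(base, lambda v: v["p"] and v["q"])   # remove p∧q
print(q in contracted, p in contracted)   # True False: the preferred remainder wins
```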
The purpose of the present paper is threefold. First, we put the theory of partial
meet contraction in a broader perspective. We decompose it into two layers, each
H. Rott ()
Department of Philosophy, University of Regensburg, 93040 Regensburg, Germany
e-mail: [email protected]
of which can be cultivated with the help of methods developed in different research
areas. On the one hand, in relating selection functions to preference relations we can
draw on existing work in social choice theory (Chernoff 1954; Herzberger 1973; Sen
1982; Suzumura 1983). On the other hand, we shall elaborate on a remark of Grove
(1988) and link maximally nonimplying subsets to “possible worlds” or models,
thereby making it possible to compare partial meet contraction with semantical
approaches to belief revision and nonmonotonic reasoning (Kraus et al. 1990;
Katsuno and Mendelzon 1991; Lindström 1991; Lehmann and Magidor 1992).
Exaggerating somewhat, we can say that the theory of partial meet contraction
emerges from juxtaposing the general theory of rational choice and a tiny bit of
model theory. After introducing abstract postulates for contraction functions, we
reprove the two main representation theorems of AGM (1985, pp. 521, 530) con-
cerning partial meet contraction and transitively relational partial meet contraction
in a slightly more systematic fashion.
Second, we provide a partial solution to a problem left unanswered by AGM
and still considered to be an interesting open question in Makinson and Gärdenfors
(1991, p. 195). More precisely, we present two new results that lie strictly “between”
those of AGM (1985), viz., representation theorems for relational and negatively
transitively relational partial meet contraction. However, these results hold only
under certain preconditions. If the theory to be contracted is logically finite, then all
these conditions are met. Our decomposition allows for a uniform method of proof
using so-called Samuelson preferences. It increases our understanding of the partial
meet contraction mechanism by localizing exactly in which parts of the proofs the
finiteness assumption is effective.
Third, as an application, we explore the logic of a variant of syntax-based belief
change, namely simple and prioritized base contractions in the finite case. The basic
idea here is that real life theories are generated from finite axiomatizations and that
the axioms may carry different epistemic weight. Both the syntactical encoding of
a theory and the prioritization are held to be relevant for theory change. We achieve
a logical characterization of simple and prioritized base contraction by combining
one of our representation theorems with observations originating with Lewis (1981)
and Nebel (1989, 1992).
Independently from the research for this article, related work has been carried
out by Katsuno and Mendelzon (1991) and Lindström (1991). Lindström proves
numerous representation theorems, and like us, he heavily draws on results from the
general theory of rational choice. However, there are some important differences.
He adopts a semantic approach right from the outset, while our reconstruction
starts by turning a syntactic approach into an essentially semantic one, with the
notable feature that any two “worlds” in our sense are linguistically distinguishable
(our models are “injective”, to use a term of Michael Freund). Lindström’s main
concern is nonmonotonic inference relations and the related area of belief revision,
whereas we shall focus on belief contraction. (Everything we might wish to say
about the intimate connection between revisions and contractions is contained in
AGM 1985 and Gärdenfors 1988). He applies liberal maximization where we apply
stringent maximization. He uses different postulates for choice functions and a
different notion of revealed preference. The main contrast between his approach
and ours, however, is that Lindström permits revisions by—possibly infinite—sets
of propositions whilst we stick to the original AGM model of changing theories by
single propositions. Lindström’s generalization allows him to revise by “worlds”
and thus dispense with a finiteness assumption which will prove to be necessary
at several places in the AGM framework. In fact, it is a major aim of our paper to
make transparent the reasons where and why logical finiteness is needed in the AGM
theory. Further important results on purely finitistic belief revision are found in
Katsuno and Mendelzon (1991) who incorporate a revision operator into their object
language. Both papers are highly recommended and should be read in conjunction
with the present one.
Unfortunately, space limitations do not allow a presentation that makes for
easy reading. Familiarity with earlier work either in the area of theory change—
an excellent overview is Gärdenfors (1988)—or in the general theory of rational
choice—of particular relevance is Herzberger (1973)— will greatly facilitate the
reader’s task. However, we will shortly repeat the basic definitions so that our
presentation is in principle self-contained. It may be useful to inform the reader in
advance about the fundamental entities that will show up as determinants of theory
change. We shall meet contraction functions ∸, selection functions γ, preference
relations ≤ (and ⊑), maximally nonimplying sets M (and N), maximally consistent
sets or “worlds” W, as well as simple and prioritized belief bases B and ⟨Bi⟩. We
shall frequently jump to and fro between these kinds of entities. When we wish
to generate a contraction function, selection function, preference relation, maximal
nonimplying set, world, or belief base from another kind of entity, we shall use
metalevel mappings denoted by C, S, P, M, W, and B respectively.
Rational Choice
In the general theory of choice and preference we often find an idea which can be
phrased in the slogan “Rational choice is relational choice”. That is, rational choice
is choice which can be construed as based on an underlying preference relation.
The intended interpretation of the set γ(S), called the choice set for S, is that its elements
are regarded as equally adequate or satisfactory choices for an agent whose values are
represented by the function γ, and who faces a decision problem represented by the set
S. Following Chernoff (1954) . . . , this relativistic concept of equiadequacy for a given
decision problem bears sharp distinction from invariant concepts like preferential matching
or indifference which for a given agent are not relativized to decision problems, and which
may be subject to more stringent constraints, both for rational agents and for agents in
general. (Herzberger 1973, p. 189, notation adapted)
Choice sets are taken to be sets of “best” elements. There are basically two
ideas to make this precise. The first is based on a nonstrict (reflexive) preference
relation ≤:
1
Intuitively, however, I think that liberal maximization is preferable. Liberal maximization is based
on strict relations, which do not allow one to distinguish between incomparabilities and indifferences.
Nonstrict relations do make this distinction, but stringent maximization tends to require connected
relations which often can be had only if incomparabilities are turned into indifferences—i.e., if
augmentations are used. The interpretation of nonstrict relations as the converse complements of—
more intuitive—strict relations explains the crucial role of negative transitivity and negative well-
foundedness in the following. Also compare the recommendation in Rott (1992b) to regard the
nonstrict epistemic entrenchment relation ≤E of Gärdenfors and Makinson (1988) as the converse
complement of a more intuitive strict relation <E.
This is the idea of stringent maximization: γ = S(≤), where S(≤)(S) = {x ∈ S : y ≤ x for all y ∈ S} for every S ∈ X. When we say that γ is relational, we mean that there is some relation ≤ over X such that γ = S(≤).
For any nonstrict relation ≤, < is to denote the asymmetric part of ≤, which is defined by x < y iff x ≤ y and not y ≤ x. ≤ is called n-acyclic if no n objects x1, x2, . . . , xn form a cycle under <, i.e., if a chain x1 < x2 < · · · < xn < x1 does not occur. 1-acyclicity is irreflexivity, 2-acyclicity is asymmetry. ≤ is called acyclic if it is n-acyclic for every n = 1, 2, 3, . . . . ≤ is called negatively acyclic if there is no cycle under ≰, i.e., if never x1 ≰ x2 ≰ · · · ≰ xn ≰ x1, and it is called negatively well-founded if there is no infinite descending chain under ≰, i.e., if never · · · ≰ x3 ≰ x2 ≰ x1. For connected relations ≤, negative well-foundedness coincides with converse well-foundedness, where infinite ascending chains x1 < x2 < x3 < . . . do not occur. Obviously, if ≤ is conversely (or negatively) well-founded then it is acyclic (negatively acyclic). ≤ is smooth with respect to X if there are no infinite descending ≰-chains in S, for every S ∈ X. Smoothness is a restricted form of negative well-foundedness. ≤ is called negatively transitive (virtually connected, modular, ranked) if x ≰ y and y ≰ z imply x ≰ z. It is quickly verified that a connected relation is negatively transitive iff it is quasi-transitive in the sense that its asymmetric part < is transitive. Quasi-transitivity is a much disputed requirement in social choice theory (Herzberger 1973; Sen 1982; Suzumura 1983). It should be noted that all transitive relations are both acyclic and quasi-transitive.
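For readers who like to experiment, the following small Python helpers (names are ours, not the paper's) check some of the properties just defined for a finite nonstrict relation given as a set of ordered pairs; on a finite domain it suffices to test cycles of distinct elements up to the size of the domain.

```python
from itertools import permutations

def strict(leq):
    """Asymmetric part <: x < y iff x ≤ y and not y ≤ x."""
    return {(x, y) for (x, y) in leq if (y, x) not in leq}

def is_n_acyclic(leq, domain, n):
    """No n distinct objects form a cycle x1 < x2 < ... < xn < x1."""
    lt = strict(leq)
    for cycle in permutations(domain, n):
        closed = cycle + (cycle[0],)
        if all((closed[i], closed[i + 1]) in lt for i in range(n)):
            return False
    return True

def is_acyclic(leq, domain):
    # on a finite domain, cycles of distinct elements up to |domain| suffice
    return all(is_n_acyclic(leq, domain, n) for n in range(1, len(domain) + 1))

def is_negatively_transitive(leq, domain):
    """x ≰ y and y ≰ z imply x ≰ z (equivalently: x ≤ z implies x ≤ y or y ≤ z)."""
    return all((x, y) in leq or (y, z) in leq
               for x in domain for y in domain for z in domain
               if (x, z) in leq)

def is_quasi_transitive(leq, domain):
    """The asymmetric part < is transitive."""
    lt = strict(leq)
    return all((x, z) in lt for (x, y) in lt for (y2, z) in lt if y == y2)

domain = {"a", "b", "c"}
leq = {(x, y) for x in domain for y in domain if x >= y}   # the linear order c ≤ b ≤ a
print(is_acyclic(leq, domain),
      is_negatively_transitive(leq, domain),
      is_quasi_transitive(leq, domain))                     # True True True
```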
In many contexts one can hope to recover the underlying preferences of an agent
from observed choice behavior. The two most commonly used types of “revealed
preference” relations are the Samuelson preferences (Samuelson 1950), defined by
x ≤γ y iff there is an S ∈ X such that x ∈ S and y ∈ γ(S),
and the base preferences, defined by
x ≤γ,2 y iff y ∈ γ({x, y}).
The terminology is taken from Herzberger (1973), where many other possibilities
of defining notions of revealed preference are discussed. Obviously, x ≤γ,2 y implies
x ≤γ y. Notice that neither of these relations is guaranteed to be reflexive, unless γ
is 1-covering. ≤γ,2 is defined for arbitrary γ's, but the definition makes good
sense only for 2-covering ones for which {x, x′} is always in X. In this case, base
The following lemmas list a number of important facts which are basically common
knowledge in the general theory of rational choice (cf. Herzberger 1973; Sen 1982;
Suzumura 1983). For the straightforward proofs, see Appendix 2.
Lemma 1. (a) If γ is relational then it satisfies (I) and (II).
(b) If γ is 12-covering and satisfies (I) and (II), then γ = S(P2(γ)).
(c) If γ is 12-covering and satisfies (I), then P(γ) = P2(γ).
(d) If γ is 12-covering and satisfies (I) and (II), then γ = S(P(γ)).
(e) If γ is 12-covering and relational, then γ = S(P(γ)) = S(P2(γ)).
(f) If γ is additive and satisfies (I) and (II), then γ = S(P(γ)).
Lemma 2. (a) If γ is 12n-covering and satisfies (I), then P(γ) = P2(γ) is n-acyclic. If γ is ω-covering and satisfies (I), then P(γ) is acyclic.
(b) If γ is 123-covering and satisfies (I) and (III), then P(γ) is negatively transitive.
(c) If γ is finitely additive and satisfies (IV), then P(γ) is transitive.
Lemma 3. (a) If ≤ is smooth with respect to X, then S(≤) is a selection function
over X which satisfies (I) and (II).
(b) If ≤ is negatively transitive and negatively well-founded (or: if ≤ is negatively
transitive and smooth and X is subtractive), then S(≤) satisfies (III).
(c) If ≤ is transitive, then S(≤) satisfies (IV).
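As an illustration of how revealed preferences and stringent maximization interact in the finite case, here is a hedged Python sketch (our own encoding; the universe, the menus and all helper names are not from the paper). Starting from a connected transitive preference, it confirms the round trip γ = S(P(γ)) in the spirit of Lemma 1.

```python
from itertools import combinations

# Menus are all nonempty subsets of a three-element universe; a selection
# function gamma maps each menu to a nonempty subset of it.
U = ["a", "b", "c"]
MENUS = [frozenset(s) for r in range(1, len(U) + 1) for s in combinations(U, r)]

def S(leq):
    """Stringent maximization: choose the ≤-greatest elements of each menu."""
    return {menu: frozenset(x for x in menu if all((y, x) in leq for y in menu))
            for menu in MENUS}

def P(gamma):
    """Samuelson preference: x ≤ y iff some menu contains x and selects y."""
    return {(x, y) for menu in MENUS for y in gamma[menu] for x in menu}

leq = {(x, y) for x in U for y in U if x >= y}   # 'a' is the unique best element
gamma = S(leq)
print(gamma == S(P(gamma)))                      # True: gamma = S(P(gamma))
print(P(gamma) == leq)                           # True for this connected order
```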
Now we bring conditions (I)–(IV) into a form which is more suitable for our
purposes.
(I′) For all S, S′ ∈ X such that S ∪ S′ ∈ X, γ(S ∪ S′) ⊆ γ(S) ∪ γ(S′)
(II′) For all S, S′ ∈ X such that S ∪ S′ ∈ X, γ(S) ∩ γ(S′) ⊆ γ(S ∪ S′)
(III′) For all S ∈ X and S′ such that S ∪ S′ ∈ X, if γ(S ∪ S′) ∩ S′ = ∅ then γ(S) ⊆ γ(S ∪ S′)
(IV′) For all S ∈ X and S′ such that S ∪ S′ ∈ X, if γ(S ∪ S′) ∩ S ≠ ∅, then γ(S) ⊆ γ(S ∪ S′)
2
It is worth pointing out that the characteristic definition of a relation of epistemic entrenchment
(see Gärdenfors and Makinson 1988; Rott 1992b) between propositions from an observed
contraction behavior, viz.
x ≤E y iff x ∉ A ∸ (x ∧ y) or y ∈ A ∸ (x ∧ y)
can also be interpreted as a base preference (Rott 1992b, p. 61). In that paper it is argued that the
instruction “remove x ∧ y” should be regarded as an instruction to remove x or remove y, where the
agent holding theory A has free choice which proposition(s) out of {x, y} to remove. [Note added
in 2015: This sketch of an idea was turned into a theory in Rott (2003).]
(I&II) For all S, Si ∈ X, i ∈ I, if S ⊆ ⋃{Si : i ∈ I} then S ∩ ⋂{γ(Si) : i ∈ I} ⊆ γ(S)
(I&II′) For all S, S′, S″ ∈ X, if S ⊆ S′ ∪ S″ then S ∩ γ(S′) ∩ γ(S″) ⊆ γ(S)
Note that condition (II′) is just a restriction of (II) to index sets with at most two
elements, and similarly for (I&II) and (I&II′).
Lemma 4. (a) If γ is subtractive, then condition (I) is equivalent to (I′).
(b) If γ is finitely additive, compact, and satisfies (I) and (II′), then γ = S(P(γ)).
(c) If γ satisfies (I), then (III) is equivalent to (III′).
(d) (IV) is equivalent to (IV′).
(e) If γ is additive, then the conjunction of (I) and (II) is equivalent to (I&II).
(f) If γ is finitely additive, then the conjunction of (I) and (II′) is equivalent to (I&II′).
3
This is no problem for Lindström (1991) whose selection functions are always ω-covering.
Consequently, Lindström's constructions can always make use of the base preferences P2(γ).
4
This marks a difference with Lewis (1981) who identifies propositions with sets of extra-linguistic
possible worlds and logical consequence with set-theoretic inclusion. Lacking compactness, Lewis
has to ponder the impact of a “Limit Assumption” for premise semantics.
Now let A again be a theory. For M ∈ UA, put W(M) = Cn(M ∪ {¬x}) where x
is any proposition in A ∖ M, and put M(W) = W ∩ A, for any maximally consistent
set W such that A ⊈ W. The reader is invited to verify: W is well-defined(!), W is a
bijection from UA to VA = {W ∈ W : A ⊈ W}, and M is the converse of W, i.e.,
M(W(M)) = M and W(M(W)) = W.
Given this representation of the elements of UA, it is clear that they satisfy the
following fullness and primeness conditions: If x ∈ A ∖ M and y ∈ A then x → y ∈ M,
and if x, y ∈ A ∖ M then x ∨ y ∈ A ∖ M (cf. Observations 6.1 and 6.2 in AGM 1985).
UA is just the set of all maximal proper subtheories of the theory A. Moreover, we
immediately get
Corollary 1. Let A be a theory and x, y, yi be in A ∖ Cn(∅).
(i) For M ∈ UA, x ∉ M iff M ∈ A⊥x.
(ii) A⊥(x ∧ y) = A⊥x ∪ A⊥y.
(iii) A⊥(x ∨ y) = A⊥x ∩ A⊥y.
(iv) A⊥(x ∨ ¬y) = A⊥x ∖ A⊥y.
(v) If A⊥x ⊆ ⋃{A⊥yi : i ∈ I}, then A⊥x ⊆ ⋃{A⊥yi : i ∈ I0} for some finite I0 ⊆ I.
Facts (i) and (ii) are contained in AGM (1985, Lemma 2.4 and Lemma 4.1).
We see that A⊥, the special domain X of the selection functions which will figure
in partial meet contractions, is closed under finite unions, finite intersections, and
differences. We give a direct proof of the compactness property (v). Let A⊥x ⊆
⋃{A⊥yi : i ∈ I}. Then {yi : i ∈ I} ⊢ x. For otherwise, by Lindenbaum's Lemma and
Lemma 5, there would be an M ∈ A⊥x such that {yi : i ∈ I} ⊆ M, so M ∉ A⊥yi for
every i, contradicting our hypothesis. Compactness of Cn gives us {yi : i ∈ I0} ⊢ x
for some finite I0 ⊆ I. Thus, there is, for every M ∈ A⊥x, an i ∈ I0 such that yi ∉ M.
Hence, by (i), there is, for every M ∈ A⊥x, an i ∈ I0 such that M ∈ A⊥yi. Thus
A⊥x ⊆ ⋃{A⊥yi : i ∈ I0}, as desired.
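A quick sanity check of Corollary 1(ii) and (iii) can be run in the same brute-force style as the earlier sketch; the encoding below is again our own and only meant as an illustration for a two-atom language.

```python
from itertools import product, combinations

WORLDS = list(product([True, False], repeat=2))
PROPS = [frozenset(s) for r in range(5) for s in combinations(range(4), r)]

def Cn(props):
    common = set(range(4))
    for m in props:
        common &= m
    return {m for m in PROPS if common <= m}

def remainders(A, x):
    """A⊥x computed by exhaustive search over subsets of A."""
    A, found = list(A), []
    for r in range(len(A), -1, -1):
        for B in map(frozenset, combinations(A, r)):
            if x not in Cn(B) and not any(B < C for C in found):
                found.append(B)
    return set(found)

p = frozenset(i for i, (vp, vq) in enumerate(WORLDS) if vp)
q = frozenset(i for i, (vp, vq) in enumerate(WORLDS) if vq)
A = Cn({p & q})
print(remainders(A, p & q) == remainders(A, p) | remainders(A, q))   # (ii): True
print(remainders(A, p | q) == remainders(A, p) & remainders(A, q))   # (iii): True
```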
Now we can introduce selection functions for belief revision. Let A be a theory.
A selection function γ: A⊥ → 2^(2^A) is called a selection function for A. It follows
from Corollary 1 that every selection function γ for A is finitely additive, subtractive
and compact.
Let γ be a selection function for A. The completion γ* of γ is defined by
γ*(A⊥x) = {M ∈ A⊥x : ⋂γ(A⊥x) ⊆ M}, for all x ∈ A ∖ Cn(∅). Following
AGM (1985), we call a selection function γ complete if γ = γ*. If γ is complete,
then M ∈ γ(A⊥x) whenever ⋂γ(A⊥x) ⊆ M ∈ UA. (Proof: Since {x → y : y ∈
A} ⊆ ⋂γ(A⊥x), ⋂γ(A⊥x) ⊆ M implies M ⊬ x, so M ∈ A⊥x by Corollary 1(i).)
A contraction function over a theory A is a function ∸A : L → 2^L. We write A ∸ x
for ∸A(x), and as there will be no danger of confusion, we shall often write just ∸
for ∸A. The term ‘A ∸ x’ should be read as ‘the result of rationally removing x
from A’. The idea of a contraction function dictates that it should satisfy at least the
postulates A ∸ x ⊆ A and x ∉ A ∸ x (unless ⊢ x). More postulates will be discussed
in section “Postulates for Contraction Functions”. Intuitively, A ∸ x is the minimal,
most economical or rational, change of A needed to discard x.
Given a selection function γ for A, the partial meet contraction it determines will
be denoted by C(γ): A ∸ x = ⋂γ(A⊥x) for every x ∈ A ∖ Cn(∅), and A ∸ x = A for
every x ∉ A and every x ∈ Cn(∅). There is a slight deviation from AGM (1985) in
order to avoid the application of γ to ∅ and to preserve the perfect correspondence
between UA and VA. The function ∸ is called a partial meet contraction over A if
there is a selection function γ for A such that ∸ = C(γ).
A selection function γ for A is called (transitively, quasi-transitively, con-
nectively, acyclicly) relational over A if there is a (transitive, quasi-transitive,
connected, acyclic) preference relation ≤ over UA (over 2^A in AGM 1985) such
that for every x ∈ A ∖ Cn(∅):
γ(A⊥x) = {M ∈ A⊥x : M′ ≤ M for all M′ ∈ A⊥x}.
then clearly,
which include B), in symbols W0 = [[y]] (W0 = [[B]]). We call γ elementary (∩-
elementary) if γW([[¬x]]) is elementary (∩-elementary) for every x ∈ A ∖ Cn(∅).
If γ is elementary (∩-elementary) and γW([[¬x]]) = [[y]] (γW([[¬x]]) = [[B]]), then
A ∸ x = Cn({z ∨ y : z ∈ A}) = A ∩ Cn(y) (respectively, A ∸ x = Cn({z ∨ z′ : z ∈ A
and z′ ∈ B}) = A ∩ Cn(B)).
Remark 1. γ is complete iff it is ∩-elementary.
Proof. From left to right. We show that [[(⋂γ(A⊥x)) ∪ {¬x}]] = γW([[¬x]]), for
every x ∈ A ∖ Cn(∅). Clearly, (⋂γ(A⊥x)) ∪ {¬x} ⊆ W for every W ∈ γW([[¬x]]).
To show the converse, suppose for reductio that (⋂γ(A⊥x)) ∪ {¬x} ⊆ W and
W ∉ γW([[¬x]]). Hence, by the latter, M(W) ∉ γ(A⊥x), so by the completeness of
γ, ⋂γ(A⊥x) ⊈ M(W). But also (⋂γ(A⊥x)) ∪ {¬x} ⊆ W, and ⋂γ(A⊥x) ⊆ A,
so ⋂γ(A⊥x) ⊆ M(W), and we have a contradiction.
From right to left. Suppose for reductio that there are x ∈ A ∖ Cn(∅) and M ∈ UA
such that ⋂γ(A⊥x) ⊆ M and M ∉ γ(A⊥x). Hence W(M) ∉ γW([[¬x]]). Since
γ is ∩-elementary, there is a set B of propositions such that γW([[¬x]]) = [[B]]. So
B ⊈ W(M). Take some y ∈ B ∖ W(M). Since x ∨ y ∉ W(M), we get, by the definition
of W, x ∨ y ∉ M. So by ⋂γ(A⊥x) ⊆ M, x ∨ y ∉ ⋂γ(A⊥x). But since x ∈ A and
y ∈ ⋂γW([[¬x]]), we get x ∨ y ∈ A ∩ ⋂γW([[¬x]]) = ⋂γ(A⊥x), and we have again
a contradiction. Q.E.D.
A set B of propositions will be called logically finite (or finite modulo Cn, or simply
finite) if Cn partitions B into finitely many cells. The finite case is much easier to
handle than the general one. This is due to the fact that every selection function over
a logically finite theory is ω-covering and even full.
Notice that for a theory A to be finite modulo a common logic Cn, it will be
necessary that the underlying language has only finitely many atoms. For suppose
there are infinitely many atoms p1 ; p2 ; p3 ; : : : in our language. Then for every
proposition x from A there are infinitely many atoms pi not occurring in x. Thus the
infinitely many (x ∨ pi)'s which are all contained in A will be mutually nonequivalent,
so A is not finite. Conversely, if only a finite number of nonequivalent propositional
operators of bounded arity is definable in Cn, as in classical propositional logic or
in modal logics with finitely many modalities, then the finiteness of the number of
propositional atoms is also sufficient for the logical finiteness of theories A phrased
in the language in question.
Given a logically finite theory A, it is clear that each of the following sets is finite:
A⊥x for every x, A⊥, and UA.
A representative of a logically finite set of propositions B is a conjunction of
representatives of all the equivalence classes under Cn of propositions in B.
Henceforth, we shall use the following notational convention. For sets of
propositions A, B, . . . , M, N, . . . , W, . . . which are finite modulo Cn, the lower case
letters a, b, . . . , m, n, . . . , w, . . . denote their representatives. For any two sets A and
B such that B ⊆ A, bA denotes the disjunction of representatives of those equivalence
classes under Cn, the elements of which are in Cn(A) ∖ Cn(B). When we are dealing
with a fixed theory A, we simply write b instead of bA. If B = A, then bA is defined
to be the falsity, ⊥. We may call bA the corepresentative of B (relative to A).
It is easy to verify in the finite case that for M ∈ UA and W = W(M), w is
equivalent with ¬a ∧ m, and that for W ∈ VA and M = M(W), m is equivalent with
a ∨ w. This helps us to get the following useful
Lemma 6. Let A be a finite theory and M ∈ UA.
(i) If w is the representative of W(M) and wL is the corepresentative of W(M)
relative to the set of all propositions in L, then the following four propositions
are equivalent: mA, m → a, wL and ¬w.
(ii) A⊥m = {M}.
(iii) For all M1, . . . , Mn ∈ UA, if m ⊢ m1 ∨ · · · ∨ mn, then M = Mi for some i.
Lemma 6(ii), together with Corollary 1(ii), shows that in the finite case every
subset 𝓜 of UA equals A⊥x for some proposition x, viz., for x = ⋀{m : M ∈ 𝓜}.
This in turn means that all selection functions for a finite theory A are ω-covering
and in fact full. They are not only complete (∩-elementary) but even elementary (cf.
AGM 1985, Observation 4.6).
We now turn to a set of rationality criteria which has gained some prominence in the
literature on belief change. The basic AGM postulates are given by (∸1)–(∸6), and
the two supplementary ones are (∸7) and (∸8). For their motivation, see Gärdenfors
(1988).
(∸1) A ∸ x is a theory
(∸2) A ∸ x ⊆ A
(∸3) If x ∉ A then A ∸ x = A
(∸4) If x ∈ A ∸ x then ⊢ x
(∸5) A ⊆ Cn((A ∸ x) ∪ {x})
(∸6) If Cn(x) = Cn(y) then A ∸ x = A ∸ y
(∸7) A ∸ x ∩ A ∸ y ⊆ A ∸ (x ∧ y)
(∸8) If x ∉ A ∸ (x ∧ y) then A ∸ (x ∧ y) ⊆ A ∸ x.
These postulates, and all following postulates, are understood as quantified over
all theories A and all propositions x and y. It follows from (∸1) and (∸5) that A ∸ x =
A for every x ∈ Cn(∅). We introduce two new conditions.
(∸8r) A ∸ (x ∧ y) ⊆ Cn(A ∸ x ∪ A ∸ y)
(∸8c) If y ∈ A ∸ (x ∧ y), then A ∸ (x ∧ y) ⊆ A ∸ x.
With two very recent exceptions,5 I have never seen a condition like (∸8r)
discussed in writings on the logic of theory change. Given (∸4), (∸8) implies the
“covering condition”
A ∸ (x ∧ y) ⊆ A ∸ x or A ∸ (x ∧ y) ⊆ A ∸ y
(AGM 1985, Observation 3.4) and hence (∸8r).6 Postulate (∸8c) was found to be
relevant in Rott (1992b), where it has the same name. The “r” in (∸8r) stands for
“relational”, and “c” stands for “cumulative”. The first name will be explained by the
present paper, the second one is explained in Rott (1992b). For contraction functions
satisfying (∸4), (∸8c) is also a weakening of (∸8). However, there is no logical
relationship between (∸8c) and (∸8r), not even in the finite case and in the presence
of (∸1)–(∸7). To see this, consider the propositional language L over the two
atoms p and q. In the following two figures, “x / y” is short for “A ∸ x = Cn(y)”.
It is easily verified that the contraction function ∸ over A = Cn(p ∧ q) defined
in Fig. 16.1 satisfies (∸1)–(∸7) and (∸8r), but it does not satisfy (∸8c), because
q ∈ A ∸ (p ∧ q) but Cn(q) = A ∸ (p ∧ q) ⊈ A ∸ p = Cn(¬p ∨ q). On the other hand,
the contraction function ∸ over the same theory A defined in Fig. 16.2 satisfies (∸1)
–(∸7) and (∸8c), but it does not satisfy (∸8r), because Cn(p ∨ q) = A ∸ (p ∧ q) ⊈
Cn((A ∸ p) ∪ (A ∸ q)) = Cn({¬p ∨ q, p ∨ ¬q}) = Cn(p ↔ q).
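The following Python sketch (again our own model-set encoding, with an arbitrarily chosen ranking of worlds standing in for a transitive, connected preference on remainders) builds one transitively relational partial meet contraction over Cn(p ∧ q) and tests (∸7), (∸8r), (∸8c) and (∸8) exhaustively; by the representation theorems all four checks should succeed.

```python
from itertools import product, combinations

WORLDS = list(product([True, False], repeat=2))            # valuations of (p, q)
PROPS = [frozenset(s) for r in range(5) for s in combinations(range(4), r)]

def Cn_models(model_set):
    """The theory determined by a set of models: all weaker propositions."""
    return {m for m in PROPS if model_set <= m}

def Cn_props(props):
    common = frozenset(range(4))
    for m in props:
        common &= m
    return Cn_models(common)

A_MODELS = frozenset(i for i, (vp, vq) in enumerate(WORLDS) if vp and vq)
A = Cn_models(A_MODELS)                                    # A = Cn(p ∧ q)
RANK = {0: 0, 1: 1, 2: 1, 3: 2}                            # illustrative ranks of worlds

def contract(x):
    """A ∸ x: keep the A-models and add the best-ranked models of ¬x."""
    if x not in A or x == frozenset(range(4)):             # x ∉ A or ⊢ x
        return A
    negx = frozenset(range(4)) - x
    best = min(RANK[i] for i in negx)
    return Cn_models(A_MODELS | {i for i in negx if RANK[i] == best})

ok = True
for x in A:
    for y in A:
        cxy, cx, cy = contract(x & y), contract(x), contract(y)
        ok &= cx & cy <= cxy                                # (∸7)
        ok &= cxy <= Cn_props(cx | cy)                      # (∸8r)
        ok &= (y not in cxy) or (cxy <= cx)                 # (∸8c)
        ok &= (x in cxy) or (cxy <= cx)                     # (∸8)
print(ok)                                                   # True
```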
We now relate the abstract postulates for contraction functions to our previous
requirements for selection functions in partial meet contraction.
5
Both were discovered independently and concern belief revision rather than belief contraction.
The first exception is condition (R8) in Katsuno and Mendelzon (1991). The second is the infinitary
condition “Gamma” in Lindström (1991) which is labelled (BC7) in its variant for belief revision
operations.
6
Incidentally, it is proved in AGM (1985, Observation 6.5) that the conjunction of (∸7) and (∸8)
is equivalent to the even stronger “ventilation condition”
A ∸ (x ∧ y) = A ∸ x or A ∸ (x ∧ y) = A ∸ y or A ∸ (x ∧ y) = A ∸ x ∩ A ∸ y.
Fig. 16.1 A contraction function ∸ over Cn(p ∧ q) satisfying (∸1)–(∸7) and (∸8r) but not (∸8c)
Fig. 16.2 A contraction function ∸ over Cn(p ∧ q) satisfying (∸1)–(∸7) and (∸8c) but not (∸8r)
Lemma 7. Let A be a theory. If ∸ = C(γ) for some selection function γ for A and γ satisfies
(a) no further condition,
(b) (I″),
(c) (II″) and γ is complete,
(d) (III″),
(e) (IV″),
then ∸ satisfies, respectively,
(a) (∸1)–(∸6),
(b) (∸1)–(∸7),
(c) (∸1)–(∸6) and (∸8r),
(d) (∸1)–(∸6) and (∸8c),
(e) (∸1)–(∸6) and (∸8).
Proof. Let ∸ be a partial meet contraction function over A determined by γ.
(a) It is proved in AGM (1985, Observation 2.5) that ∸ satisfies (∸1)–(∸6).
It is easy to verify that (∸7) and (∸8), and thus also (∸8r) and (∸8c), are
satisfied whenever one of the limiting cases x ∉ A ∖ Cn(∅) or y ∉ A ∖ Cn(∅)
holds. In the rest of this proof, we always presume the principal case x, y ∈
A ∖ Cn(∅).
(b) Now let γ satisfy (I″), i.e., let γ(A⊥(x ∧ y)) ⊆ γ(A⊥x) ∪ γ(A⊥y). Hence
(⋂γ(A⊥x)) ∩ (⋂γ(A⊥y)) = ⋂(γ(A⊥x) ∪ γ(A⊥y)) ⊆ ⋂γ(A⊥(x ∧ y)), i.e.,
A ∸ x ∩ A ∸ y ⊆ A ∸ (x ∧ y). That is, ∸ satisfies (∸7).
(c) Now let γ be complete and satisfy (II″). Let z ∈ A ∸ (x ∧ y) = ⋂γ(A⊥(x ∧
y)). So by (II″), z ∈ ⋂(γ(A⊥x) ∩ γ(A⊥y)). Now suppose for reductio that
z ∉ Cn(A ∸ x ∪ A ∸ y). Then there is an M ∈ A⊥z such that (⋂γ(A⊥x)) ∪
(⋂γ(A⊥y)) ⊆ M. Since γ is complete, we get M ∈ γ(A⊥x) and M ∈ γ(A⊥y),
so M ∈ γ(A⊥x) ∩ γ(A⊥y). But z ∉ M, so z ∉ ⋂(γ(A⊥x) ∩ γ(A⊥y)) which
gives us a contradiction. So ∸ satisfies (∸8r).
(d) Now let γ satisfy (III″). Since the antecedents of (III″) and (∸8c) are identical
and γ(A⊥x) ⊆ γ(A⊥(x ∧ y)) entails ⋂γ(A⊥(x ∧ y)) ⊆ ⋂γ(A⊥x), it is
obvious that ∸ satisfies (∸8c).
(e) Now let γ satisfy (IV″). By exactly the same argument as in (d), ∸ satisfies
(∸8). Q.E.D.
It is clear from Lemma 8 that in the proof of the completeness half of Theorem 1
the determining selection function is chosen complete, but we do not think it is
It is clear from Lemma 8 that in the proof of the completeness half of Theorem 1
the determining selection function is chosen complete, but we do not think it is
necessary to state this in the theorem. The same comment applies to the following
three representation theorems.
Theorem 2. Every relational partial meet contraction function ∸ over A satisfies
(∸1)–(∸7), and if ∸ is determined by a selection function that is both relational
and complete (equivalently, ∩-elementary), then it satisfies (∸8r). Conversely, every
contraction function ∸ over A satisfying (∸1)–(∸7) and (∸8r) is a relational
partial meet contraction function.
Proof. For the first part, let ∸ be a partial meet contraction function determined by
a preference relation ≤. By Lemma 1(a), S(≤) satisfies (I) and (II). Since A⊥ is
subtractive, it also satisfies (I′) and (II′), by Lemma 2(a), and also (I″) and (II″),
by reformulation. So by Lemma 7(b), C(S(≤)) satisfies (∸1)–(∸7), and by
Lemma 7(c), it satisfies (∸8r), if S(≤) is complete, i.e., ∩-elementary.
For the converse, let ∸ satisfy (∸1)–(∸7) and (∸8r). By Lemma 8(b) and
(c), C(S(∸)) = ∸ and S(∸) satisfies (I″) and (II″), and also (I′) and (II′),
by reformulation. Since A⊥ is subtractive, S(∸) satisfies (I), by Lemma 4(a).
Since A⊥ is also finitely additive and compact, S(∸) is relational with respect
to P(S(∸)), by Lemma 4(b). That is, S(∸) = S(P(S(∸))). Hence ∸ =
C(S(P(S(∸)))), i.e., ∸ is relational with respect to P(S(∸)). Q.E.D.
Proof. This is Observation 4.4 of AGM (1985), but we shall sketch our proof which
is quite different from the construction offered there.
For the first part of the theorem, let ∸ be a partial meet contraction function
determined by a transitive preference relation ≤. We show in the same way as
in the proof of Theorem 2 that S(≤) satisfies (I′) and (II′), and we know from
Lemmas 3(c) and 4(d) that it satisfies (IV) and (IV′). So by Lemma 7(b), C(S(≤))
satisfies (∸1)–(∸7), and by Lemma 7(e), it satisfies (∸8).
For the converse, let ∸ satisfy (∸1)–(∸7) and (∸8). As in the proof of
Theorem 2, we get that ∸ is relational with respect to P(S(∸)). Moreover, by
Lemma 8(e), S(∸) satisfies (IV′) and also (IV), by Lemma 4(d). Since A⊥ is
finitely additive, P(S(∸)) is transitive, by Lemma 2(c). Q.E.D.
Corollary 2. Let A be a logically finite theory. Then ∸ is a partial meet contraction
function iff it satisfies (∸1)–(∸6); it is a relational partial meet contraction function
iff it satisfies (∸1)–(∸7) and (∸8r); it is a negatively transitively relational partial
meet contraction function iff it satisfies (∸1)–(∸7), (∸8r) and (∸8c); and it is a
transitively relational partial meet contraction function iff it satisfies (∸1)–(∸7)
and (∸8).
Proof. By Theorems 1, 2, 3, and 4, because for finite theories A, every selection
function γ over A⊥ is complete, every negatively transitive nonstrict preference
relation over UA is smooth, and A⊥ is 123-covering.
It is easy to locate the impact of the finiteness assumption. In the case of relational
partial meet contractions (Theorem 2), it is a constraint (completeness) imposed
by model theory which has to be met in order to make the “soundness” direction
work. In the case of negatively transitively relational partial meet contractions
(Theorem 3), it is a constraint imposed by the theory of rational choice (that γ be
123-covering and ≤ be smooth) which is satisfied without strains on intuitive plausibility
only in the finite case.
It should be noted that although (∸8) in the context of the other postulates
implies (∸8r) and (∸8c), transitive relationality does not imply negatively transitive
relationality. However, transitivity in companion with connectivity implies negative
transitivity. And it is known from AGM (1985, § 5) that a connectivity requirement
on the underlying preference relation changes very little in the partial meet
mechanism.7
We conclude this section with a direct representation of the Samuelson prefer-
ences over UA revealed by contraction behavior. Let ≤ = P(S(∸)). Then in the
general case, M ≤ M′ iff there is an x such that A ∸ x ⊆ M′ and M ∈ A⊥x, or there
7
In order to reconcile this with your intuitions, cf. Footnote 1.
8
This simplified rephrasing of the AGM construction makes use of the fact that for all M ∈ UA and
contraction functions ∸ satisfying the AGM postulates, A ∸ x ⊆ M implies M ∈ A⊥x.
Ci = C ∩ Bi for any set of propositions C. Then, in the context of this section, C
is a preferred x-discarding subset of B if C ⊆ B and for every i, Ci is an inclusion-
maximal subset of Bi subject to the condition that x is not implied.
Two kinds of information are used in prioritized base contraction: the syntactical
information encoded in the structure of the propositions in B, and the weighting of
these propositions expressed by the prioritization. If the belief base is simple, i.e., if n = 1, then
we only exploit syntactical information. Prioritized base contractions and revisions
are studied in an infinitistic framework by Nebel (1989, 1992).9 He shows that
prioritized base revisions can be represented by partial meet theory revisions. Then
he proves that a certain variant of simple base contraction satisfies (∸1)–(∸7)
but not (∸8), and a corresponding result for prioritized base revisions. However,
Nebel leaves open the question which logic is characterized by prioritized base
contractions. Building on Theorem 3 and its corollary, we shall answer this question
for the finite case. Our concern will be the slightly more complicated case of
prioritized base contractions, and we try to make our proof more transparent by
introducing the concept of a base-oriented selection function.
Now consider the following strict preference relations between arbitrary sets of
propositions. For every i ∈ {1, . . . , n} and any two sets C and C′, we write C ⊏i C′
if and only if Ci ⊂ C′i and Cj = C′j for every j > i. We write C ⊏ C′ if there is an
i such that C ⊏i C′. The relation ⊏ is to reflect the fact that intuitively a set C′ is
better than another set C just in case it contains more important beliefs than C. In
particular, ⊏ satisfies a version of Hansson's (1992) maximizing property, because
C ∩ B ⊂ C′ ∩ B implies C ⊏ C′. It is immediately verified that C is a preferred
x-discarding subset of B if and only if it is a maximal element in B⊥x under ⊏.
(Recall that B⊥x is the set {N ⊆ B : x ∉ Cn(N) and x ∈ Cn(D) for all D with
N ⊂ D ⊆ B}.)
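As a small illustration of this characterization, the sketch below (our own encoding; a prioritized base is represented as a tuple (B1, . . . , Bn) with Bn the most important class) computes B⊥x by brute force and keeps only its ⊏-maximal elements, i.e. the preferred x-discarding subsets, for a two-element base under its two possible prioritizations.

```python
from itertools import product, combinations

WORLDS = list(product([True, False], repeat=2))   # valuations of (p, q)

def prop(f):
    return frozenset(i for i, w in enumerate(WORLDS)
                     if f(dict(zip(("p", "q"), w))))

def implies(props, x):
    common = set(range(len(WORLDS)))
    for m in props:
        common &= m
    return common <= x

def remainder_index_sets(flat, x):
    """B⊥x as index sets over the flattened base `flat`."""
    found = []
    for r in range(len(flat), -1, -1):
        for C in map(set, combinations(range(len(flat)), r)):
            if not implies([flat[i] for i in C], x) and not any(C < D for D in found):
                found.append(C)
    return found

def preferred_discards(base, x):
    flat = [b for Bi in base for b in Bi]
    cls = [i for i, Bi in enumerate(base) for _ in Bi]      # priority class of each index
    def dominated(C, D):        # C ⊏ D: D keeps strictly more at the top differing class
        for i in reversed(range(len(base))):
            Ci, Di = {j for j in C if cls[j] == i}, {j for j in D if cls[j] == i}
            if Ci != Di:
                return Ci < Di
        return False
    rems = remainder_index_sets(flat, x)
    return [frozenset(flat[i] for i in C) for C in rems
            if not any(dominated(C, D) for D in rems)]

p = prop(lambda w: w["p"])
p_implies_q = prop(lambda w: (not w["p"]) or w["q"])
q = prop(lambda w: w["q"])

# The base {p, p → q}: which maximal q-discarding subset is preferred
# depends on which element is prioritized.
print(preferred_discards(({p_implies_q}, {p}), q) == [frozenset({p})])            # p prioritized
print(preferred_discards(({p}, {p_implies_q}), q) == [frozenset({p_implies_q})])  # p → q prioritized
```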
Following Nebel, we let ⊑ be the converse complement of ⊏, i.e.,
C ⊑ C′ iff not C′ ⊏ C. We denote this preference relation over arbitrary sets
of propositions by P(⟨Bi⟩). Clearly, since ⊏ is irreflexive, asymmetric, conversely
well-founded and transitive, P(⟨Bi⟩) is reflexive, connected, negatively well-
founded and negatively transitive. But P(⟨Bi⟩) is not transitive, not even in the
special case of simple belief bases (n = 1) where ⊏ coincides with ⊂.
As before, a selection function γ = S(⊑) can be defined by stringent
maximization. But this time γ is a selection function for the base B as well as
for the theory A, and its domain may in fact be construed as the class of all
nonempty sets of sets of propositions in L, that is 2^(2^L) ∖ {∅}. In the following, it is
understood that γ(B⊥x) = {N ∈ B⊥x : N′ ⊑ N for all N′ ∈ B⊥x}, while
γ(A⊥x) = {M ∈ A⊥x : M′ ⊑ M for all M′ ∈ A⊥x}. Either way, γ is a selection
function.
9
Nebel’s (1992) treatment of the fully general infinite case is not quite correct. Slips have crept
into his claim that his C + is nonempty, into his definition (9) of , and into the proof
of Proposition 8. As Nebel (personal communication) has suggested, they can be remedied by
imposing a condition of converse well-foundedness on .
Let ⟨Bi⟩ be a prioritized belief base for A, and let γ = S(P(⟨Bi⟩)) be the
selection function over A⊥ determined by ⟨Bi⟩. A straightforward idea to get
prioritized base contractions is the following:
A ∸ x = ⋂{Cn(N) : N ∈ γ(B⊥x)} for every x ∈ A ∖ Cn(∅), and
A ∸ x = A for every x ∉ A and every x ∈ Cn(∅).
In the special case of simple belief bases (n = 1) we have γ(B⊥x) = B⊥x, so
this definition boils down to the full meet of B⊥x.
Contraction functions obtained in this way satisfy most but not all of the
postulates we would want them to satisfy.
Theorem 5. If ∸ is a prioritized base contraction function as determined by the
above definition, then it satisfies (∸1)–(∸4), (∸6), (∸7), and (∸8c). However, even
if the base is simple, ∸ will in general fail to satisfy (∸5) and (∸8r).
Proof. It is obvious that ∸ satisfies (∸1)–(∸4) and (∸6).
For (∸7), assume that N ⊢ z for every N ∈ γ(B⊥x) ∪ γ(B⊥y). We need to show
that N′ ⊢ z for every N′ ∈ γ(B⊥(x ∧ y)). Let N′ ∈ γ(B⊥(x ∧ y)) ⊆ B⊥(x ∧ y).
First we note that every element of B⊥(x ∧ y) is either in B⊥x or in B⊥y. Without
loss of generality, assume that N′ ∈ B⊥x. We are ready if we can show that N′ is in
γ(B⊥x). Suppose it is not. Then there is an N1 ∈ B⊥x such that N′ ⊏ N1. N1 must
not be in B⊥(x ∧ y), since N′ is in γ(B⊥(x ∧ y)). Because N1 ∈ B⊥x ∖ B⊥(x ∧ y), we
get that there is a proper superset N2 of N1 such that N2 ∈ B⊥y. Since N1 ∈ B⊥x,
N2 ⊢ x. Since every proper superset of N2 satisfies both x and y, N2 is in B⊥(x ∧ y).
By the maximizing property, we have N1 ⊏ N2. By the transitivity of ⊏, we get
N′ ⊏ N2. This, however, contradicts N′ ∈ γ(B⊥(x ∧ y)) and N2 ∈ B⊥(x ∧ y).
Therefore our supposition must have been false.
For (∸8c), assume that, first, N ⊢ y and, second, N ⊢ z for every N ∈ γ(B⊥(x ∧
y)). We need to show that N′ ⊢ z for every N′ ∈ γ(B⊥x). Let N′ ∈ γ(B⊥x) and
suppose for reductio that N′ ⊬ z. If N′ is in B⊥(x ∧ y) then there must be, by the
second assumption, an N1 in B⊥(x ∧ y) such that N′ ⊏ N1. In fact, N1 can be chosen
from γ(B⊥(x ∧ y)), by the transitivity and converse well-foundedness of ⊏. By the
first assumption, N1 ∈ B⊥x. But this contradicts N′ ⊏ N1 and N′ ∈ γ(B⊥x).
Hence N′ cannot be in B⊥(x ∧ y). But because N′ ⊬ x ∧ y, there must be a proper
superset N2 of N′ in B⊥(x ∧ y). We know that N2 ⊢ x, by N′ ∈ B⊥x. By the first
assumption then, N2 cannot be in γ(B⊥(x ∧ y)). By the transitivity and converse
well-foundedness of ⊏, there must be an N3 in γ(B⊥(x ∧ y)) such that N2 ⊏ N3.
By our first assumption, N3 is in B⊥x. By the maximizing property, we have N′ ⊏
N2 ⊏ N3, so by transitivity N′ ⊏ N3. This, however, contradicts N′ ∈ γ(B⊥x) and
N3 ∈ B⊥x. Therefore our supposition must have been false.
The failure of (∸5) is obvious. Consider for instance B = {p ∧ q}, for which
A = Cn(p ∧ q) ⊈ Cn((A ∸ p) ∪ {p}) = Cn(p).
The failure of (∸8r) is not so easy to see. The simplest examples involve three
atoms p, q, r and look somewhat like the following. Let B consist of the following
four propositions:
(p ∧ q) ∨ (p ∧ r) ∨ (q ∧ r),   r → (p ↔ q),
q ∧ (p ↔ r),   and   p ∧ r.
Therefore, the above account yields A ∸ p = Cn(q ∨ (p ∧ r)) and A ∸ q = Cn(¬(¬p ∧
q ∧ r)), so Cn(A ∸ p ∪ A ∸ q) = Cn((p ∧ r) ∨ (q ∧ ¬r)). On the other hand,
A ∸ x = ⋂{Cn(N ∪ {x → a}) : N ∈ γ(B⊥x)}10 for every x ∈ A ∖ Cn(∅), and
A ∸ x = A for every x ∉ A and x ∈ Cn(∅).
10
Or equivalently, Cn((⋂{Cn(N) : N ∈ γ(B⊥x)}) ∪ {x → a}) = Cn(A ∸′ x ∪ {x → a}), where
A ∸′ x follows the first definition.
11
Nebel's (1992) later paper deals with revisions where this problem does not arise. In fact, if
revisions are construed as being generated by the so-called Levi-identity A ∗ x = Cn((A ∸ ¬x) ∪
{x}), then the modification made in our official definition does not have any effect on revisions.
of this, note that the official definition cures our above counterexample to (∸8r)
by strengthening A ∸ x to Cn(y ∧ (x → z)) and A ∸ y to Cn(y → (x ∧ z)), so that
Cn(A ∸ x ∪ A ∸ y) = Cn(x ∧ y ∧ z). Since this has been the full theory A, (∸8r) is
clearly satisfied.
We denote the prioritized base contraction over A determined by the prioritized
belief base ⟨Bi⟩ by C(⟨Bi⟩). A contraction function ∸ over A is a simple (or
prioritized) base contraction function if there is a simple belief base B (a prioritized
belief base ⟨Bi⟩) for A such that ∸ = C(B) (respectively, ∸ = C(⟨Bi⟩)). In
conformity with the terminology of AGM (1985), simple base contractions could
also be named full meet base contractions.
Lemma 9. Let B = ⋃Bi, A = Cn(B), M ∈ A⊥x, and let γ = S(P(⟨Bi⟩)). Then
M ∈ γ(A⊥x) iff N ⊆ M for some N ∈ γ(B⊥x).
Let us call selection functions which harmonize choices in A⊥ with choices in
B⊥ in the way exhibited in Lemma 9 base-oriented. The following result shows
that the selection function γ induced by a prioritized belief base leads to the same
result when applied directly to belief bases as when applied to the generated theory.
Prioritized base contraction thus reduces to a special kind of partial meet theory
contraction. A proof of almost the same fact has already been given by Nebel (1989,
Theorem 14, 1992, Theorem 7). Our proof is somewhat more general in that it only
turns on the fact that γ is base-oriented. Nothing hinges on the particular definition
of ⊑.
Theorem 6. Let ⟨Bi⟩ be a prioritized belief base for a logically finite theory A, and
let γ = S(P(⟨Bi⟩)). Then C(⟨Bi⟩) = C(γ).
Proof. For the principal case x ∈ A ∖ Cn(∅), we have to show that
⋂{Cn(N ∪ {x → a}) : N ∈ γ(B⊥x)} = ⋂γ(A⊥x).
We are now going to construct, for an arbitrary given preference relation ≤ over
a logically finite theory A, an equivalent simple belief base B for A. This base will be
denoted B(≤). For M ∈ UA, let M̂ be the set {M′ ∈ UA : M′ ≰ M} ∪ {M} of elements
in UA which are not covered by M, together with M itself. Let m̂ be an abbreviation
for the disjunction ⋁{m′ : M′ ∈ M̂}, with each m′ being a representative of the
Theorem 7 tells us that for every prioritized belief base ⟨Bi⟩ for A there is
a simple belief base B for A such that C(B) = C(⟨Bi⟩). In the finite case,
every prioritization can be encoded syntactically. Does this mean that prioritization
is superfluous? In answering this question we first have to emphasize that our
generation of B from ⟨Bi⟩ took a rather roundabout route: B = B(P(⟨Bi⟩)). An
interesting problem now is whether a more perspicuous construction of B from ⟨Bi⟩
is possible. This question, too, is put in informal terms, and as such it permits no
absolutely precise answer. Still we think that the answer must be no. Even for
the most straightforward prioritized belief bases ⟨Bi⟩ the generated simple base
B = B(P(⟨Bi⟩)) becomes grossly unintuitive, and there is no prospect of finding
different solutions to the problem. Consider for example the base containing p and
p → q with the alternative prioritizations ⟨{p → q}, {p}⟩ and ⟨{p}, {p → q}⟩. In the
former case, B = {p ∧ q, p, p ∨ q, q → p}, while in the latter case, B′ = {q, p ↔ q}
will lead to exactly the same results as the prioritized belief base. But in neither case
is there anything like a transparent relation to the original base ⟨Bi⟩. It appears that
prioritization is useful, notwithstanding its formal dispensability in the finite case.
This question does not have a simple answer. Three different points have to be
taken into consideration.
First, the approaches of AGM and KLM are not as distinct as Makinson and
Gärdenfors seemed to assume. AGM contract and revise by single propositions, and
similarly KLM consider the nonmonotonic consequences of simple propositions.
Makinson and Gärdenfors’s equation y 2 A x iff x j y (iff y 2 C.x/) fully reflects
this. A truly infinitistic stance towards both theory change and nonmonotonic logic
is taken only by Lindström (1991). The question of whether the theory A is logically
finite has no bearing on this issue. Here Makinson and Gärdenfors saw a difference
which simply does not exist.
Secondly, Makinson and Gärdenfors tacitly passed over the fact that in KLM
there is no counterpart to (∗8) or (∸8). But this difference is indeed crucial, as is
clear from a later paper of Lehmann and Magidor (1992). As regards the preferential
logics dealt with by KLM, no “relatively short and abstract” proof seems to
be possible. Thus it appears reasonable to construe Makinson and Gärdenfors’s
question as referring to the rational logics treated by Lehmann and Magidor.
But thirdly, Lehmann and Magidor’s rational logics still differ from AGM-style
theory revisions in that they are not required to satisfy a condition of consistency
preservation which corresponds to postulate (∸4) for contractions. In this respect,
Makinson and Gärdenfors show a keen sense of the intricacies of the situation.
In unpublished notes, we have applied the techniques developed in this paper to
the problem of providing rational logics with canonical models. We have found
that it is possible to transfer our proof to the area of nonmonotonic logic, but
that consistency preservation with respect to the underlying monotonic logic Cn
is, in fact, indispensable for a perfect matching. The reason is that the compactness
property presupposed for Cn runs idle if there are any “inaccessible worlds” (and
this is just what a violation of consistency preservation amounts to). Since the results
of our efforts bear considerable similarity with the venerable presentation in Lewis
(1973)—except for the fact that the role of Lewis’s extra-linguistic propositions (sets
of “real” possible worlds) is played by the propositions of the object language—we
had better refrain from expounding them here in more detail.
12
This appendix was not included in the original 1993 publication of this paper.
RHS ⊆ LHS: Let x ∈ S and assume that for every y ∈ S, {x, y} ∈ X and
x ∈ γ({x, y}). Note that ⋃{{x, y} : y ∈ S} = S ∈ X and x ∈ ⋂{γ({x, y}) : y ∈ S}.
By (II), we get x ∈ γ(⋃{{x, y} : y ∈ S}) = γ(S), as desired.
(c) Let γ be 12-covering and satisfy (I).
P2(γ) ⊆ P(γ): This direction is always valid.
P(γ) ⊆ P2(γ): Let x ≤γ y. Then there is an S ∈ X such that y ∈ γ(S)
and x ∈ S. Because γ is 12-covering, {x, y} ∈ X. But as {x, y} ⊆ S, (I) gives us
y ∈ γ({x, y}), so x ≤γ,2 y.
(d) Follows from (b) and (c).
(e) Follows from (a), (b), and (d).
(f) Let γ be additive and satisfy (I) and (II). We have to show that for every S ∈ X,
γ(S) = {x ∈ S : for all y ∈ S there is a T ∈ X such that x ∈ γ(T) and y ∈ T}.
LHS ⊆ RHS: Take T = S.
RHS ⊆ LHS: Let x ∈ S and assume that for all y ∈ S there is a T^y such that
x ∈ γ(T^y) and y ∈ T^y. Thus x ∈ ⋂γ(T^y) and S ⊆ ⋃T^y. From the former and
the additivity of γ we get x ∈ γ(⋃T^y), by (II), and now the latter and (I) yield
x ∈ γ(S), as desired. Q.E.D.
Proof of Lemma 2. (a) Let γ be 12n-covering and satisfy (I). We show the claim for
P2(γ), which is identical with P(γ), by Lemma 1(c). Suppose for reductio that
x1 <γ,2 x2 <γ,2 · · · <γ,2 xn <γ,2 x1 for some x1, . . . , xn ∈ X. That is, γ({xi, xi+1}) =
{xi+1}, with + denoting addition modulo n. Now consider γ({x1, . . . , xn}) ≠ ∅.
Let xk ∈ γ({x1, . . . , xn}). But xk ∉ γ({xk, xk+1}). This is in contradiction with
(I). The rest of (a) is trivial.
(b) Let γ be 123-covering and satisfy (I) and (III). We show the claim for P2(γ),
which is identical with P(γ), by Lemma 1(c). Assume that x ≰γ,2 y and
y ≰γ,2 z. By definition of ≤γ,2, this means that y ∉ γ({x, y}) and z ∉ γ({y, z}).
Now consider γ({x, y, z}). By (I), y ∉ γ({x, y, z}) and z ∉ γ({x, y, z}), so
γ({x, y, z}) = {x} since γ({x, y, z}) must be a non-empty subset of {x, y, z}.
By (I), x ∈ γ({x, z}), so γ({x, y, z}) ⊆ γ({x, z}). Hence, by (III), γ({x, z}) ⊆
γ({x, y, z}) = {x}, so z ∉ γ({x, z}), i.e. x ≰γ,2 z, as desired.
(c) Let γ be finitely additive and satisfy (IV), and let x ≤γ y and y ≤γ z. This
means that there is an S ∈ X such that y ∈ γ(S) and x ∈ S and a T ∈ X such
that z ∈ γ(T) and y ∈ T. Now consider S ∪ T. Obviously, x ∈ S ∪ T. In order
to show that x ≤γ z it suffices to show that z ∈ γ(S ∪ T). By finite additivity,
S ∪ T ∈ X. By z ∈ γ(T) and (IV), it suffices to show that γ(S ∪ T) ∩ T ≠ ∅.
Suppose for reductio that γ(S ∪ T) ∩ T = ∅. Then, since ∅ ≠ γ(S ∪ T) ⊆ S ∪ T,
γ(S ∪ T) ∩ S ≠ ∅. So by (IV), γ(S) ⊆ γ(S ∪ T). So y ∈ γ(S ∪ T). But since
also y ∈ T, γ(S ∪ T) ∩ T ≠ ∅ after all, and we have a contradiction.
(Notice that an attempted proof of the transitivity of ≤γ,2 would also need,
apart from the 123-covering condition, (I) in order to come from z ∈ γ({x, y, z})
to z ∈ γ({x, z}), so we can rest content with (c).) Q.E.D.
Proof of Lemma 3. (a) Assume that S(≤) is no selection function over X. By the
definition of S(≤), this can only happen if some S ∈ X possesses no greatest
element under ≤. Thus for every xi ∈ S there is an xi+1 ∈ S such that xi+1 ≰ xi.
This, however, contradicts smoothness. That S(≤) satisfies (I) and (II) follows
from Lemma 1(a).
(b) Let γ be relational with respect to some negatively transitive and negatively
well-founded relation ≤. (Or alternatively, let γ be subtractive and relational
with respect to some negatively transitive and X-smooth relation ≤.) Let S, S′ ∈
X, S ⊆ S′ and γ(S′) ⊆ γ(S). We want to show that γ(S) ⊆ γ(S′). Suppose for
reductio that this is not the case, i.e., that there is some x which is in γ(S) but
not in γ(S′). The latter means that there is some y1 in S′ such that y1 ≰ x. As
x ∈ γ(S), y1 ∈ S′ ∖ S. But because γ(S′) ⊆ γ(S) ⊆ S, y1 ∉ γ(S′). So there is
some y2 ∈ S′ such that y2 ≰ y1. By negative transitivity, y2 ≰ x. So by the same
reasoning as before, y2 ∈ S′ ∖ S and y2 ∉ γ(S′). So there is some y3 ∈ S′ such
that y3 ≰ y2. By negative transitivity again, y3 ≰ x, and the same reasoning can
be continued again and again. What we get is an infinite chain y1, y2, y3, . . . in
S′ ∖ S such that · · · ≰ y3 ≰ y2 ≰ y1. But this contradicts the negative well-
foundedness of ≤ (or the subtractivity of γ, which guarantees that S′ ∖ S ∈ X,
and smoothness of ≤).
(c) Let γ be relational with respect to some transitive relation ≤. Let S, S′ ∈ X,
S ⊆ S′ and x ∈ γ(S′) ∩ S. By relationality, the latter condition says that y ≤ x
for all y ∈ S′. Let z ∈ γ(S). We have to show that z ∈ γ(S′), i.e., by relationality,
that y ≤ z for all y ∈ S′. But since x ∈ S and z ∈ γ(S), x ≤ z, so since y ≤ x for
all y ∈ S′, the desired conclusion follows from the transitivity of ≤. Q.E.D.
In order to see that (I) and (II) taken together imply (I&II), we note that
⋃{Si : i ∈ I} ∈ X, by additivity. So (II) gives us S ∩ ⋂γ(Si) ⊆ S ∩ γ(⋃Si), and (I)
gives us S ∩ γ(⋃Si) ⊆ γ(S), whenever S ⊆ ⋃Si. So in this case S ∩ ⋂γ(Si) ⊆
γ(S), as desired.
(f) Similar to (e). Q.E.D.
positive integers playing the role of the odd ones in the previous case.
We find that A ∸ q and A ∸ r do not even jointly imply p0. For consider the
maximally consistent set W containing ¬pi for every i ≥ 0, together with ¬q
and ¬r. Considering what we have just said about A ∸ q and A ∸ r, it becomes clear
that W contains both of these contracted theories, as well as ¬p0.
Since p0 ∈ A ∸ (q ∧ r) but p0 ∉ Cn(A ∸ q ∪ A ∸ r), we have a violation of (∸8r).
Fig. 16.3 Relational partial meet contraction not satisfying (∸8r)
Fig. 16.4 Prioritized base contraction with simple base violating (∸8r), and strengthened
contraction satisfying (∸8r)
Fig. 16.5 Prioritized base contraction of a two-element base, with p having priority over p → q
Fig. 16.6 Prioritized base contraction of a two-element base, with p → q having priority over p
Remarks. Prioritized and corresponding simple base contractions indeed lead to the
same results—as they should, according to the proof of Theorem 7. But note that
the recovery-guaranteeing appendage is essential in quite a few cases. As for the
contraction function ∸, the differences in prioritization are effective only in the case
of A ∸ (p ∧ q), A ∸ q and A ∸ (p ↔ q).
References
Alchourrón, C., Gärdenfors, P., & Makinson, D. (1985). On the logic of theory change: Partial
meet contraction functions and their associated revision functions. Journal of Symbolic Logic,
50, 510–530.
Alchourrón, C., & Makinson, D. (1986). Maps between some different kinds of contraction
function: The finite case. Studia Logica, 45, 187–198.
Arrow, K. J. (1959). Rational choice functions and orderings. Economica, N.S., 26, 121–127.
Chernoff, H. (1954). Rational selection of decision functions. Econometrica, 22, 422–443.
de Condorcet, N. (1785). Essai sur l'application de l'analyse à la probabilité des décisions rendues
à la pluralité des voix. Paris: Imprimerie Royale. Reprint Cambridge: Cambridge University
Press (2014).
Fuhrmann, A., & Morreau, M. (Eds.) (1991). The logic of theory change (Lecture notes in computer
science, Vol. 465). Berlin: Springer.
Gärdenfors, P. (1988). Knowledge in flux: Modeling the dynamics of epistemic states. Cambridge:
Bradford Books/MIT.
Gärdenfors, P. (Ed.) (1992). Belief revision. Cambridge: Cambridge University Press.
Gärdenfors, P., & Makinson, D. (1988). Revisions of knowledge systems using epistemic
entrenchment. In M. Vardi (Ed.), Theoretical aspects of reasoning about knowledge (pp. 83–
95). Los Altos: Morgan Kaufmann.
Grove, A. (1988). Two modellings for theory change. Journal of Philosophical Logic, 17, 157–170.
Hansson, S. O. (1992). Similarity semantics and minimal changes of belief. Erkenntnis, 37,
401–429.
Herzberger, H. G. (1973). Ordinal preference and rational choice. Econometrica, 41, 187–237.
Katsuno, H., & Mendelzon, A. O. (1991). Propositional knowledge base revision and minimal
change. Artificial Intelligence, 52, 263–294.
Kraus, S., Lehmann, D., & Magidor, M. (1990). Nonmonotonic reasoning, preferential models and
cumulative logics. Artificial Intelligence, 44, 167–207.
Lehmann, D., & Magidor, M. (1992). What does a conditional knowledge base entail? Artificial
Intelligence, 55, 1–60.
Lewis, D. (1973). Counterfactuals. Oxford: Blackwell.
Lewis, D. (1981). Ordering semantics and premise semantics for counterfactuals. Journal of
Philosophical Logic, 10, 217–234.
Lindström, S. (1991). A semantic approach to nonmonotonic reasoning: Inference and choice.
University of Uppsala, April 1991 (manuscript).
Makinson, D., & Gärdenfors, P. (1991). Relations between the logic of theory change and
nonmonotonic logic. In Fuhrmann & Morreau (1991) (pp. 185–205).
Nebel, B. (1989). A knowledge level analysis of belief revision. In R. Brachman, H. Levesque, &
R. Reiter (Eds.), Proceedings of the 1st International Conference on Principles of Knowledge
Representation and Reasoning (pp. 301–311). San Mateo: Morgan Kaufmann.
Nebel, B. (1992). Syntax-based approaches to belief revision. In Gärdenfors (1992) (pp. 52–88).
Rott, H. (1991). Two methods of constructing contractions and revisions of knowledge systems.
Journal of Philosophical Logic, 20, 149–173.
Rott, H. (1992a). On the logic of theory change: More maps between different kinds of contraction
function. In Gärdenfors (1992) (pp. 122–141).
Rott, H. (1992b). Preferential belief change using generalized epistemic entrenchment. Journal of
Logic, Language and Information, 1, 45–78.
Rott, H. (2003). Basic entrenchment. Studia Logica, 73, 257–280.
Samuelson, P. A. (1950). The problem of integrability in utility theory. Economica, N.S., 17,
355–381.
Sen, A. K. (1982). Choice, welfare and measurement. Oxford: Blackwell.
Suzumura, K. (1983). Rational choice, collective decisions, and social welfare. Cambridge:
Cambridge University Press.
Uzawa, H. (1956). Note on preference and axioms of choice. Annals of the Institute of Statistical
Mathematics, 8, 35–40.
Chapter 17
A Survey of Ranking Theory
Wolfgang Spohn
Introduction
W. Spohn ()
Fachbereich Philosophie, Universität Konstanz, 78457 Konstanz, Germany
e-mail: [email protected]
I use ‘Baconian probability’ as a collective term for the alternative ideas. This
is legitimate since there are strong family resemblances among the alternatives.
Cohen has chosen an apt term since it gives historical depth to ideas that can be
traced back at least to Bacon (1620) and his powerful description of ‘the method
of lawful induction’. Jacob Bernoulli and Johann Heinrich Lambert struggled with
a non-additive kind of probability. When Joseph Butler and David Hume spoke of
probability, they often seemed to have something else or more general in mind than
our precise explication. In contrast to the German Fries school, British nineteenth-
century philosophers like John Herschel, William Whewell, and John Stuart Mill
elaborated non-probabilistic methods of inductive inference. And so forth.1
Still, one might call this an underground movement. The case for alternative forms
of belief received a distinct hearing only in the second half of the twentieth century.
On the one hand, there were scattered attempts like the ‘functions of potential
surprise’ of Shackle (1949), heavily used and propagated in the epistemology of
Isaac Levi since his (1967), Rescher’s (1964) account of hypothetical reasoning,
further developed in his (1976) into an account of plausible reasoning, or Cohen’s
(1970) account of induction which he developed in his (1977) under the label ‘Non-
Pascalian probability’, later on called ‘Baconian’. On the other hand, one should
think that modern philosophy of science with its deep interest in theory confirmation
and theory change produced alternatives as well. Indeed, Popper’s hypothetical-
deductive method proceeded non-probabilistically, and Hempel (1945) started a
vigorous search for a qualitative confirmation theory. However, the former became
popular rather among scientists than among philosophers, and the latter petered out
after 25 years, at least temporarily.
I perceive all this rather as a prelude, preparing the grounds. The outburst came
only in the mid 70s, with strong help from philosophers, but heavily driven by the
needs of Artificial Intelligence. Not only deductive, but also inductive reasoning
had to be implemented in the computer, probabilities appeared intractable2 , and
thus a host of alternative models were invented: a plurality of default logics, non-
monotonic logics and defeasible reasonings, fuzzy logic as developed by Zadeh
(1975, 1978), possibility theory as initiated by Zadeh (1978) and developed by
Dubois and Prade (1988), the Dempster-Shafer belief functions originating from
Dempster (1967, 1968), but essentially generalized by Shafer (1976), AGM belief
revision theory (cf. Gärdenfors 1988), a philosophical contribution with great
success in the AI market, Pollock’s theory of defeasible reasoning (summarized
in Pollock 1995), and so forth. The field has become rich and complex. There are
attempts of unification like Halpern (2003) and huge handbooks like Gabbay et al.
(1994). One hardly sees the wood for trees. It seems that what had been forgotten
for centuries had to be made good for within decades.
1
This is not the place for a historical account. See, e.g., Cohen (1980) and Shafer (1978) for some
details.
2
Only Pearl (1988) showed how to systematically deal with probabilities without exponential
computational explosion.
Ranking theory, first presented in Spohn (1983, 1988)3, belongs to this field as
well. Since its development, by me and others, is scattered in a number of papers,
one goal of the present paper is to present an accessible survey of the present
state of ranking theory.4 This survey will emphasize the philosophical applications,
thus reflecting my bias towards philosophy. My other goal is justificatory. Of
course, I am not so blinded to claim that ranking theory would be the adequate
account of Baconian probability. As I said, ‘Baconian probability’ stands for a
collection of ideas united by family resemblances; and I shall note some of the
central resemblances in the course of the paper. However, there is a multitude of
epistemological purposes to serve, and it is entirely implausible that there is one
account to serve all. Hence, postulating a reign of probability is silly, and postulating
a duumvirate of probability and something else is so, too. Still, I am not disposed to
see ranking theory as just one offer among many. On many scores, ranking theory
seems to me to be superior to rival accounts, the central score being the notion
of conditional ranks. I shall explain what these scores are, thus trying to establish
ranking theory as one particularly useful account of the laws of thought.
The plan of the paper is simple. In the five subsections of section “The theory”,
pp. 305ff, I shall outline the main aspects of ranking theory. This central section
will take some time. I expect the reader to grow impatient along the way; you will get the
strong impression that I am not presenting an alternative to (Pascalian) probability,
as the label ‘Baconian’ suggests, but simply probability itself in a different disguise.
This is indeed one way to view ranking theory, and a way, I think, to understand
its virtues. However, the complex relation between probability and ranking theory,
though suggested at many earlier points, will be systematically discussed only in
section “Ranks and probabilities”, pp. 328ff. The section “Further comparisons”,
pp. 335ff, will finally compare ranking theory to some other accounts of Baconian
probability.
The Theory
Basics
We have to start with fixing the objects of the cognitive attitudes we are going to
describe. This is a philosophically highly contested issue, but here we shall stay
conventional without discussion. These objects are pure contents, i.e., propositions.
3
There I called its objects ordinal conditional functions. Goldszmidt and Pearl (1996) started
calling them ranking functions, a usage I happily adopted.
4
In the meantime, my comprehensive book on ranking theory (Spohn 2012) has appeared. This
paper may also serve as an introduction to that book. Conversely, various topics, which are only
touched on here and then referred back to older papers of mine, are developed in that book in a better
and more comprehensive way.
κ(A ∪ B) = min{κ(A), κ(B)}   [the law of disjunction (for negative ranks)]   (17.2)
5
For systematic reasons I am slightly rearranging my terminology from earlier papers. I would be
happy if the present terminology became the official one.
(B), being a penguin (P), and being able to fly (F). This makes for eight possibilities.
Suppose you have no idea what Tweetie is, for all you know it might even be a car.
Then your ranking function may be the following one, for instance:6
In this case, the strongest proposition you believe is that Tweetie is either no
penguin and no bird (¬B & ¬P) or a flying bird and no penguin (F & B & ¬P). Hence,
you neither believe that Tweetie is a bird (B) nor that it is not a bird (¬B). You are
also neutral concerning its ability to fly. But you believe, for instance: if Tweetie is a
bird, it is not a penguin and can fly (B → ¬P & F); and if Tweetie is not a bird, it is not
a penguin (¬B → ¬P) – each if-then taken as material implication. In this sense you
also believe: if Tweetie is a penguin, it can fly (P → F); and if Tweetie is a penguin,
it cannot fly (P → ¬F) – but only because you believe that it is not a penguin in the
first place; you simply do not reckon with its being a penguin. If we understand
the if-then differently, as we shall do later on, the picture changes. The larger ranks
in the last column indicate that you strongly disbelieve that penguins are not birds.
And so we may discover even more features of this example.
What I have explained so far makes clear that we have already reached the first
fundamental aim ranking functions are designed for: the representation of belief.
Indeed, we may define B_κ = {A | κ(¬A) > 0} to be the belief set associated with the
ranking function κ. This belief set is finitely consistent in the sense that whenever
A₁, …, Aₙ ∈ B_κ, then A₁ ∩ … ∩ Aₙ ≠ ∅; this is an immediate consequence of
the law of negation. And it is finitely deductively closed in the sense that whenever
A₁, …, Aₙ ∈ B_κ and A₁ ∩ … ∩ Aₙ ⊆ B ∈ A, then B ∈ B_κ; this is an immediate
consequence of the law of disjunction. Thus, belief sets just have the properties they
are normally assumed to have. (The finiteness qualification is a little cause for worry
that will be addressed soon.)
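Since the original table of ranks for the Tweetie example is not reproduced above, the following sketch uses a hypothetical rank assignment that merely fits the beliefs just described; it shows how a negative ranking function over the eight possibilities determines the belief set B_κ. The Python representation and all numbers are mine, not the paper's.

```python
from itertools import product

# Worlds are truth-value assignments to B (bird), P (penguin), F (can fly).
WORLDS = list(product([True, False], repeat=3))   # (b, p, f)

# Hypothetical point ranks, chosen to fit the beliefs described in the text:
# rank 0 exactly for the worlds in (not-B & not-P) or (F & B & not-P), and
# a high rank for the "penguin but no bird" worlds.
def kappa_point(world):
    b, p, f = world
    if not b and not p:
        return 0
    if b and not p:
        return 0 if f else 1
    if b and p:
        return 2 if f else 1
    return 8                      # penguin but no bird: strongly disbelieved

def kappa(prop):
    """Negative rank of a proposition (a set of worlds): the minimal point rank."""
    return min((kappa_point(w) for w in prop), default=float("inf"))

def complement(prop):
    return [w for w in WORLDS if w not in prop]

def believed(prop):
    """A is in the belief set iff its complement is disbelieved: kappa(not-A) > 0."""
    return kappa(complement(prop)) > 0

B = [w for w in WORLDS if w[0]]   # Tweetie is a bird
P = [w for w in WORLDS if w[1]]   # Tweetie is a penguin
F = [w for w in WORLDS if w[2]]   # Tweetie can fly

print(believed(B), believed(complement(B)))   # False False: neutral about B
print(believed(complement(P)))                # True: you believe Tweetie is no penguin
print(believed(F), believed(complement(F)))   # False False: neutral about F
```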
There is a big argument about the rationality postulates of consistency and
deductive closure; we should not enter it here. Let me only say that I am
disappointed by all the attempts I have seen to weaken these postulates. And let
me point out that the issue was essentially decided at the outset when we assumed
belief to operate on propositions or truth-conditions or sets of possibilities. With
these assumptions we ignore the relation between propositions and their sentential
expressions or modes of presentation; and it is this relation where all the problems
hide.
6
I am choosing the ranks in an arbitrary, though intuitively plausible way (just as I would have to
arbitrarily choose plausible subjective probabilities, if the example were a probabilistic one). The
question how ranks may be measured will be taken up in section “The dynamics of belief and the
measurement of belief”, pp. 316ff.
When saying that ranking functions represent belief I do not want to further
qualify this. One finds various notions in the literature – full beliefs, strong beliefs,
weak beliefs – and one finds a distinction between acceptance and belief, etc. In my view, these
notions and distinctions do not respond to any settled intuitions; they are rather
induced by various theoretical accounts. Intuitively, there is only one phenomenon – perhaps not
very clear, but certainly not clearly divisible – which I interchangeably
call believing, accepting, taking to be true, etc.
However, if the representation of belief were our only aim, belief sets or their
logical counterparts as developed in doxastic logic (see already Hintikka 1962)
would have been good enough. What then is the purpose of the ranks or degrees?
Just to give another account of the intuitively felt fact that belief is graded? But what
guides such accounts? Why should degrees of belief behave like ranks as defined?
Intuitions by themselves are not clear enough to provide this guidance. Worse still,
intuitions are usually tainted by theory; they do not constitute a neutral arbiter.
Indeed, problems already start with the intuitive conflict between representing belief
and representing degrees of belief. By talking of belief simpliciter, as I have just
insisted, I seem to talk of ungraded belief.
The only principled guidance we can get is a theoretical one. The degrees must
serve a clear theoretical purpose and this purpose must be shown to entail their
behavior. For me, the theoretical purpose of ranks is unambiguous; this is why I
invented them. It is the representation of the dynamics of belief ; that is the second
fundamental aim we pursue. How this aim is reached and why it can be reached in
no other way will unfold in the course of this section. This point is essential; as we
shall see, it distinguishes ranking theory from all similarly looking accounts, and it
grounds its superiority.
For the moment, though, let us look at a number of variants of Definition 17.1.
Above I mentioned the finiteness restriction of consistency and deductive closure.
I have always rejected this restriction. An inconsistency is irrational and to be
avoided, be it finitely or infinitely generated. Or, equivalently, if I take to be true
a number of propositions, I take their conjunction to be true as well, even if the
number is infinite. If we accept this, we arrive at a somewhat stronger notion:
Definition 17.2 Let A be a complete algebra over W (closed also under infinite
Boolean operations). Then κ is a complete negative ranking function for A iff κ is a
function from W into N⁺ = N ∪ {∞} (i.e., into the set of non-negative integers plus
infinity) such that κ⁻¹(0) ≠ ∅ and κ⁻¹(n) ∈ A for each n ∈ N⁺. κ is extended
to propositions by defining κ(∅) = ∞ and κ(A) = min{κ(w) | w ∈ A} for each non-
empty A ∈ A.
Obviously, the propositional function satisfies the laws of negation and disjunction.
Moreover, we have for any B ⊆ A:
κ(⋃B) = min{κ(A) | A ∈ B}   [the law of infinite disjunction]   (17.4)
Due to completeness, we could start in Definition 17.2 with the point function
and then define the set function as specified. Equivalently, we could have defined
the set function by the conditions (17.1) and (17.4) and then reduced the set function
to a point function. Henceforth I shall not distinguish between the point and the set
function. Note, though, that without completeness the existence of an underlying
point function is not guaranteed; the relation between point and set function in this
case is completely cleared up in Huber (2006).
Why are complete ranking functions confined to integers? The reason is condi-
tion (17.4). It entails that any infinite set of ranks has a minimum and hence that the
range of a complete ranking function is well-ordered. Hence, the natural numbers
are a natural choice. In my first publications (1983) and (1988) I allowed for more
generality and assumed an arbitrary set of ordinal numbers as the range of a ranking
function. However, since we want to calculate with ranks, this meant engaging in
ordinal arithmetic, which is awkward. Therefore I later confined myself to complete
ranking functions as defined above.
The issue about condition (17.4) was first raised by Lewis (1973, sect. 1.4)
where he introduced the so-called Limit Assumption in relation to his semantics
of counterfactuals. Endorsing (17.4), as I do, is tantamount to endorsing the Limit
Assumption. Lewis finds reason against it, though it does not affect the logic of
counterfactuals. From a semantic point of view, I do not understand his reason. He
requests us to counterfactually suppose that a certain line is longer than an inch
and asks how long it would or might be. He argues in effect that for each ε > 0
we should accept as true: "If the line were longer than 1 inch, it would not
be longer than 1 + ε inches." This strikes me as blatantly inconsistent, even if we
cannot derive a contradiction in counterfactual logic (due to its ω-incompleteness).
Therefore, I am accepting the Limit Assumption and, correspondingly, the law of
infinite disjunction. This means in particular that in that law the minimum must not
be weakened to the infimum.
Though I prefer complete ranking functions for the reasons given, the issue
will have no further relevance here. In particular, if we assume the algebra of
propositions to be finite, each ranking function is complete, and the issue does not
arise. In the sequel, you can add or delete completeness as you wish.
Let me add another observation apparently of a technical nature. It is that we can
mix ranking functions in order to form a new ranking function. This is the content
of
Definition 17.3 Let Λ be a non-empty set of negative ranking functions for an
algebra A of propositions, and let ρ be a complete negative ranking function over Λ.
Then κ defined by
κ(A) = min{λ(A) + ρ(λ) | λ ∈ Λ} for all A ∈ A
is obviously a negative ranking function for A as well and is called the mixture of Λ
by ρ.
It is nice that such mixtures make formal sense. However, we shall see in the
course of this paper that the point is more than a technical one; such mixtures will
acquire deep philosophical importance later on.
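As a small illustration, here is a sketch of such a mixture for two toy ranking functions, assuming the rule stated in Definition 17.3 above (the minimum, over the mixed functions, of λ(A) + ρ(λ)); the function names and example values are invented.

```python
def mixture(rank_funcs, rho):
    """Mix negative ranking functions: kappa(A) = min over l of rank_funcs[l](A) + rho[l]."""
    def kappa(prop):
        return min(rank_funcs[name](prop) + rho[name] for name in rank_funcs)
    return kappa

# Two toy negative ranking functions over propositions given as frozensets of worlds.
lam1 = lambda A: 0 if "w1" in A else 2     # disbelieves whatever excludes w1
lam2 = lambda A: 0 if "w3" in A else 1     # disbelieves whatever excludes w3

kappa = mixture({"l1": lam1, "l2": lam2}, {"l1": 0, "l2": 3})
print(kappa(frozenset({"w1"})))  # 0 = min(0 + 0, 1 + 3)
print(kappa(frozenset({"w3"})))  # 2 = min(2 + 0, 0 + 3)
```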
So far, (degree of) disbelief was our basic notion. Was this necessary? Certainly
not. We might just as well express things in positive terms:
Definition 17.4 Let A be an algebra over W. Then β is a positive ranking function
for A iff β is a function from A into R* such that for all A, B ∈ A:
β(A ∩ B) = min{β(A), β(B)}   [the law of conjunction for positive ranks]   (17.7)
Positive ranks express degrees of belief. β(A) > 0 says that A is believed (to some
positive degree), and β(A) = 0 says that A is not believed. Obviously, positive ranks
are the dual of negative ranks; if β(A) = κ(¬A) for all A ∈ A, then β is a positive
ranking function iff κ is a negative ranking function.
Positive ranking functions seem distinctly more natural. Why do I still prefer
the negative version? A superficial reason is that we have seen complete negative
ranking functions to be reducible to point functions, whereas it would obviously
be ill-conceived to try the same for the positive version. This, however, is only
indicative of the main reason. Despite appearances, we shall soon see that negative
ranks behave very much like probabilities. In fact, this parallel will serve as
our compass for a host of exciting observations. (For instance, in the finite case
probability measures can also be reduced to point functions.) If we were thinking in
positive terms, this parallel would remain concealed.
There is a further notion that may appear even more natural:
Definition 17.5 Let A be an algebra over W. Then τ is a two-sided ranking function7
for A iff τ is a function from A into R ∪ {−∞, ∞} such that there is a negative
ranking function κ and its positive counterpart β for which, for all A ∈ A:
τ(A) = β(A) − κ(A) = κ(¬A) − κ(A).
7
In earlier papers I called this a belief function, obviously an unhappy term which has too many
different uses. This is one reason for the mild terminological reform proposed in this paper.
8
I am grateful to Matthias Hild for making this point clear to me.
This amounts to the highly intuitive assertion that one has to add the degree of
disbelief in B given A to the degree of disbelief in A in order to get the degree of
disbelief in A-and-B.
Moreover, it immediately follows for all A, B ∈ A with κ(A) < ∞:
min{κ(B | A), κ(¬B | A)} = 0.
This law says that even conditional belief must be consistent. If both κ(B | A) and
κ(¬B | A) were > 0, both B and ¬B would be believed given A, and this ought to be
excluded, as long as the condition A itself is considered possible.
Indeed, my favorite axiomatization of ranking theory runs in reverse: it consists
of the definition of conditional ranks and the conditional law of negation. The
latter says that min{κ(A | A ∪ B), κ(B | A ∪ B)} = 0, and this and the definition
of conditional ranks entail that min{κ(A), κ(B)} = κ(A ∪ B), i.e., the law of
disjunction. Hence, the only substantial assumption written into ranking functions is
conditional consistency, and it is interesting to see that this entails deductive closure
as well. Huber (2007) has further improved upon this important idea and shown that
ranking theory is indeed nothing but the assumption of dynamic consistency, i.e., the
preservation of consistency under any dynamics of belief. (He parallels, in a way,
the dynamic Dutch book argument for probabilities by replacing its assumption of
no sure loss by the assumption of consistency under all circumstances.)
It is instructive to look at the positive counterpart of negative conditional ranks.
If β is the positive ranking function corresponding to the negative ranking function
κ, Definition 17.6 simply translates into: β(B | A) = β(¬A ∪ B) − β(¬A). Defining
A → B = ¬A ∪ B as set-theoretical 'material implication', we may as well write:
β(A → B) = β(B | A) + β(¬A)   [the law of material implication]   (17.10)
Again, this is highly intuitive. It says that the degree of belief in the material
implication A → B is the sum of the degree of belief in its vacuous truth (i.e.,
in ¬A) and the conditional degree of belief in B given A.9 However, again comparing
the negative and the positive version, one can already sense the analogy between
probability and ranking theory from (17.8), but hardly from (17.10). This analogy
will play a great role in the following subsections.
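A quick numerical check of the law of material implication (17.10) may be helpful. The sketch below assumes only the definitions already in play – κ(B | A) = κ(A ∩ B) − κ(A) and β(A) = κ(¬A) – and uses an arbitrary made-up point ranking.

```python
from itertools import product

# Worlds assign truth values (1/0) to the propositions A and B; the ranks are arbitrary.
WORLDS = list(product([0, 1], repeat=2))
POINT_RANK = {(1, 1): 0, (1, 0): 2, (0, 1): 1, (0, 0): 1}

def kappa(prop):
    return min((POINT_RANK[w] for w in prop), default=float("inf"))

def neg(prop):
    return [w for w in WORLDS if w not in prop]

def kappa_cond(B, A):
    """kappa(B | A) = kappa(A & B) - kappa(A)."""
    return kappa([w for w in B if w in A]) - kappa(A)

def beta(prop):
    """Positive rank: beta(A) = kappa(not-A)."""
    return kappa(neg(prop))

def beta_cond(B, A):
    """Conditional positive rank: beta(B | A) = kappa(not-B | A)."""
    return kappa_cond(neg(B), A)

A = [w for w in WORLDS if w[0] == 1]
B = [w for w in WORLDS if w[1] == 1]
A_implies_B = neg(A) + [w for w in A if w in B]   # material implication: not-A or B

# Law of material implication (17.10): beta(A -> B) = beta(B | A) + beta(not-A)
print(beta(A_implies_B), beta_cond(B, A) + beta(neg(A)))   # both sides agree: 2 2
```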
9
Thanks again to Matthias Hild for pointing this out to me.
10
A case in point is the so-called problem of old evidence, which has a simple solution in terms of
Popper measures and the second inequality; cf. Joyce (1999, pp. 203ff.).
If A is a reason for B, it must obviously take one of these four forms; and the only
way to have two forms at once is by being a necessary and sufficient reason.11
Talking of reasons here is, I find, natural, but it stirs a nest of vipers. There is
a host of philosophical literature pondering about reasons, justifications, etc. Of
course, this is a field where multifarious philosophical conceptions clash, and it is
not easy to gain an overview of the fighting parties. This is not the place for
starting a philosophical argument12, but by using the term ‘reason’ I want at least to
submit the claim that the topic may gain enormously by giving a central place to the
above explication of reasons.
To elaborate only a little bit: When philosophers feel forced to make precise their
notion of a (theoretical, not practical) reason, they usually refer to the notion of
a deductive reason, as fully investigated in deductive logic. The deductive reason
relation is reflexive, transitive, and not symmetric. By contrast, Definition 17.7
captures the notion of a deductive or inductive reason. The relation embraces the
deductive relation, but it is reflexive, symmetric, and not transitive. Moreover, the
fact that reasons may be additional or insufficient reasons according to Definition
17.8 has been neglected by the relevant discussion, which was rather occupied
with necessary and/or sufficient reasons. Pursue, though, the use of the latter terms
throughout the history of philosophy. Their deductive explication is standard and
almost always fits. Often, it is clear that the novel inductive explication given by
Definition 17.8 would be inappropriate. Very often, however, the texts are open to
that inductive explication as well, and systematically trying to reinterpret these old
texts would yield a highly interesting research program in my view.
The topic is obviously inexhaustible. Let me take up only one further aspect.
Intuitively, we weigh reasons. This is a most important activity of our mind. We
do not only weigh practical reasons in order to find out what to do, we also weigh
theoretical reasons. We are wondering whether or not we should believe B, we are
searching for reasons speaking in favor of or against B, we are weighing these reasons,
and we hopefully reach a conclusion. I am certainly not denying the phenomenon of
11
In earlier publications I spoke of weak instead of insufficient reasons. Thanks to Arthur Merin
who suggested the more appropriate term to me.
12
I attempted to give a partial overview and argument in Spohn (2001a).
inference, which is also important, but what is represented as an inference often rather
takes the form of such a weighing procedure. 'Reflective equilibrium' is a familiar
and somewhat more pompous metaphor for the same thing.
If the balance of reasons is such a central phenomenon, the question arises: how
can epistemological theories account for it? The question is less well addressed than
one should think. However, the fact that there is a perfectly natural Bayesian answer
is a very strong and more or less explicit argument in favor of Bayesianism. Let us
take a brief look at how that answer goes:
Let P be a (subjective) probability measure over A and let B be the focal
proposition. Let us look at the simplest case, consisting of one reason A for B
and the automatic counter-reason ¬A against B. Thus, in analogy to Definition 17.7,
P(B | A) > P(B | ¬A). How does P balance these reasons and thus fix P(B)? The
answer is simple; we have:
P(B) = P(B | A) · P(A) + P(B | ¬A) · P(¬A)   (17.12)
This means that the probabilistic balance of reasons is a beam balance in the literal
sense. The length of the lever is P(B | A) − P(B | ¬A); the two ends of the lever are
loaded with the weights P(A) and P(¬A) of the reasons; P(B) divides the lever into
two parts of length P(B | A) − P(B) and P(B) − P(B | ¬A), representing the strength
of the reasons; and then P(B) must be chosen so that the beam is in balance. Thus
interpreted, (17.12) is nothing but the law of levers.
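A small numerical instance of (17.12), with made-up values, shows the lever reading: the weights P(A) and P(¬A) hang at distances P(B | A) − P(B) and P(B) − P(B | ¬A) from the pivot P(B), and the two moments cancel.

```python
# Made-up example values for the beam-balance reading of (17.12).
p_B_given_A = 0.8       # P(B | A)
p_B_given_notA = 0.2    # P(B | not-A)
p_A = 0.25              # P(A)

# Law of total probability: the pivot point P(B).
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)
print(p_B)  # 0.35

# Law of levers: weight times arm length is equal on both sides of the pivot.
left_moment = p_A * (p_B_given_A - p_B)             # 0.25 * 0.45
right_moment = (1 - p_A) * (p_B - p_B_given_notA)   # 0.75 * 0.15
print(abs(left_moment - right_moment) < 1e-12)      # True
```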
Ranking theory has an answer, too, and I wonder who else has one. According
to ranking theory, the balance of reasons works like a spring balance. Let κ be a
negative ranking function for A, τ the corresponding two-sided ranking function,
B the focal proposition, and A a reason for B. So, τ(B | A) > τ(B | ¬A). Again, it
is easily proved that always τ(B | A) ≥ τ(B) ≥ τ(B | ¬A). But where in between is τ(B)
located? A little calculation shows the following specification to be correct:
This does not look as straightforward as the probabilistic beam balance. Still, it is
not so complicated to interpret (17.13) as a spring balance. The idea is that you hook
in the spring at a certain point, that you extend it by the force of reasons, and that
τ(B) is where the spring extends to. Consider first the case where x, y > 0. Then you
hook in the spring at point 0 = τ(B | ¬A) + x and exert the force τ(A) on the spring.
Either this force exceeds the lower stopping point −x or the upper stopping point
y; then the spring extends exactly to the stopping point, as (17.13b + c) say. Or the
force τ(A) is less; then the spring extends exactly by τ(A), according to (17.13d).
The second case is that x = 0 and y > 0. Then you fix the spring at τ(B | ¬A), the lower
point of the interval in which τ(B) can move. The spring cannot extend below that
point, says (17.13b). But according to (17.13c + d) it can extend above, by the force
τ(A), but not beyond the upper stopping point. For the third case, x > 0 and y = 0, just
reverse the second picture. In this way, the force of the reason A, represented by its
two-sided rank τ(A), pulls the two-sided rank of the focal proposition B to its proper
place within the interval [τ(B | ¬A), τ(B | A)] fixed by the relevant conditional ranks.
I do not want to assess these findings in detail. You might prefer the probabilistic
balance of reasons, a preference I would understand. You might be happy to have at
least one alternative model, an attitude I recommend. Or you may search for further
models of the weighing of reasons; in this case, I wish you good luck. What you
may not do is ignore the issue; your epistemology is incomplete if it does not
take a stance. And one must be clear about what is required for taking a stance. As
long as one considers positive relevance to be the basic characteristic of reasons, one
must provide some notion of conditional degrees of belief, conditional probabilities,
conditional ranks, or whatever. Without some well-behaved conditionalization one
cannot succeed.
Our next point will be to define a reasonable dynamics for ranking functions
that entails a dynamics of belief. There are many causes which affect our beliefs,
forgetfulness as a necessary evil, drugs as an unnecessary evil, and so on. From a
rational point of view, it is scarcely possible to say anything about such changes.13
The rational changes are due to experience or information. Thus, it seems we have
already solved our task: if κ is my present doxastic state and I get informed about
the proposition A, then I move to the conditionalization κ_A of κ by A. This, however,
would be a bad idea. Recall that we have κ_A(¬A) = ∞, i.e., A is believed with
absolute certainty in κ_A; no future evidence could cast any doubt on the information.
This may sometimes happen; but usually information does not come so firmly.
Information may turn out wrong, evidence may be misleading, perception may be
misinterpreted; we should provide for flexibility. How?
One point of our first attempt was correct; if my information consists solely in
the proposition A, this cannot affect my beliefs conditional on A. Likewise it cannot
affect my beliefs conditional on ¬A. Thus, it directly affects only how firmly I believe
A itself. So, how firmly should I believe A? There is no general answer. I propose
to turn this into a parameter of the information process itself; somehow the way
I get informed about A entrenches A in my belief state with a certain firmness x.
13
Although there is a (by no means trivial) decision rule telling us that costless memory is never bad,
just as costless information never is; cf. Spohn (1976/78, sect. 4.4).
The point is that as soon as the parameter is fixed and the constancy of the relevant
conditional beliefs is accepted, my posterior belief state is fully determined. This is
the content of
Definition 17.9 Let κ be a negative ranking function for A, A ∈ A such that κ(A),
κ(¬A) < ∞, and x ∈ R*. Then the A → x-conditionalization κ_{A→x} of κ is defined by
κ_{A→x}(B) = κ(B | A) for B ⊆ A, and κ_{A→x}(B) = κ(B | ¬A) + x for B ⊆ ¬A.
From this, κ_{A→x}(B) may be inferred for all other B ∈ A with the law of disjunction.
Hence, the effect of the A → x-conditionalization is to shift the possibilities in A
(to lower ranks) so that κ_{A→x}(A) = 0 and the possibilities in ¬A (to higher ranks) so
that κ_{A→x}(¬A) = x. If one is attached to the idea that evidence consists in nothing but
a proposition, the additional parameter is a mystery. The processing of evidence may
indeed be so automatic that one hardly becomes aware of this parameter. Still, I find
it entirely natural that evidence comes more or less firmly. Consider, for instance,
the proposition: “There are tigers in the Amazon jungle”, and consider six scenarios:
(a) I read a somewhat sensationalist coverage in the yellow press claiming this, (b)
I read a serious article in a serious newspaper claiming this, (c) I hear the Brazilian
government officially announcing that tigers have been discovered in the Amazon
area, (d) I see a documentary on TV claiming to show tigers in the Amazon jungle,
(e) I read an article in Nature by a famous zoologist reporting on tigers there, (f) I
travel there by myself and see the tigers. In all six cases I receive the information
that there are tigers in the Amazon jungle, but with varying and, I find, increasing
certainty.
One might object that the evidence and thus the proposition received is clearly
a different one in each of the scenarios. The crucial point, though, is that we are
dealing here with a fixed algebra A of propositions and that we have nowhere
presupposed that this algebra consists of all propositions whatsoever; indeed, that
would be a doubtful presupposition. Hence A may be coarse-grained and unable to
represent the propositional differences between the scenarios; the proposition in A
which is directly affected in the various scenarios may be just the proposition that
there are tigers in the Amazon jungle. Still the scenarios may be distinguished by
the firmness parameter.
So, the dynamics of ranking functions I propose is simply this: Suppose › is your
prior doxastic state. Now you receive some information A with firmness x. Then
your posterior state is ›A!x . Your beliefs change accordingly; they are what they
are according to ›A!x . Note that the procedure is iterable. Next, you receive the
information B with firmness y, and so you move to .›A!x /B!y . And so on. This
point will acquire great importance later on.
I should mention, though, that this iterability need not work in full generality.
Let us call a negative ranking function κ regular iff κ(A) < ∞ for all A ≠ ∅. Then
we obviously have that κ_{A→x} is regular if κ is regular and x < ∞. Within the realm
of regular ranking functions iteration of changes works without restriction. Outside
this realm you may get problems with the rank ∞.
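The following sketch implements the A → x-conditionalization of Definition 17.9 over a finite possibility space and iterates it; the dictionary representation of point ranks and the function names are mine, not the paper's.

```python
INF = float("inf")

def kappa(point_ranks, prop):
    """Negative rank of a proposition, i.e. the minimal point rank of its worlds."""
    return min((point_ranks[w] for w in prop), default=INF)

def conditionalize(point_ranks, A, x):
    """A -> x conditionalization in the sense of Definition 17.9: shift the worlds
    in A so that kappa(A) becomes 0 and the worlds outside A so that kappa(not-A)
    becomes x."""
    A = set(A)
    not_A = set(point_ranks) - A
    k_A, k_not_A = kappa(point_ranks, A), kappa(point_ranks, not_A)
    return {w: (point_ranks[w] - k_A if w in A else point_ranks[w] - k_not_A + x)
            for w in point_ranks}

# Prior state: A = {'a1', 'a2'} is mildly disbelieved (rank 1), not-A is believed.
prior = {"a1": 1, "a2": 2, "b1": 0, "b2": 3}
A = ["a1", "a2"]

post = conditionalize(prior, A, 2)       # information A received with firmness 2
not_A = [w for w in post if w not in A]
print(kappa(post, A), kappa(post, not_A))  # 0 2: A is now believed with firmness 2

post2 = conditionalize(post, ["b1"], 1)  # iterate with a further piece of information
print(post2)
```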
14
Generalized probabilistic conditionalization as originally proposed by Jeffrey was result-oriented
as well. However, Garber (1980) observed that there is also an evidence-oriented version of
generalized probabilistic conditionalization. The relation, though, is not quite as elegant.
15
Here it does not carry us far beyond the beginnings. In Spohn (1991, 1999) I have argued for
some stronger rationality requirements and their consequences.
I do not really want to discuss the issue. I only want to point out that we have
already taken a stance insofar as expansions, revisions, and contractions are all
special cases of our A → x-conditionalization. This is more easily explained in terms
of result-oriented conditionalization:
If κ(A) = 0, i.e., if A is not disbelieved, then κ_{A→x} represents an expansion by A
for any x > 0. If κ(¬A) = 0, the expansion is genuine; if κ(¬A) > 0, i.e., if A is already
believed in κ, the expansion is vacuous. Are there many different expansions? Yes
and no. Of course, for each x > 0 a different κ_{A→x} results. On the other hand, one
and the same belief set is associated with all these expansions. Hence, the expanded
belief set is uniquely determined.
Similarly for revision. If κ(A) > 0, i.e., if A is disbelieved, then κ_{A→x} represents
a genuine revision by A for any x > 0. In this case, the belief in ¬A must be given up
and along with it many other beliefs; instead, A must be adopted together with many
other beliefs. Again, there are many different revisions, but all of them result in the
same revised belief set.
Finally, if κ(A) = 0, i.e., if A is not disbelieved, then κ_{A→0} represents contraction
by A. If κ(¬A) > 0, i.e., if A is even believed, the contraction is genuine; then belief
in A is given up after contraction and no new belief adopted. If κ(¬A) = 0, the
contraction is vacuous; there was nothing to contract in the first place. If κ(A) > 0,
i.e., if ¬A is believed, then κ_{A→0} = κ_{¬A→0} rather represents contraction by ¬A.16
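Reusing the conditionalization sketch above, the three kinds of change come out as special cases, matching the description just given (again, the helper names are mine, not the paper's).

```python
# These helpers presuppose kappa() and conditionalize() from the sketch above.

def expand(point_ranks, A, x):
    """Expansion: A was not disbelieved and becomes believed with firmness x > 0."""
    assert kappa(point_ranks, A) == 0 and x > 0
    return conditionalize(point_ranks, A, x)

def revise(point_ranks, A, x):
    """Genuine revision: A was disbelieved and now becomes believed with firmness x > 0."""
    assert kappa(point_ranks, A) > 0 and x > 0
    return conditionalize(point_ranks, A, x)

def contract(point_ranks, A):
    """(Central) contraction by A, assuming A is not disbelieved: end in neutrality."""
    return conditionalize(point_ranks, A, 0)

prior = {"a1": 1, "a2": 2, "b1": 0, "b2": 3}   # not-A believed, A disbelieved
A = ["a1", "a2"]
revised = revise(prior, A, 1)                  # belief in not-A given up, A adopted
neutral = contract(revised, A)                 # and the belief in A given up again
not_A = [w for w in neutral if w not in A]
print(kappa(neutral, A), kappa(neutral, not_A))  # 0 0: neutrality concerning A
```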
As observed in Spohn (1988, footnote 20) and more fully explained in Gärden-
fors (1988, pp. 73f.), it is easily checked that expansions, revisions, and contractions
thus defined satisfy all of the original AGM postulates (K*1-8) and (K−1-8) (cf.
Gärdenfors 1988, pp. 54–56 and 61–64) (when they are translated from AGM’s
sentential framework into our propositional or set-theoretical one). For those like
me who accept the AGM postulates this is a welcome result.
For the moment, though, it may seem that we have simply reformulated AGM
belief revision theory. This is not so; A → x-conditionalization is much more general
than the three AGM changes. This is clear from the fact that there are many different
expansions and revisions that cannot be distinguished by the AGM account. It is
perhaps clearest in the case of vacuous expansion, which is no change at all in the
AGM framework, but may well be a genuine change in the ranking framework, a
redistribution of ranks which does not affect the surface of beliefs. Another way to
state the same point is that insufficient and additional reasons also drive doxastic
changes, which, however, are inexpressible in the AGM framework. For instance, if
A is still disbelieved in the A↑x-conditionalization κ_{A↑x} of κ (since κ(A) > x), one has
obviously received only an insufficient reason for A, and the A↑x-conditionalization
might thus be taken to represent what is called non-prioritized belief revision in the
AGM literature (cf. Hansson 1997).
16
If we accept the idea in section "Basics" (p. 311) of taking the interval [−z, z] of two-sided
ranks as the range of neutrality, contraction seems to become ambiguous as well. However, the
contraction just defined would still be distinguishable as a central contraction since it gives the
contracted proposition central neutrality.
This is not the core of the matter, though. The core of the matter is iterated belief
change, which I have put into the center of my considerations in Spohn (1983, sect.
5.3, 1988). As I have argued there, AGM belief revision theory is essentially unable
to account for iterated belief change. I take 20 years of multifarious, but in my
view unsatisfactory attempts to deal with that problem (see the overview in Rott
2008) as confirming my early assessment. By contrast, changes of the type A → x-
conditionalization are obviously indefinitely iterable.
In fact, my argument in Spohn (1988) was stronger. It was that if AGM belief
revision theory is to be improved so as to adequately deal with the problem of
iterated belief change, ranking theory is the only way to do it. I always considered
this to be a conclusive argument in favor of ranking theory.
This may be so. Still, AGM theorists, and others as well, remained skeptical.
“What exactly is the meaning of numerical ranks?” they asked. One may well
acknowledge that the ranking apparatus works in a smooth and elegant way, has
a lot of explanatory power, etc. But all this does not answer this question. Bayesians
have met this challenge. They have told stories about the operational meaning
of subjective probabilities in terms of betting behavior, they have proposed an
ingenious variety of procedures for measuring this kind of degrees of belief. One
would like to see a comparative achievement for ranking theory.
It exists and is finally presented in Hild and Spohn (2008). There is no space
here to fully develop the argument. However, the basic point can easily be indicated
so as to make the full argument at least plausible. The point is that ranks do not
only account for iterated belief change, but can conversely be measured thereby. This
may at first sound unhelpful. A → x-conditionalization refers to the number x; so
even if ranks can somehow be measured with the help of such conditionalizations,
we do not seem to provide a fundamental measurement of ranks. Recall, however,
that (central) contraction by A (or ¬A) is just A → 0-conditionalization and is thus
free of a hidden reference to numerical ranks; it only refers to rank 0 which has a
clear operational or surface interpretation in terms of belief. Hence, the idea is to
measure ranks by means of iterated contractions; if that works, it really provides a
fundamental measurement of ranks that is based only on the beliefs one now has
and one would have after various iterated contractions.
How does the idea work? Recall our observation above that the positive rank of
a material implication A → B is the sum of the degree of belief in B given A and the
degree of belief in the vacuous truth of A → B, i.e., of ¬A. Hence, after contraction
by ¬A, belief in the material implication A → B is equivalent to belief in B given A,
i.e., to the positive relevance of A to B. This is how the reason relation, i.e., positive
relevance, manifests itself in beliefs surviving contractions. Similarly for negative
relevance and irrelevance.
Next observe that positive relevance can be expressed by certain inequalities
for ranks that compare certain differences between ranks (similarly for negative
relevance and irrelevance). This calls for applying the theory of difference measure-
ment, as paradigmatically presented by Krantz et al. (1971, ch. 4).
Let us illustrate how this might work in our Tweetie example, pp. 306f. There we
had specified a ranking function κ for the eight propositional atoms, entailing ranks
for all 256 propositions involved. Focusing on the atoms, we are thus dealing with
a realm X = {x₁, …, x₈} (where x₁ = B & P & F, etc.) and a numerical function f
such that
17
For an overview over such proposals see Rott (2008). For somewhat more detailed comparative
remarks see Hild and Spohn (2008, sect. 5).
It is worthwhile looking a bit more at the details of belief formation and revision.
For this purpose we should give more structure to propositions. They have a Boolean
structure so far, but we cannot yet compose them from basic propositions as we
intuitively do. A common formal way to do this is to generate propositions from
(random) variables. I identify a variable with the set of its possible values. I intend
variables to be specific ones. E.g., the temperature at March 15, 2005, in Konstanz
(not understood as the actual temperature, but as whatever it may be, say, between
−100 and +100 °C) is such a variable. Or, to elaborate, if we consider each of
the six general variables temperature, air pressure, wind, humidity, precipitation,
cloudiness at each of the 500 weather stations in Germany twice a day at each of
the 366 days of 2004, we get a collection of 6 · 500 · 732 specific variables with
which we can draw a detailed picture of the weather in Germany in 2004.
So, let V be the set of specific variables considered, where each v ∈ V is just an at
least binary set. A possible course of events or a possibility, for short, is just a
selection function w for V, i.e., a function w on V such that w(v) ∈ v for all v ∈ V.
Hence, each such function specifies a way in which the variables in V may be realized. The
set of all possibilities then simply is the Cartesian product W = ×V. As before, propositions are subsets
of W. Now, however, we can say that propositions are about certain variables. Let
X ⊆ V. Then we say that w, w′ ∈ W agree on X iff w(v) = w′(v) for all v ∈ X. And
we define that a proposition A is about X ⊆ V iff, for each w in A, all w′ agreeing
with w on X are in A as well. Let A(X) be the set of propositions about X. Clearly,
A(X) ⊆ A(Y) for X ⊆ Y, and A = A(V). In this way, propositions are endowed with
more structure. We may conceive of propositions about single variables as basic
propositions; the whole algebra A is obviously generated by such basic propositions
(at least if V is finite). So much as preparation for the next substantial step.
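As a concrete illustration of this construction, here is a small sketch with two invented weather-style variables; possibilities are selection functions (represented as dictionaries), and a proposition about a set X of variables is closed under agreement on X. All names and values are hypothetical.

```python
from itertools import product

# Each specific variable is identified with the set of its possible values.
VARIABLES = {
    "rain_konstanz": ("yes", "no"),
    "temp_konstanz": ("cold", "mild", "warm"),
}
NAMES = sorted(VARIABLES)

# A possibility is a selection function: it picks one value for every variable.
POSSIBILITIES = [dict(zip(NAMES, combo))
                 for combo in product(*(VARIABLES[n] for n in NAMES))]

def proposition_about(X, admissible):
    """The set of possibilities whose restriction to the variables X is admissible.
    Such a proposition is 'about X': any possibility agreeing with a member on X
    is itself a member."""
    return [w for w in POSSIBILITIES
            if tuple(w[v] for v in X) in admissible]

RAINY = proposition_about(["rain_konstanz"], {("yes",)})
WARM = proposition_about(["temp_konstanz"], {("warm",)})
print(len(POSSIBILITIES), len(RAINY), len(WARM))  # 6 3 2
```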
This step consists in more closely attending to (doxastic) dependence and
independence in ranking terms. In a way, we have already addressed this issue:
dependence is just positive or negative relevance, and independence is irrelevance.
Still, let me state
Definition 17.11 Let κ be a negative ranking function for A and A, B, C ∈ A. Then
A and B are independent w.r.t. κ, i.e., A ⊥ B, iff τ(B | A) = τ(B | ¬A), i.e., iff for
all A′ ∈ {A, ¬A} and B′ ∈ {B, ¬B}: κ(A′ ∩ B′) = κ(A′) + κ(B′).
independent given the present variable. We have already prepared for explaining
this notion in ranking terms as well.
Definition 17.12 Let κ be a ranking function for A = A(V), and let X, Y, Z ⊆ V be
sets of variables. Then X and Y are independent w.r.t. κ, i.e., X ⊥ Y, iff A ⊥ B for
all A ∈ A(X) and all B ∈ A(Y). Let moreover Z(Z) be the set of atoms of A(Z), i.e.,
the set of the logically strongest, non-empty propositions in A(Z). Then X and Y are
independent given Z w.r.t. κ, i.e., X ⊥ Y / Z, iff A ⊥ B / C for all A ∈ A(X), B ∈ A(Y),
and C ∈ Z(Z).
In other words, X ⊥ Y / Z iff all propositions about X are independent from all
propositions about Y given any full specification of the variables in Z. Conditional
independence among sets of variables obeys the following laws:
Let κ be a negative ranking function for A(V). Then for any mutually disjoint X, Y,
Z, U ⊆ V:
(a) if X ⊥ Y / Z, then Y ⊥ X / Z [Symmetry],
(b) if X ⊥ Y ∪ U / Z, then X ⊥ Y / Z and X ⊥ U / Z [Decomposition],
(c) if X ⊥ Y ∪ U / Z, then X ⊥ Y / Z ∪ U [Weak Union],
(d) if X ⊥ Y / Z and X ⊥ U / Z ∪ Y, then X ⊥ Y ∪ U / Z [Contraction],
(e) if κ is regular and if X ⊥ Y / Z ∪ U and X ⊥ U / Z ∪ Y,
then X ⊥ Y ∪ U / Z [Intersection]
(17.14)
These are nothing but what Pearl (1988, p. 88) calls the graphoid axioms; the
labels are his (cf. p. 84). (Note that law (d), contraction, has nothing to do with
contraction in belief revision theory.) That probabilistic conditional independence
satisfies these laws was first proved in Spohn (1976/78, sect. 3.2) and Dawid (1979).
The ranking Theorem (17.14) was proved in Spohn (1983, sect. 5.3, 1988, sect.
6). I conjectured in 1976, and Pearl conjectured, too, that the graphoid axioms
give a complete characterization of conditional independence. We were disproved,
however, by Studeny (1989) w.r.t. probability measures, but the proof carries over to
ranking functions (cf. Spohn 1994a). Under special conditions, though, the graphoid
axioms are complete, as was proved by Geiger and Pearl (1990) for probability
measures and by Hunter (1991) for ranking functions (cf. again, Spohn 1994a).
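To make conditional independence among variables concrete, here is a minimal sketch in Python. It tests the atom-wise factorization κ(a ∩ b | c) = κ(a | c) + κ(b | c) over all value assignments, which, for the algebra generated by the variables, amounts to the independence of Definition 17.12; the example ranks are made up.

```python
from itertools import product

INF = float("inf")

# Worlds assign values to three binary variables x, y, z; the point ranks below
# are made up so that x and y are independent given z, while x and z are not
# independent given y.
def make_worlds():
    return [dict(zip("xyz", vals)) for vals in product([0, 1], repeat=3)]

def point_rank(w):
    return (w["x"] != w["z"]) * 1 + (w["y"] != w["z"]) * 2 + w["z"] * 1

def kappa(prop):
    return min((point_rank(w) for w in prop), default=INF)

def cond(prop, given):
    """kappa(A | C) = kappa(A & C) - kappa(C)."""
    return kappa([w for w in prop if w in given]) - kappa(given)

def independent_given(X, Y, Z, worlds):
    """Atom-wise factorization test: kappa(a & b | c) = kappa(a | c) + kappa(b | c)
    for all value assignments a, b, c to the variables X, Y, Z."""
    def atoms(var):
        return [[w for w in worlds if w[var] == v] for v in (0, 1)]
    for a, b, c in product(atoms(X), atoms(Y), atoms(Z)):
        if kappa(c) == INF:
            continue
        ab = [w for w in a if w in b]
        if cond(ab, c) != cond(a, c) + cond(b, c):
            return False
    return True

worlds = make_worlds()
print(independent_given("x", "y", "z", worlds))  # True
print(independent_given("x", "z", "y", worlds))  # False
```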
I am emphasizing all this, because the main purport of Pearl’s path-breaking
book (1988) is to develop what he calls the theory of Bayesian nets, a theory
that has acquired great importance and is presented in many text books (see, e.g.,
Neapolitan 1990 or Jensen 2001). Pearl makes very clear that the basis of this theory
consists in the graphoid axioms; these allow representing conditional dependence
and independence among sets of variables by Bayesian nets, i.e., by directed acyclic
graphs, the nodes of which are variables. An edge u → v of the graph then represents
the fact that v is dependent on u given all the variables preceding v in some given
order, for instance, temporally preceding v. A major point of this theory is that it
can describe in detail how probabilistic change triggered at some node in the net
propagates throughout the net. All this is not merely mathematics; it is intuitively
sensible and philosophically highly significant; for instance, inference acquires a
novel and fruitful meaning in the theory of Bayesian nets.
Of course, my point now is that all these virtues carry over to ranking theory
with the help of observation (17.14). The point is obvious, but hardly elaborated;
that should be done.18 It will thus turn out that ranks and hence beliefs can also be
represented and computationally managed in that kind of structure.
This is not yet the end of the story. Spirtes et al. (1993) (see also Pearl 2000)
have made amply clear that probabilistic Bayesian nets have a most natural causal
interpretation; an edge u → v then represents that the variable v directly causally
depends on the variable u. Spirtes et al. back up this interpretation, i.e., this
connection of probability and causality, by their three basic axioms: the causal
Markov condition, the minimality condition, and, less importantly, the faithfulness
condition (cf. Spirtes et al. 1993, sect. 3.4). And they go on to develop a really
impressive account of causation and causal inference on the basis of these axioms
and thus upon the theory of Bayesian nets.
Again, all this carries over to ranking theory. Indeed, this is what ranks were
designed for in the first place. In Spohn (1983) I gave an explication of probabilistic
causation that entails the causal Markov condition and the minimality condition,
and also Reichenbach’s principle of the common cause, as I observed later in Spohn
(1994b).19 And I was convinced of the idea that, if the theory of causation is bound
to bifurcate into a deterministic and a probabilistic branch, these two branches
must at least be developed in perfect parallel. Hence, I proposed ranking theory
in Spohn (1983) in order to realize this idea.20 Of course, one has to discuss how
adequate that theory of deterministic causation is, just as the adequacy of the causal
interpretation of Bayesian nets is open to discussion. Here, my point is only that
this deep philosophical perspective lies within reach of ranking theory; it is what
originally drove that theory.
Objective Ranks?
18
It has been done in the meantime. See Hohenadel (2013).
19
I have analyzed the relation between Spirtes et al.'s axiomatic approach to causation and my
definitional approach a bit more thoroughly in Spohn (2001b).
20
For a recent presentation of the account of deterministic causation in terms of ranking functions
and its comparison in particular with David Lewis’ counterfactual approach see Spohn (2006).
Still, one might suspect that I can claim these successes only by turning Bacon
into a fake Pascal. I have never left the Bayesian home, it may seem. Hence, one
might even suspect that ranking theory is superfluous and may be reduced to the
traditional Bayesian point of view. In other words, it is high time to study more
closely the relation between probability and ranking theory. This will be our task in
the next section.
Ranks and Probabilities
The relation between probabilities and ranks is surprisingly complex and fascinat-
ing. I first turn to the more formal aspects of the comparison before discussing the
philosophical aspects.
Formal Aspects
The reader will long since have observed why ranks behave so much like proba-
bilities. There is obviously a simple translation of probability into ranking theory:
translate the sum of probabilities into the minimum of ranks, the product of
probabilities into the sum of ranks, and the quotient of probabilities into the
difference of ranks. Thereby, the probabilistic law of additivity turns into the law of
disjunction, the probabilistic law of multiplication into the law of conjunction (for
negative ranks), and the definition of conditional probabilities into the definition of
conditional ranks. If the basic axioms and definitions are thus translated, then it is
small wonder that the translation generalizes; take any probabilistic theorem, apply
the above translation to it, and you are almost guaranteed to get a ranking theorem.
This translation is obviously committed to negative ranks; therefore I always favored
negative over positive ranks. However, the translation is not fool-proof; see, e.g.,
Spohn (1994a) for slight failures concerning conditional independence (between
sets of variables) or Spohn (2005a) for slight differences concerning positive and
non-negative instantial relevance. The issue is not completely cleared up.
Is there a deeper reason why this translation works so well? Yes, of course. The
translation of products and quotients of probabilities suggests that negative ranks
simply are the logarithm of probabilities (with respect to some base < 1). This does
not seem to fit with the translation of sums of probabilities. But it does fit when
the logarithmic base is taken to be some infinitesimal i (since for two positive reals
x ≤ y, i^x + i^y = i^(x−j) for some infinitesimal j). That is, we may understand ranks as real
orders of magnitude of non-standard probabilities. This is the basic reason for the
pervasive analogy.
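The order-of-magnitude reading can be illustrated numerically with ordinary reals: if the probabilities of worlds are taken proportional to ε^κ(w) for a small ε, then log_ε P(A) approaches κ(A) as ε shrinks. This is only an illustrative sketch with made-up ranks, not part of the paper's apparatus.

```python
import math

def order_of_magnitude_check(point_ranks, eps):
    """Set P(w) proportional to eps**kappa(w) and compare log_eps P(A) with kappa(A)."""
    worlds = list(point_ranks)
    norm = sum(eps ** point_ranks[w] for w in worlds)
    P = {w: eps ** point_ranks[w] / norm for w in worlds}
    A = worlds[:2]                                   # an arbitrary proposition
    log_eps_P = math.log(sum(P[w] for w in A), eps)  # logarithm to base eps
    kappa_A = min(point_ranks[w] for w in A)
    return log_eps_P, kappa_A

ranks = {"w1": 2, "w2": 5, "w3": 0, "w4": 1}
for eps in (0.1, 0.01, 0.001):
    print(order_of_magnitude_check(ranks, eps))
# log_eps P(A) tends to kappa(A) = 2 as eps shrinks
```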
Does this mean that ranking epistemology simply reduces to non-standard
Bayesianism? This may be one way to view the matter. However, I do not
particularly like this perspective. Bayesian epistemology in terms of non-standard
reals is really non-standard. Even its great proponent, David Lewis, mentions the
possibility only in passing (for the first time in 1980, p. 268). It is well known
that both, non-standard analysis and its continuation as hyperfinite probability
theory, have their intricacies of their own, and it is highly questionable from an
epistemological point of view whether one should buy these intricacies. Moreover,
even though this understanding of ranks is in principle feasible, it is nowhere worked
out in detail. Such an elaboration should also explain the slight failures of the
above translation. Hence, even formally the relation between ranks and non-standard
probabilities is not fully clear. Finally, there are algebraic incoherencies. As long as
the probabilistic law of additivity and the ranking law of disjunction are finitely
restricted, there is no problem. However, it is very natural to conceive probability
measures as ¢-additive (although there is an argument about this point), whereas
it is very natural to conceive of ranking functions as complete (as I have argued).
This is a further disanalogy, which is not resolved by the suggested understanding
of ranks.
All in all, I prefer to stick to the realm of standard reals. Ranking theory is a
standard theory, and it should be compared to other standard theories. So, let us put
the issue of hyperfinite probability theory to one side.
Let us instead pursue another line of thought. I have heavily emphasized that
the fundamental point of ranking theory is to represent the statics and the dynamics
of belief or of taking-to-be-true; it is the theory of belief. So, instead of inquiring
into the relation between ranks and probabilities we might as well ask the more familiar
question about the relation between belief and probability.
This relation is well known to be problematic. One naive idea is that belief
vaguely marks some threshold in probability, i.e., that A is believed iff its subjective
probability is greater than 1 − ε for some small ε. But this will not do, as is
highlighted by the famous lottery paradox (see Kyburg 1961, p. 197, and Hempel
1962, pp. 163–166). According to this idea you may believe A and believe B, but
fail to believe A & B. However, this amounts to saying that you do not know the truth
table of conjunction, i.e., that you have not grasped conjunction at all. So, this idea
is a bad one, as almost all commentators on the lottery paradox agree. One might
think then about more complicated relations between belief and probability, but I
The simplest escape from the lottery paradox is, of course, to equate belief with
probability 1. This proposal faces two further problems, though. First, it seems
intuitively inadequate to equate belief with maximal certainty in probabilistic terms;
beliefs need not be absolutely certain. Secondly – and this is only a theoretical
version of the intuitive objection – only belief expansion makes sense according to
this proposal, but no genuine belief revision. Once you assign probability 1 to a
proposition, you can never get rid of it according to all rules of probabilistic change.
This is obviously inadequate; of course, we can give up previous beliefs and easily
do so all the time.
Jeffrey’s radical probabilism (1991) is a radical way out. According to Jeffrey, all
subjective probabilities are regular, and his generalized conditionalization provides
a dynamics moving within regular probabilities. However, Jeffrey’s picture and the
The previous paragraphs again urge the issue of hyperfinite probability; ranked
probabilities look even more like probabilities in terms of non-standard reals.
However, I cannot say more than I already did; I recommend the issue for further
investigation.21 I should use the occasion for clarifying a possible confusion,
though. McGee (1994, pp. 181ff.) showed that Popper measures correspond to non-
standard probability measures in a specific way. Now, I have suggested that ranked
probabilities do so as well. However, my (1986, 1988) together entail that ranked
probabilities are more general than Popper measures. These three assertions do
not fit together. Yet, the apparent conflict is easily dissolved. The correspondence
proved by McGee is not a unique one. Different non-standard probability measures
may correspond to the same Popper measure, just as different ranked probabilities
may. Hence, if McGee says that the two approaches, Popper’s and the non-standard
one, “amount to the same thing” (p. 181), this is true only for the respects McGee
is considering, i.e., w.r.t. conditional probabilities. It is not true for the wider
perspective I am advocating here, i.e., w.r.t. probability dynamics.
Philosophical Aspects
The relation between belief and probability is not only a formal issue, it is
philosophically deeply puzzling. It would be disturbing if there should be two (or
more) unrelated ways of characterizing our doxastic states. We must somehow come
to grips with their relation.
The nicest option would be reductionism, i.e., reducing one notion to the other.
This can only mean reducing belief to probability. As we have seen, however,
this option seems barred by the lottery paradox. Another option is eliminativism
as most ably defended in Jeffrey’s radical probabilism also mentioned above. This
option is certainly viable and most elegant. Still, I find it deeply unsatisfactory; it is
unacceptable that our talk of belief should merely be an excusable error ultimately
to be eliminated. Thus, both versions of monism seem excluded.
Hence, we have to turn to dualism, and then interactionism may seem the most
sensible position. Of course, everything depends on the precise form of interaction
between belief and probability. In Spohn (2005b) I had an argument with Isaac Levi
whom I there described as the champion of interactionism. My general experience,
though, is that belief and probability are like oil and water; they do not mix easily.
Quite a different type of interactionism is represented by Hild (t.a.) who has many
interesting things to say about how ranking and probability theory mesh, indeed how
heavily ranking ideas are implicitly used in statistical methodology. I do not have
space to assess this type of interactionism.
21
For quite a different way of relating probabilities and ranks appealing neither to infinitesimals
nor to Popperian conditional probabilities see Giang and Shenoy (1999).
When the fate of interactionism is unclear, one might hope to return to reduction-
ism and thus to monism, not in the form of reducing belief to probability, but in the
form of reducing both to something third. This may be hyperfinite probability, or
it may be ranked probabilities as suggested above. However, as already indicated, I
consider this to be at best a formal possibility with admittedly great formal power of
unification. Philosophically, I am not convinced. It is intuitively simply inadequate
to equate belief with (almost) maximal probabilistic certainty, i.e., with probability
1 (minus an infinitesimal), even if this does not amount to unrevisability within these
unifications. This intuition has systematic counterparts. For centuries, the behavioral
connection of subjective probabilities to gambling and betting has been taken to
be fundamental; many hold that this connection provides the only explanation
of subjective probabilities. This fundamental connection does not survive these
unifications. According to them, I would have to be prepared to bet my life on my
beliefs; but this is true only of very few of my many beliefs. So, there are grave
frictions that should not be plastered over by formal means.
In view of all this, I have always preferred separatism, at least methodologically.
If monism and interactionism are problematic, then belief and probability should
be studied as two separate fields of interest. I sense the harshness of this position;
this is why I am recommending it so far only as a methodological one and remain
unsure about its ultimate status. However, the harshness is softened by the formal
parallel which I have extensively exploited and which allows formal unification.
Thus, separatism in effect amounts to parallelism, at least if belief is studied in
ranking terms. Indeed, the effectiveness of the parallel sometimes strikes me as a
pre-established harmony.
Thus, another moral to be drawn may perhaps be structuralism, i.e., the search
for common structures. This is a strategy I find most clearly displayed in Halpern
(2003). He starts with a very weak structure of degrees of belief that he calls
plausibility measures and then discusses various conditions on those degrees that
allow useful strengthenings of that structure such as a theory of conditioning, a
theory of independence, a theory of expectation and integration, and so forth. Both
ranking and probability theory – but not only they – are specializations of that structure
and its various strengthenings. Without doubt, this is a most instructive procedure.
Structuralism would moreover suggest that it is only those structures and not their
specific realizations that matter. Halpern does not explicitly endorse this, and I
think one should withstand it. For instance, one would thereby miss the essential
purpose for which ranking theory was designed, namely the theory of belief. For
this purpose, no less and no more than the ranking structure is required.
Hence, let me further pursue, in the spirit of methodological separatism, the
philosophical comparison between ranks and standard probabilities. I have already
emphasized the areas in which the formal parallel also makes substantial sense:
inductive inference, confirmation, causation, etc. Let us now focus on three actual
or apparent substantial dissimilarities, which in one way or the other concern the
issue of what our doxastic states have to do with reality.
The first aspect of this issue is the truth connection; ranks are related to truth
in a way in which probabilities are not. This is the old point all over again.
Ranks represent beliefs that are true or false, whereas subjective probabilities do
not represent beliefs and may be assessed in various ways, as well-informed, as
reasonable, but never as true or false. Degrees of belief may perhaps conform to
degrees of truthlikeness; however, it is not clear in the first place whether degrees
of truthlikeness behave like probabilities (cf. Oddie 2001). Or degrees of belief may
conform to what Joyce (1998) calls the norm of gradational accuracy from which he
proceeds with an interesting argument to the effect that degrees of belief then have
to behave like probabilities.22 Such ideas are at best a weak substitute, however;
they never yield an application of truth in probability theory as we have it in ranking
theory.
This is a clear point in favor of ranking theory. And it is rich in consequences.
It means that ranking theory, in contrast to probability theory, is able to connect up
with traditional epistemology. For instance, Plantinga (1993, chs. 6 and 7) despairs
of finding insights in Bayesianism he can use and dismisses it, too swiftly I find.
This would have been different with ranking theory. The reason why ranking theory
is connectible is obvious. Traditional epistemology is interested in knowledge, a
category entirely foreign to probability theory; knowledge, roughly, is justified
true belief and thus analyzed by notions within the domain of ranking theory.
Moreover, the notion of justification has become particularly contested in traditional
epistemology; one focal issue was then to give an account of the truth-conduciveness
of reasons, again notions within the domain of ranking theory.
I am not claiming actual epistemological progress here. But I do claim an advan-
tage of ranking over probability theory, I do claim that traditional epistemology finds
in ranking theory adequate formal means for discussing its issues, and using such
means is something I generally recommend as a formal philosopher.
The second aspect is the behavioral connection. Our doxastic states make some
actions rational and others irrational, and our theories have to say which. Here,
probability theory seems to have a clear advantage. The associated behavioral
theory is, of course, decision theory with its fundamental principle of maximizing
conditional expected utility. The power of this theory need not be emphasized here.
Is there anything comparable on offer for ranking theory?
This appears excluded, for the formal reason that there is a theory of integration
and thus of expectation in probabilistic, but none in ranking terms; this is at least
what I had thought all along. However, the issue has developed. There are various
remarkable attempts of stating a decision theory in terms of non-probabilistic
or non-additive representations of degrees of belief employing the more general
Choquet theory of integration.23 Indeed, there is also one especially for ranking
theory. Giang and Shenoy (2000) translate the axiomatic treatment of utility as it is
22 Cf., however, Maher's (2002) criticism of Joyce's argument.
23 Economists have inquired into the issue; see, e.g., Gilboa (1987), Schmeidler (1989), Jaffray (1989), Sarin and Wakker (1992) for early contributions, and Wakker (2005) for a recent one. The AI side concurs; see, e.g., Dubois and Prade (1995), Brafman and Tennenholtz (2000), and Giang and Shenoy (2005).
given by Luce and Raiffa (1957, sect. 2.5) in terms of simple and compound lotteries
directly into the ranking framework, thus developing a notion of utility fitting to this
framework. These attempts doubtlessly deserve further scrutiny (cf. also Halpern
2003, ch. 5).
Let me raise, though, another point relating to this behavioral aspect. Linguistic
behavior is unique to humans and a very special kind of behavior. Still, one may
hope to cover it by decision theoretic means, too. Grice’s intentional semantics
employs a rudimentary decision theoretic analysis, and Lewis’ (1969) theory of
conventions uses game (and thus decision) theoretic methods in a very sophisticated
way. However, even Lewis’ account of coordination equilibria may be reduced to a
qualitative theory (in Lewis (1975) he explicitly uses only qualitative terminology).
In fact, the most primitive linguistic behavioral law is the disquotation principle:
if a seriously and sincerely utters “p”, then a believes that p.24 The point is that
these linguistic behavioral laws, and in particular the disquotation principle, are stated in terms of belief. There is no probabilistic version of the disquotation principle,
and it is unclear what it could be. The close relation between belief and meaning is
obvious and undoubted, though perhaps not fully understood in the philosophy of
language. I am not suggesting that there is a linguistic pragmatics in terms of ranking
functions; there is hardly anything.25 I only want to point out that the standing of
ranking theory concerning this behavioral aspect is at least promising.
There is a third and final aspect, again apparently speaking in favor of probability
theory. We do not only make decisions with the help of our subjective probabilities,
we also do statistics. That is, we find a lot of relative frequencies in the world, and
they are closely related to probabilities. We need not discuss here the exact nature
of this relation. Concerning objective probabilities, it is extensively discussed in the
debate about frequentism, and concerning subjective probabilities it is presumably
best captured in Reichenbach’s principle postulating that our subjective probabilities
should rationally converge to the observed relative frequencies. What is clear, in any
case, is that in some way or other relative frequencies provide a strong anchoring
of probabilities in reality from which the powerful and pervasive application of
statistical methods derives. Subjective probabilities are not simply free-floating in
our minds.
For many years I thought that this is another important aspect in which ranking
theory is inferior to probability theory. Recently, though, I have become more
optimistic. Not that there would be any statistics in ranking terms26 ; I do not see
ranks related to relative frequencies. However, a corresponding role is played by the
notion of exception and thus by absolute frequencies. In section “Objective ranks?”,
24 If a speaks a foreign language, the principle takes a more complicated, but obvious form. There is also a disquotation principle for the hearer, which, however, requires a careful exchange of the hearer's and the speaker's role.
25 See in particular Merin (2006, appendix B) and (2008), whose relevance-based pragmatics yields interesting results in probabilistic as well as in ranking-theoretic terms.
26 However, I had already mentioned that Hild (t.a.) finds a much closer connection of probabilities and ranks within statistical methodology.
I left the precise account of objectifiable ranking functions in the dark. If one
studies that account more closely, though, one finds that these objectifiable ranking
functions, or indeed the laws as I have indicated them in section “Objective ranks?”,
are exception or fault counting functions. The rank assigned to some possible world
by such a ranking function is just the number of exceptions from the law embodied
in this function that occur in this world.
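As a toy rendering of this remark (the law, the worlds, and the numbers below are invented purely for illustration), let a law say that each of n cases conforms to a regularity, and let the rank of a world be the number of exceptions it contains; in Python:

from itertools import product

N = 3  # the law speaks about three cases
worlds = list(product([True, False], repeat=N))   # True = the case conforms to the law

def rank_world(w):
    # fault-counting rank: the number of exceptions to the law in world w
    return sum(1 for conforms in w if not conforms)

def kappa(A):
    # negative rank of a proposition = least number of exceptions among its worlds
    return min(rank_world(w) for w in A)

A = {w for w in worlds if not w[0]}   # "case 1 is an exception"
B = {w for w in worlds if all(w)}     # "there are no exceptions at all"
# the law of disjunction for negative ranks falls out of the construction
assert kappa(A | B) == min(kappa(A), kappa(B))
print(kappa(A), kappa(B), kappa(set(worlds)))     # 1 0 0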
This is a dim remark so far, and here is not the place to elaborate on it. Still, I
find the opposition of exceptions and relative frequencies appealing. Often, we take
a type of phenomenon as more or less frequent, and then we apply our sophisticated
statistical methodology to it. Equally often, we try to cover a type of phenomenon
by a deterministic law, we find exceptions, we try to improve our law, we take
recourse to a usually implicit ceteris paribus condition, etc. As far as I know, the
methodology of the latter perspective is less sophisticated. Indeed, there is little
theory. Mill’s method of relevant variables, e.g., is certainly an old and famous
attempt to such a theory (cf. its reconstruction in Cohen 1977, ch. 13). Still, both
perspectives, the statistical and the deterministic one, are very familiar to us. What
I am suggesting is that the deterministic perspective can be thoroughly described in
terms of ranking theory.27
It would moreover be most interesting to attend to the vague borderline.
Somewhere, we switch from one to the other perspective, from exceptions to small
relative frequencies or the other way around. I am not aware of any study of this
borderline, but I am sure it is worth inquiring into. It may have the potential of
also illuminating the relation of belief and probability, the deterministic and the
statistical attitude.
All these broad implications are involved in a comparison of ranks and
probabilities. I would find it rather confusing to artificially combine them in some
unified theory, be it hyperfinite or ranked probabilities. It is more illuminating to
keep them separate. Also, I did not want to argue for any preference. I wanted to
present the rich field of comparison in which both theories can show their great,
though partially diverging virtues. There should be no doubt, however, that the
driving force behind all these considerations is the formal parallelism which I
have extensively used in section “The theory” (pp. 305ff) and explained in section
“Formal aspects” (pp. 328ff).
Further Comparisons
Let me close the paper with a number of brief comparative remarks about alternative
accounts subsumable under the vague label ‘Baconian probability’. I have already
27 I attempted to substantiate this suggestion with my account of strict and ceteris paribus laws in Spohn (2002) and with my translation of de Finetti's representation theorem into ranking theory in Spohn (2005a). (New addendum: For the most recent ranking-theoretic account of ceteris paribus laws see Spohn (2014).)
made a lot of such remarks en passant, but it may be useful to have them collected.
I shall distinguish between the earlier and usually more philosophical contributions
on the one hand and the more recent, often more technical contributions from the
computer science side on the other hand. The borderline is certainly fuzzy, and
I certainly do not want to erect boundaries. Still, the centuries old tendency of
specialization and of transferring problems from philosophy to special fields may
be clearly observed here as well.
It is perhaps appropriate to start with L. Jonathan Cohen, the inventor of the label.
In particular his (1977) is an impressive document of dualism, indeed separatism
concerning degrees of provability and degrees of probability or inductive (Baconian)
and Pascalian probability. His work is, as far as I know, the first explicit and powerful
articulation of the attitude I have taken here as well.28
However, his functions of inductive support are rather a preform of my ranking
functions. His inductive supports correspond to my positive ranks. Cohen clearly
endorsed the law of conjunction for positive ranks; see his (1970, pp. 21f. and p.
63). He also endorsed the law of negation; but he noticed its importance only in his
(1977, pp. 177ff.), whereas in his (1970) it is well hidden as theorem 306 on p. 226.
His presentation is a bit imperspicuous, though, since he is somehow attached to the idea that having an inductive support of at least i behaves like an iterable S4-necessity and since he even brings in first-order predicate calculus.
Moreover, Cohen is explicit on the relationality of inductive support; it is a two-
place function relating evidence and hypothesis. Hence, one might expect to find a
true account of conditionality. This, however, is not so. His conditionals behave like
strict implication29 , a feature Lewis (1973, sect. 1.2–3) has already warned against.
Moreover, Cohen discusses only laws of support with fixed evidence – with one
exception, the consequence principle, as he calls it (1970, p. 62). Translated into my
notation, it asserts for a positive ranking function a condition which is clearly not a theorem of ranking theory. These remarks sufficiently indicate
that the aspect so crucial for ranking functions is scarcely and wrongly developed in
Cohen’s work.
The first clear articulation of the basic Baconian structure is found, however, not
in Cohen’s work, but in Shackle (1949, 1969). His functions of potential surprise
28 I must confess, though, that I had not yet noticed his work when I basically fixed my ideas on ranking functions in 1983.
29 This is particularly obvious from Cohen (1970, p. 219, def. 5).
He accepts the criticism this axiom has met, and changes it into a second version (1969, p. 83), which I find must be translated into
κ(A ∩ B) = min[max{κ(A), κ(B | A)}, max{κ(B), κ(A | B)}].     (17.18)
And on pp. 204f. he even considers, and rejects (for no convincing reason), the equation κ(A ∩ B) = κ(A) + κ(B | A), i.e., our law of conjunction for negative ranks. In all these discussions, conditional
degrees of potential surprise appear to be an unexplained primitive notion. So,
Shackle may have been here on the verge of getting things right. On the whole,
though, it seems fair to say that his struggle has not led to a clear result.
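A small numerical comparison in Python makes the difference vivid; it presupposes the law of conjunction for negative ranks, κ(A ∩ B) = κ(A) + κ(B | A), and the particular ranks are invented for illustration:

kA, kB = 1, 3                    # kappa(A), kappa(B)
kB_given_A, kA_given_B = 2, 0    # kappa(B|A), kappa(A|B); chosen so that both sums agree

# Shackle's second version of his axiom for conjunction, eq. (17.18):
shackle = min(max(kA, kB_given_A), max(kB, kA_given_B))

# the law of conjunction for negative ranks (assumed here):
ranking = kA + kB_given_A        # = kB + kA_given_B = 3

print(shackle, ranking)          # 2 versus 3: the two rules come apart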
Isaac Levi has always pointed to this pioneering achievement of Shackle, and
he has made his own use of it. In a way he did not develop Shackle’s functions
of potential surprise; he just stuck to the laws of negation and of disjunction for
negative ranks. In particular, there is no hint of any notion of conditionalization.
This is not to say that his epistemology is poorer than the one I have. Rather, he
finds a place for Shackle’s functions in his elaborated doxastic decision theory, more
precisely, in his account of belief expansion. He adds a separate account of belief
contraction, and with the help of what is called Levi’s identity he can thus deal
with every kind of belief change. He may even claim to come to grips with iterated
change.30 One may thus sense that his edifice is at cross-purposes with mine.
A fair comparison is hence a larger affair. I have tried to give it in Spohn (2005b).
Let me only mention one divergence specifically related to ranking functions. Since
Levi considers ranking functions as basically identical with Shackle’s functions of
potential surprise and since he sees the latter’s role in expansion, he continuously
brings ranking functions into the same restricted perspective. I find this inadequate.
I rather see the very same structure at work at expansions as well as at contractions,
namely the structure of ranks. To that extent, I do not see any need to give the two kinds of belief change an entirely different treatment.
This brings me to the next comparison, with AGM belief revision theory (cf. e.g.,
Gärdenfors 1988). I have already explained that I came to think of ranking theory as
a direct response to the challenge of iterated belief revision for AGM belief revision
theory, and I have explained how A → x-conditionalization for ranks unifies and generalizes AGM expansion, revision, and contraction. One may wonder how that challenge was taken up within the AGM discussion. It was met with a plethora of proposals (see Rott 2008), which partially ventilated ideas that I thought I had effectively criticized already in Spohn (1988) and which, as far as I can see, have not found agreement, with the exception of Darwiche and Pearl (1997). As mentioned, Hild and Spohn
(2008) gives a complete axiomatization of iterated contraction. Whether it finds
wider acceptance remains to be seen.
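A minimal Python sketch of the rule referred to here, assuming the formulation on which the posterior gives A rank 0 and its complement rank x while preserving ranks conditional on A and on its complement; the world set and numbers are invented:

def conditionalize(kappa, A, x):
    # A -> x conditionalization of a pointwise negative ranking function kappa:
    # afterwards kappa'(A) = 0 and kappa'(not-A) = x, with ranks inside each cell
    # shifted uniformly, so that conditional ranks are preserved (an assumption of this sketch)
    worlds = set(kappa)
    kA = min(kappa[w] for w in A)
    k_notA = min(kappa[w] for w in worlds - A)
    return {w: (kappa[w] - kA) if w in A else (kappa[w] - k_notA + x)
            for w in worlds}

kappa = {"w1": 2, "w2": 3, "w3": 0}   # initially not-A (= {w3}) is believed
A = {"w1", "w2"}
print(conditionalize(kappa, A, 5))    # revision-like result: A is now believed with firmness 5
print(conditionalize(kappa, A, 0))    # contraction-like result: neither A nor not-A is disbelieved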
By no means, though, should one underestimate the richness of the AGM
discussion, of which, e.g., Rott (2001) or Hansson (1999) give a good impression.
A pertinent point is that ranking theory generalizes and thus simply sides with
the standard postulates for revision and contraction (i.e., (K*1–8) and (K⁻1–8) in Gärdenfors 1988, pp. 54–56 and 61–64). The ensuing discussion has shown that these postulates are not beyond criticism and that many alternatives are worth discussing (cf., e.g., Rott 2001, pp. 103ff., who lists three alternatives of K*7, nine of K*8, six of K⁻7, and ten of K⁻8). I confess I would not know how to modify
ranking theory in order to do justice to such alternatives. Hence, a fuller comparison
with AGM belief revision theory would have to advance a defense of the standard
postulates against the criticisms related with the alternatives.
The point is, of course, relevant in the debate with Levi, too. He prefers what
he calls mild contraction to standard AGM contraction that can be represented in
30 Many aspects of his epistemology are already found in Levi (1967). The most recent statement is given in Levi (2004), where one also gets a good idea of the development of his thought.
ranking theory only as a form of iterated contraction. Again, one would have to
discuss whether this representation is acceptable.
It is worth mentioning that the origins of AGM belief revision theory clearly lie
in conditional logic. Gärdenfors’ (1978) epistemic semantics for conditionals was a
response to the somewhat unearthly similarity spheres semantics for counterfactuals
in Lewis (1973), and via the so-called Ramsey test Gärdenfors’ interest more and
more shifted from belief in conditionals to conditional beliefs and thus to the
dynamics of belief. Hence, one finds a great similarity in the formal structures of
conditional logic and belief revision theory. In particular, Lewis’ similarity spheres
correspond to Gärdenfors’ entrenchment relations (1988, ch. 4). In a nutshell, then,
the progress of ranking theory over Lewis’ counterfactual logic lies in proceeding
from an ordering of counterfactuality (as represented by Lewis’ nested similarity
spheres) to a cardinal grading of disbelief (as embodied in negative ranking
functions).31
Indeed, the origins reach back farther. Conditional logic also has a history, the
earlier one being somewhat indeterminate. However, the idea of having an ordering
of levels of counterfactuality or of far-fetchedness of hypotheses is explicitly found
already in Rescher (1964). If β is a positive ranking function taking only finitely many values 0, x1, …, xm, ∞, then β⁻¹(∞), β⁻¹(xm), …, β⁻¹(x1), β⁻¹(0) is just a family of modal categories M0, …, Mn (n = m + 2), as Rescher (1964, pp. 47–50) describes it. His procedure on pp. 49f. for generating modal categories makes them closed under conjunction; this is our law of conjunction for positive ranks. And he observes on p. 47 that all the negations of sentences in modal categories up to Mn−1 must be in Mn = β⁻¹(0); this is our law of negation.
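A compact Python rendering of this correspondence, with an invented ranking function: it groups propositions into modal categories by their positive rank and checks the conjunction and negation laws just mentioned.

import math
from itertools import combinations

kappa_w = {"w1": 0, "w2": 1, "w3": 2}     # invented negative ranks of worlds
worlds = sorted(kappa_w)

def kappa(A):
    return math.inf if not A else min(kappa_w[w] for w in A)

def beta(A):
    # positive rank of A = negative rank of its complement
    return kappa(set(worlds) - set(A))

props = [frozenset(c) for r in range(len(worlds) + 1) for c in combinations(worlds, r)]

# modal categories: propositions grouped by positive rank, from beta = infinity down to beta = 0
categories = {}
for A in props:
    categories.setdefault(beta(A), []).append(A)

for A in props:
    assert min(beta(A), beta(frozenset(worlds) - A)) == 0       # law of negation
    for B in props:
        assert beta(A & B) == min(beta(A), beta(B))             # law of conjunction for positive ranks
print(sorted(categories))    # [0, 1, 2, inf]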
To resume, I cannot find an equivalent to the ranking account of conditionaliza-
tion in all this literature. However, the philosophical fruits I have depicted in section
“The theory”, pp. 305ff., and also in section “Philosophical aspects”, pp. 331ff.,
sprang from this account. Therefore, I am wondering to what extent this literature can offer similar fruits, and for all I know the answer tends to be negative.
31 For my ideas how to treat conditionals in ranking-theoretic terms see Spohn (2015).
In the computer science literature, ranking theory is usually subsumed under the
heading “uncertainty” and “degrees of belief”. This is not wrong. After all, ranks are
degrees, and if (absolute) certainty is equated with unrevisability, revisable beliefs
are uncertain beliefs. Still, the subsumption is also misleading. My concern was
not to represent uncertainty and to ventilate alternative models of doing so. Thus
stated, this would have been an enterprise with too little guidance. My concern
was exclusively to statically and dynamically represent ungraded belief, and my
observation was that this necessarily leads to the ranking structure. If this is so,
then, as I have emphasized, all the philosophical benefits of having a successful
representation of ungraded belief are conferred to ranking theory. By contrast, if one
starts modeling degrees of uncertainty, it is always an issue (raised, for instance, by
the lottery paradox vis à vis probability) to which extent such a model adequately
captures belief and its dynamics. So, this is a principled feature that sets ranking
theory apart from the entire uncertainty literature.
The revisability of beliefs was directly studied in computer science under
headings like “default logic” or “nonmonotonic reasoning”. This is another large
and natural field of comparison for ranking theory. However, let me cut things
short. The relation between belief revision theory and nonmonotonic reasoning
is meticulously investigated by Rott (2001). He proved far-reaching equivalences
between many variants on both sides. This is highly illuminating. At the same time,
however, it is a general indication that the concerns that led me to develop AGM
belief revision theory into ranking theory are not well addressed in these areas of
AI. Of course, such lump-sum statements must be taken with caution.
The uncertainty literature has observed many times that the field of nonmono-
tonic reasoning is within its reach. Among many others, Pearl (1988, ch. 10)
has investigated the point from the probabilistic side, and Halpern (2003, ch. 8)
has summarized it from his more comprehensive perspective. This direction of
inquiry is obviously feasible, but the reverse line of thought of deriving kinds of
uncertainty degrees from kinds of nonmonotonic reasoning is less clear (though the
results in Hild and Spohn (2008) about the measurement of ranks via iterated contractions may be a step in the reverse direction).
So, let me return to accounts of uncertainty in a bit more detail, and let me
take up possibility theory first. It originates from Zadeh (1978), i.e. from fuzzy set
theory and hence from a theory of vagueness. Its elaboration in the book by Dubois
and Prade (1988) and many further papers shows its wide applicability, but never
denies its origin. So, it should at least be mentioned that philosophical accounts of
vagueness (cf., e.g., Williamson 1994) have nothing much to do with fuzzy logic.
If one abstracts from this interpretation, though, possibility theory is formally very
similar to ranking theory. If Poss is a possibility measure, then the basic laws are that Poss(∅) = 0, Poss(W) = 1, and Poss(A ∪ B) = max{Poss(A), Poss(B)}. So far, the difference is merely one of scale. Full possibility 1 is negative rank 0, (im)possibility 0 is negative rank ∞, and translating the scales translates the
characteristic axiom of possibility theory into the law of disjunction for negative
ranks. Indeed, Dubois and Prade often describe their degrees of possibility in such
a way that this translation fits not only formally, but also materially.
Hence, the key issue is again how conditionalization is treated within possibility
theory. There is some uncertainty. First, there is the motive that also dominated
Shackle’s account of the functions of potential surprise, namely to keep possibility
theory as an ordinal theory where degrees of possibility have no arithmetical
meaning. Then the idea is to stipulate that
Poss(A ∩ B) = min{Poss(A), Poss(B | A)}.     (17.22)
This is just Shackle's proposal (17.19). Hisdal (1978) proposed to go beyond (17.19) just by turning (17.22) into a definition of conditional possibility by additionally assuming that conditionally things should be as possible as possible, i.e., by defining Poss(B | A) as the maximal degree of possibility that makes (17.22) true:
Poss(B | A) = Poss(A ∩ B), if Poss(A ∩ B) < Poss(A); and Poss(B | A) = 1, if Poss(A ∩ B) = Poss(A).     (17.23)
The results in Halpern (2003, Proposition 3.9.2, Theorem 4.4.5, and Corollary 4.5.8) entail that Bayesian net theory works also in terms of conditional possibility thus defined. Many things, though, do not work well. It is plausible that Poss(B | A) is between the extremes 1 and Poss(A ∩ B). However, (17.23) implies that it can take only those extremes. This is unintelligible. (17.23) also implies that, if neither Poss(B | A) nor Poss(A | B) is 1, they are equal, a strange symmetry. And so on. Such unacceptable
consequences spread through the entire architecture.
However, there is a second way to introduce conditional possibilities (cf., e.g., Dubois and Prade 1998, p. 206), namely by taking numerical degrees of possibility seriously and defining
Poss(B | A) = Poss(A ∩ B) / Poss(A).     (17.24)
This looks much better. Indeed, if we define κ(A) = log Poss(A), the logarithm taken w.r.t. some positive base < 1, then κ is a negative ranking function such that also κ(B | A) = log Poss(B | A). Hence, (17.24) renders possibility and ranking theory
isomorphic, and all the philosophical benefits may be gained in either terms. Still,
there remain interpretational differences. If we are really up to degrees of belief
and disbelief, then the ranking scale is certainly more natural; this is particularly
clear when we look at the possibilistic analogue to two-sided ranking functions.
My remarks about objectifiable ranking functions as fault counting functions would
make no sense for a possibilistic scale. And so on. Finally, one must be aware that
the philosophical benefits resulted from adequately representing belief. Hence, it is
doubtful whether the formal structure suffices to maintain the benefits for alternative
interpretations of possibility theory.
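The translation is easy to check mechanically; the following Python lines, with an invented possibility measure and base 0.5, confirm that the max law becomes the law of disjunction for negative ranks and that the ratio definition (17.24) becomes subtraction of ranks.

import math

BASE = 0.5   # any base strictly between 0 and 1 will do

def to_rank(poss):
    # kappa = log_BASE Poss; full possibility 1 becomes rank 0, possibility 0 becomes rank infinity
    return math.inf if poss == 0 else math.log(poss, BASE)

poss_w = {"w1": 1.0, "w2": 0.25, "w3": 0.0625}   # invented degrees of possibility of worlds

def Poss(A):
    return max(poss_w[w] for w in A)

def kappa(A):
    return min(to_rank(poss_w[w]) for w in A)

A, B = {"w1", "w2"}, {"w2", "w3"}
assert kappa(A | B) == min(kappa(A), kappa(B))                 # max law becomes the disjunction law
cond = Poss(A & B) / Poss(A)                                   # definition (17.24)
assert math.isclose(to_rank(cond), kappa(A & B) - kappa(A))    # conditioning becomes subtraction of ranks
print(kappa(A), kappa(B), to_rank(cond))                       # -0.0 2.0 2.0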
32 Does this contradict the fact that ranking functions are equivalent to possibility measures (with their second kind of conditionalization), that possibility measures may be conceived as a special case of DS belief (or rather: plausibility) functions, and that Jeffrey conditionalization works for possibility measures as defined by Halpern (2003, p. 107)? No. The reason is that Jeffrey conditionalization for possibility measures is not a special case of Jeffrey conditionalization for DS belief functions in general. Cf. Halpern (2003, p. 107).
References
Merin, A. (2008). Relevance and reasons in probability and epistemic ranking theory. A study in
cognitive economy. In: Forschungsberichte der DFG-Forschergruppe Logik in der Philosophie
(Nr. 130). University of Konstanz.
Neapolitan, R. E. (1990). Probabilistic reasoning in expert systems: Theory and algorithms. New
York: Wiley.
Oddie, G. (2001). Truthlikeness. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy
(Fall 2001 Edition). http://plato.stanford.edu/archives/fall2001/entries/truthlikeness
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible inference.
San Mateo: Morgan Kaufman.
Pearl, J. (2000). Causality. Models, reasoning, and inference. Cambridge: Cambridge University
Press.
Plantinga, A. (1993). Warrant: The current debate. Oxford: Oxford University Press.
Pollock, J. L. (1995). Cognitive carpentry. Cambridge: MIT Press.
Rescher, N. (1964). Hypothetical reasoning. Amsterdam: North-Holland.
Rescher, N. (1976). Plausible reasoning. Assen: Van Gorcum.
Rott, H. (2001). Change, choice and inference: A study of belief revision and nonmonotonic
reasoning. Oxford: Oxford University Press.
Rott, H. (2008). Shifting priorities: Simple representations for twenty seven iterated theory change
operators. In D. Makinson, J. Malinowski, & H. Wansing (Eds.), Towards mathematical
philosophy. Dordrecht: Springer.
Sarin, R., & Wakker, P. P. (1992). A simple axiomatization of nonadditive expected utility.
Econometrica, 60, 1255–1272.
Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Economet-
rica, 57, 571–587.
Shackle, G. L. S. (1949). Expectation in economics. Cambridge: Cambridge University Press.
Shackle, G. L. S. (1969). Decision, order and time in human affairs (2nd ed.). Cambridge:
Cambridge University Press.
Shafer, G. (1976). A mathematical theory of evidence. Princeton: Princeton University Press.
Shafer, G. (1978). Non-additive probabilities in the work of Bernoulli and Lambert. Archive for
History of Exact Sciences, 19, 309–370.
Shenoy, P. P. (1991). On Spohn’s rule for revision of beliefs. International Journal of Approximate
Reasoning, 5, 149–181.
Smets, P. (1998). The transferable belief model for quantified belief representation. In D. M. Gabbay & P. Smets (Eds.), Handbook of defeasible reasoning and uncertainty management systems (Vol. 1, pp. 267–301). Dordrecht: Kluwer.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search. Berlin:
Springer, 2nd ed. 2000.
Spohn, W. (1976/78). Grundlagen der Entscheidungstheorie. Ph.D. thesis, University of
Munich 1976, published: Kronberg/Ts.: Scriptor 1978, out of print, pdf-version at:
http://www.uni-konstanz.de/FuF/Philo/Philosophie/philosophie/files/ge.buch.gesamt.pdf
Spohn, W. (1983). Eine Theorie der Kausalität, unpublished Habilitationsschrift, Universität
München, pdf-version at: http://www.uni-konstanz.de/FuF/Philo/Philosophie/philosophie/files/
habilitation.pdf
Spohn, W. (1986). The representation of Popper measures. Topoi, 5, 69–74.
Spohn, W. (1988). Ordinal conditional functions. A dynamic theory of epistemic states. In W. L.
Harper & B. Skyrms (Eds.), Causation in decision, belief change, and statistics (Vol. II, pp.
105–134). Dordrecht: Kluwer.
Spohn, W. (1990). A general non-probabilistic theory of inductive reasoning. In R. D. Shachter, T.
S. Levitt, J. Lemmer, & L. N. Kanal (Eds.), Uncertainty in artificial intelligence (Vol. 4, pp.
149–158). Amsterdam: Elsevier.
Spohn, W. (1991). A reason for explanation: Explanations provide stable reasons. In W. Spohn,
B. C. van Fraassen, & B. Skyrms (Eds.), Existence and explanation (pp. 165–196). Dordrecht:
Kluwer.
Spohn, W. (1993). Causal laws are objectifications of inductive schemes. In J. Dubucs (Ed.),
Philosophy of probability (pp. 223–252). Dordrecht: Kluwer.
Spohn, W. (1994a). On the properties of conditional independence. In P. Humphreys (Ed.), Patrick
suppes: Scientific philosopher. Vol. 1: Probability and probabilistic causality (pp. 173–194).
Dordrecht: Kluwer.
Spohn, W. (1994b). On Reichenbach’s principle of the common cause. In W. C. Salmon &
G. Wolters (Eds.), Logic, language, and the structure of scientific theories (pp. 215–239).
Pittsburgh: Pittsburgh University Press.
Spohn, W. (1999). Two coherence principles. Erkenntnis, 50, 155–175.
Spohn, W. (2001a). Vier Begründungsbegriffe. In T. Grundmann (Ed.), Erkenntnistheorie. Positio-
nen zwischen Tradition und Gegenwart (pp. 33–52). Paderborn: Mentis.
Spohn, W. (2001b). Bayesian nets are all there is to causal dependence. In M. C. Galavotti, P.
Suppes, & D. Costantini (Eds.), Stochastic dependence and causality (pp. 157–172). Stanford:
CSLI Publications.
Spohn, W. (2002). Laws, ceteris paribus conditions, and the dynamics of belief. Erkenntnis, 57,
373–394; also in: Earman, J., Glymour, C., Mitchell, S. (Eds.). (2002). Ceteris paribus laws
(pp. 97–118). Dordrecht: Kluwer.
Spohn, W. (2005a). Enumerative induction and lawlikeness. Philosophy of Science, 72, 164–187.
Spohn, W. (2005b). Isaac Levi’s potentially surprising epistemological picture. In E. Olsson (Ed.),
Knowledge and inquiry: Essays on the pragmatism of Isaac Levi. Cambridge: Cambridge
University Press.
Spohn, W. (2006). Causation: An alternative. British Journal for the Philosophy of Science, 57,
93–119.
Spohn, W. (2012). The laws of belief. Ranking theory and its philosophical applications. Oxford:
Oxford University Press.
Spohn, W. (2014). The epistemic account of ceteris paribus conditions. European Journal for the
Philosophy of Science, 4(2014), 385–408.
Spohn, W. (2015). Conditionals: A unified ranking-theoretic perspective. Philosophers' Imprint, 15(1), 1–30; see: http://quod.lib.umich.edu/p/phimp/3521354.0015.001/
Studeny, M. (1989). Multiinformation and the problem of characterization of conditional indepen-
dence relations. Problems of Control and Information Theory, 18, 3–16.
Wakker, P. P. (2005). Decision-foundations for properties of nonadditive measures: General state
spaces or general outcome spaces. Games and Economic Behavior, 50, 107–125.
Williamson, T. (1994). Vagueness. London: Routledge.
Zadeh, L. A. (1975). Fuzzy logics and approximate reasoning. Synthese, 30, 407–428.
Zadeh, L. A. (1978). Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems, 1,
3–28.
Part III
Decision Theory
Chapter 18
Introduction
The classical account of decision-making derives from seminal work done by Frank P. Ramsey (1926) and later on by Von Neumann and Morgenstern (1947). This work culminated in the influential accounts of Leonard Savage (1954), Anscombe and Aumann (1963), and de Finetti (1974). We can recapitulate here in a compact form the classical presentation by Von Neumann and Morgenstern. Define a lottery as follows: If A1, …, Am is a partition of the possible outcomes of an experiment with αj = Pr(Aj) for each j, then the lottery (α1, …, αm) awards prize zj if Aj occurs. We can assume that the choice of the partition events does not affect the lottery. We can then introduce some central axioms for preferences among lotteries.
Axiom 18.1 (Weak Order) There is a weak order, ≼, among lotteries such that L1 ≼ L2 iff L1 is not strictly preferred to L2.
Then we have a second crucial axiom:
Axiom 18.2 (Independence) For each L, L1, L2, and 0 < a < 1, L1 ≼ L2 iff aL1 + (1 − a)L ≼ aL2 + (1 − a)L.
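A minimal Python check, with invented lotteries and an invented utility assignment, of how Axiom 18.2 behaves under expected utility: mixing two lotteries with a common third lottery merely rescales the difference in expected utility, so it cannot reverse the ranking.

def mix(a, L1, L2):
    # convex combination a*L1 + (1-a)*L2 of two lotteries (prize -> probability)
    prizes = set(L1) | set(L2)
    return {z: a * L1.get(z, 0.0) + (1 - a) * L2.get(z, 0.0) for z in prizes}

def eu(u, L):
    return sum(p * u[z] for z, p in L.items())

u  = {"$0": 0, "$25": 10, "$100": 25}
L1 = {"$25": 1.0}
L2 = {"$0": 0.5, "$100": 0.5}
L  = {"$0": 1.0}

for a in (0.1, 0.5, 0.9):
    d_plain = eu(u, L1) - eu(u, L2)
    d_mixed = eu(u, mix(a, L1, L)) - eu(u, mix(a, L2, L))
    assert abs(d_mixed - a * d_plain) < 1e-9     # the difference is just scaled by a
    assert (d_plain > 0) == (d_mixed > 0)        # so the order of L1 and L2 is preserved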
The paradox proposed by Daniel Ellsberg is quite different. We can present here
the simplest version of the paradox. Urn A contains exactly 100 balls. 50 of these
balls are solid black and the remaining 50 are solid white. Urn B contains exactly
100 balls. Each of these balls is either solid black or solid white, although the ratio of
black balls to white balls is unknown. Consider now the following questions: How
much would you be willing to pay for a ticket that pays $25 ($0) if the next random selection from Urn A results in a black (white) ball? Repeat then the same question for
Urn B. It is well known that subjects tend to offer higher maximum buying prices for
urn A than for urn B. This indicates that subjects do not have identical probabilities
for both urns (.5 for each color) as Savage’s theory predicts. It is considerably less
clear that this behavior has to be interpreted as some sort of error. Ellsberg himself
saw this behavior as an indication that Savage’s theory has to be amended to deal
with situations where uncertainty and vague or imprecise probabilities are involved.
One can perfectly well think, for example, that probabilities remain indeterminate in the case of Urn B. There is a vast literature dealing with decisions under ambiguity that
is reviewed in the article by Gilboa and Marinacci reprinted here. As Seidenfeld’s
article indicates there are two main choices: either embracing a theory that abandons
Axiom 18.1 (Ordering) or alternatively embracing a theory that abandons Axiom
18.2 (Independence). Seidenfeld argues that abandoning Independence (a solution
that is rather popular and that Ellsberg himself supported) has a costly price: it
leads to a form of sequential incoherence. Seidenfeld’s argument requires the use
of axioms for sequential decision making that many have found controversial.
Seidenfeld’s article remains mainly concerned with normative solutions to the
paradoxes. The article by Tversky and Kahneman reprinted here intends to extend the initial version of prospect theory to the case of uncertainty as well. So, they think that the common choices elicited by Ellsberg also constitute an error. This implies having a conservative attitude regarding the normative status of standard decision theory, one that clearly clashes with the motivation and some of the central theoretical ideas behind Ellsberg's work.
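Returning to the two urns, a tiny Python computation shows how indeterminate probabilities can rationalize the observed buying prices. The sketch uses, purely for illustration, a worst-case (maxmin) evaluation over the set of admissible probabilities, in the spirit of the ambiguity models surveyed by Gilboa and Marinacci; the stakes follow the example above.

def expected_value(p_black, prize_black=25.0, prize_white=0.0):
    return p_black * prize_black + (1 - p_black) * prize_white

# Urn A: composition known, so the probability of black is exactly 0.5
urn_A = expected_value(0.5)                                  # 12.5

# Urn B: composition unknown; treat every ratio from 0 to 100 black balls as permissible
candidates = [expected_value(k / 100) for k in range(101)]
urn_B_worst, urn_B_best = min(candidates), max(candidates)   # 0.0 and 25.0

print(urn_A, (urn_B_worst, urn_B_best))
# a worst-case evaluator's buying price for the Urn B ticket is pulled toward 0,
# reproducing the lower price commonly offered for Urn B without calling it an error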
Mark J. Schervish, Teddy Seidenfeld and Joseph B. Kadane question in their paper another central tenet of the standard theories of decision making: the assumption that utility has to be state-independent. They show via an ingenious example that the uniqueness of probability in standard representations is relative to the choice of what counts as a constant outcome. Moreover, they prove an important result showing how to elicit a unique state-dependent utility. The result does not assume that there are prizes with constant value; instead, it introduces a new kind of hypothetical act in which both the prize and the state of nature are determined by an auxiliary experiment.
Our final paradox is the one proposed by the physicist William Newcomb.
Consider an opaque box and a transparent box. An agent may choose one or the
other taking into account the following: The transparent box contains one thousand
dollars that the agent plainly sees. The opaque box contains either nothing or one
million dollars, depending on a prediction already made. The prediction was about
the agent’s choice. If the prediction was that the agent will take both boxes, then the
opaque box is empty. On the other hand, if the prediction was that the agent will take
just the opaque box, then the opaque box contains a million dollars. The prediction
is reliable. The agent knows all these features of his decision problem. So, we can depict the agent's options as follows:
                               Prediction: one box only     Prediction: both boxes
Take only the opaque box       $1,000,000                   $0
Take both boxes                $1,001,000                   $1,000
A classic and still rather useful presentation of the received view in decision theory is Savage's influential and seminal book: The Foundations of Statistics, Dover Publications, 2nd revised edition (June 1, 1972). A slightly more accessible but pretty thorough textbook presentation of Savage's account and beyond is the monograph by David Kreps: Notes on the Theory of Choice, Westview Press (May 12, 1988).
The classical essay by Daniel Ellsberg introducing his now famous paradox continues to be a
very important source in this area: “Risk, Ambiguity and the Savage Axioms,” Quarterly Journal
of Economics, 75: 643-669, 1961. Isaac Levi presented a unified normative view of both Allais and
Ellsberg in: “The Paradoxes of Allais and Ellsberg,” Economics and Philosophy, 2: 23-53, 1986.
This solution abandons ordering rather than independence, unlike the solution proposed by Ellsberg himself. Solutions abandoning independence have in general been more popular. Some of the
classical papers in this tradition appear in the bibliography of the paper by Gilboa and Marinacci
reprinted here. Many of the responses to Allais have been descriptive rather than normative.
Prospect theory is a classical type of response along these lines. We reprint here an article that
intends to present a unified descriptive response to both Allais and Ellsberg. The reader can find
an excellent, thorough and mathematically mature presentation of the contemporary state of the
art in Prospect theory in a recent book published by Peter Wakker: Prospect Theory for Risk and
Ambiguity, Cambridge University Press, Cambridge, 2011.
The debate about the foundations of causal decision theory is in a way still open. An excellent
presentation of causal decision theory can be found in an important book by Jim Joyce: The
Foundations of Causal Decision Theory, Cambridge Studies in Probability, Induction and Decision
Theory, Cambridge, 2008. Joyce has also written an interesting piece answering challenges to
causal decision theory: “Regret and Instability in Causal Decision Theory,” forthcoming in the
second volume of a special issue of Synthese devoted to the foundations of the decision sciences
(eds.) Horacio Arlo-Costa and Jeffrey Helzner. This special issue contains as well an essay by
Wolfgang Spohn that intends to articulate the main ideas of causal decision theory by appealing
to techniques used in Bayesian networks: “Reversing 30 Years of Discussion: Why Causal
Decision Theorists should be One-Box." This is a promising line of investigation that has also been considered, in a preliminary way, in an insightful article by Christopher Meek and Clark Glymour:
“Conditioning and Intervening”, British Journal for the Philosophy of Science 45, 1001-1021,
1994. With regard to issues related to actual causation a useful collection is the book: Causation
and Counterfactuals, edited by J. Collins, N. Hall and L.A. Paul, MIT Press, 2004.
Finally a paper by Joseph Halpern and Judea Pearl offers a definition of actual causes
using structural equations to model counterfactuals, Halpern, J. Y. and Pearl, J. (2005) “Causes
and explanations: a structural-model approach. Part I: Causes”, British Journal for Philosophy
of Science 56:4, 843-887. This paper articulates ideas about causation based on recent work
on Bayesian networks and related formalisms. Current work in this area seems to point to a unification of causal decision theory and an account of causation based on Bayesian networks.
Chapter 19
Allais’s Paradox
Leonard Savage
Introspection about certain hypothetical decision situations suggests that the sure-
thing principle and, with it, the theory of utility are normatively unsatisfactory.
Consider an example based on two decision situations each involving two gambles.1
Situation 1. Choose between
Gamble 1. $500,000 with probability 1; and
Gamble 2. $2,500,000 with probability 0.1,
$500,000 with probability 0.89, status quo with probability 0.01.
Situation 2. Choose between
Gamble 3. $500,000 with probability 0.11, status quo with probability 0.89; and
Gamble 4. $2,500,000 with probability 0.1, status quo with probability 0.9.
Many people prefer Gamble 1 to Gamble 2, because, speaking qualitatively, they
do not find the chance of winning a very large fortune in place of receiving
a large fortune outright adequate compensation for even a small risk of being
left in the status quo. Many of the same people prefer Gamble 4 to Gamble 3;
because, speaking qualitatively, the chance of winning is nearly the same in both
gambles, so the one with the much larger prize seems preferable. But the intuitively
acceptable pair of preferences, Gamble 1 preferred to Gamble 2 and Gamble 4 to
Gamble 3, is not compatible with the utility concept or, equivalently, the sure-thing
principle. Indeed that pair of preferences implies the following inequalities for any
hypothetical utility function.
U($500,000) > 0.1U($2,500,000) + 0.89U($500,000) + 0.01U($0),
0.1U($2,500,000) + 0.9U($0) > 0.11U($500,000) + 0.89U($0);          (3)
and these are obviously incompatible.
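Spelling the incompatibility out: subtracting 0.89U($500,000) from the first inequality and 0.89U($0) from the second leaves
0.11U($500,000) > 0.1U($2,500,000) + 0.01U($0) and
0.1U($2,500,000) + 0.01U($0) > 0.11U($500,000),
so the same quantity would have to be both strictly greater and strictly smaller than 0.11U($500,000); no utility function can satisfy both.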
Examples2 like the one cited do have a strong intuitive appeal; even if you do
not personally feel a tendency to prefer Gamble 1 to Gamble 2 and simultaneously
Gamble 4 to Gamble 3, I think that a few trials with other prizes and probabilities
will provide you with an example appropriate to yourself.
If, after thorough deliberation, anyone maintains a pair of distinct preferences
that are in conflict with the sure-thing principle, he must abandon, or modify, the
principle; for that kind of discrepancy seems intolerable in a normative theory.
Analogous circumstances forced D. Bernoulli to abandon the theory of mathe-
matical expectation for that of utility (Bernoulli 1738). In general, a person who
has tentatively accepted a normative theory must conscientiously study situations
in which the theory seems to lead him astray; he must decide for each by
reflection—deduction will typically be of little relevance—whether to retain his
initial impression of the situation or to accept the implications of the theory for it.
To illustrate, let me record my own reactions to the example with which this
heading was introduced. When the two situations were first presented, I immediately
expressed preference for Gamble 1 as opposed to Gamble 2 and for Gamble 4 as
opposed to Gamble 3, and I still feel an intuitive attraction to those preferences.
But I have since accepted the following way of looking at the two situations, which
amounts to repeated use of the sure-thing principle.
One way in which Gambles 1–4 could be realized is by a lottery with a
hundred numbered tickets and with prizes according to the schedule shown in
Table 19.1.
2 Allais has announced (but not yet published) an empirical investigation of the responses of prudent, educated people to such examples (Allais 1953).
19 Allais’s Paradox 359
Now, if one of the tickets numbered from 12 through 100 is drawn, it will not
matter, in either situation, which gamble I choose. I therefore focus on the possibility
that one of the tickets numbered from 1 through 11 will be drawn, in which
case Situations 1 and 2 are exactly parallel. The subsidiary decision depends in
both situations on whether I would sell an outright gift of $500,000 for a 10-to-1
chance to win $2,500,000—a conclusion that I think has a claim to universality, or
objectivity. Finally, consulting my purely personal taste, I find that I would prefer the
gift of $500,000 and, accordingly, that I prefer Gamble 1 to Gamble 2 and (contrary
to my initial reaction) Gamble 3 to Gamble 4.
It seems to me that in reversing my preference between Gambles 3 and 4 I have
corrected an error. There is, of course, an important sense in which preferences,
being entirely subjective, cannot be in error; but in a different, more subtle sense they
can be. Let me illustrate by a simple example containing no reference to uncertainty.
A man buying a car for $2,134.56 is tempted to order it with a radio installed, which
will bring the total price to $2,228.41, feeling that the difference is trifling. But,
when he reflects that, if he already had the car, he certainly would not spend $93.85
for a radio for it, he realizes that he has made an error.
One thing that should be mentioned before this chapter is closed is that the law
of diminishing marginal utility plays no fundamental role in the von Neumann-
Morgenstern theory of utility, viewed either empirically or normatively. Therefore
the possibility is left open that utility as a function of wealth may not be concave,
at least in some intervals of wealth. Some economic-theoretical consequences of
recognition of the possibility of non-concave segments of the utility function have
been worked out by Friedman and myself (1948), and by Friedman alone (1953).
The work of Friedman and myself on this point is criticized by Markowitz (1952).3
References
Allais, M. (1953). Le comportement de l’homme rationnel devant le risque: Critique des postulats
et axioms de l’école Americaine. Econometrica, 21, 503–546.
Archibald, G. C. (1959). Utility, risk, and linearity. Journal of Political Economy, 67, 437–450.
Bernoulli, D. (1738). Specimen theoriae novae de mensura sortis. Commentarii academiae
scientiarum imperialis Petropolitanae (for 1730 and 1731), 5, 175–192.
Centre National de la Recherche Scientifique. (1954). Fondements et applications de la théorie du risque en économétrie. Paris: Centre National de la Recherche Scientifique. (Report of an international econometric colloquium on risk, in which there was much discussion of utility, held in Paris, May 12–17, 1952.)
Friedman, M. (1953). Choice, chance, and personal distribution of income. Journal of Political
Economy, 61, 277–290.
Friedman, M., & Savage, L. J. (1948). The utility analysis of choices involving risk. Journal of
Political Economy, 56, 279–304, Reprinted, with a correction, in Stigler and Boulding (1952).
3 See also Archibald (1959) and Hakansson (1970).
Chapter 20
Decision Theory Without "Independence" or Without "Ordering"
Teddy Seidenfeld
Introduction
For ease of exposition, let us adopt an axiomatization similar to the von Neumann and Morgenstern (1947) theory, as condensed by Jensen (1967). Let R be a set of β-many rewards (or payoffs), R = (rα : α < β). In the spirit of Debreu's (1959, chapter 4) presentation, we can think of R as including (infinitely divisible) monetary rewards. A (simple) lottery over R is a probability measure P on R with the added requirement that P(X) = 1 for some finite subset X of rewards.
Lotteries are individuated according to the following (0th) reduction postulate: Let L1, L2 be two lotteries with probabilities P1, P2 and let Rn = (r1, …, rn) be the finite set of the union of the payoffs under these two lotteries. A convex combination, αL1 + (1 − α)L2 (0 ≤ α ≤ 1), of the two lotteries is again a lottery with probability measure αP1 + (1 − α)P2 over Rn. Thus, the set of lotteries is a mixture set M(R) in the sense of Herstein and Milnor (1953).
Three postulates comprise expected utility theory:
(1) An ordering requirement: preference, ≼, a relation over M × M, is a weak order. That is, ≼ is reflexive, transitive, and all pairs of lotteries are comparable under ≼. (Strict preference, <, and indifference, ≈, are defined relations.)
(2) An Archimedean requirement: If L1 < L2 and L2 < L3, there is a nontrivial convex combination of L1 and L3 strictly preferred (and another combination strictly dispreferred) to L2. That is, there exist 0 < α, β < 1 such that L2 < αL1 + (1 − α)L3 and βL1 + (1 − β)L3 < L2.
(*) Suppose, in addition, that each lottery L has a sure-dollar equivalent, i.e., for some amount $x, L ≈ L$x, where L$x is a degenerate lottery having only one prize, $x. Then principle (2) deserves its title for, with (*) and the added stipulation that more is (strictly) better when it comes to money, (1) and (2) entail a real-valued utility representation for ≼ (continuous in $).1 To simplify still further, let us restrict attention to lotteries with none but monetary payoffs.
1 By assuming (*), we fix it that M/≈ (no longer assumed to be a mixture set) has a countable dense subset in the < order on M/≈, e.g., the rational-valued sure-dollar equivalents. Then our first two postulates ensure a real-valued utility u on M with the property that L1 < L2 if and only if u(L1) < u(L2). The point is that, without "independence," the usual Archimedean axiom is neither necessary nor sufficient for a real-valued utility. See Fishburn (1970, Section 3.1) for details, or Debreu (1959, Chapter 4), who discusses conditions for u to be continuous. Debreu uses a "continuity" postulate in place of (2) that, in our setting, requires that if the sequence [Li] converges (in distribution) to the lottery L, and L < Lk, then all but finitely many of the Li < Lk. If we extend ≼ to general distributions over R, Debreu's continuity postulate entails countably additive probability. See Seidenfeld and Schervish (1983) for some discussion of the decision-theoretic features of finitely additive probability.
(3) The "independence" principle: For all Li, Lj, and Lk, and for all α (0 < α ≤ 1), Li ≼ Lj if and only if αLi + (1 − α)Lk ≼ αLj + (1 − α)Lk.
Let us examine these postulates for the special case of lotteries on three rewards: R = (r1 < r2 < r3), where the reward ri is identified with the degenerate lottery having point-mass P(ri) = 1 (i = 1, 2, 3). Following the excellent presentation by Machina (1982), we arrive at a simple geometric account of what is permitted by expected-utility theory. Figure 20.1 depicts the consequences of postulates (1)–(3).
[Fig. 20.1: the simplex of lotteries over (r1, r2, r3), with P(r1) on the horizontal axis and P(r3) on the vertical axis (P(r2) = 0 along the hypotenuse); parallel indifference lines through Li and Lj, with preference increasing toward the upper left. Fig. 20.2: the same simplex with a single lottery Li marked.]
[Figure: the same lottery simplex (P(r1) horizontal, P(r3) vertical, P(r2) = 0 on the hypotenuse) with four lotteries L1, L2, L3, and L4 marked; see the discussion of Allais's question below.]
According to the postulates (1)–(3), indifference curves (≈) over lotteries are parallel, straight lines of (finite) positive slope. Li is (strictly) preferred to Lj, Lj < Li, just in case the indifference curve for Li is to the left of the indifference curve for Lj.
Consider a lottery Li , as in Fig. 20.2. Stochastic dominance provides a (strict)
preference for lotteries to the NW of Li , whereas Li is (strictly) preferred to lotteries
to its SE.2 Thus, the indifference lines must have positive slope. Hence, in this
setting with lotteries over three rewards, expected-utility theory permits one degree
of freedom for preferences, corresponding to the choice of a slope for the lines
of indifference.
In a collaborative effort, Seidenfeld et al. (1987, Section 1), we apply this analysis to the selection of "sizes" (α-levels) for statistical tests of a simple null hypothesis against a simple rival hypothesis. The conclusion we derive is the surprising "incoherence" (conflict with expected-utility theory) of the familiar convention to choose statistical tests with a size, e.g., α = .01 or α = .05, independent of the
sample size. This reasoning generalizes that of Lindley (1972, p. 14, where he
gives his argument for the special case of “0–1” losses). In a purely “inferential”
(nondecision-theoretic) Bayesian treatment for the testing of a simple hypothesis
versus a composite alternative, Jeffreys (1971, p. 248) argues for the same caveat
about constant α-levels.
2 Recall, lottery L2 (first order) stochastically dominates lottery L1 if L2 can be obtained from L1 by shifting probability mass from less to more desirable payoffs. More precisely, L2 stochastically dominates L1 if, as a function of increasingly preferred rewards, the cumulative probability distribution for L2 is everywhere less than (or equal to) the cumulative probability distribution for L1. Of course, whenever L2 stochastically dominates L1, there is a scheme for payoffs, in accord with the two probability measures, where L2 weakly dominates L1.
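A compact Python rendering of the cumulative-distribution test just described (the two lotteries are invented; rewards are listed from least to most preferred):

def cdf(probs):
    # cumulative distribution over rewards ordered from worst to best
    out, total = [], 0.0
    for p in probs:
        total += p
        out.append(total)
    return out

def stochastically_dominates(L2, L1):
    # L2 dominates L1 iff its cdf is everywhere <= that of L1, and strictly < somewhere
    c2, c1 = cdf(L2), cdf(L1)
    return all(a <= b for a, b in zip(c2, c1)) and any(a < b for a, b in zip(c2, c1))

L1 = [0.8, 0.2, 0.0]    # probabilities of (r1, r2, r3)
L2 = [0.8, 0.1, 0.1]    # mass shifted from r2 up to r3
print(stochastically_dominates(L2, L1))    # True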
3.1
Allais (1953) poses the following question. For the three rewards, r1 = $0, r2 = $1 million, and r3 = $5 million (so r1 < r2 < r3), what are your preferences, in the choice between lotteries L1 and L2, and in the choice between lotteries L3 and L4, where:
L1: $1 million with probability 1;
L2: $0 with probability .01, $1 million with probability .89, $5 million with probability .10;
L3: $0 with probability .90, $5 million with probability .10;
L4: $0 with probability .89, $1 million with probability .11?
The common response, choose L1 over L2 , and L3 over L4 , violates EU theory (under
the assumption that the choices reveal <). This is made evident by an application,
Fig. 20.2, of (Machina’s) figure 1.
The lines connecting the pairs of lotteries in the two choices are parallel.
Thus, regardless of the slope of the parallel, straight-line indifference curves (from
Fig. 20.1) imposed on lotteries over the three rewards, either L2 and L3 are preferred
to their rivals, or else L1 and L4 are preferred. EU precludes the common answer to
Allais’ question.
3.2
Ellsberg’s (1961) paradox of preference for lotteries with known risk (2 M) over
uncertain lotteries (62 M), bearing unknown risk, does not fit the simple mixture-
set model, M. We can accommodate Ellsberg-styled problems by generalizing our
concept of acts so that an act is a function f from states, a (finite, exhaustive)
partition, to distributions on the reward set R. These more general acts are called
"horse lotteries" by Anscombe and Aumann (1963). Denote by M′ (⊇ M) the generalized mixture set for the class of horse lotteries.3 Then lotteries of known risk belong to this enlarged (mixture) set M′ as a special case: they are the "constant" acts. That is, the acts of known risk are those for which f is constant: a single, determinate probability measure.
Let us see how an Ellsberg-styled paradoxical choice violates postulate 3, supposing (1) and (2) obtain, when the postulates are applied to M′. Imagine I
3 To define the generalized mixture set M′, it suffices to define the operation of convex combination of two (generalized) lotteries. This is done exactly as in Anscombe and Aumann's (1963) treatment of "horse lotteries." Horse lotteries, the generalized postulates (1)–(3) for horse lotteries, and, with the addition of two minor assumptions (precluding a preference-interaction between payoffs and states), the subjective expected-utility theory that results, are discussed by Fishburn (1970, Chapter 13) and briefly in the section "Sequential coherence of Levi's decision theory" here.
have placed $10 in one of two pockets, which are otherwise empty. Consider the
following three lotteries:
Lleft – take the contents of my left pocket,
Lright – take the contents of my right pocket, and
Lmix – take the contents of my left pocket if a “fair” coin lands tails up, and take the
contents of my right pocket if the fair coin lands head up.
Lotteries Lleft and Lright are uncertain prospects. Suppose you are indifferent (≈) between these two, which you evaluate as having a sure-dollar equivalent of $2.50.
However, the third option, Lmix , is (under the “reduction” postulate) a lottery of
known risk. That is, Lmix is a lottery with an equal (.5; .5) probability distribution on
the two payoffs ($0, $10). In the spirit of the Ellsberg paradoxical choice, suppose
you evaluate the fair gamble on these two payoffs as having, say, a sure-dollar
equivalent of $4.00. You (strictly) prefer the lottery of known risk, Lmix , to either of
the two uncertain lotteries. Finally, as the coin flip gives you no relevant information
about the location of the $10, your conditional preferences over the two uncertain
lotteries (and their $2.50 equivalent) are unaffected by the outcome of the coin flip.
Then, as Lmix is (under reduction) equivalent to the (α = .5) convex combination of
Lleft and Lright , preference for “risk” over “uncertainty” violates the independence
postulate 3, assuming (1) and (2) hold.4
In fact, given (1) and (2), this version of the Ellsberg paradox conflicts with a
principle (4), (strictly) weaker than principle (3).
(4) Mixture dominance (“betweenness”): Of lotteries L1 and L2 , if each is (weakly
or strictly) preferred (or dispreferred) to a lottery L3 ; so, too, each convex
combination of L1 and L2 is (weakly or strictly) preferred (or dispreferred) to L3 .
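The violation can be checked with nothing more than the sure-dollar equivalents quoted above. The following minimal sketch spells out the arithmetic for both postulate 3 and principle (4); the $3.00 benchmark prize is an illustrative choice of mine, not from the text.

```python
# Sure-dollar equivalents reported in the example above.
ce = {"Lleft": 2.50, "Lright": 2.50, "Lmix": 4.00}

# Under postulates (1)-(3) the 50-50 mixture of two indifferent lotteries is
# indifferent to each of them, so its certainty equivalent would have to be $2.50.
print("independence respected?", ce["Lmix"] == ce["Lleft"] == ce["Lright"])   # False

# Mixture dominance (4): both components are strictly dispreferred to, say, a
# sure $3.00, so the mixture should be as well -- yet it is valued at $4.00.
benchmark = 3.00   # illustrative prize strictly preferred to Lleft and Lright
print("mixture dominance respected?", ce["Lmix"] < benchmark)                 # False
```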
4
These preferences are in conflict with Savage’s (1954) “sure-thing” postulate P2. P2 is inconsis-
tent with the following two preferences:
(i) Lright ≺ Lmix.
(ii) Lleft ∼ Lright, given the coin lands heads up.
Consider the four-event partition generated by whether the coin lands heads (H) or tails (T), and
whether the $10 is in the left (L) or right (R) pocket. Then, by (i), the first row (below) is preferred
to the second. Savage’s theory uses “called-off” acts to capture conditional preference. Thus, by
(ii), the agent is indifferent between the third and fourth rows.
             HL     HR     TL     TR
Lmix         $10    $0     $0     $10
Lright       $0     $10    $0     $10
Lleft | H    $10    $0     $0     $0
Lright | H   $0     $10    $0     $0
For regular lotteries, the ranking of a lottery by prospect theory uses the formula:

∑_i π(P(r_i)) ν(r_i),

where ν is a value-function for rewards (akin to the utility u), and π is some
monotone-increasing function with π(0) = 0 and π(1) = 1. Again, let us consider
(regular) lotteries on three rewards r1 < r2 < r3, where we may take r1 as status quo.
If (and only if) π is linear do we have agreement between prospect theory and EU
(for then π(x) = x with the scalar constant absorbed into the utility u, defined up to
a positive affine transformation).
5
Let u be a utility on payoffs and assume u is positive. Denote by Eu(L1) the expected utility of
lottery L1 under utility function u. Denote by L1⁻¹ the lottery that has payoffs with (multiplicative)
inverse utility to L1. Samuelson's (1950) "Ysidro" ranking, ≾_Y, on lotteries is given by the function

Y(L1) = [Eu(L1) / Eu(L1⁻¹)]^(1/2).

Not only does ≾_Y satisfy the ordering, Archimedean, and mixture dominance postulates while
failing independence, but in addition ≾_Y respects stochastic dominance!
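A small numeric sketch exhibits one such failure of independence for the Ysidro score as reconstructed above. The utility values below are illustrative choices of mine (not Samuelson's or the text's): the ranking of two lotteries reverses after both are mixed, half-and-half, with a common third lottery.

```python
# Numeric sketch (utilities are illustrative): the Ysidro score
# Y(L) = sqrt(E_u[L] / E_u[L^(-1)]) can reverse a ranking after mixing both
# lotteries with a common third lottery, i.e. it violates independence.
from math import sqrt

def ysidro(lottery):                       # lottery: {utility value: probability}
    e_u   = sum(p * u for u, p in lottery.items())
    e_inv = sum(p / u for u, p in lottery.items())    # expected inverse utility
    return sqrt(e_u / e_inv)

L1 = {3.0: 1.0}                     # sure utility 3:  Y(L1) = 3
L2 = {1.0: 0.5, 16.0: 0.5}          # Y(L2) = sqrt(8.5 / 0.53125) = 4 > 3
L3 = {1000.0: 1.0}                  # common mixing lottery

def mix(p, q, a=0.5):               # the probability mixture a*p + (1-a)*q
    out = {}
    for u, pr in p.items(): out[u] = out.get(u, 0.0) + a * pr
    for u, pr in q.items(): out[u] = out.get(u, 0.0) + (1 - a) * pr
    return out

print(ysidro(L1), ysidro(L2))                       # 3.0 < 4.0 : L2 ranked above L1
print(ysidro(mix(L1, L3)), ysidro(mix(L2, L3)))     # ~54.8 > ~43.5 : ranking reversed
```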
6
The result is elementary and has been noted by many, including Kahneman and Tversky (1979,
pp. 283–284). Suppose π is not linear, so that π(p + q) > π(p) + π(q). Then, by letting the value
ν(r2) approach the value ν(r3), the agent is required (strictly) to prefer L1 – P1(r1) = 1 − [p + q],
P1(r2) = p + q, and P1(r3) = 0 – over L2 – P2(r1) = P1(r1), P2(r2) = p, P2(r3) = q – even though L2
stochastically dominates L1. The argument for the other case, π(p + q) < π(p) + π(q), is similar.
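The following minimal sketch reproduces the dominance violation described in this note; the weighting function and values below are mine, chosen only to satisfy π(p + q) > π(p) + π(q) and to place ν(r2) close to ν(r3).

```python
# Sketch only: pi and nu below are illustrative, not Kahneman-Tversky's.
def pt_value(probs, values, pi):
    # probs/values listed over rewards r1 < r2 < r3; r1 is the status quo (value 0)
    return sum(pi(p) * v for p, v in zip(probs, values))

pi = lambda p: p ** 2            # monotone, pi(0)=0, pi(1)=1, and pi(p+q) > pi(p)+pi(q)
nu = [0.0, 0.99, 1.00]           # nu(r2) close to nu(r3)

p, q = 0.3, 0.3
L1 = [1 - (p + q), p + q, 0.0]   # all of the mass p+q sits on r2
L2 = [1 - (p + q), p, q]         # shifts mass q up to r3, so L2 dominates L1

print(pt_value(L1, nu, pi), pt_value(L2, nu, pi))   # 0.3564 > 0.1791 : L1 ranked higher
```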
offered for assessing the performance of a choice rule in sequential decisions when
basic values are unchanging. Clause (ii) is not cogent, I would argue, when basic
values are subject to revision over time. Then there may be a current preference
between two rewards that are (to be) judged indifferent relative to the future,
changed values. Thus there is no reason to demand that substitution of “future”
indifferents preserves the inadmissibility of what is, by current values, a dominated
option.
Of course, the agent’s knowledge of events, chance occurrences, and preceding
choices inevitably changes in the course of a sequential decision. In fact, these
changes in evidence are what makes valuable adaptive experimental designs.
Respect for stochastic dominance provides a safeguard that choice over lotter-
ies attends to sure-gains in payoffs. Single stage (nonsequential) decisions are
thereby protected from violations of weak dominance over rewards. That is not
the case, however, when we attend to sequential decisions. Specifically, coher-
ence in nonsequential decisions, in choices over lotteries, does not entail the
sequential version of coherence in choices over plans. This is illustrated by an
example.
Consider what happens when mixture dominance (4) fails. Example: Let lotteries
L1 and L2 be indifferent with a sure-dollar equivalent of $5.00. Suppose, contrary
to (4), that an equal (α = .5) convex combination of them is strictly preferred, with
a sure-dollar equivalent of, e.g., $6.00. Denote this by L3 ≡ .5 L1 + .5 L2 ∼ $6.00.
Then, by continuity of preference for monetary payoffs, there is some fee, $ε, that
can be attached to the payoffs of L1 and L2 (resulting in the lotteries denoted by
"L1 − ε" and "L2 − ε") satisfying
Also, we can find some dollar prize strictly dispreferred to both of the
$ε modifications of L1 and L2, e.g., suppose

$4.00 ≺ (L1 − ε), (L2 − ε) ≺ L1 ∼ L2 ∼ $5.00 ≺ L4 ∼ $5.75 ≺ L3 ∼ $6.00
7
Hammond’s (1976, Section 3.3) felicitous phrase is that the agent uses “sophisticated” versus
“myopic” choice.
8
McClennen (1986, 1988, forthcoming) sketches a program of “resolute” choice to govern
sequential decisions when independence fails. I am not very sure how resolute choice works. Part
of my ignorance stems from my inability to find a satisfactory answer to several questions.
As I understand McClennen’s notion of resolute choice, the agent’s preferences for basic
lotteries change across nodes in a sequential decision tree. (Then, the premises of the argument
in Section 4 do not obtain.) In terms of the problem depicted in Fig. 4, at node A the agent resolves
that he will choose L1 at (a) of node B, and by so resolving increases its value at (a) of node B
above the $5.50 alternative.
There are several difficulties I find with this proposal. I suspect that the new value of L1 at (a)
will be fixed at $6.00, and likewise for L2 at (b). (The details of resolute choice are lacking on
this point, but this suspicion is based on the observation that a minor variation in the sequential
incoherence argument applies unless these two lotteries change their value from node A to node B
as indicated. Just modify the construction so that the rejected cash alternative at B is $6.00 − δ.)
Then the assessed value of $6.00 for L3 (a mixture of the lotteries L1 and L2, now valued at $6.00
each) is in accord with postulate (2). Such resolutions mandate that changes in preferences agree,
sequentially, with the independence postulate. In terms of sequential decisions, is it not the case
that resolute choice requires changes in values to agree with the independence postulate?
A second problem with resolute choice directs attention at the reasonableness of these
mandatory changes in values. For example, consider the Ellsberg-styled choice problem described
in Section 3.2. Cast in a sequential form, under this interpretation of resolute choice, if Lmix is most
preferred, then the agent is required to increase the value for the option “take the contents of the
right pocket,” given that the coin lands heads up, over the value it has prior to the coin flip.
[Decision-tree figure: initial choice node A; a fair coin ("heads"/"tails", α = .5) leads to choice nodes (a)–(d), which offer L1 or L2 (or the fee-discounted L1 − ε and L2 − ε) against the sure prizes $5.50, $5.75, and $4.00.]
But the coin flip is irrelevant to a judgment of where the money is. However uncertain the agent
is prior to the coin flip, is he not just as uncertain afterwards? Concern with uncertainty in the
location of the money is the alleged justification for a failure of independence when comparing the
three terminal options: Lleft , Lright , Lmix , and declaring Lmix (strictly) better than the other two. What
justifies the preference shift, given the outcome of the coin flip, when Lright becomes equivalued
with an even-odds lottery over $10 and $0 despite the same state of uncertainty about the location
of the money before and after the coin flip?
Third, by what standards can the agent reassess the merits of a resolution made at an earlier
time? In other words, how is the agent to determine whether or not to ignore an earlier judgment:
the judgment to commit himself to a resolute future choice and thereby to change his future values.
Once the future has arrived, why not instead frame the decision with the current choice node as the
initial node without altering basic values? Unless this issue is addressed, the question of how to
ratify a resolution is left unanswered, and the problem remains of how to make sense of the earlier
resolution once the moment of choice is at hand.
[Decision-tree figure (fragment): a "heads" branch (α = .5) offering L1 against $5.50, with a fee-discounted lottery on the "tails" branch.]
sequential problem each of the two plans is represented by a set of four lotteries. In
normal form, the decision is among the eight lotteries:
Of these L3 is most preferred, say. In normal form, then, plan 1 is chosen and is
valued at $6.00 (L3 ). However, the argument offered in this section (establishing
sequential incoherence) does not presume the equivalence of extensive and normal
forms.9
In the sequential problem, at node A, the agent knows L3 is not available to him
under plan 1. This is because he knows that (at nodes B) the dollar prize ($5.50)
is preferred to each of the lotteries L1 and L2 . Under these preferences, the choice
of plan 1 at (A) in the hope that L1 will be chosen if heads and L2 if tails is a pipe
dream – mere wishful thinking that is brought up short by Dynamic Feasibility.10
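A bare-bones tabulation of the example's certainty equivalents shows the reversal between the evaluation at node A and the comparisons at nodes B; the $4.60 figure below is an illustrative stand-in for the value of the fee-discounted lotteries (somewhere strictly between $4.00 and $5.00), the other numbers are from the example.

```python
# Certainty equivalents from the example; 4.60 is an illustrative stand-in for
# the value of L1 - eps and L2 - eps.
ce = {"L1": 5.00, "L2": 5.00, "L1-eps": 4.60, "L2-eps": 4.60}

plan1_at_A = 5.50   # at nodes B the agent drops L_i in favor of the sure $5.50
plan2_at_A = 5.75   # the fee-discounted mixture L4 is valued at $5.75 at node A

print(plan2_at_A > plan1_at_A)                              # True: plan 2 chosen at A
print(all(5.50 > ce[k] for k in ("L1-eps", "L2-eps")))      # True: yet at every node B
# the option retained under plan 2 is strictly worse than the $5.50 available
# under plan 1 -- the sequential incoherence described above.
```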
9
By contrast, Raiffa’s (1968, pp. 83–85) classic objection to the failure of independence in the
Allais paradox depends upon a reduction of extensive to normal forms. Also, in his interesting
discussion, Hammond (1984) requires the equivalence of extensive and normal forms through
his postulate of “consequentialism” in decision trees. These authors defend a strict expected-
utility theory, in which the equivalence obtains. Likewise, LaValle and Wapman (1986) argue for
the independence postulate with the aid of the assumption that extensive and normal forms are
equivalent.
The analysis offered here does not presume this equivalence, nor does avoidance of sequential
incoherence entail this equivalence, since, e.g., it is not satisfied in Levi’s theory either – though
his theory avoids such incoherence. Hence, for the purpose of separating decision theories without
independence from those without ordering, it is critical to avoid equating extensive and normal
forms of decision problems.
10
One may propose that, by force of will, an agent can introduce new terminal options at an
initial choice node, corresponding to the “normal” form version of a sequential decision given
in “extensive” form. Thus, for the problem depicted in Fig. 20.3, the assumption is that the agent
can create the terminal option L3 at node A by opting for plan 1 at A and then choosing L1 at (a)
and L2 at (b).
Whatever the merit of this proposal, it does not apply to the sequential decisions discussed
here, since, by stipulation, the agent cannot avoid reconsideration at nodes B. There may be some
problems in which agents can create new terminal options at will, but that is not a luxury we freely
enjoy. Sometimes we have desirable terminal options and sometimes we can only plan. (See Levi’s
[1980, Chapter 17] interesting account of “using data as input” for more on this subject.)
Proof The proof is given in two cases. (The argument for the second case uses the
full assumption of stochastic dominance rather than the weaker assumption [used in
Case 20.1] that preference respects simple dominance in dollar payoffs.)
Case 20.1 Let L1 ∼ L2, yet for some L3 and α, αL2 + (1 − α)L3 ≺ αL1 + (1 − α)L3.
Let $X ∼ αL2 + (1 − α)L3, $Z ∼ αL1 + (1 − α)L3, and $U ∼ L3, with X < Z. By our
assumptions of a weak order for preference, continuity in dollar payoffs, and the
strict preference for more (over less) money, there is some $2ε fee and amount $Y
for which:
If we consider the sequential decision problem whose “tree” is depicted in Fig. 20.5
for Case 20.1, we discover by the same reasoning we used in the example above:
[Fig. 20.5: decision tree for Case 20.1, with initial node A. Plan 1 is valued at $X and plan 2 at $Y; chance branches with probabilities α and 1 − α lead to the fee-discounted lotteries L1 − 2ε, L3 − 2ε, and a discounted L2, and to the sure prizes $(U − δ) and $(U − 2δ).]
At node A, plan 1 is valued at $X, whereas plan 2 is valued at $Y. Thus, at node
A, plan 2 is the preferred choice.
But at nodes B, regardless of which “chance event” occurs, the favored option
under plan 1 is preferred to the favored option under plan 2. Thus, the preferences
leading to a failure of independence in Case 20.1 succumb to sequential incoher-
ence. An application of indifferences (from the ordering postulate 1) at nodes B leads
to an inconsistent evaluation, at (A), of the sequential plans 1 and 2.
Case 20.2 L1 ≺ L2, yet there are L3 and α > 0 with αL1 + (1 − α)L3 ∼ αL2 + (1 − α)L3.
Let $U ∼ L3 and $Z ∼ αL1 + (1 − α)L3. Then choose an $ε fee so that
L1 ≺ L2 − ε. Let $X ∼ α(L2 − ε) + (1 − α)L3. Then by stochastic dominance, X < Z.
Next, by continuity, choose a $δ fee to satisfy $Y ∼ α(L1 − δ) + (1 − α)(L3 − δ),
where X < Y < Z. Choose an integer n so that L2 − nε ≺ L1 − δ. Finally, find any $γ
fee and choose an integer m so that $(U − mγ) ≺ L3 − δ.
Consider a sequential decision problem whose “tree” is depicted in Fig. 20.6 for
Case 20.2. Once again we find an episode of sequential incoherence as:
[Fig. 20.6: decision tree for Case 20.2, with initial node A. Plan 1 is valued at $X and plan 2 at $Y; chance branches with probabilities α and 1 − α lead to L1 − δ, L2 − nε, L3, and L3 − δ, and to the sure prizes $(U − γ) and $(U − mγ).]
At (A), plan 1 is valued at $X, whereas plan 2 is valued at $Y. Thus, at node A plan
2 is the preferred choice.
At node B, regardless of which chance event occurs, the favored option under plan
1 is preferred to the favored option under plan 2. Thus, the preferences leading to a
failure of independence in Case 20.2 succumb to sequential incoherence.
Concluding Remark
Can familiar “Dutch Book” arguments (de Finetti 1974; Shimony 1955) be used
to duplicate the results obtained here? Do Dutch Book considerations establish
sequential incoherence when independence fails? I think they do not.
The book arguments require an assumption that the concatenation (conjunction)
of favorable or indifferent or unfavorable gambles is, again, a favorable or indifferent
or unfavorable gamble. That is, the book arguments presume payoffs with a simple,
additive, utility-like structure. The existence of such commodities does not follow
from a (subjective) expected-utility theory, like Savage’s. And rivals to EU, such as
Samuelson’s (1950) “Ysidro” ranking, can fail to satisfy this assumption though
they respect stochastic dominance in lotteries. Thus, in light of this assumption
about combinations of bets, the Dutch Book argument is not neutral to the dispute
over coherence of preference when independence fails. (Of course, that debate is
not what Dutch Book arguments are designed for.)11
This objection to the use of a book argument does not apply to the analysis
presented here. The argument for sequential incoherence is not predicated on the
Dutch Book premise about concatenations of favorable gambles. That assumption
is replaced by a weaker one, to wit: ≾ respects stochastic dominance in $-rewards.
There is no mystery why the weakening is possible. Here, we avoid the central
question addressed by the Dutch Book argument: When are betting odds subjective
probabilities? The book arguments pursue the representation of coherent belief as
probabilities, given a particular valuation for combinations of payoffs. Instead, the
spotlight here is placed on the notion of coherent sequential preference, given a
preference (a weak ordering) of lotteries with canonical probabilities.
11
See Frederic Schick’s “Dutch Bookies and Money Pumps” (1986) for discussion of the import
of this concatenation assumption in the Dutch Book argument. Its abuse in certain “intertemporal”
versions of Dutch Book is discussed in Levi (1987).
Summary
A detailed analysis of Levi’s Decision Theory (LDT), a theory without the ordering
postulate, is beyond the scope of this essay. Here, instead, I shall merely report
the central results that establish coherence of LDT in sequential choices over horse
lotteries, a setting where both values (utility/security) and beliefs (probability) may
be indeterminate. (I give proofs of these results in a technical report Seidenfeld
(1987)).
To begin with, consider the following choice-based generalizations of the
concepts: indifference, preference, and stochastic dominance. These generalizations
are intended to apply in the domain of horse-lottery options, regardless of whether
or not a decision rule induces a weak-ordering of acts.
The notational abbreviations I use are these. An option is denoted by oi and, since
acts are functions from states to outcomes, also by the function on states oi (s). Sets
of feasible options are denoted by O, and the admissible options (according to a
decision rule) from a feasible set O are denoted by the function C[O].
Call two options indifferent if and only if, whenever both are available, either
both are admissible or neither is.
Definition
12
Levi (1980, Section 5.6) offers a novel rule, here called rule′, for determining expectation-
inequalities when the partition of states is finite but when events may have subjective probability
0. The motive for this emendation is to extend the applicability of "called-off" bets (Shimony 1955)
to include a definition of conditional probability given an event of (unconditional) probability 0.
Also, it extends the range of cases where a weak-dominance relation determines a strict preference.
Given a probability/utility pair (P, U), maximizing expected utility (with rule′) induces a
weak order that satisfies the independence axiom, though the expectations may fail to admit a real-
valued representation, i.e., the "Archimedean" axiom is not then valid. Under rule′, given a pair
(P, U), expectations are represented by a lexicographic ordering of a vector-valued quantity.
sec2[o] = inf_{P,U} E_{P,U}[o] – security indexed by the least expected utility, also
called the "Γ-minimax" level for option o. (I avoid the minor details of defining
sec2 when events have probability 0 and expectations are determined by rule′, as
reported in note 12.)
Thus, an option is admissible in LDT just in case it is both E-admissible
and maximizes security among those that likewise are E-admissible. As a special
case, when security is vacuous or is indexed by sec2 , and when both P and U
are unit sets, C[O] is the subset of options that maximize subjective expected
utility: the strict Bayesian framework. Then admissibility satisfies the ordering and
independence postulates (and the Archimedean axiom, too, provided events have
positive probability or rule′ is not used).
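The two-step rule just described is easy to state operationally. The sketch below computes E-admissibility and then maximizes the security index sec2 among the E-admissible options; the states, rewards, and sets of probabilities and utilities are all invented for illustration, and no claim is made that this captures Levi's full machinery.

```python
# Minimal sketch of LDT admissibility: E-admissibility, then maximal security.
from itertools import product

states = ["s1", "s2"]
# options as state -> reward label; rewards are scored by each utility in U_set
options = {"o1": {"s1": "a", "s2": "c"},
           "o2": {"s1": "b", "s2": "b"},
           "o3": {"s1": "c", "s2": "a"}}

P_set = [{"s1": 0.2, "s2": 0.8}, {"s1": 0.8, "s2": 0.2}]                  # indeterminate belief
U_set = [{"a": 0.0, "b": 0.6, "c": 1.0}, {"a": 0.0, "b": 0.4, "c": 1.0}]  # indeterminate value

def exp_u(opt, P, U):
    return sum(P[s] * U[options[opt][s]] for s in states)

# E-admissible: best for at least one probability/utility pair
e_adm = {o for P, U in product(P_set, U_set)
           for o in [max(options, key=lambda o: exp_u(o, P, U))]}

# security index sec2: the least expected utility over all (P, U) pairs
sec2 = {o: min(exp_u(o, P, U) for P, U in product(P_set, U_set)) for o in options}

best_security = max(sec2[o] for o in e_adm)
admissible = [o for o in e_adm if sec2[o] == best_security]
print(e_adm, sec2, admissible)   # o2 has the best security but is not E-admissible
```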
The next few results, stated without proof, provide the core of the argument for
sequential coherence of LDT. Condition i of coherence in nonsequential decisions,
and more, is shown by the following.
Theorem 20.1 If o2 can be obtained from o1 by shifting distribution masses
from categorically less to more preferred rewards, then o1 ≺ o2, and thus o1 is
inadmissible whenever o2 is available.
The theorem follows directly from a useful lemma about ≺ and categorical
preference in LDT.
Lemma o1 ≺ o2 ⟺ ∀(P, U): E_{P,U}(o1) < E_{P,U}(o2). Thus, r2 is categorically
preferred to r1 ⟺ ∀U: U(r1) < U(r2).
Thus, both this dominance relation and categorical preference are strict partial orders,
being irreflexive and transitive. In addition, following obviously from its definition,
the dominance relation satisfies the independence axiom. [Note the term "categorical prefer-
ence" is borrowed from Levi (1986b, p. 91). The lemma provides the collateral for
this conceptual loan.]
Condition ii of coherence for LDT in nonsequential decisions is shown by a result
that two options are ∼-related exactly when they have the same security index and
are indiscernible by expectations:
Theorem 20.2 o1 ∼ o2 ⟺ sec[o1] = sec[o2] & ∀(P, U): E_{P,U}[o1] = E_{P,U}[o2].
Thus, ∼ is an equivalence relation and, in nonsequential decisions, admissibility is
preserved under substitution of indifferent options.
In order to extend the analysis to sequential decisions, the notion of
indifference is generalized to include conditional assessments, conditional upon the
occurrence of either “chance” or “event” outcomes. The next corollary is elementary
and indicates how conditional indifference applies when, for instance, choice
nodes follow chance nodes.
Corollary 20.1 ∼ is preserved under chance mixtures.
I have argued that decision theories that relax only the independence postulate
succumb to sequential incoherence. That is, such programs face the embarrassment
of choosing stochastically dominated options when, in simple two-stage sequential
decisions, dollar equivalents are substituted for their indifferent options at terminal
choice nodes. Moreover, the criticism does not presume an equivalence between
sequential decisions (in extensive form) and their normal form reductions; instead,
all decisions are subject to a principle of Dynamic Feasibility.
In section “Sequential coherence of Levi’s decision theory”, I generalize sequen-
tial coherence to choice rules that may not induce an ordering of options by
preference. Also, I outline reasons for the claim that Levi’s Decision Theory is
sequentially coherent (in a setting where both belief and value are subject to
indeterminacy). Since Levi’s theory is one that fails the ordering postulate, the
combined results establish a demarcation between these two strategies for relaxing
traditional (subjective) expected-utility theory. The difference is that only one of the
two approaches is sequentially coherent.
Acknowledgments I have benefitted from discussions with each of the following about the
problems addressed in this essay: W. Harper, J. Kadane, M. Machina, P. Maher, E. F. McClennen,
M. Schervish; and I am especially grateful to I. Levi.
Discussions
Editors’ Note
ideal of rational behavior, these failures may simply show that people often behave
irrationally. Yet if the gap between ideal and actual behavior is too wide, or if
behavior that, on the best analysis we can make, is rational is nonetheless inconsistent with
subjective expected-utility theory, then we may come to doubt some of the axioms
of the theory. Two main lines of revision have been suggested: either weakening
the “ordering” axiom that requires preferences to be complete or surrendering
the so-called independence principle. Although the issues are highly abstract and
somewhat technical, the stakes are high; subjective expected-utility theory is critical
to contemporary economic thought concerning rational conduct in public as well as
private affairs.
In the preceding article, “Decision Theory without ‘Independence’ or without
‘Ordering’: What Is the Difference?” Teddy Seidenfeld argued for the sacrifice
of ordering rather than independence by attempting to show that abandoning the
latter leads to a kind of sequential incoherence in decision making that will not
result from one specific proposal (Isaac Levi’s) for abandoning ordering. In their
comments in this section, Edward McClennen, who supports surrendering the
independence postulate rather than ordering, and Peter Hammond, who argues
against any weakening of subjective expected-utility theory, discuss Seidenfeld’s
argument from their quite different theoretical perspectives.
Economics and Philosophy, 4, 1988, 292–297. Printed in the United States of
America.
References
Allais, M. (1953). Le comportement de l’homme rationnel devant le risque: Critique des postulats
et axiomes de l’école americaine. Econometrica, 21, 503–546.
Allais, M. (1979). The so-called Allais Paradox and rational decisions under uncertainty. In
M. Allais & O. Hagen (Eds.), Expected utility hypotheses and the Allais Paradox (pp. 437–
681). Dordrecht: Reidel.
Anscombe, F. J., & Aumann, R. J. (1963). A definition of subjective probability. Annals of
Mathematical Statistics, 34, 199–205.
Bell, D. (1982). Regret in decision making under uncertainty. Operations Research, 30, 961–981.
Bell, D., & Raiffa, H. (1979). Decision regret: A component of risk aversion. Unpublished
manuscript, Harvard University.
Chew, S. H. (1981). A mixture set axiomatization of weighted utility theory (4th revision). Tucson:
Department of Economics, University of Arizona.
Chew, S. H. (1983). A generalization of the quasilinear mean with applications to the measurement
of income inequality and decision theory resolving the Allais Paradox. Tucson: Department of
Economics, University of Arizona.
Chew, S. H., & MacCrimmon, K. R. (1979). Alpha-Nu choice theory: A generalization of expected
utility theory (University of British Columbia Working Paper).
de Finetti, B. (1974). Theory of probability (2 vols.). New York: Wiley.
Debreu, G. (1959). Theory of value. New Haven: Yale University Press.
Drèze, J. H. (1985). Decision theory with moral hazard and state-dependent preferences (CORE
discussion paper #8545). Belgium: Center for Operations Research and Econometrics, Univer-
sité Catholique de Louvain.
Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics, 75,
643–669.
Fishburn, P. C. (1970). Utility theory for decision making. New York: Krieger Publishing Company.
Fishburn, P. C. (1981). An axiomatic characterization of skew-symmetric bilinear functionals, with
applications to utility theory. Economics Letters, 8, 311–313.
Fishburn, P. C. (1983). Nontransitive measurable utility. Journal of Mathematical Psychology, 26,
31–67.
Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society B, 14, 107–114.
Hammond, P. J. (1976). Changing tastes and coherent dynamic choice. Review of Economic
Studies, 43, 159–173.
Hammond, P. J. (1984). Consequentialist behaviour in decision trees is Bayesian rational. Stanford
University.
Herstein, I. N., & Milnor, J. (1953). An axiomatic approach to measurable utility. Econometrica,
21, 291–297.
Jeffreys, H. (1971). Theory of probability (3rd ed.). Oxford: Oxford University Press.
Jensen, N. E. (1967). An introduction to Bernoullian utility theory: I. Utility functions. Swedish
Journal of Economics, 69, 163–183.
Kadane, J. (1986). Toward a more ethical clinical trial. Journal of Medicine and Philosophy, 11,
385–404.
Kadane, J., & Sedransk, N. (1980). Toward a more ethical clinical trial. In J. Bernardo, M. H.
DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian statistics (pp. 329–338). Valencia:
University Press.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47, 263–291.
LaValle, I. H., & Wapman, K. R. (1986). Rolling back decision trees requires the independence
axiom? Management Science, 32(3), 382–385.
Levi, I. (1974). On indeterminate probabilities. Journal of Philosophy, 71, 391–418.
Levi, I. (1980). The enterprise of knowledge. Cambridge: MIT Press.
Levi, I. (1986a). The paradoxes of Allais and Ellsberg. Economics and Philosophy, 2, 23–53.
Levi, I. (1986b). Hard choices: Decision making under unresolved conflict. Cambridge: Cambridge
University Press.
Levi, I. (1987). The demons of decision. The Monist, 70, 193–211.
Lindley, D. V. (1972). Bayesian statistics: A review. Philadelphia: SIAM.
Loomes, G., & Sugden, R. (1982). Regret theory: An alternative theory of rational choice under
uncertainty. Economic Journal, 92, 805–824.
Luce, D., & Raiffa, H. (1957). Games and decisions. New York: Wiley.
Machina, M. (1982). Expected utility analysis without the independence axiom. Econometrica, 50,
277–323.
Machina, M. (1983, October). The economic theory of individual behavior toward risk: Theory,
evidence and new directions (Technical report #433). San Diego: Department of Economics,
University of California.
McClennen, E. F. (1983). Sure-thing doubts. In B. Stigum & F. Wenstop (Eds.), Foundations of
utility and risk theory with applications (pp. 117–136). Dordrecht: Reidel.
McClennen, E. F. (1986). Prisoner’s dilemma and resolute choice. In R. Campbell & L. Sowden
(Eds.), Paradoxes of rationality and cooperation (pp. 94–104). Vancouver: University of British
Columbia Press.
McClennen, E. F. (1988). Dynamic choice and rationality. In B. R. Munier (Ed.), Risk, decision,
and rationality (pp. 517–536). Dordrecht: Reidel.
McClennen, E. F. (forthcoming). Rationality and dynamic choice: Foundational explorations.
Cambridge: Cambridge University Press.
Raiffa, H. (1968). Decision analysis: Introductory lectures on choices under uncertainty. Reading:
Addison-Wesley.
Samuelson, P. (1950). Probability and the attempts to measure utility. Economic Review, 1, 167–
173.
Introduction
John and Lisa are offered additional insurance against the risk of a heart disease.
They would like to know the probability of developing such a disease over the
next 10 years. The happy couple shares some key medical parameters: they are
70 years old, smoke, and never had a blood pressure problem. A few tests show that
both have a total cholesterol level of 310 mg/dL, with HDL-C (good cholesterol) of
45 mg/dL, and that their systolic blood pressure is 130. Googling “heart disease risk
calculator”, they find several sites that allow them to calculate their risk. The results
(May 2010) are:
As we can see from the table, the estimates vary substantially: the highest
for John is 100 % higher than the lowest, whereas for Lisa the ratio is 5:2.
Opinions diverge in these examples, even though they are based on many causally
independent observations that allow the use of statistical techniques such as logistic
regression. However, in many important economic questions, such as the extent of
global warming, there are very few past events to rely on. Further, many events,
such as revolutions and financial crises, cannot be assumed independent of past
observations. Thus, it appears that for many events of interest one cannot define an
objective, agreed-upon probability.
1
As Cyert and DeGroot (1974) write on p. 524 “To the Bayesian, all uncertainty can be represented
by probability distributions.”
sense, to uncertainty . . . The sense in which I am using the term is that in which the prospect
of a European war is uncertain, or the price of copper and the rate of interest twenty years
hence . . . About these matters there is no scientific basis on which to form any calculable
probability whatever. We simply do not know.
Gilboa et al. (2008, 2009, 2012) argue that the axiomatic foundations of the
Bayesian approach are not as compelling as they seem, and that it may be irrational
to follow this approach. In a nutshell, their argument is that the Bayesian approach
is limited because of its inability to express ignorance: it requires that the agent
express beliefs whenever asked, without being allowed to say "I don't know". Such
an agent may provide arbitrary answers, which are likely to violate the axioms, or
adopt a single probability and provide answers based on it. But such a choice would
be arbitrary, and therefore a poor candidate for a rational mode of behavior.
Axiomatic derivations such as Savage’s may convince the DM that she ought
to have a probability, but they do not tell her which probability it makes sense to
adopt. If there are no additional guiding principles, an agent who picks a probability
measure arbitrarily should ask herself, is it so rational to make weighty decisions
based on my arbitrarily-chosen beliefs? If there are good reasons to support my
beliefs, others should agree with me, and then the probabilities would be objective.
If, however, the probabilities are subjective, and others have different probabilities,
what makes me so committed to mine? Wouldn’t it be more rational to admit that
these beliefs were arbitrarily chosen, and that, in fact, I don’t know the probabilities
in question?
Outline
The rest of this paper is organized as follows. Section “History and Background”
discusses the history and background of the Bayesian approach. It highlights the
fact that this approach has probably never been adopted with such religious zeal as
it has within economic theory over the past 60 years. Section “Alternative Models”
describes several alternatives to the standard Bayesian model. It surveys only a
few of these, attempting to show that much of the foundations and machinery
of the standard model need not be discarded in order to deal with uncertainty.
Section “Ambiguity Aversion” surveys the notion of ambiguity aversion. The
updating of non-Bayesian beliefs is discussed in section “Updating Beliefs”.
Section “Applications” briefly describes some applications of non-Bayesian models.
The applications mentioned here are but a few examples of a growing literature.
They serve to illustrate how non-Bayesian models may lead to different quali-
tative predictions than Bayesian ones. A few general comments are provided in
section “Conclusion”.
Early Pioneers
Decision theory was born as a twin brother of probability theory through the works
of a few scholars in the sixteenth and seventeenth century, originally motivated
by the study of games of chance. Among them the works of Christiaan Huygens
(1629–1695) and Blaise Pascal (1623–1662) are particularly relevant. We begin with
Pascal, whose footsteps Huygens followed.
Pascal (1670) Since its very early days, probability had two different interpre-
tations: first, it captures the notion of chance, referring to relative frequencies
of occurrences in experiments that are repeated under the same conditions. This
includes the various games of chance that provided the motivation for the early
development of the theory. Second, probability can capture the notion of degree of
belief, even when no randomness is assumed, and when nothing remotely similar to
a repeated experiment can be imagined.
It is this second interpretation that, over time, evolved into the Bayesian approach
in both decision and probability theory. In this regard, Pascal is perhaps the most
important pioneer of probability theory. Though he made early key contributions
to the probabilistic modeling of games of chance, it is his famous wager that is
mostly relevant here. Roughly at the same time that Descartes and Leibniz were
attempting to prove that God existed, Pascal changed the question from the proof
of existence to the argument that it is worthwhile to believe in God, an option that
he identified with the choice of a pious form of life based on the precepts of the
Christian religion.2 In so doing, he applied the mathematical machinery developed
for objective probabilities in games of chance to the subjective question of God’s
existence, where no repeated experiment interpretation is possible. This led him to
informally introduce several major ideas of modern decision theory, including the
decision matrix, the notion of dominant strategies, subjective probability, expected
utility maximization, and non-unique probability.3
Thus, the subjective interpretation of probabilities and their application as a tool
to quantify beliefs showed up on the scene more or less as soon as did the objective
interpretation and the application to games of chance. Further, as soon as the notion
of subjective probability came on stage, it was accompanied by the possibility that
this probability might not be known (see Shafer (1986), for related remarks on
Bernoulli (1713), who introduced the law of large numbers).
2
According to Pascal a pious life would ultimately induce faith. Importantly, Pascal did not assume
that one can simply choose one’s beliefs.
3
Pascal did not finish his Pensées, which appeared in print in 1670, 8 years after his death. The
text that was left is notoriously hard to read since he only sketches his thoughts (here we use the
1910 English edition of W. F. Trotter). Our rendering of his argument crucially relies on Hacking
(1975)’s interpretation (see Hacking 1975, pp. 63–72; Gilboa 2009, pp. 38–40).
Huygens (1657) In the wake of the early probability discussions of Fermat and
Pascal, Huygens (1657) first clearly proposed expected values to evaluate games of
fortune.4 Unlike Pascal’s grand theological stand, Huygens only dealt with games
of fortune ( “cards, dice, wagers, lotteries, etc.” as reported in the 1714 English
version). Nevertheless, he was well aware of the intellectual depth of his subject.
Huygens’ arguments are a bit obscure (at least for the modern reader; see Daston
(1995)). His essay has, however, a few remarkable features from our perspective.
First, he does not present the expected value criterion as an axiom; rather, he justifies
its relevance by starting from more basic principles. For this reason his essay is
articulated in a sequence of mathematical propositions that establish the expected
value criterion for more and more complicated games. Huygens’s propositions can
be thus viewed as the very first decision-theoretic representation theorems, in which
the relevance of a decision criterion is not viewed as self-evident, but needs to be
justified through logical arguments based on first principles.
A second remarkable feature of Huygens' essay is the basic principle, his
"postulat" in the English version, on which he based his analysis. We may
call it the principle of equivalent games: he assumes that the values of
games of chance should be derived through the value of equivalent fair games.
Ramsey’s assumption of the existence of bets with equally likely outcomes (that
he calls “ethically neutral”) is an instance of this principle, as well as de Finetti’s
assumption of the existence of partitions of equally likely events. More recently,
the central role that certainty equivalents play in many axiomatic derivations can be
viewed as a later instance of Huygens’ comparative principle of studying uncertain
alternatives by means of benchmark alternatives with suitably simple structures.
Hacking (1975) makes some further observations on the relevance of Huygens’
book for the history of subjective probability. We refer the interested reader to his
book, with a warning on the difficulty of interpreting some of Huygens’ arguments.
Modern decision theory, and in particular the way it models uncertainty, is the result
of the pioneering contributions of a truly impressive array of scholars. Some of
the finest minds of the first half of last century contributed to the formal modeling
of human behavior. Among them, especially remarkable are the works of Frank
Plumpton Ramsey (1903–1930) with his early insights on the relations between
utilities and subjective probabilities, John von Neumann (1903–1957) and Oskar
Morgenstern (1902–1977) with their classic axiomatization of expected utility
presented in the 1947 edition of their famous game theory book, Bruno de Finetti
(1906–1985) with his seminal contributions to subjective probability, and Leonard J.
Savage (1917–1971), who – in an unparalleled conceptual and mathematical tour de
4
See Ore (1960).
5
Operationalism started with Bridgman (1927), after Ramsey’s articles of 1926a and 1926b.
Ellsberg Paradox
The classic Bayesian theory culminating in Savage’s opus represents beliefs prob-
abilistically, but it does not capture the degree of confidence that DMs have in
their own probabilistic assessments, a degree that depends on the quality of the
information that DMs use in forming these assessments. The classic theory focused
on how to measure beliefs, without providing a way to assess the quality of such
measurements.
Ellsberg (1961) provided two stark thought experiments that showed how this
limitation may lead many people to violate Savage’s otherwise extremely com-
pelling axioms, and to express preferences that are incompatible with any (single,
additive) probability measure. Ellsberg argued that a situation in which probabilities
are not known, which he referred to as ambiguity,8 induces different decisions than
situations of risk, namely, uncertainty with known probabilities. Specifically, one of
Ellsberg’s experiments involves two urns, I and II, with 100 balls in each. The DM
is told that
6
Frisch (1926) was the first article we are aware of that adopted a similar approach in economic
theory.
7
See, e.g., chapter 8 of Kreps (1988).
8
Today, the terms “ambiguity”, “uncertainty” (as opposed to “risk”), and “Knightian uncertainty”
are used interchangeably to describe the case of unknown probabilities.
9
See Gilboa (2009) and Wakker (2010) for the analysis.
next 10 years. She cannot assume that this probability is 50 %, based on Laplace’s
Principle of Indifference (or “Principle of Insufficient Reason”, Laplace 1814).
The two eventualities, “average temperature increases by 4 degrees or more” and
“average temperature does not increase by 4 degrees” are not symmetric. Moreover,
if Mary replaces 4 degrees by 5 degrees, she will obtain two similar events, but she
cannot generally assign a 50–50 % probability to any pair of complementary events.
Nor will a uniform distribution over the temperature scale be a rational method
of assigning probabilities.10 The fundamental difficulty is that in most real life
problems there is too much information to apply the Principle of Indifference, yet
too little information to single out a unique probability measure.11 Global warming
and stock market crashes, wars and elections, business ventures and career paths
face us with uncertainty that is neither readily quantified nor easily dismissed by
symmetry considerations.
Other Disciplines
The Bayesian approach has proved useful in statistics, machine learning, philosophy
of science, and other fields. In none of these fellow disciplines has it achieved
the status of orthodoxy that it enjoys within economic theory. It is a respectable
approach, providing fundamental insights and relishing conceptual coherence. It is
worth pointing out, however, that in these disciplines the Bayesian approach is one
among many. More importantly, in all of these disciplines the Bayesian approach
is applied to a restricted state space, such as a space of parameters, whereas in
economics it is often expected to apply also to a grand state space, whose elements
describe anything that can possibly be of interest.
Consider statistics first. The statistical inference problem is defined by a set of
distributions, or data generating processes, out of which a subset of distributions has
to be chosen. In parametric problems, the set of distributions is assumed to be known
up to the specification of finitely many parameters. Classical statistics does not
allow the specification of prior beliefs over these parameters. By contrast, Bayesian
10
Bertrand’s (1907) early critique of the principle of indifference was made in the context of a
continuous space. See also Gilboa (2009) and Gilboa et al. (2009).
11
It is not entirely clear how one can justify the Principle of Indifference even in cases of
ignorance. For example, Kass and Wasserman (1996) p. 1347 discuss the partition paradox and
lack of parametric invariance, two closely related issues that arise with Laplace’s Principle. Similar
remarks from a Macroeconomics perspective can be found in Kocherlakota (2007) p. 357.
Based on a result by Henri Poincaré, Machina (2004) suggests a justification of Laplace's
Principle using a sequence of fine partitions of the state space. This type of reasoning seems to
underlie most convincing examples of random devices, such as tossing coins, spinning roulette
wheels, and the like. It is tempting to suggest that this is the only compelling justification of the
Principle of Indifference, and that this principle should not be invoked unless such a justification
exists.
statistics demands that such beliefs be specified. Thus the Bayesian approach offers
a richer language, within which the statistician can represent prior knowledge and
intuition. Further, the Bayesian prior, updated to a posterior based on sampling,
behaves in a much more coherent way than the techniques of classical statistics.
(See, for example, Welch (1939), also described in DeGroot (1975), pp. 400–401.)
The main disadvantage of the Bayesian approach to statistics is its subjectivity:
since the prior beliefs over the parameters are up to the statistician to choose, they
will differ from one statistician to another. Admittedly, classical statistics cannot
claim to be fully objective either, because the very formulation of the problem as
well as the choice of statistics, tests, and significance levels leave room for the
statistician’s discretion. Yet, these are typically considered necessary evils, with
objectivity remaining an accepted goal, whereas the Bayesian approach embraces
subjective inputs unabashedly.12 On the bright side, if a Bayesian statistician selects
a sufficiently “diffused” or “uninformative” prior, she hopes not to rule out the true
parameters a priori, and thereby to allow learning of objective truths in the long run,
despite the initial reliance on subjective judgments.13
The Bayesian approach has a similar status in the related fields of computer
science and machine learning.14 On the one hand, it appears to be the most
conceptually coherent model of inference. On the other, its conclusions depend on
a priori biases. For example, the analysis of algorithms’ complexity is typically
conducted based on their worst case. The Bayesian alternative is often dismissed
because of its dependence on the assumptions about the underlying distribution.
It is important to emphasize that in statistics and in computer science the state
space, which is the subject of prior and posterior beliefs, tends to be a restricted
space that does not grow with the data. For example, it can comprise all
combinations of values of finitely many parameters, which are held fixed throughout
the sampling procedure. By contrast, the standard approach in economic theory
suggests that the state of the world resolves all uncertainty, and thus describes
everything that might be of relevance to the problem at hand, from the beginning
of time until eternity. As a result, the state space that is often assumed in economics
is much larger than in other disciplines. Importantly, it increases with the size of the
data.
When one considers a restricted set of parameters, one may argue that the prior
probability over this set is derived from past observations of similar problems, each
with its own parameters, taken out of the same set. But when the grand state space
is considered, and all past repetitions of the problem are already included in the
12
See Lewis (1980) and chapter 4 of van Frassen (1989) (and the references therein) for a
discussion of the relations between “objectivity” and subjective probabilities from a philosophical
standpoint.
13
Kass and Wasserman (1996), Bayarri and Berger (2004), and Berger (2004) discuss uninforma-
tive priors and related “objective” issues in Bayesian statistics (according to Efron (1986), some
of these issues explain the relatively limited use of Bayesian methods in applied statistics).
14
See Pearl (1986) and the ensuing literature on Bayesian networks.
description of each state, the prior probability should be specified on a rather large
state space before any data were observed. With no observations at all, and a very
large state space, the selection of a prior probability seems highly arbitrary.
In applications of the Bayesian approach in statistics, computer science, and
machine learning, it is typically assumed that the basic structure of the process is
known, and only a bounded number of parameters need to be learnt. Many non-
parametric methods allow an infinitely dimensional parameter space, but one that
does not grow with the number of observations. This approach is sufficient for
many statistical inference and learning problems in which independent repetitions
are allowed. But economics is often interested in events that do not repeat. Applying
the Bayesian approach to these is harder to justify.
We are not fully aware of the origins of the application of the Bayesian approach
to the grand state space. It is well known that de Finetti was a devout Bayesian.
Savage, who followed in his footsteps, was apparently much less religious in his
Bayesian beliefs. Yet, he argued that a state of the world should “resolve all
uncertainty” and, with a healthy degree of self-criticism, urged the reader to imagine
that she had but one decision to be taken in her lifetime, and this is her choice of her
strategy before being born. Harsanyi (1967, 1968) made a fundamental contribution
to economics by showing how players’ types should be viewed as part of the state of
the world, and assumed that all unborn players start with a common prior over the
grand state space that is thus generated. Aumann (1974, 1976, 1987) pushed this line
further by assuming that all acts and all beliefs are fully specified in each and every
state, while retaining the assumption that all players have a prior, and moreover, the
same prior over the resulting state space.
Somewhere along recent history, with path-breaking contributions by de Finetti,
Savage, Harsanyi, and Aumann, economic theory found itself with a state space that
is much larger than anything that statisticians or computer scientists have in mind
when they generate a prior probability. Surprisingly, the economic theory approach
is even more idealized than the Bayesian approach in the philosophy of science.
There is nothing wrong in formulating the grand state space as a canonical model
within which claims can be embedded. But the assumption that one can have a prior
probability over this space, or that this is the only rational way to think about it, is
questionable.
Summary
Since the mid-twentieth century economic theory has adopted a rather unique
commitment to the Bayesian approach. By and large, the Bayesian approach is
assumed to be the only rational way to describe knowledge and beliefs, and this
holds irrespective of the state space under consideration. Importantly, economic
theory clings to Bayesianism also when dealing with problems of unique nature,
where nothing is known about the structure of the data generating process. Research
in recent decades plainly shows that the Bayesian approach can be extremely
fruitful even when applied to such unique problems. But it is also possible that
the commitment to the Bayesian approach beclouds interesting findings and new
insights.
The preceding discussion highlights our view that there is nothing irrational
about violating the Bayesian doctrine in certain problems. As opposed to models
of bounded rationality, psychological biases, or behavioral economics, the focus of
this survey is on models in which DMs may sometimes admit that they do not know
what the probabilities they face are. Being able to admit ignorance is not a mistake.
It is, we claim, more rational than to pretend that one knows what cannot be known.
Bounded rationality and behavioral economics models often focus on descriptive
interpretations. At times, they would take a conditionally-normative approach,
asking normative questions given certain constraints on the rationality of some
individuals. Such models are important and useful. However, the models discussed
here are different in that they are fully compatible with normative interpretations.
When central bank executives consider monetary policies, and when leaders of a
country make decisions about military actions, they will not make a mistake if they
do not form Bayesian probabilities. On the contrary, they will be well advised to
take into account those uncertainties that cannot be quantified.
Alternative Models
15
Throughout the section we use interchangeably the terms lotteries and simple probabilities.
16
Simple acts have the form f = ∑_{i=1}^{n} p_i 1_{E_i}, where {E_i}_{i=1}^{n} ⊆ Σ is a partition of S and {p_i}_{i=1}^{n} ⊆ Δ(X).
A key feature of Δ(X) is its convexity, which makes it possible to combine acts.
Specifically, given any α ∈ [0, 1], set

(αf + (1 − α)g)(s) = αf(s) + (1 − α)g(s) for each s ∈ S.

The mixed act αf + (1 − α)g delivers in each state s the compound lottery αf(s) +
(1 − α)g(s). In other words, ex post, after the realization of state s, the DM obtains
a risky outcome governed by the lottery αf(s) + (1 − α)g(s).17
The possibility of mixing acts is a key dividend of the assumption that Δ(X) is
the consequence space, which gives the AA setting a vector structure that the Savage
setting did not have. The derivation of the subjective expected utility representation
in the AA setting is based on this vector structure.
Risk preference The DM has a primitive preference ≿ on F. In turn, this
preference induces a preference ≿ on lotteries by setting, for all p, q ∈ Δ(X),

p ≿ q ⟺ f ≿ g,

where f and g are the constant acts such that f(s) = p and g(s) = q for all s ∈ S.
Constant acts are not affected by state uncertainty, only by the risk due to the
lotteries' exogenous probabilities. For this reason, ≿ can be seen as the risk
preference of the DM. This is an important conceptual implication of having Δ(X)
as the consequence space. This richer consequence space mathematically delivers a
most useful vector structure, while from a decision-theoretic standpoint it enriches
the setting with a risk preference that allows one to consider the DM's risk behavior
separately. Differently put, the AA consequence space can be viewed as derived
from an underlying consequence space X à la Savage, enriched by a lottery structure
that allows one to calibrate risk preferences.
Alternatively, one may view AA’s model as an improved version of de Finetti’s
(1931, 1937) axiomatic derivation of expected value maximization with subjective
probabilities. de Finetti assumed additivity or linearity in payoffs. This is a
problematic assumption if payoffs are monetary, but it is more palatable if payoffs
are probabilities of receiving a fixed desirable outcome. Replacing the payoffs in de
Finetti’s model by probabilities of outcomes, one obtains a model akin to AA’s.
In a sense, the AA model is a hybrid between vNM’s and Savage’s. Mathemati-
cally it is akin to the former, as it starts with a vNM theorem on a particular mixture
space, and imposes additional axioms to derive subjective probabilities. Conceptu-
ally, it is closer to Savage’s model, as it derives probabilities from preferences. Many
view this derivation as conceptually less satisfactory than Savage’s, because the
latter does not assume probabilities, or any numbers for that matter, to be part of the
data. Anscombe and Aumann, however, viewed the use of objective probabilities as
a merit, because they believed that people think in terms of subjective probabilities
17
For this reason, mixing acts in this way is sometimes called “ex post randomization.” For recent
models with ex ante randomization, see Epstein et al. (2007), Ergin and Sarver (2009), Seo (2009),
and Saito (2015).
after they have internalized the concept of objective probability. Be that as it may,
there is no doubt that the AA model has become the main testbed for new models
of decision under uncertainty.18
Axioms We now make a few assumptions on the primitive preference ≿. The first
one is a standard weak order axiom.
AA.1 WEAK ORDER: ≿ on F is complete and transitive.
The next axiom is a monotonicity assumption: if state by state an act f delivers a
weakly better (risky) consequence than an act g, then f should be weakly preferred
to g. It is a basic rationality axiom.
AA.2 MONOTONICITY: for any f, g ∈ F, if f(s) ≿ g(s) for each s ∈ S, then
f ≿ g.
Next we have an independence axiom, which is peculiar to the AA setting since
it relies on its vector structure.
AA.3 INDEPENDENCE: for any three acts f, g, h ∈ F and any 0 < α < 1, we
have

f ≻ g implies αf + (1 − α)h ≻ αg + (1 − α)h.

According to this axiom, the DM's preference over two acts f and g is not
affected by mixing them with a common act h. In the special case when all these
acts are constant, axiom AA.3 reduces to von Neumann–Morgenstern's original
independence axiom on lotteries.
We close with standard Archimedean and nontriviality assumptions.19
AA.4 ARCHIMEDEAN: let f, g, and h be any three acts in F such that f ≻ g ≻ h.
Then, there are α, β ∈ (0, 1) such that αf + (1 − α)h ≻ g ≻ βf + (1 − β)h.
AA.5 NONDEGENERACY: there are f, g ∈ F such that f ≻ g.
We can now state the AA subjective expected utility theorem.
Theorem 1. Let ≿ be a preference defined on F. The following conditions are
equivalent:
(i) ≿ satisfies axioms AA.1–AA.5;
(ii) there exist a non-constant function u : X → ℝ and a probability measure
P : Σ → [0, 1] such that, for all f, g ∈ F, f ≿ g if and only if

∫_S ( ∑_{x ∈ supp f(s)} u(x) f(s)(x) ) dP(s) ≥ ∫_S ( ∑_{x ∈ supp g(s)} u(x) g(s)(x) ) dP(s).   (21.3)
18. See Ghirardato et al. (2003) for a subjective underpinning of the AA setup.
19. See Gilboa (2009) for some more details on them.
The term
$$\sum_{x \in \mathrm{supp}\, f(s)} u(x)\, f(s)(x) \qquad (21.5)$$
is the expected utility of the lottery $f(s)$ that act $f$ delivers when state $s$ obtains. It is easy to see that this expected utility represents the DM's risk preference. The outer part
$$\int_S \Big( \sum_{x \in \mathrm{supp}\, f(s)} u(x)\, f(s)(x) \Big) dP(s)$$
averages all the expected utilities (21.5) according to the probability $P$, which quantifies the DM's beliefs over the state space.
The classical models of Savage and Anscombe-Aumann were considered the
gold standard of decision under uncertainty, despite the challenge posed by Ells-
berg’s experiments. In the 1980s, however, several alternatives were proposed, most
notably models based on probabilities that are not necessarily additive, or on sets
of probabilities. We now turn to review these contributions and some of the current
research in the area.
20. Throughout the paper, cardinally unique means unique up to positive affine transformations.
in one case, the DM practically knows that each side of the coin has a 50% probability of coming up. In the other case, the numbers 50–50% are obtained with a shrug of one's shoulders, relying on symmetry of ignorance rather than symmetry of information.21 Observe that Schmeidler's two-coin example is very close to Ellsberg's two-urn experiment. However, Schmeidler was not motivated by the desire to explain Ellsberg's results; rather, he considered the standard theory and found it counter-intuitive.
Schmeidler (1989) suggested modeling probabilities by set functions that are not necessarily additive. For example, if H (T) designates the event "the unknown coin falls with H (T) up", and $\nu$ is the measure of credence, we may have $\nu(H) = \nu(T) < 1/2$, so that $\nu(H) + \nu(T) < 1 = \nu(H \cup T)$. Expected utility with respect to such a non-additive measure, or capacity, can be computed by the Choquet (1953) integral: for a nonnegative function $a: S \to \mathbb{R}$,
$$\int a\, d\nu = \int_0^\infty \nu(\{s \in S : a(s) \geq t\})\, dt,$$
where on the right-hand side we have a Riemann integral. To see why the Riemann integral is well defined, observe that the sets $E_t = \{s \in S : a(s) \geq t\}$ define a chain that is decreasing in $t$ (in the sense of set inclusion) and, since a capacity $\nu$ is monotone, $\nu(E_t)$ is a decreasing function of $t$. For a more detailed explanation of the Choquet integral the reader is referred to Gilboa (2009).

21. See Fischhoff and Bruine De Bruin (1999) for experimental evidence on how people use 50–50% statements in this sense.
22. We refer the interested reader to Denneberg (1994) and to Marinacci and Montrucchio (2004) for detailed expositions of Choquet integration.
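To make the Choquet integral concrete, here is a minimal numerical sketch (not from the original text): a discrete Choquet integral in Python for the "unknown coin" story, with assumed capacity values $\nu(H) = \nu(T) = 0.3$ and $\nu(\{H,T\}) = 1$.

```python
# Illustrative capacity for the "unknown" coin: non-additive, since nu(H) + nu(T) < 1.
nu = {frozenset(): 0.0,
      frozenset({"H"}): 0.3,
      frozenset({"T"}): 0.3,
      frozenset({"H", "T"}): 1.0}

def choquet(utility, nu):
    """Discrete Choquet integral of a nonnegative state-contingent utility profile.

    States are sorted from best to worst, and each utility increment is weighted by
    the capacity of the corresponding upper-level set E_t = {s : utility[s] >= t}.
    """
    order = sorted(utility, key=utility.get, reverse=True)
    total, upper = 0.0, frozenset()
    for i, s in enumerate(order):
        upper = upper | {s}
        next_u = utility[order[i + 1]] if i + 1 < len(order) else 0.0
        total += (utility[s] - next_u) * nu[upper]
    return total

print(choquet({"H": 1.0, "T": 0.0}, nu))  # 0.3: a bet on H is worth only nu(H)
print(choquet({"H": 1.0, "T": 1.0}, nu))  # 1.0: a constant act keeps its sure utility
```

With an additive capacity the same routine reproduces ordinary expected utility.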
Schmeidler (1989) axiomatized Choquet expected utility in the AA setup. The key innovation relative to the AA axioms AA.1–AA.4 was to restrict the Independence axiom AA.3 to comonotonic acts, that is, acts $f, g \in \mathcal{F}$ for which it is never the case that both $f(s) \succ f(s')$ and $g(s') \succ g(s)$ for some states of the world $s$ and $s'$. This is the preference version of comonotonicity.

S.3 COMONOTONIC INDEPENDENCE: for any pairwise comonotonic acts $f, g, h \in \mathcal{F}$ and any $0 < \alpha < 1$,
$$f \succ g \;\Longrightarrow\; \alpha f + (1-\alpha)h \succ \alpha g + (1-\alpha)h.$$
According to this axiom, the DM's preference between two comonotonic acts $f$ and $g$ is not affected by mixing them with another act $h$ that is comonotonic with both. The intuition behind this axiom can best be explained by observing that the classical independence axiom may not be very compelling in the presence of uncertainty. For example, assume that there are two states of the world, and two vNM lotteries $P \succ Q$. Let $f = (P, Q)$ and $g = (Q, P)$. Suppose that, due to ignorance about the state of the world, the DM is driven to express indifference, $f \sim g$. By AA's independence, for every $h$ we will observe
$$\tfrac{1}{2} f + \tfrac{1}{2} h \sim \tfrac{1}{2} g + \tfrac{1}{2} h.$$
However, for $h = g$ this implies that $\tfrac{1}{2} f + \tfrac{1}{2} g \sim g$, despite the fact that the act $\tfrac{1}{2} f + \tfrac{1}{2} g$ is risky while $g$ is uncertain.
In this example, $g$ can serve as a hedge against the uncertainty inherent in $f$, but it clearly cannot hedge against itself. The standard independence axiom is too demanding, because it does not distinguish between mixing operations $\alpha f + (1-\alpha)h$ that reduce uncertainty (via hedging) and mixing operations that do not. Restricting the independence axiom to pairwise comonotonic acts neutralizes this asymmetric effect of hedging.
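A tiny numerical sketch of the hedging point (our own illustration; the utility levels u(P) = 1 and u(Q) = 0 are assumptions): under a worst-case evaluation over all priors on the two states, the 50–50 mixture of f = (P, Q) and g = (Q, P) strictly beats both acts, while g obviously cannot hedge against itself.

```python
import numpy as np

f = np.array([1.0, 0.0])   # act f = (P, Q), written as a state-wise utility profile
g = np.array([0.0, 1.0])   # act g = (Q, P)
mix = 0.5 * f + 0.5 * g    # state-wise mixture: the same 50-50 lottery in both states

def worst_case(act, grid=np.linspace(0.0, 1.0, 101)):
    """Minimal expected utility over all priors p on the first state."""
    return min(p * act[0] + (1 - p) * act[1] for p in grid)

print(worst_case(f), worst_case(g), worst_case(mix))  # 0.0 0.0 0.5
```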
Using the Comonotonic Independence axiom S.3, Schmeidler (1989) was able to prove the following representation theorem, which generalizes the subjective expected utility representation established by Theorem 1 by allowing for possibly non-additive probabilities. The proof of the result is based on some results on Choquet integration established in Schmeidler (1986).

Theorem 2. Let $\succsim$ be a preference defined on $\mathcal{F}$. The following conditions are equivalent:

(i) $\succsim$ satisfies axioms AA.1, AA.2, S.3 (Comonotonic Independence), AA.4, and AA.5;
(ii) there exist a non-constant function $u: X \to \mathbb{R}$ and a capacity $\nu: \Sigma \to [0,1]$ such that, for all $f, g \in \mathcal{F}$, $f \succsim g$ if and only if
$$\int_S \Big( \sum_{x \in \mathrm{supp}\, f(s)} u(x)\, f(s)(x) \Big) d\nu(s) \;\geq\; \int_S \Big( \sum_{x \in \mathrm{supp}\, g(s)} u(x)\, g(s)(x) \Big) d\nu(s), \qquad (21.8)$$
where the integrals are Choquet integrals.
The core of a capacity $\nu$, denoted $\mathrm{core}(\nu)$, is the set of (finitely additive) probability measures $P$ on $\Sigma$ such that $P(E) \geq \nu(E)$ for every $E \in \Sigma$. If $\mathrm{core}(\nu) \neq \emptyset$, we may think of $\nu(E)$ as the lower bound on $P(E)$, and then $\nu$ is a concise way to represent a set of probabilities, presumably those that are considered possible. The lower envelope of a set of probabilities is also the common interpretation of belief functions (Dempster 1967; Shafer 1976).
Schmeidler (1986) has shown that if $\nu$ is convex in the sense that
$$\nu(A \cup B) + \nu(A \cap B) \geq \nu(A) + \nu(B) \quad \text{for all events } A, B,$$
then $\mathrm{core}(\nu) \neq \emptyset$ and the Choquet integral with respect to $\nu$ equals the minimum of the corresponding expected values over the probabilities in the core:
$$\int a\, d\nu = \min_{P \in \mathrm{core}(\nu)} \int a\, dP. \qquad (21.9)$$

23. Nakamura and Wakker's papers use versions of the so-called tradeoff method (see Kobberling and Wakker (2003) for a detailed study of this method and its use in the establishment of axiomatic foundations for choice models).

Schmeidler (1989) also introduced the following axiom of uncertainty aversion:

S.6 UNCERTAINTY AVERSION: for all $f, g \in \mathcal{F}$ and $\alpha \in (0,1)$,
$$f \sim g \;\Longrightarrow\; \alpha f + (1-\alpha) g \succsim f.$$
This mixing can thus be viewed as a form of hedging against ambiguity that the DM can choose.25

Theorem 3. In Theorem 2, $\succsim$ satisfies axiom S.6 if and only if the capacity $\nu$ in (21.8) is convex.
Theorem 3 (Schmeidler 1989) shows that convex capacities characterize ambi-
guity averse Choquet expected utility DMs (in the sense of axiom S.6). Since
most DMs are arguably ambiguity averse, this is an important result in Choquet
expected utility theory. Moreover, relating this theory to maximization of the worst-
case expected utility over a set of probabilities has several advantages. First, it
obviates the need to understand the unfamiliar concept of Choquet integration.
Second, it provides a rather intuitive, if extreme, cognitive account of the decision
process: as in classical statistics, the DM entertains several probability measures
as potential beliefs. Each such “belief” induces an expected utility index for each
act. Thus, each act has many expected utility values. In the absence of second-order
beliefs, the cautious DM chooses the worst-case expected utility as summarizing
the act’s desirability. Wakker (1990, 1991) established several important behavioral
properties and characterizations of concave/convex capacities in the CEU model.
24. See Marinacci and Montrucchio (2004), p. 73. They show on p. 78 that (21.9) can be derived from this result of Choquet through a suitable application of the Hahn-Banach Theorem.
25. Klibanoff (2001a,b) studied in detail the relations between randomization and ambiguity aversion.
Gilboa and Schmeidler (1989). This account of Choquet expected utility maximization also relates to the maxmin criterion of Wald (1950; see also Milnor 1954). However, there are many natural sets of probabilities that are not the core of any capacity. Assume, for example, that there are three states of the world, $S = \{1, 2, 3\}$. Assume that the DM is told that, if state 1 is not the case, then the (conditional) probability of state 2 is at least $2/3$. If this is all the information available to her, she knows only that state 2 is at least twice as likely as state 3. Hence the set of probability vectors $P = (p_1, p_2, p_3)$ that reflects the DM's knowledge consists of all vectors such that
$$p_2 \geq 2 p_3.$$
It is easy to verify that this set is not the core of a capacity. Similarly, one may consider a DM who has a certain probability measure $P$ in mind, but allows for the possibility of error in its specification. Such a DM may consider a set of probabilities
$$C = \{\, Q \in \Delta(\Sigma) : \|Q - P\| \leq \varepsilon \,\}$$
for some norm $\|\cdot\|$ and $\varepsilon > 0$, and this set is not the core of any capacity (such sets were used in Nishimura and Ozaki (2007)).
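A rough sketch (not from the original) of how such a constraint set can be used: the set {p : p₂ ≥ 2p₃} is approximated on a grid of the three-state simplex and acts are evaluated by their worst-case expected utility; the utility numbers are assumptions.

```python
import numpy as np

def priors_grid(step=0.01):
    """Grid approximation of C = {(p1, p2, p3) in the simplex : p2 >= 2 * p3}."""
    priors = []
    for p1 in np.arange(0.0, 1.0 + 1e-9, step):
        for p2 in np.arange(0.0, 1.0 - p1 + 1e-9, step):
            p3 = max(1.0 - p1 - p2, 0.0)
            if p2 >= 2 * p3 - 1e-9:
                priors.append((p1, p2, p3))
    return np.array(priors)

def maxmin_value(utilities, priors):
    """Worst-case expected utility of a state-contingent utility profile."""
    return float(np.min(priors @ np.asarray(utilities)))

C = priors_grid()
print(maxmin_value([0.0, 10.0, 20.0], C))   # 0.0: the worst prior puts all mass on state 1
print(maxmin_value([10.0, 10.0, 10.0], C))  # ~10: constant acts are unaffected by the prior
```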
It therefore makes sense to generalize Choquet expected utility with convex capacities to the maxmin rule, where the minimum is taken over general sets of probabilities. Decision rules of this type were first suggested by Hurwicz (1951), under the name of Generalized Bayes-minimax principle, and then by Smith (1961), Levi (1974, 1980), and Gärdenfors and Sahlin (1982). Recently, related ideas have appeared in mathematical finance (see Artzner et al. 1997, 1999).
Gilboa and Schmeidler (1989) provided an axiomatic model of maxmin expected utility maximization ("MMEU", also referred to as "MEU"). This model is also formulated in the AA framework and, like the Choquet expected utility model, is based on a suitable weakening of the Independence axiom AA.3. Schmeidler's Comonotonic Independence axiom restricted AA.3 to the case that all acts are pairwise comonotonic. This rules out obvious cases of hedging, but it may allow for more subtle ways in which expected utility can be "smoothed out" across states of the world.26 A more modest requirement restricts the independence condition to the case in which the act $h$ is constant:

GS.3 C-INDEPENDENCE: for all $f, g \in \mathcal{F}$, all constant acts $h$, and all $\alpha \in (0,1)$,
$$f \succ g \;\Longrightarrow\; \alpha f + (1-\alpha)h \succ \alpha g + (1-\alpha)h.$$
26. For example, assume that there are three states of the world, and two acts offer the following expected utility profiles: $f = (0, 10, 20)$ and $g = (4, 10, 14)$. Assume that the DM is indifferent between $f$ and $g$, that is, that she is willing to give up 1 unit of expected utility in state 3 in order to transfer 5 units from state 3 to state 1. Comonotonic independence would imply that the DM should also be indifferent between $f$ and $g$ when they are mixed with any other act comonotonic with both, such as $f$ itself. However, while $f$ clearly doesn't offer a hedge against itself, mixing $f$ with $g$ can be viewed as reducing the volatility of the former, resulting in a mix that is strictly better than $f$ and $g$.
27. Schmeidler required that all three acts be pairwise comonotonic, whereas C-Independence does not restrict attention to comonotonic pairs $(f, g)$. Thus, C-Independence is not, strictly speaking, weaker than Comonotonic Independence. However, in the presence of Schmeidler's other axioms, Comonotonic Independence is equivalent to the version in which $f$ and $g$ are not required to be comonotonic.
28. See Ghirardato et al. (1998) for details.
Theorem 4. Let $\succsim$ be a preference defined on $\mathcal{F}$. The following conditions are equivalent:

(i) $\succsim$ satisfies axioms AA.1, AA.2, GS.3 (C-Independence), AA.4, AA.5, and S.6;
(ii) there exist a non-constant affine function $u: X \to \mathbb{R}$ and a nonempty, closed, and convex set $C$ of finitely additive probability measures on $\Sigma$ such that, for all $f, g \in \mathcal{F}$,
$$f \succsim g \iff \min_{P \in C} \int_S u(f(s))\, dP(s) \;\geq\; \min_{P \in C} \int_S u(g(s))\, dP(s). \qquad (21.10)$$

The set $C$ is a singleton if and only if $\succsim$ satisfies the Independence axiom AA.3. A slightly more interesting result actually holds, which shows that maxmin expected utility DMs reduce to subjective expected utility ones when their choices do not involve any hedging against ambiguity.29

Proposition 5. In Theorem 4, $C$ is a singleton if and only if, for all $f, g \in \mathcal{F}$,
$$f \sim g \;\Longrightarrow\; \tfrac{1}{2} f + \tfrac{1}{2} g \sim g.$$
When C is not a singleton, the model can express more complex states of
knowledge, reflected by various sets C of probabilities. For applications in economic
theory, the richness of the maxmin model seems to be important. In particular,
one may consider any model in economic theory and enrich it by adding some
uncertainty about several of its parameters. By contrast, in order to formulate
Choquet expected utility, one needs to explicitly consider the state space and the
capacity defined on it. Often, this exercise may be intractable.
By contrast, for some practical applications such as in medical decision making,
the richness of the maxmin model may prove a hindrance. Wakker (2010) presents
the theory of decision making under risk and under ambiguity geared for such
applications. He focuses on capacities as a way to capture ambiguity, rather than
on sets of probabilities.30
The maxmin model allows for more degrees of freedom than the CEU model,
but it does not generalize it. In fact, the overlap of the two models is described in
Theorem 3 and occurs when the uncertainty averse axiom S.6 holds. But, whereas
uncertainty aversion – through axiom S.6 – is built into the decision rule of the
maxmin model, Choquet expected utility can express attitudes of uncertainty liking.
This observation in part motivated the search by Ghirardato et al. (2004) for a class of preferences that may not satisfy S.6 and is able to encompass both CEU and MMEU preferences. We review this contribution below.
Finally, Casadesus-Masanell et al. (2000), Ghirardato et al. (2003), and Alon and
Schmeidler (2014) established purely subjective versions of Gilboa and Schmei-
dler’s representation result.31
Countably additive priors. Theorem 4 considers the set $\Delta(\Sigma)$ of all finitely additive probabilities. In applications, however, it is often important to consider
29. See Ghirardato et al. (2004) for details.
30. Wakker (2010) also introduces the gain-loss asymmetry that is one of the hallmarks of Prospect Theory (Kahneman and Tversky 1979). The combination of gain-loss asymmetry with rank-dependent expected utility (Quiggin 1982; Yaari 1987) resulted in Cumulative Prospect Theory (CPT, Tversky and Kahneman 1992). When CPT is interpreted as dealing with ambiguity, it is equivalent to Choquet expected utility with the additional refinement of distinguishing gains from losses.
31. For a critical review of the maxmin and other non-Bayesian models, see Al-Najjar and Weinstein (2009) (see Mukerji 2009; Siniscalchi 2009b, for a discussion).
32. As Chateauneuf et al. (2005) show, this control prior exists because, under Axiom MC, the set $C$ is weakly compact, a stronger compactness condition than the weak$^*$-compactness that $C$ features in Theorem 4. Their results have been generalized to variational preferences by Maccheroni et al. (2006a).
33. In this regard, Arrow (1970) wrote that "the assumption of Monotone Continuity seems, I believe correctly, to be the harmless simplification almost inevitable in the formalization of any real-life problem." See Kopylov (2010) for a recent version of Savage's model under Monotone Continuity. In many applications, countable additivity of the measure(s) necessitates the restriction of the algebra of events to be a proper subset of $2^S$. Ignoring many events as "non-measurable" may appear as sweeping the continuity problem under the measurability rug. However, this approach may be more natural if one does not start with the state space $S$ as primitive, but derives it as the semantic model of a syntactic system, where propositions are primitive.
($P(E) = 0$ if and only if $P'(E) = 0$ for all $E \in \Sigma$). Epstein and Marinacci (2007) provide a behavioral condition that ensures this minimal consistency among priors, which is especially important in dynamic problems that involve the updating of priors. Interestingly, this condition turns out to be a translation, in a choice-under-uncertainty setup, of a classic axiom introduced by Kreps (1979) in his seminal work on menu choices. Given any two consequences $x$ and $y$, let
$$x \vee y = \begin{cases} x & \text{if } x \succsim y, \\ y & \text{otherwise,} \end{cases}$$
and given any two acts $f$ and $g$, define the act $f \vee g$ by $(f \vee g)(s) = f(s) \vee g(s)$ for each $s \in S$.

GK GENERALIZED KREPS: For all $f, f', g \in \mathcal{F}$, $f \sim f \vee f' \Rightarrow f \vee g \sim (f \vee g) \vee f'$.
In every state, the act $f \vee f'$ gives the better of the two outcomes associated with $f$ and $f'$. Thus we say that $f \vee f'$ weakly improves $f$ in 'the direction' $f'$. GK requires that if an improvement of $f$ in direction $f'$ has no value, then the same must be true for an improvement in direction $f'$ of any act (here $f \vee g$) that improves $f$. The next result of Epstein and Marinacci (2007) shows that for maxmin preferences this seemingly innocuous axiom is equivalent to the mutual absolute continuity of priors.

Theorem 7. In Theorem 4, $\succsim$ satisfies Axiom GK if and only if the probabilities in $C$ are equivalent.
Unanimity Preferences
34. A caveat: the unanimity rule (21.11) is slightly different from Bewley's, who represents strict preference by unanimity of strict inequalities. This is generally not equivalent to representation of weak preference by unanimity of weak inequalities.
relations, and show that certain axioms, stated on each relation separately as well as
relating the two, are equivalent to a joint representation of the two relations by the
same set of probabilities C: one by the unanimity rule, and the other – by the maxmin
rule. Their results provide a bridge between the two classic representations (21.10)
and (21.11), as well as a possible account by which maxmin behavior might emerge
from incomplete preferences.
Ghirardato et al. (GMM, 2004) used some insights from Bewley’s unanimity
representation to remove the Uncertainty Aversion axiom S.6 in the derivation of
Gilboa and Schmeidler (1989) and, in this way, to propose a class of preferences
that encompasses both Choquet and maxmin preferences. To this end, they consider
the following definition.
Definition 9. A preference $\succsim$ on $\mathcal{F}$ is said to be invariant biseparable if it satisfies axioms AA.1, AA.2, GS.3 (C-Independence), AA.4, and AA.5.
Invariant biseparable (IB) preferences thus satisfy all AA axioms, except for
the Independence axiom AA.3, which is replaced by the C-Independence axiom
GS.3 of Gilboa and Schmeidler (1989).35 Thanks to this key weakening, invariant
biseparable preferences include as special cases both CEU and MMEU preferences:
the former constitute the special case when the Comonotonic Independence Axiom
S.3 holds, while the latter – when the Uncertainty Aversion axiom S.6 holds.
The main tool that GMM use to study IB preferences is an auxiliary relation $\succsim^*$ on $\mathcal{F}$. Specifically, given any two acts $f, g \in \mathcal{F}$, act $f$ is said to be unambiguously (weakly) preferred to $g$, written $f \succsim^* g$, if
$$\alpha f + (1-\alpha) h \;\succsim\; \alpha g + (1-\alpha) h$$
for all $\alpha \in [0,1]$ and all $h \in \mathcal{F}$. In words, $f \succsim^* g$ holds when the DM does not find any possibility of hedging against or speculating on the ambiguity that she may perceive in comparing $f$ and $g$. GMM argue that this DM's choice pattern reveals that ambiguity does not affect her preference between $f$ and $g$, and this motivates the "unambiguously preferred" terminology.
The unambiguous preference relation is, in general, incomplete. This incompleteness is due to ambiguity.

Lemma 10. The following statements hold:
(i) If $f \succsim^* g$, then $f \succsim g$.
(ii) $\succsim^*$ satisfies axioms B.1, AA.2, and AA.3.
35. The name biseparable originates in Ghirardato and Marinacci (2001, 2002), which we will discuss later.
Intuitively, in this case the DM perceives a similar ambiguity in both acts. For example, any two lottery acts $p$ and $q$ stand in this relation, since lottery acts are unambiguous. It is easy to see that it is an equivalence relation. Denote by $[f]$ the equivalence class determined by an act $f$, and consider the quotient space of $\mathcal{F}$ that consists of these equivalence classes.
36. That is, if $\succsim'$ is a restriction of $\succsim$ and $\succsim'$ satisfies independence, then $\succsim'$ is a restriction of $\succsim^*$.
37. This latter feature of $\succsim^*$ relates this notion to an earlier one by Nehring (2001), as GMM discuss.
38. GMM also show the form that $C$ takes for some CEU preferences that do not satisfy S.6.
$$V(f) = a \min_{P \in C} \int_S u(f(s))\, dP(s) + (1-a) \max_{P \in C} \int_S u(f(s))\, dP(s).$$
This is the $a$-MEU criterion that Jaffray (1989) suggested in order to combine Hurwicz (1951)'s criterion (see also Arrow and Hurwicz 1972) with a maxmin approach. Intuitively, $a \in [0,1]$ measures the degree of the individual's pessimism, where $a = 1$ yields the maxmin expected utility model, and $a = 0$ – its dual, the maxmax expected utility model. However, this apparently natural idea turned out to be surprisingly tricky to formally pin down. GMM provided a specific axiom
Smooth Preferences
The MMEU model discussed above is often viewed as rather extreme: if, indeed, a set of probability measures $C$ is stipulated, and each act $f$ is mapped to a range of expected utility values, $\{\int_S u(f)\, dP \mid P \in C\}$, why should such an $f$ be evaluated by the minimal value in this interval? This worst-case scenario approach seems almost paranoid: why should the DM assume that nature39 will choose a probability as if to spite the DM? Isn't it more plausible to allow for other ways to summarize the interval by a single number?
The extreme nature of the maxmin model is not evident from the axiomatic
derivation of the model. Indeed, this model is derived from Anscombe-Aumann’s
by relaxing their independence axiom in two ways: first, by restricting it to mixing
with a constant act (h above) and, second, by assuming uncertainty aversion. These
weaker axioms do not seem to reflect the apparently-paranoid attitude of the maxmin
principle. A question then arises, how do these axioms give rise to such extreme
uncertainty attitude?
In this context it is important to recall that the axiomatic derivation mentioned
above is in the revealed preferences tradition, characterizing behavior that could
be represented in a certain mathematical formula. An individual who satisfies the
axioms can be thought of as if she entertained a set C of priors and maximized
the minimal expected utility with respect to this set. Yet, this set of priors need
not necessarily reflect the individual’s knowledge. Rather, information and personal
taste jointly determine the set C. Smaller sets may reflect both better information
and a less averse uncertainty attitude. For example, an individual who bets on a
flip of a coin and follows the expected utility axioms with respect to a probability $p = 0.5$ of "Head" may actually know that the probability $p$ is $0.5$, or she may have no clue about $p$ but chooses the model $p = 0.5$ because she is insensitive to her ignorance about the true data generating process. Thus, information and attitude to uncertainty are inextricably intertwined in the set $C$. More generally, it is possible that the individual has objective information that the probability is in a set $D$, but
39
Relations between ambiguity and games against nature are discussed in Hart et al. (1994),
Maccheroni et al. (2006a,b), and Ozdenoren and Peck (2008).
behaves according to the maxmin expected utility rule with respect to a set $C \subseteq D$, reflecting her uncertainty attitude. This intuition has motivated the model of Gajdos et al. (2008) that axiomatically established the inclusion $C \subseteq D$ (some related ideas can be found in Wang (2003a) and Giraud (2005)).
If, however, the set of priors $C$ is interpreted cognitively à la Wald, that is, as the set of probabilities that are consistent with objectively available information, one may consider alternatives to the maxmin rule, which, under this Waldean interpretation, has an extreme nature. One approach to address this issue is to assume that the DM has a prior probability over the possible probability distributions in $C$. Thus, if $\Delta(\Sigma)$ is the space of all "first order" probability distributions (viewed as data generating processes), and $\mu$ is a "second order" prior probability over them, one can use $\mu$ to have an averaging of sorts over all expected utility values of an act $f$.

Clearly, the expectation of expectations is an expectation. Thus, if one uses $\mu$ to compute the expectation of the expected utility, there will exist a probability $\hat{p}$ on $S$, given by
$$\hat{p} = \int_{\Delta(\Sigma)} p\, d\mu(p).$$
In this case, the new model cannot explain any new phenomena, as it reduces to the standard Bayesian model. However, if the DM uses a non-linear function $\varphi$ to evaluate expected utility values, one may explain non-neutral attitudes to uncertainty. Specifically, assume that
$$\varphi: \mathbb{R} \to \mathbb{R}$$
is strictly increasing, and evaluate each act $f$ by
$$V(f) = \int_{\Delta(\Sigma)} \varphi\Big( \int_S u(f(s))\, dP(s) \Big)\, d\mu(P).$$
$V$ is a smooth functional, whereas the Choquet expected utility and the maxmin expected utility functionals are typically not everywhere differentiable (over the space of acts).
The notion of second order probabilities is rather old and deserves a separate
survey.40 This idea is at the heart of Bayesian statistics, where Bayes’s rule is
retained and a probability over probabilities over a state space is equivalent to
a probability over the same space. Within decision theory, Segal (1987) already
suggested that Ellsberg’s paradox can be explained by second-order probabilities,
provided that we allow the decision maker to violate the principle of reduction
of compound lotteries. Specifically, Segal’s model assumed that the second-order
probabilities are used to aggregate first-order expectations via Quiggin’s (1982)
anticipated utility. Other related models have been proposed by Nau (2001, 2006,
2011), Chew and Sagi (2008), Ergin and Gul (2009), and Seo (2009). Halevy and
Feltkamp (2005) proposed another approach according to which the decision maker
does not err in the computation of probabilities, but uses a mis-specified model,
treating a one-shot choice as if it were repeated.
As compared to Choquet expected utility maximization, the smooth preferences
model, like the maxmin model, has the advantage of having a simple and intelligible
cognitive interpretation. As opposed to both Choquet and maxmin expected utility
models, smooth preferences have the disadvantage of imposing non-trivial episte-
mological demands on the DM: the smooth model requires the specification of a
prior over probability models, that is, of a probability over a much larger space,
.†/, something that may be informationally and observationally demanding.
That said, beyond the above mentioned separation, the smooth preferences model enjoys an additional advantage of tractability. If $S$ is finite, one may choose $\mu$ to be a uniform prior over $\Delta(\Sigma)$ and specify a simple functional form for $\varphi$, to get a simple model in which uncertainty/ambiguity attitudes can be analyzed in a way that parallels the treatment of risk attitudes in the classical literature. Specifically, assume that
$$\varphi(x) = -\frac{1}{\alpha} e^{-\alpha x}$$
for $\alpha > 0$. In this case, the DM can be said to have a constant ambiguity aversion $\alpha$; when $\alpha \to 0$, the DM's preferences converge to Bayesian preferences with prior $\hat{p}$, whereas when $\alpha \to \infty$, preferences converge to MMEU preferences relative to the support of $\mu$. (See Klibanoff et al. 2005, for details.) Thus, the smooth ambiguity aversion model can be viewed as an extension of the maxmin model, in its Waldean interpretation.
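A minimal sketch (assumed numbers, not the authors') of the smooth criterion $V(f) = \int \varphi(\int u(f)\,dP)\,d\mu$ with the exponential $\varphi$ above and a uniform second-order prior over two candidate models, reported as a certainty equivalent so that the two limits are visible.

```python
import numpy as np

models = np.array([[0.4, 0.6],     # two candidate first-order models over two states
                   [0.6, 0.4]])
mu = np.array([0.5, 0.5])           # uniform second-order prior
u_f = np.array([1.0, 0.0])          # assumed state-wise utilities of act f
eu = models @ u_f                   # expected utility of f under each model: [0.4, 0.6]

def smooth_certainty_equivalent(alpha):
    """phi^{-1}( E_mu[ phi(expected utility) ] ) for phi(x) = -exp(-alpha * x) / alpha."""
    phi = lambda x: -np.exp(-alpha * x) / alpha
    phi_inv = lambda y: -np.log(-alpha * y) / alpha
    return phi_inv(mu @ phi(eu))

for alpha in (0.01, 1.0, 100.0):
    print(alpha, smooth_certainty_equivalent(alpha))
# alpha -> 0: approaches 0.5, expected utility under the reduced prior p-hat;
# alpha -> infinity: approaches 0.4, the minimum over the support of mu (maxmin).
```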
40. Bayes (1763) himself writes in his Proposition 10 that "the chance that the probability of the event lies somewhere between . . . " (at the beginning of his essay, in Definition 6, Bayes says that "By chance I mean the same as probability").
Variational Preferences
Hansen and Sargent, motivated by the problem of model uncertainty, proposed to evaluate acts by the criterion
$$V(f) = \min_{P \in \Delta(\Sigma)} \left\{ \int_S u(f(s))\, dP(s) + \theta\, R(P \,\|\, Q) \right\}, \qquad (21.15)$$
where $\theta > 0$ and $R(\cdot \,\|\, Q): \Delta(\Sigma) \to [0, \infty]$ is the relative entropy with respect to $Q$. Preferences $\succsim$ on $\mathcal{F}$ represented by criterion (21.15) are called multiplier preferences by Hansen and Sargent. The relative entropy $R(P \,\|\, Q)$ measures the relative likelihood of the alternative models $P$ with respect to the reference model $Q$. The positive parameter $\theta$ reflects the weight that agents give to the possibility that $Q$ might not be the correct model (as $\theta$ becomes larger, agents focus more on $Q$ as the correct model, giving less importance to the alternatives $P$).
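A rough numerical sketch (assumed utilities and reference model) of the multiplier criterion (21.15) on two states, minimizing over a grid of alternative models P.

```python
import numpy as np

Q = np.array([0.5, 0.5])       # reference model
u_f = np.array([1.0, 0.0])     # assumed state-wise utilities of act f

def relative_entropy(P, Q):
    """R(P || Q) = sum_s P(s) * log(P(s) / Q(s)), with the convention 0 * log 0 = 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))

def multiplier_value(theta, grid=np.linspace(1e-6, 1 - 1e-6, 2001)):
    """min over P = (p, 1 - p) of expected utility plus theta times relative entropy."""
    return min(np.array([p, 1 - p]) @ u_f
               + theta * relative_entropy(np.array([p, 1 - p]), Q)
               for p in grid)

for theta in (10.0, 1.0, 0.01):
    print(theta, multiplier_value(theta))
# large theta: close to E_Q[u(f)] = 0.5; small theta: close to the worst case 0.0.
```

For the entropy penalty there is a known closed form, $V(f) = -\theta \log E_Q[\exp(-u(f)/\theta)]$, which the grid minimum approximates.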
Model uncertainty, which motivated the study of multiplier preferences, is clearly akin to the problem of ambiguity underlying maxmin preferences. Yet, neither class of preferences is nested in the other. A priori, it was not clear what the commonalities between these models are and how they can be theoretically justified. To address this issue, MMR introduced and axiomatized a novel class of preferences that includes both multiplier and maxmin preferences as special cases.
Specifically, observe that the maxmin criterion (21.10) can be written as
$$V(f) = \min_{P \in \Delta(\Sigma)} \left\{ \int_S u(f(s))\, dP(s) + \delta_C(P) \right\}, \qquad (21.16)$$
where $\delta_C: \Delta(\Sigma) \to [0, \infty]$ is the indicator function of $C$ (in the sense of convex analysis), which takes the value $0$ on $C$ and $\infty$ outside $C$. Like the relative entropy, the indicator function is a convex function defined on the simplex $\Delta(\Sigma)$. This suggests the following general representation:
$$V(f) = \min_{P \in \Delta(\Sigma)} \left\{ \int_S u(f(s))\, dP(s) + c(P) \right\}, \qquad (21.17)$$
where $c: \Delta(\Sigma) \to [0, \infty]$ is a convex function on the simplex.
Lemma 13 (MMR, p. 1454) shows that axiom GS.3 actually involves two types of independence: independence relative to mixing with constants and independence relative to the weights used in such mixing. The next axiom, due to MMR, retains the first form of independence, but not the second one.

MMR.3 WEAK C-INDEPENDENCE: If $f, g \in \mathcal{F}$, $p, q \in \Delta(X)$, and $\alpha \in (0,1)$, then
$$\alpha f + (1-\alpha) p \succsim \alpha g + (1-\alpha) p \;\Longrightarrow\; \alpha f + (1-\alpha) q \succsim \alpha g + (1-\alpha) q.$$

AA.7 UNBOUNDEDNESS: There exist $x \succ y$ in $X$ such that, for all $\alpha \in (0,1)$, there exists $z \in X$ satisfying either $y \succ \alpha z + (1-\alpha) x$ or $\alpha z + (1-\alpha) y \succ x$.
We can now state the representation result of MMR, which generalizes Theorem 4 by allowing for general functions $c: \Delta(\Sigma) \to [0, \infty]$. Here $x_f$ denotes the certainty equivalent of act $f$; i.e., $f \sim x_f$.

Theorem 14. Let $\succsim$ be a binary relation on $\mathcal{F}$. The following conditions are equivalent:

(i) $\succsim$ satisfies conditions AA.1, AA.2, MMR.3, AA.4, AA.5, S.6, and AA.7;
(ii) there exists an affine function $u: X \to \mathbb{R}$, with $u(X)$ unbounded, and a grounded,41 convex, and lower semicontinuous function $c: \Delta(\Sigma) \to [0, \infty]$ such that, for all $f, g \in \mathcal{F}$,
$$f \succsim g \iff \min_{P \in \Delta(\Sigma)} \left\{ \int_S u(f(s))\, dP(s) + c(P) \right\} \;\geq\; \min_{P \in \Delta(\Sigma)} \left\{ \int_S u(g(s))\, dP(s) + c(P) \right\}. \qquad (21.18)$$
Preferences represented by (21.18) are called variational preferences. MMR show how the function $c$ can be viewed as an index of ambiguity aversion, as we will discuss later in section "Ambiguity Aversion". Alternatively, they observe that the function $c$ can be interpreted as the cost for an adversarial opponent of selecting the prior $P$. In any case, the formula
$$c(P) = \sup_{f \in \mathcal{F}} \left\{ u(x_f) - \int_S u(f(s))\, dP(s) \right\} \qquad (21.19)$$
allows one to determine the index $c$ from behavioral (e.g., experimental) data, in that it only requires eliciting $u$ and the certainty equivalents $x_f$.
Behaviorally, maxmin preferences are the special class of variational preferences that satisfy the C-Independence axiom GS.3. For multiplier preferences, however, MMR did not provide the behavioral assumptions that characterize them among variational preferences. This question, left open by MMR, was answered by Strzalecki (2010), who found the sought-after behavioral conditions. They turned out to be closely related to some of Savage's axioms. Strzalecki's findings thus completed the integration of multiplier preferences within the framework of choice under ambiguity.
The weakening of C-Independence in MMR.3 has a natural variation in which independence is restricted to a particular lottery act, but not to a particular weight $\alpha$. Specifically, one may require that, for the worst possible outcome $x$ (if such exists),

41. The function $c: \Delta(\Sigma) \to [0, \infty]$ is grounded if its infimum value is zero.
This decision rule suggests that the DM has a degree of confidence $\varphi(P)$ in each possible prior $P$. The expected utility associated with a prior $P$ is multiplied by the inverse of the confidence in $P$, so that a prior held with low confidence is less likely to determine the minimum confidence-weighted expected utility of $f$.

The intersection of the class of variational preferences with that of confidence preferences is the maxmin model, satisfying C-Independence in its full force.42 See also Ghirardato et al. (2005) for other characterizations of C-Independence.
All the choice models that we reviewed so far feature some violation of the Independence axiom AA.3, which is the main behavioral assumption questioned in the literature on choice under ambiguity in an AA setup. In order to better understand this class of models, Cerreia-Vioglio et al. (2011) recently established a common representation that unifies and classifies them. Since a notion of minimal independence among uncertain acts is, at best, elusive at both a theoretical and an empirical level, this common representation does not use any independence condition on uncertain acts, however weak it may appear.

Cerreia-Vioglio et al. (2011) thus studied uncertainty averse preferences, that is, complete and transitive preferences that are monotone and convex, without any independence requirement on uncertain acts. This general class of preferences includes as special cases variational preferences, confidence preferences, as well as smooth preferences with a concave $\varphi$.
Though no independence assumption is made on uncertain acts, to calibrate risk preferences Cerreia-Vioglio et al. assumed standard independence on lottery acts.

CMMM.3 RISK INDEPENDENCE: If $p, q, r \in \Delta(X)$ and $\alpha \in (0,1)$, then $p \sim q \Rightarrow \alpha p + (1-\alpha) r \sim \alpha q + (1-\alpha) r$.

42. This is so because one axiom relates preferences between mixtures with different coefficients $\alpha, \beta$ and the other – between mixtures with different constant acts.
Along with the other axioms, CMMM.3 implies that the risk preference satisfies the von Neumann-Morgenstern axioms. In the representation result of Cerreia-Vioglio et al. (2011), functions of the form $G: \mathbb{R} \times \Delta(\Sigma) \to (-\infty, \infty]$ play a key role. Denote by $\mathcal{G}(\mathbb{R} \times \Delta(\Sigma))$ the class of these functions such that:

(i) $G$ is quasiconvex on $\mathbb{R} \times \Delta(\Sigma)$,
(ii) $G(\cdot, P)$ is increasing for all $P \in \Delta(\Sigma)$,
(iii) $\inf_{P \in \Delta(\Sigma)} G(t, P) = t$ for all $t \in \mathbb{R}$.

We can now state a version of their main representation theorem.

Theorem 15. Let $\succsim$ be a binary relation on $\mathcal{F}$. The following conditions are equivalent:

(i) $\succsim$ satisfies axioms AA.1, AA.2, CMMM.3, AA.4, AA.5, S.6, and AA.7;
(ii) there exists a non-constant affine $u: X \to \mathbb{R}$, with $u(X) = \mathbb{R}$, and a lower semicontinuous $G: \mathbb{R} \times \Delta(\Sigma) \to (-\infty, \infty]$ that belongs to $\mathcal{G}(\mathbb{R} \times \Delta(\Sigma))$ such that, for all $f$ and $g$ in $\mathcal{F}$,
$$f \succsim g \iff \min_{P \in \Delta(\Sigma)} G\Big( \int_S u(f)\, dP,\, P \Big) \;\geq\; \min_{P \in \Delta(\Sigma)} G\Big( \int_S u(g)\, dP,\, P \Big). \qquad (21.20)$$
In this representation DMs can be viewed as if they considered, through the term $G(\int_S u(f)\, dP, P)$, all possible probabilities $P$ and the associated expected utilities $\int_S u(f)\, dP$ of act $f$. They then behave as if they summarized all these evaluations by taking their minimum. The quasiconvexity of $G$ and the cautious attitude reflected by the minimum in (21.20) derive from the convexity of preferences. Their monotonicity, instead, is reflected by the monotonicity of $G$ in its first argument. The representation (21.20) features both probabilities and expected utilities, even though no independence assumption whatsoever is made on uncertain acts. In other words, this representation establishes a general connection between the language of preferences and the language of probabilities and utilities, in keeping with the tradition of the representation theorems in choice under uncertainty.
Cerreia-Vioglio et al. (2011) show that $G$ can be interpreted as an index of uncertainty aversion, in the sense of section "Ambiguity Aversion" below. Moreover, (21.21) shows that this index can be elicited from choice behavior.

Variational preferences correspond to additively separable functions $G$; i.e., these preferences are characterized by
$$G(t, P) = t + c(P),$$
where $c: \Delta(\Sigma) \to [0, \infty]$ is a convex function. In this case (21.20) reduces to the variational representation (21.18). Smooth preferences with concave $\varphi$ correspond to the uncertainty aversion index given by
The scope of this paper does not allow us to do justice to the variety of decision
models that have been suggested in the literature to deal with uncertainty in a non-
probabilistic way, let alone the otherwise growing literature in decision theory.43
Here we only mention a few additional approaches to the problem of ambiguity.
As mentioned above, Segal (1987, 1990) suggested a risk-based approach to
uncertainty, founded on the idea that people do not reduce compound lotteries.
Recently, Halevy (2007) provided some experimental evidence on the link between
lack of reduction of compound lotteries and ambiguity, and Seo (2009) carried out
an in depth theoretical analysis of this issue. Since failure to reduce compound
lotteries is often regarded as a mistake, this source of ambiguity has a stronger
positive flavor than the absence of information, which is our main focus.
Stinchcombe (2003), Olszewski (2007), and Ahn (2008) model ambiguity through sets of lotteries, capturing exogenous or objective ambiguity. (See also Jaffray (1988), who suggested related ideas.) Preferences are defined over these sets, with singleton and nonsingleton sets modelling risky and ambiguous alternatives, respectively. For example, these sets can be ranked either according to the criterion $V(A) = \int_A \varphi \circ u\, d\mu / \mu(A)$, where $\varphi$ and $\mu$ model ambiguity attitudes (Ahn 2008), or the criterion $V(A) = \alpha \min_{l \in A} U(l) + (1-\alpha) \max_{l \in A} U(l)$, where $\alpha$ models ambiguity attitudes (Olszewski 2007). Viero (2009) combines this approach with the Anscombe-Aumann model.
Chateauneuf et al. (2007) axiomatize neo-additive Choquet expected utility, a tractable CEU criterion of the "Hurwicz" form
$$V(f) = \alpha \int_S u(f(s))\, dP(s) + \beta \max_s u(f(s)) + (1 - \alpha - \beta) \min_s u(f(s)).$$
Through the values of the weights $\alpha$ and $\beta$, the preference functional $V$ captures in a simple way different degrees of optimism and pessimism, whose extreme forms are given by the min and max of $u(f(s))$.
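A small sketch (weights, baseline prior, and utilities are all assumed) of the neo-additive criterion just displayed.

```python
import numpy as np

def neo_additive_value(utilities, P, alpha, beta):
    """alpha * E_P[u] + beta * max(u) + (1 - alpha - beta) * min(u)."""
    u = np.asarray(utilities, dtype=float)
    assert 0 <= alpha and 0 <= beta and alpha + beta <= 1
    return alpha * float(P @ u) + beta * u.max() + (1 - alpha - beta) * u.min()

P = np.array([1 / 3, 1 / 3, 1 / 3])   # assumed baseline prior over three states
u_f = [0.0, 10.0, 20.0]               # assumed state-wise utilities of an act

print(neo_additive_value(u_f, P, alpha=1.0, beta=0.0))  # 10.0: pure expected utility
print(neo_additive_value(u_f, P, alpha=0.6, beta=0.1))  # 8.0: partly pessimistic, partly optimistic
print(neo_additive_value(u_f, P, alpha=0.0, beta=0.5))  # 10.0: Hurwicz 50-50 between max and min
```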
43. Other sub-fields include choices from menus, decision under risk, minmax regret approaches, and others. On the first of these, see Lipman and Pesendorfer (2013).
with $x \succ y$. Biseparable preferences include both CEU and MMEU. Ghirardato and
Marinacci (2001) provide a definition of uncertainty aversion that does not depend
on the specific model of decision making and applies to all biseparable preferences.
More recently, Machina (2005) suggested a general approach to preferences
under uncertainty which, similarly to Machina (1982), assumes mostly smoothness
and monotonicity of preferences, but remains silent regarding the actual structure of
preferences, thereby offering a highly flexible model.
Ambiguity Aversion
Here we present the approach of Ghirardato and Marinacci because of its sharper model implications. This approach relies on two key ingredients:

(i) A comparative notion of ambiguity aversion that, given any two preferences $\succsim_1$ and $\succsim_2$ on $\mathcal{F}$, says when $\succsim_1$ is more ambiguity averse than $\succsim_2$.
(ii) A benchmark for neutrality to ambiguity; that is, a class of preferences on $\mathcal{F}$ that are viewed as neutral to ambiguity.

The choice of these ingredients in turn determines the absolute notion of ambiguity aversion, because a preference $\succsim$ on $\mathcal{F}$ is classified as ambiguity averse provided it is more ambiguity averse than an ambiguity neutral one.

The comparative notion (i) is based on comparisons of acts with lottery acts that deliver a lottery $p$ at all states. We consider them here because they are the most obvious example of unambiguous acts, that is, acts whose outcomes are not affected by the unknown probabilities.
Consider DM$_1$ and DM$_2$, whose preferences on $\mathcal{F}$ are $\succsim_1$ and $\succsim_2$, respectively. Suppose that
$$f \succsim_1 p,$$
that is, DM$_1$ prefers the possibly ambiguous act $f$ to the unambiguous lottery act $p$. If DM$_1$ is more ambiguity averse than DM$_2$, it is natural to expect that DM$_2$ will also exhibit such preferences:
$$f \succsim_2 p.$$
For, if DM$_1$ is bold enough to have $f \succsim_1 p$, then DM$_2$ – who dislikes ambiguity no more than DM$_1$ – must be at least equally bold.

We take this as the behavioral characterization of the comparative notion of ambiguity aversion.

Definition 16. Given two preferences $\succsim_1$ and $\succsim_2$ on $\mathcal{F}$, $\succsim_1$ is more ambiguity averse than $\succsim_2$ if, for all $f \in \mathcal{F}$ and $p \in \Delta(X)$,
$$f \succsim_1 p \;\Longrightarrow\; f \succsim_2 p. \qquad (21.23)$$
44. Epstein (1999) takes the standard for ambiguity neutrality to be preferences that are probabilistically sophisticated in the sense of Machina and Schmeidler (1992). In his approach Theorem 18 below does not hold.
prejudgment on which preferences qualify for this role. Sharp model implications will follow, nevertheless, as we will see momentarily.

Having thus prepared the ground, we can define ambiguity aversion.

Definition 17. A preference relation $\succsim$ on $\mathcal{F}$ is ambiguity averse if it is more ambiguity averse than some SEU preference on $\mathcal{F}$.
The next result, due to Ghirardato and Marinacci (2002), applies these notions to the maxmin expected utility (MEU) model. Here $u_1 \approx u_2$ means that there exist $\alpha > 0$ and $\beta \in \mathbb{R}$ such that $u_1 = \alpha u_2 + \beta$.

Theorem 18. Given any two MMEU preferences $\succsim_1$ and $\succsim_2$ on $\mathcal{F}$, the following conditions are equivalent:

(i) $\succsim_1$ is more ambiguity averse than $\succsim_2$;
(ii) $u_1 \approx u_2$ and $C_1 \supseteq C_2$ (provided $u_1 = u_2$).

Given that $u_1 \approx u_2$, the assumption $u_1 = u_2$ is just a common normalization of the two utility indices. Therefore, Theorem 18 says that more ambiguity averse MMEU preferences are characterized, up to a normalization, by larger sets of priors $C$. Therefore, the set $C$ can be interpreted as an index of ambiguity aversion.
This result thus provides a behavioral foundation for the comparative statics
exercises in ambiguity through the size of the sets of priors C that play a key role in
the economic applications of the MMEU model. In fact, a central question in these
applications is how changes in ambiguity attitudes affect the relevant economic
variables.
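A minimal sketch (assumed priors and utilities, our own illustration) of this comparative statics: enlarging the set of priors can only lower maxmin evaluations, so condition (21.23) holds when DM1's set contains DM2's.

```python
import numpy as np

u_f = np.array([1.0, 0.0])   # assumed utilities of an ambiguous act f on two states
u_p = 0.45                   # utility of an unambiguous lottery act p

def maxmin(utilities, p_low, p_high, n=201):
    """Worst-case expected utility when the prior on state 1 ranges over [p_low, p_high]."""
    return min(p * utilities[0] + (1 - p) * utilities[1]
               for p in np.linspace(p_low, p_high, n))

V1 = maxmin(u_f, 0.30, 0.70)   # DM1: larger set of priors (more ambiguity averse)
V2 = maxmin(u_f, 0.45, 0.55)   # DM2: smaller set, nested in DM1's
print(V1, V2, V1 <= V2)        # 0.3 0.45 True: hence f >=_1 p would imply f >=_2 p
```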
An immediate consequence of Theorem 18 is that, not surprisingly, MMEU
preferences are always ambiguity averse. That is, they automatically embody a
negative attitude toward ambiguity, an attitude inherited from axiom S.6.
The condition $u_1 \approx u_2$ ensures that risk attitudes are factored out in comparing the MMEU preferences $\succsim_1$ and $\succsim_2$. This is a dividend of the risk calibration provided by the AA setup via the risk preference discussed in section "The Anscombe-Aumann Setup". In a Savage setup, where this risk calibration is no longer available, Definition 16 has to be enriched in order to properly factor out risk attitudes, so that they do not interfere with the comparison of ambiguity attitudes (see Ghirardato and Marinacci (2002) for details on this delicate conceptual issue).
Maccheroni et al. (2006a) generalize Theorem 18 to variational preferences by showing that the condition $C_1 \supseteq C_2$ takes in this case the more general form $c_1 \leq c_2$. The function $c$ can thus be viewed as an index of ambiguity aversion that generalizes the sets of priors $C$. Variational preferences are always ambiguity averse, a fact that comes as no surprise since they satisfy axiom S.6.
For CEU preferences, Ghirardato and Marinacci (2002) show that more ambi-
guity averse CEU preferences are characterized, up to a common normalization
of utility indexes, by smaller capacities . More interestingly, they show that
CEU preferences are ambiguity averse when the cores of the associated capacities
are nonempty. Since convex capacities have nonempty cores, CEU preferences
that satisfy axiom S.6 are thus ambiguity averse. The converse, however, is not
true since there are capacities with nonempty cores that are not convex. Hence,
there exist ambiguity averse CEU preferences that do not satisfy S.6, which is
thus a sufficient but not necessary condition for the ambiguity aversion of CEU
preferences. Ghirardato and Marinacci (2002) discuss at length this feature of CEU
preferences, and we refer the interested reader to that paper for details (see also
Chateauneuf and Tallon (2002), who present a notion of weak ambiguity aversion
for CEU preferences, as well as Montesano and Giovannone (1996), who investigate
how CEU preferences may reflect aversion to increasing ambiguity).
Unambiguous events. Unambiguous events should be events over which decision makers do not perceive any ambiguity. Intuitively, in terms of functional forms, an event $E$ is unambiguous for a preference $\succsim$ if:

(i) $\nu(E) + \nu(E^c) = 1$ when $\succsim$ is CEU;
(ii) $P(E) = P'(E)$ for all $P, P' \in C$ when $\succsim$ is MMEU and, more generally, for all $P, P' \in \mathrm{dom}\, c$ when $\succsim$ is variational;45
(iii) $P(E) = k$ $\mu$-a.e. for some $k \in [0,1]$ when $\succsim$ is smooth.
A few behavioral underpinnings of these notions of unambiguous event have been proposed by Nehring (1999), Epstein and Zhang (2001), Ghirardato and Marinacci (2002), Zhang (2002), Ghirardato et al. (2004), Klibanoff et al. (2005), and Amarante and Feliz (2007) (who also provide a discussion of some of the earlier notions, to which we refer the interested reader).
Updating Beliefs
How should one update one's beliefs when new information is obtained? In the case of probabilistic beliefs there is an almost complete unanimity that Bayes's rule is the only sensible way to update beliefs. Does it have an equivalent rule for the alternative models discussed above? The answer naturally depends on the particular non-Bayesian model one adopts. At the risk of over-generalizing from a small sample, we suggest that Bayes's rule can typically be extended to non-Bayesian beliefs in more than one way. Since the focus of this survey is on static preferences, we mention only a few examples, which by no means exhaust the richness of dynamic models.

For instance, if one's beliefs are given by a capacity $\nu$, and one learns that an event $B$ has obtained, one may assign to an event $A$ the weight corresponding to the straightforward adaptation of Bayes's formula:
$$\nu(A \mid B) = \frac{\nu(A \cap B)}{\nu(B)}.$$
45. $\mathrm{dom}\, c$ is the effective domain of the function $c$; i.e., $\mathrm{dom}\, c = \{P \in \Delta(\Sigma) : c(P) < +\infty\}$.
However, another formula has been suggested by Dempster (1967; see also Shafer 1976) as a special case of his notion of merging of belief functions:
$$\nu(A \mid B) = \frac{\nu((A \cap B) \cup B^c) - \nu(B^c)}{1 - \nu(B^c)}.$$
Clearly, this formula also boils down to standard Bayesian updating in case $\nu$ is additive. Yet, the two formulae are typically not equivalent if the capacity fails to be additive. Each of these formulae extends some, but not all, of the interpretations of Bayesian updating from the additive to the non-additive case.
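A compact sketch (our own toy capacity on three states) comparing the two conditioning rules just displayed; both coincide with Bayes's rule when the capacity is additive.

```python
S = frozenset({1, 2, 3})
nu = {frozenset(): 0.0,                      # illustrative, non-additive capacity
      frozenset({1}): 0.1, frozenset({2}): 0.1, frozenset({3}): 0.1,
      frozenset({1, 2}): 0.4, frozenset({1, 3}): 0.4, frozenset({2, 3}): 0.4,
      S: 1.0}

def naive_update(nu, A, B):
    """Straightforward adaptation of Bayes's formula: nu(A | B) = nu(A & B) / nu(B)."""
    return nu[A & B] / nu[B]

def dempster_shafer_update(nu, A, B):
    """Dempster-Shafer rule: (nu((A & B) union B^c) - nu(B^c)) / (1 - nu(B^c))."""
    Bc = S - B
    return (nu[(A & B) | Bc] - nu[Bc]) / (1 - nu[Bc])

A, B = frozenset({1}), frozenset({1, 2})
print(naive_update(nu, A, B), dempster_shafer_update(nu, A, B))
# 0.25 vs 0.333...: the two rules disagree once nu fails to be additive.
```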
If beliefs are given by a set of priors $C$, and event $B$ is known to have occurred, a natural candidate for the set of priors on $B$ is simply the same set $C$, where each probability is updated according to Bayes's rule. This results in full Bayesian updating (FBU), defining the set of priors (on $B$)
$$C_B = \{\, p(\cdot \mid B) \mid p \in C \,\}.$$
FBU allows standard learning given each possible prior, but does not reflect any learning about the set of priors that should indeed be taken into consideration. It captures Bayesian learning (conditional on a prior) but not the statistical inference typical of classical statistics, namely, the selection of subsets of distributions from an a priori given set of distributions. If we were to think of each prior $p$ in $C$ as an expert who expresses her probabilistic beliefs, FBU can be interpreted as if each expert were learning from the evidence $B$, while the DM does not use the evidence to decide which experts' advice to heed.46
Following this line of reasoning, and in accordance with statistical principles, one may wish to select probabilities from the set $C$ based on the given event $B$. One, admittedly extreme, way of doing so is to adopt the maximum likelihood principle. This suggests that only the priors that a priori assigned the highest probability to the event $B$ should be retained among the relevant ones. Thus, maximum likelihood updating (MLU) is given by
$$C_B^{M} = \Big\{\, p(\cdot \mid B) \;\Big|\; p \in \arg\max_{q \in C} q(B) \,\Big\}.$$
If one’s beliefs are given by a convex capacity, or, equivalently, by a set C which
is the core of a convex capacity, MLU is equivalent to Dempster-Shafer’s updating.
This rule has been axiomatized by Gilboa and Schmeidler (1993), whereas FBU,
suggested by Jean-Yves Jaffray, has been axiomatized by Pires (2002).
FBU and MLU are both extreme. Using the experts metaphor, FBU retains all experts, and gives as much weight to those who were right as to those who were wrong.
46. See Seidenfeld and Wasserman (1993), who study counter-intuitive updating phenomena in this context.
Marinacci (1999, 2002b) and Maccheroni and Marinacci (2005). The behavior of
the set of probabilities in the context of the maxmin model was analyzed in Epstein
and Schneider (2007).
Applications
There are many economic models that lead to different qualitative conclusions when
analyzed in a Bayesian way as compared to the alternative, non-Bayesian theories.
The past two decades have witnessed a variety of studies that re-visited classical
results and showed that they need to be qualified when one takes ambiguity into
account. The scope of this paper allows us to mention but a fraction of them. The
following is a very sketchy description of a few studies, designed only to give a
general idea of the scope of theoretical results that need to be re-examined in light
of the limitations of the Bayesian approach.47
Dow and Werlang (1992) analyzed a simple asset pricing model. They showed that, if an economic agent is ambiguity averse as in the CEU or MMEU model, then there will be a range of prices at which she will wish neither to buy nor to sell a financial asset. This range will be of non-zero length even if one ignores transaction costs. To see the basic logic of this result, consider two states of the world, where the probability of the first state, $p$, is only known to lie in the interval $[0.4, 0.6]$. (This will also be the core of a convex capacity.) Assume that a financial asset $X$ yields $1$ in the first state and $-1$ in the second. The MMEU model values both $X$ and $-X$ at $-0.2$. In a Bayesian model, $p$ would be known, and the agent would switch, at a certain price, from demanding $X$ to offering it. This is no longer the case when $p$ is not known. In this case, assuming ambiguity aversion, there will be an interval of prices at which neither $X$ nor $-X$ will seem attractive to the agent. This may explain why people refrain from trading in certain markets. It can also explain why at times of greater volatility one may find lower volumes of trade: with a larger set of probabilities that are considered possible, there will be more DMs who prefer neither to buy nor to sell.48 The question of trade among uncertainty averse agents has also been studied in Billot et al. (2000), Kajii and Ui (2006, 2009), and Rigotti et al. (2008).
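A sketch of the no-trade interval in this two-state example (our own code; a risk-neutral utility is assumed): the agent buys X at price π only if the worst-case expected value of X − π is nonnegative, and sells only if the worst-case expected value of π − X is nonnegative.

```python
import numpy as np

X = np.array([1.0, -1.0])              # asset payoff in the two states
p_grid = np.linspace(0.4, 0.6, 201)    # the set of priors on the first state

def min_expected(payoff):
    """Worst-case expected payoff over the set of priors."""
    return min(p * payoff[0] + (1 - p) * payoff[1] for p in p_grid)

def willing_to_trade(price):
    buy = min_expected(X) - price >= 0    # buying X at `price` beats staying put
    sell = price + min_expected(-X) >= 0  # selling X at `price` beats staying put
    return buy, sell

for price in (-0.3, 0.0, 0.3):
    print(price, willing_to_trade(price))
# For any price strictly between -0.2 and 0.2 the agent neither buys nor sells.
```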
Epstein and Miao (2003) use uncertainty aversion to explain the home bias
phenomenon in international finance, namely, the observation that people prefer to
trade stocks of their own country rather than foreign ones. The intuition is that agents
know the firms and the stock market in their own country better than in foreign
ones. Thus, there is more ambiguity about foreign equities than about domestic
47
Mukerji and Tallon (2004) survey early works in this area.
48
This argument assumes that the decision maker starts with a risk-free portfolio. A trader who
already holds an uncertain position may be satisfied with it with a small set of probabilities, but
wish to trade in order to reduce uncertainty if the set of probabilities is larger.
on the degree of ambiguity, Bose et al. (2006), who study auctions under ambiguity,
Nishimura and Ozaki (2007), who show that an increase in ambiguity changes the
value of an investment opportunity differently than does an increase in risk, Easley
and O’Hara (2009, 2010), who study how ambiguity affects market participation,
and Treich (2010), who studies when the value of a statistical life increases under
ambiguity aversion.
As mentioned above, this list is but a sample of applications and has no claim
even to be a representative sample.
Conclusion
References
Ahn, D. (2008). Ambiguity without a state space. Review of Economic Studies, 75, 3–28.
Akerlof, G. A. (1970). The market for ‘Lemons’: Quality uncertainty and the market mechanism.
The Quarterly Journal of Economics, 84, 488–500.
Al-Najjar, N., & Weinstein, J. L. (2009). The ambiguity aversion literature: A critical assessment.
Economics and Philosophy, 25, 249–284.
Alon, S., & Schmeidler, D. (2014). Purely subjective maxmin expected utility. Journal of Economic
Theory, 152, 382–412
Amarante, M. (2009). Foundations of Neo-Bayesian statistics. Journal of Economic Theory, 144,
2146–2173.
Amarante, M., & Feliz, E. (2007). Ambiguous events and maxmin expected utility. Journal of
Economic Theory, 134, 1–33.
Anscombe, F. J., & Aumann, R. J. (1963). A definition of subjective probability. Annals of
Mathematics and Statistics, 34, 199–205.
Arlo-Costa, H., & Helzner, J. (2010a). Ambiguity aversion: The explanatory power of indetermi-
nate probabilities. Synthese, 172, 37–55.
Arlo-Costa, H., & Helzner, J. (2010b). Ellsberg choices: Behavioral anomalies or new normative
insights? Philosophy of Science, 3, 230–253.
Arrow, K. J. (1970). Essays in the theory of risk-bearing. Amsterdam: North-Holland.
Arrow, K. J., & Hurwicz, L. (1972). An optimality criterion for decision making under ignorance.
In C. F. Carter & J. L. Ford (Eds.), Uncertainty and expectations in economics. Oxford: Basil
Blackwell.
Artzner, P., Delbaen, F., Eber, J. M., & Heath, D. (1997). Thinking coherently. Risk, 10, 68–71.
Artzner, P., Delbaen, F., Eber, J. M., & Heath, D. (1999). Coherent measures of risk. Mathematical
Finance, 9, 203–228.
Aumann, R. J. (1962). Utility theory without the completeness axiom. Econometrica, 30, 445–462.
Aumann, R. J. (1974). Subjectivity and correlation in randomized strategies. Journal of Mathemat-
ical Economics, 1, 67–96.
Aumann, R. J. (1976). Agreeing to disagree. Annals of Statistics, 4, 1236–1239.
Aumann, R. J. (1987). Correlated equilibrium as an expression of bayesian rationality. Economet-
rica, 55, 1–18.
Bayarri, M. J., & Berger, J. O. (2004). The interplay of bayesian and frequentist analysis. Statistical
Science, 19, 58–80.
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Philosophical
Transactions of the Royal Society of London, 53, 370–418.
Berger, J. (2004). The case for objective bayesian analysis. Bayesian Analysis, 1, 1–17.
Bernoulli, J. (1713). Ars Conjectandi. Basel: Thurneysen Brothers (trans. E. D. Sylla, The art of
conjecturing. Johns Hopkins University Press, 2005).
Bertrand, J. (1907). Calcul des probabilités (2nd ed.). Paris: Gauthier-Villars.
Bewley, T. (2002). Knightian decision theory: Part I. Decisions in Economics and Finance, 25,
79–110. (Working paper, 1986).
Billot, A., Chateauneuf, A., Gilboa, I., & Tallon, J.-M. (2000). Sharing beliefs: Between agreeing
and disagreeing. Econometrica, 68, 685–694.
Bose, S., Ozdenoren, E., & Pape, A. (2006). Optimal auctions with ambiguity. Theoretical
Economics, 1, 411–438.
Bridgman, P. W. (1927). The logic of modern physics. New York: Macmillan.
Carnap, R. (1923). Über die Aufgabe der Physik und die Anwendung des Grundsatzes der Einfachstheit. Kant-Studien, 28, 90–107.
Casadesus-Masanell, R., Klibanoff, P., & Ozdenoren, E. (2000). Maxmin expected utility over
savage acts with a set of priors. Journal of Economic Theory, 92, 35–65.
Caskey, J. (2009). Information in equity markets with ambiguity-averse investors. Review of
Financial Studies, 22, 3595–3627.
Castagnoli, E., Maccheroni, F., Marinacci, M. (2003). Expected utility with multiple priors. In
Proceedings of ISIPTA 2003, Lugano.
Cerreia-Vioglio, S., Maccheroni, F., Marinacci, M., & Montrucchio, L. (2011). Uncertainty averse
preferences. Journal of Economic Theory, 146(4), 1275–1330.
Chateauneuf, A., Dana, R.-A., & Tallon, J.-M. (2000). Optimal risk-sharing rules and equilibria
with Choquet expected utility. Journal of Mathematical Economics, 34, 191–214.
Chateauneuf, A., Eichberger, J., & Grant, S. (2007). Choice under uncertainty with the best and
worst in mind: Neo-additive capacities. Journal of Economic Theory, 137, 538–567.
Chateauneuf, A., Maccheroni, F., Marinacci, M., & Tallon, J.-M. (2005). Monotone continuous
multiple priors. Economic Theory, 26, 973–982.
Chateauneuf, A., & Faro, J. H. (2009). Ambiguity through confidence functions. Journal of
Mathematical Economics, 45, 535–558.
Chateauneuf, A., & Tallon, J.-M. (2002). Diversification, convex preferences, and non-empty core
in the Choquet expected utility model. Economic Theory, 19, 509–523.
Chew, H. S., & Sagi, J. (2008). Small worlds: Modeling attitudes toward sources of uncertainty.
Journal of Economic Theory, 139, 1–24.
Choquet, G. (1953). Theory of capacities. Annales de l’Institut Fourier, 5, 131–295.
Cifarelli, D. M., & Regazzini, E. (1996). de Finetti’s contribution to probability and statistics.
Statistical Science, 11, 253–282.
Cyert, R. M., & DeGroot, M. H. (1974). Rational expectations and bayesian analysis. Journal of
Political Economy, 82, 521–536.
Daston, L. (1995). Classical probability in the enlightenment. Princeton: Princeton University
Press.
de Finetti, B. (1931). Sul Significato Soggettivo della Probabilità. Fundamenta Mathematicae, 17,
298–329.
de Finetti, B. (1937). La Prévision: ses Lois Logiques, ses Sources Subjectives. Annales de l'Institut Henri Poincaré, 7, 1–68. (trans. H. E. Kyburg & H. E. Smokler (Eds.), Studies in subjective probability. Wiley, 1963).
DeGroot, M. H. (1975). Probability and statistics. Reading: Addison-Wesley.
Dempster, A. P. (1967). Upper and lower probabilities induced by a multivalued mapping. Annals
of Mathematical Statistics, 38, 325–339.
Denneberg, D. (1994). Non-additive measure and integral. Dordrecht: Kluwer.
Dow, J., & Werlang, S. R. C. (1992). Uncertainty aversion, risk aversion, and the optimal choice
of portfolio. Econometrica, 60, 197–204.
Easley, D., & O’Hara, M. (2009). Ambiguity and nonparticipation: The role of regulation. Review
of Financial Studies, 22, 1817–1843.
Easley, D., & O’Hara, M. (2010). Microstructure and ambiguity. Journal of Finance, 65, 1817–
1846.
Eichberger, J., Grant, S., & Kelsey, D. (2008). Differentiating ambiguity: An expository note.
Economic Theory, 36, 327–336.
Eichberger, J., Grant, S., Kelsey, D., & Koshevoy, G. A. (2011). The α-MEU model: A comment.
Journal of Economic Theory, 146(4), 1684–1698.
Efron, B. (1986). Why Isn’t everyone a Bayesian? The American Statistician, 40, 1–11. With
discussion.
Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms. Quarterly Journal of Economics, 75,
643–669.
Epstein, L. (1999). A definition of uncertainty aversion. Review of Economic Studies, 66, 579–608.
Epstein, L. G., & Marinacci, M. (2007). Mutual absolute continuity of multiple priors. Journal of
Economic Theory, 137, 716–720.
Epstein, L. G., Marinacci, M., & Seo, K. (2007). Coarse contingencies and ambiguity. Theoretical
Economics, 2, 355–394.
Epstein, L. G., & Miao, J. (2003). A two-person dynamic equilibrium under ambiguity. Journal of
Economic Dynamics and Control, 27, 1253–1288.
Epstein, L. G., & Schneider, M. (2007). Learning under ambiguity. Review of Economic Studies,
74, 1275–1303.
Epstein, L. G., & Schneider, M. (2008). Ambiguity, information quality and asset pricing. Journal
of Finance, 63, 197–228.
Epstein, L. G., & Schneider, M. (2010). Ambiguity and asset markets. Annual Review of Financial
Economics, 2, 315–346.
Epstein, L. G., & Wang, T. (1994). Intertemporal asset pricing under Knightian uncertainty.
Econometrica, 62, 283–322.
Epstein, L. G., & Wang, T. (1995). Uncertainty, risk-neutral measures and security price booms
and crashes. Journal of Economic Theory, 67, 40–82.
Epstein, L. G., & Zhang, J. (2001). Subjective probabilities on subjectively unambiguous events.
Econometrica, 69, 265–306.
Ergin, H., & Gul, F. (2009). A theory of subjective compound lotteries. Journal of Economic
Theory, 144, 899–929.
Ergin, H., & Sarver, T. (2009). A subjective model of temporal preferences. Northwestern and
WUSTL, Working paper.
Fischhoff, B., & Bruine De Bruin, W. (1999). Fifty–Fifty=50%? Journal of Behavioral Decision
Making, 12, 149–163.
Fishburn, P. C. (1970). Utility theory for decision making. New York: Wiley.
Frisch, R. (1926). Sur un problème d’économie pure. Norsk Matematisk Forenings Skrifter, 1, 1–
40.
Gajdos, T., Hayashi, T., Tallon, J.-M., & Vergnaud, J.-C. (2008). Attitude toward Imprecise
Information. Journal of Economic Theory, 140, 27–65.
Gärdenfors, P., & Sahlin, N.-E. (1982). Unreliable probabilities, risk taking, and decision making.
Synthese, 53, 361–386.
Garlappi, L., Uppal, R., & Wang, T. (2007). Portfolio selection with parameter and model
uncertainty: a multi-prior approach. Review of Financial Studies, 20, 41–81.
Ghirardato, P. (2002). Revisiting Savage in a conditional world. Economic Theory, 20, 83–92.
Ghirardato, P., Klibanoff, P., & Marinacci, M. (1998). Additivity with multiple priors. Journal of
Mathematical Economics, 30, 405–420.
Ghirardato, P., & Marinacci, M. (2001). Risk, ambiguity, and the separation of utility and beliefs.
Mathematics of Operations Research, 26, 864–890.
Ghirardato, P., & Marinacci, M. (2002). Ambiguity made precise: A comparative foundation.
Journal of Economic Theory, 102, 251–289.
Ghirardato, P., Maccheroni, F., & Marinacci, M. (2004). Differentiating ambiguity and ambiguity
attitude. Journal of Economic Theory, 118, 133–173.
Ghirardato, P., Maccheroni, F., & Marinacci, M. (2005). Certainty independence and the separation
of utility and beliefs. Journal of Economic Theory, 120, 129–136.
Ghirardato, P., Maccheroni, F., Marinacci, M., & Siniscalchi, M. (2003). Subjective foundations
for objective randomization: A new spin on roulette wheels. Econometrica, 71, 1897–1908.
Gilboa, I. (1987). Expected utility with purely subjective non-additive probabilities. Journal of
Mathematical Economics, 16, 65–88.
Gilboa, I. (2009). Theory of decision under uncertainty. Cambridge: Cambridge University Press.
Gilboa, I., Maccheroni, F., Marinacci, M., & Schmeidler, D. (2010). Objective and subjective
rationality in a multiple prior model. Econometrica, 78, 755–770.
Gilboa, I., Postlewaite, A., & Schmeidler, D. (2008). Probabilities in economic modeling. Journal
of Economic Perspectives, 22, 173–188.
Gilboa, I., Postlewaite, A., & Schmeidler, D. (2009). Is it always rational to satisfy Savage’s
axioms? Economics and Philosophy, 25(03), 285–296.
Gilboa, I., Postlewaite, A., & Schmeidler, D. (2012). Rationality of belief or: Why Savage’s axioms
are neither necessary nor sufficient for rationality. Synthese, 187(1), 11–31.
Gilboa, I., & Schmeidler, D. (1989). Maxmin expected utility with a non-unique prior. Journal of
Mathematical Economics, 18, 141–153. (Working paper, 1986).
Gilboa, I., & Schmeidler, D. (1993). Updating ambiguous beliefs. Journal of Economic Theory,
59, 33–49.
Giraud, R. (2005). Objective imprecise probabilistic information, second order beliefs and
ambiguity aversion: An axiomatization. In Proceedings of ISIPTA 2005, Pittsburgh.
Gollier, C. (2011, forthcoming). Does ambiguity aversion reinforce risk aversion? Applications to
portfolio choices and asset pricing. Review of Economic Studies.
Greenberg, J. (2000). The right to remain silent. Theory and Decisions, 48, 193–204.
Guidolin, M., & Rinaldi, F. (2013). Ambiguity in asset pricing and portfolio choice: A review of
the literature. Theory and Decision, 74(2), 183–217.
Gul, F., & Pesendorfer, W. (2008). Measurable ambiguity. Princeton, Working paper.
Hacking, I. (1975). The emergence of probability. Cambridge: Cambridge University Press.
Halevy, Y. (2007). Ellsberg revisited: An experimental study. Econometrica, 75, 503–536.
Halevy, Y., & Feltkamp, V. (2005). A Bayesian approach to uncertainty aversion. Review of
Economic Studies, 72, 449–466.
Hanany, E., & Klibanoff, P. (2007). Updating preferences with multiple priors. Theoretical
Economics, 2, 261–298.
Hanany, E., & Klibanoff, P. (2009). Updating ambiguity averse preferences. The B.E. Journal of
Theoretical Economics, 9(Advances), Article 37.
Hansen, L. P. (2007). Beliefs, doubts, and learning: Valuing macroeconomic risk. American
Economic Review, 97, 1–30.
Hansen, L. P., & Sargent, T. J. (2001). Robust control and model uncertainty. American Economic
Review, 91, 60–66.
Hansen, L. P., & Sargent, T. J. (2008). Robustness. Princeton: Princeton University Press.
Hansen, L. P., Sargent, T. J., & Tallarini, T. D. (1999). Robust permanent income and pricing.
Review of Economic Studies, 66(4), 873–907.
Hansen, L. P., Sargent, T. J., & Wang, N. E. (2002). Robust permanent income and pricing with
filtering. Macroeconomic Dynamics, 6(01), 40–84.
Hart, S., Modica, S., & Schmeidler, D. (1994). A Neo² Bayesian foundation of the maxmin value for two-person zero-sum games. International Journal of Game Theory, 23, 347–358.
Harsanyi, J. C. (1967). Games with incomplete information played by “Bayesian” players, I–III
Part I. The basic model. Management Science, INFORMS, 14(3), 159–182.
Harsanyi, J. C. (1968). Games with incomplete information played by “Bayesian” players Part II.
Bayesian equilibrium points. Management Science, INFORMS, 14(5), 320–334.
Hayashi, T., & Miao, J. (2011). Intertemporal substitution and recursive smooth ambiguity
preferences. Theoretical Economics, 6(3), 423–472.
Hurwicz, L. (1951). Some specification problems and application to econometric models. Econo-
metrica, 19, 343–344.
Huygens, C. (1657). De Ratiociniis in Ludo Aleae. Amsterdam: van Schooten (trans.: E. D. Sylla,
The art of conjecturing. Johns Hopkins University Press, 2005).
Jaffray, J.-Y. (1988). Application of linear utility theory to belief functions. In Uncertainty and
intelligent systems (pp. 1–8). Berlin: Springer.
Jaffray, J. Y. (1989). Coherent bets under partially resolving uncertainty and belief functions.
Theory and Decision, 26(2), 99–105.
Ju, N., & Miao, J. (2012). Ambiguity, learning, and asset returns. Econometrica, 80(2), 559–591.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47, 263–291.
Kajii, A., & Ui, T. (2006). Agreeable bets with multiple priors. Journal of Economic Theory, 128,
299–305.
Kajii, A., & Ui, T. (2009). Interim efficient allocations under uncertainty. Journal of Economic
Theory, 144, 337–353.
Kass, R. E., & Wasserman, L. (1996). The selection of prior distributions by formal rules. Journal
of the American Statistical Association, 91, 1343–1370.
Keynes, J. M. (1921). A treatise on probability. London: MacMillan.
Keynes, J. M. (1937). The general theory of employment. The Quarterly Journal of Economics. From The Collected Writings of John Maynard Keynes (Vol. XIV, pp. 109–123).
Klibanoff, P. (2001a). Stochastically independent randomization and uncertainty aversion. Eco-
nomic Theory, 18, 605–620.
Klibanoff, P. (2001b). Characterizing uncertainty aversion through preference for mixtures. Social
Choice and Welfare, 18, 289–301.
Klibanoff, P., Marinacci, M., & Mukerji, S. (2005). A smooth model of decision making under
ambiguity. Econometrica, 73, 1849–1892.
Klibanoff, P., Marinacci, M., & Mukerji, S. (2009). Recursive smooth ambiguity preferences.
Journal of Economic Theory, 144, 930–976.
Knight, F. H. (1921). Risk, uncertainty, and profit. Boston/New York: Houghton Mifflin.
Kobberling, V., & Wakker, P. P. (2003). Preference foundations for nonexpected utility: A
generalized and simplified technique. Mathematics of Operations Research, 28, 395–423.
Kocherlakota, N. R. (2007). Model fit and model selection. Federal Reserve Bank of St. Louis
Review, 89, 349–360.
Kopylov, I. (2001). Procedural rationality in the multiple prior model. Rochester, Working paper.
Kopylov, I. (2010). Simple axioms for countably additive subjective probability. UC Irvine,
Working paper.
Kreps, D. M. (1979). A representation theorem for “preference for flexibility”. Econometrica:
Journal of the Econometric Society, 47(3), 565–577.
Kreps, D. (1988). Notes on the theory of choice (Underground classics in economics). Boulder:
Westview Press.
Laplace, P. S. (1814). Essai Philosophique sur les Probabilités. Paris: Gauthier-Villars (English ed.,
1951, A philosophical essay on probabilities. New York: Dover).
Levi, I. (1974). On indeterminate probabilities. Journal of Philosophy, 71, 391–418.
Levi, I. (1980). The enterprise of knowledge. Cambridge: MIT.
Lewis, D. (1980). A subjectivist’s guide to objective chance. In R. C. Jeffrey (Ed.), Studies in
inductive logic and probability. Berkeley/Los Angeles: University of California Press.
Lipman, B., & Pesendorfer, W. (2013). Temptation. In D. Acemoglu, M. Arellano, & E. Dekel
(Eds.), Advances in economics and econometrics: Theory and applications. Cambridge:
Cambridge University Press.
Maccheroni, F., & Marinacci, M. (2005). A strong law of large numbers for capacities. The Annals
of Probability, 33, 1171–1178.
Maccheroni, F., Marinacci, M., & Rustichini, A. (2006a). Ambiguity aversion, robustness, and the
variational representation of preferences. Econometrica, 74, 1447–1498.
Maccheroni, F., Marinacci, M., & Rustichini, A. (2006b). Dynamic variational preference. Journal
of Economic Theory, 128, 4–44.
Machina, M. J. (1982). ‘Expected Utility’ analysis without the independence axiom. Econometrica,
50, 277–323.
Machina, M. J. (2004). Almost-objective uncertainty. Economic Theory, 24, 1–54.
Machina, M. J. (2005). ‘Expected Utility/Subjective Probability’ analysis without the sure-thing
principle or probabilistic sophistication. Economic Theory, 26, 1–62.
Machina, M. J., & Schmeidler, D. (1992). A more robust definition of subjective probability.
Econometrica, 60, 745–780.
Marinacci, M. (1999). Limit laws for non-additive probabilities and their frequentist interpretation.
Journal of Economic Theory, 84, 145–195.
Marinacci, M. (2002a). Probabilistic sophistication and multiple priors. Econometrica, 70, 755–
764.
Marinacci, M. (2002b). Learning from ambiguous urns. Statistical Papers, 43, 143–151.
Marinacci, M., & Montrucchio, L. (2004). Introduction to the mathematics of ambiguity. In
I. Gilboa (Ed.), Uncertainty in economic theory. New York: Routledge.
Miao, J. (2004). A note on consumption and savings under Knightian uncertainty. Annals of
Economics and Finance, 5, 299–311.
Miao, J. (2009). Ambiguity, risk and portfolio choice under incomplete information. Annals of
Economics and Finance, 10, 257–279.
Miao, J., & Wang, N. (2011). Risk, uncertainty, and option exercise. Journal of Economic
Dynamics and Control, 35(4), 442–461.
Milnor, J. (1954). Games against nature. In R. M. Thrall, C. H. Coombs, & R. L. Davis (Eds.),
Decision processes. New York: Wiley.
Montesano, A., & Giovannone, F. (1996). Uncertainty aversion and aversion to increasing
uncertainty. Theory and Decision, 41, 133–148.
Mukerji, S. (1998). Ambiguity aversion and the incompleteness of contractual form. American
Economic Review, 88, 1207–1232.
Mukerji, S. (2009). Foundations of ambiguity and economic modelling. Economics and Philoso-
phy, 25, 297–302.
Mukerji, S., & Tallon, J.-M. (2001). Ambiguity aversion and incompleteness of financial markets.
Review of Economic Studies, 68, 883–904.
Mukerji, S., & Tallon, J.-M. (2004). An overview of economic applications of David Schmeidler’s
models of decision making under uncertainty. In I. Gilboa (Ed.), Uncertainty in economic
theory. New York: Routledge.
Nakamura, Y. (1990). Subjective expected utility with non-additive probabilities on finite state
spaces. Journal of Economic Theory, 51, 346–366.
Nau, R. F. (2001, 2006). Uncertainty aversion with second-order utilities and probabilities.
Management Science, 52, 136–145. (see also Proceedings of ISIPTA 2001).
Nau, R. (2011). Risk, ambiguity, and state-preference theory. Economic Theory, 48(2–3), 437–467.
Nehring, K. (1999). Capacities and probabilistic beliefs: A precarious coexistence. Mathematical
Social Sciences, 38, 197–213.
Nehring, K. (2001). Common priors under incomplete information: A unification. Economic
Theory, 18(3), 535–553.
Nishimura, K., & Ozaki, H. (2004). Search and Knightian uncertainty. Journal of Economic Theory,
119, 299–333.
Nishimura, K., & Ozaki, H. (2007). Irreversible investment and Knightian uncertainty. Journal of
Economic Theory, 136, 668–694.
Olszewski, W. B. (2007). Preferences over sets of lotteries. Review of Economic Studies, 74, 567–
595.
Ore, O. (1960). Pascal and the invention of probability theory. American Mathematical Monthly,
67, 409–419.
Ortoleva, P. (2010). Status quo bias, multiple priors and uncertainty aversion. Games and Economic
Behavior, 69, 411–424.
Ozdenoren, E., & Peck, J. (2008). Ambiguity aversion, games against nature, and dynamic
consistency. Games and Economic Behavior, 62, 106–115.
Pascal, B. (1670). Pensées sur la Religion et sur Quelques Autres Sujets.
Pearl, J. (1986). Fusion, propagation, and structuring in belief networks. Artificial Intelligence, 29,
241–288.
Pires, C. P. (2002). A rule for updating ambiguous beliefs. Theory and Decision, 33, 137–152.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization,
3, 225–243.
Ramsey, F. P. (1926a). Truth and probability. In R. Braithwaite (Ed.), The foundation of mathemat-
ics and other logical essays, (1931). London: Routledge and Kegan.
Ramsey, F. P. (1926b). Mathematical logic. Mathematical Gazette, 13, 185–194.
Rigotti, L., & Shannon, C. (2005). Uncertainty and risk in financial markets. Econometrica, 73,
203–243.
Rigotti, L., Shannon, C., Strzalecki, T. (2008). Subjective beliefs and ex ante trade. Econometrica,
76, 1167–1190.
Rosenmueller, J. (1971). On core and value. Methods of Operations Research, 9, 84–104.
Rosenmueller, J. (1972). Some properties of convex set functions, Part II. Methods of Operations
Research, 17, 287–307.
Saito, K. (2015). Preferences for flexibility and randomization under uncertainty. The American
Economic Review, 105(3), 1246–1271.
Sarin, R., & Wakker, P. P. (1992). A simple axiomatization of nonadditive expected utility.
Econometrica, 60, 1255–1272.
Sarin, R., & Wakker, P. P. (1998). Dynamic choice and nonexpected utility. Journal of Risk and
Uncertainty, 17, 87–119.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley. (2nd ed. in 1972, Dover)
Schmeidler, D. (1986). Integral representation without additivity. Proceedings of the American
Mathematical Society, 97, 255–261.
Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Economet-
rica, 57, 571–587. (Working paper, 1982).
Seo, K. (2009). Ambiguity and second-order belief. Econometrica, 77, 1575–1605.
Segal, U. (1987). The Ellsberg paradox and risk aversion: An anticipated utility approach.
International Economic Review, 28, 175–202.
Segal, U. (1990). Two-stage lotteries without the reduction axiom. Econometrica, 58, 349–377.
Seidenfeld, T., & Wasserman, L. (1993). Dilation for sets of probabilities. The Annals of Statistics,
21, 1139–1154.
Shafer, G. (1976). A mathematical theory of evidence. Princeton: Princeton University Press.
Shafer, G. (1986). Savage revisited. Statistical Science, 1, 463–486.
Shapley, L. S. (1972). Cores of convex games. International Journal of Game Theory, 1, 11–26.
(Working paper, 1965).
Siniscalchi, M. (2006a). A behavioral characterization of plausible priors. Journal of Economic
Theory, 128, 91–135.
Siniscalchi, M. (2006b). Dynamic choice under ambiguity. Theoretical Economics, 6(3). Septem-
ber 2011.
Siniscalchi, M. (2009a). Vector expected utility and attitudes toward variation. Econometrica, 77,
801–855.
Siniscalchi, M. (2009b). Two out of three ain’t bad: A comment on ‘The ambiguity aversion
literature: A critical assessment’. Economics and Philosophy, 25, 335–356.
Smith, C. A. B. (1961). Consistency in statistical inference and decision. Journal of the Royal
Statistical Society, Series B, 23, 1–25.
Stinchcombe, M. (2003). Choice and games with ambiguity as sets of probabilities. UT Austin,
Working paper.
Strzalecki, T. (2010, forthcoming). Axiomatic foundations of multiplier preferences. Economet-
rica.
Suppe, F. (1977). The structure of scientific theories. Champaign: University of Illinois Press.
Treich, N. (2010). The value of a statistical life under ambiguity aversion. Journal of Environmental
Economics and Management, 59, 15–26.
Tversky, A., & Fox, C. (1995). Weighing risk and uncertainty. Psychological Review, 102, 269–
283.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of
uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
van Fraassen, B. C. (1989). Laws and symmetry. Oxford: Oxford University Press.
Viero, M.-L. (2009) Exactly what happens after the Anscombe-Aumann race? Representing
preferences in vague environments. Economic Theory, 41, 175–212.
von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior (2nd ed.).
Princeton: Princeton University Press.
Wakker, P. P. (1989a). Continuous subjective expected utility with nonadditive probabilities.
Journal of Mathematical Economics, 18, 1–27.
Wakker, P. P. (1989b). Additive representations of preferences: A new foundation of decision
analysis. Dordrecht: Kluwer.
Wakker, P. P. (1990). Characterizing optimism and pessimism directly through comonotonicity.
Journal of Economic Theory, 52, 453–463.
Wakker, P. P. (1991). Testing and characterizing properties of nonadditive measures through
violations of the sure-thing principle. Econometrica, 69, 1039–1059.
Wakker, P. P. (2010). Prospect theory. Cambridge: Cambridge University Press.
Wald, A. (1950). Statistical decision functions. New York: Wiley.
Walley, P. (1991). Statistical reasoning with imprecise probabilities. London: Chapman and Hall.
Wang, T. (2003a). A class of multi-prior preferences. UBC, Working paper.
Wang, T. (2003b). Conditional preferences and updating. Journal of Economic Theory, 108, 286–
321.
Welch, B. L. (1939). On confidence limits and sufficiency, and particular reference to parameters
of location. Annals of Mathematical Statistics, 10, 58–69.
Yaari, M. E. (1969). Some remarks on measures of risk aversion and on their uses. Journal of
Economic Theory, 1, 315–329.
Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, 55, 95–115.
Zhang, J. (2002). Subjective ambiguity, expected utility, and Choquet expected utility. Economic
Theory, 20, 159–181.
Chapter 22
State-Dependent Utilities
Introduction
Expected utility theory is founded upon at least one of several axiomatic derivations
of probabilities and utilities from expressed preferences over acts, Savage (1954),
deFinetti (1974), Anscombe and Aumann (1963), and Ramsey (1926). These
theories provide for the simultaneous existence of a unique personal probability
over the states of nature and a unique (up to positive affine transformations)
utility function over the prizes such that the ranking of acts is by expected utility.
For example, suppose that there are n states of nature which form the set S = {s_1, ..., s_n} and m prizes in the set Z = {z_1, ..., z_m}. An example of an act is a function f mapping S to Z. That is, if f(s_i) = z_j, then we receive prize z_j if state s_i occurs. (We will consider more complicated acts than this later.) Now, suppose that
This research was reported, in part, at the Indo-United States Workshop on Bayesian Analysis in
Statistics and Econometrics. The research was supported by National Science Foundation grants
DMS-8805676 and DMS-8705646, and Office of Naval Research contract N00014-88-K0013. The
authors would like to thank Morris DeGroot, Bruce Hill, Irving LaValle, Isaac Levi, and Herman
Rubin for helpful comments during the preparation of this paper. We especially thank the associate
editor for the patience and care that was given to this submission.
M.J. Schervish ()
Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
e-mail: [email protected]
T. Seidenfeld
Departments of Philosophy and Statistics, Carnegie Mellon University, Pittsburgh,
PA 15213, USA
J.B. Kadane
Departments of Statistics and Social and Decision Sciences, Carnegie Mellon University,
Pittsburgh, PA 15213, USA
there is a probability over the states such that p_i = Pr(s_i) and that there is a utility U over prizes. To say that acts are ranked by expected utility means that we strictly prefer act g to act f if and only if

∑_{i=1}^{n} p_i U(f(s_i)) < ∑_{i=1}^{n} p_i U(g(s_i)).     (22.1)
Suppose instead that the utility of a prize is allowed to depend on the state of nature, so that we strictly prefer g to f if and only if

∑_{i=1}^{n} p_i U_i(f(s_i)) < ∑_{i=1}^{n} p_i U_i(g(s_i)),     (22.2)

where U_i(z_j) is the utility of prize z_j given that state s_i occurs. However, without restrictions on the degree to which U_i can differ from U_{i'} for i ≠ i', the uniqueness of the personal probability no longer holds. For example, let q_1, ..., q_n be another probability over the states such that p_i > 0 if and only if q_i > 0. Then, for an arbitrary act f, ∑_{i=1}^{n} q_i V_i(f(s_i)) = ∑_{i=1}^{n} p_i U_i(f(s_i)), where V_i(·) = p_i U_i(·)/q_i when q_i > 0 (V_i can be arbitrary when q_i = 0). In this case, it is impossible to determine an agent's personal probability by studying the agent's preferences for acts. Rubin (1987) notes this fact and develops an axiom system that does not lead to a separation of probability and utility. Arrow (1974) considers the problem for insurance. A footnote in Arrow (1974) credits Herman Rubin with raising this same issue in an unpublished 1964 lecture.
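To make the non-uniqueness concrete, here is a minimal numerical sketch (ours, not the authors'; the probabilities and utilities are made up) checking that the pair (q, V) constructed above ranks every act exactly as (p, U) does.

# Two distinct probabilities, p and q, paired with suitably rescaled state-dependent
# utilities, rank all acts identically, so preferences alone cannot recover p.
from itertools import product

states = [0, 1, 2]
prizes = ["z1", "z2"]

p = [0.5, 0.3, 0.2]                        # one candidate personal probability
q = [1/3, 1/3, 1/3]                        # a different probability
U = [{"z1": 0.0, "z2": 1.0},               # U_i: utility of each prize in state i
     {"z1": 0.0, "z2": 2.0},
     {"z1": 0.0, "z2": 4.0}]
V = [{z: p[i] * U[i][z] / q[i] for z in prizes} for i in states]   # V_i = p_i U_i / q_i

def expected_utility(act, prob, util):
    """act maps each state index to a prize."""
    return sum(prob[i] * util[i][act[i]] for i in states)

# Enumerate every act f: S -> Z and compare the two rankings.
acts = list(product(prizes, repeat=len(states)))
for f, g in product(acts, repeat=2):
    assert (expected_utility(f, p, U) < expected_utility(g, p, U)) == \
           (expected_utility(f, q, V) < expected_utility(g, q, V))
print("(p, U) and (q, V) induce the same preference ranking over all acts.")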
DeGroot (1970) begins his derivation of expected utility theory by assuming
that the concept of “at least as likely as” is an undefined primitive. This allows
a construction of probability without reference to preferences. However, DeGroot
also needs to introduce preferences among acts in order to derive a utility function.
In section “State-Independent Utility”, we will examine the axiomatization of
VonNeumann and Morgenstern (1947) together with the extension of Anscombe
and Aumann (1963) to see how it attempts to avoid the non-uniqueness problem
just described. In section “Savage’s Postulates”, we look at the system of Savage
(1954) with the same goal in mind. In section “An Example”, we give an example
that illustrates that the problem can still arise despite the best efforts of those
who have derived the theories. This example leads to a critical examination of
the theory of deFinetti (1974) in section “deFinetti’s Gambling Approach”. While
reviewing an example from Savage (1954) in section “Savage’s ‘Small Worlds’
Example”, we see how close Savage was to discovering the non-uniqueness
problem in connection with his own theory. In section “How to Elicit Unique
Probabilities and Utilities Simultaneously”, we describe a method for obtaining a
unique personal probability and state-dependent utility based on a proposal of Karni
et al. (1983).
State-Independent Utility
Even when the four axioms above hold, there is no requirement that the utility function U be the same conditional on each state of nature. As we did when we constructed Eq. (22.2), we could allow U_i(z_j) = a_i U(z_j) + b_i where each a_i > 0. Then we could let Q(s_i) = [P(s_i)/a_i] / ∑_{k=1}^{n} [P(s_k)/a_k]. It would now be true that H1 ⪯ H2 if and only if ∑_{i=1}^{n} Q(s_i) U_i(H1(s_i)) ≤ ∑_{i=1}^{n} Q(s_i) U_i(H2(s_i)). The uniqueness of the probability in Theorem 1 depends on the use of a state-independent utility U. Hence, one cannot determine an agent's probability from the agent's stated preferences unless one assumes that the agent's utility is state-independent. This may not seem like a serious difficulty when Axiom 4 holds. However, as we will see in section "An Example", the problem is more complicated.
Savage’s Postulates
Savage (1954) gives a set of postulates that does not rely on an auxiliary random-
ization in order to extract probabilities and utilities from preferences. Rather, the
postulates rely on the use of prizes that can be considered as “constant” across
states. Savage’s most general acts are functions from states to prizes. Because he
does not introduce an auxiliary randomization, he requires that there be infinitely
many states. The important features of Savage’s theory, for this discussion, are the
first three postulates and a few definitions. Some of the axioms and definitions are
stated in terms of events, which are sets of states. The postulates of Savage are
consistent with the axioms of section “State-Independent Utility” in that they both
provide models for preference by maximizing expected utility.
The first postulate of Savage is the same as Axiom 1. The second postulate
requires a definition of conditional preference.
Definition 4. Let B be an event. We say that f ⪯ g given B if and only if
• f′ ⪯ g′ for each pair f′ and g′ such that f′(s) = f(s) for all s ∈ B, g′(s) = g(s) for all s ∈ B, and f′(s) = g′(s) for all s ∉ B,
• and f′ ≺ g′ for every such pair or for none.
The second postulate is an analog of Axiom 2. (See Fishburn 1970, p. 193.)
Postulate 2. For each pair of acts f and g and each event B, either f ⪯ g given B or g ⪯ f given B.
Savage has a concept of null event that is similar to the concept of null state from Definition 3.
Definition 5. An event B is null if, for every pair of acts f and g, f ⪯ g given B. An event B is non-null if it is not null.
The third postulate of Savage concerns acts that are constant, such as f(s) = z for all s, where z is a single prize. For convenience, we will call such an act f by the name z also.
Postulate 3. For each non-null event B and each pair of prizes z_1 and z_2 (considered as constant acts), z_1 ⪯ z_2 if and only if z_1 ⪯ z_2 given B.
Savage’s definition of probability relies on Postulate 3.
Definition 6. Suppose that A and B are events. We say that A is at least as likely as B if, for each pair of prizes z and w with z ≺ w, we have f_B ⪯ f_A, where f_A(s) = w if s ∈ A, f_A(s) = z if s ∉ A, f_B(s) = w if s ∈ B, and f_B(s) = z if s ∉ B.
Postulate 2 guarantees that, with f_A and f_B as defined in Definition 6, either f_B ⪯ f_A no matter which pair of prizes z and w one chooses (so long as z ≺ w) or f_A ≺ f_B no matter which pair of prizes one chooses.
Postulate 3 says that the relative values of prizes cannot change between states.
Savage (1954, p. 25) suggests that problems in locating prizes which satisfy this
postulate might be solved by a clever redescription. For example, rather than
describing prizes as “receiving a bathing suit” and “receiving a tennis racket”
(whose relative values change depending on which of the two states “picnic at the
beach” or “picnic in the park” occurs), Savage suggests that the prizes might be
“a refreshing swim with friends,” “sitting alone on the beach with a tennis racket,”
etc. However, we do not see how to carry out such redescriptions while satisfying
Savage’s structural assumption that each prize is available as an outcome under
each state. (What does it mean to receive the prize “sitting alone on the beach with
a tennis racket” when the state “picnic in the park” occurs?)
Our problem, however, is deeper than this. Definition 6 assumes that the absolute
values of prizes do not change from state to state. For example, suppose that A and
B are disjoint and the value of z is 1 for the states in A and 2 for the states in B.
Similarly, suppose that the value of w is 2 for the states in A and 4 for the states
in B. Then, even if A is more likely than B, but is not twice as likely, we would
get f_A ≺ f_B, and we would conclude, by Definition 6, that B is more likely than A.
The example in section “An Example” (using just one of the currencies), as well as
our interpretation of Savage’s “small worlds” problem (in section “Savage’s ‘Small
Worlds’ Example”) suggest that it might be very difficult to find prizes with the
property that their “absolute” values do not change from state to state even though
their “relative” values remain the same from state to state.
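A quick numerical check of the failure just described (the probabilities below are ours, purely for illustration, and we assume for simplicity that A and B exhaust the states):

# z is worth 1 on A and 2 on B; w is worth 2 on A and 4 on B.
P_A, P_B = 0.55, 0.45          # A is more likely than B, but not twice as likely
value_fA = 2 * P_A + 2 * P_B   # f_A pays w on A (worth 2) and z on B (worth 2)
value_fB = 1 * P_A + 4 * P_B   # f_B pays z on A (worth 1) and w on B (worth 4)
print(value_fA, value_fB)      # 2.0 < 2.35, so f_A is strictly dispreferred,
                               # and Definition 6 would wrongly rank B above A.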
deFinetti's Gambling Approach

deFinetti (1974) assumes that there is a set of prizes with numerical values such that utility is linear in the numerical value. That is, a prize numbered 4 is worth twice as much as a prize numbered 2. More specifically, to say that utility is linear in the numerical values of prizes, we mean the following. For each pair of prizes (z_1, z_2) with z_1 < z_2, and each 0 ≤ α ≤ 1, the lottery that pays z_1 with probability 1 − α and pays z_2 with probability α (using the auxiliary randomization of section "State-Independent Utility") is equivalent to the lottery that pays prize (1 − α)z_1 + αz_2 for sure. Using such a set of prizes, deFinetti supposes that an agent will accept certain gambles that pay these prizes. If f is an act, to gamble on f means to accept a contract that pays the agent the prize c(f(s) − x) when state s occurs, where c and x are some values. A negative outcome means that the agent has to pay out, while a positive outcome means that the agent gains some amount.
Definition 7. The prevision of an act f is the number x that one would choose so that all gambles of the form c(f − x) would be accepted, for all small values of c, both positive and negative.
If an agent is willing to gamble on each of several acts, then it is assumed that the agent will also gamble on them simultaneously. (For a critical discussion of this point, see Kadane and Winkler (1988) and Schick (1986).)
Definition 8. A collection of previsions for acts is coherent if, for each finite set of the acts, say f_1, ..., f_n with previsions x_1, ..., x_n respectively, and each set of numbers c_1, ..., c_n, we have sup_{all s} ∑_{i=1}^{n} c_i(f_i(s) − x_i) ≥ 0. Otherwise, the previsions are incoherent.
deFinetti (1974) proves that a collection of previsions of bounded acts is coherent
if and only if there exists a finitely additive probability such that the prevision of
each act is its expected value. This provides a method of eliciting probabilities
by asking an agent to specify previsions for acts such as f(s) = 1 if s ∈ A and f(s) = 0 if s ∉ A. The prevision of such an act f would be its
probability if the previsions are coherent. As plausible as this sounds, the example
below casts doubt on the ability of deFinetti’s program to elicit probabilities
accurately.
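As a small illustration of Definition 8 (our sketch, not from the text), consider previsions for the indicator acts of a three-state partition. If those previsions fail to sum to 1, stakes c_i can be chosen that guarantee the agent a loss in every state, so the previsions are incoherent.

# Previsions for the indicators of a three-state partition; coherence requires they sum to 1.
states = ["s1", "s2", "s3"]

def indicator(state):
    return {s: 1.0 if s == state else 0.0 for s in states}

acts = [indicator(s) for s in states]

def worst_case_gain(previsions, stakes):
    """sup over states of the total payoff sum_i c_i * (f_i(s) - x_i)."""
    return max(sum(c * (f[s] - x) for f, x, c in zip(acts, previsions, stakes))
               for s in states)

coherent = [0.25, 0.25, 0.5]   # the expected values of the indicators under a probability
incoherent = [0.5, 0.4, 0.3]   # sums to 1.2

print(worst_case_gain(coherent, [1, 1, 1]))    # 0.0: these stakes yield no sure loss
print(worst_case_gain(incoherent, [1, 1, 1]))  # about -0.2: a guaranteed loss in every state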
An Example
Let the set of available prizes be various amounts of Dollars. We suppose that there
are three states of nature, which we will describe in more detail later, and we suppose that the agent expresses preferences that satisfy the axioms of section "State-Independent Utility" and the postulates of Savage (1954). Furthermore, suppose that the agent's utility for money is linear. That is, for each state i, U_i($cx) = cU_i($x). In particular, U_i($0) = 0. Now, we offer the agent three horse lotteries H1, H2, and H3 whose outcomes are
State of Nature
s1 s2 s3
H1 $1 $0 $0
H2 $0 $1 $0
H3 $0 $0 $1
Suppose that the agent claims that these three horse lotteries are equivalent. If we
assume that the agent has a state-independent utility, the expected utility of H_i is U($1)P(s_i). It follows, from the fact that the three horse lotteries are equivalent, that P(s_i) = 1/3 for each i.
Next, we alter the set of prizes to be various Yen amounts (the Japanese
currency). Suppose that we offer the agent three Yen horse lotteries H4 , H5 , and
H6 whose outcomes are
State of Nature
s1 s2 s3
H4 100Y 0Y 0Y
H5 0Y 125Y 0Y
H6 0Y 0Y 150Y
If the agent were to claim that these three horse lotteries were equivalent, and if we assumed that the agent used a state-independent utility for Yen prizes, then P(s1)U(100Y) = P(s2)U(125Y) = P(s3)U(150Y). Supposing that the agent's utility is linear in Yen, as it was in Dollars, we conclude that P(s1) = 1.25P(s2) = 1.5P(s3). It follows that P(s1) = 0.4054, P(s2) = 0.3243, and P(s3) = 0.2703. It would seem incoherent for the agent to express both sets of equivalences, since it appears that the agent is now committed to two different probability distributions over the three states. This is not correct, however, as we now see.
Suppose that the three states of nature represent three different exchange rates
between Dollars and Yen: s1 = {$1 is worth 100Y}, s2 = {$1 is worth 125Y}, and s3 = {$1 is worth 150Y}. Suppose further that the agent can change monetary units at the prevailing rate of exchange without any penalty. As far as this agent is concerned, H_i and H_{3+i} are worth exactly the same for i = 1, 2, 3 since, in each state, the prizes they award are worth the same amount. The problem that arises in this example is that the two probability distributions were constructed under incompatible assumptions. The discrete uniform probability was constructed under the assumption that U($1) is the same in all three states, while the other probability was constructed under the assumption that U(100Y) was the same in all three states.
Clearly these cannot both be true given the nature of the states. What saves both
Theorem 1 and Savage’s theory is that preference can be represented by expected
utility no matter which of the two assumptions one makes. Unfortunately, this same
fact makes the uniqueness of the probability relative to the choice of which prizes
count as constants in terms of utility. There are two different representations of the
agent’s preferences by probability and state-independent utility. But what is state-
independent in one representation is state-dependent in the other.
If we allow both types of prizes at once, we can calculate the marginal exchange
rate for the agent. That is, we can ask, “For what value x will the agent claim
that $1 and xY are equivalent?” This question can be answered using either of the
two probability-utility representations and the answers will be the same. First, with
Dollars having constant value, the expected utility of a horse lottery paying $1 in all three states is U($1). The expected value of the horse lottery paying xY in all three states is

(1/3)(x/100)U($1) + (1/3)(x/125)U($1) + (1/3)(x/150)U($1),

using the linearity of utility and the state-specific exchange rates. By setting this expression equal to U($1), we obtain that x = 121.62. Equivalently, we can calculate the exchange rate assuming that Yen have constant value over states. The act paying xY in all states has expected utility U(xY) = 0.01xU(100Y). The act paying $1 in all states has expected utility

0.4054(1.00)U(100Y) + 0.3243(1.25)U(100Y) + 0.2703(1.50)U(100Y) = 1.2162U(100Y).

Setting this equal to 0.01xU(100Y) yields x = 121.62, which is the same exchange rate as calculated earlier.
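The arithmetic is easy to check; the following sketch (ours, not the authors') recomputes both probability vectors and the marginal exchange rate.

# Verifying the example's arithmetic.
rates = [100, 125, 150]            # Yen per Dollar in states s1, s2, s3

# Treating Dollars as the constant prizes, H1 ~ H2 ~ H3 gives the uniform probability.
P_dollar = [1/3, 1/3, 1/3]

# Treating Yen as the constant prizes, H4 ~ H5 ~ H6 forces P(s_i) proportional to 1/rate_i.
w = [1 / r for r in rates]
P_yen = [wi / sum(w) for wi in w]
print([round(p, 4) for p in P_yen])                                # [0.4054, 0.3243, 0.2703]

# Marginal exchange rate x such that $1 is equivalent to xY, under each representation.
x_dollar_view = 1 / sum(p / r for p, r in zip(P_dollar, rates))    # Dollars held constant
x_yen_view = sum(p * r for p, r in zip(P_yen, rates))              # Yen held constant
print(round(x_dollar_view, 2), round(x_yen_view, 2))               # 121.62 121.62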
The implications of this example for elicitation are staggering. Suppose we
attempt to elicit the agent’s probabilities over the three states by offering acts
in Dollar amounts using deFinetti’s gambling approach from section “deFinetti’s
Gambling Approach”. The agent has utility that is linear in both Dollars and Yen
without reference to the states, hence deFinetti’s program will apply. To see this,
select two prizes, such as $0 and $1, to have utilities 0 and 1 respectively. Then, for 0 < x < 1, U($x) must be the value c that makes the following two lotteries equivalent: L1 = $x for certain, and L2 = $1 with probability c and $0 with probability 1 − c. Assuming that Dollars have constant utility, it is obvious that c = x. Assuming that Yen have constant utility, the expected utility of L1 is 1.2162xU(100Y) and the expected utility of L2 is cU(121.62Y). These two are the same if and only if x = c. A similar argument works for x not between 0 and 1, and a similar argument works when the two prizes with utilities 0 and 1 are Yen prizes.
Now, suppose that the agent actually uses the state-independent utility for Dollars
and the discrete uniform distribution to rank acts, but the eliciter does not know
this. The eliciter will try to elicit the agent’s probabilities for the states by offering
gambles in Yen (linear in utility). For example, the agent claims that the gamble
c(f − 40.54) would be accepted for all small values of c, where f(s) = 150Y if s = s3 and equals 0Y otherwise. The reason for this is that, since 150Y equals $1 when s3 occurs, the winnings are $1 when s3 occurs, which has probability 1/3. The marginal exchange rate is 121.62Y for $1, so the appropriate amount to pay (no matter which state occurs), in order to win $1 when s3 occurs, is $1/3, which equals 121.62Y/3 = 40.54Y. Realizing that utility is linear in Yen, the eliciter now decides that Pr(s3) must equal 40.54/150 = 0.2703. Hence, the eliciter elicits the wrong probability, even though the agent is coherent!
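The eliciter's mistaken inference can be traced step by step (our sketch of the arithmetic in the text):

marginal_rate = 121.62                  # Yen per Dollar, implied by the agent's preferences
fair_price_yen = marginal_rate / 3      # the agent's price for "150Y if s3 occurs": 40.54Y
inferred_prob = fair_price_yen / 150    # eliciter assumes Yen utility is state-independent
print(round(fair_price_yen, 2), round(inferred_prob, 4))   # 40.54 0.2703
# The agent's actual probability for s3 (with Dollars treated as constant) is 1/3.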
The expressed preferences satisfy the four axioms of section “State-Independent
Utility”, all of Savage’s postulates, and deFinetti’s linearity condition, but we are
still unable to determine the probabilities of the states based only on preferences.
The problem becomes clearer if we allow both Dollar and Yen prizes at the same
time. Now, it is impossible for a single utility to be state-independent for all prizes.
That is, Axiom 4 and Postulate 3 would no longer hold. Things are more confusing
in deFinetti’s framework, because there is no room for state-dependent utilities.
The agent would appear to have two different probabilities for the same event even
though there would be no incoherency.
Savage's 'Small Worlds' Example

In Section 5.5 of Savage (1954), the topic of small worlds is discussed. An anomaly
occurs in this discussion, and Savage seems to imply that it is an effect of the
construction of the small world. In this section, we briefly introduce small worlds
and then explain why we believe that the anomaly discovered by Savage is actually
another example of the non-uniqueness illustrated in section “An Example”. The
fact that it arose in the discussion of small worlds is a mere coincidence. We show
how precisely the same effect arises without any mention of small worlds.
A small world can be thought of as a description of the states of nature in
which each state can actually be partitioned into several smaller states, but we
don’t actually do the partitioning when making comparisons between acts. For
a mathematical example, Savage mentions the following case. Consider the unit
square S = {(x, y) : 0 ≤ x, y ≤ 1} as the finest possible partition of the states of nature. Suppose, however, that we consider as states the subsets x̄ = {(x, y) : 0 ≤ y ≤ 1} for each x ∈ [0, 1]. The problem that Savage discovers in this example is
the following. It is possible to define small world prizes in a natural way and for
preferences among small world acts to satisfy all of his axioms and, at the same
time, consistently define prizes in the “grand world” consisting of the whole square
S. However, it is possible for the preferences among small world acts to be consistent
with the preferences among grand world acts in such a way that the probability
measure determined from the small world preferences is not the marginal probability
measure over the sets x̄ induced from the grand world probability. As we will see,
the problem that Savage discovers is due to using different prizes as constants in the
two problems. It is not due to the small world but actually will appear in the grand
world as well.
Any grand world act can be considered a small world prize. In fact, the very
reason for introducing small worlds is to deal with the case in which what we count
as a prize turns out to actually be worth different amounts depending on which
of the subdivisions of the small world state of nature occurs. So, suppose we let
the grand world prizes be non-negative numbers and the grand world acts be all bounded measurable functions on S. The grand world probability is uniform over the square and the grand world utility is the numerical value of the prize. In order to guarantee that Savage's axioms hold in the small world, choose the small world prizes to be 0 and positive multiples of a single function h. Assuming that U(h) = 1, the small world probability of a set B̄ = {x̄ : x ∈ B} is (from p. 89 of Savage 1954) Q(B̄) = ∫_B q(x)dx, where

q(x) = ∫_0^1 h(x, y)dy / ∫_0^1 ∫_0^1 h(x, y)dydx.     (22.3)
Unless ∫_0^1 h(x, y)dy is constant as a function of x, Q will not be the marginal distribution induced from the uniform distribution over S. However, even if ∫_0^1 h(x, y)dy is not constant, the ranking of small world acts is consistent with the ranking of grand world acts. Let ch(·, ·), considered as a small world prize, be denoted c̄. Let Ū(c̄) = c denote the small world utility of small world prize c̄. If f̄ is a small world act, then for each x̄, f̄(x̄) = c̄ for some c. The expected small world utility of f̄ is ∫_0^1 Ū(f̄(x̄))q(x)dx. Let the grand world act f corresponding to f̄ be defined by f(x, y) = Ū(f̄(x̄))h(x, y). It follows from (22.3) that Ū(f̄(x̄))q(x) = ∫_0^1 f(x, y)dy / ∫_0^1 ∫_0^1 h(x, y)dydx. Hence, the expected small world utility of f̄ is

∫_0^1 [ ∫_0^1 f(x, y)dy / ∫_0^1 ∫_0^1 h(x, y)dydx ] dx,

which is just a constant times the grand world expected utility of f. Hence, small world acts are ranked in precisely the same order as their grand world counterparts, even though the small world probability is not consistent with the grand world probability.
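A quick numerical sketch of this construction (our own illustration; the particular h is chosen arbitrarily) shows both effects at once: Q is not the uniform marginal, yet small world and grand world rankings agree.

# Small-world construction with h(x, y) = 1 + x, so the integral of h over y is 1 + x.
import numpy as np

xs = np.linspace(0, 1, 2001)
h_col = 1 + xs                              # integral over y of h(x, y), as a function of x
total_h = np.trapz(h_col, xs)               # double integral of h over the unit square (= 1.5)
q = h_col / total_h                         # small world density from Eq. (22.3)
print(round(np.trapz(q * xs, xs), 3))       # mean of Q is about 0.556, not 0.5

# Two small world acts, given by the multiple of h they award at each x-bar.
c1 = 2 - xs
c2 = 0.5 + 2 * xs**2

for c in (c1, c2):
    small_world_eu = np.trapz(c * q, xs)        # integral of c(x) q(x) dx
    grand_world_eu = np.trapz(c * h_col, xs)    # integral of c(x) h(x, y) over the square
    print(round(small_world_eu, 4), round(grand_world_eu, 4))
# Each grand world value is exactly total_h times the small world value,
# so the two rankings coincide.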
We claimed that the inconsistency of the two probabilities is due to the choice of "constants" and not to the small worlds. To see this, let the grand world constants be 0 and the positive multiples of h. Then an act f in the original problem becomes an act f* with f*(x, y) = f(x, y)/h(x, y). That is, the prize that f* assigns to (x, y) is the number of multiples of h(x, y) that f(x, y) is. We define the new probability, for B a two-dimensional Borel set, by R(B) = ∫_B h(x, y)dydx / ∫_S h(x, y)dydx. The expected utility of f* is now ∫_S f*(x, y)h(x, y)dydx / ∫_S h(x, y)dydx = ∫_S f(x, y)dydx / ∫_S h(x, y)dydx. This is just a constant times the original expected utility. Hence, acts are ranked in the same order by both probability-utility representations. Both representations are state-independent, but each one is relative to a different choice of constants. The constants in one representation have different utilities in different states in the other representation. Both representations satisfy Savage's axioms, however. (Note that the small world probability constructed earlier is the marginal probability associated with the grand world probability R, so that Savage's small world problem evaporates.)
How to Elicit Unique Probabilities and Utilities Simultaneously

There is one obvious way to avoid the confusion of the previous examples. That
would be to elicit a unique probability without reference to preferences. This is the
approach taken by DeGroot (1970). This approach requires that the agent have an
understanding of the primitive concept “at least as likely as” in addition to the more
widely understood primitive “is preferred to”. Some decision theorists prefer to
develop the theory solely from preference without reference to the more statistical
primitive “at least as likely as”. It is these latter decision theorists who need an
alternative to the existing theories in order to separate probability from utility.
Karni et al. (1983) (see also Karni 1985) propose a scheme for simultaneously
eliciting probability and state-dependent utility. Their scheme is essentially as
follows. In addition to preferences among horse lotteries, an agent is asked to
state preferences among horse lotteries under the assumption that the agent holds
a particular probability distribution over the states (explicitly, they say on p. 1024,
“. . . contingent upon a strictly positive probability distribution p0 on S”.) And they
require the agent to compare acts with different “contingent” probabilities as well.
Karni (1985) describes these (in a slightly more general setting) as prize-state lotteries, which are functions f̂ from Z × S to ℝ⁺ such that ∑_{(z, s)} f̂(z, s) = 1, and such that the probability f̂(z, s) for each z and s is to be understood in the same sense as the probabilities involved in the lotteries of section "State-Independent Utility". That is, the results of a prize-state lottery are determined by an auxiliary
randomization. The agent is asked to imagine that the state of nature could be chosen
by the randomization scheme rather than by the forces of nature. This is intended
to remove the uncertainty associated with how the state of nature is determined so
that a pure utility can be extracted using Axioms 1, 2, and 3 applied to a preference
relation among prize-state lotteries.
For example, suppose that the agent in section “An Example” expresses a strict
preference for the prize-state lottery that awards $1 in state 2 with probability 1 (f̂($1, s2) = 1) over the one with ĝ($1, s1) = 1. This preference would not be consistent with a state-independent utility for Dollar prizes; however, it would be consistent with a state-independent utility in Yen prizes.
The pure utility elicited in this fashion is a function of both prizes and states, so
that it is actually a state-dependent utility. So long as the preferences among prize-
state lotteries are consistent with the preferences among horse lotteries, the elicited
state-dependent utility can then be assumed to be the agent’s utility. There will then
be a unique probability such that H1 ⪯ H2 if and only if the expected utility of H1
is at most as large as the expected utility of H2 . The type of consistency that (Karni
et al. 1983) require between the two sets of preferences is rather more complicated
than it needs to be. The following simple consistency axiom will suffice.
Axiom 5 (Consistency). For each non-null state s and each pair (f̂_1, f̂_2) of prize-state lotteries satisfying ∑_z f̂_i(z, s) = 1, and some pair of horse lotteries H1 and H2 satisfying H1(s_i) = H2(s_i) for all s_i ≠ s and H1(s) = f_1 and H2(s) = f_2, we have H1 ⪯ H2 if and only if f̂_1 ⪯ f̂_2, where f_1 and f_2 are lotteries that correspond to f̂_1 and f̂_2 as follows: f_i = (f̂_i(z_1, s), ..., f̂_i(z_m, s)), i = 1, 2, in the notation of section "State-Independent Utility".
All that Axiom 5 says is that preferences among prize-state lotteries with all of their
probabilities on the same state must be reproduced as preferences between horse-
lotteries which differ only in that common state.
Theorem 2. Suppose that there are n states of nature and m prizes. Assume that
preferences among horse lotteries satisfy Axioms 1, 2, and 3. Also assume that
preferences among prize-state lotteries satisfy Axioms 1, 2, and 3. Finally, assume
that Axiom 5 holds. Then there exists a unique probability P over the states and a
utility U : Z × S → ℝ, unique up to positive affine transformation, satisfying
1. H1 ⪯ H2 if and only if ∑_{i=1}^{n} P(s_i)U(H1(s_i), s_i) ≤ ∑_{i=1}^{n} P(s_i)U(H2(s_i), s_i), where, for each lottery L = (α_1, ..., α_m), U(L, s_i) stands for ∑_{j=1}^{m} α_j U(z_j, s_i),
2. f̂ ⪯ ĝ if and only if ∑_{i=1}^{n} ∑_{j=1}^{m} f̂(z_j, s_i)U(z_j, s_i) ≤ ∑_{i=1}^{n} ∑_{j=1}^{m} ĝ(z_j, s_i)U(z_j, s_i).
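A toy check of the two representations in Theorem 2 (our illustration; the numbers chosen for P and U(z, s) are made up) with three states and two prizes:

P = [0.5, 0.3, 0.2]                      # probability over states s1, s2, s3
U = {("z1", 0): 0.0, ("z2", 0): 1.0,     # state-dependent utility U(z, s)
     ("z1", 1): 0.2, ("z2", 1): 2.0,
     ("z1", 2): 0.5, ("z2", 2): 4.0}
prizes, states = ["z1", "z2"], [0, 1, 2]

def horse_eu(H):
    """Item 1: H maps each state to a lottery (alpha_z1, alpha_z2) over the prizes."""
    return sum(P[s] * sum(a * U[(z, s)] for a, z in zip(H[s], prizes)) for s in states)

def prize_state_eu(fhat):
    """Item 2: fhat is a probability distribution over prize-state pairs."""
    return sum(p * U[(z, s)] for (z, s), p in fhat.items())

# Axiom 5 in action: two prize-state lotteries concentrated on state s2 ...
f1 = {("z1", 1): 0.7, ("z2", 1): 0.3}
f2 = {("z1", 1): 0.2, ("z2", 1): 0.8}
# ... and horse lotteries that agree off s2 and award the corresponding lotteries at s2.
base = {0: (1.0, 0.0), 2: (0.0, 1.0)}
H1, H2 = {**base, 1: (0.7, 0.3)}, {**base, 1: (0.2, 0.8)}
assert (horse_eu(H1) <= horse_eu(H2)) == (prize_state_eu(f1) <= prize_state_eu(f2))
print(horse_eu(H1), horse_eu(H2), prize_state_eu(f1), prize_state_eu(f2))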
The proof of Theorem 2 makes use of the following theorem from Fishburn (1970, p. 176):
Theorem 3 (Fishburn). Under Axioms 1, 2, and 3, there exist real-valued functions W_1, ..., W_n such that H1 ⪯ H2 if and only if

∑_{i=1}^{n} W_i(H1(s_i)) ≤ ∑_{i=1}^{n} W_i(H2(s_i)),     (22.4)

and the W_i that satisfy (22.4) are unique up to similar positive linear transformations, with W_i constant if and only if s_i is null.
We provide only a sketch of the proof of Theorem 2. Let (W_1, ..., W_n) be the state-dependent utility for horse lotteries guaranteed by Theorem 3, and let V̂ be the utility for prize-state lotteries guaranteed by the theorem of von Neumann and Morgenstern (1947). All we need to show is that there exist c_1, ..., c_n and positive a_1, ..., a_n such that for each i = 1, ..., n,

W_i(z) = a_i V̂(z, s_i) + c_i, for all z.     (22.5)

If (22.5) were true, then it follows directly from (22.4) that U = V̂ would serve as the state-dependent utility and P(s_i) = a_i / ∑_{k=1}^{n} a_k would be the probability. The uniqueness follows from the uniqueness of the W_i and of V̂. To prove (22.5), let s = s_j for some j and suppose that H1, H2, f_1, f_2, f̂_1, and f̂_2 are as in the statement of Axiom 5. Now, consider the set H_j of all horse lotteries H such that H(s_i) = H1(s_i) for all i ≠ j. The stated preferences among this set of horse lotteries satisfy Axioms 1, 2, and 3. Hence there is a utility V_j for this set, and V_j is unique up to positive affine transformation. Clearly, W_j is such a utility, hence we will assume that V_j = W_j. Next, consider the set Ĥ_j of all prize-state lotteries f̂ that satisfy ∑_{k=1}^{m} f̂(z_k, s_j) = 1. The stated preferences among elements of Ĥ_j also satisfy Axioms 1, 2, and 3. Hence there is a utility V̂_j, which is unique up to positive affine transformation. Clearly V̂, with domain restricted to Ĥ_j, is such a utility, hence we will assume that V̂_j = V̂. The mapping T_j : H_j → Ĥ_j defined by T_j(H)(z, s) = 0 for all (z, s) with s ≠ s_j and T_j(H)(z_i, s_j) = α_i, where H(s_j) = (α_1, ..., α_m), is one-to-one, and T_j preserves convex combinations. It then follows from Axiom 5 that, for H1, H2 ∈ H_j, W_j(H1) ≤ W_j(H2) if and only if V̂(T_j(H1)) ≤ V̂(T_j(H2)). Since both V_j = W_j and V̂_j = V̂ are unique up to positive affine transformation, we have W_j = a_j V̂ + b_j for some positive a_j. This proves (22.5).
Discussion
The need for state-dependent utilities arises out of the possibility that what may
appear to be a constant prize may not actually have the same value to an agent in
all states of nature. Much of probability theory and statistical theory deals solely
with probabilities and not with utilities. If probabilities are unique only relative to
a specified utility, then the meaning of much of this theory is in doubt. Much of
statistical decision theory makes use of utility functions of the form U(θ, d), where θ is a state of nature and d is a possible decision. The prize awarded when decision d is chosen and the state of nature is θ is not explicitly mentioned. Rather, the utility of the prize is specified without reference to the prize. Although it would appear that U(θ, d) is a state-dependent utility (as well it might be), one has swept comparisons between states "under the rug." For example, if U(θ, d) = −(θ − d)², one might ask how it was determined that an error of 1 when θ = a has the same utility as an error of 1 when θ = b.
DeGroot (1970) avoids these problems by assuming that the concept of one
event being “at least as likely as” another is understood without definition. He
then proceeds to state axioms that imply the existence of a unique subjective
probability distribution over states of nature. (For a discussion of attempts to derive
quantitative probability from qualitative probability, see Narens 1980.) Further
axioms could then be introduced that govern preference. These would then lead
Chapter 23
Causal Decision Theory
• EXAMPLE: Prisoner’s Dilemma with Twin (PDT). You are caught in a standard,
one-shot prisoner’s dilemma (diagram next page), and the other player is your
twin. You don’t know for sure what Twin will do, but you know that Twin is
amazingly like you psychologically. What you do, he or she too will likely do:
news that you were going to rat would be a good indication that Twin will rat, and
news that you were going to keep mum would be a good sign that Twin will keep
mum. Your sole goal is to minimize your own time in jail: Family feelings affect
you not, and you care not a whit about loyalty, returning good for good, or how
long Twin spends in jail. What course of action is rational for you in pursuit of
your goals?
Many will find the answer easy—though they may disagree with each other on
which the answer is. A standard line on the prisoner’s dilemma rests on dominance:
What you do won’t affect what Twin does. Twin may rat or keep mum, but in either
In the nearly 20 years since this article was written there has been a revolution in the understanding
of causal and counterfactual reasoning. This revolution had its roots in early work by Rubin
(1974), Holland (1986), and Robins (1986), which gave rise to the so-called "potential outcomes"
framework. At roughly the same time the closely related “structural equations/causal graphs”
approach was being developed and used to great effect by Spirtes et al. (1993), and Pearl (2000). In
both treatments counterfactual reasoning plays a leading role in causal inference, just as in causal
decision theory. While the core claims of this article remain true, and the basic structure of causal
decision theory remains intact, these new models provide us with far more sophisticated ways
of representing and identifying causal relationships than were available and widely known when
we wrote. As a result, some of our remarks about “the need for new advances in understanding
of localization in relation to rational belief” have been rendered moot. Readers are encouraged to
investigate these new developments, which we see as great advances.
J.M. Joyce () • A. Gibbard
University of Michigan, Ann Arbor, MI, USA
e-mail: [email protected]; [email protected]
case, you yourself will do better to rat. Whichever Twin is doing, you would spend
less time in jail if you were to rat than if you were to keep mum. Therefore the
rational way to minimize your own time in jail is to rat.
          Mum       Rat
Mum      1, 1     10, 0
Rat      0, 10     9, 9
Another line of argument leads to the opposite conclusion. Assess each act by its
auspiciousness, by how welcome the news would be that you were about to perform
it. News that you’re about to rat would indicate that Twin is likewise about to rat.
That’s bad news; it means a long time in jail, for you as well as for Twin. News
that you’re about to keep mum, on the other hand, would be good news: It indicates
that Twin is likewise about to keep mum, and your both keeping mum will mean
a short time in jail. Keeping mum, then, is the auspicious act, and so—in terms of
your selfish goals—you achieve best prospects by keeping mum.1
The two lines of reasoning, then, lead to opposite conclusions. One or the other,
to be sure, may strike a reader as obviously wrong. Still, if one of them is cogent and
the other not, decision theory should tell us why. Standard theories haven’t spoken,
though, with one voice on this matter. Savage himself (1972) was mostly silent on
issues that would decide between the two lines: his system could be read in more
than one way, and the few pertinent remarks he left us point in opposing directions.
Various other decision-theoretic systems do have implications for this matter. Some
imply that the argument from auspiciousness is correct: the principle of dominance,
these systems entail, doesn’t properly apply to a case like PDT. Taking the other
side, a group called—perhaps somewhat misleadingly—causal decision theorists
have formulated systems according to which the principle of dominance does apply
to this case, and the rational thing to do is to rat.
“Causal” theorists maintain that decision theory requires a notion of causal
dependency, explicit or implicit. Otherwise, they say, the theory will yield the
wrong prescription for cases like PDT. We touch below on how causal notions
might be made explicit for decision theorists’ purposes, and how causality might
be vindicated as empirically respectable. Auspiciousness theorists—or evidential
1
An interesting model of Prisoner’s Dilemma with a twin can be found in Howard (1988). Howard,
who endorses a version of the auspiciousness argument, shows how to write a Basic program for
playing the game which is capable of recognizing and cooperating with programs that are copies
of itself.
decision theorists, as they are called in the literature—have no need for causal terms
in their theory: they manage everything with standard subjective probabilities and
conditional probabilities. Some evidential theorists deny that their theory, properly
construed or developed, really does say not to rat in PDT. They deny, then, that
causal notions must be introduced into decision theory, even if causal theorists are
right about what to do in this case. We touch below on debates between “causal”
theorists and this camp of “evidential” theorists, but mostly stick with the “causal”
theory, explaining it and examining its potentialities.2
Cases with the structure of PDT can’t be rare. The prisoner’s dilemma itself
is a parable, but economics, politics, war, and the like will be full of cases where
one’s own acts suggest how others are acting. Consider, for instance, a sophisticated
speculator playing a market. Mustn't he reasonably take himself as a model of other
sophisticated players? Why should he be unique? A rational agent interacting with
others must escape the hubris of thinking that only he is smart and insightful—
but then he’ll have to take himself as a likely model for the schemings and
reasonings of others. In such cases, different versions of decision theory may
prescribe incompatible actions.
   V(A) = ∑_{i=1}^{n} pr(Si) u(A, Si)                                   (23.1)

where u(A, Si) is the utility of act A for state Si and pr(Si) is the subjective probability
of Si. From (23.1) follows a principle which we'll call the Unqualified Principle of
Dominance, or UPD:
• UPD: If for each Si, u(A, Si) > u(B, Si), then V(A) > V(B).
Which Savage matrix correctly represents a problem, though, must be decided
with care, as is shown by a spoof due to Jeffrey (1967), p. 8:
2
Nozick (1969) introduced PDT and other cases of this kind, focusing his discussion on
Newcomb’s problem, which he credits to physicist William Newcomb. He makes many of the
points that causal theorists have come to accept, but recognizes only one kind of expected utility,
the one we are calling auspiciousness. Stalnaker originated causal expected utility in a 1972 letter
published only much later (Stalnaker 1981). Gibbard and Harper (1978) proselytize Stalnaker’s
proposal, and Lewis (1981) gives an alternative formulation which we discuss below. Gibbard and
Harper (1978) and Lewis (1979b) also discuss PDT along with Newcomb’s problem.
• EXAMPLE: Better Red Than Dead (BRD). I’m an old-time American cold warrior
with scruples, deciding whether or not my country is to disarm unilaterally. I
construct a matrix as follows: My two possible states are that the Soviets invade
and that they don’t. In case they invade, better red than dead; in case they don’t,
better rich than poor. In either case, unilateral disarmament beats armament, and
so by dominance, I conclude, it is rational to disarm.
Now whether or not unilateral disarmament would be rational all told, this
argument can’t be right. As the proponent of deterrence will point out, unilateral
disarmament may decrease the likelihood of the Soviets’ invading, and a scenario
in which they don’t invade is better than one in which they do. The argument from
“dominance” treats these considerations as irrelevant—even if the Soviets are sure
to invade if we disarm and to hold back if we arm.
Savage’s states, then, must be act-independent: They must obtain or not inde-
pendently of what the agent does. How, then, shall we construe this requirement?
The first answer was developed, independently, by Jeffrey (1967) and by Luce
and Krantz (1971): For dominance correctly to apply, they say, the states must be
stochastically (or probabilistically) independent of acts. Where acts A1, . . . , Am are
open to the agent and pr(S/Aj) is the standard conditional probability of S given Aj,3
S is stochastically act-independent iff

   pr(S/A1) = pr(S/A2) = · · · = pr(S/Am).                              (23.2)
3
More precisely, Ai is the proposition that one performs a particular one of the alternative acts open
to one in one’s circumstances. We reserve the notation (SjAj ) for a more general use later in this
chapter.
   V(A) = ∑_{i=1}^{n} pr(Si/A) u(A, Si).                                (23.3)
The Savage formula (23.1) is then a special case of (23.3), for conditions of
evidential act-independence.4 Since UPD follows from (23.1), it follows from (23.3)
plus condition (23.2) of evidential act-independence. Evidential decision theory,
then, has (23.3) as its general formula for expected utility. Its version of the principle
of dominance is UPD qualified by condition (23.2) of evidential act-independence.
In general, it recommends using “auspiciousness” to guide choices: a rational agent
should select an act whose performance would be best news for him—roughly, an
act that provides best evidence for thinking desirable outcomes will obtain.
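To make the evidential calculation concrete, here is a minimal Python sketch (ours, not the authors'): the jail terms come from the matrix above, utilities are taken as negative years in jail, and the conditional credences expressing how strongly your act bears on Twin's act are invented purely for illustration.

    # Evidential expected value V(A) = sum_i pr(S_i/A) u(A, S_i)  -- formula (23.3)
    # States: Twin keeps mum ("mum_t") or rats ("rat_t").
    # Utilities are negative years in jail, read off the matrix above.
    u = {("mum", "mum_t"): -1, ("mum", "rat_t"): -10,
         ("rat", "mum_t"): 0,  ("rat", "rat_t"): -9}

    # Assumed conditional credences: your act is strong evidence of Twin's.
    pr_given = {"mum": {"mum_t": 0.9, "rat_t": 0.1},
                "rat": {"mum_t": 0.1, "rat_t": 0.9}}

    def V(act):
        return sum(pr_given[act][s] * u[(act, s)] for s in ("mum_t", "rat_t"))

    print(V("mum"), V("rat"))   # -1.9 vs -8.1: keeping mum is the "auspicious" act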
Evidential theory has the advantage of avoiding philosophically suspect talk
of causality: Its general formula (23.3) sticks to mathematical operations on
conditional probabilities, and likewise, its requirement (23.2) of evidential act-
independence—its condition, that is, for the Savage formula (23.1) to apply to a
matrix—is couched in terms of conditional probabilities.
The causal theorist, in contrast, maintains that to apply a principle of dominance
correctly, one can’t avoid judgments of causality. One must form degrees of
belief as to the causal structure of the world; one must have views on what is
causally independent of what. Belief in Twin’s causal isolation is a case in point.
Dominance applies to PDT, their contention is, because you and your twin are
causally isolated—and you know it. What you do, you know, will in no way affect
what twin does. The argument, then, invokes a causal notion: the notion of what will
and what won’t causally affect what else. Causal decision theory then recommends
using causal efficacy to guide choices: It holds, roughly, that a rational agent should
select an act whose performance would be likely to bring about desirable results.
The causal decision theorist’s requirement on a state S of a Savage matrix, then,
is the following: The agent must accept that nothing he can do would causally affect
whether or not S obtains. For each act A open to him, he must be certain that S’s
obtaining would not be causally affected by his doing A.
Can the causal theorist find a formula for expected utility that dispenses with this
requirement of believed causal act-independence? A way to do so was proposed
by Stalnaker (1968); see also Gibbard and Harper (1978). It requires a special
conditional connective, which we'll render '→'. Read 'A → B' as saying, "If
A obtained then B would." In other words, either A's obtaining would cause B to
obtain, or B obtains independently (causally) of whether or not A obtains. Then
to say that S is causally independent of which act A1, . . . , An one performs is to say
this: Either S would hold whatever one did, or whatever one did S would fail to hold.
In other words, for every act Ai, we have Ai → S iff S. We can now generalize
the Savage formula (23.1) for the causal theorist's kind of expected utility. Use as
weights, now, the probabilities of conditionals pr(A → Si), as follows:
4
Formula (23.3) is introduced by Jeffrey (1967) and by Luce and Krantz (1971).
   U(A) = ∑_{i=1}^{n} pr(A → Si) u(A, Si)                               (23.4)
Call this U(A) the instrumental expected utility of act A. The Savage formula
   U(A) = ∑_{i=1}^{n} pr(Si) u(A, Si),                                  (23.5)
is then (23.4) for the special case where the following condition holds:

   pr(A → Si) = pr(Si) for every state Si.                              (23.6)

A sufficient condition for this to hold is that, with probability one, Si is causally
independent of A—in other words, that

   pr([A → Si] ↔ Si) = 1 for every act A and state Si.                  (23.7)
Note that for the prisoner's dilemma with twin, condition (23.7) does hold. (Write
Dy and Cy for your defecting or cooperating, and Dt and Ct for Twin's.) Twin,
you know, is causally isolated from you. You know, then, that whether Twin would
defect if you were to defect is just a matter of whether Twin is going to defect
anyway. In other words, you know that Dy → Dt holds iff Dt holds, and so for
you, pr([Dy → Dt] ↔ Dt) = 1. This is an instance of (23.7), and similar informal
arguments establish the other needed instances of (23.7) for the case.
In short, then, causal decision theory can be formulated taking formula (23.4)
for instrumental expected utility as basic. It is instrumental expected utility as given
by (23.4), the causal theorist claims, that is to guide choice. The Savage formula
is then a special case of (23.4), for conditions of known causal act-independence—
where (23.7) holds, so that for each state Si, pr(A → Si) = pr(Si). The Unqualified
Principle of Dominance for U is
• UPD: If for each Si, u(A, Si) > u(B, Si), then U(A) > U(B).
Causal decision theory, in this formulation, has (23.4) as its general formula for
the instrumental expected utility that is to guide choice, and its own version of the
principle of dominance: UPD qualified by condition (23.7) of known causal act-
independence.
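For comparison, here is the same toy calculation run through formula (23.4): under condition (23.7) the weight pr(A → Si) collapses to an unconditional credence pr(Si), and ratting comes out ahead whatever that credence is. Again this is only an illustrative sketch with assumed values, not anything in the original text.

    # Instrumental expected utility U(A) = sum_i pr(A -> S_i) u(A, S_i)  -- formula (23.4)
    # Known causal act-independence (23.7) lets one unconditional credence
    # for Twin's act do all the work; 0.5 is an arbitrary illustrative value.
    pr_twin = {"mum_t": 0.5, "rat_t": 0.5}
    u = {("mum", "mum_t"): -1, ("mum", "rat_t"): -10,
         ("rat", "mum_t"): 0,  ("rat", "rat_t"): -9}

    def U(act):
        return sum(pr_twin[s] * u[(act, s)] for s in ("mum_t", "rat_t"))

    print(U("mum"), U("rat"))   # -5.5 vs -4.5: ratting dominates, whatever pr_twin is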
Evidential and causal decision theorists, in short, accept different general
formulas for the expected utility that is to guide choice, and consequently, they
accept different conditions for the Savage formula to apply, and different principles
of dominance. Causal theory—in the formulation we’ve been expounding—adopts
(23.4) as its formula for expected utility, whereas evidential theory adopts (23.3).
Causal theory, in other words, weighs the values of outcomes by the probabilities
of the relevant conditionals, pr(A → Si), whereas evidential theory weighs them
by the relevant conditional probabilities pr(Si/A). Different conditions, then, suffice,
according to the two theories, for the Savage formula correctly to apply to a matrix,
and consequently for UPD to apply. That makes for distinct principles of dominance:
For the causal theorist, UPD qualified by condition (23.7) of known causal act-
independence, and for the evidential theorist, UPD qualified by condition (23.2) of
evidential act-independence.
What, then, is the contrast on which all this hinges: the contrast between the
probability pr(A → S) of a conditional A → S and the corresponding conditional
probability pr(S/A)? Where the probability measure pr gives your credences—your
degrees of belief—the conditional probability pr(S/A) is the degree to which you'd
believe S if you learned A and nothing else. In the prisoner's dilemma with your
twin, then, pr(Dt/Dy) measures how much you'd expect Twin to rat on learning that
you yourself were about to rat. If pr(Dt/Dy) ≠ pr(Dt/Cy), that doesn't mean that Dt is
in any way causally dependent on whether Dy or Cy obtains. It just means that your
act is somehow diagnostic of Twin's. Correlation is not causation. Probability pr(Dy
→ Dt), on the other hand, is the degree to which you believe that if you were to
defect, then Twin would. There are two circumstances in which this would obtain:
Either Twin is about to defect whatever you do, or your defecting would cause Twin
to defect. To the degree to which pr(Dy → Dt) > pr(Dy → Ct), you give some
credence to the proposition [Dy → Dt] & ¬[Dy → Ct], that Twin would defect if
you did, but not if you cooperated. This is credence in the proposition that your act
will make a causal difference.
In daily life, we guide ourselves by judgments that seem to be conditional: What
would happen if we did one thing, or did another? What would be the effects of the
various alternatives we contemplate? We make judgments on these matters and cope
with our uncertainties. Classic formulations of decision theory did not explicitly
formalize such notions: notions of causal effects or dependency, or of “what would
happen if”. In Ramsey’s and Savage’s versions, causal dependency may be implicit
in the representational apparatus, but this formal apparatus is open to interpretation.
Other theorists had hoped that whatever causal or “would” beliefs are involved
in rational decisions could be captured in the structure of an agent’s conditional
probabilities for non-causal propositions or events.
This last maneuver might have great advantages if it worked, but causal decision
theorists argue that it doesn’t. Causal or “would” notions must somehow be
introduced into decision theory, they claim, if the structure of decision is to be
elucidated by the theory. The introduction can be explicit, as in the general formula
(23.4) for U above, or it can be in the glosses we give—say, in interpretations of
the Savage formula (23.1) or (23.5). If causal theorists are right, then, the theory
of subjective conditional probability won’t give us all we need for describing the
beliefs relevant to decision. We’ll need some way of displaying such beliefs and
theorizing about them.
Causal theorists have differed, though, on how causal beliefs are best represented.
So far, we’ve spoken in Stalnaker’s terms, but we need to say more on what his
treatment consists in, and what some of the alternatives might be for representing
causal decision theory.
First, some terminology. Savage spoke of “states” and “events”, and distin-
guished these from “acts” or “strategies”. The philosophers who developed causal
decision theory often spoke of “propositions”, and include as propositions not
only Savage’s “events” and “states”, but also acts and strategies. That is to say,
propositions can characterize not only what happens independently of the agent,
but also what the agent does—or even what he would do in various eventualities.
A proposition can say that I perform act a or adopt strategy s. Such propositions
can be objects of belief and of desire, and so can be assigned credences (subjective
probabilities) and utilities.
Let A, then, be the proposition that I perform act a. Stalnaker constructs a
conditional proposition A → B, which we read as "If I did A, then B would
obtain." How does such a conditional proposition work? Much as Savage treats an
event as a set of states, so Stalnaker treats a proposition as a set of possible worlds or
maximally specific ways things might have been. Abstractly, the connective '→'
is a two-place propositional function: To each pair of propositions it assigns a
proposition.
Stalnaker hoped originally that this conditional function → could be defined
so that the probability of a conditional is always the corresponding conditional
probability: so that whenever pr(C/A) is defined, pr(A → C) = pr(C/A). Lewis
(1976) proved that—with trivial exceptions—no such equality will survive con-
ditionalization. Read pr_A, in what follows, as the probability measure pr conditioned
on A, so that by definition, pr_A(C) = pr(C/A). What Lewis showed impossible is
this: that for all propositions A, C and B for which pr(A&B) > 0, one has
pr_B(A → C) = pr_B(C/A). For if this did obtain, then one would have both
pr_C(A → C) = pr_C(C/A) and pr_¬C(A → C) = pr_¬C(C/A). But then

   pr(A → C) = pr_C(A → C) pr(C) + pr_¬C(A → C) pr(¬C)
             = pr_C(C/A) pr(C) + pr_¬C(C/A) pr(¬C)
             = 1 · pr(C) + 0 · pr(¬C) = pr(C),

since pr_C(C/A) = pr(C/A & C) = 1 and pr_¬C(C/A) = pr(C/A & ¬C) = 0.
We'd have pr(A → C) = pr(C/A), then, at most when pr(C) = pr(C/A). No such
equality can survive conditionalization on an arbitrary proposition.
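Lewis's point can be checked by brute force on a toy model. The following sketch (ours) takes an assumed four-world credence function and searches every candidate proposition X for one satisfying pr_B(X) = pr_B(C/A) with B trivial, B = C and B = ¬C; none exists, since here pr(C/A) differs from pr(C).

    from itertools import combinations, chain

    worlds = ["w1", "w2", "w3", "w4"]
    pr = {"w1": 0.1, "w2": 0.4, "w3": 0.3, "w4": 0.2}   # assumed toy credences
    A = {"w1", "w2"}
    C = {"w1", "w3"}

    def prob(X, given=None):
        given = set(worlds) if given is None else given
        base = sum(pr[w] for w in given)
        return sum(pr[w] for w in X & given) / base

    def cond_prob_C_given_A(background):
        # pr_B(C/A) = pr(C / A & B)
        return prob(C, A & background)

    notC = set(worlds) - C
    candidates = chain.from_iterable(combinations(worlds, k) for k in range(5))
    hits = [set(X) for X in candidates
            if all(abs(prob(set(X), B) - cond_prob_C_given_A(B)) < 1e-9
                   for B in (set(worlds), C, notC))]
    print(hits)   # [] -- no candidate conditional proposition works, since pr(C/A) != pr(C)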
How, then, should we interpret the probability pr(A → C) of a conditional
proposition A → C, if it is not in general the conditional probability pr(C/A)? Many
languages contrast two forms of conditionals, with pairs like this one5:

   If Shakespeare didn't write Hamlet, then someone else did.            (23.8)
   If Shakespeare hadn't written Hamlet, then someone else would have.   (23.9)

5
Adams (1975) examines pairs like this.
Conditionals like (23.8) are often called indicative, and conditionals like (23.9)
subjunctive or counterfactual. Now indicative conditional (23.8) seems epistemic:
To evaluate it, you might take on, hypothetically, news that Shakespeare didn’t write
Hamlet. Don’t change anything you now firmly accept, except as you would if this
news were now to arrive. See, then, if given this news, you think that someone else
did write Hamlet. You will, because you are so firmly convinced that Hamlet was
written by someone, whether or not the writer was Shakespeare. The rule for this
can be put in terms of a thinker's subjective probabilities—or her credences, as we
shall say: indicative conditional (23.8) is acceptable to anyone with a sufficiently
high conditional credence pr(E/D) that someone else wrote Hamlet given that
Shakespeare didn’t. Subjunctive conditional (23.9) works differently: If you believe
that Shakespeare did write Hamlet, you will find (23.9) incredible. You’ll accept
(23.8), but have near zero credence in (23.9). Your conditional credence pr(E/D) in
someone else's having written Hamlet given that Shakespeare didn't will be high,
but your credence pr(D → E) in the subjunctive conditional proposition (23.9) will
be near zero. Here, then, is a case where one's credence in a conditional proposition
(23.9) diverges from one's corresponding conditional credence. Speaking in terms
of the subjective "probabilities" that we have been calling credences, we can put the
matter like this: the probability pr(D → E) of a subjunctive conditional may differ
from the corresponding conditional probability pr(E/D).
The reason for this difference lies in the meaning of the → operator. Stalnaker
(1968) puts his account of conditionals in terms of alternative “possible worlds”.
A world is much like a “state” in Savage’s framework (except that it will include
a strategy one might adopt and its consequences). Think of a possible world as a
maximally specific way things might have been, or a maximally specific consistent
proposition that fully describes a way things might have been. Now to say what
the proposition A → C is, we have to say what conditions must obtain for it
to be true. There is no difficulty when the antecedent A obtains, for then, clearly,
A → C holds true if and only if C obtains. The puzzle is for cases where A is
false. In those situations, Stalnaker proposes that we imagine the possible world
wA in which A is true, and that otherwise is most similar to our actual world in
relevant respects. A → C, then, holds true iff C holds in this world wA. Stalnaker
and Thomason offered a rigorous semantics and representation theorem for this
explication. Stalnaker’s distinctive axioms are these6 :
• Intermediate strength: If A necessitates B, then A → B; and if A → B, then
¬(A&¬B).
• Conditional non-contradiction: For possible A, ¬[(A → B)&(A → ¬B)].
• Conditional excluded middle: (A → B) ∨ (A → ¬B).
6
Stalnaker (1968) p. 106, Stalnaker and Thomason (1970), slightly modified.
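The selection-function idea is easy to prototype. The sketch below (ours, with an invented similarity ordering) evaluates A → C at a world by looking at the nearest A-world, and confirms that with a unique nearest world Conditional Excluded Middle comes out true everywhere.

    # Toy Stalnaker semantics: worlds, an assumed similarity ranking, and a
    # selection function picking the single nearest A-world from each world.
    worlds = ["w0", "w1", "w2"]
    similarity = {"w0": ["w0", "w1", "w2"],    # from w0, w1 is closer than w2, etc.
                  "w1": ["w1", "w0", "w2"],
                  "w2": ["w2", "w1", "w0"]}

    def nearest(w, A):
        return next(v for v in similarity[w] if v in A)

    def box_arrow(A, C):
        """Set of worlds where A -> C is true (A assumed possible)."""
        return {w for w in worlds if nearest(w, A) in C}

    A, C = {"w1", "w2"}, {"w1"}
    notC = set(worlds) - C
    print(box_arrow(A, C))                                        # {'w0', 'w1'}
    # Conditional Excluded Middle: every world makes A -> C or A -> not-C true.
    print((box_arrow(A, C) | box_arrow(A, notC)) == set(worlds))  # True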
7
Fine (1975) gives this example to make roughly this point.
8
Shin (1991a), for instance, devises a metric that seems suitable for simple games such as
“Chicken”.
Conditional excluded middle obtains in Stalnaker's model. There are models and
logics for conditionals in which it does not obtain.9 In a case like this, however,
the right weight to use for decision is clearly not the probability pr(F → H) of a
conditional proposition F → H. If I'm convinced that neither F → H nor
F → ¬H obtains, then my subjective probability for each is zero. The weight to use
if I'm betting on the coin, though, should normally be one-half.
A number of ways have been proposed to handle cases like these. One is to think
that with coins and the like, there's a kind of conditional chance that isn't merely
subjective: the chance with which the coin would land heads were I to flip it. Write
this ch_F(H). Then use as one's decision weight the following: one's subjectively
expected value for this objective conditional chance. Suppose you are convinced
that the coin is loaded, but don't know which way: You think that the coin is loaded
either .6 toward heads or .6 toward tails, and have subjective probability of .5 for
each of these possibilities:

   pr(ch_F(H) = .6) = .5                                                (23.10)
   pr(ch_F(H) = .4) = .5                                                (23.11)
9
Lewis (1973) constructs a system in which worlds may tie for most similar, or it may be that
for every A-world, there is an A-world that is more similar. He thus denies Conditional Excluded
Middle: It fails, for instance, when two A-worlds tie for most similar to the actual world, one a
C-world and the other a :C-world.
Your subjectively expected value for ch_F(H), then, will be the average of .6 and .4.
Call this appropriate decision weight ε_F(H). We can express this weighted averaging
in measure-theoretic terms, so that in general,

   ε_A(C) = ∫₀¹ x · pr(ch_A(C) ∈ dx).                                   (23.12)

ε_A(C) thus measures one's subjective expectation of C's obtaining were A to occur:
the sum of (i) the degree to which A's obtaining would tend to bring it about that
C obtained, plus (ii) the degree to which C would tend to hold whether or not A
obtained.
We can now write formulas for U using ε_A(C) where we had previously used
pr(A → C). Formula (23.4) above for instrumental expected utility now becomes

   U(A) = ∑_{i=1}^{n} ε_A(Si) u(A, Si).                                 (23.13)
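For the loaded-coin case the decision weight is just the credence-weighted average of the two possible chances. A two-line calculation (ours), using the .5 credences and the .6/.4 chances from the example above:

    # Decision weight = subjectively expected objective conditional chance.
    # Two hypotheses about the loading, each with credence .5, as in the text.
    chance_heads = {0.6: 0.5, 0.4: 0.5}            # ch_F(H) value -> credence
    eps_F_H = sum(x * p for x, p in chance_heads.items())
    print(eps_F_H)   # 0.5, the weight to use in formula (23.13)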
10
Skyrms (1980) offers another formulation, invoking a distinction between factors that are within
the agent’s control and factors that aren’t. Lewis (1981) discusses both Skyrms and unpublished
work of Jordan Howard Sobel, and Skyrms (Skyrms 1984), 105–6, compares his formulation with
those of Lewis (1981) and Stalnaker (1981).
causes as we can about other features of the layout of the world. A rational thinker
forms his credences in causal propositions in much the same Bayesian way he does
for any other matter: He updates his subjective probabilities by conditionalizing on
new experience. He starts with reasonable prior credences, and updates them. Sub-
jective probability theorists like de Finetti long ago explained how, for non-causal
propositions, updating produces convergence. The story depends on surprisingly
weak conditions placed on the thinker’s prior credence measure. The same kind
of story, we suspect, could be told for credences in objective chance and objective
dependence.
Lewis (1980) has told a story of this kind for credence in objective chance.
His story rests on what he labels the “Principal Principle”, a condition which
characterizes reasonable credences in objective chances. Take a reasonable credence
measure pr, and a proposition about something that hasn't yet eventuated—say, that
the coin I'm about to flip will land heads. Now conditionalize this credence measure on the
proposition that, as of now, the objective probability of this coin's landing heads is
.6. Then the resulting conditional credence in the coin's landing heads, the principle
says, will likewise be .6. Many features of reasonable credence in objective chance
follow from this principle. From a condition on reasonable prior credences
in objective chance follows an account of how one can learn about them from
experience.
A like project for objective dependency hasn’t been carried through, so far as
we know, but the same broad approach would seem promising.11 In the meantime,
there is much lore as to how experience can lead us to causal conclusions—and
even render any denial of a causal dependency wildly implausible. The dependence
of cancer on smoking is a case in point. Correlation is not causation, we all know,
and a conditional probability is not a degree of causal dependence. (In the notation
introduced above, the point is that pr(C/A) need not be ε_A(C), the subjectively
expected value of the objective chance of C were one to do A.) Still, correlations,
examined with sophistication, can evidence causation. A chief way of checking is
to “screen off” likely common causes: A correlation between smoking and cancer
might arise, say, because the social pressures that lead to smoking tend also to lead to
drinking, and drinking tends to cause cancer. A statistician will check this possibility
by separating out the correlation between smoking and cancer among drinkers on
the one hand, and among non-drinkers on the other. More generally, the technique
is this: A correlation between A and C, imagine, is suspected of being spurious—
suspected not to arise from a causal influence of A on C or vice versa. Let F be a
suspected common cause of A and C that might account for their correlation. Then
see if the correlation disappears with F held constant. Econometricians elaborate
such devices to uncover causal influences in an economy. The methodological
literature on gleaning causal conclusions from experience includes classic articles
by Herbert Simon (see Simon (1957), chs. 1–3).
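The screening-off check is easy to simulate. In the sketch below (ours; all the rates are invented for illustration) drinking is the common cause of smoking and cancer, smoking and cancer are correlated in the full sample, and the correlation vanishes within each drinking stratum.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000
    drink = rng.random(n) < 0.4                             # common cause
    smoke = rng.random(n) < np.where(drink, 0.7, 0.2)       # depends on drinking
    cancer = rng.random(n) < np.where(drink, 0.3, 0.05)     # depends on drinking only

    def corr(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        return np.corrcoef(x, y)[0, 1]

    print(corr(smoke, cancer))                  # clearly positive (spurious)
    print(corr(smoke[drink], cancer[drink]))    # roughly zero: screened off
    print(corr(smoke[~drink], cancer[~drink]))  # roughly zero: screened off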
11
The work of Spirtes et al. (1993) and Pearl (2000) goes a long way toward realizing this goal.
Screening off is not a sure test of causality. A correlation might disappear with
another factor held constant, not because neither factor depends causally on the
other, but because the causal dependency is exactly counterbalanced by a contrary
influence by a third factor.12 Such a non-correlation might be robust, holding reliably
with large sample sizes. But it will also be a coincidence: opposing tendencies may
happen to cancel out, but we can expect such cases to be rare. Lack of correlation
after screening off is evidence of lack of causal influence, but doesn’t constitute lack
of causal influence.
When controlled experiments can be done, in contrast, reasonable credences
in a degree of objective dependency can be brought to converge without limit as
sample size increases. Subjects are assigned to conditions in a way that we all agree
has no influence on the outcome: by means of a chance device, say, or a table of
pseudo-random numbers. Observed correlations then evidence causal dependence
to whatever degree we can be confident that the correlation is no statistical fluke.
With the right kind of partition, then, screening off does yield a reliable test of
causality. But what makes a partition suitable for this purpose, we would claim,
must be specified in terms that somehow invoke causality—in terms, for instance,
of known causal independence.
How we can use evidence to support causal conclusions needs study. Standard
statistical literature is strangely silent on questions of causation, however much
the goals of statistical techniques may be to test and support causal findings. If
we are right, then one class of treatments of causality will fail: namely, attempts
to characterize causal beliefs in terms of the subjective probabilities and the like
of non-causal propositions. Correct treatments must take causality as somehow
basic. A constellation of relations—cause, chance, dependence, influence, laws of
nature, what would happen if, what might likely happen if—are interrelated and
may be intercharacterizable, but they resist being characterized purely from outside
the constellation. Our hope should be that we can show how the right kind of
evidence lets us proceed systematically from causal truisms to non-obvious causal
conclusions. Fortunately, much of decision and game theory is already formulated in
terms that are causal, implicitly at least, or that can be read or interpreted as causal.
(When games are presented in normal form, for instance, it may be understood that
no player’s choice of strategies depends causally on the choice of any other.) A chief
aim of causal decision theory is to make the role of causal beliefs in decision and
game theory explicit.
Ratificationism
12
Gibbard and Harper (1978), 140–2, construct an example of such a case.
13
See, for example, Horgan (1981).
The case has already been made for an agent who is already certain about what
she will decide. But what about agents who have yet to make up their minds? Here
things get dicey. If you have not yet decided what to do, then the probabilities you
assign to Cy and Dy will be far from one. This puts the auspiciousness values of
Cy and Dy near those of (Cy & Ct) and (Dy & Dt) respectively, and since
V(Cy & Ct) > V(Dy & Dt), evidential decision theory tells you to choose cooperation. However,
as soon as you make this choice, you will assign Cy a credence close to one, and as
we have seen, you will then favor defection. Thus, the pursuit of good news forces
you to make choices that you are certain to rue from the moment you make them—
clearly something to avoid.
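The instability can be mimicked numerically. In the sketch below (ours, with assumed credences and jail terms), cooperation maximizes news value while you are undecided; once the decision for cooperation is all but made and has absorbed the evidential import of the act, defection has the higher news value.

    # News value V(A) = sum_s pr(s/A) u(A, s), recomputed as credences shift.
    u = {("C", "Ct"): -1, ("C", "Dt"): -10, ("D", "Ct"): 0, ("D", "Dt"): -9}

    def V(act, pr_twin_given):
        return sum(pr_twin_given[act][s] * u[(act, s)] for s in ("Ct", "Dt"))

    # Undecided: your act is strong evidence of Twin's.
    undecided = {"C": {"Ct": 0.9, "Dt": 0.1}, "D": {"Ct": 0.1, "Dt": 0.9}}
    print(V("C", undecided), V("D", undecided))     # -1.9 > -8.1: choose C

    # Having (all but) decided on C, you are confident Twin cooperates,
    # whichever act you now contemplate: the decision carries the evidence.
    decided_C = {"C": {"Ct": 0.9, "Dt": 0.1}, "D": {"Ct": 0.9, "Dt": 0.1}}
    print(V("C", decided_C), V("D", decided_C))     # -1.9 < -0.9: now D looks better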
Jeffrey hopes to circumvent this difficulty by denying that evidential decision
theory requires one to maximize auspiciousness as one currently estimates it. If
you are savvy, he argues, you will realize that any choice you make will change
some of your beliefs, thereby altering your estimates of auspiciousness. Thus, given
that you want to make decisions that leave you better off for having made them,
you should aim to maximize auspiciousness not as you currently estimate it, but
as you will estimate it once your decision is made. You ought to, “choose for the
person you expect to be when you have chosen,”14 by maximizing expected utility
computed relative to the personal probabilities you will have after having come to a
firm decision about what to do. This is only possible if your choices conform to the
maxim
• Evidential Ratifiability. An agent cannot rationally choose to perform A unless
A is ratifiable, in the sense that V(A & dA) ≥ V(B & dA) for every act B under
consideration, where dA is the decision to perform A.
This principle advises you to ignore your current views about the evidentiary
merits of cooperating versus defecting, and to focus on maximizing future auspi-
ciousness by making choices that you will regard as propitious from the epistemic
perspective you will have once you have made them. Since in the presence of
Screening, defection is the only such choice, the maxim of Evidential Ratifiability
seems to provide an appropriately “evidentialist” rationale for defecting in PDT.
Unfortunately, though, the Screening condition need not always obtain. There
are versions of PDT in which the actual performance of an act provides better
evidence for some desired state than does the mere decision to perform it. This
would happen, for example, if you and Twin are bumblers who tend to have
similar problems carrying out your decisions. The fact that you were able to carry
out a decision would then be evidentially correlated with your twin's act, and this
correlation would not be screened off by the decision itself. In such cases the
evidential ratifiability principle sanctions cooperation.15 Therefore, the Jeffrey/Eells
14
Jeffrey (1983) p. 16.
15
Jeffrey, in his original published treatment of ratificationism (Jeffrey 1983), 20, gives this
counterexample and credits it to Bas van Fraassen. Shin (1991b) treats cases in which the respective
players’ “trembles” are independent of each other.
strategy does not always provide a satisfactory evidentialist rationale for defecting
in PDT. We regard this failure as reinforcing our contention that any adequate
account of rational choice must recognize that decision makers have beliefs about
causal or counterfactual relationships, beliefs that cannot be cashed out in terms of
ordinary subjective conditional probabilities—in terms of conditional credences in
non-causal propositions.
Despite its failure, the Jeffrey/Eells strategy leaves the theory of rational choice
an important legacy in the form of the Maxim of Ratifiability. We will see in this
section that it is possible to assimilate Jeffrey's basic insight into causal decision
theory and that, so understood, it codifies a type of reasoning commonly employed
in game theory. Indeed, the idea that rational players always play their part in a Nash
equilibrium is a special case of ratificationism.
The notion of a ratifiable act makes sense within any decision theory with the
resources for defining the expected utility of one act given the news that another
will be chosen or performed. In causal decision theory the definition would be this:
   U(B/A) = ∑_{i=1}^{n} pr((B → Si)/A) u(B, Si)
is high, it also follows that your subjective probability for Ht will be high. Thus by
recognizing that you plan to play [HEADS], you give yourself evidence for thinking
that Twin will also play [HEADS]. Note, however, this does nothing to alter the fact
that you still judge Twin’s actions to be causally independent of your own; your
subjective probabilities still obey
   pr((Hy → Ht)/Hy) = pr((Ty → Ht)/Hy).

             HEADS      TAILS
   HEADS     −1, 1       1, −1
   TAILS      1, −1     −1, 1

   Matching Pennies
Since pr(Ht) ≈ 1, your overall position is this: because you are fairly sure that you
will play heads, you are fairly sure that Twin will play heads too; but you remain
convinced that she would play heads even if (contrary to what you expect) you were
to play tails. Under these conditions, the conditional expected utility associated with
[TAILS] is larger than that associated with [HEADS] on the supposition that [HEADS]
is played. That is to say, [HEADS] is unratifiable. Similar reasoning shows that
[TAILS] is also unratifiable. In fact, the only ratifiable act in this game is the mixture
½[HEADS] + ½[TAILS].16
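A small check (ours) makes this concrete. It uses the payoffs in the Matching Pennies matrix above and the assumption that Twin is expected to mirror whatever plan you settle on, while remaining causally independent of the act you actually perform.

    # U(B/A): expected utility of playing B given the news that you plan A,
    # where planning A makes you expect Twin to play A with (near) certainty.
    u_row = {("H", "H"): -1, ("H", "T"): 1, ("T", "H"): 1, ("T", "T"): -1}

    def pr_twin_heads(plan_q):        # plan_q = probability your plan gives to H
        return plan_q                 # Twin is expected to mirror your plan

    def U_given_plan(act_mix, plan_q):
        q = pr_twin_heads(plan_q)
        return sum(p_b * (q * u_row[(b, "H")] + (1 - q) * u_row[(b, "T")])
                   for b, p_b in act_mix.items())

    pure_H, pure_T, mix = {"H": 1.0}, {"T": 1.0}, {"H": 0.5, "T": 0.5}

    for name, plan_mix, plan_q in [("HEADS", pure_H, 1.0), ("TAILS", pure_T, 0.0),
                                   ("mixture", mix, 0.5)]:
        rivals = [U_given_plan(m, plan_q) for m in (pure_H, pure_T, mix)]
        ratifiable = U_given_plan(plan_mix, plan_q) >= max(rivals) - 1e-9
        print(name, ratifiable)       # HEADS False, TAILS False, mixture True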
It is no coincidence that this mixture also turns out to be the game’s unique Nash
equilibrium; there is a deep connection between ratifiable acts and game-theoretic
equilibria. Take any two-person game, such as Matching Pennies, for which the
unique Nash equilibrium consists of mixed strategies (which the players in fact
adopt). If a player has predicted the other’s strategy correctly and with certainty, then
by playing the strategy she does, she maximizes her expected utility. But this isn’t
the only strategy that, given her credences, would maximize her expected utility;
any other probability mixture of the same pure strategies would do so too. The
16
Piccione and Rubinstein (1997) present another kind of case in which considerations of
ratifiability may be invoked: the case of the “absent-minded driver” who can never remember
which of two intersections he is at. One solution concept they consider (but reject) is that of
being “modified multi-selves consistent”. In our terms, this amounts to treating oneself on other
occasions as a twin, selecting a strategy that is ratifiable on the following assumption: that one’s
present strategy is fully predictive of one’s strategy in any other situation that is subjectively just
like it. This turns out to coincide with the “optimal” strategy, the strategy one would adopt if one
could choose in advance how to handle all such situations.
strategy she adopts is unique, though, in this way: it is the only strategy that could
be ratifiable, given the assumption that her opponent has predicted her strategy
and is playing a best response to it. It should be clear that this argument extends
straightforwardly to the n-person case. In any Nash equilibrium, all players perform
causally ratifiable acts.
              C1         C2
   R1        1, 1       0, 4
   R2        4, 0     −15, −15

   Chicken
17
More precisely, correlated equilibrium is the weakest equilibrium solution concept which
assumes that all players have common beliefs. When this assumption is relaxed one obtains a
subjectively correlated equilibrium. For details see Aumann (1974, 1987).
              C1              C2
   R1        pq              p(1−q)
   R2        (1−p)q          (1−p)(1−q)

              C1              C2
   R1        p+q−1           1−q
   R2        1−p             0
each act A signalled by the arbitrator: for instance, “Stop on red and go on green”
or “Run all lights.” A correlated equilibrium is simply a pair of adapted strategies r
and c such that, for all alternatives r* and c*, the following condition holds (call it
CE):
   ∑_{ij} pr(Ri & Cj) U_row(r(Ri), Cj) ≥ ∑_{ij} pr(Ri & Cj) U_row(r*(Ri), Cj)

   ∑_{ij} pr(Ri & Cj) U_col(Ri, c(Cj)) ≥ ∑_{ij} pr(Ri & Cj) U_col(Ri, c*(Cj))
   ∑_j pr(Cj/R) U_row(R, Cj) ≥ ∑_j pr(Cj/R) U_row(R*, Cj)

   ∑_i pr(Ri/C) U_col(Ri, C) ≥ ∑_i pr(Ri/C) U_col(Ri, C*)

for all alternatives R* and C*. CE* requires, then, that agents assign zero prior
probability to any act, either of their own or of their adversaries’, that does not
maximize expected utility on the condition that it will be performed. Aumann
regards this condition as “an expression of Bayesian rationality,” since players
satisfy it by maximizing expected utility at the time they act.
As a number of authors have noted, CE* is an application of the maxim of
ratifiability.19 It requires players to give zero credence to non-ratifiable acts. Hence
for a group of players to end up in a correlated equilibrium, it is necessary and
sufficient that all choose ratifiable acts and expect others to do so as well. Aumann's
“Bayesian rationality” thus coincides with the notion of rationality found in Jeffrey’s
ratificationism. More specifically, we contend that Aumann is requiring rational
players to choose causally ratifiable actions.
18
Note that such a distribution determines a unique mixed act for each player. Thus, it makes no
difference whether one talks about the players’ acts or the players’ beliefs being in equilibrium.
19
Shin (1991b), Skyrms (1990).
While Aumann has rather little to say about the matter, he clearly does not
mean the statistical correlations among acts in correlated equilibrium to reflect
causal connections. This is particularly obvious when external signaling devices
are involved, for in such cases each player believes that his opponents would heed
the arbitrator's signal whether or not he himself were to heed it. Each player uses his
knowledge of the arbitrator's signal to him to make inferences about the signals given
to others, to form beliefs about what his opponents expect him to do, and ultimately
to justify his own policy of following the arbitrator's signal. What he does not do is
suppose that the correlations so discovered would continue to hold no matter what he
decided to do. For example, if [ROW]'s credences are given in Table 23.3, it would be
a mistake for him to run a red light in hopes of making it certain that [COLUMN] will
stop; [COLUMN]'s action, after all, is determined by the signal she receives, not by
what [ROW] does. Notice, however, that a straightforward application of evidential
decision theory recommends running red lights in this circumstance—further proof
that the view is untenable. In cases, then, where correlations are generated via
external signaling, a causal interpretation of CE* clearly is called for. To make this
explicit, we can rewrite CE* as the following condition CE**:
   ∑_j pr((R → Cj)/R) U_row(R, Cj) ≥ ∑_j pr((R → Cj)/R) U_row(R*, Cj)

   ∑_i pr((C → Ri)/C) U_col(Ri, C) ≥ ∑_i pr((C → Ri)/C) U_col(Ri, C*)

This reduces to CE* on the assumption that [ROW] and [COLUMN] cannot
influence each other's acts, so that both pr((R → Cj)/R) = pr(Cj/R) and
pr((C → Ri)/C) = pr(Ri/C) hold for all R* and C*.
The situation is not appreciably different in cases where the correlation arises
without any signaling device. Imagine playing [ROW] in the coordination game in
Table 23.4, and consider the correlated equilibrium described by Table 23.5: These
correlations need not have arisen through signaling. You might find the (R1 , C1 )
equilibrium salient because it offers the vastly higher payoff, and, knowing that
              C1        C2
   R1       25, 1      0, 0
   R2        0, 0      1, 2

   Table 23.4

              C1        C2
   R1        0.70      0.01
   R2        0.01      0.28

   Table 23.5
your opponent can appreciate its salience for you, you might conclude that she
expects you to play it. That makes C1 the better play for her, which reinforces
your intention to play R1 and your belief that she will play C1 , and so on. It would
not be unreasonable under these conditions for the two of you to end up in the
correlated equilibrium of Table 23.5. Still, there is no suggestion here that your
initial inclination to play R1 is somehow responsible causally for your opponent’s
credences. Her credences are what they are because she suspects that you are
inclined to play R1 , but neither your decision to play R1 nor your actually playing
of R1 is the cause of these suspicions – she develops them solely on the basis of her
knowledge of the game’s structure. As in the signaling case, the acts in a correlated
equilibrium are evidentially correlated but causally independent.
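One can verify directly that the distribution of Table 23.5 satisfies the CE* inequalities for the payoffs of Table 23.4: conditional on each signalled act, obeying the signal maximizes expected utility. A short checking script (ours), using the values exactly as printed in the two tables:

    # Verify CE*: given the signal to play X, no alternative does better
    # against the conditional distribution over the opponent's signalled act.
    payoff = {("R1", "C1"): (25, 1), ("R1", "C2"): (0, 0),
              ("R2", "C1"): (0, 0),  ("R2", "C2"): (1, 2)}          # Table 23.4
    joint = {("R1", "C1"): 0.70, ("R1", "C2"): 0.01,
             ("R2", "C1"): 0.01, ("R2", "C2"): 0.28}                # Table 23.5

    def eu_row(play, given_row):
        norm = sum(p for (r, c), p in joint.items() if r == given_row)
        return sum(p / norm * payoff[(play, c)][0]
                   for (r, c), p in joint.items() if r == given_row)

    def eu_col(play, given_col):
        norm = sum(p for (r, c), p in joint.items() if c == given_col)
        return sum(p / norm * payoff[(r, play)][1]
                   for (r, c), p in joint.items() if c == given_col)

    for r in ("R1", "R2"):
        print(r, all(eu_row(r, r) >= eu_row(alt, r) for alt in ("R1", "R2")))
    for c in ("C1", "C2"):
        print(c, all(eu_col(c, c) >= eu_col(alt, c) for alt in ("C1", "C2")))
    # All four lines print True: the Table 23.5 distribution is a correlated equilibrium.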
This explicitly causal reading of Aumann’s discussion helps to clear up a
perplexing feature of correlated equilibria. CE only makes sense as a rationality
constraint if agents are able to treat their own actions as bits of information about
the world, for it is only then that the expressions appearing to the right of the "≥"
signs can be understood as giving utilities for the starred acts. As Aumann notes, his
model (like that of Jeffrey before him)
does away with the dichotomy usually perceived between uncertainty about acts of nature
and of personal players … . In traditional Bayesian decision theory, each decision maker
is permitted to make whatever decision he wishes, after getting whatever information he
gets. In our model this appears not to be the case, since the decision taken by each decision
maker is part of the description of the state of the world. This sounds like a restriction on
the decision maker’s freedom of action. (Aumann 1987), 8.
The problem here is that the utility comparisons in CE seem to portray acts as
things that happen to agents rather than things they do. Moreover, it is not clear why
an agent who learns that he is surely going to perform R should need to compare it
with other acts that he is sure he will not perform. Aumann tries to smooth things
over by suggesting that CE describes the perspective of agents, not as choosers, but
as “outside observers.” He writes,
The "outside observer" perspective is common to all differential information models in
economics … . In such models, each player gets some information or "signal"; he hears
only the signal sent to him, not that of others. In analyzing his situation, [the] player must
first look at the whole picture as if he were an outside observer; he cannot ignore the
possibility of his having gotten a signal other than he actually got, even though he knows
that he actually did not get such a signal. This is because the other players do not know what
signal he got. [He] must take the ignorance of the other players into account when deciding
on his own course of action, and he cannot do this if he does not explicitly include in the
model signals other than the one he actually got. (Aumann 1987), 8.
The problem with this response is that it does not tell us how a player is supposed
to use the knowledge gained from looking at things “externally” (i.e., as if he were
not choosing) to help him with his “internal” decision (where he must choose how
to act). The point is significant, since what’s missing from the outside observer
standpoint is the very thing that makes something a decision problem—the fact that
a choice has to be made.
In our view, talk about outside observers here is misleading. Aumann’s third-
person, “outside observer” perspective is the first-person subjunctive perspective: a
view on what would happen if I were to do something different. It is an advantage
of causal decision theory that it allows one to assess the rationality of acts that
certainly will not be performed in the same way as one assesses the rationality of
any other option. One simply appeals to whatever facts about objective chances,
laws of nature, or causal relations are required to imagine the “nearest” possible
world in which the act is performed, the one most similar, in relevant respects, to
the actual world in which one won’t perform the act. One then sees what would
ensue under those circumstances. Even, for instance, if one assigns zero credence
to the proposition that one will intentionally leap off the bridge one is crossing, it
still makes sense on the causal theory to speak of the utility of jumping. Indeed the
abysmally low utility of this action is the principal reason why one is certain not to
perform it. When we interpret correlated equilibria causally, then, along the lines
of CE**, there is no need for an “external” perspective in decision making. All
decisions are made from the first-person subjunctive perspective of “What would
happen if I performed that act?”
A number of other aspects of game-theoretic reasoning can likewise be analyzed
in terms of causal ratifiability.20 Some of the most interesting work in this area is
due to Harper, who has investigated the ways in which causal decision theory and
ratifiability can be used to understand extensive-form games (Harper 1986, 1991).
This is a natural place for issues about causation to arise, since players’ choices
at early stages in extensive-form games can affect other players’ beliefs—and thus
their acts—at later stages.
Harper proposes using the maxim of ratifiability as the first step in an analysis of
game-theoretic reasoning that is “eductive” or “procedural”, an analysis that seeks to
supply a list of rules and deliberative procedures that agents in states of indecision
can use in order to arrive at an intuitively correct equilibrium choice.21 Rational
20
See Shin (1991b), for instance, for an interesting ratificationist gloss on Selten’s notion of a
“perfect” equilibrium.
21
On the need for such “eductive” procedures see Binmore (1987, 1988).
players, he suggests, should choose actions that maximize their unconditional causal
expected utility from among the ratifiable alternatives. (Harper regards cases where
no ratifiable strategies exist as genuinely pathological.) The idea is not simply to
choose acts that maximize unconditional expected utility, since these need not be
ratifiable. Nor is it to choose acts with maximal expected utility on the condition
that they are performed, since these may have low unconditional utility. Rather,
one first eliminates unratifiable options, and then maximizes unconditional expected
utility with what is left. In carrying out the second step of this process, each player
imagines her adversaries choosing among their ratifiable options by assigning zero
probability to any option that wouldn’t be ratifiable.
Harper shows that both in normal and in extensive-form games, players who
follow these prescriptions end up choosing the intuitively “right” act in a wide
range of cases. In extensive-form games, his method produces choices that are in
sequential equilibrium—and perhaps most interesting, the method seems to promote
a strong form of “forward induction.” To illustrate the latter point, and to get a sense
of how Harper’s procedure works in practice, consider the game of Harsanyi and
Selten (1988) shown here. [ROW]’s act A yields fixed payoffs, whereas [ROW]’s act
B leads to a strategic subgame with [COLUMN]. Harsanyi and Selten argue that (C,
e) is the unique rational solution to the subgame, and they use backwards induction
to argue that (AC, e) is the only rational solution in the full game. Their thought is
that at the initial choice point [ROW] will know that (C, e) would be played if the
subgame were reached, so he actually faces the “truncated game” as shown below,
which makes A the only rational choice.
Kohlberg and Mertens (1986) have objected to this reasoning on the grounds
that [ROW] can perform B as a way of signaling [COLUMN] that he has chosen BD
(since it would be crazy to pass up A for C), and can thus force [COLUMN] to play f
rather than e. Harsanyi and Selten respond by claiming that [COLUMN] would have
to regard [ROW]’s playing B as a mistake since, “before deciding whether [ ROW]
can effectively signal his strategic intentions, we must first decide what strategies
are rational for the two players in the subgame, and accordingly what strategy is
the rational strategy for [ROW] in the truncation game” (Harsanyi and Selten 1988),
353 (see table). Thus, we have a dispute over what sorts of effects the playing of B
would have on [COLUMN]’s beliefs at the second choice point, and thereby on her
decision. This is just the sort of case where causal decision theory can be helpful.
Harper’s procedure endorses Kohlberg’s and Mertens’ contention. Harsanyi and
Selten’s preferred act for [ROW] will be unratifiable, so long as each player knows
that the other chooses only ratifiable options. For A to be ratifiable, it would at least
have to be the case that

   U(A/A) ≥ U(BD/A).

However, since [COLUMN] must choose at the third choice point knowing that
[ROW] has played B, but without knowing whether he has played C or D, it
follows that [ROW]'s credence for BC → f must be pr(f/B). Now at the final choice
point, [COLUMN] would know that [ROW] had chosen either BC or BD or some
mixture of the two, and she would have to assume that the option chosen was
ratifiable (if such an option is available). BC clearly cannot be ratifiable, since it
is dominated by A. Harper also shows, using a somewhat involved argument, that
no mixture of BC and BD can be ratifiable.22 BD, however, is ratifiable, provided
that pr((BD → f)/BD) = pr(f/B) ≥ 4/10. Thus, since only one of [ROW]'s B-
acts can be ratifiable, [COLUMN] would have to assign it a probability of one if
she were to find herself at the second choice point. [ROW], knowing all this and
knowing that f is [COLUMN]'s only ratifiable response to BD, will indeed assign a
high value to pr(f/B), viz. pr(f/B) = 1. This in turn ensures that U(BD/A) > U(A/A),
making A unratifiable. BD is thus the only ratifiable solution to the Harsanyi/Selten
game. Hence, if Harper is correct, it seems that Kohlberg and Mertens were right to
reject the backwards induction argument and to think that [ROW] can use B to warn
[COLUMN] of his intention to play D.
We hope this example gives the reader something of the flavor of Harper’s
approach. Clearly his proposal needs to be elaborated more fully before we will
be able to make an informed judgment on its merits. We are confident, however,
that any adequate understanding of game-theoretic reasoning will rely heavily on
causal decision theory and the maxim of ratifiability.
Foundational Questions
Before any theory of expected utility can be taken seriously, it must be supplemented
with a representation theorem that shows precisely how its requirements are
reflected in rational preference. To prove such a theorem one isolates a small set of
axiomatic constraints on preference, argues that they are requirements of rationality,
and shows that anyone who satisfies them will automatically act in accordance with
the theory’s principle of expected utility maximization. The best known result of
this type is found in Savage (1972), where it was shown that any agent whose
preferences conform to the well-known Savage axioms will maximize expected
utility, as defined by Eq. (23.1), relative to a (unique) probability measure pr and a (unique
up to positive linear transformation) utility u. Unfortunately, this result does not
provide a fully satisfactory foundation for either CDT or EDT, because, as we
have seen, Savage’s notion of expected utility is ambiguous between a causal and
an evidential interpretation. This leaves some important unfinished business for
evidential and causal decision theorists, since each camp has an obligation to present
a representation theorem that unambiguously captures its version of utility theory.
Evidential decision theorists were first to respond to the challenge. The key
mathematical result was proved in Bolker (1966), and applied to decision theory in
Jeffrey (1983). The Jeffrey/Bolker approach differs from Savage’s in two significant
22
Harper (1991), 293.
ways: First, preferences are defined over a σ-algebra of propositions that describe
not only states of the world, but consequences and actions as well. Second, Savage’s
“Sure-thing” Principle (his postulate P3) is replaced by the weaker
• Impartiality Axiom: Let X, Y and Z be non-null propositions such that (a) Z
is incompatible with both X and Y, (b) X and Y are indifferent in the agent's
preference ordering, and (c) (X ∨ Z) is indifferent with (Y ∨ Z) but not with
Z. Then, (X ∨ Z*) must be indifferent with (Y ∨ Z*) for any proposition Z*
incompatible with both X and Y.
In the presence of the other Bolker/Jeffrey axioms, which do not differ sub-
stantially from those used by Savage, Impartiality guarantees that the agent’s
preferences can be represented by a function that satisfies Eq. (23.3).23 It also
ensures that the representation will be partition independent in the sense that,
for any partitions {Xi} and {Yj} and any act A, one will always have
V(A) = ∑_i pr(Xi/A) V(A & Xi) = ∑_j pr(Yj/A) V(A & Yj). Thus, in EDT it does not matter how
one chooses the state partition relative to which expected utilities are computed.
This contrasts sharply with Savage’s theory. Formula (23.1) shows how to
compute expected utilities with respect to a single partition of states, but it gives
no guarantee that different choices of partitions yield the same value for V(A). As
a consequence, Savage had to place restrictions on the state partitions that could
be legitimately employed in well-posed decision problems. Strictly speaking, he
said, his axioms only apply to grand-world decisions whose act and state partitions
slice things finely enough to ensure that each act/state pair produces a consequence
that is sufficiently detailed to decide every question the agent cares about. Thus, on
the official view, we can only be confident that (23.1) will yield the correct value for
V(A) when applied to state partitions in “grand-world” decisions. Savage recognized
this as a significant restriction on his theory, since owing to the extreme complexity
of grand-world decisions, no actual human being can ever contemplate making one.
He tried to make this restriction palatable by suggesting that his axioms might be
usefully applied to certain “small-world” decisions, and expressing the hope that
there would be only a “remote possibility” of obtaining values for V(A) inconsistent
with those obtained in the grand-world case. This hope, however, was never backed
up by any rigorous proof. We take it as a mark in favor of EDT that it can solve
this "problem of the small world" by giving such a proof.
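The partition-independence claim is the law of total expectation in disguise, and is easy to confirm numerically. The sketch below (ours, over an invented finite space of worlds) computes V(A) directly and via two different coarse partitions; all three numbers agree.

    # Partition independence of the evidential formula: with desirabilities
    # V(P) = E[utility | P], any partition {X_i} gives the same V(A).
    # All numbers below are invented for illustration.
    pr   = {"w1": 0.2, "w2": 0.1, "w3": 0.3, "w4": 0.4}
    util = {"w1": 5.0, "w2": 1.0, "w3": -2.0, "w4": 3.0}
    A = {"w1", "w2", "w3"}

    def V(P):
        mass = sum(pr[w] for w in P)
        return sum(pr[w] * util[w] for w in P) / mass

    def V_via_partition(A, partition):
        cells = [set(X) & A for X in partition if set(X) & A]
        mass_A = sum(pr[w] for w in A)
        return sum((sum(pr[w] for w in cell) / mass_A) * V(cell) for cell in cells)

    coarse1 = [{"w1", "w2"}, {"w3", "w4"}]
    coarse2 = [{"w1"}, {"w2", "w3"}, {"w4"}]
    print(V(A), V_via_partition(A, coarse1), V_via_partition(A, coarse2))  # all equal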
The easiest way to prove a representation result for CDT is to co-opt Sav-
age’s theorem by stipulating that well-posed decision problems must be based
on partitions of states that are certain to be causally independent of acts, and
then imposing Savage’s axioms on such problems. A number of causal decision
23
This representation will not be unique (except in the rare case where V is unbounded), for, as
a simple calculation shows, if the function V(A) = ∑ pr(Si/A) u(A, Si) represents a preference
ordering, and if k is such that 1 + kV(X) > 0 for all propositions X in the algebra over which
V is defined, then V_k(A) = ∑ pr_k(Si/A) u_k(A, Si) will also represent the ordering, when
pr_k(X) = pr(X)(1 + kV(X)) and V_k(X) = [V(X)(1 + k)]/(1 + kV(X)).
theorists have endorsed the view that there is a “right” partition of states to be used
for computing instrumental expected utility.24 The Lewis formulation of CDT in
terms of dependency hypotheses mentioned in the section “Conditionals and their
probabilities” above is an example of this strategy (Lewis 1981). Minor intramural
squabbles aside, the idea is that each element in the privileged partition, often
denoted K = {Kj} following Skyrms (1980), should provide a maximally specific
description of one of the ways in which things that the agent cares about might
depend on what she does.25 It is characteristic of such partitions that they will be
related to the agent’s subjective probability by26
• Definiteness of Outcome: For any proposition O that the agent cares about (in the
sense of not being indifferent between O and :O), any action A, and any Kj 2 K,
either ((A ! O)/Kj ) D 1 or ((A ! :O)/Kj ) D 1.
• Instrumental Act Independence: ([A ! Kj ] $ Kj ) D 1 for all acts A and states
Kj .
The first of these ensures that u(A, Si) has the same value for each Savage-state
Si in Kj, and thus that

   U(A) = ∑_i pr(A → Si) u(A, Si) = ∑_j pr(A → Kj) U(A, Kj).

The second condition then guarantees that U(A) = ∑_j pr(Kj) U(A, Kj). Since this
equation has the form (23.5), it follows that if there exists a partition K that
meets these two requirements, then one can appropriately apply the Savage axioms
to actions whose outcomes are specified in terms of it. Fortunately, the required
partition is certain to exist, because it is always possible to find a K = {Kj} such that
(i) Kj entails either (A → O) or (A → ¬O) for every O the agent cares about, and
(ii) ([A → Kj] ↔ Kj) is a truth of logic for all A and Kj.27
such partitions lets us use Savage’s representation theorem as a foundation for CDT,
subject to the proviso that the Savage axioms should only be applied to decisions
framed in terms of a K-partition.
The trouble with this strategy is that the partition dependence of Savage’s theory
is carried over into CDT. The need for a partition-independent formulation of CDT
24
See, for example, Skyrms (1980), Lewis (1981), Armendt (1986).
25
Notice that states are being viewed here as functions from acts to outcomes, whereas acts are
taken as unanalyzed objects of choice (that is, as propositions the agent can make true or false as
she pleases). This contrasts with Savage's well-known formalization in which acts are portrayed
as functions from states to outcomes, and states are left as unanalyzed objects of belief. Less hangs
on this distinction than one might think. When one adopts the perspective of Jeffrey (1967, 1983)
and interprets both states and actions as propositions, and views outcomes as conjunctions of these
propositions, the two analyses become interchangeable.
26
Here we are following Gibbard (1986).
27
An explicit construction of K can be found in Gibbard and Harper (1978).
has been argued by Sobel (1989), and Eells (1982) has suggested that EDT should be
preferred to CDT on this basis alone. The main difficulty is the problem of “small-
worlds”, which threatens to make the theory inapplicable, in the strictest sense, to
all the decision problems people actually consider. Gibbard (1986) goes a certain
distance toward alleviating this difficulty by, in effect, showing how to find the
smallest K-partition for a given decision problem, but his partition is still rather
“grand”, and a fully partition-invariant version of CDT would still be desirable.
Armendt (1986) proves a representation result that does provide a formulation of
CDT that is partition-independent, even though it does not do away with the notion
of a K-partition. He takes the conditional decision theory of Fishburn (1974) as his
starting point. The basic concept here is that of the utility of a prospect X on the
hypothesis that some condition C obtains. If {C_1, C_2, ..., C_n} is a partition of C,
these conditional utilities are governed by the (partition independent) equation:

U(X/C) = Σ_i ρ(C_i/C)U(X/C_i)    (23.15)
which shows how X’s utility given C depends on its utilities given the various ways
in which C might be true. Notice that (23.15) allows for a distinction between an act
A’s unconditional utility and its utility conditional on its own performance. These
are given respectively by

U(A) = U(A/A ∨ ¬A) = Σ_i ρ(S_i)U(A/S_i)

U(A/A) = Σ_i ρ(S_i/A)U(A/A & S_i)
where the state partition may be chosen arbitrarily. In a suggestion that bears
some similarities to Jeffrey’s ratificationist proposal, Armendt argues that decision
problems in which an agent’s unconditional preference for A differs from her
preference for A conditional on itself are just the kinds of cases in which A’s
auspiciousness diverges from its instrumental expected utility. This suggests a
way of characterizing K-partitions directly in terms of the agent’s preferences.
Armendt’s thought is that the elements of an appropriate K-partition should “screen-
off” differences in value between unconditional A and A-conditional-on-A, so that
the agent is indifferent between A given K_i and A given (A&K_i) for every i. When this
is so, we will have Σ_i ρ(K_i)U(A/K_i) = Σ_i ρ(K_i/A)U(A/K_i), and Armendt shows
that the conditional utilities U(A/Ki ) can be eliminated in favor of the unconditional
news values V(A&Ki ) as long as there exists at least one partition of “consequences”
O = {O_j} such that the agent's unconditional utility for (A&O_j&K_i) is equal to her
utility for A conditional on (A&O_j&K_i). When such a K and O exist, we have
U(A) = Σ_i ρ(K_i)V(A & K_i), which is precisely the condition in which CDT and
EDT coincide. What Armendt shows, then, is that an appropriate representation
theorem for CDT can be obtained by supplementing Fishburn’s conditional decision
theory with the assumption that every act A can be associated with a K-partition such
that A/K_i ~ A/(A&K_i) for all i, and a partition of consequences O (dependent on A
and K) such that (A&O_j&K_i) ~ A/(A&O_j&K_i).
28
The set C always takes the form C = Ω − I, where the ideal I is a collection of Ω-propositions
that contains the contradictory event (X&¬X), is closed under countable disjunctions, and which
contains (X&Y) whenever X ∈ I and Y ∈ Ω.
29
These Bayesian suppositions were defined in Renyi (1955), and have come to be called “Popper
measures” in the philosophical literature, after Popper (1934). Interested readers may consult van
Fraassen (1995), Hammond (1994), and McGee (1994) for informative discussions of Popper
measures.
one that satisfies Bayes’s Law. There are also suppositions that are neither Bayesian
nor instances of imaging.
Joyce, impressed by the need for partition-independent formulations of utility
theories, uses the notion of a supposition to define an abstract conditional expected
utility to be thought of as “utility under a supposition”. Its (partition independent)
basic equation is

V(A|C) = Σ_i [ρ(S_i & A|C)/ρ(A|C)] u(A, S_i)
       = Σ_i [ρ(X_i & A|C)/ρ(A|C)] V(A & X_i|C)  for any partition {X_i},    (23.16)

where ρ(• | •) might be any supposition function for ρ defined relative to a set of
conditions C that contains propositions describing all an agent's actions (as well as
other things). Since (23.16) is just EDT's (23.3) with ρ(• | C) substituted in for ρ(•),
V(A|C) gives A's auspiciousness on the supposition that condition C obtains, where
this supposition may be any provisional belief revision that satisfies (a)–(e).
As with Fishburn’s theory, there is no guarantee that A’s unconditional utility,
which is now just V(A), will coincide with its utility conditional on itself,
V(A|A) = Σ_i ρ(S_i|A)u(A, S_i)
The sole exception occurs when ρ(• | •) is Bayesian, for in that case we have V(A)
= V(A | A), because ρ(S_i | A) = ρ(S_i & A)/ρ(A) for all i. Given that V(A) and V(A |
A) can differ in general, it becomes a live question whether a decision maker should
choose acts that maximize her unconditional expected utility or choose acts that
maximize expected utility conditional on the supposition that they are performed.
Joyce argues, on grounds having nothing to do with the conflict between CDT and
EDT, that a choiceworthy action is always one that maximizes expected utility on
the condition that it is performed. The rational decision maker’s objective, in other
words, should always be to choose an A such that V(A | A) ≥ V(B | B) for all
alternatives B. Neither evidential nor causal decision theorists will dispute this point,
since the former endorse the prescription to maximize V(A | A) when the supposition
function is ρ(A | C) = ρ_C(A), which makes V(A | A) equal A's auspiciousness, and
the latter endorse it when ρ(A | C) = ρ(C → A), which makes V(A | A) equal A's
instrumental expected utility. Thus, EDT and CDT are both instances of abstract
conditional utility theory. The difference between them has to do not with the basic
form of the utility function or with the connection between expected utility and
choiceworthiness, but with the correct type of supposition to use in decision making
contexts.
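
To see how the two readings come apart in practice, here is a small toy computation (ours, not from the text): the same formula for V(A | A) is evaluated once with a Bayesian supposition and once with a supposition that holds the state probabilities fixed, which is the appropriate causal reading only because the toy problem stipulates that the states are evidentially correlated with, but causally independent of, the acts. All names and numbers below are illustrative assumptions, not anything given in this chapter.

# Toy sketch (ours, not the authors'): V(A|A) = sum_i rho(S_i|A) u(A, S_i) computed
# under two supposition functions over a small act-by-state credence table.
# The states are stipulated to be causally independent of the acts, so the "causal"
# supposition keeps the marginal state probabilities; the Bayesian one conditions on the act.

# Joint credences rho(act & state) for a Newcomb-style toy problem (illustrative numbers).
rho = {("one_box", "predicted_one"): 0.45, ("one_box", "predicted_two"): 0.05,
       ("two_box", "predicted_one"): 0.05, ("two_box", "predicted_two"): 0.45}

# Outcome utilities u(act, state), in thousands.
u = {("one_box", "predicted_one"): 1000, ("one_box", "predicted_two"): 0,
     ("two_box", "predicted_one"): 1001, ("two_box", "predicted_two"): 1}

states = ["predicted_one", "predicted_two"]

def bayesian_supposition(state, act):
    # rho(S|A) = rho(S & A)/rho(A): the evidential (EDT) reading.
    return rho[(act, state)] / sum(rho[(act, s)] for s in states)

def causal_supposition(state, act):
    # Acts do not influence states in this toy case, so supposing A leaves rho(S) fixed (CDT reading).
    return sum(p for (a, s), p in rho.items() if s == state)

def v_given_self(act, suppose):
    # V(A|A) under the given supposition function.
    return sum(suppose(s, act) * u[(act, s)] for s in states)

for act in ("one_box", "two_box"):
    print(act, v_given_self(act, bayesian_supposition), v_given_self(act, causal_supposition))
# Auspiciousness favors one_box (900 vs. about 101); instrumental expected utility
# favors two_box (500 vs. 501): the same V(A|A) recipe, different suppositions.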
Once we recognize this, it becomes clear that the problem of proving a represen-
tation theorem for CDT can be subsumed under the more general problem of proving
a representation theorem for an abstract conditional utility theory. And, since the
function V (• j C) obeys Eq. (23.3) relative to any fixed condition C, this latter
problem can be solved by extending the Bolker/Jeffrey axioms for unconditional
preferences to conditional preferences, and showing that anyone who satisfies the
axioms is sure to have conditional preferences that can be represented by some
function V(A | C) of form (23.16) that is defined relative to a supposition function
ρ(• | •) for her subjective probability ρ.
Joyce was able to accomplish this. We refer the reader to Joyce (1999) for the details,
which turn out to be rather complicated, but the basic idea is straightforward. One
starts by imagining an agent with a system of conditional preferences of the form:
X on the supposition that B is weakly preferable to Y on the supposition that C,
written X | B ≿ Y | C. One assumes that this ranking obeys the usual axioms:
transitivity, connectedness, a continuity principle, an Archimedean axiom, and so
on. One also requires each section of the ranking X | C ≿ Y | C, for C fixed, to
satisfy the Bolker/Jeffrey axioms. Bolker's theorem then ensures that each section
will be associated with a family Γ_C of (V_C, ρ_C) pairs that satisfy Eq. (23.3) and
that represent X | C ≿ Y | C. Different Γ_C-pairs will be related by the equations
ρ′_C(X) = ρ_C(X)[1 + kV_C(X)] and V′_C(X) = V_C(X)[(k + 1)/(1 + kV_C(X))], where
k is any real number such that [1 + kV(X)] > 0 for all propositions X such that V(X)
is defined. The trick to proving a representation theorem for conditional decision
theory is to find further constraints on conditional preferences under which it is
possible to select a unique (V_C, ρ_C) pair from each Γ_C in such a way that V(X | B) ≥ V(Y | C)
is guaranteed to hold whenever X | B ≿ Y | C. The main axiom that is needed is
the following generalization of Impartiality:
• Let X_1, X_2, and X_3, and Y_1, Y_2, and Y_3, be mutually incompatible, and suppose
that conditions (23.17) and (23.18) hold; then condition (23.19) follows.
This is not as complicated as it looks. If clause (23.17) holds, the only sort of
conditional utility that will represent X | C ≿ Y | B will be one in which ρ(X_1 |
B)/ρ(X_1 ∨ X_2 | B) = ρ(Y_1 | C)/ρ(Y_1 ∨ Y_2 | C). Likewise, if (23.18) holds, then the
representation must be one in which ρ(X_1 | B)/ρ(X_1 ∨ X_3 | B) = ρ(Y_1 | C)/ρ(Y_1 ∨ Y_3
| C). Together these two equalities entail that ρ(X_1 | B)/ρ(X_1 ∨ X_2 ∨ X_3 | B) = ρ(Y_1 |
C)/ρ(Y_1 ∨ Y_2 ∨ Y_3 | C), and this is just what (23.19) guarantees.
Using this axiom as the main formal tool, Joyce is able to construct a full
conditional expected utility representation for the ranking X | C ≿ Y | B. By
adding further conditions, one can ensure either that the representation’s supposition
function will be Bayesian or that it will arise via imaging from some similarity
relation among possible worlds. In this way, both EDT and CDT are seen to have a
common foundation in the abstract theory of conditional expected utility.
Conclusion
While the classical theory of Ramsey, de Finetti and Savage remains our best
account of rational choice, its development has yet to be completed. An adequate
theory should explain the role in decision making of causal thinking. True,
a decision theory that did without causal propositions would have been nice:
Cause and effect have long puzzled and divided philosophers and scientists, and
theoretical discussion of causal methodology remains underdeveloped.30 In decision
making, though, we are stuck with causal thinking. Rational choice always involves
judgements of how likely an option is to have various desirable consequences—and
such judgements, we have argued, require the decision maker to have views, explicit
or implicit, about causal or counterfactual relationships. Nothing else can substitute
for these causal beliefs. The conditional credences employed by evidential decision
theory cannot, because they are unable to distinguish causation from correlation.
More refined “screening” techniques, while better at capturing causal connections,
fail to apply in an important class of cases. To specify the kind of case to which they
do apply, we must, one way or another, invoke causal relations.
We should not find this need for causal notions distressing. We draw causal
conclusions all the time, after all, and scientists are able to glean causal tendencies
from experiment and statistical data, using methods of high sophistication. Still,
no one theory of causal notions has the precision and orthodox status of, say, the
standard theory of subjective probability. Thus, an adequate decision theory, if we
are right, must depend on new advances in our understanding of causation and its
relation to rational belief. We might have wished that theoretical life had turned
out easier, but as matters stand, important work on the foundations of utility theory
remains to be done.
30
Again, giant steps have been taken in this area since this article first appeared, especially by
Spirtes et al. (1993) and Pearl (2000).

References
Nozick, R. (1969). Newcomb’s problem and two principles of choice. In N. Rescher (Ed.), Essays
in honor of Carl G. Hempel. Dordrecht-Holland: Reidel.
Piccione, M., & Rubinstein, A. (1997). On the interpretation of decision problems with imperfect
recall. Games and Economic Behavior, 20, 3–24.
Popper, K. (1934). Logik der Forschung. Vienna: Springer. Translated as The logic of scientific
discovery (London: Hutchinson, 1959).
Pearl, J. (2000). Causality: Models, reasoning, and inference. New York: Cambridge University
Press.
Renyi, A. (1955). On a new axiomatic theory of probability. Acta Mathematica Academiae
Scientiarum Hungaricae, 6, 285–335.
Robins, J. (1986). A new approach to causal inference in mortality studies with a sustained
exposure period – applications to control of the healthy workers survivor effect. Mathematical
Modeling, 7, 1393–1512.
Rubin, D. (1974). Estimating causal effects of treatments in randomized and nonrandomized
studies. Journal of Educational Psychology, 66, 688–701.
Savage, L. J. (1972). The foundations of statistics. New York: Dover (First edition 1954).
Shin, H. S. (1991a). A reconstruction of Jeffrey’s notion of ratifiability in terms of counterfactual
beliefs. Theory and Decision, 31, 21–47.
Shin, H. S. (1991b). Two notions of ratifiability and equilibrium in games. In M. Bacharach & S.
Hurley (Eds.), Foundations of decision theory (pp. 242–262). Oxford: Basil Blackwell.
Simon, H. A. (1957). Models of man. New York: Wiley.
Skyrms, B. (1980). Causal necessity. New Haven: Yale University Press.
Skyrms, B. (1984). Pragmatics and empiricism. New Haven: Yale University Press.
Skyrms, B. (1990). Ratifiability and the logic of decision. Midwest Studies in Philosophy, 15, 44–
56.
Sobel, J. H. (1989). Partition theorems for causal decision theories. Philosophy of Science, 56,
70–95.
Spirtes, P., Glymour, C., & Scheines, R. (1993). Causation, prediction, and search (Lecture Notes
in Statistics, Vol. 81). New York: Springer. ISBN: 978-1-4612-7650-0 (Print) 978-1-4612-
2748-9 (Online).
Stalnaker, R. (1968). A theory of conditionals. In Studies in logical theory (American philosophical
quarterly monograph series 2). Oxford: Blackwell.
Stalnaker, R. (1972). Letter to David Lewis. In W. Harper, R. Stalnaker, & G. Pearce (Eds.), Ifs:
Conditionals, belief, decision, chance, and time (pp. 151–152). Dordrecht: Reidel.
Stalnaker, R. (1981). Letter to David Lewis of May 21, 1972. In W. L. Harper, R. Stalnaker, & G.
Pearce (Eds.), Ifs: Conditionals, belief, decision, chance, and time. Dordrecht-Holland: Reidel.
Stalnaker, R., & Thomason, R. (1970). A semantic analysis of conditional logic. Theoria, 36, 23–
42.
Van Fraassen, B. (1995). Fine-grained opinion, probability, and the logic of belief. Journal of
Philosophical Logic, 24, 349–377.
Chapter 24
Advances in Prospect Theory: Cumulative
Representation of Uncertainty
Expected utility theory reigned for several decades as the dominant normative and
descriptive model of decision making under uncertainty, but it has come under
serious question in recent years. There is now general agreement that the theory
does not provide an adequate description of individual choice: a substantial body of
evidence shows that decision makers systematically violate its basic tenets. Many
alternative models have been proposed in response to this empirical challenge (for
reviews, see Camerer 1989; Fishburn 1988; Machina 1987). Some time ago we
presented a model of choice, called prospect theory, which explained the major
violations of expected utility theory in choices between risky prospects with a
small number of outcomes (Kahneman and Tversky 1979; Tversky and Kahneman
1986). The key elements of this theory are (1) a value function that is concave for
gains, convex for losses, and steeper for losses than for gains, and (2) a nonlinear
transformation of the probability scale, which overweights small probabilities and
underweights moderate and high probabilities. In an important later development,
several authors (Quiggin 1982; Schmeidler 1989; Yaari 1987; Weymark 1981)
have advanced a new representation, called the rank-dependent or the cumulative
functional, that transforms cumulative rather than individual probabilities. This
article presents a new version of prospect theory that incorporates the cumulative
functional and extends the theory to uncertain as well as to risky prospects with
any number of outcomes. The resulting model, called cumulative prospect theory,
combines some of the attractive features of both developments (see also Luce and
Fishburn 1991). It gives rise to different evaluations of gains and losses, which
are not distinguished in the standard cumulative model, and it provides a unified
treatment of both risk and uncertainty.
To set the stage for the present development, we first list five major phenomena
of choice, which violate the standard model and set a minimal challenge that must
be met by any adequate descriptive theory of choice. All these findings have been
confirmed in a number of experiments, with both real and hypothetical payoffs.
Framing Effects The rational theory of choice assumes description invariance:
equivalent formulations of a choice problem should give rise to the same preference
order (Arrow 1982). Contrary to this assumption, there is much evidence that
variations in the framing of options (e.g., in terms of gains or losses) yield
systematically different preferences (Tversky and Kahneman 1986).
Nonlinear Preferences According to the expectation principle, the utility of a
risky prospect is linear in outcome probabilities. Allais’s (1953) famous example
challenged this principle by showing that the difference between probabilities of .99
and 1.00 has more impact on preferences than the difference between 0.10 and 0.11.
More recent studies observed nonlinear preferences in choices that do not involve
sure things (Camerer and Ho 1991).
Source Dependence People’s willingness to bet on an uncertain event depends not
only on the degree of uncertainty but also on its source. Ellsberg (1961) observed
that people prefer to bet on an urn containing equal numbers of red and green balls,
rather than on an urn that contains red and green balls in unknown proportions. More
recent evidence indicates that people often prefer a bet on an event in their area of
competence over a bet on a matched chance event, although the former probability
is vague and the latter is clear (Heath and Tversky 1991).
Risk Seeking Risk aversion is generally assumed in economic analyses of decision
under uncertainty. However, risk-seeking choices are consistently observed in
two classes of decision problems. First, people often prefer a small probability
of winning a large prize over the expected value of that prospect. Second, risk
seeking is prevalent when people must choose between a sure loss and a substantial
probability of a larger loss.
Loss Aversion One of the basic phenomena of choice under both risk and uncer-
tainty is that losses loom larger than gains (Kahneman and Tversky 1984; Tversky
and Kahneman 1991). The observed asymmetry between gains and losses is far too
extreme to be explained by income effects or by decreasing risk aversion.
The present development explains loss aversion, risk seeking, and nonlinear pref-
erences in terms of the value and the weighting functions. It incorporates a framing
process, and it can accommodate source preferences. Additional phenomena that lie
beyond the scope of the theory—and of its alternatives—are discussed later.
The present article is organized as follows. Section “Cumulative prospect theory”
introduces the (two-part) cumulative functional; section “Relation to previous
work” discusses relations to previous work; and section “Values and weights”
describes the qualitative properties of the value and the weighting functions. These
properties are tested in an extensive study of individual choice, described in
section “Experiment”, which also addresses the question of monetary incentives.
Implications and limitations of the theory are discussed in section “Discussion”. An
axiomatic analysis of cumulative prospect theory is presented in the appendix.
Theory
Prospect theory distinguishes two phases in the choice process: framing and
valuation. In the framing phase, the decision maker constructs a representation
of the acts, contingencies, and outcomes that are relevant to the decision. In the
valuation phase, the decision maker assesses the value of each prospect and chooses
accordingly. Although no formal theory of framing is available, we have learned a
fair amount about the rules that govern the representation of acts, outcomes, and
contingencies (Tversky and Kahneman 1986). The valuation process discussed in
subsequent sections is applied to framed prospects.
In the classical theory, the utility of an uncertain prospect is the sum of the utilities
of the outcomes, each weighted by its probability. The empirical evidence reviewed
above suggests two major modifications of this theory: (1) the carriers of value are
gains and losses, not final assets; and (2) the value of each outcome is multiplied
by a decision weight, not by an additive probability. The weighting scheme used
in the original version of prospect theory and in other models is a monotonic
transformation of outcome probabilities. This scheme encounters two problems.
First, it does not always satisfy stochastic dominance, an assumption that many
theorists are reluctant to give up. Second, it is not readily extended to prospects
with a large number of outcomes. These problems can be handled by assuming
that transparently dominated prospects are eliminated in the editing phase, and by
normalizing the weights so that they add to unity. Alternatively, both problems
can be solved by the rank-dependent or cumulative functional, first proposed by
Quiggin (1982) for decision under risk and by Schmeidler (1989) for decision
under uncertainty. Instead of transforming each probability separately, this model
transforms the entire cumulative distribution function. The present theory applies
the cumulative functional separately to gains and to losses. This development
extends prospect theory to uncertain as well as to risky prospects with any number
of outcomes while preserving most of its essential features. The differences between
the cumulative and the original versions of the theory are discussed in section
“Relation to previous work”.
Let S be a finite set of states of nature; subsets of S are called events. It is assumed
that exactly one state obtains, which is unknown to the decision maker. Let X be a
set of consequences, also called outcomes. For simplicity, we confine the present
where the decision weights π^+(f^+) = (π_0^+, ..., π_n^+) and π^−(f^−) = (π_{−m}^−, ..., π_0^−) are
defined by:

π_n^+ = W^+(A_n),   π_{−m}^− = W^−(A_{−m}),
π_i^+ = W^+(A_i ∪ ... ∪ A_n) − W^+(A_{i+1} ∪ ... ∪ A_n),   0 ≤ i ≤ n − 1,
π_i^− = W^−(A_{−m} ∪ ... ∪ A_i) − W^−(A_{−m} ∪ ... ∪ A_{i−1}),   1 − m ≤ i ≤ 0.

V(f) = Σ_{i=−m}^{n} π_i v(x_i).    (24.2)
The decision weight π_i^+, associated with a positive outcome, is the difference
between the capacities of the events "the outcome is at least as good as x_i" and
"the outcome is strictly better than x_i." The decision weight π_i^−, associated with
a negative outcome, is the difference between the capacities of the events "the
outcome is at least as bad as x_i" and "the outcome is strictly worse than x_i." Thus,
the decision weight associated with an outcome can be interpreted as the marginal
contribution of the respective event,1 defined in terms of the capacities W^+ and
W^−. If each W is additive, and hence a probability measure, then π_i is simply the
probability of A_i. It follows readily from the definitions of π and W that for both
positive and negative prospects, the decision weights add to 1. For mixed prospects,
however, the sum can be either smaller or greater than 1, because the decision
weights for gains and for losses are defined by separate capacities.
If the prospect f = (x_i, A_i) is given by a probability distribution p(A_i) = p_i, it can
be viewed as a probabilistic or risky prospect (x_i, p_i). In this case, decision weights
are defined by:

π_n^+ = w^+(p_n),   π_{−m}^− = w^−(p_{−m}),
π_i^+ = w^+(p_i + ... + p_n) − w^+(p_{i+1} + ... + p_n),   0 ≤ i ≤ n − 1,
π_i^− = w^−(p_{−m} + ... + p_i) − w^−(p_{−m} + ... + p_{i−1}),   1 − m ≤ i ≤ 0,

where w^+ and w^− are strictly increasing functions from the unit interval into itself
satisfying w^+(0) = w^−(0) = 0, and w^+(1) = w^−(1) = 1.
To illustrate the model, consider the following game of chance. You roll a
die once and observe the result x = 1, ..., 6. If x is even, you receive $x;
if x is odd, you pay $x. Viewed as a probabilistic prospect with equiprobable
outcomes, f yields the consequences (−5, −3, −1, 2, 4, 6), each with probability
1/6. Thus, f^+ = (0, 1/2; 2, 1/6; 4, 1/6; 6, 1/6), and f^− = (−5, 1/6; −3, 1/6; −1, 1/6; 0, 1/2).
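
As a concrete aid, the following sketch (ours, not part of the article) computes the decision weights of a risky prospect exactly as defined above: losses are weighted through the cumulative form with w^−, gains through the decumulative form with w^+. Passing the identity function for both reproduces the additive case noted earlier, in which each decision weight is simply the outcome's probability; with nonlinear w^+ and w^− the weights of a mixed prospect need not sum to 1.

# Minimal sketch (not from the article) of the decision-weight definitions above.
# outcomes_probs: list of (outcome, probability) pairs of a risky prospect.
# w_plus, w_minus: weighting functions for gains and losses.

def decision_weights(outcomes_probs, w_plus, w_minus):
    ranked = sorted(outcomes_probs)                 # x_{-m} <= ... <= x_n
    losses = [(x, p) for x, p in ranked if x < 0]
    gains = [(x, p) for x, p in ranked if x >= 0]
    weights = []
    cum = 0.0                                       # losses: cumulative from the worst outcome up
    for x, p in losses:
        weights.append((x, w_minus(cum + p) - w_minus(cum)))
        cum += p
    decum = 0.0                                     # gains: decumulative from the best outcome down
    for x, p in reversed(gains):
        weights.append((x, w_plus(decum + p) - w_plus(decum)))
        decum += p
    return weights

identity = lambda p: p                              # additive case: weights equal probabilities

die = [(-5, 1/6), (-3, 1/6), (-1, 1/6), (2, 1/6), (4, 1/6), (6, 1/6)]
print(decision_weights(die, identity, identity))    # every outcome of f gets weight 1/6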
1
In keeping with the spirit of prospect theory, we use the decumulative form for gains and the
cumulative form for losses. This notation is vindicated by the experimental findings described in
section "Experiment".

Luce and Fishburn (1991) derived essentially the same representation from a more
elaborate theory involving an operation ⊕ of joint receipt or multiple play. Thus,
f ⊕ g is the composite prospect obtained by playing both f and g, separately. The
key feature of their theory is that the utility function U is additive with respect
to ⊕, that is, U(f ⊕ g) = U(f) + U(g), provided one prospect is acceptable (i.e.,
preferred to the status quo) and the other is not. This condition seems too restrictive
both normatively and descriptively. As noted by the authors, it implies that the
utility of money is a linear function of money if for all sums of money x, y,
U(x ⊕ y) = U(x + y). This assumption appears to us inescapable because the joint
receipt of x and y is tantamount to receiving their sum. Thus, we expect the decision
maker to be indifferent between receiving a $10 bill or receiving a $20 bill and
returning $10 in change. The Luce-Fishburn theory, therefore, differs from ours in
two essential respects. First, it extends to composite prospects that are not treated in
the present theory. Second, it practically forces utility to be proportional to money.
The present representation encompasses several previous theories that employ
the same decision weights for all outcomes. Starmer and Sugden (1989) considered
a model in which w^−(p) = w^+(p), as in the original version of prospect theory. In
contrast, the rank-dependent models assume w^−(p) = 1 − w^+(1 − p) or W^−(A) =
1 − W^+(S − A). If we apply the latter condition to choice between uncertain assets,
we obtain the choice model established by Schmeidler (1989), which is based on the
Choquet integral.2 Other axiomatizations of this model were developed by Gilboa
(1987), Nakamura (1990), and Wakker (1989a, b). For probabilistic (rather than
uncertain) prospects, this model was first established by Quiggin (1982) and Yaari
(1987), and was further analyzed by Chew (1989), Segal (1989), and Wakker (1990).
An earlier axiomatization of this model in the context of income inequality was
presented by Weymark (1981). Note that in the present theory, the overall value
V(f) of a mixed prospect is not a Choquet integral but rather a sum V(f^+) + V(f^−)
of two such integrals.
The present treatment extends the original version of prospect theory in several
respects. First, it applies to any finite prospect and it can be extended to continuous
distributions. Second, it applies to both probabilistic and uncertain prospects and
can, therefore, accommodate some form of source dependence. Third, the present
theory allows different decision weights for gains and losses, thereby generalizing
the original version that assumes w^+ = w^−. Under this assumption, the present
theory coincides with the original version for all two-outcome prospects and for
all mixed three-outcome prospects. It is noteworthy that for prospects of the form
(x,p;y, 1 – p), where either x > y > 0 or x < y < 0, the original theory is in fact
rank dependent. Although the two models yield similar predictions in general,
the cumulative version—unlike the original one—satisfies stochastic dominance.
Thus, it is no longer necessary to assume that transparently dominated prospects
are eliminated in the editing phase—an assumption that was criticized by some
authors. On the other hand, the present version can no longer explain violations
of stochastic dominance in nontransparent contexts (e.g., Tversky and Kahneman
1986). An axiomatic analysis of the present theory and its relation to cumulative
utility theory and to expected utility theory are discussed in the appendix; a more
comprehensive treatment is presented in Wakker and Tversky (1991).
2
This model appears under different names. We use cumulative utility theory to describe the
application of a Choquet integral to a standard utility function, and cumulative prospect theory
to describe the application of two separate Choquet integrals to the value of gains and losses.
Values and Weights
In expected utility theory, risk aversion and risk seeking are determined solely by the
utility function. In the present theory, as in other cumulative models, risk aversion
and risk seeking are determined jointly by the value function and by the capacities,
which in the present context are called cumulative weighting functions, or weighting
functions for short. As in the original version of prospect theory, we assume that v is
concave above the reference point (v″(x) ≤ 0 for x ≥ 0) and convex below the reference
point (v″(x) ≥ 0 for x ≤ 0). We also assume that v is steeper for losses than for gains:
v′(x) < v′(−x) for x ≥ 0. The first two conditions reflect the principle of diminishing
sensitivity: the impact of a change diminishes with the distance from the reference
point. The last condition is implied by the principle of loss aversion according to
which losses loom larger than corresponding gains (Tversky and Kahneman 1991).
The principle of diminishing sensitivity applies to the weighting functions as
well. In the evaluation of outcomes, the reference point serves as a boundary that
distinguishes gains from losses. In the evaluation of uncertainty, there are two
natural boundaries—certainty and impossibility—that correspond to the endpoints
of the certainty scale. Diminishing sensitivity entails that the impact of a given
change in probability diminishes with its distance from the boundary. For example,
an increase of .1 in the probability of winning a given prize has more impact when
it changes the probability of winning from .9 to 1.0 or from 0 to .1, than when
it changes the probability of winning from .3 to .4 or from .6 to .7. Diminishing
sensitivity, therefore, gives rise to a weighting function that is concave near 0 and
convex near 1. For uncertain prospects, this principle yields subadditivity for very
unlikely events and superadditivity near certainty. However, the function is not
well-behaved near the endpoints, and very small probabilities can be either greatly
overweighted or neglected altogether.
Before we turn to the main experiment, we wish to relate the observed non-
linearity of preferences to the shape of the weighting function. For this purpose,
we devised a new demonstration of the common consequence effect in decisions
involving uncertainty rather than risk. Table 24.1 displays a pair of decision
problems (I and II) presented in that order to a group of 156 money managers
during a workshop. The participants chose between prospects whose outcomes were
contingent on the difference d between the closing values of the Dow-Jones today
and tomorrow. For example, f′ pays $25,000 if d exceeds 30 and nothing otherwise.
The percentage of respondents who chose each prospect is given in brackets. The
independence axiom of expected utility theory implies that f is preferred to g iff f′
is preferred to g′. Table 24.1 shows that the modal choice was f in problem I and
g′ in problem II. This pattern, which violates independence, was chosen by 53 % of
the respondents.
Essentially the same pattern was observed in a second study following the same
design. A group of 98 Stanford students chose between prospects whose outcomes
were contingent on the point-spread d in the forthcoming Stanford-Berkeley football
game. Table 24.2 presents the prospects in question. For example, g pays $10 if
Stanford does not win, $30 if it wins by 10 points or less, and nothing if it wins
by more than 10 points. Ten percent of the participants, selected at random, were
actually paid according to one of their choices. The modal choice, selected by 46 %
of the subjects, was f and g′, again in direct violation of the independence axiom.
To explore the constraints imposed by this pattern, let us apply the present theory
to the modal choices in Table 24.1, using $1,000 as a unit. Since f is preferred to g
in problem I,

v(25) > v(75)W^+(C) + v(25)[W^+(A ∪ C) − W^+(C)]

or

v(25)[1 − W^+(A ∪ C) + W^+(C)] > v(75)W^+(C).

This, together with the preference for g′ over f′ in problem II, implies

W^+(S) − W^+(S − B) > W^+(C ∪ B) − W^+(C).    (24.3)
Thus, "subtracting" B from certainty has more impact than "subtracting" B from
C ∪ B. Let Ŵ^+(D) = 1 − W^+(S − D), and ŵ^+(p) = 1 − w^+(1 − p). It follows readily
that Eq. (24.3) is equivalent to the subadditivity of Ŵ^+, that is, Ŵ^+(B) + Ŵ^+(D) ≥
Ŵ^+(B ∪ D). For probabilistic prospects, Eq. (24.3) reduces to

1 − w^+(1 − q) > w^+(p + q) − w^+(p),

or

ŵ^+(q) > w^+(p + q) − w^+(p).    (24.4)

Allais's example corresponds to the case where p(C) = .10, p(B) = .89, and
p(A) = .01.
It is noteworthy that the violations of independence reported in Tables 24.1 and
24.2 are also inconsistent with regret theory, advanced by Loomes and Sugden
(1982a, b), and with Fishburn’s (1988) SSA model. Regret theory explains Allais’s
example by assuming that the decision maker evaluates the consequences as if the
two prospects in each choice are statistically independent. When the prospects in
question are defined by the same set of events, as in Tables 24.1 and 24.2, regret
theory (like Fishburn’s SSA model) implies independence, since it is additive over
states. The finding that the common consequence effect is very much in evidence in
the present problems undermines the interpretation of Allais’s example in terms of
regret theory.
The common consequence effect implies the subadditivity of Ŵ^+ and of ŵ^+.
Other violations of expected utility theory imply the subadditivity of W^+ and of w^+
for small and moderate probabilities. For example, Prelec (1990) observed that most
respondents prefer a 2% chance to win $20,000 over a 1% chance to win $30,000; they also
prefer a 1% chance to win $30,000 and a 32% chance to win $20,000 over a 34% chance to win $20,000. In terms of the
present theory, these data imply that w^+(.02) − w^+(.01) ≥ w^+(.34) − w^+(.33). More
generally, we hypothesize
Experiment
An experiment was carried out to obtain detailed information about the value and
weighting functions. We made a special effort to obtain high-quality data. To this
end, we recruited 25 graduate students from Berkeley and Stanford (12 men and
Procedure
3
An IBM disk containing the exact instructions, the format, and the complete experimental
procedure can be obtained from the authors.
Table 24.3 Median cash equivalents (in dollars) for all nonmixed prospects

(0, 50):       .10 → 9,    .50 → 21,   .90 → 37
(0, –50):      .10 → 8,    .50 → 21,   .90 → 39
(0, 100):      .05 → 14,   .25 → 25,   .50 → 36,   .75 → 52,   .95 → 78
(0, –100):     .05 → 8,    .25 → 23.5, .50 → 42,   .75 → 63,   .95 → 84
(0, 200):      .01 → 10,   .10 → 20,   .50 → 76,   .90 → 131,  .99 → 188
(0, –200):     .01 → 3,    .10 → 23,   .50 → 89,   .90 → 155,  .99 → 190
(0, 400):      .01 → 12,   .99 → 377
(0, –400):     .01 → 14,   .99 → 380
(50, 100):     .10 → 59,   .50 → 71,   .90 → 83
(–50, –100):   .10 → 59,   .50 → 71,   .90 → 85
(50, 150):     .05 → 64,   .25 → 72.5, .50 → 86,   .75 → 102,  .95 → 128
(–50, –150):   .05 → 60,   .25 → 71,   .50 → 92,   .75 → 113,  .95 → 132
(100, 200):    .05 → 118,  .25 → 130,  .50 → 141,  .75 → 162,  .95 → 178
(–100, –200):  .05 → 112,  .25 → 121,  .50 → 142,  .75 → 158,  .95 → 179

Note: The two outcomes of each prospect are listed first; each entry gives the probability of the
second (i.e., more extreme) outcome and the median cash equivalent. For example, $9 is the
median cash equivalent of the prospect (0, .9; $50, .1)
Results
The most distinctive implication of prospect theory is the fourfold pattern of risk
attitudes. For the nonmixed prospects used in the present study, the shapes of the
value and the weighting functions imply risk-averse and risk-seeking preferences,
respectively, for gains and for losses of moderate or high probability. Furthermore,
the shape of the weighting functions favors risk seeking for small probabilities of
gains and risk aversion for small probabilities of loss, provided the outcomes are
not extreme. Note, however, that prospect theory does not imply perfect reflection
in the sense that the preference between any two positive prospects is reversed when
gains are replaced by losses. Table 24.4 presents, for each subject, the percentage of
risk-seeking choices (where the certainty equivalent exceeded expected value) for
gains and for losses with low (p ≤ .1) and with high (p ≥ .5) probabilities. Table 24.4
shows that for p ≥ .5, all 25 subjects are predominantly risk averse for positive
prospects and risk seeking for negative ones. Moreover, the entire fourfold pattern
is observed for 22 of the 25 subjects, with some variability at the level of individual
choices.
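
The classification used in Table 24.4 is mechanical: a response counts as risk seeking when the stated certainty equivalent exceeds the prospect's expected value, and as risk averse when it falls short of it. The following small sketch (ours, not the authors') applies that rule to two of the median responses quoted for Table 24.3.

# Minimal sketch (not from the article): the operational risk-attitude classification.
def classify(cash_equivalent, outcomes_probs):
    ev = sum(x * p for x, p in outcomes_probs)      # expected value of the prospect
    if cash_equivalent > ev:
        return "risk seeking"
    if cash_equivalent < ev:
        return "risk averse"
    return "risk neutral"

# Median cash equivalents $9 for (0, .9; $50, .1) and $14 for (0, .95; $100, .05), from Table 24.3:
print(classify(9, [(50, 0.10), (0, 0.90)]))     # 9 > EV of 5  -> risk seeking (low-probability gain)
print(classify(14, [(100, 0.05), (0, 0.95)]))   # 14 > EV of 5 -> risk seeking (low-probability gain)
# High-probability gains and low-probability losses in Table 24.3 show the reverse pattern.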
Although the overall pattern of preferences is clear, the individual data, of course,
reveal both noise and individual differences. The correlations, across subjects,
between the cash equivalents for the same prospects on successive sessions averaged
.55 over six different prospects. Table 24.5 presents means (after transformation to
Fisher’s z) of the correlations between the different types of prospects. For example,
there were 19 and 17 prospects, respectively, with high probability of gain and high
probability of loss. The value of .06 in Table 24.5 is the mean of the 17 × 19 = 323
correlations between the cash equivalents of these prospects.
The correlations between responses within each of the four types of prospects
average .41, slightly lower than the correlations between separate responses to the
same problems. The two negative values in Table 24.5 indicate that those subjects
who were more risk averse in one domain tended to be more risk seeking in the other.
Although the individual correlations are fairly low, the trend is consistent: 78 % of
the 403 correlations in these two cells are negative. There is also a tendency for
subjects who are more risk averse for high-probability gains to be less risk seeking
for gains of low probability. This trend, which is absent in the negative domain,
could reflect individual differences either in the elevation of the weighting function
or in the curvature of the value function for gains. The very low correlations in the
two remaining cells of Table 24.5, averaging .05, indicate that there is no general
trait of risk aversion or risk seeking. Because individual choices are quite noisy,
aggregation of problems is necessary for the analysis of individual differences.
The fourfold pattern of risk attitudes emerges as a major empirical generalization
about choice under risk. It has been observed in several experiments (see, e.g.,
Cohen et al. 1987), including a study of experienced oil executives involving
significant, albeit hypothetical, gains and losses (Wehrung 1989). It should be noted
that prospect theory implies the pattern demonstrated in Table 24.4 within the data of
individual subjects, but it does not imply high correlations across subjects because
the values of gains and of losses can vary independently. The failure to appreciate
this point and the limited reliability of individual responses have led some previous
authors (e.g., Hershey and Schoemaker 1980) to underestimate the robustness of the
fourfold pattern.
Scaling
Fig. 24.1 Median c/x for all positive prospects of the form (x, p; 0,1 – p). Triangles and circles,
respectively, correspond to values of x that lie above or below 200
Fig. 24.2 Median c/x for all negative prospects of the form (x, p; 0,1 – p). Triangles and circles,
respectively, correspond to values of x that lie below or above – 200
constant, that is, c(kf) = kc(f). In expected utility theory, preference homogeneity
gives rise to constant relative risk aversion. Under the present theory, assuming
X = Re, preference homogeneity is both necessary and sufficient to represent v as a
two-part power function of the form

v(x) = x^α           if x ≥ 0
v(x) = −λ(−x)^β      if x < 0.    (24.5)
Figures 24.1 and 24.2 exhibit the characteristic pattern of risk aversion and risk
seeking observed in Table 24.4. They also indicate that preference homogeneity
holds as a good approximation. The slight departures from homogeneity in Fig. 24.1
suggest that the cash equivalents of positive prospects increase more slowly than the
stakes (triangles tend to lie below the circles), but no such tendency is evident in
Fig. 24.2. Overall, it appears that the present data can be approximated by a two-
part power function. The smooth curves in Figs. 24.1 and 24.2 can be interpreted
as weighting functions, assuming a linear value function. They were fitted using the
following functional form:
w^+(p) = p^γ / (p^γ + (1 − p)^γ)^{1/γ},    w^−(p) = p^δ / (p^δ + (1 − p)^δ)^{1/δ}.    (24.6)
This form has several useful features: it has only one parameter; it encompasses
weighting functions with both concave and convex regions; it does not require
w(.5) = .5; and most important, it provides a reasonably good approximation to
both the aggregate and the individual data for probabilities in the range between
.05 and .95.
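
For readers who want to experiment, here is a compact sketch (ours, not part of the article) of the two-part power value function (24.5) and the one-parameter weighting function (24.6). The default parameter values anticipate the median estimates reported below (α = β = 0.88, λ = 2.25); the weighting parameter is left to be supplied (γ for gains, δ for losses).

# Minimal sketch (not from the article) of Eqs. (24.5) and (24.6).
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    # Two-part power value function: concave for gains, convex and steeper for losses.
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

def weight(p, c):
    # One-parameter weighting function; use c = gamma for gains, c = delta for losses.
    # For parameter values in the estimated range it is concave near 0 and convex near 1.
    return p ** c / (p ** c + (1 - p) ** c) ** (1 / c)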
Further information about the properties of the value function can be derived
from the data presented in Table 24.6. The adjustments of mixed prospects to
acceptability (problems 1–4) indicate that, for even chances to win and lose, a
prospect will only be acceptable if the gain is at least twice as large as the loss.
This observation is compatible with a value function that changes slope abruptly
at zero, with a loss-aversion coefficient of about 2 (Tversky and Kahneman 1991).
The median matches in problems 5 and 6 are also consistent with this estimate:
when the possible loss is increased by k the compensating gain must be increased
by about 2 k. Problems 7 and 8 are obtained from problems 5 and 6, respectively, by
positive translations that turn mixed prospects into strictly positive ones. In contrast
to the large values of θ observed in problems 1–6, the responses in problems 7 and
8 indicate that the curvature of the value function for gains is slight. A decrease in
the smallest gain of a strictly positive prospect is fully compensated by a slightly
larger increase in the largest gain. The standard rank-dependent model, which lacks
the notion of a reference point, cannot account for the dramatic effects of small
translations of prospects illustrated in Table 24.6.
The estimation of a complex choice model, such as cumulative prospect theory,
is problematic. If the functions associated with the theory are not constrained, the
number of estimated parameters for each subject is too large. To reduce this number,
it is common to assume a parametric form (e.g., a power utility function), but
this approach confounds the general test of the theory with that of the specific
parametric form. For this reason, we focused here on the qualitative properties of
the data rather than on parameter estimates and measures of fit. However, in order to
obtain a parsimonious description of the present data, we used a nonlinear regression
procedure to estimate the parameters of Eqs. (24.5) and (24.6), separately for each
subject. The median exponent of the value function was 0.88 for both gains and
losses, in accord with diminishing sensitivity. The median λ was 2.25, indicating
pronounced loss aversion, and the median values of γ and δ, respectively, were 0.61
and 0.69, in agreement with Eqs. (24.3) and (24.4) above.4 The parameters estimated
from the median data were essentially the same. Figure 24.3 plots w^+ and w^− using
the median estimates of γ and δ.
Figure 24.3 shows that, for both positive and negative prospects, people over-
weight low probabilities and underweight moderate and high probabilities. As a
consequence, people are relatively insensitive to probability difference in the middle
of the range. Figure 24.3 also shows that the weighting functions for gains and for
losses are quite close, although the former is slightly more curved than the latter (i.e.,
γ < δ). Accordingly, risk aversion for gains is more pronounced than risk seeking
for losses, for moderate and high probabilities (see Table 24.3). It is noteworthy that
the condition w^+(p) = w^−(p), assumed in the original version of prospect theory,
accounts for the present data better than the assumption w^+(p) = 1 − w^−(1 − p),
4
Camerer and Ho (1991) applied Eq. (24.6) to several studies of risky choice and estimated γ from
aggregate choice probabilities using a logistic distribution function. Their mean estimate (.56) was
quite close to ours.
Fig. 24.3 Weighting functions for gains (w^+) and for losses (w^−) based on median estimates of
γ and δ in Eq. (24.6)
Fig. 24.4 Indifference curves of cumulative prospect theory (a) nonnegative prospects (x1 = 0,
x2 = 100, x3 = 200), and (b) nonpositive prospects (x1 = −200, x2 = −100, x3 = 0). The curves
are based on the respective weighting functions of Fig. 24.3 (γ = .61, δ = .69) and on the median
estimates of the exponents of the value function (α = β = .88). The broken line through the origin
represents the prospects whose expected value is x2
Finally, the indifference curves for nonpositive prospects resemble the curves for
nonnegative prospects reflected around the 45ı line, which represents risk neutrality.
For example, a sure gain of $100 is equally as attractive as a 71 % chance to win
$200 or nothing (see Fig. 24.4a), and a sure loss of $100 is equally as aversive as a
64 % chance to lose $200 or nothing (see Fig. 24.4b). The approximate reflection of
the curves is of special interest because it distinguishes the present theory from the
standard rank-dependent model in which the two sets of curves are essentially the
same.
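
The two equivalences just cited can be checked directly against the fitted functions; the following sketch (ours, not from the article) recomputes them with the median estimates α = β = .88, λ = 2.25, γ = .61, δ = .69.

# Minimal check (not from the article) of the indifference points quoted above.
def value(x, alpha=0.88, beta=0.88, lam=2.25):
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

def weight(p, c):
    return p ** c / (p ** c + (1 - p) ** c) ** (1 / c)

# A sure gain of $100 vs. a 71% chance to win $200 or nothing (Fig. 24.4a):
print(value(100), weight(0.71, 0.61) * value(200))    # about 57.5 vs. 57.2
# A sure loss of $100 vs. a 64% chance to lose $200 or nothing (Fig. 24.4b):
print(value(-100), weight(0.64, 0.69) * value(-200))  # about -129.5 vs. -129.9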
Incentives
We conclude this section with a brief discussion of the role of monetary incentives.
In the present study we did not pay subjects on the basis of their choices because
in our experience with choice between prospects of the type used in the present
study, we did not find much difference between subjects who were paid a flat fee
and subjects whose payoffs were contingent on their decisions. The same conclusion
was obtained by Camerer (1989), who investigated the effects of incentives using
several hundred subjects. He found that subjects who actually played the gamble
gave essentially the same responses as subjects who did not play; he also found
no differences in reliability and roughly the same decision time. Although some
studies found differences between paid and unpaid subjects in choice between
simple prospects, these differences were not large enough to change any significant
qualitative conclusions. Indeed, all major violations of expected utility theory
(e.g. the common consequence effect, the common ratio effect, source dependence,
loss aversion, and preference reversals) were obtained both with and without
monetary incentives.
As noted by several authors, however, the financial incentives provided in choice
experiments are generally small relative to people’s incomes. What happens when
the stakes correspond to three- or four-digit rather than one- or two-digit figures?
To answer this question, Kachelmeier and Shehata (1991) conducted a series of
experiments using Masters students at Beijing University, most of whom had taken
at least one course in economics or business. Due to the economic conditions in
China, the investigators were able to offer subjects very large rewards. In the high
payoff condition, subjects earned about three times their normal monthly income
in the course of one experimental session! On each trial, subjects were presented
with a simple bet that offered a specified probability to win a given prize, and
nothing otherwise. Subjects were instructed to state their cash equivalent for each
bet. An incentive compatible procedure (the BDM scheme) was used to determine,
on each trial, whether the subject would play the bet or receive the “official” selling
price. If departures from the standard theory are due to the mental cost associated
with decision making and the absence of proper incentives, as suggested by Smith
and Walker (1992), then the highly paid Chinese subjects should not exhibit the
characteristic nonlinearity observed in hypothetical choices, or in choices with small
payoffs.
However, the main finding of Kachelmeier and Shehata (1991) is massive risk
seeking for small probabilities. Risk seeking was slightly more pronounced for
lower payoffs, but even in the highest payoff condition, the cash equivalent for a
5 % bet (their lowest probability level) was, on average, three times larger than its
expected value. Note that in the present study the median cash equivalent of a 5 %
chance to win $100 (see Table 24.3) was $14, almost three times the expected value
of the bet. In general, the cash equivalents obtained by Kachelmeier and Shehata
were higher than those observed in the present study. This is consistent with the
finding that minimal selling prices are generally higher than certainty equivalents
derived from choice (see, e.g., Tversky et al. 1990). As a consequence, they found
little risk aversion for moderate and high probability of winning. This was true for
the Chinese subjects, at both high and low payoffs, as well as for Canadian subjects,
who either played for low stakes or did not receive any payoff. The most striking
result in all groups was the marked overweighting of small probabilities, in accord
with the present analysis.
Evidently, high incentives do not always dominate noneconomic considerations,
and the observed departures from expected utility theory cannot be rationalized in
terms of the cost of thinking. We agree with Smith and Walker (1992) that monetary
incentives could improve performance under certain conditions by eliminating care-
less errors. However, we maintain that monetary incentives are neither necessary
nor sufficient to ensure subjects’ cooperativeness, thoughtfulness, or truthfulness.
The similarity between the results obtained with and without monetary incentives
in choice between simple prospects provides no special reason for skepticism about
experiments without contingent payment.
Discussion
Theories of choice under uncertainty commonly specify (1) the objects of choice, (2)
a valuation rule, and (3) the characteristics of the functions that map uncertain events
and possible outcomes into their subjective counterparts. In standard applications
of expected utility theory, the objects of choice are probability distributions over
wealth, the valuation rule is expected utility, and utility is a concave function of
wealth. The empirical evidence reported here and elsewhere requires major revisions
of all three elements. We have proposed an alternative descriptive theory in which
(1) the objects of choice are prospects framed in terms of gains and losses, (2) the
valuation rule is a two-part cumulative functional, and (3) the value function is S-
shaped and the weighting functions are inverse S-shaped. The experimental findings
confirmed the qualitative properties of these scales, which can be approximated by a
(two-part) power value function and by identical weighting functions for gains and
losses.
The curvature of the weighting function explains the characteristic reflection
pattern of attitudes to risky prospects. Overweighting of small probabilities con-
tributes to the popularity of both lotteries and insurance. Underweighting of high
probabilities contributes both to the prevalence of risk aversion in choices between
probable gains and sure things, and to the prevalence of risk seeking in choices
between probable and sure losses. Risk aversion for gains and risk seeking for losses
are further enhanced by the curvature of the value function in the two domains.
The pronounced asymmetry of the value function, which we have labeled loss
aversion, explains the extreme reluctance to accept mixed prospects. The shape
of the weighting function explains the certainty effect and violations of quasi-
convexity. It also explains why these phenomena are most readily observed at the
two ends of the probability scale, where the curvature of the weighting function is
most pronounced (Camerer 1992).
The new demonstrations of the common consequence effect, described in
Tables 24.1 and 24.2, show that choice under uncertainty exhibits some of the
main characteristics observed in choice under risk. On the other hand, there are
indications that the decision weights associated with uncertain and with risky
prospects differ in important ways. First, there is abundant evidence that subjective
judgments of probability do not conform to the rules of probability theory (Kahne-
man et al. 1982). Second, Ellsberg’s example and more recent studies of choice
under uncertainty indicate that people prefer some sources of uncertainty over
others. For example, Heath and Tversky (1991) found that individuals consistently
preferred bets on uncertain events in their area of expertise over matched bets on
chance devices, although the former are ambiguous and the latter are not. The
presence of systematic preferences for some sources of uncertainty calls for different
weighting functions for different domains, and suggests that some of these functions
lie entirely above others. The investigation of decision weights for uncertain events
emerges as a promising domain for future research.
The present theory retains the major features of the original version of prospect
theory and introduces a (two-part) cumulative functional, which provides a con-
Let F = {f: S → X} be the set of all prospects under study, and let F^+ and F^−
denote the positive and the negative prospects, respectively. Let ≿ be a binary
preference relation on F, and let ~ and > denote its symmetric and asymmetric
parts, respectively. We assume that ≿ is complete, transitive, and strictly monotonic,
that is, if f ≠ g and f(s) ≥ g(s) for all s ∈ S, then f > g.
For any f, g ∈ F and A ⊂ S, define h = fAg by: h(s) = f(s) if s ∈ A, and h(s) = g(s)
if s ∈ S − A. Thus, fAg coincides with f on A and with g on S − A. A preference
relation ≿ on F satisfies independence if for all f, g, f′, g′ ∈ F and A ⊂ S, fAg ≿
fAg′ iff f′Ag ≿ f′Ag′. This axiom, also called the sure thing principle (Savage
1954), is one of the basic qualitative properties underlying expected utility theory,
and it is violated by Allais's common consequence effect. Indeed, the attempt
to accommodate Allais's example has motivated the development of numerous
models, including cumulative utility theory. The key concept in the axiomatic
analysis of that theory is the relation of comonotonicity, due to Schmeidler (1989). A
pair of prospects f, g ∈ F are comonotonic if there are no s, t ∈ S such that f(s) > f(t)
and g(t) > g(s). Note that a constant prospect that yields the same outcome in every
state is comonotonic with all prospects. Obviously, comonotonicity is symmetric
but not transitive.
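In computational terms, the composite prospect fAg and the comonotonicity test can be rendered as follows (a minimal Python sketch; the dictionary representation of prospects and the state labels are assumed purely for illustration):

def composite(f, g, A):
    # fAg: coincides with f on the event A and with g elsewhere.
    return {s: (f[s] if s in A else g[s]) for s in f}

def comonotonic(f, g):
    # f and g are comonotonic iff no pair of states ranks them in opposite orders.
    states = list(f)
    return not any(f[s] > f[t] and g[t] > g[s] for s in states for t in states)

f = {'s1': 200, 's2': 100, 's3': 0}
g = {'s1': 50, 's2': 25, 's3': -25}        # same ranking of states as f
h = {'s1': -25, 's2': 25, 's3': 50}        # opposite ranking of states
const = {'s1': 10, 's2': 10, 's3': 10}

print(comonotonic(f, g), comonotonic(f, h))  # True False
print(comonotonic(const, h))                 # True: a constant prospect is comonotonic with every prospect
print(composite(f, g, {'s1'}))               # {'s1': 200, 's2': 25, 's3': -25}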
Cumulative utility theory does not satisfy independence in general, but it implies
independence whenever the prospects fAg, fAg′, f′Ag, and f′Ag′ above are pairwise
comonotonic. This property is called comonotonic independence.5 It also holds in
cumulative prospect theory, and it plays an important role in the characterization
of this theory, as will be shown below. Cumulative prospect theory satisfies an
additional property, called double matching: for all f, g ∈ F, if f⁺ ∼ g⁺ and f⁻ ∼ g⁻,
then f ∼ g.
To characterize the present theory, we assume the following structural conditions:
S is finite and includes at least three states; X = Re; and the preference order is
continuous in the product topology on Reᵏ, that is, {f ∈ F : f ≽ g} and {f ∈ F :
g ≽ f} are closed for any g ∈ F. The latter assumptions can be replaced by restricted
solvability and a comonotonic Archimedean axiom (Wakker 1991).
Theorem 24.1 Suppose (F⁺, ≽) and (F⁻, ≽) can each be represented by a
cumulative functional. Then (F, ≽) satisfies cumulative prospect theory iff it
satisfies double matching and comonotonic independence.
The proof of the theorem is given at the end of the appendix. It is based on a
theorem of Wakker (1992) regarding the additive representation of lower-diagonal
structures. Theorem 24.1 provides a generic procedure for characterizing cumulative
prospect theory. Take any axiom system that is sufficient to establish an essentially
unique cumulative (i.e., rank-dependent) representation. Apply it separately to the
preferences between positive prospects and to the preferences between negative
prospects, and construct the value function and the decision weights separately for
F⁺ and for F⁻. Theorem 24.1 shows that comonotonic independence and double
matching ensure that, under the proper rescaling, the sum V(f⁺) + V(f⁻) preserves
the preference order between mixed prospects. In order to distinguish more sharply
between the conditions that give rise to a one-part or a two-part representation, we
need to focus on a particular axiomatization of the Choquet functional. We chose
Wakker’s (1989a, b) because of its generality and compactness.
For x ∈ X, f ∈ F, and r ∈ S, let x{r}f be the prospect that yields x in state r
and coincides with f in all other states. Following Wakker (1989a), we say that a
preference relation satisfies tradeoff consistency6 (TC) if for all x, x′, y, y′ ∈ X, f, f′,
g, g′ ∈ F, and s, t ∈ S:
5 Wakker (1989b) called this axiom comonotonic coordinate independence. Schmeidler (1989) used
comonotonic independence for the mixture space version of this axiom: f ≽ g iff αf + (1 − α)h ≽
αg + (1 − α)h.
6 Wakker (1989a, b) called this property cardinal coordinate independence. He also introduced an
equivalent condition, called the absence of contradictory tradeoffs.
x{s}f ≼ y{s}g, x′{s}f ≽ y′{s}g, and x{t}f′ ≽ y{t}g′ imply x′{t}f′ ≽ y′{t}g′.
To appreciate the import of this condition, suppose its premises hold but
the conclusion is reversed, that is, y′{t}g′ ≻ x′{t}f′. It is easy to verify that
under expected utility theory, the first two inequalities, involving {s}, imply
u(y) − u(y′) ≥ u(x) − u(x′), whereas the other two inequalities, involving {t}, imply
the opposite conclusion. Tradeoff consistency, therefore, is needed to ensure that
"utility intervals" can be consistently ordered. Essentially the same condition was
used by Tversky et al. (1988) in the analysis of preference reversal, and by Tversky
and Kahneman (1991) in the characterization of constant loss aversion.
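To spell the verification out (a sketch; p here denotes the subjective probabilities of expected utility theory, and the abbreviations A and B are introduced only for this purpose): write A = Σ_{r≠s} p_r u(f(r)) and B = Σ_{r≠s} p_r u(g(r)). The premise x{s}f ≼ y{s}g gives p_s u(x) + A ≤ p_s u(y) + B, and x′{s}f ≽ y′{s}g gives p_s u(x′) + A ≥ p_s u(y′) + B; subtracting the first from the second and cancelling p_s yields u(y) − u(y′) ≥ u(x) − u(x′). Applying the same computation in state t to the premise x{t}f′ ≽ y{t}g′ and the reversed conclusion y′{t}g′ ≻ x′{t}f′ yields the strict opposite inequality, which is the announced contradiction.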
A preference relation satisfies comonotonic tradeoff consistency (CTC) if TC
holds whenever the prospects x{s}f, y{s}g, x′{s}f, and y′{s}g are pairwise comono-
tonic, as are the prospects x{t}f′, y{t}g′, x′{t}f′, and y′{t}g′ (Wakker 1989a). Finally,
a preference relation satisfies sign-comonotonic tradeoff consistency (SCTC) if
CTC holds whenever the consequences x, x′, y, y′ are either all nonnegative or
all nonpositive. Clearly, TC is stronger than CTC, which is stronger than SCTC.
Indeed, it is not difficult to show that (1) expected utility theory implies TC, (2)
cumulative utility theory implies CTC but not TC, and (3) cumulative prospect
theory implies SCTC but not CTC. The following theorem shows that, given our
other assumptions, these properties are not only necessary but also sufficient to
characterize the respective theories.
Theorem 24.2 Assume the structural conditions described above.
(a) (Wakker 1989a) Expected utility theory holds iff ≽ satisfies TC.
(b) (Wakker 1989b) Cumulative utility theory holds iff ≽ satisfies CTC.
(c) Cumulative prospect theory holds iff ≽ satisfies double matching and SCTC.
A proof of part c of the theorem is given at the end of this section. It shows that,
in the presence of our structural assumptions and double matching, the restriction
of tradeoff consistency to sign-comonotonic prospects yields a representation with
a reference-dependent value function and different decision weights for gains and
for losses.
Proof of Theorem 24.1 The necessity of comonotonic independence and double
matching is straightforward. To establish sufficiency, recall that, by assumption,
there exist functions π⁺, π⁻, v⁺, v⁻, such that V⁺ = Σπ⁺v⁺ and V⁻ = Σπ⁻v⁻
preserve ≽ on F⁺ and on F⁻, respectively. Furthermore, by the structural assump-
tions, π⁺ and π⁻ are unique, whereas v⁺ and v⁻ are continuous ratio scales. Hence,
we can set v⁺(1) = 1 and v⁻(−1) = θ < 0, independently of each other.
Let Q be the set of prospects such that for any q ∈ Q, q(s) ≠ q(t) for any distinct
s, t ∈ S. Let F_q denote the set of all prospects in F that are comonotonic with q. By
comonotonic independence and our structural conditions, it follows readily from a
theorem of Wakker (1992) on additive representations for lower-triangular subsets
of Reᵏ that, given any q ∈ Q, there exist interval scales {U_qi}, with a common
unit, such that U_q = Σ_i U_qi preserves ≽ on F_q. With no loss of generality we
can set U_qi(0) = 0 for all i and U_q(1) = 1. Since V⁺ and V⁻ above are additive
representations of ≽ on F_q⁺ and F_q⁻, respectively, it follows by uniqueness that there
exist a_q, b_q > 0 such that for all i, U_qi equals a_q π_i⁺ v⁺ on Re⁺, and U_qi equals
b_q π_i⁻ v⁻ on Re⁻.
So far the representations were required to preserve the order only within each
F_q. Thus, we can choose scales so that b_q = 1 for all q. To relate the different
representations, select a prospect h ∈ Q. Since V⁺ should preserve the order on F⁺,
and U_q should preserve the order within each F_q, we can multiply V⁺ by a_h, and
replace each a_q by a_q/a_h. In other words, we may set a_h = 1. For any q ∈ Q, select f ∈
F_q, g ∈ F_h such that f⁺ ∼ g⁺ ≻ 0, f⁻ ∼ g⁻ ≺ 0, and g ∼ 0. By double matching, then,
f ∼ g ∼ 0. Thus, a_q V⁺(f⁺) + V⁻(f⁻) = 0, since this form preserves the order on
F_q. But V⁺(f⁺) = V⁺(g⁺) and V⁻(f⁻) = V⁻(g⁻), so V⁺(g⁺) + V⁻(g⁻) = 0 implies
V⁺(f⁺) + V⁻(f⁻) = 0. Hence, a_q = 1, and V(f) = V⁺(f⁺) + V⁻(f⁻) preserves the
order within each F_q.
To show that V preserves the order on the entire set, consider any f, g ∈ F and
suppose f ≽ g. By transitivity, c(f) ≽ c(g), where c(f) is the certainty equivalent
of f. Because c(f) and c(g) are comonotonic, V(f) = V(c(f)) ≥ V(c(g)) = V(g).
Analogously, f ≻ g implies V(f) > V(g), which completes the proof of Theorem 24.1.
Proof of Theorem 24.2 (part c) To establish the necessity of SCTC, apply cumula-
tive prospect theory to the hypotheses of SCTC to obtain the following inequalities:
V(x{s}f) = π_s v(x) + Σ_{r∈S−s} π_r v(f(r)) ≤ π′_s v(y) + Σ_{r∈S−s} π′_r v(g(r)) = V(y{s}g),
V(x′{s}f) = π_s v(x′) + Σ_{r∈S−s} π_r v(f(r)) ≥ π′_s v(y′) + Σ_{r∈S−s} π′_r v(g(r)) = V(y′{s}g).
The decision weights above are derived, assuming SCTC, in accord with Eqs.
(24.1) and (24.2). We use primes to distinguish the decision weights associated
with g from those associated with f. However, all the above prospects belong to
the same comonotonic set. Hence, two outcomes that have the same sign and are
associated with the same state have the same decision weight. In particular, the
weights associated with x{s}f and x′{s}f are identical, as are the weights associated
with y{s}g and with y′{s}g. These assumptions are implicit in the present notation.
It follows that
π_s v(x) − π′_s v(y) ≤ π_s v(x′) − π′_s v(y′).
Because x, y, x′, y′ have the same sign, all the decision weights associated with state
s are identical, that is, π_s = π′_s. Cancelling this common factor and rearranging
terms yields v(y) − v(y′) ≥ v(x) − v(x′).
Suppose SCTC is not valid, that is, x{t}f′ ≽ y{t}g′ but x′{t}f′ ≺ y′{t}g′. Applying
cumulative prospect theory, we obtain
V(x{t}f′) = π_t v(x) + Σ_{r∈S−t} π_r v(f′(r)) ≥ π_t v(y) + Σ_{r∈S−t} π_r v(g′(r)) = V(y{t}g′),
V(x′{t}f′) = π_t v(x′) + Σ_{r∈S−t} π_r v(f′(r)) < π_t v(y′) + Σ_{r∈S−t} π_r v(g′(r)) = V(y′{t}g′).
Adding these inequalities yields v(x) − v(x′) > v(y) − v(y′), contrary to the previous
conclusion, which establishes the necessity of SCTC. The necessity of double
matching is immediate.
To prove sufficiency, note that SCTC implies comonotonic independence. Letting
x = y, x′ = y′, and f = g in TC yields: x{t}f′ ≽ x{t}g′ implies x′{t}f′ ≽ x′{t}g′,
provided all the above prospects are pairwise comonotonic. This condition readily
entails comonotonic independence (see Wakker 1989b).
To complete the proof, note that SCTC coincides with CTC on (F⁺, ≽) and on
(F⁻, ≽). By part b of this theorem, the cumulative functional holds, separately,
in the nonnegative and in the nonpositive domains. Hence, by double matching
and comonotonic independence, cumulative prospect theory follows from Theorem
24.1.
References
Allais, M. (1953). Le comportement de l'homme rationnel devant le risque: critique des postulats
et axiomes de l'école américaine. Econometrica, 21, 503–546.
Arrow, K. J. (1982). Risk perception in psychology and economics. Economic Inquiry, 20, 1–9.
Camerer, C. F. (1989). An experimental test of several generalized utility theories. Journal of Risk
and Uncertainty, 2, 61–104.
Camerer, C. F. (1992). Recent tests of generalizations of expected utility theory. In W. Edwards
(Ed.), Utility: Theories, measurement and applications. Boston: Kluwer Academic Publishers.
Camerer, C. F., & Ho, T.-H., (1991). Nonlinear weighting of probabilities and violations of the
betweenness axiom. Unpublished manuscript, The Wharton School, University of Pennsylva-
nia.
Chew, S.-H. (1989). An axiomatic generalization of the quasilinear mean and the Gini mean with
application to decision theory. Unpublished manuscript, Department of Economics, University
of California at Irvine.
Choquet, G. (1955). Theory of capacities. Annales de L’Institut Fourier, 5, 131–295.
Cohen, M., Jaffray, J.-Y., & Said, T. (1987). Experimental comparison of individual behavior
under risk and under uncertainty for gains and for losses. Organizational Behavior and Human
Decision Processes, 39, 1–22.
Ellsberg, D. (1961). Risk, ambiguity, and the savage axioms. Quarterly Journal of Economics, 75,
643–669.
Fishburn, P. C. (1988). Nonlinear preference and utility theory. Baltimore: The Johns Hopkins
University Press.
Gilboa, I. (1987). Expected utility with purely subjective non-additive probabilities. Journal of
Mathematical Economics, 16, 65–88.
Heath, C., & Tversky, A. (1991). Preference and belief: Ambiguity and competence in choice under
uncertainty. Journal of Risk and Uncertainty, 4, 5–28.
Hershey, J. C., & Schoemaker, P. J. H. (1980). Prospect theory’s reflection hypothesis: A critical
examination. Organizational Behavior and Human Performance, 25, 395–418.
Hogarth, R., & Einhorn, H. (1990). Venture theory: A model of decision weights. Management
Science, 36, 780–803.
Kachelmeier, S. J., & Shehata, M. (1992). Examining risk preferences under high monetary
incentives: Experimental evidence from the People’s Republic of China. American Economic
Review, 82(5), 1120–1141.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47, 263–291.
Kahneman, D., & Tversky, A. (1984). Choices, values and frames. American Psychologist, 39,
341–350.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics
and biases. New York: Cambridge University Press.
Loomes, G., & Sugden, R. (1982a). Regret theory: An alternative theory of rational choice under
uncertainty. The Economic Journal, 92, 805–824.
Loomes, G., & Sugden, R. (1982b). Some implications of a more general form of regret theory.
Journal of Economic Theory, 41, 270–287.
Luce, R. D., & Fishburn, P. C. (1991). Rank- and sign-dependent linear utility models for finite
first-order gambles. Journal of Risk and Uncertainty, 4, 29–59.
Machina, M. J. (1987). Choice under uncertainty: Problems solved and unsolved. Economic
Perspectives, 1(1), 121–154.
Marschak, J. (1950). Rational behavior, uncertain prospects, and measurable utility. Econometrica,
18, 111–114.
Nakamura, Y. (1990). Subjective expected utility with non-additive probabilities on finite state
space. Journal of Economic Theory, 51, 346–366.
Prelec, D. (1989). On the shape of the decision weight function. Unpublished manuscript, Harvard
Graduate School of Business Administration.
Prelec, D. (1990). A ‘Pseudo-endowment’ effect, and its implications for some recent non-expected
utility models. Journal of Risk and Uncertainty, 3, 247–259.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organiza-
tion, 3, 323–343.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Economet-
rica, 57, 571–587.
Segal, U. (1989). Axiomatic representation of expected utility with rank-dependent probabilities.
Annals of Operations Research, 19, 359–373.
Smith, V. L., & Walker, J. M. (1992). Monetary rewards and decision cost in experimental
economics. Unpublished manuscript, Economic Science Lab, University of Arizona.
Starmer, C., & Sugden, R. (1989). Violations of the independence axiom in common ratio
problems: An experimental test of some competing hypotheses. Annals of Operations Research,
19, 79–102.
Tversky, A. (1969). The intransitivity of preferences. Psychological Review, 76, 31–48.
Tversky, A., Kahneman, D. (1986). Rational choice and the framing of decisions, The Journal of
Business 59(4), part 2, S251–S278.
Tversky, A., & Kahneman, D. (1991). Loss aversion in riskless choice: A reference dependent
model. Quarterly Journal of Economics, 107(4), 1039–1061.
Tversky, A., Sattath, S., & Slovic, P. (1988). Contingent weighting in judgment and choice.
Psychological Review, 95(3), 371–384.
Tversky, A., Slovic, P., & Kahneman, D. (1990). The causes of preference reversal. The American
Economic Review, 80(1), 204–217.
Viscusi, K. W. (1989). Prospective reference theory: Toward an explanation of the paradoxes.
Journal of Risk and Uncertainty, 2, 235–264.
Wakker, P. P. (1989a). Additive representations of preferences: A new foundation in decision
analysis. Dordrecht: Kluwer Academic Publishers.
Wakker, P. P. (1989b). Continuous subjective expected utility with nonadditive probabilities.
Journal of Mathematical Economics, 18, 1–27.
Wakker, P. P. (1990). Separating marginal utility and risk aversion. Unpublished manuscript,
University of Nijmegen, The Netherlands.
Wakker, P. P. (1991). Additive representations of preferences, a new foundation of decision
analysis; the algebraic approach. In J. D. Doignon & J. C. Falmagne (Eds.), Mathematical
psychology: Current developments (pp. 71–87). Berlin: Springer.
Wakker, P. (1993). Additive representations on rank-ordered sets: II. The topological approach.
Journal of Mathematical Economics, 22(1), 1–26.
Wakker, P. P., & Tversky, A. (1991). An axiomatization of cumulative prospect theory. Unpublished
manuscript, University of Nijmegen, The Netherlands.
Wehrung, D. A. (1989). Risk taking over gains and losses: A study of oil executives. Annals of
Operations Research, 19, 115–139.
Weymark, J. A. (1981). Generalized gini inequality indices. Mathematical Social Sciences, 1,
409–430.
Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, 55, 95–115.
Part IV
Logics of Knowledge and Belief
Chapter 25
Introduction
This part, like all others in this book, consists of a mix of classic papers that have
defined the area and modern ones illustrating important current issues. These texts
provide rich fare, and they defy simple labels summarizing their content. Moreover,
a look at the list of authors reveals a mixture of different academic cultures, from
philosophy to computer science. One might also add that the texts do not all agree:
they span a landscape with many positions and perspectives.
Epistemic logic as the systematic study of reasoning with knowledge and belief
started with Jaakko Hintikka’s classic book Knowledge and Belief: An Introduction
to the Logic of the Two Notions, which set the agenda for many subsequent lines of
research and debate. It interpreted knowledge as what is true in some current range
of epistemically accessible worlds, while doing something similar for belief and
doxastic accessibility. Thus, general methods from modal logic became available
for studying knowledge, and the resulting axiomatic systems have shaped many
philosophical discussions for or against principles of ‘omniscience’, ‘closure’, and
positive and negative ‘introspection’. One general interest behind such specific
issues has been the search for satisfactory definitions of knowledge, an interest
with a long epistemological history running from Plato’s “justified true belief” to
post-Gettier strengthenings involving forms of robustness of true beliefs under new
information, under new relevant considerations, or across counterfactual variations
H. Arló-Costa (deceased)
Carnegie Mellon University, Pittsburgh, PA, USA
V.F. Hendricks ()
Center for Information and Bubble Studies, University of Copenhagen, Copenhagen, Denmark
e-mail: [email protected]
J. van Benthem
University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands
Stanford University, Stanford, United States
e-mail: [email protected]
of the actual world – and in a related spirit, in various forms of faithful tracking
during a history of investigation. The past half-century of formal studies has even
produced further perspectives, such as the availability of proof or evidence in some
precise sense, or convergence in the limit in some process of inquiry.
The paper by Dretske makes the original Hintikka semantics a more dynamic
process, showing how knowledge claims are always based on some current range
of relevant worlds, which can change under the pressure of legitimate new con-
siderations. In a related vein, Lewis provides systematic principles guiding this
process of selecting relevant worlds. Following a further intuition, Nozick proposes
a counterfactual idea of knowledge as a true belief that would stay attuned to the
facts in non-actual worlds close to ours. While these approaches are semantic,
Artemov’s ‘justification logic’ brings explicit proof and evidence into epistemic
logic, allowing us to syntactically manipulate reasons for our beliefs. Finally, Kelly
discusses the learning-theoretic view of temporal convergence for knowledge of
complete histories of the world.
Since this part is about logics of knowledge and belief, many readers will be
interested not just in formal languages and semantics, but also in complete calculi
for reasoning capturing the operational proof-theoretic aspects of reasoning with
knowledge or belief. Stalnaker’s paper presents a broad logical view of possible
modal systems and defensible epistemic and doxastic principles, and in another
perspective, so does Artemov. Interestingly, not all notions of knowledge proposed
in the ‘robustness’ tradition have been studied in this systematic manner, and
many questions remain open, though actively pursued by some young philosophical
logicians.
Another running theme is the issue of which epistemic attitudes form a natural
family that requires scrutiny in its entirety. Knowledge and belief need not be
enough, and for instance, Stalnaker’s survey of doxastic and epistemic logics
proposes a new notion of ‘safe belief’ that will survive true new information,
as an intermediate between knowledge and belief simpliciter. Parikh even suggests
an algebraic framework that ties together a wide abstract range of knowledge,
belief, and action in general. Specializing general action again to informational
action, we arrive at what has been Hintikka’s guiding interest throughout: the
combination of knowledge as based on our current semantic information with acts
that systematically change that information, such as questions, and games of inquiry
over time.
It is only a short step then to a dynamic focus on learning rather than the mere
statics of knowledge. Kelly’s article looks at this dynamics from the viewpoint of
learning theory, and investigates which truths can be acquired in the limit, i.e.,
which processes of inquiry will reliably converge to stable true belief about the
answer to the main question at stake. In a somewhat related mode, Williamson
focuses on the scope of purely operational definitions of knowledge, and shows
that knowledge-based epistemology remains indispensable. And these are just two
dynamic or computational aspects of knowledge and belief. There is more to
information dynamics when one begins to study effects of specific local acts of
knowledge update or belief revision as an agent navigates the world. Many of these
topics will be addressed in the next section on interactive epistemology, since much
information flow that is crucial to humans involves more than one party: be it a
group of agents, or just one agent interacting with Nature.
There are many further themes to ponder when reading these articles. Does the
semantics describe internal first-person views of epistemic agents, or the theorist’s
external view of their situation? Do different epistemic attitudes correlate with
different sorts of information? How does knowledge of propositions tie in with
knowledge of objects, “that” or “whether” versus “which”, and why not then
also discuss knowledge “how” and “why”? And finally, what is the status of all
these logical theories? Are they normative prescriptions, or do they represent some
existing cognitive practice, if only idealized and at higher levels of abstraction?
Reading the papers in this section will not necessarily answer all these questions,
but it will make readers much better equipped to pursue these issues for themselves.
Starting with a classical trailblazer, J. Hintikka, Knowledge and Belief: An Introduction to the
Logic of the Two Notions, Cornell University Press 1962 and King’s College Publications 2005, set
the whole subject on its course. A series of later books broadened the paradigm to a general view of
information and inquiry, as represented in J. Hintikka, Logic, Language-Games and Information:
Kantian Themes in the Philosophy of Logic, Clarendon Press Oxford, 1973. Putting inquiry at
center stage in epistemology has also been the persistent theme of Robert Stalnaker’s work in
the field, with Inquiry, The MIT Press, 1987, as a classic source. Meanwhile, a richer view of
possible epistemic and doxastic attitudes suggested by natural language was investigated in W.
Lenzen, “Recent Work in Epistemic Logic”, Acta Philosophica Fennica 30 (1978): 1–219, which
also discusses links with probability. Also still in the 1970s, epistemic and doxastic logic were
rediscovered in the foundations of game theory, but references for this will be found in another
part of these readings. But arguably the major development invigorating epistemic logic has
been its crossing into computer science, in the study of information-driven agency. Two major
books demonstrating the resulting outburst of new research are R. Fagin, J. Y. Halpern, Y. Moses
& M. Vardi, Reasoning about Knowledge, The MIT Press, 1995, and W. van der Hoek & J-J
Meijer, Epistemic Logic for AI and Computer Science, Cambridge University Press, 1995. An
even more radically computational algorithmic understanding has been that of formal learning
theory, inspired by learning methods for infinite structures like human languages or the process of
scientific inquiry. The classic source for this is K. Kelly, The Logic of Reliable Inquiry, Oxford
University Press, 1996. We conclude with two other angles on knowledge that bring in further
mathematical paradigms. One is the verificationist perspective on knowledge through proof and
evidence, for which a classic text is M. Dummett, Truth and Other Enigmas, Harvard University
Press, 1978. Finally, while these publications are concerned with knowledge and belief, another
broad stream has taken information flow through Shannon-type channels to be the basic underlying
notion, following Dretske’s classic Knowledge and the Flow of Information, The MIT Press,
1981. An innovative logical framework taking this road much further is J. Barwise & J. Seligman,
Information Flow, Cambridge University Press, 1995.
Chapter 26
Epistemology Without Knowledge and Without
Belief
Jaakko Hintikka
definition or not, should start from the role that they play in real life. Now in real
life we are both producers and consumers of knowledge. We acquire knowledge
in whatever ways we do so, and we then put it to use in our actions and decision-
making. I will here start from the latter role, which takes us to the question: What is
the role that the notion of knowledge plays in that decision-making?
To take a simple example, let us suppose that I am getting ready to face a new
day in the morning. How, then, does it affect my actions if I know that it will not
rain today? You will not be surprised if I say that what it means is that I am entitled
to behave as if it will not rain—for instance to leave my umbrella home. However,
you may be surprised if I claim that most of the important features of the logical
behavior of the notion of knowledge can be teased out of such simple examples. Yet
this is the case. My modest example can be generalized. The role of knowledge in
decision-making is to rule out certain possibilities. In order to use my knowledge,
I must know which possibilities it rules out. In other words, any one scenario must
therefore be either incompatible or compatible with what I know, for I am either
entitled or not entitled to disregard it. Thus the totality of incompatible scenarios
determines what I know and what I do not know, and vice versa. In principle, all that
there is to logic of knowledge is this dichotomy between epistemically impossible
and epistemically possible scenarios.
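The dichotomy can be pictured with a toy sketch (in Python; the scenarios and proposition names are invented for illustration): knowledge is identified with the set of scenarios that are not ruled out, a proposition counts as known when it holds in every admitted scenario, and a scenario may be disregarded in deliberation exactly when it is ruled out.

scenarios = [
    {"rain": True,  "bus_strike": False},
    {"rain": False, "bus_strike": False},
    {"rain": False, "bus_strike": True},
]

# "I know that it will not rain": every rain scenario is ruled out.
admitted = [s for s in scenarios if not s["rain"]]

def known(prop):
    # A proposition is known iff it is true in all epistemically possible scenarios.
    return all(prop(s) for s in admitted)

print(known(lambda s: not s["rain"]))        # True: I am entitled to leave the umbrella home
print(known(lambda s: not s["bus_strike"]))  # False: a strike scenario remains epistemically possible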
It is also clear how this dichotomy serves the purposes of decision-making, just as
it does in my mini-example of deciding whether or not to take an umbrella with me.
But the connection with overt behavior is indirect, for what the dichotomy merely
demarcates are the limits of what I am entitled to disregard. And being entitled to
do something does not always mean that I do it. It does not always show up in
the overt ways one actually or even potentially acts. For other considerations may
very well enter into my decision-making. Maybe I just want to sport an umbrella
even though I know that it need not serve its function of shielding myself from rain.
Maybe I am an epistemological akrates and act against what I know. The connection
is nevertheless real, even though it is a subtle one. There is a link between my
knowledge and my decisions, but it is, so to speak, a de jure connection and not
a de facto connection. I think that this is a part of what John Austin (1961a) was
getting at when he compared “I know” with “I promise.” To know something does
not mean simply to have evidence of a superior degree for it, nor does it mean to
have a superior kind of confidence in it. If my first names were George Edward,
I might use the open-question argument to defend these distinctions. By saying “I
promise,” I entitle you to expect that I fulfill my promise. By saying “I know,” I
claim that I am entitled to disregard those possibilities that do not agree with what
I know. There is an evaluative element involved in the concept of knowledge that
does not reduce to the observable facts of the case. Hence, it is already seen to
be unlikely that you could define what it means to know by reference to matters
of fact, such as the evidence that the putative knower possesses or the state of the
knower’s mind.
This evaluative element is due to the role of knowledge in guiding our life in that
it plays a role in the justification of our decisions. This role determines in the last
analysis the logic and in some sense the meaning of knowledge. A Wittgensteinean
might put this point by saying that decision-making is one of the language-games
that constitute the logical home of the concept of knowledge. You can remove
knowledge from the contexts of decision-making, but you cannot remove a relation
to decision-making from the concept of knowledge. For this reason, it is among
other things misguided in a fundamental way to try to separate epistemic possibility
from actual (natural) possibility. Of course, the two are different notions, but the
notion of epistemic possibility has conceptual links to the kind of possibility that
we have to heed in our decision-making. For one thing, the set of scenarios involved
in the two notions must be the same.
But the main point here is not that there is an evaluative component to the
notion of knowledge. The basic insight is that there is a link between the concept
of knowledge and human action. The evaluative element is merely a complicating
factor in the equation. The existence of a link between the two is not peculiar to the
notion of knowledge. There is a link, albeit of a different kind, also in the case of
belief. In fact, the conceptual connection is even more obvious in the case of belief.
Behavioral scientists have studied extensively decision principles where belief
constitutes one component, as, for instance, in the principle of maximizing expected
utility. It usually comes in the form of degrees of belief. (They are often identified
with probabilities.) Typically, utilities constitute another component. Whether or
not such explicit decision principles capture the precise links between belief and
behavior, they illustrate the existence of the link and yield clues to its nature.
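A toy calculation (with invented numbers and act names) illustrates how degrees of belief and utilities jointly determine a choice under the principle of maximizing expected utility:

# Maximizing expected utility in the umbrella example, with an invented degree of
# belief that it will rain and invented utilities for each act-state pair.
belief_rain = 0.3
utility = {
    ("take umbrella", "rain"): 0,    ("take umbrella", "dry"): -1,
    ("leave umbrella", "rain"): -10, ("leave umbrella", "dry"): 1,
}

def expected_utility(act):
    return (belief_rain * utility[(act, "rain")]
            + (1 - belief_rain) * utility[(act, "dry")])

acts = ["take umbrella", "leave umbrella"]
print({a: expected_utility(a) for a in acts})  # {'take umbrella': -0.7, 'leave umbrella': -2.3}
print(max(acts, key=expected_utility))         # 'take umbrella'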
Indeed, from a systematic point of view, the relative roles assigned to knowledge
and to belief in recent epistemology and recent decision theory cannot but appear
paradoxical. Belief is in such studies generally thought of as a direct determinant of
our decisions, whereas knowledge is related to action only indirectly, if at all. Yet
common sense tells us that one of the main reasons for looking for more knowledge
is to put us in a better position in our decision-making, whereas philosophers often
consider belief—especially when it is contrasted with knowledge—as being initially
undetermined by our factual information and therefore being a much worse guide
to decision-making. Probability is sometimes said to be a guide to life, but surely
knowledge is a better one. Or, if we cannot use black-or-white concepts here,
shouldn’t rational decision-making be guided by degrees of knowledge rather than
degrees of mere belief?
The same point can perhaps be made by noting that in many studies of decision-
making, a rational agent is supposed to base his or her decisions on the agent’s
beliefs (plus, of course, utilities) and then by asking: Would it not be even more
rational for the agent to base his or her decisions on what the agent knows?
In order for a rational agent to act on his or her belief, this belief clearly must be
backed up by some evidence. Otherwise, current decision theory makes little sense.
The difference is that the criteria of what entitles one to act are different in the case
of belief from what they are in the case of knowledge. If I act on a belief, that belief
must satisfy my personal requirements for that role. They may vary from person to
person. In contrast, the criteria of knowing are impersonal and not dependent on the
agent in question. In order to define knowledge as distinguished from beliefs, we
would have to spell out those impersonal criteria. This is obviously an extremely
difficult task at best.
Another fact that complicates the connection between knowledge and behavior—
that is, between what I know and what I do—is that in principle, this link is holistic.
What matters to my decisions in the last analysis is the connection between the
totality of my knowledge and my behavior. There is not always any hard-and-fast connection between
particular items of knowledge and my behavior. In principle, the connection is via
my entire store of knowledge. This is reflected by the fact emphasized earlier that the
dichotomy that determines the logic of knowledge is a distinction between scenarios
that are ruled out by the totality of what I know and scenarios that are compatible
with the totality of my knowledge and that I therefore must be prepared for. The
same feature of the concept of knowledge also shows up in the requirement of total
evidence that is needed in Bayesian inference and which has prompted discussion
and criticism there. (See, e.g., Earman 1992.)
To spell out the criteria of the justification involved in the applications of the
concept of knowledge is to define what knowledge is as distinguished from other
propositional attitudes. Characterizing these conditions is obviously a complicated
task. I will return to these criteria later in this chapter.
today means that none of the scenarios under which the wet stuff falls down are
among my epistemic alternatives, and likewise for all knowing that statements.
What the concept of knowledge involves in a purely logical perspective is thus
a dichotomy of the space of all possible scenarios into those that are compatible
with what I know and those that are incompatible with my knowledge. What was
just seen is that this dichotomy is directly conditioned by the role of the notion
of knowledge in real life. Now this very dichotomy is virtually all we need in
developing an explicit logic of knowledge, better known as epistemic logic. This
conceptual parentage is reflected by the usual notation of epistemic logic. In it, the
epistemic operator Ka (“a knows that”) receives its meaning from the dichotomy
between excluded and admitted scenarios, while the sentence within its scope
specifies the content of the item of knowledge in question.
Basing epistemic logic on such a dichotomy has been the guiding idea of my
work in epistemic logic right from the beginning. I have seen this idea being credited
to David Lewis, but I have not seen any uses of it that predate my work.
But here we seem to run into a serious problem in interpreting epistemic logic
from the vantage point of a dichotomy of excluded and admitted scenarios. Such
an interpretation might seem to exclude “quantifying in”—that is to say, to exclude
applications of the knowledge operator to open formulas, for then it would not
make any sense to speak of scenarios in which the content of one’s knowledge is
true or false. Such “quantifying in” is apparently indispensable for the purpose of
analyzing the all-important wh-constructions with knows. For instance, “John knows
who murdered Roger Ackroyd” apparently must be expressed by
(∃x) K_John (x murdered Roger Ackroyd), (26.1)
as distinguished from
K_John (∃x) (x murdered Roger Ackroyd), (26.2)
which says that John knows that someone murdered the victim and hence can serve
as the presupposition of the question, “Who murdered Roger Ackroyd?”
But in (26.1), the notion of knowledge apparently cannot be interpreted by
reference to a distinction between admitted and excluded scenarios. The reason is
that the knowledge operator in (26.1) is prefixed to an open formula. Such an open
formula cannot be said to be true or false in a given scenario, for its truth depends
on the value of the variable x. Hence it cannot implement the required dichotomy.
In order for our epistemic discourse to express the wh-constructions, the knowl-
edge operator must apparently be allowed to occur also internally, prefixed to open
formulas rather than sentences (formulas without free variables). This prompts a
serious interpretational problem. Indeed we can see here the reason for the deep
theoretical interest of the problem of “quantifying in,” which otherwise might strike
one as being merely the logicians’ technical problem. Fortunately, this apparent
problem can be solved by means of suitable analysis of the relations between
different logical operators (see section “Information acquisition as a questioning
procedure”).
An epistemic logic of this kind can obviously be developed within the framework
of possible worlds semantics. (For a sketch of how this can be done, see Hintikka
2003b.) In fact, the truth condition for knows that is little more than a translation
of what was just said: “b knows that S” is true in a world W if and only if S is
true in all the epistemic b-alternatives to W. These alternatives are all the scenarios
or “worlds” compatible with everything b knows in W. In certain important ways,
this truth condition for knowledge statements is clearer than its counterpart in the
ordinary (alethic) modal semantics, in that in epistemic logic the interpretation of
the alternativeness relation (alias accessibility relation) is much clearer than in the
logic of physical or metaphysical modalities.
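The truth condition just stated is straightforward to make explicit (a minimal Python sketch; the model, the worlds, and the atomic facts are invented for illustration):

# "b knows that S" is true in a world W iff S is true in all epistemic
# b-alternatives to W.
alternatives_b = {               # worlds compatible with everything b knows at each world
    "w1": {"w1", "w2"},
    "w2": {"w2"},
    "w3": {"w3"},
}
facts = {"w1": {"p"}, "w2": {"p", "q"}, "w3": {"q"}}

def true_at(world, atom):
    return atom in facts[world]

def knows(alternatives, world, atom):
    return all(true_at(v, atom) for v in alternatives[world])

print(knows(alternatives_b, "w1", "p"))  # True: p holds in both alternatives w1 and w2
print(knows(alternatives_b, "w1", "q"))  # False: q fails at the alternative w1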
Here we have already reached a major conclusion. Epistemic logic presupposes
essentially only the dichotomy between epistemically possible and epistemically
excluded scenarios. How this dichotomy is drawn is a question pertaining to
the definition of knowledge. However, we do not need to know this definition
in doing epistemic logic. Thus the logic and the semantics of knowledge can
be understood independently of any explicit definition of knowledge. Hence it
should not be surprising to see that a similar semantics and a similar logic can be
developed for other epistemic notions—for instance, belief, information, memory,
and even perception. This is an instance of a general law holding for propositional
attitudes. This law says that the content of a propositional attitude can be specified
independently of differences between different attitudes. This law has been widely
recognized, even if it has not always been formulated as a separate assumption. For
instance, in Husserl (1983, e.g., sec. 133) it takes the form of separating the noematic
Sinn from the thetic component of a noema. As a consequence, the respective logics
of different epistemic notions do not differ much from each other. In particular,
they do not differ at all in those aspects of their logic that depend merely on
the dichotomical character of their semantics. These aspects include prominently
the laws that hold for quantifiers and identity, especially the modifications that
are needed in epistemic contexts in the laws of the substitutivity of identity and
existential generalization.
The fact that different epistemic notions, such as knowledge, belief, and informa-
tion, share the same dichotomic logic should not be surprising in the light of what
has been said. The reason is that they can all serve the same purpose of guiding our
decisions, albeit in different ways. Hence the same line of thought can be applied
to them as was applied earlier to the concept of knowledge, ending up with the
conclusion that their logic is a dichotomic logic not unlike the logic that governs the
notion of knowledge. The common ingredient in all these different logics is then the
true epistemic logic. But it turns out to be a logic of information rather than a logic
of knowledge.
This distinction between what pertains to the mere dichotomy between admitted
and excluded scenarios and what pertains to the criteria relied on in this dichotomy
is not a novelty. It is at bottom only a restatement in structural terms of a familiar
contrast, which in the hands of different thinkers has received apparently different
formulations. The dichotomy defines the content of a propositional attitude, while
the criteria of drawing it determine which propositional attitude we are dealing with.
Hence we are naturally led to the project of developing a generic logic of contents
of attitudes, independent of the differences between different attitudes.
This generic logic of epistemology can be thought of as the logic of information.
Indeed, what the content of a propositional attitude amounts to can be thought of as
a certain item of information. In attributing different attitudes to agents, different
things are said about this information—for instance, that it is known, believed,
remembered, and so on. This fits in well with the fact that the same content can
be known by one person, believed by another, remembered by a third one, and so
on. This idea that one and the same objective content may be the target of different
people’s different attitudes is part of what Frege (see, e.g., 1984) was highlighting
by his notion of the thought. Thus it might even be happier to talk about the logic of
information than about epistemic logic. John Austin (1961b) once excused his use
of the term “performative” by saying that even though it is a foreign word and an
ugly word that perhaps does not mean very much, it has one good thing about it: It
is not a deep word. It seems to me that epistemology would be in much better shape
if instead of the deep word “knowledge,” philosophers cultivated more the ugly
foreign word “information,” even though it perhaps does not capture philosophers’
profound sense of knowing. In any case, in the generic logic of epistemology here
envisaged, philosophers’ strong sense of knowledge plays no role.
But what about the other context in which we encounter knowledge in real life—the
context of knowledge acquisition? As was noted, what the concept of knowledge
amounts to is revealed by two questions: What is it that we are searching for in
the process of knowledge acquisition? What purpose can the product of such an
inquiry serve? The second question has now been discussed. It remains to examine
the crucial first question. Surely the first order of business of any genuine theory
of knowledge—the most important task both theoretically and practically—is how
new information is acquired, not merely how previously obtained information can be evaluated. A
theory of information (knowledge) acquisition is both philosophically and humanly
much more important than a theory of whether or not already achieved information
amounts to knowledge. Discovery is more important than the defense of what you
already know. In epistemology, as in warfare, offense frequently is the best defense.
This point can be illustrated in a variety of ways. For instance, a thinker who
does not acquire any information cannot even be a skeptic, for he or she would
not have anything to be skeptical about. And a skeptic’s doubts must be grounded
on some grasp as to how that information is obtained, unless these doubts are
totally irrational. Epistemology cannot start from the experience of wonder or
doubt. It should start from recognition of where the item of information that we are
wondering about or doubting came from in the first place. Any rational justification
or rational distinction of such wonder or doubt must be based on its ancestry.
But the context of knowledge acquisition is vital even if the aim of your game is
justification and not discovery. Suppose that a scientist has a reason to think that
one of his or her conclusions is not beyond doubt. What is he or she to do? Will the
scientist try to mine his or her data so as to extract from them grounds for a decision?
Sometimes, perhaps, but in an overwhelming majority of actual scientific situations,
the scientist will ask what further information one should in such circumstances try
to obtain in order to confirm or disconfirm the suspect proposition—for instance,
what experiments it would be advisable to perform or what kinds of observation
one should try to make in order to throw light on the subject matter. Unfortunately
such contexts—or should I say, such language-games—of verification by means of
new information have not received much attention from recent philosophers. They
have been preoccupied with the justification of already acquired knowledge rather
than with the strategies of reaching new knowledge.
Thus we must extend the scope of the interrogative model in such a way
that it enables us to cope with justification and not just pure discovery. What
we need is a rule or rules that authorize the rejection—which is tentative and
may be only temporary—of some of the answers that an inquirer receives. The
terminus technicus for such rejection is bracketing. The possibility of bracketing
widens the scope of epistemological and logical methods tremendously. After this
generalization has been carried out, the logic of interrogative inquiry can serve many
of the same purposes as the different variants of non-monotonic reasoning, and
serve them without the tacit assumptions that often make nonmonotonic reasoning
epistemologically restricted or even philosophically dubious. A telling example
is offered by what is known as circumscriptive reasoning. (See McCarthy 1990.)
It relies on the assumption that the premises present the reasoner with all the
relevant information, so that the reasoner can assume that they are made true in
the intended models in the simplest possible way. This is an assumption that can
in fact often be made, but it is not available on all occasions. As every
puzzle fan knows, often a key to the clever reasoning needed to solve a puzzle lies
precisely in being able to imagine circumstances in which the normal expectations
evoked by the specification of the puzzle are not realized. Suppose a puzzle goes
as follows: “Evelyn survived George by more than 80 years, even though she
was born many decades before him. How come?” The explanation is easy if you
disregard the presumption that “George” is a man’s name and “Evelyn” a woman’s.
Evelyn Waugh in fact survived George Eliot by 86 years. Here the solution of the
puzzle depends entirely on going beyond the prima facie information provided by
the puzzle—in other words, on violating the presuppositions of a circumscriptive
inference. Reasoning by circumscription is enthymemic reasoning. It involves tacit
premises that may be false.
Thus by introducing the idea of bracketing, we can dispense with all modes
of ampliative reasoning. The only rules besides rules of logical inference are the
rules for questioning and the rule allowing bracketing. This may at first look like
a cheap trick serving merely to sweep all the difficulties of epistemic justification
under the rug of bracketing. In reality, what is involved is an important insight.
What is involved is not a denial of the difficulties of justification, but an insight
into their nature as problems. Once a distinction is made between strategic and
definitory rules, it is realized that the definitory rules can only be permissive, telling
what one may do in order to reach knowledge and to justify it. The problem of
justification is a strategic problem. It pertains to what one ought to do in order to
make sure that the results of one’s inquiry are secure. This is to be done by the
double process of disregarding dubious results and confirming the survivors through
further inquiry. The only new permissive rule needed for the purpose is the rule that
allows bracketing.
Thus the question as to which answers to bracket is always at bottom a strategic
problem. It is therefore futile in principle to try to capture the justificatory process by
means of definitory rules of this or that kind. To attempt to do so is a fallacy that in
the last analysis vitiates all the usual “logics” of ampliative reasoning. This mistake
is committed not only by non-monotonic logics but also by inductive logic and
by the current theories of belief revision. Ampliative logics can be of considerable
practical interest and value, but in the ultimate epistemological perspective, they
are but types of enthymemic reasoning, relying on tacit premises quite as much as
circumscriptive reasoning. An epistemologist’s primary task here is not to study the
technicalities of such modes of reasoning, fascinating though they are in their own
right. It is to uncover the tacit premises on which such euthymemic reasoning is in
reality predicated.
Allowing bracketing is among other things important because it makes it
possible to conceive of interrogative inquiry as a model also of the confirmation of
hypotheses and other propositions in the teeth of evidence. The interrogative model
can thus also serve as a general model of the justification of hypotheses. It should
in fact be obvious that the processes of discovery and justification cannot be sharply
separated from each other in the practice or in the theory of science. Normally,
a new discovery in science is justified by the very same process—for instance,
by the same experiments—by means of which it was made, or could have been
made And this double duty service of questioning is not due only to the practical
exigencies of “normal science.” It has a firm conceptual basis. This basis is the fact
that information (unlike many Federal appropriations) does not come to an inquirer
earmarked for a special purpose—for instance, for the purpose of discovery rather
than justification. The inquirer may ask a question for this or that proximate purpose
in mind, but there is nothing in the answer that rules out its being used for other
purposes as well.
And such an answer can only be evaluated in terms of its service for both causes.
This is because from game theory we know that in the last analysis, game-like goal-
directed processes can be evaluated only in terms of their strategies, not in terms
of what one can say of particular moves—for instance, what kinds of “warrants”
they might have. As a sports-minded logician might explain the point, evaluating
a player’s skills in a strategic game is in principle like judging a figure-skating
performance rather than keeping score in a football game. In less playful terms, one
can in general associate utilities (payoffs) only with strategies, not with particular
moves. But since discovery and justification are aspects of the same process, they
have to be evaluated in terms of the different possible strategies that are calculated
to serve both purposes.
When we realize this strategic inseparability of the two processes, we can in
fact gain a better understanding of certain otherwise puzzling features of epistemic
enterprise. For instance, we can now see why it sometimes is appropriate to jump to
a conclusion on the basis of relatively thin evidence. The reason is that finding what
the truth is can help us mightily in our next order of business of finding evidence for
that very truth. Sherlock Holmes has abductively “inferred” that the stablemaster has
stolen the famous racing horse “Silver Blaze” (see the Conan Doyle story with this
title) in order to lame it partially. He still has to confirm this conclusion, however,
and in that process he is guided by the very content of that abductive conclusion—
for instance, in directing his attention to the possibility that the stablemaster had
practiced his laming operation on the innocent sheep grazing nearby. He puts a
question to the shepherd as to whether anything had been amiss with them of late.
“Well, sir, not of much account, but three of them have gone lame, sir.” Without
having already hit on the truth, Holmes could not have thought of asking this
particular question.
If you disregard the strategic angle, the frequent practice of such “jumps to a
conclusion” by scientists may easily lead one to believe that scientific discovery
is not subject to epistemological rules. The result will then be the hypothetico-
deductive model of scientific reasoning, which is hence seen to rest on a fallacious
dismissal of the strategic angle.
Thus we reach a result that is neatly contrary to what were once prevalent
views. It used to be held that discovery cannot be subject to explicit epistemological
theory, whereas justification can. We have found out that not only can discovery be
approached epistemologically, but that justification cannot in the long run be done
justice to by a theory that does not also cover discovery.
A critical reader might initially have been wondering why contexts of verification
and of other forms of justification do not constitute a third logical home of the notion
of knowledge, besides the contexts of decision-making and information-acquisition.
The answer is that processes of justification can only be considered as aspects of
processes of information-acquisition.
The most general argument for the generality of the interrogative approach relies
only on the assumption that the inquirer’s line of thought can be rationally evaluated.
What is needed for such an evaluation? If no new information is introduced into an
argument by a certain step, then the outcome of that step is a logical consequence
of earlier statements reached in the argument. Hence we are dealing with a logical
inference step that has to be evaluated by the criteria of logical validity. It follows
that interrogative steps are the ones in which new information enters into the
argument. In order to evaluate the step, we must know what the source of this
information is, for the reliability of the information may depend on its source. We
must also know what else might have resulted from the inquirer’s approaching this
particular source in this particular way and with what probabilities. If so, what the
inquirer did can be thought of as a question addressed to that source of information.
Likewise, we must know what other sources of information the inquirer could have
consulted and what the different results might have been. This amounts to knowing
what other sources of answers the inquirer might have consulted. But if all of this
is known, we might as well consider what the inquirer did as a step in interrogative
inquiry.
In an earlier work (Hintikka 1998), I have likened such tacit interrogative steps
to Peircean abductions, which Peirce insists are inferences even though they have
interrogative and conjectural aspects.
The interrogative model can be thought of as having also another kind of
generality—namely, generality with respect to the different kinds of questions.
Earlier epistemic logic was incapable of handling questions more complicated than
simple wh-questions. In particular, it could not specify the logical form of questions
in which the questioned ingredient was apparently within the scope of a universal
quantifier, which in turn was in the scope of a knows that operator. This defect
was eliminated by means of the independence indicator (slash) /. (See Hintikka
2003b.) What characterizes the questioned ingredient is its independence of the
epistemic operator, and such independence is perfectly compatible with its being
dependent on a universal quantifier, which is in turn dependent on the epistemic
operator. In symbols we can now write, for instance, K(∀x)(∃y/K), without having
to face the impossible task of capturing the threefold dependence structure by means
of scopes—that is, by ordering K, (∀x), and (∃y) linearly so as to capture their
dependence relations.
In this way, we can treat all wh-questions and all propositional questions
(involving questions where the two kinds of question ingredients are intermingled).
The question ingredient of propositional questions turns out to be of the form
(∨/K) and the question ingredient of wh-questions of the form (∃x/K). We can
also close a major gap in our argument so far. The connection between knowledge
and decision-making discussed in section “Knowledge and decision-making” is
apparently subject to the serious objection mentioned in section “The logic of
knowledge and information”. It helps to understand a knowledge operator K only
when it occurs clause-initially, prefixed to a closed sentence. For it is only such
sentences, not all and sundry formulas, that express a proposition that can serve as
a justification of an action. Occurrences of K inside a sentence prefixed to an open
formula cannot be interpreted in the same way. Now we can restrict K to a sentence-
initial position, which eliminates this objection. This also helps to fulfill the promise
made in section “The logic of knowledge and information” of constructing a general
logic for the epistemic operator. Here we are witnessing a major triumph of second-
generation epistemic logic, which relies on the notion of independence. It solves
once and for all the problem of “quantifying in.” It turns out that we do not at
bottom quantify into a context governed by the epistemic operator K. What we in
effect do is to quantify independently of this operator.
Why-questions and how-questions require a special treatment, which is nevertheless
not hard to provide. (See, e.g., Hintikka and Halonen 1995.)
The most persuasive argument for the interrogative model nevertheless comes
from the applications of the interrogative viewpoint to different problems in
epistemology. An important role in such applications is played by the presuppo-
sitions of questions and by the presuppositions of answers, better known as their
conclusiveness conditions. Examples of such application are offered in Chaps. 4
and 5 of this volume.
It would take me too far afield here to essay a full-fledged description of the
interrogative model. It is nevertheless easy to make an inventory of the concepts that
are employed in it. In an explicit model, question-answer steps are interspersed with
logical inference steps. Hence the concepts of ordinary deductive logic are needed.
As long as the inquirer can trust all the answers, the concepts that are needed are
the presuppositions of a question, the conclusiveness condition of an answer (which
might be called the “presupposition” of the answer), and the notion of information.
To describe an interrogative argument with uncertain answers (responses), we need
the notion of tentative rejection of an answer, also known as bracketing, and hence
also the converse operation of unbracketing, plus ultimately also the notion of
probability needed to judge the conditions of bracketing and unbracketing.
What is remarkable about this inventory is that it does not include the concept
of knowledge. One can construct a full epistemological theory of inquiry as inquiry
without ever using the k-word. This observation is made especially significant by
the generality of the interrogative model. As was indicated, not only is it by means
of an interrogative argument that all new information can be thought of as having
been discovered, it is by the same questioning method that its credibility must be
established in principle.
What this means is that by constructing a theory of interrogative inquiry we
apparently can build up a complete theory of epistemology without using the
concept of knowledge. We do not need the notion of knowledge in our theory of
knowledge—or so it seems. We do not need it either in the theory of discovery or in
the theory of justification.
This conclusion might seem to be too strange to be halfway plausible. It is not,
but it needs some explanation to be seen in the right perspective.
It might perhaps seem that the concept of knowledge is smuggled into interrog-
ative argumentation by the epistemic logic that has to be used in it. This objection
is in fact a shrewd one. I said earlier that the logic of questions and answers, which
is the backbone of the interrogative model, is part of the logic of knowledge. And
this need to resort to epistemic notions is grounded deeply in the facts of the case. It
might at first seem that in an interrogative inquiry, no epistemic notions are needed.
The presuppositions of questions, questions themselves, and replies to them can
apparently be formulated without using epistemic notions.
However, this first impression turns out to be misleading. The structure of questioning
and the rules governing it cannot be specified without using some suitable epistemic logic.
For one thing, many of the properties of questions and answers are best explained
by reference to what is known as the desideratum of a question. This desideratum
specifies the epistemic state that the questioner wants to be brought about (in the
normal use of questions). For instance, the desideratum of “Who murdered Roger
Ackroyd?” is “I know who murdered Roger Ackroyd.” But the desideratum with its
prima facie knowledge operator is not only a part of a theory of question-answer
sequences, it is a vital ingredient of the very interrogative process.
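As a rough illustration, the desideratum just mentioned can be written with the slash as follows, where M(x) is an illustrative abbreviation (not used in the text) for "x murdered Roger Ackroyd."

% Sketch only; M(x) is an illustrative abbreviation.
\[
  K\, (\exists x / K)\, M(x)
\]
% read: "I know who murdered Roger Ackroyd." By contrast,
% K (\exists x) M(x) says only "I know that someone murdered Roger
% Ackroyd," which is the presupposition of the question rather than
% its desideratum.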
In particular, it is needed to solve Meno’s problem (Plato 1924) applied to
interrogative inquiry. In the initial formulation of the rules for interrogative inquiry,
it is apparently required that we must know not only the initial premises of inquiry
but also their ultimate conclusion. This seems to mean that we can use interrogative
inquiry only to explain conclusions we have already reached but not to solve
problems—in other words, answer questions by means of questions. But in trying
to answer a question by means of interrogative inquiry, we apparently do not know
what the ultimate conclusion is. We are instead looking for it. How, then, can we
use interrogative inquiry for the purpose of answering questions? The answer is that
we must formulate the logic of inquiry in terms of what the inquirer knows (in the
sense of being informed about) at each stage. Then we can solve Meno’s problem
merely by using the desideratum of the overall question as the ultimate conclusion.
But then we seem to need the notion of knowledge with a vengeance.
What is true is that a viable theory of questions and answers will inevitably
involve an intensional operator, and in particular an epistemic operator in a wide
sense of the word. However, the epistemic attitude this operator expresses is not
knowledge in any reasonable sense of the word, not just not in the philosopher’s
solemn sense. Here, the results reached in section “The logic of knowledge and
information” are applicable. Before an interrogative inquiry has reached its aim—
that is, knowledge—we are dealing with information that has not yet hardened into
knowledge. It was seen earlier that the logic of such unfinished epistemological
business is indeed a kind of epistemic logic, but a logic of information rather than
of knowledge.
This point is worth elaborating. Indeed the real refutation of the accusation of
having smuggled the concept of knowledge into interrogative inquiry in the form
of the epistemic operator used in questions and answers lies in pointing out the
behavior of this operator in epistemic inquiry. It may sound natural to say that after
having received what is known as a conclusive answer to a question, the inquirer
now knows it. But the notion of knowledge employed here is a far cry from the
notion of knowledge that philosophers have tried to define. It looks much more like
the ugly foreign notion of information. It does not even carry the implication of
truth, for the answer might very well have to be bracketed later in the same inquiry.
By the same token, it does not even presuppose any kind of stable belief in what
is “known.” Instead of saying that after having received a conclusive answer, the
inquirer knows it, it would be more accurate to say that he or she has been informed
about it. Here the advantages of the less deep notion of information are amply in
evidence. Unlike knowledge, information need not be true. If an item of information
offered to me turns out to be false, I can borrow a line from Casablanca and ruefully
say, “I was misinformed.” The epistemic operator needed in the logic of questions
and answers is therefore not a knowledge operator in the usual sense of the term.
My emphasis on this point is a penance, for I now realize that my statements in the
past might have conveyed to my readers a different impression. What is involved in
the semantics of questions and answers is the logic of information, not the logic of
knowledge. This role of the notion of information in interrogative inquiry is indeed
crucial, but it does not involve epistemologists’ usual concept of knowledge at all.
This point is so important as to be worth spelling out even more fully. Each
answer presents the inquirer with a certain item of information, and the distinction
between question-answer steps and logical inference steps hinges on the question
of whether this information must be old or whether it can be new information. But
it is important to realize that such information does not amount to knowledge.
In an ongoing interrogative inquiry, there are no propositions concerning which the
question is ever raised whether they are known or not. There may be a provisional
presumption that, barring further evidence, the answers that an inquirer receives are
true, but there is not even a whiff of a presumption that they are known. Conversely,
when an answer is bracketed, it does not mean that it is definitively declared not to
be known, for further answers may lead the inquirer to unbracket it. In sum, it is
true in the strictest possible sense that the concept of knowledge in anything like
philosophers’ sense is not used in the course of interrogative inquiry.
These observations show the place of knowledge in the world of actual inquiry,
and they also show the only context in which questions about the definition of
knowledge can legitimately be asked. The notion of knowledge may or may not
be a discussion-stopper, but it is certainly an inquiry-stopper.
It might be suspected that this is due to the particular way the interrogative model
is set up. Such a suspicion is unfounded, however. The absence of the concept of
knowledge from ampliative inquiry is grounded in the very nature of the concept
of knowledge. Questions of knowledge do not play any role in the questioning
process itself, only in evaluating its results. For what role was it seen to play in
human life? It was seen as what justifies us to act in a certain way. The concept
of knowledge is therefore related to interrogative inquiry by asking: When has an
interrogative inquiry reached far enough to justify the inquirer’s acting on the basis
of the conclusions it has so far reached? Or, to align this question with the locutions
used earlier, when has the inquiry entitled the inquirer to dismiss the scenarios that
are incompatible with the propositions accepted in the inquiry at the time? This is a
genuine question, and it might seem to bring the concept of knowledge to the center
of the theory of interrogative inquiry.
In a sense it does that. But this sense does not bring the notion of knowledge
back as a concept that can possibly figure in the definitory rules of inquiry. It
brings knowledge back to the sphere of strategic aspects of inquiry. The question
as to whether a conclusion of inquiry has been justified strongly enough for it to
qualify as knowledge is on a par with the question as to whether or not a step
in an inquiry (typically an answer to a question) should perhaps be bracketed
(however tentatively). Both are strategic questions. It is hopeless to try to model
knowledge acquisition in a way that turns these decisions into questions of definitory
correctness.
Any context-free definition of knowledge would amount to a definitory rule in
the game of inquiry—namely, a definitory rule for stopping an inquiry. And once
one realizes that this is what a definition of knowledge would have to do in the
light of the conception of inquiry as inquiry, one realizes that the pursuit of such a
definition is a wild goose chase.
It is important to realize that this conclusion does not only apply to attempted
definitions of knowledge that refer only to the epistemic situation that has been
reached at the putative end stage of the “game” of inquiry. In other words, it does
not apply only to the state of an inquirer’s evidence at the end of an inquiry. It
also applies to definitions in which the entire history of inquiry so far is taken into
account.
This conclusion is worth spelling out more fully. What the conclusion says is
that no matter how we measure the credence of the output of interrogative inquiry,
there is no reason to believe that an answer to the question as to when an inquirer
is justified to act on his or her presumed knowledge depends only on the process of
inquiry through which the inquirer’s information has been obtained independently
of the subject matter of the inquiry. In an old terminology, the criteria of justification
cannot be purely ad argumentum, but must also be ad hoc. Neither the amount of
information nor the amount of justification that authorizes an agent to stop his or her
inquiry and act on its results can always be specified independently of the subject
matter—for instance, independently of the seriousness of the consequences of being
wrong about the particular question at hand. And if the justification depends on the
subject matter, then so does the concept of knowledge, because of the roots of our
concept of knowledge in action.
But since the notion of knowledge was seen to be tied to the justification of
acting on the basis of what one knows, the concept of knowledge depends on
the subject matter and not only on the epistemological situation. Accordingly, no
general definition of knowledge in purely epistemological terms is possible.
This point is not a relativistic one as far as the possibility of a priori epistemology
is concerned. If anything, the divorce of knowledge from inquiry underlines the
objectivity of inquiry and its independence of the value aspects of the subject
matter. The fashionable recent emphasis on the alleged value-ladenness of science
is misleading in that it is typically predicated on forgetting or overlooking that the
question as to when the results of scientific inquiry authorize acting on them is
different from questions concerning the methodology of scientific inquiry itself. The
dependence of the criteria of knowledge on subject matter ought to be a platitude. It
is one thing for Einstein to claim that he knew that the special theory of relativity was
true notwithstanding prima facie contrary experimental evidence, and another thing
for a medical researcher to be in a position to claim to know that a new vaccine
is safe enough to be administered to sixty million people. But some relativists
mistakenly take this platitude to be a deep truth about scientific methodology and
its dependence on subject matter. This is a mistake in the light of the fact that the
allegedly value-laden concept of knowledge does not play any role in the actual
process of inquiry.
Here, a comparison with such decision principles as the maximization of
expected utility is instructive. What an inquiry can provide is only the expectations
(probabilities). But they do not alone determine the decision, which depends also
on the decider’s utilities. Hence the criteria of knowing cannot be defined by any
topic-neutral general epistemology alone. But this dependence does not mean that
the probabilities used—misleadingly called “subjective” probabilities—should in
rational decision-making depend on one’s utilities. Decision-making based on such
probability estimates would be paradigmatically irrational.
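The point can be checked with a toy calculation; the figures below are invented purely for illustration. Identical probabilities, supplied by one and the same inquiry, combine with different utilities to license different decisions about when to stop inquiring and act.

# Toy illustration with invented numbers: the same probabilities,
# combined with different utilities, favor different decisions.

def expected_utility(probabilities, utilities):
    """Expected utility of an act: the sum of probability * utility."""
    return sum(p * u for p, u in zip(probabilities, utilities))

# Probabilities that the conclusion reached by the inquiry is true / false.
probs = [0.95, 0.05]

# Agent A: being wrong is only mildly costly (an everyday matter).
act_now_a = expected_utility(probs, [10, -5])
keep_inquiring_a = expected_utility(probs, [8, 0])

# Agent B: being wrong is catastrophic (say, a mass vaccination).
act_now_b = expected_utility(probs, [10, -1000])
keep_inquiring_b = expected_utility(probs, [8, 0])

print(act_now_a > keep_inquiring_a)  # True: A may reasonably act now
print(act_now_b > keep_inquiring_b)  # False: B should go on inquiring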
The influence of subject matter on the notion of knowledge does not imply that
the interrogative process through which putative knowledge has been obtained is
irrelevant for the evaluation of its status. Here lies, in fact, a promising field of work
for applied epistemologists. Material for such work is available in, among many
other places, different kinds of studies of risk-taking. Even though considerations
of strategies do not help us to formulate a topic-neutral definition of knowledge, in
such a topic-sensitive epistemology they are bound to play a crucial role. This is
a consequence of the general fact that in game-like processes, only strategies, not
individual moves, can in the last analysis be evaluated.
as physicists would call it, which typically is not an entire world. (See here
Hintikka 2003a.) All epistemological inquiry is therefore contextual in this sense
of being relative to a model (scenario or “possible world”). But this does not make
epistemology itself contextual or relative as a scientific theory is made contextual
or relative by the fact that it is inevitably applied to reality system by system.
Hence the impact of the line of thought pursued here is diametrically opposed to
the most common form of contextualism. This form of contextualism aims at the
rejection of global epistemological questions. (See Bonjour 2002, p. 267). For us,
global epistemological questions concern in the first place the nature of interrogative
inquiry, and they are in no sense context-dependent or even dependent on the subject
matter.
agent’s beliefs (or degrees of belief) together with utilities determine his, her, or its
behavior is in need of scrutiny. Above all, beliefs, too, must be thought of as being
formed by means of inquiry.
What I take to be a related point has been expressed by Timothy Williamson
by pointing out that a “reason is needed for thinking that beliefs tend to be true.”
(Quoted from the abstract of his contribution to the conference on “Modalism and
Mentalism in Modern Epistemology,” Copenhagen, January 29–31, 2004.) The
relationship is mediated by the fact that, if I am right, interrogative inquiry is, in
the last analysis, the only way of arriving at true beliefs.
The conclusions reached here have repercussions for the entire research strategies
that should be pursued in epistemology. For instance, there is a major school of
thought that conceives of inquiry as a series of belief revisions. But is this at all
realistic as a description of what good reasoners actually do? Georges Simenon’s
Inspector Maigret is sometimes asked what he believes about the case he is
investigating. His typical answer is: “I don’t believe anything.” And this does not
mean, contrary to what one might first suspect, that Maigret wants only to know
and not to believe and that he has not yet reached that state of knowledge. No—in
one story he says, “The moment for believing or not believing hasn’t come yet.”
(Georges Simenon, Maigret and the Pickpocket, Harcourt Brace Jovanovich, San
Diego, 1985.) It is not that Maigret has not carried his investigation far enough
to be in a position to know something. He has not reached far enough to form a
belief. (The mere possibility of using the locution “belief formation” is instructive.)
In serious inquiry, belief, too, is a matter of whether an inquiry has reached far enough.
Belief, too, concerns the question of when to stop an inquiry. That is the place
of this concept in the framework of the interrogative approach. The difference
between belief and knowledge does not lie merely in the degree of justification
the believer has reached. It does not mean that there is an evaluative component in
knowledge but not in belief. The difference lies in the kind of evaluation involved.
It is much more like the difference between satisfying an agent’s own freely chosen
standards of epistemic confidence and satisfying certain impersonal standards that
are appropriate to the subject matter.
In linguists’ terminology, knowing is an achievement verb. In a way, although
not in a literal sense, believing is in the context of interrogative inquiry likewise an
achievement notion. What should be studied in epistemology is belief-formation and
not only belief change. The notion of belief cannot serve the role of a determinant of
human action that is assigned to it in decision theory if it is not influenced by what
the agent knows. But such influence is usually not studied in decision theory.
One corollary to the results we have reached concerns philosophers’ research
strategies. What we can see now is that the interrogative model is not only a rational
reconstruction of knowledge acquisition, it can also be used as a model of belief
formation. The insight that belief, too, is typically a product of inquiry lends some
renewed interest to the “true belief” type of attempted definitions of knowledge.
What they perhaps succeed in capturing is admittedly not philosophers’ strong sense
of knowledge. But there may be other uses (senses?) of the words knowledge and
knowing that can be approached by means of such characterizations.
From the point of view we have reached, we can also see some serious problems
about the Bayesian approach to inquiry. (See, e.g., Earman 1992.) This approach
deals with belief change rather than belief-formation. Insofar as we can find any
slot for belief-formation within the Bayesian framework, at least from the point of
view of any simple application of it, belief-formation is pushed back to the selection
of priors. In other words, it is made entirely a priori, at least locally. This is by itself
difficult to implement
in the case of theory-formation (belief-formation) in science. Is it, for instance,
realistic to assume that a scientist can associate an a priori probability with each and
every possible law of nature? And these doubts are reinforced by general conceptual
considerations. Assignments of priors amount to assumptions concerning the world.
What is more, prior probabilities pertain to the entire system (model, “world”) that
the inquirer is investigating bit by bit. How can the inquirer choose such priors on
the basis of his or her limited knowledge of the world? These difficulties might not
be crucial if there existed a Bayesian theory of belief-change that included a study
of changes of priors. Even though such changes have been studied, it seems to me
that their theory has not been developed far enough in the Bayesian framework to
cover all possibilities.
All sorts of difficult questions face us here. For instance, in order to use Bayesian
inference, we need to know the prior probabilities. It seems to be thought generally
that this does not amount to asking very much. This may be true in situations
in which the primary data is reasonably reliable, as in typical scientific contexts.
However, if our evidence is likely to be relatively unreliable, the situation may be
different—for instance, when we are dealing with testimony as our basic form of
evidence. I may easily end up asking: Do I really have enough information to make
the guess concerning the world that seems to be involved in the choice of the
priors?
For one thing, even though the matter is highly controversial, fascinating
evidence to this effect comes from the theory of so-called cognitive fallacies studied
by mathematical psychologists such as Amos Tversky and Daniel Kahneman. (See,
e.g., Kahneman et al. 1982; Piatelli-Palmarini 1994.) These alleged fallacies include
the conjunction fallacy and the base-rate fallacy. As I have suggested in Chap. 9 of
this volume (and in Hintikka 2004), at least in certain “crucial experiment” cases,
the alleged mistakes are not fallacious at all, but rather point to certain subtle but
very real ways in which one’s prior probabilities can (and must) be changed in the
light of new evidence. They do not show that certain fallacious ways of thinking
are hardwired into human beings. Rather, what they show is that Bayesians have
so far failed to master certain subtle modes of ampliative reasoning. Tversky’s
and Kahneman’s Nobel Prize notwithstanding, epistemologists should take a long
critical look at the entire theory of cognitive fallacies.
Here I can only give indications of how to view the cognitive fallacies conun-
drum. Very briefly, in the kind of situation that is at issue in the alleged conjunctive
fallacy, the prior probabilities that one in effect relies on include the degrees of
probability (credibility) assigned to the reports one receives. But that credibility can
not only be affected by suitable new evidence, it can be affected by the very report
itself. If the report shows that the reporter is likely to know more about the subject
matter than another one, it is not fallacious to assign a higher prior probability to
his or her report, even though it is a conjunction of a less credible report and further
information.
In the case of an alleged base-rate fallacy, there is no conceivable mistake present
if the intended sample space consists simply of the different possible courses of
events concerning the crucial event—for example, a traffic accident. Base rates enter
into the picture only when a wider class of courses of events is considered—for
example, all possible courses of events that might have led to the accident. This
means considering a larger sample space. Either sample space can of course be
considered entirely consistently, depending on one’s purposes. A fallacy would
inevitably be committed only if the only legitimate application of our language
and our epistemological methods was to the entire world—in this case, the larger
sample space. But such an exclusive preference of the larger sample space is but
an instance of the one-world assumption, which I have criticized elsewhere. (See
Hintikka 2003a.)
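For concreteness, here is a small numerical sketch of how the choice of sample space, and hence of prior, changes the verdict; the figures follow the familiar two-cab version of the base-rate problem and are not taken from the text.

# Sketch of the base-rate point with standard textbook numbers
# (85% of the cabs in town are Green, 15% are Blue; an eyewitness
# identifies colors correctly 80% of the time). Illustration only.

def posterior_blue(prior_blue, witness_accuracy):
    """P(the cab was Blue | the witness says 'Blue'), by Bayes' theorem."""
    prior_green = 1 - prior_blue
    says_blue_if_blue = witness_accuracy
    says_blue_if_green = 1 - witness_accuracy
    numerator = prior_blue * says_blue_if_blue
    return numerator / (numerator + prior_green * says_blue_if_green)

# Larger sample space: all cabs in town, so the city-wide base rate
# supplies the prior probability that the cab involved was Blue.
print(posterior_blue(prior_blue=0.15, witness_accuracy=0.80))  # about 0.41

# Smaller sample space: only the possible courses of this accident,
# with no base-rate information built into the prior.
print(posterior_blue(prior_blue=0.50, witness_accuracy=0.80))  # 0.80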
Whither Epistemology?
The moral of the insights we have thus reached is not merely to avoid certain
words in our epistemological theorizing. It calls for rethinking our overall research
strategies in epistemology. And the spirit in which we should do so is perhaps
References
Austin, J. (1961a) (original 1946). Other minds. In Philosophical papers. Oxford: Clarendon Press,
especially pp. 67–68.
Austin, J. (1961b). Performative utterances. In Philosophical papers. Oxford: Clarendon Press, ch.
10, especially p. 220.
Bonjour, L. (2002). Epistemology: Classical problems and contemporary responses. Lanham:
Rowman & Littlefield.
Cohen, L. J. (1992). Essay on belief and acceptance. Oxford: Clarendon.
Cohen, S. (1998). Contextual solutions to epistemological problems. Australasian Journal of
Philosophy, 76, 289–306.
Collingwood, R. G. (1940). An essay on metaphysics. Oxford: Clarendon Press, especially ch. I,
sec. 5.
Davidson, D. (1996). The folly of trying to define truth. Journal of Philosophy, 93, 263–278.
DeRose, K. (1995). Solving the skeptical problem. Philosophical Review, 104, 1–52.
Earman, J. (1992). Bayes or bust? A critical examination of Bayesian confirmation theory.
Cambridge: MIT Press.
Frege, G. (1984) (original 1918–19). Thoughts. In Collected papers. Oxford: Basil Blackwell.
Gadamer, H.-G. (1975) (original 1960). Truth and method. New York: Continuum, especially the
section “Logic of Questions and Answers,” pp. 333–341.
Hintikka, J. (1974). Knowledge and the known. Dordrecht: Reidel.
Hintikka, J. (1998). What is abduction? The fundamental problem of contemporary epistemology.
Transactions of the Charles Peirce Society, 34, 503–533. A revised version, “Abduction—
Inference, Conjecture, or an Answer to a Question,” appears as Chapter 2 in this volume.
Hintikka, J. (1999). Inquiry as inquiry: A logic of scientific discovery. Dordrecht: Kluwer.
Hintikka, J. (2002). Post-Tarskian truth. Synthese, 126, 17–36.
Hintikka, J. (2003a). A distinction too few or too many: A vindication of the analytic vs.
synthetic distinction. In Carol C. Gould (Ed.), Constructivism and practice: Toward a historical
epistemology (pp. 47–74). Lanham: Rowman & Littlefield.
Hintikka, J. (2003b). A second-generation epistemic logic and its general significance. In Vincent
F. Hendricks, et al. (Eds.), Knowledge contributors (pp. 33–56). Dordrecht: Kluwer Academic.
And as Chap. 3 in this volume.
Hintikka, J. (2004). A fallacious fallacy? Synthese, 140, 25–35. And as Chap. 9 in this volume.
Hintikka, J. (forthcoming). Wittgenstein on knowledge and skepticism (Working paper).
Hintikka, J., & Halonen, I. (1995). Semantics and pragmatics for why-questions. Journal of
Philosophy, 92, 636–657.
Husserl, E. (1983) (original 1913). Ideas pertaining to a pure phenomenology. First book. The
Hague: Martinus Nijhoff.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics
and biases. Cambridge: Cambridge University Press.
Kant, I. (1787). Kritik der reinen Vernunft, 2nd ed. (see Preface, p. xiii).
McCarthy, J. (1990). Circumscription—A form of non-monotonic reasoning. Artificial Intelli-
gence, 13, 27–39 and 171–172.
Montague, R. (1974). Formal philosophy. New Haven: Yale University Press.
Piatelli-Palmarini, M. (1994). Inevitable illusions. New York: Wiley.
Plato. (1924). Meno, Plato: With an English translation (Loeb classical library, vol. IV). Cam-
bridge: Harvard University Press.
Ramsey, F. (1978) (original 1929). Knowledge. In Foundations: Essays in philosophy, logic,
mathematics, and economics (pp. 126–127). London: Routledge & Kegan Paul.
Safire, W. (1995). The sleeper spy. New York: Random House.
Shope, R. K. (1983). The analysis of knowing: A decade of research. Princeton: Princeton
University Press.
Tarski, A. (1956) (original 1935). The concept of truth in formalized languages. In Logic,
semantics, metamathematics (pp. 152–278). Oxford: Clarendon Press.
Williams, M. (2001). Problems of knowledge: A critical introduction to epistemology. Oxford:
Oxford University Press.
Chapter 27
Epistemic Operators
Fred I. Dretske
Versions of this paper were read to the philosophy departments of several universities in the United
States and Canada during the year 1969/1970. I profited greatly from these discussions. I wish
especially to thank Paul Dietl who helped me to see a number of points more clearly (perhaps still
not clearly enough in his opinion). Finally, my exchanges with Mr. Don Affeldt were extremely
useful; I am much indebted to him in connection with some of the points made in the latter portions
of the paper.
F.I. Dretske (deceased)
University of Wisconsin, Madison, WI, USA
penetration). I mean it, rather, in a rough, comparative, sense: their degree of pene-
tration is less than that of any of the other operators I shall have occasion to discuss.
We have, then, two ends of the spectrum with examples from both ends. Anything
that falls between these two extremes I shall call a semi-penetrating operator. And
with this definition I am, finally, in a position to express my main point, the point
I wish to defend in the rest of this paper. It is, simply, that all epistemic operators
are semi-penetrating operators. There is both a trivial and a significant side to this
claim. Let me first deal briefly with the trivial aspect.
The epistemic operators I mean to be speaking about when I say that all epistemic
operators are semi-penetrating include the following:
(a) S knows that ...
(b) S sees (or can see) that ...
(c) S has reason (or a reason) to believe that ...
(d) There is evidence to suggest that ...
(e) S can prove that ...
(f) S learned (discovered, found out) that ...
(g) In relation to our evidence it is probable that ...
Part of what needs to be established in showing that these are all semi-penetrating
operators is that they all possess a degree of penetration greater than that of the
nonpenetrating operators. This is the trivial side of my thesis. I say it is trivial
because it seems to me fairly obvious that if someone knows that P and Q, has a
reason to believe that P and Q, or can prove that P and Q, he thereby knows that Q,
has a reason to believe that Q, or can prove (in the appropriate epistemic sense of
this term) that Q. Similarly, if S knows that Bill and Susan married each other, he
(must) know that Susan got married (married someone). If he knows that P is the
case, he knows that P or Q is the case (where the ‘or’ is understood in a sense which
makes ‘P or Q’ a necessary consequence of ‘P’). This is not a claim about what it
would be appropriate to say, what the person himself thinks he knows or would say
he knows. It is a question, simply, of what he knows. It may not be appropriate to
say to Jim’s wife that you know it was either her husband, Jim, or Harold who sent
the neighbor lady an expensive gift when you know it was Harold. For, although you
do know this, it is misleading to say you know it – especially to Jim’s wife.
Let me accept, therefore, without further argument that the epistemic operators
are not, unlike ‘lucky that’, ‘strange that’, ‘a mistake that’ and ‘accidental that’,
nonpenetrating operators. I would like to turn, then, to the more significant side of
my thesis. Before I do, however, I must make one point clear lest it convert my
entire thesis into something as trivial as the first half of it. When we are dealing
with the epistemic operators, it becomes crucial to specify whether the agent in
question knows that P entails Q. That is to say, P may entail Q, and S may know
that P, but he may not know that Q because, and perhaps only because, he fails to
appreciate the fact that P entails Q. When Q is a simple logical consequence of P we
do not expect this to happen, but when the propositions become very complex, or
the relationship between them very complex, this might easily occur. Let P be a set
of axioms, Q a theorem. S’s knowing P does not entail S’s knowing Q just because
P entails Q; for, of course, S may not know that P entails Q, may not know that
Q is a theorem. Hence, our epistemic operators will turn out not to be penetrating
because, and perhaps only because, the agents in question are not fully cognizant of
all the implications of what they know to be the case, can see to be the case, have a
reason to believe is the case, and so on. Were we all ideally astute logicians, were
we all fully apprised of all the necessary consequences (supposing this to be a well
defined class) of every proposition, perhaps then the epistemic operators would turn
into fully penetrating operators. That is, assuming that if P entails Q, we know that
P entails Q, then every epistemic operator is a penetrating operator: the epistemic
operators penetrate to all the known consequences of a proposition.
It is this latter, slightly modified, claim that I mean to reject. Therefore, I
shall assume throughout the discussion that when Q is a necessary consequence
of P, every relevant agent knows that it is. I shall be dealing with only the
known consequences (in most cases because they are immediate and obvious
consequences). What I wish to show is that, even under this special restriction, the
epistemic operators are only semi-penetrating.
I think many philosophers would disagree with this contention. The conviction
is that the epistemic worth of a proposition is hereditary under entailment, that
whatever the epistemic worth of P, at least the same value must be accorded the
known consequences of P. This conviction finds expression in a variety of ways.
Epistemic logic: if S knows that P, and knows that P entails Q, then S knows
that Q. Probability theory: if A is probable, and B is a logical consequence of A,
then B is probable (relative to the same evidence, of course). Confirmation theory:
if evidence e tends to confirm hypothesis h, then e indirectly confirms all the
logical consequences of h. But perhaps the best evidence in favor of supposing that
most philosophers have taken the epistemic operators to be fully penetrating is the
way they have argued and the obvious assumptions that structure their arguments.
Anyone who has argued in the following way seems to me to be assuming the thesis
of penetrability (as I shall call it): if you do not know whether Q is true or not, and
P cannot be true unless Q is true, then you (obviously) do not know whether P is
true or not. A slightly more elaborate form of the same argument goes like this: If
S does not know whether or not Q is true, then for all he knows it might be false.
If Q is false, however, then P must also be false. Hence, for all S knows, P may be
false. Therefore, S does not know that P is true. This pattern of argument is sprinkled
throughout the epistemological literature. Almost all skeptical objections trade on it.
S claims to know that this is a tomato. A necessary consequence of its being a tomato
is that it is not a clever imitation which only looks and feels (and, if you will, tastes)
like a tomato. But S does not know that it is not a clever imitation that only looks and
feels (and tastes) like a tomato. (I assume here that no one is prepared to argue that
anything that looks, feels, and tastes like a tomato to S must be a tomato.) Therefore,
S does not know that this is a tomato. We can, of course, reply with G. E. Moore that
we certainly do know it is a tomato (after such an examination) and since tomatoes
are not imitations we know that this is not an imitation. It is interesting to note that
this reply presupposes the same principle as does the skeptical objection: they both
assume that if S knows that this is a P, and knows that every P is a Q, then S knows
that this is a Q. The only difference is that the skeptic performs a modus tollens,
Moore a modus ponens. Neither questions the principle itself.
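In rough symbolic form, using the standard shorthand K_S for “S knows that” (a notation not used in the paper itself), the principle that both parties accept, and the two ways of using it, look like this.

% Sketch of the penetration (closure) principle in standard
% epistemic-logic notation; K_S is introduced here for illustration.
\[
  \bigl( K_S P \wedge K_S (P \rightarrow Q) \bigr) \rightarrow K_S Q
\]
% The skeptic argues by modus tollens: \neg K_S Q, therefore \neg K_S P.
% Moore argues by modus ponens:        K_S P, therefore K_S Q.
% Neither questions the conditional itself.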
Whether it be a question of dreams or demons, illusions or fakes, the same
pattern of argument emerges. If you know this is a chair, you must know that
you are not dreaming (or being deceived by a cunning demon), since its being a
(real) chair entails that it is not simply a figment of your own imagination. Such
arguments assume that the epistemic operators, and in particular the operator ‘to
know’, penetrate to all the known consequences of a proposition. If these operators
were not penetrating, many of these objections might be irrelevant. Consider the
following exchange:
S: How strange. There are tomatoes growing in my apple tree.
K: That isn’t strange at all. Tomatoes, after all, are physical objects and what is so
strange about physical objects growing in your apple tree?
What makes K’s reply so silly is that he is treating the operator ‘strange that’
as a fully penetrating operator: it cannot be strange that there are tomatoes growing
in the apple tree unless the consequences of this (e.g., there are objects growing
in your apple tree) are also strange. Similarly, it may not be at all relevant to
object to someone who claims to know that there are tomatoes in the apple tree
that he does not know, cannot be absolutely sure, that there are really any material
objects. Whether or not this is a relevant objection will depend on whether or not
this particular consequence of there being tomatoes in the apple tree is one of the
consequences to which the epistemic operators penetrate. What I wish to argue
in the remainder of this paper is that the traditional skeptical arguments exploit
precisely those consequences of a proposition to which the epistemic operators
do not penetrate, precisely those consequences which distinguish the epistemic
operators from the fully penetrating operators.
In support of this claim let me begin with some examples which are, I think,
fairly intuitive and then turn to some more problematic cases. I shall begin with
the operator ‘reason to believe that’ although what I have to say could be said as
well with any of them. This particular operator has the added advantage that if it
can be shown to be only semi-penetrating, then many accounts of knowledge, those
which interpret it as a form of justified true belief, would also be committed to
treating ‘knowing that’ as a semi-penetrating operator. For, presumably, ‘knowing
that’ would not penetrate any deeper than one’s ‘reasons for believing that’.
Suppose you have a reason to believe that the church is empty. Must you have a
reason to believe that it is a church? I am not asking whether you generally have such
a reason. I am asking whether one can have a reason to believe the church empty
without having a reason to believe that it is a church which is empty. Certainly your
reason for believing that the church is empty is not itself a reason to believe it is
a church; or it need not be. Your reason for believing the church to be empty may
be that you just made a thorough inspection of it without finding anyone. That is a
good reason to believe the church empty. Just as clearly, however, it is not a reason,
much less a good reason, to believe that what is empty is a church. The fact is, or so
it seems to me, I do not have to have any reason to believe it is a church. Of course,
I would never say the church was empty, or that I had a reason to believe that the
church was empty, unless I believed, and presumably had a reason for so believing,
that it was a church which was empty, but this is a presumed condition of my saying
something, not of my having a reason to believe something. Suppose I had simply
assumed (correctly as it turns out) that the building was a church. Would this show
that I had no reason to believe that the church was empty?
Suppose I am describing to you the “adventures” of my brother Harold. Harold
is visiting New York for the first time, and he decides to take a bus tour. He
boards a crowded bus and immediately takes the last remaining seat. The little old
lady he shouldered aside in reaching his seat stands over him glowering. Minutes
pass. Finally, realizing that my brother is not going to move, she sighs and moves
resignedly to the back of the bus. Not much of an adventure, but enough, I hope,
to make my point. I said that the little old lady realized that my brother would not
move. Does this imply that she realized that, or knew that, it was my brother who
refused to move? Clearly not. We can say that S knows that X is Y without implying
that S knows that it is X which is Y. We do not have to describe our little old lady as
knowing that the man or the person would not move. We can say that she realized
that, or knew that, my brother would not move (minus, of course, this pattern of
emphasis), and we can say this because saying this does not entail that the little old
lady knew that, or realized that, it was my brother who refused to move. She knew
that my brother would not move, and she knew this despite the fact that she did not
know something that was necessarily implied by what she did know–viz., that the
person who refused to move was my brother.
I have argued elsewhere that to see that A is B, that the roses are wilted for
example, is not to see, not even to be able to see, that they are roses which are
wilted.1 To see that the widow is limping is not to see that it is a widow who is
limping. I am now arguing that this same feature holds for all epistemic operators. I
can know that the roses are wilting without knowing that they are roses, know that
the water is boiling without knowing that it is water, and prove that the square root
of 2 is smaller than the square root of 3 and, yet, be unable to prove what is entailed
by this–viz., that the number 2 has a square root.
The general point may be put this way: there are certain presuppositions
associated with a statement. These presuppositions, although their truth is entailed
by the truth of the statement, are not part of what is operated on when we operate on
the statement with one of our epistemic operators. The epistemic operators do not
penetrate to these presuppositions. For example, in saying that the coffee is boiling
I assert that the coffee is boiling, but in asserting this I do not assert that it is coffee
which is boiling. Rather, this is taken for granted, assumed, presupposed, or what
have you. Hence, when I say that I have a reason to believe that the coffee is boiling,
I am not saying that this reason applies to the fact that it is coffee which is boiling.
This is still presupposed. I may have such a reason, of course, and chances are good
1. Seeing and Knowing (Chicago: University of Chicago Press, 1969), pp. 93–112, and also
“Reasons and Consequences,” Analysis (April 1968).
that I do have such a reason or I would not have referred to what I believe to be
boiling as coffee, but to have a reason to believe the coffee is boiling is not, thereby,
to have a reason to believe it is coffee which is boiling.
One would expect that if this is true of the semi-penetrating operators, then it
should also be true of the nonpenetrating operators. They also should fail to reach
the presuppositions. This is exactly what we find. It may be accidental that the two
trucks collided, but not at all accidental that it was two trucks that collided. Trucks
were the only vehicles allowed on the road that day, and so it was not at all accidental
or a matter of chance that the accident took place between two trucks. Still, it was
an accident that the two trucks collided. Or suppose Mrs. Murphy mistakenly gives
her cat some dog food. It need not be a mistake that she gave the food to her cat, or
some food to a cat. This was intentional. What was a mistake was that it was dog
food that she gave to her cat.
Hence, the first class of consequences that differentiate the epistemic operators
from the fully penetrating operators is the class of consequences associated with
the presuppositions of a proposition. The fact that the epistemic operators do not
penetrate to these presuppositions is what helps to make them semi-penetrating.
And this is an extremely important fact. For it would appear that if this is true, then
to know that the flowers are wilted I do not have to know that they are flowers (which
are wilted) and, therefore, do not have to know all those consequences which follow
from the fact that they are flowers, real flowers, which I know to be wilted.
Rather than pursue this line, however, I would like to turn to what I consider to
be a more significant set of consequences– “more significant” because they are the
consequences that are directly involved in most skeptical arguments. Suppose we
assert that x is A. Consider some predicate, ‘B’, which is incompatible with A, such
that nothing can be both A and B. It then follows from the fact that x is A that x
is not B. Furthermore, if we conjoin B with any other predicate, Q, it follows from
the fact that x is A that x is not-(B and Q). I shall call this type of consequence
a contrast consequence, and I am interested in a particular subset of these; for
I believe the most telling skeptical objections to our ordinary knowledge claims
exploit a particular set of these contrast consequences. The exploitation proceeds as
follows: someone purports to know that x is A, that the wall is red, say. The skeptic
now finds a predicate ‘B’ that is incompatible with ‘A’. In this particular example
we may let ‘B’ stand for the predicate ‘is white’. Since ‘x is red’ entails ‘x is not
white’ it also entails that x is not-(white and Q) where ‘Q’ is any predicate we care
to select. Therefore, the skeptic selects a ‘Q’ that gives expression to a condition
or circumstance under which a white wall would appear exactly the same as a red
wall. For simplicity we may let ‘Q’ stand for: ‘cleverly illuminated to look red’.
We now have this chain of implications: ‘x is red’ entails ‘x is not white’ entails
‘x is not white cleverly illuminated to look red’. If ‘knowing that’ is a penetrating
operator, then if anyone knows that the wall is red he must know that it is not white
cleverly illuminated to look red. (I assume here that the relevant parties know that if
x is red, it cannot be white made to look red.) He must know that this particular
contrast consequence is true. The question is: do we, generally speaking, know
anything of the sort? Normally we never take the trouble to check the lighting.
We seldom acquire any special reasons for believing the lighting normal although
we can talk vaguely about there being no reason to think it unusual. The fact is
that we habitually take such matters for granted, and although we normally have
good reasons for making such routine assumptions, I do not think these reasons are
sufficiently good, not without special precautionary checks in the particular case,
to say of the particular situation we are in that we know conditions are normal. To
illustrate, let me give you another example–a silly one, but no more silly than a great
number of skeptical arguments with which we are all familiar. You take your son to
the zoo, see several zebras, and, when questioned by your son, tell him they are
zebras. Do you know they are zebras? Well, most of us would have little hesitation
in saying that we did know this. We know what zebras look like, and, besides, this is
the city zoo and the animals are in a pen clearly marked “Zebras.” Yet, something’s
being a zebra implies that it is not a mule and, in particular, not a mule cleverly
disguised by the zoo authorities to look like a zebra. Do you know that these animals
are not mules cleverly disguised by the zoo authorities to look like zebras? If you are
tempted to say “Yes” to this question, think a moment about what reasons you have,
what evidence you can produce in favor of this claim. The evidence you had for
thinking them zebras has been effectively neutralized, since it does not count toward
their not being mules cleverly disguised to look like zebras. Have you checked with
the zoo authorities? Did you examine the animals closely enough to detect such a
fraud? You might do this, of course, but in most cases you do nothing of the kind.
You have some general uniformities on which you rely, regularities to which you
give expression by such remarks as, “That isn’t very likely” or “Why should the
zoo authorities do that?” Granted, the hypothesis (if we may call it that) is not very
plausible, given what we know about people and zoos. But the question here is not
whether this alternative is plausible, not whether it is more or less plausible than that
there are real zebras in the pen, but whether you know that this alternative hypothesis
is false. I don’t think you do. In this I agree with the skeptic. I part company with
the skeptic only when he concludes from this that, therefore, you do not know that
the animals in the pen are zebras. I part with him because I reject the principle he
uses in reaching this conclusion–the principle that if you do not know that Q is true,
when it is known that P entails Q, then you do not know that P is true.
What I am suggesting is that we simply admit that we do not know that some of
these contrasting “skeptical alternatives” are not the case, but refuse to admit that
we do not know what we originally said we knew. My knowing that the wall is red
certainly entails that the wall is red; it also entails that the wall is not white and, in
particular, it entails that the wall is not white cleverly illuminated to look red. But
it does not follow from the fact that I know that the wall is red that I know that it
is not white cleverly illuminated to look red. Nor does it follow from the fact that
I know that those animals are zebras that I know that they are not mules cleverly
disguised to look like zebras. These are some of the contrast consequences to which
the epistemic operators do not penetrate.
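Put in the same illustrative K_S shorthand, the position just stated is that all three of the following can hold together, where Z and D abbreviate “the animals are zebras” and “the animals are mules cleverly disguised to look like zebras.”

% Sketch only, reusing the illustrative K_S shorthand from above.
\[
  K_S Z, \qquad K_S (Z \rightarrow \neg D), \qquad \neg K_S \neg D
\]
% On this view all three are true at once: 'knows that' does not
% penetrate to the contrast consequence \neg D, so the skeptic's
% modus tollens is blocked.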
Aside from asserting this, what arguments can be produced to support it? I
could proceed by multiplying examples, but I do not think that examples alone will
support the full weight of this view. The thesis itself is sufficiently counterintuitive
2. Unlike our other operators, this one does not have a propositional operand. Despite the rather
obvious differences between this case and the others, I still think it useful to call attention to its
analogous features.
3. One must be careful not to confuse sentential conjunction with similar sounding expressions
involving a relationship between two things. For example, to say Bill and Susan got married (if it
is intended to mean that they married each other), although it entails that Susan got married, does
not do so by simplification. ‘Reason why’ penetrates through logical simplification, not through
the type of entailment represented by these two propositions. That is, the reason they got married
is that they loved each other; that they loved each other is not the reason Susan got married.
me to do something, but if I had a reason to quit my job, it does follow that I had a
reason to do something. And if the grass would not be green unless it had plenty of
sunshine and water, it follows that it would not be green unless it had water.
Furthermore, the similarities persist when one considers the presuppositional
consequences. I argued that the epistemic operators fail to penetrate to the presup-
positions; the above three operators display the same feature. In explaining why he
takes his lunch to work, I do not (or need not) explain why he goes to work or why
he works at all. The explanation may be obvious in some cases, of course, but the
fact is I need not be able to explain why he works (he is so wealthy) to explain why
he takes his lunch to work (the cafeteria food is so bad). The reason why the elms on
Main Street are dying is not the reason there are elms on Main Street. I have a reason
to feed my cat, no reason (not, at least, the same reason) to have a cat. And although
it is quite true that he would not have known about our plans if the secretary had not
told him, it does not follow that he would not have known about our plans if someone
other than the secretary had told him. That is, (He knew about our plans) ⇒ (The
secretary told him) even though it is not true that (He knew about our plans) ⇒ (It
was the secretary who told him). Yet, the fact that it was the secretary who told him
is (I take it) a presuppositional consequence of the fact that the secretary told him.
Similarly, if George is out to set fire to the first empty building he finds, it may be
true to say that George would not have set fire to the church unless it (the church)
was empty, yet false to say that George would not have set fire to the church unless
it was a church.
I now wish to argue that these three operators do not penetrate to a certain set of
contrast consequences. To the extent that the epistemic operators are similar to these
operators, we may then infer, by analogy, that they also fail to penetrate to certain
contrast consequences. This is, admittedly, a weak form of argument, depending as
it does on the grounds there are for thinking that the above three operators and the
epistemic operators share the same logic in this respect. Nonetheless, the analogy is
revealing. Some may even find it persuasive.4
(A) The pink walls in my living room clash with my old green couch. Recognizing
this, I proceed to paint the walls a compatible shade of green. This is the reason
I have, and give, for painting the walls green. Now, in having this explanation
for why I painted the walls green, I do not think I have an explanation for two
other things, both of which are entailed by what I do have an explanation for. I
have not explained why I did not, instead of painting the walls green, buy a new
couch or cover the old one with a suitable slipcover. Nor have I explained why,
4. I think that those who are inclined to give a causal account of knowledge should be particularly
interested in the operator ‘R ⇒ ...’ since, presumably, it will be involved in many instances of
knowledge (“many” not “all,” since one might wish to except some form of immediate knowledge –
knowledge of one’s own psychological state – from the causal account). If this operator is only
semipenetrating, then any account of knowledge that relies on the relationship expressed by this
operator (as I believe causal accounts must) will be very close to giving a “semi-penetrating”
account of ‘knowing that’.
instead of painting the walls green, I did not paint them white and illuminate
them with green light. The same effect would have been achieved, the same
purpose would have been served, albeit at much greater expense.
I expect someone to object as follows: although the explanation given for painting
the walls green does not, by itself, explain why the couch was not changed instead,
it nonetheless succeeds as an explanation for why the walls were painted green only
in so far as there is an explanation for why the couch was not changed instead. If
there is no explanation for why I did not change the couch instead, there has been
no real, no complete, explanation for why the walls were painted green.
I think this objection wrong. I may, of course, have an explanation for why I did
not buy a new couch: I love the old one or it has sentimental value. But then again I
may not. It just never occurred to me to change the couch; or (if someone thinks that
its not occurring to me is an explanation of why I did not change the couch) I may
have thought of it but decided, for what reasons (if any) I cannot remember, to keep
the couch and paint the walls. That is to say, I cannot explain why I did not change
the couch. I thought of it but I did not do it. I do not know why. Still, I can tell you
why I painted the walls green. They clashed with the couch.
(B) The fact that they are selling Xs so much more cheaply here than elsewhere
may be a reason to buy your Xs here, but it certainly need not be a reason to
do what is a necessary consequence of buying your Xs here–viz., not stealing
your Xs here.
(C) Let us suppose that S is operating in perfectly normal circumstances, a set of
circumstances in which it is true to say that the wall he sees would not (now)
look green to him unless it was green (if it were any other color it would look
different to him). Although we can easily imagine situations in which this is
true, it does not follow that the wall would not (now) look green to S if it were
white cleverly illuminated to look green. That is,
(i) The wall looks green (to S) ⇒ the wall is green.
(ii) The wall is green entails the wall is not white cleverly illuminated to look
green (to S).
are both true; yet, it is not true that
(iii) The wall looks green (to S) ⇒ the wall is not white cleverly illuminated to
look green (to S).
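The pattern that (i)–(iii) instantiate can be put schematically. The display below is an editorial gloss in LaTeX notation, not Dretske’s own; \Rightarrow stands in for his subjunctive operator and \models for entailment:

R \Rightarrow P, \qquad P \models Q, \qquad \text{yet it does not follow that } R \Rightarrow Q.

Here R is ‘the wall looks green (to S)’, P is ‘the wall is green’, and Q is ‘the wall is not white cleverly illuminated to look green (to S)’.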
There are dozens of examples that illustrate the relative impenetrability of this
operator. We can truly say that A and B would not have collided if B had not swerved
at the last moment and yet concede that they would have collided without any swerve
on the part of B if the direction in which A was moving had been suitably altered in
the beginning.5
5
The explanation for why the modal relationship between R and P (R ⇒ P) fails to carry over
(penetrate) to the logical consequences of P (i.e., R ⇒ Q where Q is a logical consequence of P)
The structure of these cases is virtually identical with that which appeared in the
case of the epistemic operators, and I think by looking just a little more closely at
this structure we can learn something very fundamental about our class of epistemic
operators and, in particular, about what it means to know something. If I may put
it this way, within the context of these operators no fact is an island. If we are
simply rehearsing the facts, then we can say that it is a fact that Brenda did not
take any dessert (though it was included in the meal). We can say this without a
thought about what sort of person Brenda is or what she might have done had she
ordered dessert. However, if we put this fact into, say, an explanatory context, if
we try to explain this fact, it suddenly appears within a network of related facts,
a network of possible alternatives which serve to define what it is that is being
explained. What is being explained is a function of two things – not only the fact
(Brenda did not order any dessert), but also the range of relevant alternatives. A
relevant alternative is an alternative that might have been realized in the existing
circumstances if the actual state of affairs had not materialized.6 When I explain
why Brenda did not order any dessert by saying that she was full (was on a diet, did
not like anything on the dessert menu), I explain why she did not order any dessert
rather than, as opposed to, or instead of ordering some dessert and eating it. It is
this competing possibility which helps to define what it is that I am explaining when
I explain why Brenda did not order any dessert. Change this contrast, introduce a
different set of relevant alternatives, and you change what it is that is being explained
and, therefore, what counts as an explanation, even though (as it were) the same
fact is being explained. Consider the following contrasts: ordering some dessert and
throwing it at the waiter; ordering some dessert and taking it home to a sick friend.
is to be found in the set of circumstances that are taken as given, or held fixed, in subjunctive
conditionals. There are certain logical consequences of P which, by bringing in a reference to
circumstances tacitly held fixed in the original subjunctive (R ⇒ P), introduce a possible variation
in these circumstances and, hence, lead to a different framework of fixed conditions under which
to assess the truth of R ⇒ Q. For instance, in the last example in the text, when it is said that A and
B would not have collided if B had not swerved at the last moment, the truth of this conditional
clearly takes it as given that A and B possessed the prior trajectories they in fact had on the occasion
in question. Given certain facts, including the fact that they were traveling in the direction they
were, they would not have collided if B had not swerved. Some of the logical consequences of the
statement that B swerved do not, however, leave these conditions unaltered–e.g., B did not move
in a perfectly straight line in a direction 20° counterclockwise to the direction it actually moved.
This consequence “tinkers” with the circumstances originally taken as given (held fixed), and a
failure of penetration will usually arise when this occurs. It need not be true that A and B would
not have collided if B had moved in a perfectly straight line in a direction 20° counterclockwise to
the direction it actually moved.
6
I am aware that this characterization of “a relevant alternative” is not, as it stands, very
illuminating. I am not sure I can make it more precise. What I am after can be expressed this way:
if Brenda had ordered dessert, she would not have thrown it at the waiter, stuffed it in her shoes, or
taken it home to a sick friend (she has no sick friend). These are not alternatives that might have
been realized in the existing circumstances if the actual state of affairs had not materialized. Hence,
they are not relevant alternatives. In other words, the ‘might have been’ in my characterization of
a relevant alternative will have to be unpacked in terms of counterfactuals.
With these contrasts none of the above explanations are any longer explanations of
why Brenda did not order dessert. Anyone who really wants to know why Brenda did
not order dessert and throw it at the waiter will not be helped by being told that she
was full or on a diet. This is only to say that, within the context of explanation and
within the context of our other operators, the proposition on which we operate must
be understood as embedded within a matrix of relevant alternatives. We explain why
P, but we do so within a framework of competing alternatives A, B, and C. Moreover,
if the possibility D is not within this contrasting set, not within this network of
relevant alternatives, then even though not-D follows necessarily from the fact, P,
which we do explain, we do not explain why not-D. Though the fact that Brenda did
not order dessert and throw it at the waiter follows necessarily from the fact that she
did not order dessert (the fact that is explained), this necessary consequence is not
explained by the explanation given. The only contrast consequences to which this
operator penetrates are those which figured in the original explanation as relevant
alternatives.
So it is with our epistemic operators. To know that x is A is to know that x is
A within a framework of relevant alternatives, B, C, and D. This set of contrasts,
together with the fact that x is A, serve to define what it is that is known when
one knows that x is A. One cannot change this set of contrasts without changing
what a person is said to know when he is said to know that x is A. We have subtle
ways of shifting these contrasts and, hence, changing what a person is said to know
without changing the sentence that we use to express what he knows. Take the fact
that Lefty killed Otto. By changing the emphasis pattern we can invoke a different
set of contrasts and, hence, alter what it is that S is said to know when he is said to
know that Lefty killed Otto. We can say, for instance, that S knows that Lefty killed
Otto. In this case (and I think this is the way we usually hear the sentence when
there is no special emphasis) we are being told that S knows the identity of Otto’s
killer, that it was Lefty who killed Otto. Hence, we expect S’s reasons for believing
that Lefty killed Otto to consist in facts that single out Lefty as the assailant rather
than George, Mike, or someone else. On the other hand, we can say that S knows
that Lefty killed Otto. In this case we are being told that S knows what Lefty did to
Otto; he killed him rather than merely injuring him, killed him rather than merely
threatening him, etc. A good reason for believing that Lefty killed Otto (rather than
merely injuring him) is that Otto is dead, but this is not much of a reason, if it is a
reason at all, for believing that Lefty killed Otto. Changing the set of contrasts (from
‘Lefty rather than George or Mike’ to ‘killed rather than injured or threatened’) by
shifting the emphasis pattern changes what it is that one is alleged to know when
one is said to know that Lefty killed Otto.7 The same point can be made here as we
made in the case of explanation: the operator will penetrate only to those contrast
consequences which form part of the network of relevant alternatives structuring
7
The same example works nicely with the operator ‘R ⇒ …’. It may be true to say that Otto would
not be dead unless Lefty killed him (unless what Lefty did to him was kill him) without its being
true that Otto would not be dead unless Lefty killed him (unless it was Lefty who killed him).
the original context in which a knowledge claim was advanced. Just as we have not
explained why Brenda did not order some dessert and throw it at the waiter when we
explained why she did not order some dessert (although what we have explained –
her not ordering any dessert – entails this), so also in knowing that Lefty killed Otto
(knowing that what Lefty did to Otto was kill him) we do not necessarily (although
we may) know that Lefty killed Otto (know that it was Lefty who killed Otto). Recall
the example of the little old lady who knew that my brother would not move without
knowing that it was my brother who would not move.
The conclusions to be drawn are the same as those in the case of explanation.
Just as we can say that within the original setting, within the original framework
of alternatives that defined what we were trying to explain, we did explain why
Brenda did not order any dessert, so also within the original setting, within the set
of contrasts that defined what it was we were claiming to know, we did know that
the wall was red and did know that it was a zebra in the pen.
To introduce a novel and enlarged set of alternatives, as the skeptic is inclined to
do with our epistemic claims, is to exhibit consequences of what we know, or have
reason to believe, which we may not know, may not have a reason to believe; but it
does not show that we did not know, did not have a reason to believe, whatever it
is that has these consequences. To argue in this way is, I submit, as much a mistake
as arguing that we have not explained why Brenda did not order dessert (within the
original, normal, setting) because we did not explain why she did not order some
and throw it at the waiter.
Chapter 28
Elusive Knowledge
David Lewis
We know a lot. I know what food penguins eat. I know that phones used to ring,
but nowadays squeal, when someone calls up. I know that Essendon won the 1993
Grand Final. I know that here is a hand, and here is another.
We have all sorts of everyday knowledge, and we have it in abundance. To doubt
that would be absurd. At any rate, to doubt it in any serious and lasting way would
be absurd; and even philosophical and temporary doubt, under the influence of
argument, is more than a little peculiar. It is a Moorean fact that we know a lot.
It is one of those things that we know better than we know the premises of any
philosophical argument to the contrary.
Besides knowing a lot that is everyday and trite, I myself think that we know
a lot that is interesting and esoteric and controversial. We know a lot about
things unseen: tiny particles and pervasive fields, not to mention one another’s
underwear. Sometimes we even know what an author meant by his writings. But
on these questions, let us agree to disagree peacefully with the champions of “post-
knowledgeism.” The most trite and ordinary parts of our knowledge will be problem
enough.
For no sooner do we engage in epistemology – the systematic philosophical
examination of knowledge – than we meet a compelling argument that we know
next to nothing. The sceptical argument is nothing new or fancy. It is just this: it
seems as if knowledge must be by definition infallible. If you claim that S knows
that P, and yet you grant that S cannot eliminate a certain possibility in which not-
P, it certainly seems as if you have granted that S does not after all know that P.
Perhaps epistemology is an investigation that destroys its own subject matter. If so, the sceptical argument
might be flawless, when we engage in epistemology – and only then!1
If you start from the ancient idea that justification is the mark that distinguishes
knowledge from mere opinion (even true opinion), then you well might conclude
that ascriptions of knowledge are context-dependent because standards for adequate
justification are context-dependent. As follows: opinion, even if true, deserves the
name of knowledge only if it is adequately supported by reasons; to deserve that
name in the especially demanding context of epistemology, the arguments from
supporting reasons must be especially watertight; but the special standards of
justification that this special context demands never can be met (well, hardly ever).
In the strict context of epistemology we know nothing, yet in laxer contexts we
know a lot.
But I myself cannot subscribe to this account of the context-dependence of
knowledge, because I question its starting point. I don’t agree that the mark of
knowledge is justification.2 First, because justification is not sufficient: your true
opinion that you will lose the lottery isn’t knowledge, whatever the odds. Suppose
you know that it is a fair lottery with one winning ticket and many losing tickets,
and you know how many losing tickets there are. The greater the number of losing
tickets, the better is your justification for believing you will lose. Yet there is no
number great enough to transform your fallible opinion into knowledge – after all,
you just might win. No justification is good enough – or none short of a watertight
deductive argument, and all but the sceptics will agree that this is too much to
demand.3
Second, because justification is not always necessary. What (non-circular)
argument supports our reliance on perception, on memory, and on testimony?4
And yet we do gain knowledge by these means. And sometimes, far from having
supporting arguments, we don’t even know how we know. We once had evidence,
drew conclusions, and thereby gained knowledge; now we have forgotten our
1
The suggestion that ascriptions of knowledge go false in the context of epistemology is to be found
in Barry Stroud, “Understanding Human Knowledge in General” in Marjorie Clay and Keith Lehrer
(eds.), Knowledge and Skepticism (Boulder: Westview Press, 1989); and in Stephen Hetherington,
“Lacking Knowledge and Justification by Theorising About Them” (lecture at the University of
New South Wales, August 1992). Neither of them tells the story just as I do; however, it may be
that their versions do not conflict with mine.
2
Unless, like some, we simply define “justification” as “whatever it takes to turn true opinion into
knowledge” regardless of whether what it takes turns out to involve argument from supporting
reasons.
3
The problem of the lottery was introduced in Henry Kyburg, Probability and the Logic of Rational
Belief (Middletown, CT: Wesleyan University Press, 1961), and in Carl Hempel, “Deductive-
Nomological vs. Statistical Explanation” in Herbert Feigl and Grover Maxwell (eds.), Minnesota
Studies in the Philosophy of Science, Vol. II (Minneapolis: University of Minnesota Press, 1962).
It has been much discussed since, as a problem both about knowledge and about our everyday,
non-quantitative concept of belief.
4
The case of testimony is less discussed than the others; but see C. A. J. Coady, Testimony: A
Philosophical Study (Oxford: Clarendon Press, 1992) pp. 79–129.
reasons, yet still we retain our knowledge. Or we know the name that goes with the
face, or the sex of the chicken, by relying on subtle visual cues, without knowing
what those cues may be.
The link between knowledge and justification must be broken. But if we break
that link, then it is not – or not entirely, or not exactly – by raising the standards of
justification that epistemology destroys knowledge. I need some different story.
To that end, I propose to take the infallibility of knowledge as my starting point.5
Must infallibilist epistemology end in scepticism? Not quite. Wait and see. Anyway,
here is the definition. Subject S knows proposition P iff (that is, if and only if) P
holds in every possibility left uneliminated by S’s evidence; equivalently, iff S’s
evidence eliminates every possibility in which not-P.
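Spelled out (an editorial restatement in LaTeX notation, not Lewis’s own formula), with W the set of possibilities and E_S the set of possibilities eliminated by S’s evidence:

S \text{ knows } P \;\iff\; \forall w \in W \setminus E_S,\ P \text{ holds in } w \;\iff\; \{\, w \in W : P \text{ fails in } w \,\} \subseteq E_S.

Both halves of the biconditional say the same thing: no uneliminated possibility is a not-P possibility.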
The definition is short, the commentary upon it is longer. In the first place, there
is the proposition, P. What I choose to call “propositions” are individuated coarsely,
by necessary equivalence. For instance, there is only one necessary proposition.
It holds in every possibility; hence in every possibility left uneliminated by S’s
evidence, no matter who S may be and no matter what his evidence may be. So the
necessary proposition is known always and everywhere. Yet this known proposition
may go unrecognised when presented in impenetrable linguistic disguise, say as
the proposition that every even number is the sum of two primes. Likewise, the
known proposition that I have two hands may go unrecognised when presented as
the proposition that the number of my hands is the least number n such that every even
number is the sum of n primes. (Or if you doubt the necessary existence of numbers,
switch to an example involving equivalence by logic alone.) These problems of
disguise shall not concern us here. Our topic is modal, not hyperintensional,
epistemology.6
Next, there are the possibilities. We needn’t enter here into the question whether
these are concreta, abstract constructions, or abstract simples. Further, we needn’t
decide whether they must always be maximally specific possibilities, or whether
they need only be specific enough for the purpose at hand. A possibility will be
specific enough if it cannot be split into subcases in such a way that anything we
have said about possibilities, or anything we are going to say before we are done,
applies to some subcases and not to others. For instance, it should never happen that
proposition P holds in some but not all subcases; or that some but not all sub-cases
are eliminated by S’s evidence.
But we do need to stipulate that they are not just possibilities as to how the whole
world is; they also include possibilities as to which part of the world is oneself,
and as to when it now is. We need these possibilities de se et nunc because the
5
I follow Peter Unger, Ignorance: A Case for Skepticism (New York: Oxford University Press,
1975). But I shall not let him lead me into scepticism.
6
See Robert Stalnaker, Inquiry (Cambridge, MA: MIT Press, 1984) pp. 59–99.
7
See my ‘Attitudes De Dicto and De Se’, The Philosophical Review 88 (1979) pp. 513–543; and
R. M. Chisholm, “The Indirect Reflexive” in C. Diamond and J. Teichman (eds.), Intention and
Intentionality: Essays in Honour of G. E. M. Anscombe (Brighton: Harvester, 1979).
8
Peter Unger, Ignorance, chapter II. I discuss the case, and briefly foreshadow the present paper,
in my “Scorekeeping in a Language Game,” Journal of Philosophical Logic 8 (1979) pp. 339–359,
esp. pp. 353–355.
in which not-Q. To close the circle: we ignore just those possibilities that falsify our
presuppositions. Proper presupposition corresponds, of course, to proper ignoring.
Then S knows that P iff S’s evidence eliminates every possibility in which not-P –
Psst! – except for those possibilities that conflict with our proper presuppositions.9
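The definition with its sotto voce proviso can be made concrete in a small sketch. The Python fragment below is an illustrative model only: the possibility labels and the particular sets of eliminated and properly ignored possibilities are stipulations of mine, loosely echoing the green-wall case of the previous chapter, not anything in Lewis’s text.

# A toy model of: S knows that P iff S's evidence eliminates every
# possibility in which not-P -- except for the properly ignored ones.
# Labels and set memberships are illustrative stipulations.

def knows(p_holds_in, possibilities, eliminated, properly_ignored):
    remaining = possibilities - eliminated - properly_ignored
    return all(p_holds_in(w) for w in remaining)

possibilities = {"wall_green", "wall_blue", "wall_white_lit_green"}
eliminated = {"wall_blue"}                    # ruled out by how the wall looks
properly_ignored = {"wall_white_lit_green"}   # far-fetched in an everyday context

def wall_is_green(w):
    return w == "wall_green"

print(knows(wall_is_green, possibilities, eliminated, properly_ignored))  # True
print(knows(wall_is_green, possibilities, eliminated, set()))             # False

The second call models the shift into a context, such as that of epistemology, in which the trick-lighting possibility is no longer being ignored, properly or otherwise.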
The rest of (modal) epistemology examines the sotto voce proviso. It asks: what
may we properly presuppose in our ascriptions of knowledge? Which of all the
uneliminated alternative possibilities may not properly be ignored? Which ones are
the “relevant alternatives”? – relevant, that is, to what the subject does and doesn’t
know?10 In reply, we can list several rules.11 We begin with three prohibitions: rules
to tell us what possibilities we may not properly ignore.
First, there is the Rule of Actuality. The possibility that actually obtains is
never properly ignored; actuality is always a relevant alternative; nothing false may
properly be presupposed. It follows that only what is true is known, wherefore
we did not have to include truth in our definition of knowledge. The rule is
“externalist” – the subject himself may not be able to tell what is properly ignored.
In judging which of his ignorings are proper, hence what he knows, we judge his
success in knowing – not how well he tried.
When the Rule of Actuality tells us that actuality may never be properly ignored,
we can ask: whose actuality? Ours, when we ascribe knowledge or ignorance to
others? Or the subject’s? In simple cases, the question is silly. (In fact, it sounds like
the sort of pernicious nonsense we would expect from someone who mixes up what
is true with what is believed.) There is just one actual world, we the ascribers live in
that world, the subject lives there too, so the subject’s actuality is the same as ours.
But there are other cases, less simple, in which the question makes perfect sense
and needs an answer. Someone may or may not know who he is; someone may or
may not know what time it is. Therefore I insisted that the propositions that may be
known must include propositions de se et nunc; and likewise that the possibilities
that may be eliminated or ignored must include possibilities de se et nunc. Now we
have a good sense in which the subject’s actuality may be different from ours. I ask
today what Fred knew yesterday. In particular, did he then know who he was? Did
he know what day it was? Fred’s actuality is the possibility de se et nunc of being
9
See Robert Stalnaker, “Presuppositions,” Journal of Philosophical Logic 2 (1973) pp. 447–
457; and “Pragmatic Presuppositions” in Milton Munitz and Peter Unger (eds.), Semantics and
Philosophy (New York: New York University Press, 1974). See also my “Scorekeeping in a
Language Game.” The definition restated in terms of presupposition resembles the treatment
of knowledge in Kenneth S. Ferguson, Philosophical Scepticism (Cornell University doctoral
dissertation, 1980).
10
See Fred Dretske, “Epistemic Operators,” The Journal of Philosophy 67 (1970) pp. 1007–1022,
and “The Pragmatic Dimension of Knowledge,” Philosophical Studies 40 (1981) pp. 363–378;
Alvin Goldman, “Discrimination and Perceptual Knowledge,” The Journal of Philosophy 73
(1976) pp. 771–791; G. C. Stine, “Skepticism, Relevant Alternatives, and Deductive Closure,”
Philosophical Studies 29 (1976) pp. 249–261; and Stewart Cohen, “How to be a Fallibilist,”
Philosophical Perspectives 2 (1988) pp. 91–123.
11
Some of them, but only some, taken from the authors just cited.
12
Instead of complicating the Rules of Belief as I have just done, I might equivalently have
introduced a separate Rule of High Stakes saying that when error would be especially disastrous,
few possibilities are properly ignored.
Yet even when the stakes are high, some possibilities still may be properly
ignored. Disastrous though it would be to convict an innocent man, still the jurors
may properly ignore the possibility that it was the dog, marvellously well-trained,
that fired the fatal shot. And, unless they are ignoring other alternatives more
relevant than that, they may rightly be said to know that the accused is guilty as
charged. Yet if there had been reason to give the dog hypothesis a slightly less
negligible degree of belief – if the world’s greatest dog-trainer had been the victim’s
mortal enemy – then the alternative would be relevant after all.
This is the only place where belief and justification enter my story. As already
noted, I allow justified true belief without knowledge, as in the case of your belief
that you will lose the lottery. I allow knowledge without justification, in the cases
of face recognition and chicken sexing. I even allow knowledge without belief, as
in the case of the timid student who knows the answer but has no confidence that
he has it right, and so does not believe what he knows.13 Therefore any proposed
converse to the Rule of Belief should be rejected. A possibility that the subject does
not believe to a sufficient degree, and ought not to believe to a sufficient degree, may
nevertheless be a relevant alternative and not properly ignored.
Next, there is the Rule of Resemblance. Suppose one possibility saliently
resembles another. Then if one of them may not be properly ignored, neither may
the other. (Or rather, we should say that if one of them may not properly be ignored
in virtue of rules other than this rule, then neither may the other. Else nothing could
be properly ignored; because enough little steps of resemblance can take us from
anywhere to anywhere.) Or suppose one possibility saliently resembles two or more
others, one in one respect and another in another, and suppose that each of these
may not properly be ignored (in virtue of rules other than this rule). Then these
resemblances may have an additive effect, doing more together than any one of
them would separately.
We must apply the Rule of Resemblance with care. Actuality is a possibility
uneliminated by the subject’s evidence. Any other possibility W that is likewise
uneliminated by the subject’s evidence thereby resembles actuality in one salient
respect: namely, in respect of the subject’s evidence. That will be so even if W is in
other respects very dissimilar to actuality – even if, for instance, it is a possibility
in which the subject is radically deceived by a demon. Plainly, we dare not apply
the Rules of Actuality and Resemblance to conclude that any such W is a relevant
alternative – that would be capitulation to scepticism. The Rule of Resemblance was
never meant to apply to this resemblance! We seem to have an ad hoc exception to
the Rule, though one that makes good sense in view of the function of attributions
of knowledge. What would be better, though, would be to find a way to reformulate
the Rule so as to get the needed exception without ad hocery. I do not know how to
do this.
13
A. D. Woozley, “Knowing and Not Knowing,” Proceedings of the Aristotelian Society 53 (1953)
pp. 151–172; Colin Radford, “Knowledge – by Examples,” Analysis 27 (1966) pp. 1–11.
It is the Rule of Resemblance that explains why you do not know that you will
lose the lottery, no matter what the odds are against you and no matter how sure
you should therefore be that you will lose. For every ticket, there is the possibility
that it will win. These possibilities are saliently similar to one another: so either
every one of them may be properly ignored, or else none may. But one of them may
not properly be ignored: the one that actually obtains.
The Rule of Resemblance also is the rule that solves the Gettier problems: other
cases of justified true belief that are not knowledge.14
(1) I think that Nogot owns a Ford, because I have seen him driving one; but
unbeknownst to me he does not own the Ford he drives, or any other Ford.
Unbeknownst to me, Havit does own a Ford, though I have no reason to
think so because he never drives it, and in fact I have often seen him taking
the tram. My justified true belief is that one of the two owns a Ford. But I
do not know it; I am right by accident. Diagnosis: I do not know, because I
have not eliminated the possibility that Nogot drives a Ford he does not own
whereas Havit neither drives nor owns a car. This possibility may not properly
be ignored. Because, first, actuality may not properly be ignored; and, second,
this possibility saliently resembles actuality. It resembles actuality perfectly
so far as Nogot is concerned; and it resembles actuality well so far as Havit
is concerned, since it matches actuality both with respect to Havit’s carless
habits and with respect to the general correlation between carless habits and
carlessness. In addition, this possibility saliently resembles a third possibility:
one in which Nogot drives a Ford he owns while Havit neither drives nor owns
a car. This third possibility may not properly be ignored, because of the degree
to which it is believed. This time, the resemblance is perfect so far as Havit is
concerned, rather good so far as Nogot is concerned.
(2) The stopped clock is right twice a day. It says 4:39, as it has done for weeks. I
look at it at 4:39; by luck I pick up a true belief. I have ignored the uneliminated
possibility that I looked at it at 4:22 while it was stopped saying 4:39. That
possibility was not properly ignored. It resembles actuality perfectly so far as
the stopped clock goes.
(3) Unbeknownst to me, I am travelling in the land of the bogus barns; but my eye
falls on one of the few real ones. I don’t know that I am seeing a barn, because
I may not properly ignore the possibility that I am seeing yet another of the
14
See Edmund Gettier, “Is Justified True Belief Knowledge?,” Analysis 23 (1963) pp. 121–123.
Diagnoses have varied widely. The four examples below come from: (1) Keith Lehrer and Thomas
Paxson Jr., “Knowledge: Undefeated Justified True Belief,” The Journal of Philosophy 66 (1969) pp. 225–
237; (2) Bertrand Russell, Human Knowledge: Its Scope and Limits (London: Allen and Unwin,
1948) p. 154; (3) Alvin Goldman, “Discrimination and Perceptual Knowledge,” op. cit.; (4) Gilbert
Harman, Thought (Princeton, NJ: Princeton University Press, 1973) p. 143.
Though the lottery problem is another case of justified true belief without knowledge, it is not
normally counted among the Gettier problems. It is interesting to find that it yields to the same
remedy.
15
See Alvin Goldman, “A Causal Theory of Knowing,” The Journal of Philosophy 64 (1967)
pp. 357–372; D. M. Armstrong, Belief, Truth and Knowledge (Cambridge: Cambridge University
Press, 1973).
16
See my “Veridical Hallucination and Prosthetic Vision,” Australasian Journal of Philosophy
58 (1980) pp. 239–249. John Bigelow has proposed to model knowledge-delivering processes
generally on those found in vision.
can happen (well, hardly ever) that an ascription of knowledge is true. Not an
ascription of knowledge to yourself (either to your present self or to your earlier
self, untainted by epistemology); and not an ascription of knowledge to others. That
is how epistemology destroys knowledge. But it does so only temporarily.
The pastime of epistemology does not plunge us forevermore into its special
context. We can still do a lot of proper ignoring, a lot of knowing, and a lot of true
ascribing of knowledge to ourselves and others, the rest of the time.
What is epistemology all about? The epistemology we’ve just been doing, at any
rate, soon became an investigation of the ignoring of possibilities. But to investigate
the ignoring of them was ipso facto not to ignore them. Unless this investigation
of ours was an altogether atypical sample of epistemology, it will be inevitable that
epistemology must destroy knowledge. That is how knowledge is elusive. Examine
it, and straightway it vanishes.
Is resistance useless? If you bring some hitherto ignored possibility to our
attention, then straightway we are not ignoring it at all, so a fortiori we are not
properly ignoring it. How can this alteration of our conversational state be undone?
If you are persistent, perhaps it cannot be undone – at least not so long as you
are around. Even if we go off and play backgammon, and afterward start our
conversation afresh, you might turn up and call our attention to it all over again.
But maybe you called attention to the hitherto ignored possibility by mistake.
You only suggested that we ought to suspect the butler because you mistakenly
thought him to have a criminal record. Now that you know he does not – that was
the previous butler – you wish you had not mentioned him at all. You know as well as
we do that continued attention to the possibility you brought up impedes our shared
conversational purposes. Indeed, it may be common knowledge between you and us
that we would all prefer it if this possibility could be dismissed from our attention.
In that case we might quickly strike a tacit agreement to speak just as if we were
ignoring it; and after just a little of that, doubtless it really would be ignored.
Sometimes our conversational purposes are not altogether shared, and it is a
matter of conflict whether attention to some far-fetched possibility would advance
them or impede them. What if some farfetched possibility is called to our attention
not by a sceptical philosopher, but by counsel for the defence? We of the jury may
wish to ignore it, and wish it had not been mentioned. If we ignored it now, we
would bend the rules of cooperative conversation; but we may have good reason
to do exactly that. (After all, what matters most to us as jurors is not whether we
can truly be said to know; what really matters is what we should believe to what
degree, and whether or not we should vote to convict.) We would ignore the far-
fetched possibility if we could – but can we? Perhaps at first our attempted ignoring
would be make-believe ignoring, or self-deceptive ignoring; later, perhaps, it might
ripen into genuine ignoring. But in the meantime, do we know? There may be no
definite answer. We are bending the rules, and our practices of context-dependent
attributions of knowledge were made for contexts with the rules unbent.
If you are still a contented fallibilist, despite my plea to hear the sceptical
argument afresh, you will probably be discontented with the Rule of Attention. You
will begrudge the sceptic even his very temporary victory. You will claim the right to
resist his argument not only in everyday contexts, but even in those peculiar contexts
in which he (or some other epistemologist) busily calls your attention to farfetched
possibilities of error. Further, you will claim the right to resist without having to
bend any rules of cooperative conversation. I said that the Rule of Attention was a
triviality: that which is not ignored at all is not properly ignored. But the Rule was
trivial only because of how I had already chosen to state the sotto voce proviso. So
you, the contented fallibilist, will think it ought to have been stated differently. Thus,
perhaps: “Psst! – except for those possibilities we could properly have ignored”.
And then you will insist that those far-fetched possibilities of error that we attend
to at the behest of the sceptic are nevertheless possibilities we could properly have
ignored. You will say that no amount of attention can, by itself, turn them into
relevant alternatives.
If you say this, we have reached a standoff. I started with a puzzle: how can it
be, when his conclusion is so silly, that the sceptic’s argument is so irresistible?
My Rule of Attention, and the version of the proviso that made that Rule trivial,
were built to explain how the sceptic manages to sway us – why his argument seems
irresistible, however temporarily. If you continue to find it eminently resistible in
all contexts, you have no need of any such explanation. We just disagree about the
explanandum phenomenon.
I say S knows that P iff P holds in every possibility left uneliminated by S’s
evidence – Psst! – except for those possibilities that we are properly ignoring. “We”
means: the speaker and hearers of a given context; that is, those of us who are
discussing S’s knowledge together. It is our ignorings, not S’s own ignorings, that
matter to what we can truly say about S’s knowledge. When we are talking about
our own knowledge or ignorance, as epistemologists so often do, this is a distinction
without a difference. But what if we are talking about someone else?
Suppose we are detectives; the crucial question for our solution of the crime
is whether S already knew, when he bought the gun, that he was vulnerable to
blackmail. We conclude that he did. We ignore various far-fetched possibilities,
as hard-headed detectives should. But S does not ignore them. S is by profession
a sceptical epistemologist. He never ignores much of anything. If it is our own
ignorings that matter to the truth of our conclusion, we may well be right that S
already knew. But if it is S’s ignorings that matter, then we are wrong, because S
never knew much of anything. I say we may well be right; so it is our own ignorings
that matter, not S’s.
But suppose instead that we are epistemologists considering what S knows. If
we are well-informed about S (or if we are considering a well-enough specified
hypothetical case), then if S attends to a certain possibility, we attend to S’s
attending to it. But to attend to S’s attending to it is ipso facto to attend to it
ourselves. In that case, unlike the case of the detectives, the possibilities we are
properly ignoring must be among the possibilities that S himself ignores. We may
ignore fewer possibilities than S does, but not more.
Even if S himself is neither sceptical nor an epistemologist, he may yet be
clever at thinking up farfetched possibilities that are uneliminated by his evidence.
Then again, we well-informed epistemologists who ask what S knows will have
to attend to the possibilities that S thinks up. Even if S’s idle cleverness does not
lead S himself to draw sceptical conclusions, it nevertheless limits the knowledge
that we can truly ascribe to him when attentive to his state of mind. More simply:
his cleverness limits his knowledge. He would have known more, had he been less
imaginative.17
Do I claim you can know P just by presupposing it?! Do I claim you can know
that a possibility W does not obtain just by ignoring it? Is that not what my analysis
implies, provided that the presupposing and the ignoring are proper? Well, yes.
And yet I do not claim it. Or rather, I do not claim it for any specified P or W.
I have to grant, in general, that knowledge just by presupposing and ignoring is
knowledge; but it is an especially elusive sort of knowledge, and consequently it is
an unclaimable sort of knowledge. You do not even have to practise epistemology
to make it vanish. Simply mentioning any particular case of this knowledge, aloud
or even in silent thought, is a way to attend to the hitherto ignored possibility, and
thereby render it no longer ignored, and thereby create a context in which it is no
longer true to ascribe the knowledge in question to yourself or others. So, just as we
should think, presuppositions alone are not a basis on which to claim knowledge.
In general, when S knows that P some of the possibilities in which not-P
are eliminated by S’s evidence and others of them are properly ignored. There
are some that can be eliminated, but cannot properly be ignored. For instance,
when I look around the study without seeing Possum the cat, I thereby eliminate
various possibilities in which Possum is in the study; but had those possibilities not
been eliminated, they could not properly have been ignored. And there are other
possibilities that never can be eliminated, but can properly be ignored. For instance,
the possibility that Possum is on the desk but has been made invisible by a deceiving
demon falls normally into this class (though not when I attend to it in the special
context of epistemology).
There is a third class: not-P possibilities that might either be eliminated or
ignored. Take the farfetched possibility that Possum has somehow managed to get
into a closed drawer of the desk – maybe he jumped in when it was open, then I closed
it without noticing him. That possibility could be eliminated by opening the drawer
and making a thorough examination. But if uneliminated, it may nevertheless be
ignored, and in many contexts that ignoring would be proper. If I look all around
the study, but without checking the closed drawers of the desk, I may truly be said
to know that Possum is not in the study – or at any rate, there are many contexts in
which that may truly be said. But if I did check all the closed drawers, then I would
know better that Possum is not in the study. My knowledge would be better in the
17
See Catherine Elgin, “The Epistemic Efficacy of Stupidity,” Synthese 74 (1988) pp. 297–311.
The “efficacy” takes many forms; some to do with knowledge (under various rival analyses),
some to do with justified belief. See also Michael Williams, Unnatural Doubts: Epistemological
Realism and the Basis of Scepticism (Oxford: Blackwell, 1991) pp. 352–355, on the instability of
knowledge under reflection.
second case because it would rest more on the elimination of not-P possibilities, less
on the ignoring of them.18,19
Better knowledge is more stable knowledge: it stands more chance of surviving
a shift of attention in which we begin to attend to some of the possibilities formerly
ignored. If, in our new shifted context, we ask what knowledge we may truly ascribe
to our earlier selves, we may find that only the better knowledge of our earlier
selves still deserves the name. And yet, if our former ignorings were proper at the
time, even the worse knowledge of our earlier selves could truly have been called
knowledge in the former context.
Never – well, hardly ever – does our knowledge rest entirely on elimination and
not at all on ignoring. So hardly ever is it quite as good as we might wish. To that
extent, the lesson of scepticism is right – and right permanently, not just in the
temporary and special context of epistemology.20
What is it all for? Why have a notion of knowledge that works in the way I
described? (Not a compulsory question. Enough to observe that it is one of the short-cuts – like
satisficing, like having indeterminate degrees of belief – that we resort to because
we are not smart enough to live up to really high, perfectly Bayesian, standards of
rationality.) You cannot maintain a record of exactly which possibilities you have
eliminated so far, much as you might like to. It is easier to keep track of which
possibilities you have eliminated if you – Psst! – ignore many of all the possibilities
there are. And besides, it is easier to list some of the propositions that are true in all
the uneliminated, unignored possibilities than it is to find propositions that are true
in all and only the uneliminated, unignored possibilities.
If you doubt that the word “know” bears any real load in science or in meta-
physics, I partly agree. The serious business of science has to do not with knowledge
per se; but rather, with the elimination of possibilities through the evidence of
perception, memory, etc., and with the changes that one’s belief system would (or
might or should) undergo under the impact of such eliminations. Ascriptions of
18
Mixed cases are possible: Fred properly ignores the possibility W1 which Ted eliminates;
however, Ted properly ignores the possibility W2 which Fred eliminates. Ted has looked in all
the desk drawers but not the file drawers, whereas Fred has checked the file drawers but not the
desk. Fred’s knowledge that Possum is not in the study is better in one way, Ted’s is better in
another.
19
To say truly that X is known, I must be properly ignoring any uneliminated possibilities in
which not-X; whereas to say truly that Y is better known than X, I must be attending to some
such possibilities. So I cannot say both in a single context. If I say “X is known, but Y is better
known,” the context changes in mid-sentence: some previously ignored possibilities must stop
being ignored. That can happen easily. Saying it the other way around – ”Y is better known than X,
but even X is known” – is harder, because we must suddenly start to ignore previously unignored
possibilities. That cannot be done, really; but we could bend the rules and make believe we had
done it, and no doubt we would be understood well enough. Saying “X is flat, but Y is flatter” (that
is, “X has no bumps at all, but Y has even fewer or smaller bumps”) is a parallel case. And again,
“Y is flatter, but even X is flat” sounds clearly worse – but not altogether hopeless.
20
Thanks here to Stephen Hetherington. While his own views about better and worse knowledge
are situated within an analysis of knowledge quite unlike mine, they withstand transplantation.
knowledge to yourself or others are a very sloppy way of conveying very incomplete
information about the elimination of possibilities. It is as if you had said:
The possibilities eliminated, whatever else they may also include, at least include all the
not-P possibilities; or anyway, all of those except for some we are presumably prepared to
ignore just at the moment.
The only excuse for giving information about what really matters in such a sloppy
way is that at least it is easy and quick! But it is easy and quick; whereas giving full
and precise information about which possibilities have been eliminated seems to be
extremely difficult, as witness the futile search for a “pure observation language.” If
I am right about how ascriptions of knowledge work, they are a handy but humble
approximation. They may yet be indispensable in practice, in the same way that
other handy and humble approximations are.
If we analyse knowledge as a modality, as we have done, we cannot escape the
conclusion that knowledge is closed under (strict) implication.21 Dretske has denied
that knowledge is closed under implication; further, he has diagnosed closure as the
fallacy that drives arguments for scepticism. As follows: the proposition that I have
hands implies that I am not a handless being, and a fortiori that I am not a handless
being deceived by a demon into thinking that I have hands. So, by the closure
principle, the proposition that I know I have hands implies that I know that I am
not handless and deceived. But I don’t know that I am not handless and deceived –
for how can I eliminate that possibility? So, by modus tollens, I don’t know that I
have hands. Dretske’s advice is to resist scepticism by denying closure. He says that
although having hands does imply not being handless and deceived, yet knowing
that I have hands does not imply knowing that I am not handless and deceived. I do
know the former, I do not know the latter.22
What Dretske says is close to right, but not quite. Knowledge is closed under
implication. Knowing that I have hands does imply knowing that I am not handless
and deceived. Implication preserves truth – that is, it preserves truth in any given,
fixed context. But if we switch contexts midway, all bets are off. I say (1) pigs fly; (2)
what I just said had fewer than three syllables (true); (3) what I just said had fewer
than four syllables (false). So “less than three” does not imply “less than four”?
No! The context switched midway, the semantic value of the context-dependent
21
A proof-theoretic version of this closure principle is common to all “normal” modal logics: if
the logic validates an inference from zero or more premises to a conclusion, then also it validates
the inference obtained by prefixing the necessity operator to each premise and to the conclusion.
Further, this rule is all we need to take us from classical sentential logic to the least normal modal
logic. See Brian Chellas, Modal Logic: An Introduction (Cambridge: Cambridge University Press,
1980) p. 114.
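Put as an inference rule (this symbolisation is an editorial addition following the footnote’s description, with \Box the necessity operator):

\frac{\varphi_1, \ldots, \varphi_n \vdash \psi}{\Box\varphi_1, \ldots, \Box\varphi_n \vdash \Box\psi} \qquad (n \geq 0)

The case n = 0 is the familiar rule of necessitation.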
22
Dretske, “Epistemic Operators.” My reply follows the lead of Stine, “Skepticism, Relevant
Alternatives, and Deductive Closure,” op. cit.; and (more closely) Cohen, “How to be a Fallibilist,”
op. cit.
phrase “what I just said” switched with it. Likewise in the sceptical argument the
context switched midway, and the semantic value of the context-dependent word
“know” switched with it. The premise “I know that I have hands” was true in its
everyday context, where the possibility of deceiving demons was properly ignored.
The mention of that very possibility switched the context midway. The conclusion
“I know that I am not handless and deceived” was false in its context, because that
was a context in which the possibility of deceiving demons was being mentioned,
hence was not being ignored, hence was not being properly ignored. Dretske gets
the phenomenon right, and I think he gets the diagnosis of scepticism right; it is
just that he misclassifies what he sees. He thinks it is a phenomenon of logic, when
really it is a phenomenon of pragmatics. Closure, rightly understood, survives the
test. If we evaluate the conclusion for truth not with respect to the context in which
it was uttered, but instead with respect to the different context in which the premise
was uttered, then truth is preserved. And if, per impossible, the conclusion could
have been said in the same unchanged context as the premise, truth would have
been preserved.
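The point about fixed versus switched contexts can be put in miniature. The sketch below is my own construction with invented possibility labels, not Lewis’s: contexts differ only in which possibilities are properly ignored, propositions are the sets of possibilities in which they hold, and entailment is set inclusion.

# Contexts are modelled as sets of properly ignored possibilities.
possibilities = {"hands", "handless_accident", "handless_demon"}

HAVE_HANDS = {"hands"}
NOT_HANDLESS_AND_DECEIVED = {"hands", "handless_accident"}   # entailed by HAVE_HANDS

eliminated = {"handless_accident"}   # the evidence rules this out, but not the demon case

def knows(prop, ignored):
    remaining = possibilities - eliminated - ignored
    return remaining <= prop         # prop holds in every remaining possibility

everyday = {"handless_demon"}        # demon possibility properly ignored
sceptical = set()                    # once the demon is mentioned, nothing is ignored

print(knows(HAVE_HANDS, everyday), knows(NOT_HANDLESS_AND_DECEIVED, everyday))    # True True
print(knows(HAVE_HANDS, sceptical), knows(NOT_HANDLESS_AND_DECEIVED, sceptical))  # False False
print(knows(HAVE_HANDS, everyday), knows(NOT_HANDLESS_AND_DECEIVED, sceptical))   # True False

Within either fixed context, premise and conclusion stand or fall together; the sceptic’s ‘True False’ pattern appears only when the premise is evaluated in the everyday context and the conclusion in the context that mentioning the demon has created.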
A problem due to Saul Kripke turns upon the closure of knowledge under
implication. P implies that any evidence against P is misleading. So, by closure,
whenever you know that P, you know that any evidence against P is misleading. And
if you know that evidence is misleading, you should pay it no heed. Whenever we
know – and we know a lot, remember – we should not heed any evidence tending
to suggest that we are wrong. But that is absurd. Shall we dodge the conclusion
by denying closure? I think not. Again, I diagnose a change of context. At first,
it was stipulated that S knew, whence it followed that S was properly ignoring all
possibilities of error. But as the story continues, it turns out that there is evidence
on offer that points to some particular possibility of error. Then, by the Rule of
Attention, that possibility is no longer properly ignored, either by S himself or by us
who are telling the story of S. The advent of that evidence destroys S’s knowledge,
and thereby destroys S’s licence to ignore the evidence lest he be misled.
There is another reason, different from Dretske’s, why we might doubt closure.
Suppose two or more premises jointly imply a conclusion. Might not someone who
is compartmentalized in his thinking – as we all are – know each of the premises
but fail to bring them together in a single compartment? Then might he not fail to
know the conclusion? Yes; and I would not like to plead idealization-of-rationality
as an excuse for ignoring such cases. But I suggest that we might take not the whole
compartmentalized thinker, but rather each of his several overlapping compartments,
as our “subjects.” That would be the obvious remedy if his compartmentalization
amounted to a case of multiple personality disorder; but maybe it is right for milder
cases as well.23
A compartmentalized thinker who indulges in epistemology can destroy his
knowledge, yet retain it as well. Imagine two epistemologists on a bushwalk. As
they walk, they talk. They mention all manner of far-fetched possibilities of error.
23
See Stalnaker, Inquiry, pp. 79–99.
And I said that when we do epistemology, and we attend to the proper ignoring
of possibilities, we make knowledge vanish. First we do know, then we do not. But I
had been doing epistemology when I said that. The uneliminated possibilities were
not being ignored – not just then. So by what right did I say even that we used to
know?24
In trying to thread a course between the rock of fallibilism and the whirlpool of
scepticism, it may well seem as if I have fallen victim to both at once. For do I not
say that there are all those uneliminated possibilities of error? Yet do I not claim
that we know a lot? Yet do I not claim that knowledge is, by definition, infallible
knowledge?
I did claim all three things. But not all at once! Or if I did claim them all at once,
that was an expository shortcut, to be taken with a pinch of salt. To get my message
across, I bent the rules. If I tried to whistle what cannot be said, what of it? I relied
on the cardinal principle of pragmatics, which overrides every one of the rules I
mentioned: interpret the message to make it make sense – to make it consistent, and
sensible to say.
When you have context-dependence, ineffability can be trite and unmysterious.
Hush! [moment of silence] I might have liked to say, just then, “All of us are silent.”
It was true. But I could not have said it truly, or whistled it either. For by saying it
aloud, or by whistling, I would have rendered it false.
I could have said my say fair and square, bending no rules. It would have been
tiresome, but it could have been done. The secret would have been to resort to
“semantic ascent.” I could have taken great care to distinguish between (1) the
language I use when I talk about knowledge, or whatever, and (2) the second
language that I use to talk about the semantic and pragmatic workings of the first
language. If you want to hear my story told that way, you probably know enough
to do the job for yourself. If you can, then my informal presentation has been good
enough.
24
Worse still: by what right can I even say that we used to be in a position to say truly that we
knew? Then, we were in a context where we properly ignored certain uneliminated possibilities of
error. Now, we are in a context where we no longer ignore them. If now I comment retrospectively
upon the truth of what was said then, which context governs: the context now or the context then?
I doubt there is any general answer, apart from the usual principle that we should interpret what is
said so as to make the message make sense.
Chapter 29
Knowledge and Scepticism
Robert Nozick
You think you are seeing these words, but could you not be hallucinating or
dreaming or having your brain stimulated to give you the experience of seeing
these marks on paper although no such thing is before you? More extremely,
could you not be floating in a tank while super-psychologists stimulate your brain
electrochemically to produce exactly the same experiences as you are now having,
or even to produce the whole sequence of experiences you have had in your lifetime
thus far? If one of these other things was happening, your experience would be
exactly the same as it now is. So how can you know none of them is happening?
Yet if you do not know these possibilities don’t hold, how can you know you are
reading this book now? If you do not know you haven’t always been floating in the
tank at the mercy of the psychologists, how can you know anything – what your name
is, who your parents were, where you come from?
The sceptic argues that we do not know what we think we do. Even when he
leaves us unconverted, he leaves us confused. Granting that we do know, how
can we? Given these other possibilities he poses, how is knowledge possible?
In answering this question, we do not seek to convince the sceptic, but rather
to formulate hypotheses about knowledge and our connection to facts that show
how knowledge can exist even given the sceptic’s possibilities. These hypotheses
must reconcile our belief that we know things with our belief that the sceptical
possibilities are logical possibilities.
The sceptical possibilities, and the threats they pose to our knowledge, depend
upon our knowing things (if we do) mediately, through or by way of something
else. Our thinking or believing that some fact p holds is connected somehow to the
fact that p, but is not itself identical with that fact. Intermediate links establish the
connection. This leaves room for the possibility of these intermediate stages holding
and producing our belief that p, without the fact that p being at the other end. The
intermediate stages arise in a completely different manner, one not involving the
fact that p although giving rise to the appearance that p holds true.
Are the sceptic’s possibilities indeed logically possible? Imagine reading a
science fiction story in which someone is raised from birth floating in a tank with
psychologists stimulating his brain. The story could go on to tell of the person’s
reactions when he is brought out of the tank, of how the psychologists convince him
of what had been happening to him, or how they fail to do so. This story is coherent,
there is nothing self-contradictory or otherwise impossible about it. Nor is there
anything incoherent in imagining that you are now in this situation, at a time before
being taken out of the tank. To ease the transition out, to prepare the way, perhaps
the psychologists will give the person in the tank thoughts of whether floating in the
tank is possible, or the experience of reading a book that discusses this possibility,
even one that discusses their easing his transition. (Free will presents no insuperable
problem for this possibility. Perhaps the psychologists caused all your experiences
of choice, including the feeling of freely choosing; or perhaps you do freely choose
to act while they, cutting the effector circuit, continue the scenario from there.)
Some philosophers have attempted to demonstrate there is no such coherent
possibility of this sort. However, for any reasoning that purports to show this
sceptical possibility cannot occur, we can imagine the psychologists of our science
fiction story feeding it to their tank-subject, along with the (inaccurate) feeling
that the reasoning is cogent. So how much trust can be placed in the apparent
cogency of an argument to show the sceptical possibility isn’t coherent? The
sceptic’s possibility is a logically coherent one, in tension with the existence of
(almost all) knowledge; so we seek a hypothesis to explain how, even given the
sceptic’s possibilities, knowledge is possible. We may worry that such explanatory
hypotheses are ad hoc, but this worry will lessen if they yield other facts as well, fit
in with other things we believe, and so forth. Indeed, the theory of knowledge that
follows was not developed in order to explain how knowledge is possible. Rather,
the motivation was external to epistemology; only after the account of knowledge
was developed for another purpose did I notice its consequences for scepticism, for
understanding how knowledge is possible. So whatever other defects the explanation
might have, it can hardly be called ad hoc.
Knowledge
Our task is to formulate further conditions to go alongside
(1) p is true
(2) S believes that p.
We would like each condition to be necessary for knowledge, so any case that
fails to satisfy it will not be an instance of knowledge. Furthermore, we would like
the conditions to be jointly sufficient for knowledge, so any case that satisfies all
of them will be an instance of knowledge. We first shall formulate conditions that
seem to handle ordinary cases correctly, classifying as knowledge cases which are
knowledge, and as non-knowledge cases which are not; then we shall check to see
how these conditions handle some difficult cases discussed in the literature.
One plausible suggestion is causal, something like: the fact that p (partially)
causes S to believe that p, that is, (2) because (1). But this provides an inhospitable
environment for mathematical and ethical knowledge; also there are well-known
difficulties in specifying the type of causal connection. If someone floating in a
tank oblivious to everything around him is given (by direct electrical and chemical
stimulation of the brain) the belief that he is floating in a tank with his brain being
stimulated, then even though that fact is part of the cause of his belief, still he does
not know that it is true. Let us consider a different third condition:
(3) If p were not true, S would not believe that p.
Throughout this work, let us write the subjunctive “if-then” by an arrow, and
the negation of a sentence by prefacing “not-” to it. The above condition thus is
rewritten as:
(3) not-p → not-(S believes that p).
This subjunctive condition is not unrelated to the causal condition. Often when
the fact that p (partially) causes someone to believe that p, the fact also will be
causally necessary for his having the belief: without the cause, the effect would
not occur. In that case, the subjunctive condition (3) also will be satisfied. Yet this
condition is not equivalent to the causal condition. For the causal condition will be
satisfied in cases of causal overdetermination, where either two sufficient causes of
the effect actually operate, or a back-up cause (of the same effect) would operate
if the first one didn’t; whereas the subjunctive condition need not hold for these
cases.1,2 When the two conditions do agree, causality indicates knowledge because
it acts in a manner that makes the subjunctive (3) true.
The subjunctive condition (3) serves to exclude cases of the sort first described
by Edward Gettier, such as the following. Two other people are in my office and I
am justified on the basis of much evidence in believing the first owns a Ford car;
though he (now) does not, the second person (a stranger to me) owns one. I believe
truly and justifiably that someone (or other) in my office owns a Ford car, but I do
not know someone does. Concluded Gettier, knowledge is not simply justified true
belief.
1
See Hilary Putnam, Reason, Truth and History (Cambridge, 1981), ch. I.
2
I should note here that I assume bivalence throughout this chapter, and consider only statements
that are true if and only if their negations are false.
The following subjunctive, which specifies condition (3) for this Gettier case,
is not satisfied: if no one in my office owned a Ford car, I wouldn’t believe that
someone did. The situation that would obtain if no one in my office owned a Ford
is one where the stranger does not (or where he is not in the office); and in that
situation I still would believe, as before, that someone in my office does own a Ford,
namely, the first person. So the subjunctive condition (3) excludes this Gettier case
as a case of knowledge.
The subjunctive condition is powerful and intuitive, not so easy to satisfy, yet not
so powerful as to rule out everything as an instance of knowledge. A subjunctive
conditional “if p were true, q would be true,” p → q, does not say that p entails q or
that it is logically impossible that p yet not-q. It says that in the situation that would
obtain if p were true, q also would be true. This point is brought out especially
clearly in recent “possible-worlds” accounts of subjunctives: the subjunctive is true
when (roughly) in all those worlds in which p holds true that are closest to the
actual world, q also is true. (Examine those worlds in which p holds true closest to
the actual world, and see if q holds true in all these.) Whether or not q is true in p
worlds that are still farther away from the actual world is irrelevant to the truth of
the subjunctive. I do not mean to endorse any particular possible-worlds account of
subjunctives, nor am I committed to this type of account.3 I sometimes shall use it,
though, when it illustrates points in an especially clear way.
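A minimal executable sketch of that closest-worlds reading follows; the handful of worlds, the numeric distance measure, and the function names are illustrative assumptions, not part of the account in the text.

def subjunctive(p, q, worlds, distance):
    # "p -> q" on the rough closest-worlds reading: q holds at every p-world
    # that is as close to the actual world as any p-world gets.
    p_worlds = [w for w in worlds if p(w)]
    if not p_worlds:
        return True  # vacuously true if there are no p-worlds at all
    d_min = min(distance[w] for w in p_worlds)
    return all(q(w) for w in p_worlds if distance[w] == d_min)

# Toy version of the Gettier case above: in the closest world where no one
# in the office owns a Ford, I still believe that someone does.
worlds = ["actual", "w1", "w2"]
distance = {"actual": 0, "w1": 1, "w2": 5}
someone_owns_ford = {"actual": True, "w1": False, "w2": False}
i_believe_it = {"actual": True, "w1": True, "w2": False}

print(subjunctive(lambda w: not someone_owns_ford[w],
                  lambda w: not i_believe_it[w],
                  worlds, distance))   # False: the subjunctive of condition (3) fails

On this toy assignment the subjunctive “if no one in my office owned a Ford, I wouldn’t believe that someone did” comes out false, which is the verdict reached in the Gettier discussion above.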
The subjunctive condition (3) also handles nicely cases that cause difficulties
for the view that you know that p when you can rule out the relevant alternatives
to p in the context. For, as Gail Stine writes, “what makes an alternative relevant
in one context and not another? ... if on the basis of visual appearances obtained
under optimum conditions while driving through the countryside Henry identifies an
object as a barn, normally we say that Henry knows that it is a barn. Let us suppose,
however, that unknown to Henry, the region is full of expertly made papier-mache
facsimiles of barns. In that case, we would not say that Henry knows that the object
is a barn, unless he has evidence against it being a papier-mache facsimile, which
is now a relevant alternative. So much is clear, but what if no such facsimiles exist
in Henry’s surroundings, although they once did? Are either of these circumstances
sufficient to make the hypothesis (that it’s a papier-mache object) relevant? Probably
not, but the situation is not so clear.”4 Let p be the statement that the object in the
field is a (real) barn, and q the one that the object in the field is a papier-mache barn.
When papier-mache barns are scattered through the area, if p were false, q would be
true or might be. Since in this case (we are supposing) the person still would believe
p, the subjunctive
3
See Robert Stalnaker, “A Theory of Conditionals,” in N. Rescher, ed., Studies in Logical
Theory (Oxford 1968); David Lewis, Counterfactuals (Cambridge 1973); and Jonathan Bennett’s
critical review of Lewis, “Counterfactuals and Possible Worlds,” Canadian Journal of Philosophy,
4/2 (Dec. 1974), 381–402. Our purposes require, for the most part, no more than an intuitive
understanding of subjunctives.
4
G. C. Stine, “Skepticism, Relevant Alternatives and Deductive Closure,” Philosophical Studies,
29 (1976), 252, who attributes the example to Carl Ginet.
(1) and (2). Thus, we presuppose some (or another) suitable account of subjunctives.
According to the suggestion tentatively made above, (4) holds true if not only does
he actually truly believe p, but in the ‘close’ worlds where p is true, he also believes
it. He believes that p for some distance out in the p neighbourhood of the actual
world; similarly, condition (3) speaks not of the whole not-p neighbourhood of the
actual world, but only of the first portion of it. (If, as is likely, these explanations
do not help, please use your own intuitive understanding of the subjunctives (3) and
(4).)
The person in the tank does not satisfy the subjunctive condition (4). Imagine
as actual a world in which he is in the tank and is stimulated to believe he is, and
consider what subjunctives are true in that world. It is not true of him there that if he
were in the tank he would believe it; for in the close world (or situation) to his own
where he is in the tank but they don’t give him the belief that he is (much less instill
the belief that he isn’t) he doesn’t believe he is in the tank. Of the person actually in
the tank and believing it, it is not true to make the further statement that if he were
in the tank he would believe it – so he does not know he is in the tank.
The subjunctive condition (4) also handles a case presented by Gilbert Harman.5
The dictator of a country is killed; in their first edition, newspapers print the story,
but later all the country’s newspapers and other media deny the story, falsely.
Everyone who encounters the denial believes it (or does not know what to believe
and so suspends judgement). Only one person in the country fails to hear any denial
and he continues to believe the truth. He satisfies conditions (1) – (3) (and the causal
condition about belief) yet we are reluctant to say he knows the truth. The reason is
that if he had heard the denials, he too would have believed them, just like everyone
else. His belief is not sensitively tuned to the truth, he doesn’t satisfy the condition
that if it were true he would believe it. Condition (4) is not satisfied.
There is a pleasing symmetry about how this account of knowledge relates
conditions (3) and (4), and connects them to the first two conditions. The account
has the following form.
(1)
(2)
(3) not-1 → not-2
(4) 1 → 2
I am not inclined, however, to make too much of this symmetry, for I found also
that with other conditions experimented with as a possible fourth condition there
was some way to construe the resulting third and fourth conditions as symmetrical
answers to some symmetrical looking questions, so that they appeared to arise in
parallel fashion from similar questions about the components of true belief.
Symmetry, it seems, is a feature of a mode of presentation, not of the contents
presented. A uniform transformation of symmetrical statements can leave the results
non-symmetrical. But if symmetry attaches to mode of presentation, how can
it possibly be a deep feature of, for instance, laws of nature that they exhibit
symmetry? (One of my favourite examples of symmetry is due to Groucho Marx.
On his radio programme he spoofed a commercial, and ended: “And if you are not
completely satisfied, return the unused portion of our product and we will return
the unused portion of your money.”) Still, to present our subject symmetrically
makes the connection of knowledge to true belief especially perspicuous. It seems
to me that a symmetrical formulation is a sign of our understanding, rather than a
mark of truth. If we cannot understand an asymmetry as arising from an underlying
symmetry through the operation of a particular factor, we will not understand why
that asymmetry exists in that direction. (But do we also need to understand why the
underlying asymmetrical factor holds instead of its opposite?)
A person knows that p when he not only does truly believe it, but also would
truly believe it and wouldn’t falsely believe it. He not only actually has a true belief,
he subjunctively has one. It is true that p and he believes it; if it weren’t true he
wouldn’t believe it, and if it were true he would believe it. To know that p is to be
someone who would believe it if it were true, and who wouldn’t believe it if it were
false.
It will be useful to have a term for this situation when a person’s belief is thus
subjunctively connected to the fact. Let us say of a person who believes that p, which
is true, that when (3) and (4) hold, his belief tracks the truth that p. To know is to
have a belief that tracks the truth. Knowledge is a particular way of being connected
to the world, having a specific real factual connection to the world: tracking it.
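Read model-theoretically, the four conditions can be checked mechanically once a set of worlds and a similarity ordering are supplied. The sketch below relies on the same illustrative closest-worlds gloss used earlier; the numeric cutoff standing in for the “first portion” of a neighbourhood is an arbitrary assumption.

def closest(cond, worlds, distance):
    # The cond-worlds nearest the actual world.
    sat = [w for w in worlds if cond(w)]
    if not sat:
        return []
    d_min = min(distance[w] for w in sat)
    return [w for w in sat if distance[w] == d_min]

def nearby(cond, worlds, distance, radius):
    # A crude stand-in for "some distance out in the cond-neighbourhood".
    return [w for w in worlds if cond(w) and distance[w] <= radius]

def tracks(p, believes_p, worlds, distance, actual, radius=2):
    c1 = p(actual)                                      # (1) p is true
    c2 = believes_p(actual)                             # (2) S believes that p
    c3 = all(not believes_p(w)                          # (3) not-p -> not-belief
             for w in closest(lambda w: not p(w), worlds, distance))
    c4 = all(believes_p(w)                              # (4) p -> belief
             for w in nearby(p, worlds, distance, radius))
    return c1 and c2 and c3 and c4

In the tank case, for instance, condition (4) is the one that fails: among the nearby worlds in which the subject is in the tank are worlds where the belief is not instilled, so tracks returns False there.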
Scepticism
The sceptic about knowledge argues that we know very little or nothing of what
we think we know, or at any rate that this position is no less reasonable than
the belief in knowledge. The history of philosophy exhibits a number of different
attempts to refute the sceptic: to prove him wrong or show that in arguing against
knowledge he presupposes there is some and so refutes himself. Others attempt
to show that accepting scepticism is unreasonable, since it is more likely that the
sceptic’s extreme conclusion is false than that all of his premisses are true, or simply
because reasonableness of belief just means proceeding in an anti-sceptical way.
Even when these counter-arguments satisfy their inventors, they fail to satisfy others,
as is shown by the persistent attempts against scepticism. The continuing felt need to
refute scepticism, and the difficulty in doing so, attests to the power of the sceptic’s
position, the depth of his worries.
An account of knowledge should illuminate sceptical arguments and show
wherein lies their force. If the account leads us to reject these arguments, this had
better not happen too easily or too glibly. To think the sceptic overlooks something
obvious, to attribute to him a simple mistake or confusion or fallacy, is to refuse to
acknowledge the power of his position and the grip it can have upon us. We thereby
cheat ourselves of the opportunity to reap his insights and to gain self-knowledge in
understanding why his arguments lure us so. Moreover, in fact, we cannot lay the
spectre of scepticism to rest without first hearing what it shall unfold.
Sceptical Possibilities
The sceptic often refers to possibilities in which a person would believe something
even though it was false: really, the person is cleverly deceived by others, perhaps
by an evil demon, or the person is dreaming, or he is floating in a tank near Alpha
Centauri with his brain being stimulated. In each case, the p he believes is false, and
he believes it even though it is false.
How do these possibilities adduced by the sceptic show that someone does not
know that p? Suppose that someone is you; how do these possibilities count against
your knowing that p? One way might be the following. (I shall consider other ways
later.) If there is a possible situation where p is false yet you believe that p, then
in that situation you believe that p even though it is false. So it appears you do not
satisfy condition (3) for knowledge.
(3) If p were false, S wouldn’t believe that p.
For a situation has been described in which you do believe that p even though p
is false. How then can it also be true that if p were false, you wouldn’t believe it?
If the sceptic’s possible situation shows that (3) is false, and if (3) is a necessary
condition for knowledge, then the sceptic’s possible situation shows that there isn’t
knowledge.
So construed, the sceptic’s argument plays on condition (3); it aims to show that
condition (3) is not satisfied. The sceptic may seem to be putting forth
5
Gilbert Harman, Thought (Princeton; 1973), ch. 9, 142–54.
6
From the perspective of explanation rather than proof, the extensive philosophical discussion,
deriving from Charles S. Peirce, of whether the sceptic’s doubts are real is beside the point. The
problem of explaining how knowledge is possible would remain the same, even if no one ever
claimed to doubt that there was knowledge.
Sceptical Results
how could we? – that it is not happening to us. It is a virtue of our account that it
yields, and explains, this result.
The sceptic asserts we do not know his possibilities don’t obtain, and he is right.
Attempts to avoid scepticism by claiming we do know these things are bound to
fail. The sceptic’s possibilities make us uneasy because, as we deeply realize, we do
not know they don’t obtain; it is not surprising that attempts to show we do know
these things leave us suspicious, strike us even as bad faith. Nor has the sceptic
merely pointed out something obvious and trivial. It comes as a surprise to realize
that we do not know his possibilities don’t obtain. It is startling, shocking. For we
would have thought, before the sceptic got us to focus on it, that we did know those
things, that we did know we were not being deceived by a demon, or dreaming that
dream, or stimulated that way in that tank. The sceptic has pointed out that we do
not know things we would have confidently said we knew. And if we don’t know
these things, what can we know? So much for the supposed obviousness of what the
sceptic tells us.
Let us say that a situation (or world) is doxically identical for S to the actual
situation when if S were in that situation, he would have exactly the beliefs (doxa)
he actually does have. More generally, two situations are doxically identical for S
if and only if he would have exactly the same beliefs in them. It might be merely a
curiosity to be told there are non-actual situations doxically identical to the actual
one. The sceptic, however, describes worlds doxically identical to the actual world
in which almost everything believed is false.7
Such worlds are possible because we know mediately, not directly. This leaves
room for a divergence between our beliefs and the truth. It is as though we possessed
only two-dimensional plane projections of three-dimensional objects. Different
three-dimensional objects, oriented appropriately, have the same two-dimensional
plane projection. Similarly, different situations or worlds will lead to our having the
very same beliefs. What is surprising is how very different the doxically identical
world can be – different enough for almost everything believed in it to be false.
Whether or not the mere fact that knowledge is mediated always makes room for
such a very different doxically identical world, it does so in our case, as the sceptic’s
possibilities show. To be shown this is non-trivial, especially when we recall that we
do not know the sceptic’s possibility doesn’t obtain: we do not know that we are not
living in a doxically identical world wherein almost everything we believe is false.
What more could the sceptic ask for or hope to show? Even readers who
sympathized with my desire not to dismiss the sceptic too quickly may feel this
has gone too far, that we have not merely acknowledged the force of the sceptic’s
position but have succumbed to it.
7
I say almost everything, because there still could be some true beliefs such as “I exist.” More
limited sceptical possibilities present worlds doxically identical to the actual world in which almost
every belief of a certain sort is false, for example, about the past, or about other people’s mental
states.
The sceptic maintains that we know almost none of what we think we know.
He has shown, much to our initial surprise, that we do not know his (nontrivial)
possibility SK doesn’t obtain. Thus, he has shown of one thing we thought we knew,
that we didn’t and don’t. To the conclusion that we know almost nothing, it appears
but a short step. For if we do not know we are not dreaming or being deceived by
a demon or floating in a tank, then how can I know, for example, that I am sitting
before a page writing with a pen, and how can you know that you are reading a page
of a book?
However, although our account of knowledge agrees with the sceptic in saying
that we do not know that not-SK, it places no formidable barriers before my knowing
that I am writing on a page with a pen. It is true that I am, I believe I am, if I weren’t
I wouldn’t believe I was, and if I were, I would believe it. Also, it is true that you
are reading a page (please, don’t stop now!), you believe you are, if you weren’t
reading a page you wouldn’t believe you were, and if you were reading a page you
would believe you were. So according to the account, I do know that I am writing
on a page with a pen, and you do know that you are reading a page. The account
does not lead to any general scepticism.
Yet we must grant that it appears that if the sceptic is right that we don’t know
we are not dreaming or being deceived or floating in the tank, then it cannot be that
I know I am writing with a pen or that you know you are reading a page. So we must
scrutinize with special care the sceptic’s “short step” to the conclusion that we don’t
know these things, for either this step cannot be taken or our account of knowledge
is incoherent.
Nonclosure
In taking the “short step,” the sceptic assumes that if S knows that p and he knows
that “p entails q” then he also knows that q. In the terminology of the logicians, the
sceptic assumes that knowledge is closed under known logical implication; that the
operation of moving from something known to something else known to be entailed
by it does not take us outside of the (closed) area of knowledge. He intends, of
course, to work things backwards, arguing that since the person does not know that
q, assuming (at least for the purposes of argument) that he does know that p entails
q, it follows that he does not know that p. For if he did know that p, he would also
know that q, which he doesn’t.
The details of different sceptical arguments vary in their structure, but each one
will assume some variant of the principle that knowledge is closed under known
logical implication. If we abbreviate “knowledge that p” by “Kp” and abbreviate
“entails” by the fishhook sign “⥽,” we can write this principle of closure as the
subjunctive principle
P: K(p ⥽ q) & Kp → Kq
If a person were to know that p entails q and he were to know that p then he would
know that q. The statement that q follows by modus ponens from the other two stated
as known in the antecedent of the subjunctive principle P; this principle counts on
the person to draw the inference to q. You know that your being in a tank on Alpha
Centauri entails your not being in place X where you are. (I assume here a limited
readership.) And you know also the contrapositive, that your being at place X entails
that you are not then in a tank on Alpha Centauri. If you knew you were at X you
would know you’re not in a tank (of a specified sort) at Alpha Centauri. But you do
not know this last fact (the sceptic has argued and we have agreed) and so (he argues)
you don’t know the first. Another intuitive way of putting the sceptic’s argument is
as follows. If you know that two statements are incompatible and you know the first
is true then you know the denial of the second.
You know that your being at X and your being in a tank on Alpha Centauri are
incompatible; so if you knew you were at X you would know you were not in the
(specified) tank on Alpha Centauri. Since you do not know the second, you don’t
know the first.
No doubt, it is possible to argue over the details of principle P, to point out it is
incorrect as it stands. Perhaps, though Kp, the person does not know that he knows
that p (that is, not-KKp) and so does not draw the inference to q. Or perhaps he
doesn’t draw the inference because not-KK(p ⥽ q). Other similar principles face
their own difficulties: for example, the principle that K(p → q) → (Kp → Kq) fails if
Kp stops p → q from being true, that is, if Kp → not-(p → q); the principle that K(p ⥽
q) → K(Kp → Kq) faces difficulties if Kp makes the person forget that (p ⥽ q) and so
he fails to draw the inference to q. We seem forced to pile K upon K until we reach
something like KK(p ⥽ q) & KKp → Kq; this involves strengthening considerably
the antecedent of P and so is not useful for the sceptic’s argument that p is not
known. (From a principle altered thus it would follow at best that it is not known
that p is known.)
We would be ill-advised, however, to quibble over the details of P. Although
these details are difficult to get straight, it will continue to appear that something
like P is correct. If S knows that “p entails q,” and he knows that p and knows that
“(p and p entails q) entails q” and he does draw the inference to q from all this and
believes q via the process of drawing this inference, then will he not know that q?
And what is wrong with simplifying this mass of detail by writing merely principle
P, provided we apply it only to cases where the mass of detail holds, as it surely
does in the sceptical cases under consideration? For example, I do realize that my
being in the Van Leer Foundation Building in Jerusalem entails that I am not in a
tank on Alpha Centauri; I am capable of drawing inferences now; I do believe I am
not in a tank on Alpha Centauri (though not solely via this inference, surely); and
so forth. Won’t this satisfy the correctly detailed principle, and shouldn’t it follow
that I know I am not (in that tank) on Alpha Centauri? The sceptic agrees it should
follow; so he concludes from the fact that I don’t know I am not floating in the tank
on Alpha Centauri that I don’t know I am in Jerusalem. Uncovering difficulties in
the details of particular formulations of P will not weaken the principle’s intuitive
appeal; such quibbling will seem at best like a wasp attacking a steamroller, at worst
like an effort in bad faith to avoid being pulled along by the sceptic’s argument.
Principle P is wrong, however, and not merely in detail. Knowledge is not closed
under known logical implication. S knows that p when S has a true belief that p, and
S wouldn’t have a false belief that p (condition (3)) and S would have a true belief
that p (condition (4)). Neither of these latter two conditions is closed under known
logical implication. Let us begin with condition
(3) if p were false, S wouldn’t believe that p.
When S knows that p, his belief that p is contingent on the truth of p, contingent
in the way the subjunctive condition (3) describes. Now it might be that p entails q
(and S knows this), that S’s belief that p is subjunctively contingent on the truth of
p, that S believes q, yet his belief that q is not subjunctively dependent on the truth
of q, in that it (or he) does not satisfy:
(3′) if q were false, S wouldn’t believe that q.
For (3′) talks of what S would believe if q were false, and this may be a very
different situation from the one that would hold if p were false, even though p entails
q. That you were born in a certain city entails that you were born on earth.8 Yet
contemplating what (actually) would be the situation if you were not born in that
city is very different from contemplating what situation would hold if you weren’t
born on earth. Just as those possibilities are very different, so what is believed in
them may be very different. When p entails q (and not the other way around) p will
be a stronger statement than q, and so not-q (which is the antecedent of (3′)) will be a
stronger statement than not-p (which is the antecedent of (3)). There is no reason to
assume you will have the same beliefs in these two cases, under these suppositions
of differing strengths.
There is no reason to assume the (closest) not-p world and the (closest) not-
q world are doxically identical for you, and no reason to assume, even though p
entails q, that your beliefs in one of these worlds would be a (proper) subset of your
beliefs in the other.
Consider now the two statements:
p = I am awake and sitting on a chair in Jerusalem;
q = I am not floating in a tank on Alpha Centauri being stimulated by electrochem-
ical means to believe that p.
The first one entails the second: p entails q. Also, I know that p entails q; and I
know that p. If p were false, I would be standing or lying down in the same city, or
perhaps sleeping there, or perhaps in a neighbouring city or town. If q were false,
I would be floating in a tank on Alpha Centauri. Clearly these are very different
situations, leading to great differences in what I then would believe. If p were false,
8
Here again I assume a limited readership, and ignore possibilities such as those described in James
Blish, Cities in Flight (New York, 1982).
if I weren’t awake and sitting on a chair in Jerusalem, I would not believe that p.
Yet if q were false, if I were floating in a tank on Alpha Centauri, I would believe
that q, that I was not in the tank, and indeed, in that case, I would still believe that
p. According to our account of knowledge, I know that p yet I do not know that q,
even though (I know) p entails q.
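Schematically, with the closest-worlds gloss used above, the example comes to this (a compact restatement, not a further argument):

\[
\begin{array}{ll}
\text{closest } \neg p\text{-worlds:} & \neg Bp \ \text{(standing, lying down, or in a nearby town), so (3) holds for } p;\\
\text{closest } \neg q\text{-worlds:} & Bq \ \text{and indeed } Bp \ \text{(floating in the tank), so (3}'\text{) fails for } q;\\
\text{hence:} & Kp,\ K(p \text{ entails } q),\ \text{and } \neg Kq \ \text{all hold at the actual world.}
\end{array}
\]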
This failure of knowledge to be closed under known logical implication stems
from the fact that condition (3) is not closed under known logical implication;
condition (3) can hold of one statement believed while not of another known to
be entailed by the first. It is clear that any account that includes as a necessary
condition for knowledge the subjunctive condition (3), not-p → not-(S believes that
p), will have the consequence that knowledge is not closed under known logical
implication. When p entails q and you believe each of them, if you do not have
a false belief that p (since p is true) then you do not have a false belief that q.
However, if you are to know something not only don’t you have a false belief about
it, but also you wouldn’t have a false belief about it. Yet, we have seen how it may
be that p entails q and you believe each and you wouldn’t have a false belief that p
yet you might have a false belief that q (that is, it is not the case that you wouldn’t
have one). Knowledge is not closed under known logical implication because
‘wouldn’t have a false belief that’ is not closed under known logical implication.
If knowledge were the same as (simply) true belief then it would be closed
under known logical implication (provided the implied statements were believed).
Knowledge is not simply true belief, however; additional conditions are needed.
These further conditions will make knowledge open under known logical impli-
cation, even when the entailed statement is believed, when at least one of the
further conditions itself is open. Knowledge stays closed (only) if all of the
additional conditions are closed. I lack a general non-trivial characterization of
those conditions that are closed under known logical implication: possessing such
an illuminating characterization, one might attempt to prove that no additional
conditions of that sort could provide an adequate analysis of knowledge.
Still, we can say the following. A belief that p is knowledge that p only if it
somehow varies with the truth of p. The causal condition for knowledge specified
that the belief was “produced by” the fact, but that condition did not provide the
right sort of varying with the fact. The subjunctive conditions (3) and (4) are our
attempt to specify that varying. But however an account spells this out, it will hold
that whether a belief that p is knowledge partly depends on what goes on with the
belief in some situations when p is false. An account that says nothing about what
is believed in any situation when p is false cannot give us any mode of varying with
the fact.
Because what is preserved under logical implication is truth, any condition that
is preserved under known logical implication is most likely to speak only of what
happens when p, and q, are true, without speaking at all of what happens when either
one is false. Such a condition is incapable of providing “varies with”; so adding only
such conditions to true belief cannot yield an adequate account of knowledge.
A belief’s somehow varying with the truth of what is believed is not closed
under known logical implication. Since knowledge that p involves such variation,
knowledge also is not closed under known logical implication. The sceptic cannot
easily deny that knowledge involves such variation, for his argument that we don’t
know that we’re not floating in that tank, for example, uses the fact that knowledge
does involve variation. (“If you were floating in the tank you would still think you
weren’t, so you don’t know that you’re not.”) Yet, though one part of his argument
uses that fact that knowledge involves such variation, another part of his argument
presupposes that knowledge does not involve any such variation. This latter is the
part that depends upon knowledge being closed under known logical implication,
as when the sceptic argues that since you don’t know that not-SK, you don’t know
you are not floating in the tank, then you also don’t know, for example, that you
are now reading a book. That closure can hold only if the variation does not. The
sceptic cannot be right both times. According to our view he is right when he holds
that knowledge involves such variation and so concludes that we don’t know, for
example, that we are not floating in that tank: but he is wrong when he assumes
knowledge is closed under known logical implication and concludes that we know
hardly anything.9
Knowledge is a real factual relation, subjunctively specifiable, whose structure
admits our standing in this relation, tracking, to p without standing in it to some q
which we know p to entail. Any relation embodying some variation of belief with
the fact, with the truth (value), will exhibit this structural feature. The sceptic is
right that we don’t track some particular truths – the ones stating that his sceptical
possibilities SK don’t hold – but wrong that we don’t stand in the real knowledge-
relation of tracking to many other truths, including ones that entail these first
mentioned truths we believe but don’t know.
The literature on scepticism contains writers who endorse these sceptical argu-
ments (or similar narrower ones), but confess their inability to maintain their
9
Reading an earlier draft of this chapter, friends pointed out to me that Fred Dretske already
had defended the view that knowledge (as one among many epistemic concepts) is not closed
under known logical implication. (See his “Epistemic Operators,” Journal of Philosophy, 67,
(1970), 1007–23.) Furthermore, Dretske presented a subjunctive condition for knowledge (in his
“Conclusive Reasons,” Australasian Journal of Philosophy, 49, (1971), 1–22), holding that S
knows that p on the basis of reasons R only if: R would not be the case unless p were the case.
Here Dretske ties the evidence subjunctively to the fact, and the belief based on the evidence
subjunctively to the fact through the evidence. The independent statement and delineation of the
position here I hope will make clear its many merits.
After Goldman’s paper on a causal theory of knowledge, in Journal of Philosophy, 64, (1967),
an idea then already “in the air,” it required no great leap to consider subjunctive conditions.
Some 2 months after the first version of this chapter was written, Goldman himself published a
paper on knowledge utilizing counterfactuals (“Discrimination and Perceptual Knowledge,” Essay
II in this collection), also talking of relevant possibilities (without using the counterfactuals to
identify which possibilities are relevant); and R. Shope has called my attention to a paper of L. S.
Carrier (“An Analysis of Empirical Knowledge,” Southern Journal of Philosophy, 9, (1971), 3–
11) that also used subjunctive conditions including our condition (3). Armstrong’s reliability view
of knowledge (Belief, Truth and Knowledge, Cambridge, 1973, pp. 166, 169) involved a lawlike
connection between the belief that p and the state of affairs that makes it true. Clearly, the idea is
one whose time has come.
sceptical beliefs at times when they are not focusing explicitly on the reasoning
that led them to sceptical conclusions. The most notable example of this is Hume:
I am ready to reject all belief and reasoning, and can look upon no opinion even as more
probable or likely than another. . . Most fortunately it happens that since reason is incapable
of dispelling these clouds, nature herself suffices to that purpose, and cures me of this
philosophical melancholy and delirium, either by relaxing this bent of mind, or by some
avocation, and lively impression of my senses, which obliterate all these chimeras. I dine,
I play a game of backgammon, I converse, and am merry with my friends; and when after
three or four hours’ amusement, I would return to these speculations, they appear so cold,
and strained, and ridiculous, that I cannot find in my heart to enter into them any farther. (A
Treatise of Human Nature, Book I, Part IV, section VII.)
The great subverter of Pyrrhonism or the excessive principles of skepticism is action,
and employment, and the occupations of common life. These principles may flourish and
triumph in the schools; where it is, indeed, difficult, if not impossible, to refute them. But
as soon as they leave the shade, and by the presence of the real objects, which actuate
our passions and sentiments, are put in opposition to the more powerful principles of our
nature, they vanish like smoke, and leave the most determined skeptic in the same condition
as other mortals. . . And though a Pyrrhonian may throw himself or others into a momentary
amazement and confusion by his profound reasonings; the first and most trivial event in life
will put to flight all his doubts and scruples, and leave him the same, in every point of
action and speculation, with the philosophers of every other sect, or with those who never
concerned themselves in any philosophical researches. When he awakes from his dream, he
will be the first to join in the laugh against himself, and to confess that all his objections are
mere amusement. (An Enquiry Concerning Human Understanding, Section XII, Part II.)
Chapter 30
On Logics of Knowledge and Belief
Robert Stalnaker
Introduction
1
See Fagin et al. (1995) and Battigalli and Bonanno (1999) for excellent surveys of the application
of logics of knowledge and belief in theoretical computer science and game theory.
R. Stalnaker
Department of Linguistics and Philosophy, MIT, Cambridge, MA, USA
e-mail: [email protected]
but I think there may still be some insights to be gained by looking back, if not at
the details of the analyses, at some of the general strategies of analysis that were
deployed.
There was little contact between these two very different epistemological
projects. The first had little to say about substantive questions about the relation
between knowledge, belief, and justification or epistemic entitlement, or about
traditional epistemological issues, such as skepticism. The second project ignored
questions about the abstract structure of epistemic and doxastic states. But I
think some of the abstract questions about the logic of knowledge connect with
traditional questions in epistemology, and with the issues that motivated the attempt
to find a definition of knowledge. The formal semantic framework provides the
resources to construct models that may help to clarify the abstract relationship
between the concept of knowledge and some of the other concepts (belief and
belief revision, causation and counterfactuals) that were involved in the post-Gettier
project of defining knowledge. And some of the examples that were originally
used in the post-Gettier literature to refute a proposed analysis can be used in a
different way in the context of formal semantic theories: to bring out contrasting
features of some alternative conceptions of knowledge, conceptions that may not
provide plausible analyses of knowledge generally, but that may provide interesting
models of knowledge that are appropriate for particular applications, and that may
illuminate, in an idealized way, one or another of the dimensions of the complex
epistemological terrain.
My aim in this paper will be to bring out some of the connections between issues
that arise in the development and application of formal semantics for knowledge
and belief and more traditional substantive issues in epistemology. The paper will
be programmatic; pointing to some highly idealized theoretical models, some alter-
native assumptions that might be made about the logic and semantics of knowledge,
and some of the ways in which they might connect with traditional issues in episte-
mology, and with applications of the concept of knowledge. I will bring together
and review some old results, and make some suggestions about possible future
developments. After a brief sketch of Hintikka’s basic logic of knowledge, I will
discuss, in section “Partition models”, the S5 epistemic models that were developed
and applied by theoretical computer scientists and game theorists, models that, I
will argue, conflate knowledge and belief. In section “Belief and knowledge”, I will
discuss a basic theory that distinguishes knowledge from belief and that remains
relatively noncommittal about substantive questions about knowledge, but that
provides a definition of belief in terms of knowledge. This theory validates a logic
of knowledge, S4.2, that is stronger than S4, but weaker than S5. In the remaining
four sections, I will consider some alternative ways of adding constraints on the
relation between knowledge and belief that go beyond the basic theory: in section
“Partition models and the basic theory”, I will consider the S5 partition models as a
special case of the basic theory; in section “Minimal and maximal extensions”, I will
discuss the upper and lower bounds to an extension of the semantics of belief to a
semantics for knowledge; in section “Belief revision and the defeasibility analysis”,
2
I explore the problem of logical omniscience in two papers, Stalnaker (1991, 1999b) both included
in Stalnaker (1999a). I don’t attempt to solve the problem in either paper, but only to clarify it, and
to argue that it is a genuine problem, and not an artifact of a particular theoretical framework.
that we should also assume that the relation is transitive, validating the much more
controversial principle that knowing implies knowing that one knows. Knowing and
knowing that one knows are, Hintikka claimed, “virtually equivalent.” Hintikka’s
reasons for this conclusion were not completely clear. He did not want to base
it on a capacity for introspection: he emphasized that his reasons were logical
rather than psychological. His proof of the KK principle rests on the following
principle: If {Kφ, ¬K¬ψ} is consistent, then {Kφ, ψ} is consistent, and it is clear
that if one grants this principle, the KK principle immediately follows.3 The reason
for accepting this principle seems to be something like this: Knowledge requires
conclusive reasons for belief, reasons that would not be defeated by any information
compatible with what is known. So if one knows that φ while ψ is compatible with
what one knows, then the truth of ψ could not defeat one’s claim to know that φ. This
argument, and other considerations for and against the KK principle deserve more
careful scrutiny. There is a tangle of important and interesting issues underlying
the question whether one should accept the KK principle, and the corresponding
semantics, and some challenging arguments that need to be answered if one does.4
I think the principle can be defended (in the context of the idealizations we are
making), but I will not address this issue here, provisionally following Hintikka in
accepting the KK principle, and a semantics that validates it.
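Granting the consistency principle as just stated, the step to the KK principle (the substitution recorded in footnote 3 below) can be reconstructed as follows:

\[
\begin{aligned}
&\text{Principle: if } \{K\varphi,\ \neg K\neg\psi\} \text{ is consistent, then } \{K\varphi,\ \psi\} \text{ is consistent.}\\
&\text{Let } \psi := \neg K\varphi.\ \text{Since } \{K\varphi,\ \neg K\varphi\} \text{ is inconsistent, } \{K\varphi,\ \neg K\neg\neg K\varphi\} = \{K\varphi,\ \neg KK\varphi\} \text{ is inconsistent,}\\
&\text{that is, } K\varphi \vdash KK\varphi.
\end{aligned}
\]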
The S4 principles (Knowledge implies truth, and knowing implies knowing that
one knows) were as far as Hintikka was willing to go. He unequivocally rejects the
characteristic S5 principle that if one lacks knowledge, then one knows that one
lacks it (“unless you happen to be as sagacious as Socrates”5), and here his reasons
seem to be clear and decisive6:
The consequences of this principle, however, are obviously wrong. By its means (together
with certain intuitively acceptable principles) we could, for example, show that the
following sentence is self-sustaining:
(13) p ⊃ Ka Pa p. [In Hintikka’s notation, ‘Pa’ is the dual of the knowledge operator, ‘Ka’:
‘¬Ka¬’. I will use ‘M’ for ‘¬K¬’.]
The reason that (13) is clearly unacceptable, as Hintikka goes on to say, is that
it implies that one could come to know by reflection alone, of any truth, that it
was compatible with one’s knowledge. But it seems that a consistent knower might
believe, and be justified in believing, that she knew something that was in fact false.
That is, it might be, for some proposition φ, that ¬φ and BKφ. In such a case, if
the subject’s beliefs are consistent, then she does not believe, and so does not know,
that ¬φ is compatible with her knowledge. That is, ¬K¬Kφ, along with ¬φ, will
be true, falsifying (13).
3
Substituting ‘¬Kφ’ for ψ, and eliminating a double negation, the principle says that if {Kφ,
¬KKφ} is consistent, then {Kφ, ¬Kφ} is consistent.
4
See especially, Williamson (2000) for some reasons to reject the KK principle.
5
Hintikka (1962, 106).
6
Ibid, 54.
Partition Models
7
More precisely, if Ri is the accessibility relation for knower i, then the common-knowledge
accessibility relation for a group G is defined as follows: xRG y iff there is a sequence of worlds,
z1, ..., zn such that z1 = x and zn = y and for all j between 1 and n − 1, there is a knower i ∈ G,
such that zj Ri zj+1.
accessibility relations. (Though if the logic of knowledge is S4 or S5, then the logic
of common knowledge will also be S4 or S5, respectively).
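A minimal sketch of the definition in footnote 7 above: the common-knowledge accessibility relation for a group is reachability through the members’ individual accessibility relations. The encoding of relations as sets of pairs and the small example are illustrative assumptions.

def common_knowledge_accessible(x, relations):
    # Worlds reachable from x by chains that may switch, at each step,
    # between the accessibility relations of the knowers in the group.
    reached, frontier = {x}, [x]
    while frontier:
        w = frontier.pop()
        for R in relations:                  # one relation per knower
            for (u, v) in R:
                if u == w and v not in reached:
                    reached.add(v)
                    frontier.append(v)
    return reached

# Two knowers over worlds 1, 2, 3: A cannot tell 1 from 2, B cannot tell 2 from 3.
R_A = {(1, 1), (2, 2), (3, 3), (1, 2), (2, 1)}
R_B = {(1, 1), (2, 2), (3, 3), (2, 3), (3, 2)}
print(common_knowledge_accessible(1, [R_A, R_B]))   # {1, 2, 3}

It is common knowledge that φ at a world x just in case φ holds at every world in the returned set; with individual S5 relations, the induced common-knowledge relation is again an equivalence relation.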
Theoretical computer scientists have used the logic and semantics for knowledge
to give abstract descriptions of distributed computer systems (such as office
networks or email systems) that represent the distribution and flow of information
among the components of the system. For the purpose of understanding how such
systems work and how to design protocols that permit them to accomplish the
purposes for which they are designed, it is useful to think of them as communities
of interacting rational agents who use what information they have about the system
as a whole to serve their own interests, or to play their part in a joint project. And it
is useful in turn for those interested in understanding the epistemic states of rational
agents to think of them in terms of the kind of simplified models that theoretical
computer scientists have constructed.
A distributed system consists of a set of interconnected components, each
capable of being in a range of local states. The way the components are connected,
and the rules by which the whole system works, constrain the configurations of
states of the individual components that are possible. One might specify such
a system by positing a set of n components and possible local states for each.
One might also include a component labeled “nature” whose local states represent
information from outside the system proper. Global states will be n-tuples of
local states, one for each component, and the model will also specify the set
of global states that are admissible. Admissible global states are those that are
compatible with the rules governing the way the components of the system interact.
The admissible global states are the possible worlds of the model. This kind of
specification will determine, for each local state that any component might be in,
a set of global states (possible worlds) that are compatible with the component
being in that local state. This set will be the set of epistemically possible worlds that
determines what the component in that state knows about the system as a whole.8
Specifically, if ‘a’ and ‘b’ denote admissible global states, and ‘ai’ and ‘bi’ denote
the ith elements of a and b, respectively (the local states of component i), then
global world-state b is epistemically accessible (for i) to global world-state a if and
only if ai = bi. So, applying the standard semantic rule for the knowledge operator,
component (or knower) i will know that φ, in possible world a, if and only if φ is true
8
A more complex kind of model would specify a set of admissible initial global states, and a set
of transition rules taking global states to global states. The possible worlds in this kind of model
are the admissible global histories—the possible ways that the system might evolve. In this kind
of model, one can represent the distribution of information, not only about the current state of the
system, but also about how it evolved, and where it is going. In the more general model, knowledge
states are time-dependent, and the components may have or lack information not only about which
possible world is actual, but also about where (temporally) it is in a given world. The dynamic
dimension, and the parallels with issues about indexical knowledge and belief, are part of the
interest of the distributed systems models, but I will ignore these issues here.
in all possible worlds in which i has the same local state that it has in world-state a.
One knows that φ if one’s local state carries the information that φ.9
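A toy rendering of this construction (the components, local states, and admissibility rule below are illustrative choices, not drawn from the text): worlds are admissible global states, and a world is accessible for component i just in case it agrees with the given world on i’s local state.

from itertools import product

# Two components plus "nature"; the admissibility rule is an arbitrary stand-in
# for the rules governing how the components of the system interact.
local_states = [("up", "down"), ("up", "down"), ("rain", "sun")]
admissible = [g for g in product(*local_states)
              if not (g[0] == "up" and g[1] == "down")]

def accessible(i, a, b):
    # b is epistemically possible for component i at a iff a_i = b_i.
    return a[i] == b[i]

def knows(i, a, phi):
    # i knows phi at a iff phi holds at every admissible global state
    # sharing i's local state with a.
    return all(phi(b) for b in admissible if accessible(i, a, b))

a = ("up", "up", "rain")
print(knows(0, a, lambda g: g[1] == "up"))    # True: 0's local state rules out ("up", "down", ...)
print(knows(0, a, lambda g: g[2] == "rain"))  # False: 0's state carries no information about "nature"

Because accessible is reflexive, symmetric, and transitive, the induced logic is S5, as the next paragraph observes.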
Now it is obvious that this epistemic accessibility relation is an equivalence
relation, and so the logic for knowledge in a model of this kind is S5. Each of
the epistemic accessibility relations partitions the space of possible worlds, and
the cross-cutting partitions give rise to a simple and elegant model of common
knowledge, also with an S5 logic. Game theorists independently developed this
kind of partition model of knowledge and have used such models to bring out the
consequences of assumptions about common knowledge. For example, it can be
shown that, in certain games, players will always make certain strategy choices
when they have common knowledge that all players are rational. But as we have
seen, Hintikka gave reasons for rejecting the S5 logic for knowledge, and the
reasons seemed to be decisive. It seems clear that a consistent and epistemically
responsible agent might take herself to know that φ in a situation in which φ was
in fact false. Because knowledge implies truth, it would be false, in such a case,
that the agent knew that φ, but the agent could not know that she did not know that
φ without having inconsistent beliefs. If such a case is possible, then there will be
counterexamples to the S5 principle, ¬Kφ → K¬Kφ. That is, the S5 principles
require that rational agents be immune to error. It is hard to see how any theory that
abstracts away from the possibility of error could be relevant to epistemology, an
enterprise that begins with skeptical arguments using scenarios in which agents are
systematically mistaken and that seeks to explain the relation between knowledge
and belief, presupposing that these notions do not coincide.
Different theorists have different purposes, and it is not immediately obvious that
the models of knowledge that are appropriate to the concerns of theoretical computer
scientists and game theorists need be relevant to issues in epistemology. But I think
that the possibility of error, and the differences between knowledge and belief are
relevant to the intended domains of application of those models, and that some of
the puzzles and problems that characterize epistemology are reflected in problems
that may arise in applying those theories.
As we all know too well, computer systems sometimes break down or fail
to behave as they were designed to behave. In such cases, the components of a
distributed system will be subject to something analogous to error and illusion. Just
as the epistemologist wants to explain how and when an agent knows some things
even when he is in error about others, and is interested in methods of detecting
and avoiding error, so the theoretical computer scientist is interested in the way
that the components of a system can avoid and detect faults, and can continue to
9
Possible worlds, on this way of formulating the theory, are not primitive points, as they are in
the usual abstract semantics, but complex objects—sequences of local states. But an equivalent
formulation might begin with a given set of primitive (global) states, together with a set of
equivalence relations, one for each knower, and one for “nature”. The local states could then be
defined as the equivalence classes.
function appropriately even when conditions are not completely normal. To clarify
such problems, it is useful to distinguish knowledge from something like belief.
The game theorist, or any theorist concerned with rational action, has a special
reason to take account of the possibility of false belief, even under the idealizing
assumption that in the actual course of events, everyone’s beliefs are correct. The
reason is that decision theorists and game theorists need to be concerned with causal
or counterfactual possibilities, and to distinguish them from epistemic possibilities.
When I deliberate, or when I reason about why it is rational to do what I know that
I am going to do, I need to consider possible situations in which I make alternative
choices. I know, for example, that it would be irrational to cooperate in a one-shot
prisoners’ dilemma because I know that in the counterfactual situation in which I
cooperate, my payoff is less than it would be if I defected. And while I have the
capacity to influence my payoff (negatively) by making this alternative choice, I
could not, by making this choice, influence your prior beliefs about what I will do;
that is, your prior beliefs will be the same, in the counterfactual situation in which
I make the alternative choice, as they are in the actual situation. Since you take
yourself (correctly, in the actual situation) to know that I am rational, and so that I
will not cooperate, you therefore also take yourself to know, in the counterfactual
situation I am considering, that I am rational, and so will not cooperate. But in that
counterfactual situation, you are wrong—you have a false belief that you take to
be knowledge. There has been a certain amount of confusion in the literature about
the relation between counterfactual and epistemic possibilities, and this confusion
is fed, in part, by a failure to make room in the theory for false belief.10
Even in a context in which one abstracts away from error, it is important
to be clear about the nature of the idealization, and there are different ways of
understanding it that are sometimes confused. But before considering the alternative
ways of making the S5 idealization, let me develop the contrast between knowledge
and belief, and the relation between them, in a more general setting.
Belief and Knowledge
Set aside the S5 partition models for the moment, and consider, from a more neutral
perspective, the logical properties of belief, and the relation between belief and
knowledge. It seems reasonable to assume, at least in the kind of idealized context
we are in, that agents have introspective access to their beliefs: if they believe that
φ, then they know that they do, and if they do not, then they know that they do not.
(The S5, “negative introspection” principle, ¬Kφ → K¬Kφ, was problematic
for knowledge because it is in tension with the fact that knowledge implies truth,
but the corresponding principle for belief does not face this problem.) It also seems
reasonable to assume that knowledge implies belief. Given the fact that our idealized
10
These issues are discussed in Stalnaker (1996).
believers are logically omniscient, we can assume, in addition, that their beliefs
will be consistent. Finally, to capture the fact that our intended concept of belief
is a strong one—subjective certainty—we assume that believing implies believing
that one knows. So our logic of knowledge and belief should include the following
principles in addition to those of the logic S4:
(PI) Bφ → KBφ
(NI) ¬Bφ → K¬Bφ
(KB) Kφ → Bφ
(CB) Bφ → ¬B¬φ
(SB) Bφ → BKφ
The resulting combined logic for knowledge and belief yields a pure belief logic,
KD45, which is validated by a doxastic accessibility relation that is serial, transitive
and euclidean.11 More interestingly, one can prove the following equivalence
theorem: ⊢ Bφ ↔ MKφ (using 'M' as the epistemic possibility operator, '¬K¬').
This equivalence permits a more economical formulation of the combined belief-
knowledge logic in which the belief operator is defined in terms of the knowledge
operator. If we substitute 'MK' for 'B' in our principle (CB), we get MKφ → KMφ,
which, if added to S4, yields the logic of knowledge, S4.2. All of the other principles
listed above (with ‘MK’ substituted for ‘B’) are theorems of S4.2, so this logic
of knowledge by itself yields a combined logic of knowledge and belief with the
appropriate properties.12
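To see how the substitution works, note that the belief-consistency principle (CB), Bφ → ¬B¬φ, becomes, with 'MK' put for 'B', MKφ → ¬MK¬φ; and since 'M' abbreviates '¬K¬', the consequent ¬MK¬φ is ¬¬K¬K¬φ, that is, K¬K¬φ, that is, KMφ. So the substituted principle is exactly the 4.2 axiom MKφ → KMφ.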
The assumptions that are sufficient to show the equivalence of belief with the
epistemic possibility of knowledge (one believes that φ, in the strong sense, if and
only if it is compatible with one's knowledge that one knows that φ) might also
be made for a concept of justified belief, although the corresponding assumptions
will be more controversial. Suppose (1) one assumes that justified belief is a
necessary condition for knowledge, and (2) one adopts an internalist conception
of justification that supports the positive and negative introspection conditions (if
one has justified belief that φ, one knows that one does, and if one does not, one
knows that one does not), and (3) one assumes that since the relevant concept of
belief is a strong one, one is justified in believing that φ if and only if one is justified
in believing that one knows that φ. Given these assumptions, justified belief will also
11
KD45 adds to the basic modal system K the axioms (D), which is our (CB), (4) Bφ → BBφ,
which follows immediately from our (PI) and (KB), and (5) ¬Bφ → B¬Bφ, which follows
immediately from (NI) and (KB). The necessitation rule for B (if ⊢ φ, then ⊢ Bφ) and the
distribution principle (B(φ → ψ) → (Bφ → Bψ)) can both be derived from our principles.
12
The definability of belief in terms of knowledge, and the point that the assumptions about
the relation between knowledge and belief imply that the logic of knowledge should be S4.2,
rather than S4, were first shown by Wolfgang Lenzen. See his classic monograph, Recent Work in
Epistemic Logic. Acta Philosophica Fennica 30 (1978). North-Holland, Amsterdam.
coincide with the epistemic possibility that one knows, and so belief and justified
belief will coincide. The upshot is that for an internalist, a divergence between belief
(in the strong sense) and justified belief would be a kind of internal inconsistency.
If one is not fully justified in believing φ, one knows this, and so one knows that a
necessary condition for knowledge that φ is lacking. But if one believes that φ, in
the strong sense, then one believes that one knows it. So one both knows that one
lacks knowledge that φ, and believes that one has knowledge that φ.
The usual constraint on the accessibility relation that validates S4.2 is the fol-
lowing convergence principle (added to the transitivity and reflexivity conditions):
if xRy and xRz, then there is a w such that yRw and zRw. But S4.2 is also sound and
complete relative to the following stronger convergence principle: for all x, there is
a y such that for all z, if xRz, then zRy. The weak convergence principle (added
to reflexivity and transitivity) implies that for any finite set of worlds accessible
to x, there is a single world accessible with respect to all of them. The strong
convergence principle implies that there is a world that is accessible to all worlds
that are accessible to x. The semantics for our logic of knowledge requires the
stronger convergence principle.13
Just as, within the logic, one can define belief in terms of knowledge, so within
the semantics, one can define a doxastic accessibility relation for the derived belief
operator in terms of the epistemic accessibility relation. If ‘R’ denotes the epistemic
accessibility relation and 'D' denotes the doxastic relation, then the definition is
as follows: xDy =df (z)(xRz → zRy). Assuming that R is transitive, reflexive and
strongly convergent, it can be shown that D will be serial, transitive and euclidean—
the constraints on the accessibility relation that characterize the logic KD45.
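As a small illustration (a toy model with invented worlds, not taken from the text), the derived doxastic relation and its KD45 frame properties can be checked mechanically; the Python sketch below computes D from R by the definition just given and tests seriality, transitivity, and the euclidean condition.

    # Toy model: three worlds; R is reflexive, transitive and strongly
    # convergent (w3 is accessible from every world that is accessible from anything).
    worlds = {'w1', 'w2', 'w3'}
    R = {('w1', 'w1'), ('w2', 'w2'), ('w3', 'w3'),
         ('w1', 'w2'), ('w1', 'w3'), ('w2', 'w3')}

    def derived_D(worlds, R):
        # xDy =df (z)(xRz -> zRy)
        return {(x, y) for x in worlds for y in worlds
                if all((x, z) not in R or (z, y) in R for z in worlds)}

    def serial(rel, worlds):
        return all(any((x, y) in rel for y in worlds) for x in worlds)

    def transitive(rel):
        return all((x, z) in rel for (x, y) in rel for (w, z) in rel if y == w)

    def euclidean(rel):
        return all((y, z) in rel for (x, y) in rel for (w, z) in rel if x == w)

    D = derived_D(worlds, R)
    # In this toy frame, D relates every world to w3, the one world compatible
    # with what is believed everywhere.
    print(serial(D, worlds), transitive(D), euclidean(D))   # True True True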
One can also define, in terms of D, and so in terms of R, a third binary relation
on possible worlds that is relevant to describing the epistemic situation of our ideal
knower: Say that two possible worlds x and y are epistemically indistinguishable
to an agent (xEy) if and only if she has exactly the same beliefs in world x as she
has in world y. That is, xEy =df (z)(xDz ↔ yDz). E is obviously an equivalence
relation, and so any modal operator interpreted in the usual way in terms of E would
be an S5 operator. But while this relation is definable in the semantics in terms of
the epistemic accessibility relation, we cannot define, in the object language with
just the knowledge operator, a modal operator whose semantics is given by this
accessibility relation.
So the picture that our semantic theory paints is something like this: For any
given knower i and possible world x, there is, first, a set of possible worlds that
are subjectively indistinguishable from x, to i (those worlds that are E-related to x);
second, there is a subset of that set that includes just the possible worlds compatible
with what i knows in x (those worlds that are R-related to x); third, there is a subset of
13
The difference between strong and weak convergence does not affect the propositional modal
logic, but it will make a difference to the quantified modal logic. The following is an example
of a sentence that is valid in models satisfying strong convergence (along with transitivity and
reflexivity) but not valid in all models satisfying weak convergence: MK((x)(MKφ → φ)).
that set that includes just the possible worlds that are compatible with what i believes
in x (those worlds that are D-related to x). The world x itself will necessarily be a
member of the outer set and of the R-subset, but will not necessarily be a member of
the inner D-subset. But if x is itself a member of the inner D-set (if world x is itself
compatible with what i believes in x), then the D-set will coincide with the R-set.
Here is one way of seeing this more general theory as a generalization of the
distributive systems models, in which possible world-states are sequences of local
states: one might allow all sequences of local states (one for each agent) to count
as possible world-states, but specify, for each agent, a subset of them that are
normal—the set in which the way that agent interacts with the system as a whole
conforms to the constraints that the system conforms to when it is functioning as it
is supposed to function. In such models, two worlds, x and y, will be subjectively
indistinguishable, for agent i (xEi y), whenever xi = yi (so the relation that was
the epistemic accessibility relation in the unreconstructed S5 distributed systems
model is the subjective indistinguishability relation in the more general models).
Two worlds are related by the doxastic accessibility relation (xDi y) if and only if
xi = yi, and in addition, y is a normal world with respect to agent i.14 This will
impose the right structure on the D and E relations, and while it imposes some
constraints on the epistemic accessibility relation, it leaves it underdetermined. We
might ask whether R can be defined in a plausible way in terms of the components
of the model we have specified, or whether one might add some independently
motivated components to the definition of a model that would permit an appropriate
definition of R. This question is a kind of analogue of the question asked in the
more traditional epistemological enterprise—the project of giving a definition of
knowledge in terms of belief, truth, justification, and whatever other normative and
causal concepts might be thought to be relevant. Transposed into the model theoretic
framework, the traditional problem of adding to true belief further conditions that
together are necessary and sufficient for knowledge is the problem of extending
the doxastic accessibility relation to a reflexive relation that is the right relation (at
least in the idealized context) for the interpretation of a knowledge operator. In the
remainder of this paper, I will consider several ways that this might be done, and at
the logics of knowledge that they validate.
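A sketch of this kind of model, with invented local states and an invented normal set, might look as follows in Python; it is only an illustration of the definitions of Ei and Di just given, not a general implementation.

    import itertools

    # Worlds are pairs of local states, one for agent a and one for agent b.
    a_states, b_states = ['a0', 'a1'], ['b0', 'b1']
    worlds = list(itertools.product(a_states, b_states))

    # Assumption of the toy example: agent a's channel is normal exactly in the
    # worlds where b's local state is b0; this set meets every E-cell for a.
    normal_a = {w for w in worlds if w[1] == 'b0'}

    def E_cell(x):            # worlds subjectively indistinguishable from x, for agent a
        return {y for y in worlds if y[0] == x[0]}

    def D_cell(x):            # worlds compatible with what a believes in x
        return {y for y in worlds if y[0] == x[0] and y in normal_a}

    for x in worlds:
        assert D_cell(x) and D_cell(x) <= E_cell(x)   # seriality, and the D-set sits inside the E-cell

    # In a world where a's channel is abnormal, a has a false belief:
    print(('a0', 'b1') in D_cell(('a0', 'b1')))        # False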
14
We observed in note 9 that an equivalent formulation of the S5 distributed systems models would
take the global world-states as primitive, specifying an equivalence relation for each agent, and
defining local states as equivalence classes of global states. In an equivalent formulation of this
kind of the more general theory, the assumption that every sequence of local states is a possible
world will be expressed by a recombination condition: that for every sequence of equivalence
classes (one for each agent) there is a possible world that is a member of their intersection. I have
suggested that a recombination condition of this kind should be imposed on game theoretic models
(where the equivalence classes are types, represented by probability functions), defending it as a
representation of the conceptual independence of the belief states of different agents.
One extreme way of defining the epistemic accessibility relation in terms of the
resources of our models is to identify it with the relation of subjective indistin-
guishability, and this is one way that the S5 partition models have implicitly been
interpreted. If one simply assumes that the epistemic accessibility relation is an
equivalence relation, this will suffice for a collapse of our three relations into
one. Subjective indistinguishability, knowledge, and belief will all coincide. This
move imposes a substantive condition on knowledge, and so on belief, when it
is understood in the strong sense as belief that one knows, a condition that is
appropriate for the skeptic who thinks that we are in a position to have genuine
knowledge only about our own internal states—states about which we cannot
coherently be mistaken. On this conception of knowledge, one can have a false belief
(in the strong sense) only if one is internally inconsistent, and so this conception
implies a bullet-biting response to the kind of argument that Hintikka gave against
the S5 logic for knowledge. Hintikka’s argument was roughly this: S5 validates
the principle that any proposition that is in fact false, is known by any agent to
be compatible with his knowledge, and this is obviously wrong. The response
suggested by the conception of knowledge that identifies knowledge with subjective
indistinguishability is that if we assume that all we can know is how things seem to
us, and also assume that we are infallible judges of the way things seem to us, then
it will be reasonable to conclude that we are in a position to know, of anything that
is in fact false, that we do not know it.
There is a less radical way to reconcile our basic theory of knowledge and belief
with the S5 logic and the partition models. Rather than making more restrictive
assumptions about the concept of knowledge, or about the basic structure of the
model, one may simply restrict the intended domain of application of the theory to
cases in which the agent in question has, in fact, only true beliefs. On this way of
understanding the S5 models, the model theory does not further restrict the relations
between the three accessibility relations, but instead assumes that the actual world
of the model is a member of the inner D-set.15 This move does not provide us with
15
In most formulations of a possible-worlds semantics for propositional modal logic, a frame
consists simply of a set of worlds and an accessibility relation. A model on a frame determines
the truth values of sentences, relative to each possible world. On this conception of a model, one
cannot talk of the truth of a sentence in a model, but only of truth at a world in a model. Sentence
validity is defined, in formulations of this kind, as truth in all worlds in all models. But in some
formulations, including in Kripke’s original formal work, a frame (or model structure, as Kripke
called it at the time) included, in addition to a set of possible worlds and an accessibility relation, a
designated possible world—the actual world of the model. A sentence is true in a model if it is true
in the designated actual world, and valid if true in all models. This difference in formulation was
a minor detail in semantic theories for most of the normal modal logics, since any possible world
of a model might be the designated actual world without changing anything else. So the two ways
of defining sentence validity will coincide. But the finer-grained definition of a frame allows for
theories in which the constraints on R, and the semantic rules for operators, make reference to the
actual world of the model. In such theories, truth in all worlds in all models may diverge from truth
in all models, allowing for semantic models of logics that fail to validate the rule of necessitation.
a way to define the epistemic accessibility relation in terms of the other resources of
the model; but what it does is to stipulate that the actual world of the model is one for
which the epistemic accessibility relation is determined by the other components.
(That is, the set of worlds y that are epistemically accessible to the actual world
is determined.) Since the assumptions of the general theory imply that all worlds
outside the D-sets are epistemically inaccessible to worlds within the D-sets, and
that all worlds within a given D-set are epistemically accessible to each other, the
assumption that the actual world of the model is in a D-set will determine the R-set
for the actual world, and will validate the logic S5.
So long as the object language that is being interpreted contains just one
modal operator, an operator representing the knowledge of a single agent, the
underdetermination of epistemic accessibility will not be reflected in the truth values
in a model of any expressible proposition. Since all possible worlds outside of
any D-set will be invisible to worlds within it, one could drop them from the
model (taking the set of all possible worlds to be those R-related to the actual
world) without affecting the truth values (at the actual world) of any sentence.
This generated submodel will be a simple S5 model, with a universal accessibility
relation. But as soon as one enriches the language with other modal and epistemic
operators, the situation changes. In the theory with two or more agents, even if
one assumes that all agents have only true beliefs, the full S5 logic will not be
preserved. The idealizing assumption will imply that Alice’s beliefs coincide with
her knowledge (in the actual world), and that Bob’s do as well, but it will not
follow that Bob knows (in the actual world) that Alice’s beliefs coincide with her
knowledge. To validate the full S5 logic, in the multiple agent theory, we need
to assume that it is not just true, but common knowledge that everyone has only
true beliefs. This stronger idealization is needed to reconcile the partition models,
used in both game theory and in distributed systems theory, with the general theory
that allows for a distinction between knowledge and belief. But even in a context
in which one makes the strong assumption that it is common knowledge that no
one is in error about anything, the possible divergence of knowledge and belief,
and the failure of the S5 principles to be necessarily true will show itself when
the language of knowledge and common knowledge is enriched with non-epistemic
modal operators, or in semantic models that represent the interaction of epistemic
and non-epistemic concepts. In game theory, for example, an adequate model of
the playing of a game must represent not just the epistemic possibilities for each
of the players, but also the capacities of players to make each of the choices that
are open to that player, even when it is known that the player will not make some
of those choices. One might assume that it is common knowledge that Alice will
act rationally in a certain game, and it might be that it is known that Alice would
be acting irrationally if she chose option X. Nevertheless, it would distort the
representation of the game to deny that Alice has the option of choosing action
X, and the counterfactual possibility in which she exercises that option may play
a role in the deliberations of both Alice and the other players, whose knowledge
that Alice will not choose option X is based on their knowledge of what she knows
would happen if she did. So even if one makes the idealizing assumption that all
agents have only true beliefs, or that it is common belief that everyone’s beliefs
are true, one should recognize the more general structure that distinguishes belief
from knowledge, and that distinguishes both of these concepts from subjective
indistinguishability. In the more general structure that recognizes these distinctions,
the epistemic accessibility relation is underdetermined by the other relations.
So our task is to say more about how to extend the relation D of doxastic accessi-
bility to a relation R of epistemic accessibility. We know, from the assumption that
knowledge implies belief, that in any model meeting our basic conditions on the
relation between knowledge and belief, R will be an extension of D (for all x and y,
if xDy, then xRy), and we know from the assumption that knowledge implies truth
that the extension will be to a reflexive relation. We know by the assumption that
belief is strong belief (belief that one knows) that R coincides with D, within the
D-set (for all x and y, if xDx, then xRy if and only if xDy). What remains to be said
is what determines, for a possible world x that is outside of a D-set, which other
possible worlds outside that D-set are epistemically accessible to x. If some of my
beliefs about what I know are false, what can be said about other propositions that I
think that I know?
The assumptions of the neutral theory put clear upper and lower bounds on the
answer to this question, and two ways to specify R in terms of the other resources of
the model are to make the minimal or maximal extensions. The minimal extension
of D would be the reflexive closure of D. On this account, the set of epistemically
possible worlds for a knower in world x will be the set of doxastically accessible
worlds, plus x. To make this minimal extension is to adopt the true belief analysis
of knowledge, or in case one is making the internalist assumptions about justified
belief, it would be to adopt the justified true belief analysis. The logic of true belief,
S4.4, is stronger than S4.2, but weaker than S5.16 The true belief analysis has its
defenders, but most will want to impose stronger conditions on knowledge, which
in our setting means that we need to go beyond the minimal extension of D.
It follows from the positive and negative introspection conditions for belief that
for any possible world x, all worlds epistemically accessible to x will be subjectively
indistinguishable from x (for all x and y, if xRy, then xEy) and this sets the upper
bound on the extension of D to R. To identify R with the maximal admissible
16
See the appendix for a summary of all the logics of knowledge discussed, their semantics, and
the relationships between them.
extension is to define it as follows: xRy =df either (xDx and xDy) or (not xDx
and xEy). This account of knowledge allows one to know things that go beyond
one’s internal states only when all of one’s beliefs are correct. The logic of this
concept of knowledge, S4F, is stronger than S4.2, but weaker than the logic of
the minimal extension, S4.4. The maximal extension would not provide a plausible
account of knowledge in general, but it might be the appropriate idealization for a
certain limited context. Suppose one’s information all comes from a single source
(an oracle), who you presume, justifiably, to be reliable. Assuming that all of its
pronouncements are true, they give you knowledge, but in possible worlds in which
any one of its pronouncements is false, it is an unreliable oracle, and so nothing it
says should be trusted. This logic, S4F, has been used as the underlying logic of
knowledge in some theoretical accounts of a nonmonotonic logic. Those accounts
don’t provide an intuitive motivation for using this logic, but I think a dynamic
model, with changes in knowledge induced by a single oracle who is presumed to be
reliable, can provide a framework that makes intuitive sense of these nonmonotonic
theories.17
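The two extreme extensions can be written down directly; the following Python sketch (with an invented three-world example) computes both from a doxastic relation D and the indistinguishability relation E.

    def minimal_R(worlds, D):
        # reflexive closure of D: the true-belief (S4.4) analysis
        return D | {(x, x) for x in worlds}

    def maximal_R(worlds, D, E):
        # xRy iff (xDx and xDy) or (not xDx and xEy): the S4F analysis
        return {(x, y) for x in worlds for y in worlds
                if ((x, x) in D and (x, y) in D)
                or ((x, x) not in D and (x, y) in E)}

    worlds = {'w1', 'w2', 'w3'}
    E = {(x, y) for x in worlds for y in worlds}     # a single cell of indistinguishability
    D = {(x, 'w1') for x in worlds}                  # in every world, only w1 is compatible with belief

    print(sorted(minimal_R(worlds, D)))    # adds only each world itself to D
    print(sorted(maximal_R(worlds, D, E))) # in w2 and w3, every indistinguishable world becomes possible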
Any attempt to give an account of the accessibility relation for knowledge that
falls between the minimal and maximal admissible extensions of the accessibility
relation for belief will have to enrich the resources of the theory. One way to do
this, a way that fits with one of the familiar strategies for responding to the Gettier
counterexamples to the justified true belief analysis, is to add to the semantics for
belief a theory of belief revision, and then to define knowledge as belief (or justified
belief) that is stable under any potential revision by a piece of information that
is in fact true. This is the defeasibility strategy followed by many of those who
responded to Gettier’s challenge: the idea was that the fourth condition (to be added
to justified true belief) should be a requirement that there be no “defeater”—no true
proposition that, if the knower learned that it was true, would lead her to give up
the belief, or to be no longer justified in holding it.18 There was much discussion in
the post-Gettier literature, about exactly how defeasibility should be defined, but in
the context of our idealized semantic models, supplemented by a semantic version
of the standard belief revision theory, a formulation of a defeasibility analysis of
knowledge is straightforward. First, let me sketch the outlines of the so-called AGM
theory of belief revision,19 and then give the defeasibility analysis.
17
See Schwarz and Truszczyński (1992).
18
See Lehrer and Paxson (1969) and Swain (1974) for two examples.
19
See Gärdenfors (1988) for a survey of the basic ideas of the AGM belief revision theory, and
Grove (1988) for a semantic formulation of the theory.
The belief revision project is to define, for each belief state (the prior belief state),
a function taking a proposition (the potential new evidence) to a posterior belief state
(the state that would be induced in one in the prior state by receiving that information
as one’s total new evidence). If belief states are represented by sets of possible
worlds (the doxastically accessible worlds), and if propositions are also represented
by sets of possible worlds, then the function will map one set of worlds (the prior
belief set) to another (the posterior belief set), as a function of a proposition. Let
B be the set representing the prior belief state, φ the potential new information,
and B(φ) the set representing the posterior state. Let E be a superset of B that
represents the set of all possible worlds that are potential candidates to be compatible
with some posterior belief state. The formal constraints on this function are then as
follows: (1) B(φ) ⊆ φ (the new information is believed in the posterior belief state
induced by that information). (2) If φ ∩ B is nonempty, then B(φ) = φ ∩ B (if the
new information is compatible with the prior beliefs, then nothing is given up—the
new information is simply added to the prior beliefs). (3) B(φ) is nonempty if and
only if φ ∩ E is nonempty (the new information induces a consistent belief state
whenever that information is compatible with the knower being in the prior belief
state, and only then). (4) If B(φ) ∩ ψ is nonempty, then B(φ ∩ ψ) = B(φ) ∩ ψ.
The fourth condition is the only one that is not straightforward. What it says is that
if ψ is compatible, not with Alice's prior beliefs, but with the posterior beliefs that
she would have if she learned φ, then what Alice should believe upon learning the
conjunction of φ and ψ should be the same as what she would believe if she first
learned φ, and then learned ψ. This condition can be seen as a generalization of
condition (2), which is a modest principle of methodological conservativism (Don’t
give up any beliefs if your new information is compatible with everything you
believe). It is also a kind of path independence principle. The order in which Alice
receives two compatible pieces of information should not matter to the ultimate
belief state.20
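One standard way to obtain a function satisfying these constraints, in the spirit of Grove's (1988) semantic formulation, is to rank the worlds in E by plausibility and let B(φ) be the most plausible φ-worlds. The Python sketch below uses an invented ranking purely for illustration.

    E = {'w1', 'w2', 'w3', 'w4'}
    rank = {'w1': 0, 'w2': 0, 'w3': 1, 'w4': 2}    # lower rank = more plausible (toy data)
    B = {w for w in E if rank[w] == 0}             # prior belief set

    def revise(phi):
        """B(phi): the most plausible worlds compatible with phi."""
        candidates = phi & E
        if not candidates:
            return set()                           # condition (3): no consistent posterior
        best = min(rank[w] for w in candidates)
        return {w for w in candidates if rank[w] == best}

    print(revise({'w2', 'w3'}))   # {'w2'}: compatible with B, so condition (2) applies
    print(revise({'w3', 'w4'}))   # {'w3'}: prior beliefs are given up, fall back to the next rank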
To incorporate the standard belief revision theory into our models, add, for each
possible world x, and for each agent i, a function that, for each proposition φ,
takes i's belief state in x, Bx,i = {y: xDi y}, to a potential posterior belief state,
Bx,i(φ). Assume that each of these functions meets the stated conditions, where
the set E, for the function Bx,i is the set of possible worlds that are subjectively
indistinguishable from x to agent i. We will also assume that if x and y are
subjectively indistinguishable to i, then i’s belief revision function will be the same
in x as it is in y. This is to extend the positive and negative introspection assumptions
20
The third principle is the least secure of the principles; there are counterexamples that suggest
that it should be given up. See Stalnaker (1994) for a discussion of one. The defeasibility analysis
of knowledge can be given with either the full AGM belief revision theory, or with the more neutral
one that gives up the fourth condition.
to the agent’s belief revision policies. Just as she knows what she believes, so she
knows how she would revise her beliefs in response to unexpected information.21
We have added some structure to the models, but not yet used it to interpret
anything in the object language that our models are interpreting. Suppose our
language has just belief operators (and not knowledge operators) for our agents,
and only a doxastic accessibility relation, together with the belief revision structure,
in the semantics. The defeasibility analysis suggests that we might add, for knower
i, a knowledge operator with the following semantic rule: Ki φ is true in world x iff
Bi φ is true in x, and for any proposition ψ that is true in x, Bx,i(ψ) ⊆ φ. Alice
knows that φ if and only if, for any ψ that is true, she would still believe that φ after
learning that ψ. Equivalently, we might define an epistemic accessibility relation in
terms of the belief revision structure, and use it to interpret the knowledge operator
in the standard way. Let us say that xRi y if and only if there exists a proposition
φ such that {x, y} ⊆ φ and y ∈ Bx,i(φ). The constraints imposed on the function
Bx,i imply that this relation will extend the doxastic accessibility relation Di , and
that it will fall between our minimal and maximal constraints on this extension.
The relation will be transitive, reflexive, and strongly convergent, and so meet all
the conditions of our basic theory. It will also meet an additional condition: it will
be weakly connected (if xRy and xRz, then either yRz or zRy). This defeasibility
semantics will validate a logic of knowledge, S4.3, that is stronger than S4.2, but
weaker than either S4F or S4.4.22
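Reusing the rank-style revision function sketched above, the defeasibility clause and the derived relation R can be spelled out as follows (again a toy Python sketch with invented worlds and ranking; the propositions are simply all subsets of the finite set of worlds).

    from itertools import combinations

    worlds = frozenset({'w1', 'w2', 'w3'})
    rank = {'w1': 0, 'w2': 1, 'w3': 2}                   # invented plausibility ranking

    def propositions():
        ws = list(worlds)
        return [frozenset(c) for r in range(len(ws) + 1) for c in combinations(ws, r)]

    def believes_after(x, psi):                          # B_x(psi): the most plausible psi-worlds
        if not psi:
            return frozenset()
        best = min(rank[w] for w in psi)
        return frozenset(w for w in psi if rank[w] == best)

    def knows(x, phi):
        # knowledge at x: learning any proposition true at x leaves phi believed
        return all(believes_after(x, psi) <= phi for psi in propositions() if x in psi)

    def derived_R():
        # xRy iff for some proposition phi containing both x and y, y is in B_x(phi)
        return {(x, y) for x in worlds for y in worlds
                if any(x in phi and y in believes_after(x, phi) for phi in propositions())}

    print(knows('w1', frozenset({'w1', 'w2'})))   # True
    print(sorted(derived_R()))                    # reflexive, transitive and weakly connected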
So a nice, well behaved version of our standard semantics for knowledge falls
out of the defeasibility analysis, yielding a determinate account, in terms of the
belief revision structure, of the way that epistemic accessibility extends doxastic
accessibility. But I doubt that this is a plausible account of knowledge in general,
even in our idealized setting. The analysis is not so demanding as the S4F theory,
but like that theory, it threatens to let any false belief defeat too much of our
knowledge, even knowledge of facts that seem unrelated. Consider the following
kind of example: Alice takes herself to know that the butler didn't do it, since she
saw him in the drawing room, miles away from the scene of the crime, at the time
21
It should be noted that even with the addition of the belief revision structure to the epistemic
models I have been discussing, they remain static models. A model of this kind represents only
the agents’ beliefs at a fixed time, together with the policies or dispositions to revise her beliefs
that she has at that time. The model does not represent any actual revisions that are made when
new information is actually received. The models can be enriched by adding a temporal dimension
to represent the dynamics, but doing so requires that the knowledge and belief operators be time
indexed, and that one be careful not to confuse belief changes that are changes of mind with belief
changes that result from a change in the facts. (I may stop believing that the cat is on the mat
because I learn that what I thought was the cat was the dog, or I may stop believing it because the
cat gets up and leaves, and the differences between the two kinds of belief change are important).
22
In game theoretic models, the strength of the assumption that there is common knowledge of
rationality depends on what account one gives of knowledge (as well as on how one explains
rationality). Some backward induction arguments, purporting to show that common knowledge of
rationality suffices to determine a particular course of play (in the centipede game, or the iterated
prisoners’ dilemma, for example) can be shown to work with a defeasibility account of knowledge,
even if they fail on a more neutral account. See Stalnaker (1996).
of the murder (or so she thinks). She also takes herself to know there is zucchini
planted in the garden, since the gardener always plants zucchini, and she saw the
characteristic zucchini blossoms on the vines in the garden (or so she thinks). As
it happens, the gardener, quite uncharacteristically, failed to plant the zucchini this
year, and coincidentally, a rare weed with blossoms that resemble zucchini blossoms
has sprung up in its place. But it really was the butler that Alice saw in the drawing
room, just as she thought. Does the fact that her justified belief about the zucchini is
false take away her knowledge about the butler? It is a fact that either it wasn’t really
the butler in the drawing room, or the gardener failed to plant zucchini. Were Alice
to learn just this disjunctive fact, she would have no basis for deciding which of her
two independent knowledge claims was the one that was wrong. So it seems that,
on the simple defeasibility account, the disjunctive fact is a defeater. The fact that
she is wrong about one of her knowledge claims seems to infect other, seemingly
unrelated claims. Now it may be right that if Alice was in fact reliably informed that
one of her two knowledge claims was false, without being given any information
about which, she would then no longer know that it was the butler that she saw. But
if the mere fact that the disjunction is true were enough to rob her of her knowledge
about the butler, then it would seem that almost all of Alice’s knowledge claims
will be threatened. The defeasibility account is closer than one might have thought
to the maximally demanding S4F analysis, according to which we know nothing
except how things seem to us unless we are right about everything we believe.
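The worry can be made concrete in a four-world toy model (the ranking, and the symmetry between the two possible errors, are assumptions of the illustration rather than part of the text's example):

    # Worlds are pairs (butler_seen, zucchini_planted).
    worlds = [(True, True), (True, False), (False, True), (False, False)]
    actual = (True, False)                # the butler claim is true, the zucchini claim false
    rank = {(True, True): 0, (True, False): 1, (False, True): 1, (False, False): 2}

    def revise(psi):                      # most plausible psi-worlds, as in the earlier sketch
        best = min(rank[w] for w in psi)
        return {w for w in psi if rank[w] == best}

    BUTLER = {w for w in worlds if w[0]}                   # "it was the butler she saw"
    DEFEATER = {w for w in worlds if not (w[0] and w[1])}  # "not both of her claims are right"

    print(actual in DEFEATER)             # True: the disjunction is in fact true
    print(revise(DEFEATER) <= BUTLER)     # False: learning it would defeat the butler belief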
I think that one might plausibly defend the claim that the defeasibility analysis
provides a sufficient condition for knowledge (in our idealized setting), and so the
belief revision structure might further constrain the ways in which the doxastic
accessibility relation can be extended to an epistemic accessibility relation. But it
does not seem to be a plausible necessary and sufficient condition for knowledge.
In a concluding section, I will speculate about some other features of the relation
between a knower and the world that may be relevant to determining which of his
true beliefs count as knowledge.
respond to information that they do not in fact receive. This, of course, is a strategy
that played a central role in many of the responses to the Gettier challenge. I will
describe a very simple model of this kind, and then mention some of the problems
that arise in making the simple model even slightly more realistic.
Recall that we can formulate the basic theory of belief this way: a relation of
subjective indistinguishability, for each agent, partitions the space of possibilities,
and there will be a nonempty subset of each partition cell which is the set of worlds
compatible with what the agent believes in the worlds in that cell. We labeled those
worlds the normal ones, since they are the worlds in which everything determining
the agent’s beliefs is functioning normally, all of the beliefs are true in those worlds,
and belief and knowledge coincide. The problem was to say what the agent knows
in the worlds that lie outside of the normal set. One idea is to give a more detailed
account of the normal conditions in terms of the way the agent interacts with the
world he knows about; we start with a crude and simple model of how this might
be done. Suppose our agent receives his information from a fixed set of independent
sources—different informants who send messages on which the agent’s knowledge
is based. The “informants” might be any kind of input channel. The agent might
or might not be in a position to identify or distinguish different informants. But
we assume that the informants are, in fact, independent in the sense that there
may be a fault or corruption that leads one informant to send misinformation (or
more generally, to be malfunctioning) while others are functioning normally. So
we might index normal conditions to the informant, as well as to the agent. For
example, if there are two informants, there will be a set of worlds that is normal
with respect to the input channel for informant one, and an overlapping set that is
normal for informant two. Possible worlds in which conditions are fully normal will
be those in which all the input channels are functioning normally—the worlds in
the intersection of the two sets.23 This intersection will be the set compatible with
the agent’s beliefs, the set where belief and knowledge coincide. If conditions are
abnormal with respect to informant one (if that information channel is corrupted)
then while that informant may influence the agent’s beliefs, it won’t provide any
knowledge. But if the other channel is uncorrupted, the beliefs that have it as
their sole source will be knowledge. The formal model suggested by this picture
is a simple and straightforward generalization of the S4F model, the maximal
admissible extension of the doxastic accessibility relation. Here is a definition of
the epistemic accessibility relation for the S4F semantics, where E(x) is the set of
worlds subjectively indistinguishable from x (to the agent in question) and N(x)
is the subset of that set where conditions are normal (the worlds compatible with
what the agent believes in world x): xRy if and only if x ∈ N(x) and y ∈ N(x),
or x ∉ N(x) and y ∈ E(x). In the generalization, there is a finite set of normal-
conditions properties, Nj , one for each informant j, that each determines a subset
of E(x), Nj (x), where conditions are functioning normally in the relation between
that informant and the agent. The definition of R will say that the analogue of the
23
It will be required that the intersection of all the normal-conditions sets be nonempty.
S4F condition holds for each Nj . The resulting logic (assuming that the number of
independent information channels or informants is unspecified) will be the same as
the basic theory: S4.2.
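A two-informant instance of this generalization can be sketched as follows (the sets below are invented; the point is only to exhibit the per-informant S4F clause).

    E = {'w1', 'w2', 'w3', 'w4'}         # one cell of subjective indistinguishability
    N1 = {'w1', 'w2'}                    # worlds where informant 1's channel is normal
    N2 = {'w1', 'w3'}                    # worlds where informant 2's channel is normal
    assert N1 & N2                       # the fully normal worlds must be nonempty (see note 23)

    def accessible(x, y):
        # the S4F clause must hold for each informant j
        return all((x in Nj and y in Nj) or (x not in Nj and y in E) for Nj in (N1, N2))

    R = {(x, y) for x in E for y in E if accessible(x, y)}
    print(sorted(y for (x, y) in R if x == 'w1'))   # ['w1']: both channels normal, belief and knowledge coincide
    print(sorted(y for (x, y) in R if x == 'w2'))   # ['w1', 'w2']: only informant 1 still provides knowledge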
Everything goes smoothly if we assume that information comes from discrete
sources, even if the agent does not identify or distinguish the sources. Even when
the agent makes inferences from beliefs derived from multiple sources, some of
which may be corrupt and other not, the model will determine which of his
true beliefs count as knowledge, and which do not. But in even a slightly more
realistic model, the causal explanations for our beliefs will be more complex, with
different sources not wholly independent, and deviations from normal conditions
hard to isolate. Beliefs may have multiple interacting sources—there will be cases
of overdetermination and preemption. There will be problems about how to treat
cases where a defect in the system results, not in the reception of misinformation,
but in the failure to receive a message. (It might be that had the system been
functioning normally, I would have received information that would have led me
to give up a true belief.) And along with complicating the causal story, one might
combine this kind of model with a belief revision structure, allowing one to explore
the relation between beliefs about causal structure and policies for belief revision,
and to clarify the relation between the defeasibility analysis and an account based
on the causal strategy. The abstract problems that arise when one tries to capture a
more complex structure will reflect, and perhaps help to clarify, some of the patterns
in the counterexamples that arose in the post-Gettier literature. Our simple model
avoids most of these problems, but it is a start that may help to provide a context for
addressing them.
Appendix
To give a very concise summary of all the logics of knowledge I have discussed,
and their corresponding semantics, I will list, first the alternative constraints on the
accessibility relation, and then the alternative axioms. Then I will distinguish the
different logics, and the semantic conditions that are appropriate to them in terms of
the items on the lists.
Conditions on R:
(Ref) (x)xRx
(Tr) (x)(y)(z)((xRy & yRz) → xRz)
(Cv) (x)(y)(z)((xRy & xRz) → (∃w)(yRw & zRw))
(SCv) (x)(∃z)(y)(xRy → yRz)
(WCt) (x)(y)(z)((xRy & xRz) → (yRz ∨ zRy))
(F) (x)(y)(xRy → ((z)(xRz → yRz) ∨ (z)(xRz → zRy)))
(TB) (x)(y)((xRy & x ≠ y) → (z)(xRz → zRy))
(E) (x)(y)(z)((xRy & xRz) → yRz)
Axioms:
(T) Kφ → φ
(4) Kφ → KKφ
(4.2) MKφ → KMφ
(4.3) (K(φ → Mψ) ∨ K(ψ → Mφ))
(f) ((Mφ & MKψ) → K(Mφ ∨ ψ))
(4.4) ((φ & MKψ) → K(φ ∨ ψ))
(5) Mφ → KMφ
S4     K + T + 4    Ref + Tr
S4.2   S4 + 4.2     Ref + Tr + SCv   OR   Ref + Tr + Cv
S4.3   S4 + 4.3     Ref + Tr + SCv + WCt   OR   Ref + Tr + WCt
S4F    S4 + f       Ref + Tr + F
S4.4   S4 + 4.4     Ref + Tr + TB
S5     S4 + 5       Ref + Tr + E
In each of the logics of knowledge we have considered, from S4.2 to S4.4, the
derived logic of belief, with belief defined by the complex operator MK, will be
KD45. (In S4, belief is not definable, since in that logic, the complex operator MK
does not satisfy the K axiom, and so is not a normal modal operator. In S5, belief
and knowledge coincide, so the logic of belief is S5.) KD45 is K + D + 4 + 5, where
D is (Kφ → Mφ). The semantic constraints are Tr + E + the requirement that the
accessibility relation be serial: (x)(∃y)xRy.
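For finite models, the listed frame conditions can be checked mechanically; a small Python sketch of such a checker (covering the simpler conditions, with F and TB addable in the same style) is given below, applied to an invented example frame.

    def holds(name, W, R):
        r = lambda a, b: (a, b) in R
        if name == 'Ref':
            return all(r(x, x) for x in W)
        if name == 'Tr':
            return all(r(x, z) for x in W for y in W for z in W if r(x, y) and r(y, z))
        if name == 'Cv':
            return all(any(r(y, w) and r(z, w) for w in W)
                       for x in W for y in W for z in W if r(x, y) and r(x, z))
        if name == 'SCv':
            return all(any(all(not r(x, z) or r(z, y) for z in W) for y in W) for x in W)
        if name == 'WCt':
            return all(r(y, z) or r(z, y)
                       for x in W for y in W for z in W if r(x, y) and r(x, z))
        if name == 'E':
            return all(r(y, z) for x in W for y in W for z in W if r(x, y) and r(x, z))
        if name == 'Serial':
            return all(any(r(x, y) for y in W) for x in W)
        raise ValueError(name)

    W = {'w1', 'w2', 'w3'}
    R = {('w1', 'w1'), ('w2', 'w2'), ('w3', 'w3'), ('w1', 'w2'), ('w1', 'w3'), ('w2', 'w3')}
    print([c for c in ('Ref', 'Tr', 'Cv', 'SCv', 'WCt', 'E') if holds(c, W, R)])
    # ['Ref', 'Tr', 'Cv', 'SCv', 'WCt'] -- a frame for S4.3 but not for S5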
In a semantic model with multiple knowers, we can add a common knowledge
operator, defined in terms of the transitive closure of the epistemic accessibility
relations for the different knowers. For any of the logics, from S4 to S4.4, with
the corresponding semantic conditions, the logic of common knowledge will be S4,
and the accessibility relation will be transitive and reflexive, but will not necessarily
have any of the stronger properties. If the logic of knowledge is S5, then the logic
of common knowledge will also be S5, and the accessibility relation will be an
equivalence relation.
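The common-knowledge relation can likewise be computed for finite models as the transitive closure of the union of the individual relations; a brief Python sketch (two agents, invented relations) follows.

    def transitive_closure(R):
        closure = set(R)
        changed = True
        while changed:
            changed = False
            for (x, y) in list(closure):
                for (u, z) in list(closure):
                    if y == u and (x, z) not in closure:
                        closure.add((x, z))
                        changed = True
        return closure

    R_alice = {('w1', 'w1'), ('w2', 'w2'), ('w3', 'w3'), ('w1', 'w2')}
    R_bob   = {('w1', 'w1'), ('w2', 'w2'), ('w3', 'w3'), ('w2', 'w3')}
    R_common = transitive_closure(R_alice | R_bob)
    print(('w1', 'w3') in R_common)   # True: w3 is reachable via Alice's relation and then Bob's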
References
Battigalli, P., & Bonanno, G. (1999). Recent results on belief, knowledge and the epistemic
foundations of game theory. Research in Economics, 53, 149–225.
Fagin, R., Halpern, J. Y., Moses, Y., & Vardi, M. (1995). Reasoning about knowledge. Cambridge,
MA: MIT Press.
Gärdenfors, P. (1988). Knowledge in flux: Modeling the dynamics of epistemic states. Cambridge,
MA: MIT Press.
Gettier, E. (1963). Is justified true belief knowledge? Analysis, 23, 121–123.
Grove, A. (1988). Two modellings for theory change. Journal of Philosophical Logic, 17, 157–170.
Hintikka, J. (1962). Knowledge and belief. Ithaca: Cornell University Press.
Lehrer, K., & Paxson, T. (1969). Knowledge: Undefeated justified true belief. The Journal of
Philosophy, 66, 225–237.
Lenzen, W. (1978). Recent work in epistemic logic (Acta philosophica Fennica, Vol. 30).
Amsterdam: North-Holland.
Schwarz, G., & Truszczyński, M. (1992). Modal logic S4F and the minimal knowledge paradigm.
In Proceedings of the fourth conference on theoretical aspects of reasoning about knowledge
(pp. 184–198). San Mateo: Morgan Kaufmann Publishers, Inc.
Stalnaker, R. (1991). The problem of logical omniscience, I. Synthese, 89, 425–440. (Reprinted in
Stalnaker (1999a), 240–254).
Stalnaker, R. (1994). What is a non-monotonic consequence relation? Fundamenta Informaticae,
21, 7–21.
Stalnaker, R. (1996). Knowledge, belief and counterfactual reasoning in games. Economics and
Philosophy, 12, 133–162.
Stalnaker, R. (1999a). Context and content: Essays on intentionality in speech and thought. Oxford:
Oxford University Press.
Stalnaker, R. (1999b). The problem of logical omniscience, II. In Stalnaker (1999a), 255–273.
Swain, M. (1974). Epistemic defeasibility. The American Philosophical Quarterly, 11, 15–25.
Williamson, T. (2000). Knowledge and its limits. Oxford: Oxford University Press.
Chapter 31
Sentences, Belief and Logical Omniscience,
or What Does Deduction Tell Us?
Rohit Parikh
Introduction
R. Parikh ()
City University of New York, 365 Fifth Avenue, New York, NY 10016, USA
e-mail: [email protected]
language. But the first kind of choice is perfectly open to animals and to pre-lingual
children.1
We argue that this way of thinking about beliefs not only allows us to address the
issue of logical omniscience and to offer formal models of “inconsistent” beliefs, it
also allows us to say something useful about Frege’s problem, whether it is sentences
or something else that we believe, and whether believes can be a binary relation at
all.2
We start with a dialogue between Socrates, Meno, and Meno’s slave boy from
Plato’s Meno. In this dialogue, Socrates carries on a conversation with the boy and
asks him the area (space) of a square whose side is two. The boy correctly answers
that the area is four. And then Socrates raises the question of doubling the area to
eight. He wants to know what the side should be. The boy makes two conjectures,
first that the side of the larger square should also be double (i.e., four) – but that
yields an area of sixteen, twice what is wanted. The boy’s second guess is that the
side should be three – but that yields an area of nine, still too large.
Socrates: Do you see, Meno, what advances he has made in his power of recollection? He
did not know at first, and he does not know now, what is the side of a figure of eight feet: but
then he thought that he knew, and answered confidently as if he knew, and had no difficulty;
now he has a difficulty, and neither knows nor fancies that he knows.
Meno: True.
Socrates: Is he not better off in knowing his ignorance?
Meno: I think that he is.
Socrates: If we have made him doubt, and given him the “torpedo’s shock,” have we done
him any harm?
Meno: I think not.
Socrates: We have certainly, as would seem, assisted him in some degree to the discovery
of the truth; and now he will wish to remedy his ignorance, but then he would have been
ready to tell all the world again and again that the double space should have a double side.
Now Socrates suggests that the diagonal of the smaller square would work as the
side of the larger square and the boy agrees to this. We continue with the quotation:
Socrates: And that is the line which the learned call the diagonal. And if this is the proper
name, then you, Meno’s slave, are prepared to affirm that the double space is the square of
the diagonal?
Boy: Certainly, Socrates.
1
Ruth Marcus (1990, 1995) describes a man and his dog in a desert, deprived of water and
thirsty. When they both react the same way to a mirage, it is hard to deny that they both have the
same false belief that they are seeing water. The fact that animals can experience mirages would
seem to be substantiated by the fact that the Sanskrit word for a mirage is mrigajal which literally
means ‘deer water,’ faux water which deer pursue to their deaths. Frans de Waal makes a much
more detailed case for animal intentionality in Waal (2005).
2
In Levi (1997), Isaac Levi considers the doxastic commitment we have to try to achieve logical
closure of our beliefs, even when, as he admits, we cannot actually achieve such logical closure.
I am sympathetic to Levi’s requirement, but in this paper, my concern is to develop a theory of
actual beliefs rather than of doxastic commitments. The latter are less problematic from a purely
logical point of view. If the agent’s full beliefs are consistent, then his doxastic commitments will
satisfy a suitable modal logic, perhaps the logic KD4.
Socrates: What do you say of him, Meno? Were not all these answers given out of his own
head?
Meno: Yes, they were all his own.
Socrates: And yet, as we were just now saying, he did not know?
Meno: True.
Socrates: But still he had in him those notions of his – had he not?
Meno: Yes.
Socrates: Then he who does not know may still have true notions of that which he does not
know?
Meno: He has.
—–
Here Socrates appears to be arguing that the beliefs which Meno’s slave could
be brought to have via a discussion were beliefs which he already had.3 There is
of course a tension in this line of argument, for if this is so, then deduction would
appear to be dispensable, and yet, leading the boy to make deductions was Socrates’
own method in bringing him to a new state of mind.
A conclusion similar to that of Socrates also follows from the Kripke Semantics
which has become a popular tool for formalizing logics of knowledge. The
semantics for the logic of knowledge uses Kripke structures with an accessibility
relation R, typically assumed to be reflexive, symmetric, and transitive. If we are
talking about belief rather than knowledge, then R would be serial, transitive, and
Euclidean. Then some formula φ is said to be believed (or known) at state s iff φ is
true at all states R-accessible from s. Formally, s ⊨ B(φ) iff for every t such that sRt, t ⊨ φ.
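In a finite model this clause is a one-line check; the toy Python sketch below (invented states and valuation) also illustrates how an agent can believe something false at the very state she is in.

    states = {'s1', 's2'}
    R = {('s1', 's2'), ('s2', 's2')}            # serial, transitive and euclidean
    true_at = {'s1': False, 's2': True}         # truth value of some fixed formula phi

    def believed_at(s):
        return all(true_at[t] for (u, t) in R if u == s)

    print(believed_at('s1'))   # True: phi holds at every accessible state, though phi is false at s1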
3
This impression is surely strengthened by the remarks which Socrates makes elsewhere in Meno
to the effect that if knowledge was always present then the soul must be eternal. The soul, then,
as being immortal, and having been born again many times, and having seen all things that exist,
whether in this world or in the world below, has knowledge of them all; and it is no wonder that she
should be able to call to remembrance all that she ever knew about virtue, and about everything.
omniscience, and then fiddling with it by various methods, leaves aside the question
of what our goal is. Unless we have a clear sight of the goal we are not likely to
reach it.
For instance, we do not know if the Goldbach conjecture is true, but if it is true,
then it is necessarily true. Surely we cannot argue that we do not know which it is
only because we are not aware of it. That is surely not the problem. Nor can
computational complexity help us, because if it is true, then there is a sound formal
system which has it as an axiom, and then a proof of the Goldbach conjecture is
going to be very fast in that system. If it is false, then of course the falsity is provable
in arithmetic as it only requires us to take notice of a particular even number which
is not the sum of two (smaller) primes.
The issue of computational complexity can only make sense for an infinite family
of questions, whose answers may be undecidable or at least not in polytime. But for
individual mathematical questions whose answers we do not know, the appeal to
computational complexity misses the issue.
In sum, our goal is to first answer the question, How do we know what people
believe? and then try to develop a theory of what people do believe.
Inconsistent Beliefs
Moreover, beliefs are not always consistent, so one could say, thank heaven for a
lack of logical omniscience! A person, some of whose beliefs conflict with others,
and who believes all logical consequences of her beliefs, would end up believing
everything!
The following two examples are from Daniel Kahneman's Nobel lecture (2002):
"A bat and a ball cost $1.10 in total. The bat costs $1 more than the ball. How much does
the ball cost?” Almost everyone reports an initial tendency to answer “10 cents” because
the sum $1.10 separates naturally into $1 and 10 cents, and 10 cents is about the right
magnitude. Frederick found that many intelligent people yield to this immediate impulse:
50 % (47/93) of Princeton students, and 56 % (164/293) of students at the University of
Michigan gave the wrong answer. Clearly, these respondents offered a response without
checking it.
———–
Linda is 31 years old, single, outspoken and very bright. She majored in philosophy. As a
student she was deeply concerned with issues of discrimination and social justice and also
participated in antinuclear demonstrations.
#6 Linda is a bank teller
#8 Linda is a bank teller and active in the feminist movement
89 % of respondents rated item #8 higher in probability than item #6.
But the set of bank tellers who are active in the feminist movement is a proper
subset (perhaps even a rather small subset) of the set of all bank tellers, so #8 cannot
have higher probability4 than #6.
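For reference, both answers can be checked with a line of arithmetic and a line of probability theory. In the bat-and-ball problem, if the ball costs b dollars, then b + (b + 1.00) = 1.10, so 2b = 0.10 and b = 0.05: the ball costs 5 cents, not 10. In the Linda problem, item #8 describes the intersection of the event in item #6 with a further event, and for any events A and B, A ∩ B ⊆ A, so P(A ∩ B) ≤ P(A).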
4
The “conjunction fallacy” is committed when someone assigns higher probability to a conjunction
than to one of the conjuncts. Gigerenzer (1996), Levi (2004) and Hintikka (2004) all dispute that
the 89 % of people who responded the way indicated were actually committing the conjunction fallacy. However, I assume that the dispute is about the interpretation of this particular experiment, and that these three writers would not dispute the more general point that people do sometimes reason incorrectly.
Clearly human states of belief are not usually consistent. Nor are they usually
closed under logical inference. Various researchers including ourselves have looked
into this issue (Fagin et al. 1995; Gaifman 2004; Parikh 1987, 1995; Stalnaker
1999). There is also more recent work by Artemov and Nogina (2005), and by
Fitting (2004), aimed towards the logic of proofs.
And yet we do operate in the world without getting into trouble, and when
we discover an inconsistency or incompleteness in our thinking, we remove it,
either by adding some beliefs which we did not have before, or by deleting some
which we had. What we need is a formal account of this activity and a more
general representation of beliefs than is afforded by the current Kripke semantics.
In particular, logically omniscient belief states need to be represented as a proper
subset of all belief states.
The proposal we make here draws on the work of Ramsey (1931), de Finetti
(1937), Hayek (1948), and Savage (1954) with echoes from Marcus (1990), Marcus
(1995), Millikan (2006), and Whyte (1990).5 According to this view, an agent’s
beliefs are revealed by the choices which an agent makes, and while ‘incoherent’
choices are unwise, they are not inconsistent, for if they were, they would be
impossible. For an agent to make such unwise, incoherent, choices is perfectly
possible and is done all the time.
Imagine that Carol assigns probabilities of .3, .3, and .8 respectively to events
X, Y, and X ∪ Y. One could say that these probabilities are inconsistent. But in fact
nothing prevents Carol from accepting bets based on these probabilities. What
makes them incoherent is that we can make Dutch book against Carol – i.e., place
bets in such a way that no matter what happens, she will end up losing money.6
Thus incoherent beliefs, on this account, are possible, (and hence consistent) but
unwise. It will be important to keep in mind this distinction between inconsistency
and incoherence.
We now look at some previous work which contains strong indications of the
direction in which to proceed.
Hayek (1948) considers an isolated person acting over a period according to a
preconceived plan. The plan
may, of course, be based on wrong assumptions concerning external facts and on this
account may have to be changed. But there will always be a conceivable set of external
events which would make it possible to execute the plan as originally conceived.
5
However, unlike Ramsey et al., we shall not try to explain probabilistic belief.
6
For instance we can bet $3 on X, $3 on Y, and $2 against X ∪ Y. If either X or Y happens, we earn
$7 (at least), and lose (at most) $5. If neither happens, we gain $8 and lose $6, so that we again
make a profit – and Carol makes a loss.
Beliefs are related here to an agent’s plans, and it is implicit in Hayek’s view that
the agent believes that the world is such that his plans will work out (or have a good
chance of doing so).
But note that the belief states which are implicit in plans are more general than the
belief states which correspond to Kripke structures, and the first may be incoherent
and/or logically incomplete. An agent may plan to buy high and sell low, and expect
to make money. It is not possible for the agent to actually do so, but it is perfectly
possible for the agent to have such a plan.
Animal Beliefs
Now, for a creature to have a plan it is not necessary that the plan be formulated
explicitly in language, or even that the planner has a language. It is perfectly
possible for an animal to have a plan and, to a smaller extent, it is also possible
for a pre-lingual child to engage in deliberate behaviour which is plan-like.
This is an important point made explicitly by Marcus (1995), Searle (1994), and
also fairly clear in Millikan (2006), that we ought not to limit states like belief and
desire to language using creatures, i.e., to adult humans and older children. Frans de
Waal is even more emphatic on this point.7 We should also attribute some kinds of
intentional states to higher animals and to pre-lingual children. Going back further,
Hume (1988) is quite explicit on this point:
Next to the ridicule of denying an evident truth is that of taking much pains to defend it; and
no truth appears to me more evident, than that beasts are endow’d with thought and reason
as well as men.
7
“It wasn’t until an ape saved a member of our own species that there was public awakening to
the possibility of nonhuman kindness. This happened on August 16, 1996 when an eight-year old
female gorilla named Binti Jua helped a three-year-old boy who had fallen eighteen feet into the
primate exhibit at Chicago’s Brookfield Zoo. Reacting immediately, Binti scooped up the boy and
carried him to safety.” De Waal is quite disdainful of Katherine Hepburn’s remark in The African
Queen: “Nature, Mr. Allnut, is what we are put in this world to rise above.”
8
Of course we need not and should not attribute to the chicken the specific belief that such
caterpillars are poisonous. Davidson (1982) is right on this particular point. But we can attribute to
it the belief that eating them will lead to bad consequences.
hypothetically, of considering possibilities without yet fully believing or intending them. The
Popperian animal discovers means by which to fulfill its purposes by trial and error with
inner representations. It tries things out in its head, which is, of course, quicker and safer
than trying them out in the world. It is quicker and safer than either operant conditioning or
natural selection. One of many reasonable interpretations of what it is to be rational is that
being rational is being a Popperian animal. The question whether any non-human animals
are rational would then be the question whether any of them are Popperian.
Finally Whyte (1990) suggests that we can even define truth in this way. He
appeals to Ramsey’s principle (R):
(R) A belief’s truth condition is that which guarantees the fulfilment of any desire by the
action which that belief and desire would combine to cause.
Defining Belief
However, we need not address the issue of truth here. Brandom (1994) subjects
Whyte to some criticism, but it only has to do with whether the criterion of truth
is adequate. We are only interested here in a representation of belief. Such a
representation must accommodate all the following groups: language possessing,
logically omniscient humans9; language possessing but fallible humans; and non-
lingual creatures like animals and very young children.10
However, if beliefs are not (always) expressed by sentences, and do not coincide with
propositions expressed by sentences, then we need another way of representing
belief states, and must then relate language-oriented belief states to some specific species
of such belief states.
We shall consider two kinds of beliefs: non-linguistic beliefs, which may also
be possessed by animals, and linguistic beliefs, which can only be possessed by
9
Of course I do not personally know any logically omniscient humans, but in a limited context it
is possible for a human to show full logical competence. Suppose that p stands for Pandas live in
Washington DC, q stands for Quine was born in Ohio, and r stands for Rabbits are called gavagai
at Harvard. Suppose that Jill believes that p is true and that q and r have the same truth values.
Then she is allowing two truth valuations, v = (t, t, t) and v′ = (t, f, f). Given a formula φ on
p, q, r in disjunctive normal form, she can evaluate v(φ) and v′(φ). If both are t she can say that
she believes φ. If both are f, she disbelieves φ, and otherwise she is suspending judgment. Then
Jill will be logically omniscient in this domain. But note that she will actually have to make the
calculations rather than just sit back and say, “Now do I believe φ?” In fact if it so happens that φ
is a complex formula logically equivalent to p, then φ represents the same proposition as p, and
is therefore believed by Jill. And yet, Jill will not agree to φ because it is the same ‘proposition’
as p; rather, she will agree to the formula whose truth value she has calculated. See also
Dennett (1985), p. 11.
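The procedure sketched in this footnote is entirely mechanical, so it can be illustrated directly. The following Python fragment is my own illustration (the encoding of formulas and all names are invented for the example, not part of the paper): it evaluates a propositional formula under Jill's two admissible valuations and reports whether she believes it, disbelieves it, or suspends judgment.

# A minimal sketch of the decision procedure described in footnote 9.
# Formulas are nested tuples: ('var', 'p'), ('not', f), ('and', f, g), ('or', f, g).

def evaluate(formula, valuation):
    """Evaluate a propositional formula under a truth valuation (a dict)."""
    tag = formula[0]
    if tag == 'var':
        return valuation[formula[1]]
    if tag == 'not':
        return not evaluate(formula[1], valuation)
    if tag == 'and':
        return evaluate(formula[1], valuation) and evaluate(formula[2], valuation)
    if tag == 'or':
        return evaluate(formula[1], valuation) or evaluate(formula[2], valuation)
    raise ValueError("unknown connective: %s" % tag)

def attitude(formula, valuations):
    """Believe if true under all admissible valuations, disbelieve if false under all,
    otherwise suspend judgment."""
    values = [evaluate(formula, v) for v in valuations]
    if all(values):
        return 'believe'
    if not any(values):
        return 'disbelieve'
    return 'suspend judgment'

# Jill's two admissible valuations: v = (t, t, t) and v' = (t, f, f).
v1 = {'p': True, 'q': True, 'r': True}
v2 = {'p': True, 'q': False, 'r': False}

# A formula logically equivalent to p, e.g. p AND (q OR NOT q):
theta = ('and', ('var', 'p'), ('or', ('var', 'q'), ('not', ('var', 'q'))))
print(attitude(theta, [v1, v2]))          # 'believe' -- but only after calculating
print(attitude(('var', 'q'), [v1, v2]))   # 'suspend judgment'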
10
It is plausible that when a vervet monkey utters a leopard call, then it is saying, ‘Ware leopard!’,
but surely nothing in the behaviour of such monkeys would justify us to think that they might utter
If there were no leopards here, then this would be a wonderful spot to picnic.
humans: adults and older children. Of course the last two groups will also have non-
linguistic beliefs which must be somehow correlated with their linguistic beliefs.
Let B be the space (so far unspecified) of belief states of some agent. Then the
elements of B will be correlated with the choices which the agent makes. Roughly
speaking, if I believe that it is raining, I will take my umbrella, and if I believe that
it is not raining, then I won’t. What I believe is revealed by what I do.
But clearly the choice of whether to take my umbrella or not is correlated with
my belief only if I don’t want to get wet. So my preferences enter in addition to my
beliefs.
Conventional theories like that of Savage (1954) take the (actual and potential)
choices of an agent, assume that these choices satisfy certain axioms, and simulta-
neously derive both the agent’s beliefs (her subjective probability) and the agent’s
preferences (expressed by a utility function). But it is known that agents which are
rational in Savage’s sense, i.e., obey his axioms, are not very common and so we
would like to retain the rough framework without the inappropriate precision of
Savage’s theory.
So we will just assume that the agent has some space P of preferences, and that
the choices are governed by the beliefs as well as the preferences. We will use S for
the set of choice situations, and C for the set of choices. Thus the set {U, ¬U} could
be a choice situation, or an element of S (with U standing for take the umbrella);
and both U and ¬U are elements of C. We could say very roughly that an agent who
prefers not to get wet will choose U from {U, ¬U} iff she believes that it is raining.
An agent who does have language can also be subjected to a purely linguistic
choice. If asked Do you think it is raining? the agent may choose from the set {Yes,
No, Not sure}. And it will usually be the case that the agent will choose U in
the situation {U, ¬U} iff she chooses Yes in the situation where she hears Do you
think that it is raining? But this is not a logical requirement, only a pragmatic one.
We will say that an agent endorses (agrees with) a sentence φ iff she chooses
Yes when asked, Do you think φ?, and denies (or disagrees with) φ iff she chooses
No. She may also choose Not sure, in which case of course she neither endorses nor
denies φ.
Note that nothing prevents an agent from endorsing φ as well as ¬φ, but few
agents are that ‘out of it’. But of course an agent may endorse φ and φ → ψ, and either
deny ψ or at least fail to endorse ψ. We will say in the first case that the agent is
logically incoherent, and in the second that the agent is an incomplete reasoner – or
simply incomplete.
Given the fact that (φ ∧ ¬φ) → ψ for arbitrary ψ is a tautology, it is obvious that
an agent who is logically incoherent, but also logically complete, and who endorses
both φ and ¬φ, will end up endorsing everything. Fortunately, most of us, though
we are logically incoherent, tend also to be incomplete. If we endorse the statements
that all men are equal, that Gandhi and Hitler are men, and that Gandhi and Hitler
are not equal, we are still not likely to agree that pigs fly – which we would if we
were also complete.
As we have made clear, elements of B cannot be identified with propositions,
for an agent may agree to one sentence expressing a proposition and disagree (or
not agree with) another sentence expressing the same proposition. An agent in some
state b ∈ B may agree with Hesperus is bright this evening, while disagreeing with
Phosphorus is bright this evening. But surely we may think of an agent speaking
some language L as agreeing with or endorsing a particular sentence φ.
Definition 1. A belief state b is incomplete when there are sentences φ1, …, φn
which are endorsed in b, ψ follows logically from φ1, …, φn, and ψ is not endorsed
in b.
A belief state b is incoherent when there are sentences φ1, …, φn which are
endorsed in b, ψ follows logically from φ1, …, φn, and ψ is denied in b.
There is of course an entirely different sort of incoherence which arises when an
agent’s linguistic behaviour does not comport with his choices. Suppose an agent
who prefers not to get wet says that it is raining, but does not take her umbrella.
Then she may well be quite coherent in her linguistic behaviour, but her linguistic
behaviour and her non-linguistic choices have failed to match. Whether we then
conclude that the agent is irrational, is being deceptive, or does not know the
language, will depend on the weight of the evidence.
We can easily see now how deduction changes things. Meno’s slave was initially
both incoherent and incomplete. He believed that a square of side four had an area
of eight. Since he knew some arithmetic and some geometry, his belief state was not
actually coherent. At the end of the conversation with Socrates, he came to endorse
the sentence, The square whose side is the diagonal of a square of side two, has an
area of eight. It is not quite clear what practical application this information had for
him in this case, but surely carpenters carrying out deductions of the same kind will
manage to make furniture which does not collapse and bears the load of people and
objects.
Beliefs can change not only as a result of deductions, as Meno’s slave’s did;
they can also change as a result of experience, e.g., raindrops falling on your head,
or as the result of hearing something, like It is raining.
A Philosophical Aside
In this paper I shall avoid taking sides in philosophical disputes, but one thing does
need to be mentioned. The representational account of belief seems simply wrong
to me. We can certainly think of beliefs as being stored in the brain in some form
and called forth as needed, but when we think of the details, we can soon see that
the stored belief model is too meager to serve. I will offer an analogy.
Suppose Jennifer owns a bookstore. If someone calls her and asks if she has
Shakespeare’s Hamlet, she will look on her shelves, and answer yes if Hamlet
is in her store. But suppose now that someone asks her if she has Shakespeare’s
Tragedies. These include Hamlet of course, but also Macbeth, Romeo and Juliet
and King Lear. If they are stored alphabetically by title, then they will be in
different locations and they won’t be in her store as one item. But she can create
the set and ship it out as one item to the customer. Did she have the set when the
customer called? Surely yes. It would sound silly to say to the customer, “I do have
Hamlet, Macbeth, Romeo and Juliet and King Lear, but unfortunately I don’t have
Shakespeare’s Tragedies”.
Let us go one step further. Suppose she actually only has a CD containing the
files for the four plays, a printer, and a binder. She can still create the set using what
she has, even though at the time of the call she had no physical object corresponding
to the noun phrase Shakespeare’s Tragedies.
It is what she has, namely the CD, the printer, and the binder, and what she
can do, namely print and bind, which together allow her to fulfill the order. There
are elements of pure storage, and elements which are algorithmic which together
produce the item in question. These two kinds of elements may not always be easy
to separate. It is wiser just to concentrate on what she can supply to her customer.
It is the same, in my view, with beliefs. No doubt there are certain beliefs which
are stored in some way, but there may be other equally valid beliefs which can be
produced on the spot so to say, without having been there to start with. And note
that if Jennifer has a large number of actual physical books, but also CDs for some,
it may be easier for her to produce a book from a CD than to find a book which
exists physically in the store.
If someone asks me if I believe that 232345456 is even, I can answer yes at once
(since the number ends in a 6), even though that particular belief had never been
stored. But if someone asks me the name of some classmate from high school, I
might take a long time to remember. Retrieving from storage is one way to exhibit
a belief, but not the only one, and often, not even the best one.
In this paper, I shall not assume any particular representation of beliefs, but deal
with them purely in terms of how a person acts, and how the potential for acting in
some way is affected by various kinds of update.
We assume given a space B for some agent whose beliefs we are considering. The
elements of B are the belief states of that agent, and these are not assumed to be
sentences in Mentalese although for some restricted purposes they could be. There
are three important update operations on B coming about as a result of (i) events
observed, (ii) sentences heard, and (iii) deductions made. Elements of B are also
used to make choices. Thus in certain states of belief an agent may make the choice
to take his umbrella and we could then say that the agent believes it is raining. Many
human agents are also likely to make the choice to say, “I think it is raining and so
I am taking my umbrella” but clearly only if the agent is English speaking. Thus
two agents speaking different languages, both of whom are taking their umbrellas,
but making different noises, have the same belief in one sense but a different one in
another. Both of these will matter. Later on we shall look into the connection.
Deduction is an update which does not require an input from the outside. But it
can result in a change in B. Suppose for instance that Jane thinks it is clear, and is
about to leave the building without an umbrella. She might say, “Wait, didn’t I just
see Jack coming in with a wet umbrella? It must be raining.” The sight of Jack with
a wet umbrella might not have caused her to believe that it was raining; perhaps she
was busy with a phone call. But the memory of that wet umbrella may later cause a
deduction of the fact that it is raining to take place.
Thus our three update operations are:
B × E →e B
B × L →s B
B →d B
B × S →ch B × C
11
Of course, hearing a sentence is also an event, but its effect on speakers of the language goes
beyond just the event. It is this second part which falls under →s.
12
Even a dog may revise its state of belief on hearing Sit!, see for instance Parikh and Ramanujam
(2003). Note also that if the sentence heard is inconsistent with one’s current beliefs and one
notices the inconsistency, then some theory like that in Alchourron et al. (1985) may need to be
deployed.
13
Clearly Lois Lane will react differently to the sentences Superman flew over the Empire State
Building, and Clark Kent flew over the Empire State Building. Similarly, Kripke’s Pierre (1979)
will react differently to the questions, Would you like a free trip to Londra? and Would you like a
free trip to London? Indeed in the second case he might offer to pay in order not to go to London!
14
See, however, Alchourron et al. (1985) where more complex kinds of reactions to sentences heard
are described.
An agent in a certain belief state makes a choice among various alternatives, and
may arrive at a different state of belief after making that choice.
If we want to explicitly include preferences, we could write
B × P × S →ch B × C
where S is the family of choice sets, C is the set of possible choices, and P is some
representation of the agent’s preferences. Thus {take umbrella, don’t take umbrella}
is a choice set and an element of S, but take umbrella is a choice and an element of
C.
Example. Suppose that Vikram believes it is not raining. In that case he will be in a
belief state b such that ch(b, {U, ¬U}) = (b′, ¬U). Given a choice between taking
an umbrella or not, he chooses not to take the umbrella, and goes into state b′.
Suppose, however, that he looks out the window and sees drops of rain falling.
Let r be the event of rain falling. Then the update operation →e causes him to go
into state c such that in state c he chooses U from {U, ¬U}. Thus ch(c, {U, ¬U}) =
(c′, U).
Note that state c will also have other properties beside choosing to take an
umbrella. It may also cause him to say to others, “You know, it is raining,” or to
complain, “Gosh, does it always rain in this city?” There is no such thing as the
state of believing that it is raining. Every belief state has many properties, most of
them unrelated to rain.
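Because →e and →ch are simply functions, the Vikram example can be mirrored in a few lines of code. The sketch below is purely illustrative: the state names, the event, and the little update tables are hypothetical stand-ins for whatever the maps →e and →ch happen to be for a particular agent.

# Toy rendering of the Vikram example: a belief state, an event update (->e),
# and a choice map (->ch). All names here are illustrative inventions.

def update_e(state, event):
    """The event-update map ->e : B x E -> B."""
    if event == 'sees rain falling':
        return 'c'                        # seeing rain moves the agent to state c
    return state

def choose(state, choice_set):
    """The choice map ->ch : B x S -> B x C."""
    if choice_set == frozenset({'U', 'not-U'}):
        if state == 'c':
            return ('c-prime', 'U')       # in state c the agent takes the umbrella
        return ('b-prime', 'not-U')       # in state b the agent leaves it behind
    raise ValueError('unknown choice situation')

state = 'b'                                        # Vikram believes it is not raining
print(choose(state, frozenset({'U', 'not-U'})))    # ('b-prime', 'not-U')
state = update_e(state, 'sees rain falling')       # the event of rain falling
print(choose(state, frozenset({'U', 'not-U'})))    # ('c-prime', 'U')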
Among the choices that agents make are choices to assent to or dissent from
sentences. But there is no logical reason why an agent who assents to “It is raining”
must take an umbrella or a raincoat. It is just the more pragmatic choice to take the
umbrella when one says that it is raining, because otherwise either one gets wet, or
one is suspected of insincerity. We could say that the agent believes the sentence “It
is raining”, and dis-believes the proposition that it is raining. But we would feel
uncomfortable with such an account and might prefer to say that the agent is either
lying or confused.
Of course an agent may prefer to get wet, and in that case saying “It is raining”
and not taking an umbrella are perfectly compatible choices. This shows that an
agent’s preferences need to be taken into account when correlating the actions which
an agent takes with what the agent believes. But we usually do not want to get wet,
and so do not make such choices; and usually we do not say what we do not believe. It does
not work for us.
Thus our theory of an agent presupposes such a belief set B, and appropriate
functions →e, →s, →d, →ch. We can understand an agent (with some caveats) if
what we see as the effects of these maps conforms to some theory of what the agent
wants and what the agent thinks. And we succeed pretty well. Contra Wittgenstein
(1958), we not only have a theory of what a lion wants, and what it means when
it growls, we even have theories for bees and bats. Naturally these theories do not
have the map →s except with creatures like dogs or cats and some parrots (who not
only “parrot” our words but understand them to some extent (Pepperberg 2004)).
The Setting
Let φ be a sentence. Then ||φ|| = {w | w ⊨ φ}, the set of worlds where φ is
true, is the proposition corresponding to the sentence φ. Note that if φ and ψ are
logically equivalent, then ||φ|| = ||ψ||.
Definition 2. We will say that i e-believes φ, written B_e^i(φ), if the set of worlds in
which i’s plan P would work out as conceived is contained in ||φ||. We will
suppress the superscript i when it is clear from context.
It is obvious in terms of the semantics which we just gave that the statement “The
dog e-believes that there is a bone where he is digging” will be true from our point
of view. A Martian observing the same dog, but not having the notion of a bone (I
assume there are no vertebrates on Mars) will obviously not assign the same belief
to the dog; but the Martian may still have his own theory of the dog’s belief state
which will allow him to predict the dog’s behavior. In other words, the Martian will
assign a space B′ to the dog which will be bisimilar in relevant ways to the space
B which we would assign to the dog.
It is easy to see that if an agent e-believes φ and ψ then the agent also e-believes
φ ∧ ψ, and that if the agent e-believes φ and φ → ψ then the agent e-believes ψ.15
Oddly enough, creatures which do not use language do not suffer from a lack of
logical omniscience!
A lot of logic goes along with e-belief, but only within the context of a single
plan. For instance, one may drive a colleague to the airport for a two week
vacation in Europe and then forget, and arrange a meeting three days later at which
this colleague’s presence is essential. But within the context of a single (short)
plan, consistency and logical omniscience will tend to hold. The situation is more
complex with multiple plans. And there is nothing to prevent an agent from having
one e-belief in one plan and another contradicting e-belief in another plan. It is
pragmatic considerations – the logical counterpart of avoiding Dutch Book – which
will encourage the agent i to be consistent and to use logical closure.
Suppose someone has a plan P consisting of “If φ then do α, else do β” and
another plan P′ consisting of “If φ then do γ, else do δ”. Now we find him doing α
and also doing δ (we are assuming that the truth value of φ has not changed). We
could accuse him of being illogical, but there is no need to appeal to logic. For he is
making a Dutch book against himself.
Presumably he assumed16 that u(α|φ) > u(β|φ) but u(α|¬φ) < u(β|¬φ). Thus
given φ, α was better than β, but with ¬φ it was the other way around. Similarly,
u(γ|φ) > u(δ|φ), but u(γ|¬φ) < u(δ|¬φ). And that is why he had these plans. But
then his choice of α, δ results in a loss of utility whether φ is true or not. If φ is true
then he lost out doing δ, and if φ is false, then he lost out doing α.
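The loss can be written out case by case; the following display is simply my own tabulation of the argument in the preceding paragraph:

\[
\begin{aligned}
\text{if } \varphi \text{ is true}: &\quad u(\gamma \mid \varphi) - u(\delta \mid \varphi) > 0 \text{ is forgone by doing } \delta;\\
\text{if } \varphi \text{ is false}: &\quad u(\beta \mid \neg\varphi) - u(\alpha \mid \neg\varphi) > 0 \text{ is forgone by doing } \alpha.
\end{aligned}
\]

Either way the agent's own utility comparisons, the very ones that generated the two plans, certify that he has chosen badly.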
For a concrete example of this, suppose that on going out I advise you to take
your umbrella, but fail to take mine. If it is raining, there will be a loss of utility for
I will get wet. If it is not raining, there will be a loss of utility because you will be
annoyed at having to carry an umbrella for no good reason. My choosing to advise
you to take your umbrella while failing to take mine is not logically impossible. It just
makes no pragmatic sense. A similar argument will apply if someone endorses φ,
endorses φ → ψ, and denies ψ. If such a person makes plans comporting with these
three conditions, then he will make choices which do not maximise his utility. Of
course such arguments go back to Ramsey (1931) and Savage (1954).
15
To see this, note that if the set of worlds from Definition 2 is contained in ||φ|| and also in ||ψ||,
then clearly it is contained in ||φ|| ∩ ||ψ|| = ||φ ∧ ψ||. The proof for the other case is similar, using
the fact that ||φ → ψ|| = (W − ||φ||) ∪ ||ψ||, where W is the set of all worlds. Since the set in
question is contained in ||φ||, it is disjoint from W − ||φ||. Hence it can be contained in
(W − ||φ||) ∪ ||ψ|| if and only if it is contained in ||ψ||.
16
The use of the letter u for utility is not meant to suggest that we have a formal notion of utility in
mind; only a rough one.
If we assume that over time, people learn to maximise their utility (they do not
always but often do), then they will ‘learn’ a certain amount of logic and they will
make certain obvious logical inferences.17
A little girl who was in kindergarten was in the habit of playing with her older
sister in the garden every day when she came home. Once, her older sister was sick.
So the little girl went to visit her sick sibling in her bedroom, and then, as usual,
went out into the garden to play with her. Clearly the little girl had not yet learned
to maximise her utility!
We now define a second notion of belief which does not imply logical omniscience.
This is a more self-conscious, language-dependent notion of belief.
For agents i who do have a language (assumed to be English from now on), their
plan may contain linguistic elements. At any moment of time they have a finite stock
of currently believed sentences. This stock may be revised as time passes. These
agents may perform atomic actions from time to time, and also make observations
which may result in a revision in their stock of believed sentences.18
Thus Lois seeing Superman in front of her will add the sentence “Superman is in
front of me”, to her stock, but, since she does not know that Clark Kent is Superman,
she will not add the sentence “Clark Kent is in front of me”. Someone else may add
the sentence “I see the Evening Star”, but not the sentence “I see the Morning Star”
at 8 PM on a summer night. A person who knows that ES = MS, may add the
sentence, “Venus is particularly bright tonight.” In any case, this stock consists of
sentences and not of propositions.
The basic objects in the agents’ plans are atomic actions and observations which
may be active (one looks for something) or passive (one happens to see something).
These are supplemented by the operations of concatenation (sequencing), if then
else, and while do, where the tests in the if then else and while do are on sentences.
There may also be recursive calls to the procedure: find out if the sentence φ or
its negation is derivable within the limits of my current resources, from my current
stock of beliefs. Thus if i’s plan has currently a test on φ, then, to be sure, the stock
of sentences will be consulted to see if φ or its negation is in the stock. But there
may also be a recursive call to a procedure for deciding φ. If someone asks “Do you
17
Bicchieri (1997) suggests that co-operation also comes about as a result of such a learning
process. Such suggestions have of course also been made by many others. Since we are only
considering the one-agent case here, we shall not go into this issue any further. See, however, our
Parikh (1991).
18
It may seem to the reader as if I am endorsing a representational theory after all, but not so. First,
the stock may not literally exist, but may simply refer to those sentences which the agent assents
to quickly. Secondly, the agent’s beliefs need not be restricted to this stock – just as earlier, the
bookseller Jennifer was not restricted to selling sets of books which were in her store as a set.
know the time?”, we do not usually say, “I don’t”, but look at our watches. Thus
consulting our stock of sentences is typically only the first step in deciding if some
sentence or its negation can be derived with the resources we have.
This difference between sentences and propositions matters as we now show.19
It has been suggested in this context (e.g. by Stalnaker 1999) that such issues
can be addressed by using the notion of fine grain. By this account, if I understand
it correctly, logical equivalence is too coarse a grain and that a finer grain may be
needed. So if two sentences are in the same fine grain and an agent knows or believes
one, then the agent will also know or believe the other. But if we try to flesh out this
metaphor then we can see that it is not going to work.
For instance if G(φ, ψ) means that φ and ψ are in the same grain, then G will
be an equivalence relation. But surely we cannot have transitivity in reality, because
we could have a sequence φ1, …, φn of sentences, any two successive ones of which
are easily seen to be equivalent, whereas it is quite hard to see the equivalence of φ1
and φn.
Moreover, “being in the same grain” sounds interpersonal. If two molecules are
in the same rice grain for you, then they are also in the same fine grain for me –
it is just a fact of the matter. But in reality people differ greatly in their ability to
perceive logical equivalences.
Thus suppose some set theorist thinks of some new axiom φ and wonders if φ
implies the continuum hypothesis, call it ψ. The set theorist may find it quite easy
to decide this question even if we see no resemblance between φ and ψ. And if
he does not find it easy, he may look in the literature or ask another set theorist,
processes which cannot easily be built into a formal theory. And they should not be!
For if they were easy to build into a formal theory, then the representation is almost
certain to be wrong.
Or a chess champion may be able to see 20 moves (40 half-moves) ahead in the
end game, but you and I cannot. And he too usually cannot do this during the middle
game. Thus context, habit and expertise matter a lot.
Suppose for instance that Lois Lane has invited Clark Kent to dinner but he has
not said yes or no. So she forms the plan, While I do not have a definite answer
one way or another, if I see Clark Kent, I will ask him if he is coming to dinner.
Here seeing Clark Kent is understood to consist of an observation followed by the
addition of the sentence “I am seeing Clark Kent” to her stock.
Suppose now that she sees Superman standing on her balcony. She will not ask
him if he is coming to dinner as the sentence “I am seeing Clark Kent” will not be in
19
We might compare entertaining a proposition as a bit like entering a building. If Ann and Bob
enter the same building through different doors, they need not be in the same spot, and indeed they
might never meet. But what makes it the same building is that they could meet without going out
into the street. Thus if Ann and Bob are apt to assent to sentences s, s′ respectively, where we know
that s, s′ are equivalent, then it need not follow that there is a sentence they share. But they could
through a purely deductive process, and without appealing to any additional facts, be brought to a
situation where Ann assents to s′ and Bob to s (unless one of them withdraws a belief, which may
also happen).
her stock of sentences. And this is the sense in which she does not know that when
she is seeing Superman, she is also seeing Clark Kent. If she suspects that Clark Kent
is Superman, then it may happen that her recursive call to the procedure “decide if
I am seeing Clark Kent” will take the form of the question, “Are you by any chance
Clark Kent, and if so, are you coming to dinner?” addressed to Superman.
Have we given up too much by using sentences and not propositions as the
objects of i-belief? Suppose her plan is, “If I see Jack and Jill, I will ask them to
dinner” but she sees Jill first and then Jack so that she adds the sentence “Jill and
Jack are in front of me” to her stock. Will this create a problem? Not so, because if a
sentence ψ is in her stock, ψ easily implies φ, and she needs to know the value of φ
so she can choose, then the program find out about φ which she calls will probably
find the sentence ψ she needs. If the program terminates without yielding an answer
she may well have a default action which she deems safest.
So here we make use of the fact that Lois does have a reasonable amount of
intelligence. Even if she does not explicitly add some sentence to her stock, when
she comes to a point where the choice of action depends on φ, she will ask if φ or
its negation is derivable from her present stock of sentences, possibly supplemented
by some actions which add to this stock.
Definition 3. If an agent a comes to a point in her plan where her appropriate action
is If φ then do α else do β, and she does α, then we will say that she i-believes φ.
If, moreover, φ is true, and we believe that in a similar context she would judge it
to be true only if it is true, then (within the context of this plan) we will say that she
i-knows φ.
A common example of such a plan is the plan to answer a question correctly. Thus
if an agent is asked “Is φ true?”, the agent will typically call the procedure “decide if
φ is true”, and then answer “yes”, “no”, or “I don’t know” in the appropriate cases.
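The control flow of such a plan, consult the stock first, spend a bounded amount of effort trying to derive the sentence or its negation, and otherwise fall back on "I don't know", can be pictured in a few lines. The fragment below is only a schematic illustration; the stock, the one-step derivation rules, and the resource budget are all invented for the example.

# Schematic sketch of answering "Is phi true?" for an i-believing agent.
# The stock and the derivation rules below are illustrative only.

def negate(s):
    return s[4:] if s.startswith('not ') else 'not ' + s

def derivable(goal, stock, rules, budget):
    """Try to derive `goal` from `stock` using one-premise rules
    (premise, conclusion), spending at most `budget` rounds."""
    known = set(stock)
    for _ in range(budget):
        if goal in known:
            return True
        new = {c for (p, c) in rules if p in known} - known
        if not new:
            return False
        known |= new
    return goal in known

def answer(phi, stock, rules, budget=3):
    """Answer 'Is phi true?' by consulting the stock and trying to derive phi or its negation."""
    if derivable(phi, stock, rules, budget):
        return 'yes'
    if derivable(negate(phi), stock, rules, budget):
        return 'no'
    return "I don't know"

stock = {'I am seeing Jill and Jack'}
rules = [('I am seeing Jill and Jack', 'I am seeing Jack')]
print(answer('I am seeing Jack', stock, rules))   # 'yes'
print(answer('It is raining', stock, rules))      # "I don't know"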
Now note that if an agent means to deceive, then the same procedure will be
called, but the answers given will be the opposite of the ones indicated by the
procedure. But if we ourselves know that the agent’s plan is to deceive, then we
can clearly take the statement “φ is false” to indicate that the agent believes φ.
We no longer have the law that if the agent i-knows φ and φ implies ψ then the
agent of necessity i-knows ψ. But if the agent has the resources to decide ψ and the
proof of ψ from φ is easy, then she might well also know ψ. But her notion of “easy”
may be different from ours, and how much effort she devotes to this task will depend
on her mood, how much energy she has, etc. Dennett (1985), who makes a related
point, does not seem to make the distinction between linguistic and non-linguistic
belief which I am making here.
For instance if a customer sits down on a bar stool and the bartender sees him,
we do not need to ask, “Does the bartender know he wants a drink?” Of course he
knows. But suppose a woman suffering from a persistent cough calls her husband
and says, “I got an appointment with the doctor for 2:30 PM”, she may later think,
“I wonder if he realized that I cannot pick up our daughter at 3”. When i knows φ
and ψ is deducible from φ, then whether we assume that i knows ψ will depend less
on some objective distance between φ and ψ than on what we know or can assume
about i’s habits.
Note now that we often tend to assign knowledge and beliefs to agents even when
they are not in the midst of carrying out a plan. Even when we are brushing our teeth
we are regarded as knowing that the earth goes around the sun. This can be explained
by a continuity assumption. Normally when we are asked if the earth goes around
the sun or vice versa, we say that the former is the case. An agent is then justified
in assuming that even in between different occasions of being so asked, if we were
asked, we would give the same answer. This assumption, which is usually valid, is
what accounts for such attributions of belief.
Conclusion
We have tried to give an account of belief which starts with behavior rather than with
some ad hoc logical system. There are many things to be worked out. For instance,
why do we perform deductions at all? They must benefit us in some way, but a
detailed logical account of how systems can evolve towards better logical acumen
is yet to be developed. Yet another question to ask is about partial beliefs. In some
circumstances these partial beliefs will be correlated with a probability function,
but again, a concordance with the Kolmogorov axioms will be a norm which is not
always observed in practice.
Acknowledgements We thank Sergei Artemov, Can Başkent, Samir Chopra, Horacio Arló Costa,
Juliet Floyd, Haim Gaifman, Isaac Levi, Mike Levin, Larry Moss, Eric Pacuit, Catherine Wilson,
and Andreas Witzel for comments. The information about chess came from Danny Kopec. This
research was supported by a grant from the PSC-CUNY faculty research assistance program.
Earlier versions of this paper were given at TARK-05, ESSLLI-2006, at the Jean Nicod Institute,
at a seminar in the philosophy department at Bristol University, and at the Philosophy Colloquium
at the City University Graduate Center. Some of the research for this paper was done when the
author was visiting Boston University and the Netherlands Institute for Advanced Study. A very
preliminary version of some of the ideas was presented at Amsterdam, and published as Parikh
(2001). This research was partially supported by grants from the PSC-CUNY program at the City
University of New York.
References
Alchourron, C., Gärdenfors, P., & Makinson, D. (1985). On the logic of theory change: Partial
meet contraction and revision functions. The Journal of Symbolic Logic, 50, 510–530.
Artemov, S., & Nogina, E. (2005). On epistemic logics with justifications. In R. van der Meyden (Ed.),
Theoretical aspects of rationality and knowledge (pp. 279–294). Singapore: University of
Singapore Press.
Aumann, R. (1976). Agreeing to disagree. Annals of Statistics, 4, 1236–1239.
van Benthem, J. (1976). Modal correspondence theory. Doctoral dissertation, University of
Amsterdam.
Bicchieri, C. (1997). Learning to co-operate. In C. Bicchieri, R. C. Jeffrey, & B. Skyrms (Eds.), The
dynamics of norms (pp. 17–46). Cambridge: Cambridge University Press.
Ramsey, F. P. (1990). Facts and propositions. In D. H. Mellor (Ed.), Philosophical papers (pp. 34–
51). Cambridge/New York: Cambridge University Press.
Ramsey, F. P. (1931). Truth and probability. In The foundations of mathematics (pp. 156–198).
London: Routledge and Kegan Paul.
Savage, L. J. (1954). The foundations of statistics. New York: Wiley.
Schwitzgebel, E. (2002). A phenomenal, dispositional account of belief. Noûs, 36(2), 249–275.
Schwitzgebel, E. (2006, Fall). Belief. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy
(Fall 2006 ed.). http://plato.stanford.edu/archives/fall2006/entries/belief/.
Searle, J. (1994). Animal minds. In P. French & H. Wettstein (Eds.), Philosophical naturalism,
midwest studies in philosophy (XIX, pp. 206–219). Notre Dame: University of Notre Dame
Press.
Stalnaker, R. (1999). Context and content. Oxford/New York: Oxford University Press.
de Waal, F. (2005). Our inner ape. Penguin. Singapore.
Whyte, J. T. (1990). Success semantics. Analysis, 50, 149–157.
Wittgenstein, L. (1958). Philosophical investigations. New York: MacMillan.
Chapter 32
The Logic of Justification
Sergei Artemov
Introduction
The resulting Epistemic Logic has been remarkably successful in terms of develop-
ing a rich mathematical theory and applications (cf. Fagin et al. 1995; Meyer and
van der Hoek 1995, and other sources). However, the notion of justification, which
has been an essential component of epistemic studies, was conspicuously absent
in the mathematical models of knowledge within the epistemic logic framework.
This deficiency is displayed most prominently in the Logical Omniscience defect
of the modal logic of knowledge (cf. Fagin and Halpern 1985, 1988; Hintikka 1975;
This work has been supported by NSF grant 0830450, CUNY Collaborative Incentive Research
Grant CIRG1424, and PSC CUNY Research Grant PSCREG-39-721.
S. Artemov ()
Graduate Center CUNY, 365 Fifth Avenue, New York, NY 10016, USA
e-mail: [email protected]
Moses 1988; Parikh 1987). In the provability domain, the absence of an adequate
description of the logic of justifications (here mathematical proofs) remained an
impediment to both formalizing the Brouwer-Heyting-Kolmogorov semantics of
proofs and providing a long-anticipated exact provability semantics for Gödel’s
provability logic S4 and intuitionistic logic (Artemov 1999, 2001, 2007; van Dalen
1986). This lack of a justification component has, perhaps, contributed to a certain
gap between epistemic logic and mainstream epistemology (Hendricks 2003, 2005).
We would like to think that Justification Logic is a step towards filling this void.
The contribution of this paper to epistemology can be briefly summarized as
follows.
We describe basic logical principles for justifications and relate them to both mainstream
and formal epistemology. The result is a long-anticipated mathematical notion of justi-
fication, making epistemic logic more expressive. We now have the capacity to reason
about justifications, simple and compound. We can compare different pieces of evidence
pertaining to the same fact. We can measure the complexity of justifications, which leads
to a coherent theory of logical omniscience. Justification Logic provides a novel, evidence-
based mechanism of truth-tracking which seems to be a key ingredient of the analysis of
knowledge. Finally, Justification Logic furnishes a new, evidence-based foundation for the
logic of knowledge, according to which
t is a justification of F: (32.3)
In this section, we will survey the Logic of Proofs, Gettier’s examples (Gettier
1963), and examine some classical post-Gettier sources to determine what logical
principles in the given Justification Logic format (propositional Boolean logic with
justification assertions t:F) may be extracted. As is usual with converting infor-
mally stated principles into formal ones, a certain amount of good will is required.
This does not at all mean that the considerations adduced in Dretske (1971),
Goldman (1967), Lehrer and Paxson (1969), Nozick (1981), and Stalnaker (1996)
may be readily formulated in the Boolean Justification Logic. The aforementioned
papers are written in natural language, which is richer than any formal one; a more
sophisticated formal language could probably provide a better account here, which
we leave to future studies.
The Logic of Proofs LP was suggested by Gödel in (1995) and developed in full
in Artemov (1995, 2001). LP gives a complete axiomatization of the notion of
mathematical proof with natural operations ‘application,’ ‘sum,’ and ‘proof checker.’
We discuss these operations below in a more general epistemic setting.
In LP, justifications are represented by proof polynomials, which are terms built
from proof variables x, y, z, … and proof constants a, b, c, … by means of two
binary operations: application ‘·’ and sum (union, choice) ‘+’, and one unary
operation proof checker ‘!’. The formulas of LP are those of propositional classical
logic augmented by the formation rule: if t is a proof polynomial and F a formula,
then t:F is again a formula.
The Logic of Proofs LP contains the postulates of classical propositional logic
and the rule of Modus Ponens along with
s:(F → G) → (t:F → (s·t):G) (Application)
s:F → (s+t):F, t:F → (s+t):F (Sum)
t:F → !t:(t:F) (Proof Checker)
t:F → F (Reflection).
Proof constants in LP represent ‘atomic’ proofs of axioms which are not analyzed
any further. In addition to the usual logical properties, such as being closed under
substitution and respecting the Deduction Theorem, LP enjoys the Internalization
property: if F is provable in LP, then so is p:F for some proof polynomial p.
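To make the term structure concrete, here is a small Python sketch, my own illustration rather than anything from the paper: it encodes proof polynomials as nested tuples and shows how the Application operation combines a justification of F → G with a justification of F into a justification of G.

# Illustrative encoding of LP proof polynomials; names and representation are
# invented for the example.

def var(x):    return ('var', x)      # proof variables x, y, z, ...
def const(c):  return ('const', c)    # proof constants a, b, c, ...
def app(s, t): return ('app', s, t)   # application s . t
def plus(s, t): return ('sum', s, t)  # sum s + t
def bang(t):   return ('!', t)        # proof checker !t

def apply_axiom(just_impl, just_ante):
    """Given the assertions s:(F -> G) and t:F (formulas written as strings of
    the form 'F -> G'), return the assertion (s.t):G, mirroring the Application
    axiom s:(F -> G) -> (t:F -> (s.t):G)."""
    s, impl = just_impl
    t, ante = just_ante
    antecedent, _, consequent = impl.partition(' -> ')
    assert antecedent == ante, "the justified premise must match the antecedent"
    return (app(s, t), consequent)

# x justifies 'F -> G' and y justifies 'F'; then x.y justifies 'G'.
print(apply_axiom((var('x'), 'F -> G'), (var('y'), 'F')))
# (('app', ('var', 'x'), ('var', 'y')), 'G')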
Gettier Examples
Gettier in Gettier (1963) described two situations, Case I and Case II, that
were supposed to provide examples of justified true beliefs which should not be
considered knowledge. In this paper we will focus on formalizing Case I, which
proved to be more challenging. Case II can be easily formalized in a similar fashion.
Here is a shortened exposition of Case I from Gettier (1963).
Suppose that Smith and Jones have applied for a certain job. And suppose that
Smith has strong evidence for the following conjunctive proposition:
(d) Jones is the man who will get the job, and Jones has ten coins in his pocket.
Proposition (d) entails:
(e) The man who will get the job has ten coins in his pocket.
Let us suppose that Smith sees the entailment from (d) to (e), and accepts (e) on
the grounds of (d), for which he has strong evidence. In this case, Smith is clearly
justified in believing that (e) is true. But imagine, further, that unknown to Smith, he
himself, not Jones, will get the job. And, also, unknown to Smith, he himself has ten
coins in his pocket. Then, all of the following are true:
(1) (e) is true,
(2) Smith believes that (e) is true, and
(3) Smith is justified in believing that (e) is true.
But it is equally clear that Smith does not know that (e) is true: : :.
Gettier uses a version of the epistemic closure principle, closure of justification
under logical consequence:
: : : if Smith is justified in believing P, : : : and Smith deduces Q from P : : :, then Smith is
justified in believing Q.
Goldman’s Reliabilism
Goldman’s principle makes it clear that a justified belief (in our language, a situation
in which t justifies F for some t) for an agent occurs only if F is true, which provides the
Factivity Axiom for ‘knowledge-producing’ justifications: t:F → F.
The Factivity Axiom is assumed for factive justifications (systems JT, LP, JT45
below) but not for general justification systems J, J4, J45, JD45.
With a certain amount of good will, we can assume that the ‘causal chain’
leading from the truth of F to a justified belief that F manifests itself in the Principle
of Internalization, which holds for many Justification Logic systems: if F is provable,
then t:F is provable for some justification term t.
Lehrer and Paxson in Lehrer and Paxson (1969) offered the following ‘indefeasibil-
ity condition’:
there is no further truth which, had the subject known it, would have defeated [subject’s]
present justification for the belief.
The ‘further truth’ here could refer to a possible update of the subject’s database,
or some possible-worlds situation, etc.: these readings lie outside the scope of our
language of Boolean Justification Logic. A natural reading of ‘further truth’ in
our setting could be ‘other postulate or assumption of the system,’ which means
a simple consistency property which vacuously holds for all Justification Logic
systems considered here. Another plausible reading of ‘further truth’ could be
‘further evidence,’ and we assume this particular reading here. Since there is no
temporal or update component in our language yet, ‘any further evidence’ could be
understood for now as ‘any other justification,’ or just ‘any justification.’
Furthermore, Lehrer and Paxson’s condition seems to involve a negation of an
existential quantifier over justifications ‘there is no further truth : : : ,’ or
there is no justification: : :.
However, within the classical logic tradition, we can read this as a universal
quantifier over justifications followed by a negation
given s:F, for any evidence t, it is not the case that t would have defeated s:F.
The next step is to formalize ‘t does not defeat s:F.’ This informal statement seems
to suggest an implication:
if s:F holds, then the joint evidence of s and t, which we denote here as s + t, is also
evidence for F, i.e., (s + t):F holds.
Here is the resulting formal version of Lehrer–Paxson’s condition: for any proposi-
tion F and any justifications s and t, the following holds:
s:F → (s + t):F. (32.11)
Further Assumptions
In order to build a formal account of justification, we will make some basic structural
assumptions: justifications are abstract objects which have structure, operations on
justifications are potentially executable, agents do not lose or forget justifications,
agents apply the laws of classical logic and accept their conclusions, etc.
In the following, we consider both: justifications, which do not necessarily yield
the truth of a belief, and factive justifications, which yield the truth of the belief.
Application
Note that the epistemological closure principle, which could be formalized using
the knowledge modality K as
K(F → G) → (KF → KG),
smuggles the logical omniscience defect into modal epistemic logic. The latter does
not have the capacity to measure how hard it is to attain knowledge (Fagin and
Halpern 1985, 1988; Hintikka 1975; Moses 1988; Parikh 1987). Justification Logic
provides natural means of escaping logical omniscience by keeping track of the size
of justification terms (Artemov and Kuznets 2006).
Monotonicity of Justification
The Monotonicity property of justification has been expressed by the operation sum
‘+,’ which can be read from (32.11). If s:F, then whichever evidence t occurs,
the combined evidence s + t remains a justification for F. Operation ‘+’ takes
justifications s and t and produces s + t, which is a justification for everything
justified by s or by t.
s:F → (s + t):F and s:F → (t + s):F.
A similar operation ‘+’ is present in the Logic of Proofs LP, where the sum ‘s + t’
can be interpreted as a concatenation of proofs s and t.
Correspondence Theorem 10 uses Monotonicity to connect Justification Logic
with epistemic modal logic. However, it is an intriguing challenge to develop
a theory of non-monotonic justifications which prompt belief revision. Some
Justification Logic systems without Monotonicity have been studied in Artemov
and Strassen (1993) and Krupski (2001, 2006).
1
More elaborate models considered below in this paper also use additional operations on
justifications, e.g., verifier ‘!’ and negative verifier ‘?’.
if x:A, y:B, …, z:C hold, then t:F.
J0 is able, with this capacity, to adequately emulate other Justification Logic systems
in its language.
The Logical Awareness principle states that logical axioms are justified ex officio:
an agent accepts logical axioms (including the ones concerning justifications) as
justified. As stated here, Logical Awareness is too restrictive and Justification Logic
offers a flexible mechanism of Constant Specifications to represent all shades of
logical awareness.
Justification Logic distinguishes between an assumption and a justified assump-
tion. Constants are used to denote justifications of assumptions in situations when
we don’t analyze these justifications any further. Suppose we want to postulate that
an axiom A is justified for a given agent. The way to say it in Justification Logic is
to postulate
e1:A
e2:(e1:A)
for the similar constant e2 with index 2, etc. Keeping track of indices is not
necessary, but it is easy and helps in decision procedures (cf. Kuznets 2008). The set
of all assumptions of this kind for a given logic is called a Constant Specification.
Here is a formal definition.
A Constant Specification CS for a given logic L is a set of formulas
en:en−1: … :e1:A (n ≥ 1),
en:en−1: … :e1:A ∈ CS,
then
en+1:en:en−1: … :e1:A ∈ CS.
en:en−1: … :e1:A ∈ CS.
We are reserving the name TCS for the total constant specification (for a given
logic). Naturally, the total constant specification is axiomatically appropriate.
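As a small illustration (my own example, with A standing for an arbitrary axiom of the logic), a constant specification might contain the chain

\[
e_1:A,\qquad e_2:e_1:A,\qquad e_3:e_2:e_1:A,\;\ldots
\]

each member recording that the preceding justified assumption is itself accepted as justified.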
Logic of Justifications with given Constant Specification:
JCS = J0 + CS.
Logic of Justifications:
J = J0 + R4,
Note that J0 is J∅, and J coincides with JTCS. The latter reflects the idea of the
unrestricted Logical Awareness for J. A similar principle appeared in the Logic
of Proofs LP; it has also been anticipated in Goldman (1967). Note
that any specific derivation in J may be regarded as a derivation in JCS for a
corresponding finite constant specification CS, hence finite CS’s constitute an
important representative class of constant specifications.
⊢ F ⇒ ⊢ KF (32.14)
applied to axioms.
Let us consider some basic examples of derivations in J. In Examples 1 and 2,
only constants of level 1 have been used; in such situations we skip indices
completely.
Example 1. This example shows how to build a justification of a conjunction from
justifications of the conjuncts. In the traditional modal language, this principle is
formalized as
□A ∧ □B → □(A ∧ B).
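A derivation of this principle in J can be sketched as follows (the constant a and the step labels are my own, chosen only for illustration):

\begin{align*}
1.&\quad A \to (B \to (A \wedge B)) &&\text{propositional axiom}\\
2.&\quad a{:}(A \to (B \to (A \wedge B))) &&\text{from 1, by Axiom Internalization}\\
3.&\quad x{:}A \to (a\cdot x){:}(B \to (A \wedge B)) &&\text{from 2, by Application and Modus Ponens}\\
4.&\quad x{:}A \to (y{:}B \to ((a\cdot x)\cdot y){:}(A \wedge B)) &&\text{from 3, by Application and propositional reasoning}
\end{align*}

so that $x{:}A \wedge y{:}B \to ((a\cdot x)\cdot y){:}(A \wedge B)$ is provable, the justification counterpart of $\Box A \wedge \Box B \to \Box(A \wedge B)$.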
□A ∨ □B → □(A ∨ B).
2
Dretske (2005).
We proceed in the spirit of the Red Barn Example and consider it a general test for
theories that explain knowledge. What we want is a way to represent what is going
on here which maintains epistemic closure,
one knows everything that one knows to be implied by what one knows, (32.15)
but also preserves the problems the example was intended to illustrate.
We present a plausible formal analysis of the Red Barn Example in epistemic
modal logic (sections “Red Barn in Modal Logic of Belief” and “Red Barn in Modal
Logic of Knowledge”) and in Justification Logic (sections “Red Barn in Justification
Logic of Belief” and “Red Barn in Justification Logic of Knowledge”). We will see
that epistemic modal logic is capable only of telling us that there is a problem,
whereas Justification Logic helps to analyse what has gone wrong. We see that
closure holds as it is supposed to, and we see that if we keep track of justifications
we can analyse why we had a problem.
In our first formalization, the logical derivation will be made in epistemic modal
logic with ‘my belief’ modality □. We then interpret some of the occurrences of □
as ‘knowledge’ according to the problem’s description. We will not try to capture
the whole scenario formally; to make our point, it suffices to formalize and verify
its “entailment” part. Let
• B be ‘the object in front of me is a barn,’
• R be ‘the object in front of me is red,’
• □ be ‘my belief’ modality.
The formulation considers observations ‘I see a barn’ and ‘I see a red barn,’ and
claims logical dependencies between them. The following is a natural formalization
of these assumptions in the epistemic modal logic of belief:
1. □B, ‘I believe that the object in front of me is a barn’;
2. □(B∧R), ‘I believe that the object in front of me is a red barn.’
At the metalevel, we assume that 2 is knowledge, whereas 1 is not knowledge by
the problem’s description. So, we could add factivity of 2, □(B∧R) → (B∧R), to
the formal description, but this would not matter for our conclusions. We note that
indeed 1 logically follows from 2 in the modal logic of belief K:
3. (B∧R) → B, logical axiom;
4. □[(B ∧ R) → B], from 3, by Necessitation. As a logical truth, this is a case of
knowledge too;
5. □(B∧R) → □B, from 4, by modal logic.
Within this formalization, it appears that Closure Principle (32.15) is violated: □(B∧
R) is knowledge by the problem’s description, □[(B ∧ R) → B] is knowledge as a
simple logical axiom, whereas □B is not knowledge.
Now we will use epistemic modal logic with ‘my knowledge’ modality K. Here is
a straightforward formalization of Red Barn Example assumptions:
1. ¬KB, ‘I do not know that the object in front of me is a barn’;
2. K(B∧R), ‘I know that the object in front of me is a red barn.’
It is easy to see that these assumptions are inconsistent in the modal logic of
knowledge. Indeed,
3. K(B∧R) → (KB∧KR), by normal modal logic;
4. KB∧KR, from 2 and 3, by Modus Ponens;
5. KB, from 4, by propositional logic.
Lines 1 and 5 formally contradict each other.
Modal logic of knowledge does not seem to apply here.
Justification Logic seems to provide a more fine-grained analysis of the Red Barn
Example. We naturally refine assumptions by introducing individual justifications u
for belief that B, and v for belief that B∧R. The set of assumptions in the Justification
Logic is
1. u:B, ‘u is the reason to believe that the object in front of me is a barn’;
2. v:(B ∧ R), ‘v is the reason to believe that the object in front of me is a red
barn.’ On the metalevel, the description states that this is a case of knowledge,
not merely a belief.
Again, we can add the factivity condition for 2, v:(B∧R) → (B∧R), but this does
not change the analysis here. Let us try to reconstruct the reasoning of the agent in
J:
3. (B∧R) → B, logical axiom;
4. a:[(B∧R) → B], from 3, by Axiom Internalization. This is also knowledge, as
before;
5. v:(B∧R) → (a·v):B, from 4, by Application and Modus Ponens;
6. (a·v):B, from 2 and 5, by Modus Ponens.
Closure holds! Instead of deriving 1 from 2 as in section “Red Barn in Modal Logic
of Belief”, we have obtained a correct conclusion that (a·v):B, i.e., ‘I know B
for reason a·v,’ which seems to be different from u: the latter is the result of a
perceptual observation, whereas the former is the result of logical reasoning. In
particular, we cannot conclude that 2, v:(B∧R), entails 1, u:B; moreover, with
some basic model theory of J in section “Basic Epistemic Semantics”, we can show
that 2 does not entail 1. Hence, after observing a red façade, I indeed know B, but
this knowledge does not come from 1, which remains a case of belief rather than of
knowledge.
the latter ‘knowledge.’ But what if we need to keep track of a larger number of
different unrelated reasons? By introducing a number of distinct modalities and
then imposing various assumptions governing the inter-relationships between these
modalities, one would essentially end up with a reformulation of the language of
Justification Logic itself (with distinct terms replaced by distinct modalities). This
suggests that there may not really be a ‘halfway point’ between the modal language
and the language of Justification Logic, at least inasmuch as one tries to capture the
essential structure of examples involving the deductive failure of knowledge (e.g.,
Kripke’s Red Barn example). Accordingly, one is either stuck with modal logic and
its inferior account of these examples or else moves to Justification Logic and its
superior account of these examples. This move can either come about by taking a
multi-modal language and imposing inter-dependencies on different modals–ending
up with something essentially equivalent to the language of Justification Logic–or
else one can use the language of Justification Logic from the start. Either way, all
there is to move to is Justification Logic.
The standard epistemic semantics for J has been provided by the proper adaptation
of Kripke-Fitting models (Fitting 2005) and Mkrtychev models (Mkrtychev 1997).
A Kripke-Fitting J-model M = (W, R, E, ⊩) is a Kripke model (W, R, ⊩)
enriched with an admissible evidence function E such that E(t, F) ⊆ W for any
justification t and formula F. Informally, E(t, F) specifies the set of possible worlds
where t is considered admissible evidence for F. The intended use of E is in the
truth definition for justification assertions:
u ⊩ t:F if and only if
1. F holds for all possible situations, that is, v ⊩ F for all v such that uRv;
2. t is an admissible evidence for F at u, that is, u ∈ E(t, F).
An admissible evidence function E must satisfy the closure conditions with respect
to operations ‘·’ and ‘+’:
• Application: E(s, F → G) ∩ E(t, F) ⊆ E(s·t, G). This condition states that
whenever s is an admissible evidence for F → G and t is an admissible evidence
for F, their ‘product,’ s·t, is an admissible evidence for G.
• Sum: E(s, F) ∪ E(t, F) ⊆ E(s + t, F). This condition guarantees that s + t is an
admissible evidence for F whenever either s is admissible for F or t is admissible
for F.
These are natural conditions to place on E because they are necessary for making
basic axioms of Application and Monotonicity valid.
We say that E(t, F) holds at a given world u if u ∈ E(t, F).
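The truth definition for t:F is simple enough to be executed directly. The toy model below is my own illustration; the worlds, accessibility relation, valuation, and admissible evidence function are invented for the example, and the code merely checks the two clauses of the definition.

# A toy Kripke-Fitting model: forcing of justification assertions t:F.
# Worlds, R, E, and the valuation are illustrative choices only.

W = {'u', 'v'}
R = {('u', 'u'), ('u', 'v'), ('v', 'v')}          # accessibility relation
VAL = {'u': {'F'}, 'v': {'F'}}                     # atomic facts true at each world
E = {('t', 'F'): {'u'}}                            # admissible evidence function

def forces_atom(w, p):
    return p in VAL[w]

def forces_justification(w, t, f):
    """w forces t:F iff (1) F holds at every w' with wRw', and (2) w is in E(t, F)."""
    clause1 = all(forces_atom(w2, f) for (w1, w2) in R if w1 == w)
    clause2 = w in E.get((t, f), set())
    return clause1 and clause2

print(forces_justification('u', 't', 'F'))   # True: F holds at u and v, and u is in E(t, F)
print(forces_justification('v', 't', 'F'))   # False: v is not in E(t, F)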
Given a model M = (W, R, E, ⊩), the forcing relation ⊩ is extended from
sentence variables to all formulas as follows: for each u ∈ W,
Γ ⊩ F if and only if F ∈ Γ.
There are several features of the canonical model which could be included into
the formulation of the Completeness Theorem to make it stronger.
Strong Evidence. We can show that the canonical model considered in this proof
satisfies the Strong Evidence property:
Γ ∈ E(t, F) implies Γ ⊩ t:F.
Note that for axiomatically appropriate constant specifications CS, the Internaliza-
tion property holds: if G is provable in JCS , then t W G is also provable there for
some term t. Here is the proof of the Fully Explanatory property for canonical
models.3 Suppose 6
t W F for any justification term t. Then the set ] [ f:Fg
is consistent. Indeed, otherwise for some t1 W X1 ; t2 W X2 ; : : : ; tn W Xn 2 ,
X1 ! .X2 ! : : : ! .Xn ! F/ : : :/ is provable. By Internalization, there is a
justification s such that s W .X1 ! .X2 ! : : : ! .Xn ! F/ : : :// is also provable. By
Application, t1 W X1 ! .t2 W X2 ! : : : ! .tn W Xn ! .st1t2 : : :tn / W F/ : : :/ is provable,
hence ` t W F for t D s t1 t2 : : : tn . Therefore,
t W F—a contradiction. Let
be a maximal consistent set extending ] [ f:Fg. By the definition of R, R,
by the Truth Lemma, 6
F, which contradicts the assumptions.
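For illustration only (this worked instance is not spelled out in the original proof), take n = 1: suppose t₁:X₁ ∈ Γ and X₁ → F is provable. Then:
1. ⊢ X₁ → F;
2. ⊢ s:(X₁ → F) for some term s, by Internalization;
3. ⊢ s:(X₁ → F) → (t₁:X₁ → (s·t₁):F), an instance of the Application axiom;
4. ⊢ t₁:X₁ → (s·t₁):F, from 2 and 3;
hence Γ ⊢ (s·t₁):F, since t₁:X₁ ∈ Γ.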
Mkrtychev semantics is a predecessor of Kripke-Fitting semantics (Mkrtychev
1997). Mkrtychev models are Kripke-Fitting models with a single world, and the
proof of Theorem 4 can be easily modified to establish completeness of JCS with
respect to Mkrtychev models.
Theorem 5. For any Constant Specification CS, JCS is sound and complete for the
class of Mkrtychev models respecting CS.
Proof. Soundness follows immediately from Theorem 4. For completeness, define
the canonical model as in Theorem 4 except for R, which should be taken empty.
³ This proof for LP was offered by Fitting in Fitting (2005).
This assumption makes the condition 'Δ ⊩ F for all Δ such that ΓRΔ' vacuously
true, and the forcing condition for justification assertions Γ ⊩ t:F becomes
equivalent to Γ ∈ E(t, F), i.e., t:F ∈ Γ. This simplification immediately verifies
the Truth Lemma.
The conclusion of the proof of Theorem 5 is standard. Let F be not derivable
in JCS. Then the set {¬F} is consistent. Using the standard saturation construction,
extend it to a maximal consistent set Γ containing ¬F. By consistency, F ∉ Γ. By
the Truth Lemma, Γ ⊮ F. The Mkrtychev model consisting of this particular Γ is
the desired counter-model for F. The rest of the canonical model is irrelevant.
Q.E.D.
Note that the Mkrtychev models built in Theorem 5 are not reflexive, and possess the
Strong Evidence property. On the other hand, Mkrtychev models cannot be Fully
Explanatory, since 'Δ ⊩ F for all Δ such that ΓRΔ' is vacuously true, but Γ ⊩ t:F
is not.
Theorem 5 shows that the information about Kripke structure in Kripke-Fitting
models can be completely encoded by the admissible evidence function. Mkrtychev
models play an important theoretical role in Justification Logic (Artemov 2008;
Brezhnev and Kuznets 2006; Krupski 2006; Kuznets 2000; Milnikel 2007). On the
other hand, as we will see in section “Formalization of Gettier Examples”, Kripke-
Fitting models can be useful as counter-models with desired properties since they
take into account both epistemic Kripke structure and evidence structure. Speaking
metaphorically, Kripke-Fitting models naturally reflect two reasons why a certain
fact F can be unknown to an agent: F fails at some possible world or the agent does
not have sufficient evidence of F.
Another application area of Kripke-Fitting style models is Justification Logic
with both epistemic modalities and justification assertions (cf. Artemov 2006;
Artemov and Nogina 2005).
Corollary 6 (Model existence). For any constant specification CS, JCS is consis-
tent and has a model.
Proof. JCS is consistent. Indeed, suppose JCS proves ⊥, and erase all justification
terms (with ':'s) in each of its formulas. What remains is a chain of propositional
formulas provable in classical logic (an easy induction on the length of the original
proof) ending with ⊥—contradiction.
To build a model for JCS, use the Completeness Theorem (Theorem 4). Since
JCS does not prove ⊥, by Completeness, there is a JCS-model (where ⊥ is false, of
course). Q.E.D.
Factivity
Factivity states that justifications of F are factive, i.e., sufficient for an agent to
conclude that F is true. This yields the Factivity Axiom
t:F → F, (32.16)
which has a similar motivation to the Truth Axiom in epistemic modal logic
KF → F. (32.17)
JT₀ = J₀ + A4,
JT = J + A4,
with
A4. Factivity Axiom t:F → F.
Systems JTCS corresponding to Constant Specifications CS are defined as in
section “Logical Awareness and Constant Specifications”.
JT-models are J-models with reflexive accessibility relations R. The reflexivity
condition makes each possible world accessible from itself which exactly corre-
sponds to the Factivity Axiom. The direct analogue of Theorem 3 holds for JTCS as
well.
Theorem 7. For any Constant Specification CS, each of the logics JTCS is sound
and complete with respect to the class of JT-models respecting CS.
Proof. We now proceed as in the proof of Theorem 4. The only addition to
soundness is establishing that the Factivity Axiom holds in reflexive models. Let
R be reflexive. Suppose u ⊩ t:F. Then v ⊩ F for all v such that uRv. By reflexivity
of R, uRu, hence u ⊩ F as well.
For completeness, it suffices to check that R in the canonical model is reflexive.
Indeed, if s:F ∈ Γ, then, by the properties of maximal consistent sets, F ∈ Γ
as well, since JT derives s:F → F (with any CS). Hence Γ♯ ⊆ Γ and ΓRΓ. Q.E.D.
Mkrtychev JT-models are singleton JT-models, i.e., JT-models with singleton
W’s.
Theorem 8. For any Constant Specification CS, each of the logics JTCS is sound
and complete with respect to the class of Mkrtychev JT-models respecting CS.
Proof. Soundness follows from Theorem 7. For completeness, we follow the
footprints of Theorems 4 and 5, but define the accessibility relation R as
ΓRΔ iff Γ = Δ.
Q.E.D.
As in the Red Barn Example (section "Red Barn Example and Tracking Justifications"),
we have to handle a wrong reason for a true justified fact. Again, the tools
of Justification Logic seem to be useful and adequate here.
Let B stand for the sentence 'the late Prime Minister's last name began with the letter B.'
Furthermore, let w be a wrong reason for B and r the right (hence factive) reason for
B. Then, Russell’s example yields the following assumptions:
{w:B, r:B, r:B → B}. (32.18)
⁴ Which was common knowledge back in 1912.
However, this derivation utilizes the fact that r is a factive justification for B to
conclude w:B → B, which constitutes the case of 'induced factivity' of w:B.
The question is, how can we distinguish the 'real' factivity of r:B from an
'induced factivity' of w:B? Again, some sort of truth-tracking is needed here, and
Justification Logic seems to do the job. The natural approach would be to consider
the set of assumptions (32.18) without r:B, i.e.,
{w:B, r:B → B}, (32.19)
and establish that the factivity of w, i.e., w:B → B, is not derivable from (32.19). Here
is a J-model M = (W, R, E, ⊩) in which (32.19) holds but w:B → B does not:
W = {0}, R = ∅, 0 ⊮ B, and E(t, F) holds for all pairs (t, F) except (r, B). It is
easy to see that the closure conditions Application and Sum on E are fulfilled. At 0,
w:B holds, that is,
0 ⊩ w:B, 0 ⊮ r:B, 0 ⊩ r:B → B,
so (32.19) holds at 0, whereas, since 0 ⊮ B,
0 ⊮ w:B → B.
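These facts can also be checked mechanically. The following sketch, added here for illustration and not part of the original text, encodes the one-world model just described and evaluates the four formulas by the same forcing clauses as in section "Basic Epistemic Semantics"; the encoding and names are assumptions.

    # One-world J-model refuting the 'induced factivity' of w:B.
    R = set()                                       # empty accessibility relation
    ATOMS = set()                                   # 0 does not force B: no atom holds at 0
    def E(t, F):                                    # admissible for every pair except (r, B)
        return not (t == 'r' and F == ('var', 'B'))

    def forces(u, F):
        if F[0] == 'var':
            return (u, F[1]) in ATOMS
        if F[0] == 'imp':
            return (not forces(u, F[1])) or forces(u, F[2])
        if F[0] == 'just':                          # t:G: G at all R-successors, and E(t, G)
            t, G = F[1], F[2]
            return all(forces(v, G) for (x, v) in R if x == u) and E(t, G)

    B = ('var', 'B')
    for F in [('just', 'w', B),                     # 0 forces w:B
              ('just', 'r', B),                     # 0 does not force r:B
              ('imp', ('just', 'r', B), B),         # 0 forces r:B -> B
              ('imp', ('just', 'w', B), B)]:        # 0 does not force w:B -> B
        print(forces(0, F))                         # True, False, True, False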
In this section, we discuss other principles and operations which may or may not be
added to the core Justification Logic systems.
Positive Introspection
KF → KKF.
This principle has an adequate explicit counterpart: the fact that the agent accepts t
as sufficient evidence of F serves as sufficient evidence that t:F. Often, such
meta-evidence has a physical form, e.g., a referee report certifying that a proof of
a paper is correct, a computer verification output given a formal proof t of F as an
input, a formal proof that t is a proof of F, etc. Positive Introspection assumes that
given t, the agent produces a justification !t of t:F such that
t:F → !t:(t:F).
Positive Introspection in this operational form first appeared in the Logic of Proofs
LP (Artemov 1995, 2001). A similar suggestion was made by Gödel (1995).
We define
J4 = J + A5
and
LP = JT + A5,⁵
with
A5. Positive Introspection Axiom t:F → !t:(t:F).
We also define J4₀, J4CS, LP₀, and LPCS in the natural way (cf. section "Logical
Awareness and Constant Specifications"). The direct analogue of Theorem 3 holds
for J4CS and LPCS as well.
Note that in the presence of the Positive Introspection Axiom, one could limit the
scope of the Axiom Internalization Rule R4 to internalizing axioms which are not
yet of the form e:A. This is how it has been done in LP: the Axiom Internalization
can then be emulated by using !!e:(!e:(e:A)) instead of e₃:(e₂:(e₁:A)), etc.
The notion of Constant Specification could also be simplified accordingly.
Such modifications are minor and they do not affect the main theorems and
applications of Justification Logic.
Negative Introspection
Pacuit and Rubtsova considered in Pacuit (2005, 2006) and Rubtsova (2005, 2006)
the Negative Introspection operation '?' which verifies that a given justification
assertion is false. A possible motivation for considering such an operation could
be that the positive introspection operation '!' may well be regarded as capable
⁵ In our notation, LP can be assigned the name JT4. However, in virtue of the fundamental role
played by LP in Justification Logic, we suggest keeping the name LP for this system.
J45 = J4 + A6,
JD45 = J45 + ¬t:⊥,
JT45 = J45 + A4,
⁶ A proof-compliant way to represent negative introspection in Justification Logic was suggested
in Artemov et al. (1999), but we will not consider it here.
Note that J45-models satisfy the Stability property: uRv yields 'u ∈ E(t, F)
iff v ∈ E(t, F).' In other words, E is monotone with respect to R⁻¹ as well.
Indeed, the direction 'u ∈ E(t, F) yields v ∈ E(t, F)' is due to Monotonicity.
Suppose u ∉ E(t, F). By Negative Introspection closure, u ∈ E(?t, ¬t:F). By
Strong Evidence, u ⊩ ?t:(¬t:F). By the definition of forcing, v ⊩ ¬t:F, i.e.,
v ⊮ t:F. By Strong Evidence, v ∉ E(t, F).
Note also that the Euclidean property of the accessibility relation R is not
required for J45-models and is not needed to establish the soundness of J45
with respect to J45-models. However, the canonical model for J45 is Euclidean,
hence both soundness and completeness claims trivially survive an additional
requirement that R is Euclidean.
• JD45-models are J45-models with the Serial condition on the accessibility
relation R: for each u there is v such that uRv holds.
• JT45-models are J45-models with reflexive R. Again, the Euclidean property
(or, equivalently, symmetry) of R is not needed for soundness. However, these
properties hold for the canonical JT45-model, hence they could be included into
the formulation of the Completeness Theorem.
Theorem 9. Each of the logics J4CS , LPCS , J45CS , JT45CS for any Constant
Specification is sound and complete with respect to the corresponding class of
epistemic models. JD45CS is complete w.r.t. its epistemic models for axiomatically
appropriate CS.
Proof. We will follow the footprints of the proof of Theorem 4.
1. J4. For soundness, it now suffices to check the validity of the Positive Introspection
Axiom at each node of any J4-model. Suppose u ⊩ t:F. Then u ∈ E(t, F)
and v ⊩ F for each v such that uRv. By the closure condition, u ∈ E(!t, t:F),
and it remains to check that v ⊩ t:F. By monotonicity of E, v ∈ E(t, F). Now,
take any w such that vRw. By transitivity of R, uRw as well, hence w ⊩ F. Thus
v ⊩ t:F, u ⊩ !t:(t:F), and u ⊩ t:F → !t:(t:F).
Completeness is again established as in Theorem 4. It only remains to check
that the accessibility relation R is transitive, the admissible evidence function E
is monotone, and the additional closure condition on E holds.
Monotonicity. Suppose ΓRΔ and Γ ∈ E(t, F), i.e., t:F ∈ Γ. By maximality
of Γ, !t:(t:F) ∈ Γ as well, since J4 ⊢ t:F → !t:(t:F). By definition,
t:F ∈ Δ, i.e., Δ ∈ E(t, F).
Transitivity. Suppose ΓRΔ, ΔRΣ, and t:F ∈ Γ. Then, by monotonicity,
t:F ∈ Δ. By the definition of R, F ∈ Σ, hence ΓRΣ.
Closure. Suppose Γ ∈ E(t, F), i.e., t:F ∈ Γ. Then as above, !t:(t:F) ∈ Γ,
hence Γ ∈ E(!t, t:F).
2. LP. This is the well-studied case of the Logic of Proofs (cf. Fitting 2005).
3. J45. Soundness. We have to check the Negative Introspection Axiom. Let u ⊩
¬t:F, i.e., u ⊮ t:F. By the Strong Evidence condition, u ∉ E(t, F). By Negative
Introspection closure, u ∈ E(?t, ¬t:F). By Strong Evidence, u ⊩ ?t:(¬t:F).
⁷ Brezhnev (2000) also considered variants of Justification Logic systems which, in our notations,
would be called "JD" and "JD4."
for some x, x:F.
The language of Justification Logic does not have quantifiers over justifications, but
instead has a sufficiently rich system of operations (polynomials) on justifications.
We can use Skolem’s idea of replacing quantifiers by functions and view Justifica-
tion Logic systems as Skolemized logics of knowledge/belief. Naturally, to convert a
Justification Logic sentence to the corresponding Epistemic Modal Logic sentence,
one can use the forgetful projection '↝' that replaces each occurrence of t:F
by KF.
Example: the sentence
x:P → f(x):Q
has the forgetful projection KP → KQ. Further examples:
t:P → P ↝ KP → P,
t:P → !t:(t:P) ↝ KP → KKP,
s:(P → Q) → (t:P → (s·t):Q) ↝ K(P → Q) → (KP → KQ).
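As a syntactic operation, the forgetful projection is a one-line recursion on formulas. The sketch below is an added illustration (the encoding and names are assumptions); the modality is written K, as in the axioms quoted earlier in this chapter.

    # Forgetful projection: replace every subformula t:G by K G'.
    # Formulas: ('var', p), ('imp', F, G), ('just', term, F); output may contain ('K', F).
    def project(F):
        if F[0] == 'var':
            return F
        if F[0] == 'imp':
            return ('imp', project(F[1]), project(F[2]))
        if F[0] == 'just':                 # t:G becomes K applied to the projection of G
            return ('K', project(F[2]))
        raise ValueError(F)

    # t:P -> P  projects to  KP -> P
    print(project(('imp', ('just', 't', ('var', 'P')), ('var', 'P'))))
    # s:(P->Q) -> (t:P -> (s.t):Q)  projects to  K(P->Q) -> (KP -> KQ)
    app = ('imp', ('just', 's', ('imp', ('var', 'P'), ('var', 'Q'))),
                  ('imp', ('just', 't', ('var', 'P')),
                          ('just', ('app', 's', 't'), ('var', 'Q'))))
    print(project(app))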
The Correspondence Theorem shows that the major epistemic modal logics K,
K4, K45, KD45 (for belief) and T, S4, S5 (for knowledge) have exact Justification
Logic counterparts J, J4, J45, JD45 (for partial justifications) and JT, LP, JT45
(for factive justifications).
Is there anything new that we have learned from the Correspondence Theorem about
epistemic modal logics?
First of all, this theorem provides a new semantics for major modal logics. In
addition to the traditional Kripke-style 'universal' reading of KF as
Perhaps the justification semantics plays a similar role in modal logic to that
played by Kleene realizability in intuitionistic logic. In both cases, the intended
semantics was existential: the Brouwer-Heyting-Kolmogorov interpretation of
intuitionistic logic (Heyting 1934; Troelstra and van Dalen 1988; van Dalen 1986)
and Gödel’s provability reading of S4 (Gödel 1933; Gödel 1995). In both cases, a
later possible-world semantics of universal character became a highly potent and
dominant technical tool. However, in both cases, Kripke semantics did not solve the
which does not seem to have an epistemically acceptable explicit version. Let us
consider, for example, a case when F is the propositional constant ⊥ for false. A
Skolem-style reading of (32.20) suggests that there are justification terms s and t
such that
x:(s:⊥ → ⊥) → t:⊥. (32.21)
⁸ To be precise, we have to substitute c for x everywhere in s and t.
The same holds for any qfJCS with an axiomatically appropriate constant specifica-
tion CS.
Proof. Taking into account Example 1, it suffices to establish that for some t(u),
(f = g) → [P(f) → P(g)].
An unjustified substitution can fail in qfJ. Namely, for any individual variables x
and y, a predicate symbol P, and justification term s, the formula
x = y → s:[P(x) ↔ P(y)] (32.22)
is not valid. To establish this, one needs some model theory for qfJ.
We define qfJ-models as the usual first-order Kripke models⁹ equipped with
admissible evidence functions. A model is (W, {Dw}, R, E, ⊩) such that the
following properties hold.
• W is a nonempty set of worlds.
• {Dw} is the collection of nonempty domains Dw for each w ∈ W.
• R is the binary (accessibility) relation on W.
• E is the admissible evidence function which, for each justification term t and
formula F, returns the set of worlds E(t, F) ⊆ W. Informally, these are the worlds
⁹ Equality is interpreted as identity in the model.
where t is admissible evidence for F. We also assume that E satisfies the usual
closure properties Application and Sum (section "Basic Epistemic Semantics").
• ⊩ is the forcing (truth) relation such that
– ⊩ assigns elements of Dw to individual variables and constants for each w ∈ W,
– for each n-ary predicate symbol P and any a₁, a₂, ..., aₙ ∈ Dw, it is specified
whether P(a₁, a₂, ..., aₙ) holds in Dw,
– ⊩ is extended to all formulas by stipulating that
w ⊩ s = t iff '⊩' maps s and t to the same element of Dw,
w ⊩ P(t₁, t₂, ..., tₙ) iff '⊩' maps the tᵢ's to aᵢ's and P(a₁, a₂, ..., aₙ) holds in Dw,
w ⊩ F ∧ G iff w ⊩ F and w ⊩ G,
w ⊩ ¬F iff w ⊮ F,
w ⊩ t:F iff v ⊩ F for all v such that wRv, and w ∈ E(t, F).
The notion of a model respecting a given constant specification is directly transferred
from section "Basic Epistemic Semantics".
The following Theorem is established in the same manner as the soundness part
of Theorem 4.
Theorem 13. For any Constant Specification CS, qfJCS is sound with respect to the
corresponding class of epistemic models.
We are now ready to show that instances of unjustified substitution can fail in
qfJ. To do this, it now suffices to build a qfJ-counter-model for (32.22) with the
total constant specification. Obviously, the maximal E (i.e., E(t, F) contains each
world for any t and F) respects any constant specification.
The Kripke-Fitting counter-model in Fig. 32.1 exploits the traditional modal
approach to refute a belief assertion by presenting a possible world where the object
of this belief does not hold. In the picture, only true atomic formulas are shown next
to possible worlds.
• W = {0, 1}; R = {(0, 1)}; D₀ = D₁ = {a, b};
• 1 ⊩ P(a) and 1 ⊮ P(b); the truth value of P at 0 does not matter;
• x and y are interpreted as a at 0; x is interpreted as a and y as b at 1;
• E is maximal at 0 and 1.
Since both x and y are interpreted as a at 0, we have 0 ⊩ x = y. However, 1 ⊩ P(x)
and 1 ⊮ P(y), so 1 ⊮ P(x) ↔ P(y), hence 0 ⊮ s:[P(x) ↔ P(y)]. Therefore
0 ⊮ x = y → s:[P(x) ↔ P(y)].
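The model of Fig. 32.1 is small enough to be spelled out in code. The sketch below is an added illustration with assumed names and encodings; it represents the non-rigid interpretation of the variables x and y and confirms that the formula fails at world 0.

    # Two-world qfJ counter-model for  x = y -> s:[P(x) <-> P(y)]  (cf. Fig. 32.1).
    R = {(0, 1)}
    ASSIGN = {0: {'x': 'a', 'y': 'a'},       # x and y both denote a at 0
              1: {'x': 'a', 'y': 'b'}}       # x denotes a, y denotes b at 1
    P_TRUE = {(1, 'a'), (0, 'a'), (0, 'b')}  # 1 forces P(a) but not P(b); P at 0 does not matter
    def E(t, F):                             # maximal admissible evidence function
        return True

    def forces(w, F):
        if F[0] == 'eq':
            return ASSIGN[w][F[1]] == ASSIGN[w][F[2]]
        if F[0] == 'P':
            return (w, ASSIGN[w][F[1]]) in P_TRUE
        if F[0] == 'iff':
            return forces(w, F[1]) == forces(w, F[2])
        if F[0] == 'imp':
            return (not forces(w, F[1])) or forces(w, F[2])
        if F[0] == 'just':                   # t:G: G at all R-successors, and E(t, G)
            t, G = F[1], F[2]
            return all(forces(v, G) for (x, v) in R if x == w) and E(t, G)

    formula = ('imp', ('eq', 'x', 'y'),
                      ('just', 's', ('iff', ('P', 'x'), ('P', 'y'))))
    print(forces(0, formula))                # False: the formula fails at world 0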
We consider Gettier’s Case I in detail; Case II is much simpler logically and can be
given similar treatment. We will present a complete formalization of Case I in qfJ
with a definite description operation. Let
• J(x) be the predicate x gets the job;
• C(x) be the predicate x has (ten) coins (in his pocket);
• Jones and Smith be individual constants denoting Jones and Smith, respectively¹⁰;
• u be a justification variable.
In this section, we will formalize Case I using a definite description ι-operation such
that ιxP(x) is intended to denote the unique x satisfying P(x).
We interpret ιxP(x) in a given world of a qfJ-model as the element a such that P(a)
if there exists a unique a satisfying P(a). Otherwise, ιxP(x) is undefined and any
atomic formula where ιxP(x) actually occurs is taken to be false. Definite description
terms are non-rigid designators: ιxP(x) may be given different interpretations in
different worlds of the same qfJ-model (cf. Fitting 2007). The use of the definite
description 'the man who will get the job'
as a justified belief by Smith hints that Smith has strong evidence for the fact that at
most one person will get the job. This is implicit in Gettier’s assumption.
¹⁰ Assuming that there are people seeking the job other than Jones and Smith does not change the
analysis.
Jones is the man who will get the job, and Jones has coins
for which Smith has strong evidence. In addition, Smith has no knowledge of
'Smith has coins,' and there should be a possible world at which C(Smith) is
false; we use 1 to represent this possibility.
3. World 1 is accessible from 0.
4. Smith has strong evidence of (d), which we will represent by introducing a
justification variable u such that the justification assertion (32.24) holds at the
actual world 0. We further assume that the admissible evidence function E
respects the justification assertion (32.24).
To keep things simple, we can assume that E is the maximal admissible evidence
function, i.e., E(t, F) = {0, 1} for each t, F.
These observations lead to the following model M in Fig. 32.2.
[Fig. 32.2: the model M for Case I; at the actual world 0, J(Smith), C(Jones), and C(Smith) hold, and E is maximal]
¹¹ Strictly speaking, Case I explicitly states only that Smith has strong evidence that C(Jones),
which is not sufficient to conclude that C(Jones), since Smith's justifications are not necessarily
factive. However, since the actual truth value of C(Jones) does not matter in Case I, we assume
that in this instance Smith's belief that C(Jones) was true.
It follows from the Soundness Theorem 13 that assumptions (32.25) provide a sound
description of the actual world:
Proposition 14. qfJ + (32.25) ⊢ F entails 0 ⊩ F.
Example 15. The description of a model by (32.25) is not complete. For example,
conditions (32.25) do not specifically indicate whether t:C(Smith) holds at the
actual world for some t, whereas it is clear from the model that 0 ⊮ t:C(Smith) for
any t, since 1 ⊮ C(Smith) and 1 is accessible from 0. Model M extends the set of
assumptions (32.25) to a possible complete specification: every ground proposition
F in the language of this example is either true or false at the ‘actual’ world 0 of the
model.
Gettier’s conclusion in Case I states that Smith is justified in believing that ‘The
man who will get the job has ten coins in his pocket.’ In our formal language, this
amounts to a statement that for some justification term t,
t:C(ιxJ(x)) (32.26)
0 ⊩ (s·u):C(ιxJ(x)).
Q.E.D.
We can eliminate definite descriptions from Case I using, e.g., Russell’s translation
(cf. Fitting and Mendelsohn 1998; Neale 1990; Russell 1905, 1919) of definite
descriptions. According to Russell, C(ιxJ(x)) contains a hidden uniqueness assumption
and reads as
F(Jones) ∧ F(Smith),
G(Jones) ∨ G(Smith).
Taking into account all of these simplifying observations, we may assume that for
Smith (and the reader), ∀y(J(y) → y = Jones) reads as
which is equivalent¹² to
¬J(Smith).
J(Jones) ∧ ¬J(Smith),
The assumption that (d) is justified for Smith can now be represented by
'the man who will get the job has coins,' (32.31)
∀y[J(y) → (y = Jones)]
is equivalent to
¬J(Smith),
and
∀y[J(y) → (y = Smith)]
¹² We assume that everybody is aware that Smith ≠ Jones.
is equivalent to
¬J(Jones).
Theorem 17. Gettier’s claim (32.34) is derivable in qfJ from the assump-
tion (32.30) of Case I, and holds in the ‘actual world’ 0 of the natural model
M of Case I.
Proof. After all the preliminary work and assumptions, there is not much left to
do. We just note that (32.29) is a disjunct of (32.33). A derivation of (32.34)
from (32.30) in qfJ reduces now to repeating steps of Example 2, which shows
how to derive a justified disjunction from its justified disjunct. Q.E.D.
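Example 2 itself is not reproduced in this excerpt; the pattern it refers to can be reconstructed as follows, assuming that the propositional axiom A → (A ∨ B) is internalized by some constant c of the Constant Specification (via the Axiom Internalization Rule R4):
1. t:A (the justified disjunct);
2. c:(A → (A ∨ B)), by R4;
3. c:(A → (A ∨ B)) → (t:A → (c·t):(A ∨ B)), an instance of the Application axiom;
4. t:A → (c·t):(A ∨ B), from 2 and 3;
5. (c·t):(A ∨ B), from 1 and 4: the disjunction is justified by the term c·t.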
Comment 18. One can see clearly the essence of Gettier’s example. In (32.33),
one of two disjuncts is justified but false, whereas the other disjunct is unjustified
but true. The resulting disjunction (32.33) is both justified and true, but not really
known to Smith.
Jones will get the job, and Jones has ten coins in his pocket, (32.35)
A man who will get the job has ten coins in his pocket. (32.36)
[Fig. 32.3: the qfJ-model used here; at world 0, J(Smith), C(Jones), and C(Smith) hold, and E is maximal]
It now suffices to build a Fitting qfJ-model (Fig. 32.3) where (32.41) does not hold
at a certain world.
At 0, all assumptions 1–4 hold, but (32.40) is false at 0 for all t’s. Indeed, (32.39) is
false at 1, since its conjunct
J(Smith) → C(Smith)
In this subsection, we show that references to coins and pockets, as well as definite
descriptions, are redundant for making the point of Gettier's Case I. Here is
a simpler, streamlined case based on the same material.
Smith has strong evidence for the proposition:
(d) Jones will get the job.
Proposition (d) entails:
(e) Either Jones or Smith will get the job.
Let us suppose that Smith sees the entailment from (d) to (e), and accepts (e) on
the grounds of (d), for which he has strong evidence. In this case, Smith is clearly
justified in believing that (e) is true. But imagine further that unknown to Smith, he
himself, not Jones, will get the job. Then
(1) (e) is true,
(2) Smith believes that (e) is true, and
(3) Smith is justified in believing that (e) is true.
But it is equally clear that Smith does not know that (e) is true...
In this version, the main assumption is
Smith has strong evidence that Jones gets the job. (32.42)
v:J(Jones). (32.43)
Smith is justified in believing that either Jones or Smith will get the job. (32.44)
[Fig. 32.4: the model for the streamlined version of Case I; at the actual world 0, J(Smith) holds, and E is maximal]
The desired Gettier-style point is made on the same material but without the
unnecessary use of quantifiers, definite descriptions, coins, and pockets (Fig. 32.4).
It is fair to note, however, that Gettier example Case II in Gettier (1963) does not
have these kinds of redundancies and is logically similar to the streamlined version
of Case I presented above.
The question is, what have we learned about Justification, Belief, Knowledge,
and other epistemic matters?
Within the domain of formal epistemology, we now have basic logical machinery
to study justifications and their connections with Belief and Knowledge. Formalizing
Gettier is a case study that demonstrates the method.
We have shown that Gettier's reasoning was formally correct, with some hidden assumptions
related to definite descriptions. Gettier examples belong to the area of
Justification Logic dealing with partial justifications and are inconsistent within
Justification Logic systems of factive justifications and knowledge. All this, perhaps,
does not come as a surprise to epistemologists. However, these observations show
that models provided by Justification Logic behave in a reasonable manner.
For epistemology, these developments are furthering the study of justification,
e.g., the search for the ‘fourth condition’ of the JTB definition of knowledge.
Justification Logic provides systematic examples of epistemological principles such
as Application, Monotonicity, Logical Awareness, and their combinations, which
look plausible, at least, within the propositional domain. Further discussion on these
and other Justification Logic principles could be an interesting contribution to this
area.
Conclusions
Justification Logic extends the logic of knowledge by the formal theory of justi-
fication. Justification Logic has roots in mainstream epistemology, mathematical
logic, computer science, and artificial intelligence. It is capable of formalizing a
significant portion of reasoning about justifications. In particular, we have seen
how to formalize Kripke, Russell, and Gettier examples in Justification Logic. This
formalization has been used for the resolution of paradoxes, verification, hidden
assumption analysis, and eliminating redundancies.
Among other known applications of Justification Logic, so far there are
• intended provability semantics for Gödel’s provability logic S4 with the Com-
pleteness Theorem (Artemov 1995, 2001);
• formalization of Brouwer-Heyting-Kolmogorov semantics for intuitionistic
propositional logic with the Completeness Theorem (Artemov 1995, 2001);
• a general definition of the Logical Omniscience property, rigorous theorems that
evidence assertions in Justification Logic are not logically omniscient (Artemov
and Kuznets 2006). This provides a general framework for treating the problem
of logical omniscience;
• an evidence-based approach to Common Knowledge (so-called Justified Com-
mon Knowledge) which provides a rigorous semantics to McCarthy’s ‘any
fool knows’ systems (Antonakos 2007; Artemov 2006; McCarthy et al. 1978).
Justified Common Knowledge offers formal systems which are less restrictive
than the usual epistemic logics with Common Knowledge (Artemov 2006).
• analysis of Knower and Knowability paradoxes (Dean and Kurokawa 2007,
2010).
It remains to be seen to what extent Justification Logic can be useful for analysis
of empirical, perceptual, and a priori types of knowledge. From the perspective
of Justification Logic, such knowledge may be considered as justified by constants
(i.e., atomic justifications). Apparently, further discussion is needed here.
Acknowledgements The author is very grateful to Walter Dean, Mel Fitting, Vladimir Krupski,
Roman Kuznets, Elena Nogina, Tudor Protopopescu, and Ruili Ye, whose advice helped with this
paper. Many thanks to Karen Kletter for editing this text. Thanks to audiences at the CUNY
Graduate Center, Bern University, the Collegium Logicum in Vienna, and the 2nd International
Workshop on Analytic Proof Systems for comments on earlier versions of this paper. This work has
been supported by NSF grant 0830450, CUNY Collaborative Incentive Research Grant CIRG1424,
and PSC CUNY Research Grant PSCREG-39-721.
References
Antonakos, E. (2007). Justified and common knowledge: Limited conservativity. In S. Artemov &
A. Nerode (Eds.), Logical Foundations of Computer Science. International Symposium, LFCS
2007, Proceedings, New York, June 2007 (Lecture notes in computer science, Vol. 4514, pp. 1–
11). Springer.
Artemov, S. (1995). Operational modal logic. Technical report MSI 95-29, Cornell University.
Artemov, S. (1999). Understanding constructive semantics. In Spinoza Lecture for European
Association for Logic, Language and Information, Utrecht, Aug 1999.
Artemov, S. (2001). Explicit provability and constructive semantics. Bulletin of Symbolic Logic,
7(1), 1–36.
Artemov, S. (2006). Justified common knowledge. Theoretical Computer Science, 357(1–3), 4–22.
Artemov, S. (2007). On two models of provability. In D. M. Gabbay, M. Zakharyaschev, &
S. S. Goncharov (Eds.), Mathematical problems from applied logic II (pp. 1–52). New York:
Springer.
Artemov, S. (2008). Symmetric logic of proofs. In A. Avron, N. Dershowitz, & A. Rabinovich
(Eds.), Pillars of computer science, essays dedicated to Boris (Boaz) Trakhtenbrot on the
occasion of his 85th birthday (Lecture notes in computer science, Vol. 4800, pp. 58–71).
Berlin/Heidelberg: Springer.
Artemov, S., & Beklemishev, L. (2005). Provability logic. In D. Gabbay & F. Guenthner (Eds.),
Handbook of philosophical logic (2nd ed., Vol. 13, pp. 189–360). Dordrecht: Springer.
Artemov, S., Kazakov, E., & Shapiro, D. (1999). Epistemic logic with justifications. Technical
report CFIS 99-12, Cornell University.
Artemov, S., & Kuznets, R. (2006). Logical omniscience via proof complexity. In Computer
Science Logic 2006, Szeged (Lecture notes in computer science, Vol. 4207, pp. 135–149).
Artemov, S., & Nogina, E. (2004). Logic of knowledge with justifications from the provability
perspective. Technical report TR-2004011, CUNY Ph.D. Program in Computer Science.
Artemov, S., & Nogina, E. (2005). Introducing justification into epistemic logic. Journal of Logic
and Computation, 15(6), 1059–1073.
Artemov, S., & Strassen, T. (1993). Functionality in the basic logic of proofs. Technical report
IAM 93-004, Department of Computer Science, University of Bern, Switzerland.
Boolos, G. (1993). The logic of provability. Cambridge: Cambridge University Press.
Brezhnev, V. (2000). On explicit counterparts of modal logics. Technical report CFIS 2000-05,
Cornell University.
Brezhnev, V., & Kuznets, R. (2006). Making knowledge explicit: How hard it is. Theoretical
Computer Science, 357(1–3), 23–34.
Dean, W., & Kurokawa, H. (2007). From the knowability paradox to the existence of proofs.
Manuscript (submitted to Synthese ).
Dean, W., & Kurokawa, H. (2010). The knower paradox and the quantified logic of proofs. In A.
Hieke (Ed.), Austrian Ludwig Wittgenstein society. Synthese, 176(2), 177–225.
Dretske, F. (1971). Conclusive reasons. Australasian Journal of Philosophy, 49, 1–22.
Dretske, F. (2005). Is knowledge closed under known entailment? The case against closure. In
M. Steup & E. Sosa (Eds.), Contemporary Debates in Epistemology (pp. 13–26). Malden:
Blackwell.
Fagin, R., & Halpern, J. Y. (1985). Belief, awareness, and limited reasoning: Preliminary report. In
Proceedings of the Ninth International Joint Conference on Artificial Intelligence (IJCAI-85),
(pp. 491–501). Los Altos, CA: Morgan Kaufman.
Fagin, R., & Halpern, J. Y. (1988). Belief, awareness, and limited reasoning. Artificial Intelligence,
34(1), 39–76.
Fagin, R., Halpern, J. Y., Moses, Y., & Vardi, M. (1995). Reasoning about knowledge. Cambridge:
MIT Press.
Fitting, M. (2003). A semantics for the logic of proofs. Technical report TR-2003012, CUNY Ph.D.
Program in Computer Science.
Fitting, M. (2005). The logic of proofs, semantically. Annals of Pure and Applied Logic, 132(1),
1–25.
Fitting, M. (2007). Intensional logic. Stanford Encyclopedia of Philosophy
(http://plato.stanford.edu), Feb 2007.
Fitting, M., & Mendelsohn, R. L. (1998). First-order modal logic. Dordrecht: Kluwer Academic.
Frege, G. (1952). On sense and reference. In P. Geach & M. Black (Eds.), Translations of the
philosophical writings of Gottlob Frege. Oxford: Blackwell.
Gettier, E. (1963). Is justified true belief knowledge? Analysis, 23, 121–123.
Gödel, K. (1933). Eine Interpretation des intuitionistischen Aussagenkalkuls. Ergebnisse Math.
Kolloq., 4, 39–40. English translation in: S. Feferman et al., (Eds.) (1986). Kurt Gödel collected
works (Vol. 1, pp. 301–303). Oxford: Oxford University Press/New York: Clarendon Press.
Gödel, K. (1995) Vortrag bei Zilsel/Lecture at Zilsel’s (*1938a). In S. Feferman, J. W. Dawson
Jr., W. Goldfarb, C. Parsons, & R. M. Solovay (Eds.), Unpublished essays and lectures (Kurt
Gödel collected works, Vol. III, pp. 86–113). Oxford University Press.
Goldman, A. (1967). A causal theory of knowing. The Journal of Philosophy, 64, 335–372.
Goris, E. (2007). Explicit proofs in formal provability logic. In S. Artemov & A. Nerode (Eds.),
Logical Foundations of Computer Science. International Symposium, LFCS 2007, Proceedings,
New York, June 2007 (Lecture notes in computer science, Vol. 4514, pp. 241–253). Springer.
Hendricks, V. F. (2003). Active agents. Journal of Logic, Language and Information, 12(4),
469–495.
Hendricks, V. F. (2005). Mainstream and formal epistemology. New York: Cambridge University
Press.
Heyting, A. (1934). Mathematische Grundlagenforschung. Intuitionismus. Beweistheorie. Berlin:
Springer.
Hintikka, J. (1962). Knowledge and belief. Ithaca: Cornell University Press.
Hintikka, J. (1975). Impossible possible worlds vindicated. Journal of Philosophical Logic, 4,
475–484.
Kleene, S. (1945). On the interpretation of intuitionistic number theory. The Journal of Symbolic
Logic, 10(4), 109–124.
Krupski, N. V. (2006). On the complexity of the reflected logic of proofs. Theoretical Computer
Science, 357(1), 136–142.
Krupski, V. N. (2001). The single-conclusion proof logic and inference rules specification. Annals
of Pure and Applied Logic, 113(1–3), 181–206.
Krupski, V. N. (2006). Referential logic of proofs. Theoretical Computer Science, 357(1),
143–166.
Kuznets, R. (2000). On the complexity of explicit modal logics. In Computer Science Logic 2000
(Lecture notes in computer science, Vol. 1862, pp. 371–383). Berlin/Heidelberg: Springer.
Kuznets, R. (2008). Complexity issues in justification logic. PhD thesis, CUNY Graduate Center.
http://kuznets.googlepages.com/PhD.pdf.
Lehrer, K., & Paxson, T. (1969). Knowledge: Undefeated justified true belief. The Journal of
Philosophy, 66, 1–22.
Luper, S. (2005). The epistemic closure principle. Stanford Encyclopedia of Philosophy.
McCarthy, J., Sato, M., Hayashi, T., & Igarishi, S. (1978). On the model theory of knowledge.
Technical report STAN-CS-78-667, Stanford University.
Meyer, J. -J. Ch., & van der Hoek, W. (1995). Epistemic logic for AI and computer science.
Cambridge: Cambridge University Press.
Milnikel, R. (2007). Derivability in certain subsystems of the logic of proofs is Πᵖ₂-complete.
Annals of Pure and Applied Logic, 145(3), 223–239.
Mkrtychev, A. (1997). Models for the logic of proofs. In S. Adian & A. Nerode (Eds.),
Logical Foundations of Computer Science ‘97, Yaroslavl’ (Lecture notes in computer science,
Vol. 1234, pp. 266–275). Springer.
Moses, Y. (1988). Resource-bounded knowledge. In M. Vardi (Ed.), Proceedings of the Second
Conference on Theoretical Aspects of Reasoning About Knowledge, Pacific Grove, March 7–9,
1988 (pp. 261–276). Morgan Kaufmann Publishers.
Neale, S. (1990). Descriptions. Cambridge: MIT.
Nozick, R. (1981). Philosophical explanations. Cambridge: Harvard University Press.
Pacuit, E. (2005). A note on some explicit modal logics. In 5th Panhellenic Logic Symposium,
Athens, July 2005.
Pacuit, E. (2006). A note on some explicit modal logics. Technical report PP-2006-29, University
of Amsterdam. ILLC Publications.
Parikh, R. (1987). Knowledge and the problem of logical omniscience. In Z. Ras &
M. Zemankova (Eds.), ISMIS-87 International Symposium on Methodology for Intellectual
Systems (pp. 432–439). North-Holland.
Rubtsova, N. (2005). Evidence-based knowledge for S5. In 2005 Summer Meeting of the
Association for Symbolic Logic, Logic Colloquium ’05, Athens, 28 July–3 August 2005.
Abstract. Association for Symbolic Logic. (2006, June). Bulletin of Symbolic Logic, 12(2),
344–345. doi:10.2178/bsl/1146620064.
Rubtsova, N. (2006). Evidence reconstruction of epistemic modal logic S5. In Computer
Science – Theory and Applications. CSR 2006 (Lecture notes in computer science, Vol. 3967,
pp. 313–321). Springer.
Russell, B. (1905). On denoting. Mind, 14, 479–493.
Russell, B. (1912). The problems of philosophy. London: Williams and Norgate/New York: Henry
Holt and Company.
Russell, B. (1919). Introduction to mathematical philosophy. London: George Allen and Unwin.
Stalnaker, R. C. (1996). Knowledge, belief and counterfactual reasoning in games. Economics and
Philosophy, 12, 133–163.
Troelstra, A. S. (1998). Realizability. In S. Buss (Ed.), Handbook of proof theory (pp. 407–474).
Amsterdam: Elsevier.
Troelstra, A. S., & Schwichtenberg, H. (1996). Basic proof theory. Amsterdam: Cambridge
University Press.
Troelstra, A. S., & van Dalen, D. (1988). Constructivism in mathematics (Vols. 1 & 2). Amsterdam:
North–Holland.
van Dalen, D. (1986). Intuitionistic logic. In D. Gabbay & F. Guenther (Eds.), Handbook of
philosophical logic (Vol. 3, pp. 225–340). Dordrecht: Reidel.
von Wright, G. H. (1951). An essay in modal logic. Amsterdam: North-Holland.
Yavorskaya (Sidon), T. (2006). Multi-agent explicit knowledge. In D. Grigoriev, J. Harrison, &
E. A. Hirsch (Eds.), Computer Science – Theory and Applications. CSR 2006 (Lecture notes in
computer science, Vol. 3967, pp. 369–380). Springer.
Chapter 33
Learning Theory and Epistemology
Kevin T. Kelly
Introduction
debates about what reliability is to the more objective task of determining which
precise senses of reliability are achievable in a given, precisely specified learning
problem.
A learning problem specifies (1) what is to be learned, (2) a range of relevantly
possible environments in which the learner must succeed, (3) the kinds of inputs
those environments provide to the learner, (4) what it means to learn over a range of
relevantly possible environments, and (5) the sorts of learning strategies that will be
entertained as solutions. A learning strategy solves a learning problem just in case it
is admitted as a potential solution by the problem and succeeds in the specified sense
over the relevant possibilities. A problem is solvable just in case some admissible
strategy solves it.
Solvability is the basic question addressed by formal learning theory. To establish
a positive solvability result, one must construct an admissible learning strategy and
prove that the strategy succeeds in the relevant sense. A negative result requires a
general proof that every allowable learning strategy fails. Thus, the positive results
appear “methodological” whereas the negative results look “skeptical”. Negative
results and positive results lock together to form a whole that is more interesting than
the sum of its parts. For example, a learning method may appear unimaginative and
pedestrian until it is shown that no method could do better (i.e., no harder problem
is solvable). And a notion of success may sound too weak until it is discovered that
some natural problem is solvable in that sense but not in the more ambitious senses
one would prefer.
There are so many different parameters in a learning problem that it is common
to hold some of them fixed (e.g., the notion of success) and to allow others to
vary (e.g., the set of relevantly possible environments). A partial specification of
the problem parameters is called a learning paradigm and any problem agreeing
with these specifications is an instance of the paradigm.
The notion of a paradigm raises more general questions. After several solvability
and unsolvability results have been established in a paradigm, a pattern begins to
emerge and one would like to know what it is about the combinatorial structure of
the solvable problems that makes them solvable. A rigorous answer to that question
is called a characterization theorem.
Many learning theoretic results concern the relative difficulty of two paradigms.
Suppose we change a parameter (e.g., success) in one paradigm to produce another
paradigm. There will usually remain an obvious correspondence between problems
in the two paradigms (e.g., identical sets of serious possibilities). A reduction of
paradigm P to another paradigm P′ transforms a solution to a problem in P′ into
a solution to the corresponding problem in P. Then one may say that P is no
harder than P′. Inter-reducible paradigms are equivalent. Equivalent paradigms
may employ intuitively different standards of success, but the equivalence in
difficulty shows that the quality of information provided by the diverse criteria
is essentially the same. Paradigm equivalence results may therefore be viewed as
epistemic analogues of the conservation principles of physics, closing the door on
the temptation to get something (more reliability) for nothing by fiddling with the
notion of success.
Learning in Epistemology
forever” example, the refutation method simply returns “true” until a nonzero value
is observed and then halts inquiry with “false”.
When reliability demands verification with certainty, there is no tension between
the static concept of conclusive justification and the dynamical concept of reliable
success, since convergence to the truth occurs precisely when conclusive justifi-
cation is received. Refutation with certainty severs that tie: the learner reliably
stabilizes to the truth value of h, but when h is true there is no time at which that
guess is certainly justified. The separation of reliability from complete justification
was hailed as a major epistemological innovation by the American Pragmatists.¹ In
light of that innovation, one may either try to invent some notion of partial empirical
justification (e.g., a theory of confirmation), or one may, like Popper, side entirely
with reliability.2 Learning theory has nothing to say about whether partial epistemic
justification exists or what it might be. Insofar as such notions are entertained
at all, they are assessed either as components of reliable learning strategies or
as extraneous constraints on admissible strategies that may make reliability more
difficult or even impossible to achieve. Methodological principles with the latter
property are said to be restrictive.3
“Hypothetico-deductivism” is sometimes viewed as a theory of partial inductive
support (Glymour 1980), but it can also be understood as a strategy for reducing
scientific discovery to hypothesis assessment (Popper 1968; Kemeny 1953; Putnam
1963). Suppose that the relevant possibilities are covered by a countable family of
hypotheses, each of which is refutable with certainty and informative enough to
be interesting. A discovery method produces empirical hypotheses in response to
its successive observations. A discovery method identifies these hypotheses in the
limit just in case, on each relevantly possible input stream, the method eventually
stabilizes to some true hypothesis in the family. Suppose that there is an assessment
method that refutes each hypothesis with certainty. The corresponding hypothetico-
deductive method is constructed as follows. It enumerates the hypotheses (by
“boldness”, “abduction”, “plausibility”, “simplicity”, or the order by which they are
produced by “creative intuition”) and outputs the first hypothesis in the enumeration
that is not rejected by the given refutation method. That reduction has occurred to
just about everyone who has ever thought about inductive methodology. But things
needn’t be quite so easy. What if the hypotheses aren’t even refutable with certainty?
Could enumerating the right hypotheses occasion computational difficulties? Those
¹ "We may talk of the empiricist and the absolutist way of believing the truth. The absolutists in this
matter say that we not only can attain to knowing truth, but we can know when we have attained to
knowing it; while the empiricists think that although we may attain it, we cannot infallibly know
when." (James 1948: 95–96).
² "Of course theories which we claim to be no more than conjectures or hypotheses need no
justification (and least of all a justification by a nonexistent 'method of induction', of which nobody
has ever given a sensible description)." (Popper 1982: 79).
³ Cf. section "A foolish consistency" below.
are just the sorts of questions of principle that are amenable to learning theoretic
analysis, as will be seen below.
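The hypothetico-deductive reduction described above is mechanical enough to be written out. The sketch below is an added illustration, not drawn from the text; the toy hypotheses, refuters, and input stream are assumptions chosen only to show the method stabilizing on a true hypothesis.

    # Hypothetico-deductive identification in the limit.
    # Each hypothesis comes with a refuter: refuter(data) == True means the
    # hypothesis has been rejected with certainty on the finite data seen so far.
    def hd_learner(hypotheses, refuters, stream, steps=20):
        """Yield a conjecture after each observation from the input stream."""
        data = []
        for _ in range(steps):
            data.append(next(stream))
            for h, refuted in zip(hypotheses, refuters):
                if not refuted(data):
                    yield h              # first unrefuted hypothesis in the enumeration
                    break

    # Toy hypotheses: 'the stream is 0 from position n on', for n = 0, 1, ..., 4.
    hypotheses = list(range(5))
    refuters = [lambda data, n=n: any(x != 0 for x in data[n:]) for n in hypotheses]

    def stream_gen():
        for x in [1, 1]:                 # nonzero values at positions 0 and 1
            yield x
        while True:                      # zeros from position 2 on
            yield 0

    print(list(hd_learner(hypotheses, refuters, stream_gen(), steps=8)))
    # [1, 2, 2, 2, 2, 2, 2, 2]: the conjectures stabilize to the true hypothesis 2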
Another example of learning theoretic thinking in the philosophy of science is
Hans Reichenbach’s “pragmatic vindication” of the “straight rule” of induction
(Reichenbach 1938). Reichenbach endorsed Richard Von Mises’ frequentist inter-
pretation of probability. The relative frequency of an outcome in an input stream
at position n is the number of occurrences of the outcome up to position n divided
by n. The probability of an outcome in an input stream is the limit of the relative
frequencies as n goes to infinity. Thus, a probabilistic statement determines an
empirical proposition: the set of all input streams in which the outcome in question
has the specified limiting relative frequency.
To discover limiting relative frequencies, Reichenbach recommended using the
straight rule, whose guess at the probability of an outcome is the currently observed
relative frequency of that outcome. It is immediate by definition that if the relevant
possibilities include only input streams in which the limiting relative frequency of
an event type is defined, then following the straight rule gradually identifies the
true probability value, in the sense that on each relevantly possible input stream, for
each nonzero distance from the probability, the conjectures of the rule eventually
stay within that distance.
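A quick simulation makes the point vivid; this is an added illustration, with a pseudo-random input stream standing in for an input stream whose limiting relative frequency is 0.3.

    import random

    def straight_rule(stream):
        """Conjecture the observed relative frequency of 1s after each input."""
        ones = 0
        for n, x in enumerate(stream, start=1):
            ones += x
            yield ones / n

    random.seed(0)
    stream = [1 if random.random() < 0.3 else 0 for _ in range(10000)]
    conjectures = list(straight_rule(stream))
    print(conjectures[99], conjectures[999], conjectures[9999])
    # the guesses settle toward the limiting relative frequency (here, near 0.3)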
If the straight rule is altered to output an open interval of probabilities of
fixed width centered on the observed relative frequency, then the modified method
evidently identifies a true interval in the limit (given that a probability exists). That is
the same property that hypothetico-deductive inquiry has over countable collections
of refutable hypotheses.
So are probability intervals refutable with certainty? Evidently not, for each finite
input sequence is consistent with each limiting relative frequency: simply extend the
finite sequence with an infinite input sequence in which the probability claim is true.
Is there any interesting sense in which open probability intervals can be reliably
assessed? Say that a learner decides a hypothesis in the limit just in case in each
relevantly possible environment, the learner eventually stabilizes to “true” if the
hypothesis is true and to “false” if the hypothesis is false. According to that notion
of success, the learner is guaranteed to end up with the correct truth value, even
though no relevantly possible environment affords certain verification or refutation.
But even assuming that some limiting relative frequency exists, open probability
intervals are not decidable even in that weak, limiting sense (Kelly 1996).⁴ A learner
⁴ This footnote has been added in 2015. The results that follow are valid for the Reichenbach-Von
Mises view that probability is limiting relative frequency. But that is not the most natural way
to relate learning theoretic results to statistical inference. A better idea is to view chance as an
unanalyzed probability distribution over outcomes. Learners receive sequences of random samples
as inputs. Hypotheses are sets of possible chance distributions. Let 0 < α ≤ 1. A learner α-verifies
hypothesis H iff, in each possible chance distribution p, hypothesis H is true in p exactly when there
exists a sample size at which the learner outputs "true" with chance at least 1 − α. Then open intervals
of probability are α-verifiable, for 0 < α ≤ 1. All of the learning theoretic criteria of success
discussed below can be re-interpreted probabilistically in a similar way.
verifies a hypothesis in the limit just in case, on each relevantly possible input
stream, she converges to “true” if the hypothesis is true and fails to converge to
“true” otherwise. That even weaker notion of success is “one sided”, for when the
hypothesis is true, it is only guaranteed that “false” is produced infinitely often (pos-
sibly at ever longer intervals).5 Analogously, refutation in the limit requires conver-
gence to “false” when the hypothesis is false and anything but convergence to “false”
otherwise. It turns out that open probability intervals are verifiable but not decidable
in the limit given that some probability (limiting relative frequency) exists.6
Identification in the limit is possible even when the possible hypotheses are
merely verifiable in the limit. Indeed, identification in the limit is in general
reducible to limiting verification, but the requisite reduction is a bit more com-
plicated than the familiar hypothetico-deductive construction. Suppose we have
a countable family of hypotheses covering all the relevant possibilities and a
limiting verifier for each of these hypotheses. Enumerate the hypotheses so that
each hypothesis occurs infinitely often in the enumeration. At a given stage of
inquiry, find the first remaining hypothesis whose limiting verifier currently returns
“true”. If there is no such, output the first hypothesis and go to the next stage of
inquiry. If there is one, output it and delete all hypotheses occurring prior to it from
the hypothesis enumeration. It is an exercise to check that this method identifies
a true hypothesis in the limit. So although limiting verification is an unsatisfying
sense of reliable assessment, it suffices for limiting identification. If the hypotheses
form a partition, limiting verifiability of each cell is also necessary for limiting
identification (Kelly 1996). So limiting verification is more important than it might
first have appeared.
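The reduction just sketched can also be written out directly. In the illustration below (added here, with assumed encodings), each verifier maps the data seen so far to "true" or "false", the enumeration repeats every hypothesis forever, and entries before an output hypothesis are deleted, exactly as in the construction above; the toy verifiers happen to be limiting deciders, a special case of limiting verifiers.

    # Limiting identification from limiting verifiers.
    def limiting_identifier(verifiers, stream, steps=30):
        k = len(verifiers)
        enum = [i % k for i in range(steps * k)]   # enumeration: each hypothesis recurs forever
        data = []
        for _ in range(steps):
            data.append(next(stream))
            pos = next((p for p, h in enumerate(enum)
                        if verifiers[h](data) == 'true'), None)
            if pos is None:
                yield enum[0]                      # no verifier says 'true': output the first
            else:
                yield enum[pos]                    # output it ...
                enum = enum[pos:]                  # ... and delete everything before it

    # Toy: hypothesis n says 'the stream is 0 from position n on'.
    verifiers = [lambda data, n=n: 'true' if all(x == 0 for x in data[n:]) else 'false'
                 for n in range(4)]
    def stream_gen():
        for x in [1, 1, 0]:
            yield x
        while True:
            yield 0
    print(list(limiting_identifier(verifiers, stream_gen(), steps=10)))
    # [1, 2, 2, ...]: the conjectures stabilize to the true hypothesis 2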
Neyman and Pearson justified their theory of statistical testing in terms of the
frequentist interpretation of probability:
It may often be proved that if we behave according to such a rule, then in the long run we
shall reject h when it is true not more, say, than once in a hundred times, and in addition
we may have evidence that we shall reject h sufficiently often when it is false (Neyman and
Pearson 1933: 142).
The significance level of a test is a fixed upper bound on the limiting relative
frequency of false rejection of the hypothesis under test over all possible input
streams. A test is “useless” if the limiting frequency of mistaken acceptances
exceeds one minus the significance, for then one could have done better at reducing
the limiting relative frequency of error by ignoring the inputs and flipping a coin
biased according to the significance level. “Useful” testability can be viewed as
⁵ If there were any schedule governing the rate at which the outputs "false" spread apart through
time, that schedule could be used to produce a method that decides the hypothesis in the limit: the
new rule outputs "false" until the simulated rule produces more "true" outputs than the schedule
allows for. Thus, the potential for ever rarer "false" outputs when the hypothesis is false is crucial
to the extra lenience of this criterion.
⁶ Conjecturing "true" while the observed frequency is in the interval and "false" otherwise
does not suffice unless we exclude possible input streams in which the limiting relative frequency
approaches its limit from one side, for all but finitely many stages along the input stream. A reliable
method is presented in Kelly (1996).
a learning paradigm over input streams. How does it relate to the “qualitative”
paradigms just discussed? It turns out that the existence of a useful test for a
hypothesis is equivalent to the hypothesis being either verifiable or refutable in
the limit (Kelly 1996). That is an example of a paradigm equivalence theorem,
showing that useful statistical tests provide essentially no more “information” than
limiting verification or refutation procedures, assuming the frequentist interpretation
of probability.
It is standard to assume in statistical studies that the relevant probabilities exist,
but is there a sense in which that claim could be reliably assessed? Demonic
arguments reveal the existence of a limiting relative frequency to be neither
verifiable in the limit nor refutable in the limit over arbitrary input streams. But
that hypothesis is gradually verifiable in the sense that there is a method that
outputs numbers in the unit interval such that the numbers approach one just if
the hypothesis is true (Kelly 1996). A demonic argument shows that the existence
of a limiting relative frequency is not gradually refutable, in the sense of producing
a sequence of numbers approaching zero just in case that hypothesis is false.
Gradual decidability requires that the learner’s outputs gradually converge to the
truth value of the hypothesis whatever the truth value happens to be. Unlike gradual
verification and refutation, which we have just seen to be weaker than their limiting
analogues, gradual decision is inter-reducible with limiting decision: simply choose
a cutoff value (e.g. 0.5) and output “true” if the current output is less than 0.5 and
“false” otherwise. Gradual decision is familiar as the sense of success invoked in
Bayesian convergence arguments. Since Bayesian updating by conditionalization
can never retract a zero or a one on input of nonzero probability, those outputs
indicate certainty (inquiry may as well be halted), so limiting decision may only be
accomplished gradually.
This short discussion illustrates how familiar epistemological issues as diverse
as the problem of induction, Popper’s falsificationism, Reichenbach’s vindication
of the straight rule, statistical testability, and Bayesian convergence all fit within a
single, graduated system of learnability concepts.
Computable Learning
⁷ Putnam's actual argument was more complicated than the version presented here.
⁸ Putnam concluded that a scientific method should always be equipped with an extra input slot into
which hypotheses that occur to us during the course of inquiry can be inserted. But such an "open
minded" method must hope that the external hypothesis source (e.g., "creative intuition") does not
suggest any programs that go into infinite loops, since the inability to distinguish such programs
from "good" ones is what restricted the reliability of computable predictors to begin with!
⁹ This construction (Case and Smith 1983) is a bit stronger than Gold's. It produces an input stream
on which infinitely many outputs of the learner are wrong. Gold's construction merely forces the
learner to vacillate forever (possibly among correct conjectures).
¹⁰ Cf. the preceding footnote. In the learning theoretic literature, unstable identification is called BC
identification for "behaviorally correct", whereas stable identification is called EX identification for
"explanatory". Osherson and Weinstein (1986) call stable identification "intensional" and unstable
identification "extensional".
predictions from those conjectures in the manner just described, it may get caught
in an infinite loop and hang for eternity.
Blum and Blum (1975) constructed a learning problem that is computably iden-
tifiable in the limit but not computably extrapolable for just that reason. Consider a
problem in which an unknown Turing machine without infinite loops is hidden in a
box and the successive inputs are the (finite) runtimes of that program on successive
inputs. The learner’s job is to guess some computer program whose runtimes match
the observed runtimes for each input (a task suggestive of fitting a computational
model to psychological reaction time input). In that problem, every program is
computably refutable with certainty: simulate it and see if it halts precisely when the
inputs say it should. Infinite loops are no problem, for one will observe in finite time
that the program doesn’t halt when it should have. Since the set of all programs is
computably enumerable (we needn’t restrict the enumeration to total programs this
time), a computable implementation of the hypothetico-deductive strategy identifies
a correct hypothesis in the limit. Nonetheless, computable extrapolation of runtimes
is not possible. Let a computable extrapolator be given. The demon is a procedure
that wastes computational cycles in response to the computable predictor’s last
prediction. So at a given stage, the demonic program simulates the learner’s program
on the successive runtimes of the demonic program on earlier inputs. Whatever the
learner’s prediction is, the demon goes into a wasteful subroutine that uses at least
one more step of computation than the predictor expected.
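The diagonal idea behind this argument can be illustrated with a simplified sketch (mine, with hypothetical names; in the actual Blum and Blum construction the demon is itself a program whose own runtimes constitute the data, obtained via the recursion theorem rather than built directly as below).

def demon_stream(extrapolator, length=10):
    # Build a finite prefix of a runtime sequence on which `extrapolator` is always wrong:
    # whatever it predicts from the data seen so far, the next "runtime" is one step longer.
    data = []
    for _ in range(length):
        predicted = extrapolator(data)
        data.append(predicted + 1)
    return data

# Any computable extrapolator can be plugged in; here, one that repeats the last value seen.
last_value = lambda xs: xs[-1] if xs else 0
print(demon_stream(last_value))   # every prediction is off by one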
Another question raised by the preceding discussion is whether stable identification
is harder than unstable identification for computable learners in the computable
function identification paradigm. Case and Smith (1983) show that it is. To see why,
consider the function identification problem in which the relevant possibilities are
the “almost self-describing input streams”. A unit variant of an input stream is a
partial computable function that is just like the input stream except that it may
disagree or be undefined in at most one position. An input stream is almost self-
describing just in case it is a unit variant of the function computed by the program
whose index (according to a fixed, effective encoding of Turing programs into
natural numbers) occurs in the input stream’s first position. In other words, an
“almost self-describing” input stream “gives away” a nearly correct hypothesis,
but it doesn’t say where the possible mismatch might be. An unstable learner
can succeed by continually patching the “given away” program with ever larger
lookup tables specifying what has been seen so far, since eventually the lookup
table corrects the mistake in the “given away” program. But a stable learner would
have to know when to stop patching, and that information was not given away.
In the problem just described, it is trivial to stably identify an almost correct
program (just output the first datum) whereas no computable learner can stably
identify an exactly correct program. Indeed, for each finite number of allowed errors
there is a learning problem that is computably solvable with that many errors but
not with one error fewer (Case and Smith 1983). That result, known as the anomaly
hierarchy theorem, can be established by means of functions that are self-describing
up to n possible errors.
There are many more sophisticated results of the kind just presented, all of
which have the following points in common. (1) Uncomputability is taken just as
seriously as the problem of induction from the very outset of the analysis. That is
different from the approach of traditional epistemology, in which idealized logics of
justification are proposed and passed along to experts in computation for advice on
how to satisfy them (e.g., Levi 1991). (2) When computability is taken seriously, the
halting problem (the formal problem of determining whether a computer program
is in an infinite loop on a given input) is very similar to the classical problem of
induction: for however long a computation has run without halting, it might, for
all the simulator knows a priori, halt at the next stage. (3) Thus, computable learners
fail when ideal ones succeed because computable solvability requires the learner to
solve an internalized problem of induction (Kelly and Schulte 1997).
11. I.e., the procedure halts on members of the set (indicating acceptance) and not on any other inputs.
12. The demon presents a text for the infinite language until the learner outputs a grammar for it, then keeps repeating the preceding datum until the learner produces a grammar for the input presented so far, then starts presenting the text from where he left off last, etc.
13. A systematic compendium of results on language learnability is Osherson and Weinstein (1986).
14. The “onto” assumption can be dropped if empirical adequacy rather than truth is the goal (Lauth 1993).
15. I.e., the sentence has the form of a quantifier-free sentence preceded by a sequence of quantifiers.
16. The computational versions of these ideas are in Gold (1965), Putnam (1965), and Kugel (1977). The topological space is introduced in Osherson and Weinstein (1986) and the characterizations are developed in Kelly (1992, 1996). Logical versions of the characterizations are developed in Osherson and Weinstein (1991) and Kelly and Glymour (1990).
17. These are, in fact, the open sets of an extensively studied topological space known as the Baire space (Hinman 1978).
18. Necessity of the condition fails if the hypotheses are mutually compatible or if we drop the stability requirement.
consider the overall correctness relation C(e, h), which expresses that hypothesis
h is correct in environment e. In computable function identification, for example,
correctness requires that h be the index of a computer program that computes e. In
language learning from text, h must be the index of a positive test procedure for
the range of e. By suitable coding conventions, language learning from informant
and logical learning can also be modeled with correctness relations in the input stream
paradigm. Computational analogs of the Borel complexity classes can be defined
for correctness relations, in which case analogous characterization theorems hold
for computable inquiry (Kelly 1996).
The moral of this discussion is that the problem of induction, or empirical
underdetermination, comes in degrees corresponding to standard topological and
computational complexity classes, which determine the objective sense in which
reliable inquiry is possible.
A Foolish Consistency
19. I.e., hyperarithmetically definable.
20. Osherson and Weinstein (1986) contains many restrictiveness results carrying a similar moral. Also, see Osherson and Weinstein (1988).
In the Meno, Plato outlined what has come to be known as the concept learning
paradigm, which has captured the imagination of philosophers, psychologists, and
artificial intelligence researchers ever since. A concept learning problem specifies
a domain of examples described as vectors of values (e.g., blue, five kilos) of a
corresponding set of attributes (e.g., color, weight), together with a set of possible
target concepts, which are sets of examples. The learner is somehow presented
with examples labelled either as positive or as negative examples of the concept
to be learned, and the learner’s task is to converge in some specified sense to a
correct definition. In contemporary artificial intelligence and cognitive science, the
“concepts” to be learned are defined by neural networks, logic circuits, and finite
state automata, but the underlying paradigm would still be familiar to Socrates.
Socrates ridiculed interlocutors who proposed disjunctive concept definitions,
which suggests that he would countenance only conjunctively definable concepts as
relevant possibilities. Socrates’ notorious solution to the concept learning problem
was to have the environment “give away” the answer in a mystical flash of insight.
But J. S. Mill’s (i.e., Francis Bacon’s) well-known inductive methods need no
mystical help to identify conjunctive concepts with certainty: the first conjecture
is the first positive example sampled. On each successive positive example in the
sample, delete from the current conjecture each conjunct that disagrees with the
corresponding attribute value of the example (the “method of agreement”). On each
successive negative example that agrees with the current conjecture everywhere
except on one attribute, underline the value of that attribute in the current conjecture
(the “method of difference”). When all conjuncts in the current conjecture are
underlined, halt inquiry.
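A minimal sketch (my own illustration, with hypothetical names) of the conjunctive learning procedure just described: a conjecture maps attributes to (value, confirmed) pairs; positive examples prune disagreeing conjuncts, near-miss negative examples confirm ("underline") the attribute on which they differ, and inquiry halts once every surviving conjunct is confirmed.

def update(conjecture, example, positive):
    if conjecture is None:                       # first positive example starts the conjecture
        return {a: (v, False) for a, v in example.items()} if positive else None
    if positive:                                 # drop conjuncts the positive example violates
        return {a: (v, c) for a, (v, c) in conjecture.items() if example[a] == v}
    # negative example: if it disagrees on exactly one attribute, that conjunct is confirmed
    diffs = [a for a, (v, _) in conjecture.items() if example[a] != v]
    if len(diffs) == 1:
        a = diffs[0]
        conjecture[a] = (conjecture[a][0], True)
    return conjecture

def done(conjecture):
    return conjecture is not None and all(c for _, c in conjecture.values())

# Toy run with target concept "color = blue and size = small".
conj = None
conj = update(conj, {"color": "blue", "size": "small", "weight": "light"}, True)
conj = update(conj, {"color": "blue", "size": "small", "weight": "heavy"}, True)   # drops weight
conj = update(conj, {"color": "red",  "size": "small", "weight": "heavy"}, False)  # confirms color
conj = update(conj, {"color": "blue", "size": "large", "weight": "light"}, False)  # confirms size
print(conj, done(conj))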
Boolean concepts are also identifiable with certainty over a finite set of attribute
values: wait for all possible examples to come in and then disjoin the positive
ones. Bacon’s methods sound plausible in the conjunctive case, whereas that
“gerrymandering” procedure for learning Boolean concepts sounds hopeless (it is,
in fact, just what Socrates ridiculed). Yet both procedures identify the truth with
certainty, since the set of examples is finite. The PAC (Probably Approximately
Correct) paradigm distinguishes such “small” problems in terms of tractable rather
than merely computable inquiry.21
21. An excellent source presenting all of the results mentioned here is Kearns and Vazirani (1994), which provides detailed descriptions and bibliographic notes for all the results mentioned below.
In the PAC paradigm, examples are sampled with replacement from an urn in
which the probability of selecting an example is unknown. There is a collection
of relevantly possible concepts and also a collection of hypotheses specifying the
possible forms in which the learner is permitted to define a relevantly possible
concept. Say that a hypothesis is ε-accurate just in case the sampling probability
that a single sampled individual is a counterexample is less than ε. The learner is
given a confidence parameter δ and an error parameter ε. From these parameters, the
learner specifies a sample size and, upon inspecting the resulting sample, she outputs
a hypothesis. A learning strategy is probably approximately correct (PAC) just in
case for each probability distribution on the urn and for each ε, δ exceeding zero,
the strategy has a probability of at least 1 − δ of producing an ε-accurate hypothesis.
It remains to specify what it means for a PAC learning strategy to be efficient.
Computational complexity is usually analyzed in terms of asymptotic growth rate
over an infinite sequence of “similar” but “ever larger” examples of the problem.
Tractability is understood as resource consumption bounded almost everywhere by
some polynomial function of problem size. The size of a concept learning problem
is determined by (1) the number of attributes, (2) the size of the smallest definition
of the target concept, (3) the reciprocal of the confidence parameter, and (4) the
reciprocal of the error parameter (higher accuracy and reliability requirements make
for a “bigger” inference problem). An input-efficient PAC learner takes a sample
in each problem whose size is bounded by a polynomial in these four arguments.
Otherwise, the learner requires samples that are exponentially large in the same
parameters, which is considered to be intractable.
There is an elegant combinatorial characterization of how large the sample
required for PAC learning should be. Say that a concept class shatters a set S of
examples just in case each subset of S is the intersection of S with some concept
in the class. The Vapnik-Chervonenkis (VC) dimension of the concept class is the
cardinality of the largest set of instances shattered by the class. There exists a fixed
constant c such that if the VC dimension of the concept class is d, it suffices for PAC
learnability that a sample of size s be taken, where:
s ≥ c((1/ε) log(1/δ) + (d/ε) log(1/ε))
A learner whose sample size is determined only by such a bound for the entire
concept class, and not the size of the (unknown) target concept itself, will be input-
inefficient (since the sample size grows non-polynomially when concept size is held
fixed at the minimum value).
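For concreteness, here is a small sketch (mine, not from the text; the constant c and the example class are purely illustrative) of the two notions just used: a brute-force shattering test and VC dimension for a finite concept class, and the sample size given by the displayed bound.

from itertools import chain, combinations
from math import ceil, log

def shatters(concepts, S):
    # Does the class (a list of sets) shatter the finite set S?
    subsets = chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))
    return all(any(set(sub) == c & set(S) for c in concepts) for sub in subsets)

def vc_dimension(concepts, domain):
    # Brute-force VC dimension of a finite concept class over a finite domain.
    d = 0
    for r in range(len(domain) + 1):
        if any(shatters(concepts, S) for S in combinations(domain, r)):
            d = r
    return d

def pac_sample_size(d, eps, delta, c=4):
    # Sample size sufficient for PAC learning per the displayed bound (c is illustrative).
    return ceil(c * ((1 / eps) * log(1 / delta) + (d / eps) * log(1 / eps)))

domain = range(4)
intervals = [set(range(a, b)) for a in range(5) for b in range(a, 5)]   # intervals [a, b) over the domain
print(vc_dimension(intervals, domain), pac_sample_size(2, 0.1, 0.05))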
A computationally efficient PAC learner is a PAC learner whose runtime is
bounded by a polynomial of the sort described in the definition of input efficiency.
Since scanning a sampled instance takes time, computational efficiency implies
input efficiency. Since Bacon’s method is computationally trivial and requires small
samples, it is a computationally efficient PAC learner. Bacon’s method can be
generalized to efficiently PAC learn k-CNF concepts (i.e., conjunctions of k-ary
disjunctions of atomic or negated atomic sentences), for fixed k.
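The generalization works by clause elimination; the following sketch (my own illustration, not from the text) starts with every disjunction of at most k literals and deletes each one falsified by a positive example, leaving a k-CNF hypothesis consistent with the data.

from itertools import combinations, product

def all_clauses(n_vars, k):
    # All disjunctions ("clauses") of at most k literals over variables 0..n_vars-1;
    # a literal is a pair (index, sign), a clause a frozenset of literals.
    literals = [(i, s) for i in range(n_vars) for s in (True, False)]
    return {frozenset(c) for r in range(1, k + 1) for c in combinations(literals, r)}

def satisfies(example, clause):
    # example: tuple of booleans; a clause is true iff some literal agrees with it.
    return any(example[i] == s for i, s in clause)

def learn_k_cnf(positive_examples, n_vars, k):
    # Keep only the clauses no positive example falsifies; their conjunction is the hypothesis.
    hypothesis = all_clauses(n_vars, k)
    for ex in positive_examples:
        hypothesis = {c for c in hypothesis if satisfies(ex, c)}
    return hypothesis

# Toy target over 3 variables: (x0 or x1) and x2, a 2-CNF concept.
target = lambda ex: (ex[0] or ex[1]) and ex[2]
positives = [ex for ex in product([False, True], repeat=3) if target(ex)]
h = learn_k_cnf(positives, n_vars=3, k=2)
print(all(all(satisfies(ex, c) for c in h) == target(ex) for ex in product([False, True], repeat=3)))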
Sometimes computational difficulties arise entirely because it is hard for the
learner to frame her conjecture in the required hypothesis language. It is known, for
example, that the k-term DNF concepts (i.e., disjunctions of k purely conjunctive
concepts) are not efficiently PAC learnable using k-term DNF hypotheses (when
k ≥ 2),22 whereas they are efficiently PAC learnable using k-CNF hypotheses.
For some time it was not known whether there exist PAC problems that are not
efficiently solvable, where the intractability is due neither to sample-size complexity
nor to output representation. It turns out (Kearns and Valiant 1994) that under a standard
cryptographic hypothesis,23 the Boolean concepts of length polynomial in the
number of attributes have that property, as does the neural network training problem.
An alternative way to obtain more refined results in a non-probabilistic context is
to permit the learner to ask questions. A membership oracle accepts an example
from the learner and returns “in” or “out” to indicate whether it is a positive
or a negative example. A Socratic oracle responds to an input conjecture with a
counterexample, if there is one.24 One such result is that Socratic and membership
queries suffice for identification of finite state automata with certainty in polynomial
time (Angluin 1987).
22. This negative result holds only under the familiar complexity-theoretic hypothesis that P ≠ NP.
23. I.e., that computing discrete cube roots is intractable even for random algorithms.
24. In the learning-theoretic literature, Socratic queries are referred to as “equivalence” queries.
of truth through time, ignoring the possibility of meaning shifts due to conceptual
change.
But on a more careful examination, learning theory amplifies some recent
epistemological trends. The search for incorrigible foundations for knowledge is no
longer considered a serious option, so the fact that reliability depends on contingent
assumptions is hardly a penetrating objection. Indeed, it can be shown by learning
theoretic means that if some background knowledge is necessary for reliability, that
knowledge cannot be reliably assessed according to the same standard, blocking any
attempt at an entirely reliability-based foundationalism.
Externalist epistemologies sidestep the foundational demand that the conditions
for reliability be known by requiring only that we be reliable, without necessarily
being aware of that fact. Knowledge attributions are then empirical hypotheses that
can be studied by ordinary empirical means. But mature empirical investigations are
always focused by general mathematical constraints on what is possible. Accord-
ingly, learning theoretic results constrain naturalistic epistemology by specifying
how reliable an arbitrary system, whether computable or otherwise, could possibly
be in various learning situations.
Externalism has encountered the objection (Lehrer 1990) that reliability is
insufficient for knowledge if one is not justified in believing that one is reliable
(e.g., someone has a thermometer implanted in her brain that suddenly begins
to produce true beliefs about the local temperature). The intended point of such
objections is that reliable belief-forming processes should be embedded in a
coherent belief system incorporating beliefs about the agent’s own situation and
reliability therein. Learning theory may then be viewed as defining the crucial
relation of methodological coherence between epistemic situations, ambitions, and
means. Unlearnability arguments isolate methodological incoherence and positive
arguments suggest methods, background assumptions, or compromised ambitions
which, if adopted, could bring a system of beliefs into methodological coherence.
Incorporating learning theoretic structure into the concept of coherence addresses
what some coherentists take to be the chief objection to their position.
… [A]lthough any adequate epistemological theory must confront the task of bridging the
gap between justification and truth, the adoption of a nonstandard conception of truth, such
as a coherence theory of truth, will do no good unless that conception is independently
motivated. Therefore, it seems that a coherence theory of justification has no acceptable
way of establishing the essential connection with truth (Bonjour 1985, 110).
Bibliography
Angluin, D. (1980). Inductive inference of formal languages from positive data. Information and
Control, 45(2), 117–135.
Angluin, D. (1987). Learning regular sets from queries and counterexamples. Information and
Computation, 75, 87–106.
Blum, M., & Blum, L. (1975). Toward a mathematical theory of inductive inference. Information
and Control, 28, 125–155.
Bonjour, L. (1985). The structure of empirical knowledge. Cambridge: Harvard University Press.
Brown, R., & Hanlon, C. (1970). Derivational complexity and the order of acquisition of child
speech. In J. Hayes (Ed.), Cognition and the development of language. New York: Wiley.
Carnap, R. (1950). The logical foundations of probability. Chicago: University of Chicago Press.
Case, J., & Smith, C. (1983). Comparison of identification criteria for machine inductive inference.
Theoretical Computer Science, 24, 193–220.
de Finetti, B. (1990). The theory of probability. New York: Wiley.
Glymour, C. (1980). Theory and evidence. Cambridge: M.I.T. Press.
Gold, E. M. (1965). Limiting recursion. Journal of Symbolic Logic, 30, 27–48.
Gold, E. M. (1967). Language identification in the limit. Information and Control, 10, 447–474.
Halmos, P. (1974). Measure theory. New York: Springer.
Hinman, P. (1978). Recursion theoretic hierarchies. New York: Springer.
James, W. (1948). The will to believe. In A. Castell (Ed.), Essays in pragmatism. New York: Collier
Macmillan.
Kearns, M., & Valiant, L. (1994). Cryptographic limitations on learning boolean formulae and
finite automata. Journal of the ACM, 41, 57–95.
Kearns, M., & Vazirani, U. (1994). An introduction to computational learning theory. Cambridge:
M.I.T. Press.
Kelly, K. (1992). Learning theory and descriptive set theory. Logic and Computation, 3, 27–45.
Kelly, K. (1996). The logic of reliable inquiry. New York: Oxford University Press.
Kelly, K., & Glymour, C. (1989). Convergence to the truth and nothing but the truth. Philosophy
of Science, 56, 185–220.
Kelly, K., & Glymour, C. (1990). Theory discovery from data with mixed quantifiers. Journal of
Philosophical Logic, 19, 1–33.
Kelly, K., & Glymour, C. (1992). Inductive inference from theory-laden data. Journal of Philo-
sophical Logic, 21, 391–444.
Kelly, K., & Schulte, O. (1995). The computable testability of theories making uncomputable
predictions. Erkenntnis, 43, 29–66.
Kelly, K., & Schulte, O. (1997). Church’s thesis and Hume’s problem. In M. L. Dalla Chiara et al.
(Eds.), Logic and scientific methods. Dordrecht: Kluwer.
Kemeny, J. (1953). The use of simplicity in induction. Philosophical Review, 62, 391–408.
Kugel, P. (1977). Induction pure and simple. Information and Control, 33, 236–336.
Lauth, B. (1993). Inductive inference in the limit for first-order sentences. Studia Logica, 52,
491–517.
Lehrer, K. (1990). Theory of knowledge. San Francisco: Westview.
Levi, I. (1991). The fixation of belief and its undoing. Cambridge: Cambridge University Press.
Miller, D. (1974). On Popper's definitions of verisimilitude. British Journal for the Philosophy of
Science, 25, 155–188.
Mormann, T. (1988). Are all false theories equally false? British Journal for the Philosophy of
Science, 39, 505–519.
Neyman, J., & Pearson, E. (1933). On the problem of the most efficient tests of statistical
hypotheses. Philosophical Transactions of the Royal Society, 231(A), 289–337.
Osherson, D., & Weinstein, S. (1986). Systems that learn. Cambridge: M.I.T. Press.
Osherson, D., & Weinstein, S. (1988). Mechanical learners pay a price for Bayesianism. Journal
of Symbolic Logic, 56, 661–672.
Osherson, D., & Weinstein, S. (1989a). Paradigms of truth detection. Journal of Philosophical
Logic, 18, 1–41.
Osherson, D., & Weinstein, S. (1989b). Identification in the limit of first order structures. Journal
of Philosophical Logic, 15, 55–81.
Osherson, D., & Weinstein, S. (1991). A universal inductive inference machine. Journal of
Symbolic Logic, 56, 661–672.
Popper, K. (1968). The logic of scientific discovery. New York: Harper.
Popper, K. (1982). Unended quest: An intellectual autobiography. LaSalle: Open Court.
Putnam, H. (1963). ‘Degree of confirmation’ and inductive logic. In P. A. Schilpp (Ed.), The
philosophy of Rudolf Carnap. LaSalle: Open Court.
Putnam, H. (1965). Trial and error predicates and a solution to a problem of Mostowski. Journal
of Symbolic Logic, 30, 49–57.
Reichenbach, H. (1938). Experience and prediction. Chicago: University of Chicago Press.
Savage, L. (1972). The foundations of statistics. New York: Dover.
Sextus Empiricus. (1985). Selections from the major writings on scepticism, man and god (P.
Hallie, Ed., S. Etheridge, Trans.). Indianapolis: Hackett.
Shapiro, E. (1981). Inductive inference of theories from facts. Report YLU 192. New Haven:
Department of Computer Science, Yale University.
Wexler, K., & Culicover, P. (1980). Formal principles of language acquisition. Cambridge: M.I.T.
Press.
Chapter 34
Some Computational Constraints
in Epistemic Logic
Timothy Williamson
Introduction
This paper concerns limits that some epistemic logics impose on the complexity of
an epistemic agent’s reasoning, rather than limits on the complexity of the epistemic
logic itself.
As an epistemic agent, one theorizes about a world which contains the theo-
rizing of epistemic agents, including oneself. Epistemic logicians theorize about
the abstract structure of epistemic agents’ theorizing. This paper concerns the
comparatively simple special case of epistemic logic in which only one agent is
considered. Such an epistemic agent theorizes about a world which contains that
agent’s theorizing. One has knowledge about one’s own knowledge, or beliefs
about one’s own beliefs. The considerations of this paper can be generalized to
multi-agent epistemic logic, but that will not be done here. Formally, single-agent
epistemic logic is just standard monomodal logic; we call it ‘epistemic’ in view of
the envisaged applications.
In epistemic logic, we typically abstract away from some practical computational
limitations of all real epistemic agents. For example, we are not concerned with
their failure to infer from a proposition q the disjunction q _ r for every unrelated
proposition r. What matters is that if some propositions do in fact follow from the
agent’s theory (from what the agent knows, or believes), then so too do all their
logical consequences. For ease of exposition, we may idealize epistemic agents and
describe them as knowing whatever follows from what they know, or as believing
whatever follows from what they believe, but we could equally well redescribe the
matter in less contentious terms by substituting ‘p follows from what one knows’
for ‘one knows p’ or ‘p follows from what one believes’ for ‘one believes p’
throughout the informal renderings of formulas, at the cost only of some clumsiness.
Thus, if we so wish, we can make what looks like the notorious assumption
of logical omniscience true by definition of the relevant epistemic operators. On
suitable readings, it is a triviality rather than an idealization. It does not follow that
no computational constraints are of any concern to epistemic logic. For if one’s
knowledge is logically closed by definition, that makes it computationally all the
harder to know that one does not know something: in the standard jargon, logical
omniscience poses a new threat to negative introspection. That threat is one of the
phenomena to be investigated in this paper.
In a recursively axiomatizable epistemic logic, logical omniscience amounts
to closure under a recursively axiomatizable system of inferences. Thus all the
inferences in question can in principle be carried out by a single Turing machine, an
idealized computer. Epistemic logicians do not usually want to make assumptions
which would require an epistemic agent to exceed every Turing machine in
computational power. In particular, such a requirement would presumably defeat the
purpose of the many current applications of epistemic logic in computer science. By
extension, epistemic logicians might prefer not to make assumptions which would
permit an epistemic agent not to exceed every Turing machine in computational
power only under highly restrictive conditions. Of course, such assumptions might
be perfectly appropriate in special applications of epistemic logic to cases in which
those restrictive conditions may be treated as met. But they would not be appropriate
in more general theoretical uses of epistemic logic.
As an example, let us consider the so-called axiom of negative introspection
alluded to above. It may be read as the claim that if one does not know p then
one knows that one does not know p, or that if one does not believe p then
one believes that one does not believe p. In terms of theories: if one’s theory
does not entail p, then one’s theory entails that one’s theory does not entail p.
That assumption is acceptable in special cases for special values of ‘p’. However,
for a theory to be consistent is in effect for there to be some p which it does
not entail. On this reading, negative introspection implies that if one’s theory is
consistent then it entails its own consistency. But, by Gödel’s second incompleteness
theorem, if one’s theory is recursively axiomatizable and includes Peano arithmetic,
then it entails its own consistency only if it is inconsistent. Thus, combined with
the incompleteness theorem, negative introspection implies that if one’s theory is
recursively axiomatizable then it includes Peano arithmetic only if it is inconsistent.
Yet, in a wide range of interesting cases, the output of a Turing machine, or
the theory of an epistemic agent of equal computational power, is a consistent
recursively axiomatizable theory which includes Peano arithmetic. Thus, except in
special circumstances, the negative introspection axiom imposes an unwarranted
constraint on the computational power of epistemic agents.
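The argument just sketched can be set out compactly; the following is my own schematic rendering (in it, □p abbreviates "the agent's theory T entails p").

\[ \text{Negative introspection: } \neg\Box p \rightarrow \Box\neg\Box p. \]
\[ T \text{ is consistent } \iff \exists p\,(T \nvdash p) \quad\Longrightarrow\quad T \vdash \mathrm{Con}(T) \text{ (by negative introspection).} \]
\[ \text{G\"odel's second incompleteness theorem: } T \vdash \mathrm{Con}(T) \Rightarrow T \text{ is inconsistent, for r.e. } T \supseteq \mathrm{PA}. \]
\[ \text{Hence, given negative introspection, an r.e. theory } T \supseteq \mathrm{PA} \text{ is consistent only if it is inconsistent.} \]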
Naturally, such an argument must be made more rigorous before we can place
much confidence in it. That will be done below. The problem for the negative
introspection axiom turns out to be rather general: it arises not just for extensions of
Peano arithmetic but for any undecidable recursively axiomatizable theory, that is,
for any theory which is the output of some Turing machine while its complement is
not. It is very natural to consider epistemic agents whose theories are of that kind.
The aim of this paper is not primarily to criticize the negative introspection
axiom. Rather, it is to generalize the problem to which that axiom gives rise,
to formulate precisely the conditions which a system of epistemic logic must
satisfy in order not to be susceptible to such problems, and to investigate which
systems satisfy those conditions. The conditions in question will be called r.e.
conservativeness and r.e. quasi-conservativeness. Very roughly indeed, a system
satisfies these conditions if it has a wide enough variety of models in which the
epistemic agent is computationally constrained. Such models appear to be among
the intended models on various applications of epistemic logic. As already noted,
systems of epistemic logic which do not satisfy the conditions may be appropriate
for other applications. But it is time to be more precise.
Computational Constraints
in L which the agent (with only the computational power of a Turing machine) has
learned about the black box. Such situations seem quite reasonable. If we want an
epistemic logic to have a generality beyond some local application, it should apply to
them: such situations should correspond to intended models. Now any application
which has all those intended models thereby satisfies (*) or (*con ), depending on
whether the epistemic agent’s theory is required to be consistent:
(*) For every r.e. theory R in L, L ∩ □⁻¹M = R for some r.e. intended model M.
(*con) For every consistent r.e. theory R in L, L ∩ □⁻¹M = R for some r.e. intended
model M.
(*) is appropriate for readings of □ like ‘It follows from what I believe that …’,
if the agent is not required to be consistent. For readings of □ like ‘It follows from
what I know that …’, only (*con) is appropriate, for one can know only truths and
any set of truths is consistent. We can define corresponding constraints on a modal
logic Σ without reference to models:
Σ is r.e. conservative if and only if for every r.e. theory R in L, there is a maximal
Σ-consistent set X such that □⁻¹X is r.e. and L ∩ □⁻¹X = R.
Σ is r.e. quasi-conservative if and only if for every consistent r.e. theory R in L,
there is a maximal Σ-consistent set X such that □⁻¹X is r.e. and L ∩ □⁻¹X = R.
Here □⁻¹X = {α : □α ∈ X}. Roughly, if Σ is r.e. (quasi-)conservative
then every (consistent) r.e. theory in the language without □ is conservatively
extended by an r.e. theory in the language with □ such that it is consistent in
Σ for R to be exactly what the agent cognizes in the language without □ while
what the agent cognizes in the language with □ constitutes an r.e. theory. If an
application satisfies (*), its logic is r.e. conservative, for X can be the set of
formulas true in M. Conversely, any r.e. conservative logic is the logic of some
application which satisfies (*), for some appropriate kind of intended model. The
same relationships hold between (*con ) and r.e. quasi-conservativeness. For many
applications of epistemic logic, the class of intended models is quite restricted and
even (*con ) does not hold. But if the application interprets as something like ‘It
follows from what I believe/know that’, without special restrictions on the epistemic
subject, then situations of the kind described above will correspond to intended
models and the logic of the application will be r.e. [quasi-] conservative. In this
paper we do not attempt to determine which informally presented applications of
epistemic logic satisfy (*) or (*con ). We simply investigate which logics are r.e.
[quasi-] conservative.
Trivially, every r.e. conservative modal logic is r.e. quasi-conservative. Examples
will be given below of r.e. quasi-conservative normal modal logics which are not r.e.
conservative. For prenormal modal logics, r.e. conservativeness can be characterized
in terms of r.e. quasi-conservativeness in a simple way which allows us to transfer
results about one to the other:
Proposition 34.1 Let Σ be a prenormal modal logic. Then Σ is r.e. conservative if
and only if Σ is r.e. quasi-conservative and not ⊢Σ ◊⊤.
Which modal logics are not r.e. [quasi-] conservative? Obviously, since ⊢S5 ◊⊤,
the logic S5 is not r.e. conservative. Since S5 is decidable, this does not result from
non-recursiveness in S5 itself. More significantly:
Proposition 34.2 S5 is not r.e. quasi-conservative.
Proof (Skyrms 1978, 377 and Shin and Williamson 1994, Proposition 34.1 have
similar proofs of related facts about S5): Let R be a non-recursive r.e. theory in L;
R is consistent. Suppose that □⁻¹X is r.e. and L ∩ □⁻¹X = R for some maximal
S5-consistent set X. Now L − R = {α : □¬□α ∈ X} ∩ L. For if α ∈ L − R
then □α ∉ X, so ¬□α ∈ X; but ⊢S5 ¬□α ⊃ □¬□α, so □¬□α ∈ X since X
is maximal S5-consistent. Conversely, if □¬□α ∈ X then ¬□α ∈ X since ⊢S5
□¬□α ⊃ ¬□α, so □α ∉ X, so α ∉ R since L ∩ □⁻¹X = R. Since □⁻¹X is r.e., so
is {α : □¬□α ∈ X} ∩ L, i.e. L − R. Contradiction.
Thus the partitional conception of knowledge prevents a subject with the
computational capacity of a Turing machine from having as the restriction of its
theory to the □-free language any non-recursive r.e. theory (for other problems with
the S5 schema in epistemic logic and further references see Williamson (2000, 23–
24, 166–167, 226–228, 316–317)). Thus S5 is unsuitable as a general epistemic
logic for Turing machines.
The proof of Proposition 34.2 depends on the existence of an r.e. set whose
complement is not r.e. By contrast, the complement of any recursive set is itself
recursive; decidability, unlike semi-decidability, is symmetric between positive
and negative answers. The analogue of Proposition 34.2 for a notion like r.e.
quasi-conservativeness but defined in terms of recursiveness rather than recursive
enumerability would be false. For it is not hard to show that if R is a consistent
recursive theory in L, then there is a maximal S5-consistent set X in L□ such that
□⁻¹X is recursive and L ∩ □⁻¹X = R. Thus S5 imposes computational constraints
not on very clever agents (whose theories need not be r.e.) or on very stupid agents
(whose theories must be recursive) but on half-clever agents (whose theories must
be r.e. but need not be recursive).
Proposition 34.2 is the rigorous version of the argument sketched in the
introduction. Can we generalize it? The next result provides a rather unintuitive
necessary condition for r.e. quasi-conservativeness which nevertheless has many
applications.
Theorem 34.3 Let Σ be a modal logic such that for some formulas α₀, …, αₙ ∈ L□
and β₀, …, βₙ ∈ L, ⊢Σ ∨{□αᵢ : i ≤ n} and, for each i ≤ n, ⊢Σ (□αᵢ ∧ □βᵢ) ⊃ □⊥
and not ⊢PC ¬βᵢ. Then Σ is not r.e. quasi-conservative.
Proof There are pairwise disjoint r.e. subsets I₀, I₁, I₂, … of the natural numbers
N such that for every total recursive function f, i ∈ I_{f(i)} for some i ∈ N. For let f[0],
f[1], f[2], … be a recursive enumeration of all partial and total recursive functions
on N and set Iᵢ = {j : f[j](j) is defined and = i}; then j ∈ I_{f[j](j)} whenever f[j] is total,
Iᵢ is r.e. and Iᵢ ∩ Iⱼ = ∅ whenever i ≠ j. Now suppose that (i) ⊢Σ ∨{□αᵢ : i ≤ n};
(ii) ⊢Σ (□αᵢ ∧ □βᵢ) ⊃ □⊥ for each i ≤ n; (iii) ⊢PC ¬βᵢ for no i ≤ n. Let m be the
highest subscript on any propositional variable occurring in β₀, …, βₙ. For all
i ∈ N, let σᵢ and τᵢ be substitutions such that σᵢpⱼ = p_{i(m+1)+j} and τᵢp_{i(m+1)+j} = pⱼ
for all j ∈ N. Set U = {σᵢβⱼ : i ∈ Iⱼ}. Since the σᵢ are recursive and the Iⱼ are r.e.,
U is r.e. Now ⊢PC ¬σᵢβⱼ for no i, j, otherwise ⊢PC ¬τᵢσᵢβⱼ, i.e., ⊢PC ¬βⱼ, contrary
to (iii). Moreover, if h ≠ i then σₕβⱼ and σᵢβₖ have no propositional variable in
common. Thus if h ∈ Iⱼ and i ∈ Iₖ and σₕβⱼ has a variable in common with σᵢβₖ,
then h = i, so j = k because the Iⱼ are pairwise disjoint. Hence no two members of U
have a propositional variable in common. Thus U is consistent. Let R be the smallest
theory in L containing U; R is consistent and r.e. Suppose that for some maximal
Σ-consistent set X, □⁻¹X is r.e. and L ∩ □⁻¹X = R. Let the total recursive function
g enumerate □⁻¹X. Fix j ∈ N. By (i), ⊢Σ ∨{□σⱼαᵢ : i ≤ n} since Σ is closed under
US, so □σⱼαᵢ ∈ X for some i ≤ n since X is maximal Σ-consistent. Thus g(k) = σⱼαᵢ
for some k; let k(j) be the least k such that g(k) ∈ {σⱼαᵢ : i ≤ n}. Let f(j) be the least
i ≤ n such that g(k(j)) = σⱼαᵢ. Since g enumerates □⁻¹X, □σⱼα_{f(j)} ∈ X. Since g and
σⱼ are total recursive, k is total recursive, so f is total recursive. Thus j ∈ I_{f(j)} for
some j ∈ N, so σⱼβ_{f(j)} ∈ U ⊆ R since f(j) ≤ n. Since L ∩ □⁻¹X = R, □σⱼβ_{f(j)} ∈ X.
By (ii), ⊢Σ (□α_{f(j)} ∧ □β_{f(j)}) ⊃ □⊥, so ⊢Σ (□σⱼα_{f(j)} ∧ □σⱼβ_{f(j)}) ⊃ □⊥; since X is
maximal Σ-consistent, □⊥ ∈ X. Thus ⊥ ∈ R, contradicting the consistency of R.
Thus no such set as X can exist, so Σ is not r.e. quasi-conservative.
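The combinatorial fact used at the start of the proof can be illustrated with a finite stand-in for the enumeration of recursive functions (a toy sketch, mine, not part of the proof): with Iᵢ = {j : f[j](j) = i}, the Iᵢ are pairwise disjoint and any total f in the enumeration, say f = f[j], satisfies j ∈ I_{f(j)}.

# Finite stand-in for the enumeration f[0], f[1], ... of (total) functions.
enumeration = [lambda n: n + 1, lambda n: 0, lambda n: n * n, lambda n: 7]

def I(i):
    # I_i = { j : f[j](j) is defined and equals i }
    return {j for j, f in enumerate(enumeration) if f(j) == i}

for j, f in enumerate(enumeration):
    assert j in I(f(j))            # the diagonal property
print(I(0), I(1), I(4), I(7))      # pairwise disjoint singletons here: {1} {0} {2} {3}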
Since every modal logic with an r.e. [quasi-] conservative extension is itself r.e.
[quasi-] conservative, an efficient strategy is to seek very strong r.e. [quasi-]
conservative logics, even if they are implausibly strong for most epistemic appli-
cations, because we can note that the weaker and perhaps more plausible logics
which they extend will also be r.e. [quasi-] conservative.
A large class of r.e. conservative logics arises as follows. Let Σ be any epistemic
logic. The agent might cognize each theorem of Σ. Moreover, an epistemic logic Σ*
may imply this, in that ⊢Σ* □α whenever ⊢Σ α. Σ and Σ* may be distinct, even
incompatible. For example, let Ver be the smallest normal modal logic containing
□⊥. Interpreted epistemically, Ver implies that the agent is inconsistent; but Ver
itself is consistent. An epistemic theory consisting just of Ver falsely but consistently
self-attributes inconsistency, and an epistemic logic may report that the agent self-
attributes inconsistency without itself attributing inconsistency to the agent. Thus
Ver* may contain □□⊥ without □⊥. Similarly, let Triv be the smallest normal
modal logic containing all theorems of the form □α ≡ α. Interpreted epistemically,
Triv implies that the agent cognizes that his beliefs contain all and only truths; but
Triv itself does not contain all and only truths (neither ⊢Triv p nor ⊢Triv ¬p). Thus
Triv* may contain □(□p ≡ p) without □p ≡ p. To be more precise, for any modal
logics Λ and Σ let Λ□Σ be the smallest normal extension of Λ containing {□α :
⊢Σ α}. We will prove that if Σ is consistent and normal then K□Σ is r.e. conservative.
K□Σ is an epistemic logic for theorizing about theories that incorporate the
epistemic logic Σ. R.e. conservativeness implies no constraint on what epistemic
logic the agent uses beyond consistency (if Σ is inconsistent, then K□Σ contains
Alt₀ and so is not even r.e. quasi-conservative). In particular, the smallest normal
logic K itself is r.e. conservative. Moreover, if Σ is consistent and normal, then
K4□Σ is r.e. conservative; that is, we can add positive introspection. In particular,
K4 itself is r.e. conservative. We prove this by proving that K□Ver and K□Triv
are r.e. conservative. Since K□Ver and K□Triv contain □□⊥ and □(□p ≡ p)
respectively, they are too strong to be useful epistemic logics themselves, but equally
they are strong enough to contain many other logics of epistemic interest, all of
which must also be r.e. conservative. By contrast, Ver and Triv are not themselves
even r.e. quasi-conservative, for ⊢Ver Alt₀ and ⊢Triv Alt₁.
For future reference, call a mapping ∗ from L□ into L□ respectful if and only
if ∗p = p for all propositional variables p, ∗⊥ = ⊥ and ∗(α ⊃ β) = ∗α ⊃ ∗β for all
formulas α and β.
Lemma 34.7 K□Triv is r.e. conservative.
Proof Let R be an r.e. theory in L. Let δ and κ be respectful mappings from L□
to L such that δ□α = δα; κ□α = ⊤ if R ⊢PC δα and κ□α = ⊥ otherwise, for all
formulas α. (i) Axiomatize Triv with all truth-functional tautologies and formulas
of the form □α ≡ α as the axioms and MP as the only rule of inference (schema
K and rule RN are easily derivable). By an easy induction on the length of proofs,
⊢Triv α only if ⊢PC δα. (ii) Axiomatize K□Triv with all truth-functional tautologies
and formulas of the forms □(α ⊃ β) ⊃ (□α ⊃ □β) and □γ whenever ⊢Triv γ as the
axioms and MP as the only rule of inference (RN is a derived rule; its conclusion
is always an axiom because the logic so defined is a sublogic of Triv). We show
by induction on the length of proofs that ⊢K□Triv α only if ⊢PC κα. Basis: If
⊢PC α, ⊢PC κα. If κ□(α ⊃ β) = ⊤ and κ□α = ⊤ then R ⊢PC δα ⊃ δβ and R ⊢PC
δα, so R ⊢PC δβ, so κ□β = ⊤, so ⊢PC κ(□(α ⊃ β) ⊃ (□α ⊃ □β)); otherwise
κ□(α ⊃ β) = ⊥ or κ□α = ⊥ and again ⊢PC κ(□(α ⊃ β) ⊃ (□α ⊃ □β)). If ⊢Triv γ
then ⊢PC δγ by (i), so R ⊢PC δγ, so κ□γ = ⊤, so ⊢PC κ□γ. Induction step: trivial.
(iii) Put Y = {□α : R ⊢PC δα} ∪ {¬□α : not R ⊢PC δα}. Y is K□Triv-
consistent, for if Y₀ ⊆ Y is finite and ⊢K□Triv ∧Y₀ ⊃ ⊥ then ⊢PC κ(∧Y₀ ⊃ ⊥) by
(ii), i.e. ⊢PC ∧{κα : α ∈ Y₀} ⊃ ⊥, which is impossible since {κα : α ∈ Y} ⊆ {⊤,
¬⊥}. Let X be a maximal K□Triv-consistent extension of Y. By definition of
Y, □⁻¹X = {α : R ⊢PC δα}, which is r.e. because R is r.e. and δ is recursive (although
κ need not be). If α ∈ L, δα = α, so □α ∈ X if and only if R ⊢PC α, i.e., if and only if
α ∈ R because R is a theory; thus L ∩ □⁻¹X = R. Hence K□Triv is r.e. conservative.
Lemma 34.8 K□Ver is r.e. conservative.
Proof Like Lemma 34.7, but in place of δ use a respectful mapping λ such that
λ□α = ⊤.
A notable sublogic of K□Ver is GL, the smallest normal modal logic including
□(□α ⊃ α) ⊃ □α. Thus a corollary of Lemma 34.8 is that GL is r.e. conservative.
GL is in a precise sense the logic of what is provable in Peano arithmetic (PA) about
provability in PA (Boolos 1993 has exposition and references). More generally, if R
is an ω-consistent r.e. extension of PA, then GL is the logic of what is provable in R
about provability in R. Since a Turing machine's theory of arithmetic is presumably
at best an ω-consistent r.e. extension of PA, GL is therefore a salient epistemic logic
for Turing machines, and its r.e. conservativeness is not surprising.
Caution We must be careful in our informal renderings of results about provability
logic. A provability operator creates an intensional context within which the
substitution of coextensive but not provably coextensive descriptions can alter the
truth-value of the whole sentence; this point applies in particular to descriptions
of agents or their theories. On a provability interpretation of □, occurrences of □
within the scope of other occurrences of □ in effect involve just such occurrences
of descriptions of agents or their theories in an intensional context, so which logic is
validated can depend on the manner in which a given agent or theory is described.
The validity of GL as an epistemic logic is relative to a special kind of descriptive
self-presentation of the theory T in the interpretation of □, by a coding of its axioms
and rules of inference. GL is not valid relative to some extensionally equivalent but
intensionally distinct interpretations of □, e.g. the indexical reading ‘I can prove
that’ as uttered by an epistemic subject with the computational capacity of a Turing
machine (Shin and Williamson 1994; Williamson 1996, 1998).
Proposition 34.9 If Σ is a consistent normal modal logic, K□Σ and K4□Σ are r.e.
conservative.
Proof By Makinson (1971), either Σ ⊆ Triv or Σ ⊆ Ver. Hence either K□Σ ⊆
K□Triv or K□Σ ⊆ K□Ver. But Schema 4 is easily derivable in both K□Triv
and K□Ver, so K4□Σ ⊆ K□Triv or K4□Σ ⊆ K□Ver. By Lemmas 34.7 and 34.8,
K□Triv and K□Ver are r.e. conservative, so K4□Σ is.
All the logics salient in this paper are decidable, and therefore r.e., but we should
note that an epistemic logic need not be r.e. to be r.e. conservative:
Corollary 34.10 Not all r.e. conservative normal modal logics are r.e.
Proof (i) We show that for any normal modal logic Σ, ⊢Σ α if and only if
⊢K□Σ □α. Only the ⇐ direction needs proving. Axiomatize K□Σ with all truth-
functional tautologies and formulas of the forms □(α ⊃ β) ⊃ (□α ⊃ □β) and □γ
whenever ⊢Σ γ as the axioms and MP as the only rule of inference (RN is a derived
rule; its conclusion is always an axiom because the logic so defined is a sublogic
of Σ). Let ν be a respectful mapping from L□ to L□ such that ν□α = α for all
formulas α (ν is distinct from δ in the proof of Lemma 34.7 since ν□□p = □p
whereas δ□□p = p). By induction on the length of proofs, ⊢K□Σ α only if ⊢Σ να.
Hence ⊢K□Σ □α only if ⊢Σ α. (ii) By (i), for any normal modal logics Σ₁ and Σ₂,
K□Σ₁ = K□Σ₂ if and only if Σ₁ = Σ₂. But there are continuum many consistent
normal modal logics (Blok 1980 has much more on these lines). Hence there are
continuum many corresponding logics of the form K□Σ; all are r.e. conservative
by Proposition 34.9. Since only countably many modal logics are r.e., some of them
are not r.e.
One limitation of Proposition 34.9 is that K□Σ and K4□Σ never contain the
consistency schema D. In a sense this limitation is easily repaired. For any modal
logic Σ, let Σ[D] be the smallest extension of Σ containing D; thus ⊢Σ[D] α just in
case ⊢Σ ◊⊤ ⊃ α.
Proposition 34.11 For any r.e. conservative modal logic Σ, Σ[D] is r.e. quasi-
conservative.
Proof For any consistent theory R, any maximal Σ-consistent set X such that L ∩
□⁻¹X = R is Σ[D]-consistent because ◊⊤ ∈ X.
Corollary 34.12 If Σ is a consistent normal modal logic, (K□Σ)[D] and
(K4□Σ)[D] are r.e. quasi-conservative.
Proof By Propositions 34.9 and 34.11.
Although Σ[D] is always prenormal, it may not be normal, even if Σ is normal;
sometimes not ⊢Σ[D] □◊⊤. But we can also consider epistemic interpretations
of normal logics with the D schema, e.g., KD and KD4. Such logics contain
□◊⊤; they require agents to cognize their own consistency. By Gödel's second
incompleteness theorem, this condition cannot be met relative to a Gödelian manner
of representing the theory in itself; no consistent normal extension of the provability
logic GL contains D. But □◊⊤ is true on other epistemic interpretations; for
example, we know that our knowledge (as opposed to our beliefs) does not imply a
contradiction. Since GL ⊆ K□Ver, Proposition 34.9 does not generalize to the r.e.
quasi-conservativeness of KD□Σ. But we can generalize Lemma 34.7 thus:
Proposition 34.13 If Σ ⊆ Triv then KD□Σ and KD4□Σ are r.e. quasi-
conservative.
Proof It suffices to prove that KD□Triv (= KD4□Triv) is r.e. quasi-conservative.
Argue as for Lemma 34.7, adding ◊⊤ as an axiom for KD□Triv and noting that if
R is consistent then κ□¬⊤ = ⊥, so ⊢PC κ◊⊤.
In particular, KD and KD4 are themselves r.e. quasi-conservative; they are our
first examples of r.e. quasi-conservative logics which are not r.e. conservative.
We now return to systems with the T schema. Since T implies D, only r.e. quasi-
conservativeness is at issue. That constraint was motivated by the idea that any
consistent r.e. theory in the non-modal language might be exactly the restriction
of the agent’s total r.e. theory to the non-modal language. On many epistemic
interpretations, it is in the spirit of this idea that the agent’s total theory might be
true in the envisaged situation (for example, the agent’s theory about the black box
might be true, having been derived from a reliable witness). To require an epistemic
logic † to leave open these possibilities is to require that †[T] be r.e. quasi-
conservative, where †[T] is the smallest extension of † containing all instances
of T. As with †[D], †[T] need not be normal even when † is; sometimes not
730 T. Williamson
`†[T] (˛ ˛) (Williamson 1998, 113–116 discusses logics of the form †[T]).
Agents may not cognize that they cognize only truths. Nevertheless, particularly
when is interpreted in terms of knowledge, one might want an epistemic logic
such as KT containing (˛ ˛).
Proposition 34.11 and Corollary 34.12 have no analogues for T in place of D.
For any modal logic Σ, if ⊢Σ α then ⊢(K□Σ)[T] □α, but ⊢(K□Σ)[T] □α ⊃ α, so
⊢(K□Σ)[T] α; thus (K□Σ)[T] extends Σ and is r.e. quasi-conservative only if Σ is.
Similarly, Proposition 34.13 would be false with T in place of D (counterexample:
Σ = S5). Therefore, needing a different approach, we start with the system GL[T].
GL[T] has intrinsic interest, for it is the provability logic GLS introduced by Solovay
and shown by him to be the logic of what is true (rather than provable) about
provability in PA; more generally, it is the logic of what is true about provability
in an ω-consistent r.e. extension of PA. GLS is therefore a salient epistemic logic
for Turing machines, and its r.e. quasi-conservativeness is not surprising. Although
GLS is not normal and has no consistent normal extension, we can use its r.e. quasi-
conservativeness to establish that of normal logics containing T.
Proposition 34.14 GLS is r.e. quasi-conservative.
Proof Let R be a consistent r.e. theory in L. Axiomatize a theory RC in L□
with all members of R, truth-functional tautologies and formulas of the forms
□(α ⊃ β) ⊃ (□α ⊃ □β) and □(□α ⊃ α) ⊃ □α as the axioms and MP and RN as
the rules of inference. Since R is r.e., so is RC. Let λ be the respectful mapping
such that λ□α = ⊤ for all formulas α. By an easy induction on the length of proofs,
if ⊢RC α then R ⊢PC λα. But if α ∈ L then λα = α, so ⊢RC α only if R ⊢PC α, i.e.,
α ∈ R; conversely, if α ∈ R then ⊢RC α; thus L ∩ RC = R. Let Y ⊆ L be a maximal
consistent extension of R. Define a set X ⊆ L□ inductively: pᵢ ∈ X ⟺ pᵢ ∈
Y; ⊥ ∉ X; α ⊃ β ∈ X ⟺ α ∉ X or β ∈ X; □α ∈ X ⟺ ⊢RC α. For α ∈
L□, either α ∈ X or ¬α ∈ X. We show by induction on the length of proofs that if
⊢RC α then α ∈ X. Basis: If α ∈ R then α ∈ Y ⊆ X. If □(α ⊃ β) ∈ X and □α ∈ X
then ⊢RC α ⊃ β and ⊢RC α, so ⊢RC β, so □β ∈ X; thus □(α ⊃ β) ⊃ (□α ⊃ □β)
∈ X. If □(□α ⊃ α) ∈ X then ⊢RC □α ⊃ α, so ⊢RC □(□α ⊃ α) because RC is
closed under RN; but ⊢RC □(□α ⊃ α) ⊃ □α, so ⊢RC □α, so ⊢RC α, so □α ∈ X;
thus □(□α ⊃ α) ⊃ □α ∈ X. Induction step: Trivial. Now axiomatize GLS with all
theorems of GL and formulas of the form □α ⊃ α as the axioms and MP as the only
rule of inference. We show by induction on the length of proofs that, for all formulas
α, if ⊢GLS α then α ∈ X. Basis: If ⊢GL α then ⊢RC α because GL ⊆ RC, so α ∈ X by
the previous induction. If □α ∈ X then ⊢RC α, so again α ∈ X; thus □α ⊃ α ∈ X.
Induction step: Trivial. Hence GLS ⊆ X, so X is maximal GLS-consistent. Now L ∩
□⁻¹X = L ∩ RC = R and □⁻¹X = RC is r.e. Thus GLS is r.e. quasi-conservative.
We can extend Proposition 34.14 to another system of interest in relation to
provability logic. Grz is the smallest normal modal logic containing all formulas
of the form □(□(α ⊃ □α) ⊃ α) ⊃ α. Grz turns out to be in a precise sense the logic
of what is both provable and true in PA (Boolos 1993, 155–161 has all the facts
about Grz used here). Grz is intimately related to GLS in a way which allows us to
extend the r.e. quasi-conservativeness of GLS to Grz:
Conclusion
Our investigation has uncovered part of a complex picture. The line between those
modal logics weak enough to be r.e. conservative or r.e. quasi-conservative and
those that are too strong appears not to coincide with any more familiar distinction
between classes of modal logics, although a solution to the problem left open in the
section “Some non-r.e. quasi-conservative logics” about the converse of Theorem
34.3 might bring clarification. What we have seen is that some decidable modal
logics in general use as logics of knowledge (such as S5) or belief (such as KD45 and
K45) when applied in generalized settings impose constraints on epistemic agents
that require them to exceed every Turing machine in computational power. For many
interpretations of epistemic logic, such a constraint is unacceptably strong.
The problem is not the same as the issue of logical omniscience, since many
epistemic logics (such as S4 and various provability logics) do not impose the
unacceptably strong constraints, although they do impose logical omniscience.
Interpretations that finesse logical omniscience by building it into the definition
of the propositional attitude that interprets the symbol □ do not thereby finesse
the computational issue that we have been investigating. Nevertheless, the two
questions are related, because the deductive closure of a recursively axiomatised
theory is what makes its theorems computationally hard to survey. In particular,
it can be computationally hard to check for non-theoremhood, which is what
negative introspection and similar axioms require. In fact, negative introspection by
itself turned out not to impose unacceptable computational requirements (Corollary
34.18), but its combination with independently more plausible axioms does so.
Perhaps the issues raised in this paper will provide a more fruitful context in which
to discuss some of the questions raised by the debate on logical omniscience and
bounded rationality.
The results proved in the paper also suggest that more consideration should be
given to the epistemic use of weaker modal logics that are r.e. conservative or quasi-
conservative. The plausibility of correspondingly weaker axioms must be evaluated
under suitable epistemic interpretations. Weaker epistemic logics present a more
complex picture of the knowing subject, but also a more nuanced one, because they
make distinctions that stronger logics erase. We have seen that the more nuanced
picture is needed to express the limits in general cognition of creatures whose
powers do not exceed those of every Turing machine.
Acknowledgements Material based on this paper was presented to colloquia of the British Society
for the Philosophy of Science and the Computer Science Laboratory at Oxford. I thank participants
in both for useful comments.
References
Blok, W. J. (1980). The lattice of modal logics: An algebraic investigation. Journal of Symbolic
Logic, 45, 221–236.
Boolos, G. (1993). The logic of provability. Cambridge: Cambridge University Press.
Craig, W. (1953). On axiomatizability within a system. Journal of Symbolic Logic, 18, 30–32.
Fagin, R., Halpern, J. Y., Moses, Y., & Vardi, M. (1995). Reasoning about knowledge. Cambridge,
MA: MIT Press.
Makinson, D. C. (1971). Some embedding theorems in modal logic. Notre Dame Journal of Formal
Logic, 12, 252–254.
Shin, H. S., & Williamson, T. (1994). Representing the knowledge of Turing machines. Theory
and Decision, 37, 125–146.
Skyrms, B. (1978). An immaculate conception of modality. The Journal of Philosophy, 75,
368–387.
Williamson, T. (1996). Self-knowledge and embedded operators. Analysis, 56, 202–209.
Williamson, T. (1998). Iterated attitudes. In T. Smiley (Ed.), Philosophical logic (pp. 85–133).
Oxford: Oxford University Press for the British Academy.
Williamson, T. (2000). Knowledge and its limits. Oxford: Oxford University Press.
Part V
Interactive Epistemology
Chapter 35
Introduction
The last few decades have witnessed the growing importance of multi-agent per-
spectives in epistemic matters. While traditional epistemology has largely centered
on what single agents know, barring the occasional encounter with a skeptic,
there has been a growing focus on interaction in many disciplines, turning from
single-reasoner to many-reasoners problems, the way physicists turned to many-
body constellations as the essence of nature. This trend may be seen in social
epistemology, speaker-hearer views of meaning, dialogical foundations of logic, or
multi-agent systems instead of single computing devices in computer science. While
an inference or an observation may be the basic informational act for a single agent,
think of a question plus answer as the unit of social communication. This agent
exchange involves knowledge about facts and about others, and the information that
flows and thus changes the current epistemic state of both agents in systematic ways.
Existing epistemic and doxastic logics can describe part of this setting, since they
allow for iteration for different agents, expressing things like “agent 1 believes that
agent 2 knows whether the heater is on”. But the next level of social interaction
involves the formation of groups with their own forms of knowledge, based perhaps
on shared information.
One crucial notion concerning groups is ‘common knowledge’, which has
come up in several disciplines independently. The article by Lewis gives an early
philosophical motivation and development for common knowledge as a foundation
for coordinated behavior, while the piece by Barwise points out logical subtleties in
defining this notion that still have not all been disentangled decades later. Another
crucial phenomenon, as we said, is the nature of the dynamic actions and events that
drive the flow of information and interaction. Such actions have been brought inside
the scope of logic by combining ideas from epistemic logic and logics of programs
in the seminal paper by Baltag, Solecki and Moss. Baltag and Smets generalize
this social dynamics to include belief revision and related phenomena. In particular,
these papers show how a wide variety of actions of knowledge update and belief
change can be described in ‘dynamic-epistemic logics’ allowing for groups with
individual differences in observational power and habits of revision.
All these things and more come together in the compass of games, where
information-driven agents pursue goals based on their evaluation of outcomes,
through an interaction over time involving individual strategies responding to what
others do. Indeed, the foundations of game theory have been a ‘philosophical lab’
since the mid 1970s, and the classic paper by Aumann on agreeing to disagree shows
how group knowledge, investigative procedure, and eventually communication of
disagreement can be subject to surprising logical laws. Perhaps the most well-
studied notions in game theory are epistemic conditions such as common knowledge
or common belief in rationality guaranteeing, under broad conditions, that players
follow the backward induction solution, or other important solution concepts in
games. The paper by Aumann & Brandenburger provides key examples of this style
of analysis, as a major representative of what is by now a large body of literature.
The philosophical community has started picking up on these developments, and
related ones in the study of multi-agent systems: partly due to their intrinsic
interest, and partly because games form a concrete microcosm where just about all
issues that have occupied philosophical logicians occur together. The article by
Stalnaker shows how this encounter of logic and game theory can be of clear mutual
benefit, and the perceptive paper by Halpern analyzes the ensuing debate across
communities about the status of rationality.
Epistemic foundations of game theory as initiated by Aumann is a very rich area. Here is a selection
of just a few papers that will set the reader thinking: J. Geanakoplos & H. Polemarchakis, ‘We Can’t
Disagree Forever’, Journal of Economic Theory 28, 1982, 192–200; P. Battigalli & G. Bonanno,
‘Recent Results on Belief, Knowledge and the Epistemic Foundations of Game Theory’, Research
in Economics 53(2), 1999, 149–225; A. Brandenburger, ‘The Power of Paradox: some Recent
Developments in Interactive Epistemology’, International Journal of Game Theory 35(4): 465–
492. D. Samet & P. Jehiel, ‘Learning to Play Games in Extensive Form by Valuation’, Journal of
Economic Theory 124, 2005, 129–148.
David Lewis
This chapter is an excerpt from David Lewis’ PhD thesis Convention. It contains the primary content
of Chapter 1: Coordination and Convention, where Lewis presents the idea of a convention as a
mutually expected regularity in coordination in a recurrent situation.
D. Lewis (deceased)
Princeton University, Princeton, NJ, USA
(7) Suppose we are contented oligopolists. As the price of our raw material
varies, we must each set new prices. It is to no one’s advantage to set his prices
higher than the others set theirs, since if he does he tends to lose his share of the
market. Nor is it to anyone’s advantage to set his prices lower than the others set
theirs, since if he does he menaces his competitors and incurs their retaliation. So
each must set his prices within the range of prices he expects the others to set.
With these examples, let us see how to describe the common character of coordina-
tion problems.
Two or more agents must each choose one of several alternative actions. Often
all the agents have the same set of alternative actions, but that is not necessary. The
outcomes the agents want to produce or prevent are determined jointly by the actions
of all the agents. So the outcome of any action an agent might choose depends on the
actions of the other agents. That is why—as we have seen in every example—each
must choose what to do according to his expectations about what the others will do.
To exclude trivial cases, a coordination problem must have more than one coor-
dination equilibrium. But that requirement is not quite strong enough. Figure 36.1
shows two matrices in which, sure enough, there are multiple coordination equilibria
(two on the left, four on the right). Yet there is still no need for either agent to
base his choice on his expectation about the other’s choice. There is no need for
them to try for the same equilibrium—no need for coordination—since if they
try for different equilibria, some equilibrium will nevertheless be reached. These
cases exhibit another kind of triviality, akin to the triviality of a case with a unique
coordination equilibrium.
A combination is an equilibrium if each agent likes it at least as well as any other
combination he could have reached, given the others’ choices. Let us call it a proper
equilibrium if each agent likes it better than any other combination he could have
reached, given the others’ choices. In a two-person matrix, for instance, a proper
equilibrium is preferred by Row-chooser to all other combinations in its column, and
by Column-chooser to all other combinations in its row. In the matrices in Fig. 36.1,
there are multiple coordination equilibria, but all of them are improper.

        C1      C2                     C1       C2       C3
R1    1, 1    1, 1           R1      1, 1     1, 1     .2, 0
R2    0, 0    0, 0           R2      1, 1     1, 1     .5, .2
                             R3      0, .5    0, 0     0, 0

Fig. 36.1 (each cell shows the two agents’ payoffs)

        C1      C2      C3
R1    2, 2    0, 0    0, 0
R2    0, 0    2, 2    0, 0
R3    0, 0    1, 1    1, 1

Fig. 36.2
There is no need to stipulate that all equilibria in a coordination problem must
be proper; it seems that the matrix in Fig. 36.2 ought to be counted as essentially
similar to our clear examples of coordination problems, despite the impropriety of
its equilibrium ⟨R3, C3⟩. The two proper coordination equilibria—⟨R1, C1⟩ and
⟨R2, C2⟩—are sufficient to keep the problem nontrivial. I stipulate instead that a
coordination problem must contain at least two proper coordination equilibria.
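By way of illustration (the code below is an editorial addition, not part of Lewis’s text), the two definitions just given can be checked mechanically. The following Python sketch reads the payoffs of Fig. 36.2 off the matrix above and classifies its equilibria; since both agents receive the same payoff in every cell there, the equilibria it finds are also coordination equilibria.

    # A minimal sketch of "equilibrium" and "proper equilibrium" for a
    # two-person matrix: an equilibrium is a combination each agent likes at
    # least as well as any other he could have reached given the other's
    # choice; it is proper if each agent likes it strictly better.

    # Payoffs of Fig. 36.2; each cell is (Row-chooser's, Column-chooser's) payoff.
    fig_36_2 = {
        ('R1', 'C1'): (2, 2), ('R1', 'C2'): (0, 0), ('R1', 'C3'): (0, 0),
        ('R2', 'C1'): (0, 0), ('R2', 'C2'): (2, 2), ('R2', 'C3'): (0, 0),
        ('R3', 'C1'): (0, 0), ('R3', 'C2'): (1, 1), ('R3', 'C3'): (1, 1),
    }
    rows, cols = ['R1', 'R2', 'R3'], ['C1', 'C2', 'C3']

    def is_equilibrium(matrix, r, c):
        row_ok = all(matrix[(r, c)][0] >= matrix[(r2, c)][0] for r2 in rows)
        col_ok = all(matrix[(r, c)][1] >= matrix[(r, c2)][1] for c2 in cols)
        return row_ok and col_ok

    def is_proper(matrix, r, c):
        row_ok = all(matrix[(r, c)][0] > matrix[(r2, c)][0] for r2 in rows if r2 != r)
        col_ok = all(matrix[(r, c)][1] > matrix[(r, c2)][1] for c2 in cols if c2 != c)
        return row_ok and col_ok

    for r in rows:
        for c in cols:
            if is_equilibrium(fig_36_2, r, c):
                kind = "proper" if is_proper(fig_36_2, r, c) else "improper"
                print(r, c, kind)
    # Prints: R1 C1 proper, R2 C2 proper, R3 C3 improper -- matching the text.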
This is only one—the strongest—of several defensible restrictions. We might
prefer a weaker restriction that would not rule out matrices like those in Fig. 36.3.
But a satisfactory restriction would be complicated and would entail too many
qualifications later. And situations like those of Fig. 36.3 can be rescued even under
the strong restriction we have adopted. Let R2′ be the disjunction of R2 and R3, and
C2′ the disjunction of C2 and C3 in the left-hand matrix. Then the same situation
can be represented by the new matrix in Fig. 36.4, which does have two proper
coordination equilibria. The right-hand matrix can be consolidated in a similar way.
But matrices like the one in Fig. 36.5, which are ruled out by the strong restriction,
and ought to be ruled out, cannot be rescued by any such consolidation.
To sum up: Coordination problems—situations that resemble my 11 examples
in the important respects in which they resemble one another1—are situations of
interdependent decision by two or more agents in which coincidence of interest
predominates and in which there are two or more proper coordination equilibria.
We could also say—though less informatively than one might think—that they are
situations in which, relative to some classification of actions, the agents have a
common interest in all doing the same one of several alternative actions.
1
See Michael Slote, “The Theory of Important Criteria,” Journal of Philosophy, 63 (1966), pp.
211–224. Slote shows that we commonly introduce a class by means of examples and take the
defining features of the class to be those distinctive features of our examples which seem important
for an understanding of their character. That is what I take myself to be doing here and elsewhere.
        C1      C2      C3                     C1      C2      C3      C4
R1    1, 1    0, 0    0, 0           R1      1, 1    1, 1    0, 0    0, 0
R2    0, 0    1, 1    1, 1           R2      1, 1    1, 1    0, 0    0, 0
R3    0, 0    1, 1    1, 1           R3      0, 0    0, 0    1, 1    1, 1
                                     R4      0, 0    0, 0    1, 1    1, 1

Fig. 36.3
        C1      C2′
R1    1, 1    0, 0
R2′   0, 0    1, 1

Fig. 36.4
        C1      C2      C3
R1    1, 1    1, 1    0, 0
R2    0, 0    1, 1    1, 1
R3    0, 0    0, 0    1, 1

Fig. 36.5
Agents confronted by a coordination problem may or may not succeed in each acting
so that they reach one of the possible coordination equilibria. They might succeed
just by luck, although some of them choose without regard to the others’ expected
actions (doing so perhaps because they cannot guess what the others will do, perhaps
because the chance of coordination seems so small as to be negligible).
But they are more likely to succeed—if they do—through the agency of a system
of suitably concordant mutual expectations. Thus in example (1) I may go to a
certain place because I expect you to go there, while you go there because you
expect me to; in example (2) I may call back because I expect you not to, while you
do not because you expect me to; in example (4) each of us may drive on the right
because he expects the rest to do so; and so on. In general, each may do his part
of one of the possible coordination equilibria because he expects the others to do
theirs, thereby reaching that equilibrium.
If an agent were completely confident in his expectation that the others would do
their parts of a certain proper coordination equilibrium, he would have a decisive
reason to do his own part. But if—as in any real case—his confidence is less than
complete, he must balance his preference for doing his part if the others do theirs
against his preferences for acting otherwise if they do not. He has a decisive reason
to do his own part if he is sufficiently confident in his expectation that the others will
do theirs. The degree of confidence which is sufficient depends on all his payoffs
and sometimes on the comparative probabilities he assigns to the different ways
the others might not all do their parts, in case not all of them do. For instance,
in the coordination problem shown in Fig. 36.6, Row-chooser should do his part
of the coordination equilibrium ⟨R1, C1⟩ by choosing R1 if he has more than
.5 confidence that Column-chooser will do his part by choosing C1. But in the
coordination problems shown in Fig. 36.7, Row-chooser should choose R1 only if
he has more than .9 confidence that Column-chooser will choose C1. If he has, say,
.8 confidence that Column-chooser will choose C1, he would do better to choose R2,
sacrificing his chance to achieve coordination at ⟨R1, C1⟩ in order to hedge against
the possibility that his expectation was wrong. And in the coordination problem
shown in Fig. 36.8, Row-chooser might be sure that if Column-chooser fails to do
his part of ⟨R1, C1⟩, at least he will choose C2, not C3; if so, Row-chooser should
choose R1 if he has more than .5 confidence that Column-chooser will choose C1.

        C1      C2
R1    1, 1    0, 0
R2    0, 0    1, 1

Fig. 36.6

        C1       C2                C1      C2                C1       C2
R1    1, 1    -8, -8      R1     1, 1    0, 0      R1      3, 3    -26, -26
R2    0, 0     1, 1       R2     0, 0    9, 9      R2      0, 0      1, 1

Fig. 36.7

        C1      C2       C3
R1    1, 1    0, 0    -8, -8
R2    0, 0    1, 1     9, 9

Fig. 36.8
Or Row-chooser might think that if Column-chooser fails to choose C1, he is just
as likely to choose C3 as to choose C2; if so, Row-chooser should choose R1 only
if he has more than .9 confidence that Column-chooser will choose C1. Or Row-
chooser might be sure that if Column-chooser does not choose C1, he will choose
C3 instead; if so, Row-chooser’s minimum sufficient degree of confidence is about
.95. The strength of concordant expectation needed to produce coordination at a
certain equilibrium is a measure of the difficulty of achieving coordination there,
since however the concordant expectations are produced, weaker expectations will
be produced more easily than stronger ones. (We can imagine cases in which so
much mutual confidence is required to achieve coordination at an equilibrium that
success is impossible. Imagine that a millionaire offers to distribute his fortune
equally among a thousand men if each sends him $10; if even one does not, the
millionaire will keep whatever he is sent. I take it that no matter what the thousand
do to increase their mutual confidence, it is a practical certainty that the millionaire
will not have to pay up. So if I am one of the thousand, I will keep my $10.)
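The confidence thresholds in the last few paragraphs can be recomputed directly. The short Python sketch below is an editorial illustration (not Lewis’s); it takes Row-chooser’s payoffs from Figs. 36.6, 36.7 and 36.8 and a conditional probability distribution over what Column-chooser does if he does not choose C1, and returns the minimum confidence in C1 at which R1 is at least as good a choice as R2.

    # Minimum confidence p in "Column-chooser plays C1" making R1 at least as
    # good as R2, i.e. the p solving
    #   p*u(R1,C1) + (1-p)*E[u(R1,.) | not C1] = p*u(R2,C1) + (1-p)*E[u(R2,.) | not C1].

    def threshold(u_r1, u_r2, other_probs):
        """u_r1, u_r2: column -> Row-chooser's payoff under R1 / R2.
        other_probs: column -> probability of that column, given not C1."""
        a1, a2 = u_r1['C1'], u_r2['C1']
        b1 = sum(q * u_r1[c] for c, q in other_probs.items())
        b2 = sum(q * u_r2[c] for c, q in other_probs.items())
        return (b2 - b1) / ((a1 - a2) + (b2 - b1))   # assumes a1 > a2 and b2 > b1

    # Fig. 36.6: payoffs 1/0 vs 0/1 -> .5
    print(threshold({'C1': 1, 'C2': 0}, {'C1': 0, 'C2': 1}, {'C2': 1.0}))

    # Fig. 36.7, first matrix: 1/-8 vs 0/1 -> .9
    print(threshold({'C1': 1, 'C2': -8}, {'C1': 0, 'C2': 1}, {'C2': 1.0}))

    # Fig. 36.8 under the three conjectures about what Column-chooser does instead of C1.
    u1 = {'C1': 1, 'C2': 0, 'C3': -8}
    u2 = {'C1': 0, 'C2': 1, 'C3': 9}
    print(threshold(u1, u2, {'C2': 1.0}))             # he would choose C2: .5
    print(threshold(u1, u2, {'C2': 0.5, 'C3': 0.5}))  # C2 and C3 equally likely: .9
    print(threshold(u1, u2, {'C3': 1.0}))             # he would choose C3: 17/18, about .94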
We may achieve coordination by acting on our concordant expectations about
each other’s actions. And we may acquire those expectations, or correct or
corroborate whatever expectations we already have, by putting ourselves in the other
fellow’s shoes, to the best of our ability. If I know what you believe about the matters
of fact that determine the likely effects of your alternative actions, and if I know your
preferences among possible outcomes and I know that you possess a modicum of
practical rationality, then I can replicate your practical reasoning to figure out what
you will probably do, so that I can act appropriately.
In the case of a coordination problem, or any other problem of interdependent
decision, one of the matters of fact that goes into determining the likely effects of
your alternative actions is my own action. In order to figure out what you will do by
replicating your practical reasoning, I need to figure out what you expect me to do.
I know that, just as I am trying to figure out what you will do by replicating
your reasoning, so you may be trying to figure out what I will do by replicating my
reasoning. This, like anything else you might do to figure out what I will do, is itself
part of your reasoning. So to replicate your reasoning, I may have to replicate your
attempt to replicate my reasoning.
This is not the end. I may reasonably expect you to realize that, unless I already
know what you expect me to do, I may have to try to replicate your attempt to
replicate my reasoning. So I may expect you to try to replicate my attempt to
replicate your attempt to replicate my reasoning. So my own reasoning may have
to include an attempt to replicate your attempt to replicate my attempt to replicate
your attempt to replicate my reasoning. And so on.
Before things get out of hand, it will prove useful to introduce the concept of
higher-order expectations, defined by recursion thus:
A first-order expectation about something is an ordinary expectation about it.
An (n + 1)th-order expectation about something (n ≥ 1) is an ordinary expectation about
someone else’s nth-order expectation about it.
For instance, if I expect you to expect that it will thunder, then I have a second-
order expectation that it will thunder.
Whenever I replicate a piece of your practical reasoning, my second-order
expectations about matters of fact, together with my first-order expectations about
your preferences and your rationality, justify me in forming a first-order expectation
about your action. In the case of problems of interdependent decision—for instance,
coordination problems—some of the requisite second-order expectations must be
about my own action.
Consider our first sample coordination problem: a situation in which you and I
want to meet by going to the same place. Suppose that after deliberation I decide to
come to a certain place. The fundamental practical reasoning which leads me to that
choice is shown in Fig. 36.9. (In all diagrams of this kind, heavy arrows represent
implications; light arrows represent causal connections between the mental states or
actions of a rational agent.) And if my premise for this reasoning—my expectation
that you will go there—was obtained by replicating your reasoning, my replication
is shown in Fig. 36.10. And if my premise for this replication—my expectation that
you will expect me to go there—was obtained by replicating your replication of my
reasoning, my replication of your replication is shown in Fig. 36.11. And so on. The
whole of my reasoning (simplified by disregarding the rationality premises) may be
represented as in Fig. 36.12 for whatever finite number of stages it may take for
me to come down to lower- and lower-order expectations about action.

Fig. 36.9  I have reason to desire that I go there → I go there

Fig. 36.10  I have reason to expect that you will go there → I expect that you will go there

Fig. 36.11  I have reason to expect that you expect that I will go there → I expect that you expect that I will go there

Fig. 36.12  (the whole chain of replications, ending with: I go there)

Provided I go on long enough, and
provided all the needed higher-order expectations about preferences and rationality
are available, I eventually come out with a first-order expectation about your
action—which is what I need in order to know how I should act.
Clearly a similar process of replication is possible in coordination problems
among more than two agents. In general, my higher-order expectations about
something are my expectations about x1’s expectations about x2’s expectations …
about it. (The sequence x1, x2, … may repeat, but x1 cannot be myself and no
one can occur twice in immediate succession.) So when m agents are involved,
I can have as many as (m − 1)^n different nth-order expectations about anything,
corresponding to the (m − 1)^n different admissible sequences of length n. Replication
in general is ramified: it is built from stages in which m − 1 of my various (n + 1)th-
order expectations about action, plus ancillary premises, yield one of my nth-order
expectations about action. I suppressed the ramification by setting m = 2, but the
general case is the same in principle.
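As a quick check on the count just given (an editorial illustration, not Lewis’s text), the Python sketch below enumerates the admissible sequences x1, …, xn for m agents — x1 may not be me, and no agent may occur twice in immediate succession — and confirms that there are (m − 1)^n of them.

    # Count admissible sequences of length n over m agents.
    from itertools import product

    def admissible_sequences(agents, me, n):
        ok = []
        for seq in product(agents, repeat=n):
            if seq[0] == me:
                continue
            if any(seq[i] == seq[i + 1] for i in range(n - 1)):
                continue
            ok.append(seq)
        return ok

    agents = ['me', 'a', 'b', 'c']                 # m = 4 agents, including myself
    m = len(agents)
    for n in range(1, 5):
        count = len(admissible_sequences(agents, 'me', n))
        assert count == (m - 1) ** n               # 3, 9, 27, 81
        print(n, count)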
Note that replication is not an interaction back and forth between people. It is
a process in which one person works out the consequences of his beliefs about
the world—a world he believes to include other people who are working out the
consequences of their beliefs, including their belief in other people who … By our
interaction in the world we acquire various high-order expectations that can serve
us as premises. In our subsequent reasoning we are windowless monads doing our
best to mirror each other, mirror each other mirroring each other, and so on.
Of course I do not imagine that anyone will solve a coordination problem by
first acquiring a seventeenth-order expectation from somewhere and then sitting
down to do his replications. For one thing, we rarely do have expectations of higher
order than, say, fourth. For another thing, any ordinary situation that could justify
a high-order expectation would also justify low-order expectations directly, without
recourse to nested replications.
All the same, given the needed ancillary premises, an expectation of arbitrarily
high order about action does give an agent one good reason for a choice of action.
It may, and normally will, be one reason among the many which jointly
suffice to justify his choice. Suppose the agent is originally justified somehow in
having expectations of several orders about his own and his partners’ actions. And
suppose the ancillary premises are available. Then each of his original expectations
independently gives him a reason to act one way or another. If he is lucky, all
these independent reasons will be reasons for the same action.2 Then that action
is strongly, because redundantly, justified; he has more reason to do it than could
have been provided by any one of his original expectations by itself.
I said earlier that coordination might be rationally achieved with the aid of
concordant mutual expectations about action. We have seen that these may be
derived from first- and higher-order expectations about action, preferences, and
rationality. So we generalize: coordination may be rationally achieved with the aid
of a system of concordant mutual expectations, of first or higher orders, about the
agents’ actions, preferences, and rationality.
The more orders of expectation about action contribute to an agent’s decision,
the more independent justifications the agent will have; and insofar as he is aware of
those justifications, the more firmly his choice will be determined. Circumstances
that will help to solve a coordination problem, therefore, are circumstances in
which the agents become justified in forming mutual expectations belonging to a
concordant system. And the more orders, the better.
2
Michael Scriven, in “An Essential Unpredictability in Human Behavior,” Scientific Psychology:
Principles and Approaches, ed. B. B. Wolman (New York: Basic Books, 1965), has discussed
mutual replication of practical reasoning between agents in a game of conflict who want not
to conform to each other’s expectations. There is a cyclic alternation: from my (n + 4)th-order
expectation that I will go to Minsk to my (n + 3)th-order expectation that you will go to Pinsk
to my (n + 2)th-order expectation that I will go to Pinsk to my (n + 1)th-order expectation that
you will go to Minsk to my nth-order expectation that I will go to Minsk … Scriven notices
that we cannot both act on complete and accurate replications of each other’s reasoning. He takes
this to prove human unpredictability. But perhaps it simply proves that the agents cannot both have
enough time to finish their replications, since the time either needs increases with the time the other
uses. See David Lewis and Jane Richardson, “Scriven on Human Unpredictability,” Philosophical
Studies, 17 (1966), pp. 69–74.
Convention
Let us start with the simplest case of coordination by precedent and generalize in
various ways. In this way we shall meet the phenomenon I call convention, the
subject of this book.
Suppose we have been given a coordination problem, and we have reached
some fairly good coordination equilibrium. Given exactly the same problem again,
perhaps each of us will repeat what he did before. If so, we will reach the same
solution. If you and I met yesterday—by luck, by agreement, by salience, or
however—and today we find we must meet again, we might both go back to
yesterday’s meeting place, each hoping to find the other there. If we were cut off on
the telephone and you happened to call back as I waited, then if we are cut off again
in the same call, I will wait again.
We can explain the force of precedent just as we explained the force of
salience. Indeed, precedent is merely the source of one important kind of salience:
conspicuous uniqueness of an equilibrium because we reached it last time. We may
tend to repeat the action that succeeded before if we have no strong reason to do
otherwise. Whether or not any of us really has this tendency, we may somewhat
expect each other to have it, or expect each other to expect each other to have it, and
so on—that is, we may each have first- and higher-order expectations that the others
will do their parts of the old coordination equilibrium, unless they have reason to act
otherwise. Each one’s expectation that the others will do their parts, strengthened
perhaps by replication using his higher-order expectations, gives him some reason
to do his own part. And if his original expectations of some order or other were
strong enough, he will have a decisive reason to do his part. So he will do it.
I have been supposing that we are given a coordination problem, and then given
the same problem again. But, of course, we could never be given exactly the same
problem twice. There must be this difference at least: the second time, we can draw
on our experience with the first. More generally, the two problems will differ in
several independent respects. We cannot do exactly what we did before. Nothing
we could do this time is exactly like what we did before—like it in every respect—
because the situations are not exactly alike.
So suppose not that we are given the original problem again, but rather that
we are given a new coordination problem analogous somehow to the original one.
Guided by whatever analogy we notice, we tend to follow precedent by trying for
a coordination equilibrium in the new problem which uniquely corresponds to the
one we reached before.
There might be alternative analogies. If so, there is room for ambiguity about
what would be following precedent and doing what we did before. Suppose that
yesterday I called you on the telephone and I called back when we were cut off.
Today you call me and we are cut off. We have a precedent in which I called back and
a precedent—the same one—in which the original caller called back. But this time
you are the original caller. No matter what I do this time, I do something analogous
to what we did before. Our ambiguous precedent does not help us.
In fact, there are always innumerable alternative analogies. Were it not that
we happen uniformly to notice some analogies and ignore others—those we call
“natural” or “artificial,” respectively—precedents would always be completely
ambiguous and worthless. Every coordination equilibrium in our new problem
(every other combination, too) corresponds uniquely to what we did before under
some analogy, shares some distinctive description with it alone. Fortunately, most
of the analogies are artificial. We ignore them; we do not tend to let them guide our
choice, nor do we expect each other to have any such tendency, nor do we expect
each other to expect each other to, and so on. And fortunately we have learned
that all of us will mostly notice the same analogies. That is why precedents can
be unambiguous in practice, and often are. If we notice only one of the analogies
between our problem and the precedent, or if one of those we notice seems far more
conspicuous than the others, or even if several are conspicuous but they all happen
to agree in indicating the same choice, then the other analogies do not matter. We
are not in trouble unless conflicting analogies force themselves on our attention.
The more respects of similarity between the new problem and the precedent, the
more likely it is that different analogies will turn out to agree, the less room there
will be for ambiguity, and the easier it will be to follow precedent. A precedent in
which I, the original caller, called back is ambiguous given a new problem in which
you are the original caller—but not given a new problem in which I am again the
original caller. That is why I began by pretending that the new problem was like the
precedent in all respects.
Salience in general is uniqueness of a coordination equilibrium in a preeminently
conspicuous respect. The salience due to precedent is no exception: it is uniqueness
of a coordination equilibrium in virtue of its preeminently conspicuous analogy to
what was done successfully before.
So far I have been supposing that the agents who set the precedent are the
ones who follow it. This made sure that the agents given the second problem were
acquainted with the circumstances and outcome of the first, and expected each other
to be, expected each other to expect each other to be, and so on. But it is not an
infallible way and not the only way. For instance, if yesterday I told you a story about
people who got separated in the subway and happened to meet again at Charles
Street, and today we get separated in the same way, we might independently decide
to go and wait at Charles Street. It makes no difference whether the story I told
you was true, or whether you thought it was, or whether I thought it was, or even
whether I claimed it was. A fictive precedent would be as effective as an actual one
in suggesting a course of action for us, and therefore as good a source of concordant
mutual expectations enabling us to meet. So let us just stipulate that somehow the
agents in the new problem are acquainted with the precedent, expect each other to
be acquainted with it, and so on.
So far I have been supposing that we have a single precedent to follow. But we
might have several. We might all be acquainted with a class of previous coordination
problems, naturally analogous to our present problem and to each other, in which
analogous coordination equilibria were reached. This is to say that the agents’
actions conformed to some noticeable regularity. Since our present problem is
suitably analogous to the precedents, we can reach a coordination equilibrium by all
conforming to this same regularity. Each of us wants to conform to it if the others do;
he has a conditional preference for conformity. If we do conform, the explanation
has the familiar pattern: we tend to follow precedent, given no particular reason to
do anything else; we expect that tendency in each other; we expect each other to
expect it; and so on. We have our concordant first- and higher-order expectations,
and they enable us to reach a coordination equilibrium.
It does not matter why coordination was achieved at analogous equilibria in the
previous cases. Even if it had happened by luck, we could still follow the precedent
set. One likely course of events would be this: the first case, or the first few, acted
as precedent for the next, those for the next, and so on. Similarly, no matter how our
precedents came about, by following them this time we add this case to the stock of
precedents available henceforth.
Several precedents are better than one, not only because we learn by repetition
but also because differences between the precedents help to resolve ambiguity. Even
if our present situation bears conflicting natural analogies to any one precedent,
maybe only one of these analogies will hold between the precedents; so we will
pay attention only to that one. Suppose we know of many cases in which a cut-off
telephone call was restored, and in every case it was the original caller who called
back. In some cases I was the original caller, in some you were, in some neither of us
was. Now we are cut off and I was the original caller. For you to call back would be
to do something analogous—under one analogy—to what succeeded in some of the
previous cases. But we can ignore that analogy, for under it the precedents disagree.
Once there are many precedents available, without substantial disagreement or
ambiguity, it is no longer necessary for all of us to be acquainted with precisely the
same ones. It is enough if each of us is acquainted with some agreeing precedents,
each expects everyone else to be acquainted with some that agree with his, each
expects everyone else to expect everyone else to be acquainted with some precedents
that agree with his, etc. It is easy to see how that might happen: if one has often
encountered cases in which coordination was achieved in a certain problem by
conforming to a certain regularity, and rarely or never encountered cases in which it
was not, he is entitled to expect his neighbors to have had much the same experience.
If I have driven all around the United States and seen many people driving on the
right and never one on the left, I may reasonably infer that almost everyone in the
United States drives on the right, and hence that this man driving toward me also
has mostly seen people driving on the right—even if he and I have not seen any of
the same people driving on the right.
Our acquaintance with a precedent need not be very detailed. It is enough to
know that one has learned of many cases in which coordination was achieved in a
certain problem by conforming to a certain regularity. There is no need to be able to
specify the time and place, the agents involved, or any other particulars; no need to
be able to recall the cases one by one. I cannot cite precedents one by one in which
people drove on the right in the United States; I am not sure I can cite even one case;
nonetheless, I know very well that I have often seen cars driven in the United States,
and almost always they were on the right. And since I have no reason to think I
encountered an abnormal sample, I infer that drivers in the United States do almost
always drive on the right; so anyone I meet driving in the United States will believe
this just as I do, will expect me to believe it, and so on.
Coordination by precedent, at its simplest, is this: achievement of coordination
by means of shared acquaintance with the achievement of coordination in a single
past case exactly like our present coordination problem. By removing inessential
restrictions, we have come to this: achievement of coordination by means of shared
acquaintance with a regularity governing the achievement of coordination in a class
of past cases which bear some conspicuous analogy to one another and to our
present coordination problem. Our acquaintance with this regularity comes from our
experience with some of its instances, not necessarily the same ones for everybody.
Given a regularity in past cases, we may reasonably extrapolate it into the
(near) future. For we are entitled to expect that when agents acquainted with the
past regularity are confronted by an analogous new coordination problem, they
will succeed in achieving coordination by following precedent and continuing to
conform to the same regularity. We come to expect conforming actions not only in
past cases but in future ones as well. We acquire a general belief, unrestricted as
to time, that members of a certain population conform to a certain regularity in a
certain kind of recurring coordination problem for the sake of coordination.
Each new action in conformity to the regularity adds to our experience of general
conformity. Our experience of general conformity in the past leads us, by force
of precedent, to expect a like conformity in the future. And our expectation of
future conformity is a reason to go on conforming, since to conform if others
do is to achieve a coordination equilibrium and to satisfy one’s own preferences.
And so it goes—we’re here because we’re here because we’re here because we’re
here. Once the process gets started, we have a metastable self-perpetuating system
of preferences, expectations, and actions capable of persisting indefinitely. As
long as uniform conformity is a coordination equilibrium, so that each wants to
conform conditionally upon conformity by the others, conforming action produces
expectation of conforming action and expectation of conforming action produces
conforming action.
This is the phenomenon I call convention. Our first, rough, definition is:
A regularity R in the behavior of members of a population P when they are agents in a
recurrent situation S is a convention if and only if, in any instance of S among members of
P,
(1) everyone conforms to R;
(2) everyone expects everyone else to conform to R;
(3) everyone prefers to conform to R on condition that the others do, since S is a
coordination problem and uniform conformity to R is a proper coordination equilibrium
in S.
Sample Conventions
Chapter II will be devoted to improving the definition. But before we hide the
concept beneath its refinements, let us see how it applies to examples. Consider
some conventions to solve our sample coordination problems.
(1) If you and I must meet every week, perhaps at first we will make a new
appointment every time. But after we have met at the same time and place for a few
weeks running, one of us will say, “See you here next week,” at the end of every
meeting. Later still we will not say anything (unless our usual arrangement is going
to be unsatisfactory next week). We will just both go regularly to a certain place
at a certain time every week, each going there to meet the other and confident that
he will show up. This regularity that has gradually developed in our behavior is a
convention.
Chapter 37
Three Views of Common Knowledge
Jon Barwise
Introduction
As the pioneering work of Dretske1 has shown, knowing, believing, and having
information are closely related and are profitably studied together. Thus while
the title of this paper mentions common knowledge, I really have in mind the
family of related notions including common knowledge, mutual belief and shared
information. Even though I discuss common knowledge in this introduction, the
discussion is really intended to apply to all three notions.
Common knowledge and its relatives have been written about from a wide variety
of perspectives, including psychology, economics, game theory, computer science,
the theory of convention, deterrence theory, the study of human-machine interaction,
and the famous Conway paradox, just to mention a few. There are literally hundreds
of papers that touch on the topic. However, while common knowledge is widely
recognized to be an important phenomenon, there is no agreement as to just what
it amounts to. Or rather, as we will see, what agreement there is presupposes a
set of simplifying assumptions that are completely unrealistic. This paper offers a
comparison of three competing views in a context which does not presuppose them
to be equivalent, and explores their relationships in this context.2
learning about it. But even if this is so, I am reasonably sure that the particular model I develop
below is original, depending as it does on recent work in set theory by Peter Aczel.
3
David Lewis, Convention, A Philosophical Study (Cambridge, Mass.: Harvard University Press,
1969).
4
See, for example, the paper by Halpern and Moses, “Knowledge and common knowledge in
distributed environments,” Proc. 3rd ACM Symp. on Principles of Distributed Computing (1984),
50–61, and the paper by Fagin, Halpern and Vardi, “A model-theoretic analysis of knowledge:
preliminary report,” Proc. 25th IEEE Symposium on Foundations of C.S., 268–278.
5
See Gilbert Harman’s review of Linguistic Behavior by Jonathan Bennett, Language 53 (1977):
417–24.
was pointed out by Tommy Tan and Sergio Ribeiro da Costa Werlang.6 Aumann
suggests that this approach is equivalent to the iterate approach. Tan and Ribeiro
da Costa Werlang develop a mathematical model of the iterate approach and show
that it is equivalent to Aumann’s fixed point model. Similarly, one sees from the
work of Halpern and Moses, that while they start with the iterate approach, in their
set-up, this is equivalent to a fixed point. One of the aims of this paper is to develop
a mathematical model where both iterate and fixed point accounts fit naturally, but
where they are not equivalent. Only in such a framework can we explicitly isolate
the assumptions that are needed to show them equivalent. We will see that these
assumptions are simply false (in the case of knowledge), so that the issue as to
which of the two, if either, is the “right” analysis of the notion is a live one.
The final approach we wish to discuss, the shared-environment approach, was
proposed by Clark and Marshall,7 in response to the enormous processing problems
associated with the iterate account. On their account, p1 and p2 have common
knowledge of σ just in case there is a situation s such that:
• s ⊨ σ,
• s ⊨ p1 knows s,
• s ⊨ p2 knows s.
Here s ⊨ σ is a notation for: σ is a fact of s. The intuitive idea is that common
knowledge amounts to perception or other awareness of some situation, part of
which includes the fact in question, but another part of which includes the very
awarenesses of the situation by both agents. Again we note the circular nature of the
characterization.
It is these three characterizations of common knowledge, and their relatives for the
other notions of mutual belief and shared information, that we wish to compare.
Among common knowledge, mutual belief, and shared information, we focus
primarily on the case of having information, secondarily on the case of knowledge.
Part of the claim of the paper is that these two notions are often conflated, and that
it is this conflation that lends some credibility to the assumptions under which the
first two approaches to common knowledge are equivalent. So I need to make clear
6
R. J. Aumann, “Agreeing to disagree,” Annals of Statistics, 4 (1976), 1236–1239, and the working
paper “On Aumann’s Notion of Common Knowledge – An Alternative Approach,” by Tan and Ribeiro
da Costa Werlang, University of Chicago Graduate School of Business, 1986.
7
H. Clark and C. Marshall, “Definite reference and mutual knowledge,” in Elements of Discourse
Understanding, ed. A. Joshi, B. Webber, and I. Sag (Cambridge: Cambridge University Press,
1981), 10–63.
what I take to be the difference between an agent p knowing some fact σ, and the
agent simply having the information σ.
Here I am in agreement with Dretske.8 Knowing σ is stronger than having the
information σ. An agent knows σ if he not only has the information σ, but moreover,
the information is “had” in a way that is tied up with the agent’s abilities to act.
When might this not be the case? The most notorious example (and by no means the
only one) is when I know one fact σ, and another fact σ′ logically follows from σ,
but I disbelieve the latter because I don’t know that the one follows from the other.
Obviously there is a clear sense in which I have the information σ′, but I certainly
don’t know it in the ordinary sense of the word. Another arises with certain forms of
perceptual information. If I see the tallest spy hide a letter under a rock, then there
is a clear sense in which I have the information the tallest spy has hidden the letter.
However, if I don’t know that he is a spy, say, then I don’t know that the tallest spy
has hidden a letter. Information travels at the speed of logic, genuine knowledge
only travels at the speed of cognition and inference.
Much of the work in logic which seems to be about knowledge is best understood
in terms of having information. And for good reason. For example, in dealing with
computers, there is a good reason for our interest in the latter notion. We often
use computers as information processors, after all, for our own ends. We are often
interested less in what the computer does with the information it has, than in just
what information it has and what we can do with it. Or, in the design of a robot, we
may be aiming at getting the robot to behave in a way that is appropriate given the
information it has. One might say, we are trying to make it know the information it
has.
So, as noted earlier, this paper focuses primarily on the case of having infor-
mation. The model I am going to develop originated with an analysis of shared
perceptual information,9 but it also works quite well for primary epistemic percep-
tional information10 and the relation of having information.
Let me say all this in another way, since it seems to be a confusing point. In the
section that follows, I could interpret the model as a model of knowledge if I were
to make the same idealization that is made in most of the literature on common
knowledge. However, part of what I want to do here is make very explicit just what
the role of this idealization is in the modeling of common knowledge. Thus, I am
forced to work in a context where we do not make it. Once we are clear about its
role, we can then decide if we want to make it.
8
Op. Cit.
9
See ch. 2 of Fred Dretske, Seeing and Knowing (Chicago: University of Chicago Press, 1969); or
J. Barwise, “Scenes and other Situations”, Journal of Philosophical Logic 78 (1981): 369–97; or
ch. 8 of J. Barwise and J. Perry, Situations and Attitudes (Cambridge, Mass.: Bradford Books/MIT
Press, 1983).
10
See ch. 3 of Seeing and Knowing or ch. 9 of Situations and Attitudes.
Summary of Results
Our results suggest that the fixed point approach gives the right theoretical analysis
of the pretheoretic notion of common knowledge. On the other hand, the shared-
environment approach is the right way to understand how common knowledge
usually arises and is maintained over an extended interaction. It does not offer an
adequate characterization of the pretheoretic notion, though, since a given piece of
common knowledge may arise from many different kinds of shared environments.
The fixed point gets at just what is in common to the various ways a given piece of
common knowledge can arise.
What about the iterate approach? We will show that for the relation of having
information, the fixed-point approach is equivalent to the iterate approach, provided
we restrict ourselves to finite situations. Without this assumption, though, the iterate
approach, with only countably many iterations, is far too weak. In general, we must
iterate on indefinitely into the transfinite.
Not only is the iterate approach too weak. When we move from having
information to knowing, then even two iterations are unjustified. In general, the
iterate approach is incomparable and really seems to miss the mark. We will see just
what assumptions are needed to guarantee that the iterate account is equivalent to
the fixed-point account.
In developing our model, we will follow the general line used in The Liar11 in three
ways. First, we take our metatheory to be ZF/AFA, a theory of sets that admits of
circularity. We do this because ZF/AFA offers the most elegant mathematical setting
we know for modeling circularity. Space does not permit us to give an introduction
to this elegant set theory. We refer the reader to Chap. 3 of this book, or to Aczel’s
lectures12 for an introduction.
Second, we follow the approach taken in The Liar in paying special attention to
“situations,” or “partial possible worlds.” As far as this paper goes, the reader can
think of a situation as simply representing an arbitrary set of basic facts, where a fact
is simply some objects standing in some relation. Actually, in this paper, situations
play a dual role. On the one hand they represent parts of the world. On the other
hand they represent information about parts of the world. Thus, for example, we
will define what it means for one situation s0 to support another situation s1 , in the
sense that s0 contains enough facts to support all the facts in s1 .
11
J. Barwise and J. Etchemendy, The Liar: An Essay on Truth and Circularity (New York: Oxford
University Press, 1987).
12
P. Aczel, Non-well-founded Sets (CSLI Lecture Notes (Chicago: University of Chicago Press,
1987 (to appear)).
Finally, on the trivial side, we also follow The Liar in considering a domain of
card players as our domain to be modeled. We use this domain because it is simple,
and because the existence of common knowledge is absolutely transparent to anyone
who has ever played stud poker. And while the example is simple, there is enough
complexity to illustrate many of the general points that need making. However, there
is nothing about the results that depends on this assumption. You could replace the
relation of having a given card with any relation whatsoever, and the results would
still obtain.
Example 37.1 Simply by way of illustration, we have a running example, a game
of stud poker. To make it very simple, we will use two card stud poker,13 with two
players, Claire and Max. We will assume that the players have the following cards:

              down card     up card
   Claire         A            3♣
   Max            3♠           3♦
Except for the rules and the idiosyncrasies of the other players, all the informa-
tion available to the players is represented in this table. Note that based on what he
sees, Max knows that he has the winning hand, or at least a tie, but Claire thinks she
has a good chance of having the winning hand. The question before us is how best
to model the informational difference between up cards and down cards.
Notice how different this situation would be from draw poker, where all cards
are down, even if each player had cheated and learned the value of the second card.
Anyone who has played poker will realize the vast difference. The reason is that
in the standard case, the values of all the up cards are common knowledge, but in
the second it isn’t. Our aim, then, is to use tools from logic to model the three
approaches to the common knowledge and shared information present in such a
situation.
We reiterate that we use this simple card domain simply by way of making things
concrete. We could equally well treat the more general case, if space permitted. We
use S for the relation of seeing (or more generally of having information), H for
the relation of having a card, and appropriate tuples to represent facts involving
these relations. Thus, the fact that Max has the 3♠ will be represented by the triple
⟨H, Max, 3♠⟩. The fact that Claire sees this will be represented by ⟨S, Claire, {⟨H,
Max, 3♠⟩}⟩. The question is how to adequately represent the common knowledge,
or public information, of the up cards, like the fact ⟨H, Max, 3♦⟩ that Max has the
3♦. Thus for our formal development we have primitives: players p1, …, pn, cards
A♠, K♠, …, 2♣, and relations H for the relation of having some card and S for the
relation of seeing or otherwise having the information contained in some situation.
13
For the reader unfamiliar with two card stud poker, here is all you need to know to follow the
example. First each player is dealt one card which only he is allowed to see, and there is a round
of betting. Then each player is dealt one card face up on the table and there is another round of
betting. Hands are ranked and players bet if they think their hand is best. But they can also drop
out of the round at any point. After both rounds of betting are over, the hands are displayed, so that
all players can see who won. As far as the ranking, all that matters is that a hand with a matching
pair is better than a hand with no pairs. But among hands with no pairs, a hand with an ace is better
than a hand with no ace.
Definition 37.1
1. The (models of) situations and facts14 form the largest classes SIT, FACT such
that:
• σ ∈ FACT iff σ is a triple, either of the form ⟨H, p, c⟩, where p is a player and
c is a card, or of the form ⟨S, p, s⟩, where p is a player and s ∈ SIT.
• A set s is in SIT iff s ⊆ FACT.
2. The wellfounded situations and wellfounded facts form the smallest classes Wf-
SIT and Wf-FACT satisfying the above conditions.
Routine monotonicity considerations suffice to show that there are indeed largest
and smallest such collections. If our working metatheory were ordinary ZF set
theory, then these two definitions would collapse into a single one. However,
working in ZF/AFA, there are many nonwellfounded situations and facts. A fact
σ = ⟨R, a, b⟩ being in some situation s represents the fact of the relation R holding
of the pair a, b in s, and σ is said to be a fact of s.
Example 37.1, Cont’d The basic situation s0 about which player has which cards is
represented by the following situation:

s0 = {⟨H, Claire, A⟩, ⟨H, Max, 3♦⟩, ⟨H, Claire, 3♣⟩, ⟨H, Max, 3♠⟩}
Abbreviations We sometimes write (pi H c) for the fact ⟨H, pi, c⟩, and similarly
(pi S s) for the fact ⟨S, pi, s⟩. We write (pi S σ) for (pi S s) where s = {σ}. All of our
facts are atomic facts. However, our situations are like conjunctive facts. Hence we
sometimes write σ ∧ τ for the situation s = {σ, τ}, and so we can write (pi S (σ ∧ τ))
for (pi S s), where s = {σ, τ}. Similarly when there are more conjuncts.
Example 37.1, Cont’d With these tools and abbreviations, we can discuss the first
two approaches to the public information about the up cards in our example. Toward
this end, let su = {⟨H, Max, 3♦⟩, ⟨H, Claire, 3♣⟩}, the situation consisting of the up cards.
14
In order to keep this paper within bounds, I am restricting attention only to positive, nondisjunc-
tive facts.
Iterates On this account, the fact that su is public information would be represented
by an infinite number of distinct wellfounded facts: (Claire S su), (Max S su),
(Claire S (Claire S su)), (Max S (Claire S su)), etc., in other words, by a wellfounded
though infinite situation.
Fixed-Point On this account, the fact that su is publicly perceived by our players
can be represented by the following public situation sp:

sp = {(Claire S (su ∪ sp)), (Max S (su ∪ sp))}

By contrast with the iterate approach, this situation contains just two facts. However,
it is circular and so not wellfounded. The Solution Lemma of ZF/AFA guarantees
that the sets used to represent the situation sp exist.
It will be useful for later purposes to have a notation for some of the situations
that play a role in our example. First, let the situations s1, s2 represent the visual
situations, as seen by each of Claire and Max, respectively, including both the up
cards and what each sees about what the others see. Consider also the larger situation
sw that represents the whole. Let sw = s0 (from above) union the set of the following
facts:

(Claire S (Claire H A)), (Max S (Max H 3♠)), (Claire S s1), (Max S s2)

where the first two facts represent what each player sees about his own down cards,
and, e.g., s1 is everything relevant seen by Claire, with facts su (= the “up” cards, as
above) plus the fact (S, Max, s2). Notice that s1 is a constituent of s2, and vice versa,
so that sw is a circular, nonwellfounded situation.
The next task is to define what it means for a fact σ to hold in a situation s, which
we write s ⊨ σ, so that we can show that the situation sw does satisfy the fixed point
situation sp defined above, as well as the above iterates.
Definition 37.2 The relation ⊨ is the largest subclass of SIT × FACT satisfying the
following conditions:
• s ⊨ (p H c) iff ⟨H, p, c⟩ ∈ s
• s ⊨ (p S s0) iff there is an s1 such that ⟨S, p, s1⟩ ∈ s, and for each σ ∈ s0, s1 ⊨ σ.
The motivation for the second clause should be fairly obvious. If, in s, a player
p sees (or otherwise has the information) s1, and if s1 satisfies each σ ∈ s0, then in
s that same player p sees (or otherwise has the information) s0. This would not be a
reasonable assumption about the usual notion of knowledge, since knowledge is not
closed under logical entailment.
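Because Definition 37.2 takes the largest relation satisfying its clauses, it can be computed on a finite universe of (possibly circular) situations by starting from all pairs and repeatedly deleting pairs that violate a clause. The Python sketch below is an editorial illustration, not Barwise’s: situations are small objects whose fact lists can be mutated to create cycles, and the example rebuilds the circular situation su ∪ sp of the fixed-point account and checks that it supports both of sp’s facts.

    # Facts: ('H', player, card) or ('S', player, situation).
    # A Situation holds a mutable list of facts, so circular situations can be
    # built by adding facts after the object exists.

    class Situation:
        def __init__(self, name):
            self.name = name
            self.facts = []
        def __repr__(self):
            return self.name

    def close_universe(roots):
        """All situations reachable from roots, plus every fact they contain."""
        sits, facts, todo = set(), set(), list(roots)
        while todo:
            s = todo.pop()
            if s in sits:
                continue
            sits.add(s)
            for f in s.facts:
                facts.add(f)
                if f[0] == 'S':
                    todo.append(f[2])
        return sits, facts

    def support(roots):
        """The largest relation |= on the finite universe generated by roots."""
        sits, facts = close_universe(roots)
        rel = {(s, f) for s in sits for f in facts}
        changed = True
        while changed:
            changed = False
            for (s, f) in list(rel):
                if f[0] == 'H':                      # first clause of Definition 37.2
                    ok = f in s.facts
                else:                                # second clause: f = ('S', p, s0)
                    _, p, s0 = f
                    ok = any(g[0] == 'S' and g[1] == p and
                             all((g[2], f2) in rel for f2 in s0.facts)
                             for g in s.facts)
                if not ok:
                    rel.discard((s, f))
                    changed = True
        return rel

    # The up cards, and the circular situation su U sp of the fixed-point account.
    up_max, up_claire = ('H', 'Max', '3♦'), ('H', 'Claire', '3♣')
    su = Situation('su')
    su.facts = [up_max, up_claire]
    su_sp = Situation('su+sp')
    su_sp.facts = [up_max, up_claire]
    sp = Situation('sp')
    sp.facts = [('S', 'Claire', su_sp), ('S', 'Max', su_sp)]
    su_sp.facts += sp.facts                          # su U sp now contains sp's facts

    rel = support([su, sp, su_sp])
    print(all((su_sp, f) in rel for f in sp.facts))  # True: su U sp supports sp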
There is a difference with the possible worlds approach that sometimes seems
puzzling to someone familiar with the traditional modal approach to knowledge. In
p.w. semantics, partial situations are represented by the set of all possible worlds
15
In more recent joint work with Aczel, a generalization of this relation takes center stage.
The α-th approximation σ^α of a fact σ is defined by:

(p H c)^α = (p H c)
(p S s)^α = (p S s^{<α})

where

s^{<α} = {σ^β | σ ∈ s, β < α}.

Similarly, for any situation s we define the transfinite sequence ⟨s^α | α ∈ Ordinals⟩
by letting s^α = {σ^α | σ ∈ s}.
The reader should verify that if we apply this definition to the fixed point fact in
our example, we generate the iterates for all the finite ordinals, but then we go on
beyond them into the transfinite.
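The finite stages of this construction are easy to generate mechanically. The following self-contained Python sketch (an editorial illustration; the helper name approx is not from the paper) computes the n-th approximation of a possibly circular fact, representing a situation as a plain list of facts and building the circular su ∪ sp by mutation.

    # n-th approximation of a fact, following (pHc)^a = (pHc) and
    # (pSs)^a = (pS s^{<a}), where s^{<a} = {f^b : f in s, b < a}.

    def approx(fact, n):
        """The n-th (wellfounded) approximation of a possibly circular fact."""
        if fact[0] == 'H':
            return fact
        _, player, situation = fact
        lower = frozenset(approx(f, m) for f in situation for m in range(n))
        return ('S', player, lower)

    # The circular situation su U sp of the fixed-point account.
    su_sp = [('H', 'Max', '3♦'), ('H', 'Claire', '3♣')]
    claire_sees = ('S', 'Claire', su_sp)
    max_sees = ('S', 'Max', su_sp)
    su_sp += [claire_sees, max_sees]       # su U sp now contains sp's own facts

    # Level 0 is (Claire S {}); level 1 mentions the up cards; higher levels add
    # the deeper iterates "Claire sees that Max sees that ...".
    for n in range(3):
        print(n, approx(claire_sees, n))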
We say that a fact σ entails a fact τ, written σ ⇒ τ, if for every situation s, if s ⊨ σ
then s ⊨ τ.
Theorem 37.2 Let σ be some fact.
1. For all α, σ ⇒ σ^α.
2. If each approximation σ^α holds in a situation s, then so does σ.
3. Assume that κ is a regular cardinal, and that s is a situation of size less than κ.
If each approximation σ^α, for α < κ, holds in s, then so does σ.
Proof The first is proved by means of a routine induction on α. The second is a
consequence of the maximality of ⊨ and is not too difficult to prove. The third is a
strengthening of the second involving routine cardinality considerations.
Corollary 37.3 Let σ be any fact, and let sω be the set of all finite approximations
of σ. Then, for any finite situation s, s ⊨ σ iff s ⊨ sω.
Refinement (3) of (2) of Theorem 37.2, and so the above corollary, were not
present in the original working paper referred to above. They were discovered later
in joint work with Peter Aczel. This result shows that the finite approximations of
a circular fact will be equivalent to it, with respect to finite situations. This is a
bit unsatisfactory, since the iterates themselves form an infinite situation. Still, it
is the best we can hope for. However, in general, when we drop this restriction to
finite models, one must look at the whole transfinite sequence of approximations.
No initial segment is enough, as simple examples show. In this sense, the usual
iterate approach is actually weaker than the simpler fixed-point approach.
When we move from having shared information to knowing, additional consid-
erations must be brought to bear, as we will see below.
To compare the shared environment approach with the fixed point approach, we
introduce a simple second-order language which allows us to make existential
claims about situations of just the kind made in the shared environment approach.
We call the statements of this language ∃-statements. Before giving the definition,
let’s give an example. The following ∃-statement
is one shared environment analysis of the fact that Claire and Max share the
information that Claire has the 3♣. Notice that what we have here is a simple, finite,
wellfounded statement, but one that could only hold of nonwellfounded situations.
Similarly, there is a fairly simple ∃-statement explicitly describing the situation sw
in our running example.
To define our language, we introduce variables e1, e2, … ranging over situations,
in addition to constants for the cards and players. In fact, we do not bother to
distinguish between a card or player and the constant used to denote it in statements.
For atomic statements we have those of the form (pi H c) (where pi is a player and
c is a card) and (pi S ej). The set of ∃-statements forms the smallest set containing
these atomic statements and closed under conjunction (∧), existential quantification
over situations (∃ej) and the rule: if Φ is a statement so is (ej ⊨ Φ). We are thus
using ⊨ both for a relation symbol of our little language, as well as a symbol in
our metalanguage. No more confusion should result from this than from the similar
use of constants for cards and people. Finally, given any function f which assigns
situations to variables, we define what it means for a statement Φ to hold in a
situation s relative to f, written s ⊨ Φ[f], in the expected way.
Definition 37.5
1. If Φ is an atomic statement, then s ⊨ Φ[f] iff the appropriate fact is an element
of s. In particular, if Φ is (pi S ej), then s ⊨ Φ[f] iff ⟨S, pi, f(ej)⟩ ∈ s.
2. If Φ is Φ1 ∧ Φ2 then s ⊨ Φ[f] iff s ⊨ Φ1[f] and s ⊨ Φ2[f].
3. If Φ is ∃ej Φ0 then s ⊨ Φ[f] iff there is a situation sj so that s ⊨ Φ0[f(ej/sj)].
4. If Φ is (ej ⊨ Φ0) then s ⊨ Φ[f] iff the situation sj = f(ej) satisfies sj ⊨ Φ0[f].
A closed ∃-statement is one with no free variables, as usual. If Φ is closed, we
write s ⊨ Φ if some (equivalently, every) assignment f satisfies s ⊨ Φ[f].
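Definition 37.5 reads off directly as a recursive evaluator. The Python sketch below is an editorial illustration, not Barwise’s: statements are nested tuples mirroring the grammar above, and one departure from the definition is flagged in the comments — clause 3’s quantifier here ranges over a finite list of candidate situations supplied by the caller rather than over all situations.

    # Statements as nested tuples:
    #   ('H', p, c)          atomic (p H c)
    #   ('S', p, e)          atomic (p S e), e a situation variable
    #   ('and', A, B)        conjunction
    #   ('exists', e, A)     existential quantification over situations
    #   ('models', e, A)     (e |= A)
    # Situations are objects holding a set of facts, so they can be circular.

    class Sit:
        def __init__(self):
            self.facts = set()
        def __contains__(self, fact):
            return fact in self.facts

    def holds(s, phi, f, domain):
        """s |= phi[f]; `domain` is the finite range of the quantifier (an
        assumption of this sketch -- the definition quantifies over all situations)."""
        kind = phi[0]
        if kind == 'H':                                   # clause 1
            return phi in s
        if kind == 'S':                                   # clause 1
            _, p, e = phi
            return ('S', p, f[e]) in s
        if kind == 'and':                                 # clause 2
            return holds(s, phi[1], f, domain) and holds(s, phi[2], f, domain)
        if kind == 'exists':                              # clause 3
            _, e, body = phi
            return any(holds(s, body, {**f, e: sj}, domain) for sj in domain)
        if kind == 'models':                              # clause 4
            _, e, body = phi
            return holds(f[e], body, f, domain)
        raise ValueError(phi)

    # A shared-environment style example: a circular situation in which Claire
    # has the 3♣ and both players see that very situation.
    s = Sit()
    s.facts = {('H', 'Claire', '3♣'), ('S', 'Claire', s), ('S', 'Max', s)}

    phi = ('exists', 'e1',
           ('models', 'e1', ('and', ('H', 'Claire', '3♣'),
                             ('and', ('S', 'Claire', 'e1'), ('S', 'Max', 'e1')))))
    print(holds(s, phi, {}, domain=[s]))                  # True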
Notice that the ∃-statements are all finite and wellfounded. (The results that fol-
low would hold equally well if we allowed infinite conjunctions and infinite strings
of quantifiers, except for the word “finite” in Theorem 37.5 below.) Nevertheless,
some of them can only hold of nonwellfounded situations, as the above example
shows.
Clearly our statement is true in this model. It is also easy to see that s is a
hereditary subsituation of any situation which is a model of our statement, so by
Proposition 37.1, s almost characterizes the statement. This definition is justified by
the following result, which is an easy consequence of Proposition 37.1.
Proposition 37.4 Suppose that the situation s almost characterizes the ∃-statement Φ. Then for any fact σ, the following are equivalent:
1. σ is entailed by Φ, i.e., σ holds in all models of Φ
2. s ⊨ σ
The following is the main result of this paper. It shows the extent to which the
shared environment approach can be approximated by the fixed point and iterate
approaches.
Theorem 37.5 Every ∃-statement Φ is almost characterized by some finite situation s_Φ.
Proof First one establishes a normal form lemma for 9-statements, where all the
quantifiers are pulled out front. One then uses the Solution Lemma of AFA to
define the desired situation. The proof that it almost characterizes the statement
uses Proposition 37.1.
However, there is a distinct sense in which ∃-statements are more discriminating than the situations that almost characterize them. For example, compare our above example of an ∃-statement with the following:
Clearly any model of our first statement is a model of our second. However, it is easy
to see that there are models of our second that are not models of our first. (Think of
a case where the card is not face up but face down, yet where there are suitably placed mirrors.) On the other hand, these two statements are almost characterized
by exactly the same situations. Or, in view of Proposition 37.4, the two statements
entail the same facts, both wellfounded and circular.
Intuitively, what is going on here is that both of these statements represent ways
in which Max and Claire might share the information that Claire has the 3♣. The
first would be the one predicted by a literal reading of the Clark and Marshall
account, but the second is clearly in the spirit of that account. However, this means
that since they are not equivalent, neither one can be the right characterization of the
shared information. Rather, what they represent are two distinct ways, among many,
that Max and Claire might have come to have the shared information. We leave it to
the reader to work out analogous inequivalent ∃-statements that also give rise to the
shared information in our running example.
We conclude this section by observing that the results can be extended to the case
where we allow disjunctions to occur in ∃-statements, if one also allows disjunctive
facts.
Conclusions
holds in this situation. However, is it a fact in this situation that, say, Max knows that
Dana knows that Claire knows that he, Max, has the 3˘? And even more iterations?
It seems clear that it will not in general be true. After all, some sort of inference
is required to get each iteration, and the players might not make the inference. They
are, after all, only 3 years old. And even if Claire makes her inference, Dana may
have legitimate doubts about whether Claire has made her inference. But once one
player has the least doubt about some other player’s making the relevant inference,
the iterated knowledge facts break down. That is, once the making of an inference
is implausible, or even just in doubt, the next fact in the hierarchy is not really a fact
at all.
It is usually said that the iterate account assumes that all the agents are perfectly
rational, that is, that they are perfect reasoners. This example also shows that it in
fact assumes more: it assumes that it is common knowledge among the agents that
they are all perfectly rational. It is only by making this radical idealization, plus
restricting attention to finite situations, that the iterate account is equivalent to the
fixed-point account. And the idealization requires the very notion that one is trying
to understand in the first place.
We began this section by asking three questions. We have proposed answers to
the first two of them, and suggested that the third question, about how common
knowledge is used, is not answered by the iterate approach. But then how do people
make use of common knowledge in ordinary situations?
My own guess is that common knowledge per se, the notion captured by the
fixed-point analysis, is not actually all that useful. It is a necessary but not a sufficient
condition for action. What suffices in order for common knowledge to be useful
is that it arise in some fairly straightforward shared situation. The reason this is
useful is that such shared situations provide a basis for perceivable situated action,
action that then produces further shared situations. That is, what makes a shared
environment work is not just that it gives rise to common knowledge, but also that it
provides a stage for maintaining common knowledge through the maintenance of a
shared environment. This seems to me to be part of the moral of the exciting work
of Parikh, applying ideas of game theory to the study of communication.16
It seems to me that the consequences of this view of common knowledge are
startling, if applied to real world examples, things like deterrence (mutual assured
destruction, say). Indeed, it suggests a strategy of openness that is the antithesis of
the one actually employed. But that goes well beyond the scope of this conference.
Finally, let me note that the results here do not lend themselves to an immediate
comparison with other mathematical models of common knowledge, especially the
approaches in game theory. It would be interesting to see a similar analysis there,
one that pinpoints the finiteness or compactness assumption that must be lurking
behind the Tan and Ribeiro da Costa Werlang result.
16
Prashant Parikh, “Language and strategic inference,” Ph.D. Dissertation, Stanford University,
1987.
Chapter 38
The Logic of Public Announcements, Common
Knowledge, and Private Suspicions
We introduce the issues in this paper by presenting a few epistemic scenarios. These
are all based on the Muddy Children scenario, well-known from the literature on
knowledge. The intention is to expose the problems that we wish to address. These
problems are first of all to get models which are faithful to our intuitions, and then
to build and study logical systems which capture some of what is going on in the
scenarios.
The cast of characters consists of three children: A, B, and C. So that we can use
pronouns for them in the sequel, we assume that A is male, and B and C are female.
Furthermore, A and B are dirty, and C is clean. Each of the children can see all and
only the others. It is known to all (say, as a result of a shout from one of the parents)
that at least one child is dirty. Furthermore, each child must try to figure out his or
her state only by stating “I know whether I’m dirty or not” or “I don’t know whether
I’m dirty or not.” They must tell the truth, and they are perfect reasoners in the sense
that they know all of the semantic consequences of their knowledge. The opening
situation and these rules are all assumed to be common knowledge.
Scenario 1. After reflection, A and B announce to everyone that at that point they do
not know whether they are dirty or not. (The reason we are having A and B make this
announcement rather than all three children is that it fits in better with our scenarios
to follow.) Let ˛ denote this announcement.
A. Baltag ()
ILLC, University of Amsterdam, The Netherlands
e-mail: [email protected]
L.S. Moss • S. Solecki
Mathematics Department, Indiana University, Bloomington, IN 47401, USA
e-mail: [email protected]; [email protected]
As in the classical Muddy Children, there are intuitions about knowledge before
and after ˛. Here are some of those intuitions. Before ˛, nobody should know that
he or she is dirty. However, A should think that it is possible that B knows. (For if A
were clean, B would infer that she must be the dirty one.) After α, A and B should each
know that they are dirty, and hence they know whether they are dirty or not. On the
other hand, C should not know whether she is dirty or not.
Scenario 1.5. This scenario begins after ˛. At this point, A and B announce to all
three that they do know whether or not they are dirty. We’ll call this event ˛ 0 . Our
intuition is that after ˛ 0 , C should know that she is not dirty. Moreover, A and B
should know that C knows this. Actually, the dirty-or-not states of all the children
should be common knowledge to all three.
Scenario 2. As an alternative to the first scenario, let’s assume that C falls asleep
for a minute. During this time, A and B got together and told each other that they
didn’t know whether they were dirty or not. Let ˇ denote this event. After ˇ, C
wakes up. Part of what we mean by ˇ is that C does not even consider it possible
that ˇ occurred, and that it’s common knowledge to A and B that this is the case.
Then our intuitions are that after ˇ, C should “know” (actually: believe) that A does
not know whether he is dirty (and similarly for B); and this fact about C is common
knowledge for all three children. Of course, it should also be common knowledge
to A and B that they are dirty.
Scenario 2.5. Following Scenario 2, we again have ˛ 0 : A and B announce that they
do know whether they are dirty or not. Our intuitions are not entirely clear at this
point. Surely C should suspect some kind of cheating or miscalculation on the part
of the others. However, we will not have much to say about the workings of this
kind of real-world sensibility. Our goal will be more in the direction of modeling
different alternatives.
Scenario 3. Now we vary Scenario 2. C merely feigned sleep and thought she heard
both A and B whispering. C cannot be sure of this, however, and also entertains the
possibility that nothing was communicated. (In reality, A and B did communicate.)
A and B, for their part, still believe that C was sleeping. We call this event γ.
One might at first glance think that A and B's "knowledge" of C's epistemic state is unchanged by γ. After all, the communication was not about C. However, we work with a semantic notion of knowledge, and after γ, A and B know that they are dirty, and hence know that C knows that they are dirty. A and B did not know this at the outset.
So we need to revise the initial intuition. What is correct is that if C knows some fact φ before γ, then after γ, A and B know (or rather, believe) that C knows φ. This is because after γ, A and B not only know the clean-or-dirty state of everyone, they (therefore) also know exactly which possibilities everyone is aware of, which they discard as impossible, etc. So each of them can reconstruct C's entire epistemic state. They believe that their reconstruction is current, but of course, what they reconstruct is C's original one, before γ.
Conversely, if after γ, A and B "know" that C knows φ, then before γ, C really did know φ. That is, the reconstruction is accurate. For example, after γ, A believes that C should not consider it possible that A knows that he is dirty. However, C thinks it is possible that A knows he is dirty.
There is a stronger statement that is true: C knows φ before γ iff after γ, it is common knowledge to A and B that each of them knows that C knows φ. Intuitively, this holds because each of A and B knows that both of them are able to carry out the reconstruction of C's state.
Our final intuition is that after γ, C should know that if A were to subsequently announce that he knows that he is dirty, then C would know that B knows that she is dirty.
Scenario 3.5. Again, continue Scenario 3 by ˛ 0 . At this point, C should know that
her suspicions were confirmed, and hence that she is not dirty. For their part, A and
B should think that C is confused by ˛ 0 : they should think that C is as she was
following Scenario 2.5.
Scenario 4. A and B are on one side of the table and C is on the other, dozing.
C wakes up at what looks to her like the middle of a joint confession by A and B.
The two sides stare each other down. In fact, A and B have already communicated.
We call this action ı. So C suspects that ı is what happened, but can’t tell if it was
ı or nothing. For their part, A and B see that C suspects but does not know that ı
happened.
The basic intuition is that after ı, it should be common knowledge to all three
that C suspects that the communication happened. Even if C thinks that A and B did
not communicate, C should not think that she is sure of this.
One related intuition is that after ı, it should be common knowledge that C
suspects that A knows that he is dirty. As it happens, this intuition is wrong. Here is
a detailed analysis: C thinks it possible that everyone is dirty at the outset, and if this
were the case then the announcement of B’s ignorance would not help A to learn that
he is dirty; from A’s point of view, he still could be clean and B would not know that
she is dirty. C’s view on this does not change as a result of ı, so afterwards, C still
thinks that it could be the case that A says, "It's possible that B and C are the dirty ones and I am clean. Hence C would see my clean face and not suspect that I know
that I am dirty.” So it certainly should not be common knowledge that C suspects
that A knows he is dirty.
Notice also that C would say after ı: “I think it is possible that no announcement
occurred, and yet A thinks it possible that B is the only dirty one. In that case, what A would think is that I suspect that A told B that he knows that he is not dirty. Of course,
this is not what I actually suspect.” The point is that C’s reasoning about A and B’s
reasoning about her involves suspicion of a different announcement than we at first
considered.
Scenario 4.5. Once again, we continue with ˛ 0 . Our intuition is that this is
tantamount to an admission of private communication by A and B. If we disregard
this and only look at higher order knowledge concerning who is and is not dirty, we
expect that the epistemic state after ˛ 0 is the same for all three children as it is at the
end of Scenario 1.5.
Models
Now that we have detailed a few scenarios and our intuitions about them, it is time
to construct some Kripke models as representations for them.
The model U has seven worlds, with dirty children as follows: u1: D_C; u2: D_B; u3: D_B, D_C; u4: D_A; u5: D_A, D_C; u6: D_A, D_B; u7: D_A, D_B, D_C.
[Diagram of U: edges labelled A: u1–u5, u2–u6, u3–u7; B: u1–u3, u4–u6, u5–u7; C: u2–u3, u4–u5, u6–u7; a doubled ellipse marks u6.]
Note that we have not indicated any direction on the edges; they are all intended to
be bidirectional. In addition, we have not shown self-loops for the agents. However,
we intend in this model that all of the self-loops be present on all nodes for all
agents.
As an example of reading the picture, in world u3 , A is clean, but B and C are
dirty. Also, the worlds which A thinks are possible are u3 and u7 . Thus, A sees that
B and C are dirty, so A infers that the world is either u3 or u7 . The lines for C are
intended to go across the middle. The rest of the structure is explained similarly,
except for the doubled ellipse around u6 . This specifies u6 as the actual world in the
model, the one which corresponds to our description of the model before ˛. Note
that U incorporates some of the conventions stated in Scenario 1. For example, in
each world, each child has a complete and correct assessment of which worlds are
possible for all three reasoners.
Each of our intuitions about knowledge before α turns into a statement in the modal logic of knowledge. This logic has atomic sentences D_A, D_B, and D_C standing for "A is dirty", etc.; it has knowledge operators □_A, □_B, and □_C along with the
usual boolean connectives. We are going to use the standard Kripke semantics for multi-modal logic throughout this paper. So given a model-world pair, say ⟨A, a⟩, and some agent, say D, we'll write ⟨A, a⟩ ⊨ □_D φ to mean that φ holds at every world b with a →_D b. The boolean connectives will be interpreted classically. We can then check the following:

⟨U, u6⟩ ⊨ ¬□_A D_A ∧ ¬□_A ¬D_A ∧ ¬□_B D_B ∧ ¬□_B ¬D_B ∧ ¬□_C D_C ∧ ¬□_C ¬D_C
⟨U, u6⟩ ⊨ ◇_A □_B D_B
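These facts about ⟨U, u6⟩ can be verified mechanically. The following sketch (illustrative names and encodings, not part of the chapter) builds U from the world list above and checks the two displayed facts with a small Kripke model checker; edges are the bidirectional ones listed above plus self-loops.

```python
# A minimal sketch (not the chapter's code) of the model U and a Kripke model checker.
# Formulas are nested tuples; 'K' is the knowledge box and 'M' ("might") its diamond.

from itertools import product

WORLDS = range(1, 8)
DIRTY = {1: {"C"}, 2: {"B"}, 3: {"B", "C"}, 4: {"A"},
         5: {"A", "C"}, 6: {"A", "B"}, 7: {"A", "B", "C"}}

def edges(agent):
    """Worlds are indistinguishable for `agent` iff the *other* children look the same."""
    others = {"A", "B", "C"} - {agent}
    return {(u, v) for u, v in product(WORLDS, WORLDS)
            if DIRTY[u] & others == DIRTY[v] & others}

REL = {a: edges(a) for a in "ABC"}

def holds(rel, w, phi):
    op = phi[0]
    if op == "D":                      # atomic sentence D_X: "X is dirty"
        return phi[1] in DIRTY[w]
    if op == "not":
        return not holds(rel, w, phi[1])
    if op == "and":
        return all(holds(rel, w, p) for p in phi[1:])
    if op == "K":                      # box for agent phi[1]
        return all(holds(rel, v, phi[2]) for (u, v) in rel[phi[1]] if u == w)
    if op == "M":                      # diamond for agent phi[1]
        return any(holds(rel, v, phi[2]) for (u, v) in rel[phi[1]] if u == w)
    raise ValueError(op)

# Nobody knows his or her state at u6, yet A considers it possible that B knows D_B:
nobody_knows = ("and", *[("and", ("not", ("K", a, ("D", a))),
                                  ("not", ("K", a, ("not", ("D", a))))) for a in "ABC"])
assert holds(REL, 6, nobody_knows)
assert holds(REL, 6, ("M", "A", ("K", "B", ("D", "B"))))
```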
[Diagram of the model V: worlds v1, v3, v5, v6, v7 with the accessibility edges inherited from U.]
The way we got V from U was to discard the worlds u2 and u4 of U, since in U at
each of those worlds, either A or B would know if they were dirty. We also changed
the u’s to v’s to avoid confusion, and to stress the fact that we get a new model.
Turning back to our intuitions, announcing α′ in V yields a model W whose actual world is w6, and the corresponding facts can be checked there.
(We have renamed v6 to w6 . We continue to omit the three self-loops, but shortly
our pictures will begin to incorporate those when they are appropriate.) This model
reflects our intuition that at this point, C should know that she is not dirty.
The Model X. This corresponds to Scenario 2. We start with U and see the effect of
the private announcement ˇ. The resulting model X is too large to show in a small
diagram, so instead we use a chart:
World        Dirty     →A        →B        →C
u1, ..., u7  (as in U)
x1           C         x1, x5    x1, x3    u1
x3           B, C      x3, x7    x1, x3    u2, u3
x5           A, C      x1, x5    x5, x7    u4, u5
x6 (√)       A, B      x6        x6        u6, u7
x7           A, B, C   x3, x7    x5, x7    u6, u7
The “real world” is x6 . Notice that the worlds u1 ; : : : ; u7 are also worlds in X. We
did not put any information in the chart above for those worlds since it should be
exactly the same as in U above. The reason for having these “old worlds” in X is
that since C was asleep, the worlds that C considers possible after ˇ should be just
the ones that were possible before ˇ. We can check that
Let φ be the sentence above. Then also, ⟨X, x6⟩ ⊨ □*_{A,B,C} φ. This is our formal statement that it is common knowledge in the group of three children that φ holds. The semantics of this is that for all sequences D1, ..., Dm ∈ {A, B, C}*, ⟨X, x6⟩ ⊨ □_{D1} ⋯ □_{Dm} φ. Note that we have no way of saying in the modal language that C suspects that an announcement happened; the best we can do is (roughly) to say that C thinks that some sentence is possible in the sense that it holds in some possible world. Of course, we have no way to say that A and B know that C was asleep, either.
Note as well that in X, we do not have x6 →_C x6. In other words, the real world
would not be possible for C. This is some indication that something strange is going
on in this model. Further, we consider the model of what happens after A and B’s
announcement. Then in this model, no worlds would be accessible for C from the
actual world. These anomalies should justify our interest in the more complicated
scenarios and models involving suspicions of announcements.
The Model obtained by announcing α′ in X. This would be the model with one world, say x6*, where A and B are dirty, and whose structure is a single point x6* with self-loops for A and B only. We have not only deleted the worlds where either A or B does not know that they are dirty in X, but we also discarded all worlds not reachable from the new version x6* of x6. The anomaly here is that C thinks no worlds are possible.
The Model Y. We consider γ from Scenario 3, in which C thought she might have heard A and B, while A and B think that C is unaware of γ. We get the model Y displayed in Fig. 38.1 above. Y has 24 worlds, and so we won't justify all of them individually. We will give a more principled construction of Y from U and γ, once we have settled on a mathematical model of γ. For now, the ideas are that the y worlds are those where the announcement happened, and the y′ worlds are those in which it did not. Note that some of the y worlds are missing, since the truthful announcement by A and B presupposes that they don't know whether they are dirty in U at the corresponding world. The x's and u's are from above, and they inherit the accessibility relations which we have seen.
Now our main intuition here is that ⟨U, u6⟩ ⊨ □_C φ iff, in ⟨Y, y6⟩, it is common knowledge to A and B, in the strict sense, that C knows φ. (The strict common-knowledge sentence for {A, B} says that A knows □_C φ, A knows B knows □_C φ, etc. It differs from □*_{A,B} □_C φ in that it does not entail that □_C φ is true.) To see this, note that u6 →_C u6, u7 and no other worlds. And the only worlds reachable from y6 using one or more →_A or →_B transitions followed by a →_C transition are again u6 and u7.
Another intuition is that in ⟨Y, y6⟩, C should think that it is possible that A knows that he is dirty. This is justified since y6 →_C x6, and ⟨Y, x6⟩ ≅ ⟨X, x6⟩ (that is, the submodels of X and Y generated by x6 are isomorphic), and ⟨X, x6⟩ ⊨ □_A D_A.
Our final intuition is that in ⟨Y, y6⟩, C should know that if A were to subsequently announce that he knows that he is dirty, then C would know that B knows that she is dirty. To check this, we need to modify Y by deleting the worlds where A does not know that he is dirty. These include y7, y6′ and y7′. In the updated model, the only world accessible for C from (the new version of) y6 is y6 itself, and at y6 in the new structure, B correctly knows she is dirty.
The model obtained by announcing α′ in Y. [Diagram of the two-world model on y6* and x6#; A-, B-, and C-edges omitted.] We also only keep the worlds accessible from y6* (this change is harmless). In this model,
C knows she is not dirty. Technically, A and B “know” this, but this is for the
nonsensical reason that they “know” that C knows everything.
The Model Z. This corresponds to Scenario 4. Its accessibility relations are given by the following chart (√ marks the real world; the dirty children in zi and zi′ are as in ui):

World     →A          →B          →C
z1        z1, z5      z1, z3      z1, z1′
z2        z2          z2          z2, z3, z2′, z3′
z3        z3, z7      z1, z3      z2, z3, z2′, z3′
z4        z4          z4          z4, z5, z4′, z5′
z5        z1, z5      z5, z7      z4, z5, z4′, z5′
z6 (√)    z6          z6          z6, z7, z6′, z7′
z7        z3, z7      z5, z7      z6, z7, z6′, z7′
z1′       z1′, z5′    z1′, z3′    z1, z1′
z2′       z2′, z6′    z2, z2′     z2, z3, z2′, z3′
z3′       z3′, z7′    z1′, z3′    z2, z3, z2′, z3′
z4′       z4′, z6′    z4′, z6′    z4, z5, z4′, z5′
z5′       z1′, z5′    z5′, z7′    z4, z5, z4′, z5′
z6′       z2′, z6′    z4′, z6′    z6, z7, z6′, z7′
z7′       z3′, z7′    z5′, z7′    z6, z7, z6′, z7′
Recall our last point in Scenario 4, that we need to consider a few possible
announcements for C to suspect. This is reflected in the fact that the z worlds are of
three types. In z2 , B announced that she knows whether she is dirty, and A announced
that he doesn’t. Similar remarks apply to z4 . In all other z worlds, both announced
that they do not know. The worlds accessible from each of these are determined by the relevant announcement. For example, in z2, neither A nor B thinks any other world is possible. (One might think that z2 →_A z6. But in z6, B could not announce that she
knows she is dirty. So if the world were z2 and the relevant announcement made, then
A would not think z6 is possible.) The z0 worlds are those in which no announcement
actually happened.
Our key intuition was that it is common knowledge that C suspects that δ happened. This will not correspond to anything in the formal language L([α], □*) introduced later in this paper. (However, it will be representable in an auxiliary language about actions; see Example 3.) Informally, the intuition is valid for Z because for every zi (or zi′) there is some zj (unprimed) such that zi →_C zj (or zi′ →_C zj).
In addition, in this particular model there is a sentence in our formal language which happens to hold only at the worlds where an announcement occurred. Here is one:

□_A D_A ∨ □_A ¬D_A ∨ ◇*_{A,B} ◇_C □_A D_A

So ⟨Z, z6⟩ ⊨ □*_{A,B,C} ◇_C ψ, where ψ is this sentence.
The explanation of the mistaken intuition in Scenario 4 is that z6 →_C z7 →_A z3, and ◇_C □_A D_A fails in z2, z3, z2′, and z3′. Overall, ⟨Z, z6⟩ ⊨ ¬□_C □_A ◇_C □_A D_A.
The point that C's suspicion varies corresponds to the fact that ◇_C ◇_A ◇_C □_A ¬D_A holds at ⟨Z, z6⟩. Indeed z6 →_C z6′ →_A z2′ →_C z2, and ⟨Z, z2⟩ ⊨ □_A ¬D_A.
A few more involved statements are true in Z. For example, □*_{A,B,C}(□_A D_A → ◇_C □_A D_A). It is common knowledge to all three that if A knows he is dirty, then C thinks it possible that A knows this.
Epistemic Actions
So the sentence (38.1) says that neither A nor B knows whether or not they are dirty. This is the
precondition of the announcement, but it is not the structure. The structure of this
announcement is quite simple (so much so that the reader will need to read further
to get an idea for what we mean by structure). It is the following Kripke structure K: we take one point, call it k, and we set k →_D k for all D ∈ {A, B, C}. We call ⟨K, k⟩ an action structure. Along with K, we also have a precondition; this will be the sentence from (38.1). To deal with action structures with more than one point, the precondition will be a function PRE from worlds to sentences. In this case, the function PRE simply assigns the sentence from (38.1) to k. The tuple ⟨K, k, PRE⟩ will be an example of what we call an action. This particular action is our model of the announcement α. Henceforth we use the symbol α to refer ambiguously to the pretheoretic notion of the announcement event and to our mathematical model of it.
Another example of an announcement to everyone is α′. Here we just change the precondition from the sentence in (38.1) to the sentence which says that both A and B know whether or not they are dirty. Yet another example is the null announcement, which models the announcement of the tautology true to everyone.
β: a secure announcement to a set of agents. Next, suppose we have an announcement made to some possibly proper subset B ⊆ A in the manner of Scenario 2. So there is some dispute as to what happened: the agents in B think that there was an announcement, while those outside B are sure that nothing happened. We formalize this with a Kripke structure of two points, l and t. We set l →_D l for all D ∈ B, l →_D t for D ∉ B, and t →_D t for all D. The point is that l here is the actual announcement, and the agents in B know that this is the announcement. The agents not in B think that t is for sure the only possible action, and t in this model will behave just like the null announcement. The precondition function will be called PRE in all of our examples. Here PRE is given by letting PRE(l) be the sentence from (38.1) and PRE(t) = true. The action overall is ⟨L, l, PRE⟩, where L = {l, t}. We call this action β.
γ: an announcement with a suspicious outsider. This is based on Scenario 3. The associated structure has four points, as shown below:

PRE(m) = PRE(l) = the sentence from (38.1),  PRE(n) = PRE(t) = true

[Diagram of the four-point frame on m, n, l, t omitted.]

The idea is that m is the (private) announcement that C suspects, and n is the other announcement that C thinks is possible (where nothing was communicated by A and B). Then if m happened, A and B were sure that what happened was l; similarly, if n happened, A and B would think that t was what happened. We call this action γ; technically it is ⟨{m, n, l, t}, m, PRE⟩. We get a different action, say γ′, if we use the same model as above but change the designated ("real") world from m to n.
δ: an announcement with common knowledge of suspicion. Corresponding to Scenario 4, we have the following model. In it ψ_A denotes the sentence saying that A knows whether he is dirty but B does not, ψ_B the sentence saying that B knows whether she is dirty but A does not, and ψ_∅ the sentence stating that neither knows.

PRE(o) = … , PRE(p) = ψ_A , PRE(q) = ψ_B , PRE(r) = ψ_∅ , PRE(s) = true

[Diagram: five points o, p, q, r, s, each carrying loops for A, B, and C; the remaining arrows are C-arrows.]

We call this action δ. There are five possible actions here, depending on which of the four sentences, or nothing, was announced. In each case, A and B are sure of what happened. Even if nothing actually happened (s), C would suspect one of the other four possibilities. In those, C still considers it possible that nothing happened.
Still to come. The reader is perhaps wondering what the actual connection is
between the (formal) actions just introduced and the concrete models of the previous
section. The connection is that there is a way of taking a model and an action
and producing another model. When applied to the specific model U and the actions of this section, we get the models V, ..., Z. We delay this connection until
section “Semantics” below, since it is high time that we introduce our language of
epistemic actions and its semantics. The point is that there is a principled reason
behind the models.
The question also arises as to whether there are any principles behind the
particular actions which we put down in this section. As it happens, there is more
which can be said on this matter. We postpone that discussion until section “More
on Actions”, after we have formally defined the syntax and semantics of our logical
languages.
The Issues
The main issue we address in this paper is to formally represent epistemic updates,
i.e., changes in the information states of agents in a distributed system. We think
of these changes as being induced by specific information-updating actions, which
can be of various types: (1) information-gathering and processing (e.g., realizing the
possibility of other agents’ hidden actions, and more generally, learning of any kind
of new possibility via experiment, computation, or introspection); (2) information-
exchange and communication (learning by sending/receiving messages, public
announcements, secret interception of messages, etc.); (3) information-hiding (lying
or other forms of deceiving actions, such as communication over secret chan-
nels, sending encrypted messages, holding secret suspicions); (4) information-loss
and misinformation (being lied to, starting to have gratuitous suspicions, non-
introspective learning, wrong computations or faulty observations, paranoia); (5)
and more generally sequential or synchronous combinations of all of the above.
Special cases of our logic, dealing only with public or semi-public announce-
ments to mutually isolated groups, have been considered in Plaza (1989), Gerbrandy
(1999a,b), and Gerbrandy and Groeneveld (1997). These deal with actions such as
α and β in our Introduction. Our examples γ and δ go beyond what is possible in the setting of these papers. But our overall setting is much more liberal, since it
allows for all the above-mentioned types of actions. We feel it would be interesting
to study further examples with an eye towards applications, but we leave this to
other papers.
In our formal system, we capture only the epistemic aspect of these real actions,
disregarding other (intentional) aspects. In particular, for reasons of simplicity, we only deal with "purely epistemic" actions; i.e., the ones that do not change the facts of
the world, but affect only the agents’ beliefs about the world. However, this is not
an essential limitation, as our formal setting can be easily adapted to express fact-
changing actions (see the end of section “More on Actions” and also section “Two
Extensions”).
On the semantical side, the main original technical contribution of our paper
lies in our decision to represent not only the epistemic states, but also the epistemic
actions, by Kripke structures. While for states, these structures represent in the usual
way the uncertainty of each agent concerning the current state of the system, we
similarly use action-structures to represent the uncertainty of each agent concerning
the current action taking place. The intuition is that we are dealing with potentially
“half-opaque/half-transparent” actions, about which the agents may be incompletely
informed, or even completely misinformed. Besides the structure, actions have
preconditions, defining their domain of applicability: not every action is possible in
every state. We model the update of a state by an action as a partial update operation,
given by a restricted product of the two structures: the uncertainties present in the
given state and the given action are multiplied, while the “impossible” combinations
of states and actions are eliminated (by testing the actions’ preconditions on the
state). The underlying intuition is that the agent’s uncertainties concerning the
state and the ones concerning the action are mutually independent, except for the
consistency of the action with the state.
On the syntactical side, we use a mixture of dynamic and epistemic logic,
with dynamic modalities associated to each action-structure, and with common-
knowledge modalities for various groups of agents (in addition to the usual
individual-knowledge operators). We give a complete and decidable axiomatization
for this logic, and we prove various expressivity results. From a proof-theoretical
point of view, the main originality of our system is the presence of our Action
Rule, an inference rule capturing what might be called a notion of “epistemic
(co)recursion”. We understand this rule and our Knowledge-Action Axiom (a gen-
eralization of Ramsey’s axiom to half-opaque actions) as expressing fundamental
formal features of the interaction between action and knowledge in multi-agent
systems, features that we think have not been formally expressed before.
Section "A Logical Language with Epistemic Actions" gives our basic logic L([α]) of epistemic actions and knowledge. The idea is to define the logic together with the action structures which we have just looked at informally. So in L([α]) we finally will present the promised formal versions of the announcements of section "Epistemic Actions". In section "A Logic for L([α])" we present a sound and complete proof system.
Syntax
We begin with a set AtSen of atomic sentences, and we define two sets simultaneously: the language L([α]), and a set of actions (over L([α])).
L([α]) is the smallest collection which includes AtSen and which is closed under ¬, ∧, □_A for A ∈ 𝒜, and [α]φ, where α is an action over L([α]) and φ ∈ L([α]).
An action structure (over L([α])) is a pair ⟨K, PRE⟩, where K is a finite Kripke frame over the set 𝒜 of agents, and PRE is a map PRE : K → L([α]). We will usually write K for the action structure ⟨K, PRE⟩. An action (over L([α])) is a tuple α = ⟨K, k, PRE⟩, where ⟨K, PRE⟩ is an action structure over L([α]), and k ∈ K. Each action α thus is a finite set with relations →_D for D ∈ 𝒜, together with a precondition function and a specified actual world.
The actions themselves constitute a Kripke frame Actions in the natural way, by setting

⟨K, k, PRE⟩ →_D ⟨L, l, PRE′⟩ iff K = L, PRE = PRE′, and k →_D l in K.    (38.2)

When α = ⟨K, k, PRE⟩, we set PRE(α) = PRE(k). That is, PRE(α) is the precondition associated to the distinguished world of the action. For this reason, we often write PRE(α) instead of PRE(k).
Examples 1. All of the sentences mentioned in section "Models" are sentences of L([α]), except for the ones that use □*_{A,B,C}. This construct gives us a more expressive language, as we shall see. The structures α, the null announcement, β, γ, γ′, δ, and δ′ described informally in section "Epistemic Actions" are bona fide actions. As examples of the accessibility relation on the class of actions, we have facts such as these: α →_D α for all D ∈ {A, B, C}; β →_A β and β →_B β; γ →_C γ′; δ →_{A,B} δ; and δ →_C δ′.
Many other types of examples are possible. We can represent misleading
epistemic actions, e.g. lying, or more generally acting such that some people do
not suspect that your action is possible. We can also represent gratuitous suspicion
(“paranoia”): maybe no “real” action has taken place, except that some people start
suspecting some action (e.g., some private communication) has taken place.
Semantics
As with the syntax, we define two things simultaneously: the semantic relation ⟨W, w⟩ ⊨ φ, and a partial operation (⟨W, w⟩, α) ↦ ⟨W, w⟩^α. Before this, we need another definition. Given a model W and an action structure K, we define the model W ⊗ K as follows:

1. The worlds of W ⊗ K are the pairs (w, k) ∈ W × K such that ⟨W, w⟩ ⊨ PRE(k).
2. For such pairs,

   (w, k) →_A (w′, k′) iff w →_A w′ and k →_A k′.    (38.3)

3. We interpret the atomic sentences by setting v_{W⊗K}((w, k)) = v_W(w). That is, p is true at (w, k) in W ⊗ K iff p is true at w in W.

Given an action α = ⟨K, k⟩ and a model-world pair ⟨W, w⟩, we say that ⟨W, w⟩^α is defined iff ⟨W, w⟩ ⊨ PRE(k), and in that case we set ⟨W, w⟩^α = ⟨W, w⟩^{⟨K,k⟩} = ⟨W ⊗ K, (w, k)⟩. One can now check that the following holds for these definitions.

⟨W, w⟩^α →_A ⟨W, x⟩^β iff ⟨W, w⟩^α and ⟨W, x⟩^β are defined, w →_A x in W, and α →_A β.

The semantics is given by extending the usual clauses for modal logic by clauses for actions:

⟨W, w⟩ ⊨ [α]φ iff: if ⟨W, w⟩^α is defined, then ⟨W, w⟩^α ⊨ φ.
⟨W, w⟩ ⊨ ⟨α⟩φ iff ⟨W, w⟩^α is defined and ⟨W, w⟩^α ⊨ φ.

We also abbreviate the boolean connectives classically, and we let true denote some tautology such as p ∨ ¬p.
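As an illustration of this partial update operation, here is a minimal Python sketch. None of the names are from the paper; models and actions are plain dictionaries, preconditions are formulas for a tiny model checker, and the final assertions rebuild U and β from the earlier sections and confirm that the product has the twelve worlds of the model X (the five x-worlds plus the seven old u-worlds).

```python
# A minimal sketch, not the paper's code, of the update <W, w> |-> <W, w>^alpha.
AGENTS = ("A", "B", "C")

def holds(model, w, phi):
    op = phi[0]
    if op == "true":
        return True
    if op == "atom":                    # e.g. ("atom", "A") for "A is dirty"
        return phi[1] in model["val"][w]
    if op == "not":
        return not holds(model, w, phi[1])
    if op == "and":
        return all(holds(model, w, p) for p in phi[1:])
    if op == "K":                       # ("K", agent, phi): agent knows phi
        return all(holds(model, v, phi[2])
                   for (u, v) in model["rel"][phi[1]] if u == w)
    raise ValueError(op)

def update(model, action):
    """Restricted product: keep (w, k) with <W, w> |= PRE(k); arrows go componentwise."""
    worlds = [(w, k) for w in model["worlds"] for k in action["points"]
              if holds(model, w, action["pre"][k])]
    rel = {a: {((w, k), (w2, k2)) for (w, k) in worlds for (w2, k2) in worlds
               if (w, w2) in model["rel"][a] and (k, k2) in action["rel"][a]}
           for a in AGENTS}
    return {"worlds": worlds, "rel": rel,
            "val": {(w, k): model["val"][w] for (w, k) in worlds}}

# The model U: seven worlds, edges where the other two children look the same.
DIRTY = {1: {"C"}, 2: {"B"}, 3: {"B", "C"}, 4: {"A"},
         5: {"A", "C"}, 6: {"A", "B"}, 7: {"A", "B", "C"}}
U = {"worlds": list(DIRTY),
     "rel": {a: {(u, v) for u in DIRTY for v in DIRTY
                 if DIRTY[u] - {a} == DIRTY[v] - {a}} for a in AGENTS},
     "val": DIRTY}

# The action beta: point l (secure announcement to {A, B}) and point t (nothing).
ignorant = ("and", *[("not", ("K", a, f)) for a in "AB"
                     for f in (("atom", a), ("not", ("atom", a)))])
BETA = {"points": ["l", "t"],
        "rel": {"A": {("l", "l"), ("t", "t")},
                "B": {("l", "l"), ("t", "t")},
                "C": {("l", "t"), ("t", "t")}},
        "pre": {"l": ignorant, "t": ("true",)}}

X = update(U, BETA)
assert len(X["worlds"]) == 12          # five x-worlds plus the seven old u-worlds
assert not holds(X, (6, "l"), ("K", "C", ("not", ("atom", "C"))))   # C still in the dark
assert holds(X, (6, "l"), ("K", "A", ("atom", "A")))                # but now A knows
```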
The larger language L([α], □*). We also consider a larger language L([α], □*). This is defined by adding operators □*_B for all subsets B ⊆ 𝒜. (When we do this, of course we get more actions as well.) The semantics works by taking □*_B φ to abbreviate the infinitary conjunction

⋀_{⟨A1, ..., An⟩ ∈ B*} □_{A1} ⋯ □_{An} φ.

Here B* is the set of all finite sequences from B. This includes the empty sequence, so □*_B φ logically implies φ.
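On a finite model this infinitary conjunction reduces to a reachability check: □*_B φ holds at w iff φ holds at every world reachable from w, in zero or more steps, along the relations of the agents in B. A small sketch, assuming the dictionary representation used in the previous sketch:

```python
# Illustrative sketch: common knowledge for a group as reachability on a finite model.
def common_knowledge(model, w, group, phi, holds):
    """<model, w> |= box*_group phi, where `holds` is a model checker for phi."""
    frontier, seen = [w], {w}
    while frontier:
        v = frontier.pop()
        if not holds(model, v, phi):     # phi must hold at every reachable world,
            return False                 # including w itself (the empty sequence)
        for agent in group:
            for (u, u2) in model["rel"][agent]:
                if u == v and u2 not in seen:
                    seen.add(u2)
                    frontier.append(u2)
    return True
```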
Bisimulation. Given two models, say K and L, over the same set 𝒜 of agents, a bisimulation between K and L is a relation R ⊆ K × L such that if k R l and A ∈ 𝒜, then:
1. For all atomic p, ⟨K, k⟩ ⊨ p iff ⟨L, l⟩ ⊨ p.
2. For all k →_A k′ there is some l →_A l′ such that k′ R l′.
3. For all l →_A l′ there is some k →_A k′ such that k′ R l′.
Given two model-world pairs ⟨K, k⟩ and ⟨L, l⟩, we write ⟨K, k⟩ ≈ ⟨L, l⟩ iff there is some bisimulation R such that k R l. It is a standard fact that if ⟨K, k⟩ ≈ ⟨L, l⟩, then the two pairs agree on all sentences of standard modal logic. In our setting, we can also speak about actions being bisimilar: we change condition (1) above to say that PRE(k) = PRE(l). It is easy now to check two things simultaneously: (1) bisimilar pairs agree on all sentences of L([α]); and (2) if ⟨K, k⟩ ≈ ⟨L, l⟩ and α ≈ β, then ⟨K, k⟩^α ≈ ⟨L, l⟩^β. Furthermore, these results extend to L([α], □*).
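For finite models, the largest bisimulation can be computed by starting from all pairs that agree on the atomic sentences and repeatedly discarding pairs that violate the back-and-forth conditions (2) and (3). A sketch in the same illustrative dictionary representation; for bisimilarity of actions one would compare PRE values instead of valuations in the starting set:

```python
# Illustrative sketch (not from the paper): largest bisimulation by fixpoint refinement.
def largest_bisimulation(K, L, agents=("A", "B", "C")):
    # Start from all pairs agreeing on the atomic sentences (condition 1) ...
    R = {(k, l) for k in K["worlds"] for l in L["worlds"] if K["val"][k] == L["val"][l]}
    changed = True
    while changed:
        changed = False
        for (k, l) in list(R):
            for a in agents:
                succ_k = [k2 for (x, k2) in K["rel"][a] if x == k]
                succ_l = [l2 for (x, l2) in L["rel"][a] if x == l]
                # ... then drop pairs violating the zig (2) or zag (3) condition.
                if not all(any((k2, l2) in R for l2 in succ_l) for k2 in succ_k) or \
                   not all(any((k2, l2) in R for k2 in succ_k) for l2 in succ_l):
                    R.discard((k, l))
                    changed = True
                    break
    return R

def bisimilar(K, k, L, l, agents=("A", "B", "C")):
    """<K, k> is bisimilar to <L, l> iff the pair survives in the largest bisimulation."""
    return (k, l) in largest_bisimulation(K, L, agents)
```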
Examples 2. We look back at section "Models" for some examples. We use ≅ to denote the relation of isomorphism on model-world pairs. It is not hard to check the following: ⟨U, u6⟩^α ≅ ⟨V, v6⟩, ⟨U, u6⟩^β ≅ ⟨X, x6⟩, ⟨U, u6⟩^γ ≅ ⟨Y, y6⟩, and ⟨U, u6⟩^δ ≅ ⟨Z, z6⟩. For example, the isomorphism which shows that ⟨U, u6⟩^δ ≅ ⟨Z, z6⟩ is (ui, o) ↦ zi for i ≠ 2, 4, (u2, q) ↦ z2, (u4, p) ↦ z4, and (ui, r) ↦ zi′ for all i.
Let α′ be the action of announcing to all agents that both A and B do know whether they are dirty. Then ⟨V, v6⟩^{α′} ≅ ⟨W, w6⟩. Moreover, ⟨Z, z6⟩^{α′} ≈ ⟨W, w6⟩. Note that in the latter case we only have bisimilarity. However, we know that our languages will not discriminate between bisimilar pairs, so we can regard them as the same. This models our intuition that the epistemic states at the end of Scenarios 1.5 and 4.5 should be the same.
Finally, all of the semantic facts about the various models in section "Models" now turn into precise statements. For example, ⟨U, u6⟩ ⊨ [α]◇_A □_B D_B. Also, ⟨U, u6⟩ ⊨ [α][α′]□*_{A,B,C} □_C ¬D_C. This formalizes our intuition that if we start with ⟨U, u6⟩, first announce that each of A and B do not know their state, and then announce that they each do know it, then at that point it will be common knowledge to all three that C knows she is not dirty.
More on Actions
In this section, we have a few remarks on actions. The point here is to clarify the
relation between the scenarios of section “Introduction: Example Scenarios and
Their Representations” and the intuitions concerning them, and the corresponding
actions of section “Epistemic Actions”.
First and foremost, here are the conceptual points involved in our formaliza-
tion. The idea is that epistemic actions present a lot of uncertainty. Indeed, what
might be thought of as a single action (or event) is naturally interpreted by agents
in different ways. The various agents might be unclear on what exactly happened,
and again they might well have different interpretations on what is happening. Our
formalization reflects this by making epistemic actions into Kripke models. So our
use of possible-worlds modeling of actions is on a par with other uses of these
models, and it inherits all of the features and bugs of those approaches.
Next, we want to spell out in words what our proposal amounts to. The basic
problem is to decide how to represent what happens to a Kripke model W after an
announcement ˛. (Of course, we are modeling ˛ by an action in our formal sense.)
Our solution begins by considering copies of W, one for each action token k of ˛ in
which PRE .˛/ holds. We can think of tagging the worlds of W with the worlds of
˛, and then we must give an account of the accessibility relation between them. The
intuition is that the agents’ relations to alternative worlds should be independent
from their relations to other possibilities for ˛. So the accessibility relations of K
and W should be combined independently. This is expressed formally in (38.3).
The auxiliary language L̂ has as atomic sentences all sentences φ of L([α], □*). It has all boolean connectives, standard modal operators □_A for A ∈ 𝒜, and also group knowledge operators □*_B for B ⊆ 𝒜.
We interpret L̂ on actions using the standard clauses for the connectives and modal operators, and by interpreting the atomic sentences as follows: ⟨K, k⟩ ⊨ p iff PRE(k) = p.
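A small sketch of how this interpretation could be checked on the dictionary representation of actions used earlier; the names are illustrative, the diamonds and duals are omitted, and "CK" stands for the group-knowledge operator:

```python
# Illustrative sketch: evaluating the auxiliary language on an action structure.
# An "atom" holds at a point k iff PRE(k) is (literally) that sentence.
def aux_holds(action, k, phi):
    op = phi[0]
    if op == "atom":                    # phi[1] is a sentence of the base language
        return action["pre"][k] == phi[1]
    if op == "not":
        return not aux_holds(action, k, phi[1])
    if op == "and":
        return all(aux_holds(action, k, p) for p in phi[1:])
    if op == "K":                       # box for the single agent phi[1]
        return all(aux_holds(action, k2, phi[2])
                   for (k1, k2) in action["rel"][phi[1]] if k1 == k)
    if op == "CK":                      # group knowledge: phi[2] at every point
        frontier, seen = [k], {k}       # reachable via the agents in the group phi[1]
        while frontier:
            v = frontier.pop()
            if not aux_holds(action, v, phi[2]):
                return False
            for a in phi[1]:
                for (k1, k2) in action["rel"][a]:
                    if k1 == v and k2 not in seen:
                        seen.add(k2)
                        frontier.append(k2)
        return True
    raise ValueError(op)
```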
Examples 3. The idea here is that the auxiliary language formalizes talk about what the different agents think is happening in our announcements. We refer back to the actions of section "Epistemic Actions". For example, α ⊨ □*_{A,B,C} p, where p is the sentence from (38.1) viewed as an atomic sentence of the auxiliary language: intuitively, in α, it is common knowledge that this sentence was announced. Another example:

δ ⊨ □*_{A,B,C} ◇_C ( … ∨ ψ_A ∨ ψ_B )

That is, in δ, it is common knowledge that C thinks it is possible that some non-trivial announcement happened. Recall that this was one of our basic intuitions about δ, one which is not in general statable in our main language L([α], □*).
Definition. Let ⟨K, k⟩ be a model-world pair, and let φ be a sentence of L̂. Then φ characterizes ⟨K, k⟩ iff for all ⟨L, l⟩, ⟨L, l⟩ ⊨ φ iff ⟨L, l⟩ ≈ ⟨K, k⟩.
Proposition 4. Let ⟨K, k⟩ be a model-world pair with K finite. Then there is a sentence of L̂ which characterizes ⟨K, k⟩.
Proof. By replacing ⟨K, k⟩ by its quotient under the largest auto-bisimulation, we may assume that if l ≠ m, then ⟨K, l⟩ and ⟨K, m⟩ are not bisimilar. It is well known that the relation of elementary equivalence in modal logic is a bisimulation on models in which each world has finitely many arrows coming in and out. It follows from this and the overall finiteness of K that we can find sentences φ_l for l ∈ K with the property that for all l and m, ⟨K, m⟩ ⊨ φ_l iff m = l. Let ψ be the following sentence:

⋀_{l ∈ K, A ∈ 𝒜} ( φ_l → ( □_A ⋁_{l →_A l′} φ_{l′}  ∧  ⋀_{l →_A l′} ◇_A φ_{l′} ) )

Going back to our original ⟨K, k⟩, let χ be φ_k ∧ □*_𝒜 ψ. It is easy to check that each ⟨K, l⟩ satisfies ψ; hence each satisfies □*_𝒜 ψ. Therefore ⟨K, k⟩ ⊨ χ. We claim that χ characterizes ⟨K, k⟩. To see this, suppose that ⟨J, j⟩ ⊨ χ. Consider the relation R ⊆ K × J given by k′ R j′ iff ⟨J, j′⟩ ⊨ φ_{k′}.
It is sufficient to see that R is a bisimulation. We'll verify half of this: suppose that k′ R j′ and j′ →_A j″. By using ψ, we see that there is some k″ such that k′ →_A k″ and ⟨J, j″⟩ ⊨ φ_{k″}. And also, since ⊨ □*_𝒜 ψ → □_A □*_𝒜 ψ, we see that ⟨J, j″⟩ ⊨ □*_𝒜 ψ. This completes the proof.
The connection of this result and our discussion of actions is that it is often
difficult to go from an informal description of an epistemic action to a formal one
along our lines. (For example, our formulation of ı was the last of several versions.)
Presumably, one way to get a formal action in our sense is to think carefully about
which properties the action should have, express them in the auxiliary language, and
then write a characterizing sentence such as the one in the proof of Proposition 4. Then
one can construct the finite model by standard methods. Although this would be a
tedious process, it seems worthwhile to know that it is available.
Our formalization of actions reflects some choices which one might wish to
modify. One of these choices is to take the range of the function PRE to be some
language. Another option would be to have the range to be the power set of that
language. This would make actions into Kripke models over the whole set of
sentences. (And so what we have done is like considering modal logic with the
restriction that at any world satisfies exactly one atomic sentence.) Taking this
other option thus brings actions and models closer. This idea is pursued in Baltag
(1999), a continuation of this work which develops a “calculus of epistemic actions.”
This replaces the “semantic” actions of this paper with action expressions. These
expressions have nicer properties than the auxiliary language of this paper, but it
would take us too far afield to discuss this further.
On a different matter, it makes sense to restrict attention from the full collection
of actions as we have defined it to the smaller collection of S5 actions, where each accessibility relation →_A is an equivalence relation. This corresponds to the standard move
of restricting attention to models with this property, and the reasons for doing this
are similar. Intuitively, an S5 action is one in which every agent is introspective
(with respect to their own suspicions about actions). Moreover, the introspection is
accurate, and this fact is common knowledge.
A final modification which is quite natural is to allow actions which change the
world. One would do this by adding to our notion of action a sentential update u.
This would be a function defined on AtSen and written in terms of update equations
such as u(p) := p ∧ q, u(q) := false, etc. We are confident that our logical systems
can be modified to reflect this change, and we discuss this at certain points below.
We decided not to make this change mostly in order to keep the basic notions as
simple as possible.
With respect to both of the changes mentioned in the last two paragraphs, it is
not hard to modify our logical work to get completeness results for the new systems.
We discuss all of this in section “Two Extensions”.
In Fig. 38.2 below we present a logic for L([α], □*) which we shall study later. In this section, we shall restrict the logic to the simpler language L([α]). We do so partly to break up the study of a system with many axioms and rules, and partly to emphasize the significance of adding the infinitary operators □*_B to L([α]). To carry out the restriction, we forget the axioms and rules of inference in Fig. 38.2 which are marked by a star. In particular α ∘ β will be defined later (section "A Logic for L([α], □*)").
Fig. 38.2 The logical system for L([α], □*). For L([α]), we drop the starred axioms and rules
The rules of the system are all quite standard from modal logic. The Action
Axioms are the interesting new ones. In the Atomic Permanence axiom, p is an
atomic sentence. The axiom then says that announcements do not change the brute
fact of whether or not p holds. This axiom reflects the fact that our actions do
not change any kind of local state. (We discuss an extension of our system in
section “Two Extensions” where this axiom is not sound.) The Partial Functionality
Axiom corresponds to the fact that the operation ⟨W, w⟩ ↦ ⟨W, w⟩^α is a partial function. The key axiom of the system is the Action-Knowledge Axiom, giving a criterion for knowledge after an announcement. We will check the soundness of this axiom, leaving the soundness of the other unstarred axioms and rules to the reader.
Proposition 1. The Action-Knowledge Axiom

[α]□_A φ ↔ ( PRE(α) → ⋀ { □_A [β]φ : α →_A β } )

is sound.
Proof. We remind the reader that the relevant definitions and notation are found in section "Semantics". Let α be the action ⟨K, k⟩. Fix a pair ⟨W, w⟩. If ⟨W, w⟩ ⊨ ¬PRE(α), then both sides of our biconditional hold. We therefore assume that ⟨W, w⟩ ⊨ PRE(α) in the rest of this proof. Assume that ⟨W, w⟩^α ⊨ □_A φ. Take some β such that α →_A β. This β is of the form ⟨K, k′⟩ for some k′ such that k →_A k′. Let w →_A w′. We have two cases: ⟨W, w′⟩ ⊨ PRE(k′), and ⟨W, w′⟩ ⊨ ¬PRE(k′). In the latter case, ⟨W, w′⟩ ⊨ [β]φ trivially. We'll show this in the former case, so assume ⟨W, w′⟩ ⊨ PRE(k′). Then (w′, k′) is a world of W ⊗ K, and indeed (w, k) →_A (w′, k′). Now our assumption that ⟨W, w⟩^α ⊨ □_A φ implies that ⟨W ⊗ K, (w′, k′)⟩ ⊨ φ. This means that ⟨W, w′⟩^β ⊨ φ. Hence ⟨W, w′⟩ ⊨ [β]φ. Since β and w′ were arbitrary, ⟨W, w⟩ ⊨ ⋀_β □_A [β]φ.
The other direction is similar.
The rest of this section is devoted to the completeness result for L([α]). The reader not interested in this may omit the rest of this section, but at some points later we will refer back to the term rewriting system R which we shall describe shortly. Our completeness proof is based on a translation of L([α]) to ordinary modal logic L. And this translation is based on a term rewriting system to be called R.
The rewriting rules of R are:
This takes some work, and because the details are less important than the facts
themselves, we have placed the entire matter in an Appendix to this paper. (The
Appendix also discusses an extension of the rewrite system R to a system R for
the larger language L.Œ˛; /, so if you read it at this point you will need to keep
this in mind.)
In the next result, we let L be ordinary modal logic over AtSen (where of course
there are no actions).
Proposition 3. There is a translation t : L([α]) → L such that for all φ ∈ L([α]), φ is semantically equivalent to φ^t.
Proof. Every sentence φ of L([α]) may be rewritten to a normal form. By Lemma 2, the normal form of φ is a sentence in L. We therefore set φ^t to be any normal form of φ, say the one obtained by carrying out leftmost reductions. The semantic equivalence follows from the fact that the rewrite rules themselves are sound, and from the fact that semantic equivalence is preserved by substitutions.
Lemma 4 (Substitution). Let φ be any sentence, and let ⊢ ψ ↔ ψ′. Suppose that φ[p/ψ] comes from φ by replacing p by ψ at some point, and φ[p/ψ′] comes similarly. Then ⊢ φ[p/ψ] ↔ φ[p/ψ′].
Proof. By induction on φ. The key point is that we have necessitation rules for each [α].
Theorem 5. This logical system for L([α]) is strongly complete: Σ ⊢ φ iff Σ ⊨ φ.
Proof. The soundness half being easy, we only need to show that if Σ ⊨ φ, then Σ ⊢ φ. First, Σ^t ⊨ φ^t. Since our system extends the standard complete proof system of modal logic, Σ^t ⊢ φ^t. Now for each ψ of L([α]), ⊢ ψ ↔ ψ^t. (This is an easy induction on < using Lemma 4.) As a result, Σ ⊢ ψ^t for all ψ ∈ Σ. So Σ ⊢ φ^t. As we know ⊢ φ^t ↔ φ. So we have our desired conclusion: Σ ⊢ φ.
Strong completeness results of this kind may also be found in Plaza (1989) and
in Gerbrandy and Groeneveld (1997). We discuss some of the history of the subject
in section “Conclusions and Historical Remarks”.
At this point, we turn to the completeness result for L([α], □*). It is easy to check that there is no hope of getting a strong completeness result (where one has arbitrary sets of hypotheses). The best one can hope for is weak completeness: ⊢ φ if and only if ⊨ φ. Also, in contrast to our translation results for L([α]), the larger language L([α], □*) cannot be translated into L or even into L(□*) (modal logic with the extra modalities □*_B). We prove this in Theorem 2 below. So completeness results for L([α], □*) cannot simply be based on translation.
Our logical system is listed in Fig. 38.2 above. We discussed the fragment of the system which does not have the starred axioms and rules in section "A Logic for L([α])". The □*_C-normality Axiom and □*_C-necessitation Rule are standard, as is the Mix
Axiom. We leave checking their soundness to the reader. The key features of the
system are thus the Composition Axiom and the Action Rule. We begin with the
Action Rule, restated below:
The Action Rule. Let ψ be a sentence, and let C be a set of agents. Let there be sentences χ_β for all β such that α →*_C β (including α itself), and such that
1. ⊢ χ_β → [β]ψ.
2. If A ∈ C and β →_A γ, then ⊢ (χ_β ∧ PRE(β)) → □_A χ_γ.
From these assumptions, infer ⊢ χ_α → [α]□*_C ψ.
Remark. We use →*_C as an abbreviation for the reflexive and transitive closure of the relation ⋃_{A∈C} →_A. Recall that there are only finitely many β such that α →*_C β, since each is determined by a world of the same finite Kripke frame that determines α. So even though the Action Rule might look like it takes infinitely many premises, it really only takes finitely many.
Another point: if one so desires, the Action Rule could be replaced by a (more
complicated) axiom scheme which we will not state here.
Lemma 1. ⟨W, w⟩ ⊨ ⟨α⟩◇*_C φ iff there are sequences w = w0 →_{A1} w1 →_{A2} ⋯ →_{Ak} wk of worlds from W and α = α0 →_{A1} α1 →_{A2} ⋯ →_{Ak} αk of actions, with the same labels, such that Ai ∈ C and ⟨W, wi⟩ ⊨ PRE(αi) for all 0 ≤ i ≤ k, and ⟨W, wk⟩ ⊨ ⟨αk⟩φ.
Remark. The case k = 0 just says that ⟨W, w⟩ ⊨ ⟨α⟩◇*_C φ is implied by ⟨W, w⟩ ⊨ ⟨α⟩φ.
Proof. ⟨W, w⟩ ⊨ ⟨α⟩◇*_C φ iff ⟨W, w⟩ ⊨ PRE(α) and ⟨W^α, (w, α)⟩ ⊨ ◇*_C φ; iff ⟨W, w⟩ ⊨ PRE(α) and there is a sequence in W^α, where k ≥ 0 and each Ai ∈ C, and also a sequence of actions of length k, with the same labels, such that ⟨W, wi⟩ ⊨ PRE(αi) for all 0 < i ≤ k, and ⟨W, wk⟩ ⊨ ⟨αk⟩¬ψ. If k = 0, we have ⟨W, w⟩ ⊨ ⟨α⟩¬ψ. But since ⊢ χ_α → [α]ψ, we have ⟨W, w⟩ ⊨ [α]ψ. This is a contradiction.
Now we argue the case k > 0. We show by induction on 1 ≤ i ≤ k that ⟨W, wi⟩ ⊨ χ_{αi} ∧ [αi]ψ. In particular, ⟨W, wk⟩ ⊨ [αk]ψ. This is a contradiction.
We close with a discussion of the Composition Axiom, beginning with a general definition.
Definition. Let α = ⟨K, k⟩ and β = ⟨L, l⟩ be actions. Then the action composition α ∘ β is the action defined as follows. Consider the product set K × L. We turn this into a Kripke frame using the restriction of the product arrows. We get an action structure by setting

PRE((k′, l′)) = PRE(k′) ∧ [⟨K, k′⟩]PRE(l′).
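A small sketch of this composition on the illustrative action representation used earlier; the preconditions are kept symbolic, with a made-up constructor ("box_action", ...) standing for the modality [⟨K, k′⟩] that a full model checker would have to interpret via the update operation:

```python
# Illustrative sketch of the action composition alpha o beta (not the paper's code).
AGENTS = ("A", "B", "C")

def compose(alpha, beta):
    points = [(k, l) for k in alpha["points"] for l in beta["points"]]
    rel = {a: {((k, l), (k2, l2))
               for (k, l) in points for (k2, l2) in points
               if (k, k2) in alpha["rel"][a] and (l, l2) in beta["rel"][a]}
           for a in AGENTS}
    # PRE((k', l')) = PRE(k') AND [<K, k'>] PRE(l'), kept as a symbolic formula.
    pre = {(k, l): ("and", alpha["pre"][k],
                    ("box_action", (alpha, k), beta["pre"][l]))
           for (k, l) in points}
    return {"points": points, "rel": rel, "pre": pre}
```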
Proof. Let α = ⟨K, k⟩ and β = ⟨L, l⟩. For (1), note that the worlds of (W^α)^β are of the form ((w, k′), l′), where (w, k′) ∈ W^α and ⟨W^α, (w, k′)⟩ ⊨ PRE(l′). For such ((w, k′), l′), ⟨W, w⟩ ⊨ PRE(k′) and ⟨W, w⟩ ⊨ [⟨K, k′⟩]PRE(l′). That is, (w, (k′, l′)) ∈ W^{α∘β}. The converse is similar, and the rest of the isomorphism properties are easy.
Part (2) follows from (1). We use the obvious isomorphism ((k, l), m) ↦ (k, (l, m)) in part (3). We use the Composition and [α]-necessitation axioms to show that this isomorphism preserves the PRE function up to logical equivalence. Part (4) is easy, using the fact that ⊨ [β]φ ↔ φ when β is the null announcement.
and completeness can be obtained by an elaboration of the work which we shall do.
We did not present this work, mostly because adding the Composition Axiom leads
to shorter proofs.
This completes the discussion of the axioms and rules of our logical system for L([α], □*).
In this section, we prove the completeness of the logical system for L([α], □*).
Section “Some Syntactic Results” has some technical results which culminate
in the Substitution Lemma 3. This is used in some of our work on normal
forms in the Appendix, and that work figures in the completeness theorem of
section “Completeness”.
Proof. Part (1) follows easily from the Mix Axiom and modal reasoning. For part (2), we start with a consequence of the Mix Axiom: ⊢ □*_C ψ → □_A □*_C ψ. Then by modal reasoning, ⊢ [α]□*_C ψ → [α]□_A □*_C ψ. By the Action-Knowledge Axiom, we have ⊢ [α]□*_C ψ ∧ PRE(α) → □_A [β]□*_C ψ.
Definition. Let α and α′ be actions. We write ⊢ α ↔ α′ if α and α′ are based on the same Kripke frame W and the same world w, and if for all v ∈ W, ⊢ PRE(v) ↔ PRE′(v), where PRE is the announcement function for α, and PRE′ for α′.
the result for □*_C ψ. We use the Action Rule to show that ⊢ [α]□*_C ψ → [α′]□*_C ψ. For each β′, we let χ_{β′} be [β]□*_C ψ, where β is such that ⊢ β ↔ β′. We need to show that for all relevant β′ and γ′,
(a) ⊢ [β]□*_C ψ → [β′]ψ; and
(b) if β′ →_A γ′, then ⊢ [β]□*_C ψ ∧ PRE(β′) → □_A [γ]□*_C ψ.
For (a), we know from Lemma 1, part (1) that ⊢ [β]□*_C ψ → [β]ψ. By induction hypothesis on ψ, ⊢ [β]ψ ↔ [β′]ψ. And this implies (a). For (b), Lemma 1, part (2) tells us that under the assumptions,
Completeness
f(p) = {p}
f(¬φ) = f(φ) ∪ {¬φ}
f(φ ∧ ψ) = f(φ) ∪ f(ψ) ∪ {φ ∧ ψ}
f(□_A φ) = f(φ) ∪ {□_A φ}
f(□*_B φ) = f(φ) ∪ {□*_B φ} ∪ {□_A □*_B φ : A ∈ B}
f([α]□*_C φ) = {□_A [β]□*_C φ : α →*_C β & A ∈ C}
             ∪ {[β]□*_C φ : α →*_C β}
             ∪ ⋃ {f(χ) : (∃β) α →*_C β & χ ∈ s(PRE(β))}
             ∪ f(□*_C φ)
             ∪ ⋃ {f([β]φ) : α →*_C β}
For φ not in normal form, let f(φ) = f(nf(φ)). (Note that we need to define f on
sentences which are not normal forms, because f([β]φ) figures in f([α]□*_C φ). Also,
the definition makes sense because the calls to f on the right-hand sides are all <
the arguments on the left-hand sides, and since nf(φ) ≤ φ for all φ; see Lemma 4.)
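For the static fragment, the definition of f can be read off as a short recursive program. The sketch below uses our own tuple encoding of sentences (('not', f), ('and', f, g), ('box', A, f) for □_A f, ('cbox', G, f) for □*_G f) and omits the clause for [α]□*_C φ, which needs the action structures and normal forms discussed above.

```python
def closure(phi):
    """A simplified sketch of the closure map f, restricted to the static fragment."""
    op = phi[0]
    if op == 'not':
        return closure(phi[1]) | {phi}
    if op == 'and':
        return closure(phi[1]) | closure(phi[2]) | {phi}
    if op == 'box':                  # f(box_A phi) = f(phi) plus {box_A phi}
        return closure(phi[2]) | {phi}
    if op == 'cbox':                 # f(cbox_B phi) also adds box_A cbox_B phi for A in B
        group = phi[1]
        return closure(phi[2]) | {phi} | {('box', a, phi) for a in group}
    return {phi}                     # atomic sentences

# e.g. closure(('cbox', ('A', 'B'), ('p',))) contains the formula itself,
# ('box', 'A', ...), ('box', 'B', ...), and ('p',) -- a finite set, as in Lemma 5.
```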
Lemma 5. For all φ:
1. f(φ) is a finite set of normal form sentences.
2. nf(φ) ∈ f(φ).
3. If ψ ∈ f(φ), then f(ψ) ⊆ f(φ).
4. If ψ ∈ f(φ), then s(ψ) ⊆ f(φ).
5. If [γ]□*_C ψ ∈ f(φ), γ →_C δ, and A ∈ C, then f(φ) also contains □_A [δ]□*_C ψ,
[δ]□*_C ψ, PRE(δ), and nf([δ]ψ).
Proof. All of the parts are by induction on φ in the well-order <. For part (1),
note that if [α]□*_C φ is a normal form, then each sentence □_A [β]□*_C φ and all
subsentences of this sentence are normal forms. For part (2), note that when φ is
a normal form, φ ∈ f(φ).
In part (3), we only need to consider φ in normal form. The result is immediate
when φ is an atomic sentence p. The induction steps for ¬, ∧, and □_A are easy.
For □*_B φ, note that since φ < □*_B φ, our induction hypothesis implies the result for
φ; we verify it for □*_B φ. The only interesting case is when ψ is □_A □*_B φ for some
A ∈ B. And in this case f(ψ) = f(□*_B φ) ∪ {□_A □*_B φ} ⊆ f(□*_B φ), as required.
To complete part (3), we consider [α]□*_C φ. If there is some χ < [α]□*_C φ such
that ψ ∈ f(χ) and f(χ) ⊆ f([α]□*_C φ), then we are easily done by the induction
hypothesis. This covers all of the cases except for ψ = [β]□*_C φ and ψ =
□_A [β]□*_C φ. For the first of these, we use the transitivity of →_C to check that
f([β]□*_C φ) ⊆ f([α]□*_C φ). And now the second case follows:
f(□_A [β]□*_C φ) = f([β]□*_C φ) ∪ {□_A [β]□*_C φ} ⊆ f([α]□*_C φ).
Part (4) is similar to part (3), using equation (38.4) at the beginning of this
subsection.
For part (5), assume that [γ]□*_C ψ ∈ f(φ). By part (1), [γ]□*_C ψ is a normal form.
We show that □_A [δ]□*_C ψ, [δ]□*_C ψ, PRE(δ), and nf([δ]ψ) all belong to f([γ]□*_C ψ),
and then use part (3). The first two of these sentences are immediate by the definition
of f; the third one follows from part (4); and the last comes from part (2) since
nf([δ]ψ) ∈ f([δ]ψ) ⊆ f([γ]□*_C ψ).
The set Δ = Δ(φ). Fix a sentence φ. We set Δ = f(φ) (i.e., we drop φ from the
notation). This set is the version for our logic of the Fischer-Ladner closure of φ.
Let Δ = {ψ_1, …, ψ_n}. Given a maximal consistent set U of L([α], □*), let
[[U]] = ±ψ_1 ∧ ⋯ ∧ ±ψ_n,
where the signs are taken in accordance with membership in U. That is, if ψ_i ∈ U,
then ψ_i is a conjunct of [[U]]; but if ψ_i ∉ U, then ¬ψ_i is a conjunct.
Two (standard) observations are in order. Notice that if [[U]] ≠ [[V]], then [[U]] ∧
[[V]] is inconsistent. Also, for all ψ ∈ Δ,
⊢ ψ ↔ ⋁{[[W]] : W is maximal consistent and ψ ∈ W}   (38.5)
and
⊢ ¬ψ ↔ ⋁{[[W]] : W is maximal consistent and ¬ψ ∈ W}.   (38.6)
[U] →_A [V] in F iff whenever □_A ψ ∈ U ∩ Δ, then also ψ ∈ V.   (38.7)
nf([α_k]ψ) ≤ [α_k]ψ. By Lemma 4, nf([α_k]ψ) ≤ [α_k]ψ < [α]□*_C ψ. Since the path
is good, U_k contains ⟨α_k⟩¬ψ and hence ¬[α_k]ψ. It also must contain the normal
form of this, by Lemma 4. So by induction hypothesis, ⟨F, [U_k]⟩ ⊨ nf(¬[α_k]ψ).
By soundness, ⟨F, [U_k]⟩ ⊨ ⟨α_k⟩¬ψ. Now it does follow from Lemma 1 that
⟨F, [U]⟩ ⊨ ⟨α⟩◊*_C ¬ψ.
Going the other way, suppose that ⟨F, [U]⟩ ⊨ ⟨α⟩◊*_C ¬ψ. By Lemma 1, we get
a path in F witnessing this. The argument of the previous paragraph shows that this
path is a good path from [U] for ⟨α⟩◊*_C ¬ψ. By Lemma 6, U contains ⟨α⟩◊*_C ¬ψ.
This completes the proof.
Theorem 9 (Completeness). For all φ, ⊢ φ iff ⊨ φ. Moreover, this relation is
decidable.
Proof. By Lemma 4, ⊢ φ ↔ nf(φ). Let φ be consistent. By the Truth Lemma,
nf(φ) holds at some world in the filtration F. So nf(φ) has a model; thus φ has
one, too. This establishes completeness. For decidability, note that the size of the
filtration is computable in the size of the original φ.
Two Extensions
In this section, we present two results which show that adding announcements to
modal logic with ◊* adds expressive power, as does adding private announcements
to modal logic with ◊* and public announcements. To show these results it will
be sufficient to take the set A of agents to be {A, B} and consider only languages
contained in a language built up from the atomic sentences p and q, using ◊_A, ◊_B,
◊*_A, ◊*_B, and ◊*_AB, and the actions [φ]_A, [φ]_B of announcing φ to A or B privately, and
[φ]_AB, the action of announcing φ to A and B publicly. Let L_all stand for this language.
We use here the customary notation ([φ]_A, [φ]_B, [φ]_AB) for announcements, but [φ]_A
is simply the action with the Kripke structure K = {k} with →_A from k to k and
PRE(k) = φ. We think of [φ]_B similarly. [φ]_AB is the action with the Kripke structure
K = {k} with →_A and →_B going from k to k and PRE(k) = φ.
We need to define a rank |φ| on sentences from L_all. Let |p| = 0 for p atomic,
|¬φ| = |φ|, |φ ∧ ψ| = max(|φ|, |ψ|), |◊_X φ| = 1 + |φ| for X = A or X = B,
|◊*_X φ| = 1 + |φ| for X = A, X = B, or X = AB, and |[φ]_X ψ| = max(|φ|, |ψ|)
for X = A, X = B, or X = AB.
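The rank is a direct structural recursion; here is a sketch over a hypothetical tuple encoding of L_all (('dia', X, f) for ◊_X f, ('cdia', X, f) for ◊*_X f, ('ann', X, f, g) for [f]_X g); the encoding and names are ours, not the paper's.

```python
def rank(phi):
    """|phi| as defined above, over our tuple encoding of L_all."""
    op = phi[0]
    if op == 'not':
        return rank(phi[1])
    if op == 'and':
        return max(rank(phi[1]), rank(phi[2]))
    if op in ('dia', 'cdia'):        # |<>_X phi| = |<>*_X phi| = 1 + |phi|
        return 1 + rank(phi[2])
    if op == 'ann':                  # |[phi]_X psi| = max(|phi|, |psi|)
        return max(rank(phi[2]), rank(phi[3]))
    return 0                         # atomic sentences have rank 0
```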
First we present a lemma which allows us, in certain circumstances, to do the
following: from the existence of a sentence in a language L1 which is not equivalent
to any sentence in a language L0 infer that there exists a sentence in L1 not
equivalent to any theory in L0 .
Lemma 1. Let L0 be a language included in L_all, and let ψ be a sentence in L_all.
Assume that for each n we have models F_n and G_n with some worlds f_n ∈ F_n and
g_n ∈ G_n such that ⟨F_n, f_n⟩ satisfies ¬ψ, ⟨G_n, g_n⟩ satisfies ψ, and ⟨F_n, f_n⟩ and ⟨G_n, g_n⟩
agree on all sentences in L0 of rank ≤ n. Then ◊_A ψ is not equivalent with any
theory in L0.
Proof. For a sequence of model-world pairs ⟨H_n, h_n⟩, n ∈ D ⊆ ω, we let
⊕_{n∈D}(H_n, h_n) be a model-world pair defined as follows. Let h be a new world. Take
disjoint copies of the H_n's and add an A-arrow from h to each h_n. All other arrows
are within the H_n's and stay the same as in H_n. No atomic sentences are true at h.
Atomic sentences true in the worlds belonging to the copy of H_n in ⊕_{n∈D}(H_n, h_n)
are precisely those true at the corresponding worlds of H_n.
Let F be ⊕_{n∈ω}(F_n, f_n) with the new world denoted by f. Define also F^m, for
m ∈ ω, to be ⊕_{n∈ω}(H_n, h_n) with the new world f^m, where H_m = G_m, h_m = g_m and
for all n ≠ m, H_n = F_n and h_n = f_n.
Now assume towards a contradiction that ◊_A ψ is equivalent with a theory Φ in
L0. Clearly ◊_A ψ fails in ⟨F, f⟩. Thus some sentence φ ∈ Φ fails in ⟨F, f⟩. On the
other hand, each ⟨F^m, f^m⟩ satisfies ◊_A ψ, whence ⟨F^m, f^m⟩ satisfies φ. Let m_0 = |φ|.
The following claim shows that both ⟨F, f⟩ and ⟨F^{m_0}, f^{m_0}⟩ make φ true or both of
them make it false, which leads to a contradiction.
Claim. Let φ be a sentence in Φ of rank m. Let H_n, K_n, n ∈ D, with h_n ∈ H_n and
k_n ∈ K_n be models such that ⟨H_n, h_n⟩ and ⟨K_n, k_n⟩ agree on sentences in Φ of rank
≤ m. Then ⟨⊕_n(H_n, h_n), h⟩ and ⟨⊕_n(K_n, k_n), k⟩ agree on φ.
This claim is proved by induction on complexity of φ. It is clear for atomic
sentences. The induction steps for boolean connectives are trivial. A moment of
thought gives the induction step for ◊ and ◊* with various subscripts. It remains
to consider the case when φ = [φ_1]_A φ_2. (The cases when φ = [φ_1]_B φ_2 and φ =
[φ_1]_AB φ_2 are similar.) Fix H_n, K_n, h_n ∈ H_n, k_n ∈ K_n, with n ∈ D, such that ⟨H_n, h_n⟩
However,
and
In the result below, there will be only one agent A, and so we omit the letter A from
the notation. We let L([·], ◊*) be modal logic with announcements (to this A) and
◊* = ◊*_A. We also let L(◊*) be the obvious sublanguage.
Theorem 2. There is a sentence of L([·], ◊*) which cannot be expressed by any set
of sentences of L(◊*).
Proof. We show first that [p]◊*_C q = [p]◊*◊q cannot be expressed by any single
sentence of L(◊*). (Incidentally, the same holds for [p]◊*q.) Fix a natural number
n. We define structures A = A_n and B = B_n as follows. First B has 2n + 3 points
arranged cyclically as
For the atomic sentences, we set p true at all points except n + 1, and q true only at
0.
The structure A is a copy of B with n more points 1̄, …, n̄ arranged as
0 → 1̄ → ⋯ → n̄ → 0.
The shape of A is a figure-8. In both structures, every point is reachable from every
point by the transitive closure of the → relation. At the points ī, p is true and q is
false. Notice that 1 ⊨ [p]◊*_C q in A, but 1 ⊭ [p]◊*_C q in B.
In this section, L([·]_AB, ◊*) denotes the set of sentences built from p using [φ]_AB,
◊_A, ◊_B, ◊*_A, ◊*_B, and ◊*_AB. L([·]_A, ◊*_A) denotes the set built from p using [φ]_A, ◊*_A,
and ◊_B.
Theorem 3. There is a sentence of L([·]_A, ◊*_A) which cannot be expressed by any
set of sentences in L([·]_AB, ◊*).
Proof. We consider [p]_A ◊*_A ◊_B ¬p.
Let G_n be the following model. We begin with a cycle in →_A:
a_1 →_A b →_A a_n →_A a_{n−1} →_A ⋯ →_A a_2 →_A a_1   (38.8)
We add edges a_i →_A b for all i (including i = 1), and also x →_A a_1 for all x (again
including x = a_1). The only →_B edge is a_1 →_B b. The atomic sentence p is true at
Claim. Assume that 1 < j ≤ n, φ ∈ L([·]_AB, ◊*) and |φ| < j. Then ⟨G_n, a_j⟩ ⊨ φ
iff ⟨G_n, a_1⟩ ⊨ φ.
The proof is by induction on φ. For φ = p, the result is clear, as are the induction
steps for ¬ and ∧. For ◊_A φ, suppose that a_j ⊨ ◊_A φ. Either a_1 ⊨ φ, in which case
a_1 ⊨ ◊_A φ, or else a_{j−1} ⊨ φ. In the latter case, by induction hypothesis, a_1 ⊨ φ;
whence a_1 ⊨ ◊_A φ. The converse is similar.
The case of ◊_B φ is trivial: a_j ⊨ ¬◊_B φ and a_1 ⊨ ¬◊_B φ.
For ◊*_A φ, note that since we have a cycle (38.8) containing all points, the truth
value of ◊*_A φ does not depend on the point. The cases of ◊*_B φ and ◊*_AB φ are similar.
For [φ]_AB ψ, assume the result for φ and ψ, and let |[φ]_AB ψ| < j. Then also
|φ| < j and |ψ| < j. Let H = {x : x ⊨ φ} be the updated model, and recall that
⟨G_n, x⟩ ⊨ [φ]_AB ψ iff x ∉ H or ⟨H, x⟩ ⊨ ψ. We have two cases: First, H = G_n.
Then ⟨G_n, x⟩ ⊨ [φ]_AB ψ iff ⟨G_n, x⟩ ⊨ ψ. So we are done by the induction hypothesis.
The other case is when there is some x ∉ H. If a_k ∉ H for some k ≥ j or for
k = 1, then all these a_k do not belong to H. In particular, neither a_j nor a_1 belongs.
And so both a_j and a_1 satisfy [φ]_AB ψ. If b ∉ H, then H is bisimilar to a one-point
model. This is because every a_i ∈ H would have some →_A-successor in H (e.g.,
a_1), and there would be no →_B edges. So we assume b ∈ H. Thus a_i ∉ H for
some i < j. Let k be least so that a_l ⊨ φ for all l with k ≤ l ≤ n. Then 1 < k ≤ j. Let
A_k = {a_l : k ≤ l ≤ n} ∪ {a_1}. The submodels generated by a_j and a_1 contain the
same worlds: all worlds in A_k and b. We claim that (A_k × A_k) ∪ {⟨b, b⟩} is a
bisimulation on H. The verification here is easy.
So in H, a_j and a_1 agree on all sentences in any language which is invariant for
bisimulation. Now L([·]_AB, ◊*) has this property (as do all the languages which we
study: they are translatable into infinitary modal logic). In particular, ⟨H, a_j⟩ ⊨ ψ
iff ⟨H, a_1⟩ ⊨ ψ. This concludes the claim.
We get Theorem 3 directly from the claim, the observation that ⟨G_n, a_n⟩ ⊨ [p]_A ◊*_A ◊_B ¬p and
⟨G_n, a_1⟩ ⊨ ¬[p]_A ◊*_A ◊_B ¬p, and Lemma 1.
We feel that our two results on expressive power are just a sample of what
could be done in this area. We did not investigate the next natural questions: Do
announcements with suspicious outsiders extend the expressive power of modal
logic with all secure private announcements and common knowledge operators?
And then do announcements with common knowledge of suspicion add further
expressive power?
The work of this paper builds on the long tradition of epistemic logic as well as
technical results in other areas. In recent times, one very active arena for work on
knowledge is distributed systems, and the main source of work in recent times on
knowledge in distributed systems is the book Reasoning About Knowledge (Fagin
et al. 1995) by Fagin, Halpern, Moses, and Vardi. We depart from Fagin et al.
(1995) by introducing the new operators for epistemic actions, and by doing without
temporal logic operators. In effect, our Kripke models are simpler, since they do
not incorporate all of the runs of a system; the new operators can be viewed as a
compensation for that. We have not made a detailed comparison of our work with
the large body of work on knowledge on distributed systems, and such a comparison
would require both technical and conceptual results. On the technical side, we
suspect that neither framework is translatable into the other. One way to show this
would be by expressivity results. Perhaps another way would use complexity results.
In this direction, we note that Halpern and Vardi (1989) examines 96 logics of
knowledge and time. Thirty-two of these contain common knowledge operators, and
of these, all but twelve are undecidable. But overall, our logics are based on
differing conceptual points and intended applications, and so we are confident that
they differ.
As far as we know, the first paper to study the interaction of communication and
knowledge in a formal setting is Plaza’s paper “Logics of Public Communications”
(Plaza 1989). As the title suggests, the epistemic actions studied are announcements
to the whole group, as in our α and α'. Perhaps the main result of the paper is
a completeness theorem for the logic of public announcements and knowledge.
This result is closely related to a special case of our Theorem 5. The difference
is that Plaza restricts attention to the case when all of the accessibility relations
are equivalence relations. Incidentally, Plaza’s proof involves a translation to multi-
modal logic, just as ours does. In addition to this, Plaza (1989) contains a number of
results special to the logic of announcements which we have not generalized, and it
also studies an extension of the logic with non-rigid constants.
Other predecessors to this paper are the papers of Gerbrandy (1999a,b) and
Gerbrandy and Groeneveld (1997). These study epistemic actions similar to our β,
where an announcement is made to a set of agents in a private way with no suspicions.
They presented a logical system which included the common knowledge operators.
An important result is that all of the reasoning in the original Muddy Children
scenario can be carried out in their system. This shows that in order to get a formal
treatment of the problem, one need not posit models which maintain histories. They
did not obtain the completeness/decidability result for their system, but it would be
the version of Theorem 9 restricted to actions which are compositions of private
announcements. So it follows from our work that all of the reasoning in the Muddy
Children can be carried out in a decidable system.
We should mention that the systems studied in Gerbrandy (1999a,b) and Ger-
brandy and Groeneveld (1997) differ from ours in that they are variants of dynamic
logic rather than propositional logic. That is, announcements are particular types of
programs as opposed to modalities. This is a natural move, and although we have
not followed it in this paper, we have carried out a study of expressive power issues
of various fragments of a dynamic logic with announcement operators. We have
shown, for example, that the dynamic logic formulations are more expressive than
the purely propositional ones. Details on this will appear in a forthcoming paper.
Incidentally, the semantics in Gerbrandy (1999a,b) and Gerbrandy and Groen-
eveld (1997) use non-wellfounded sets. In other words, they work with models
modulo bisimulation. The advantages of moving from these to arbitrary Kripke
models are that the logic can be used by those who do not know about non-
wellfounded sets, and also that completeness results are slightly stronger with a
more general semantics. The relevant equivalence of the two semantics is the subject
of the short note (Moss 1999).
The following are the new contributions of this paper:
1. We formulated a logical system with modalities corresponding to intuitive
group-level epistemic actions. These actions include natural formalizations of
announcements such as γ and δ, which allow various types of suspicion by out-
siders. Our apparatus also permits us to study epistemic actions which apparently
have not yet been considered in this line of work, such as actions in which nothing
actually happens but one agent suspects that a secret communication took place.
2. We formulated a logical system with these modalities and with common knowl-
edge operators for all groups. Building on the completeness of PDL and using a
bit of term rewriting theory, we axiomatized the validities in our system.
3. We obtained some results on expressive power: in the presence of common
knowledge operators, it is not possible to translate away public announcements,
and in our framework, private announcements add expressive power to public
ones.
In this appendix, we give the details on the lexicographic path ordering (LPO), both
in general and in connection with L([α]) and L([α], □*).
Fix some many-sorted signature Σ of terms. In order to define the LPO < on the
Σ-terms, we must first specify a well-order < on the set of function symbols of Σ.
The LPO determined by such choices is the smallest relation < such that:
(LPO1) If (t_1, …, t_n) < (s_1, …, s_n) in the lexicographic ordering on n-tuples,
and if t_j < f(s_1, …, s_n) for 1 ≤ j ≤ n, then f(t_1, …, t_n) < f(s_1, …, s_n).
(LPO2) If t ≤ s_i for some i, then t < f(s_1, …, s_n).
(LPO3) If g < f and t_i < f(s_1, …, s_n) for all i ≤ m, then g(t_1, …, t_m) <
f(s_1, …, s_n).
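Read operationally, (LPO1)–(LPO3) give a terminating recursive test. The following Python sketch is ours, not part of the paper: a term is a tuple whose head is a function symbol, and prec maps each symbol to its position in the chosen well-order on Σ.

```python
def lpo_less(t, s, prec):
    """Test t < s in the lexicographic path ordering; terms are (symbol, arg1, ..., argn)."""
    if t == s:
        return False
    f, s_args = s[0], list(s[1:])
    # (LPO2): t < f(s1..sn) whenever t is <= some immediate subterm si
    if any(t == si or lpo_less(t, si, prec) for si in s_args):
        return True
    g, t_args = t[0], list(t[1:])
    if g == f and len(t_args) == len(s_args):
        # (LPO1): argument tuples compared lexicographically, with every tj < f(s1..sn)
        if lex_less(t_args, s_args, prec) and all(lpo_less(tj, s, prec) for tj in t_args):
            return True
    if prec[g] < prec[f]:
        # (LPO3): smaller head symbol, and every ti < f(s1..sn)
        if all(lpo_less(ti, s, prec) for ti in t_args):
            return True
    return False

def lex_less(ts, ss, prec):
    """Lexicographic comparison of equal-length argument tuples, reusing lpo_less."""
    for ti, si in zip(ts, ss):
        if ti != si:
            return lpo_less(ti, si, prec)
    return False
```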
Here is how this is applied in this paper. We shall take two sorts: sentences and
actions. Our signature contains the usual sentence-forming operators p (for p ∈
AtSen), ¬, ∧, and □_A for all A ∈ A. Here each p is 0-ary, ¬ and □_A are unary, and
∧ is binary. We also have an operator app taking actions and sentences to sentences.
We think of app(ψ, α) as merely a variation on [α]ψ. (The order of arguments to app
is significant.) We further have a binary operator ∘ on actions. (This is a departure
from the treatment of this paper, since we used ∘ as a metalinguistic abbreviation
instead of as a formal symbol. It will be convenient to make this change because this
leads to a smoother treatment of the Composition Axiom.) Finally, for each finite
Kripke frame K over L([α]) and each 1 ≤ i ≤ |K|, we have a symbol F_i^K taking |K|
sentences and returning an action.
Each sentence φ has a formal version φ̄ in this signature, and each action α also
has a formal version ᾱ. These are defined by the recursion which is obvious except
for the clauses: the formal version of [α]φ is app(φ̄, ᾱ), and
ᾱ is F_i^K applied to the formal versions of PRE(k_1), …, PRE(k_n),
where α is based on the frame K = {k_1, …, k_n} with distinguished point k_i.
4. < is wellfounded.
5. Consider a term rewriting system every rule of which is of the form l ⇒ r with r <
l. Then the system is terminating: there are no infinite sequences of rewritings.
Proof. Here is a sketch for part (1): We check by induction on the construction of
the least relation < that if s < t, then for all u such that t < u, s < u. For this, we
use induction on the term u. We omit the details. Further, (2) follows easily from (1)
and (LPO2), and (3) from (LPO1), (1) and (2). Moreover, (5) follows easily from (4)
and (3), since the latter implies that any replacement according to the rewrite system
results in a smaller term in the order <.
Here is a proof of the wellfoundedness property (4), taken from Buchholz
(1995). (We generalized it slightly from the one-sorted to the many-sorted setting
and from the assumption that < is a finite linear order on Σ to the assumption that
< is any wellfounded order.)
Let W be the set of terms t such that the order < is wellfounded below t. W is
then itself wellfounded under <. So for all n, W^n is wellfounded under the induced
lexicographic order. We prove by induction on the given wellfounded relation on
function symbols of Σ that for all n-ary f, f[W^n] ⊆ W. So assume that for g < f, say
with arity m, g[W^m] ⊆ W. We check this for f by using induction on W^n. Fix s⃗ ∈ W^n,
and assume that whenever u⃗ < s⃗ in W^n, that f(u⃗) ∈ W. We prove that f(s⃗) ∈ W by
checking that for all t such that t < f(s⃗), t ∈ W. And this is done by induction on the
structure of t. If t = f(u⃗) < f(s⃗) via (LPO1), then u⃗ < s⃗ lexicographically, and each
Proof. It is immediate that every modal sentence is a normal form in L([α]), that
every [α]□*_C φ is a normal form in L([α], □*), and that if each PRE(β), with α → β,
is a normal form, then α is a normal form action. Going the other way, we check that
if φ ∈ L([α]), [α]φ is not a normal form. So we see by an easy induction that the
normal forms of L([α]) are exactly the modal sentences. We also argue by induction
for L([α], □*), and we note that every [α][β]φ is not a normal form, using the rule
[α][β]φ ⇒ [α ∘ β]φ.
One fine point concerning R and our work in section "A Logic for L([α])" is that
to reduce sentences of L([α]) to normal form we may restrict ourselves to rewriting
sentences which are not subterms of actions. This simplification accounts for the
differences between parallel results of sections "A Logic for L([α])" and "A Logic
for L([α], □*)".
Acknowledgements We thank Jelle Gerbrandy and Rohit Parikh for useful conversations on this
work. An earlier version of this paper was presented at the 1998 Conference on Theoretical Aspects
of Rationality and Knowledge.
References

Dynamic Interactive Belief Revision

Introduction
This paper contributes to the recent and on-going work in the logical community
(Aucher 2003; Baltag and Sadrzadeh 2006; Baltag and Smets 2006a,b,c; van
Benthem 2007; van Ditmarsch 2005) on dealing with mechanisms for belief
revision and update within the Dynamic-Epistemic Logic (DEL) paradigm. DEL
originates in the work of Gerbrandy and Groeneveld (1997) and Gerbrandy (1999),
anticipated by Plaza (1989), and further developed by numerous authors: Baltag
et al. (1998), Gerbrandy (1999), van Ditmarsch (2000, 2002), Baltag (2002), Kooi
(2003), Baltag and Moss (2004), van Benthem et al. (2006a,b), etc. In its
standard incarnation, as presented e.g., in the recent textbook by van Ditmarsch
et al. (2007), the DEL approach is particularly well fit to deal with complex multi-
agent learning actions by which groups of interactive agents update their beliefs
(including higher-level beliefs about the others’ beliefs), as long as the newly
received information is consistent with the agents’ prior beliefs. On the other hand,
the classical AGM theory and its more recent extensions have been very successful
in dealing with the problem of revising one-agent, first-level (factual) beliefs when
they are contradicted by new information. So it is natural to look for a way to
combine these approaches.
1
Or “doxastic events”, in the terminology of van Benthem (2007).
2
To verify that a higher-level belief about another belief is “true” we need to check the content
of that higher-level belief (i.e., the existence of the second, lower-level belief) against the “real
world”. So the real world has to include the agent’s beliefs.
(1992): we completely neglect here the ontic changes,3 considering only the changes
induced by “purely doxastic” actions (learning by observation, communication,
etc.).
Our formalism for “static” revision can best be understood as a modal-logic
implementation of the well-known view of belief revision in terms of conditional
reasoning (Stalnaker 1968, 2006). In Baltag and Smets (2006a,c), we introduced
two equivalent semantic settings for conditional beliefs in a multi-agent epistemic
context (conditional doxastic models and epistemic plausibility models), taking the
first setting as the basic one. Here, we adopt the second setting, which is closer to
the standard semantic structures used in the literature on modeling belief revision
(Board 2002; Friedmann and Halpern 1994; Grove 1988; Spohn 1988; Stalnaker
2006; van Benthem 2007, 2004). We use this setting to define notions of knowledge
K_a P, belief B_a P and conditional belief B_a^Q P. Our concept of "knowledge" is
the standard S5-notion, partition-based and fully introspective, that is commonly
used in Computer Science and Economics, and is sometimes known as “Aumann
knowledge”, as a reference to Aumann (1999). The conditional belief operator
is a way to “internalize”, in a sense, the “static” (AGM) belief revision within
a modal framework: saying that, at state s, agent a believes P conditional on Q
is a way of saying that Q belongs to a’s revised “theory” (capturing her revised
beliefs) after revision with P (of a’s current theory/beliefs) at state s. Our conditional
formulation of “static” belief revision is close to the one in Stalnaker (1968), Ryan
and Schobbens (1997), Board (2002), Bonanno (2005), and Rott (1989). As in Board
(2002), the preference relation is assumed to be well-preordered; as a result, the
logic CDL of conditional beliefs is equivalent to the strongest system in Board
(2002).
We also consider other modalities, capturing other “doxastic attitudes” than just
knowledge and conditional belief. The most important such notion expresses a form
of "weak (non-introspective) knowledge" □_a P, first introduced by Stalnaker in his
modal formalization (Stalnaker 1968, 2006) of Lehrer’s defeasibility analysis of
knowledge (Lehrer 1990; Lehrer and Paxson 1969). We call this notion safe belief,
to distinguish it from our (Aumann-type) concept of knowledge. Safe belief can
be understood as belief that is persistent under revision with any true information.
We use this notion to give a new solution to the so-called “Paradox of the Perfect
Believer”. We also solve the open problem posed in Board (2002), by providing a
complete axiomatization of the “static” logic K of conditional belief, knowledge
and safe belief. In a forthcoming paper, we apply the concept of safe belief to
Game Theory, improving on Aumann’s epistemic analysis of backwards induction
in games of perfect information.
Moving thus on to dynamic belief revision, the first thing to note is that (unlike the
case of “static” revision), the doxastic features of the actual “triggering event” that
induced the belief change are essential for understanding this change (as a “dynamic
3
But our approach can be easily modified to incorporate ontic changes, along the lines of van
Benthem et al. (2006b).
revision”, i.e., in terms of the revised beliefs about the state of the world after
revision). For instance, our beliefs about the current situation after hearing a public
announcement (say, of some factual information, denoted by an atomic sentence
p) are different from our beliefs after receiving a fully private announcement with
the same content p. Indeed, in the public case, we come to believe that p is now
common knowledge (or at least common belief ). While, in the private case, we come
to believe that the content of the announcement forms now our secret knowledge.
So the agent’s beliefs about the learning actions in which she is currently engaged
affect the way she updates her previous beliefs.
This distinction is irrelevant for “static” revision, since e.g., in both cases above
(public as well as private announcement) we learn the same thing about the situation
that existed before the learning: our beliefs about that past situation will change
in the same way in both cases. More generally, our beliefs about the “triggering
action” are irrelevant, as far as our “static” revision is concerned. This explains a fact
observed in van Benthem (2007), namely that by and large, the standard literature
on belief revision (or belief update) does not usually make explicit the doxastic
events that “trigger” the belief change (dealing instead only with types of abstract
operations on beliefs, such as update, revision and contraction etc). The reason for
this lies in the “static” character of AGM revision, as well as its restriction (shared
with the “updates” of Katsuno and Mendelzon 1992) to one-agent, first-level, factual
beliefs.
A “truly dynamic” logic of belief revision has to be able to capture the doxastic-
epistemic features (e.g., publicity, complete privacy etc.) of specific “learning
events”. We need to be able to model the agents’ “dynamic beliefs”, i.e., their beliefs
about the learning action itself : the appearance of this action (while it is happening)
to each of the agents. In Baltag and Moss (2004), it was argued that a natural way to
do this is to use the same type of formalism that was used to model “static” beliefs:
epistemic actions should be modeled in essentially the same way as epistemic states;
and this common setting was taken there to be given by epistemic Kripke models.
A similar move is made here in the context of our richer doxastic-plausibility
structures, by introducing plausibility pre-orders on actions and developing a notion
of “action plausibility models”, that extends the “epistemic action models” from
Baltag and Moss (2004), along similar lines to (but without the quantitative features
of) the work in Aucher (2003) and van Ditmarsch (2005).
Extending to (pre)ordered models the corresponding notion from Baltag and
Moss (2004), we introduce an operation of product update of such models, based
on the anti-lexicographic order on the product of the state model with the action
model. The simplest and most natural way to define a connected pre-order on a
Cartesian product from connected pre-orders on each of the components is to use
either the lexicographic or the anti-lexicographic order. Our choice is the second,
which we regard as the natural generalization of the AGM theory, giving priority to
incoming information (i.e., to “actions” in our sense). This can also be thought of
as a generalization of the so-called “maximal-Spohn” revision. We call this type of
update rule the “Action-Priority” Update. The intuition is that the beliefs encoded
in the action model express the “incoming” changes of belief, while the state model
only captures the past beliefs. One could say that the new "beliefs about actions" are
acting on the prior "beliefs about states", producing the updated (posterior) beliefs.
This is embodied in the Motto given in the paragraph on "Action Models" in the
third section: "beliefs about changes encode (and induce) changes of beliefs".
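As a rough illustration (our own code and naming, not the authors'), the Action-Priority recipe can be written down directly for finite models: a pair (s, σ) survives when s satisfies PRE(σ), and it is at least as plausible as (t, τ) for agent a when σ is strictly more plausible than τ, or the two actions are equi-plausible and s ≤_a t. Valuations and the epistemic indistinguishability relations are left out, and the satisfaction test holds is assumed to be supplied elsewhere.

```python
def action_priority_update(states, s_leq, actions, a_leq, pre, holds, agents):
    """Anti-lexicographic ("Action-Priority") product update -- a sketch.
    s_leq[a] / a_leq[a]: state / action plausibility preorders as sets of pairs;
    pre[sig]: precondition of action point sig; holds(phi, s): satisfaction test."""
    new_states = [(s, sig) for s in states for sig in actions if holds(pre[sig], s)]
    new_leq = {}
    for a in agents:
        rel = set()
        for s, sig in new_states:
            for t, tau in new_states:
                sig_leq_tau = (sig, tau) in a_leq[a]
                tau_leq_sig = (tau, sig) in a_leq[a]
                # action plausibility has priority; ties defer to the prior state order
                if (sig_leq_tau and not tau_leq_sig) or \
                   (sig_leq_tau and tau_leq_sig and (s, t) in s_leq[a]):
                    rel.add(((s, sig), (t, tau)))
        new_leq[a] = rel
    return new_states, new_leq
```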
By abstracting away from the quantitative details of the plausibility maps when
considering the associated dynamic logic, our approach to dynamic belief revision
is in the spirit of the one in van Benthem (2007): instead of using “graded belief”
operators as in e.g., Aucher (2003) and van Ditmarsch (2005), or probabilistic modal
logic as in Kooi (2003), both our account and the one in van Benthem (2007)
concentrate on the simple, qualitative language of conditional beliefs, knowledge
and action modalities (to which we add here the safe belief operator). As a
consequence, we obtain simple, elegant, general logical laws of dynamic belief
revision, as natural generalizations of the ones in van Benthem (2007). These
“reduction laws” give a complete axiomatization of the logic of doxastic actions,
“reducing” it to the “static” logic K. Compared both to our older axiomatization
in Baltag and Smets (2006c) and to the system in Aucher (2003), one can easily see
that the introduction of the safe belief operator leads to a major simplification of the
reduction laws.
Our qualitative logical setting (in this paper and in Baltag and Smets 2006a,b,c),
as well as the closely related setting in van Benthem (2007), are conceptually very
different from the more “quantitative” approaches to dynamic belief revision taken
in (Aucher 2003; van Ditmarsch 2005; van Ditmarsch and Labuschagne 2007),
approaches based on “degrees of belief” given by ordinal plausibility functions.
This is not just a matter of interpretation, but it makes a difference for the choice
of dynamic revision operators. Indeed, the update mechanisms proposed in Spohn
(1988), Aucher (2003), and van Ditmarsch (2005) are essentially quantitative, using
various binary functions in transfinite ordinal arithmetic, in order to compute the
degree of belief of the output-states in terms of the degrees of the input-states and
the degrees of the actions. This leads to an increase in complexity, both in the
computation of updates and in the corresponding logical systems. Moreover, there
seems to be no canonical choice for the arithmetical formula for updates, various
authors proposing various formulas. No clear intuitive justification is provided to
any of these formulas, and we see no transparent reason to prefer one to the others.
In contrast, classical (AGM) belief revision theory is a qualitative theory, based on
natural, intuitive postulates, of great generality and simplicity.
Our approach retains this qualitative flavor of the AGM theory, and aims to
build a theory of "dynamic" belief revision that is as simple and natural as the
classical "static" account. Moreover (unlike the AGM theory), it aims to provide a
“canonical” choice for a dynamic revision operator, given by our “Action Priority”
update. This notion is a purely qualitative one,4 based on a simple, natural relational
4
One could argue that our plausibility pre-order relation is equivalent to a quantitative notion (of
ordinal degrees of plausibility, such as in Spohn (1988)), but unlike in Aucher (2003) and van
definition. From a formal point of view, one might see our choice of the anti-
lexicographic order as just one of the many possible options for developing a belief-
revision-friendly notion of update. As already mentioned, it is a generalization of the
“maximal-Spohn” revision, already explored in van Ditmarsch (2005) and Aucher
(2003), among many other possible formulas for combining the “degrees of belief”
of actions and states. But here we justify our option, arguing that our qualitative
interpretation of the plausibility order makes this the only reasonable choice.
It may seem that by making this choice, we have confined ourselves to only one
of the bewildering multitude of “belief revision policies” proposed in the literature
by Spohn (1988), Rott (1989), Segerberg (1998), Aucher (2003), van Ditmarsch
(2005), van Benthem (2004), and van Benthem (2007). But, as argued below, this
apparent limitation is not so limiting after all, but can instead be regarded as an
advantage: the power of the “action model” approach is reflected in the fact that
many different belief revision policies can be recovered as instances of the same
type of update operation. In this sense, our approach can be seen as a change of
perspective: the diversity of possible revision policies is replaced by the diversity
of possible action models; the differences are now viewed as differences in input,
rather than having different “programs”. For a computer scientist, this resembles
“Currying” in lambda-calculus: if every “operation” is encoded as an input-term,
then one operation (functional application) can simulate all operations.5 In a sense,
this is nothing but the idea of Turing’s universal machine, which underlies universal
computation.
The title of our paper is a paraphrase of Oliver Board’s “Dynamic Interactive
Epistemology” (Board 2002), itself a paraphrase of the title (“Interactive Epistemol-
ogy”) of a famous paper by Aumann (1999). We interpret the word “interactive”
as referring to the multiplicity of agents and the possibility of communication.
Observe that “interactive” does not necessarily imply “dynamic”: indeed, Board and
Stalnaker consider Aumann’s notion to be “static” (since it doesn’t accommodate
any non-trivial belief revision). But even Board’s logic, as well as Stalnaker’s
(2006), are “static” in our sense: they cannot directly capture the effect of learning
actions (but can only express “static” conditional beliefs). In contrast, our DEL-
based approach has all the “dynamic” features and advantages of DEL: in addition
to “simulating” a range of individual belief-revision policies, it can deal with an even
wider range of complex types of multi-agent learning and communication actions.
We thus think it is realistic to expect that, within its own natural limits,6 our Action-
Priority Update Rule could play the role of a “universal machine” for qualitative
dynamic interactive belief-revision.
Ditmarsch (2005) the way belief update is defined in our account does not make any use of the
ordinal “arithmetic” of these degrees.
5
Note that, as in untyped lambda-calculus, the input-term encoding the operation (i.e., our “action
model”) and the “static” input-term to be operated upon (i.e., the “state model”) are essentially of
the same type: epistemic plausibility models for the same language (and for the same set of agents).
6
E.g., our update cannot deal with “forgetful” agents, since “perfect recall” is in-built. But finding
out what exactly are the “natural limits” of our approach is for now an open problem.
Using the terminology in van Benthem (2007) and Baltag and Smets (2006a,b,c,
2007a), “static” belief revision is about pre-encoding potential belief revisions
as conditional beliefs. A conditional belief statement B_a^P Q can be thought of as
expressing a “doxastic predisposition” or a “plan of doxastic action”: the agent is
determined to believe that Q was the case, if he learnt that P was the case. The
semantics for conditional beliefs is usually given in terms of plausibility models
(or equivalent notions, e.g., “spheres”, “onions”, ordinal functions etc.) As we shall
see, both (Aumann, S5-like) knowledge and simple (unconditional) belief can be
defined in terms of conditional belief, which itself could be defined in terms of a
unary belief-revision operator *_a: *_a P captures all the revised beliefs of agent a after
revising (her current beliefs) with P.
In addition, we introduce a safe belief operator □_a P, meant to express a weak
notion of "defeasible knowledge" (obeying the laws of the modal logic S4.3). This
concept was defined in Stalnaker (2006) and Board (2002) using a higher-order
semantics (quantifying over conditional beliefs). But this is in fact equivalent to a
first-order definition, as the Kripke modality for the (converse) plausibility relation.
This observation greatly simplifies the task of completely axiomatizing the logic of
safe belief and conditional beliefs: indeed, our proof system K below is a solution
to the open problem posed in Board (2002).
To warm up, we consider first the case of only one agent, a case which fits well with
the standard models for belief revision.
A single-agent plausibility frame is a structure (S, ≤), consisting of a set S of
"states" and a "well-preorder" ≤, i.e., a reflexive, transitive binary relation ≤ on S
such that every non-empty subset has minimal elements. Using the notation Min_≤ P
for the set of ≤-minimal elements of P, the last condition says that: For every set
P ⊆ S, if P ≠ ∅ then Min_≤ P ≠ ∅.
The usual reading of s ≤ t is that "state s is at least as plausible as state t". We
keep this reading for now, though we will later get back to it and clarify its meaning.
The "minimal states" in Min_≤ P are thus the "most plausible states" satisfying
proposition P. As usual, we write s < t iff s ≤ t but t ≰ s, for the "strict" plausibility
relation (s is more plausible than t). Similarly, we write s ≅ t iff both s ≤ t and t ≤ s,
for the "equi-plausibility" (or indifference) relation (s and t are equally plausible).
S-propositions and models. Given an epistemic plausibility frame S, an S-
proposition is any subset P ⊆ S. Intuitively, we say that a state s satisfies the
proposition P if s ∈ P. Observe that a plausibility frame is just a special case of a
relational frame (or Kripke frame). So, as it is standard for Kripke frames in general,
we can define a plausibility model to be a structure S = (S, ≤, ‖·‖), consisting of a
plausibility frame (S, ≤) together with a valuation map ‖·‖ : Φ → P(S), mapping
every element of a given set Φ of "atomic sentences" into S-propositions.
Interpretation. The elements of S will represent the possible states (or "possible
worlds") of a system. The atomic sentences p ∈ Φ represent "ontic" (non-doxastic)
facts, that might hold or not in a given state. The valuation tells us which facts hold at
which worlds. Finally, the plausibility relation ≤ captures the agent's (conditional)
beliefs about the state of the system; if e.g., the agent was given the information
that the state of the system is either s or t, she would believe that the system was
in the most plausible of the two. So, if s < t, the agent would believe the real state
was s; if t < s, she would believe it was t; otherwise (if s ≅ t), the agent would
be indifferent between the two alternatives: she will not be able to decide to believe
any one alternative rather than the other.
Propositional operators, Kripke modalities. For every model S, we have the
usual Boolean operations with S-propositions:
P ∧ Q := P ∩ Q,  P ∨ Q := P ∪ Q,
¬P := S \ P,  P → Q := ¬P ∨ Q.
We also have a doxastic accessibility relation, given by
s → t iff t ∈ Min_≤ S.
We read this as saying that: when the actual state is s, the agent believes that
any of the states t with s → t may be the actual state. This matches the above
interpretation of the preorder: the states believed to be possible are the minimal
(i.e., "most plausible") ones.
In order to talk about conditional beliefs, we can similarly define a conditional
doxastic accessibility relation for each S-proposition P ⊆ S:
s →^P t iff t ∈ Min_≤ P.
We read this as saying that: when the actual state is s, if the agent is given the
information (that) P (is true at the actual state), then she believes that any of the
states t with s →^P t may be the actual state.
The (trivial, in the single-agent case) epistemic indistinguishability relation is given by: s ∼ t iff s, t ∈ S.
Knowledge, belief and conditional belief can then be introduced as the Kripke modalities for these relations:
K P := [∼]P,
B P := [→]P,
B^Q P := [→^Q]P.
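On a finite frame these definitions are directly computable. The following sketch uses our own names (none of them are the paper's notation) and presents the single-agent well-preorder as a ranking function, lower rank meaning more plausible.

```python
def minimal(P, rk):
    """Min P: the most plausible states among P (empty if P is empty)."""
    if not P:
        return set()
    best = min(rk[s] for s in P)
    return {s for s in P if rk[s] == best}

def K(P, states):
    """K P = [~]P: holds everywhere iff P holds at every state."""
    return set(states) if set(states) <= P else set()

def B(P, states, rk):
    """B P = [->]P: holds everywhere iff P holds at all most plausible states."""
    return set(states) if minimal(set(states), rk) <= P else set()

def B_cond(Q, P, states, rk):
    """B^Q P = [->^Q]P: P holds at all most plausible Q-states."""
    return set(states) if minimal(set(states) & Q, rk) <= P else set()

# Toy usage: s is more plausible than t; the agent believes {s} without knowing it,
# and conditional on {t} she believes {t}.
states, rk = {'s', 't'}, {'s': 0, 't': 1}
assert B({'s'}, states, rk) == states and K({'s'}, states) == set()
assert B_cond({'t'}, {'t'}, states, rk) == states
```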
There is nothing quantitative here, no need for us to refer in any way to the
“strength” of this agent’s belief: though she might have beliefs of unequal strengths,
we are not interested here in modeling this quantitative aspect. Instead, we give the
agent some information about a state of a virtual system (that it is either s or t) and
we ask her a yes-or-no question (“Do you believe that virtual state to be s ?”); we
write s < t iff the agent’s answer is “yes”. This is a firm answer, so it expresses a
firm belief. “Firm” does not imply “un-revisable” though: if later we reveal to the
agent that the state in question was in fact t, she should be able to accept this new
information; after all, the agent should be introspective enough to realize that her
belief, however firm, was just a belief.
One possible objection against this qualitative interpretation is that our postulate
that ≤ is a well-preorder (and so in particular a connected pre-order) introduces
a hidden “quantitative” feature; indeed, any such preorder can be equivalently
described using a plausibility map as in e.g., Spohn (1988), assigning ordinals to
states. Our answer is that, first, the specific ordinals will not play any role in our
definition of a dynamic belief update; and second, all our postulates can be given a
justification in purely qualitative terms, using conditional beliefs. The transitivity
condition for ≤ is just a consistency requirement imposed on a rational agent's
conditional beliefs. And the existence of minimal elements in any non-empty subset
is simply the natural extension of the above setting to general conditional beliefs, not
only conditions involving two states: more specifically, for any possible condition
P ⊆ S about a system S, the S-proposition Min_≤ P is simply a way to encode
everything that the agent would believe about the current state of the system, if
she was given the information that the state satisfied condition P.
Note on other models in the literature. Our models are the same as Board’s
“belief revision structures” (Board 2002), i.e., nothing but “Spohn models” as in
Spohn (1988), but with a purely relational description. Spohn models are usually
described in terms of a map assigning ordinals to states. But giving such a map is
equivalent to introducing a well pre-order on states, and it is easy to see that all
the relevant information is captured by this order.
Our conditions on the preorder can also be seen as a semantic analogue
of Grove’s conditions for the (relational version of his) models in Grove (1988).
The standard formulation of Grove models is in terms of a “system of spheres”
(weakening Lewis’ similar notion), but it is equivalent (as proved in Grove 1988)
to a relational formulation. Grove’s postulates are still syntax-dependent, e.g.,
existence of minimal elements is required only for subsets that are definable in his
language: this is the so-called “smoothness” condition, which is weaker than our
“well-preordered” condition. We prefer a purely semantic condition, independent of
the choice of a language, both for reasons of elegance and simplicity and because
we want to be able to consider more than one language for the same structure.7
7
Imposing syntactic-dependent conditions in the very definition of a class of structures makes the
definition meaningful only for one language; or else, the meaning of what, say, a plausibility model
is won’t be robust: it will change whenever one wants to extend the logic, by adding a few more
So, following Board (2002) and Stalnaker (2006) and others, we adopt the natural
semantic analogue of Grove’s condition, simply requiring that every subset has
minimal elements: this will allow our conditional operators to be well-defined on
sentences of any extension of our logical language.
Note that the minimality condition implies, by itself, that the relation ≤ is both
reflexive (i.e., s ≤ s for all s ∈ S) and connected⁸ (i.e., either s ≤ t or t ≤ s,
for all s, t ∈ S). In fact, a "well-preorder" is the same as a connected, transitive,
well-founded⁹ relation, which is the setting proposed in Board (2002) for a logic
of conditional beliefs equivalent to our logic CDL below. Note also that, when the
set S is finite, a well-preorder is nothing but a connected preorder. This shows that
our notion of frame subsumes, not only Grove’s setting, but also some of the other
settings proposed for conditionalization.
In the multi-agent case, we cannot exclude from the model the states that are known
to be impossible by some agent a: they may still be considered possible by a second
agent b. Moreover, they might still be relevant for a’s beliefs/knowledge about what
b believes or knows. So, in order to define an agent’s knowledge, we cannot simply
quantify over all states, as we did above: instead, we need to consider, as usually
done in the Kripke-model semantics of knowledge, only the “possible” states, i.e.,
the ones that are indistinguishable from the real state, as far as a given agent is
concerned. It is thus natural, in the multi-agent context, to explicitly specify the
agents' epistemic indistinguishability relations ∼_a (labeled with the agents' names)
as part of the basic structure, in addition to the plausibility relations ≤_a. Taking
this natural step, we obtain epistemic plausibility frames (S, ∼_a, ≤_a). As in the case
of a single agent, specifying epistemic relations turns out to be superfluous: the
relations ∼_a can be recovered from the relations ≤_a. Hence, we will simplify the
above structures, obtaining the equivalent setting of multi-agent plausibility frames
(S, ≤_a).
Before going on to define these notions, observe that it doesn’t make sense
anymore to require the plausibility relations a to be connected (and even less
sense to require them to be well-preordered): if two states s; t are distinguishable
by an agent a, i.e., s ≁_a t, then a will never consider both of them as epistemically
possible at the same time. If she was given the information that the real state is
either s or t, agent a will immediately know which of the two: if the real state was
s, she would be able to distinguish this state from t, and would thus know the state
operators. This is very undesirable, since then one cannot compare the expressivity of different
logics on the same class of models.
8
In the Economics literature, connectedness is called “completeness”, see e.g., Board (2002).
9
I.e., there exists no infinite descending chain s_0 > s_1 > ⋯.
was s; similarly, if the real state was t, she would know it to be t. Her beliefs will
play no role in this, and it would be meaningless to ask her which of the two states is
more plausible to her. So only the states in the same ∼_a-equivalence class could, and
should, be ≤_a-comparable; i.e., s ≤_a t implies s ∼_a t, and the restriction of ≤_a to
each ∼_a-equivalence class is connected. Extending the same argument to arbitrary
conditional beliefs, we can see that the restriction of ≤_a to each ∼_a-equivalence
class must be well-preordered.
Epistemic plausibility frames. Let A be a finite set of labels, called agents.
An epistemic plausibility frame over A (EPF, for short) is a structure S =
(S, ∼_a, ≤_a)_{a∈A}, consisting of a set S of "states", endowed with a family of equiv-
alence relations ∼_a, called epistemic indistinguishability relations, and a family
of plausibility relations ≤_a, both labeled by "agents" and assumed to satisfy two
conditions: (1) ≤_a-comparable states are ∼_a-indistinguishable (i.e., s ≤_a t implies
s ∼_a t); (2) the restriction of each plausibility relation ≤_a to each ∼_a-equivalence
class is a well-preorder. As before, we use the notation Min_a P for the set of ≤_a-
minimal elements of P. We write s <_a t iff s ≤_a t but t ≰_a s (the "strict" plausibility
relation), and write s ≅_a t iff both s ≤_a t and t ≤_a s (the "equi-plausibility"
relation). The notion of epistemic plausibility models (EPM, for short) is defined in
the same way as the plausibility models in the previous section.
Epistemic plausibility models. We define a (multi-agent) epistemic
plausibility model (EPM, for short) as a multi-agent EPF together with a valuation
over it (the same way that single-agent plausibility models were defined in the
previous section).
It is easy to see that our definition of EPFs includes superfluous information: in
an EPF, the knowledge relation ∼_a can be recovered from the plausibility relation
≤_a, via the following rule:
s ∼_a t iff either s ≤_a t or t ≤_a s.
In other words, two states are indistinguishable for a iff they are comparable (with
respect to ≤_a).
So, in fact, one could present epistemic plausibility frames simply as multi-agent
plausibility frames. To give this alternative presentation, we use, for any preorder
relation ≤, the notation ∼ for the associated comparability relation
∼ := ≤ ∪ ≥.
The converse problem is studied in Board (2002), where it is shown that, if full
introspection is assumed, then one can recover "uniform" plausibility relations ≤_a
from the (state-dependent) relations ≤_a^w.
Information cell. The equivalence relation ∼_a induces a partition of the state space
S, called agent a's information partition. We denote by s(a) the information cell of
s in a's partition, i.e., the ∼_a-equivalence class of s:
s(a) := {t ∈ S : s ∼_a t}.
The information cell s(a) captures all the knowledge possessed by the agent at state
s: when the actual state of the system is s, then agent a knows only the state's
equivalence class s(a).
Example 1. Alice and Bob play a game, in which an anonymous referee puts a coin
on the table, lying face up but in such a way that the face is covered (so Alice and
Bob cannot see it). Based on previous experience, (it is common knowledge that)
Alice and Bob believe that the upper face is Heads (since e.g., they noticed that
the referee had a strong preference for Heads). And in fact, they’re right: the coin
lies Heads up. Neglecting the anonymous referee, the EPM for this example is the
following model S:
(Diagram: the model S has two states, H and T, with the actual state H; a plausibility arrow labelled a, b points from T to H, since both agents consider H more plausible.)
Example 2. In front of Alice, the referee shows the face of the coin to Bob, but
Alice cannot see the face. The EPM is now the following model W:
(Diagram: the model W has the same two states H and T; only Alice's plausibility arrow, labelled a, points from T to H.)
Since Bob now knows the state of the coin, his local plausibility relation
consists only of loops, and hence we have no arrows for Bob in this diagrammatic
representation.
The doxastic appearance of a state s to an agent a is given by the set
s_a := Min_a s(a)
of the "most plausible" states that are consistent with the agent's knowledge at state
s. The doxastic appearance of s captures the way state s appears to the agent, or (in
the language of Belief Revision) the agent's current "theory" about the world s. We
can extend this to capture conditional beliefs (in full generality), by associating to
each S-proposition P ⊆ S and each state s ∈ S the conditional doxastic appearance
s_a^P of state s to agent a, given (information) P. This can be defined as the S-
proposition
s_a^P := Min_a (s(a) ∩ P),
given by the set of all ≤_a-minimal states of s(a) ∩ P: these are the "most plausible"
states satisfying P that are consistent with the agent's knowledge at state s. The
conditional appearance s_a^P gives the agent's revised theory (after learning P) about
the world s. We can put these in a relational form, by defining doxastic accessibility
relations →_a, →_a^P, as follows:
s →_a t iff t ∈ s_a,
s →_a^P t iff t ∈ s_a^P.
Knowledge, belief and conditional belief are then given by the Kripke modalities
K_a P := [∼_a]P = {s ∈ S : s(a) ⊆ P},
B_a P := [→_a]P = {s ∈ S : s_a ⊆ P},
B_a^Q P := [→_a^Q]P = {s ∈ S : s_a^Q ⊆ P}.
We also need a notation for the dual of the K modality ("epistemic possibility"):
K̃_a P := ¬K_a ¬P.
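For finite models the relational definitions above translate directly into code. The sketch below (function names are ours) takes the preorders ≤_a as sets of pairs, recovers ∼_a as comparability, and computes information cells, Min_a, K_a, B_a and B_a^Q; the coin model of Example 2 is used only as an illustration.

```python
def comparable(a, s, t, leq):
    """s ~_a t, recovered as comparability with respect to <=_a."""
    return (s, t) in leq[a] or (t, s) in leq[a]

def cell(a, s, states, leq):
    """Information cell s(a)."""
    return {t for t in states if comparable(a, s, t, leq)}

def min_a(a, P, leq):
    """Min_a P: members of P with no strictly more plausible member of P."""
    return {s for s in P
            if not any((t, s) in leq[a] and (s, t) not in leq[a] for t in P)}

def knows(a, P, states, leq):
    """K_a P = {s : s(a) is included in P}."""
    return {s for s in states if cell(a, s, states, leq) <= P}

def believes_cond(a, Q, P, states, leq):
    """B_a^Q P = {s : Min_a(s(a) & Q) is included in P}."""
    return {s for s in states if min_a(a, cell(a, s, states, leq) & Q, leq) <= P}

def believes(a, P, states, leq):
    return believes_cond(a, states, P, states, leq)   # B_a P = B_a^T P

# The coin model of Example 2: Bob has seen the face, Alice has not.
states = {'H', 'T'}
leq = {'a': {('H', 'H'), ('T', 'T'), ('H', 'T')},     # Alice: H more plausible than T
       'b': {('H', 'H'), ('T', 'T')}}                 # Bob: H and T distinguishable
assert knows('b', {'H'}, states, leq) == {'H'}        # at H, Bob knows Heads
assert believes('a', {'H'}, states, leq) == states    # Alice believes Heads everywhere
```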
given model. But, since later we will proceed to study systematic changes of models
(when dealing with dynamic belief revision), we need a notion of proposition that is
not confined to one model, but makes sense on all models:
A doxastic proposition is a map P assigning to each plausibility model S some S-
proposition P_S ⊆ S. We write s ⊨_S P, and say that the proposition P is true at s ∈ S,
iff s ∈ P_S. We skip the subscript and write s ⊨ P when the model is understood.
We denote by Prop the family of all doxastic propositions. All the Boolean
operations on S-propositions as sets can be lifted pointwise to operations on Prop:
in particular, we have the "always true" ⊤ and "always false" ⊥ propositions, given
by (⊥)_S := ∅, (⊤)_S := S, negation (¬P)_S := S \ P_S, conjunction (P ∧ Q)_S :=
P_S ∩ Q_S, disjunction (P ∨ Q)_S := P_S ∪ Q_S and all the other standard Boolean
operators, including infinitary conjunctions and disjunctions. Similarly, we can
define pointwise the epistemic and (conditional) doxastic modalities: (K_a P)_S :=
K_a P_S, (B_a P)_S := B_a P_S, (B_a^Q P)_S := B_a^{Q_S} P_S. It is easy to check that we have: B_a P =
B_a^⊤ P. Finally, the relation of entailment P ⊨ Q between doxastic propositions is
given pointwise by inclusion: P ⊨ Q iff P_S ⊆ Q_S for all S.
Ever since Plato’s identification of knowledge with “true justified (or justifiable)
belief” was shattered by Gettier’s celebrated counterexamples (Gettier 1963),
philosophers have been looking for the “missing ingredient” in the Platonic
equation. Various authors identify this missing ingredient as “robustness” (Hintikka
1962), “indefeasibility” (Klein 1971; Lehrer 1990; Lehrer and Paxson 1969;
Stalnaker 2006) or “stability” (Rott 2004). According to this defeasibility theory
of knowledge (or “stability theory”, as formulated by Rott), a belief counts as
“knowledge” if it is stable under belief revision with any new evidence: “if a person
has knowledge, then that person's justification must be sufficiently strong that it is
not capable of being defeated by evidence that he does not possess” (Pappas and
Swain 1978).
One of the problems is interpreting what “evidence” means in this context. There
are at least two natural interpretations, each giving us a concept of “knowledge”.
The first, and the most common,10 interpretation is to take it as meaning “any true
information”. The resulting notion of “knowledge” was formalized by Stalnaker in
(2006), and defined there as follows: “an agent knows that ' if and only if ' is
true, she believes that ', and she continues to believe ' if any true information
is received”. This concept differs from the usual notion of knowledge (“Aumann
knowledge”) in Computer Science and Economics, by the fact that it does not satisfy
the laws of the modal system S5 (in fact, negative introspection fails); Stalnaker
10
This interpretation is the one virtually adopted by all the proponents of the defeasibility theory,
from Lehrer to Stalnaker.
shows that the complete modal logic of this modality is the modal system S4.3. As
we'll see, this notion ("Stalnaker knowledge") corresponds to what we call "safe
belief" □P. On the other hand, another natural interpretation, considered by at
least one author Rott (2004), takes “evidence” to mean “any proposition”, i.e., to
include possible misinformation: “real knowledge” should be robust even in the face
of false evidence. As shown below, this corresponds to our “knowledge” modality
KP, which could be called “absolutely unrevisable belief”. This is a partition-based
concept of knowledge, identifiable with “Aumann knowledge” and satisfying all
the laws of S5. In other words, this last interpretation provides a perfectly decent
“defeasibility” defense of S5 and of negative introspection!
In this paper, we adopt the pragmatic point of view of the formal logician:
instead of debating which of the two types of “knowledge” is the real one, we
simply formalize both notions in a common setting, compare them, axiomatize the
logic obtained by combining them and use their joint strength to express interesting
properties. Indeed, as shown below, conditional beliefs can be defined in terms of
knowledge only if we combine both the above-mentioned types of “knowledge”.
Knowledge as unrevisable belief. Observe that, for all propositions P, we have

K_a Q = ⋀_P B^P_a Q    (39.1)

i.e., knowledge is belief that is invariant under revision with any proposition.11 Equivalently, we have

K_a Q = B^¬Q_a Q = B^¬Q_a ⊥    (39.2)

(where ⊥ is the "always false" proposition). This captures in a different way the "absolute un-revisability" of knowledge: something is "known" if it is still believed even when conditionalizing our belief on its negation. In other words, this simply expresses the impossibility of accepting its negation as evidence (since such a revision would lead to an inconsistent belief).

11 This of course assumes agents to be "rational" in a sense that excludes "fundamentalist" or "dogmatic" beliefs, i.e., beliefs in unknown propositions that are held while refusing any revision, even when contradicted by facts. But this "rationality" assumption is already built into our plausibility models, which satisfy an epistemically friendly version of the standard AGM postulates of rational belief revision. See Baltag and Smets (2006a) for details.
Safe belief. To capture "Stalnaker knowledge", we introduce the Kripke modality □_a associated to the converse ≥_a of the plausibility relation, going from any state s to all the states that are "at least as plausible" as s. For S-propositions P ⊆ S over any given model S, we put

□_a P := [≥_a]P = {s ∈ S : t ∈ P for all t ≤_a s}.

It is easy to see that knowledge implies safe belief, and safe belief implies belief:

K_a P ⊨ □_a P,    □_a P ⊨ B_a P.

The last observation can be strengthened to characterize safe belief in a similar way to the above characterization (39.1) of knowledge (as belief invariant under any revision): safe beliefs are precisely the beliefs which are persistent under revision with any true information. Formally, this says that, for every state s in every model S, we have

s ⊨ □_a P  iff  s ⊨ B^Q_a P for every Q such that s ⊨ Q.    (39.3)

12 This identity corresponds to the definition of "necessity" in Stalnaker (1968) in terms of doxastic conditionals.
We can thus see that safe belief coincides indeed with Stalnaker’s notion of
“knowledge”, given by the first interpretation (“evidence as true information”)
of the defeasibility theory. As mentioned above, we prefer to keep the name
“knowledge” for the strong notion (which gives absolute certainty), and call this
weaker notion “safe belief”: indeed, these are beliefs that are “safe” to hold, in the
sense that no future learning of truthful information will force us to revise them.
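As an illustration (ours, under the simplifying assumptions of a single information cell and with subsets of the model standing in for propositions), the following Python sketch computes safe belief as the Kripke box of the converse plausibility relation and checks the characterization just described: a belief is safe at a state exactly when it survives revision with every proposition true at that state.

# Minimal sketch: safe belief as the box of the converse plausibility relation,
# and the check that safety = persistence under revision with every true proposition.

from itertools import chain, combinations

states = {"H", "T"}
heads = {"H"}

def leq(s, t):
    # s is at least as plausible as t (Heads is the most plausible state).
    return s == t or s == "H"

def best(X):
    return {s for s in X if all(leq(s, t) for t in X)}

def cond_B(Q, P):
    # B^Q P: the most plausible Q-states all satisfy P (single information cell).
    return {s for s in states if best(Q) <= P}

def box(P):
    # Safe belief: P holds at every state at least as plausible as the current one.
    return {s for s in states if all(t in P for t in states if leq(t, s))}

def subsets(X):
    xs = list(X)
    return [set(c) for c in chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

# Characterization (39.3): s |= box(P) iff s |= B^Q P for every true Q.
for s in states:
    safe = s in box(heads)
    persistent = all(s in cond_B(Q, heads) for Q in subsets(states) if s in Q)
    assert safe == persistent

print(sorted(box(heads)))   # ['H']: the belief in Heads is safe only at the Heads-state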
Example 3 (Dangerous Knowledge). This starts with the situation in Example 1
(when none of the two agents has yet seen the face of the coin). Alice has to get out
of the room for a minute, which creates an opportunity for Bob to quickly raise the
cover in her absence and take a peek at the coin. He does that, and so he sees that the
coin is Heads up. After Alice returns, she obviously doesn’t know whether or not
Bob took a peek at the coin, but she believes he didn’t do it: taking a peek is against
the rules of the game, and so she trusts Bob not to do that. The model is now rather
complicated, so we only represent the MPM:
[Diagram: the MPM S′ — four states, described in the following paragraph, with labelled arrows encoding the agents' plausibility orderings.]
Let us call this model S′. The actual state s′₁ is the one in the upper left corner, in which Bob took a peek and saw the coin Heads up, while the state t′₁ in the upper right corner represents the other possibility, in which Bob saw the coin lying Tails up. The two lower states s′₂ and t′₂ represent the case in which Bob didn't take a peek. Observe that the above drawing includes the (natural) assumption that Alice keeps her previous belief that the coin lies Heads up (since there is no reason for her to change her mind). Moreover, we assumed that she will keep this belief even if she'd be told that Bob took a peek: this is captured by the a-arrow from t′₁ to s′₁. This seems natural: Bob's taking a peek doesn't change the upper face of the coin, so it shouldn't affect Alice's prior belief about the coin.
In both Examples 1 and 3 above, Alice holds a true belief (at the real state)
that the coin lies Heads up: the actual state satisfies Ba H. In both cases, this true
belief is not knowledge (since Alice doesn’t know the upper face), but nevertheless
in Example 1, this belief is safe (although it is not known by the agent to be safe):
no additional truthful information (about the real state s) can force her to revise this
belief. (To see this, observe that any new truthful information would reveal to Alice
the real state s, thus confirming her belief that Heads is up.) So in the model S from Example 1, we have s ⊨ □_a H (where s is the actual state). In contrast, in Example 3, Alice's belief (that the coin is Heads up), though true, is not safe. There is some piece of correct information (about the real state s′₁) which, if learned by Alice, would make her change this belief: we can represent this piece of correct information as the doxastic proposition H → K_b H. It is easy to see that the actual state s′₁ of the model S′ satisfies the proposition B^{H→K_b H}_a T (since (H → K_b H)_{S′} = {s′₁, t′₁, t′₂}, and the minimal state in the set s′₁(a) ∩ {s′₁, t′₁, t′₂} = {s′₁, t′₁, t′₂} is t′₂, which satisfies T).
So, if given this information, Alice would come to wrongly believe that the coin is
Tails up! This is an example of a dangerous truth: a piece of true information whose learning
can lead to wrong beliefs.
Observe that an agent’s belief can be safe without him necessarily knowing this
(in the “strong” sense of knowledge given by K): “safety” (similarly to “truth”) is an
external property of the agent’s beliefs, that can be ascertained only by comparing
his belief-revision system with reality. Indeed, the only way for an agent to know a
belief to be safe is to actually know it to be truthful, i.e., to have actual knowledge
(not just a belief) of its truth. This is captured by the valid identity
K_a □_a P = K_a P.    (39.4)
In other words: knowing that something is safe to believe is the same as just knowing
it to be true. In fact, all beliefs held by an agent “appear safe” to him: in order to
believe them, he has to believe that they are safe. This is expressed by the valid
identity
B_a □_a P = B_a P    (39.5)
saying that: believing that something is safe to believe is the same as just believing
it. Contrast this with the situation concerning “knowledge”: in our logic (as in most
standard doxastic-epistemic logics), we have the identity
B_a K_a P = K_a P.    (39.6)
So this leads to a triviality result: knowledge and belief collapse to the same thing, and all beliefs are always true! One solution to the "paradox" is to reject (?) as an (intuitive but) wrong "axiom". In contrast, various authors (Friedmann and Halpern 1994; van der Hoek 1993; Voorbraak 1993; Williamson 2001) accept (?) and propose other solutions, e.g., giving up the principle of "negative introspection" for knowledge.
Our solution to the paradox, as embodied in the contrasting identities (39.5) and (39.6), combines the advantages of both solutions above: the "axiom" (?) is correct if we interpret "knowledge" as safe belief □_a, since then (?) becomes equivalent to identity (39.5) above; but then negative introspection fails for this interpretation! On the other hand, if we interpret "knowledge" as our K_a-modality, then negative introspection holds; but then the above "axiom" (?) fails, and on the contrary we have the identity (39.6).
So, in our view, the paradox of the perfect believer arises from the conflation of
two different notions of “knowledge”: “Aumann” (partition-based) knowledge and
“Stalnaker” knowledge (i.e., safe belief).
(Conditional) beliefs in terms of "knowledge" notions. An important observation is that one can characterize/define (conditional) beliefs only in terms of our two "knowledge" concepts (K and □): for simple beliefs, we have

B_a P = K̃_a □_a P = ◊_a □_a P

(where ◊_a := ¬□_a ¬ is the dual of the safe belief modality).
From a modal logic perspective, it is natural to introduce the Kripke modalities [>_a] and [≅_a] for the other important relations (strict plausibility and equiplausibility): for S-propositions P ⊆ S over a given model S, we put

[>_a]P := {s ∈ S : t ∈ P for all t <_a s},    [≅_a]P := {s ∈ S : t ∈ P for all t ≅_a s},
and as before these pointwise induce corresponding operators on Prop. The intuitive
meaning of these operators is not very clear, but they can be used to define other
interesting modalities, capturing various “doxastic attitudes”.
Weakly safe belief. We can define a weakly safe belief operator □^weak_a P in terms of the strict order, by putting:

□^weak_a P := P ∧ [>_a]P.

Unfolding the definition, we have

s ⊨ □^weak_a P  iff:  s ⊨ P and t ⊨ P for all t <_a s.

One can also check that

s ⊨ □^weak_a Q  iff:  s ⊨ ¬B^P_a ¬Q for every P such that s ⊨ P.
So “weakly safe beliefs” are beliefs which (might be lost but) are never reversed
(into believing the opposite) when revising with any true information.
The unary revision operator. Using the strict plausibility modality, we can also define a unary "belief revision" modality ∗_a, which in some sense internalizes the standard (binary) belief revision operator, by putting:

∗_a P := P ∧ [>_a]¬P.

Unfolding the definition, we get

s ⊨ ∗_a P  iff  s ∈ s^P_a.

It is easy to see that ∗_a P selects from any given information cell s(a) precisely those states that satisfy agent a's revised theory s^P_a:

∗_a P ∩ s(a) = s^P_a.
Strong belief. Another important doxastic attitude, called "strong belief", is obtained by defining

Sb_a P := B_a P ∧ K_a (P → □_a P).

In terms of the plausibility order, it means that all the P-states in the information cell s(a) of s are below (more plausible than) all the non-P-states in s(a) (and that, moreover, there are such P-states in s(a)). This notion is called "strong belief" by Battigalli and Siniscalchi (2002), while Stalnaker (1996) calls it "robust belief". Another characterization of strong belief is the following:

s ⊨ Sb_a Q  iff:  s ⊨ B_a Q and s ⊨ B^P_a Q for every P such that s ⊨ ¬K_a (P → ¬Q).

In other words: something is a strong belief if it is believed and if this belief can only be defeated by evidence (truthful or not) that is known to contradict it. An example is the "presumption of innocence" in a trial: requiring the members of the jury to hold the accused as "innocent until proven guilty" means asking them to start the trial with a "strong belief" in innocence.
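For a quick check (our own sketch, not from the paper), the following Python fragment verifies, on a small single-cell model with a linear plausibility order (no ties, for simplicity), that Sb_a P as defined above holds exactly when there is at least one P-state and every P-state is strictly more plausible than every non-P-state.

# Minimal sketch: strong belief Sb P = B P /\ K(P -> box P) on one information cell,
# and its order-theoretic characterization.

states = [0, 1, 2, 3]          # 0 is the most plausible state, 3 the least
leq = lambda s, t: s <= t      # linear plausibility order on the single cell

def best(X):
    return {s for s in X if all(leq(s, t) for t in X)}

def B(P):            # belief: the most plausible states satisfy P
    return best(set(states)) <= P

def K(P):            # knowledge: P holds on the whole cell
    return set(states) <= P

def box(P):          # safe belief at s: P holds at every state at least as plausible as s
    return {s for s in states if all(t in P for t in states if leq(t, s))}

def Sb(P):           # strong belief, as defined in the text
    return B(P) and K({s for s in states if (s not in P) or (s in box(P))})

def order_char(P):   # every P-state strictly below every non-P-state, and some P-state exists
    P, nonP = P & set(states), set(states) - P
    return bool(P) and all(leq(p, q) and not leq(q, p) for p in P for q in nonP)

for P in [{0}, {0, 1}, {0, 2}, {1, 2}, set(states)]:
    assert Sb(P) == order_char(P)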
The logic CDL (“conditional doxastic logic”) introduced in Baltag and Smets
(2006a) is a logic of conditional beliefs, equivalent to the strongest logic considered
in Board (2002). The syntax of CDL (without common knowledge and common
belief operators13) is:
13
The logic in Baltag and Smets (2006a) has these operators, but for simplicity we decided to leave
them aside in this presentation.
The logic K□ takes the modalities K_a and □_a as primitive, while the semantics over plausibility models is given as for CDL, by inductively defining an interpretation map from sentences to doxastic propositions, using the obvious compositional clauses. Belief and conditional belief are derived operators here, defined as abbreviations:

B^φ_a ψ := K̃_a φ → K̃_a (φ ∧ □_a (φ → ψ)),    B_a φ := B^⊤_a φ,

where K̃_a φ := ¬K_a ¬φ is the Diamond modality for K, and ⊤ := ¬(p ∧ ¬p) is some tautological sentence. So the logic K□ is more expressive than CDL.
Proof system. In addition to the rules and axioms of propositional logic, the proof system for the logic K□ includes the following:
• the Necessitation Rules for both K_a and □_a;
• the S5-axioms for K_a;
• the S4-axioms for □_a;
• K_a P → □_a P;
• K_a (P ∨ □_a Q) ∧ K_a (Q ∨ □_a P) → K_a P ∨ K_a Q.
Theorem 5 (Completeness and Decidability). The logic K□ is (weakly) complete with respect to MPMs (and so also with respect to EPMs). Moreover, it is decidable and has the finite model property.
Proof. A non-standard frame (model) is a structure (S, ∼_a, ≤_a)_{a∈A} (together with a valuation, in the case of models) such that the ∼_a are equivalence relations, the ≤_a are preorders, ≤_a ⊆ ∼_a, and the restriction of ≤_a to each ∼_a-equivalence class is connected. For a logic with two modalities, □_a for ≤_a and K_a for the relation ∼_a, we can use well-known results in Modal Correspondence Theory to see
that each of these semantic conditions corresponds to one of our modal axioms
above. By general classical results on canonicity and modal correspondence,14 we
immediately obtain completeness for non-standard models. Finite model property
for these non-standard models follows from the same general results. But every
finite strict preorder relation > is well-founded, and an MPM is nothing but a non-
standard model whose strict preorders >a are well-founded. So completeness for
(“standard”) MPMs immediately follows. Then we can use Proposition 4 above
to obtain completeness for EPMs. Finally, decidability follows, in the usual way,
from finite model property together with completeness (with respect to a finitary
proof system) and with the decidability of model-checking on finite models. (This
last property is obvious, given the semantics.) Q . E. D .
Under a "dynamic" reading of conditional beliefs (as the beliefs held after learning the new information), the "Success" axiom

⊢ B^φ_a φ

would fail for higher-level beliefs. To see this, consider a "Moore sentence"

φ := p ∧ ¬B_a p,

saying that some fact p holds but that agent a doesn't believe it. The sentence φ
is consistent, so it may very well happen to be true. But agent a's beliefs about the situation after learning that φ was true cannot possibly include the sentence φ itself: after learning this sentence, agent a knows p, and so he believes p, contrary to what φ asserts. Thus, after learning φ, agent a knows that φ is false now (after
the learning). This directly contradicts the Success axiom: far from believing the
sentence after learning it to be true, the agent (knows, and so he correctly) believes
that it has become false. There is nothing paradoxical about this: sentences may
obviously change their truth values, due to our actions. Since learning the truth
of a sentence is itself an action, it is perfectly consistent to have a case in which
learning changes the truth value of the very sentence that is being learnt. Indeed,
this is always the case with Moore sentences. Though not paradoxical, the existence
of Moore sentences shows that the “Success” axiom does not correctly describe a
rational agent’s (higher-level) beliefs about what is the case after a new truth is being
learnt.
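The Moore phenomenon is easy to reproduce computationally. In the following sketch (ours; the update is simplified to a hard public announcement, i.e., the deletion of the states where the announced sentence fails), the Moore sentence p ∧ ¬B_a p is true before the announcement but false at every state that survives learning it.

# Minimal sketch: announcing a true Moore sentence makes it false.
# One agent, two states; "announcement" = restricting the model to the states
# where the announced sentence holds (a hard update, for simplicity).

states = {"w", "v"}
p = {"w"}                      # p is true at w only

def best(X, leq):
    return {s for s in X if all(leq(s, t) for t in X)}

def believes_p(X, leq):
    # B_a p relative to the (sub)model X: the most plausible states satisfy p.
    return best(X, leq) <= p

# Before the update the agent finds v (where p is false) most plausible,
# so she does not believe p, and the Moore sentence p /\ ~B_a p is true at w.
leq = lambda s, t: s == t or s == "v"
moore = {s for s in states if s in p and not believes_p(states, leq)}
assert moore == {"w"}

# Learning the Moore sentence: keep only the states where it was true.
updated = moore
# Afterwards the agent believes p, so the very sentence she has just learnt
# is now false at the surviving state.
moore_after = {s for s in updated if s in p and not believes_p(updated, leq)}
assert moore_after == set()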
14 See e.g., Blackburn et al. (2001) for the general theory of modal correspondence and canonicity.
The only way to understand the "Success" axiom in the context of higher-level beliefs is to insist on the above-mentioned "static" interpretation of conditional belief operators B^φ_a, as expressing the agent's revised belief about how the state of the world was before the revision.
In contrast, a belief update is a dynamic form of belief revision, meant to capture
the actual change of beliefs induced by learning: the updated belief is about the state
of the world as it is after the update. As noticed in Gerbrandy (1999), Baltag et al.
(1998), and Baltag and Moss (2004), the original model does not usually include
enough states to capture all the epistemic possibilities that arise in this way. While
in the previous section the models were kept unchanged during the revision, all the
possibilities being already there (so that both the unconditional and the conditional
beliefs referred to the same model), we now have to allow for belief updates that
change the original model.
In Baltag and Moss (2004), it was argued that epistemic events should be modeled
in essentially the same way as epistemic states, and this common setting was taken
to be given by epistemic Kripke models. Since in this paper we enriched our state
models with doxastic plausibility relations to deal with (conditional) beliefs, it is
natural to follow Baltag and Moss (2004) into extending the similarity between
actions and states to this setting, thus obtaining (epistemic) action plausibility
models. The idea of such an extension was first developed in Aucher (2003) (for
a different notion of plausibility model and a different notion of update product),
then generalized in van Ditmarsch (2005), where many types of action plausibility
models and notions of update product, that extend the so-called Baltag-Moss-Solecki
(BMS) update product from Baltag et al. (1998) and Baltag and Moss (2004),
are explored. But both these works are based on a quantitative interpretation of
plausibility ordinals (as “degrees of belief”), and thus they define the various types
of products using complex formulas of transfinite ordinal arithmetic, for which no
intuitive justification is provided.
In contrast, our notion of update product is a purely qualitative one, based on
a simple and intuitive relational definition: the simplest way to define a total pre-
order on a Cartesian product, given total pre-orders on each of the components,
is to use either the lexicographic or the anti-lexicographic order. We choose
the second option, as the closest in spirit to the classical AGM theory: it gives
priority to the new, incoming information (i.e., to “actions” in our sense).15 We
justify this choice by interpreting the action plausibility model as representing
the agent’s “incoming” belief, i.e., the belief-updating event, which “performs”
the update, by “acting” on the “prior” beliefs (as given in the state plausibility
model).
15 This choice can be seen as a generalization of the so-called "maximal-Spohn" revision.
Action Models
An action plausibility model16 (APM, for short) is a plausibility frame (Σ, ≤_a)_{a∈A} together with a precondition map pre : Σ → Prop, associating to each element σ of Σ some doxastic proposition pre_σ. We call the elements of Σ (basic) doxastic actions (or "events"), and we call pre_σ the precondition of action σ. The basic actions σ ∈ Σ are taken to represent deterministic belief-revising actions of a particularly simple nature. Intuitively, the precondition defines the domain of applicability of σ: it can be executed on a state s iff s satisfies its precondition. The relations ≤_a give the agents' beliefs about which actions are more plausible than others.
To model non-determinism, we introduce the notion of epistemic program. A doxastic program over a given action model Σ (or Σ-program, for short) is simply a set Γ ⊆ Σ of doxastic actions. We can think of doxastic programs as non-deterministic actions: each of the basic actions σ ∈ Γ is a possible "deterministic resolution" of Γ. For simplicity, when Γ = {σ} is a singleton, we ambiguously identify the program with the action σ.
Observe that Σ-programs Γ ⊆ Σ are formally the "dynamic analogues" of S-propositions P ⊆ S. So the dynamic analogue of the conditional doxastic appearance s^P_a (representing agent a's revised theory about state s, after revision with proposition P) is the set σ^Γ_a.
Interpretation: beliefs about changes encode changes of beliefs. The name
“doxastic actions” might be a bit misleading, and from a philosophical perspective
Johan van Benthem’s term “doxastic events” seems more appropriate. The elements
of a plausibility model do not carry information about agency or intentionality and
cannot represent “real” actions in all their complexity, but only the doxastic changes
induced by these actions: each of the nodes of the graph represents a specific kind of
change of beliefs (of all the agents). As in Baltag and Moss (2004), we only deal here
with pure “belief changes”, i.e., actions that do not change the “ontic” facts of the
world, but only the agents’ beliefs.17 Moreover, we think of these as deterministic
changes: there is at most one output of applying an action to a state.18 Intuitively, the
precondition defines the domain of applicability of σ: this action can be executed on a state s iff s satisfies its precondition. The plausibility pre-orderings ≤_a give the
agents’ conditional beliefs about the current action. But this should be interpreted
as beliefs about changes, that encode changes of beliefs. In this sense, we use such
16 Van Benthem calls this an "event model".
17 We stress this is a minor restriction, and it is very easy to extend this setting to "ontic" actions. The only reason we stick with this restriction is that it simplifies the definitions, and that it is general enough to apply to all the actions we are interested in here, and in particular to all communication actions.
18 As in Baltag and Moss (2004), we will be able to represent non-deterministic actions as sums (unions) of deterministic ones.
“beliefs about actions” as a way to represent doxastic changes: the information about
how the agent changes her beliefs is captured by our action plausibility relations. So
we read σ <_a σ′ as saying that: if agent a is informed that either σ or σ′ is currently happening, then she cannot distinguish between the two, but she believes that σ is in fact happening. As already mentioned, doxastic programs Γ ⊆ Σ represent non-deterministic changes of belief. Finally, for an action σ and a program Γ, the program σ^Γ_a represents the agent's revised theory (belief) about the current action after "learning" that (one of the deterministic resolutions σ in) Γ is currently happening.
Example 4 (Private “Fair-Game” Announcements). Let us consider the action that
produced the situation represented in Example 2 above. In front of Alice, Bob
looked at the coin, in such a way that (it was common knowledge that) only he
saw the face. In the DEL literature, this is sometimes known as a “fair game”
announcement: everybody is commonly aware that an insider (or a group of insiders)
privately learns some information. It is “fair” since the outsiders are not “deceived”
in any way: e.g., in our example, Alice knows that Bob looks at the coin (and he
knows that she knows etc.). In other words, Bob’s looking at the coin is not an
“illegal” action, but one that obeys the (commonly agreed) “rules of the game”. To
make this precise, let us assume that this is happening in such a way that Alice has
no strong beliefs about which of the two possible actions (Bob-seeing-Heads-up and
Bob-seeing-Tails-up) is actually happening. Of course, we assumed that before this,
she already believed that the coin lies Heads up, but apart from this we now assume
that the way the action (of “Bob looking”) is happening gives her no indication
of what face he is seeing. We represent these actions using a two-node plausibility
model Σ₂ (where, as in the case of state models, we draw arrows for the converse plausibility relations ≥_a, disregarding all the loops):

[Diagram: the action model Σ₂ — two actions, with preconditions H and T respectively, linked by an a-edge (the two actions are indistinguishable for Alice).]
Example 5 (Fully Private Announcements). Let us consider the action that pro-
duced the situation represented in Example 3 above. This was the action of Bob
taking a peek at the coin, while Alice was away. Recall that we assumed that Alice
believed that nothing was really happening in her absence (since she assumed Bob
was playing by the rules), though obviously she didn’t know this (that nothing
was happening). In the DEL literature, this action is usually called a fully private
announcement: Bob learns which face is up, while the outsider Alice believes
nothing of the kind is happening. To represent this, we consider an action model Σ₃ consisting of three "actions": the actual action σ, in which Bob takes a peek and sees the coin lying Heads up; the alternative possible action τ, in which Bob sees the coin lying Tails up; and finally the action ρ, in which "nothing is really happening" (as Alice believes). The plausibility model Σ₃ for this action is:

[Diagram: the action model Σ₃ — σ (upper left) and τ (upper right), with ρ below; the preconditions and Alice's plausibility arrows are described in the following paragraph.]
Here, the action σ is the one in the upper left corner, having precondition H: indeed, this action can happen iff the coin is really lying Heads up; similarly, the action τ in the upper right corner has precondition T, since it can only happen iff the coin is Tails up. Finally, the action ρ is the lower one, having as precondition the "universally true" proposition ⊤: indeed, this action can always happen (since in it, nothing is really happening!). The plausibility relations reflect the agents' beliefs: in each case, both Bob and Charles know exactly what is happening, so their local plausibility relations are the identity (and thus we draw no arrows for them). Alice believes nothing is happening, so ρ is the most plausible action for her (to which all her arrows are pointing); so she keeps her belief that H is the case, thus considering σ as more plausible than τ.
Examples of doxastic programs. Consider the program Γ = {σ, τ} ⊆ Σ₃ over the action model Σ₃ from Example 5. The program Γ represents the action of "Bob taking a peek at the coin", without any specification of which face he is seeing. Although expressed in a non-deterministic manner (as a collection of two possible actions, σ and τ), this program is in fact deterministic, since in each possible state only one of the actions σ or τ can happen: there is no state satisfying both H and T. The whole set Σ₃ gives another doxastic program, one that is really non-deterministic: it represents the non-deterministic choice of Bob between taking a peek and not taking it.
Appearance of actions and their revision: Examples. As an example of an agent's "theory" about an action, consider the appearance of action τ to Alice: τ_a = {ρ}. Indeed, if τ happens (Bob takes a peek and sees the coin is Tails up), Alice believes that ρ (i.e., nothing) is happening: this is the "apparent action", as far as Alice is concerned. As an example of a "revised theory" about an action, consider the conditional appearance τ^Γ_a of τ to Alice, given the program Γ = {σ, τ} introduced above. It is easy to see that we have τ^Γ_a = {σ}. This captures our intuitions about Alice's revised theory: if, while τ was happening, she were told that Bob took a peek (i.e., she'd revise with Γ), then she would believe that he saw the coin lying Heads up (i.e., that σ happened).
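These examples can be mechanized directly. The Python sketch below (all names are ours) encodes the action model Σ₃ of Example 5, with preconditions as functions from states to booleans, and recomputes Alice's revised theory about the current action given the program Γ = {σ, τ}, recovering the conditional appearance {σ} discussed above.

# Minimal sketch: an action plausibility model (APM) with preconditions,
# modelled on Example 5 (Bob privately peeks while Alice believes nothing happens).

from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

State = str

@dataclass
class ActionModel:
    actions: Set[str]
    pre: Dict[str, Callable[[State], bool]]          # precondition of each action
    leq: Dict[Tuple[str, str], bool] = field(default_factory=dict)  # Alice's plausibility

    def at_least_as_plausible(self, x: str, y: str) -> bool:
        return x == y or self.leq.get((x, y), False)

# Three actions: sigma = "peek, coin is Heads", tau = "peek, coin is Tails",
# rho = "nothing happens" (precondition trivially true).
sigma3 = ActionModel(
    actions={"sigma", "tau", "rho"},
    pre={
        "sigma": lambda s: s == "H",
        "tau":   lambda s: s == "T",
        "rho":   lambda s: True,
    },
    # Alice finds "nothing happens" most plausible and, given that a peek
    # happened, finds the Heads-peek more plausible than the Tails-peek.
    leq={("rho", "sigma"): True, ("rho", "tau"): True, ("sigma", "tau"): True},
)

# A doxastic program: "Bob takes a peek" (no commitment to which face he sees).
peek = {"sigma", "tau"}

# Alice's revised theory about the current action, given the program `peek`:
# the most plausible actions of `peek` according to her ordering.
revised = {x for x in peek
           if all(sigma3.at_least_as_plausible(x, y) for y in peek)}
assert revised == {"sigma"}   # matches the conditional appearance discussed above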
Example 6 (Successful Lying). Suppose now that, after the previous action, i.e.,
after we arrived in the situation described in Example 3, Bob sneakily announces:
“I took a peek and saw the coin was lying Tails up”. We formalize the content of
this announcement as K_b T, i.e., saying that "Bob knows the coin is lying Tails up". This is a public announcement, but not a truthful one (though it does convey some truthful information): it is a lie! We assume it is in fact a successful lie: it is common knowledge that, even after Bob admitted having taken a peek, Alice still believes him. This action is given by the left node in the following model Σ₄:

[Diagram: the action model Σ₄ — the actual (lying) action on the left, with precondition ¬K_b T, and the truthful announcement on the right, with precondition K_b T, with an a-arrow pointing to the truthful action (Alice believes Bob).]
We are ready to define our update operation, representing the way an action from an (action) plausibility model Σ = (Σ, ≤_a, pre)_{a∈A} "acts" on an input-state from a given (state) plausibility model S = (S, ≤_a, ‖·‖)_{a∈A}. We denote the updated state model by S ⊗ Σ, and call it the update product of the two models. The construction is similar up to a point to the one in Baltag et al. (1998) and Baltag and Moss (2004), and thus also somewhat similar to the ones in Aucher (2003) and van Ditmarsch (2005). In fact, the set of updated states, the updated valuation and the updated indistinguishability relation are the same in these constructions. The main difference lies in our definition of the updated plausibility relation, via the Action Priority Rule.
To warm up, let us first define the update product for the single-agent case. Let S = (S, ≤, ‖·‖) be a single-agent plausibility state model and let Σ = (Σ, ≤, pre) be a single-agent plausibility action model.
We represent the states of the updated model S ⊗ Σ as pairs (s, σ) of input-states and actions, i.e., as elements of the Cartesian product S × Σ. This reflects that the basic actions in our action models are assumed to be deterministic: for a given input-state and a given action, there can be at most one output-state. More specifically, we select the pairs which are consistent, in the sense that the input-state satisfies the precondition of the action. This is natural: the precondition of an action is a specification of its domain of applicability. So the set of states of S ⊗ Σ is taken to be

S ⊗ Σ := {(s, σ) : s ⊨_S pre_σ}.
The updated valuation is essentially given by the original valuation from the input-state model: for all (s, σ) ∈ S ⊗ Σ, we put (s, σ) ⊨ p iff s ⊨ p. This "conservative" way to update the valuation expresses the fact that we only consider here actions that are "purely doxastic", i.e., pure "belief changes", that do not affect the ontic "facts" of the world (captured here by atomic sentences).
We still need to define the updated plausibility relation. To motivate our definition, we first consider two examples:
Example 7 (A Sample Case). Suppose that we have two states s, s′ ∈ S such that s < s′, s ⊨ ¬P, s′ ⊨ P. This means that, if given the supplementary information that the real state is either s or s′, the agent believes ¬P:

[Diagram: two states, ¬P and P, with the ¬P-state more plausible.]

Suppose then an event happens, in whose model there are two actions σ, σ′ such that σ > σ′, pre_σ = ¬P, pre_{σ′} = P. In other words, if given the information that either σ or σ′ is happening, the agent believes that σ′ is happening, i.e., she believes that P is learnt. This part of the model behaves just like a soft public announcement of P:

[Diagram: two actions, with preconditions ¬P and P, the P-action more plausible.]

Naturally, we expect the agent to change her belief accordingly, i.e., her updated plausibility relation on states should now go the other way:

[Diagram: the two output-states, with the P-state now more plausible.]
Example 8 (A Second Sample Case). Suppose the initial situation was the same as above, but now the two actions σ, σ′ are assumed to be equi-plausible: σ ≅ σ′. This is a completely unreliable announcement of P, in which the veracity and the falsity of the announcement are equally plausible alternatives:

[Diagram: two equi-plausible actions, with preconditions ¬P and P.]

In the AGM paradigm, it is natural to expect the agents to keep their original beliefs unchanged after this event:

[Diagram: the two output-states, with the ¬P-state still more plausible.]
The anti-lexicographic order. Putting the above two sample cases together, we conclude that the updated plausibility relation should be the anti-lexicographic preorder relation induced on pairs (s, σ) ∈ S × Σ by the preorders on S and on Σ, i.e.:

(s, σ) ≤ (s′, σ′)  iff  either σ < σ′, or else σ ≅ σ′ and s ≤ s′.
In other words, the updated plausibility order gives “priority” to the action
plausibility relation, and apart from this it keeps as much as possible of the old
order. This reflects our commitment to an AGM-type of revision, in which the new
information has priority over old beliefs. The “actions” represent here the “new
information”, although (unlike in AGM) this information comes in dynamic form
(as action plausibility order), and so it is not fully reducible to its propositional
content (the action’s precondition). In fact, this is a generalization of one of the
belief-revision policies encountered in the literature (the so-called “maximal-Spohn
revision”). But, in the context of our qualitative (conditional) interpretation of
plausibility models, we will argue below that this is essentially the only reasonable
option.
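For concreteness, here is a small Python sketch (ours, not the authors' code) of the single-agent update product: consistent pairs, copied valuation, and the anti-lexicographic plausibility order. Run on the two sample cases above, it confirms that a soft announcement of P reverses the agent's belief, while an announcement by equi-plausible actions leaves it unchanged.

# Minimal sketch: single-agent update product S (x) Sigma with the
# anti-lexicographic plausibility order.

def update_product(states, s_leq, val_P, actions, a_leq, pre):
    # States of the product: pairs (s, sigma) with s satisfying pre(sigma).
    prod = [(s, a) for s in states for a in actions if pre(a, s)]

    def strictly(leq, x, y):
        return leq(x, y) and not leq(y, x)

    def equi(leq, x, y):
        return leq(x, y) and leq(y, x)

    def prod_leq(p, q):
        (s, a), (t, b) = p, q
        # Anti-lexicographic order: action plausibility first, then state plausibility.
        return strictly(a_leq, a, b) or (equi(a_leq, a, b) and s_leq(s, t))

    new_val_P = {p for p in prod if p[0] in val_P}   # valuation copied from input-state
    return prod, prod_leq, new_val_P

# Initial model: s (where P is false) is more plausible than s' (where P is true).
states, s_leq, P = ["s", "s'"], (lambda x, y: x == y or x == "s"), {"s'"}

# Example 7: a soft announcement of P -- the P-action is the more plausible one.
actions = ["sig", "sig'"]                       # pre(sig) = not P, pre(sig') = P
a_leq = lambda x, y: x == y or x == "sig'"
pre = lambda a, s: (s in P) if a == "sig'" else (s not in P)

prod, leq, valP = update_product(states, s_leq, P, actions, a_leq, pre)
best = {p for p in prod if all(leq(p, q) for q in prod)}
assert best <= valP        # after the soft announcement, the agent believes P

# Example 8: the two actions are equi-plausible -- the old belief (not P) survives.
a_leq2 = lambda x, y: True
prod2, leq2, valP2 = update_product(states, s_leq, P, actions, a_leq2, pre)
best2 = {p for p in prod2 if all(leq2(p, q) for q in prod2)}
assert best2.isdisjoint(valP2)

The sketch only treats a single agent and purely propositional preconditions; the multi-agent rule described next additionally keeps non-comparable inputs non-comparable.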
In the multi-agent case, the construction of the updated state space and updated
valuation is the same as above. But for the updated plausibility relation we need
to take into account a third possibility: the case when either the initial states or the
actions are distinguishable, belonging to different information cells.
Example 9 (A Third Sample Case). Suppose that we have two states s, s′ ∈ S such that s ⊨ ¬P, s′ ⊨ P, but s and s′ are distinguishable (i.e., non-comparable: s ≁_a s′):

[Diagram: two non-comparable states, ¬P and P.]

This means that, if given the supplementary information that the real state is either s or s′, the agent immediately knows which of the two is the real state, and thus she knows whether P holds or not. It is obvious that, after any of the actions considered in the previous two examples, a perfect-recall agent will continue to know whether P held or not, and so the output-states after σ and σ′ will still be distinguishable (non-comparable).
The “Action-Priority” Rule. Putting this together with the other sample cases, we
obtain our update rule, in full generality:
(s, σ) ≤_a (s′, σ′)  iff  either σ <_a σ′ and s ∼_a s′, or else σ ≅_a σ′ and s ≤_a s′.

It is easy to check that, if Σ₂ is the action model from Example 4, then its product with the state model S from Example 1 is isomorphic to the model from Example 2. Similarly, if Σ₃ is the action model from Example 5, then we can see that the product S ⊗ Σ₃ is isomorphic to the state model S′ from Example 3.
"In-sanity check": Successful lying. Applying the action model Σ₄ from Example 6, representing the "successful lying" action, to the state model S′ from Example 3, we obtain indeed the intuitively correct output of "successful lying", namely the following model S′ ⊗ Σ₄:

[Diagram: the updated plausibility model S′ ⊗ Σ₄ — four states labelled H and T, with a-arrows and an a,b-arrow (not reproduced here).]
Interpretation. As its name makes explicit, the Action-Priority Rule gives “pri-
ority” to the action plausibility relation. This is not an arbitrary choice, but it is
motivated by our specific interpretation of action models, as embodied in our Motto
above: beliefs about changes (i.e., the action plausibility relations) are nothing but
ways to encode changes of belief (i.e., reversals of the original plausibility order). So
the (strict) order on actions encodes changes of order on states. The Action-Priority Rule is a consequence of this interpretation: it just says that a strong plausibility order σ <_a σ′ on actions corresponds indeed to a change of ordering (from whatever the ordering was) between the original (indistinguishable) input-states s ∼_a s′, to the order (s, σ) <_a (s′, σ′) between output-states; while equally plausible actions σ ≅_a σ′ will leave the initial ordering unchanged: (s, σ) ≤_a (s′, σ′) iff s ≤_a s′. Giving priority to action plausibility does not in any way mean that the agent's belief in actions is stronger than her belief in states; it just captures the fact that, at the time of updating with a given action, the belief about the action is what is actual, it is the current belief about what is going on, while the beliefs about the input-states are in the past.19
In a nutshell: the doxastic action is the one that changes the initial doxastic state,
and not vice-versa. The belief update induced by a given action is nothing but an
update with the (presently) believed action. If the believed action requires the
agent to revise some past beliefs, then so be it: this is the whole point of believing σ, namely to use it to revise one's past beliefs. For example, in a successful lie,
the action plausibility relation makes the hearer believe that the speaker is telling
19 Of course, at a later moment, the above-mentioned belief about the action (now belonging to the past) might itself be revised. But this is another, future update.
the truth; so she’ll accept this message (unless contradicted by her knowledge), and
change her past beliefs appropriately: this is what makes the lying successful.
Action-priority update generalizes product update. Recall the definition of the epistemic indistinguishability relation ∼_a in a plausibility model: s ∼_a s′ iff either s ≤_a s′ or s′ ≤_a s. It is easy to see that the Action-Priority Update implies the familiar update rule from Baltag et al. (1998) and Baltag and Moss (2004), known in Dynamic Epistemic Logic as the "product update":

(s, σ) ∼_a (s′, σ′)  iff  s ∼_a s′ and σ ∼_a σ′.
"Lexicographic upgrade". The operation ⇑P of "lexicographic upgrade", defined in van Benthem (2007), changes any model as follows: in every information cell, all the P-states become more plausible than all the ¬P-states in the same information cell, and within the two zones (P and ¬P), the old ordering remains. In our setting, this action corresponds to the following local plausibility action model:

[Diagram: a two-action model, with preconditions ¬P and P, in which the P-action is more plausible than the ¬P-action for all agents a, b, c, . . .]

Taking the anti-lexicographic update product with this action will give an exact "simulation" of the lexicographic upgrade operation.
"Conservative upgrade". The operation ↑P of "conservative upgrade", also defined in van Benthem (2007), changes any model as follows: in every information cell, the best P-worlds become better than all the worlds in that cell (i.e., in every cell the most plausible P-states become the most plausible overall in that cell), and apart from that, the old order remains. In the case of a system with only one agent, it is easy to see that we have ↑P = ⇑(∗_a P), where ∗_a is the unary "revision modality" introduced in the previous section. In the case of a set A = {1, …, n} with n > 1 agents, we can simulate ↑P using a model with 2^n actions {↑_I P}_{I⊆A}, with

pre_{↑_I P} := ⋀_{i∈I} ∗_i P ∧ ⋀_{j∉I} ¬ ∗_j P.
Dynamic modalities. Every doxastic program Γ ⊆ Σ induces a transition relation between input-states and output-states, given by s →^Γ_S (s′, σ) iff s = s′, (s, σ) ∈ S ⊗ Σ and σ ∈ Γ. The corresponding dynamic modality is defined by

([Γ]P)_S := [→^Γ_S]P_{S⊗Σ} = {s ∈ S : ∀t ∈ S ⊗ Σ (s →^Γ_S t ⇒ t ⊨_{S⊗Σ} P)}.

For basic doxastic actions σ ∈ Σ, we define the dynamic modality [σ] via the above-mentioned identification of actions with singleton programs {σ}:

[σ]P := [{σ}]P.
We can now introduce operators on doxastic programs that are the analogues of the regular operations of PDL.
Sequential composition. The sequential composition Σ;Σ′ of two action plausibility models Σ = (Σ, ≤_a, pre) and Σ′ = (Σ′, ≤_a, pre) is defined as follows:
• the set of basic actions is the Cartesian product Σ × Σ′;
• the preconditions are given by pre_{(σ,δ)} := ⟨σ⟩pre_δ;
• the plausibility order is given by putting (σ, δ) ≤_a (σ′, δ′) iff: either δ <_a δ′ and σ ∼_a σ′, or else δ ≅_a δ′ and σ ≤_a σ′.
We think of (σ, δ) as the action of performing first σ and then δ, and thus use the notation

σ; δ := (σ, δ).
Similarly, the sequential composition of two programs Γ ⊆ Σ and Λ ⊆ Σ′ is given by

Γ; Λ := {(σ, δ) : σ ∈ Γ, δ ∈ Λ}.
2. The transition relation for the program Γ;Λ is the relational composition of the transition relations for Γ and for Λ and of the isomorphism map F:

s →^{Γ;Λ}_S s′  iff  there exist w and t such that s →^Γ_S w →^Λ_{S⊗Σ} t and F(t) = s′.
Again, it is easy to see that this behaves indeed like a non-deterministic choice operator:

Proposition 13. Let i₁ : Σ → Σ ⊔ Σ′ and i₂ : Σ′ → Σ ⊔ Σ′ be the two canonical injections. Then the following are equivalent:
• s →^{Γ⊔Λ}_S s′;
• there exists t such that: either s →^Γ_S t and i₁(t) = s′, or else s →^Λ_S t and i₂(t) = s′.
Other operators. Arbitrary unions ⊔ᵢ Γᵢ can be similarly defined, and then one can define iteration Γ* := ⊔ᵢ Γⁱ (where Γ⁰ := !⊤ and Γ^{i+1} := Γ; Γⁱ).
The "laws of dynamic belief revision" are the fundamental equations of Belief Dynamics, allowing us to compute future doxastic attitudes from past ones, given the doxastic events that happen in the meantime. In modal terms, these can be stated as "reduction laws" for inductively computing dynamic modalities [Γ]P, by reducing them to modalities [Γ′]P′ in which either the propositions P′ or the programs Γ′ have lower complexity.
The following immediate consequence of the definition of [Γ]P allows us to reduce modalities for non-deterministic programs to the ones for their deterministic resolutions σ ∈ Γ:

Deterministic Resolution Law. For every program Γ ⊆ Σ, we have

[Γ]P = ⋀_{σ∈Γ} [σ]P.

So, for our other laws, we can restrict ourselves to basic actions σ ∈ Σ.
The Action-Knowledge Law. For every action σ ∈ Σ, we have:

[σ]K_a P = pre_σ → ⋀_{σ′ ∼_a σ} K_a [σ′]P.

This Action-Knowledge Law is essentially the same as in Baltag et al. (1998) and Baltag and Moss (2004): a proposition P will be known after a doxastic event iff, whenever the event can take place, it is known that P will become true after all events that are indistinguishable from the given one.
The Action-Safe-Belief Law. For every action σ ∈ Σ, we have:

[σ]□_a P = pre_σ → ( ⋀_{σ′ <_a σ} K_a [σ′]P ∧ ⋀_{σ″ ≅_a σ} □_a [σ″]P ).

This law embodies the essence of the Action-Priority Rule: a proposition P will be safely believed after a doxastic event iff, whenever the event can take place, it is known that P will become true after all more plausible events, and at the same time it is safely believed that P will become true after all equi-plausible events.
Since we took knowledge and safe belief as the basis of our static logic K□,
the above two laws are the “fundamental equations” of our theory of dynamic
belief revision. But note that, as a consequence, one can obtain derived laws for
(conditional) belief as well. Indeed, using the above-mentioned characterization of
conditional belief in terms of K and , we obtain the following:
The Derived Law of Action-Conditional-Belief. For every action σ ∈ Σ, we have:

[σ]B^P_a Q = pre_σ → ⋁_{Γ⊆Σ} ( ⋀_{σ′∈Γ} K̃_a⟨σ′⟩P ∧ ⋀_{σ″∉Γ} ¬K̃_a⟨σ″⟩P ∧ B^{⟨σ^Γ_a⟩P}_a [σ^Γ_a]Q ).

This derived law, a version of which was first introduced in Baltag and Smets (2006c) (where it was considered a fundamental law), allows us to predict future conditional beliefs from current conditional beliefs.
To explain the meaning of this law, we re-state it as follows: for every s ∈ S and σ ∈ Σ, we have:

s ⊨ [σ]B^P_a Q  iff  s ⊨ pre_σ → B^{⟨σ^Γ_a⟩P}_a [σ^Γ_a]Q,

where Γ = {σ′ ∈ Σ : s ⊨_S K̃_a⟨σ′⟩P}.
It is easy to see that this "local" (state-dependent) version of the reduction law is equivalent to the previous (state-independent) one. The set Γ encodes the extra information about the current action that is given to the agent by the context s and by the post-condition P; while σ^Γ_a is the action's post-conditional contextual appearance, i.e., the way it appears to the agent in view of this extra information Γ. Indeed, a given action might "appear" differently in a given context (i.e., at a state s) than it does in general: the information possessed by the agent at the state s might imply the negation of certain actions, hence their impossibility; this information will then be used to revise the agent's beliefs about the actions, obtaining her contextual beliefs. Moreover, in the presence of further information (a "post-condition" P), this appearance might again be revised. The "post-conditional contextual appearance" is the result of this double revision: the agent's belief about the action σ is revised with the information given to her by the context s and the post-condition P. This information
where

K^P_a Q := K_a (P → Q),    K̃^P_a Q := ¬K^P_a ¬Q,    B̃^P_a Q := ¬B^P_a ¬Q.

From these, we can derive laws for all the other doxastic attitudes above.
The problem of finding a general syntax for action models has been tackled in
various ways by different authors. Here we use the action-signature approach from
Baltag and Moss (2004).
The given listing can be used to assign syntactic preconditions for basic programs, by putting pre_{σᵢ φ⃗} := φᵢ, and pre_{σ φ⃗} := ⊤ (the trivially true sentence) if σ is not in the listing. Thus, the basic programs of the form σ φ⃗ form a "syntactic plausibility model" Σ φ⃗; i.e., every given interpretation ‖·‖ : L(Σ) → Prop of sentences as doxastic propositions will convert this syntactic model into a "real" (semantic) plausibility model, called Σ ‖φ⃗‖.
[Diagram: a two-action model — two actions, with preconditions Q and P respectively, in which the P-action is more plausible for all agents a, b, c, . . .]

This represents an event during which all agents share a common belief that P is announced; but they might be wrong, and maybe Q was announced instead. However, it is common knowledge that either P or Q was announced.
Successful (public) lying Lie P (by an anonymous agent, falsely announcing P) can now be expressed by taking Q := ¬P and selecting the basic action with precondition ¬P; the truthful soft announcement True P is the basic action with precondition P. Finally, the soft public announcement (lexicographic update) ⇑P, as previously defined, is given by the non-deterministic union ⇑P := True P ⊔ Lie P.
Semantics. We define by simultaneous induction two interpretation maps, one taking sentences φ into doxastic propositions ‖φ‖ ∈ Prop, the second taking program terms π into doxastic programs ‖π‖ over some plausibility frames. The inductive definition uses the obvious semantic clauses. For programs: ‖σ φ⃗‖ is the action σ ‖φ⃗‖ (or, more exactly, the singleton program {σ ‖φ⃗‖} over the frame Σ ‖φ⃗‖), ‖π ⊔ π′‖ := ‖π‖ ⊔ ‖π′‖, ‖π; π′‖ := ‖π‖; ‖π′‖. For sentences: ‖p‖ is as given by the valuation, ‖¬φ‖ := ¬‖φ‖, ‖φ ∧ ψ‖ := ‖φ‖ ∧ ‖ψ‖, ‖K_a φ‖ := K_a ‖φ‖, ‖□_a φ‖ := □_a ‖φ‖, ‖[π]φ‖ := [‖π‖] ‖φ‖.
Proof system. In addition to the axioms and rules of the logic K□, the logic L(Σ) includes the following Reduction Axioms:

[α]p ↔ (pre_α → p)
[α]¬φ ↔ (pre_α → ¬[α]φ)
[α](φ ∧ ψ) ↔ (pre_α → [α]φ ∧ [α]ψ)
[α]K_a φ ↔ (pre_α → ⋀_{α′ ∼_a α} K_a [α′]φ)
[α]□_a φ ↔ (pre_α → ⋀_{α′ <_a α} K_a [α′]φ ∧ ⋀_{α″ ≅_a α} □_a [α″]φ)
[π ⊔ π′]φ ↔ [π]φ ∧ [π′]φ
[π; π′]φ ↔ [π][π′]φ
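As a worked instance (ours, not a law stated in the text), consider the special case of a hard public announcement of P, written !P here: a one-action model whose single action has precondition P and is related only to itself. Under this assumption the conjunction over strictly more plausible actions is empty and the only equi-plausible action is the action itself, so the reduction axioms above specialize to:

% Specializing the reduction axioms to a single action !P with pre = P
% (this specialization is our own illustration).
\begin{align*}
  [!P]\,q                   &\;\leftrightarrow\; (P \to q) && \text{(atomic facts are unchanged)}\\
  [!P]\neg\varphi           &\;\leftrightarrow\; (P \to \neg[!P]\varphi)\\
  [!P](\varphi \wedge \psi) &\;\leftrightarrow\; (P \to [!P]\varphi \wedge [!P]\psi)\\
  [!P]K_a\varphi            &\;\leftrightarrow\; (P \to K_a[!P]\varphi)\\
  [!P]\Box_a\varphi         &\;\leftrightarrow\; (P \to \Box_a[!P]\varphi)
\end{align*}

In particular, for such a single truthful announcement the knowledge and safe-belief reduction laws take the same shape, reflecting that a hard announcement leaves no room for plausibility reversals.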
In our papers Baltag and Smets (2007a,b), we present a probabilistic version of the
theory developed here, based on discrete (finite) Popper-Renyi conditional probability spaces (allowing for conditionalization on events of zero probability, in order to cope with non-trivial belief revisions). We consider subjective probability
to be the proper notion of “degree of belief”, and we investigate its relationship with
the qualitative concepts developed here. We develop a probabilistic generalization
of the Action Priority Rule, and show that the logics presented above are complete
for the (discrete) conditional probabilistic semantics.
We mention here a number of open questions: (1) Axiomatize the full (static) logic of doxastic attitudes introduced in this paper. It can be easily shown that they can all be reduced to the modalities K_a, [>_a] and [≅_a]. There are a number of obvious axioms for the resulting logic K[>][≅] (note in particular that [>_a] satisfies the Gödel-Löb formula!), but the completeness problem is still open. (2) Axiomatize the logic of common safe belief and common knowledge, and their dynamic versions. More generally, explore the logics obtained by adding fixed points, or at least "epistemic regular (PDL-like) operations" as in van Benthem et al. (2006b), on top of our doxastic modalities. (3) Investigate the expressive limits of this approach with respect to belief-revision policies: what policies can be simulated by our update? (4) Extend the work in Baltag and Smets (2007a,b), by investigating and axiomatizing doxastic logics on infinite conditional probability models. (5) Extend the logics with quantitative (probabilistic) modal operators B^P_{a,x} Q (or □_{a,x} Q), expressing that the degree of conditional belief in Q given P (or the degree of safety of the belief in Q) is at least x.
Acknowledgements Sonja Smets’ contribution to this research was made possible by the post-
doctoral fellowship awarded to her by the Flemish Fund for Scientific Research. We thank Johan
van Benthem for his insights and help, and for the illuminating discussions we had with him on
the topic of this paper. His pioneering work on dynamic belief revision acted as the “trigger” for
our own. We also thank Larry Moss, Hans van Ditmarsch, Jan van Eijck and Hans Rott for their
most valuable feedback. Finally, we thank the editors and the anonymous referees of the LOFT7-
proceedings for their useful suggestions and comments.
During the republication of this paper in 2015, the research of Sonja Smets was funded by the
European Research Council under the European Community’s Seventh Framework Programme
(FP7/2007-2013)/ERC Grant agreement no.283963.
References
Alchourrón, C. E., Gärdenfors, P., & Makinson, D. (1985). On the logic of theory change: partial
meet contraction and revision functions. Journal of Symbolic Logic, 50(2), 510–530.
Aucher, G. (2003). A combined system for update logic and belief revision. Master’s thesis,
University of Amsterdam. ILLC Publications MoL-2003-03.
Aumann, R. J. (1999). Interactive epistemology I: Knowledge. International Journal of Game
Theory, 28(3), 263–300.
Baltag, A. (2002). A logic for suspicious players: Epistemic actions and belief updates in games.
Bulletin of Economic Research, 54(1), 1–46.
Baltag, A., & Moss, L. S. (2004). Logics for epistemic programs. Synthese, 139(2), 165–224.
Baltag, A., Moss, L. S., & Solecki, S. (1998). The logic of public announcements, common
knowledge, and private suspicions. In I. Gilboa (Ed.), Proceedings of the 7th Conference on
Theoretical Aspects of Rationality and Knowledge (TARK 98), Morgan Kaufmann Publishers
Inc. San Francisco, CA, USA, (pp. 43–56).
Baltag, A., & Sadrzadeh, M. (2006). The algebra of multi-agent dynamic belief revision. Electronic
Notes in Theoretical Computer Science, 157(4), 37–56.
Baltag, A., & Smets, S. (2006a). Conditional doxastic models: A qualitative approach to dynamic belief revision. Electronic Notes in Theoretical Computer Science, 165, 5–21.
Baltag, A., & Smets, S. (2006b) Dynamic belief revision over multi-agent plausibility models. In
Bonanno et al. (2006) (pp. 11–24).
Baltag, A., & Smets, S. (2006c). The logic of conditional doxastic actions: A theory of dynamic
multi-agent belief revision. In S. Artemov, & Parikh, R. (Eds.), Proceedings of ESSLLI
Workshop on Rationality and Knowledge, (pp. 13–30). ESSLLI.
Baltag, A., & Smets, S. (2007a). From conditional probability to the logic of doxastic actions. In
D. Samet (Ed.), Proceedings of the 11th Conference on Theoretical Aspects of Rationality and
Knowledge (TARK), Brussels (pp. 52–61). UCL Presses Universitaires de Louvain.
Baltag, A., & Smets, S. (2007b). Probabilistic dynamic belief revision. In J. F. A. K. van Benthem,
S. Ju, & F. Veltman (Eds.), A Meeting of the Minds: Proceedings of the Workshop on Logic,
Rationality and Interaction, Beijing, 2007 (Texts in computer science, Vol. 8). London: College
Publications.
Battigalli, P., & Siniscalchi, M. (2002). Strong belief and forward induction reasoning. Journal of
Economic Theory, 105(2), 356–391.
Blackburn, P., de Rijke, M., & Venema, Y. (2001). Modal logic (Cambridge tracts in theoretical
computer science, Vol. 53). Cambridge: Cambridge University Press.
Board, O. (2002). Dynamic interactive epistemology. Games and Economic Behaviour, 49(1),
49–80.
Bonanno, G. (2005). A simple modal logic for belief revision. Synthese, 147(2), 193–228.
Bonanno, G., van der Hoek, W., & Wooldridge, M. (Eds.). (2006). Proceedings of the 7th
Conference on Logic and the Foundations of Game and Decision Theory (LOFT7), University
of Liverpool UK.
Friedmann, N., & Halpern, J. Y. (1994). Conditional logics of belief revision. In Proceedings of the
of 12th National Conference on Artificial Intelligence (AAAI-94), Seattle, 31 July–4 Aug 1994
(pp. 915–921). Menlo Park: AAAI.
Gärdenfors, P. (1988). Knowledge in flux: Modelling the dynamics of epistemic states. Cambridge/London: MIT Press.
Gerbrandy, J. (1999). Dynamic epistemic logic. In L. S. Moss, J. Ginzburg, & M. de Rijke (Eds.),
Logic, language and information (Vol. 2, p. 67–84). Stanford: CSLI Publications/Stanford
University.
Gerbrandy, J., & Groeneveld, W. (1997). Reasoning about information change. Journal of Logic,
Language and Information, 6(2), 147–169.
Gerbrandy, J. D. (1999). Bisimulations on planet Kripke. PhD thesis, University of Amsterdam.
ILLC Publications, DS-1999-01.
Gettier, E. (1963). Is justified true belief knowledge? Analysis, 23(6), 121–123.
Gochet, P., & Gribomont, P. (2006). Epistemic logic. In D. M. Gabbay & J. Woods (Eds.),
Handbook of the history of logic (Vol. 7, p. 99–195). Oxford: Elsevier.
Grove, A. (1988). Two modellings for theory change. Journal of Philosophical Logic, 17(2),
157–170.
Hintikka, J. (1962). Knowledge and belief. Ithaca: Cornell University Press.
Katsuno, H., & Mendelzon, A. O. (1992). On the difference between updating a knowledge base
and revising it. In P. Gärdenfors (Ed.), Belief revision (Cambridge tracts in theoretical computer
science, pp. 183–203). Cambridge/New York: Cambridge University Press.
Klein, P. (1971). A proposed definition of propositional knowledge. Journal of Philosophy, 68(16),
471–482.
Kooi, B. P. (2003). Probabilistic dynamic epistemic logic. Journal of Logic, Language and
Information, 12(4), 381–408.
Lehrer, K. (1990). Theory of knowledge. London: Routledge.
Lehrer, K., & Paxson, T. Jr. (1969). Knowledge: Undefeated justified true belief. Journal of
Philosophy, 66(8), 225–237.
Meyer, J.-J. Ch. & van der Hoek, W. (1995). Epistemic logic for AI and computer science
(Cambridge tracts in theoretical computer science, Vol. 41). Cambridge: Cambridge University
Press.
Pappas, G., & Swain, M. (Eds.). (1978). Essays on knowledge and justification. Ithaca: Cornell
University Press.
Plaza, J. A. (1989). Logics of public communications. In M. L. Emrich, M. S. Pfeifer,
M. Hadzikadic, & Z. W. Ras (Eds.), Proceedings of the 4th International Symposium on
Methodologies for Intelligent Systems Poster Session Program (pp. 201–216). Oak Ridge
National Laboratory, ORNL/DSRD-24.
Rott, H. (1989). Conditionals and theory change: Revisions, expansions, and additions. Synthese,
81(1), 91–113.
Rott, H. (2004). Stability, strength and sensitivity: Converting belief into knowledge. Erkenntnis,
61(2–3), 469–493.
Ryan, M., & Schobbens, P.-Y. (1997). Counterfactuals and updates as inverse modalities. Journal
of Logic, Language and Information, 6(2), 123–146.
Segerberg, K. (1998). Irrevocable belief revision in dynamic doxastic logic. Notre Dame Journal
of Formal Logic, 39(3), 287–306.
Spohn, W. (1988). Ordinal conditional functions: A dynamic theory of epistemic states. In W. L.
Harper & B. Skyrms (Eds.), Causation in decision, belief change, and statistics (Vol. II,
pp. 105–134). Dordrecht/Boston: Kluwer Academic
Stalnaker, R. (1968). A theory of conditionals. In N. Rescher (Ed.), Studies in logical theory (APQ
monograph series, Vol. 2). Oxford: Blackwell.
Stalnaker, R. (1996). Knowledge, belief and counterfactual reasoning in games. Economics and
Philosophy, 12, 133–163.
Stalnaker, R. (2006). On logics of knowledge and belief. Philosophical Studies, 128(1), 169–199.
van Benthem, J. F. A. K. (2007). Dynamic logic for belief revision. Journal of Applied Non-
classical Logics, 17(2), 129–155.
van Benthem, J. F. A. K., Gerbrandy, J., & Kooi, B. (2006a) Dynamic update with probabilities. In
Bonanno et al. (2006) (pp. 237–246).
van Benthem, J. F. A. K., & Liu, F. (2004). Dynamic logic of preference upgrade. Technical report,
University of Amsterdam. ILLC Publications, PP-2005-29.
van Benthem, J. F. A. K., van Eijck, J., & Kooi, B. P. (2006b). Logics of communication and
change. Information and Computation, 204(11), 1620–1662.
van der Hoek, W. (1993). Systems for knowledge and beliefs. Journal of Logic and Computation,
3(2), 173–195.
van Ditmarsch, H. P. (2000). Knowledge games. PhD thesis, University of Groningen. ILLC
Pubications, DS-2000-06.
van Ditmarsch, H. P. (2002). Descriptions of game actions. Journal of Logic, Language and
Information, 11(3), 349–365.
van Ditmarsch, H. P. (2005) Prolegomena to dynamic logic for belief revision. Synthese, 147(2),
229–275.
van Ditmarsch, H. P., & Labuschagne, W. (2007). My beliefs about your beliefs: A case study in
theory of mind and epistemic logic. Synthese, 155(2), 191–209.
van Ditmarsch, H. P., van der Hoek, W., & Kooi, B. P. (2007). Dynamic epistemic logic (Synthese
library, Vol. 337). Dordrecht: Springer.
Voorbraak, F. P. J. M. (1993). As far as I know. PhD thesis, Utrecht University, Utrecht
(Quaestiones infinitae, Vol. VII ).
Williamson, T. (2001). Some philosophical aspects of reasoning about knowledge. In J. van
Benthem (Ed.), Proceedings of the 8th Conference on Theoretical Aspects of Rationality and
Knowledge (TARK’01) (p. 97). San Francisco: Morgan Kaufmann.
Chapter 40
Agreeing to Disagree
Robert J. Aumann
If two people have the same priors, and their posteriors for a given event A are
common knowledge, then these posteriors must be equal. This is so even though
they may base their posteriors on quite different information. In brief, people with
the same priors cannot agree to disagree.
We publish this observation with some diffidence, since once one has the
appropriate framework, it is mathematically trivial. Intuitively, though, it is not
quite obvious; and it is of some interest in areas in which people’s beliefs about
each other’s beliefs are of importance, such as game theory1 and the economics of
information.2 A “concrete” illustration that may clarify matters (and that may be
read at this point) is found at the end of the paper.
The key notion is that of “common knowledge.” Call the two people 1 and 2.
When we say that an event is “common knowledge,” we mean more than just that
both 1 and 2 know it; we require also that 1 knows that 2 knows it, 2 knows that 1
knows it, 1 knows that 2 knows that 1 knows it, and so on. For example, if 1 and
2 are both present when the event happens and see each other there, then the event
AMS 1970 subject classifications. Primary 62A15, 62C05; Secondary 90A05, 90D35.
Key words and phrases. Information, subjective probability, posterior, statistics, game theory, revising probabilities, consensus, Harsanyi doctrine.
1 Cf. Harsanyi (1967–1968); also Aumann (1974), especially Section 9j (page 92), in which the question answered here was originally raised.
2 Cf., e.g., Radner (1968, 1972); also the review by Grossman and Stiglitz (1976) and the papers quoted there.
R.J. Aumann ()
Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem,
Jerusalem, Israel
e-mail: [email protected]
becomes common knowledge. In our case, if 1 and 2 tell each other their posteriors
and trust each other, then the posteriors are common knowledge. The result is not
true if we merely assume that the persons know each other’s posteriors.
Formally, let (Ω, B, p) be a probability space, and let P₁ and P₂ be partitions of Ω whose join³ P₁ ∨ P₂ consists of nonnull events.⁴ In the interpretation, (Ω, B) is the space of states of the world, p the common prior of 1 and 2, and Pᵢ the information partition of i; that is, if the true state of the world is ω, then i is informed of that element Pᵢ(ω) of Pᵢ that contains ω. Given ω in Ω, an event E is called common knowledge at ω if E includes that member of the meet⁵ P₁ ∧ P₂ that contains ω. We will show below that this definition is equivalent to the informal description given above.
Let A be an event, and let qᵢ denote the posterior probability p(A | Pᵢ) of A given i's information; i.e., if ω ∈ Ω, then qᵢ(ω) = p(A ∩ Pᵢ(ω)) / p(Pᵢ(ω)).
Proposition Let ω ∈ Ω, and let η1 and η2 be numbers. If it is common knowledge at ω that q1 = η1 and q2 = η2, then η1 = η2.

Proof Let P be the member of 𝒫1 ∧ 𝒫2 that contains ω. Write P = ∪j Pj, where the Pj are disjoint members of 𝒫1. Since q1 = η1 throughout P, we have p(A ∩ Pj)/p(Pj) = η1 for all j; hence p(A ∩ Pj) = η1 p(Pj), and so by summing over j we get p(A ∩ P) = η1 p(P). Similarly p(A ∩ P) = η2 p(P), and so η1 = η2. This completes the proof.
To see that the formal definition of "common knowledge" is equivalent to the informal description, let ω ∈ Ω, and call a member ω′ of Ω reachable from ω if there is a sequence P1, P2, …, Pk such that ω ∈ P1, ω′ ∈ Pk, and consecutive Pj intersect and belong alternatively to 𝒫1 and 𝒫2. Suppose now that ω is the true state of the world, P1 = P1(ω), and E is an event. To say that 1 "knows" E means that E includes P1. To say that 1 knows that 2 knows E means that E includes all P2 in 𝒫2 that intersect P1. To say that 1 knows that 2 knows that 1 knows E means that E includes all P3 in 𝒫1 that intersect P2 in 𝒫2 that intersect P1. And so on. Thus all sentences of the form "i knows that i′ knows that i knows … E" (where i′ = 3 − i) are true if and only if E contains all ω′ reachable from ω. But the set of all ω′ reachable from ω is a member of 𝒫1 ∧ 𝒫2; so the desired equivalence is established.
The result fails when people merely know each other's posteriors. Suppose Ω has 4 elements α, β, γ, δ of equal (prior) probability, 𝒫1 = {αβ, γδ}, 𝒫2 = {αβγ, δ}, A = αδ, and ω = α. Then 1 knows that q2 is 1/3, and 2 knows that q1 is 1/2; but 2 thinks that 1 may not know what q2 is (1/3 or 1).
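This example can be checked mechanically. The Python sketch below is an illustration added here, not part of the original paper: it encodes the four states, the two partitions and the event A, computes the posteriors, and computes the member of the meet 𝒫1 ∧ 𝒫2 containing α by the reachability construction just described. It confirms that "q1 = 1/2" is common knowledge at α while "q2 = 1/3" is merely known to 1, so the hypothesis of the Proposition fails and the posteriors may indeed differ. The helper names (cell, posterior, meet_member) are ours.

```python
from fractions import Fraction

# States alpha, beta, gamma, delta with equal prior probability, as in the text.
states = ["alpha", "beta", "gamma", "delta"]
prior = {w: Fraction(1, 4) for w in states}
P1 = [{"alpha", "beta"}, {"gamma", "delta"}]            # person 1's information partition
P2 = [{"alpha", "beta", "gamma"}, {"delta"}]            # person 2's information partition
A = {"alpha", "delta"}

def cell(partition, w):
    """The member of the partition that contains state w."""
    return next(C for C in partition if w in C)

def posterior(C):
    """p(A | C) for a nonnull set of states C."""
    return sum(prior[w] for w in C & A) / sum(prior[w] for w in C)

q1 = {w: posterior(cell(P1, w)) for w in states}
q2 = {w: posterior(cell(P2, w)) for w in states}
print(q1["alpha"], q2["alpha"])          # 1/2 and 1/3 at the true state alpha

def meet_member(w):
    """Member of the meet P1 ^ P2 containing w: all states reachable from w
    through chains of intersecting cells of P1 and P2."""
    reached = {w}
    while True:
        grown = set(reached)
        for C in P1 + P2:
            if C & grown:
                grown |= C
        if grown == reached:
            return reached
        reached = grown

M = meet_member("alpha")                 # here M is all four states
print({w for w in states if q1[w] == q1["alpha"]} >= M)   # True:  "q1 = 1/2" is common knowledge
print({w for w in states if q2[w] == q2["alpha"]} >= M)   # False: "q2 = 1/3" is not
```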
Worthy of note is the implicit assumption that the information partitions 𝒫1 and 𝒫2 are themselves common knowledge. Actually, this constitutes no loss of generality. Included in the full description of a state ω of the world is the manner in which information is imparted to the two persons. This implies that the information sets P1(ω) and P2(ω) are indeed defined unambiguously as functions of ω, and that these functions are known to both players.

3 Coarsest common refinement of 𝒫1 and 𝒫2.
4 Events whose (prior) probability does not vanish.
5 Finest common coarsening of 𝒫1 and 𝒫2.
Consider next the assumption of equal priors for different people. John Harsanyi
(1968) has argued eloquently that differences in subjective probabilities should be
traced exclusively to differences in information—that there is no rational basis
for people who have always been fed precisely the same information to maintain
different subjective probabilities. This, of course, is equivalent to the assumption
of equal priors. The result of this paper might be considered evidence against this
view, as there are in fact people who respect each other’s opinions and nevertheless
disagree heartily about subjective probabilities. But this evidence is not conclusive:
even people who respect each other’s acumen may ascribe to each other errors in
calculating posteriors. Of course we do not mean simple arithmetical mistakes, but
rather systematic biases such as those discussed by Tversky and Kahneman (1974).
In private conversation, Tversky has suggested that people may also be biased
because of psychological factors that may make them disregard information that
is unpleasant or does not conform to previously formed notions.
There is a considerable literature about reaching agreement on subjective proba-
bilities; a recent paper is DeGroot (1974), where a bibliography on the subject may
be found. A “practical” method is the Delphi technique (see, e.g., Dalkey (1972)).
It seems to me that the Harsanyi doctrine is implicit in much of this literature;
reconciling subjective probabilities makes sense if it is a question of implicitly
exchanging information, but not if we are talking about “innate” differences in
priors. The result of this paper might be considered a theoretical foundation for
the reconciliation of subjective probabilities.
As an illustration, suppose 1 and 2 have a uniform prior on the parameter of a coin, and let A be the event that the coin will come up H (heads) on the next toss. Suppose that each person is permitted to make one previous toss, and that these tosses come up H and T (tails) respectively. If each one's information consists precisely of the outcome of his toss, then the posteriors for A will be 2/3 and 1/3 respectively. If each one then informs the other one of his posterior, then they will both conclude that the previous tosses came up once H and once T, so that both posteriors will be revised to 1/2.
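For readers who wish to reproduce these numbers, a uniform prior on the coin's parameter gives the predictive probabilities of Laplace's rule of succession; the few lines below are an added sketch (not the paper's) computing the posteriors quoted here and in the four-toss variant discussed next.

```python
from fractions import Fraction

def prob_next_heads(heads, tails):
    """Posterior probability of heads on the next toss under a uniform prior on the
    coin's parameter (Laplace's rule of succession)."""
    return Fraction(heads + 1, heads + tails + 2)

print(prob_next_heads(1, 0), prob_next_heads(0, 1))   # 2/3 and 1/3 after one toss each
print(prob_next_heads(1, 1))                          # 1/2 after pooling H and T
print(prob_next_heads(3, 1), prob_next_heads(1, 3))   # 2/3 and 1/3 again after HHHT and HTTT
```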
Suppose now that each person is permitted to make several previous tosses, but
that neither one knows how many tosses are allowed to the other one. For example,
perhaps both make 4 tosses, which come up HHHT for 1, and HTTT for 2. They
then inform each other that their posteriors are 2/3 and 1/3 respectively. Now these
posteriors may result from a single observation, from 4 observations, or from more.
Since neither one knows on what observations the other’s posterior is based, he may
be inclined to give more weight to his own observations. Some revision of posteriors
would certainly be called for even in such a case; but it does not seem clear that it
would necessarily lead to equal posteriors.
Presumably, such a revision would take into account each person’s prior on the
number of tosses available to him and to the other person. By assumption these two
priors are the same, but each person gets additional private information—namely,
the actual number of tosses he is allotted. By use of the prior and the information
that the posteriors are, respectively, 2/3 and 1/3, new posteriors may be calculated. If
the players inform each other of these new posteriors, further revision may be called
for. Our result implies that the process of exchanging information on the posteriors
for A will continue until these posteriors are equal.
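The exchange process described in this closing paragraph can be simulated on any finite model with a common prior. The sketch below is an added illustration, in the spirit of the agreement dialogues studied in the later literature: at each round both agents announce their current posteriors for A, and each then conditions on what the other's announcement reveals (that is, refines his partition by the level sets of the other's posterior function). The model at the bottom is made up purely for the demonstration.

```python
from fractions import Fraction

def cell(partition, w):
    return next(C for C in partition if w in C)

def posterior(prior, A, C):
    return sum(prior[w] for w in C if w in A) / sum(prior[w] for w in C)

def refine(partition, label):
    """Split every cell of a partition according to the value of `label` on its states."""
    new = []
    for C in partition:
        groups = {}
        for w in C:
            groups.setdefault(label(w), set()).add(w)
        new.extend(groups.values())
    return new

def exchange(prior, A, P1, P2, true_state, rounds=20):
    """Repeatedly announce both posteriors for A at the true state; after each round,
    each agent refines his information by the other's announcement."""
    for _ in range(rounds):
        q1 = lambda w: posterior(prior, A, cell(P1, w))
        q2 = lambda w: posterior(prior, A, cell(P2, w))
        print(q1(true_state), q2(true_state))
        new_P1, new_P2 = refine(P1, q2), refine(P2, q1)
        if new_P1 == P1 and new_P2 == P2:
            break            # nothing more is revealed; the announced posteriors now agree
        P1, P2 = new_P1, new_P2

# A made-up example: nine equally likely states, A = {0, 3, 6}, true state 0.
prior = {w: Fraction(1, 9) for w in range(9)}
exchange(prior, {0, 3, 6},
         [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}],
         [{0, 1, 2, 3}, {4, 5, 6, 7}, {8}],
         true_state=0)
```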
Chapter 41
Epistemic Conditions for Nash Equilibrium
Robert J. Aumann and Adam Brandenburger

Introduction
Game theoretic reasoning has been widely applied in economics in recent years.
Undoubtedly, the most commonly used tool has been the strategic equilibrium of
Nash (1951), or one or another of its so-called “refinements.” Though much effort1
has gone into developing these refinements, relatively little attention has been paid
to a more basic question: Why consider Nash equilibrium in the first place?
A Nash equilibrium is defined as a way of playing the game—an n-tuple of
strategies—in which each player’s strategy is optimal for him, given the strategies
of the others. The definition seems beautifully simple and compelling; but when
considered carefully, it lacks a clear motivation. What would make the players play
such an equilibrium? How, exactly, would it come about?
Over the years, much has been written about the connection between Nash
equilibrium and common knowledge.2 According to conventional wisdom, Nash
equilibrium is "based on" common knowledge of (a) the structure of the game (i.e., the payoff functions3), (b) the rationality4 of the players, and (c) the strategies actually played. These ideas sound appealing; the circularity of Nash equilibrium—each player chooses his strategy only because the others choose theirs—does seem related to the infinite hierarchy of beliefs inherent in common knowledge. But a formalization has proved elusive. What, precisely, does "based on" mean? Can the above wisdom be turned into a theorem?

1 Selten (1965), (1975), Myerson (1978), Kreps and Wilson (1982), Kohlberg and Mertens (1986), and many others.
2 An event is called common knowledge if all players know it, all know that all know it, and so on ad infinitum (Lewis 1969).

R.J. Aumann
Federmann Center for the Study of Rationality, The Hebrew University of Jerusalem, Jerusalem, Israel
e-mail: [email protected]

A. Brandenburger
Stern School of Business, Tandon School of Engineering, NYU Shanghai, New York University, New York, NY 10012, USA
e-mail: [email protected]; adambrandenburger.com
It is our purpose here to clarify these issues in a formal framework. Specifically,
we seek epistemic conditions for a given strategy profile to be a Nash equilibrium:
conditions involving what the players know or believe about one another—in
particular, about payoff functions, strategy choices, decision procedures, and beliefs
about these matters.5
Surprisingly, we will find that common knowledge of the payoff functions and of
rationality are never needed; much weaker epistemic conditions suffice. Common
knowledge of the strategies actually being played is also irrelevant. What does turn
out to be relevant is common knowledge of the beliefs that the players hold about
the strategies of the others; but here, too, common knowledge is relevant only when
there are at least three players.
The main results are informally described in section “Description of the results.”
The background for a formal presentation is given in section “Interactive belief
systems”; the underlying tool is that of an interactive belief system, which provides
a framework for formulating epistemic conditions. Illustrations of such systems
are given in section “Illustrations.” The formal statements and proofs of the
main results, in section “Formal statements and proofs of the theorems,” are
preceded by some lemmas in section “Properties of belief systems.” Sections
“The main counterexamples” and “Additional counterexamples” contain a series of
counterexamples showing the results to be sharp; the examples—particularly those
of section “The main counterexamples”—provide insight into the role played by the
various epistemic conditions. Section “General (infinite) belief systems” shows that
the results apply to infinite as well as finite belief systems. Section “Discussion”
is devoted to a discussion of conceptual aspects and of the related literature. An
appendix treats extensions and converses of the main results.
The reader wishing to understand just the main ideas should read sections
“Description of the results” and “The main counterexamples,” and skim sections
“Interactive belief systems” and “Illustrations.”
3 See section "Structure of the game" for a discussion of why the payoff functions can be identified with the "structure of the game."
4 I.e., that the players are optimizers; that given the opportunity, they will choose a higher payoff. A formal definition is given below.
5 Other epistemic conditions for Nash equilibrium have been obtained by Armbruster and Boege (1979) and Tan and Werlang (1988).
Description of the Results
An event is called mutual knowledge if all players simply know it (to be distin-
guished from common knowledge, which also requires higher knowledge levels—
knowledge about knowledge, and so on). Our first and simplest result is Theorem
41.1: Suppose that each player is rational and knows his own payoff function, and
that the strategy choices of the players are mutually known. Then these choices
constitute a Nash equilibrium in the game being played.
The proof is immediate: Since each player knows the choices of the others, and is
rational, his choice must be optimal given theirs; so by definition, we are at a Nash
equilibrium.
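The check in this proof is purely mechanical and is easy to code. The sketch below is an added illustration: it verifies whether a given action profile of a finite strategic game is a Nash equilibrium by testing each player's action against all of his deviations; the game used for the demonstration is the two-person coordination game that appears later as Fig. 41.1.

```python
def is_nash(action_sets, payoff, profile):
    """True if every player's action in `profile` is optimal against the others'."""
    for i in range(len(profile)):
        deviations = (profile[:i] + (b,) + profile[i + 1:] for b in action_sets[i])
        if payoff(i, profile) < max(payoff(i, d) for d in deviations):
            return False
    return True

# The game of Fig. 41.1: payoffs (Rowena, Colin) for each (row action, column action).
table = {("C", "c"): (2, 2), ("C", "d"): (0, 0), ("D", "c"): (0, 0), ("D", "d"): (1, 1)}
payoff = lambda i, prof: table[prof][i]
action_sets = [("C", "D"), ("c", "d")]

print(is_nash(action_sets, payoff, ("D", "d")))   # True: (D, d) is a Nash equilibrium
print(is_nash(action_sets, payoff, ("C", "d")))   # False
```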
Note that neither the players’ rationality, nor their payoff functions, nor their
strategy choices are assumed common knowledge. For strategies, only mutual
knowledge is assumed. For rationality and the structure of the game, not even mutual
knowledge is assumed; only that the players are in fact rational, and know their own
payoff functions.6
Theorem 41.1 applies to all pure strategy profiles. It applies also to mixed strategy
profiles, under the traditional view of mixed strategies as conscious randomizations;
in that case, of course, it is the mixtures that must be mutually known, not just their
pure realizations.
In recent years, a different view of mixed strategies has emerged.7 In this view,
players do not randomize; each player chooses some definite pure strategy. But the
other players need not know which one, and the mixture represents their uncertainty,
their probability assessment of his choice. This is the view adopted in the sequel;
it fits in well with the Bayesian approach to game theory, in which uncertainty
about strategic choices of others is, like any other uncertainty, subject to probability
assessment by each player.
For brevity, let us refer to pure strategies as actions. Define the conjecture of a
player as his probability assessment of the actions of the other players. Call a player
rational if his action maximizes his expected payoff given his conjecture.
When there are two players (and only then), the conjecture of each player is a
mixed strategy of the other player. Because of this, the results in the two-person and
n-person cases are quite different. For two-person games, we have the following
(Theorem 41.2): Suppose that the game being played (i.e., both payoff functions),
the rationality of the players, and their conjectures are all mutually known. Then
the conjectures constitute a Nash equilibrium.
Theorem 41.2 differs from Theorem 41.1 in two ways. First, in both the
conclusion and the hypothesis, strategy choices are replaced by conjectures; thus
we get “conjectural equilibrium”—an equilibrium in conjectures, not in strategies
actually played. Second, the hypothesis calls not just for the fact of rationality, but for mutual knowledge of this fact, and for mutual knowledge of the payoff functions. But common knowledge still does not enter the picture.

6 When a game is presented in strategic form, as here, knowledge of one's own payoff function may be considered tautologous. See section "Interactive belief systems."
7 Harsanyi (1973), Armbruster and Boege (1979), Aumann (1987), Tan and Werlang (1988), Brandenburger and Dekel (1989), among others.
Since we are now viewing mixed strategies as conjectures, it is natural that
conjectures replace choices in the result. So with n players, too, one might expect
a theorem roughly analogous to Theorem 41.2; i.e., that mutual knowledge of the
conjectures (when combined with appropriate assumptions about rationality and the
payoff functions) is sufficient for them to be in equilibrium. But here we are in for
a surprise: when n > 2, the conditions for a conjectural equilibrium become much
more stringent.
To understand the situation, note that the conjecture of each player i is a probability mixture of (n − 1)-tuples of pure strategies of the other players. So when n > 2, it is not itself a mixed strategy; however, it induces a mixed strategy8 for each player
j other than i, called i’s conjecture about j. One difficulty is that different players
other than j may have different conjectures about j, in which case it is not clear how
to define j’s component of the conjectural equilibrium we seek to construct.
To present Theorem 41.3, the n-person “conjecture theorem”, one more concept
is needed. We say that the players have a common prior9 if all differences between
their probability assessments are due only to differences in their information; more
precisely, if there is an outside observer O with no private information,10 such that
for all players i, if O were given i’s information, his probability assessments would
be the same as i’s.
Theorem 41.3 is now as follows: In an n-player game, suppose that the players have a common prior, that their payoff functions and their rationality are mutually known, and that their conjectures are commonly known. Then for each player j, all the players i agree on the same conjecture σj about j, and the resulting profile (σ1, …, σn) of mixed strategies is a Nash equilibrium.
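A player's conjecture about an individual opponent is simply the marginal of his overall conjecture, and the difficulty mentioned above is that these marginals need not coincide across players. The sketch below is an added illustration with made-up numbers for a three-player game: it computes the marginal of an overall conjecture on one opponent, and in this hypothetical case players 1 and 2 end up with different conjectures about player 3, so there is no obvious candidate for player 3's component of an equilibrium.

```python
from fractions import Fraction

def conjecture_about(overall, player):
    """Marginal of an overall conjecture on one opponent. The overall conjecture is a
    dict mapping tuples of (opponent, action) pairs to probabilities."""
    marginal = {}
    for profile, p in overall.items():
        action = dict(profile)[player]
        marginal[action] = marginal.get(action, 0) + p
    return marginal

# Hypothetical overall conjectures of players 1 and 2 about the other two players.
phi_1 = {(("2", "L"), ("3", "W")): Fraction(2, 3), (("2", "L"), ("3", "E")): Fraction(1, 3)}
phi_2 = {(("1", "U"), ("3", "W")): Fraction(1, 2), (("1", "U"), ("3", "E")): Fraction(1, 2)}

print(conjecture_about(phi_1, "3"))   # player 1's conjecture about 3: 2/3 on W, 1/3 on E
print(conjecture_about(phi_2, "3"))   # player 2's conjecture about 3: 1/2 on W, 1/2 on E
```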
The above three theorems give sufficient epistemic conditions for Nash equi-
librium. The conditions are not necessary; it is always possible for the players
to blunder into a Nash equilibrium “by accident,” so to speak, without anybody
knowing much of anything. Nevertheless, all three theorems are “sharp,” in the sense
that they cannot be improved upon; none of the conditions can be dispensed with,
or, so far as we can see, significantly weakened.
The presentation in this section, while correct, has been informal. For a formal
presentation, one needs a framework for describing “epistemic” situations in game
contexts; in which, for example, one can describe a situation where each player
maximizes against the choices of the others, all know this, but not all know that all
know this. Such frameworks are available in the differential information literature;
a particular adaptation is presented in the next section.
8 The marginal on j's strategy space of i's overall conjecture.
9 Aumann (1987); for a formal definition, see section "Interactive belief systems." Harsanyi (1967–1968) uses the term "consistency" to describe this situation.
10 That is, what O knows is common knowledge among the players; each player knows everything that O knows.
Interactive Belief Systems
Let us be given a strategic game form; that is, a finite set {1, …, n} (the players), together with an action set Ai for each player i. Set A := A1 × ⋯ × An. An interactive belief system (or simply belief system) for this game form is defined to consist of:
(1) for each player i, a set Si (i's types), and for each type si of i,
(2) a probability distribution on the set S−i of (n − 1)-tuples of types of the other players (si's theory),
(3) an action ai for i (si's action), and
(4) a function gi : A → R (si's payoff function).
The action sets Ai are assumed finite. One may also think of the spaces Si as
finite; the ideas are then more transparent. For a general definition, where the Si are
measurable spaces and the theories are probability measures,11 see section “General
(infinite) belief systems.”
A belief system is a formal description of the players’ beliefs—about each other’s
actions and payoff functions, about these beliefs, and so on. Specifically, the theory
of a type si represents the probabilities that si ascribes to the types of the other
players, and so to their actions, their payoff functions, and their theories. See section
“Belief systems” for further discussion.
Set S := S1 × ⋯ × Sn. Call the members s = (s1, …, sn) of S states of the world, or simply states. An event is a subset E of S. Denote by p(·; si) the probability distribution on S induced by si's theory; formally, if E is an event, then p(E; si) is the probability assigned by si to {s−i ∈ S−i : (si, s−i) ∈ E}.

A function g : A → Rⁿ (an n-tuple of payoff functions) is called a game. Set A−i := A1 × ⋯ × Ai−1 × Ai+1 × ⋯ × An; for a in A set a−i := (a1, …, ai−1, ai+1, …, an). When referring to player i, the phrase "at s" means "at si". Thus, "i's action at s" means si's action (see (3)); we denote it ai(s), and write a(s) for the n-tuple (a1(s), …, an(s)) of actions at s. Similarly, "i's payoff function at s" means si's payoff function (see (4)); we denote it gi(s), and write g(s) for the n-tuple (g1(s), …, gn(s)) of payoff functions12 at s. Viewed as a function of a, we call g(s) "the game being played at s," or simply "the game at s."
11 Readers unfamiliar with measure theory may think of the type spaces Si as finite throughout the paper. All the examples involve finite Si only. The results, too, are stated and proved without reference to measure theory, and may be understood completely in terms of finite Si. On the other hand, we do not require finite Si; the definitions, theorems, and proofs are all worded so that when interpreted as in section "General (infinite) belief systems," they apply without change to the general case. One can also dispense with finiteness of the action spaces Ai; but that is both more involved and less important, and we will not do it here.
12 Thus i's actual payoff at the state s is gi(s)(a(s)).
Functions defined on S (like ai(s), a(s), gi(s), and g(s)) may be viewed like random variables in probability theory. Thus if x is such a function and x is one of its values, then [x = x], or simply [x], denotes the event {s ∈ S : x(s) = x}. For example, [ai] denotes the event that i chooses the action ai; and [g] denotes the event that the game g is being played.
A conjecture φi of i is a probability distribution on A−i. For j ≠ i, the marginal of φi on Aj is called the conjecture of i about j induced by φi. The theory of i at a state s yields a conjecture φi(s), called i's conjecture at s, given by φi(s)(a−i) := p([a−i]; si). We denote the n-tuple (φ1(s), …, φn(s)) of conjectures at s by φ(s).

Player i is called rational at s if his action at s maximizes his expected payoff given his information (i.e., his type si); formally, letting hi := gi(s) and bi := ai(s), this means that Exp(hi(bi, a−i) | si) ≥ Exp(hi(ai, a−i) | si) for all ai in Ai. Another way of saying this is that i's actual choice bi maximizes the expectation of his actual payoff hi when the other players' actions are distributed according to his actual conjecture φi(s).
Player i is said to know an event E at s if at s, he ascribes probability 1 to E. Define Ki E as the set of all those s at which i knows E. Set K¹E := K1E ∩ ⋯ ∩ KnE; thus K¹E is the event that all players know E. If s ∈ K¹E, call E mutually known at s. Set CKE := K¹E ∩ K¹K¹E ∩ K¹K¹K¹E ∩ ⋯; if s ∈ CKE, call E commonly known at s.

A probability distribution P on S is called a common prior if for all players i and all of their types si, the conditional distribution of P given si is p(·; si); this implies that for all i, all events E and F, and all numbers λ,
(5) if p(E; si) = λ p(F; si) for all si ∈ Si, then P(E) = λ P(F).
In words, (5) says that for each player i, if two events have proportional probabilities given any si, then they have proportional prior probabilities.
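These definitions reduce to finite set operations, and it can be helpful to see them executed. The sketch below is an added illustration: it sets up a tiny hypothetical two-player belief system (types and theories), implements Ki, K¹ ("everyone knows") and CK by iterating K¹ to a fixed point, and checks whether a candidate distribution P is a common prior by testing that conditioning P on each type reproduces that type's theory. All names and numbers are made up for the demonstration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical types and theories: each type's theory is a distribution over the
# other player's types.
S1, S2 = ["a1", "a2"], ["b1", "b2"]
states = list(product(S1, S2))
theory = {1: {"a1": {"b1": Fraction(2, 3), "b2": Fraction(1, 3)},
              "a2": {"b1": Fraction(0), "b2": Fraction(1)}},
          2: {"b1": {"a1": Fraction(1), "a2": Fraction(0)},
              "b2": {"a1": Fraction(1, 2), "a2": Fraction(1, 2)}}}

def p(E, i, t):
    """p(E; t): probability that type t of player i assigns to the event E (a set of states)."""
    if i == 1:
        return sum(q for u, q in theory[1][t].items() if (t, u) in E)
    return sum(q for u, q in theory[2][t].items() if (u, t) in E)

def K(i, E):
    """K_i E: the set of states at which player i ascribes probability 1 to E."""
    return {s for s in states if p(E, i, s[i - 1]) == 1}

def mutual(E):                      # K^1 E: everyone knows E
    return K(1, E) & K(2, E)

def CK(E):                          # commonly known: intersect K^1 E, K^1 K^1 E, ...
    F = mutual(E)
    while True:
        G = F & mutual(F)
        if G == F:
            return F
        F = G

def is_common_prior(P):
    """Conditioning P on each type must reproduce that type's theory."""
    for t in S1:
        m = sum(P[(t, u)] for u in S2)
        if any(P[(t, u)] != m * theory[1][t][u] for u in S2):
            return False
    for t in S2:
        m = sum(P[(u, t)] for u in S1)
        if any(P[(u, t)] != m * theory[2][t][u] for u in S1):
            return False
    return True

E = {("a1", "b1"), ("a1", "b2")}    # the event "player 1 is of type a1"
print(K(2, E), CK(E))               # 2 knows E only at his type b1; E is nowhere commonly known

P = {("a1", "b1"): Fraction(1, 2), ("a1", "b2"): Fraction(1, 4),
     ("a2", "b1"): Fraction(0), ("a2", "b2"): Fraction(1, 4)}
print(is_common_prior(P))           # True: these theories do come from a common prior
```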
Another regularity condition that is sometimes used in the differential informa-
tion literature is “mutual absolute continuity.” We do not define this here because
we have no use for it.
Belief systems provide a formal language for stating epistemic conditions. When
we say that a player knows some event E, or is rational, or has a certain conjecture
' i or payoff function gi , we mean that that is the case at some specific state s of the
world. Thus at s, Rowena may know E, but not know that Colin knows E; or at s, it
may be that Colin is rational, and that Rowena knows this, but that Colin does not
know that Rowena knows it. Some illustrations of these ideas are given in the next
section.
Illustrations
Example 41.1 We start with a belief system in which all types of each player i have
the same payoff function gi , namely, that depicted in Fig. 41.1. Thus the game being
played is commonly known. Call the row and column players (Players 1 and 2)
“Rowena” and “Colin” respectively.
Fig. 41.1 c d
C 2, 2 0, 0
D 0, 0 1, 1
Fig. 41.2
        c1           d1           d2
C1   2/5, 2/3    3/5, 3/5        0, 0
D1   1/2, 1/3       0, 0      1/2, 1/2
D2      0, 0     2/3, 2/5     1/3, 1/2
The theories are depicted in Fig. 41.2; here C1 denotes a type of Rowena whose
action is C, whereas D1 and D2 denote two different types of Rowena whose actions
are D. Similarly for Colin. Each square denotes a state, i.e., a pair of types. The
two entries in each square denote the probabilities that the corresponding types
of Rowena and Colin ascribe to that state. For example, Colin’s type d2 attributes
1/2 1/2 probabilities to Rowena’s type being D1 or D2 . So at state (D2 , d2 ), he
knows that Rowena will choose the action D. Similarly, Rowena knows at (D2 , d2 )
that Colin will choose d. Since d and D are optimal against each other, both players
are rational at (D2 , d2 ), and (D, d) is a Nash equilibrium.
We have here a typical instance of Theorem 41.1 (see the beginning of section
“Description of the results”), which also shows that the folk wisdom cited in the
introduction is misleading. At (D2 , d2 ), there is mutual knowledge of the actions
D and d, and both players are in fact rational. But the actions are not common
knowledge. Thus, though Colin knows that Rowena will play D, she doesn’t know
that he knows this; indeed, she attributes probability 2/3 to his attributing probability
3/5 to her playing C. Moreover, though both players are rational at (D2 , d2 ), there
isn’t even mutual knowledge of rationality there. For example, Colin’s type d1
chooses d, with an expected payoff of 2/5, rather than c, with an expected payoff
of 6/5; thus this type is “irrational.” At (D2 , d2 ), Rowena attributes probability 2/3
to Colin being of this irrational type.
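The two expected payoffs quoted for Colin's type d1 are easy to reproduce; the following few lines (an added sketch) use the payoffs of Fig. 41.1 together with d1's theory, which puts probability 3/5 on Rowena playing C and 2/5 on her playing D.

```python
from fractions import Fraction

colin_payoff = {("C", "c"): 2, ("C", "d"): 0, ("D", "c"): 0, ("D", "d"): 1}   # from Fig. 41.1
belief_of_d1 = {"C": Fraction(3, 5), "D": Fraction(2, 5)}

for own_action in ("c", "d"):
    value = sum(p * colin_payoff[(hers, own_action)] for hers, p in belief_of_d1.items())
    print(own_action, value)     # c gives 6/5, d gives 2/5, so choosing d is irrational for d1
```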
Note that the players have a common prior (Fig. 41.3). But Theorem 41.1 has nothing to do with common priors. If, for example, we change the theory of Rowena's type D2 from 2/3 d1 + 1/3 d2 to 1/2 d1 + 1/2 d2, then there is no longer a common prior; but (D, d) is still a Nash equilibrium, for the same reasons as above. (As usual,
Fig. 41.3 c1 d1 d2
C1 0.2 0.3 0
D1 0.1 0 0.1
D2 0 0.2 0.1
Fig. 41.4 h t
H 1, 0 0, 1
T 0, 1 1, 0
Fig. 41.5
        h1           t1           t2
H1   1/2, 1/2    1/2, 1/2        0, 0
T1   1/2, 1/2       0, 0       1/2, 1
T2      0, 0       1, 1/2        0, 0
that these are the conjectures). Moreover, the rationality of both players is mutually known. Thus Theorem 41.2 (section "Description of the results") implies that (1/2 H + 1/2 T, 1/2 h + 1/2 t) is a Nash equilibrium, which indeed it is.
Note that neither the conjectures of the players nor their rationality are commonly known. Indeed, at (T1, t2), Colin knows that Rowena plays T, so that his conjecture is not 1/2 H + 1/2 T but T; so it is irrational for him to play t, which yields him 0, rather than 1 that he could get by playing h. At the state (H1, h1), Colin attributes probability 1/2 to Rowena attributing probability 1/2 to the state (T1, t2); so at (H1, h1), there is common knowledge neither of Colin's conjecture 1/2 H + 1/2 T, nor of his rationality.
Fig. 41.6 h1 t1 t2
H1 0.2 0.2 0
T1 0.2 0 0.2
T2 0 0.2 0
Note, too, that like in the previous example, this belief system has a common prior (Fig. 41.6). But also like there, this is not essential; the discussion would not be affected if, say, we changed the theory of Rowena's type T2 from t1 to 1/2 t1 + 1/2 t2.
Properties of Belief Systems

In this section we formally establish some basic properties of belief systems, which are needed in the sequel. These properties are intuitively fairly obvious, and some are well known in various formalizations of interactive knowledge theory; so this section can be omitted at a first reading.
Lemma 41.1 Player i knows that he attributes probability π to an event E if and only if he indeed attributes probability π to E.

Proof If: Let F be the event that i attributes probability π to E; that is, F := {t ∈ S : p(E; ti) = π}. If s ∈ F, then p(E; si) = π, so all states u with ui = si are in F. Therefore p(F; si) = 1; that is, i knows F at s, so s ∈ Ki F.

Only if: Suppose that i attributes probability π′ ≠ π to E. By the "if" part of the proof, he must know this, contrary to his knowing that he attributes probability π to E.
Corollary 41.1 Let φ be an n-tuple of conjectures. Suppose that at some state s, it is mutually known that φ = φ. Then φ(s) = φ. (In words: if it is mutually known that the conjectures are φ, then they are indeed φ.)
Corollary 41.2 A player is rational if and only if he knows that he is rational.
Corollary 41.3 Ki Ki E = Ki E (a player knows something if and only if he knows that he knows it), and Ki ¬Ki E = ¬Ki E (a player doesn't know something if and only if he knows that he doesn't know it).
Lemma 41.2 Ki(E1 ∩ E2 ∩ ⋯) = Ki E1 ∩ Ki E2 ∩ ⋯ (a player knows each of several events if and only if he knows that they all obtain).

Proof At s, player i ascribes probability 1 to E1 ∩ E2 ∩ ⋯ if and only if he ascribes probability 1 to each of E1, E2, ….
Lemma 41.3 CKE ⊆ Ki CKE (if something is commonly known, then each player knows that it is commonly known).

Proof Since Ki K¹F ⊇ K¹K¹F for all F, Lemma 41.2 yields Ki CKE = Ki(K¹E ∩ K¹K¹E ∩ ⋯) = Ki K¹E ∩ Ki K¹K¹E ∩ ⋯ ⊇ K¹K¹E ∩ K¹K¹K¹E ∩ ⋯ ⊇ CKE.
Lemma 41.4 Suppose P is a common prior, Ki H ⊇ H, and p(E; si) = π for all s ∈ H. Then P(E ∩ H) = π P(H).

Proof Let Hi be the projection of H on Si. From Ki H ⊇ H it follows that p(H; si) = 1 or 0 according to whether si is or is not13 in Hi. So when si ∈ Hi, then p(E ∩ H; si) = p(E; si) = π = π p(H; si); and when si ∉ Hi, then p(E ∩ H; si) = 0 = π p(H; si). The lemma now follows from (5).

13 In particular, i always knows whether or not H obtains.
The following lemma is not needed for the proofs of the theorems, but relates to the examples in section "The main counterexamples." Set K²E := K¹K¹E, K³E := K¹K²E, and so on. If s ∈ KᵐE, call E mutually known of order m at s.

Lemma 41.5 KᵐE ⊆ Kᵐ⁻¹E for m > 1 (mutual knowledge of order m implies mutual knowledge of orders 1, …, m − 1).

Proof By Lemma 41.2 and Corollary 41.3, K²E = K¹K¹E = ∩ᵢ Ki K¹E ⊆ Ki K¹E = Ki(∩ⱼ Kj E) ⊆ Ki Ki E. Since this is so for all i, we get K²E ⊆ ∩ᵢ Ki E = K¹E. The result for m > 2 follows from this by substituting Kᵐ⁻²E for E.
Formal Statements and Proofs of the Theorems

We now formally state and prove Theorems 41.1, 41.2, and 41.3. For more transparent paraphrases (using the same terminology), see section "Description of the results."
Theorem 41.1 Let a be an n-tuple of actions. Suppose that at some state s, all players are rational, and all know that a = a. Then a is a Nash equilibrium.
Proof Immediate (see beginning of section “Description of the results”).
Theorem 41.2 With n = 2 (two players), let g be a game, φ a pair of conjectures. Suppose that at some state, it is mutually known that g = g, that the players are rational, and that φ = φ. Then (φ2, φ1) is a Nash equilibrium of g.
The proof uses a lemma; we state it for the n-person case, since it is needed again
in the proof of Theorem 41.3.
Lemma 41.6 Let g be a game, φ an n-tuple of conjectures. Suppose that at some state s, it is mutually known that g = g, that the players are rational, and that φ = φ. Let aj be an action of a player j to which the conjecture φi of some other player i assigns positive probability. Then aj maximizes gj against14 φj.
Proof By Corollary 41.1, the conjecture of i at s is φi. So i attributes positive probability at s to [aj]. Also, i attributes probability 1 at s to each of the three events [j is rational], [φj], and [gj]. When one of four events has positive probability, and the other three each have probability 1, then their intersection is non-empty. So there is a state t at which all four events obtain: j is rational, he chooses aj, his conjecture is φj, and his payoff function is gj. So aj maximizes gj against φj.
Proof of Theorem 41.2 By Lemma 41.6, every action a1 with positive probability in φ2 is optimal against φ1 in g, and every action a2 with positive probability in φ1 is optimal against φ2 in g. This implies that (φ2, φ1) is a Nash equilibrium of g.
Theorem 41.3 Let g be a game, φ an n-tuple of conjectures. Suppose that the players have a common prior, which assigns positive probability to it being mutually known that g = g, mutually known that all players are rational, and commonly known that φ = φ. Then for each j, all the conjectures φi of players i other than j induce the same conjecture σj for j, and (σ1, …, σn) is a Nash equilibrium of g.
The proof requires a lemma.
Lemma 41.7 Let Q be a probability distribution on A with15 Q(a) = Q(ai) Q(a−i) for all a in A and all i. Then Q(a) = Q(a1) ⋯ Q(an) for all a.

Proof By induction. For n = 1 and 2 the result is immediate. Suppose it true for n − 1. From Q(a) = Q(a1)Q(a−1) we obtain, by summing over an, that Q(a−n) = Q(a1) Q(a2, …, an−1). Similarly Q(a−n) = Q(ai) Q(a1, …, ai−1, ai+1, …, an−1) whenever i < n. So the induction hypothesis yields Q(a−n) = Q(a1)Q(a2) ⋯ Q(an−1). Hence Q(a) = Q(a−n) Q(an) = Q(a1) Q(a2) ⋯ Q(an).
Proof of Theorem 41.3 Set F := CK[φ], and let P be the common prior. By assumption, P(F) > 0. Set Q(a) := P([a] | F). We show that for all a and i,

Q(a) = Q(ai) Q(a−i).   (41.1)

Set H := [ai] ∩ F. By Lemmas 41.2 and 41.3, Ki H ⊇ H, since i knows his own action. If s ∈ H, it is commonly, and so mutually, known at s that φ = φ; so by Corollary 41.1, φ(s) = φ; that is, p([a−i]; si) = φi(a−i). So Lemma 41.4 (with E = [a−i]) yields P([a] ∩ F) = P([a−i] ∩ H) = φi(a−i) P(H) = φi(a−i) P([ai] ∩ F). Dividing by P(F) yields Q(a) = φi(a−i) Q(ai); then summing over ai, we get

Q(a−i) = φi(a−i).   (41.2)
Substituting (41.2) in the formula preceding it yields (41.1). So by Lemma 41.7, Q(a) = Q(a1) ⋯ Q(an) for all a; summing over ai gives Q(a−i) = ∏j≠i Q(aj). For each player j set σj(aj) := Q(aj). Then (41.2) yields

φi(a−i) = ∏j≠i σj(aj);   (41.3)

that is, the distribution φi is the product of the distributions σj with j ≠ i.

14 That is, Exp gj(aj, a−j) ≥ Exp gj(bj, a−j) for all bj in Aj, when a−j is distributed according to φj.
15 We denote Q(a−i) := Q(Ai × {a−i}), Q(ai) := Q(A−i × {ai}), and so on.
Since common knowledge implies mutual knowledge, the hypothesis of the theorem implies that there is a state at which it is mutually known that g = g, that players are rational, and that φ = φ. So by Lemma 41.6, each action aj with φi(aj) > 0 for some i ≠ j maximizes gj against φj. By (41.3), these aj are precisely the ones that appear with positive probability in σj. Again using (41.3), we conclude that each action appearing with positive probability in σj maximizes gj against the product of the distributions σk with k ≠ j. This implies that (σ1, …, σn) is a Nash equilibrium of g.
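Lemma 41.7, which drives this proof, is easy to experiment with numerically. The sketch below is an added illustration: for a distribution Q on three binary action sets it checks both the lemma's hypothesis (Q(a) = Q(ai)Q(a−i) for every single player i) and its conclusion (Q is the product of its one-dimensional marginals). A product distribution passes both tests, while a made-up correlated one fails both.

```python
from fractions import Fraction
from itertools import product

def marginal(Q, keep):
    """Marginal of Q on the coordinates listed in `keep`."""
    out = {}
    for a, q in Q.items():
        key = tuple(a[i] for i in keep)
        out[key] = out.get(key, 0) + q
    return out

def hypothesis(Q, n):
    """Q(a) = Q(a_i) * Q(a_-i) for every player i and every profile a."""
    for i in range(n):
        Qi, Qrest = marginal(Q, [i]), marginal(Q, [j for j in range(n) if j != i])
        if any(q != Qi[(a[i],)] * Qrest[tuple(x for j, x in enumerate(a) if j != i)]
               for a, q in Q.items()):
            return False
    return True

def conclusion(Q, n):
    """Q(a) = Q(a_1) * ... * Q(a_n) for every profile a."""
    Qs = [marginal(Q, [i]) for i in range(n)]
    for a, q in Q.items():
        prod_q = Fraction(1)
        for i in range(n):
            prod_q *= Qs[i][(a[i],)]
        if q != prod_q:
            return False
    return True

profiles = list(product([0, 1], repeat=3))
m = [{0: Fraction(1, 2), 1: Fraction(1, 2)},
     {0: Fraction(2, 3), 1: Fraction(1, 3)},
     {0: Fraction(1, 4), 1: Fraction(3, 4)}]
Q_independent = {a: m[0][a[0]] * m[1][a[1]] * m[2][a[2]] for a in profiles}
Q_correlated = {a: (Fraction(1, 2) if a[0] == a[1] else Fraction(0)) * Fraction(1, 2)
                for a in profiles}   # players 1 and 2 always match

print(hypothesis(Q_independent, 3), conclusion(Q_independent, 3))   # True True
print(hypothesis(Q_correlated, 3), conclusion(Q_correlated, 3))     # False False
```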
The Main Counterexamples

This section explores possible variations on Theorem 41.3 (the result giving sufficient epistemic conditions, when n ≥ 3, for the players' conjectures to yield a Nash equilibrium). For simplicity, let n = 3. Each player's "overall" conjecture is then a distribution on pairs of actions of the other two players; so the three conjectures form a triple of probability mixtures of action pairs. On the other hand, an equilibrium is a triple of mixed actions. Our discussion hinges on the relation between these two kinds of objects.
First, since our real concern is with mixtures of actions rather than of action
pairs, could we not formulate conditions that deal directly with each player’s
“individual” conjectures—his conjectures about each of the other players—rather
than with his overall conjecture? For example, one might hope that it would be
sufficient to assume common knowledge of each player’s individual conjectures.
Example 41.3 shows that this hope is vain, even when the priors are common and
rationality is commonly known. Overall conjectures do play an essential role.
Nevertheless, common knowledge of the overall conjectures seems a rather
strong assumption. Couldn’t we get away with less—say, with mutual knowledge
of the overall conjectures, or with mutual knowledge of a high order?
Again, the answer is no. In Example 41.4, there is mutual knowledge of overall
conjectures (which may be of an arbitrarily high order), common knowledge of
rationality, and common prior, but the individual conjectures do not constitute a
Nash equilibrium.
What drives this example is that different players have different individual
conjectures about some particular player j, so there isn’t even a clear candidate
for a Nash equilibrium.16 This raises the question of what happens when the players do happen to agree in their individual conjectures, in addition to there being (sufficiently high order) mutual knowledge of the overall conjectures. Do we then get Nash equilibrium?
Again, the answer is no; this is shown in Example 41.5.
Finally, Example 41.6 shows that the common prior assumption is really needed;
it exhibits a situation with common knowledge of the overall conjectures and of
rationality, where the individual conjectures agree; but there is no common prior,
and the agreed-upon individual conjectures do not form a Nash equilibrium.
Summing up, one must consider the overall conjectures, and nothing less than
common knowledge of these conjectures, together with common priors, will do.
Except in Example 41.6, the belief systems in this section have common priors,
and these are used to describe them. In all the examples, the game being played
is (like in section “Illustrations”) fixed throughout the belief system, and so is
commonly known. Each example has three players, Rowena, Colin, and Matt,
who choose the row, column, and matrix (west or east) respectively. As in section
“Illustrations,” each type is denoted by the same letter as its action, and a subscript
is added.
Example 41.3 Here the individual conjectures are commonly known and agreed
upon, rationality is commonly known, and there is a common prior, and yet we
don’t get Nash equilibrium. Consider the game of Fig. 41.7, with theories induced
by the common prior in Fig. 41.8.
At each state, Colin and Matt agree on the conjecture 1/2 U + 1/2 D about Rowena, and this is commonly known. Similarly, it is commonly known that Rowena and Matt agree on the conjecture 1/2 L + 1/2 R about Colin, and Rowena and Colin agree on 1/2 W + 1/2 E about Matt. All players are rational at all states, so rationality is common knowledge at all states. But (1/2 U + 1/2 D, 1/2 L + 1/2 R, 1/2 W + 1/2 E) is not a Nash equilibrium, because if these were independent mixed strategies, Rowena could gain by moving to D.
Fig. 41.7 L R L R
U 1, 1, 1 0, 0, 0 U 0, 0, 0 1, 1, 1
D 1, 0, 0 1, 1, 1 D 1, 1, 1 0, 0, 0
W E
16 It is not what drives Example 41.3; since the individual conjectures are commonly known there, they must agree (Aumann 1976).
Fig. 41.8 L1 R1 L1 R1
U1 1/4 0 U 0 1/4
D1 0 1/4 D 1/4 0
W1 E1
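The arithmetic behind "Rowena could gain by moving to D" is worth seeing next to her situation at an actual state. The sketch below (an added illustration) takes Rowena's payoffs from Fig. 41.7 and compares her expected payoffs against her actual correlated conjecture at (U1, L1, W1), which the common prior of Fig. 41.8 makes 1/2 LW + 1/2 RE, with her payoffs against the independent product of the agreed individual conjectures 1/2 L + 1/2 R and 1/2 W + 1/2 E.

```python
from fractions import Fraction

# Rowena's payoffs from Fig. 41.7, indexed by (her action, Colin's action, Matt's action).
u = {("U", "L", "W"): 1, ("U", "R", "W"): 0, ("D", "L", "W"): 1, ("D", "R", "W"): 1,
     ("U", "L", "E"): 0, ("U", "R", "E"): 1, ("D", "L", "E"): 1, ("D", "R", "E"): 0}

def expected(action, conjecture):
    return sum(p * u[(action, c, m)] for (c, m), p in conjecture.items())

correlated = {("L", "W"): Fraction(1, 2), ("R", "E"): Fraction(1, 2)}   # her conjecture at (U1, L1, W1)
independent = {(c, m): Fraction(1, 4) for c in "LR" for m in "WE"}      # product of the individual conjectures

print(expected("U", correlated), expected("D", correlated))     # 1 and 1/2: U is optimal for her there
print(expected("U", independent), expected("D", independent))   # 1/2 and 3/4: against the product, D does better
```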
Fig. 41.9
         L1         L2         L3
U1    0.4  W1    0.2  E1       0
U2       0       0.2  W2    0.1  E2
U3       0          0       0.1  W3
Note that the overall conjectures are not commonly (nor even mutually) known at any state. For example, at (U1, L1, W1), Rowena's conjecture is 1/2 LW + 1/2 RE, but nobody else knows that that is her conjecture.
Example 41.4 Here we have mutual knowledge of the overall conjectures, common
knowledge of rationality, and common priors; yet individual conjectures don’t agree,
so one can’t even identify a candidate for a Nash equilibrium.
Consider a three-person game in which Rowena and Colin each have just one
action—say U for Rowena and L for Colin—and Matt has two actions, W and E. The
payoffs are unimportant in this example,17 since we are only interested in showing
that the individual conjectures do not agree. Let the belief system be as in Fig. 41.9.
As usual, Rowena’s types are depicted by rows, Colin’s by columns. Matt’s types
are indicated in the individual boxes in the diagram; note that in this case, he knows
the true state.
Consider the state (U2, L2, W2), or simply W2 for short. At this state, Rowena's conjecture is 2/3 LW + 1/3 LE, Colin's is 1/2 UW + 1/2 UE, and Matt's is UL. Rowena knows Colin's and Matt's conjectures, as they are the same at the only other state (E2) that she considers possible.18 Similarly, Colin knows Rowena's and Matt's conjectures, as they are the same at the only other state (E1) that he considers possible. Matt knows Rowena's and Colin's conjectures, since he knows that the true state is W2.
17 They can easily be chosen so that rationality is common knowledge.
18 I.e., to which she assigns positive probability.
Fig. 41.10 (x = 1/46)
         L1         L2         L3         L4         L5
U1    16x  W1     8x  E1       0          0          0
U2       0        8x  W2     4x  E2       0          0
U3       0           0       4x  W3     2x  E3       0
U4       0           0         0        2x  W4      x  E4
U5       0           0         0          0         x  W5
So the conjectures are mutually known. Yet Rowena's conjecture for Matt, 2/3 W + 1/3 E, is different from Colin's, 1/2 W + 1/2 E.
The same idea yields examples of this kind with higher order mutual knowledge of the conjectures. See Fig. 41.10. At W3, there is third order mutual knowledge of the conjectures. By lengthening the staircase19 and choosing a state in its middle, one can get mutual knowledge of arbitrarily high order that Rowena's conjecture for Matt is 2/3 W + 1/3 E, while Colin's is 1/2 W + 1/2 E.
The perspicacious reader will have realized that this example is not intrinsically
game-theoretic. It really boils down to a question about “agreeing to disagree”:
Suppose that two people with the same prior get different information, and that
given this information, their posterior probabilities for some event A are mutual
knowledge of some order. Are the posterior probabilities then equal? (Here the
individuals are Rowena and Colin, and the event A is that Matt chooses W; but
actually that’s just window dressing.) An example in Aumann (1976) provides
a negative answer to the question in the case of first order mutual knowledge.
Geanakoplos and Polemarchakis (1982) showed that the answer is negative also
for higher order mutual knowledge. The ingenious example presented here is due to
John Geanakoplos.20
Example 41.5 Here we have mutual knowledge of the overall conjectures, agree-
ment of individual conjectures, common knowledge of rationality, and a common
19 Alternatively, one can use a single probability system with countably many states, represented by a staircase anchored at the top left and extending infinitely downwards to the right (Jacob's ladder?). By choosing a state sufficiently far from the top, one can get mutual knowledge of any given order.
20 Private communication. The essential difference between the 1982 example of Geanakoplos and Polemarchakis and the above example of Geanakoplos is that in the former, Rowena's and Colin's probabilities for A approach each other as the order m of mutual knowledge approaches ∞, whereas in the latter, they remain at 2/3 and 1/2 no matter how large m is.
Fig. 41.11 h t h t
H 1, 0, 3 0, 1, 0 H 1, 0, 2 0, 1, 2
T 0, 1, 0 1, 0, 3 T 0, 1, 2 1, 0, 2
W E
Fig. 41.12 (x = 1/52)
         h1        t1        h2        t2        h3        t3
H1    9x  W1    9x  E1       0         0         0         0
T1    9x  E1    9x  W1       0         0         0         0
H2       0         0      3x  W2    3x  W1       0         0
T2       0         0      3x  W1    3x  W2       0         0
H3       0         0         0         0        x  W3     x  W2
T3       0         0         0         0        x  W2     x  W3
prior, and yet the individual conjectures do not form a Nash equilibrium. Consider the game of Fig. 41.11. For Rowena and Colin, this is simply "matching pennies" (Fig. 41.4); their payoffs are not affected by Matt's choice. So at a Nash equilibrium, they must play 1/2 H + 1/2 T and 1/2 h + 1/2 t respectively. Thus Matt's expected payoff is 3/2 for W, and 2 for E; so he must play E. Hence (1/2 H + 1/2 T, 1/2 h + 1/2 t, E) is the unique Nash equilibrium of this game.
Consider now the theories induced by the common prior in Fig. 41.12. Rowena and Colin know which of the three "boxes" contains the true state, and in fact this is commonly known between the two of them. In each box, Rowena and Colin "play matching pennies optimally"; their conjectures about each other are 1/2 H + 1/2 T and 1/2 h + 1/2 t. Since these conjectures obtain at each state, they are commonly known (among all three players); so it is also commonly known that Rowena and Colin are rational.
21 Rather than the upper row, say.
Fig. 41.13 h t h t
H 3/8 1/8 H 0 0
T 1/8 3/8 T 0 0
W E
Fig. 41.14 h1 t1
W1
Example 41.6 Here we show that one cannot dispense with common priors in Theorem 41.3. Consider again the game of Fig. 41.11, with the theories depicted in Fig. 41.14 (presented in the style of Figs. 41.2 and 41.5; note that Matt has no type23 whose action is E). At each state there is common knowledge of rationality, of overall conjectures (which are the same as in the previous example), and of the game. As before, Arrow's condition is satisfied, and it follows that the individual conjectures are in agreement. And as before, the individual conjectures (1/2 H + 1/2 T, 1/2 h + 1/2 t) do not constitute a Nash equilibrium.
22 Though similar in form, this condition neither implies nor is implied by common priors. We saw in Example 41.4 that common priors do not even imply agreement between individual forecasts; a fortiori, they do not imply Arrow's condition. In the opposite direction, Example 41.6 satisfies Arrow's condition, but has no common prior.
23 If one wishes, one can introduce a type E1 of Matt to which Rowena's and Colin's types ascribe probability 0, and whose theory is, say, 1/4 Hh + 1/4 Tt + 1/4 Th + 1/4 Ht.
Fig. 41.15 h1 t1
H1 1/6 1/6
T1 1/3 1/3
Fig. 41.16
Game T:    S        Game B:    S
   U     1, 1          U     0, 1
   D     0, 0          D     1, 0
Additional Counterexamples
In the previous section we saw that the assumption of a common prior and of
common knowledge of the conjectures are essential in Theorem 41.3. This section
explores the assumption of mutual knowledge of rationality and of the game being
played (in both Theorems 41.2 and 41.3), and shows that they, too, cannot be
substantially weakened.
Example 41.7 Here we show that in Theorems 41.2 and 41.3, mutual knowledge of
rationality cannot be replaced by the simple fact of rationality (as in Theorem 41.1).
Consider again the game of “matching pennies” (Fig. 41.4), this time with theories
induced by the common prior in Fig. 41.15.
At the state (H1, h1), Colin's and Rowena's conjectures are commonly known to be 1/3 H + 2/3 T and 1/2 h + 1/2 t respectively, and both are in fact rational (indeed Rowena's rationality is commonly known); but (1/3 H + 2/3 T, 1/2 h + 1/2 t) is not a Nash equilibrium, since the only equilibrium of "matching pennies" is (1/2 H + 1/2 T, 1/2 h + 1/2 t). Note that at the state (H1, h1), Rowena does not know that Colin is rational.
Example 41.8 Here we show that in Theorems 41.2 and 41.3, knowing one’s own
payoff function does not suffice; one needs mutual knowledge of all the payoff
functions. Consider a two-person belief system where one of two games, T (top)
or B (bottom), is being played (Fig. 41.16).
Fig. 41.17 S1
TU1 1/2
BD1 1/2
Fig. 41.18
Game T:    S        Game B:    S
   U     0, 1          U     1, 1
   D     0, 0          D     0, 0
The theories are given by the common prior in Fig. 41.17. Thus Rowena knows
Colin’s payoff function, but Colin does not know Rowena’s. Rowena’s type TU1
has the Top payoff function and plays Up, whereas BD1 has the Bottom payoff
function and plays Down. Colin has just a single type, S1 . At both states, both players
are rational: Rowena, who knows the game, always plays an action that is strictly
dominant in the true game; Colin has no choice, so what he does is rational. So
there is common knowledge of rationality. At both states, Colin's conjecture about Rowena is 1/2 U + 1/2 D; Rowena's conjecture about Colin is S. But (1/2 U + 1/2 D, S) is not a Nash equilibrium in either of the games; Rowena prefers U in the top game, D in the bottom game.
Example 41.9 Here we show that the hypotheses of Theorem 41.3 do not imply
that rationality is commonly known (as is the case when the game g is commonly
known—see Proposition 41.A1). Consider a two-person belief system where one of
the two games of Fig. 41.18 is being played. The theories are given by the common
prior in Fig. 41.19. Of Rowena’s three types, TU1 and TD1 are rational, whereas
BD1 (who plays Down in Game B) is irrational. Colin’s two types, S1 and S2 , differ
only in their theories, and both are rational. The conjectures 1/2 U + 1/2 D and S are common knowledge at all states. At the state (TU1, S1), it is mutually known that T
Fig. 41.19
          S1      S2
TU1      1/4     1/4
TD1      1/4      0
BD1       0      1/4
is the game being played, and that both players are rational; but neither rationality
nor the game being played are commonly known.
General (Infinite) Belief Systems

For a general definition of a belief system, we specify that the type spaces Si be measurable spaces. As before, a theory is a probability measure on S−i = ∏j≠i Sj, which is now endowed with the standard product structure.24 The state space S = ∏j Sj, too, is endowed with the product structure. An event is now a measurable subset of S. The "action functions" ai ((3)) are assumed measurable; so are the payoff functions gi ((4)), as functions of si, for each action n-tuple a separately. Also the "theory functions" ((2)) are assumed measurable, in the sense that for each event E and player i, the probability p(E; si) is measurable as a function of the type si. It follows that also the conjectures φi are measurable functions of si.
With these definitions, the statements of the results make sense, and the proofs
remain correct, without any change.
Discussion
An interactive belief system is not a prescriptive model; it does not suggest actions
to the players. Rather, it is a formal framework—a language—for talking about
actions, payoffs, and beliefs. For example, it enables us to say whether a given player
is behaving rationally at a given state, whether this is known to another player, and
so on. But it does not prescribe or even suggest rationality; the players do whatever they do. Like the disk operating system of a personal computer, the belief system simply organizes things, so that we can coherently discuss what they do.

24 The σ-field of measurable sets is the smallest σ-field containing all the "rectangles" ∏j≠i Tj, where Tj is measurable in Sj.
Though entirely apt, use of the term “state of the world” to include the actions
of the players has perhaps caused confusion. In Savage (1954), the decision maker
cannot affect the state; he can only react to it. While convenient in Savage’s one-
person context, this is not appropriate in the interactive, many-person world under
study here. Since each player must take into account the actions of the others, the
actions should be included in the description of the state. Also the plain, everyday
meaning of the term “state of the world” includes one’s actions: Our world is shaped
by what we do.
It has been objected that prescribing what a player must do at a state takes away
his freedom. This is nonsensical; the player may do what he wants. It is simply that
whatever he does is part of the description of the state. If he wishes to do something
else, he is heartily welcome to do it, but he thereby changes the state.
Historically, belief systems were introduced by John Harsanyi (1967–1968), to
enable a coherent formulation of games in which the players need not know each
other’s payoff functions. To analyze such games, it is not enough to specify each
player’s beliefs about (i.e., probability distributions on) the payoff functions of the
others; one must also specify the beliefs of the players about the beliefs of the
players about the payoff functions, the beliefs of the players about these beliefs,
and so on ad infinitum. This complicated infinite regress seemed to make useful
analysis very difficult.
Harsanyi’s ingenious solution was to think of each player as being one of several
possible “types,” where a type determines both a player’s own payoff function and
a belief about the types of the others. The belief of a player about the types of
the others induces a belief about their payoff functions; it also induces a belief
about their beliefs about the types, and so a belief about the beliefs about the
payoff functions. The reasoning continues indefinitely. Thus from an I-game (“I”
for incomplete information), as Harsanyi calls his type-model, one can read off the
entire infinite regress of beliefs.
Belief systems as defined in section “Interactive belief systems” are formally just
like Harsanyi’s I-games, except that in belief systems, a player’s type determines
his action as well as his payoff function and his belief about other players’ types.25
As above, it follows that the player’s type determines his entire belief hierarchy—
i.e., the entire infinite regress of his beliefs about actions, beliefs about beliefs
about actions, and so on—in addition to the infinite regress of beliefs about payoff
functions, and how these two kinds of beliefs affect each other.
Traditionally, payoff functions have been treated as exogenous, actions as
endogenous. It was thought that unlike payoff functions, actions should be “pre-
dicted” by the theory. Belief systems wipe out this distinction; they treat uncertainty
about actions just like uncertainty about payoff functions. Indeed, in this paper the focus is on actions;26 uncertainty about payoff functions was included as an afterthought, because we realized that more comprehensive results can be obtained at almost no cost in the complexity of the proofs.

25 For related ideas, see Armbruster and Boege (1979), Boege and Eisele (1979), and Tan and Werlang (1988).
Is the belief system itself common knowledge among players? If so, how does it
get to be common knowledge? If not, how do we take into account the players’
uncertainty about it?
A related question is whether the belief system is exogenous, like a game or a
market model. If not, where does it come from?
The key to these issues was provided27 in a fundamental paper of Mertens and
Zamir (1985). They treat the Harsanyi case, in which the “underlying variables”28
are the payoff functions only (see (a) above); but the result applies without change
to our situation, where actions as well as payoff functions are underlying variables.
At (a) above, we explained how each type in a belief system determines a belief
hierarchy. Mertens and Zamir reverse this procedure: They start with the belief
hierarchies, and construct belief systems from them. Specifically, they define the
universal belief space as a belief system in which the type space of each player is
simply the set of all his belief hierarchies that satisfy certain minimal consistency
conditions. Thus the universal belief space is not exogenously given, like a game
or market; rather, it is an analytic tool, like the payoff matrix of a game originally
given in extensive form. It follows that the universal belief space may be considered
common knowledge.
Though the universal belief space is infinite, it is the disjoint union of infinitely
many “common knowledge components,”29 many of which are finite. Mertens and
Zamir call any union of such components a subspace of the universal belief space.
It follows that when the belief system is a subspace, then the belief system itself is
common knowledge.
It may be shown that any belief system B for A1 × ⋯ × An—including, of course, a finite one—is "isomorphic" to a subspace of the universal belief space.30
From all this we conclude that the belief system itself may always be considered
common knowledge.
26 As is apparent from the examples in sections "Illustrations," "The main counterexamples," and "Additional counterexamples," in most of which the game g is common knowledge.
27 See also Armbruster and Boege (1979) and Boege and Eisele (1979).
28 The variables about which beliefs—of all orders—are held.
29 At each state s of such a component S, the identity of that component is commonly known (i.e., it is commonly known at s that the true state is in S, though it need not be commonly known that the true state is s).
30 This is because each type in B determines a belief hierarchy (see (a)). The set of n-tuples of all these belief hierarchies is the subspace of the universal belief space that is isomorphic to B.
In this paper, “know” means “ascribe probability 1 to.” This is sometimes called
“believe,” while “know” is reserved for absolute certainty, with no possibility at all
for error. In the formalism of belief systems31 (section “Interactive belief systems”),
absolute certainty has little concrete meaning; a player can be absolutely certain only of his own action, his own payoff function, and his own theory, not of anything pertaining to anybody else.
Choosing between the terms “know” and “believe” caused us many sleepless
nights. In the end, we decided on “know” because it enables simpler, less convo-
luted language, and because we were glad to de-emphasize the relatively minor
conceptual difference between probability 1 and absolute certainty.32
Note that since our conditions are sufficient, our results are stronger with probability 1 than with absolute certainty. If probability 1 knowledge of certain events implies that a given profile is a Nash equilibrium, then a fortiori, so does absolute certainty of those events.
Instead of using belief systems to formulate and prove our results and examples,
one may use knowledge partitions (e.g., Aumann 1976). The advantage of the
partition formalism is that with it one can represent absolute certainty, not just
probability 1 knowledge (see (c)). Also some of the proofs may become marginally
simpler.33 On the other hand, the partition formalism—especially when combined
with probabilities—is itself more complex than that of belief systems, and it is
desirable to use as simple, transparent, and unified an approach as possible.
31
Unlike that of partitions (see (e) below).
32
Another reason is that “belief” often refers to a general probability distribution, which does not
go well with using “know” to mean “ascribe probability 1 to.”
33
This is natural, as the results are slightly weaker (see (c)).
the first toss, a highly unlikely proposition. And this in spite of the fact that the
tosses themselves are of course “physically” independent (whatever that may mean;
we’re not sure it means anything). The relation between “physical” and stochastic
independence is murky at best.
Independence of individual conjectures is an appropriate assumption when one
thinks of mixed strategies in the old way, in terms of explicit randomizations. With
that kind of interpretation, “acting independently” is of course closely associated
with stochastic independence. But not when the probabilities represent other
players’ ignorance.
To be sure, common knowledge of the conjectures is also a rather strong assump-
tion. Moreover, we do conclude that the individual conjectures are independent. But
there is a difference between an assumption that is merely strong and one that is
groundless. More to the point, there is a qualitative difference between assuming
common knowledge of the conjectures and assuming independence. Common
knowledge of the conjectures describes what might almost be called a “physical”
state of affairs. It might, for example, be the outgrowth of experience or learning,
like common knowledge of a language. Independence, on the other hand, is in the
mind of the decision maker. It isn’t reasonable to make such an assumption when
one can’t describe some clear set of circumstances under which it would obtain—
and that is very difficult to do. By contrast, common knowledge of the conjectures
may itself be considered a “clear set of circumstances.”
The epistemic conditions discussed in this paper are local; they say when the
players’ actions or conjectures at a specific state constitute an equilibrium. By
contrast, one can treat global conditions, which refer to the belief system as a whole.
These are usually quite different from the local conditions treated here. For example,
the condition for correlated equilibrium in Aumann (1987) is global.
A global epistemic condition for Nash equilibrium34 is as follows:
Remark 41.3 Suppose that the game g is fixed,35 that there is a common prior P,
and that all players are rational at all states. Then the distribution Q of action n-
tuples is a Nash equilibrium in g if and only if for each player i, the expectation
of i’s conjecture given one of his actions ai is the same for all ai that are assigned
positive probability by Q.
Roughly speaking, the condition says that if the players use only information that
is relevant to their choice of an action, then their conjectures are always the same,
independently of their information.
34
Stated, but not proved, in Aumann (1987).
35
Constant throughout the belief system.
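As a rough illustration of the condition in Remark 41.3, here is a sketch in Python (our own encoding, with invented numbers, and a deliberately simplified stand-in for the chapter's belief systems): states carry a type for each player, each type fixes that player's action, the common prior is a distribution over states, and i's conjecture at a state is the prior conditioned on i's type, marginalized to the other player's action. The script computes, for each player and each action with positive probability, the expectation of that player's conjecture given the action, and reports whether these expectations coincide.

from fractions import Fraction as F
from collections import defaultdict

# States of a two-player belief system: each state fixes a type for each
# player, and a type determines that player's action.  All numbers invented.
prior = {("t1a", "t2a"): F(1, 2), ("t1a", "t2b"): F(1, 4), ("t1b", "t2b"): F(1, 4)}
action = {"t1a": "U", "t1b": "D", "t2a": "L", "t2b": "R"}

def conjecture(i, own_type):
    """Player i's conjecture at any state with this type: the common prior
    conditioned on i's type, marginalized to the other player's action."""
    other = 1 - i
    total = sum(p for s, p in prior.items() if s[i] == own_type)
    conj = defaultdict(lambda: F(0))
    for s, p in prior.items():
        if s[i] == own_type:
            conj[action[s[other]]] += p / total
    return dict(conj)

def expected_conjecture(i, a):
    """Expectation of i's conjecture over states where i's action is a."""
    states = [(s, p) for s, p in prior.items() if action[s[i]] == a]
    total = sum(p for _, p in states)
    out = defaultdict(lambda: F(0))
    for s, p in states:
        for b, q in conjecture(i, s[i]).items():
            out[b] += (p / total) * q
    return dict(out)

# The condition of Remark 41.3: for each player, the expected conjecture is
# the same for every own action with positive prior probability.
for i in (0, 1):
    own_actions = sorted({action[s[i]] for s, p in prior.items() if p > 0})
    exps = [expected_conjecture(i, a) for a in own_actions]
    same = all(e == exps[0] for e in exps)
    print("player", i + 1, exps, "-> condition holds" if same else "-> condition fails")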
as rational.36 This would rescue Theorems 41.1 and 41.2, since in those results,
probability assessments and payoff functions of irrational types play no role.
It would, however, not rescue Theorem 41.3, since there one needs common
knowledge of the conjectures, but only mutual knowledge of rationality. Thus
it appears that irrational players must entertain conjectures; for this they should
be rational. To resolve this problem, one may distinguish between subjective
and objective theories—and so also between subjective and objective knowledge,
conjectures, and rationality. Thus, think of the common prior in Theorem 41.3 as
referring to the “objective” assessment of a fixed outside observer Otto. Call type
si objectively rational if its choice maximizes its expected payoff when calculated
according to the objective conjecture (the one that Otto would hold if given si ’s
information), and according to its own payoff function. This differs from subjective
rationality of si (the notion of rationality hitherto used), which requires that si ’s
choice be maximal according to si ’s own conjecture, not Otto’s. Similarly, say that a
type si knows some event E objectively if given si ’s information, Otto would ascribe
probability 1 to E. Theorem 41.3 can then be understood as asking for mutual
objective knowledge of the players’ payoff functions and of objective rationality,
and for common objective knowledge of objective conjectures. These assumptions
do not demand that irrational types entertain conjectures, but they may. If they do,
they are now to be thought of as subjectively—but not necessarily objectively—
rational.
Needless to say, our results remain true—and almost as interesting—when
rationality is commonly known. Thus readers who are dissatisfied with the above
interpretations may simply assume common knowledge of rationality.
The conclusions of our theorems state that a specified (mixed) strategy n-tuple is
an equilibrium; they do not state that the players know it to be an equilibrium, or
that this is commonly known. In the case of Theorems 41.2 and 41.3, though, it is
in fact mutual knowledge of order 1—but not necessarily of any higher order—that
σ is a Nash equilibrium. In the case of Theorem 41.1, it need not even be mutual
knowledge of order 1 that σ is a Nash equilibrium; but this does follow if, in addition
to the stated assumptions, one assumes mutual knowledge of the payoff functions.
36
It would be desirable to see whether and how one can derive this kind of modified belief system
from a Mertens–Zamir-type construction.
(k) Conclusions
Where does all this leave us? Are the “foundations” of Nash equilibrium more
secure or less secure than previously imagined? Do our results strengthen or weaken
the case for Nash equilibrium?
First, in assessing the validity of a game-theoretic solution concept, one should
not place undue emphasis on its a priori rationale. At least as important is the
question of how successful the concept is in providing insight into applications,
how tractable it is, and, relatedly, even the extent of its aesthetic appeal. On all these
counts, Nash equilibrium has proved its worth.
This said, the present results do indicate that the a priori case for Nash equilib-
rium is a little stronger than conventional wisdom has granted. Common knowledge
turns out to play a more limited role—at least in games with two players—than
previously thought. The reader may object that even mutual knowledge of, say,
payoff functions is implausible; but indisputably, common knowledge of payoff
functions is more so. It is true that our epistemic conditions for Nash equilibrium
in games with more than two players involve common knowledge (of conjectures);
indeed, it was surprising to discover that the conditions for equilibrium in the n-
person case are stronger than in the two-person case. Perhaps Nash equilibrium
rests on firmer foundations in two-person than in n-person games.
It should be remembered that the conditions for Nash equilibrium described here
are sufficient, but not necessary. Perhaps there are other ways of looking at Nash
equilibrium epistemically; if so, their nature is as yet unclear.
There also are non-epistemic ways of looking at Nash equilibrium, such as the
evolutionary approach (e.g., Maynard Smith 1982). Related to this is the idea that
a Nash equilibrium represents a societal norm. In the end, these viewpoints will
perhaps provide a more compelling basis for Nash equilibrium than those involving
games played by consciously maximizing, rational players.
Finally, the apparatus of this paper—belief systems, conjectures, knowledge,
mutual knowledge, common knowledge, and the like—has an appeal that extends
beyond our immediate purpose of providing epistemic conditions for Nash
equilibrium. The apparatus offers a way of analyzing strategic situations that
corresponds nicely to the concerns that we have all experienced in practice—what
is the other person thinking, what is his true motivation, does he see the world as I
do, and so on.
We start with some remarks on Theorem 41.3 and its proof. First, the conclusions
of Theorem 41.3 continue to hold under the slightly weaker assumption that the
common prior assigns positive probability to the conjectures φ being commonly known, and
there is a state at which the conjectures φ are commonly known and the game g and the rationality of
the players are mutually known.
Second, note that the rationality assumption is not used until the end of Theorem
41.3’s proof, after (41.3) is established. Thus if we assume only that there is
a common prior that assigns positive probability to the conjectures φi being
commonly known, we may conclude that all players i have the same conjecture
σj for other players j, and that each φi is the product of the σj with j ≠ i; that is, the
n − 1 conjectures of each player about the other players are independent.
Third, if in Theorem 41.3 we assume that the game being played is commonly
(not just mutually) known, then we can conclude that also the rationality of the
players is commonly known.37 That is, we have
Proposition 41.A1 Suppose that at some state s, the game g and the conjectures
φi are commonly known and rationality is mutually known. Then at s, rationality is
commonly known. (Note that common priors are not assumed here.)
Proof Set G := [g], F := [φ], Rj := [j is rational], and R := [all players are
rational] = R1 ∩ ⋯ ∩ Rn. In these terms, the proposition says that CK(G ∩ F) ∩
K^1R ⊆ CKR. We assert that it is sufficient for this to prove

K^2(G ∩ F) ∩ K^1R ⊆ K^2R.     (41.A1)

Indeed, if we have (41.A1), an inductive argument using Lemma 41.5 and that E ⊆ E′
implies K^1(E) ⊆ K^1(E′) (which follows from Lemma 41.2) yields K^m(G ∩ F) ∩
K^1R ⊆ K^mR for any m; so taking intersections, CK(G ∩ F) ∩ K^1R ⊆ CKR follows.

Let j be a player, Bj the set of actions aj of j to which the conjecture φi of some
other player i assigns positive probability. Let Ej := [aj ∈ Bj] (the event that the
action chosen by j is in Bj). Since the game g and the conjectures φi are commonly
known at s, they are a fortiori mutually known there; so by Lemma 41.6, each action
in Bj maximizes gj against φj. Hence Ej ∩ G ∩ F ⊆ Rj. At each state in F, each player
other than j knows that j's action is in Bj; that is, F ⊆ ∩i≠j Ki Ej. So G ∩ F ⊆ (∩i≠j
Ki Ej) ∩ (G ∩ F). So Lemmas 41.2 and 41.5 yield

K^2(G ∩ F) ⊆ K^2(∩i≠j Ki Ej) ∩ K^2(G ∩ F) ⊆ K^1(∩i≠j Ki Ej) ∩ K^2(G ∩ F)
    ⊆ K^1(∩i≠j Ki Ej) ∩ K^1(∩i≠j Ki (G ∩ F)) = K^1(∩i≠j Ki (Ej ∩ G ∩ F))
    ⊆ K^1(∩i≠j Ki Rj).
37
This observation, for which we are indebted to Ben Polak, is of particular interest because in
many applied contexts there is only one game under consideration, so it is of necessity commonly
known.
38
The proof would be simpler with a formalism in which known events are true (Kj E ⊆ E). See
section “Alternative formalisms.”
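The operators appearing in the proof can be computed mechanically in a finite belief system. The following sketch (our own encoding, with invented theories) takes "i knows E" to be the set of states at which i's theory ascribes probability 1 to E, as in this paper, and computes mutual knowledge K^1 and common knowledge CK by iteration; it assumes, as the belief-system formalism does, that each player's theory is concentrated on states at which he holds that very theory, which is what makes the iterates decrease.

from fractions import Fraction as F

# A finite belief system (invented): each player's theory at a state is a
# probability distribution over states, concentrated on states at which that
# player holds the same theory (players know their own theories).
STATES = ["s1", "s2", "s3"]
theory = {
    0: {"s1": {"s1": F(1, 2), "s2": F(1, 2)},
        "s2": {"s1": F(1, 2), "s2": F(1, 2)},
        "s3": {"s3": F(1)}},
    1: {"s1": {"s1": F(1)},
        "s2": {"s2": F(2, 3), "s3": F(1, 3)},
        "s3": {"s2": F(2, 3), "s3": F(1, 3)}},
}

def K(i, event):
    """States at which player i ascribes probability 1 to the event."""
    return {s for s in STATES
            if sum(theory[i][s].get(t, F(0)) for t in event) == 1}

def K1(event):
    """Mutual knowledge of order 1: every player knows the event."""
    out = set(STATES)
    for i in theory:
        out &= K(i, event)
    return out

def CK(event):
    """Common knowledge: since players know their own theories, the iterates
    K^1, K^2, ... decrease, so their intersection is the first fixpoint."""
    current = K1(event)
    while True:
        nxt = K1(current)
        if nxt == current:
            return current
        current = nxt

E = {"s1", "s2"}
print("K1(E) =", sorted(K1(E)), " CK(E) =", sorted(CK(E)))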
Our fourth remark is that in both Theorems 41.2 and 41.3, mutual knowledge of
rationality may be replaced by the assumption that each player knows the others to
be rational; in fact, all players may themselves be irrational at the state in question.
(Recall that “know” means “ascribe probability 1”; thus a player may be irrational
even though another player knows that he is rational.)
We come next to the matter of converses to our theorems. We have already
mentioned (at the end of section “Description of the results”) that the conditions are
not necessary, in the sense that it is quite possible to have a Nash equilibrium even
when they are not fulfilled. In Theorem 41.1, the action n-tuple a(s) at a state s may
well be a Nash equilibrium even when a(s) is not mutually known, whether or not
the players are rational. (But if the actions are mutually known at s and a(s) is a Nash
equilibrium, then the players are rational at s; cf. Remark 41.2.) In Theorem 41.2,
the conjectures at a state s in a two-person game may constitute a Nash equilibrium
even when, at s, they are not mutually known and/or rationality is not mutually
known. Similarly for Theorem 41.3.
Nevertheless, there is a sense in which the converses hold: Given a Nash
equilibrium in a game g, one can construct a belief system in which the conditions
are fulfilled. For Theorem 41.1, this is immediate: Choose a belief system where
each player i has just one type, whose action is i’s component of the equilibrium
and whose payoff function is gi . For Theorems 41.2 and 41.3, we may suppose that
as in the traditional interpretation of mixed strategies, each player chooses an action
by an independent conscious randomization according to his component σi of the
given equilibrium σ. The types of each player correspond to the different possible
outcomes of the randomization; each type chooses a different action. All types of
player i have the same theory, namely, the product of the mixed strategies of the
other n − 1 players appearing in σ, and the same payoff function, namely gi. It may
then be verified that the conditions of Theorems 41.2 and 41.3 are met.
These “converses” show that the sufficient conditions for Nash equilibrium
in our theorems are not too strong, in the sense that they do not imply more
than Nash equilibrium; every Nash equilibrium is attainable with these conditions.
Another sense in which they are not too strong—that the conditions cannot be
dispensed with or even appreciably weakened—was discussed in sections “The
main counterexamples” and “Additional counterexamples.”
References
Armbruster, W., & Boege, W. (1979). Bayesian game theory. In O. Moeschlin & D. Pallaschke
(Eds.), Game theory and related topics. Amsterdam: North-Holland.
Aumann, R. (1976). Agreeing to disagree. Annals of Statistics, 4, 1236–1239.
Aumann, R. (1987). Correlated equilibrium as an expression of Bayesian rationality. Econometrica,
55, 1–18.
Boege, W., & Eisele, T. (1979). On solutions of Bayesian games. International Journal of Game
Theory, 8, 193–215.
Brandenburger, A., & Dekel, E. (1989). The role of common knowledge assumptions in game
theory. In F. Hahn (Ed.), The economics of missing markets, information, and games. Oxford:
Oxford University Press.
Geanakoplos, J., & Polemarchakis, H. (1982). We can’t disagree forever. Journal of Economic
Theory, 28, 192–200.
Harsanyi, J. (1967–1968). Games of incomplete information played by ‘Bayesian’ players, I-III.
Management Science, 14, 159–182, 320–334, 486–502.
Harsanyi, J. (1973). Games with randomly disturbed payoffs: A new rationale for mixed strategy
equilibrium points. International Journal of Game Theory, 2, 1–23.
Kohlberg, E., & Mertens, J.-F. (1986). On the strategic stability of equilibria. Econometrica, 54,
1003–1037.
Kreps, D., & Wilson, R. (1982). Sequential equilibria. Econometrica, 50, 863–894.
Lewis, D. (1969). Convention: A philosophical study. Cambridge: Harvard University Press.
Maynard Smith, J. (1982). Evolution and the theory of games. Cambridge: Cambridge University
Press.
Mertens, J.-F., & Zamir, S. (1985). Formulation of Bayesian analysis for games with incomplete
information. International Journal of Game Theory, 14, 1–29.
Myerson, R. (1978). Refinements of the Nash equilibrium concept. International Journal of Game
Theory, 7, 73–80.
Nash, J. (1951). Non-cooperative games. Annals of Mathematics, 54, 286–295.
Savage, L. (1954). The foundations of statistics. New York: Wiley.
Selten, R. (1965). Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit.
Zeitschrift für die gesamte Staatswissenschaft, 121, 301–324.
Selten, R. (1975). Reexamination of the perfectness concept for equilibrium points in extensive
games. International Journal of Game Theory, 4, 25–55.
Tan, T., & Werlang, S. (1988). The Bayesian foundations of solution concepts of games. Journal
of Economic Theory, 45, 370–391.
Chapter 42
Knowledge, Belief and Counterfactual
Reasoning in Games
Robert Stalnaker
Introduction
Deliberation about what to do in any context requires reasoning about what will or
would happen in various alternative situations, including situations that the agent
knows will never in fact be realized. In contexts that involve two or more agents
who have to take account of each others’ deliberation, the counterfactual reasoning
may become quite complex. When I deliberate, I have to consider not only what
the causal effects would be of alternative choices that I might make, but also what
other agents might believe about the potential effects of my choices, and how their
alternative possible actions might affect my beliefs. Counterfactual possibilities are
implicit in the models that game theorists and decision theorists have developed –
in the alternative branches in the trees that model extensive form games and the
different cells of the matrices of strategic form representations – but much of the
reasoning about those possibilities remains in the informal commentary on and
motivation for the models developed. Puzzlement is sometimes expressed by game
theorists about the relevance of what happens in a game ‘off the equilibrium path’:
of what would happen if what is (according to the theory) both true and known
by the players to be true were instead false. My aim in this paper is to make some
suggestions for clarifying some of the concepts involved in counterfactual reasoning
in strategic contexts, both the reasoning of the rational agents being modeled, and
the reasoning of the theorist who is doing the modeling, and to bring together
some ideas and technical tools developed by philosophers and logicians that I think
might be relevant to the analysis of strategic reasoning, and more generally to the
conceptual foundations of game theory.
R. Stalnaker ()
Department of Linguistics and Philosophy, MIT, Cambridge, MA, USA
e-mail: [email protected]
There are two different kinds of counterfactual possibilities – causal and epis-
temic possibilities – that need to be distinguished. They play different but interacting
roles in a rational agent’s reasoning about what he and others will and should do, and
I think equivocation between them is responsible for some of the puzzlement about
counterfactual reasoning. In deliberation, I reason both about how the world might
have been different if I or others did different things than we are going to do, and also
about how my beliefs, or others’ beliefs, might change if I or they learned things that
we expect not to learn. To take an often cited example from the philosophical litera-
ture to illustrate the contrast between these two kinds of counterfactual suppositions,
compare: if Shakespeare didn’t write Hamlet, someone else did, with if Shakespeare
hadn’t written Hamlet, someone else would have.1 The first expresses a quite
reasonable disposition to hold onto the belief that someone wrote Hamlet should one
receive the unexpected information that Shakespeare did not; the second expresses
a causal belief, a belief about objective dependencies, that would be reasonable only
if one held a bizarre theory according to which authors are incidental instruments
in the production of works that are destined to be written. The content of what is
supposed in the antecedents of these contrasting conditionals is the same, and both
suppositions are or may be counterfactual in the sense that the person entertaining
them believes with probability one that what is being supposed is false. But it is
clear that the way it is being supposed is quite different in the two cases.
This contrast is obviously relevant to strategic reasoning. Beliefs about what it
is rational to do depend on causal beliefs, including beliefs about what the causal
consequences would be of actions that are alternatives to the one I am going to
choose. But what is rational depends on what is believed, and I also reason about
the way my beliefs and those of others would change if we received unexpected
information. The two kinds of reasoning interact, since one of the causal effects of
a possible action open to me might be to give unexpected information to another
rational agent.2
It is obvious that a possible course of events may be causally impossible even if it
is epistemically open, as when you have already committed yourself, but I have not
yet learned of your decision. It also may happen that a course of events is causally
open even when it is epistemically closed in the sense that someone believes, with
probability one, that it will not happen. But can it be true of a causally open course
of events that someone not only believes, but also knows that it will not occur? This
is less clear; it depends on how we understand the concept of knowledge. It does not
seem incoherent to suppose that you know that I am rational, even though irrational
choices are still causally possible for me. In fact, the concept of rationality seems
applicable to actions only when there are options open to an agent. If we are to make
sense of assumptions of knowledge and common knowledge of rationality, we need
1
Ernest Adams (1970) first pointed to the contrast illustrated by this pair of conditionals. The
particular example is Jonathan Bennett’s.
2
The relation between causal and evidential reasoning is the central concern in the development of
causal decision theory. See Gibbard and Harper (1981), Skyrms (1982) and Lewis (1980).
to allow for the possibility that an agent may know what he or another agent is going
to do, even when it remains true that the agent could have done otherwise.
To clarify the causal and epistemic concepts that interact in strategic reasoning,
it is useful to break them down into their component parts. If, for example, there is a
problem about exactly what it means to assume that there is common knowledge of
rationality, it ought to be analyzed into problems about exactly what rationality is,
or about what knowledge is, or about how common knowledge is defined in terms
of knowledge. The framework I will use to represent these concepts is one that is
designed to help reveal the compositional structure of such complex concepts: it is a
formal semantic or model theoretic framework – specifically, the Kripkean ‘possible
worlds’ framework for theorizing about modal, causal and epistemic concepts. I will
start by sketching a simple conception of a model, in the model theorist’s sense,
of a strategic form game. Second, I will add to the simple conception of a model
the resources to account for one kind of counterfactual reasoning, reasoning about
belief revision. In these models we can represent concepts of rationality, belief
and common belief, and so can define the complex concept of common belief in
rationality, and some related complex concepts, in terms of their component parts.
The next step is to consider the concept of knowledge, and the relation between
knowledge and belief. I will look at some different assumptions about knowledge,
and at the consequences of these different assumptions for the concepts of common
knowledge and common knowledge of rationality. Then to illustrate the way some of
the notions I discuss might be applied to clarify some counterfactual reasoning about
games, I will discuss some familiar problems about backward induction arguments,
using the model theory to sharpen the assumptions of those arguments, and to state
and prove some theorems about the consequences of assumptions about common
belief and knowledge.
Before sketching the conception of a model of a game that I will be using, I will
set out some assumptions that motivate it, assumptions that I think will be shared
by most, though not all, game theorists. First, I assume that a game is a partial
description of a set or sequence of interdependent Bayesian decision problems. The
description is partial in that while it specifies all the relevant utilities motivating the
agents, it does not give their degrees of belief. Instead, qualitative constraints are put
on what the agents are assumed to believe about the actions of other agents; but these
constraints will not normally be enough to determine what the agents believe about
each other, or to determine what solutions are prescribed to the decision problems.
Second, I assume that all of the decision problems in the game are problems of
individual decision making. There is no special concept of rationality for decision
making in a situation where the outcomes depend on the actions of more than one
agent. The acts of other agents are, like chance events, natural disasters and acts
of God, just facts about an uncertain world that agents have beliefs and degrees of
belief about. The utilities of other agents are relevant to an agent only as information
that, together with beliefs about the rationality of those agents, helps to predict their
actions. Third, I assume that in cases where degrees of belief are undetermined, or
only partially determined, by the description of a decision problem, then no action
is prescribed by the theory unless there is an action that would be rational for every
system of degrees of belief compatible with what is specified. There are no special
rules of rationality telling one what to do in the absence of degrees of belief, except
this: decide what you believe, and then maximize expected utility.
A model for a game is intended to represent a completion of the partial
specification of the set or sequence of Bayesian decision problems that is given
by the definition of the game, as well as a representation of a particular play of
the game. The class of all models for a game will include all ways of filling in the
relevant details that are compatible with the conditions imposed by the definition
of the game. Although a model is intended to represent one particular playing
of the game, a single model will contain many possible worlds, since we need a
representation, not only of what actually happens in the situation being modeled,
but also what might or would happen in alternative situations that are compatible
with the capacities and beliefs of one or another of the agents. Along with a set of
possible worlds, models will contain various relations and measures on the set that
are intended to determine all the facts about the possible worlds that may be relevant
to the actions of any of the agents playing the game in a particular concrete context.
The models considered in this paper are models for finite games in normal or
strategic form. I assume, as usual, that the game itself consists of a structure
⟨N, ⟨Ci, ui⟩i∈N⟩, where N is a finite set of players, Ci is a finite set of alternative
strategies for player i, and ui is player i's utility function taking a strategy profile (a
specification of a strategy for each player) into a utility value for the outcome that
would result from that sequence of strategies. A model for a game will consist of a
set of possible worlds (a state space), one of which is designated as the actual world
of the model. In each possible world in the model, each player has certain beliefs
and partial beliefs, and each player makes a certain strategy choice. The possible
worlds themselves are simple, primitive elements of the model; the information
about them – what the players believe and do in each possible world – is represented
by several functions and relations given by a specification of the particular model.
Specifically, a model for a game will consist of a structure ⟨W, a, ⟨Si, Ri, Pi⟩i∈N⟩,
where W is a nonempty set (the possible worlds), a is a member of W (the actual
world), each Si is a function taking possible worlds into strategy choices for player
i, each Ri is a binary relation on W, and each Pi is an additive measure function on
subsets of W.
The R relations represent the qualitative structure of the players’ beliefs in the
different possible worlds in the following way: the set of possible worlds that are
compatible with what player i believes in world w is the set {x : wRi x}. It is assumed
that the R relations are serial, transitive, and euclidean.3 The first assumption
is simply the requirement that in any possible world there must be at least one
possible world compatible with what any player believes in that world. The other
two constraints encode the assumption that players know their own minds: they
are necessary and sufficient to ensure that players have introspective access to their
beliefs: if they believe something, they believe that they believe it, and if they do
not, they believe that they do not.
The S functions encode the facts about what the players do – what strategies
they choose – in each possible world. It is assumed that if xRi y, then Si(x) = Si(y).
Intuitively, this requirement is the assumption that players know, at the moment of
choice, what they are doing – what choice they are making. Like the constraints on
the structure of the R relations, this constraint is motivated by the assumption that
players have introspective access to their own states of mind.
The measure function Pi encodes the information about the player's partial
beliefs in each possible world in the following way: player i's belief function in
possible world w is the relativization of Pi to the set {x : wRi x}. That is, for any
proposition φ, Pi,w(φ) = Pi(φ ∩ {x : wRi x})/Pi({x : wRi x}). The assumptions we are
making about Ri and Pi will ensure that Pi({x : wRi x}) is nonzero for all w, so that
this probability will always be defined.
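A small sketch may help fix these definitions. The encoding below is ours (dictionaries standing in for the functions Si, Ri, Pi, with an invented three-world model for one player): it computes the belief set {x : wRi x}, the belief function Pi,w by relativizing Pi to that set, and checks that Ri is serial, transitive, and euclidean.

from fractions import Fraction as F
from itertools import product

W = ["w1", "w2", "w3"]                               # possible worlds (invented)
S = {1: {"w1": "U", "w2": "U", "w3": "D"}}           # player 1's strategy choices
R = {1: {("w1", "w1"), ("w2", "w1"), ("w3", "w3")}}  # pairs with w R1 x
P = {1: {"w1": F(1, 2), "w2": F(1, 4), "w3": F(1, 4)}}  # player 1's measure

def belief_set(i, w):
    """Worlds compatible with what player i believes in w: {x : w Ri x}."""
    return {x for x in W if (w, x) in R[i]}

def belief(i, w, prop):
    """Pi,w(prop): the measure Pi relativized to the belief set at w."""
    b = belief_set(i, w)
    return sum(P[i][x] for x in prop & b) / sum(P[i][x] for x in b)

def check_R(i):
    """The three conditions on Ri: serial, transitive, euclidean."""
    serial = all(belief_set(i, w) for w in W)
    transitive = all((x, z) in R[i] for x, y, z in product(W, repeat=3)
                     if (x, y) in R[i] and (y, z) in R[i])
    euclidean = all((y, z) in R[i] for x, y, z in product(W, repeat=3)
                    if (x, y) in R[i] and (x, z) in R[i])
    return serial, transitive, euclidean

print(check_R(1))                     # (True, True, True)
print(belief(1, "w1", {"w1", "w3"}))  # P1,w1 of the proposition {w1, w3}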
The use of a single measure function for each player, defined on the whole space
of possible worlds, to encode the information required to define the player’s degrees
of belief is just a technical convenience – an economical way to specify the many
different belief functions that represent that player’s beliefs in different possible
worlds. No additional assumptions about the players’ beliefs are implicit in this
form of representation, since our introspection assumptions already imply that any
two different belief states for a single player are disjoint, and any set of probability
measures on disjoint sets can be represented by a single measure on the union of all
the sets. This single measure will contain some extraneous information that has no
representational significance – different total measures will determine the same set
of belief functions – but this artifact of the model is harmless.4
3
That is, for all players i: (∀x)(∃y) xRi y, (∀x)(∀y)(∀z)((xRi y & yRi z) → xRi z), and (∀x)(∀y)(∀z)((xRi y &
xRi z) → yRi z).
4
It has been suggested that there is a substantive, and implausible, assumption built into the way
that degrees of belief are modeled: namely, that in any two worlds in which a player has the same full
beliefs he also has the same partial beliefs. But this assumption is a tautological consequence of the
introspection assumption, which implies that a player fully believes that he himself has the partial
beliefs that he in fact has. It does follow from the introspection assumptions that player j cannot
be uncertain about player i’s partial beliefs while being certain about all of i’s full beliefs. But that
is just because the totality of i’s full beliefs includes his beliefs about his own partial beliefs, and
by the introspection assumption, i’s beliefs about his own partial beliefs are complete and correct.
Nothing, however, prevents there being a model in which there are different worlds in which player
i has full beliefs about objective facts that are exactly the same, even though the degrees of belief
about such facts are different. This situation will be modeled by disjoint but isomorphic sets of
possible worlds. In such a case, another player j might be certain about player i’s full beliefs about
everything except i’s own partial beliefs, while being uncertain about i’s partial beliefs.
In order to avoid complications that are not relevant to the conceptual issues
I am interested in, I will be assuming throughout this discussion that our models
are finite, and that the measure functions all assign nonzero probability to every
nonempty subset of possible worlds.
We need to impose one additional constraint on our models, a constraint that is
motivated by our concern with counterfactual reasoning. A specification of a game
puts constraints on the causal consequences of the actions that may be chosen in the
playing of the game, and we want these constraints to be represented in the models.
Specifically, in a strategic form game, the assumption is that the strategies are chosen
independently, which means that the choices made by one player cannot influence
the beliefs or the actions of the other players. One could express the assumption by
saying that certain counterfactual statements must be true in the possible worlds in
the model: if a player had chosen a different strategy from the one he in fact chose,
the other players would still have chosen the same strategies, and would have had
the same beliefs, that they in fact had. The constraint we need to add is a closure
condition on the set of possible worlds – a requirement that there be enough possible
worlds of the right kind to represent these counterfactual possibilities.
For any world w and strategy s for player i, there is a world f(w,s) meeting the
following four conditions:
1. for all j ≠ i, if wRj x, then f(w,s)Rj x.
2. if wRi x, then f(w,s)Ri f(x,s).
3. Si(f(w,s)) = s
4. Pi(f(w,s)) = Pi(w).
Intuitively, f(w,s) represents the counterfactual possible world that, in w, is the
world that would have been realized if player i, believing exactly what he believes
in w about the other players, had chosen strategy s.
Any of the (finite) models constructed for the arguments given in this paper can
be extended to (finite) models satisfying this closure condition. One simply adds,
for each w 2 W and each strategy profile c, a world corresponding to the pair (w,c),
and extending the R’s, P’s, and S’s in a way that conforms to the four conditions.5
Because of our concern to represent counterfactual reasoning, it is essential that
we allow for the possibility that players have false beliefs in some possible worlds,
which means that a world in which they have certain beliefs need not itself be
compatible with those beliefs. Because the epistemic structures we have defined
allow for false belief, they are more general than the partition structures that will be
more familiar to game theorists. An equivalence relation meets the three conditions
we have imposed on our R relations, but in addition must be reflexive. To impose this
5
More precisely, for any given model M = ⟨W, a, ⟨Si, Ri, Pi⟩i∈N⟩, not necessarily meeting the
closure condition, define a new model M′ as follows: W′ = W × C; a′ = ⟨a, S(a)⟩; for all w ∈
W and c ∈ C, S′(⟨w,c⟩) = c; for all x, y ∈ W and c, d ∈ C, ⟨x,c⟩ R′i ⟨y,d⟩ if the following three
conditions are met: (i) xRi y, (ii) ci = di, and (iii) for all j ≠ i, Sj(y) = dj; P′i(⟨x,c⟩) = Pi(x). This
model will be finite if the original one was, and will satisfy the closure condition.
additional condition would be to assume that all players necessarily have only true
beliefs. But even if an agent in fact has only true beliefs, counterfactual reasoning
requires an agent to consider possible situations in which some beliefs are false.
First, we want to consider belief contravening, or epistemic counterfactuals: how
players would revise their beliefs were they to learn they were mistaken. Second,
we want to consider deliberation which involves causal counterfactuals: a player
considers what the consequences would be of his doing something he is not in fact
going to do. In both cases, a player must consider possible situations in which either
she or another player has a false belief.
Even though the R relations are not, in general, equivalence relations, there is
a relation definable in terms of R that does determine a partition structure: say
that two worlds x and y are subjectively indistinguishable for player i (x ≈i y)
if player i's belief state in x is the same as it is in y. That is, x ≈i y if and
only if {z : xRi z} = {z : yRi z}. Each equivalence class determined by a subjective
indistinguishability relation will be divided into two parts: the worlds compatible
with what the player believes, and the worlds that are not. In the regular partition
models, all worlds are compatible with what the player believes in the world, and
the two relations, Ri and ≈i, will coincide.
To represent counterfactual reasoning, we must also allow for possible worlds in
which players act irrationally. Even if I am resolved to act rationally, I may consider
in deliberation what the consequences would be of acting in ways that are not. And
even if I am certain that you will act rationally, I may consider how I would revise
my beliefs if I learned that I was wrong about this. Even models satisfying some
strong condition, such as common belief or knowledge that everyone is rational,
will still be models that contain counterfactual possible worlds in which players
have false beliefs, and worlds in which they fail to maximize expected utility.
The aim of this model theory is generality: to make, in the definition of a model,
as few substantive assumptions as possible about the epistemic states and behavior
of players of a game in order that substantive assumptions can be made explicit as
conditions that distinguish some models from others. But of course the definition
inevitably includes a range of idealizing and simplifying assumptions, made for a
variety of reasons. Let me just mention a few of the assumptions that have been built
into the conception of a model, and the reasons for doing so.
First, while we allow for irrational action and false belief, we do assume (as is
usual) that players all have coherent beliefs that can be represented by a probability
function on some nonempty space of possibilities. So in effect, we make the
outrageously unrealistic assumption that players are logically omniscient. This
assumption is made only because it is still unclear, either conceptually or technically,
how to understand or represent the epistemic situations of agents that are not ideal
in this sense. This is a serious problem, but not one I will try to address here.
Second, as I have said, it is assumed that players have introspective access to their
beliefs. This assumption could be relaxed by imposing weaker conditions on the R
relations, although doing so would raise both technical and conceptual problems. It
is not clear how one acts on one’s beliefs if one does not have introspective access to
them. Some may object to the introspective assumption on the ground that a person
may have unconscious or inarticulate beliefs, but the assumption is not incompatible
with this: if beliefs can be unconscious, so can beliefs about beliefs. It is not assumed
that one knows how to say what one believes.
Third, some have questioned the assumption that players know what they do.
This assumption might be relaxed with little effect; what is its motivation? The idea
is simply that in a static model for a strategic form game, we are modeling the
situation at the moment of choice, and it seems reasonable to assume that at that
moment, the agent knows what choice is being made.
Fourth, it is assumed that players know the structure of the game – the options
available and the utility values of outcomes for all of the players. This assumption is
just a simplifying assumption made to avoid trying to do too much at once. It could
easily be relaxed with minimal effect on the structure of the models, and without
raising conceptual problems. That is, one could consider models in which different
games were being played in different possible worlds, and in which players might
be uncertain or mistaken about what the game was.
Finally, as noted we assume that models are finite. This is again just a simplifying
assumption. Relaxing it would require some small modifications and add some
mathematical complications, but would not change the basic story.
In any possible worlds model, one can identify propositions with subsets of
the set of possible worlds, with what economists and statisticians call ‘events’.
The idea is to identify the content of what someone may think or say with its
truth conditions – that is, with the set of possible worlds that would realize the
conditions that make what is said or thought true. For any proposition φ and
player i, we can define the proposition that i fully believes that φ as the set
{x ∈ W : {y ∈ W : xRi y} ⊆ φ}, and the proposition that i believes that φ to at least degree
r as the set {x ∈ W : Pi,x(φ) ≥ r}. So we have the resources to interpret unlimited
iterations of belief in any proposition, and the infinitely iterated concept of common
belief (all players believe that φ, and all believe that all believe that φ, and all believe
that all believe that all believe that φ, and so on) can be defined as the intersection
of all the propositions in this infinite conjunction. Equivalently, we can represent
common belief in terms of the transitive closure R* of the union of all the R relations.
For any proposition φ, it is, in possible world x, common belief among the players
that φ if and only if φ is true in all possible worlds compatible with common belief,
which is to say if and only if {y : xR*y} ⊆ φ.
If rationality is identified with maximizing expected utility, then we can define,
in any model, the propositions that some particular player is rational, that all players
are rational, that all players believe that all players are rational, and of course that it
is common belief among the players that all players are rational. Here is a sequence
of definitions, leading to a specification of the proposition that there is common
belief that all players are rational6 : first, the expected utility of an action (a strategy
choice) s for a player i in a world x is defined in the familiar way:
6
In these and other definitions, a variable for a strategy or profile, enclosed in brackets, denotes
the proposition that the strategy or profile is realized. So, for example, if e ∈ C−i (if e is a strategy
profile for players other than player i) then [e] = {x ∈ W : Sj(x) = ej for all j ≠ i}.
eui,x(s) = Σe∈C−i Pi,x([e]) · ui((s,e))
Second, we define the set of strategies that maximize expected utility for player i in
world x:

ri,x = {s ∈ Ci : eui,x(s) ≥ eui,x(t) for all t ∈ Ci}

Third, the proposition that player i is rational is the set of possible worlds in which
the strategy chosen maximizes expected utility in that world:

Ai = {x ∈ W : Si(x) ∈ ri,x}

Fourth, the proposition that all players are rational:

A = ∩i∈N Ai
Fifth, the proposition there is common belief that everyone is rational is defined as
follows:
Z = {x ∈ W : {y ∈ W : xR*y} ⊆ A}.
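The whole sequence of definitions can be run on a toy model. The sketch below (our own encoding; the two-player payoffs and the three-world model are invented) computes expected utilities, the rationality propositions Ai and A, the transitive closure R* of the union of the Ri, and the proposition Z that there is common belief in rationality.

from fractions import Fraction as F

# A two-player game with strategy sets and payoffs (all numbers invented).
C = {1: ["U", "D"], 2: ["L", "R"]}
u = {1: {("U", "L"): 2, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 1},
     2: {("U", "L"): 1, ("U", "R"): 0, ("D", "L"): 0, ("D", "R"): 2}}

# A model: worlds, strategy functions Si, belief relations Ri, measures Pi.
W = ["x", "y", "z"]
S = {1: {"x": "U", "y": "U", "z": "D"}, 2: {"x": "L", "y": "R", "z": "R"}}
R = {1: {("x", "x"), ("y", "x"), ("z", "z")},
     2: {("x", "x"), ("y", "y"), ("z", "y")}}
P = {1: {"x": F(1, 2), "y": F(1, 4), "z": F(1, 4)},
     2: {"x": F(1, 3), "y": F(1, 3), "z": F(1, 3)}}

def belief(i, w, prop):
    b = {v for v in W if (w, v) in R[i]}
    return sum(P[i][v] for v in prop & b) / sum(P[i][v] for v in b)

def eu(i, w, s):
    """Expected utility of strategy s for player i in world w."""
    j = 2 if i == 1 else 1
    return sum(belief(i, w, {v for v in W if S[j][v] == e}) *
               u[i][(s, e) if i == 1 else (e, s)]
               for e in C[j])

A_i = {i: {w for w in W if eu(i, w, S[i][w]) == max(eu(i, w, s) for s in C[i])}
       for i in (1, 2)}
A = A_i[1] & A_i[2]                      # everyone is rational

# Transitive closure R* of the union of the Ri.
Rstar = set(R[1]) | set(R[2])
changed = True
while changed:
    changed = False
    for (a, b) in list(Rstar):
        for (c, d) in list(Rstar):
            if b == c and (a, d) not in Rstar:
                Rstar.add((a, d))
                changed = True

Z = {w for w in W if {v for v in W if (w, v) in Rstar} <= A}
print("A =", sorted(A), " Z =", sorted(Z))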
Any specification that determines a proposition relative to a model can also be used
to pick out a class of models – all the models in which the proposition is true in
that model’s actual world. So for any given game, we can pick out the class of
models of that game that satisfy some intuitive condition, for example, the class
of models in which the proposition Z, that there is common belief in rationality, is
true (in the actual world of the model). A class of models defined this way in turn
determines a set of strategy profiles for the game: a profile is a member of the set
if and only if it is realized in the actual world of one of the models in the class
of models. This fact gives us a way that is both precise and intuitively motivated
of defining a solution concept for games, or of giving a proof of adequacy for a
solution concept already defined. The solution concept that has the most transparent
semantic motivation of this kind is rationalizability: we can define rationalizability
semantically as the set of strategies of a game that are realized in (the actual world
of) some model in which there is common belief in rationality.7 Or, we can give
7
This model theoretic definition of rationalizability coincides with the standard concept defined by
Bernheim (1984) and Pearce (1984) only in two person games. In the general case, it coincides
with the weaker concept, correlated rationalizability. Model theoretic conditions appropriate for
the stronger definition would require that players’ beliefs about each other satisfy a constraint
that (in games with more than two players) goes beyond coherence: specifically, it is required
that no player can believe that any information about another player’s strategy choices would be
evidentially relevant to the choices of a different player. I think this constraint could be motivated,
in general, only if one confused causal with evidential reasoning. The structure of the game ensures
a direct nonsemantic definition of the set of strategies – the set of strategies that
survive the iterated elimination of strictly dominated strategies – and then prove that
this set is characterized by the class of models in which there is common belief in
rationality: a set of strategies is characterized by a class of models if the set includes
exactly the strategies that are realized in some model in the class.8
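For the nonsemantic side of that characterization, here is a short sketch of iterated elimination of strictly dominated strategies (with invented payoffs). To keep it small it only tests domination by pure strategies; the notion used in the characterization allows domination by mixed strategies, which would add a linear-programming step.

# Iterated elimination of strictly dominated strategies (domination by pure
# strategies only -- a simplification).  Payoffs are invented.
u = {1: {("U", "L"): 2, ("U", "R"): 3, ("D", "L"): 1, ("D", "R"): 5},
     2: {("U", "L"): 3, ("U", "R"): 1, ("D", "L"): 2, ("D", "R"): 0}}
strategies = {1: {"U", "D"}, 2: {"L", "R"}}

def payoff(i, s1, s2):
    return u[i][(s1, s2)]

def dominated(i, s, own, other):
    """True if some pure strategy in `own` gives player i strictly more than
    s against every surviving strategy of the opponent."""
    def pay(a, b):
        return payoff(i, a, b) if i == 1 else payoff(i, b, a)
    return any(all(pay(t, e) > pay(s, e) for e in other)
               for t in own if t != s)

surviving = {i: set(strategies[i]) for i in (1, 2)}
changed = True
while changed:
    changed = False
    for i in (1, 2):
        j = 2 if i == 1 else 1
        for s in list(surviving[i]):
            if dominated(i, s, surviving[i], surviving[j]):
                surviving[i].discard(s)
                changed = True

print(surviving)   # e.g. {1: {'U'}, 2: {'L'}} for the payoffs above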
Belief Revision
There are many ways to modify and extend this simple conception of a model of
a game. I will consider here just one embellishment, one that is relevant to our
concern with counterfactual reasoning. This is the addition of some structure to
model the players’ policies for revising their beliefs in response to new information.
We assume, as is usual, that rational players are disposed to revise their beliefs
by conditionalization, but there is nothing in the models we have defined to say
how players would revise their beliefs if they learned something that had a prior
probability of 0 – something incompatible with the initial state of belief. A belief
revision policy is a way of determining the sets of possible worlds that define the
posterior belief states that would be induced by such information. The problem is
not to generate such belief revision policies out of the models we already have –
that is impossible. Rather, it is to say what new structure needs to be added to the
model in order to represent belief revision policies, and what formal constraints the
policies must obey.
Since we are modeling strategic form games, our models are static, and so
there is no representation of any actual change in what is believed. But even in a
static situation, one might ask how an agent’s beliefs are disposed to change were
he to learn that he was mistaken about something he believed with probability
one, and the answer to this question may be relevant to his decisions. These
dispositions to change beliefs, in contrast to the potential changes that would display
the dispositions, are a part of the agent’s prior subjective state – the only state
represented in the worlds of our models.
I said at the start that one aim in constructing this model theory was to clarify,
in isolation, the separate concepts that interact with each other in strategic contexts,
and that are the component parts of the complex concepts used to describe those
contexts. In keeping with this motivation, I will first look at a pure and simple
abstract version of belief revision theory, for a single agent in a single possible
that players’ strategy choices are made independently: if player one had chosen differently, it could
not have influenced the choice of player two. But this assumption of causal independence has
no consequences about the evidential relevance of information about player one’s choice for the
beliefs that a third party might rationally have about player two. (Brian Skyrms (1992, pp. 147–8)
makes this point.)
8
This characterization theorem is proved in Stalnaker (1994).
world, ignoring degrees of belief, and assuming nothing about the subject matter
of the beliefs. After getting clear about the basic structure, I will say how to
incorporate it into our models, with many agents, many possible worlds, and
probability measures on both the prior and posterior belief states. The simple
theory that I will sketch is a standard one that has been formulated in a number
of essentially equivalent ways by different theorists.9 Sometimes the theory is
formulated syntactically, with prior and posterior belief states represented by sets
of sentences of some formal language, but I will focus on a purely model theoretic
formulation of the theory in which the agent’s belief revision policy is represented
by a set of possible worlds – the prior belief state – and a function taking each
piece of potential new information into the conditional belief state that corresponds
to the state that would be induced by receiving that information. Let B be the set
representing the prior state, and let B0 be the set of all the possible worlds that are
compatible with any new information that the agent could possibly receive. Then if
is any proposition which is a subset of B0 , B() will be the set that represents the
posterior belief state induced by information .
There are just four constraints that the standard belief revision theory imposes on
this belief revision function:
1. For any , B()
2. If is nonempty, then B() is nonempty
3. If B\ is nonempty, then B() D B\
4. If B()\ is nonempty, then B(& ) D B()\
The first condition is simply the requirement that the new information received
is believed in the conditional state. The second is the requirement that consistent
information results in a consistent conditional state. The third condition requires that
belief change be conservative in the sense that one should not give up any beliefs
unless the new information forces one to give something up: if φ is compatible with
the prior beliefs, the conditional belief state will simply add φ to the prior beliefs.
The fourth condition is a generalization of the conservative condition. Its effect is
to require that if two pieces of information are received in succession, the second
being compatible with the posterior state induced by the first, then the resulting
change should be the same as if both pieces of information were received together.
Any belief revision function meeting these four conditions can be represented by
an ordering of all the possible worlds, and any ordering of a set of possible worlds
will determine a function meeting the four conditions. Let Q be any binary transitive
and connected relation on a set B′. Then we can define B as the set of highest
ranking members of B′, and for any subset φ of B′, we can define B(φ) as the set of highest
ranking members of φ:

B(φ) = {x ∈ φ : yQx for all y ∈ φ}
9
The earliest formulation, so far as I know, of what has come to be called the AGM belief revision
theory was given by William Harper (1975). For a general survey of the belief revision theory, see
Gärdenfors (1988). Other important papers include Alchourón and Makinson (1982), Alchourón
et al. (1985), Grove (1988), Makinson (1985) and Spohn (1987).
It is easy to show that this function will satisfy the four conditions. On the other
hand, given any revision function meeting the four conditions, we can define a
binary relation Q in terms of it as follows:

xQy if and only if y ∈ B({x,y})

It is easy to show, using the four conditions, that Q, defined this way, is transitive
and connected, and that B(φ) = {x ∈ φ : yQx for all y ∈ φ}. So the specification of
such a Q relation is just an alternative formulation of the same revision theory.
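A brief sketch of the correspondence just described, in our own encoding with an invented three-world ranking: B(φ) is the set of Q-maximal members of φ, and the four conditions can be checked by brute force over all propositions.

from itertools import chain, combinations

# An ordering on three worlds (invented): w1 is most plausible, then w2,
# then w3.  yQx is read "x is ranked at least as high as y".
B_prime = ["w1", "w2", "w3"]
rank = {"w1": 0, "w2": 1, "w3": 2}            # lower rank = more plausible
def Q(y, x):
    return rank[x] <= rank[y]

B = {x for x in B_prime if all(Q(y, x) for y in B_prime)}   # prior belief state

def revise(phi):
    """B(phi): the highest-ranking members of phi."""
    return {x for x in phi if all(Q(y, x) for y in phi)}

def propositions():
    return [set(c) for c in chain.from_iterable(
        combinations(B_prime, n) for n in range(len(B_prime) + 1))]

# Brute-force check of the four conditions on the revision function.
ok1 = all(revise(p) <= p for p in propositions())
ok2 = all(revise(p) for p in propositions() if p)
ok3 = all(revise(p) == B & p for p in propositions() if B & p)
ok4 = all(revise(p & q) == revise(p) & q
          for p in propositions() for q in propositions() if revise(p) & q)
print(B, ok1, ok2, ok3, ok4)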
Now to incorporate this belief revision theory into our models, we need to
give each player such a belief revision policy in each possible world. This will be
accomplished if we add to the model a binary relation Q for each player. We need
just one such relation for each player, if we take our assumption that players know
their own states of mind to apply to belief revision policies as well as to beliefs
themselves. Since the belief revision policy is a feature of the agent’s subjective
state, it is reasonable to assume that in all possible worlds that are subjectively
indistinguishable for a player, he has the same belief revision policies.
Subjective indistinguishability (which we defined as follows: x ≈i y if and only
if {z : xRi z} = {z : yRi z}) is an equivalence relation that partitions the space of all
possible worlds for each player, and the player’s belief revision function will be the
same for each world in the equivalence class. (The equivalence class plays the role of
B0 in the simple belief revision structure.) What we need to add to the game model
is a relation Qi for each player that orders all the worlds within each equivalence
class with respect to epistemic plausibility, with worlds compatible with what the
player believes in the worlds in that class having maximum plausibility. So Qi must
meet the following three conditions:
(q1) x ≈i y if and only if xQi y or yQi x.
(q2) Qi is transitive.
(q3) xRi y if and only if wQi y for all w such that w ≈i x.
For any proposition φ, we can define the conditional belief state for player i in
world x, Bi,x(φ) (the posterior belief state that would be induced by learning φ),10
in terms of Qi as follows:

Bi,x(φ) = {y ∈ φ : y ≈i x, and zQi y for all z ∈ φ such that z ≈i x}
10
There is this difference between the conditional belief state Bi,x(φ) and the posterior belief state
that would actually result if the agent were in fact to learn that φ: if he were to learn that φ, he would
believe that he then believed that φ, whereas in our static models, there is no representation of what
the agent comes to believe in the different possible worlds at some later time. But the potential
posterior belief states and the conditional belief states as defined do not differ with respect to any
information represented in the model. In particular, the conditional and posterior belief states do
not differ with respect to the agent’s beliefs about his prior beliefs.
Once we have added to our models a relation Qi for each player that meets these
three conditions, the R relations become redundant, since they are definable in terms
of Q.11 For a more economical formulation of the theory, we drop the Ri's when
we add the Qi's, taking condition (q1) above as the new definition of subjective
indistinguishability, and condition (q3) as the definition of Ri. Formulated this way,
the models are now defined as follows:
A model is a structure ⟨W, a, ⟨Si, Qi, Pi⟩i∈N⟩. W, a, Pi, and Si are as before; each
Qi is a binary reflexive transitive relation on W meeting in addition the following
condition: any two worlds that are Qi related (in either direction) to a third world are
Qi related (in at least one direction) to each other. One can then prove that each Ri,
defined as above, is serial, transitive, and euclidean. So our new models incorporate
and refine models of the simpler kind.
To summarize, the new structure we have added to our models expresses exactly
the following two assumptions:
1. In each possible world each player has a belief revision policy that conforms to
the conditions of the simple AGM belief revision theory sketched above, where
(for player i and world x) the set B is {y : xRi y}, and the set B′ is {y : y ≈i x}
2. In each world, each player has a correct belief about what his own belief revision
policy is.
Each player’s belief revision structure determines a ranking of all possible worlds
with respect to the player’s degree of epistemic success or failure in that world. In
some worlds, the player has only true beliefs; in others, he makes an error, but not
as serious an error as he makes in still other possible worlds. Suppose I am fifth on
a standby waiting list for a seat on a plane. I learn that there is only one unclaimed
seat, and as a result I feel certain that I will not get on the plane. I believe that the
person at the top of the list will certainly take the seat, and if she does not, then I
am certain that the second in line will take it, and so on. Now suppose in fact that
my beliefs are mistaken: the person at the top of the list turns the seat down, and the
next person takes it. Then my initial beliefs were in error, but not as seriously as they
would be if I were to get the seat. If number two gets the seat, then I was making a
simple first degree error, while if I get the seat, I was making a fourth degree error.
It will be useful to define, recursively, a sequence of propositions that distinguish
the possible worlds in which a player’s beliefs are in error to different degrees:
11
The work done by Q is to rank the worlds incompatible with prior beliefs; it does not distinguish
between worlds compatible with prior beliefs – they are ranked together at the top of the ordering
determined by Q. So Q encodes the information about what the prior beliefs are – that is why R
becomes redundant. A model with both Q and R relations would specify the prior belief sets in two
ways. Condition (q3) is the requirement that the two specifications yield the same results.
Here is a simple abstract example, just to illustrate the structure: suppose there are just three possible worlds, x, y, and z, that are subjectively indistinguishable in those worlds to player i. Suppose {x} is the set of worlds compatible with i's beliefs in x, y, and z, which is to say that the R relation is the following set: {⟨x,x⟩, ⟨y,x⟩, ⟨z,x⟩}. Suppose further that y has priority over z, which is to say if i were to learn the proposition {y, z}, his posterior or conditional belief state would be {y}. In other words, the Q relation is the following set: {⟨x,x⟩, ⟨y,x⟩, ⟨z,x⟩, ⟨y,y⟩, ⟨z,y⟩, ⟨z,z⟩}.
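The structure of this little example can be checked mechanically. Here is a minimal Python sketch (my own illustration, with invented names, not part of the original text) that encodes the Q relation just given and recovers the prior and conditional belief sets from it:

# The three-world example: x, y, and z are all subjectively indistinguishable
# for player i, and the pair (w, v) in Q means wQiv.
W = {"x", "y", "z"}
Q = {("x", "x"), ("y", "x"), ("z", "x"),
     ("y", "y"), ("z", "y"), ("z", "z")}

def belief_set(world):
    # {y : xRiy}, using condition (q3): xRiy iff wQiy for all w indistinguishable from x.
    return {y for y in W if all((w, y) in Q for w in W)}

def conditional_belief_set(world, phi):
    # Bi,x(phi) = {y in phi : zQiy for all z in phi indistinguishable from x}.
    return {y for y in phi if all((z, y) in Q for z in phi)}

assert belief_set("x") == {"x"}                          # prior belief set: {x}
assert conditional_belief_set("x", {"y", "z"}) == {"y"}  # learning {y, z} yields {y}

The world argument is idle here only because all three worlds are indistinguishable for i; in a model with several information cells the quantifiers would be restricted to the cell containing the given world.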
E^1_i is the proposition that player i has at least some false belief – makes at least a simple first degree error.
E^1_i = {x ∈ W : for some y such that y ≈i x, not yQi x} (= {x ∈ W : not xRi x})
E^{k+1}_i is the proposition that player i makes at least a k+1 degree error:
E^{k+1}_i = {x ∈ E^k_i : for some y ∈ E^k_i such that y ≈i x, not yQi x}.
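In the three-world example of note 11, for instance, these definitions yield

E^1_i = {y, z},  E^2_i = {z},  E^3_i = ∅,

since i's prior beliefs are true only in x, and since zQi y holds while yQi z does not (this can be checked directly against the Q relation given there).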
The belief revision structure provides for epistemic distinctions between proposi-
tions that are all believed with probability one. Even though each of two propositions
has maximum degree of belief, one may be believed more robustly than the other
in the sense that the agent is more disposed to continue believing it in response to
new information. Suppose, to take a fanciful example, there are three presidential
candidates, George, a Republican from Texas, Bill, a Democrat from Arkansas, and
Ross, an independent from Texas. Suppose an agent believes, with probability one,
that George will win. She also believes, with probability one, that a Texan will win
and that a major party candidate will win, since these follow, given her other beliefs,
from the proposition that George will win. But one of these two weaker beliefs may
be more robust than the other. Suppose the agent is disposed, on learning that George
lost, to conclude that Bill must then be the winner. In this case, the belief that a major
party candidate will win is more robust than the belief that a Texan will win.
The belief revision structure is purely qualitative, but the measure functions
that were already a part of the models provide a measure of the partial beliefs for
conditional as well as for prior belief states. The Q relations, like the R relations,
deliver the sets of possible worlds relative to which degrees of belief are defined.
The partial beliefs for a conditional belief state, like those for the prior states, are given by relativizing the measure function to the relevant set of possible worlds. Just as player i's partial beliefs in possible world x are given by relativizing the measure to the set Bi,x = {y : xRi y}, so the partial beliefs in the conditional belief state for player i, world x, and condition φ are given by relativizing the measure to the set Bi,x(φ) = {y ∈ φ : for all z ∈ φ such that z ≈i x, zQi y}.
So with the help of the belief revision function we can define conditional
probability functions for each player in each world:
In the case where the condition φ is compatible with i's prior beliefs – where Pi,x(φ) > 0 – this will coincide with conditional probability as ordinarily defined. (This is ensured by the conservative condition on the belief revision function.) But this definition extends the conditional probability functions for player i in world x to any condition compatible with the set of worlds that are subjectively indistinguishable for i in x.12
12
These extended probability functions are equivalent to lexicographic probability systems. See
Blume et al. (1991a, b) for an axiomatic treatment of lexicographic probability in the context of
decision theory and game theory. These papers discuss a concept equivalent to the one defined
below that I am calling perfect rationality.
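To illustrate the construction (this is my own sketch, not the author's notation, and the helper names are invented), here is how such an extended conditional probability can be computed for the three-world example: the measure is relativized to the conditional belief set Bi,x(φ), so a condition that has prior probability zero, but is compatible with subjective indistinguishability, still receives a well-defined conditional probability.

# Sketch: extended conditional probability from a measure P and the plausibility
# relation Q of the earlier three-world example (invented names, illustrative only).
W = {"x", "y", "z"}
Q = {("x", "x"), ("y", "x"), ("z", "x"),
     ("y", "y"), ("z", "y"), ("z", "z")}
P = {"x": 0.6, "y": 0.3, "z": 0.1}   # player i's measure on W

def conditional_belief_set(phi):
    # Bi,x(phi): the most plausible phi-worlds (all worlds are indistinguishable here).
    return {y for y in phi if all((z, y) in Q for z in phi)}

def extended_prob(psi, phi):
    # Probability of psi conditional on phi: relativize the measure to Bi,x(phi).
    # (Assumes phi is compatible with subjective indistinguishability, so the base is nonempty.)
    base = conditional_belief_set(phi)
    total = sum(P[w] for w in base)
    return sum(P[w] for w in base & psi) / total

print(extended_prob({"x"}, W))           # 1.0: the prior belief set is {x}
print(extended_prob({"y"}, {"y", "z"}))  # 1.0: conditioning on the "surprise" {y, z}

When the condition has positive prior probability, Bi,x(φ) is just the φ-part of the prior belief set, so this reduces to ordinary conditionalization, as the text notes.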
The belief revision theory and the extended probability functions give us the
resources to introduce a refinement of the concept of rationality. Say that an action
is perfectly rational if it not only maximizes expected utility, but also satisfies a
tie-breaking procedure that requires that certain conditional expected utilities be
maximized as well. The idea is that in cases where two or more actions maximize
expected utility, the agent should consider, in choosing between them, how he
should act if he learned he was in error about something. And if two actions are
still tied, the tie-breaking procedure is iterated – the agent considers how he should
act if he learned that he were making an error of a higher degree. Here is a sequence
of definitions leading to a definition of perfect rationality.
Given the extended conditional probability functions, the definition of conditional expected utility is straightforward:

eu_{i,x}(s/φ) = Σ_{e ∈ C−i} P_{i,x}([e]/φ) · ui((s, e))

r^{k+1}_{i,x} = {s ∈ r^k_{i,x} : eu_{i,x}(s/E^{k+1}_i) ≥ eu_{i,x}(s′/E^{k+1}_i) for all s′ ∈ r^k_{i,x}}

r^+_{i,x} = ∩ r^k_{i,x} for all k such that E^k_i ∩ {y : x ≈i y} is nonempty.

The set r^+_{i,x} is the set of strategies that are perfectly rational for player i in world x. So the proposition that player i is perfectly rational is defined as follows:

A^+_i = {x ∈ W : Si(x) ∈ r^+_{i,x}}.
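Procedurally, the definition amounts to a lexicographic tie-breaking rule. Here is a rough Python sketch of it (my own reconstruction, with invented names; I take the initial candidate set to be the strategies that maximize unconditional expected utility, and the error events to be those E^k_i compatible with what is subjectively indistinguishable for i in x):

# Sketch of the tie-breaking procedure behind perfect rationality (invented names).
# expected_utility(s, condition) stands in for eu_{i,x}(s/condition), with
# condition None meaning unconditional expected utility.
def perfectly_rational_strategies(strategies, expected_utility, error_events):
    def maximizers(candidates, condition):
        best = max(expected_utility(s, condition) for s in candidates)
        return [s for s in candidates if expected_utility(s, condition) == best]

    surviving = maximizers(strategies, None)   # the assumed base case r^0
    for event in error_events:                 # r^{k+1}: break remaining ties on E^{k+1}_i
        surviving = maximizers(surviving, event)
    return surviving                           # r^+_{i,x}

In the first game below, for example, if Alice is certain that Bert will choose t, both of her strategies pass the first test, but only T survives the tie-breaking round, since conditional on her being in error about Bert it does strictly better.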
I want to emphasize that this refinement is defined wholly within individual decision
theory. The belief revision theory that we have imported into our models is a general,
abstract structure, as appropriate for a single agent facing a decision problem to
which the actions of other agents are irrelevant as it is for a situation in which there
are multiple agents. It is sometimes said that while states with probability 0 are
I don’t want to suggest that this is the only way of combining the AGM belief revision structure
with probabilities. For a very different kind of theory, see Mongin (1994). In this construction,
probabilities are nonadditive, and are used to represent the belief revision structure, rather than to
supplement it as in the models I have defined. I don’t think the central result in Mongin (1994)
(that the same belief revision structure that I am using is in a sense equivalent to a nonadditive, and
so non-Bayesian, probability conception of prior belief) conflicts with, or presents a problem for,
the way I have defined extended probability functions: the probability numbers just mean different
things in the two constructions.
relevant in game theory, they are irrelevant to individual decision making,13 but I
see no reason to make this distinction. There is as much or as little reason to take
account, in one’s deliberation, of the possibility that nature may surprise one as there
is to take account of the possibility that one may be fooled by one’s fellow creatures.
Perfect rationality is a concept of individual decision theory, but in the game
model context this concept may be used to give a model theoretic definition of
a refinement of rationalizability. Say that a strategy of a game Γ is perfectly rationalizable if and only if the strategy is played in some model of Γ in which
the players have common belief that they all are perfectly rational. As with ordinary
correlated rationalizability, one can use a simple algorithm to pick out the relevant
class of strategies, and prove a characterization theorem that states that the model
theoretic and algorithmic definitions determine the same class of strategies. Here is
the theorem:
Strategies that survive the elimination of all weakly dominated strategies followed
by the iterated elimination of strictly dominated strategies are all and only those
that are realized in a model in which players have common belief that all are
perfectly rational.14
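The algorithmic half of the theorem (one round of elimination of weakly dominated strategies, followed by iterated elimination of strictly dominated strategies) can be sketched as follows. This is a schematic Python illustration for two-player games, using only pure-strategy dominance (a simplification) and invented names:

# One simultaneous round of weak-dominance elimination, then iterated strict dominance.
def dominated(s, own, opp, payoff, strict):
    # Is s dominated by some other strategy t of the same player?
    for t in own:
        if t == s:
            continue
        diffs = [payoff(t, o) - payoff(s, o) for o in opp]
        if strict and all(d > 0 for d in diffs):
            return True
        if not strict and all(d >= 0 for d in diffs) and any(d > 0 for d in diffs):
            return True
    return False

def surviving_strategies(s1, s2, u1, u2):
    # u1(a, b), u2(a, b): payoffs when player 1 plays a and player 2 plays b.
    w1 = [a for a in s1 if not dominated(a, s1, s2, u1, strict=False)]
    w2 = [b for b in s2 if not dominated(b, s2, s1, lambda b_, a_: u2(a_, b_), strict=False)]
    s1, s2 = w1, w2
    changed = True
    while changed:
        n1 = [a for a in s1 if not dominated(a, s1, s2, u1, strict=True)]
        n2 = [b for b in s2 if not dominated(b, s2, s1, lambda b_, a_: u2(a_, b_), strict=True)]
        changed = (n1, n2) != (s1, s2)
        s1, s2 = n1, n2
    return s1, s2

Run on the first example below, this procedure leaves only T for Alice and t for Bert, matching the claim that Tt is the unique perfectly rationalizable profile of that game.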
Before going on to discuss knowledge, let me give two examples of games to
illustrate the concepts of perfect rationality and perfect rationalizability.
First, consider the following very simple game: Alice can take a dollar for herself alone, ending the game, or instead leave the decision up to Bert, who can decide either that the two players get a dollar each or that neither gets anything.
Figure 42.1 represents the strategic form of this game.
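Since the figure is not reproduced here, the following payoff matrix, with labels of my own choosing (T = take the dollar, L = leave the decision to Bert; t = a dollar each, n = nothing for either), is one strategic form consistent with the description:

            Bert: t    Bert: n
Alice: T     1, 0       1, 0
Alice: L     1, 1       0, 0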
Both strategies for both players are rationalizable, but only Tt is perfectly
rationalizable. If Alice is certain that Bert will play t, then either of her strategies
would maximize expected utility. But only choice T will ensure that utility is
maximized also on the condition that her belief about Bert’s choice is mistaken.
Similarly, Bert may be certain that Alice won’t give him the chance to choose, but if
he has to commit himself to a strategy in advance, then if he is perfectly rational, he
will opt for the choice that would maximize expected utility if he did get a chance
to choose.
13
For example, Fudenberg and Tirole (1992) make the following remark about the relation between
game theory and decision theory: ‘Games and decisions differ in one key respect: probability-0
events are both exogenous and irrelevant in decision problems, whereas what would happen if a
player played differently in a game is both important and endogenously determined’.
To the extent that this is true, it seems to me an accident of the way the contrasting theories are
formulated, and to have no basis in any difference in the phenomena that the theories are about.
14
The proof of this theorem, and others stated without proof in this paper, are available from
the author. The argument is a variation of the proof of the characterization theorem for simple
(correlated) rationalizability given in Stalnaker (1994). See Dekel and Fudenberg (1990) for
justification of the same solution concept in terms of different conditions that involve perturbations
of the payoffs.
I originally thought that the set of strategies picked out by this concept of perfect rational-
izability coincided, in the case of two person games, with perfect rationalizability as defined by
Bernheim (1984), but Pierpaolo Battigalli pointed out to me that Bernheim’s concept is stronger.
Fig. 42.1 (the strategic form of the first game described above)
Fig. 42.2 (the strategic form of the second game described below)
Second, consider the following pure common interest game, where the only
problem is one of coordination. It is also a perfect information game. One might
think that coordination is no problem in a perfect information game, but this example
shows that this is not necessarily true.
Alice can decide that each player gets two dollars, ending the game, or can leave the decision to Bert, who may decide that each player gets one dollar, or may give the decision back to Alice. This time, Alice must decide whether each player gets three dollars or neither gets anything. Figure 42.2 represents the strategic form of
this game.
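Again the figure is not reproduced; a strategic form consistent with the description, with my own labels (T = Alice takes the two dollars at once; LT and LL = Alice leaves the decision to Bert and then, if it comes back to her, takes the three-dollar or the zero outcome; t = Bert takes a dollar each, l = Bert gives the decision back), would be:

             Bert: t    Bert: l
Alice: T      2, 2       2, 2
Alice: LT     1, 1       3, 3
Alice: LL     1, 1       0, 0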
Now suppose Bert believes, with probability one, that Alice will choose T; what
should he do? This depends on what he thinks Alice would do on the hypothesis
that his belief about her is mistaken. Suppose that, if he were to be surprised by
Alice choosing L on the first move, he would conclude that, contrary to what he
previously believed, she is irrational, and is more likely to choose L on her second
choice as well. Given these belief revision policies, only choice t is perfectly rational
for him. But why should Alice choose T? Suppose she is sure that Bert will choose
t, which as we have just seen, is the only perfectly rational choice for him to make if
his beliefs about Alice are as we have described. Then Alice’s only rational choice
is T. So it might be that Alice and Bert both know each other's beliefs about each
other, and are both perfectly rational, but they still fail to coordinate on the optimal
outcome for both. Of course nothing in the game requires that Bert and Alice should
have these beliefs and belief revision policies, but the game is compatible with them,
and with the assumption that both Bert and Alice are perfectly rational.
Now one might be inclined to question whether Bert really believes that Alice
is fully rational, since he believes she would choose L on her second move, if she
got a second move, and this choice, being strictly dominated, would be irrational.
Perhaps if Bert believed that Alice was actually disposed to choose L on her second
move, then he wouldn’t believe she was fully rational, but it is not suggested that
he believes this. Suppose we divide Alice’s strategy T into two strategies, TT and
TL, that differ only in Alice’s counterfactual dispositions: the two strategies are ‘T,
and I would choose T again on the second move if I were faced with that choice’,
and ‘T, but I would choose L on the second move if I were faced with that choice’.
One might argue that only TT, of these two, could be fully rational, but we may
suppose that Bert believes, with probability one, that Alice will choose TT, and not
TL. But were he to learn that he is wrong – that she did not choose TT (since she
did not choose T on the first move) he would conclude that she instead chooses LL.
To think there is something incoherent about this combination of beliefs and belief
revision policy is to confuse epistemic with causal counterfactuals – it would be like
thinking that because I believe that if Shakespeare hadn’t written Hamlet, it would
have never been written by anyone, I must therefore be disposed to conclude that
Hamlet was never written, were I to learn that Shakespeare was in fact not its author.
Knowledge
expected utility, not the value of the actual payoff that I receive in the end, that is
relevant to the explanation and evaluation of my actions, and expected utility cannot
be influenced by facts about the actual world that do not affect my beliefs. But as
soon as we start looking at one person’s beliefs and knowledge about another’s
beliefs and knowledge, the difference between the two notions begins to matter.
The assumption that Alice believes (with probability one) that Bert believes (with
probability one) that the cat ate the canary tells us nothing about what Alice believes
about the cat and the canary themselves. But if we assume instead that Alice knows
that Bert knows that the cat ate the canary, it follows, not only that the cat in fact ate
the canary, but that Alice knows it, and therefore believes it as well.
Since knowledge and belief have different properties, a concept that conflates
them will have properties that are appropriate for neither of the two concepts taken
separately. Because belief is a subjective concept, it is reasonable to assume, as we
have, that agents have introspective access to what they believe, and to what they do
not believe. But if we switch from belief to knowledge, an external condition on the
cognitive state is imposed, and because of this the assumption of introspective access
is no longer tenable, even for logically omniscient perfect reasoners whose mental
states are accessible to them. Suppose Alice believes, with complete conviction and
with good reason that the cat ate the canary, but is, through no fault of her own,
factually mistaken. She believes, let us suppose, that she knows that the cat ate the
canary, but her belief that she knows it cannot be correct. Obviously, no amount
of introspection into the state of her own mind will reveal to her the fact that she
lacks this knowledge. If we conflate knowledge and belief, assuming in general that i knows that φ if and only if i's degree of belief for φ is one, then we get a concept
that combines the introspective properties appropriate only to the internal, subjective
concept of belief with the success properties appropriate only to an external concept
that makes claims about the objective world. The result is a concept of knowledge
that rests on equivocation.
The result of this equivocation is a concept of knowledge with the familiar
partition structure, the structure often assumed in discussions by economists and
theoretical computer scientists about common knowledge, and this simple and
elegant structure has led to many interesting results.15 But the assumption that
knowledge and common knowledge have this structure is the assumption that there
can be no such thing as false belief, that while ignorance is possible, error is not.
And since there is no false belief, there can be no disagreement, no surprises, and
no coherent counterfactual reasoning.16
15
Most notably, Robert Aumann’s important and influential result on the impossibility of agreeing
to disagree, and subsequent variations on it all depend on the partition structure, which requires
the identification of knowledge with belief. See Aumann (1976) and Bacharach (1985). The initial
result is striking, but perhaps slightly less striking when one recognizes that the assumption that
there is no disagreement is implicitly a premise of the argument.
16
If one were to add to the models we have defined the assumption that the R relation is reflexive,
and so (given the other assumptions) is an equivalence relation, the result would be that the three relations, Ri, Qi, and ≈i, would all collapse into one. There would be no room for belief revision,
since it would be assumed that no one had a belief that could be revised. Intuitively, the assumption
would be that it is a necessary truth that all players are Cartesian skeptics: they have no probability-
one beliefs about anything except necessary truths and facts about their own states of mind. This
assumption is not compatible with belief that another player is rational, unless it is assumed that it
is a necessary truth that the player is rational.
is the model theoretic definition: for any game Γ a strategy profile is strongly
rationalizable if and only if it is realized in a model in which there is no error,
common belief that all players are rational, and common belief that there is no
error. The set of strategy profiles characterized by this condition can also be given an
algorithmic definition, using an iterated elimination procedure intermediate between
the elimination of strictly dominated and of weakly dominated strategies.17 We
can also define a further refinement, strong perfect rationalizability: just substitute
‘perfect rationality’ for ‘rationality’ in the condition defining strong rationalizability.
A minor variation of the algorithm will pick out the set of strategy profiles
characterized by these conditions.
Knowledge and belief coincide on this demanding idealization, but suppose we
want to consider the more general case in which a person may know some things
about the world, even while being mistaken about others. How should knowledge
be analyzed? The conception of knowledge that I will propose for consideration
is a simple version of what has been called, in the philosophical literature about
the analysis of knowledge, the defeasibility analysis. The intuitive idea behind this
account is that ‘if a person has knowledge, then that person’s justification must be
sufficiently strong that it is not capable of being defeated by evidence that he does
not possess’ (Pappas and Swain 1978). According to this idea, if evidence that is
unavailable to you would give you reason to give up a belief that you have, then
your belief rests in part on your ignorance of that evidence, and so even if that belief
is true, it will not count as knowledge.
We can make this idea precise by exploiting the belief revision structure sketched
above, and the notion of robustness that allowed us to make epistemic distinctions
between propositions believed with probability one. The analysis is simple: i knows that φ if and only if i believes that φ (with probability one), and that belief is robust with respect to the truth. That is, i knows that φ in a possible world x if and only if φ receives probability one from i in x, and φ also receives probability one in every conditional belief state for which the condition is true in x. More precisely, the proposition that i knows that φ is the set {x ∈ W : for all ψ such that x ∈ ψ, Bi,x(ψ) ⊆ φ}.
Let me illustrate the idea with the example discussed above of the presidential
candidates. Recall that there are three candidates, George, Bill and Ross, and that
the subject believes, with probability one, that George will win. As a result she also
believes with probability one that a Texan will win, and that a major party candidate
will win. But the belief that a major party candidate will win is more robust than
the belief that a Texan will win, since our subject is disposed, should she learn that
George did not win, to infer that the winner was Bill. Now suppose, to everyone’s
surprise, Ross wins. Then even though our subject’s belief that a Texan would win
turned out to be true, it does not seem reasonable to say that she knew that a Texan
would win, since she was right only by luck. Had she known more (that George
17
The algorithm, which eliminates iteratively profiles rather than strategies, is given in Stalnaker
(1994), and it is also proved there that the set of strategies picked out by this algorithm is
characterized by the class of models meeting the model theoretic condition.
would lose), then that information would have undercut her belief. On the other
hand, if Bill turns out to be the winner, then it would not be unreasonable to say that
she knew that a major party candidate would win, since in this case her belief did
not depend on her belief that it was George rather than Bill that would win.
The defeasibility conception of knowledge can be given a much simpler definition in terms of the belief revision structure. It can be shown that the definition given above is equivalent to the following: the proposition that i knows that φ is the set {x : {y : xQi y} ⊆ φ}. This exactly parallels the definition of the proposition that i believes that φ: {x : {y : xRi y} ⊆ φ}. On the defeasibility analysis, the relations that define the belief revision structure are exactly the same as the relations of epistemic accessibility in the standard semantics for epistemic logic.18 And common knowledge (the infinite conjunction: everyone knows that φ, everyone knows that everyone knows that φ, ...) exactly parallels common belief: the proposition that there is common knowledge that φ is {x : {y : xQ*y} ⊆ φ}, where Q* is the transitive closure of the Qi relations.
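These definitions are easy to compute with. Here is a small Python sketch (my own illustration, reusing the three-world model of the earlier footnote), in which knowledge is defined from Q exactly as belief is defined from R:

# Belief and defeasible knowledge in the three-world example (invented names).
W = {"x", "y", "z"}
Q = {("x", "x"), ("y", "x"), ("z", "x"),
     ("y", "y"), ("z", "y"), ("z", "z")}   # the pair (w, v) means wQiv

def successors(rel, x):
    return {y for (a, y) in rel if a == x}

# Ri defined from Qi as in condition (q3); all worlds are indistinguishable here.
R = {(x, y) for x in W for y in W if all((w, y) in Q for w in W)}

def believes(phi):
    return {x for x in W if successors(R, x) <= phi}   # {x : {y : xRiy} is a subset of phi}

def knows(phi):
    return {x for x in W if successors(Q, x) <= phi}   # {x : {y : xQiy} is a subset of phi}

assert believes({"x"}) == W             # i believes {x} in every world
assert knows({"x"}) == {"x"}            # but knows it only where the belief is robustly true
assert knows({"x", "y"}) == {"x", "y"}

In a model with several players, common knowledge would be computed in the same way from the transitive closure Q* of the players' Q relations.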
The defeasibility analysis provides us with two new model theoretic conditions
that can be used to define solution concepts: first, the condition that there is common
knowledge of rationality; second, the condition that there is common knowledge of
perfect rationality. The conditions are stronger (respectively) than the conditions
we have used to characterize rationalizability and perfect rationalizability, but
weaker than the conditions that characterize the concepts I have called strong
rationalizability and strong perfect rationalizability. That is, the class of models in
which there is common belief in (perfect) rationality properly includes the class in
which there is common knowledge, in the defeasibility sense, of (perfect) rationality,
which in turn properly includes the class in which there is no error, common belief
that there is no error, and common belief in (perfect) rationality. So the defeasibility
analysis gives us two distinctive model theoretic solution concepts, but surprisingly,
the sets of strategy profiles characterized by these new model theoretic conditions
are the same as those characterized, in one case, by the weaker condition, and in the
other case by the stronger condition. That is, the following two claims are theorems:
1. Any strategy realized in a model in which there is common belief in (simple)
rationality is also realized in a model in which there is common knowledge (in
the defeasibility sense) of rationality.
2. Any strategy profile realized in a model in which there is common knowledge of
perfect rationality is also realized in a model meeting in addition the stronger
condition that there is common belief that no one has a false belief.19
18
The modal logic for the knowledge operators in a language that was interpreted relative to this
semantic structure would be S4.3. This is the logic characterized by the class of Kripke models in
which the accessibility relation is transitive, reflexive, and weakly connected (if xQi y and xQi z,
then either yQi z or zQi y). The logic of common knowledge would be S4.
19
Each theorem claims that any strategy that is realized in a model of one kind is also realized in
a model that meets more restrictive conditions. In each case the proof is given by showing how to
modify a model meeting the weaker conditions so that it also meets the more restrictive conditions.
Backward Induction
To illustrate how some of this apparatus might be deployed to help clarify the role
in strategic arguments of assumptions about knowledge, belief and counterfactual
reasoning, I will conclude by looking at a puzzle about backward induction reason-
ing, focusing on one notorious example: the finite iterated prisoners’ dilemma. The
backward induction argument purports to show that if there is common belief, or
perhaps common knowledge, that both players are rational, then both players will
defect every time, from the beginning. Obviously rational players will defect on the
last move, and since they know this on the next to last move, they will defect then
as well, and so on back through the game. This kind of argument is widely thought
to be paradoxical, but there is little agreement about what the paradox consists in.
Some say that the argument is fallacious, others that it shows an incoherence in
the assumption of common knowledge of rationality, and still others that it reveals
a self-referential paradox akin to semantic paradoxes such as the liar. The model
theoretic apparatus we have been discussing gives us the resources to make precise
the theses that alternative versions of the argument purport to prove, and to assess
the validity of the arguments. Some versions are clearly fallacious, but others, as I
will show, are valid.
The intuitive backward induction argument applies directly to games in extensive
form, whereas our game models are models of static strategic form games.20 But any
extensive form game has a unique strategic form, and proofs based on the idea of
the intuitive backward induction argument can be used to establish claims about
the strategic form of the game. A backward induction argument is best seen as
an argument by mathematical induction about a class of games that is closed with
respect to the subgame relation – in the case at hand, the class of iterated prisoners’
dilemmas of length n for any natural number n.
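For reference, the backward induction computation itself is elementary. Here is a generic Python sketch (my own formulation, with an invented tree representation) for finite perfect-information games; applied to the second coordination game above, under my labelling of the moves, it returns the path Alice and Bert failed to coordinate on:

# Backward induction on a finite perfect-information game tree (sketch).
# A terminal node is a payoff tuple; a decision node is a dict
# {"player": i, "moves": {action: subtree}}.
def backward_induction(node):
    if isinstance(node, tuple):
        return node, []
    best = None
    for action, subtree in node["moves"].items():
        payoffs, path = backward_induction(subtree)
        if best is None or payoffs[node["player"]] > best[0][node["player"]]:
            best = (payoffs, [action] + path)
    return best

# The second coordination game discussed above (labels are mine).
game = {"player": 0, "moves": {
    "T": (2, 2),
    "L": {"player": 1, "moves": {
        "t": (1, 1),
        "l": {"player": 0, "moves": {"T2": (3, 3), "L2": (0, 0)}}}}}}
assert backward_induction(game) == ((3, 3), ["L", "l", "T2"])

The point of the present section, of course, is to ask what epistemic conditions would entitle the players to expect this path, not to compute it.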
The conclusions of the backward induction arguments are conditional theses: if
certain conditions obtain, then players will choose strategies that result in defection
every time. The conditions assumed will correspond to the constraints on models
that we have used to characterize various solution concepts, so the theses in
question will be claims that only strategy profiles that result in defection every
time will satisfy the conditions defining some solution concept. If, for example, the
conditions are that there is common belief in rationality, then the thesis would be
that only strategies that result in defection every time are rationalizable. It is clear
that a backward induction argument for this thesis must be fallacious since many
20
Although in this paper we have considered only static games, it is a straightforward matter to
enrich the models by adding a temporal dimension to the possible worlds, assuming that players
have belief states and perform actions at different times, actually revising their beliefs in the course
of the playing of the game in accordance with a belief revision policy of the kind we have supposed.
Questions about the relationship between the normal and extensive forms of games, and about the
relations between different extensive-form games with the same normal form can be made precise
in the model theory, and answered.
cooperative strategies are rationalizable. Pettit and Sugden (1989) have given a nice
diagnosis of the fallacy in this version of the argument. But what if we make the
stronger assumption that there is common knowledge of rationality, or of perfect
rationality? Suppose, first, that we make the idealizing assumption necessary for
identifying knowledge with belief: that there is no error and common belief that
there is no error, and common belief that both players are rational. Are all strongly
rationalizable strategy pairs in the iterated prisoners’ dilemma pairs that result in
defection every time? In this case the answer is positive, and the theorem that states
this conclusion is proved by a backward induction argument.
To prove this backward induction theorem, we must first prove a lemma that is
a general claim about multi-stage games – a class of games that includes iterated
games. First, some notation and terminology: let Γ be any game that can be represented as a multi-stage game with observed actions (a game that can be divided into stages where at each stage all players move simultaneously, and all players know the result of all previous moves). Let Γ# be any subgame – any game that begins at the start of some later stage of Γ. For any strategy profile c of Γ that determines a path through the subgame Γ#, let c# be the profile for Γ# that is determined by c, and let C# be the set of all strategy profiles of Γ that determine a path through Γ#. By 'an
SR model’, I will mean a model in which there is (in the actual world of the model)
no error, common belief that there is no error, and common belief that all players
are rational. Now we can state the multi-stage game lemma:
If profile c is strongly rationalizable in Γ, and if c determines a path through Γ#, then c# is strongly rationalizable in Γ#.
This is proved by constructing a model for Γ# in terms of a model for Γ, and showing that if the original model is an SR model, so is the new one. Let M be any SR model for Γ in which c is played in the actual world of the model. Let Γ# be any subgame that contains the path determined by c. We define a model M# for Γ# in terms of M as follows: W# = {x ∈ W : S(x) ∈ C#}. The Qi#'s and Pi#'s are simply the restrictions of the Qi's and Pi's to W#. The Si#'s are defined so that for each x ∈ W#, S#(x) is the profile for the game Γ# that is determined by the profile S(x). (That is, if S(x) = e, then S#(x) = e#.)
To see that M# is an SR model for Γ#, note first that if there is no error and common belief that there is no error in the original model, then this will also hold for the model of the subgame: if {x : aR*x} ⊆ {x : xRi x for all i}, then {x : aR#*x} ⊆ {x : xRi#x for all i}. This is clear, since {x : aR#*x} ⊆ {x : aR*x} ∩ W#, and {x : xRi#x for all i} = {x : xRi x for all i} ∩ W#. Second, because of the fact that players
know all previous moves at the beginning of each stage, they can make their strategy
choices conditional on whether a subgame is reached. (More precisely, for any player i and pair of strategies s and s′ for i that are compatible with Γ# being reached, there is a strategy equivalent to this: s if Γ# is reached, s′ if not.) This implies that for any world w, player i, and subgame such that it is compatible with i's beliefs that that subgame be reached, a strategy will be rational for i only if the
strategy determined for the subgame is rational, conditional on the hypothesis that
the subgame is reached. This ensures that rationality is preserved in all worlds when
the model is modified. So c# is strongly rationalizable in Γ#.
An analogous result about strong perfect rationalizability can be shown by
essentially the same argument.
One further observation before turning to the backward induction theorem itself:
for any game Γ, if profile c is compatible with common belief in (the actual world of) an SR model for Γ, then c itself is strongly rationalizable. It is obvious that if S(x) = c and aR*x, then the same model, with x rather than a as the actual world,
will be an SR model if the original model was.
Now the backward induction theorem:
Any strongly rationalizable strategy profile in a finite iterated prisoners’ dilemma is
one in which both players defect every time.
The proof is by induction on the size of the game. For the base case – the one
shot PD – it is obvious that the theorem holds, since only defection is rational. Now
assume that the theorem holds for games of length k. Let Γ be a game of length k+1, and Γ# be the corresponding iterated PD of length k. Let M be any SR model of Γ, and let c be any strategy profile that is compatible with common belief (that is, c is any profile for which there exists an x such that S(x) = c and aR*x). By the observation just made, c is strongly rationalizable, so by the multi-stage game lemma, c# (the profile for Γ# determined by c) is strongly rationalizable in Γ#. But then by hypothesis of induction, c# is a profile in which both players defect every time. So c (in game Γ) is a profile in which both players defect every time after the
first move. But c is any profile compatible with common belief in the actual world
of the model, so it follows that in the model M, it is common belief that both players
will choose strategies that result in defection every time after the first move. Given
these beliefs, any strategy for either player that begins with the cooperative move
is strictly dominated, relative to that player’s beliefs. So since the players are both
rational, it follows that they choose a strategy that begins with defection, and so one
that results in defection on every move.
Our theorem could obviously be generalized to cover some other games that have
been prominent in discussions of backward induction such as the centipede game
and (for strong perfect rationalizability) the chain store game. But it is not true, even
in perfect information games, that the strong or strong and perfect rationalizability
conditions are always sufficient to support backward induction reasoning. Recall
the perfect information, pure coordination game discussed above in which Alice
and Bert failed to coordinate on the backward induction equilibrium, even though
the conditions for strong perfect rationalizability were satisfied. In that example, the
strategy profile played was a perfect, but not subgame perfect, equilibrium. One can
show in general that in perfect information games, all and only Nash equilibrium
strategy profiles are strongly rationalizable (see Stalnaker (1994) for the proof).
As I noted at the end of the last section, it can be shown that the set of
strongly and perfectly rationalizable strategy profiles is characterized also by the
class of models in which there is common knowledge (in the defeasibility sense)
of perfect rationality. So we can drop the strong idealizing assumption that there
is no error, and still get the conclusion that if there is common knowledge (in the
defeasibility sense) of perfect rationality, then players will choose strategies that
result in defection every time.
Pettit and Sugden, in their discussion of the paradox of backward induction,
grant that the argument is valid when it is common knowledge rather than common
belief that is assumed (though they don’t say why they think this, or what they are
assuming about knowledge). But they suggest that there is nothing surprising or
paradoxical about this, since the assumption of common knowledge of rationality
is incompatible with the possibility of rational deliberation, and so is too strong
to be interesting. Since knowledge logically implies truth, they argue, the argument
shows that ‘as a matter of logical necessity, both players must defect and presumably
therefore that they know they must defect’ (Pettit and Sugden 1989). But I think this
remark rests on a confusion of epistemic with causal possibilities. There is no reason
why I cannot both know that something is true, and also entertain the counterfactual
possibility that it is false. It is of course inconsistent to suppose, counterfactually or otherwise, the conjunction of the claim that φ is false with the claim that I know that φ is true, but it is not inconsistent for me, knowing (in the actual world) that φ is true, to suppose, counterfactually, that φ is false. As Pettit and Sugden say, the
connection between knowledge and truth is a matter of logical necessity, but that
does not mean that if I know that I will defect, I therefore must defect, ‘as a matter
of logical necessity’. One might as well argue that lifelong bachelors are powerless
to marry, since it is a matter of logical necessity that lifelong bachelors never marry.
The semantic connection between knowledge and truth is not, in any case, what
is doing the work in this version of the backward induction argument: it is rather the
assumption that the players believe in common that neither of them is in error about
anything. We could drop the assumption that the players' beliefs are all actually
true, assuming not common knowledge of rationality, but only common belief in
rationality and common belief that no one is in error about anything. This will suffice
to validate the induction argument.
Notice that the common belief that there will not, in fact, be any surprises,
does not imply the belief that there couldn’t be any surprises. Alice might think
as follows: ‘Bert expects me to defect, and I will defect, but I could cooperate, and
if I did, he would be surprised. Furthermore, I expect him to defect, but he could
cooperate, and if he did, I would be surprised’. If these ‘could’s were epistemic
or subjective, expressing uncertainty, then this soliloquy would make no sense,
but it is unproblematic if they are counterfactual ‘could’s used to express Alice’s
beliefs about her and Bert’s capacities. A rational person may know that she will
not exercise certain of her options, since she may believe that it is not in her interest
to do so.
It is neither legitimate nor required for the success of the backward induction
argument to draw conclusions about what the players would believe or do under
counterfactual conditions. In fact, consider the following ‘tat for tit’ strategy: defect
on the first move, then on all subsequent moves, do what the other player did on
the previous move, until the last move; defect unconditionally on the last move.
Our backward induction argument does not exclude the possibility that the players
should each adopt, in the actual world, this strategy, since this pair of strategies
results in defection every time. This pair is indeed compatible with the conditions
for strong and perfect rationalizability. Of course unless each player assigned a very
low probability to the hypothesis that this was the other player’s strategy, it would
not be rational for him to adopt it, but he need not rule it out. Thus Pettit and Sugden
are wrong when they say that the backward induction argument can work only if it
is assumed that each player would maintain the beliefs necessary for common belief
in rationality ‘regardless of what the other does’ (Pettit and Sugden 1989, p. 178).
All that is required is the belief that the beliefs necessary for common knowledge
of rationality will, in fact, be maintained, given what the players in fact plan to do.
And this requirement need not be assumed: it is a consequence of what is assumed.
Conclusion
The aim in constructing this model theory was to get a framework in which to
sharpen and clarify the concepts used both by rational agents in their deliberative
and strategic reasoning and by theorists in their attempts to describe, predict and
explain the behavior of such agents. The intention was, first, to get a framework
that is rich in expressive resources, but weak in the claims that are presupposed
or implicit in the theory, so that various hypotheses about the epistemic states and
behavior of agents can be stated clearly and compared. Second, the intention was to
have a framework in which concepts can be analyzed into their basic components,
which can then be considered and clarified in isolation before being combined
with each other. We want to be able to consider, for example, the logic of belief,
individual utility maximization, belief revision, and causal-counterfactual structure
separately, and then put them together to see how the separate components interact.
The framework is designed to be extended, both by considering further specific
substantive assumptions, for example, about the beliefs and belief revision policies
of players, and by adding to the descriptive resources of the model theory additional
structure that might be relevant to strategic reasoning or its evaluation, for example
temporal structure for the representation of dynamic games, and resources for more
explicit representation of counterfactual propositions. To illustrate some of the
fruits of this approach we have stated some theorems that provide model theoretic
characterizations of some solution concepts, and have looked closely at one familiar
form of reasoning – backward induction – and at some conditions that are sufficient
to validate this form of reasoning in certain games, and at conditions that are not
sufficient. The focus has been on the concepts involved in two kinds of counter-
factual reasoning whose interaction is essential to deliberation in strategic contexts,
and to the evaluation of the decisions that result from such deliberation: reasoning
about what the consequences would be of actions that are alternatives to the action
chosen, and reasoning about how one would revise one’s beliefs if one were to
receive information that one expects not to receive. We can get clear about why
people do what they do, and about what they ought to do, only by getting clear about
the relevance of what they could have done, and might have learned, but did not.21
References
21
I would like to thank Pierpaolo Battigalli, Yannis Delmas, Drew Fudenberg, Philippe Mongin,
Hyun Song Shin, Brian Skyrms, and an anonymous referee for helpful comments on several earlier
versions of this paper.
Chapter 43
Substantive Rationality and Backward
Induction
Joseph Y. Halpern
Starting with the work of Bicchieri (1988, 1989), Binmore (1987), and Reny
(1992), there has been intense scrutiny of the assumption of common knowledge
of rationality, the use of counterfactual reasoning in games, and the role of common
knowledge and counterfactuals in the arguments for backward induction in games
of perfect information. Startlingly different conclusions were reached by different
authors.
These differences were clearly brought out during a 2.5-hour round table discussion
on “Common knowledge of rationality and the backward induction solution for
games of perfect information” at the 1998 TARK (Theoretical Aspects of Rational-
ity and Knowledge) conference. During the discussion, Robert Aumann and Robert
Stalnaker stated the following theorems:
Aumann’s Theorem (Informal version). Common knowledge of substantive
rationality implies the backwards induction solution in games of perfect
information.
Stalnaker’s Theorem (Informal version). Common knowledge of substantive
rationality does not imply the backwards induction solution in games of perfect
information.
The discussion during the round table was lively, but focused on more philo-
sophical, high-level issues. My goal in this short note is to explain the technical
differences between the framework of Aumann (1995) and Stalnaker (1996) that
lead to the different results, and to show what changes need to be made to Aumann’s
framework to get Stalnaker’s result.1 I believe that the points that I make here are
well known to some (and certainly were made informally during the discussion).
Indeed, many of the key conceptual points I make already appear in Stalnaker’s
discussion of Aumann’s result in Stalnaker (1998, pp. 45–50). However, since
Stalnaker uses belief rather than knowledge and must deal with the complications
of having probability, it is not so easy to directly compare his results in Stalnaker
(1998) with Aumann’s. I hope that the simpler model I present here will facilitate
a careful comparison of the differences between Aumann’s and Stalnaker’s results
and thus clarify a few issues, putting the debate on a more rational footing.
There are three terms in the theorems that need clarification:
• (common) knowledge
• rationality
• substantive rationality
I claim that Stalnaker’s result can be obtained using exactly the same definition of
(common) knowledge and rationality as the one Aumann (1995) used. The definition
of knowledge is the standard one, given in terms of partitions. (I stress this point
because Stalnaker (1996) has argued that probability-1 belief is more appropriate
than knowledge when considering games.) The definition of rationality is that a
player who uses strategy s is rational at vertex v if there is no other strategy that
he knows will give him a better payoff, conditional on being at vertex v. Both
Aumann and Stalnaker give substantive rationality the same reading: “rationality at
all vertices v in the game tree”. They further agree that this involves a counterfactual
statement: “for all vertices v, if the player were to reach vertex v, then the player
would be rational at vertex v”. The key difference between Aumann and Stalnaker
lies in how they interpret this counterfactual. In the rest of this note, I try to make
this difference more precise.
The Details
1
The model that I use to prove Stalnaker’s result is a variant of the model that Stalnaker (1996)
used, designed to be as similar as possible to Aumann’s model so as to bring out the key differences.
This, I believe, is essentially the model that Stalnaker had in mind at the round table.
Ki(E) is the event that i knows E. Let A(E) = K1(E) ∩ … ∩ Kn(E). A(E) is the event that everyone (all the players) know E. Let CK(E) = A(E) ∩ A(A(E)) ∩ A(A(A(E))) ∩ …; CK(E) is the event that E is common knowledge.
2
Again, I should stress that this is not exactly the model that Stalnaker uses in (1996), but it suffices
for my purposes. I remark that in Halpern (1999), I use selection functions indexed by the players,
so that player 1 may have a different selection function than player 2. I do not need this greater
generality here, so I consider the simpler model where all players use the same selection function.
3
There are certainly other reasonable properties we could require of the selection function. For
example, we might want to require that if v is reached in some state in Ki(ω), then f(ω, v) ∈ Ki(ω). I believe that it is worth trying to characterize the properties we expect the selection
function should have, but this issue would take us too far afield here. (See Stalnaker RC, 1999,
Counterfactual propositions in games, Unpublished manuscript, for further discussion of this
point.) Note that F1–F3 are properties that seem reasonable for arbitrary games, not just games
of perfect information.
(Figure: a three-move game of perfect information with vertices n1, n2, n3; the players A, B, A move in turn, each choosing a (across) or d (down), and three a moves end the game with payoff (3,3).)
4
A game is nondegenerate if the payoffs are different at all the leaves.
5
Actually, F4 says that player i considers at least as many strategies possible at ! as at f .!; v/.
To capture the fact that player i’s beliefs about other players’ possible strategies do not change,
we would need the opposite direction of F4 as well: if ! 0 2 Ki .!/ then there exists a state ! 00 2
Ki .f .!; v// such that s.! 0 / and s.! 00 / agree on the subtree of below v. I do not impose this
requirement here simply because it turns out to be unnecessary for Aumann’s Theorem.
steps iff there exists a state ω‴ that is reachable from ω′ in 1 step such that ω″ is reachable from ω‴ in k steps. We say that ω″ is reachable from ω′ if ω″ is reachable from ω′ in k steps for some k. It is well known (Aumann 1976) that ω′ ∈ CK(E) iff ω″ ∈ E for all ω″ reachable from ω′.
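This characterization makes common knowledge straightforward to compute in a finite model. A schematic Python sketch (my own, with invented names) for a partition model:

# Common knowledge via reachability in a finite partition model (sketch).
# partitions[i][w] is the cell of player i's information partition containing state w.
def common_knowledge(E, states, partitions):
    def reachable(start):
        seen, frontier = {start}, [start]
        while frontier:
            w = frontier.pop()
            for cells in partitions:
                for u in cells[w]:
                    if u not in seen:
                        seen.add(u)
                        frontier.append(u)
        return seen
    return {w for w in states if reachable(w) <= E}

states = {1, 2, 3}
partitions = [{1: {1, 2}, 2: {1, 2}, 3: {3}},   # player 1: cells {1,2}, {3}
              {1: {1}, 2: {2, 3}, 3: {2, 3}}]   # player 2: cells {1}, {2,3}
assert common_knowledge({1, 2, 3}, states, partitions) == {1, 2, 3}
assert common_knowledge({1, 2}, states, partitions) == set()   # {1,2} is nowhere common knowledge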
I show by induction on k that for all states ω′ reachable from ω, if v is a vertex which is at height k in the game tree (i.e., k moves away from a leaf), the move dictated by the backward induction solution (for the subgame of Γ rooted at v) is played at v in state ω′.
For the base case, suppose v is at height 1 and ω′ is reachable from ω. Since ω ∈ CK(S-RAT), we must have ω′ ∈ S-RAT. Suppose player i moves at ω′. Since ω′ ∈ S-RAT, player i must make the move dictated by the backwards induction solution at f(ω′, v). By F3, he must do so at ω′ as well.
For the inductive step, suppose that v is at height k+1, player i moves at v, and ω′ is reachable from ω. Suppose, by way of contradiction, that a is the action indicated by the backward induction solution at v but si(ω′)(v) = a′ ≠ a. Note that by the induction hypothesis, at every vertex below v, all the players play according to the backward induction solution in state ω′. Since ω′ ∈ S-RAT, we must have that i is rational at v in f(ω′, v). By F3, it follows that i plays a′ at vertex v in f(ω′, v) and at every vertex below v, the players play according to the backward induction solution. Thus, there must be a state ω″ ∈ Ki(f(ω′, v)) such that by using si(f(ω′, v)), player i does at least as well in ω″ as by using the backward induction strategy starting from v. By F4, there must exist some ω‴ ∈ Ki(ω′) such that s(ω″) and s(ω‴) agree on the subtree of Γ below v. Since ω‴ is reachable from ω, by the induction hypothesis, all players use the backward induction solution at vertices below v. By F3, this is true at ω″ as well. However, this means that player i does better at ω″ playing a at v than a′, giving us the desired contradiction.
For the second half, given a nondegenerate game of perfect information, let s be the strategy where, at each vertex v, the players play the move dictated by the backward induction solution in the game defined by the subtree below v. For each vertex v, let sv be the strategy where the players play the actions required to reach vertex v, and then below v, they play according to s. Note that if v is reached by s, then sv = s. In particular, if r is the root of the tree, then sr = s. Consider the extended model (Ω, K1, …, Kn, s, f) where Ω = {ωv : v is a vertex of Γ}, Ki(ωv) = {ωv}, s(ωv) = sv, and f(ω, v) is ω if v is reached by s(ω) and ωv otherwise. I leave it to the reader to check that this gives us an extended model where the selection function satisfies F1–F4.
Remarks. 1. As mentioned earlier, Theorem 1 is similar in spirit to the theorem
proved by Stalnaker (1998, p. 43). To distinguish this latter theorem from what
I have been calling “Stalnaker’s Theorem” in this paper, I call the latter theorem
“Stalnaker’s probabilistic theorem”, since it applies to his formal model with
probabilities. To state Stalnaker’s probabilistic theorem precisely, I need to
explain some differences between my model and his. First, as I mentioned earlier,
Stalnaker considers belief rather than knowledge, where belief is essentially
“with probability 1”. He defines a probabilistic analogue of extended models and
if v3 is reached, Ann will play a”? Or perhaps it should be “given her current
beliefs (regarding, for example, what move Bob will make), if v3 is reached, Ann
will play a”. Or perhaps it should be “in the state ‘closest’ to the current state
where v3 is actually reached, Ann plays a”.
I have taken the last reading here (where ‘closest’ is defined by the selection
function); assumption F3 essentially forces it to be equivalent to the second
reading.
However, without F4, this equivalence is not maintained with regard to Bob’s
beliefs. That is, consider the following two statements:
• Bob currently believes that, given Ann’s current beliefs, Ann will play a if v3
is reached;
• in the state closest to the current state where v3 is reached, Bob believes that
Ann plays a at v3 .
The first statement considers Bob’s beliefs at the current state; the second
considers Bob’s beliefs at a different state. Without F4, these beliefs might be
quite different. It is this possible difference that leads to Stalnaker’s Theorem.
3. Strategies themselves clearly involve counterfactual reasoning. If we take strate-
gies as primitive objects (as both Aumann and Stalnaker do, and as I have done
for consistency), we have two sources of counterfactuals in extended models:
selection functions and strategies. Stalnaker (1996, p. 135) has argued that “To
clarify the causal and epistemic concepts that interact in strategic reasoning, it
is useful to break them down into their component parts.” This suggests that it
would be useful to have a model where strategy is not a primitive, but rather is
defined in terms of counterfactuals. This is precisely what Samet (1996) does.6
Not surprisingly, in Samet’s framework, Aumann’s Theorem does not hold
without further assumptions. Samet shows that what he calls a common hypothe-
sis of rationality implies the backward induction solution in nondegenerate games
of perfect information. Although there are a number of technical differences in
the setup, this result is very much in the spirit of Theorem 1.
Acknowledgements I’d like to thank Robert Stalnaker for his many useful comments and
criticisms of this paper.
References
6
Samet does not use selection functions to capture counterfactual reasoning, but hypothesis
transformations, which map cells (in the information partition) to cells. However, as I have shown
(Halpern 1999), we can capture what Samet is trying to do by using selection functions.