
Jon Elster and the problem of estimating the net value of information: Theory and implications

By Hans O. Melberg

 

 

Introduction

As argued in chapters one and two, there is a shift after 1985 in Jon Elster's arguments about the problems involved in collecting an optimal amount of information. He no longer uses the term "infinite regress" and he does not quote S.G. Winter. Instead, the argument focuses on the problems involved in the formation of probabilities. The general position is that "beliefs are indeterminate when the evidence is insufficient to justify a judgment about the likelihood of the various outcomes of action. This can happen in two main ways: through uncertainty, especially about the future, and through strategic action" (Nuts and Bolts, p. 33). More specifically, the following two quotations illustrate some of the causes of the problem according to Elster:

Deciding how much evidence to collect can be tricky. If the situation is highly stereotyped, as medical diagnosis is, we know pretty well the costs and benefits of additional information. In situations that are unique, novel and urgent, like fighting a battle or helping the victim in a car accident, both costs and benefits are highly uncertain … (Nuts and Bolts, p. 35, my emphasis)

In many everyday decisions, however, not to speak of military or business decisions, a combination of factors conspire to pull the lower and upper bounds [on how much information it would be rational to collect] apart from one another. The situation is novel, so that past experience is of limited help. It is changing rapidly, so that information runs the risk of becoming obsolete. If the decision is urgent and important, one may expect both the benefits and the opportunity costs of information-collecting to be high, but this is not to say that one can estimate the relevant marginal equalities. (Elster 1985, p. 70, my emphasis)

To impose some order on the following discussion, I want to make a distinction between three types of probability, three types of problems and three types of implications.

On probability, we may follow Elster (ETC, pp. 195-199) and distinguish between three concepts of probability according to their source: objective probability (based on relative frequency), theoretical probability (the estimate derives from a theory, as in a weather prediction), and subjective probability (degrees of belief, as measured by the willingness to bet on them).

As for the three problems, I want to make a conceptual distinction between non-existent probabilities, weak (but unbiased) probabilities and biased probabilities. Elster seems to argue that both non-existent and weak probabilities represent indeterminacy (see the first quotation, NB p. 33), but I believe it is important to distinguish between the two, since the question in this chapter is whether it is impossible to form beliefs about the value of information.

Finally, I want to separate three implications related to the arguments about probabilities. First, the advice that uncertainty makes it rational to use the maximin strategy. Second, that uncertainty implies that it would be intellectually honest to use a strategy of randomization. Third, that uncertainty implies that we should not seek more information, since it is wasteful to spend resources learning the second decimal when we cannot know the first.

 

Table 1: An overview of Elster's arguments about the problem of estimation and their implications

Probability concept  | Problem                    | Cause                             | Implication (a)       | Justification                          | Example
Objective            | Non-existent probabilities | Brute and strategic uncertainty   | Maximin (b)           | Arrow-Hurwicz proof (best end result?) | Choice between fossil, nuclear and hydroelectric energy
Objective/Subjective | Weak probabilities         | Brute and strategic uncertainty   | Randomization/Maximin | Intellectual honesty                   | Choice of career (forester or lawyer); investment choices
Subjective           | Biased probabilities       | Hot and cold cognitive mechanisms | Randomization?        | Better end result                      | Investment choices?

(a) Implication common to all rows: do not waste time seeking information when such information is impossible to find or only weakly significant.

(b) Assuming we know the best/worst possible outcome.

 

The textual basis for the distinctions on implications (can be skipped)

In Explaining Technical Change, Elster (1983, p. 185) argues that "there are two forms of uncertainty [risk and ignorance] that differ profoundly in their implications for action. [...]. To this analytical distinction there corresponds a distinction between two criteria for rational choice, which may be roughly expressed as 'maximize expected utility' and 'maximize minimal utility'." More specifically, the argument is that the choice between fossil, nuclear and hydroelectric energy sources should be determined not by trying to assign numerical probabilities to the outcomes, but by selecting the alternative with the best worst consequence (maximin). To justify this principle, Elster appeals to a paper by Arrow and Hurwicz (1972). Hence, one implication of the impossibility of estimating probabilities - Elster claims - is that we should use maximin instead of maximizing expected utility (see also ETC p. 76).

In a different context, the argument is that intellectual honesty implies that we should use a strategy of randomization when we are in situations of ignorance:

In my ignorance about the first decimal - whether my life will go better as a lawyer or as a forester - I look to the second decimal. Perhaps I opt for law school because that will make it easier for me to visit my parents. This way of deciding is as good as any - but it is not one that can be underwritten by rational choice as superior to, say, just tossing a coin. (SJ, p. 10)

The idea is followed up in a chapter discussing rules about child custody after a divorce, in which Elster argues that it may be better to toss a coin than to make an impossible attempt to determine which of the parents will be best for the child.

A third implication of uncertainty, according to Elster, is that it is wasteful to collect a lot of information: "it is often more rational to admit ignorance than to strive for numerical quasi-precision in the measurement of belief" (US, p. 128).

In sum, Elster presents a number of arguments about our inability to form reliable estimates and the implications of this inability. Probabilities can be non-existent, weak or biased, and this implies that it may be rational to use maximin and/or randomization instead of maximizing expected utility, and that it is irrational to collect information about the second decimal of a problem when the first decimal is unknown. The arguments are summarized in Table 1 above.

 

Are the arguments valid?

To further demonstrate what Elster labels an irrational preference for symmetry, I have chosen to discuss the validity of Elster's arguments under three headings. First, how strong is the argument about the non-existence of probabilities (which involves a discussion of subjective and objective probability)? Second, how sound is the argument that randomization is preferable (since it is more honest) in situations of weak probabilities? Third, what is the relevance of biased probabilities to the indeterminacy of rational choice? Under these three headings I want to discuss both the validity of the arguments in isolation and their consistency with Elster's other arguments.

On the existence of probability estimates

The principle of maximizing expected utility presupposes that the agent has, or can form, probabilities about the possible consequences of an action. Hence, if it can be shown that these probabilities do not exist, it follows that MEU cannot be used in that situation. This means, as Elster argues, that uniqueness, novelty and fast-changing environments are problematic for expected utility theory, because we cannot use previous experience of similar situations to estimate the relevant probabilities. One possible counterargument is that Elster's arguments about uniqueness and the non-existence of probabilities are heavily dependent on the classical view of probability as relative frequency. If, for instance, we use the concept of theoretical probability, it seems perfectly possible to get reasonable estimates even from unique combinations of weather observations. Another, and in this context more significant, counterargument is that probabilities should be interpreted as measures of subjective uncertainty, in which case it is perfectly possible to speak about probability even in unique situations.

Subjective probabilities

Elster, of course, is aware of this alternative view of probability, but he argues against the use of subjective probabilities. His arguments are (rather crudely) summarized in the following list:

  1. It denies the possibility of genuine uncertainty (SG, p. 19-20).
  2. It leads to logical inconsistencies.
  3. "It presupposes that we are somehow able to integrate our various fragments of knowledge and arrive at a stable and intersubjectively valid conclusion" (ETC, p. 199).

On (1) and (2)

Does subjective probability deny genuine uncertainty? Bayesians argue that it is always possible to translate our uncertainty into probability statements about the world that can be acted upon. We simply elicit the subjective probabilities by forcing a person to choose between a given set of alternatives. For instance, suppose you had to choose between the following alternatives (A vs. B; the example is built on US p. 129):

A: If you correctly guess the twenty-first decimal of pi you get $100; if you are wrong you get nothing.

B: If you draw a red ball from an urn with p per cent red balls and 100 - p per cent blue balls, you get $100.

If the person prefers A to B, one might infer that the person's subjective probability of being able to guess the decimal is higher than p per cent. One might then increase the percentage of red balls in alternative B and force the agent once again to choose between A and B. If we continue this process, we will eventually come to a point where the agent prefers B to A (or end up with the conclusion that the agent is certain that he can guess the twenty-first decimal of pi).
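To make this elicitation procedure concrete, here is a minimal sketch in Python (my own illustration; the agent model and function names are hypothetical, and the agent is assumed simply to maximize expected payoff given an internal degree of belief q):

# Illustrative sketch of the elicitation procedure: repeatedly offer a choice
# between bet A (win $100 by guessing the digit) and bet B (win $100 by drawing
# a red ball from an urn with a fraction p of red balls), and adjust p until
# the agent switches. The switching point is read off as the agent's
# "subjective probability". The agent below is a stand-in that prefers
# whichever bet has the higher expected payoff given its internal belief q.

def prefers_a(q, p, prize=100.0):
    """True if the agent prefers bet A (belief q) to bet B (urn fraction p)."""
    return q * prize > p * prize

def elicit(q, tolerance=1e-4):
    """Find the urn fraction at which the agent switches from A to B."""
    low, high = 0.0, 1.0
    while high - low > tolerance:
        p = (low + high) / 2
        if prefers_a(q, p):
            low = p   # agent still prefers A: its belief exceeds p
        else:
            high = p  # agent prefers B: its belief is below p
    return (low + high) / 2

print(round(elicit(0.1), 3))  # an agent with belief 0.1 yields roughly 0.1

Note that the sketch presupposes exactly what is questioned below: that the agent's choices really are driven by expected payoff.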

I am not convinced by this argument for the non-existence of genuine uncertainty. First, it seems to deny (by assumption) the very possibility we want to examine: we do not allow the agent to respond "I don't know!" Second, it assumes that the answer reveals what we want it to reveal, since the inference from the choice to the agent's subjective uncertainty is only valid if the agent really tries to maximize his expected utility when faced with the two alternatives. If the agent instead simply selects his answer at random (or uses some other criterion), then the inference from his answer to his subjective probability is not valid.

A Bayesian might argue that the problem can easily be solved by saying that total ignorance ("I don't know" in the example above) can simply be translated into the probability statement that "all outcomes are equally likely." I find this an attractive proposal, but it is both conceptually and logically problematic. Conceptually, as Iversen (1984, p. 61) admits, "saying that each value is equally likely is to say something about the parameter and represents one step up from complete ignorance." As for the logical problem, imagine that you have to guess the value of X, and all you know is that X is somewhere between 0 and 5 inclusive (the example is from Iversen 1984, p. 61). If you assume that complete ignorance means that all outcomes between 0 and 5 are equally likely, then the probability that X is less than 2.5 is 0.5:

P(X < 2.5) = 0.5

But, if you are ignorant about the value of X, you are also ignorant about the value of X². The possible range of X² is from 0 to 25 (since X goes from 0 to 5). This means that the probability that X² is less than 12.5 should be 0.5 (being ignorant about the value of X², we simply say that all outcomes between 0 and 25 are equally likely). In other words:

P(X² < 12.5) = 0.5

By taking the square root of both sides of the inequality above, we get:

P(X < 3.54) = 0.5

But this clearly contradicts the first statement that P(X < 2.5) = 0.5.
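A quick numerical check (my own illustration) makes the inconsistency concrete by simulating the two ways of modelling "complete ignorance":

# Illustrative check: "ignorance" modelled as a uniform distribution over X is
# not the same as "ignorance" modelled as a uniform distribution over X squared.
import random

random.seed(0)
N = 100_000

# Model 1: X uniform on [0, 5]
xs = [random.uniform(0, 5) for _ in range(N)]
p1 = sum(x < 2.5 for x in xs) / N             # P(X < 2.5) under model 1

# Model 2: X squared uniform on [0, 25]
x2s = [random.uniform(0, 25) for _ in range(N)]
p2 = sum(v < 12.5 for v in x2s) / N           # P(X² < 12.5), i.e. P(X < 3.54)
p3 = sum(v ** 0.5 < 2.5 for v in x2s) / N     # P(X < 2.5) under model 2

print(round(p1, 2))  # about 0.50
print(round(p2, 2))  # about 0.50
print(round(p3, 2))  # about 0.25, contradicting the 0.50 from model 1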

I am not sure how to respond to this problem. It certainly shows that complete ignorance is not the same as a uniform probability distribution. It does not show, however, that complete ignorance is something that really exists. The inconsistency is simply caused by the different specifications of the possible outcomes. One might "solve" the problem by arguing that the specification of possible outcomes also belongs to the subjective realm. That is, we must simply use the states we believe are possible in the calculation, and the demonstration that this is inconsistent with the results obtained from a different (larger) set of possible states is not relevant (or does not prove irrationality). I cannot be blamed for not using a set of outcomes I believed did not exist (given that this belief itself was rational). I am slightly more worried about the conceptual step (going from "I don't know" to a probability distribution), but I am less willing than Elster to dismiss the argument that "insufficient reason" justifies a uniform distribution.

On (3)

The final argument is that subjective probabilities are not intersubjectively valid. I am unsure what this means, but one interpretation might be that people given the same information may come up with different probabilities, and it sounds wrong to argue that both are equally valid as a basis for calculating what you should do. (The underlying argument seems to be that "two different estimates cannot both be equally rational since there is only one truth.") A Bayesian could make several responses. First, Bayesian and classical estimates may converge over time even if people have different initial priors (people starting with different beliefs about the number of red and blue balls in an urn will revise their beliefs, using Bayes' rule, as they are allowed to see the colour of selected balls).
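A stylized sketch of this convergence point (my own illustration, with invented numbers and a standard Beta-Binomial updating rule):

# Illustrative sketch: two observers with different Beta priors over the share
# of red balls update on the same sequence of draws; their estimates converge
# as the evidence accumulates.
import random

random.seed(1)
true_share_red = 0.3
draws = [random.random() < true_share_red for _ in range(1000)]  # True = red

# Prior pseudo-counts (red, blue): one "optimist" and one "pessimist".
observers = {"optimist": [8.0, 2.0], "pessimist": [2.0, 8.0]}

for n, is_red in enumerate(draws, start=1):
    for name, (red, blue) in observers.items():
        observers[name] = [red + is_red, blue + (not is_red)]
    if n in (10, 100, 1000):
        print(n, {name: round(r / (r + b), 3) for name, (r, b) in observers.items()})
# After 1000 draws both posterior means are close to 0.3 despite the different priors.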

Second, given differences in background knowledge, it is perfectly possible that two rational people come up with different probability estimates. People differ in their background knowledge because they have encountered different information in their lives, and this is reflected in their prior beliefs. Rational updating based on the same new information may then result in two different beliefs, but neither need be more rational than the other (one is certainly closer to the truth than the other, but that is not the point; beliefs do not have to be true to be rational).

I believe that this second point also reveals a tension in Elster’s argument. He demands that probabilities should be intersubjectively valid, but he also insists that rationality is a subjective notion. Consider the following quotation:

It is not possible, however, to give general optimality criteria for the gathering of information. One frequently made proposal - to collect information up to the point where the expected value of more evidence equals marginal cost of collecting it - fails because it does not respect the subjective character of rational choice (RC, p. 14, my emphasis)

The argument here is that an outside observer might be able to assess the value of information, but this does not help the person who tries to act rationally as long as he cannot estimate the value of information himself. The information has to be available to the person who is making the decision. This is true, but it also suggests that probability is an inherently subjective notion. As argued, different persons have different information, and it is therefore possible that they both rationally arrive at probability estimates that differ. To demand that probabilities be intersubjectively valid (if one means by this that everybody should arrive at the same estimate) is to impose an objective standard on something that is inherently subjective. [On reflection I am not sure that this is what Elster means by the phrase "intersubjectively valid."]

A third reply to the argument that subjective probabilities are not "intersubjectively valid" is that objective probabilities are no more intersubjectively valid than subjective probabilities. This is because there is no neutral criterion that determines which cases are "similar enough" to be used as a basis for calculating the objective probability. Some might argue that it was impossible to estimate the probability that the USSR would collapse (there being no similar events to use as a basis for calculation), while others might argue that history provided cases of "similar empires" that could be used to work out the probability of collapse. Or, to use an example from Elster: "The doctor carrying out a medical diagnosis finds himself many times in the same situation" while "most people are unemployed only once, or, if more than once, under widely differing circumstance." (SJ, p. 16, original emphasis). For this argument to be "intersubjectively valid" we need a criterion of "sameness" and of "different circumstances", and there is no such neutral criterion.

Risk dominates uncertainty and vice versa

Even if we concede the (dubious) point that only objective probabilities are valid as inputs in the decision-making process, Elster himself presents an argument that reduces the importance of uncertainty (ETC, p. 202). The argument is that risk dominates uncertainty when the two interact multiplicatively. For instance, assume you want to know the probability of the successful use of stolen plutonium. For this to occur, three things must happen: somebody must try to steal the plutonium (assume the probability of this is p1), the break-in must be successful (p2), and they must manage to construct a bomb using the plutonium (p3). A safety expert worried about this may then multiply the three probabilities to get an estimate of how likely the "successful theft" scenario is: p1 * p2 * p3. As long as one of these is measurable, there is some basis for the overall probability, since the overall probability cannot be higher than any of its individual components. While this may reduce the problem of genuine ignorance, we should also be aware that uncertainty dominates risk when the two interact additively. This gives uncertainty, once again, an important role.
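The multiplicative point can be made concrete with a trivial calculation (the value of p2 below is invented for illustration):

# If the probability of a successful break-in is measurable, say p2 = 0.05,
# then the overall probability p1 * p2 * p3 can never exceed 0.05, whatever the
# unmeasurable p1 and p3 are, since each lies between 0 and 1. An additive
# combination has no such bound: an unknown term can swamp the known one.
p2 = 0.05
upper_bound = 1.0 * p2 * 1.0  # worst case, with p1 = p3 = 1
print(upper_bound)            # 0.05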

Sub-conclusion on the existence of genuine uncertainty

I hope to have shown that Elster's argument about the non-existence of probability depends quite heavily on the classical view of probability as relative frequency. I also hope to have shown that the argument in favour of this view, and against the subjective view, is (at least) open to discussion. Beyond this I have no strong conclusions on whether the non-existence of probabilities is a serious problem. I tend to believe (rather weakly) that there is often some aspect of the problem that allows us to make some inferences about probabilities. For instance, in the problem about pi mentioned above I would certainly choose A as long as the percentage of red balls was below 10, since there are only ten possible digits to choose from. In many cases it also seems reasonable to translate "I don't know" into "all alternatives are equally likely." Yet I am also aware of the problems with these proposals, and this is the reason for my guarded conclusion.

Weak probabilities and the argument for randomization

First of all, we must ask in what sense probabilities are weak. Since I want to distinguish between bias and weakness, I shall reserve the label weak for beliefs that are unbiased. Conceptually the distinction is important (although drawing it in practice is more difficult!). For instance, we may form a belief about the colour composition of the balls in an urn based on a sample of three. This belief is not very strong, but - if the proper statistical formulas are applied - it is not biased.
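A small simulation (my own illustration, with invented numbers) shows the distinction: the estimated share of red balls from a sample of three is unbiased on average, yet any single estimate is weak:

# The proportion of red balls in a sample of three is an unbiased estimator of
# the true share (its average over many samples equals the truth), but a single
# estimate has a large standard deviation, i.e. the belief it supports is weak.
import random
from statistics import mean, stdev

random.seed(2)
true_share = 0.6
estimates = []
for _ in range(20_000):
    sample = [random.random() < true_share for _ in range(3)]
    estimates.append(sum(sample) / 3)

print(round(mean(estimates), 3))   # close to 0.6: unbiased
print(round(stdev(estimates), 3))  # around 0.28: each single estimate is weak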

As mentioned, Elster argues that some beliefs are too weak to justify inclusion in a rational calculation of net expected utility (and that for this reason we should refrain from choosing actions based on such calculations):

In my ignorance about the first decimal - whether my life will go better as a lawyer or as a forester - I look to the second decimal. Perhaps I opt for law school because that will make it easier for me to visit my parents. This way of deciding is as good as any - but it is not one that can be underwritten by rational choice as superior to, say, just tossing a coin. (SJ, p. 10)

I think the argument is weak. Assume you have to choose between the following two alternatives:

A: 10 000 USD with an estimated probability of 50.01 per cent (and 0 with a probability of 49.99 per cent)

B: 10 000 USD with an estimated probability of 49.99 per cent (and 0 with a probability of 50.01 per cent)

It seems to me that I would choose A even if I knew that the variance of my estimated probability was high. True, I have no strong preference between the alternatives, but why toss coins as long as I have an option that gives a higher expected payoff? Elster might reply that this choice is an example of hyperrationality (defined as "the failure to recognize the failure of rational choice theory to yield unique prescriptions or predictions", SJ, p. 17). I agree that it would be irrational to spend much time and money trying to estimate the second decimal if we were ignorant about the first in the case above, but that is not the question. We are not asking whether it is profitable to collect more information, but which choice you should make given a fixed set of information.

One might argue that the difference is small in the example above, but the true comparison is not simply between the probabilities; it is between the expected utilities, that is, the probabilities multiplied by the payoffs. In the case above the difference is only $2, but the larger the payoff, the more significant the same small difference in probability becomes. This argument seems to reveal a tension in Elster's view: in the quotations at the beginning of this chapter he argues that both factors (weak probabilities and large payoffs or "importance") pull in the direction of coin-tossing, but it seems to me that the factors (at least in my example) pull in opposite directions.
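The arithmetic behind this point (my own illustration): the gap in expected value is the payoff times the gap in probability, so larger stakes magnify the same small difference.

# Expected-value gap between A and B for increasing payoffs.
prob_gap = 0.5001 - 0.4999  # 0.0002
for payoff in (10_000, 1_000_000, 100_000_000):
    print(payoff, round(payoff * prob_gap, 2))
# 10 000 -> about $2; 1 000 000 -> about $200; 100 000 000 -> about $20 000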

There is, however, an even more serious problem with Elster's suggestion. In the real world we encounter many choices in which we must rely on probabilities of varying reliability. Sometimes we are very uncertain, sometimes we are more certain. Let us compare the following two rules for choosing what to do (decision guides):

A: If your beliefs are very weak, you should (or, more weakly: might as well) toss a coin to decide the matter; if your beliefs are reliable, you should choose the alternative with the highest expected utility. (Elster's strategy)

B: Choose the action with the highest expected utility in situations with both weak and strong beliefs. (Bayesian strategy)

First of all, the fact that we have to make many choices means that the many small differences become large in aggregate. As a Bayesian puts it, in answer to why we should choose B:

"... life is full of uncertainties - in a given week, you may buy insurance, bet on a football game, make a guess on an exam question, and so forth. As you add up the uncertainties of the events, the law of large numbers come into play, and the expected value determine your long-run gains" (Gelman 1998, p. 168).

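A rough simulation (my own, with invented numbers) illustrates this aggregation argument: across many decisions in which the two options differ only slightly, always taking the higher expected value (rule B) reliably beats coin-tossing (rule A) in the aggregate:

# Rule A tosses a coin between two gambles whose success probabilities differ
# only slightly; rule B always picks the gamble with the higher probability.
# Summed over many decisions, rule B comes out clearly ahead.
import random

random.seed(3)
PRIZE = 100.0
N = 100_000

total_coin, total_best = 0.0, 0.0
for _ in range(N):
    p1 = random.uniform(0.4, 0.6)                  # one option's success probability
    p2 = p1 + random.uniform(-0.03, 0.03)          # the other differs only slightly
    p_coin = p1 if random.random() < 0.5 else p2   # rule A: toss a coin
    p_best = max(p1, p2)                           # rule B: maximize expected value
    total_coin += PRIZE * (random.random() < p_coin)
    total_best += PRIZE * (random.random() < p_best)

print(round(total_coin), round(total_best))  # rule B typically ends up tens of thousands of dollars ahead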
Another problem with Elster's decision rule is that before we make a decision we have to determine whether the situation is one of "enough" certainty to choose the action that maximizes expected utility, or whether we are so uncertain that we should randomize (or do something else, like use maximin). Where is the limit, and is it not costly to examine the circumstances in this way every time we have to make a decision? Of course, we could go all the way and say that our knowledge is always so weak that we should always toss coins. In this way we would avoid the problem of deciding which procedure to use under Elster's strategy. Sometimes Elster is attracted to this argument, but at other times he seems to want to have his cake and eat it too. For instance, he is sympathetic to Descartes when he compares our limited knowledge to being lost in a forest. Yet, when discussing child custody after a divorce, he does not want to go all the way and argue that it might as well always be decided by randomization. In some "obvious" cases the court should not toss a coin. But then the court first has to examine whether the case is obvious, and this process is costly in the same way (though maybe not to the same extent) that a trial about child custody would be. In short, either decision rule A faces the problem of deciding when to toss a coin, or one has to believe that we are so lost that we might as well always toss coins.

The relevance of biased probabilities

When discussing subjective beliefs (and beliefs in general), Elster often presents convincing arguments to the effect that beliefs are often shaped by hot cognitive mechanisms (beliefs influenced by what you want to be the case) and cold cognitive mechanisms (wrong beliefs even when you have no strong preferences about the truth). The argument is also used when discussing the problems involved in collecting an optimal amount of information. For instance, he argues that the elicitation of subjective beliefs is subject to a mechanism called anchoring: if we start from a low probability (few red balls) in the example of eliciting subjective probabilities, the agent is more likely to end up with a low "subjective probability" than if we start from a high probability (many red balls) and go down. In short, the procedure for measuring the belief affects the belief we find! Surely this is a sign that these subjective probabilities are unreliable and should not be used as inputs in decision-making.

Although I find the topic of hot and cold belief formation both interesting and important, it is not relevant in the present context. The main question in this paper is whether the principle of rationality yields a determinate answer, not whether people's actual behaviour conforms to the standards of rationality.

There is, however, room for a final comment about Elster's arguments that applies to all the previous situations and to the recommendation that agents should use maximin or randomization in situations of great uncertainty. It seems to me that this prescription (toss coins when you are very unsure) is itself subject to the problem it is meant to avoid. Since Elster argues that we sometimes have reliable probabilities, it follows that we have to decide whether to use maximin/randomization or to maximize expected utility. If the argument against the use of expected utility is that we tend to deceive ourselves, so that we cannot rely on our subjective probabilities, then one might also suspect that the agent deceives himself when choosing which procedure to use. To say that we sometimes should use maximin because we are biased is not very helpful if the same bias makes us exaggerate the reliability of our probabilities so that we never choose maximin. This is another instance of the problem already mentioned: it arises whenever one does not go all the way and say that we should always use the maximin strategy.

 

Conclusion: Elster on the problem of estimation

Sometimes Elster argues that some people have good judgment (see e.g. SG, p. 16; ETC, p. 87). It seems to me that this implicitly reveals that it is often possible to form rational beliefs about the value of information. If we really lived in a world in which we were lost in the forest, there would be no judgment - only good and bad luck. I am still unsure about the extent to which we are inherently "lost" (i.e. apart from our limited abilities), but I do think this section has demonstrated some weaknesses in the argument that the estimation problems imply that it is often impossible to form rational estimates about the value of information.