General Intelligence: P-Z Logic
There are many kinds of uncertain logics. My approach is a simple combination of probabilistic and fuzzy logic, and we'll call it P-Z logic.
Probability (P) is a form of uncertainty where multiple scenarios are possible and we assign (frequentist or subjective) betting odds to them.
Fuzziness (Z) represents "vague phenomena": the degree to which an entity belongs to a class (ie, fuzzy membership).
My theory is that P and Z are sufficient to represent all real world phenomena. There are many other ways to quantify uncertainty, but I choose P and Z because they are intuitive to understand.
There is a diversity of ways to deal with uncertainty, both quantitative and qualitative.
Some well-known schemes include:
Certainty factors — originally used in MYCIN, proposed by Shortliffe and Buchanan in the late 1970s; an ad hoc scheme that may run into difficulties in long inference chains.
Dempster-Shafer theory — has a better theoretical basis than certainty factors, but may still lead to results contrary to our expectations.
Fuzzy logic is probably the "most developed" scheme for dealing with uncertainty at present.
Pei Wang's uncertain logic is particularly simple and effective. Wang has written extensively on his theory, including a book on general intelligence [Wang 2006]. I discuss some of his ideas here, but the view presented is my own and does not represent Wang's.
Positive and negative evidence
In Wang's logic, a statement is associated with 2 numbers: the number of positive (w+) and negative (w-) instances. For example, for the statement "women have long hair", there may be 70 women with long hair and 30 women with short hair. The total support for the statement is (w+) + (w-) = 100 instances.
Using a pair of numbers allows us to know the probability of a statement as well as the number of instances that support that statement. This is very important because some statements may be supported by very few instances and thus are weaker than statements with more support.
A major advantage of this 2-number approach is that probabilities can be updated by new instances. For example, if I encounter a new woman with long hair, then the updated probability can be easily calculated from (w+ + 1, w-). Such updating cannot be done with a classical, single-number probability, because a single number does not record how much evidence stands behind it.
The (w+,w-) pair can be converted into a measure of probability and confidence, as a pair <frequency,confidence> (cf [Wang 2006] for further details).
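As a concrete sketch (my own illustration, not code from [Wang 2006]): NARS-style truth values use f = w+/w for frequency and c = w/(w+k) for confidence, where k is an evidential-horizon constant, taken as 1 here as an assumption.

```python
def truth_value(w_plus, w_minus, k=1.0):
    """Convert evidence counts (w+, w-) to a <frequency, confidence> pair.

    Follows the NARS-style formulas f = w+/w and c = w/(w+k); k is an
    evidential-horizon constant, set to 1 here as an assumption."""
    w = w_plus + w_minus
    frequency = w_plus / w
    confidence = w / (w + k)
    return frequency, confidence

# The long-hair example: 70 positive, 30 negative instances.
f, c = truth_value(70, 30)  # f = 0.7; c is close to 1 with 100 instances
```

Updating with a new positive instance is then just `truth_value(71, 30)`: the frequency rises slightly and the confidence grows with the total evidence.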
Fuzziness and probability
Wang has given a unified interpretation of fuzziness and probability.
Other flavors of uncertain logic exist; a notable example is Walley's theory of imprecise probabilities. As explained in [Walley 1991], p29:
In general, we say that Your beliefs are incoherent when there is a finite set of gambles, each of which You are disposed to accept, whose net outcome is certainly no better than the outcome of a further gamble which You are not disposed to accept. [...] This principle entails transitivity of preferences amongst gambles.
The basic idea of probabilistic logic is to assign probabilities to possible worlds. For example, if we toss a coin twice, the possible worlds would be { HH, HT, TH, TT }. The probabilities of all possible worlds should sum to 1. This approach was first proposed by [Nilsson 1986] in the context of AI, though it was essentially a rephrasing of de Finetti's ideas from the 1930s.
One problem with using a single number for the probability of each logical statement is that some simple inferences cannot be performed. The reason is that Bayesian inference requires complete CPTs (conditional probability tables), but many common-sense facts are expressed as incomplete CPTs.
Example 1:
"Useful programs are usually complex."
This statement can be represented as:
IF useful(program) [Z ≥ 0.8]
THEN complex(program) [Z ≥ 0.8]
with conditional probability P(complex|useful) = 0.7
Notice that the complete CPT would have these entries:
useful | complex | P |
Z ≥ 0.8 | Z ≥ 0.8 | 0.7 |
Z ≥ 0.8 | Z < 0.8 | ? |
Z < 0.8 | Z ≥ 0.8 | ? |
Z < 0.8 | Z < 0.8 | ? |
In a typical common-sense situation, we usually just want to specify the first entry while leaving the other entries "blank" (ie, having probabilities belonging to the [0,1] interval).
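One way to realize such "blank" entries is to store the CPT sparsely and let lookups default to the vacuous interval [0, 1]. A minimal sketch (the keys and names are illustrative, not a fixed notation):

```python
# Sparse CPT: only the informative entry is stored; every other entry is
# treated as the vacuous interval [0, 1]. Keys and names are illustrative.
cpt = {("useful>=0.8", "complex>=0.8"): (0.7, 0.7)}  # the one known entry

def lookup(cond, value):
    """Return the probability interval for P(value | cond)."""
    return cpt.get((cond, value), (0.0, 1.0))  # blank entry = [0, 1]
```

Only the first row of the table above needs to be written down; the other three rows fall out of the default.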
Example 2:
This example is exactly analogous to "resolution" in classical logic.
"This is Cold War 2. If I side with the US, I may become a traitor. If I side with China, I may become a loser. Either way, my life sucks."
We can express the premises in conditional probabilities as:
P( life-sucks | pro-US ) = 0.9
P( life-sucks | ~pro-US ) = 0.8
and we want to query the probability P( life-sucks ), but it is unspecified whether I am pro-US or not.
In classical resolution, the inference is simply:
pro-US -> life-sucks
~pro-US -> life-sucks
---------------------------
life-sucks
But using Bayesian inference we will have the following CPT (Conditional Probability Table):
pro-US | life-sucks |
true | 0.9 |
false | 0.8 |
and we cannot evaluate P( life-sucks ) because P( pro-US ) is unknown.
According to probability theory:
P(A) = P(A | B) P(B) + P(A | ~B) P(~B)
Therefore:
P(life-sucks) = P(life-sucks | pro-US) P( pro-US ) + P(life-sucks | ~pro-US) P( ~pro-US )
= 0.9 P( pro-US ) + 0.8 P( ~pro-US )
= 0.9 p + 0.8 (1 - p)
= 0.8 + 0.1 p
∈ [0.8, 0.9], since p ∈ [0, 1]
In other words, if we allow the use of interval probabilities, we can infer that P( life-sucks ) = [0.8, 0.9] even when we assume that P( pro-US ) = [0,1] (ie, unknown). Thus we obtain a result analogous to classical resolution.
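Since P(A) = P(A|B)·p + P(A|~B)·(1-p) is linear in p, its extremes over p ∈ [0, 1] occur at the endpoints. A minimal sketch of this computation (my own illustration):

```python
def marginal_bounds(p_a_given_b, p_a_given_not_b):
    """Bounds on P(A) when P(B) is completely unknown (p in [0, 1]).

    P(A) = P(A|B)*p + P(A|~B)*(1-p) is linear in p, so the minimum and
    maximum are attained at the endpoints p = 0 and p = 1."""
    lo = min(p_a_given_b, p_a_given_not_b)
    hi = max(p_a_given_b, p_a_given_not_b)
    return lo, hi

# The Cold War example: P(life-sucks) lies in [0.8, 0.9].
bounds = marginal_bounds(0.9, 0.8)
```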
Next we turn to how to do inference generally with interval probabilities.
This method was first outlined by [Boole 1854] but went unnoticed for a long time, until [Hailperin 1965] revived it. [Nilsson 1986] re-proposed the same method in an artificial intelligence setting, as did several other researchers.
Basically the method reduces the inference problem to a linear programming problem.
Example 3:
An excellent exposition is given in [Ng & Subrahmanian 1992]:
"[Kolmogorov 1956] and [Hailperin 1984] have shown that given distinct events E1 and E2, we cannot precisely specify P(E1 ^ E2) from P(E1) and P(E2), but we can characterize precisely the range within which the probability of E1 ^ E2 must lie. As [Frechet 1935] has shown, max{0, P(E1) + P(E2) - 1} ≤ P(E1 ^ E2) ≤ min{ P(E1), P(E2) } represents the tightest bounds for P(E1 ^ E2). ....
"Given the two events, there are four possible worlds: first (world K1), in which the events E1 and E2 both occur; second (world K2) in which E1 occurs, but E2 does not occur; third (world K3) in which E2 occurs while E1 does not occur; and lastly (world K4) in which neither E1 nor E2 occur. Suppose P(E1) = [a1,b1] and P(E2) = [a2,b2]. Furthermore, let ki be the probability that world Ki is the actual world. This situation can be expressed via the following linear program Q:
0 ≤ a1 ≤ k1 + k2 ≤ b1 ≤ 1,
0 ≤ a2 ≤ k1 + k3 ≤ b2 ≤ 1,
sum kj = 1,
kj ≥ 0
"As event E1 occurs in worlds K1 and K2 which are mutually incompatible worlds, the probability of E1 occurring is (k1 + k2). As P(E1) is known to be in [a1,b1], this gives rise to the first inequality in the linear program Q. The second inequality arises similarly when we consider E2 instead of E1. ....
"To find the range for P(E1 ^ E2), we need to solve the linear program Q for the parameter k1 that represents the probability of the world in which E1 and E2 are both true. However, in general, there is no unique solution. Thus, we need to solve Q to find the minimal and maximal values of k1. Likewise, to find the range for P(E1 v E2), we solve for the minimal and maximal values of (k1 + k2 + k3)."
[A simple proof of the probability bounds follows.]
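To make the linear program Q concrete, here is a brute-force sketch in pure Python (my own illustration; a real system would call an LP solver): it scans a grid over the world probabilities (k1, k2, k3) and returns the min and max of k1 = P(E1 ^ E2).

```python
def conj_bounds(a1, b1, a2, b2, step=0.01):
    """Min/max of k1 = P(E1 ^ E2) subject to the linear program Q:
    k1+k2 in [a1,b1], k1+k3 in [a2,b2], k1+k2+k3+k4 = 1, kj >= 0.

    Brute-force grid scan for illustration; an LP solver would be used
    in practice. Returns (1.0, 0.0) if no feasible point is found."""
    lo, hi = 1.0, 0.0
    n = round(1 / step)
    for i in range(n + 1):                  # k1
        for j in range(n + 1 - i):          # k2
            for l in range(n + 1 - i - j):  # k3 (k4 takes the remainder)
                k1, k2, k3 = i * step, j * step, l * step
                if (a1 - 1e-9 <= k1 + k2 <= b1 + 1e-9 and
                        a2 - 1e-9 <= k1 + k3 <= b2 + 1e-9):
                    lo, hi = min(lo, k1), max(hi, k1)
    return lo, hi

# With P(E1) = 0.7 and P(E2) = 0.6, the Frechet bounds give [0.3, 0.6].
bounds = conj_bounds(0.7, 0.7, 0.6, 0.6)
```

Note that the result matches the Frechet formula: max{0, 0.7 + 0.6 - 1} = 0.3 and min{0.7, 0.6} = 0.6.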
A formal description is as follows:
We are given m logical sentences Si defined on n logical variables xj with the usual Boolean operators ^ v ~. Each sentence Si is attached with a probability πi. The key insight is to create 2^n possible worlds, each with probability pk, which together sum to 1. The variables xj take on all combinations of { true, false } across the possible worlds. Thus we set up the constraints as the following matrix equations:
I p = 1,
A p = π,
p ≥ 0,
where I is a row vector of ones, and A is an m × 2^n matrix representing the "truth table" of the sentences. When interval probabilities are used, the equations become inequalities. What we want to find is the probability bounds attached to an additional sentence Sm+1. This becomes a linear programming problem.
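For illustration, the matrix A can be built by evaluating each sentence in every possible world. A small sketch (the representation of sentences as Python functions is my own choice):

```python
from itertools import product

def truth_matrix(sentences, n):
    """Build the m x 2**n "truth table" matrix A.

    sentences: list of functions mapping a tuple of n booleans (one
    possible world) to True/False. Worlds are enumerated in the order
    given by itertools.product: (F,F), (F,T), (T,F), (T,T) for n = 2."""
    worlds = list(product([False, True], repeat=n))
    return [[1 if s(w) else 0 for w in worlds] for s in sentences]

# Example: the single sentence x1 v x2 over n = 2 variables.
A = truth_matrix([lambda w: w[0] or w[1]], 2)  # [[0, 1, 1, 1]]
```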
Recently [Hansen & Jaumard et al 2000] and [Jaumard & Parreira 2006] designed optimized algorithms to solve this type of problem. According to these papers, problems with several hundred sentences have been solved efficiently using this method.
Many authors have stated that fuzziness is distinct from probability.
Consider these 2 statements:
A: "There is a 0.1 probability that this car is cheap."
B: "This car is cheap to degree 0.1."
Suppose you must find a cheap car. You have a slight chance of finding a cheap car under "A" (the car may actually turn out to be cheap), but not under "B" (the car is definitely only slightly cheap).
Notice that no matter how you represent "A" and "B" quantitatively, the distinction between "A" and "B" will not go away.
The main problem is that the CPT will become very large if each variable can take on a number of fuzzy values. For example, in "useful programs are usually complex", if both usefulness and complexity can take on 3 values, { high, mid, low }, the CPT will be:
useful | complex | P |
high | high | 0.8 |
high | mid | ? |
high | low | ? |
mid | high | ? |
mid | mid | ? |
mid | low | ? |
low | high | ? |
low | mid | ? |
low | low | ? |
In general there will be v^n entries, where v is the valency of the fuzzy variables and n is the number of variables. And if the fuzzy variables are continuous, CPTs must be replaced by parametric functions.
It is impractical to store and do inference with very large CPTs. In the above example, only the first row is significant and the other entries can be filled in by assumptions.
My solution is to store CPTs in a compact way, storing only informative entries. Then we generate a simplified CPT by collapsing the number of fuzzy values, ie, reducing the valency of the fuzzy variables. If the valency becomes 2, the case is reduced to binary logic. For example, the above CPT may be collapsed to:
useful | complex | P |
high | high | 0.8 |
high | mid / low | ? |
mid / low | high | ? |
mid / low | mid / low | ? |
This reduction is only approximate, but it provides a working system, which is our main objective at this stage.
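A sketch of the collapsing step (the representation and names are my own): the CPT is stored sparsely with only its informative entries, and collapsing merges fuzzy values while unknown entries simply stay unknown.

```python
def collapse(cpt, merge):
    """Collapse fuzzy values in a sparse CPT {(useful, complex): prob}.

    merge maps each original fuzzy value to a coarser one. Entries that
    were never stored stay unknown; if several known entries fall into
    one collapsed cell, they are averaged here (a crude assumption made
    only for illustration)."""
    out = {}
    for (u, c), p in cpt.items():
        out.setdefault((merge[u], merge[c]), []).append(p)
    return {cell: sum(ps) / len(ps) for cell, ps in out.items()}

# Collapse {high, mid, low} down to the 2 values {high, other}:
merge = {"high": "high", "mid": "other", "low": "other"}
small = collapse({("high", "high"): 0.8}, merge)
```

With valency 2 the collapsed table is the binary-logic case described above: one informative row, three blanks.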
So far we have only dealt with propositional inference; what we need is a first-order probabilistic logic.
The solution I adopt is known as Knowledge-Based Model Construction (KBMC) which was developed by [Wellman et al 1992], [Haddawy 1994], and others.
The method basically stores a knowledge base of first-order rules; when a query is asked, it generates from the KB, on the fly, a propositional Bayesian network specific to the query.
The network-generating algorithm resembles a kind of forward-chaining inference.
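A toy sketch of the idea (the rule format and names are my own, not taken from the cited papers): forward-chain from the known facts, adding a network node for each derived proposition and an edge for each rule firing.

```python
def build_network(rules, facts):
    """Forward-chain to build a query-specific propositional network.

    rules: list of (premises, conclusion, prob), with premises a tuple
    of proposition names. Returns the derived nodes and the edges (rule
    firings). Relevance filtering with respect to the query, which a
    real KBMC system performs, is omitted in this sketch."""
    nodes = set(facts)
    edges = []
    changed = True
    while changed:
        changed = False
        for prem, concl, p in rules:
            fired = any(e[0] == prem and e[1] == concl for e in edges)
            if set(prem) <= nodes and not fired:
                edges.append((prem, concl, p))
                if concl not in nodes:
                    nodes.add(concl)
                    changed = True
    return nodes, edges

rules = [(("a",), "b", 0.9), (("b",), "c", 0.8)]
nodes, edges = build_network(rules, {"a"})  # derives b, then c
```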
More on this later...
The references are incomplete... I'll add the missing ones later.
[Boole 1854] An investigation of the laws of thought, on which are founded the mathematical theories of logic and probabilities. Walton and Maberley, London (reprint Dover, New York, 1958)
[Frechet 1935] Généralisation du théorème des probabilités totales. Fundamenta Mathematicae 25, p379-387
[Halpern 2003] Reasoning About Uncertainty. MIT Press.
[Haddawy 1994] Generating Bayesian networks from probability logic knowledge bases. Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence.
[Hailperin 1965] Best possible inequalities for the probability of a logical function of events, American Mathematical Monthly 72:343-359
[Hailperin 1984] Probabilistic Logic. Notre Dame J of Formal Logic, 25, 3, p198-212.
[Hailperin 1996] Sentential probability logic. Lehigh University Press, Bethlehem and Associated University Presses, London
[Hansen, Jaumard, de Aragao, Chauny, & Perron 2000] Probabilistic satisfiability with imprecise probabilities. International Journal of Approximate Reasoning, 24 (2000) 171-189
[Jaumard & Parreira 2006] An anytime deduction algorithm for the probabilistic logic and entailment problems (preprint submitted to Elsevier)
[Kolmogorov 1956] Foundations of the Theory of Probability, Chelsea Publishing Co.
[Ng & Subrahmanian 1992] Probabilistic Logic Programming. Information and Computation, 101:150-201
[Nilsson 1986] Probabilistic logic. Artificial Intelligence 28(1):71-87
[Pearl 1988] Probabilistic Reasoning in Intelligent Systems — Networks of Plausible Inference. Morgan Kaufmann, California
[Parsons 2001] Qualitative methods for reasoning under uncertainty. MIT Press
[Walley 1991] Statistical Reasoning with Imprecise Probabilities. Monographs on Statistics and Applied Probability #42. Chapman and Hall
[Wang 2006] Rigid Flexibility — The Logic of Intelligence. Springer Applied Logic Series, Springer
[Wellman, Breese, & Goldman 1992] From knowledge bases to decision models. Knowledge Engineering Review 7(1):35-53
[Zadeh 1996] "Fuzzy logic = computing with words". IEEE Transactions on Fuzzy Systems 4:103-111
4/Sept/2007, Jan/2007, Sep/2006 (C) GIRG