General Intelligence: P-Z Logic
There are many kinds of uncertain logics. My approach is a simple combination of probabilistic and fuzzy logic, and we'll call it P-Z logic.
Probability (P) is a form of uncertainty where multiple scenarios are possible and we assign (frequentist or subjective) betting odds to them.
Fuzziness (Z) represents "vague phenomena": the degree to which an entity belongs to a class (ie, fuzzy membership).
My theory is that P and Z are sufficient to represent all real world phenomena. There are many other ways to quantify uncertainty, but I choose P and Z because they are intuitive to understand.
There is a diversity of ways to deal with uncertainty, both quantitative and qualitative.
Some well-known schemes include:
Certainty factors — originally used in MYCIN, proposed by Shortliffe and Buchanan in the late 1970s; an ad hoc scheme that may run into difficulties in long inference chains.
Dempster-Shafer theory — has a better theoretical basis than certainty factors, but may still lead to results contrary to our expectations.
Fuzzy logic is probably the "most developed" scheme for dealing with uncertainty at present.
Pei Wang's uncertain logic is particularly simple and effective. Wang has written extensively on his theory, including a book on general intelligence [Wang 2006]. I discuss some of his ideas here, but the view presented is my own and does not represent Wang's.
Positive and negative evidence
In Wang's logic, a statement is associated with 2 numbers: the number of positive (w+) and negative (w-) instances. For example, for the statement "women have long hair", there may be 70 women with long hair and 30 women with short hair. The total support for the statement is (w+) + (w-) = 100 instances.
Using a pair of numbers allows us to know the probability of a statement as well as the number of instances that support that statement. This is very important because some statements may be supported by very few instances and thus are weaker than statements with more support.
A major advantage of this 2-number approach is that probabilities can be updated by new instances. For example, if I encounter a new woman with long hair, then the updated probability can be easily calculated from (w+ + 1, w-). Such updating cannot be done with a classical, single-number probability, because a single number does not record how much evidence stands behind it.
The (w+,w-) pair can be converted into a measure of probability and confidence, as a pair <frequency,confidence> (cf [Wang 2006] for further details).
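As a concrete sketch (my own illustration, not code from [Wang 2006]): NARS-style truth values use f = w+/w for frequency and c = w/(w+k) for confidence, where k is an evidential-horizon constant, taken as 1 here as an assumption.

```python
def truth_value(w_plus, w_minus, k=1.0):
    """Convert evidence counts (w+, w-) to a <frequency, confidence> pair.

    Follows the NARS-style formulas f = w+/w and c = w/(w+k); k is an
    evidential-horizon constant, set to 1 here as an assumption."""
    w = w_plus + w_minus
    frequency = w_plus / w
    confidence = w / (w + k)
    return frequency, confidence

# The long-hair example: 70 positive, 30 negative instances.
f, c = truth_value(70, 30)  # f = 0.7; c is close to 1 with 100 instances
```

Updating with a new positive instance is then just `truth_value(71, 30)`: the frequency rises slightly and the confidence grows with the total evidence.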
Fuzziness and probability
Wang has given a unified interpretation of fuzziness and probability.
Other flavors of uncertain logic exist; a notable example is Walley's theory of imprecise probabilities. As explained in [Walley 1991], p29:
In general, we say that Your beliefs are incoherent when there is a finite set of gambles, each of which You are disposed to accept, whose net outcome is certainly no better than the outcome of a further gamble which You are not disposed to accept. [...] This principle entails transitivity of preferences amongst gambles.
The basic idea of probabilistic logic is to assign probabilities to possible worlds. For example, if we toss a coin twice, the possible worlds would be { HH, HT, TH, TT }. The probabilities of all possible worlds should sum to 1. This approach was first proposed by [Nilsson 1986] in the context of AI, though it was essentially a rephrasing of de Finetti's ideas from the 1930s.
One problem with using a single number for the probability of each logical statement is that some simple inferences cannot be performed. The reason is that Bayesian inference requires complete CPTs (conditional probability tables), but many common-sense facts are expressed as incomplete CPTs.
Example 1:
"Useful programs are usually complex."
This statement can be represented as:
IF useful(program) [Z ≥ 0.8]
THEN complex(program) [Z ≥ 0.8]
with conditional probability P(complex|useful) = 0.7
Notice that the complete CPT would have these entries:
useful | complex | P |
Z ≥ 0.8 | Z ≥ 0.8 | 0.7 |
Z ≥ 0.8 | Z < 0.8 | ? |
Z < 0.8 | Z ≥ 0.8 | ? |
Z < 0.8 | Z < 0.8 | ? |
In a typical common-sense situation, we usually just want to specify the first entry while leaving the other entries "blank" (ie, having probabilities belonging to the [0,1] interval).
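One way to realize such "blank" entries is to store the CPT sparsely and let lookups default to the vacuous interval [0, 1]. A minimal sketch (the keys and names are illustrative, not a fixed notation):

```python
# Sparse CPT: only the informative entry is stored; every other entry is
# treated as the vacuous interval [0, 1]. Keys and names are illustrative.
cpt = {("useful>=0.8", "complex>=0.8"): (0.7, 0.7)}  # the one known entry

def lookup(cond, value):
    """Return the probability interval for P(value | cond)."""
    return cpt.get((cond, value), (0.0, 1.0))  # blank entry = [0, 1]
```

Only the first row of the table above needs to be written down; the other three rows fall out of the default.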
Example 2:
This example is exactly analogous to "resolution" in classical logic.
"This is Cold War 2. If I side with the US, I may become a traitor. If I side with China, I may become a loser. Either way, my life sucks."
We can express the premises in conditional probabilities as:
P( life-sucks | pro-US ) = 0.9
P( life-sucks | ~pro-US ) = 0.8
and we want to query the probability P( life-sucks ), but it is unspecified whether I am pro-US or not.
In classical resolution, the inference is simply:
pro-US -> life-sucks
~pro-US -> life-sucks
---------------------------
life-sucks
But using Bayesian inference we will have the following CPT (Conditional Probability Table):
pro-US | life-sucks |
true | 0.9 |
false | 0.8 |
and we cannot evaluate P( life-sucks ) because P( pro-US ) is unknown.
According to probability theory:
P(A) = P(A | B) P(B) + P(A | ~B) P(~B)
Therefore:
P(life-sucks) = P(life-sucks | pro-US) P( pro-US ) + P(life-sucks | ~pro-US) P( ~pro-US )
= 0.9 P( pro-US ) + 0.8 P( ~pro-US )
= 0.9 p + 0.8 (1 - p)
= 0.8 + 0.1 p
∈ [0.8, 0.9], since p ∈ [0, 1]
In other words, if we allow the use of interval probabilities, we can infer that P( life-sucks ) = [0.8, 0.9] even when we assume that P( pro-US ) = [0,1] (ie, unknown). Thus we obtain a result analogous to classical resolution.
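Since P(A) = P(A|B)·p + P(A|~B)·(1-p) is linear in p, its extremes over p ∈ [0, 1] occur at the endpoints. A minimal sketch of this computation (my own illustration):

```python
def marginal_bounds(p_a_given_b, p_a_given_not_b):
    """Bounds on P(A) when P(B) is completely unknown (p in [0, 1]).

    P(A) = P(A|B)*p + P(A|~B)*(1-p) is linear in p, so the minimum and
    maximum are attained at the endpoints p = 0 and p = 1."""
    lo = min(p_a_given_b, p_a_given_not_b)
    hi = max(p_a_given_b, p_a_given_not_b)
    return lo, hi

# The Cold War example: P(life-sucks) lies in [0.8, 0.9].
bounds = marginal_bounds(0.9, 0.8)
```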
Next we turn to how to do inference generally with interval probabilities.
This method was first outlined by [Boole 1854] but went unnoticed for a long time, until [Hailperin 1965] revived it. [Nilsson 1986] re-proposed the same method in an artificial intelligence setting, as did several other researchers.
Basically the method reduces the inference problem to a linear programming problem.
Example 3:
An excellent exposition is given in [Ng & Subrahmanian 1992]:
"[Kolmogorov 1956] and [Hailperin 1984] have shown that given distinct events E1 and E2, we cannot precisely specify P(E1 ^ E2) from P(E1) and P(E2), but we can characterize precisely the range within which the probability of E1 ^ E2 must lie. As [Frechet 1935] has shown, max{0, P(E1) + P(E2) - 1} ≤ P(E1 ^ E2) ≤ min{ P(E1), P(E2) } represents the tightest bounds for P(E1 ^ E2). ....
"Given the two events, there are four possible worlds: first (world K1), in which the events E1 and E2 both occur; second (world K2) in which E1 occurs, but E2 does not occur; third (world K3) in which E2 occurs while E1 does not occur; and lastly (world K4) in which neither E1 nor E2 occur. Suppose P(E1) = [a1,b1] and P(E2) = [a2,b2]. Furthermore, let ki be the probability that world Ki is the actual world. This situation can be expressed via the following linear program Q:
0 ≤ a1 ≤ k1 + k2 ≤ b1 ≤ 1,
0 ≤ a2 ≤ k1 + k3 ≤ b2 ≤ 1,
sum kj = 1,
kj ≥ 0
"As event E1 occurs in worlds K1 and K2 which are mutually incompatible worlds, the probability of E1 occurring is (k1 + k2). As P(E1) is known to be in [a1,b1], this gives rise to the first inequality in the linear program Q. The second inequality arises similarly when we consider E2 instead of E1. ....
"To find the range for P(E1 ^ E2), we need to solve the linear program Q for the parameter k1 that represents the probability of the world in which E1 and E2 are both true. However, in general, there is no unique solution. Thus, we need to solve Q to find the minimal and maximal values of k1. Likewise, to find the range for P(E1 v E2), we solve for the minimal and maximal values of (k1 + k2 + k3)."
[A simple proof of the probability bounds follows.]
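To make the linear program Q concrete, here is a brute-force sketch in pure Python (my own illustration; a real system would call an LP solver): it scans a grid over the world probabilities (k1, k2, k3) and returns the min and max of k1 = P(E1 ^ E2).

```python
def conj_bounds(a1, b1, a2, b2, step=0.01):
    """Min/max of k1 = P(E1 ^ E2) subject to the linear program Q:
    k1+k2 in [a1,b1], k1+k3 in [a2,b2], k1+k2+k3+k4 = 1, kj >= 0.

    Brute-force grid scan for illustration; an LP solver would be used
    in practice. Returns (1.0, 0.0) if no feasible point is found."""
    lo, hi = 1.0, 0.0
    n = round(1 / step)
    for i in range(n + 1):                  # k1
        for j in range(n + 1 - i):          # k2
            for l in range(n + 1 - i - j):  # k3 (k4 takes the remainder)
                k1, k2, k3 = i * step, j * step, l * step
                if (a1 - 1e-9 <= k1 + k2 <= b1 + 1e-9 and
                        a2 - 1e-9 <= k1 + k3 <= b2 + 1e-9):
                    lo, hi = min(lo, k1), max(hi, k1)
    return lo, hi

# With P(E1) = 0.7 and P(E2) = 0.6, the Frechet bounds give [0.3, 0.6].
bounds = conj_bounds(0.7, 0.7, 0.6, 0.6)
```

Note that the result matches the Frechet formula: max{0, 0.7 + 0.6 - 1} = 0.3 and min{0.7, 0.6} = 0.6.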
A formal description is as follows:
We are given m logical sentences Si defined on n logical variables xj with the usual Boolean operators ^ v ~. Each sentence Si is attached with a probability πi. The key insight is to create 2^n possible worlds, each with probability pk, which together sum to 1. The variables xj take on all combinations of { true, false } across the possible worlds. Thus we set up the constraints as the following matrix equations:
I p = 1,
A p = π,
p ≥ 0,
where I is a row vector of ones, and A is an m × 2^n matrix representing the "truth table" of the sentences. When interval probabilities are used, the equations become inequalities. What we want to find is the probability bounds attached to an additional sentence Sm+1. This becomes a linear programming problem.
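For illustration, the matrix A can be built by evaluating each sentence in every possible world. A small sketch (the representation of sentences as Python functions is my own choice):

```python
from itertools import product

def truth_matrix(sentences, n):
    """Build the m x 2**n "truth table" matrix A.

    sentences: list of functions mapping a tuple of n booleans (one
    possible world) to True/False. Worlds are enumerated in the order
    given by itertools.product: (F,F), (F,T), (T,F), (T,T) for n = 2."""
    worlds = list(product([False, True], repeat=n))
    return [[1 if s(w) else 0 for w in worlds] for s in sentences]

# Example: the single sentence x1 v x2 over n = 2 variables.
A = truth_matrix([lambda w: w[0] or w[1]], 2)  # [[0, 1, 1, 1]]
```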
Recently [Hansen & Jaumard et al 2000] and [Jaumard & Parreira 2006] designed optimized algorithms to solve this type of problem. According to these papers, problems with several hundred sentences have been solved efficiently using this method.
Many authors have stated that fuzziness is distinct from probability.
Consider these 2 statements:
A: "There is a 0.1 probability that this car is cheap."
B: "This car is cheap to degree 0.1."
Suppose you must find a cheap car. You have a slight chance of finding a cheap car under "A" (the car may actually turn out to be cheap), but not under "B" (the car is definitely only slightly cheap).
Notice that no matter how you represent "A" and "B" quantitatively, the distinction between "A" and "B" will not go away.
The main problem is that the CPT will become very large if each variable can take on a number of fuzzy values. For example, in "useful programs are usually complex", if both usefulness and complexity can take on 3 values, { high, mid, low }, the CPT will be:
useful | complex | P |
high | high | 0.8 |
high | mid | ? |
high | low | ? |
mid | high | ? |
mid | mid | ? |
mid | low | ? |
low | high | ? |
low | mid | ? |
low | low | ? |
In general there will be v^n entries, where v is the valency of the fuzzy variables and n is the number of variables. And if the fuzzy variables are continuous, CPTs must be replaced by parametric functions.
It is impractical to store and do inference with very large CPTs. In the above example, only the first row is significant and the other entries can be filled in by assumptions.
My solution is to store CPTs in a compact way, storing only informative entries. Then we generate a simplified CPT by collapsing the number of fuzzy values, ie, reducing the valency of the fuzzy variables. If the valency becomes 2, the case is reduced to binary logic. For example, the above CPT may be collapsed to:
useful | complex | P |
high | high | 0.8 |
high | mid / low | ? |
mid / low | high | ? |
mid / low | mid / low | ? |
This reduction is only approximate, but it provides a working system, which is our main objective at this stage.
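A sketch of the collapsing step (the representation and names are my own): the CPT is stored sparsely with only its informative entries, and collapsing merges fuzzy values while unknown entries simply stay unknown.

```python
def collapse(cpt, merge):
    """Collapse fuzzy values in a sparse CPT {(useful, complex): prob}.

    merge maps each original fuzzy value to a coarser one. Entries that
    were never stored stay unknown; if several known entries fall into
    one collapsed cell, they are averaged here (a crude assumption made
    only for illustration)."""
    out = {}
    for (u, c), p in cpt.items():
        out.setdefault((merge[u], merge[c]), []).append(p)
    return {cell: sum(ps) / len(ps) for cell, ps in out.items()}

# Collapse {high, mid, low} down to the 2 values {high, other}:
merge = {"high": "high", "mid": "other", "low": "other"}
small = collapse({("high", "high"): 0.8}, merge)
```

With valency 2 the collapsed table is the binary-logic case described above: one informative row, three blanks.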
So far we have only dealt with propositional inference; what we need is a first-order probabilistic logic.
The solution I adopt is known as Knowledge-Based Model Construction (KBMC) which was developed by [Wellman et al 1992], [Haddawy 1994], and others.
The method basically stores a knowledge base of first-order rules; when a query is asked, it generates from the KB, on the fly, a propositional Bayesian network specific to the query.
The network-generating algorithm resembles a kind of forward-chaining inference.
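A toy sketch of the idea (the rule format and names are my own, not taken from the cited papers): forward-chain from the known facts, adding a network node for each derived proposition and an edge for each rule firing.

```python
def build_network(rules, facts):
    """Forward-chain to build a query-specific propositional network.

    rules: list of (premises, conclusion, prob), with premises a tuple
    of proposition names. Returns the derived nodes and the edges (rule
    firings). Relevance filtering with respect to the query, which a
    real KBMC system performs, is omitted in this sketch."""
    nodes = set(facts)
    edges = []
    changed = True
    while changed:
        changed = False
        for prem, concl, p in rules:
            fired = any(e[0] == prem and e[1] == concl for e in edges)
            if set(prem) <= nodes and not fired:
                edges.append((prem, concl, p))
                if concl not in nodes:
                    nodes.add(concl)
                    changed = True
    return nodes, edges

rules = [(("a",), "b", 0.9), (("b",), "c", 0.8)]
nodes, edges = build_network(rules, {"a"})  # derives b, then c
```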
More on this later...
The references are incomplete... I'll add the missing ones later.
[Boole 1854] An investigation of the laws of thought, on which are founded the mathematical theories of logic and probabilities. Walton and Maberley, London (reprint Dover, New York, 1958)
[Frechet 1935] Généralisation du théorème des probabilités totales. Fundamenta Mathematicae 25, p379-387
[Halpern 2003] Reasoning About Uncertainty. MIT Press.
[Haddawy 1994] Generating Bayesian networks from probability logic knowledge bases. Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence.
[Hailperin 1965] Best possible inequalities for the probability of a logical function of events, American Mathematical Monthly 72:343-359
[Hailperin 1984] Probabilistic Logic. Notre Dame J of Formal Logic, 25, 3, p198-212.
[Hailperin 1996] Sentential probability logic. Lehigh University Press, Bethlehem and Associated University Presses, London
[Hansen, Jaumard, de Aragao, Chauny, & Perron 2000] Probabilistic satisfiability with imprecise probabilities. International Journal of Approximate Reasoning, 24 (2000) 171-189
[Jaumard & Parreira 2006] An anytime deduction algorithm for the probabilistic logic and entailment problems (preprint submitted to Elsevier)
[Kolmogorov 1956] Foundations of the Theory of Probability, Chelsea Publishing Co.
[Ng & Subrahmanian 1992] Probabilistic Logic Programming. Information and Computation, 101:150-201
[Nilsson 1986] Probabilistic logic. Artificial Intelligence 28(1):71-87
[Pearl 1988] Probabilistic Reasoning in Intelligent Systems — Networks of Plausible Inference. Morgan Kaufmann, California
[Parsons 2001] Qualitative methods for reasoning under uncertainty. MIT Press
[Walley 1991] Statistical Reasoning with Imprecise Probabilities. Monographs on Statistics and Applied Probability #42. Chapman and Hall
[Wang 2006] Rigid Flexibility — The Logic of Intelligence. Springer Applied Logic Series, Springer
[Wellman, Breese, & Goldman 1992] From knowledge bases to decision models. Knowledge Engineering Review 7(1):35-53
[Zadeh 1996] "Fuzzy logic = computing with words". IEEE Transactions on Fuzzy Systems 4:103-111
4/Sept/2007, Jan/2007, Sep/2006 (C) GIRG