
General Intelligence: Learning

If my analysis is correct, all declarative forms of learning in GI can be handled by inductive logic programming (ILP, a term coined by [Muggleton 1991]), with these special considerations:

  1. statistical learning and defaults
  2. the role of explanations
  3. "background noise"
  4. online learning
  5. direct knowledge insertion

ILP can be either supervised or unsupervised.

Knowledge discovery in GI falls into 3 categories:

  1. semantic knowledge learning
  2. generic knowledge learning
  3. procedural knowledge learning
    — learning new ways of doing things

Procedural learning may require learning mechanisms other than ILP, such as reinforcement learning.

Language learning is handled by the natural language processing module.

Metalearning usually means trying to improve learning performance by incorporating "metaknowledge" about different learning problems. In the context of general intelligence, this amounts to making learning better in various knowledge domains. Currently I am dealing only with learning in general.

Another sense of metalearning, ie "learning to learn", means automatically improving learning algorithms; in other words, how can ILP itself be improved by learning? This is also an advanced topic that I am ignoring at present.

Theory revision is considered in reason maintenance.


Inductive logic programming (ILP)

A short introduction to ILP is [Wrobel 1996]. What follows is partly adapted from there and from Ch10 of [Mitchell 1997].

For decades, machine learning research focused on learning in propositional representations, for example the famous decision-tree method ID3 [Quinlan 1983]. But propositional logic cannot express relations between objects. Many problems can only be represented adequately in first-order logic, and GI is obviously such a problem.

The basic goal of ILP is: given a number of examples, to learn (or to "induce") first-order logic rules that "describe" or "explain" the examples.

A simple ILP example

To learn the relation: Daughter(x,y).

A database may have the following facts:

Male(mary) = false
Female(mary) = true
Mother(mary, louise) = true
Father(mary, bob) = true
Daughter(bob, mary) = true
etc.

After presenting a large number of such facts, this rule may be learned by ILP:

IF Father(y,x) ^ Female(y) THEN Daughter(x,y)
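
As a minimal sketch, here is how a learner might check this candidate rule against the fact base above (in Python). The facts and the rule come from the example; the negative example and the tuple encoding of facts are illustrative assumptions:

# Facts encoded as tuples; Male(mary) = false is simply absent from the set.
facts = {
    ("Female", "mary"),
    ("Mother", "mary", "louise"),
    ("Father", "mary", "bob"),
}
positives = {("bob", "mary")}   # Daughter(bob, mary) = true
negatives = {("mary", "bob")}   # assumed negative example: Daughter(mary, bob) = false

def candidate_rule(x, y):
    # IF Father(y,x) ^ Female(y) THEN Daughter(x,y)
    return ("Father", y, x) in facts and ("Female", y) in facts

# A hypothesis is acceptable if it covers all positives and no negatives.
assert all(candidate_rule(x, y) for x, y in positives)
assert not any(candidate_rule(x, y) for x, y in negatives)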

Problem definition and some concepts: hypothesis space, background knowledge

Briefly: the hypothesis space is the set of all candidate rules the learner is allowed to consider, and background knowledge is the collection of predicates and facts the learner may use when forming hypotheses.

Top-down and bottom-up methods

ILP, like other learning problems, is essentially a search in the hypothesis space. The hypothesis space can be ordered from general to specific. If we take the most general hypothesis to be the "top", then there are top-down and bottom-up approaches. Well-known search techniques in the AI literature apply here too, such as breadth-first, depth-first, best-first, and heuristic search.
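
As a rough illustration of the top-down approach, the sketch below performs a greedy general-to-specific search in the style of FOIL: start from the most general clause (an empty body) and keep adding the body literal that best excludes negative examples. The covers helper and the crude gain measure are assumptions of this sketch, not FOIL's exact definitions.

def specialize(positives, negatives, candidate_literals, covers):
    # covers(body, example) -> bool: assumed helper that tests whether the
    # clause with this body covers the example, given the fact base.
    body = []                 # most general clause: empty body covers everything
    neg = set(negatives)
    while neg:
        # pick the literal that keeps the most positives while
        # excluding the most negatives (a crude gain heuristic)
        def gain(lit):
            b = body + [lit]
            return (sum(covers(b, e) for e in positives)
                    - sum(covers(b, e) for e in neg))
        best = max(candidate_literals, key=gain)
        new_neg = {e for e in neg if covers(body + [best], e)}
        if len(new_neg) == len(neg):
            break             # no literal excludes any negative; give up
        body.append(best)
        neg = new_neg
    return body               # specialized clause body, ideally covering no negatives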

{ To do: explain searching, FOIL, GOLEM, etc }


Special considerations

Statistical learning and defaults

Some statements are statistically true, ie they have exceptions. For example "women have long hair". We have to learn such statements while also keeping track of their frequentist probabilities, ie the ratio of positive examples to the total number of examples.
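
A sketch of how a learned rule might carry its statistics with it (the class and field names are assumptions):

from dataclasses import dataclass

@dataclass
class StatRule:
    clause: str        # eg "Female(x) -> LongHair(x)"
    positive: int = 0  # cases where both body and head held
    total: int = 0     # cases where the body held

    def observe(self, head_held):
        self.total += 1
        if head_held:
            self.positive += 1

    @property
    def probability(self):
        # frequentist estimate: positive examples / total examples
        return self.positive / self.total if self.total else 0.0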

The role of explanations

Adding explanations may complicate the above process. For example, sub-Saharan African people have "wooly" hair, which makes them an exception to the rule that "women have long hair", and the statistics we have for other women do not apply to them. It seems that we need to keep track of both statistics and explanations at the same time.

In general the addition of explanations helps reduce the hypothesis space (so-called explanation-based learning, cf [Mitchell 1997] ch11), and this process happens continually as the system accumulates knowledge.

"Background noise"

There is another kind of "noise" different from what we discussed above. In sensory experience, learning usually focuses on certain objects while ignoring the background. For example, I may see a pineapple at the supermarket in one setting and on a pineapple plant in another setting. In both cases I learn the characteristics of pineapples while ignoring the backgrounds.

Thus, learning is affected by pattern recognition, even though sometimes the new pattern itself has not been learned yet. For example, it may be the first time I see a pineapple, but I would still be able to delineate it as a distinct object. The recognition of the background (eg a table) can also help delineate new objects.

Online learning

Sensory experience stays in Working Memory only temporarily; it is then stored in Episodic Memory. Most learning algorithms need to process the set of examples multiple times during learning. Therefore, the Learner must access Episodic Memory as well.

Also, the Learner should store a large number of candidate rules and wait for them to be confirmed/disconfirmed by experience. Only confirmed knowledge would be exported to Generic / Semantic Memory.
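
A sketch of such a candidate pool, reusing the StatRule class from above; the confirmation threshold and required support ratio are assumed values:

CONFIRM_AT = 50   # assumed: observations needed before a rule can be promoted
MIN_PROB = 0.95   # assumed: support ratio required for confirmation

def update_pool(pool, clause, head_held, export):
    # pool: dict mapping a clause to its StatRule
    # export: callback that moves a confirmed rule into Generic / Semantic Memory
    rule = pool.setdefault(clause, StatRule(clause=clause))
    rule.observe(head_held)
    if rule.total >= CONFIRM_AT and rule.probability >= MIN_PROB:
        export(rule)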

Direct knowledge insertion

Because our knowledge representation is logical, it is relatively easy (compared with neural networks) to insert knowledge manually into the memory systems (GM, SM, even EM). Moreover, the Learner should treat such knowledge as if it were native, and continually revise it in the light of incoming experience. In other words, the human user can provide inductive bias explicitly in the form of candidate hypotheses.
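
Under the same sketch, direct insertion simply seeds the candidate pool with a hand-written rule; the prior counts, which make the rule start out trusted yet still revisable, are assumptions:

def insert_rule(pool, clause, prior_pos=10, prior_total=10):
    # human-provided knowledge enters the same pool as learned candidates,
    # so subsequent experience revises it in exactly the same way
    pool[clause] = StatRule(clause=clause, positive=prior_pos, total=prior_total)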


Semantic knowledge learning

Semantic knowledge (which literally means the meaning of words) is exclusively expressed as patterns and their definitions. Notice that the patterns themselves are nameless; their association with words is handled by the natural language processor.

Semantic learning occurs in 2 modes:

Supervised

Input: sensory facts (labeled as positive or negative examples); the pattern name.

Output: definition of the pattern by logical formulas.

This is pretty much the classic ILP scenario. The special considerations from above apply.

Unsupervised

Input: sensory facts (which may contain positive or negative examples, or noise, but none of this is labeled); no pattern name is given.

Output: a likely "good" new pattern with its definition by logical formulas. The pattern may remain nameless.

This is not very different from "classical" ILP, since we can simply treat all facts as positive examples, allowing for the possibility that some are noise or exceptions. The learning algorithm will try to guess "good" concepts based on criteria such as minimum description length (MDL) or information entropy.
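
A sketch of MDL-style scoring for a candidate concept: the total cost is the length of the concept definition plus the length of the data encoded with its help, so a concept is "good" if it compresses the facts. The bit costs here are crude illustrative assumptions:

import math

def mdl_score(num_literals, facts_explained, facts_total,
              bits_per_literal=8.0, bits_per_fact=16.0):
    # cost of stating the concept definition itself (the "model")
    model_bits = num_literals * bits_per_literal
    # explained facts are re-encoded cheaply as references to the concept;
    # unexplained facts keep their full encoding cost
    data_bits = ((facts_total - facts_explained) * bits_per_fact
                 + facts_explained * math.log2(facts_total + 1))
    return model_bits + data_bits   # smaller is better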

An example of unsupervised learning is: given samples of various fruits (their visual appearance), automatically form the concepts of apple, orange, banana, etc.

Another example is: given many instances of "things that hurt me", generate concepts that help classify what hurts me from what doesn't.

In this example, "things that hurt me" is a concept with intrinsic importance (assigned by the Emotion module) and thus high saliency. The other concepts make this classification easier, for example "hitting sharp objects hurts". In other words, a concept is useful if it helps make classification succinct.


Generic knowledge learning

In cognitive psychology the term generic memory includes all declarative knowledge that is not about specific events (ie not tied to a specific time), and thus it includes semantic memory. We take the liberty of using "Generic Memory" to mean generic memory minus semantic memory. Examples are "women usually have long hair" and "entropy always increases".

Input: sensory facts

Output: logical rules generalized from the facts

An example: "If a person says something, then s/he means it unless s/he is lying, insane, etc".

Says(x,y) → Means(x,y) unless Lying(x), Insane(x),...

We would be given many facts like "Mom says there is a red apple on the table, and there is indeed a red apple on the table" or "Dad says fire burns, and fire indeed burns".

Again, this is a classic ILP scenario, and the above special considerations apply.
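
A sketch of how such a default rule with its exception list might be represented (the structure and names are assumptions):

from dataclasses import dataclass, field

@dataclass
class DefaultRule:
    body: str                     # eg "Says(x,y)"
    head: str                     # eg "Means(x,y)"
    exceptions: list = field(default_factory=list)

says_means = DefaultRule("Says(x,y)", "Means(x,y)",
                         ["Lying(x)", "Insane(x)"])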

Consistency seeking

A special form of learning is consistency seeking, which is covered in knowledge maintenance. We can imagine that a baby AI's mind contains a lot of random facts, and as it matures the web of facts gets more and more consistent. The facts in Generic Memory often support each other; for example "all men are lazy" is supported by "John is often lazy", etc. Sometimes a new fact may contradict old facts stored in GM. Then there is the question of which fact(s) to retract or modify. This can be decided by weighing the amount of "support" for the facts in question, which gives us an unsupervised way of updating the knowledge base.
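
A sketch of the support-weighing idea: when a new fact contradicts a stored one, the side with less accumulated support becomes the candidate for retraction (the support measure is an illustrative assumption):

def resolve_contradiction(fact_a, fact_b, support):
    # support: dict mapping a fact to the number of stored facts backing it
    # (a crude measure; a real system would also weigh evidence quality)
    if support.get(fact_a, 0) < support.get(fact_b, 0):
        return fact_a             # candidate for retraction or modification
    return fact_b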

For example, when I was a kid I thought that the jellyfish on the beach were called "steam" (perhaps because they are transparent). When I grew up I learned the correct usage of "steam". This is an example where a piece of semantic knowledge became inconsistent with new knowledge and was in need of modification. It is the job of the Knowledge Maintenance module to find out which facts are likely to be wrong, and to notify the corresponding learning modules to modify them accordingly.


Procedural knowledge learning

{ Looking for suggestions for this... }

{ What is the "chunking" learning method as used by Soar? }




References

[Mitchell 1997] T. Mitchell, Machine Learning. McGraw-Hill.

[Muggleton 1991] S. Muggleton, Inductive logic programming. New Generation Computing 8(4):295-318.

[Quinlan 1983] J. R. Quinlan, Learning efficient classification procedures and their application to chess end games. In Michalski et al eds, Machine Learning: An Artificial Intelligence Approach. Tioga.

[Wrobel 1996] S. Wrobel, Inductive logic programming. In Brewka ed, Principles of Knowledge Representation. CSLI Publications, Stanford, California.
