General Intelligence:

Relation between cognition and vision

What is general intelligence?
The GI-vision approach

What is general cognition?

A fundamental idea of cognition is compression of information.

Sensory experience in its raw form is like a long movie:

The Visual Module compresses this movie into a semi-symbolic representation. The Cognitive Module further compresses this information.

Compression is achieved by (statistical) pattern recognition. For example, in the "Lena" image, the object in the center can be recognized as a "face".

GI is organized hierarchically, with higher abstraction at the top. At a very high level the Lena image may be compressed symbolically as "young woman wearing a hat".

The internal representation of a GI may be a data structure containing:

objects
properties of objects
relations between objects

Some examples:

objects: "a face", "an apple", "the letter A"
properties: "face.sex = female", "apple.color = red", "letter_A.font = Times New Roman"
relations: "hat above face", "apple on table"
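The examples above can be written down as a small data structure. The following is only a minimal sketch: the class name Representation, its method names, and the choice of sets and dictionaries are assumptions made for illustration, not part of the GI theory itself.

```python
from dataclasses import dataclass, field


@dataclass
class Representation:
    """Hypothetical container for objects, properties, and relations."""
    objects: set = field(default_factory=set)
    # Maps (object, property name) to a value, e.g. ("face", "sex") -> "female".
    properties: dict = field(default_factory=dict)
    # Each relation is a (subject, relation name, object) triple.
    relations: set = field(default_factory=set)

    def add_object(self, name):
        self.objects.add(name)

    def set_property(self, obj, prop, value):
        # Setting a property implicitly registers the object.
        self.add_object(obj)
        self.properties[(obj, prop)] = value

    def relate(self, subject, relation, obj):
        # Relating two objects implicitly registers both of them.
        self.add_object(subject)
        self.add_object(obj)
        self.relations.add((subject, relation, obj))


def build_example():
    """Build the representation for the examples listed in the text."""
    rep = Representation()
    rep.set_property("face", "sex", "female")
    rep.set_property("apple", "color", "red")
    rep.set_property("letter_A", "font", "Times New Roman")
    rep.relate("hat", "above", "face")
    rep.relate("apple", "on", "table")
    return rep
```

Keeping relations as triples means a new kind of relation can be added without changing the structure, which matters if the system is expected to invent relations on its own.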

GI is a system that can automatically invent concepts of new objects, properties, and relations. These new concepts are in turn applied to the internal representation. For example, it may learn the new property "rectangular" by looking at many rectangular shapes.

A more detailed GI theory is here.

The cognitive vision approach

Vision is basically the problem of compressing the sensory input stream in a meaningful way.

[Lena image] = "young woman wearing a hat"

But there may be several levels of representation.

As explained in the last section, each level of representation consists of:

objects
properties
relations

The pixel level is level 0 (not a representation).

Level 1 consists of basic syntactic features:

edges / lines (includes straight lines and curves)
regions of uniform color / intensity
gradients of color / intensity
textures

These features are represented as objects. For example:

object: curve1
property: curve1.thickness = thin
relation: curve1 connected-to curve2
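As a rough illustration, a level 1 description like the one above could be stored as plain facts and queried. In this sketch the names curve2 and region1, the tuple layout, and the decision to treat connected-to as symmetric are all assumptions, not something the scheme prescribes.

```python
# Level 1 description of a hypothetical image, written as plain facts.
objects = {"curve1": "curve", "curve2": "curve", "region1": "region"}
properties = {("curve1", "thickness"): "thin"}
relations = {("curve1", "connected-to", "curve2")}


def connected_to(name, relations):
    """Return the objects joined to `name`.

    Assumption: connected-to is symmetric, so a fact stored in one
    direction is found from either end.
    """
    found = set()
    for subject, relation, other in relations:
        if relation != "connected-to":
            continue
        if subject == name:
            found.add(other)
        elif other == name:
            found.add(subject)
    return found
```

A higher level could use queries of this kind to group connected curves into candidate shapes.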

At level 2 we may perform character recognition or 3D object recognition, and so on.

The key point is that the vision problem is solved by combining GI with low-level processing. Once the image is represented as objects, properties, and relations, we assume that a GI can handle it.

Level 1 processing is hand-programmed because it starts with pixels. Then we can rely on the GI to discover higher-level features (e.g., the concept "cube"). The GI will also allow us to hand-program some rules (known as knowledge insertion), so that long periods of learning may be skipped.
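Knowledge insertion might look something like the sketch below, where a hand-written rule derives a higher-level concept from level 1 facts. The "corner" concept, the rule itself, and the function names corner_rule and insert_knowledge are hypothetical; the rule is deliberately crude (it ignores the angle between the lines) and only shows the mechanism.

```python
def corner_rule(kinds, relations):
    """Hand-written rule: two connected straight lines form a corner.

    `kinds` maps object names to level 1 feature kinds; `relations` is a
    set of (subject, relation, object) triples. Both formats are assumptions.
    """
    corners = set()
    for subject, relation, other in relations:
        if (relation == "connected-to"
                and kinds.get(subject) == "straight line"
                and kinds.get(other) == "straight line"):
            # Sort the pair so (a, b) and (b, a) yield the same corner.
            corners.add(tuple(sorted((subject, other))))
    return corners


def insert_knowledge(kinds, relations, rules):
    """Apply each hand-programmed rule and collect the concepts it derives."""
    derived = {}
    for name, rule in rules.items():
        derived[name] = rule(kinds, relations)
    return derived
```

A rule supplied this way gives the system a concept immediately, where it would otherwise have to be discovered from many examples.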

Here is the detailed vision scheme.


Mar/2006 (C) General intelligence corporation limited