
General Intelligence :

Primal sketch project


Background

According to Marr, the primal sketch represents changes in light intensity occurring over space in the image. It also organises these local descriptions of intensity change into a 2D representation of image regions and the boundaries between them.

Specifically, the primal sketch may include elements such as edges (curves or straight lines), color blobs, terminations (ends), junctions of edges, textons, etc.
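As a concrete illustration of the starting point, intensity changes can be located with local gradients. The following is a minimal sketch, not a full primal-sketch extractor; the patch size and threshold are arbitrary assumptions.

```python
import numpy as np

def intensity_edges(image, threshold=0.2):
    """Mark pixels where light intensity changes sharply over space.

    A toy illustration only: finite-difference gradients, with an
    arbitrary illustrative cutoff on the gradient magnitude.
    """
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)          # intensity change along y and x
    magnitude = np.hypot(gx, gy)       # strength of the local change
    return magnitude > threshold       # boolean edge map

# A tiny image: dark left half, bright right half -> one vertical edge.
img = np.zeros((5, 6))
img[:, 3:] = 1.0
edges = intensity_edges(img)
```

A real primal-sketch front end would of course go further, fitting oriented primitives (bars, blobs, junctions) rather than thresholding raw gradients.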

Marr's vision scheme consists of 3 levels:

  1. primal sketch
  2. 2.5-D sketch
  3. 3-D model representation

My vision scheme is slightly different: it goes from 3D to 2D to 1D. This difference may not be all that important; what matters is how best to recognize features at the various levels, from a computational viewpoint. The primal sketch is an essential stage of any complete vision system.

What I propose is that the output of the primal sketch be represented as a symbolic web whose links are logical predicates. Developing such a low-level layer would be a substantial contribution towards computer vision.
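The proposed symbolic web can be sketched as follows: primal-sketch elements become nodes, and logical predicates become labelled links between them. The element names and predicates here (meets_at, parallel) are illustrative assumptions, not a fixed vocabulary.

```python
class SketchWeb:
    """A hedged sketch of the proposed symbolic web."""

    def __init__(self):
        self.elements = {}      # name -> element kind ("edge", "blob", ...)
        self.links = set()      # (predicate, subject, object) triples

    def add_element(self, name, kind):
        self.elements[name] = kind

    def relate(self, predicate, a, b):
        """Assert predicate(a, b) as a link in the web."""
        self.links.add((predicate, a, b))

    def query(self, predicate):
        """Return all (subject, object) pairs satisfying the predicate."""
        return {(a, b) for (p, a, b) in self.links if p == predicate}

web = SketchWeb()
web.add_element("e1", "edge")
web.add_element("e2", "edge")
web.add_element("j1", "junction")
web.relate("meets_at", "e1", "j1")   # meets_at(e1, j1)
web.relate("meets_at", "e2", "j1")
web.relate("parallel", "e1", "e2")
```

Because the links are predicates, a later reasoning layer can query the web logically instead of re-reading pixels.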


What we're trying to do

1. Obtaining the primal sketch

One solution to the primal sketch problem (I have not surveyed the topic completely) is presented in [Guo, Zhu & Wu 2006], "Primal Sketch: Integrating Texture and Structure", by researchers at the UCLA Center for Image and Vision Science.

As explained in the paper, an input image is separated into two regimes, one "sketchable" and the other "non-sketchable". The sketchable regime has low entropy, and there the image can be represented by a sparse-coding sum of primitive features; the non-sketchable regime has relatively higher entropy, where sparse coding is no longer practical and those areas of the image should be represented as textures.
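The entropy criterion can be illustrated with a toy patch classifier. This is only a sketch of the idea, not the paper's actual model; the patch size, histogram bins, and threshold are arbitrary assumptions.

```python
import numpy as np

def patch_entropy(patch, bins=8):
    """Shannon entropy (in bits) of a patch's intensity histogram."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def split_regimes(image, patch=4, threshold=1.5):
    """Label each patch 'sketchable' (low entropy) or 'non-sketchable'.

    Toy illustration of the entropy split described above; the
    threshold of 1.5 bits is an arbitrary illustrative choice.
    """
    labels = {}
    h, w = image.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            e = patch_entropy(image[i:i + patch, j:j + patch])
            labels[(i, j)] = "sketchable" if e < threshold else "non-sketchable"
    return labels

rng = np.random.default_rng(0)
img = np.zeros((8, 8))
img[:, 4:] = rng.random((8, 4))   # right half: texture-like noise
labels = split_regimes(img)
```

The flat left half has near-zero entropy (sketchable), while the noisy right half has high entropy and would be handed to a texture model.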

In our project we are trying to handle this kind of low-level feature extraction in the sketchable regime, while temporarily ignoring textures. I am currently seeking a collaboration with the UCLA group, or possibly a license for their technology.

2. Output an attributed graph

After obtaining the primal sketch, the second task is to represent the image as an attributed graph. This has been partly accomplished in [Han & Zhu 2005], "Bottom-up/Top-down Image Parsing with Attribute Graph Grammar", for some simple shapes.

The goal of our project is to output attributed graphs of the above kind for all sorts of images. The expressiveness of the attributed graph may be comparable to that of first-order predicate logic.
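To make the expressiveness claim concrete, here is a hedged sketch (not the exact format of [Han & Zhu 2005]): nodes carry attribute dictionaries, labelled relations act like first-order predicates over node constants, and a quantified query can then be evaluated over the graph. All attribute and predicate names are illustrative assumptions.

```python
# Nodes with attributes, as might be produced for a simple corner shape.
nodes = {
    "line1": {"type": "line", "angle": 0.0,  "length": 40},
    "line2": {"type": "line", "angle": 90.0, "length": 40},
    "j1":    {"type": "junction", "degree": 2},
}

# Relations as predicate triples: connects(line1, j1), etc.
edges = {
    ("connects", "line1", "j1"),
    ("connects", "line2", "j1"),
    ("perpendicular", "line1", "line2"),
}

def is_right_angle_corner(j, edges):
    """Exists x, y: connects(x, j) & connects(y, j) & perpendicular(x, y)."""
    incoming = [a for (p, a, b) in edges if p == "connects" and b == j]
    return any(("perpendicular", x, y) in edges
               or ("perpendicular", y, x) in edges
               for x in incoming for y in incoming if x != y)
```

The existentially quantified query over node variables is what gives the representation its first-order flavour, as opposed to a flat feature vector.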

Then the next step is to delegate the tasks of recognition and inference to an intelligent agent or cognitive architecture.

3. Handover to intelligent agent

The following tasks may be handled by the intelligent agent:

  1. pattern recognition
  2. attentional mechanism (e.g., focusing attention on various features or objects)
  3. learning of concepts and facts (declarative and episodic memory)
  4. complex inference (for example, recognizing a chair as "anything that can be sat on"; this may involve procedural memory)

This division of labor avoids duplicating these tasks between vision and general cognition.
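The chair example above can be sketched as a rule over perceptual predicates rather than a shape template. The predicate names and objects below are purely illustrative assumptions.

```python
# Toy illustration of functional inference: sittable(X) is derived from
# predicates about X, not from X's geometry alone.
facts = {
    ("has_flat_surface", "stool"),
    ("supports_weight", "stool"),
    ("at_sitting_height", "stool"),
    ("has_flat_surface", "shelf"),   # flat, but fails the other conditions
}

def can_be_sat_on(obj, facts):
    """sittable(X) <- has_flat_surface(X) & supports_weight(X) & at_sitting_height(X)"""
    needed = ("has_flat_surface", "supports_weight", "at_sitting_height")
    return all((p, obj) in facts for p in needed)
```

An intelligent agent would supply such rules (and learn them), while the vision module supplies the predicate facts, which is exactly the division of labor proposed here.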

Currently we are considering several intelligent agents for integration with vision. Essentially, any general cognitive architecture that can handle sensory perception can use our vision module.


References

[Guo, Zhu & Wu 2006] Primal Sketch: Integrating Texture and Structure. Computer Vision and Image Understanding, 2006 (accepted for the Special Issue on Generative Model Based Vision).

[Han & Zhu 2005] Bottom-up/Top-down Image Parsing with Attribute Graph Grammar. Preprint, 2005; submitted to PAMI.


Mar/2006 (C) General Intelligence Corporation Limited