Machine vision

— General theory and background

The background information on this page is adapted from [Bruce, Green & Georgeson 2003] except where otherwise stated.

Background of general vision theories

Marr & Nishihara (1978)

Marr's theory of vision is widely known, so I won't go into detail here.

This is a diagram explaining Marr's general vision framework [from Herman Gomes' webpage]:

Another famous diagram from [Marr & Nishihara 1978] [redrawn by Herman Gomes]:

Marr & Nishihara's theory is restricted to describing objects using a set of generalized cones (after [Binford 1971]).

Biederman and geons

[Biederman 1987] (web page: geon.usc.edu/~biederman) proposed the "recognition by components" theory, which is closely related to Marr and Nishihara's earlier theory. In Biederman's theory, complex objects are described as spatial arrangements of basic component parts known as "geons". Geons are defined by properties that are invariant over different views. Some example geons are [taken from Kirkpatrick's web page]:

In particular, [Biederman & Hummel 1992] proposed a neural network system to recognize geon-based objects:

Our approach

Re Marr's & Biederman's theories

Our approach roughly conforms to Marr's framework of "primal sketch → 2.5D → 3D" reduction, but we describe the reduction as "0D → 1D → 2D → 3D". The difference is probably minor, so I will not discuss it further here.

Another similarity with Marr is that I think the high-level representation is 3D in nature. But I do not restrict the representation to generalized cones or to Biederman's geons. My theory is that any 3D object can be defined by the 2D surfaces that bound it.

The following examples may convince you that some shapes are not representable by common geons:

In all of the above cases, the objects have parts that are defined by irregular surfaces. The geon theory may fail in such cases because Biederman et al. use neural networks to learn the geons, and the geons are characterized statistically by vertices, blobs, and axes. This kind of learning may be slow, and recognition may be erratic. What we need is a more robust theory that can represent any 3D shape. The solution is to use logical rules to define geons in terms of 2D lines and junctions:

This method is more robust and can recognize things other than common geons, such as the high-heeled shoe.
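As a rough illustration of what a logical rule over 2D lines and junctions could look like, here is a minimal Python sketch. It is not the rule set used in our system. The junction labels (L, Y, arrow, T) follow the usual line-drawing vocabulary, while the function names, the "brick" rule itself, and the integer test coordinates are all assumptions made for this example.

```python
import math
from collections import Counter

def junction_type(vertex, neighbours):
    """Label a vertex L, Y, arrow or T from the directions of its lines."""
    if len(neighbours) == 2:
        return "L"
    if len(neighbours) != 3:
        return "other"
    angles = sorted(math.atan2(ny - vertex[1], nx - vertex[0])
                    for nx, ny in neighbours)
    # Angular gaps between consecutive lines, going once around the vertex.
    gaps = [(angles[(i + 1) % 3] - angles[i]) % (2 * math.pi) for i in range(3)]
    widest = max(gaps)
    if math.isclose(widest, math.pi):
        return "T"
    return "arrow" if widest > math.pi else "Y"

def junction_counts(edges):
    """Count junction labels over all vertices of a line drawing."""
    adjacent = {}
    for a, b in edges:
        adjacent.setdefault(a, []).append(b)
        adjacent.setdefault(b, []).append(a)
    return Counter(junction_type(v, ns) for v, ns in adjacent.items())

def count_directions(edges):
    """Number of distinct edge directions (parallel edges count once).

    Uses an exact cross-product test, so it assumes integer coordinates.
    """
    directions = []
    for (x1, y1), (x2, y2) in edges:
        d = (x2 - x1, y2 - y1)
        if not any(d[0] * e[1] - d[1] * e[0] == 0 for e in directions):
            directions.append(d)
    return len(directions)

def looks_like_brick(edges):
    """Hypothetical rule: one Y, three arrows, three Ls, three parallel sets."""
    return (junction_counts(edges) == Counter({"Y": 1, "arrow": 3, "L": 3})
            and count_directions(edges) == 3)

# Line drawing of a cube seen from a general viewpoint (made-up coordinates).
O = (0, 0)
CUBE = [
    ((2, 1), (0, 2)), ((0, 2), (-2, 1)), ((-2, 1), (-2, -1)),
    ((-2, -1), (0, -2)), ((0, -2), (2, -1)), ((2, -1), (2, 1)),
    (O, (2, 1)), (O, (-2, 1)), (O, (0, -2)),
]
TRIANGLE = [((0, 0), (4, 0)), ((4, 0), (0, 3)), ((0, 3), (0, 0))]

print(looks_like_brick(CUBE), looks_like_brick(TRIANGLE))
```

A rule written this way is explicit and can be inspected or extended by hand, for example by adding further rules for parts bounded by curved contours. A real system would also need tolerance for noisy line positions, which this sketch ignores.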

Shape from shading and from texture

A number of algorithms have been developed to recover "shape from shading", which aim to describe the 3D shape given only the pattern of reflected light intensities (for example [Horn & Brooks 1989]), and to recover "shape from texture". Shape from texture is a particularly hard problem, so we will handle it later.
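To give a feel for why shading alone underdetermines shape, the following Python sketch computes the brightness of a surface patch under the common Lambertian assumption (a matte surface lit by a distant point source). The model, the function name, and the example vectors are assumptions chosen for illustration, not something taken from the works cited above.

```python
import math

def lambert_intensity(normal, light, albedo=1.0):
    """Brightness of a matte patch: albedo times cos(angle between normal and light)."""
    n_len = math.sqrt(sum(c * c for c in normal))
    l_len = math.sqrt(sum(c * c for c in light))
    cos_angle = sum(a * b for a, b in zip(normal, light)) / (n_len * l_len)
    # A patch facing away from the light receives no direct illumination.
    return albedo * max(0.0, cos_angle)

light = (0.0, 0.0, 1.0)          # light shining straight along the viewing axis
facing = (0.0, 0.0, 1.0)         # patch facing the light
tilted_right = (1.0, 0.0, 1.0)   # patch tilted 45 degrees about one axis
tilted_up = (0.0, 1.0, 1.0)      # patch tilted 45 degrees about the other axis

for name, normal in [("facing", facing), ("right", tilted_right), ("up", tilted_up)]:
    print(name, round(lambert_intensity(normal, light), 4))
```

The two tilted patches have different orientations but the same brightness, so a single intensity value cannot tell them apart. Extra constraints, such as contours or smoothness across neighbouring patches, are needed to resolve the orientation.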

But the framework of 3-2-1-0D reduction still holds for "shape-from-X". It is relatively easy to describe 3D shapes in terms of 2D surfaces, and 2D surfaces in terms of 1D contours; but it is very difficult to jump from a set of pixels straight to a 3D description. This is probably why many prior vision theories failed.
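The layered reduction can be pictured as a set of data types in which each level is defined only in terms of the level directly below it. The Python sketch below is one hypothetical way to write that down. The class names, fields, and the tetrahedron example are assumptions for illustration and do not describe an actual implementation.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Tuple

@dataclass(frozen=True)
class Point0D:
    x: float
    y: float

@dataclass(frozen=True)
class Contour1D:
    points: Tuple[Point0D, ...]       # a 1D contour is a chain of 0D points

@dataclass(frozen=True)
class Surface2D:
    boundary: Tuple[Contour1D, ...]   # a 2D surface is bounded by 1D contours

@dataclass(frozen=True)
class Object3D:
    name: str
    surfaces: Tuple[Surface2D, ...]   # a 3D object is bounded by 2D surfaces

def count_by_level(obj):
    """Walk down from the 3D object and count distinct elements per level."""
    surfaces = set(obj.surfaces)
    contours = {c for s in surfaces for c in s.boundary}
    points = {p for c in contours for p in c.points}
    return {"3D": 1, "2D": len(surfaces), "1D": len(contours), "0D": len(points)}

# A tetrahedron described bottom-up: 4 corners, 6 edges, 4 triangular faces.
corners = {"a": (0, 0), "b": (4, 0), "c": (2, 3), "d": (2, 1)}
pts = {name: Point0D(x, y) for name, (x, y) in corners.items()}
edges = {pair: Contour1D((pts[pair[0]], pts[pair[1]]))
         for pair in combinations("abcd", 2)}
faces = tuple(Surface2D(tuple(edges[p] for p in combinations(tri, 2)))
              for tri in combinations("abcd", 3))
tetra = Object3D("tetrahedron", faces)

print(count_by_level(tetra))
```

Because no type refers to anything more than one level down, a recognizer built on these types has to pass through contours and surfaces before it can assert a 3D object, which is the point of the reduction.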

For example, to recognize the nose, one should first recognize the shading, highlights, and contours as 2D/1D features:

The conjunction of these 2D/1D elements allows us to recognize that the nose is a protrusion from the face, and its particular shape is jointly defined by the shapes of 2D/1D elements. It seems that a common mistake is to assume that the brain immediately recognizes the nose as a 3D shape from pixel-level data, without going through the intermediate stages, because the brain is usually unconscious of those intermediate stages.

Recognizing shading as 2D features will require special algorithms, as will recognizing textures. At first we will focus exclusively on edge detection (contours) to recognize objects.
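For readers unfamiliar with edge detection, here is a minimal sketch of one standard technique, the Sobel operator, written in plain Python. It is only an example of how contour pixels might be extracted and is not necessarily the detector we will use. The function name, the threshold value, and the tiny test image are assumptions.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical intensity change.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_edges(image, threshold):
    """Return a boolean map that is True where the gradient magnitude exceeds threshold.

    `image` is a list of rows of grey levels. Border pixels are left False
    because the 3x3 window does not fit there.
    """
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    pixel = image[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * pixel
                    gy += SOBEL_Y[dr + 1][dc + 1] * pixel
            edges[r][c] = math.hypot(gx, gy) > threshold
    return edges

# A 5x6 image with a dark left half and a bright right half (made-up data).
step_image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edge_map = sobel_edges(step_image, threshold=20)

for row in edge_map:
    print("".join("#" if e else "." for e in row))
```

The marked pixels straddle the boundary between the dark and bright halves. Linking such pixels into connected chains would give the 1D contours that later stages work with.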

References

[Biederman 1987] Recognition by components: A theory of human image understanding. Psychological Review, 94, 115-145

[Biederman & Hummel 1992] Dynamic binding in a neural network for shape recognition. Psychological Review, 99, 480-517

[Binford 1971] Visual perception by computer. Paper presented at the IEEE Conference on Systems and Control, December 1971, Miami

[Bruce, Green & Georgeson 2003] Visual Perception: Physiology, Psychology and Ecology. Psychology Press, NY

[Gomes 2000] Herman Gomes' web page: Marr's Theory: From primal sketch to 3-D models, http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/GOMES1/marr.html

[Horn & Brooks 1989] Shape from shading. MIT Press, Cambridge, MA

[Kirkpatrick K] Web page on object recognition http://www.pigeon.psy.tufts.edu/avc/print/kirkpatrick/kirkpatrick_figprint.htm

[Marr & Nishihara 1978] Representation and recognition of the spatial organization of 3D shapes. Proceedings of the Royal Society of London, Series B, 200, 269-294


22/Jun/2006 (C) General intelligence corporation limited