Melberg, Hans O. (1998), Visual presentation of non-linear correlation in n-dimensions

[Note for bibliographic reference: Melberg, Hans O. (1998), Visual presentation of
non-linear correlation in n-dimensions: A speculation, www.oocities.org/hmelberg/papers/980219.htm]

Visual presentation of non-linear correlation in n-dimensions

A speculation

by Hans O. Melberg

Introduction

This is a speculation in the true meaning of the word. I do not know whether what I say
can be proven or whether it is useful. Maybe it is all a big misunderstanding, in which
case I simply reveal my ignorance. Yet, I want to present an idea I had while trying to
visualize a figure of more than three dimensions. I then began to think that such
visualizations were potentially important in discovering previously unknown patterns of
causation and that computer programs could make more imaginative use of visualizations to
show these connections.

An example

For instance, the quality of a soccer player depends on more than three variables: his
ability to run fast, to handle the ball well, to read the game, to keep his pace over time,
and several others. Note that it does not help to be good at only one thing
(e.g. running fast) if you cannot handle the ball. Moreover, there may be complex
interaction effects; the value of ball-handling skills may depend on two other variables.

Now, traditional analysis could simply run a multiple regression, including interaction
variables, to examine these relationships. But it is hard to create a visualized map of
all the variables. Moreover, if the pattern of causation is complex enough, a simple
regression may not reveal much, or it may require too many observations to reveal a
reliable pattern. Maybe, just maybe, a visualization of many dimensions could help reduce
this problem. Before I go into detail, and before I give more examples, a little history
may be useful.
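
To make "a multiple regression, including interaction variables" concrete, here is a minimal sketch in plain Python. The data are made up for illustration (they are not from any real study), and the names `speed`, `skill`, `fit_ols` and `solve` are assumptions introduced here. The quality score is constructed so that speed only matters in combination with ball skill, and the regression should then put essentially all the weight on the interaction column.

```python
from itertools import product


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical players on a 1-5 grid of (speed, skill).
players = list(product(range(1, 6), repeat=2))
# Made-up rule: quality is high only when speed AND skill are both high.
quality = [0.5 * speed * skill for speed, skill in players]

# Design matrix columns: intercept, speed, skill, speed * skill (the interaction).
design = [[1.0, speed, skill, speed * skill] for speed, skill in players]
coefs = fit_ols(design, quality)

for name, c in zip(["intercept", "speed", "skill", "speed*skill"], coefs):
    print(f"{name:12s} {c: .4f}")
```

With only two explanatory variables the interaction is easy to specify by hand. The difficulty the text points to is that with many variables one does not know in advance which interactions to include.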

History

It is easy to see the relationship between temperature and the volume of a gas. One simply
does an experiment and plots the relevant data in a figure: temperature on the horizontal
axis and volume on the vertical axis. The same procedure can be repeated for any other two
variables that are related in a simple fashion. Now, even if this procedure is obvious
today, it has not always been so. In the 17th century R. Descartes discovered that it was
very useful to combine two variables in a horizontal/vertical diagram (a Cartesian
diagram/coordinate system). Moreover, this discovery made it much easier to see patterns -
even infer causality - than it had been previously. Of course, large amounts of pure data
could be used without the figure, but the visualization made it much easier to find
patterns.

What patterns? First there is the simple linear relationship: a constant increase in the
dependent variable for every unit of increase in the independent variable. Then there are
many types of non-linear relationships: exponential (like population growth), hyperbolic,
quadratic and so on.
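
The difference between these classes can be sketched numerically. The snippet below is only an illustration: the particular functions and the helper names `first_differences` and `is_linear` are assumptions chosen here. It relies on the fact that, for evenly spaced values of the independent variable, a linear relationship is the only one whose successive differences are constant.

```python
# One representative function per class named in the text (illustrative choices).
patterns = {
    "linear": lambda x: 2 * x + 1,
    "quadratic": lambda x: x ** 2,
    "exponential": lambda x: 2 ** x,
    "hyperbolic": lambda x: 1 / x,
}


def first_differences(values):
    """Change in the dependent variable between successive observations."""
    return [b - a for a, b in zip(values, values[1:])]


def is_linear(values, tol=1e-9):
    """True if the values rise (or fall) by the same amount at every step.

    Assumes the values were observed at evenly spaced points.
    """
    diffs = first_differences(values)
    return all(abs(d - diffs[0]) <= tol for d in diffs)


xs = [1, 2, 3, 4, 5]  # evenly spaced independent variable
results = {}
for name, f in patterns.items():
    ys = [f(x) for x in xs]
    results[name] = is_linear(ys)
    print(f"{name:12s} differences={first_differences(ys)} linear={results[name]}")
```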

If one believes that more than two variables are related in a system, one may make a
three-dimensional map to visualize the relationship. Traditionally this was a bit difficult,
but after the computer revolution this is no longer a problem. Using three-dimensional maps
one can discover peaks and valleys which would be hard to find simply by looking at the
raw data.

Now, if one wants to use more than three dimensions, one runs into a problem: It is hard
to visualize. At least, this is what many people say. I am not so sure. Part of the
problem is that computer-generated figures for analysing data have mainly been extensions
of pen and paper figures. However, computer technology offers many more opportunities
to discover patterns in data. Three of the major opportunities are movement, shape and
sound. Moreover, even if we stick to traditional pen and paper approaches I think it is
possible to present much more information in figures, and thereby - hopefully - discover
new general classes of patterns (this time in n-dimensions).

Visualizing many dimensions: Some ideas

Imagine that the happiness of a person depends on many variables and that these variables
interact in many ways. Here is one suggestion for visualizing this in a figure:

- y-axis: happiness

- x-axis: number of friends

- rotation of the box used to mark a point

- size of box

- colour (intensity) of box

- speed of blinking
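
The list above can be sketched as a mapping from variables to visual channels. This is a minimal sketch under loud assumptions: only happiness and number of friends come from the text, while the variables assigned to rotation, size, intensity and blinking (`income`, `health`, `leisure_hours`, `stress`) are hypothetical, as are the output ranges and the sample data. The code only computes the attributes of each box; drawing and animating them is left to whatever graphics library is at hand.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    """Visual attributes of one box in the figure."""
    x: float             # horizontal position, 0..1
    y: float             # vertical position, 0..1
    rotation_deg: float  # rotation of the box
    size: float          # side length of the box
    intensity: float     # colour intensity, 0..1
    blink_hz: float      # blinks per second


def rescale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] onto [out_lo, out_hi]."""
    if hi == lo:
        return out_lo
    return out_lo + (value - lo) / (hi - lo) * (out_hi - out_lo)


# channel -> (variable, output range). Only the first two pairings are from the text;
# the remaining variable names are hypothetical placeholders.
CHANNELS = {
    "y": ("happiness", (0.0, 1.0)),
    "x": ("friends", (0.0, 1.0)),
    "rotation_deg": ("income", (0.0, 90.0)),
    "size": ("health", (5.0, 30.0)),
    "intensity": ("leisure_hours", (0.2, 1.0)),
    "blink_hz": ("stress", (0.0, 2.0)),
}


def encode(records):
    """Turn each record (a dict of variables) into a Marker."""
    ranges = {
        var: (min(r[var] for r in records), max(r[var] for r in records))
        for var, _ in CHANNELS.values()
    }
    markers = []
    for r in records:
        attrs = {}
        for channel, (var, (out_lo, out_hi)) in CHANNELS.items():
            lo, hi = ranges[var]
            attrs[channel] = rescale(r[var], lo, hi, out_lo, out_hi)
        markers.append(Marker(**attrs))
    return markers


# Made-up observations, for illustration only.
people = [
    {"happiness": 3, "friends": 2, "income": 20, "health": 5, "leisure_hours": 10, "stress": 8},
    {"happiness": 7, "friends": 6, "income": 35, "health": 7, "leisure_hours": 20, "stress": 4},
    {"happiness": 9, "friends": 10, "income": 50, "health": 9, "leisure_hours": 30, "stress": 2},
]
markers = encode(people)
for m in markers:
    print(m)
```

Each variable is rescaled to the range of its channel, so six dimensions end up in a single two-dimensional figure. Whether the eye can actually read six channels at once is exactly the open question.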

There are also many other ways of doing this. The boxes could be moving in a circle, or we
could add one dimension by making the height of a box depend on one variable while the
width depends on another. We could use rotating arrows (of different lengths and rotation
speeds) attached to circles of different size, colour intensity and speed of
blinking. More imaginative ideas include presenting data in some kind of planet system
(rotation of each planet, distance between planets, rotation of planets around other
planets). Alternatively, we could use sound much more actively and/or make the figure
much more interactive and dynamic (allowing the user to "travel" into the
figure and listen to the sounds associated with different values at that spot, and/or
allowing sequences of pictures to be presented one after another to see if this reveals a
pattern).

One last idea, which is not mine, for presenting multi-dimensional data, is Chernoff faces.
In short, the various dimensions are reflected in the various features of a face (size of
ears, degree of smile, amount of hair, etc). This may be a better idea than
mine since it is intuitively easier to understand (see E. Tufte's book about
data visualization for more on this).

What is the point?

I do not know how useful these visualizations are. The figures may reveal a system of
linear correlation, but this could be done in a simple ANOVA diagram too. I think the new
contribution of these figures would be to make it easier to discover non-linear and
unknown patterns (maybe indicating causation) which would be hard to discover using
traditional techniques.
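
One small illustration of how a linear summary can miss a pattern that a figure would show at once: in the sketch below (made-up data, with the helper name `pearson` introduced here as an assumption) the dependent variable is completely determined by the independent one, yet the ordinary linear correlation coefficient is approximately zero because the relationship is a symmetric curve rather than a line.

```python
import math


def pearson(xs, ys):
    """Ordinary (linear) Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up data: y depends on x exactly, but through a U-shaped curve.
xs = [x / 2 for x in range(-10, 11)]  # -5.0, -4.5, ..., 5.0
ys = [x ** 2 for x in xs]

r_linear = pearson(xs, ys)                         # looks for a straight line
r_squared_term = pearson([x ** 2 for x in xs], ys)  # only works if we already guess the form

print(f"correlation of y with x:   {r_linear:.6f}")
print(f"correlation of y with x^2: {r_squared_term:.6f}")
```

The second number is high only because the right transformation was guessed beforehand. In one dimension a plot makes the guess obvious; the speculation here is that suitable figures might do the same in n dimensions.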

How do I know that there are classes of non-linear correlations that can be discovered
this way? Can I give some concrete examples of where this procedure would be useful? In
short, I don't know. I have previously mentioned analysis of soccer players and happiness,
but these were more illustrations than really good examples. A better example might be
mixing metals and chemicals to create a new compound with special properties. For
instance, in his reply to criticism from D. Hendry, M. Friedman said that he was skeptical
of multiple regression. The reason was his experience with multiple regression when he was
trying to find a compound metal with some required properties (as strong as possible, as
light as possible and more). I do not want to argue that this is a good argument against
D. Hendry, but I do think multi-dimensional visualizations might be of help in developing
compound metals (and chemicals?). I also think the procedure could be used in the social
sciences.

Conclusion

As I said in the introduction, this has been a speculation in the true meaning of the
word. I may not be original, it may not be useful, and it need not be taken seriously. The
way to go from here is to make a computer program to test whether the visualizations can be
useful in real examples. Only then would it deserve to be taken seriously.
