DNA, Intelligent Fiddling, and Design Recognition

Updated: 11/22/2006

This is a continuation of an on-going discussion. Below are excerpts taken from the grault.net wiki discussion board. Some of the text was edited for context clarification and privacy.

Suppose we do away with Behe's version of ID, and speculate that maybe biology was designed, or at least "fiddled with", and want to test it by looking for patterns or messages in DNA using various DNA Encoding Techniques such as searching for logos, images, pi, primes, etc. Let's agree that the probability of finding something is really low; nevertheless, it is not zero. So, how is DNA sifting materially different, in a scientific sense, from SETI? SETI has also been considered a "waste of time" by some. However, being a waste of time is an economic issue, not a scientific one.
Note that DNA-ID does not necessarily conflict with evolution. It is like "SETI looking down instead of up". It searches for mere fiddling or alteration of DNA and does not require the ability to build life from scratch (although it could possibly detect that too, if present). In some ways this is closer to mainstream Christians, who consider evolution merely a tool of God. However, this version does not rely on supernatural assumptions. The fiddlers could be the same aliens that SETI is looking for.

Are We Testing Yet?

SETI is already well into testing, but nobody has started any serious DNA sifting for ET artifacts. Thus, your SETI comparison is lame, like the rest of your [censored].

I am comparing to SETI when it was first formed, not as it is now. Was SETI's hypothesis not a valid one until it started actual sky scans? If so, what rule of science are you citing? A hypothesis generally precedes testing, not the other way around.

DNA Encoding Techniques

Here are some potential encoding techniques or message types that can be searched. More details or examples may be specified over time.

  • Images - One way to detect potential candidates is to try different "framing" widths in bit-maps, and then return the matches with the lowest scan-line deltas. Photographs tend to repeat similar info per scan line. Thus, the difference between line N and line N+1 will tend to be smaller than for, say, random noise. Chunks with the lowest deltas can be extracted for later human visual evaluation. Related: Columbus Argument

  • Math Sequences
    • Prime number sequences: 2, 3, 5, 7, 11, 13, 17, 19, 23... Translated into binary: 10,11,101,111... compressed into 1011101111... and then encoded using the technique described below.
    • Fibonacci sequence
    • Pi

  • Statistical Outliers, such as deviations from expected average number of codons or bases per area.
    • Some have suggested this, but I was not sold on it because it does not provide any indication of the nature or reason for the outlier. One of the above detection methods would probably have to be used anyhow to get more info on it. At best it may narrow the search scope. See below.
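
The framing-width idea in the Images item can be sketched in Python as follows. This is only an illustration under stated assumptions: the DNA is assumed to have been converted to a string of "0" and "1" characters already, the image is assumed to be one bit per pixel, and the function names are made up for this example.

```python
def mean_scanline_delta(bits: str, width: int) -> float:
    """Average fraction of bits that differ between adjacent rows of `width` bits."""
    # Cut the bit string into full rows; a trailing partial row is ignored.
    rows = [bits[i:i + width] for i in range(0, len(bits) - width + 1, width)]
    if len(rows) < 2:
        raise ValueError("need at least two full rows")
    diffs = sum(
        sum(a != b for a, b in zip(r1, r2))
        for r1, r2 in zip(rows, rows[1:])
    )
    return diffs / (width * (len(rows) - 1))


def best_widths(bits: str, min_width: int, max_width: int, top: int = 3):
    """Rank candidate framing widths by ascending mean scan-line delta."""
    scored = []
    for w in range(min_width, max_width + 1):
        if len(bits) >= 2 * w:  # need at least two rows at this width
            scored.append((mean_scanline_delta(bits, w), w))
    scored.sort()
    return scored[:top]
```

On a toy bit string built by repeating the 7-bit row 0011100 six times, best_widths(bits, 4, 10) ranks width 7 first with a delta of 0.0. Note that any exact multiple of the true width would score just as well, so the candidate range matters.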

Binary Digit Encoding

There are 24 different ways (4! = 24) to map the four bases to two-bit values. As an example: C=00, A=01, T=10, G=11 is one of the 24 possible ways. Any search would have to try all 24 ways, and perhaps also backward for things such as increasing prime number sequences, since ET may write numbers right-to-left.
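
As a rough Python sketch of the mapping described above: the code enumerates the 24 assignments and encodes the compressed prime bit string from the Math Sequences list. The function names are illustrative, and padding an odd-length bit string with a trailing zero is an assumption, since the discussion does not say how to handle that case.

```python
from itertools import permutations

BASES = "CATG"
PAIRS = ("00", "01", "10", "11")


def all_mappings():
    """Return the 24 dicts that map each two-bit pair to a distinct base."""
    return [dict(zip(PAIRS, perm)) for perm in permutations(BASES)]


def primes_bitstring(count):
    """Concatenate the binary forms of the first `count` primes."""
    primes, n = [], 2
    while len(primes) < count:
        # n is prime if no smaller prime divides it.
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return "".join(format(p, "b") for p in primes)


def encode(bits, mapping):
    """Translate a bit string into bases, two bits per base."""
    if len(bits) % 2:
        bits += "0"  # assumption: pad odd-length input with a trailing zero
    return "".join(mapping[bits[i:i + 2]] for i in range(0, len(bits), 2))
```

The first mapping returned is the C=00, A=01, T=10, G=11 example, and encode(primes_bitstring(4), all_mappings()[0]) turns 1011101111 into TGTGG.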

You're assuming that a given nucleobase maps to a value between 0 and 3. Why assume that? Why, for example, not assume that CC=0, CA=1, CT=2, and so on? Or, worse: CATTAC=0, TCAGTA=1, CCAGAT=2, etc. The Genetic Operations Designer may choose any number system or notational encoding he/she/it/they/mu likes. -- DV

Perhaps. But we look for the lowest-hanging fruit first. The encoding I propose is probably the simplest. SETI's ET may be broadcasting in ways that we won't notice. However, we don't have the resources to test every possible SETI-ET signal type. Further, if we were going to broadcast/leave a signal that we wanted other species to be able to identify, a convoluted encoding would generally be avoided. They may indeed be using such for their own purposes, but again we search the easiest first. It is more of an economic issue than a scientific one.

So you're assuming the Genetic Operations Designer has included a "signature" specifically for other species to find? If so, wouldn't it have been simpler to code DNA so that, say, a "MADE IN MAGRATHEA" logo appears in scales, fur, or skin pigment on the back of every creature?

  • We don't know what the motivations of the designer/fiddler are. A physical logo may get wiped out by evolution faster than a DNA version, because the only selection pressure against the DNA version is wasted DNA space rather than a change in the appearance of the critter.

Or, the G.O.D. could create a creature so obviously engineered that any intelligent being would immediately say, "C'mon, there's no way that could have evolved naturally!"

Like, say, the platypus...

The platypus isn't as weird as it is made out to be, any more than other "living fossil" forms, which include ferns, sharks, etc.

As for genetic encoding, the above is already biologically confused. Nature uses a base-64 system, not a base-24 system. See e.g. http://en.wikipedia.org/wiki/Genetic_code

  • The base-64 is for biology uses. A message placer may not care about what biology uses because their purpose is to place messages, not mirror biology. Base-64 seems a bit wasteful for most purposes.

The SETI searches use signal processing and statistical techniques to look for any signal source that is distinguishable from pure noise. (How this is done, and to what extent such approaches are theoretically feasible, is an interesting field of study in itself.)

If you haven't studied something, say so, and ask questions. Don't just make baseless assumptions as if you knew something when in fact you do not; that gives a very bad impression indeed.

-- DM

I agree that SETI does not search (just) for content patterns. But that alone does not disqualify DNA-ID. Perhaps this relates to Columbus Argument.

Why These Tests?

Complicated or not, it becomes an undefined problem. The question is not whether it's scientific or not, but whether it's reasonable or not. When a testable attribute (such as the presence of a monochromatic signal) is chosen as a search criterion, it becomes reasonable. I'm not claiming that DNA-ID is unreasonable, I'm simply claiming that a reasonable test has not yet been proposed. "Looking for outliers" seems to be the suggested test, but "outlier" -- in the context of DNA -- has not been defined, as I've pointed out above. Without such a definition, what shall I look for? -- DV

I generally like to use the integer sequence detection suggestions here because they are easier to describe and define than image detection and outlier usage. One reason ET may use such sequences is because they could establish a common numerical language to be used in further deciphering. In essence, it is "interspecies math calibration". It may also help them find "scientifically curious" species rather than mere opportunists; they would rather make contact with Vulcans than Ferengi (Star Trek reference). It is almost comparable to Google's famous math puzzle resume filtering process.

I should also point out that I don't consider any single test "sufficient". Ideally we would try all the above listed tests and any new decent suggestions because the reasons for each could vary widely.

Is Prime Algorithm Clear Enough?

At this point, it strikes me that the best way to convince me, and perhaps others, of the value of DNA-ID is for you to come up with some rigorously-defined, specific tests based on the general ones described above. It'd be ideal if you'd even write some code. I'd happily run a background process on a machine or two. -- DV

The prime detection suggestion seems straightforward to me. I don't see alternative ways to interpret it unless one goes out on a limb. What aspect of it do you find fuzzy? I will clarify. The only thing I see lacking is the length threshold. But that does not really matter because we can sort the hits by length and explore them from high to low as we have resources for. To avoid overloading the candidate database, perhaps start with a threshold length of 25. With this we can query for long sequences or single specimens with lots of medium ones.

It's conceptually straight-forward, but a clear idea is not the same as a clear specification. I was thinking more along the lines of something that specifies which format of genetic data it will use, identify which tests will be performed, and then a brief but rigorous outline of each test's algorithm in, say, pseudocode. That would allow you to specify some of the details, like the threshold length mentioned above, without further debate. -- DV

Let's focus on the primes suggestion for right now. What specifically do you see not specified? As far as I can tell, everything is there such that a reasonably experienced analyst can transform it into an algorithm.

I'm sure I can, but as I'm sure you're aware, what a developer implements is not necessarily going to be what the originator of the idea (i.e., you, in this case) intends -- even if the idea seems obvious -- unless the specification is rigorous. Also, I was hoping that since I am going to code it, you could contribute to the development effort by providing a detailed spec in pseudocode or equivalent. That would save me a bit of time, eliminate any possible ambiguity, and make sure I implement what you have in mind -- especially as I'm not entirely convinced that what I have in mind is what you have in mind. Also, it would allow you to define any and all relevant parameters such as the threshold length, etc., and/or decide whether or not these should vary dynamically and under what conditions. -- DV

It is mostly just string matching. One generates the 48 strings (24 mappings, each forward and backward) that fit the threshold length, and then logs all positions that contain them. A fancier technique would be to see how long the sequence goes rather than a fixed-length match. Personally I find the image hunting more interesting. I even started working on it, but kind of slacked off of late.
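
A minimal Python sketch of that string-matching approach follows. Several details are assumptions rather than anything agreed in this discussion: the threshold is counted in bases, "backward" is taken to mean the whole base string reversed, and the function names are invented for illustration.

```python
from itertools import permutations


def prime_bits(min_bits):
    """Binary forms of successive primes, concatenated until at least min_bits long."""
    bits, primes, n = "", [], 2
    while len(bits) < min_bits:
        if all(n % p for p in primes):  # no smaller prime divides n
            primes.append(n)
            bits += format(n, "b")
        n += 1
    return bits


def probe_strings(threshold=25):
    """Build the 48 probes: 24 base mappings, each forward and reversed."""
    bits = prime_bits(2 * threshold)[:2 * threshold]  # two bits per base
    probes = []
    for perm in permutations("CATG"):
        mapping = dict(zip(("00", "01", "10", "11"), perm))
        forward = "".join(mapping[bits[i:i + 2]] for i in range(0, len(bits), 2))
        probes.append(forward)
        probes.append(forward[::-1])  # assumption: backward = reversed base string
    return probes


def find_hits(dna, probes):
    """Return sorted (position, probe) pairs for every occurrence, overlaps included."""
    hits = []
    for probe in probes:
        start = dna.find(probe)
        while start != -1:
            hits.append((start, probe))
            start = dna.find(probe, start + 1)
    return sorted(hits)
```

Calling find_hits(sequence, probe_strings(25)) on each sequence would produce the position log; extending each hit to see how far the prime sequence continues is left out of this sketch.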

Coincidence Check

We can check to see if we are merely finding coincidences by testing with randomly-generated data. For example, we may write a script to generate random bases:
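
A minimal Python sketch of such a script might look like the following. The function names are illustrative, and drawing each base independently with equal probability is an assumption; real genomes have skewed base composition, so a shuffled copy of a real sequence is included as a possibly fairer control.

```python
import random


def random_bases(length, seed=None):
    """Generate `length` bases, each drawn uniformly and independently from ACGT."""
    rng = random.Random(seed)  # a fixed seed makes the test data reproducible
    return "".join(rng.choice("ACGT") for _ in range(length))


def shuffled_control(dna, seed=None):
    """Shuffle a real sequence: same base counts, but any ordering is destroyed."""
    rng = random.Random(seed)
    bases = list(dna)
    rng.shuffle(bases)
    return "".join(bases)
```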


And then run our pattern-hunter software on them to see if it returns the same frequency of hits as actual DNA does. We may plot a histogram of prime lengths that looks something like:

  Actual DNA:
  0100: *****  (Number of strands longer than 100 but shorter than 500)
  0500: ****
  1000: **
  2000: ***
  5000: *

  Test DNA (random):
  0100: ***
  0500: ******
  1000: ***
  2000: *
  5000: *

If the histograms look roughly the same, it suggests that prime sequences are turning up at the frequency expected from coincidence and that nothing "interesting" is happening.
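
The bucketing behind histograms like the ones above could be sketched in Python as follows. The bin edges are taken from the example labels; the function names, and the choice to count a length exactly equal to an edge in that edge's bin, are assumptions.

```python
from bisect import bisect_right

BIN_EDGES = [100, 500, 1000, 2000, 5000]


def histogram(lengths, edges=BIN_EDGES):
    """Count lengths per bin; bin i covers edges[i] up to (not including) the next edge."""
    counts = [0] * len(edges)
    for n in lengths:
        i = bisect_right(edges, n) - 1
        if i >= 0:  # lengths below the first edge are ignored
            counts[i] += 1
    return counts


def render(counts, edges=BIN_EDGES):
    """Format the counts as rows of asterisks, one row per bin."""
    return "\n".join(f"{e:04d}: {'*' * c}" for e, c in zip(edges, counts))
```

Running histogram() on the hit lengths from real DNA and from the random test data, then printing both with render(), gives two plots that can be compared side by side. Judging "roughly the same" by eye is a starting point only; a statistical test would be needed for anything more rigorous.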

Mutations Scramble It Too Much?

Re: Mutations would scramble the message too easily

But a damaged message is not necessarily an unusable one, especially in images. Redundancy and simple parity "bits" can be used to improve the message. The given prime detection algorithm would not handle "gaps" very well, but a fancier version could be made that is more gap-friendly.
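
One way a more gap-friendly matcher might work is to accept windows that differ from the probe in a few positions. The Python sketch below is an assumption about what "gap-friendly" could mean, not something specified in this discussion: it tolerates substituted bases only, and would miss insertions or deletions, which shift everything after them. The function name and default tolerance are illustrative.

```python
def fuzzy_hits(dna, probe, max_mismatches=2):
    """Return (position, mismatches) for each window within max_mismatches of probe."""
    hits = []
    for start in range(len(dna) - len(probe) + 1):
        mismatches = 0
        for a, b in zip(dna[start:start + len(probe)], probe):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too damaged; give up on this window
        else:
            # The inner loop finished without a break, so the window qualifies.
            hits.append((start, mismatches))
    return hits
```

For example, fuzzy_hits("AATGTCGAA", "TGTGG", 1) returns [(2, 1)]: the probe is found at position 2 with one substituted base.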

The Forbes article also suggests ways to reduce mutations. Forbes Article: Message in a Bottle.

Too Unlikely?

This is a silly page. Looking for radio signals from extraterrestrial life makes a certain amount of sense. We emit radio signals, so life elsewhere might as well. Looking for signs that extraterrestrial life has "fiddled" with DNA makes no sense at all. There's no evidence of it. There's no evidence that extraterrestrials have ever been to Earth. There's no evidence that DNA has ever been "fiddled" with. There's no reason to think extraterrestrials would fiddle with DNA even if they did visit here. Am I missing the joke? Is there an unspoken assumption that aliens are genetic vandals, tagging planets they visit with microscopic spray cans? -- EH

No, it's not a joke. It's a variant of certain "intelligent design" thinking, which hypothesises that if DNA was engineered by an intelligent agent rather than being purely evolved, then there might be a marker in DNA -- something like a "Made in Magrathea" stamp, or an "Inspected by G.O.D." tag -- indicative of its intelligent origins. It has nothing to do with alien visitation; SETI is referenced as an analogy, not a basis. Given what we know about evolution, natural selection, and DNA, obviously the hypothesis is a massive stretch, but it would be unscientific to categorically reject any hypothesis without testing it no matter how far-fetched the hypothesis may seem. Categorical rejection (or embracement) of hypotheses is a characteristic of faith, not science. -- DV

Intelligent Design isn't science, it's a bad joke. So is this. -- EH

Beware of conflating the "Intelligent Design" movement -- which is of recent origins, appears to disguise a fundamentalist Christian agenda, and is largely promoted by the Discovery Institute -- with the much older notion of "intelligent design", which merely postulates that life on Earth may have been engineered. Its modern forms often derive from a simple question: If we presume humanity will reach the technological capacity to engineer life in the future, is it possible that earthly life was engineered in the past? This has nothing to do with religion, supernatural forces, spirituality or deism, and everything to do with reasonable questions about the origins of life on this planet. The lack of testability of related hypotheses largely confines exploration of this idea to the realms of philosophy and science fiction, but, as is suggested by this page, does not entirely preclude scientific investigation. -- DV

"...but it would be unscientific to categorically reject any hypothesis without testing it..."

Where did you get that idea? Do you imagine that scientists spend their time testing every far-fetched hypothesis they can generate? -- EH

Where did I get that idea? Probably from over a quarter century working for scientists, with scientists, and as a scientist. Many scientists would dearly love to test every far-fetched hypothesis they can generate, but are stopped by a lack of time, funding and/or a reasonable means of testing said hypotheses. Of course, a vast number of far-fetched hypotheses can be almost implicitly tested, and require no funding or time. The hypothesis "There is a Santa Claus," for example, can be trivially rejected by simply observing (among a hundred other things) the average chimney diameter compared to human girth, or the need to obtain faster-than-light flight from reindeer. On the other hand, the "DNA was originally engineered" hypothesis -- however unlikely it may be, and evidence suggests it's very, very unlikely -- is not as trivially rejected. -- DV

This idea can be safely rejected. There's no evidence that DNA was designed. That hypothesis doesn't add predictive power to our models or simplify them.

First you say it has nothing to do with alien visitation, but later you say it has nothing to do with religion. Are you proposing a DNA designer that is neither alien nor supernatural? Was it the dinosaurs? The commies? And what sort of scientists do you work with? -- EH

How do you know there's no evidence that DNA was designed unless you examine DNA for evidence of design? That, I presume, is what the original author of this page had in mind. Indeed, the hypothesis doesn't add predictive power in and of itself, at least in terms of current models of evolution, however any evidence in favour of the hypothesis would provide support for (at least) some of the more unusual life-origins speculations like panspermia and exogenesis.

By saying it has nothing to do with alien visitation, I was thinking in terms of exogenesis being more likely (among numerous profoundly-unlikelies, obviously) than some primordial "Erich von Daniken"-style close encounter. Obviously, the ultimate origin would be alien. Sorry, I wasn't being clear in my original edit.

I've worked with and/or for and/or have in the family several psychologists, a linguist, a sociologist, some anthropologists, a couple of physicists, and a biologist. I currently work mainly with computer scientists. (Of course, whether these can be considered "scientists" or not -- or should be more properly considered mathematicians, engineers, computing specialists or whatever -- is a matter for another debate.) -- DV

"How do you know there's no evidence that DNA was designed unless you examine DNA for evidence of design?"

How do you know there are no teapots orbiting the sun between Earth and Mars unless you examine all of the space involved?

I know there's no evidence that DNA was designed because all of the evidence (and there is an astounding amount of it) indicates it wasn't designed. Everywhere we look we see confirmation of natural selection. There's no reason to add a designer to the model.

I find it very hard to believe you've worked with scientists who thought every hypothesis was worth testing. Do they drill holes in their refrigerators to see if the light stays on when the door is closed? -- EH

How do you know there are no teapots orbiting the sun between Earth and Mars unless you examine all of the space involved?

I don't, and it would be unscientific to declare that there are no teapots orbiting the sun. Obviously, the probability of finding such a teapot is considerably lower than the probability of finding a teapot in my kitchen, but unless I exhaustively examine the space, I cannot with 100% certainty declare the absence of a teapot. If I do exhaustively examine the space, I can then only declare that there wasn't a teapot when I looked. One might have been sneakily introduced after I left. I can't even assert, with 100% certainty, that there is a teapot in my kitchen even though I used it this morning. The probability of finding a teapot in my kitchen is very high, but my memory might be faulty, or I might have hallucinated it, or a burglar might have just stolen my teapot to launch it into orbit between Earth and Mars.

Alternatively, I could use an existing theoretical framework to assert that teapots cannot exist in space. Such a framework is imaginary, of course, but for the sake of argument we'll assume it exists and is supported by ample experimental and logical evidence. Even with this theoretical foundation, combined with empirical results -- say, observation of 100,000 test teapots vanishing in a puff of dust in absolute vacuum -- we cannot assert with 100% certainty that every teapot cannot exist in space. Our theoretical framework might appear to hold true in all our test cases, but fail with a perfectly spherical, infinitesimally small teapot of nearly infinite density, or fail every 1,000,000,000,000,000,000,000 or so teapots -- either of which demonstrates that our spatial teapot theory is incorrect. This is not unknown -- see Newtonian physics vs. relativity, for example.

In other words, science cannot state absolutes. Absolute statements are the sole domain of mathematics and religion. Science can only make assertions -- based on repeated observations and theoretical frameworks -- in terms of probabilities. Obviously, some probabilities are higher than others; in some cases asymptotically approaching 100% true or false. In the case of DNA ID, I don't doubt that we're very close to 0% probability of finding it, but it would be erroneous (and a statement of faith!) to categorically claim that the probability is absolute zero.

As for the refrigerator, I know no scientist who has actually drilled a 'fridge in order to test the hypothesis that the light goes out when the door is closed. I do know a chemistry teacher who probably would, just to illustrate the distinction between science and faith and the effect of observation on a system. I.e., that "the light will go out" is a statement of faith until we drill a hole and observe it, but even then we might have changed the fridge's behaviour merely by drilling the hole. However, our understanding of mechanics and electricity is sufficient to state with near 100% probability that the light will go out, and our doubt about the theory and the value of the outcome are generally insufficient to warrant damaging a perfectly good appliance.[1]

As for whether DNA ID is worth doing or not, it comes down to your assessment of how likely you feel it is. If you feel it's asymptotically close to 0% probability, then feel free not to participate. Apparently, the author of this page feels otherwise, as is his prerogative. In general, a good scientist is willing to test any hypothesis that exceeds his or her personal probability threshold. As with the rest of science, there are no absolute values here; it is down to personal beliefs and personal assessment of probabilities to determine whether a given hypothesis is worth testing or not. -- DV

[1] Of course, the switch could fail, which means on a fridge-by-fridge basis the probability of the light going out might be considerably less than 100%. Therefore, I think a study should be done to find out what percentage of 'fridge lights fail to extinguish, on average, and what this means in terms of worldwide energy consumption and carbon emissions. On that basis, I could probably even get government funding to do it. -- DV

Re: DNA-ID is too remote of a possibility and is thus a waste of scientific resources.

That is more of an economic question than a scientific one.

However, DNA is a fairly good way to preserve messages, and thus is not a bottom-feeder as far as possible ways to leave messages. Radio waves are very transient compared to DNA. Thus, even if fewer ETs use DNA than radio, the results would last longer.

Plus, we don't have to build bulky antennae. Biology work already created the data sets for us. That is why arguments about being a waste of scientific resources don't hold significant weight. It is a relative bargain.

It would make the top ten list of places to search for ET artifacts if there was such a thing. Perhaps even the top 5.

Identifying Patterns

Columbus Argument

Suppose for the sake of argument that when Christopher Columbus landed in the Americas, he saw no sign of intelligent life ("natives"), *except* for a slightly crumbling Aztec calendar tablet (like the one seen at all the tourist stops).

Columbus's time had no decent chemical analysis tools to date and test such artifacts to see if they are artificial. The only tool available was the human eye's pattern recognition ability.

If sifting digital signals that could potentially be from ETI or TI, such as DNA, one may be faced with a similar situation. We may find something that looks intelligently designed. Although we may never be sure it is, such evidence can boost the ID standing of such specimens, and thus it is "testable". 100% assurance is not required to qualify as "testable". It may indeed turn out to be coincidental or a natural process. Some seem to imply that because nature sometimes fools us, such evidence should be summarily dismissed. (Hopefully it would spur further exploration to see if more specimens can be found.)

Some may argue that the calendar has human figures (although highly stylized ones), but even if those were removed or scratched out for whatever reason, the basic argument still stands. Geometric designs may be evidence of intelligence.

In summary, I don't see any material difference between an Aztec calendar-like figure/image found in the hills or the very same image encoded within DNA. Such a pattern may be coincidental in DNA, but it could also be coincidental in Columbus's case.

If a space probe sent a photo like this from the side of a mountain on a distant planet, most would agree it is "evidence of intelligence" even if it is not 100% proof.

The Photograph Argument

Nobody has ever claimed it would be impossible to identify works from all possible ET intelligences. For example, photographs with shadows, shading, and perspective cannot be produced by any known natural force. The laws of perspective and image projection with a lens are universal, independent of ET's culture. Now, it is possible that some ETI don't use photographs. We simply wouldn't be able to detect them perhaps; just like SETI won't detect ETIs not using radio.

Perhaps coincidentally an image may appear that shows proper perspective etc. However, if we find several of them it reduces the chance that it is coincidence. And if we find several showing the same or similar objects from different perspectives, then the probability of coincidence goes down even further. Nature simply cannot perform perspective calculations on its own.

If the photos depict alien buildings, vehicles, bodies, etc., the probability goes down even further.

(Note that a natural pin-hole camera is perhaps possible, such as a hole in a cave, but usually the shadows are blurred away due to the long exposures needed. Still, that would be on a cave wall, not in DNA if it by chance happened.)

Have We Already Looked?

This is the claim that researchers "would have noticed" images, primes, etc. in past DNA studies because they will usually produce statistical outliers.

Counter arguments:

  • No Followup Evidence - There is no evidence that statistical analysis was done on a large scale, and even if so, there is no record anybody bothered to inspect outliers for "ET archeology" such as images, math sequences (pi, primes, etc.), language, etc. (See "Encoding Techniques" above.)

  • Even if they did, that does not make it any more "done" than SETI, which has already searched thousands of stars but has billions to go. There are more individual lifeforms on Earth than stars in the Milky Way.

Thus, your argument has to pass two hurdles to click, not just one.

Please explain why you think that a research scientist would ignore statistical outliers as significant as those related to base-4 prime numbers. It is simply not a credible assertion.

Because they are looking for biological stuff and don't have time for followup on outliers. Followup is the resource bottleneck, not outlier finding. Outliers by themselves give little or no info about what they actually are.

Do you really think that someone with the requisite training and innate curiosity to be doing research at that level would simply ignore anomalous data points?

Yes. This is because followup is time-consuming and they have plenty of on-task things to do.

Isaac Asimov put it best: "The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I've found it!), but 'That's funny...'"

Edison himself and his big lab ignored curious properties of semiconductor-like materials that might have led to the invention of the transistor 50 years earlier because he felt there were other tasks that were more important.

Note that you are still attempting to slip in your assumption that such outliers exist, without proving it. Show us these outliers that have been ignored.

Outliers do exist: natural ones. One won't know the difference unless one FOLLOWS UP.