Molecular Computation

 

Chapter contributed to the book
Non-Standard Computation
Molecular Computation, Cellular Automata, Evolutionary Algorithms, Quantum Computers

by T. Gramß, S. Bornholdt, M. Groß, M. Mitchell, T. Pellizzari
Foreword by M. Schroeder, Introduction by H. Schuster
Wiley-VCH, May 1998
ISBN 3-527-29427-9, Euro 69.00, 226 + xiv pp.


Introduction

Life is computation. Every single living cell reads information from a memory, re-writes it, receives data input (information about the state of its environment), processes the data and acts according to the results of all this computation. Globally, the zillions of cells populating the biosphere certainly perform more computation steps per unit of time than all man-made computers put together.

However, living cells do not use any devices which we (late 1990s computer users) would expect to be necessary for computation. No semiconductor chips, nor quantum dots or mechanical, Babbage-type machinery. Rather than mechanics, quantum mechanics or electronics, cells use chemistry: they compute by using molecules, mostly heteropolymeric macromolecules such as proteins and nucleic acids, but small molecules also play a part in this, as we shall see. Proteins can, for instance, act as signal receptors, logical gates, or signal transducers between different forms of signalling including light, electricity and chemical messenger systems. Nucleic acids mainly act as memory, both for permanent and for short-term applications, although the self-splicing activity of certain RNA molecules means that they can perform information processing as well.

While the genetic role of nucleic acids has been known for several decades, the role of proteins in molecular computation in the cell is only just beginning to be explored. Signal transduction, even if it only deals with yes-or-no signals, is a complex business, which yields only very slowly to massive research efforts. The more complex logical connections and quantitative interdependencies in the cell have eluded research even more completely so far. A brief overview of nature's reservoir of computational molecules will be given in section 2.

While we seem to know more about the computational aspects of DNA than about those of proteins in the cell, the situation is reversed in the application of biomolecules to artificial computing systems. Although light-harvesting proteins such as bacteriorhodopsin (bR) have been used in attempts to create molecular (computer) memories since the early 1970s, the notion of DNA-based computing only came up in 1994. This paradox may partly be explained by the fact that relatively small populations of bR molecules can be manipulated and "read" by laser light, which has been available for several decades, whereas the methods for handling small amounts of DNA have only been developed very recently. Computational applications of both DNA and proteins in artificial systems will be discussed in detail in section 3.

If we define molecular computation as computation using molecules as data storage and/or processing units, the field is not limited to biomolecules. Indeed, the young science called supramolecular chemistry, which studies complex systems composed of synthetic molecules associated by weak interactions, has already produced some devices such as molecular switches, which may prove useful for future developments in computation.

All three kinds of molecular devices (DNA, proteins, and synthetic supramolecules) are now in a critical phase of their development. Within the next few years it will be decided whether or not they will become useful in commercial products, for instance in some future generation of personal computers. In order to better understand the potentials and limitations of these highly promising artificial (bio)molecular systems, we shall first have a glance at that sub-micron scale computer which was developed a few billion years ago and has been tremendously successful ever since -- the cell.

Computational molecules in the cell

In the beginning there was ... information

All life forms we know work on the same principle: cells surrounded by lipid membranes, with DNA as the storage medium of genetic information, RNA for diverse information-transfer purposes, and proteins to carry out mechanical and chemical functions. This uniformity is convenient for students who read biochemistry, because most knowledge can readily be generalized. However, as only the analysis of diversity patterns allows us to trace back the family history of molecular biology, it also means that we know nothing about the time before evolution invented the cell and the DNA/protein machinery.

It seems reasonable to assume that evolution began when some simple molecule "learned" to replicate itself with occasional errors, thus giving evolution the chance to put its principles of variation and selection into practice. Although we don't know what kind of molecule this first replicator was (RNA is the most promising candidate), we know that it contained a piece of information. And all the processes it performed should be familiar to computer scientists: it read data from a memory and copied it. Strictly speaking, this first evolving piece of information need not have been expressed in a molecule. For instance, A. G. Cairns-Smith of the University of Glasgow proposed irregularities in clay minerals as carriers of the proto-genetic information. According to his hypothesis, a "genetic takeover" took place when the mineral genes served as templates for the construction of the first molecular genes.
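The bare logic of this first replicator -- copy with occasional errors, let selection act on the results -- can be caricatured in a few lines of code. Everything here (alphabet, target sequence, population size, error rate) is an arbitrary illustrative choice, not a model of real prebiotic chemistry:

```python
import random

ALPHABET = "ACGU"
TARGET = "GGAUCCGGAUCC"   # stands in for whatever sequence the chemistry favours

def replicate(seq, error_rate=0.01):
    """Copy a sequence; each position mutates with probability error_rate."""
    return "".join(random.choice(ALPHABET) if random.random() < error_rate else base
                   for base in seq)

def fitness(seq):
    """Number of positions matching the favoured sequence."""
    return sum(a == b for a, b in zip(seq, TARGET))

random.seed(0)
population = ["".join(random.choice(ALPHABET) for _ in TARGET) for _ in range(50)]
for generation in range(200):
    # each survivor makes two error-prone copies; the fittest half survives
    population = sorted((replicate(s) for s in population for _ in range(2)),
                        key=fitness, reverse=True)[:50]

print(fitness(population[0]), len(TARGET))  # the best copies approach a full match
```

Variation comes from the copying errors, selection from keeping the best-matching copies; together they are enough to climb toward the target from a random start.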

To this day, copying and spreading genetic information is what life is all about. All the visible parts of living organisms (the phenotype) have evolved because they were able to assist this purpose in one way or another. Considering that life essentially is molecular data-copying machinery, we will be less surprised to learn that evolution has produced some clever computational devices along the way.

RNA -- the universal genius

In 1989, Thomas Cech and Sidney Altman were awarded the Nobel prize in chemistry for their work demonstrating that RNA -- hitherto regarded as an information rather than an action molecule -- can act as an enzyme. It can, for instance, catalyse cleavage and ligation of RNA both within the same molecule and in other molecules. In its more orthodox roles, it can act as genetic material (in viruses), and as a carrier of information between the genetic material and the protein synthesis machinery (messenger RNA). Chances are that the RNA strands accounting for roughly two thirds of the molecular weight of the ribosome, the protein synthesis machinery of the cell, are ribozymes as well. Although prebiotic chemistry in the tradition of the Urey-Miller experiment has not yet demonstrated the making of a self-replicating ancestral RNA, the fact that RNAs can carry information, form and function makes the hypothesis of pre-cellular, RNA-based life (the "RNA world") a plausible one. However, the importance of the universalist molecule faded when the specialists entered the field -- DNA and proteins.

DNA -- the cell's ROM

Evolution in the reaction tube has been demonstrated for RNA molecules, and the hypothesis of the "RNA world" states that all genetic and enzymatic functions of early life forms were carried out by RNA molecules. However, there is a limit to the performance of such a molecular "universal genius". The ability of RNA to fold up into cloverleaf-like structures is certainly beneficial for enzymatic activity, but it can become a major problem when we want to use the nucleic acid as a linear data store: the reading head decoding the information along the line could easily be stopped by an unwanted piece of three-dimensional structure. Furthermore, the relatively low fidelity of RNA copying may have been an asset during the evolution of the first macromolecules, when a lot of trial and error was needed to get life going. However, established life forms need a greater degree of genetic stability than RNA can easily provide.

These are the reasons we believe to have been essential for the evolution of DNA as the genetic material of the cell: a specialized molecule whose exclusive function is the one we call ROM in modern computers. DNA uses almost the same code as RNA -- with the minor exception of uracil being replaced by thymine -- but it normally occurs in pairs of complementary strands, which have a strong tendency to form a very well-defined linear structure, the double helix. Unlike the sporadically knotted and looped RNA, the double helix protects the whole length of the molecule from degradation, and can be locally unwound to be read. The chemistry of DNA also provides the possibility of proof-reading after replication, which brings the error rate of replication down by more than one order of magnitude.
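The value of that extra fidelity is easy to see with a back-of-envelope calculation. The per-base error rates and genome size below are illustrative round numbers, not measured values; the point is only that one order of magnitude in per-base accuracy separates a mostly-mutated genome copy from a mostly-perfect one:

```python
# Probability that a whole genome is copied without a single error,
# for two hypothetical per-base error rates one order of magnitude apart.

def error_free_copy_probability(error_rate, genome_length):
    return (1 - error_rate) ** genome_length

GENOME = 1_000_000  # bases; an arbitrary microbe-sized genome

without_proofreading = error_free_copy_probability(1e-6, GENOME)
with_proofreading = error_free_copy_probability(1e-7, GENOME)
print(round(without_proofreading, 3), round(with_proofreading, 3))  # 0.368 0.905
```

At the higher error rate, nearly two thirds of all genome copies carry at least one mutation; the ten-fold improvement brings that down to about one copy in ten.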

During the 1980s, molecular biologists developed highly specific means to process DNA molecules in a variety of ways, including site-specific mutagenesis, splicing, and amplification starting from only a few molecules. As we will see later, this repertoire makes the cell's data store an ideal building material for innovative applications ranging from molecular wires and scaffolds to molecular computation.

Much as DNA replaced RNA in the role of life's ROM, proteins took over the structural and enzymatic functions. One of the advantages of protein enzymes over ribozymes is that the former can be regulated by other enzymes, in a way which, again, is strongly reminiscent of computing.

Reversible regulation of enzymes by phosphorylation

DNA encodes -- among other things -- protein genes, i.e. the instructions for making the several thousand different kinds of proteins which each cell needs for its day-to-day business. Many (most probably most) of these are involved in some kind of regulation. Transcription factors indicate which piece of DNA is to be read when -- this mechanism is in charge of the whole of embryonic development, and its malfunction can cause malignant cell proliferation. Protein-cutting enzymes (proteases) can activate or inactivate other enzymes by clipping little bits off. As we shall see in the next subchapter, this kind of regulation plays an important role in many physiological processes, including blood clotting. Protein-phosphorylating enzymes (kinases) can activate other enzymes by attaching phosphate groups to them. Among other cellular processes, the synthesis and degradation of the "cellular food store", the long-chain carbohydrate glycogen, is regulated in this way.

The whole metabolism of the cell is regulated by a vast network of such interactions between regulatory proteins which, as soon as their input signal reaches a threshold, or as soon as a set of combined input requirements is fulfilled, act on other molecules to pass the information on. As Dennis Bray pointed out in a review in Nature in 1995, regulatory proteins can carry out a variety of computational tasks. Highly cooperative systems with a sharp transition between rest and action can act as switches. Proteins responding to several different substrates in an additive or an alternative way can serve as logical gates with AND or OR function. Cascades of enzyme action, such as the notorious blood-clotting cascade, can enhance an input signal by more than five orders of magnitude. And proteins with several binding sites for regulatory ligands, and complex long-distance ("allosteric") interactions between them, can respond to different inputs in a very complex way best described in terms of fuzzy logic.
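These roles can be sketched abstractly. The Hill-type response below is the standard caricature of cooperative binding; all constants are invented for illustration and do not describe any particular protein:

```python
# A cooperative protein as a switch, and two-input proteins as AND/OR gates.

def hill(ligand, k=1.0, n=8):
    """Fraction of protein in the active state at a given ligand level.
    A large Hill coefficient n gives the sharp rest/action transition
    of a highly cooperative switch."""
    return ligand**n / (k**n + ligand**n)

def and_gate(a, b, threshold=0.5):
    # alternative binding: the protein acts only if BOTH substrates are present
    return hill(a) > threshold and hill(b) > threshold

def or_gate(a, b, threshold=0.5):
    # additive response: either substrate alone can push the protein over
    return hill(a + b) > threshold

print(round(hill(0.5), 3), round(hill(2.0), 3))  # 0.004 0.996 -- a sharp switch
print(and_gate(2.0, 0.2), or_gate(2.0, 0.2))     # False True
```

Halving or doubling the ligand concentration around the midpoint flips the switch almost completely, which is what makes such proteins usable as digital elements.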

Phosphorylation is not only one of the most important regulatory mechanisms of the cell, but possibly also the one best suited for computation. The major advantage of the phosphorylation reaction is that it is fully reversible and requires only little space, so that an enzyme can have several sites which can be phosphorylated and dephosphorylated independently. For instance, the enzyme glycogen synthase can be phosphorylated at many different sites by six different protein kinases (PK) and dephosphorylated by several different protein phosphatases (PP). This not only implies that signals influencing the enzyme can come from a variety of different pathways, but also that the enzyme can compute the signal, in such a way that, for instance, a weighted sum of different signals must surpass a given threshold value in order to switch the enzyme's activity.
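As a toy illustration of such multi-site computation (the site names, weights and threshold below are invented, and real glycogen synthase regulation is far more intricate), one can treat the phosphorylation state as a weighted sum compared against a threshold. Here phosphorylation inhibits the enzyme, as it does for glycogen synthase:

```python
# Several independently phosphorylatable sites implementing a
# weighted-sum threshold; all numbers are hypothetical.

def enzyme_active(phospho_sites, weights, threshold):
    """The enzyme stays active while the weighted sum of occupied
    phosphorylation sites remains below the threshold."""
    total = sum(weights[site] for site, occupied in phospho_sites.items() if occupied)
    return total < threshold

weights = {"site_2": 0.5, "site_3a": 0.3, "site_3b": 0.3, "site_5": 0.1}

# one kinase acting alone is not enough to switch the enzyme off...
print(enzyme_active({"site_2": True, "site_3a": False, "site_3b": False, "site_5": False},
                    weights, threshold=0.8))   # True: still active
# ...but two pathways converging on the enzyme cross the threshold
print(enzyme_active({"site_2": True, "site_3a": True, "site_3b": True, "site_5": False},
                    weights, threshold=0.8))   # False: switched off
```

The weighted sum is exactly the input function of a formal neuron, which is one reason the neural-network analogy discussed later fits regulatory proteins so well.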

Excursus 1: Proteins -- the basic facts

Proteins are the cell's "action" molecules. They come in an enormous variety of shapes and sizes and fulfill an equally wide range of tasks. All this is achieved by combining just 20 building blocks, the amino acids. During protein biosynthesis on the ribosome, these are combined into a chain molecule by a reaction merging the carboxylic acid group (--COOH) of one amino acid with the amino group (--NH2) of the next, which leads to what is known as a peptide bond (--CO--NH--). The chain molecule then folds up into a complex but well-defined three-dimensional structure stabilized by a large number of weak interactions between different parts of the chain. Only this correctly folded structure, known as "the native state", is functionally active.

Proteins can be found as water-soluble, more or less round shaped particles in the cell's (or, indeed, extracellular) aqueous fluids (globular proteins), embedded into lipid membranes (membrane proteins, such as signal receptors), or as structural elements in fibres (keratin in hair, and various others in all biogenic mineral structures such as bones, teeth, mollusc shells, etc.).

Dividing proteins by a different set of criteria, one can distinguish between proteins which catalyse a chemical reaction (enzymes, such as the phosphatases, proteases and kinases mentioned in the main text), those which transport a small molecule (such as the oxygen carriers hemoglobin and myoglobin), those which convey information by binding or releasing other macromolecules, and, indeed, many more groups.

Depending on their function, proteins have vastly different lifetimes. While eye lens proteins have to remain intact for the whole life span of the organism (lest the lens become turbid with aggregated protein), proteins involved in signalling, such as the small peptide hormone insulin, are degraded after they have delivered their message; the average lifetime of an insulin molecule is only about five minutes. Many equilibria in the cell are controlled by varying the rate of synthesis of a specific protein against the background of its constant rate of degradation.

The irreversible switch: proteolysis

While metabolism often needs both directions of a reaction (for instance, build up glycogen if there is food aplenty, degrade it when hardship comes), many reactions only need an on- or activation switch. Switching off can be left to time (given that the lifespans of most biomolecules in the cell are limited anyway) or to a spontaneous loss of activity.

One of the most commonly found irreversible on-switches is the proteolytic cleavage of an inactive precursor protein. And one of the best-studied examples, where half a dozen switching processes form a reaction cascade, is found in the phenomenon of blood clotting (Fig. ... e). The last two steps of this extremely important cascade are among the best-studied cases of activation by proteolysis. The penultimate step is the conversion of prothrombin to thrombin. In the last step, the activated enzyme thrombin clips the protein fibrinogen in such a way that it can aggregate and form the structures which ultimately make up the blood clot. One obvious reason for this being organized in such a complicated way is that the process can be disastrous if it is triggered in the wrong place or at the wrong time; the enzyme cascade acts as a series of safety locks, so to speak. (The other side of this coin is that any spontaneous mutation inactivating any of the more than ten proteins involved in the cascade will cause serious problems, which is why haemophilia is a relatively common genetic disease, originating from a variety of molecular defects.)

However, there is also a computational aspect to this complex regulation system, namely the amplification of a signal. Each of the enzymes in the cascade can activate hundreds of molecules representing the next step, so having, for instance, six steps can easily provide a signal amplification of 12 orders of magnitude. And, just to make things even more complicated, I have to admit that there are in fact two cascades leading to the activation of thrombin, which are connected by the biochemical equivalent of the logical AND operation.
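The arithmetic of such a cascade is simple enough to state in code. Taking a hypothetical round gain of one hundred activated molecules per step, six steps multiply out to the twelve orders of magnitude mentioned above:

```python
# Amplification through an enzyme cascade: each of `steps` stages
# activates `gain_per_step` molecules of the next stage.

def cascade_gain(steps, gain_per_step):
    total = 1
    for _ in range(steps):
        total *= gain_per_step
    return total

# six steps of ~100-fold each give 10^12-fold amplification overall
print(cascade_gain(6, 100))  # 1000000000000
```

The exponential growth also explains why the cascade needs its safety locks: a single stray activation at the top would be amplified just as powerfully as a legitimate one.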

One cycle for all signals: G-proteins

Next to phosphorylation, the binding of a nucleoside triphosphate is arguably the most important reversible "switching" process in the cell. While the triphosphate of adenosine, ATP, which fuels muscle contraction and a multitude of other energy-consuming processes, is generally seen as the energy currency of the cell (although it also carries the information that energy is available), the analogous guanosine triphosphate (GTP) must clearly be regarded as a switching device. GTP binding is crucial in the signal transduction performed by the G-proteins -- the conversion of extracellular hormonal or sensory signals into intracellular, chemically encoded messages. The same fundamental functional cycle of G-proteins, effectors and GAPs can be used to describe a wide variety of phenomena in the cell, including signal transduction, protein trafficking, protein biosynthesis and rearrangements of the cytoskeleton.

For instance, if the hormone adrenalin arrives at a liver cell and binds to the adrenalin-specific receptor embedded in its membrane, this triggers a whole cascade of reactions involving a G-protein and an effector catalysing the synthesis of cyclic adenosine monophosphate (cAMP), and eventually results in the export of glucose from the cell. Another GTP-binding protein, Ras, is important for the regulation of cell growth and differentiation. (Mutations of this protein are notorious for causing a variety of cancers.)

When the first crystal structures of complete G-proteins came out in December 1995 and January 1996, the complexity they revealed prompted Nature to use the headline "The G protein nanomachine". Indeed, the structure, which includes a seven-bladed propeller, invites interpretation in mechanical terms. On the basis of these well-established structural data, the exact molecular function of this most important relay of all cells will certainly be worked out within a relatively short time.

Only a few weeks later, another crystal structure revealed the details of how one of the GTPases involved in protein biosynthesis is recycled into the active state. While the elongation factor EF-G simply gets recharged by binding another molecule of GTP after it has hydrolysed one during its activity, the other factor promoting peptide chain elongation, EF-Tu, needs a specific protein, EF-Ts, to become recharged. The crystal structure of an EF-Tu/EF-Ts complex published in early 1996 will allow a more detailed understanding of how used G-proteins can be switched back to the active state.

More generally, the flood of high-resolution structures of complex protein systems solved in the years 1993--95 should give us hope that within a few years from now we will be able to describe the cell's "communication technology" not only in general concepts, but also in molecular and mechanistic detail.

Proteins as RAM

If we could shock-freeze a single cell and then analyze the state of all of its macromolecules, the phosphorylation, GTP-binding etc. state of the proteins would give us an exact description of the cell's environment, its metabolic requirements, any stress it is exposed to etc. In other words, much as we can call DNA the cell's ROM, we can think of the regulatory proteins as its RAM.

The latter function, however, is much less obvious, as in real life we cannot read the RAM of a cell in the way we sequence its DNA. It is the very nature of a transient, quickly changing working memory that makes this cellular function difficult to study.

Protein-based neural networks

Taken together, the above properties of the protein repertoire of a cell strongly suggest that it behaves as a nanoscale neural network. Indeed, the relatively simple reaction characteristics of proteins mean that mathematical models of neural networks can describe a protein network more closely than one made of real neurons.

D. Bray has demonstrated that a system consisting of just two transmembrane receptors, sharing a ligand molecule on one side of the membrane and a phosphorylatable target protein on the other, can in computer simulations be trained like a neural network. The neural network analogy can also be used to explain why regulatory proteins often cluster to form functional units.
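A caricature of such a trainable protein system might look like the following. This is a plain perceptron with invented numbers, not Bray's actual published simulation: two receptor occupancies feed a target protein whose phosphorylation must learn an AND-like response.

```python
import random

def phosphorylated(inputs, weights, bias):
    """Target protein is phosphorylated if the weighted receptor
    occupancies exceed the (implicit zero) threshold."""
    return sum(i * w for i, w in zip(inputs, weights)) + bias > 0

def train(samples, epochs=50, lr=0.1):
    """Classic perceptron learning rule, standing in for the tuning of
    reaction rates in the receptor/target system."""
    random.seed(1)
    weights, bias = [random.random(), random.random()], 0.0
    for _ in range(epochs):
        for inputs, wanted in samples:
            err = int(wanted) - int(phosphorylated(inputs, weights, bias))
            weights = [w + lr * err * i for w, i in zip(weights, inputs)]
            bias += lr * err
    return weights, bias

# train the "cell" to phosphorylate the target only when BOTH
# receptors are occupied -- an AND response
samples = [((0, 0), False), ((0, 1), False), ((1, 0), False), ((1, 1), True)]
weights, bias = train(samples)
print([phosphorylated(i, weights, bias) for i, _ in samples])
# [False, False, False, True]
```

The trainable quantities here are reaction-rate analogues; in the biochemical system they would correspond to concentrations and binding constants rather than explicit numerical weights.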

Light to electrical information (vision)

If one wishes to study the information transfer from the environment into the living cell, light is the stimulus which is most convenient to generate, control and detect. Furthermore, light detection is immensely relevant for both biology (vision, photosynthesis, phototaxis) and technology (optoelectronics, visual media, etc.).

Although vision in the sense of mental images of the outside world has to be discussed on a cellular rather than a molecular scale, the underlying fundamental processes involved in the detection of individual photons are clearly defined in molecular terms and are in a way related to the issues of cellular computation discussed above.

The major type of photosensitive cell in our eyes, the extremely sensitive but colour-blind rods, can be activated by a single photon. Their primary molecular light receptor is a protein pigment complex known as rhodopsin, containing retinal as the chromophore and opsin as the protein component. Light needs only picoseconds to induce the isomerisation of 11-cis-retinal to the all-trans version of the pigment. The conformational strain on this isomer, which no longer fits the opsin binding site, promotes a series of chemical reactions which eventually lead to the cleavage of the retinal-opsin connection. All-trans retinal diffuses away from the receptor and is restored to the 11-cis form in the dark reaction of the rod vision process, which remarkably recruits its energy from the cleavage of membrane lipids.

Meanwhile, the receptor has passed the information contained in the original photon on to a G-protein called transducin, by catalysing the exchange of transducin-bound guanosine diphosphate (GDP) for GTP. Transducin functions in a very similar way to the G-proteins involved in hormone reactions and obeys the generalized reaction cycle shown in Fig. ... . The transducin-GTP complex is believed to activate an enzyme which extremely rapidly converts cyclic GMP into GMP, thus giving out the order that all the sodium channels of the cell have to be closed, which in turn leads to the polarization of the cell membrane and to a synaptic nerve signal. Again, as in the case of the hormone reaction, the cascade not only passes the signal on but amplifies it, by up to 100-fold in each step, so that a single photon causes the rapid hydrolysis of 10^5 molecules of cyclic GMP -- enough to guarantee the closure of all sodium channels of the outer segment of the rod cell.

In extremely halophilic (salt-loving) archaebacteria, such as can be found in the Dead Sea and other extremely salty environments, the protein homologous to the rhodopsin involved in vertebrate vision, known as bacteriorhodopsin, serves an entirely different purpose: photosynthesis. Membrane fragments containing up to one molecule of bacteriorhodopsin per 10 lipid molecules can easily be generated and are among the most remarkable biological materials known to date, as will be discussed below. In addition, halobacteria have further rhodopsin analogues, the sensory rhodopsins, which are involved in the sensory reaction to light and in phototaxis.

Of course, the bulk of the global photosynthesis turnover goes through the chloroplasts of green plants, whose photosynthetic reaction centres are entirely unrelated to the rhodopsins.

Electricity: from cells to brains

Communication and computation within cells relies, as we have seen, largely on chemical reactions between biomolecules and on the diffusive motion of these molecules. However, as multicellular organisms evolved, and the distribution of tasks between separate parts of the organism required rapid communication over much longer distances, a new kind of carrier had to be used.

Electricity came in handy, as cells maintain an electrical potential of ca. -70 mV with respect to their environment anyway, which they actively establish by means of membrane proteins that continuously pump sodium ions out. If a cell membrane became very leaky, the spontaneous flow of ions would rapidly neutralize this resting membrane potential. And this is basically how excitable cells such as neurons work.

When these cells are excited by a signal from outside, sodium channels open wide and allow positively charged sodium ions to flow into the cell, thus shifting the cell's potential from -70 to +40 mV within a few milliseconds. After a few more milliseconds, the gates are closed again and the pumps have restored the initial value of -70 mV. This short voltage peak, the action potential, travels along nerve cell axons like an electrical signal in a piece of wire. Interestingly, at the interface with the next cell, the synapse, the signal is handed over in chemical form again, by neurotransmitters.
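A toy "integrate-and-fire" model captures this sequence of events, using the membrane potentials quoted above; the leak constant, threshold and stimulus values are invented round numbers, not measured data:

```python
# Toy integrate-and-fire neuron: input current charges the membrane,
# a leak pulls it back toward rest, and crossing the threshold fires
# a spike to +40 mV followed by a reset to -70 mV.

REST, SPIKE_PEAK, THRESHOLD = -70.0, 40.0, -55.0

def simulate(stimulus, leak=0.2):
    """Return the membrane potential trace for a list of input currents."""
    v, trace = REST, []
    for current in stimulus:
        v += current - leak * (v - REST)   # charge minus leak back to rest
        if v > THRESHOLD:                  # sodium channels open: spike...
            trace.append(SPIKE_PEAK)
            v = REST                       # ...and the pumps restore rest
        else:
            trace.append(v)
    return trace

trace = simulate([4] * 8)     # a sustained stimulus eventually fires a spike
print(max(trace), trace[-1])  # 40.0 -66.0
```

The all-or-nothing shape of the trace is the point: sub-threshold inputs decay away, while anything that crosses the threshold produces the same full-sized spike.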

Hence the general strategy employed by multicellular organisms for long-range communication seems to be a hybrid of electrical and molecular information transport. The speed of this process is less than 100 m/s, which is orders of magnitude slower than signal propagation in semiconductor electronics.

In higher organisms, neurons form brains, which can perform many tasks beyond the capability of today's computers, such as rapid-access associative memory. Just how brains compute and remember is nowadays, in the "decade of the brain", a matter of much research and even more debate. However, one hypothesis which was fashionable a few decades ago has been ruled out: the brain does not store information in the sequence of molecules. Although it uses molecular interactions for computation, of course, it does not use biomolecules to encode or process data sets. Hence, a brain working on the basis of molecular computation remains to be invented, although nature provides quite a few interesting components.

 
