
Alphabetic Life and its Inevitability

April 23, 2014


In the very first post of this series, reference was made to ‘molecular alphabets’, and in a post of last year (8th September) it was briefly proposed that molecular alphabets are so fundamental to life that a ‘Law of Alphabets’ might even be entertained. This theme is further developed in this current post.

 How to Make A Biosystem

The study of natural biology provides us with many lessons concerning the essential properties of life and living systems. A recurring and inescapable theme is complexity, observed across all levels from the molecular to cellular scales, and thence to whole multicellular organisms. While the latter usually have many layers of additional complexity relative to single-celled organisms, even a typical ‘simple’ free-living bacterial cell possesses breath-takingly complex molecular operations which enable its existence.

Why such complexity? In the first case, it is useful to think of the requirements for living systems, as we observe them. While comprehensive definitions of life are surprisingly difficult, the essence of biology is often seen as informational transfer, where the instructions for building an organism (encoded in nucleic acids) are replicated successively through continuing generations. (A crucial accompaniment to this is the ability of living organisms to evolve through Darwinian evolution, since no replicative system can ever be 100% error-free, and reproductive variation provides the raw material for natural selection). But while the replication of genomes may be the key transaction, it is only enabled by a wide array of accompanying functions provided by (largely) proteins. The synthesis of proteins and complex cellular structures requires energy and precursor molecules, so systems for acquiring and transducing these into usable forms must also be present.

Molecular Size and Life

The primal ‘motive’ of biological entities to replicate themselves requires a host of ancillary systems for creating the necessary building blocks and structuring them in the correct manner. All this requires energy, the acquisition and deployment of which in turn is another fundamental life need. Processes for molecular transport and recognition of environmental nutrients and other factors are also essential. And since organisms never exist in isolation, systems for coping with competitors and parasites are not merely an ‘optional extra’. Although all of these activities are associated with functional requirements necessitating certain distinct catalytic tasks, a major driver of complexity is the fundamental need for system regulation. In many cases, the orderly application of a series of catalyses is essential for obtaining an appropriate biological function. But in general, much regulatory need comes down to the question of efficiency.

This has been recognized from the earliest definition of regulatory systems in molecular biology. The lac operon of E. coli regulates the production of enzymes (principally β-galactosidase) involved with the metabolism of the sugar lactose. If no lactose is available in the environment, it is clearly both functionally superfluous and energetically wasteful to synthesize the lactose-processing enzymes. Thus, a regulatory system that responds to the presence of lactose and switches on the relevant enzyme production would be beneficial, and this indeed is what the natural lac operon delivers. In general, an organism that possesses any regulatory system of this type (or various other types of metabolic regulators) will gain a distinct competitive edge over organisms lacking them. And hence this kind of selection drives the acquisition of complex regulatory systems.

So, if complexity is a given, how can this be obtained in molecular terms? How can the molecular requirements for both high catalytic diversity and intricate system regulation be satisfied? An inherent issue in this respect is molecular size. Biological enzymes are protein molecules that can range in molecular weight from around 10 kilodaltons (kD) to well over an order of magnitude greater. If we look beyond catalysts to include all functional molecules encountered in complex multicellular organisms, we find the huge protein titin, an essential component of muscle. Titin is composed of a staggering 26,920 amino acid residues, clocking up a molecular weight of around 3 megadaltons.

But in terms of catalysis itself, why is size an issue? This is a particularly interesting question in the light of relatively recent findings that small organic biomolecules can be effective in certain catalytic roles. Some of these are amino acids (proline in particular), and have hence been dubbed ‘aminozymes’. While certain catalytic processes in living cells may be mediated by such factors to a greater degree than previously realized, small molecule catalysis alone cannot accommodate the functional demands of complex biosystems.

This assertion is based on several factors, including: (1) Certain enzymatic tasks require stabilization of short-lived transitional states of substrate molecules, accomplished by a binding pocket in a large molecule, but difficult to achieve otherwise; and (2) Some necessary biological reactions require catalytic juxtaposition of participating substrate molecules across relatively large molecular distances, a function that small molecules are unlikely to be capable of satisfying. Even apart from these dictates, the necessity of efficient regulation, as considered above, also limits possible roles for small molecules. A fundamental mechanism for biological control at the molecular level is the phenomenon of allostery, where binding of a regulatory molecule to a site in a larger effector molecule causes a conformational change, affecting the function of the effector molecule at a second distant active site. By definition, to be amenable to allosteric regulation, an effector molecule must be sufficiently large to encompass both an effector site for its primary function (catalytic or otherwise) and a second site for regulatory binding.

Since better regulation equates with improved biosystem efficiency and biological fitness, the evolution of large effector molecules should accordingly be a logical advantage:



Fig. 1: Competitive Advantages and complexity


 Small Conundrums

Even if we accept that molecular complexity and associated molecular size is an inexorable requirement of complex life, why should such biosystems use a limited number of building blocks (molecular alphabets) to make large effector molecules? Why not, in the manner of an inspired uber-organic chemist, build large unique effectors from a wide variety of small-molecule precursor components?

Let’s look at this in the following way. Construction of a unique complex molecule from simpler precursors will necessitate not just one, but a whole series of distinct catalytic tasks, usually requiring in turn distinct catalysts applied in a coordinated series of steps. But, as noted above, mediation of most biological catalytic events requires complex molecules themselves. So each catalyst in turn requires catalysts for its own synthesis. And these catalysts in turn need to be synthesized……all leading suspiciously towards an infinite regress of complexity. This situation is depicted in Fig. 2:


Fig. 2. Schematic depiction of synthesis of a complex uniquely-structured (non-alphabetic) molecule. Note in each case that the curved arrows denote the action of catalysts, where (by definition) the catalytic agent promotes a reaction and may be transiently modified, but emerges at the end of the reaction cycle in its original state. A: A series of intermediate compounds are synthesized from various simpler substrates (S1, S2, S3 …), each by means of distinct catalysts (1, 2, 3….). Each intermediate compound must be sequentially incorporated into production of the final product (catalyst 6 …… catalyst i). Yet since each catalytic task demands complex mediators, each catalyst must in turn be synthesized, as depicted in B. Reiteration of this for each of (catalyst a …… catalyst j) leads to an indefinite regress.


These relatively simple considerations might suggest that attempts to make large ‘non-alphabetic’ molecules as functional biological effectors will inevitably suffer from severe limitations. Are things really as straightforward as this?

Autocatalytic Sets and Loops

There is a potential escape route from a linear infinite synthetic regression, and that is in the form of a loop, where the ends of the pathway join up. Consider a scenario where a synthetic chain closes on itself through a synthetic linkage between the first and last members. This is depicted in Fig. 3A below, where product A gives rise to B, B to C, C to D, and finally D back to A. Here the catalytic agents are shown as external factors, and as a result this does not really gain anything on the linear schemes of the above Fig. 2, since by what means are the catalysts themselves made? But what if the members of this loop are endowed with special properties of self-replicative catalysis? In other words, if molecule B acts on A to form B itself, and C on B to form C, and so on. This arrangement is depicted in Fig. 3B.



Fig. 3. Hypothetical molecular synthetic loops, mediated by external catalysts (A), self-replicating molecules (B), or a self-contained autocatalytic set (C). In cases (B) and (C), each member can act as both a substrate and a catalyst. In case (B), each member can directly synthesize a copy of itself through action on one of the other members of the set, whereas in case (C) the replication of each member is indirect, occurring through their coupling as an autocatalytic unit. Note that in case (B) each catalysis creates a new copy of the catalysts themselves, as well as preserving the original catalysts. For example, for molecule D acting on molecule C, one could write: C + [D-catalyst] → D + [D-catalyst] = 2D. In case (C) it is also notable that the entire cycle can be initiated by 4 of the 6 possible pairs of participants taken from A, B, C and D. In other words, the (C) cycle can be initiated by starting only with pairs AD, AB, CD, and CB, but not with the pairs AC and BD. As an example for a starting population of A and D molecules: D acts on A to produce B; remaining A can act on B to produce C; remaining B can act on C to produce D; remaining C acts on D to produce A, thus completing the cycle. If the reaction rates for each were comparable, a steady-state situation would result, tending to equalize the concentrations of each participant.
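The pair-counting claim for case (C) can be checked with a small sketch. The rule table below is read directly from the worked example in the caption (D acts on A to give B, and so on around the loop); the function names are hypothetical, and the model is purely qualitative. It assumes that a substrate is never fully consumed, so it tracks only which species are present, not their concentrations or reaction rates.

```python
from itertools import combinations

# Case (C) of Fig. 3, as described in the caption:
# (catalyst, substrate) -> product
RULES = {
    ("D", "A"): "B",
    ("A", "B"): "C",
    ("B", "C"): "D",
    ("C", "D"): "A",
}

def reachable(start):
    """Return every species that can eventually be present from a starting set.

    Assumption: some substrate always remains after a reaction, so only
    presence or absence is tracked.
    """
    present = set(start)
    changed = True
    while changed:
        changed = False
        for (catalyst, substrate), product in RULES.items():
            if catalyst in present and substrate in present and product not in present:
                present.add(product)
                changed = True
    return present

def initiating_pairs(members="ABCD"):
    """List the starting pairs from which the whole cycle can be populated."""
    return sorted(
        "".join(pair)
        for pair in combinations(members, 2)
        if reachable(pair) == set(members)
    )

if __name__ == "__main__":
    print(initiating_pairs())
```

Under these assumptions the four neighbouring pairs populate the full set, while AC and BD stall immediately, because neither pair contains a catalyst together with its own substrate.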



But the scenarios of Fig. 3 might not seem to approach the problem of how to attain increasing molecular size and complexity needed for intricate biosystems in a non-alphabetic manner. This can readily be added if we assume a steady increase in complexity / size around a loop cycle, with a final re-production of an original component (Fig. 4). These effects could be described in the terms used for biological metabolism: the first steps in the cycle are anabolic (building up of complexity), while the final step is catabolic (breaking down complex molecules into simpler forms).

Fig. 4. A hypothetical autocatalytic loop, where black stars denote rising molecular size and complexity. For simplicity, here each component is rendered in blue when acting as a product or substrate, and in red when acting as a catalyst. Here the additional co-substrates and/or cofactors (assumed here to be simple organics that are environmentally available) are also depicted (S1 – S3) for molecules D, A, and B acting as catalysts. Since C cleaves off an ‘A moiety’ from molecule D, no additional substrate is depicted in this case.


Of course, the schemes of Figs. 3 & 4 are deliberately portrayed in a simple manner for clarity; in principle the loops could be far larger and (as would seem likely) also encompass complex cross-interactions between members of each. Both anabolic and catabolic stages (Fig. 4) could be extended into many individual steps. The overall theme is the self-sustaining propagation of the set as a whole.

So, could autocatalysis allow the production of large, complex and non-alphabetic biomolecules, acting in turn within entire biosystems constituted in such a manner? The hypothetical loop constructs as above are easy to design, but the central question is whether the principles are viable in the real world of chemistry.

In order to address this question, an important point to note is that not just a few such complex syntheses would need to be established for a non-alphabetic biosystem, but very many. And each case would need to serve complex and mutually interacting functional requirements. It is accordingly hard to see how the special demands of self-sustaining autocatalytic loops could be chemically realized on this kind of scale, even if a few specific cases were feasible. The ‘chemical reality’ problem with theoretical autocatalytic systems has been elegantly discussed by the late Leslie Orgel.

Even this consideration does not delve into the heart of the matter, for we must consider how life on Earth – and indeed life anywhere – may attain increasing complexity. This, of course, involves Darwinian evolution via natural selection, which operates on genetic replicators. It is not clear how an autocatalytic set could produce stable variants that could be selected for replicative fitness. Models for replication of such sets as ‘compositional genomes’ have been put forward, but in turn refuted by others. But in any case, there is an elegant natural solution to the question of how to attain increasing complexity, which is inherently compatible with evolvability.

 The Alphabetic Solution

And here we return to the theme of molecular alphabets, generally defined as specific sets of monomeric building blocks from which indefinite numbers of functional macromolecules may be derived, through covalently joined linear strings of monomers (concatemers). But how does the deployment of alphabets accomplish what non-alphabetic molecular systems cannot?

Here we can refer back to the above-noted issue of building complex molecules, and the problem of complexity regression for the necessary catalysts, and building the catalysts themselves. The special feature of alphabets is that, with a suitable suite of monomers, a vast range of functional molecules can be produced by concatenation of specific sequences of alphabetic members. We can be totally confident that this is so, given the lessons of both the protein and nucleic acid alphabets. The versatility of proteins for both catalysis and many other biological functions has long been appreciated, but since 1982 the ability of certain folded RNA single strands to perform many catalytic tasks has also become well-known. And specific folded DNA molecules can likewise perform varied catalyses, even though such effects have not been found in natural circumstances.

So, nature teaches us that functional molecules derived from molecular alphabets can perform essentially all of the tasks required to operate and regulate highly complex biosystems. But how does this stand with synthetic demands, seen to be a crucial problem with complex natural non-alphabetic structures? Two critical issues are pertinent here. Firstly, an alphabetic concatemer can be generated by simply applying the same catalytic ligation process successively, provided the correct sequence of monomers is attained. This is fundamentally unlike a complex non-alphabetic molecule, where sites of chemical modification may vary and thus require quite different catalytic agents. The other major issue addresses the question of how correct sequences of alphabetic concatemers are generated. In this case the elegant solution is template-based copying, enabled through molecular complementarities. This, of course, is the basis of all nucleic acid replication, through Watson-Crick base pairing. Specific RNA molecules can thus act both as replicative templates and folded functional molecules. The power of nucleic acid templating was taken a further evolutionary step through the innovation of adaptors (transfer RNAs), which enabled the nucleic acid-based encoding of the very distinct (and more functionally versatile) protein molecular alphabet.
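The two points made here (one repeated ligation step, with sequence supplied by a complementary template) can be illustrated with a toy sketch. The function names are hypothetical and the model is deliberately abstract: a strand is just a string written 5' to 3', the pairing table is the standard Watson-Crick set for RNA, and nothing about real polymerase chemistry is represented.

```python
# Watson-Crick pairing rules for RNA bases.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def ligate(chain, monomer):
    """The single catalytic task: add one monomer to the growing chain.

    The same operation is used at every position, whatever the monomer.
    """
    return chain + monomer

def copy_template(template):
    """Build the complementary strand by repeating one ligation step.

    The template is read in reverse because paired strands are antiparallel,
    so the product is also written 5' to 3'.
    """
    product = ""
    for base in reversed(template):
        product = ligate(product, PAIR[base])
    return product

if __name__ == "__main__":
    original = "AUGGC"
    complement = copy_template(original)
    print(complement)                 # complementary strand
    print(copy_template(complement))  # copying the copy restores the original
```

In this sketch a second round of copying regenerates the starting sequence, and lengthening the template adds more repetitions of `ligate` rather than any new kind of step.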

But in order to achieve these molecular feats, a certain number of underlying catalytic tasks clearly must be satisfied in the first place. These are required to create the monomeric building blocks themselves, and all the ‘infrastructure’ needed for template-directed polymerization of specific sequences of new alphabetic concatenates. But once this background requirement is in place, in principle products of any length can be created without the need for new types of catalytic tasks to be introduced. In contrast, for non-alphabetic complex syntheses, the number of tasks required will tend to rise as molecular size increases. In a large series of synthetic steps towards building a very large and complex non-alphabetic molecule, some of the required chemical catalyses may be of the same type (for example, two discrete steps both requiring a transesterification event). But even if so, the specific sites of addition must be controlled in a productive (non-templated) manner. This requires some form of catalytic discrimination, in turn necessitating additional catalytic diversity. Fig. 5 depicts this basic distinction between alphabetic and complex non-alphabetic syntheses.



Fig. 5. Schematic representation of catalytic requirements for alphabetic vs. complex (non-repetitive) non-alphabetic syntheses. For alphabetic macromolecular syntheses, a baseline level of catalytic tasks (here referred to as a ‘complexity investment’; of N tasks) allows the potential generation of alphabetic concatenates of specific sequences and of indefinite length, thus shown by a vertical line against the Y-axis (this line does not intercept the X-axis since a minimal size of a concatenate is determined by the size of the alphabetic monomers). For non-alphabetic complex molecules of diverse structures, as molecular size increases the number of distinct catalysts required will tend to continually rise, to cope with required regiospecific molecular modifications performed with the correct stereochemistry. It should be stressed that the curved ‘non-alphabetic’ line is intended to schematically represent a general trend rather than a specific trajectory. Catalytic requirements could vary considerably subject to the types of large and complex molecules being synthesized, while still exhibiting the same overall increasing demand for catalytic diversity.


It must be noted that the above concept of a ‘complexity investment’ (Fig. 5) should not be misconstrued as arising evolutionarily prior to the generation of templated alphabetic syntheses. Progenitor systems enabling rudimentary templated syntheses would necessarily have co-evolved with the generation of templated products themselves. Yet once a threshold of efficiency was attained in direct and adapted templated molecular replication, a whole universe of functional sequences is potentially exploitable through molecular evolution.

And herein lies another salient point about molecular alphabets. As noted above, the secret of life’s ascending complexity is Darwinian evolution, and it is difficult to see how this could proceed with autocatalytic non-alphabetic systems. But variants (mutations) in a replicated alphabetic concatemeric string can be replicated themselves, and if functionally superior to competitors, they will prove selectable. Indeed, even for an alphabet with relatively few members (such as the 4-base nucleic acid alphabet), the number of alternative sequences for concatenates of even modest length soon becomes hyper-astronomical. And yet the tiny fraction of the total with some discernible functional improvement above background can potentially be selected and differentially amplified. Successive cumulative improvements can then ensue, eventually producing highly complex and highly ordered biological systems.
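To give a sense of the scale involved, the number of possible sequences is simply the alphabet size raised to the power of the concatenate length. The short sketch below (hypothetical function names) computes this exactly with Python integers for a 100-nucleotide RNA, and for comparison for a 100-residue chain drawn from the 20-member protein alphabet.

```python
def sequence_count(alphabet_size, length):
    """Number of distinct linear sequences of a given length."""
    return alphabet_size ** length

def order_of_magnitude(n):
    """Power of ten of a positive integer (number of digits minus one)."""
    return len(str(n)) - 1

if __name__ == "__main__":
    for name, size in (("nucleic acid", 4), ("protein", 20)):
        total = sequence_count(size, 100)
        print(f"{name}: about 10^{order_of_magnitude(total)} sequences of length 100")
```

Even the 4-member alphabet reaches roughly 10^60 alternatives at a length of 100, which is why only a vanishingly small fraction of sequence space can ever be sampled, and why selection on that fraction matters.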

 Metabolic Origins vs. Genetic Origins and Their Alphabetic Convergence

The proposed importance of alphabets leads to considerations of abiogenesis, the question of ultimate biological beginnings. Two major categories of theories for the origin of life exist. The ‘genetic origin’ stance holds that some form of replicable informational molecule must have emerged first, which led to the molecular evolution of complex biological systems. This school of thought points to considerable evidence for an early ‘RNA World’, where RNA molecules fulfilled both informational (replicative) and functional (catalytic) roles. But given difficulties in modeling how RNA molecules could arise de novo non-biologically, many proponents of the RNA World invoke earlier, simpler hypothetical informational molecules which were later superseded by RNA.

An alternative view, referred to as the ‘metabolic origin’ hypothesis, proposes that self-replicating autocatalytic sets of small molecules were the chemical founders of biology, later diversifying into much higher levels of complexity.

Both of these proposals for abiogenesis have strengths and weaknesses, but the essential point to make in the context of the present post is that it is not necessary to take a stand in favor of either hypothesis in order to promote the importance of molecular alphabets for the evolution of complex life. In a nutshell, this issue can be framed in terms of the difference between factors necessary for the origin of a process, and factors essential for its subsequent development. In the ‘alphabetic hypothesis’, molecular alphabets are crucial and inescapable for enabling complex biosystems, but are not necessarily related to the steps at the very beginning of the process from non-biological origins.

If the ‘genetic origin’ camp are correct, then alphabets are implicated at the very beginning of abiogenesis. On the other hand, if the opinions of ‘metabolic origin’ advocates eventually hold sway, molecular alphabets (at least in the sense used for building macromolecules from a limited set of monomers) would seem to be displaced at the point of origin. But the biological organization we see around us on this planet (‘Life 1.0’) is most definitely based on well-defined alphabets. So, both abiogenesis hypotheses necessarily must converge upon alphabets at some juncture in the history of molecular evolution. For genetic origins, a direct progression in the complexity of both alphabets themselves and their derived products would be evident, but a metabolic origin centering on autocatalytic small-molecule sets must subsequently make a transition towards alphabetic systems, in order for it to be consistent with observable extant biology. Thus, stating that alphabets enable the realization of highly complex biological systems refers to all the downstream evolutionary development once alphabetic replicators have emerged. No necessary reference is accordingly made to the role of alphabets at the beginning of the whole process.

 A ‘Law of Alphabets’?

Now, the last issue to look at briefly in this post is the postulated universality of alphabets. It is clear that molecular alphabets are the basis for life on this planet, but need that always be the case? To answer this, we can revisit the above arguments: (1) Complex biosystems of any description must involve complex molecular interactions; (2) The demand for molecular complexity is inevitably associated with requirements for increasing molecular size; (3) Biological synthesis of a wide repertoire of large and complex functional molecules is difficult to achieve by non-alphabetic means; (4) The fundamental requirement for Darwinian evolution for the development of complex life is eminently achievable through alphabetic concatenates, but is difficult to envisage (and certainly unproven) via non-alphabetic means.

It is also important to note that these principles say nothing directly about the chemistry involved, and quite different chemistries could underlie non-terrestrial biologies. Even if so, the needs for molecular complexity and size would still exist, favoring in turn the elegant natural solution of molecular alphabets.

So if this proposal is logically sound, then it would indeed seem reasonable to propose that a ‘Law of Alphabets’ applies universally to biological systems. In a previous post, it was noted that an even more fundamental, but related ‘law’ could be a ‘Law of Molecular Complementarity’, since such complementarities are fundamental to known alphabetic replication. Indeed, it is difficult to conceive of an alphabetic molecular system where complementarity-based replication at some level is absent. Still, while complementarity may be an essential aspect of alphabetic biology, it does not encompass the whole of what alphabets can deliver, and is thus usefully kept in a separate, though intersecting, compartment.

To conclude, a biopoly(verse), delivered in a familiar alphabet:


If high bio-complexity may arise

In accordance with molecular size

Compounds that are small

Are destined to fall

And with alphabets, intricacy flies

References & Details

(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).


‘…comprehensive definitions of life are surprisingly difficult…..’     For example, see Cleland & Chyba 2002; Benner 2010, Root-Bernstein 2012.

‘……The lac operon of E. coli regulates the production of enzymes……’     The story of the lac operon is a classic in molecular biology, included in most basic textbooks. The French group involved, led by Jacques Monod, won a Nobel prize for this in 1965. For a revisit of an old 1960 paper regarding the operon concept, see Jacob et al. 2005.

‘An inherent issue in this respect is molecular size.’     See my recent paper (Dunn 2013) for a more detailed discussion of molecular size in relation to functional demands.

‘Biological enzymes are protein molecules that can range in molecular weight….’     A case in point for a large enzyme, pertaining to the above lac operon, is the E. coli enzyme β-galactosidase, which has 1024 amino acid residues and a molecular weight of 116 kilodaltons. For details on the structure of β-galactosidase, see Juers et al. 2012.

‘Titin is composed of a staggering 26,920 amino acid residues…….’     See Meyer & Wright 2013.

‘…..small organic biomolecules can be effective in certain catalytic roles.’     See Barbas 2008.

‘……small molecule catalysis cannot accommodate the functional demands of complex biosystems.’     See again Dunn 2013 for a more detailed discussion of this issue.

‘…..the phenomenon of allostery…’    The lac operon again can be invoked as a good example of the importance of allostery; see Lewis 2013.

‘……a potential escape route from a linear infinite synthetic regression… in the form of a loop….’     A major proponent of autocatalytic loops and self-organization has been Stuart Kauffman, outlined (along with many other themes) in his book, The Origins of Order (Oxford University Press, 1993).

‘…..The ‘chemical reality’ problem with theoretical autocatalytic systems has been elegantly discussed by the late Leslie Orgel.’     See Orgel 2008.

‘Models for replication of such sets as ‘compositional genomes’ have been put forward, but in turn refuted by others.’     For the model of autocatalytic set replication, see Segré et al. 2000; for a refutation of it, see Vasas et al. 2010.

‘……molecular alphabets, generally defined as specific sets of monomeric building blocks….’       See Dunn 2013 for a more detailed definition, and discussion of related issues.

‘……since 1982 the ability of certain folded RNA single strands to perform many catalytic tasks….’     The seminal paper on ribozymes came from Tom Cech’s group in 1982 (Kruger et al. 1982).

‘…….specific folded DNA molecules can likewise perform varied catalyses….’     See Breaker & Joyce 1994.

‘This is fundamentally unlike a complex non-alphabetic molecule, where sites of chemical modification may vary…….’     Note that this statement does not include molecules such as polymeric carbohydrates, where these are composed of repeated monomers and thus relatively simple in their structures.

‘….the numbers of alternative sequences for concatenates of even modest length soon becomes hyper-astronomical.’     For example, in the case of an RNA molecule of 100 nucleotides in length, 4^100 (approximately 10^60) sequence combinations are possible.

‘…..quite different chemistries could underlie non-terrestrial biologies.’     See Bains 2004 for a detailed discussion of this issue.

Next post: September.
