In previous posts (April 2015; August 2015) we have looked at the notion of biological parsimony from several vantage points. For example, one such issue was the frequency of certain protein folds as recurring evolutionary motifs, in contrast to other folds which are used far more restrictively (April 2015 post). Here we look at parsimony at the higher level of biological systems, chiefly concerning the strong tendency for such systems to evolve towards highly economical arrangements.
Two Levels of Parsimony
In the post of April 2015, a number of different forms of biological modularity were listed, as applied towards parsimonious biosystems. Here we can ‘parsimoniously package’ the general phenomenon of biological parsimony itself into two major levels: that of the packing or arrangement of function in specific molecules or encoded biological information, and that of the deployment of molecules in terms of their functional interactions in the operations of biosystems. Of course, these are not independent factors, since a polyfunctional protein (for example) will have multiple types of interactions with other functional partners within the biosystem within which it operates. This is one means whereby a single protein can have distinct roles in cells of divergent differentiation lineages.
These levels of parsimony and their interactions are depicted in Fig. 1.
Fig. 1. Two major levels of macromolecular biological parsimony and their inter-relationships, schematically depicted. A, Packing / Functional Arrangement refers to parsimonious packaging of encoded information (such as a single genetic coding locus capable of producing multiple distinct proteins, via alternate promoters, differential splicing, or other means; represented here as ‘Informational Encoding’). Also encompassed within this level is the parsimonious use of encoded macromolecular structures by evolutionary selection (“Evolutionary motif redeployment”) with divergence of function, well-exemplified by the TIM-barrel protein motif, as mentioned in a previous post (April 2015). In addition, the ‘packing’ level includes the grouping of multiple distinct functions into a single macromolecule (such as a protein W with n separable functions). B, Parsimony at the level of functional interactions. These include intermolecular interactions (for example, where a protein W interacts with n different partner molecules with n distinct results), and also intramolecular effects (as in the case of allosteric changes in a protein induced by ligand binding at a distinct site). Also within this level of parsimony is the evolutionary redeployment of portions of specific signaling pathways in different cellular contexts, with divergent ‘read-out’ consequences.
Aside from the above ‘packing’ issues, another aspect of molecular function relevant to biological parsimony at the level of individual molecules is evolutionary ‘redeployment’ parsimony. (In this respect, see also a previous post [April 2015], where this has also been discussed.) Such an evolutionary form of parsimony refers to the tendency of biological components, especially at the molecular structural level, to be ‘repurposed’ by evolution towards assuming new functional roles. (This facet of parsimony is also noted in Fig. 1 above.) Why should this be so? In fact, it follows fairly simply from the principle that the process of natural selection can only ‘tinker’ with what is currently available, and cannot foresee the optimal solutions to biosystem problems which might become apparent in hindsight. Thus, it is usually more probable that encoded pre-existing structures will be co-opted for other functions than that wholly new genes will arise de novo. In order for this ‘re-purposing’ to happen, an obvious problem would seem to arise from the simple question, “if some cellular mediator assumes a new function, what takes care of its original function?” In fact, there are well-known processes whereby this can happen, primarily involving the generation of additional gene copies (gene duplication events) with which evolution can ‘tinker’ without compromising the function of the original gene product.
Of course, it could be argued that, since the entirety of biology is evolutionarily derived, all biological parsimony is ‘evolutionary’. In a broad sense this is obviously true, but the ‘repurposing’ type of evolutionary parsimony is worth singling out for special mention in this context. In any case, not all evolutionary change can be classified as arising from the redeployment of pre-existing ‘parts’ towards novel applications. For example, where a single mutation in a functional protein confers a selectable fitness benefit, which ultimately becomes fixed in a population, evolutionary change has occurred – but via a direct modification of an existing ‘part’ towards better efficiency in its original role, not towards an entirely new function.
That matter aside, in this parsimonious post, the focus will be on interactomes, following a brief introduction from the previous post on this topic.
It was noted in the previous post that the seemingly low numbers of coding sequences in humans and other ‘higher’ organisms are counter-balanced to a considerable degree by various diversity-generating mechanisms, by which a single gene can encode multiple proteins, or a single protein can be post-translationally modified in distinct ways. But it was also noted in passing that many (if not most) proteins have more than one role to play in developing organisms, often in cell types at distinct stages of differentiation. This is the essence of the interactome, the sum total of molecular interactions that enable a biosystem to function normally. In this context, a key word is connectivity, where ‘no protein [or any functional mediator] is an island, entire of itself’.
There are numerous ways that the parsimony principle is manifested within interactomes. One prominent feature in this regard is signaling and signaling pathways. It is common to find a single defined signaling mediator with multiple roles towards different cell types, or at different stages of differentiation. An example to consider here is a cytokine known as Leukemia Inhibitory Factor, or LIF. As its name suggests, it was first defined as a factor inhibiting the growth of leukemic cells, yet in other circumstances it can behave as an oncogene. It is well-known as a useful reagent in cell biology owing to its ability to maintain the pluripotent differentiation status of embryonic stem cells, an activity of great utility for the generation of ‘knock-out’ mice. But in addition to this, LIF has been shown to have roles in the biology of blastocyst implantation, hippocampal and olfactory receptor neuronal development, platelet formation, proliferation of certain hematopoietic cells, bone formation, adipocyte lipid transport, production of adrenocorticotropic hormone, neuronal development and survival, muscle satellite cell proliferation, and some aspects of hepatocyte regulation. An irony of this multifaceted range of functions is that certain activities in the above LIF-list were at first ascribed to new and unknown mediators, before detailed biochemical analysis showed that LIF was the actual causative factor.
The extent of the pleiotropism (‘many turns’) of LIF has intrigued and surprised numerous workers, leading to this effect being called an ‘enigma’. Why should one cytokine do so many things? Here it should be noted that in the cytokine world, while LIF is certainly not unique in having multiple activities, it is probably the star performer in this regard. In answer to the question “why does it make design sense to use LIF in the regulation of such a diverse and unrelated series of biological processes?”, we can invoke the parsimony principle, by a now familiar logical pathway. It is thus reasoned that a biosystem factor will tend to assume multiple functional roles if it can do so without compromising organismal fitness. The ‘tend to’ phrase is predicated on the assumption that it is energetically and informationally favorable to streamline the functional operations of a biosystem as much as possible, and that evolutionary processes will move organism design in that direction, via increased fitness gains. At the same time, it is evident that there must be limits to this kind of trend, since at some point in the poly-deployment of a mediator, inefficiencies will inevitably creep in, as one signaling event begins to interfere with another. A number of ways have been ‘designed’ by evolution to minimize this, of which more below. But to return to the specific question of why LIF – and not some other cytokine – should be such an exemplar of polyfunctionality, there is no specific answer. All that can be suggested is that the many biological roles that feature LIF do not interfere with each other, or that they complement each other, such that there is a fitness gain from LIF’s multi-deployment in such ways. And this could be condensed into saying, ‘it can, so it does’, which might not sound particularly helpful.
There may be reasons of simple evolutionary contingency as to why LIF gained these roles and not some other cytokine – or indeed there may be deeper (and highly non-obvious) reasons why the prize necessarily must go to LIF. Such questions might be answered at least in part by advanced systems biological modeling, or (ultimately) by equally advanced synthetic biology, where artificial programming of real-world model biosystems can address such questions directly.
With this introduction in the form of LIF in mind, it is useful now to think about ways that receptor signaling can diversify with either a single mediator involved, or with a single receptor. With respect to the latter circumstance, there are biological precedents where a single heterodimeric receptor (composed of two chains) can respond with distinct signaling resulting from engagement with separate ligands. This effect is well-exemplified by the Type I interferons (IFN), of which there are several distinct types (in humans alone, these include IFN-α, IFN-β, IFN-ε, IFN-κ, and IFN-ω, where IFN-α has 13 different subtypes), all of which bind to the same heterodimeric Type I receptor. Yet despite sharing a common receptor, these different interferons induce quite distinct signaling. This phenomenon is depicted in Fig. 2 below.
Fig. 2. Schematic depiction of a single heterodimeric receptor which enables distinct signaling from binding of different ligands, even in the same cell type. In the top panel, Ligand A (blue pentagon) engages certain specific residues within the receptor pocket, with induction of a conformational change which activates a subset of the intracytoplasmic co-signaling molecules, with a specific signaling pathway triggered. The bottom panel depicts a different ligand (Ligand B, red hexagon), which engages the receptor with different contact residues, resulting in distinct receptor changes and concomitant downstream signaling.
In general, the form of ligand signaling complexity depicted in Fig. 2, where a specific ligand can activate one signaling pathway without activating another, has been termed ‘biased agonism’. This phenomenon has been much-studied in recent times with respect to G-Protein Coupled Receptors (GPCRs), which are a hugely diverse class of cellular receptors. They have long been of particular interest to the pharmaceutical industry through their susceptibility to selective drug action (‘druggability’), and biased agonism clearly offers a handle on improving the selectivity by which GPCR-mediated signaling is directed in a desired manner.
Other complexities to signaling arrangements are possible which increase signal diversity from a limited set of participants. Cells of different lineages may express the same receptors, but differ in their patterns of co-receptors and signaling accessory molecules whereby intracellular signals are generated. This is depicted in Fig. 3A and Fig. 3B below. Other processes whereby a limited set of ligands and receptors diversify their signaling are shown in Figs. 3C-3F. Thus, signaling-based polyfunctionality is one aspect of interactomic parsimony.
Fig. 3. Schematic depiction of mechanisms for signaling diversity generated with either the same receptor in different contexts (A-E), or the same ligand binding to a different receptor (F). A and B: the same receptor (as a heterodimer) expressed in cells of two distinct differentiation states, such that they differ in their complements of coreceptors (not shown) or intracytoplasmic accessory signaling molecules (colored ovals). After engagement with ligand, the resulting signal pathway in context A is thus divergent from that generated in context B; C and D: the same receptor where it forms a homodimer (C) or heterodimer (D), each with distinct signaling consequences; E: the same receptor as in A, but where it interacts with a second ligand (pale blue octagon), which engenders a conformational change such that it binds either a different ligand, or a modified form of the original ligand; F: the same ligand as in A, but where it is compatible with another receptor entirely, with corresponding divergent signaling effects.
The deployment of different subunits in the signaling arrangements of Fig. 3 is itself a subset of a more general effect within interactomes, where modularity of subunits within protein complexes is a ubiquitous feature. This reflects an aphorism coined in a previous post (April 2015), to the effect that “Parsimony is enabled by Modularity; Modularity is the partner of Parsimony”. And with respect to protein modularity in eukaryotic cells, there is plenty of evidence for this from studies of the yeast proteome, where differential protein-protein combinations have been extensively documented.
Signaling to different compartments
Biosystems are compartmentalized at multiple levels. As well as the unit of compartmentalization we know as cells, numerous membrane-bound structures are ubiquitously encompassed within cellular boundaries themselves. An obvious one to note is the cell nucleus itself. While the subcellular organelles known as mitochondria (the energy powerhouses of cells) and chloroplasts (the photosynthetic factories of green plants) have their own small genomes encoding a limited number of proteins, in both cases many more proteins required for their functions are encoded by the much larger host cell genomes. Other compartments lacking their own genomes exist, including (but not limited to) the endoplasmic reticulum, the Golgi apparatus, and peroxisomes.
It would be easy to imagine a host genome-encoded set of special proteins reserved for the organelles or other compartments, along with specialized transport systems (to target the organelle-required proteins to the right places) in each case. In some cases, this appears to be so, but if this was generalized, it would certainly violate the parsimony principle, since many such proteins are also required to function in more than one cellular compartment. One could envisage a solution in the form:
Signal A – Protein 1, 2, 3….. | Signal A recognition system, to compartment A
Signal B – Protein 1, 2, 3….. | Signal B recognition system, to compartment B
By such an arrangement, an identical set of proteins could be targeted to distinct compartments if they were appended to modular recognition signals. Yet as is so often the case, biology is both more subtle and more complicated than simplistic schemes such as this. In fact, a variety of natural ‘solutions’ for the multi-targeting issue have evolved. To use the above terminology, some could be depicted at the mRNA level as:
Signal A (spliced in) – Protein 1 coding sequence….. | (expression) — Signal A recognition system, to compartment A
Signal A (spliced out) – Protein 1 coding sequence….. | (expression) — no targeting signal, remains in cytosol.
In these circumstances, the ‘default’ localization is the cytoplasm (cytosol), and organelle targeting is effected only where a signal sequence is translated and appended to the protein. Differential splicing at the RNA level can then include or exclude the sequence of interest, both (parsimoniously) from the same genetic locus. But many more mechanisms than this have been documented for general multi-compartmentalization, including the existence of chimeric signal sequences that are bipartite towards different compartments. The take-home message once again is the stunning extent to which known biosystems have evolved highly parsimonious deployment of their encoded functional elements, all encompassed within biological interactomes.
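The splice-in / splice-out logic above can be captured in a toy sketch. All names and ‘sequences’ here are invented purely for illustration (real targeting signals and their recognition machinery are, of course, far more intricate):

```python
# Toy model of differential splicing controlling protein localization.
# The signal 'sequence' and protein names are hypothetical placeholders.

SIGNAL_EXON = "MLSRAV"        # stand-in for an N-terminal targeting signal
CORE_CODING = "PROTEIN1CORE"  # stand-in for the shared coding sequence

def mature_protein(include_signal_exon: bool) -> str:
    """Return the protein produced from one locus, depending on whether
    the signal-encoding exon is spliced in or out."""
    return (SIGNAL_EXON if include_signal_exon else "") + CORE_CODING

def localize(protein: str) -> str:
    """A caricature of the targeting machinery: when the signal is present
    at the N-terminus, the protein is routed to the organelle; otherwise
    it remains in the cytosol by default."""
    if protein.startswith(SIGNAL_EXON):
        return "mitochondrion"
    return "cytosol"

# Same genetic locus, two splice isoforms, two destinations:
print(localize(mature_protein(True)))   # signal spliced in  -> mitochondrion
print(localize(mature_protein(False)))  # signal spliced out -> cytosol
```

The point of the sketch is simply that a single locus plus a modular, removable signal achieves two localizations – the parsimonious arrangement described above.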
This short tour of the parsimonious interactome has barely scratched the surface of the topic as a whole, and some other aspects of biological parsimony will indeed be taken up in the next post. Meanwhile, a biopoly(verse) take on receptor-ligand parsimony:
A ligand-receptor attraction
Can show parsimonious action
For receptors can change
In their signaling range
And vary a transduced reaction
References & Details
(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).
‘…..it is usually more probable that pre-existing structures can be co-opted for other functions than wholly de novo genes will arise.’ It has long been considered that gene duplication is an effective means by which novel functions can evolutionarily arise, and far more likely than de novo gene evolution. In this regard, see a review by Hurles 2004. Yet in recent times evidence for the evolution of de novo genes from ‘orphan’ open reading frames has become stronger; see Andersson et al. 2015. Nevertheless, the duplication-mediated repurposing of pre-existing evolutionary ‘parts’ is still most likely to be much more frequent.
‘….Leukemia Inhibitory Factor, or LIF. As its name suggests, it was first defined as a factor inhibiting the growth of leukemic cells….’ For general LIF background and its polyfunctional nature, see Hilton 1992 and Metcalf 2003. As other examples of LIF anti-tumor activities, see Bay et al. 2011; Starenki et al. 2013.
‘…..yet in other circumstances it [LIF] can behave as an oncogene.‘ See Liu et al. 2015.
‘……this effect [LIF polyfunctionality] being called an ‘enigma’ …..’ See Metcalf 2003.
‘…..“why does it make design sense to use LIF in the regulation of such a diverse and unrelated series of biological processes” ……’ This question (slightly paraphrased here) was posed by Metcalf 2003.
‘……This effect is well-exemplified by the Type I interferons.’ See a review by Platanias 2005.
‘…….‘biased agonism’. This phenomenon has been much-studied in recent times with respect to G-Protein Coupled Receptors (GPCRs)…..’ For very recent updates on biased agonism in a GPCR context, see Pupo et al. 2016; Rankovic et al. 2016.
‘……protein modularity in eukaryotic cells, there is plenty of evidence from studies of the yeast proteome, where differential protein-protein combinations have been extensively documented.‘ See Gavin et al. 2006; Gagneur et al. 2006.
‘……But many more mechanisms than this have been documented for general multi-compartmentalization.’ See a review by Yogev and Pines 2011, where at least 8 different targeting mechanisms were listed for mitochondria alone. See also Avadhani et al. 2011 for a discussion of chimeric signals in a specific protein context.
Next Post: March.
The previous post discussed the notion that biological processes, and biosystems in general, exhibit a profound economy of organization and structure, which can be termed biological parsimony. At the same time, there are biological phenomena which seem to run counter to this principle, at least at face value. In this post, this ‘counterpoint’ theme is continued, with an emphasis on the organization of genomes. In particular, the genome sizes of the most complex forms of life (unlike those of simpler bacteria) appear, at least superficially, to considerably exceed the basic need for functional coding sequences alone.
Complex Life With Sloppy Genomes?
When it comes to genomics, prokaryotes are good advertisements for parsimony. They have small and very compact genomes, with minimal intergenic spaces and few introns. Since their replication times are typically very short under optimal conditions, the time and energy requirements for genomic replication are often significant selective factors, tending to streamline genomic sizes as much as possible. A major factor for the evolution of prokaryotic organisms is their typically very large population size, which promotes the rapid positive selection of small fitness gains. Prokaryotic genomes are thus under intense selection for functional and replicative simplicity, leading to rapid elimination of non-essential genomic sequences.
Yet the situation is very different for the more complex biologies of eukaryotes, where genome sizes are commonly bigger by 1000-fold or more than that of the bacterial laboratory workhorse, E. coli. It is widely recognized that this immense differential is enabled in eukaryotic cells through the energy dividend provided by mitochondria, the organelles acting as the so-called powerhouses of such cells. Mitochondria (and chloroplasts in plants) are intracellular symbionts, descendants of ancient bacterial forms which entered into an eventual partnership with progenitors of eukaryotic cells, and in the process underwent massive genomic reduction. The energetic contribution of mitochondria enabled much larger cells, with concomitantly much larger genomes.
If eukaryotic genomes can be ‘sloppy’, and accommodate very large tracts of repetitive DNAs deriving from parasitic mobile elements, or other non-coding sequences, where is the ‘parsimony principle’ to be found? We will return to this question later in this post, but first let’s look at some interesting issues revolving around the general theme of genomic size.
Junk is Bunk?
While a significant amount of genomic sequence in a wide variety of complex organisms is now known to encode not proteins but functional RNAs, genome sizes still seem much larger than what should be strictly necessary. This observation is emphasized by the findings of genomic sequencing projects, where complex organisms, including Homo sapiens, show what seems at first glance to be a surprisingly low count of protein-coding genes. In addition, closely related organisms can have markedly different genome sizes. These observations are directly pertinent to the ‘C-value paradox’, which refers to the well-documented disconnect between genome size and organismal complexity. Since genomic size accordingly appears to be arbitrarily variable (at least up to a point), much non-coding DNA has been considered by many in the field to be ‘junk’. In this view, genomic expansion (by duplication events or extensive parasitism by mobile genetic elements) has little if any selective impedance until finally limited by truly massive genomic sizes. In other words, the junk DNA hypothesis holds that genomes can accumulate large amounts of superfluous sequence which are essentially along for the ride, being replicated in common with all essential genomic segments. This trend is only restricted when genomes reach a size which eventually does impact upon the relative fitness of an organism. Thus, even the junk DNA stance concedes that genomes must necessarily be size-restricted, even though a lot of genomic noise can be tolerated before this point is reached.
It must be noted that the junk DNA viewpoint has been challenged, broadly along two separate lines. One such counterpoint holds that the apparent lack of function of large sectors of eukaryotic genomes is simply incorrect, since a great deal of the ‘junk’ sequences are transcribed into RNAs with a variety of essential cellular functions beyond encoding proteins. As noted above, there is no question that functional (non-coding) RNAs are of prime importance in the operations of all cellular life. At a basic level this has been known almost since the birth of molecular biology, since ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) have been described for many decades. These RNAs are of course essential for protein synthesis, and are transcribed from corresponding genomic DNA sequences.
But in much more recent times, the extent of RNA function has become better appreciated, to include both relatively short regulatory molecules (such as microRNAs [miRNAs]) and much longer forms (various functional non-coding species [ncRNAs]). While the crucial importance of these classes of nucleic acid functionalities is beyond dispute, the relevance of this to genome sizes is another matter entirely. To use the human genome as a case in point, even if the number of functional RNA genes was twice the size of the protein-coding set, the net genome size would still be much larger than required. While the proponents of the functional-RNA refutation of junk DNA have pointed to the evident transcription of most (if not all) of complex vertebrate genomes, this assertion has been seriously challenged by the Hughes (Toronto) lab as based on inadequate evidence.
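The size argument above can be made concrete with a rough back-of-envelope sketch, using commonly cited approximate figures for the human genome (the ‘doubling’ allowance for functional RNA genes is purely illustrative, mirroring the hypothetical in the text):

```python
# Back-of-envelope check: even a generous allowance for functional RNA
# genes leaves the great majority of the human genome unaccounted for.
# Figures are rough, commonly cited approximations, not exact values.

GENOME_SIZE_BP = 3.2e9    # ~3.2 Gb haploid human genome
CODING_FRACTION = 0.015   # protein-coding sequence, roughly 1-2% of the genome

coding_bp = GENOME_SIZE_BP * CODING_FRACTION
# Suppose functional RNA genes occupied twice the protein-coding footprint:
rna_bp = 2 * coding_bp
accounted = coding_bp + rna_bp

print(f"accounted for: {accounted / GENOME_SIZE_BP:.1%} of the genome")
# -> still under 5%, leaving the bulk of the genome unexplained by genes alone
```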
Other viewpoints suggest that the large fraction of eukaryotic chromosomal DNA which is seemingly superfluous is in fact necessary, but without a strong requirement for sequence specificity. We can briefly consider this area in a little more detail.
Genomic Junk and Some ‘Indifferent’ Viewpoints
One of these proposals, the ‘skeletal DNA’ hypothesis, as largely promulgated by Tim Cavalier-Smith, side-steps the problem of whether much of the genome is superfluous junk or not, in favor of a structural role for the large non-genic component of the genome. Here the sequence of the greater part of the ‘skeletal’ DNA segments is presumed to be non-specific, where the main evolutionary selective force is for genomic size per se, irrespective of the sequences of non-genic regions. Where DNA segments are under positive selection but not in a sequence-specific manner, the tracts involved have been termed ‘indifferent DNA’, which seems an apt tag in such circumstances. Cavalier-Smith proposes that genomic DNA acts as a scaffold for nuclei, and thus nuclear and cellular size correlate with genome sizes. But at the same time, DNA content itself does not directly alter proliferating cell volumes; rather, the latter result from variation in encoded cell cycle machinery and signals (related to cellular concentrations of control factors).
Another proposal for the role of large non-coding genomic segments could be called the ‘transposable element shield’ theory. In this concept (originally put forward by Claudiu Bandea), so-called junk genomic segments reduce the risk that vital coding sequences will be subjected to insertional inactivation by parasitic mobile elements. Once it has been drawn to one’s attention, this proposal has a certain intuitive appeal. Thus, if 100% of a complex genome was comprised of demonstrably functional sequences, then by definition any insertion by a parasitic transposable sequence element would knock out a function (or at least have a very high probability of doing so). If only 10% of the genome was of vital functional significance, and the rest a kind of shielding filler, then the insertional risk goes down by an order of magnitude. This model assumes that insertion of mobile elements is sequence-target neutral, or purely random in insertion site. Since this is not so for certain types of transposable elements, the Bandea proposal also encompasses the notion that protective genomic sequences are not necessarily arbitrary, but include sequences with a decoy-like function, to absorb parasitic insertions with reduced functional costs. Strictly speaking, then, this proposal is not fully ‘indifferent’ in referring to ‘junk’ DNA, but clearly is at least partially so. It should be noted as well that shielding against genomic parasitism is of significance for multicellular organisms with large numbers of somatic cells, as well as for germline protection.
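The order-of-magnitude reasoning behind the ‘shield’ idea is easy to check with a quick simulation, under the simplifying (and, as just noted, not always realistic) assumption that insertion sites are purely random; the functional fractions used are illustrative:

```python
# Monte Carlo sketch of the 'transposable element shield' argument:
# if insertions land at random, the chance of disrupting a functional
# sequence scales directly with the functional fraction of the genome.
import random

def hit_probability(functional_fraction: float, insertions: int = 100_000,
                    seed: int = 1) -> float:
    """Estimate the fraction of random insertions landing in functional
    sequence, by drawing uniform positions along a notional genome."""
    rng = random.Random(seed)
    hits = sum(rng.random() < functional_fraction for _ in range(insertions))
    return hits / insertions

for frac in (1.0, 0.5, 0.1):
    print(f"functional fraction {frac:.0%}: "
          f"~{hit_probability(frac):.1%} of insertions disrupt a function")
```

As expected, dropping the functional fraction from 100% to 10% drops the disruption risk by an order of magnitude, which is the whole of the intuitive appeal.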
In the context of whether genomes increase in size by the accumulation of ‘junk’ or through selectable (but sequence-independent) criteria, it should be noted that a strong case has been made by Michael Lynch and colleagues for the significance of non-adaptive processes in causing changes in genome size, especially in organisms with relatively low replicative population sizes (the opposite effect to large-population prokaryotes, as noted above). The central issue boils down to energetic and other functional costs – if genome sizes can expand with negligible or low fitness cost, passive ‘junk’ can be tolerated. But a ‘strong’ interpretation of the skeletal DNA hypothesis holds that genome sizes are as large as they are for a selectable purpose – acting as a nuclear scaffold.
In considering the factors influencing the genome sizes of complex organisms, some specific cases in comparative genomics are useful to highlight, as follows.
Lessons from Birds, Bats, and Other ‘Natural Experiments’
Modern molecular biology has allowed the directed reduction of significant sections of certain bacterial genomes for both scientific and technological ends. But some ‘natural experiments’ have also revealed very interesting aspects of vertebrate genomes.
One such piece of highly significant information comes from studies of the genomes of vertebrates that are true fliers, as found with birds and bats. Such organisms are noted collectively for their significantly smaller genomes in comparison to other vertebrates, especially other amniotes (reptiles and mammals). The small-genome / flight correlation has even been proposed for long-extinct ancient pterosaurs, from studies of fossil bone cell sizes. In the case of birds, genome size reduction has been attributed to loss of repetitive sequences, deletions of certain genomic segments, and (non-essential) gene loss.
A plausible explanation for the observed correlation between the ability to fly and smaller genomes is the high-level metabolic demand of flight. This demand is argued to favor streamlined genomes, via the reduction in replicative metabolic costs. Supporting evidence for such a contention is provided by the negative correlation between genome size and metabolic rate in all tetrapods (amphibians, reptiles, birds, and mammals), where a useful measure of oxidative metabolic rate is the ‘heart index’, or the ratio of heart mass to body weight. Even among birds themselves, it has been possible to show (using heart indices) negative correlations between metabolic rates and genomic sizes. Thus, highly active fliers with relatively large flight muscle quantities tend to have smaller genomes than more sedate fliers, with hummingbirds (powerhouses of high-energy hovering flight) having the smallest genomes of all birds.
It was stated earlier that closely related organisms can have quite different genome sizes, and the packaging of genomes in such cases can also differ markedly. The Indian muntjac deer has a fame of sorts among cytogeneticists, owing to its extremely low chromosome count relative to other mammals (only 6 diploid chromosomes in females, with an extra one in males). In contrast, the Chinese muntjac has a more usual diploid chromosome count of 46, and yet this deer is closely enough related to the Indian muntjac that they can interbreed (albeit with sterile offspring, reminiscent of mules produced through horse-donkey crosses). The Indian muntjac genome is believed to be the result of chromosomal fusions, with concomitant deletion of significant amounts of repetitive DNAs, and reduction in certain intron sizes. As a result, the Indian muntjac genome is reduced in total size by about 22% relative to the Chinese muntjac.
This illustration from comparative genomics once again suggests that genome size alone cannot be directly related to function. Although the link between numbers of distinct functional elements and complexity might itself be inherently complex, it is reasonable to contemplate what degrees of molecular function are required to build different organisms. If all genomes were entirely functional and ‘needed’, then much more genomic sequence would be required to build lungfishes, onions, and many other plants than human beings.
Junk vs. Garbage
A common and useful division of items that are nominally ‘useless’ has been noted by Sydney Brenner. He pointed out that most languages distinguish between stuff that is apparently useless yet harmless (‘junk’), and material that is both useless and problematic or offensive in some way (‘garbage’). An attic may accumulate large amounts of junk which sits there, perhaps for decades, without much notice, but useless items which become odoriferous or take up excessive space are promptly disposed of. The parallel he was making with genomic sequences is clear. ‘Garbage sequences’ that are, or become, deleterious in some way are rapidly removed by natural selection, but this does not apply to sequences which are merely ‘junk’.
Junk sequences thus do not immediately impinge upon fitness, at least in organisms with low population sizes. Also, ‘junk’ may be co-opted during evolution for a true functional purpose, as with the full ‘domestication’ of otherwise parasitic mobile elements. Two important points must be noted with respect to the domestication of formerly useless or even deleterious sequence elements: (1) just because some mobile element residues have become domesticated, it does not at all follow that all such sequences are likewise functional; and (2) the co-option (or ‘exaptation’) of formerly useless DNA segments does not in any way suggest that evolution has kept such sequences ‘on hand’ on the off-chance they might find a future use.
Countervailing Trends for Genomic Size
How do complex genomes expand in size, anyway? Duplication events are a frequent contributor towards such effects, and these processes can range from local effects on relatively small segments, to whole genes, and even entire genomes. The latter kind of duplication leads to a state known as polyploidy, which in some organisms can become a surprisingly stable arrangement.
Yet the major influence on genomic sizes in eukaryotes is probably the activity of parasitic mobile (transposable) elements, such that a correlation has been noted between genome size and the percentage of the genome constituted by such elements. It has been suggested that although in some cases very large genomes with a high load of transposable elements appear to be deleterious (notably certain plants believed to be on the edge of extinction), in other circumstances (large animal genomes, as seen with salamanders and lungfish) a high load of transposable elements may be tolerated without significant fitness loss. The latter effect has been attributed to slow acquisition of the mobile elements, whereby their continued spread tends to be inactivated by mutation or other (‘sequence decay’) mechanisms. This in itself can be viewed from the perspective of the ‘garbage/junk’ dichotomy: at least some transposable elements that remain active may be deleterious, and thus suitable for relegation into the ‘garbage’ box, while inactivated elements are more characteristic of ‘junk’.
Yet there is documented evidence indicating a global trend in evolution towards genome reduction, in a wide diversity of organisms. When this pattern is considered along with factors increasing genomic size, it has been proposed that the overall picture is biphasic. In this view, periods of genomic expansion in specific lineages are ‘punctuated’ not by stasis (as the original general concept of ‘punctuated equilibrium’ proposed) but by slow reduction in genomic sizes. Though the metabolic demands of flying vertebrates may place special selective pressures towards genomic reduction, a general trend towards genomic contraction suggests that selection always tends to favor smaller and more efficient genomes. Even where the selective advantage of a reduced genome size is small and subtle, over evolutionary time it will inevitably be exerted, with the observed results. But at the same time, genomic copy-errors (from small segments to whole genes to entire genomes) and parasitic transposable elements act as an opposing influence towards genomic expansion. And in this context, it is important to recall the above notes (from Michael Lynch and colleagues) with respect to the importance of organismal population size in determining the magnitudes of the selective pressures dictating the streamlining of genomes.
A human genome-reduction project (actually rendered much more feasible by the advent of new genome-editing techniques) could presumably produce a fully-functional human with a much smaller genome, but such a project would be unlikely to pass the scrutiny of institutional bioethics committees. (Arbitrary deletions engendered by blind natural selection will either be positively selected or not; a human project with the tools to reduce genome size would often lack 100% certainty that a proposed deletion would not have deleterious effects). Yet apart from this, we might also ask whether such engineered humans would have an increased risk of somatic cell mutagenesis via transposable elements (leading to cancer), if the Bandea theory of genomic shielding of transposable elements holds water.
Now, what then for parsimony in the light of the cascade of genomic information emerging in recent times?
If the junk DNA hypothesis was truly wrong in an absolute sense (that is, if all genomes were constituted from demonstrably functional sequences), then the parsimony principle might still hold at the genomic level. Here one might claim that all genomic sequences are parsimonious to the extent that they are functionally relevant, and therefore genomes are as large as functionally necessary, but no larger. Yet an abundance of evidence from comparative genomics (as discussed briefly above) suggests strongly that this interpretation is untenable. But if a typical eukaryotic energy budget derived from mitochondria allows a ‘big sloppy genome’, where does the so-called parsimony principle come in?
The best answer to this comes not from genomic size per se, but from gene number and the organization of both gene expression and gene expression products. Consider some of the best-studied vertebrate genomes, as in the Table below. If protein-coding genes only are considered, both zebrafish and mice have a higher count than humans. Nevertheless, as noted above, it is now known that non-coding RNAs, both large and small, are very important. If these are included, and a combined ‘gene tally’ thus calculated, we now find Homo sapiens coming out on top. More useful still may be the count for gene transcripts in general, since these include an important generator of genomic diversity: differential gene splicing.
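The comparison can be sketched with illustrative round figures. These counts are stand-ins in the spirit of the Ensembl-derived tallies discussed here, not the tabulated values themselves:

```python
# Illustrative, approximate gene counts (round figures for the sake of the
# comparison only; consult the Ensembl database for actual current tallies).
genomes = {
    #             protein-coding genes, non-coding RNA genes
    "human":     {"coding": 20000, "noncoding": 25000},
    "mouse":     {"coding": 23000, "noncoding": 16000},
    "zebrafish": {"coding": 26000, "noncoding": 8000},
}

# Combined 'gene tally' per species: coding plus non-coding genes.
tallies = {sp: g["coding"] + g["noncoding"] for sp, g in genomes.items()}

# On coding genes alone, zebrafish and mouse exceed human...
assert genomes["zebrafish"]["coding"] > genomes["human"]["coding"]
# ...but on the combined tally, human comes out on top.
assert max(tallies, key=tallies.get) == "human"
```

The structure of the comparison, rather than the exact numbers, is the point: which species ‘wins’ depends entirely on which class of gene is counted.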
But what does this mean in terms of complexity? Are humans roughly only twice as complex as mice, or roughly three times as complex as a zebrafish? Almost certainly there is much more to the picture than that, since these superficial observations belie what is likely to be the most significant factor of all: the way expressed products of genomes (both proteins and RNAs) interact, which can impose many hidden layers of complexity onto the initial expression toolkit. These patterns of interactions comprise an organism’s interactome.
How many genes does it take to build a human? Or a mouse, or a fish? As noted earlier in this post, in the aftermath of the first results for the sequencing of the human genome, and numerous other genomes soon afterward, many onlookers expressed great surprise at the ‘low’ number of proteins apparently encoded by complex organisms. Other observers pointed out in turn that if it is not known how to build a complex creature, how could one know what an ‘appropriate’ number of genes should be? Still, a few tens of thousands of genes does seem a modest number, even factoring in additional diversity-generating mechanisms such as differential splicing. At least, this would be the case if every gene product had only a single, unique role in the biology of an organism – but this is manifestly not so.
In fact, single proteins very often have multiple roles, in multiple ways, via the global interactome. An enzyme, for example, may have the same basic activity, but quite distinct roles in cells of distinct differentiation states. Other proteins can exhibit distinct functional roles (‘moonlighting’) in different circumstances. It is via the interactome, then, that genomes exhibit biological parsimony, to a high degree.
This ‘interactomic’ theme will be developed further in the succeeding post.
Some Parsimonious Conclusions
(1) Prokaryotic genomes have strong selective pressures towards small size.
(2) Eukaryotic genomes can expand to much larger sizes, with considerable portions of redundant or non-essential segments, by mechanisms that may be non-adaptive or positively selected (skeletal DNA, transposable element shielding). Such processes include duplication of specific segments (gene duplication) or even whole-genome duplication (polyploidy). This may be countered by long-term evolutionary trends towards genome reduction, but the ‘expandability’ of eukaryotic genomes (as opposed to prokaryotes) still remains.
(3) The expressed interactomes of eukaryotes are highly parsimonious.
(4) Biological parsimony is a natural consequence of strong selective pressures, which tend to drive towards biosystem efficiency. But the selective pressures themselves are linked to the energetics of system processes, and population sizes. Thus, a biological process (case in point: genome replication) within organisms with relatively small populations and moderate energetic demands (many vertebrates) may escape strong selection for efficiency, and be subjected to genetic drift and genomic expansion, with a slow counter-trend towards size reduction. An otherwise tolerable process in terms of energetic demands (genome replication once again) may become increasingly subject to selective pressure towards efficiency (size contraction) if an organism’s metabolic demands are very high (as with flying vertebrates).
(5) Based on internal functions alone, it might be possible to synthetically engineer a complex multicellular eukaryote where most if not all of its genome had a defined function, but such an organism would likely be highly vulnerable outside the laboratory to disruption of vital sequences through insertion of parasitic mobile elements.
And to conclude, a biopolyversical rumination:
There are cases of genomes expanding
Into sizes large and outstanding
Yet interactomes still show
That parsimony will grow
Via selective pressures demanding
References & Details
(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).
‘They have small and very compact genomes, with minimal intergenic spaces and few introns.’ In cases where conventional bacteria have introns, they are frequently ‘Group I’ introns in tRNA genes, which are removed from primary RNA transcripts by self-splicing mechanisms. The ‘third domain of life’, the Archaeal prokaryotes, have tRNA introns which are removed via protein catalysts. See Tocchini-Valentini et al. 2015.
‘….their replication times are typically very short under optimal conditions….’ E. coli can replicate in about 20 minutes in rich media, for example. But not all prokaryotes are this speedy, notably some important pathogens. Mycobacterial doubling times are on the order of 16-24 hr for M. tuberculosis (subject to conditions) or as slow as 14 days for the causative agent of leprosy, M. leprae. For an analysis of the genetics of fast or slow growth in mycobacteria, see Beste et al. 2009. For much detail on Mycobacterium leprae, see this site.
‘ A major factor for the evolution of prokaryotic organisms is their typically very large population size……’ For excellent discussion of these issues, see work from the lab of Michael Lynch, as in Lynch & Conery 2003.
‘……Mitochondria …… entered into an eventual partnership with progenitors of eukaryotic cells, and in the process underwent massive genomic reduction….’ Human mitochondrial genomes encode only 13 proteins. For a general and very detailed discussion of such issues, see Nick Lane’s excellent book, Power, Sex, Suicide (Oxford University Press, 2005).
‘The energetic contribution of mitochondria enabled much larger cells, with concomitant much larger genomes.’ In the words of the famed bio-blogger PZ Myers, ‘a big sloppy genome’ [a post commenting on the hypothesis of Lane & Martin 2010].
‘….complex organisms, including Homo sapiens, show what seems at first glance to be a surprisingly low count of protein-coding genes.’ See (for example) the Ensembl genomic database.
‘…..closely related organisms can have markedly different genome sizes.’ See Doolittle 2013.
‘….even if the number of functional RNA genes was twice the size of the protein-coding set, the net genome size would still be much larger than required.’ The study of Xu et al. 2006 provides (in Supplementary Tables) the striking contrast between the estimated % of coding sequences and genome sizes for a range of prokaryotes and eukaryotes. Although slightly dated in terms of current gene counts, the low ratios of coding sequences in most of the sampled eukaryotes (especially mammals) would stand even if doubled. By the same token, with prokaryotes, a direct correlation exists between coding DNA and genome size, but this relationship breaks down for eukaryotes above a certain genome size (0.01 Gb, where the haploid human genome is about 3 Gb; see Metcalfe & Casane 2013).
‘….the proponents of the functional-RNA refutation of junk DNA have pointed to the evident transcription of most if not all of complex vertebrate genomes…..’ The ENCODE project ignited much controversy by asserting that the notion of junk DNA was no longer valid, based on transcriptional and other data. (See Djebali et al. 2012; ENCODE Project Consortium 2012). The ‘junk as bunk’ proposal has itself been comprehensively debunked by Doolittle (2013) and Graur et al. 2013.
‘….. this assertion [widely encompassing genomic transcription] has been seriously challenged as based on inadequate evidence.’ See Van Bakel et al. 2010.
‘…..skeletal DNA hypothesis, as largely promulgated by Tim Cavalier-Smith….’ See Cavalier-Smith 2005.
‘…..this concept (originally put forward by Claudiu Bandea) …..’ See a relevant online Bandea publication.
‘…..shielding against genomic parasitism is of significance for multicellular organisms…..’ Regardless of the veracity of the Bandea hypothesis, a variety of genomic mechanisms for protection from parasitic transposable elements have evolved; see Bandea once more.
‘ Where DNA segments are under positive selection but not in a sequence-specific manner, the tracts involved have been termed ‘indifferent DNA’…..’ See Graur et al. 2013.
‘….a strong case has been made by Michael Lynch and colleagues for non-adaptive changes in genome size….’ See Lynch 2007.
‘…..molecular biology has allowed the directed reduction of significant sections of certain bacterial genomes ….’ For work on genome reduction in E. coli, see Kolisnychenko et al. 2002; Pósfai et al. 2006. For analogous work on a Pseudomonas species, see Lieder et al. 2015. The Venter group has (famously) worked on synthetic genomes, which allow the most direct way of establishing the minimal genome for a prokaryotic organism. With respect to this, see Gibson et al. 2010.
‘…birds and bats. Such organisms are noted collectively for their significantly smaller genomes in comparison to other vertebrates.‘ For avian genomes, see Zhang et al. 2014; for bats, see Smith & Gregory 2009. ‘…small-genome / flight correlation has even been proposed for long-extinct ancient pterosaurs’ See Organ & Shedlock, 2009. In the Smith & Gregory study it was found that ‘megabats’ (larger, typically fruit-eating bats lacking sonar) are even more constrained in terms of genomic size than microbats.
‘ In the case of birds, genome size reduction has been assigned……’ For details in this area, see Zhang et al. 2014.
‘…..evidence of a negative correlation between genome size and metabolic rate …..A measure of oxidative metabolic rate is the ‘heart index’…..’ See Vinogradov & Anatskaya 2006.
‘…highly active fliers with large relative flight muscle quantities tended to have smaller genomes than more sedate fliers. ‘ See Wright et al. 2014.
‘…hummingbirds (powerhouses of high-energy hovering flight) having the smallest genomes of all birds…’ See Gregory et al. 2009.
‘…..the Indian muntjac genome is reduced in total size by about 22% relative to Chinese muntjacs…..’ The Indian muntjac genome is about 2.17 Gb; the Chinese muntjac genome is about 2.78 Gb. See Zhou et al. 2006; Tsipouri et al. 2008.
‘……much more genomic sequence is required to build lungfishes, onions, and many plants than human beings.’ The note regarding onions comes from T. Ryan Gregory (cited as a personal communication by Graur et al. 2013). For lungfish and many other animal genome sizes, see a comprehensive database (overseen by T.R. Gregory). For plant genomes, see another useful database.
‘…… ‘junk’ may be co-opted during evolution for a true functional purpose, as with the full ‘domestication’ of otherwise parasitic mobile elements……’ See Hua-Van et al. 2011.
‘…. because some mobile element residues have become domesticated, it does not at all follow that all such sequences are likewise functional.’ This point has been emphasized by Doolittle 2013.
‘…..a state known as polyploidy…….’ For an excellent review on many aspects of polyploidy, see Comai 2005.
‘……a correlation between genomic size and their percent constitution by such [mobile] elements has been noted.‘ See Metcalfe & Casane 2013.
‘…..has been suggested …….. very large genomes with a high level of transposable elements appear to be deleterious …… in other circumstances ……a high load of transposable elements may be tolerated….’ See Metcalfe & Casane 2013.
‘……documented evidence indicating a global trend in evolution towards genome reduction….’ | ‘…..it has been proposed that the overall picture is biphasic. Periods of genomic expansion in specific lineages are ‘punctuated’ not by stasis (as the original general concept of ‘punctuated equilibrium’ proposed) but with slow reduction in genomic sizes. ‘ See Wolf & Koonin 2013. For a background on the theory of punctuated equilibrium, see Gould & Eldredge 1993.
‘…..human genome-reduction project (actually rendered much more feasible by the advent of new genome-editing techniques)……’ There is so much to say about these developments (including zinc finger nucleases, TALENs, and in particular CRISPR-Cas technology) that it will form the subject of a future post.
‘ Ensembl Dec 2013 release‘ (Table). See the Ensembl database site.
‘ These patterns of interactions comprise an organism’s interactome.’ Note here that the term ‘interactome’ can be used in a global sense, or for a specific macromolecule. Thus, a study might refer to the ‘interactome of Protein X’, in reference to the sum total of interactions involving Protein X in a specific organism.
Next post: September.
Sometimes Biopolyverse has considered aspects of life which may be generalizable, such as molecular alphabets. This post takes a look at another aspect of complex life which is universal on this planet, and unlikely to be escapable by any complex biology. The central theme is the observation that the fundamental processes of life have an underlying special kind of economy, which may be termed biological parsimony. Owing to its scope and diversity, this will be the first of a series dealing with this issue. Here, we will look at the general notion of parsimony in a biological context, and begin to consider why such arrangements should be the rule. Some biological phenomena would seem to challenge the parsimony concept, and in this initial post we will look at certain features of the protein universe in this respect.
In the post of January 2014, the role of biological parsimony in the generation of complexity was briefly referred to. The fundamental issue here concerns how a limited number of genes could give rise to massively complex organisms, by means of processes that shuffle and redeploy various functional components. Thus, the ‘thrifty’ or parsimonious nature of biological systems is effectively enabled by the modularity of a basic ‘parts list’. A modest aphorism could thus state:
“Parsimony is enabled by Modularity; Modularity is the partner of Parsimony”
The most basic example of modularity in biology can be found with molecular alphabets, which were considered in a recent post. Generation of macromolecules from linear sequence combinations of subunits from a distinct and relatively small set (an ‘alphabet’ in this context) has a clear modular aspect. Subunit ‘letters’ of an alphabet can be rearranged in a vast number of different strings, and it is this simple principle which gives biological alphabets immense power as a fundamental tool underlying biological complexity.
This and several other higher-level modular aspects of biological systems are outlined in Table 1 below.
Table 1. Major Levels of Biological Modularity.
- Molecular alphabets: For an extended discussion of this theme, see a previous post. The modularity of any alphabet is implicit in its ability to generate extremely large numbers of strings of variable length, with specific sequences of the alphabetic ‘letters’.
- Small molecular scaffolds: Small molecules have vital roles in a wide variety of biological processes, including metabolic, synthetic, and regulatory activities. In numerous cases, distinct small biomolecules share common molecular frameworks, or scaffolds. The example given here (perhydrocyclopentanophenanthrene skeleton) is the core structure for cholesterol, sex hormones, cardiac glycosides, and steroids such as cortisone.
- Protein folds: Although a large number of distinct protein folds are known, some in particular have been ‘used’ by evolution for a variety of functions. The triosephosphate isomerase (TIM) (βα)8-barrel fold (noted as the example in the above Table) has been described as the structural core of >170 encoded proteins in the human genome alone.
- Alternate splicing / differential intron & exon usage: The seemingly low number of protein-encoding genes in the human genome is substantially boosted by alternate forms of the splicing together of exonic (actual coding) sequence segments from single primary transcripts. This can occur by skipping or incorporation of specific exons. Also, the phenomenon of intron retention is another means of extending the functionality of primary transcripts.
- Alternate / multiple promoters: Many gene products are expressed in different tissues or different developmental stages in multicellular organisms. This is often achieved through single promoters subject to differential activating or repressing influences, such as varying transcription factors, or negative regulation through microRNAs (miRNAs). Another way of extending the versatility of a single core gene is seen where more than one promoter (sometimes many) lies upstream of a core coding sequence. With this arrangement, the regulatory sequence influences on each promoter can be clearly demarcated, and transcripts from each alternate promoter can be combined with alternate splicing mechanisms (as above), often with the expression of promoter-specific 5’ upstream exons. A classic example of this configuration is found with the microphthalmia gene (MITF), which has many isoforms through alternate promoters and other mechanisms.
- Recombinational Segments: As a means of increasing diversity with a limited set of genomic sequences, in specific cell lineages recombinational mechanisms can allow a combinatorial assortment of specific coding segments to produce a large number of variants. The modularity of such genetic sequences in these circumstances is obvious, and is a key feature of the generation of diversity by the vertebrate adaptive immune system.
- Protein complex subunits: Protein-protein interactions are fundamental to biological organization. There are many precedents for complexes made up of multiple protein subunits having distinct compositions in different circumstances. Thus, a single stimulus can signal very different results in different cellular backgrounds, associated with different protein complexes being involved in their respective signaling pathways. Enzymatic complexes, such as those involved in DNA repair, can also show subunit-based modularity.
- Cells: From a single fertilized zygote, multicellular organisms of a stunning range of shapes and forms can be grown, based on differentiation and morphological organization. Thus, cellular units can be considered a very basic form of biological modularity.
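The combinatorial power underlying several of these modularity levels is easy to quantify. A sketch of the recombinational-segment case, using illustrative segment counts loosely modeled on antibody heavy-chain V(D)J assembly (the exact counts vary by species and locus, so these are assumptions for illustration only):

```python
# Combinatorial assortment of recombinational segments: the number of distinct
# assembled sequences is the product of the available segment counts.
# Segment counts below are illustrative, not exact figures for any locus.
v_segments, d_segments, j_segments = 40, 25, 6

heavy_chain_combinations = v_segments * d_segments * j_segments
print(heavy_chain_combinations)  # 6000 distinct combinations from only 71 segments
```

A modest genomic investment (71 segments) thus yields thousands of distinct products, before junctional diversity or pairing with light chains multiplies the total further.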
Discussion of both small molecule and macromolecular instances of modularity / parsimony will be extended in succeeding posts.
Some of these modularity levels are interlinked in various ways. For example, the evolutionary development of modular TIM barrels may have been enhanced by alternate splicing mechanisms. Indeed, the latter process may be of general evolutionary importance, particularly in the context of gene duplications. In such circumstances, one gene copy can evolve novel functions (neofunctionalization), sometimes associated with the use of alternate splice variation.
* Certainly this Table is not intended to be comprehensive with respect to modularity mechanisms, but illustrates some major instances as pertinent examples.
When a person is referred to as ‘parsimonious’, there are often connotations of miserliness, or a suggestion that the individual in question is something of a skinflint. In a biological context, on the other hand, the label of parsimony is nothing but a virtue, since it is closely associated with the efficiency of the overall biological system.
Pathways to Parsimony
When modular components can be assembled in different ways for different functions, the outcome is by definition more parsimonious than producing distinct functional forms for each task. An alphabetic system underlies the most fundamental level of parsimony, but numerous high-order levels of parsimonious assembly can also exist, as Table 1 indicates.
Evolution itself is highly conducive to parsimony, simply owing to the fact that multiple functional molecular forms can be traced back to a common ancestor which has diversified and branched through many replicative generations. As noted in the footnotes to Table 1, gene duplication (or even genome duplication) is a major means by which protein evolution can occur, via the development of functional variants in the ‘spare’ gene copies. It is the ‘tinkering’ nature of evolution which produces a much higher probability that pre-existing structures will be co-opted into new roles than entirely novel structures developed.
But there is a second evolutionary consideration in the context of biological parsimony, and that is where bio-economies, or bioenergetics, comes to the forefront. Where biosystems are in replicative competition, it is logical to assume that a system with the most efficient means of copying itself will predominate over rivals with relatively inferior processes. And the copying mechanism will be underwritten by the entire metabolic and synthetic processes used by the biosystem in question. Efficiency will thus depend on how streamlined the biosystem energy budget can be rendered, and the most parsimonious solutions to these questions will thus be evolutionarily favored.
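The logic of this argument can be made concrete with a toy model (entirely hypothetical parameters): two replicators drawing on comparable resources, where the lineage with the cheaper copying cost compounds its advantage each generation:

```python
# Toy model of replicative competition: each lineage grows at a rate inversely
# related to its per-copy energetic cost. Parameters are illustrative only,
# not derived from any real biosystem.
def compete(cost_a, cost_b, generations=50):
    """Return the final frequency of lineage A after competing with lineage B."""
    pop_a, pop_b = 1.0, 1.0
    for _ in range(generations):
        pop_a *= 1.0 + 1.0 / cost_a
        pop_b *= 1.0 + 1.0 / cost_b
    return pop_a / (pop_a + pop_b)

# A modest 10% cost advantage (9 vs 10 energy units per copy)...
freq_a = compete(cost_a=9.0, cost_b=10.0)
# ...translates into clear dominance after 50 generations.
assert freq_a > 0.5
print(f"Lineage A frequency after 50 generations: {freq_a:.2f}")
```

The design point is that even small efficiency differences compound multiplicatively across generations, which is why streamlined (parsimonious) energy budgets tend to be evolutionarily favored.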
If evolution is a truly universal biological feature (as postulated within many definitions of life) then bioparsimony is accordingly highly likely to be a universally observed principle in any biological system anywhere in the universe.
Counterpoints and Constraints: Protein Folding
Certain observations might seem to run contrary to the proposed fundamental nature of parsimony and modularity in biology. Let’s take protein folding as an initial case in point.
Folds and Evolution
Table 1 highlights the modularity of certain protein folds, but this is certainly not a ubiquitous trait within the protein universe. On the one hand we can cite the instances of specific protein folds which are widespread in nature, fulfilling many different catalytic or structural functions (as with the TIM-barrel fold; Table 1). Yet at the same time, it is true that many folds (>60%) are restricted to one or two functions.
While all proteins may ultimately be traceable back to a very limited set of prototypical forms (if not a universal common ancestor in very early molecular evolution), it appears that some protein folds are much more amenable to evolutionary ‘tinkering’ than others. This has been attributed to structural aspects of certain folds, in particular a property which has been termed ‘polarity’. In this context, polarity essentially refers to a combination of a highly ordered structural scaffold encompassing loop regions whose packing within the total fold is relatively ‘loose’ and amenable to sequence variation.
It follows logically that if mutations in Fold A have a much higher probability of creating novel activities than mutations in Fold B, then variants of Fold A will be more likely to expand evolutionarily (through gene duplication or related mechanisms). Here the TIM-barrel motif is a representative star for the so-called ‘Fold A’ set, which in turn are exhibitors of the polarity property par excellence.
While some natural enzymatic activities are associated with single types of folds, in other cases quite distinct protein folds can mediate the same catalytic processes. (Instances of the latter are known as analogous enzymes). It does not necessarily follow, however, that the absence in nature of an analogous counterpart for any given protein catalyst indicates that an alternative folding solution for that particular catalytic activity is not possible per se. In such circumstances, a potentially viable alternative structure (another polypeptide sequence with a novel fold constituting the potential analogous enzyme) has simply never arisen through lack of suitable evolutionary antecedents.
By their nature, the blind processes of natural selection on a molecular scale will favor certain protein folds simply by virtue of their amenability to innovation. If every catalytic or structural task could be competitively fulfilled by only a handful of folds, the protein folding universe would likely show much less diversity than is noted in extant biology. Evolution of novel folds will be favored when they are more efficient for specific tasks than existing structures. All of this is underpinned by the remarkable parsimony of the protein alphabet, especially when one reflects upon the fact that an astronomical number of possible sequences can be obtained with a linear string of amino acids corresponding to even a small protein.
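The ‘astronomical number’ point is simple arithmetic: with a 20-letter amino acid alphabet, a chain of length n admits 20^n sequences. For even a small protein of 100 residues:

```python
# Size of protein sequence space for a chain of length n over a 20-letter alphabet.
def sequence_space(n, alphabet_size=20):
    return alphabet_size ** n

small_protein = sequence_space(100)   # 20**100, roughly 1.3e130
atoms_in_universe = 10 ** 80          # a common rough estimate

# Sequence space for even a small protein dwarfs the number of atoms in the
# observable universe; evolution can only ever sample a vanishing fraction of it.
assert small_protein > atoms_in_universe
print(f"20^100 ~ {small_protein:.2e}")
```

This is why evolutionary ‘tinkering’ with pre-existing folds, rather than de novo search of sequence space, is the overwhelmingly more probable route to new function.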
Parsimony and Necessity
Although so far this musing on parsimony and modularity has barely scratched the surface of the topic as a whole, at this point we can round off this post by considering briefly why parsimonious bio-economies should be so ubiquitously observed.
Some aspects of biology which inherently invoke parsimony may be in themselves fundamentally necessary for any biological system development. For example, molecular alphabets appear to be essential for biology in general, as argued in a previous post. Likewise, while construction of complex macroscopic organisms from a relatively small set of cell types, themselves differentiated from a single zygote, can be viewed as a highly parsimonious system, there may be no other feasible evolutionary pathway which can produce comparable functional results.
But, as indicated by the above discussion of protein folds, other cases may not be quite so clear-cut, and require further analysis. Complex trade-offs may be involved, as with the factors determining genome sizes, which we will address in the succeeding post.
Evolutionary selection for energetic efficiency is surely a contributing factor to the trend towards biological parsimony, as also noted above. But apart from bioenergetics, one might propose factors in favor of parsimony which relate to the informational content of a cell. Thus, if every functional role required for all cellular activities (replication in particular) were represented by a completely distinct protein or RNA species, it could be speculated that the resulting scale-up of complexity would place additional constraints on functional viability. A great increase in the number of molecular functional mediators would likely be accompanied by a corresponding increase in deleterious cross-interactions, solutions for which might be difficult to obtain evolutionarily. Of course, such a ‘monomolecular function’ biosystem would be unlikely to arise in the first place, when competing against more thrifty alternatives. The latter would tend to differentially thrive through reduced energetic demands, if not more ready solutions to efficient interactomes. Consequently, it probably comes down to bioenergetics once more, if a little more indirectly.
Finally, a bio-polyverse salute to the so-called parsimony principle in biology:
Evolution can tinker with bits
In ‘designing’ selectable hits
Is a route to creation
Thus parsimony works, and it fits.
References & Details
(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).
Some of the issues covered in this post were considered in the free supplementary material for Searching for Molecular Solutions, in the entry: SMS-Extras for Ch. 9 (Under the title of Biological Thrift).
Table 1 Footnote references:
‘ Small molecules have vital roles in a wide variety of biological processes……’ See the above supplementary downloadable material (Searching for Molecular Solutions –Chapter 9).
‘…..The triosephosphate isomerase (TIM) (βα)8-barrel fold is known as the structural core of >170 encoded proteins….’ See Ochoa-Leyva et al. 2013. Additional folds accommodating diverse functions are noted in Osadchy & Kolodny 2011.
‘ The modularity of such genetic sequences in these circumstances is obvious, and is a key feature of the generation of diversity by the vertebrate adaptive immune system.’ For a general and search-accessible overview of immune systems, see the text Immunobiology 5th Edition. For an interesting recent hypothesis on the origin of vertebrate adaptive immunity, see Muraille 2014.
‘…..a single stimulus can signal very different results in different cellular backgrounds….’ / ‘ Enzymatic complexes, such as those involved in DNA repair, can also show subunit-based modularity.’ To be continued and expanded in a subsequent post with respect to parsimony involving proteins and their functions.
‘…..the evolutionary development of modular TIM barrels may have been enhanced by alternate splicing mechanisms.’ See Ochoa-Leyva et al. 2013.
‘….the latter process [alternate splicing] may be of general evolutionary importance, particularly in the context of gene duplications…..’ See Lambert et al. 2015.
‘ If evolution is truly universal (as postulated within many definitions of life) …..’ See Cleland & Chyba 2002.
‘…..some natural enzymatic activities are associated with single types of folds…’ An example is dihydrofolate reductase (cited also in Tóth-Petróczy & Tawfik 2014), the enzymatic activity of which is mediated by a fold not used by any other known biological catalysts.
‘….in other cases quite distinct protein folds can mediate the same catalytic processes. (Instances of the latter are known as analogous enzymes).’ See Omelchenko et al. 2010.
‘…..an astronomical number of possible sequences can be obtained with a linear string of amino acids corresponding to even a small protein.‘ See an earlier post for more detail on this.
‘…..molecular alphabets appear to be essential for biology in general…..’ See also Dunn 2013.
Next Post: August.
Many prior biopolyverse posts have concerned evolutionary themes, either directly or indirectly. In the present offering, we consider in more detail factors which may limit what evolutionary processes can ‘deliver’. More to the point, are there biological structures which we can conceive, but which could never be produced through evolution, even in principle?
It has been alleged by proponents of so-called ‘Intelligent Design’ (ID) that some features of observable biology are so complex that no intermediate precursor forms can be envisaged in a feasible evolutionary pathway. Of course, the hidden (or not so hidden) agenda with such people is the premise that if natural biological configurations of sufficient complexity exist such that they are truly ‘irreducible’, then one must look to some form of divine intervention to kick things along. In fact, all such ‘irreducibly complex’ examples proffered by such parties have been convincingly demolished by numerous workers with more than a passing familiarity with the mechanism of evolution.
These robust refutations in themselves cannot prove that there is no such thing as a truly evolutionarily irreducible structure in principle. What is needed, then, is not to attempt to find illusory non-evolvable biological examples in the observable biosphere, but to identify holes in the existing functional and structural repertoire as manifested by all living organisms collectively. Biological ‘absences’ could result from two broad possible scenarios: features which are possible, but not present simply due to the contingent nature of evolutionary pathways, and features which have not appeared because there is no feasible route by which they could arise. (Perhaps a third possibility would exist for ID enthusiasts, whereby God had inscrutably chosen not to create any truly irreducible biological prodigies). Of course, deciding between the ‘absent but possible’ and ‘absent and never feasible’ alternatives is not always going to be simple, if indeed it ever is.
The Greatest Show On Any Planet
Richard Dawkins has called it the Greatest Show on Earth. Sean Carroll used words of Darwin himself, “endless forms most beautiful”. These and many other authors have been struck by the incredibly diverse array of living creatures found in a huge variety of terrestrial environments. With the great insights triggered by the labors of Darwin and Wallace, all of this biological wonder can be seen as having been shaped and molded by the blind and cumulative hand of natural selection. And once understood, selective processes can be seen to operate in a universal sense, from single molecules to the most complex arrangements of matter, as long as each entity possesses the means for its own replication. It is for this reason that Darwinian evolution has been proposed as a universal hallmark of life anywhere, whatever form its replicative essence may take. While there may be few things which are truly universal in a biological sense (see a previous post for the view that molecular alphabets are one such case in point), it is hard to escape the conclusion that change through evolution and life go hand-in-hand, no matter what form such life may take.
So do the ‘endless’ outpourings of biological design innovation ever reach some kind of end-point? There is a classic example that can be considered at this point.
It has often been claimed that a truly human invention unrepresented in nature is the wheel, and this absence has been proposed as a possible true case of ‘irreducible complexity’. At the molecular level, however, wheel-like structures have been documented. Three such cases are known, all rotary molecular motors: the bacterial flagellum, and the two rotary motor components of ATP synthase (F1 and Fo). Remarkable as the latter structures are, it is of course the macroscopic level that people have had in mind when contemplating the apparently wheel-less natural world.
It will be instructive to make a brief diversion to consider what constraints might operate for a biological wheel design on a macroscale, and their general implications for the selection of complex systems. We can refer to a hypothetical macroscopic wheel-organ in a biological organism as a ‘macrobiowheel’, to distinguish it from true molecular-level rotary wheel-like systems. Although beyond the molecular scale, such an organ need not be large, and could in principle be associated with any multicellular animal. Such a postulated biological wheel structure could be used for locomotion in either terrestrial or aquatic environments, using rolling or propeller motion, respectively.
First there is a pseudo-example which should be noted. The animal phylum Rotifera encompasses the set of multicellular though microscopic ‘wheel animalcules’, rotifers, which superficially are characterized by a wheel-like locomotory organ in their aquatic environments. In fact, these ‘wheels’ are an illusory effect created by the sweeping motion of rings of cilia, and thus need not be considered further for the present purposes. Wheels of biological origin that can be unambiguously confirmed with the naked eye (or even a simple microscope) are thus conspicuous by their absence. Is this mere contingency, or strict necessity?
Re-inventing the Wheel
Let’s consider what would be required to construct a macrobiowheel. Firstly, one would have to define what physical features are required – is the wheel structure analogous to bone or other biological organs composed of hard inorganic materials? The problem of how blood vessels and nerves could cross the gap between an axle and a wheel hub has been raised as a seemingly insurmountable constraint – but with some imagination potential solutions could be conceived. For example, the axle and bearings could be bathed in a very narrow fluid-filled gap, where vessels on the other side of the gap take up nutrients and transport them to the rest of the living wheel structure (a heart-like pump within the wheel might be required to ensure the efficiency of this, depending on the size of the hypothetical animal). Transmission of nerve signals might be more problematic; perhaps the macrobiowheel could be insensate, although this would presumably be a disadvantage. Conceivably, the same fluid-filled gap could also act as a ‘giant synapse’ for nerve transmission, such that perception of the state of the wheel structure is received as a continuous whole, without discrimination as to specific local wheel regions. (This would thus alert an organism to a problem with its macrobiowheel organ without specifying which particular part is involved; a better arrangement than no information at all). Another possibility is the use of perturbations in local electric fields as a ‘remote’ sensing device, as used by a variety of animals, including the Australian platypus. The rotational motion for the ‘drive axle’ might be obtained from successive linear muscle-powered movements of structures coupled to the axle by gear-like projections.
No doubt much more could be said on this particular theme, but that will be unnecessary. The issue here is not to indulge in wild speculation, but to make the point that it is uncertain whether a biowheel at the macro-level is an impossibility from a biological systems viewpoint alone. So perhaps we could be so bold as to claim that with sufficient ingenuity of design, a true macrobiowheel could be assembled in a functional manner. But having acknowledged this, the formal possibility that a macrobiowheel could exist is not at all the same thing as the question of whether a feasible pathway could be envisaged for such a structure to emerge in terrestrial animals by natural selection. The potential problems to be addressed are (1) too large a jump in evolutionary ‘design space’ (across a fitness landscape) is required; (2) [along with (1)] no selective advantage of intermediate forms is apparent; (3) [along with (1) and (2)] the energy requirements for the system may be unfavorable compared with alternate designs such as conventional vertebrate limbs (consider the problem, as noted above, of the isolation of the macrobiowheel circulatory system from that of the rest of the organism).
The first problem, the ‘design space jump’ conundrum, implicitly states that a macromutation producing a functional macrobiowheel would be a practical impossibility. In the brief speculation as to how such a biological wheel might be constructed, it is quite clear that multiple novel processes would be required; the macrobiowheel would need to be supported by multiple novel subsystems. Where a macromutation producing any one such subsystem is exceedingly improbable, the chances of the entire package emerging at once are effectively zero. So it is one thing to design a complete and optimized macrobiowheel; to propose a pathway for evolutionary acquisition of this exotic feature we must also rationalize ‘intermediate’ structures with positive fitness attributes for the organism. Thus even if one of the postulated macromutations should amazingly appear, it would be useless for an evolutionary pathway leading to macrobiowheels unless a fitness advantage is conferred. (As always, natural selection cannot anticipate any potential advantage down the line, but adaptations selected for one function may be co-opted for other functions later in evolutionary time). A depiction of the constraints on evolution of macrobiowheels is presented in Fig. 1 below.
Fig. 1. Representations of fitness landscapes for evolution of a locomotion system for a multicellular organism. Here the vertical axes denote relative fitness of expressed phenotypes; different peaks represent distinct genotypes. In all cases, dotted lines indicate ‘moves’ to novel genotypes that are highly improbable (gray) or proscribed through transition to a state of reduced fitness (red). A, In this landscape it is assumed that a macrobiowheel is inherently biologically possible. In other words, for present purposes it is taken that there exists a genotype from which a macrobiowheel can be expressed as a functional phenotype. Yet such a genotype may not be accessible through evolutionary processes. The conclusion of A is that even though a biological construct corresponding to a macrobiowheel is possible, it cannot feasibly arise naturally, since it is effectively impossible to cross an intervening fitness ‘valley’ in a single jump (A to X; gray dotted line), and transition to intermediate forms cannot occur through their lowered fitness relative to any feasible starting point (A to B or C (gray-shaded); red dotted line). In turn, transitions from B or C to peak X (purple dotted lines) cannot take place. It is also implicit in this schema that no other feasible pathway to configurations B or C exists. Thus, configuration (genotype) X is a true case of unattainable or ‘irreducible’ complexity. B, Depiction of a conventional evolutionary pathway whereby the same starting point as in (A) transitions to an improved locomotory arrangement through intermediate forms conferring fitness benefits.
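The ‘design space jump’ argument can be put in rough numerical terms. A minimal sketch, with entirely hypothetical probabilities and subsystem counts (the post itself gives no such figures):

```python
# Hypothetical back-of-envelope calculation: if a functional macrobiowheel
# required k independent novel subsystems, each arising by a macromutation
# with per-generation probability p, the chance of the whole package
# appearing at once is p**k (assuming independence).
p = 1e-9   # assumed probability for any one subsystem macromutation
k = 5      # assumed number of required novel subsystems
joint = p ** k
print(f"{joint:.1e}")  # 1.0e-45 per generation: effectively zero
```

Whatever values one plugs in, the multiplicative collapse of the joint probability is the point: a single coordinated jump across the fitness valley is not a serious evolutionary proposition.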
So, ‘true irreducibility’ can result in principle from an inability to create intermediate steps, from universal pre-commitment to an alternative design, or finally from an absolute incapacity to biologically support the proposed function. Also, the likelihood of a biological innovation acting as a fitness advantage is fundamentally dependent on the nature of the environment. Thus, with respect to our macrobiowheel musings, it has been pointed out that an absence of roads might counter any tendency for wheel-based locomotion to arise. It is not clear, though, whether an organism dwelling in an environment characterized by flat plains might benefit from wheel mobility, and in any case this issue is not relevant to macroscopic aquatic organisms and hypothetical wheel-like ‘biopropellers’ driven by rotary motion (as opposed to micro-scale real rotary bacterial flagella).
A Very Indirect Biological Route to Crossing Fitness Valleys
In a previous post concerning synthetic biology, it has already been noted that human ambitions for tinkering with biological molecules need not suffer from the same types of limitations which circumscribe the natural world. ….. So if a macrobiowheel is compatible with biological systems at all, humans with advanced biotechnologies could then in principle design and construct such a system. Such circumstances are schematically depicted in Fig. 2.
Fig. 2. Potential role of human intervention in the generation of ‘unevolvable’ biological systems, as exemplified here with macrobiowheels. Here the natural fitness landscape of Fig. 1 (orange trace) has superimposed upon it peaks corresponding to biological constructs of human origin. Since the human synthetic biological approach circumvents loss of low-fitness forms through reproductive competition*, ‘intermediate’ forms all are depicted here as having equal fitness. Thus, by human agency, intermediate forms B and C can be used as synthetic stepping stones towards the final (macrobiowheel) product, despite their non-viability under natural conditions (Fig. 1). Alternatively, should it be feasible at both the design and synthetic levels, ‘direct’ assembly of a genome expressing the macrobiowheel structure might be attainable (direct arrow to the ‘X’ peak).
*Note that this presupposes that completely rational design could be instituted, although in reality artificial evolutionary processes might be used to achieve the desired results. But certainly no third-party competitors would be involved here.
Construction of a macrobiowheel would serve to validate the hypothesis that such an entity is biologically possible. Also, demonstration of a final functional wheel-organ would greatly facilitate analysis of what pathways would have to be followed if an equivalent structure were to evolve naturally. This would then consolidate the viewpoint that a true macrobiowheel is indeed biologically irreducibly complex. But since other structures and pathways might still exist, it would not serve as formal proof of the irreducibility stance in this case.
The ‘human agency’ inset of Fig. 2 has itself evolved from biological origins, just as for any other selectable attribute. Therefore, from a broad viewpoint, a biological development (human intelligence) can in itself afford an unprecedented pathway for the crossing of fitness valleys which otherwise would be naturally insurmountable. So whether we are speaking of exotica such as macrobiowheels or any other biological structures with truly ‘irreducible complexity’, then their existence could in principle be realized at some future time through the agency of advanced human synthetic biology. And given the current pace of scientific change, such times may arrive much sooner than many might believe.
Finally, we leave this theme with a relevant biopoly(verse) offering:
Biological paths may reveal
What evolution can thus make real
Yet beyond such constraints
And purist complaints
Could we make a true bio-based wheel?
References & Details
‘….proponents of so-called ‘Intelligent Design….’ The ‘poster boy’ of ID is quite probably Michael Behe, of LeHigh University and the Discovery Institute. He is the author of Darwin’s Black Box – The Biochemical Challenge to Evolution (Free Press, 1996), and more recently The Edge of Evolution – The Search for the Limits of Darwinism (Simon & Schuster 2008).
‘…..all such ‘irreducibly complex’ examples proffered by such parties have been convincingly demolished…’ See Zuckerkandl 2006; also a National Academy of Sciences publication by a group of eminent biologists.
‘……a third possibility would exist for ID enthusiasts…..’ A personal perspective: A religious fundamentalist once asked me why there are no three-legged animals; he seemed to somehow think that their absence was evidence against evolution. Of course, the shoe is definitely on the other foot in this respect. If God created low-fitness animal forms that prevailed (among which tripedal animals would likely be included), or fabulous creatures without any conceivable evolutionary precursors, then that in itself would be counted as ID evidence.
‘ Richard Dawkins has called it the Greatest Show on Earth.’ This refers to his book, The Greatest Show on Earth: The Evidence for Evolution. Free Press (2010).
‘Sean Carroll used words of Darwin himself, “endless forms most beautiful”.’ The renowned developmental biologist Sean Carroll published a popular book entitled Endless Forms Most Beautiful – The New Science of Evo Devo, which gives a wonderful overview of the field of evolutionary development, or how the development of multicellular organisms from single cells to adult forms has been shaped by evolution. Darwin referred to “endless forms most beautiful” in the final section of The Origin of Species.
‘….the blind and cumulative hand of natural selection.’ This is not to say that the complete structure of biological entities, from genome to adult phenotype, is entirely a product of classical natural selection, but the latter process is of prime significance. For a very informative discussion of some of these issues, and the influence of non-adaptive factors in evolution, see Lynch 2007.
‘……Darwinian evolution has been proposed as a universal hallmark of life anywhere….’ For a cogent discussion of the NASA ‘evolutionary’ definition and related issues, see Benner 2010.
‘……the wheel, and this absence has been proposed as a possible true case of ‘irreducible complexity’ ‘ See Richard Dawkins’ The God Delusion, Bantam Press (2006).
‘….animal phylum Rotifera……..’ See Baqui et al. (2000) for their rotifer site, which provides much general information and further references.
‘…….how blood vessels and nerves could cross the gap between an axle and a wheel hub has been raised as a seeming insurmountable constraint……’ | ‘….an absence of roads might counter any tendency for wheel-based locomotion to arise…..’ See again Dawkins’ The God Delusion, Bantam Press (2006).
‘……..the use of perturbations in local electric fields as a ‘remote’ sensing device, as used by a variety of animals, including the Australian platypus.’ For more background on electroreception, especially in the platypus, see Pettigrew 1999, and Pedraja et al. 2014.
‘……could exist is not at all the same thing as the question of whether a feasible pathway could be envisaged for such a structure to emerge ……. by natural selection.’ For an extension of this theme at the functional RNA level, see Dunn 2011.
‘ Fig. 1. Representations of fitness landscapes…..’ Further discussion of evolutionary problems in surmounting fitness valleys can be found in Dunn 2009. The title of Dawkins’ book Climbing Mount Improbable (1997; W. W. Norton & Co) is in itself a fine metaphor for how cumulative selectable change can result in exquisite evolutionary ‘designs’, which of course is the major theme of the book.
‘……advanced human synthetic biology….’ The ongoing role of synthetic biology in testing a variety of possible biological scenarios was also discussed in a previous post under the umbrella term of ‘Kon-Tiki’ experiments.
Next Post: April.
Previous posts from biopolyverse have grappled with the question of biological complexity (for example, see the post of January 2014). In addition, the immediate predecessor to the current post (April 2014) discussed the essential role of molecular alphabets in allowing the evolution of macromolecules, themselves a necessary precondition for the complexity requirements underlying functional biology as we understand it. Yet although molecular alphabets enable very large molecules to become the springboard for biological systems, another often overlooked factor in their synthesis exists, and that is the theme of the present post.
Initially, it will be useful to consider some aspects of the limitations on molecular size in living organisms.
How Big is Big?
If it is accepted that biological complexity requires molecules of large sizes (as examined in the previous post), what determines the upper limits of such macromolecules? At the most fundamental level of chemistry, ultimately determined by the ability of carbon atoms to form concatenates of indefinite length, no direct constraints on biomolecular size appear to exist. In seeking examples to demonstrate this, we need look no further than the very large single duplex DNA molecules which constitute individual eukaryotic chromosomes. The wheat 3B chromosome is among the largest known of these, with almost a billion base pairs, and a corresponding molecular weight of around 6.6 x 10^11 daltons.
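The quoted molecular weight is easy to check as an order-of-magnitude estimate. A quick sketch, assuming an average mass of roughly 650 daltons per base pair of duplex DNA (a standard rule of thumb, not a figure from this post):

```python
# Order-of-magnitude check of the wheat 3B chromosome molecular weight.
bp_wheat_3B = 1.0e9      # ~1 billion base pairs (approximate)
daltons_per_bp = 650.0   # rough average mass of one DNA base pair
mw = bp_wheat_3B * daltons_per_bp
print(f"~{mw:.1e} daltons")  # ~6.5e+11, consistent with the quoted figure
```

Any reasonable per-base-pair mass in the 600-660 dalton range gives the same order of magnitude, so the figure in the text is self-consistent.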
But in almost all known eukaryotic cases, an individual chromosome does not equate with genome size. In other words, a general rule is that it takes more than one chromosome to constitute even a haploid (single-copy) genome. Why then should not all genomes be composed of a single (very) long DNA string, rather than being constituted from separate chromosomal segments? And why should separate organisms differ so markedly in their chromosome numbers (karyotypes)? At least a part of an answer to this may come down to contingency, where alternative chromosomal arrangements may have been equally effective, but one specific configuration has become arbitrarily fixed during evolution of a given species. But certainly other factors must exist which are connected ultimately to molecular size. A DNA molecule of even ‘average’ chromosomal size in free solution would be an impractical prospect for containment within a cell nucleus of eukaryotic dimensions, unless it was ‘packaged’ in a manner such that its average molecular volume was significantly curtailed. And of course the DNA in natural chromosomes is indeed packaged into specific complexes with various proteins (particularly histones), and to a lesser extent RNA, termed chromatin.
Yet even a good packaging system must have its limits, and in this respect it is likely that selective pressures exist that act as restrictions on the largest chromosomal sizes. An extremely long chromosomal length may eventually reach a point where its functional efficiency is reduced, and organisms bearing such karyotypic configurations would be at a selective disadvantage.
No biological proteins can begin to rival the sheer molecular weights of chromosomal DNA molecules, but once again there is no fundamental law that prevents polypeptide chains from attaining an immense length, purely from a chemical point of view. Of course, proteins (in common with functional single-stranded RNA molecules) have a very significant constraint placed upon them relative to linear DNA duplexes. Biological proteins must fold into specific three-dimensional shapes even to attain solubility, let alone exhibit the astonishing range of functions which they can manifest. This folding is directed by primary amino acid sequence, and this requirement dramatically reduces the number of potentially useful forms which could arise from a polypeptide of even modest length. Yet since the largest proteins (such as titin, considered in the previous post) are composed of a series of joined modules, the ‘module-joining’ could in principle be extended indefinitely to produce proteins of gargantuan size.
So why not? Why aren’t proteins on average even bigger? Here one might recall a saying attributed to Einstein, “Keep things as simple as possible, but no simpler”, and repackage it into an evolutionary context. Although many caveats can be introduced, it is valid to note that evolutionary selection will tend to drive towards the most parsimonious ‘solutions’ to biological imperatives. Thus, the functions performed by proteins are usually satisfied by molecules which are large by the standards of small-molecule organic chemistry, but much smaller than titin-sized giants of nearly 30,000 amino acid residues. A larger version of an existing protein will require an increased energy expenditure for its synthesis, and will therefore be selected against unless it offers a significant counter-balancing advantage over the existing wild-type form.
So selective pressures ultimately deriving from the cellular energy balance-sheet will often favor smaller molecules, if they can successfully compete against larger alternatives. But another factor to note in this context – and this brings us to the major theme of this post – is the sheer time it takes to synthesize an exceedingly large molecule. Clearly, this synthetic time is itself determined by the maximal production rates which can be achieved by biochemical mechanisms available to an organism. Yet even with the most efficient systems, it is inevitable that eventually a molecular size threshold will be crossed where the synthetic time requirement becomes a negative fitness factor. In this logical scenario, a ‘megamolecule’ might provide a real fitness benefit, but lose competitiveness through the time lag required for its synthetic production relative to alternative smaller molecular forms.
These ‘drag’ effects of biosynthetic time requirements are not merely hypothetical, and can be relevant for chromosomal DNA replication, to briefly return to the same example as used above. Although as we have seen, chromosome length and number do not directly equate with genome size, as far as a cell is concerned, it is the entire genome that must be replicated before cell division can proceed. In this respect, it is notable that certain plants have genomes of such size that their genomic replication becomes a significant rate-limiting step in comparison to other related organisms.
Life in the Fast Lane
Let’s consider primordial replicative biosystems (perhaps pre-dating even the RNA World, and certainly the RNA-DNA-Protein World – see a previous post), where the machinery for replication of informational biomolecules is at a rudimentary stage of evolutionary development. In such a case, it can be proposed that an individual biosystem will selectively benefit from mutations in catalysts directing its own replication, where the mutational changes increase the efficiency and rate of replicative synthesis. This simply follows from the supposition that for biosystems A and B replicating in time t, if for one copy of B, n copies of A are made (where n > 1.0), then A systems will eventually predominate. Even values of n only marginally greater than 1 will still have the same end result. In principle, numerous factors could result in an enhancement of this n value, but here we are assuming that a simple increase in replicative rate would do the trick.
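The A-versus-B argument can be sketched numerically. A minimal illustration (the growth factor and interval count are arbitrary; B's replication is normalized to a constant baseline for comparison, which is an assumption of this sketch rather than a detail from the text):

```python
# If A makes n copies for every single copy B makes per interval t,
# A's share of the mixed population approaches 1 for any n > 1,
# however slight the advantage.
def share_of_A(n, intervals):
    """Fraction of the population that is A after the given intervals."""
    a = n ** intervals   # A multiplies n-fold per interval
    b = 1.0              # B's numbers, normalized, stay constant by comparison
    return a / (a + b)

print(round(share_of_A(1.01, 1000), 3))  # 1.0: even a 1% edge suffices
```

This is just compound interest in replicator form: exponential growth guarantees that any persistent rate advantage, no matter how marginal, eventually dominates the population.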
But improved replicative rates could also have an accelerating effect on early biosystem molecular evolution, by enabling the synthesis of larger molecular forms than were previously feasible. This assumes that a slow replication rate for essential biomolecular components of an early ‘living’ system would mean that its upper molecular size limits were much more constrained than for alternative ‘faster’ variants. Such a scenario could arise for any very long molecular concatenate whose replication rate was too slow to be an effective functional member of a simple co-operative molecular system. Faster replication rates would then be in effect enabling factors for increased molecular size, and in turn increased molecular complexity. Fig. 1 depicts this putative effect in two possible modes of operation.
Fig. 1: Proposed effects of enhancement in synthetic rates as enabling factors for increased molecular size and complexity in early biosystems. Increased rates of biosynthesis leading to increased replicative rates in themselves provide a selective advantage (top panel). Yet it can also be considered that an acceleration of synthetic rate potential could also act as an enabling factor for increased potential molecular size, and in turn increasingly complex molecular structures. This might occur through ‘quantum leaps’ (bottom panel, A), where at certain crucial junctures a small rate increase has a large flow-on effect in terms of size enablement, or via a more continuous process (B), where rate increases are always associated with size and complexity enablement. In both cases, though, such effects could not occur indefinitely, owing to an increasing need for regulation of synthetic rates within complex biosystems.
In a very simple replicative system, a single catalyst might determine the replication rate of all its individual components, and accordingly the replication speed of the system as a whole. But increasing catalytic replicative efficiency could become a victim of its own success as system complexity (associated with enhanced reproductive competitiveness) rises. In such cases, the differential replicative rates of different components will determine system efficiency. It is both energetically wasteful and potentially a wrench in the works if system components needed in only a few copies are made at the same level as components needed in hundreds of copies. Clearly, system regulation is needed in such circumstances, and without it, molecular replication enhancement is likely to be detrimental beyond a certain point. This eventuality is schematically depicted in Fig. 2.
Fig. 2: Proposed effect of introduced regulatory sub-systems on sustaining enhanced biosystem replicative rates. This suggests that even at the same replicative speed, a regulated system will be better off than an unregulated one, and that higher speeds may be permitted by tight regulation. But even here, limits apply. A complete absence of regulation would probably be found only in the very earliest of emerging biosystems. In other words, co-evolved regulation is likely to have been a fundamental feature of biosystem synthetic rates, since an imbalance between rates of production of the components of gene expression would be deleterious even in simple systems.
Until this point, we have been considering replication of biosystem molecules in quite simplistic terms. In real systems of a biological nature, functional molecules undergo several levels of processing beyond their basic replicative synthesis. It is appropriate at this point to take a quick look at some of these.
Processing Levels and Biological Synthetic Speed
In even relatively simple bacterial cells, both RNA and protein molecules typically undergo extensive processing, in a variety of ways. And this trend is considerably more pronounced in complex eukaryotes. Although an in-depth discussion of such effects is beyond the scope of the present post, some of them (but by no means all) are listed in Table 1 below.
Table 1. Levels of processing involving primary transcription or translation. These processes can be considered as secondary steps which are required for the complete maturation of biological macromolecules, varying by type and biological circumstances. Where several processing levels are necessary, any one of them is potentially a rate-limiting step for production of the final mature species. It should be noted that while some of these processes are near-universal (such as accurate protein folding following primary polypeptide chain expression), some are restricted to a relatively small subset of biological systems (such as protein splicing via inteins).
One way of enhancing the overall production rates of biological macromolecules bearing modifications after primary transcription and translation is to couple processes together. For protein expression, mRNA transcription and maturation is itself a necessary initial step, and mRNA and protein synthesis are in fact coupled in prokaryotic cells. Where transcription and translation are so linked, a nascent RNA chain can interact with a ribosome for polypeptide translation initiation before transcription is complete.
In contrast, such transcriptional-translational coupling is not found in eukaryotic cells, where mature mRNAs are exported from the nucleus for translation via cytoplasmic ribosomes. Yet examples of ‘process coupling’ can certainly still be uncovered in complex eukaryotes, with a good example being the coupling of primary transcription with the removal of intervening sequences (introns) via splicing mechanisms mediated by the RNA-protein complexes termed spliceosomes.
The sheer complexity of the diverse processing events for macromolecular maturation in known biological systems serves to emphasize the above-noted point that regulation of the replication of biomolecules is far from a luxury: it is an absolute pre-requisite. Before complex biosystems had any prospects of emerging in the first place, at least basic regulatory systems for replicative processes would necessarily have already been in place, in order to allow the smooth ‘meshing of parts’ which is part and parcel of life itself.
Speed Trade-Offs and Regulation
There is certainly more than one way for a replicative system to run off the rails, like a metaphorical speeding locomotive, if increasing replicative rates are not accompanied by regulatory controls. A key factor which will inevitably become highly significant in this context is the replicative error rate, or replicative fidelity. ‘Copying’ at the molecular level would ideally be perfect, but this is no more attainable in an absolute sense than the proverbial perpetual motion machine, and for analogous entropic reasons. Thus, what a biosystem gains on the swings with an accentuated replication rate, it may lose on the roundabouts through reduced replicative accuracy. The problem of fidelity, particularly with the replication of key informational DNA molecules, has been addressed up to a point by the evolution of proof-reading mechanisms (where DNA polymerases possess additional enzymatic capabilities for excising mismatched base-pairs), and DNA repair systems (where damaged DNA is physically restored to its original state, to avoid damage-related errors being passed on with the next replication round). Although such systems might seem obviously beneficial for an organism, there are trade-offs in such situations. Proof-reading may act as a brake on replicative speeds, and also comes at a significant energetic cost.
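The speed/fidelity trade-off can be made concrete with a toy model (my own illustrative construction, not from any cited work): take effective fitness as replication speed multiplied by the probability of copying an entire genome without error.

```python
# Toy trade-off model: fitness = speed x P(error-free copy of the genome).
# With per-base error rate e and genome length L, that probability is
# (1 - e)**L, so small losses of fidelity are punished exponentially.
def fitness(speed, error_rate, genome_length):
    return speed * (1.0 - error_rate) ** genome_length

L = 10_000  # genome length in bases (illustrative)
proofread = fitness(speed=1.0, error_rate=1e-5, genome_length=L)
unchecked = fitness(speed=2.0, error_rate=5e-4, genome_length=L)

# Doubling raw speed does not pay if most copies carry errors:
print(proofread > unchecked)  # → True
```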
The complexities of regulatory needs also dictate that rates at some levels of biological synthesis are less than what could be achieved were the component ‘factories’ to be completely unfettered. A good example of this is the relative rate of translation in prokaryotes vs. eukaryotes, where the latter have a significantly slower rate of protein expression on ribosomes. It is highly likely that a major reason for this is the greater average domain complexity of eukaryotic proteins, which require a concomitantly longer time for correct folding to occur, usually as directed by protein chaperones. A striking confirmation of this, as well as a very useful application, has been to employ mutant ribosomes in E. coli with a slower expression rate. When this was done, significant enhancement of the folding of eukaryotic proteins was observed, to the point where proteins otherwise virtually untranslatable in E. coli could be successfully expressed.
Speed Limits In Force?
How can the rates of biological syntheses be slowed down? In principle, one could envisage a number of ways that this could be achieved. In one such process, the degeneracy of the genetic code (where a single amino acid is specified by more than one codon) has been exploited through evolutionary time as a means for ‘speed control’ in protein synthesis. Degenerate triplet ‘synonymous’ codons differ in the third ‘wobble’ position. For example, the amino acid alanine is specified by four mRNA codons, GCA, GCG, GCC, and GCU. Where synonymous codons in mRNAs are recognized by specific subsets of transfer RNA (tRNA) molecules within the total tRNA group charged with the same amino acid, translational speed can be significantly influenced by the size of the relevant intracellular tRNA pools. To illustrate this in simplified form, consider a specific amino acid X with codons A, B, C, and D, where the relevant tRNA molecules a, b, c, and d exist (such that when charged with the correct amino acid, tRNA-aX, tRNA-bX, tRNA-cX and tRNA-dX are formed). Here we arbitrarily assign tRNA-a and -b as mutually recognizing both the codons A and B, and likewise tRNA-c and -d as mutually recognizing the codons C and D. If the tRNA pools for the latter C and D codons are smaller than those for the A and B codons, then the C / D synonymous codons are ‘slow’ in comparison with A and B. A known determinant of tRNA pool size (and thus in turn codon translational efficiency and speed) is the respective tRNA gene copy number. Thus, in this model, it would be predicted that the gene copy number for (A + B) would be significantly greater than for (C + D). Where there are selectable benefits in slowing down translation rates, the use of ‘slow’ codons is thus a useful strategy known to be pervasively applied in biology.
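The A/B/C/D illustration above can be put into code form. Here, dwell time at a codon is modelled (a simplifying assumption) as inversely proportional to the size of the tRNA pool that reads it, with the pool sizes below chosen arbitrarily:

```python
# Sketch of the A/B/C/D synonymous-codon example: per-codon translation
# time is taken as 1 / (tRNA pool size able to read that codon).
tRNA_pool = {          # arbitrary illustrative copy numbers
    "A": 80, "B": 80,  # read by abundant tRNA-a / tRNA-b
    "C": 10, "D": 10,  # read by scarce tRNA-c / tRNA-d: 'slow' codons
}

def translation_time(codons):
    """Relative time to translate a run of synonymous codons."""
    return sum(1.0 / tRNA_pool[c] for c in codons)

fast_message = ["A", "B", "A", "B"]
slow_message = ["C", "D", "C", "D"]
# The same amino acid sequence, translated eight-fold more slowly:
print(round(translation_time(slow_message) / translation_time(fast_message), 3))  # → 8.0
```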
So, the initial and simplistic picture of ‘more is better’ which is logically applicable in very basic organized biosystems (Fig. 1) is not compatible with more advanced cellular systems. This must be kept in mind if we ask whether current biological synthetic rates could be accelerated across the board, either through natural evolution or artificial synthetic biological intervention. So much interlinking of distinct biological processes exists that it would seem difficult for evolutionary change itself to have much impact on synthetic rates in the most fundamental circumstances. Single mutations that accelerate a synthetic process will almost always fail to accommodate the global biosystem’s optimal requirements, and therefore elicit a fall in fitness. From this stance, fundamental synthetic rates would seem likely to be ‘locked in’ or ‘frozen’ by the need for each component of complex regulatory networks to be compatible with each other. Synthetic biology, on the other hand, is not necessarily limited in this way, but even here the would-be biological tinkerer would have to construct multiple changes in a biosystem at once. So global and fundamental changes in biological synthetic rates are not likely to be on the agenda in the near-term future.
To conclude, a biopoly(verse) appropriate for this post’s theme:
Let’s consider synthetic speed
As a potent driver, indeed
An organism’s fate
May come down to rate
The faster, the more it can breed
But recall the many caveats made above with respect to regulation…..
References & Details
‘……..wheat 3B chromosome is among the largest known of these……….’ See Paux et al. 2008.
‘….in almost all known eukaryotic cases, an individual chromosome does not equate with genome size.’ The Australian ant Myrmecia pilosula (the ‘jack jumper’ ant) has been reported to have only a single chromosomal pair, such that somatic cells of haploid males bear only a single chromosome. See Crosland & Crozier 1986.
‘ An extremely long chromosomal length may eventually reach a point where its functional efficiency is reduced, and organisms bearing such karyotypic configurations would be at a selective disadvantage.‘ The evolution of chromosome length cannot be studied without considering the role of non-coding DNA, which composes a large percentage of the total genomes of many organisms. By reducing the amounts of non-coding DNA tracts relative to coding sequences, chromosome number can be reduced without necessitating commensurately extended individual remaining chromosomes.
‘….the number of potentially useful forms which could arise from a polypeptide of even modest length….’ Even a small protein of 100 amino acid residues could in principle be composed of 20^100 different sequences; for a protein of titin size the number is beyond hyper-astronomical (20^26,926).
‘….titin-sized giants of nearly 30,000 amino acid residues….’ Titins and other very large proteins are found in muscle tissues, where they have a physical role as molecular ‘springs’ and fibers, or their attendant co-functionary species. It is presumed that in this specialized context, proteins of such extreme size were advantageous over possible alternatives with smaller macromolecules.
‘…..certain plants have genomes of such size that their genomic replication becomes a significant rate-limiting step…’ Here the plant Paris japonica with 1.3 x 10^11 base pairs is the current record-holder, and has a correspondingly slow growth rate. See a Science report by Elizabeth Pennisi.
‘….protein splicing via inteins….’ For a recent review and discussion of intein applications, see Volkmann & Mootz 2013.
‘……a good example being the coupling of primary transcription with the removal of intervening sequences (introns) via splicing mechanisms ……. ‘ See Lee & Tam 2013 for a recent review.
‘……such systems [proof-reading and repair] might seem obviously beneficial for an organism, there are trade-offs in such situations….’ It is also interesting to consider that a low but significant level of mutation is ‘good’ in evolutionary terms, in providing (in part, along with other mechanisms such as recombination) the raw material of genetic diversity upon which natural selection can act. But of course, this benefit is not foreseen by selection upon individual organisms: only immediately selectable factors such as metabolic costs are relevant in such contexts.
‘…..proof-reading mechanisms (where DNA polymerases possess additional enzymatic capabilities for excising mismatched base-pairs……’ Proof-reading DNA polymerases possess a 3′→5′ exonucleolytic activity that excises base mismatches, allowing corrective re-insertion of the appropriate base.
‘……has been to employ mutant ribosomes in E. coli with a slower expression rate. ….. significant enhancement of the folding of eukaryotic proteins was observed….’ For this work, and a little more background on eukaryotic vs. prokaryotic expression, see Siller et al. 2010.
‘…..the degeneracy of the genetic code (where a single amino acid is specified by more than one codon) has been exploited through evolutionary time as a means for ‘speed control’….’ Different classes of eukaryotic proteins have different requirements for enforced ‘slow-downs’, and secreted and transmembrane proteins are major examples of those which benefit from such imposed rate controls. (See Mahlab & Linial 2014). Additional complications arise from the role of sequence context effects (local mRNA sequence environments), as noted in prokaryotes by Chevance et al. 2014. In E. coli, many specific synonymous codons can be removed and replaced with others with little apparent effect on fitness, but notable exceptions to this have been found. See in this respect the study by Lajoie et al. 2013.
Next post: January 2015.
In the very first post of this series, reference was made to ‘molecular alphabets’, and in a post of last year (8th September) it was briefly proposed that molecular alphabets are so fundamental to life that a ‘Law of Alphabets’ might even be entertained. This theme is further developed in this current post.
How to Make A Biosystem
The study of natural biology provides us with many lessons concerning the essential properties of life and living systems. A recurring and inescapable theme is complexity, observed across all levels from the molecular to cellular scales, and thence to whole multicellular organisms. While the latter usually have many layers of additional complexity relative to single-celled organisms, even a typical ‘simple’ free-living bacterial cell possesses breath-takingly complex molecular operations which enable its existence.
Why such complexity? In the first case, it is useful to think of the requirements for living systems, as we observe them. While comprehensive definitions of life are surprisingly difficult, the essence of biology is often seen as informational transfer, where the instructions for building an organism (encoded in nucleic acids) are replicated successively through continuing generations. (A crucial accompaniment to this is the ability of living organisms to evolve through Darwinian evolution, since no replicative system can ever be 100% error-free, and reproductive variation provides the raw material for natural selection). But while the replication of genomes may be the key transaction, it is only enabled by a wide array of accompanying functions provided by (largely) proteins. The synthesis of proteins and complex cellular structures requires energy and precursor molecules, so systems for acquiring and transducing these into usable forms must also be present.
Molecular Size and Life
The primal ‘motive’ of biological entities to replicate themselves requires a host of ancillary systems for creating the necessary building blocks and structuring them in the correct manner. All this requires energy, the acquisition and deployment of which is in turn another fundamental life need. Processes for molecular transport and recognition of environmental nutrients and other factors are also essential. And since organisms never exist in isolation, systems for coping with competitors and parasites are not merely an ‘optional extra’. Although all of these activities are associated with functional requirements necessitating certain distinct catalytic tasks, a major driver of complexity is the fundamental need for system regulation. In many cases, the orderly application of a series of catalyses is essential for obtaining an appropriate biological function. But in general, much regulatory need comes down to the question of efficiency.
This has been recognized from the earliest definition of regulatory systems in molecular biology. The lac operon of E. coli regulates the production of enzymes (principally β-galactosidase) involved with the metabolism of the sugar lactose. If no lactose is available in the environment, it is clearly both functionally superfluous and energetically wasteful to synthesize the lactose-processing enzymes. Thus, a regulatory system that responds to the presence of lactose and switches on the relevant enzyme production would be beneficial, and this indeed is what the natural lac operon delivers. In general, an organism that possesses any regulatory system of this type (or various other types of metabolic regulators) will gain a distinct competitive edge over organisms lacking them. And hence this kind of selection drives the acquisition of complex regulatory systems.
So, if complexity is a given, how can this be obtained in molecular terms? How can the molecular requirements for both high catalytic diversity and intricate system regulation be satisfied? An inherent issue in this respect is molecular size. Biological enzymes are protein molecules that can range in molecular weight from around 10 kilodaltons (kD) to well over an order of magnitude greater. If we look beyond catalysts to include all functional molecules encountered in complex multicellular organisms, we find the huge protein titin, an essential component of muscle. Titin is composed of a staggering 26,920 amino acid residues, clocking up a molecular weight of around 3 megadaltons.
But in terms of catalysis itself, why is size an issue? This is a particularly interesting question in the light of relatively recent findings that small organic biomolecules can be effective in certain catalytic roles. Some of these are amino acids (proline in particular), and have hence been dubbed ‘aminozymes’. While certain catalytic processes in living cells may be mediated by such factors to a greater degree than previously realized, small molecule catalysis alone cannot accommodate the functional demands of complex biosystems.
This assertion is based on several factors, including: (1) Certain enzymatic tasks require stabilization of short-lived transitional states of substrate molecules, accomplished by a binding pocket in a large molecule, but difficult to achieve otherwise; and (2) Some necessary biological reactions require catalytic juxtaposition of participating substrate molecules across relatively large molecular distances, a function which small molecules are unlikely to be able to satisfy. Even apart from these dictates, the necessity of efficient regulation, as considered above, also limits possible roles for small molecules. A fundamental mechanism for biological control at the molecular level is the phenomenon of allostery, where binding of a regulatory molecule to a site in a larger effector molecule causes a conformational change, affecting the function of the effector molecule at a second, distant active site. By definition, to be amenable to allosteric regulation, an effector molecule must be sufficiently large to encompass both an effector site for its primary function (catalytic or otherwise) and a second site for regulatory binding.
Since better regulation equates with improved biosystem efficiency and biological fitness, the evolution of large effector molecules should accordingly be a logical advantage:
Fig. 1: Competitive Advantages and complexity
Even if we accept that molecular complexity and associated molecular size is an inexorable requirement of complex life, why should such biosystems use a limited number of building blocks (molecular alphabets) to make large effector molecules? Why not, in the manner of an inspired uber-organic chemist, build large unique effectors from a wide variety of small-molecule precursor components?
Let’s look at this in the following way. Construction of a unique complex molecule from simpler precursors will necessitate not just one, but a whole series of distinct catalytic tasks, usually requiring in turn distinct catalysts applied in a coordinated series of steps. But, as noted above, mediation of most biological catalytic events requires complex molecules themselves. So each catalyst in turn requires catalysts for its own synthesis. And these catalysts in turn need to be synthesized……all leading suspiciously towards an infinite regress of complexity. This situation is depicted in Fig. 2:
Fig. 2. Schematic depiction of synthesis of a complex uniquely-structured (non-alphabetic) molecule. Note in each case that the curved arrows denote the action of catalysts, where (by definition) the catalytic agent promotes a reaction and may be transiently modified, but emerges at the end of the reaction cycle in its original state. A: A series of intermediate compounds are synthesized from various simpler substrates (S1, S2, S3 …), each by means of distinct catalysts (1, 2, 3….). Each intermediate compound must be sequentially incorporated into production of the final product (catalyst 6 …… catalyst i). Yet since each catalytic task demands complex mediators, each catalyst must in turn be synthesized, as depicted in B. Reiteration of this for each of (catalyst a …… catalyst j) leads to an indefinite regress.
These relatively simple considerations might suggest that attempts to make large ‘non-alphabetic’ molecules as functional biological effectors will inevitably suffer from severe limitations. Are things really as straightforward as this?
Autocatalytic Sets and Loops
There is a potential escape route from a linear infinite synthetic regression, and that is in the form of a loop, where the ends of the pathway join up. Consider a scenario where a synthetic chain closes on itself through a synthetic linkage between the first and last members. This is depicted in Fig. 3A below, where product A gives rise to B, B to C, C to D, and finally D back to A. Here the catalytic agents are shown as external factors, and as a result this does not really gain anything on the linear schemes of the above Fig. 2, since by what means are the catalysts themselves made? But what if the members of this loop are endowed with special properties of self-replicative catalysis? In other words, if molecule B acts on A to form B itself, and C on B to form C, and so on. This arrangement is depicted in Fig. 3B.
Fig. 3. Hypothetical molecular synthetic loops, mediated by external catalysts (A), self-replicating molecules (B), or a self-contained autocatalytic set (C). In cases (B) and (C), each member can act as both a substrate and a catalyst. In case (B), each member can directly synthesize a copy of itself through action on one of the other members of the set, whereas in case (C) the replication of each member is indirect, occurring through their coupling as an autocatalytic unit. Note that in case (B) each catalysis creates a new copy of the catalysts themselves, as well as preserving the original catalysts. For example, for molecule D acting on molecule C, one could write: C [D-catalyst] → D + [D-catalyst] = 2D. In case (C) it is also notable that the entire cycle can be initiated by 4 of the 6 possible pairs of participants taken from A, B, C and D. In other words, the (C) cycle can be initiated by starting only with the pairs AD, AB, CD, and CB – but not with the pairs AC and BD. As an example, for a starting population of A and D molecules: D acts on A to produce B; remaining A can act on B to produce C; remaining B can act on C to produce D; remaining C acts on D to produce A, thus completing the cycle. If the reaction rates for each step were comparable, a steady-state situation would result, tending to equalize the concentrations of each participant.
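The claim in the Fig. 3C caption that only four of the six possible starting pairs can initiate the full cycle can be checked mechanically. The sketch below encodes the four (catalyst, substrate) → product reactions from the caption and expands the set of molecules present until nothing new is made:

```python
from itertools import combinations

# The Fig. 3C autocatalytic set, as given in the caption:
# catalyst X acting on substrate Y yields product Z.
REACTIONS = {("D", "A"): "B", ("A", "B"): "C",
             ("B", "C"): "D", ("C", "D"): "A"}

def can_bootstrap(start):
    """Can this starting pair of molecules regenerate the whole set?
    (Catalysts survive each reaction, so products simply accumulate.)"""
    present = set(start)
    while True:
        made = {p for (cat, sub), p in REACTIONS.items()
                if cat in present and sub in present}
        if made <= present:  # nothing new can be produced
            return present == {"A", "B", "C", "D"}
        present |= made

viable = {"".join(sorted(pair))
          for pair in combinations("ABCD", 2) if can_bootstrap(pair)}
print(sorted(viable))  # → ['AB', 'AD', 'BC', 'CD']  (AC and BD stall)
```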
But the scenarios of Fig. 3 might not seem to approach the problem of how to attain increasing molecular size and complexity needed for intricate biosystems in a non-alphabetic manner. This can readily be added if we assume a steady increase in complexity / size around a loop cycle, with a final re-production of an original component (Fig. 4). These effects could be described in the terms used for biological metabolism: the first steps in the cycle are anabolic (building up of complexity), while the final step is catabolic (breaking down complex molecules into simpler forms).
Fig. 4. A hypothetical autocatalytic loop, where black stars denote rising molecular size and complexity. For simplicity, here each component is rendered in blue when acting as a product or substrate, and in red when acting as a catalyst. Here the additional co-substrates and/or cofactors (assumed here to be simple organics that are environmentally available) are also depicted (S1 – S3) for molecules D, A, and B acting as catalysts. Since C cleaves off an ‘A moiety’ from molecule D, no additional substrate is depicted in this case.
Of course, the schemes of Figs. 3 & 4 are deliberately portrayed in a simple manner for clarity; in principle the loops could be far larger and (as would seem likely) also encompass complex cross-interactions between members of each. Both anabolic and catabolic stages (Fig. 4) could be extended into many individual steps. The overall theme is the self-sustaining propagation of the set as a whole.
So, could autocatalysis allow the production of large, complex and non-alphabetic biomolecules, acting in turn within entire biosystems constituted in such a manner? The hypothetical loop constructs as above are easy to design, but the central question is whether the principles are viable in the real world of chemistry.
In order to address this question, an important point to note is that not just a few such complex syntheses would need to be established for a non-alphabetic biosystem, but very many. And each case would need to serve complex and mutually interacting functional requirements. It is accordingly hard to see how the special demands of self-sustaining autocatalytic loops could be chemically realized on this kind of scale, even if a few specific cases were feasible. The ‘chemical reality’ problem with theoretical autocatalytic systems has been elegantly discussed by the late Leslie Orgel.
Even this consideration does not delve into the heart of the matter, for we must consider how life on Earth – and indeed life anywhere – may attain increasing complexity. This, of course, involves Darwinian evolution via natural selection, which operates on genetic replicators. It is not clear how an autocatalytic set could produce stable variants that could be selected for replicative fitness. Models for replication of such sets as ‘compositional genomes’ have been put forward, but in turn refuted by others. But in any case, there is an elegant natural solution to the question of how to attain increasing complexity, which is inherently compatible with evolvability.
The Alphabetic Solution
And here we return to the theme of molecular alphabets, generally defined as specific sets of monomeric building blocks from which indefinite numbers of functional macromolecules may be derived, through covalently joined linear strings of monomers (concatemers). But how does the deployment of alphabets accomplish what non-alphabetic molecular systems cannot?
Here we can refer back to the above-noted issue of building complex molecules, and the problem of complexity regression for the necessary catalysts, and building the catalysts themselves. The special feature of alphabets is that, with a suitable suite of monomers, a vast range of functional molecules can be produced by concatenation of specific sequences of alphabetic members. We can be totally confident that this is so, given the lessons of both the protein and nucleic acid alphabets. The versatility of proteins for both catalysis and many other biological functions has long been appreciated, but since 1982 the ability of certain folded RNA single strands to perform many catalytic tasks has also become well-known. And specific folded DNA molecules can likewise perform varied catalyses, even though such effects have not been found in natural circumstances.
So, nature teaches us that functional molecules derived from molecular alphabets can perform essentially all of the tasks required to operate and regulate highly complex biosystems. But how does this stand with synthetic demands, seen to be a crucial problem with complex natural non-alphabetic structures? Two critical issues are pertinent here. Firstly, an alphabetic concatemer can be generated by simply applying the same catalytic ligation process successively, provided the correct sequence of monomers is attained. This is fundamentally unlike a complex non-alphabetic molecule, where sites of chemical modification may vary and thus require quite different catalytic agents. The other major issue addresses the question of how correct sequences of alphabetic concatemers are generated. In this case the elegant solution is template-based copying, enabled through molecular complementarities. This, of course, is the basis of all nucleic acid replication, through Watson-Crick base pairing. Specific RNA molecules can thus act both as replicative templates and folded functional molecules. The power of nucleic acid templating was taken a further evolutionary step through the innovation of adaptors (transfer RNAs), which enabled the nucleic acid-based encoding of the very distinct (and more functionally versatile) protein molecular alphabet.
But in order to achieve these molecular feats, a certain number of underlying catalytic tasks clearly must be satisfied in the first place. These are required to create the monomeric building blocks themselves, and all the ‘infrastructure’ needed for template-directed polymerization of specific sequences of new alphabetic concatenates. But once this background requirement is in place, in principle products of any length can be created without the need for new types of catalytic tasks to be introduced. In contrast, for non-alphabetic complex syntheses, the number of tasks required will tend to rise as molecular size increases. In a large series of synthetic steps towards building a very large and complex non-alphabetic molecule, some of the required chemical catalyses may be of the same type (for example, two discrete steps both requiring a transesterification event). But even if so, the specific sites of addition must be controlled in a productive (non-templated) manner. This requires some form of catalytic discrimination, in turn necessitating additional catalytic diversity. Fig. 5 depicts this basic distinction between alphabetic and complex non-alphabetic syntheses.
Fig. 5. Schematic representation of catalytic requirements for alphabetic vs. complex (non-repetitive) non-alphabetic syntheses. For alphabetic macromolecular syntheses, a baseline level of catalytic tasks (here referred to as a ‘complexity investment’; of N tasks) allows the potential generation of alphabetic concatenates of specific sequences and of indefinite length – thus shown by a vertical line against the Y-axis (this line does not intercept the X-axis since a minimal size of a concatenate is determined by the size of the alphabetic monomers). For non-alphabetic complex molecules of diverse structures, as molecular size increases the number of distinct catalysts required will tend to continually rise, to cope with required regiospecific molecular modifications performed with the correct stereochemistry. It should be stressed that the curved ‘non-alphabetic’ line is intended to schematically represent a general trend rather than a specific trajectory. Catalytic requirements could vary considerably subject to the types of large and complex molecules being synthesized, while still exhibiting the same overall increasing demand for catalytic diversity.
It must be noted that the above concept of a ‘complexity investment’ (Fig. 5) should not be misconstrued as arising evolutionarily prior to the generation of templated alphabetic syntheses. Progenitor systems enabling rudimentary templated syntheses would necessarily have co-evolved with the generation of templated products themselves. Yet once a threshold of efficiency was attained in direct and adapted templated molecular replication, a whole universe of functional sequences became potentially exploitable through molecular evolution.
And herein lies another salient point about molecular alphabets. As noted above, the secret of life’s ascending complexity is Darwinian evolution, and it is difficult to see how this could proceed with autocatalytic non-alphabetic systems. But variants (mutations) in a replicated alphabetic concatemeric string can themselves be replicated, and if functionally superior to competitors, they will prove selectable. Indeed, even for an alphabet with relatively few members (such as the 4-base nucleic acid alphabet), the numbers of alternative sequences for concatenates of even modest length soon become hyper-astronomical. And yet the tiny fraction of the total with some discernible functional improvement above background can potentially be selected and differentially amplified. Successive cumulative improvements can then ensue, eventually producing highly complex and highly ordered biological systems.
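The ‘hyper-astronomical’ scaling is easy to make concrete; a short illustrative Python calculation:

```python
# Sequence space for alphabetic concatenates: an alphabet of A distinct
# monomers yields A**L possible sequences of length L.

def sequence_space(alphabet_size: int, length: int) -> int:
    return alphabet_size ** length

# A 100-nucleotide RNA (4-letter alphabet):
n = sequence_space(4, 100)
print(f"{n:.2e}")  # ~1.61e+60 distinct sequences
```

Selection need only find and amplify the tiny functional fraction of this space; it never has to search it exhaustively.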
Metabolic Origins vs. Genetic Origins and Their Alphabetic Convergence
The proposed importance of alphabets leads to considerations of abiogenesis, the question of ultimate biological beginnings. Two major categories of theories for the origin of life exist. The ‘genetic origin’ stance holds that some form of replicable informational molecule must have emerged first, which led to the molecular evolution of complex biological systems. This school of thought points to considerable evidence for an early ‘RNA World’, where RNA molecules fulfilled both informational (replicative) and functional (catalytic) roles. But given difficulties in modeling how RNA molecules could arise de novo non-biologically, many proponents of the RNA World invoke earlier, simpler hypothetical informational molecules which were later superseded by RNA.
An alternative view, referred to as the ‘metabolic origin’ hypothesis, proposes that self-replicating autocatalytic sets of small molecules were the chemical founders of biology, later diversifying into much higher levels of complexity.
Both of these proposals for abiogenesis have strengths and weaknesses, but the essential point to make in the context of the present post is that it is not necessary to take a stand in favor of either hypothesis in order to promote the importance of molecular alphabets for the evolution of complex life. In a nutshell, this issue can be framed in terms of the difference between factors necessary for the origin of a process, and factors essential for its subsequent development. In the ‘alphabetic hypothesis’, molecular alphabets are crucial and inescapable for enabling complex biosystems, but are not necessarily related to the steps at the very beginning of the process from non-biological origins.
If the ‘genetic origin’ camp is correct, then alphabets are implicated at the very beginning of abiogenesis. On the other hand, if the opinions of ‘metabolic origin’ advocates eventually hold sway, molecular alphabets (at least in the sense used for building macromolecules from a limited set of monomers) would seem to be displaced at the point of origin. But the biological organization we see around us on this planet (‘Life 1.0’) is most definitely based on well-defined alphabets. So both abiogenesis hypotheses must necessarily converge upon alphabets at some juncture in the history of molecular evolution. For genetic origins, a direct progression in the complexity of both alphabets themselves and their derived products would be evident, but a metabolic origin centering on autocatalytic small-molecule sets must subsequently make a transition towards alphabetic systems, in order to be consistent with observable extant biology. Thus, stating that alphabets enable the realization of highly complex biological systems refers to all the downstream evolutionary development once alphabetic replicators have emerged. Accordingly, no necessary reference is made to the role of alphabets at the very beginning of the whole process.
A ‘Law of Alphabets’?
Now, the last issue to look at briefly in this post is the postulated universality of alphabets. It is clear that molecular alphabets are the basis for life on this planet, but need that always be the case? To answer this, we can revisit the above arguments: (1) Complex biosystems of any description must involve complex molecular interactions; (2) The demand for molecular complexity is inevitably associated with requirements for increasing molecular size; (3) Biological synthesis of a wide repertoire of large and complex functional molecules is difficult to achieve by non-alphabetic means; (4) Darwinian evolution, a fundamental requirement for the development of complex life, is eminently achievable through alphabetic concatenates, but difficult to envisage (and certainly unproven) via non-alphabetic means.
It is also important to note that these principles say nothing directly about the chemistry involved, and quite different chemistries could underlie non-terrestrial biologies. Even if so, the needs for molecular complexity and size would still exist, favoring in turn the elegant natural solution of molecular alphabets.
So if this proposal is logically sound, then it would indeed seem reasonable to propose that a ‘Law of Alphabets’ applies universally to biological systems. In a previous post, it was noted that an even more fundamental, but related ‘law’ could be a ‘Law of Molecular Complementarity’, since such complementarities are fundamental to known alphabetic replication. Indeed, it is difficult to conceive of an alphabetic molecular system where complementarity-based replication at some level is absent. Still, while complementarity may be an essential aspect of alphabetic biology, it does not encompass the whole of what alphabets can deliver, and is thus usefully kept as a separate, though intersecting, compartment.
To conclude, a biopoly(verse), delivered in a familiar alphabet:
If high bio-complexity may arise
In accordance with molecular size
Compounds that are small
Are destined to fall
And with alphabets, intricacy flies
References & Details
‘……The lac operon of E. coli regulates the production of enzymes……’ The story of the lac operon is a classic in molecular biology, included in most basic textbooks. The French group involved, led by Jacques Monod, won a Nobel prize for this in 1965. For a revisit of an old 1960 paper regarding the operon concept, see Jacob et al. 2005.
‘An inherent issue in this respect is molecular size.‘ See my recent paper (Dunn 2013) for a more detailed discussion of molecular size in relation to functional demands.
‘Biological enzymes are protein molecules that can range in molecular weight….’ A case in point for a large enzyme, pertaining to the above lac operon, is the E. coli enzyme β-galactosidase, which has 1024 amino acid residues and a molecular weight of 116 kilodaltons. For details on the structure of β-galactosidase, see Juers et al. 2012.
‘Titin is composed of a staggering 26,920 amino acid residues…….’ See Meyer & Wright 2013.
‘…..small organic biomolecules can be effective in certain catalytic roles. ‘ See Barbas 2008.
‘……small molecule catalysis cannot accommodate the functional demands of complex biosystems. ‘ See again Dunn 2013 for a more detailed discussion of this issue.
‘…..the phenomenon of allostery…’ The lac operon again can be invoked as a good example of the importance of allostery; see Lewis 2013.
‘……a potential escape route from a linear infinite synthetic regression…..is in the form of a loop….’ A major proponent of autocatalytic loops and self-organization has been Stuart Kauffman, as outlined (along with many other themes) in his book The Origins of Order (Oxford University Press, 1993).
‘…..The ‘chemical reality’ problem with theoretical autocatalytic systems has been elegantly discussed by the late Leslie Orgel. ‘ See Orgel 2008.
‘ Models for replication of such sets as ‘compositional genomes’ have been put forward, but in turn refuted by others.‘ For the model of autocatalytic set replication, see Segré et al. 2000; for a refutation of it, see Vasas et al. 2010.
‘……molecular alphabets, generally defined as specific sets of monomeric building blocks….’ See Dunn 2013 for a more detailed definition, and discussion of related issues.
‘……since 1982 the ability of certain folded RNA single strands to perform many catalytic tasks….’ The seminal paper on ribozymes came from Tom Cech’s group in 1982 (Kruger et al. 1982).
‘…….specific folded DNA molecules can likewise perform varied catalyses….’ See Breaker & Joyce 1994.
‘ This is fundamentally unlike a complex non-alphabetic molecule, where sites of chemical modification may vary…….’ Note that this statement does not include molecules such as polymeric carbohydrates, where these are composed of repeated monomers and thus relatively simple in their structures.
‘….the numbers of alternative sequences for concatenates of even modest length soon becomes hyper-astronomical.’ For example, in the case of an RNA molecule of 100 nucleotides in length, 4^100 (equivalent to approximately 1.6 × 10^60) sequence combinations are possible.
‘…..quite different chemistries could underlie non-terrestrial biologies. ‘ See Bains 2004 for a detailed discussion of this issue.
Next post: September.
This post continues from the previous, which discussed the notion of laws in biology, and considered candidates for what might be the major contenders for universal ‘lawful’ status in this domain. Although a series of possible universal biological dictates were briefly listed at the end of that post, the status of evolution as the primary biological law was highlighted. In turn, this places natural selection, the prime mover of evolution, in the spotlight.
But is this really the case? Is there a more fundamental law that underlies (or at least accompanies) all evolutionary processes? These issues are examined further in this post.
Complexity and Evolution
When discussing the priority of biological ‘laws’, semantics and definitions will inevitably enter the picture. Thus, while it might be acknowledged that evolution is ‘Law No. 1’, it might be proposed that more fundamental laws operate which enable evolution in itself. The ‘law’ of Darwinian natural selection immediately springs to mind, but other processes have been proposed as even more fundamentally significant.
In the previous post, the intriguing role of complexity as a putative ‘arrow of evolution’ was alluded to. It was also noted there that this apparent ramping-up of complex functions could arise from the ‘locking in’ of systems in increasingly intricate arrangements. In this view, where evolutionary success is associated with a structural or functional innovation which increases the overall complexity of an organism, reversing such a change in later generations may be unfeasible. This will occur when the selected evolutionary change has become enmeshed as a fundamental part of a complex network of interactions, where removal of a key component cannot easily be compensated for. Successive innovations which become ‘locked in’ in a comparable manner thus lead to a steady increase in the net complexity of biological systems. Of course, this is not to say that all complex adaptations are irreversible; a classic example of ‘complexity loss’ is the loss of functional eyes in cave-dwelling animals living in complete darkness.
An interesting recent publication highlights a possible example of an evolutionary process that may lead to increased complexity. Gene duplication has long been acknowledged as an important driver of evolution, where formation of a duplicate gene copy allows continuation of the original gene function while permitting mutations in the second copy (or ‘paralog’), with accompanying exploration of new evolutionary space. An implicit assumption in such cases is that the expressed paralog does not interfere with the original gene function, but this may not apply where such functions depend on networks of co-operating protein-protein and protein-nucleic acid interactions. Johnson and colleagues have shown that such paralog interference does indeed occur for certain duplicated transcription factor genes in yeast. As a result, a strong selective pressure exists for resolution of the interference, one solution being mutational change that removes the interference effect while simultaneously allowing the progressive emergence of novel functions. Such effects were noted by this group as a potential source of increasing complexity, although this is not formally proven.
A complexity gradient is readily apparent between bacterial cells and single-celled eukaryotes, and in turn between the latter and the variety of multicellular organisms based on the eukaryotic cellular design. The key enabling event here was an ancient symbiosis between ancestral bacterial cells, resulting in the eventual evolution of mitochondria and chloroplasts as key eukaryotic organelles for energy production and photosynthesis respectively. The energetic endowment of mitochondria allowed the evolution of large genomes capable of directing the complex cell differentiation required for multicellular life. (And among the mechanisms for such genomic complexity acquisition, we can indeed note duplication events, as mentioned above).
And yet it is important to consider the putative evolutionary ‘drive’ towards complexity in terms of the biosphere as a whole. Whatever the evolutionary origin of biological systems of escalating intricacy, it is clearly not a global phenomenon. Only certain eukaryotic lineages have shown this apparent trend, while the prokaryotic biomass on Earth has existed in a similar state for billions of years, and will no doubt persist as long as suitable habitats remain on this planet. (And here prokaryotes are clear masters at colonizing extreme environments).
Such observations are entirely consistent with the blind forces of natural selection. Change in a lineage will not occur unless variants emerge which are better suited to existing or changing environments (including evasion or domination of natural predators or competitors). So the question becomes: is increased complexity simply a by-product or accompaniment of natural selection, which may or may not occur, or is it an inevitability? Before continuing, it will be useful to look briefly at just how complexity might be assessed and measured.
Defining Biological Complexity
Perhaps consistent with its intuitive properties, a good definition of complexity is not itself the essence of simplicity. One approach would seem logical: to perform a census of the range of different ‘parts’ that comprise an organism, where the total count provides a direct complexity index. (Obviously, by this rationale, the higher the ‘parts list’ number, the greater the perceived complexity). But a problem emerges in terms of the system level at which such a survey should be performed, since it can in principle span hierarchies ranging from the molecular, organelle, and cell differentiation state up to macroscopic organs in multicellular organisms. The physical size of organisms has also been noted as a correlate of complexity, but not a completely reliable one.
An additional and very important observation also suggests that a simple parts list of organismal components is at best a considerable under-rating of the true underlying complexity. Biological systems are characteristically highly modular and parsimonious. This bland statement refers to the often incredible economy of informational packing in genomes, such that a basic human protein-encoding gene count of only approximately 20,000 can encode the incredible complexity of a functioning human being. The baseline gene figure is greatly amplified by systems using separate gene parts (exons) in alternative ways, through RNA splicing and editing, and a gamut of post-translational modifications. But beyond this level of parsimonious modularity, the same gene products can perform quite distinguishable functions through differential associations with alternative expressed products of the same genome, corresponding to distinct cellular differentiation states. A far better account of complexity must therefore cover the entire interactome of an organism, but this is a far more onerous undertaking than a mere parts list.
And the levels of potentially encoded complexity don’t even stop there. Consider a protein A that interacts with proteins B and C in one cell type (α) within an organism, and with D and E in another cell type (β) within the same organism. The differential complexes ABC and ADE result from alternate programs of gene expression (cell type α having an expressed phenotype A+, B+, C+, D-, E-; while the β phenotype is A+, B-, C-, D+, E+), combined with the encoded structural features of each protein which enable their mutual interactions. The interaction of A with its respective partners is thus directly specified by the genome via regulatory control mechanisms. But indirect programming is also possible. There are numerous routes towards such a scenario, but in one such process a genomically-encoded gene A can be randomly assorted with other gene fragments prior to expression, such that a (potentially large) series of products (A*, A**, A***, and so on) is created. If a single cell making a specific randomly-created modification of the A gene (A*, for example) is functionally selected and amplified, then A* is clearly significant for the organism as a whole, yet is not directly specified by the genome. The creation of A* thus entails a ramping-up of organismal complexity.
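The directly-specified part of this scenario can be sketched in a few lines of illustrative Python (proteins A–E and the interaction table are the hypothetical ones from the text, not real gene products):

```python
# Hypothetical example from the text: one protein (A) engages different
# partners in different cell types, via differential gene expression.

# Encoded interaction capabilities (pairs able to bind if co-expressed):
interactions = {("A", "B"), ("A", "C"), ("A", "D"), ("A", "E")}

# Expression programs of two cell types within the same organism:
expression = {
    "alpha": {"A", "B", "C"},   # cell type α: A+, B+, C+, D-, E-
    "beta":  {"A", "D", "E"},   # cell type β: A+, B-, C-, D+, E+
}

def partners_of(protein: str, cell_type: str) -> list[str]:
    """Partners actually engaged = encoded interactions filtered by
    the cell type's expression program."""
    expressed = expression[cell_type]
    return sorted(q for p, q in interactions
                  if p == protein and protein in expressed and q in expressed)

print(partners_of("A", "alpha"))  # ['B', 'C']  -> complex ABC
print(partners_of("A", "beta"))   # ['D', 'E']  -> complex ADE
```

The same encoded parts thus yield distinct functional outcomes per cell type; the indirect (A*, A**…) route described above adds a further layer that no such genome-derived table can capture.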
The ‘indirect complexity’ scenario is actively realized within the vertebrate adaptive immune system, where both antibody and T cell receptor genes are diversified by genomic rearrangements, random nucleotide additions (by the enzyme terminal transferase) and somatic hypermutation. And clearly the circuitry of the mammalian nervous system, with its huge number of synaptic linkages, cannot be directly specified by the genome (although here the details of how this wiring is accomplished remain to be sketched in).
These considerations make the point that defining and quantitating complexity in biological systems is not as straightforward as it might initially seem. In principle, a promising approach centers on treating the complexity of a system as a correlate of its information content. While this has been productive in many virtual models, it remains an elusive goal to use informational measures to account for all of the above nuances of how biology has achieved such breathtaking levels of complexity.
A ‘Zeroth’ Law for Biology?
Where measures of complexity can be kept within specified bounds, multiple computer simulations and models have suggested that evolving systems do show a trend towards increasingly complex ‘design’. But in real biological systems, what is the source of burgeoning complexity? Is it somehow so inevitable that it merits the status of a ‘law’?
McShea and colleagues have proposed the ‘Zero-Force Evolutionary Law’ (ZFEL), which has been stated as: “In any evolutionary system in which there is variation and heredity, there is a tendency for diversity and complexity to increase, one that is always present but may be opposed or augmented by natural selection, other forces, or constraints acting on diversity or complexity.” This could be seen as a law where complexity is increasingly ratcheted up over evolutionary time, through the acquisition of variations which may have positive, negative, or neutral selectable properties. If subject to negative selection, such variants are deleted, while positively-selected variants are amplified by differential reproductive success. Variants that are completely neutral, however, may be retained, and potentially serve as a future source of evolutionary diversity.
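As a hedged illustration of the ZFEL idea (a minimal sketch of my own, not the authors' model), a toy simulation with variation and heredity but no selection shows the count of distinct genotypes tending upward over generations; all parameter values are arbitrary:

```python
import random

def neutral_diversity(pop_size: int = 200, length: int = 20,
                      generations: int = 100, mu: float = 0.01,
                      seed: int = 1) -> list[int]:
    """Toy neutral model: heredity (copy a random parent) plus variation
    (per-site mutation), with no selection. Returns the number of distinct
    genotypes present after each generation."""
    random.seed(seed)
    pop = ["A" * length] * pop_size       # start from a uniform population
    counts = []
    for _ in range(generations):
        nxt = []
        for _ in range(pop_size):
            child = list(random.choice(pop))      # heredity
            for i in range(length):               # variation
                if random.random() < mu:
                    child[i] = random.choice("ACGU")
            nxt.append("".join(child))
        pop = nxt
        counts.append(len(set(pop)))
    return counts

counts = neutral_diversity()
print(counts[0], counts[-1])   # diversity after generation 1 vs. generation 100
```

Diversity rises here without any selective force, consistent with the ZFEL statement; whether such drift-driven diversity translates into organismal complexity is, of course, the contested question discussed below.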
An interesting wrinkle on the notion of neutral mutations is the concept of conditional neutrality, where a mutation may be ‘neutral’ only under certain circumstances. For example, it is known that certain protein chaperones can mask the presence of mutations in their ‘client’ proteins which would be otherwise unveiled in the absence of the chaperone activity. (A chaperone may assist folding of an aberrant protein into a normal structural configuration, whereas with impaired chaperone assistance the protein may assume a partially altered and functionally distinct structural state). Such a masking / unmasking phenomenon has been termed evolutionary capacitance.
But is the ‘Zero-Force’ law truly that, or simply a by-product of the primary effect of Darwinian natural selection? (The latter was discussed in the last post as the real First Law of Biology). The above ZFEL definition itself would seem to embed the ‘Zero Force’ law as an off-shoot of evolution itself, by beginning with ‘In any evolutionary system……’. Certainly ZFEL may correctly embrace at least one means by which complexity is enhanced, but since the adoption or elimination of such candidate complexity is ultimately controlled by natural selection, it would seem (at least to biopolyverse) that it is a subsidiary rule to the overarching theme of evolution itself.
In any case, if a ‘zero-force’ law is operative, why has the huge biomass of prokaryotic organisms persisted within the biosphere for such immense periods of time? An interesting contribution to this question highlights the importance of an organism’s population size for the acquisition of complexity. In comparison with prokaryotes, eukaryotes (from single celled organisms to multicellular states) are typically larger in physical size but with smaller total population numbers. (Recall the above mention of the role of eukaryotic mitochondria in bioenergetically enabling larger genomes, and in turn larger cell sizes). In a large and rapidly replicating population, under specific circumstances a paralog gene copy arising from a duplication event (noted above as an important potential driver of complexity acquisition) has a significantly greater probability of being deleted and lost before it can spread and become fixed. Thus, from this viewpoint, a eukaryotic organism with a substantially reduced population base is more likely to accumulate genomic and ultimately phenotypic complexity than its prokaryotic counterparts. Once again, the origin of eukaryotes through the evolution of symbiotic organelles derived from free-living prokaryotes was an absolutely key event in biological evolution, without which complex multicellular life would never have been possible. And eons of prokaryotic existence on this planet preceded this development, suggesting that it was not a highly probable evolutionary step, perhaps dependent on specific environmental factors combined with elements of chance.
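A back-of-envelope sketch of the population-size effect, using the standard neutral-drift result that a new variant's probability of eventual fixation equals its initial frequency (a textbook population-genetics identity, used here purely for illustration; the population sizes are invented):

```python
# A single new neutral variant (e.g. a fresh gene duplicate, before selection
# acts on it) starts at frequency 1/N in a haploid population of N replicators,
# and under drift alone fixes with exactly that probability.

def neutral_fixation_prob(pop_size: int) -> float:
    """P(eventual fixation) of one new neutral copy = its initial frequency."""
    return 1.0 / pop_size

# Illustrative small eukaryote-like vs. vast prokaryote-like population sizes:
for n in (10_000, 1_000_000_000):
    print(f"N={n}: P(fix) = {neutral_fixation_prob(n):.1e}")
```

On these numbers alone, a duplicate gene copy is orders of magnitude more likely to drift to fixation, and thus become available for later subfunctionalization or neofunctionalization, in the smaller population.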
A complexity-enabling but highly contingent (and evidently rate-limiting) event such as eukaryogenesis does not create confidence in the operation of a regular biological law. And other ‘complexity breakthroughs’ are likely to exist. The ‘Cambrian Explosion’, where a variety of animal phyla with distinct body plans emerged at the beginning of the Cambrian period about 540 million years ago, may be a case in point. This ‘explosion’ of complexity in a relatively short period of geological time has long been pondered, although molecular phylogenetic data have suggested earlier origins of many phyla. Still, an intriguing suggestion has been that the first evolution of ‘good vision’ was an enabling factor for the rapid evolution (and thus complexification) of marine Cambrian fauna.
So increasing biological complexity seems to have more of a ‘punctuated’ evolutionary history than an inexorable upward trend. Fitting a ‘law’ into what is governed by environmental changes, contingency, and natural selection may be a tall order. But perhaps it is too early to say……
On that note, a non-complex biopoly-verse offering:
Within life, one often detects
A trend towards all things complex
Does biology have laws
That underlie such a cause?
Such questions can sorely perplex….
References & Details
‘ Johnson and colleagues……’ See Baker et al. 2013.
‘….the eventual evolution of mitochondria and chloroplasts….’ See the excellent book by Nick Lane: Power, Sex, Suicide – Mitochondria and the Meaning of Life. Oxford University Press, 2005.
‘…..a census of the range of different ‘parts’….’ See McShea 2002.
‘…..protein-encoding gene count of only on the order of 20,000…..’ Note the ‘protein-encoding’ here; if non-coding RNA genes are added, the count is much higher. See the GENCODE data base.
‘ The physical size of organisms has also been noted as a correlate of complexity….’ See a relevant article by John Tyler Bonner (2004).
‘…..Biological systems are characteristically highly modular and parsimonious. / A far better account of complexity must therefore cover the entire interactome of an organism…..’ The need to address the modularity of living systems in order to fully apprehend them has been forcefully argued by Christof Koch (2012).
‘McShea and colleagues have proposed the ‘Zero-Force Evolutionary Law’……’ See Biology’s First Law. Daniel W. McShea and Robert N. Brandon, 2010, University of Chicago Press, Chicago, IL, USA; also Fleming & McShea 2013.
‘Such a masking / unmasking phenomenon has been termed evolutionary capacitance.’ See a recent interview (Masel 2013) for a background on such capacitance phenomena, and further references.
‘In comparison with prokaryotes, eukaryotes (from single celled organisms to multicellular states) are typically larger in physical size……’ Exceptions exist where very large prokaryotes overlap in size with single-celled eukaryotes. In such cases, the giant prokaryotes are powered by multiple genome copies. On that theme, see Lane and Martin 2010.
‘In a large and rapidly replicating population, under specific circumstances a paralog gene copy arising from a duplication event ……has a significantly greater probability of being deleted….’ This requires more detail regarding gene duplication outcomes: Following a gene duplication event, a resulting paralog copy can acquire deleterious mutations and be lost, or rarely acquire advantageous mutations providing a positive selection (neofunctionalization). But another possible outcome is where deleterious mutations occur in both gene copies, such that both are required for the continuing fitness of the host organism. In such circumstances of subfunctionalization, the original functions of the single encoded gene product are thus distributed between the two duplicate copies.
Another significant point with respect to the population size argument of M. Lynch and colleagues is that selectable fixation of mutational variants will always take longer in large replicating populations than in small ones. Where subfunctionalization occurs after gene duplication, additional mutational changes can occur which completely inactivate one copy, a terminal loss of fitness. In a large population base, such events will act against the species-wide retention of gene subfunctionalization much more strongly than in small populations. The latter, therefore, are subject to relatively increased complexification as a result of the preservation of this type of gene duplication.
‘ This ‘explosion’ of complexity in a relatively short period of geological time has long been pondered, although molecular phylogenetic data have suggested earlier origins of many phyla.’ See Jermiin et al. 2005 for some discussion of these themes.
‘….the first evolution of ‘good vision’ was an enabling factor for the rapid evolution…..’ See Zhao et al. 2013 for a recent study, and discussion of this notion.
‘Fitting a [complexity] ‘law’ into what is governed by environmental changes…..’ See Auerbach & Bongard 2014 for an in silico study of environmental effects on the evolution of complexity. They find environmental complexity and model organismal complexity are correlated, suggesting complexity may only be favored in certain biological niches.
Next Post: April.