Parsimony and Modularity – Key Words for Life
Sometimes Biopolyverse has considered aspects of life which may be generalizable, such as molecular alphabets. This post takes a look at another aspect of complex life which is universal on this planet, and unlikely to be escapable by any complex biology. The central theme is the observation that the fundamental processes of life have an underlying special kind of economy, which may be termed biological parsimony. Owing its scope and diversity, this will be the first of a series dealing with this issue. Here, we will look at the general notion of parsimony in a biological context, and begin to consider why such arrangements should be the rule. Some biological phenomena would seem to challenge the parsimony concept, and in this initial post we will look at certain features of the protein universe in this respect.
In the post of January 2014, the role of biological parsimony in the generation of complexity was briefly referred to. The fundamental issue here concerns how a limited number of genes could give rise to massively complex organisms, by means of processes that shuffle and redeploy various functional components. Thus, the ‘thrifty’ or parsimonious nature of biological systems is effectively enabled by the modularity of a basic ‘parts list’. A modest aphorism could thus state:
“Parsimony is enabled by Modularity; Modularity is the partner of Parsimony”
The most basic example of modularity in biology can be found with molecular alphabets, which were considered in a recent post. Generation of macromolecules from linear sequence combinations of subunits from a distinct and relatively small set (an ‘alphabet’ in this context) has a clear modular aspect. Subunit ‘letters’ of an alphabet can be rearranged in a vast number of different strings, and it is this simple principle which gives biological alphabets immense power as a fundamental tool underlying biological complexity.
This and several other higher-level modular aspects of biological systems are outlined in Table 1 below.
Table 1. Major Levels of Biological Modularity.
- Molecular alphabets: For an extended discussion of this theme, see a previous post. The modularity of any alphabet is implicit in its ability to generate extremely large numbers of strings of variable length, with specific sequences of the alphabetic ‘letters’.
- Small molecular scaffolds: Small molecules have vital roles in a wide variety of biological processes, including metabolic, synthetic, and regulatory activities. In numerous cases, distinct small biomolecules share common molecular frameworks, or scaffolds. The example given here (perhydrocyclopentanophenanthrene skeleton) is the core structure for cholesterol, sex hormones, cardiac glycosides, and steroids such as cortisone.
- Protein folds: Although a large number of distinct protein folds are known, some in particular have been ‘used’ by evolution for a variety of functions. The triosephosphate isomerase (TIM) (β α)8 -barrel fold (noted as the example in the above Table) has been described as the structural core of >170 encoded proteins in the human genome alone.
- Alternate splicing / differential intron & exon usage: The seemingly low numbers of protein-encoding genes in the human genome is substantially boosted by alternate forms of the splicing together of exonic (actual coding) sequence segments from single primary transcripts. This can occur by skipping or incorporation of specific exons. Also, the phenomenon of intron retention is another means of extending the functionality of primary transcripts.
- Alternate / multiple promoters: Many gene products are expressed in different tissues or different developmental stages in multicellular organisms. This is often achieved through single promoters subject to differential activating or repressing influences, such as varying transcription factors, or negative regulation through microRNAs (miRNAs). Another way of extending the versatility of a single core gene is seen where greater than one promoter (sometimes many) are upstream of a core coding sequence. With this arrangement, the regulatory sequence influences on each promoter can be clearly demarcated, and transcripts from each alternate promoter can be combined with alternate splicing mechanisms (as above with (4), often with the expression of promoter-specific 5’ upstream exons. A classic example of this configuration is found with the microphthalmia gene (MITF) which has many isoforms through alternate promoters and other mechanisms.
- Recombinational Segments: As a means of increasing diversity with a limited set of genomic sequences, in specific cell lineages recombinational mechanisms can allow a combinatorial assortment of specific coding segments to produce a large number of variants. The modularity of such genetic sequences in these circumstances is obvious, and is a key feature of the generation of diversity by the vertebrate adaptive immune system.
- Protein complex subunits: Protein-protein interactions are fundamental to biological organization. There are many precedents for complexes made up of multiple protein subunits having distinct compositions in different circumstances. Thus, a single stimulus can signal very different results in different cellular backgrounds, associated with different protein complexes being involved in their respective signaling pathways. Enzymatic complexes, such as those involved in DNA repair, can also show subunit-based modularity.
- Cells: From a single fertilized zygote, multicellular organisms of a stunning range of shapes and forms can be grown, based on differentiation and morphological organization. Thus, cellular units can be considered a very basic form of biological modularity.
Discussion of both small molecule and macromolecular instances of modularity / parsimony will be extended in succeeding posts.
Some of these modularity levels are interlinked in various ways. For example, the evolutionary development of modular TIM barrels may have been enhanced by alternate splicing mechanisms. Indeed, the latter process may be of general evolutionary importance, particularly in the context of gene duplications. In such circumstances, one gene copy can evolve novel functions (subfunctionalization) sometimes associated with the use of alternate splice variation.
* Certainly this Table is not intended to be comprehensive with respect to modularity mechanisms, but illustrates some major instances as pertinent examples.
When a person is referred to as ‘parsimonious’, there are often connotations of miserliness, or a suggestion that the individual in question is something of a skinflint. In a biological context, on the other hand, the label of parsimony is nothing but a virtue, since it is closely associated with the efficiency of the overall biological system.
Pathways to Parsimony
When modular components can be assembled in different ways for different functions, the outcome is by definition more parsimonious than producing distinct functional forms for each task. An alphabetic system underlies the most fundamental level of parsimony, but numerous high-order levels of parsimonious assembly can also exist, as Table 1 indicates.
Evolution itself is highly conducive to parsimony, simply owing to the fact that multiple functional molecular forms can be traced back to a common ancestor which has diversified and branched through many replicative generations. As noted in the footnotes to Table 1, gene duplication (or even genome duplication) is a major means by which protein evolution can occur, via the development of functional variants in the ‘spare’ gene copies. It is the ‘tinkering’ nature of evolution which produces a much higher probability that pre-existing structures will be co-opted into new roles than entirely novel structures developed.
But there is a second evolutionary consideration in the context of biological parsimony, and that is where bio-economies, or bioenergetics, comes to the forefront. Where biosystems are in replicative competition, it is logical to assume that a system with the most efficient means of copying itself will predominate over rivals with relatively inferior processes. And the copying mechanism will be underwritten by the entire metabolic and synthetic processes used by the biosystem in question. Efficiency will thus depend on how streamlined the biosystem energy budget can be rendered, and the most parsimonious solutions to these questions will thus be evolutionarily favored.
If evolution is a truly universal biological feature (as postulated within many definitions of life) then bioparsimony is accordingly highly likely to be a universally observed principle in any biological system anywhere in the universe.
Counterpoints and Constraints: Protein Folding
Certain observations might seem to run in a contrary fashion to the proposed fundamental nature of parsimony and modularity in biology. Let’s initially take a look at protein folding as an initial case in point.
Folds and Evolution
Table 1 highlights the modularity of certain protein folds, but this is certainly not a ubiquitous trait within the protein universe. On the one hand we can cite the instances of specific protein folds which are widespread in nature, fulfilling many different catalytic or structural functions (as with the TIM-barrel fold; Table 1). Yet at the same time, it is true that many folds (>60%) are restricted to one or two functions.
While all proteins may ultimately be traceable back to a very limited set of prototypical forms (if not a universal common ancestor in very early molecular evolution), it appears that some protein folds are much more amenable to evolutionary ‘tinkering’ than others. This has been attributed to structural aspects of certain folds, in particular a property which has been termed ‘polarity’. In this context, polarity essentially refers to a combination of a highly ordered structural scaffold encompassing loop regions whose packing within the total fold is relatively ‘loose’ and amenable to sequence variation.
It follows logically that if mutations in Fold A have a much higher probability of creating novel activities than mutations in Fold B, then variants of Fold A will be more likely to expand evolutionarily (through gene duplication or related mechanisms). Here the TIM-barrel motif is a representative star for the so-called ‘Fold A’ set, which in turn are exhibitors of the polarity property par excellence.
While some natural enzymatic activities are associated with single types of folds, in other cases quite distinct protein folds can mediate the same catalytic processes. (Instances of the latter are known as analogous enzymes). It does not necessarily follow, however, that the absence in nature of an analogous counterpart for any given protein catalyst indicates that an alternative folding solution for that particular catalytic activity is not possible per se. In such circumstances, a potentially viable alternative structure (another polypeptide sequence with a novel fold constituting the potential analogous enzyme) has simply never arisen through lack of suitable evolutionary antecedents.
By their nature, the blind processes of natural selection on a molecular scale will favor certain protein folds simply by virtue of their amenability to innovation. If every catalytic or structural task could be competitively fulfilled by only a handful of folds, the protein folding universe would likely show much less diversity than is noted in extant biology. Evolution of novel folds will be favored when they are more efficient for specific tasks than existing structures. All of this is underpinned by the remarkable parsimony of the protein alphabet, especially when one reflects upon the fact that an astronomical number of possible sequences can be obtained with a linear string of amino acids corresponding to even a small protein.
Parsimony and Necessity
Although so far this musing on parsimony and modularity has barely scratched the surface of the topic as a whole, at this point we can round off this post by considering briefly why parsimonious bio-economies should be so ubiquitously observed.
Some aspects of biology which inherently invoke parsimony may be in themselves fundamentally necessary for any biological system development. For example, molecular alphabets appear to be essential for biology in general, as argued in a previous post. Likewise, while construction of complex macroscopic organisms from a relatively small set of cell types, themselves differentiated from a single zygote, can be viewed as a highly parsimonious system, there may be no other feasible evolutionary pathway which can produce comparable functional results.
But, as indicated by the above discussion of protein folds, other cases may not be quite so clear-cut, and require further analysis. Complex trade-offs may be involved, as with the factors determining genome sizes, which we will address in the succeeding post.
It is clear that evolutionary selection for energetic efficiency is surely a contributing factor to a trend towards biological parsimony, as also noted above. But apart from bioenergetics, one might propose factors in favor of parsimony which relate to the informational content of a cell. Thus, if every functional role required for all cellular activities (replication in particular) was represented by a completely distinct protein or RNA species, it could be speculated that the resulting scale-up of complexity would place additional constraints on functional viability. A great increase in all molecular functional mediators might be commensurate with a corresponding increase in deleterious cross-interactions, solutions for which might be difficult to obtain evolutionarily. Of course, such a ‘monomolecular function’ biosystem would be unlikely to arise in the first place, when competing against more thrifty alternatives. The latter would tend to differentially thrive through reduced energetic demands, if not more ready solutions to efficient interactomes. Consequently, it probably comes down to bioenergetics once more, if a little more indirectly.
Finally, a bio-polyverse salute to the so-called parsimony principle in biology:
Evolution can tinker with bits
In ‘designing’ selectable hits
Is a route to creation
Thus parsimony works, and it fits.
References & Details
(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).
Some of the issues covered in this post were considered in the free supplementary material for Searching for Molecular Solutions, in the entry: SMS-Extras for Ch. 9 (Under the title of Biological Thrift).
Table 1 Footnote references:
‘ Small molecules have vital roles in a wide variety of biological processes……’ See the above supplementary downloadable material (Searching for Molecular Solutions –Chapter 9).
‘…..The triosephosphate isomerase (TIM) (β α)8 -barrel fold is known as the structural core of >170 encoded proteins….’ See Ochoa-Levya et al. 2013. Additional folds accommodating diverse functions are noted in Osadchy & Kolody 2011.
‘ The modularity of such genetic sequences in these circumstances is obvious, and is a key feature of the generation of diversity by the vertebrate adaptive immune system.’ For a general and search-accessible overview of immune systems, see the text Immunobiology 5th Edition. For an interesting recent hypothesis on the origin of vertebrate adaptive immunity, see Muraille 2014.
‘…..a single stimulus can signal very different results in different cellular backgrounds….’ / ‘ Enzymatic complexes, such as those involved in DNA repair, can also show subunit-based modularity.’ To be continued and expanded in a subsequent post with respect to parsimony involving proteins and their functions.
‘…..the evolutionary development of modular TIM barrels may have been enhanced by alternate splicing mechanisms.’ See Ochoa-Levya et al. 2013.
‘….the latter process [alternate splicing] may be of general evolutionary importance, particularly in the context of gene duplications…..’ See Lambert et al. 2015.
‘ If evolution is truly universal (as postulated within many definitions of life) …..’ See Cleland & Chyba 2002.
‘…..some natural enzymatic activities are associated with single types of folds…’ An example is dihydrofolate reductase (cited also in Tóth-Petróczy & Tawfik 2014), the enzymatic activity of which is mediated by a fold not used by any other known biological catalysts.
‘….in other cases quite distinct protein folds can mediate the same catalytic processes. (Instances of the latter are known as analogous enzymes).’ See Omelchenko et al. 2010.
‘…..an astronomical number of possible sequences can be obtained with a linear string of amino acids corresponding to even a small protein.‘ See an earlier post for more detail on this.
‘…..molecular alphabets appear to be essential for biology in general…..’ See also Dunn 2013.
Next Post: August.