
Lesch-Nyhan Disease: A Moonlighting Example?

March 18, 2018

Previous posts of Biopolyverse (particularly that of August 2016) have considered the phenomenon of protein ‘moonlighting’, where many proteins have been well documented as possessing more than one distinct functional role. In this post, an example is considered where moonlighting is a possible explanation of a striking phenotype associated with a specific genetic lesion. The gene in question is HPRT1, encoding the enzyme hypoxanthine-guanine phosphoribosyltransferase, which is important in cellular purine salvage pathways.

Terrible but Fascinating

In 1964, William Nyhan and Michael Lesch published a description of a hitherto unrecognized genetic disease, which has eponymously become known as Lesch-Nyhan syndrome, or Lesch-Nyhan disease (LND). It was later shown that this condition was characterized by the genetic loss of a specific enzyme, called hypoxanthine-guanine phosphoribosyltransferase. (This has been abbreviated in various ways, including HGPRT, HGPRTase, and HPRT; for simplicity, the latter term will be used here to refer to this enzyme.) The gene encoding HPRT (HPRT1) is situated on the X-chromosome, and as a sex-linked gene, its functional loss is almost exclusively seen in males.

What does HPRT, in its role as a protein catalyst, normally do? It is in fact well understood as a key mediator within a metabolic pathway which allows purines (chemical components of A and G nucleotides that are, along with pyrimidines, fundamental in DNA and RNA structure) to be recycled back into active biochemical function. This process is thus called a ‘salvage’ pathway, and is seen ubiquitously across all domains of life. In humans a defective purine salvage pathway results in the accumulation of uric acid, which can cause gout and kidney damage.

Although this is indeed a feature of Lesch-Nyhan disease, it is not the known biochemical enzymology of HPRT which creates the compelling interest in this condition. Since HPRT is expressed in all nucleated cells, it might be expected from first principles that its profound deficiency would not have particular organ-specific effects. And yet it most certainly does, in the nervous system, where a range of cognitive impairments is seen in affected children. Even this is not as striking as the ‘behavioral phenotype’ of self-mutilation, which is a hallmark of ‘classic’ LND. This self-injurious behavior (SIB) can take the form of severe lip and finger biting, or other forms of self-damage, which have been graphically described in the literature. As a relentlessly compulsive affliction, it is a terrible burden on both victims and their carers, without any truly effective treatment. Thus, as many have noted before, loss of a single gene product can evidently have a profound effect on human behavior. While self-harming behaviors arising from genetic mutations in humans are not unique to LND, SIB is especially pronounced in this disease, and is one diagnostic criterion for its ‘classic’ form. There are many ramifications of these observations for neurology, psychology, and even philosophy, all of which lend an extra element of compelling interest to this compulsive disorder.


De novo vs. Salvage

In general terms, a salvage enzyme can be seen as an energy-saving back-up, promoting the efficiency of housekeeping operations in any biosystem by recycling precursor compounds. As such, salvage has its own biological and evolutionary logic, and is accordingly a ubiquitous biological feature. The purine salvage processes mediated (in part) by HPRT are depicted in Fig. 1 below.


Fig. 1. HPRT and purine salvage pathways. Both the purines hypoxanthine and guanine are substrates for HPRT catalysis, with the additional requirement for 5’-phosphoribosyl pyrophosphate (PRPP) to provide an activated ribose moiety towards the formation of purine nucleotides. Guanosine monophosphate (GMP) is made directly from guanine and PRPP via the catalytic auspices of HPRT, while hypoxanthine is converted by HPRT to inosine monophosphate (IMP), which requires additional enzymatic processing to yield either GMP or adenosine monophosphate (AMP). AMP is also produced by the activity of a different salvage enzyme, adenine phosphoribosyl transferase (APRT), which acts on adenine plus PRPP.

___________________________________________________________________
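For readers inclined to a more schematic view, the branch structure of Fig. 1 can be condensed into a minimal reaction map in code. This is purely an illustrative sketch: reaction details (PPi release, and the identities of the downstream enzymes that process IMP) are deliberately omitted.

```python
# A minimal sketch of the salvage reactions in Fig. 1 (simplified: PPi release
# and the individual downstream enzymes acting on IMP are omitted).
SALVAGE = {
    ("guanine", "HPRT"): ("GMP",),                 # guanine + PRPP -> GMP
    ("hypoxanthine", "HPRT"): ("IMP",),            # hypoxanthine + PRPP -> IMP
    ("adenine", "APRT"): ("AMP",),                 # adenine + PRPP -> AMP
    ("IMP", "downstream enzymes"): ("AMP", "GMP"), # further processing of IMP
}

def products_from(compound: str) -> list:
    """Collect every nucleotide ultimately reachable from a starting compound."""
    out = []
    for (substrate, _enzyme), products in SALVAGE.items():
        if substrate == compound:
            for p in products:
                out.append(p)
                out.extend(products_from(p))  # follow intermediates such as IMP
    return out

print(products_from("hypoxanthine"))  # ['IMP', 'AMP', 'GMP']
print(products_from("guanine"))       # ['GMP']
```

The traversal makes the asymmetry of Fig. 1 explicit: guanine feeds GMP directly, while hypoxanthine reaches both GMP and AMP only via IMP.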

Human HPRT is a protein of 217 amino acid residues (Fig. 2), which requires magnesium for its catalytic activity, and is normally found as a tetramer (four copies of the monomeric protein).

Fig. 2. Structure of a monomer of human HPRT in complex with guanosine monophosphate (GMP). This image has been generated from Protein Databank entry 1HMP. Here alpha-helices, beta-sheets, and turns are indicated by red, green, and blue segments respectively.

___________________________________________________________________

A salvage pathway stands in contrast to synthesizing a biochemical compound from scratch, usually termed ‘de novo’ synthesis (from Latin, ‘anew’). Most organisms possess both synthetic approaches towards purines in their biochemical repertoires, such is the fundamental biological importance of these compounds. Humans are certainly included among species which can synthesize their own purines as well as recycle them via salvage pathways. The de novo synthesis of purines is well-defined, involving numerous enzymatic steps, whose mediating enzymes can assemble into multienzyme complexes termed ‘purinosomes’ under conditions of high purine demand.

Another moonlighting enzyme?

Against this background, we have seen that the enzymatic activity of HPRT is well understood at the molecular level. So why should it be a candidate for having a functional moonlighting role, in addition to its normal catalytic task? Before commenting further, it will be useful to note the general meaning of moonlighting in this context, with reference also back to a previous post on this topic. If true moonlighting is involved, a second functional role implies that the enzyme performs some function quite distinct from its normal catalysis. Thus, HPRT is sometimes noted as having two roles in terms of its ability to utilize more than one substrate (both guanine and hypoxanthine), but this is not moonlighting in itself. A moonlighting second function would necessarily involve something unrelated to the activity towards either conventional HPRT substrate (Fig. 1), and could be quite distinct from purine salvage entirely. In general terms, a moonlighting function for any enzyme will necessarily involve interactions with other biomolecules beyond its conventional substrates or cofactors. Since an enzyme’s catalytic center is usually evolutionarily fine-tuned for a specific activity, a second functional site will logically be more likely found on a distinct domain of the same protein. This kind of functional separation has been observed with known polyfunctional proteins, but overlap between functional domains may also occur.

At first glance, imputing a moonlighting role to HPRT might seem quite rational, given the striking neurological phenotype of Lesch-Nyhan victims and the apparent difficulty of accounting for how absence of salvage enzyme function alone could bring about such higher-level behavioral abnormalities. Indeed, the possibility of a moonlighting role for HPRT is not in itself an original suggestion of this post, and it would be quite surprising if it was. Yet the consideration of hypothetical HPRT moonlighting does not seem to have been much pursued in the literature.

There is in fact a likely reason for this apparent neglect, and that stems from a plausible manner in which HPRT deficiency could have deep ramifications beyond merely ensuring that DNA and RNA synthesis is kept up to speed.

Guanosine nucleotides – a definite dual role

As a term in biology, moonlighting is generally applied to proteins, but it is possible to extend its ambit broadly in general biosystems, in line with the notion of biological parsimony (see the previous posts cited in the References & Details below). One such example can be found which is strongly relevant here, and that concerns the biology of guanosine nucleotides.

A very large and diverse family of mammalian cell surface receptors are called GPCRs, for G-Protein Coupled Receptors. As the name implies, these receptors signal through coupled G proteins, which use guanosine nucleotides (GDP and GTP) as a molecular on-off switch for the signaling process. So, in a truly parsimonious manner, guanosine nucleotides play a fundamental role in a wide variety of essential signal transduction pathways, as well as being basic structural and informational constituents of nucleic acids. The possible relevance of this to the Lesch-Nyhan phenotype comes from two pieces of information: (1) the neurotransmitter dopamine acts as a ligand for five separate receptors, all of the GPCR class, and (2) perturbations in dopamine-mediated (dopaminergic) neural pathways have been reported in LND and associated experimental cellular models. The linking of these observations with the LND phenotype comes from the reasonable proposition that deficiency in the purine salvage pathway leads to a corresponding shortfall in neural GDP / GTP pools, with consequent disarray in certain (principally dopaminergic) neural signaling pathways. By this proposal, such abnormalities ultimately result in the over-riding of the normal avoidance of self-destructive behavior.
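The nucleotide on-off switch just described can be sketched as a minimal state machine. This is an illustrative sketch only: the beta/gamma subunits, exchange factors, and GTPase-accelerating proteins of real G-protein cycling are all omitted.

```python
# Minimal sketch of the G-protein nucleotide switch (heavily simplified:
# beta/gamma subunits and regulatory proteins are omitted).
class GProtein:
    def __init__(self):
        self.bound = "GDP"          # resting state: inactive, GDP-bound

    def receptor_activated(self):
        """An agonist-bound GPCR promotes GDP release and GTP loading."""
        self.bound = "GTP"          # switch ON: downstream signaling enabled

    def hydrolyze(self):
        """Intrinsic GTPase activity converts GTP to GDP, ending the signal."""
        self.bound = "GDP"          # switch OFF

    @property
    def signaling(self) -> bool:
        return self.bound == "GTP"

g = GProtein()
g.receptor_activated()   # dopamine binding one of its GPCRs would drive this step
print(g.signaling)       # True: the switch is on
g.hydrolyze()
print(g.signaling)       # False: back to the off state
```

The proposal outlined above amounts to saying that, with salvage lost, the GTP pool feeding this switch could run short in certain neurons.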

A problem with this interpretation comes from direct measurement of guanosine nucleotide pools in normal vs. LND neural tissues. If salvage deficiency meant that neurones could not maintain sufficient levels of purine nucleotides, this should be quantifiable, and yet experimentation has not borne this out. Defects in the salvage pathway trigger up-regulation of de novo purine synthesis, evidenced by high purinosome levels in Lesch-Nyhan cells, irrespective of disease severity. As a counter-argument, it has been suggested that de novo synthesis may nevertheless be unable to cope with the unusually high purine demand placed on certain neural cells in particular, given their requirements for GTP / GDP in signaling as well as nucleic acid turnover. Yet these are not the only cell types with this extra requirement.

As an alternative possibility, it has been proposed that excessive de novo purine production in the absence of salvage pathways may lead to the accumulation of neurotoxic by-products, but this has not been well-substantiated. In that general vein, it has long been known that treatment of LND patients with an inhibitor of uric acid formation alleviates gout-like symptoms, but without effect on the SIB phenotype.

Patterns of mutations

In considering the possibility of a moonlighting role for HPRT, it is necessary to look at the possible effects of mutations in a protein’s amino acid sequence that could arise as a consequence of human germline mutations at the DNA level. Obviously, large deletions or rearrangements could destroy the entire coding sequence for a protein, and all of its possible functional activities would be irrevocably lost. The effects of single amino acid residue substitutions could range from protein misfolding and aggregation or degradation (associated with globally low activity), to having little or no bearing on effective function. In between these two extremes are mutations which are compatible with global protein folding (albeit with possibly reduced stability), while associated with localized functional loss. This is depicted schematically in Fig. 3 below.


Fig. 3. Possible functional outcomes of point amino acid mutations in an enzyme with two functional sites (catalytic and a second unrelated Functional Site 2), which are physically separated in the protein’s mature folded tertiary state. Many such mutations may be incompatible with proper folding, triggering cellular responses which recognize and degrade imperfectly folded proteins. In this schematic, a subset of mutations is also noted which are closely localized in either the catalytic site or the second Functional Site 2, where both are compatible with global folding and only adversely influence the local function to which they are proximal. In Category A, catalysis is preserved, but Function 2 is lost, while in the reverse-case Category B, catalysis is ablated by mutation, but the distal Function 2 site remains operational. For a real dual-site moonlighting protein, neither outcome might be attainable owing to compromised global folding, or only one of the two Categories might be feasible in practice. Also, the ‘clean’ status of the Category A/B dichotomy may not be realized in a real-world situation, where a mutation severely reducing catalytic activity may not totally spare the second Function 2, but still allow enough activity to avoid the phenotypic consequences of complete Function 2 loss. And the reciprocal situation, with a completely null Site 2 mutation accompanied by only partial retention of catalysis, is also formally possible.

___________________________________________________________________
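To condense the logic of Fig. 3, the contrasting predictions can be written out as a small function. This encodes the hypotheses exactly as framed in this post, not established clinical rules.

```python
# Sketch of the phenotype predictions contrasted in this post (hypotheses
# as framed here, not established clinical rules).
def predict(catalysis_intact: bool, function2_intact: bool) -> dict:
    """Phenotypes expected under the moonlighting hypothesis, where uric acid
    over-production tracks loss of catalysis and SIB tracks loss of the
    hypothetical second function."""
    return {
        "uric_acid_overproduction": not catalysis_intact,
        "SIB": not function2_intact,
    }

# Category A (Fig. 3): catalysis preserved, Function 2 lost -> SIB alone.
print(predict(catalysis_intact=True, function2_intact=False))
# Category B (Fig. 3): catalysis ablated, Function 2 preserved -> gout-like
# sequelae without SIB.
print(predict(catalysis_intact=False, function2_intact=True))
# The monofunctional view instead ties both phenotypes to a single variable
# (residual catalytic activity), so 'clean' dissociations should not occur.
```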

How then do patterns of mutations observed in the clinic, with respect to the Lesch-Nyhan SIB phenotype and HPRT levels, support or refute the moonlighting hypothesis? If the postulated second ‘neurological’ function of HPRT was associated with a region of the protein completely distinct from the enzyme active site, then it could be possible to find ‘clean’ mutations which knocked out one function while largely preserving the other. In such circumstances, it would be possible in principle to find germline mutations resulting in enzymatically ablated or minimal HPRT activity, but where no SIB phenotype was manifested (Category B of Fig. 3). Likewise, a mutationally-induced SIB syndrome would be possible where no or little impairment of HPRT enzymatic activity could be found (Fig. 3, Category A).

Certainly mutants of HPRT which retain virtually full catalytic activity in vitro are known. In fact, a triple mutant with replacement of 3 cysteine residues with alanines has been used as an active surrogate of the wild-type enzyme for structural determination, given its greater resistance to oxidation over its natural counterpart (Keough et al. 2005). It is possible, however, that these mutational changes could have an impact on the hypothetical moonlighting Function 2. These specific mutations have not been described in human patients, and of course cannot be deliberately introduced into the human germline in order to test their effects.

In fact, there have been reports of HPRT mutations in human patients which appear to be consistent with the Category B scenario of Fig. 3 (very low or ablated HPRT without the SIB phenotype). Such findings have been somewhat controversial, since many instances of ‘null’ HPRT activity (especially in relatively early reports) have been discounted by some groups as due to inadequate assay procedures or other experimental shortcomings. Yet it is not at all controversial that clinical HPRT deficiency in itself ranges over a spectrum, from virtually negligible activity even with sensitive assays, to relatively mild impairment. Across this spectrum, cases with HPRT deficiency at various levels but without the SIB phenotype have been termed ‘mild’ LND or ‘nsLND’ (non-self-injurious), or grouped as a separate condition (Kelley-Seegmiller syndrome). These considerations aside, if HPRT is solely monofunctional as an enzyme, it would seem a little inconsistent to propose that certain neural cells have an elevated requirement for purine salvage as well as de novo synthesis (which accounts for the LND phenotype), while at the same time insisting that even low residual levels of HPRT can allow escape from SIB but not from other consequences of salvage deficiency, such as uric acid over-production and deposition.

The moonlighting hypothesis would be consistent with mild LND cases (without SIB) showing distinct patterns of mutations in HPRT, and this indeed has been observed. Nevertheless, it has been noted that the severity spectrum of LND correlates with HPRT enzyme function, where the lowest residual function is seen with the full SIB phenotype. For this to be fully accepted, it is necessary to discount reports of the SIB manifestation in the presence of an HPRT null phenotype, ascribed as noted above to assay problems. Even so, many mutations with global effects would by definition be expected to knock out all or most protein functions, so the correlation of SIB with the lowest enzymatic phenotypes does not exclude moonlighting at all. And a completely ‘clean’ mutation for Category B (Fig. 3) may not exist, where it has such a localized effect that only catalytic activity is ablated without affecting the hypothetical moonlighting function.

But what of the reverse-case Category A scenario of Fig. 3 (normal or effectively functional HPRT in the presence of the SIB phenotype)? This is even harder to pin down, but reference to such circumstances has been made in the literature. An obvious problem in this regard, particularly with older reports, is that enzymatic measurements alone provide insufficient information. Detection of normal HPRT enzymatic activity in the presence of a SIB phenotype does not answer the question of whether the patient’s corresponding HPRT gene bears any mutations consistent with a Category A (Fig. 3) scenario. Clinical observations that other genetic syndromes can have SIB-like phenotypes (as noted above) are especially pertinent in this regard. Thus, a formal demonstration of the Category A effect would require not only a case of compulsive SIB in association with normal (or near-normal) HPRT levels, but also definitive proof of a coding sequence mutation in the HPRT gene from the same individual.

Pros and cons for HPRT as a moonlighting protein

If we apply the wisdom of Occam’s Razor, we look first to the simplest explanation that fits the facts, and certainly loss of a single well-characterized activity from the product of the HPRT1 gene is the simplest starting point when attempting to explain the observed Lesch-Nyhan phenotypes. Yet, where the facts demand it, additional complexities may need to be introduced – and biology is notoriously complex in this regard.

In Table 1 below, some relevant experimental observations are listed, together with their interpretations from both the monofunctional stance (HPRT catalysis as the sole activity of the gene product) and the moonlighting hypothesis.

Table 1

Table 1. Contrasting interpretations for experimental observations between the moonlighting hypothesis for HPRT (> 1 distinct functions) and the monofunctional catalytic stance. Some of these experimental findings have been discussed above; other relevant examples are also added.

It was of interest that artificially knocking out the murine HPRT gene did not result in a SIB phenotype. Subsequently it was reported that chemical inhibition of the additional salvage enzyme APRT (adenine phosphoribosyl transferase) in HPRT-null mice did result in SIB effects, and this was accordingly cited as an animal LND model. Unfortunately, this was not borne out by additional studies, which showed that double APRT-HPRT mouse knock-outs lacked the SIB traits of Lesch-Nyhan victims. In any case, the very different neurological backgrounds of mice and humans can be accommodated by either the moonlighting or the monofunctional interpretation. In the former case, a secondary moonlighting function (lacking in mice) could have been acquired evolutionarily long after the divergence of the common mammalian ancestors of rodents and primates; in the latter monofunctional scenario, an increased requirement for guanosine nucleotides in dopaminergic neural signaling (or other pathways) could likewise be proposed as having arisen after the rodent-primate divergence.

___________________________________________________________________

As a final point with respect to moonlighting itself, it has been noted that a subset of moonlighting proteins may operate by means of intrinsically unstructured domains, which can assume a specific conformation under the correct physiological circumstances. But from the diversity of known protein moonlighting examples, evidently this does not apply universally. Nevertheless, it is of interest that while HPRT does not possess intrinsically disordered domains, it has been shown to undergo exceptionally pronounced conformational changes during its catalytic cycle. Conceivably, such flexibility could be relevant to its hypothetical second role as a moonlighter during interactions with another biological partner protein or other mediator(s).

In the absence of other information, the possibility of a second moonlighting function for HPRT could be experimentally evaluated with targeted searches for its global proteomic interactions in a neural cellular background. If HPRT has a defined second role, in principle it should be demonstrable by finding novel partners of it within specific human neural interactomes, especially in comparison with its murine counterparts.
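In computational terms, the comparison proposed here ultimately reduces to a set operation over lists of detected interaction partners. In the sketch below, all partner names are hypothetical placeholders rather than experimental findings.

```python
# Sketch of the proposed cross-species interactome comparison. All partner
# names are hypothetical placeholders, not experimental results.
human_hprt_partners = {"PRPP synthetase", "partner_X", "partner_Y"}
mouse_hprt_partners = {"PRPP synthetase", "partner_X"}

# Partners detected in human neural cells but absent from the murine
# interactome would be candidates for a primate-specific moonlighting contact.
candidates = human_hprt_partners - mouse_hprt_partners
print(candidates)   # {'partner_Y'}
```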

Finally, a global comment in a biopoly(verse) format:

 

As an enzyme, I think that you will find

HPRT has a role well-defined

But its functions may range

Since its loss is quite strange

With the effects that it has on the mind

 

References & Details

(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).

In 1964, William Nyhan and Michael Lesch published a description of a hitherto unrecognized genetic disease…..’ See Lesch & Nyhan (1964). The first encounter of these medical researchers with this condition dates back to 1962, as described by Richard Preston in a New Yorker article in 2007.

The gene encoding HPRT (HPRT1) is situated on the X-chromosome….’     See Fu et al. 2014. Unlike males, females have two copies of the X chromosome, one of which is normally inactivated in each cell as a mechanism for gene expression dosage compensation. Since the inactivation process randomly operates on either X copy in a cell, if an X-linked gene is defective on one copy (heterozygous), on average 50% of cells will contain a functional X chromosome copy correctly expressing the gene of interest. This will reduce the overall expression level, and in the case of HPRT1 can be associated with an increased propensity for gout in affected females, depending on the specific mutation involved, or other factors. Expression from a single functional X-chromosome HPRT1 copy is, however, sufficient to prevent the neurological symptoms of Lesch-Nyhan disease from arising.

‘……self-injurious behavior (SIB) can take the form of severe lip and finger biting, or other forms of self-damage…..’     See Preston 2007; Fu et al. 2014.

Human HPRT is a protein of 217 amino acid residues….’     The N-terminal methionine residue is naturally cleaved off, such that numbering of the HPRT sequence begins with the N-terminal alanine residue (Keough et al. 2005).

Structure of a monomer of human HPRT in complex with guanosine monophosphate ’ (Fig. 2).      It has been found that HPRT protein in the absence of either its substrates or products (as in Fig. 2, with GMP) is unstable in vitro. The free enzyme was eventually successfully crystallized and structurally resolved by the use of a triple mutant with 3 of the 4 natural cysteine residues replaced with alanines, a manipulation which did not affect its enzymatic behavior (Keough et al. 2005).

This image has been generated from Protein Databank entry 1HMP….’ (Fig. 2).      The structure was derived by Eads et al. 1994. The image itself was generated from the Protein Databank 1HMP entry with Protein Workshop software (Moreland et al. 2005).

Most organisms possess both synthetic approaches towards purines in their biochemical repertoires….’      A notable exception to this guideline is provided by certain protozoan parasites, which have lost the de novo purine synthetic pathway in favor of exploiting freely available host purines through their own salvage pathways. See el Kouni 2003.

The de novo synthesis of purines involves numerous enzymatic steps….’      For more detail, see Biochemistry 5th Edition, Berg et al. 2002.

‘…..multienzyme complexes termed ‘purinosomes’….’      See Pedley & Benkovic 2017.

This kind of functional separation [between distinct protein sites associated with specific functions] ……’  See a previous post with specific reference to the Large T protein of SV40 virus, which has at least seven different sites performing different functions.

‘……the possibility of a moonlighting role for HPRT in itself it is not an original suggestion…..’      See Ceballos-Picot et al. 2009, for a mention of conceivable HPRT moonlighting in the Discussion of this paper.

‘…….the notion of biological parsimony……’      See previous posts 21 April 2015; 24 August 2015; 21 February 2016; and 28 August 2016.

‘……perturbations in dopamine-mediated (dopaminergic) neural pathways have been reported in LND….’      See (for example) Bell et al. 2016.   ‘….and associated experimental cellular models.’      See Kang et al. 2013, in a study of the effects of HPRT knock-down on developmental pathways of murine embryonal stem cells (ESCs). Once again, evidence for a role for HPRT in dopaminergic signaling / neural development does not preclude a moonlighting role distinct from catalysis, since such a moonlighting function could be directly connected with a neural signaling / developmental pathway in a manner that has no direct connection with the purine salvage catalytic role. In any case, whatever the effects of HPRT deficiency in the murine ESC system, complete ablation of HPRT in mice does not result in a SIB phenotype, as noted below.

A clinical observation consistent with the above-noted role of dopaminergic signaling in LND is a certain level of responsiveness of LND patients to D1 dopamine receptor antagonists in reducing SIB. With respect to this, see Khasnavis et al. 2016.

‘…….treatment of LND patients with an inhibitor of uric acid formation….’      The inhibitor is allopurinol, which blocks the activity of the enzyme xanthine oxidase. See Hitchings 1975; De Antonio et al. 2002.

‘……If salvage deficiency meant that neurones cannot maintain sufficient levels of purine nucleotides, it should be quantifiable…..’      See Bell et al. 2016;  Fu et al. 2015.

‘…..up-regulation of de novo purine synthesis, evidenced by high purinosome levels in Lesch-Nyhan cells….’   /   ‘ ….. it has been suggested that de novo synthesis may be nevertheless unable to cope with certain circumstances of unusually high purine demand….’ See Fu et al. 2015.

‘….in the absence of salvage pathways may lead to the accumulation of neurotoxic by-products…..’      See Sidi & Mitchell 1985.

‘…little or no bearing on effective function.’      Here “little” indicates a loss of efficiency that might have negatively selectable consequences for an organism in natural circumstances, but which would have small effects on human beings in most present-day life situations. Of course, the flip-side is the possibility of a mutation with increased efficiency, which might under natural conditions itself become positively selected for.

‘…….a triple mutant with replacement of 3 cysteine residues with alanines has been used as an active surrogate of the wild-type enzyme for structural determination……’     See  Keough et al. 2005.

‘….clinical HPRT deficiency in itself ranges over a spectrum….’  /  ‘……mild LND cases (without SIB) showing distinct patterns of mutations in HPRT…..’   /  ‘…..the severity spectrum of LND correlates with HPRT enzyme function…..’     See Jinnah et al. 2010.

‘….reports of HPRT mutations in human patients which appear to be consistent with the Category B scenario of Fig. 3 (very low or ablated HPRT without the SIB phenotype).’      See Rijksen et al. 1981 (with discussion of other such examples); Bayat et al. 2014; Bell et al. 2016.

‘……many instances of ‘null’ HPRT activity (especially in relatively early reports) have been discounted by some groups as due to inadequate assay procedures or other experimental shortcomings.’     See Fu et al. 2014.

‘…..a separate condition (Kelley-Seegmiller syndrome).’     See Kelley et al. 1969.

‘……reference to such circumstances [the Category A scenario of Fig. 3 (normal or effectively functional HPRT in the presence of the SIB phenotype)] has been made in the literature.’     See Rijksen et al. 1981, who cite Etienne et al. 1973; Encéphalopathie hyperuricosurique avec auto-mutilations. Rev Rhum Mal 40: 265-270.

Table 1. ‘…….artificially knocking out the murine HPRT gene did not result in a SIB phenotype…….’  See Finger et al. 1988.

‘…chemical inhibition of the additional salvage enzyme APRT (adenine phosphoribosyl transferase) in HPRT-null mice did result in SIB effects…..’     See Wu & Melton 1993.

‘….additional studies which showed that double APRT-HPRT mouse knock-outs lacked the SIB traits of Lesch-Nyhan victims.’     See Engle et al. 1996.

‘….a subset of moonlighting proteins may operate by means of intrinsically unstructured domains….’     See Tompa et al., 2005.

‘…..the diversity of known protein moonlighting examples….’      See the moonlighting protein database, as of 2018, and its associated publication (Chen et al. 2018). As an additional note, while this database holds no entry for HPRT, it does contain a representative of a different nucleotide salvage pathway, in the form of human thymidine phosphorylase, which acts both in the salvage-related production of thymidine monophosphate, and in the unrelated capacity of platelet-derived endothelial cell growth factor.

‘…..HPRT ……. has been shown to undergo exceptionally pronounced conformational changes during its catalytic cycle.’      See Keough et al. 2005.

‘….a second moonlighting function for HPRT could be experimentally evaluated with specific searches for its global proteomic interactions in a neural cellular background.’     Suitable approaches include high-throughput two-hybrid assays or mass-spectrometric methods.

Next post: April-May.

The Efficient Science Hypothesis

January 30, 2017

This post is somewhat unusual by the standards of Biopolyverse. Its inspiration lies in a book called Connectome, by Sebastian Seung, but does not dwell on the central theme of this excellent and well-written volume (the nature and significance of the patterns of human neural connectivity). It rather picks up and runs with what is, in the context of the book as a whole, a mere aside to the general reader.

So just what is the Efficient Science Hypothesis? Seung based his brief proposal on the Efficient Market Hypothesis, and for a definition of that in turn, I provide one given online within Investopedia:

The efficient market hypothesis (EMH) is an investment theory that states it is impossible to “beat the market” because stock market efficiency causes existing share prices to always incorporate and reflect all relevant information. According to the EMH, stocks always trade at their fair value on stock exchanges, making it impossible for investors to either purchase undervalued stocks or sell stocks for inflated prices. As such, it should be impossible to outperform the overall market through expert stock selection or market timing, and that the only way an investor can possibly obtain higher returns is by purchasing riskier investments.

The Efficient Science Hypothesis (ESH) is thus an analogy of this economic proposal in the world of scientific endeavor, where it can be simply framed in terms of the scientific information and tools that are generally known and available. If the ESH is in fact correct, a single researcher or group cannot ‘beat the market’ by gaining a leading advantage over others using freely available materials and knowledge. Their competitors will be equally able to exploit what is on hand, and will be just as capable of devising the necessary experiments to unravel currently unsolved scientific problems. Thus, science will advance at a similar rate irrespective of the specific players, and irrespective of who makes a discovery first by some marginal time factor. (Should any specific individual or group be suddenly eliminated by some disaster, science as a whole will not be retarded for any noticeable time.) Throwing money, people, and hard work at a problem may get you ahead momentarily, but not for very long in scientific arms races between competitors on an equal footing from available technological and knowledge-based resources.

So the ESH would essentially propose.

Science to Technology and Back Again

A corollary of Seung’s ESH is the role of technology as a potential deal-breaker, or a means of beating the competition and thereby turning the tables on an otherwise level playing field. In this viewpoint, an individual or group who devise and implement a new technology that is applicable to their field of study will have, at least for a short period of time, an unassailable advantage over the rest. Within the analogy with the Efficient Market Hypothesis, this kind of boost would correspond to some special insight into market conditions, without necessarily invoking unethical insider trading.

Of course, if the ESH was universally applicable in a very general sense, then as soon as a new technology became practically feasible (through the advent of previous technologies), it would be rapidly and independently latched onto by the relevant international research community. In other words, a new technology would be ‘efficiently’ conceived and developed by independent workers as soon as it was enabled by the current ‘state of the art’, as patent attorneys would phrase it. And once again, no single individual or group would be able to surge ahead of their competitors.

Sometimes, it does seem as though this is indeed the case. Consider an example in this regard, where the central technology is the polymerase chain reaction (PCR), a ubiquitous process in molecular biology. Essentially, PCR involves the amplification of DNA strands through successive cycles of annealing of specific DNA primers (oligonucleotides) with a desired complementary template, enzymatic extension from the primers with a thermoresistant DNA polymerase, and thermal denaturation of the resulting duplex strands in order for the cycle to resume. Application of PCR thus allows potentially even a single DNA molecule to be amplified millions of times. Within the framework of this core amplification technology, which has been available and universally disseminated since the mid-1980s, many workers around the world seized upon its potential applications in a variety of circumstances. In 1990, at least four separate reports were published for essentially the same PCR application, whereby a hitherto unknown sequence specificity for a DNA-binding protein could be defined. (Prior to this, a protein could be shown to possess general DNA binding affinity, but testing whether it bound selectively in a sequence-specific manner, and (if so) defining the precise binding sequence, was often a difficult task.) In a publication emerging in the following year dealing with a similar approach, the acronym CASTing (for Cyclic Amplification and Selection of Targets) was coined, and this name is now most commonly applied to the technology. The details of this approach are not important in a discussion of the ESH, but an overview is provided in Fig. 1 for general background.
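Before turning to the figure, the exponential arithmetic behind that amplification is worth making explicit. The sketch below is illustrative only; the per-cycle efficiency parameter is an assumption rather than a measured value.

```python
# Idealized PCR arithmetic: each cycle duplicates every template, so n cycles
# turn one starting molecule into (1 + efficiency)**n copies (2**n if ideal).
def pcr_copies(start_molecules: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected copy number after `cycles` rounds, where `efficiency` is the
    fraction of templates successfully duplicated in each cycle."""
    return start_molecules * (1 + efficiency) ** cycles

print(f"{pcr_copies(1, 30):.3g}")       # ideal: 2**30, about 1.07e9 copies
print(f"{pcr_copies(1, 30, 0.9):.3g}")  # at 90% per-cycle efficiency, ~2.3e8
```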

Fig. 1. Outline of the CASTing process for defining the sequence specificity of a DNA-binding protein. An oligonucleotide is synthesized where a random tract (Nx, where typically x is around 20 bases in length) is flanked by defined sites for PCR primers. The complementary strands for the whole population are synthesized by extension from the ‘reverse’ primer, and the resulting duplexes are incubated with a DNA-binding protein of interest. Then, it is necessary to partition remaining free oligonucleotides from those which interact with the binding protein. This can be done in a variety of ways, including immunoprecipitation of the protein/complexes with specific antibody, gel electrophoresis, or by means of rendering the protein onto a solid-phase matrix and washing away unbound material. The partitioning process provides a subpopulation of oligonucleotides enriched for the binding site sequence of interest. This subpopulation is then amplified, and the process repeated through sufficient cycles (typically 6-10) to enable direct identification of binding sequences through cloning and sequencing. Confirmation of candidate sequences is then sought through direct binding assays.

_________________________________________________________________
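The iterative enrichment at the heart of Fig. 1 can also be mimicked in a toy simulation. Everything here is invented for illustration: the motif, the capture probability of true binders, and the background carry-over rate of non-binders.

```python
import random

MOTIF = "GGATGT"          # hypothetical binding site, chosen only for illustration
POOL_SIZE = 10_000
RANDOM_TRACT = 20          # length of the Nx random region from Fig. 1

def random_oligo(n: int = RANDOM_TRACT) -> str:
    return "".join(random.choice("ACGT") for _ in range(n))

def select(pool, capture_p: float = 0.9, background_p: float = 0.02):
    """Partition step: motif-containing oligos are retained with high
    probability; non-binders carry over at a low background rate."""
    return [seq for seq in pool
            if random.random() < (capture_p if MOTIF in seq else background_p)]

def amplify(pool, size: int = POOL_SIZE):
    """PCR step: resample the selected subpopulation back up to full size."""
    return [random.choice(pool) for _ in range(size)]

pool = [random_oligo() for _ in range(POOL_SIZE)]
for cycle in range(8):
    frac = sum(MOTIF in s for s in pool) / len(pool)
    print(f"cycle {cycle}: motif fraction = {frac:.4f}")
    pool = amplify(select(pool))
```

Run as written, the motif-bearing fraction climbs from a fraction of a percent towards near-totality within the 6-10 cycles mentioned in the caption, which is the whole logic of the method.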

So this example can be raised in support of the ESH. Yet on closer inspection, as with CASTing, such convergent developments usually flow from the application of a particular pre-existing (though recent) technology. New technologies often have many applications beyond that which was initially aimed for. Advances of this kind could be categorized as ‘subtechnologies’ that spring from a core pre-existing technological structure. The ideas that are inherent in the original innovation, especially if it is revolutionary and widely used, have a high probability of occurring in many minds at the same time, thereby increasing the chances of convergent thoughts and coincidental subtechnological advances. To use a very old phrase, one might say that such developments stem from ideas ‘whose time has come’.

Numerous additional examples can be cited concerning the proliferation of subtechnologies from a central technological advance. A vivid contemporary case in point is the seemingly endless applications of the CRISPR gene-editing technology, which in the space of a few years have emerged from multiple different laboratories. Beyond gene inactivation and editing, such downstream reworkings of CRISPR capabilities include the targeting of transcriptional activation or repression, RNA modulations, and epigenetic engineering processes. A full discussion of these is outside the scope of the present post.

Technology and Enablement

Technology builds on technology, and science advances with improved technological tools. This is depicted in Fig. 2 below, where a simple loop (A) can be broken down further, by indicating that a given technological development may lead directly to various subtechnologies (B; and as noted above), each of which can feed in turn back into the river of scientific development.


Fig. 2. Interactive loops between science and technology.

_________________________________________________________________

Before a given technological development can be conceived and implemented, it is thus necessary that a body of theory and practice exists upon which the new invention can be built. Here we can look at the core technology for the above CASTing example, or the basic PCR approach itself. Fig. 3 shows a list (not necessarily exhaustive) for both knowledge-based items (basic science) and the technological background which underpin the PCR technique. Combined, these prior developments enable the implementation of the outgrowth PCR advance, and thus in turn PCR-based subtechnologies, such as the above CASTing example.


Fig. 3. Knowledge and prior technologies enabling the development of the Polymerase Chain Reaction (PCR).

_________________________________________________________________

Even when all the various enabling bits and pieces are on hand, it still takes a human mind to join up the dots and formulate a new technological or scientific advance. Although Kary Mullis is acknowledged as the originator of PCR, it is notable that the conceptual basis of PCR cycling was described well over a decade earlier by Har Khorana’s group, before some of the key enabling factors for PCR (Fig. 3) were readily available.

Many Minds and Many Labs Converging Towards a ThrESHold?

Taking a broad historical view, even when all necessary enabling components appear to be available, in some circumstances an advance does not occur for long periods of time. In an interesting essay, Jared Diamond noted that several areas of scientific progress could actually have occurred in ancient times, if people with the appropriate mind-set had acted accordingly. The fields of study that were potentially compatible with such early investigations included the classification of species, biogeography and comparative linguistics. So here the potential for considerable advancement of knowledge existed in the presence of ‘equal opportunity’ natural resources and general background information, but was never acted on.

Yet this does not directly conflict with the ESH, as can be seen by simply pointing to the ‘S’ within the acronym. Where a general framework of systematic scientific investigation is lacking, no science is performed in the modern sense, efficiently or otherwise. The ancient Greeks, for example, indulged in plenty of speculations about the nature of things, but were not inclined to collect data or experiment. Still, even though no scientific tradition or infrastructure was in place, it is formally possible that a lone genius in those far-off times might have pioneered one of these possible lines of proto-scientific studies, and even founded the beginnings of systematic ‘natural philosophy’, as science itself was once termed.

It certainly could be argued that, leaving aside the lack of any scientific tradition, there are many cultural features that would have had strong influences on the likelihood that any talented individual could have succeeded in a proto-scientific endeavor. For example, the opportunities for such activities among nomadic wanderers or impoverished subsistence cultures would have been virtually non-existent. But this raises another issue relevant to the ESH: the population size of participating individuals. This is simply based on the reasonable premise that the probability of a key innovating individual emerging is directly proportional to the available population base. Of course, within any ‘favorable’ culture only a small proportion of the populace in turn would be ‘available’ as potential innovators.
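This premise can be made concrete with a little elementary probability (the symbols N and p are introduced here purely for illustration, and do not appear in the original argument): if each of N available individuals independently has a small probability p of producing a given key innovation within some period, the chance that at least one of them does so is

$$P(\text{at least one innovator}) = 1 - (1 - p)^{N} \approx Np \quad \text{for } Np \ll 1,$$

so while Np remains small, doubling the available population roughly doubles the chance of the innovation emerging, which is just the proportionality asserted above.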

In ancient times, the global population was much less than now, and clearly even a generous assignation of cultures (and their internal elites) that could conceivably enable the genesis of scientific undertakings would reduce the ‘available’ population pool much more. There is another factor, implicit within the ESH but crucially important to it, that is also very relevant to these ‘ancient science’ considerations: communication. In order for any version of the ESH to exist, dissemination of scientific findings must be made as rapidly as possible. Where writing and copying themselves are highly rate-limiting steps, and lines of communication are poor if present at all, clearly no ‘ancient ESH’ could have been viable.

So from the ancient world we bounce back to modern times, where the population base of relevant individuals is immensely higher, and where communication via the internet is all but instantaneous. If we imagine an ‘ESH Threshold’ (or ThrESHold for short) as needing a requisite population size and communication rate to be attained, then have we already converged to this point?

Certainly if it was agreed that scientific progress in the modern world is most likely to have a uniform trajectory, through widespread parity of researchers in terms of skills and background knowledge, then the ESH would be applicable. But what about revolutionary developments emerging from the minds and hands of super-gifted individuals?

Individuals and Serendipity

Major scientific advances have historically been made by individuals, not groups. Innovations in the past have stemmed from specific minds, giving themselves a leading edge over competitors of the day. For a long time, the classic ‘Why Didn’t I Think of That?’ (WDITOT) effect prevailed – where an idea is a relatively straightforward combination of two or more pieces of knowledge that were freely available at the time of its conception, but only put together initially by a single individual. (And where many contemporaries rue their lack of comparable insight after the fact). Where a relatively small population base of possible participants existed, the WDITOT effect had a reasonably high probability of occurring. Yet as the ThrESHold is approached with an increasing participating population and better communications, the likelihood of discoveries occurring only through single-individual eureka moments diminishes accordingly in favor of multiple contemporaneous discovery events.

It might be conceded that for some ‘timely’ innovations, many modern minds are primed to develop them at more or less the same juncture. Yet what about the real quantum leaps requiring stunningly original insights? Surely a uniform field of progress along the lines of the ESH would not apply there? In principle, this would seem to be the case, but it is not clear that such stand-out events have been occurring as they did in the past. Thus, it has been common for people to ask, “Why are there no more Einsteins?” (or where are they, or some variation on this theme). A number of different answers to this have been proffered, including the notion that it’s simply much harder for anyone (no matter how smart) to make a revolutionary contribution, since all the low-hanging fruit of humanity’s quest for knowledge has already been picked. Another answer suggests that it’s wrong to think there are no more Einsteins, since there are in fact many in contemporary research, making it that much harder to stand out from the pack. Yet another point made in this regard is the supposition that bright young scientists who are most likely to have highly novel insights are too diverted into relatively mundane work in order to publish and establish their careers. If so, this would constitute a novel ‘cultural factor’ limiting the innovations and ground-breaking work that could be made, even if one assumes that the ‘raw material’ of knowledge that could enable new breakthroughs already exists. (This is by analogy with the above note regarding the limitations of ancient science by cultural imperatives, and suggests that such effects may not be entirely discountable in the present day.)

And then there is the matter of serendipity, where a chance-based observation can lead to a hitherto unprecedented line of investigation, or on occasion a dramatically radical insight that leaps over conventional thought. Surely this kind of development is a wild-card event that cannot be accommodated by the smoothing effect of the ESH? Certainly in general, an advance-by-chance could in principle provide a group with a lead in a field, perhaps analogously to an entirely fortuitous market investment, to use the original inspiration for the ESH in terms of market efficiency. But in a world at or past the ThrESHold, even serendipitous discoveries may be influenced by the scientific zeitgeist. Consider a contemporary scientific problem which is being investigated globally, and for which a limited number of experimental pathways are available. Given this, there is a high probability that a specific line of experiments will be undertaken by multiple independent groups, and experimental steps which afford the opportunity of leading to a serendipitous and unexpected observation will be likewise performed independently. With that background, it is then down to the experimental observer as to whether he or she will pick up on the novel observation and ‘run with it’. Here we can be reminded of the famous aphorism attributed to Pasteur, “Fortune favors the prepared mind”.

Of course, this scenario only applies to a subset of possible serendipitous opportunities, excluding cases where a truly unpredictable event such as a laboratory accident provided the key observations. Even that might tend to be smoothed out to some extent if the participating population is sufficiently large, but attaining such population numbers on a finite planet seems exceedingly unlikely.

Conclusions

If the ESH applies, then the proposition that a technological advance is a way to temporarily escape the smooth landscape of progress falls down, since a technological innovation itself is highly likely to emerge repeatedly and independently when times are ripe (as exemplified with the CASTing example above).

As the global pool of workers engaged in science and technology grows, so does the likelihood that parallel discoveries will be made. So the growth of the collective pool of investigators in specific fields will tend to reach a threshold (the ‘ThrESHold’) that converges with an approximation of the state of affairs postulated by the ESH. Nevertheless, individual genius and at least some forms of serendipity provide opportunities for a temporary leap-frogging of the consensus approach to advancement of a particular field.

Finally, we can take note of the ‘temporary’ caveat above. A scientific ideal is the publishing and dissemination of results, but this is very far from merely an unworldly principle to those confronted with the ‘publish or perish’ attitude to scientific career promotion. So an individual career may be enhanced by timely publication of revolutionary results, but the field as a whole will be able to profit from the advance involved. In this sense, any deviation from the ESH is soon ‘corrected’ by knowledge dissemination. An important exception is clandestine research carried out by governments, with military or state security potential, where an advance edge may have far-reaching repercussions. Even here, though, with the passage of time such scientific secrets tend to be disseminated one way or the other.

There we will leave it, but with a final biopoly-verse salute to the wild-card elements by which the smooth progress of Efficient Science may be for a brief time by-passed:

So does science move in a dance

With the latest techno-advance?

But what of the dreamers

The players, the schemers,

Who stumble on things by mere chance?

References & Details

(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).

‘……a book called Connectome, by Sebastian Seung…..’ In full, the title is Connectome: How the Brain’s Wiring Makes Us Who We Are; Mariner Books, 2013.

‘…… at least four separate reports were published for essentially the same PCR application……’ These are: Blackwell et al. 1990; Mavrothalassitis et al. 1990; Thiesen & Bach 1990 and Pollock & Treisman 1990.

‘…….the acronym CASTing (for Cyclic Amplification and Selection of Targets) was coined, and is most commonly applied to this technology…..’     See Wright et al. 1991. Despite the priority of the above 1990 reports, the CASTing acronym may have trumped alternatives due to its catchy appeal, and its evoking throwing a net into a sea of sequences in order to find the desired one.

‘……..it still takes a human mind to join up the dots…..’     Perhaps not for much longer, at least in some circumstances, given the rapid progress in recent times of neural-net based machine-learning artificial intelligence.

‘……Kary Mullis is acknowledged as the originator of PCR…..’     See Saiki et al. 1985, and Mullis 1990, for a personal account of his discovery. The initial PCR reports did not use a thermostable DNA polymerase as listed in Fig. 3, and the full potential of PCR was not realized until heat-resistant polymerases from thermophilic organisms (such as Taq polymerase) began to be used. Mullis received a Nobel Prize for the PCR innovation in 1993.

‘……the conceptual basis of PCR cycling was described well over a decade earlier by Har Khorana’s group……’     See Kleppe et al. 1971. Ironically, Khorana was a major contributor to the development of practical oligonucleotide synthesis, which is an essential enabling technology for PCR. He had already received a Nobel Prize (in 1968 for work associated with unraveling the genetic code) before this paper was published.

‘………Jared Diamond noted that several areas of scientific progress could actually have occurred in ancient times…..’     See Diamond, J. in This Idea Must Die (p. 486; 2015, John Brockman, Ed.).

‘……it’s wrong to think there are no more Einsteins, since there are in fact many in contemporary research……’     This kind of proposition was made by James Gleick, in his biography of Richard Feynman (as cited in a Scientific American blog).

‘……bright young scientists who are most likely to have highly novel insights are too diverted into relatively mundane work…..’ This has been raised by the prolific science writer Philip Ball, in an article in The Guardian.

Next post: Early 2018.

Protein Multifunctionality and the Interactome

August 28, 2016


A theme from several previous biopolyverse postings has highlighted the impressive economies of many aspects of complex biological systems, termed biological parsimony. As with the previous post, here the general notion of biological parsimony is explored in terms of the network of interactions that allows a complex organism to operate, or its interactome. For the present purposes, the emphasis is placed upon the role of multi-functional proteins in greatly extending the functional range of proteomes beyond what is directly encoded in their corresponding genomes. Protein functions routinely define some form of molecular interaction, so protein multifunctionality is one facet of the interactome.

 Proteins in the Moonlight

In any complex system, it is clearly an economy if a specific component has more than one function. This could arise from an inherent multifunctional property of the component, or via differential interactions of the component with other factors within the same complex system. Although there may be overlap between these two possibilities, they are not identical, and this is reflected in certain definitions. With respect to the latter ‘differential interactions’ category, we could reflect back to the previous post, where (among other things) the multi-faceted protein LIF (Leukemia Inhibitory Factor) was discussed. LIF could be considered ‘exhibit A’ as an example of pleiotropism, but for the present purposes we can look to a different kind of multi-functionalism.

It is well-documented that many proteins go beyond multi-partner interactions and multiple signaling effects, and exhibit completely distinct functions under specific conditions. The first description of such protein multi-tasking was with the eye structural protein crystallin, which can also act as a specific enzyme. This effect, thus distinguishable from pleiotropy, has been referred to in a number of ways, but the term ‘moonlighting’ seems to have won the day. (This topic was noted briefly in a previous post devoted to the theme of parsimony (August 2015).) The coining of the term is often attributed to Constance Jeffery in 1999, who has done much to promote the field, but at least one use of ‘moonlighting’ (in a similar context) in the literature prior to this date has also been recorded.

A key feature of moonlighting in its current conceptual definition is the functional use of a region of a protein different from the region(s) associated with its ‘original’ function. (The ‘original’ function in this context refers simply to what was historically described first, and says nothing about its relative biological importance.) Thus, an enzyme showing ‘promiscuity’ in terms of being able to use more than one specific substrate at its active site is not moonlighting by this definition. With this caveat in mind, many examples of the moonlighting phenomenon have been described since the original discoveries.

How widespread is the propensity of proteins to ‘moonlight’? Inextricably entangled with this seemingly simple question is the deeper issue of knowledge limitations. A ‘non-moonlighting’ (monofunctional) protein is pigeon-holed as such owing to an absence of any evidence for additional functions beyond its standard role. Yet, as the old adage goes, absence of evidence is not the same as evidence of absence. It might be thought that genetic diseases (or knockout genes in animal models) with a single clear-cut and biochemically substantiated loss-of-function phenotype define protein products without moonlighting activities. But this in itself does not rule out additional roles for such a gene product, since the often-observed effect of genetic redundancy could in principle explain why the absence of the protein of interest does not produce additional loss phenotypes. In other words, if the moonlighting role of the protein of interest (Protein 1) was Function B, other ‘back-up’ proteins might perform Function B as well, and mask the absence of Protein 1. At the same time, it might well be noted that if the genetically observed role of Protein 1 was Function A, then obviously no functional redundancy for the A function could exist, or a single-loss genetic lesion producing the A-deficit would not have been observed in the first place. In turn, this hypothetical arrangement prompts the supposition that some proteins might have a ‘main’ role, and one or more ‘secondary’ roles for moonlighting duties. In some cases, this distinction might be relatively straightforward, but is unlikely to always be so. Issues such as this render the moonlighting concept not as simple as it is sometimes made out to be.

The converse of the ostensibly tidy scenario, where a single gene defect produces a well-defined loss-phenotype, is where the interpretation of a single genetic lesion is considerably complicated by the gene product’s propensity for moonlighting. This effect has been well-documented in the case of gene defects for metabolic enzymes, where specific cases might be expected to elicit a phenotype that is largely if not totally predictable in principle. Thus, if enzyme X acting on substrate U to create product V is genetically inactivated, then the resulting phenotype may be predicted on the basis of what effects a shortfall in production of V might have, or perhaps the consequences of failure to process and remove excess U. But such a prediction might at best tell only a part of the story, if X has a completely unpredicted but important moonlighting role in a very distinct cellular function.

 Intrinsically unstructured proteins and multifunctionality

In the context of multifunctionality and moonlighting, it is worth singling out an interesting category of proteins as being of special significance. Not so long ago (before the mid-1990s), proteins were regarded as always possessing a well-defined and highly ordered folded structure which enabled their functions, and in many cases this indeed applies. But it is now recognized that an important subset of proteins do not possess such an ordered structure, at least before they interact with a binding partner. Proteins of this nature may be completely disordered, or possess an ordered domain linked to a domain lacking specific order. Such is the extent of this phenomenon that up to 40% of eukaryotic proteins are mostly disordered, and >50% have a significant region of low folding order.

Intrinsically unstructured proteins or protein domains in some cases possess the ability to interact with more than one binding partner, in such a manner that the originally unstructured region assumes different conformations in different binding circumstances. Where this occurs, the binding partner protein may act as a template for the originally unstructured protein, in directing folding towards a specific configuration. Even if this templating effect for a particular unstructured protein is restricted to a limited set of partner proteins, the implications for protein moonlighting are clear, and this has been noted for over a decade. A protein that can assume alternative forms and functions based on the presence of different potential binding partners has in-built modularity, and is obviously capable of fulfilling even a rigorous definition of moonlighting.

But not all moonlighting is carried out by proteins with initially poorly-defined structures. Some of the possible alternatives are considered in the Figure below:

[Figure: MultiFunct-StructAspects]

Fig. 1. Different forms of protein multi-functionality, and the processes resulting in multi-functional species from the same polypeptide. F1 and F2 denote distinguishable functions.

A. Here a protein contains two separate and structurally definable segments, which perform well-demarcated and distinct functions under specific circumstances (depicted here as operating via different binding partners).

B. In this depiction, an intrinsically unstructured polypeptide assumes different folds under the templating influence of different binding molecules. ‘Tails’ (curved lines) are shown on the altered originally unstructured proteins in the bound states, since it has been found that under such conditions the acquisition of an ordered structure is not necessarily complete.

C. This schematic depicts a protein that can, via a disordered intermediate, assume two quite distinct folding states, corresponding to the concept of ‘metamorphic’ proteins. These have only relatively recently been described and studied.

D. In this case, under the influence of a binding interaction, a part of a protein undergoes a distinct folding alteration, with the acquisition of new functional properties. This corresponds to the concept of ‘transformer’ proteins.

Whether all of these schemas or only some are classified as ‘moonlighting’ clearly comes down to a matter of definitions.

The description embodied by (A) is usually considered a classic paradigm for protein moonlighting, as the two functions in this schematic are mediated by different protein sites. In practice, many different functions can be fulfilled by different regions of the same protein. Perhaps the champion of polyfunctionality is the Large T protein of the simian polyomavirus SV40, which contains at least seven different sites performing distinct functions. Intrinsically unstructured proteins (as in [B] above) have indeed been discussed in the context of moonlighting, as noted above. The other categories (C) and (D) (metamorphic and transformer proteins, respectively) are not generally considered under the umbrella of the moonlighting concept, but they are certainly novel mechanisms for protein multifunctionality.

__________________________________________________________________

 

Origins of moonlighting / multifunctionality

It is natural to ask why evolution should favor the development of multifunctional or moonlighting proteins, and this has been considered in some depth by those interested in the field. For example, one salient point is that unused parts of a large protein surface may, over evolutionary time, tend to acquire new functions.

Consider a scenario where protein A performs task 1, and protein B performs task 2. If another, multidomain protein C can perform both tasks, and the size of C is less than that of A + B combined, then the combined transcription / translation costs for C must also be lower. Economies of bioenergetics arise where only one initiation event is required for both transcription and translation, rather than one for each of two separate genes. Also, there may be a need for only a single set of processing signals where applicable (for example, nuclear or organelle transport, membrane display, and so on). This kind of argument from the stance of energetics has a basic and logical appeal, and has been made repeatedly. Thus, by means of diverse forms of protein multifunctionality, a proteome of limited size can acquire an expanded functional range within the boundaries of the same energy budget as previously used.
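This bookkeeping can be put in toy numbers (a minimal Python sketch; the cost constants are arbitrary illustrative units, not measured bioenergetic values):

# Crude cost model: each expressed gene pays a fixed initiation overhead
# (transcription and translation start-up) plus a per-residue polymer cost.
INIT_OVERHEAD = 200     # arbitrary units per gene expressed (assumption)
COST_PER_RESIDUE = 5    # arbitrary units per amino acid (assumption)

def synthesis_cost(n_residues, n_genes=1):
    # Cost of expressing n_residues of protein from n_genes separate loci.
    return n_genes * INIT_OVERHEAD + n_residues * COST_PER_RESIDUE

cost_separate = synthesis_cost(300) + synthesis_cost(250)  # proteins A and B
cost_fused = synthesis_cost(480)                           # protein C, smaller than A + B
print(cost_separate, cost_fused)  # 3150 vs 2600: the fused protein is cheaper

Under these invented numbers the multidomain protein wins on both the single initiation overhead and its reduced total length; real cellular economics are, of course, far more intricate.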

But it is not quite as simple as it may seem at first glance. Again using the above example, Protein C must be integrated into a complex interactome, so in some cases having tasks 1 and 2 assigned to separate molecules may actually be advantageous, disfavoring multifunctional packing. Also, it has been noted that a conflict tends to exist between a protein’s possession of multiple functions and the optimization of each function. Indeed, mutations which themselves minimize such ‘adaptive conflict’ may be key innovators in the development of successful moonlighting or other multifunctional mechanisms. All this is mediated through a complex balance sheet determining optimal fitness, such that the simplest and most economical alternatives that are evolutionarily accessible (parsimony) will tend to win.

There are clearly many unanswered questions when the evolution of protein multifunctionality is considered. For example, what is the timing of the evolutionary origins of moonlighting? In other words, at what point in the evolution of complex biosystems did it first appear? Systems with multi-tasked effectors would presumably be evolutionarily favored, but were they an early or a late event during the course of molecular evolution?

These questions lead to a final speculation: is the parsimony of moonlighting / multifunctionality not merely an edge-giving energy saver, but an essential requirement for the development of highly complex biosystems? If this proposal were true, then the only pathway for an organism to acquire increasingly complex structures and systems would be via the introduction of the kind of parsimony that protein multifunctionality can confer. In this scenario, if all protein effectors were monofunctional, then biosystems would eventually hit an ‘energy wall’ beyond which they could not pass, and all biology would be trapped at a relatively basic level of complexity and organization.

An interesting extension of this proposal can also be made with respect to the emergence of the extant protein-DNA-RNA world from its precursor RNA World (a topic touched upon in several previous posts, for example, that of 21 June 2011). Since the functional range of folded RNA molecules is generally accepted as inferior to that offered by the larger protein alphabet, an RNA world might likewise suffer from a deficiency in the ability to form moonlighting ribocatalysts. An RNA world with mostly monofunctional players might thus, by this logic, remain trapped at a level of complexity far below that which the extant biosphere has attained. Obviously, many factors are likely to have been involved in the ascendancy of the protein-DNA-RNA world, but restrictions on RNA multifunctionality might be one of them. As noted in other contexts in previous posts, these and related questions should become amenable to experimental testing, via advancements in synthetic biology and increasingly sophisticated model biosystems.

And a final word in a (biopoly)verse form:

 

A protein can wear more than one hat

It can do not just this, but now that

If its functional roles

Fulfill many goals

Then moonlighting is smoothly down pat.

 References & Details

(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).

The first description of such protein multi-tasking was with the eye structural protein crystallin…..’     See Piatigorsky & Wistow 1989.

‘The coining of the term [moonlighting] is often attributed to Constance Jeffery…..’     See Jeffery 1999.

‘………at least one use of ‘moonlighting’ in a similar context in the literature prior to this date has been recorded.’     See Campbell & Scanes 1995, whose paper was entitled “Endocrine peptides ‘moonlighting’ as immune modulators: roles for somatostatin and GH-releasing factor”.

‘……an enzyme showing ‘promiscuity’ in terms of being able to use more than one specific substrate at its active site is not moonlighting…..’     This point was raised by Jeffery 1999. See also an interesting review on enzyme promiscuity by Khersonsky & Tawfik 2010, where the distinction between promiscuity and moonlighting was also noted.

‘…….many examples of the moonlighting phenomenon have been described.’     See some informative reviews, including Gancedo et al. 2016; Henderson & Martin 2014; Huberts & Van der Klei 2010.

‘Issues such as this render the moonlighting concept not as simple as it is sometimes made out to be.’     For example, the respected molecular biologist and bioinformaticist Eugene Koonin has expressed the opinion, “I am actually inclined to think that all proteins perform multiple roles in organisms and are at some level moonlighting.” (This statement is part of an open review by Koonin of a paper by Khan et al. 2014, entitled “Genome-scale identification and characterization of moonlighting proteins” [Biol. Direct, 2014]).

This effect [complication of phenotypes produced from single-gene coding sequence mutations as a consequence of protein moonlighting] has been well-documented in the case of gene defects for metabolic enzymes…..’      See Sriram et al. 2005.

‘……..if enzyme X acting on substrate U to create product V is genetically inactivated, then the resulting phenotype may be predicted…….’      In practice, it has to be noted, even if a completely monospecific enzymatic function exists, it is not necessarily simple to predict the entirety of the phenotype that results from its functional knock-out. A good case in point in this regard is human Lesch-Nyhan syndrome, which results from mutational inactivation of the enzyme hypoxanthine-guanine phosphoribosyltransferase, involved in purine metabolism. This syndrome is characterized by a number of clinical manifestations, including effects on purine recycling, leading to excessive levels of uric acid. The mechanism of this is well-understood and quite predictable from the genetic lesion. However, in addition to this abnormality, severe Lesch-Nyhan syndrome produces motor disability, and a strange compulsion of afflicted individuals (almost all males, since the gene is on the X chromosome) to engage in serious self-mutilating behavior. Predicting the generation of such a higher-level neurological abnormality as a result of a deficit in a purine-salvage enzyme is another matter entirely, and despite much study, the mechanism for this behavioral pathology remains obscure. (See Jinnah et al. 2013; Dammer et al. 2015).

‘…..it is now recognized that an important subset of proteins do not possess such an ordered structure…..’      See Tompa et al. 2005; Fuxreiter et al. 2014 for useful reviews. An example of a protein with a disordered domain is the important transcription factor and oncoprotein c-Myc (Yu et al. 2016).

‘……up to 40% of eukaryotic proteins are mostly disordered, and >50% have a significant region of low folding order.’     These statistics were cited by Yu et al. 2016 and Fuxreiter et al. 2014, respectively.

‘…….the implications [of intrinsically unstructured proteins] for protein moonlighting are clear, and this has been noted for over a decade.’ See Tompa et al. 2005.

‘……..the concept of ‘metamorphic’ proteins. These have only relatively recently been described and studied.’      See Camilloni & Sutto 2009; Tyler et al. 2011.

This corresponds to the concept of ‘transformer’ proteins.’      See Knauer et al. 2012. It may be noted that the concept of ‘transformer’ proteins (Fig. 1D) can be considered as an extreme form of allostery, where a protein (or functional folded nucleic acid, for that matter) undergoes a conformational change in response to a covalent or non-covalent interaction. The general phenomenon of allostery is very widespread in nature and of fundamental importance, but the ‘transformer’ effect is at a different level of conformational change in comparison to the vast majority of allosteric circumstances.

Tails’ (curved lines) are shown on the altered originally unstructured protein in the bound states, since it has been found that under such conditions the acquisition of an ordered structure is not necessarily complete.’     When this is the case, the resulting protein complex may have a ‘fuzzy’ aspect. See Gruet et al. 2016.

‘……the champion of polyfunctionality is the Large T protein of the mammalian SV40 virus, which contains at least seven different sites performing distinct functions.’     These are: Cul7 binding, pocket protein binding, DNA binding and initiation of viral replication, helicase, ATPase, p53/p300 binding, and host range determination. In addition, Large T has a nuclear localization signal, numerous phosphorylation sites, and an acetylation site. (See the ftp site for Searching for Molecular Solutions, specifically SMS-Cited Notes-Ch.9).

‘This [the evolution of multifunctionality / moonlighting] has been considered in some depth by those interested in the field.’      See Jeffery 1999, and also Sriram et al. 2005, who consider this question and note earlier opinions.

‘Also, it has been noted that a conflict tends to exist between a protein with multiple functions and optimization of each function.’      See Fares 2014.

‘….mutations which themselves minimize such ‘adaptive conflict’ may be key innovators…..’      See Copley 2014.

 

Next Post: January 2017.

Biological Parsimony and Interactomes I

February 21, 2016

In previous posts (April 2015; August 2015) we have looked at the notion of biological parsimony from several vantage points. For example, one such issue was the frequency of certain protein folds as recurring evolutionary motifs, in contrast to other folds which are used much more restrictedly (April 2015 post). Here we look at parsimony at the higher level of biological systems, chiefly concerning the strong tendency for such systems to evolve towards highly economical arrangements.

Two Levels of Parsimony

In the post of April 2015, a number of different forms of biological modularity were listed, as applied towards parsimonious biosystems. Here we can ‘parsimoniously package’ the general phenomenon of biological parsimony itself into two major levels: that of the packing or arrangement of function in specific molecules or encoded biological information, and that of the deployment of molecules in terms of their functional interactions in the operations of biosystems. Of course, these are not independent factors, since a polyfunctional protein (for example) will have multiple types of interactions with other functional partners within the biosystem in which it operates. This is one means whereby a single protein can have distinct roles in cells of divergent differentiation lineages.

 

These levels of parsimony and their interactions are depicted in Fig. 1.

[Figure: Fig1-ParsimonyLevels]

Fig. 1. Two major levels of macromolecular biological parsimony and their inter-relationships, schematically depicted. A, Packing / Functional Arrangement refers to parsimonious packaging of encoded information (such as a single genetic coding locus capable of producing multiple distinct proteins, via alternate promoters, differential splicing, or other means; represented here as ‘Informational Encoding’). Also encompassed within this level is the parsimonious use of encoded macromolecular structures by evolutionary selection (“Evolutionary motif redeployment”) with divergence of function, well-exemplified by the TIM-barrel protein motif, as mentioned in a previous post (April 2015). In addition, the ‘packing’ level includes the grouping of multiple distinct functions into a single macromolecule (such as a protein W with n separable functions). B, Parsimony at the level of functional interactions. These include intermolecular interactions (for example, where a protein W interacts with n different partner molecules with n distinct results), and also intramolecular effects (as in the case of allosteric changes in a protein induced by ligand binding at a distinct site). Also within this level of parsimony is the evolutionary redeployment of portions of specific signaling pathways in different cellular contexts, with divergent ‘read-out’ consequences.

Aside from the above ‘packing’ issues, another aspect of molecular functional packing relevant to biological parsimony at the level of individual molecules is evolutionary ‘redeployment’ parsimony. (In this respect, see also a previous post [April 2015], where this has also been discussed). Such an evolutionary form of parsimony refers to the tendency of biological components, especially at the molecular structural level, to be ‘repurposed’ by evolution towards assuming new functional roles. (This facet of parsimony is also noted in Fig. 1 above.) Why should this be so? In fact, it follows fairly simply from the principle that natural selection can only ‘tinker’ with what is currently available, and cannot foresee the optimal solutions to biosystem problems which might become apparent in hindsight. Thus, it is usually more probable that pre-existing encoded structures will be co-opted for other functions than that wholly new genes will arise de novo. For the ‘re-purposing’ to happen, an obvious problem would seem to arise from the simple question, “if some cellular mediator assumes a new function, what takes care of its original function?” In fact, there are well-known processes whereby this can happen, primarily involving the generation of additional gene copies (gene duplication events) with which evolution can ‘tinker’, without compromising the function of the original gene product.

Of course, it could be argued that since the entirety of biology is evolutionarily derived, all biological parsimony is ‘evolutionary’. In a broad sense this is obviously true, but the ‘repurposing’ type of evolutionary parsimony is worth singling out for special mention in this context. It is certainly true that not all evolutionary change can be classified as arising from the redeployment of pre-existing ‘parts’ towards novel applications. For example, where a single mutation in a functional protein confers a selectable fitness benefit which ultimately becomes fixed in a population, evolutionary change has occurred – but via direct modification of an existing ‘part’ towards better efficiency in its original role, not towards an entirely new function.

That matter aside, in this parsimonious post, the focus will be on interactomes, following a brief introduction from the previous post on this topic.

 Parsimonious Interactions

It was noted in the previous post that the seemingly low number of coding sequences in humans and other ‘higher’ organisms is counter-balanced to a considerable degree by various diversity-generating mechanisms, by which a single gene can encode multiple proteins, or a single protein can be post-translationally modified in distinct ways. But it was also noted in passing that many (if not most) proteins have more than one role to play in developing organisms, often in cell types at distinct stages of differentiation. This is the essence of the interactome, the sum total of molecular interactions that enable a biosystem to function normally. In this context, a key word is connectivity, where ‘no protein [or any functional mediator] is an island, entire of itself’.

There are numerous ways that the parsimony principle is manifested within interactomes. One prominent feature in this regard is signaling and signaling pathways. It is common to find a single defined signaling mediator with multiple roles towards different cell types, or at different stages of differentiation. An example to consider here is a cytokine known as Leukemia Inhibitory Factor, or LIF. As its name suggests, it was first defined as a factor inhibiting the growth of leukemic cells, yet in other circumstances it can behave as an oncogene. It is well-known as a useful reagent in cell biology owing to its ability to maintain the pluripotent differentiation status of embryonal stem cells, an activity of great utility for the generation of ‘knock out’ mice. But in addition to this, LIF has been shown to have roles in the biology of blastocyst implantation, hippocampal and olfactory receptor neuronal development, platelet formation, proliferation of certain hematopoietic cells, bone formation, adipocyte lipid transport, production of adrenocorticotropic hormone, neuronal development and survival, muscle satellite cell proliferation, and some aspects of hepatocyte regulation. An irony of this polyfaceted range of functions is that certain activities among the above LIF-list were at first ascribed to new and unknown mediators, before detailed biochemical analysis showed that LIF was the actual causative factor.

The extent of the pleiotropism (‘many turns’) of LIF has intrigued and surprised numerous workers, leading to this effect being called an ‘enigma’. Why should one cytokine do so many things? Here it should be noted that in the cytokine world, while LIF is certainly not unique in having multiple activities, it is probably the star performer in this regard. In answer to the question “why does it make design sense to use LIF in the regulation of such a diverse and unrelated series of biological processes?”, we can invoke the parsimony principle, by a now familiar logical pathway. It is thus reasoned that a biosystem factor will tend to assume multiple functional roles if it can do so without compromising organismal fitness. The ‘tend to’ phrase is predicated on the assumption that it is energetically and informationally favorable to streamline the functional operations of a biosystem as much as possible, and that evolutionary processes will move organism design in that direction, via increased fitness gains. At the same time, it is evident that there must be limits to this kind of trend, since at some point in the poly-deployment of a mediator, inefficiencies will inevitably creep in, as one signaling event begins to interfere with another. A number of ways have been ‘designed’ by evolution to minimize this, of which more below. But to return to the specific question of why LIF – and not some other cytokine – should be such an exemplar of polyfunctionality, there is no specific answer. All that can be suggested is that the many biological roles that feature LIF do not interfere with each other, or that they complement each other, such that there is a fitness gain from LIF’s multi-deployment in such ways. And this could be condensed into saying, ‘it can, so it does’, which might not sound particularly helpful. There may be reasons of simple evolutionary contingency as to why LIF gained these roles and not some other cytokine – or indeed there may be deeper (and highly non-obvious) reasons why the prize necessarily must go to LIF. Such questions might be answered at least in part by advanced systems biological modeling, or (ultimately) by equally advanced synthetic biology, where artificial programming of real-world model biosystems can address such questions directly.

With this introduction in the form of LIF in mind, it is useful now to think about ways that receptor signaling can diversify with either a single mediator or a single receptor involved. With respect to the latter circumstance, there are biological precedents where a single heterodimeric receptor (composed of two chains) can respond with distinct signaling resulting from engagement with separate ligands. This effect is well-exemplified by the Type I interferons (IFN), of which there are several distinct types (in humans alone, these include IFN-α, IFN-β, IFN-ε, IFN-κ, and IFN-ω, where IFN-α has 13 different subtypes), all of which bind to the same heterodimeric Type I receptor. Yet despite their sharing of a common receptor, the signaling induced by these different interferons is quite distinct. This phenomenon is depicted in Fig. 2 below.

[Figure: Fig2-ReceptorParsimony1]

Fig. 2. Schematic depiction of a single heterodimeric receptor which enables distinct signaling from binding of different ligands, even in the same cell type. In the top panel, Ligand A (blue pentagon) engages certain specific residues within the receptor pocket, with induction of a conformational change which activates a subset of the intracytoplasmic co-signaling molecules, with a specific signaling pathway triggered. The bottom panel depicts a different ligand (Ligand B, red hexagon), which engages the receptor with different contact residues, resulting in distinct receptor changes and concomitant downstream signaling.

In general, the form of ligand signaling complexity depicted in Fig. 2, where a specific ligand can activate one signaling pathway without activating another, has been termed ‘biased agonism’. This phenomenon has been much-studied in recent times with respect to G-Protein Coupled Receptors (GPCRs), which are a hugely diverse class of cellular receptors. They have long been of particular interest to the pharmaceutical industry through their susceptibility to selective drug action (‘druggability’), and biased agonism clearly offers a handle on improving the selectivity by which GPCR-mediated signaling is directed in a desired manner.
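The logic of biased agonism can be caricatured as a simple lookup (a minimal Python sketch; the ligand names are hypothetical, and the G-protein versus β-arrestin split is simply the standard framing of GPCR pathway bias):

# Toy model: different ligands at the same receptor trigger different
# subsets of the downstream pathways available to that receptor.
RECEPTOR_PATHWAYS = {"G-protein", "beta-arrestin"}

LIGAND_BIAS = {
    "LigandA": {"G-protein"},         # biased toward one pathway
    "LigandB": {"beta-arrestin"},     # biased toward the other
    "BalancedAgonist": {"G-protein", "beta-arrestin"},
}

def signal(ligand):
    # Pathways actually triggered when this ligand engages the receptor.
    return RECEPTOR_PATHWAYS & LIGAND_BIAS[ligand]

for ligand in LIGAND_BIAS:
    print(ligand, "->", sorted(signal(ligand)))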

Other complexities to signaling arrangements are possible which increase signal diversity from a limited set of participants. Cells of different lineages may express the same receptors, but differ in their patterns of co-receptors and signaling accessory molecules whereby intracellular signals are generated. This is depicted in Fig. 3A and Fig. 3B below. Other processes whereby a limited set of ligands and receptors diversify their signaling are shown in Fig. 3C- Fig. 3F. Thus, signaling-based polyfunctionality is one aspect of interactomic parsimony.

[Figure: Fig3-ReceptorParsimony3]

Fig. 3. Schematic depiction of mechanisms for signaling diversity generated with either the same receptor in different contexts (A-E), or the same ligand binding to a different receptor (F). A and B: the same receptor (as a heterodimer) expressed in cells of two distinct differentiation states, such that they differ in their complements of coreceptors (not shown) or intracytoplasmic accessory signaling molecules (colored ovals). After engagement with ligand, the resulting signal pathway in context A is thus divergent from that generated in context B; C and D: the same receptor where it forms a homodimer (C) or heterodimer (D), each with distinct signaling consequences; E: the same receptor as in A, but where it interacts with a second ligand (pale blue octagon), which engenders a conformational change such that it binds either a different ligand, or a modified form of the original ligand; F: the same ligand as in A, but where it is compatible with another receptor entirely, with corresponding divergent signaling effects.

The deployment of different subunits in the signaling arrangements of Fig. 3 is itself a subset of a more general effect within interactomes, where modularity of subunits within protein complexes is a ubiquitous feature. This reflects an aphorism coined in a previous post (April 2015), to the effect that “Parsimony is enabled by Modularity; Modularity is the partner of Parsimony”. And with respect to protein modularity in eukaryotic cells, there is plenty of evidence for this from studies of the yeast proteome, where differential protein-protein combinations have been extensively documented.

Signaling to different compartments

Biosystems are compartmentalized at multiple levels. As well as the unit of compartmentalization we know as cells, numerous membrane-bound structures are ubiquitously encompassed within cellular boundaries themselves. An obvious one to note is the cell nucleus itself. While the subcellular organelles known as mitochondria (the energy powerhouses of cells) and chloroplasts (the photosynthetic factories of green plants) have their own small genomes encoding a limited number of proteins, in both cases many more proteins required for their functions are encoded by the much larger host cell genomes. Other compartments lacking their own genomes exist, including (but not limited to) the endoplasmic reticulum, the Golgi apparatus, and peroxisomes.

It would be easy to imagine a host genome-encoded set of special proteins reserved for the organelles or other compartments, along with specialized transport systems (to target the organelle-required proteins to the right places) in each case. In some cases, this appears to be so, but if this was generalized, it would certainly violate the parsimony principle, since many such proteins are also required to function in more than one cellular compartment. One could envisage a solution in the form:

Signal A – Protein 1, 2, 3….. | Signal A recognition system, to compartment A

Signal B – Protein 1, 2, 3….. | Signal B recognition system, to compartment B

By such an arrangement, an identical set of proteins could be targeted to distinct compartments if they were appended to modular recognition signals. Yet as is so often the case, biology is both more subtle and more complicated than simplistic schemes such as this. In fact, a variety of natural ‘solutions’ for the multi-targeting issue have evolved. To use the above terminology, some could be depicted at the mRNA level as:

Signal A (spliced in) –  Protein 1 coding sequence….. | (expression) — Signal A recognition system, to compartment A

Signal A (spliced out) – Protein 1 coding sequence….. | (expression) — no targeting signal, remains in cytosol.

In these circumstances, the ‘default’ localization is the cytoplasm (cytosol), and organelle targeting is effected only where a signal sequence is translated and appended to the protein. Differential splicing at the RNA level can then include or exclude the sequence of interest, both (parsimoniously) from the same genetic locus. But many more mechanisms than this have been documented for general multi-compartmentalization, including the existence of chimeric signal sequences that are bipartite towards different compartments. The take-home message once again is the stunning extent to which known biosystems have evolved highly parsimonious deployment of their encoded functional elements, all encompassed within biological interactomes.
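The splice-in / splice-out logic above can be rendered as a minimal Python sketch; all names are invented for illustration, with an N-terminal ‘MTS’ presequence standing in for a generic organelle-targeting signal.

# One locus, two destinations: differential splicing decides whether the
# targeting presequence is translated at the protein's N-terminus.
def mature_protein(signal_spliced_in, core_seq, signal="MTS"):
    # Return the translated product with or without the targeting signal.
    return signal + "-" + core_seq if signal_spliced_in else core_seq

def localize(protein):
    # Crude recognition rule: an N-terminal MTS routes to the organelle;
    # otherwise the protein remains in the cytosol by default.
    return "organelle" if protein.startswith("MTS-") else "cytosol"

for spliced_in in (True, False):
    product = mature_protein(spliced_in, "Protein1")
    print(product, "->", localize(product))  # organelle, then cytosol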

This short tour of the parsimonious interactome has barely scratched the surface of the topic as a whole, and some other aspects of biological parsimony will indeed be taken up in the next post. Meanwhile, a biopoly(verse) take on receptor-ligand parsimony:

 A ligand-receptor attraction

Can show parsimonious action

For receptors can change

In their signaling range

And vary a transduced reaction

 References & Details

(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).

‘…..it is usually more probable that pre-existing encoded structures will be co-opted for other functions than that wholly new genes will arise de novo.’    It has long been considered that gene duplication is an effective means by which novel functions can evolutionarily arise, and far more likely than de novo gene evolution. In this regard, see a review by Hurles 2004. Yet in recent times evidence for the evolution of de novo genes from ‘orphan’ open reading frames has become stronger; see Andersson et al. 2015. Nevertheless, the duplication-mediated repurposing of pre-existing evolutionary ‘parts’ is still likely to be much more frequent.

‘….Leukemia Inhibitory Factor, or LIF. As its name suggests, it was first defined as a factor inhibiting the growth of leukemic cells….’    For general LIF background and its polyfunctional nature, see Hilton 1992 and Metcalf 2003. As other examples of LIF anti-tumor activities, see Bay et al. 2011; Starenki et al. 2013.

‘…..yet in other circumstances it [LIF] can behave as an oncogene.‘    See Liu et al. 2015.

‘……this effect [LIF polyfunctionality] being called an ‘enigma …..’     See Metcalf 2003.

‘…..why does it make design sense to use LIF in the regulation of such a diverse and unrelated series of biological processes?……’    This question (slightly paraphrased here) was posed by Metcalf 2003.

‘……This effect is well-exemplified by the Type I interferons.’     See a review by Platanias 2005.

‘…….‘biased agonism’. This phenomenon has been much-studied in recent times with respect to G-Protein Coupled Receptors (GPCRs)..’     For very recent updates on biased agonism in a GPCR context, see Pupo et al. 2016; Rankovic et al. 2016.

‘……protein modularity in eukaryotic cells, there is plenty of evidence from studies of the yeast proteome, where differential protein-protein combinations have been extensively documented.‘   See Gavin et al. 2006; Gagneur et al. 2006.

‘……But many more mechanisms than this have been documented for general multi-compartmentalization.’     See a review by Yogev and Pines 2011, where at least 8 different targeting mechanisms were listed for mitochondria alone. See also Avadhani et al. 2011 for a discussion of chimeric signals in a specific protein context.

 

Next Post: March.

 

Biological Parsimony and Genomics

August 24, 2015

The previous post discussed the notion that biological processes, and biosystems in general, exhibit a profound economy of organization and structure, which can be termed biological parsimony. At the same time, there are biological phenomena which seem to run counter to this principle, at least at face value. In this post, this ‘counterpoint’ theme is continued, with an emphasis on the organization of genomes. In particular, the genome sizes of the most complex forms of life (unlike those of simpler bacteria) considerably exceed, at least superficially, the apparent basic need for functional coding sequences alone.

Complex Life With Sloppy Genomes?

When it comes to genomics, prokaryotes are good advertisements for parsimony. They have small and very compact genomes, with minimal intergenic spaces and few introns. Since their replication times are typically very short under optimal conditions, the time and energy requirements for genomic replication are often significant selective factors, tending to streamline genomic sizes as much as possible. A major factor for the evolution of prokaryotic organisms is their typically very large population size, which promotes the rapid positive selection of small fitness gains. Prokaryotic genomes are thus under intense selection for functional and replicative simplicity, leading to rapid elimination of non-essential genomic sequences.

Yet the situation is very different for the more complex biologies of eukaryotes, where genome sizes are commonly bigger by 1000-fold or more than that of the bacterial laboratory workhorse, E. coli. It is widely recognized that this immense differential is enabled in eukaryotic cells through the energy dividend provided by mitochondria, the organelles acting as the so-called powerhouses of such cells. Mitochondria (and chloroplasts in plants) are intracellular symbionts, descendants of ancient bacterial forms which entered into an eventual partnership with progenitors of eukaryotic cells, and in the process underwent massive genomic reduction. The energetic contribution of mitochondria enabled much larger cells, with concomitantly much larger genomes.

If eukaryotic genomes can be ‘sloppy’, and accommodate very large tracts of repetitive DNAs deriving from parasitic mobile elements, or other non-coding sequences, where is the ‘parsimony principle’ to be found? We will return to this question later in this post, but first let’s look at some interesting issues revolving around the general theme of genomic size.

Junk is Bunk?

While a significant amount of genomic sequence in a wide variety of complex organisms is now known to encode not proteins but functional RNAs, genome sizes still seem much larger than what should be strictly necessary. This observation is emphasized by the findings of genomic sequencing projects, where complex organisms, including Homo sapiens, show what seems at first glance to be a surprisingly low count of protein-coding genes. In addition, closely related organisms can have markedly different genome sizes. These observations are directly pertinent to the ‘C-value paradox’, which refers to the well-documented disconnect between genome size and organismal complexity. Since genomic size accordingly appears to be arbitrarily variable (at least up to a point), much non-coding DNA has been considered by many in the field to be ‘junk’. In this view, genomic expansion (by duplication events or extensive parasitism by mobile genetic elements) has little if any selective impedance until finally limited by truly massive genomic sizes. In other words, the junk DNA hypothesis holds that genomes can accumulate large amounts of superfluous sequence which are essentially along for the ride, being replicated in common with all essential genomic segments. This trend is only restricted when genomes reach a size which eventually does impact upon the relative fitness of an organism. Thus, even the junk DNA stance concedes that genomes must necessarily be size-restricted, even though a lot of genomic noise can be tolerated before this point is reached.

It must be noted that the junk DNA viewpoint has been challenged, broadly along two separate lines. One such counterpoint holds that the apparent lack of function of large sectors of eukaryotic genomes is simply incorrect, since a great deal of the ‘junk’ sequences are transcribed into RNAs with a variety of essential cellular functions beyond encoding proteins. As noted above, there is no question that functional (non-coding) RNAs are of prime importance in the operations of all cellular life. At a basic level this has been known almost since the birth of molecular biology, since ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) have been described for many decades. These RNAs are of course essential for protein synthesis, and are transcribed from corresponding genomic DNA sequences.

But in much more recent times, the extent of RNA function has become better appreciated, to include both relatively short regulatory molecules (such as microRNAs [miRNAs]) and much longer forms (various functional non-coding species [ncRNAs]). While the crucial importance of these classes of nucleic acid functionalities is beyond dispute, the relevance of this to genome sizes is another matter entirely. To use the human genome as a case in point, even if the number of functional RNA genes was twice the size of the protein-coding set, the net genome size would still be much larger than required. While the proponents of the functional-RNA refutation of junk DNA have pointed to the evident transcription of most (if not all) of complex vertebrate genomes, this assertion has been seriously challenged by the Hughes (Toronto) lab as based on inadequate evidence.

Other viewpoints suggest that the large fraction of eukaryotic chromosomal DNA which is seemingly superfluous is in fact necessary, but without a strong requirement for sequence specificity. We can briefly consider this area in a little more detail.

Genomic Junk and Some ‘Indifferent’ Viewpoints

One of these proposals, the ‘skeletal DNA’ hypothesis, as largely promulgated by Tim Cavalier-Smith, side-steps the problem of whether much of the genome is superfluous junk or not, in favor of a structural role for the large non-genic component of the genome. Here the sequence of the greater part of the ‘skeletal’ DNA segments is presumed to be non-specific, where the main evolutionary selective force is for genomic size per se, irrespective of the sequences of non-genic regions. Where DNA segments are under positive selection but not in a sequence-specific manner, the tracts involved have been termed ‘indifferent DNA’, which seems an apt tag in such circumstances. Cavalier-Smith proposes that genomic DNA acts as a scaffold for nuclei, and thus nuclear and cellular size correlate with genome sizes. But at the same time, DNA content itself does not directly alter proliferating cell volumes; rather, the latter result from variation in encoded cell cycle machinery and signals (related to cellular concentrations of control factors).

Another proposal for the role of large non-coding genomic segments could be called the ‘transposable element shield’ theory. In this concept (originally put forward by Claudiu Bandea), so-called junk genomic segments reduce the risk that vital coding sequences will be subjected to insertional inactivation by parasitic mobile elements. Once it has been drawn to one’s attention, this proposal has a certain intuitive appeal. Thus, if 100% of a complex genome was comprised of demonstrably functional sequences, then by definition any insertion by a parasitic transposable sequence element would knock out a function (or at least have a very high probability of doing so). If only 10% of the genome was of vital functional significance, and the rest a kind of shielding filler, then the insertional risk goes down by an order of magnitude. This model assumes that insertion of mobile elements is sequence-target neutral, or purely random in insertion site. Since this is not so for certain types of transposable elements, the Bandea proposal also encompasses the notion that protective genomic sequences are not necessarily arbitrary, but include sequences with a decoy-like function, to absorb parasitic insertions with reduced functional costs. Strictly speaking, then, this proposal is not fully ‘indifferent’ in referring to ‘junk’ DNA, but clearly is at least partially so. It should be noted as well that shielding against genomic parasitism is of significance for multicellular organisms with large numbers of somatic cells, as well as for germline protection.
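The order-of-magnitude claim is easy to put in numbers (a minimal Python sketch, assuming the target-site neutrality that the proposal itself qualifies):

# Risk that uniformly random insertions disrupt functional sequence,
# as a function of the fraction of the genome that is functional.
def insertional_risk(functional_fraction, insertions=1):
    # Probability that at least one insertion lands in functional sequence.
    return 1 - (1 - functional_fraction) ** insertions

print(insertional_risk(1.0))      # 1.0 -- fully functional genome: every hit damages
print(insertional_risk(0.1))      # 0.1 -- a 90% 'shield' cuts per-insertion risk tenfold
print(insertional_risk(0.1, 10))  # ~0.65 -- though risk still accumulates over many insertions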

In the context of whether genomes increase in size by the accumulation of ‘junk’ or through selectable (but sequence-independent) criteria, it should be noted that a strong case has been made by Michael Lynch and colleagues for the significance of non-adaptive processes in causing changes in genome size, especially in organisms with relatively low replicative population sizes (the opposite effect to large-population prokaryotes, as noted above). The central issue boils down to energetic and other functional costs – if genome sizes can expand with negligible or low fitness cost, passive ‘junk’ can be tolerated. But a ‘strong’ interpretation of the skeletal DNA hypothesis holds that genome sizes are as large as they are for a selectable purpose – acting as a nuclear scaffold.

In considering the factors influencing the genome sizes of complex organisms, some specific cases in comparative genomics are useful to highlight, as follows.

Lessons from Birds, Bats, and Other ‘Natural Experiments’

Modern molecular biology has allowed the directed reduction of significant sections of certain bacterial genomes for both scientific and technological ends. But some ‘natural experiments’ have also revealed very interesting aspects of vertebrate genomes.

One such piece of highly significant information comes from studies of the genomes of vertebrates that are true fliers, as found with birds and bats. Such organisms are noted collectively for their significantly smaller genomes in comparison to other vertebrates, especially other amniotes (reptiles and mammals). The small-genome / flight correlation has even been proposed for long-extinct ancient pterosaurs, from studies of fossil bone cell sizes. In the case of birds, genome size reduction has been attributed to loss of repetitive sequences, deletions of certain genomic segments, and (non-essential) gene loss.

A plausible explanation for the observed correlation between the ability to fly and smaller genomes is the high metabolic demand of flight, which is argued to favor streamlined genomes via the reduction in replicative metabolic costs. Supporting evidence for such a contention is provided by the negative correlation between genome size and metabolic rate in all tetrapods (amphibians, reptiles, birds, and mammals), where a useful measure of oxidative metabolic rate is the ‘heart index’, or the ratio of heart mass to body weight. Even among birds themselves, it has been possible to show (using heart indices) negative correlations between metabolic rates and genomic sizes. Thus, highly active fliers with relatively large flight muscle quantities tend to have smaller genomes than more sedate fliers, with hummingbirds (powerhouses of high-energy hovering flight) having the smallest genomes of all birds.

It was stated earlier that closely related organisms can have quite different genome sizes, and the packaging of genomes in such cases can also differ markedly. The Indian muntjac deer has a fame of sorts among cytogeneticists, owing to its extremely low chromosome count relative to other mammals (a diploid count of only 6 chromosomes in females, with an extra one in males). By contrast, the Chinese muntjac has a more usual diploid chromosome count of 46, and yet this deer is closely enough related to the Indian muntjac that they can interbreed (albeit with sterile offspring, reminiscent of mules produced through horse-donkey crosses). The Indian muntjac genome is believed to be the result of chromosomal fusions, with concomitant deletion of significant amounts of repetitive DNAs, and reduction in certain intron sizes. As a result, the Indian muntjac genome is reduced in total size by about 22% relative to that of the Chinese muntjac.

This illustration from comparative genomics once again suggests that genome size alone cannot be directly related to function. Although the link between numbers of distinct functional elements and complexity might itself be inherently complex, it is reasonable to contemplate what degrees of molecular function are required to build different organisms. If all genomes were entirely functional and ‘needed’, then much more genomic sequence would be required to build lungfishes, onions, and many other plants than human beings.

Junk vs. Garbage

A common and useful division of items that are nominally ‘useless’ has been noted by Sydney Brenner. He pointed out that most languages distinguish between stuff that is apparently useless yet harmless (‘junk’), and material that is both useless and problematic or offensive in some way (‘garbage’). An attic may accumulate large amounts of junk which sits there, perhaps for decades, without much notice, but useless items which become odoriferous or take up excessive space are promptly disposed of. The parallel he was making with genomic sequences is clear. ‘Garbage sequences’ that are, or become, deleterious in some way are rapidly removed by natural selection, but this does not apply to sequences which are merely ‘junk’.

Junk sequences thus do not immediately impinge upon fitness, at least in organisms with low population sizes. Also, ‘junk’ may be co-opted during evolution for a true functional purpose, as with the full ‘domestication’ of otherwise parasitic mobile elements. Two important points must be noted with respect to the domestication of formerly useless or even deleterious sequence elements: (1) just because some mobile element residues have become domesticated, it does not at all follow that all such sequences are likewise functional; and (2) the co-option (or ‘exaptation’) of formerly useless DNA segments does not in any way suggest that evolution has kept such sequences ‘on hand’ on the off-chance they might find a future use.

Countervailing Trends for Genomic Size

How do complex genomes expand in size, anyway? Duplication events are a frequent contributor towards such effects, and these processes can range from local effects on relatively small segments, to whole genes, and even entire genomes. The latter kind of duplication leads to a state known as polyploidy, which in some organisms can become a surprisingly stable arrangement.

Yet the major influence on genomic sizes in eukaryotes is probably the activity of parasitic mobile (transposable) elements, such that a correlation between genomic size and percent constitution by such elements has been noted. It has been suggested that although in some cases very large genomes with a high level of transposable elements appear to be deleterious (notably certain plants believed to be on the edge of extinction), in other circumstances (large animal genomes as seen with salamanders and lungfish) a high load of transposable elements may be tolerated without significant fitness loss. The latter effect has been attributed to a slow acquisition of the mobile elements, whereby their continued spread tends to be halted as they are inactivated by mutation or other (‘sequence decay’) mechanisms. This in itself can be viewed from the perspective of the ‘garbage/junk’ dichotomy: at least some transposable elements that remain active may be deleterious, and thus suitable for relegation into the ‘garbage’ box, while inactivated elements are more characteristic of ‘junk’.

Yet there is documented evidence indicating a global trend in evolution towards genome reduction, in a wide diversity of organisms. When this pattern is considered along with factors increasing genomic size, it has been proposed that the overall picture is biphasic. In this view, periods of genomic expansion in specific lineages are ‘punctuated’ not by stasis (as the original general concept of ‘punctuated equilibrium’ proposed) but with slow reduction in genomic sizes. Though the metabolic demands of flying vertebrates may place special selective pressures towards genomic reduction, a general trend towards genomic contraction suggests that selection always tends to favor smaller and more efficient genomes. Even where the selective advantage of genome size is small and subtle, over evolutionary time it will be inevitably exerted with the observed results. But at the same time, genomic copy-errors (from small segments to whole genes to entire genomes) and parasitic transposable elements act as an opposing influence towards genomic expansion. And in this context, it is important to recall the above notes (from Michael Lynch and colleagues) with respect to the importance of organismal population size in terms of the magnitudes of the selective pressures dictating the streamlining of genomes.
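A toy simulation can convey this proposed sawtooth dynamic (a minimal Python sketch; all rates are invented for illustration and carry no empirical weight):

# Genome size as a random walk: rare expansion bursts (transposon invasions,
# duplications) against a pervasive slow deletional bias between bursts.
import random

size = 1000.0                      # genome size in arbitrary units
for generation in range(10_000):
    if random.random() < 0.001:    # rare burst of expansion
        size *= random.uniform(1.2, 2.0)
    else:                          # slow contraction between bursts
        size *= 0.99996
print(f"genome size after 10,000 generations: {size:.0f} units")
# A plotted trajectory shows the sawtooth: abrupt jumps, gradual decay.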

A human genome-reduction project (actually rendered much more feasible by the advent of new genome-editing techniques) could presumably produce a fully-functional human with a much smaller genome, but such a project would be unlikely to pass the scrutiny of institutional bioethics committees. (Arbitrary deletions arising naturally will either be positively selected or not; a human project with the tools to reduce genome size would often lack 100% certainty that a proposed deletion would not have deleterious effects). Yet apart from this, we might also ask whether such engineered humans would have an increased risk of somatic cell mutagenesis via transposable elements (leading to cancer), if the Bandea theory of genomic shielding of transposable elements holds water.

Now, what then for parsimony in the light of the cascade of genomic information emerging in recent times?

Thrifty Interactomes?

If the junk DNA hypothesis was truly wrong in an absolute sense (that is, if all genomes were constituted from demonstrably functional sequences), then the parsimony principle might still hold at the genomic level. Here one might claim that all genomic sequences are parsimonious to the extent that they are functionally relevant, and therefore genomes are as large as functionally necessary, but no larger. Yet an abundance of evidence from comparative genomics (as discussed briefly above) suggests strongly that this interpretation is untenable. But if a typical eukaryotic energy budget derived from mitochondria allows a ‘big sloppy genome’, where does the so-called parsimony principle come in?

The best answer to this comes not from genomic size per se, but from gene number and the organization of both gene expression and gene expression products. Consider some of the best-studied vertebrate genomes, as in the Table below. If protein-coding genes only are considered, both zebrafish and mice have a higher count than humans. Nevertheless, as noted above, it is now known that non-coding RNAs, both large and small, are very important. If these are included, and a combined ‘gene tally’ thus calculated, we now find Homo sapiens coming out on top. More useful still may be the count for gene transcripts in general, since these include an important generator of genomic diversity: differential gene splicing.

[Table: ComparativeGeneCounts – protein-coding genes, non-coding RNA genes, and transcript counts for zebrafish, mouse, and human]

_________________________________________________________________________

But what does this mean in terms of complexity? Are humans roughly only twice as complex as mice, or roughly three times as complex as a zebrafish? Almost certainly there is much more to the picture than that, since these superficial observations belie what is likely to be the most significant factor of all: the way expressed products of genomes (both proteins and RNAs) interact, which can impose many hidden layers of complexity onto the initial expression toolkit. These patterns of interactions comprise an organism’s interactome.

How many genes does it take to build a human? Or a mouse, or a fish? As noted earlier in this post, in the aftermath of the first results for the sequencing of the human genome, and numerous other genomes soon afterward, many onlookers expressed great surprise at the ‘low’ number of proteins apparently encoded by complex organisms. Other observers pointed out in turn that if it is not known how to build a complex creature, how could one know what an ‘appropriate’ number of genes should be? Still, a few tens of thousands of genes does seem a modest number, even factoring in additional diversity-generating mechanisms such as differential splicing. At least, this would be the case if every gene product had only a single, unique role in the biology of an organism – but this is manifestly not so.

In fact, single proteins very often have multiple roles, in multiple ways, via the global interactome. An enzyme, for example, may have the same basic activity, but quite distinct roles in cells of distinct differentiation states. Other proteins can exhibit distinct functional roles (‘moonlighting’) in different circumstances. It is via the interactome, then, that genomes exhibit biological parsimony, to a high degree.

This ‘interactomic’ theme will be developed further in the succeeding post.

Some Parsimonious Conclusions

(1) Prokaryotic genomes have strong selective pressures towards small size.

(2) Eukaryotic genomes can expand to much larger sizes, with considerable portions of redundant or non-essential segments, by mechanisms that may be non-adaptive or positively selected (skeletal DNA, transposable element shielding). Such processes include duplication of specific segments (gene duplication) or even whole-genome duplication (polyploidy). This may be countered by long-term evolutionary trends towards genome reduction, but the ‘expandability’ of eukaryotic genomes (as opposed to prokaryotes) still remains.

(3) The expressed interactomes of eukaryotes are highly parsimonious.

(4) Biological parsimony is a natural consequence of strong selective pressures, which tend to drive towards biosystem efficiency. But the selective pressures themselves are linked to the energetics of system processes, and population sizes. Thus, a biological process (case in point: genome replication) within organisms with relatively small populations and moderate energetic demands (many vertebrates) may escape strong selection for efficiency, and be subjected to genetic drift and genomic expansion, with a slow counter-trend towards size reduction. An otherwise tolerable process in terms of energetic demands (genome replication once again) may become increasingly subject to selective pressure towards efficiency (size contraction) if an organism’s metabolic demands are very high (as with flying vertebrates).

(5) Based on internal functions alone, it might be possible to synthetically engineer a complex multicellular eukaryote where most if not all of its genome had a defined function, but such an organism would likely be highly vulnerable outside the laboratory to disruption of vital sequences through insertion of parasitic mobile elements.

And to conclude, a biopolyversical rumination:

There are cases of genomes expanding

Into sizes large and outstanding

Yet interactomes still show

That parsimony will grow

Via selective pressures demanding

References & Details

(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).

‘They have small and very compact genomes, with minimal intergenic spaces and few introns.’     In cases where conventional bacteria have introns, they are frequently ‘Group I’ introns in tRNA genes, which are removed from primary RNA transcripts by self-splicing mechanisms. The ‘third domain of life’, the Archaeal prokaryotes, have tRNA introns which are removed via protein catalysts. See Tocchini-Valentini et al. 2015.

‘….their replication times are typically very short under optimal conditions….’     E. coli can replicate in about 20 minutes in rich media, for example. But not all prokaryotes are this speedy, notably some important pathogens. Mycobacterial doubling times are on the order of 16-24 hr for M. tuberculosis (subject to conditions) or as slow as 14 days for the causative agent of leprosy, M. leprae. For an analysis of the genetics of fast or slow growth in mycobacteria, see Beste et al. 2009. For much detail on Mycobacterium leprae, see this site.

‘A major factor for the evolution of prokaryotic organisms is their typically very large population size……’     For excellent discussion of these issues, see work from the lab of Michael Lynch, as in Lynch & Conery 2003.

‘…..this immense differential is enabled in eukaryotic cells through the energy dividend provided by mitochondria……’    See Lane & Martin 2010; Lane 2011.

‘……Mitochondria …… entered into an eventual partnership with progenitors of eukaryotic cells, and in the process underwent massive genomic reduction….’     Human mitochondrial genomes encode only 13 proteins. For a general and very detailed discussion of such issues, see Nick Lane’s excellent book, Power, Sex, Suicide (Oxford University Press, 2005).

‘The energetic contribution of mitochondria enabled much larger cells, with concomitant much larger genomes.’     In the words of the famed bio-blogger PZ Myers, ‘a big sloppy genome’ [a post commenting on the hypothesis of Lane & Martin 2010]

‘….complex organisms, including Homo sapiens, show what seems at first glance to be a surprisingly low count of protein-coding genes.’      See (for example) the ENSEMBLE genomic database.

‘…..closely related organisms can have markedly different genome sizes.’     See Doolittle 2013.

‘….even if the number of functional RNA genes was twice the size of the protein-coding set, the net genome size would still be much larger than required.’      The study of Xu et al. 2006 provides (in Supplementary Tables) the striking contrast between the estimated % of coding sequences and genome sizes for a range of prokaryotes and eukaryotes. Although slightly dated in terms of current gene counts, the low ratios of coding sequences in most of the sampled eukaryotes (especially mammals) would still stand even if doubled. By the same token, with prokaryotes, a direct correlation exists between coding DNA and genome size, but this relationship falls down for eukaryotes above a certain genome size (0.01 Gb, where the haploid human genome is about 3 Gb; see Metcalfe & Casane 2013).

‘….the proponents of the functional-RNA refutation of junk DNA have pointed to the evident transcription of most if not all of complex vertebrate genomes…..’     The ENCODE project ignited much controversy by asserting that the notion of junk DNA was no longer valid, based on transcriptional and other data. (See Djebali et al. 2012; ENCODE Project Consortium 2012).  The ‘junk as bunk’ proposal has itself been comprehensively debunked by Doolittle (2013) and Graur et al. 2013.

 ‘….. this assertion [widely encompassing genomic transcription] has been seriously challenged as based on inadequate evidence.’     See Van Bakel et al. 2010.

‘…..skeletal DNA hypothesis, as largely promulgated by Tim Cavalier-Smith….’     See Cavalier-Smith 2005.

‘…..this concept (originally put forward by Claudiu Bandea) …..’      See a relevant online Bandea publication.

‘…..shielding against genomic parasitism is of significance for multicellular organisms…..’      Regardless of the veracity of the Bandea hypothesis, a variety of genomic mechanisms for protection from parasitic transposable elements have evolved; see Bandea once more.

‘Where DNA segments are under positive selection but not in a sequence-specific manner, the tracts involved have been termed ‘indifferent DNA…..’      See Graur et al. 2013.

‘….a strong case has been made by Michael Lynch and colleagues for non-adaptive changes in genome size….’      See Lynch 2007.

‘….molecular biology has allowed the directed reduction of significant sections of certain bacterial genomes ….’      For work on genome reduction in E. coli, see Kolisnychenko et al. 2002; Pósfai et al. 2006. For analogous work on a Pseudomonas species see Lieder et al. 2015. The Venter group has (famously) worked on synthetic genomes, which allow the most direct way of establishing the minimal genome for a prokaryotic organism. With respect to this, see Gibson et al. 2010.

‘…birds and bats. Such organisms are noted collectively for their significantly smaller genomes in comparison to other vertebrates.‘     For avian genomes, see Zhang et al. 2014; for bats, see Smith & Gregory 2009. ‘…small-genome / flight correlation has even been proposed for long-extinct ancient pterosaurs’   See Organ & Shedlock, 2009. In the bat study of Smith & Gregory, it was found that ‘megabats’ (larger, typically fruit-eating bats lacking sonar) are even more constrained in terms of genomic size than microbats.

‘In the case of birds, genome size reduction has been assigned……’     For details in this area, see Zhang et al. 2014.

‘…..evidence of a negative correlation between genome size and metabolic rate …..A measure of oxidative metabolic rate is the ‘heart index…..’      See Vinogradov & Anatskaya 2006.

‘…highly active fliers with large relative flight muscle quantities tended to have smaller genomes than more sedate fliers. ‘      See Wright et al. 2014.

‘…hummingbirds (powerhouses of high-energy hovering flight) having the smallest genomes of all birds…’      See Gregory et al. 2009.

‘…..the Indian muntjac genome is reduced in total size by about 22% relative to Chinese muntjacs…..’      The Indian muntjac genome is about 2.17 Gb; the Chinese muntjac genome is about 2.78 Gb. See Zhou et al. 2006; Tsipouri et al. 2008.

‘……much more genomic sequence is required to build lungfishes, onions, and many plants than human beings.’     The note regarding onions comes from T. Ryan Gregory (cited as a personal communication by Graur et al. 2013). For lungfish and many other animal genome sizes, see a comprehensive database (overseen by T.R. Gregory). For plant genomes, see another useful database.

‘….A common and useful division of items that are nominally ‘useless’ has been noted by Sydney Brenner.‘      See Brenner 1998. This ‘junk / garbage’ distinction was alluded to by Graur et al. 2013.

‘…… ‘junk’ may be co-opted during evolution for a true functional purpose, as with the full ‘domestication’ of otherwise parasitic mobile elements……’     See Hua-Van et al. 2011.

‘…. because some mobile element residues have become domesticated, it does not at all follow that all such sequences are likewise functional.’      This point has been emphasized by Doolittle 2013.

‘…..a state known as polyploidy…….’      For an excellent review on many aspects of polyploidy, see Comai 2005.

‘……a correlation between genomic size and their percent constitution by such [mobile] elements has been noted.‘ See Metcalfe & Casane 2013.

‘…..has been suggested …….. very large genomes with a high level of transposable elements appear to be deleterious …… in other circumstances ……a high load of transposable elements may be tolerated….’      See Metcalfe & Casane 2013.

‘……documented evidence indicating a global trend in evolution towards genome reduction….’ | ‘…..it has been proposed that the overall picture is biphasic. Periods of genomic expansion in specific lineages are ‘punctuated’ not by stasis (as the original general concept of ‘punctuated equilibrium’ proposed) but with slow reduction in genomic sizes. ‘     See Wolf & Koonin 2013. For a background on the theory of punctuated equilibrium, see Gould & Eldredge 1993.

‘…..human genome-reduction project (actually rendered much more feasible by the advent of new genome-editing techniques)……’      There is so much to say about these developments (including zinc finger nucleases, TALENs, and in particular CRISPR-Cas technology) that it will form the subject of a future post.

‘ENSEMBLE Dec 2013 release’     (Table): see the ENSEMBLE database site.

‘These patterns of interactions comprise an organism’s interactome.’      Note here that the term ‘interactome’ can be used in a global sense, or for a specific macromolecule. Thus, a study might refer to the ‘interactome of Protein X’, in reference to the sum total of interactions concerning Protein X in a specific organism.

Next post: September.

Parsimony and Modularity – Key Words for Life

April 21, 2015

Sometimes Biopolyverse has considered aspects of life which may be generalizable, such as molecular alphabets. This post takes a look at another aspect of complex life which is universal on this planet, and unlikely to be escapable by any complex biology. The central theme is the observation that the fundamental processes of life have an underlying special kind of economy, which may be termed biological parsimony. Owing to its scope and diversity, this will be the first of a series dealing with this issue. Here, we will look at the general notion of parsimony in a biological context, and begin to consider why such arrangements should be the rule. Some biological phenomena would seem to challenge the parsimony concept, and in this initial post we will look at certain features of the protein universe in this respect.

Thrifty Modules

In the post of January 2014, the role of biological parsimony in the generation of complexity was briefly referred to. The fundamental issue here concerns how a limited number of genes could give rise to massively complex organisms, by means of processes that shuffle and redeploy various functional components. Thus, the ‘thrifty’ or parsimonious nature of biological systems is effectively enabled by the modularity of a basic ‘parts list’. A modest aphorism could thus state:

“Parsimony is enabled by Modularity; Modularity is the partner of Parsimony”

 

The most basic example of modularity in biology can be found with molecular alphabets, which were considered in a recent post. Generation of macromolecules from linear sequence combinations of subunits from a distinct and relatively small set (an ‘alphabet’ in this context) has a clear modular aspect. Subunit ‘letters’ of an alphabet can be rearranged in a vast number of different strings, and it is this simple principle which gives biological alphabets immense power as a fundamental tool underlying biological complexity.
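The scale of this combinatorial power is easy to make concrete. Below is a minimal sketch; the string lengths are round figures chosen purely for illustration.

```python
# The number of distinct linear strings of length L over an alphabet
# of size A is A**L: sequence spaces grow exponentially with length.

def string_space(alphabet_size: int, length: int) -> int:
    """Count all possible sequences of a given length over an alphabet."""
    return alphabet_size ** length

print(f"4-letter nucleic acid alphabet, 50-mer:   {string_space(4, 50):.2e}")
print(f"20-letter protein alphabet, 100 residues: {string_space(20, 100):.2e}")
```

A 50-nucleotide string already defines a space of roughly 10^30 sequences, and a modest 100-residue protein around 10^130; this exponential scaling is the ‘immense power’ referred to above.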

This and several other higher-level modular aspects of biological systems are outlined in Table 1 below.

 

[Table 1 image: major levels of biological modularity, with examples]

Table 1. Major Levels of Biological Modularity.

  1. Molecular alphabets: For an extended discussion of this theme, see a previous post. The modularity of any alphabet is implicit in its ability to generate extremely large numbers of strings of variable length, with specific sequences of the alphabetic ‘letters’.
  2. Small molecular scaffolds: Small molecules have vital roles in a wide variety of biological processes, including metabolic, synthetic, and regulatory activities. In numerous cases, distinct small biomolecules share common molecular frameworks, or scaffolds. The example given here (perhydrocyclopentanophenanthrene skeleton) is the core structure for cholesterol, sex hormones, cardiac glycosides, and steroids such as cortisone.

[Figure: the perhydrocyclopentanophenanthrene (steroid) ring skeleton]

  3. Protein folds: Although a large number of distinct protein folds are known, some in particular have been ‘used’ by evolution for a variety of functions. The triosephosphate isomerase (TIM) (βα)8-barrel fold (noted as the example in the above Table) has been described as the structural core of >170 encoded proteins in the human genome alone.
  4. Alternate splicing / differential intron & exon usage: The seemingly low number of protein-coding genes in the human genome is substantially boosted by alternate forms of the splicing together of exonic (actual coding) sequence segments from single primary transcripts. This can occur by skipping or incorporation of specific exons (see the combinatorial sketch following this list). Also, the phenomenon of intron retention is another means of extending the functionality of primary transcripts.
  5. Alternate / multiple promoters: Many gene products are expressed in different tissues or different developmental stages in multicellular organisms. This is often achieved through single promoters subject to differential activating or repressing influences, such as varying transcription factors, or negative regulation through microRNAs (miRNAs). Another way of extending the versatility of a single core gene is seen where more than one promoter (sometimes many) lies upstream of a core coding sequence. With this arrangement, the regulatory sequence influences on each promoter can be clearly demarcated, and transcripts from each alternate promoter can be combined with alternate splicing mechanisms (as above with (4)), often with the expression of promoter-specific 5’ upstream exons. A classic example of this configuration is found with the microphthalmia gene (MITF), which has many isoforms through alternate promoters and other mechanisms.
  6. Recombinational segments: As a means of increasing diversity with a limited set of genomic sequences, in specific cell lineages recombinational mechanisms can allow a combinatorial assortment of specific coding segments to produce a large number of variants. The modularity of such genetic sequences in these circumstances is obvious, and is a key feature of the generation of diversity by the vertebrate adaptive immune system.
  7. Protein complex subunits: Protein-protein interactions are fundamental to biological organization. There are many precedents for complexes made up of multiple protein subunits having distinct compositions in different circumstances. Thus, a single stimulus can signal very different results in different cellular backgrounds, associated with different protein complexes being involved in their respective signaling pathways. Enzymatic complexes, such as those involved in DNA repair, can also show subunit-based modularity.
  8. Cells: From a single fertilized zygote, multicellular organisms of a stunning range of shapes and forms can be grown, based on differentiation and morphological organization. Thus, cellular units can be considered a very basic form of biological modularity.
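As flagged in item 4, a minimal sketch of cassette-exon combinatorics follows; the gene structure and exon labels are hypothetical, chosen purely for illustration.

```python
# A hypothetical five-exon gene: with k independently optional
# ('cassette') exons, up to 2**k distinct isoforms can be spliced
# from a single primary transcript.

from itertools import product

FIRST, LAST = "E1", "E5"        # constitutive exons (always included)
cassettes = ["E2", "E3", "E4"]  # each may be included or skipped

isoforms = []
for choices in product([True, False], repeat=len(cassettes)):
    middle = [exon for exon, keep in zip(cassettes, choices) if keep]
    isoforms.append([FIRST] + middle + [LAST])

print(f"{len(isoforms)} isoforms from one primary transcript:")
for iso in isoforms:
    print("-".join(iso))
```

Three cassette exons already yield eight isoforms; combined with intron retention and multiple promoters (items 4 and 5), the transcript diversity obtainable from a single locus can be considerable.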

Discussion of both small molecule and macromolecular instances of modularity / parsimony will be extended in succeeding posts.

Some of these modularity levels are interlinked in various ways. For example, the evolutionary development of modular TIM barrels may have been enhanced by alternate splicing mechanisms. Indeed, the latter process may be of general evolutionary importance, particularly in the context of gene duplications. In such circumstances, one gene copy can evolve novel functions (neofunctionalization), sometimes associated with the use of alternate splice variation.

* Certainly this Table is not intended to be comprehensive with respect to modularity mechanisms, but illustrates some major instances as pertinent examples.

___________________________________________________________________

 

When a person is referred to as ‘parsimonious’, there are often connotations of miserliness, or a suggestion that the individual in question is something of a skinflint. In a biological context, on the other hand, the label of parsimony is nothing but a virtue, since it is closely associated with the efficiency of the overall biological system.

Pathways to Parsimony

When modular components can be assembled in different ways for different functions, the outcome is by definition more parsimonious than producing distinct functional forms for each task. An alphabetic system underlies the most fundamental level of parsimony, but numerous high-order levels of parsimonious assembly can also exist, as Table 1 indicates.

Evolution itself is highly conducive to parsimony, simply owing to the fact that multiple functional molecular forms can be traced back to a common ancestor which has diversified and branched through many replicative generations. As noted in the footnotes to Table 1, gene duplication (or even genome duplication) is a major means by which protein evolution can occur, via the development of functional variants in the ‘spare’ gene copies. It is the ‘tinkering’ nature of evolution which produces a much higher probability that pre-existing structures will be co-opted into new roles than that entirely novel structures will be developed.

But there is a second evolutionary consideration in the context of biological parsimony, and that is where bio-economies, or bioenergetics, comes to the forefront. Where biosystems are in replicative competition, it is logical to assume that a system with the most efficient means of copying itself will predominate over rivals with relatively inferior processes. And the copying mechanism will be underwritten by the entire metabolic and synthetic processes used by the biosystem in question. Efficiency will thus depend on how streamlined the biosystem energy budget can be rendered, and the most parsimonious solutions to these questions will thus be evolutionarily favored.
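This competitive logic is easy to caricature in silico. The following minimal sketch uses assumed growth figures (not measured values) for two replicators that differ only in copying efficiency; the small per-generation edge compounds until the thriftier system dominates.

```python
# Two replicators, A and B, grow exponentially under a shared resource
# ceiling; A copies itself 5% more efficiently per generation than B.

def compete(growth_a: float, growth_b: float, generations: int,
            capacity: float = 1e9) -> float:
    """Return the final population fraction of replicator A."""
    a, b = 1.0, 1.0
    for _ in range(generations):
        a *= growth_a
        b *= growth_b
        total = a + b
        if total > capacity:  # rescale to mimic finite resources
            a, b = a / total * capacity, b / total * capacity
    return a / (a + b)

for gens in (50, 100, 200):
    print(f"Fraction A after {gens} generations: {compete(1.05, 1.00, gens):.4f}")
```

A mere 5% efficiency advantage takes replicator A from parity to near-total dominance within a couple of hundred generations; with the very large population sizes typical of prokaryotes, selection can ‘see’ far smaller differences still.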

If evolution is a truly universal biological feature (as postulated within many definitions of life) then bioparsimony is accordingly highly likely to be a universally observed principle in any biological system anywhere in the universe.

Counterpoints and Constraints: Protein Folding

Certain observations might seem to run contrary to the proposed fundamental nature of parsimony and modularity in biology. Let’s take a look at protein folding as an initial case in point.

Folds and Evolution

Table 1 highlights the modularity of certain protein folds, but this is certainly not a ubiquitous trait within the protein universe. On the one hand we can cite the instances of specific protein folds which are widespread in nature, fulfilling many different catalytic or structural functions (as with the TIM-barrel fold; Table 1). Yet at the same time, it is true that many folds (>60%) are restricted to one or two functions.

While all proteins may ultimately be traceable back to a very limited set of prototypical forms (if not a universal common ancestor in very early molecular evolution), it appears that some protein folds are much more amenable to evolutionary ‘tinkering’ than others. This has been attributed to structural aspects of certain folds, in particular a property which has been termed ‘polarity’. In this context, polarity essentially refers to a combination of a highly ordered structural scaffold encompassing loop regions whose packing within the total fold is relatively ‘loose’ and amenable to sequence variation.

It follows logically that if mutations in Fold A have a much higher probability of creating novel activities than mutations in Fold B, then variants of Fold A will be more likely to expand evolutionarily (through gene duplication or related mechanisms). Here the TIM-barrel motif is a representative star for the so-called ‘Fold A’ set, which in turn are exhibitors of the polarity property par excellence.

While some natural enzymatic activities are associated with single types of folds, in other cases quite distinct protein folds can mediate the same catalytic processes. (Instances of the latter are known as analogous enzymes). It does not necessarily follow, however, that the absence in nature of an analogous counterpart for any given protein catalyst indicates that an alternative folding solution for that particular catalytic activity is not possible per se. In such circumstances, a potentially viable alternative structure (another polypeptide sequence with a novel fold constituting the potential analogous enzyme) has simply never arisen through lack of suitable evolutionary antecedents.

By their nature, the blind processes of natural selection on a molecular scale will favor certain protein folds simply by virtue of their amenability to innovation. If every catalytic or structural task could be competitively fulfilled by only a handful of folds, the protein folding universe would likely show much less diversity than is noted in extant biology. Evolution of novel folds will be favored when they are more efficient for specific tasks than existing structures. All of this is underpinned by the remarkable parsimony of the protein alphabet, especially when one reflects upon the fact that an astronomical number of possible sequences can be obtained with a linear string of amino acids corresponding to even a small protein.
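To put a rough figure on ‘astronomical’, a back-of-envelope estimate can be made with the standard 20-letter amino acid alphabet and a small protein of 100 residues:

```latex
N \;=\; 20^{100} \;=\; 10^{\,100\,\log_{10} 20} \;\approx\; 1.3 \times 10^{130}
```

For comparison, the number of atoms in the observable universe is commonly estimated at around 10^80, so extant biology can only ever have sampled a vanishing fraction of even this small-protein sequence space.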

Parsimony and Necessity

 Although so far this musing on parsimony and modularity has barely scratched the surface of the topic as a whole, at this point we can round off this post by considering briefly why parsimonious bio-economies should be so ubiquitously observed.

Some aspects of biology which inherently invoke parsimony may be in themselves fundamentally necessary for any biological system development. For example, molecular alphabets appear to be essential for biology in general, as argued in a previous post. Likewise, while construction of complex macroscopic organisms from a relatively small set of cell types, themselves differentiated from a single zygote, can be viewed as a highly parsimonious system, there may be no other feasible evolutionary pathway which can produce comparable functional results.

But, as indicated by the above discussion of protein folds, other cases may not be quite so clear-cut, and require further analysis. Complex trade-offs may be involved, as with the factors determining genome sizes, which we will address in the succeeding post.

Evolutionary selection for energetic efficiency is surely a contributing factor to the trend towards biological parsimony, as also noted above. But apart from bioenergetics, one might propose factors in favor of parsimony which relate to the informational content of a cell. Thus, if every functional role required for all cellular activities (replication in particular) was represented by a completely distinct protein or RNA species, it could be speculated that the resulting scale-up of complexity would place additional constraints on functional viability. A great increase in all molecular functional mediators might be commensurate with a corresponding increase in deleterious cross-interactions, solutions for which might be difficult to obtain evolutionarily. Of course, such a ‘monomolecular function’ biosystem would be unlikely to arise in the first place, when competing against more thrifty alternatives. The latter would tend to differentially thrive through reduced energetic demands, if not more ready solutions to efficient interactomes. Consequently, it probably comes down to bioenergetics once more, if a little more indirectly.

Finally, a bio-polyverse salute to the so-called parsimony principle in biology:

Evolution can tinker with bits

In ‘designing’ selectable hits

Modular innovation

Is a route to creation

Thus parsimony works, and it fits.

 

References & Details

(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).

Some of the issues covered in this post were considered in the free supplementary material for Searching for Molecular Solutions, in the entry: SMS-Extras for Ch. 9 (Under the title of Biological Thrift).

Table 1 Footnote references:

Small molecules have vital roles in a wide variety of biological processes……’      See the above supplementary downloadable material (Searching for Molecular Solutions –Chapter 9).

‘…..The triosephosphate isomerase (TIM) (βα)8-barrel fold is known as the structural core of >170 encoded proteins….’      See Ochoa-Leyva et al. 2013. Additional folds accommodating diverse functions are noted in Osadchy & Kolodny 2011.

‘A classic example of this [alternate promoter] configuration is found with the microphthalmia gene (MITF)….’      See SMS-Extras (as noted above; Ch.9); also Shibahara et al. 2001.

‘The modularity of such genetic sequences in these circumstances is obvious, and is a key feature of the generation of diversity by the vertebrate adaptive immune system.’      For a general and search-accessible overview of immune systems, see the text Immunobiology 5th Edition. For an interesting recent hypothesis on the origin of vertebrate adaptive immunity, see Muraille 2014.

‘…..a single stimulus can signal very different results in different cellular backgrounds….’   /   ‘ Enzymatic complexes, such as those involved in DNA repair, can also show subunit-based modularity.’      To be continued and expanded in a subsequent post with respect to parsimony involving proteins and their functions.

‘…..the evolutionary development of modular TIM barrels may have been enhanced by alternate splicing mechanisms.’      See Ochoa-Leyva et al. 2013.

‘….the latter process [alternate splicing] may be of general evolutionary importance, particularly in the context of gene duplications…..’      See Lambert et al. 2015.

‘If evolution is truly universal (as postulated within many definitions of life) …..’      See Cleland & Chyba 2002.

‘……many folds (>60%) are restricted to one or two functions.’     See Dellus-Gur et al. 2013; Tóth-Petróczy & Tawfik 2014.

‘…..some natural enzymatic activities are associated with single types of folds…’      An example is dihydrofolate reductase (cited also in Tóth-Petróczy & Tawfik 2014), the enzymatic activity of which is mediated by a fold not used by any other known biological catalysts.

‘…..a property which has been termed ‘polarity’ ….’      These concepts have been promoted by Dan Tawfik’s group. See Dellus-Gur et al. 2013; Tóth-Petróczy & Tawfik 2014.

‘….in other cases quite distinct protein folds can mediate the same catalytic processes. (Instances of the latter are known as analogous enzymes).’      See Omelchenko et al. 2010.

‘…..an astronomical number of possible sequences can be obtained with a linear string of amino acids corresponding to even a small protein.‘     See an earlier post for more detail on this.

‘…..molecular alphabets appear to be essential for biology in general…..’ See also Dunn 2013.

Next Post: August.

Evolutionary Constraints, Natural Bioengineering, and ‘Irreducibility’

January 25, 2015

Many prior biopolyverse posts have concerned evolutionary themes, either directly or indirectly. In the present offering, we consider in more detail factors which may limit what evolutionary processes can ‘deliver’. More to the point, are there biological structures which we can conceive, but which could never be produced through evolution, even in principle?

It has been alleged by proponents of so-called ‘Intelligent Design’ (ID) that some features of observable biology are so complex that no intermediate precursor forms can be envisaged in a feasible evolutionary pathway. Of course, the hidden (or not so hidden) agenda with such people is the premise that if natural biological configurations of sufficient complexity exist such that they are truly ‘irreducible’, then one must look to some form of divine intervention to kick things along. In fact, all such ‘irreducibly complex’ examples proffered by such parties have been convincingly demolished by numerous workers with more than a passing familiarity with the mechanism of evolution.

These robust refutations in themselves cannot prove that there is no such thing as a truly evolutionarily irreducible structure in principle. What is needed, then, is not to attempt to find illusory non-evolvable biological examples in the observable biosphere, but to identify holes in the existing functional and structural repertoire as manifested by all living organisms collectively. Biological ‘absences’ could result from two broad possible scenarios: features which are possible, but not present simply due to the contingent nature of evolutionary pathways, and features which have not appeared because there is no feasible route by which they could arise. (Perhaps a third possibility would exist for ID enthusiasts, whereby God had inscrutably chosen not to create any truly irreducible biological prodigies). Of course, deciding between the ‘absent but possible’ and ‘absent and never feasible’ alternatives is not always going to be simple, if indeed it ever is.

The Greatest Show On Any Planet

Richard Dawkins has called it the Greatest Show on Earth. Sean Carroll used words of Darwin himself, “endless forms most beautiful”. These and many other authors have been struck by the incredibly diverse array of living creatures found in a huge variety of terrestrial environments. With the great insights triggered by the labors of Darwin and Wallace, all of this biological wonder can be seen as having been shaped and molded by the blind and cumulative hand of natural selection. And once understood, selective processes can be seen to operate in a universal sense, from single molecules to the most complex arrangements of matter, as long as each entity possesses the means for its own replication. It is for this reason that Darwinian evolution has been proposed as a universal hallmark of life anywhere, whatever form its replicative essence may take. While there may be few things which are truly universal in a biological sense (see a previous post for the view that molecular alphabets are one such case in point), it is hard to escape the conclusion that change through evolution and life go hand-in-hand, no matter what form such life may take.

So do the ‘endless’ outpourings of biological design innovation ever reach some kind of end-point? There is a classic example that can be considered at this point.

Unmakeable?

It has often been claimed that a truly human invention unrepresented in nature is the wheel, and this absence has been proposed as a possible true case of ‘irreducible complexity’. At the molecular level, however, wheel-like structures have been documented. Three such cases are known, all rotary molecular motors: the bacterial flagellum, and the two component molecular motors of ATP synthase. Remarkable as the latter structures are, it is of course the macroscopic level that people have had in mind when contemplating the apparently wheel-less natural world.

It will be instructive to make a brief diversion to consider what constraints might operate for a biological wheel design on a macroscale, and their general implications for the selection of complex systems. We can refer to a hypothetical macroscopic wheel-organ in a biological organism as a ‘macrobiowheel’, to distinguish it from true molecular-level rotary wheel-like systems. Although beyond the molecular scale, such an organ need not be large, and could in principle be associated with any multicellular animal. Such a postulated biological wheel structure could be used for locomotion in either terrestrial or aquatic environments, using rolling or propeller motion, respectively.

First there is a pseudo-example which should be noted. The animal phylum Rotifera encompasses the set of multicellular though microscopic ‘wheel animalcules’, rotifers, which superficially are characterized by a wheel-like locomotory organ in their aquatic environments. In fact, these ‘wheels’ are an illusory effect created by the sweeping motion of rings of cilia, and thus need not be considered further for the present purposes. Wheels of biological origin that can be unambiguously confirmed with the naked eye (or even a simple microscope) are thus conspicuous by their absence. Is this mere contingency, or strict necessity?

 Re-inventing the Wheel

Let’s consider what would be required to construct a macrobiowheel. Firstly, one would have to define what physical features are required – is the wheel structure analogous to bone or other biological organs composed of hard inorganic materials? The problem of how blood vessels and nerves could cross the gap between an axle and a wheel hub has been raised as a seemingly insurmountable constraint – but with some imagination potential solutions could be conceived. For example, the axle and bearings could be bathed in a very narrow fluid-filled gap, where vessels on the other side of the gap take up nutrients and transport them to the rest of the living wheel structure (a heart-like pump within the wheel might be required to ensure the efficiency of this, depending on the size of the hypothetical animal). Transmission of nerve signals might be more problematic; perhaps the macrobiowheel could be insensate, although this would presumably be a disadvantage. Conceivably, the same fluid-filled gap could also act as a ‘giant synapse’ for nerve transmission, such that perception of the state of the wheel structure is received as a continuous whole, without discrimination as to specific local wheel regions. (This would thus alert an organism to a problem with its macrobiowheel organ without specifying which particular part is involved; a better arrangement than no information at all). Another possibility is the use of perturbations in local electric fields as a ‘remote’ sensing device, as used by a variety of animals, including the Australian platypus. The rotational motion for the ‘drive axle’ might be obtained from successive linear muscle-powered movements of structures coupled to the axle by gear-like projections.

No doubt much more could be said on this particular theme, but that will be unnecessary. The issue here is not to indulge in wild speculation, but to make the point that it is uncertain whether a biowheel of any scale at the macro-level is an impossibility purely from a biological systems viewpoint alone. So perhaps we could be so bold as to claim that with sufficient ingenuity of design, a true macrobiowheel could be assembled in a functional manner. But having acknowledged this, the formal possibility that a macrobiowheel could exist is not at all the same thing as the question of whether a feasible pathway could be envisaged for such a structure to emerge in terrestrial animals by natural selection. The potential problems to be addressed are (1) too large a jump in evolutionary ‘design space’ (across a fitness landscape) is required; (2) [along with (1)] no selective advantage of intermediate forms is apparent; (3) [along with (1) and (2)] the energy requirements for the system may be unfavorable compared with alternate designs such as conventional vertebrate limbs (consider the problem as noted above of the non-linkage of the macrobiowheel circulatory system from the rest of the organism).

The first problem, the ‘design space jump’ conundrum, implicitly states that a macromutation producing a functional macrobiowheel would be a practical impossibility. In the brief speculation as to how such a biological wheel might be constructed, it is quite clear that multiple novel processes would be required; the macrobiowheel would need to be supported by multiple novel subsystems. Where a macromutation producing any one such subsystem is exceedingly improbable, the chances of the entire package emerging at once are effectively zero. So it is one thing to design a complete and optimized macrobiowheel; to propose a pathway for evolutionary acquisition of this exotic feature we must also rationalize ‘intermediate’ structures with positive fitness attributes for the organism. Thus even if one of the postulated macromutations should amazingly appear, it would be useless for an evolutionary pathway leading to macrobiowheels unless a fitness advantage is conferred. (As always, natural selection cannot anticipate any potential advantage down the line, but adaptations selected for one function may be co-opted for other functions later in evolutionary time). A depiction of the constraints on evolution of macrobiowheels is presented in Fig. 1 below.
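The force of ‘effectively zero’ can be made concrete with a back-of-envelope calculation, using assumed round figures rather than measured mutation rates. If each of k required subsystems demands its own specific macromutation, arising independently with probability p per genome per generation, then:

```latex
P(\text{all } k \text{ subsystems at once}) \;=\; p^{k};
\qquad p = 10^{-9},\; k = 5 \;\Rightarrow\; P = 10^{-45}
```

Even summed over planetary-scale populations and geological timescales, the expected number of such composite events remains vanishingly small, which is precisely the ‘design space jump’ objection.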

 

[Fig. 1 image: fitness landscape representations]

 

Fig. 1. Representations of fitness landscapes for evolution of a locomotion system for a multicellular organism. Here the vertical axes denote relative fitness of expressed phenotypes; different peaks represent distinct genotypes. In all cases, dotted lines indicate ‘moves’ to novel genotypes that are highly improbable (gray) or proscribed through transition to a state of reduced fitness (red). A. In this landscape it is assumed that a macrobiowheel is inherently biologically possible. In other words, for present purposes it is taken that there exists a genotype from which a macrobiowheel can be expressed as a functional phenotype. Yet such a genotype may not be accessible through evolutionary processes. The conclusion of A is that even though a biological construct corresponding to a macrobiowheel is possible, it cannot feasibly arise naturally, since it is effectively impossible to cross an intervening fitness ‘valley’ in a single jump (A to X; gray dotted line), and transition to intermediate forms cannot occur through their lowered fitness relative to any feasible starting point (A to B or C (gray-shaded); red dotted line). In turn, transitions from B or C to peak X (purple dotted lines) cannot take place. It is also implicit in this schema that no other feasible pathway to configurations B or C exists. Thus, configuration (genotype) X is a true case of unattainable or ‘irreducible’ complexity. B. Depiction of a conventional evolutionary pathway whereby the same starting point as in (A) transitions to an improved locomotory arrangement through intermediate forms with fitness benefits.

_____________________________________________________________________

So, ‘true irreducibility’ can result in principle from an inability to create intermediate steps, from universal pre-commitment to an alternative design, or from an absolute incapacity to biologically support the proposed function. Also, the likelihood of a biological innovation acting as a fitness advantage is fundamentally dependent on the nature of the environment. Thus, with respect to our macrobiowheel musings, it has been pointed out that an absence of roads might counter any tendency for wheel-based locomotion to arise. It is not clear, though, whether an organism dwelling in an environment characterized by flat plains might benefit from wheel mobility, and in any case this issue is not relevant to macroscopic aquatic organisms and hypothetical wheel-like ‘biopropellers’ driven by rotary motion (as opposed to micro-scale real rotary bacterial flagella).

A Very Indirect Biological Route to Crossing Fitness Valleys

In a previous post concerning synthetic biology, it has already been noted that human ambitions for tinkering with biological molecules need not suffer from the same types of limitations which circumscribe the natural world. ….. So if a macrobiowheel is compatible with biological systems at all, humans with advanced biotechnologies could then in principle design and construct such a system. Such circumstances are schematically depicted in Fig. 2.

 

 

[Fig. 2 image: fitness landscape with human-engineered constructs superimposed]

 

Fig. 2. Potential role of human intervention in the generation of ‘unevolvable’ biological systems, as exemplified here with macrobiowheels. Here the natural fitness landscape of Fig. 1 (orange trace) has superimposed upon it peaks corresponding to biological constructs of human origin. Since the human synthetic biological approach circumvents loss of low-fitness forms through reproductive competition*, ‘intermediate’ forms are all depicted here as having equal fitness. Thus, by human agency, intermediate forms B and C can be used as synthetic stepping stones towards the final (macrobiowheel) product, despite their non-viability under natural conditions (Fig. 1). Alternatively, should it be feasible at both the design and synthetic levels, ‘direct’ assembly of a genome expressing the macrobiowheel structure might be attainable (direct arrow to the ‘X’ peak).

*Note that this presupposes that completely rational design could be instituted, although in reality artificial evolutionary processes might be used to achieve the desired results. But certainly no third-party competitors would be involved here.

_____________________________________________________________________

Construction of a macrobiowheel would serve to validate the hypothesis that such an entity is biologically possible. Also, demonstration of a final functional wheel-organ would greatly facilitate analysis of what pathways would have to be followed if an equivalent structure were to evolve naturally. This would then consolidate the viewpoint that a true macrobiowheel is indeed biologically irreducibly complex. But since other structures and pathways might still exist, it would not serve as formal proof of the irreducibility stance in this case.

The ‘human agency’ inset of Fig. 2 has itself evolved from biological origins, just as for any other selectable attribute. Therefore, from a broad viewpoint, a biological development (human intelligence) can in itself afford an unprecedented pathway for the crossing of fitness valleys which otherwise would be naturally insurmountable. So whether we are speaking of exotica such as macrobiowheels or any other biological structures with truly ‘irreducible complexity’, their existence could in principle be realized at some future time through the agency of advanced human synthetic biology. And given the current pace of scientific change, such times may arrive much sooner than many might believe.

 

Finally, we leave this theme with a relevant biopoly(verse) offering:

 

Biological paths may reveal

What evolution can thus make real

Yet beyond such constraints

And purist complaints

Could we make a true bio-based wheel?

 

 

References & Details

(In order of citation, giving some key references where appropriate, but not an exhaustive coverage of the literature).

‘….proponents of so-called ‘Intelligent Design….’     The ‘poster boy’ of ID is quite probably Michael Behe, of Lehigh University and the Discovery Institute. He is the author of Darwin’s Black Box – The Biochemical Challenge to Evolution (Free Press, 1996), and more recently The Edge of Evolution – The Search for the Limits of Darwinism (Simon & Schuster 2008).

‘…..all such ‘irreducibly complex’ examples proffered by such parties have been convincingly demolished…’     See Zuckerkandl 2006; also a National Academy of Sciences publication by a group of eminent biologists.

‘……a third possibility would exist for ID enthusiasts…..’     A personal perspective: A religious fundamentalist once asked me why there are no three-legged animals; he seemed to somehow think that their absence was evidence against evolution. Of course, the shoe is definitely on the other foot in this respect. If God created low-fitness animal forms that prevailed (among which tripedal animals would likely be included), or fabulous creatures without any conceivable evolutionary precursors, then that in itself would be counted as ID evidence.

‘ Richard Dawkins has called it the Greatest Show on Earth.’    This refers to his book, The Greatest Show on Earth: The Evidence for Evolution. Free Press (2010).

‘Sean Carroll used words of Darwin himself, “endless forms most beautiful”.’     The renowned developmental biologist Sean Carroll published a popular book entitled Endless Forms Most Beautiful – The New Science of Evo Devo, which gives a wonderful overview of the field of evolutionary development, or how the development of multicellular organisms from single cells to adult forms has been shaped by evolution. Darwin referred to “endless forms most beautiful” in the final section of The Origin of Species.

‘….the blind and cumulative hand of natural selection.’      This is not to say that the complete structure of biological entities, from genome to adult phenotype, is entirely a product of classical natural selection, but the latter process is of prime significance. For a very informative discussion of some of these issues, and the influence of non-adaptive factors in evolution, see Lynch 2007.

‘……Darwinian evolution has been proposed as a universal hallmark of life anywhere….’      For a cogent discussion of the NASA ‘evolutionary’ definition and related issues, see Benner 2010.

‘……the wheel, and this absence has been proposed as a possible true case of ‘irreducible complexity’  ‘     See Richard Dawkins’ The God Delusion, Bantam Press (2006).

‘…….the bacterial flagellum……’     For a description of the rotary flagellar motor, see Sowa et al. 2005; Sowa & Berry 2008.

‘……two component molecular motors of ATP synthase…..’      See Capaldi & Aggeler 2002; Oster & Wang 2003.

‘….animal phylum Rotifera……..’      See Baqui et al. (2000) for their rotifer site, which provides much general information and further references.

‘…….how blood vessels and nerves could cross the gap between an axle and a wheel hub has been raised as a seemingly insurmountable constraint……’ | ‘….an absence of roads might counter any tendency for wheel-based locomotion to arise…..’      See again Dawkins’ The God Delusion, Bantam Press (2006).

‘……..the use of perturbations in local electric fields as a ‘remote’ sensing device, as used by a variety of animals, including the Australian platypus.’     For more background on electroreception, especially in the platypus, see Pettigrew 1999, and Pedraja et al. 2014.

‘……could exist is not at all the same thing as the question of whether a feasible pathway could be envisaged for such a structure to emerge ……. by natural selection.’ For an extension of this theme at the functional RNA level, see Dunn 2011.

Fig. 1. Representations of fitness landscapes…..’ Further discussion of evolutionary problems in surmounting fitness valleys can be found in Dunn 2009. The title of Dawkins’ book Climbing Mount Improbable (1997; W. W. Norton & Co) is in itself a fine metaphor for how cumulative selectable change can result in exquisite evolutionary ‘designs’, which of course is the major theme of the book.

‘……advanced human synthetic biology….’ The ongoing role of synthetic biology in testing a variety of possible biological scenarios was also discussed in a previous post under the umbrella term of ‘Kon-Tiki’ experiments.

Next Post: April.