Rewriting the genetic alphabet
The genetic alphabet as we know it consists of four letters - A, G, T and C, which are the abbreviations of four chemical bases, adenine, guanine, thymine and cytosine. A always pairs with T, through two connections known as hydrogen bonds, while G always pairs with C through three bonds. This four-letter code has long been considered fixed for all life on Earth. But in 1977 researchers in the Soviet Union identified a phage called S-2L, a virus that infects photosynthetic bacteria, whose DNA contained a chemically modified version of adenine. They called it 2-aminoadenine, or Z for short.
Z is very similar to A, but it has an extra chemical addition known as an amine group. This allows it to form a third hydrogen bond with its pairing partner, T, compared to the usual two between A and T. As a result, the Z-T pairing is more stable and more resistant to the enzymes that chop up viral DNA as part of the bacterial immune system. Intriguing as it was, the discovery was treated as a scientific curiosity and largely ignored in the decades that followed.
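To put rough numbers on that extra bond, here is a minimal Python sketch that tallies hydrogen bonds along a stretch of double-stranded DNA, using only the counts given above (two for A-T, three for G-C, three for Z-T). The function name and the `z_substituted` flag are made up for illustration, and the sketch assumes every A on both strands is swapped for Z. Counting bonds is only a crude stand-in for real stability, which depends on much more than this.

```python
# Hydrogen bonds per base pair, taken from the text:
# A-T has two, G-C has three, and Z-T has three.
AT_BONDS = 2
GC_BONDS = 3
ZT_BONDS = 3


def count_hydrogen_bonds(strand: str, z_substituted: bool = False) -> int:
    """Count hydrogen bonds in a duplex, given one strand written in A/T/G/C.

    If z_substituted is True, every A-T pair is treated as a Z-T pair
    (assumption: A is fully replaced by Z on both strands).
    """
    at_bonds = ZT_BONDS if z_substituted else AT_BONDS
    total = 0
    for base in strand.upper():
        if base in ("A", "T"):
            total += at_bonds
        elif base in ("G", "C"):
            total += GC_BONDS
        else:
            raise ValueError(f"Unexpected base: {base!r}")
    return total


if __name__ == "__main__":
    example = "ATGCAT"  # arbitrary illustrative sequence
    print(count_hydrogen_bonds(example))
    print(count_hydrogen_bonds(example, z_substituted=True))
```

In this toy model, each A-T position gains exactly one bond when Z takes over, so the more A and T a sequence has, the bigger the difference.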
Then in the early 2000s, researchers at the University of Evry in France began looking more closely at the curious Z-based S-2L virus genome. They identified the genes for producing Z and began searching for other viruses that harboured the same genes, which would suggest that they make Z too. To their surprise, far from S-2L being a one-off, there were hundreds of similar phages that were also using Z instead of A.
Then in 2021, researchers managed to identify all the molecular machinery necessary for using Z to make DNA instead of A, including an enzyme that produces Z, and a polymerase enzyme that incorporates Z into DNA as it’s being replicated.
Scientists think that using Z could help viruses avoid bacterial defences that recognise and destroy foreign genetic code (we’re back to arms races again!). But the additional stability of the Z-T pairing could provide other advantages.
The appearance of Z in viruses shows the adaptability of the genetic code and dispels the myth that it is a ‘frozen’, ‘static’ or ‘completed’ code. In fact, there may be many more examples of Z-T pairing or other substitutions in nature that haven’t been discovered yet because standard genetic sequencing can’t identify them. I guess you don’t know what’s out there if you can’t see it… Who knows what other letters we could be adding to the genetic alphabet in future?
References:
DNA Has Four Bases. Some Viruses Swap in a Fifth, Quanta Magazine
A widespread pathway for substitution of adenine by diaminopurine in phage genomes, Science
Noncanonical DNA polymerization by aminoadenine-based siphoviruses, Science
A third purine biosynthetic pathway encoded by aminoadenine-based viral DNA genomes, Science