Three is the magic number
We all know about the two strands of the double helix, and that there are four letters in DNA. But as far as I’m concerned, when it comes to biology, three is the magic number. That’s because the code that translates the DNA recipes encoded in genes into the thousands of different proteins that build every living thing on earth is based on groups of three letters, no more, no less.
So how does it work? And who figured it out in the first place?
The story of the discovery of the triplet code doesn’t start with biology at all, but with a crucial mathematical insight by physicist George Gamow in the mid-1950s. Around this time it had become clear that DNA was made up of four different chemical ‘letters’, or bases – adenine, cytosine, thymine and guanine, or A, C, T and G. And scientists also knew that there were 20 different amino acids that acted as the molecular building blocks of all known proteins.
So if a gene was a kind of DNA code that told cells which amino acids to assemble, and in which order, to make a particular protein, then there had to be some relationship between the number of DNA letters used to encode each amino acid and the number of different amino acids that needed encoding.
Gamow did the maths. If each amino acid were encoded by two letters, there could be only 4 × 4 = 16 possible pairs of A, C, T and G (AA, AC, AT, AG, CC, CA, CT, CG and so on), falling short of the minimum of 20 needed to encode all the different amino acid building blocks. Four-letter words, on the other hand, would give far too many possible combinations, 4 × 4 × 4 × 4 = 256 in total, which seemed excessive. But there are 4 × 4 × 4 = 64 different three-letter combinations: not too few, but not too many. So three had to be the magic number, at least in theory.
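Gamow’s counting argument is easy to check for yourself. This minimal Python sketch simply lists every possible DNA ‘word’ of a given length. It assumes, as the argument above does, that letters can repeat within a word and that order matters, so AC and CA count as different words. The function names are illustrative choices, not anything taken from Gamow’s work.

```python
from itertools import product

DNA_LETTERS = "ACTG"      # adenine, cytosine, thymine, guanine
AMINO_ACIDS_NEEDED = 20   # number of standard amino acids to be encoded


def word_count(word_length: int) -> int:
    """Count the distinct DNA 'words' of a given length (letters may repeat)."""
    # Enumerate every word explicitly (AA, AC, AT, AG, ...) rather than just
    # computing 4 ** word_length, to mirror the listing in the text.
    return sum(1 for _ in product(DNA_LETTERS, repeat=word_length))


def shortest_sufficient_length(needed: int) -> int:
    """Return the smallest word length that gives at least `needed` distinct words."""
    length = 1
    while word_count(length) < needed:
        length += 1
    return length


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        verdict = "enough" if word_count(n) >= AMINO_ACIDS_NEEDED else "too few"
        print(f"{n}-letter words: {word_count(n)} ({verdict})")
    print("Shortest workable word length:", shortest_sufficient_length(AMINO_ACIDS_NEEDED))
```

Running it reports 4, 16, 64 and 256 possible words for lengths one to four. It picks three as the shortest length that can cover all 20 amino acids.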
In 1961, Francis Crick (of Watson and Crick fame), Leslie Barnett, Sydney Brenner, and Richard Watts-Tobin showed that three was indeed the rule. In an elegant experiment, they added or removed single letters of DNA in a gene belonging to a virus that infects bacteria. Adding or taking away just one letter completely messed up the gene, as did adding or removing two. But adding or removing three left the gene functional.
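Why should three changes cancel out when one or two don’t? A toy model makes the logic visible. This Python sketch reads a made-up 27-letter ‘gene’ in triplets from a fixed starting point. The gene is hypothetical and is not the real viral sequence. The sketch adds or removes single letters at three nearby, equally hypothetical sites, then checks whether the codons after the last change are back in their original frame. It only tracks the reading frame, not whether the resulting protein would actually work, and all names are illustrative.

```python
# Hypothetical 27-letter "gene" (nine codons), made up purely for illustration.
ORIGINAL = "ATGACCGTTAGCGATCTGAAGTTCCGA"


def codons(seq: str) -> list[str]:
    """Read seq in non-overlapping triplets from the first letter, dropping leftovers."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def insert_letters(seq: str, positions: list[int], letter: str = "C") -> str:
    """Insert one letter at each position (positions refer to the unmutated seq)."""
    # Work right to left so that earlier positions are not shifted by later insertions.
    for pos in sorted(positions, reverse=True):
        seq = seq[:pos] + letter + seq[pos:]
    return seq


def delete_letters(seq: str, positions: list[int]) -> str:
    """Delete the letter at each position (positions refer to the unmutated seq)."""
    for pos in sorted(positions, reverse=True):
        seq = seq[:pos] + seq[pos + 1:]
    return seq


def downstream_intact(original: str, mutant: str, n: int = 3) -> bool:
    """True if the mutant's last n codons match the original's last n codons."""
    return codons(mutant)[-n:] == codons(original)[-n:]


if __name__ == "__main__":
    sites = [6, 9, 12]  # three nearby, hypothetical mutation sites
    for k in (1, 2, 3):
        added = insert_letters(ORIGINAL, sites[:k])
        removed = delete_letters(ORIGINAL, sites[:k])
        print(f"+{k}: {downstream_intact(ORIGINAL, added)}  "
              f"-{k}: {downstream_intact(ORIGINAL, removed)}")
```

In this model only the net number of letters added or removed matters. Whenever that number is a multiple of three, the codons after the last change snap back into their original frame. The short stretch between the first and last change stays garbled.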
The next challenge was to figure out which three-letter DNA word encoded which amino acid. This was no simple task, given that molecular biology tools and techniques were still in their infancy. The first crucial step came from Spanish-American biochemist Severo Ochoa, who in 1955 discovered polynucleotide phosphorylase, an enzyme that can string RNA letters together into long chains without needing a DNA template. That made it possible to build artificial versions of messenger RNA, the molecular copy of a gene that the cell uses as its instructions for making proteins. (In living cells, messenger RNA is copied from DNA by a different enzyme, RNA polymerase.)
This meant that researcher Marshall Nirenberg could make artificial strings of RNA ‘words’ in the lab. To keep things simple, he started with just one repeated letter: uracil, or U, which is the RNA equivalent of the DNA letter T. He added long strings spelling nothing but U U U U U U U to test tubes containing mashed-up bacterial cell extracts to see what protein chains they made.
The result was a chain made entirely of the amino acid phenylalanine, so UUU had to encode phenylalanine. One down, 19 to go. Other researchers copied Nirenberg’s idea, creating more complex artificial messenger RNAs and seeing what they made. At the University of Wisconsin, Indian-born biochemist Har Gobind Khorana created a string reading U C U C U C… and popped it into the bacterial system.
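The result was an alternating string of serine and leucine. Read in threes, the RNA spells UCU CUC UCU CUC, and it turned out that UCU encodes serine and CUC encodes leucine. The experiments came thick and fast. Eventually the three-letter words, or codons, spelling out all 20 amino acids were figured out. Researchers also identified three words that mean ‘stop reading here’ (UAG, UAA and UGA) and one main ‘start’ word, AUG. AUG also encodes the amino acid methionine, which is why newly made proteins almost always begin with methionine, although it is often trimmed off afterwards.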
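To see how these rules fit together, here is a small Python sketch that reads RNA in non-overlapping triplets. It borrows the modern standard codon table, which was exactly what was still being worked out at the time. The table is written in the usual compact form, where codons run in U, C, A, G order and each amino acid is shown by its one-letter code (F = phenylalanine, S = serine, L = leucine, M = methionine; `*` = stop). The start rule is simplified: it just looks for the first AUG and ignores the other signals cells use to choose a start. The mini mRNA and all function names are hypothetical.

```python
from itertools import product

# Standard genetic code in compact form: first letter varies slowest, third fastest,
# each in U, C, A, G order. One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}
START_CODON = "AUG"


def read_triplets(rna: str, start: int = 0) -> list[str]:
    """Split RNA into consecutive, non-overlapping codons from `start`, ignoring leftovers."""
    return [rna[i:i + 3] for i in range(start, len(rna) - 2, 3)]


def translate_all(rna: str) -> str:
    """Translate every codon from the first letter, as with the synthetic repeats."""
    return "".join(CODON_TABLE[codon] for codon in read_triplets(rna))


def translate_gene(mrna: str) -> str:
    """Start at the first AUG and stop at the first stop codon (a simplification)."""
    start = mrna.find(START_CODON)
    if start == -1:
        return ""
    protein = []
    for codon in read_triplets(mrna, start):
        if codon in STOP_CODONS:
            break
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


if __name__ == "__main__":
    print(translate_all("U" * 12))               # Nirenberg-style poly-U: prints FFFF
    print(translate_all("UC" * 6))               # Khorana-style UC repeat: prints SLSL
    print(translate_gene("GGAUGUUUUCUCUCUAAGG"))  # hypothetical mini mRNA: prints MFSL
```

The UC repeat gives alternating serine and leucine whichever of its first two letters reading begins at (starting one letter later gives LSLS). So that result doesn’t depend on exactly where reading starts.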
Researchers also confirmed that the triplets don’t overlap, so each RNA or DNA letter belongs to just one codon, and that several different triplets can encode the same amino acid. For example, there are six different codons for leucine, and two for phenylalanine.
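This redundancy is easy to tally. The Python sketch below uses the modern standard codon table in its usual compact form, with codons in U, C, A, G order and one-letter amino acid codes (L = leucine, F = phenylalanine; `*` = stop). It counts how many codons map to each amino acid. The names `CODONS_PER_AMINO_ACID` and `synonyms` are illustrative.

```python
from collections import Counter
from itertools import product

# Standard genetic code in compact form: first letter varies slowest, third fastest,
# each in U, C, A, G order. One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Tally how many codons map to each amino acid, leaving out the stop codons.
CODONS_PER_AMINO_ACID = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def synonyms(amino_acid: str) -> list[str]:
    """Return all codons encoding the given one-letter amino acid, sorted alphabetically."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    print("Leucine (L):", CODONS_PER_AMINO_ACID["L"], synonyms("L"))
    print("Phenylalanine (F):", CODONS_PER_AMINO_ACID["F"], synonyms("F"))
    print("Amino acids covered:", len(CODONS_PER_AMINO_ACID))
    print("Codons used for amino acids:", sum(CODONS_PER_AMINO_ACID.values()))
```

It reports six codons for leucine and two for phenylalanine. It also shows that 61 of the 64 codons are shared among the 20 amino acids, with the remaining three meaning ‘stop’.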
Khorana and Nirenberg won the 1968 Nobel Prize in Physiology or Medicine "for their interpretation of the genetic code and its function in protein synthesis," together with Robert Holley, who determined the complete sequence of letters in a transfer RNA. Transfer RNAs are the adaptor molecules that carry amino acids into place when the recipes encoded in RNA are translated into proteins.
The award was particularly wonderful news for Khorana, who grew up in a small village in Punjab, in what is now Pakistan. His was virtually the only literate family in the village; his dad taught him to read and write, and he went to school under a tree.
The full dramatic story of how the genetic code was cracked is laid out in glorious detail in Matthew Cobb’s book, Life’s Greatest Secret, which is well worth a read if you’re curious to find out more.
For me, the most incredible thing about the triplet code is its ubiquity. The same system of three-letter DNA words spells out the amino acids in your body, in bacteria living in a deep-sea vent, in a glorious bird of paradise and in a slimy slug. It’s used by virtually every living thing on earth. The few exceptions are minor tweaks, such as a single triplet taking on a different meaning. In our own mitochondria, for example, UGA encodes the amino acid tryptophan rather than meaning ‘stop’. This near-universality highlights the shared evolutionary origin of all life.
But the genetic code is now taking a further evolutionary leap, thanks to human ingenuity. Researchers are extending the code by developing synthetic DNA bases, such as X and Y. These can form new codons for amino acids that aren’t normally found in nature, which can be built into proteins with enough genetic trickery. And in 2017, researchers in South Korea even engineered a living mouse with a modified genetic code. No longer an enigma, the code of life has been well and truly cracked.
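The same counting that pointed Gamow towards triplets also shows why extra letters open up room in the code. This quick Python check assumes a six-letter alphabet (the natural A, C, G and T plus the synthetic X and Y) and counts the possible three-letter words. It is only the arithmetic upper bound. It says nothing about how many of the new codons can actually be put to work in a living cell.

```python
from itertools import product

NATURAL_BASES = "ACGT"
EXPANDED_BASES = NATURAL_BASES + "XY"  # X and Y stand in for the synthetic bases


def triplets(alphabet: str) -> set[str]:
    """Return every possible three-letter word over the given alphabet."""
    return {"".join(letters) for letters in product(alphabet, repeat=3)}


natural = triplets(NATURAL_BASES)
expanded = triplets(EXPANDED_BASES)
new_codons = expanded - natural  # words containing at least one X or Y

if __name__ == "__main__":
    print(len(natural), len(expanded), len(new_codons))  # prints 64 216 152
```

Going from four letters to six takes the number of possible triplets from 4 × 4 × 4 = 64 to 6 × 6 × 6 = 216.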
References and further reading:
Volker A. Erdmann and Jan Barciszewski. 2011: 50th Anniversary of the Discovery of the Genetic Code. Angewandte Chemie, 2011. https://doi.org/10.1002/anie.201103895
Eugene V. Koonin and Artem S. Novozhilov. Origin and evolution of the genetic code: the universal enigma. IUBMB Life, February 2009; 61(2): 99–111.
Matthew Cobb. Life’s Greatest Secret: The Race to Crack the Genetic Code.