Genes or junk?
Once the data from the human genome began to pour in, the fact that humans have fewer than 25,000 protein-coding genes was only one of the scientific surprises. If researchers had hoped that the human genome would be like Netflix, packed with a fascinating mix of unique programmes, they found it was more like the worst cable channel ever, crammed with endless repeats of long-cancelled shows and boring ads, with only an occasional original episode to lighten the tedium.
In fact, when they looked closely, they realised that actual genes make up less than 2% of all the DNA in the whole human genome. So what’s all the rest? Is it just junk?
We can thank Japanese-American geneticist Susumu Ohno for the term ‘junk DNA’. Back in 1972 he published a paper entitled “So much ‘junk’ DNA in our genome”, musing on an interesting mathematical problem.
At the time, scientists already knew that a single human cell contained at least 750 times as much DNA as a bacterium. And they also knew that bacteria had a few thousand genes. A quick calculation suggested that if the number of genes in any genome was directly proportional to the amount of DNA, then humans should have… three million genes, more or less.
But – hold up a minute. Ohno also noted that lungfish and salamanders can have 36 times more DNA in their cells than is present in ours, suggesting that they should have… a hundred million genes. None of this really made sense. Human cells don’t make 3 million proteins. And what would a lowly lungfish need with a hundred million genes, anyway? The most sensible conclusion was that the vast majority of the human genome was ‘junk’ (now more correctly known as non-coding DNA) – and, by implication, so were the genomes of many other species too.
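For anyone who wants to check the sums, here is a minimal sketch of that back-of-the-envelope calculation. It assumes, as the argument does, that gene number scales in direct proportion to the amount of DNA. The figure of 4,000 bacterial genes is our own stand-in for "a few thousand", and the function and variable names are purely illustrative.

```python
# Back-of-the-envelope estimate: gene count assumed proportional to DNA content.
# ASSUMPTION: 4,000 is an illustrative stand-in for "a few thousand" bacterial genes.
BACTERIAL_GENES = 4_000
HUMAN_TO_BACTERIUM_DNA = 750   # human cell vs bacterium, from the text
LUNGFISH_TO_HUMAN_DNA = 36     # lungfish/salamander vs human, from the text


def naive_gene_estimate(reference_genes, dna_ratio):
    """Scale a reference gene count by a ratio of DNA content."""
    return reference_genes * dna_ratio


human_estimate = naive_gene_estimate(BACTERIAL_GENES, HUMAN_TO_BACTERIUM_DNA)
lungfish_estimate = naive_gene_estimate(human_estimate, LUNGFISH_TO_HUMAN_DNA)

print(f"Naive human estimate:    {human_estimate:,} genes")
print(f"Naive lungfish estimate: {lungfish_estimate:,} genes")
```

With those inputs the naive estimates come out at 3 million genes for humans and roughly a hundred million for lungfish, which is exactly the absurdity the proportionality assumption leads to.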
The exact quantity and function of all this non-coding DNA is still a hot topic in the world of genetics, and – like the exact number of genes – seems to depend on how you measure it and who you ask. But the one thing that most people agree on is that size isn’t everything, at least when it comes to genes and genomes.
The human genome consists of around 3 billion basepairs, or ‘letters’. To be strictly accurate, that’s a haploid genome, one half of the full diploid set you get from mum and dad when egg meets sperm at the moment of fertilisation. So how does this stack up compared with other organisms?
Mexican axolotls have more than 10 times as much DNA in their genome as we do, around 32 billion basepairs. They previously held the title of the animal species with the biggest fully sequenced genome, only to be usurped in 2021 by the Australian lungfish, whose genome clocks in at an impressive 43 billion basepairs - more than 14 times the size of our own. Both of these are dwarfed by the African lungfish, Protopterus aethiopicus, whose 130-billion-basepair genome is, perhaps understandably, yet to be fully sequenced.
Yet again, plants put in an impressive showing here, with all these slimy suckers dwarfed by the Japanese canopy plant, Paris japonica, which went into the record books in 2010 with a genome calculated at around 149 billion basepairs. Yet the prize for the biggest genome on record – so far – goes to Polychaos dubium, a single-celled amoeba, whose genome is supposedly an incredible 670 billion basepairs. However, as its name might suggest, some researchers cast doubt on the figure, as it was calculated before the development of modern genomic analysis techniques.
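To put those numbers side by side, here is a small sketch that expresses each genome as a multiple of our own. The sizes are the rounded figures quoted above, in billions of basepairs, and the Polychaos dubium value is the disputed one. The dictionary and function names are just illustrative.

```python
# Genome sizes in billions of basepairs (Gbp), using the rounded figures quoted in the text.
GENOME_SIZES_GBP = {
    "Human": 3,
    "Mexican axolotl": 32,
    "Australian lungfish": 43,
    "African lungfish": 130,   # estimated, not yet fully sequenced
    "Paris japonica": 149,
    "Polychaos dubium": 670,   # disputed, pre-genomics estimate
}


def fold_vs_human(sizes):
    """Return each genome size as a multiple of the human genome."""
    human = sizes["Human"]
    return {name: size / human for name, size in sizes.items()}


folds = fold_vs_human(GENOME_SIZES_GBP)

# Print smallest to largest.
for name, fold in sorted(folds.items(), key=lambda item: item[1]):
    print(f"{name:<20} {GENOME_SIZES_GBP[name]:>4} Gbp  {fold:6.1f}x human")
```

The ratios agree with the text: the axolotl comes out at more than 10 times the human genome and the Australian lungfish at more than 14 times.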
Even the number of chromosomes in the human genome – 46, or 23 pairs – is nothing to write home about (and if you remember our recent reposted episode, Strands of Life, featuring the case of the missing chromosomes, you’ll know that number came down from the original figure of 48!). We don’t have a particularly large number for a mammal: cats have 38 chromosomes (that’s 19 pairs), but dogs have 78, or 39 pairs, while the South American rodent Tympanoctomys barrerae has an impressive 102 chromosomes, or 51 pairs. The simple single-celled pond-dweller Oxytricha trifallax beats this hands down, with more than 15,000 chromosomes, most of which contain just one single gene – the result of some incredibly complex genetic jiggery-pokery that happens as the organism organises its genes in order to read them. Now that takes some counting…
Speaking of other species and their genomes, as we discussed way back in our fifth ever episode, Vegetable Soup, published at the beginning of 2019, it’s hard to talk about genomes and junk DNA without bringing up the Onion Test, devised by geneticist T. Ryan Gregory and published in a paper written together with Alexander Palazzo in 2014.
Put simply, the Onion Test goes like this. The onion in your vegetable drawer has five times more DNA than humans. So if you’re a researcher who thinks that non-coding DNA has a particular function in the genome, can you explain why an onion needs about five times more of it than a human to do the same thing?
Unpeeling this idea a bit further, Gregory points out that some species of onion have around twice as much DNA as your regular onions, while others have less than half. Yet they’re pretty much the same and have the same number of genes, so why would they need double or half the amount of non-coding DNA?
Then there’s the poisonous Fugu pufferfish – often eaten (very carefully!) as a delicacy in Japan. They have remarkably compact genomes, roughly an eighth of the size of our own yet containing almost exactly the same repertoire of genes and very little junk. So how do Fugu get by with virtually no non-coding DNA, while humans - and many other species - have so much? Nobody really knows, but - as is so often the case - the answer is probably just “well, it evolved that way.”
Susumu Ohno suggested as much in his seminal paper on junk DNA, arguing that the genome that humans have ended up with at this current point in time is the result of evolutionary processes at work over aeons. As he put it in a beautiful quote, ‘The triumphs as well as failures of nature’s past experiments appear to be contained in our genome.’