Genetics Unzipped is the podcast from the Genetics Society - one of the oldest learned societies dedicated to promoting research, training, teaching and public engagement in all areas of genetics. Find out more and apply to join at genetics.org.uk

Where have all the genes gone?

The year is 2001. After a decade of work and billions of dollars, the first draft of the human genome has been assembled – the result of a race between the publicly funded Human Genome Project and Craig Venter’s private company Celera. UK Prime Minister Tony Blair and US President Bill Clinton linked up by satellite to celebrate the achievement, with Clinton making the somewhat unscientific claim that “Today, we are learning the language in which God created life.” And lo, there was much hype.

But for all the claims that we had finally unlocked the secrets of human biology and were setting off into a new era of gene-driven medicine, there was one rather glaring issue with this genomic book of life: where were all the genes?

The year before the grand announcement of the draft human genome, two men were drinking in the bar at a scientific conference at Cold Spring Harbor in New York. One was Ewan Birney, from the European Bioinformatics Institute based in Cambridge, the other was Francis Collins, director of the US National Human Genome Research Institute, and both were up to their necks in the race to read the human genome. At the time, the finish line was still pretty far away, and nobody yet had a good idea about how many genes it would ultimately take to make a human.

Together, Birney and Collins hatched a plan to launch a sweepstake on the answer. It would cost a dollar to bet in 2000, five dollars in 2001 and twenty in 2002, as scientists got closer to the actual answer, with the winner announced in three years’ time at the same conference.

They roped in the conference director to act as bookie and started collecting bets from the research community. By the time of the 2003 meeting, some 460 people had put down hard cash for what had become known as the GeneSweep.

Most of the early guesses were quite large, ranging from the high tens of thousands to more than 150,000. As more data came in from the sequencing labs, the guesses came down, averaging out at around 60,000. In the end they were all too high.

By the time the 2003 meeting rolled around, the number of human genes in the database (as agreed in the original GeneSweep rules) was just 24,847. The $1200 jackpot went to Lee Rowen, a Seattle genome researcher who picked 25,947 – still an overshoot but only just. Rowen split her winnings with the only two other researchers who bet on fewer than 30,000 genes – Paul Dear from the UK Medical Research Council, who guessed 27,462, and Olivier Jaillon of Genoscope, the French national DNA sequencing centre, who plumped for 26,500.

Looking back from today’s vantage point, the fact that just three guesses were even remotely in the right ballpark seems staggering, but it highlights how little was known about the human genome at the time, and how fast our knowledge has accelerated. And, as you might expect, nearly twenty years after Rowen, Dear and Jaillon pocketed their winnings (and, I hope, spent at least some of them in the conference bar!), that official ‘correct’ figure of 24,847 genes is out of date.

Around the time of the first announcement of the draft human genome, David Bentley, then head of genetics at the Wellcome Trust’s Sanger Institute, where much of the human genome was sequenced, confidently announced, “We will have a best reference sequence by 2003, and will have identified all the genes by 2004. That will give us access to all the genetic information about ourselves – it’s unchanging, and unequivocal.”

Even now, the exact number of genes in the human genome is up for debate – a confusion that owes more than a little to the fact that nobody is quite sure what a gene actually is. We’ve moved far beyond the early days of molecular biology when a gene was a stretch of DNA that was copied into a specific type of messenger RNA and decoded to make a protein. In a world of non-coding RNAs, microRNAs, weird genome arrangements and more, the exact number of genes in the human genome largely depends on who’s counting and what they’re including.

Most researchers now put the number of human genes at somewhere between 20,000 and 25,000, with some studies even going as low as 19,000 or so. Whatever the precise number, it’s still a fraction of the 100,000-ish that was bandied around in the early days of the human genome. So why had those early guesses been so high?

Around the time of the GeneSweep, I was doing my PhD in developmental genetics at the Gurdon Institute in Cambridge. I distinctly remember learning that there were around a hundred thousand human genes. This made sense, given that the human body is made up of a repertoire of at least that many proteins – so, if one gene encodes one protein, then you’re going to need a hundred thousand genes.

I also suspect there was more than a little bit of species exceptionalism at play, following the completion of earlier projects sequencing the genomes of simpler organisms. For example, the genome of the nematode worm C. elegans clocks in at roughly 20,000 genes while the fruit fly Drosophila has 15,000. And of course, aren’t we humans at least, say, five times more complex and awesome than a tiny fly or a worm?

However, subsequent genome projects have shown that there is little connection between the number of genes in an organism’s genome and its size or cleverness. In fact, recent research suggests that species tend to lose genes as they evolve more complexity, so ‘less is more’ and ‘use it or lose it’ might be more apt maxims for genomes than ‘go big or go home’.

It turns out that humans are nothing special, at least in terms of the number of genes we have. Many organisms have far more genes than we do. Water fleas the size of a grain of rice have around 31,000 genes. Octopuses have around 33,000.

This fact always reminds me of one of the less well-argued articles about not eating animals that I have ever seen, which was published in the Guardian just after researchers announced they had sequenced the octopus genome. The first line of the piece declares, “They may be delicious and sure, there are lots of them, but next time you’re chomping down on your barbecued octopus, just remember they were the first intelligent beings on Earth and have more genes than you do.”

However, if the author applied this genetic standard to all her food, she would quickly find her plate looking rather bare. While it’s true that many animal species have as many genes as humans or more, and so would be off the menu, plants are particularly blessed in the gene department: grapes have around 30,000 genes, Golden Delicious apples 57,000, and wheat has nearly 100,000. Even the humble carrot has 26,000 genes. And although some fungi have under 10,000 genes, the largest mushroom genome analysed to date, belonging to the species Craterellus lutescens, has 52,289. So if you’re looking for a scientifically sound argument for vegetarianism, this ain’t it.

Are you more special than an onion?

Genes or junk?
