Genetics Unzipped is the podcast from the Genetics Society - one of the oldest learned societies dedicated to promoting research, training, teaching and public engagement in all areas of genetics. Find out more and apply to join at genetics.org.uk

The birth of DNA sequencing

Researchers read a Sanger sequencing autoradiograph. Image courtesy of National Human Genome Research Institute

Our story starts with a theoretical physicist called Walter Gilbert.

Gilbert did his PhD at the University of Cambridge in the late 1950s, where he met James Watson (of Watson and Crick fame), who had helped to solve the double helical structure of DNA.

After completing his PhD, Gilbert moved to Harvard University in 1956, where he soon became an associate professor in physics. Watson also moved to Harvard around the same time, and the two remained friends. They often talked about the exciting developments in molecular biology. After deciding that physics was having a dull moment, Gilbert began joining in with the experiments in the biology lab. 

Despite his education in theoretical physics, Gilbert wasn’t afraid to return to the basics in his new field of biology and often bugged the students in the laboratory with his questions. He soon found he had a knack for experimental biology, and it wasn’t long before he and Watson were running the molecular biology group together.

Gilbert’s work soon led him to a problem. He was studying regulatory proteins, which switch genes on and off in cells by binding to specific DNA sequences nearby, and he needed to figure out the exact sequence that one of them, a protein called the lac repressor, bound to.

With no efficient DNA sequencing techniques available at the time, he relied on painstakingly slow and labour-intensive laboratory methods to work it out. It took Gilbert and his colleagues two years to decipher the 24-base sequence, which they published in 1973.

Since genes were known to be thousands or tens of thousands of bases long, Gilbert realised that if they ever hoped to understand even the simplest gene, they needed a quicker way to read DNA.

In February 1977, Gilbert and his student Allan Maxam published an exciting new DNA sequencing method. 

Their technique, which quickly became known as the Maxam-Gilbert or chemical method, involved labelling one end of the DNA you want to sequence with radioactive phosphorus and then splitting it into four separate tubes. Each tube is then treated with a different chemical that breaks the DNA at a specific base in a small proportion of the DNA strands, generating a bunch of fragments of different lengths, each ending at one of the four DNA letters (A, C, T or G), depending on the chemical and exactly where the strand was cut.

These fragments are then separated by size, by running the four reactions side by side through a slab of polyacrylamide gel, and made visible by exposing the gel to X-ray film, which lets you decipher the original DNA sequence. So, for example, the shortest fragment might end at a T, the next shortest at a C, the next at an A, the next at a C, and so on, telling you that your original sequence must have started T, C, A, C...
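To make the reading step concrete, here is a minimal Python sketch of an idealised, error-free gel. It assumes, following the simplified description above, that each of the four lanes contains every end-labelled fragment ending at one particular letter; the function names `maxam_gilbert_lanes` and `read_gel` are made up for illustration, and real Maxam-Gilbert chemistry is messier than this.

```python
from __future__ import annotations


def maxam_gilbert_lanes(seq: str) -> dict[str, list[int]]:
    """Idealised model: each lane holds the lengths of all end-labelled
    fragments whose final base is that lane's letter."""
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(seq):
        # A break at this base leaves a labelled fragment of length position + 1.
        lanes[base].append(position + 1)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence from the shortest fragment up to the longest."""
    band_at = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(band_at[length] for length in sorted(band_at))


lanes = maxam_gilbert_lanes("TCACGGAT")
print(lanes["T"])       # [1, 8]
print(read_gel(lanes))  # TCACGGAT
```

Sorting the bands by length and noting which lane each one sits in is all the "reading" involves; in the lab, the hard part was the chemistry.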

The new technique slashed the time taken to read DNA from roughly one base per month to hundreds of bases in an afternoon.

But at the same time as Maxam and Gilbert were working on their method, other scientists were also coming up with their own solutions to the problem of rapid DNA sequencing.

In 1975, British biochemists Fred Sanger and Alan Coulson published a paper describing what they called the ‘plus and minus’ method of DNA sequencing. This basically involved eight separate reactions using DNA polymerase and various combinations of labelled bases to generate DNA fragments of different lengths, which revealed the underlying sequence in a similar way to Maxam and Gilbert’s method. Sanger and his colleagues used it to read the first DNA genome: all five thousand or so bases of bacteriophage ϕX174 (or ‘PhiX’ for short).

Although it did work, this method was complicated (and hard to get your head around, if I’m honest) so Sanger and Coulson teamed up with another biochemist, Steve Nicklen, to develop a simpler, faster technique - the chain termination method, which became known as Sanger sequencing. 

Similar to the plus and minus method, this technique uses DNA polymerase to make new DNA from a template (that’s the stretch of DNA you want to read), set up in four separate reactions. Each reaction contains all four normal bases, one of which carries a radioactive label so the new strands can be detected, plus a small amount of one modified base. Importantly, these modified bases (known as dideoxynucleotides) lack the chemical group needed to attach the next base, so once one has been added, the strand can’t grow any further.

So as the DNA template is copied, these modified nucleotides of a particular letter are incorporated at random positions, terminating the strands and resulting in fragments of different lengths with a known final letter. The four reactions are then run side by side through a gel, separating the fragments by size as before, so you can work out the original sequence from each fragment’s length and final letter. Published at the end of 1977, the new technique quickly caught on.
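The same gel-reading logic applies to Sanger sequencing, but the fragments come from copying rather than cutting. The sketch below is a toy simulation, not a model of real reaction conditions: it assumes each reaction contains a small, fixed fraction (the hypothetical `dd_fraction` parameter) of one chain-terminating base, copies the template many times and records where strands stop. Because the new strand is complementary to the template, the read is complemented at the end to recover the template's letters (strand direction is ignored for simplicity). All names here are illustrative.

```python
from __future__ import annotations

import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sanger_reaction(template: str, dd_base: str, rng: random.Random,
                    molecules: int = 2000, dd_fraction: float = 0.1) -> set[int]:
    """Copy the template many times in one reaction and return the lengths
    of strands that were stopped by the chain-terminating version of dd_base."""
    lengths: set[int] = set()
    for _ in range(molecules):
        for position, template_base in enumerate(template):
            new_base = COMPLEMENT[template_base]  # polymerase pairs A-T and C-G
            # Wherever dd_base is needed, a terminating copy is occasionally
            # picked up instead of a normal one, and the strand stops growing.
            if new_base == dd_base and rng.random() < dd_fraction:
                lengths.add(position + 1)
                break
        # Strands that never pick up a terminator are ignored in this sketch.
    return lengths


def sanger_sequence(template: str, seed: int = 42) -> str:
    rng = random.Random(seed)  # fixed seed keeps the simulation reproducible
    lanes = {base: sanger_reaction(template, base, rng) for base in "ACGT"}
    band_at = {length: base for base, lengths in lanes.items() for length in lengths}
    new_strand = "".join(band_at[length] for length in sorted(band_at))
    # The gel shows the new (complementary) strand, so flip it back.
    return "".join(COMPLEMENT[base] for base in new_strand)


print(sanger_sequence("GATTACACGT"))  # GATTACACGT
```

With thousands of copies per reaction, every position where a given letter is needed almost certainly ends up as a band in that letter's lane, which is why random termination still gives a complete ladder.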

Fred Sanger and Walter Gilbert both took a share in the 1980 Nobel Prize in Chemistry for their work on DNA sequencing, and their two methods were quickly adopted by labs throughout the world. In a neat parallel with the war being fought at the time in consumer electronics between Betamax and VHS video, the two methods battled it out for dominance in the early 1980s.

But although Maxam-Gilbert sequencing was initially more popular, in the end, Sanger sequencing won thanks to its simplicity and ongoing improvements that made it ever faster, cheaper and safer than Maxam-Gilbert sequencing, which was eventually largely forgotten. 

Alas for Gilbert, this wasn’t the only time he was pipped to the post in a biotech race. Around the same time, he also lost the battle to become the first to produce synthetic insulin, a title that went to Herbert Boyer and the scientists at Genentech instead (listen to episode S4.11, from genes to bugs to drugs, for more on that story).

Although it was revolutionary, Sanger sequencing was also painstakingly slow in the early days, producing about 100 DNA letters at a time. Throughout the 1980s researchers worked out how to use fluorescent dyes instead of radioactivity to label the end letters and discovered more efficient ways of separating the different length DNA fragments.

While these advances hugely sped up the process and made it possible to read sequences of hundreds of letters in a fraction of the time, the genomes of even the simplest organisms run to thousands or even millions of letters. So how do you read it all and put it together?

In order to read longer sections of DNA and decipher whole genomes, scientists developed a technique called shotgun sequencing. This involves chopping DNA into random, overlapping pieces, sequencing each piece and then reassembling the code by matching up the overlaps to get back to the original whole sequence, a bit like a very complicated jigsaw. The method was first used in 1981 to sequence the whole genome of the cauliflower mosaic virus, which contains around 8,000 base pairs.
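Here's a toy version of the jigsaw step in Python: a simple greedy assembler that repeatedly merges the two pieces with the longest overlap. It's a sketch for illustration only, with hypothetical function names; it assumes error-free reads and no repeated stretches of DNA, which real genome assemblers can't rely on, and it isn't presented as the method used for any genome mentioned here.

```python
from __future__ import annotations

from itertools import permutations


def overlap(a: str, b: str, min_length: int = 3) -> int:
    """Length of the longest suffix of a that matches a prefix of b (0 if none)."""
    for length in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_length: int = 3) -> list[str]:
    """Repeatedly merge the pair of pieces with the longest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that sit entirely inside another read; they add nothing.
    reads = [r for r in reads if not any(r != other and r in other for other in reads)]
    while len(reads) > 1:
        best_length, best_pair = 0, None
        for a, b in permutations(reads, 2):
            length = overlap(a, b, min_length)
            if length > best_length:
                best_length, best_pair = length, (a, b)
        if best_pair is None:
            break  # no overlaps left, so return the separate pieces (contigs)
        a, b = best_pair
        reads.remove(a)
        reads.remove(b)
        reads.append(a + b[best_length:])
    return reads


# Overlapping pieces of the (made-up) sequence ATGCGTACCTGAAGT, shuffled.
print(greedy_assemble(["CCTGAAGT", "ATGCGTA", "CGTACCTG"]))  # ['ATGCGTACCTGAAGT']
```

Greedy merging is easy to follow but can be fooled by repeats, which is one reason real assembly software uses more sophisticated approaches.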

Increasing automation made decoding whole genomes quicker and easier. The first free-living organism to have its whole genome read was the bacterium Haemophilus influenzae, published in 1995 and clocking in at just 1.8 million letters. Yeast followed in 1996 and the tiny nematode worm C. elegans in 1998.

By then, the wheels were already in motion to sequence the human genome, or as much of its 3 billion letters as was possible with the technology of the time.

I was lucky enough to spend a couple of summers in the mid-90s as a student at the institute in Cambridge that bears Sanger’s name while the Human Genome Project was under way, working right alongside the teams that were sequencing the mouse and pufferfish genomes. There was an incredible sense of excitement in the air that we were finally getting our hands on the tools that would let us start decoding DNA at scale and unlock a whole new world of biology.

You can learn more about this incredible time from Eric Green, one of the researchers at the forefront of the Human Genome Project, in episode 22 of season 3, the Past, Present and Future of the Human Genome Project.

By June 2000, after a decade of work and billions of dollars, a rough draft of the human genome had been assembled. UK Prime Minister Tony Blair and US President Bill Clinton linked up by satellite to make the announcement, during which Clinton made the somewhat unscientific claim that ‘Today, we are learning the language in which God created life.’ And lo, the age of genomics was born.