013 The Zero Dollar Genome
Kat: Hello, and welcome to Genetics Unzipped - the Genetics Society podcast with me, Dr Kat Arney. In this episode we talk to George Church about his plans for the ‘Zero Dollar Genome’, and find out how one scientist’s interest in personal genomics got a little too close to home.
From Billions to Zero
Kat: The first draft of the human genome came with a price tag running into billions of dollars. In less than twenty years, the cost of whole genome sequencing had plummeted, making the thousand dollar genome a reality by 2014, and opening up a consumer market for personal genome sequencing - although as geneticist Elaine Mardis quipped, it’s a $1,000 genome, and a $100,000 analysis. The price for sequencing continues to fall, and several companies are vying to be the first to break the hundred dollar barrier.
But, according to George Church, professor of genetics at Harvard Medical School and one of the world’s leading authorities on genes and genomes, we’re about to see the dawn of the zero dollar genome, making personal whole genome sequencing effectively free in exchange for the data.
To be fair, George does have a vested interest in this area. In 2005 he set up the Personal Genome Project, which asks willing volunteers to publicly share their personal genome data for the public good - the idea being that trying to keep genome data completely anonymous was too difficult, so what would happen if it was all out in the open to start with. And he’s involved in a new venture called Nebula Genomics, which goes the other way, using a new type of encryption to ensure privacy and data security.
I was lucky enough to sit down with him at a recent conference exploring issues in personal genome sequencing, held at the Wellcome Genome Campus in Cambridge, to find out how far we’ve come in whole genome sequencing and see what’s coming down the pipeline for the future.
George: The quality has improved tremendously, and the way you get to a thousand dollars was by multiplexing. That is to say one drop of liquid used to do one reaction and now one drop of liquid can do millions or billions of reactions, so that was the main technical thing. But then to get it from one thousand to zero, you have to recognise that there's huge parts of society that can benefit financially, like the insurance companies or the government or other healthcare providers.
They're spending trillions of dollars a year worldwide on severe Mendelian diseases and money can be recovered there. Then there's research operations like pharmaceutical companies that also stand to gain quite a bit of money, if you can get the right cohorts to them in a way where more people feel comfortable with this - there's been a recruiting issue.
The Mendelian disease issue can be solved immediately without new research, and then there's the new research that can be used for developing new drugs. Both of those are trillion-dollar business, potentially.
So anyway, you just pass a tiny fraction of that along to the individuals to incentivise them, or at least not de-incentivise them by the thousands of dollars cost.
Kat: As anyone knows all too well, there is no such thing as a free lunch so if you're saying that companies will pay to have individual genomes sequenced, maybe you get a little kickback for your own pocket. That's very nice, money for my genome, lovely. But what about the issues of privacy and access?
There's been so much talk about online data, digital data; who has access to it, what's it being used for, how is it being used to identify and track people? How do we address that kind of issue with genomic data, which is so personal and so sensitive?
George: Alright, that was the point we were making in 2004 about the personal genome project. At that time we did consider it extremely identifying, along with your imaging data, which is as well. That's why we made it open because we couldn't think at the time of how to make it closed or to share it and still have it private.
But since then, since 2004 there have been two computer science revolutions. One is in homomorphic encryption, which is a fancy way of saying that you can ask questions of an encrypted genome and get the answers out as if you had decrypted it, without ever decrypting it.
Most people think of that encryption like an email; you might encrypt something you have in a decrypted form, similar to an email. Then at the other end you decrypt it, so at both ends it's decrypted. But in homomorphic encryption queries, you never decrypt it. It's always encrypted, and you own it.
So in the Nebula model, which is what we're talking about here, which gets us to the zero-dollar genome, it both figures out how to get the money squared away but also how to keep the privacy via this combination of homomorphic encryption and Blockchain.
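As a rough illustration of what George describes here, asking a question of data that stays encrypted throughout, the following is a minimal Python sketch of an additively homomorphic scheme (a toy version of the Paillier cryptosystem). It is not Nebula's actual method, which the interview does not detail. The tiny primes, the function names and the list of variant flags are all assumptions made for the example, and the key size is far too small for real use. A party holding only the ciphertexts can add them together, and only the key holder can decrypt the resulting count.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small demo primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # standard simple choice of generator
    mu = pow(lam, -1, n)      # modular inverse of lam mod n (Python 3.8+)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer m with 0 <= m < n under the public key."""
    n, g = pub
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:   # r must be invertible mod n
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c using the private key."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (((x - 1) // n) * mu) % n

def add_encrypted(pub, c1, c2):
    """Combine two ciphertexts so the result decrypts to the sum of the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)

# Hypothetical data: 1 means the person carries a given variant, 0 means not.
variant_flags = [1, 0, 1, 1, 0]

pub, priv = keygen()
encrypted_flags = [encrypt(pub, flag) for flag in variant_flags]

# A third party adds up the encrypted flags without seeing any of them.
encrypted_count = encrypted_flags[0]
for c in encrypted_flags[1:]:
    encrypted_count = add_encrypted(pub, encrypted_count, c)

# Only the key holder learns the answer, and only the count, not the genome.
print(decrypt(pub, priv, encrypted_count))  # 3
```

This scheme only supports addition on encrypted values. The fully homomorphic systems George alludes to allow richer queries, at a much higher computational cost.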
Kat: So it's almost like I would have my genome in a bag, and someone would come to me and go, "Have you got this in your bag?", and I can go, "Yes", or "No"?
George: Right. Except you won't have to do that, the computer will do that, so even you don't necessarily need to look at your genome.
Some people in addition to worrying about other people seeing things in their genome, are worrying about them seeing something they can't treat. And that is totally negotiable - it always has been but this makes it even easier - so that you only see things that will help you medically, which for most people is a very tiny amount of information, a tiny bit impactful. To 5 percent of the population it's very impactful and 95 percent of the population, most of your problems are generic for the whole human race, like ageing.
This allows you to not learn anything that is not actionable and you're not responsible for it - an insurance agency can't say you're gaming the system. You can prove that you never looked at your genome, they can't use it against you in a court of law because you don't actually have the genome. The genome is not open, literally. It's not possible to find out anything about your brother or your other family members because the encryption is designed to only answer certain questions.
Kat: So it's not even coming up to me and saying, "Have you got this in your genetic bag?", it's like going, "Anonymous person, do you have this in your genetic bag?"?
George: That's one of the preferred ways of using it. There are a number of new ways that are opened up by this, but that would be one of the preferred ways of doing it, yes.
Kat: I love the sound of this because there's a lot of those kinds of questions that have bothered me about; how much do I want to know about what's going on that has relevance to my family, who may not want to know the kinds of things that I want to know? Or they do want to know the kinds of things that I don't want to know. And the issues with insurance and all these kinds of things.
Moving on more broadly to the idea of the human genome and what we can find out about it, I think it's something that people don't really know, that the human genome that we have is not the human genome. It's just a reference sequence that is not necessarily representative of all the diversity in the world. It's not even all the bases of that six billion in our DNA.
George: Yes, that's right. The original genome about which there is a lot of fuss made, was not only a poor technology in that it couldn't produce any genomes of medical value to an individual, number one. Number two, it wasn't finished. There was five to eight percent missing. And if we had finished for one person, as you do more and more parts of the world we found that something like ten percent of it is missing from that person even if you had a perfect sequence of that person. And we still don't have a perfect sequence of any person and we still haven't finished the survey.
So there's a bit more to do, but I think the ability to finish a genome is approaching. Many of my colleagues consider it done, but the parts that aren't done are actually quite interesting from a standpoint of senescence and aging and non-disjunction, meaning the way that chromosomes are lost during fertilisation and development.
So we need not only to finish that, we need to figure out the functional analysis of that. Through intentional mutation in the lab, we need to figure out what the functions are. And it's that ability to read and write genomes - including repetitive sections as they're important - and it's suddenly arriving.
Kat: These are the repetitive bits, the long dead viruses, just structural parts of the genome - repeated sequences. Why are they so damaging to health? What goes wrong with them and what are you trying to do to understand and change them?
George: Well, first of all they're not long dead. Many of them are still active and many of them jump around during development. So random jumping around - they could land in the wrong place and cause cancer or some development defect. They also probably have a positive aspect. Some have been found to have a positive component in, say, placental development and health. We need to understand what they do and some of them will have technological significance.
One example of a highly repetitive, non-conserved, functionless (in the sense you could delete it) element was CRISPR, which then turned into a major technology for anything genome. So both from the aspect that they could hurt you, they could help you or they could be some new technology. I think that's why we need to finish the human genome.
Kat: And once we've got the sequence of the human genome, for whatever value that looks like, now we have the tools in our hands to start chopping it up, taking bits out, rearranging, rewriting. Where do you see the future of that heading and what are you and your team trying to do there?
George: We've been working on the technology for writing as much as for reading and both of them, we've brought down the cost by about ten millionfold. Then there's editing where the efforts have resulted in a slightly smaller improvement, but something that people can still get very excited about. And most of the editing is used for - clinically so far - rare genetic diseases and some infectious diseases. One of the things my group is working on is solving diseases that affect everybody, which are diseases of aging. We would like to reverse aging so you have a very youthful existence for more years.
Kat: And this would be through manipulating the repeats, changing them, deleting them, fiddling with the repetitive sequences in the genome?
George: Aging reversal is not limited to repeats. There is some evidence that the repeats are involved. The telomeres at the ends of chromosomes certainly are involved. The interstitial ones are also involved. Almost every gene has some possibility. We've whittled it down to about three hundred non-repetitive regions plus a dozen repeat family types.
Kat: At some point, if you're talking about making lots and lots of changes in the genome, and in one of your slides you talked about making thousands of changes, wouldn’t it just be quicker to build a whole new chromosome?
George: Yes. We certainly do some of that. We call it writing and editing, and if you do enough editing it would be hyper-editing. It's kind of a case by case thing as the technology changes and it's changing so rapidly, exponentially. We have to re-evaluate it basically every year. And I would say just in the last few months, it's shifted from an emphasis on - in my lab at least - writing the DNA back to higher editing. Because we've found we can now make up to 26 thousand edits, which is more than most projects that involve resynthesis from scratch. Even though it's very inexpensive to write the DNA in the little segments, it's in the order of a few thousand, a couple of thousand dollars to write a whole human genome of six billion base pairs. To assemble it properly and test it is still prohibitive, but editing, it can be very easy to get the editing right. It does it more or less automatically in a few days.
Kat: I'm just stunned by you saying, "Oh, we can make 26 thousand edits", I spent most of my PhD trying to knock out one gene.
George: Right, yes, it's changed tremendously. And it's not because we have a roomful of robots. It literally is a molecular method where one person's hand can do it in a couple of minutes, and then you let it incubate for a few days in the incubator and it's done. So it really is a completely different game. It has to do with what we call multiplexing, which is an electronics term that we've borrowed. Molecular multiplexing allows us to do sometimes billions of things for the same cost and effort as one thing.
Kat: And finally I'm intrigued about where the advances in technology are going to enable us to look at the sequence of DNA, the sequence of RNA, when genes are active in situ.
Because I think this is something else people don't really realise about the human genome, that it's different in different parts of your body and it's doing different things. If you just take a sample of tissue or blood and mash it up and look at the genes or look at the RNA, you're missing all of that context and single cell detail. Where are we now with those kinds of approaches? Will we be able to see the DNA sequence in a single cell down a microscope one day?
George: Yes, the one day is this year, basically. Another thing that we're missing is the three-dimensional structure of the individual cell and the three-dimensional structure of cells within organs in the body.
We have anatomy, so we have very good anatomy, that has been stable for decades, the anatomy books that medical students use. But when you start extending that down below the cellular level we have pictures which might be in two or three colours, when there are millions of different molecules, and each deserves its own colour. So what's happened is that, with exactly the same methods that we use for sequencing DNA where you break the cell apart and splatter it randomly on a glass microscope slide, we can say hey, sequencing now is microscopy, why break it apart and randomise it? Why not keep the three-dimensional coordinates?
That's called fluorescent in situ sequencing and it works for DNA, RNA, protein, anything for which you have an antibody, including carbohydrates. Suddenly now, every pixel, every voxel in a three-dimensional tissue section, or the reconstruction of an entire organism if you have enough microscope time, every pixel has a molecule that has a name.
Kat: I just think it's wonderful, it completely blows my mind. Finally, you gave us a talk where at the end you said, "Any questions?" and everyone sat there like their brain had been winded because some of this stuff sounds so fantastical and even a few years ago I would never have believed some of the things that have been published recently. Just give me a little snapshot, paint me a picture of something in the future that you can see being a reality, that would sound like science fiction today.
George: Well, first we have to calibrate the things that sounded like science fiction when I started them and now are ordinary today. For example, nanopore sequencing was something I started in the 1980s and it just seemed implausible that you could hold in your hand a sequencing device because at that time, even pathetic sequencing devices were the size of a room and they could hardly do anything.
Now you can hold in your hand something that has millions of sequencing devices in the nano machines. Similarly a nano machine that could so precisely make thousands of edits in a cell was science fiction. So that's the calibration of things that have been delivered.
Going forward, things like - we can now apply some of these tools to making an infinite supply of organs, so rather than having to desperately find a match, very often by the time it's delivered it doesn’t work because it's dead or not suitable for use, and there's not an adequate number of people dying in a suitable way to donate their organs.
That can be solved either by engineering human organ development in the lab or via animals, pigs that are close enough that are now engineered to be the donors. I could go on, but these things sound like science fiction and some of them arrive very, very quickly. In five years, some of them went from a crazy idea to everybody is doing it.
Kat: The idea that you talked about that I really loved was the concept of having a tiny DNA sequencer maybe in your watch or something like that, that was constantly scanning the microbes in the environment. So if someone sneezes near you on the tube you could be like, "Oh god no, they've got a bug or a cold or something like that, I'm going to move to the other side of the carriage." That seems like something implausible but could genuinely become a reality, with the pace of change that's happening?
George: We're very close. The nanopores that I mentioned, it's portable. It's not quite real time, meaning it's a twenty-minute delay when we would like something that's more like a two second delay or less.
Kat: Yes, you can't really move away from someone who's sneezed on you with a cold if it's half an hour later.
George: But you could have ubiquitous monitors where over the course of a day a room goes from being healthy to unhealthy, like a day care or a plane or a waiting room, or something like that. It takes the course of a day for somebody to sneeze and detect that. Even twenty minutes would be great.
Then basically, you don't just keep it to yourself, you tweet it out to the whole network of wearable sequencing devices and you're all healthier for it. Because right now, some sneezes are harmless, you shouldn't be terrified of them, but others are more serious and it's not just pathogens, there's also allergens. So if you have a way of carefully monitoring what you're sensitive to and what's in the environment. Some pathogens are okay if you're already immune to them, so it's a highly personalised software that you need. Some of them are okay if there's a good drug for them, but others are multi drug resistant, so there's all these nuances.
Are you allergic to it, are you immune to it, is there a drug for it? I think there's a great opportunity for software, just like there was a great opportunity in the dawn of GPS satellites and apps and queries of the internet, Wikipedia and so forth. We're at a similar point now for biology.
Kat: Genomics from everyone from everything, all genomes all the time?
George: Right. Genomes for All was the title of a paper I wrote for Scientific American in 2005. It's still a good goal.
Kat: George Church from Harvard Medical School, winding me in the brain with his vision for the future.
Genomics gets personal
Kat: Another person I met at the Personal Genomics conference was one of the organisers - Manuel Corpas, a genome bioinformatician from Cambridge Precision Medicine who’s been working in the field for many years. While working at the Wellcome Sanger Institute, he started to get interested in delving into his own personal genome. And once he’d done that, he wondered if his family would be interested in looking inside their genes too.
Manuel: It was a time when personal genomics became possible with the launching of direct to consumer products, such as 23andme etc. So I thought, hmm, if this is happening now, we're using genomics for people who have rare genomic diseases, it's just a question of time that this is going to become mainstream. So I thought that it would be a good idea to first see my own genome so I can analyse myself. You can download it from 23andme which is what I did, so that was the start in 2009.
Then we decided to analyse my family genomes, because I had a higher risk than normal to have prostate cancer and I didn't know why I had prostate cancer elevated risk, since this was something that didn't even exist in my family, we didn't have any history of that. I was able to learn a bit of genetics, the most contributing marker for genetics of prostate cancer.
It happens that I inherited the sort of bad allele, the bad copy from my mum and the bad copy from my dad. They don't have them together, but when you put them together when they created me then it increases the risk of having prostate cancer. That was the beginning of a tale that is still ongoing.
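The inheritance pattern Manuel describes can be worked through as a short Python sketch. It assumes, as his account suggests but does not state outright, that each parent carries exactly one copy of the risk allele. The allele labels "R" (risk) and "n" (non-risk) and the function name are made up for the example.

```python
from fractions import Fraction
from itertools import product

def offspring_genotype_probs(mother, father):
    """Enumerate every pairing of one allele from each parent (a Punnett square)."""
    counts = {}
    for allele_from_mum, allele_from_dad in product(mother, father):
        genotype = "".join(sorted((allele_from_mum, allele_from_dad)))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = len(mother) * len(father)
    return {g: Fraction(c, total) for g, c in counts.items()}

# Assumption: both parents are carriers with one risk copy each.
probs = offspring_genotype_probs(("R", "n"), ("R", "n"))
for genotype, p in sorted(probs.items()):
    print(genotype, p)
```

Under that assumption, a child has a one in four chance of inheriting both risk copies ("RR"), which is how a risk that neither parent shows can appear in their son.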
Kat: So then what happened? You went to the rest of your family and said, "Hi everyone - Christmas present; spit in this tube!"
Manuel: It was more like an evolution of things. So after I did 23andme for all my family and we decided to put it together and then made it publicly accessible to anyone who wanted to use it, then I got a lot of interest from the community. Because at the time, there wasn't any complete family genome that could be openly downloadable, so I got a lot of people interested in using it. I said okay, fine, you have them, but if you find something useful then feed me back the results.
Kat: Yeah, but useful can also mean not great…
Manuel: Yes, that's absolutely right. Yeah. I mean the thing is that we are kind of Spanish, laid back people and I guess that when you want to learn about your personal genome and all this information, it's not what you can get out of that, it's more like what's your personal attitude to what you want to know and don't want to know. So after going through a lengthy process of asking for advice from people who know about the ethical, legal implications of sharing data, my family was very supportive. They wanted to get on board with science, so to speak.
Kat: And when your family got the information within their genomes and started looking at it and looking at each other, how did they respond? It's like, OK, we've all got our genomes, what does this tell us about us and our family?
Manuel: Yeah, so these are the kinds of questions that many people are now beginning to wonder. For example, who has got the best genome?
Kat: Yeah, competitive genomes! I don't know how competitive your family is, but woah, mine would be --
Manuel: My family is very competitive! Well, it was more like, Auntie's genome is not so good, well, never mind. That was one of the first things that came up. My Mum, who loves me a lot, she's got apparently fewer risks than I do and she said, "I'll give you my genome if you want."
Kat: She already gave you the worst bits of hers!
Manuel: I said, "Mum, sorry, you cannot. I was born with this so that's it. That's for life now." At least for now.
Kat: It's like, yeah, if you wanted to give me the good genes, you should have done that a few decades ago, bad luck. Where has this story got to? Have there been any more surprising things that have come out of your family exploration?
Manuel: Yes, I mean obviously thank God my father is my real father…
Kat: Yeah. All these people getting the kits for Christmas and they're like, "Hi everyone!" and then you sort of get Mum going, "Hmm, yeah, maybe we'll do this later."
Manuel: Yeah, so another interesting thing that I found was why my dad doesn't like milk in his coffee. Because he's lactose intolerant, and he doesn't like milk very much, but we did find he's got a very high risk of lactose intolerance. Now, this is what the genetics is saying and this is us trying to assign a causative correlation. That's what we believe, but I cannot tell you for sure if that's the actual reason. This is kind of a lesson to be learned, that you may find many wonderful things, many predictions, and sometimes you might end up adjusting your own personal narrative based on the results, with no correlation to the real genetic basis of your own. I've learned a lot about communication of results, for example my aunt had several episodes of venous thromboembolism.
Kat: Blood clots?
Manuel: Yes, and it appeared that she had high risk in her genome for that. Then I said, "Auntie, the fact that you are having thromboembolisms is not because perhaps you have smoked a lot or have a lot to drink, actually you had that susceptibility." And she actually didn't do much with that. She could have gone to her GP and perhaps talked about that, but there wasn't a lot of intake there.
The other thing was, my sister's genome, which was represented as a series of pixelated chromosomal images, was taken by an artist, and this artist created the first genome blanket in the world. If you have the chance to go to the Tilburg Museum of Textile and Industry in Amsterdam, you will see the first genome blanket, which belongs to my sister.
Kat: That's amazing! I was going to go to Amsterdam next month, so I will go and check it out. We're here at the Personal Genomics conference, why is it so important to have this conference, and why is it so important to be having these kinds of conversations now about what genetic testing means, what these results mean, how we talk about it and what's coming down the pipeline?
Manuel: Well, the hope is that we are close to a point of inflection where genomics is going to become something which is widespread. The prices have come down, the amount you need to pay to get your genome sequenced is below a thousand dollars.
Kat: Well, down to zero, if you believe George Church.
Manuel: At some point, perhaps. But of course then if you want the interpretation then that's a different matter. I think the real point here is that technology is unstoppable, and I can see the similarities at the beginning of the 1990s when the internet was started by Tim Berners-Lee in 1991. Then at the end of the decade everyone knew what the internet was and it became something that has changed our way of life.
I believe that genomics is poised to change the way medicine is done and as a result, the way we live our lives, and the way we even know ourselves and our understanding of ourselves.
I'm going to give you an anecdote for example. Suddenly my auntie passed away a few years ago and I was able to retrieve, a few years later, some of her hairs. I've sequenced her genome, right? I posted that in the blog and I already got people who came back to me, "My son passed away a few years ago, can you help me sequence his genome? You know - I don't have anything else from him but I want to have his genome as a way of having something about him."
And that tells me that because it's so deep, the implications of having that information, people are looking at themselves or looking at their genomes as a new facet of their own personality, their own being. So I think that's potentially what could happen in the near future.
Kat: You may be gone, but your genome lives on! That’s Manuel Corpas, speaking to me at the recent Personal Genomics conference.
Orchids in the cloud
Kat: Orchids are one of the largest groups of flowering plants, popping up all over the place in an incredibly diverse array of shapes and colours. While you might mainly think of orchids as ornamental houseplants - and I’ve certainly received a few as sadly short-lived housewarming gifts - many species in their native locations are vulnerable to poaching or deforestation. So understanding the genetic diversity and complexity in these wild populations is essential for helping to figure out how species are changing and targeting conservation efforts most effectively.
In the latest episode of the podcast from Heredity, the Genetics Society journal, James Burgon chats to Professor José Iriondo from King Juan Carlos University in Madrid about his latest paper investigating the fine-scale genetic structure of an unassuming orchid that grows nestled in trees within the Ecuadorian cloud forests. Working in a landscape unlike any other, José and his team have uncovered a genetic mystery - while they might have expected orchids growing in the same trees to all be genetically similar, the reality is not so simple, highlighting vital conservation considerations for these iconic and endangered plants.
José: The reason why we chose this species is a bit sad because orchids in Ecuador are in danger of extinction because of illegal orchid poaching. So we ended up using a species which is not a very showy orchid - on the contrary, it is a small orchid with green flowers and narrow leaves, called Epidendrum rhopalostele but it is just one of many epiphytic orchids in Ecuador.
One particular result that was quite striking is that we found orchids belonging to these two groups in 21 of the 25 trees. This was quite unexpected because if there is genetic differentiation in a population you would expect to find it related to some environmental difference or some geographic difference. But in this case the individuals of both groups were present in 21 of the trees where the orchids were found.
Kat: You can hear the full interview in the latest Heredity podcast - just search for Heredity in your favourite podcast app, or go to https://www.nature.com/hdy/podcast/index.html
That’s all for now. Next time we’ll be back with more stories from our series exploring 100 ideas in genetics, taking a spin around the flower garden in search of snapdragons and miniature DNA sequencers.
You can find us on Twitter @geneticsunzip or email us at podcast@geneticsunzipped.com with any questions and feedback. Please do take a minute to subscribe on Apple Podcasts, or wherever you get your podcasts from, and it would be great if you could rate and review - and more importantly, please spread the word so more people can discover the show.
Genetics Unzipped is presented by me, Kat Arney, and produced by First Create the Media for the Genetics Society - one of the oldest learned societies in the world dedicated to supporting and promoting the research, teaching and application of genetics. You can find out more and apply to join at genetics.org.uk. Our theme music was composed by Dan Pollard, and the logo was designed by James Mayall. Thanks to Hannah Varrall for production, thank you for listening, and until next time, goodbye.
References and further reading:
Elaine Mardis (2010). The $1,000 genome, the $100,000 analysis? Genome Medicine 2(11): 84.
Manuel Corpas (2012). A family experience of personal genomics. J Genet Couns 21(3): 386-91.
Elena Torres, María-Lorena Riofrío & José M. Iriondo (2019). Complex fine-scale spatial genetic structure in Epidendrum rhopalostele: an epiphytic orchid. Heredity 122: 458-467.