Genetics Unzipped is the podcast from the Genetics Society - one of the oldest learned societies dedicated to promoting research, training, teaching and public engagement in all areas of genetics. Find out more and apply to join at genetics.org.uk

Naomi Allen: What can 500,000 genomes tell us about human health?


Courtesy of Prof. Naomi Allen


Kat: Cast your mind back to the early 2000s, if you can. For our youngest listeners, ask a parent and make them feel old… The draft human genome had just been published, and the scientific world was waking up to the possibility of exploring our DNA - and its connection to disease - in unprecedented detail. Against this backdrop, a group of British scientists embarked on one of the most ambitious biomedical research projects ever undertaken: UK Biobank. 

Over three years, half a million adult participants were recruited from across the UK to take part, providing biological samples for DNA sequencing and biomarker testing, along with plenty of other information about their health and lifestyle. This laid the foundation for a treasure trove of data which is still being added to today as the research team continues to measure and monitor participants as they age. 

Initially, like many similar large-scale research projects, UK Biobank focused on sequencing exomes - that’s just the genes themselves, which make up less than 2% of the human genome. But now they’ve gone the whole way.

At the end of November, UK Biobank announced a significant milestone in the project: the release of whole genome sequence data from all 500,000 participants. After five years of work, more than 350,000 hours of sequencing time, and more than £200 million investment from government, non-profit and industry funders, this represents the largest single set of genome sequencing data ever to be released, by some margin. 

But it’s not just the size that’s important, it’s what we can do with this data that counts. I sat down with Professor Naomi Allen, Chief Scientist at UK Biobank, to find out why this dataset is so valuable, and why it’s so important that these sequences are whole genomes, rather than just genes.

Naomi: Whole genome sequencing enables researchers to look at all of the genetic variation across the entire genome. So not just in the 2% of the genome that codes for proteins, but all of the genetic variation, much of which was previously considered "junk DNA" precisely because we didn't know what it did.

Naomi: This information will enable really robust research into how variation, across the entire length of our genetic makeup, influences health and disease risk. 

Kat: Why is this information, these whole genome sequences, so useful? What can researchers do to find out more about the links between DNA, variations in it, and disease?

Naomi: Whole genome sequencing data on half a million people together with all of the other lifestyle, environment and health data we have on these individuals will enable researchers to better understand the role of genetics in the causes of various health outcomes - why some individuals develop certain diseases and other people don't, over time.

Naomi: It will also enable researchers to do much better risk prediction. For example, using the whole genome sequencing data will lead to a much greater ability for us to characterise an individual's genetic risk of developing breast cancer. Rather than just using a BRCA mutation, you can use all of the genetic variation across an individual's genome and find out which 10% of women are at very high genetic risk of developing the disease, who can then be targeted for earlier screening or more intensive mammograms.

Naomi: It could also be potentially transformative for finding new drugs. If we find that rare genetic variants across the whole genome are associated with a particular health characteristic, then that will give new insights into drug targets that could be used to treat those diseases.

Kat: It's really incredible to see how our understanding of how genetics and genomics influence health has really transformed within my lifetime. Going from finding single genes that are linked to disease to realising that most diseases are to do with many, many genes, and that there are many, many variations, most of which are not in the genes themselves. It does paint a picture of incredible complexity, and I guess that's why we need incredibly complex data to understand this.

Naomi: Well, that's right. There's been so much hype over the years about precision medicine or personalised medicine. I think the release of whole genome sequencing data will lead to research that can really make tangible strides in that area. For example, we might be able to identify subgroups within the population who are more or less likely to respond to treatment based on their genetic profile or who are more or less likely to experience side effects of certain drugs based on their genetic profile. So you can see how having this data and linking it to disease and other health characteristics could potentially lead to much more targeted precision medicine approaches for the whole population. 

Kat: But of course, the big question is who gets access to it? Could I get a hold of it?

Kat: What kind of checks are in place to control who actually gets to access this data, to rummage around in it and do that kind of research? 

Naomi: Yeah, so that's a really important question. Health-related researchers can access the data. All researchers are vetted carefully. Researchers have to come from a valid research institute, from academia or from commercial companies, and we look at their publication record to make sure that they are actually performing health-related research in the public interest.

Naomi: All applications to use the data are carefully assessed and we monitor the research output of each and every application to make sure it falls within the remit of approved research. It's important to say that only de-identified data is ever made available to researchers - by that I mean we will never release names, addresses, dates of birth, NHS numbers and so on.

Naomi: The whole genome sequencing data are made available via a secure cloud-based research analysis platform, so that researchers can access these very large datasets securely in the cloud and perform in situ analyses. So the data are not being downloaded by researchers all over the world.

Kat: What kinds of research, what kinds of progress have already come from the data that's already in UK Biobank?

Naomi: It's been possible to use the genetic data that we previously released for researchers to develop polygenic risk scores that identify, very early on in life, individuals with a high genetic risk of developing a particular disease.

Naomi: One of the first examples was a polygenic risk score for heart disease that identified about 8% of the population that had triple the normal risk of heart disease, which is often equivalent to a single gene disorder. So you could use these genetic tools to identify individuals who are at high genetic risk for further preventative strategies.

Naomi: So that's been a really important early win in the use of genetic data for population-based interventions for early screening.
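
For readers who want to see the idea behind a polygenic risk score, here is a minimal sketch in Python. It is not UK Biobank's method or any published score: the genotypes and per-variant weights are randomly simulated, the function names are made up for illustration, and the 8% cut-off simply echoes the heart disease example above. The score itself is a weighted sum of how many copies of each risk variant a person carries.

```python
import random


def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 copies) across variants."""
    return sum(d * w for d, w in zip(dosages, weights))


def top_fraction(scores, fraction):
    """Return the indices of the individuals with the highest scores."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_top]


# Simulated data only: 1,000 hypothetical people, 50 hypothetical variants.
random.seed(0)
n_people, n_variants = 1000, 50

# Hypothetical per-variant effect sizes; real ones come from association studies.
weights = [random.gauss(0, 0.1) for _ in range(n_variants)]

# Each person carries 0, 1 or 2 copies of each variant.
genotypes = [
    [random.choice([0, 1, 2]) for _ in range(n_variants)]
    for _ in range(n_people)
]

scores = [polygenic_risk_score(g, weights) for g in genotypes]

# Flag the 8% of people with the highest scores for follow-up.
high_risk = top_fraction(scores, 0.08)
print(f"{len(high_risk)} of {n_people} people flagged as high genetic risk")
```

In real studies the weights are estimated from large cohorts, and a score has to be validated against actual health outcomes before it can say anything about an individual's risk.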

Kat: We're now seeing the rise of technologies that people can wear, things like watches and various types of monitors that can provide more ongoing data about health.

Kat: Are you trying to bring those sorts of technologies into the UK Biobank? 

Naomi: So, in 2014, we gave 100,000 participants a smartwatch to wear for seven days, and analyses on that data have already shown that differences in physical activity patterns can predict Parkinson's disease up to seven years before diagnosis. This is astonishing if you think about it, and you can start to see how these types of data that many of the population are actually wearing could be used in the future to diagnose disease earlier and start treatment earlier, when it's much more effective - especially for something like Parkinson's where it takes years to get a proper diagnosis.

Kat: What are your hopes for the future of UK Biobank, its data, its opportunities, the things you want to do next? 

Naomi: We really would love to bring all of our existing, very altruistic participants back to do a repeat of all of the measures they generously gave to us at baseline. This is so we can assess change over a 15 to 20 year period and look at how participants have aged over that time and better characterise health outcomes.

Naomi: For example, we know that of the participants who are diagnosed with dementia, for about half of them, we don't know what type of dementia they have. So we would like to bring those participants back to have a brain scan, to have blood samples done, so that we can better identify the type of dementia they have, which will enable much more accurate research into the development of treatments for specific subtypes of disease.

Naomi: That's the sort of thing we would really like to do over the next couple of years - much better characterisation of health outcomes, assessment of ageing, frailty, cognition, and dementia. I think over time, the resource will become more informative rather than less informative, and we hope to be following up with our participants for many decades to come.

Thanks to Prof. Naomi Allen.
