There was an eruption in Iceland last week. No, this was not another volcanic eruption. Rather, there was a seismic release of human genetic data that provides a glimpse into the future of drug discovery. The studies were published in Nature Genetics (the issue’s Table of Contents can be found here), with insightful commentary from Carl Zimmer / New York Times (here), Matthew Herper / Forbes (here), and others (here, here).
[Disclaimer: I am a Merck/MSD employee. The opinions I am expressing are my own and do not necessarily represent the position of my employer.]
As I have commented before, human genetics represents a very powerful approach to identify new drug targets (see here, here). I have articulated a 4-step process (see slide #5 from this deck): (1) select a phenotype that is relevant for drug discovery; (2) identify a series of genetic variants (or “alleles”) that is associated with the phenotype; (3) assess the biological function of phenotype-associated alleles; and (4) determine if those same alleles are associated with other phenotypes that may be considered adverse drug events.
There is an important assumption about this model: genes with an “allelic series” will be identified from large-scale genetic studies, and these phenotype-associated alleles will serve as an estimate of function-phenotype dose-response curves. The Nature Genetics articles from Iceland provide empirical support for this model.
I highlight two manuscripts from this collection of studies. The first is on a specific gene, ABCA7, implicated in Alzheimer’s disease, and the second is on rare “human knockouts” – the most extreme form of genetic perturbation.
Background and results: Previous genetic studies, including GWAS, have shown that there are at least 18 non-MHC regions of the genome (consisting of >100 genes) implicated in risk of Alzheimer’s disease (AD). The Steinberg et al study used “imputation of the whole-genome sequences of 2,636 Icelanders into 104,220 long-range phased individuals and their close relatives to investigate whether any of the genes located in the regions showing common variant association with Alzheimer’s disease (excluding the MHC) also harbored rare variants conferring higher risk.” At an allele frequency threshold of 2%, they identified loss-of-function (LoF) variants in 20 genes from 10 of the loci implicated by GWAS, and missense variants for 82 genes in 17 loci. The most significant association result was for ABCA7 (OR = 2.08, P = 3.8 × 10^-5 for LoF variants; P = 0.0002 for missense variants). One variant in particular drove the significant findings for the missense variant test (a splice-site variant, c.5570+5G>C), and this variant was functionally determined to be LoF. None of the ABCA7 LoF variants was located on the background of the common variant previously associated with Alzheimer’s disease (rs4147929[A]), thus ruling out a synthetic association. Six ABCA7 LoF variants were associated with AD risk in case-control samples outside of Iceland (Finland, Germany, Norway, United States; OR = 1.73, P = 0.006). A meta-analysis with the Icelandic data yielded an OR of 2.03 and a P value of 6.8 × 10^-15.
Why this is important: The study provides evidence of an allelic series for ABCA7 associated with risk of AD. In doing so, this genetic study implicates ABCA7 rather than another gene in the GWAS-implicated locus as the disease-associated gene. With a range of alleles, it is possible to understand how to target the gene for therapeutic benefit – the transporter would need to be agonized to facilitate movement of lipids across membranes in cells in the brain (e.g., microglial cells). Whether ABCA7 ever becomes a therapeutic target will take many years of dedicated research.
Background and results: As stated in the introduction of the article: “An unanswered question in human genetics is what is the population frequency of homozygous loss-of-function mutations in the germline genome?” To answer this question, the study utilized the same survey (as in the ABCA7 study above) of whole genome sequence data from 2,636 Icelanders, with imputation into an additional 101,584 chip-genotyped and phased Icelanders. Out of approximately 20,000 genes in the human genome, they identified low-frequency (minor allele frequency [MAF] less than 2%) heterozygous LoF mutations in one-quarter (4,924 genes of the 19,135 RefSeq genes) and low-frequency homozygous LoF mutations in ~6% of genes (1,171 genes). Extended to the entire population, they identified 8,041 individuals (or 7.7% of the population) who had at least 1 gene completely knocked out by LoF mutations. In an analysis focused on 1,717 genes implicated in Mendelian recessive disorders, a small percentage of genes (n=88) were found to have homozygous LoF mutations in Icelanders. Finally, they found that there was a deficit of “double transmissions” (i.e., individuals predicted to be homozygous null for a gene based on parental sequence), due to lethality from “early death or the variants being embryonic lethal in the homozygous state”, or due to under-sampling “because of illness or disability”.
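As a quick sanity check on the proportions quoted above, the arithmetic can be sketched in a few lines of Python. The counts are taken from the text; the variable names are mine, and I am assuming that the denominator for the 7.7% figure is the 2,636 sequenced individuals plus the 101,584 imputed individuals.

```python
# Counts quoted in the text above; names are illustrative only.
REFSEQ_GENES = 19_135          # RefSeq genes considered
GENES_WITH_HET_LOF = 4_924     # genes with low-frequency heterozygous LoF
GENES_WITH_HOM_LOF = 1_171     # genes with low-frequency homozygous LoF
SEQUENCED = 2_636              # whole-genome sequenced Icelanders
IMPUTED = 101_584              # chip-genotyped, imputed Icelanders
KNOCKOUT_CARRIERS = 8_041      # individuals with a completely knocked-out gene
RECESSIVE_GENES = 1_717        # genes implicated in Mendelian recessive disorders
RECESSIVE_WITH_HOM_LOF = 88    # of those, genes with homozygous LoF


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Assumption: the study population is sequenced plus imputed individuals.
total_individuals = SEQUENCED + IMPUTED

het_pct = percent(GENES_WITH_HET_LOF, REFSEQ_GENES)
hom_pct = percent(GENES_WITH_HOM_LOF, REFSEQ_GENES)
carrier_pct = percent(KNOCKOUT_CARRIERS, total_individuals)
recessive_pct = percent(RECESSIVE_WITH_HOM_LOF, RECESSIVE_GENES)

print(f"Individuals in survey:            {total_individuals:,}")
print(f"Genes with heterozygous LoF:      {het_pct:.1f}%")
print(f"Genes with homozygous LoF:        {hom_pct:.1f}%")
print(f"Individuals with a knockout:      {carrier_pct:.1f}%")
print(f"Recessive genes with hom. LoF:    {recessive_pct:.1f}%")
```

The rounded results line up with the “one-quarter”, “~6%” and “7.7%” figures in the text, and the combined sample size matches the 104,220 individuals mentioned in the ABCA7 study.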
Why this is important: A goal for pharmaceutical development is to identify homozygous null mutations – or “human knockouts” – for each gene in the human genome. In the 4-step “allelic series” model (see slide #5 from this deck), human knockouts provide the most extreme human perturbation possible – lifelong absence of a gene and its protein product. This Icelandic study provides a glimpse into what it will take to catalogue human knockouts to generate function-phenotype dose-response curves. What is humbling is that the study found complete human knockouts for only 6% of genes in the human genome (although ~25% of genes had heterozygous null mutations).
How will we get to a higher number of rare, complete human knockouts?
First, we need sequence data in more individuals from different ethnic populations. While the Iceland study is large, it still represents sequence data from “only” 2,636 individuals. [Note that given the population structure of Iceland, this is a very efficient strategy to catalogue and test most rare variants in the entire population.] Large-scale sequencing efforts are underway in places like England (here) and the US (e.g., Regeneron/Geisinger, NIH-sponsored Precision Medicine Initiative). As a rule of thumb, ~1% of individuals from outbred populations will be homozygous null (although this depends upon the frequency cut-off, size of the gene, filters for calling a variant LoF, etc.), as compared to 7.7% of Icelanders. The increased frequency in Iceland compared to outbred populations is due to the unique population history of Iceland (a genetic bottleneck with a limited number of common ancestors).
Second, we need sequencing studies in consanguineous populations. Whereas 7.7% of Icelanders are complete human knockouts, approximately 50% or more of individuals born from a consanguineous relationship will be homozygous null for a rare mutation. One study to watch is the 100,000-person “Genes & Health” sequencing study in East London, which is funded by the MRC Clinical Research Infrastructure initiative and the Wellcome Trust (see here).
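To make the intuition behind consanguinity concrete, here is a minimal sketch using the textbook population-genetics formula for homozygote frequency with inbreeding, q² + Fq(1 − q), where q is the allele frequency and F is the inbreeding coefficient. This is not the method used in the Iceland study; the allele frequency of 1% is an illustrative value I picked, and F = 1/16 is the standard coefficient for the offspring of first cousins.

```python
def homozygous_frequency(q: float, inbreeding: float = 0.0) -> float:
    """Expected frequency of homozygotes for an allele of frequency q.

    Uses the standard formula q^2 + F*q*(1 - q), where F is the
    inbreeding coefficient (F = 0 gives plain Hardy-Weinberg q^2).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    if not 0.0 <= inbreeding <= 1.0:
        raise ValueError("inbreeding coefficient must be between 0 and 1")
    return q * q + inbreeding * q * (1.0 - q)


# Illustrative values (assumptions, not taken from the study):
q = 0.01                      # a LoF allele at 1% frequency
F_FIRST_COUSINS = 1.0 / 16.0  # offspring of first cousins

outbred = homozygous_frequency(q)
first_cousin = homozygous_frequency(q, F_FIRST_COUSINS)
fold_increase = first_cousin / outbred

print(f"Outbred homozygote frequency:      {outbred:.6f}")
print(f"First-cousin offspring frequency:  {first_cousin:.6f}")
print(f"Fold increase:                     {fold_increase:.2f}x")
```

Under these assumptions, a single 1% allele is roughly seven times more likely to be found in the homozygous state in the offspring of first cousins, and the enrichment grows as the allele gets rarer. Summed over the many rare LoF alleles each person carries, this is why consanguineous populations are such an efficient place to look for human knockouts.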
Third, we need sequence data in patients ascertained by disease status, especially highly penetrant rare diseases. In the Iceland study, there was little overlap between Mendelian recessive diseases and homozygous null complete human knockouts, indicating that ascertainment from the general population is quite different from ascertainment from diseased populations.
It is important to understand the limits of our ability to find human knockouts. Even with the resources described above – which will become available in the near future – not every gene will harbor a human knockout, due to embryonic lethality (see “double transmission deficit” above). The number of such genes is not yet known, but we should be able to estimate it indirectly once we have larger sequencing studies in multiple ethnic groups. It should also be possible to estimate this number directly from sequencing studies in fetuses that died in utero.
An important caveat to the Iceland study is that they limited their analysis to low-frequency (MAF<2%) mutations, which are more likely to be functionally deleterious than more common variants. The analysis therefore does not capture the few common LoF variants that confer sepsis resistance (e.g., CASP12), impact muscle physiology (e.g., ACTN3) or influence other human traits (e.g., FUT2 and blood group antigens).
Finally, it is important to emphasize two features to maximize the use of living biobanks such as that described in Iceland: (1) ability to link genetic data with clinical data, and (2) ability to recall by genotype for functional studies.
The Iceland genetic study performed limited genotype-phenotype correlation studies. In one of the companion manuscripts (here), they provide evidence for rare variants in MYL4 associated with early-onset atrial fibrillation, several mutations in ABCB4 that increase risk of liver diseases, and an intronic variant in GNAS associated with increased thyroid-stimulating hormone levels when maternally inherited. More systematic genotype-phenotype correlation studies are required to make the most of this rich dataset.
In addition, it is important for individuals in any living biobank to be consented for recall. This will allow for deeper phenotypic interrogation among individuals who carry rare mutations of interest, and especially those that are complete human knockouts. The Iceland study performed limited recall.
In summary, population-based surveys that link human genetic data to clinical data in subjects consented for recall are a very powerful strategy to find new drug targets. The Iceland study represents the tip of the iceberg (couldn’t resist this pun) in what I expect will be a major theme in the future of drug discovery.