My research centers on the identification of inherited genetic variation that influences disease risk in humans. Only within the last decade, following the completion of the Human Genome Project and rapid developments in DNA sequencing technology, has this been possible. Results from genome-wide association studies (GWAS) have demonstrated both the feasibility of this approach and its potential for identifying unexpected biological pathways of disease: pathways that seem likely, in many cases, to be the targets of successful new therapies and predictive risk assessments at the individual and population levels.
Most GWAS are case-control studies that look for significant genotype frequency differences at 500,000 to 1,000,000 variable DNA sites (single-nucleotide polymorphisms, or SNPs) throughout the genome. Success depends on sample size, and studies typically involve thousands of subjects. This is because the relative impact of any one variant tends to be minor, and because the large number of independent statistical tests performed demands a stringent significance threshold, which in turn requires high statistical power. Over the past decade, such GWAS have been cobbled together from collections of existing DNA samples, and collectively they have identified from the low dozens to the low hundreds of genes associated with many of the most common human diseases. Many of the discovered genes have turned out to be targets of existing drug therapies, and this insight has reoriented many pharmaceutical companies toward drug discovery projects based on these novel genetic targets. Because the function of most of the ~20,000 human genes remains entirely unknown, these results also provide clues and years of follow-up research to academic biologists who have begun the slow task of understanding the function(s) of the implicated genes.
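The link between the number of tests and the required sample size can be made concrete with a small sketch. It assumes a simple Bonferroni correction and a 1-degree-of-freedom allelic chi-square test; the allele counts are invented purely for illustration and do not come from any study described here, and the function names are hypothetical.

```python
import math

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def allelic_chi2(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value). For 1 df the upper-tail probability is
    erfc(sqrt(chi2 / 2)), so only the standard library is needed.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    chi2 = numerator / denominator
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Threshold for 1,000,000 tests at a family-wise alpha of 0.05.
threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical SNP: variant allele frequency 0.30 in cases vs 0.27 in controls.
# 2,000 cases and 2,000 controls (4,000 alleles per group).
chi2_small, p_small = allelic_chi2(1200, 2800, 1080, 2920)
# Same frequencies with 20,000 cases and 20,000 controls.
chi2_large, p_large = allelic_chi2(12000, 28000, 10800, 29200)

print(f"threshold={threshold:.1e}")
print(f"small study: chi2={chi2_small:.2f}, passes={p_small < threshold}")
print(f"large study: chi2={chi2_large:.2f}, passes={p_large < threshold}")
```

With these made-up counts, the same modest frequency difference fails the corrected threshold in the smaller study and passes it in the larger one, which is the reason sample size dominates GWAS design.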
These initial efforts, which relied on opportunistic sample collections, have been followed by efforts to develop prospective “biobanks” in which genomic information is linked to electronic clinical data. One such effort was pioneered at my home institution, UCSF, by (among others) Professor Pui Kwok (also my academic host at the Academia Sinica here in Taiwan) in a collaboration with Kaiser Permanente. To date, >100,000 Kaiser patients have been genotyped, with data linked to longitudinal electronic health record (EHR) data, resulting in numerous publications and discoveries. More recently, similar efforts have taken shape, including at Vanderbilt University, Geisinger Health, the Million Veteran Program, the UK Biobank, and the All of Us study of the NIH. Each of these studies has strengths and weaknesses; the most important weakness is the lack of uniform, high-quality longitudinal EHR data, which are present in only a few places, such as Kaiser and the VA, and in Taiwan.
The Academia Sinica (AS), in collaboration with 12 hospitals throughout Taiwan, has launched the Taiwan Precision Medicine Initiative (TPMI) with plans to enroll and genotype 300,000 people initially, followed by 1 million people by early 2022. The eventual goal is to genotype nearly all 23 million people in Taiwan within 6 to 10 years, to empower genetic discovery and to bring genetic profiles into everyday clinical practice. All patients give consent to release their de-identified electronic health records and genetic profiles to a central research database housed at AS. In addition, patients complete a survey about their daily activities, health habits, and environmental exposures. Most are offered wearable devices to capture other data such as physical activity, heart rhythm, O2 saturation, etc. Because of Taiwan’s relatively homogeneous genetic background, advanced medical care system, comprehensive electronic health records, and large cohort size, the dataset generated by the TPMI will be the largest and richest for genetic discovery in the world. When the number of genotyped patients exceeds 300,000, it will be one of the largest cohorts in the world and will have the richest and longest electronic medical records anywhere. By the time 1,000,000 people have been enrolled and genotyped in 2022, it will be the largest cohort in the world; and researchers in Taiwan will be able to identify Taiwan-specific, East-Asia specific and some more global genetic associations for common diseases, novel biomarkers, and drug targets. The data can also form the basis for international consortia to identify genetic associations for less common diseases.
Current genetic knowledge will allow for some instances of drug dosing guided by pharmacogenetics, prevention of known adverse drug reactions, stratified disease screening and tailored management based on genetic profile. The genetic profile also includes HLA & blood typing, which will improve tissue matching for transplantation. Patient care may be improved, and overall medical costs could be expected to decrease over time.
Taiwan established its National Health Insurance (NHI) system in 1995. This single-payer system serves 99.9% of the population and covers both in- and out-patient services. From its earliest years it established an electronic health records (EHR) system. As a result, Taiwan has one of the most comprehensive and longest-running population-scale EHR systems of any jurisdiction in the world (including the U.S., where a minority of health encounters are captured by any form of EHR even to this day), and by far the most comprehensive in Asia, including Japan and Korea. For many Taiwanese, the EHR extends for nearly 20 years into the past. It is impossible to overstate the value and uniqueness of this lengthy EHR history, which covers such a large proportion of the population across all incomes, geographic regions, and individual lifestyles, for the benefit of biomedical research and for genetics in particular.
At the same time, with funding commitments from individual hospitals ($30 million over 5 years) and AS ($10 million), the genome-scale SNP genotyping activity that is already underway at the direction of my host, Dr. Pui Kwok, will establish one of the largest and most powerful research databases of phenotypic and genomic data for biomedical discovery anywhere in the world. Because most studies to date have been conducted in the U.S. and Europe, this study will provide important and novel insights into East and Southeast Asian populations, who together account for more than one quarter of humanity.
During the term of my fellowship, I had proposed to make use of this unparalleled resource, including the genotyping data from an initial 300,000 subjects, in the following three areas:
- Performing population genetic analyses of the structure of the Taiwanese Chinese population. Taking advantage of my earliest population genetics training with R.C. Lewontin at Harvard and L.L. Cavalli-Sforza at Stanford, and building on my earlier work, I had hoped to conduct an initial genetic-geographic survey of the structure of the Taiwanese population. This is important for the design and analysis of nested case-control GWAS from the TPMI dataset, in which cases and controls must be matched or adjusted for all but their case/control status. It also helps develop an understanding of the linkage disequilibrium structure of the population, aiding the imputation of >20 million SNP genotypes from the 750K SNP backbone. And it builds on current understanding of the origins of the Taiwanese population on the Chinese mainland and in some cases beyond.
- Conducting case-control analyses of metabolic traits, especially of T2D, obesity, serum lipids, and blood pressure. Given the already high and increasing incidence of these conditions, the initial 300K person cohort will be uniquely powerful for driving discovery efforts relevant to East Asian populations and will build on my extensive publication history of research in this area. Given the magnitude of medical expenditure on these common problems, it will be a key area of national interest, and testing the efficacy of early intervention based on genotypic risk will be highly innovative.
- Designing and planning the analysis of infectious disease-related traits, which depends heavily on spending the time to assess the type and quality of data available in the EHRs. This encompasses my primary interest in delineating the genetic variants that may explain the considerable variation in severity and outcomes of infections, even in epidemic situations where there is virtually no variation in the pathogen. Two approaches are possible. One is to examine rare but very severe outcomes of relatively common pathogens, as in my recent studies of acute liver failure caused by hepatitis A virus and of neuroinvasive disease caused by West Nile virus. A second approach would be to look at relatively common infections in Taiwan, such as tuberculosis or hepatitis B (although both are rapidly declining from previously high levels due to public health interventions and a vaccine, respectively), and to identify the genetic risk variants that may explain why only about 10% of the infected population will eventually go on to serious lung or liver disease, respectively.
I have chosen to concentrate particularly on disorders of immunity and metabolism for several reasons. First is my belief that deaths from epidemics (as we have been reminded so dramatically by the COVID-19 pandemic that has impacted all of us during our time in Taiwan this year) and from famine are likely to have been the two greatest selective forces in our evolutionary past. This leads to the expectation that the magnitude of genetic effects contributing to susceptibility to infection, autoimmunity, and metabolic disease is significant, and probably larger than for many other complex diseases. This should increase the likelihood of identifying relevant genes via population-based association studies, and should serve as a better testing ground for methodology that might then be more successfully applied to diseases with more subtle genetic etiologies. It also leads to the attractive hypothesis (which my research program aims to test) that our adaptations to survive infections and periods of food scarcity have left us maladapted to modern life, in which infectious mortality has been reduced by improvements in hygiene, antibiotics/antivirals, and vaccines, and in which an overabundance of food poses a greater threat to the health of a growing fraction of the global population (in the form of obesity, dyslipidemia, high blood pressure and T2D) than does its scarcity. In addition to their role in susceptibility to infections and overt autoimmune conditions, immune genes are now known to play key roles in many cancers, in allergic and hyper-responsive disorders of rapidly increasing incidence such as asthma, in metabolic disease/diabetes, as well as in cardiac and vascular disease.
By using pathogens, vaccines, and autoimmune diseases as probes of functionally relevant immunogenetic variation, I believe we can gain a broader understanding of numerous other diseases that all converge in one way or another on the nexus of immune genes – and my research program at UCSF seeks to uncover the genetic underpinnings of both immune-related and metabolic diseases of humans.
So I had set ambitious goals for a 7-month fellowship (although I spent nearly 12 months here in Taiwan, all told), based on the even larger and even more ambitious TPMI; and while I certainly didn’t accomplish them all, I did engage in a particularly productive collaboration concerning the ability of individuals to successfully resolve a hepatitis B infection vs. remaining chronically infected. And certainly I have laid considerable groundwork for longer-term studies that I expect will continue collaboratively after my departure from Taiwan.
The hepatitis B virus is a simple DNA virus whose genome encodes just 3 or 4 genes. Yet, over a lifetime of chronic infection, it can wreak dreadful havoc on the liver, the primary tissue it is able to infect, first by causing cirrhosis and, in many, liver cancer. Still, some 90% of us can quickly rid ourselves of the infection, often with few to no symptoms. What genetic variation in what genes (or other factors) determines these very different outcomes of the same infection? (We can pose the same question about nearly any other infectious disease, including the COVID-19 that is causing such consternation during our fellowship year here. While “only” about 2% of infections are fatal, there is quite a large fraction of the population that can become infected and exhibit no symptoms at all, while others, perhaps the majority, have a mild cold-like illness. Not all of the mortality is due to the waning immunity of old age or secondary health conditions, since young and otherwise healthy people have also succumbed to the infection.)
For whatever historical reasons, Taiwan has been unusually severely impacted by hepatitis B virus. Among 97,759 individuals for whom we had available genetic and hepatitis B infection data, just over one-half (49,914) had evidence of a prior, yet successfully cleared infection. More than 10% (10,373) had a chronic infection, while nearly 20% (19,049) had been vaccinated and are protected from infection. With a colleague at the Genome Research Center at Academia Sinica, Dr. Hway-I Yang, I completed an initial analysis of these data with one highly significant finding. We identified variation in a set of genes referred to as the Human Leukocyte Antigen (HLA) complex, one of the most variable portions of our genome. The task of these genes is to bind protein fragments from infecting pathogens and present them to specialized immune cells, especially T cells, which in turn help B cells produce the antibodies that fight infection. But this is only the beginning, and as more data become available we will continue our search ever deeper into the human genome, with the expectation that the number of genes associated with outcomes of hepatitis B virus infection will increase.
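As a quick arithmetic check on the counts reported above, the short sketch below recomputes each group's share of the 97,759 individuals. The category labels are shorthand chosen for this example. The three reported groups do not sum to the total, and the text does not describe the remainder, so it is left unlabeled here rather than interpreted.

```python
# Counts as reported in the text; labels are shorthand for this sketch only.
total = 97_759
counts = {
    "cleared prior infection": 49_914,
    "chronic infection": 10_373,
    "vaccinated": 19_049,
}

# Share of the full cohort in each reported group.
shares = {label: n / total for label, n in counts.items()}

# Individuals not covered by the three reported groups (not described in the text).
remainder = total - sum(counts.values())

# Among those with evidence of infection (cleared or chronic),
# the fraction who remained chronically infected.
ever_infected = counts["cleared prior infection"] + counts["chronic infection"]
chronic_among_infected = counts["chronic infection"] / ever_infected

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"remainder: {remainder}")
print(f"chronic among ever infected: {chronic_among_infected:.1%}")
```

The shares agree with the wording in the text (just over one-half, more than 10%, nearly 20%).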
Managing Editor: Yi-Feng Huang 黃一峯