Illustration by Valero Doval

Health

ZIP code or genetic code?

Researchers tap massive database to determine the effects of genes and environment in 560 common conditions

When it comes to disease and health, which is more powerful — ZIP code or genetic code?

The degree to which nature and nurture affect disease and health remains one of the eternal — and still unanswerable — questions in medicine.

Now a team of investigators from Harvard Medical School (HMS) and the University of Queensland in Australia has tackled this question in a decidedly novel way.

In what the researchers describe as a coup for big data and a scientific first, the team has used a massive insurance database of nearly 45 million people in the U.S., including thousands of twin pairs, to determine the effects of genes and environment in 560 common conditions. The diseases analyzed span 23 categories, ranging from cardiovascular illness and neuromuscular diseases to skeletal conditions.

The work, published Jan. 14 in Nature Genetics, is thought to be the largest assessment of U.S. twins to date, the researchers said. It is also the first one to go beyond the traditional one-disease-at-a-time approach and analyze hundreds of the most common conditions among more than 56,000 twin pairs. To date, most twin or familial studies of genes and environment have looked at just a single disease or environmental factor.

Classic inherited conditions are caused strictly by mutations in a gene or a set of genes, whereas environmentally fueled conditions are the sole result of factors external to an individual’s biology. Most diseases do not fall neatly into either category and instead arise from a complex interplay between the two. Disentangling how genes and environment contribute to multiple diseases in the same population has been extraordinarily difficult, the researchers said. The new study aims to address this challenge with a large-scale analytical approach.

“The nurture-versus-nature question is very much at the heart of our study. We foresee the value of this type of large-scale analysis will be in shining light on the relative contribution of genes versus shared environment in a multitude of diseases,” said senior study author Chirag Patel, assistant professor of biomedical informatics in the Blavatnik Institute at HMS.

The new method, the team said, underscores the value of large-scale analyses in informing national research efforts such as the National Institutes of Health’s All of Us program, part of the Precision Medicine Initiative that aims to tease out biologic, genetic, social, and environmental factors in disease and health as a way to inform individualized therapies. The findings of the new study can help direct research efforts by clarifying the relative influence of genetic versus environmental factors for a range of diseases.

“Our findings can provide signposts that inform subsequent research efforts and help scientists narrowly focus their pursuits,” said study first author Chirag Lakhani, a postdoctoral research fellow in biomedical informatics at the Blavatnik Institute. “For example, if our study of twins shows that there is very little heritability effect in a certain family of eye disorders, then future research should pursue alternative explanations.”

Using the database of 45 million-plus patient records — which also included more than 724,000 sibling pairs — the investigators estimated the influence of genes and environment in fraternal twins, who share half of their genome, or DNA, and identical twins, whose DNA is 100 percent the same. Same-sex twins can be either identical or fraternal, while opposite-sex twins are always fraternal, but the researchers did not know which same-sex pairs were identical. To circumvent this hurdle, they developed a novel statistical method that inferred the probability that a pair of twins was fraternal (non-identical) or identical. In doing so, the researchers were able to separate purely genetic from nongenetic contributions.
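The reasoning in this paragraph can be illustrated with two classical textbook tools. This is only a sketch and is not the statistical method the researchers developed. It assumes Weinberg's differential rule (fraternal pairs are about equally likely to be same-sex or opposite-sex) and Falconer's formula for splitting twin similarity into genetic and shared-environment parts. The function names and every number in the usage note are hypothetical.

```python
def prob_identical_given_same_sex(n_same_sex: int, n_opposite_sex: int) -> float:
    """Estimate P(identical | same-sex pair) with Weinberg's differential rule.

    Assumption: fraternal pairs are same-sex about as often as opposite-sex,
    so the expected number of fraternal same-sex pairs equals n_opposite_sex.
    """
    if n_same_sex <= 0:
        raise ValueError("n_same_sex must be positive")
    p = 1.0 - n_opposite_sex / n_same_sex
    return min(1.0, max(0.0, p))  # clamp to a valid probability


def falconer_from_mixture(r_same_sex: float, r_opposite_sex: float,
                          p_identical: float) -> dict:
    """Split twin similarity into genetic and shared-environment parts.

    Assumptions: opposite-sex pairs stand in for all fraternal pairs, and the
    same-sex correlation is a mixture of identical and fraternal correlations.
    """
    if not 0.0 < p_identical <= 1.0:
        raise ValueError("p_identical must be in (0, 1]")
    r_fraternal = r_opposite_sex
    # Undo the mixture: r_same_sex = p * r_identical + (1 - p) * r_fraternal
    r_identical = (r_same_sex - (1.0 - p_identical) * r_fraternal) / p_identical
    return {
        "heritability": 2.0 * (r_identical - r_fraternal),       # Falconer's h^2
        "shared_environment": 2.0 * r_fraternal - r_identical,   # Falconer's c^2
        "unique_environment": 1.0 - r_identical,                 # remainder, e^2
    }
```

For example, with made-up counts of 30,000 same-sex and 20,000 opposite-sex pairs, `prob_identical_given_same_sex(30000, 20000)` estimates that about one-third of same-sex pairs are identical. That probability can then be passed to `falconer_from_mixture` together with the observed same-sex and opposite-sex correlations for a given condition.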

All the patients had been part of the insurance database for at least three years, giving the researchers more than just a snapshot in time. However, the newly published study, which involved young twin pairs from newborns to 24 years of age, was not designed to follow disease development over time, so the researchers were unable to assess genetic and environmental influences on diseases that tend to develop in middle and older age, such as cardiovascular and neurodegenerative conditions.

The analysis included variables such as clinical diagnoses, imaging test results, blood chemistry tests like red and white blood cell counts, cholesterol levels, and many others, as well as environmental factors inferred from the patients’ ZIP codes, such as air pollution levels, climate conditions, and socioeconomic status.

About 40 percent of the diseases in the study (225 of 560) had a genetic component, while 25 percent (138) were driven at least in part by factors stemming from sharing the same household, social influences, and the like. Cognitive disorders demonstrated the greatest degree of heritability, with four out of five diseases showing a genetic component, while connective tissue diseases had the lowest degree of genetic influence. Of all disease categories, eye disorders carried the highest degree of environmental influence, with 27 of 42 diseases showing such an effect. They were followed by respiratory diseases, with 34 out of 48 conditions showing an effect from sharing a household. The disease categories with the lowest environmental influence were reproductive illnesses, with three of 18 conditions showing such an effect, and cognitive conditions, with two out of five showing an influence.
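For readers who want to check the arithmetic, the counts quoted in this paragraph can be turned into percentages with a few lines of Python. This is a minimal sketch: the dictionary labels are informal shorthand chosen here, and only the counts come from the text.

```python
# (conditions showing an effect, conditions in the group), as quoted in the text
counts = {
    "all diseases, genetic component": (225, 560),
    "all diseases, shared-environment component": (138, 560),
    "cognitive, genetic component": (4, 5),
    "eye, shared-environment component": (27, 42),
    "respiratory, shared-environment component": (34, 48),
    "reproductive, shared-environment component": (3, 18),
    "cognitive, shared-environment component": (2, 5),
}


def share(hits: int, total: int) -> float:
    """Return hits as a percentage of total."""
    return 100.0 * hits / total


for label, (hits, total) in counts.items():
    print(f"{label}: {hits}/{total} = {share(hits, total):.1f}%")
```

Raw shares like these are only one way to compare categories. The ranking reported by the researchers may rest on estimated effect sizes rather than on counts of conditions.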

Overall, socioeconomic status, climate conditions, and air quality in each twin pair’s ZIP code had a far weaker effect on disease than genes and shared environment — a composite measure of external, nongenetic influences including family and lifestyle, household, and neighborhood.

In total, 145 of 560 diseases were modestly influenced by socioeconomic status derived from ZIP code. Thirty-six diseases were influenced at least in part by air quality, and 117 were affected by changes in temperature. The condition with the strongest potential link to socioeconomic status was morbid obesity. While obesity undoubtedly has a genetic component, the researchers said, the findings raise an important question about the influence of environment on genetic predispositions.

“This finding opens up a whole slew of questions, including whether and how a change in socioeconomic status and lifestyle might compare against genetic predisposition to obesity,” Patel said.

Lead poisoning was, not surprisingly, entirely driven by environment. Conditions such as flu and Lyme disease were, again unsurprisingly, affected by differences in climate.

When researchers looked at classes of diseases by monthly health care spending, they found that both genes and environment significantly contributed to cost of care, with the two being nearly equal drivers of spending. Almost 60 percent of monthly health spending could be predicted by analyzing genetic and environmental factors.

Large-scale analysis like this study can help forecast long-term spending for various conditions and inform resource allocation and policy decisions, the researchers said.

Detailed study results are available here: http://apps.chiragjpgroup.org/catch/

Co-investigators were Braden Tierney and Arjun Manrai of Harvard Medical School, and Jian Yang and Peter Visscher of the University of Queensland, Australia.

Data sets for the study were provided by Aetna insurance company. Aetna had no funding role in the study. The research was supported by the Australian National Health and Medical Research Council (grants 1078037 and 1113400), the National Science Foundation (grant 1636870), and the Sylvia and Charles Viertel Charitable Foundation.