Everywhere they go, humans leave stray DNA. Police have used genetic sequences retrieved from cigarette butts and coffee cups to identify suspects; archaeologists have sifted DNA from cave dirt to identify ancient humans. But for scientists aiming to capture genetic information not about people, but about animals, plants, and microbes, the ubiquity of human DNA, combined with the ability of even partial sequences to reveal information most people would want to keep private, is a growing problem, researchers from two disparate fields warn this week. Both groups are calling for safeguards to prevent misuse of such human genomic “bycatch.”
Genetic sequences recovered from water, soil, and even air can reveal plant and animal diversity, identify pathogens, and trace past environments, sparking a boom in studies of this environmental DNA (eDNA). But the samples can also contain significant amounts of human genes, researchers report today in Nature Ecology & Evolution. In some cases, the DNA traces were enough to determine the sex and likely ancestry of the people who shed them, raising ethical alarms.
Similarly, scientists have for decades analyzed the genetic information in fecal matter to reveal the microbes in people’s intestines—the gut microbiome, which plays dramatic roles in human health and development. Because the amount of microbial DNA in a stool sample is so much greater than the amount of human DNA, which is often degraded, most researchers have assumed that any recovered sequences don’t contain significant genetic information about the sample provider. But enough is present, according to an analysis published today in Nature Microbiology, to potentially identify the donor’s sex, likely ancestry, certain disease risks, and, when linked to other databases, even their full identity. Computer programs commonly used to filter out human genetic sequences from the microbiome data did not eliminate the problem, the researchers found.
“I see this as a major ethics problem for the whole field,” says microbiome pioneer Rob Knight of the University of California, San Diego (UCSD), who wrote an accompanying commentary. “We will have to completely rethink how we communicate to research subjects about the privacy risks of participating in microbiome research.”
The human genetic bycatch issue could have far-reaching consequences. Knight says his group may need to take down all of the genetic sequences it has posted to public databases to remove more of the human DNA. The issue will also complicate data sharing among eDNA researchers and could mean that ecologists and environmental scientists, who are not used to getting permits to study people, will need another set of ethics approvals. eDNA or microbiome data from groups, such as Indigenous peoples, who have long been concerned about the unintended consequences of genetic data, could be especially sensitive. “We’re moving in a direction and at a pace that we really need to recalibrate” the ethics and legal rules surrounding such data, says Keolu Fox, a genomic scientist at UCSD who is Kānaka Maoli (Native Hawaiian) and who has long advocated for Indigenous rights in genetic research.
Jessica Farrell, David Duffy, and their colleagues at the University of Florida were using eDNA to probe herpes virus infections that cause tumors in sea turtles when they started to worry about human bycatch. They used a particularly powerful sequencing method to identify DNA in samples of sand from nesting sites and in water from tidal estuaries and tanks at the university’s Sea Turtle Hospital. They found the turtle virus and turtle DNA, but they also found long stretches of human DNA, intact enough that sequencing programs could easily recognize X and Y chromosomes.
Startled by the abundant human DNA in their samples, they received ethics board permission to look for human DNA in a variety of other environments. They found human DNA in samples from the Avoca River in Ireland; seawater from near the coast of St. Augustine, Florida; sand from a footprint; and air in a room where people were working. In many of the samples, the genetic traces were enough to identify the sex and likely ancestry of the person who left them, and some revealed genetic variants associated with disease risk.
In contrast, human geneticists Yukinori Okada and Yoshihiko Tomofuji of Osaka University and their colleagues set out from the start to see whether stray human sequences in their microbiome data posed a privacy problem. They had sequenced both the genomes and fecal microbiomes of 343 people to study how a person’s genes might correlate with and possibly influence their gut flora, and they wanted to be sure that if they shared any of their data, the donors would not be identifiable. What they found was not reassuring. For 97% of the samples, the microbiome data contained enough human genetic information to correctly predict the donor’s sex.
Almost all the donors could also be reidentified from single nucleotide polymorphisms (SNPs) in the fragments of human DNA present in the microbiome data. SNPs are specific points in the genome where the DNA sequence typically varies between people; they are the basis of DNA fingerprinting, and certain SNPs can also determine susceptibility to some diseases. The microbiome data didn’t include enough information for a standard DNA fingerprint match, but by applying advanced statistical methods to the combinations of SNPs in each sample, the researchers matched 320 of the 343 fecal samples to the correct donor. Computer programs designed to screen for and filter out human DNA sequences reduced, but did not eliminate, the problem; after filtering, the group could still identify as many as 11% of the samples.
They could also confirm the East Asian ancestry of all but six of their 343 donors. When they used the same techniques on publicly available gut microbiome data sets from Europe, South Asia, and East Asia and assumed the donor’s ancestry matched the data’s region, they could correctly predict ancestry between 80% and 92% of the time. As a final step, the scientists did additional “ultradeep shotgun” sequencing on five of the fecal samples. They identified genotypes associated with inflammatory bowel diseases, type 2 diabetes, and other, even rarer conditions. Such information is usually considered highly sensitive and private.
Knight says the analysis shows that the current methods microbiome researchers use to filter out human DNA and anonymize samples simply don’t work well enough. Researchers also need to re-evaluate how widely microbiome-derived sequences can be shared, he adds. “We are currently looking at withdrawing all the human metagenomic data sets we ever deposited, so we can redeposit only the sequences that positively match a microbe,” he says.
The power to extract personal data from eDNA and microbiome samples will continue to increase, both groups of authors warn. That raises concerns about misuse by police or other government agencies, collection by commercial companies, or even mass genetic surveillance, says Natalie Ram, a law and bioethics scholar at the University of Maryland Francis King Carey School of Law. In the United States, she says, researchers and funding agencies should make greater use of federal Certificates of Confidentiality. They prohibit the disclosure of “identifiable, sensitive research information” to anyone not connected with a study, such as law enforcement, without the subject’s consent.
The National Institutes of Health automatically issues such certificates for federally funded health-related research, but could expand the range of studies that are covered, Ram says. In Europe, says Ewan Birney, a bioinformaticist and deputy director general of the European Molecular Biology Laboratory, existing data protection laws should help protect against misuse of human genomic bycatch while still allowing research to proceed. “It would be a bad thing for the world if we were not able to share eDNA samples,” he says.
Yves Moreau of KU Leuven, who studies both artificial intelligence and genetics and has warned that China is using mass collections of human DNA to help suppress minorities, agrees that any limits on research need to be carefully weighed against the potential benefits. At the same time, he says, “it is important to put this on everybody’s radar so that emerging risks and potential abuses can be identified early.”
Fox concurs. “Which companies and governments are going to pay and license to have poop-based surveillance technology?” he asks. “Imputing people’s identity based on their poop is compelling and interesting, for a number of reasons, and most of them are all the wrong reasons.”