Scientists are beginning the world’s largest study of proteins circulating in human blood in an effort to better understand the development and treatment of disease.
Launched today as part of an ongoing collaboration between the British health study UK Biobank and more than a dozen pharmaceutical companies, the project aims to measure levels of 5400 proteins in blood samples from half a million people. For some of those people, researchers will analyze two separate samples taken years apart, generating a “first-of-its-kind” database on how a person’s changing protein levels influence disease later in life, organizers say. The companies backing the project with tens of millions of dollars and carrying out the initial analysis of 300,000 blood samples will get an exclusive 9-month window of access to the data, after which the information will be made more widely available to UK Biobank–approved research teams.
Scientists not involved in the project say the vast amount of data generated could aid development of blood tests that detect disease before symptoms appear and identify new drug targets for illnesses. “This will be an exceptionally powerful resource for understanding health and disease,” says Eleftheria Zeggini, director of the Institute of Translational Genomics at Helmholtz Munich.
Since 2006, the UK Biobank has collected biological samples, medical images, and health data from more than 500,000 volunteers. It periodically releases batches of anonymized health information along with genetic and other data to approved researchers. More than 20,000 scientists from about 50 countries are currently working on the data.
Many groups looking for disease biomarkers or drug targets have assessed blood proteins in various patient populations and healthy individuals, but not at the scale the UK Biobank offers. Its addition of protein data began in 2020, when 13 biopharmaceutical companies working as the UK Biobank Pharma Proteomics Project analyzed concentrations of nearly 3000 proteins in the blood of 54,000 participants. In a 2023 paper, this consortium identified more than 14,000 associations between particular genetic variants and the levels of proteins in the blood—about 80% of which were previously unknown.
That proteomic data set, already the largest of its kind, has since been used to pinpoint biomarkers and potential drug targets for conditions including breast cancer and Parkinson’s disease. “One of the most exciting applications of these data has been training AI models for disease prediction,” adds Ryan Dhindsa, a geneticist and consultant for AstraZeneca. He and other researchers have done this using machine learning algorithms to look for links between protein levels, among other factors, and a person’s risk of illness.
The next phase of the project will ramp up that work 10-fold, scanning initial blood samples from the UK Biobank’s half-million participants, plus follow-up samples from 100,000 of them taken up to 15 years later. This expanded data set will provide better resolution to investigate diseases that have often “fallen below the radar” in smaller studies, such as polycystic ovary syndrome, motor neuron disease, and certain kidney and bone cancers, Claudia Langenberg, an epidemiologist at Queen Mary University of London and a collaborator on analyses of the earlier data set, told reporters at a prelaunch briefing on Thursday.
Work on the first 300,000 samples (250,000 initial samples and 50,000 follow-ups) is expected to take about a year, with funding from 14 pharma companies, including industry giants such as AstraZeneca, Pfizer, and GSK. The UK Biobank has not yet secured funding for the remaining 300,000 samples, according to organizers.
Researchers are keen to get their hands on the data when they become available. “We will definitely jump on it,” says Cornelia van Duijn, a genetic epidemiologist at the University of Oxford who works with UK Biobank CEO Rory Collins and some of the pharma companies involved. Studying protein levels can offer insights into a person’s health that their genome, which is far more fixed over a person’s lifetime, cannot, she adds. In particular, identifying proteins that rise or fall at the onset of disease could help clinicians detect when someone is starting to become ill and might benefit from early treatment.
Yale University statistical geneticist Hongyu Zhao cautions that the UK Biobank data have known limitations that have posed challenges in genetic studies and could also make it hard to pin down certain proteins’ roles. Almost all participants are of European descent, for example, and they tend to be healthier than the general population, meaning researchers should err on the side of caution in interpreting results and try to replicate findings in other populations where possible. Nevertheless, the announcement is “great news,” he says. “This is going to create a lot of opportunities. [It’s] an invaluable resource.”
