Artificial intelligence (AI) has already revolutionized the study of proteins by predicting their 3D structures, which are key to their function. Now, AI is beginning to turn its power on much smaller molecules: the drugs, herbicides, and catalysts at the heart of medicine, agriculture, and industrial chemistry.
Today in Science, researchers report that a new AI tool can determine the structures of small molecules even from spotty data. The tool can pick out patterns in troves of measurements that were previously discarded as too poor to use. The approach could make it easier for chemists to develop a wide range of compounds central to modern life.
“That is a game changer,” says Horst Puschmann, a small molecule crystallographer at Durham University who was not involved with the work.
AI’s recent progress in predicting protein structures has come about largely because of the availability of vast sets of training data. Researchers know the amino acid sequences of a huge number of proteins, which can be read from the DNA of the genes that encode them, and for many of these proteins they also know the precise 3D structure. With these two data sets, scientists can train an AI to accurately predict a new protein’s unknown 3D shape from its sequence alone.
But small molecules present a more difficult challenge, says Anders Madsen, a small molecule crystallographer at the University of Copenhagen. Although researchers can compute rough 3D structures for these molecules from their chemical formulas alone, working out the precise structure is often impossible because many slightly different arrangements are equally plausible.
To identify the actual structures, researchers typically turn to x-ray crystallography. They start by converting a purified batch of a small molecule into a solid crystal, where all the copies line up in a repeating pattern like fruit stacked in a grocery display. Scientists then fire a beam of x-rays at the crystal. Electrons surrounding the molecule’s atoms scatter the x-rays, producing a “diffraction pattern” of spots recorded by detectors. By analyzing these patterns, researchers can map where the electrons are concentrated and work out the arrangement of the atoms.
Getting a good structure also requires knowing the “phase” of each diffracted x-ray wave, which describes where that wave is in its cycle of peaks and troughs relative to the others. Although detectors can measure the intensity of the x-rays (essentially the number of photons in each spot), they can’t measure their phase. Researchers must instead make educated guesses about the most likely phase values and keep the ones that best fit the data. But all too often scientists wind up with poor crystals that generate fuzzy diffraction patterns, which make it impossible to pin down the phases. “You put junk in, you get junk out,” says Andrew Bond, a small molecule crystallographer at the University of Cambridge.
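Why the missing phase matters can be seen in a one-dimensional toy model. The Python sketch below assumes a made-up “molecule” of three Gaussian atoms on a 32-point grid and treats each Fourier coefficient of its electron density as one diffraction spot; `structure_factors`, `density_from`, and `gaussian_atom` are hypothetical helpers, not part of any crystallography package. Combining the measured amplitudes with the true phases rebuilds the density, while the same amplitudes with every phase set to zero give a different, wrong map.

```python
import cmath
import math


def structure_factors(density):
    """Discrete Fourier transform of a 1D electron-density map.

    In this toy model each coefficient F[h] stands in for one diffraction spot.
    """
    n = len(density)
    return [
        sum(rho * cmath.exp(-2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def density_from(factors):
    """Inverse transform: rebuild a density map from (complex) structure factors."""
    n = len(factors)
    return [
        (sum(f * cmath.exp(2j * math.pi * h * x / n) for h, f in enumerate(factors)) / n).real
        for x in range(n)
    ]


def gaussian_atom(n, centre, weight, width=1.0):
    """Periodic Gaussian blob standing in for one atom's electron cloud (hypothetical)."""
    return [
        weight * math.exp(-min(abs(x - centre), n - abs(x - centre)) ** 2 / (2 * width**2))
        for x in range(n)
    ]


N = 32
# Hypothetical 1D "molecule": three atoms with different electron counts.
true_density = [
    a + b + c
    for a, b, c in zip(
        gaussian_atom(N, 5, 6.0), gaussian_atom(N, 12, 8.0), gaussian_atom(N, 20, 7.0)
    )
]

F = structure_factors(true_density)
intensities = [abs(f) ** 2 for f in F]       # what a detector records
amplitudes = [math.sqrt(i) for i in intensities]
true_phases = [cmath.phase(f) for f in F]     # what the detector cannot record

# Measured amplitudes + true phases: the original density comes back.
recovered = density_from([a * cmath.exp(1j * p) for a, p in zip(amplitudes, true_phases)])

# Same amplitudes, every phase guessed as zero: a different, wrong map.
guessed = density_from(amplitudes)
```

Real crystallography works in three dimensions with thousands of reflections, but the bookkeeping is the same: the intensities fix only the size of each structure factor, and the rest of the information rides on the phases.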
However, AI is often able to see patterns in fuzzy data that are invisible to researchers. Madsen and his colleagues set out to see whether that was the case here. They did so by working backward, using a computer model to concoct millions of made-up structures of small molecules and to compute the fuzzy diffraction patterns poor crystals would produce. The AI started with random phase values and iterated until it landed on phases that, when combined with the fuzzy intensity data, produced the correct structure.
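“Start from random phases and iterate” also has a classical, non-AI counterpart: the error-reduction algorithm of Gerchberg-Saxton and Fienup. The sketch below is a swapped-in textbook method, not the authors’ neural network, and it assumes a 1D toy structure of point atoms; all function names are hypothetical. Each cycle forces the map to be real and non-negative, then keeps only the resulting phases. The amplitude mismatch it records cannot increase from one cycle to the next, but this method often stalls short of the true structure, which is part of why better phasing tools are valuable.

```python
import cmath
import math
import random


def dft(values):
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * h * x / n) for x, v in enumerate(values))
        for h in range(n)
    ]


def idft(coeffs):
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * math.pi * h * x / n) for h, c in enumerate(coeffs)) / n
        for x in range(n)
    ]


def error_reduction(amplitudes, cycles=50, seed=0):
    """Classical error-reduction phasing (hypothetical helper, toy 1D version).

    Alternates two constraints: the density must be real and non-negative,
    and its transform must match the measured amplitudes.
    Returns the final phases and the amplitude mismatch recorded each cycle.
    """
    rng = random.Random(seed)
    phases = [rng.uniform(-math.pi, math.pi) for _ in amplitudes]  # random start
    errors = []
    for _ in range(cycles):
        # Combine the measured amplitudes with the current phase guess.
        trial = [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)]
        # Real-space constraint: keep the real part and clip negative density to zero.
        density = [max(v.real, 0.0) for v in idft(trial)]
        # Back to reciprocal space; measure how far the amplitudes drifted.
        model = dft(density)
        errors.append(sum((abs(m) - a) ** 2 for m, a in zip(model, amplitudes)))
        # Keep the model's phases, discard its amplitudes.
        phases = [cmath.phase(m) for m in model]
    return phases, errors


# Usage on a hypothetical structure of three point atoms on a 32-point grid.
N = 32
true_density = [0.0] * N
for position, electrons in [(5, 6.0), (12, 8.0), (20, 7.0)]:
    true_density[position] = electrons
amplitudes = [abs(f) for f in dft(true_density)]
phases, errors = error_reduction(amplitudes)
```

The AI described here replaces the hand-written real-space constraint with patterns learned from training data, which is what lets it cope with fuzzy data where a loop like this one tends to get stuck.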
At that point, the researchers had the inputs (intensity and phase information) and outputs (3D structures) for millions of hypothetical molecules. They used these examples to train their AI to find patterns linking the intensity data to the phase information that, together, yield the right structures.
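Generating such training examples can be sketched in a few lines. This is a minimal illustration under loud assumptions: a 1D toy world of point atoms, and “poor crystal” modeled simply as discarding reflections with index above a cutoff. The article does not describe the researchers’ exact simulation, and `make_training_pair` is a hypothetical name. Each example maps intensities (the model’s input) to phases (its target).

```python
import cmath
import math
import random


def make_training_pair(rng, n_grid=32, n_atoms=4, max_index=6):
    """Build one (intensities, phases) example for a toy 1D phasing model.

    A hypothetical molecule is a few point atoms at random grid positions
    with random electron counts. Keeping only reflections 0..max_index
    mimics the low-resolution data from a poor crystal (an assumption).
    """
    positions = rng.sample(range(n_grid), n_atoms)
    electrons = [rng.randint(1, 8) for _ in positions]
    intensities, phases = [], []
    # Only h >= 0 is stored: for a real density, F(-h) is the conjugate of F(h).
    for h in range(max_index + 1):
        f = sum(
            z * cmath.exp(-2j * math.pi * h * x / n_grid)
            for x, z in zip(positions, electrons)
        )
        intensities.append(abs(f) ** 2)  # model input: what a detector sees
        phases.append(cmath.phase(f))    # training target: what it cannot see
    return intensities, phases


rng = random.Random(42)
dataset = [make_training_pair(rng) for _ in range(1000)]
```

A network trained on many such pairs would learn to guess phases from intensities alone; the sketch stops at the data and makes no claim about the architecture the researchers used.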
Next, the scientists needed to see whether the trained AI could predict the structures of real molecules it hadn’t seen before. It could: the AI accurately solved the known structures of each of the nearly 2400 small molecules they tested, using as little as 10% of the data that traditional x-ray methods require. “It’s like magic in a way,” Puschmann says.
For now, the technique works only with molecules containing up to about 50 atoms. Madsen says he plans to keep refining the AI so that it can handle larger molecules, making it a more versatile tool.
Bond foresees similar AIs being trained with data sets from rival techniques, such as electron diffraction, which can work with crystals far too small for x-ray methods. But for now, he says, “This is a really nice first step.”
