Dr. Justas Dapkūnas and Dr. Kliment Olechnovič, bioinformaticians at the Vilnius University Life Sciences Center (VU GMC), study protein structures. These structures can provide a great deal of important information and new knowledge for the life sciences. Today, artificial intelligence contributes to this research, and its results are also useful in the fight against the coronavirus. The VU researchers talk about research on protein structures, why it matters, and the new "rules of the game" that artificial intelligence has introduced in this field.
How is the work of the bioinformaticians at VU GMC related to the life sciences?
K. Olechnovič. In short, bioinformaticians study life with computers. This requires information such as protein sequences and structures. Such information is usually stored in open databases that anyone can access. The main challenge is using that information to solve problems.
Problems in bioinformatics are very diverse and no single group can cover them all, so GMC has several teams of bioinformaticians. Our team mainly develops and uses computational methods to analyze and model the spatial structures of proteins. Recently, we have focused mostly on the structures of protein complexes.
J. Dapkūnas. In principle, bioinformatics is as broad as the life sciences themselves. As a result, bioinformaticians tend to specialize: we analyze protein structures, while other scientists study, for example, genes and genomes.
Why are proteins important to humanity and why do we need to analyze them?
J. Dapkūnas. We sometimes like to say that proteins are the most important molecules on Earth. Geneticists, for whom DNA is the most important molecule, would probably disagree with this, but it is proteins that carry out the program encoded in DNA.
K. Olechnovič. In living cells, proteins are the workers, the builders and the building materials. The main characteristic of a protein is its spatial structure, that is, the way the protein's atoms are arranged in space. The structure determines what function the protein performs and how it performs it. The structure of a protein is often the most important piece of information for learning exactly how the protein works and how to influence that work. There are many different structures, and most of them are still unknown to science.
The best way to determine a structure is through dedicated experiments. Many experimental scientists carry out such experiments and deposit the solved structures in the Protein Data Bank (PDB). But experiments are expensive and do not always succeed, so they will never reveal all the structures. This is where bioinformatics has to help.
J. Dapkūnas. Beyond fundamental knowledge, protein structures also have important practical uses. In particular, proteins that function in living organisms are often drug targets, and knowing their structures helps in designing new drug molecules that alter protein function and treat various diseases. Another important aspect is that proteins are, after all, chemical molecules and can be used, for example, as catalysts in the chemical industry. Enzymes are very good catalysts, but a protein found in nature is often not good enough and has to be modified to do better what people want it to do. Such changes are much easier to make when the structure of the protein is known than by changing it blindly.
Is it enough to know the structures of individual proteins?
K. Olechnovič. No, because proteins usually do not work alone. They interact with each other and form permanent and temporary complexes. Therefore, a model of a complex's structure is much more valuable than a model of a single protein chain: it tells us more about reality. There are far more complexes than single protein chains, and only a small part of their diversity is represented in the PDB.
J. Dapkūnas. I recently read about a study that used computational methods to find interactions between proteins in the bacteria Escherichia coli and Mycobacterium tuberculosis. It turned out that about half of the interactions found had not yet been studied experimentally. It is estimated that fewer than one in ten interactions between human proteins have been identified experimentally.
These studies show how often proteins interact with other proteins and how little we still know about these interactions. They also make it clear that to understand how proteins work and what they do, and then to change their activity, we need to analyze the structures of interacting proteins.
For many years, the GMC bioinformatics team has taken part in CASP (Critical Assessment of protein Structure Prediction), a prestigious protein structure modeling competition. What is special about this competition?
K. Olechnovič. Officially, CASP is not a competition but an experiment aimed at assessing the state of our field, protein structure bioinformatics. It is like an unofficial world championship in this field, in which dozens of research groups from the USA, Europe, Japan and other countries try to achieve the best possible results.
J. Dapkūnas. This protein structure modeling experiment began nearly 30 years ago to assess how accurately protein structures can be predicted. The assessment is done by blind testing. All participants are given the same protein sequences. The structures of these proteins have usually already been determined but not yet published. Participants (anyone can take part; all you need to do is register) submit their models, which independent assessors then compare with the real, experimentally determined protein structures. This is how we find out whether anyone can model protein structures at all, and who can do it most accurately. In this way, the experiment pushes the whole field of protein structure science forward, because it honestly evaluates progress in structure prediction and in various related areas.
What was special about the latest CASP, held at the end of 2020?
K. Olechnovič. The VU GMC team has competed in various categories of this competition for many years. This time, the GMC team, which in addition to Justas and me included team leader Prof. Česlovas Venclovas, took second place in the protein complex structure modeling category. In 2018, we were first in this category.
As for what made this competition unique, the pandemic changed its format: for the first time, it was held entirely remotely, but this did not prevent us from achieving good results. However, the main news of the latest CASP was a much bigger surprise than the virtual format: the victory in the single protein structure modeling category of the DeepMind team, which used AlphaFold2, a structure prediction method based on artificial intelligence.
This team also performed well two years ago, but this year's results exceeded even the highest expectations. Interestingly, in 2008, when no one even dreamed of using artificial intelligence this way, first place in this category went to the team of Lithuanian scientists Dr. Č. Venclovas and Dr. M. Margelevičius.
J. Dapkūnas. In the CASP experiment, we compete mainly in the category of modeling protein complexes, where the task is to describe how proteins interact with each other. In 2020, we used only a slightly modified version of the methodology that had made us first in 2018, so we joked that either we would do well again or the world would see a scientific breakthrough. We were again among the best at modeling protein complexes, so we can safely say that there was no breakthrough there. But in the main category of CASP, the results of DeepMind's AlphaFold2 were impressive: the problem of predicting single protein structures seems to be solved, and only technological improvements remain.
After this CASP, predicting single protein structures became a matter of engineering rather than science. This means that our category, the modeling of protein complexes, will apparently now become a major field in protein bioinformatics. At least in this category, we are not yet competing with artificial intelligence.
Can it be said that artificial intelligence has triumphed over the human mind?
K. Olechnovič. It is better to call it a victory for the people at DeepMind. It is not as if some general-purpose AI learned to model proteins on its own. A large team of the world's best artificial intelligence specialists worked intensively and purposefully on this for at least four years.
J. Dapkūnas. At the CASP conference, DeepMind CEO Demis Hassabis said that it was not artificial intelligence that solved the problem of predicting protein structures, but people who used artificial intelligence as a tool. DeepMind assembled a team of almost 30 people that included specialists from various fields, both artificial intelligence scientists and molecular biologists. Crucially, their models were possible only because of publicly available protein sequences and structures, determined by thousands of experimenters around the world over decades.
What significance will the application of artificial intelligence have for the field of molecular biology?
K. Olechnovič. The structures of proteins and their complexes are important primarily for basic science, and practical applications build on the results of fundamental science. I think artificial intelligence methods will greatly accelerate the progress of fundamental science.
J. Dapkūnas. I think AlphaFold2 is a huge achievement. DeepMind scientists have shown that protein structures can be modeled on a computer with sufficient accuracy, and this will be of great importance both for fundamental research and for various practical applications. But by no means can it be said that they have answered all the questions. In particular, it remains unclear how protein structures form: we still know little about the physical and chemical principles that govern the folding of these large, complex molecules, or about how folding happens so efficiently and precisely in cells.
There are also many other open questions in protein science, from the protein interactions we have already mentioned to the various properties of proteins. Protein structures are often thought of as static snapshots of individual proteins, but that is not how things are in living cells: proteins interact with each other and move all the time. It is also very interesting to learn how various mutations change proteins and their activity, so there is still plenty of work left for protein structure scientists.
As the pandemic rages, scientists in many fields are giving their all to find ways to overcome it. How can your research help in this fight, and what are the potential applications of artificial intelligence here?
J. Dapkūnas. The CASP community immediately joined the fight against the coronavirus, inviting scientists around the world to model SARS-CoV-2 proteins whose structures were still unknown and making these models publicly available to everyone. In the fall, we ourselves took part in a similar protein interaction modeling experiment organized by CAPRI, where we tried to model how coronavirus proteins interact with each other and with human proteins.
K. Olechnovič. AlphaFold2 has modeled the structures of several coronavirus proteins. Those models are probably quite good. But what do we do with them? For example, one can search for molecules that could act as drugs by binding to coronavirus proteins and blocking them. That is a separate topic. So having a model of a protein's structure, even a very good one, is only the beginning. Scientists will not be out of work, and artificial intelligence methods will serve as their tools. We will still have to think for ourselves.