Phase Space Invaders (ψ)
With the convergence of data, computing power, and new methods, computational biology is at its most exciting moment. At PSI, we're asking the leading researchers in the field to discover where we're headed, and which exciting pathways will take us there. Whether you're just thinking of starting your research career or have been computing stuff for decades, come and join the conversation!
Episode 15 - Paulo CT Souza: Developing a universal coarse-grained force field, and approaching the science of molecular complexity
In the fifteenth episode, Paulo Souza and I discuss the challenges inherent in managing a project of such a scope, and the philosophy behind the systematic way in which Martini is continuously improved and reparameterized. Paulo describes how a user-centric approach helps refine and troubleshoot the model through its widescale adoption, and how different inherent limitations of coarse-grained modeling can be addressed to progressively make the force field more predictive and less reliant on user-defined biases. Then, we talk about the interplay between force field development and modern trends in biology, biomedicine and computational sciences - from small molecules to lipid nanoparticles, huge systems such as organelles or viruses, and molecular complexity, understood as going beyond simplified model systems and approaching biologically relevant mixtures and compositions.
Welcome to the Phase Space Invaders podcast, where we explore the future of computational biology and biophysics by interviewing researchers working on exciting transformative ideas. My guest today is Paulo Souza, a group leader at the Laboratory of Biology and Modeling of the Cell at the École Normale Supérieure in Lyon. These days, Paulo works as the main developer of Martini, arguably the most popular coarse-grained force field, which unifies lipids, small molecules, proteins, nucleic acids and many other polymers in an interoperable physics-based framework. In the two decades of its evolution, the force field has evolved in many directions and flavors, and Paulo coordinates a broad community of developers. So we discuss the challenges inherent in managing a project of such a scope and the philosophy behind the systematic way in which Martini is continuously improved and reparameterized. Paulo describes how a user-centric approach helps refine and troubleshoot the model through its wide-scale adoption, and how different limitations of coarse-grained modeling can be addressed to progressively make the force field more predictive and less reliant on user-defined biases. Then we talk about the interplay between force field development and modern trends in biology, biomedicine and computational sciences, from small molecules to lipid nanoparticles, huge systems such as organelles or viruses, and molecular complexity, understood as going beyond simplified model systems and approaching biologically relevant mixtures and compositions. That's a lot of ground to cover, so let's go. Paulo Souza, welcome to the podcast.
Paulo CT Souza:Thank you. It is very nice to be here.
Milosz:So Paulo, I think it's unquestionable that Martini has been the most popular coarse-grained force field for biology for a good while. And I have always been cheering for it somewhere in the back of my mind as the most promising candidate to unify the somewhat scattered world of coarse-grained modeling. And now you have been in charge of the development of Martini for quite some time, too. I've got to say, I know plenty of people who work on, say, improvements of protein or RNA models, but the breadth of the systems that you have to cover for Martini is just not comparable. So I first wanted to ask, how do you even go about a task as general as developing a universal force field for biology that nevertheless incorporates some empirical knowledge from each domain?
Paulo CT Souza:Well, I am a chemist, and I always look at the problem from that perspective, right? Although the biomolecules are different and have different physicochemical behavior, you can actually also find many things in common, and Martini is about that, right? So Martini is a building-block kind of approach, like Lego, where the pieces make the models. And then you need to calibrate these pieces in a way that they follow certain chemical trends. But at the same time, they have to be pieces that can build all kinds of biomolecules. So you need to have a compromise and test it all at the same time, right? In the development of Martini, we work in a way that we start with the pieces, but then the calibration goes on to more complex systems. That way we can actually cover more of the chemical space, because even if you only look at the pieces, once you go to more complex systems, there are still many missing points in the interaction matrix that is the heart of Martini. So if you want to cover and validate more of the interactions that you need to have in any coarse-grained model such as Martini, it is natural to go to the full diversity of biological systems. It's the way that you can actually be sure that this Lego kind of approach is working.
Milosz:Right, so you start with a kind of maximally reductionist approach and then you look at points where it breaks down or when maybe somehow.
Paulo CT Souza:No coarse-grained model is transferable, right? You need to be a bit careful. So there are approximations going on, but Martini is very pragmatic; the idea is to try to make it something that is useful. You could see it as a toy model with which you can very easily get some trends and some ideas, and you can easily change the pieces of Lego and see. But yeah, always be careful. No coarse-grained model is transferable. So you need to consider that if you are going a bit far away from how we validated the model, you should at least test it against some experimental data to see if what you are looking at is actually still valid.
Milosz:Right. So this is where this integrative question comes in, right? You mentioned that you're now able to integrate a lot of experimental data. What kind of data do you consider for the force field development?
Paulo CT Souza:For the parameterization of the beads themselves, we use a lot of thermodynamic data, right? The original Martini model was fully based on partitioning free energies, or free energies of transfer: essentially, the tendency of a system to go from an oil phase to a water phase. But in the newest versions we have incorporated more, so we actually use a lot of miscibility data as well, because with these we can also play around with the self-interaction, which is an important aspect in terms of cavity cost. But at the same time, once you have validated the pieces and you go to more complex systems, you also start looking at other properties. For lipids, it's very common to compare with thickness and area per lipid; for proteins, you can compare dynamics, protein-protein interactions and, at a later stage, even free energies of dimerization. So we go from the thermodynamics of the pieces to complex systems as well. And this is kind of integrative modeling in this sense, right? In coarse-grained model development, we call this a top-down approach: you use experimental data to calibrate the model. But it's also integrative modeling in the sense that you use experimental data to improve the model a bit. This is from the perspective of the non-bonded interactions. For bonded interactions, Martini actually relies a lot on atomistic simulations. So from this perspective, it is a bottom-up approach.
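The partitioning free energies mentioned here are commonly related to measured partition coefficients through ΔG = -RT ln P. A minimal Python sketch of that conversion, assuming P is defined as the oil-to-water concentration ratio; the function name and the example value are illustrative and are not taken from any Martini tooling:

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol K)

def transfer_free_energy(log10_p, temperature_k=298.15):
    """Water-to-oil transfer free energy (kJ/mol) from log10 of the
    partition coefficient P = [solute in oil] / [solute in water].

    Uses dG = -R T ln(P) = -R T ln(10) log10(P).
    """
    return -R_KJ_PER_MOL_K * temperature_k * math.log(10.0) * log10_p

# Hypothetical solute with log P = 1.0 (ten times more soluble in oil):
# the result is negative, i.e. transfer into the oil phase is favorable.
print(round(transfer_free_energy(1.0), 2))
```

In a top-down calibration of this kind, values such as these would be the experimental targets that the simulated partitioning of a bead or small molecule is compared against.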
Milosz:I see. What are the most challenging parts that are particular to coarse-grained modeling? I mean, you have all sorts of issues with kinetics and entropic effects and perhaps cooperativity. I imagine you want to address those things in some systematic manner, right?
Paulo CT Souza:Well, I think your question has two sides. There is one side that's about developing coarse-grained models, right? And this is about knowing the limitations. Once you develop a coarse-grained model, you reduce the degrees of freedom, which means that you need to compensate for this somehow with enthalpy, right? And in Martini, this is done through an effective potential based on Lennard-Jones, which also implies other limitations, right? Lennard-Jones potentials have a limited region in which the system is a fluid, which means that there are certain limitations on what you can really represent. For instance, for solvation free energies we can get the trends right, but we cannot be quantitative, because if you try that with Lennard-Jones, you go to the solid phase. So in this... I missed one part of your question. Can you repeat it?
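For reference, the standard Lennard-Jones 12-6 form that such an effective potential builds on can be sketched in a few lines of Python. The σ and ε values below are placeholders chosen for illustration, not actual Martini parameters:

```python
def lennard_jones(r, sigma, epsilon):
    """Lennard-Jones 12-6 pair energy:
    V(r) = 4 * epsilon * ((sigma / r)**12 - (sigma / r)**6)
    """
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

sigma = 0.47    # nm, illustrative bead size
epsilon = 3.5   # kJ/mol, illustrative well depth

# The potential crosses zero at r = sigma and has its minimum,
# of depth epsilon, at r = 2**(1/6) * sigma.
r_min = 2 ** (1 / 6) * sigma
print(lennard_jones(sigma, sigma, epsilon))
print(lennard_jones(r_min, sigma, epsilon))
```

Because the whole effective interaction is carried by ε (and σ), making a pair more attractive to match one property also pushes the system toward freezing, which is the trade-off described in the answer.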
Milosz:Cooperativity, for example, like multiple hydrogen bonds co-occurring...
Paulo CT Souza:That, for instance, you cannot represent, but you can somehow represent the difference in the strength of the interactions, or the propensity for somewhat stronger interactions relative to a system that doesn't form hydrogen bonds, right? So these things you can incorporate in an effective way, but as a developer you need to be aware of them and see how you compensate for them. And this is all in the effective potential that you have. That is from a developer perspective, right? For a user, there are also other aspects. One aspect that you need to be careful about is numerical instability. When you build coarse-grained models, you have a time step that you use for the integration of the equations of motion, and getting models stable is one challenge there. So there are other limitations when you look at it from a user perspective: how you build the model and how you can interpret what you're doing. Actually, one thing that users tend to do very badly is going in directions that the coarse-grained model was not designed for. For instance, look at our papers: we never look at the solid state. So if you try that with a coarse-grained model such as Martini, you will see that it will not work, right? So be aware of the limitations from a user perspective, so you can properly use the model and create a model that is stable, while for the development you need to somehow incorporate what is missing in a different way and be aware of it, right? So there is no free lunch. Every time you simplify the system, you are losing something.
Milosz:And on top of that, you have the question of model flexibility versus portability, right? So you're, for example, developing the model with, I imagine, GROMACS in mind primarily. Does that limit you in terms of how expressive the physics can be? You know, if you have to constrain yourself to functions that are expressible in this one particular engine?
Paulo CT Souza:Indeed, indeed, but this is in part the philosophy of the Martini developers. One key aspect: if you are working with coarse-graining, you want a model that is fast, right? It needs to pay off, because you're not doing atomistic. Then we want to take advantage of the decades of development of MD codes such as GROMACS, now OpenMM, but also NAMD, which we know already implemented Martini before. And with that we can stay competitive, right? So this, in the end, is a choice. We need the model to be fast and we want to take advantage of what is already around and being developed, not only in terms of speed-up but also in terms of techniques, right? With GROMACS together with PLUMED, you can use all the free energy methods that are there. So this is one aspect of the model. Another aspect that's important: we like to keep the model easy to use. And this is not necessarily for citations; it's really with the idea that a model needs to be testable. If you are developing more methods and approaches that others in the modeling community cannot use and cannot test, you cannot actually be sure of what you're saying. You need this external validation from the community, and the idea of keeping it easy is fundamental. In the end, using GROMACS is also connected with that, right? You want people to be able to start easily. Within one day of looking at our tutorials, you can already start your first simulations.
Milosz:I imagine that if you have such a big task of developing a general force field, you're going to rely at some point on the feedback from the community, right? That people will tell you, oh, look, these kinds of systems do not work, or these kinds of systems have some issues that were introduced by the latest version. Because now Martini 3, from what I understand, is trying to address the shortcomings of Martini 2, one of which was stickiness, right?
Paulo CT Souza:For...
Milosz:What is that?
Paulo CT Souza:That was one of the main issues, right? People, especially those looking at protein-protein interactions, had many problems because the proteins tended to interact in a non-productive way, right? Whatever interface you looked at, they would interact too strongly. So we put a lot of effort into that. And actually we were, I think, close to seven research groups. The paper had 26 authors, but actually even more people were testing. And even with all these efforts, we still couldn't cover all the bases. Once we published, you started to see people already testing Martini, putting the protein model in different contexts, and they were already finding problems, right? So it is extremely important that the community tests the model, and this is the way that we actually learn about the parts that we missed during the calibration or validation, and now we can work on improvements. That's already the current state, right? When we released Martini 3, you could say that the protein model was even a prototype in this sense, because it was not fully tested. And now, after this phase of testing, we are pushing towards a new phase where many of the problems that were reported in the literature will be solved as well. And this is an infinite cycle, right? This is the beauty of the thing. And this has happened because the model is popular. Otherwise, we wouldn't have this feedback, or it would take longer to actually find the problems.
Milosz:I see, but what I perceive as the biggest complexity of building coarse-grained systems is that you need to know what kind of information you have to put into the model and what kind of information you can get out, right? Especially in the case of, say, proteins: how much you can rely on, let's say, conformational sampling of internal protein motions, and how much you can predict new protein-protein interfaces.
Paulo CT Souza:Indeed, indeed. For instance, at a first level, if you are trying to benchmark protein-protein interactions in Martini, you certainly need to test the ones that don't change conformation too much, right? This is the first step: take the test cases that don't show conformational change. Once this is working reasonably well, you go step by step to things that start changing, to the point that now people are also looking at IDPs, right? So we now have prototype models of proteins that are even unfolded in a certain environment, and once they bind, they go to another environment and adopt another secondary structure. So this is like the holy grail of the whole thing: something that could fully change its conformation. But this is the last step, and it's super hard to get, possibly impossible for certain cases, right? Look, for instance, at a protein that has structural waters inside; let's say that there is a water making a bridge between two residues. How can you actually deal with that in a coarse-grained model where water is represented as a coarse-grained bead? So there are many challenges ahead, right? But for some cases it is actually doable, and we are moving towards that.
Milosz:Right, so you would say there is a pathway to maybe at some point get rid of the elastic network model, or to free the protein to behave in a kind of self-directed way?
Paulo CT Souza:We recently released a new preprint about this, GoMartini 3, and also a mini-review about the protein model, and in both we try to make some statements. One aspect that we have now in the model is a real separation between the protein model and what we call the bias, and the bias is the elastic network or the Go model. So now these need to be completely separated. The bias comes in to fix problems, but every time we improve the protein model, the bias needs to reflect that, in the sense that you need to reduce it. And this is now giving us a direction. And nowadays we have models in which, for instance, we are already able to change secondary structure. So it is already like...
Milosz:Mm hmm. Mm
Paulo CT Souza:And that means that I can get rid of many of the elastic network or Go potentials there. Also, once we improve a bit the protein-protein interactions concerning the backbone, for instance, let's say that they recover a bit of the directionality, maybe I can then represent the beta sheets better, and this also removes the bias a bit. So one day we will fully remove the elastic network and Go models. This is, let's say, the direction, the goal that everyone in the field should have. Right now we are going towards that, and let's see how far we can go. I think not with Martini 3. Maybe with a new version in the future it will be possible, but with the current set of beads and the way that we work, you still have some limitations, such as the structural waters that I just mentioned.
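The elastic-network bias discussed in this exchange can be pictured as harmonic springs between nearby backbone beads of a reference structure. A minimal Python sketch under that assumption; the cutoff, force constant, minimum sequence separation and the `scale` knob are illustrative choices, not the settings of any Martini tool:

```python
import math
from itertools import combinations

def elastic_network(ref_coords, cutoff=0.9, force_constant=500.0, min_seq_sep=2):
    """Build harmonic restraints (i, j, r0, k) between beads of a reference
    structure that are closer than `cutoff` (nm) and at least `min_seq_sep`
    positions apart along the chain."""
    bonds = []
    for i, j in combinations(range(len(ref_coords)), 2):
        if j - i < min_seq_sep:
            continue  # neighbors are already held by the regular bonded terms
        r0 = math.dist(ref_coords[i], ref_coords[j])
        if r0 <= cutoff:
            bonds.append((i, j, r0, force_constant))
    return bonds

def bias_energy(coords, bonds, scale=1.0):
    """Total harmonic bias energy; `scale` < 1 mimics weakening the bias
    as the underlying protein model improves."""
    return sum(
        0.5 * scale * k * (math.dist(coords[i], coords[j]) - r0) ** 2
        for i, j, r0, k in bonds
    )

# Hypothetical four-bead backbone (coordinates in nm).
reference = [(0.0, 0.0, 0.0), (0.38, 0.0, 0.0), (0.76, 0.0, 0.0), (3.0, 0.0, 0.0)]
bonds = elastic_network(reference)
print(bonds)
print(bias_energy(reference, bonds))
```

Setting `scale` to zero recovers the unbiased model, which corresponds to the long-term goal stated above of removing the elastic network and Go terms entirely.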
Milosz:Hmm, I see. And how much can you benefit from the recent developments in, let's say, AI structure prediction? Does it change the way you work with the force field, or is it something completely orthogonal?
Paulo CT Souza:I would say that this is still something that we are digesting, right? One aspect that we are definitely incorporating soon is automatic parameterization of new models, right? And for that, we need to increase our database of molecules. Once you reach a certain number, we know that a deep learning approach will possibly solve the problem, but not for all classes of molecules, right? We expect that to be the case for small molecules, but for other classes we don't think it will be possible yet. You will not get, for instance, a fully validated model of, let's say, a polymer with this approach; we think this is the way that Martini works. So this is one aspect. Another aspect that we have been looking at is how to improve our bonded potentials with machine learning as well, because this is something that we can do. Right now, we are limited to the potentials that we fitted against some atomistic simulations, but we could potentially start predicting potentials that have a balance of secondary structure, and this would allow us to actually get transitions. This is another point where it is having an impact. In the future, let's say in Martini 4, this is the way that I see it: possibly the whole interaction matrix of Martini will be based, or partially based, on some AI approach, but we still don't know how, right? Because Martini is based on this interaction matrix. This means that the interactions don't follow the Lorentz-Berthelot rules, right? You don't use combination rules. It means that you need to define each pair interaction among all the 843 beads that we have there. And how many of these pair interactions are validated, meaning that you have experimental data? Maybe 20 percent. The rest relies on a lot of chemical intuition, or chemical knowledge that you have about trends in how molecules work. How you incorporate these in a machine learning kind of approach, we still don't know. And that is because many points there we will only be able to find out in complex systems, not in the simple systems that we use to calibrate the bead-to-bead interactions.
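To give a feeling for the size of an explicit interaction matrix, the short Python sketch below counts the unique bead pairs for the 843 beads quoted in the conversation and contrasts that with the Lorentz-Berthelot combination rule, which Martini does not use. The function names are hypothetical and the counting assumes that self-pairs are included:

```python
import math

def n_pair_interactions(n_bead_types):
    """Number of unique unordered bead pairs, self-pairs included."""
    return n_bead_types * (n_bead_types + 1) // 2

def lorentz_berthelot(sigma_i, eps_i, sigma_j, eps_j):
    """Combination rule shown only for contrast: arithmetic mean of the
    sigmas and geometric mean of the epsilons."""
    return 0.5 * (sigma_i + sigma_j), math.sqrt(eps_i * eps_j)

n_types = 843
print(n_pair_interactions(n_types))  # 355746 matrix entries to define
print(2 * n_types)                   # 1686 per-type parameters under a combination rule
```

With a combination rule every pair follows from two numbers per bead type; with an explicit matrix each entry is a separate choice, which is why only a fraction of them can be backed directly by experimental data.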
Milosz:I can see how AI systems can be really good at interpolating but not necessarily at extrapolating to those unexplored regimes. So yeah, a very important consideration there. And as you say, you set up a database of models, right? Where you can now collect user input on well-parametrized, well-validated small molecules,
Paulo CT Souza:Yeah,
Milosz:large molecules.
Paulo CT Souza:It's not my idea, right? Actually, it's the work of a colleague of mine, Guillaume Launay, and it will possibly be expanded in the future. The idea was to have this database so that not only the developers, but anyone who has a model that was validated and published, can put it there. And even if it's published, every time someone submits a model, the model goes through curation: we have the Martini developers taking a look at whether it's actually good, and then placing it there. And the idea is to grow this database to the point that it will be very useful for everyone, exactly for the automatic topology builders. We are not there yet. Right now I think we have close to 500 molecules there, but in terms of chemical space not too much is covered, because, for instance, I would say that half of these molecules are lipids. But we have an ongoing effort to expand it now with more, like ionic liquids and deep eutectic solvents, which we already published and are still developing. We're also working with new classes of lipids, and for that we are using fragments, like the lipids in lipid nanoparticles. So we are pushing new applications while also keeping in mind the new chemical space that we are covering, so we can feed the database.
Milosz:I can imagine that much of the development is actually driven by trends, right? What becomes important for the community. If there is a new thing like, I don't know, vaccines, then maybe, nanolipid formulations will become a burning issue.
Paulo CT Souza:Well, definitely. Delivery systems were always a nice application for coarse-grained models, right? Given their size. You could look at anything from nano- to microemulsions to lipid nanoparticles. They have sizes that go from 50 to 100 nanometers, and there are even micrometer sizes. And this is the coarse-grained realm, right? This is where coarse-graining really shines in relation to atomistic models. With the vaccines and this whole development of a new generation of lipids, there are really a lot of new developments in Martini to support the area, so that maybe one day we can even do screening of these. On one side you have the screening, but on the other side you also get the benefit of creating new models and growing this database, which will one day help us not only with nanoparticles but with all the other emerging fields that we have: small molecules binding to proteins, people working at the interface of biology and materials. I also have colleagues working on materials themselves: polymers, semiconductors, green solvents.
Milosz:So it all contributes to the same goal in the end. How do you see the ultimate goal? We had this conversation with Dan Zuckerman recently here on the podcast, where we were discussing the pros and cons of going really, really big in terms of organelles and cells versus really, really profoundly studying the small systems to make sure that we understand what is happening on the small scale. And so where does Martini development go in this regard? I can imagine both, but what is the philosophy behind it?
Paulo CT Souza:I think Martini actually contributed a lot to the view of going to complex systems, right? We even use this kind of thing in our talks. If you look at, for instance, the lipid field: if you go to the nineties, or let's go to the eighties, people were simulating one-component bilayers for picoseconds, right? Then if you move 20 years forward in time, you start to see multi-component membranes, with two or three different lipids and also some proteins there. But then it is no longer atomistic; you see coarse-graining. And now, if we move another 20 years, we are in what we call the computational microscopy era, right? Where you can actually model things in their full complexity. And what's the gain in that, for me? Every time you increase in size, of course, you cannot go as long in time, right? You cannot run longer simulations. But at the same time you can have many copies of your protein of interest in a more complex, realistic environment. And this is something that is super valuable for people who want to understand biology, right? Take transmembrane proteins, for instance: for years, people only studied them in POPC membranes, which can actually be very far from the real lipid environment that they have in cells, which are asymmetric membranes with hundreds of types of lipids that can play a huge role in the modulation of their function. So I really see this as a very important thing. Although this needs to go hand in hand with the simple systems, because the simple systems are the ones where we can validate the model. Okay, so while in the complex ones we go to the frontiers of what we can predict, the simple ones are the ones that we use to really check how good the model is. Both are important in this sense, and I think we need to keep both, because one is more realistic for biological applications, and with the other we can be sure about how accurate the model is.
Milosz:It's kind of fascinating that the world of science is slowly inching towards complexity science, right? People are trying to define complexity to study complex systems. So it's really cool to see that we as molecular scientists will be able to take part in this new exploration.
Paulo CT Souza:Indeed, and we are only touching the surface, right? It is only starting. You're talking about complexity, but in the future, if you have this complexity, you also need to think about non-equilibrium. You also have fluxes, differences in protonation state, and how all of these will be incorporated in the system so you can really start simulating, who knows, maybe a whole cell, or the fusion of a cell with a huge nanoparticle, things like that. So this is very exciting. I know that many people are fully interested in what AI can actually bring. But this other side of modeling is also a huge thing, and soon it will explode.
Milosz:No, absolutely. I see there are a lot of attempts at just throwing data at AI, but I think without a very serious kind of inductive bias from the physical side, we can easily get lost in what the AI predictions turn out to be. That's a consistent motif coming out of that research.
Paulo CT Souza:Indeed, indeed.
Milosz:And then, well, complexity has the particular feature of being complex, so we shouldn't expect simple answers, I guess. Okay, then, thanks a lot, Paulo Souza, thanks for the conversation. Thanks for sharing your knowledge about...
Paulo CT Souza:You are welcome. I would like to thank you for the invitation. And please continue this fantastic work. It's really promising. I'm really liking this podcast.
Milosz:Thanks a lot. Great conversation. Have a great day.
Paulo CT Souza:You too. Bye bye.
Thank you for listening. See you in the next episode of Phase Space Invaders.