Phase Space Invaders (ψ)

Episode 19 - Alex MacKerell: Simple physics, missing experimental data, and model compatibility

Miłosz Wieczór Season 3 Episode 19

In episode 19, Alex and I discuss the history and future of developments in the CHARMM family of force fields, and whether Alex believes there is more physics that we need to include in our classical energy functions to work around our current challenges in biomolecular modeling. Throughout the conversation, he advocates for a pragmatic, down-to-earth approach, with the idea of "big molecules, small physics". Alex also highlights the need to augment AI tools with HI, or human intelligence, arguing that so far most attempts at over-automating model development end up with parameters that are unphysical and non-transferable. Yet another interesting point is our often surprising reliance on truly ancient experimental data, and we try to make the point that these very non-sexy physical chemistry measurements straight from the 60s and 70s could truly advance the field if anyone was willing to fund them and actually get them done.

Welcome to the Phase Space Invaders podcast. Please make yourself comfortable. This time, in episode number 19, I'm talking to Alex MacKerell, a professor of pharmaceutical sciences and the director of the Computer-Aided Drug Design Center at the University of Maryland School of Pharmacy in Baltimore. In the community, Alex is famous for his contributions to the CHARMM family of atomistic force fields, including multiple innovations on the physics side, from the dihedral correlation CMAP terms that are key for accurate modeling of protein structures to the more recent developments of the polarizable atomistic model based on the oscillating Drude particle. So whether it's the CHARMM General Force Field for small molecules or CHARMM36 for biomacromolecules, you will find Alex's name all over the place, and he also remains extremely active in the field of molecular pharmacology and drug design. And so we discussed the history and future of these developments, and whether Alex believes there is more physics that we need to include in our classical energy functions to work around our current challenges in biomolecular modeling. Throughout the conversation, he advocated for a pragmatic, down-to-earth approach, with the idea of big molecules, small physics. Alex also highlights the need to augment AI tools with HI, or human intelligence, arguing that so far most attempts at over-automating model development end up with parameters that are unphysical and non-transferable. Yet another interesting point is our often surprising reliance on truly ancient experimental data. And we try to make the point that these very non-sexy physical chemistry measurements straight from the 60s and the 70s could truly advance the field if anyone was willing to fund them and actually get them done. Hope you enjoy our discussion.

Milosz:

Okay, Alex MacKerell, welcome to the podcast.

Alex MacKerell:

Thank you. It's good to be here.

Milosz:

So Alex, you spent a good part of your impressive career working on the fundamentals of the CHARMM force field, which is considered state of the art in almost any corner of biochemistry, you know, be it glycans, membranes, polymers, and now obviously proteins and nucleic acids as well. Is there any remaining part of the chemical space that you yourself are looking forward to exploring in this way?

Alex MacKerell:

Well, you know, from the beginning we started with basically proteins and then moved to nucleic acids and so on. And then we realized we had to make a very comprehensive force field, because people want to start to look at very large heterogeneous systems and so on. So we really looked towards first covering biological molecules as much as possible in a consistent fashion. So basically the proteins and the nucleic acids and the lipids, and they all could be mixed and matched and so on. And then you move into non-biological space, which is now getting into drug molecules and so on. And so this got us into CGenFF, the CHARMM General Force Field. And basically we started with everything that we had done for the biologicals, made CGenFF, and then slowly expanded CGenFF coverage into a wider range of chemical space. So that becomes an infinite task, essentially. Now, you cover the most relevant parts first, and then, you know, when you run into a problem where there's some obscure chemical connectivity that you haven't dealt with, you move into that area and so on. So this is where we continue to expand the additive force field. And then there's also the realm of the lipids and carbohydrates, which are quite diverse in biology when you get into all the bacteria and all these different things. And so that's an area which continues to be expanded, again slowly, as people run into different problems that they're interested in and so on.

Milosz:

Right, there's this feeling sometimes, in at least some corners of the force field development field, that we are getting to the limits of what you can squeeze out of the classical force field. But from the way you stated it, it also seems that in many, many subfields we're just at the beginning, right? There's this combinatorial explosion which never ends, really. So where do you think we are in this area? Is it that we're really opening a Pandora's box of possible explorations, possible developments?

Alex MacKerell:

Well, you know, one of the things that I've been happy to see, and this was sort of our goal when we started doing these things around the 1990s or so, was again this idea of a comprehensive force field. And now you're seeing people start to apply that to very large systems. You know, I think a good example was with the pandemic. You saw these simulations of the spike protein with the bilayer and the glycans, and they were getting very interesting and what appear to be pretty accurate results with respect to conformational changes that they didn't really bake into the simulations and so on. And I think, you know, in the context of the additive force field, this is where you're going: we're seeing that it works very well in very large, diverse systems, you know, cell walls, all these sort of strange things. And that's been very satisfying to see, that with all these little itsy-bitsy things, like worrying about the free energy of hydration of methanol, it's not perfect, but it has been done in a way that it's all balanced out quite well in these very large heterogeneous systems. And you start to see people doing pharmacology, which now has a wing of computational chemistry, and they're getting insights into very complex systems. And I think that's great. I think that's going to continue to move forward. And again, as part of that, you have to add some new functionality with respect to carbohydrates, lipids or whatever. Even with proteins and nucleic acids, you have all these different modifications and these types of things, and that'll just continue to, you know, move forward and so on.

Milosz:

Right, so probably we've picked all the low-hanging fruit already, right? And now it's going to be a whole galaxy of additions. But it's impressive to think that probably a large part of medicinal chemistry relies on the tools that you've developed, right? So, I know that you have been looking into using, for example, machine learning to enhance the prediction of types, prediction of parameters. Is that where your mind is now? Is it the main way forward, or do you think we can add something else to improve those general capabilities, beyond the standard things?

Alex MacKerell:

Well yeah, we're combining ML with HL, which is human learning. And you know, you can start to sample broad ranges of parameter space using machine learning, going in and generating a collection of data for a range of parameters, and then using that to train machine learning models that then search through millions of parameter sets and so on. But in the end, it still gets back to making decisions about what's coming out of the models. And you can really get something that looks really good with respect to reproducing some sort of target data and so on. And then you look under the hood and you see that the values of the parameters are these unphysical values that just don't make any sense. And if you start to combine them with other things, it just wouldn't work. So this is the problem of getting into highly automated fitting approaches: you can use them, but then you really have to stop, step back and see what you've done, and whether it is going to map out into a broader context and so on. And you can keep making your machine learning models larger and more complex and these types of things. But, at least for us, we still always have to go back and look at what's actually been done and then reevaluate how to maybe do the machine learning better, or maybe even go on a little tangent of HL and do some manual optimization, using our intuition and these types of things. So it's a combination: you really have to sit down and think about the problem and collect the appropriate target data, which is very sparse. This is a big problem, and you just can't rely on quantum mechanical data because it doesn't really get you the condensed phase properties. You can sort of work around that to various degrees. But, you know, the sparseness of the experimental target data is a real problem, and it hinders automated procedures to some extent.
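As a rough illustration of the "look under the hood" step described above, the sketch below drops candidate parameter sets whose values fall outside plausibility windows before ranking the rest by fit error. The parameter names, the numeric windows and the function names are assumptions made up for this example; they do not come from the conversation or from any CHARMM tool, and a filter like this is at best a crude stand-in for the human review Alex describes.

```python
# Hypothetical plausibility windows, chosen only for illustration.
# Real limits would come from a developer's chemical judgment.
PLAUSIBLE_RANGES = {
    "charge": (-1.0, 1.0),     # partial atomic charge, in units of e
    "epsilon": (0.01, 0.5),    # Lennard-Jones well depth, kcal/mol
    "rmin_half": (0.5, 2.5),   # Lennard-Jones Rmin/2, angstrom
}


def is_plausible(params):
    """Return True if every parameter lies inside its plausibility window."""
    return all(low <= params[name] <= high
               for name, (low, high) in PLAUSIBLE_RANGES.items())


def rank_candidates(candidates):
    """Keep plausible candidates only, then sort them by fit error.

    candidates is a list of (params_dict, fit_error) tuples.
    """
    kept = [c for c in candidates if is_plausible(c[0])]
    return sorted(kept, key=lambda c: c[1])


candidates = [
    # Best fit to the target data, but with an implausibly deep well.
    ({"charge": -0.4, "epsilon": 3.0, "rmin_half": 1.7}, 0.01),
    ({"charge": -0.5, "epsilon": 0.15, "rmin_half": 1.8}, 0.05),
    ({"charge": -0.6, "epsilon": 0.12, "rmin_half": 1.7}, 0.03),
]
ranked = rank_candidates(candidates)
```

Here the candidate with the lowest fit error is rejected because its well depth is outside the assumed window, which mirrors the point that a good match to target data does not by itself make a parameter set physical or transferable.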

Milosz:

Right. When it comes to automated parametrization, I've been there. I can certify that yes, under the hood it is absolutely ugly, even if something stable comes out of it. That's very true. So, you pioneered some of those new ways of incorporating physics: polarization, CMAPs, system-specific combinations of Lennard-Jones parameters and so on. Do you think there is another term that we can add, you know, that will solve many of our problems? Or are we close to the limits?

Alex MacKerell:

Yeah, I think one of the areas that we're focused on now is this: okay, you can always expand the energy function, right? We've now moved into the Drude polarizable force field, so we do polarization using these Drude oscillators and so on. Now you can start to add, you know, charge transfer, things like that, which makes the model more complex and so on. I mean, one of our general philosophies has been to get to a given form of the energy function, stay there, and expand the coverage. And this is what we did with the additive force field, to get a really expanded coverage that allows people to do these heterogeneous systems that we talked about before. And with the Drude force field, we're sort of doing the same type of thing now. In the context of a given form of the energy function, one of the big approximations is the combining rules for the van der Waals interactions. We stay with the Lennard-Jones formalism, for all of its good points and its bad points and so on. But the combining rules are a severe approximation, and one of the things we're working on is bringing in more explicit parameterization of the different atom type-atom type interactions. This gets back to the pair-specific Lennard-Jones and expanding on that, because I think that's an area where you can get a lot of improvements in the context of the same form of the energy function. And, you know, there you can start to use large-scale quantum mechanical calculations of large numbers of interacting pairs, automating that and using that as target data to fit the off-diagonal Lennard-Jones terms, which are the terms that you otherwise have to calculate using the combining rules and so on. So I think that's a big area where we can improve in the context of the current energy functions, and we found that it's more important with the polarizable force field, because when you have the electronic polarization effects, basically, you know, the electronic structure in principle changes as a function of environment. And so this makes the relationship between your van der Waals term and your electrostatic terms even a little bit more sensitive than in the additive force field, where the charges are never changing. And so we found that we have to be a little bit more careful with the polarizable force field with respect to the treatment of the Lennard-Jones term, and looking at pair-specific interactions is quite important. And we're trying to automate that as much as possible, but again, you get back to the problem that, you know, the quantum mechanics only takes you so far. But we've more recently done some work with ions, optimizing specific parameters for the Lennard-Jones term between ions and the range of functional groups common to biological molecules, drugs and so on. And that's an area we're, again, working in.
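To make the combining-rule point concrete, here is a minimal sketch of how off-diagonal Lennard-Jones parameters are commonly derived from per-atom-type values (geometric mean of the well depths, sum of the Rmin/2 values, which is the convention CHARMM-style force fields use), and how a pair-specific entry can override that default. The atom type names, all numeric values and the function names are invented for illustration and are not taken from any published parameter set; well depths are written as positive numbers here for readability.

```python
import math

# Made-up per-atom-type parameters (not from any real force field).
# epsilon: well depth in kcal/mol, rmin_half: Rmin/2 in angstrom.
ATOM_TYPES = {
    "ION": {"epsilon": 0.05, "rmin_half": 1.40},
    "OXY": {"epsilon": 0.12, "rmin_half": 1.70},
}

# Hypothetical pair-specific override, keyed by the sorted pair of type names.
# In practice such values would be fitted to target data for that pair.
PAIR_SPECIFIC = {
    ("ION", "OXY"): {"epsilon": 0.075, "rmin": 3.20},
}


def pair_parameters(type_i, type_j):
    """Return (epsilon_ij, rmin_ij), preferring a pair-specific entry."""
    key = tuple(sorted((type_i, type_j)))
    if key in PAIR_SPECIFIC:
        entry = PAIR_SPECIFIC[key]
        return entry["epsilon"], entry["rmin"]
    a, b = ATOM_TYPES[type_i], ATOM_TYPES[type_j]
    epsilon_ij = math.sqrt(a["epsilon"] * b["epsilon"])  # geometric mean
    rmin_ij = a["rmin_half"] + b["rmin_half"]            # arithmetic mean of Rmin
    return epsilon_ij, rmin_ij


def lj_energy(r, epsilon, rmin):
    """Lennard-Jones 6-12 energy in the Rmin form; the minimum is -epsilon at r = rmin."""
    ratio6 = (rmin / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


eps_same, rmin_same = pair_parameters("OXY", "OXY")   # falls back to the combining rule
eps_cross, rmin_cross = pair_parameters("OXY", "ION")  # uses the pair-specific override
```

The design point is that the functional form stays the same 6-12 expression; only the lookup of the off-diagonal parameters changes, which is why this kind of refinement fits inside an existing energy function.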

Milosz:

Right, it's funny to think that a lot of artifacts can actually be caused by ions, right? We often think that, oh, it's just something internal within a macromolecule, and then you realize, oh, there's an ion chelating something or staying in a strange position, especially with magnesium or those ions that are more problematic.

Alex MacKerell:

Yeah, and this is, you know, with the polarizable force field, your solute, whatever your protein or nucleic acid is, is much more sensitive to the ionic environment, much more than with the additive force field. And we're seeing that that requires more care and so on. It also requires proper treatment of the ion environment. Now, you have specific interactions, like RNA with magnesium at a specific site and these types of things. But then just getting the distribution of the ions around a nucleic acid or protein needs to be done carefully, which largely means using big systems. So you basically have your nucleic acid, which is a polyanion. Then you've got the counterions around it, but then you have to have the co-ions beyond the counterions to get a proper balance for the counterions to, like, go towards the nucleic acid versus going back out into solution and so on. So you simply have to start doing simulations on systems that are larger to get a full ionic environment, and we find that that's more important, especially with the polarizable force field, to get these more subtle aspects of ion-biomolecule interactions right. These types of things. Yeah.

Milosz:

It is. There's also always a question of availability of good-quality data, right? Because things like positions and occupancies of ion binding sites are notoriously hard to determine with high precision, especially when the conditions of the experiment are often extreme or strange.

Alex MacKerell:

Yeah. I mean, like, sodium in the grooves of DNA is still somewhat controversial, right? You know,

Milosz:

How do

Alex MacKerell:

it,

Milosz:

that exactly? So yeah, what's the driving force on the experimental side that you think might help us here? Do you think it's just coming out of structures, or are there some specific techniques that you see helping a lot, or

Alex MacKerell:

You know, the stuff that would really help is the stuff no one wants to do, and this is back to hydration free energies of a wide range of chemical space, you know, and somebody experimentally measuring those; pure solvent properties of a wider range of molecules; things like osmotic pressure of a much wider range of ions and different types of functional groups. Like, you know, give me dimethyl phosphate with a range of group one ions, osmotic pressures for those. And then we could really sort of know what we're doing, versus sort of grasping at, like, there's sodium with acetate, potassium with dimethyl phosphate, looking at these little spaces. This is this idea of sparse data, where you find the pieces that you can work with, and you then try to come up with an algorithm that says, okay, if I use this approach, I get something that gives me condensed phase properties, and then you start applying it to other ions and things like that, and you're hoping that it works. You just don't know. And then you have to do simulations, and you look at what you're getting and say, well, that seems reasonable, or it doesn't seem reasonable, but I don't have the data to really compare it to. So in the end, if everything's like solid, you know, salts coming out of solution, it's probably not a good force field and these types of things, but you don't know the details of what the exact osmotic pressure should be. So you sort of say, okay, it looks good, and go from there.
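For readers less familiar with why osmotic pressure is a useful target, the sketch below computes the ideal (van 't Hoff) osmotic pressure of a salt solution and the osmotic coefficient, that is, the ratio of a measured pressure to that ideal value. Deviations of this ratio from 1 reflect ion-ion and ion-solute interactions, which is the kind of information a force field can be tuned against. The function names are assumptions for this example, and the molarity-based ideal formula is a simplification of how osmotic coefficients are formally defined.

```python
R = 8.314462618  # molar gas constant, J/(mol K)


def ideal_osmotic_pressure_bar(molarity, n_ions, temperature=298.15):
    """Ideal van 't Hoff osmotic pressure in bar.

    molarity: salt concentration in mol/L
    n_ions: number of ions released per formula unit (2 for a 1:1 salt)
    temperature: in kelvin
    """
    concentration = molarity * 1000.0          # mol/L -> mol/m^3
    pressure_pa = n_ions * concentration * R * temperature
    return pressure_pa / 1.0e5                 # Pa -> bar


def osmotic_coefficient(measured_bar, molarity, n_ions, temperature=298.15):
    """Ratio of a measured osmotic pressure to the ideal value."""
    return measured_bar / ideal_osmotic_pressure_bar(molarity, n_ions, temperature)


# Ideal value for a 1 mol/L 1:1 salt at 298.15 K, roughly 49.6 bar.
ideal = ideal_osmotic_pressure_bar(1.0, 2)
```

A simulation that yields a coefficient far below the experimental one for a given ion pair would suggest the ions associate too strongly, which is exactly the situation where experimental data for more ion and functional group combinations would help.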

Milosz:

You know I've definitely seen myself, you know, look into papers from 1967 or something like that. Trying to read

Alex MacKerell:

On these topics, it's like, yeah,

Milosz:

Yeah

Alex MacKerell:

stuff in the twenties. I mean, this gets to. Yeah,

Milosz:

From a

Alex MacKerell:

the

Milosz:

an

Alex MacKerell:

task of the

Milosz:

all those

Alex MacKerell:

physical chemist.

Milosz:

Yeah, and for example, you see there is a discrepancy of a factor of two between the paper from '67 and the one from '72, and the authors are already dead, so how can we even be sure that they

Alex MacKerell:

Well, yeah, it's just the accuracy of the experimental method. I mean, you know, this is one of the things too, is just like computational people look at experimental data and it's like, you know, this is absolutely correct. And oftentimes it's like, well, probably not correct. And especially when you start to look at, you know, a range of properties in this, this is really strange. And sometimes you have to say, maybe the experiments were just not, you know,

Milosz:

yeah, so maybe there's a point in us computational people who know what the needs are running mini physical chemistry labs, right, and generating our own data for those particular purposes. I don't think that would be actually complicated if it was doable in the seventies. But as you say, for some reason nobody wants to repeat those things,

Alex MacKerell:

Well, nobody, you know, you've got to pay for it too. You know, you go to a funding agency and you're saying, I'm going to do osmotic pressures of a bunch of these things, and oftentimes they're like, well, that's not really exciting, so we're not going to fund that. And, you know, it's people's interests too. Again, it's the physical chemists who are really into looking at things like osmotic pressures and so on. So you have to find people who are interested in doing that type of work and these types of things. So, you know, there's a funding issue. There's an interest issue.

Milosz:

Right. You know, we're

Alex MacKerell:

Yeah,

Milosz:

there. So if anyone wants to cite this podcast as evidence that there's community interest in getting this funded, here it is. Yeah. So that's one thing. Then the other thing I wanted to talk about is your famous fondness for protocols, right? Because I even had Justin Lemkul on this podcast before, and we joked about your famous sayings about how protocols essentially make the field go forward, right, in a way that's reproducible and so on. And I think what you have in the CHARMM-GUI server now is a great example of how you could get reproducibility out of this standardization of workflows, right? What

Alex MacKerell:

Yeah, well, this is, you know, obviously Wonpil's. You know, Wonpil Im is the king of CHARMM-GUI, and it was a great vision on his part. And then it was great because, you know, we knew each other well, and so we were able to help put those standards in there. And, yes, it's very important to have your truncation methods, you know, the types of ensembles you're generating and so on, and make sure that they're consistent. The force field is somewhat dependent on what truncation scheme you use and so on, especially with anisotropic systems such as bilayers. So, yes, that has been important in getting consistency into the field. And again, I think it's worked out well because, you know, people can go into CHARMM-GUI, where Wonpil and all his coworkers have put in great tools to make these complex systems, and then they know that what's coming out, and how they're simulating it with respect to cutoff schemes and all these types of things, is well thought out, and you're going to get the best behavior out of the underlying force field that's being used. So, yeah, I mean, being very systematic is just essential.

Milosz:

Yeah, it's also true that there's a big lag time in terms of adopting best practices, right? Something comes out in the literature saying that, oh, now this is the go-to thermostat or the go-to setting of the cutoff distance. And then it takes people five or 10 years to update their MDP files in GROMACS or run input files in other software packages, right? So, yeah, I think if you can accelerate this, it's great.

Alex MacKerell:

You know, a good example of that was the LJ-PME developed by Erik Lindahl and co-workers, and I mean, it's great, okay, because it's a long-range treatment of dispersion. You know, it's rigorous, and it's nice because you can now use a shorter real-space cutoff for both your electrostatics and your Lennard-Jones terms, which gives you a speed-up. You know, you've got the extra overhead for the FFTs and so on, but you still get like a 20 to 25 percent speed-up going from 12 to 9 angstroms. And I visited Erik Lindahl in Sweden pre-pandemic, and they had just come out with this, and they were like, why isn't everybody using it? It's like, well, the force fields aren't optimized for that. You know, we had to go back, working with Rich Pastor and Jeff Klauda and others, to reoptimize the parameters for the lipids, because the lipids are particularly sensitive to that, and once you've done that, then you can start to use LJ-PME. And in addition, you've got to get people to put LJ-PME in other codes besides GROMACS, right? And this is the thing: whatever the force field is, people have to support it, right? You need to have the force field supported in OpenMM and CHARMM and NAMD and GROMACS and so on. And those logistics just take time, for people to implement, test, and then put things into wider use. Yeah,
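A back-of-the-envelope estimate helps explain why shortening the real-space cutoff is attractive. In a roughly homogeneous system, the number of neighbours inside a cutoff sphere scales with the cube of the cutoff radius, so the short-range pair count changes as the ratio of the cubes. The function name below is an assumption for this sketch, and the estimate covers pair counting only; it says nothing about the reciprocal-space (FFT) cost, neighbour-list handling, or other parts of the MD step.

```python
def real_space_pair_ratio(new_cutoff, old_cutoff):
    """Relative number of short-range pairs when the cutoff changes.

    Assumes a uniform density, so the neighbour count scales with the
    volume of the cutoff sphere, i.e. with the cutoff cubed.
    """
    return (new_cutoff / old_cutoff) ** 3


ratio = real_space_pair_ratio(9.0, 12.0)
print(f"pairs within 9 A relative to 12 A: {ratio:.3f}")  # prints 0.422
```

The pair count drops by more than half, while the overall gain quoted in the conversation is 20 to 25 percent, which is consistent with the remark that the extra FFT overhead and the rest of the simulation step take back part of the saving.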

Milosz:

But it was the same with things like CMAP, right? When you introduce a correction into one software, then it takes half a decade for everyone else to,

Alex MacKerell:

exactly. Yes.

Milosz:

it into, yeah, implement it into the code. That's a big consideration for people doing, things that are out of the box. So, yeah, I mean, it's really cool that, we have this pathway to shorten it.

Alex MacKerell:

Yeah. I mean, again, people have to, you know, get on board, and it's like, okay, the force field's sort of halfway done, and then convincing people to implement that in their code, to spend the time to do that, is always hard. I mean, with the Drude polarizable force field, we had it in CHARMM, and then Klaus Schulten moved ahead and implemented it in NAMD. And, you know, CHARMM's a little slow compared to NAMD and these other codes, so that was very helpful because it allowed us to start to do a lot more testing using NAMD, in addition to the work we were doing with CHARMM. And he knew the force field wasn't ready yet for prime time, but he's like, let's get this in there because it's going to be useful. And that really facilitated our development of the force field and so on. And then later on, OpenMM and others have adopted the Drude polarizable force field. Yeah, that's part of the, you know, part of the game.

Milosz:

We've seen many cases in which great things come out of having just a big user base, right? Where people just figure out what the problems are and they share that, and people go back to fix it. I think there's also now this huge initiative, actually many groups in parallel, including ours in Barcelona, of having repositories for trajectories. So if people can have consistently generated trajectories, that's also going to improve reproducibility a lot, say, in the context of training, I don't know, machine learning models on trajectories, if they are generated with the same setups, the same parameter sets, and so on.

Alex MacKerell:

Yeah, I mean, people have been sort of pushing that for years. It's logistically very difficult, and again, someone has to support, obviously, a resource that's going to be able to store tons of data and make it accessible, and people are going to have to continue to manage it and so on. It's sort of like the Protein Data Bank on steroids. It's great to say it; doing it is going to be hard. And, you know, we're supportive of any such activities. But yeah, having all that data stored where you know that the long-range interactions are treated well and the force fields are consistent, all these types of things, will be hugely useful.

Milosz:

Yeah, unfortunately, as a community we are no Google, to have a big data storage facility in the middle of Nevada or somewhere. But yeah, I think a lot of nice outcomes are going to come out of this. So there's a nice convergence of all those things, from protocols to cross-supported algorithms and physical terms, right, to databases.

Alex MacKerell:

Exactly. I mean, you know, Google and such do amazing things, but they've commercialized parts of their technology, which allows them to do other things. And, you know, this is one of the things: how do you get the finances to support these? Governments will support it to a certain extent for a certain period of time. But that's the big problem: do you commercialize things? Can you commercialize something like a database like that, so that it's useful enough that someone with commercial interests wants to pay to get access to the data? And then that can support maintenance, which makes it accessible to academic people at low or no charge. And that's a difficult model to do for a lot of academic projects and so on.

Milosz:

That's true. I mean, our biggest promise is probably that we will deliver rationally designed medicines of the future, right? But that doesn't materialize at the rate that we would like to see,

Alex MacKerell:

never fast enough,

Milosz:

it's a tough sell for the government.

Alex MacKerell:

But then you have the materials world and all these places. It's like, can we sort of add more to that? And like you said, mix that in with, you know, all the AI developments, because AI is, you know, a data hog, and can physical models and large trajectories be the source for AI development and so on? And yeah,

Milosz:

It's exciting to think that we'll have all this data, as you say, to improve things. Yeah, to repeat what you say, we probably don't need exciting physics there, right? We can just use data to improve the physics that we have in ways that we understand. And

Alex MacKerell:

what I like to say. You know, you can do a lot with small physics. I mean, again, the Lennard-Jones 6-12, you know, people beat on it a lot, but it's sort of taken us pretty far.

Milosz:

That's a great takeaway message, I think. So, Alex MacKerell, thanks so much for the conversation.

Alex MacKerell:

And I enjoyed myself. Thank you for the opportunity.

Milosz:

Thanks a lot and have a great day. Thank you for listening. See you in the next episode of Phase Space Invaders.