Phase Space Invaders (ψ)

Episode 13 - Daniel Zuckerman: Trajectory ensembles, writing books, and learning biology through physics

Miłosz Wieczór Season 2 Episode 13


In the thirteenth episode, Daniel Zuckerman and I talk about textbooks on statistical biophysics and a physics-based vision of biology, a few of which he himself authored. Daniel reveals that his passion for clarity in writing comes from his early humanities background, and makes a case for how well-thought and physically motivated narratives can unlock profound insights into the inner workings of biology. Then, we move on to discuss the physical theory behind trajectory ensembles, Daniel's preferred lens through which to view statistical mechanics. We dissect the strengths and limitations of this approach, and let ourselves speculate about its future evolution.

Welcome to the Phase Space Invaders podcast, where we explore the future of computational biology and biophysics by interviewing researchers working on exciting, transformative ideas. Today, I'm talking to Daniel Zuckerman, a professor of biomedical engineering at the Oregon Health and Science University School of Medicine. Daniel is known not only for his contributions to the foundations of statistical molecular biophysics and his methodological developments, but also as an avid popularizer of a deeply physical view of biological processes through several books he wrote, as well as his Statistical Biophysics blog. I myself have to credit his clear writing style and unique trajectory-centric perspective on biology for providing me with early insights into how to think about molecular simulations. So we started by considering Daniel's motivations behind sharing his ideas in written form, which he traces back to his past background in the humanities: a truly impressive case of a career switch. We discussed the general landscape of statistical biophysics book writing, and how a physics-informed narrative can unlock the understanding of biology in non-trivial ways. Daniel then shares his personal take on what he sees as the major scientific issues our community has struggled with. Finally, we move on to discuss the current state of trajectory-based methods in biophysics, most notably the weighted ensemble simulations championed by Daniel's group: how they enable new insights into non-equilibrium processes, what their limitations are, and what can be done to improve their performance using the latest developments in the field. I'm sure you will enjoy our conversation, so let's get started.

milosz-host740_1_06-12-2024_202124:

Daniel Zuckerman, welcome to the podcast.

squadcaster-b7d3_1_06-12-2024_112124:

Oh, thank you. I'm very glad to be here.

milosz-host740_1_06-12-2024_202124:

So I remember how big of an impact reading your book, Statistical Physics of Biomolecules, had on my formation in molecular simulations. It has always been one of the first books I'd recommend to students who want to start working with simulations, due to the clarity, but also accessibility for someone without an explicit background in, say, mathematical physics. And then I came across your blog, where you share your reflections and exercises, and then your online books, which are always in an open format for everyone to read. And I was again struck that you always somehow manage to sneak in a non-trivial insight, whether that's a metaphor, a plot, or just a phrase, into something I had thought about 50 times. And so I was wondering, what's the story that got you to work on so generously sharing all these resources with the community?

squadcaster-b7d3_1_06-12-2024_112124:

Oh, well, thank you for the kind words. It's nice to hear that. So, I think there are probably a few factors that contribute to my weird penchant for writing these types of materials. One of them, of course, stems from teaching. You're going to do a class and you look around for that perfect textbook, and you don't find it, and you wish it were like this, that, and the other. I think there are a lot of folks who have had that experience and developed some notes and so forth, and that was the origin of the book that you mentioned. Certainly it's a lot of work to turn notes into a book, but ultimately I see that as a fun task in itself, kind of solving a problem in pedagogy, right? What's the right path from here to there? Another aspect that probably a lot of people don't know about me is that, although I did a lot of math and science in high school, I actually gave it up altogether for a number of years, and I basically had to go back to school as an undergraduate studying math and physics. So I had this experience, as a young adult, we'll say, of really starting from very little. I remember very clearly not understanding basic things while people around me knew this jargon. And so I think I just always had a little bit of a sense of myself as an outsider, as a learner, and I thought that perspective would be helpful for people. I really wanted to avoid the kind of maximalist physics approach where it's like, okay, here's the general equation, and then, oh yeah, if you're lucky, I'll show you a few special cases that you can understand. I always felt that the right way was the opposite: to start from the simplest instances and then show how you reach the general.
And then the last thing, which people, maybe especially in Europe, probably aren't aware of: I would say in the U.S. there are at least three tracks of institutions where a researcher might find themselves. The classical one is where faculty are engaged in teaching approximately 50 percent of the time and are expected to do, you know, excellent research. There are other folks who are in primarily teaching institutions. And I'm in the third category, medical schools, and there's a reason for that and a story behind that. But in medical schools, your requirements for teaching are much less. But I like doing it, and I wanted to express myself. So ironically, it was because I didn't have to spend time in the classroom doing the traditional physics and chemistry curriculum that I felt a little more freedom to develop these materials. And I was fortunate to get support from granting agencies to give me time to do that. So that's a long answer to your question.

milosz-host740_1_06-12-2024_202124:

Okay, so it wasn't just scribbling in your free time. It was actually support for this particular purpose, right? That's, that's

squadcaster-b7d3_1_06-12-2024_112124:

Although not from my own institution, because it wasn't really in the interest of the School of Medicine in which I was primarily working when I wrote it, which was at the University of Pittsburgh. It's because I had a grant from the National Science Foundation that I was in control of my time. And I did not have time commitments to classroom teaching as intensive as many of my peers. Yeah.

milosz-host740_1_06-12-2024_202124:

That's interesting. Also interesting because most of the people I interview are actually physicists by training, and then, you know, they turn to biology. I understand that you came to biophysics more from the biological side?

squadcaster-b7d3_1_06-12-2024_112124:

No, so my PhD is in physics, but as an undergraduate I actually studied humanities,

milosz-host740_1_06-12-2024_202124:

Oh, I see.

squadcaster-b7d3_1_06-12-2024_112124:

and so, partly, I guess you don't do that if you don't enjoy writing to some degree. That's what I meant when I said I was away from science almost completely for a few years and had to go back. So I've learned biology as a physicist, and that's another thing that I've had a lot of time to think about. How do we as physicists try to, you know, understand, simplify, and organize what in a biology textbook could easily be more than a thousand pages? What are really the principles of that? I try to think about those questions.

milosz-host740_1_06-12-2024_202124:

Right, that's a deep way to ask this. I sympathize there because, as a freshman, I spent my Fridays studying literature. So.

squadcaster-b7d3_1_06-12-2024_112124:

Very good.

milosz-host740_1_06-12-2024_202124:

A lot of people from the sciences actually cover a lot of humanities. It's also a trend. But back to books: there is this idea that some fields have really classical books that all the students have to go through. And you mentioned that we as a field, say, statistical biophysics, don't really have classical books. Of course, your book is a very important attempt at summarizing those basic principles, and great at conveying those concepts. But I don't see anything on the level of, say, Alberts or Stryer's Biochemistry. Do you think it's a worthwhile enterprise for someone to try to come up with such a book for our field?

squadcaster-b7d3_1_06-12-2024_112124:

Well, I do think different perspectives are very valuable. So people who have that feeling when they read through the books that, hey, there's something really important missing, whether it's content or pedagogical perspective, I would encourage them to go for it. When you talk about classical books, you did mention Alberts. And I think that Alberts is, amongst the cell biology books, really the outstanding one, because, I don't know which of the authors, but at least one of them was attuned to physical principles. You see, they go in and they try to explain free energy to you and things like that. If you look through it, you will find, in little references and in many of the figures, this notion that somehow free energy is driving these processes and so forth. It's not the main theme of the book, so that's maybe something that's not as ideal. I do have some new chapters more focused on non-equilibrium processes in the cell, but again through simple examples, and that's a free online book in the Open Science Foundation. But one of the reasons that I wrote my book is that, well, there were really outstanding statistical mechanics books. Examples that come to mind might be McQuarrie's or David Chandler's book. They weren't really suited to beginners, and more and more in the classroom I was seeing, and I think many other faculty see, that you get students from a range of backgrounds who are interested in this field. It's such an exciting field, right? But if you don't have the undergraduate degree in physics, some of those books are really going to be tough going. So that was the motivation, again, for starting simple in my case.

milosz-host740_1_06-12-2024_202124:

Right. Definitely. And we can also see that non-equilibrium statistical mechanics is becoming more and more important for the field. So it's also a very timely question of providing good background. I think your focus on trajectory ensembles, which we'll get to in a moment, is a really nice starting point for this whole bunch of questions.

squadcaster-b7d3_1_06-12-2024_112124:

So I was going to make a comment that we, from the physical sciences, see ourselves kind of at a disadvantage, potentially, learning biology, and that's true to a large extent because there is just a broad base of knowledge and facts that we don't have. On the other hand, there are certain things that we know that are helpful and fundamental to biology, and you can't understand biology fully without them. I think free energy is always going to be a good one, right? It's the energy that's driving processes in a single direction. The cell is not a network of random events. It's a very highly orchestrated sequence of things that are happening that are driven by free energy. So every process that matters in the cell is driven. And one example that I think is really important, and that as a physical scientist, a physicist, you can be proud that you'll understand much quicker than your biology colleagues, is the idea of proofreading or error correction. That's something that's involved in all the information encoding and transfer, so from DNA to RNA to protein. The brief version is that the cell uses free energy. It seems to waste free energy, but it basically does it for error correction. So it slows down its processes. It basically does them more than once, just to be sure that the information is being replicated with high fidelity. And you can make a quite simple physics model for it. That's something that John Hopfield did back in the 70s, and for which he should certainly win a Nobel Prize. That model will be straightforward to understand for physicists. I think for biologists it's a heavier lift, but it's fundamental. We wouldn't be here having this conversation if cells had not evolved this way to exploit free energy. It basically is like paying for information.
So I do think physicists should not be shy about trying to learn biology, both on biology's terms and on physics' terms.
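
Editor's note: the "paying for information" idea can be made concrete with a rough numerical sketch. This is not Hopfield's full kinetic model, only its best-case limit: it assumes the right and wrong substrates differ in binding free energy by delta_g_kt (in units of kT), that a single equilibrium discrimination step gives an error ratio of about exp(-delta_g_kt), and that each driven, effectively irreversible proofreading step applies the same factor again. The function name and the value 4.6 are illustrative assumptions.

```python
import math

def error_ratio(delta_g_kt: float, n_proofreading_steps: int = 0) -> float:
    """Best-case ratio of wrong to right incorporation.

    delta_g_kt: binding free-energy difference between the wrong and the
        right substrate, in units of kT (hypothetical value for illustration).
    n_proofreading_steps: number of driven checking steps, each assumed to
        apply the same discrimination factor once more.
    """
    single_step = math.exp(-delta_g_kt)
    return single_step ** (1 + n_proofreading_steps)

delta_g = 4.6  # roughly a 100-fold preference for the right substrate
without = error_ratio(delta_g)
with_one_step = error_ratio(delta_g, n_proofreading_steps=1)
print(f"error ratio without proofreading:      {without:.2e}")
print(f"error ratio with one proofreading step: {with_one_step:.2e}")
```

In this idealized limit the error ratio is squared by one proofreading step; the price, not modeled here, is the free energy spent and the time lost on discarded attempts.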

milosz-host740_1_06-12-2024_202124:

Yeah, that's a great point, because I think the people who are in biochemistry are at some point struck by how wasteful all this is, how many cases you can find where, you know, the efficiency of something is percent or whatever. But if you want to have really, really high efficiency, you also slow things down, because all the driving forces are suddenly slower in terms of kinetics, right? So there's a balance to be had there, perhaps.

squadcaster-b7d3_1_06-12-2024_112124:

Yes. Well, I mean, the cell is definitely optimizing things, sometimes in obvious ways, but sometimes in ways that we can understand that just ultimately contribute to, you know, survival. And there are a lot of fascinating issues that need to be explored.

milosz-host740_1_06-12-2024_202124:

Can you even reverse engineer whatever is being optimized, right? Maybe it's not a single function that can be named or written down.

squadcaster-b7d3_1_06-12-2024_112124:

No, that's right. And those are really interesting questions.

milosz-host740_1_06-12-2024_202124:

Right. Another thing that strikes me in your writing is the honesty and the kind of sharp criticism of some common practices in the field. And by sharp, I don't mean rude. I just mean really being honest about what we scientists tend to do, why we shouldn't do that, and so on. Do you have a favorite complaint or a favorite rant about what people do and should reconsider doing?

squadcaster-b7d3_1_06-12-2024_112124:

Well, so in the spirit of not being rude, let me say that the evolution of computational molecular biophysics is a slow process. It's been happening over decades, but I do see things definitely moving in a positive direction. I see, in molecular simulation, people worrying more about sampling. The simplest thing you can do is a few replicas of whatever method you're doing, whether it's vanilla MD or your favorite fancy algorithm. Just do it a few times and you will see how reliable your data are. I think that's a big change from the early days, and something that turned me off in those early days when people were doing molecular dynamics. They could only afford to do one trajectory, usually, or that's what they did. And so it was a bit of storytelling, trying to build a story rather than having solid statistical evidence. And you can guess what I think is going to lead to more reliable predictions. So I just really urge people, again, and you don't have to do a statistical test, but do replicas, and consider the issue of whether you can really say your system's in equilibrium. Probably not. Most things are relaxing from wherever you started them towards the steady state, which may be equilibrium, or it may be a non-equilibrium steady state. When you have those perspectives in mind, I think you're going to tend to do better science. And I think more people are aware of those issues and bringing them into their papers now. Another general trend which has not gone away, and which I'm not the biggest fan of, is, well, obviously our computing power increases every year. How do we use it?
So you could think small and say, well, I want to do something simpler and do it more carefully. Or you can say, oh, look at what I have, I have this great allocation, I'm going to just do this giant system. There's definitely a place for doing giant systems, particularly in technology development. But I think we need to be cautious because, again, real biophysical conclusions are going to depend on sampling. We don't want to commit to things that are essentially artifacts of where we started our simulation. So I do urge people to think small in some sense. And I want to say my favorite MD paper is from 15 years ago, when David Shaw's group studied the little, most boring protein of all time, BPTI. It's not even like a machine. Its job is just to be an inhibitor. But if you look at that trajectory, it's a millisecond long, and after hundreds of microseconds, suddenly new things occur. They're not necessarily major populations, but still a few percent. So just be humble, bear in mind the things that you don't know, and I think you can stay on the right path.
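
Editor's note: a minimal sketch of the "just do a few replicas" advice. It assumes each independent replica yields one estimate of some observable (say, a state population) and reports the spread across replicas as a standard error. The numbers are synthetic placeholders generated for illustration, not simulation results.

```python
import random
import statistics

def standard_error(replica_estimates):
    """Standard error of the mean across independent replica estimates."""
    n = len(replica_estimates)
    return statistics.stdev(replica_estimates) / n ** 0.5

# Hypothetical per-replica estimates of one observable; in practice each
# value would come from a separate, independently started simulation.
random.seed(0)
replica_estimates = [random.gauss(0.30, 0.05) for _ in range(5)]

mean = statistics.mean(replica_estimates)
sem = standard_error(replica_estimates)
print(f"mean = {mean:.3f}, standard error = {sem:.3f}, "
      f"replicas = {len(replica_estimates)}")
```

If the replica-to-replica spread is comparable to the effect being reported, the conclusion probably reflects where the simulations were started more than the system itself.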

milosz-host740_1_06-12-2024_202124:

It's always hard to convey that, you know, a 1 percent population of something might actually contribute a lot to the experimental observable you're measuring. But for some of them it does. And then, yeah, there's also definitely the field of things that were simulated for the first time, right? There are these huge machines that someone put for 100 milliseconds and okay, you can write that someone simulated a 30 atom model. Now it's done. That's,

squadcaster-b7d3_1_06-12-2024_112124:

And again, there's a, there's a place for those and we need to push the limits and we,

milosz-host740_1_06-12-2024_202124:

Absolutely.

squadcaster-b7d3_1_06-12-2024_112124:

So not everyone needs to do the same thing. Not everyone should be doing science the way I do it, and I think that's another important message: you can find your own way. But with some of these things, it's a question of how you convey to the community and to the public the value of your findings, and making sure you believe in them. Yeah.

milosz-host740_1_06-12-2024_202124:

It's not going to be informative, but it's going to be probably groundbreaking also in hardware terms, right? Just to make sure that we have the hardware and all the other ways that we need to store those things. As you say, it's a whole separate story.

squadcaster-b7d3_1_06-12-2024_112124:

Right. And another example would be, well, maybe there's a pandemic and maybe there's a particular system that's worth studying with everything we've got in any way we can think of, you know, so there's, there's always, there's circumstances and, and I don't want to make a blanket. So,

milosz-host740_1_06-12-2024_202124:

Right, right. Okay, that kind of brings us to the question of trajectories, because you are known for being a vocal supporter of the trajectory ensemble picture of integrating simulation data, and also a co-developer of one of the major weighted ensemble codes. These are, say, not the mainstream approaches to determining free energies and non-equilibrium properties, but very promising and, as you make the claim, very natural in certain settings. How do you see the state of this branch of statistical physics now? Where are we and where is it going?

squadcaster-b7d3_1_06-12-2024_112124:

A couple of admissions to start with. A trajectory-based method such as weighted ensemble is not going to be best for calculating everything, even though in principle it might be possible. So for example, for a free energy profile, maybe there are going to be better tools. I think a lot of those tools sometimes might be overconfidently interpreted, but nevertheless, they should be better than weighted ensemble. And it's also true that, you know, I wrote a blog post about this. A guy that I've known pretty well over the years came up to me at a conference, and we were talking, and he kind of whispers to me, "Well, we tried weighted ensemble and it didn't work." And I was like, well, it doesn't work for everything, you know. There are a couple of parts to that. There are fundamental limitations to any method. In an ensemble method, the cost is going to be based on the minimal trajectory length you need and how many copies of that trajectory. So that's an absolute minimum on your computing time. You can never do better than that. And then there are cases where you don't have the right setup. But let me go back to what I think was the heart of your question: in terms of trajectories as a tool for studying things and for understanding them, is it natural? I do think from a pedagogical point of view that trajectories are just wonderful because, you know, we can all do molecular dynamics. We all, I think, have a pretty firm grasp of what's going on. And that's really the thing that you need, because the trajectory ensemble starts to sound a little intimidating, especially if I use the words path integral or something like that. But if it's just multiple instances of molecular dynamics, suddenly it's much more approachable, right?
So I can again use jargon and talk about a Fokker-Planck description of the probability distribution, and then I'm sure there are a lot of blank faces. But if you just say, well, I have multiple instances of my trajectories, and I look at the distribution at time zero, at 10 nanoseconds, at 20 nanoseconds, and I watch how that distribution changes, well, that is the Fokker-Planck characterization of that system. And so the trajectory ensemble, I feel, if you really take it seriously, lets you learn the concepts of physics in a natural way, in a way that connects non-equilibrium properties to equilibrium. As I said before, you may start from an initial distribution that's a single configuration. It's going to relax over time into equilibrium, if those are the boundary conditions you have. And in terms of a method, in terms of calculating things, most methods have some kind of assumptions built in, and those assumptions can come back to bite you. Weighted ensemble relies on the fundamental non-equilibrium physics. You're watching the system as it does its process. You have continuous pathways that you can visualize. The general weakness of weighted ensemble is that sampling is hard in every case and you have to watch your statistics. So we try to do the same thing that I was expounding before, namely to do multiple instances of the whole weighted ensemble, and then you get a sense of where you are in terms of fluctuations. And we have lots of things that we do to try to analyze the data and understand the uncertainty. That's a whole more technical topic. But I think it's the most powerful conceptual framework. It lets you understand many other things, and it also is the underpinning for this class of algorithms that have proven very powerful. So that's why I still love it so many years later, right?
And just to be clear, I think you hinted at this, but I did not invent the weighted ensemble. In fact, the inventors of the weighted ensemble method didn't invent it either. It was published in '96 by Huber and Kim, but you can trace the idea of replicating trajectories back to work done probably in the 40s and 50s, and maybe in secret, in the Los Alamos group. The idea of splitting these trajectories is attributed to Johnny von Neumann. So there's a history. Yeah,
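
Editor's note: the picture of "multiple instances of my trajectories" whose distribution is inspected at successive times can be tried on a toy system. The sketch below is an assumption-laden illustration, not anything from Daniel's group: it uses one-dimensional overdamped Langevin dynamics in a harmonic well U(x) = k x^2 / 2, in units where kT = D = 1, with every trajectory started from the same point x0. All parameter values are arbitrary.

```python
import math
import random
import statistics

def run_ensemble(n_traj=1000, n_steps=1000, dt=0.01, k=1.0, x0=2.0,
                 snapshot_steps=(0, 100, 1000), seed=0):
    """Propagate many independent overdamped Langevin trajectories.

    Every trajectory starts at x0, so the initial distribution is a single
    configuration. Returns {step: list of positions} for the requested steps.
    """
    rng = random.Random(seed)
    positions = [x0] * n_traj
    snapshots = {}
    if 0 in snapshot_steps:
        snapshots[0] = list(positions)
    noise_scale = math.sqrt(2.0 * dt)
    for step in range(1, n_steps + 1):
        # Euler-Maruyama update: drift down the potential plus thermal noise.
        positions = [x - k * x * dt + noise_scale * rng.gauss(0.0, 1.0)
                     for x in positions]
        if step in snapshot_steps:
            snapshots[step] = list(positions)
    return snapshots

snapshots = run_ensemble()
for step, xs in sorted(snapshots.items()):
    print(f"step {step:4d}: mean = {statistics.mean(xs):6.3f}, "
          f"variance = {statistics.pvariance(xs):6.3f}")
```

The mean should decay from x0 toward zero and the variance should grow from zero toward roughly 1/k, which is the relaxation of the ensemble distribution from a single configuration to equilibrium that the Fokker-Planck equation describes for this model.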

milosz-host740_1_06-12-2024_202124:

A lot in our fields can be traced back to von Neumann, or Shannon for that matter. No, I gotta say, I absolutely love the idea of WESTPA. I enjoy it. There was even a sort of excitement in watching the statistics of the trajectories, I gotta say. Just to give a bit of background to the people who are not familiar: it basically relies on splitting, duplicating the systems along a reaction coordinate in a binned manner, right? So you see the progression of your multiple trajectories along your reaction coordinate. And then the ones that are filling the missing bins are duplicated, and the ones that are maybe overpopulating crowded bins are killed. And yeah, there's a real beauty to this. I was just going to ask: do you think this whole enterprise of splitting and killing trajectories can benefit from modern, say, machine learning techniques? Because this looks like a great match for enhancing this sampling process. I know that in principle it is an evolutionary machine learning method, but can there be another layer of complexity put on top of that to make those methods work in cases where they're really hard to make work?
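
Editor's note: the splitting and merging step described here can be sketched in a few lines. This is a simplified illustration and not WESTPA's actual implementation or API: walkers are (position, weight) pairs on a one-dimensional progress coordinate, every occupied bin is brought to a fixed target count, and "killing" is done by merging, so the surviving walker inherits the combined weight and total probability is conserved. The function names, bin edges, and walker values are hypothetical.

```python
import random

def resample_bin(walkers, target, rng):
    """Split or merge walkers in one bin until it holds `target` walkers.

    walkers: list of (position, weight) tuples that all fall in the same bin.
    The total weight in the bin is left unchanged.
    """
    walkers = list(walkers)
    while len(walkers) < target:
        # Split: replace the heaviest walker with two copies of half its weight.
        walkers.sort(key=lambda w: w[1])
        position, weight = walkers.pop()
        walkers += [(position, weight / 2), (position, weight / 2)]
    while len(walkers) > target:
        # Merge: combine the two lightest walkers; the survivor is chosen with
        # probability proportional to weight and carries the summed weight.
        walkers.sort(key=lambda w: w[1])
        (pos_a, w_a), (pos_b, w_b) = walkers[0], walkers[1]
        survivor = pos_a if rng.random() < w_a / (w_a + w_b) else pos_b
        walkers = [(survivor, w_a + w_b)] + walkers[2:]
    return walkers

def resample(walkers, bin_edges, target_per_bin, seed=0):
    """Apply resample_bin to every occupied bin along the progress coordinate."""
    rng = random.Random(seed)
    bins = {}
    for position, weight in walkers:
        index = sum(position >= edge for edge in bin_edges)
        bins.setdefault(index, []).append((position, weight))
    resampled = []
    for index in sorted(bins):
        resampled += resample_bin(bins[index], target_per_bin, rng)
    return resampled

# Four walkers crowd the first bin; one low-weight walker has reached the second.
walkers = [(0.1, 0.4), (0.2, 0.3), (0.3, 0.2), (0.4, 0.09), (1.2, 0.01)]
new_walkers = resample(walkers, bin_edges=[1.0], target_per_bin=2)
for position, weight in new_walkers:
    print(f"x = {position:.1f}, weight = {weight:.3f}")
```

In a full weighted ensemble run, this resampling would alternate with short bursts of unbiased dynamics for every walker; only the bookkeeping step is shown here.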

squadcaster-b7d3_1_06-12-2024_112124:

So, the answer is definitely yes. I can speak to a few things that I'm familiar with. Certainly, as I mentioned before, there's implementation in weighted ensemble, and there are a lot of things that the user has to decide, most notably what we call the progress coordinate, which doesn't have to be a reaction coordinate, but the closer it is to a good reaction coordinate, probably the better you'll do. Certainly there are many machine learning methods that have been published on developing good reaction coordinates, so any of those in principle could be used. There are folks who are working on that in a weighted ensemble context. In my own work, and I want to give credit to my collaborator, David Aristoff, who's an applied mathematician, we're trying to look at this issue in terms of minimizing statistical uncertainty in observables that we want to estimate. We've been focusing on rate constants, not surprisingly. And so you can ask the question: if I want to minimize the variance in a certain quantity, how should I design the bins that you refer to? Where should they be in configuration space, and how many trajectories should be allocated to each? So, the answer to that question. I mentioned the words progress coordinate and reaction coordinate. Now, we see many times in the literature that the committor is a kind of perfect reaction coordinate. And I agree it's an excellent coordinate to consider. It's not necessarily the answer to everything. In particular, it's not the answer to this question because, in our case, it turns out that there's another observable, different but related: the local mean first passage time. If you think about it, from every point in configuration space, if you have a target state, you can ask what's the average time it would take me to get there.
It turns out that coordinate is most important for minimizing variance in your estimate of, guess what, the mean first passage time itself, which is basically the reciprocal of the rate constant. And you can show in some simple toy models that the committor and that local mean first passage time can be quite different. So now, you said machine learning, and maybe you think I lost that thread, and maybe I did. But in fact, what you have to do is an initial simulation, right, to learn basically a model of your dynamics. The simplest example would be a kind of Markov model. And then you can use that to estimate this local mean first passage time and so forth to do the optimization. Now, another thing that's a more generic machine learning approach would be learning continuous representations, basically regression models, but they could be quite fancy, of the coordinate that you think is important or you know is important. So it could be the committor, and people are working on that for some other path sampling methods. Or for this local mean first passage time, you could imagine doing the same thing. So we're still doing, if you will, kind of baby-level machine learning, but definitely there's a lot of room for improvement.
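
Editor's note: both quantities mentioned here, the committor and the local mean first passage time, can be computed from a Markov model by solving small linear systems. The sketch below is an illustration under stated assumptions, not the procedure used by Daniel and David Aristoff: the five-state transition matrix is invented, state 0 is taken as the source, state 4 as the target, times are in units of the model's lag time, and the equations are solved by simple Gauss-Seidel iteration.

```python
def solve_fixed_point(transition, fixed, source, tol=1e-10, max_iter=100000):
    """Solve x[i] = source[i] + sum_j T[i][j] * x[j] for states not in `fixed`.

    fixed: dict {state: value} of boundary values that are held constant.
    """
    n = len(transition)
    x = [fixed.get(i, 0.0) for i in range(n)]
    for _ in range(max_iter):
        largest_change = 0.0
        for i in range(n):
            if i in fixed:
                continue
            # Move the self-transition term to the left-hand side.
            others = sum(transition[i][j] * x[j] for j in range(n) if j != i)
            new_value = (source[i] + others) / (1.0 - transition[i][i])
            largest_change = max(largest_change, abs(new_value - x[i]))
            x[i] = new_value
        if largest_change < tol:
            break
    return x

# Hypothetical row-stochastic transition matrix for a 5-state chain.
T = [
    [0.75, 0.25, 0.00, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.00, 0.25, 0.75],
]
n = len(T)

# Committor: probability of reaching the target (state 4) before the source (state 0).
committor = solve_fixed_point(T, fixed={0: 0.0, n - 1: 1.0}, source=[0.0] * n)

# Local mean first passage time: average number of steps to reach the target.
mfpt = solve_fixed_point(T, fixed={n - 1: 0.0}, source=[1.0] * n)

for state in range(n):
    print(f"state {state}: committor = {committor[state]:.3f}, "
          f"MFPT = {mfpt[state]:6.2f} steps")
```

Even in this symmetric toy chain the two coordinates have different shapes (the committor is linear in the state index, the mean first passage time is not), although this example is not meant to reproduce the cases where they differ strongly.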

milosz-host740_1_06-12-2024_202124:

Okay. That's great for future inspiration for people who are listening, hopefully. Okay, Daniel, thank you so much for the conversation and for a lot of great points. Before I say goodbye, let me just again recommend your blog, the Statistical Biophysics blog, to our listeners, as well as all the books that are, I believe, linked there. And yeah, I really appreciate your time.

squadcaster-b7d3_1_06-12-2024_112124:

Yeah. Let me say thank you, and how much I admire what you're doing. It's a great thing for the community. So keep it going. Thank you.

milosz-host740_1_06-12-2024_202124:

Thanks a lot as well, for the kind words. Hope you have a great day.

squadcaster-b7d3_1_06-12-2024_112124:

Okay. Bye now.

Thank you for listening. See you in the next episode of Phase Space Invaders.