Optimizing You

Dr. Oliver Hinder: Gradient Descent for Solving Linear Programs

Anthony Season 1 Episode 6

Oliver Hinder is an Assistant Professor in the Department of Industrial Engineering at the University of Pittsburgh. Before that, he was a visiting postdoc at Google in the Optimization and Algorithms group in New York, and he received his PhD in 2019 in Management Science and Engineering from Stanford, working with Professor Yinyu Ye. He studies local optimization and gradient descent methods for both convex and nonconvex problems.

We chat about Oliver moving to the U.S. from New Zealand to start his PhD at Stanford; we talk about some of his recent work on gradient descent methods for solving LPs accurately and how restarts can benefit such algorithms. Finally, we touch on automated parameter tuning in machine learning, especially in deep learning, which is now widely used in many applications.

Check it out!

Hi there, and thanks for tuning in to Optimizing You, the podcast where we talk about all things optimization. Interested in studying optimization? Interested in applications of optimization? Interested in hearing from current PhD students, faculty, and practitioners about what they're up to with optimization? Then listen on, my friend. Hi everyone, welcome back to Optimizing You. This is Anthony here, and today I have the pleasure of speaking with Oliver Hinder, who is an assistant professor in Industrial Engineering at the University of Pittsburgh. Our first Pitt professor chat! Before that, he was a visiting postdoc at Google in the Optimization and Algorithms group in New York, and before that he received his PhD in 2019 in Management Science and Engineering from Stanford, working with Professor Yinyu Ye. He does a lot of work on local optimization, gradient descent, machine learning, and all these good things, so I'm excited to talk to him today. Oliver, what's up? Not too much. Thanks for having me, Anthony. Yeah, how's your morning going? It's pretty good, pretty good. I can't complain. Nice. So usually in these sessions we talk a little bit about you and a little bit about your research. So first, we can talk a little about you. Can you just tell us where you're from, you have a bit of an accent possibly, and how you got here? Yeah, I grew up in New Zealand. Unfortunately, some of my friends from New Zealand have said that I've been losing my accent and it's slowly becoming more and more Americanized, but I think, at least to Americans, it still sounds like I have an accent. Yeah, I agree. One thing that's kind of funny if we compare New Zealand and the States: my wife is American, and when I talk to her, just comparing our experiences growing up, one thing that seems very, very different is that it seems like you guys don't get a lot of breaks at school. We have spring break, summer... No, no, no, I don't mean those kinds of breaks. Well, that's probably true too. I mean during the school day, at least. I don't know if this was your experience, but once you get to high school, it seems like you don't get to spend a lot of time outside playing. Is that accurate? That's very accurate. In Illinois, where I grew up, we had one period of gym class, but in some states that's not mandatory, and then students don't get outside at all. Yeah, it's crazy. We would get maybe 20 or 30 minutes in the morning to go outside, and then an hour at lunchtime, which was completely outside; there was no cafeteria or anything like that. Wow. That's pretty much standard. So that's kind of a fun little difference. So when you got to America for your PhD at Stanford, was that the first time you came here, and was there enough outdoor play time for you? Ah, yeah. I think that was pretty much the first time, unless you count a layover. It was very nerve-racking, honestly, accepting a PhD position in the United States and really not having been here other than, I guess, a brief visit beforehand. Yeah. How and why did you choose to go to Stanford instead of a school maybe closer to home? Yeah, that's a really good question. I didn't really seriously consider doing my PhD in Auckland, where I grew up and where I did my undergrad.
But basically the professor who encouraged me to do a PhD in the first place said to me, you know, I think there are benefits of doing a PhD in New Zealand; it would have been shorter and things like that. But he said if you really want to go into academia and have a good shot at that, you're probably better off in the States. And I think that's probably true. Yeah, okay. So you were planning ahead a little bit, knowing that Stanford is phenomenal and you'd get a good job in the States, right? I mean, Stanford... I think it's also, yeah, it's interesting, because people internationally sometimes critique the US PhD because it's a longer PhD than what you find elsewhere. In Europe and New Zealand and Australia it's basically three years, or that's the target, whereas here it's more like five, right? That's the target, and it's often longer. Yeah. I mean, I did six. But I actually think if you're really serious about academia, the longer PhD is a lot better prep. Because if you do a three-year PhD, and this is what would have happened if I had gone to the University of Auckland, you come in, your advisor has a project in mind, you do that project, and then you're done. It's a big project, and obviously you learn a lot along the way. But ideally in a PhD, what happens is that maybe initially your advisor provides a lot of guidance on what problem you solve, but by the end of the PhD you're really fully taking the initiative and you can explore whatever area you want. And that's really difficult if you only have three years; by the time you're at the end of the three-year mark, you've barely figured out how to do research, let alone gotten into the process of how to create a new idea, that kind of thing. That's really hard, and it takes a while to learn, I think. Yeah, I totally agree. I just finished my second year, and I do not have a lot of understanding of how to find a good problem and do my own thing. I've been needing a lot of help from my advisor for the first couple of years. If it ended next year, I would maybe be a little disappointed in where the program left me. So it's good that we have five, I guess. Yeah. And it's a real skill, too. I think that's a skill you have to constantly improve: creating ideas for papers, and then also learning not just how to create good ideas, but how to quickly eliminate the bad ideas you thought were good. There's a cognitive bias I think you have to really get over, which is that when you create an idea, you think it's a good idea and you don't want to disprove it being a good idea. And if you do end up falling into those kinds of traps, you can spend a really long time on ideas that are really bad but that you think are good. So one of the biggest skills you have to learn, or at least I think I've honed, is creating an idea, thinking, oh, this is a great idea, and then as quickly as possible figuring out why it's a bad idea. Seems like a very useful skill. Does that happen a lot? What do you think the ratio of bad ideas to good ideas is? Ten to one. But I think I eliminate eight out of ten within the first day of the idea being created. Nice, nice. That means you're creative too. I guess. I find it fun to just come up with a bunch of ideas and see what sticks. For sure.
I think, I mean, this is kind of one of the funnest things about being an academic: just coming up with ideas, honestly. I think I should probably spend less time coming up with ideas and more time doing actual work, but it's so much faster to come up with ideas. I totally agree. So maybe we can fast forward through your PhD to when you finished up. I found your thesis online; it was very nice. You wrote this very long and awesome paper that I thought was well formatted as well. And then you went to Google Research. Is that correct? Yeah. Tell us about that. Why did you do that, and what did you get out of it? Yeah. So the reason I went to Google is I really wanted the opportunity to have some practical impact and develop some practical optimization software. That was really the number one motivator. When you do research in optimization, I think it's really easy to just produce a lot of theory papers, and I do see a lot of value in theory. But if all you're doing is producing theory papers and not getting some information about whether your hypotheses are correct or not... because honestly, theory is really a hypothesis-generation process in some sense, or a mechanism for understanding things. If you're not validating it, then you're not doing anything useful, unless you're lucky enough that someone else comes along and validates it, but usually other people don't want to validate your ideas for you. Or you can end up in the situation where you do test your ideas: you have some nice theory for your algorithm and you test it out on a few different problems, but it's really hard to build a serious implementation that people can actually use. Going into Google, there were some really strong optimization researchers there, people who were really, really good at both optimization and developing software that people do use: fast and reliable and all that kind of good stuff. So yeah, that was really why I was super excited about that opportunity, and I still am. Nice, cool. So you wanted to go there to see the more practical side of your research, try to implement some things, and have some real-world impact. That's the idea. And what did you work on? Did you still do local optimization or gradient-descent-style algorithms, that kind of thing, for one of their big solvers, like OR-Tools? What were you working on exactly? Yeah. So I was working on a few things, but the main thing was gradient-based methods for solving linear programs. The idea there is, you know what our classical linear programming solvers are: basically simplex and interior point methods. Both of those methods are super awesome, but both have this drawback, which is that they need a factorization of a matrix inside the algorithm. And that's a drawback for a few reasons. One is that there are instances where these factorizations are really slow. Of course, in practice I would say for 90% of problems these factorizations are super fast, because there are really nice heuristics that create really nice sparse factorizations. But there really do exist problems where the factorizations can be slow. Not worst-case slow, not the O(n^3) that everyone writes down.
But what I would say is, in practice, a lot of the time you just see that the factorization time is roughly proportional to the number of nonzeros in the problem. Most of the time, let's say 70% of the time. The other 30% of the time it's a bit worse than that, and then the factorizations can be genuinely slow and can cause an issue in terms of just raw runtime. So that's one problem. The second problem is when you want to solve a really big linear program. Factorizations are great if you have a single machine with one thread and enough memory to fit your problem, including the factorization, which, by the way, typically takes more memory than the original problem. If you have that situation, then you're good. But the moment your problem gets big enough that it can no longer fit in memory, you're very close to screwed. It's not quite true that you're screwed, but in practice you have to try to hack around it somehow, either by creating some sort of distributed factorization, which is really hard to do, and none of the commercial solvers offer that, I think for good reasons; or you have to develop some sort of custom decomposition to divide the problem up and hack things together. And the third issue is to do with a single machine having one thread: factorizations work really well if you have one thread. Now, if I tell you you have a GPU with, say, 10,000 cores, which I think is what modern GPUs have, and maybe I'm not quite right on that number, but that number is growing all the time and people are investing in it; that's where essentially the growth in compute is at the moment. The speed of your CPU has just stopped increasing, right? We haven't seen a speedup there basically for a decade, and I don't think we really will see much more speedup in a single-thread CPU. And on GPUs, factorizations basically don't work; you can't factorize on a GPU efficiently. So that's kind of the third place where these first-order methods win: they're all based on matrix-vector multiplies instead of factorizations. You just multiply a matrix by a vector, and that's really well suited to distributed settings and GPUs. You can even, theoretically, although it requires some engineering, distribute that across multiple GPUs if you want. Yeah, interesting. We always learn simplex and interior point methods for linear programs, and I never really think about this level of detail; you're saying factorizations have these three big issues, maybe more, and there are other ways of getting around them using, what did you call them, these gradient-based methods. So you actually look at the Lagrangian. The Lagrangian function, right. Just to recap for people what the Lagrangian is: we basically take the linear program and we create a primal player and a dual player, and we create a combined objective. One player is trying to minimize that objective and the other is trying to maximize it; they're playing a game against each other. In these methods, you compute gradients with respect to the primal player, so basically, what direction does the primal player want to move in, and gradients for the dual player: what direction does it want to move in? And computing these gradients is basically just matrix-vector multiplies.
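For listeners who want to see this in code, here is a minimal sketch in Python of the kind of primal-dual gradient step Oliver describes. It is a PDHG-style update on a made-up toy LP, not the production solver code, and the step sizes eta and sigma are arbitrary choices that happen to satisfy the usual stability condition for this data:

```python
import numpy as np

def pdhg_step(x, y, c, A, b, eta=0.1, sigma=0.1):
    """One primal-dual gradient step on the LP saddle point
    min_{x >= 0} max_y  c @ x + y @ (b - A @ x).
    The only matrix operations are products with A and A.T."""
    # Primal player: step along its gradient c - A.T @ y, then project onto x >= 0.
    x_new = np.maximum(x - eta * (c - A.T @ y), 0.0)
    # Dual player: ascend along its gradient b - A @ x (with extrapolation).
    y_new = y + sigma * (b - A @ (2 * x_new - x))
    return x_new, y_new

# Toy LP (data made up for illustration): min c @ x  s.t.  A @ x = b, x >= 0.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([4.0, 3.0])
c = np.array([1.0, 2.0, 3.0])

x, y = np.zeros(3), np.zeros(2)
for _ in range(5000):
    x, y = pdhg_step(x, y, c, A, b)
print("x =", x.round(4), "| max violation of Ax = b:", np.abs(A @ x - b).max())
```

Every iteration costs two matrix-vector products and no factorization, which is why these methods map so naturally onto GPUs and distributed machines.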
So you do a ton of them to solve these problems, way more iterations than the factorizations you'd do in an interior point method. But hopefully the advantage is that they're much cheaper and can be distributed and all those kinds of things. That's very interesting and very useful to hear. The other thing that I thought I saw in your research about this particular method was using restarts. Does that go hand in hand with these gradient methods? Yeah, that's a really good question. So first I just want to give a little bit of motivation for why we were studying these restart schemes, and then I'll tell you what a restart scheme is. Amazing. So one reason that people have been very critical of first-order methods for linear programming, and we're not the first people to try first-order methods for linear programming, one reason there's been a lot of critique, and perhaps reasonably so, is that first-order methods often, I'd say not always, but often, struggle to get a high-accuracy solution to a problem. So what does that mean? When you solve a linear program, whether with an interior point method or even, honestly, the simplex method, you don't exactly solve the problem. You have some error. In particular, you are approximately primal feasible. When we say approximately primal feasible for an interior point method, that means the difference between the right-hand side and A times x (you're trying to get Ax = b), that constraint violation, is on the order of, let's say, ten to the negative six. It's a really small number, and we just say that's feasible. Actually, you can construct problems where that's definitely not feasible, but in practice most people say that's feasible and don't think much more about it. But let's say you now have a violation of ten to the negative one, to go to a real extreme, so all your constraints are violated by 0.1. Is that a good solution to the problem? I think a lot of people would say probably not. As a user, that's not something you want to deal with, right? It's really difficult to think about constraints being approximately satisfied: what does that mean? Somehow you need to convert that back into the problem you care about and figure out whether or not it's actually a good solution, or maybe you need to project your solution onto the feasible set, something like that. So that creates a lot of difficulty, and that's one reason people have been very critical of first-order methods. So one big push we had was to try to improve their ability to find high-accuracy solutions. We wanted to actually make them competitive with an interior point method in terms of accuracy, or at least really improve how well first-order methods can achieve those kinds of solutions. And that's where this technique called restarts comes in. Restarts is actually a technique that's been around for a while; it probably goes back to the 80s, I'd like to say, at least in optimization, and I'm sure the idea has appeared in many other contexts. A restart is a scheme where you basically run your algorithm for some amount of time. Your algorithm is an iterative algorithm. So what does that mean?
You take the previous solution and you modify it to get the next solution. But often you have some sort of, say, momentum or something like that. Or in our case, actually, I should be more precise: what we have is an average iterate that we keep around. And that's the thing that, in theory, you should be looking at: the average iterate, meaning the average across all the iterations of the algorithm. That's the thing that in theory should be converging. Okay, sorry, that was probably more technical detail than I intended, and I didn't quite get to the restart part. Let me get back to it. I think we got into a real technical point on what an algorithm is, which is probably not super important. So what is a restart? A restart is basically where you run an algorithm for a certain amount of time, and then you stop the algorithm and feed its output back in as input and run the algorithm again. Interesting, yes. So you run it once, see what happens, and then you use whatever happened. Maybe it wasn't a great solution, but it gave you some information, and then you try again with that new information, with that new solution. So I think one way to think about this is: as I run my algorithm, I get closer and closer to the optimal solution, and I re-initialize the algorithm from this closer point. And you keep on doing this, by the way; you don't do this once or twice. A typical run of the algorithm will probably do this 50 times or something like that. What changes? What changes is that the initialization point keeps getting closer and closer to the optimal solution. Okay, and then why does it run differently than if you just keep it going? Yeah, okay. So this is what I was trying to explain with the technical detail; I should probably have explained the restart scheme first and then come back to what's going on. The key thing here is that the output of the algorithm is not the current iterate; the output of the algorithm, in our case, is the average iterate. So when you re-initialize the algorithm, the current iterate changes. Does that make sense? We take these gradient steps with the current iterate, so we are moving through the space, and these methods create spirals; they move in spirals. And you can show theoretically that the average iterate is better than taking the current iterate. Okay, interesting. Although I will say that in practice, it's very interesting: before our work, everyone would take the current iterate, because if you don't restart, eventually the current iterate beats out the average iterate. Think about a spiral going in: eventually the spiral starts converging faster than the average of the spiral. But what you want to do is do one loop of the spiral, take the average, and then restart the algorithm. If you only do one spiral, the average ends up being closer to the optimal solution than any of the actual iterates of your algorithm. Does that make more sense, that picture? Yeah, it makes sense, and I saw it on your website, actually. If people want to see what Oliver means, there's a nice picture, I think, in one of his talks or in some of his slides.
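To make the restart scheme concrete, here is a minimal sketch that continues the PDHG snippet above: run the method for a fixed number of inner iterations, average the iterates along the way, then re-initialize from that average. The restart length and count here are illustrative; the method in the paper decides when to restart adaptively:

```python
def restarted_pdhg(c, A, b, n_restarts=50, inner_iters=100):
    """Restart sketch, reusing pdhg_step and the toy data defined above."""
    x, y = np.zeros(A.shape[1]), np.zeros(A.shape[0])
    for _ in range(n_restarts):
        x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)
        for _ in range(inner_iters):
            x, y = pdhg_step(x, y, c, A, b)
            x_avg += x / inner_iters   # average iterate: the quantity that converges in theory
            y_avg += y / inner_iters
        x, y = x_avg, y_avg            # restart: the average becomes the new starting point
    return x, y

x, y = restarted_pdhg(c, A, b)
print("restarted x =", x.round(4), "| max violation:", np.abs(A @ x - b).max())
```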
Yeah, yeah. You can also go to the paper on this; there's a nice picture of this spiral. So we weren't the first people to invent the idea of a restart algorithm, but we were the first to recognize that it would be useful in this particular context, which is primal-dual algorithms where you're averaging the iterates. Well, not exactly: there's some concurrent work, to be fair; I don't want to take all the credit from work that also figured this out. But we and that concurrent work were the first to really realize that for linear programming this is really useful. And I think we also not just theoretically demonstrated it, but practically demonstrated it in a very rigorous way. Oh cool, you had some experiments as well. Oh yeah. I mean, we ran three different LP test sets, each with probably 200 or 250 problems or something like that, and the problems varied in size. And it basically checked out on all those datasets that it was consistently improving things. Actually, even at low accuracy it was giving small improvements, but at high accuracy it really improved the ability of first-order methods to find high-accuracy solutions. I think we got, don't quote me on this, maybe a 20% improvement in the number of problems solved, something like from 60% to 80% of problems solved within the time frame we were running. Check the paper for the actual figures; I'm probably not getting them quite right, but it was a really, really big improvement. That's awesome. Where's your research going now? Yeah, so one area that I'm quite interested in right now is trying to make machine learning more user-friendly, basically. What do I actually mean by that? I mean a very specific thing: I'm trying to reduce the number of parameters you need to tune when you train a machine learning model, ideally a deep learning model. That's what I really care about, but I'll take gains on other problems, like linear regression, at least as stepping stones to where I'd like to get to. Can you tell us what deep learning means and why you're interested in it? Sure, yeah, I can tell you a little bit about what deep learning means. Deep learning is really just a fancy word for an artificial neural network. The reason the word deep learning came about is that people were building these artificial neural networks and they said, let's make them deeper. So what does depth mean in this context? Basically, a neural network has layers, and each layer is basically a matrix-vector multiplication followed by an activation function. It's supposed to be kind of a model of the brain, in a very dubious sense. Biologically inspired, people say, but I think in reality we just needed something really nonlinear to deal with nasty functions, and this happened to work really well. Where was I going? So you have an input that comes in, which is, let's say, an image, and your goal is to say, is it a cat or a dog? Then essentially you take that cat image, which you turn into a tensor, essentially a three-dimensional matrix, you multiply it through by a matrix, or tensor, and then you apply an activation function.
These are like kinky functions. You can't say that on the pod. I can't? Sorry, I don't know how to describe it in everyday terms: hinges. Hinges. Okay. Well, that's a dating app, but we know what you mean. Keep going. Okay. Anyway, a nonlinear function. You basically have a linear operation and then a nonlinear function that you apply, then another matrix or tensor operation and then a nonlinear function, and you keep on doing that. And eventually the thing says, that's a cat, that's a dog. That's what this long sequence of operations eventually tells you. And then the key thing is to take a bunch of data and figure out what the matrices are that you should be using to do the multiplications. Got it. And on a really high level, you can think about this as really nasty nonlinear regression. In typical linear regression, I just have one weight matrix; I multiply it through by the input and it gives me the output that I'm predicting, right? Of course, there's a big limitation on how complicated a prediction you can make with just a linear function. So we stack a whole lot of nonlinear functions in between the linear functions, and we are still attempting to learn those linear pieces. And that works. And you said you wanted to help make it easier to use. Sounds complicated. Yeah, it already is really complicated, and it is really hard to use, because there are so many decisions you need to make when you're building these things. You need to decide, e.g., the network architecture. That means: how many layers do you have? What do the layers actually look like? Because there are all sorts of different structures you can have; you can have convolutional neural networks, e.g., or you can have dense layers. This is a little jargon, but the point is that there are a lot of different decisions the user can make in terms of the model. And then they can make a whole lot of decisions in terms of a bunch of other fancy tricks. Regularization is a really big one. If you're familiar with regularization in regression, it's the same thing, but then we have a whole lot of other bells and whistles beyond conventional regularization. In conventional regularization for linear regression, you just add a penalty on the norm of the solution, right? If it gets too big, you say, don't do that; even if it's a good fit, try to pick a solution with a smaller norm that also gives a good fit. In deep learning, we have that type of regularization, and then a whole lot of other types, like data augmentation, and playing with the neural network in weird ways, and all kinds of stuff like that. So there's an overwhelming number of choices to make. I think the big thing with parameter tuning, when it comes to deep learning, is that you can really screw up, and I just want to give you an example of that. So OpenAI published a paper a couple of years ago, which was basically... actually, I should give a little background on what is happening in the deep learning community before I talk about the OpenAI paper. So what is OpenAI? Yeah, what is OpenAI, good point.
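For readers who want the "linear map, nonlinearity, repeat" picture in code, here is a minimal sketch of a forward pass through a small fully connected network with the hinge-shaped ReLU activation. The layer sizes and random weights are made up for illustration; a real network would be trained, and image inputs would typically go through convolutional layers instead:

```python
import numpy as np

def relu(z):
    """The hinge-shaped activation: max(z, 0), applied elementwise."""
    return np.maximum(z, 0.0)

def forward(x, weights):
    """A deep network stripped to its core: matrix-vector multiply,
    nonlinearity, repeat; the last linear layer outputs class scores."""
    h = x
    for W in weights[:-1]:
        h = relu(W @ h)        # one layer = linear map + activation
    return weights[-1] @ h     # scores for each class, e.g. (cat, dog)

rng = np.random.default_rng(0)
x = rng.normal(size=784)                        # e.g. a flattened 28x28 image
weights = [0.05 * rng.normal(size=(128, 784)),  # illustrative layer sizes
           0.10 * rng.normal(size=(64, 128)),
           0.10 * rng.normal(size=(2, 64))]     # two outputs: cat vs. dog
print("class scores:", forward(x, weights).round(3))
```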
So in deep learning, what people have realized is that as you make these models bigger and you add more data, the quality of the predictions goes up. The ability of the model to, say, recognize that that's a cat, that's a dog, improves the more data you give it and the larger the model gets, and the ability to recognize speech, like your Alexa or whatever, improves a lot and consistently as you scale up. So what this has really pushed people to do is build these enormous models. Some of these models, I want to say, have like 300 billion parameters; I think that's the right number, 300 billion with a B. Maybe you have to look that up and check, but I think GPT-3 is something of that order. That's a particular instance of a deep learning model. So anyway, I think the main point here is that there's been a huge push to train these really, really large models to get better and better classifiers. And OpenAI, which is a company, or a non-profit, I should say, that works on trying to build open-source artificial intelligence technology: two years ago they trained a model using an enormous amount of compute, and they released the model, and everyone started using it, and that was great. And then two years later, DeepMind came out and said, look, with the same amount of compute that you used, we can train a model that's much, much better. And really all they did wrong was they didn't tune the step size of their algorithm properly. They did try to tune it, but they didn't do it quite right, and that really, really hurt their results. So what's the step size? In deep learning, or machine learning in general, we're typically running gradient-based methods, a little bit like the first-order methods we talked about earlier. You compute the gradient of the function and you take a step in that direction, and the amount you move in that direction is proportional to the step size. We don't really know how to choose that parameter, so what people typically do is try a bunch of different values and see what works best. And the mistake they made is that, while they were trying to figure out what the best model to train was, they basically kept the step-size tuning the same for all the different models they tried, instead of tuning it for every different model. This sounds like a rookie mistake. Yeah, I guess maybe I shouldn't say that, but it kind of makes sense: if you keep changing your model, you should retune. But there are two reasons they didn't do it. One is that it's essentially a pain to do, because every time you tune the step size, you need to rerun all these models. So if you try ten different step sizes, you need to run the thing ten times, right? That's ten times more compute. The second thing is that there are so many parameters lying around that you have to tune and think about; you kind of have to make trade-offs on which ones you should retune and which ones you shouldn't. So that's kind of a defense of them. I mean, they definitely made a mistake there, but I think the point is that even experts make mistakes tuning these parameters. That shows you how hard it is to tune them, let alone for someone who's just taken an introductory deep learning course.
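To make the step size concrete: gradient descent updates the parameters as theta <- theta - eta * gradient, and "tuning" eta usually means paying for a full training run per candidate value. A toy sketch, where a quadratic loss stands in for a real, expensive training run and all values are illustrative:

```python
import numpy as np

def train(eta, n_steps=200):
    """Gradient descent  theta <- theta - eta * grad  on the toy loss
    ||theta||^2; a stand-in for a full (expensive) training run."""
    theta = np.array([5.0, -3.0])
    for _ in range(n_steps):
        grad = 2 * theta                # gradient of ||theta||^2
        theta = theta - eta * grad      # eta scales how far we move
    return float(theta @ theta)         # final loss

# Brute-force tuning: each candidate costs a full run, so five candidates
# cost five runs' worth of compute. Too large a step (eta = 1.1) diverges.
for eta in [1e-3, 1e-2, 1e-1, 0.5, 1.1]:
    print(f"eta = {eta:g}: final loss = {train(eta):.3e}")
```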
So a big push for me now is really trying to think about how we reduce this burden of tuning the parameters. How can we automate it so that no one has to think about as many of the parameters as possible? Maybe the user only focuses on a small number of choices, like the model architecture, and all the other parameters you don't even worry about; the algorithm just figures it all out for you. That sounds great, and it would make it very usable, even to someone who doesn't know how to do the tuning. Yeah. I mean, that would ultimately be the end goal, because I think a lot of the users that would get the most benefit from deep learning are not deep learning experts, but people who have an interesting problem and good data, and that doesn't correlate at all with deep learning expertise, I would say. Yeah, it sounds like deep learning is used in so many applications, probably by a lot of people who just know it's a good method and think, yeah, let's throw it at our field. Yeah, there are a huge number of papers like that. That has been a good way to get a paper in the last few years: just apply deep learning to your field. Sometimes I think it's a bit too aggressive, and people start using deep learning for things where it doesn't really make sense. But, you know, I was actually very skeptical of deep learning early in my PhD, and I've slowly become a convert, in the sense that I don't think it's a one-size-fits-all solution, but you can't compete with it on unstructured tasks. It's clearly the best thing to do if you want to do image recognition, if you want to parse text data, if you want to parse speech data. And if you can integrate that with all the other tools that we have, including in operations research, there's so much power there, I think, both from an academic standpoint and also from a practical standpoint. And those problems, mining images or text or any kind of data like that: if we're going to have self-driving cars, if we're going to have robotics in healthcare, they all involve having to understand images and text and everything. Oh yeah, yeah, absolutely. One thing I'd be really excited to see in the future, and I'm sure we will see, is this kind of integration of OR techniques with deep learning, so that we can really build systems that can directly interact with the world and make decisions. That's something I'm really excited to see. It's not something I work on, though. Maybe one day we can work on it together. Yeah, that sounds good. Excellent. Well, thanks for coming on. Do you have any other parting words for the listeners? Parting words: take all advice with a grain of salt. Even that advice. Even that advice. Excellent. Thanks again for coming on, Oliver. It was great talking to you. Have a good rest of your day. Thank you. Thanks for having me, Anthony.